Strategies for effective transfer learning for HRTEM analysis with neural networks

Recipes and conclusions

Taken together, these case studies show that transfer learning can be successfully employed under a variety of conditions. It is not necessary to curate a large dataset in the transfer learning domain, though more data will always be helpful. We recommend freezing large sections of the model to maximize training efficiency, especially during any dataset or hyperparameter exploration phases, and gradually unfreezing components as needed to improve performance (see the sketch below). For model pretraining, a large, general dataset that closely matches the target domain will likely provide the best performance; when the target domain is hard to match or predict, the diversity of the pretraining dataset becomes more important. If generalization to out-of-distribution (OOD) scenarios is particularly important for a given use case or deployment scenario, e.g., deploying a neural network model in an in situ experiment with rapidly fluctuating imaging conditions, we recommend exploring training datasets for both pretraining and transfer learning to understand how they may induce good OOD behavior for the domain of interest. Quantifying OOD behavior is likely easiest with a set of diverse auxiliary datasets against which model performance can be evaluated.
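As a minimal sketch of the freeze-then-gradually-unfreeze recipe, the example below uses PyTorch with a pretrained torchvision ResNet-18 as a stand-in backbone; the backbone, the block chosen for unfreezing (`layer4`), the unfreezing epoch, the learning rates, and the `target_loader` are illustrative assumptions, not the models or datasets used in this work.

```python
import torch
from torch import nn
from torchvision.models import resnet18, ResNet18_Weights

def set_requires_grad(module: nn.Module, flag: bool) -> None:
    """Enable or disable gradient computation for every parameter in a module."""
    for p in module.parameters():
        p.requires_grad = flag

# Illustrative pretrained backbone (stand-in for an HRTEM analysis model).
model = resnet18(weights=ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)  # new task head for the target domain

# Freeze the bulk of the network; train only the new head at first.
set_requires_grad(model, False)
set_requires_grad(model.fc, True)

def make_optimizer(model: nn.Module, lr: float) -> torch.optim.Optimizer:
    # Only parameters with requires_grad=True are handed to the optimizer.
    return torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=lr)

optimizer = make_optimizer(model, lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
unfreeze_epoch = 5  # assumption: when to begin unfreezing deeper blocks

for epoch in range(10):
    if epoch == unfreeze_epoch:
        # Gradually unfreeze the last backbone block and continue at a lower rate;
        # the optimizer is rebuilt so the newly trainable weights are included.
        set_requires_grad(model.layer4, True)
        optimizer = make_optimizer(model, lr=1e-4)

    model.train()
    for images, labels in target_loader:  # assumption: target-domain DataLoader
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```

The same pattern extends to unfreezing additional blocks on a schedule; rebuilding the optimizer at each unfreezing step is one simple way to ensure newly trainable parameters receive updates.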

Using performance data from over 10,500 models, we demonstrate that transfer learning can be an effective strategy for adapting a model to a new (but similar) domain while achieving performance comparable to direct training on the target domain. In particular, we investigated domain shifts commonly present in electron microscopy experiments, including shifts in imaging conditions via defocus, noise distribution, and atomic structure distribution, all of which we demonstrate can be reasonably addressed via transfer learning. Aggressive weight freezing can significantly improve training time with only a small performance compromise, and in many cases the dependence of model performance on the source or distribution of the pretraining data is minimal, though the out-of-distribution generalization behavior of a model depends more strongly on both the pretraining and transfer learning datasets and their relationship.