Effect of training dataset imaging conditions on model performance
Naturally, beyond training hyperparameters, the content of the training datasets has the largest impact on model performance. In the context of transfer learning, it is important to understand the relationship between the pretraining datasets and the transfer learning domain(s) of interest, as well as any impact such relationships have on model performance. In our study, the source of the pretraining data appears to have only a minor impact on model performance, possibly due to the fairly constrained data domain. As shown in Figure 7.1, bottom, model performance in the transfer learning domain is fairly consistent after transfer learning regardless of the pretraining data, even though after pretraining alone there is a notable dependence on the pretraining domain.

It is also useful to consider an idealized comparison baseline to contextualize the success (or lack thereof) of transfer learning. In Figure 7.2, we showcase results from a series of models trained in a single stage on a mixture of two datasets. For a mixture of -10 nm and 5 nm defocus data, with the 5 nm data serving as the transfer domain, performance after transfer learning is comparable to that of single-stage training for similar amounts of overall 5 nm data. However, this equivalence appears to depend strongly on the specific domains of interest, as for some defocus combinations the two approaches behave more divergently. Importantly, in many scenarios, transfer learning can readily provide performance comparable to training entirely with data in the transfer domain, even with only a small dataset.
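To make the two training regimes concrete, the following minimal sketch (written here in PyTorch) contrasts two-stage transfer learning, in which a model is pretrained on a source-defocus dataset and then fine-tuned on a small transfer-domain dataset, with a single stage of training on a mixture of both datasets. The model architecture, datasets, and hyperparameters below are placeholders for illustration only and do not reflect the exact configuration used in this work.

    import torch
    from torch import nn
    from torch.utils.data import DataLoader, ConcatDataset, TensorDataset

    def make_dummy_dataset(n):
        """Stand-in for a defocus-specific segmentation dataset of (image, mask) pairs."""
        images = torch.randn(n, 1, 64, 64)
        masks = torch.randint(0, 2, (n, 1, 64, 64)).float()
        return TensorDataset(images, masks)

    def make_model():
        """Placeholder segmentation network; the actual model is a deeper encoder-decoder CNN."""
        return nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def train(model, dataset, epochs=1, lr=1e-3):
        """Generic training loop shared by both regimes."""
        loader = DataLoader(dataset, batch_size=8, shuffle=True)
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.BCEWithLogitsLoss()
        model.train()
        for _ in range(epochs):
            for images, masks in loader:
                optimizer.zero_grad()
                loss = loss_fn(model(images), masks)
                loss.backward()
                optimizer.step()
        return model

    # Hypothetical datasets: a large source-defocus set and a small transfer-domain set.
    source_data = make_dummy_dataset(512)    # e.g., -10 nm defocus
    transfer_data = make_dummy_dataset(64)    # e.g., 5 nm defocus (transfer domain)

    # Regime 1: two-stage transfer learning. Pretrain on the source domain,
    # then fine-tune (typically at a lower learning rate) on the small transfer set.
    transfer_model = make_model()
    train(transfer_model, source_data, epochs=1, lr=1e-3)     # pretraining stage
    train(transfer_model, transfer_data, epochs=1, lr=1e-4)   # transfer stage

    # Regime 2: a single training phase on the mixture of both datasets.
    mixed_model = make_model()
    train(mixed_model, ConcatDataset([source_data, transfer_data]), epochs=1, lr=1e-3)

The comparison of interest is then the transfer-domain performance of transfer_model versus mixed_model given comparable total amounts of transfer-domain data.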
There can also exist fundamental asymmetries in the relative information content of two ostensibly similar data domains, making one dataset more useful for pretraining than the other. In our experiments, this is most obvious with the ‘in-focus’ dataset, which is the hardest for our models to segment successfully; yet, models trained on the in-focus data achieve better out-of-distribution performance and transfer far more successfully to a new focal point than models trained at other focal points transfer to the in-focus data. This suggests that using more general and difficult training datasets for pretraining may provide a distinct advantage in transfer learning workflows.

