Neural network architecture and training procedure

Figure 3.1: Architecture of the U-Net neural network with a ResNet backbone. The initial processing block consists of a convolutional layer, a batch normalization and activation layer, and a max pooling layer. The encoder half of the model encompasses the initial block and the downsampling blocks, while the decoder half encompasses the upsampling blocks.
For our neural network architecture, we use a standard U-Net model following previous work (Rangel DaCosta et al., 2024; Ronneberger et al., 2015), shown in Figure 3.1; in total, each network has about 14 million trainable parameters. Using PyTorch (Paszke et al., 2019) for model training and analysis, we optimize U-Net models in two fixed-length stages of 10 epochs each, corresponding to the pretraining and transfer learning stages. In the pretraining stage, we use a training dataset of 512 images before dihedral augmentation; in the transfer learning stage, we train models with 64, 128, 256, or 512 images, using a fixed minibatch size of 16 images in both stages. We optimize model parameters with the Adam optimizer (Kingma & Ba, 2017) against the cross-entropy loss of their predictions, with learning rates of 1e-3 and 1e-4 during pretraining, 1e-4 and 1e-5 during transfer learning, and a multiplicative learning rate decay factor of 0.99 per epoch. We also test weight freezing approaches, making either all weights trainable or freezing the parameters within the decoder or encoder portion of the U-Net model. For each hyperparameter/dataset configuration, we train 3 randomly initialized models. In our visualizations and analysis, we adjust measured losses by an offset of 0.3133, which is the minimum achievable cross-entropy loss for a two-class segmentation or classification task.
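As a concrete illustration of this two-stage procedure, the sketch below sets up one training stage in PyTorch with Adam, the cross-entropy loss, a per-epoch multiplicative decay of 0.99, and optional encoder/decoder freezing. The model object, the data loaders, and the `encoder`/`decoder` attribute names are assumptions for illustration, not the code used in this work.

```python
# Minimal sketch of one fixed-length training stage, assuming a U-Net-style model
# that exposes `encoder` and `decoder` submodules (an assumption, not the authors' code).
import torch
import torch.nn as nn


def set_trainable(model, freeze=None):
    """Make all weights trainable, or freeze the 'encoder' or 'decoder' submodule."""
    for p in model.parameters():
        p.requires_grad = True
    if freeze is not None:
        for p in getattr(model, freeze).parameters():
            p.requires_grad = False


def train_stage(model, loader, epochs, lr, device="cuda"):
    """One stage (pretraining or transfer): Adam, cross-entropy loss,
    and a multiplicative learning-rate decay of 0.99 per epoch."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=lr
    )
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.99)
    model.to(device).train()
    for _ in range(epochs):
        for images, labels in loader:  # minibatches of 16 images
            optimizer.zero_grad()
            loss = criterion(model(images.to(device)), labels.to(device))
            loss.backward()
            optimizer.step()
        scheduler.step()


# Example configuration: pretrain at lr=1e-4, then transfer at lr=1e-5 with a frozen encoder.
# train_stage(unet, pretrain_loader, epochs=10, lr=1e-4)
# set_trainable(unet, freeze="encoder")
# train_stage(unet, transfer_loader, epochs=10, lr=1e-5)
```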
Including all model variations across training hyperparameter conditions, we trained 2400 unique model configurations, for a total of 7200 trained models on the CdSe defocus dataset. We reduce the training conditions tested for the noise-varying and structure-varying datasets, adjusting only the weight freezing protocols and fixing the learning rates to 1e-4 during pretraining and 1e-5 during transfer learning and the transfer dataset size to 128 images, for an additional 180 and 2754 models, respectively. Training of the models for the noise- and structure-varying series used automatic mixed-precision training (Micikevicius et al., 2018), which downcasts certain computations to half precision to reduce training memory costs and time, as well as GPU power capping to 250 W, which limits the total power draw of the GPUs (and thus energy consumption) at a small increase in training time (Acun et al., 2024).
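The sketch below shows how mixed-precision training is typically enabled in PyTorch with `torch.cuda.amp.autocast` and `GradScaler`; the loop structure and argument names are illustrative assumptions, not the exact training code used here. The GPU power cap is applied administratively per device, e.g. with `nvidia-smi -pl 250` (requires elevated privileges).

```python
# Hedged sketch of an automatic mixed-precision training epoch in PyTorch.
import torch

scaler = torch.cuda.amp.GradScaler()

def train_epoch_amp(model, loader, optimizer, criterion, device="cuda"):
    model.train()
    for images, labels in loader:
        optimizer.zero_grad()
        # autocast downcasts eligible ops to half precision, reducing memory use and time
        with torch.cuda.amp.autocast():
            loss = criterion(model(images.to(device)), labels.to(device))
        scaler.scale(loss).backward()   # scale gradients to avoid fp16 underflow
        scaler.step(optimizer)
        scaler.update()

# GPU power capping is set outside Python, once per GPU, e.g.:
#   nvidia-smi -pl 250
```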
Lastly, we ran a baseline comparison series of neural networks trained only in a single pretraining phase, using mixtures of 128:384, 256:256, and 384:128 images for each of the unique dataset pairs in the defocus model series. We use both pretraining learning rates of 1e-3 and 1e-4. The baseline comparison series comprises an additional 720 models.
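For these baseline mixtures, one simple way to combine two datasets at a fixed ratio is sketched below using PyTorch's `Subset` and `ConcatDataset`; the dataset names are placeholders and this is not the authors' implementation.

```python
# Illustrative sketch: build a mixed pretraining dataset, e.g. 128 images from one
# dataset and 384 from its paired dataset in the defocus series (names are assumed).
from torch.utils.data import ConcatDataset, DataLoader, Subset

def mixed_baseline_dataset(dataset_a, dataset_b, n_a=128, n_b=384):
    """Combine the first n_a images of dataset_a with the first n_b images of dataset_b."""
    return ConcatDataset([
        Subset(dataset_a, range(n_a)),
        Subset(dataset_b, range(n_b)),
    ])

# loader = DataLoader(mixed_baseline_dataset(defocus_a, defocus_b),
#                     batch_size=16, shuffle=True)
```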
- Rangel DaCosta, L., Sytwu, K., Groschner, C. K., & Scott, M. C. (2024). A Robust Synthetic Data Generation Framework for Machine Learning in High-Resolution Transmission Electron Microscopy (HRTEM). npj Computational Materials, 10(1), 1–11. https://doi.org/10.1038/s41524-024-01336-0
- Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv. https://doi.org/10.48550/arXiv.1505.04597
- Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Köpf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., … Chintala, S. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Proceedings of the 33rd International Conference on Neural Information Processing Systems. Curran Associates Inc.
- Kingma, D. P., & Ba, J. (2017). Adam: A Method for Stochastic Optimization. arXiv. https://arxiv.org/abs/1412.6980
- Micikevicius, P., Narang, S., Alben, J., Diamos, G., Elsen, E., Garcia, D., Ginsburg, B., Houston, M., Kuchaiev, O., Venkatesh, G., & Wu, H. (2018). Mixed Precision Training. arXiv. https://arxiv.org/abs/1710.03740
- Acun, F., Zhao, Z., Austin, B., Coskun, A. K., & Wright, N. J. (2024). Analysis of Power Consumption and GPU Power Capping for MILC. SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1856–1861. https://doi.org/10.1109/SCW63240.2024.00232