Generalization measures
In supervised learning, we use the validation loss—more formally known as the generalization gap in empirical risk minimization settings—as our working metric for measuring in-distribution generalization. For a parameterized model $f_\theta$ optimized against a loss function $\mathcal{L}$,
$$
G_{\mathrm{ID}}(f_\theta) \;=\; \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\mathcal{L}(f_\theta(x), y)\right] \;-\; \frac{1}{|S_{\mathrm{train}}|}\sum_{(x,y)\in S_{\mathrm{train}}} \mathcal{L}(f_\theta(x), y), \tag{4.1}
$$
where $(x, y)$ are the data and supervision labels (i.e., micrographs and segmentation masks) drawn from an underlying distribution $\mathcal{D}$, and $S_{\mathrm{train}}$ is the finite sample of data used to optimize the model. The expected loss $\mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\mathcal{L}(f_\theta(x), y)\right]$ represents the performance of the model over the entire domain $\mathcal{D}$, but in practice, we approximate this loss using a second finite dataset—deemed the validation dataset $S_{\mathrm{val}}$—which should be drawn independently and identically distributed (i.i.d.) from the same distribution as the training sample $S_{\mathrm{train}}$. During model training, we optimize model weights against the training loss $\mathcal{L}_{\mathrm{train}}$; commonly, final weights for a model are instead selected to correspond to the parameters $\theta$ which minimize the validation loss $\mathcal{L}_{\mathrm{val}}$.
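To make Eq. (4.1) concrete, the following is a minimal PyTorch-style sketch of how the gap might be estimated in practice; `mean_loss`, `model`, `loss_fn`, `train_loader`, and `val_loader` are hypothetical names introduced here for illustration only, not definitions from the text.

```python
import torch


def mean_loss(model, loader, loss_fn, device="cpu"):
    """Average per-example loss of `model` over every batch in `loader`."""
    model.eval()
    total, n = 0.0, 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            # loss_fn is assumed to return a mean over the batch,
            # so weight by batch size before re-averaging.
            total += loss_fn(model(x), y).item() * x.size(0)
            n += x.size(0)
    return total / n


# Empirical estimate of Eq. (4.1): the validation loss stands in for the
# population term, and the training loss is the empirical risk.
# gap_id = mean_loss(model, val_loader, loss_fn) - mean_loss(model, train_loader, loss_fn)
```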
Using the shorthand $\mathcal{L}_{\mathcal{D}}(f_\theta) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\mathcal{L}(f_\theta(x), y)\right]$, we can analogously define an out-of-distribution generalization metric as
$$
G_{\mathrm{OOD}}(f_\theta) \;=\; \mathcal{L}_{\mathcal{D}'}(f_\theta) \;-\; \mathcal{L}_{\mathcal{D}}(f_\theta), \tag{4.2}
$$
where $\mathcal{D}$ is the training distribution from Eq. (4.1) and $\mathcal{D}'$ is a new distribution of data which differs from $\mathcal{D}$, i.e., under some distributional measure $d$, $d(\mathcal{D}, \mathcal{D}') > 0$. Both terms in Eq. (4.2) must be approximated with validation datasets in practice. We note, also, that Eq. (4.2) is not always nonnegative, unlike Eq. (4.1) (which, in theory, should be): one can imagine a scenario in which a model improves out of distribution, for instance if the shifted data happen to be easier to segment than the training data.
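Under the same assumptions as the sketch above, Eq. (4.2) can be estimated by reusing the hypothetical `mean_loss` helper, with one held-out loader drawn from the shifted distribution $\mathcal{D}'$ and one from the training distribution $\mathcal{D}$:

```python
def ood_gap(model, id_val_loader, ood_val_loader, loss_fn, device="cpu"):
    """Empirical estimate of Eq. (4.2): mean loss on a held-out sample from
    the shifted distribution D' minus mean loss on a held-out sample from
    the training distribution D. Unlike Eq. (4.1), this difference can be
    negative, i.e., the model may perform better out of distribution."""
    return (mean_loss(model, ood_val_loader, loss_fn, device)
            - mean_loss(model, id_val_loader, loss_fn, device))
```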