Statistical model validation

In general, models can be validated using either existing data or new data. Both methods are discussed further in the following subsections, and a note of caution is provided as well.
 
=== Validation with Existing Data ===
Validation based on existing data involves analyzing the [[goodness of fit]] of the model, or analyzing whether the [[Errors and residuals|residuals]] seem to be random (i.e. [[#Residual diagnostics|residual diagnostics]]). This method assesses how close the model is to the data, in order to understand how well the model predicts the data it was fitted to. One example of this method is shown in Figure 1, where a polynomial function has been fitted to some data. Although the polynomial passes through every data point, the data appear to follow a straight line, and the oscillations between the points might invalidate the polynomial model.
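
As a rough illustration of residual diagnostics and goodness of fit, the sketch below fits a straight line to synthetic data and inspects the residuals. The data-generating line, the noise level, and the lag-1 autocorrelation check are illustrative assumptions, not part of any standard procedure.

<syntaxhighlight lang="python">
# A minimal sketch of residual diagnostics using NumPy.
# The data-generating line and the noise level are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)  # linear truth + noise

# Fit a straight line and compute residuals.
coeffs = np.polyfit(x, y, deg=1)
residuals = y - np.polyval(coeffs, x)

# Goodness of fit: coefficient of determination R^2.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
print(f"R^2 = {1.0 - ss_res / ss_tot:.3f}")

# A crude randomness check: residuals of an adequate model should be
# roughly uncorrelated with their neighbours (lag-1 autocorrelation near 0).
lag1 = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]
print(f"lag-1 residual autocorrelation = {lag1:.3f}")
</syntaxhighlight>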
 
Commonly, statistical models fitted to existing data are validated using a validation set, which may also be referred to as a holdout set. A validation set is a set of data points that the user leaves out when fitting the model. After the model is fitted, its error on the validation set is measured. If the model fits the initial data well but has a large error on the validation set, this is a sign of overfitting, as seen in Figure 1.
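
A minimal sketch of a holdout set is given below; the split ratio, the linear truth, and the two polynomial degrees compared are illustrative assumptions. The high-degree polynomial achieves a small error on the points it was fitted to but a much larger error on the held-out points, which is the signature of overfitting described above.

<syntaxhighlight lang="python">
# A minimal sketch of a holdout (validation) set, using NumPy only.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 40)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)  # linear truth + noise

# Hold out 25% of the points; fit only on the rest.
val_idx = rng.choice(x.size, size=x.size // 4, replace=False)
mask = np.zeros(x.size, dtype=bool)
mask[val_idx] = True
x_fit, y_fit = x[~mask], y[~mask]
x_val, y_val = x[mask], y[mask]

def rmse(coeffs, xs, ys):
    return np.sqrt(np.mean((ys - np.polyval(coeffs, xs)) ** 2))

for degree in (1, 10):  # a sensible model vs. an overfit-prone one
    coeffs = np.polyfit(x_fit, y_fit, degree)
    print(f"degree {degree:2d}: fit RMSE = {rmse(coeffs, x_fit, y_fit):.2f}, "
          f"validation RMSE = {rmse(coeffs, x_val, y_val):.2f}")
</syntaxhighlight>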
 
[[Image:Overfitted Data.png|thumb|300px|Figure 1. Data (black dots), generated from a straight line with some added noise, is perfectly fitted by a curvy [[polynomial]].]]
 
=== Validation with New Data ===
If new data becomes available, an existing model can be validated by assessing whether the model predicts the new data well. If it does not, the model might not be valid for the researcher's goals.
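
The sketch below illustrates this with a previously fitted straight line whose coefficients are stored; the coefficients, the slight drift in the new data, and the error tolerance are all illustrative assumptions.

<syntaxhighlight lang="python">
# A minimal sketch of validating a previously fitted model against new data.
# In practice the acceptable error comes from the application's requirements.
import numpy as np

old_coeffs = np.array([2.0, 1.0])  # slope, intercept of an already-fitted line

rng = np.random.default_rng(2)
x_new = rng.uniform(0, 10, 20)  # newly collected inputs
y_new = 2.1 * x_new + 0.8 + rng.normal(scale=1.0, size=x_new.size)

pred = np.polyval(old_coeffs, x_new)
rmse_new = np.sqrt(np.mean((y_new - pred) ** 2))
print(f"RMSE on new data = {rmse_new:.2f}")
if rmse_new > 2.0:  # application-specific tolerance
    print("Old model fails to predict the new data; it may be invalid here.")
</syntaxhighlight>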
 
With this in mind, a modern approach to validating a neural network is to test its performance on domain-shifted data. This ascertains whether the model has learned domain-invariant features.<ref>{{Cite book |last1=Feng |first1=Cheng |last2=Zhong |first2=Chaoliang |last3=Wang |first3=Jie |last4=Zhang |first4=Ying |last5=Sun |first5=Jun |last6=Yokota |first6=Yasuto |title=Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence |chapter=Learning Unforgotten Domain-Invariant Representations for Online Unsupervised Domain Adaptation |date=July 2022 |pages=2958–2965 |location=California |publisher=International Joint Conferences on Artificial Intelligence Organization |doi=10.24963/ijcai.2022/410|isbn=978-1-956792-00-3 |doi-access=free }}</ref>
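
A minimal sketch of such a test is given below, using scikit-learn. The two-cluster synthetic data and the simple translation standing in for a "domain shift" are illustrative assumptions, not the method of the cited paper; the point is only that accuracy is measured separately on in-domain and shifted data.

<syntaxhighlight lang="python">
# A minimal sketch of testing a classifier on domain-shifted data.
import numpy as np
from sklearn.neural_network import MLPClassifier

def make_data(rng, offset):
    # Two Gaussian classes in 2-D; `offset` translates the whole domain.
    x0 = rng.normal(loc=[0.0 + offset, 0.0], size=(200, 2))
    x1 = rng.normal(loc=[3.0 + offset, 3.0], size=(200, 2))
    return np.vstack([x0, x1]), np.array([0] * 200 + [1] * 200)

rng = np.random.default_rng(3)
X_train, y_train = make_data(rng, offset=0.0)  # source domain
X_test, y_test = make_data(rng, offset=0.0)    # fresh in-domain test data
X_shift, y_shift = make_data(rng, offset=2.0)  # domain-shifted test data

net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
net.fit(X_train, y_train)
print("in-domain accuracy:     ", net.score(X_test, y_test))
print("shifted-domain accuracy:", net.score(X_shift, y_shift))
</syntaxhighlight>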
 
=== A Note of Caution ===
A model can be validated only relative to some application area.<ref name="NRC12" /><ref name="BBKK">{{citation | author1-first= J. J. | author1-last= Batzel | author2-first= M. | author2-last= Bachar | author3-first= J. M. | author3-last= Karemaker | author4-first= F. | author4-last= Kappel | pages= 3–19 | chapter= Chapter 1: Merging mathematical and physiological knowledge | editor1-first= J. J. | editor1-last= Batzel | editor2-first= M. | editor2-last= Bachar | editor3-first= F. | editor3-last= Kappel | title= Mathematical Modeling and Validation in Physiology | publisher= [[Springer Science+Business Media|Springer]] | year= 2013 | doi= 10.1007/978-3-642-32882-4_1}}.</ref> A model that is valid for one application might be invalid for some other applications. As an example, consider the curve in Figure&nbsp;1: if the application only used inputs from the interval [0,&nbsp;2], then the curve might well be an acceptable model.
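
As a numeric illustration of the same point, the sketch below uses a deliberately simple stand-in for the curve in Figure 1: the linear model y = x as an approximation to sin(x), which is acceptable on a narrow input range but clearly invalid on a wider one. The function and the two ranges are illustrative assumptions.

<syntaxhighlight lang="python">
# A minimal sketch: the same model can be valid on one input range and
# invalid on another. Here the model y = x approximates the truth sin(x).
import numpy as np

def max_error(lo, hi):
    x = np.linspace(lo, hi, 1000)
    return np.max(np.abs(np.sin(x) - x))  # model: y = x

print(f"max error on [0, 0.5]: {max_error(0, 0.5):.3f}")  # small: may be acceptable
print(f"max error on [0, 3.0]: {max_error(0, 3.0):.3f}")  # large: clearly invalid
</syntaxhighlight>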