OVERFITTING

Overfitting overview

Overfitting means that a model has been trained too closely to its training data: it treats the noise and random fluctuations in the training data as genuine patterns. As a result, the model's ability to predict outputs for new data suffers, and its accuracy on unseen data drops sharply.

If the training data has many input factors, it will contain noise, i.e. randomness in the data that reduces the model's ability to generalize. An overfitted model has high variance and low bias.

Variance is how much a model changes in response to the training data. Bias is the flip side of variance: it represents the strength of the assumptions we make about our data. Both bias and variance are forms of prediction error in machine learning.

Overfitting can be detected by dividing the data into training and test sets: if the model performs noticeably better on the training data than on the test data, it is overfitting.
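
As a minimal sketch of this check (assuming scikit-learn and a synthetic dataset, with a deliberately flexible decision tree), the training and test accuracies can simply be compared:

```python
# Hypothetical sketch: compare training vs. test accuracy to flag overfitting.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A deep, unpruned tree is prone to memorizing the training data.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
# A large gap (e.g. 1.00 vs. 0.80) suggests the model is overfitting.
```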

How to avoid Overfitting?

Detecting overfitting in a model is important, but it is of little use unless we also solve the problem. Common methods for reducing overfitting are cross-validation, training with more data, removing features, early stopping, and regularization.

Cross validation:

There are several cross-validation methods, including k-fold, random subsampling, and leave-one-out cross-validation.
In k-fold cross-validation, the data is divided into k subsets and the holdout method is performed k times. (The holdout method means partitioning the dataset into training and test sets; the test set gives an estimate of the model's accuracy, though some error is introduced by the split.) Each time, one of the k subsets is used as the test/validation set and the remaining k-1 subsets are combined to form the training set.
Overall, cross-validation partitions the data so that every part is used for both training and testing, which helps reduce overfitting, but it does not fully prevent it.
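
A short sketch of k-fold cross-validation, assuming scikit-learn, a synthetic dataset, and k = 5:

```python
# Sketch of k-fold cross-validation with scikit-learn (k = 5 assumed).
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
# Each fold is held out once as the test set; the other 4 folds form the training set.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kfold)
print("per-fold accuracy:", scores)
print("mean accuracy:", scores.mean())
```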

Train with more data:

This is not a guaranteed solution and may not always work, but training the model with more data can help it detect the underlying patterns more effectively. More data can also bring more randomness (noise) into the dataset, so the data should be cleaned before training the model.
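
One way to judge whether more data is likely to help is a learning curve; the sketch below assumes scikit-learn and a synthetic dataset, and the exact numbers are illustrative only:

```python
# Sketch: a learning curve shows whether adding training data narrows the
# gap between training and validation scores (values here are illustrative).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

train_sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(max_depth=5, random_state=0),
    X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

for n, tr, va in zip(train_sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"{n:4d} samples -> train {tr:.2f}, validation {va:.2f}")
# If the validation score keeps rising with more samples, more data is likely to help.
```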

Removing features:

A dataset may contain many feature vectors that are irrelevant to the model; these act as noise and reduce the accuracy of the model's predictions, so unwanted features should be removed to help prevent overfitting.
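
A minimal sketch of removing features, assuming scikit-learn's univariate selection on a synthetic dataset (keeping k = 10 features is an arbitrary choice):

```python
# Sketch: univariate feature selection keeps only the k most informative
# features (k = 10 here is an arbitrary choice for illustration).
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=500, n_features=25,
                           n_informative=5, random_state=0)

selector = SelectKBest(score_func=f_classif, k=10)
X_reduced = selector.fit_transform(X, y)

print("original shape:", X.shape)         # (500, 25)
print("reduced shape:", X_reduced.shape)  # (500, 10)
```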

Early stopping:

When a model is trained iteratively, we can measure how well each iteration performs.
Up to a certain number of iterations, new iterations improve the model. After that point, however, the model's ability to generalize can weaken as it begins to overfit the training data.
Early stopping means stopping the training process before the learner passes that point. It is most commonly used in deep learning.
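
A hand-rolled sketch of early stopping, assuming scikit-learn's SGDClassifier and a held-out validation set; the patience value and epoch count are arbitrary:

```python
# Sketch of early stopping: stop training when the validation score has not
# improved for `patience` consecutive epochs (all numbers are illustrative).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = SGDClassifier(random_state=0)
classes = np.unique(y_train)
best_score, patience, stalled = -np.inf, 5, 0

for epoch in range(200):
    model.partial_fit(X_train, y_train, classes=classes)  # one pass over the data
    score = model.score(X_val, y_val)
    if score > best_score:
        best_score, stalled = score, 0
    else:
        stalled += 1
    if stalled >= patience:  # validation score stopped improving
        print(f"early stop at epoch {epoch}, best validation accuracy {best_score:.2f}")
        break
```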

Regularization:

Regularization refers to a range of techniques for artificially forcing the model to be simpler. The specific method depends on the type of algorithm used to train the model; ridge and lasso regularization are described below.

BIAS-VARIANCE tradeoff:

There are three types of prediction error: bias, variance, and noise.
Noise is the randomness inherent in the dataset; its impact can be limited by cleaning the data and choosing a suitable algorithm, but it cannot be removed entirely.
Bias is the difference between the model's expected predictions and the true value we are trying to predict.
Variance is the model's sensitivity to changes in the training dataset.

A model with high bias pays very little attention to the training data and oversimplifies the problem, which leads to high error on both training and test data. A model with high variance pays too much attention to the training data and does not generalize to data it has not seen before; such models perform very well on training data but have high error rates on test data.

If a model has very few parameters, it tends to have high bias and low variance. On the other hand, if the model has a large number of parameters, it tends to have high variance and low bias. We therefore need to find a good balance that neither overfits nor underfits the data.
This tradeoff in complexity is why there is a tradeoff between bias and variance: an algorithm cannot be more complex and less complex at the same time.

Total Error
To build a good model, we need to find a good balance between bias and variance such that it minimizes the total error.
Total Error = Bias^2 + Variance + Irreducible Error (Noise)
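
Written out for a single point x, with true function f, fitted model \hat{f}, and noise variance sigma^2 (standard textbook notation assumed here, not taken from the text above), the decomposition is:

```latex
\operatorname{Err}(x)
  = \underbrace{\bigl(\mathbb{E}[\hat{f}(x)] - f(x)\bigr)^{2}}_{\text{Bias}^{2}}
  + \underbrace{\mathbb{E}\!\left[\bigl(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\bigr)^{2}\right]}_{\text{Variance}}
  + \underbrace{\sigma^{2}}_{\text{Irreducible error (noise)}}
```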

Balancing these two sources of error is the bias-variance tradeoff.

Regularization – Ridge, LASSO:

Ridge and lasso are regularization techniques that reduce model complexity and help prevent overfitting.

In ridge regularization, the cost function is altered by adding a penalty proportional to the square of the magnitude of the coefficients (an L2 penalty). Ridge regularization shrinks the coefficients, which helps reduce model complexity and multicollinearity.
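
A minimal sketch of ridge regression with scikit-learn on a synthetic dataset (alpha = 1.0 is an arbitrary example value):

```python
# Sketch: ridge regression adds an L2 penalty; alpha controls its strength.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Ridge shrinks the coefficients toward zero compared with plain least squares.
print("OLS   max |coef|:", abs(ols.coef_).max())
print("Ridge max |coef|:", abs(ridge.coef_).max())
```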

Lasso (least absolute shrinkage and selection operator) adds a penalty proportional to the absolute value of the magnitude of the coefficients (an L1 penalty).
Lasso regression not only helps reduce overfitting, it can also help with feature selection, because the L1 penalty can drive some coefficients exactly to zero.
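
A minimal sketch of lasso with scikit-learn, showing how the L1 penalty zeroes out some coefficients (alpha = 0.5 is an arbitrary example value):

```python
# Sketch: lasso's L1 penalty can drive some coefficients exactly to zero,
# effectively performing feature selection.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=20,
                       n_informative=5, noise=10.0, random_state=0)

lasso = Lasso(alpha=0.5).fit(X, y)
kept = (lasso.coef_ != 0).sum()
print(f"non-zero coefficients: {kept} of {len(lasso.coef_)}")
```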

ANOVA and F-test overview:

An ANOVA test is a way to find out whether survey or experiment results are significant. In other words, it helps you decide whether to reject the null hypothesis in favour of the alternative hypothesis. Essentially, you are testing groups to see whether there is a difference between their means.
The F-test is used in regression analysis to test the hypothesis that all model parameters are zero. It is also used when comparing statistical models that have been fitted to the same underlying factors and dataset, to determine which model best fits the data.
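
A short sketch of a one-way ANOVA F-test using SciPy, on three small hypothetical groups of measurements:

```python
# Sketch of a one-way ANOVA F-test on three hypothetical groups using SciPy.
from scipy import stats

group_a = [23, 25, 27, 22, 26]
group_b = [30, 31, 29, 32, 28]
group_c = [24, 26, 25, 27, 23]

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A small p-value (e.g. < 0.05) suggests rejecting the null hypothesis
# that all group means are equal.
```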