Generalized Linear Modeling vs. Linear Regression
Generalized linear modeling can be used for data analysis that cannot be performed with linear regression, for instance:
- If the relationship between the values in X and those in Y is not linear, for example, exponential rather than a straight line
- If the variance of Y is not constant with respect to X, for example, the variance of Y increases as X increases
- If Y is a discrete variable
The components of generalized linear models
Generalized linear models consist of three major components (a fitting sketch follows this list):
- Linear predictor: This is a linear combination of the explanatory variables (x) and parameters (b)
- Link function: The link function connects the linear predictor to the mean of the probability distribution. In Poisson regression, for example, the link function is typically the log function.
- Probability distribution (the random component): This generates the observed variable (y).
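As a concrete illustration, here is a minimal sketch of the three components using Python's statsmodels library; the simulated data and coefficient values are assumptions made for the example. The linear predictor is b0 + b1*x, the link is the log function, and the probability distribution is Poisson.

```python
import numpy as np
import statsmodels.api as sm

# Simulated counts whose mean grows exponentially with x (assumed data)
rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=200)
mu = np.exp(0.5 + 1.2 * x)           # log link: log(mu) = b0 + b1*x
y = rng.poisson(mu)                  # probability distribution: Poisson

# Linear predictor: b0 + b1*x (add_constant supplies the intercept column)
X = sm.add_constant(x)

# Link function: the log link is the default for the Poisson family
model = sm.GLM(y, X, family=sm.families.Poisson())
result = model.fit()
print(result.summary())
```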
Advantages of generalized linear modeling
Generalized linear modeling has a number of advantages over traditional ordinary least squares (OLS) regression. Here are some of them (a sketch follows this list):
- You do not necessarily have to transform the response (y) to make it normally distributed
- There is more flexibility in modeling because the link choice is separate from the random component choice
- If the link produces additive effects, constant variance is not required
- The models are fitted by maximum likelihood estimation, which gives the estimators optimal statistical properties.
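To sketch the second advantage (the link choice being separate from the random component choice), the hypothetical example below fits the same Gamma random component with two different links in statsmodels; the simulated data are assumptions made purely for illustration, and the response y is never transformed.

```python
import numpy as np
import statsmodels.api as sm

# Simulated positive, right-skewed response with non-constant variance (assumed data)
rng = np.random.default_rng(1)
x = rng.uniform(1, 5, size=300)
X = sm.add_constant(x)
y = rng.gamma(shape=2.0, scale=np.exp(0.3 + 0.4 * x) / 2.0)

# Same random component (Gamma), two different link choices
log_link = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Log())).fit()
inv_link = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.InversePower())).fit()

# Compare the fits, e.g. by AIC
print(log_link.aic, inv_link.aic)
```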
Disadvantages of generalized linear modeling
- They do not select features automatically; features must be chosen through stepwise selection or similar manual procedures
- There are strict assumptions around the randomness of error terms and the distribution shape
- The predictor variables must be uncorrelated
- Generalized linear models cannot detect non-linearity directly. However, this can be addressed manually through feature engineering
- Generalized linear modeling is sensitive to outliers
- Relatively low predictive power compared with more flexible machine learning models
Assumptions of generalized linear modeling
GLMs make some strict assumptions about the data structure (a diagnostic sketch follows this list). These assumptions concern:
- Independence of each data point
- Proper distribution of the residuals
- Proper specification of the structure of the variance
- A linear relationship between the explanatory variables and the link-transformed expected response
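The residual- and variance-related assumptions can be checked informally after fitting. The sketch below reuses the simulated Poisson data pattern from earlier (an assumption for illustration) and inspects deviance residuals and a dispersion estimate with statsmodels.

```python
import numpy as np
import statsmodels.api as sm

# Fit a Poisson GLM on simulated count data (assumed data, as before)
rng = np.random.default_rng(2)
x = rng.uniform(0, 2, size=200)
X = sm.add_constant(x)
y = rng.poisson(np.exp(0.5 + 1.2 * x))
result = sm.GLM(y, X, family=sm.families.Poisson()).fit()

# Deviance residuals should look roughly symmetric around zero
print("mean deviance residual:", result.resid_deviance.mean())

# Pearson chi-square divided by residual degrees of freedom should be near 1
# if the variance structure is correctly specified (values well above 1
# suggest overdispersion)
print("dispersion estimate:", result.pearson_chi2 / result.df_resid)
```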
Generalized linear model extensions
Standard generalized linear modeling assumes that observations are uncorrelated. Over the years, extensions have been developed to allow correlation between observations:
- Generalized estimating equations: This extension allows observations to be correlated without specifying an explicit probability model. Generalized estimating equations are commonly used when random effects and their variances are not of direct interest, because they allow correlation without defining its origin. Their main purpose is to estimate the average response (y) of a given population rather than the parameters of a subject-specific model. For effective data analysis, generalized estimating equations are often used together with Huber-White (robust) standard errors (see the sketch after this list).
- Generalized linear mixed models: This extension incorporates random effects into the linear predictor, producing an explicit probability model that explains the origin of the correlation. Generalized linear mixed models are also referred to as mixed models or multilevel models. They are more computationally intensive than generalized estimating equations.
- Generalized additive models: In this extension of generalized linear modeling, the linear predictor is not restricted to a linear combination of the covariates; it is a sum of smooth functions estimated from the data being analyzed.
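As a sketch of the first extension, the example below fits a generalized estimating equation with statsmodels on simulated longitudinal count data; the subjects, visit counts, and coefficients are assumptions made for illustration. The default covariance type in statsmodels' GEE is the robust (Huber-White) sandwich estimator mentioned above.

```python
import numpy as np
import statsmodels.api as sm

# Simulated longitudinal counts: 50 hypothetical subjects, 4 visits each,
# so observations within a subject are correlated (assumed data)
rng = np.random.default_rng(3)
n_subjects, n_visits = 50, 4
subject = np.repeat(np.arange(n_subjects), n_visits)
x = rng.uniform(0, 2, size=n_subjects * n_visits)
subject_effect = np.repeat(rng.normal(0, 0.3, size=n_subjects), n_visits)
y = rng.poisson(np.exp(0.2 + 0.8 * x + subject_effect))

X = sm.add_constant(x)

# GEE with an exchangeable working correlation structure; standard errors
# use the robust (Huber-White) sandwich estimator by default
model = sm.GEE(y, X, groups=subject,
               family=sm.families.Poisson(),
               cov_struct=sm.cov_struct.Exchangeable())
result = model.fit()
print(result.summary())
```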