Multivariate Classification

Multivariate classification methods analyze multiple rating variables simultaneously to calculate rate relativities, properly accounting for correlations among the variables.

[!IMPORTANT] The Loss Ratio and Adjusted Pure Premium approaches only approximately correct for exposure correlation. They are not as accurate as multivariate techniques like Generalized Linear Models (GLMs). Do not suggest them as solutions when severe exposure correlation is present.

Benefits of Multivariate Analysis and GLMs

Properly Adjust for Exposure Correlations: Adjusts for instances where the distribution of one variable varies by another.
Focus on the “Signal”: Separates systematic risk factors from random noise in the loss data.
Provide Statistical Diagnostics: Outputs confidence intervals and significance tests for parameter estimates.
Consider Interactions: Can capture response correlation through interaction terms, where the effect of one variable differs depending on the level of another.

Exposure Correlation vs. Response Correlation

Exposure Correlation: Occurs when the distribution of exposures for one variable differs by the levels of another.
- Example: $75\%$ of 16-year-old drivers are male, whereas only $50\%$ of 50-year-old drivers are male.
Response Correlation (Interaction): Occurs when the relationship between a predictor and the response variable changes depending on the level of another predictor.
- Example: 16-year-old males have twice the claim frequency of 16-year-old females, but 50-year-old males and females have the same claim frequency. An interaction term is required to capture this.
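
A minimal sketch of how exposure correlation distorts a one-way analysis. It uses hypothetical exposures that mirror the $75\%$ / $50\%$ example above and assumed "true" relativities (age 16 = 2.0 relative to age 50, male = 1.5 relative to female, base pure premium 100). Losses are generated with no random noise, yet the univariate age relativity still overshoots 2.0 because age 16 carries a disproportionate share of the male effect.

```python
# Hypothetical exposures by (age, gender), mirroring the example above:
# 75% of 16-year-old exposures are male, 50% of 50-year-old exposures are male.
exposure = {
    ("16", "F"): 25.0, ("16", "M"): 75.0,
    ("50", "F"): 50.0, ("50", "M"): 50.0,
}

# Assumed "true" multiplicative structure (illustrative, not real data).
BASE_PP = 100.0
AGE_REL = {"16": 2.0, "50": 1.0}
GENDER_REL = {"F": 1.0, "M": 1.5}

# Losses generated exactly from the multiplicative model (no noise).
losses = {
    (age, sex): w * BASE_PP * AGE_REL[age] * GENDER_REL[sex]
    for (age, sex), w in exposure.items()
}


def one_way_pure_premium(level, position):
    """Univariate pure premium for one level, ignoring the other variable."""
    keys = [k for k in exposure if k[position] == level]
    return sum(losses[k] for k in keys) / sum(exposure[k] for k in keys)


one_way_age_rel = one_way_pure_premium("16", 0) / one_way_pure_premium("50", 0)
print(f"one-way age 16 relativity: {one_way_age_rel:.3f} (true value 2.0)")
```

Here the one-way relativity is 2.2 rather than 2.0: the univariate analysis double counts part of the gender effect, which is the bias multivariate methods are designed to remove.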

Alternative Multivariate Approaches

1. Minimum Bias Procedure (MBP)

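The Minimum Bias Procedure is an iterative numerical technique that solves for the relativities of several rating variables by repeatedly updating each variable's relativities, holding the others fixed, until a chosen bias criterion (e.g., the balance principle or least squares) is satisfied.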

Advantage: Properly corrects for exposure correlation.
Disadvantages:
- Does not provide statistical diagnostics to test if variables are statistically significant.
- Computationally inefficient compared to modern regression algorithms.
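
The iteration can be sketched for two rating variables under the multiplicative balance principle: each pass sets one variable's factors so that, level by level, modeled losses (exposure times both factors) equal actual losses while the other variable's factors are held fixed. The function name, grid layout, and convergence tolerance are illustrative assumptions, and the data are hypothetical, noise-free age/gender cells (rows: age 16, age 50; columns: female, male) built from a base pure premium of 100, an age 16 relativity of 2.0, and a male relativity of 1.5.

```python
def minimum_bias_balance(exposure, losses, base_row=0, base_col=0,
                         tol=1e-10, max_iter=1000):
    """Multiplicative minimum bias (balance principle) for a two-way table.

    exposure[i][j] and losses[i][j] hold data for level i of the row variable
    and level j of the column variable. Returns (row relativities,
    column relativities, base pure premium), rebased so the chosen base
    levels have relativity 1.0.
    """
    n_rows, n_cols = len(exposure), len(exposure[0])
    row_f = [1.0] * n_rows
    col_f = [1.0] * n_cols
    for _ in range(max_iter):
        # Row step: choose x_i so that sum_j w_ij * x_i * y_j = sum_j L_ij.
        new_row = [
            sum(losses[i]) / sum(exposure[i][j] * col_f[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        # Column step: choose y_j so that sum_i w_ij * x_i * y_j = sum_i L_ij.
        new_col = [
            sum(losses[i][j] for i in range(n_rows))
            / sum(exposure[i][j] * new_row[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        change = max(abs(a - b) for a, b in zip(new_row + new_col, row_f + col_f))
        row_f, col_f = new_row, new_col
        if change < tol:
            break
    base = row_f[base_row] * col_f[base_col]
    rows = [f / row_f[base_row] for f in row_f]
    cols = [f / col_f[base_col] for f in col_f]
    return rows, cols, base


# Hypothetical data: rows = age (16, 50), columns = gender (F, M).
exposure = [[25.0, 75.0], [50.0, 50.0]]
losses = [[5000.0, 22500.0], [5000.0, 7500.0]]
age_rel, gender_rel, base_pp = minimum_bias_balance(exposure, losses, base_row=1, base_col=0)
print("age relativities:", age_rel)
print("gender relativities:", gender_rel)
print("base pure premium:", base_pp)
```

On these noise-free data the iteration recovers the assumed relativities, whereas a one-way analysis of the same cells would not.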

2. Sequential Analysis

Sequential analysis computes relativities in a step-by-step, one-pass manner.

Methodology:
1. Perform a standard univariate analysis to determine relativities for the first variable.
2. Use the Adjusted Pure Premium method to obtain indicated relativities for the second variable, adjusting the exposures using the selected relativities of the first variable.
3. Repeat this adjustment sequentially for all remaining variables.
Pros:
- Non-iterative (only requires one pass).
- Allowed by specific regulations (e.g., California personal auto rating rules).
Cons:
- No closed-form solution; results depend heavily on the order in which variables are analyzed.
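
A minimal two-variable sketch of the steps above, assuming the selected relativities equal the indicated ones and that the first level of each variable is the base. The function names and data are hypothetical (rows: age 50, age 16; columns: female, male), and running the procedure in both orders shows the order dependence noted under Cons.

```python
def cells_at_level(n_rows, n_cols, axis, k):
    """All (row, col) cells whose level on `axis` (0 = rows, 1 = columns) equals k."""
    return [(i, j) for i in range(n_rows) for j in range(n_cols) if (i, j)[axis] == k]


def sequential_relativities(exposure, losses, first_axis):
    """Sequential analysis on a two-way grid; returns (first, second) relativities."""
    n_rows, n_cols = len(exposure), len(exposure[0])
    sizes = (n_rows, n_cols)
    second_axis = 1 - first_axis

    # Step 1: standard one-way (univariate) pure premiums for the first variable.
    pp_first = []
    for k in range(sizes[first_axis]):
        cells = cells_at_level(n_rows, n_cols, first_axis, k)
        pp_first.append(sum(losses[i][j] for i, j in cells)
                        / sum(exposure[i][j] for i, j in cells))
    rel_first = [pp / pp_first[0] for pp in pp_first]  # level 0 is the base

    # Step 2: adjusted pure premium for the second variable; each cell's
    # exposure is scaled by the first variable's relativity for that cell.
    pp_second = []
    for k in range(sizes[second_axis]):
        cells = cells_at_level(n_rows, n_cols, second_axis, k)
        adj_exposure = sum(exposure[i][j] * rel_first[(i, j)[first_axis]]
                           for i, j in cells)
        pp_second.append(sum(losses[i][j] for i, j in cells) / adj_exposure)
    rel_second = [pp / pp_second[0] for pp in pp_second]
    return rel_first, rel_second


# Hypothetical data: rows = age (50, 16), columns = gender (F, M).
exposure = [[50.0, 50.0], [25.0, 75.0]]
losses = [[5000.0, 7500.0], [5000.0, 22500.0]]

age_rel, gender_rel = sequential_relativities(exposure, losses, first_axis=0)
gender_rel2, age_rel2 = sequential_relativities(exposure, losses, first_axis=1)
print("age first:   ", age_rel, gender_rel)
print("gender first:", age_rel2, gender_rel2)
```

With age analyzed first, age 16 receives the full one-way relativity of 2.2; with gender first, gender receives its one-way relativity of 1.8 and age 16 falls to 1.925. The same data therefore yield different rating plans depending on the order.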

Generalized Linear Models (GLMs)

GLMs extend ordinary linear regression to allow for non-normal error distributions and non-linear relationships via a link function.

Mathematical Formulation

Linear Model: $Y_i = \mu_i + \epsilon_i = \beta_0 + \beta_1 X_{1,i} + \dots + \beta_p X_{p,i} + \epsilon_i$
GLM: $g(\mu_i) = \beta_0 + \beta_1 X_{1,i} + \dots + \beta_p X_{p,i}$

Where:

$Y_i$ : The target variable (e.g., pure premium, claim count, or severity).
$\mu_i$ : The expected value of $Y_i$ (predicted mean).
$X_{j,i}$ : The rating variables (predictors).
$\beta_j$ : Parameters estimated using Maximum Likelihood Estimation (MLE).
$g(\mu)$ : The link function (e.g., a logarithmic link $\ln(\mu)$ is typically used for multiplicative rating plans).
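
Why the log link produces a multiplicative rating plan: taking $g(\mu) = \ln(\mu)$ and exponentiating both sides turns the sum of terms into a product, so each $e^{\beta_j X_{j,i}}$ acts as a multiplicative factor.

$$
\ln(\mu_i) = \beta_0 + \beta_1 X_{1,i} + \dots + \beta_p X_{p,i}
\;\Longrightarrow\;
\mu_i = e^{\beta_0} \times e^{\beta_1 X_{1,i}} \times \dots \times e^{\beta_p X_{p,i}}
$$

For a non-base level coded as a 0/1 indicator, $e^{\beta_j}$ is that level's relativity to the base level and $e^{\beta_0}$ is the base rate. For instance, $\beta_j = \ln 1.5 \approx 0.405$ corresponds to a relativity of $1.5$.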

Key Modeling Decisions

Compile a clean, historical database with sufficient volume.
Select an appropriate link function (e.g., log-link).
Specify the underlying probability distribution of the random process (e.g., Poisson for frequency, Gamma for severity).
Apply MLE to estimate model parameters.
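
The decisions above can be sketched end to end for claim frequency: a Poisson distribution, a log link, and MLE, here solved by iteratively reweighted least squares (IRLS), a common fitting algorithm for GLMs. Exposure enters as an offset $\ln(\text{exposure})$ so that predicted claim counts scale with exposure. This pure-Python sketch uses hypothetical, noise-free claim counts generated from a base frequency of 0.1, an age 16 relativity of 2.0, and a male relativity of 1.5; the function names are illustrative, and in practice a statistics library and a real, cleaned database would be used.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_poisson_glm(X, y, exposure, n_iter=25):
    """MLE for a Poisson GLM with log link and ln(exposure) offset, via IRLS.

    Assumes the first column of X is the intercept.
    """
    offset = [math.log(e) for e in exposure]
    n, p = len(y), len(X[0])
    beta = [math.log(sum(y) / sum(exposure))] + [0.0] * (p - 1)
    for _ in range(n_iter):
        eta = [sum(xk * bk for xk, bk in zip(X[i], beta)) + offset[i] for i in range(n)]
        mu = [math.exp(e) for e in eta]
        # Log link: working response z = X*beta + (y - mu) / mu, working weight = mu.
        z = [eta[i] - offset[i] + (y[i] - mu[i]) / mu[i] for i in range(n)]
        xtwx = [[sum(mu[i] * X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
                for r in range(p)]
        xtwz = [sum(mu[i] * X[i][r] * z[i] for i in range(n)) for r in range(p)]
        beta = solve(xtwx, xtwz)
    return beta


# Hypothetical cells: (age 50, F), (age 50, M), (age 16, F), (age 16, M).
# Columns of X: intercept, age 16 indicator, male indicator.
X = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
exposure = [500.0, 500.0, 250.0, 750.0]
claims = [50.0, 75.0, 50.0, 225.0]

beta = fit_poisson_glm(X, claims, exposure)
relativities = [math.exp(b) for b in beta[1:]]
print("base frequency:", round(math.exp(beta[0]), 4))
print("age 16 and male relativities:", [round(r, 4) for r in relativities])
```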

Why GLMs Model Frequency and Severity Separately (Not Loss Ratios)

Premium Dependency: Loss ratios depend on current premium rates. Any rate change renders a loss ratio model obsolete.
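No A Priori Expectation: Actuaries have intuitive expectations for how frequency and severity vary across rating levels (e.g., lower frequency at higher deductibles), but there is no comparable a priori expectation for loss ratio patterns.
No Standard Distribution: There is no generally accepted distribution for loss ratios, whereas well-established exponential family distributions exist for frequency (e.g., Poisson) and severity (e.g., Gamma).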
Granular Insights: Modeling frequency and severity separately allows the actuary to model specific perils (e.g., wind, theft) or coverages (e.g., collision, bodily injury) independently.
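
When frequency and severity are both modeled with a log link, their indications can be combined multiplicatively into a pure premium relativity, since $e^{\beta_{freq}} \times e^{\beta_{sev}} = e^{\beta_{freq} + \beta_{sev}}$. A minimal sketch with hypothetical coefficients for a deductible variable, assuming both models use the same levels and the same base level:

```python
import math

# Hypothetical log-link coefficients by deductible level (base level coefficient = 0.0).
freq_coef = {"500": 0.0, "1000": -0.15, "2500": -0.35}
sev_coef = {"500": 0.0, "1000": -0.05, "2500": -0.10}


def pure_premium_relativities(freq, sev):
    """Pure premium relativity = frequency relativity x severity relativity."""
    return {level: math.exp(freq[level]) * math.exp(sev[level]) for level in freq}


pp_rel = pure_premium_relativities(freq_coef, sev_coef)
for level, rel in pp_rel.items():
    print(f"deductible {level}: {rel:.3f}")
```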

Interpreting GLM Outputs

[Figure: GLM output by level, showing exposures as bars (right-hand axis) alongside univariate and GLM relativities]

Exposures: Shown as bars (plotted on the right-hand axis). The level with the most exposures is usually selected as the base level (relativity $= 1.0$).
Univariate vs. GLM: Significant differences between univariate (one-way) and GLM relativities point to exposure correlation. The GLM controls for this correlation, while univariate analysis does not.
Counterintuitive Results: If sparse data cause a higher deductible (e.g., \$10,000) to receive a higher relativity than a lower deductible (e.g., \$5,000), the result is counterintuitive and should not be implemented without judgmental adjustment or smoothing.
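
The base-level choice and the counterintuitive-result check can be sketched as follows, assuming hypothetical log-link coefficients and exposures for a deductible variable. Relativities are $e^{\beta}$, rebased so the highest-exposure level equals 1.0, and any deductible whose relativity exceeds that of the next-lower deductible is flagged for review rather than changed automatically. The function names are illustrative.

```python
import math

# Hypothetical GLM output for a deductible variable: level -> (coefficient, exposures).
# Coefficients are on the log scale relative to whatever base the model used.
glm_output = {
    500: (0.00, 1200.0),
    1000: (-0.12, 4800.0),
    5000: (-0.30, 900.0),
    10000: (-0.25, 40.0),
}


def rebase_to_largest_exposure(output):
    """Return (base level, relativities exp(coef)) rebased so the base level is 1.0."""
    base = max(output, key=lambda lvl: output[lvl][1])
    base_coef = output[base][0]
    return base, {lvl: math.exp(coef - base_coef) for lvl, (coef, _) in output.items()}


def non_monotone_levels(relativities):
    """Deductible levels whose relativity exceeds that of the next-lower deductible."""
    levels = sorted(relativities)
    return [hi for lo, hi in zip(levels, levels[1:]) if relativities[hi] > relativities[lo]]


base, rels = rebase_to_largest_exposure(glm_output)
flagged = non_monotone_levels(rels)
print("base level:", base)
print("relativities:", {lvl: round(r, 3) for lvl, r in rels.items()})
print("flag for review:", flagged)
```

In this hypothetical output the sparsely populated 10,000 deductible is flagged because its relativity is above that of the 5,000 deductible.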

GLM Diagnostics

Actuaries use several diagnostic tools to evaluate and compare GLM specifications:

Standard Errors: Use the standard error of each parameter estimate to form a confidence interval (roughly $\pm 2$ standard errors) and judge whether a specific class level, or the variable as a whole, is statistically significant. Under a log link, a level whose interval includes $0$ (a relativity of $1.0$) is not significantly different from the base level.
Deviance Tests: Compare nested models with likelihood ratio tests on the deviance ($\chi^2$ or $F$ test), or compare models with penalized measures such as AIC or BIC, to decide whether adding a variable justifies the increased model complexity. The more complex model is typically selected if the $\chi^2$ test's p-value is below $5\%$.
Time Consistency: Fit the model to separate, consecutive historical periods. If the estimated parameters remain stable across periods, the model shows time consistency.
[Figure: time consistency check]
Validation Test (Holdout Set): Fit the model on a training subset and test its predictions on a holdout validation set. A model that fits the training data well but predicts the holdout data poorly is overfit; one that performs poorly on both is underfit.
[Figure: holdout validation results]
Actuarial Judgement: Review findings for logical consistency. Override indications that arise from sparse data and contradict expected risk characteristics (e.g., a higher deductible producing a higher rate).
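
A minimal sketch of two of these diagnostics for a Poisson, log-link model, using hypothetical inputs: an approximate $95\%$ interval for a level's relativity, $e^{\beta \pm 1.96 \cdot SE}$, and a likelihood ratio (deviance) test of whether adding one parameter is justified. The p-value uses the fact that a $\chi^2$ variable with 1 degree of freedom has tail probability $\operatorname{erfc}(\sqrt{x/2})$; tests that add more than one parameter need the general $\chi^2$ distribution (e.g., from a statistics library). Function names are illustrative.

```python
import math


def relativity_interval(coef, std_err, z=1.96):
    """Approximate 95% interval for a log-link relativity: exp(coef +/- z * SE)."""
    return math.exp(coef - z * std_err), math.exp(coef + z * std_err)


def poisson_deviance(y, mu):
    """Poisson deviance 2 * sum[y ln(y/mu) - (y - mu)], taking y ln(y/mu) = 0 when y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total


def lr_test_one_df(dev_small, dev_large):
    """LR statistic and p-value for one extra parameter: P(chi2_1 > x) = erfc(sqrt(x / 2))."""
    stat = dev_small - dev_large
    return stat, math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


# Hypothetical estimate for one class level: coefficient 0.10, standard error 0.04.
low, high = relativity_interval(0.10, 0.04)

# Hypothetical claim counts for (age 50 F, age 50 M, age 16 F, age 16 M),
# exposures 500, 500, 250, 750.
y = [50.0, 75.0, 50.0, 225.0]
mu_small = [62.5, 62.5, 68.75, 206.25]  # age-only model: reproduces each age level's total
mu_large = list(y)                      # age + gender model fits these noise-free counts exactly
stat, p_value = lr_test_one_df(poisson_deviance(y, mu_small), poisson_deviance(y, mu_large))
print(f"relativity interval: ({low:.3f}, {high:.3f})")
print(f"LR statistic: {stat:.2f}, p-value: {p_value:.5f}")
```

The interval excludes 1.0, so this hypothetical level differs significantly from the base, and the p-value well below $5\%$ supports keeping the gender variable for these data.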

Data Mining & Advanced Machine Learning Techniques

Beyond GLMs, other techniques can be applied in risk classification:

Factor Analysis / PCA: Reduces the dimensionality of rating variables by constructing a smaller set of uncorrelated factors.
Cluster Analysis: Groups similar geographic units or risk profiles based on loss characteristics.
Classification and Regression Trees (CART): Utilizes recursive partitioning to create if-then decision rules for classification.
Multivariate Adaptive Regression Splines (MARS): Models non-linear relationships by fitting piecewise linear functions (hinge functions) joined at knots across the range of continuous variables.
Neural Networks: Identifies complex, non-linear interactions and patterns in high-dimensional datasets.
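
The core step of CART can be sketched as a search for the single split of an ordered variable that minimizes the within-group squared error of the response; a full tree applies this search recursively to each resulting group. This sketch uses hypothetical driver ages and pure premiums and, for simplicity, ignores exposure weights, which a rating application would normally include.

```python
def best_split(x, y):
    """Threshold on x minimizing total within-group squared error of y.

    Returns (threshold, sse) for the rule "x <= threshold" vs "x > threshold".
    """
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(x, y))
    best_threshold, best_sse = None, float("inf")
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between identical x values
        left = [v for _, v in pairs[:k]]
        right = [v for _, v in pairs[k:]]
        total = sse(left) + sse(right)
        if total < best_sse:
            best_threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best_sse = total
    return best_threshold, best_sse


# Hypothetical data: driver age and observed pure premium.
ages = [18, 20, 22, 35, 45, 55]
pure_premiums = [900.0, 850.0, 880.0, 400.0, 420.0, 410.0]
threshold, error = best_split(ages, pure_premiums)
print(f"rule: age <= {threshold} vs age > {threshold} (SSE {error:.1f})")
```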

*[GLMs]: Generalized Linear Models
*[GLM]: Generalized Linear Model
*[MBP]: Minimum Bias Procedure
*[RV]: Rating Variables