---
title: "Introduction"
output: rmarkdown::html_vignette
bibliography: ../inst/REFERENCES.bib
link-citations: yes
vignette: >
%\VignetteIndexEntry{Introduction}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
```{r setup}
library(galamm)
```
This vignette is aimed to give you a high-level overview of the types of models supported by the galamm package, and to point you to relevant vignettes where you can find more information.
## Generalized Additive Latent and Mixed Models
Generalized additive latent and mixed models (GALAMMs) [@sorensenLongitudinalModelingAgeDependent2023] is an extension of generalized linear latent and mixed models (GLLAMMs) [@rabe-heskethGeneralizedMultilevelStructural2004;@skrondalGeneralizedLatentVariable2004] which allows both observed responses and latent variables to depend smoothly on observed variables. *Smoothly* here means that the relationship is not assumed to follow a particular parametric form, e.g., as specified by a linear model. Instead, an *a priori* assumption is made that the relationship is smooth, and the model then attempts to learn the relationship from the data. GALAMM uses smoothing splines to obtain this, identically to how generalized additive models (GAMs) [@woodGeneralizedAdditiveModels2017] are estimated.
The GLLAMM framework contains many elements which are currently not implemented in the galamm package. This includes both nonparametric random effects and a large number of model families, e.g., for censored responses. If you need any of this, but not semiparametric estimation, the Stata based [GLLAMM package](http://www.gllamm.org/) is likely the place you should go. Conversely, galamm incorporates crossed random effects easily and efficiently, while these are hard to specify using GLLAMM.
### Response Model
GALAMMs are specified using three building blocks. First, $n$ responses $y_{1}, \dots, y_{n}$ are assumed independently distributed according to an exponential family with density
$$
f\left(y | \theta, \phi\right) = \exp \left( \frac{y\theta(\mu) - b\left(\theta(\mu)\right)}{\phi} + c\left(y, \phi\right) \right)
$$
here $\mu = g^{-1}(\nu)$ is the mean, $g^{-1}(\cdot)$ is the inverse of link function $g(\cdot)$, $\nu$ is a "nonlinear predictor", $\phi$ is a dispersion parameter, and $b(\cdot)$ and $c(\cdot)$ are known functions. In contrast to what is assumed, e.g., by [lme4](https://cran.r-project.org/package=lme4) [@batesFittingLinearMixedEffects2015], the functions $b(\cdot)$, $c(\cdot)$, and $g(\cdot)$ are allowed to vary between observations. That is, the observations can come from different members of the exponential family. The vignette on [models with mixed response types](https://lcbc-uio.github.io/galamm/articles/mixed_response.html) describes this in detail.
Using canonical link functions, the response model simplifies to
$$
f\left(y | \nu, \phi\right) = \exp \left( \frac{y\nu - b\left(\nu\right)}{\phi} + c\left(y, \phi\right) \right)
$$
### Nonlinear Predictor
Next, the nonlinear predictor, which corresponds to the measurement model in a classical structural equation model, is defined by
$$
\nu = \sum_{s=1}^{S} f_{s}\left(\mathbf{x}\right) + \sum_{l=2}^{L}\sum_{m=1}^{M_{l}} \eta_{m}^{(l)} \mathbf{z}^{(l)}_{m}{}^{'}\boldsymbol{\lambda}_{m}^{(l)},
$$
where $\mathbf{x}$ are explanatory variables, $f_{s}(\mathbf{x})$, $s=1,\dots,S$ are smooth functions, $\eta_{m}^{(l)}$ are latent variables varying at level $l$, and $\boldsymbol{\lambda}_{m}^{(l)}{}^{T} \mathbf{z}_{m}^{(l)}$ is the weighted sum of a vector of explanatory variables $\mathbf{z}_{m}^{(l)}$ varying at level $l$ and parameters $\boldsymbol{\lambda}_{m}^{(l)}$. Let
$$\boldsymbol{\eta}^{(l)} = [\eta_{1}^{(l)}, \dots, \eta_{M_{l}}^{(l)}]^{T} \in \mathbb{R}^{M_{l}}$$
be the vector of all latent variables at level $l$, and
$$\boldsymbol{\eta} = [\boldsymbol{\eta}^{(2)}, \dots, \boldsymbol{\eta}^{(L)}]^{T} \in \mathbb{R}^{M}$$
the vector of all latent variables belonging to a given level-2 unit, where $M = \sum_{l=2}^{L} M_{l}$. The word "level" is here used to denote a grouping level; they are not necessarily hierarchical.
## Structural Model
The structural model specifies how the latent variables are related to each other and to observed variables, and is given by
$$
\boldsymbol{\eta} = \mathbf{B}\boldsymbol{\eta} + \mathbf{h}\left(\mathbf{w}\right)
+ \boldsymbol{\zeta}
$$
where $\mathbf{B}$ is an $M \times M$ matrix of regression coefficients for regression among latent variables and $\mathbf{w} \in \mathbb{R}^{Q}$ is a vector of $Q$ predictors for the latent variables. $\mathbf{h}(\mathbf{w}) = [\mathbf{h}_{2}(\mathbf{w}), \dots, \mathbf{h}_{L}(\mathbf{w})] \in \mathbb{R}^{M}$ is a vector of smooth functions whose components $\mathbf{h}_{l}(\mathbf{w}) \in \mathbb{R}^{M_{l}}$ are vectors of functions predicting the latent variables varying at level $l$, and depending on a subset of the elements $\mathbf{w}$. $\boldsymbol{\zeta}$ is a vector of normally distributed random effects, $\boldsymbol{\zeta}^{(l)} \sim N(\mathbf{0}, \boldsymbol{\Psi}^{(l)})$ for $l=2,\dots,L$, where $\boldsymbol{\Psi}^{(l)} \in \mathbb{R}^{M_{l} \times M_{l}}$ is the covariance matrix of random effects at level $l$. Defining the $M \times M$ covariance matrix $\boldsymbol{\Psi} = \text{diag}(\boldsymbol{\Psi}^{(2)}, \dots, \boldsymbol{\Psi}^{(L)})$, we also have $\boldsymbol{\zeta} \sim N(\mathbf{0}, \boldsymbol{\Psi})$.
## Mixed Model Representation
In @sorensenLongitudinalModelingAgeDependent2023 we show that any model specified as above can be transformed to a GLLAMM, which is essentially a generalized nonlinear mixed model. This transformation is rather complex, so we won't spell it out here, but the key steps are:
1. Converting smooth terms to their mixed model form.
2. Estimate the resulting GLLAMM.
3. Convert back to the original parametrization.
In galamm we use the same transformations as the [gamm4](https://CRAN.R-project.org/package=gamm4) package does.
## Maximum Marginal Likelihood Estimation
In mixed model representation, the nonlinear predictor can be written on the form
$$
\boldsymbol{\nu} = \mathbf{X}(\boldsymbol{\lambda}, \mathbf{B}) \boldsymbol{\beta} + \mathbf{Z}(\boldsymbol{\lambda}, \mathbf{B}) \boldsymbol{\zeta}
$$
where $\mathbf{X}(\boldsymbol{\lambda}, \mathbf{B})$ is the regression matrix for fixed effects $\boldsymbol{\beta}$ and $\mathbf{Z}(\boldsymbol{\lambda}, \mathbf{B})$ is the regression matrix for random effects $\boldsymbol{\zeta}$. In contrast to with generalized linear mixed models, however, both matrices will in general depend on factor loadings $\boldsymbol{\lambda}$ and regression coefficients between latent variables $\mathbf{B}$. Both of these are parameters that need to be estimated, and hence $\mathbf{X}(\boldsymbol{\lambda}, \mathbf{B})$ and $\boldsymbol{\beta}$ and $\mathbf{Z}(\boldsymbol{\lambda}, \mathbf{B})$ need to be updated throughout the estimation process.
### Evaluating the Marginal Likelihood
Plugging the nonlinear predictor into the structural model, we obtain the joint likelihood for the model. We then obtain the marginal likelihood by integrating over the random effects, yielding a marginal likelihood function of the form
$$
L\left(\boldsymbol{\beta}, \boldsymbol{\Lambda}, \boldsymbol{\Gamma}, \boldsymbol{\lambda}, \mathbf{B}, \boldsymbol{\phi}\right) = \left(2 \pi \phi_{1}\right)^{-r/2} \int_{\mathbb{R}^{r}} \exp\left( g\left(\boldsymbol{\beta}, \boldsymbol{\Lambda}, \boldsymbol{\Gamma}, \boldsymbol{\lambda}, \mathbf{B}, \boldsymbol{\phi}, \mathbf{u}\right) \right) \text{d} \mathbf{u}
$$
where $\mathbf{u}$ is a standardized version of $\boldsymbol{\zeta}$. In order to evaluate the marginal likelihood at a given set of parameter values, we use the Laplace approximation combined with sparse matrix operations, extending @batesFittingLinearMixedEffects2015's algorithm for linear mixed models.
### Maximizing the Marginal Likelihood
We obtain maximum marginal likelihood estimates by maximizing $L\left(\boldsymbol{\beta}, \boldsymbol{\Lambda}, \boldsymbol{\Gamma}, \boldsymbol{\lambda}, \mathbf{B}, \boldsymbol{\phi}\right)$, subject to possible constraints, e.g., that variances are non-negative. For this, we use the L-BFGS-B algorithm implement in `stats::optim`. The predicted values of random effects, $\widehat{\mathbf{u}}$ are obtained as posterior modes at the final estimates.
## Example Models
To see how galamm is used in practice, take a look at the vignettes describing models with different components.
- [Linear mixed models with factor structures](https://lcbc-uio.github.io/galamm/articles/lmm_factor.html).
- [Generalized linear mixed models with factor structures](https://lcbc-uio.github.io/galamm/articles/glmm_factor.html).
- [Linear mixed models with heteroscedastic residuals](https://lcbc-uio.github.io/galamm/articles/lmm_heteroscedastic.html).
- [Models with interactions between latent and observed covariates](https://lcbc-uio.github.io/galamm/articles/latent_observed_interaction.html).
- [Mixed models with mixed response types](https://lcbc-uio.github.io/galamm/articles/mixed_response.html).
- [Generalized additive mixed models with factor structures](https://lcbc-uio.github.io/galamm/articles/semiparametric.html).
# References