Introduction
The STDGLM package provides a framework for fitting
spatio-temporal dynamic generalized linear models. These models are
useful for analyzing data that varies over both space and time, allowing
for the incorporation of spatial and temporal dependencies in the
modeling process. The package provides functions for fitting these
models, as well as tools for visualizing and interpreting the
results.
Installation
You can install the package from GitHub using the following command:
# install devtools first if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github("czaccard/STDGLM")
Run the following command to load the package:
library(STDGLM)
Quick Usage Example
data(ApuliaAQ)
p = length(unique(ApuliaAQ$AirQualityStation)) # 51
t = length(unique(ApuliaAQ$time)) # 365
# distance matrix
W = as.matrix(dist(cbind(ApuliaAQ$Longitude[1:p], ApuliaAQ$Latitude[1:p])))
# response variable: temperature
y = matrix(ApuliaAQ$CL_t2m, p, t)
# covariates (intercept + altitude)
X = array(1, dim = c(p, t, 2))
X[,,2] = matrix(ApuliaAQ$Altitude, p, t)
mod <- stdglm(y=y, X=X, W=W)
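The example above relies on the rows of ApuliaAQ being sorted by time and, within each time, by station: matrix() fills column by column, so each column of y then holds one time point. The sketch below illustrates that layout on a small made-up data frame. The names toy, station, value, and altitude are hypothetical and are not part of the package.

```r
# Toy long-format data: 3 stations observed at 4 times,
# sorted by time first and by station within each time.
p <- 3
n_times <- 4
toy <- expand.grid(station = seq_len(p), time = seq_len(n_times))
toy$value <- 10 * toy$station + toy$time   # station 2 at time 3 has value 23

# matrix() fills column-wise, so column k holds all stations at time k.
y <- matrix(toy$value, nrow = p, ncol = n_times)

# Covariate array: slice 1 is the intercept, slice 2 a station-level
# covariate that is constant over time (recycled across columns).
altitude <- c(100, 250, 40)
X <- array(1, dim = c(p, n_times, 2))
X[, , 2] <- matrix(altitude, nrow = p, ncol = n_times)

# Rows index stations, columns index time points.
stopifnot(y[2, 3] == 23, all(X[2, , 2] == 250))
```

If the data are sorted differently, reorder them first (for example with order()) before reshaping.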
Detailed Explanation of Supported STDGLMs
Let $p$ denote the number of spatial units (either georeferenced
locations or areal units) where data are collected, and let $T$
denote the number of time points. As of the current version of
the package (0.0.0.9000), only Gaussian outcomes are
supported. Specifically, only dynamic linear models (DLMs) with the
following observation equation can be handled:
$$y_{it} = \mathbf{x}_{it}^\top \boldsymbol{\beta}_{it} + \mathbf{z}_{it}^\top \boldsymbol{\gamma} + \epsilon_{it}, \quad i = 1, \dots, p, \quad t = 1, \dots, T,$$
where:
$y_{it}$ is the response variable at spatial unit $i$ and time $t$,
$\mathbf{x}_{it}$ is a $J$-dimensional vector of covariates at spatial unit $i$ at time $t$ (an intercept may or may not be included here),
$\boldsymbol{\beta}_{it}$ is the state vector at time $t$ at spatial unit $i$,
$\mathbf{z}_{it}$ is a $q$-dimensional vector of covariates whose effects are constant (an intercept may or may not be included here),
$\boldsymbol{\gamma}$ is a vector of non-varying coefficients,
$\epsilon_{it}$ is the observation error at time $t$ at spatial unit $i$.
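The evolution of the state vector is described by the following state equation, which accounts for spatial correlation in the state vector:
$$\boldsymbol{\beta}_{j,t} = \mathbf{G}_j \boldsymbol{\beta}_{j,t-1} + \boldsymbol{\eta}_{j,t}, \quad \boldsymbol{\eta}_{j,t} \sim N_p(\mathbf{0}, \boldsymbol{\Sigma}_{\eta,j}),$$
where:
$\boldsymbol{\beta}_{j,t} = (\beta_{j,1t}, \dots, \beta_{j,pt})^\top$ is the state vector related to the $j$-th covariate ($j = 1, \dots, J$), at time $t$ for all spatial units,
$\mathbf{G}_j$ is a $p \times p$ transition matrix,
$\boldsymbol{\Sigma}_{\eta,j}$ is the covariance matrix of the state evolution error $\boldsymbol{\eta}_{j,t}$.
The transition matrix is assumed to be a scalar multiple of the identity matrix, $\mathbf{G}_j = \rho_j \mathbf{I}_p$. The parameter $\rho_j$ controls the temporal autocorrelation of the state vector.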
The state evolution covariance matrix $\boldsymbol{\Sigma}_{\eta,j}$ can be structured to reflect spatial relationships, e.g. by assuming an exponential covariance function if the data are point-referenced: $$[\boldsymbol{\Sigma}_{\eta,j}]_{ii'} = \tau_j^2 \exp(-d_{ii'} / \phi_j),$$ where $\tau_j^2$ is the partial sill, $\phi_j$ is the range parameter, and $d_{ii'}$ is the Euclidean distance between locations $i$ and $i'$. At the moment, this is the only supported covariance structure for point-referenced data.
In this case, the evolution error is assumed to be a zero-mean Gaussian process with exponential covariance matrix parameterized by $\tau_j^2$ and $\phi_j$, which we will denote as $\boldsymbol{\eta}_{j,t} \sim \mathrm{GP}(\mathbf{0}, \tau_j^2, \phi_j)$.
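As a minimal sketch of the point-referenced case, the code below builds an exponential covariance matrix from a distance matrix and simulates a state that evolves with a scalar temporal autocorrelation. It uses base R only and does not call the package. The coordinates, the variable names, the parameter values, and the form tau2 * exp(-d / phi) of the exponential covariance are assumptions made for illustration.

```r
set.seed(1)
p <- 5
coords <- cbind(runif(p), runif(p))   # hypothetical locations
D <- as.matrix(dist(coords))          # Euclidean distances between locations

tau2 <- 2      # partial sill (assumed value)
phi  <- 0.5    # range parameter (assumed value)
rho  <- 0.9    # temporal autocorrelation (assumed value)

# Exponential covariance: entry (i, i') is tau2 * exp(-d_ii' / phi)
Sigma <- tau2 * exp(-D / phi)

# Lower-triangular Cholesky factor, so that L %*% t(L) equals Sigma
L <- t(chol(Sigma))

# Simulate the state: beta_t = rho * beta_{t-1} + eta_t, eta_t ~ N(0, Sigma).
# For simplicity the initial state is also drawn from N(0, Sigma).
n_times <- 50
beta <- matrix(0, p, n_times)
beta[, 1] <- L %*% rnorm(p)
for (k in 2:n_times) {
  beta[, k] <- rho * beta[, k - 1] + L %*% rnorm(p)
}
```

Each row of beta is the simulated path of one location. Nearby locations receive correlated shocks, so their paths tend to move together.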
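If the data are areal, a proper conditional autoregressive (PCAR) covariance structure is assumed: $$\boldsymbol{\Sigma}_{\eta,j} = \tau_j^2 (\mathbf{D} - \phi_j \mathbf{W})^{-1},$$ where $\mathbf{W}$ is a binary adjacency matrix, $\mathbf{D}$ is a diagonal matrix with the row sums of $\mathbf{W}$ on the diagonal, and $\tau_j^2$ and $\phi_j$ are the conditional variance and autocorrelation parameters, respectively.
In this case, the evolution error follows a zero-mean PCAR process, and we will denote this as $\boldsymbol{\eta}_{j,t} \sim \mathrm{PCAR}(\mathbf{0}, \tau_j^2, \phi_j)$.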
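The areal case can be sketched in the same way. The code below assumes the common proper CAR parameterization in which the precision matrix is (D - phi * W) / tau2. The four-area adjacency structure, the variable names, and the parameter values are hypothetical, and the package may organize these computations differently.

```r
# Binary adjacency matrix for 4 areas arranged on a line: 1-2-3-4
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)

D <- diag(rowSums(W))   # number of neighbours of each area on the diagonal
tau2 <- 1.5             # conditional variance (assumed value)
phi  <- 0.8             # spatial autocorrelation, |phi| < 1 (assumed value)

# Proper CAR: precision Q = (D - phi * W) / tau2, covariance Sigma = solve(Q)
Q <- (D - phi * W) / tau2
Sigma <- solve(Q)

# With |phi| < 1 and every area having at least one neighbour, D - phi * W is
# strictly diagonally dominant, so all eigenvalues of Q are positive.
eig <- eigen(Q, symmetric = TRUE)$values
stopifnot(all(eig > 0))
```

In practice the precision matrix Q is usually the more convenient object, because it is sparse whenever W is sparse.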
ANOVA Decomposition of the State Vector
The function stdglm allows for the decomposition of the
state vector into components that can be interpreted as contributions
from different sources of variability. Dropping the subscript $j$
for the sake of simplicity, the state $\beta_{it}$
is decomposed as follows:
$$\beta_{it} = \bar{\beta} + \beta_i^{(S)} + \beta_t^{(T)} + \beta_{it}^{(ST)},$$
where:
$\bar{\beta}$ is the overall mean effect,
$\beta_i^{(S)}$ is the spatial effect at spatial unit $i$,
$\beta_t^{(T)}$ is the temporal effect at time $t$,
$\beta_{it}^{(ST)}$ is the interaction effect between space and time at spatial unit $i$ and time $t$.
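To make the decomposition concrete, the sketch below splits a made-up matrix of coefficients (spatial units in rows, time points in columns) into an overall mean, spatial effects, temporal effects, and an interaction, using row and column means. This is the classical two-way ANOVA split and is shown for illustration only. It is not a description of how the package estimates these components, and all object names are hypothetical.

```r
set.seed(2)
p <- 4
n_times <- 6
B <- matrix(rnorm(p * n_times), p, n_times)   # hypothetical coefficient surface

overall  <- mean(B)                  # overall mean effect
spatial  <- rowMeans(B) - overall    # one effect per spatial unit
temporal <- colMeans(B) - overall    # one effect per time point
# Interaction: what is left after removing the additive part
inter <- B - overall - outer(spatial, temporal, "+")

# Adding the four components back recovers the original surface
B_check <- overall + outer(spatial, temporal, "+") + inter
stopifnot(isTRUE(all.equal(B, B_check)))
```

Here outer(spatial, temporal, "+") builds the matrix whose (i, t) entry is the sum of the i-th spatial effect and the t-th temporal effect.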
Bayesian Hierarchical Structure
The Bayesian model is as follows, for $i = 1, \dots, p$ and $t = 1, \dots, T$:
The model is completed with the following priors (again, dropping the subscript , since these are common across ): where denotes a normal distribution with mean and variance truncated to the interval . The hyperparameters and depend on the type of spatial data. If the data are point-referenced, they are set based on the minimum and maximum distances between points, respectively. If the data are areal, and .
Note that the spacetime-varying coefficients are assumed independent a priori across covariates $j = 1, \dots, J$.
Efficient Inference and Identifiability
To build an efficient sampler, the algorithm proposed by Chan and Jeliazkov (2009) is used in conjunction with sparse matrix techniques.
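The idea behind precision-based sampling can be sketched on a much simpler model than the one fitted by the package: a single time series observed with noise, whose latent state follows a first-order autoregression. The posterior precision of the whole state path is then a sparse banded matrix, and one joint draw needs only a sparse Cholesky factorization and two sparse solves. The code uses the Matrix package. The model, the variable names, and the parameter values are assumptions chosen for illustration and are not the package's actual sampler.

```r
library(Matrix)
set.seed(4)

n <- 200                                     # number of time points (hypothetical)
rho <- 0.9; s2_state <- 0.1; s2_obs <- 1     # assumed parameter values
y <- cumsum(rnorm(n, sd = 0.3)) + rnorm(n)   # made-up observations

# Sparse lower-bidiagonal H: row t of H %*% beta is beta_t - rho * beta_{t-1}
# (row 1 is just beta_1, i.e. the initial state has prior N(0, s2_state)).
H <- bandSparse(n, k = c(0, -1),
                diagonals = list(rep(1, n), rep(-rho, n - 1)))

# Posterior of the state path is N(m, solve(Q)) with Q m = b
Q <- forceSymmetric(crossprod(H) / s2_state + Diagonal(n) / s2_obs)
b <- y / s2_obs

R <- chol(Q)                          # sparse upper-triangular, t(R) %*% R = Q
m <- as.numeric(solve(Q, b))          # posterior mean
z <- rnorm(n)
draw <- m + as.numeric(solve(R, z))   # one joint draw of the whole path
```

Because solve(R, z) has covariance solve(Q), adding it to the posterior mean yields a draw from the full conditional of the entire state path in one step, without a forward and backward recursion.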
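To make the model identifiable, some constraints are imposed on the varying parameters at each MCMC iteration:
Set the spatial effects to sum to zero, $\sum_{i=1}^{p} \beta_i^{(S)} = 0$.
Set the temporal effects to sum to zero, $\sum_{t=1}^{T} \beta_t^{(T)} = 0$.
Set $\sum_{i=1}^{p} \beta_{it}^{(ST)} = 0$ for each $t$.
Set $\sum_{t=1}^{T} \beta_{it}^{(ST)} = 0$ for each $i$.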
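One common way to impose sum-to-zero constraints of this kind is to re-centre the sampled components while moving the removed means into the lower-order terms, so that the total effect at each spatial unit and time is unchanged. The sketch below shows this on made-up draws. Whether the package re-centres in exactly this way is an assumption, and all object names are hypothetical.

```r
set.seed(3)
p <- 4
n_times <- 6
# Unconstrained draws of the four components (hypothetical values)
mu <- 1.0                                       # overall mean
s  <- rnorm(p)                                  # spatial effects
tt <- rnorm(n_times)                            # temporal effects
st <- matrix(rnorm(p * n_times), p, n_times)    # space-time interaction

total_before <- mu + outer(s, tt, "+") + st

# Step 1: centre the interaction in both directions and push its
# row, column, and overall means into the lower-order terms.
row_m <- rowMeans(st); col_m <- colMeans(st); all_m <- mean(st)
st <- st - outer(row_m, col_m, "+") + all_m
s  <- s + row_m - all_m
tt <- tt + col_m - all_m
mu <- mu + all_m

# Step 2: centre the spatial and temporal effects, pushing their means into mu.
mu <- mu + mean(s) + mean(tt)
s  <- s - mean(s)
tt <- tt - mean(tt)

total_after <- mu + outer(s, tt, "+") + st
stopifnot(isTRUE(all.equal(total_before, total_after)))
```

After both steps the four constraints hold, and the fitted surface is identical to the one before re-centring.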