Skip to contents

This function executes the Markov Chain Monte Carlo (MCMC) algorithm for STDGLMs, i.e. spatio-temporal dynamic (generalized) linear models.

Usage

stdglm(
  y,
  X,
  Z = NULL,
  offset = NULL,
  point.referenced = TRUE,
  random.walk = FALSE,
  blocks_indices = NULL,
  W,
  W_pred = NULL,
  W_cross = NULL,
  X_pred = NULL,
  Z_pred = NULL,
  offset_pred = NULL,
  ncores = NULL,
  nrep = 100,
  nburn = 100,
  thin = 1,
  print.interval = 10,
  prior = NULL,
  keepY = TRUE,
  keepLogLik = TRUE,
  last_run = NULL
)

Arguments

y

A p-by-t matrix corresponding to the response variable, where p is the number of spatial locations and t is the number of time points. Missing values (NA) are allowed.

X

A p-by-t-by-ncovx array corresponding to the covariates whose effects should vary across space and time.

Z

A p-by-t-by-ncovz array corresponding to the covariates whose effects are constant across space and time. Defaults to NULL.

offset

A p-by-t matrix corresponding to the offset term. If NULL, it is set to zero.

point.referenced

A logical indicating whether the data are point-referenced (TRUE) or areal (FALSE). Default is TRUE. If FALSE, predictions are not performed.

random.walk

A logical indicating whether the temporal dynamic should be modeled as a random walk (TRUE) or a first-order autoregressive process (FALSE). Default is FALSE.

blocks_indices

A list of integer vectors indicating the indices of the blocks for spatial predictions. Defaults to NULL, if no predictions are needed. See details.

W

A p-by-p matrix corresponding to the spatial weights matrix. If point.referenced is TRUE, the distance matrix among the observed locations should be provided. If point.referenced is FALSE, the 0/1 adjacency matrix should be provided.

W_pred

A list with p_b-by-p_b matrices corresponding to the distance matrix for the prediction locations in the b-th block, for b in 1:length(blocks_indices). If NULL, predictions are not performed.

W_cross

A list with p-by-p_b matrices corresponding to the cross distances between the observed and prediction locations in the b-th block.

X_pred

A p_new-by-t_new-by-ncovx array corresponding to the covariates with varying coefficients for predictions, where p_new is the total number of prediction locations and t_new=t+h_ahead, h_ahead>=0, is the number of time points for which predictions are to be made.

Z_pred

A p_new-by-t_new-by-ncovz array corresponding to the covariates with constant coefficients for predictions. Defaults to NULL.

offset_pred

A p_new-by-t_new matrix corresponding to the offset term for predictions. If NULL, it is set to zero.

ncores

An integer indicating the number of cores to parallelize the spatial predictions. If NULL, it defaults to 1.

nrep

An integer indicating the number of iterations to keep after burn-in.

nburn

An integer indicating the number of iterations to discard as burn-in.

thin

An integer indicating the thinning value. Default is 1.

print.interval

An integer indicating the interval at which to print progress messages.

prior

A named list containing the hyperparameters of the model. If NULL, default non-informative hyperparameters are used. See details.

keepY

A logical indicating whether to keep the response variable in the output. Default is TRUE.

keepLogLik

A logical indicating whether to keep the log-likelihood in the output. Default is TRUE.

last_run

An optional list containing the output from a previous run of the function, which can be used to restore the state of the sampler and continue the MCMC. Default is NULL.

Value

A named list containing ave, which stores posterior summaries, and out, which stores the MCMC samples.

The posterior summaries in ave include:

Yfitted_mean, Yfitted2_mean

First two moments of draws from the posterior predictive distribution for the observed data points.

Ypred_mean, Ypred2_mean

p_new-by-t_new matrices with first two moments of draws from the posterior predictive distribution for the new data points (only if out-of-sample predictions are required).

B_postmean, B2_postmean

First two moments of the overall effect of varying coefficients.

Btime_postmean, Btime2_postmean

First two moments of the temporal effect of varying coefficients.

Bspace_postmean, Bspace2_postmean

First two moments of the spatial effect of varying coefficients.

Bspacetime_postmean, Bspacetime2_postmean

First two moments of the spatio-temporal effect of varying coefficients.

B2_c_t_s_st

2nd moment of the varying coefficients, \(\beta_{it}\).

Btime_pred_postmean, Btime_pred2_postmean

First two moments of the temporal effect of varying coefficients at the predicted time points.

Bspace_pred_postmean, Bspace_pred2_postmean

First two moments of the spatial effect of varying coefficients at the predicted spatial locations.

Bspacetime_pred_postmean, Bspacetime_pred2_postmean

First two moments of the spatio-temporal effect of varying coefficients at the predicted spatial locations and all time points.

B_pred2_c_t_s_st

2nd moment of the varying coefficients, \(\beta_{it}\), at the predicted spatial locations and all time points.

Eta_tilde_mean

Posterior mean of the linear predictor (for non-gaussian outcomes).

thetay_mean

Noise-free version of Eta_tilde_mean. Note also that thetay_mean = meanY1mean + meanZmean + offset

meanY1mean

Contribution of covariates with varying coefficients.

meanZmean

Contribution of covariates with non-varying effects.

DIC, Dbar, pD

Deviance Information Criterion, \(DIC = \bar{D} + pD\).

WAIC, se_WAIC, pWAIC, se_pWAIC, elpd, se_elpd

Widely Applicable Information Criterion, the penalty term, and expected log pointwise predictive density. The prefix se_ denotes standard errors. See Gelman et al. (2014).

CRPS

Continuous Ranked Probability Score, as defined in Gschlößl & Czado (2007).

PMCC

Predictive model choice criterion proposed by Gelfand & Ghosh (1998).

pvalue_*

Bayesian p-values, see https://czaccard.github.io/STDGLM/articles/model_output.html.

The MCMC chains included in out are:

sigma2

Measurement error variance.

sigma2_Btime

A J-by-nrep matrix with the variance of the innovation of the temporal effects for j=1,...,J. The j-th row corresponds to the temporal effect of the j-th varying coefficient.

rho1_space, rho2_space

J-by-nrep matrices with spatial correlation parameters. The j-th row corresponds to the spatial effect of the j-th varying coefficient.

rho1_spacetime, rho2_spacetime

J-by-nrep matrices with the correlation parameters of spatially-structured innovations in the spatio-temporal effects. The j-th row corresponds to the spatio-temporal effect of the j-th varying coefficient.

gamma

Regression coefficients related to the covariates with non-varying effects.

loglik

Pointwise log-likelihood, if keepLogLik = TRUE.

fitted

Draws from the posterior predictive distribution for the observed data points, if keepY = TRUE.

Ypred

Draws from the posterior predictive distribution for the out-of-sample data points, if keepY = TRUE.

RMSE

In-sample Root Mean Squared Error.

MAE

In-sample Mean Absolute Error.

chi_sq_pred_, chi_sq_obs_

Chi-square statistics (i.e., sum of squared Pearson residuals) for predicted and observed values.

out contains also other elements needed for restarting.

Details

At the moment, only Gaussian outcomes are supported. The fitted model has the following form: $$y_{it} = \boldsymbol{x}_{it}' \boldsymbol{\beta}_{it} + \boldsymbol{z}_{it}' \boldsymbol{\gamma} + \epsilon_{it}, \quad \epsilon_{it} \sim N(0, \sigma_\epsilon^2)$$ $$\boldsymbol{\beta}_{j, t} = \boldsymbol{F}_{j,t} \boldsymbol{\beta}_{j, t-1} + \boldsymbol{\eta}_{j,t}, \quad \boldsymbol{\eta}_{j,t} \sim N_p(0, \boldsymbol{\Sigma}_{\eta, j}), \quad j=1, \dots, J$$ where \(\boldsymbol{F}_{j,t} = \phi_j^{(\mathsf{T})} \boldsymbol{I}_p\).

The function allows for the decomposition of the state vector into components that can be interpreted as contributions from different sources of variability, that is: $$\beta_{it} = \overline{\beta} + \beta_{i}^{(\mathsf{S})} + \beta_{t}^{(\mathsf{T})} + \beta_{it}^{(\mathsf{ST})}$$ where:

\(\overline{\beta}\)

The overall mean effect.

\(\beta_{i}^{(\mathsf{S})}\)

The spatial effect for location \(i\).

\(\beta_{t}^{(\mathsf{T})}\)

The temporal effect for time \(t\).

\(\beta_{it}^{(\mathsf{ST})}\)

The spatio-temporal effect for location \(i\) and time \(t\).

See the package vignette for more details.

prior must be a named list with these elements:

V_beta_0

A scalar defining the prior variance for the initial state of the time-varying coefficients, related to the covariates in X.

V_gamma

A scalar defining the prior variance for the initial state of the constant coefficients, related to the covariates in Z. Assign a value even if Z is NULL.

a1

A scalar defining the inverse-gamma prior shape for the temporal variance of the time-varying coefficients.

b1

A scalar defining the inverse-gamma prior rate for the temporal variance of the time-varying coefficients.

s2_a

A scalar defining the inverse-gamma prior shape for the measurement error variance.

s2_b

A scalar defining the inverse-gamma prior rate for the measurement error variance.

Out-of-sample predictions are performed only if point.referenced is TRUE and W_pred is provided. For spatial interpolation of space-varying coefficients, computations are performed block-wise. blocks_indices is a list of disjoint sets of indices specifying the block membership of each new spatial location. The b-th element of the list has p_b new spatial locations, and the total number of new spatial locations is p_new, given by the sum of p_b over all blocks. W_pred and W_cross must be lists of length equal to length(blocks_indices). To perform computations in parallel, ncores must be greater than 1.
For temporal predictions of time-varying coefficients, it suffices that t_new>t.

References

Gelfand, A. E., & Ghosh, S. K. (1998). Model choice: a minimum posterior predictive loss approach. Biometrika, 85(1), 1-11.
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2014). Bayesian data analysis (3rd ed.). Chapman and Hall/CRC.
Gschlößl, S., & Czado, C. (2007). Spatial modelling of claim frequency and claim size in non-life insurance. Scandinavian Actuarial Journal, 2007(3), 202–225. doi:10.1080/03461230701414764

See also

Examples

if (FALSE) { # \dontrun{
data(ApuliaAQ, package = "STDGLM")
p = length(unique(ApuliaAQ$AirQualityStation)) # 51
t = length(unique(ApuliaAQ$time))              # 365

# distance matrix
W = as.matrix(dist(cbind(ApuliaAQ$Longitude[1:p], ApuliaAQ$Latitude[1:p])))

# response variable: temperature
y = matrix(ApuliaAQ$CL_t2m, p, t)
# covariates with spacetime-varying coefficients: intercept + altitude
X = array(1, dim = c(p, t, 2))
X[,,2] = matrix(ApuliaAQ$Altitude, p, t)

mod <- stdglm(y=y, X=X, W=W)

# Model with spacetime-varying intercept, but fixed altitude effect
mod2 <- stdglm(y=y, X=X[,,1,drop=FALSE], Z=X[,,2,drop=FALSE], W=W)
} # }