Efficient Gibbs sampling of posterior distribution of parameters in GLM
Source:R/mcmcglm.R
mcmcglm.Rd
Obtain MCMC samples using slice sampling within Gibbs for generalized linear models (GLMs) using compute graph Gibbs (CGGibbs) which has linear runtime in the number of variables in the model matrix. Method is described in the article Is Gibbs sampling faster than Hamiltonian Monte Carlo on GLMs?, and see more in details below.
Usage
mcmcglm(
formula,
family = gaussian,
data,
beta_prior = distributional::dist_normal(0, 1),
log_likelihood_extra_args = list(sd = 1),
linear_predictor_calc = c("update", "naive"),
sample_method = c("slice_sampling", "normal-normal"),
qslice_fun = qslice::slice_stepping_out,
...,
n_samples = 500,
burnin = 100
)
Arguments
- formula
an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. See more details at stats::glm
- family
a description of the error distribution and link function to be used in the model. This can be a
character
string naming a family function, a family function or the result of a call to a family function. (See family for details of family functions.)- data
an optional
data frame
,list
orenvironment
(or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken fromenvironment(formula)
, typically the environment from which the function is called.- beta_prior
a
distribution
object created by a function from thedistributional
package. Could fx. bedistributional::dist_normal(mean = 0, sd = 1)
.- log_likelihood_extra_args
a named
list
with arguments passed onto the log_density function. Fx. specification oflog_likelihood_extra_args = list(sd = x)
is needed for the case offamily = "gaussian"
- linear_predictor_calc
a
character
specifying the method used to calculate the linear predictor in each step of the gibbs algorithm. Default is "update", which uses the CGGibbs procedure as described at the start of this section in a vignette. Other option is "naive", which does the usual- sample_method
a
character
specifying the method used for sampling. The default"slice_sampling"
is the intended value in most cases, as it works for any specification offamily
andbeta_prior
."normal-normal"
uses a conditional normal distribution to sample from in case of conjugate prior with gaussian response andbeta_prior
. Implemented for testing purposes but works for that niche case.- qslice_fun
a
function
from the qslice package. Default is qslice::slice_stepping_out which uses the slice sampler from Neal 2003, but all functions are available.- ...
arguments passed onto the function specified by
qslice_fun
. For default qslice::slice_stepping_outw
needs to be specified, while for fx. qslice::slice_elliptical,mu
andsigma
need to be specified- n_samples
a
numeric
with number of samples to draw of each parameter(/variable) in the model- burnin
a
numeric
with the number of samples to be marked as "burnin". Burnin samples are not included in thebeta_mean
calculation to increase finite sample performance of the LLN estimate
Value
An object of class mcmcglm
with methods for getting a data.frame
of parameter samples, plotting, etc.
Details
uses an updating scheme for the linear predictor during each draw of Gibbs sampling on coordinates of the parameter vector
Examples
if (FALSE) { # \dontrun{
# Create test data for different scenarios
n <- 100
x1 <- rnorm (n)
x2 <- rbinom (n, 1, .5)
b0 <- 1
b1 <- 1.5
b2 <- 2
lin_pred <- b0+b1*x1+b2*x2
#############################################
# Different families and priors
# For family "gaussian" and iid normal prior
y_norm <- rnorm(n, mean = lin_pred, sd = 1)
dat_norm <- data.frame(Y = y_norm, X1 = x1, X2 = x2)
norm <- mcmcglm(formula = Y ~ .,
data = dat_norm,
beta_prior = distributional::dist_normal(0, 1),
family = "gaussian",
n_samples = 100,
burnin = 10,
sample_method = "slice_sampling",
qslice_fun = qslice::slice_stepping_out,
w = 0.5)
norm
# For family "binomial" with logit link and iid gamma distributed prior
y_logit <- rbinom (n, 1, arm::invlogit(lin_pred))
dat_logit <- data.frame(Y = y_logit, X1 = x1, X2 = x2)
logit <- mcmcglm(formula = Y ~ .,
data = dat_logit,
beta_prior = distributional::dist_gamma(shape = 1, rate = 0.5),
family = binomial(link = "logit"),
n_samples = 100,
burnin = 10,
sample_method = "slice_sampling",
qslice_fun = qslice::slice_stepping_out,
w = 0.8)
logit
# For family "negative.binomial" and multivariate normal specification of parameter priors
y_log <- rnbinom(n, size = 1, mu = exp(lin_pred))
dat_log <- data.frame(Y = y_log, X1 = x1, X2 = x2)
log <- mcmcglm(formula = Y ~ X1,
data = dat_log,
beta_prior = distributional::dist_multivariate_normal(
mu = list(c(1, 2)),
sigma = list(matrix(c(1, 0.5, 0.5, 1), ncol = 2))
),
family = MASS::negative.binomial(3),
n_samples = 100,
burnin = 10,
sample_method = "slice_sampling",
qslice_fun = qslice::slice_stepping_out,
w = 0.8)
log
# For family "negative.binomial" and specification of different independent
# priors for each parameter
log2 <- mcmcglm(formula = Y ~ .,
data = dat_log,
beta_prior = list(distributional::dist_normal(0, 1),
distributional::dist_gamma(1, 1),
distributional::dist_exponential(2)),
family = MASS::negative.binomial(3),
n_samples = 100,
burnin = 10,
sample_method = "slice_sampling",
qslice_fun = qslice::slice_stepping_out,
w = 0.8)
log2
#############################################
# Using a different slice function
log3 <- mcmcglm(formula = Y ~ .,
data = dat_log,
beta_prior = list(distributional::dist_normal(0, 1),
distributional::dist_gamma(1, 1),
distributional::dist_exponential(2)),
family = MASS::negative.binomial(3),
n_samples = 100,
burnin = 10,
sample_method = "slice_sampling",
qslice_fun = qslice::slice_elliptical,
mu = 1.5,
sigma = 2)
log3
} # }