glmmTMBによるゼロ過剰ポアソンモデルのあてはめ

glmmTMBは、TMBを利用して一般化線形混合モデルのあてはめをおこなうRパッケージです。これをつかって、変量効果のあるゼロ過剰ポアソンモデルのあてはめをやってみました。

準備

library(glmmTMB)
## Warning in checkDepPackageVersion(dep_pkg = "TMB"): Package version inconsistency detected.
## glmmTMB was built with TMB version 1.9.11
## Current TMB version is 1.9.14
## Please re-install glmmTMB from source or restore original 'TMB' package (see '?reinstalling' for more information)
library(ggplot2)
set.seed(123)

データ

ゼロ過剰ポアソン分布で、ポアソン平均に変量効果が加わるというモデルを想定して、これにしたがう模擬データを生成します。\(p\)はベルヌーイ分布で0となる確率、\(q\)はその逆ロジット、\(\beta\)は全体のポアソン平均の対数、\(\epsilon\)は群\(g\)の変量効果とします。\(\sigma\)は変量効果の標準偏差です。

\[ z_i \sim \mathrm{Bern}(1 - p) \]

\[ \mathrm{logit}(p) = q \]

\[ y_i \sim \mathrm{Pois}(z_i \exp(\beta + \epsilon_{g[i]})) \]

\[ \epsilon_g \sim \mathrm{N}(0, \sigma^2) \]

ここでは、\(q\)に-0.8、\(\beta\)に1.2、\(\sigma\)に0.5を与えて、模擬データを生成しました。

N <- 400
N_group <- 10
group <- rep(1:N_group, each = N / N_group)
q <- -0.8    # inv_logit(0.8) = 0.310
beta <- 1.2  # exp(1.2) = 3.32
sigma <- 0.5

ranef <- rnorm(N_group, 0, sigma)
y <- rbinom(N, 1, 1 - 1 / (1 + exp(-q))) *
     rpois(N, exp(beta + ranef[group]))
df <- data.frame(y = y, group = group)

ggplot(df, aes(x = y)) +
  geom_bar()

glmmTMBによるあてはめと結果

glmmTMBでモデルを当てはめます。今回は、右辺は切片だけで、それに変量効果が加わるモデルとなっています。

モデル式はlme4パッケージのglmerなどと同様ですが、ziformula引数でゼロ過剰の式を指定するようになっています。

fit <- glmmTMB(y ~ (1|group), data = df, family = poisson,
               ziformula = ~1)

結果の要約です。

切片(\(\beta\))の値として1.23が、ベルヌーイ分布部分で0となる確率(\(p\))のロジット(\(q\))として-0.88が、変量効果の標準偏差(\(\sigma\))として0.47が、それぞれ推定されました。だいたい元の値に近い値となっています。

summary(fit)
##  Family: poisson  ( log )
## Formula:          y ~ (1 | group)
## Zero inflation:     ~1
## Data: df
## 
##      AIC      BIC   logLik deviance df.resid 
##   1573.3   1585.3   -783.7   1567.3      397 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name        Variance Std.Dev.
##  group  (Intercept) 0.2178   0.4667  
## Number of obs: 400, groups:  group, 10
## 
## Conditional model:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   1.2305     0.1522   8.086 6.19e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Zero-inflation model:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.8808     0.1217  -7.241 4.47e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

NIMBLEとの比較

ためしにNIMBLEと結果を比較してみます。

library(nimble)
## nimble version 1.2.0 is loaded.
## For more information on NIMBLE and a User Manual,
## please visit https://R-nimble.org.
## 
## Note for advanced users who have written their own MCMC samplers:
##   As of version 0.13.0, NIMBLE's protocol for handling posterior
##   predictive nodes has changed in a way that could affect user-defined
##   samplers in some situations. Please see Section 15.5.1 of the User Manual.
## 
## 次のパッケージを付け加えます: 'nimble'
## 以下のオブジェクトは 'package:stats' からマスクされています:
## 
##     simulate
## 以下のオブジェクトは 'package:base' からマスクされています:
## 
##     declare
library(posterior)
## This is posterior version 1.6.0
## 
## 次のパッケージを付け加えます: 'posterior'
## 以下のオブジェクトは 'package:stats' からマスクされています:
## 
##     mad, sd, var
## 以下のオブジェクトは 'package:base' からマスクされています:
## 
##     %in%, match

NIMBLEのモデル

code <- nimbleCode({
  for (n in 1:N) {
    z[n] ~ dbern(1 - p)
    Y[n] ~ dpois(z[n] * exp(beta + epsilon[g[n]]))
  }
  for (i in 1:G) {
    epsilon[i] ~ dnorm(0, sd = sigma)
  }

  # priors
  beta ~ dnorm(0, sd = 10)
  p ~ dunif(0, 1)
  sigma ~ dunif(0, 10)

  # generated quantities
  q <- log(p / (1 - p))
})

あてはめと結果

あてはめと、事後分布の要約です。こちらもだいたい元の値を再現できました。

\(\sigma\)の事後平均と中央値がやや大きくなっていますが、これは事後分布の裾が右に長くなっているためです。

fit <- nimbleMCMC(code,
                  constants = list(N = N, G = N_group, g = df$group),
                  data = list(Y = df$y),
                  inits = list(z = rep(1, N)),
                  monitors = c("beta", "q", "sigma"),
                  niter = 20000,
                  nburnin = 10000,
                  nchains = 3,
                  samplesAsCodaMCMC = TRUE,
                  summary = FALSE) |>
  as_draws_list()
## Defining model
## Building model
## Setting data and initial values
## Running calculate on model
##   [Note] Any error reports that follow may simply reflect missing values in model variables.
## Checking model sizes and dimensions
##   [Note] This model is not fully initialized. This is not an error.
##          To see which variables are not initialized, use model$initializeInfo().
##          For more information on model initialization, see help(modelInitialization).
## Checking model calculations
## [Note] NAs were detected in model variables: beta, logProb_beta, p, logProb_p, sigma, logProb_sigma, lifted_d1_minus_p, q, epsilon, logProb_epsilon, logProb_z, lifted_z_oBn_cB_times_exp_oPbeta_plus_epsilon_oBg_oBn_cB_cB_cP_L3, logProb_Y.
## Compiling
##   [Note] This may take a minute.
##   [Note] Use 'showCompilerOutput = TRUE' to see C++ compilation details.
## running chain 1...
## |-------------|-------------|-------------|-------------|
## |-------------------------------------------------------|
## running chain 2...
## |-------------|-------------|-------------|-------------|
## |-------------------------------------------------------|
## running chain 3...
## |-------------|-------------|-------------|-------------|
## |-------------------------------------------------------|
summary(fit)
## # A tibble: 3 × 10
##   variable   mean median    sd   mad     q5    q95  rhat ess_bulk ess_tail
##   <chr>     <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>    <dbl>    <dbl>
## 1 beta      1.24   1.23  0.209 0.189  0.920  1.63   1.02     98.2     105.
## 2 q        -0.883 -0.879 0.120 0.120 -1.08  -0.691  1.00   5370.     7089.
## 3 sigma     0.591  0.555 0.189 0.148  0.371  0.933  1.00    874.     2297.