geostanパッケージでCARモデルのあてはめ

R
Stan
作者

伊東宏樹

公開

2024年8月29日

更新日

2026年2月6日

先月、「StanでCARモデル」という記事を書いたのですが、よくよく調べるとgeostanというパッケージがありました。今回は、geostanパッケージをつかってCAR (Conditional AutoRegressive) モデルのあてはめをおこないます。

準備

必要なライブラリの読み込みと擬似乱数系列の固定です。

library(geostan)
## This is geostan version 0.8.2
library(readr)
library(ggplot2)
set.seed(123)

データ

前回と同じく、岩波データサイエンスVol.1で使用した、Y方向10×X方向20のグリッド中のアラカシの株数データを使用します。今回はreadrパッケージのread_csv関数で読み込んでいます。

また、後にでてくるprep_car_data2関数により生成される隣接行列では、データのX軸方向が先に変化するようになっていますので、それにあわせてデータをならべかえています。この場合はデータはsf型にしなくてもよいようです。

qgl <- read_csv(url("https://raw.githubusercontent.com/iwanami-datascience/vol1/master/ito/Qglauca.csv")) |>
  dplyr::arrange(Y, X)
## Rows: 200 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): X, Y, N
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

図示

今回もまずはデータをプロットします。

ggplot(qgl, aes(x = X, y = Y)) +
  geom_tile(aes(fill = N)) +
  scale_y_reverse(breaks = c(0, 5, 10)) +
  scale_fill_gradient(name = "株数", low = "grey90", high = "grey10",
                      limits = c(0, 6)) +
  coord_fixed(ratio = 1) +
  theme_bw(base_family = "IPAexGothic")

モデル

CARモデル用の空間データの生成

prep_car_data2関数で、CARモデル用の空間データを生成します。prep_car_data2関数はラスターデータ用で、一般にはprep_car_dataを使います。

car_data <- prep_car_data2(row = 10, col = 20)
## Range of permissible rho values: -1, 1

あてはめ

stan_car関数でモデルをあてはめます。今回はY方向のグリッド位置の値を説明変数として与えるようにしました。目的変数は、グリッド内の株数という計数値データなので、誤差構造をポアソン分布、リンク関数を対数としています。

また、geostanにはstan_icarという関数もあって、Intrinsic CARモデルのほか、BYMモデルのあてはめもできます。

fit <- stan_car(N ~ Y, data = qgl, car_parts = car_data,
                family = poisson(link = log),
                iter = 4000, refresh = 1000, cores = 4)
## 
## *Setting prior parameters for intercept
## Distribution: normal
##   location scale
## 1     0.27     5
## 
## *Setting prior parameters for beta
## Distribution: normal
##   location scale
## 1        0     5
## 
## *Setting prior for CAR scale parameter (car_scale)
## Distribution: student_t
##   df location scale
## 1 10        0     3
## 
## *Setting prior for CAR spatial autocorrelation parameter (car_rho)
## Distribution: uniform
##   lower upper
## 1    -1     1
## 
## SAMPLING FOR MODEL 'foundation' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 0.000232 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2.32 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 4000 [  0%]  (Warmup)
## 
## SAMPLING FOR MODEL 'foundation' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 9.3e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.93 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 4000 [  0%]  (Warmup)
## 
## SAMPLING FOR MODEL 'foundation' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 8.8e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.88 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 4000 [  0%]  (Warmup)
## 
## SAMPLING FOR MODEL 'foundation' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 7.7e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.77 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 4000 [  0%]  (Warmup)
## Chain 1: Iteration: 1000 / 4000 [ 25%]  (Warmup)
## Chain 2: Iteration: 1000 / 4000 [ 25%]  (Warmup)
## Chain 4: Iteration: 1000 / 4000 [ 25%]  (Warmup)
## Chain 3: Iteration: 1000 / 4000 [ 25%]  (Warmup)
## Chain 3: Iteration: 2000 / 4000 [ 50%]  (Warmup)
## Chain 3: Iteration: 2001 / 4000 [ 50%]  (Sampling)
## Chain 4: Iteration: 2000 / 4000 [ 50%]  (Warmup)
## Chain 4: Iteration: 2001 / 4000 [ 50%]  (Sampling)
## Chain 2: Iteration: 2000 / 4000 [ 50%]  (Warmup)
## Chain 2: Iteration: 2001 / 4000 [ 50%]  (Sampling)
## Chain 1: Iteration: 2000 / 4000 [ 50%]  (Warmup)
## Chain 1: Iteration: 2001 / 4000 [ 50%]  (Sampling)
## Chain 3: Iteration: 3000 / 4000 [ 75%]  (Sampling)
## Chain 4: Iteration: 3000 / 4000 [ 75%]  (Sampling)
## Chain 2: Iteration: 3000 / 4000 [ 75%]  (Sampling)
## Chain 1: Iteration: 3000 / 4000 [ 75%]  (Sampling)
## Chain 3: Iteration: 4000 / 4000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 1.526 seconds (Warm-up)
## Chain 3:                1.644 seconds (Sampling)
## Chain 3:                3.17 seconds (Total)
## Chain 3: 
## Chain 4: Iteration: 4000 / 4000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 1.544 seconds (Warm-up)
## Chain 4:                1.717 seconds (Sampling)
## Chain 4:                3.261 seconds (Total)
## Chain 4: 
## Chain 2: Iteration: 4000 / 4000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 1.574 seconds (Warm-up)
## Chain 2:                1.706 seconds (Sampling)
## Chain 2:                3.28 seconds (Total)
## Chain 2: 
## Chain 1: Iteration: 4000 / 4000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 1.65 seconds (Warm-up)
## Chain 1:                1.698 seconds (Sampling)
## Chain 1:                3.348 seconds (Total)
## Chain 1:
## Warning: There were 1 divergent transitions after warmup. See
## https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.
## Warning: Examine the pairs() plot to diagnose sampling problems

結果

結果を表示します。Yの係数の事後平均は0.21と推定されました。car_rhoは空間自己相関の大きさ(0で自己相関なし、1でIntrinsic CARと同じ、のはず)。car_scaleはスケールパラメータとのことです。

print(fit)
## Spatial Model Results 
## Formula: N ~ Y
## Likelihood:  poisson 
## Link:  log 
## Spatial method:  CAR 
## Residual Moran Coefficient:  0.095349 
## Observations:  200 
## 
## Inference for Stan model: foundation.
## 4 chains, each with iter=4000; warmup=2000; thin=1; 
## post-warmup draws per chain=2000, total post-warmup draws=8000.
## 
##             mean se_mean    sd   2.5%    20%    50%    80%  97.5% n_eff  Rhat
## intercept -1.583   0.008 0.283 -2.162 -1.818 -1.571 -1.341 -1.054  1311 1.001
## Y          0.211   0.001 0.038  0.140  0.179  0.209  0.242  0.287  2378 1.000
## car_rho    0.063   0.002 0.191 -0.306 -0.100  0.062  0.230  0.431 12441 1.000
## car_scale  1.528   0.010 0.242  1.053  1.329  1.525  1.730  2.004   579 1.009
## 
## Samples were drawn using NUTS(diag_e) at Fri Feb  6 16:01:25 2026.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

geostanはRStanでMCMC計算をおこないます。あてはめ結果のオブジェクトのstanfit要素がrstan::stanの返り値となっています。

class(fit$stanfit)
## [1] "stanfit"
## attr(,"package")
## [1] "rstan"

あてはめられた値の図示

結果から、あてはめられた値を取り出して図示してみます。

qgl$fitted_mean <- fitted(fit)$mean
ggplot(qgl, aes(x = X, y = Y)) +
  geom_tile(aes(fill = fitted_mean)) +
  scale_y_reverse(breaks = c(0, 5, 10)) +
  scale_fill_gradient(name = "株数", low = "grey90", high = "grey10",
                      limits = c(0, 6)) +
  coord_fixed(ratio = 1) +
  theme_bw(base_family = "IPAexGothic")