library(geostan)
## This is geostan version 0.8.2
library(readr)
library(ggplot2)
set.seed(123)先月、「StanでCARモデル」という記事を書いたのですが、よくよく調べるとgeostanというパッケージがありました。今回は、geostanパッケージをつかってCAR (Conditional AutoRegressive) モデルのあてはめをおこないます。
準備
必要なライブラリの読み込みと擬似乱数系列の固定です。
データ
前回と同じく、岩波データサイエンスVol.1で使用した、Y方向10×X方向20のグリッド中のアラカシの株数データを使用します。今回はreadrパッケージのread_csv関数で読み込んでいます。
また、後にでてくるprep_car_data2関数により生成される隣接行列では、データのX軸方向が先に変化するようになっていますので、それにあわせてデータをならべかえています。この場合はデータはsf型にしなくてもよいようです。
qgl <- read_csv(url("https://raw.githubusercontent.com/iwanami-datascience/vol1/master/ito/Qglauca.csv")) |>
dplyr::arrange(Y, X)
## Rows: 200 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): X, Y, N
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.図示
今回もまずはデータをプロットします。
ggplot(qgl, aes(x = X, y = Y)) +
geom_tile(aes(fill = N)) +
scale_y_reverse(breaks = c(0, 5, 10)) +
scale_fill_gradient(name = "株数", low = "grey90", high = "grey10",
limits = c(0, 6)) +
coord_fixed(ratio = 1) +
theme_bw(base_family = "IPAexGothic")
モデル
CARモデル用の空間データの生成
prep_car_data2関数で、CARモデル用の空間データを生成します。prep_car_data2関数はラスターデータ用で、一般にはprep_car_dataを使います。
car_data <- prep_car_data2(row = 10, col = 20)
## Range of permissible rho values: -1, 1あてはめ
stan_car関数でモデルをあてはめます。今回はY方向のグリッド位置の値を説明変数として与えるようにしました。目的変数は、グリッド内の株数という計数値データなので、誤差構造をポアソン分布、リンク関数を対数としています。
また、geostanにはstan_icarという関数もあって、Intrinsic CARモデルのほか、BYMモデルのあてはめもできます。
fit <- stan_car(N ~ Y, data = qgl, car_parts = car_data,
family = poisson(link = log),
iter = 4000, refresh = 1000, cores = 4)
##
## *Setting prior parameters for intercept
## Distribution: normal
## location scale
## 1 0.27 5
##
## *Setting prior parameters for beta
## Distribution: normal
## location scale
## 1 0 5
##
## *Setting prior for CAR scale parameter (car_scale)
## Distribution: student_t
## df location scale
## 1 10 0 3
##
## *Setting prior for CAR spatial autocorrelation parameter (car_rho)
## Distribution: uniform
## lower upper
## 1 -1 1
##
## SAMPLING FOR MODEL 'foundation' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0.000232 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2.32 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 4000 [ 0%] (Warmup)
##
## SAMPLING FOR MODEL 'foundation' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 9.3e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.93 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: Iteration: 1 / 4000 [ 0%] (Warmup)
##
## SAMPLING FOR MODEL 'foundation' NOW (CHAIN 3).
## Chain 3:
## Chain 3: Gradient evaluation took 8.8e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.88 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3:
## Chain 3:
## Chain 3: Iteration: 1 / 4000 [ 0%] (Warmup)
##
## SAMPLING FOR MODEL 'foundation' NOW (CHAIN 4).
## Chain 4:
## Chain 4: Gradient evaluation took 7.7e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.77 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4:
## Chain 4:
## Chain 4: Iteration: 1 / 4000 [ 0%] (Warmup)
## Chain 1: Iteration: 1000 / 4000 [ 25%] (Warmup)
## Chain 2: Iteration: 1000 / 4000 [ 25%] (Warmup)
## Chain 4: Iteration: 1000 / 4000 [ 25%] (Warmup)
## Chain 3: Iteration: 1000 / 4000 [ 25%] (Warmup)
## Chain 3: Iteration: 2000 / 4000 [ 50%] (Warmup)
## Chain 3: Iteration: 2001 / 4000 [ 50%] (Sampling)
## Chain 4: Iteration: 2000 / 4000 [ 50%] (Warmup)
## Chain 4: Iteration: 2001 / 4000 [ 50%] (Sampling)
## Chain 2: Iteration: 2000 / 4000 [ 50%] (Warmup)
## Chain 2: Iteration: 2001 / 4000 [ 50%] (Sampling)
## Chain 1: Iteration: 2000 / 4000 [ 50%] (Warmup)
## Chain 1: Iteration: 2001 / 4000 [ 50%] (Sampling)
## Chain 3: Iteration: 3000 / 4000 [ 75%] (Sampling)
## Chain 4: Iteration: 3000 / 4000 [ 75%] (Sampling)
## Chain 2: Iteration: 3000 / 4000 [ 75%] (Sampling)
## Chain 1: Iteration: 3000 / 4000 [ 75%] (Sampling)
## Chain 3: Iteration: 4000 / 4000 [100%] (Sampling)
## Chain 3:
## Chain 3: Elapsed Time: 1.526 seconds (Warm-up)
## Chain 3: 1.644 seconds (Sampling)
## Chain 3: 3.17 seconds (Total)
## Chain 3:
## Chain 4: Iteration: 4000 / 4000 [100%] (Sampling)
## Chain 4:
## Chain 4: Elapsed Time: 1.544 seconds (Warm-up)
## Chain 4: 1.717 seconds (Sampling)
## Chain 4: 3.261 seconds (Total)
## Chain 4:
## Chain 2: Iteration: 4000 / 4000 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 1.574 seconds (Warm-up)
## Chain 2: 1.706 seconds (Sampling)
## Chain 2: 3.28 seconds (Total)
## Chain 2:
## Chain 1: Iteration: 4000 / 4000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 1.65 seconds (Warm-up)
## Chain 1: 1.698 seconds (Sampling)
## Chain 1: 3.348 seconds (Total)
## Chain 1:
## Warning: There were 1 divergent transitions after warmup. See
## https://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.
## Warning: Examine the pairs() plot to diagnose sampling problems結果
結果を表示します。Yの係数の事後平均は0.21と推定されました。car_rhoは空間自己相関の大きさ(0で自己相関なし、1でIntrinsic CARと同じ、のはず)。car_scaleはスケールパラメータとのことです。
print(fit)
## Spatial Model Results
## Formula: N ~ Y
## Likelihood: poisson
## Link: log
## Spatial method: CAR
## Residual Moran Coefficient: 0.095349
## Observations: 200
##
## Inference for Stan model: foundation.
## 4 chains, each with iter=4000; warmup=2000; thin=1;
## post-warmup draws per chain=2000, total post-warmup draws=8000.
##
## mean se_mean sd 2.5% 20% 50% 80% 97.5% n_eff Rhat
## intercept -1.583 0.008 0.283 -2.162 -1.818 -1.571 -1.341 -1.054 1311 1.001
## Y 0.211 0.001 0.038 0.140 0.179 0.209 0.242 0.287 2378 1.000
## car_rho 0.063 0.002 0.191 -0.306 -0.100 0.062 0.230 0.431 12441 1.000
## car_scale 1.528 0.010 0.242 1.053 1.329 1.525 1.730 2.004 579 1.009
##
## Samples were drawn using NUTS(diag_e) at Fri Feb 6 16:01:25 2026.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).geostanはRStanでMCMC計算をおこないます。あてはめ結果のオブジェクトのstanfit要素がrstan::stanの返り値となっています。
class(fit$stanfit)
## [1] "stanfit"
## attr(,"package")
## [1] "rstan"あてはめられた値の図示
結果から、あてはめられた値を取り出して図示してみます。
qgl$fitted_mean <- fitted(fit)$mean
ggplot(qgl, aes(x = X, y = Y)) +
geom_tile(aes(fill = fitted_mean)) +
scale_y_reverse(breaks = c(0, 5, 10)) +
scale_fill_gradient(name = "株数", low = "grey90", high = "grey10",
limits = c(0, 6)) +
coord_fixed(ratio = 1) +
theme_bw(base_family = "IPAexGothic")