geostanパッケージでCARモデルのあてはめ

先月、「StanでCARモデル」という記事を書いたのですが、よくよく調べるとgeostanというパッケージがありました。今回は、geostanパッケージをつかってCAR (Conditional AutoRegressive) モデルのあてはめをおこないます。

準備

必要なライブラリの読み込みと擬似乱数系列の固定です。

library(geostan)
## This is geostan version 0.6.2
library(readr)
library(ggplot2)
set.seed(123)

データ

前回と同じく、岩波データサイエンスVol.1で使用した、Y方向10×X方向20のグリッド中のアラカシの株数データを使用します。今回はreadrパッケージのread_csv関数で読み込んでいます。

また、後にでてくるprep_car_data2関数により生成される隣接行列では、データのX軸方向が先に変化するようになっていますので、それにあわせてデータをならべかえています。

qgl <- read_csv(url("https://raw.githubusercontent.com/iwanami-datascience/vol1/master/ito/Qglauca.csv")) |>
  dplyr::arrange(Y, X)
## Rows: 200 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): X, Y, N
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

図示

今回もまずはデータをプロットします。

ggplot(qgl, aes(x = X, y = Y)) +
  geom_tile(aes(fill = N)) +
  scale_y_reverse(breaks = c(0, 5, 10)) +
  scale_fill_gradient(name = "株数", low = "grey90", high = "grey10",
                      limits = c(0, 6)) +
  coord_fixed(ratio = 1) +
  theme_bw(base_family = "IPAexGothic")

モデル

CARモデル用の空間データの生成

prep_car_data2関数で、CARモデル用の空間データを生成します。prep_car_data2関数はラスターデータ用で、一般にはprep_car_dataを使います。

car_data <- prep_car_data2(row = 10, col = 20)
## Range of permissible rho values: -1, 1

あてはめ

stan_car関数でモデルをあてはめます。今回はY方向のグリッド位置の値を説明変数として与えるようにしました。目的変数は、グリッド内の株数という計数値データなので、誤差構造をポアソン分布、リンク関数を対数としています。

また、geostanにはstan_icarという関数もあって、Intrinsic CARモデルのほか、BYMモデルのあてはめもできます。

fit <- stan_car(N ~ Y, data = qgl, car_parts = car_data,
                family = poisson(link = log),
                iter = 4000, refresh = 1000, cores = 4)
## 
## *Setting prior parameters for intercept
## Distribution: normal
##   location scale
## 1     0.27     5
## 
## *Setting prior parameters for beta
## Distribution: normal
##   location scale
## 1        0     5
## 
## *Setting prior for CAR scale parameter (car_scale)
## Distribution: student_t
##   df location scale
## 1 10        0     3
## 
## *Setting prior for CAR spatial autocorrelation parameter (car_rho)
## Distribution: uniform
##   lower upper
## 1    -1     1

結果

結果を表示します。Yの係数の事後平均は0.21と推定されました。car_rhoは空間自己相関の大きさ（0で自己相関なし、1でIntrinsic CARと同じ、のはず）。car_scaleはスケールパラメータとのことです。

print(fit)
## Spatial Model Results 
## Formula: N ~ Y
## Spatial method (outcome):  CAR 
## Likelihood function:  poisson 
## Link function:  log 
## Residual Moran Coefficient:  0.09501938 
## WAIC:  512.75 
## Observations:  200 
## Data models (ME): none
## Inference for Stan model: foundation.
## 4 chains, each with iter=4000; warmup=2000; thin=1; 
## post-warmup draws per chain=2000, total post-warmup draws=8000.
## 
##             mean se_mean    sd   2.5%    20%    50%    80%  97.5% n_eff  Rhat
## intercept -1.582   0.008 0.284 -2.189 -1.814 -1.568 -1.342 -1.058  1421 1.002
## Y          0.210   0.001 0.037  0.141  0.179  0.209  0.241  0.286  2723 1.001
## car_rho    0.063   0.002 0.194 -0.320 -0.099  0.063  0.230  0.440 12455 1.000
## car_scale  1.527   0.010 0.246  1.059  1.319  1.515  1.734  2.027   590 1.004
## 
## Samples were drawn using NUTS(diag_e) at Thu Aug 29 14:05:07 2024.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

geostanはRStanでMCMC計算をおこないます。あてはめ結果のオブジェクトのstanfit要素がrstan::stanの返り値となっています。

class(fit$stanfit)
## [1] "stanfit"
## attr(,"package")
## [1] "rstan"

あてはめられた値の図示

結果から、あてはめられた値を取り出して図示してみます。

fitted_mean <- rstan::summary(fit$stanfit, pars = "fitted")$summary[, "mean"]
qgl$fitted_mean <- fitted_mean
ggplot(qgl, aes(x = X, y = Y)) +
  geom_tile(aes(fill = fitted_mean)) +
  scale_y_reverse(breaks = c(0, 5, 10)) +
  scale_fill_gradient(name = "株数", low = "grey90", high = "grey10",
                      limits = c(0, 6)) +
  coord_fixed(ratio = 1) +
  theme_bw(base_family = "IPAexGothic")