Stanによる歪正規分布のあてはめ – 伊東生態統計研究室

Stanの確率分布には歪正規分布もありますので、ちょっとためしてみました。

歪正規分布

歪正規分布についての数学的な詳細は、WikipediaのSkew normal distributionなどをご覧いただきたいのですが、要は正規分布を左右非対称に歪ませたようなものです。

Rでは、snパッケージで扱うことができます。

snパッケージとあとで使うパッケージを読み込みます。乱数のシードも設定します。

library(sn)
## 要求されたパッケージ stats4 をロード中です
## 
## 次のパッケージを付け加えます: 'sn'
## 以下のオブジェクトは 'package:stats' からマスクされています:
## 
##     sd
library(ggplot2)
library(rstan)
## 要求されたパッケージ StanHeaders をロード中です
## 
## rstan version 2.32.7 (Stan version 2.32.2)
## For execution on a local, multicore CPU with excess RAM we recommend calling
## options(mc.cores = parallel::detectCores()).
## To avoid recompilation of unchanged Stan programs, we recommend calling
## rstan_options(auto_write = TRUE)
## For within-chain threading using `reduce_sum()` or `map_rect()` Stan functions,
## change `threads_per_chain` option:
## rstan_options(threads_per_chain = 1)

set.seed(123)

形状パラメータのαを変化させて確率分布関数を描画してみます。

ggplot(data.frame(x = seq(-4, 4, length = 101)), aes(x)) +
  geom_function(fun = dsn, args = list(alpha = 0), colour = "black") +
  geom_function(fun = dsn, args = list(alpha = -8), colour = "pink") +      
  geom_function(fun = dsn, args = list(alpha = -3), colour = "orange") +      
  geom_function(fun = dsn, args = list(alpha = -1), colour = "red") +
  geom_function(fun = dsn, args = list(alpha = 1), colour = "blue") +
  geom_function(fun = dsn, args = list(alpha = 3), colour = "purple") +
  geom_function(fun = dsn, args = list(alpha = 8), colour = "skyblue")

乱数発生とあてはめ

rsn関数で、歪正規分布にしたがう乱数を発生させることができます。

x <- rsn(400, xi = 0, omega = 1, alpha = 3)
ggplot(data.frame(x = x), aes(x)) +
  geom_histogram(binwidth = 0.2, boundary = 0)

発生させた乱数に、Stanで歪正規分布をあてはめ、パラメータを推定します。

Stanのモデルです。Stanにはskew_normalという分布が用意されていますので、これを使います。

コンパイル下モデルをstan_modelというオブジェクトに格納しておきます。

data {
  int<lower=0> N;
  vector[N] X;
}
parameters {
  real xi;
  real<lower=0> omega;
  real alpha;
}
model {
  X ~ skew_normal(xi, omega, alpha);
}

この記事ではRStudio上でコンパイルしていますが、たとえばモデルを"model.stan"というファイルに保存した場合には以下のようにします。

stan_model <- stan_model(file = "model.stan")

rstan::sampling関数であてはめをおこないます。

fit <- rstan::sampling(stan_model,
                       data = list(N = length(x), X = x),
                       chain = 4, iter = 2000, warmup = 1000,
                       refresh = 0)

結果

パラメータの事後分布の要約を表示します。

print(fit)
## Inference for Stan model: anon_model.
## 4 chains, each with iter=2000; warmup=1000; thin=1; 
## post-warmup draws per chain=1000, total post-warmup draws=4000.
## 
##         mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
## xi     -0.06    0.00 0.05  -0.15  -0.09  -0.06  -0.03   0.04  1221    1
## omega   1.03    0.00 0.05   0.93   0.99   1.02   1.06   1.13  1307    1
## alpha   3.50    0.02 0.59   2.50   3.09   3.46   3.86   4.75  1177    1
## lp__  -11.87    0.03 1.24 -15.13 -12.45 -11.54 -10.96 -10.44  1554    1
## 
## Samples were drawn using NUTS(diag_e) at Tue Mar 18 20:49:00 2025.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

だいたい元の値に近い値が推定されました。