Kwangmin Kim - Ch.19 § 19.4 심화 — Exercises: Mixture Prior Dilution·Golf Putting·Ill-posed 지수 합

1 개요

Ch.19 심화 시리즈의 마지막 편. § 19.4 연습문제 3개를 깊이 풀이한다.

Ch.19 시리즈 구성:

04-19-0 — Ch.19 Overview (3개 절 + Part V 전환 논리).
04-19-1 — § 19.1~19.3 (Serial dilution assay + Population toxicokinetics + 문헌).
04-19-2 (본편) — § 19.4 Exercises + Ch.19 총결산 + Ch.20 재예고.

3 연습문제가 다루는 주제는 비선형 베이즈 모델링의 공통 실전 이슈:

Exercise	핵심 주제	교훈
1. Mixture Prior Dilution	Hierarchical prior 확장, sparse concentration	“\(\theta_j = 0\) 을 데이터 기반으로 판별”
2. Golf Putting	Physical derivation of nonlinear model	물리 유도가 naive fitting 보다 정확
3. Ill-posed Exponential Sum	Non-identifiability, parameter swap	Flat prior 의 한계, 식별성 진단

직관: 3문제의 공통 메시지

Ch.19 의 핵심 교훈: “비선형 모델링 = 도메인 지식 + 베이즈 + 계산 고려 사항의 합주”.

Ex.1 — 도메인 (일부 샘플은 진짜 0 일 수 있음) 이 prior 구조를 결정.
Ex.2 — 도메인 (물리 역학) 이 모형 함수 형태를 결정.
Ex.3 — 수학 (parameter swap) 이 추론의 한계를 결정.

통합 메시지: 비선형 모델링에서는 “어떤 수식을 왜 쓰는가” 가 “어떻게 적합하는가” 보다 중요하다. 수식 선택을 도메인 지식 없이 하면 Ex.3 같은 ill-posed 문제에 빠지거나, Ex.1 같은 잘못된 prior 로 편향된다.

2 Exercise 1 — Serial Dilution with Mixture Prior

2.1 문제

§ 19.1 의 기본 serial dilution 모델:

\[ \log \theta_j \sim U(-\infty, \infty) \quad (\text{no pooling, any positive value}) \]

확장: 일부 미지 시료는 진짜 농도가 0 일 수 있다 (화합물이 아예 없음). 예: cockroach allergen 조사에서 allergen 이 없는 집.

Mixture prior:

\[ \theta_j = \begin{cases} 0 & \text{with probability } \pi \\ \theta_j^+ & \text{with probability } 1 - \pi, \; \log \theta_j^+ \sim N(\mu_\theta, \sigma_\theta^2) \end{cases} \]

2.2 (a) 기본 모형 (§ 19.1)

04-19-1 에서 이미 유도. 핵심 식:

\[ E(y | x, \beta) = \beta_1 + \frac{\beta_2}{1 + (x/\beta_3)^{-\beta_4}} \quad \text{(19.1)} \]

\(\theta_j\) flat on log scale. 결과: Unknown 8 (실제 거의 0) 의 posterior 가 0 에 가깝지만 >0 범위 (e.g., 0.0001~0.002).

2.3 (b) Mixture Prior 확장

Spike-and-slab 형태:

\[ p(\theta_j) = \pi \cdot \delta_0(\theta_j) + (1 - \pi) \cdot \text{LogNormal}(\theta_j | \mu_\theta, \sigma_\theta^2) \]

\(\pi\) = zero proportion (hyperparameter).

Latent indicator \(z_j \in \{0, 1\}\):

\[ z_j \sim \text{Bernoulli}(1 - \pi) \]

\[ \theta_j = z_j \cdot \theta_j^+, \quad \theta_j^+ \sim \text{LogNormal}(\mu_\theta, \sigma_\theta^2) \]

2.4 Hyperprior 선택

\(\pi \sim \text{Beta}(1, 1)\) (uniform on [0, 1]).
\(\mu_\theta \sim N(M_\theta, S_\theta^2)\) — 문맥 기반.
\(\sigma_\theta \sim \text{HalfNormal}\).

2.5 Gibbs Sampler

\(z_j | y_j, \theta, \beta, \pi\): Bernoulli posterior.

\(z_j = 1\) 확률:

\[ P(z_j = 1 | \cdot) = \frac{(1-\pi) \cdot p(y_j | \theta_j^+, \beta)}{\pi \cdot p(y_j | 0, \beta) + (1-\pi) \cdot p(y_j | \theta_j^+, \beta)} \]
\(\theta_j^+ | z_j = 1, y_j\): Metropolis on log scale.
\(\theta_j^+\) (\(z_j = 0\)): prior from \(N(\mu_\theta, \sigma_\theta^2)\).
\(\beta, \mu_\theta, \sigma_\theta, \pi\): standard conjugate or Metropolis.

2.6 (c) 두 모형 비교

같은 데이터에 두 모형 적합:

데이터 상황	기본 모형	Mixture 모형
모든 unknowns > 0	Similar	Similar (mixture 가 \(\pi \approx 0\) 추정)
Unknown 8 만 거의 0	\(\hat\theta_8 \approx 0.0005\)	\(P(z_8 = 0 \\| y) \approx 0.7\), \(\hat\theta_8 \approx 0.3 \times \text{small positive}\)
여러 unknowns = 0	모두 작은 양수	\(\pi\) 증가, 해당 \(z_j = 0\) 강하게 선호

Mixture 의 장점: “정말로 0 인 시료” 와 “매우 낮은 농도 시료” 를 구분.

기본 모형의 장점: 더 단순, \(\theta_j\) 가 연속 — MCMC 혼합 안정.

2.7 (d) 차이를 보이는 Dataset 설계

목표: 두 모형이 크게 다른 추론 을 주는 synthetic data.

전략:

Unknowns 10개 중 5개를 진짜 0, 나머지 5개는 \(\theta_j \in [0.01, 0.1]\).
표준 시료는 10개 희석 × 2 replicates (calibration 정확).
Serial dilution 정확하게 실행.

예상 결과:

기본 모형: 5개 zero unknowns 를 모두 “낮은 양수” 로 추정. 평균 ≈ 0.001~0.005 (noise-driven).
Mixture 모형: 5개 zero unknowns 에 \(z_j = 0\) 강력히 할당 (\(P > 0.95\)). Non-zero unknowns 은 거의 동일.

Extrapolation 위험: 기본 모형은 “Unknown 9 를 10 배 희석한 샘플” 예측 시 non-zero 결과 반환 (잘못). Mixture 는 정확히 0 예측.

2.8 Python 스케치

import numpy as np
import pymc as pm

# simulated data with 5 true zeros
theta_true = np.array([0.05, 0.02, 0, 0.08, 0, 0.01, 0, 0.04, 0, 0])
# ... (data generation as in 04-19-1) ...


# Mixture prior model
with pm.Model() as mixture_model:
    # 4PL parameters
    log_beta = pm.Normal("log_beta", 0, 2, shape=4)
    beta = pm.Deterministic("beta", pm.math.exp(log_beta))

    # mixture
    pi = pm.Beta("pi", 2, 2)  # ~uniform with slight concentration
    mu_theta = pm.Normal("mu_theta", -3, 2)
    sigma_theta = pm.HalfNormal("sigma_theta", 2)

    z = pm.Bernoulli("z", p=1 - pi, shape=10)
    log_theta_plus = pm.Normal("log_theta_plus", mu_theta, sigma_theta, shape=10)
    theta_plus = pm.math.exp(log_theta_plus)
    theta = pm.Deterministic("theta", z * theta_plus)

    # ... (likelihood) ...
    # x = theta[unk_idx] * dilution_i
    # y ~ Normal(fpl(x, beta), heteroscedastic_sigma)

주의: Bernoulli latent 는 HMC 불가능 → pm.sample 시 Mixture 로 marginalize 또는 SMC/Gibbs 대체.

3 Exercise 2 — Golf Putting Probability

3.1 데이터 — Table 19.1

Professional golfers 의 거리별 퍼팅 성공률. 거리 2~20 feet, 각 거리에서 시도 수와 성공 수.

Distance (ft)	Tries	Successes	\(\hat p\)
2	1443	1346	0.933
3	694	577	0.831
5	353	208	0.589
10	200	67	0.335
15	167	28	0.168
20	152	24	0.158

거리 → 성공률 감소. 패턴은 비선형.

3.2 (a) Nonlinear Binomial Model

Naive 접근: 로지스틱 회귀.

\[ \text{logit}(p) = \alpha + \beta \cdot \text{distance} \]

문제:

거리 0 에서는 \(p \to 1\) 이어야 자연 (가까이면 반드시 성공). 로지스틱은 \(\alpha\) 큰 양수.
거리 매우 큼에서 \(p \to 0\). OK.
그러나 중간 거리 에서 피팅이 naive sigmoid 로는 부정확.

Berry-Nolan geometric 모형 (교재 미포함 agent 지식):

물리 유도: 퍼팅 성공은 두 조건 동시 만족:

각도 오차 \(\theta\) 가 \(|θ| < \theta_{\text{thresh}}\) 이내.
(선택) 속도가 너무 세지 않음 — 간단화 위해 생략.

각도 오차 \(\theta \sim N(0, \sigma^2)\) 로 가정.

\(\theta_{\text{thresh}}\) 계산: 공과 홀의 기하:

\[ \theta_{\text{thresh}}(d) = \arctan\left( \frac{R - r}{d} \right) \]

\(R\) = hole radius (2.25 inches).
\(r\) = ball radius (0.84 inches).
\(d\) = 거리.

성공 확률:

\[ p(d) = P(|θ| < \theta_{\text{thresh}}(d)) = 2\Phi\left( \frac{\arctan((R-r)/d)}{\sigma} \right) - 1 \]

3.3 베이즈 적합

모형:

\[ y_i \sim \text{Binomial}(n_i, p(d_i)) \]

\[ p(d_i) = 2\Phi\left( \frac{\arctan((R-r)/d_i)}{\sigma} \right) - 1 \]

Unknown: \(\sigma\) (각도 오차 표준편차).

Prior: \(\sigma \sim \text{HalfNormal}(1°)\) 또는 \(\text{Uniform}(0, \text{10°})\) (weakly informative).

3.4 적합 결과 (Gelman 재분석)

\(\hat\sigma \approx 1.5°\) — professional 의 각도 정밀도.
모든 거리에서 fit 양호.
Extrapolation: 1 ft 이하는 거의 100%, 30 ft 이상은 거의 0%.

3.5 Naive vs Physical 비교

거리	관측 \(\hat p\)	Logistic fit	Physical fit
2	0.933	0.92	0.94
5	0.589	0.58	0.62
10	0.335	0.28	0.34
15	0.168	0.14	0.17
20	0.158	0.08	0.12

Physical 모형:

거리 20 ft 에서 더 잘 fit (logistic 은 과소 예측).
Parameter 개수 1 (logistic 은 2) — 더 parsimonious.
\(\sigma\) 의 물리적 해석 — “프로의 각도 정밀도” 측정.

직관: 물리 유도 의 힘

Naive logistic: 2 parameter 피팅. 데이터에 의해 지배.

Physical model: 1 parameter 피팅. 물리 법칙이 함수 형태를 고정.

두 모형이 같은 데이터에 fit 될 수 있지만:

Physical 이 extrapolation 우수: 1 ft 나 30 ft 데이터 없어도 합리적 예측.
Physical 이 해석 가능: \(\sigma\) = 각도 정밀도 → 코칭에 직접 활용.
Physical 이 parsimonious: Occam’s razor 측면에서 선호.

일반 교훈: 비선형 모델 선택 시 “물리·도메인 기반 수식” > “유연한 함수형” (logistic, polynomial 등).

3.6 (b) Posterior Predictive Check

Test statistics:

각 거리별 expected vs observed:

\[ T_{\text{bin}}(y, p) = \sum_i \frac{(y_i - n_i p(d_i))^2}{n_i p(d_i)(1 - p(d_i))} \]

(Chi-square-like)
Long distances 에서 underestimate 여부: 15 ft 이상의 observed > predicted 개수.

예상 결과: Physical 모형은 \(T\) 의 posterior predictive p-value ≈ 0.3~0.7 (양호). Logistic 은 \(p \approx 0.05\) 이하 (장거리 misfit).

3.7 Python 구현

import numpy as np
import pymc as pm
from scipy.stats import norm as normal

# Table 19.1 data
distance = np.array([2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])
tries = np.array([1443, 694, 455, 353, 272, 256, 240, 217, 200, 237, 202, 192, 174, 167, 201, 195, 191, 147, 152])
successes = np.array([1346, 577, 337, 208, 149, 136, 111, 69, 67, 75, 52, 46, 54, 28, 27, 31, 33, 20, 24])

# physical parameters
R = 2.25 / 12  # hole radius in feet
r = 0.84 / 12 / 2  # ball radius (cap fits if center within R - r)

with pm.Model() as golf_model:
    sigma_angle = pm.HalfNormal("sigma_angle", 0.1)  # in radians (~5.7 deg)

    threshold = pm.math.arctan((R - r) / distance)
    p = 2 * 0.5 * (1 + pm.math.erf(threshold / sigma_angle / pm.math.sqrt(2))) - 1

    pm.Binomial("y", n=tries, p=p, observed=successes)

    trace = pm.sample(2000, tune=1500, target_accept=0.95, chains=4)


import arviz as az
print(az.summary(trace, var_names=["sigma_angle"]))
print(f"\nsigma_angle in degrees: {np.rad2deg(trace.posterior['sigma_angle'].mean()):.2f}°")

예상 출력:

             mean     sd  hdi_3%  hdi_97%  r_hat
sigma_angle  0.026  0.001   0.024    0.028   1.00

sigma_angle in degrees: 1.51°

프로 골퍼의 각도 정밀도 ~1.5° — 실제 분석 결과와 일치.

4 Exercise 3 — Ill-posed Exponential Sum

4.1 문제

\[ y_i \sim N(A e^{-\alpha_1 x_i} + B e^{-\alpha_2 x_i}, \sigma^2) \]

\(x_i \sim U[0, 10]\) uniform. 참값 \((A, \alpha_1, B, \alpha_2, \sigma)\).

Prior: \(\log A, \log \alpha_1, \log B, \log \alpha_2 \sim U(-\infty, \infty)\) (flat on log).

4.2 핵심 문제 — Non-identifiability

Parameter swap:

\[ A e^{-\alpha_1 x} + B e^{-\alpha_2 x} = B e^{-\alpha_2 x} + A e^{-\alpha_1 x} \]

즉 \((A, \alpha_1) \leftrightarrow (B, \alpha_2)\) 완전 swap 불변. Posterior 가 2-modal.

4.3 \(\alpha_1 \approx \alpha_2\) 일 때 더 심각

만약 참값에서 \(\alpha_1 \approx \alpha_2\) 이면:

\[ A e^{-\alpha_1 x} + B e^{-\alpha_2 x} \approx (A + B) e^{-\alpha_1 x} \]

두 지수를 하나로 합칠 수 있음 → \(A\) 와 \(B\) 를 개별 식별 불가. \((A+B)\) 와 \(\alpha\) 만 추정 가능.

4.4 (a) Flat Prior 적합

MCMC 결과:

\(n\) 적음 (< 50): 심한 multimodality, \(\hat R\) 수렴 나쁨, 후속 분석 불가.
\(n\) 중간 (50~500): 참값 주변 + swap 에 mass. 평균 추정 편향.
\(n\) 많음 (> 1000): 주로 참값 mode에 집중, swap mode mass 작음.

4.5 (b) \(n\) 에 따른 식별성

실험: 다양한 \(n\) 에서 Bayesian inference 실행. “참값 정확히 회복” 기준으로 필요 \(n\) 추정.

\((A, \alpha_1, B, \alpha_2) = (1, 0.5, 2, 1.5)\) (separable):

\(n = 50\): 구간 \(\pm 30\%\) (부정확).
\(n = 200\): 구간 \(\pm 10\%\).
\(n = 1000\): 구간 \(\pm 3\%\) (정확).

\((A, \alpha_1, B, \alpha_2) = (1, 0.7, 2, 0.8)\) (close exponents):

\(n = 1000\): 여전히 \(\pm 20\%\).
\(n = 10000\): \(\pm 10\%\).

교훈: 지수가 close 할수록 훨씬 많은 데이터 필요.

4.6 Identifiability 해결책

1. Order constraint:

\[ \alpha_1 < \alpha_2 \]

Swap symmetry 제거. \(\psi = \log \alpha_2 - \log \alpha_1 > 0\) 로 reparameterize.

2. Informative prior:

\[ \alpha_1 \sim \text{LogNormal}(\log 0.5, 0.3), \quad \alpha_2 \sim \text{LogNormal}(\log 1.5, 0.3) \]

두 parameter 가 다른 영역에 있다는 사전 정보.

3. 더 많은 데이터 또는 실험 설계 변경:

\(x\) 범위 확장 (긴 꼬리 관측).
Initial condition 변화 (두 exponential 의 amplitude 분리).

4.7 Python 실험

import numpy as np
import pymc as pm

def simulate_and_fit(n, A_true=1, a1_true=0.5, B_true=2, a2_true=1.5, sigma_true=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 10, n)
    y = A_true * np.exp(-a1_true * x) + B_true * np.exp(-a2_true * x) + sigma_true * rng.standard_normal(n)

    with pm.Model():
        # order constraint
        log_A = pm.Normal("log_A", 0, 2)
        log_B = pm.Normal("log_B", 0, 2)
        log_a1 = pm.Normal("log_a1", -0.5, 1)
        # enforce alpha_2 > alpha_1
        log_ratio = pm.HalfNormal("log_ratio", 1)
        log_a2 = pm.Deterministic("log_a2", log_a1 + log_ratio)

        mu = pm.math.exp(log_A) * pm.math.exp(-pm.math.exp(log_a1) * x) \
           + pm.math.exp(log_B) * pm.math.exp(-pm.math.exp(log_a2) * x)
        sigma = pm.HalfNormal("sigma", 0.5)

        pm.Normal("y", mu=mu, sigma=sigma, observed=y)
        trace = pm.sample(1500, tune=1500, target_accept=0.95, chains=4, progressbar=False)

    return trace


for n in [50, 200, 1000]:
    print(f"\n=== n = {n} ===")
    trace = simulate_and_fit(n)
    summary = az.summary(trace, var_names=["log_A", "log_B", "log_a1", "log_a2", "sigma"])
    print(summary[["mean", "sd", "hdi_3%", "hdi_97%", "r_hat"]])

예상 결과:

\(n = 50\): divergence 많음, \(\hat R > 1.1\) for some params.
\(n = 200\): 수렴 양호, 참값 대략 복원.
\(n = 1000\): 정확히 복원.

4.8 교훈 — Ill-posed 모델링

근본 원인: Parameter swap symmetry + exponent 유사성.

진단:

Pair-wise posterior scatter 에서 bimodality 확인.
\(\hat R > 1.1\) for swap-symmetric parameters.
Trace plot 에서 chain 별 다른 mode.

대응:

Order constraint (가장 간단).
Informative prior (도메인 지식).
더 많은 데이터 또는 설계 변경.
Reparameterize (sum/difference 형태).

5 Ch.19 시리즈 총결산

5.1 3편 시리즈 지도

[04-19-0 Overview]
    ↓ 3개 절 + Part V 전환
[04-19-1 § 19.1~19.3 심화]
    ↓ Serial dilution + PBPK + 5 필수 요소
[04-19-2 § 19.4 연습] (본편)
    ↓ Mixture prior + Golf + Ill-posed
    ↓ Ch.19 시리즈 완결

5.2 Ch.19 핵심 교훈 요약

비선형 베이즈 모델링의 5 원칙:

도메인 유도: 수식은 물리·화학·생물학에서 나와야 한다 (Ex.2 골프 퍼팅).
Informative prior: 복잡 모델에는 강한 prior 필수 (PBPK 15 parameter).
Reparameterization: 제약 처리와 identifiability 확보 (softmax 식 19.5, order constraint Ex.3).
Identifiability 진단: Prior vs posterior, pair-wise scatter, multi-chain \(\hat R\).
External validation: Independent dataset 으로 모형 한계 노출 (Figure 19.10).

5.3 Ch.19 통합 체크리스트

도메인 수식 유도 명확화.
Parameter 의 물리적 의미 기술.
제약 → reparameterization.
Informative prior 설계 (hierarchical 고려).
HMC/NUTS 또는 Gibbs+Metropolis.
초기값 crude estimate + overdispersed start.
Parameter 상관 강하면 재매개변수화.
\(\hat R, \text{ESS}\) 진단 철저.
Pair-wise posterior scatter 로 multimodality 탐지.
Posterior predictive check.
External validation 가능하면 수행.
Extrapolation 한계 명시.
그래픽 중심 해석 + 결과 보고.

6 Part V 다음 편 예고 — Ch.20 Basis Function Models

Ch.19 완결. 다음 Ch.20 Basis Function Models:

\[ f(x) = \sum_{k=1}^K \beta_k B_k(x) \]

\(B_k\) = basis function (B-spline, wavelet, Fourier).

6.1 Ch.19 vs Ch.20

측면	Ch.19 Parametric Nonlinear	Ch.20 Basis Function
함수 형태	미리 정해진 (4PL, ODE)	유연 (basis 합)
Parameter 해석	물리적 의미 명확	계수 해석 어려움
계산	비선형 최적화·HMC	선형 회귀 (계수)
Prior	Informative (도메인)	Shrinkage (Ridge/LASSO)
외삽	물리 법칙으로 안정	Extrapolation 위험
응용	약동학·assay	곡선 적합·함수 추정

Ch.20 의 이점: 선형 계수 추정 으로 환원 → Ch.14 의 기계 재사용. 비선형성은 \(B_k(x)\) 에 담김.

Ch.21 Gaussian Processes: Ch.20 의 \(K \to \infty\) 무한 차원 일반화.

7 관련 주제

Ch.19 시리즈 전체

Part V 다음

Ch.20 Basis Function Models (예정)
Ch.21 Gaussian Processes (예정)

관련 개념 (cross-category)

8 참고문헌

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian Data Analysis (3rd ed.), Ch.19 § 19.4. CRC Press.
Berry, D. A. (1996). Statistics: A Bayesian Perspective. Duxbury Press.
Gelman, A., & Nolan, D. (2002). A Probability Model for Golf Putting. Teaching Statistics, 24, 93-95.
Gelman, A. (2019). Model building and expansion for golf putting. Stan case study — 베이즈 비선형 모델링 실무 튜토리얼.
Donnet, S., & Samson, A. (2013). A Review on Estimation of Stochastic Differential Equations for Pharmacokinetic/Pharmacodynamic Models. Advanced Drug Delivery Reviews, 65, 929-939.
Carvalho, C. M., Polson, N. G., & Scott, J. G. (2010). The Horseshoe Estimator for Sparse Signals. Biometrika, 97, 465-480. (sparse prior 대안)