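1 Introduction: The Value of Mixed-Effects Poisson Models in Policy Analysis

§ 12.4 to 12.4.1 (post 12-3) and § 12.4.2 to 12.4.3 (post 12-4) covered the estimation framework for mixed-effects Poisson regression with empirical Bayes (EB). § 12.5 applies that framework directly to policy analysis: the study of U.S. county-level suicide rates by Gibbons et al. (2005).

One-line summary

“§ 12.5 is the Gibbons et al. (2005) analysis of U.S. county suicide rates. Using NCHS and IMS pharmacy data (1996-1998), a mixed-effects Poisson model is fitted to suicide counts by county × age × sex × race. County random effects on the intercept and the drug effects capture between-county heterogeneity and variation in prescribing. Key findings: higher TCA use is associated with higher suicide rates (MLE = +0.20, p < .001), and higher SSRI + non-SSRI use with lower suicide rates (MLE = -0.15, p < .001). Hypothetical scenarios: eliminating TCA use predicts a 33% lower suicide rate; eliminating SSRI/non-SSRI use predicts a 50% higher rate. After adjusting for income the effects shrink but remain significant, so confounding is only partial. A Mundlak decomposition (§ 4) separates within-county from between-county effects: the TCA within effect is not significant while its between effect is strong, so TCA use looks like a marker of care quality rather than a cause. The non-TCA within effect is strongly significant, which makes a causal effect more plausible. Policy: a statistical basis for switching from TCAs to SSRIs in geriatric care.”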

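2 The Statistical Burden of Suicide and the Policy Motivation

The scale of suicide

Statistics quoted in the text (Goldsmith et al. 2002):

Worldwide: about 1 million suicides per year.
US: about 750,000 over 25 years (suicides outnumber homicides by more than 3:2).
Compared with AIDS: 200,000 more deaths from suicide than from AIDS over the past 20 years.
Economic cost: $11.7 billion in lost income alone.
Causes: more than 90% are associated with psychiatric illness.

→ Suicide is a hidden public health crisis.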

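Limits of RCTs: why suicide is hard to study with an RCT

Difficulties for an RCT of suicide prevention:

Low base rate: suicide is rare (about 12 per 100,000 per year).
Large samples: adequate statistical power requires tens to hundreds of thousands of participants.
Ethics: it is hard to justify randomizing patients at risk of suicide to placebo.
Time: long follow-up is required.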
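
To make the sample-size point concrete, here is a minimal sketch using base R's `power.prop.test`. The design is an assumption made for illustration and is not taken from the source: a two-arm trial with one year of follow-up, a control-arm risk equal to the base rate of about 12 per 100,000, and a treatment that halves that risk.

```r
# Sample size per arm for a two-arm trial with completed suicide as the endpoint.
# Assumptions (hypothetical, not from the source): one year of follow-up,
# control-arm risk = 12 per 100,000, treatment halves the risk.
p_control <- 12 / 1e5
p_treated <- 6 / 1e5

res <- power.prop.test(p1 = p_control, p2 = p_treated,
                       sig.level = 0.05, power = 0.80)
n_per_arm <- ceiling(res$n)
cat("Participants needed per arm:", format(n_per_arm, big.mark = ","), "\n")
```

Under these assumptions the required size lands in the hundreds of thousands per arm, and a smaller treatment effect would push it higher still.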

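→ Population-level observational studies are the alternative:

Exploit natural experiments (policy changes, introduction of new drugs, and so on).
Use counties as the clustering unit.
Analyze the clustered counts with a mixed-effects Poisson model.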

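Intuition: why counties are natural clusters

Why the U.S. county is a suitable unit of analysis:

Health system variation: medical infrastructure differs by county.
Policy differences: mental health policy differs across states and counties.
Population composition: demographic diversity (urban vs rural).
Medical resources: psychiatrists are unevenly distributed across counties.
Prescribing patterns: prescription practice differs by county.

This unobserved county-level heterogeneity is absorbed by the county random effects.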

3 Data

3.1 NCHS Suicide Data

Data structure

Source: National Center for Health Statistics, 1996-1998.

Unit: U.S. county.

Stratification:

Age: 5 그룹 (5-14, 15-24, 25-44, 45-64, 65+).
Sex: Male, Female.
Race: African American (Black), Other.

→ 20 그룹 (5 age × 2 race × 2 sex).

Response (count):

\[ y_{\text{county}, \text{age}, \text{race}, \text{sex}} = \text{number of suicides in 1996-1998} \]

Population denominator:

\[ P_{\text{county}, \text{age}, \text{race}, \text{sex}} = \text{population in that group} \]

→ Suicide rate = \(y / P\) × 100,000.

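Intuition: why stratified counts are natural here

What stratification means in a suicide analysis:

Age: suicide rates depend strongly on age (highest among the elderly).
Sex: the male rate is about 4 times the female rate (US statistics).
Race: African Americans have lower rates than the Other group.

→ Leaving these demographic effects uncontrolled would confound the drug effects.

Stratified counts by county:

Each county has 20 cells.
Suicides per cell range from 0 to several hundred.
→ A Poisson distribution is the natural choice (counts of rare events).

Offset term:

\(\log(P)\) enters as an offset (rate model).
With \(\lambda = E[y]\), \(\log(\text{rate}) = \log(\lambda / P) = \log(\lambda) - \log(P)\).
Regression: \(\log(\lambda) = \log(P) + x'\beta\), where the coefficient of \(\log(P)\) is fixed at 1.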
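
The offset idea can be sketched with a plain Poisson `glm` in base R. The data below are simulated, and the names `pop` and `x` and the chosen true values are assumptions for illustration only.

```r
set.seed(1)
n <- 500
pop <- round(runif(n, 1e4, 1e6))      # hypothetical cell populations
x <- rnorm(n)                         # hypothetical covariate
true_beta <- c(log(12 / 1e5), 0.20)   # baseline rate 12 per 100,000; slope 0.20
y <- rpois(n, pop * exp(true_beta[1] + true_beta[2] * x))

# offset(log(pop)) fixes the coefficient of log(pop) at 1,
# so the remaining coefficients describe the log rate, not the log count.
fit <- glm(y ~ x + offset(log(pop)), family = poisson)
est <- unname(coef(fit))
cat("Baseline rate per 100,000:", round(exp(est[1]) * 1e5, 2), "\n")
cat("Slope on x:", round(est[2], 3), "\n")
```

Without the offset, the intercept and slope would describe expected counts, and cells with larger populations would look riskier simply because they are larger.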

3.2 IMS Pharmacy Data

Prescription data

Source: IMS database.

Sampling: a random sample of 20,000 of the roughly 36,000 U.S. pharmacies (more than 50%).

Measure: number of prescriptions per county (1996-1998).

Three antidepressant subclasses:

TCAs (tricyclic antidepressants): older drugs with many side effects, lethal in overdose.
SSRIs (selective serotonin reuptake inhibitors): citalopram, paroxetine, fluoxetine, fluvoxamine, sertraline.
Other non-SSRI: nefazodone, mirtazapine, bupropion, venlafaxine.

Covariate transformation:

\[ \text{Drug covariate} = \log(\text{pills per person per year}) \]

Reasons:

1. Adjusts for population size.
2. Reduces the influence of counties with extremely low or high prescribing (the effect of the log).

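Intuition: the clinical rationale for splitting by subclass

TCAs:

Developed in the 1950s-60s (older drugs).
Effective antidepressants.
Many side effects: anticholinergic effects, sedation, weight gain.
Lethal in overdose (cardiac toxicity in particular), so they can be used in a suicide attempt.
Still prescribed in areas with a lower standard of care.

SSRIs:

Introduced in the 1980s-90s (modern drugs).
Fewer side effects than TCAs.
Low lethality in overdose, so poorly suited as a means of suicide.
First-line under the modern standard of care.

→ The shift from TCAs to SSRIs is a marker of improving quality of care.

Other non-SSRI: newer agents, most of them introduced in the 1990s, and safer than TCAs.

This classification is central to the § 12.5 analysis: what matters is the split by subclass, not a single “antidepressant” variable.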

4 Model Specification

4.1 Mixed-Effects Poisson

Model definition

Response: \(y_{ij}\) = number of suicides in stratum \(j\) (age × sex × race) of county \(i\).

Mixed-effects Poisson (an application of Eq. 12.20):

\[ \log(\lambda_{ij}) = \log(P_{ij}) + \beta_{0i} + \beta_{\text{TCA},i} \log(\text{TCA}_{ij}) + \beta_{\text{nonTCA},i} \log(\text{nonTCA}_{ij}) + \alpha^\top z_{ij} \]

Notation:

\(\log(P_{ij})\): offset (population).
\(\beta_{0i}\): county-specific random intercept.
\(\beta_{\text{TCA},i}, \beta_{\text{nonTCA},i}\): county-specific random drug effects.
\(z_{ij}\): age × sex × race indicators.
\(\alpha\): demographic fixed effects.

Random effects:

Intercept plus two drug effects → 3 random effects per county.
Assumed multivariate normal (trivariate here, since there are three random effects).

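Intuition: the clinical meaning of random slopes

Random intercept (\(\beta_{0i}\)):

Differences in baseline suicide rate across counties (after demographic adjustment).
A summary of the overall quality of the local health system.

Random drug slopes (\(\beta_{\text{TCA},i}, \beta_{\text{nonTCA},i}\)):

“The county-specific strength of the effect of TCA use on the suicide rate in this county.”
Heterogeneity of drug effects across counties.

→ The model allows drug effects to differ by county:

The same drug can have a different effect in each county (because care environments differ).
Empirical Bayes yields county-specific drug effect estimates.

Policy decomposition (quoted from the text):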

“they estimated county-specific changes in suicide rates attributable to changes in antidepressant drug use … The effect of policy changes (e.g., adding or eliminating a particular type of antidepressant medication) was then estimated by accumulating the county-specific estimates over all counties.”

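→ Estimate county-level effects, then sum them → simulate the national effect of a policy.

This is the main payoff of the mixed-effects approach: county-specific estimates that can also be aggregated to the national level.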

4.2 Three Models: A Stepwise Extension

Comparing the three models

Model 1 (Base):

Fixed: age, sex, race, antidepressant use.
Random: county intercept + drug slopes.

Model 2 (+ Income):

Adds county median income to Model 1.
Hypothesis: “The antidepressant effect is confounded by access to care.”

Model 3 (Mundlak decomposition, the within/between split from § 4):

Split each time-varying covariate for a county into two parts:
- Between: the county-specific mean (differences between counties).
- Within: the deviation from the county mean over time (change over time).
\(\log(\text{TCA}_{ij}) = \overline{\log(\text{TCA})}_i + (\log(\text{TCA}_{ij}) - \overline{\log(\text{TCA})}_i)\)
Enter the two terms as separate covariates.

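Intuition: the value of the Mundlak decomposition for causal inference

The core of the Mundlak decomposition (Mundlak 1978) from § 4:

Between effect: how does the suicide rate differ for counties that use more TCAs on average? (cross-sectional variation)
Within effect: how does the suicide rate change when the same county increases its TCA use over time? (longitudinal variation)

Confounding:

The between effect can be confounded by unobserved county characteristics (quality of care, population composition, and so on).
The within effect compares a county with itself, so time-invariant county characteristics are controlled (as with a county fixed effect).

→ The within effect is closer to a causal effect (from a potential-outcomes point of view).

The authors' results (detailed in § 8):

TCA between: strongly significant (p < .0001), reflecting differences between counties.
TCA within: not significant (p < .39), so changes over time are unrelated to the suicide rate.
→ TCA use is more likely a marker (of care quality) than a cause.
Non-TCA within: strongly significant (p < .0001), meaning suicide rates fell in counties that increased non-TCA use over time.
→ A causal effect of non-TCA drugs is more plausible.

This decomposition is decisive for policy: a single pooled regression coefficient cannot separate the two effects.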

5 Goodness of Fit: Table 12.1

Table 12.1: observed vs predicted suicide rates (excerpt)

Selected cells from the 20 strata:

Group	Suicides	Population	Observed rate	Mixed-effects predicted rate	GEE predicted rate
5-14, Black, M	79	9.3M	9e-6	10e-6	9e-6
15-24, Other, M	9,482	47.9M	1.98e-4	2.22e-4	1.98e-4
25-44, Other, M	27,209	109.1M	2.49e-4	2.83e-4	2.50e-4
45-64, Other, M	17,358	72.7M	2.39e-4	2.67e-4	2.39e-4
65+, Other, M	14,074	38.9M	3.62e-4	3.98e-4	3.61e-4
65+, Black, F	83	5.1M	1.6e-5	1.4e-5	1.3e-5

Overall: 91,673 observed suicides (1996-1998) vs 90,973 predicted by the mixed-effects model, a difference of 0.76%.
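
The observed rates and the 0.76% figure can be checked with a few lines of R. The counts and populations are copied from the excerpt above; the column names are just labels chosen for this sketch.

```r
# Counts and populations copied from the excerpt of Table 12.1.
tab <- data.frame(
  group = c("5-14, Black, M", "15-24, Other, M", "25-44, Other, M",
            "45-64, Other, M", "65+, Other, M", "65+, Black, F"),
  suicides = c(79, 9482, 27209, 17358, 14074, 83),
  population = c(9.3e6, 47.9e6, 109.1e6, 72.7e6, 38.9e6, 5.1e6)
)
tab$rate_per_100k <- tab$suicides / tab$population * 1e5
print(tab[, c("group", "rate_per_100k")], digits = 3)

# Relative gap between the observed and mixed-effects predicted totals.
observed_total <- 91673
predicted_total <- 90973
pct_diff <- 100 * (observed_total - predicted_total) / observed_total
cat("Difference in totals:", round(pct_diff, 2), "%\n")
```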

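Intuition: comparing the fit of the mixed-effects and GEE models

Both fit very well: the mixed-effects predicted total is within 0.76% of the observed total, and the GEE predicted rates track the observed rates closely.

Key difference (quoted from the text):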

“the mixed-effects models provide county-specific covariate adjusted suicide rate estimates whereas the GEE model only provides overall population suicide rates.”

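→ Mixed-effects: county-specific estimates are available (via EB). GEE: marginal (population-level) estimates only.

Advantages of the mixed-effects model for policy decisions:

Identifying high-risk counties (resource allocation).
Simulating hypothetical scenarios county by county.
Analyzing heterogeneous treatment effects.

GEE gives only the average effect, which limits its use for targeting policy.

Marginal vs subject-specific trade-off:

GEE: marginal interpretation (national average).
Mixed: subject-specific (county-specific, conditional).
Both frameworks are useful; the choice depends on the question.

Takeaway from § 12.5: the mixed-effects model is more informative, and GEE serves as a check.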

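Clinical interpretation of the pattern

Suicide rate patterns in Table 12.1:

Age: rates rise with age (highest for males aged 65+ in the Other race group, 0.000362).
Sex: Male >> Female (about 4-7 times).
Race: Other > African American (about 2-3 times).

These are the standard demographic features of suicide in the U.S.:

Elderly men in the Other (non-Black, predominantly white) group are at highest risk.
Black women have the lowest rates.
Social, cultural, and historical factors are involved (access to firearms, social isolation, protective effects of religion, and so on).

The mixed-effects model reproduces this pattern, which suggests the demographic fixed effects are well estimated.

Additional contribution of the county random effects:

County differences that remain after adjusting for demographics.
Quantifies statements such as “this county has a higher-than-average suicide rate.”
Helps prioritize policy interventions.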

6 Drug Effect Results

6.1 Overall Antidepressant Effect

First model: a single antidepressant covariate

All antidepressants combined (sum of the 3 subclasses):

\[ \log(\lambda_{ij}) = \log(P_{ij}) + \beta_0 + \beta_{\text{AD}} \log(\text{Total AD}) + \alpha^\top z_{ij} + \beta_{0i} \]

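Result: \(Z = -1.46, p < .14\), not significant.

→ Total antidepressant use alone shows no detectable association with the suicide rate.

Intuition: information lost through aggregation

Lumping all antidepressants into one variable:

TCAs (hypothesized positive effect) and SSRIs (hypothesized negative effect) are added together.
The two opposing effects cancel out.
→ The average effect is not significant.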
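
The cancellation can be illustrated with a small simulation in base R. Everything here is hypothetical: the two drug covariates are drawn independently (in the real data they are correlated), counts are generated with the reported coefficient signs and sizes, and a plain Poisson `glm` stands in for the mixed-effects model.

```r
set.seed(42)
n <- 2000
log_tca <- rnorm(n, 0, 0.5)       # hypothetical log TCA use
log_nontca <- rnorm(n, 0, 0.5)    # hypothetical log non-TCA use
log_total <- log(exp(log_tca) + exp(log_nontca))  # log of combined use

# Opposite-signed effects matching the reported MLEs (+0.20 and -0.15).
y <- rpois(n, exp(3 + 0.20 * log_tca - 0.15 * log_nontca))

fit_total <- glm(y ~ log_total, family = poisson)
fit_split <- glm(y ~ log_tca + log_nontca, family = poisson)

b_total <- unname(coef(fit_total)["log_total"])
b_split <- unname(coef(fit_split)[c("log_tca", "log_nontca")])
cat("Combined covariate:", round(b_total, 3), "\n")
cat("Separate covariates:", round(b_split, 3), "\n")
```

In this setup the coefficient on combined use should come out much closer to zero than either subclass coefficient, because the two opposite effects are averaged together.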

This is a central message of § 12.5: the subclasses must be analyzed separately.

Solution: enter TCA and SSRI/non-SSRI use as separate covariates.

6.2 Subclass-Specific Effects

Second model: subclasses separated

\[ \log(\lambda_{ij}) = \log(P_{ij}) + \beta_0 + \beta_{\text{TCA}} \log(\text{TCA}) + \beta_{\text{nonTCA}} \log(\text{nonTCA}) + \alpha^\top z_{ij} + \beta_{0i} \]

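Handling multicollinearity: county-level use of SSRIs and non-SSRIs is correlated at \(r = 0.98\), which is very high. → They are combined into a single “non-TCA” variable.

Results:

TCA: \(\text{MLE} = +0.20, p < .001\), a strong positive effect (more TCA use → higher suicide rate).
SSRI + non-SSRI: \(\text{MLE} = -0.15, p < .001\), a strong negative effect (more use → lower suicide rate).

Intuition: the clinical meaning of the two opposite effects

TCA effect (\(+0.20\)): a 1-unit increase in log TCA use (that is, use multiplied by \(e \approx 2.72\)) multiplies the suicide rate by \(\exp(0.20) = 1.22\).

SSRI/non-SSRI effect (\(-0.15\)): a 1-unit increase in log use multiplies the suicide rate by \(\exp(-0.15) = 0.86\) (a 14% decrease).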
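
Because the covariates are logged, the same coefficients can be restated for other multiplicative changes in use. A short sketch, using only the two MLEs reported above (the doubling and 10% cases are added here for illustration):

```r
beta <- c(TCA = 0.20, nonTCA = -0.15)   # MLEs reported in the text

rr_per_log_unit <- exp(beta)   # use multiplied by e (about 2.72)
rr_per_doubling <- 2^beta      # use multiplied by 2, i.e. exp(beta * log(2))
rr_per_10pct <- 1.10^beta      # use increased by 10%

print(round(rbind(rr_per_log_unit, rr_per_doubling, rr_per_10pct), 3))
```

Reading the coefficients per doubling is often easier to communicate than per log unit, since a factor of 2.72 is not a natural policy change.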

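Possible clinical interpretations:

TCA → higher suicide rate: TCA use may not itself cause suicide; instead, “TCA prescribing = older standard of care = lower quality of care” may make it an indicator.
SSRI/non-SSRI → lower suicide rate: effective treatment of depression under modern care.

Why a causal interpretation is difficult:

Prescribing is confounded with county-level quality of care and health systems.
Random assignment is not possible (observational data).
The Mundlak decomposition (§ 8) gives a partial answer.

Multicollinearity between SSRI and non-SSRI use (r = 0.98):

The two drug classes were adopted in the same counties at about the same time.
Their separate effects are hard to estimate.
→ They are combined and analyzed as “modern antidepressants.”
This is a practical, data-driven modeling decision.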

6.3 Hypothetical Scenarios

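Policy simulation

Baseline statistics (1996-1998):

Average population: 248,060,988.
Suicides over 3 years: 91,673.
Annual suicide rate: 12.32 / 100,000.

Scenario 1: eliminate TCA use

All other variables (age, sex, race, SSRI/non-SSRI) held fixed.
Only TCA use set to 0.
Prediction: 10,237 fewer suicides/year, a decrease of 4.12 / 100,000 (33% lower).

Scenario 2: eliminate SSRI/non-SSRI use

All other variables held fixed.
Only SSRI/non-SSRI use set to 0.
Prediction: 15,202 more suicides/year, an increase of 6.13 / 100,000 (50% higher).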
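
The baseline rate and the scenario percentages follow from the figures above by simple arithmetic. The sketch below only re-derives them; the variable names are chosen for this example.

```r
avg_population <- 248060988
suicides_3yr <- 91673

annual_suicides <- suicides_3yr / 3
annual_rate <- annual_suicides / avg_population * 1e5   # per 100,000 per year

# Predicted changes in annual suicides reported for the two scenarios.
change <- c(no_TCA = -10237, no_nonTCA = 15202)
rate_change <- change / avg_population * 1e5
pct_change <- 100 * change / annual_suicides

cat("Annual rate:", round(annual_rate, 2), "per 100,000\n")
print(round(rbind(rate_change, pct_change), 2))
```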

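Intuition: the policy analysis that the mixed-effects model makes possible

Why these hypothetical scenarios are possible:

County-specific drug effects (random slopes): estimate the impact of a drug change in each county.
County-specific suicide rates (EB intercepts): the baseline for each county.
Covariate adjustment: isolates the drug effect after controlling for age, sex, and race.

Drug change in each county → change in the number of suicides → national total = scenario.

Clinical and policy implications:

Eliminating TCA use predicts 33% fewer suicides → more than 10,000 fewer deaths per year.
Eliminating SSRI/non-SSRI use predicts 50% more → more than 15,000 additional deaths per year.
→ Taken at face value, switching from TCAs to SSRIs would be very valuable as policy.

Caveat on causal interpretation (quoted from the text):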

“Note that the finding of an association between TCA use and suicide, does not necessarily imply that it is a causal association in which TCA use leads to suicide. TCA use may be an indicator of one of a number of possible problems in the health care delivery system, such as limited access to quality mental health care.”

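→ The distinction is between “does TCA use directly cause suicide?” and “is TCA use a marker of counties with low-quality care?”

To address this question, the analysis proceeds to the income adjustment and the Mundlak decomposition.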

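7 Income Adjustment and the Urban/Rural Distinction

7.1 Income Model

Adding county median income

Hypothesis: “The drug effects are confounded by access to care (income).”

Model: the subclass model of § 6.2 plus county median income.

Results:

Income: \(\text{MLE} = -0.01, p < .001\): wealthier counties have lower suicide rates.
TCA: \(\text{MLE} = 0.20 \to 0.07, p < .001\): the effect shrinks but remains significant.
SSRI + non-SSRI: \(\text{MLE} = -0.15 \to -0.04, p < .001\): the effect shrinks but remains significant.

Intuition: partial confounding

After adjusting for income the drug effects shrink (TCA: 0.20 → 0.07, non-TCA: -0.15 → -0.04):

Part of the drug effect is explained by income (access to care).
Both effects nevertheless remain strongly significant.
→ Income does not fully account for the drug associations; other mechanisms are at work.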
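
How much of each coefficient is absorbed by income can be expressed as a percentage. This is plain arithmetic on the four MLEs reported above; "attenuation" is a label used here, not a quantity reported in the source.

```r
unadjusted <- c(TCA = 0.20, nonTCA = -0.15)   # before income adjustment
adjusted <- c(TCA = 0.07, nonTCA = -0.04)     # after income adjustment

pct_attenuation <- 100 * (1 - adjusted / unadjusted)
rr_adjusted <- exp(adjusted)                  # rate ratio per log unit of use

print(round(rbind(pct_attenuation, rr_adjusted), 3))
```

Roughly two thirds to three quarters of each coefficient disappears after adjustment, which is why the confounding is described as partial rather than complete.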

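Possible additional mechanisms:

A direct pharmacological effect (SSRIs really do reduce suicide risk).
Other indicators of care quality (distribution of psychiatrists, frequency of visits, and so on).
Quality of prescribing practice (TCA prescribing reflects an older standard of care).

Confounding cannot be fully controlled:

Not every confounder can be measured.
Unobserved confounders may remain.
→ Causal interpretation calls for caution.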

7.2 Figures 12.1-12.2: Visual Patterns

Figure 12.1: SSRI/TCA ratio and suicide rate

X axis: deciles (10 groups) of \((\text{SSRI} + \text{non-SSRI}) / \text{TCA}\).

Y axis: suicide rate per 100,000.

Pattern:

Lowest decile (high TCA): ~15 / 100,000.
Highest decile (low TCA): ~10 / 100,000.
Monotonically decreasing.
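
A decile plot of this kind can be built in base R as sketched below. The county data are simulated and the downward slope is built in by assumption, so the output only illustrates the mechanics of grouping by decile and pooling counts, not the published figure.

```r
set.seed(7)
n_counties <- 1000
# Hypothetical county data; the negative slope is an assumption for illustration.
cty <- data.frame(
  log_ratio = rnorm(n_counties, 1, 0.6),            # log((SSRI + non-SSRI) / TCA)
  population = round(runif(n_counties, 5e3, 5e5))
)
cty$suicides <- rpois(n_counties,
                      cty$population * 12.5e-5 * exp(-0.15 * (cty$log_ratio - 1)))

# Cut counties into deciles of the ratio.
breaks <- quantile(cty$log_ratio, probs = seq(0, 1, by = 0.1))
cty$decile <- cut(cty$log_ratio, breaks = breaks, include.lowest = TRUE, labels = 1:10)

# Pool counts and populations within each decile before forming the rate.
pooled <- aggregate(cbind(suicides, population) ~ decile, data = cty, FUN = sum)
pooled$rate_per_100k <- pooled$suicides / pooled$population * 1e5
print(pooled)
```

Pooling counts and populations before dividing weights each county by its size; averaging county-level rates instead would let very small counties dominate.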

Figure 12.2: SSRI/TCA ratio and population

X axis: deciles of the same ratio.

Y axis: population (millions).

Pattern:

Larger counties: SSRI/non-SSRI dominate (high ratio).
Smaller counties: TCAs dominate (low ratio).

→ Urban-rural distinction in prescription practice.

Intuition: the rural penalty hypothesis

Quoted from the text:

“Rural areas may be poorer, have less access to psychiatrists and under treatment of depressive illness perhaps through use of subtherapeutic doses of side effect prone older tricyclic antidepressant medications that are more lethal on overdose, elevating suicide rates.”

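Mechanisms behind the rural penalty:

Poverty: rural counties are poorer on average.
Shortage of psychiatrists: fewer mental health specialists in rural areas.
Underdiagnosis: depression is under-recognized and under-diagnosed.
Subtherapeutic doses: lower quality of prescribing.
TCA predominance: older drugs (lethal in overdose) are prescribed more often.
Higher suicide rates: the combined result of all of the above.

Policy implications:

Strengthen rural mental health resources.
Use telemedicine.
Train primary care providers in psychiatry.
Expand access to modern antidepressants.

These policy recommendations draw directly on the statistical analysis (mixed-effects Poisson).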

8 Within vs Between-County Decomposition

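8.1 Mundlak Decomposition

Formula: within/between split

Split \(\log(\text{TCA}_{ij})\) into two parts (in this section \(j\) indexes the time points observed for county \(i\)):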

\[ \log(\text{TCA}_{ij}) = \overline{\log(\text{TCA})}_i + [\log(\text{TCA}_{ij}) - \overline{\log(\text{TCA})}_i] \]

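Between component: \(\overline{\log(\text{TCA})}_i\) = mean over time points for county \(i\). Differences between counties.
Within component: \(\log(\text{TCA}_{ij}) - \overline{\log(\text{TCA})}_i\) = deviation of each time point from the county mean. Change over time.

The two components enter as separate covariates.

Intuition: the core of the causal argument

Value of the Mundlak (1978) decomposition for policy analysis:

Between effect = “how are differences in drug use between counties associated with differences in suicide rates in the national cross-section?”

Can be confounded by between-county differences (quality of care, population composition, and so on).
Weak support for a causal interpretation.

Within effect = “how does the suicide rate change when the same county changes its drug use over time?”

Time-invariant county characteristics (health system, population, and so on) are controlled automatically.
Only time-varying confounders remain a threat.
→ Stronger evidence for a causal interpretation.

In terms of the comparisons involved:

Between: \(E[Y \mid X = x_1] - E[Y \mid X = x_2]\) (a difference between different counties).
Within: \(E[Y_{i,t+1} - Y_{i,t} \mid X_{i,t+1} - X_{i,t}]\) (a change within the same county).
The within comparison works much like a fixed-effects or difference-in-differences design.

§ 4 introduced the Mundlak decomposition for the normal longitudinal model; this is the same idea applied to a Poisson model.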
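
The marker-versus-cause logic can be sketched with a small base R simulation. The data-generating process is entirely hypothetical: an unobserved county trait `u` raises both the suicide rate and average drug use, while the drug itself has no effect. A plain Poisson `glm` stands in for the mixed-effects model to keep the example dependency-free.

```r
set.seed(2005)
n_counties <- 200
n_years <- 3

# Hypothetical process: u drives both outcome and average drug use;
# the drug covariate x has no effect of its own.
u <- rnorm(n_counties, 0, 0.3)
x_level <- u + rnorm(n_counties, 0, 0.2)

d <- data.frame(county = rep(seq_len(n_counties), each = n_years))
d$x <- x_level[d$county] + rnorm(nrow(d), 0, 0.2)
d$pop <- 1e6
d$y <- rpois(nrow(d), d$pop * 1e-4 * exp(1.0 * u[d$county]))

# Mundlak split: county mean (between) and deviation from it (within).
d$x_between <- ave(d$x, d$county)
d$x_within <- d$x - d$x_between

fit <- glm(y ~ x_between + x_within + offset(log(pop)), family = poisson, data = d)
b <- coef(fit)
print(round(b, 3))
```

In this setup the between coefficient should be clearly positive while the within coefficient stays near zero, which is the signature the text attributes to a marker rather than a cause.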

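8.2 Decomposition Results

Third model: Mundlak decomposition results

Effect	Between	Within
TCA	Significant (\(p < .0001\), +)	Not significant (\(p < .39\), +)
Non-TCA	Significant (\(p < .0001\), -)	Significant (\(p < .0001\), -)

Intuition: clinical and policy implications of the results

TCA results:

Between: strong positive effect; counties that use more TCAs have higher suicide rates on average.
Within: not significant; when the same county increases TCA use over time, the suicide rate does not change appreciably.

→ TCA use is a marker of care quality (weak causal evidence):

The results do not indicate that TCA prescribing itself causes suicide.
TCA use indicates an older standard of care (absence of modern antidepressants).
Confounding of the form “bad medical care → high suicide + high TCA.”

Non-TCA (SSRI + non-SSRI) results:

Between: strong negative effect; counties that use more SSRI/non-SSRI drugs have lower suicide rates.
Within: strong negative effect; when the same county increases SSRI/non-SSRI use over time, its suicide rate falls.

→ A causal effect of non-TCA drugs is more plausible:

The same county adopts modern antidepressants over time → its suicide rate falls.
Time-invariant county-level confounding is controlled.
May reflect a combination of pharmacological effect and an improving standard of care.

The authors' conclusion (quoted from the text):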

“This finding adds further support to the notion that high TCA use in a county may be a marker of poorer access to high quality mental health care and/or poor detection or recognition of mental disorders.”

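Policy recommendations:

Strengthen rural mental health resources (improve quality of care).
Encourage prescribing of modern antidepressants (SSRIs and others): the within-county result is consistent with a direct causal effect.
Reduce TCA prescribing: given its marker role, this is better seen as a consequence of improved care than as a goal in itself.
Geographic equity policies (narrow the urban-rural gap).

This is the value of combining mixed-effects Poisson with the Mundlak decomposition for policy analysis: it goes beyond simple significance testing toward conclusions that are closer to causal inference.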

9 § 12.4 Framework: A Combined Application

How the pieces from 12-3 and 12-4 come together

What § 12.5 shows about the value of mixed-effects Poisson for policy analysis:

§ 12.4.1 Mixed-effects Poisson model definition (12-3): county random effects on the intercept + drug slopes.
§ 12.4.1 Score equations and marginal MLE (12-3): stable estimation across counties.
§ 12.4.2 Empirical Bayes (12-4): county-specific random effect estimates → hypothetical scenarios.
§ 12.4 framework: separating subclass effects + handling multicollinearity + within/between decomposition.

→ The statistical framework becomes a direct tool for policy decisions.

Bayesian extensions (current practice):

brms or Stan allow richer models.
Priors stabilize estimates for small counties.
Model comparison with DIC or LOO.
Fit assessment with posterior predictive checks.

Suicide rate analysis today (2026):

The Gibbons framework is a standard starting point.
Additional covariates (gun ownership, religion, social determinants).
Spatial random effects (CAR models and so on).
Time-varying effects.

The statistical foundation for suicide prevention: the framework of § 12.5.

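10 Application Areas

Field	Data structure	Value of mixed-effects Poisson
Suicide rates (Gibbons 2005)	County × age × sex × race	The example in this post
Accidents	Road segment × time	Safety analysis by road
Health care utilization	Patient × time	Patient-specific utilization rates
Manufacturing defects	Process × shift	Process quality control
Student absences	Student × school × time	School-level policy effects
Insurance claims	Provider × month	Provider quality assessment
Animal behavior	Individual × condition	Individual differences + condition effects
Natural disasters	Region × year	Regional risk assessment

→ Any field where policy analysis needs “clusters + counts + reliable cluster-specific estimates.”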

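11 Code Examples

11.1 Step 1: Simulating NCHS-like suicide rate data

library(lme4)


# Simulate U.S. county suicide data (one row per county × year × stratum)
set.seed(2026)
n_counties <- 100
n_years <- 3  # 1996-1998

# County-specific random intercepts and drug slopes
county_intercept <- rnorm(n_counties, 0, 0.3)
county_tca_slope <- rnorm(n_counties, 0.20, 0.05)
county_nontca_slope <- rnorm(n_counties, -0.15, 0.05)

# Build the design (county varies fastest in expand.grid)
df <- expand.grid(county = 1:n_counties, year = 1996:1998,
                   age_group = c("5-14", "15-24", "25-44", "45-64", "65+"),
                   sex = c("M", "F"), race = c("Black", "Other"))

# Population (per county × year × age × sex × race cell)
df$population <- round(runif(nrow(df), 1000, 1000000))

# Drug use on the log scale: county level + small county-year deviation.
# Values are looked up by county and year, so every stratum of a
# county-year shares the same drug use and rows stay aligned with counties.
county_log_tca <- rnorm(n_counties, 0, 0.5)
county_log_nontca <- rnorm(n_counties, 0, 0.5)
dev_tca <- matrix(rnorm(n_counties * n_years, 0, 0.05), n_counties, n_years)
dev_nontca <- matrix(rnorm(n_counties * n_years, 0, 0.05), n_counties, n_years)
cy <- cbind(df$county, df$year - 1995)  # (county, year index) pairs
df$log_tca <- county_log_tca[df$county] + dev_tca[cy]
df$log_nontca <- county_log_nontca[df$county] + dev_nontca[cy]

# Demographic effects (mimicking the pattern in Hedeker Table 12.1)
age_effects <- c("5-14" = -3.0, "15-24" = -1.0, "25-44" = -0.5,
                  "45-64" = -0.5, "65+" = 0.0)
sex_effects <- c("M" = 0.0, "F" = -1.5)
race_effects <- c("Black" = -0.5, "Other" = 0.0)

# log lambda = log(rate * pop) = log(rate) + log(pop)
# as.character() makes the lookup go by name rather than by factor code
df$log_rate <- -8 + age_effects[as.character(df$age_group)] +
               sex_effects[as.character(df$sex)] +
               race_effects[as.character(df$race)] +
               county_intercept[df$county] +
               county_tca_slope[df$county] * df$log_tca +
               county_nontca_slope[df$county] * df$log_nontca

df$expected <- exp(df$log_rate + log(df$population))
df$y <- rpois(nrow(df), df$expected)

cat("Total:", nrow(df), "rows,", sum(df$y), "suicides\n")
# Each row already covers one year, so sum(population) is in person-years
cat("Suicide rate:", sum(df$y) / sum(df$population) * 1e5, "per 100,000 person-years\n")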

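Checking the simulation

Key statistics from the paper (1996-1998):

91,673 suicides, 248M population → 12.32 / 100,000.

The simulation is adequate for illustration if its rate is of the same order of magnitude (roughly 10 per 100,000 person-years).

Model assumptions built in:

TCA effect: \(\beta = +0.20\) (positive).
SSRI/non-SSRI effect: \(\beta = -0.15\) (negative).
County-specific random slopes (heterogeneity).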

11.2 Step 2: Fitting the mixed-effects Poisson model (R `lme4`)

# Mixed-effects Poisson with county random intercept + drug slopes
fit_mep <- glmer(y ~ age_group + sex + race + log_tca + log_nontca +
                 offset(log(population)) +
                 (1 + log_tca + log_nontca | county),
                 data = df, family = poisson,
                 control = glmerControl(optimizer = "bobyqa"))
summary(fit_mep)

# Fixed-effect drug coefficients (compare with the paper's results)
fixef_results <- fixef(fit_mep)
cat("\nTCA effect (paper: +0.20):", round(fixef_results["log_tca"], 3), "\n")
cat("non-TCA effect (paper: -0.15):", round(fixef_results["log_nontca"], 3), "\n")

# Extract county random effects (EB)
ranef_county <- ranef(fit_mep)$county
print(head(ranef_county))

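Interpreting the glmer output

Output of glmer:

Fixed effects: average effects of age, sex, race, log_tca, log_nontca.
Random effects: county-level SDs (intercept + two slopes) and their correlations.
Offset: offset(log(population)) makes this a rate model.

Interpretation:

The log_tca estimate is \(\beta_{\text{TCA}}\), the effect on the log scale.
\(\exp(\beta_{\text{TCA}})\) = “rate ratio for a 1-unit increase in log_tca.”

Convergence issues:

A mixed-effects Poisson model with several random effects (here an intercept and two slopes) can be hard to fit, and may report a singular fit when drug use varies little within counties.
Try the bobyqa or Nelder_Mead optimizer.
Or simplify the random-effects structure (intercept only).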

11.3 Step 3: Simulating the hypothetical scenarios

# Scenario 1: eliminate TCA use. log(0) is -Inf, so a very small positive
# value stands in for zero use; the result is sensitive to this choice.
df_no_tca <- df
df_no_tca$log_tca <- log(0.001)  # close to 0

predicted_no_tca <- predict(fit_mep, newdata = df_no_tca, type = "response")
predicted_actual <- predict(fit_mep, newdata = df, type = "response")

annual_actual <- sum(predicted_actual) / n_years
annual_no_tca <- sum(predicted_no_tca) / n_years

cat("Scenario: eliminate TCA use\n")
cat("  Actual: ", round(annual_actual), "suicides/year\n")
cat("  No TCA: ", round(annual_no_tca), "suicides/year\n")
cat("  Difference: ", round(annual_actual - annual_no_tca),
    "(", round(100 * (annual_actual - annual_no_tca) / annual_actual), "% decrease)\n")
# Paper: 33% decrease

# Scenario 2: eliminate SSRI/non-SSRI use
df_no_nontca <- df
df_no_nontca$log_nontca <- log(0.001)

predicted_no_nontca <- predict(fit_mep, newdata = df_no_nontca, type = "response")
annual_no_nontca <- sum(predicted_no_nontca) / n_years

cat("\nScenario: eliminate SSRI/non-SSRI use\n")
cat("  No SSRI/non-SSRI: ", round(annual_no_nontca), "suicides/year\n")
cat("  Difference: ", round(annual_no_nontca - annual_actual),
    "(", round(100 * (annual_no_nontca - annual_actual) / annual_actual), "% increase)\n")
# Paper: 50% increase

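Policy value of the hypothetical scenarios

This code mirrors the core policy analysis of § 12.5 (eliminating TCA use → 33% decrease; eliminating SSRI/non-SSRI use → 50% increase) on simulated data. It is not expected to reproduce the paper's percentages, which depend on the real data and on how zero use is coded.

How the simulation works:

Use the fitted regression coefficients and random effects.
Change a covariate (for example, set log_tca to log(0.001) as a stand-in for zero use).
Compute the predicted lambda.
Sum over the whole country.
Compare the actual and hypothetical scenarios.

Cautions:

The functional form of the model (log link) drives the extrapolation.
Real policy decisions require a sensitivity analysis.
The causal caveat applies (association vs causation).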

11.4 Step 4: Mundlak Decomposition (Within vs Between)

library(dplyr)


# County mean over time + deviation of each observation from that mean
df <- df %>%
  group_by(county) %>%
  mutate(
    log_tca_mean = mean(log_tca),
    log_tca_dev = log_tca - log_tca_mean,
    log_nontca_mean = mean(log_nontca),
    log_nontca_dev = log_nontca - log_nontca_mean
  ) %>%
  ungroup()

# Mundlak model: separate within and between effects
fit_mundlak <- glmer(y ~ age_group + sex + race +
                     log_tca_mean + log_tca_dev +
                     log_nontca_mean + log_nontca_dev +
                     offset(log(population)) +
                     (1 | county),
                     data = df, family = poisson,
                     control = glmerControl(optimizer = "bobyqa"))
summary(fit_mundlak)

# Interpreting the results
fixef_mundlak <- fixef(fit_mundlak)
cat("\nTCA Between (paper: strongly significant, positive):\n")
cat("  log_tca_mean:", round(fixef_mundlak["log_tca_mean"], 3), "\n")
cat("TCA Within (paper: not significant, p < .39):\n")
cat("  log_tca_dev:", round(fixef_mundlak["log_tca_dev"], 3), "\n")

cat("\nNon-TCA Between (paper: strongly significant, negative):\n")
cat("  log_nontca_mean:", round(fixef_mundlak["log_nontca_mean"], 3), "\n")
cat("Non-TCA Within (paper: strongly significant, negative):\n")
cat("  log_nontca_dev:", round(fixef_mundlak["log_nontca_dev"], 3), "\n")

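Causal intuition behind the Mundlak decomposition

Key findings of the paper (the pattern to compare against):

TCA: strong between + weak within → marker role.
Non-TCA: strong between and within → possibly causal.

Checking on the simulated data:

If the simulated data contain only a large between effect, the within estimate is small.
If the within effect is also large in the simulated data, the within estimate is large.
→ The result depends on the data-generating assumptions. In the Step 1 simulation the same slope applies to between and within variation, so the two estimates should share a sign, with the within estimate noisier because within-county variation is small.

Practical use:

Consider a Mundlak decomposition whenever a covariate varies both within and between clusters.
Significant within effect → a causal effect is more plausible.
Significance only between → suspect confounding.

Bayesian extensions (brms):

Model the correlation between the two components.
Add spatial random effects (CAR model).
Time-varying coefficients.

All of these extensions build on the framework of Gibbons et al. (2005).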

12 Related Topics

Prerequisites

Ch.12 Overview: the big picture of GLMMs for counts
§ 12.4 to 12.4.1: mixed-effects Poisson model + estimation
§ 12.4.2 to 12.4.3: Empirical Bayes (county-specific estimates)
§ 4 Normal longitudinal models: where the Mundlak decomposition was introduced

Follow-up topics (Ch.12 sub-posts)

§ 12.6: Ch.12 Summary

Related concepts

Gibbons et al. (2005): original suicide rate analysis (Archives of General Psychiatry)
Goldsmith et al. (2002): the statistical burden of suicide
Mundlak (1978): within/between decomposition (Econometrica)
IMS Health: pharmacy database
NCHS (National Center for Health Statistics): mortality data
DHHS Suicide Prevention: policy recommendations
Ch.4 Normal longitudinal MRM: the Mundlak decomposition in the normal model
Ch.13 Three-level data: multilevel extension of county × age × sex × race
Ch.14 Missing data: missing-data patterns in the NCHS data
Statistics, Poisson distribution: the foundation of Poisson models
Experimentation, causal inference: the causal inference background for Mundlak