1 Introduction

Scope of this post:

§ 14.4 introduction: why test MCAR, and why \(y_i^O\) is the key.
Two-timepoint case: t-test, Eqs. 14.3-14.4, and the reverse logistic regression of Ridout (1991) (Eq. 14.5).
Multi-timepoint generalization: the function \(h(y_i^O)\), discrete-time survival (Eqs. 14.8-14.9), the person-period dataset, and the clog-log link (Eq. 14.10).
§ 14.4.1 NIMH schizophrenia example: the decisive Drug × MeanY interaction.

One-line summary

“§ 14.4 is the standard procedure for testing MCAR against MAR. The essence: under MCAR, missingness is independent of \(y_i^O\); under MAR it may depend on it, so the distinction can be tested with analyses involving \(y_i^O\) (Little 1988, Diggle 1989). Two timepoints: a t-test of \(y_1\) between the \(D_i = 0/1\) groups, or the regression in Eq. 14.3. Ridout (1991) reverses the roles: in Eq. 14.5, \(D_i\) is modeled by logistic regression on \(y_{i1}\), covariates, and their interactions, and MCAR is rejected if \(\alpha_1 \neq 0\) or \(\alpha_3 \neq 0\). Multiple timepoints: a function \(h(y_i^O)\) (Eq. 14.6 mean, Eq. 14.7 weighted average), the discrete-time hazard model for time to dropout in Eq. 14.8 (Allison 1982, Singer-Willett 1993), time-varying covariates in Eq. 14.9, and the clog-log link in Eq. 14.10, which gives grouped-time proportional hazards (Prentice-Gloeckler 1978). With a person-period dataset, standard logistic regression software fits these models. § 14.4.1 NIMH example: the Drug × MeanY interaction is decisive (\(\chi^2_1 = 21.36\), p < .0001). On placebo, patients with high IMPS79 drop out; on drug, patients with low IMPS79 drop out, so the direction is opposite across groups. In a main-effects-only analysis MeanY is n.s., which risks wrongly accepting MCAR; examining interactions is therefore essential.”

2 § 14.4 Introduction: Why an MCAR Test Is Needed

2.1 A Test to Guide Model Choice

Motivation for the MCAR test

Quoting the authors:

“if the missing data are MCAR then analysis using either GEE1 or MRM is fine, provided that the covariate matrix \(X_i\) includes any predictors of missingness. However, if the missing data are MAR, then the GEE1 analysis does not perform well, whereas MRM analysis is acceptable… Therefore, in choosing a method for a given analysis, it is helpful to determine whether MCAR is acceptable or not.”

Model-selection procedure:

Run the MCAR test.
Pass (MCAR not rejected) → either GEE1 or MRM may be used.
Fail (MCAR rejected) → use MRM/CPM (full likelihood), or WGEE.

Intuition: why MCAR vs MAR is testable

Quoting the authors:

“the essential distinction between MCAR and MAR is that missingness cannot depend on observed values of the dependent variable, \(y_i^O\), in the former, but can in the latter. Tests of whether MCAR is reasonable or not can therefore be based on analyses involving \(y_i^O\).”

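Logic of testability:

MCAR: \(R \perp y^O\) → no association between \(y^O\) and \(R\).
MAR: \(R\) may depend on \(y^O\) → the association between \(y^O\) and \(R\) can be nonzero.
→ The test needs only \(y^O\), which is in the data.

Comparison with MAR vs MNAR:

MAR vs MNAR: the difference is dependence on \(y^M\), which is not in the data → not testable.
MCAR vs MAR: the difference is dependence on \(y^O\), which is in the data → testable.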
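The three mechanisms can be written side by side to make the testability argument concrete. This is a sketch in the usual textbook notation, assuming \(R_i\) denotes the missingness indicators and \(X_i\) the covariates (the covariate-dependent form of MCAR); it is not copied from the chapter.

\[
\begin{aligned}
\text{MCAR:} \quad & P(R_i \mid y_i^O, y_i^M, X_i) = P(R_i \mid X_i) \\
\text{MAR:} \quad & P(R_i \mid y_i^O, y_i^M, X_i) = P(R_i \mid y_i^O, X_i) \\
\text{MNAR:} \quad & P(R_i \mid y_i^O, y_i^M, X_i) \text{ depends on } y_i^M
\end{aligned}
\]

MCAR and MAR differ only in whether \(y_i^O\) enters the right-hand side, and \(y_i^O\) is observed, so a regression of missingness on \(y_i^O\) can tell them apart. MAR and MNAR differ in \(y_i^M\), which is never observed.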

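Two key sources for the test:

Little (1988): the standard MCAR test (JASA).
Diggle (1989): application in the longitudinal context.

General principle of the test:

Treat the missingness indicator \(D_i\) as the outcome.
Use the observed \(y\) (in various forms) as predictors.
A significant effect of the observed \(y\) → MCAR rejected.

Simulation results from § 14.3:

MCAR data: both GEE1 and MRM are unbiased (provided \(X\) includes the predictors of missingness).
MAR data: only MRM is unbiased (when the variance-covariance structure V is correctly specified); GEE1 is biased.
→ If MCAR is not rejected, GEE1 can be used; if it is rejected, MRM is recommended.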

3 MCAR Test for the Two-Timepoint Case

3.1 The Simple t-test Approach

The simplest test in the two-timepoint scenario

Quoting the authors:

“suppose that all subjects have data at time 1, but some are missing at time 2. Define the variable \(D_i = 0\) for subjects with data at both timepoints and \(D_i = 1\) for those that only have data at the first timepoint.”

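Scenario:

Time 1: every patient observed.
Time 2: some patients have dropped out.
\(D_i = 1\) (dropout) vs \(D_i = 0\) (completer).

The simplest test:

Mean of \(y_1\) in the \(D_i = 0\) group vs mean of \(y_1\) in the \(D_i = 1\) group → t-test.
MCAR: the two group means are equal (null hypothesis).
p < 0.05 → MCAR rejected.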
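The comparison above can be sketched in a few lines of R. The data are simulated and every name (`dat`, `y1`, `D`) and every parameter value is an assumption made for illustration; the dropout mechanism is deliberately built to depend on the baseline score.

```r
# Hypothetical two-timepoint data: all subjects have y1; D = 1 marks dropout before time 2
set.seed(1)
n  <- 200
y1 <- rnorm(n, mean = 20, sd = 5)                   # baseline score on a made-up scale
D  <- rbinom(n, 1, plogis(-1 + 0.15 * (y1 - 20)))   # assumed: higher baseline -> more dropout
dat <- data.frame(y1 = y1, D = D)

# Compare the baseline means of completers (D = 0) and dropouts (D = 1)
tt <- t.test(y1 ~ D, data = dat)                    # Welch t-test by default
tt$estimate
tt$p.value
```

A small p-value here would speak against MCAR; a large one only fails to reject it.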

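Intuition: what the t-test means

Implication of MCAR:

“Dropouts and completers have the same baseline distribution.”
The baseline mean of completers should not differ from the baseline mean of dropouts.

Example:

Depression RCT with 10 weeks of follow-up.
Time 1 (baseline): everyone measured.
Week 10: some have dropped out.
Test: baseline HAM-D of dropouts vs baseline HAM-D of completers → t-test.

Interpretation:

p > .05: no detectable baseline difference → MCAR is plausible (not proven).
p < .05: dropouts were more (or less) severe at baseline → MCAR violated.

Limitations:

Only the baseline (time 1) is used; observations after time 1 are ignored.
Covariate effects cannot be separated out; it is a raw comparison.
→ Extended by Eqs. 14.3-14.4.

3.2 Eq. (14.3): Covariate-Adjusted Regression

Regression of \(y_{i1}\) on \(D_i\) + covariates

Quoting the authors (Eq. 14.3):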

\[y_{i1} = \beta_0 + \beta_1 D_i + \beta_2 x_i + \varepsilon_i\]

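Meaning of the parameters:

\(\beta_1\): dropout effect (baseline difference between \(D_i = 1\) and \(D_i = 0\), adjusted for the covariate).
\(\beta_2\): covariate effect.

MCAR test:

Null hypothesis: \(\beta_1 = 0\).
Tested with a t-test or an LR test.

Intuition: why covariate adjustment is needed

Risk of omitting covariates:

Simple t-test: the baseline difference between dropouts and completers may just reflect a covariate difference.
Example: older patients drop out more often, and older patients also have higher baseline HAM-D.
→ Spurious association.

Value of Eq. 14.3:

Adjusts for sex, age, and other baseline characteristics.
\(\beta_1 \neq 0\) even after adjustment → MCAR violated.

Relation to covariate-dependent MCAR:

MCAR may allow dependence on covariates (Little 1995): \(R \perp y \mid X\).
That is, \(R\) and \(y\) must be independent after adjusting for \(X\).
→ Testing \(\beta_1 = 0\) in Eq. 14.3 is a test of covariate-dependent MCAR.

Separate meaning of \(\beta_2\):

\(\beta_2\) is the relation between the covariate and \(y_1\) (not of interest here).
The test hinges on \(\beta_1\).

3.3 Eq. (14.4): Adding the Interaction

\(D_i \times x_i\) interaction

Quoting the authors (Eq. 14.4):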

\[y_{i1} = \beta_0 + \beta_1 D_i + \beta_2 x_i + \beta_3 (D_i \times x_i) + \varepsilon_i\]

MCAR test:

Null hypothesis: \(\beta_1 = 0\) AND \(\beta_3 = 0\).
Both parameters are tested simultaneously (joint test).
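A minimal R sketch of Eqs. 14.3 and 14.4 follows, using simulated data in which the dropout mechanism runs in opposite directions in the two covariate groups. The variable names and all numeric settings are assumptions for illustration; the joint test compares the interaction model with a model containing the covariate only.

```r
set.seed(2)
n  <- 300
x  <- rbinom(n, 1, 0.5)                               # hypothetical binary covariate
y1 <- rnorm(n, mean = 20 + 2 * x, sd = 5)             # baseline outcome
# Assumed mechanism: the sign of the y1 effect on dropout flips with x
D  <- rbinom(n, 1, plogis(-1 + 0.1 * (y1 - 20) * ifelse(x == 1, 1, -1)))
dat <- data.frame(y1 = y1, D = D, x = x)

fit0 <- lm(y1 ~ x, data = dat)        # null model: beta1 = beta3 = 0
fit3 <- lm(y1 ~ D + x, data = dat)    # Eq. 14.3
fit4 <- lm(y1 ~ D * x, data = dat)    # Eq. 14.4 (adds the D:x term)

summary(fit3)$coefficients["D", ]     # t-test of beta1 in Eq. 14.3
joint <- anova(fit0, fit4)            # joint F-test of beta1 and beta3 (2 df)
joint
```

Comparing `fit0` with `fit4` rather than reading the two coefficient tests separately keeps the test aligned with the joint null hypothesis stated above.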

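Intuition: why the interaction is decisive

Meaning of the interaction:

\(\beta_3 \neq 0\): the dropout effect differs by covariate level.
Example: among older patients, dropouts are more severe at baseline; among younger patients, there is no difference.
→ The dropout mechanism differs by covariate level.

Why looking only at the main effect is risky:

\(\beta_1 = 0\) with \(\beta_3 \neq 0\) is possible.
On average there is no dropout vs completer difference, yet within subgroups the difference is large.
→ MCAR is violated (subgroup-specific).

Key point of the § 14.4.1 NIMH example:

The Drug × MeanY interaction is decisive.
With main effects only, the MeanY effect is n.s. → MCAR would almost have been accepted.
Adding the interaction → opposite directions by group → MCAR rejected.

Practical recommendations:

Always examine interactions with the key covariates (treatment group, clinical variables).
Do not draw an MCAR conclusion from a main-effects-only model.

4 Ridout (1991): Reverse Logistic Regression

4.1 Eq. (14.5): Logistic Regression of \(D_i\) on \(y_{i1}\)

Reverse perspective

Quoting the authors: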

“As noted by Ridout [1991], it is beneficial to turn this question around and to specifically model dropout in terms of a logistic regression.”

Eq. 14.5:

\[\log\left[\frac{P(D_i = 1)}{1 - P(D_i = 1)}\right] = \alpha_0 + \alpha_1 y_{i1} + \alpha_2 x_i + \alpha_3 (y_{i1} \times x_i)\]

MCAR test:

Null hypothesis: \(\alpha_1 = 0\) AND \(\alpha_3 = 0\).
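Eq. 14.5 and its joint null hypothesis can be sketched with `glm` and a likelihood-ratio test. The data below are simulated and the names and coefficients are illustrative assumptions only.

```r
set.seed(3)
n  <- 400
x  <- rbinom(n, 1, 0.5)                        # hypothetical covariate
y1 <- rnorm(n)                                 # baseline outcome (standardized)
D  <- rbinom(n, 1, plogis(-1 + 0.5 * y1))      # assumed: dropout depends on y1
dat <- data.frame(D = D, y1 = y1, x = x)

fit_null <- glm(D ~ x,      family = binomial, data = dat)  # alpha1 = alpha3 = 0
fit_full <- glm(D ~ y1 * x, family = binomial, data = dat)  # Eq. 14.5

lr <- anova(fit_null, fit_full, test = "Chisq")  # 2-df likelihood-ratio test
lr
```

The covariate `x` stays in the null model, so the test targets covariate-dependent MCAR rather than MCAR in the strict sense.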

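Intuition: why the reverse formulation is more natural

Eq. 14.3 vs Eq. 14.5:

Eq. 14.3: \(y_{i1}\) is the outcome (continuous), \(D_i\) is the predictor (binary).
Eq. 14.5: \(D_i\) is the outcome (binary), \(y_{i1}\) is the predictor (continuous).

Why Ridout recommends reversing:

Natural causal direction: it analyzes the causes of missingness.
- “Why did missingness occur?” → \(D_i\) is the outcome.
- \(y\) or the covariates drive dropout → they are predictors.
Natural multi-timepoint extension: it leads directly to the survival model of Eq. 14.8.
- \(D_i \in \{1, \ldots, n+1\}\) → discrete-time hazard.
- Time-varying covariates are handled naturally.
Standard methods apply: logistic regression in standard software.
- SAS PROC LOGISTIC, R glm.
- Standard procedures after conversion to a person-period dataset.

Clinical meaning:

\(\alpha_1 > 0\): patients with high \(y\) drop out more.
\(\alpha_1 < 0\): patients with low \(y\) drop out more.
\(\alpha_3 \neq 0\): the effect differs by covariate (for example, opposite directions by group).

How Eq. 14.3 and Eq. 14.5 relate statistically:

Both test the association between \(y\) and \(D\).
The results (p-values) are not identical but are usually similar.
Eq. 14.5, however, extends naturally to multiple timepoints.

5 Generalization to Multiple Timepoints

5.1 Eq. (14.6): Observed Mean \(\bar{y}_i\)

The function \(h(y_i^O)\)

With multiple timepoints, a function of the observed values \(h(y_i^O)\) replaces \(y_{i1}\).

Simplest choice: the mean (Eq. 14.6):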

\[\bar{y}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij}\]

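Here \(n_i\) is the number of observed timepoints for patient \(i\).

Intuition: why the mean is a good starting point

Advantages of the mean:

Simple.
Stable: averaging several observations damps the measurement noise of any single visit.
Clear clinical meaning (“average level of improvement”).

Limitations:

Time information is lost (the trajectory is ignored).
Every observation gets the same weight, so the most recent observation, the one closest to dropout, counts no more than the baseline.
→ Addressed by the weighted average of Eq. 14.7.

A simpler alternative in the spirit of Maxweek:

\(D_i\) = the subject's last observed timepoint.
\(y_{i, D_i}\) = the last observed value.
→ Tests the hypothesis that the most recent observation drives dropout.

5.2 Eq. (14.7): Weighted Average

A generalized function

Quoting the authors (Eq. 14.7):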

\[h(y_i^O) = \sum_{j=1}^{n_i} w_j y_{ij}\]

Possible choices of \(w_j\):

\(w_j = 1/n_i\): simple mean (Eq. 14.6).
\(w_j = 1\) for the last observation only, 0 otherwise: last observed value.
\(w_j = -1, +1\) for the first and last observations: first-last difference (a crude linear trend).
\(w_j = j\): time-weighted (later observations matter more).

Intuition: the value of different \(h\) functions

Quoting the authors:

“Depending on the specifications for \(w_j\), this allows a linear trend across the observed timepoints, a difference between the first and last observed timepoints, a difference between the second to last and last timepoint, etc.”

Example: RCT with 5 timepoints:

\(h\) function	\(w\)	Meaning
Mean	\((1/5, 1/5, 1/5, 1/5, 1/5)\)	Overall level
Last value	\((0, 0, 0, 0, 1)\)	Most recent state
Trend	\((-2, -1, 0, 1, 2)/10\)	Direction of change (OLS slope)
First-last	\((-1, 0, 0, 0, 1)\)	Total change
Last change	\((0, 0, 0, -1, 1)\)	Most recent change
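The weight vectors in the table can be applied directly as a matrix product. The five scores below are made-up values for one subject; the last two lines check that the trend weights reproduce the ordinary least-squares slope on equally spaced times 1 to 5.

```r
y <- c(6.0, 5.5, 5.0, 4.0, 3.5)   # made-up scores at 5 equally spaced timepoints

# One row of weights per h function from the table
W <- rbind(
  mean        = rep(1 / 5, 5),
  last        = c(0, 0, 0, 0, 1),
  trend       = c(-2, -1, 0, 1, 2) / 10,
  first_last  = c(-1, 0, 0, 0, 1),
  last_change = c(0, 0, 0, -1, 1)
)

h <- drop(W %*% y)                # h(y^O) = sum_j w_j * y_j for each row of W
h

# The trend weights equal the OLS slope of y on time 1..5
slope <- unname(coef(lm(y ~ seq_along(y)))[2])
all.equal(unname(h["trend"]), slope)
```

Any of these `h` values can then be used as the predictor in the dropout model.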

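Testing several \(h\) functions together:

Try several \(h\) functions → which aspect of the history is associated with dropout?
Example: the mean is n.s. but the trend is significant → “improving patients drop out.”
→ Helps identify the MAR mechanism.

Clinical recommendations:

Default: cumulative mean or last observed value.
Choose according to the clinical hypothesis.
Compare several forms → deeper understanding of the missingness mechanism.

5.3 Eq. (14.8): Discrete-Time Survival Model

Survival model for time to dropout

Quoting the authors (Eq. 14.8):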

\[\log\left[\frac{P(D_i = j \mid D_i \geq j)}{1 - P(D_i = j \mid D_i \geq j)}\right] = \alpha_{0j} + \alpha_1 h(y_i^O) + \alpha_2 x_i + \alpha_3 (h(y_i^O) \times x_i)\]

Components:

\(D_i = j'\) if dropout occurs between timepoints \((j'-1)\) and \(j'\).
\(D_i = n + 1\): completers.
\(\alpha_{0j}\): time-specific baseline hazard (on the logit scale).
The remaining parameters on the right-hand side: predictor effects.

MCAR test: \(\alpha_1 = 0\) AND \(\alpha_3 = 0\).

Intuition: why discrete-time survival is natural

The idea behind the survival framework:

At each timepoint, model the probability of dropping out now (the hazard).
Conditional on not having dropped out yet (at risk).
→ Discrete-time hazard.

Standard method of Allison (1982) and Singer-Willett (1993):

Convert to a person-period dataset.
Fit a standard logistic regression.
→ Standard software suffices.

Same as the ordinal survival model of § 10.2.3:

The same framework as the Hedeker-Mermelstein cumulative ordinal model discussed in Ch.10.
Time to dropout = censored survival data.

Handling intermittent missingness:

Quoting the authors:

“we are ignoring intermittent missingness here and concentrating on whether time to dropout is MCAR or not. This simplification of missingness is reasonable to the extent that intermittent missingness is MCAR, but dropout is potentially not.”

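Intermittent missingness: usually haphazard (the MCAR assumption is acceptable).
Dropout: clinically meaningful (MCAR is more likely to be violated).
→ Focus the test on dropout.

Time-specific baseline hazard \(\alpha_{0j}\):

A different intercept at each timepoint.
The dropout probability at time 1 need not equal the dropout probability at time 5.
→ Time effects are handled naturally.

5.4 Eq. (14.9): Time-Varying Covariates

Time-varying predictors

Quoting the authors (Eq. 14.9):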

\[\log\left[\frac{P(D_i = j \mid D_i \geq j)}{1 - P(D_i = j \mid D_i \geq j)}\right] = \alpha_{0j} + \alpha_1 h(y_{ij}^O) + \alpha_2 x_{ij} + \alpha_3 (h(y_{ij}^O) \times x_{ij})\]

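Differences:

\(h(y_{ij}^O)\): observed information up to timepoint \(j\) (changes over time).
\(x_{ij}\): covariate at timepoint \(j\) (changes over time).

Example (given by the authors):

\(h(y_{i1}^O) = y_{i1}\).
\(h(y_{i2}^O) = (y_{i1} + y_{i2})/2\).
\(h(y_{i3}^O) = (y_{i1} + y_{i2} + y_{i3})/3\).

Intuition: why the cumulative mean is natural

Cumulative average:

At each timepoint, “the average level of improvement so far.”
Information on the patient's trajectory accumulates naturally.
→ Sharper identification of the missingness mechanism.

Examples of time-varying covariates:

\(x_{ij}\) = stress level at timepoint \(j\) (self-reported).
Or the intention-to-attend question of Demirtas-Schafer (2003).
→ Captures dynamic change.

Stress effect:

\(\alpha_2 \neq 0\): stress predicts dropout.
A clinically explainable, observed variable → the MAR assumption becomes more plausible.

Model fitting:

Person-period dataset (one row per patient × timepoint).
Standard logistic regression.
The only difference is that covariates vary by timepoint.

6 Person-Period Dataset

6.1 The Standard Method of Singer-Willett (2003)

Restructuring the data

Quoting the authors: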

“the dataset needs to be created as a ‘person-period dataset’ [Singer and Willett, 2003]. This is described in Section 10.2.3, here we will go into more specific details. Essentially, in this type of dataset, each person contains as many records as the number of timepoints (or periods) that they are at risk of dropping out.”

Structure:

One row per patient × at-risk period.
\(y_{ij}\) = 1 if dropout occurs at period \(j\), else 0.
No rows after dropout (the patient is no longer at risk).

Intuition: the four-patient example of Table 14.4

Authors' Table 14.4 (4 timepoints, 4 patients):

ID	\(D_i\)	Period	\(y_{ij}\)
101	1	1	1
102	2	1	0
102	2	2	1
103	3	1	0
103	3	2	0
103	3	3	1
104	4 (completer)	1	0
104	4 (completer)	2	0
104	4 (completer)	3	0

Interpretation:

Patient 101: dropout at period 1, 1 row, \(y_{i1} = 1\).
Patient 102: dropout at period 2, 2 rows, last \(y = 1\).
Patient 103: dropout at period 3, 3 rows, last \(y = 1\).
Patient 104: completer, 3 rows, all \(y = 0\).

Rules:

There is no period 0 (baseline): everyone is observed there, so nobody can have dropped out yet.
\(y = 1\) at the dropout period → “dropout occurred at this period.”
No rows afterward (no longer at risk).
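The rules above are enough to rebuild Table 14.4 from the four \(D_i\) values. The following base-R sketch does that; the object names (`subj`, `pp`, `n_periods`) are illustrative choices, not the authors'.

```r
# One record per subject: D = dropout period, D = 4 (= n_periods + 1) marks a completer
subj <- data.frame(id = c(101, 102, 103, 104), D = c(1, 2, 3, 4))
n_periods <- 3

rows <- pmin(subj$D, n_periods)          # number of at-risk periods per subject

pp <- data.frame(
  id     = rep(subj$id, times = rows),
  D      = rep(subj$D,  times = rows),
  period = sequence(rows)                # 1..rows within each subject
)
pp$y <- as.integer(pp$period == pp$D)    # 1 only in the period where dropout occurs
pp
```

Completers never satisfy `period == D`, so all of their rows carry `y = 0`, which is how censoring is represented in this layout.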

Value of the person-period dataset:

Standard logistic regression can be fitted.
Time-varying covariates are handled naturally.
The hazard is estimated directly.

Covariates added in Table 14.5:

Sex (time-invariant): same on every row.
Stress (time-varying): changes by timepoint.
\(y\) Average (time-varying): cumulative mean (the \(h(y_{ij}^O)\) of Eq. 14.9).

6.2 Eq. (14.10): Clog-Log Link

Grouped-time proportional hazards

Quoting the authors (Eq. 14.10):

\[\log(-\log(1 - P(D_i = j \mid D_i \geq j))) = \alpha_{0j} + \alpha_1 h(y_{ij}^O) + \alpha_2 x_{ij} + \alpha_3 (h(y_{ij}^O) \times x_{ij})\]

Logit vs clog-log:

Logit (Eq. 14.8): standard logistic regression; effects are odds ratios (a discrete-time proportional odds model).
Clog-log (Eq. 14.10): grouped-time proportional hazards (Prentice-Gloeckler 1978; Hedeker et al. 2000).

Advantages of clog-log:

The grouped-time analog of Cox proportional hazards.
Effects can be read as hazard ratios.

Intuition: logit vs clog-log

Meaning of the clog-log link:

\(\log(-\log(1 - p))\) is linear in the predictors.
Equivalently \(1 - p = \exp(-\exp(\eta))\), with \(\eta\) the linear predictor (a Gompertz-type curve).
It is the grouped version of the continuous-time PH model.

Difference in interpretation:

Logit: \(\exp(\alpha)\) = odds ratio of dropout.
Clog-log: \(\exp(\alpha)\) = hazard ratio of dropout.
Which one to choose:

Quoting the authors:

“In practice, it often doesn’t matter greatly if the logit or clog-log link is selected for this purpose, see Singer and Willett [2003] for more discussion on this issue.”

Test results are usually similar.
The hazard-ratio interpretation is clinically convenient → clog-log may be preferred.
Consistent with the ordinal survival model of Hedeker et al. (2000) in § 10.2.3.

Link to § 10.2.3:

The same framework as the cumulative ordinal proportional hazards model of § 10.2.3 in Ch.10.
Discrete-time survival = ordinal cumulative model.

7 § 14.4.1: NIMH Schizophrenia Example

7.1 Data Setup

NIMH Schizophrenia Study (same data as Ch.9)

Quoting the authors:

“Consider the schizophrenia study described previously in Chapter 9. In that study, subjects were measured at a baseline timepoint and weekly for up to 6 weeks. The study protocol specified the primary measurement weeks as 0 (baseline), 1, 3, and 6; however, some subjects were also measured at weeks 2, 4, and 5.”

Data:

437 patients (108 placebo, 329 drug).
Outcome: IMPS79 (illness severity rating).
Timepoints: week 0 (baseline) plus weekly (weeks 1-6).
Dropout: 102 of 437 (23.3%).
Missingness pattern: mostly dropout, some intermittent.

Maxweek notation

Definition of Maxweek:

Maxweek = the patient's last observed week.
\(D_i\) = Maxweek (in the notation of Eq. 14.8).

Meaning of the values:

Maxweek = 1: last measured at week 1.
Maxweek = 6: completer (measured at week 6).

Possible dropout times:

Weeks 1, 2, 3, 4, 5 (five values).
Maxweek = 6 → completer (playing the role of \(D_i = n + 1\)).

7.2 Table 14.6: Drug × Maxweek Crosstab

Dropout pattern by drug

As reported by the authors (Table 14.6):

Drug	Week 1	Week 2	Week 3	Week 4	Week 5	Week 6 (completers)	Total
Placebo	13 (.12)	5 (.05)	16 (.15)	2 (.02)	2 (.02)	70 (.65)	108
Drug	24 (.07)	5 (.02)	26 (.08)	3 (.01)	6 (.02)	265 (.81)	329

Key observations:

Only 65% of placebo patients complete (35% drop out).
81% of drug patients complete (19% drop out).
→ Retention is better in the drug group.

Test results (as reported by the authors):

Pearson \(\chi^2\): \(p < .025\) → dropout differs by drug.
Mantel-Haenszel trend \(\chi^2\): \(p < .0013\) → strong trend.
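The Pearson test can be recomputed from the counts in Table 14.6. The sketch below uses base R only; it does not attempt the Mantel-Haenszel trend test, and the object names are illustrative.

```r
# Counts from Table 14.6: rows = treatment, columns = Maxweek 1..6
tab <- matrix(
  c(13, 5, 16, 2, 2,  70,
    24, 5, 26, 3, 6, 265),
  nrow = 2, byrow = TRUE,
  dimnames = list(Drug = c("Placebo", "Drug"), Maxweek = paste0("W", 1:6))
)

rowSums(tab)                               # group sizes
round(prop.table(tab, margin = 1), 2)      # row proportions, as printed in the table

# Several expected counts are small, so chisq.test() warns; suppressed here
res <- suppressWarnings(chisq.test(tab))
res$statistic
res$parameter                              # degrees of freedom
res$p.value
```

Because some cells are sparse, the asymptotic p-value is approximate; `chisq.test(tab, simulate.p.value = TRUE)` is one way to cross-check it.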

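Conclusion: dropout depends on a covariate (drug), i.e., it is covariate-dependent.

Intuition: clinical interpretation

Higher dropout on placebo:

No benefit → patients stop taking the medication.
Or perceived side effects (adverse events are also reported on placebo).
Or reduced motivation to stay in the study.

Lower dropout on drug:

Benefit → patients keep taking the medication.
Or protocol requirements (clinical trial).

What covariate-dependent means:

Plain MCAR (completely random missingness) is violated.
However, once Drug is in the model, covariate-dependent MCAR can still hold.
→ Further testing is needed to see whether missingness goes as far as MAR (dependence on the observed outcome).

Next step:

Drug alone is not enough → also test MeanY (mean of the observed IMPS79).
Fit the discrete-time survival model of Eq. 14.8.

7.3 Table 14.7: Sequential Model Selection

Model-selection procedure

As reported by the authors (Table 14.7):

Covariates	\(p\) (number of parameters)	Deviance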
Week + Drug + MeanY	7	729.44
+ Week × Drug	11	728.13
+ Drug × MeanY	12	706.77
+ Week × MeanY	16	700.50
+ Week × Drug × MeanY	20	697.71

Key finding:

Adding the Drug × MeanY interaction: deviance 728.13 → 706.77.
\(\chi^2_1 = 21.36\), \(p < .0001\) → highly significant.
The other interactions are n.s.
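The sequential likelihood-ratio tests implied by Table 14.7 are just differences of adjacent deviances referred to a chi-square distribution. A short R sketch using the table's numbers (object names are illustrative):

```r
# Table 14.7: each model adds terms to the one above it
tab147 <- data.frame(
  model    = c("Week + Drug + MeanY", "+ Week x Drug", "+ Drug x MeanY",
               "+ Week x MeanY", "+ Week x Drug x MeanY"),
  p        = c(7, 11, 12, 16, 20),
  deviance = c(729.44, 728.13, 706.77, 700.50, 697.71)
)

lr_stat <- -diff(tab147$deviance)      # drop in deviance at each step
lr_df   <- diff(tab147$p)              # parameters added at each step
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

data.frame(added = tab147$model[-1], lr_stat, lr_df, p_value = signif(p_value, 3))
```

Only the Drug × MeanY step (21.36 on 1 df) is significant at the .05 level; the three 4-df steps are not.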

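Intuition: why Drug × MeanY is decisive

What the model comparison shows:

Main effects only (deviance 729.44) → a single MeanY effect averaged over both groups.
With Drug × MeanY added (deviance 706.77, a model that also contains Week × Drug) → the MeanY effect differs by drug.

Clinical interpretation of Drug × MeanY:

Placebo: the higher MeanY, the more dropout (one direction).
Drug: the lower MeanY, the more dropout (opposite direction).
→ Opposite directions by group → the effect vanishes when averaged (n.s.).

Why a main-effects-only analysis is risky:

With main effects only, the MeanY effect is an average in which the two directions cancel.
A small or n.s. effect → MCAR might be accepted.
→ Wrong conclusion.

Value of the interaction:

Identifies subgroup-specific mechanisms.
Detects the possibility of MAR correctly.
→ Accurate understanding of the missingness mechanism.

7.4 Table 14.8: Final Model Results

Final discrete-time survival model

As reported by the authors (Table 14.8):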

Term	Estimate	SE	\(p\)
Intercept	-6.573	1.208	.0001
Week 1	1.327	.393	.0007
Week 2	0.096	.476	.84
Week 3	1.549	.386	.0001
Week 4	-0.494	.570	.39
Drug	4.765	1.297	.0002
MeanY	0.635	.214	.003
Drug × MeanY	-1.108	.249	.0001

Intuition: reading the results closely

MeanY effect in the placebo group (\(\alpha_1 = 0.635\)):

\(\exp(0.635) = 1.89\).
“A 1-unit increase in MeanY multiplies the dropout hazard by 1.89.”
→ Placebo patients: those with high IMPS79 (more severe) drop out more.

MeanY effect in the drug group:

\(0.635 - 1.108 = -0.473\).
\(\exp(-0.473) = 0.623\), inverted: \(1/0.623 = 1.60\).
“A 1-unit decrease in MeanY multiplies the dropout hazard by 1.60.”
→ Drug patients: those with low IMPS79 (improved) drop out more.
SE = .131, \(p = .0003\) → highly significant.
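The arithmetic above can be checked in R. The two coefficients come from Table 14.8 and the standard error of the drug-group slope is the value quoted in the text; it cannot be recomputed here because the covariance between the two estimates is not reported. Variable names are illustrative.

```r
a_meany       <- 0.635    # MeanY coefficient (slope in the placebo group)
a_drug_meany  <- -1.108   # Drug x MeanY coefficient
se_drug_slope <- 0.131    # SE of the drug-group slope, as quoted in the text

slope_drug    <- a_meany + a_drug_meany   # MeanY slope in the drug group
ratio_placebo <- exp(a_meany)             # per-unit ratio, placebo
ratio_drug    <- exp(slope_drug)          # per-unit ratio, drug

z_drug <- slope_drug / se_drug_slope      # Wald z for the drug-group slope
p_drug <- 2 * pnorm(-abs(z_drug))         # two-sided p-value

round(c(slope_drug = slope_drug, ratio_placebo = ratio_placebo,
        ratio_drug = ratio_drug, inverse = 1 / ratio_drug,
        z = z_drug, p = p_drug), 4)
```

The opposite signs of the two slopes, not their sizes, are what carry the conclusion.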

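Clinical interpretation:

Placebo: “no drug benefit → severe patients leave to seek other treatment.”
Drug: “drug benefit → improved patients no longer feel they need care and leave.”
→ Opposite dropout mechanisms in the two groups.

Mirrors the MAR(b) simulation of § 14.3.2:

MAR(b) in the simulation: group 0 (high values drop out), group 1 (low values drop out).
Actual pattern in the NIMH data: placebo (high IMPS79 drops out), drug (low IMPS79 drops out).
→ The MAR(b) simulation reflects a clinically realistic pattern.

MCAR test result:

\(\alpha_1 = 0\) AND \(\alpha_3 = 0\) → strongly rejected.
→ MCAR rejected; MAR (or MNAR) is the working assumption.

Implications for model choice:

GEE1 would be biased (as seen in the simulations).
→ MRM/CPM (full likelihood) is recommended.
Or WGEE (GEE adapted to MAR).

7.5 The Danger of a Main-Effects-Only Analysis

Wrong conclusions when the interaction is ignored

Quoting the authors: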

“if one excludes Drug × MeanY from the model (i.e., the main effects model with deviance 729.44), the effect of MeanY is estimated to be -.147 and is not significant (\(p = .18\)). Thus, based on this main effects analysis, one might conclude that MCAR is reasonable!”

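Main-effects-only model:

MeanY effect: \(-0.147\) (n.s., \(p = .18\)).
→ “MeanY is unrelated to dropout” → MCAR would almost have been accepted.

Why this is wrong:

The opposite effects in placebo and drug cancel in the average.
The overall effect is n.s., yet the subgroup effects are large.
→ MCAR wrongly accepted → GEE1 used → biased estimates.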
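The cancellation can be reproduced in a small simulation. This is a sketch under assumed settings (two equal groups, slopes of +1 and -1 on a standardized score, a single dropout indicator rather than a full person-period layout); none of the numbers come from the NIMH data.

```r
set.seed(4)
n     <- 4000
drug  <- rep(0:1, each = n / 2)
meany <- rnorm(n)                                   # standardized, hypothetical score
eta   <- -1 + ifelse(drug == 1, -1, 1) * meany      # opposite slopes by group
dropout <- rbinom(n, 1, plogis(eta))

fit_main <- glm(dropout ~ drug + meany, family = binomial)  # main effects only
fit_int  <- glm(dropout ~ drug * meany, family = binomial)  # adds drug:meany

coef(summary(fit_main))["meany", ]                  # averaged slope, near zero
coef(summary(fit_int))[c("meany", "drug:meany"), ]  # group-specific slopes
lr <- anova(fit_main, fit_int, test = "Chisq")
lr
```

In the main-effects fit the `meany` coefficient is close to zero even though dropout depends strongly on `meany` within each group, which is the trap described above.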

Intuition: why interactions are essential when testing

Quoting the authors:

“Of course, in this model, the effect of MeanY is the averaged effect across both groups, and the effect clearly varies across groups. This illustrates that the examination of interactions is important in the testing of MCAR.”

Practical recommendations:

Do not rely on a simple main-effects MCAR test:
- Testing only the main effect of MeanY (or \(h(y^O)\)) is risky.
- Subgroup-specific mechanisms are missed.
Always include interactions with key covariates:
- Treatment group × \(y^O\).
- Sex × \(y^O\), Age × \(y^O\).
- Baseline severity × \(y^O\).
Sequential model selection (Table 14.7):
- Main effects only → main effects + 2-way interactions → 3-way.
- Evaluate each step with a likelihood-ratio test.
Reflect clinical hypotheses:
- Does the dropout mechanism differ by drug?
- Does it differ by sex?
- Use clinical knowledge to set the hypotheses.

Overall message of § 14.4:

The MCAR test is a decisive step in model choice.
Draw no conclusion without examining interactions.
NIMH example: opposite directions by group → MCAR rejected.
→ MRM/CPM recommended.
→ Motivates the nonignorable analyses of § 14.5 (in practice MNAR remains possible too).

8 Application Areas

Area	Candidate \(h(y^O)\)	Key covariate interaction
Clinical trial (RCT)	Cumulative mean, last observation	Treatment × MeanY
Long-term oncology	Last tumor size	Stage × tumor
Depression RCT	Cumulative mean HAM-D	Drug × HAM-D
Chronic pain	Mean VAS	Pain medication × VAS
School longitudinal study	Grade trend	School type × grade
Survey panel	Response frequency	Demographics × response

9 Code Examples

9.1 Step 1: Building the Person-Period Dataset

library(dplyr)
library(tidyr)

# Simulated data (long format) -> person-period
# maxweek = the subject's last observed week
set.seed(1)  # make the simulated y reproducible
df_long <- data.frame(
  subject = c(101, 101, 102, 102, 102, 103, 103, 103, 103,
              104, 104, 104, 104, 104, 104),
  week = c(0, 1, 0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5),
  y = rnorm(15, 50, 10)
)

# Compute maxweek per subject
maxweek <- df_long %>%
  group_by(subject) %>%
  summarise(maxweek = max(week))

# Person-period dataset.
# Convention in this toy example: period k is the interval between week k-1
# and week k, so a subject last seen at week m drops out in period m + 1.
n_periods <- 5  # periods 1-5 (weeks 0-5 are scheduled)

pp_data <- maxweek %>%
  rowwise() %>%
  mutate(period = list(seq_len(min(maxweek + 1, n_periods)))) %>%
  ungroup() %>%
  unnest(period) %>%
  # dropout = 1 only in the period right after the last observed week;
  # subjects observed through the final week never get a 1
  mutate(dropout = as.integer(period == maxweek + 1 & maxweek < n_periods))

head(pp_data, 15)

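Intuition for the person-period layout

Convention in this toy code: period k is the interval between week k-1 and week k, so a subject whose last observation is at week Maxweek drops out in period Maxweek + 1.

Rows per subject:

Subject 101 (Maxweek = 1): 2 rows (periods 1-2; dropout = 1 in period 2).
Subject 102 (Maxweek = 2): 3 rows (periods 1-3; dropout = 1 in period 3).
Subject 103 (Maxweek = 3): 4 rows (periods 1-4; dropout = 1 in period 4).
Subject 104 (Maxweek = 5, completer): 5 rows (dropout = 0 throughout).

Value of the long format:

Standard logistic regression can be fitted.
Time-varying covariates are handled naturally.
It fits the survival framework naturally.

9.2 Step 2: Computing \(h(y_{ij}^O)\) as a Cumulative Mean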

# Cumulative mean (the h(y_ij^O) of Eq. 14.9)
df_long_with_h <- df_long %>%
  arrange(subject, week) %>%
  group_by(subject) %>%
  mutate(meany_cumulative = cummean(y)) %>%
  ungroup()

# Join to the person-period data.
# Period k uses the observations through week k-1, i.e. everything
# observed before the interval in which dropout may occur.
pp_with_h <- pp_data %>%
  left_join(
    df_long_with_h %>%
      mutate(period = week + 1) %>%  # week 0 -> period 1
      select(subject, period, meany_cumulative),
    by = c("subject", "period")
  )

head(pp_with_h, 15)

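How the cumulative mean changes over time

Example for subject 102 (hypothetical values; the simulated \(y\) will differ):

Week 0: \(y = 48\) → MeanY = 48.
Week 1: \(y = 52\) → MeanY = 50.
→ Each person-period row carries its own MeanY.

This change over time is the core of Eq. 14.9:

At each timepoint, the dropout decision is related to “the average so far.”
→ Cumulative information is used.

9.3 Step 3: Fitting the Discrete-Time Survival Model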

# Fit Eqs. 14.8 / 14.10 to the toy data
# Assumption for illustration: add a 0/1 drug indicator
# Note: with only 4 subjects the fits are degenerate (expect separation
# warnings); use a realistic dataset to obtain meaningful estimates.
pp_with_h$drug <- ifelse(pp_with_h$subject %% 2 == 0, 0, 1)

# Logit link (Eq. 14.8)
fit_logit <- glm(dropout ~ factor(period) + drug + meany_cumulative +
                          drug:meany_cumulative,
                 data = pp_with_h, family = binomial(link = "logit"))
summary(fit_logit)

# Clog-log link (Eq. 14.10): grouped-time PH
fit_cloglog <- glm(dropout ~ factor(period) + drug + meany_cumulative +
                            drug:meany_cumulative,
                   data = pp_with_h, family = binomial(link = "cloglog"))
summary(fit_cloglog)

# MCAR test: joint Wald test of alpha_1 = 0 AND alpha_3 = 0
# (meany_cumulative and drug:meany_cumulative together)
library(car)
linearHypothesis(fit_cloglog,
                 c("meany_cumulative = 0", "drug:meany_cumulative = 0"))

# Likelihood-ratio version of the same joint test
fit_null <- update(fit_cloglog,
                   . ~ . - meany_cumulative - drug:meany_cumulative)
anova(fit_null, fit_cloglog, test = "Chisq")

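Implementing the MCAR test

Joint hypothesis test:

\(H_0\): \(\alpha_1 = 0\) AND \(\alpha_3 = 0\).
Wald test or LR test.
p < .05 → MCAR rejected.

Logit vs clog-log:

Results are usually similar.
Clog-log gives a natural hazard-ratio interpretation.
For clinical reporting, clog-log may be preferred when a hazard ratio is wanted; per the authors, the choice rarely matters much.

Interpretation:

\(\alpha_1 \neq 0\): the mean of the observed outcome predicts dropout (MAR or MNAR).
\(\alpha_3 \neq 0\): the effect differs by covariate (drug).
Either one significant → MCAR violated.

9.4 Step 4: Mimicking the NIMH Result (Simulation)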

# Simulation using the Table 14.8 estimates as the true values.
# Simplification: one dropout indicator per subject (no week effects),
# so this mimics the pattern rather than reproducing the NIMH analysis.
set.seed(2026)
n_subjects <- 437
n_placebo <- 108

# Drug indicator
drug <- c(rep(0, n_placebo), rep(1, n_subjects - n_placebo))

# Simulated dropout pattern (Drug x MeanY interaction)
# Placebo: high MeanY -> dropout
# Drug: low MeanY -> dropout
simulate_dropout <- function(drug, meany) {
  alpha_0 <- -6.573
  alpha_drug <- 4.765
  alpha_meany <- 0.635
  alpha_drug_meany <- -1.108
  logit <- alpha_0 + alpha_drug * drug + alpha_meany * meany +
           alpha_drug_meany * drug * meany
  prob <- plogis(logit)
  return(rbinom(length(prob), 1, prob))
}

# Simulate
meany <- rnorm(n_subjects, 5, 1)  # assumed: mean IMPS79 around 5
dropout <- simulate_dropout(drug, meany)

# Fit
fit_sim <- glm(dropout ~ drug + meany + drug:meany,
               family = binomial(link = "logit"))
summary(fit_sim)

# Compare estimates with the values used to simulate
cat("\nComparison with the Table 14.8 values:\n")
cat("True alpha      = (-6.573, 4.765, 0.635, -1.108)\n")
cat("Estimated alpha =", round(coef(fit_sim), 3), "\n")

Clinical implications of the NIMH result

MeanY effect on placebo (\(\alpha_1 = 0.635\)):

\(\exp(0.635) = 1.89\).
Severe patients (high IMPS79) drop out more.
Clinically: “no benefit → leave to seek other treatment.”

MeanY effect on drug (\(\alpha_1 + \alpha_3 = -0.473\)):

\(\exp(-0.473) = 0.62\), \(1/0.62 = 1.60\).
Improved patients (low IMPS79) drop out more.
Clinically: “drug benefit → care no longer felt to be needed.”

Risk of accepting MCAR:

Ignoring the Drug × MeanY interaction → MeanY effect = -0.147 (n.s., p = .18).
→ MCAR wrongly accepted → GEE1 used → biased estimates.

Practical conclusion:

NIMH data: MCAR is strongly rejected.
→ MRM (full likelihood) should be used (the analysis of Ch.9).
→ Or WGEE.

10 Related Topics

Prerequisites

Ch.9 NIMH GLMM: the NIMH schizophrenia data
Ch.10 § 10.2.3 Discrete-time survival: person-period framework
Ch.14 Overview: all of Ch.14
§ 14.1~14.2 Mechanisms: definitions of MCAR/MAR/MNAR
§ 14.3 Model simulations: comparison with the MAR(b) simulation

Follow-up topics (Ch.14 sub-posts)

§ 14.5.1: Selection model (Diggle-Kenward 1994)
§ 14.5.2: Pattern-mixture model (Little 1993, 1994)

Related concepts

Little (1988): standard MCAR test (JASA)
Diggle (1989): application in the longitudinal context
Ridout (1991): recommends reverse logistic regression
Allison (1982): discrete-time survival
D’Agostino, Lee, Belanger, Cupples, Anderson, Kannel (1990): discrete-time hazard
Singer & Willett (1993): discrete-time survival tutorial
Singer & Willett (2003): person-period dataset textbook
Prentice & Gloeckler (1978): grouped-time proportional hazards (clog-log)
Hedeker, Siddiqui & Hu (2000): random-effects discrete-time PH model
Demirtas & Schafer (2003): recommends the attendance question
Kenward (1998): sensitivity to nonignorability