Kwangmin Kim - Counterfactual Response Types and Sufficient Causes

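This is the seventh post in the Phase J series and the third in the Hernán Ch.5 series. The previous post covered the statistical definition of interaction; this one covers a finer classification of response types and the sufficient cause framework, an alternative view of causation.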

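1 Opening Intuition: "The distribution of patient types determines the causal structure"

Ch.4-1 introduced the 4 response types for a single treatment (Doomed, Helped, Hurt, Immune). With two treatments the number of types grows quickly:

\(2^4 = 16\) types: each patient has 4 counterfactual outcomes (\(Y^{a=1, e=1}, Y^{a=1, e=0}, Y^{a=0, e=1}, Y^{a=0, e=0}\)), and each of them can be 0 or 1.

The population distribution over these 16 types is what underlies interaction. Statistical patterns (RD, RR) are the observable expression of that distribution.

Analogy (gene combinations): take two genes (A, B), each either dominant or recessive, which gives 4 combinations (AABB, AAbb, aaBB, aabb). The distribution of phenotypes across the combinations determines the genetic interaction.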

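2 The 16 Response Types: Hernán Table 5.2

2.1 Definition

Take two treatments \(A, E\) and the 4 counterfactual outcomes \(Y^{1,1}, Y^{1,0}, Y^{0,1}, Y^{0,0}\), where \(Y^{a,e}\) is the outcome under \(A=a, E=e\). Each is 0 or 1, so there are \(2^4 = 16\) possible combinations.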

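2.2 Table

The 16 types of Hernán Table 5.2 (\(Y=1\) means the outcome occurs, \(Y=0\) means the patient survives):

Type	\(Y^{1,1}\)	\(Y^{1,0}\)	\(Y^{0,1}\)	\(Y^{0,0}\)	Meaning
1	1	1	1	1	Doomed (outcome always occurs, regardless of treatment)
2	1	1	1	0	Survives only when \(A=0\) and \(E=0\)
3	1	1	0	1	Survives only when \(A=0\) and \(E=1\)
4	1	1	0	0	Outcome whenever \(A=1\), never when \(A=0\) (only \(A\) matters)
5	1	0	1	1	Survives only when \(A=1\) and \(E=0\)
6	1	0	1	0	Outcome whenever \(E=1\), never when \(E=0\) (only \(E\) matters; the analogue of type 4 for \(E\))
7	1	0	0	1	Outcome only when \(A=E\) (XNOR-like pattern)
8	1	0	0	0	Outcome only when \(A=1\) and \(E=1\) (AND pattern, very specific)
…	…	…	…	…	…
16	0	0	0	0	Immune (outcome never occurs, regardless of treatment)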
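The numbering of the table can be reproduced mechanically. The following is a minimal sketch, assuming the column order \((Y^{1,1}, Y^{1,0}, Y^{0,1}, Y^{0,0})\) used in this post and counting down from all ones; the helper name `enumerate_types` is hypothetical.

```python
from itertools import product

def enumerate_types():
    """Map type number (1..16) to (Y11, Y10, Y01, Y00)."""
    # product([1, 0], repeat=4) yields (1,1,1,1), (1,1,1,0), ..., (0,0,0,0),
    # which is the order in which the table above numbers the types.
    return {i + 1: cf for i, cf in enumerate(product([1, 0], repeat=4))}

TYPES = enumerate_types()
for number, cf in TYPES.items():
    print(number, cf)
```

This also fills in the rows elided by "…" in the table, under the same ordering assumption.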

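2.3 Classification by Type

2.3.1 Single-treatment effect types

Type	Meaning
1, 16	Doomed / Immune: neither treatment matters
4, 13	Only \(A\) matters (type 4 is Hurt by A, type 13 is Helped by A)
6, 11	Only \(E\) matters

2.3.2 Two-treatment effect types

Type	Meaning
8	AND: outcome only when both treatments are received
10	XOR: outcome only when exactly one treatment is received
7	XNOR: outcome only when \(A = E\)
9	NAND: outcome unless both treatments are received
2, 3, 5, 12, 14, 15	Other patterns in which the effect of one treatment depends on the other

Intuition: the 16 types enumerate every possible deterministic causal mechanism for two binary treatments. The population proportion of each type determines the statistical interaction.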
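The split into single-treatment and two-treatment types can be checked by asking, for each type, whether flipping one treatment ever changes the outcome while the other is held fixed. This is an illustrative sketch; the function name `depends_on` and the group labels are assumptions, not Hernán's terminology.

```python
from itertools import product

# Type number -> (Y11, Y10, Y01, Y00), in the column order used in this post
TYPES = {i + 1: cf for i, cf in enumerate(product([1, 0], repeat=4))}

def depends_on(cf):
    """Return which treatments can change the outcome for this type."""
    y11, y10, y01, y00 = cf
    a_matters = (y11 != y01) or (y10 != y00)  # flip A, hold E fixed
    e_matters = (y11 != y10) or (y01 != y00)  # flip E, hold A fixed
    if a_matters and e_matters:
        return "both"
    if a_matters:
        return "A only"
    if e_matters:
        return "E only"
    return "neither"

groups = {}
for number, cf in TYPES.items():
    groups.setdefault(depends_on(cf), []).append(number)
print(groups)
```

Under this criterion, six types depend on at most one treatment and the remaining ten depend on both.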

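2.4 Identifiability: the distribution over the 16 types is not identifiable

A key limitation noted by Hernán: data alone cannot uniquely identify the distribution over the 16 types.

The data constrain at most the 4 counterfactual risks \(\Pr(Y^{a,e}=1)\), while the type distribution has 15 free proportions. The same RD and RR can therefore arise from many different type distributions.

Consequence: type-specific causal inference is hard. In general only population-level inference is possible.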
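A small numeric illustration of this limit: two hypothetical populations with very different type mixes produce exactly the same four counterfactual risks. The populations and the helper `counterfactual_risks` are made up for this sketch.

```python
# Type number -> (Y11, Y10, Y01, Y00), as in the table above
CF = {1: (1, 1, 1, 1), 4: (1, 1, 0, 0), 13: (0, 0, 1, 1), 16: (0, 0, 0, 0)}

def counterfactual_risks(distribution):
    """Return (p11, p10, p01, p00) for a {type: proportion} distribution."""
    return tuple(
        sum(share * CF[t][i] for t, share in distribution.items())
        for i in range(4)
    )

population_x = {1: 0.5, 16: 0.5}   # half Doomed, half Immune: A matters for no one
population_y = {4: 0.5, 13: 0.5}   # A matters for everyone (Hurt or Helped)

print(counterfactual_risks(population_x))
print(counterfactual_risks(population_y))
```

Both populations show a risk of 0.5 in every cell, so no RD or RR computed from those cells can tell them apart.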

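3 Comparison with a Single Treatment

3.1 Marginal types (single treatment)

With a single treatment there are 4 types (Doomed, Helped, Hurt, Immune).

3.2 Joint types

With two treatments there are 16 types. One way to read the count: a joint type assigns one of the 4 single-treatment types for \(A\) at \(e=1\) and one at \(e=0\), giving \(4 \times 4 = 16\).

3.3 Conversion

From the joint type of two treatments one can derive the single-treatment type of each treatment at a fixed level of the other. The reverse direction (marginal → joint) is generally impossible.

Example: a patient of type 4, \((1, 1, 0, 0)\), for whom only \(A\) matters. Read marginally:

Marginal type for \(A\): \(Y=1\) when \(A=1\) and \(Y=0\) when \(A=0\), at either level of \(E\) → Hurt by A

Marginal type for \(E\): the outcome is the same for \(E=1\) and \(E=0\), but the label depends on \(A\) (Doomed when \(A=1\), Immune when \(A=0\)), so \(E\) has no single marginal label for this patient.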
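The joint → marginal direction can be sketched by reading off, for each treatment, its single-treatment type at each fixed level of the other treatment. The function names are hypothetical, and the labels assume \(Y=1\) is the bad outcome.

```python
def single_type(y_treated, y_untreated):
    """Name the single-treatment response type from (Y^1, Y^0)."""
    return {(1, 1): "Doomed", (1, 0): "Hurt",
            (0, 1): "Helped", (0, 0): "Immune"}[(y_treated, y_untreated)]

def conditional_types(cf):
    """Single-treatment types of A and E at each fixed level of the other."""
    y11, y10, y01, y00 = cf
    return {
        "A | e=1": single_type(y11, y01),
        "A | e=0": single_type(y10, y00),
        "E | a=1": single_type(y11, y10),
        "E | a=0": single_type(y01, y00),
    }

type_4 = conditional_types((1, 1, 0, 0))
print(type_4)
```

For type 4 this gives Hurt for \(A\) at both levels of \(E\), while \(E\) reads as Doomed at \(a=1\) and Immune at \(a=0\): no effect either way, but no single label.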

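4 Sufficient Cause Framework: Ch.5.4

4.1 Motivation

An alternative (complementary) view to the counterfactual framework, introduced into epidemiology by Rothman (1976).

It sits closer to biological and mechanistic intuition than the counterfactual framework used for statistical analysis.

4.2 Component Cause

A causal factor that is not enough to produce the outcome by itself. The outcome occurs only when it combines with other components.

Example: component causes of tuberculosis = exposure to the TB bacterium, weakened immunity, genetic susceptibility. None of them produces the disease alone.

4.3 Sufficient Cause

A combination that contains every required component. Once it is complete, the outcome inevitably follows.

In the example above: bacterial exposure + weakened immunity + genetic susceptibility = one sufficient cause of tuberculosis.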

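4.4 Causal Pies: a visual analogy

Sufficient cause = a completed circle (pie). Component cause = a slice of the circle.

The disease occurs only when all slices are present and the pie is complete.

Sufficient Cause 1 (SC1):
   [A] [B] [C] = completed pie → outcome occurs

Sufficient Cause 2 (SC2):
   [D] [E] [F] = a different completed pie → outcome occurs

A patient develops the outcome as soon as any one SC is complete. (The letters here are generic slice labels, not the treatments \(A, E\) above.)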

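4.5 Multiple sufficient causes for one outcome

The same outcome (e.g. lung cancer) can arise from several sufficient causes:

SC1: smoking + genetic factor X + environmental factor Z

SC2: asbestos exposure + genetic factor Y

SC3: radiation exposure + weakened immunity

Lung cancer develops in a person as soon as any one SC is complete.

Intuition: the cause of lung cancer can differ from patient to patient. Statistical risk factors pick out the components that are common across cases.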
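The "any completed pie produces the outcome" rule maps naturally onto set inclusion. A minimal sketch, with component labels invented for illustration (they mirror SC1 to SC3 above but are not real etiologic claims):

```python
# Each sufficient cause is a set of component causes (labels are illustrative).
SUFFICIENT_CAUSES = {
    "SC1": {"smoking", "gene_X", "env_Z"},
    "SC2": {"asbestos", "gene_Y"},
    "SC3": {"radiation", "weak_immunity"},
}

def completed_causes(components):
    """Names of all sufficient causes fully contained in `components`."""
    return [name for name, sc in SUFFICIENT_CAUSES.items() if sc <= components]

def outcome(components):
    """Outcome occurs (1) as soon as any one sufficient cause is complete."""
    return int(bool(completed_causes(components)))

patient_1 = {"smoking", "gene_X", "env_Z", "gene_Y"}
patient_2 = {"smoking", "gene_X"}            # SC1 is missing env_Z
print(completed_causes(patient_1), outcome(patient_1))
print(completed_causes(patient_2), outcome(patient_2))
```

Patient 1 completes SC1 and develops the outcome; patient 2 smokes and carries the gene but lacks the third slice, so no pie is complete.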

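4.6 Strength of a Component Cause

How common the component is among cases. Removing a common component has a large preventive effect.

Example: if smoking is the most common component of lung cancer, taking part in 50%+ of cases, then removing smoking would cut lung cancer by 50%+, as long as those cases have no other completed sufficient cause.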
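The preventive effect of removing a component can be counted directly in a toy population. Everything below (the two sufficient causes, the five people) is hypothetical and chosen only to make the arithmetic visible.

```python
SUFFICIENT_CAUSES = [
    {"smoking", "gene_X"},
    {"asbestos", "gene_Y"},
]

def has_outcome(components):
    """True if at least one sufficient cause is complete."""
    return any(sc <= components for sc in SUFFICIENT_CAUSES)

# Hypothetical population of 5 people
population = [
    {"smoking", "gene_X"},
    {"smoking", "gene_X", "asbestos"},
    {"smoking", "gene_X", "asbestos", "gene_Y"},  # two complete pies
    {"asbestos", "gene_Y"},
    {"smoking"},                                   # no complete pie
]

def cases_after_removing(component):
    """Number of cases if `component` were removed from everyone."""
    return sum(has_outcome(person - {component}) for person in population)

baseline = sum(has_outcome(person) for person in population)
for component in ["smoking", "asbestos"]:
    remaining = cases_after_removing(component)
    print(component, f"prevented {baseline - remaining} of {baseline} cases")
```

In this toy population smoking is a component in three of the four cases, yet removing it prevents only two, because the third person still completes the asbestos pie. The share of cases involving a component is an upper bound on what its removal prevents.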

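5 Relationship between Sufficient Causes and Counterfactuals

5.1 Direction 1: Sufficient Cause → Counterfactual

If the distribution of all sufficient causes is known, every counterfactual quantity can be computed.

Example: \(\Pr(Y^{a=1}=1)\) is the proportion of individuals in whom at least one sufficient cause is completed when \(A=1\) is set. It can be computed once the exact SC structure is known.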
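The same idea extends to two treatments: given an SC structure and each person's background components, the response type follows by setting \(A\) and \(E\) and checking for a completed pie. The structure below, with unmeasured components `U1`, `U2`, `U3`, is an assumption made for this sketch.

```python
# Hypothetical SC structure: A and E are the treatments,
# U1, U2, U3 are unmeasured background components.
SUFFICIENT_CAUSES = [{"A", "U1"}, {"E", "U2"}, {"A", "E", "U3"}]

def potential_outcome(background, a, e):
    """Y^{a,e} for a person with the given background components."""
    present = set(background)
    if a:
        present.add("A")
    if e:
        present.add("E")
    return int(any(sc <= present for sc in SUFFICIENT_CAUSES))

def response_type(background):
    """(Y11, Y10, Y01, Y00), the column order used in this post."""
    return tuple(potential_outcome(background, a, e)
                 for a, e in [(1, 1), (1, 0), (0, 1), (0, 0)])

population = [{"U1"}, {"U2"}, {"U3"}, set()]   # four hypothetical people
for background in population:
    print(sorted(background), response_type(background))

risk_a1_e1 = sum(response_type(b)[0] for b in population) / len(population)
print("Pr(Y^{1,1} = 1) =", risk_a1_e1)
```

Here the four people come out as types 4, 6, 8 and 16, and the counterfactual risk under joint treatment is the share of people with any completed pie.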

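5.2 Direction 2: Counterfactual → Sufficient Cause

The reverse direction is generally impossible: the same counterfactual distribution can arise from several sufficient cause structures.

Implication: counterfactual risks are identifiable from data under the usual identifiability assumptions, whereas the sufficient cause structure requires additional assumptions.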
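A concrete instance of this many-to-one mapping: two people with different sets of completable pies can share the same response type. The SC structure and the background labels `U1` to `U3` are hypothetical.

```python
SUFFICIENT_CAUSES = [{"A", "U1"}, {"E", "U2"}, {"A", "E", "U3"}]

def response_type(background):
    """(Y11, Y10, Y01, Y00) implied by the background components."""
    def y(a, e):
        present = set(background)
        present |= {"A"} if a else set()
        present |= {"E"} if e else set()
        return int(any(sc <= present for sc in SUFFICIENT_CAUSES))
    return (y(1, 1), y(1, 0), y(0, 1), y(0, 0))

only_u1 = {"U1"}            # only the A-pie can be completed
u1_and_u3 = {"U1", "U3"}    # the joint A+E pie can be completed too

print(response_type(only_u1))
print(response_type(u1_and_u3))
```

Both people are type 4, because whenever the joint pie completes, the A-pie has already completed. Knowing the response type therefore does not reveal whether the joint pie is present.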

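6 Strengths of the Two Frameworks

6.1 Counterfactual

Strength	Description
Suited to statistical analysis	Rich toolbox: standardization, IP weighting, etc.
Clear identifiability	States what is identifiable under which assumptions
Natural fit for randomized trials	The basic framework of RCTs
Clear estimands (ATE, CATE, etc.)	Precise definitions plus estimators

6.2 Sufficient Cause

Strength	Description
Biological intuition	A natural way to think about mechanisms
Epidemiologic applications	Identifying risk factors, acknowledging multiple causation
Public health	Metrics such as the population attributable fraction (PAF)
Causal complexity	Multiple pathways are expressed naturally

6.3 Hernán's integrated position (Ch.5.6)

In paraphrase: the two frameworks are complementary, two sides of the same causal phenomenon. Counterfactuals serve statistical analysis; sufficient causes serve the discussion of mechanisms.

This series focuses on the counterfactual framework (as the tool for statistical analysis). The sufficient cause framework serves as an interpretive aid.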

7 Example: Smoking + Asbestos and Lung Cancer

7.1 Data (hypothetical)

4 cells:

Smoking (\(A\))	Asbestos (\(E\))	Lung cancer risk
0	0	0.01
1	0	0.10 (smoking alone: +0.09)
0	1	0.05 (asbestos alone: +0.04)
1	1	0.50 (both: +0.49)

7.2 Counterfactual analysis

Additive interaction = \(0.49 - 0.09 - 0.04 = +0.36\): superadditive.

Multiplicative: \(\frac{0.50/0.01}{(0.10/0.01) \times (0.05/0.01)} = \frac{50}{10 \times 5} = 1\): no multiplicative interaction.
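The two calculations above can be reproduced from the four cell risks. A short sketch using the hypothetical table values; the function names are illustrative.

```python
# Risks from the hypothetical table, keyed by (smoking, asbestos)
risk = {(0, 0): 0.01, (1, 0): 0.10, (0, 1): 0.05, (1, 1): 0.50}

def additive_interaction(r):
    """Joint risk difference minus the sum of the two single risk differences."""
    return (r[1, 1] - r[0, 0]) - (r[1, 0] - r[0, 0]) - (r[0, 1] - r[0, 0])

def multiplicative_interaction(r):
    """Joint risk ratio divided by the product of the two single risk ratios."""
    rr11 = r[1, 1] / r[0, 0]
    rr10 = r[1, 0] / r[0, 0]
    rr01 = r[0, 1] / r[0, 0]
    return rr11 / (rr10 * rr01)

print(f"additive interaction       = {additive_interaction(risk):+.2f}")
print(f"multiplicative interaction = {multiplicative_interaction(risk):.2f}")
```

The same data are superadditive on the difference scale and show no interaction on the ratio scale, which is why the scale has to be stated whenever interaction is reported.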

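7.3 Sufficient cause interpretation

Possible SCs:

SC1: smoking + genetic variant X + environmental factor (lung cancer in a subset of smokers)

SC2: asbestos exposure + genetic variant Y (a subset of asbestos workers)

SC3: smoking + asbestos + weakened immunity (synergy of the two)

The presence of SC3 explains the superadditive interaction: an SC exists in which both exposures are components, which is biologic interaction.

7.4 Implications

For smokers, avoiding asbestos matters a great deal (it blocks SC3). For asbestos workers, avoiding smoking matters just as much.

The sufficient cause framework gives intuition for prevention priorities.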

8 Simulation: Counterfactual Risks from a Response-Type Distribution

import numpy as np

rng = np.random.default_rng(42)

# Distribution over the 16 types in a hypothetical population
# (only some types are present, for simplicity)
n = 10000
type_distribution = {
    1:  0.10,   # Doomed (1111)
    4:  0.20,   # only A matters, harmful (1100)
    6:  0.15,   # only E matters, harmful (1010)
    8:  0.05,   # AND (1000): outcome only when both are received
    11: 0.10,   # only E matters, protective (0101)
    13: 0.10,   # only A matters, protective (0011)
    16: 0.30,   # Immune (0000)
}
# Normalize so the proportions sum to 1
total = sum(type_distribution.values())
for k in type_distribution:
    type_distribution[k] /= total

# Counterfactual outcomes per type, ordered (Y^{1,1}, Y^{1,0}, Y^{0,1}, Y^{0,0})
type_to_cf = {
    1:  (1, 1, 1, 1),
    2:  (1, 1, 1, 0),
    3:  (1, 1, 0, 1),
    4:  (1, 1, 0, 0),
    5:  (1, 0, 1, 1),
    6:  (1, 0, 1, 0),
    7:  (1, 0, 0, 1),
    8:  (1, 0, 0, 0),
    9:  (0, 1, 1, 1),
    10: (0, 1, 1, 0),
    11: (0, 1, 0, 1),
    12: (0, 1, 0, 0),
    13: (0, 0, 1, 1),
    14: (0, 0, 1, 0),
    15: (0, 0, 0, 1),
    16: (0, 0, 0, 0),
}

# Sample patients (one response type per patient)
types = rng.choice(list(type_distribution.keys()),
                   size=n, p=list(type_distribution.values()))

# Position of each treatment combination (a, e) inside a type_to_cf tuple
cf_index = {(1, 1): 0, (1, 0): 1, (0, 1): 2, (0, 0): 3}

# Counterfactual risk under every treatment combination
print("[Counterfactual risks in a population of 16 response types]\n")
print(f"Type distribution: {type_distribution}\n")

risk = {}
for (a, e) in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    risk[(a, e)] = np.mean([type_to_cf[t][cf_index[(a, e)]] for t in types])
    print(f"  Pr(Y^{{a={a},e={e}}} = 1) = {risk[(a, e)]:.3f}")

# Counterfactual interaction on the additive scale
p11, p10, p01, p00 = risk[(1, 1)], risk[(1, 0)], risk[(0, 1)], risk[(0, 0)]

print("\n[Interaction]")
print(f"  Additive interaction = (p11 - p01) - (p10 - p00) = {(p11 - p01) - (p10 - p00):+.3f}")
print("  Type 8 (AND) is the only type here with a nonzero interaction contrast, so it drives the superadditivity")
print("  → the same statistical result can arise from *several type distributions* (identifiability limit)")
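The sampled value fluctuates around an exact quantity that can be computed without simulation: the additive interaction is the population average of the individual contrast \(Y^{1,1} - Y^{1,0} - Y^{0,1} + Y^{0,0}\). The sketch below reuses the hypothetical type proportions from the simulation above; `type_contrast` is an illustrative helper name.

```python
# Same hypothetical distribution as in the simulation above
type_distribution = {1: 0.10, 4: 0.20, 6: 0.15, 8: 0.05,
                     11: 0.10, 13: 0.10, 16: 0.30}
type_to_cf = {
    1: (1, 1, 1, 1), 4: (1, 1, 0, 0), 6: (1, 0, 1, 0), 8: (1, 0, 0, 0),
    11: (0, 1, 0, 1), 13: (0, 0, 1, 1), 16: (0, 0, 0, 0),
}

def type_contrast(cf):
    """Individual-level interaction contrast Y11 - Y10 - Y01 + Y00."""
    y11, y10, y01, y00 = cf
    return y11 - y10 - y01 + y00

# Each type contributes (its proportion) x (its contrast)
contributions = {t: share * type_contrast(type_to_cf[t])
                 for t, share in type_distribution.items()}
exact_interaction = sum(contributions.values())

for t, c in contributions.items():
    print(f"type {t:>2}: contribution {c:+.2f}")
print(f"exact additive interaction = {exact_interaction:+.2f}")
```

With these proportions every type except type 8 has a zero contrast, so the exact additive interaction equals the 0.05 share of type 8. Other mixes, for instance ones that add both AND-like and XOR-like types, could produce the same total, which is the identifiability limit again.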

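9 Conclusion

The 16 response types for two treatments are the structure underlying interaction. The sufficient cause framework complements this with biological intuition. The two frameworks describe different sides of the same phenomenon.

Key messages:

16 types: 4 types for a single treatment → 16 types for two treatments
Single-treatment vs two-treatment effect types: the classification
Identifiability limit: the counterfactual distribution does not pin down a unique type distribution
Sufficient Cause (Rothman 1976): component causes, sufficient causes, causal pies
Multiple SCs: several causal pathways to one outcome
Integrating the two frameworks: counterfactuals for statistics, sufficient causes for mechanisms

The next post goes deeper into Sufficient Cause Interaction (5.5) and the integration of the two frameworks (5.6).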

10 Related Topics

Prerequisites

Interaction: Ch.5 overview
Joint interventions and identification
(Phase D) Counterfactual framework

Later posts in Phase J

Sufficient Cause Interaction (5.5+5.6) (placeholder)

11 References

Hernán, M. A. & Robins, J. M. (2020). Causal Inference: What If, Chapter 5.3, 5.4. Chapman & Hall/CRC.
Rothman, K. J. (1976). Causes. Am. J. Epidemiol. 104, 587-592.
Greenland, S. & Brumback, B. (2002). An overview of relations among causal modelling methods. Int. J. Epidemiol. 31, 1030-1037.
VanderWeele, T. J. (2009a). Sufficient cause interactions and statistical interactions. Epidemiology 20, 6-13.
VanderWeele, T. J. & Robins, J. M. (2008). Empirical and counterfactual conditions for sufficient cause interactions. Biometrika 95, 49-61.
Greenland, S. (2009). Interactions in epidemiology: relevance, identification, and estimation. Epidemiology 20, 14-17.