Kwangmin Kim - Sufficient Cause Interaction

This is the eighth post in the Phase J series and the last post in the Hernán Ch.5 series. It covers the precise definition of sufficient cause interaction and its relationship to counterfactual interaction (Hernán Ch.5.5, 5.6).

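1 Entry Intuition: “Two Treatments Inside the Same Mechanism”

The previous post covered the basics of the sufficient cause framework. A sufficient cause is a completed causal mechanism: once all of its components are present, the outcome occurs.

The key question: are the two treatments \(A\) and \(E\) components of the same sufficient cause? In other words, do they act together inside the same mechanism?

If so, a biologic interaction, also called a sufficient cause interaction, is present.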

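1.1 Visual Intuition

Sufficient Cause SC1:
   [A] [E] [C] = completed circle → outcome

A and E are both components → "biologic interaction"
If either A or E is missing, SC1 stays incomplete

Analogy, fire: one sufficient cause of a fire is fuel + oxygen + ignition. All three are components of the same SC, and if any one is missing the fire does not start. This is a three-way biologic interaction.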

2 Definition of Sufficient Cause Interaction

Definition: Sufficient Cause Interaction (Rothman 1976)

A sufficient cause interaction exists when the two treatments \(A\) and \(E\) are components of the same sufficient cause.

2.1 Formal Statement

Some sufficient cause \(SC\) has the form:

\[ SC = A \cap E \cap (\text{other components}) \]

That is, both \(A=1\) and \(E=1\) are components.

2.2 Implication

For \(SC\) to be completed, both \(A=1\) and \(E=1\) are required. If either one is absent, the outcome cannot occur through that SC pathway.
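The implication above can be sketched in code. This is a minimal illustration, not part of Hernán's text: the class name `SufficientCause`, its fields, and the `background_present` flag (standing in for the "other components") are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SufficientCause:
    """A sufficient cause: the outcome occurs once every component is present."""
    name: str
    needs_a: bool  # is A=1 a component of this cause?
    needs_e: bool  # is E=1 a component of this cause?

    def completed(self, a: int, e: int, background_present: bool) -> bool:
        # The cause completes only if the other (background) components are
        # present and every treatment component it requires is switched on.
        if not background_present:
            return False
        if self.needs_a and a != 1:
            return False
        if self.needs_e and e != 1:
            return False
        return True

    @property
    def is_interaction(self) -> bool:
        # Sufficient cause interaction: A and E sit in the same cause.
        return self.needs_a and self.needs_e


sc = SufficientCause("SC", needs_a=True, needs_e=True)
for a in (0, 1):
    for e in (0, 1):
        print(f"A={a}, E={e}: completed = {sc.completed(a, e, background_present=True)}")
```

Only the (A=1, E=1) cell completes this cause, which is the sense in which removing either treatment blocks the pathway.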

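Intuition: counterfactual interaction is a pattern in population-level counterfactual risks, whereas sufficient cause interaction is a statement about mechanism. The two live on different levels.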

3 Counterfactual vs Sufficient Cause Interaction: The Exact Relationship

3.1 Hernán's Insight

“The two frameworks agree only partially. When sufficient cause interaction is present, counterfactual interaction is usually present too. The converse, however, does not always hold.”

3.2 The Result of VanderWeele & Robins (2008)

Inferring sufficient cause interaction from counterfactual interaction requires additional assumptions. In general, a statistical pattern is not a mechanism.

3.3 Example: Counterfactual Interaction Present, Sufficient Cause Interaction Not Guaranteed

The four cells:

\(A\)	\(E\)	risk
0	0	0.10
1	0	0.20
0	1	0.30
1	1	0.50

Additive interaction = \((0.50 - 0.30) - (0.20 - 0.10) = 0.10\). This is a superadditive counterfactual interaction.

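Now read the same table through sufficient causes:

Background cause (neither \(A\) nor \(E\) as a component): risk 0.10

SC1 (\(A\) only): adds risk 0.10 when A=1

SC2 (\(E\) only): adds risk 0.20 when E=1

Assume the causes act independently and no SC contains both treatments as components.

Under independence, matching the single-exposure risks requires SC1 to complete with probability \(1/9\) and SC2 with probability \(2/9\), so the joint-exposure risk would be \(1 - 0.90 \times (8/9) \times (7/9) \approx 0.38\). That is below 0.50.

The observed \(\Pr[Y=1 \mid A=1, E=1] = 0.50\) is larger, which suggests an additional SC that has both treatments as components.

Conclusion: a statistically superadditive interaction points to possible sufficient cause interaction, but it does not make it certain.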
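One way to make the independence benchmark explicit is sketched below. It assumes a background cause with risk 0.10 (the A=0, E=0 cell) plus an A-only and an E-only cause that complete independently. The function name `independent_benchmark` is hypothetical, and the independence assumption is a simplification used for illustration, not a condition from Hernán or VanderWeele & Robins.

```python
def independent_benchmark(p00: float, p10: float, p01: float):
    """Joint-exposure risk implied by independent background, A-only and E-only causes."""
    # Probability that the A-only cause completes, among those who escape the background cause.
    q_a = 1 - (1 - p10) / (1 - p00)
    # Same for the E-only cause.
    q_e = 1 - (1 - p01) / (1 - p00)
    # Risk when both treatments are on: 1 minus the probability that all three causes fail.
    joint = 1 - (1 - p00) * (1 - q_a) * (1 - q_e)
    return q_a, q_e, joint


p00, p10, p01, p11 = 0.10, 0.20, 0.30, 0.50
q_a, q_e, joint = independent_benchmark(p00, p10, p01)
print(f"q_a = {q_a:.3f}, q_e = {q_e:.3f}")
print(f"benchmark joint risk = {joint:.3f}, observed joint risk = {p11:.2f}")
```

The gap between the benchmark and the observed 0.50 is what hints at a cause containing both treatments; it is a heuristic signal rather than a proof.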

3.4 Example: No Counterfactual Interaction, but Sufficient Cause Interaction Present

This is possible. The effects of different sufficient causes can cancel out statistically.

Hernán's note (Ch.5.2): “Interaction between A and E without modification of the effect of A by E is also logically possible, but rare. It requires dual effects of A and exact cancellations.”
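A small constructed example of such a cancellation is sketched below. The setup is an assumption made for illustration: three independent causes (A-only, E-only, and one with both A and E), with hypothetical shares `q1`, `q2`, `q3`. Here `q3` is chosen so that the upward push from the joint cause exactly offsets the downward push from individuals who carry both single-treatment causes. Exact fractions avoid rounding noise.

```python
from fractions import Fraction

q1 = Fraction(3, 10)  # share able to complete the A-only cause
q2 = Fraction(1, 4)   # share able to complete the E-only cause
# Share for the A-and-E cause chosen so the additive contrast is exactly zero.
q3 = q1 * q2 / ((1 - q1) * (1 - q2))


def risk(a: int, e: int) -> Fraction:
    """Risk under (a, e): 1 minus the probability that no cause completes."""
    p_none = Fraction(1)
    if a == 1:
        p_none *= 1 - q1
    if e == 1:
        p_none *= 1 - q2
    if a == 1 and e == 1:
        p_none *= 1 - q3
    return 1 - p_none


interaction = risk(1, 1) - risk(1, 0) - risk(0, 1) + risk(0, 0)
print(f"q3 = {q3}")
print(f"risks: p00={risk(0, 0)}, p10={risk(1, 0)}, p01={risk(0, 1)}, p11={risk(1, 1)}")
print(f"additive interaction = {interaction}")
```

A cause containing both treatments exists for a share `q3` of the population, yet the additive contrast is exactly zero, so the risk table alone would not reveal it.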

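3.5 Empirical Identification

Identifying sufficient cause interaction directly is difficult. An additional monotonicity assumption is needed (VanderWeele & Robins 2008).

Monotonicity: the treatment harms no one (or, alternatively, benefits no one). This is a strong assumption.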
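The role of monotonicity can be illustrated with the risk contrasts that, as I read VanderWeele & Robins (2008) and the summary in Hernán Ch.5.5, are sufficient for sufficient cause interaction: \(p_{11} - p_{10} - p_{01} > 0\) without monotonicity, and the weaker \(p_{11} - p_{10} - p_{01} + p_{00} > 0\) when both treatments have monotonic effects. The sketch below assumes the four risks are valid counterfactual risks (for example, under exchangeability); the function name and the tolerance `tol` are hypothetical.

```python
def sufficient_cause_interaction_checks(p00, p10, p01, p11, tol=1e-12):
    """Evaluate two sufficient (not necessary) conditions for sufficient cause interaction."""
    no_monotonicity = p11 - p10 - p01
    with_monotonicity = p11 - p10 - p01 + p00
    return {
        "contrast_without_monotonicity": no_monotonicity,
        "implied_without_monotonicity": no_monotonicity > tol,
        "contrast_with_monotonicity": with_monotonicity,
        "implied_with_monotonicity": with_monotonicity > tol,
    }


# The four-cell risks from section 3.3.
checks = sufficient_cause_interaction_checks(p00=0.10, p10=0.20, p01=0.30, p11=0.50)
for key, value in checks.items():
    print(f"{key}: {value}")
```

For the table in section 3.3 the first contrast is zero and the second is positive, so the data imply sufficient cause interaction only if monotonicity is granted. A failed check does not rule the interaction out, because both conditions are only sufficient.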

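4 Biologic Interaction: Terminology and Meaning

4.1 Other Names

Sufficient cause interaction is often also called biologic interaction or mechanistic interaction.

4.2 Meaning

Two factors act together in a biological mechanism. The term refers to the underlying biology, not to a statistical pattern.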

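4.3 Case: Smoking + Asbestos

Lung cancer risk rises synergistically in people who both smoke and are exposed to asbestos. The mechanism:

Smoking damages the bronchial mucosa and increases absorption of carcinogens

Asbestos exposes the mucosa to carcinogenic fibers

Both factors accumulate in the same lung tissue → faster transformation of the same cells

Same biologic pathway, hence biologic interaction.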

4.4 Clinical Implication

Biologic interaction informs prevention priorities. Asbestos exposure is especially dangerous for smokers, so the recommendation is to avoid both.

5 Counterfactual or Sufficient Causes? (Ch.5.6)

5.1 Hernán's Integrated Position

“The two frameworks are complementary. They describe different aspects of the same causal phenomenon.”

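5.2 Recommended Use

Analysis goal	Framework
Statistical estimation (ATE, CATE, etc.)	Counterfactual
Reviewing identification assumptions	Counterfactual
Discussing biologic mechanism	Sufficient Cause
Prevention priorities	Sufficient Cause
Communicating with the public	Either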

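5.3 Hernán's Priority

“The counterfactual framework comes first. Sufficient causes are an aid to interpretation. Do the statistical analysis with counterfactuals and discuss mechanism with sufficient causes.”

5.4 Why

The counterfactual framework has strict statistical definitions and clearly stated identification assumptions. The sufficient cause framework is flexible but depends on additional assumptions.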

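5.5 Limits of Both Frameworks

Both are partial representations of the underlying causal structure. The true causal structure cannot be fully identified from data.

Hernán's emphasis: “Causal inference always rests on assumptions. Different frameworks bring different assumptions and perspectives.”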

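6 Synthesis of the Phase J Ch.4 and Ch.5 Series

The combined message of the 8 posts (4 on Ch.4 + 4 on Ch.5):

6.1 1: Heterogeneity Is Intrinsic

A treatment effect is a distribution, not a single number. It is not the same for everyone.

6.2 2: Effect Modification ≠ Interaction

The causal status of the variable (manipulable vs baseline) is the decisive difference. V (modifier) vs E (treatment).

6.3 3: Two Scales of Statistical Interaction

Additive (RD) and multiplicative (RR). Same data, different conclusions. Report both.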
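The "same data, different conclusions" point can be checked on the four-cell risks used in section 3.3. This is a minimal sketch; the function name `interaction_on_two_scales` is hypothetical, and the multiplicative measure shown is the ratio of the joint risk ratio to the product of the single-exposure risk ratios.

```python
def interaction_on_two_scales(p00, p10, p01, p11):
    """Return (additive contrast, multiplicative ratio) for a 2x2 table of risks."""
    # Additive scale: difference of risk differences.
    additive = (p11 - p01) - (p10 - p00)
    # Multiplicative scale: RR11 / (RR10 * RR01), with the (0, 0) cell as reference.
    rr10, rr01, rr11 = p10 / p00, p01 / p00, p11 / p00
    multiplicative = rr11 / (rr10 * rr01)
    return additive, multiplicative


additive, multiplicative = interaction_on_two_scales(0.10, 0.20, 0.30, 0.50)
print(f"additive contrast    = {additive:+.3f}  (> 0 means superadditive)")
print(f"multiplicative ratio = {multiplicative:.3f}  (< 1 means submultiplicative)")
```

The same table is superadditive on the risk difference scale and submultiplicative on the risk ratio scale, which is why reporting a single scale can mislead.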

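6.4 4: The Two Faces of Stratification

Identifying effect modification + adjusting for confounding. Same tool, different purposes.

6.5 5: Matching Is Stratification Taken to the Extreme

The appeal and limits of propensity score matching. It tends to estimate the ATT.

6.6 6: Joint Intervention \(Y^{a,e}\)

The foundation of two-treatment causal inference. Identification of all four cells.

6.7 7: The Explosion of Response Types

Single 4 → Joint 16. Identifiability limit: the distribution of types cannot be uniquely identified.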
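The counts 4 and 16 follow from enumerating every binary outcome pattern over the treatment levels. The short sketch below (variable names are illustrative) also shows why the type distribution is not pinned down: 16 type proportions have 15 free parameters, while the data supply only 4 cell risks.

```python
from itertools import product

# One binary treatment: a response type fixes the outcome under a=0 and a=1.
single_types = list(product((0, 1), repeat=2))

# Two binary treatments: a response type fixes the outcome in each of the 4 cells.
cells = [(0, 0), (1, 0), (0, 1), (1, 1)]
joint_types = [dict(zip(cells, outcomes)) for outcomes in product((0, 1), repeat=len(cells))]

print(f"single-treatment response types: {len(single_types)}")
print(f"joint response types:            {len(joint_types)}")
print(f"free type proportions: {len(joint_types) - 1}, observable cell risks: {len(cells)}")
```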

6.8 8: Sufficient Cause Framework

A mechanistic complement. It supplies the intuition for biologic interaction. Counterfactuals first, sufficient causes as support.

7 Next Steps in the Phase J Series

These 8 posts (Ch.4 + Ch.5) are the classical statistical treatment of HTE. What follows:

J-MLHTE series (4 posts): ML-based HTE, covering Meta-learners, Causal Forest, DML

J-DID series (4 posts): Difference-in-Differences

J-RDD series (4 posts): Regression Discontinuity

J-SWITCH series (4 posts): Switchback, Geo

J-ADAPT series (4 posts): Adaptive Trial

Each of these advanced applications draws on a different aspect of effect modification and interaction.

8 Simulation: Sufficient Cause vs Counterfactual

import numpy as np

np.random.seed(42)

# Assumption: three sufficient causes
# SC1: only A is a treatment component (combined with other components); 30% of patients carry those other components
# SC2: only E is a treatment component; 25% of patients carry its other components
# SC3: both A AND E are components; 15% of patients carry its other components
# Disease occurs only if at least one SC is completed

n = 10000
has_sc1 = np.random.random(n) < 0.30   # carries the other components needed to complete SC1
has_sc2 = np.random.random(n) < 0.25
has_sc3 = np.random.random(n) < 0.15

# Outcomes in the four cells
def get_outcome(a, e):
    """Outcome under treatment (A, E) (1 = disease)."""
    sc1_active = has_sc1 & (a == 1)
    sc2_active = has_sc2 & (e == 1)
    sc3_active = has_sc3 & (a == 1) & (e == 1)
    return (sc1_active | sc2_active | sc3_active).astype(int)

# 4 cell risks
print("[4 cell risks: SC framework simulation]\n")
risks = {}
for a in [0, 1]:
    for e in [0, 1]:
        Y = get_outcome(a, e)
        risks[(a, e)] = Y.mean()
        print(f"  Pr(Y^{{a={a},e={e}}} = 1) = {Y.mean():.3f}")

# Counterfactual interaction
p00 = risks[(0, 0)]
p10 = risks[(1, 0)]
p01 = risks[(0, 1)]
p11 = risks[(1, 1)]
interaction = (p11 - p01) - (p10 - p00)

print("\n[Counterfactual Interaction]")
print(f"  Additive interaction = (p11 - p01) - (p10 - p00) = {interaction:+.3f}")
# Expected value is 0.15*0.70*0.75 - 0.30*0.25 = +0.00375: SC3 pushes the contrast up,
# the overlap of SC1 and SC2 pushes it down, so the estimated sign is mostly sampling noise.
scale = "superadditive" if interaction > 0 else "not superadditive"
print(f"  Estimated contrast is {scale}; SC3 (both as components) pushes it upward")

# Measure the sufficient cause shares directly
print("\n[Sufficient Cause distribution (simulation ground truth)]")
print(f"  Carries SC1 components: {has_sc1.mean():.2%}")
print(f"  Carries SC2 components: {has_sc2.mean():.2%}")
print(f"  Carries SC3 components (A∩E): {has_sc3.mean():.2%}")

print("\n[Intuition]")
print("  → Counterfactual superadditivity is a *signal* of biologic interaction")
print("  → But the *exact distribution* of SCs is not statistically identifiable")
print("  → *Several* SC distributions can produce the same RD pattern")

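Interpreting the results:

4 cell risks: they follow directly from the SC distribution.

Additive interaction: it reflects SC3 net of the overlap between SC1 and SC2. In expectation it equals \(0.15 \times 0.70 \times 0.75 - 0.30 \times 0.25 = +0.00375\), so with these parameters the contrast is close to zero even though SC3 is present.

Identification limit: the exact shares of SC1, SC2, and SC3 are hard to estimate from statistical data alone.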
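The identification limit can be made concrete with two constructed populations that yield identical four-cell risks, only one of which contains a cause with both A and E. Everything here is hypothetical and chosen for illustration: the shares, the dictionary layout, and the helper `risk`. Each key lists the sufficient causes whose background components an individual carries.

```python
from fractions import Fraction as F

# Treatments each cause needs switched on in order to complete.
REQUIRES = {"SC1": {"a"}, "SC2": {"e"}, "SC3": {"a", "e"}}

# Population without any cause that contains both A and E.
pop_without = {("SC1",): F(2, 10), ("SC2",): F(3, 10), (): F(5, 10)}
# Population where 10% can only complete SC3 (A and E together).
pop_with = {
    ("SC1",): F(1, 10),
    ("SC2",): F(2, 10),
    ("SC1", "SC2"): F(1, 10),
    ("SC3",): F(1, 10),
    (): F(5, 10),
}


def risk(population, a, e):
    """Share of the population with at least one completed cause under (a, e)."""
    switched_on = {name for name, value in (("a", a), ("e", e)) if value == 1}
    return sum(
        share
        for causes, share in population.items()
        if any(REQUIRES[c] <= switched_on for c in causes)
    )


risks_without = {(a, e): risk(pop_without, a, e) for a in (0, 1) for e in (0, 1)}
risks_with = {(a, e): risk(pop_with, a, e) for a in (0, 1) for e in (0, 1)}
print("without SC3:", {k: str(v) for k, v in risks_without.items()})
print("with SC3:   ", {k: str(v) for k, v in risks_with.items()})
print("identical risk tables:", risks_without == risks_with)
```

Because both populations produce the same risks in every cell, no analysis of those risks can tell them apart; distinguishing them needs assumptions or knowledge from outside the data.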

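9 Conclusion: Synthesis of the Phase J Ch.4+5 Series

Effect modification and interaction are how causal inference expresses heterogeneity. The counterfactual framework takes priority for statistical analysis, and the sufficient cause framework complements it on mechanism. Combining the two perspectives gives a richer causal understanding.

Key messages (across the 8 posts):

Heterogeneity is intrinsic: a treatment effect is a distribution
Effect Modification ≠ Interaction: the variables differ in causal status
Two scales (Additive, Multiplicative): report both
The two faces of stratification: identification + adjustment
Matching = stratification at the limit: a different trade-off
Joint Intervention \(Y^{a,e}\): causal inference with two treatments
Response Types: the limits of identifiability
Sufficient Cause: a mechanistic complement
Integrating the two frameworks: Hernán's recommendation

The next series (J-MLHTE) covers ML-based HTE estimation: Meta-learners, Causal Forest, DML.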

10 Related Topics

Prerequisites

Phase J follow-up series

ML HTE: Meta-learners, Causal Forest, DML (placeholder)
DiD: Difference-in-Differences (placeholder)
RDD: Regression Discontinuity (placeholder)
Switchback·Geo (placeholder)
Adaptive Clinical Trial (placeholder)

11 References

Hernán, M. A. & Robins, J. M. (2020). Causal Inference: What If, Chapter 5.5, 5.6. Chapman & Hall/CRC.
VanderWeele, T. J. & Robins, J. M. (2008). Empirical and counterfactual conditions for sufficient cause interactions. Biometrika 95, 49-61.
VanderWeele, T. J. (2009b). Sufficient cause interactions and statistical interactions. Epidemiology 20, 6-13.
Rothman, K. J. (1976). Causes. Am. J. Epidemiol. 104, 587-592.
Greenland, S. & Poole, C. (1988). Invariants and noninvariants in the concept of interdependent effects. Scand. J. Work. Environ. Health 14, 125-129.
VanderWeele, T. J. (2015). Explanation in Causal Inference: Methods for Mediation and Interaction. Oxford University Press.