1 Experimentation

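Starting Point: Learning Roadmap

If this is your first visit, read "Experimentation Learning Roadmap: From Epidemiology to Causal Inference and A/B Testing" first. Within five minutes it gives you the mapping of 11 phases to 7 textbooks and the entry points for each type of reader.

This section lays out a structured learning path for a systematic understanding of experimental design and causal inference.
It builds on epidemiology fundamentals to cover experimentation as a whole, including A/B testing and multi-armed bandits.

1.1 Experimentation Substructure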

Experimentation
- Epidemiology
- AB_test
- Fundamentals
- MAB (Multi-Armed Bandit)
- Causal_Inference
- Advanced
- Platform

1.2 Learning Path

The material covers experimental design and causal inference systematically, in the order Epidemiology → Classical A/B Testing → Multi-Armed Bandits → Advanced Methods.

1.2.1 Core Concepts Connection Map

Epidemiology RCT → A/B Testing → Sequential Testing → Adaptive Testing → MAB
       ↓              ↓                                        ↓
Causal Inference → DAG/SUTVA ────────────────────→ Interference Handling
       ↓              ↓                                        ↓
Effect Measures → Lift/Uplift ──────────────────→ Heterogeneous TE
       ↓
Sample Size → Power Analysis → Sequential Monitoring → Variance Reduction

1.2.2 Key Mathematical Connections

Epidemiology	Experimentation	Formula
Relative Risk	Lift	\(\frac{P(Y=1 \mid T=1)}{P(Y=1 \mid T=0)}\)
Risk Difference	Absolute Uplift	\(P(Y=1 \mid T=1) - P(Y=1 \mid T=0)\)
NNT	Number to Convert	\(\frac{1}{RD}\)
Effect Modification	HTE	\(E[Y(1)-Y(0) \mid X]\)
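
As a small illustration of the mapping in the table, the sketch below computes the ratio and difference measures from raw conversion counts. The function name `effect_measures` and the counts are made up for this example, and it assumes the control group has at least one conversion. Many teams report relative lift as RR − 1 rather than RR itself, so check the convention in use before comparing numbers.

```python
def effect_measures(conv_t, n_t, conv_c, n_c):
    """Compute RR (lift), RD (absolute uplift) and 1/|RD| from conversion counts.

    conv_t, conv_c: conversions in treatment and control (hypothetical inputs)
    n_t, n_c: users assigned to treatment and control
    Assumes conv_c > 0 so that the ratio is defined.
    """
    p_t = conv_t / n_t  # P(Y=1 | T=1)
    p_c = conv_c / n_c  # P(Y=1 | T=0)
    rd = p_t - p_c
    return {
        "relative_risk": p_t / p_c,
        "risk_difference": rd,
        # Undefined when the two rates are equal, so report infinity.
        "number_to_convert": float("inf") if rd == 0 else 1 / abs(rd),
    }


measures = effect_measures(conv_t=120, n_t=1000, conv_c=100, n_c=1000)
print(measures)
```

With a 12% versus 10% conversion rate, the risk difference is about 0.02, so roughly 50 users need to be exposed to gain one additional conversion.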

1.3 Foundations

1.3.1 Epidemiology Foundations

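The randomized controlled trial (RCT) is the direct prototype of A/B testing, and the mathematical framework of causal inference (potential outcomes, SUTVA) provides the theoretical basis for experimental design. Epidemiological effect measures (RR, RD, NNT) map directly onto the lift and uplift metrics of digital experiments, and the concepts of bias and confounding are the core tools for judging an experiment's validity. Sample size and power calculations are the starting point of every experimental design.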

Epidemiology Overview ⭐
- 2023-02-27, Types of Study Designs in Epidemiology
- 2023-05-23, Measures of Risk: Relative Risk & Odds Ratio
- Comprehensive Measures Series ⭐ (new 2026-05-08)
  - 2026-05-08, Effect Measures: A Comprehensive Guide to Effect, Impact, and Causal Estimands (RD/ARR/NNT/PAR + ATE/ITT/LATE + p/CI/power + E-value + OEC/MDE)
  - 2026-05-08, Time-to-Event Measures: A Comprehensive Guide to Time-to-Event Metrics (IR/IRR/HR/KM/Cox/SMR/SIR/Causal Survival/Meta-analysis)
  - 2026-05-08, Diagnostic & Screening Measures: A Comprehensive Guide to Diagnostic and Classification Metrics (Sn/Sp/PPV/LR±/AUC/Brier/Calibration + Bayesian update)
- Phase B (Schulz & Grimes Ch.2): Descriptive Studies
  - 2026-05-08, Descriptive Studies Overview: What They Can and Cannot Do
  - 2026-05-08, Descriptive Epidemiology: The 5 Ws + So What
  - 2026-05-08, Five Types of Descriptive Study: From Case Report to Ecological
  - 2026-05-08, Uses, Strengths, and Weaknesses of Descriptive Studies, and the Risk of Overstepping the Data
- Phase B (Schulz & Grimes Ch.4): Cohort Studies
  - 2026-05-08, Cohort Studies Overview: Marching from Exposure to Outcome
  - 2026-05-08, Prospective, Retrospective, and Ambidirectional Designs + Strengths of Cohorts
  - 2026-05-08, Survival Analysis and the Cox PH Model + Weaknesses of Cohorts
  - 2026-05-08, Cohort Appraisal Criteria and Managing Loss to Follow-up
  - 2026-05-08, Cohort Reporting Standard (STROBE) and Design Variants
- Phase B (Woodward Ch.5): Cohort Studies (statistical lens)
  - 2026-05-08, WOO Ch.5 Overview: Cohorts Through a Statistician's Eyes
  - 2026-05-08, Cohort Design and Analytical Considerations
  - 2026-05-08, Life Tables and Kaplan-Meier Estimation
  - 2026-05-08, Comparing Survival and Competing Risks
  - 2026-05-08, Person-Years Method and Period-Cohort Analysis
- Phase B (Schulz & Grimes Ch.5): Case-Control Studies
  - 2026-05-08, Case-Control Studies Overview: Research in Reverse
  - 2026-05-08, Basic Case-Control Design + Strengths and Weaknesses
  - 2026-05-08, Selecting Case and Control Groups
  - 2026-05-08, Measuring Exposure and Controlling Confounding
- Phase B (Schulz & Grimes Ch.6): Finding Controls
  - 2026-05-08, Finding Controls Overview: Compared to What?
  - 2026-05-08, Purpose of Controls + Known Group + RDD
  - 2026-05-08, Control Options for an Unknown Group
  - 2026-05-08, Number of Controls and Appraisal Criteria
- Phase B (Woodward Ch.6): Case-Control (statistical lens)
  - 2026-05-08, WOO Ch.6 Overview: Case-Control Through a Statistician's Eyes
  - 2026-05-08, Case-Control Design and Analysis Methods
  - 2026-05-08, Selecting Cases and Controls (statistical lens)
  - 2026-05-08, Matching and Matched Analysis: McNemar and Conditional Logistic
  - 2026-05-08, Nested Case-Control and Case-Cohort
  - 2026-05-08, Case-Crossover: Comparing Time Points Within the Same Person
- Phase B (Schulz & Grimes Ch.7): Limitations (Structural Limits of Observational Epidemiology)
  - 2026-05-08, Structural Limits of Observational Epidemiology: Overview
  - 2026-05-08, False Claims and Amateur Researchers
  - 2026-05-08, Administrative Databases and Weak Associations
  - 2026-05-08, Weaknesses of Peer Review and Research Fraud
- Phase B (Schulz & Grimes Ch.8): Screening
  - 2026-05-08, Screening Overview
  - 2026-05-08, Ethics of Screening and Criteria for Adoption
  - 2026-05-08, Validity and Predictive Value
  - 2026-05-08, Combining Tests: Sequential vs Parallel
  - 2026-05-08, Lead-time / Length Bias and Appraisal Guidelines
- Phase B (Schulz & Grimes Ch.9): Likelihood Ratio
  - 2026-05-08, Likelihood Ratio (LR) Overview
  - 2026-05-08, LR for Dichotomous Tests: Definition and Why Bother
  - 2026-05-08, Choosing a Cut-point and the Fagan Nomogram
  - 2026-05-08, Forms of the LR: 0 to 1, Large LRs, Multi-level
  - 2026-05-08, Pretest Probability + Diagnostic Thresholds + Limitations
- Phase B (Woodward Ch.4.5~4.6): Standardisation + MH
  - 2026-05-08, Standardisation and Mantel-Haenszel Overview
  - 2026-05-08, Direct Standardisation
  - 2026-05-08, Indirect Standardisation and SMR
  - 2026-05-08, The Mantel-Haenszel Method
- Phase B: Hill's Nine Criteria for Causation
  - 2026-05-08, Hill's Nine Criteria for Causation: A Guide to Causal Inference in Observational Studies

1.3.2 Phase H — Statistical Modeling (Woodward lens)

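A series that organizes statistical modeling through the lens of experiment analysis. It breaks down four chapters: logistic regression, Cox/survival, meta-analysis, and risk scores/ROC (37 posts in total). It cross-links to the GLM, Survival, and LDA series in the Statistics category.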

Phase H (Woodward Ch.10): Logistic Regression (8 posts)
- 2026-05-08, Logistic Regression Overview: An Experiment-Analysis Lens
- 2026-05-08, Three Problems with Standard Regression and the Logit Transform
- 2026-05-08, Logistic Models and Coefficient Interpretation (1): Binary and Quantitative
- 2026-05-08, Coefficient Interpretation (2): Categorical, Ordinal, FAR
- 2026-05-08, General Data and Multiple Logistic Regression
- 2026-05-08, Hypothesis Testing in Logistic Models: Wald, LR, GoF, IC
- 2026-05-08, Confounding and Interaction: Two Core Tools of Logistic Regression
- 2026-05-08, Handling Quantitative Explanatory Variables: Linear, Categorical, Spline
Phase H (Woodward Ch.11): Cox/Survival (10 posts)
- 2026-05-08, Survival and Cox Regression Overview: Modeling Follow-up Data
- 2026-05-08, Survival and Hazard Functions: Two Pillars of Time-to-Event Analysis
- 2026-05-08, Four Methods for Estimating the Hazard: KM, Person-time, Actuarial, NA
- 2026-05-08, Parametric Models: Exponential, Weibull, Log-logistic, Gompertz
- 2026-05-08, Parametric Proportional Hazards Regression
- 2026-05-08, Cox PH and Weibull PH
- 2026-05-08, Cox PH Model Diagnostics: Schoenfeld, Log-log, Time-interaction
- 2026-05-08, Competing Risks and Joint Modeling
- 2026-05-08, Poisson Regression: A Model for Person-Time Data
- 2026-05-08, Pooled Logistic Regression: Time Dependence + Causal Inference
Phase H (Woodward Ch.12): Meta-analysis (9 posts)
- 2026-05-08, Meta-analysis Overview: Pooled Analysis of Multiple Studies
- 2026-05-08, Systematic Review: The Cochrane Standard
- 2026-05-08, Fixed vs Random Effects: Two Models for IV Pooling
- 2026-05-08, Quantifying Heterogeneity: Q, I², τ²
- 2026-05-08, Pooling Different Outcomes: RD, Mean, Mixed
- 2026-05-08, Investigating Heterogeneity: Forest, Influence, Sensitivity, Meta-regression
- 2026-05-08, Pooling Tabular Data: Mantel-Haenszel, Peto, Zeros
- 2026-05-08, IPD Meta-analysis and Study Quality
- 2026-05-08, Publication Bias: Funnel, Egger, Trim-and-Fill
Phase H (Woodward Ch.13): Risk Scores and ROC (10 posts)
- 2026-05-08, Risk Scores and Clinical Decision Rules Overview
- 2026-05-08, Population- vs Individual-Level Intervention: Rose's Paradox
- 2026-05-08, Association vs Prognosis: The Division of Labor Between Causation and Prediction
- 2026-05-08, Deriving a Risk Score from a Statistical Model
- 2026-05-08, ROC and AUC: The Standard for Discrimination
- 2026-05-08, Calibration: The Honesty of Predicted Probabilities
- 2026-05-08, Recalibration: Tools for Adjusting a Model
- 2026-05-08, Brier Score and Extraneous Variables
- 2026-05-08, Reclassification: NRI and IDI
- 2026-05-08, Validation, Presentation, and Impact Studies
- Study Design Framework
  - 1111-11-11, Randomized Controlled Trial ⭐
  - 1111-11-11, Quasi-experimental Studies
  - 1111-11-11, Factorial Design
- Measures of Association and Impact
  - 1111-11-11, Relative Risk and Risk Difference
  - 1111-11-11, Effect Size Measures
  - 1111-11-11, Number Needed to Treat
- Bias and Confounding
  - 1111-11-11, Selection Bias
  - 1111-11-11, Information Bias
  - 1111-11-11, Confounding and Control
- Study Design Fundamentals
  - 1111-11-11, Sample Size and Power ⭐
  - 1111-11-11, Randomization Methods ⭐

1.3.3 Study Design Series

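This series connects the study design system established in epidemiology and clinical trials to experimental design in IT. It covers the top-level classification of study designs, the details of each design, validity and bias, RCTs, observational studies, quasi-experimental designs, and causal inference frameworks.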

Study Design Overview
- 2026-03-08, Classification of Study Designs: Overview
- 2026-03-08, Each Study Design in Detail
- 2026-03-08, Validity, Bias, Causal Inference, and Effect Measures
Experimental and Observational Designs
- 2026-03-08, Design Principles of RCTs and A/B Tests
  - Deep Dive into the Five Elements of an RCT: Schulz Phase C Series (42 posts, complete)
    - Ch.10 Recruitment
      - 2026-05-08, The Reality of RCT Recruitment: Ch.10 Overview
      - 2026-05-08, Quantifying Recruitment Difficulty: Lasagna · Muench · π · Fractions
      - 2026-05-08, Strategies to Improve Recruitment: Zelen · cmRCT · Four Cochrane Strategies
    - Ch.12 Allocation Sequence
      - 2026-05-08, Allocation Sequence: Ch.12 Overview
      - 2026-05-08, History of Randomisation and Its Three Benefits: Fisher · Hill
      - 2026-05-08, Comparing Randomisation Methods: Simple · Block · Urn · Stratified
    - Ch.13 The Risk of Guessing in Unblinded RCTs
      - 2026-05-08, The Risk of Guessing in Unblinded RCTs: Ch.13 Overview
      - 2026-05-08, The Cosmetic Credibility Myth and the Block Trap
      - 2026-05-08, Urn and Mixed Randomisation
    - Ch.14 Allocation Concealment
      - 2026-05-08, Allocation Concealment: Ch.14 Overview
      - 2026-05-08, Why Concealment Matters and Attempts to Decipher It
      - 2026-05-08, Concealment Appraisal Criteria and Examples
      - 2026-05-08, Pitfalls of Baseline Comparisons
    - Ch.15 Exclusions and Losses (ITT)
      - 2026-05-08, Exclusions · ITT: Ch.15 Overview (Sulfinpyrazone 32% → 21%)
      - 2026-05-08, Exclusions Before Randomisation: The External Validity Trade-off
      - 2026-05-08, Exclusions After Randomisation and ITT as an Absolute Principle
      - 2026-05-08, Post Hoc Exclusions and Loss to Follow-up: The 5-and-20 Rule
    - Ch.16 Blinding
      - 2026-05-08, Blinding: Ch.16 Overview (Hiding Who Got What)
      - 2026-05-08, Effects and Terminology of Blinding: The Multiple Sclerosis Example
      - 2026-05-08, Masking vs Blinding · Placebo and Double-Dummy
      - 2026-05-08, How Blinding Prevents Bias, and How to Appraise It
    - Ch.17 Blinding Implementation
      - 2026-05-08, Implementing Blinding: Ch.17 Overview
      - 2026-05-08, Implementing Single Blinding: Sham Procedures
      - 2026-05-08, Double-Blinding Options: Encapsulation · Double-Dummy
      - 2026-05-08, The Importance of Double-Dummy and Placebo
    - Ch.18 Surrogate · Composite Outcomes
      - 2026-05-08, Surrogate · Composite: Ch.18 Overview (Encainide, Threefold Mortality)
      - 2026-05-08, Definition of Surrogates and Nine Drug Failure Cases
      - 2026-05-08, Surrogate Validation: The Two Fleming-DeMets Criteria
      - 2026-05-08, The Six Endpoint Categories of the BEST Resource
      - 2026-05-08, Composite Outcomes: The DREAM Trial 60% Trap
    - Ch.20 Multiplicity II: Subgroup · Interim
      - 2026-05-08, Multiplicity II: Ch.20 Overview (Aspirin and Star Signs)
      - 2026-05-08, Subgroup Analysis and the Test of Interaction
      - 2026-05-08, Interim · Group Sequential: O’Brien-Fleming
      - 2026-05-08, Stopping for Harm · Futility · Other Methods
    - Ch.21 Prospective Meta-Analysis
      - 2026-05-08, PMA: Ch.21 Overview
      - 2026-05-08, PMA vs MCRCT Comparison
      - 2026-05-08, Comparing the Drawbacks of MCRCT and PMA
      - 2026-05-08, Steps for Running a PMA: Pooled IPD Analysis
    - Ch.22 CONSORT Reporting Guidelines
      - 2026-05-08, CONSORT: Ch.22 Overview
      - 2026-05-08, Deficient · Selective Reporting
      - 2026-05-08, CONSORT 2010 Checklist in Depth
      - 2026-05-08, STARD · STROBE · PRISMA: Reporting Guidelines Beyond RCTs
- 2026-03-08, Observational Study Designs: Cohort, Case-Control, Cross-Sectional
- 2026-03-08, Quasi-experimental Designs: ITS, RDD, Stepped Wedge
- 2026-03-08, Causal Inference Frameworks: A Complete Summary

1.3.4 Statistical Foundations

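Type I/II errors and statistical power in hypothesis testing are essential for interpreting experiment results. The multiple testing problem arises when several metrics are evaluated simultaneously or interim results are checked, and multiple-testing corrections control the resulting false positives. Without an understanding of effect size and the minimum detectable effect (MDE), it is not possible to design an experiment that is meaningful in practice.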
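
To make the link between MDE, power, and sample size concrete, here is a minimal sketch of the usual normal-approximation formula for comparing two conversion rates. The function name `sample_size_per_arm` and the baseline rate of 10% are assumptions chosen for illustration; a production calculator would also handle one-sided tests, unequal allocation, and continuous metrics.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_control, mde_abs, alpha=0.05, power=0.8):
    """Approximate users per arm for a two-sided two-proportion z-test.

    Normal approximation with unpooled variances:
    n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / mde^2
    """
    p_treat = p_control + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance_sum = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_power) ** 2 * variance_sum / mde_abs ** 2)


# Halving the MDE roughly quadruples the required sample size.
for mde in (0.02, 0.01, 0.005):
    print(mde, sample_size_per_arm(p_control=0.10, mde_abs=mde))
```

The inverse-square dependence on the MDE is the practical reason the MDE has to be fixed before the experiment starts rather than read off the results afterwards.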

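See also: for statistical foundations such as hypothesis testing, GLMs, and distribution theory, refer to the Statistics section. GLMs (generalized linear models) in particular play a central role in experiment analysis.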

Statistical Foundations for Experimentation
- 1111-11-11, Hypothesis Testing Framework
  - 1111-11-11, Type I and Type II Errors
  - 1111-11-11, Statistical Power
  - 1111-11-11, p-values and Significance
  - 2026-04-12, Structural Limits of p-values in Large Samples and Practical Alternative Metrics
- 1111-11-11, Effect Size and Practical Significance
  - 1111-11-11, Minimum Detectable Effect (MDE)
  - 1111-11-11, Cohen’s d and Standardized Effect Sizes
- 1111-11-11, Multiple Testing Problem
  - 1111-11-11, Family-wise Error Rate (FWER)
  - 1111-11-11, False Discovery Rate (FDR)
  - 1111-11-11, Bonferroni and Holm Corrections
- 1111-11-11, Confidence Intervals
  - 1111-11-11, Interpretation and Common Mistakes
  - 1111-11-11, Bootstrap Methods

1.3.5 Classical A/B Testing Fundamentals

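Classical A/B testing is the concrete methodology for applying RCT principles to digital products. It provides a practical decision framework covering the choice of randomization unit, traffic allocation, and metric selection. Understanding the principles of fixed-horizon testing and the “peeking problem” is a prerequisite for seeing why sequential testing and MAB are needed. Because most experiments in practice are run this way, it is also the most immediately useful topic.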
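
The peeking problem can be seen in a small A/A simulation: both arms share the same true conversion rate, so every "significant" result is a false positive. The sketch below compares testing once at the planned end with declaring a winner at the first significant interim look. All names and parameter values (`false_positive_rates`, 10 looks of 100 users per arm, a 10% base rate) are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def p_value(x_a, x_b, n):
    """Two-sided p-value of a pooled two-proportion z-test with n users per arm."""
    pooled = (x_a + x_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        return 1.0
    z = (x_a - x_b) / n / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rates(n_sims=400, looks=10, batch=100, p=0.1, alpha=0.05, seed=0):
    """Simulate A/A tests and return (fixed-horizon rate, peeking rate)."""
    rng = random.Random(seed)
    fixed_hits = peeking_hits = 0
    for _ in range(n_sims):
        x_a = x_b = n = 0
        significant_at_any_look = False
        for _ in range(looks):
            x_a += sum(rng.random() < p for _ in range(batch))
            x_b += sum(rng.random() < p for _ in range(batch))
            n += batch
            significant_now = p_value(x_a, x_b, n) < alpha
            significant_at_any_look = significant_at_any_look or significant_now
        fixed_hits += significant_now            # one test, at the planned end
        peeking_hits += significant_at_any_look  # would have stopped at some look
    return fixed_hits / n_sims, peeking_hits / n_sims


fixed_rate, peeking_rate = false_positive_rates()
print(f"fixed horizon: {fixed_rate:.3f}, peeking: {peeking_rate:.3f}")
```

Because the final look is one of the looks, the peeking rate can never be lower than the fixed-horizon rate on the same data, and with ten looks it is typically several times the nominal 5%.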

Classical A/B Testing ⭐
- 2026-03-20, A/B Testing Overview
- 2026-03-16, The Core Mechanism of A/B Testing
- 2026-03-16, Why Before-and-After Comparisons Are Risky
- 1111-11-11, A/B Test Design Principles
  - 1111-11-11, Hypothesis Formulation
  - 1111-11-11, Metric Selection
  - 1111-11-11, Randomization Unit
  - 1111-11-11, Traffic Allocation
- 1111-11-11, Sample Size Calculation ⭐
  - 1111-11-11, Power Analysis for A/B Tests
  - 1111-11-11, Continuous vs. Binary Metrics
  - 1111-11-11, Unequal Allocation
- 1111-11-11, Randomization Strategies
  - 1111-11-11, Simple Randomization
  - 1111-11-11, Stratified Randomization
  - 1111-11-11, Consistent Hashing
- 1111-11-11, Analysis Methods
  - 1111-11-11, t-tests and z-tests
  - 1111-11-11, Chi-square Tests
  - 1111-11-11, Non-parametric Tests (Mann-Whitney)
- 1111-11-11, Duration and Stopping Rules
  - 1111-11-11, Fixed Horizon Testing
  - 1111-11-11, When to Stop
  - 1111-11-11, Peeking Problem

1.4 Core Methods

1.4.1 Advanced A/B Testing

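Sequential testing controls the Type I error rate while allowing interim looks at the results, and is one of the most frequently requested techniques in practice. Bayesian A/B testing offers business-friendly interpretations such as the “probability of winning.” Multivariate testing evaluates several factors and their interaction effects at the same time, which makes it valuable for complex product optimization.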
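
The “probability of winning” can be approximated with a few lines of Monte Carlo under a Beta-Bernoulli model. This is a sketch under stated assumptions: independent uniform Beta(1, 1) priors on each conversion rate, and a hypothetical helper name `prob_b_beats_a`. The choice of prior is itself a modeling decision and will matter for small samples.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A | data) with Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior of each rate is Beta(1 + conversions, 1 + non-conversions).
        theta_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        theta_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += theta_b > theta_a
    return wins / draws


print(prob_b_beats_a(conv_a=100, n_a=1000, conv_b=130, n_b=1000))
```

The returned number is a posterior probability, not a p-value, so it should be compared against a decision threshold chosen in advance rather than against 0.05.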

Advanced A/B Testing Techniques
- 1111-11-11, Sequential Testing ⭐
  - 1111-11-11, Group Sequential Designs
  - 1111-11-11, Sequential Probability Ratio Test (SPRT)
  - 1111-11-11, Alpha Spending Functions
  - 1111-11-11, Always-Valid Inference
- 1111-11-11, Bayesian A/B Testing
  - 1111-11-11, Prior Distribution Selection
  - 1111-11-11, Posterior Probability
  - 1111-11-11, Credible Intervals
  - 1111-11-11, Probability of Being Best
- 1111-11-11, Multi-variate Testing
  - 1111-11-11, Full Factorial Designs
  - 1111-11-11, Fractional Factorial Designs
  - 1111-11-11, Interaction Effects
- 1111-11-11, A/A Testing
  - 1111-11-11, Platform Validation
  - 1111-11-11, Sample Ratio Mismatch (SRM) Detection
- 1111-11-11, Bayesian Hierarchical Models for Experimentation
  - 1111-11-11, Hierarchical Modeling of Treatment Effects
  - 1111-11-11, Bayesian Model Checking and Posterior Predictive Checks
  - 1111-11-11, Bayes Factor for Model Comparison

1.4.2 Causal Inference Framework

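Potential outcomes and DAGs give mathematically rigorous answers to “why does random assignment matter?” and “which variables should be controlled for?” SUTVA violations (network effects, spillover) are common in digital experiments, so methods to detect and handle them are needed. The same framework is the theoretical basis for heterogeneous treatment effect (HTE) analysis, which makes it possible to answer “for which users is the treatment effective?” on principled grounds.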
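
A toy simulation in the potential-outcomes style shows why random assignment matters. Every unit has both outcomes Y(0) and Y(1) = Y(0) + tau, but only one is observed. In the "observational" world a confounder drives both treatment uptake and the outcome; in the "randomized" world a coin flip decides. The data-generating process below is entirely made up for illustration.

```python
import random
from statistics import mean


def simulate(n=20_000, tau=1.0, seed=0):
    """Return (observational estimate, randomized estimate) of the true ATE tau."""
    rng = random.Random(seed)
    obs_t, obs_c, rct_t, rct_c = [], [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)            # confounder, e.g. prior engagement
        y0 = 2 * x + rng.gauss(0, 1)   # potential outcome without treatment
        y1 = y0 + tau                  # potential outcome with treatment
        # Observational world: units with high x are more likely to be treated.
        if x + rng.gauss(0, 1) > 0:
            obs_t.append(y1)
        else:
            obs_c.append(y0)
        # Randomized world: assignment is independent of x.
        if rng.random() < 0.5:
            rct_t.append(y1)
        else:
            rct_c.append(y0)
    return mean(obs_t) - mean(obs_c), mean(rct_t) - mean(rct_c)


observational, randomized = simulate()
print(f"true ATE: 1.0, observational: {observational:.2f}, randomized: {randomized:.2f}")
```

The naive observational contrast mixes the treatment effect with the difference in the confounder between groups, while the randomized contrast stays close to the true effect because assignment is independent of the potential outcomes.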

2026-03-20, Causal Inference Overview: Definitions, Frameworks, and a Map of Methods
2026-03-20, Definition of Causal Effects and Potential Outcomes (Ch.1)
2026-03-20, Randomized Experiments and Exchangeability (Ch.2)
2026-03-20, Observational Studies and Identifiability Conditions (Ch.3)
2026-03-20, Effect Modification and Interaction (Ch.4-5)
2026-03-20, DAGs and Causal Diagrams (Ch.6)
2026-03-20, Confounding (Ch.7)
2026-03-20, Selection Bias (Ch.8)
2026-03-20, Measurement Error and Random Variability (Ch.9-10)
2026-05-08, Why Model? (Ch.11 Overview)
2026-05-08, Data Cannot Speak for Themselves + Parametric Estimators of the Conditional Mean (Ch.11.1-11.2)
2026-05-08, Nonparametric Estimators and Smoothing (Ch.11.3-11.4)
2026-05-08, The Bias-Variance Trade-off (Ch.11.5)
2026-05-08, IPW and MSMs (Ch.12 Overview)
2026-05-08, The NHEFS Causal Question + Modeling IP Weights (Ch.12.1-12.2)
2026-05-08, Stabilized IP Weights + MSMs (Ch.12.3-12.4)
2026-05-08, Effect Modification in MSMs + Censoring (Ch.12.5-12.6)
2026-05-08, Standardization and the g-formula (Ch.13 Overview)
2026-05-08, The Standardization Procedure + Estimating the Outcome Model (Ch.13.1-13.2)
2026-05-08, The Four-Step g-formula Algorithm + IPW vs Standardization (Ch.13.3-13.4)
2026-05-08, Reliability of Estimates and Sensitivity (Ch.13.5)
2026-05-08, G-estimation and SNMMs (Ch.14 Overview)
2026-05-08, Conditional Effects + A Logistic Representation of Exchangeability (Ch.14.1-14.2)
2026-05-08, SNMMs + Rank Preservation (Ch.14.3-14.4)
2026-05-08, The G-estimation Procedure + Multiple Parameters (Ch.14.5-14.6)
2026-05-08, Outcome Regression and Propensity Scores (Ch.15 Overview)
2026-05-08, Outcome Regression + Definition of the PS (Ch.15.1-15.2)
2026-05-08, PS Stratification, Standardization, and Matching (Ch.15.3-15.4)
2026-05-08, Propensity vs Structural vs Predictive: A Comparison (Ch.15.5)
2026-05-08, Instrumental Variable Estimation (Ch.16 Overview)
2026-05-08, The Three IV Conditions + The Wald Estimator (Ch.16.1-16.2)
2026-05-08, Homogeneity + Monotonicity and LATE (Ch.16.3-16.4)
2026-05-08, Weak IVs + Comparing Tools (Ch.16.5-16.6)
2026-05-08, Causal Survival Analysis (Ch.17 Overview)
2026-05-08, Converting Between Hazard, Risk, and Survival (Ch.17.1-17.2)
2026-05-08, Censoring + IPW MSM (Ch.17.3-17.4)
2026-05-08, Parametric g-formula + SNMMs for Survival (Ch.17.5-17.6)
2026-05-08, Variable Selection and High Dimensions (Ch.18 Overview)
2026-05-08, Bias-Inducing Variables: Collider, Mediator, M-bias, Z-bias (Ch.18.1-18.2)
2026-05-08, Doubly Robust ML + Sample Splitting + Cross-fitting (Ch.18.3-18.4)
2026-05-08, The Inherent Difficulty of Variable Selection (Ch.18.5)
2026-05-08, Time-Varying Treatments (Ch.19 Overview, start of Part III)
2026-05-08, Classifying Treatment Strategies (Ch.19.1-19.2)
2026-05-08, Sequentially Randomized + Sequential Exchangeability (Ch.19.3-19.4)
2026-05-08, Partial Identification + Time-Varying Confounders (Ch.19.5-19.6)
2026-05-08, Treatment-Confounder Feedback (Ch.20 Overview)
2026-05-08, Elements of TC Feedback + Failure of the Four Tools in Table 20.1 (Ch.20.1-20.2)
2026-05-08, Mechanism of Failure + Failed Attempts at a Fix (Ch.20.3-20.4)
2026-05-08, Adjusting for Past Treatment + Mismeasurement (Ch.20.5)
2026-05-08, G-methods for Time-Varying Treatments (Ch.21 Overview)
2026-05-08, Time-Varying G-Formula + IPW MSM (Ch.21.1-21.2)
2026-05-08, Doubly Robust + G-Estimation (Ch.21.3-21.4)
2026-05-08, Incorporating Censoring + The Big G-Formula (Ch.21.5-21.6)
2026-05-08, Target Trial Emulation (Ch.22 Overview)
2026-05-08, ITT vs Per-Protocol (Ch.22.1-22.2)
2026-05-08, Emulation Procedure + Time Zero (Ch.22.3-22.4)
2026-05-08, Integrating What-If + Standardizing the Framework (Ch.22.5)
2026-05-08, Causal Mediation (Ch.23 Overview)
2026-05-08, Critiques and Defenses of Mediation Analysis (Ch.23.1-23.2)
2026-05-08, Empirical Mediation + Interventionist Theory of Mediation (Ch.23.3-23.4)
Causal Inference for Experimentation ⭐
- 1111-11-11, Potential Outcomes Framework ⭐
  - 1111-11-11, Rubin Causal Model
  - 1111-11-11, Average Treatment Effect (ATE)
  - 1111-11-11, CATE and ITE (conditional and individual treatment effects)
  - 1111-11-11, SUTVA (stable unit treatment value assumption)
- 1111-11-11, Directed Acyclic Graphs ⭐
  - 1111-11-11, Causal Pathways
  - 1111-11-11, Confounding Paths
  - 1111-11-11, Colliders and Selection Bias
  - 1111-11-11, d-separation
  - 1111-11-11, Backdoor and Frontdoor Criteria
- 1111-11-11, Assignment Mechanisms
  - 1111-11-11, Intent-to-Treat (ITT) Analysis
  - 1111-11-11, Per-Protocol Analysis
  - 1111-11-11, As-Treated Analysis
  - 1111-11-11, Complier Average Causal Effect (CACE)
- 1111-11-11, Interference and Spillover
  - 1111-11-11, Network Effects
  - 1111-11-11, Cluster Randomization
  - 1111-11-11, Switchback Experiments
  - 1111-11-11, Geo-experiments

1.4.3 Multi-Armed Bandit Fundamentals

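If A/B testing is “learn first, then decide,” MAB “learns and optimizes at the same time.” The exploration-exploitation trade-off is at the heart of any strategy that tries to get the most out of a limited resource (traffic). Regret quantifies the “cost of experimentation” mathematically, which makes it possible to compare algorithms.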
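
One common way to write the two regret notions, using notation assumed here rather than taken from the linked posts: \(\mu_k\) is the mean reward of arm \(k\), \(\mu^{*} = \max_k \mu_k\), \(a_t\) is the arm played in round \(t\), and \(\hat{a}_T\) is the arm recommended after \(T\) rounds.

\[
R_T = T\mu^{*} - \mathbb{E}\left[\sum_{t=1}^{T} \mu_{a_t}\right],
\qquad
r_T = \mu^{*} - \mathbb{E}\left[\mu_{\hat{a}_T}\right]
\]

Cumulative regret \(R_T\) is the quantity to keep small when traffic spent on inferior arms is costly. Simple regret \(r_T\) matters when only the quality of the final recommendation counts, which is closer to the goal of a classical A/B test.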

Phase I: Multi-Armed Bandit Series (8 posts, complete; no textbook on hand, based on the agent's pretraining knowledge)
- 2026-05-09, Defining the MAB Problem: Exploration vs Exploitation
- 2026-05-09, Epsilon-Greedy: Limits of Fixed · Decaying
- 2026-05-09, UCB1: Confidence-Bound-Based Exploration
- 2026-05-09, Thompson Sampling: Beta-Bernoulli · Gaussian
- 2026-05-09, MAB vs A/B Test: Decision Framework
- 2026-05-09, Contextual Bandit: LinUCB
- 2026-05-09, Non-stationary Bandit: Sliding Window · Discounted
- 2026-05-09, Best Arm Identification: Pure Exploration
Multi-Armed Bandits Overview ⭐
- 1111-11-11, MAB Problem Formulation
  - 1111-11-11, Exploration vs. Exploitation Trade-off
  - 1111-11-11, Regret Definition
    - Cumulative Regret
    - Simple Regret
  - 1111-11-11, Reward Distributions
  - 1111-11-11, Stationarity Assumptions
- 1111-11-11, Performance Metrics
  - 1111-11-11, Expected Cumulative Regret
  - 1111-11-11, Best Arm Identification
  - 1111-11-11, Sample Complexity
  - 1111-11-11, Convergence Rates

1.4.4 Classical MAB Algorithms

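Epsilon-greedy is the simplest algorithm, but its exploration rate has to be tuned by hand. UCB manages exploration automatically by quantifying uncertainty, and comes with proven theoretical regret bounds. Thompson Sampling, proposed in 1933 with medical experiments in mind, is simple to implement, often performs very well in practice, and is widely used.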
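
As a minimal sketch of the Beta-Bernoulli variant of Thompson Sampling, the code below keeps a Beta posterior per arm, samples one plausible conversion rate from each, and plays the arm with the highest sample. The arm means, horizon, and function name are hypothetical values for the simulation; in a live system the true means are unknown and regret cannot be computed this way.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Run Beta-Bernoulli Thompson Sampling and return cumulative pseudo-regret."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Sample a plausible rate per arm from its Beta(1 + s, 1 + f) posterior.
        samples = [rng.betavariate(1 + successes[a], 1 + failures[a]) for a in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        reward = rng.random() < true_means[arm]
        successes[arm] += reward
        failures[arm] += not reward
        regret += best - true_means[arm]
    return regret


means = [0.2, 0.5, 0.8]  # hypothetical arm conversion rates
horizon = 2000
uniform_regret = horizon * sum(max(means) - m for m in means) / len(means)
print("Thompson Sampling:", round(thompson_sampling(means, horizon), 1))
print("Uniform allocation (expected):", round(uniform_regret, 1))
```

Uniform allocation is what a fixed equal-split test does for its whole duration, so the gap between the two numbers is a rough picture of the opportunity cost that adaptive allocation tries to avoid.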

Classical Bandit Algorithms
- 1111-11-11, Epsilon-Greedy Methods
  - 1111-11-11, Fixed Epsilon Strategy
  - 1111-11-11, Decaying Epsilon Strategy
  - 1111-11-11, Theoretical Regret Bounds
- 1111-11-11, Upper Confidence Bound (UCB)
  - 1111-11-11, UCB1 Algorithm
  - 1111-11-11, UCB-Tuned
  - 1111-11-11, Bayesian UCB
  - 1111-11-11, KL-UCB
- 1111-11-11, Thompson Sampling ⭐
  - 1111-11-11, Beta-Bernoulli Thompson Sampling
  - 1111-11-11, Gaussian Thompson Sampling
  - 1111-11-11, Theoretical Properties
  - 1111-11-11, Historical Context (origins in medical experiments)

1.4.5 MAB vs. A/B Testing

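Choosing the right method for a given situation requires understanding the trade-off between the two. A/B testing has clear statistical validity and gives accurate effect size estimates, whereas MAB reduces the opportunity cost incurred during the experiment. The best method depends on the business goal (accurate measurement vs. fast optimization), the traffic volume, and the decision context.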

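1.5 Advanced Applications (2-3 months)

1.5.1 Phase J — Advanced Applications (28 posts, complete, 2026-05-09)

Phase J is an integrated application of Phases D, E, and F. It has four groups: HTE (heterogeneous treatment effects), quasi-experimental methods (DiD/RDD), network spillover (switchback/geo), and adaptive trials.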

1.5.2 Heterogeneous Treatment Effects

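“It works on average” is not the same as “it works for every user.” HTE analysis shows in which user segments the effect is large and in which it is small. ML methods such as causal forests and meta-learners search for effect patterns across combinations of hundreds of features. This provides the empirical basis for personalization strategies.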
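
The simplest HTE estimate is a difference in means within each pre-specified segment, which is \(E[Y(1)-Y(0) \mid X]\) for a single categorical \(X\) in a randomized experiment. The sketch below uses made-up rows and a hypothetical helper `segment_effects`; it is not a causal forest or meta-learner, and with many segments the estimates need multiple-comparison corrections.

```python
from collections import defaultdict


def segment_effects(rows):
    """Difference in mean outcome (treated minus control) within each segment.

    rows: iterable of (segment, treated, outcome) tuples from a randomized experiment.
    """
    sums = defaultdict(lambda: [0.0, 0, 0.0, 0])  # [sum_t, n_t, sum_c, n_c]
    for segment, treated, outcome in rows:
        acc = sums[segment]
        if treated:
            acc[0] += outcome
            acc[1] += 1
        else:
            acc[2] += outcome
            acc[3] += 1
    return {
        seg: acc[0] / acc[1] - acc[2] / acc[3]
        for seg, acc in sums.items()
        if acc[1] and acc[3]  # skip segments that are missing one of the arms
    }


rows = [
    ("new", 1, 1), ("new", 1, 1), ("new", 0, 0), ("new", 0, 1),
    ("returning", 1, 1), ("returning", 1, 0), ("returning", 0, 1), ("returning", 0, 0),
]
print(segment_effects(rows))
```

In this toy data the treatment helps new users and does nothing for returning users, even though the pooled average effect is positive.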

1111-11-11, When to Use MAB vs. A/B Testing ⭐
- 1111-11-11, Trade-offs and Decision Framework
- 1111-11-11, Statistical Validity Considerations
- 1111-11-11, Business Context and Goals
- 1111-11-11, Hybrid Approaches
Heterogeneous Treatment Effects ⭐
- 1111-11-11, Subgroup Analysis
  - 1111-11-11, Pre-specified Subgroups
  - 1111-11-11, Multiple Comparison Corrections
  - 1111-11-11, Statistical vs. Practical Significance
- 1111-11-11, Effect Modification Analysis
  - 1111-11-11, Interaction Terms
  - 1111-11-11, Stratified Analysis
- 1111-11-11, Machine Learning Methods
  - 1111-11-11, Causal Forests
  - 1111-11-11, Meta-learners (S-, T-, X-learner)
  - 1111-11-11, Double Machine Learning (DML)
  - 1111-11-11, BART (Bayesian Additive Regression Trees)

1.5.3 Variance Reduction Techniques

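If estimates can be made more precise at the same sample size, experiments can be shortened or smaller effects can be detected. CUPED uses pre-experiment (baseline) data to reduce variance. Deng et al. (2013) report reductions of about 50% for some metrics, with the gain depending on how strongly the baseline correlates with the outcome, and the method has become a standard technique in practice. Stratification and regression adjustment are well-established methods in epidemiology and apply directly to digital experiments.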
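
The core of CUPED fits in a few lines: estimate \(\theta = \mathrm{Cov}(X, Y) / \mathrm{Var}(X)\) on the pooled sample and replace each outcome \(Y\) with \(Y - \theta (X - \bar{X})\). The sketch below runs on synthetic data in which the baseline explains most of the outcome variance; the helper name `cuped_adjust` and the data-generating numbers are assumptions for illustration only.

```python
import random
from statistics import mean, variance


def cuped_adjust(y, x):
    """Return CUPED-adjusted outcomes y - theta * (x - mean(x)).

    y: in-experiment metric per user; x: the same user's pre-experiment covariate.
    theta = cov(x, y) / var(x), estimated on the pooled sample of both arms.
    """
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / variance(x)
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]


rng = random.Random(0)
x = [rng.gauss(10, 2) for _ in range(5000)]   # synthetic baseline metric
y = [xi + rng.gauss(0, 1) for xi in x]        # synthetic outcome correlated with baseline
y_adj = cuped_adjust(y, x)
print("variance before:", round(variance(y), 2), "after:", round(variance(y_adj), 2))
```

The adjustment leaves the sample mean unchanged and shrinks the variance by a factor of roughly \(1 - \rho^2\), where \(\rho\) is the correlation between baseline and outcome, so a weakly correlated covariate buys very little.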

Variance Reduction Methods ⭐
- 1111-11-11, Pre-experiment Methods
  - 1111-11-11, Stratification
  - 1111-11-11, Matched Pair Design
  - 1111-11-11, Blocking
- 1111-11-11, Post-experiment Methods
  - 1111-11-11, CUPED (Controlled-experiment Using Pre-Experiment Data) ⭐
  - 1111-11-11, CUPAC (Control Using Predictions As Covariates)
  - 1111-11-11, Regression Adjustment
  - 1111-11-11, Difference-in-Differences

1.5.4 Contextual and Advanced Bandits

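Contextual bandits take user features (the context) into account to make personalized decisions. Non-stationary bandits reflect the reality that the best arm changes over time. Both are applied to complex practical problems such as recommender systems, dynamic pricing, and personalized marketing.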

Advanced Bandit Methods
- 1111-11-11, Contextual Bandits
  - 1111-11-11, Linear Contextual Bandits
  - 1111-11-11, LinUCB Algorithm
  - 1111-11-11, Neural Bandits
  - 1111-11-11, Policy Gradient Methods
- 1111-11-11, Non-stationary Bandits
  - 1111-11-11, Sliding Window Approaches
  - 1111-11-11, Discounted Rewards
  - 1111-11-11, Change Detection Methods
  - 1111-11-11, Switching Bandits
- 1111-11-11, Structured Bandits
  - 1111-11-11, Combinatorial Bandits
  - 1111-11-11, Dueling Bandits
  - 1111-11-11, Ranking Bandits

1.5.5 Practical Implementation Challenges

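Even an experiment that is sound in theory runs into problems in practice, including metric definition, novelty effects, network interference, and SRM. Not knowing how to detect and mitigate these problems leads to wrong decisions. The choice of north star and guardrail metrics largely determines whether an experimentation program succeeds.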
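
SRM detection for a two-arm experiment is a chi-square goodness-of-fit test with one degree of freedom, which can be computed with the standard library alone. The sketch below is an illustration: the function name, the example counts, and the alert threshold of 0.001 are assumptions, and the threshold should follow whatever convention your platform has adopted.

```python
import math


def srm_p_value(n_control, n_treatment, expected_treatment_share=0.5):
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for a two-arm split."""
    total = n_control + n_treatment
    expected_t = total * expected_treatment_share
    expected_c = total - expected_t
    chi2 = ((n_treatment - expected_t) ** 2 / expected_t
            + (n_control - expected_c) ** 2 / expected_c)
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


p = srm_p_value(n_control=49_000, n_treatment=51_000)
print(f"p = {p:.2e}", "-> investigate SRM" if p < 0.001 else "-> split looks consistent")
```

A 49,000 versus 51,000 split looks harmless as a percentage, yet it is extremely unlikely under a true 50/50 assignment, which is why SRM checks are run on counts rather than judged by eye.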

Practical Experimentation Challenges
- 1111-11-11, Metric Design
  - 1111-11-11, North Star Metrics
  - 1111-11-11, Proxy Metrics
  - 1111-11-11, Guardrail Metrics
  - 1111-11-11, Long-term vs. Short-term Metrics
- 1111-11-11, Novelty and Primacy Effects
  - 1111-11-11, Detection Methods
  - 1111-11-11, Mitigation Strategies
- 1111-11-11, Network Effects and Interference
  - 1111-11-11, Detection of SUTVA Violations
  - 1111-11-11, Cluster-based Approaches
  - 1111-11-11, Graph Cluster Randomization
- 1111-11-11, Sample Ratio Mismatch
  - 1111-11-11, Detection Methods
  - 1111-11-11, Root Cause Analysis
  - 1111-11-11, Prevention Strategies

1.6 Platform and Specialization (ongoing)

1.6.0.1 Experimentation Platform Architecture

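For experimentation to become a standard organizational process rather than a one-off, it needs scalable infrastructure. The design of the assignment service, logging, and analysis engine determines how reliable and efficient experiments are. Automation such as feature flag integration and auto-stopping can substantially reduce the cost of running experiments.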
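
A stateless assignment service is often built on deterministic hashing: hashing the experiment and user identifiers together gives every user a stable variant without storing any assignment table. The sketch below shows that idea only; it is plain hash bucketing, not a consistent-hashing ring, and the identifiers, function name, and choice of SHA-256 are assumptions.

```python
import hashlib


def assign_variant(user_id, experiment_id, variants=("control", "treatment")):
    """Deterministically map a user to a variant by hashing (experiment_id, user_id)."""
    key = f"{experiment_id}:{user_id}".encode("utf-8")
    # Interpret the first 8 bytes of the SHA-256 digest as an integer bucket source.
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return variants[bucket % len(variants)]


print(assign_variant("user-42", "checkout-button-color"))
share = sum(
    assign_variant(f"user-{i}", "exp-1") == "treatment" for i in range(10_000)
) / 10_000
print(f"treatment share: {share:.3f}")
```

Including the experiment identifier in the hash key keeps assignments in different experiments approximately independent of each other, which is what overlapping experiments rely on.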

Platform Design and Infrastructure
- 1111-11-11, Core Components
  - 1111-11-11, Assignment Service
  - 1111-11-11, Logging and Tracking
  - 1111-11-11, Analysis Engine
  - 1111-11-11, Reporting Dashboard
- 1111-11-11, Technical Considerations
  - 1111-11-11, Consistent Hashing for Assignment
  - 1111-11-11, Experiment Overlap and Orthogonality
  - 1111-11-11, Feature Flag Integration
  - 1111-11-11, A/A Testing for Validation
- 1111-11-11, Scale and Automation
  - 1111-11-11, Auto-stopping Rules
  - 1111-11-11, Winner Selection Algorithms
  - 1111-11-11, Multi-objective Optimization

1.6.1 Domain-Specific Applications

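Each domain, such as product optimization, marketing, or healthcare, has its own constraints and requirements. Healthcare experiments must follow FDA guidance, and marketplace experiments must account for the two-sided nature of the market. Domain-specific knowledge is often decisive for whether an experimental design succeeds.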

Industry Applications
- 1111-11-11, Product Optimization
  - 1111-11-11, UI/UX Experiments
  - 1111-11-11, Recommendation System Testing
  - 1111-11-11, Search Ranking Experiments
- 1111-11-11, Growth and Marketing
  - 1111-11-11, Conversion Funnel Optimization
  - 1111-11-11, Pricing Experiments
  - 1111-11-11, Email and Notification Testing
- 1111-11-11, Healthcare Applications
  - 1111-11-11, Adaptive Clinical Trials
  - 1111-11-11, Response-Adaptive Randomization
  - 1111-11-11, Platform Trials
  - 1111-11-11, Regulatory Considerations (FDA Guidance)

1.6.2 Research Frontiers

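Integration with RL, differential privacy, and causal discovery are emerging directions in experimentation methodology. Off-policy evaluation makes it possible to evaluate a new policy on logged data from past experiments, which lowers experimentation cost. Following current research in these areas can be a competitive advantage.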
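
The most basic off-policy estimator is inverse propensity scoring (IPS): keep only the logged rounds where the target policy would have chosen the logged action, and reweight their rewards by one over the logging probability. The sketch below uses synthetic logs from a uniform-random logging policy; the names, reward rates, and log format are assumptions, and the estimator is only valid when the logged propensities are correct and nonzero for every action the target policy can take.

```python
import random


def ips_value(logs, target_policy):
    """Inverse propensity scoring estimate of a target policy's mean reward.

    logs: list of (context, action, reward, propensity), where propensity is the
    probability with which the logging policy chose the logged action.
    """
    total = 0.0
    for context, action, reward, propensity in logs:
        if target_policy(context) == action:
            total += reward / propensity
    return total / len(logs)


# Synthetic logs: a uniform-random logging policy over two actions.
rng = random.Random(0)
true_reward_rate = {0: 0.3, 1: 0.7}  # hypothetical, unknown in a real setting
logs = []
for _ in range(20_000):
    action = rng.randrange(2)
    reward = 1.0 if rng.random() < true_reward_rate[action] else 0.0
    logs.append((None, action, reward, 0.5))

print(round(ips_value(logs, target_policy=lambda context: 1), 3))
```

The estimate for the "always play action 1" policy lands near that action's true reward rate even though the policy was never deployed. The price is variance, which grows quickly when the logging policy rarely takes the actions the target policy prefers.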

Emerging Topics and Research
- 1111-11-11, Reinforcement Learning Integration
  - 1111-11-11, Bandits as RL Problems
  - 1111-11-11, Deep RL for Experimentation
  - 1111-11-11, Off-policy Evaluation
- 1111-11-11, Privacy-Preserving Experiments
  - 1111-11-11, Differential Privacy in Experiments
  - 1111-11-11, Federated A/B Testing
- 1111-11-11, Causal Discovery
  - 1111-11-11, Constraint-based Methods
  - 1111-11-11, Score-based Methods
  - 1111-11-11, Experimentation for Graph Learning

See also: agent personalization strategies based on HTE analysis are covered in the Agent - Segmentation & Personalization section, which includes examples of user segmentation and A/B testing in agent systems.

1.7 Trustworthy Online Controlled Experiments — Kohavi (2020)

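A series that breaks down Parts I-V (Ch.4-23) of Kohavi, Tang, and Xu (2020), “Trustworthy Online Controlled Experiments,” chapter by chapter (64 posts planned, Phase F). It works through the seven major challenges of digital experimentation (OEC, SRM, CUPED, triggering, ramping, leakage, long-term effects) as the book treats them.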

1.7.1 Ch.4 — Platform & Culture

1.7.2 Ch.5 — Speed Matters

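2026-05-08, Ch.5 Overview: Speed Matters End-to-End ⭐
2026-05-08, Local Linear Approximation and Measuring Site Speed
2026-05-08, Designing Slowdown Experiments + Impact by Page Element
2026-05-08, Beware of Extreme Results: The Pitfalls of Extrapolation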

1.7.3 Ch.6 — Organizational Metrics

2026-05-08, Ch.6 Overview: Organizational Metrics
2026-05-08, Metric Taxonomy (Goal · Driver · Guardrail)
2026-05-08, Formulating, Evaluating, and Evolving Metrics
2026-05-08, Guardrails and Gameability

1.7.4 Ch.7 — OEC (Overall Evaluation Criterion)

1.7.5 Ch.8 — Institutional Memory and Meta-Analysis

2026-05-08, Ch.8 Overview: Institutional Memory and Meta-Analysis
2026-05-08, Definition and Value of Institutional Memory

1.7.6 Ch.9 — Ethics in Controlled Experiments

1.7.7 Ch.12 — Client-Side Experiments

2026-05-08, Ch.12 Overview: Client-Side Experiments
2026-05-08, Server vs Client Differences
2026-05-08, Implications 1-3: Anticipation · Delayed Logging · Failsafe
2026-05-08, Implications 4-5: Triggered Analysis · Guardrail Tracking
2026-05-08, Implications 6-7: Quasi-experimental · Multi-Device + Conclusion

1.7.8 Ch.13 — Instrumentation

2026-05-08, Ch.13 Overview: Instrumentation
2026-05-08, Client-Side vs Server-Side Instrumentation
2026-05-08, Log Processing and the Culture of Instrumentation

1.7.9 Ch.14 — Choosing a Randomization Unit

2026-05-08, Ch.14 Overview: Randomization Unit
2026-05-08, Assignment and Analysis Units, and the User as a Unit

1.7.10 Ch.15 — Ramping Experiments (SQR)

2026-05-08, Ch.15 Overview: Ramping with SQR
2026-05-08, Definition of Ramping and the SQR Framework
2026-05-08, Four Ramp Phases: Pre-MPR · MPR · Post-MPR
2026-05-08, Long-Term Holdout · Replication · Post-Ramp Phase

1.7.11 Ch.16 — Scaling Experiment Analyses

1.7.12 Ch.18 — Variance Estimation and CUPED

1.7.13 Ch.19 — A/A Test

1.7.14 Ch.20 — Triggering for Improved Sensitivity

2026-05-08, Ch.20 Overview: Triggering
2026-05-08, Triggering Examples 1-3: Partial · Conditional · Coverage
2026-05-08, Examples 4-5: Coverage Change · Counterfactual
2026-05-08, Numerical Analysis + Optimal · Conservative Triggering
2026-05-08, Trustworthy Triggering + Common Pitfalls
2026-05-08, Open Questions in Triggering

1.7.15 Ch.21 — Sample Ratio Mismatch (SRM) and Trust Guardrails

2026-05-08, Ch.21 Overview: SRM and Trust Guardrails
2026-05-08, SRM Definition and Scenarios 1 and 2
2026-05-08, Causes of SRM and a Debugging Procedure
2026-05-08, Four Trust Guardrail Metrics

1.7.16 Ch.22 — Leakage and Interference between Variants

2026-05-09, Ch.22 Overview: Leakage · Interference ⭐
2026-05-09, Six Examples of Direct · Indirect Connections
2026-05-09, Rule-of-Thumb · Ecosystem Value
2026-05-09, Four Kinds of Isolation · Edge-Level · Detection

1.7.17 Ch.23 — Measuring Long-Term Treatment Effects

2026-05-09, Ch.23 Overview: Long-Term Treatment Effects ⭐
2026-05-09, Definition · Six Sources of Difference · Three Measurement Goals
2026-05-09, Four Limitations of Long-Running Experiments
2026-05-09, Cohort · Post-Period · Time-Staggered · Holdback

1.8 Design of Experiments — Maxwell·Montgomery (Phase G)

A series that breaks down the classical DOE framework of Maxwell & Delaney (2004), “Designing Experiments and Analyzing Data,” and Das & Giri (1986), “Design and Analysis of Experiments,” chapter by chapter (89 posts planned, Phase G). It covers ANOVA model comparison, factorial designs, ANCOVA, repeated measures, multilevel models, incomplete block designs, and response surfaces. Examples from agriculture and psychology are mapped to IT multivariate testing and ML hyperparameter tuning.

1.8.1 MAX Ch.6 — Trend Analysis (trend decomposition for quantitative factors)

1.8.2 MAX Ch.7 — Two-Way Factorial Design

1.8.3 MAX Ch.8 — Higher-Order Factorial (three-way and higher factorial designs)

1.8.4 MAX Ch.9 — ANCOVA (variance reduction and bias adjustment with covariates)

2026-05-08, Ch.9 Overview: ANCOVA
2026-05-08, The ANCOVA Model and Homogeneity of Regression
2026-05-08, Adjusted Means and Lord's Paradox
2026-05-08, Change Score · Residual ANOVA · Blocking

1.8.5 MAX Ch.10 — Random and Nested Factors (random effects and nested designs)

2026-05-08, Ch.10 Overview: Random/Nested
2026-05-08, Fixed vs Random + EMS
2026-05-08, Two-way Mixed Model + Choosing the Error Term
2026-05-08, Nested Designs and Variance Components

1.8.6 MAX Ch.11 — Within-Subjects Univariate (repeated measures, univariate)

2026-05-08, Ch.11 Overview: Within-Subjects
2026-05-08, Three Scenarios and Difference Scores
2026-05-08, Mixed-Model ANOVA and Sphericity
2026-05-08, GG/HF ε Adjustment and Order Effects
2026-05-08, Latin Square Counterbalancing and Power

1.8.7 MAX Ch.12 — Higher-Order Within Univariate (multi-factor repeated measures)

1.8.8 MAX Ch.13 — Within-Subjects Multivariate (repeated measures, multivariate)

2026-05-08, Ch.13 Overview: Multivariate Within-Subjects
2026-05-08, D Variables and Hotelling's T²
2026-05-08, Wilks Λ · Pillai · Hotelling-Lawley · Roy
2026-05-08, Choosing Between Univariate and Multivariate

1.8.9 MAX Ch.14 — Higher-Order Within Multivariate

2026-05-08, Ch.14 Overview
2026-05-08, Forming D Variables for 2×2
2026-05-08, Extension to a×b
2026-05-08, Multivariate Analysis of Split-Plot Designs

1.8.10 MAX Ch.15 — Multilevel Models

1.8.11 MAX Ch.16 — Hierarchical Mixed Nested

1.8.12 MON Ch.2 — Complete Block Designs (CRD, RBD, Latin Square)

1.8.13 MON Ch.3 — Factorial Experiments (classical factorial)

2026-05-08, Ch.3 Overview: Factorial
2026-05-08, \(2^k\) Factorial
2026-05-08, Finite Fields and Grouping of Interactions
2026-05-08, Confounding
2026-05-08, \(3^k\) Factorial
2026-05-08, General Construction and the Maximum Number of Factors
2026-05-08, Fractional Factorial

1.8.14 MON Ch.4 — Asymmetrical · Split-Plot Designs

1.8.15 MON Ch.5 — Incomplete Block Designs (BIB, PBIB, Lattice)

2026-05-08, Ch.5 Overview: Incomplete Block
2026-05-08, Introduction to BIB
2026-05-08, Construction Methods for BIB
2026-05-08, BIB Analysis and Recovery of Inter-Block Information
2026-05-08, Youden and Lattice Designs
2026-05-08, PBIB and Its Analysis
2026-05-08, Recovery of Inter-Block Information and Optimality

1.8.16 MON Ch.6 — Orthogonal Latin Squares (MOLS, Euler's conjecture)

2026-05-08, Ch.6 Overview: Orthogonal Latin Squares
2026-05-08, Maximum Number of MOLS
2026-05-08, Construction of MOLS
2026-05-08, Pairwise Balanced Designs and the Falsity of Euler's Conjecture

1.8.17 MON Ch.7 — Bio-assays · Response Surface

1.8.18 MON Ch.8 — ANCOVA · Transformation

1.8.19 MON Ch.9 — Weighing Designs

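1.9 Project: Experimental Design for AI Agent A/B Tests

An experimental design series for measuring the performance of the MINERVA Agent (QnA Chatbot, Data Standardization Helper, Insilico Code Analysis). It proceeds step by step from offline evaluation to dynamic routing in production.

1.10 Analytics Applications

1111-11-11, Conversation Analytics
- Analysis and optimization of conversation data
- Analysis of user interaction patterns
- Improving conversation quality through experiments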

1.11 References

Books:
- Kohavi, Tang, and Xu (2020). “Trustworthy Online Controlled Experiments”
- Lattimore and Szepesvári (2020). “Bandit Algorithms”
- Imbens and Rubin (2015). “Causal Inference for Statistics, Social, and Biomedical Sciences”
- Pearl and Mackenzie (2018). “The Book of Why”

Papers:
- Thompson (1933). “On the likelihood that one unknown probability exceeds another”
- Auer et al. (2002). “Finite-time analysis of the multiarmed bandit problem”
- Deng et al. (2013). “Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data” (CUPED)

Online Courses:
- Stanford CS234: Reinforcement Learning
- MIT 6.S897: Machine Learning for Healthcare