Kwangmin Kim - Four Ramp Phases — Pre-MPR · MPR

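1 Definition

Definition: the trade-off matrix of the four ramp phases

Each of the four phases in Kohavi (2020), Ch. 15.2, prioritizes a different trade-off among the SQR dimensions (Speed, Quality, Risk); the first two phases each trade off a pair of them.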

Phase	Primary trade-off	Duration	Treatment %
Phase 1: Pre-MPR	Speed ↔ Risk	Hours to days	1% → 5% → 25%
Phase 2: MPR	Speed ↔ Quality	1 week+	50% (MPR)
Phase 3: Post-MPR	Operational	1 day or less	75% → 100%
Phase 4: Holdout	Long-term Learning	1-3 months	90% T / 10% C

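Source (Xu et al. 2018): a detailed analysis of the four phases, which serve as the operational tool of the SQR framework.

Key insight: each phase answers a different question. Phase 1: "Is it risky?" Phase 2: "Does it work?" Phase 3: "Can it scale?" Phase 4: "Is it sustainable?" Ramping is complete once these questions have been answered one at a time, in order.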

2 Concepts and Principles

2.1 Phase 1 — Pre-MPR (Risk Mitigation)

The authors state (Ch. 15.2): “In this phase, you want to safely determine that the risk is small and ramp quickly to the MPR.”

2.1.1 Three Core Techniques: An In-Depth Walkthrough

2.1.1.1 Technique 1: Rings of Testing Populations

The authors name four rings.

2.1.1.1.1 Ring a — Whitelisted Individuals

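Composition:
  - The team developing the new feature
  - Typically 5-50 people
  - Explicit opt-in

Characteristics:
  - Can be exposed immediately (no waiting time)
  - Verbatim feedback (heard directly in conversation)
  - Sample size is far too small for statistical analysis
  - The most forgiving group (they built it themselves)

Purpose:
  - Confirm the feature works ("Can I log in?")
  - Find obvious bugs (broken UI, crashes)
  - Verify that the result matches the intent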

2.1.1.2 Issues that can be found

Examples of issues caught at the whitelist stage:

1. Broken UI:
   - Button not visible
   - Broken layout
   - Wrong color contrast

2. Logic errors:
   - Clicking "Save" does nothing
   - Wrong data displayed

3. Crashes:
   - App crashes on a specific path
   - JavaScript exception

When one of these issues is detected, the ramp stops at ring a. Engineers fix it and then rerun ring a.

2.1.1.2.1 Ring b — Company Employees

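Composition:
  - All company employees (tens to tens of thousands of people)
  - Typically enabled automatically, by default, once ring a has completed

Characteristics:
  - Larger sample (thousands to tens of thousands)
  - Some quantitative measurement is possible (the authors call it "uncontrolled")
  - Statistical conclusions remain weak (selection bias)
  - Forgiving (fellow employees)

Purpose:
  - Discover more edge cases
  - Measure baselines for some metrics
  - Check whether people use the feature naturally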

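2.1.1.3 Recognizing selection bias

The authors emphasize: “measurements from the early rings can be biased as those users are likely the ‘insiders.’”

Insider bias:
  - Employees differ from ordinary users
  - They use their own company's product differently (heavier users)
  - More forgiving of bugs (loyalty to the organization)
  - Tech-savvy
  - The employee pool covers a limited demographic

Therefore:
  - Do not make launch decisions from ring b metrics
  - Use the ring only to check that nothing is seriously wrong
  - Quantitative decisions belong to the MPR (outside the rings)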

2.1.1.3.1 Ring c — Beta Users / Insiders

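Composition:
  - Users who voluntarily signed up for the beta
  - "Insider", "Power user", "Beta tester"
  - Typically 1,000 to tens of thousands of users

Characteristics:
  - Vocal and loyal
  - Explicitly intend to give feedback
  - More advanced than ordinary users
  - Actively file bug reports

Purpose:
  - Real-user feedback from diverse environments
  - Edge-case discovery (varied devices and networks)
  - Verbatim and quantitative feedback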

2.1.1.3.2 Ring d — Data Center Isolation

The authors emphasize: “Data centers to isolate interactions that can be challenging to identify, such as memory leaks (death by slow leak) or other inappropriate use of resources (e.g., heavy disk I/O).”

Isolation at the data-center level:

Premise:
  The company runs multiple data centers
  Each data center serves a share of the users
  A single data center therefore covers the users of some regions (a slice of overall traffic)

Technique:
  - Apply the new Treatment to 0.5-2% of one data center's traffic
  - Keep every other data center entirely in Control
  - Monitor for one to several days

Issues that can be found:
  - Memory leaks (RAM usage grows over time)
  - Disk I/O spikes
  - Rising CPU utilization
  - Network bandwidth saturation
  - Falling cache hit rate

2.1.1.4 “Death by Slow Leak”

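How a memory leak unfolds:
  Each request leaves a little memory unreleased (about 1 KB per request in this example)
  After 100 requests: 100 KB leaked
  After 10,000 requests: 10 MB leaked
  After 1M requests: 1 GB leaked → the server crashes with OOM (out of memory)

Value of single-data-center isolation:
  - Only the servers in one data center hit OOM
  - The other data centers absorb the traffic
  - User impact is minimal
  - Engineers can observe and fix the problem
  - The issue is confirmed before a multi-data-center rollout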
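The leak arithmetic above can be reproduced in a few lines. This is a minimal sketch rather than anything from the book: the 1 KB-per-request leak is inferred from the "100 requests, 100 KB" line, decimal units are used for simplicity, and `hours_until_oom`, the 1 GB headroom, and the rate of 100 requests per second are hypothetical values chosen for illustration.

```python
KB, MB, GB = 1_000, 1_000_000, 1_000_000_000  # decimal units, for simplicity
LEAK_PER_REQUEST = 1 * KB  # assumption: implied by "100 requests -> 100 KB"


def leaked_bytes(requests: int, leak_per_request: int = LEAK_PER_REQUEST) -> int:
    """Total memory leaked after a given number of requests."""
    return requests * leak_per_request


def hours_until_oom(headroom_bytes: int, requests_per_second: float,
                    leak_per_request: int = LEAK_PER_REQUEST) -> float:
    """Hours until the leak consumes the free memory (hypothetical helper)."""
    seconds = headroom_bytes / (leak_per_request * requests_per_second)
    return seconds / 3600


for n in (100, 10_000, 1_000_000):
    print(f"{n:>9,} requests -> {leaked_bytes(n) / MB:,.1f} MB leaked")

# Hypothetical server: 1 GB of headroom, 100 requests per second
print(f"OOM after about {hours_until_oom(1 * GB, 100):.1f} hours")
```

A leak this slow can survive a short smoke test, which is why the data-center ring is watched for a day or more.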

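2.1.1.5 Bing's standard practice (as cited by the authors)

Bing's Pre-MPR ring d:
  - 0.5-2% of traffic in a single data center
  - One to several days
  - Verify the stability of memory, CPU, and latency

After that:
  - Small traffic (1-5%) in all data centers
  - Check whether a resource issue is confined to certain regions or is global

This kind of multi-data-center isolation is standard in modern infrastructure; AWS, Azure, and GCP all support multi-region deployment.

Intuition: the gradual nature of the rings

Each ring has a different purpose and different characteristics.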

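2.1.1.6 The inverse relationship between risk and quality

Early rings expose few users (low risk) but yield weak measurements (low quality); later rings reverse this.

Illustrative sample size per ring:
  Ring a (whitelist): 50 users
  Ring b (employees): 50,000 users
  Ring c (beta): 100,000 users
  Ring d (data center): 200,000 users

Statistical power per ring:
  Ring a: almost none (verbatim feedback only)
  Ring b: weak (selection bias)
  Ring c: moderate (selection bias)
  Ring d: strong (random sample)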

2.1.1.7 The unique value of each ring

Ring a catches:
  - Obviously broken UI
  - Logic errors
  - Crashes

Ring b additionally catches:
  - Issues on varied devices and browsers
  - Edge cases (unusual user environments)

Ring c additionally catches:
  - Real-user feedback (qualitative)
  - A broad range of edge cases

Ring d additionally catches:
  - Resource issues
  - System scalability problems
  - Memory leaks

This progressive coverage is the essence of the ring structure: each ring catches issues at a different layer.

2.1.1.8 Automating ring operations

On a modern platform:

Automated ring schedule (illustrative):
  Day 0: ring a (whitelist)
  Day 0 + 4 h: ring b (employees)
  Day 1: ring c (beta)
  Day 1 + 12 h: ring d (data center)
  Day 2+: small public traffic

Each ring has an explicit SLA. Advancement is automatic, with manual override.

2.1.1.9 Technique 2: Auto Dial-up

The authors emphasize: “Automatically dialing up traffic until it reaches the desired allocation.”

How auto dial-up works:
  Step 1: 1% (hold for 1 hour)
  Step 2: 2% (1 hour)
  Step 3: 5% (1 hour)
  Step 4: 10% (1 hour)
  ...
  Final: 25% (the last Pre-MPR allocation)

At each step:
  - Guardrail metrics are monitored automatically
  - On a threshold violation: hold and alert
  - If everything is normal: advance automatically
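The dial-up loop above can be sketched as follows. This is an illustrative sketch, not a real platform API: `auto_dial_up` and the `guardrails_ok` callback are hypothetical names, the holding period is reduced to a comment, and the step list contains only the allocations spelled out in the text (the elided intermediate steps are left out).

```python
from typing import Callable, Sequence, Tuple


def auto_dial_up(steps: Sequence[float],
                 guardrails_ok: Callable[[float], bool]) -> Tuple[float, str]:
    """Walk through allocation steps; stop at the first guardrail violation.

    Returns the allocation reached and a status string.
    """
    reached = 0.0
    for pct in steps:
        reached = pct  # expose pct% of traffic and wait out the holding period
        if not guardrails_ok(pct):
            return reached, "hold + alert"  # threshold violated: stop ramping
    return reached, "target reached"


PRE_MPR_STEPS = [1, 2, 5, 10, 25]  # only the steps listed explicitly in the text

# Healthy experiment: guardrails pass at every step
print(auto_dial_up(PRE_MPR_STEPS, lambda pct: True))

# Hypothetical bug that only shows up once 5% of traffic is exposed
print(auto_dial_up(PRE_MPR_STEPS, lambda pct: pct < 5))
```

Keeping the guardrail check behind a callback keeps the ramp logic independent of how metrics are actually collected.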

2.1.1.10 The value of the extra hour

The authors state: “Even if the desired allocation is only a small percentage (e.g., 5%), taking an extra hour to reach 5% can help limit the impact of bad bugs without adding much delay.”

Going straight to 5%:
  Bug detection lags by about 1 hour
  → 5% of users are affected for that hour (e.g., 100,000 users)

Auto dial-up from 1% to 5% (over 1 hour):
  If the bug is found at the 1% step:
  → 1% of users are affected for that hour (20,000 users)
  → 5x fewer affected users

Added time cost: 1 hour
ROI: a 5x reduction in exposure for one hour of delay
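The exposure comparison is simple arithmetic and can be checked directly. A minimal sketch: the total of 2,000,000 users is an assumption inferred from "5% = 100,000 users", and `affected_users` is a hypothetical helper.

```python
def affected_users(total_users: int, allocation_pct: float) -> int:
    """Users exposed to a bug while the experiment sits at allocation_pct."""
    return round(total_users * allocation_pct / 100)


TOTAL_USERS = 2_000_000  # assumption: implied by "5% = 100,000 users"

direct = affected_users(TOTAL_USERS, 5)  # jump straight to 5%
dialed = affected_users(TOTAL_USERS, 1)  # bug caught at the 1% step

print(f"direct to 5%: {direct:,} users")
print(f"caught at 1%: {dialed:,} users")
print(f"reduction:    {direct / dialed:.0f}x")
```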

2.1.1.11 Technique 3: Real-time Guardrails

The authors emphasize: “Producing real-time or near-real-time measurements on key guardrail metrics. The sooner you can get a read on whether an experiment is risky, the faster you can decide to go to the next ramp phase.”

Dimensions of real-time guardrails:
  Engagement metrics:
    - Click rate, session count
    - Updated every minute

  Performance:
    - Latency p99
    - Error rate
    - Updated immediately

  System:
    - Crash rate
    - CPU, memory
    - Database load
    - Updated every minute

On a threshold violation:
  - Alert via PagerDuty or a similar tool
  - Engineers review immediately
  - Decide whether to hold or ramp down
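A threshold check of this kind might look like the sketch below. Everything here is an illustrative assumption: the `Guardrail` class, the metric names, and the limits are hypothetical, and a production system would read the deltas from a streaming pipeline instead of a literal dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Guardrail:
    metric: str
    limit: float           # relative change vs. Control that triggers an alert
    higher_is_worse: bool  # True for latency or errors, False for engagement


def violations(deltas: Dict[str, float], guardrails: List[Guardrail]) -> List[str]:
    """Names of the metrics whose Treatment-vs-Control delta breaches its limit."""
    breached = []
    for g in guardrails:
        delta = deltas.get(g.metric)
        if delta is None:
            continue  # no reading yet for this metric
        if (delta > g.limit) if g.higher_is_worse else (delta < g.limit):
            breached.append(g.metric)
    return breached


# Hypothetical limits, for illustration only
GUARDRAILS = [
    Guardrail("click_rate", -0.05, higher_is_worse=False),
    Guardrail("latency_p99", 0.10, higher_is_worse=True),
    Guardrail("error_rate", 0.02, higher_is_worse=True),
]

reading = {"click_rate": -0.01, "latency_p99": 0.18, "error_rate": 0.00}
breached = violations(reading, GUARDRAILS)
print("alert, hold the ramp:" if breached else "advance", breached)
```

The check only flags the breach; whether to hold or ramp down stays a human decision, as in the list above.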

2.1.1.12 Real-time vs batch monitoring

Batch monitoring (daily):
  Day 1: ramp starts
  Day 2: first analysis
  If a bug appears on Day 1:
  → users are affected for up to about 1 day
  → late detection

Real-time monitoring (minutes):
  Day 1, hour 1: ramp starts
  Day 1, hour 1 + 5 min: first alert possible
  If a bug appears in hour 1:
  → users are affected for 5 minutes to 1 hour
  → fast detection

Real-time monitoring of this kind is standard on modern platforms and requires a minute-level streaming pipeline.

2.2 Phase 2 — MPR (Quality Measurement)

The authors state (Ch. 15.2): “MPR is the ramp phase dedicated to measuring the impact of the experiment. The many discussions we have throughout the book around producing trustworthy results directly apply in this phase.”

2.2.1 The statistical basis for the one-week recommendation

The authors emphasize: “We want to highlight our recommendation to keep experiments at MPR for a week, and longer if novelty or primacy effects are present.”

2.2.1.1 Time-dependent factors

The authors state: “an experiment that runs for only one day will have results biased towards heavy users. Similarly, users who visit during weekdays tend to be different from users visiting on weekends.”

Sample bias in a 1-day experiment:

Sample distribution:
  Daily users: 100% visit within the day (all are in the sample)
  Weekly users: ~14% visit within the day (1/7)
  Monthly users: ~3% visit within the day (1/30)

→ Daily (heavy) users dominate the 1-day sample (about 70% in this illustration; the exact share depends on the user mix)
→ The effect estimate is driven by heavy users → biased

7-day experiment:
  Daily users: 7 visits each (over-represented in some metrics)
  Weekly users: ~100% (at least 1 visit)
  Monthly users: ~23% (7/30)

→ The sample represents the user base better
→ The effect estimate is more accurate
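The shift in sample composition can be made concrete with a small model. This is a sketch under stated assumptions: the user mix (20% daily, 40% weekly, 40% monthly) is hypothetical, and a user who visits once every P days is assumed to appear in a d-day window with probability min(1, d/P). Under a different mix the daily share will differ from the roughly 70% quoted above.

```python
# Hypothetical user mix (shares of the total user base), for illustration only
USER_MIX = {"daily": 0.20, "weekly": 0.40, "monthly": 0.40}
VISIT_PERIOD_DAYS = {"daily": 1, "weekly": 7, "monthly": 30}


def sample_composition(experiment_days: int) -> dict:
    """Share of each user type among users seen at least once in the window."""
    seen = {
        kind: share * min(1.0, experiment_days / VISIT_PERIOD_DAYS[kind])
        for kind, share in USER_MIX.items()
    }
    total = sum(seen.values())
    return {kind: value / total for kind, value in seen.items()}


for days in (1, 7, 30):
    comp = sample_composition(days)
    print(f"{days:>2}-day sample:", {kind: f"{share:.0%}" for kind, share in comp.items()})
```

In this model daily users make up most of a 1-day sample even though they are only a fifth of the user base, and the sample only matches the true mix once the window covers the longest visit period.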

2.2.1.2 Weekday vs Weekend

Weekday users:
  - Use at work (B2B service)
  - Brief use at lunch or in the evening
  - Productivity-oriented

Weekend users:
  - Use at home
  - Long sessions
  - Entertainment-oriented

Effect of the same feature (example):
  - Weekdays: minimal (few use cases)
  - Weekend: dramatic (many use cases)

1-week experiment:
  - Covers both weekdays and the weekend
  - The average effect represents overall user behavior

2.2.1.3 Diminishing Return after 1 week

The authors emphasize: “the precision gained after a week tends to be small if there are no novelty or primacy trends.”

Precision curve (standard error relative to a 1-day run, assuming it shrinks with the square root of the duration):
  1 day: baseline (widest interval)
  3 days: 1.7x smaller
  7 days: 2.6x smaller
  14 days: 3.7x smaller
  21 days: 4.6x smaller
  30 days: 5.5x smaller

Marginal precision gained per day:
  Days 1-3: dramatic improvement
  Days 4-7: good
  Days 8-14: moderate
  Day 15+: minimal

→ Returns diminish after 1 week
→ Without novelty or primacy effects, 1 week is enough
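The reduction factors listed above match a square-root law, which the short sketch below reproduces. The assumption, stated loudly, is that the number of independent observations grows linearly with duration; in practice returning users make growth slower, so real gains are smaller than these. `precision_gain` is a hypothetical helper name.

```python
import math

DAYS = (1, 3, 7, 14, 21, 30)


def precision_gain(days: int) -> float:
    """Factor by which the standard error shrinks relative to a 1-day run."""
    return math.sqrt(days)


prev_days, prev_gain = 0, 0.0
for days in DAYS:
    gain = precision_gain(days)
    # Average improvement contributed by each day added since the previous row
    per_day = (gain - prev_gain) / (days - prev_days)
    print(f"{days:>2} days: {gain:.1f}x, +{per_day:.2f}x per added day")
    prev_days, prev_gain = days, gain
```

The per-day column shrinks steadily, which is the diminishing return the authors describe.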

2.2.1.4 Novelty vs Primacy Effect

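Novelty effect:
  - The effect is inflated in the first few days of a new feature (curiosity)
  - It attenuates over time
  - The true long-term effect is what remains after attenuation

Primacy effect:
  - Users are accustomed to their existing behavior
  - The effect is depressed in the first few days of a new feature (resistance)
  - The effect grows as users learn

Both are hard to separate out within a single week:
  → A long-term holdout (Phase 4) is needed

This is the territory of Ch. 23 (Long-term Effects, the F-KOH23 series).

Scenario: deciding after less than one week at MPR

Assumption: MPR is ended after 1-2 days and a launch decision is made.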

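2.2.1.5 Scenario: heavy-user bias

Experiment: a new feature whose true effect is +5% (across all users)

1-day MPR:
  - 70% of the sample are heavy users
  - Heavy users' metrics are very noisy (extreme values)
  - Effect estimate: +12% (overshoot)
  - The p-value is small (large sample)
  - Decision: launch

3-day MPR:
  - The sample mix moves toward the real user base
  - Effect estimate: +6%
  - More accurate

7-day MPR:
  - Effect estimate: +5% (matches the true effect)
  - Accurate

2.2.1.6 Decision quality

1-day decision: a +12% lift is expected → resources are invested
After the actual launch: only +5% → less than half the expected return
The team asks "Why did this feature deliver only +5%?" without realizing the analysis was flawed

2.2.1.7 Remedy

- Enforce a minimum of 1 week at MPR
- Automatically block decisions based on a single day
- Run 2-4 weeks when novelty or primacy effects are suspected

This is enforcement at the platform level: decision quality takes priority over decision speed.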

2.3 Phase 3 — Post-MPR (Operational Concerns)

The authors state (Ch. 15.2): “By the time an experiment is past the MPR phase, there should be no concerns regarding end-user impact. Optimally, operational concerns should also be resolved in earlier ramps.”

2.3.1 Dimensions of operational concerns

2.3.1.1 Concern 1 — Engineering Infrastructure

50% → 75% transition:
  - Traffic to the new service endpoint grows 1.5x
  - Database queries grow 1.5x
  - Cache misses grow about 1.5x (cache warm-up)

75% → 100% transition:
  - A further 1.33x (the two steps together double the 50% load: 1.5 × 1.33 ≈ 2)
  - Generally stable if the earlier step passed
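The multipliers above follow from the ratio of allocations, as this minimal sketch shows. It assumes load on Treatment-only components scales linearly with the Treatment allocation; `load_multiplier` is a hypothetical helper.

```python
def load_multiplier(from_pct: float, to_pct: float) -> float:
    """Relative load on Treatment-only components when the allocation grows."""
    return to_pct / from_pct


for start, end in [(50, 75), (75, 100), (50, 100)]:
    print(f"{start}% -> {end}%: {load_multiplier(start, end):.2f}x")
```

Splitting the doubling into a 1.5x step and a 1.33x step means the larger jump happens first, while the experiment is still easy to ramp down.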

2.3.1.2 Concern 2 — Peak Traffic

The authors emphasize: “These ramps should only take a day or less, usually covering peak traffic periods with close monitoring.”

Peak periods to cover:
  - Business hours (9:00-18:00): B2B
  - Evenings (19:00-23:00): B2C
  - Weekends: entertainment
  - Black Friday and holidays: e-commerce

What to monitor at peak:
  - Does latency p99 spike?
  - Are database connections exhausted?
  - Does auto-scaling kick in?
  - Is there memory pressure?

2.3.1.3 Concern 3 — Third-party Integration

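When the new feature calls an external service:
  - API rate limits (the external service's throughput limit)
  - The external service's latency
  - The external service's cost (per-call billing)

Going from 50% to 100%:
  - External calls double (a 100% increase)
  - Rate limits may be exceeded
  - Cost rises sharply
  - The terms may need to be renegotiated with the provider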
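A quick projection before the ramp can flag rate-limit and cost problems early. This is a sketch with made-up numbers: the current call rate, the rate limit, and the per-call price are all hypothetical, and external calls are assumed to scale linearly with the Treatment allocation.

```python
def projected_calls_per_second(current_rate: float, current_pct: float,
                               target_pct: float) -> float:
    """External calls per second after ramping, assuming linear scaling."""
    return current_rate * target_pct / current_pct


# Hypothetical values, for illustration only
CURRENT_RATE = 400.0    # calls per second observed at 50%
RATE_LIMIT = 600.0      # provider's limit, calls per second
COST_PER_CALL = 0.0001  # currency units per call

projected = projected_calls_per_second(CURRENT_RATE, 50, 100)
over_limit = projected > RATE_LIMIT
daily_cost = projected * 86_400 * COST_PER_CALL

print(f"projected rate: {projected:.0f} calls/s (limit {RATE_LIMIT:.0f})")
print(f"over limit:     {over_limit}")
print(f"daily cost:     {daily_cost:,.0f}")
```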

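Third-party concerns are a newer dimension of the cloud-native era: monitoring only your own systems is not enough.

2.3.2 Operating time for Phase 3

Typical: each step takes a day or less
  - 75%: up to 1 day (covers one peak)
  - 90%: up to 1 day
  - 100%: immediately

Longer (infrastructure concerns): 1 week
  - Monitor at 75% for 1 week
  - Go to 100% after verifying there are no issues

The key point of this phase: user impact has already been verified, so only system stability remains to be confirmed.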

2.4 Preview of Phase 4: Long-term Holdout

The authors name this phase (Ch. 15.2) “Long-Term Holdout/Replication”; F-KOH15-3 covers it in detail.

Core of Phase 4:
  - 90% Treatment, 10% Control
  - 1-3 months
  - Measures the long-term effect
  - Separates out novelty and primacy effects

Operation:
  - Not the default for every experiment
  - Only for large launches or when there is doubt
  - Replication verifies surprising results

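3 Why It Is Needed

When the four phases are bypassed (the silent damage from each missing phase):

No Phase 1 → parallels Healthcare.gov: a bug is exposed to 100% of users
Phase 2 under 1 week → heavy-user bias, false-positive launch
No Phase 3 → infrastructure overload, an outage right after jumping from 50% to full launch
No Phase 4 → the long-term truth is unknown; a launch driven by a novelty effect (a mistake)

When they are in place:

Phase 1: progressive verification, ring by ring
Phase 2: one week and an accurate decision
Phase 3: verification of operational scaling
Phase 4: verification of long-term sustainability

Each phase has unique value. If any one is missing, ramping loses its point.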

4 Application Example: a four-phase ramp for Microsoft Office

Ramp for a hypothetical Office launch:

Day 0 (Phase 1 - Ring 1):
  - Office development team (whitelist)
  - 50 people
  - Time: 4 hours
  - Verbatim feedback

Day 0 (Phase 1 - Ring 2):
  - Microsoft employees (automatic advance)
  - 200,000 people
  - Time: 1 day
  - Some metrics

Day 1 (Phase 1 - Ring 3):
  - Office Insiders (beta testers)
  - 1M people
  - Time: 1 day
  - Feedback + metrics

Day 2 (Phase 1 - Ring 4):
  - 1% of public traffic in a single data center
  - 1M people
  - Time: 1 day
  - Resource isolation

Day 3 (Phase 2 MPR):
  - 50% of public traffic
  - 100M people
  - Time: 1 week
  - Precise measurement

Day 10 (Phase 3 - Post-MPR):
  - 75% (1 day)
  - 90% (1 day)
  - 100% (immediately)

Day 12+ (Phase 4 - Holdout, optional):
  - 90% T / 10% C
  - 1-3 months
  - Long-term sustainability

This hypothetical schedule shows how one cycle of a monthly release could be ramped at Office's scale.

5 Code Example: Auto Dial-up + Guardrail Alert System

A simulation of an automated ramp with threshold-based alerts.

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Ramp configuration: pct is the Treatment allocation (%), n_users the exposed users
ramp_steps = [
    {"phase": "Ring 1 (Whitelist)", "pct": 0.05, "duration_h": 4, "n_users": 50},
    {"phase": "Ring 2 (Employee)", "pct": 1, "duration_h": 8, "n_users": 50_000},
    {"phase": "Ring 3 (Beta)", "pct": 5, "duration_h": 12, "n_users": 100_000},
    {"phase": "Ring 4 (DC)", "pct": 1, "duration_h": 24, "n_users": 100_000},
    {"phase": "Phase 2 (MPR)", "pct": 50, "duration_h": 24*7, "n_users": 5_000_000},
    {"phase": "Phase 3 (75%)", "pct": 75, "duration_h": 24, "n_users": 7_500_000},
    {"phase": "Phase 3 (90%)", "pct": 90, "duration_h": 24, "n_users": 9_000_000},
    {"phase": "Phase 3 (100%)", "pct": 100, "duration_h": 0, "n_users": 0},
]

# True effect of the Treatment (relative change vs. Control)
true_effect = {"engagement": 0.05, "latency": 0.02}  # +5% engagement, +2% latency

# Guardrail thresholds per phase (relative change that triggers an alert)
guardrail_thresholds = {
    "Ring 1 (Whitelist)": {"engagement": -0.30, "latency": 0.50},  # very loose
    "Ring 2 (Employee)": {"engagement": -0.20, "latency": 0.30},
    "Ring 3 (Beta)": {"engagement": -0.15, "latency": 0.20},
    "Ring 4 (DC)": {"engagement": -0.10, "latency": 0.15},
    "Phase 2 (MPR)": {"engagement": -0.05, "latency": 0.10},  # strict
    "Phase 3 (75%)": {"engagement": -0.05, "latency": 0.10},
    "Phase 3 (90%)": {"engagement": -0.05, "latency": 0.10},
    "Phase 3 (100%)": {},
}

# Noise model (a simulation assumption): standard error of each metric at
# N = 1,000 users, shrinking with the square root of the sample size
BASE_SE = {"engagement": 0.02, "latency": 0.01}


def standard_error(metric, n_users):
    return BASE_SE[metric] / np.sqrt(n_users / 1000)


# Simulation
results = []
for step in ramp_steps:
    if step["n_users"] == 0:
        continue  # the final 100% step has no Control left to compare against

    n = step["n_users"]
    # Measured effect = true effect + sampling noise
    measured_engagement = true_effect["engagement"] + rng.normal(0, standard_error("engagement", n))
    # Latency: a positive value means slower
    measured_latency = true_effect["latency"] + rng.normal(0, standard_error("latency", n))

    # Threshold check
    thresholds = guardrail_thresholds[step["phase"]]
    alerts = []
    if "engagement" in thresholds and measured_engagement < thresholds["engagement"]:
        alerts.append(f"engagement {measured_engagement*100:.1f}% < {thresholds['engagement']*100:.0f}%")
    if "latency" in thresholds and measured_latency > thresholds["latency"]:
        alerts.append(f"latency {measured_latency*100:.1f}% > {thresholds['latency']*100:.0f}%")

    results.append({
        "phase": step["phase"],
        "pct": step["pct"],
        "duration_h": step["duration_h"],
        "n_users": n,
        "engagement": measured_engagement,
        "latency": measured_latency,
        "alert": "; ".join(alerts) if alerts else "OK"
    })

df = pd.DataFrame(results)
print("=== Ramp Simulation ===")
for _, row in df.iterrows():
    # :g drops the trailing ".0" that the float column would otherwise print
    print(f"\n{row['phase']} ({row['pct']:g}%, {row['duration_h']}h, N={row['n_users']:,}):")
    print(f"  Engagement: {row['engagement']*100:+.2f}%")
    print(f"  Latency:    {row['latency']*100:+.2f}%")
    print(f"  Status:     {row['alert']}")

# Cumulative ramp time
total_hours = sum(s["duration_h"] for s in ramp_steps)
print(f"\nTotal ramp time: {total_hours} hours = {total_hours / 24:.1f} days")

# Decision quality by phase: can a 5% effect be told apart from noise?
print("\n=== Decision Quality by Phase ===")
for step in ramp_steps:
    if step["n_users"] == 0:
        continue
    # Standard error of the engagement effect estimate at this sample size
    se_engagement = standard_error("engagement", step["n_users"])
    z_score_for_5_percent = 0.05 / se_engagement
    print(f"{step['phase']}: SE={se_engagement:.4f}, "
          f"z={z_score_for_5_percent:.1f}, "
          f"reliable detect of 5% effect: {'Yes' if z_score_for_5_percent > 2 else 'No'}")

Expected output (seed 42). The per-step engagement and latency values under "=== Ramp Simulation ===" are random draws around the true effects: at Ring 1 the noise is large (standard error of about 8.9 percentage points for engagement and 4.5 for latency), while from MPR onward the estimates land within about a tenth of a percentage point of +5% and +2%. With these thresholds every step reports Status: OK. The total time and the decision-quality table do not depend on the random draws:

Total ramp time: 264 hours = 11.0 days

=== Decision Quality by Phase ===
Ring 1 (Whitelist): SE=0.0894, z=0.6, reliable detect of 5% effect: No
Ring 2 (Employee): SE=0.0028, z=17.7, reliable detect of 5% effect: Yes
Ring 3 (Beta): SE=0.0020, z=25.0, reliable detect of 5% effect: Yes
Ring 4 (DC): SE=0.0020, z=25.0, reliable detect of 5% effect: Yes
Phase 2 (MPR): SE=0.0003, z=176.8, reliable detect of 5% effect: Yes
Phase 3 (75%): SE=0.0002, z=216.5, reliable detect of 5% effect: Yes
Phase 3 (90%): SE=0.0002, z=237.2, reliable detect of 5% effect: Yes

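Intuition: what this simulation says

1. User impact differs dramatically by ring

Ring 1: 50 users
Ring 2: 50,000 users (1,000x)
Ring 3: 100,000 users
MPR: 5,000,000 users

A bug found at Ring 1 affects 50 users; the same bug found at MPR affects 5M users.

2. Thresholds tighten progressively by phase

Ring 1: engagement -30% (loose, small sample)
MPR: engagement -5% (strict, large sample)

The larger the sample, the smaller the noise → a stricter threshold becomes feasible.

3. Total ramp time: 11 days

The steps in this schedule sum to 264 hours, or 11 days, which is a reasonable balance against the innovation cycle.

4. Decision quality

Criterion for reliably detecting the 5% effect at each phase (z > 2):

Ring 1: z ≈ 0.6 (not reliable: the sample is too small, so an apparent effect may be noise)
Ring 2 onward: all reliable under this noise model, which ignores selection bias

This progression is the design rationale of ramping: loose at first, strict later.

5.0.0.1 Practical implications

The experimentation platform applies the ring schedule and thresholds automatically
Engineers only design the experiment; the platform handles ramping
Alerts trigger manual review
Auto-advance can be manually overridden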

6 Related Topics

Next post

F15-3: Long-Term Holdout/Replication + Post Final Ramp

Related chapters

F19-*: Ch. 19 A/A Test (validation before the ramp starts)
F23-*: Ch. 23 Long-term Effects (the long-term side of holdouts)

Connections to other categories

Engineering: Canary Deployment, Blue-Green
Engineering: Multi-Region Deployment (data-center isolation)
Statistics: Power Analysis