Kwangmin Kim - Kohavi Ch.22.3 — Rule-of-Thumb · Ecosystem Value

1 Definition

Definition: Rule-of-Thumb (Ecosystem Value of an Action)

When leakage between treatment and control is suspected, correct for the downstream impact of an action by applying a multiplier calibrated in advance (Kohavi, Tang, Xu, 2020, Ch.22; Tutterow and Saint-Jacques 2019).

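1.0.0.1 Formal expression

The total effect of a given action \(a\) (e.g., sending a message) is its first-order effect plus its downstream effects:

\[\text{TotalValue}(a) = \text{Direct}(a) + \sum_j m_{a \to j} \cdot \text{Down}_j(a)\]

\(\text{Direct}(a)\): the first-order (direct) effect of \(a\)
\(m_{a \to j}\): the multiplier for the effect of action \(a\) on downstream metric \(j\) (calibrated on historical experiments)
\(\text{Down}_j(a)\): the expected response of metric \(j\) when \(a\) occurs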
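A minimal sketch of the formula, assuming \(\text{Direct}(a)\), each \(\text{Down}_j(a)\), and each \(m_{a \to j}\) are already available as plain numbers keyed by metric name. The function `total_value` and the metric names and values in the usage lines are hypothetical.

```python
from typing import Mapping


def total_value(direct: float,
                downstream: Mapping[str, float],
                multipliers: Mapping[str, float]) -> float:
    """TotalValue(a) = Direct(a) + sum_j m_{a->j} * Down_j(a).

    direct:      Direct(a), the first-order effect of action a
    downstream:  Down_j(a) for each downstream metric j
    multipliers: m_{a->j} for each downstream metric j (calibrated beforehand)
    """
    missing = set(downstream) - set(multipliers)
    if missing:
        # A downstream metric without a calibrated multiplier cannot be weighted.
        raise KeyError(f"no calibrated multiplier for: {sorted(missing)}")
    return direct + sum(multipliers[j] * down for j, down in downstream.items())


# Hypothetical numbers for a "send message" action.
value = total_value(
    direct=1.0,
    downstream={"replies": 0.4, "sessions": 0.2},
    multipliers={"replies": 0.5, "sessions": 0.25},
)
print(value)  # 1.0 + 0.5*0.4 + 0.25*0.2 = 1.25
```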

1.0.0.2 Analysis procedure

Measure the delta of the first-order metric for action \(a\) (e.g., messages sent)
Measure the deltas of the downstream metrics (replies, sessions, retention)
Apply the pre-calibrated multipliers \(m\)
Compute the ecosystem-corrected delta
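The four steps can be sketched as one function, assuming each metric is summarized by its treatment and control means and that deltas are relative, (T − C)/C, as in the simulation in section 7. The function names and numbers are hypothetical; a downstream metric with no calibrated multiplier raises a `KeyError`.

```python
from __future__ import annotations


def relative_delta(treatment_mean: float, control_mean: float) -> float:
    """Relative delta (T - C) / C."""
    return (treatment_mean - control_mean) / control_mean


def ecosystem_corrected_delta(first_order: tuple[float, float],
                              downstream: dict[str, tuple[float, float]],
                              multipliers: dict[str, float]) -> dict[str, float]:
    """first_order and each downstream entry are (treatment_mean, control_mean)."""
    # Step 1: delta of the first-order metric (e.g., messages sent)
    d_first = relative_delta(*first_order)
    # Step 2: deltas of the downstream metrics (e.g., replies, sessions)
    d_down = {j: relative_delta(t, c) for j, (t, c) in downstream.items()}
    # Step 3: apply the pre-calibrated multipliers
    weighted = {j: multipliers[j] * d for j, d in d_down.items()}
    # Step 4: ecosystem-corrected delta
    return {"raw": d_first, "corrected": d_first + sum(weighted.values())}


# Hypothetical arm means: first-order 0.33 vs 0.30, replies 0.165 vs 0.15.
result = ecosystem_corrected_delta((0.33, 0.30), {"replies": (0.165, 0.15)}, {"replies": 0.5})
print({k: round(v, 4) for k, v in result.items()})
```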

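Intuition

The reasoning behind the rule of thumb:

Bernoulli randomization gives a large sample and good power, so it is costly to give up
But because of leakage, looking only at the delta of the first-order metric underestimates the effect
Solution: measure the downstream metrics as well and correct them with historical multipliers

Analogy: when photographing a scene, measure the reflected light as well as the direct light. Knowing both lets you estimate the true brightness of the scene.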

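2 Why It Is Needed

Limitations of other isolation methods

Method	Limitation
Splitting resources	applies only to some shared resources (e.g., machines, training data)
Geo-based	sample size limited to the number of regions → higher variance
Time-based	hard to apply when time effects are strong
Network-cluster	isolation is incomplete in dense graphs
Ego-centric	complex to implement; requires separating egos from alters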

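2.0.0.1 Advantages of Bernoulli randomization

Maximum sample size (every user is a randomization unit)
Simple to implement (reuses the existing randomization)
Good power

2.0.0.2 Where the rule of thumb fits

Keep Bernoulli randomization + correct downstream effects post hoc = maximum sample size + partial correction for leakage. Applicable to most experiments.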

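3 First-Order vs Downstream Metrics

Definition

3.0.0.1 First-order metric (first-order action)

An action the user performs directly
Examples: sending a message, creating a post, giving a like, sending an invitation

3.0.0.2 Downstream metric

A metric that responds to the first-order action
Examples: replies to received messages, likes and comments on a post, acceptances of an invitation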

3.0.0.3 LinkedIn example (Barrilleaux and Wang 2018)

First-order action	Downstream metric
Total messages sent	Total messages responded to
Total posts created	Total likes / comments received
Total likes / comments given	Total creators receiving these

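Intuition: measuring depth and breadth

A downstream metric measures the second wave.

Depth: how many times each responding friend responds (on average)
Breadth: how many friends respond
The product of the two axes → the total ecosystem impact (total number of responses)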
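To make the product of the two axes concrete: with depth taken as the average number of responses per responding friend, breadth × depth recovers the total number of responses. The response log below is a hypothetical toy example.

```python
from collections import Counter

# Hypothetical log: one entry per response, identified by the friend who responded.
responses = ["ana", "ben", "ana", "chen", "ana", "ben"]

per_friend = Counter(responses)              # responses per responding friend
breadth = len(per_friend)                    # how many friends responded
depth = sum(per_friend.values()) / breadth   # average responses per responding friend
total = breadth * depth                      # breadth x depth = total responses

print(breadth, depth, total)  # 3 2.0 6.0
```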

Lesson: in an experiment whose downstream delta is 0, the multiplier term has nothing to act on, so the rule-of-thumb correction is zero. Only experiments with a large downstream delta need the correction.

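4 Estimating the Ecosystem Multiplier: Instrumental Variable Approach

Mechanism (Tutterow and Saint-Jacques 2019)

4.0.0.1 The IV idea

Data from historical experiments \(e\); for each experiment:

\(\Delta a_e\): delta of the first-order action
\(\Delta v_e\): delta of the final value (e.g., long-term engagement, revenue)

Assumption: each experiment's randomization is exogenous (independent of other confounders), so each experiment \(e\) acts as an instrument. For \(\Delta v_e / \Delta a_e\) to be read as the effect of \(a\) on \(v\), the experiment must also move \(v\) only through \(a\) (the exclusion restriction).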

4.0.0.2 Multiplier estimation

\[m_a = \frac{\Delta v_e}{\Delta a_e} \quad \text{(averaged across experiments } e\text{)}\]

Or by regression:

\[\Delta v_e = \alpha + m_a \cdot \Delta a_e + \epsilon_e\]
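A sketch of both estimators on simulated data, assuming a hypothetical history of 20 experiments generated with a true multiplier of 0.5 and first-order deltas measured without error. Real first-order deltas are noisy, which biases a naive OLS slope toward zero; any measurement-error handling used by Tutterow and Saint-Jacques is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical history: 20 experiments with first-order deltas delta_a_e (in %)
# and value deltas delta_v_e generated from a true multiplier of 0.5 plus noise.
TRUE_M = 0.5
n_experiments = 20
delta_a = rng.uniform(0.5, 5.0, size=n_experiments)  # kept away from 0 so ratios are stable
delta_v = TRUE_M * delta_a + rng.normal(0.0, 0.2, size=n_experiments)

# Estimator 1: average of per-experiment ratios delta_v_e / delta_a_e.
m_ratio = np.mean(delta_v / delta_a)

# Estimator 2: OLS fit of delta_v_e = alpha + m_a * delta_a_e + eps_e.
# np.polyfit with deg=1 returns [slope, intercept].
m_ols, alpha_hat = np.polyfit(delta_a, delta_v, deg=1)

print(f"ratio average: m = {m_ratio:.3f}")
print(f"OLS:           m = {m_ols:.3f}, alpha = {alpha_hat:.3f}")
```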

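4.0.0.3 Application

Measure \(\Delta a\) in a new experiment → ecosystem-corrected \(\Delta \hat{v} = m_a \cdot \Delta a\) (this drops the intercept, i.e., it assumes \(\alpha \approx 0\) in the regression above).

Intuition: turning past experiments into assets

The value of this approach: past experiments become an asset. Once the multiplier has been estimated, it can be applied immediately to every new experiment.

Analogy: an exchange rate. Once the KRW/USD rate has been computed, every transaction can be converted with it.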

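Example application (LinkedIn messages)

Analysis of 20 historical experiments:

A 1% increase in messages → a 0.5% increase in ecosystem value (long-term retention)
multiplier \(m \approx 0.5\)

New experiment result:

\(\Delta a\) (messages sent) = +3.0%
\(\Delta v\) from the immediate metric alone: +1.5%, a reading that leakage can bias downward
Ecosystem-corrected: \(\Delta \hat{v} = 0.5 \times 3.0\% = 1.5\%\), consistent with the multiplier

In this way the potential underestimate is modeled explicitly rather than read off a possibly biased metric.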
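The example's arithmetic, written out as a two-step calculation. The numbers are the example's own; a single calibration pair stands in for the 20-experiment analysis, and the variable names are illustrative.

```python
# Calibration from the example: +1% messages sent -> +0.5% long-term retention.
calib_delta_a = 1.0  # % change in messages sent
calib_delta_v = 0.5  # % change in long-term retention
m = calib_delta_v / calib_delta_a  # multiplier

# New experiment: messages sent +3.0%.
new_delta_a = 3.0
corrected_delta_v = m * new_delta_a  # ecosystem-corrected long-term value
print(f"m = {m:.1f}, corrected delta_v = {corrected_delta_v:.1f}%")  # m = 0.5, corrected delta_v = 1.5%
```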

5 Limitations

Four limitations

5.0.0.1 1. Averaging the multiplier (the essence of a rule of thumb)

\(m_a\) is an average over historical experiments
The multiplier for a specific new experiment can differ (e.g., when message quality differs)

5.0.0.2 2. Separate calibration per action type

One multiplier does not apply to every action
“Sending a message” and “sending an invitation” have different multipliers
Each action type needs its own historical calibration

5.0.0.3 3. Dependence on network structure

The multiplier is calibrated to the current graph structure
As the graph changes over time, the multiplier drifts too
Periodic re-estimation (e.g., quarterly) is needed
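A sketch of quarterly re-estimation, assuming a hypothetical history in which the true multiplier drifts from 0.60 to 0.40 over four quarters. A single pooled fit averages the drift away, while per-quarter fits follow it. The quarter labels, drift values, and noise level are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical history: 8 experiments per quarter; the true multiplier drifts
# as the graph changes.
true_m_by_quarter = {"Q1": 0.60, "Q2": 0.55, "Q3": 0.45, "Q4": 0.40}

records = []  # (quarter, delta_a, delta_v)
for quarter, m in true_m_by_quarter.items():
    delta_a = rng.uniform(0.5, 5.0, size=8)
    delta_v = m * delta_a + rng.normal(0.0, 0.1, size=8)
    records += [(quarter, a, v) for a, v in zip(delta_a, delta_v)]


def fit_slope(pairs):
    """OLS slope of delta_v on delta_a (with intercept)."""
    a, v = np.array(pairs).T
    slope, _intercept = np.polyfit(a, v, deg=1)
    return slope


pooled = fit_slope([(a, v) for _, a, v in records])
per_quarter = {
    q: fit_slope([(a, v) for qq, a, v in records if qq == q])
    for q in true_m_by_quarter
}

print(f"pooled m: {pooled:.2f}")
for q, m_hat in per_quarter.items():
    print(f"{q}: m = {m_hat:.2f} (true {true_m_by_quarter[q]:.2f})")
```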

5.0.0.4 4. Heterogeneity

The average multiplier can be very inaccurate for some cohorts (e.g., super-users)
Estimating multipliers per subgroup runs into small samples
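A sketch of both problems, assuming two hypothetical cohorts (super-users at 5% of users with a much larger true multiplier), the same first-order delta in both cohorts, and measurement noise that scales as one over the square root of cohort size. The pooled multiplier lands near the regular-user value and far from the super-user one, while the super-user estimate carries a much larger standard error.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical cohorts: super-users are 5% of users but respond far more strongly.
cohorts = {
    "regular": {"share": 0.95, "true_m": 0.4},
    "super_user": {"share": 0.05, "true_m": 2.0},
}
n_experiments = 20
delta_a = rng.uniform(0.5, 5.0, size=n_experiments)  # first-order deltas (%)


def fit_with_se(x, y):
    """OLS slope of y on x (with intercept) and its standard error."""
    slope, intercept = np.polyfit(x, y, deg=1)
    resid = y - (intercept + slope * x)
    sigma2 = resid @ resid / (len(x) - 2)
    se = np.sqrt(sigma2 / np.sum((x - x.mean()) ** 2))
    return slope, se


estimates = {}
pooled_delta_v = np.zeros(n_experiments)
for name, c in cohorts.items():
    # Assumption: measurement noise grows as the cohort shrinks (sd ~ 1/sqrt(share)).
    noise_sd = 0.05 / np.sqrt(c["share"])
    delta_v = c["true_m"] * delta_a + rng.normal(0.0, noise_sd, size=n_experiments)
    estimates[name] = fit_with_se(delta_a, delta_v)
    pooled_delta_v += c["share"] * delta_v  # population-level value delta

# The single rule-of-thumb multiplier is fitted on the pooled value delta.
pooled_m, pooled_se = fit_with_se(delta_a, pooled_delta_v)

for name, (m_hat, se) in estimates.items():
    print(f"{name:>10}: m = {m_hat:.2f} +/- {se:.2f}")
print(f"{'pooled':>10}: m = {pooled_m:.2f} +/- {pooled_se:.2f}")
```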

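Intuition: the trap of the average

The core assumption of the rule of thumb is that “the new experiment resembles the historical average.” Situations where this assumption can break:

The new feature is especially viral: a larger multiplier than the historical average
The new feature is passive (e.g., automated messages): a smaller multiplier
The target population differs from the historical one: a different multiplier

Lesson: the rule of thumb is an approximation tool. For big decisions (a company-wide launch), validate with an additional isolation experiment.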

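6 Compatibility with Bernoulli Randomization

Compatibility matrix

Approach	Sample size	Leakage correction	Bernoulli-compatible
Bernoulli only	maximum	none	yes
Bernoulli + Rule-of-thumb	maximum	partial (downstream)	yes
Geo-based	small	strong	no (region-level units)
Network-cluster	small	strong	no (cluster-level units)
Ego-centric	medium	strong	partial (only egos are Bernoulli-randomized)
Edge-level	maximum	strong	yes (post-hoc analysis)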

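6.0.0.1 Order of preference

Bernoulli + Rule-of-thumb: first approach to try when general leakage is suspected
Bernoulli + Edge-level: social networks with explicit edges
Geo-based: marketplaces with strong competition for resources
Network-cluster · Ego-centric: large experiments on social networks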
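The preference list can be read as a small decision helper. This is an illustrative sketch: the `ExperimentContext` flags are hypothetical, and the mapping from flags to approaches (including the "Bernoulli only" fallback when no leakage is suspected) is one reading of the list, not a rule from the source.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ExperimentContext:
    """Hypothetical flags describing an experiment."""
    leakage_suspected: bool
    explicit_edges: bool               # social network with explicit edges
    strong_resource_competition: bool  # e.g., a marketplace
    large_network_experiment: bool


def recommend_approaches(ctx: ExperimentContext) -> list[str]:
    """Return candidate approaches in the order of preference of section 6.0.0.1."""
    if not ctx.leakage_suspected:
        return ["Bernoulli only"]
    options = ["Bernoulli + Rule-of-thumb"]  # first approach to try
    if ctx.explicit_edges:
        options.append("Bernoulli + Edge-level")
    if ctx.strong_resource_competition:
        options.append("Geo-based")
    if ctx.large_network_experiment:
        options.append("Network-cluster / Ego-centric")
    return options


print(recommend_approaches(ExperimentContext(True, True, False, True)))
```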

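Intuition: the value of simplicity

The rule of thumb is the simplest correction, and its simplicity is what lets it be applied broadly.

Analogy: exchange rate vs hedging. The exchange rate (rule of thumb) applies immediately to every transaction; hedging (network-cluster) is used only for large transactions.

Lesson: as a rough heuristic, the rule of thumb is enough for about 80% of ordinary experiments; only the remaining 20% (strong competition or strong network effects) need heavy isolation.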

7 Python Simulation: Rule-of-Thumb Correction

import numpy as np

np.random.seed(2026)

def simulate_with_spillover_and_correction(
    n_users=20000, alpha=0.4, true_lift=0.08, multiplier_prior=0.5
):
    """
    Experiment with spillover + downstream metric measurement + multiplier correction.

    alpha:            fraction of the treatment lift that leaks into control
    true_lift:        true relative lift of the first-order action
    multiplier_prior: pre-calibrated multiplier applied to the downstream delta
    Returns (raw first-order delta, downstream delta, corrected delta), all relative.
    """
    # Bernoulli(0.5) assignment: 1 = treatment, 0 = control
    assignment = np.random.randint(0, 2, size=n_users)
    base_p_action = 0.30
    base_p_response = 0.15

    # First-order action (e.g., messages sent); control picks up alpha of the lift via leakage
    p_action_t = base_p_action * (1 + true_lift)
    p_action_c = base_p_action * (1 + alpha * true_lift)
    p_action = np.where(assignment == 1, p_action_t, p_action_c)
    actions = np.random.binomial(1, p_action, size=n_users)

    # Downstream metric (e.g., messages responded to); the downstream metric also spills over
    p_response_t = base_p_response * (1 + alpha * true_lift)
    p_response_c = base_p_response * (1 + alpha * 0.5 * true_lift)
    p_response = np.where(assignment == 1, p_response_t, p_response_c)
    responses = np.random.binomial(1, p_response, size=n_users)

    # First-order delta (raw), relative to control
    rate_action_t = actions[assignment == 1].mean()
    rate_action_c = actions[assignment == 0].mean()
    delta_action = (rate_action_t - rate_action_c) / rate_action_c

    # Downstream delta, relative to control
    rate_resp_t = responses[assignment == 1].mean()
    rate_resp_c = responses[assignment == 0].mean()
    delta_resp = (rate_resp_t - rate_resp_c) / rate_resp_c

    # corrected delta = action_delta + multiplier * downstream_delta
    corrected = delta_action + multiplier_prior * delta_resp

    return delta_action, delta_resp, corrected

raw, down, corrected = simulate_with_spillover_and_correction()
print("True lift:                      8.00%")
print(f"Raw first-order delta (biased): {raw:.2%}")
print(f"Downstream delta:               {down:.2%}")
print(f"Corrected (rule-of-thumb):      {corrected:.2%}")

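Interpreting the simulation

The raw delta is smaller than the true 8% (underestimated because of spillover). The downstream delta is also positive, a signal of ecosystem impact. The corrected delta, raw + multiplier × downstream, moves the estimate toward the true effect: with the default parameters it rises in expectation from about 4.7% to about 5.4%, still short of 8%.

This simulation shows that the rule of thumb is not perfect but provides a meaningful correction. The accuracy of the multiplier determines the accuracy of the correction.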
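Because a single run is noisy, the expected values can be computed directly from the simulation's probabilities. The sketch below re-derives them with the same default parameters; it copies the probability formulas of `simulate_with_spillover_and_correction` rather than calling it, so it runs on its own, and `expected_deltas` is a hypothetical helper name.

```python
def expected_deltas(alpha=0.4, true_lift=0.08, multiplier_prior=0.5,
                    base_p_action=0.30, base_p_response=0.15):
    """Expected raw, downstream, and corrected relative deltas (no sampling noise),
    using the same probabilities as the simulation."""
    p_action_t = base_p_action * (1 + true_lift)
    p_action_c = base_p_action * (1 + alpha * true_lift)
    p_resp_t = base_p_response * (1 + alpha * true_lift)
    p_resp_c = base_p_response * (1 + alpha * 0.5 * true_lift)

    raw = (p_action_t - p_action_c) / p_action_c
    down = (p_resp_t - p_resp_c) / p_resp_c
    return raw, down, raw + multiplier_prior * down


raw, down, corrected = expected_deltas()
print(f"raw {raw:.2%}, downstream {down:.2%}, corrected {corrected:.2%}")

# How much of the gap the correction closes for a few leakage fractions.
for alpha in (0.0, 0.2, 0.4, 0.6):
    r, _, c = expected_deltas(alpha=alpha)
    print(f"alpha={alpha:.1f}: raw {r:.2%}, corrected {c:.2%}")
```

In this toy model the downstream delta is zero when alpha is zero, so the correction vanishes exactly when there is no leakage.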

8 Case Studies

LinkedIn

Feed experiments: creator-side optimization (Barrilleaux and Wang 2018)
First-order: post engagement (likes, comments)
Downstream: retention of the post's creator
The multiplier is estimated from historical experiments (re-estimated quarterly)

8.0.0.1 Bing

Ecosystem value of ad experiments
First-order: queries and clicks
Downstream: retention of search sessions
The multiplier differs by ad type

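8.0.0.2 General pattern

Define the action types (messages, posts, clicks, …)
Define a first-order/downstream metric pair for each action
Regress the multiplier on a cohort of historical experiments
Report both the raw and the corrected delta for every new experiment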
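The pattern can be sketched as a small registry, assuming each action type keeps its own history of (first-order delta, downstream delta) pairs and that the corrected estimate follows the \(\Delta \hat{v} = m_a \cdot \Delta a\) form of section 4.0.0.3. The class, metric names, and history values are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

import numpy as np


@dataclass
class ActionType:
    """One action type and its first-order / downstream metric pair.

    history holds (delta_first_order, delta_downstream) from past experiments.
    """
    name: str
    first_order_metric: str
    downstream_metric: str
    history: list[tuple[float, float]] = field(default_factory=list)
    multiplier: float | None = None

    def fit_multiplier(self) -> float:
        """Regress the downstream delta on the first-order delta across past experiments."""
        x, y = np.array(self.history).T
        slope, _intercept = np.polyfit(x, y, deg=1)
        self.multiplier = float(slope)
        return self.multiplier

    def report(self, delta_first_order: float) -> dict[str, float]:
        """Report both the raw first-order delta and the corrected estimate m * delta."""
        if self.multiplier is None:
            raise ValueError(f"multiplier for {self.name!r} has not been fitted")
        return {"raw": delta_first_order,
                "corrected": self.multiplier * delta_first_order}


# Hypothetical registry: each action type keeps its own history and multiplier.
registry = {
    "message": ActionType(
        "message", "messages_sent", "messages_responded_to",
        history=[(1.0, 0.52), (2.0, 0.98), (3.0, 1.51), (4.0, 2.03)],
    ),
    "invitation": ActionType(
        "invitation", "invitations_sent", "invitations_accepted",
        history=[(1.0, 0.21), (2.0, 0.39), (3.0, 0.62), (4.0, 0.80)],
    ),
}
for action in registry.values():
    action.fit_multiplier()

print(registry["message"].report(3.0))
```

Keeping one entry per action type makes the "one multiplier per action type" limitation explicit: an unfitted type refuses to report rather than borrowing another type's multiplier.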

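9 Comparison

Dimension	Raw delta	Rule-of-thumb corrected
Data requirements	experiment data only	+ downstream metrics + historical multiplier
Sample size	Bernoulli maximum	Bernoulli maximum (unchanged)
Implementation	simple	somewhat more complex (multiplier management)
Bias correction	none	partial (downstream only)
Accuracy	underestimates on average	more accurate (subject to the limits of an averaged multiplier)
Decision-making	conservative	closer estimate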

10 Applications

Social engagement experiments (messages, posts, likes, shares)
Demand-side experiments on two-sided platforms (advertisers, publishers)
Indirect effects in recommendation experiments (recommend → click → retention)
The query-click chain in search ranking

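11 Practical Checklist

Define the experiment's first-order action
Define the downstream metrics of the first-order action
Estimate the multipliers from historical experiments (quarterly)
Report both the raw delta and the corrected delta in experiment reports
Also report the multiplier's uncertainty (CI or IQR)
For big decisions (a company-wide launch), validate with an additional isolation experiment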
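One way to report the multiplier's uncertainty is a bootstrap over historical experiments. This sketch assumes a hypothetical history of 20 experiments and a new experiment with \(\Delta a = 3.0\%\); the interval covers only the uncertainty in \(m\), not the sampling noise of the new experiment's own \(\Delta a\).

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical history of 20 experiments: (delta_a_e, delta_v_e), in %.
delta_a = rng.uniform(0.5, 5.0, size=20)
delta_v = 0.5 * delta_a + rng.normal(0.0, 0.3, size=20)


def fit_m(a, v):
    """OLS slope of delta_v on delta_a (with intercept)."""
    slope, _intercept = np.polyfit(a, v, deg=1)
    return slope


# Bootstrap over experiments: resample experiments with replacement and refit m.
n_boot = 2000
boot_m = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, len(delta_a), size=len(delta_a))
    boot_m[b] = fit_m(delta_a[idx], delta_v[idx])

m_hat = fit_m(delta_a, delta_v)
m_lo, m_hi = np.percentile(boot_m, [2.5, 97.5])

# Report for a new experiment: raw first-order delta, corrected value, and interval.
new_delta_a = 3.0
print(f"raw delta_a: {new_delta_a:.1f}%")
print(f"m = {m_hat:.3f} (95% bootstrap CI {m_lo:.3f} to {m_hi:.3f})")
print(f"corrected delta_v = {m_hat * new_delta_a:.2f}% "
      f"(CI {m_lo * new_delta_a:.2f}% to {m_hi * new_delta_a:.2f}%)")
```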

12 Related Topics

F22-0 overview: map of all of Ch.22
F22-1: six direct and indirect examples
F22-3: four isolation approaches + edge-level + monitoring
D-21 (Hernan 22): IV extrapolation for time-varying treatments
F-KOH18 (Variance): variance impact of the rule-of-thumb multiplier

Sources

Kohavi, Tang, Xu (2020). Trustworthy Online Controlled Experiments. Ch.22.3 (Some Practical Solutions — Rule-of-Thumb).
Tutterow and Saint-Jacques (2019). Estimating LinkedIn ecosystem value.
Barrilleaux and Wang (2018). “Spreading the Love in the LinkedIn Feed.”
Gupta et al. (2019). “Top Challenges from the OCE Summit.”