Two approaches to estimating weights
In our cloning approach, people get “censored” when they deviate from their assigned treatment strategy.
The probability of remaining uncensored decreases over time as more people deviate from the strategy.
Consider our “treat at week 12” strategy:
People still contributing data at week 15 must be weighted more heavily, because they represent not just themselves but also people with similar characteristics who were censored earlier and who, had they stayed on the strategy, would be expected to have had similar outcomes.
Let’s follow two people assigned to the “treat at week 12” strategy:
| ID | Maternal age | Bleeding starts | Exposed | Adheres to strategy | Outcome |
|---|---|---|---|---|---|
| 101 | 28 | Week 10 | Never | No | Censored |
| 102 | 32 | Week 10 | Week 12 | Yes | Yes |
Person 102 must be weighted to represent both themselves and people like Person 101: people who were censored but would have had similar outcomes had they continued to follow the strategy.
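To make the weighting idea concrete, here is a deliberately simplified sketch with made-up counts and no covariates. In the actual analysis the probability of remaining uncensored is modeled conditional on covariates, as both approaches in this section do; the numbers here are hypothetical.

```r
# Hypothetical counts: 100 clones start the "treat at week 12" strategy and
# 40 are censored (deviate from the strategy) before week 15.
n_start <- 100
n_censored <- 40
n_remaining <- n_start - n_censored

# Marginal probability of remaining uncensored through week 15 (no covariates)
p_uncens <- n_remaining / n_start

# Each person still contributing data at week 15 gets this weight
weight <- 1 / p_uncens

# Weighted, the 60 remaining people stand in for all 100 who started
weighted_total <- n_remaining * weight
weighted_total
```

With covariates, the same logic is applied within strata of similar people, which is what the censoring models estimate.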
Data structure: Interval survival format
Columns: `time_in`, `time_out`, `censor` (event indicator; in this model the event is censoring).

```
# A tibble: 5 × 7
     ID time_in time_out maternal_age bleeding censor outcome
  <dbl>   <dbl>    <dbl>        <dbl>    <dbl>  <dbl>   <dbl>
1   101       8     10             28        0      0       0
2   101      10     12             28        1      1       0
3   102       8     10             32        0      0       0
4   102      10     12             32        1      0       0
5   102      12     16.8           32        1      0       1
```
Person 101 is censored in interval (10, 12]. Person 102 has the outcome in interval (12, 16.8].
The Cox model estimates the hazard of censoring at time \(t\):
\[h_c(t | X) = h_{c0}(t) \exp(\beta_1 X_1 + \beta_2 X_2 + \ldots)\]
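Where:
- \(h_{c0}(t)\) = baseline hazard of censoring (left unspecified)
- \(\beta\) = log hazard ratios for predictors of censoring
- \(S_c(t | X) = \exp\left(-\int_0^t h_c(u | X)\, du\right)\) = probability of remaining uncensored through time \(t\)

The survival package in R computes \(S_c(t | X)\) automatically, using the Breslow estimator of the baseline cumulative hazard.

Weight: \(w(t) = \frac{1}{S_c(t | X)}\) = inverse probability of remaining uncensored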
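Writing \(H_c(t | X) = \int_0^t h_c(u | X)\, du\), the probability of staying uncensored over one interval \((t_{in}, t_{out}]\) is \(\exp\{-[H_c(t_{out} | X) - H_c(t_{in} | X)]\}\), so multiplying these interval probabilities recovers \(S_c(t | X)\). This is why a cumulative product of per-interval survival probabilities gives the weight. A small sketch with hypothetical cumulative hazards (not estimates from this analysis) checks this numerically:

```r
# Hypothetical cumulative censoring hazards H_c(t | X) for one person at the
# interval boundaries 8, 10, 12 and 16.8 (H_c(8) = 0 at the start of follow-up)
boundaries <- c(8, 10, 12, 16.8)
H <- c(0, 0.10, 0.25, 0.60)

# Interval survival: P(uncensored through t_out | uncensored at t_in)
interval_surv <- exp(-diff(H))

# Multiplying interval probabilities recovers S_c(t | X) = exp(-H_c(t | X))
S_c <- cumprod(interval_surv)
all.equal(S_c, exp(-H[-1]))

# Inverse probability of remaining uncensored through the end of each interval
w <- 1 / S_c
data.frame(t_out = boundaries[-1], S_c = S_c, weight = w)
```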
Step 1: Fit Cox model for censoring in each clone
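A minimal sketch of this step on simulated data. The censoring mechanism, the follow-up end at week 20, and the simplification that `bleeding` is fixed at baseline (so each person has a single `(time_in, time_out]` row) are all assumptions for illustration; in practice the model would be fit separately within each clone arm on the real interval data.

```r
library(survival)

set.seed(2024)
n <- 500
maternal_age <- round(rnorm(n, mean = 30, sd = 5))
bleeding <- rbinom(n, size = 1, prob = 0.5)

# Hypothetical censoring mechanism: bleeding and age raise the hazard of
# deviating from the strategy (true log hazard ratios 1.0 and 0.02)
rate <- 0.05 * exp(1.0 * bleeding + 0.02 * (maternal_age - 30))
cens_time <- 8 + rexp(n, rate = rate)
end_fu <- 20  # hypothetical end of follow-up (week 20)

# One (time_in, time_out] interval per person; censor = 1 marks deviation
clone_long <- data.frame(
  ID = seq_len(n),
  time_in = 8,
  time_out = pmin(cens_time, end_fu),
  maternal_age = maternal_age,
  bleeding = bleeding,
  censor = as.integer(cens_time <= end_fu)
)

# Censoring is the "event" in this model
mod_cens <- coxph(Surv(time_in, time_out, censor) ~ maternal_age + bleeding,
                  data = clone_long)
exp(coef(mod_cens))  # estimated hazard ratios for censoring
```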
Step 2: Extract interval survival probabilities
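```r
library(dplyr)
library(survival)

clone_weighted <- clone_long |>
  # predict() without newdata returns one value per row of the data used to
  # fit mod_cens, so compute it before any rows are reordered or dropped.
  # For (time_in, time_out] data, type = "survival" is the probability of
  # remaining uncensored over each row's interval.
  mutate(.fitted = predict(mod_cens, type = "survival")) |>
  arrange(ID, time_in) |>
  group_by(ID) |>
  mutate(
    # survival over the previous interval (1 for each person's first interval)
    p_uncens = lag(.fitted, default = 1),
    # cumulative probability of remaining uncensored up to time_in
    p_uncens_cumulative = cumprod(p_uncens),
    # inverse probability weight
    weight = 1 / p_uncens_cumulative
  ) |>
  ungroup()
```

Data structure: Long format with weekly observations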
One row per person-week: `week` (every week of follow-up) and `censor_now` (1 if censored in that week, else 0).

```
# A tibble: 13 × 6
      ID  week maternal_age bleeding censor_now outcome_now
   <dbl> <dbl>        <dbl>    <dbl>      <dbl>       <dbl>
 1   101     8           28        0          0           0
 2   101     9           28        0          0           0
 3   101    10           28        1          0           0
 4   101    11           28        1          1           0
 5   102     8           32        0          0           0
 6   102     9           32        0          0           0
 7   102    10           32        1          0           0
 8   102    11           32        1          0           0
 9   102    12           32        1          0           0
10   102    13           32        1          0           0
11   102    14           32        1          0           0
12   102    15           32        1          0           0
13   102    16           32        1          0           1
```
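Each row covers the week from `week` to `week + 1`. Person 101 is censored in the week-11 row, i.e. by week 12 (censor_now = 1). Person 102 has the outcome in the week-16 row, which covers (16, 17] and contains the event time 16.8 (outcome_now = 1).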
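The weekly rows can be derived from the interval rows. A minimal base-R sketch, assuming integer `time_in` values, one row per week in which the person is at risk, and censoring or the outcome assigned to the last week of the interval in which it occurs. The helper `expand_weekly` and the object `interval_data` are hypothetical names; the values are copied from the interval table.

```r
# Interval-format data for persons 101 and 102 (values from the interval table)
interval_data <- data.frame(
  ID = c(101, 101, 102, 102, 102),
  time_in = c(8, 10, 8, 10, 12),
  time_out = c(10, 12, 10, 12, 16.8),
  maternal_age = c(28, 28, 32, 32, 32),
  bleeding = c(0, 1, 0, 1, 1),
  censor = c(0, 1, 0, 0, 0),
  outcome = c(0, 0, 0, 0, 1)
)

# Hypothetical helper: expand one interval row into one row per week started
expand_weekly <- function(row) {
  weeks <- seq(row$time_in, ceiling(row$time_out) - 1)
  last <- weeks == max(weeks)  # censoring/outcome placed in the final week
  data.frame(
    ID = row$ID,
    week = weeks,
    maternal_age = row$maternal_age,
    bleeding = row$bleeding,
    censor_now = as.integer(last & row$censor == 1),
    outcome_now = as.integer(last & row$outcome == 1)
  )
}

weekly <- do.call(rbind, lapply(seq_len(nrow(interval_data)),
                                function(i) expand_weekly(interval_data[i, ])))
weekly
```

This reproduces the 13-row weekly table; with real data, intervals that do not start on whole weeks would need an explicit rule for partial weeks.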
The logistic model estimates the probability of censoring in week \(t\):
\[\text{logit}(P(C_t = 1 | C_{t-1} = 0, X_t)) = \alpha_t + \beta_1 X_{1t} + \beta_2 X_{2t} + \ldots\]
Where:
- \(C_t\) = indicator of censoring in week \(t\)
- \(\alpha_t\) = week-specific intercepts (baseline hazard)
- \(\beta\) = log odds ratios for predictors of censoring
Survival probability: \[S_c(t | X) = \prod_{k=1}^{t} (1 - P(C_k = 1 | C_{k-1} = 0, X_k))\]
Weight: \(w(t) = \frac{1}{S_c(t | X)}\) = inverse probability of remaining uncensored
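The product formula can be checked by hand. A small sketch with hypothetical weekly censoring probabilities for one person over four weeks (indexed \(k = 1, \ldots, 4\)); the numbers are made up, not model output:

```r
# Hypothetical P(C_k = 1 | C_{k-1} = 0, X_k) for weeks k = 1..4
p_censor <- c(0.05, 0.10, 0.20, 0.10)

# S_c(t | X): running product of weekly probabilities of staying uncensored
S_c <- cumprod(1 - p_censor)   # 0.950 0.855 0.684 0.6156

# Inverse probability weight at the end of each week
w <- 1 / S_c
round(w, 3)
```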
Step 1: Fit logistic model for weekly censoring probability
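A minimal sketch of this step on simulated person-week data. The data-generating mechanism (week effect, log odds ratio 1.0 for bleeding, baseline-fixed covariates) is an assumption for illustration; `factor(week)` supplies the week-specific intercepts \(\alpha_t\), and only weeks up to and including the first censoring are kept, so every row satisfies \(C_{t-1} = 0\).

```r
set.seed(7)
n <- 500
weeks <- 8:19
age <- round(rnorm(n, mean = 30, sd = 5))
bleed <- rbinom(n, size = 1, prob = 0.5)

# Person-week grid, ordered by ID then week, before applying censoring
d <- expand.grid(week = weeks, ID = seq_len(n))
d$maternal_age <- age[d$ID]
d$bleeding <- bleed[d$ID]

# Hypothetical weekly censoring probabilities rising with week and bleeding
p <- plogis(-3 + 0.1 * (d$week - 8) + 1.0 * d$bleeding + 0.02 * (d$maternal_age - 30))
d$censor_now <- rbinom(nrow(d), size = 1, prob = p)

# Keep weeks up to and including the first censoring (the risk set C_{t-1} = 0)
censored_before <- ave(d$censor_now, d$ID, FUN = function(x) cumsum(x) - x)
clone_long_weekly <- d[censored_before == 0, ]

# Pooled logistic model: week-specific intercepts plus covariates
mod_cens <- glm(censor_now ~ factor(week) + maternal_age + bleeding,
                family = binomial(), data = clone_long_weekly)
coef(mod_cens)[c("maternal_age", "bleeding")]
```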
Step 2: Calculate cumulative probability of remaining uncensored
```r
library(dplyr)

clone_weighted <- clone_long_weekly |>
  mutate(
    # probability of censoring this week; predict() without newdata assumes
    # mod_cens was fit on clone_long_weekly in its current row order
    p_censor_week = predict(mod_cens, type = "response"),
    # probability of remaining uncensored this week
    p_uncens_week = 1 - p_censor_week
  ) |>
  arrange(ID, week) |>
  group_by(ID) |>
  mutate(
    # cumulative probability of remaining uncensored through this week
    p_uncens_cumulative = cumprod(p_uncens_week),
    # inverse probability weight
    weight = 1 / p_uncens_cumulative
  ) |>
  ungroup()
```

| Aspect | Cox regression | Pooled logistic |
|---|---|---|
| Data size | Generally smaller (interval format) | Larger (weekly format) |
| Baseline hazard | Not specified | Must be modeled |
| Flexibility | Semi-parametric | Fully parametric |
| Time modeling | Automatic | Manual (e.g., splines, indicators) |
Both produce valid inverse probability weights when models are correctly specified.
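One way to see this in practice is to fit both models to the same person-week data and compare the resulting weights. This is an illustrative check under a hypothetical simulated censoring mechanism, not a proof; the Cox model here treats each row as a `(week, week + 1]` interval, and both weights are taken through the end of each week.

```r
library(survival)

set.seed(11)
n <- 500
weeks <- 8:19
age <- round(rnorm(n, mean = 30, sd = 5))
bleed <- rbinom(n, size = 1, prob = 0.5)

# Simulated person-week data (hypothetical mechanism), ordered by ID then week
d <- expand.grid(week = weeks, ID = seq_len(n))
d$maternal_age <- age[d$ID]
d$bleeding <- bleed[d$ID]
p <- plogis(-3 + 0.1 * (d$week - 8) + 1.0 * d$bleeding + 0.02 * (d$maternal_age - 30))
d$censor_now <- rbinom(nrow(d), size = 1, prob = p)
d <- d[ave(d$censor_now, d$ID, FUN = function(x) cumsum(x) - x) == 0, ]

# Cox model on (week, week + 1] intervals vs pooled logistic with week intercepts
cox_fit <- coxph(Surv(week, week + 1, censor_now) ~ maternal_age + bleeding, data = d)
logit_fit <- glm(censor_now ~ factor(week) + maternal_age + bleeding,
                 family = binomial(), data = d)

# Probability of staying uncensored through the end of each person-week
p_uncens_cox <- predict(cox_fit, type = "survival")
p_uncens_logit <- 1 - predict(logit_fit, type = "response")

# Cumulative products within person, then inverse probability weights
w_cox <- 1 / ave(p_uncens_cox, d$ID, FUN = cumprod)
w_logit <- 1 / ave(p_uncens_logit, d$ID, FUN = cumprod)

cor(w_cox, w_logit)
summary(w_cox - w_logit)
```

With short intervals and flexible time terms the two sets of weights should track each other closely; larger gaps would point to a difference in how time or covariates are modeled.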
Time scale:
Model checking:
Extreme weights:
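A common way to handle extreme weights, sketched here as an assumption rather than the approach used in this analysis, is to truncate them at chosen percentiles. This trades a little bias for lower variance. The helper `truncate_weights` and the 1st/99th percentile cutoffs are hypothetical choices:

```r
# Hypothetical helper: cap weights at chosen lower/upper percentiles
truncate_weights <- function(w, probs = c(0.01, 0.99)) {
  limits <- quantile(w, probs = probs, names = FALSE)
  pmin(pmax(w, limits[1]), limits[2])
}

# Toy weights with one extreme value
w <- c(1:99, 1000)
w_trunc <- truncate_weights(w)
range(w_trunc)  # 1.99 108.01
```

Checking how estimates change across several cutoffs is a simple sensitivity analysis for the influence of large weights.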