---
title: "TrialCountExtraction"
author: "Will"
format: html
editor: source
---


```{r}
#| eval: false
#| include: true

# Full set: number of distinct categories
categories %>% unique() %>% sort() %>% length()

# Evaluation set: number of distinct categories
cf_categories %>% unique() %>% sort() %>% length()

```
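One subtlety in these counts: `sort()` drops `NA` by default (`na.last = NA`), so `unique() %>% sort() %>% length()` counts only non-missing categories, whereas `length(unique(x))` would count `NA` as a category of its own. Here is a small base-R illustration on a made-up vector (`toy_x` is hypothetical, not project data):

```{r}
# Hypothetical category vector containing one missing value
toy_x <- c(3, 1, 3, NA, 2)

# sort() removes NA, so only the three real categories are counted
toy_n_sorted <- length(sort(unique(toy_x)))

# unique() alone keeps NA, so it is counted as a fourth distinct value
toy_n_unique <- length(unique(toy_x))

# Explicit version of the sorted count: drop NA first, then count
toy_n_non_missing <- length(unique(toy_x[!is.na(toy_x)]))
```

With `dplyr` loaded, `n_distinct(x, na.rm = TRUE)` gives the same count as the `sort()` version.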


```{r}
# Number of trials per category, computed from `df`
group_trials_by_category %>% group_by(category_id) %>% count()
```

# Actual data from the evaluation and counterfactual sets
```{r}
# Original Evaluation
# - Pulled from `categories` above when defined
counterfact_delay$ll %>% unique() %>% sort() %>% length()


# Counterfactual
# - Pulled from `cf_categories` above when defined
counterfact_delay$llx %>% unique() %>% sort() %>% length()
```
Those category vectors were built from the `category_id` column of these two data frames:
```{r}
df$category_id %>% unique() %>% sort() %>% length()
df_counterfact_base$category_id %>% unique() %>% sort() %>% length()
```

The two counts differ because the counterfactual imposes an extra constraint:
each trial must have a snapshot where its status moves from "ANR" to "Rec",
so a trial cannot simply terminate.
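The constraint, and how it removes categories, can be sketched on toy data. This is a minimal base-R illustration, not the actual filtering code: the table `toy_snapshots`, its columns `trial_id`, `snapshot`, `status` and `category_id`, and the helper `has_anr_to_rec()` are all hypothetical. It also assumes "moves from ANR to Rec" means "ANR" in one snapshot immediately followed by "Rec" in the next.

```{r}
# Hypothetical snapshot table: one row per trial per snapshot
toy_snapshots <- data.frame(
  trial_id    = c(1, 1, 1, 2, 2, 3, 3, 3),
  snapshot    = c(1, 2, 3, 1, 2, 1, 2, 3),
  status      = c("Rec", "ANR", "Rec", "Rec", "ANR", "ANR", "Rec", "ANR"),
  category_id = c(10, 10, 10, 20, 20, 30, 30, 30)
)

# Order by trial and snapshot so consecutive rows are consecutive snapshots
toy_snapshots <- toy_snapshots[order(toy_snapshots$trial_id, toy_snapshots$snapshot), ]

# Hypothetical helper: TRUE if "ANR" is immediately followed by "Rec"
has_anr_to_rec <- function(status) {
  n <- length(status)
  if (n < 2) return(FALSE)
  any(status[-n] == "ANR" & status[-1] == "Rec")
}

# Apply the check to each trial's status sequence
toy_keep <- tapply(toy_snapshots$status, toy_snapshots$trial_id, has_anr_to_rec)
toy_kept_ids <- as.numeric(names(toy_keep)[toy_keep])
toy_cf_base <- toy_snapshots[toy_snapshots$trial_id %in% toy_kept_ids, ]

# Categories present in the full table but missing after the constraint
toy_dropped <- setdiff(unique(toy_snapshots$category_id), unique(toy_cf_base$category_id))
```

Trial 2 never goes from "ANR" back to "Rec", so it is removed, and because it is the only trial in category 20, that whole category disappears from the constrained set.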

# Where do the other values drop out?

When we compute the counterfactual, the table loses some of the categories.
Here is the extracted data:

```{r}
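# Use rstan::extract() explicitly so tidyr::extract() cannot mask it
# Each row is a posterior draw; each column (X1, X2, ...) is one trial
data.frame(rstan::extract(generated_ib, pars = "predicted_difference")$predicted_difference)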
```

```{r}
# Long format: one row per (draw, trial) pair
pddf_ib <- data.frame(rstan::extract(generated_ib, pars = "predicted_difference")$predicted_difference) |>
    pivot_longer(X1:X168) #CHANGE_NOTE: moved from X169 to X168

# Trial index is the number in the column name (e.g. "X12" -> 12)
pddf_ib["entry_idx"] <- as.numeric(gsub("\\D", "", pddf_ib$name))
# Look up each trial's category ID and name by vectorized indexing
pddf_ib["category"] <- counterfact_delay$llx[pddf_ib$entry_idx]
pddf_ib["category_name"] <- category_names[pddf_ib$category]
```
Yet we still appear to predict the difference for all 168 trials (columns `X1` to `X168`).
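The index-to-category lookup above can be checked on a tiny made-up matrix. This base-R sketch does not need `rstan` or `tidyr`; `toy_draws`, `toy_llx` and `toy_category_names` are hypothetical stand-ins for the Stan output, `counterfact_delay$llx` and `category_names`. It shows that `data.frame()` names unnamed matrix columns `X1`, `X2`, ..., so the number in each name is the trial's position, and the category is only right if position `i` in the Stan output is the same trial as `llx[i]`.

```{r}
# Hypothetical draws: 2 posterior draws x 4 trials
toy_draws <- matrix(c(0.1, 0.2, -0.3, 0.4, 0.5, -0.6, 0.7, 0.8), nrow = 2)
toy_llx <- c(2, 1, 2, 3)                 # hypothetical category ID per trial
toy_category_names <- c("A", "B", "C")   # hypothetical category labels

toy_wide <- data.frame(toy_draws)        # columns are named X1 ... X4

# Base-R stand-in for pivot_longer(): stack the columns one after another
toy_long <- data.frame(
  name  = rep(names(toy_wide), each = nrow(toy_wide)),
  value = unlist(toy_wide, use.names = FALSE)
)

# Recover the trial index from the column name, then look up its category
toy_long$entry_idx     <- as.integer(sub("^X", "", toy_long$name))
toy_long$category      <- toy_llx[toy_long$entry_idx]
toy_long$category_name <- toy_category_names[toy_long$category]
```

The long table here is stacked column by column, so its row order differs from `pivot_longer()` (which works row by row), but the mapping from column name to trial index to category is the same.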

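It looks like the error is in how I assign category IDs. I take them from the first 168 rows of the full `df`, which need not be the same trials, in the same order, as those in `df_counterfact_base` that the counterfactual was built from: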
```{r}
ground_truth <- df$category_id[1:168]
```
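One way to test this suspicion is to compare the sliced vector with categories taken from the table the predictions were built from. A minimal base-R sketch on toy data; `toy_full`, `toy_in_cf` and `toy_cf` are hypothetical stand-ins for `df`, the counterfactual filter and `df_counterfact_base`:

```{r}
# Hypothetical full table and the subset kept for the counterfactual
toy_full  <- data.frame(trial_id = 1:6, category_id = c(1, 2, 2, 3, 1, 3))
toy_in_cf <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)  # hypothetical filter result
toy_cf    <- toy_full[toy_in_cf, ]

toy_n <- nrow(toy_cf)  # number of trials with predictions

# Suspected bug: the first n rows of the full table
toy_sliced <- toy_full$category_id[seq_len(toy_n)]

# Categories from the same table the predictions were built from
toy_aligned <- toy_cf$category_id

# Positions where the two disagree are trials given the wrong category
toy_mismatch <- which(toy_sliced != toy_aligned)
```

On the real data, the analogous check would compare `df$category_id[1:168]` with `counterfact_delay$llx`; any mismatching positions would confirm that the two vectors are not aligned.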