---
title: "TrialCountExtraction"
author: "Will"
format: html
editor: source
---

```{r}
#| eval: false
#| include: true

# Full set
categories %>% unique() %>% sort() %>% length()

# Evaluation set
cf_categories %>% unique() %>% sort() %>% length()
```

```{r}
# Pulled from df
group_trials_by_category %>% group_by(category_id) %>% count()
```

# Actual data from Evaluation and counterfactual

```{r}
# Original Evaluation
# - Pulled from `categories` above when defined
counterfact_delay$ll %>% unique() %>% sort() %>% length()

# Counterfactual
# - Pulled from `cf_categories` above when defined
counterfact_delay$llx %>% unique() %>% sort() %>% length()
```

Those values come from:

```{r}
df$category_id %>% unique() %>% sort() %>% length()
df_counterfact_base$category_id %>% unique() %>% sort() %>% length()
```

The difference between the two is that the counterfactual imposes a constraint: each trial must have a snapshot where it moves from "ANR" to "Rec", so a trial can't just terminate.

# Where do the other values drop?

When we build the counterfactual, the table loses some of the categories.

Here is the extracted data:

```{r}
data.frame(extract(generated_ib, pars = "predicted_difference")$predicted_difference)
```

```{r}
pddf_ib <- data.frame(extract(generated_ib, pars = "predicted_difference")$predicted_difference) |>
  pivot_longer(X1:X168) # CHANGE_NOTE: moved from X169 to X168

# Recover the numeric column index from names such as "X17"
pddf_ib["entry_idx"] <- as.numeric(gsub("\\D", "", pddf_ib$name))

# Map each entry to its counterfactual category ID, then to the category name
pddf_ib["category"] <- sapply(pddf_ib$entry_idx, function(i) counterfact_delay$llx[i])
pddf_ib["category_name"] <- sapply(
  pddf_ib$category,
  function(i) category_names[i]
)
```

And yet it seems that we predict the difference for all 168 trials.

It looks like there is an error in how I apply category IDs, because I'm pulling them from:

```{r}
ground_truth <- df$category_id[1:168]
```
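To see why a positional slice like the one above could mislabel trials, here is a minimal sketch on made-up data. It assumes the counterfactual filter drops rows from the middle of the table rather than from the end. The data frames `full` and `filtered` and the `trial_id` column are hypothetical stand-ins, not the real `df` or `df_counterfact_base`.

```{r}
# Toy illustration only: hypothetical stand-ins for the full and filtered tables.
full <- data.frame(
  trial_id    = c("t1", "t2", "t3", "t4", "t5"),
  category_id = c(3, 1, 2, 1, 3)
)

# Assumption: the counterfactual constraint removes trials t2 and t4.
filtered <- full[full$trial_id %in% c("t1", "t3", "t5"), ]
n <- nrow(filtered)

# Positional slice of the unfiltered table (the pattern in question).
by_position <- full$category_id[1:n]

# Category IDs taken from the filtered rows themselves.
by_row <- filtered$category_id

# Flag the positions where the two labelings disagree.
mismatch <- by_position != by_row
print(data.frame(by_position, by_row, mismatch))
```

If the same check on the real objects (for example, comparing `ground_truth` against `counterfact_delay$llx`) shows any mismatches, that would support taking the IDs from the filtered table instead of slicing `df` by position.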