\documentclass[../Main.tex]{subfiles}

\begin{document}

In this section 
I describe the model fitting, present the posteriors of the parameters of interest,
and interpret the results.


\subsection{Model Fitting}
I fit the econometric model using Stan 
\cite{standevelopmentteam_StanModelling_2022}
through the RStan 
\cite{standevelopmentteam_RStanInterface_2023}
interface.

The data consist of X trials with X snapshots in total. \todo{Fill out.} 

%describe  
The sampler ran
X\todo{UPDATE VALUES} 
warmup iterations and
X\todo{UPDATE VALUES} 
sampling iterations in six chains.

% \subsection{Data Exploration} 
% \todo{fill this out later.}
%look at trial 


\subsection{Interpretation}
% Explain 
% - What do we care about? Changes in the probability of 
% - distribution of differences -> relate to E(\delta Y)
% - How do we obtain this distribution of differences?
%   - from the model, we pay attention to P under treatment and control
%   - We obtain this by fitting the model, then simulating under treatment and control, and taking the difference in the probability.
%   - 

The specific measure of interest is how much a delay in 
closing enrollment changes the probability of terminating a trial,
$p_{i,n}$ in the model.

In standard reduced-form causal inference, the treatment effect
of interest for outcome $Z$ is measured as 
\begin{align}
    E(Z(\text{Treatment}) - Z(\text{Control})) 
    = E(Z(\text{Treatment})) - E(Z(\text{Control})).
\end{align}
Because $Z(\text{Treatment})$ and $Z(\text{Control})$ are random variables,
their difference, $\delta_Z = Z(\text{Treatment}) - Z(\text{Control})$, is also a random variable. 
In the Bayesian framework, this parameter has a distribution, and so 
we can calculate the distribution of differences in 
the probability of termination due to a given delay in 
closing recruitment,
$\delta_{p_{i,n}} = p_{i,n}(T) - p_{i,n}(C)$.

I calculate the posterior distribution of $\delta_{p_{i,n}}$ by estimating the 
posterior distributions of the $\beta$s and then simulating $\delta_{p_{i,n}}$.
This involves taking a draw from the posterior distribution of the $\beta$s, calculating
$p_{i,n}(C)$ 
for the underlying trials at the snapshot when they close enrollment,
and then calculating 
$p_{i,n}(T)$ 
under the counterfactual where enrollment had not yet closed.
The difference 
$\delta_{p_{i,n}}$ 
is then calculated and saved for each trial. 
Repeating this for all the posterior samples yields an estimate 
of the posterior distribution of differences.
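
As a sketch of this procedure in symbols (the notation is introduced here 
for exposition and is an assumption, not part of the model definition): 
let $\beta^{(s)}$, $s = 1, \dots, S$, denote the posterior draws, and write 
$p_{i,n}(C \mid \beta^{(s)})$ and $p_{i,n}(T \mid \beta^{(s)})$ 
for the termination probabilities computed from draw $s$ 
under control and treatment. 
Then 
\begin{align}
    \delta_{p_{i,n}}^{(s)} 
    &= p_{i,n}(T \mid \beta^{(s)}) - p_{i,n}(C \mid \beta^{(s)}), \\
    \widehat{E}(\delta_{p_{i,n}}) 
    &= \frac{1}{S} \sum_{s=1}^{S} \delta_{p_{i,n}}^{(s)}.
\end{align}
The collection of $\delta_{p_{i,n}}^{(s)}$ over all draws and trials 
approximates the posterior distribution of differences, 
and its mean is the Monte Carlo analogue of the expected treatment effect 
in the reduced-form expression above.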


\begin{figure}[H]
    \includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-delay}
	{\small
	    Values near 1 indicate a near-maximal increase in the probability 
	    of termination. 
	    Values near 0 indicate little change in probability,
	    while values near $-1$ represent a correspondingly large decrease in the probability
	    of termination. 
	    The scale is in probability units, so a value near 1 is a change 
	    from unlikely to terminate under control to highly likely to 
	    terminate under treatment.
	}
	\caption{Distribution of Predicted Differences}
	\label{fig:pred_dist_diff_delay}
\end{figure}

Figure 
\ref{fig:pred_dist_diff_delay} 
shows roughly four regimes. 
The first consists of trials that experience nearly no effect,
i.e., have values near zero.
Trials in the second regime experience a mild to large reduction in 
the probability of termination, with X percent of the probability mass 
between reductions of about 5 and 50 percentage points.
The third regime consists of trials that experience a mild to large 
increase in the probability of termination, 
from about 5 percentage points to about 75 percentage points. 
The fourth and final regime is the X\% of trials that experience a very large
(greater than 75 percentage point) increase in the probability of 
termination.
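
One way the regime shares could be computed, sketched under the assumption 
that the simulated differences are stored as $\delta_{p_{i,n}}^{(s)}$ 
for posterior draws $s = 1, \dots, S$ and trials $i = 1, \dots, N$ 
(hypothetical notation, introduced here for illustration), 
is as the fraction of simulated differences falling in an interval $[a, b)$: 
\begin{align}
    \widehat{P}(a \le \delta_{p_{i,n}} < b) 
    = \frac{1}{S N} \sum_{s=1}^{S} \sum_{i=1}^{N} 
    \mathbf{1}\{ a \le \delta_{p_{i,n}}^{(s)} < b \}.
\end{align}
For example, the third regime would correspond to $a = 0.05$ and $b = 0.75$ 
on the probability scale.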
%Notes on interpretation
% - increase vs decrease on graph 
% - 
% - 
% - 
% - 

Figure \ref{fig:pred_dist_dif_delay2} shows how this overall
result decomposes across disease categories.
\begin{figure}[H]
    \includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-delay-group}
	\caption{Distribution of Predicted Differences by Disease Group}
	\label{fig:pred_dist_dif_delay2}
\end{figure}

Overall, there appear to be some trials that are highly 
susceptible to enrollment difficulties, and this appears to hold for all the 
disease categories.
This may be an artifact of the small sample:
the model is hierarchical, which partially pools results, 
and the sample size per disease is rather small.
An additional explanation is that the variance in parameters 
might be high enough for the change to 


Although it is not causally identified due to population interactions,
we can examine the direct effect of adding a single generic competitor drug
and see how a similar overall result decomposes very differently.
Figure 
\ref{fig:pred_dist_diff_generic}
shows a very similar result with roughly the same regimes,
while Figure
\ref{fig:pred_dist_dif_generic2}
shows that the breakdown is different.
\todo{
    Consider moving these to an appendix as they are 
    just additions at this point.
}

\begin{figure}[H]
    \includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-generic}
	\caption{
	    Distribution of Predicted Differences for one additional generic 
	    competitor
	}
	\label{fig:pred_dist_diff_generic}
\end{figure}

\begin{figure}[H]
    \includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-generic-group}
	\caption{
	    Distribution of Predicted Differences for one additional generic 
	    competitor, by Disease Group
	}
	\label{fig:pred_dist_dif_generic2}
\end{figure}


\end{document}