\documentclass[../Main.tex]{subfiles}
\begin{document}

In this section I describe the model fitting and the posteriors of the parameters of interest, and interpret the results.

\subsection{Model Fitting}

I fit the econometric model using Stan \cite{standevelopmentteam_StanModelling_2022} through the rstan \cite{standevelopmentteam_RStanInterface_2023} interface.
I had X trials with X snapshots in total.
\todo{Fill out.}
%describe X\todo{UPDATE VALUES} warmup iterations and X\todo{UPDATE VALUES} sampling iterations in six chains.

% \subsection{Data Exploration}
% \todo{fill this out later.}
%look at trial

\subsection{Interpretation}

% Explain
% - What do we care about? Changes in the probability of
% - distribution of differences -> relate to E(\delta Y)
% - How do we obtain this distribution of differences?
% - from the model, we pay attention to P under treatment and control
% - We obtain this by fitting the model, then simulating under treatment and control, and taking the difference in the probability.
% -

The specific measure of interest is how much a delay in closing enrollment changes the probability of terminating a trial, $p_{i,n}$ in the model.

In standard reduced-form causal inference, the treatment effect of interest for outcome $Z$ is measured as
\begin{align}
    E(Z(\text{Treatment}) - Z(\text{Control})) = E(Z(\text{Treatment})) - E(Z(\text{Control}))
\end{align}
Because $Z(\text{Treatment})$ and $Z(\text{Control})$ are random variables, their difference, $\delta_Z = Z(\text{Treatment}) - Z(\text{Control})$, is also a random variable.
In the Bayesian framework, this parameter has a distribution, so we can calculate the distribution of differences in the probability of termination due to a given delay in closing recruitment, $\delta_{p_{i,n}} = p_{i,n}(T) - p_{i,n}(C)$.
I calculate the posterior distribution of $\delta_{p_{i,n}}$ by estimating the posterior distributions of the $\beta$s and then simulating $\delta_{p_{i,n}}$.
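In symbols, indexing the posterior draws by $s = 1, \dots, S$ (an index introduced here purely for illustration; it is not part of the model's notation), each simulated difference can be sketched as
\begin{align}
    \delta_{p_{i,n}}^{(s)} = p_{i,n}\left(T; \beta^{(s)}\right) - p_{i,n}\left(C; \beta^{(s)}\right),
\end{align}
where $p_{i,n}(\cdot\,; \beta^{(s)})$ is assumed to denote the model's termination probability evaluated at the draw $\beta^{(s)}$.
The collection $\{\delta_{p_{i,n}}^{(s)}\}_{s=1}^{S}$ then approximates the posterior distribution of $\delta_{p_{i,n}}$.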
The simulation takes a draw from the posterior distribution of the $\beta$s, calculates $p_{i,n}(C)$ for the underlying trials at the snapshot when they close enrollment, and then calculates $p_{i,n}(T)$ under the counterfactual where enrollment had not yet closed.
The difference $\delta_{p_{i,n}}$ is then calculated for each trial and saved.
After repeating this for all the posterior samples, we have an estimate of the posterior distribution of differences.

\begin{figure}[H]
    \includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-delay}
    \small{
        Values near 1 indicate a near-certain increase in the probability of termination.
        Values near 0 indicate little change in probability, while values near -1 represent a decrease in the probability of termination.
        The scale is in probability points, so a value near 1 is a change from unlikely to terminate under control to highly likely to terminate under treatment.
    }
    \caption{Distribution of Predicted Differences}
    \label{fig:pred_dist_diff_delay}
\end{figure}

Figure \ref{fig:pred_dist_diff_delay} shows that there are roughly four regimes.
The first consists of trials that experience nearly no effect, i.e. have values near zero.
Trials in the second regime experience a mild to large reduction in the probability of termination, with X percent of the probability mass between reductions of about 5 and 50 percentage points.
The third regime consists of trials that experience a mild to large increase in the probability of termination, from an increase of 5 percentage points to about 75 percentage points.
The fourth and final regime is the X\% of trials that experience a very large (greater than 75 percentage point) increase in the probability of termination.

%Notes on interpretation
% - increase vs decrease on graph
% -
% -
% -
% -

Figure \ref{fig:pred_dist_dif_delay2} shows how this overall result comes from different disease categories.
\begin{figure}[H]
    \includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-delay-group}
    \caption{Distribution of Predicted Differences by Disease Group}
    \label{fig:pred_dist_dif_delay2}
\end{figure}

Overall, there appear to be some trials that are highly susceptible to enrollment difficulties, and this appears to hold for all the disease categories.
This may be due to small samples: the model is hierarchical, which partially pools results, and the sample size per disease is rather small.
An additional explanation is that the variance in parameters might be high enough for the change to

Although it is not causally identified due to population interactions, we can examine the direct effect of adding a single generic competitor drug and how a similar overall result decomposes very differently.
Figure \ref{fig:pred_dist_diff_generic} shows a very similar result with roughly the same regimes, while Figure \ref{fig:pred_dist_dif_generic2} shows that the breakdown by disease group is different.
\todo{
    Consider moving these to an appendix as they are just additions at this point.
}

\begin{figure}[H]
    \includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-generic}
    \caption{
        Distribution of Predicted Differences for one additional generic competitor
    }
    \label{fig:pred_dist_diff_generic}
\end{figure}

\begin{figure}[H]
    \includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-generic-group}
    \caption{
        Distribution of Predicted Differences for one additional generic competitor by Disease Group
    }
    \label{fig:pred_dist_dif_generic2}
\end{figure}

\end{document}