diff --git a/Paper/sections/04_EconometricModel.tex b/Paper/sections/04_EconometricModel.tex
index 5c509fd..dc1ee16 100644
--- a/Paper/sections/04_EconometricModel.tex
+++ b/Paper/sections/04_EconometricModel.tex
@@ -19,7 +19,7 @@ First, some notation:
  \item $y_i$: whether each trial terminated (true, 1) or completed (false, 0).
  \item $d_i$: indexes the ICD-10 disease category of the trial.
- \item $x_{i,n}$: represents the other dependent
+ \item $x_{i,n}$: represents the independent
  variables associated with the snapshot.
 \end{itemize}
diff --git a/Paper/sections/06_Results.tex b/Paper/sections/06_Results.tex
index 999e056..c3a437c 100644
--- a/Paper/sections/06_Results.tex
+++ b/Paper/sections/06_Results.tex
@@ -1,113 +1,153 @@
 \documentclass[../Main.tex]{subfiles}
 \begin{document}
-%\subsection{Data Exploration} %TODO: fill this out later.
-%look at trial
+
+In this section
+I describe the model fitting, the posteriors of the parameters of interest,
+and interpret the results.
+
+
 \subsection{Model Fitting}
-In this section we examine the results from fitting the econometric model using
-mc-stan (\cite{mc-stan}) through the rstan (\cite{rstan}) interface.
-
-%describe
-The model was based on the hierarchal logistic regression model
-presented in the Stan Users Guide (\cite{mc-stan}),
-and was run with 2,500 warmup iterations and
-2,500 sampling iterations in six chains.
-There were various issues, including 160 divergent transitions and the R-hat
-measure was 1.49.
-Overall these suggest that the econometric model is incorrect as
-written or requires reparameterization.
-%TODO: and info about how I learned about these diagnostics
-
-
-% \subsubsection{Diagnostics}
-% %Examine trank plots
-% To identify which parameters were problematic, I first looked at trace rank
-% histograms.
-% Under idea circumstances, each line (representing a chain) should exchange
-% places with the other lines frequently.
-% In both \cref{fig:mu_trank} and \cref{fig:sigma_trank}, most parameters seem
-% to mix well but there are a couple of exceptions.
-% This warrants further investigation.
-%
-% \begin{figure}[H]
-% \includegraphics[width=\textwidth]{../assets/img/mu_trank.png}
-% \caption{Trace Rank Histogram: Mu values}
-% \label{fig:mu_trank}
-% \end{figure}
-%
-% \begin{figure}[H]
-% \includegraphics[width=\textwidth]{../assets/img/sigma_trank.png}
-% \caption{Trace Rank Histogram: Sigma values}
-% \label{fig:sigma_trank}
-% \end{figure}
-%
-% %Take a look at batman and points for mu
-% In the case of the Mu values, a parallel coordinates plot
-% doesn't seem to indicate any parameters as likely candidates
-% for causing the issues with divergent transitions.
-% \begin{figure}[H]
-% \includegraphics[width=\textwidth]{../assets/img/mu_batman.png}
-% \caption{Parallel Coordinate Plot: Mu values}
-% \label{fig:mu_batman}
-% \end{figure}
-% Note that at each parameter, there is some level of dispersion between
-% values that diverged.
-%
-% On the other hand, in the parallel coordinates plot for sigma values,
-% it appears that most divergent transitions occur with values of
-% sigma[1], sigma[3], sigma[6], and sigma[7] close to zero.
-% \begin{figure}[H]
-% \includegraphics[width=\textwidth]{../assets/img/sigma_batman.png}
-% \caption{Parallel Coordinate Plot: Sigma values}
-% \label{fig:sigma_batman}
-% \end{figure}
-% Overall this suggests that there is an issue with the specification
-% of the covariance structures of the hyperparameters.
-%
-% Additional evidence that the covariance structure is incorrect comes from
-% plotting pairs of parameter values and examining the chains with divergent
-% transitions.
-%
-% \begin{figure}[H]
-% \includegraphics[width=\textwidth]{../assets/img/sigma_pairs_5-9.png}
-% \caption{Parameter Pairs plots: Sigma[5] through Sigma[9]}
-% \label{fig:sigma_pairs_5-9.png}
-% \end{figure}
-% From this we can see that divergent pairs are highly correlated with the cases
-% where sigma[6] or sigma[7] are equal to zero.
-% This has an impact on the shape of both of those estimated parameters, causing
-% both to be bimodal.
+I fit the econometric model using mc-stan
+\cite{standevelopmentteam_StanModelling_2022}
+through the rstan
+\cite{standevelopmentteam_RStanInterface_2023}
+interface.
+
+The data contain X trials with X snapshots in total. \todo{Fill out.}
+
+%describe
+Each chain was run with X\todo{UPDATE VALUES}
+warmup iterations and
+X\todo{UPDATE VALUES}
+sampling iterations, using six chains.
+
+% \subsection{Data Exploration}
+% \todo{fill this out later.}
+%look at trial
 \subsection{Interpretation}
+% Explain
+% - What do we care about? Changes in the probability of
+% - distribution of differences -> relate to E(\delta Y)
+% - How do we obtain this distribution of differences?
+% - from the model, we pay attention to P under treatment and control
+% - We obtain this by fitting the model, then simulating under treatment and control, and taking the difference in the probability.
+% -
+
+The specific measure of interest is how much a delay in
+closing enrollment changes the probability of terminating a trial
+($p_{i,n}$ in the model).
+
+In standard reduced-form causal inference, the treatment effect
+of interest for outcome $Z$ is measured as
+\begin{align}
+ E(Z(\text{Treatment}) - Z(\text{Control}))
+ = E(Z(\text{Treatment})) - E(Z(\text{Control}))
+\end{align}
+Because $Z(\text{Treatment})$ and $Z(\text{Control})$ are random variables,
+so is their difference, $\delta_Z = Z(\text{Treatment}) - Z(\text{Control})$.
+In the Bayesian framework this quantity has a posterior distribution, so
+we can calculate the distribution of differences in
+the probability of termination due to a given delay in
+closing recruitment (with $T$ for treatment and $C$ for control),
+$p_{i,n}(T) - p_{i,n}(C) = \delta_{p_{i,n}}$.
+
+I calculate the posterior distribution of $\delta_{p_{i,n}}$ by estimating the
+posterior distributions of the $\beta$s and then simulating $\delta_{p_{i,n}}$.
+For each posterior draw of the $\beta$s, I calculate
+$p_{i,n}(C)$
+for the underlying trials at the snapshot when they close enrollment
+and then
+$p_{i,n}(T)$
+under the counterfactual where enrollment had not yet closed.
+The difference
+$\delta_{p_{i,n}}$
+is then computed for each trial and saved.
+Repeating this for every posterior draw yields an estimate
+of the posterior distribution of differences.
-The key results so far are related to the distribution of differences in $p$.
-In figure \ref{fig:pred_dist_dif_delay} we see that there while most trials do not see any increased risk
-from a delay in closing enrollment, there is a small group that does experience this.
 \begin{figure}[H]
  \includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-delay}
- \caption{}
+ {\small
+ Values near 1 indicate a near-complete increase in the probability
+ of termination.
+ Values near 0 indicate little change in probability,
+ while values near $-1$ represent a near-complete decrease in the probability
+ of termination.
+ The scale is in probability points; a value near 1 is thus a change
+ from unlikely to terminate under control to highly likely to
+ terminate under treatment.
+ }
+ \caption{Distribution of Predicted Differences}
  \label{fig:pred_dist_diff_delay}
 \end{figure}
-Figure \ref{fig:pred_dist_dif_delay2} shows how this varies across disease categories
+Figure
+\ref{fig:pred_dist_diff_delay}
+shows that there are roughly four regimes.
+The first consists of trials that experience nearly no effect,
+i.e., trials with values near zero.
+Trials in the second regime experience a mild to large reduction in
+the probability of termination, with X\% of the probability mass
+between reductions of about 5 and 50 percentage points.
+The third regime consists of trials that experience a mild to large
+increase in the probability of termination,
+from an increase of about 5 percentage points to about 75 percentage points.
+The fourth and final regime consists of the X\% of trials that experience a very large
+(more than 75 percentage points) increase in the probability of
+termination.
+%Notes on interpretation
+% - increase vs decrease on graph
+% -
+% -
+% -
+% -
+
+Figure \ref{fig:pred_dist_dif_delay2} shows how this overall
+result breaks down across disease categories.
 \begin{figure}[H]
  \includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-delay-group}
- \caption{}
+ \caption{Distribution of Predicted Differences by Disease Group}
  \label{fig:pred_dist_dif_delay2}
 \end{figure}
-We can also examine the direct effect from adding a single generic competitior drug.
+Overall, some trials appear to be highly
+susceptible to enrollment difficulties, and this appears to hold across all
+disease categories.
+This may be an artifact of small samples:
+the model is hierarchical, so it partially pools estimates across categories,
+and the small sample size per disease category pulls each category toward the shared distribution.
+Another possible explanation is that the variance in the parameters
+might be high enough for the change to
+
+
+Although it is not causally identified due to population interactions,
+we can examine the direct effect of adding a single generic competitor drug
+and see how a similar aggregate result decomposes very differently.
+Figure
+\ref{fig:pred_dist_diff_generic}
+shows a very similar result with roughly the same regimes,
+while
+\ref{fig:pred_dist_dif_generic2}
+shows that its breakdown by disease category is different.
+\todo{
+ Consider moving these to an appendix as they are
+ just additions at this point.
+}
 \begin{figure}[H]
  \includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-generic}
- \caption{}
+ \caption{
+ Distribution of Predicted Differences for One Additional Generic
+ Competitor
+ }
  \label{fig:pred_dist_diff_generic}
 \end{figure}
-Figure \ref{fig:pred_dist_dif_generic2} shows how this varies across disease categories
 \begin{figure}[H]
  \includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-generic-group}
  \caption{}