diff --git a/Paper/sections/04_EconometricModel.tex b/Paper/sections/04_EconometricModel.tex
index dc1ee16..b792f3f 100644
--- a/Paper/sections/04_EconometricModel.tex
+++ b/Paper/sections/04_EconometricModel.tex
@@ -89,4 +89,47 @@ These include:
 I may have only done it in the CBO analysis.}
 }
 \end{itemize}
+
+\subsection{Interpretation}
+% Explain
+% - What do we care about? Changes in the probability of
+% - distribution of differences -> relate to E(\delta Y)
+% - How do we obtain this distribution of differences?
+% - from the model, we pay attention to P under treatment and control
+% - We obtain this by fitting the model, then simulating under treatment and control, and taking the difference in the probability.
+% -
+
+The specific measure of interest is how much a delay in
+closing enrollment changes the probability of terminating a trial,
+$p_{i,n}$, in the model.
+
+In standard reduced-form causal inference, the treatment effect
+of interest for outcome $Z$ is measured as
+\begin{align}
+  E(Z(\text{Treatment}) - Z(\text{Control}))
+    = E(Z(\text{Treatment})) - E(Z(\text{Control})).
+\end{align}
+Because $Z(\text{Treatment})$ and $Z(\text{Control})$ are random variables,
+their difference, $\delta_Z = Z(\text{Treatment}) - Z(\text{Control})$, is also a random variable.
+In the Bayesian framework, this difference has a posterior distribution, so
+we can calculate the distribution of differences in
+the probability of termination due to a given delay in
+closing recruitment,
+$\delta_{p_{i,n}} = p_{i,n}(T) - p_{i,n}(C)$, where $T$ denotes treatment and $C$ control.
+
+I calculate the posterior distribution of $\delta_{p_{i,n}}$ by estimating the
+posterior distributions of the $\beta$s and then simulating $\delta_{p_{i,n}}$.
+This involves taking a draw from the posterior distribution of the $\beta$s, calculating
+$p_{i,n}(C)$
+for the underlying trials at the snapshot when they close enrollment,
+and then calculating
+$p_{i,n}(T)$
+under the counterfactual where enrollment had not yet closed.
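+Written out with explicit posterior draws, this procedure is a standard Monte Carlo
+approximation. As a sketch (the notation $\beta^{(s)}$ and $S$ is introduced here
+for exposition only), let $\beta^{(s)}$, $s = 1, \dots, S$, denote the posterior draws;
+for trial $i$ at snapshot $n$, draw $s$ then yields
+\begin{align}
+  \delta^{(s)}_{p_{i,n}}
+    = p_{i,n}\left(T \mid \beta^{(s)}\right) - p_{i,n}\left(C \mid \beta^{(s)}\right),
+\end{align}
+and summaries such as the posterior mean are approximated by
+$\frac{1}{S} \sum_{s=1}^{S} \delta^{(s)}_{p_{i,n}}$.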
+The difference
+$\delta_{p_{i,n}}$
+is then calculated for each trial and saved.
+After repeating this for all the posterior samples, we have an estimate
+of the posterior distribution of differences between treatment and control.
+
+
 \end{document}
diff --git a/Paper/sections/06_Results.tex b/Paper/sections/06_Results.tex
index c3a437c..6eb2f02 100644
--- a/Paper/sections/06_Results.tex
+++ b/Paper/sections/06_Results.tex
@@ -7,7 +7,7 @@ I describe the model fitting, the posteriors of the parameters of interest,
 and intepret the results.

-\subsection{Model Fitting}
+\subsection{Estimation Procedure}

 I fit the econometric model using mc-stan \cite{standevelopmentteam_StanModelling_2022}
 through the rstan
@@ -27,63 +27,29 @@ sampling iterations in six chains.

 %look at trial

-\subsection{Interpretation}
-% Explain
-% - What do we care about? Changes in the probability of
-% - distribution of differences -> relate to E(\delta Y)
-% - How do we obtain this distribution of differences?
-% - from the model, we pay attention to P under treatment and control
-% - We obtain this by fitting the model, then simulating under treatment and control, and taking the difference in the probability.
-% -
-
-The specific measure of interest is how much a delay in
-closing enrollment changes the probability of terminating a trial
-$p_{i,n}$ in the model.
-
-In the standard reduced form causal inference, the treatment effect
-of interest for outcome $Z$ is measured as
-\begin{align}
-  E(Z(\text{Treatment}) - Z(\text{Control}))
-    = E(Z(\text{Treatment})) - E(Z(\text{Control}))
-\end{align}
-Because $Z(\text{Treatment})$ and $Z(\text{Control})$ are random variables,
-$Z(\text{Treatment}) - Z(\text{Control}) = \delta_Z$, is also a random variable.
-In the bayesian framework, this parameter has a distribution, and so
-we can calculate the distribution of differences in
-the probability of termination due to a given delay in
-closing recrutiment,
-$p_{i,n}(T) - p_{i,n}(C) = \delta_{p_{i,n}}$.
-
-I calculate the posterior distribution of $\delta_{p_{i,n}}$ by estimating the
-posterior distributions of the $\beta$s and then simulating $\delta_{p_{i,n}}$.
-This involves taking a draw from the $\beta$s distribution, calculating
-$p_{i,n}(C)$
-for the underlying trials at the snapshot when they close enrollment
-and then calculating
-$p_{i,n}(T)$
-under the counterfactual where enrollment had not yet closed.
-The difference
-$\delta_{p_{i,n}}$
-is then calculated for each trial, and saved.
-After repeating this for all the posterior samples, we have an esitmate
-for the posterior distribution of differences.
+\subsection{Primary Results}
+The primary, causally identified quantity we can estimate is the change in
+the probability of termination caused by (counterfactually) keeping enrollment
+open instead of closing it when observed.
+Figure \ref{fig:pred_dist_diff_delay} below shows the estimated impact of
+keeping enrollment open.

 \begin{figure}[H]
   \includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-delay}
-  \small{
-    Values near 1 indicate a near perfect increase in the probability
-    of termination.
-    Values near 0 indicate little change in probability,
-    while values near -1, represent a decrease in the probability
-    of termination.
-    The scale is in probability points, thus a value near 1 is a change
-    from unlikely to terminate under control, to highly likely to
-    terminate.
-  }
-  \caption{Distribution of Predicted Differences}
-  \label{fig:pred_dist_diff_delay}
+  \small{
+    Values near 1 indicate a near-perfect increase in the probability
+    of termination.
+    Values near 0 indicate little change in probability,
+    while values near $-1$ represent a decrease in the probability
+    of termination.
+    The scale is in probability points, so a value near 1 is a change
+    from unlikely to terminate under control to highly likely to
+    terminate under treatment.
+  }
+  \caption{Distribution of Predicted Differences}
+  \label{fig:pred_dist_diff_delay}
 \end{figure}

 We can see from figure
@@ -107,53 +73,75 @@ termination.
 % -
 % -

+% The probability mass associated with each 10 percentage point change is shown in table \ref{tab:regimes}
+% \begin{table}[H]
+%   \caption{Regimes and associated probability masses}\label{tab:regimes}
+%   \begin{center}
+%     \begin{tabular}[c]{l|l}
+%       \hline
+%       \multicolumn{1}{c|}{\textbf{Interval}} &
+%       \multicolumn{1}{c}{\textbf{Probability Mass}} \\
+%       \hline
+%       $[,]$ & b \\
+%       $[,]$ & b \\
+%       $[,]$ & b \\
+%       $[,]$ & b \\
+%       $[,]$ & b \\
+%       \hline
+%     \end{tabular}
+%   \end{center}
+% \end{table}
+
 Figure \ref{fig:pred_dist_dif_delay2} shows how this overall result comes from different disease categories.

 \begin{figure}[H]
   \includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-delay-group}
-  \caption{Distribution of Predicted differences by Disease Group}
-  \label{fig:pred_dist_dif_delay2}
-\end{figure}
-
-Overall, we can see that there appear to be some trials that are highly
-suceptable to enrollment difficulties, and this appears to hold for all the
-disease categories
-This may be due to low sample
-since these are using a hierarchal model -- which partially pools results --
-and the sample size per disease is rather small.
-An additional explanation is that the variance in parameters
-might be high enough for the change to
-
-
-Although it is not causally identified due to population interactions,
-we can examine the direct effect from adding a single generic competitior drug
-and how the similar result decomposes very differently.
-Figure
-\label{fig:pred_dist_diff_generic}
-shows a very similar result with roughly the same regimes,
-while
-\label{fig:pred_dist_dif_generic2}
-shows that this breakdown is different.
-\todo{
-  Consider moving these to an appendix as they are
-  just additions at this point.
-}
-
-\begin{figure}[H]
-  \includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-generic}
-  \caption{
-    Distribution of Predicted Differences for one additional generic
-    competitor
-  }
-  \label{fig:pred_dist_diff_generic}
-\end{figure}
-
-\begin{figure}[H]
-  \includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-generic-group}
-  \caption{}
-  \label{fig:pred_dist_dif_generic2}
+  \caption{Distribution of Predicted Differences by Disease Group}
+  \label{fig:pred_dist_dif_delay2}
 \end{figure}

+Overall, there appear to be some trials or situations
+that are highly susceptible to enrollment difficulties, and this
+appears to hold for all disease categories for which I have data.
+This relative homogeneity of results may be due to the
+partial pooling effect from the hierarchical model
+and the fact that the sample size per disease is rather small.
+An additional explanation is that the variance of the parameter distributions
+might be high enough that each trial has a few situations in which it has
+a high probability of terminating.
+
+
+
+% Although it is not causally identified due to population interactions,
+% we can examine the direct effect from adding a single generic competitor drug
+% and how the similar result decomposes very differently.
+% This is shown just as a contrast to the enrollment results.
+% Figure
+% \ref{fig:pred_dist_diff_generic}
+% shows a very similar result with roughly the same regimes,
+% while
+% \ref{fig:pred_dist_dif_generic2}
+% shows that this breakdown is different.
+% \todo{
+%   Consider moving these to an appendix as they are
+%   just additions at this point.
+% }
+%
+% \begin{figure}[H]
+%   \includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-generic}
+%   \caption{
+%     Distribution of Predicted Differences for one additional generic
+%     competitor
+%   }
+%   \label{fig:pred_dist_diff_generic}
+% \end{figure}
+%
+% \begin{figure}[H]
+%   \includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-generic-group}
+%   \caption{}
+%   \label{fig:pred_dist_dif_generic2}
+% \end{figure}
+%
 \end{document}
diff --git a/Paper/sections/10_CausalStory.tex b/Paper/sections/10_CausalStory.tex
index 331f4f1..5ce5883 100644
--- a/Paper/sections/10_CausalStory.tex
+++ b/Paper/sections/10_CausalStory.tex
@@ -69,7 +69,7 @@ in the first place
 while currently observed safety and efficiency results help the sponsor
 judge whether or not to continue the trial.
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsection{Clinical Trials Data Sources}
+\subsection{Data Summary}
 %% Describe data here
 Since Sep 27th, 2007 those who conduct clinical trials of FDA controlled
 drugs or devices on human subjects must register
diff --git a/Paper/sections/12_clinical_trial_background.tex b/Paper/sections/12_clinical_trial_background.tex
index fe16b8f..7016d35 100644
--- a/Paper/sections/12_clinical_trial_background.tex
+++ b/Paper/sections/12_clinical_trial_background.tex
@@ -58,10 +58,29 @@ purpose of the clinical trials process.
 On the other hand, when a trial terminates early due to reasons other than
 safety or efficacy concerns, the trial operator does not learn if the drug
 is effective or safe.
-This is a true failure in that we did not learn if the drug was effective or not.
-Unfortunately, although termination documentation typically includes a
-description of a reason for the clinical trial termination, this doesn't necessarily
-list all the reasons contributing to the trial termination and may not exist for a given trial.
+This is a knowledge-gathering failure in which the trial operator
+does not learn whether the drug is effective.
+I prefer to group the reasons for terminating a clinical trial into
+\begin{itemize}
+  \item Safety or efficacy concerns
+  \item Strategic concerns
+  \item Operational concerns
+\end{itemize}
+
+Unfortunately, it can be difficult to know why a given trial was terminated,
+even though, upon termination, trials typically record a
+description of \textit{a single} reason for the clinical trial termination.
+This description does not necessarily list all the reasons contributing to the termination, and it may be missing for a given trial.
+For example, if a Principal Investigator leaves for another institution
+(terminating the trial), was that decision affected by
+a safety or efficacy concern,
+a new competitor on the market,
+difficulty recruiting participants,
+or a lack of financial support from the study sponsor?
+These low-information, post-hoc signals are insufficient for estimating
+the impact of the different problems that trials face.
+For this reason, I use the progression of clinical trials to estimate effects.
+\todo{not sure if this is the best place for this.}

 As a trial goes through the different stages of recruitment,
 the investigators update the records on ClinicalTrials.gov.