Midday updates from writing

claude_rewrite
will king 1 year ago
parent 1630af2928
commit 64f3d14f7b

@ -89,4 +89,47 @@ These include:
I may have only done it in the CBO analysis.}
}
\end{itemize}
\subsection{Interpretation}
% Explain
% - What do we care about? Changes in the probability of
% - distribution of differences -> relate to E(\delta Y)
% - How do we obtain this distribution of differences?
% - from the model, we pay attention to P under treatment and control
% - We obtain this by fitting the model, then simulating under treatment and control, and taking the difference in the probability.
% -
The specific measure of interest is how much a delay in
closing enrollment changes the probability of terminating a trial
$p_{i,n}$ in the model.
In standard reduced-form causal inference, the treatment effect
of interest for an outcome $Z$ is measured as
\begin{align}
E(Z(\text{Treatment}) - Z(\text{Control}))
= E(Z(\text{Treatment})) - E(Z(\text{Control}))
\end{align}
Because $Z(\text{Treatment})$ and $Z(\text{Control})$ are random variables,
their difference $\delta_Z = Z(\text{Treatment}) - Z(\text{Control})$ is also a random variable.
In the Bayesian framework, this parameter has a distribution, and so
we can calculate the distribution of differences in
the probability of termination due to a given delay in
closing recruitment,
$p_{i,n}(T) - p_{i,n}(C) = \delta_{p_{i,n}}$.
I calculate the posterior distribution of $\delta_{p_{i,n}}$ by estimating the
posterior distributions of the $\beta$s and then simulating $\delta_{p_{i,n}}$.
This involves taking a draw from the $\beta$s distribution, calculating
$p_{i,n}(C)$
for the underlying trials at the snapshot when they close enrollment
and then calculating
$p_{i,n}(T)$
under the counterfactual where enrollment had not yet closed.
The difference
$\delta_{p_{i,n}}$
is then calculated for each trial, and saved.
After repeating this for all the posterior samples, we have an estimate
of the posterior distribution of differences between treatment and control.
\end{document}

@ -7,7 +7,7 @@ I describe the model fitting, the posteriors of the parameters of interest,
and interpret the results.
\subsection{Model Fitting}
\subsection{Estimation Procedure}
I fit the econometric model using Stan
\cite{standevelopmentteam_StanModelling_2022}
through the rstan
@ -27,47 +27,13 @@ sampling iterations in six chains.
%look at trial
\subsection{Interpretation}
% Explain
% - What do we care about? Changes in the probability of
% - distribution of differences -> relate to E(\delta Y)
% - How do we obtain this distribution of differences?
% - from the model, we pay attention to P under treatment and control
% - We obtain this by fitting the model, then simulating under treatment and control, and taking the difference in the probability.
% -
The specific measure of interest is how much a delay in
closing enrollment changes the probability of terminating a trial
$p_{i,n}$ in the model.
In standard reduced-form causal inference, the treatment effect
of interest for an outcome $Z$ is measured as
\begin{align}
E(Z(\text{Treatment}) - Z(\text{Control}))
= E(Z(\text{Treatment})) - E(Z(\text{Control}))
\end{align}
Because $Z(\text{Treatment})$ and $Z(\text{Control})$ are random variables,
their difference $\delta_Z = Z(\text{Treatment}) - Z(\text{Control})$ is also a random variable.
In the Bayesian framework, this parameter has a distribution, and so
we can calculate the distribution of differences in
the probability of termination due to a given delay in
closing recruitment,
$p_{i,n}(T) - p_{i,n}(C) = \delta_{p_{i,n}}$.
I calculate the posterior distribution of $\delta_{p_{i,n}}$ by estimating the
posterior distributions of the $\beta$s and then simulating $\delta_{p_{i,n}}$.
This involves taking a draw from the $\beta$s distribution, calculating
$p_{i,n}(C)$
for the underlying trials at the snapshot when they close enrollment
and then calculating
$p_{i,n}(T)$
under the counterfactual where enrollment had not yet closed.
The difference
$\delta_{p_{i,n}}$
is then calculated for each trial, and saved.
After repeating this for all the posterior samples, we have an estimate
of the posterior distribution of differences.
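The simulation loop described above can be sketched in Python. This is a minimal illustration, not the paper's actual code: it assumes a logistic form for $p_{i,n}$, treats one column of the design matrix as the enrollment-closed indicator, and substitutes made-up normal draws for the $\beta$ posterior (in practice the draws would be extracted from the fitted Stan object). All variable names and values here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: p = sigmoid(X @ beta), where column 0 of X is an
# indicator for "enrollment closed". These shapes and values are
# illustrative assumptions, not the fitted model.
n_trials, n_covariates = 50, 3
X = rng.normal(size=(n_trials, n_covariates))
closed_col = 0
X[:, closed_col] = 1.0  # observed: enrollment closed at the snapshot

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Stand-in for posterior draws of the betas (normally extracted from
# the fitted Stan object); here drawn from a made-up normal.
n_draws = 4000
beta_draws = rng.normal(loc=[1.0, 0.3, -0.2], scale=0.1,
                        size=(n_draws, n_covariates))

deltas = np.empty((n_draws, n_trials))
for s, beta in enumerate(beta_draws):
    X_control = X.copy()            # control: enrollment closed as observed
    X_treat = X.copy()
    X_treat[:, closed_col] = 0.0    # counterfactual: not yet closed
    p_control = sigmoid(X_control @ beta)
    p_treat = sigmoid(X_treat @ beta)
    deltas[s] = p_treat - p_control  # delta_p for each trial, this draw

# `deltas` approximates the posterior distribution of differences:
# one row per posterior sample, one column per trial.
posterior_mean_effect = deltas.mean()
```

Each row of `deltas` corresponds to one posterior draw of the $\beta$s, so summarizing over rows (means, quantiles) yields the posterior distribution of $\delta_{p_{i,n}}$ described in the text.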
\subsection{Primary Results}
The primary, causally identified quantity we can estimate is the change in
the probability of termination caused by (counterfactually) keeping enrollment
open instead of closing it at the observed time.
In figure \ref{fig:pred_dist_diff_delay} below, we see this impact of
keeping enrollment open.
\begin{figure}[H]
@ -107,6 +73,25 @@ termination.
% -
% -
% The probability mass associated with a each 10 percentage point change are in table \ref{tab:regimes}
% \begin{table}[H]
% \caption{Regimes and associated probability masses}\label{tab:regimes}
% \begin{center}
% \begin{tabular}[c]{l|l}
% \hline
% \multicolumn{1}{c|}{\textbf{Interval}} &
% \multicolumn{1}{c}{\textbf{Probability Mass}} \\
% \hline
% $[,]$ & b \\
% $[,]$ & b \\
% $[,]$ & b \\
% $[,]$ & b \\
% $[,]$ & b \\
% \hline
% \end{tabular}
% \end{center}
% \end{table}
Figure \ref{fig:pred_dist_dif_delay2} shows how this overall
result comes from different disease categories.
\begin{figure}[H]
@ -115,45 +100,48 @@ result comes from different disease categories.
\label{fig:pred_dist_dif_delay2}
\end{figure}
Overall, we can see that there appear to be some trials that are highly
susceptible to enrollment difficulties, and this appears to hold for all the
disease categories.
This may be due to low sample sizes,
since these results use a hierarchical model -- which partially pools results --
and the sample size per disease is rather small.
An additional explanation is that the variance in the parameters
might be high enough for each trial to have a few situations with
a high probability of terminating.
Although it is not causally identified due to population interactions,
we can examine the direct effect of adding a single generic competitor drug
and how a similar overall result decomposes very differently.
Figure
\ref{fig:pred_dist_diff_generic}
shows a very similar result with roughly the same regimes,
while figure
\ref{fig:pred_dist_dif_generic2}
shows that this breakdown is different.
\todo{
Consider moving these to an appendix as they are
just additions at this point.
}
\begin{figure}[H]
\includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-generic}
\caption{
Distribution of Predicted Differences for one additional generic
competitor
}
\label{fig:pred_dist_diff_generic}
\end{figure}
\begin{figure}[H]
\includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-generic-group}
\caption{
Distribution of Predicted Differences for one additional generic
competitor, by disease category
}
\label{fig:pred_dist_dif_generic2}
\end{figure}
Overall, we can see that there appear to be some trials or situations
that are highly susceptible to enrollment difficulties, and this
appears to hold for all disease categories for which I have data.
This relative homogeneity of results may be due to the
partial pooling effect of the hierarchical model
and the fact that the sample size per disease is rather small.
An additional explanation is that the variance of the parameter distributions
might be high enough for each trial to have a few situations in which it has
a high probability of terminating.
% Although it is not causally identified due to population interactions,
% we can examine the direct effect from adding a single generic competitior drug
% and how the similar result decomposes very differently.
% This is shown just as a contrast to the enrollment results.
% Figure
% \label{fig:pred_dist_diff_generic}
% shows a very similar result with roughly the same regimes,
% while
% \label{fig:pred_dist_dif_generic2}
% shows that this breakdown is different.
% \todo{
% Consider moving these to an appendix as they are
% just additions at this point.
% }
%
% \begin{figure}[H]
% \includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-generic}
% \caption{
% Distribution of Predicted Differences for one additional generic
% competitor
% }
% \label{fig:pred_dist_diff_generic}
% \end{figure}
%
% \begin{figure}[H]
% \includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-generic-group}
% \caption{}
% \label{fig:pred_dist_dif_generic2}
% \end{figure}
%
\end{document}

@ -69,7 +69,7 @@ in the first place while currently observed safety and efficiency results
help the sponsor judge whether or not to continue the trial.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Clinical Trials Data Sources}
\subsection{Data Summary}
%% Describe data here
Since September 27th, 2007, those who conduct clinical trials of FDA-controlled
drugs or devices on human subjects must register

@ -58,10 +58,29 @@ purpose of the clinical trials process.
On the other hand, when a trial terminates early due to reasons
other than safety or efficacy concerns, the trial operator does not learn
if the drug is effective or safe.
This is a true failure in that we did not learn if the drug was effective or not.
Unfortunately, although termination documentation typically includes a
description of a reason for the termination, that description does not
necessarily list all the contributing reasons and may not exist for a given trial.
This is a knowledge-gathering failure where the trial operator
did not learn if the drug was effective or not.
I prefer describing a clinical trial as being terminated for
\begin{itemize}
\item Safety or Efficacy concerns
\item Strategic concerns
\item Operational concerns.
\end{itemize}
Unfortunately, it can be difficult to know why a given trial was terminated,
even though, upon termination, trials typically record a
description of \textit{a single} reason for the clinical trial termination.
This description does not necessarily capture all the reasons contributing to the termination and may not exist for a given trial.
For example, if a Principal Investigator leaves for another institution
(terminating the trial), is this decision affected by
a safety or efficacy concern,
a new competitor on the market,
difficulty recruiting participants,
or a lack of financial support from the study sponsor?
These low-information, post-hoc signals are insufficient for estimating
the impact of the different problems that trials face.
For this reason, I use clinical trial progression to estimate effects.
\todo{not sure if this is the best place for this.}
As a trial goes through the different stages of recruitment, the investigators
update the records on ClinicalTrials.gov.
