Midday updates from writing

claude_rewrite
will king 1 year ago
parent 1630af2928
commit 64f3d14f7b

@ -89,4 +89,47 @@ These include:
I may have only done it in the CBO analysis.}
}
\end{itemize}
\subsection{Interpretation}
% Explain
% - What do we care about? Changes in the probability of
% - distribution of differences -> relate to E(\delta Y)
% - How do we obtain this distribution of differences?
% - from the model, we pay attention to P under treatment and control
% - We obtain this by fitting the model, then simulating under treatment and control, and taking the difference in the probability.
% -
The specific measure of interest is how much a delay in
closing enrollment changes the probability of terminating a trial,
$p_{i,n}$, in the model.
In standard reduced-form causal inference, the treatment effect
of interest for outcome $Z$ is measured as
\begin{align}
E(Z(\text{Treatment}) - Z(\text{Control}))
= E(Z(\text{Treatment})) - E(Z(\text{Control}))
\end{align}
Because $Z(\text{Treatment})$ and $Z(\text{Control})$ are random variables,
their difference $\delta_Z = Z(\text{Treatment}) - Z(\text{Control})$
is also a random variable.
In the Bayesian framework, this quantity has a posterior distribution,
so we can calculate the distribution of differences in
the probability of termination due to a given delay in
closing recruitment,
$\delta_{p_{i,n}} = p_{i,n}(T) - p_{i,n}(C)$.
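In expectation, this mirrors the reduced-form estimand above, with $T$ and $C$ denoting the delayed-closing and observed-closing conditions:

```latex
\begin{align}
E\left(p_{i,n}(T) - p_{i,n}(C)\right)
= E\left(p_{i,n}(T)\right) - E\left(p_{i,n}(C)\right)
\end{align}
```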
I calculate the posterior distribution of $\delta_{p_{i,n}}$ by estimating the
posterior distributions of the $\beta$s and then simulating $\delta_{p_{i,n}}$.
This involves taking a draw from the posterior distribution of the $\beta$s,
calculating
$p_{i,n}(C)$
for the underlying trials at the snapshot when they close enrollment,
and then calculating
$p_{i,n}(T)$
under the counterfactual where enrollment had not yet closed.
The difference
$\delta_{p_{i,n}}$
is then calculated for each trial and saved.
After repeating this for all the posterior samples, we have an estimate
of the posterior distribution of differences between treatment and control.
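As a sketch, this simulation loop can be written as follows. Everything concrete here is an illustrative assumption rather than the fitted model: a logistic link for $p_{i,n}$, a single hypothetical `beta_open` coefficient for the enrollment-kept-open counterfactual, and normal draws standing in for the actual Stan posterior samples.

```python
import math
import random

def inv_logit(x):
    """Logistic link: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical posterior draws; in the actual analysis these would be
# the beta samples returned by the Stan fit, not fresh normal draws.
random.seed(7)
posterior_draws = [
    {"intercept": random.gauss(-1.0, 0.1),  # baseline termination log-odds
     "beta_open": random.gauss(0.5, 0.1)}   # effect of enrollment still open
    for _ in range(4000)
]

def delta_termination(draw):
    """For one posterior draw: p under control (enrollment closed at the
    observed snapshot) vs. treatment (enrollment kept open), returning
    the difference in termination probability."""
    p_control = inv_logit(draw["intercept"])
    p_treatment = inv_logit(draw["intercept"] + draw["beta_open"])
    return p_treatment - p_control

# Repeating over all posterior samples approximates the posterior
# distribution of differences.
deltas = [delta_termination(d) for d in posterior_draws]
posterior_mean = sum(deltas) / len(deltas)
```

In the actual estimation, $p_{i,n}(C)$ and $p_{i,n}(T)$ would be computed per trial from that trial's covariates at the snapshot when it closes enrollment, yielding one simulated $\delta_{p_{i,n}}$ per trial per posterior sample.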
\end{document}

@ -7,7 +7,7 @@ I describe the model fitting, the posteriors of the parameters of interest,
and interpret the results.
\subsection{Estimation Procedure}
I fit the econometric model using Stan
\cite{standevelopmentteam_StanModelling_2022}
through the rstan
@ -27,47 +27,13 @@ sampling iterations in six chains.
%look at trial
\subsection{Primary Results}
% Explain
% - What do we care about? Changes in the probability of
% - distribution of differences -> relate to E(\delta Y)
% - How do we obtain this distribution of differences?
% - from the model, we pay attention to P under treatment and control
% - We obtain this by fitting the model, then simulating under treatment and control, and taking the difference in the probability.
% -
The specific measure of interest is how much a delay in
closing enrollment changes the probability of terminating a trial,
$p_{i,n}$, in the model.
In standard reduced-form causal inference, the treatment effect
of interest for outcome $Z$ is measured as
\begin{align}
E(Z(\text{Treatment}) - Z(\text{Control}))
= E(Z(\text{Treatment})) - E(Z(\text{Control}))
\end{align}
Because $Z(\text{Treatment})$ and $Z(\text{Control})$ are random variables,
their difference $\delta_Z = Z(\text{Treatment}) - Z(\text{Control})$
is also a random variable.
In the Bayesian framework, this quantity has a posterior distribution,
so we can calculate the distribution of differences in
the probability of termination due to a given delay in
closing recruitment,
$\delta_{p_{i,n}} = p_{i,n}(T) - p_{i,n}(C)$.
I calculate the posterior distribution of $\delta_{p_{i,n}}$ by estimating the
posterior distributions of the $\beta$s and then simulating $\delta_{p_{i,n}}$.
This involves taking a draw from the posterior distribution of the $\beta$s,
calculating
$p_{i,n}(C)$
for the underlying trials at the snapshot when they close enrollment,
and then calculating
$p_{i,n}(T)$
under the counterfactual where enrollment had not yet closed.
The difference
$\delta_{p_{i,n}}$
is then calculated for each trial and saved.
After repeating this for all the posterior samples, we have an estimate
of the posterior distribution of differences.
The primary, causally identified quantity we can estimate is the change in
the probability of termination caused by (counterfactually) keeping enrollment
open instead of closing it when observed.
Figure \ref{fig:pred_dist_diff_delay} below shows this impact of
keeping enrollment open.
\begin{figure}[H]
@ -107,6 +73,25 @@ termination.
% -
% -
% The probability mass associated with each 10 percentage point change is in table \ref{tab:regimes}
% \begin{table}[H]
% \caption{Regimes and associated probability masses}\label{tab:regimes}
% \begin{center}
% \begin{tabular}[c]{l|l}
% \hline
% \multicolumn{1}{c|}{\textbf{Interval}} &
% \multicolumn{1}{c}{\textbf{Probability Mass}} \\
% \hline
% $[,]$ & b \\
% $[,]$ & b \\
% $[,]$ & b \\
% $[,]$ & b \\
% $[,]$ & b \\
% \hline
% \end{tabular}
% \end{center}
% \end{table}
Figure \ref{fig:pred_dist_dif_delay2} shows how this overall
result comes from different disease categories.
\begin{figure}[H]
@ -115,45 +100,48 @@ result comes from different disease categories.
\label{fig:pred_dist_dif_delay2}
\end{figure}
Overall, we can see that there appear to be some trials or situations
that are highly susceptible to enrollment difficulties, and this
appears to hold for all disease categories for which I have data.
This relative homogeneity of results may be due to the
partial pooling effect from the hierarchical model
and the fact that the sample size per disease is rather small.
An additional explanation is that the variance of the parameter distributions
might be high enough for each trial to have a few situations in which it has
a high probability of terminating.
% Although it is not causally identified due to population interactions,
% we can examine the direct effect from adding a single generic competitor drug
% and how the similar result decomposes very differently.
% This is shown just as a contrast to the enrollment results.
% Figure
% \ref{fig:pred_dist_diff_generic}
% shows a very similar result with roughly the same regimes,
% while
% \ref{fig:pred_dist_dif_generic2}
% shows that this breakdown is different.
% \todo{
% Consider moving these to an appendix as they are
% just additions at this point.
% }
%
% \begin{figure}[H]
% \includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-generic}
% \caption{
% Distribution of Predicted Differences for one additional generic
% competitor
% }
% \label{fig:pred_dist_diff_generic}
% \end{figure}
%
% \begin{figure}[H]
% \includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-generic-group}
% \caption{}
% \label{fig:pred_dist_dif_generic2}
% \end{figure}
%
\end{document}

@ -69,7 +69,7 @@ in the first place while currently observed safety and efficiency results
help the sponsor judge whether or not to continue the trial.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Data Summary}
%% Describe data here
Since September 27th, 2007, those who conduct clinical trials of FDA-regulated
drugs or devices on human subjects must register

@ -58,10 +58,29 @@ purpose of the clinical trials process.
On the other hand, when a trial terminates early due to reasons
other than safety or efficacy concerns, the trial operator does not learn
if the drug is effective or safe.
This is a knowledge-gathering failure where the trial operator
did not learn if the drug was effective or not.
I prefer describing a clinical trial as being terminated for
\begin{itemize}
\item Safety or efficacy concerns
\item Strategic concerns
\item Operational concerns
\end{itemize}
Unfortunately, it can be difficult to know why a given trial was terminated,
in spite of the fact that upon termination, trials typically record a
description of \textit{a single} reason for the clinical trial termination.
This description doesn't necessarily list all the reasons contributing to the
termination and may not exist for a given trial.
For example, if a Principal Investigator leaves for another institution
(terminating the trial), is this decision affected by
a safety or efficacy concern,
a new competitor on the market,
difficulty recruiting participants,
or a lack of financial support from the study sponsor?
These low-information, post-hoc signals are insufficient for estimating
the impact of the different problems that trials face.
For this reason, I use clinical trial progression to estimate effects.
\todo{not sure if this is the best place for this.}
As a trial goes through the different stages of recruitment, the investigators
update the records on ClinicalTrials.gov.
