more updates

2 years ago · 1630af2928
parent 5d9640ab8d
commit 1630af2928
2 changed files with 125 additions and 85 deletions
--- a/Paper/sections/04_EconometricModel.tex
+++ b/Paper/sections/04_EconometricModel.tex
@@ -19,7 +19,7 @@ First, some notation:
    \item $y_i$: whether each trial 
        terminated (true, 1) or completed (false, 0).
    \item $d_i$: indexes the ICD-10 disease category of the trial.
-    \item $x_{i,n}$: represents the other dependent 
+    \item $x_{i,n}$: represents the independent 
        variables associated with the snapshot.
 \end{itemize} 

--- a/Paper/sections/06_Results.tex
+++ b/Paper/sections/06_Results.tex
@@ -1,113 +1,153 @@
 \documentclass[../Main.tex]{subfiles}

 \begin{document}
-%\subsection{Data Exploration} %TODO: fill this out later.
-%look at trial 
+
+In this section 
+I describe the model fitting, the posteriors of the parameters of interest,
+and interpret the results.
+
+
 \subsection{Model Fitting}
-In this section we examine the results from fitting the econometric model using
-mc-stan (\cite{mc-stan}) through the rstan (\cite{rstan}) interface.
+I fit the econometric model using mc-stan 
+\cite{standevelopmentteam_StanModelling_2022}
+through the rstan 
+\cite{standevelopmentteam_RStanInterface_2023}
+interface.
+
+I had X Trials with X snapshots in total. \todo{Fill out.} 

 %describe  
-The model was based on the hierarchal logistic regression model 
-presented in the Stan Users Guide (\cite{mc-stan}), 
-and was run with 2,500 warmup iterations and
-2,500 sampling iterations in six chains.
-There were various issues, including 160 divergent transitions and the R-hat 
-measure was 1.49. 
-Overall these suggest that the econometric model is incorrect as 
-written or requires reparameterization.
-%TODO: and info about how I learned about these diagnostics
-
-
-% \subsubsection{Diagnostics}
-% %Examine trank plots
-% To identify which parameters were problematic, I first looked at trace rank 
-% histograms.
-% Under idea circumstances, each line (representing a chain) should exchange 
-% places with the other lines frequently.
-% In both \cref{fig:mu_trank} and \cref{fig:sigma_trank}, most parameters seem
-% to mix well but there are a couple of exceptions.
-% This warrants further investigation.
-%
-% \begin{figure}[H]
-%     \includegraphics[width=\textwidth]{../assets/img/mu_trank.png}
-%     \caption{Trace Rank Histogram: Mu values}
-% 	\label{fig:mu_trank}
-% \end{figure}
-%
-% \begin{figure}[H]
-%     \includegraphics[width=\textwidth]{../assets/img/sigma_trank.png}
-%     \caption{Trace Rank Histogram: Sigma values}
-% 	\label{fig:sigma_trank}
-% \end{figure}
-%
-% %Take a look at batman and points for mu
-% In the case of the Mu values, a parallel coordinates plot 
-% doesn't seem to indicate any parameters as likely candidates
-% for causing the issues with divergent transitions.
-% \begin{figure}[H]
-%     \includegraphics[width=\textwidth]{../assets/img/mu_batman.png}
-%     \caption{Parallel Coordinate Plot: Mu values}
-% 	\label{fig:mu_batman}
-% \end{figure}
-% Note that at each parameter, there is some level of dispersion between 
-% values that diverged.
-%
-% On the other hand, in the parallel coordinates plot for sigma values,
-% it appears that most divergent transitions occur with values of 
-% sigma[1], sigma[3], sigma[6], and sigma[7] close to zero.
-% \begin{figure}[H]
-%     \includegraphics[width=\textwidth]{../assets/img/sigma_batman.png}
-%     \caption{Parallel Coordinate Plot: Sigma values}
-% 	\label{fig:sigma_batman}
-% \end{figure}
-% Overall this suggests that there is an issue with the specification
-% of the covariance structures of the hyperparameters.
-%
-% Additional evidence that the covariance structure is incorrect comes from 
-% plotting pairs of parameter values and examining the chains with divergent
-% transitions.
-%
-% \begin{figure}[H]
-%     \includegraphics[width=\textwidth]{../assets/img/sigma_pairs_5-9.png}
-% 	\caption{Parameter Pairs plots: Sigma[5] through Sigma[9]}
-% 	\label{fig:sigma_pairs_5-9.png}
-% \end{figure}
-% From this we can see that divergent pairs are highly correlated with the cases
-% where sigma[6] or sigma[7] are equal to zero.
-% This has an impact on the shape of both of those estimated parameters, causing
-% both to be bimodal.
+The model was run with X\todo{UPDATE VALUES} 
+warmup iterations and
+X\todo{UPDATE VALUES} 
+sampling iterations in six chains.
+
+% \subsection{Data Exploration} 
+% \todo{fill this out later.}
+%look at trial 


 \subsection{Interpretation}
+% Explain 
+% - What do we care about? Changes in the probability of 
+% - distribution of differences -> relate to E(\delta Y)
+% - How do we obtain this distribution of differences?
+%   - from the model, we pay attention to P under treatment and control
+%   - We obtain this by fitting the model, then simulating under treatment and control, and taking the difference in the probability.
+%   - 
+
+The specific measure of interest is how much a delay in 
+closing enrollment changes the probability of terminating a trial
+$p_{i,n}$ in the model.
+
+In standard reduced-form causal inference, the treatment effect
+of interest for outcome $Z$ is measured as 
+\begin{align}
+    E(Z(\text{Treatment}) - Z(\text{Control})) 
+    = E(Z(\text{Treatment})) - E(Z(\text{Control}))
+\end{align}
+Because $Z(\text{Treatment})$ and $Z(\text{Control})$ are random variables,
+$Z(\text{Treatment}) - Z(\text{Control}) = \delta_Z$ is also a random variable. 
+In the Bayesian framework, this parameter has a distribution, and so 
+we can calculate the distribution of differences in 
+the probability of termination due to a given delay in 
+closing recruitment,
+$p_{i,n}(T) - p_{i,n}(C) = \delta_{p_{i,n}}$.
+
+I calculate the posterior distribution of $\delta_{p_{i,n}}$ by estimating the 
+posterior distributions of the $\beta$s and then simulating $\delta_{p_{i,n}}$.
+This involves taking a draw from the posterior distribution of the $\beta$s, calculating
+$p_{i,n}(C)$ 
+for the underlying trials at the snapshot when they close enrollment
+and then calculating 
+$p_{i,n}(T)$ 
+under the counterfactual where enrollment had not yet closed.
+The difference 
+$\delta_{p_{i,n}}$ 
+is then calculated for each trial and saved. 
+After repeating this for all the posterior samples, we have an estimate 
+for the posterior distribution of differences.
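+
+As a sketch of this procedure in notation (the draw index $s$, the number of 
+posterior draws $S$, and the scenario covariates $x_{i,n}(T)$ and $x_{i,n}(C)$ 
+are introduced here for illustration, and the linear predictor is an assumed 
+simplification of the hierarchical logistic model), each draw gives
+\begin{align}
+    p_{i,n}^{(s)}(\cdot) &= \text{logit}^{-1}\left(x_{i,n}(\cdot)^\top \beta^{(s)}_{d_i}\right), \\
+    \delta_{p_{i,n}}^{(s)} &= p_{i,n}^{(s)}(T) - p_{i,n}^{(s)}(C),
+    \quad s = 1, \dots, S .
+\end{align}
+The collection $\{\delta_{p_{i,n}}^{(s)}\}$ over all trials and draws 
+then approximates the posterior distribution of differences.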

-The key results so far are related to the distribution of differences in $p$.

-In figure \ref{fig:pred_dist_dif_delay} we see that there while most trials do not see any increased risk 
-from a delay in closing enrollment, there is a small group that does experience this.

 \begin{figure}[H]
    \includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-delay}
-	\caption{}
+	\small{
+	    Values near 1 indicate a near perfect increase in the probability 
+	    of termination. 
+	    Values near 0 indicate little change in probability,
+	    while values near $-1$ represent a decrease in the probability
+	    of termination. 
+	    The scale is in probability units, thus a value near 1 is a change 
+	    from unlikely to terminate under control, to highly likely to 
+	    terminate.
+	}
+	\caption{Distribution of Predicted Differences}
 	\label{fig:pred_dist_diff_delay}
 \end{figure}

-Figure \ref{fig:pred_dist_dif_delay2} shows how this varies across disease categories
+We can see from figure 
+\ref{fig:pred_dist_diff_delay} 
+that there are roughly four regimes. 
+The first consists of trials that experience nearly no effect,
+i.e. have values near zero.
+Trials in the second regime experience a mild to large reduction in 
+the probability of termination, with X percent of the probability mass 
+between reductions of about 5 and 50 percentage points.
+The third regime consists of trials that experience a mild to large 
+increase in the probability of termination, 
+from an increase of about 5 percentage points to about 75 percentage points. 
+The fourth and final regime is the X\% of trials that experience a significant
+(greater than 75 percentage point) increase in the probability of 
+termination.
+%Notes on interpretation
+% - increase vs decrease on graph 
+% - 
+% - 
+% - 
+% - 
+
+Figure \ref{fig:pred_dist_dif_delay2} shows how this overall
+result comes from different disease categories.
 \begin{figure}[H]
    \includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-delay-group}
-	\caption{}
+	\caption{Distribution of Predicted Differences by Disease Group}
 	\label{fig:pred_dist_dif_delay2}
 \end{figure}

-We can also examine the direct effect from adding a single generic competitior drug.
+Overall, we can see that there appear to be some trials that are highly 
+susceptible to enrollment difficulties, and this appears to hold for all the 
+disease categories.
+This may be due to small sample sizes,
+since these results use a hierarchical model -- which partially pools results -- 
+and the sample size per disease is rather small.
+An additional explanation is that the variance in parameters 
+might be high enough for the change to 
+
+
+Although it is not causally identified due to population interactions,
+we can examine the direct effect from adding a single generic competitor drug
+and how a similar overall result decomposes very differently.
+Figure 
+\ref{fig:pred_dist_diff_generic}
+shows a very similar result with roughly the same regimes,
+while Figure
+\ref{fig:pred_dist_dif_generic2}
+shows that the breakdown by disease category is different.
+\todo{
+    Consider moving these to an appendix as they are 
+    just additions at this point.
+}

 \begin{figure}[H]
    \includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-generic}
-	\caption{}
+	\caption{
+	    Distribution of Predicted Differences for one additional generic 
+	    competitor
+	}
 	\label{fig:pred_dist_diff_generic}
 \end{figure}

-Figure \ref{fig:pred_dist_dif_generic2} shows how this varies across disease categories
 \begin{figure}[H]
    \includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-generic-group}
 	\caption{}