\documentclass[../Main.tex]{subfiles}
\begin{document}
In this section,
I describe the model fitting and the posterior distributions of the parameters of interest,
and then interpret the results.
\subsection{Model Fitting}
I fit the econometric model using Stan
\cite{standevelopmentteam_StanModelling_2022}
through the rstan
\cite{standevelopmentteam_RStanInterface_2023}
interface.
The data contain X trials with X snapshots in total. \todo{Fill out.}
%describe
I ran X\todo{UPDATE VALUES}
warmup iterations and
X\todo{UPDATE VALUES}
sampling iterations in each of six chains.
% \subsection{Data Exploration}
% \todo{fill this out later.}
%look at trial
\subsection{Interpretation}
% Explain
% - What do we care about? Changes in the probability of
% - distribution of differences -> relate to E(\delta Y)
% - How do we obtain this distribution of differences?
% - from the model, we pay attention to P under treatment and control
% - We obtain this by fitting the model, then simulating under treatment and control, and taking the difference in the probability.
% -
The specific quantity of interest is how much a delay in
closing enrollment changes $p_{i,n}$, the model's probability
that a trial terminates.
In standard reduced-form causal inference, the treatment effect
of interest for an outcome $Z$ is
\begin{align}
E(Z(\text{Treatment}) - Z(\text{Control}))
= E(Z(\text{Treatment})) - E(Z(\text{Control}))
\end{align}
Because $Z(\text{Treatment})$ and $Z(\text{Control})$ are random variables,
their difference $\delta_Z = Z(\text{Treatment}) - Z(\text{Control})$ is also a random variable.
In the Bayesian framework, the model parameters have a posterior distribution,
and so does any quantity computed from them.
We can therefore calculate the distribution of the difference in
the probability of termination caused by a given delay in
closing recruitment,
$\delta_{p_{i,n}} = p_{i,n}(T) - p_{i,n}(C)$,
where $T$ denotes treatment (enrollment not yet closed) and $C$ denotes control (enrollment closed).
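A compact way to state this, using notation introduced here only for illustration, writes $\beta^{(s)}$ for the $s$-th of $S$ posterior draws and $p_{i,n}(\cdot\,;\beta)$ for the termination probability evaluated at coefficients $\beta$. Each draw then yields one realization of the difference, and posterior summaries can be approximated by averages over draws, for example
\begin{align}
\delta^{(s)}_{p_{i,n}} &= p_{i,n}(T;\beta^{(s)}) - p_{i,n}(C;\beta^{(s)}), \qquad s = 1,\dots,S, \\
E\left(\delta_{p_{i,n}} \mid \text{data}\right) &\approx \frac{1}{S}\sum_{s=1}^{S} \delta^{(s)}_{p_{i,n}}.
\end{align}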
I compute the posterior distribution of $\delta_{p_{i,n}}$ by first estimating the
posterior distribution of the $\beta$s and then simulating $\delta_{p_{i,n}}$.
For each posterior draw of the $\beta$s, I calculate
$p_{i,n}(C)$
for each trial at the snapshot when it closes enrollment,
and then calculate
$p_{i,n}(T)$
under the counterfactual in which enrollment had not yet closed.
The difference
$\delta_{p_{i,n}}$
is then computed and saved for each trial.
Repeating this for every posterior draw yields an estimate
of the posterior distribution of differences.
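The simulation described above can be sketched in Python as follows. This is a minimal sketch under assumptions that the text does not state: it assumes a logistic link with a linear predictor $X\beta$ (the actual hierarchical model may include further terms), and the function name \texttt{simulate\_differences}, the array layout, and the covariate matrices for the control and counterfactual snapshots are all hypothetical.

```python
import numpy as np


def logistic(x):
    # Assumed link function: maps a linear predictor to a probability.
    return 1.0 / (1.0 + np.exp(-x))


def simulate_differences(beta_draws, X_control, X_treat):
    """Simulate delta = p(T) - p(C) for every posterior draw and every trial.

    beta_draws: (S, K) posterior draws of the coefficients (hypothetical layout).
    X_control:  (N, K) covariates at the snapshot where each trial closed enrollment.
    X_treat:    (N, K) the same rows with enrollment set to "not yet closed".
    Returns an (S, N) array; row s holds the differences under draw s.
    """
    beta_draws = np.asarray(beta_draws, dtype=float)
    # (N, K) @ (K, S) -> (N, S): one probability per trial and draw.
    p_control = logistic(np.asarray(X_control, dtype=float) @ beta_draws.T)
    p_treat = logistic(np.asarray(X_treat, dtype=float) @ beta_draws.T)
    # Transpose so that posterior draws index the rows.
    return (p_treat - p_control).T


if __name__ == "__main__":
    # Synthetic illustration only; not the paper's data or posterior.
    rng = np.random.default_rng(0)
    S, N = 1000, 4
    beta = rng.normal(size=(S, 3))
    # Hypothetical columns: intercept, one covariate, enrollment-closed indicator.
    X_c = np.column_stack([np.ones(N), rng.normal(size=N), np.ones(N)])
    X_t = X_c.copy()
    X_t[:, 2] = 0.0  # counterfactual: enrollment not yet closed
    deltas = simulate_differences(beta, X_c, X_t)
    pooled = deltas.ravel()  # all draws and trials pooled into one distribution
    print(deltas.shape, pooled.mean())
```

Flattening the returned array pools all draws and trials into a single distribution of differences, of the kind plotted in Figure \ref{fig:pred_dist_diff_delay}; keeping the $(S, N)$ shape instead preserves each trial's own posterior.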
\begin{figure}[H]
\includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-delay}
{\small
Values near 1 indicate an increase of nearly 100 percentage points in the probability
of termination.
Values near 0 indicate little change in probability,
while values near $-1$ represent a decrease of nearly 100 percentage points
in the probability of termination.
The scale is in probability units (1 = 100 percentage points), so a value near 1
corresponds to a trial that is unlikely to terminate under control but highly likely to
terminate under treatment.
\par}
\caption{Distribution of Predicted Differences}
\label{fig:pred_dist_diff_delay}
\end{figure}
Figure
\ref{fig:pred_dist_diff_delay}
shows roughly four regimes.
The first consists of trials that experience nearly no effect,
i.e.\ differences near zero.
Trials in the second regime experience a mild to large reduction in
the probability of termination, with X percent of the probability mass
between reductions of about 5 and 50 percentage points.
The third regime consists of trials that experience a mild to large
increase in the probability of termination,
from about 5 to about 75 percentage points.
The fourth and final regime is the X\% of trials that experience a very large
(greater than 75 percentage point) increase in the probability of
termination.
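The regime shares described above could be computed from the simulated differences along the following lines. The cut points (5 and 75 percentage points) follow the approximate boundaries given in the text, but the exact thresholds, the treatment of boundary values, and the helper names \texttt{regime\_shares} and \texttt{share\_in} are assumptions made for illustration.

```python
import numpy as np

# Assumed cut points in probability units (0.05 = 5 percentage points).
NEAR_ZERO = 0.05
LARGE_INCREASE = 0.75


def regime_shares(deltas):
    """Share of simulated differences falling in each of the four regimes.

    deltas: array of draws of p(T) - p(C), any shape; all entries are pooled.
    The four masks partition the real line, so the shares sum to one.
    """
    d = np.asarray(deltas, dtype=float).ravel()
    masks = {
        "near zero": np.abs(d) < NEAR_ZERO,
        "reduction": d <= -NEAR_ZERO,
        "mild to large increase": (d >= NEAR_ZERO) & (d <= LARGE_INCREASE),
        "very large increase": d > LARGE_INCREASE,
    }
    return {name: float(np.mean(mask)) for name, mask in masks.items()}


def share_in(deltas, lo, hi):
    """Share of pooled draws with lo <= delta <= hi."""
    d = np.asarray(deltas, dtype=float).ravel()
    return float(np.mean((d >= lo) & (d <= hi)))
```

For example, \texttt{share\_in(deltas, -0.50, -0.05)} gives the share of mass between reductions of 5 and 50 percentage points. Because these shares pool draws across trials, they describe probability mass rather than counts of trials; a per-trial classification would instead apply the same thresholds to a per-trial summary such as each trial's posterior mean.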
%Notes on interpretation
% - increase vs decrease on graph
% -
% -
% -
% -
Figure \ref{fig:pred_dist_dif_delay2} shows how this overall
result decomposes across disease categories.
\begin{figure}[H]
\includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-delay-group}
\caption{Distribution of Predicted Differences by Disease Group}
\label{fig:pred_dist_dif_delay2}
\end{figure}
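A per-group summary of the kind shown in Figure \ref{fig:pred_dist_dif_delay2} can be sketched by splitting the trial columns of the simulated differences by disease category. The function name \texttt{summarize\_by\_group}, the array of group labels, and the 75 percentage point cut-off are hypothetical choices for illustration.

```python
import numpy as np


def summarize_by_group(deltas, groups, large_cut=0.75):
    """Summarize simulated differences separately for each disease group.

    deltas: (S, N) array of draws of p(T) - p(C) (S posterior draws, N trials).
    groups: length-N sequence of disease-category labels (hypothetical input).
    Returns {label: (mean difference, share of draws above large_cut)}.
    """
    deltas = np.asarray(deltas, dtype=float)
    groups = np.asarray(groups)
    summary = {}
    for label in np.unique(groups):
        # Keep all posterior draws for the trials (columns) in this group.
        d = deltas[:, groups == label]
        summary[str(label)] = (float(d.mean()), float(np.mean(d > large_cut)))
    return summary
```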
Overall, some trials appear highly
susceptible to enrollment difficulties, and this pattern holds across all
disease categories.
This may reflect the small samples:
the model is hierarchical and therefore partially pools estimates across groups,
and because the number of trials per disease category is small,
the group-level estimates are pulled toward the overall result.
An additional explanation is that the variance in parameters
might be high enough for the change to

Although it is not causally identified because of population interactions,
we can also examine the direct effect of adding a single generic competitor drug,
which produces a similar overall result that decomposes very differently.
Figure
\ref{fig:pred_dist_diff_generic}
shows a result very similar to that for a delay, with roughly the same regimes,
while Figure
\ref{fig:pred_dist_dif_generic2}
shows that the breakdown by disease group is different.
\todo{
Consider moving these to an appendix as they are
just additions at this point.
}
\begin{figure}[H]
\includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-generic}
\caption{
Distribution of Predicted Differences for one additional generic
competitor
}
\label{fig:pred_dist_diff_generic}
\end{figure}
\begin{figure}[H]
\includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-generic-group}
\caption{Distribution of Predicted Differences for One Additional Generic Competitor by Disease Group}
\label{fig:pred_dist_dif_generic2}
\end{figure}
\end{document}