\documentclass[../Main.tex]{subfiles}
|
|
|
|
|
|
|
|
|
|
\begin{document}
|
|
|
|
|
%\subsection{Data Exploration} %TODO: fill this out later.
|
|
|
|
|
%look at trial
|
|
|
|
|
|
|
|
|
|
In this section
|
|
|
|
|
I describe the model fitting, the posteriors of the parameters of interest,
|
|
|
|
|
and interpret the results.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
\subsection{Model Fitting}
|
|
|
|
|
In this section we examine the results from fitting the econometric model using
|
|
|
|
|
mc-stan (\cite{mc-stan}) through the rstan (\cite{rstan}) interface.
|
|
|
|
|
|
|
|
|
|
%describe
|
|
|
|
|
The model was based on the hierarchical logistic regression model
|
|
|
|
|
presented in the Stan User's Guide (\cite{mc-stan}),
|
|
|
|
|
and was run with 2,500 warmup iterations and
|
|
|
|
|
2,500 sampling iterations in six chains.
|
|
|
|
|
There were several issues, including 160 divergent transitions and an R-hat
value of 1.49.
|
|
|
|
|
Overall these suggest that the econometric model is incorrect as
|
|
|
|
|
written or requires reparameterization.
|
|
|
|
|
%TODO: and info about how I learned about these diagnostics
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
% \subsubsection{Diagnostics}
|
|
|
|
|
% %Examine trank plots
|
|
|
|
|
% To identify which parameters were problematic, I first looked at trace rank
|
|
|
|
|
% histograms.
|
|
|
|
|
% Under ideal circumstances, each line (representing a chain) should exchange
|
|
|
|
|
% places with the other lines frequently.
|
|
|
|
|
% In both \cref{fig:mu_trank} and \cref{fig:sigma_trank}, most parameters seem
|
|
|
|
|
% to mix well but there are a couple of exceptions.
|
|
|
|
|
% This warrants further investigation.
|
|
|
|
|
%
|
|
|
|
|
% \begin{figure}[H]
|
|
|
|
|
% \includegraphics[width=\textwidth]{../assets/img/mu_trank.png}
|
|
|
|
|
% \caption{Trace Rank Histogram: Mu values}
|
|
|
|
|
% \label{fig:mu_trank}
|
|
|
|
|
% \end{figure}
|
|
|
|
|
%
|
|
|
|
|
% \begin{figure}[H]
|
|
|
|
|
% \includegraphics[width=\textwidth]{../assets/img/sigma_trank.png}
|
|
|
|
|
% \caption{Trace Rank Histogram: Sigma values}
|
|
|
|
|
% \label{fig:sigma_trank}
|
|
|
|
|
% \end{figure}
|
|
|
|
|
%
|
|
|
|
|
% %Take a look at batman and points for mu
|
|
|
|
|
% In the case of the Mu values, a parallel coordinates plot
|
|
|
|
|
% doesn't seem to indicate any parameters as likely candidates
|
|
|
|
|
% for causing the issues with divergent transitions.
|
|
|
|
|
% \begin{figure}[H]
|
|
|
|
|
% \includegraphics[width=\textwidth]{../assets/img/mu_batman.png}
|
|
|
|
|
% \caption{Parallel Coordinate Plot: Mu values}
|
|
|
|
|
% \label{fig:mu_batman}
|
|
|
|
|
% \end{figure}
|
|
|
|
|
% Note that at each parameter, there is some level of dispersion between
|
|
|
|
|
% values that diverged.
|
|
|
|
|
%
|
|
|
|
|
% On the other hand, in the parallel coordinates plot for sigma values,
|
|
|
|
|
% it appears that most divergent transitions occur with values of
|
|
|
|
|
% sigma[1], sigma[3], sigma[6], and sigma[7] close to zero.
|
|
|
|
|
% \begin{figure}[H]
|
|
|
|
|
% \includegraphics[width=\textwidth]{../assets/img/sigma_batman.png}
|
|
|
|
|
% \caption{Parallel Coordinate Plot: Sigma values}
|
|
|
|
|
% \label{fig:sigma_batman}
|
|
|
|
|
% \end{figure}
|
|
|
|
|
% Overall this suggests that there is an issue with the specification
|
|
|
|
|
% of the covariance structures of the hyperparameters.
|
|
|
|
|
%
|
|
|
|
|
% Additional evidence that the covariance structure is incorrect comes from
|
|
|
|
|
% plotting pairs of parameter values and examining the chains with divergent
|
|
|
|
|
% transitions.
|
|
|
|
|
%
|
|
|
|
|
% \begin{figure}[H]
|
|
|
|
|
% \includegraphics[width=\textwidth]{../assets/img/sigma_pairs_5-9.png}
|
|
|
|
|
% \caption{Parameter Pairs plots: Sigma[5] through Sigma[9]}
|
|
|
|
|
% \label{fig:sigma_pairs_5-9.png}
|
|
|
|
|
% \end{figure}
|
|
|
|
|
% From this we can see that divergent pairs are highly correlated with the cases
|
|
|
|
|
% where sigma[6] or sigma[7] are equal to zero.
|
|
|
|
|
% This has an impact on the shape of both of those estimated parameters, causing
|
|
|
|
|
% both to be bimodal.
|
|
|
|
|
I fit the econometric model using mc-stan
|
|
|
|
|
\cite{standevelopmentteam_StanModelling_2022}
|
|
|
|
|
through the rstan
|
|
|
|
|
\cite{standevelopmentteam_RStanInterface_2023}
|
|
|
|
|
interface.
|
|
|
|
|
|
|
|
|
|
The sample contains X trials with X snapshots in total. \todo{Fill out.}
|
|
|
|
|
|
|
|
|
|
%describe
|
|
|
|
|
The model was run with X\todo{UPDATE VALUES}
|
|
|
|
|
warmup iterations and
|
|
|
|
|
X\todo{UPDATE VALUES}
|
|
|
|
|
sampling iterations in six chains.
|
|
|
|
|
|
|
|
|
|
% \subsection{Data Exploration}
|
|
|
|
|
% \todo{fill this out later.}
|
|
|
|
|
%look at trial
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
\subsection{Interpretation}
|
|
|
|
|
% Explain
|
|
|
|
|
% - What do we care about? Changes in the probability of
|
|
|
|
|
% - distribution of differences -> relate to E(\delta Y)
|
|
|
|
|
% - How do we obtain this distribution of differences?
|
|
|
|
|
% - from the model, we pay attention to P under treatment and control
|
|
|
|
|
% - We obtain this by fitting the model, then simulating under treatment and control, and taking the difference in the probability.
|
|
|
|
|
% -
|
|
|
|
|
|
|
|
|
|
The specific measure of interest is how much a delay in
|
|
|
|
|
closing enrollment changes the probability of terminating a trial
|
|
|
|
|
$p_{i,n}$ in the model.
|
|
|
|
|
|
|
|
|
|
In standard reduced-form causal inference, the treatment effect
|
|
|
|
|
of interest for outcome $Z$ is measured as
|
|
|
|
|
\begin{align}
|
|
|
|
|
E(Z(\text{Treatment}) - Z(\text{Control}))
|
|
|
|
|
= E(Z(\text{Treatment})) - E(Z(\text{Control}))
|
|
|
|
|
\end{align}
|
|
|
|
|
Because $Z(\text{Treatment})$ and $Z(\text{Control})$ are random variables,
|
|
|
|
|
$Z(\text{Treatment}) - Z(\text{Control}) = \delta_Z$ is also a random variable.
|
|
|
|
|
In the Bayesian framework, this parameter has a distribution, and so
|
|
|
|
|
we can calculate the distribution of differences in
|
|
|
|
|
the probability of termination due to a given delay in
|
|
|
|
|
closing recruitment,
|
|
|
|
|
$p_{i,n}(T) - p_{i,n}(C) = \delta_{p_{i,n}}$.
|
|
|
|
|
|
|
|
|
|
I calculate the posterior distribution of $\delta_{p_{i,n}}$ by estimating the
|
|
|
|
|
posterior distributions of the $\beta$s and then simulating $\delta_{p_{i,n}}$.
|
|
|
|
|
This involves taking a draw from the posterior distribution of the $\beta$s, calculating
|
|
|
|
|
$p_{i,n}(C)$
|
|
|
|
|
for the underlying trials at the snapshot when they close enrollment
|
|
|
|
|
and then calculating
|
|
|
|
|
$p_{i,n}(T)$
|
|
|
|
|
under the counterfactual where enrollment had not yet closed.
|
|
|
|
|
The difference
|
|
|
|
|
$\delta_{p_{i,n}}$
|
|
|
|
|
is then calculated for each trial, and saved.
|
|
|
|
|
After repeating this for all the posterior samples, we have an estimate
|
|
|
|
|
for the posterior distribution of differences.
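
As a sketch of the procedure described above, index the posterior draws by
$s = 1, \dots, S$ and write $\beta^{(s)}$ for draw $s$.
The superscript notation and the symbol $S$ are introduced here only for
illustration and are not part of the model definition.
Each draw yields one simulated difference per trial,
\begin{align}
  \delta_{p_{i,n}}^{(s)}
  = p_{i,n}\left(T \mid \beta^{(s)}\right)
  - p_{i,n}\left(C \mid \beta^{(s)}\right),
  \qquad s = 1, \dots, S,
\end{align}
and the collection of simulated values $\delta_{p_{i,n}}^{(1)}, \dots, \delta_{p_{i,n}}^{(S)}$
approximates the posterior distribution of $\delta_{p_{i,n}}$.
Any summary of that distribution can then be computed from the draws.
For example, the posterior mean difference is approximately
\begin{align}
  E\left(\delta_{p_{i,n}} \mid \text{data}\right)
  \approx \frac{1}{S} \sum_{s=1}^{S} \delta_{p_{i,n}}^{(s)}.
\end{align}
Because both probabilities lie in $[0, 1]$, every simulated difference lies in
$[-1, 1]$.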
|
|
|
|
|
|
|
|
|
|
The key results so far are related to the distribution of differences in $p$.
|
|
|
|
|
|
|
|
|
|
In Figure \ref{fig:pred_dist_diff_delay} we see that while most trials do not see any increased risk
|
|
|
|
|
from a delay in closing enrollment, there is a small group that does experience this.
|
|
|
|
|
|
|
|
|
|
\begin{figure}[H]
|
|
|
|
|
\includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-delay}
|
|
|
|
|
|
|
|
|
|
\small{
|
|
|
|
|
Values near 1 indicate a near perfect increase in the probability
|
|
|
|
|
of termination.
|
|
|
|
|
Values near 0 indicate little change in probability,
|
|
|
|
|
while values near $-1$ represent a decrease in the probability
|
|
|
|
|
of termination.
|
|
|
|
|
The scale is in probability points, thus a value near 1 is a change
|
|
|
|
|
from unlikely to terminate under control, to highly likely to
|
|
|
|
|
terminate.
|
|
|
|
|
}
|
|
|
|
|
\caption{Distribution of Predicted Differences}
|
|
|
|
|
\label{fig:pred_dist_diff_delay}
|
|
|
|
|
\end{figure}
|
|
|
|
|
|
|
|
|
|
Figure \ref{fig:pred_dist_dif_delay2} shows how this varies across disease categories.
|
|
|
|
|
We can see from figure
|
|
|
|
|
\ref{fig:pred_dist_diff_delay}
|
|
|
|
|
that there are roughly four regimes.
|
|
|
|
|
The first consists of trials that experience nearly no effect,
|
|
|
|
|
i.e. have values near zero.
|
|
|
|
|
Trials in the second regime experience a mild to large reduction in
|
|
|
|
|
the probability of termination, with X percent of the probability mass
|
|
|
|
|
between reductions of about 5 and 50 percentage points.
|
|
|
|
|
The third regime is those trials that experience a mild to large
|
|
|
|
|
increase in the probability of termination,
|
|
|
|
|
from an increase of 5 percentage points to about 75 percentage points.
|
|
|
|
|
The fourth and final regime is the X\% of trials that experience a significant
|
|
|
|
|
(greater than 75 percentage point) increase in the probability of
|
|
|
|
|
termination.
|
|
|
|
|
%Notes on interpretation
|
|
|
|
|
% - increase vs decrease on graph
|
|
|
|
|
% -
|
|
|
|
|
% -
|
|
|
|
|
% -
|
|
|
|
|
% -
|
|
|
|
|
|
|
|
|
|
Figure \ref{fig:pred_dist_dif_delay2} shows how this overall
|
|
|
|
|
result comes from different disease categories.
|
|
|
|
|
\begin{figure}[H]
|
|
|
|
|
\includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-delay-group}
|
|
|
|
|
|
|
|
|
|
    \caption{Distribution of Predicted Differences by Disease Group}
|
|
|
|
|
\label{fig:pred_dist_dif_delay2}
|
|
|
|
|
\end{figure}
|
|
|
|
|
|
|
|
|
|
We can also examine the direct effect from adding a single generic competitor drug.
|
|
|
|
|
Overall, we can see that there appear to be some trials that are highly
|
|
|
|
|
susceptible to enrollment difficulties, and this appears to hold for all the
|
|
|
|
|
disease categories.
|
|
|
|
|
This may be due to the small sample:
the hierarchical model partially pools results across disease categories,
and the sample size per disease is rather small.
|
|
|
|
|
An additional explanation is that the variance in parameters
|
|
|
|
|
might be high enough for the change to
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Although it is not causally identified due to population interactions,
|
|
|
|
|
we can examine the direct effect from adding a single generic competitor drug
|
|
|
|
|
and how a similar overall result decomposes very differently.
|
|
|
|
|
Figure
|
|
|
|
|
\ref{fig:pred_dist_diff_generic}
|
|
|
|
|
shows a very similar result with roughly the same regimes,
|
|
|
|
|
while
|
|
|
|
|
Figure \ref{fig:pred_dist_dif_generic2}
|
|
|
|
|
shows that this breakdown is different.
|
|
|
|
|
\todo{
|
|
|
|
|
Consider moving these to an appendix as they are
|
|
|
|
|
just additions at this point.
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
\begin{figure}[H]
|
|
|
|
|
\includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-generic}
|
|
|
|
|
|
|
|
|
|
\caption{
|
|
|
|
|
Distribution of Predicted Differences for one additional generic
|
|
|
|
|
competitor
|
|
|
|
|
}
|
|
|
|
|
\label{fig:pred_dist_diff_generic}
|
|
|
|
|
\end{figure}
|
|
|
|
|
|
|
|
|
|
Figure \ref{fig:pred_dist_dif_generic2} shows how this varies across disease categories.
|
|
|
|
|
\begin{figure}[H]
|
|
|
|
|
\includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-generic-group}
|
|
|
|
|
\caption{}
|
|
|
|
|
|