more updates

claude_rewrite
will king 1 year ago
parent 5d9640ab8d
commit 1630af2928

@ -19,7 +19,7 @@ First, some notation:
\item $y_i$: whether each trial
terminated (true, 1) or completed (false, 0).
\item $d_i$: indexes the ICD-10 disease category of the trial.
\item $x_{i,n}$: represents the other dependent
\item $x_{i,n}$: represents the independent
variables associated with the snapshot.
\end{itemize}

@ -1,113 +1,153 @@
\documentclass[../Main.tex]{subfiles}
\begin{document}
%\subsection{Data Exploration} %TODO: fill this out later.
%look at trial
In this section
I describe the model fitting, the posteriors of the parameters of interest,
and interpret the results.
\subsection{Model Fitting}
In this section we examine the results from fitting the econometric model using
mc-stan (\cite{mc-stan}) through the rstan (\cite{rstan}) interface.
I fit the econometric model using mc-stan
\cite{standevelopmentteam_StanModelling_2022}
through the rstan
\cite{standevelopmentteam_RStanInterface_2023}
interface.
I had X trials with X snapshots in total. \todo{Fill out.}
%describe
The model was based on the hierarchical logistic regression model
presented in the Stan Users Guide (\cite{mc-stan}),
and was run with 2,500 warmup iterations and
2,500 sampling iterations in six chains.
Sampling produced several warnings, including 160 divergent transitions and an R-hat
measure of 1.49.
Overall, these suggest that the econometric model is incorrect as
written or requires reparameterization.
%TODO: and info about how I learned about these diagnostics
% \subsubsection{Diagnostics}
% %Examine trank plots
% To identify which parameters were problematic, I first looked at trace rank
% histograms.
% Under ideal circumstances, each line (representing a chain) should exchange
% places with the other lines frequently.
% In both \cref{fig:mu_trank} and \cref{fig:sigma_trank}, most parameters seem
% to mix well but there are a couple of exceptions.
% This warrants further investigation.
%
% \begin{figure}[H]
% \includegraphics[width=\textwidth]{../assets/img/mu_trank.png}
% \caption{Trace Rank Histogram: Mu values}
% \label{fig:mu_trank}
% \end{figure}
%
% \begin{figure}[H]
% \includegraphics[width=\textwidth]{../assets/img/sigma_trank.png}
% \caption{Trace Rank Histogram: Sigma values}
% \label{fig:sigma_trank}
% \end{figure}
%
% %Take a look at batman and points for mu
% In the case of the Mu values, a parallel coordinates plot
% doesn't seem to indicate any parameters as likely candidates
% for causing the issues with divergent transitions.
% \begin{figure}[H]
% \includegraphics[width=\textwidth]{../assets/img/mu_batman.png}
% \caption{Parallel Coordinate Plot: Mu values}
% \label{fig:mu_batman}
% \end{figure}
% Note that at each parameter, there is some level of dispersion between
% values that diverged.
%
% On the other hand, in the parallel coordinates plot for sigma values,
% it appears that most divergent transitions occur with values of
% sigma[1], sigma[3], sigma[6], and sigma[7] close to zero.
% \begin{figure}[H]
% \includegraphics[width=\textwidth]{../assets/img/sigma_batman.png}
% \caption{Parallel Coordinate Plot: Sigma values}
% \label{fig:sigma_batman}
% \end{figure}
% Overall this suggests that there is an issue with the specification
% of the covariance structures of the hyperparameters.
%
% Additional evidence that the covariance structure is incorrect comes from
% plotting pairs of parameter values and examining the chains with divergent
% transitions.
%
% \begin{figure}[H]
% \includegraphics[width=\textwidth]{../assets/img/sigma_pairs_5-9.png}
% \caption{Parameter Pairs plots: Sigma[5] through Sigma[9]}
% \label{fig:sigma_pairs_5-9.png}
% \end{figure}
% From this we can see that divergent pairs are highly correlated with the cases
% where sigma[6] or sigma[7] are equal to zero.
% This has an impact on the shape of both of those estimated parameters, causing
% both to be bimodal.
X\todo{UPDATE VALUES}
warmup iterations and
X\todo{UPDATE VALUES}
sampling iterations in six chains.
% \subsection{Data Exploration}
% \todo{fill this out later.}
%look at trial
\subsection{Interpretation}
% Explain
% - What do we care about? Changes in the probability of
% - distribution of differences -> relate to E(\delta Y)
% - How do we obtain this distribution of differences?
% - from the model, we pay attention to P under treatment and control
% - We obtain this by fitting the model, then simulating under treatment and control, and taking the difference in the probability.
% -
The specific measure of interest is how much a delay in
closing enrollment changes the probability of terminating a trial
$p_{i,n}$ in the model.
In standard reduced-form causal inference, the treatment effect
of interest for outcome $Z$ is measured as
\begin{align}
E(Z(\text{Treatment}) - Z(\text{Control}))
= E(Z(\text{Treatment})) - E(Z(\text{Control}))
\end{align}
Because $Z(\text{Treatment})$ and $Z(\text{Control})$ are random variables,
their difference $\delta_Z = Z(\text{Treatment}) - Z(\text{Control})$ is also a random variable.
In the Bayesian framework, this parameter has a distribution, and so
we can calculate the distribution of differences in
the probability of termination due to a given delay in
closing recruitment,
$p_{i,n}(T) - p_{i,n}(C) = \delta_{p_{i,n}}$.
I calculate the posterior distribution of $\delta_{p_{i,n}}$ by estimating the
posterior distributions of the $\beta$s and then simulating $\delta_{p_{i,n}}$.
This involves taking a draw from the $\beta$s distribution, calculating
$p_{i,n}(C)$
for the underlying trials at the snapshot when they close enrollment
and then calculating
$p_{i,n}(T)$
under the counterfactual where enrollment had not yet closed.
The difference
$\delta_{p_{i,n}}$
is then calculated for each trial, and saved.
After repeating this for all the posterior samples, we have an estimate
for the posterior distribution of differences.
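As a sketch of this procedure in symbols, suppose there are $S$ posterior draws
$\beta^{(s)}$, and write $x_{i,n}(C)$ and $x_{i,n}(T)$ for the covariates of
trial $i$ at snapshot $n$ under control and under the counterfactual delay.
This notation, and the inverse-logit link implied by the logistic regression,
are assumptions made here for illustration; the disease-specific indexing of
the $\beta$s is suppressed.
\begin{align}
p_{i,n}^{(s)}(C) &= \operatorname{logit}^{-1}\left(x_{i,n}(C)\,\beta^{(s)}\right) \\
p_{i,n}^{(s)}(T) &= \operatorname{logit}^{-1}\left(x_{i,n}(T)\,\beta^{(s)}\right) \\
\delta_{p_{i,n}}^{(s)} &= p_{i,n}^{(s)}(T) - p_{i,n}^{(s)}(C),
\qquad s = 1, \dots, S
\end{align}
The collection $\{\delta_{p_{i,n}}^{(s)}\}_{s=1}^{S}$ then approximates the
posterior distribution of differences for each trial, and its average over $s$
approximates the posterior mean of $\delta_{p_{i,n}}$.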
The key results so far are related to the distribution of differences in $p$.
In figure \ref{fig:pred_dist_diff_delay} we see that while most trials do not see any increased risk
from a delay in closing enrollment, there is a small group that does.
\begin{figure}[H]
\includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-delay}
\caption{}
\small{
Values near 1 indicate a near perfect increase in the probability
of termination.
Values near 0 indicate little change in probability,
while values near $-1$ represent a decrease in the probability
of termination.
The scale is in probability points, thus a value near 1 is a change
from unlikely to terminate under control, to highly likely to
terminate.
}
\caption{Distribution of Predicted Differences}
\label{fig:pred_dist_diff_delay}
\end{figure}
Figure \ref{fig:pred_dist_dif_delay2} shows how this varies across disease categories.
We can see from figure
\ref{fig:pred_dist_diff_delay}
that there are roughly four regimes.
The first consists of trials that experience nearly no effect,
i.e. have values near zero.
Trials in the second regime experience a mild to large reduction in
the probability of termination, with X percent of the probability mass
in reductions of between about 5 and 50 percentage points.
The third regime is those trials that experience a mild to large
increase in the probability of termination,
from an increase of 5 percentage points to about 75 percentage points.
The fourth and final regime is the X\% of trials that experience a significant
(greater than 75 percentage point) increase in the probability of
termination.
%Notes on interpretation
% - increase vs decrease on graph
% -
% -
% -
% -
Figure \ref{fig:pred_dist_dif_delay2} shows how this overall
result comes from different disease categories.
\begin{figure}[H]
\includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-delay-group}
\caption{}
\caption{Distribution of Predicted differences by Disease Group}
\label{fig:pred_dist_dif_delay2}
\end{figure}
We can also examine the direct effect from adding a single generic competitor drug.
Overall, we can see that there appear to be some trials that are highly
susceptible to enrollment difficulties, and this appears to hold for all the
disease categories.
This may be due to small samples:
the hierarchical model partially pools results,
and the sample size per disease is rather small.
An additional explanation is that the variance in parameters
might be high enough for the change to
Although it is not causally identified due to population interactions,
we can examine the direct effect from adding a single generic competitor drug
and how a similar overall result decomposes very differently.
Figure
\ref{fig:pred_dist_diff_generic}
shows a very similar result with roughly the same regimes,
while figure
\ref{fig:pred_dist_dif_generic2}
shows that the breakdown across disease categories is different.
\todo{
Consider moving these to an appendix as they are
just additions at this point.
}
\begin{figure}[H]
\includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-generic}
\caption{}
\caption{
Distribution of Predicted Differences for one additional generic
competitor
}
\label{fig:pred_dist_diff_generic}
\end{figure}
Figure \ref{fig:pred_dist_dif_generic2} shows how this varies across disease categories.
\begin{figure}[H]
\includegraphics[width=\textwidth]{../assets/img/current/pred_dist_diff-generic-group}
\caption{}
