\todo{Replace this graphic with the histdiff with boxplot}
\small{
\small{
Values near 1 indicate a near perfect increase in the probability
of termination.
There are a few interesting things to point out here.
Let's start by getting acquainted with the details of the distribution above.
It can be divided into a few different regimes.
% - spike at 0
% - the boxplot
% - 63% of mass below 0 : find better way to say that
% - For a random trial, there is a 63% chance that the impact is to reduce the probability of a termination.
% - 2 pctg-point wide band centered on 0 has ~13% of the mass
% - mean represents 9.x% increase in probability of termination. A quick simulation gives about the same pctg-point increase in terminated trials.
A few interesting interpretations come out of this.
% - there are 3 regimes: low impact (near zero), medium impact (concentrated in decreased probability of termination), and high impact (concentrated in increased probability of termination).
The first is that there appear to be three different regimes.
The first regime consists of the low impact results, i.e. those values of $\delta_p$
near zero.
About 13\% of trials lie within a single percentage point change of zero,
a case where they were likely or almost certain to terminate the trial early.
% - the high impact regime is strange because it consists of trials that moved from unlikely (<20% chance) of termination to a high chance (>80% chance) of termination. Something like 5% of all trials have a greater than 98 percentage point increase in termination. Not sure what this is doing.
% - Potential Explanations for high impact regime:
Based on the boxplot below, there are a few things to note.
First, the median effect is a 2.3 percentage point decrease
in the probability of termination.
Second, for a random selection from our trials,
there is a 63\% chance that the impact is to
reduce the probability of a termination.
Third, about 13\% of the probability mass is contained within the interval
$[-0.01, 0.01]$.
Finally, the mean effect is measured as a 9.6 percentage point increase in
the probability of termination.
The full percentile table can be found in
appendix \ref{Appendix:Results}.

% Looking at the spike around zero, we find that 13.09% of the probability mass
% is contained within the band from [-1,1].
% Additionally, there was 33.4282738% of the probability above that
% -- representing those with a general increase in the
% probability of termination -- and 53.4817262% of the probability mass
% below the band -- representing a decrease in the probability of termination.
% On average, if you keep the trial open instead of closing it, 63.37363% of
% trials will see a decrease in the probability of termination, but, due to
% the high increase in probability of termination given termination was
% increased, the mean probability of termination increases by 0.0964726.

% Pulled the data from the report

How could this intervention have such a wide range in the intensity
and direction of impacts?
Possible explanations include that some trials are especially susceptible to
enrollment problems or that this is a
result of too little data.
% - Some trials are highly susceptible. This is the face value effect
One option is that some categories are more susceptible to
issues with participant enrollment.
If this is the case, we should be able to isolate categories that contribute
the most to this effect.
Another is that this might be a modelling artefact, due to the relatively
low number of trials in certain ICD-10 categories.
In short, there might be high levels of uncertainty in some parameter values,
which manifest as fat tails in the distributions of the $\beta$ parameters.
Because of the logistic format of the model, these fat tails lead to
extreme values of $p$, and potentially large changes $\delta_p$.
% - Could be uncertainty. If the model is highly uncertain, e.g. there isn't enough data, we could have a small percentage of large increases. This could be in general or just for a few categories with low amounts of data.
% -
% -
I believe that this second explanation -- a model artifact due to uncertainty --
is likely to be the cause.
Three points lead me to believe this:
\begin{itemize}
\item The low E-BFMI values suggest that the sampler is struggling
to explore some regions of the posterior.
According to \cite{standevelopmentteam_RuntimeWarnings_2022} this is
often due to thick tails of posterior distributions.
\item When we examine the results across different ICD-10 groups,
\caption{Distribution of parameters associated with ``Active, not recruiting'' status, by ICD-10 Category}
\end{figure}
\end{figure}
% -
Finally, in figure \ref{fig:parameters_ANR_by_group}, we can see the estimated distributions of the $\beta$ parameter for
the status: \textbf{Active, not recruiting}.
The prior distributions were centered on zero, but we can see that the pooled learning has moved the mean
values negative, representing reductions in the probability of termination across the board.
This decrease in the probability of termination is strongest in the categories of Neoplasms ($n=49$),
Musculoskeletal diseases ($n=17$), and Infections and Parasites ($n=20$), the three categories with the most data.
As this is a comparison against the trial status XXX, we note that
\todo{The natural comparison I want to make is against the Recruiting status. Do I want to redo this so that I can read that directly? It shouldn't affect the $\delta_p$ analysis, but this could probably use it. YES, THIS UPDATE NEEDS TO HAPPEN. The base needs to be ``active not recruiting.''}
Overall, this suggests that extending a clinical trial's enrollment period will reduce the probability of termination.

\subsection{Primary Results}
The primary, causally identified value we can estimate is the change in
the probability of termination caused by (counterfactually) keeping enrollment
open instead of closing enrollment when observed.
In figure \ref{fig:pred_dist_diff_delay} below, we see this impact of
keeping enrollment open.
Values near 1 indicate a near perfect increase in the probability
of termination.
Values near 0 indicate little change in probability,
while values near $-1$ represent a decrease in the probability
of termination.
The scale is in probability points; thus a value near 1 is a change
from unlikely to terminate under control to highly likely to
terminate.
}
\caption{Histogram of the Distribution of Predicted Differences}
\label{fig:pred_dist_diff_delay}
\end{figure}
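As an aid for reading the figure, the plotted quantity can be written out explicitly.
This is a sketch of the definition implied by the text, not a restatement of the model:
the symbols $p_i^{\mathrm{open}}$ and $p_i^{\mathrm{closed}}$ are notation assumed here for illustration,
denoting the predicted probability that trial $i$ terminates when enrollment is
(counterfactually) kept open and when it is closed as observed, respectively.
\[
    \delta_{p,i} = p_i^{\mathrm{open}} - p_i^{\mathrm{closed}},
    \qquad -1 \le \delta_{p,i} \le 1 .
\]
Under this convention, $\delta_{p,i} > 0$ means that keeping enrollment open raises
the predicted probability of termination, and $\delta_{p,i} < 0$ means that it lowers it.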
There are a few interesting things to point out here.
Let's start by getting acquainted with the details of the distribution above.
% - spike at 0
% - the boxplot
% - 63% of mass below 0 : find better way to say that
% - For a random trial, there is a 63% chance that the impact is to reduce the probability of a termination.
% - 2 pctg-point wide band centered on 0 has ~13% of the mass
% - mean represents 9.x% increase in probability of termination. A quick simulation gives about the same pctg-point increase in terminated trials.
A few interesting interpretation bits come out of this.
% - there are 3 regimes: low impact (near zero), medium impact (concentrated in decreased probability of termination), and high impact (concentrated in increased probability of termination).
The first is that there appear to be three different regimes.
The first regime consists of the low impact results, i.e. those values of $\delta_p$
near zero.
About 13\% of trials lie within a single percentage point of zero,
suggesting that there is a reasonable chance that delaying
the close of enrollment has little or no impact.
The second regime consists of moderate impacts on clinical trials'
probabilities of termination, say values in the interval $[-0.5, 0.5]$
on the graph.
Most of this probability mass represents a decrease in the probability of
termination, some of it rather large.
Finally, there is the high impact regime, almost exclusively concentrated
around increases in the probability of termination at $\delta_p > 0.75$.
These represent cases where delaying the close of enrollment changes a trial
from one that was highly likely to complete its primary objectives to
one that was likely or almost certain to terminate early.
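A short calculation suggests how much weight the positive side of the distribution must carry.
It uses only the 63\% share of trials with a decrease and the 9.6 percentage point mean increase
reported for this distribution; the symbols $\bar{\delta}^{-}$ and $\bar{\delta}^{+}$
(the average $\delta_p$ among trials with a decrease and among the remaining trials)
are notation introduced here for illustration.
\[
    0.096 \approx 0.63\,\bar{\delta}^{-} + 0.37\,\bar{\delta}^{+},
    \qquad \bar{\delta}^{-} < 0
    \quad\Longrightarrow\quad
    \bar{\delta}^{+} > \frac{0.096}{0.37} \approx 0.26 .
\]
In other words, among the roughly 37\% of trials that do not see a decrease,
the average increase in the probability of termination must exceed about 26 percentage points,
which is consistent with a cluster of very large positive values.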
% - the high impact regime is strange because it consists of trials that moved from unlikely (<20% chance) of termination to a high chance (>80% chance) of termination. Something like 5% of all trials have a greater than 98 percentage point increase in termination. Not sure what this is doing.
% - Potential Explanations for high impact regime:
This leads to the question:
``How could this intervention have such a wide range in the intensity
and direction of impacts?''
The most likely explanations in my mind are either that
some trials are highly susceptible to enrollment struggles or that this is a
modelling artifact.
% - Some trials are highly susceptible. This is the face value effect
The first option -- that some categories are more susceptible to
issues with participant enrollment -- should allow us to
isolate categories or trials that contribute the most to this effect.
In figure
\ref{fig:pred_dist_dif_delay2}, it appears that most of the trials have
this high impact regime at $\delta_p > 0.75$.
Another explanation is that this is a modelling artefact due to priors
with strong tails and the relatively low number of trials in
each ICD-10 category.
In short, there might be high levels of uncertainty in some parameter values,
which manifest as fat tails in the distributions of the $\beta$ parameters.
Because of the logistic format of the model, these fat tails lead to
extreme values of $p$, and potentially large changes $\delta_p$.
% - Could be uncertainty. If the model is highly uncertain, e.g. there isn't enough data, we could have a small percentage of large increases. This could be in general or just for a few categories with low amounts of data.
% -
I believe that this second explanation -- a model artifact due to uncertainty --
is likely to be the cause.
A few things lead me to believe this:
\begin{itemize}
\begin{itemize}
\item The low E-BFMI values suggest that the sampler is struggling
to explore some regions of the posterior.
According to \cite{standevelopmentteam_RuntimeWarnings_2022} this is
often due to thick tails of posterior distributions.
\caption{Distribution of Predicted Differences by Disease Group}
\label{fig:pred_dist_dif_delay2}
\end{figure}
% Examine beta parameters
\end{itemize}
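The mechanism behind the model-artifact explanation can be illustrated with a generic logistic link.
This is a sketch under assumptions: the linear predictor $\eta$ and the superscripts
$\mathrm{open}$ and $\mathrm{closed}$ are notation introduced here for illustration,
and the exact form of the model is not restated.
\[
    p = \frac{1}{1 + e^{-\eta}},
    \qquad
    \delta_p = \frac{1}{1 + e^{-\eta^{\mathrm{open}}}} - \frac{1}{1 + e^{-\eta^{\mathrm{closed}}}} .
\]
For example, $\eta = -4$ gives $p \approx 0.018$ and $\eta = 4$ gives $p \approx 0.982$,
so a posterior draw that moves the linear predictor from $-4$ to $4$ yields $\delta_p \approx 0.96$.
Shifts of this size should be rare when the posterior for $\beta$ has thin tails,
but they become more frequent when the tails are fat, which could produce a
concentration of $\delta_p$ values near 1.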
% - Little movement except where data is strong, general negative movement. Still really wide
% - Note how they all learned (partial pooling) reduction in \beta from ANR?
% - Need to discuss the 5 different states. Can't remember which one is dropped for the life of me. May need to fix parameterization.
% -
Finally, in figure \ref{fig:parameters_ANR_by_group}, we can see the estimated distributions of the $\beta$ parameter for
the status: \textbf{Active, not recruiting}.
The prior distributions were centered on zero, but we can see that the pooled learning has moved the mean
values negative, representing reductions in the probability of termination across the board.
This decrease in the probability of termination is strongest in the categories of Neoplasms ($n=49$),
Musculoskeletal diseases ($n=17$), and Infections and Parasites ($n=20$), the three categories with the most data.
As this is a comparison against the trial status XXX, we note that
\todo{The natural comparison I want to make is against the Recruiting status. Do I want to redo this so that I can read that directly? It shouldn't affect the $\delta_p$ analysis, but this could probably use it.}
Overall, this suggests that extending a clinical trial's enrollment period will reduce the probability of termination.