Compare commits

No commits in common. 'caee87f86205b9b9b9db6c3458c6a2f548a4ff07' and 'fff56b52ea20682823e0fac77b54c2e04d529961' have entirely different histories.

@ -24,7 +24,7 @@
\titlespacing*{\paragraph} \titlespacing*{\paragraph}
{0pt}{3.25ex plus 1ex minus .2ex}{1.5ex plus .2ex} {0pt}{3.25ex plus 1ex minus .2ex}{1.5ex plus .2ex}
\title{The effects of open enrollment on the \title{The effects of market conditions and enrollment on the
completion of clinical trials\\ \small{Preliminary Draft}} completion of clinical trials\\ \small{Preliminary Draft}}
\author{William King} \author{William King}

@ -81,7 +81,7 @@ open instead of closing enrollment when observed.
In figure \ref{fig:pred_dist_diff_delay} below, we see this impact of In figure \ref{fig:pred_dist_diff_delay} below, we see this impact of
keeping enrollment open. keeping enrollment open.
% \begin{minipage}{\textwidth}
\begin{figure}[H] \begin{figure}[H]
\includegraphics[width=\textwidth]{../assets/img/dist_diff_analysis/p_delay_intervention_distdiff_boxplot} \includegraphics[width=\textwidth]{../assets/img/dist_diff_analysis/p_delay_intervention_distdiff_boxplot}
\small{ \small{
@ -98,38 +98,9 @@ keeping enrollment open.
\label{fig:pred_dist_diff_delay} \label{fig:pred_dist_diff_delay}
\end{figure} \end{figure}
\begin{table}[H]
\centering
\caption{Boxplot Summary Statistics}
\label{table:boxplotsummary}
\begin{tabular}{ | c c c c c c c c | }
\hline
5th & 10th & 25th & median &
75th & 90th & 95th & mean \\
\hline
-0.376 & -0.264 & -0.129 & -0.023 &
0.145 & 0.925 & 0.982 & 0.096 \\
\hline
\end{tabular}
\end{table}
% \end{minipage}
The key figures from the boxplot in figure
\ref{fig:pred_dist_diff_delay}
are summarized in table \ref{table:boxplotsummary}.
There are a few interesting things to point out here. There are a few interesting things to point out here.
Let's start by getting acquainted with the details of the distribution above. Let's start by getting acquainted with the details of the distribution above.
A couple more points It can be divided into a few different regimes.
First, 63\% of the probability mass is equal to or below zero.
Second, about 13\% of the probability mass is contained within the interval
[-0.01,0.01].
The full 5\% percentile table can be found in table
\ref{TABLE:PercentilesOfDistributionOfDifferences}
in appendix
\ref{Appendix:Results}.
It can also be divided into a few different regimes.
% - spike at 0 % - spike at 0
% - the boxplot % - the boxplot
% - 63% of mass below 0 : find better way to say that % - 63% of mass below 0 : find better way to say that
@ -146,15 +117,29 @@ The second regime consists of the moderate impact on clinical trials'
probabilities of termination, say values in the interval $[-0.5, 0.5]$ probabilities of termination, say values in the interval $[-0.5, 0.5]$
on the graph. on the graph.
Most of this probability mass represents a decrease in the probability of Most of this probability mass represents a decrease in the probability of
a termination, some of it rather large decreases. a termination, some of it rather large.
The third regime consists of the high impact region, Finally, there exists the high impact region, almost exclusively concentrated
almost exclusively concentrated above increases in the probability of around increases in the probability of termination at $\delta_p > 0.75$.
termination $\delta_p > 0.75$.
These represent cases where delaying the close of enrollment changes a trial These represent cases where delaying the close of enrollment changes a trial
from a case where they were highly likely to complete their primary objectives to from a case where they were highly likely to complete their primary objectives to
a case where they were likely or almost certain to terminate the trial early. a case where they were likely or almost certain to terminate the trial early.
% - the high impact regime is strange because it consists of trials that moved from unlikely (<20% chance) of termination to a high chance (>80% chance) of termination. Something like 5% of all trials have a greater than 98 percentage point increase in termination. Not sure what this is doing. % - the high impact regime is strange because it consists of trials that moved from unlikely (<20% chance) of termination to a high chance (>80% chance) of termination. Something like 5% of all trials have a greater than 98 percentage point increase in termination. Not sure what this is doing.
Based on the boxplot below, there are a couple of things to note.
First, the median effect is a 2.3 percentage point decrease
in the probability of termination.
Second, for a random selection from our trials,
there is a 63\% chance that the impact is to
reduce the probability of a termination.
Third, about 13\% of the probability mass is contained within the interval
[-0.01,0.01].
Finally, the mean effect is measured as a 9.6 percentage point increase in
the probability of termination.
The full percentile table can be found in table
\ref{TABLE:PercentilesOfDistributionOfDifferences}
in appendix
\ref{Appendix:Results}.
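The summary quantities quoted here (median, mean, share of mass at or below zero, and share of mass in a narrow band around zero) can be computed from a sample of differences in termination probability. The following is only a minimal sketch under stated assumptions: the function name `summarize_differences`, the `band` parameter, and the example values are hypothetical and are not the data or code used in this analysis.

```python
from statistics import mean, median


def summarize_differences(diffs, band=0.01):
    """Summarize a sample of differences in termination probability.

    `diffs` is assumed to be a list of values in [-1, 1], one per draw.
    `band` is the half-width of the interval around zero (an assumption).
    """
    n = len(diffs)
    return {
        "median": median(diffs),
        "mean": mean(diffs),
        # Fraction of draws where termination probability did not increase.
        "share_at_or_below_zero": sum(d <= 0 for d in diffs) / n,
        # Fraction of draws inside [-band, band].
        "share_within_band": sum(abs(d) <= band for d in diffs) / n,
    }


# Made-up values for illustration only; these are not the paper's data.
example = [-0.30, -0.10, -0.02, 0.0, 0.005, 0.95]
summary = summarize_differences(example)
```

With these made-up values the median is negative while the mean is positive, because a single large positive value pulls the mean upward. This is the same qualitative pattern as a distribution with most of its mass below zero and a heavy upper tail.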
% Looking at the spike around zero, we find that 13.09% of the probability mass % Looking at the spike around zero, we find that 13.09% of the probability mass
% is contained within the band from [-1,1]. % is contained within the band from [-1,1].
% Additionally, there was 33.4282738% of the probability above that % Additionally, there was 33.4282738% of the probability above that
@ -190,20 +175,7 @@ tend to have a similar results:
Again, note the high mass near zero, the general decrease in the probability Again, note the high mass near zero, the general decrease in the probability
of termination, and then the strong upper tails. of termination, and then the strong upper tails.
Continuing to the $\beta$ parameters in figure Continuing to the $\beta$ parameters,
\ref{fig:parameters_ANR_by_group},
we can see the estimated distributions for
the status: \textbf{Active, not recruiting}.
The prior distributions were centered on zero, but we can see that the
pooled learning has moved the mean
values negative, representing reductions in the probability of termination
across the board.
This decrease in the probability of termination is strongest in the categories of Neoplasms ($n=49$),
Musculoskeletal diseases ($n=17$), and Infections and Parasites ($n=20$), the three categories with the most data.
As this is a comparison against the trial status XXX, we note that YYY.
\todo{The natural comparison I want to make is against the Recruiting status. Do I want to redo this so that I can read that directly? It shouldn't affect the $\delta_p$ analysis, but this could probably use it. YES, THIS UPDATE NEEDS TO HAPPEN. The base needs to be ``active not recruiting.''}
Overall, this is consistent with the result that extending a clinical trial's enrollment period will reduce the probability of termination.
\begin{figure}[H] \begin{figure}[H]
\includegraphics[width=\textwidth]{../assets/img/betas/parameter_across_groups/parameters_12_status_ANR} \includegraphics[width=\textwidth]{../assets/img/betas/parameter_across_groups/parameters_12_status_ANR}
\caption{Distribution of parameters associated with ``Active, not recruiting'' status, by ICD-10 Category} \caption{Distribution of parameters associated with ``Active, not recruiting'' status, by ICD-10 Category}
@ -211,6 +183,15 @@ Overall, this is consistent with the result that extending a clinical trial's en
\end{figure} \end{figure}
% - % -
Finally, in figure \ref{fig:parameters_ANR_by_group}, we can see the estimated distributions of the $\beta$ parameter for
the status: \textbf{Active, not recruiting}.
The prior distributions were centered on zero, but we can see that the pooled learning has moved the mean
values negative, representing reductions in the probability of termination across the board.
This decrease in the probability of termination is strongest in the categories of Neoplasms ($n=49$),
Musculoskeletal diseases ($n=17$), and Infections and Parasites ($n=20$), the three categories with the most data.
As this is a comparison against the trial status XXX, we note that
\todo{The natural comparison I want to make is against the Recruiting status. Do I want to redo this so that I can read that directly? It shouldn't affect the $\delta_p$ analysis, but this could probably use it. YES, THIS UPDATE NEEDS TO HAPPEN. The base needs to be ``active not recruiting.''}
Overall, this suggests that extending a clinical trial's enrollment period will reduce the probability of termination.
% - Potential Explanations for high impact regime: % - Potential Explanations for high impact regime:
This leads to the question: This leads to the question:
@ -220,15 +201,12 @@ The most likely explanations in my mind are that either
some trials are highly susceptible to enrollment struggles or that this is a some trials are highly susceptible to enrollment struggles or that this is a
modelling artifact. modelling artifact.
% - Some trials are highly susceptible. This is the face value effect % - Some trials are highly susceptible. This is the face value effect
The first option -- that some trials are more susceptible to The first option -- that some categories are more susceptible to
issues with participant enrollment -- should allow us to issues with participant enrollment -- should allow us to
isolate categories or trials that contribute the most to this effect. isolate categories or trials that contribute the most to this effect.
This is not what we find when we inspect the categories In figure
in figure \ref{fig:pred_dist_dif_delay2}, it appears that most of the trials have
\ref{fig:pred_dist_dif_delay2}. this high impact regime at $\delta_p > 0.75$.
Instead it appears that most of the categories have this high
impact regime when $\delta_p > 0.75$, although the maximum value
of this regime varies considerably.
Another explanation is that this is a modelling artefact due to priors Another explanation is that this is a modelling artefact due to priors
with strong tails and the relatively low number of trials in with strong tails and the relatively low number of trials in
@ -243,9 +221,7 @@ A few things lead me to believe this:
\begin{itemize} \begin{itemize}
\item The low fractions of E-BFMI suggest that the sampler is struggling \item The low fractions of E-BFMI suggest that the sampler is struggling
to explore some regions of the posterior. to explore some regions of the posterior.
According to According to \cite{standevelopmentteam_RuntimeWarnings_2022} this is
\cite{standevelopmentteam_runtimewarningsconvergence_2022}
this is
often due to thick tails of posterior distributions. often due to thick tails of posterior distributions.
During earlier analysis, when I had about 100 trials, the number of During earlier analysis, when I had about 100 trials, the number of
warnings was significantly higher. warnings was significantly higher.
@ -258,14 +234,14 @@ A few things lead me to believe this:
we see that most ICD-10 categories we see that most ICD-10 categories
have fat tails in the $\beta$s, even among the categories have fat tails in the $\beta$s, even among the categories
with relatively larger sample sizes. with relatively larger sample sizes.
\end{itemize} \end{itemize}
Overall, it is hard to escape the conclusion that more data is needed across Overall, it is hard to escape the conclusion that more data is needed across
many -- if not all -- of the disease categories. many -- if not all -- of the disease categories.
At the same time, the median result is a decrease in the probability At the same time, the median result is a decrease in the probability
of termination when the enrollment period is held open. of termination when the enrollment period is held open.
My inclination is to believe that the overall effect is to reduce the
probability of termination.
\end{document} \end{document}

@ -4,15 +4,13 @@
\begin{document} \begin{document}
As noted above, there are various issues with the analysis as completed so far. As noted above, there are various issues with the analysis as completed so far.
Below I discuss various issues and ways to address them that I believe Below I discuss various issues and ways to address them that I believe will improve the analysis.
will improve the analysis.
\subsection{Increasing number of observations} \subsection{Increasing number of observations}
The most important step is to increase the number of observations available, The most important step is to increase the number of observations available.
specifically the number of trials matched to ICD-10 codes with corresponding Currently this requires matching trials to ICD-10 codes by hand.
population estimates in the Global Burden of Disease Dataset. Improvements in Large-Language-Models may make this data more accessible, or
Improvements in Large Language Models may make this data more accessible, or
the data may be available in a commercial dataset. the data may be available in a commercial dataset.
@ -26,11 +24,13 @@ In most cases the trial sponsor reports the anticipated enrollment value
while the trial is still recruiting and only updates the actual enrollment while the trial is still recruiting and only updates the actual enrollment
after the trial has ended. after the trial has ended.
Some trials do publish an incremental record of their enrollment numbers, Some trials do publish an incremental record of their enrollment numbers,
but this is not the norm. but this is rare.
It may be possible to impute the enrollment process if a suitable model Due to the Bayesian model used, it would be possible to
can be created. include a model of the missing data
% Due to the bayesian model used, this would be easy to incorporate \cite{mcelreath_statisticalrethinkingbayesian_2020}.
% \cite{mcelreath_statisticalrethinkingbayesian_2020}. which would
allow me to estimate the direct effect of slow enrollment
on clinical trial termination rates.
There has been substantial work on forecasting There has been substantial work on forecasting
multi-site enrollment rates and durations by multi-site enrollment rates and durations by
@ -51,31 +51,24 @@ multi-site enrollment rates and durations by
avalos-pacheco_validationpredictiveanalyses_2023, avalos-pacheco_validationpredictiveanalyses_2023,
} }
but choosing between the various single and multi-site models presented is but choosing between the various single and multi-site models presented is
difficult without a dataset with which to validate the results. difficult without a dataset to validate the results on.
% In addition to needing a well calibrated model, I would require more trials,
% specifically those that report their enrollment incrementally so
% that there is data on what happens when enrollment is slower than anticipated.
% It may also be possible to estimate the probability that enrollment goals \subsection{Improving Population Estimates}
% have been met if data can be extracted that details planned observation times.
% Of course, this is speculative at this point. The Global Burden of Disease dataset contains the best estimates of disease
%FIXTAG: Avoid speculation here. population sizes that I have found so far.
Unfortunately, for some conditions it can be relatively imprecise due to
% \subsection{Improving Population Estimates} its focus on providing data geared towards public health policy.
% For example, GBD contains categories for both
% The Global Burden of Disease dataset contains the best estimates of disease drug resistant and drug susceptible tuberculosis, but maps those to the same
% population sizes that I have found so far. ICD-10 code.
% Unfortunately, for some conditions it can be relatively imprecise due to In contrast, there is no category for non-age related macular degeneration.
% its focus on providing data geared towards public health policy. Thus not every trial has a good match with the estimate of the population of
% For example, GBD contains categories for both interest.
% drug resistant and drug susceptible tuberculosis, but maps those to the same Finding a way to focus on trials that have good disease population estimates
% ICD-10 code. would improve the efficiency of the analysis.
% In contrast, there is no category for non-age related macular degeneration.
% Thus not every trial has a good match with the estimate of the population of
% interest.
% Finding a way to focus on trials that have good disease population estimates
% would improve the efficiency of the analysis.
% %FIXTAG: What am I trying to say here. IHME is among the best data sources.
% % How do I propose getting other data? Should probably just remove this.
\subsection{Improving Measures of Market Conditions} \subsection{Improving Measures of Market Conditions}
@ -85,24 +78,18 @@ In addition to the fact that many diseases may be treated by non-pharmaceutical
means (e.g. diet, physical therapy, medical devices, etc), means (e.g. diet, physical therapy, medical devices, etc),
off-label prescription of pharmaceuticals is legal at the federal level off-label prescription of pharmaceuticals is legal at the federal level
\cite{commissioner_understandingunapproveduse_2019}. \cite{commissioner_understandingunapproveduse_2019}.
%FIXTAG: Discuss how there isn't much data about off label prescription (I have a source)
These two facts both complicate measuring competing treatments, These two facts both complicate measuring competing treatments,
a key part of market conditions. a key part of market conditions.
One way to address non-pharmaceutical treatments is to concentrate on domains One way to address non-pharmaceutical treatments is to concentrate on domains
that are primarily treated by pharmaceuticals. that are primarily treated by pharmaceuticals.
Another way to address this would be to focus the analysis on just a few specific Another way to address this would be to focus the analysis on just a few specific
diseases, for which a history of treatment options can be compiled. diseases, for which a history of treatment options can be compiled.
%FIXTAG: Get rid of 'another', doesn't match context
This second approach may also allow the researcher to distinguish the direction This second approach may also allow the researcher to distinguish the direction
of causality between population size and number of drugs on the market; of causality between population size and number of drugs on the market;
%FIXTAG: join better to prior sentence
for example, drugs to treat a chronic, non-fatal disease will probably not for example, drugs to treat a chronic, non-fatal disease will probably not
affect the market size much in the short to medium term. affect the market size much in the short to medium term.
This would require identifying diseases that are prime candidates and then This allows the effect of market conditions to be isolated from
trials and drugs associated with those diseases. the effects of the population.
% This allows the effect of market conditions to be isolated from
% the effects of the population.
% %FIXTAG: I am already proposing these as fixes
% Alternative approaches % Alternative approaches
% - diseases with constant kill rates? population effect should be relatively constant? % - diseases with constant kill rates? population effect should be relatively constant?

@ -5,39 +5,35 @@
Identifying commercial impediments to successfully completing Identifying commercial impediments to successfully completing
clinical trials in otherwise capable pharmaceuticals will hopefully clinical trials in otherwise capable pharmaceuticals will hopefully
lead to a more robust and competitive pharmaceutical market. lead to a more robust and competitive pharmaceutical market.
%FIXTAG: too much "hopefully"
Although the current state of this research is insufficient to draw robust Although the current state of this research is insufficient to draw robust
conclusions, these early results suggest that delaying the close of conclusions, these early results suggest that delaying the close of
enrollment period reduces the probability of termination of a trial. enrollment periods reduces the probability of termination of a trial.
%FIXTAG: OK for now but I think there might be a better way to handle this for now
% The successful completion of Phase III clinical trials is crucial for The successful completion of Phase III clinical trials is crucial for
% bringing new treatments to market. bringing new treatments to market.
%FIXTAG: needs to be earlier This research provides insights into how enrollment management
impacts trial outcomes.
While the preliminary results suggest that delaying the close of enrollment
periods may reduce termination probability, the analysis
reveals significant variation across disease categories and highlights
important methodological challenges.
The primary limitation that must be addressed before drawing a strong conclusion The primary limitation that must be addressed before drawing a strong conclusion
is that of insufficient data. is that of insufficient data.
%FIXTAG: This needs rewritten
This takes two forms. This takes two forms.
%FIXTAG: needs better transition
The first is the small sample size. The first is the small sample size.
To overcome this requires an improved data matching To overcome this requires an improved data matching
approach and a revised data scraper. approach and a revised data scraper.
%FIXTAG: active voice: this can be overcome by...
The second is creating a model of enrollment that can be used to address The second is creating a model of enrollment that can be used to address
the causal identification issue from the joint determination of the causal identification issue from the joint determination of
enrollment statuses and elapsed durations of trials. enrollment statuses and elapsed durations of trials.
%FIXTAG: sentence is too complicated
Despite these limitations, this work establishes a framework for analyzing Despite these limitations, this work establishes a framework for analyzing
operational versus strategic factors in clinical trial completion. operational versus strategic factors in clinical trial completion.
%FIXTAG: analyzing replaced with "separating causal effects"
The approach developed here can be extended with additional data to The approach developed here can be extended with additional data to
provide more definitive guidance on enrollment management strategies. provide more definitive guidance on enrollment management strategies.
%FIXTAG: the approach here + additional data can provide
%FIXTAG: tie to next sentence better
Further research in this direction could help reduce operational Further research in this direction could help reduce operational
barriers to trial completion or estimate the impact policies may have through barriers to trial completion or estimate the impact policies may have through
operational channels.%FIXTAG: clauses don't match. first clause needs tightened. operational channels.
Ultimately this work will hopefully support more efficient drug Ultimately this work will hopefully support more efficient drug
development and increased market competition. %FIXTAG: wishy-washy and duplicative. development and increased market competition.
\end{document} \end{document}

@ -3,11 +3,11 @@
\begin{document} \begin{document}
\begin{table}[h!]
\caption{Table of Percentiles of Distribution of Differences} \begin{center}
\label{TABLE:PercentilesOfDistributionOfDifferences} \label{TABLE:PercentilesOfDistributionOfDifferences}
\centering % \caption{Table of Percentiles of Distribution of Differences}
\begin{tabular}{c c} \begin{tabular}{cc}
\hline \hline
Percentile & Value \\ Percentile & Value \\
\hline \hline
@ -34,10 +34,6 @@
100\% & 1.0000000 \\ 100\% & 1.0000000 \\
\hline \hline
\end{tabular} \end{tabular}
\end{table} \end{center}
% This is here specifically to allow the table above to compile. Not sure why it is needed...
\begin{table}[h!]
\end{table}
\end{document} \end{document}

@ -47,14 +47,3 @@
**** TODO Redo analysis using "Recruiting" as the base status **** TODO Redo analysis using "Recruiting" as the base status
The goal is to get the $\beta$'s for active, not recruiting. The goal is to get the $\beta$'s for active, not recruiting.
**** TODO Rerun analysis with correct base
[[[[file:/mnt/backups/home/dad/research/PhD_Deliverables/JobMarketPaper/Paper/sections/06_Results.tex::204]]]]
The natural comparison I want to make is against the Recruiting status.
Do I want to redo this so that I can read that directly?
It shouldn't affect the $\delta_p$ analysis, but this could probably use it.
YES, THIS UPDATE NEEDS TO HAPPEN. The base needs to be ``active not recruiting.''
So the plan is to set ``Active, not recruiting'' as the base condition, then
measure the effect when that is changed to ``Recruiting''.
If that is negative, then extending recruiting reduces the probability of
termination.
