Results, conclusion, deficiencies

Added details about results, tweaked the conclusion thesis, and added
details about deficiencies.
claude_rewrite
Will King 1 year ago
parent 963293fc2b
commit 70ef27c57a

@ -84,7 +84,7 @@ Section \ref{SEC:Results} discusses the results of the analysis.
\subfile{sections/06_Results}
%---------------------------------------------------------------
\section{Improvements}\label{SEC:Improvements}
\section{Deficiencies and Improvements}\label{SEC:Improvements}
%---------------------------------------------------------------
\subfile{sections/08_PotentialImprovements}

@ -1,5 +1,5 @@
layout {
tab name="Main and Compile" cwd="~/research/phd_deliverables/jmp/Latex/Paper" hide_floating_panes=true focus=true {
tab name="Main and Compile" cwd="~/research/phd_deliverables/JobMarketPaper/Paper" hide_floating_panes=true focus=true {
// This tab is where I manage main from.
// It opens up Main.txt for my JMP, opens the pdf in okular (in a floating tab), and then gets ready to build the pdf.
pane size=1 borderless=true {
@ -33,7 +33,7 @@ layout {
}
}
tab name="sections" cwd="~/research/phd_deliverables/jmp/Latex/Paper/sections" {
tab name="sections" cwd="~/research/phd_deliverables/JobMarketPaper/Paper" {
pane size=1 borderless=true {
plugin location="tab-bar"
}
@ -56,7 +56,7 @@ layout {
}
}
tab name="git" cwd="~/research/phd_deliverables/jmp/Latex/Paper/" {
tab name="git" cwd="~/research/phd_deliverables/JobMarketPaper/Latex/Paper/" {
pane size=1 borderless=true {
plugin location="tab-bar"
}

@ -27,21 +27,18 @@ the correlation (measured at $0.34$) is apparent.
\begin{figure}[H]
\includegraphics[width=\textwidth]{../assets/img/trials_details/HistTrialDurations_Faceted}
\todo{Replace this graphic with the histogram of trial durations}
\caption{Histograms of Trial Durations}
\label{fig:trial_durations}
\end{figure}
\begin{figure}[H]
\includegraphics[width=\textwidth]{../assets/img/trials_details/HistSnapshots}
\todo{Replace this graphic with the histogram of snapshots}
\caption{Histogram of the count of Snapshots}
\label{fig:snapshot_counts}
\end{figure}
\begin{figure}[H]
\includegraphics[width=\textwidth]{../assets/img/trials_details/SnapshotsVsDurationVsTermination}
\todo{Replace this graphic with the scatterplot comparing durations and snapshots}
\caption{Scatterplot comparing the Count of Snapshots and Trial Duration}
\label{fig:snapshot_counts}
\end{figure}
@ -86,7 +83,6 @@ keeping enrollment open.
\begin{figure}[H]
\includegraphics[width=\textwidth]{../assets/img/dist_diff_analysis/p_delay_intervention_distdiff_boxplot}
\todo{Replace this graphic with the histdiff with boxplot}
\small{
Values near 1 indicate a near perfect increase in the probability
of termination.
@ -160,9 +156,8 @@ Three points lead me to believe this:
often due to thick tails of posterior distributions.
\item When we examine the results across different ICD-10 groups,
\ref{fig:pred_dist_dif_delay2}
\todo{move figure from below}
we note this same issue.
\item In Figure \ref{fig:betas_delay}, we see that some ICD-10 categories
\item In Figure \ref{fig:parameters_ANR_by_group}, we see that some ICD-10 categories
\todo{add figure}
have \todo{note fat tails}.
\item There are few trials available, particularly among some specific
@ -173,9 +168,11 @@ Three points lead me to believe this:
% -
% -
% -
Overall, it is hard to escape the conclusion that more data is needed across
many -- if not all -- of the disease categories.
We can examine the per-group distributions of differences in Figure \ref{fig:pred_dist_dif_delay2} to
ascertain that the high-impact group exists in each of the groups.
This lends credence to the idea that this is a modelling issue, potentially
due to the small amount of data overall.
Figure \ref{fig:pred_dist_dif_delay2} shows how this overall
@ -187,13 +184,21 @@ result comes from different disease categories.
\end{figure}
\subsection{Secondary Results}
% Examine beta parameters
% - Little movement except where data is strong, general negative movement. Still really wide
% - Note how they all learned (partial pooling) reduction in \beta from ANR?
% - Need to discuss the 5 different states. Can't remember which one is dropped for the life of me. May need to fix parameterization.
% -
Finally, in Figure \ref{fig:parameters_ANR_by_group}, we can see the estimated distributions of the $\beta$ parameter for
the status: \textbf{Active, not recruiting}.
The prior distributions were centered on zero, but we can see that the pooled learning has moved the mean
values negative, representing reductions in the probability of termination across the board.
This decrease in the probability of termination is strongest in the categories of Neoplasms ($n=$),
Musculoskeletal diseases ($n=$), and Infections and Parasites ($n=$), the three categories with the most data.
As this is a comparison against the trial status XXX, we note that
\todo{The natural comparison I want to make is against the Recruiting status. Do I want to redo this so that I can read that directly? It shouldn't affect the $\delta_p$ analysis, but this could probably use it.}
Overall, this suggests that extending a clinical trial's enrollment period will reduce the probability of termination.
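As a minimal sketch of why a negative $\beta$ corresponds to a lower termination probability, assume a Bernoulli-logit form in which $\eta_n$ (a hypothetical symbol) collects every other term in the linear predictor for trial $n$ and $x_n \in \{0, 1\}$ indicates the status \textbf{Active, not recruiting}.
The numbers below are illustrative only and are not estimates from this analysis.
\begin{align}
    p_n &= \text{logit}^{-1}\left(\eta_n + \beta x_n\right)
         = \frac{1}{1 + \exp\left(-(\eta_n + \beta x_n)\right)} \\
    \eta_n = 0,\ \beta = -0.5: \quad
    p_n &= 0.5 \ (x_n = 0)
    \quad \to \quad
    p_n = \frac{1}{1 + e^{0.5}} \approx 0.378 \ (x_n = 1)
\end{align}
Because $\text{logit}^{-1}$ is strictly increasing, any $\beta < 0$ lowers $p_n$ relative to the baseline status, whatever the value of $\eta_n$.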
\begin{figure}[H]
\includegraphics[width=\textwidth]{../assets/img/betas/parameter_across_groups/parameters_12_status_ANR}
@ -202,5 +207,9 @@ result comes from different disease categories.
\end{figure}
% -
Overall, it is hard to escape the conclusion that more data is needed across
many -- if not all -- of the disease categories.
\end{document}

@ -4,64 +4,33 @@
\begin{document}
As noted above, there are various issues with the analysis as completed so far.
Below I discuss various steps that I believe will improve the analysis.
Below I discuss various issues and ways to address them, which I believe will improve the analysis.
\subsection{Increasing number of observations}
The most important step is to increase the number of observations available.
Currently this requires matching trials to ICD-10 codes by hand, but
there are certainly some steps that can be taken to improve the speed with which
this can be done.
%
% \subsection{Covariance Structure}
%
% As noted in the diagnostics section, many of the convergence issues seem
% to occur in the covariance structure.
% Instead of representing the parameters $\beta$ as independently normal:
% \begin{align}
% \beta_k(d) \sim \text{Normal}(\mu_k, \sigma_k)
% \end{align}
% I propose using a multivariate normal distribution:
% \begin{align}
% \beta(d) \sim \text{MvNormal}(\mu, \Sigma)
% \end{align}
% I am not familiar with typical approaches to priors on the covariance matrix,
% so this will require a further literature search as to best practices.
Currently this requires matching trials to ICD-10 codes by hand.
Improvements in large language models may make this data more accessible, or
the data may be available in a commercial dataset.
% \subsection{Finding Reasonable Priors}
%
% In standard bayesian regression, heavy tailed priors are common.
% When working with a bayesian bernoulli-logit model, this is not appropriate as
% heavy tails cause the estimated probabilities $p_n$ to concentrate around the
% values $0$ and $1$, and away from values such as $\frac{1}{2}$ as discussed in
% \cite{mcelreath_statistical_2020}. %TODO: double check the chapter for this.
%
% I intend to take the general approach recommended in \cite{mcelreath_statistical_2020} of using
% prior predictive checks to evaluate the implications of different priors
% on the distribution on $p_n$.
% This would consist of taking the independent variables and predicting the values
% of $p_n$ based on a proposed set of priors.
% By plotting these predictions, I can ensure that the specific parameter priors
% used are consistent with my prior beliefs on how $p_n$ behaves.
% Currently I believe that $p_n$ should be roughly uniform or unimodal, centered
% around $p_n = \frac{1}{2}$.
%
\subsection{Imputing Enrollment}
\subsection{Enrollment Modelling}
Finally, I must address the issue of how enrollment is reported.
In many cases, the trial continues to report an anticipated enrollment value
while the trial is still recruiting.
Thus using anticipated enrollment figures is inappropriate.
I am planning on using Bayesian imputation to estimate actual enrollment
when it has not yet occurred.
This will require building a statistical model of the enrollment process.
One advantage this dataset has is that trial sponsors provide their anticipated
enrollment numbers, allowing me to use this in the prediction model.
Additionally, each snapshot contains the elapsed duration and current status of
the trial, which may help improve the prediction.
Although predicted enrollment will be imprecise, it explicitly accounts for
uncertainty in the imputation and dependent calculations \cite{mcelreath_statistical_2020}.
One of the original goals of this project was to examine the impact that
enrollment struggles have on the probability of trial termination.
Unfortunately, this requires a model of clinical trial enrollment, and the
necessary data is not in my dataset.
In most cases the trial sponsor reports the anticipated enrollment value
while the trial is still recruiting and only updates the actual enrollment
after the trial has ended.
Some trials do publish an up-to-date record of their enrollment numbers, but this
is rare.
If a Bayesian model of multisite enrollment can be developed for the disease categories
in question, then it will be possible to impute this missing data probabilistically,
which will allow me to estimate the direct effect of slow enrollment
\cite{mcelreath_statistical_2020}.
Such a model does not exist yet, although some work on multi-site enrollment forecasting has
been done by \cite{CHECK ZOTERO NOTES FOR CITATIONS}.
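As a purely illustrative sketch of what probabilistic imputation could look like, and not the multisite model described above, suppose final enrollment $E_n$ is observed only for completed trials, while anticipated enrollment $A_n$ and elapsed duration $t_n$ are available for every trial.
The Poisson likelihood, the log-linear form, and the symbols $\lambda_n$, $\gamma_0(d)$, $\gamma_1$, and $\gamma_2$ are all assumptions made for illustration.
\begin{align}
    E_n &\sim \text{Poisson}(\lambda_n) \\
    \log \lambda_n &= \gamma_0(d) + \gamma_1 \log A_n + \gamma_2 \log t_n
\end{align}
For trials where $E_n$ is unobserved, it is treated as an unknown quantity with its own posterior distribution, so that uncertainty about enrollment propagates into any estimate that depends on it.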
\subsection{Improving Population Estimates}
@ -74,24 +43,31 @@ drug resistant and drug susceptible tuberculosis.
In contrast, there is no category for non-age related macular degeneration.
One resulting concern is that for a given ICD-10 code, the applicable GBD population
estimates may act as an estimate of the upper bound of population size
(\cite{global_burden_of_disease_collective_network_global_2020}). %fix citation
I would like to explicitly address this in my model, although I have not
found a way to do so.
(\cite{global_burden_of_disease_collective_network_global_2020}).
The dataset contains various measures of disease severity, so it may be
worth investigating how to incorporate some of those measures.
\subsection{Improving Measures of Market Conditions}
% Deficiency: cannot measure effect of market conditions because of endogeneity of population and market conditions (fatal diseases)
In addition to the fact that many diseases may be treated by non-pharmaceutical
means, off-label prescription of pharmaceuticals is legal at the federal level
(\cite{commissioner_understanding_2019}).
These two facts both complicate measuring market conditions.
One way to address non-pharmaceutical treatments is to concentrate on domains
that are primarily treated by pharmaceuticals.
This requires domain knowledge that I don't have.
% One dataset that I have only investigated briefly is the \url{DrugCentral.org}
% database which tracks official indications and some off-label indications as
% well
% (\cite{ursu_drugcentral_2017}).
Another way to address this would be to focus the analysis on just a few specific
diseases, for which a history of treatment options can be compiled.
This second approach may also allow the researcher to distinguish the direction
of causality between population size and number of drugs on the market;
for example, drugs to treat a chronic, non-fatal disease will probably not
affect the market size much in the short to medium term.
This allows the effect of market conditions to be isolated from
the effects of the population.
% Alternative approaches
% - diseases with constant kill rates? population effect should be relatively constant?
\end{document}

@ -6,9 +6,7 @@ Identifying commercial impediments to successfully completing
clinical trials in otherwise capable pharmaceuticals will hopefully
lead to a more robust and competitive market.
Although the current state of this research is insufficient to draw robust
conclusions, early results suggest that enrollment rates have some impact
on whether or not a clinical trial terminates early or continues
to full completion.
conclusions, these early results suggest that delaying the close of enrollment periods
reduces the probability of termination of a trial.
\end{document}
