Results, conclusion, deficiencies

Added details about results, tweaked the conclusion thesis, and added details about deficiencies.
1 year ago · 70ef27c57a
parent 963293fc2b
commit 70ef27c57a
5 changed files with 59 additions and 76 deletions
--- a/Paper/Main.tex
+++ b/Paper/Main.tex
@@ -84,7 +84,7 @@ Section \ref{SEC:Results} discusses the results of the analysis.
 \subfile{sections/06_Results}

 %---------------------------------------------------------------
-\section{Improvements}\label{SEC:Improvements}
+\section{Deficiencies and Improvements}\label{SEC:Improvements}
 %---------------------------------------------------------------
 \subfile{sections/08_PotentialImprovements}

--- a/Paper/jmp_layout_laptop.kdl
+++ b/Paper/jmp_layout_laptop.kdl
@@ -1,5 +1,5 @@
 layout {
-    tab name="Main and Compile" cwd="~/research/phd_deliverables/jmp/Latex/Paper" hide_floating_panes=true focus=true {
+    tab name="Main and Compile" cwd="~/research/phd_deliverables/JobMarketPaper/Paper"  hide_floating_panes=true focus=true {
    // This tab is where I manage main from. 
    // it opens up Main.txt for my JMP, opens the pdf in okular (in a floating tab), and then get's ready to build the pdf.
        pane size=1 borderless=true {
@@ -33,7 +33,7 @@ layout {
        }
    }

-    tab name="sections" cwd="~/research/phd_deliverables/jmp/Latex/Paper/sections" {
+    tab name="sections" cwd="~/research/phd_deliverables/JobMarketPaper/Paper"  {
        pane size=1 borderless=true {
            plugin location="tab-bar"
        }
@@ -56,7 +56,7 @@ layout {
        }
    }

-    tab name="git" cwd="~/research/phd_deliverables/jmp/Latex/Paper/" {
+    tab name="git" cwd="~/research/phd_deliverables/JobMarketPaper/Latex/Paper/" {
        pane size=1 borderless=true {
            plugin location="tab-bar"
        }
--- a/Paper/sections/06_Results.tex
+++ b/Paper/sections/06_Results.tex
@@ -27,21 +27,18 @@ the correlation (measured at $0.34$) is apparent.

 \begin{figure}[H]
    \includegraphics[width=\textwidth]{../assets/img/trials_details/HistTrialDurations_Faceted}
-    \todo{Replace this graphic with the histogram of trial durations}
    \caption{Histograms of Trial Durations}
    \label{fig:trial_durations}
 \end{figure}

 \begin{figure}[H]
    \includegraphics[width=\textwidth]{../assets/img/trials_details/HistSnapshots}
-    \todo{Replace this graphic with the histogram of snapshots}
    \caption{Histogram of the count of Snapshots}
    \label{fig:snapshot_counts}
 \end{figure}

 \begin{figure}[H]
    \includegraphics[width=\textwidth]{../assets/img/trials_details/SnapshotsVsDurationVsTermination}
-    \todo{Replace this graphic with the scatterplot comparing durations and snapshots}
    \caption{Scatterplot comparing the Count of Snapshots and Trial Duration}
    \label{fig:snapshot_counts}
 \end{figure}
@@ -86,7 +83,6 @@ keeping enrollment open.

 \begin{figure}[H]
    \includegraphics[width=\textwidth]{../assets/img/dist_diff_analysis/p_delay_intervention_distdiff_boxplot}
-    \todo{Replace this graphic with the histdiff with boxplot}
    \small{
        Values near 1 indicate a near perfect increase in the probability 
        of termination. 
@@ -160,9 +156,8 @@ Three points lead me to believe this:
        often due to thick tails of posterior distributions.
    \item When we examine the results across different ICD-10 groups, 
        \ref{fig:pred_dist_dif_delay2}
-        \todo{move figure from below}
        we note this same issue.
-    \item In Figure \ref{fig:betas_delay}, we see that some some ICD-10 categories
+    \item In Figure \ref{fig:parameters_ANR_by_group}, we see that some ICD-10 categories
        \todo{add figure}
        have \todo{note fat tails}.
    \item There are few trials available, particularly among some specific 
@@ -173,9 +168,11 @@ Three points lead me to believe this:
 % - 
 % - 
 % - 
-Overally it is hard to escape the conclusion that more data is needed across
-many -- if not all -- of the disease categories.

+We can examine the per-group distributions of differences in Figure \ref{fig:pred_dist_dif_delay2} to
+ascertain that the high-impact group exists in each of the groups.
+This lends credence to the idea that this is a modelling issue, potentially
+due to the small amount of data overall.


 Figure \ref{fig:pred_dist_dif_delay2} shows how this overall
@@ -187,13 +184,21 @@ result comes from different disease categories.
 \end{figure}


-\subsection{Secondary Results}

 % Examine beta parameters 
 % - Little movement except where data is strong, general negative movement. Still really wide 
 % - Note how they all learned (partial pooling) reduction in \beta from ANR?
 % - Need to discuss the 5 different states. Can't remember which one is dropped for the life of me. May need to fix parameterization.
 % - 
+Finally, in Figure \ref{fig:parameters_ANR_by_group}, we can see the estimated distributions of the $\beta$ parameter for
+the status: \textbf{Active, not recruiting}.
+The prior distributions were centered on zero, but we can see that the pooled learning has moved the mean
+values negative, representing reductions in the probability of termination across the board. 
+This decrease in the probability of termination is strongest in the categories of Neoplasms ($n=$),
+Musculoskeletal diseases ($n=$), and Infections and Parasites ($n=$), the three categories with the most data.
+As this is a comparison against the trial status XXX, we note that
+\todo{The natural comparison I want to make is against the Recruiting status. Do I want to redo this so that I can read that directly? It shouldn't affect the $\delta_p$ analysis, but this could probably use it.}
+Overall, this suggests that extending a clinical trial's enrollment period will reduce the probability of termination.

 \begin{figure}[H]
    \includegraphics[width=\textwidth]{../assets/img/betas/parameter_across_groups/parameters_12_status_ANR}
@@ -202,5 +207,9 @@ result comes from different disease categories.
 \end{figure}
 % - 

+Overall, it is hard to escape the conclusion that more data is needed across
+many -- if not all -- of the disease categories.

 \end{document}
+
+
--- a/Paper/sections/08_PotentialImprovements.tex
+++ b/Paper/sections/08_PotentialImprovements.tex
@@ -4,64 +4,33 @@
 \begin{document}

 As noted above, there are various issues with the analysis as completed so far.
-Below I discuss various steps that I believe will improve the analysis.
+Below I discuss these issues, along with steps that I believe will address them and improve the analysis.

 \subsection{Increasing number of observations}

 The most important step is to increase the number of observations available.
-Currently this requires matching trials to ICD-10 codes by hand, but
-there are certainly some steps that can be taken to improve the speed with which
-this can be done.
-%
-% \subsection{Covariance Structure}
-%
-% As noted in the diagnostics section, many of the convergence issues seem
-% to occure in the covariance structure. 
-% Instead of representing the parameters $\beta$ as independently normal:
-% \begin{align}
-%     \beta_k(d) \sim \text{Normal}(\mu_k, \sigma_k)
-% \end{align}
-% I propose using a multivariate normal distribution:
-% \begin{align}
-%     \beta(d) \sim \text{MvNormal}(\mu, \Sigma)
-% \end{align}
-% I am not familiar with typical approaches to priors on the covariance matrix,
-% so this will require a further literature search as to best practices.
+Currently this requires matching trials to ICD-10 codes by hand.
+Improvements in large language models may make this data more accessible, or
+the data may be available in a commercial dataset.

-% \subsection{Finding Reasonable Priors}
-%
-% In standard bayesian regression, heavy tailed priors are common. 
-% When working with a bayesian bernoulli-logit model, this is not appropriate as 
-% heavy tails cause the estimated probabilities $p_n$ to concentrate around the 
-% values $0$ and $1$, and away from values such as $\frac{1}{2}$ as discussed in
-% \cite{mcelreath_statistical_2020}. %TODO: double check the chapter for this.
-%
-% I indend to take the general approach recommended in \cite{mcelreath_statistical_2020} of using
-% prior predictive checks to evaluate the implications of different priors
-% on the distribution on $p_n$.
-% This would consist of taking the independent variables and predicting the values
-% of $p_n$ based on a proposed set of priors. 
-% By plotting these predictions, I can ensure that the specific parameter priors 
-% used are consistent with my prior beliefs on how $p_n$ behaves.
-% Currently I believe that $p_n$ should be roughly uniform or unimodal, centered 
-% around $p_n = \frac{1}{2}$.
-%

-\subsection{Imputing Enrollment}
+\subsection{Enrollment Modelling}

-Finally, I must address the issue of how enrollment is reported.
-In many cases, the trial continues to report an anticipated enrollment value
-while the trial is still recruiting.
-Thus using anticipated enrollment figures is inappropriate.
-I am planning on using bayesian imputation to estimate actual enrollment
-when it has not yet occured. 
-This will require building a statistical model of the enrollment process.
-One advantage this dataset has is that trial sponsors provide their anticipated
-enrollment numbers, allowing me to use this in the prediction model.
-Additionally, each snapshot contains the elapsed duration and current status of
-the trial , which may help improve the prediction.
-Although predicted enrollment will be imprecise, it explicitly accounts for
-uncertanty in the imputation and dependent calculations \cite{mcelreath_statistical_2020}.
+One of the original goals of this project was to examine the impact that 
+enrollment struggles have on the probability of trial termination. 
+Unfortunately, this requires a model of clinical trial enrollment, and the
+necessary data is not in my dataset.
+In most cases the trial sponsor reports the anticipated enrollment value
+while the trial is still recruiting and only updates the actual enrollment
+after the trial has ended.
+Some trials do publish an up-to-date record of their enrollment numbers, but this
+is rare.
+If a Bayesian model of multi-site enrollment can be developed for the disease categories
+in question, then it will be possible to impute this missing data probabilistically,
+which will allow me to estimate the direct effect of slow enrollment
+\cite{mcelreath_statistical_2020}.
+Such a model does not exist yet, although some work on multi-site enrollment forecasting has
+been done by \cite{CHECK ZOTERO NOTES FOR CITATIONS} 

 \subsection{Improving Population Estimates}

@@ -74,24 +43,31 @@ drug resistant and drug suceptible tuberculosis.
 In contrast, there is no category for non-age related macular degeneration.
 One resulting concern is that for a given ICD-10 code, the applicable GBD population 
 estimates may act as an estimate of the upper bound of population size
-(\cite{global_burden_of_disease_collective_network_global_2020}). %fix citation
-I would like to explicitly address this in my model, although I have not 
-found a way to do so.
+(\cite{global_burden_of_disease_collective_network_global_2020}).
+The dataset contains various measures of disease severity, so it may be 
+worth investigating how to incorporate some of those measures.


 \subsection{Improving Measures of Market Conditions}

+% Deficiency: cannot measure effect of market conditions because of endogeneity of population and market conditions (fatal diseases)
+
 In addition to the fact that many diseases may be treated by non-pharmaceutical 
 means, off-label prescription of pharmaceuticals is legal at the federal level 
 (\cite{commissioner_understanding_2019}).
 These two facts both complicate measuring market conditions.
 One way to address non-pharmaceutical treatments is to concentrate on domains
 that are primarily treated by pharmaceuticals.
-This requires domain knowledge that I don't have.
-% One dataset that I have only investigated briefly is the \url{DrugCentral.org}
-% database which tracks official indications and some off-label indications as 
-% well
-% (\cite{ursu_drugcentral_2017}).
+Another way to address this would be to focus the analysis on just a few specific
+diseases, for which a history of treatment options can be compiled.
+This second approach may also allow the researcher to distinguish the direction
+of causality between population size and number of drugs on the market; 
+for example, drugs to treat a chronic, non-fatal disease will probably not 
+affect the market size much in the short to medium term.
+This allows the effect of market conditions to be isolated from
+the effects of the population.

+% Alternative approaches 
+% - diseases with constant kill rates? population effect should be relatively constant?

 \end{document}
--- a/Paper/sections/09_Conclusion.tex
+++ b/Paper/sections/09_Conclusion.tex
@@ -6,9 +6,7 @@ Identifying commercial impediments to successfully completing
 clinical trials in otherwise capable pharmaceuticals will hopefully 
 lead to a more robust and competitive market.
 Although the current state of this research is insufficient to draw robust
-conclusions, early results suggest that enrollment rates have some impact
-on whether or not a clinical trial terminates early or continues
-to full completion.
-
+conclusions, these early results suggest that delaying the close of enrollment periods
+reduces the probability that a trial is terminated.

 \end{document}