Compare commits


3 Commits
main ... master

Author SHA1 Message Date
Will King 1672210931 adding notes 12 months ago
Will King 4bf321b475 Notes and plans for writing 1 year ago
Will King 88be4b7a38 updated todo and plan 1 year ago

@ -0,0 +1,25 @@
Key points
This is an attempt at measuring the effect of extending the enrollment period.
The main issue is that the interaction between enrollment levels, enrollment status, and timing is confounded due to endogeneity.
This can be addressed
The other concerns are:
- endogeneity between market and population.
I think this isn't a causal issue because it is contained between the two; it can be treated as a single RV and controlled for together.
- omitted variable bias. Did I forget or miss anything?
- The DAG is based on the details outlined in FDA rules. I NEED TO LOOK THOSE UP AGAIN. The assumptions that allow this to work are:
1. timeliness/accuracy in reporting open and close
2. updating certain details (open/close recruitment) is helpful because this is part of your marketing. (Concerns about measurement error)
3.
- Where did the DAG come from?
In spite of the endogeneity issue, I chose to continue modelling as if it were causal, because:
1. If we assume an intervention that handles the joint timing/enrollment status together, then it is causally identified (but hard to interpret)
- Walking away from identification is an issue in that you lose the use of this analysis
- Interpretation is as follows: changing enrollment status but breaking out of the standard timing of these things. Need a better way to say that.
2.
This is the only attempt I've found that tries to address this in a causal way, everything else is just descriptive.
It also differs in being the first econ literature on measuring the impact of an operational concern.

@ -24,7 +24,7 @@
\titlespacing*{\paragraph}
{0pt}{3.25ex plus 1ex minus .2ex}{1.5ex plus .2ex}
\title{The effects of market conditions and enrollment on the
completion of clinical trials\\ \small{Preliminary Draft}}
\author{William King}
@ -94,19 +94,15 @@ completion of clinical trials\\ \small{Preliminary Draft}}
\printbibliography
\newpage
\appendix
%---------------------------------------------------------------
\section{Diagnostics}\label{Appendix:Diagnostics}
\section{Appendices}
%---------------------------------------------------------------
\subfile{sections/21_appendix_diagnostics}
%---------------------------------------------------------------
\section{Other Statistical Results}\label{Appendix:Results}
%---------------------------------------------------------------
\subfile{sections/22_appendix_full_results}
\newpage
\tableofcontents
\end{document}
% NOTES:
%
%

@ -0,0 +1,27 @@
Plan
get list of things that Tom says I'm Missing
- Needs more citations
- Standard econometric concerns: Endogeneity, Simultaneity, etc.
- Needs to justify why I am doing what I am doing. What do I add?
Marketwide attempt to measure the impact of enrollment, an operational concern.
-
Integrate additional literature I've worked with.
- How big a concern are operational failures? (about 22% of failures)
- Topics of how to address issues and what issues arise are common (give a couple of examples)
- Efforts to reduce failures include better pharmacokinetics, attempts at improving enrollment, better enrollment prediction (huge lit).
Then look at my outline:
- How can I adjust it to address those missing bits?
- How can I simplify the structure?
Maybe a discussion of concerns about simultaneity/endogeneity/other confounds/etc is where I
bring up the confounding parameters and then build a list of how things interact.
I then use this to flesh out the DAG, and introduce the backdoor criterion.
I think I'll put this together as a bullet point draft, using the * and -
notation for paragraphs and sentences respectively. Try to get the main points
of each sentence/paragraph out.

@ -71,6 +71,33 @@ not represented at all.
\label{FIG:barchart_idc_categories}
\end{figure}
% Estimation Procedure
I fit the econometric model using mc-stan
\cite{standevelopmentteam_StanModelling_2022}
through the rstan
\cite{standevelopmentteam_RStanInterface_2023}
interface using 4 chains with
2,500 warmup iterations and
2,500 sampling iterations each.
Two of the chains experienced a low
Estimated Bayesian Fraction of Missing Information (E-BFMI),
suggesting that there are some parts of the posterior distribution
that were not explored well during the model fitting.
I presume this is due to the low number of trials in some of the
ICD-10 categories.
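The E-BFMI diagnostic mentioned above has a simple form: the mean squared change in the sampler's energy between successive iterations, divided by the marginal variance of the energy. A minimal Python sketch (the function name and the synthetic energy trace are illustrative, not drawn from this analysis):

```python
import numpy as np

def e_bfmi(energy):
    """Estimated Bayesian Fraction of Missing Information for one chain.

    Mean squared energy increment over the marginal energy variance;
    Stan's documentation flags values below roughly 0.3 as problematic.
    """
    energy = np.asarray(energy, dtype=float)
    return np.mean(np.diff(energy) ** 2) / np.var(energy)

# For an independently drawn (well-mixing) energy trace, E-BFMI sits near 2;
# a sticky, slowly moving energy trace pushes it toward 0.
rng = np.random.default_rng(0)
print(e_bfmi(rng.normal(size=10_000)))
```

A chain whose energy drifts slowly (e.g. a random walk) scores far below the warning threshold, which is the pattern the sampler reported here.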
We can see in Figure \ref{fig:barchart_idc_categories} that some of these
disease categories had a single trial represented while others were
not represented at all.
\begin{figure}[H]
\includegraphics[width=\textwidth]{../assets/img/trials_details/CategoryCounts}
\caption{Bar chart of trials by ICD-10 categories}
\label{fig:barchart_idc_categories}
\end{figure}
\subsection{Primary Results}
@ -81,9 +108,10 @@ open instead of closing enrollment when observed.
In figure \ref{fig:pred_dist_diff_delay} below, we see this impact of
keeping enrollment open.
% \begin{minipage}{\textwidth}
\begin{figure}[H]
\includegraphics[width=\textwidth]{../assets/img/dist_diff_analysis/p_delay_intervention_distdiff_boxplot}
\todo{Replace this graphic with the histdiff with boxplot}
\small{
Values near 1 indicate a near perfect increase in the probability
of termination.
@ -98,45 +126,18 @@ keeping enrollment open.
\label{fig:pred_dist_diff_delay}
\end{figure}
\begin{table}[H]
\centering
\caption{Boxplot Summary Statistics}
\label{table:boxplotsummary}
\begin{tabular}{ | c c c c c c c c | }
\hline
5th & 10th & 25th & median &
75th & 90th & 95th & mean \\
\hline
-0.376 & -0.264 & -0.129 & -0.023 &
0.145 & 0.925 & 0.982 & 0.096 \\
\hline
\end{tabular}
\end{table}
% \end{minipage}
The key figures from the boxplot in figure
\ref{fig:pred_dist_diff_delay}
are summarized in table \ref{table:boxplotsummary}.
There are a few interesting things to point out here.
Let's start by getting acquainted with the details of the distribution above.
First, 63\% of the probability mass is equal to or below zero.
Second, about 13\% of the probability mass is contained within the interval
$[-0.01, 0.01]$.
The full 5\% percentile table can be found in table
\ref{TABLE:PercentilesOfDistributionOfDifferences}
in appendix
\ref{Appendix:Results}.
The distribution can also be divided into a few different regimes.
% - spike at 0
% - the boxplot
% - 63% of mass below 0 : find better way to say that
% - For a random trial, there is a 63% chance that the impact is to reduce the probability of a termination.
% - 2 pctg-point wide band centered on 0 has ~13% of the masss
% - mean represents 9.x% increase in probability of termination. A quick simulation gives about the same pctg-point increase in terminated trials.
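The shares quoted here (63\% of mass at or below zero, roughly 13\% within a percentage point of zero) are just empirical tail and band fractions of the posterior draws of $\delta_p$. A Python sketch with synthetic draws (the distribution below is invented for illustration; the real computation would use the model's posterior sample):

```python
import numpy as np

# Synthetic stand-in for the posterior draws of delta_p (one value per
# draw x trial); the real analysis would use the model's predictions.
rng = np.random.default_rng(1)
delta_p = np.clip(rng.normal(-0.05, 0.2, size=20_000), -1.0, 1.0)

frac_at_or_below_zero = np.mean(delta_p <= 0)    # cf. the 63% figure
frac_in_band = np.mean(np.abs(delta_p) <= 0.01)  # cf. the ~13% band
print(frac_at_or_below_zero, frac_in_band)
```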
A few interesting interpretation bits come out of this.
% - there are 3 regimes: low impact (near zero), medium impact (concentrated in decreased probability of termination), and high impact (concentrated in increased probability of termination).
The first is that there appear to be three different regimes.
The first regime consists of the low impact results, i.e. those values of $\delta_p$
near zero.
About 13\% of trials lie within a single percentage point change of zero,
@ -146,63 +147,78 @@ The second regime consists of the moderate impact on clinical trials'
probabilities of termination, say values in the interval $[-0.5, 0.5]$
on the graph.
Most of this probability mass represents a decrease in the probability of
a termination, some of it rather large.
Finally, there exists the high impact region, almost exclusively concentrated
around increases in the probability of termination at $\delta_p > 0.75$.
These represent cases where delaying the close of enrollment changes a trial
from one that was highly likely to complete its primary objectives to
one that was likely or almost certain to terminate early.
% - the high impact regime is strange because it consists of trials that moved from unlikely (<20% chance) of termination to a high chance (>80% chance) of termination. Something like 5% of all trials have a greater than 98 percentage point increase in termination. Not sure what this is doing.
% Looking at the spike around zero, we find that 13.09% of the probability mass
% is contained within the band from [-1,1].
% Additionally, there was 33.4282738% of the probability above that
% representing those with a general increase in the
% probability of termination and 53.4817262% of the probability mass
% below the band representing a decrease in the probability of termination.
% On average, if you keep the trial open instead of closing it, 0.6337363% of
% trials will see a decrease in the probability of termination, but, due to
% the high increase in probability of termination given termination was
% increased, the mean probability of termination increases by 0.0964726.
% Pulled the data from the report
% ```{r}
% summary(pddf_ib$value)
% Min. 1st Qu. Median Mean 3rd Qu. Max.
% -0.99850 -0.12919 -0.02259 0.09647 0.14531 1.00000
% quants <- quantile(pddf_ib$value, probs = seq(0,1,0.05), type=4)
% # Convert to a data frame
% quant_df <- data.frame( Percentile = names(quants), Value = quants )
% kable(quant_df)
% Percentile Value
% SEE TABLE IN APPENDIX
%```
Figure \ref{fig:pred_dist_dif_delay2} shows how the different disease categories
tend to show similar results:
% - Potential Explanations for high impact regime:
How could this intervention have such a wide range in the intensity
and direction of impacts?
A few explanations include that some trials are particularly susceptible to
enrollment issues or that this is a result of too little data.
% - Some trials are highly suceptable. This is the face value effect
One option is that some categories are more susceptible to
issues with participant enrollment.
If this is the case, we should be able to isolate categories that contribute
the most to this effect.
Another is that this might be a modelling artifact, due to the relatively
low number of trials in certain ICD-10 categories.
In short, there might be high levels of uncertainty in some parameter values,
which manifest as fat tails in the distributions of the $\beta$ parameters.
Because of the logistic format of the model, these fat tails lead to
extreme values of $p$, and potentially large changes in $\delta_p$.
% - Could be uncertanty. If the model is highly uncertain, e.g. there isn't enough data, we could have a small percentage of large increases. This could be in general or just for a few categories with low amounts of data.
% -
% -
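The mechanism described above, heavy-tailed $\beta$ draws pushed through the logistic link piling probability near 0 and 1, can be illustrated directly. A sketch (the Student-$t$ and normal draws below are illustrative stand-ins, not the model's actual posteriors):

```python
import numpy as np

def inv_logit(x):
    # Logistic link: maps coefficients to probabilities in (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
beta_fat = rng.standard_t(df=2, size=50_000)   # heavy-tailed draws
beta_thin = rng.normal(size=50_000)            # light-tailed draws

def frac_extreme(beta):
    # Share of implied probabilities above 0.95 or below 0.05.
    p = inv_logit(beta)
    return np.mean((p > 0.95) | (p < 0.05))

print(frac_extreme(beta_fat), frac_extreme(beta_thin))
```

With the same scale, the heavy-tailed draws put an order of magnitude more mass in the extreme-probability region, which is exactly how parameter uncertainty can masquerade as a "high impact" regime in $\delta_p$.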
I believe that this second explanation -- a model artifact due to uncertainty --
is likely to be the cause.
Three points lead me to believe this:
\begin{itemize}
\item The low fractions of E-BFMI suggest that the sampler is struggling
to explore some regions of the posterior.
According to \cite{standevelopmentteam_RuntimeWarnings_2022} this is
often due to thick tails of posterior distributions.
\item When we examine the results across different ICD-10 groups,
\ref{fig:pred_dist_dif_delay2}
\todo{move figure from below}
we note this same issue.
\item In Figure \ref{fig:betas_delay}, we see that some ICD-10 categories
\todo{add figure}
have \todo{note fat tails}.
\item There are few trials available, particularly among some specific
ICD-10 categories.
\end{itemize}
% - take a look at beta values and then discuss if that lines up with results from dist-diff by group.
% - My initial thought is that there is not enough data/too uncertain. I think this because it happens for most/all of the categories.
% -
% -
% -
Overall, it is hard to escape the conclusion that more data is needed across
many -- if not all -- of the disease categories.
Figure \ref{fig:pred_dist_dif_delay2} shows how this overall
result comes from different disease categories.
\begin{figure}[H]
\includegraphics[width=\textwidth]{../assets/img/dist_diff_analysis/p_delay_intervention_distdiff_by_group}
\caption{Distribution of Predicted differences by Disease Group}
\label{fig:pred_dist_dif_delay2}
\end{figure}
Again, note the high mass near zero, the general decrease in the probability
of termination, and then the strong upper tails.
Continuing to the $\beta$ parameters in figure
\ref{fig:parameters_ANR_by_group},
we can see the estimated distributions of the $\beta$ parameter for
the status: \textbf{Active, not recruiting}.
The prior distributions were centered on zero, but we can see that the
pooled learning has moved the mean
values negative, representing reductions in the probability of termination
across the board.
This decrease in the probability of termination is strongest in the categories of Neoplasms ($n=49$),
Musculoskeletal diseases ($n=17$), and Infections and Parasites ($n=20$), the three categories with the most data.
As this is a comparison against the trial status XXX, we note that YYY.
\todo{The natural comparison I want to make is against the Recruiting status. Do I want to redo this so that I can read that directly? It shouldn't affect the $\delta_p$ analysis, but this could probably use it. YES, THIS UPDATE NEEDS TO HAPPEN. The base needs to be ``active not recruiting.''}
Overall, this is consistent with the result that extending a clinical trial's enrollment period will reduce the probability of termination.
\subsection{Secondary Results}
% Examine beta parameters
% - Little movement except where data is strong, general negative movement. Still really wide
% - Note how they all learned (partial pooling) reduction in \beta from ANR?
% - Need to discuss the 5 different states. Can't remember which one is dropped for the life of me. May need to fix parameterization.
% -
\begin{figure}[H]
\includegraphics[width=\textwidth]{../assets/img/betas/parameter_across_groups/parameters_12_status_ANR}
@ -211,62 +227,147 @@ Overall, this is consistent with the result that extending a clinical trial's en
\end{figure}
% -
\subsection{Primary Results}
The primary, causally-identified value we can estimate is the change in
the probability of termination caused by (counterfactually) keeping enrollment
open instead of closing enrollment when observed.
In figure \ref{fig:pred_dist_diff_delay} below, we see this impact of
keeping enrollment open.
\begin{figure}[H]
\includegraphics[width=\textwidth]{../assets/img/dist_diff_analysis/p_delay_intervention_distdiff_boxplot}
\small{
Values near 1 indicate a near perfect increase in the probability
of termination.
Values near 0 indicate little change in probability,
while values near -1, represent a decrease in the probability
of termination.
The scale is in probability points, thus a value near 1 is a change
from unlikely to terminate under control, to highly likely to
terminate.
}
\caption{Histogram of the Distribution of Predicted Differences}
\label{fig:pred_dist_diff_delay}
\end{figure}
There are a few interesting things to point out here.
Let's start by getting acquainted with the details of the distribution above.
% - spike at 0
% - the boxplot
% - 63% of mass below 0 : find better way to say that
% - For a random trial, there is a 63% chance that the impact is to reduce the probability of a termination.
% - 2 pctg-point wide band centered on 0 has ~13% of the masss
% - mean represents 9.x% increase in probability of termination. A quick simulation gives about the same pctg-point increase in terminated trials.
A few interesting interpretation bits come out of this.
% - there are 3 regimes: low impact (near zero), medium impact (concentrated in decreased probability of termination), and high impact (concentrated in increased probability of termination).
The first is that there appear to be three different regimes.
The first regime consists of the low impact results, i.e. those values of $\delta_p$
near zero.
About 13\% of trials lie within a single percentage point change of zero,
suggesting that there is a reasonable chance that delaying
a close of enrollment has no impact.
The second regime consists of the moderate impact on clinical trials'
probabilities of termination, say values in the interval $[-0.5, 0.5]$
on the graph.
Most of this probability mass represents a decrease in the probability of
a termination, some of it rather large.
Finally, there exists the high impact region, almost exclusively concentrated
around increases in the probability of termination at $\delta_p > 0.75$.
These represent cases where delaying the close of enrollment changes a trial
from one that was highly likely to complete its primary objectives to
one that was likely or almost certain to terminate early.
% - the high impact regime is strange because it consists of trials that moved from unlikely (<20% chance) of termination to a high chance (>80% chance) of termination. Something like 5% of all trials have a greater than 98 percentage point increase in termination. Not sure what this is doing.
% - Potential Explanations for high impact regime:
This leads to the question:
``How could this intervention have such a wide range in the intensity
and direction of impacts?''
The most likely explanations in my mind are that either
some trials are highly susceptible to enrollment struggles or that this is a
modelling artifact.
% - Some trials are highly suceptable. This is the face value effect
The first option -- that some trials are more susceptible to
issues with participant enrollment -- should allow us to
isolate categories or trials that contribute the most to this effect.
This is not what we find when we inspect the categories
in figure
\ref{fig:pred_dist_dif_delay2}.
Instead it appears that most of the categories have this high
impact regime when $\delta_p > 0.75$, although the maximum value
of this regime varies considerably.
Another explanation is that this is a modelling artifact due to priors
with strong tails and the relatively low number of trials in
each ICD-10 category.
In short, there might be high levels of uncertainty in some parameter values,
which manifest as fat tails in the distributions of the $\beta$ parameters.
Because of the logistic format of the model, these fat tails lead to
extreme values of $p$, and potentially large changes in $\delta_p$.
% - Could be uncertanty. If the model is highly uncertain, e.g. there isn't enough data, we could have a small percentage of large increases. This could be in general or just for a few categories with low amounts of data.
% -
% -
I believe that this second explanation -- a model artifact due to uncertainty --
is likely to be the cause.
A few things lead me to believe this:
\begin{itemize}
\item The low fractions of E-BFMI suggest that the sampler is struggling
to explore some regions of the posterior.
According to
\authorcite{standevelopmentteam_runtimewarningsconvergence_2022}
this is
often due to thick tails of posterior distributions.
During earlier analysis, when I had about 100 trials, the number of
warnings was significantly higher.
\item When we examine the results across different ICD-10 groups,
\ref{fig:pred_dist_dif_delay2}
we note that most categories have the same upper tail spike.
\item In Figure
% \ref{fig:betas_delay},
\ref{fig:parameters_ANR_by_group},
we see that most ICD-10 categories
have fat tails in the $\beta$s, even among the categories with
relatively larger sample sizes.
\item There are few trials available, particularly among some specific
ICD-10 categories.
\todo{refer to figure ??}
\end{itemize}
\todo{Reformat so this refers to the original discussion of issues better.}
% - take a look at beta values and then discuss if that lines up with results from dist-diff by group.
% - My initial thought is that there is not enough data/too uncertain. I think this because it happens for most/all of the categories.
% -
% -
% -
Overall, it is hard to escape the conclusion that more data is needed across
many -- if not all -- of the disease categories.
At the same time, the median result is a decrease in the probability
of termination when the enrollment period is held open.
My inclination is to believe that the overall effect is to reduce the
probability of termination.
We can examine the per-group distributions of differences in figure
\ref{fig:pred_dist_dif_delay2} to
ascertain that the high impact group does exist in each of the groups.
This lends credence to the idea that this is a modelling issue, potentially
due to the low amounts of data overall.
Figure \ref{fig:pred_dist_dif_delay2} shows how this overall
result comes from different disease categories.
\begin{figure}
\includegraphics[width=\textwidth]{../assets/img/dist_diff_analysis/p_delay_intervention_distdiff_by_group}
\caption{Distribution of Predicted differences by Disease Group}
\label{fig:pred_dist_dif_delay2}
\end{figure}
% Examine beta parameters
% - Little movement except where data is strong, general negative movement. Still really wide
% - Note how they all learned (partial pooling) reduction in \beta from ANR?
% - Need to discuss the 5 different states. Can't remember which one is dropped for the life of me. May need to fix parameterization.
% -
Finally, in figure \ref{fig:parameters_ANR_by_group}, we can see the estimated distributions of the $\beta$ parameter for
the status: \textbf{Active, not recruiting}.
The prior distributions were centered on zero, but we can see that the pooled learning has moved the mean
values negative, representing reductions in the probability of termination across the board.
This decrease in the probability of termination is strongest in the categories of Neoplasms ($n=49$),
Musculoskeletal diseases ($n=17$), and Infections and Parasites ($n=20$), the three categories with the most data.
As this is a comparison against the trial status XXX, we note that
\todo{The natural comparison I want to make is against the Recruiting status. Do I want to redo this so that I can read that directly? It shouldn't affect the $\delta_p$ analysis, but this could probably use it.}
Overall, this suggests that extending a clinical trial's enrollment period will reduce the probability of termination.
\begin{figure}[H]
\includegraphics[width=\textwidth]{../assets/img/betas/parameter_across_groups/parameters_12_status_ANR}
\caption{Distribution of parameters associated with ``Active, not recruiting'' status, by ICD-10 Category}
\label{fig:parameters_ANR_by_group}
\end{figure}
% -
\end{document}

@ -4,15 +4,13 @@
\begin{document}
As noted above, there are various issues with the analysis as completed so far.
Below I discuss various issues and ways to address them that I believe will improve the analysis.
\subsection{Increasing number of observations}
The most important step is to increase the number of observations available.
Currently this requires matching trials to ICD-10 codes by hand.
Improvements in large language models may make this data more accessible, or
the data may be available in a commercial dataset.
@ -26,11 +24,13 @@ In most cases the trial sponsor reports the anticipated enrollment value
while the trial is still recruiting and only updates the actual enrollment
after the trial has ended.
Some trials do publish an incremental record of their enrollment numbers,
but this is rare.
Due to the Bayesian model used, it would be possible to
include a model of the missing data
\cite{mcelreath_statisticalrethinkingbayesian_2020},
which would
allow me to estimate the direct effect of slow enrollment
on clinical trial termination rates.
There has been substantial work on forecasting
multi-site enrollment rates and durations by
@ -51,31 +51,24 @@ multi-site enrollment rates and durations by
avalos-pacheco_validationpredictiveanalyses_2023,
}
but choosing between the various single and multi-site models presented is
difficult without a dataset on which to validate the results.
\subsection{Improving Population Estimates}
The Global Burden of Disease dataset contains the best estimates of disease
population sizes that I have found so far.
Unfortunately, for some conditions it can be relatively imprecise due to
its focus on providing data geared towards public health policy.
For example, GBD contains categories for both
drug-resistant and drug-susceptible tuberculosis, but maps those to the same
ICD-10 code.
In contrast, there is no category for non-age related macular degeneration.
Thus not every trial has a good match with the estimate of the population of
interest.
Finding a way to focus on trials that have good disease population estimates
would improve the efficiency of the analysis.
\subsection{Improving Measures of Market Conditions}
@ -85,24 +78,18 @@ In addition to the fact that many diseases may be treated by non-pharmaceutical
means (e.g. diet, physical therapy, medical devices, etc),
off-label prescription of pharmaceuticals is legal at the federal level
\cite{commissioner_understandingunapproveduse_2019}.
%FIXTAG: Discuss how there isn't much data about off label prescription (I have a source)
These two facts both complicate measuring competing treatments,
a key part of market conditions.
One way to address non-pharmaceutical treatments is to concentrate on domains
that are primarily treated by pharmaceuticals.
A second option is to focus the analysis on just a few specific
diseases, for which a history of treatment options can be compiled.
This approach may also allow the researcher to distinguish the direction
of causality between population size and the number of drugs on the market:
for example, drugs to treat a chronic, non-fatal disease will probably not
affect the market size much in the short to medium term.
This would require identifying diseases that are prime candidates and then
trials and drugs associated with those diseases.
This allows the effect of market conditions to be isolated from
the effects of the population.
% Alternative approaches
% - diseases with constant kill rates? population effect should be relatively constant?

@ -5,39 +5,35 @@
Identifying commercial impediments to successfully completing
clinical trials in otherwise capable pharmaceuticals could
lead to a more robust and competitive pharmaceutical market.
Although the current state of this research is insufficient to draw robust
conclusions, these early results suggest that delaying the close of
enrollment periods reduces the probability of termination of a trial.
The successful completion of Phase III clinical trials is crucial for
bringing new treatments to market.
This research provides insights into how enrollment management
impacts trial outcomes.
While the preliminary results suggest that delaying the close of enrollment
periods may reduce termination probability, the analysis
reveals significant variation across disease categories and highlights
important methodological challenges.
The primary limitation that must be addressed before drawing stronger
conclusions is insufficient data, which takes two forms.
The first is the small sample size; this can be overcome with an improved
data matching approach and a revised data scraper.
The second is the lack of a model of enrollment.
Such a model is needed to address the causal identification issue that arises
because enrollment statuses and elapsed trial durations are jointly determined.
Despite these limitations, this work establishes a framework for separating
the causal effects of operational and strategic factors in clinical trial
completion.
The approach developed here, combined with additional data, can provide
more definitive guidance on enrollment management strategies.
Further research in this direction could help reduce operational
barriers to trial completion or estimate the impact policies may have through
operational channels.
Ultimately, this work aims to support more efficient drug
development and increased market competition.
\end{document}

@ -31,8 +31,7 @@ one form of operational failure
in Phase III clinical trials.
Using a novel dataset constructed from administrative data registered on
ClinicalTrials.gov, I exploit variation in enrollment timing and market
conditions to identify how extending the enrollment period affects trial completion.
Specifically, I answer the question:
\textit{
``How does the probability of trial termination change
@ -44,18 +43,199 @@ pipeline and progression between clinical trial phases.
% In 1938 President Franklin D. Roosevelt signed the Food, Drug, and Cosmetic Act,
% granting the Food and Drug Administration (FDA) authority to require
% pre-market approval of pharmaceuticals.
% \cite{commissioner_milestonesusfood_2023}
% As of Sept 2022 \todo{Check Date} they have approved 6,602 currently-marketed
% compounds with Structured Product Labels (SPLs)
% and 10,983 previously-marketed SPLs
% \cite{commissioner_nsde_2024},
% %from nsde table. Get number of unique application_nubmers_or_citations with most recent end date as null.
% In 1999, they began requiring that drug developers register and
% publish clinical trials on \url{https://clinicaltrials.gov}.
% This provides a public mechanism where clinical trial sponsors are
% responsible for explaining what they are trying to achieve and how it will
% be measured, and gives the public the ability to search for trials
% that they might enroll in.
% Multiple derived datasets such as the Cortellis Investigational Drugs dataset
% or the AACT dataset from the Clinical Trials Transformation Initiative
% integrate these data.
% This brings up a question:
% Can we use this public data on clinical trials to identify what affects the
% success or failure of trials?
% In this work, I use updates to records on
% \url{https://ClinicalTrials.gov}
% to do exactly that: disentangle how participant enrollment
% and competing drugs on the market affect the success or failure of
% clinical trials.
\subsection{Background}
%Describe how clinical trials fit into the drug development landscape and how they proceed
Clinical trials are a required part of drug development.
Not only does the FDA require that a series of clinical trials demonstrate sufficient safety and efficacy of
a novel pharmaceutical compound or device, but producers of derivative medicines may also be required to ensure that
their generic small molecule compound -- such as ibuprofen or levothyroxine -- matches the
performance of the originator drug if delivery or dosage is changed.
For large molecule generics (termed biosimilars) such as adalimumab
(brand name Humira, with biosimilars Abrilada, Amjevita, Cyltezo, Hadlima, Hulio,
Hyrimoz, Idacio, Simlandi, Yuflyma, and Yusimry),
developers are required to prove that the biosimilar has similar efficacy and
safety to the reference drug.
%TODO? Decide whether to include this or not
%When registering these clinical trials
% discuss how these are registered and what data is published.
% Include image and discuss stages
% Discuss challenges faced
% Introduce my work
In the world of drug development, these trials are classified into different
phases of development\footnote{
\cite{anderson_fdadrugapproval_2022}
provide an overview of this process
while
\cite{commissioner_drugdevelopmentprocess_2020}
describes the process in detail.}.
Pre-clinical studies primarily establish toxicity and potential dosing levels.
% \cite{commissioner_drugdevelopmentprocess_2020}.
Phase I trials are the first attempt to evaluate safety and efficacy in humans.
Participants are typically healthy individuals, and investigators measure how
the drug affects healthy bodies, document potential side effects, and adjust
dosing levels.
Sample sizes are often less than 100 participants.
% \cite{commissioner_drugdevelopmentprocess_2020}.
Phase II trials typically involve a few hundred participants and are where
investigators refine dosing, research methods, and safety.
% \cite{commissioner_drugdevelopmentprocess_2020}.
A Phase III trial is the final trial before approval by the FDA, and is where
the investigator must demonstrate safety and efficacy with a large number of
participants, usually on the order of hundreds or thousands.
% \cite{commissioner_drugdevelopmentprocess_2020}.
Occasionally, a trial will be a multi-phase trial, covering aspects of either
Phases I and II or Phases II and III.
After a successful Phase III trial, the sponsor will decide whether or not
to submit an application for approval from the FDA.
Before filing this application, the developer must have completed
``two large, controlled clinical trials.''
% \cite{commissioner_drugdevelopmentprocess_2020}.
Phase IV trials are used after the drug has received marketing approval to
validate safety and efficacy in the general populace.
Throughout this whole process, the FDA is available to assist in
decision-making regarding topics such as study design, document review, and
whether the sponsor should terminate the trial.
The FDA also reserves the right to place a hold on a clinical trial for
safety or other operational concerns, although this is rare
\cite{commissioner_drugdevelopmentprocess_2020}.
In the economics literature, most of the focus has been on describing how
drug candidates transition between different phases and their probability
of final approval.
% Lead into lit review
% Abrantes-Metz, Adams, Metz (2004)
\authorcite{abrantes-metz_pharmaceuticaldevelopmentphases_2004}
described the relationship between
various drug characteristics and how the drug progressed through clinical trials.
% This descriptive estimate was notable for using a
% mixed state proportional hazard model and estimating the impact of
% observed characteristics in each of the three phases.
They found that as Phase I and II trials last longer,
the rate of failure increases.
In contrast, Phase III trials generally have a higher rate of
success than failure after 91 months.
This may be because the purposes of Phases I and II differ
from the purpose of Phase III.
Continuing on this theme,
%DiMasi FeldmanSeckler Wilson 2009
\authorcite{dimasi_trendsrisksassociated_2010}
examine the completion rate of clinical drug
development and find that for the 50 largest drug producers,
approximately 19\% of their drugs under development between 1993 and 2004
successfully moved from Phase I to receiving a New Drug Application (NDA)
or Biologics License Application (BLA).
They note several changes in how drugs were developed over the years they
study, most notably that drugs began to fail earlier in their development
cycle in the latter half of the study period.
They note that this may reduce the cost of new drugs by eliminating late
and costly failures in the development pipeline.
Earlier work by
\authorcite{dimasi_valueimprovingproductivity_2002}
used data on 68 investigational drugs from 10 firms to simulate how reducing
time in development reduces the costs of developing drugs.
He estimates that reducing Phase III of clinical trials by one year would
reduce total costs by about 8.9\% and that moving 5\% of clinical trial failures
from Phase III to Phase II would reduce out-of-pocket costs by 5.6\%.
A key contribution to this drug development literature is the work by
\authorcite{khmelnitskaya_competitionattritiondrug_2021}
who created a causal identification strategy
to disentangle strategic exits from exits due to clinical failures
in the drug development pipeline.
She found that overall 8.4\% of all pipeline exits are due to strategic
terminations and that the rate of new drug production would be about 23\%
higher if those strategic terminations were eliminated.
The work closest to mine is that of
\authorcite{hwang_failureinvestigationaldrugs_2016},
who investigated the causes for which late-stage (Phase III)
clinical trials fail -- with a focus on trials in the USA,
Europe, Japan, Canada, and Australia.
They identified 640 novel therapies and then studied each therapy's
development history, as outlined in commercial datasets.
They found that for late stage trials that did not go on to receive approval,
57\% failed on efficacy grounds, 17\% failed on safety grounds, and 22\% failed
on commercial or other grounds.
Unfortunately, the work of both
\authorcite{hwang_failureinvestigationaldrugs_2016}
and
\authorcite{khmelnitskaya_competitionattritiondrug_2021}
ignores a potentially large cause of failure: operational challenges, i.e.,
when issues running or funding the trial cause it to fail before achieving its
primary objective.
In a personal review of 199 randomly selected clinical trials which terminated
before achieving their primary objective,
I found that
14.5\% cited safety or efficacy concerns,
9.1\% cited funding problems (an operational concern),
and
31\% cited enrollment issues (a separate operational concern)\footnote{
Note that these figures differ from
\authorcite{hwang_failureinvestigationaldrugs_2016}
because I sampled from all stages of trials, not just Phase III trials
focused on drug development.
}.
The main contribution of this work is the model I develop to separate
the causal effects of
market conditions (a strategic concern) from the effects of
participant enrollment (an operational concern) on Phase III clinical trials.
This allows me to answer the question posed earlier:
\textit{
``How does the probability of trial termination change
when the enrollment period is extended?''
}
using administrative data.
To understand how I do this, we'll cover some background information on
clinical trials, the current literature,
and the administrative data I collected in section
\ref{SEC:ClinicalTrials}.
Then I'll
explain the approach to causal identification, the required data,
and describe how the data used matches these requirements in section
\ref{SEC:CausalAndData}.
Then we'll cover the econometric model
(section \ref{SEC:EconometricModel})
and results (section \ref{SEC:Results}).
Finally, we acknowledge deficiencies in the analysis and potential improvements
in section
\ref{SEC:Improvements}.

@ -92,7 +92,6 @@ or termination.
Termination occurs after enrollment has begun but before achieving the
primary objective.
Understanding why trials terminate early is the key goal of this work, but
is not straightforward.
Terminated trials typically record a
@ -110,8 +109,7 @@ led to the termination, leaving us to
use another way to infer the relative impact of operational difficulties.
\todo{move the following}
To better describe termination causes, I suggest classifying them into
three broad categories.
The first category, Safety or Efficacy concerns, occurs when data suggests
the treatment is unsafe or unlikely to achieve its therapeutic goals.
@ -129,152 +127,7 @@ These latter two categories represent true failures of the trial process,
as they prevent us from learning whether the treatment would have
been safe and effective.
\subsection{Data Summary}
%% Describe data here
Since September 27th, 2007, those who conduct clinical trials of
FDA-controlled drugs or devices on human subjects must register
@ -323,18 +176,13 @@ information about the past state of trials.
I combined these two sources, using the AACT dataset to select
trials of interest and then scraping \url{ClinicalTrials.gov} to get
a timeline of each trial.
The result is a series of snapshots, each documenting a specific set of
recorded changes in a trial.
These snapshots provide the opportunity to estimate the data generating
process for the clinical trials in my sample.
%%%%%%%%%%%%%%%%%%%%%%%% Model Outline
The way I use this data is to predict the final status of the trial
from the snapshots that were taken, in effect asking:
``how does the probability of a termination change from the current state
of the trial if X changes?''
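As an illustrative sketch (the notation here is mine and not established elsewhere in the paper), this question can be written as a contrast of predicted termination probabilities:

```latex
\[
  \Delta_{it}
  = \Pr\bigl(\text{Terminated} \mid X_{it} = x', S_{it}\bigr)
  - \Pr\bigl(\text{Terminated} \mid X_{it} = x, S_{it}\bigr)
\]
% X_{it}: the feature being changed (e.g., enrollment status) at snapshot t;
% S_{it}: the remaining recorded state of trial i at snapshot t.
```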
% -
% -
% -

@ -1,43 +0,0 @@
\documentclass[../Main.tex]{subfiles}
\graphicspath{{\subfix{Assets/img/}}}
\begin{document}
\begin{table}[h!]
\caption{Table of Percentiles of Distribution of Differences}
\label{TABLE:PercentilesOfDistributionOfDifferences}
\centering
\begin{tabular}{c c}
\hline
Percentile & Value \\
\hline
0\% & -0.9985020 \\
5\% & -0.3763454 \\
10\% & -0.2639654 \\
15\% & -0.2053399 \\
20\% & -0.1628793 \\
25\% & -0.1291890 \\
30\% & -0.0980523 \\
35\% & -0.0734082 \\
40\% & -0.0547123 \\
45\% & -0.0385514 \\
50\% & -0.0225949 \\
55\% & -0.0045955 \\
60\% & -0.0000394 \\
65\% & 0.0010549 \\
70\% & 0.0509626 \\
75\% & 0.1453046 \\
80\% & 0.3425234 \\
85\% & 0.7084837 \\
90\% & 0.9250351 \\
95\% & 0.9820456 \\
100\% & 1.0000000 \\
\hline
\end{tabular}
\end{table}
% This is here specifically to allow the table above to compile. Not sure why it is needed...
\begin{table}[h!]
\end{table}
\end{document}

@ -5355,7 +5355,7 @@ California 90401-3208},
file = {/home/will/Zotero/storage/KAHW2ABD/Indexing-SPL-Fact-Sheet.pdf}
}
@online{usnlm_fdaaa801finalrule,
type = {Government},
title = {{{FDAAA}} 801 and the {{Final Rule}} - {{ClinicalTrials}}.Gov},
author = {{U.S. National Library of Medicine}},

@ -5,11 +5,64 @@
Need to decide whether or not to include this set of sentences.
**** [2025-01-18 Sat 11:58] [[file:/home/will/research/phd_deliverables/JobMarketPaper/Paper/sections/11_intro_and_lit.tex::45]]
decide whether to include these details here
** 2025-W05
*** 2025-01-29 Wednesday
**** [2025-01-29 Wed 10:12] Summary of yesterday, thoughts for today
Yesterday I got my draft mostly done. I rearranged the causal inference section
fixed some references, etc.
Today I want to remove a bunch of todos, read it backwards to fix things,
and get it sent to Tom.
I'll also run it by claude.ai.
** 2025-W17
*** 2025-04-21 Monday
**** [2025-04-21 Mon 11:17] Plan based on last weeks thinking things through
get list of things that Tom says I'm Missing
- Needs more citations
- Standard econometric concerns: Endogeneity, Simultaneity, etc.
- Needs to justify why I am doing what I am doing. What do I add?
Marketwide attempt to measure the impact of enrollment, an operational concern.
-
Integrate additional literature I've worked with.
- How big a concern are operational failures? (about 22% of failures)
- Topics of how to address issues and what issues arise are common (give a couple of examples)
- Efforts to reduce failures include better pharmacokinetics, attempts at improving enrollment, and better enrollment prediction (huge lit).
Then look at my outline:
- How can I adjust it to address those missing bits?
- How can I simplify the structure?
Maybe a discussion of concerns about simultaneity/endogeneity/other confounds/etc. is where I
bring up the confounding parameters and then build a list of how things interact.
I then use this to flesh out the DAG, and introduce the backdoor criterion.
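For reference, the standard backdoor adjustment formula this builds toward (here Z is any adjustment set satisfying the backdoor criterion relative to (X, Y)):

```latex
P\bigl(Y = y \mid \mathrm{do}(X = x)\bigr)
  = \sum_{z} P\bigl(Y = y \mid X = x, Z = z\bigr)\, P\bigl(Z = z\bigr)
```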
I think I'll put this together as a bullet point draft, using the * and -
notation for paragraphs and sentences respectively. Try to get the main points
of each sentence/paragraph out.
***** List of issues identified by Tom:
Reference style (Author year)
Reference better and more often.
Introduction needs to motivate the problem & what I am trying to do. (could use the sources I have on reasons for failures)
Various issues with tense etc. Use Claude.ai as editor for those.
Reorder sections or outline better
Causal inference vs DAG approach
- standard concerns in causal inference
- DAG isn't causal inference in Tom's view. He is right: a DAG isn't, but the backdoor criterion is.
- Will need to discuss standard concerns and how they may be related and then incorporate that into the DAG
- Then will need to discuss the backdoor criterion, the backdoor paths that exist, and choosing adjustment sets
- Replace bullet points with paragraphs (page 12) maybe use claude to convert that?
- Page 18 comment: Refer to Robins What IF book to get citation
Thoughts:
Chapter 10: Lists 3 sources of bias in preceding chapters (7, 8, 9)
- Selection
- Measurement
- Confounders
As I understand it, setting up the graph allows you to note where you
might have issues with all 3. Do-calc gives you the adjustment set to
handle confounding and selection, while measurement is handled either
through modelling uncertainty or improving your measurement approach.
***** Reading to complete before rewriting:
I think I should start by rereading (and taking notes on) What If and
the Causal Mixtape.

@ -4,19 +4,19 @@
**** DONE Push work to overleaf
DEADLINE: <2025-01-15 Wed> CLOSED: [2025-01-20 Mon 11:46]
*** 2025-01-17 Friday
**** DONE Fix JMP based on Tom's Suggestions and send to committee
CLOSED: [2025-01-29 Wed 10:11]
***** DONE Get references working properly
CLOSED: [2025-01-29 Wed 09:58]
**** TODO Redo analysis using "Recruiting" as the base status
The goal is to get the $\beta$'s for active, not recruiting.
**** TODO Fix JMP based on Tom's Suggestions and send to committee
***** TODO Get references working properly
- setup author date format
- fix references, add to Overleaf version
***** DONE fix issues
CLOSED: [2025-01-29 Wed 09:58]
***** TODO Read Backward
Identify poorly written portions (incomplete sentences and paragraphs) and what I was trying to communicate.
***** TODO fix issues
*** 2025-01-18 Saturday
**** DONE Decide if this section needs added
CLOSED: [2025-01-29 Wed 09:58]
**** TODO Decide if this section needs added
[[file:/home/will/research/phd_deliverables/JobMarketPaper/Paper/sections/11_intro_and_lit.tex::45]]
nope
**** RECINDED Update citations in lit review section.
CLOSED: [2025-01-20 Mon 11:47]
[[file:/home/will/research/phd_deliverables/JobMarketPaper/Paper/sections/05_LitReview.tex::25]]
@ -32,34 +32,36 @@
** 2025-W04
*** 2025-01-20 Monday
**** DONE get a citation for the AACT project
CLOSED: [2025-01-29 Wed 09:55]
[[file:/home/will/research/phd_deliverables/JobMarketPaper/Paper/sections/10_CausalStory.tex::114]]
*** 2025-01-23 Thursday
**** DONE Pickup citation fixes here
CLOSED: [2025-01-29 Wed 09:55]
[[file:/home/will/research/phd_deliverables/JobMarketPaper/Paper/sections/06_Results.tex::174]]
** 2025-W05
*** 2025-01-29 Wednesday
**** TODO Review JMP, list areas that need rewritten.
***** TODO Read Backward
Identify poorly written portions (incomplete sentences and paragraphs) and what I was trying to communicate.
**** TODO Redo analysis using "Recruiting" as the base status
The goal is to get the $\beta$'s for active, not recruiting.
**** TODO Rerun analysis with correct base
[[file:/mnt/backups/home/dad/research/PhD_Deliverables/JobMarketPaper/Paper/sections/06_Results.tex::204]]
The natural comparison I want to make is against the Recruiting status.
Do I want to redo this so that I can read that directly?
It shouldn't affect the $\delta_p$ analysis, but this could probably use it.
YES, THIS UPDATE NEEDS TO HAPPEN. The base needs to be ``Active, not recruiting.''
So the plan is to set ``Active, not recruiting'' as the base condition, then
measure the effect when that is changed to ``Recruiting''.
If that is negative, then extending recruiting reduces the probability of
termination.
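A minimal sketch of that re-basing step in pandas (the status strings come from the plan above; the variable names and toy data are mine):

```python
import pandas as pd

# Toy snapshot statuses; the real analysis uses each trial's recorded
# overall status at each snapshot.
statuses = pd.Series(
    ["Recruiting", "Active, not recruiting", "Recruiting", "Completed"]
)

# Put "Active, not recruiting" first so it becomes the base category.
cat = pd.Categorical(
    statuses,
    categories=["Active, not recruiting", "Recruiting", "Completed"],
)

# drop_first=True omits the base level from the design matrix, so every
# remaining status coefficient is read relative to "Active, not recruiting".
X = pd.get_dummies(cat, drop_first=True)
print(list(X.columns))  # ['Recruiting', 'Completed']
```

With this encoding, a negative coefficient on the `Recruiting` dummy would mean that extending recruiting reduces the probability of termination relative to the base status.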
*** 2025-01-30 Thursday
**** TODO How to finish for cbo
So I need to create a cbo branch, remove references to betas, and finish
edits for submission tomorrow.
*** 2025-02-01 Saturday
**** TODO Plan for finishing jmp draft
***** analysis updates
****** DONE apply fixes to main analysis
****** DONE Remove rebased etc work
CLOSED: [2025-02-01 Sat 11:24]
****** DONE reenable fit summary
****** DONE increase sampling size
****** DONE reparameterize sigma
CLOSED: [2025-02-01 Sat 13:27]
Changed to lognormal
****** RECINDED remove last few groups that are not actual diseases
CLOSED: [2025-02-01 Sat 13:42]
decided to leave as is for now.
***** graphics updates
****** TODO remove quantity beta comparisons
****** TODO save status_diff analysis
****** TODO change distdiff by group to include more groups
***** writing updates
****** TODO Remove discussions related to previous betas work
****** TODO add in status_diff analysis
****** TODO change mode info to match new lognormal(location,scale) priors
