First complete draft of JMP

Various adjustments; used claude.ai to get some suggestions. Updated the causal model graph to represent my current understanding and tried to make it colorblind-friendly.
recording changes
14 changed files with 1328 additions and 450 deletions
--- a/Paper/Main.tex
+++ b/Paper/Main.tex
@@ -57,7 +57,7 @@ completion of clinical trials\\ \small{Preliminary Draft}}


 %---------------------------------------------------------------
-\section{Causal Story and Data}\label{SEC:Data}
+\section{Causal Story and Data}\label{SEC:CausalAndData}
 %---------------------------------------------------------------
 \subfile{sections/10_CausalStory}
 \subfile{sections/02_data}
--- a/Paper/sections/02_data.tex
+++ b/Paper/sections/02_data.tex
@@ -92,9 +92,11 @@ areas of particular interest.
            \item Active, not recruiting
            \item Suspended %I don't explicitly deal with this case
        \end{itemize}
-	When a trial has ended it is in one of two states:
+	When a trial has ended it is in one of three states:
        \begin{itemize}
-            \item Terminated: Trial has ended premateurly
+            \item Withdrawn: Trial ended before any participants were enrolled.
+                I filtered all of these out as they are not relevant to this analysis.
+            \item Terminated: Trial has ended prematurely.
            \item Completed: Trial has ended after observing what they hoped to observe.
        \end{itemize}
    \item Start Date: The date that the first measurement was taken or that the 
@@ -119,16 +121,17 @@ National Drug Code (NDC).
 The list of approved NDCs  are released regularly in the FDA's 
 Orangebook (small-molecule drugs) and Purplebook (Biologicals) publications.
 These two publications also contain information regarding which drugs are generics
-or biosimilars. %TODO: REF
+or biosimilars.
 %which drugs are originators and which are generics (there is a better word for originator).

 Before a drug or drug compound is sold on the market, the FDA requires the seller
 to submit a standardized label and associated information called
 a Structured Product Label (SPL). 
-These SPLs include information about dosage, ingredients, warnings, and printed labels.
+These SPLs include information about dosage, ingredients, warnings, 
+and the format of the printed labels.
 Each NDC code can have multiple SPLs associated with it because each
 drug compound may be packaged in multiple ways, e.g. boxes with different 
-numbers of blister packs.
+numbers of blister packs.
 These SPLs are made available for download so that they can be integrated 
 into patient health systems to improve patient safety 
 \cite{usfda_splfactsheet_2023}.
@@ -150,16 +153,18 @@ to be on the market at a given date.
 %   Links to data

 %----------------------------------------------------
-\subsubsection{Global Disease Burden Survey}
+\subsubsection{Global Burdens of Disease (2019)}

 The University of Washington's Institute for Health Metrics and Evaluation
 published a dataset called the Global Burdens of Disease Study 2019 (GBD 2019).
 This dataset provides estimates of worldwide incidence of 
 various diseases and classes of diseases.
 %\footnote{A full list of the diseases and categories can be found in \ref{Appendix1}}
-The available measures of incidence include Deaths, Disability Adjusted Life Years (DALYs), 
-Years of Life Lost (YLL), and Years Lived with Disability (YLD) and come with 
-both an estimate and 95\% confidence interval bounds.
+The available measures of disease burden include Deaths, 
+Disability Adjusted Life Years (DALYs), 
+Years of Life Lost (YLL), 
+and Years Lived with Disability (YLD).
+Each measure comes with both a point estimate and 90\% confidence interval bounds.
 Estimates are available for national, multinational, and global 
 populations 
 \cite{vos_globalburden369_2020}.
@@ -172,9 +177,6 @@ that are most important from a public health perspective.
 %not quite sure how to fill this out. What I am hoping to do is to justify my use of 
 % the highest level (most precise categories of data. Might be better to discuss
 % the nested category outline.
-The IHME also provides a link between the disease/cause hierarchy and ICD10 
-codes 
-\cite{globalburdenofdiseasecollaborativenetwork_globalburdendisease_2020a}.


 %----------------------------------------------------
@@ -254,14 +256,14 @@ The second layer of the hierarchy consists of about 225 groupings.


 %how was it used
-The GBD database provided a mapping between their categories and ICD-10
-codes 
-\cite{globalburdenofdiseasecollaborativenetwork_globalburdendisease_2020a}
-.
-Unfortunately it appears to use a combination of the default WHO ICD-10 codes
-and the ICD-10-CM codes from the CMS. 
-Additionally, many diseases classified by ICD-10 codes do not correspond to 
-categories in the GDB database.
+% The GBD database provided a mapping between their categories and ICD-10
+% codes 
+% \cite{globalburdenofdiseasecollaborativenetwork_globalburdendisease_2020a}
+% .
+% Unfortunately it appears to use a combination of the default WHO ICD-10 codes
+% and the ICD-10-CM codes from the CMS. 
+% Additionally, many diseases classified by ICD-10 codes do not correspond to 
+% categories in the GDB database.

 %how it was obtained
 As I needed a combined list of ICD-10 codes, I first obtained the 2019 version
@@ -308,14 +310,9 @@ and the trial's overall status at the time.
 I also extracted the anticipated enrollment closest to the actual start date 
 of the trial, which I will call the planned enrollment under the assumption
 that the sponsor is recording their current plan for enrollment.
-From these I constructed a couple of normalized values.
-
-The first is a normalized measure of enrollment. 
-This was constructed by dividing the snapshot enrollment by the planned enrollment.
-The purpose of this was to normalize enrollment to a scale roughly around 1 
-instead of the widely varying counts that raw enrollment would give.
-The second was a measure of how far along the trial was
-in it's planned duration, in other word a measure of elapsed duration.
+
+I then calculated a normalized measure of how far along the trial was
+in its planned duration; in other words, a measure of elapsed duration.
 This was calculated for each snapshot as:
 \begin{align}
 	\text{Elapsed Duration} = 
@@ -331,14 +328,12 @@ As an initial measure of market conditions I have gathered the number of brands
 that are producing drugs containing the compound(s) of interest in the trial.
 This was done by extracting the RxCUIs that represented the drugs of interest,
 then linking those to the RxCUIs that are brands containing those ingredients.
-
-
 As a secondary measure of market conditions, I linked clinical trials to the
 USP Drug Classification list.
+
 Once I had linked the drugs used in a trial to the applicable USP DC category 
 and class, I could find the number of alternative brands in that class.
-This matching was performed by hand, using a custom web interface to the database.
-
+This matching was performed by hand, using a custom web interface I wrote.
 In order to link clinical trials to standardized ICD-10 conditions and thus
 to the Global Burdens of Disease Data, I wrote a python script to search the 
 UMLS system for ICD-10 codes that matched the MeSH descriptions for
@@ -349,11 +344,6 @@ This search resulted in generally three categories of search results:
    \item The results contained a large number of entries, a few of which were correct.
    \item The results did not contain any matches.
 \end{enumerate}
-In these cases I needed a way to validate each match and potentially add my own
-ICD-10 codes to each trial.
-This matching was also performed by hand, using a separate custom web interface to the database.
-
-The effort to manually match ICD-10 codes and USP DC categories and classes data is ongoing.

 %Describe linking icd10 codes to GBD
 % Not every icd10 code maps, so some trials are excluded.
@@ -365,7 +355,7 @@ Linking to one of the disease categories in the GBD heirarchy is similarly easy.
 To get the best estimate of the size of the population associated with a disease,
 each trial is linked to the most specific disease category applicable.
 As not every ICD-10 code is linked to a condition in the GBD, those without any 
-applicable conditions are dropped from the dataset.
+applicable conditions were dropped from the dataset.


 \end{document}
--- a/Paper/sections/04_EconometricModel.tex
+++ b/Paper/sections/04_EconometricModel.tex
@@ -6,11 +6,8 @@

 The model I use is a 
 hierarchal logistic regression model where the 
-hierarchies are based on disease categories.
-%%NOTATION
-% change notation
-% i indexes trials for y and d 
-% n indexes snapshots within the trial
+hierarchies correspond to the 22 top-level ICD-10 disease categories.
+The goal is to take each snapshot and predict the probability of termination.

 First, some notation:
 \begin{itemize}
@@ -23,7 +20,6 @@ First, some notation:
        variables associated with the snapshot.
 \end{itemize} 

-The goal is to take each snapshot and predict 
 The actual specification of the model to measure 
 the direct effect of enrollment is:
 \begin{align}
@@ -33,7 +29,7 @@ the direct effect of enrollment is:
 Where beta is indexed by 
 $d \in \{1,2,\dots,21,22\}$ 
 for each general ICD-10 category.
-The betas are distributed
+The $\beta$s are distributed as
 \begin{align}
    \beta(d_i) \sim \text{Normal}(\mu_i,\sigma_i I)
 \end{align}
@@ -47,7 +43,6 @@ With hyperpriors


 The independent variables include: 
-\todo{Make sure data is described before this point.}
 \begin{subequations}
 \begin{align}
    x_{i,n}\beta(d_i) 
@@ -66,20 +61,16 @@ The independent variables include:
 \end{align}
 \end{subequations}
 The arcsinh transform is used because it is similar to a log transform but
-differentiably handles counts of zero since 
+differentiably handles counts of zero, since 
 $\text{arcsinh}(0) = \ln (0 + \sqrt{0^2 + 1}) =0$.
-Note that in this is a heirarchal model, each IDC-10 disease category 
-gets it's own set of parameters, and that is why the $\beta$s are parameterized
-by $d_i$.
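+For reference, the transform and its slope are the standard identities
+\begin{align}
+   \text{arcsinh}(x) = \ln\left(x + \sqrt{x^2 + 1}\right),
+   \qquad
+   \frac{d}{dx}\,\text{arcsinh}(x) = \frac{1}{\sqrt{x^2 + 1}},
+\end{align}
+so the slope is 1 at $x = 0$, while for large counts
+$\text{arcsinh}(x) \approx \ln(2x) = \ln 2 + \ln x$,
+i.e. it behaves like a log transform shifted by the constant $\ln 2$.
+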
 %%%% Not sure if space should go here. I think these work well together.
-Other variables are implicitly controlled for as they are used 
+Some of the other variables are implicitly controlled for as they are used 
 to select the trials of interest.
 These include:
-        \todo{double check these in the code.}
 \begin{itemize}
    \item The trial is Phase 3.
    \item The trial has a Data Monitoring Committee.
-    \item The compounds are FDA regulated drug.
+    \item The compounds are FDA regulated drugs.
    \item The trial was never suspended\footnote{
        This was because I wasn't sure how to handle it in the model
        when I started scraping the data. 
@@ -128,8 +119,10 @@ under the counterfactual where enrollment had not yet closed.
 The difference 
 $\delta_{p_{i,n}}$ 
 is then calculated for each trial, and saved. 
-After repeating this for all the posterior samples, we have an esitmate 
-for the posterior distribution of differences between treatement and control.
+After repeating this for all the posterior samples and 
+all trials at the snapshot where enrollment closed, we have an estimate 
+of the posterior distribution of differences between treatment and control
+for the selected trials.
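+As a sketch of this calculation, assume the inverse-logit link of the
+logistic model above, write $\beta^{(s)}(d_i)$ for posterior draw $s$, and
+write $\tilde{x}_{i,n}$ for the covariates of snapshot $n$ with the enrollment
+status set to its counterfactual (not yet closed) value; these two symbols
+are introduced here only for illustration.
+The difference for trial $i$ and draw $s$ is then
+\begin{align}
+   \delta^{(s)}_{p_{i,n}} =
+       \text{logit}^{-1}\left(\tilde{x}_{i,n}\beta^{(s)}(d_i)\right)
+       - \text{logit}^{-1}\left(x_{i,n}\beta^{(s)}(d_i)\right),
+   \qquad
+   \text{logit}^{-1}(z) = \frac{1}{1 + e^{-z}},
+\end{align}
+so each $\delta^{(s)}_{p_{i,n}}$ lies in $(-1, 1)$, and positive values
+indicate a higher probability of termination under the counterfactual.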


 \end{document}
--- a/Paper/sections/06_Results.tex
+++ b/Paper/sections/06_Results.tex
@@ -71,6 +71,161 @@ not represented at all.
    \label{FIG:barchart_idc_categories}
 \end{figure}

+% Estimation Procedure
+I fit the econometric model using Stan 
+\cite{standevelopmentteam_StanModelling_2022}
+through the rstan 
+\cite{standevelopmentteam_RStanInterface_2023}
+interface, using 4 chains with 
+%describe  
+2,500
+warmup iterations and
+2,500
+sampling iterations each.
+
+Two of the chains had a low 
+Energy Bayesian Fraction of Missing Information (E-BFMI),
+suggesting that some parts of the posterior distribution
+were not explored well during model fitting. 
+I suspect this is due to the low number of trials in some of the 
+ICD-10 categories.
+We can see in Figure \ref{fig:barchart_idc_categories} that some of these 
+disease categories had a single trial represented while others were 
+not represented at all.
+
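+For reference, E-BFMI is computed separately for each chain from the
+Hamiltonian energy $E_t$ at sampling iteration $t$.
+A commonly cited form of the estimator is shown below; the notation
+($E_t$, $\bar{E}$, $T$) is introduced here for illustration, and Stan's
+implementation may normalize the terms slightly differently:
+\begin{align}
+   \widehat{\text{E-BFMI}} =
+       \frac{\sum_{t=1}^{T} \left(E_t - E_{t-1}\right)^2}
+            {\sum_{t=0}^{T} \left(E_t - \bar{E}\right)^2}.
+\end{align}
+Small values mean the energy changes little from one iteration to the next
+relative to its overall variation across the chain, i.e. the sampler moves
+through the energy distribution slowly.
+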
+\begin{figure}[H]
+    \includegraphics[width=\textwidth]{../assets/img/trials_details/CategoryCounts}
+    \caption{Bar chart of trials by ICD-10 categories}
+    \label{fig:barchart_idc_categories}
+\end{figure}
+
+
+\subsection{Primary Results}
+
+The primary, causally identified quantity we can estimate is the change in 
+the probability of termination caused by (counterfactually) keeping enrollment
+open instead of closing it when observed. 
+Figure \ref{fig:pred_dist_diff_delay} below shows the estimated impact of 
+keeping enrollment open.
+
+
+\begin{figure}[H]
+    \includegraphics[width=\textwidth]{../assets/img/dist_diff_analysis/p_delay_intervention_distdiff_boxplot}
+    \todo{Replace this graphic with the histdiff with boxplot}
+    \small{
+        Values near 1 indicate an increase of nearly 100 percentage points 
+        in the probability of termination. 
+        Values near 0 indicate little change in probability,
+        while values near $-1$ represent a comparably large decrease in the 
+        probability of termination. 
+        The scale is in probability points; thus a value near 1 is a change 
+        from unlikely to terminate under control to highly likely to 
+        terminate under treatment.
+    }
+    \caption{Histogram of the Distribution of Predicted Differences}
+    \label{fig:pred_dist_diff_delay}
+\end{figure}
+
+There are a few interesting things to point out here. 
+Let's start by getting acquainted with the details of the distribution above.
+% - spike at 0
+% - the boxplot
+% - 63% of mass below 0 : find better way to say that
+%   - For a random trial, there is a 63% chance that the impact is to reduce the probability of a termination.
+% - 2 pctg-point wide band centered on 0 has ~13% of the masss
+% - mean represents 9.x% increase in probability of termination. A quick simulation gives about the same pctg-point increase in terminated trials.
+
+A few points of interpretation follow from this.
+% - there are 3 regimes: low impact (near zero), medium impact (concentrated in decreased probability of termination), and high impact (concentrated in increased probability of termination). 
+The first is that there appear to be three different regimes. 
+The first regime consists of the low-impact results, i.e. those values of $\delta_p$ 
+near zero. 
+About 13\% of trials lie within a single percentage point of zero, 
+suggesting that there is a reasonable chance that delaying 
+the close of enrollment has no impact. 
+The second regime consists of moderate impacts on clinical trials'
+probabilities of termination, say values in the interval $[-0.5, 0.5]$ 
+on the graph.
+Most of this probability mass represents a decrease in the probability of 
+termination, some of it rather large.
+Finally, there is a high-impact regime, almost exclusively concentrated 
+around increases in the probability of termination of $\delta_p > 0.75$. 
+These represent cases where delaying the close of enrollment changes a trial
+from being highly likely to complete its primary objectives to 
+being likely or almost certain to terminate early.
+%   - the high impact regime is strange because it consists of trials that moved from unlikely (<20% chance) of termination to a high chance (>80% chance) of termination. Something like 5% of all trials have a greater than 98 percentage point increase in termination. Not sure what this is doing. 
+
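+For concreteness, shares such as the roughly 13\% near zero can be read off
+the posterior draws as empirical frequencies.
+A sketch, assuming $S$ posterior draws, a set $\mathcal{T}$ of selected trials,
+and $\delta^{(s)}_{p_i}$ for trial $i$ at its enrollment-close snapshot
+(notation introduced here only for illustration), is
+\begin{align}
+   \widehat{\Pr}\left(\left|\delta_p\right| \le 0.01\right) =
+       \frac{1}{S\,|\mathcal{T}|}\sum_{s=1}^{S}\sum_{i \in \mathcal{T}}
+       \mathbf{1}\left\{\left|\delta^{(s)}_{p_i}\right| \le 0.01\right\},
+\end{align}
+where $0.01$ is one percentage point on the probability scale; replacing the
+indicator with $\mathbf{1}\{\delta^{(s)}_{p_i} < 0\}$ gives the share of mass
+representing a decrease in the probability of termination.
+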
+%   - Potential Explanations for high impact regime:
+How could this intervention have such a wide range in the intensity 
+and direction of its impacts?
+Two possible explanations are that some trials are particularly susceptible, 
+or that this is a result of too little data.
+%       - Some trials are highly suceptable. This is the face value effect
+One option is that some categories are more susceptible to 
+issues with participant enrollment. 
+If this is the case, we should be able to isolate the categories that contribute
+the most to this effect.
+Another is that this might be a modelling artifact, due to the relatively
+low number of trials in certain ICD-10 categories. 
+In short, there might be high levels of uncertainty in some parameter values,
+which manifest as fat tails in the distributions of the $\beta$ parameters. 
+Because of the logistic form of the model, these fat tails lead to 
+extreme values of $p$, and potentially large changes in $\delta_p$. 
+%       - Could be uncertanty. If the model is highly uncertain, e.g. there isn't enough data, we could have a small percentage of large increases. This could be in general or just for a few categories with low amounts of data.
+% - 
+% - 
+
+I believe that the second explanation (a model artifact due to uncertainty)
+is the more likely cause. 
+Four points lead me to this conclusion:
+\begin{itemize}
+    \item The low E-BFMI values suggest that the sampler is struggling 
+        to explore some regions of the posterior. 
+        According to \cite{standevelopmentteam_RuntimeWarnings_2022}, this is 
+        often due to thick tails in posterior distributions.
+    \item When we examine the results across different ICD-10 groups
+        (Figure \ref{fig:pred_dist_dif_delay2}),
+        \todo{move figure from below}
+        we note the same issue.
+    \item In Figure \ref{fig:betas_delay}, we see that some ICD-10 categories
+        \todo{add figure}
+        have \todo{note fat tails}.
+    \item There are few trials available, particularly among some specific 
+        ICD-10 categories.
+\end{itemize}
+%           - take a look at beta values and then discuss if that lines up with results from dist-diff by group. 
+%       - My initial thought is that there is not enough data/too uncertain. I think this because it happens for most/all of the categories.
+% - 
+% - 
+% - 
+Overall, it is hard to escape the conclusion that more data is needed across
+many -- if not all -- of the disease categories.
+
+
+
+Figure \ref{fig:pred_dist_dif_delay2} shows how this overall
+result comes from different disease categories.
+\begin{figure}[H]
+    \includegraphics[width=\textwidth]{../assets/img/dist_diff_analysis/p_delay_intervention_distdiff_by_group}
+    \caption{Distribution of Predicted differences by Disease Group}
+    \label{fig:pred_dist_dif_delay2}
+\end{figure}
+
+
+\subsection{Secondary Results}
+
+% Examine beta parameters 
+% - Little movement except where data is strong, general negative movement. Still really wide 
+% - Note how they all learned (partial pooling) reduction in \beta from ANR?
+% - Need to discuss the 5 different states. Can't remember which one is dropped for the life of me. May need to fix parameterization.
+% - 
+
+\begin{figure}[H]
+    \includegraphics[width=\textwidth]{../assets/img/betas/parameter_across_groups/parameters_12_status_ANR}
+    \caption{Distribution of parameters associated with ``Active, not recruiting'' status, by ICD-10 Category}
+    \label{fig:parameters_ANR_by_group}
+\end{figure}
+% - 

 \subsection{Primary Results}

--- a/Paper/sections/08_PotentialImprovements.tex
+++ b/Paper/sections/08_PotentialImprovements.tex
@@ -30,7 +30,7 @@ include a model of the missing data
 \cite{mcelreath_statisticalrethinkingbayesian_2020}.
 which would
 allow me to estimate the direct effect of slow enrollment 
-on clinical trial termination rates
+on clinical trial termination rates.

 There has been substantial work on forecasting
 multi-site enrollment rates and durations by
@@ -67,6 +67,8 @@ ICD-10 code.
 In contrast, there is no category for non-age related macular degeneration.
 Thus not every trial has a good match with the estimate of the population of
 interest.
+Finding a way to focus on trials that have good disease population estimates
+would improve the efficiency of the analysis.

 \subsection{Improving Measures of Market Conditions}

--- a/Paper/sections/09_Conclusion.tex
+++ b/Paper/sections/09_Conclusion.tex
@@ -4,9 +4,36 @@
 \begin{document}
 Identifying commercial impediments to successfully completing 
 clinical trials in otherwise capable pharmaceuticals will hopefully 
-lead to a more robust and competitive market.
+lead to a more robust and competitive pharmaceutical market.
 Although the current state of this research is insufficient to draw robust
-conclusions, these early results suggest that delaying the close of enrollment periods
-reduces the probability of termination of a trial.
+conclusions, these early results suggest that delaying the close of 
+enrollment periods reduces the probability of termination of a trial.

+The successful completion of Phase III clinical trials is crucial for 
+bringing new treatments to market.  
+This research provides insights into how enrollment management 
+impacts trial outcomes.
+While the preliminary results suggest that delaying the close of enrollment 
+periods may reduce termination probability, the analysis 
+reveals substantial variation across disease categories and highlights 
+important methodological challenges. 
+The primary limitation that must be addressed before drawing strong conclusions
+is insufficient data. 
+This takes two forms.
+The first is the small sample size. 
+Overcoming this requires an improved data-matching 
+approach and a revised data scraper.
+The second is the lack of a model of enrollment, which is needed to address
+the causal identification issue arising from the joint determination of
+enrollment statuses and elapsed durations of trials.
+
+Despite these limitations, this work establishes a framework for analyzing 
+operational versus strategic factors in clinical trial completion. 
+The approach developed here can be extended with additional data to 
+provide more definitive guidance on enrollment management strategies. 
+Further research in this direction could help reduce operational 
+barriers to trial completion or estimate the impact that policies may have 
+through operational channels.
+Ultimately, this work will hopefully support more efficient drug 
+development and increased market competition.
 \end{document}
--- a/Paper/sections/10_CausalStory.tex
+++ b/Paper/sections/10_CausalStory.tex
@@ -3,131 +3,134 @@

 \begin{document}

+%I need to describe separating concerns, e.g.
+
 % Begin by talking about goal, what does it mean? This might need some work prior to give more background.
 As I am trying to separate strategic concerns 
 (the effect of a marginal treatment methodology) 
 and an operational concern 
 (the effect of a delay in closing enrollment), 
 we need to look at what confounds these effects and how we might measure them.
+To start, we'll look at the data-generating model, the values of interest, 
+and both the observed and unobserved confounders.
+We'll also discuss how the collected data fit the data-generating process.

 The primary effects one might expect to see are that
 \begin{enumerate}
    \item Adding more drugs to the market will make it harder to 
        finish a trial as it is
        more likely to be terminated due to concerns about profitabilty.
-    \item Adding more drugs will make it harder to recruit, slowing enrollment.
-    \item Enrollment challenges increase the likelihood that a trial will 
-        terminate.
-    % Mentioned below
-    % \item A large population/market will tends to have more drugs to treat it 
-    %     because it is more profitable. 
-    % \item A large population/market will make it easier to recruit, 
-    %     reducing the likelihood of a termination due to enrollment failure.
+    \item Adding more drugs to the market 
+        will make it harder to recruit, slowing enrollment.
+    \item Enrollment challenges (i.e. delays) increase the likelihood that 
+        a trial will terminate.
 \end{enumerate}
+Unfortunately, these causal effects are confounded in many different ways. 
+Figure \ref{FIG:CausalModel} contains a description of the causal model.

-There are a few fundamental issues that arise when trying to estimate 
-these effects.
-The first is that the severity of the disease and the size of the population 
-who has that disease affects the ease of enrolling participants. 
-For example, a large population may make it easier to find enough participants
-to achieve the required statistical discrimination between 
-control and treatment.
-Second, for some diseases there exists an endogenous dynamic 
-between the treatments available for a disease and the 
-market size/population with that disease. 
-\authorcite{cerda_endogenousinnovationspharmaceutical_2007}
-proposes two mechanisms
-that link the drugs on the market and market size. 
-The inverse is that for many chronic diseases with high mortality rates, 
-more drugs cause better survivability, increasing the size of those markets.
-The third major confound is that the drugs on the market affect enrollment. 
-If there is a treatment already on the market, patients or their doctors
-may be less inclined to participate in the trial, even if the current treatment
-has severe downsides. 
-
-There are additional problems. 
-One is in that the disease being treated affects the 
-safety and efficacy standards that the drug will be held too. 
-For example, if a particular cancer is very deadly and does not respond well
-to current treatments, Phase I trials will enroll patients with that cancer, 
-as opposed to the standard of enrolling healthy volunteers 
-\cite{commissioner_drugdevelopmentprocess_2020}
-to establish safe dosages.
-The trial is more likely to be terminated early if the drug is unsafe or has no
-discerenable effect, therefore termination depends in part on a compound-disease 
-interaction.
-Another challenge comes from the interaction between duration and termination;
-in that if a trial terminates before closing enrollment for issues other 
-than enrollment, then the enrollment will still be low. 
-On the other hand, if enrollment is low, the trial might terminate.
-These outcomes are indistinguishable in the data provided by the final 
-\url{ClinicalTrials.gov} dataset.
+% The first issue is that the severity of the disease and the size of the population 
+% who has that disease affects the ease of enrolling participants. 
+% For example, a large population may make it easier to find enough participants
+% to achieve the required statistical discrimination between 
+% control and treatment.
+% Second, for some diseases there exists an endogenous dynamic 
+% between the treatments available for a disease and the 
+% market size/population with that disease. 
+% \authorcite{cerda_endogenousinnovationspharmaceutical_2007}
+% proposes two mechanisms
+% that link the drugs on the market and market size. 
+% The inverse is that for many chronic diseases with high mortality rates, 
+% more drugs cause better survivability, increasing the size of those markets.
+% The third major confound is that the drugs on the market affect enrollment. 
+% If there is a treatment already on the market, patients or their doctors
+% may be less inclined to participate in the trial, even if the current treatment
+% has severe downsides. 
+%
+% There are additional problems. 
+% One is in that the disease being treated affects the 
+% safety and efficacy standards that the drug will be held too. 
+% For example, if a particular cancer is very deadly and does not respond well
+% to current treatments, Phase I trials will enroll patients with that cancer, 
+% as opposed to the standard of enrolling healthy volunteers 
+% \cite{commissioner_drugdevelopmentprocess_2020}
+% to establish safe dosages and (hopefully) obtain some effectiveness data.
+% % The trial is more likely to be terminated early if the drug is unsafe or has no
+% % discerenable effect, therefore termination depends in part on a compound-disease 
+% % interaction.
+% Another challenge comes from the interaction between duration and termination;
+% in that if a trial terminates before closing enrollment for issues other 
+% than enrollment, then the enrollment will still be low. 
+% On the other hand, if enrollment is low, the trial might terminate.
+% Thus it is impossible to tell if the low enrollment caused the termination
+% or if the termination caused the low enrollment.
+% Finally, while conducting a trial, the safety and efficacy of a drug are driven by
+% fundamental pharmacokinetic properties of the compounds. 
+% These are only imperfectly measured both prior to and during any given trial.
+% Previously measured safety and efficacy inform the decision to start the trial
+% in the first place while currently observed safety and efficiency results
+% help the sponsor judge whether to continue the trial.
+% In contrast, the recruitment rate may depend on the previous results about safety 
+% and efficacy.

-Finally, while conducting a trial, the safety and efficacy of a drug are driven by
-fundamental pharmacokinetic properties of the compounds. 
-These are only imperfectly measured both prior to and during any given trial.
-Previously measured safety and efficacy inform the decision to start the trial
-in the first place while currently observed safety and efficiency results
-help the sponsor judge whether to continue the trial.

 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\subsection{Data Summary}
-%% Describe data here
-Since Sep 27th, 2007 those who conduct clinical trials of FDA controlled 
-drugs or devices on human subjects must register 
-their trial at \url{ClinicalTrials.gov}
-(\cite{anderson_fdadrugapproval_2022}).
-This involves submitting information on the expected enrollment and duration of
-trials, drugs or devices that will be used, treatment protocols and study arms, 
-as well as contact information the trial sponsor and treatment sites.
-
-When starting a new trial, the required information must be submitted 
-``\dots not later than 21 calendar days after enrolling the first human subject\dots''.
-After the initial submission, the data is briefly reviewed for quality and 
-then the trial record is published and the trial is assigned a 
-National Clinical Trial (NCT) identifier.
-(\cite{anderson_fdadrugapproval_2022}).
-
-Each trial's record is updated periodically, including a final update that must occur 
-within a year of completing the primary objective, although exceptions are
-available for trials related to drug approvals or for trials with secondary
-objectives that require further observation\footnote{This rule came into effect in 2017}
-(\cite{anderson_fdadrugapproval_2022}).
-Other than the requirements for the first and last submissions, all other
-updates occur at the discresion of the trial sponsor.
-Because the ClinicalTrials.gov website serves as a central point of information
-on which trials are active or recruting for a given condition or drug,
-most trials are updated multiple times during their progression.
-
-There are two primary ways to access data about clinical trials.
-The first is to search individual trials on ClinicalTrials.gov with a web browser.
-This web portal shows the current information about the trial and provides 
-access to snapshots of previously submitted information.
-Together, these features fulfill most of the needs of those seeking 
-to join a clinical trial.
-For this project I've been able to scrape these historical records to establish
-snapshots of the records provided.
-%include screenshots?
-The second way to access the data is through a normalized database setup by
-the 
-\href{https://aact.ctti-clinicaltrials.org/}{Clinical Trials Transformation Initiative}
-called AACT. %TODO: Get CITATION
-The AACT database is available as a PostgreSQL database dump or set of 
-flat-files. 
-These dumps match a near-current version of the ClinicalTrials.gov database.
-This format is ameniable to large scale analysis, but does not contain 
-information about the past state of trials.
-I combined these two sources, using the AACT dataset to select 
-trials of interest and then scraping \url{ClinicalTrials.gov} to get 
-a timeline of each trial.
-
-%%%%%%%%%%%%%%%%%%%%%%%% Model Outline
-
-The way I use this data is to predict the final status of the trial 
-from the snapshots that were taken, in effect asking:
-``how does the probability of a termination change from the current state 
-of the trial if X changes?''
-
+% \subsection{Data Summary}
+% %% Describe data here
+% Since Sep 27th, 2007 those who conduct clinical trials of FDA controlled 
+% drugs or devices on human subjects must register 
+% their trial at \url{ClinicalTrials.gov}
+% (\cite{anderson_fdadrugapproval_2022}).
+% This involves submitting information on the expected enrollment and duration of
+% trials, drugs or devices that will be used, treatment protocols and study arms, 
+% as well as contact information for the trial sponsor and treatment sites.
+%
+% When starting a new trial, the required information must be submitted 
+% ``\dots not later than 21 calendar days after enrolling the first human subject\dots''.
+% After the initial submission, the data is briefly reviewed for quality and 
+% then the trial record is published and the trial is assigned a 
+% National Clinical Trial (NCT) identifier.
+% (\cite{anderson_fdadrugapproval_2022}).
+%
+% Each trial's record is updated periodically, including a final update that must occur 
+% within a year of completing the primary objective, although exceptions are
+% available for trials related to drug approvals or for trials with secondary
+% objectives that require further observation\footnote{This rule came into effect in 2017}
+% (\cite{anderson_fdadrugapproval_2022}).
+% Other than the requirements for the first and last submissions, all other
+% updates occur at the discretion of the trial sponsor.
+% Because the ClinicalTrials.gov website serves as a central point of information
+% on which trials are active or recruiting for a given condition or drug,
+% most trials are updated multiple times during their progression.
+%
+% There are two primary ways to access data about clinical trials.
+% The first is to search individual trials on ClinicalTrials.gov with a web browser.
+% This web portal shows the current information about the trial and provides 
+% access to snapshots of previously submitted information.
+% Together, these features fulfill most of the needs of those seeking 
+% to join a clinical trial.
+% For this project I've been able to scrape these historical records to establish
+% snapshots of the records provided.
+% %include screenshots?
+% The second way to access the data is through a normalized database set up by
+% the 
+% \href{https://aact.ctti-clinicaltrials.org/}{Clinical Trials Transformation Initiative}
+% called AACT. %TODO: Get CITATION
+% The AACT database is available as a PostgreSQL database dump or set of 
+% flat-files. 
+% These dumps match a near-current version of the ClinicalTrials.gov database.
+% This format is amenable to large-scale analysis, but does not contain 
+% information about the past state of trials.
+% I combined these two sources, using the AACT dataset to select 
+% trials of interest and then scraping \url{ClinicalTrials.gov} to get 
+% a timeline of each trial.
+%
+% %%%%%%%%%%%%%%%%%%%%%%%% Model Outline
+%
+% The way I use this data is to predict the final status of the trial 
+% from the snapshots that were taken, in effect asking:
+% ``how does the probability of a termination change from the current state 
+% of the trial if X changes?''
+%
 %% Return to causal identification
 \subsection{Causal Identification}

@ -137,7 +140,6 @@ structural causal model.
 Because the data generating process for the clinical trials records is rather 
 straightforward, this is an ideal place to use
 \authorcite{pearl_causalitymodelsreasoning_2009}
-
 Do-Calculus.
 This process involves describing the data generating process in the form of 
 a directed acyclic graph, where the nodes represent different variables
@ -169,51 +171,46 @@ When appropriate issues arise, the study sponsor terminates the trial, otherwise
 it continues to completion.

 \begin{figure}[H] %use [H] to fix the figure here.
-    \frame{
-    \scalebox{0.65}{
-             \tikzfig{../assets/tikzit/CausalGraph2}
-    }
-    }
-    \todo{check if this is the correct graph}
+    \includegraphics[width=\textwidth]{../assets/img/CausalModel.drawio.png}
    \caption{Graphical Causal Model}
-    
    % \small{Crimson boxes are the variables of interest, 
    % white boxes are unobserved, while the gray boxes will be controlled for.}
-    \label{Fig:CausalModel}
+    \label{FIG:CausalModel}
 \end{figure}


-% Constructing the model more explicitly
-% - quickly describe each node and line.
-\todo{I think I need to blend the data section in before this, to give some overall information on data.}
-\todo{I may need to add some information on snapshots so that this makes sense.}
-
-A quick summary of the nodes of the DAG, the exact representation in the data, and their impact: 
+The following summarizes the nodes of the DAG, 
+which of them are observed in the data, 
+the hypothesized relationships in the model,
+and the proposed confounding pathways:
 \begin{itemize}
-    \item Main Interests (Crimson Boxes)
+    \item Items of Interest (Blue Boxes and Arrow)
        \begin{enumerate}
-            \item \texttt{Will Terminate?}: 
-                If the final status of the trial was \textit{terminated} 
-                and comes from the AACT dataset.
-                or \textit{completed}.
-            \item \texttt{Enrollment Status}: 
-                    This describes the current enrollment status of the snapshot, e.g. 
-                    \texttt{Recruiting},
-                    \texttt{Enrolling by invitation only},
-                    or
-                    \texttt{Active, not recruting}.
-            \item \texttt{Market Measures}: 
-                Various measures of the number of alternate drugs on the market.
-                These are either the number of other drugs with the same active ingredient as the trial
-                (both generic and originators),
-                and those considered alternatives in various formularies published by the United States Pharmacopeia.
+            \item \texttt{Enrollment Level (Enrollment Status)}:
+                Although some trials keep their enrollment counts 
+                up to date, the only enrollment information reported regularly
+                is the enrollment status, i.e. whether the trial has finished
+                recruiting.
+            \item \texttt{Will it Terminate?}: 
+                Whether the trial was terminated early or 
+                completed successfully.
+            \item The effect of \texttt{Enrollment Status} on 
+                \texttt{Will it Terminate?}: 
+                How changing the enrollment status affects the 
+                probability of termination.
        \end{enumerate}
-    \item Observed Confounders (Gray Boxes)
+    \item Observed Values (Solid Orange Boxes)
        \begin{enumerate}
-            \item \texttt{Condition}: 
+            \item \texttt{Condition} 
+                (not drawn in the DAG because it affects every other node):
                The underlying condition, classified by IDC-10 group. 
                This impacts every other aspect of the model and is pulled from
                the AACT dataset.
+            \item \texttt{Market Measures}: 
+                Various measures of the number of alternative drugs on the market.
+                These include the number of other drugs with the same active ingredient as the drug under study
+                (both generics and originators)
+                and the number of drugs considered alternatives in formularies published by the United States Pharmacopeia.
            \item \texttt{Population (market size)}: 
                Multiple measures of the impact the disease.
                These are measured by the DALY cost of the disease, and is 
@ -233,7 +230,7 @@ A quick summary of the nodes of the DAG, the exact representation in the data, a
                This is included in the analysis by only including 
                Phase III trials registered in the AACT dataset.
        \end{enumerate}
-    \item Unobserved Confounders (White Boxes)
+    \item Unobserved Values (Green Boxes with Squiggle Hatch Marks)
        \begin{enumerate}
            \item \texttt{Fundamental Efficacy and Safety}:
                The underlying safety of the compound. 
@ -252,27 +249,63 @@ A quick summary of the nodes of the DAG, the exact representation in the data, a
                As this information doesn't appear to be provided to 
                participants, we don't consider it.
        \end{enumerate}
-\end{itemize}
-
+% \end{itemize}
 %
-
-\begin{itemize}
-    \item Relationships of interest
+% %
+%
+% \begin{itemize}
+%     \item Relationships of interest
+%         \begin{enumerate}
+%             \item \texttt{Enrollment Status} $\rightarrow$ \texttt{Will Terminate?}:
+%                 This is the primary effect of interest.
+%             \item \texttt{Market Measures} $\rightarrow$ \texttt{Will Terminate?}:
+%                 This is the secondary effect of interest.
+%         \end{enumerate}
+    \item Jointly Determined Variables
        \begin{enumerate}
-            \item \texttt{Enrollment Status} $\rightarrow$ \texttt{Will Terminate?}:
-                This is the primary effect of interest.
-            \item \texttt{Market Measures} $\rightarrow$ \texttt{Will Terminate?}:
-                This is the secondary effect of interest.
+            \item 
+                \texttt{Enrollment Level (Enrollment Status)}
+                $\leftrightarrow$ 
+                \texttt{Elapsed Duration}:
+                Because I only observe the enrollment status and have no good 
+                estimate of the enrollment process, the elapsed duration of a 
+                trial and its enrollment status may be confounded.
+                The proposed mechanisms run through the partially observed 
+                enrollment levels.
+                First, as a trial progresses, its enrollment level should grow 
+                until it matches the planned enrollment and the trial ends. 
+                Thus, under good circumstances, elapsed duration drives 
+                enrollment levels.
+                Second, under bad circumstances, low enrollment levels may 
+                extend the duration, as study sponsors spend more resources
+                to complete the trial successfully.
+                This is a problem because the only complete measure of enrollment
+                currently available is the enrollment status, so I cannot 
+                control for this effect.
+            \item
+                \texttt{Market Conditions}
+                $\leftrightarrow$ 
+                \texttt{Population}:
+                    There exists an endogenous dynamic 
+                    between the treatments available for a disease and the 
+                    size of the population with that disease (the market size). 
+                    \authorcite{cerda_endogenousinnovationspharmaceutical_2007}
+                    proposes two mechanisms
+                    that link the drugs on the market and market size. 
+                    The first is that a larger population increases potential 
+                    profitability, encouraging firms to seek approval for more treatments.
+                    The second runs in the opposite direction: for many chronic diseases with high mortality rates, 
+                    more drugs improve survival, increasing the size of those markets.
        \end{enumerate}
    \item Confounding Pathways
        \begin{enumerate}
            \item 
-                \texttt{Condition}: 
-                Affects every other node. 
-                Part of the Adjustment Set.
+                \texttt{Condition} (not drawn in Figure~\ref{FIG:CausalModel}): 
+                Affects every other node. 
            \item Backdoor Pathway 
                between \texttt{Will Terminate?} and 
-                \texttt{Enrollment Status} through safety and efficiency.
+                \texttt{Enrollment Status} through 
+                \texttt{Fundamental Efficacy and Safety}.
                The concern is that since previously learned information 
                and current information are driven by the same underlying 
                physical reality, the enrollment process and 
@ -282,107 +315,58 @@ A quick summary of the nodes of the DAG, the exact representation in the data, a
                Below I describe the exact pathways.
                \begin{enumerate}
                    \item 
+                        \texttt{Will Terminate?}
+                        $\leftarrow$ 
+                        \texttt{Currently Observed Efficacy and Safety}
+                        $\leftarrow$ 
                        \texttt{Fundamental Efficacy and Safety} 
                        $\rightarrow$ 
-                        \texttt{Currently Observed Efficacy and Safety}:
-                        This relationship represents the measurements of
-                        safety and efficacy in the current trial. 
-                    \item 
-                        \texttt{Currently Observed Efficacy and Safety}:
+                        \texttt{Previously Observed Efficacy and Safety}
                        $\rightarrow$ 
-                        \texttt{Will Terminate?}:
-                        This is how the measurements of safety and efficacy in the 
-                        current trial affect the probability of termination.
-                        % typically, evidence of a lack safety or efficacy is 
-                        % enought to terminate the trial.
-                    \item \texttt{Fundamental Efficacy and Safety} 
+                        \texttt{Is likely safe and effective (Decision to proceed with Phase III trial)}
                        $\rightarrow$ 
-                        \texttt{Previously Observed Efficacy and Safety}:
-                        This relationship represents the measurements of
-                        safety and efficacy in work prior to the current trial. 
-                    \item 
-                        \texttt{Previously Observed Efficacy and Safety}:
+                        \texttt{Enrollment Process Parameters}
                        $\rightarrow$ 
-                        \texttt{Decision to proceed with Phase III}:
-                        Previously observed data is essential to the FDA's 
-                        decision to allow a phase III trial. 
+                        \texttt{Enrollment Levels (Enrollment Status)}
                \end{enumerate}
            \item 
-                Backdoor Pathway from \texttt{Market Status} 
-                to \texttt{Enrollment} 
-                through \texttt{Population}. 
+                Backdoor Pathways through \texttt{Population} and 
+                \texttt{Market Conditions}.
                The concern with this pathway is that the rate of enrollment, and
                thus the enrollment status, is affected by the Population with 
-                the disease. 
-                Additionally, there is a concern that the number of competitors
-                is driven by the total market size.
-                Thus adding Population to the adjustment set is necessary.
+                the disease and by market conditions. 
                \begin{enumerate}
                    \item 
-                        \texttt{Population} 
+                        \texttt{Will Terminate?}
+                        $\leftarrow$ 
+                        \texttt{Market Conditions} 
+                        $\rightarrow$
+                        \texttt{Enrollment Process Parameters}
                        $\rightarrow$ 
-                        \texttt{Enrollment Status}:
-                        This is fairly straightforward. 
-                        How easy it is to enroll participants depends in part  
-                        on how many people have the disease.
+                        \texttt{Enrollment Levels (Enrollment Status)}
                    \item 
+                        \texttt{Will Terminate?}
+                        $\leftarrow$ 
+                        \texttt{Market Conditions} 
+                        $\leftrightarrow$ 
                        \texttt{Population} 
+                        $\rightarrow$
+                        \texttt{Enrollment Process Parameters}
                        $\rightarrow$ 
-                        \texttt{Market Measures}:
-                        This assumes that the population effect flows only one
-                        direction, i.e. that a large population size increases
-                        the likelihood of a large number of drugs. 
-                        %TODO: Think about this one a bit because it does mess
-                        % with identification, particularly of market effects. 
-                        % these two are jointly determined per cerda 2007.
-                        % If I can't justify separating them, then I'll need to 
-                        % merge population (market size) and market measures (drugs on market). 
+                        \texttt{Enrollment Levels (Enrollment Status)}
+                \end{enumerate}
+            \item Backdoor Pathway through 
+                \texttt{Elapsed Duration}.
+                \begin{enumerate}
+                    \item
+                    \texttt{Will Terminate?}
+                    $\leftarrow$ 
+                    \texttt{Elapsed Duration} 
+                    $\leftrightarrow$ 
+                    \texttt{Enrollment Levels (Enrollment Status)}
                \end{enumerate}
-            \item 
-                \texttt{Market Measures} 
-                $\rightarrow$ 
-                \texttt{Enrollment Status}:
-                This confounds the estimation of the effect of 
-                \texttt{Enrollment} on \texttt{Will Terminate?}, and 
-                so \texttt{Market Measures} is part of the adjustment set.
-            \item 
-                \texttt{Market Measures} 
-                $\rightarrow$ 
-                \texttt{Decision to proceed with Phase III}:
-                The alternative treatments on the market will affect a sponsors'
-                decision to move forward with a Phase III trial.
-                This is controlled for by only working with trials that 
-                successfully begin recruitment for a Phase III Trial.
-            \item 
-                \texttt{Elapsed Duration} 
-                $\rightarrow$ 
-                \texttt{Will Terminate?}:
-                The amount of time past helps drive the decision to continue
-                or terminate.
-            \item 
-                \texttt{Enrollment Status} 
-                $\leftrightarrow$ 
-                \texttt{Elapsed Duration}:
-                % This is jointly determined. and the weakest part of the causal identification without an accurate model of enrollment.
-                This is one of the weakest parts of the causal inference. 
-                Without a well defined model of enrollment, we can't separate
-                the interaction between the enrollment status and the elapsed
-                duration. 
-                For example, if enrollment is running slower than expected,
-                the trial may be terminated due to concerns that it will not
-                achive the primary objectives or that costs will exceed 
-                the budget allocated to the project.
-            \item 
-                \texttt{Decision to Proceed with Phase III} 
-                $\rightarrow$ 
-                \texttt{Will Terminate?}:
-                %obviously required. Maybe remove from listing and graph?
-                This effect is fairly straightforward, in that 
-                there is no possibility of a termination or completion
-                if the trial does not start. 
-                This is here to block a backdoor pathway between 
-                \texttt{Will Terminate?} and the enrollment status
-                through \texttt{Previously observed Safety and Efficacy}.
        \end{enumerate}
 \end{itemize}
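+
+As a sketch of how the pathways above would be closed, suppose, as an assumption 
+rather than an established result, that conditioning on a set of observed 
+variables $Z$ (for example \texttt{Condition}, \texttt{Population}, and 
+\texttt{Market Conditions}) blocks every backdoor pathway that is not already 
+blocked by restricting the sample to Phase III trials.
+The backdoor adjustment formula of \authorcite{pearl_causalitymodelsreasoning_2009} 
+would then identify the effect of interest as
+\[
+    P\left(T = 1 \mid \mathrm{do}(E = e)\right)
+    = \sum_{z} P\left(T = 1 \mid E = e, Z = z\right) P\left(Z = z\right),
+\]
+where $T$ indicates \texttt{Will Terminate?} and $E$ is the 
+\texttt{Enrollment Status}.
+Because \texttt{Elapsed Duration} and the enrollment status are jointly 
+determined, this assumption may fail for the pathway through 
+\texttt{Elapsed Duration}, so the formula is best read as an idealized 
+identification target rather than a description of the estimation strategy.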
+
+
 \end{document}
--- a/Paper/sections/11_intro_and_lit.tex
+++ b/Paper/sections/11_intro_and_lit.tex
@ -3,39 +3,81 @@

 \begin{document}

-In 1938 President Franklin D Rosevelt signed the Food, Drug, and Cosmetic Act,
-granting the Food and Drug Administration (FDA) authority to require 
-pre-market approval of pharmaceuticals. 
-\cite{commissioner_milestonesusfood_2023}
-As of Sept 2022 \todo{Check Date} they have approved 6,602 currently-marketed 
-compounds with Structured Product Labels (SPLs) 
-and 10,983 previously-marketed SPLs
-\cite{commissioner_nsde_2024},
-%from nsde table. Get number of unique application_nubmers_or_citations with most recent end date as null.
-In 1999, they began requiring that drug developers register and 
-publish clinical trials on \url{https://clinicaltrials.gov}.
-This provides a public mechanism where clinical trial sponsors are 
-responsible to explain what they are trying to acheive and how it will be 
-measured, as well as provide the public the ability to search and find trials 
-that they might enroll in.
-Multiple derived datasets such as the Cortellis Investigational Drugs dataset 
-or the AACT dataset from the Clinical Trials Transformation Intiative
-integrate these data. 
-This brings up a question: 
-Can we use this public data on clinical trials to identify what effects the 
-success or failure of trials?
-In this work, I use updates to records on 
-\url{https://ClinicalTrials.gov} 
-to do exactly that, disentangle the effect of participant enrollment 
-and competing drugs on the market affect the success or failure of 
-clinical trials.
+In 1938, President Franklin D. Roosevelt signed the Food, Drug, and Cosmetic Act,
+granting the Food and Drug Administration (FDA) authority to require pre-market
+approval of pharmaceuticals \cite{commissioner_milestonesusfood_2023}. 
+This created a regulatory framework in which pharmaceutical companies must
+demonstrate safety and efficacy through clinical trials before bringing drugs
+to market. 
+The costs of these trials, in both time and money, form a significant barrier
+to entry in pharmaceutical markets. 
+Understanding what causes clinical trials to fail is therefore crucial for 
+predicting both the intended and unintended effects of policies.
+
+Existing research has examined how drugs progress through development
+pipelines, but relatively little is known about how much different
+challenges contribute to the early termination of clinical trials. 
+%HWANG et al do discuss a few different reasons
+When a trial terminates early due to operational challenges rather than safety
+or efficacy concerns, potentially effective treatments may be delayed or
+abandoned entirely.
+%Example of GLP-1s
+
+This paper provides the first empirical framework to separate 
+market-driven and safety- or efficacy-based terminations from 
+one form of operational failure 
+-- enrollment challenges -- 
+in Phase III clinical trials. 
+Using a novel dataset constructed from administrative records registered on 
+ClinicalTrials.gov, I exploit variation in enrollment timing and market
+conditions to identify how extending the enrollment period affects trial completion. 
+Specifically, I answer the question:
+\textit{
+    ``How does the probability of trial termination change 
+    when the enrollment period is extended?''
+}
+This approach differs from previous work, which focuses mostly 
+on the drug development
+pipeline and the progression between clinical trial phases.
+
+
+
+% In 1938 President Franklin D Roosevelt signed the Food, Drug, and Cosmetic Act,
+% granting the Food and Drug Administration (FDA) authority to require 
+% pre-market approval of pharmaceuticals. 
+% \cite{commissioner_milestonesusfood_2023}
+% As of Sept 2022 \todo{Check Date} they have approved 6,602 currently-marketed 
+% compounds with Structured Product Labels (SPLs) 
+% and 10,983 previously-marketed SPLs
+% \cite{commissioner_nsde_2024},
+% %from nsde table. Get number of unique application_nubmers_or_citations with most recent end date as null.
+% In 1999, they began requiring that drug developers register and 
+% publish clinical trials on \url{https://clinicaltrials.gov}.
+% This provides a public mechanism where clinical trial sponsors are 
+% responsible to explain what they are trying to achieve and how it will be 
+% measured, as well as provide the public the ability to search and find trials 
+% that they might enroll in.
+% Multiple derived datasets such as the Cortellis Investigational Drugs dataset 
+% or the AACT dataset from the Clinical Trials Transformation Initiative
+% integrate these data. 
+% This brings up a question: 
+% Can we use this public data on clinical trials to identify what effects the 
+% success or failure of trials?
+% In this work, I use updates to records on 
+% \url{https://ClinicalTrials.gov} 
+% to do exactly that, disentangle the effect of participant enrollment 
+% and competing drugs on the market affect the success or failure of 
+% clinical trials.
+
+\subsection{Background}

 %Describe how clinical trials fit into the drug development landscape and how they proceed
 Clinical trials are a required part of drug development.
 Not only does the FDA require that a series of clinical trials demonstrate sufficient safety and efficacy of
 a novel pharmaceutical compound or device, producers of derivative medicines may be required to ensure that
 their generic small molecule compound -- such as ibuprofen or levothyroxine -- matches the
-performance of the originiator drug if delivery or dosage is changed.
+performance of the originator drug if delivery or dosage is changed.
 For large molecule generics (termed biosimilars) such as Adalimumab
 (Brand name Humira, with biosimilars Abrilada, Amjevita, Cyltezo, Hadlima, Hulio,
 Hyrimoz, Idacio, Simlandi, Yuflyma, and Yusimry),
@ -51,45 +93,44 @@ reference drug.
 % Introduce my work

 In the world of drug development, these trials are classified into different 
-phases of development.
+phases of development\footnote{
 \cite{anderson_fdadrugapproval_2022}
 provide an overview of this process
 while
 \cite{commissioner_drugdevelopmentprocess_2020}
-describes the actual details.
-Pre-clinical studies primarily establish toxicity and potential dosing levels 
-\cite{commissioner_drugdevelopmentprocess_2020}.
+describes the process in detail.}.
+Pre-clinical studies primarily establish toxicity and potential dosing levels.
+% \cite{commissioner_drugdevelopmentprocess_2020}.
 Phase I trials are the first attempt to evaluate safety and efficacy in humans. 
-Participants typically are heathy individuals, and they measure how the drug 
+Participants are typically healthy individuals, and investigators measure how the drug 
 affects healthy bodies, potential side effects, and adjust dosing levels. 
 Sample sizes are often less than 100 participants. 
-\cite{commissioner_drugdevelopmentprocess_2020}.
+% \cite{commissioner_drugdevelopmentprocess_2020}.
 Phase II trials typically involve a few hundred participants and is where 
 investigators will dial in dosing, research methods, and safety.
-\cite{commissioner_drugdevelopmentprocess_2020}.
-A Phase III trial is the final trial befor approval by the FDA, and is where 
+% \cite{commissioner_drugdevelopmentprocess_2020}.
+A Phase III trial is the final trial before approval by the FDA, and is where 
 the investigator must demonstrate safety and efficacy with a large number of 
 participants, usually on the order of hundreds or thousands.
-\cite{commissioner_drugdevelopmentprocess_2020}.
-Occassionally, a trial will be a multiphase trial, covering aspects of either
+% \cite{commissioner_drugdevelopmentprocess_2020}.
+Occasionally, a trial will be a multi-phase trial, covering aspects of either
 Phases I and II or Phases II and III. 
-
-
 After a successful Phase III trial, the sponsor will decide whether or not 
 to submit an application for approval from the FDA. 
 Before filing this application, the developer must have completed 
-"two large, controlled clinical trials."
-\cite{commissioner_drugdevelopmentprocess_2020}.
-Phase IV trials are used after the drug has recieved marketing approval to 
+``two large, controlled clinical trials.''
+% \cite{commissioner_drugdevelopmentprocess_2020}.
+Phase IV trials are used after the drug has received marketing approval to 
 validate safety and efficacy in the general populace.
-Throughout this whole process, the FDA is available to assist in decisionmaking
-regarding topics such as study design, document review, and whether or not
+Throughout this whole process, the FDA is available to assist in decision-making
+regarding topics such as study design, document review, and whether
 they should terminate the trial. 
 The FDA also reserves the right to place a hold on the clinical trial for 
 safety or other operational concerns, although this is rare. 
 \cite{commissioner_drugdevelopmentprocess_2020}.

-In the economics literature, most of the focus has been on evaluating how 
+
+In the economics literature, most of the focus has been on describing how 
 drug candidates transition between different phases and their probability 
 of final approval.
 % Lead into lit review
@ -111,9 +152,9 @@ Continuing on this theme,
 %DiMasi FeldmanSeckler Wilson 2009
 \authorcite{dimasi_trendsrisksassociated_2010}
 examine the completion rate of clinical drug 
-develompent and find that for the 50 largest drug producers, 
+development and find that for the 50 largest drug producers, 
 approximately 19\% of their drugs under development between 1993 and 2004
-successfully moved from Phase I to recieving an New Drug Application (NDA) 
+successfully moved from Phase I to approval of a New Drug Application (NDA) 
 or Biologics License Application (BLA). 
 They note a couple of changes in how drugs are developed over the years they 
 study, most notably that
@ -130,20 +171,14 @@ He estimates that reducing Phase III of clinical trials by one year would
 reduce total costs by about 8.9\% and that moving 5\% of clinical trial failures
 from phase III to Phase II would reduce out of pocket costs by 5.6\%. 

-Like much of the work in this field, the focus of the the work by 
-\authorcite{dimasi_valueimprovingproductivity_2002}
-and
-\authorcite{dimasi_trendsrisksassociated_2010}
-tends to be on the drug development pipeline, i.e. the progression between 
-phases and towards marketing approval. 
 A key contribution to this drug development literature is the work by 
 \authorcite{khmelnitskaya_competitionattritiondrug_2021}
-on a causal identification strategy
+who created a causal identification strategy
 to disentangle strategic exits from exits due to clinical failures 
 in the drug development pipeline.
 She found that overall 8.4\% of all pipeline exits are due to strategic 
 terminations and that the rate of new drug production would be about 23\% 
-higher if those strategic terminatations were elimintated.
+higher if those strategic terminations were eliminated.

 The work that is closest to mine is the work by 
 \authorcite{hwang_failureinvestigationaldrugs_2016}
@ -152,59 +187,51 @@ clinical trials fail -- with a focus on trials in the USA,
 Europe, Japan, Canada, and Australia. 
 They identified 640 novel therapies and then studied each therapy's 
 development history, as outlined in commercial datasets.
-They found that for late stage trials that did not go on to recieve approval,
+They found that for late stage trials that did not go on to receive approval,
 57\% failed on efficacy grounds, 17\% failed on safety grounds, and 22\% failed
 on commercial or other grounds.

-% Begin Discussing what I do. Then introduce 
-Unlike the majority of the literature, I focus on the progress of 
-individual clinical trials, not on the drug development pipeline. 
-In both 
-\authorcite{khmelnitskaya_competitionattritiondrug_2021}
+Unfortunately, the work of both 
+\authorcite{hwang_failureinvestigationaldrugs_2016}
 and
+\authorcite{khmelnitskaya_competitionattritiondrug_2021}
+ignores a potentially large cause of failure: operational challenges, i.e., cases 
+where problems running or funding a trial cause it to end before achieving its 
+primary objective.
+In a personal review of 199 randomly selected clinical trials that terminated
+before achieving their primary objective,
+I found that 
+14.5\% cited safety or efficacy concerns, 
+9.1\% cited funding problems (an operational concern),
+and 
+31\% cited enrollment issues (a separate operational concern)\footnote{
+Note that these figures differ from 
 \authorcite{hwang_failureinvestigationaldrugs_2016}
-the authors describe failures due to safety, efficacy, or strategic concerns.
-There is another category of concerns that arise for individual clinical trials,
-that of operational failures. 
-Operational failures can arise when a trial struggles to recruit participants, 
-the principal investigator or other key member leaves for another opportunity,
-or other studies prove that the trial requires a protocol change. 
+because I sampled from all stages of trials, not just Phase III trials
+focused on drug development.
+}.

-% In a personal review of 199 randomly selected clinical trials from the AACT
-% database, the 
-% \begin{table}
-%     \caption{}\label{tab:}
-%     \begin{center}
-%         \begin{tabular}[c]{|l|l|}
-%             \hline
-%             Reason & Percentage Mentioned \\
-%             \hline
-%             Safety or Efficacy & 14.5\% \\
-%             Funding Problems & 9.1\% \\
-%             Enrollment Issues & 31\% \\
-%             \hline
-%         \end{tabular}
-%     \end{center}
-% \end{table}



-This paper proposes the first model to separate the causal effects of 
+The main contribution of this work is the model I develop to separate 
+the causal effects of 
 market conditions (a strategic concern) from the effects of 
 participant enrollment (an operational concern) on Phase III Clinical trials. 
-This allows me to answer the question
+This allows me to answer the question posed earlier:
 \textit{
    ``How does the probability of trial termination change 
    when the enrollment period is extended?''
 }
 using administrative data.
+
+
 To understand how I do this, we'll cover some background information on 
 clinical trials and the administrative data I collected in section 
 \ref{SEC:ClinicalTrials}, 
-explain the approach to causal identification strategy and the required data in section 
-\ref{SEC:Data}, 
+explain the approach to causal identification and the data it requires,
 and describe how the data used matches these requirements in section 
-\ref{SEC:}. 
+\ref{SEC:CausalAndData}. 
 Then we'll cover the econometric model 
 (section \ref{SEC:EconometricModel}) 
 and results (section 
@ -212,7 +239,7 @@ and results (section
 Finally, we acknowledge deficiencies in the analysis and potential improvements
 in section 
 \ref{SEC:Improvements},
-then summarize everything in the conclusion \ref{SEC:Conclusion}
+then close with my thoughts in the conclusion (section \ref{SEC:Conclusion}).

 % \subsection{Market incentives and drug development}
 % %%%%%%%%% What do we know about drug development incentives?
--- a/Paper/sections/11_intro_and_lit.tex.bak
+++ b/Paper/sections/11_intro_and_lit.tex.bak
@ -0,0 +1,352 @@
+\documentclass[../Main.tex]{subfiles}
+\graphicspath{{\subfix{Assets/img/}}}
+
+\begin{document}
+
+In 1938, President Franklin D. 
+Roosevelt signed the Food, Drug, and Cosmetic Act, establishing the Food and
+Drug Administration's (FDA) authority to require pre-market approval of
+pharmaceuticals \cite{commissioner_milestonesusfood_2023}. 
+This created a regulatory framework where pharmaceutical companies must
+demonstrate safety and efficacy through clinical trials before bringing drugs
+to market. 
+The costs of these trials, in both time and money, form a significant barrier
+to entry in pharmaceutical markets. 
+Understanding what causes clinical trials to fail is therefore crucial for 
+predicting the effects of policies, intended or unintended.
+
+Existing research has examined how drugs progress through development
+pipelines, but we know little about the relative contributions of different
+challenges to the early termination of clinical trials. 
+%HWANG et al do discuss a few different reasons
+When a trial terminates early due to operational challenges rather than safety
+or efficacy concerns, potentially effective treatments may be delayed or
+abandoned entirely.
+%Example of GLP-1s
+
+This paper provides the first empirical framework to separate 
+market-driven and safety/efficacy based terminations from 
+one form of operational failure 
+-- enrollment challenges -- 
+in Phase III clinical trials. 
+Using a novel dataset constructed from administrative data registered on 
+ClinicalTrials.gov, I exploit variation in enrollment timing and market
+conditions to identify how extending the enrollment period affects trial completion. 
+Specifically, I answer the question:
+\textit{
+    ``How does the probability of trial termination change 
+    when the enrollment period is extended?''
+}
+This approach differs from previous work, which has focused mostly 
+on the drug development
+pipeline and progression between clinical trial phases.
+
+
+
+% In 1938, President Franklin D. Roosevelt signed the Food, Drug, and Cosmetic Act,
+% granting the Food and Drug Administration (FDA) authority to require 
+% pre-market approval of pharmaceuticals. 
+% \cite{commissioner_milestonesusfood_2023}
+% As of Sept 2022 \todo{Check Date} they have approved 6,602 currently-marketed 
+% compounds with Structured Product Labels (SPLs) 
+% and 10,983 previously-marketed SPLs
+% \cite{commissioner_nsde_2024},
+% %from nsde table. Get number of unique application_nubmers_or_citations with most recent end date as null.
+% In 1999, they began requiring that drug developers register and 
+% publish clinical trials on \url{https://clinicaltrials.gov}.
+% This provides a public mechanism where clinical trial sponsors are 
+% responsible for explaining what they are trying to achieve and how it will be 
+% measured, as well as provide the public the ability to search and find trials 
+% that they might enroll in.
+% Multiple derived datasets such as the Cortellis Investigational Drugs dataset 
+% or the AACT dataset from the Clinical Trials Transformation Initiative
+% integrate these data. 
+% This brings up a question: 
+% Can we use this public data on clinical trials to identify what affects the 
+% success or failure of trials?
+% In this work, I use updates to records on 
+% \url{https://ClinicalTrials.gov} 
+% to do exactly that: disentangle how participant enrollment 
+% and competing drugs on the market affect the success or failure of 
+% clinical trials.
+
+\subsection{Background}
+
+%Describe how clinical trials fit into the drug development landscape and how they proceed
+Clinical trials are a required part of drug development.
+Not only does the FDA require that a series of clinical trials demonstrate sufficient safety and efficacy of
+a novel pharmaceutical compound or device, but producers of derivative medicines may also be required to show that
+their generic small-molecule compound -- such as ibuprofen or levothyroxine -- matches the
+performance of the originator drug if delivery or dosage is changed.
+For large-molecule generics (termed biosimilars) such as adalimumab
+(brand name Humira, with biosimilars Abrilada, Amjevita, Cyltezo, Hadlima, Hulio,
+Hyrimoz, Idacio, Simlandi, Yuflyma, and Yusimry),
+the biosimilars are required to prove they have similar efficacy and safety to the
+reference drug.
+
+%TODO? Decide whether to include this or not
+%When registering these clinical trials
+% discuss how these are registered and what data is published.
+% Include image and discuss stages
+% Discuss challenges faced
+
+% Introduce my work
+
+In the world of drug development, these trials are classified into different 
+phases of development\footnote{
+\cite{anderson_fdadrugapproval_2022}
+provide an overview of this process
+while
+\cite{commissioner_drugdevelopmentprocess_2020}
+describes the process in detail.}.
+Pre-clinical studies primarily establish toxicity and potential dosing levels.
+% \cite{commissioner_drugdevelopmentprocess_2020}.
+Phase I trials are the first attempt to evaluate safety and efficacy in humans. 
+Participants are typically healthy individuals, and investigators measure how the drug 
+affects healthy bodies, identify potential side effects, and adjust dosing levels. 
+Sample sizes are often less than 100 participants. 
+% \cite{commissioner_drugdevelopmentprocess_2020}.
+Phase II trials typically involve a few hundred participants and are where 
+investigators refine dosing and research methods and further assess safety.
+% \cite{commissioner_drugdevelopmentprocess_2020}.
+A Phase III trial is the final trial before approval by the FDA, and is where 
+the investigator must demonstrate safety and efficacy with a large number of 
+participants, usually on the order of hundreds or thousands.
+% \cite{commissioner_drugdevelopmentprocess_2020}.
+Occasionally, a trial will be a multiphase trial, covering aspects of either
+Phases I and II or Phases II and III. 
+After a successful Phase III trial, the sponsor will decide whether or not 
+to submit an application for approval from the FDA. 
+Before filing this application, the developer must have completed 
+``two large, controlled clinical trials.''
+% \cite{commissioner_drugdevelopmentprocess_2020}.
+Phase IV trials are used after the drug has received marketing approval to 
+validate safety and efficacy in the general populace.
+Throughout this whole process, the FDA is available to assist in decision-making
+regarding topics such as study design, document review, and whether
+the sponsor should terminate the trial. 
+The FDA also reserves the right to place a hold on a clinical trial for 
+safety or other operational concerns, although this is rare
+\cite{commissioner_drugdevelopmentprocess_2020}.
+
+
+In the economics literature, most of the focus has been on describing how 
+drug candidates transition between different phases and their probability 
+of final approval.
+% Lead into lit review
+% Abrantes-Metz, Adams, Metz (2004)
+\authorcite{abrantes-metz_pharmaceuticaldevelopmentphases_2004}
+described the relationship between
+various drug characteristics and how the drug progressed through clinical trials.
+% This descriptive estimate was notable for using a 
+% mixed state proportional hazard model and estimating the impact of 
+% observed characteristics in each of the three phases.
+They found that as Phase I and II trials last longer, 
+the rate of failure increases. 
+In contrast, Phase III trials generally have a higher rate of 
+success than failure after 91 months.
+This may be because the purposes of Phases I and II differ
+from the purpose of Phase III.
+
+Continuing on this theme,
+%DiMasi FeldmanSeckler Wilson 2009
+\authorcite{dimasi_trendsrisksassociated_2010}
+examine the completion rate of clinical drug 
+development and find that for the 50 largest drug producers, 
+approximately 19\% of their drugs under development between 1993 and 2004
+successfully moved from Phase I to receiving a New Drug Application (NDA) 
+or Biologics License Application (BLA). 
+They note a couple of changes in how drugs are developed over the years they 
+study, most notably that
+drugs began to fail earlier in their development cycle in the 
+latter half of the time they studied. 
+They note that this may reduce the cost of new drugs by eliminating late 
+and costly failures in the development pipeline.
+
+Earlier work by 
+\authorcite{dimasi_valueimprovingproductivity_2002}
+used data on 68 investigational drugs from 10 firms to simulate how reducing
+time in development reduces the costs of developing drugs. 
+He estimates that reducing Phase III of clinical trials by one year would 
+reduce total costs by about 8.9\% and that moving 5\% of clinical trial failures
+from phase III to Phase II would reduce out of pocket costs by 5.6\%. 
+
+A key contribution to this drug development literature is the work by 
+\authorcite{khmelnitskaya_competitionattritiondrug_2021}
+who created a causal identification strategy
+to disentangle strategic exits from exits due to clinical failures 
+in the drug development pipeline.
+She found that overall 8.4\% of all pipeline exits are due to strategic 
+terminations and that the rate of new drug production would be about 23\% 
+higher if those strategic terminations were eliminated.
+
+The work that is closest to mine is the work by 
+\authorcite{hwang_failureinvestigationaldrugs_2016}
+who investigated causes for which late stage (Phase III)
+clinical trials fail -- with a focus on trials in the USA, 
+Europe, Japan, Canada, and Australia. 
+They identified 640 novel therapies and then studied each therapy's 
+development history, as outlined in commercial datasets.
+They found that for late-stage trials that did not go on to receive approval,
+57\% failed on efficacy grounds, 17\% failed on safety grounds, and 22\% failed
+on commercial or other grounds.
+
+Unfortunately, the work of both 
+\authorcite{hwang_failureinvestigationaldrugs_2016}
+and
+\authorcite{khmelnitskaya_competitionattritiondrug_2021}
+ignores a potentially large cause of failure: operational challenges, i.e., cases 
+where problems running or funding a trial cause it to end before achieving its 
+primary objective.
+In a personal review of 199 randomly selected clinical trials that terminated
+before achieving their primary objective,
+I found that 
+14.5\% cited safety or efficacy concerns, 
+9.1\% cited funding problems (an operational concern),
+and 
+31\% cited enrollment issues (a separate operational concern)\footnote{
+Note that these figures differ from 
+\authorcite{hwang_failureinvestigationaldrugs_2016}
+because I sampled from all stages of trials, not just Phase III trials
+focused on drug development.
+}.
+
+
+
+
+The main contribution of this work is the model I develop to separate 
+the causal effects of 
+market conditions (a strategic concern) from the effects of 
+participant enrollment (an operational concern) on Phase III clinical trials. 
+This allows me to answer the question posed earlier:
+\textit{
+    ``How does the probability of trial termination change 
+    when the enrollment period is extended?''
+}
+using administrative data.
+
+
+To understand how I do this, we'll cover some background information on 
+clinical trials and the administrative data I collected in section 
+\ref{SEC:ClinicalTrials}, 
+explain the approach to causal identification and the data it requires,
+and describe how the data used matches these requirements in section 
+\ref{SEC:CausalAndData}. 
+Then we'll cover the econometric model 
+(section \ref{SEC:EconometricModel}) 
+and results (section 
+\ref{SEC:Results}). 
+Finally, we acknowledge deficiencies in the analysis and potential improvements
+in section 
+\ref{SEC:Improvements},
+then close with my thoughts in the conclusion (section \ref{SEC:Conclusion}).
+
+% \subsection{Market incentives and drug development}
+% %%%%%%%%% What do we know about drug development incentives?
+%
+% \cite{dranove_DoesConsumer_2022} use the implementation of Medicare part D 
+% to examine whether the production of novel or follow up drugs increases during 
+% the following 15 years. 
+% They find that when Medicare part D was implemented -- increasing senior 
+% citizens' ability to pay for drugs -- there was a (delayed) increase 
+% in drug development, with effects concentrated among compounds that were least
+% innovative according to their classification of innovations.
+% They suggest that this is due to financial risk management, as novel 
+% pharmaceuticals have a higher probability of failure compared to the less novel
+% follow up development. 
+% This is what leads risk-averse companies to prefer follow up development.
+%
+%
+% % Acemoglu and Linn
+% % - Market size in innovation
+% % - Exogenous demographic trends has a large impact on the entry of non-generic drugs and new molecular entities.
+% On the side of market analysis, 
+% \citeauthor{acemoglu_market_2004} 
+% (\citeyear{acemoglu_market_2004})
+% used exogenous demographic changes to show that the
+% entry of novel compounds is highly driven by the underlying aged population.
+% They estimate that a 1\% increase in applicable demographics increase the
+% entry of new drugs by 6\%, mostly concentrated among generics.
+% Among non-generics, a 1\% increase in potential market size 
+% (as measured by demographic groups) leads to a 4\% increase in novel therapies.
+%
+% % Gupta
+% % - Imperfect intellectual property rights in the pharmaceutical industry
+% \cite{gupta_OneProduct_2020} discovered that uncertainty around which patents
+% might apply to a novel drug causes a delay in the entry of generics after 
+% the primary patent has expired. 
+% She found that this delay in delivery is around 3 years. 
+%
+% % Agarwal and Gaule 2022
+% % - Retrospective on impact from COVID-19 pandemic
+% % Not in this version
+%
+% \subsection{Understanding Failures in Drug Development}
+%
+% % DISCUSS: Different types of failures
+% There are a myriad of reasons that a drug candidate may not make it to market, 
+% regardless of its novelty or known safety.
+% In this work, I focus on the failure of individual clinical trials, but the 
+% categories of failure apply to the individual trials as well as the entire
+% drug development pipeline.
+% They generally fall into one of the following categories:
+% \begin{itemize}
+%     \item Scientific  Failure: When there are issues regarding 
+%         safety and efficacy that must be addressed. 
+%         The preeminent question is: 
+%         ``Will the drug work for patients?''
+%         %E.Khm, Gupta, etc.
+%     \item Strategic Failure: When the sponsors stop development because of 
+%         profitability
+%         %Whether or not the drug will be profitiable, or align with
+%         %Whether or not the drug will be profitable, or align with
+%         ``Will producing the drug be beneficial to the 
+%         company in the long term?''
+%         %E.Khm, Gupta, GLP-1s, etc.
+%     \item Operational concerns are answers to: 
+%         %Whether or not the developer can successfully conduct
+%         %operations to meet scientific or strategic goals, i.e. 
+%         ``What has prevented the company from being able to 
+%         finance, develop, produce, and market the drug?''
+% \end{itemize}
+% It is likely that a drug fails to complete the development cycle due to some
+% combination of these factors. 
+%
+%
+% %USE MetaBio/CalBio GLP-1 story to illustrate these different factors.
+% \cite{flier_DrugDevelopment_2024} documents the case of MetaBio, a company
+% he was involved in founding that was in the first stages of
+% developing a GLP-1 based drug for diabetes or obesity before being shut down
+% in . 
+% MetaBio was a wholly owned subsidiary of CalBio, a metabolic drug development 
+% firm, that received a \$30 million -- 5 year investment from Pfizer to 
+% pursue development of GLP-1 based therapies. 
+% At the time it was shut down, it faced a few challenges:
+% \begin{itemize}
+%     \item The compound had a short half life and they were seeking methods to 
+%         improve its effectiveness; a scientific failure.
+%     \item Pfizer imposed a requirement that it be delivered through a route
+%         other than injection (the known delivery mechanism); a strategic failure. 
+%     \item When Pfizer pulled the plug, CalBio closed MetaBio because they 
+%         could not find other funding sources; an operational failure.
+% \end{itemize}
+%
+% The author states in his conclusion:
+% \begin{displayquote}
+%     Despite every possibility of success, 
+%     MetaBio went down because there were mistaken ideas about what was 
+%     possible and what was not in the realm of metabolic therapeutics, and 
+%     because proper corporate structure and adequate capital are always 
+%     issues when attempting to survive predictable setbacks.
+% \end{displayquote} 
+%
+% From this we see that there was a cascade of issues leading to the failure to 
+% develop this novel drug. 
+%
+%
+% % I don't think I need to include modelling enrollment here. 
+% % If it is applicable, it can show up in those sections later.
+%
+%
+
+\end{document}
--- a/Paper/sections/12_clinical_trial_background.tex
+++ b/Paper/sections/12_clinical_trial_background.tex
@ -14,101 +14,181 @@
 %   - 
 %   - 

-To understand how my administrative clinical trial data is obtained
-and what it can be used for, 
-let's take a look at how trial investigators record data on 
-\url{ClinicalTrials.gov} operate.
-Figure \ref{Fig:Stages} illuistrates the process I describe below.
-During the Pre-Trial period the trial investigators will design the trial, 
-choose primary and secondary objectives, 
-and decide on how many participants they need to enroll. 
-Once they have decided on these details, they post the trial to \url{ClinicalTrials.com}
-and decide on a date to begin enrolling trial participants.
-If the investigators decide to not continue with the trial before enrolling any participants,
-the trial is marked as ``Withdrawn''. 
-On the other hand, if they begin enrolling participants, there are two methods to do so.
-The first is to enter a general ``Recruiting'' state, where patients attempt to enroll.
-The second is to enter an "Enrollment by invitation only" state.
-After a trial has enrolled their participants, they wil typically move to an 
-"Active, not recruiting" state to inform potential participants that they are
-not recruiting. 
+Understanding why clinical trials succeed or fail requires understanding how 
+they operate and how their progress is documented. 
+The primary source of this operational data is ClinicalTrials.gov, where 
+investigators record key information about their trials' status and progression. 
+To see how my administrative data captures trial progression, we'll 
+examine how investigators document their trials' states and transitions.
+Figure \ref{Fig:Stages} is a flowchart of the different states 
+that a trial can take and the decisions leading to each. 
+It also shows the information the study operator obtains
+and how that information influences further decisions.
+The states are standardized and defined by the National Library of Medicine
+\cite{usnlm_protocolregistrationdata_2024-06-17}.
+Prior to a study, the trial investigators design the trial, 
+choose primary and secondary objectives, and decide on how many participants
+they need to enroll. 
+Once they have decided on these details, they post the trial to
+\url{ClinicalTrials.gov} and decide on a date to begin enrolling trial
+participants. 
+% If the investigators decide to not continue with the trial before enrolling any
+% participants, the trial is marked as ``Withdrawn''. 
+% If they begin enrolling participants, there are two methods to do so.
+% The first is to enter an "Enrollment by invitation only" state where the 
+% trial operators extend invitations through their own connections to doctors 
+% and patients they are working with.
+% The second is to enter a general ``Recruiting'' state, where participants apply 
+% to join the trial, and the sponsoring organization may extend invitations as 
+% before.
+After a trial has enrolled enough participants, the sponsor will move it to an 
+``Active, not recruiting'' state to inform potential participants that it is
+no longer recruiting. 
+During this time, the trial operators continue monitoring participants for 
+adverse events and tracking their disease severity and compliance with treatment. 
 Finally, when the investigators have obtained enough data to achieve their primary
-objective, the clinical trial will be closed, and marked as ``Completed'' in
+objective, the clinical trial will be closed and marked as ``Completed'' in
 \url{ClinicalTrials.gov}
 If the trial is closed before achieving the primary objective, the trial is 
 marked as ``Terminated'' on
 \url{ClinicalTrials.gov}.
-
+Trials can be terminated because safety or efficacy evidence suggested that 
+continuing was not worthwhile, or because enrollment rates were too low to achieve 
+the primary objective within time and budget constraints.

 \begin{figure}%[H] %use [H] to fix the figure here.
    \includegraphics[width=\textwidth]{../assets/img/ClinicalTrialStagesAndStatuses}
    \par \small 
        Diamonds represent decision points while
-	Squares represent states of the clinical trial and Rhombuses represend data obtained by the trial.
+	Squares represent states of the clinical trial and Rhombuses represent data obtained by the trial.
    \caption[Clinical Trial Stages and Progression]{Clinical Trial Stages and Progression}
    \label{Fig:Stages}
 \end{figure}

-Note the information we obtain about the trial from the final status: 
-``Withdrawn'', ``Terminated'', or ``Completed''.
-Although 
-\cite{khmelnitskaya_competitionattritiondrug_2021}
-describes a clinical failure due to safety or efficacy as a 
-\textit{scientific} failure, it is better described as a compound failure.
-Discovering that a compound doesn't work as hoped is not a failure but the whole
-purpose of the clinical trials process. 
-On the other hand, when a trial terminates early due to reasons 
-other than safety or efficacy concerns, the trial operator does not learn
-if the drug is effective or safe. 
-This is a knowledge-gathering failure where the trial operator 
-did not learn if the drug was effective or not.
-I prefer describing a clinical trial as being terminated for 
-\begin{itemize}
-    \item Safety or Efficacy concerns
-    \item Strategic concerns
-    \item Operational concerns.
-\end{itemize}
-
-Unfortunately it can be difficult to know why a given trial was terminated,
-in spite of the fact that upon termination, trials typically record a 
-description of \textit{a single} reason for the clinical trial termination. 
-This doesn't necessarily list all the reasons contributing to the trial termination and may not exist for a given trial.
-For example, if a Principle Investigator leaves for another institution 
-(terminating the trial), is this decison affected by 
-a safety or efficacy concern, 
-a new competitor on the market, 
-difficulting recruiting participants,
-or a lack of financial support from the study sponsor? 
-Estimating the impact of different problems that trials face from these 
-low-information, post-hoc signals is insufficient.
-For this reason, I use clinical trial progression to estimate effects. 
-\todo{not sure if this is the best place for this.}
-
+% Note the information we obtain about the trial from the final status: 
+% ``Withdrawn'', ``Terminated'', or ``Completed''.
 As a trial goes through the different stages of recruitment, the investigators
 update the records on ClinicalTrials.gov. 
 Even though there are only a few times that investigators are required 
-to update this information, it tends to be updated somewhat regularly as it is 
-a way to communicate with potential enrollees. 
-When a trial is first posted, it tends to include information
+to update this information, it tends to be updated somewhat regularly during 
+enrollment as it is a way to communicate with potential enrollees. 
+When a trial is first posted, it includes information
 such as planned enrollment, 
 planned end dates, 
 the sites at which it is being conducted, 
 the diseases that it is investigating, 
 the drugs or other treatments that will be used,
-the experimental arms that will be used,
 and who is sponsoring the trial.
 As enrollment is opened and closed and sites are added or removed, 
 investigators will update the status and information
 to help doctors and potential participants understand whether they should apply. 

+When a trial ends, it can end in one of three ways.
+The most desirable outcome is completion, where the trial achieves its 
+primary objective by gathering sufficient data about safety and efficacy.
+However, trials may also end early either through withdrawal 
+(as mentioned previously) 
+or termination. 
+Termination occurs after enrollment has begun but before achieving the 
+primary objective.
+
+Understanding why trials terminate early is the key goal of this work, but it
+is not straightforward.
+Terminated trials typically record a 
+description of \textit{a single} reason for the clinical trial termination. 
+This doesn't necessarily list all the reasons contributing to the trial 
+termination and may not exist for a given trial.
+As an example, if a Principal Investigator leaves for another institution 
+(terminating the trial), this decision may be affected by factors such as
+a safety or efficacy concern, 
+a new competitor on the market, 
+difficulties recruiting participants,
+or a lack of financial support from the study sponsor.
+In this way, the stated reason may mask the underlying challenges that 
+led to the termination, so we need 
+another way to infer the relative impact of operational difficulties.
+
+
+To better describe termination causes, I suggest classifying them into 
+three broad categories. 
+The first category, Safety or Efficacy concerns, occurs when data suggests 
+the treatment is unsafe or unlikely to achieve its therapeutic goals. 
+While Khmelnitskaya 
+\cite{khmelnitskaya_competitionattritiondrug_2021}
+describes these as scientific failures, I contend that they represent successful 
+knowledge gathering: the clinical trial process working as intended to 
+identify unsafe or ineffective treatments. 
+The second category, Strategic concerns, encompasses business and 
+market-driven decisions such as changes in company priorities or 
+competitive landscape. 
+The final category, Operational concerns, includes practical challenges 
+like insufficient enrollment rates or loss of key personnel. 
+These latter two categories represent true failures of the trial process, 
+as they prevent us from learning whether the treatment would have 
+been safe and effective.
+
+\subsection{Data Summary}
+%% Describe data here
+Since September 27, 2007, those who conduct clinical trials of FDA-regulated 
+drugs or devices on human subjects must register 
+their trials at \url{ClinicalTrials.gov}
+(\cite{anderson_fdadrugapproval_2022}).
+This involves submitting information on the expected enrollment and duration of
+trials, drugs or devices that will be used, treatment protocols and study arms, 
+as well as contact information for the trial sponsor and treatment sites.

+When starting a new trial, the required information must be submitted 
+``\dots not later than 21 calendar days after enrolling the first human subject\dots''.
+After the initial submission, the data is briefly reviewed for quality, 
+then the trial record is published and the trial is assigned a 
+National Clinical Trial (NCT) identifier
+(\cite{anderson_fdadrugapproval_2022}).

+Each trial's record is updated periodically, including a final update that must occur 
+within a year of completing the primary objective, although exceptions are
+available for trials related to drug approvals or for trials with secondary
+objectives that require further observation\footnote{This rule came into effect in 2017.}
+(\cite{anderson_fdadrugapproval_2022}).
+Other than the requirements for the first and last submissions, all other
+updates occur at the discretion of the trial sponsor.
+Because the ClinicalTrials.gov website serves as a central point of information
+on which trials are active or recruiting for a given condition or drug,
+most trials are updated multiple times during their progression.
+
+There are two primary ways to access data about clinical trials.
+The first is to search individual trials on ClinicalTrials.gov with a web browser.
+This web portal shows the current information about the trial and provides 
+access to snapshots of previously submitted information.
+Together, these features meet most of the needs of those seeking 
+to join a clinical trial.
+For this project, I scraped these historical records to reconstruct
+a sequence of snapshots for each trial.
+%include screenshots?
+The second way to access the data is through a normalized database set up by
+the 
+\href{https://aact.ctti-clinicaltrials.org/}{Clinical Trials Transformation Initiative}
+called AACT (Aggregate Analysis of ClinicalTrials.gov). %TODO: Get CITATION
+The AACT database is available as a PostgreSQL database dump or as a set of 
+flat files. 
+These dumps reflect a near-current version of the ClinicalTrials.gov database.
+This format is amenable to large-scale analysis but does not contain 
+information about the past states of trials.
+I combined these two sources, using the AACT dataset to select 
+trials of interest and then scraping \url{ClinicalTrials.gov} to get 
+a timeline of each trial.
+
+%%%%%%%%%%%%%%%%%%%%%%%% Model Outline
+
+I use these data to predict the final status of each trial 
+from the snapshots recorded along the way, in effect asking:
+``Starting from the trial's current state, how does the probability 
+of termination change if X changes?''
 % - 
 % - 
 % - 
 % - 
 % - 
 % - 
-
+%

 \end{document}
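The estimation setup described in the Data Summary, predicting each trial's final status from its recorded snapshots, can be sketched as a data-preparation step. This is a minimal illustration under stated assumptions, not the pipeline used in the paper: the `Snapshot` fields, the helper `label_snapshots`, and the placeholder identifier `NCT00000000` are hypothetical, and the sketch assumes each trial's snapshots have already been scraped and that withdrawn trials were filtered out beforehand.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical shape of one scraped ClinicalTrials.gov snapshot.
@dataclass(frozen=True)
class Snapshot:
    nct_id: str
    recorded: date   # date this version of the record was submitted (assumed field)
    status: str      # e.g. "Recruiting", "Active, not recruiting"
    enrollment: int  # enrollment figure reported in this version (assumed field)


# Outcomes used as labels; "Withdrawn" trials are assumed to be dropped upstream.
FINAL_STATUSES = {"Completed", "Terminated"}


def label_snapshots(snapshots, final_status):
    """Turn one trial's snapshot history into (features, label) rows.

    Every snapshot taken before the trial closed becomes one observation,
    labelled 1 if the trial eventually terminated and 0 if it completed.
    """
    if final_status not in FINAL_STATUSES:
        raise ValueError(f"unexpected final status: {final_status!r}")
    if not snapshots:
        return []
    label = 1 if final_status == "Terminated" else 0
    ordered = sorted(snapshots, key=lambda s: s.recorded)
    start = ordered[0].recorded
    rows = []
    for snap in ordered:
        if snap.status in FINAL_STATUSES:
            break  # the closing snapshot reveals the outcome, so stop before it
        features = {
            "status": snap.status,
            "enrollment": snap.enrollment,
            # days since the first snapshot, a rough proxy for elapsed duration
            "elapsed_days": (snap.recorded - start).days,
        }
        rows.append((features, label))
    return rows


# Usage with made-up values for a placeholder trial.
history = [
    Snapshot("NCT00000000", date(2020, 1, 15), "Recruiting", 10),
    Snapshot("NCT00000000", date(2020, 7, 1), "Recruiting", 40),
    Snapshot("NCT00000000", date(2021, 2, 1), "Active, not recruiting", 55),
    Snapshot("NCT00000000", date(2021, 9, 30), "Terminated", 55),
]
rows = label_snapshots(history, "Terminated")
for features, label in rows:
    print(features["status"], features["elapsed_days"], label)
```

Stopping before the closing snapshot keeps the outcome itself out of the features, and producing one row per snapshot matches the question of how the probability of termination changes from a trial's current state.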
--- a/Paper/sections/12_clinical_trial_background.tex.bak
+++ b/Paper/sections/12_clinical_trial_background.tex.bak
@ -0,0 +1,114 @@
+\documentclass[../Main.tex]{subfiles}
+\graphicspath{{\subfix{Assets/img/}}}
+
+\begin{document}
+
+% Clinical Trials Background Outline
+% - ClinicalTrials.gov
+% - Clinical trial progression
+% - 
+% - 
+% - 
+% - 
+% - 
+%   - 
+%   - 
+
+To understand how my administrative clinical trial data are obtained
+and what they can be used for, 
+consider how trial investigators record data on 
+\url{ClinicalTrials.gov}.
+Figure \ref{Fig:Stages} illustrates the process described below.
+During the Pre-Trial period, the trial investigators design the trial, 
+choose primary and secondary objectives, 
+and decide how many participants they need to enroll. 
+Once they have decided on these details, they post the trial to \url{ClinicalTrials.gov}
+and decide on a date to begin enrolling trial participants.
+If the investigators decide to not continue with the trial before enrolling any participants,
+the trial is marked as ``Withdrawn''. 
+On the other hand, if they begin enrolling participants, there are two ways to do so.
+The first is to enter a general ``Recruiting'' state, in which any eligible patient may apply to enroll.
+The second is to enter an ``Enrolling by invitation'' state, in which only invited participants may enroll.
+After a trial has enrolled its participants, it will typically move to an 
+``Active, not recruiting'' state to inform potential participants that it is
+no longer recruiting. 
+Finally, when the investigators have obtained enough data to achieve their primary
+objective, the clinical trial is closed and marked as ``Completed'' on
+\url{ClinicalTrials.gov}.
+If the trial is closed before achieving the primary objective, the trial is 
+marked as ``Terminated'' on
+\url{ClinicalTrials.gov}.
+
+
+\begin{figure}%[H] %use [H] to fix the figure here.
+    \includegraphics[width=\textwidth]{../assets/img/ClinicalTrialStagesAndStatuses}
+    \par \small 
+        Diamonds represent decision points, squares represent states of the clinical trial,
+        and rhombuses represent data obtained by the trial.
+    \caption[Clinical Trial Stages and Progression]{Clinical Trial Stages and Progression}
+    \label{Fig:Stages}
+\end{figure}
+
+Consider what the final status, ``Withdrawn'', ``Terminated'', or ``Completed'',
+tells us about a trial.
+Although 
+\cite{khmelnitskaya_competitionattritiondrug_2021}
+describes a clinical failure due to safety or efficacy as a 
+\textit{scientific} failure, it is better described as a failure of the compound
+rather than of the trial.
+Discovering that a compound does not work as hoped is not a failure but the whole
+purpose of the clinical trials process. 
+On the other hand, when a trial terminates early for reasons 
+other than safety or efficacy concerns, the trial operator does not learn
+whether the drug is safe or effective. 
+This is a knowledge-gathering failure.
+I prefer to classify the reasons a clinical trial is terminated as
+\begin{itemize}
+    \item Safety or Efficacy concerns
+    \item Strategic concerns
+    \item Operational concerns
+\end{itemize}
+
+Unfortunately, it can be difficult to know why a given trial was terminated,
+even though, upon termination, trials typically record a 
+description of \textit{a single} reason for the termination. 
+This description does not necessarily capture all the reasons contributing to the termination,
+and it may be missing for a given trial.
+For example, if a Principal Investigator leaves for another institution 
+(terminating the trial), was this decision affected by 
+a safety or efficacy concern, 
+a new competitor on the market, 
+difficulty recruiting participants,
+or a lack of financial support from the study sponsor? 
+These low-information, post-hoc signals are insufficient for estimating 
+the impact of the different problems that trials face.
+For this reason, I instead use the progression of each trial to estimate these effects. 
+\todo{not sure if this is the best place for this.}
+
+As a trial goes through the different stages of recruitment, the investigators
+update the records on ClinicalTrials.gov. 
+Although investigators are required to update this information only at a few points, 
+it tends to be updated fairly regularly because the record is 
+a way to communicate with potential enrollees. 
+When a trial is first posted, it tends to include information
+such as planned enrollment, 
+planned end dates, 
+the sites at which it is being conducted, 
+the diseases that it is investigating, 
+the drugs or other treatments that will be used,
+the experimental arms that will be used,
+and who is sponsoring the trial.
+As enrollment is opened and closed and sites are added or removed, 
+investigators will update the status and information
+to help doctors and potential participants understand whether they should apply. 
+
+
+
+% - 
+% - 
+% - 
+% - 
+% - 
+% - 
+
+
+\end{document}
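The status progression described in this background section (the process shown in the Clinical Trial Stages and Progression figure) can be encoded as a small transition table, which is one way to sanity-check scraped status histories before using them. This is a hypothetical simplification, not a rule set published by ClinicalTrials.gov: the table `TRANSITIONS`, the label "Pre-Trial" for the period before enrollment, and the helpers `first_invalid_step` and `final_status` are illustrative, and real records can deviate from it, for example through the "Suspended" status that the data section notes is not handled explicitly, or a trial reopening recruitment.

```python
# Hypothetical, simplified encoding of the status progression described in the
# text; ClinicalTrials.gov itself does not publish or enforce this table.
TRANSITIONS = {
    "Pre-Trial": {"Withdrawn", "Recruiting", "Enrolling by invitation"},
    "Recruiting": {"Active, not recruiting", "Completed", "Terminated"},
    "Enrolling by invitation": {"Active, not recruiting", "Completed", "Terminated"},
    "Active, not recruiting": {"Completed", "Terminated"},
}
# Final statuses are not keys above, so they have no outgoing transitions.
FINAL = {"Withdrawn", "Completed", "Terminated"}


def first_invalid_step(history):
    """Return the first (previous, next) status pair the table does not allow.

    Consecutive identical statuses are skipped because successive snapshots
    often repeat the same status. Returns None if every change is allowed.
    """
    for prev, curr in zip(history, history[1:]):
        if prev == curr:
            continue
        if curr not in TRANSITIONS.get(prev, set()):
            return (prev, curr)
    return None


def final_status(history):
    """Return the last status if it is final, otherwise None (trial still open)."""
    if history and history[-1] in FINAL:
        return history[-1]
    return None


completed = ["Pre-Trial", "Recruiting", "Recruiting", "Active, not recruiting", "Completed"]
withdrawn_late = ["Pre-Trial", "Recruiting", "Withdrawn"]
print(first_invalid_step(completed), final_status(completed))
print(first_invalid_step(withdrawn_late))
```

The second history is flagged because, as described above, a trial is marked "Withdrawn" only if it stops before enrolling any participants.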
--- a/assets/img/CausalModel.drawio
+++ b/assets/img/CausalModel.drawio
@ -0,0 +1,143 @@
+<mxfile host="Electron" agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) draw.io/26.0.9 Chrome/128.0.6613.186 Electron/32.2.5 Safari/537.36" version="26.0.9">
+  <diagram name="Page-1" id="mtO3n3LGiMWHk561vA-H">
+    <mxGraphModel dx="1434" dy="796" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="3300" pageHeight="2339" math="0" shadow="0">
+      <root>
+        <mxCell id="0" />
+        <mxCell id="1" parent="0" />
+        <mxCell id="rB942MAWg83VmefqhkrR-1" value="Enrollment Level" style="swimlane;fontStyle=0;childLayout=stackLayout;horizontal=1;startSize=30;horizontalStack=0;resizeParent=1;resizeParentMax=0;resizeLast=0;collapsible=1;marginBottom=0;whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#0000FF;perimeterSpacing=1;strokeWidth=4;fillStyle=zigzag-line;" parent="1" vertex="1">
+          <mxGeometry x="560" y="430" width="120" height="60" as="geometry" />
+        </mxCell>
+        <mxCell id="rB942MAWg83VmefqhkrR-2" value="Enrollment Status" style="text;strokeColor=#0000FF;fillColor=#ffe6cc;align=left;verticalAlign=middle;spacingLeft=4;spacingRight=4;overflow=hidden;points=[[0,0.5],[1,0.5]];portConstraint=eastwest;rotatable=0;whiteSpace=wrap;html=1;strokeWidth=4;perimeterSpacing=1;" parent="rB942MAWg83VmefqhkrR-1" vertex="1">
+          <mxGeometry y="30" width="120" height="30" as="geometry" />
+        </mxCell>
+        <mxCell id="rB942MAWg83VmefqhkrR-5" value="Will it Terminate?" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#f8cecc;strokeColor=#0000FF;strokeWidth=4;fillStyle=dots;" parent="1" vertex="1">
+          <mxGeometry x="840" y="520" width="120" height="60" as="geometry" />
+        </mxCell>
+        <mxCell id="rB942MAWg83VmefqhkrR-6" value="Enrollment Process&lt;br&gt;Parameters" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;fillStyle=zigzag-line;" parent="1" vertex="1">
+          <mxGeometry x="560" y="241" width="120" height="60" as="geometry" />
+        </mxCell>
+        <mxCell id="rB942MAWg83VmefqhkrR-27" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.5;entryY=0;entryDx=0;entryDy=0;strokeWidth=3;" parent="1" source="rB942MAWg83VmefqhkrR-7" target="rB942MAWg83VmefqhkrR-5" edge="1">
+          <mxGeometry relative="1" as="geometry" />
+        </mxCell>
+        <mxCell id="rB942MAWg83VmefqhkrR-7" value="Elapsed Duration" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#ffe6cc;strokeColor=#d79b00;" parent="1" vertex="1">
+          <mxGeometry x="840" y="360" width="120" height="60" as="geometry" />
+        </mxCell>
+        <mxCell id="rB942MAWg83VmefqhkrR-9" value="Population" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#ffe6cc;strokeColor=#d79b00;" parent="1" vertex="1">
+          <mxGeometry x="440" y="460" width="60" height="60" as="geometry" />
+        </mxCell>
+        <mxCell id="rB942MAWg83VmefqhkrR-10" value="Market Conditions" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#ffe6cc;strokeColor=#d79b00;" parent="1" vertex="1">
+          <mxGeometry x="500" y="520" width="60" height="60" as="geometry" />
+        </mxCell>
+        <mxCell id="rB942MAWg83VmefqhkrR-12" value="Is likely Safe and Effective" style="swimlane;fontStyle=0;childLayout=stackLayout;horizontal=1;startSize=30;horizontalStack=0;resizeParent=1;resizeParentMax=0;resizeLast=0;collapsible=1;marginBottom=0;whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;fillStyle=zigzag-line;" parent="1" vertex="1">
+          <mxGeometry x="770" y="242" width="120" height="59" as="geometry" />
+        </mxCell>
+        <mxCell id="rB942MAWg83VmefqhkrR-13" value="Decision to proceed with Phase III trial" style="text;strokeColor=#d79b00;fillColor=#ffe6cc;align=left;verticalAlign=middle;spacingLeft=4;spacingRight=4;overflow=hidden;points=[[0,0.5],[1,0.5]];portConstraint=eastwest;rotatable=0;whiteSpace=wrap;html=1;" parent="rB942MAWg83VmefqhkrR-12" vertex="1">
+          <mxGeometry y="30" width="120" height="29" as="geometry" />
+        </mxCell>
+        <mxCell id="rB942MAWg83VmefqhkrR-29" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.5;entryY=0;entryDx=0;entryDy=0;strokeWidth=3;exitX=0.5;exitY=1;exitDx=0;exitDy=0;" parent="1" source="rB942MAWg83VmefqhkrR-6" target="rB942MAWg83VmefqhkrR-1" edge="1">
+          <mxGeometry relative="1" as="geometry">
+            <mxPoint x="619.5" y="320" as="sourcePoint" />
+            <mxPoint x="619.5" y="380" as="targetPoint" />
+          </mxGeometry>
+        </mxCell>
+        <mxCell id="rB942MAWg83VmefqhkrR-30" style="rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=1;entryY=0.5;entryDx=0;entryDy=0;strokeWidth=3;exitX=0;exitY=0.25;exitDx=0;exitDy=0;" parent="1" source="rB942MAWg83VmefqhkrR-12" target="rB942MAWg83VmefqhkrR-6" edge="1">
+          <mxGeometry relative="1" as="geometry">
+            <mxPoint x="610" y="151" as="sourcePoint" />
+            <mxPoint x="610" y="211" as="targetPoint" />
+          </mxGeometry>
+        </mxCell>
+        <mxCell id="rB942MAWg83VmefqhkrR-31" style="rounded=0;orthogonalLoop=1;jettySize=auto;html=1;strokeWidth=3;entryX=1;entryY=0.25;entryDx=0;entryDy=0;exitX=0;exitY=0.5;exitDx=0;exitDy=0;" parent="1" source="rB942MAWg83VmefqhkrR-34" target="rB942MAWg83VmefqhkrR-12" edge="1">
+          <mxGeometry relative="1" as="geometry">
+            <mxPoint x="980" y="221" as="sourcePoint" />
+            <mxPoint x="720" y="101" as="targetPoint" />
+          </mxGeometry>
+        </mxCell>
+        <mxCell id="rB942MAWg83VmefqhkrR-33" value="Current Trial Safety and Efficacy" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;fillStyle=zigzag-line;" parent="1" vertex="1">
+          <mxGeometry x="1001" y="520" width="115" height="60" as="geometry" />
+        </mxCell>
+        <mxCell id="rB942MAWg83VmefqhkrR-34" value="Previously Observed Safety and Efficacy" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;fillStyle=zigzag-line;" parent="1" vertex="1">
+          <mxGeometry x="1001" y="241" width="115" height="60" as="geometry" />
+        </mxCell>
+        <mxCell id="rB942MAWg83VmefqhkrR-35" value="Fundamental Safety and Efficacy" style="rounded=0;whiteSpace=wrap;html=1;fillColor=#d5e8d4;strokeColor=#82b366;fillStyle=zigzag-line;" parent="1" vertex="1">
+          <mxGeometry x="1001" y="390" width="115" height="60" as="geometry" />
+        </mxCell>
+        <mxCell id="rB942MAWg83VmefqhkrR-39" style="rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=1;entryY=0.5;entryDx=0;entryDy=0;strokeWidth=3;exitX=0;exitY=0.5;exitDx=0;exitDy=0;" parent="1" source="rB942MAWg83VmefqhkrR-33" target="rB942MAWg83VmefqhkrR-5" edge="1">
+          <mxGeometry relative="1" as="geometry">
+            <mxPoint x="1090" y="530" as="sourcePoint" />
+            <mxPoint x="1090" y="590" as="targetPoint" />
+          </mxGeometry>
+        </mxCell>
+        <mxCell id="rB942MAWg83VmefqhkrR-45" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.5;entryY=1;entryDx=0;entryDy=0;strokeWidth=3;exitX=0.5;exitY=0;exitDx=0;exitDy=0;" parent="1" source="rB942MAWg83VmefqhkrR-35" target="rB942MAWg83VmefqhkrR-34" edge="1">
+          <mxGeometry relative="1" as="geometry">
+            <mxPoint x="961" y="330" as="sourcePoint" />
+            <mxPoint x="961" y="390" as="targetPoint" />
+          </mxGeometry>
+        </mxCell>
+        <mxCell id="rB942MAWg83VmefqhkrR-46" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.5;entryY=0;entryDx=0;entryDy=0;strokeWidth=3;exitX=0.5;exitY=1;exitDx=0;exitDy=0;" parent="1" source="rB942MAWg83VmefqhkrR-35" target="rB942MAWg83VmefqhkrR-33" edge="1">
+          <mxGeometry relative="1" as="geometry">
+            <mxPoint x="971" y="340" as="sourcePoint" />
+            <mxPoint x="971" y="400" as="targetPoint" />
+            <Array as="points">
+              <mxPoint x="1061" y="490" />
+              <mxPoint x="1061" y="490" />
+            </Array>
+          </mxGeometry>
+        </mxCell>
+        <mxCell id="rB942MAWg83VmefqhkrR-48" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0;entryY=0.5;entryDx=0;entryDy=0;strokeWidth=3;exitX=0.5;exitY=0;exitDx=0;exitDy=0;" parent="1" source="rB942MAWg83VmefqhkrR-9" target="rB942MAWg83VmefqhkrR-6" edge="1">
+          <mxGeometry relative="1" as="geometry">
+            <mxPoint x="400" y="380" as="sourcePoint" />
+            <mxPoint x="400" y="440" as="targetPoint" />
+          </mxGeometry>
+        </mxCell>
+        <mxCell id="rB942MAWg83VmefqhkrR-51" style="rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0;entryY=0.5;entryDx=0;entryDy=0;strokeWidth=3;exitX=1;exitY=0.5;exitDx=0;exitDy=0;" parent="1" source="rB942MAWg83VmefqhkrR-10" target="rB942MAWg83VmefqhkrR-5" edge="1">
+          <mxGeometry relative="1" as="geometry">
+            <mxPoint x="670" y="600" as="sourcePoint" />
+            <mxPoint x="670" y="660" as="targetPoint" />
+          </mxGeometry>
+        </mxCell>
+        <mxCell id="rB942MAWg83VmefqhkrR-52" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.75;entryY=1;entryDx=0;entryDy=0;strokeWidth=3;exitX=0;exitY=0.5;exitDx=0;exitDy=0;strokeColor=#1A1A1A;" parent="1" source="rB942MAWg83VmefqhkrR-10" target="rB942MAWg83VmefqhkrR-9" edge="1">
+          <mxGeometry relative="1" as="geometry">
+            <mxPoint x="430" y="515" as="sourcePoint" />
+            <mxPoint x="430" y="575" as="targetPoint" />
+          </mxGeometry>
+        </mxCell>
+        <mxCell id="rB942MAWg83VmefqhkrR-54" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.25;entryY=0;entryDx=0;entryDy=0;strokeWidth=3;exitX=1;exitY=0.5;exitDx=0;exitDy=0;" parent="1" source="rB942MAWg83VmefqhkrR-9" target="rB942MAWg83VmefqhkrR-10" edge="1">
+          <mxGeometry relative="1" as="geometry">
+            <mxPoint x="569.5" y="435" as="sourcePoint" />
+            <mxPoint x="569.5" y="495" as="targetPoint" />
+          </mxGeometry>
+        </mxCell>
+        <mxCell id="rB942MAWg83VmefqhkrR-60" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0;entryY=0.75;entryDx=0;entryDy=0;strokeWidth=3;exitX=0.5;exitY=0;exitDx=0;exitDy=0;" parent="1" source="rB942MAWg83VmefqhkrR-10" target="rB942MAWg83VmefqhkrR-6" edge="1">
+          <mxGeometry relative="1" as="geometry">
+            <mxPoint x="480" y="530" as="sourcePoint" />
+            <mxPoint x="570" y="280" as="targetPoint" />
+            <Array as="points">
+              <mxPoint x="530" y="285" />
+            </Array>
+          </mxGeometry>
+        </mxCell>
+        <mxCell id="rB942MAWg83VmefqhkrR-59" style="rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0;entryY=0.25;entryDx=0;entryDy=0;strokeWidth=4;exitX=1;exitY=0.5;exitDx=0;exitDy=0;jumpStyle=arc;jumpSize=14;strokeColor=#0000FF;" parent="1" source="rB942MAWg83VmefqhkrR-2" target="rB942MAWg83VmefqhkrR-5" edge="1">
+          <mxGeometry relative="1" as="geometry">
+            <mxPoint x="740" y="480" as="sourcePoint" />
+            <mxPoint x="740" y="540" as="targetPoint" />
+          </mxGeometry>
+        </mxCell>
+        <mxCell id="rB942MAWg83VmefqhkrR-64" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.75;entryY=0;entryDx=0;entryDy=0;strokeWidth=3;exitX=0;exitY=0.25;exitDx=0;exitDy=0;" parent="1" source="rB942MAWg83VmefqhkrR-7" target="rB942MAWg83VmefqhkrR-1" edge="1">
+          <mxGeometry relative="1" as="geometry">
+            <mxPoint x="650" y="380" as="sourcePoint" />
+            <mxPoint x="650" y="440" as="targetPoint" />
+            <Array as="points">
+              <mxPoint x="650" y="375" />
+            </Array>
+          </mxGeometry>
+        </mxCell>
+        <mxCell id="rB942MAWg83VmefqhkrR-66" style="edgeStyle=orthogonalEdgeStyle;rounded=0;orthogonalLoop=1;jettySize=auto;html=1;entryX=0.25;entryY=1;entryDx=0;entryDy=0;strokeWidth=3;exitX=1;exitY=0.25;exitDx=0;exitDy=0;strokeColor=#999999;" parent="1" source="rB942MAWg83VmefqhkrR-1" target="rB942MAWg83VmefqhkrR-7" edge="1">
+          <mxGeometry relative="1" as="geometry">
+            <mxPoint x="650" y="380" as="sourcePoint" />
+            <mxPoint x="650" y="440" as="targetPoint" />
+          </mxGeometry>
+        </mxCell>
+      </root>
+    </mxGraphModel>
+  </diagram>
+</mxfile>
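Because the causal model diagram is stored as uncompressed draw.io XML, its edges can be read programmatically, for instance to list the direct causes of the outcome node "Will it Terminate?". The sketch below is illustrative only: the function `parents_of` is hypothetical, and `EXCERPT` is a hand-trimmed copy of a few cells from `CausalModel.drawio` with styles and geometry removed. Passing the full file's text instead of the excerpt should work the same way, since `iter` searches the whole tree.

```python
import xml.etree.ElementTree as ET

# Hand-trimmed excerpt of CausalModel.drawio: only ids, labels and edge endpoints.
EXCERPT = """
<mxGraphModel><root>
  <mxCell id="0" />
  <mxCell id="1" parent="0" />
  <mxCell id="rB942MAWg83VmefqhkrR-1" value="Enrollment Level" parent="1" vertex="1" />
  <mxCell id="rB942MAWg83VmefqhkrR-2" value="Enrollment Status" parent="rB942MAWg83VmefqhkrR-1" vertex="1" />
  <mxCell id="rB942MAWg83VmefqhkrR-5" value="Will it Terminate?" parent="1" vertex="1" />
  <mxCell id="rB942MAWg83VmefqhkrR-7" value="Elapsed Duration" parent="1" vertex="1" />
  <mxCell id="rB942MAWg83VmefqhkrR-10" value="Market Conditions" parent="1" vertex="1" />
  <mxCell id="rB942MAWg83VmefqhkrR-33" value="Current Trial Safety and Efficacy" parent="1" vertex="1" />
  <mxCell id="rB942MAWg83VmefqhkrR-27" parent="1" source="rB942MAWg83VmefqhkrR-7" target="rB942MAWg83VmefqhkrR-5" edge="1" />
  <mxCell id="rB942MAWg83VmefqhkrR-39" parent="1" source="rB942MAWg83VmefqhkrR-33" target="rB942MAWg83VmefqhkrR-5" edge="1" />
  <mxCell id="rB942MAWg83VmefqhkrR-51" parent="1" source="rB942MAWg83VmefqhkrR-10" target="rB942MAWg83VmefqhkrR-5" edge="1" />
  <mxCell id="rB942MAWg83VmefqhkrR-59" parent="1" source="rB942MAWg83VmefqhkrR-2" target="rB942MAWg83VmefqhkrR-5" edge="1" />
  <mxCell id="rB942MAWg83VmefqhkrR-64" parent="1" source="rB942MAWg83VmefqhkrR-7" target="rB942MAWg83VmefqhkrR-1" edge="1" />
</root></mxGraphModel>
"""


def parents_of(drawio_xml, target_label):
    """Return the sorted labels of cells with an edge pointing into target_label.

    In the full file some labels contain HTML (e.g. a <br> line break), so
    matching is on the exact label text stored in the value attribute.
    """
    root = ET.fromstring(drawio_xml.strip())
    cells = list(root.iter("mxCell"))
    labels = {cell.get("id"): cell.get("value", "") for cell in cells}
    target_ids = {cid for cid, text in labels.items() if text == target_label}
    return sorted(
        labels.get(cell.get("source"), "")
        for cell in cells
        if cell.get("edge") == "1" and cell.get("target") in target_ids
    )


print(parents_of(EXCERPT, "Will it Terminate?"))
print(parents_of(EXCERPT, "Enrollment Level"))
```

An edge drawn into a container, such as the "Enrollment Level" swimlane, is attributed to the container rather than to its "Enrollment Status" child, so the two show up as separate nodes.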
--- a/assets/img/CausalModel.drawio.png
+++ b/assets/img/CausalModel.drawio.png
--- a/assets/preambles/References.bib
+++ b/assets/preambles/References.bib
@ -5377,6 +5377,17 @@ California 90401-3208},
  file = {/home/will/Zotero/storage/RTW5EPBG/meshhome.html}
 }

+@online{usnlm_protocolregistrationdata_2024-06-17,
+  title = {Protocol {{Registration Data Element Definitions}} for {{Interventional}} and {{Observational Studies}} | {{ClinicalTrials}}.Gov},
+  author = {{U.S. National Library of Medicine}},
+  date = {2024-06-17},
+  url = {https://clinicaltrials.gov/policy/protocol-definitions},
+  urldate = {2025-01-25},
+  organization = {ClinicalTrials.gov},
+  keywords = {ClinicalTrials},
+  file = {/home/will/Zotero/storage/HFM6LRS4/protocol-definitions.html}
+}
+
@online{usnlm_rxnavinabox_2023,
  title = {{{RxNav-in-a-Box}} - {{RxNav Applications}}},
  author = {{U.S. National Library of Medicine}},
Author	SHA1	Message	Date
will king	fb644c6c5d	First complete draft of JMP Various adjustments, but used claude.ai to get some suggestions. Updated the causal model graph to represent my current understanding and try to make it colorblind friendly.	1 year ago
Will King	da3c9c31b5	recording changes	1 year ago
Will King	12007e6689	edited intro with claude.ai and spellchecked some stuff	1 year ago
Will King	6f03d6ba08	Merge branch 'main'	1 year ago
Will King	ab98934dc6	Merge branch 'main' Updated citation references	1 year ago
Will King	95be7afb35	Merge branch 'main'	1 year ago
Will King	4302d07ef8	Merge branch 'main'	1 year ago
Will King	907214e359	Squashed commit of the following: commit `963293fc2b` Author: Will King <will.king.git@youainti.com> Date: Mon Jan 13 21:15:44 2025 -0800 Added diagnostics appendix, notes in results commit `d6d2360206` Author: Will King <will.king.git@youainti.com> Date: Mon Jan 13 20:53:18 2025 -0800 Finally got all the needed images correct. Adjusted directory to make it easier to find images. commit `37d35377b3` Author: Will King <will.king.git@youainti.com> Date: Mon Jan 13 16:29:00 2025 -0800 Added more images to assets & included in results. commit `becefe15e0` Author: Will King <will.king.git@youainti.com> Date: Mon Jan 13 12:56:15 2025 -0800 Updated images commit `86f9b8dfc9` Author: will king <youainti@protonmail.com> Date: Mon Jan 13 09:24:20 2025 -0800 finished drafting results commit `64f3d14f7b` Author: will king <youainti@protonmail.com> Date: Mon Jan 6 12:48:48 2025 -0800 Midday updates from writing commit `1630af2928` Author: will king <youainti@protonmail.com> Date: Tue Dec 3 17:08:43 2024 -0800 more updates commit `5d9640ab8d` Author: will king <youainti@protonmail.com> Date: Thu Nov 28 23:39:04 2024 -0800 saving work commit `3e6a8f10d4` Author: will king <youainti@protonmail.com> Date: Tue Nov 26 17:18:24 2024 -0800 tweaked econometrics presentation, added todos commit `7d51cb10b3` Author: will king <youainti@protonmail.com> Date: Tue Nov 26 15:57:50 2024 -0800 updated layout, added gitignore	1 year ago
Will King	9eab4b48e2	removed unused main from overleaf	2 years ago
Will King	b5e0057f0d	Merge branch 'master' of https://git.overleaf.com/6744f0ad29894cafab013eea	2 years ago
Will King	098af4f221	Update on Overleaf.	2 years ago