Merge branch 'rewrite_section' of https://git.youainti.com/youainti/ClinicalTrialsPaper into rewrite_section

2 years ago · e2576b5fc1
parent fb570d1a15 59c3a3a85c
commit e2576b5fc1
3 changed files with 248 additions and 64 deletions
--- a/Latex/Paper/outline3.txt
+++ b/Latex/Paper/outline3.txt
@ -5,9 +5,9 @@ How do I begin work on stuff
        - we can't trust what we are told
        - terminations could be due to safety, strategic, or operational concerns.
    - explaining confounding between 
+        - population/market and enrollment.
        - population/market and market conditions.
        - market conditions and enrollment. 
-        - population/market and enrollment.
    - describe other confounders
        - safety and effectiveness
        - duration <--> enrollment/termination
@ -17,6 +17,7 @@ How do I begin work on stuff
 - Introduce Do-Calculus
    - DAG model
    - What do I need to control for, in some form or other?
+        CURRENTLY HERE:
 - Introduce Data
    - Clinical Trial Progression
        - AACT gives us information on
--- a/Latex/Paper/sections/02_data.tex
+++ b/Latex/Paper/sections/02_data.tex
@ -3,6 +3,7 @@

 \begin{document}
 In the sections below, I examine each source of data, their key features,
+how they match with the variables in the Structural Model DAG,
 and describe applicable terminology (\cref{datasources}).
 I then discuss how these sources were tied together (\cref{datalinks}) and 
 describe the specific data used in the analysis (\cref{dataintegration}).
--- a/Latex/Paper/sections/10_CausalStory.tex
+++ b/Latex/Paper/sections/10_CausalStory.tex
@ -10,7 +10,22 @@ and an operational concern
 (the effect of a delay in closing enrollment), 
 we need to look at what confounds these effects and how we might measure them.

-There are a few fundamental issues.
+The primary effects one expects to see are that
+\begin{enumerate}
+    \item Adding more drugs will make it harder to finish a trial as it is
+        more likely to be terminated due to concerns about profitability.
+    \item Adding more drugs will make it harder to recruit, slowing enrollment.
+    \item Enrollment challenges increase the likelihood that a trial will 
+        terminate.
+    % Mentioned below
+    % \item A large population/market will tend to have more drugs to treat it 
+    %     because it is more profitable. 
+    % \item A large population/market will make it easier to recruit, 
+    %     reducing the likelihood of a termination due to enrollment failure.
+\end{enumerate}
+
+There are a few fundamental issues that arise when trying to estimate 
+these effects.
 The first is that the severity of the disease and the size of the population 
 that has that disease affect the ease of enrolling participants. 
 For example, a large population may make it easier to find enough participants
@ -20,22 +35,61 @@ Second, for some diseases there exists an endogenous dynamic
 between the treatments available for a disease and the 
 market size/population with that disease. 
 \authorcite{cerda_EndogenousInnovations_2007} proposes two mechanisms
-that link drugs on the market and market size. 
-The first is that a large market will tends to have more drugs to treat it. 
+that link the drugs on the market and market size. 
 The inverse is that for many chronic diseases with high mortality rates, 
 more drugs cause better survivability, increasing the size of those markets.
+The third major confound is that the drugs on the market affect enrollment. 
+If there is a treatment already on the market, patients or their doctors
+may be less inclined to participate in the trial, even if the current treatment
+has severe downsides. 

+There are additional problems. 
+One is that the disease being treated affects the 
+safety and efficacy profile that the drug will be held to. 
+For example, if a particular cancer is very deadly and does not respond well
+to current treatments, Phase I trials will enroll patients with that cancer, 
+as opposed to the standard of enrolling healthy volunteers 
+\cite{commissioner_DrugDevelopment_2020}.
+The trial is more likely to be terminated early if the drug is unsafe or has no
+discernible effect; therefore, termination depends in part on a compound-disease 
+interaction.
+Another challenge comes from the interaction between duration and termination:
+if a trial terminates before closing enrollment for reasons other 
+than enrollment, then enrollment will still be low. 
+On the other hand, if enrollment is low, the trial might terminate.
+These outcomes are indistinguishable in the data provided by the final 
+\url{ClinicalTrials.gov} dataset.

-%%%%% \/\/\/\/\/ OLD STUFF \/\/\/\/\/
+Finally, while conducting a trial, the safety and efficacy of a drug are driven by
+fundamental pharmacokinetic properties of the compounds. 
+These are only imperfectly measured both prior to and during any given trial.
+Previously measured safety and efficacy inform the decision to start the trial
+in the first place, while currently observed safety and efficacy results
+help the sponsor judge whether or not to continue the trial.

 Because running experiments on companies running clinical trials is not going
-to happen anytime soon, causal identification will depend on creating a 
-structural causal model.
+to happen anytime soon, causal identification depends on using an observational
+approach and a structural causal model.
+Because the data generating process for the clinical trials records is rather 
+straightforward, this is an ideal place to use
+\authorcite{pearl_causality_2000}'s
+Do-Calculus.
+This process involves describing the data generating process in the form of 
+a directed acyclic graph, where the nodes represent different variables
+within the causal model and the directed edges (arrows) represent
+assumptions about which variables influence the other variables. 
+There are a few algorithms that then tell the researcher which of the 
+relationships will be confounded, which ones can be statistically estimated, 
+and provide some hypotheses that can be tested to ensure the model is 
+reasonably correct.
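+
+As a minimal sketch of what such an algorithm delivers, assume the backdoor
+criterion holds for a generic exposure $X$, outcome $Y$, and adjustment
+set $Z$ (placeholder symbols used only for illustration, not nodes of the
+model proposed here).
+The interventional distribution then reduces to observable quantities:
+\[
+    P(Y \mid \mathrm{do}(X = x)) = \sum_{z} P(Y \mid X = x, Z = z) \, P(Z = z).
+\]
+The sum is replaced by an integral when $Z$ is continuous.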
+
+
+
 In \cref{Fig:CausalModel} I diagram the directed acyclic graph that describes
-the data generating model.
-The proposed data generating model consists of a decision maker, the study 
-sponsor, who must decide whether to let a trial run to completion or terminate
-the trial early. 
+my proposed data generating process.
+It revolves around the decisions made by the study sponsor, 
+who must decide whether to let a trial run to completion 
+or terminate the trial early. 
 While receiving updates regarding the status of the trial, they ask questions
 such as:
 \begin{itemize}
@ -43,65 +97,193 @@ such as:
    \item Does it appear that the drug is effective enough to achieve our 
        goals, justifying continuing the trial?
    \item Are we recruiting enough participants to achieve the statistical
-        results we need?
+        results we need within the budget we have?
    \item Do the current market conditions and expectations about returns on 
        investment justify the expenditures we are making?
 \end{itemize}
-When appropriate, the study sponsor terminates the trial.
-If there are not enough issues to terminate the trial, it continues until it 
-is completed.
-
-While conducting a trial, the safety and efficacy of a drug are driven by
-fundamental pharmacokinetic properties of the compounds. 
-These are only imperfectly measured both prior to and during any given trial.
-Previously measured safety and efficacy inform the decision to start the trial
-in the first place while currently observed safety and efficiency results
-help the sponsor judge whether or not to continue the trial.
-Of course, these decisions are both affected by the specific condition being
-treated due to differences in the severity of the symptoms.
-
-When a trial has been started, it comes time to recruit participancts.
-Participants frequently depend on the advice of their physician when deciding 
-to join a trial or not. 
-As these physicians have a duty to seek their patients best interest; they, along
-with their patients will evaluate if the previously observed safety and efficacy
-results justify joining the trial over using current standard treatments.
-Thus the current market conditions may affect the rate at which participants 
-enroll in the trial.
-
-The enrollment of participants in a trial depends on a few other factors.
-The condition or disease of interest and how it progresses will determine how long
-recruitiment will be held open versus just an observation of treatment arms.
-Aditionally, a trial that has already reached a high enough enrollment will often
-close recruitment by switching to an "Active, not recruiting" stage to manage costs.
-Finally, enrolling participants depends on how difficult it is to find people 
-who suffer from the condition of interest.
-
-The preceeding issue of population size also affects the number of alternatives available.
-When there are less people affected by the disease, the smaller market reduces 
-possible profitability, all else equal.
-Thus the likelihood of companies paying the sunk costs to develop drugs for
-these conditions may be lower.
-Finally, the number of alternatives on the market may affect the return on
-investment directly, causing a trial to terminate early if the return is
-not high enough.
+When such issues arise, the study sponsor terminates the trial; otherwise,
+it continues to completion.

 \begin{figure}[H] %use [H] to fix the figure here.
-    \scalebox{0.6}{\tikzfig{../assets/tikzit/CausalGraph2}}
-    \caption{Causal Model}
+    \frame{
+    \scalebox{0.65}{
+             \tikzfig{../assets/tikzit/CausalGraph2}
+    }
+    }
+    \caption{Graphical Causal Model}
+    % \small{Crimson boxes are the variables of interest, 
+    % white boxes are unobserved, while the gray boxes will be controlled for.}
    \label{Fig:CausalModel}
 \end{figure}
-% 
-By using Judea Pearl's do-calculus, I can show that by choosing an adjustment 
-set of the decision to condut a phase III trial, the condition of interest, 
-the current status of the trial, and the population size will casually
-identify the direct effects of enrollment and market alternatives on the
-probability of termination.
-This is easily verified through the backdoor criterion, which states that
-if every path between the exposure and outcome that starts with an arrow 
-flowing into the exposure is blocked by one of the values in the adjustment
-set, then the effect of the exposure on outcome is causally identified
-(\cite{pearl_causality_2000}).
-It can be easily visually verified by the DAG on the graph that this is the case.

+
+% Constructing the model more explicitly
+% - quickly describe each node and line.
+% TODO: double check which graphic to use.
+
+A quick summary of the nodes of the DAG and their impact: 
+\begin{itemize}
+    \item Main Interests (Crimson Boxes)
+        \begin{enumerate}
+            \item \texttt{Will Terminate?}: 
+                Whether the final status of the trial was \textit{terminated} 
+                or \textit{completed}.
+            \item \texttt{Enrollment Status}: 
+                Measure of whether enrollment is progressing.
+            \item \texttt{Market Measures}: 
+                Various measures of the number of alternative drugs on the market.
+        \end{enumerate}
+    \item Observed Confounders (Gray Boxes)
+        \begin{enumerate}
+            \item \texttt{Condition}: 
+                The underlying condition. 
+                This impacts every other aspect of the model.
+            \item \texttt{Population (market size)}: 
+                Multiple measures of the impact the disease has (in DALYs).
+            \item \texttt{Elapsed Duration}: 
+                A normalized measure of the trial progression.
+            \item \texttt{Decision to Proceed with Phase III}: 
+                Whether the compound development has progressed to Phase III.
+        \end{enumerate}
+    \item Unobserved Confounders (White Boxes)
+        \begin{enumerate}
+            \item \texttt{Fundamental Efficacy and Safety}:
+                The underlying efficacy and safety of the compound. 
+                Cannot be observed, only estimated through scientific study.
+            \item \texttt{Previously observed Efficacy and Safety}: 
+                The information gathered in previous studies.
+            \item \texttt{Currently observed Efficacy and Safety}:
+                The information gathered during this study.
+        \end{enumerate}
+\end{itemize}
+
+\begin{itemize}
+    \item Relationships of interest
+        \begin{enumerate}
+            \item \texttt{Enrollment Status} $\rightarrow$ \texttt{Will Terminate?}:
+                This is the primary effect of interest.
+            \item \texttt{Market Measures} $\rightarrow$ \texttt{Will Terminate?}:
+                This is the secondary effect of interest.
+        \end{enumerate}
+    \item Confounding Pathways
+        \begin{enumerate}
+            \item 
+                \texttt{Condition}: 
+                Affects every other node. 
+                Part of the Adjustment Set.
+            \item Backdoor Pathway 
+                between \texttt{Will Terminate?} and 
+                \texttt{Enrollment Status} through safety and efficacy.
+                The concern is that since previously learned information 
+                and current information are driven by the same underlying 
+                physical reality, the enrollment process and 
+                termination decisions may be correlated.
+                Controlling for the decision to proceed with the trial is the 
+                best adjustment available to block this confounding pathway.
+                Below I describe the exact pathways.
+                \begin{enumerate}
+                    \item 
+                        \texttt{Fundamental Efficacy and Safety} 
+                        $\rightarrow$ 
+                        \texttt{Currently Observed Efficacy and Safety}:
+                        This relationship represents the measurements of
+                        safety and efficacy in the current trial. 
+                    \item 
+                        \texttt{Currently Observed Efficacy and Safety}
+                        $\rightarrow$ 
+                        \texttt{Will Terminate?}:
+                        This is how the measurements of safety and efficacy in the 
+                        current trial affect the probability of termination.
+                        % typically, evidence of a lack of safety or efficacy is 
+                        % enough to terminate the trial.
+                    \item \texttt{Fundamental Efficacy and Safety} 
+                        $\rightarrow$ 
+                        \texttt{Previously Observed Efficacy and Safety}:
+                        This relationship represents the measurements of
+                        safety and efficacy in work prior to the current trial. 
+                    \item 
+                        \texttt{Previously Observed Efficacy and Safety}
+                        $\rightarrow$ 
+                        \texttt{Decision to proceed with Phase III}:
+                        Previously observed data is essential to the FDA's 
+                        decision to allow a phase III trial. 
+                \end{enumerate}
+            \item 
+                Backdoor Pathway from \texttt{Market Measures} 
+                to \texttt{Enrollment Status} 
+                through \texttt{Population}. 
+                The concern with this pathway is that the rate of enrollment, and
+                thus the enrollment status, is affected by the Population with 
+                the disease. 
+                Additionally, there is a concern that the number of competitors
+                is driven by the total market size.
+                Thus adding Population to the adjustment set is necessary.
+                \begin{enumerate}
+                    \item 
+                        \texttt{Population} 
+                        $\rightarrow$ 
+                        \texttt{Enrollment Status}:
+                        This is fairly straightforward. 
+                        How easy it is to enroll participants depends in part  
+                        on how many people have the disease.
+                    \item 
+                        \texttt{Population} 
+                        $\rightarrow$ 
+                        \texttt{Market Measures}:
+                        This assumes that the population effect flows in only one
+                        direction, i.e., that a large population size increases
+                        the likelihood of a large number of drugs. 
+                        %TODO: Think about this one a bit because it does mess
+                        % with identification, particularly of market effects. 
+                        % these two are jointly determined per cerda 2007.
+                        % If I can't justify separating them, then I'll need to 
+                        % merge population (market size) and market measures (drugs on market). 
+                \end{enumerate}
+            \item 
+                \texttt{Market Measures} 
+                $\rightarrow$ 
+                \texttt{Enrollment Status}:
+                This confounds the estimation of the effect of 
+                \texttt{Enrollment Status} on \texttt{Will Terminate?}, and 
+                so \texttt{Market Measures} is part of the adjustment set.
+            \item 
+                \texttt{Market Measures} 
+                $\rightarrow$ 
+                \texttt{Decision to proceed with Phase III}:
+                The alternative treatments on the market will affect a sponsor's
+                decision to move forward with a Phase III trial.
+                This is controlled for by only working with trials that 
+                successfully begin recruitment for a Phase III Trial.
+            \item 
+                \texttt{Elapsed Duration} 
+                $\rightarrow$ 
+                \texttt{Will Terminate?}:
+                The amount of time passed helps drive the decision to continue
+                or terminate.
+            \item 
+                \texttt{Enrollment Status} 
+                $\leftrightarrow$ 
+                \texttt{Elapsed Duration}:
+                % This is jointly determined. and the weakest part of the causal identification without an accurate model of enrollment.
+                This is one of the weakest parts of the causal inference. 
+                Without a well-defined model of enrollment, we can't separate
+                the interaction between the enrollment status and the elapsed
+                duration. 
+                For example, if enrollment is running slower than expected,
+                the trial may be terminated due to concerns that it will not
+                achieve the primary objectives or that costs will exceed 
+                the budget allocated to the project.
+            \item 
+                \texttt{Decision to Proceed with Phase III} 
+                $\rightarrow$ 
+                \texttt{Will Terminate?}:
+                %obviously required. Maybe remove from listing and graph?
+                This effect is fairly straightforward, in that 
+                there is no possibility of a termination or completion
+                if the trial does not start. 
+                This is here to block a backdoor pathway between 
+                \texttt{Will Terminate?} and the enrollment status
+                through \texttt{Previously observed Efficacy and Safety}.
+        \end{enumerate}
+\end{itemize}
 \end{document}