Drafted causal identification (including figure) and outlined data section

4 years ago · 3e62e33fae
parent 4402780a44
commit 3e62e33fae
3 changed files with 134 additions and 14 deletions
--- a/Latex/Paper/sections/02_data.tex
+++ b/Latex/Paper/sections/02_data.tex
@@ -7,6 +7,48 @@
 % Drugs on Market
 % Market/Population Size
 % Compounds, interactions
+\subsection{Data Sources}

+\subsubsection{Clinical Trials Data}
+%ClinicalTrials.gov
+%   Key features - brief description
+%   Why is it being included
+%   What specific data is used
+%   Data Manipulations for each
+%   Links to data
+
+\subsubsection{Drugs on Market}
+%Structured Product Labels and dates of marketing
+%   Key features
+%   Why is it being included
+%   What specific data is used
+%   Data Manipulations for each
+%   Links to data
+
+\subsubsection{Global Disease Burden Survey}
+%Dataset name
+%   Key features
+%   Why is it being included
+%   What specific data is used
+%   Data Manipulations for each
+%   Links to data
+
+
+%Dataset name
+%   Key features
+%   Why is it being included
+%   What specific data is used
+%   Data Manipulations for each
+%   Links to data
+
+\subsection{Data Linkages}
+
+%% Ideal
+% Link trial indications to a generic indication registry
+% Link SPL indications to a generic indication registry
+% Link Global disease burden to a generic indication registry
+% Link trial compounds to a generic registry (MPT or whatever that was)?
+% Link SPL compounds to a generic registry
+% 

 \end{document}
--- a/Latex/Paper/sections/03_CausalIdentification.tex
+++ b/Latex/Paper/sections/03_CausalIdentification.tex
@@ -4,8 +4,8 @@
 \begin{document}

 The identification strategy centers on the fact that, in the U.S., clinical trials
-update the publically available information on \url{ClinicalTrials.gov}, which are available
-as historical snapshots through the website.
+update the publicly available information on \url{ClinicalTrials.gov}, which is then made
+available as historical snapshots.
 These updates typically include information such as additional sites conducting the study, 
 the study status, and expected or current enrollment figures. 
 By measuring enrollment and other factors prior to the conclusion of a trial, we 
@@ -13,8 +13,8 @@ can measure the effect of enrollment on trial conclusion
 (specifically whether it is registered as completed or terminated).
 In particular, this avoids measuring the joint determination of enrollment and conclusion
 status arising from trials terminated early.
-Figure \cref{Fig:CausalModel} describes the structural causal model used to justify
-our identification claims.
+Figure \ref{Fig:CausalModel} describes the structural causal model (SCM) used to justify
+the causal identification.

 \begin{figure}[H] %use [H] to fix the figure here.
    \tikzfig{../assets/tikzit/CausalGraph}
@@ -22,7 +22,84 @@ our identification claims.
    \label{Fig:CausalModel}
 \end{figure}

+The identification strategy is based on the backdoor criterion due to \cite{PEARL1995}.
+As the backdoor criterion depends on the SCM being a Directed Acyclic Graph (DAG), the first 
+step is to justify the DAG in \cref{Fig:CausalModel}.
+
+% The data consists of individual snapshots
+%   Describe "states" 
+%   Also, snapshot states are dependent across time
+%   Define conclusion state vs snapshot state.
+The key feature of the data is that it consists of sequences of trial snapshots for each trial.
+Snapshots prior to the start of the trial capture expected enrollment and time to completion,
+while snapshots during the trial record actual enrollment figures, current status, 
+and the date the snapshot was recorded.
+Finally, after a trial concludes, snapshots list final enrollment and the date at which the 
+last participant was examined~\cite{CLINICALTRIALS-data_spec}.
+In the discussion below, I refer to a snapshot's ``state'' as the enrollment, duration, and status 
+recorded at the time of the snapshot.
+%TODO: make sure data section discusses the normalization of enrollment and duration.
+Additionally, I distinguish between the state at trial conclusion and the state from a snapshot taken while the trial is running,
+calling them the ``conclusion state'' and the ``active snapshot state'' respectively.
+%   Describe market conditions.
+Associated with each trial snapshot are the market conditions existing at that point in time.
+
+
 %Describe the observed and unobserved events and their supposed relationships.
+%%%%% Relationships of interest
+% Snapshot State -> Conclusion state
+% Discuss how the data captures this - time dependence
+%TODO
+
+% Market -> Snapshot state
+% Market -> Conclusion state
+
+
+
+%%%%% Confounding relationships and controls
+% Disease Burden -> Market Conditions, Snapshot State
+In addition to the relationships of interest between the active snapshot states and 
+the conclusion states, there are various biasing effects that need to be accounted for.
+The first of these is that enrollment and the drugs currently on the market
+both depend on the number of people affected by the disease under examination.
+This biases not only the estimate of the total causal effect of market conditions 
+on conclusion state but also the direct effects of both 
+market conditions and active snapshot enrollment on conclusion state.
+Additionally, it biases the estimation of the effect of market conditions on
+active snapshot enrollment.
+I plan on using the WHO's Global Disease Burden Survey
+to control for population size. %CITE - ekaterina
+
+% Biasing Pathways
+
+%   Compound Safety -> Current Adverse Events -> Conclusion State. 
+%       Note: Compound Safety -> Current Adverse Events -/> Snapshot State. 
+%       Even if it were an issue, the direct events should still be identified?
+A second biasing effect arises because a compound's safety drives both beliefs about 
+the compound, which affect active snapshot enrollment, and the current adverse events,
+which directly influence the conclusion state by leading to terminations.
+The backdoor criterion implies that controlling for whether prior trials have
+occurred will eliminate this bias.
+%TODO: discuss how you will be conditioning on prior trials, i.e. per compound or just phase 3 etc.
+
+%   Compound Efficacy -> Measured Effectiveness -> Conclusion State
+Similarly, the last confounding factor is that of measured effectiveness.
+When running a trial, the sponsor will get periodic updates as to the measured effectiveness. 
+If this is lower than expected, the trial may conclude early.
+Although this is a direct effect, the bias arises from the backdoor path that runs through prior trials
+and beliefs about the compound.
+Thus, controlling for prior trials blocks this path as well.
+%   Control
+%       Compound Safety, Compound Efficacy -> Prior Trials -> Beliefs about Compound
+% 
+
+%%%% Variance controls
+% Sponsor Changes -> Conclusion Status
+The final control variable is sponsor changes.
+As sponsors are captured at each snapshot, it is possible to measure when a sponsor has changed.
+Changing sponsors is a potentially disruptive event, and so it is likely to affect the probability
+that the trial is canceled early. 
+The purpose of including this control is to reduce the variance of the estimates.
 %Describe what causal effects are identified by the backdoor criterion.

 \end{document}
--- a/Latex/assets/tikzit/CausalGraph.tikz
+++ b/Latex/assets/tikzit/CausalGraph.tikz
@@ -2,32 +2,33 @@
 	\begin{pgfonlayer}{nodelayer}
 		\node [style=emptyBox] (0) at (-18.5, 4) {Compound Safety};
 		\node [style=emptyBox] (1) at (-18.5, -3.5) {Compound Effectiveness};
-		\node [style=emptyBox] (2) at (5, 0.75) {Population Size};
 		\node [style=Red Box] (3) at (-3.75, -3.5) {\begin{tabular}{l} Conclusion \\ $\bullet$ Status\\  $\bullet$ Duration \\ $\bullet$ Enrollment \end{tabular}};
-		\node [style=Red Box] (4) at (-3.75, 1) {Snapshot Enrollment};
+		\node [style=Red Box] (4) at (-2.75, 1) {\begin{tabular}{l} Snapshot \\ $\bullet$ Status\\  $\bullet$ Duration \\ $\bullet$ Enrollment \end{tabular}};
 		\node [style=Red Box] (5) at (3.25, -2) {Market Conditions};
 		\node [style=Box] (6) at (-17.5, -5.5) {Sponsor Changes};
 		\node [style=Box] (7) at (-15.25, 0.25) {\begin{tabular}{l}Prior \\ Trials \end{tabular}};
 		\node [style=emptyBox] (8) at (-2.25, 4) {Beliefs about Compound};
-		\node [style=Box] (9) at (-10.75, 4) {Adverse Events};
 		\node [style=emptyBox] (10) at (3.25, -5.25) {Unobserved};
 		\node [style=Box] (11) at (3.25, -6.25) {Observed: Control};
 		\node [style=Red Box] (12) at (3.25, -7.25) {Observed: Of interest};
-		\node [style=emptyBox] (13) at (-10.75, 5.25) {Adverse Events};
+		\node [style=emptyBox] (13) at (-11, 4) {Adverse Events};
+		\node [style=emptyBox] (14) at (-10.5, -1.5) {Measured Effectiveness};
+		\node [style=Box] (15) at (3.75, 1) {Disease Burden};
 	\end{pgfonlayer}
 	\begin{pgfonlayer}{edgelayer}
 		\draw [style=RightArrow] (4) to (3);
 		\draw [style=RightArrow] (5) to (3);
-		\draw [style=Light Arrow] (2) to (5);
-		\draw [style=Light Arrow] (2) to (4);
-		\draw [style=Light Arrow] (5) to (4);
 		\draw [style=Light Arrow] (0) to (7);
 		\draw [style=Light Arrow] (1) to (7);
 		\draw [style=Light Arrow] (7) to (8);
 		\draw [style=Light Arrow] (8) to (4);
-		\draw [style=Light Arrow] (1) to (3);
 		\draw [style=Light Arrow] (6) to (3);
-		\draw [style=Light Arrow] (0) to (9);
-		\draw [style=Light Arrow] (9) to (3);
+		\draw [style=Light Arrow] (0) to (13);
+		\draw [style=Light Arrow] (13) to (3);
+		\draw [style=Light Arrow] (1) to (14);
+		\draw [style=Light Arrow] (14) to (3);
+		\draw [style=Light Arrow] (15) to (5);
+		\draw [style=Light Arrow] (15) to (4);
+		\draw [style=RightArrow] (5) to (4);
 	\end{pgfonlayer}
 \end{tikzpicture}