
\documentclass[../Main.tex]{subfiles}
\graphicspath{{\subfix{Assets/img/}}}

\begin{document}

The identification strategy centers on the fact that, in the U.S., clinical trials
update the publicly available information on \url{ClinicalTrials.gov}, which is then made
available as historical snapshots.
These updates typically include information such as additional sites conducting the study,
the study status, and expected or current enrollment figures.
By measuring enrollment and other factors prior to the conclusion of a trial, I
can estimate the effect of enrollment on how the trial concludes
(specifically, whether it is registered as completed or terminated).
In particular, this avoids measuring the joint determination of enrollment and conclusion
status arising from trials terminated early.
Figure \ref{Fig:CausalModel} describes the structural causal model (SCM) used to justify
the causal identification.

\begin{figure}[H] %use [H] to fix the figure here.
    \tikzfig{../assets/tikzit/CausalGraph}
    \caption{Causal Model}
    \label{Fig:CausalModel}
\end{figure}

The identification strategy is based on the backdoor criterion due to \cite{PEARL1995}.
As the backdoor criterion depends on the SCM being a Directed Acyclic Graph (DAG), the first
step is to justify the DAG in \cref{Fig:CausalModel}.

% The data consists of individual snapshots
%   Describe "states"
%   Also, snapshot states are dependent across time
%   Define conclusion state vs snapshot state.
The key feature of the data is that it consists of sequences of trial snapshots for each trial.
Snapshots prior to the start of the trial capture expected enrollment and time to completion,
while snapshots during the trial record actual enrollment figures, current status,
and the date the snapshot was recorded.
Finally, after a trial concludes, snapshots list final enrollment and the date at which the
last participant was examined~\cite{CLINICALTRIALS-data_spec}.
In the discussion below, I refer to a snapshot's ``state'' as the enrollment, duration, and status
recorded at the time of the snapshot.
%TODO: make sure data section discusses the normalization of enrollment and duration.
Additionally, I distinguish between the state at trial conclusion and the state from a snapshot taken while the trial is running,
referring to them as ``conclusion state'' and ``active snapshot state'', respectively.
%   Describe market conditions.
Associated with each trial snapshot are the market conditions existing at that point in time.


%Describe the observed and unobserved events and their supposed relationships.
%%%%% Relationships of interest
% Snapshot State -> Conclusion state
% Discuss how the data captures this - time dependence
%TODO

% Market -> Snapshot state
% Market -> Conclusion state


%%%%% Confounding relationships and controls
% Disease Burden -> Market Conditions, Snapshot State
In addition to the relationships of interest between the active snapshot states and
the conclusion states, there are various biasing effects that need to be accounted for.
The first of these is that enrollment and the drugs currently on the market
both depend on the number of people who have the disease under examination.
This biases not only the estimate of the total causal effect of market conditions
on conclusion state but also the direct effects of both
market conditions and active snapshot enrollment on conclusion state.
Additionally, it biases the estimation of the effect of market conditions on
active snapshot enrollment.
I plan to use the WHO's Global Disease Burden Survey
to control for population size. %CITE - ekaterina

% Biasing Pathways

%   Compound Safety -> Current Adverse Events -> Conclusion State.
%       Note: Compound Safety -> Current Adverse Events -/> Snapshot State.
%       Even if it were an issue, the direct events should still be identified?
A second biasing effect arises because a compound's safety drives both beliefs about
the compound -- affecting active snapshot enrollment -- and the current adverse events,
which directly influence the conclusion state by leading to terminations.
The backdoor criterion implies that controlling for whether prior trials have
occurred eliminates this bias.
%TODO: discuss how you will be conditioning on prior trials, i.e. per compound or just phase 3 etc.

%   Compound Efficacy -> Measured Effectiveness -> Conclusion State
Similarly, the last confounding factor is measured effectiveness.
When running a trial, the sponsor will get periodic updates as to the measured effectiveness.
If this is lower than expected, the trial may conclude early.
Although this is a direct effect, the bias arises from the backdoor path running through prior trials
and beliefs about the compound.
Thus, controlling for prior trials eliminates this path as well.
%   Control
%       Compound Safety, Compound Efficacy -> Prior Trials -> Beliefs about Compound
%

%%%% Variance controls
% Sponsor Changes -> Conclusion Status
Finally, the last control variable is sponsor changes.
As sponsors are captured at each snapshot, it is possible to measure when a sponsor has changed.
Changing sponsors is a potentially disruptive event, and so it is likely to affect the probability
that the trial is canceled early.
The purpose of including this control is to reduce the variance of the estimates.
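
To make the target of the adjustment concrete, the following is a sketch under notation introduced here purely for illustration:
let $X$ denote the active snapshot state, $Y$ the conclusion state, and $Z$ a set of observed controls
(for example market conditions, disease burden, and prior trials).
If $Z$ contains no descendant of $X$ and blocks every path between $X$ and $Y$ that begins with an arrow into $X$,
then, for discrete $Z$, the backdoor adjustment gives
\[
    P\bigl(Y = y \mid \mathrm{do}(X = x)\bigr)
    = \sum_{z} P(Y = y \mid X = x, Z = z)\, P(Z = z).
\]
Whether this particular choice of $Z$ satisfies both conditions depends on the DAG being correctly specified.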
%Describe what causal effects are identified by the backdoor criterion.

\end{document}