\documentclass[../Main.tex]{subfiles}
\graphicspath{{\subfix{Assets/img/}}}
\begin{document}

%I need to describe separating concerns, e.g.
% Begin by talking about goal, what does it mean? This might need some work prior to give more background.
As I am trying to separate a strategic concern
(the effect of a marginal treatment methodology)
from an operational concern
(the effect of a delay in closing enrollment),
I need to look at what confounds these effects and how they might be measured.
To start, I'll look at the data generating model, the values of interest,
and both the observed and unobserved confounders.
I'll also discuss how the data collected fits the data generating process.

The primary effects one might expect to see are that
\begin{enumerate}
    \item Adding more drugs to the market will make it harder to
        finish a trial, as it is
        more likely to be terminated due to concerns about profitability.
    \item Adding more drugs to the market
        will make it harder to recruit, slowing enrollment.
    \item Enrollment challenges (i.e. delays) increase the likelihood that
        a trial will terminate.
\end{enumerate}
Unfortunately, these causal effects are confounded in many different ways.
Figure \ref{FIG:CausalModel} contains a description of the causal model.
% The first issue is that the severity of the disease and the size of the population
% who has that disease affect the ease of enrolling participants.
% For example, a large population may make it easier to find enough participants
% to achieve the required statistical discrimination between
% control and treatment.
% Second, for some diseases there exists an endogenous dynamic
% between the treatments available for a disease and the
% market size/population with that disease.
% \authorcite{cerda_endogenousinnovationspharmaceutical_2007}
% proposes two mechanisms
% that link the drugs on the market and market size.
% The inverse is that for many chronic diseases with high mortality rates,
% more drugs cause better survivability, increasing the size of those markets.
% The third major confound is that the drugs on the market affect enrollment.
% If there is a treatment already on the market, patients or their doctors
% may be less inclined to participate in the trial, even if the current treatment
% has severe downsides.
%
% There are additional problems.
% One is that the disease being treated affects the
% safety and efficacy standards that the drug will be held to.
% For example, if a particular cancer is very deadly and does not respond well
% to current treatments, Phase I trials will enroll patients with that cancer,
% as opposed to the standard of enrolling healthy volunteers
% \cite{commissioner_drugdevelopmentprocess_2020}
% to establish safe dosages and (hopefully) obtain some effectiveness data.
% % The trial is more likely to be terminated early if the drug is unsafe or has no
% % discernible effect, therefore termination depends in part on a compound-disease
% % interaction.
% Another challenge comes from the interaction between duration and termination;
% in that if a trial terminates before closing enrollment for issues other
% than enrollment, then the enrollment will still be low.
% On the other hand, if enrollment is low, the trial might terminate.
% Thus it is impossible to tell if the low enrollment caused the termination
% or if the termination caused the low enrollment.
% Finally, while conducting a trial, the safety and efficacy of a drug are driven by
% fundamental pharmacokinetic properties of the compounds.
% These are only imperfectly measured both prior to and during any given trial.
% Previously measured safety and efficacy inform the decision to start the trial
% in the first place while currently observed safety and efficacy results
% help the sponsor judge whether to continue the trial.
% In contrast, the recruitment rate may depend on the previous results about safety
% and efficacy.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% \subsection{Data Summary}
% %% Describe data here
% Since Sep 27th, 2007 those who conduct clinical trials of FDA controlled
% drugs or devices on human subjects must register
% their trial at \url{ClinicalTrials.gov}
% (\cite{anderson_fdadrugapproval_2022}).
% This involves submitting information on the expected enrollment and duration of
% trials, drugs or devices that will be used, treatment protocols and study arms,
% as well as contact information for the trial sponsor and treatment sites.
%
% When starting a new trial, the required information must be submitted
% ``\dots not later than 21 calendar days after enrolling the first human subject\dots''.
% After the initial submission, the data is briefly reviewed for quality and
% then the trial record is published and the trial is assigned a
% National Clinical Trial (NCT) identifier
% (\cite{anderson_fdadrugapproval_2022}).
%
% Each trial's record is updated periodically, including a final update that must occur
% within a year of completing the primary objective, although exceptions are
% available for trials related to drug approvals or for trials with secondary
% objectives that require further observation\footnote{This rule came into effect in 2017}
% (\cite{anderson_fdadrugapproval_2022}).
% Other than the requirements for the first and last submissions, all other
% updates occur at the discretion of the trial sponsor.
% Because the ClinicalTrials.gov website serves as a central point of information
% on which trials are active or recruiting for a given condition or drug,
% most trials are updated multiple times during their progression.
%
% There are two primary ways to access data about clinical trials.
% The first is to search individual trials on ClinicalTrials.gov with a web browser.
% This web portal shows the current information about the trial and provides
% access to snapshots of previously submitted information.
% Together, these features fulfill most of the needs of those seeking
% to join a clinical trial.
% For this project I've been able to scrape these historical records to establish
% snapshots of the records provided.
% %include screenshots?
% The second way to access the data is through a normalized database set up by
% the
% \href{https://aact.ctti-clinicaltrials.org/}{Clinical Trials Transformation Initiative}
% called AACT. %TODO: Get CITATION
% The AACT database is available as a PostgreSQL database dump or set of
% flat-files.
% These dumps match a near-current version of the ClinicalTrials.gov database.
% This format is amenable to large scale analysis, but does not contain
% information about the past state of trials.
% I combined these two sources, using the AACT dataset to select
% trials of interest and then scraping \url{ClinicalTrials.gov} to get
% a timeline of each trial.
%
% %%%%%%%%%%%%%%%%%%%%%%%% Model Outline
%
% The way I use this data is to predict the final status of the trial
% from the snapshots that were taken, in effect asking:
% ``how does the probability of a termination change from the current state
% of the trial if X changes?''
%
%% Return to causal identification
\subsection{Causal Identification}

Because running experiments on companies running clinical trials is not going
to happen anytime soon, causal identification depends on using a
structural causal model.
Because the data generating process for the clinical trials records is rather
straightforward, this is an ideal place to use the do-calculus of
\authorcite{pearl_causalitymodelsreasoning_2009}.
This process involves describing the data generating process in the form of
a directed acyclic graph (DAG), where the nodes represent different variables
within the causal model and the directed edges (arrows) represent
assumptions about which variables influence which other variables.
There are a few algorithms that then tell the researcher which of the
relationships will be confounded and which ones can be statistically estimated,
and that provide some hypotheses that can be tested to ensure the model is
reasonably correct.
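
As a generic illustration of what such an estimable relationship looks like,
recall the standard backdoor adjustment.
The symbols here are placeholders introduced only for exposition and do not
refer to specific nodes of the model:
$X$ is a treatment, $Y$ an outcome, and $Z$ a set of observed variables that
satisfies the backdoor criterion relative to $(X, Y)$.
Under those assumptions, the interventional distribution can be written in terms
of observed quantities as
\[
    P\bigl(Y = y \mid \mathrm{do}(X = x)\bigr)
    = \sum_{z} P(Y = y \mid X = x, Z = z)\, P(Z = z).
\]
The adjustment only applies when every variable in $Z$ is observed;
if a backdoor path runs through unobserved variables, it must be blocked
at some observed node along that path instead.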

In \cref{FIG:CausalModel} I diagram the directed acyclic graph that describes
my proposed data generating process.
It revolves around the decisions made by the study sponsor,
who must decide whether to let a trial run to completion
or terminate the trial early.
While receiving updates regarding the status of the trial, they ask questions
such as:
\begin{itemize}
    \item Do I need to terminate the trial due to safety incidents?
    \item Does it appear that the drug is effective enough to achieve our
        goals, justifying continuing the trial?
    \item Are we recruiting enough participants to achieve the statistical
        results we need within the budget we have?
    \item Do the current market conditions and expectations about returns on
        investment justify the expenditures we are making?
\end{itemize}
When such issues arise, the study sponsor terminates the trial; otherwise
it continues to completion.
\begin{figure}[H] %use [H] to fix the figure here.
    \includegraphics[width=\textwidth]{../assets/img/CausalModel.drawio.png}
    \caption{Graphical Causal Model}
    % \small{Crimson boxes are the variables of interest,
    % white boxes are unobserved, while the gray boxes will be controlled for.}
    \label{FIG:CausalModel}
\end{figure}

The following is a quick summary of the nodes of the DAG,
which nodes are captured in the data,
the hypothesized relationships in the model,
and the proposed confounding pathways.
\begin{itemize}
    \item Items of Interest (Blue boxes and Arrow)
        \begin{enumerate}
            \item \texttt{Enrollment Level (Enrollment Status)}:
                While a trial will occasionally keep its enrollment numbers
                up to date, the only regular information received on enrollment
                is the enrollment status, i.e. whether the trial has finished
                recruiting or not.
            \item \texttt{Will it Terminate?}:
                This represents whether the trial was terminated or
                completed successfully.
            \item The effect of \texttt{Enrollment Status} on
                \texttt{Will it Terminate?}:
                How does changing the enrollment status affect the
                probability of termination?
        \end{enumerate}
    \item Observed values (Solid orange boxes)
        \begin{enumerate}
            \item \texttt{Condition}
                (Not drawn in DAG because it impacts everything):
                The underlying condition, classified by ICD-10 group.
                This impacts every other aspect of the model and is pulled from
                the AACT dataset.
            \item \texttt{Market Measures}:
                Various measures of the number of alternate drugs on the market.
                These are either the number of other drugs with the same active ingredient as the drug under trial
                (both generics and originators)
                or the number of drugs considered alternatives in various formularies published by the United States Pharmacopeia.
            \item \texttt{Population (market size)}:
                Multiple measures of the impact of the disease.
                These are measured by the disability-adjusted life year (DALY) cost of the disease and are
                separated by the impact on countries with
                High, High-Medium, Medium, Medium-Low, and Low
                Socio-Demographic Index (SDI) scores.
                This data comes from the Institute for Health Metrics and Evaluation's Global Burden of Disease study
                \cite{vos_globalburden369_2020}.
            \item \texttt{Elapsed Duration}:
                A normalized measure of the time elapsed in the trial.
                It is computed from the original estimate of the trial's primary completion date and the registered start date.
                I take the difference in days between these and compute the percentage of that time that has elapsed.
                This calculation is based on data from the snapshots and the
                AACT final results.
            \item \texttt{Decision to Proceed with Phase III}:
                Whether the compound's development has progressed to Phase III.
                This is included in the analysis by only including
                Phase III trials registered in the AACT dataset.
        \end{enumerate}
    \item Unobserved (Green Boxes with squiggle hatch marks)
        \begin{enumerate}
            \item \texttt{Fundamental Efficacy and Safety}:
                The underlying efficacy and safety of the compound.
                These cannot be observed, only estimated through scientific study.
            \item \texttt{Previously observed Efficacy and Safety}:
                The information gathered in previous studies.
                This is not available in my dataset because I don't
                have links to prior studies.
            \item \texttt{Currently observed Efficacy and Safety}:
                The information gathered during this study.
                This is only partially available, and so is
                treated as unavailable.
                After a study is over, the investigators
                often publish information about adverse events, but only
                those that meet a certain threshold.
                As this information doesn't appear to be provided to
                participants, we don't consider it.
        \end{enumerate}
    % \end{itemize}
    %
    % %
    %
    % \begin{itemize}
    % \item Relationships of interest
    % \begin{enumerate}
    % \item \texttt{Enrollment Status} $\rightarrow$ \texttt{Will Terminate?}:
    % This is the primary effect of interest.
    % \item \texttt{Market Measures} $\rightarrow$ \texttt{Will Terminate?}:
    % This is the secondary effect of interest.
    % \end{enumerate}
    \item Jointly determined variables
        \begin{enumerate}
            \item
                \texttt{Enrollment Level (Enrollment Status)}
                $\leftrightarrow$
                \texttt{Elapsed Duration}:
                Because I only observe enrollment status and have no good estimate of
                the enrollment process, there is a potential for confounding between
                the elapsed duration of a trial and the enrollment status.
                The proposed mechanisms operate through the partially observed levels of
                enrollment.
                First, as a trial progresses, the enrollment level should grow until
                it matches the planned enrollment and the trial ends.
                Thus under good circumstances, elapsed duration drives
                enrollment levels.
                Under bad circumstances though, low enrollment levels may cause the
                duration to extend, as study sponsors spend more resources
                to complete the trial successfully.
                This is an issue because the only complete measure of enrollment
                that I currently have is the enrollment status, and thus I cannot
                control for this effect.
            \item
                \texttt{Market Conditions}
                $\leftrightarrow$
                \texttt{Population}:
                There exists an endogenous dynamic
                between the treatments available for a disease and the
                market size/population with that disease.
                \authorcite{cerda_endogenousinnovationspharmaceutical_2007}
                proposes two mechanisms
                that link the drugs on the market and market size.
                The first is that a larger population increases the potential
                profitability of a treatment, encouraging firms to seek approval for more treatments.
                The second runs in the opposite direction: for many chronic diseases with high mortality rates,
                more drugs improve survivability, increasing the size of those markets.
        \end{enumerate}
    \item Confounding Pathways
        \begin{enumerate}
            \item
                \texttt{Condition} (Not drawn in figure \ref{FIG:CausalModel}):
                Interacts with everything.
            \item Backdoor Pathway
                between \texttt{Will Terminate?} and
                \texttt{Enrollment Status} through
                \texttt{Fundamental Efficacy and Safety}.
                The concern is that since previously learned information
                and current information are driven by the same underlying
                physical reality, the enrollment process and
                termination decisions may be correlated.
                Controlling for the decision to proceed with the trial is the
                best adjustment available to block this confounding pathway.
                Below I describe the exact pathway.
                \begin{enumerate}
                    \item
                        \texttt{Will Terminate?}
                        $\leftarrow$
                        \texttt{Currently Observed Efficacy and Safety}
                        $\leftarrow$
                        \texttt{Fundamental Efficacy and Safety}
                        $\rightarrow$
                        \texttt{Previously Observed Efficacy and Safety}
                        $\rightarrow$
                        \texttt{Is likely safe and effective (Decision to proceed with Phase III trial)}
                        $\rightarrow$
                        \texttt{Enrollment Process Parameters}
                        $\rightarrow$
                        \texttt{Enrollment Levels (Enrollment Status)}
                \end{enumerate}
            \item
                Backdoor Pathways through \texttt{Population} and
                \texttt{Market Conditions}.
                The concern with these pathways is that the rate of enrollment, and
                thus the enrollment status, is affected by the population with
                the disease and the market conditions.
                \begin{enumerate}
                    \item
                        \texttt{Will Terminate?}
                        $\leftarrow$
                        \texttt{Market Conditions}
                        $\rightarrow$
                        \texttt{Enrollment Process Parameters}
                        $\rightarrow$
                        \texttt{Enrollment Levels (Enrollment Status)}
                    \item
                        \texttt{Will Terminate?}
                        $\leftarrow$
                        \texttt{Market Conditions}
                        $\leftrightarrow$
                        \texttt{Population}
                        $\rightarrow$
                        \texttt{Enrollment Process Parameters}
                        $\rightarrow$
                        \texttt{Enrollment Levels (Enrollment Status)}
                \end{enumerate}
            \item Backdoor Pathway through
                \texttt{Elapsed Duration}.
                \begin{enumerate}
                    \item
                        \texttt{Will Terminate?}
                        $\leftarrow$
                        \texttt{Elapsed Duration}
                        $\leftrightarrow$
                        \texttt{Enrollment Levels (Enrollment Status)}
                \end{enumerate}
        \end{enumerate}
\end{itemize}
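
The \texttt{Elapsed Duration} measure summarized in the list above can be
written out explicitly.
The notation is introduced here only for illustration:
$t$ is the date of a snapshot, $t_{\mathrm{start}}$ is the registered start date,
and $t_{\mathrm{pc}}$ is the originally estimated primary completion date,
with all differences taken in days.
Under that notation, the normalized elapsed duration at a snapshot is
\[
    \mathrm{ElapsedDuration}_{t}
    = \frac{t - t_{\mathrm{start}}}{t_{\mathrm{pc}} - t_{\mathrm{start}}}.
\]
For example, a snapshot taken 100 days after the start of a trial that was
originally planned to reach primary completion after 400 days has an elapsed
duration of $100 / 400 = 0.25$, i.e. 25\% of the planned time.
Values above $1$ correspond to snapshots taken after the originally estimated
primary completion date.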
\end{document}