\documentclass[../Main.tex]{subfiles}
\graphicspath{{\subfix{Assets/img/}}}

\begin{document}
% Begin by talking about goal, what does it mean? This might need some work prior to give more background.
Because I am trying to separate a strategic concern
(the effect of a marginal treatment methodology)
from an operational concern
(the effect of a delay in closing enrollment),
I need to identify what confounds these effects and how they might be measured.

The primary effects one might expect to see are:
\begin{enumerate}
    \item Adding more drugs to the market will make it harder to
    finish a trial, because the trial is
    more likely to be terminated over concerns about profitability.
    \item Adding more drugs will make it harder to recruit, slowing enrollment.
    \item Enrollment challenges increase the likelihood that a trial will
    terminate.
    % Mentioned below
    % \item A large population/market will tend to have more drugs to treat it
    % because it is more profitable.
    % \item A large population/market will make it easier to recruit,
    % reducing the likelihood of a termination due to enrollment failure.
\end{enumerate}

There are a few fundamental issues that arise when trying to estimate
these effects.
The first is that the severity of the disease and the size of the population
that has the disease affect the ease of enrolling participants.
For example, a large population may make it easier to find enough participants
to achieve the statistical power required to distinguish
treatment from control.
Second, for some diseases there is an endogenous relationship
between the treatments available for a disease and the
market size (the population with that disease).
\authorcite{cerda_EndogenousInnovations_2007} proposes two mechanisms
that link the drugs on the market and market size.
The first is that larger markets are more profitable and so attract more drugs.
The second, the inverse, is that for many chronic diseases with high mortality rates,
more drugs improve survival, increasing the size of those markets.
The third major confound is that the drugs on the market affect enrollment.
If there is a treatment already on the market, patients or their doctors
may be less inclined to participate in the trial, even if the current treatment
has severe downsides.

There are additional problems.
One is that the disease being treated affects the
safety and efficacy standards to which the drug will be held.
For example, if a particular cancer is very deadly and does not respond well
to current treatments, Phase I trials will enroll patients with that cancer
rather than following the standard practice of enrolling healthy volunteers
to establish safe dosages \cite{commissioner_DrugDevelopment_2020}.
Because a trial is more likely to be terminated early if the drug is unsafe or has no
discernible effect, termination depends in part on the interaction between
compound and disease.
Another challenge comes from the interaction between duration and termination:
if a trial terminates before closing enrollment for reasons other
than enrollment, its recorded enrollment will still be low.
On the other hand, if enrollment is low, the trial might terminate.
These two outcomes are indistinguishable in the final
\url{ClinicalTrials.gov} dataset.

Finally, the safety and efficacy a drug shows during a trial are driven by
the fundamental pharmacokinetic properties of the compound.
These are only imperfectly measured both prior to and during any given trial.
Previously measured safety and efficacy inform the decision to start the trial
in the first place, while currently observed safety and efficacy results
help the sponsor judge whether to continue the trial.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Data Summary}
%% Describe data here
Since September 27, 2007, those who conduct clinical trials of FDA-regulated
drugs or devices on human subjects must register
their trials at \url{ClinicalTrials.gov}
\cite{noauthor_fdaaa_nodate}.
This involves submitting information on the expected enrollment and duration of
the trial, the drugs or devices that will be used, treatment protocols and study arms,
as well as contact information for the trial sponsor and treatment sites.

When starting a new trial, the required information must be submitted
``\dots not later than 21 calendar days after enrolling the first human subject\dots''.
After the initial submission, the data is briefly reviewed for quality,
then the trial record is published and the trial is assigned a
National Clinical Trial (NCT) identifier
\cite{noauthor_fdaaa_nodate}.

Each trial's record is updated periodically, including a final update that must occur
within a year of completing the primary objective, although exceptions are
available for trials related to drug approvals or for trials with secondary
objectives that require further observation\footnote{This rule came into effect in 2017.}
\cite{noauthor_fdaaa_nodate}.
Other than the requirements for the first and last submissions, all other
updates occur at the discretion of the trial sponsor.
Because the ClinicalTrials.gov website serves as a central point of information
on which trials are active or recruiting for a given condition or drug,
most trials are updated multiple times over their course.

There are two primary ways to access data about clinical trials.
The first is to search for individual trials on ClinicalTrials.gov with a web browser.
This web portal shows the current information about each trial and provides
access to snapshots of previously submitted information.
Together, these features fulfill most of the needs of those seeking
to join a clinical trial.
For this project, I scraped these historical records to build
a series of snapshots of each trial's record.
%include screenshots?
The second way to access the data is through AACT, a normalized database set up by
the
\href{https://aact.ctti-clinicaltrials.org/}{Clinical Trials Transformation Initiative}.
%TODO: Get CITATION
The AACT database is available as a PostgreSQL database dump or as a set of
flat files.
These dumps reflect a near-current version of the ClinicalTrials.gov database.
This format is amenable to large-scale analysis but does not contain
information about the past state of trials.
I combined these two sources, using the AACT dataset to select
trials of interest and then scraping \url{ClinicalTrials.gov} to get
a timeline of each trial.
%%%%%%%%%%%%%%%%%%%%%%%% Model Outline

I use this data to predict the final status of each trial
from the snapshots of its record, in effect asking:
``How does the probability of termination, given the current state
of the trial, change if X changes?''
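
As a minimal sketch of this question in the do-notation of \authorcite{pearl_causality_2000},
let $T \in \{0, 1\}$ indicate that a trial is eventually terminated,
let $X$ be the variable of interest (for example, \texttt{Enrollment Status} or one of the \texttt{Market Measures}),
and let $S$ collect the remaining information in a snapshot;
these symbols are introduced here for illustration only.
The question then concerns how the interventional probability of termination shifts when $X$ is set to $x'$ instead of $x$:
\[
    \Delta(x, x') = P\bigl(T = 1 \mid \mathrm{do}(X = x'), S\bigr) - P\bigl(T = 1 \mid \mathrm{do}(X = x), S\bigr).
\]
Whether this interventional quantity can be recovered from observed conditional probabilities is the identification question taken up in the next subsection.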
%% Return to causal identification
\subsection{Causal Identification}

Because running experiments on the companies that conduct clinical trials is not
feasible, causal identification depends on a
structural causal model.
Since the data-generating process for clinical trial records is relatively
straightforward, this is an ideal setting for the do-calculus of
\authorcite{pearl_causality_2000}.
This approach describes the data-generating process in the form of
a directed acyclic graph (DAG), in which the nodes represent variables
in the causal model and the directed edges (arrows) represent
assumptions about which variables directly influence which others.
Given such a graph, standard algorithms tell the researcher which
relationships are confounded, which can be estimated from observational data,
and which testable implications can be used to check that the model is
reasonably correct.
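
For reference, the identification result these algorithms most often deliver is the backdoor adjustment formula of \authorcite{pearl_causality_2000}.
In generic notation ($X$ the treatment, $Y$ the outcome, and $Z$ a set of observed covariates, all placeholders), if no variable in $Z$ is a descendant of $X$ and $Z$ blocks every backdoor path from $X$ to $Y$, then
\[
    P\bigl(Y = y \mid \mathrm{do}(X = x)\bigr) = \sum_{z} P(Y = y \mid X = x, Z = z)\, P(Z = z),
\]
with the sum replaced by an integral for continuous covariates.
The role of the graph is therefore to determine which observed variables must enter $Z$.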

In \cref{Fig:CausalModel} I diagram the directed acyclic graph that describes
my proposed data-generating process.
It revolves around the decisions made by the study sponsor,
who must decide whether to let a trial run to completion
or terminate it early.
As updates on the status of the trial arrive, the sponsor asks questions
such as:
\begin{itemize}
    \item Do I need to terminate the trial due to safety incidents?
    \item Does it appear that the drug is effective enough to achieve our
    goals, justifying continuing the trial?
    \item Are we recruiting enough participants to achieve the statistical
    results we need within the budget we have?
    \item Do the current market conditions and expectations about returns on
    investment justify the expenditures we are making?
\end{itemize}
If any of these issues arise, the study sponsor terminates the trial; otherwise,
it continues to completion.
\begin{figure}[H] %use [H] to fix the figure here.
    \frame{
        \scalebox{0.65}{
            \tikzfig{../assets/tikzit/CausalGraph2}
        }
    }
    \todo{check if this is the correct graph}
    \caption{Graphical Causal Model}

    % \small{Crimson boxes are the variables of interest,
    % white boxes are unobserved, while the gray boxes will be controlled for.}
    \label{Fig:CausalModel}
\end{figure}

% Constructing the model more explicitly
% - quickly describe each node and line.
\todo{I think I need to blend the data section in before this, to give some overall information on data.}
\todo{I may need to add some information on snapshots so that this makes sense.}

A summary of the nodes of the DAG, how each is represented in the data, and its role in the model:
\begin{itemize}
    \item Main Interests (Crimson Boxes)
    \begin{enumerate}
        \item \texttt{Will Terminate?}:
        Whether the final status of the trial was \textit{terminated}
        or \textit{completed}.
        This comes from the AACT dataset.
        \item \texttt{Enrollment Status}:
        The enrollment status recorded in the snapshot, e.g.,
        \texttt{Recruiting},
        \texttt{Enrolling by invitation},
        or
        \texttt{Active, not recruiting}.
        \item \texttt{Market Measures}:
        Various measures of the number of alternative drugs on the market.
        These are either the number of other drugs with the same active ingredient as the drug under trial
        (both generics and originators)
        or the number of drugs considered alternatives in the formularies published by the United States Pharmacopeia.
    \end{enumerate}
    \item Observed Confounders (Gray Boxes)
    \begin{enumerate}
        \item \texttt{Condition}:
        The underlying condition, classified by ICD-10 group.
        This affects every other node in the model and is pulled from
        the AACT dataset.
        \item \texttt{Population (market size)}:
        Multiple measures of the burden of the disease.
        These are measured in disability-adjusted life years (DALYs) lost to the disease and are
        separated by the burden in countries with
        High, High-Medium, Medium, Medium-Low, and Low
        development scores.
        This data comes from the Institute for Health Metrics and Evaluation's Global Burden of Disease study.
        \item \texttt{Elapsed Duration}:
        A normalized measure of the time elapsed in the trial.
        I take the difference in days between the registered start date and the originally estimated primary completion date,
        and compute the percentage of that planned duration that has elapsed at the time of the snapshot.
        This calculation is based on data from the snapshots and the
        AACT final results.
        \item \texttt{Decision to Proceed with Phase III}:
        Whether development of the compound has progressed to Phase III.
        This is included in the analysis by including only
        Phase III trials registered in the AACT dataset.
    \end{enumerate}
    \item Unobserved Confounders (White Boxes)
    \begin{enumerate}
        \item \texttt{Fundamental Efficacy and Safety}:
        The underlying safety and efficacy of the compound.
        These cannot be observed directly, only estimated through scientific study.
        \item \texttt{Previously Observed Efficacy and Safety}:
        The information gathered in previous studies.
        This is not available in my dataset because I don't
        have links to prior studies.
        \item \texttt{Currently Observed Efficacy and Safety}:
        The information gathered during this study.
        This is only partially available, and so is
        treated as unavailable.
        After a study is over, the investigators
        often publish information about adverse events, but only
        those that meet a certain threshold.
        As this information doesn't appear to be provided to
        participants, we don't consider it.
    \end{enumerate}
\end{itemize}
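
The \texttt{Elapsed Duration} calculation described above can be sketched as follows, assuming the percentage is computed as of the date each snapshot was recorded; the symbols are introduced here for illustration only.
Let $t_0$ be the registered start date, $t_c$ the originally estimated primary completion date, and $t_s$ the snapshot date, all measured in days, and let $d_s$ denote the elapsed duration at that snapshot. Then
\[
    d_s = 100 \times \frac{t_s - t_0}{t_c - t_0}.
\]
For example, a snapshot taken 180 days into a trial originally planned to run 720 days gives $d_s = 100 \times 180/720 = 25$, i.e., 25\% of the planned duration.
Because $t_c$ is the original estimate, $d_s$ can exceed 100 when a trial runs past its planned primary completion date.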

%

\begin{itemize}
    \item Relationships of interest
    \begin{enumerate}
        \item \texttt{Enrollment Status} $\rightarrow$ \texttt{Will Terminate?}:
        This is the primary effect of interest.
        \item \texttt{Market Measures} $\rightarrow$ \texttt{Will Terminate?}:
        This is the secondary effect of interest.
    \end{enumerate}
    \item Confounding Pathways
    \begin{enumerate}
        \item
        \texttt{Condition}:
        Affects every other node.
        Part of the Adjustment Set.
        \item Backdoor Pathway
        between \texttt{Will Terminate?} and
        \texttt{Enrollment Status} through safety and efficacy.
        The concern is that since previously learned information
        and current information are driven by the same underlying
        physical reality, the enrollment process and
        termination decisions may be correlated.
        Controlling for the decision to proceed with Phase III is the
        best adjustment available to block this confounding pathway.
        Below I describe the exact pathways.
        \begin{enumerate}
            \item
            \texttt{Fundamental Efficacy and Safety}
            $\rightarrow$
            \texttt{Currently Observed Efficacy and Safety}:
            This relationship represents the measurements of
            safety and efficacy in the current trial.
            \item
            \texttt{Currently Observed Efficacy and Safety}
            $\rightarrow$
            \texttt{Will Terminate?}:
            This is how the measurements of safety and efficacy in the
            current trial affect the probability of termination.
            % typically, evidence of a lack of safety or efficacy is
            % enough to terminate the trial.
            \item \texttt{Fundamental Efficacy and Safety}
            $\rightarrow$
            \texttt{Previously Observed Efficacy and Safety}:
            This relationship represents the measurements of
            safety and efficacy in work prior to the current trial.
            \item
            \texttt{Previously Observed Efficacy and Safety}
            $\rightarrow$
            \texttt{Decision to Proceed with Phase III}:
            Previously observed data is essential to the FDA's
            decision to allow a Phase III trial.
        \end{enumerate}
        \item
        Backdoor Pathway from \texttt{Market Measures}
        to \texttt{Enrollment Status}
        through \texttt{Population}.
        The concern with this pathway is that the rate of enrollment, and
        thus the enrollment status, is affected by the population with
        the disease.
        Additionally, there is a concern that the number of competitors
        is driven by the total market size.
        Thus, adding \texttt{Population} to the adjustment set is necessary.
        \begin{enumerate}
            \item
            \texttt{Population}
            $\rightarrow$
            \texttt{Enrollment Status}:
            This is fairly straightforward.
            How easy it is to enroll participants depends in part
            on how many people have the disease.
            \item
            \texttt{Population}
            $\rightarrow$
            \texttt{Market Measures}:
            This assumes that the population effect flows in only one
            direction, i.e., that a larger population increases
            the likelihood of a larger number of drugs.
            %TODO: Think about this one a bit because it does mess
            % with identification, particularly of market effects.
            % these two are jointly determined per cerda 2007.
            % If I can't justify separating them, then I'll need to
            % merge population (market size) and market measures (drugs on market).
        \end{enumerate}
        \item
        \texttt{Market Measures}
        $\rightarrow$
        \texttt{Enrollment Status}:
        This confounds the estimation of the effect of
        \texttt{Enrollment Status} on \texttt{Will Terminate?}, and
        so \texttt{Market Measures} is part of the adjustment set.
        \item
        \texttt{Market Measures}
        $\rightarrow$
        \texttt{Decision to Proceed with Phase III}:
        The alternative treatments on the market will affect a sponsor's
        decision to move forward with a Phase III trial.
        This is controlled for by only working with trials that
        successfully begin recruitment for a Phase III trial.
        \item
        \texttt{Elapsed Duration}
        $\rightarrow$
        \texttt{Will Terminate?}:
        The amount of time that has passed helps drive the decision to continue
        or terminate.
        \item
        \texttt{Enrollment Status}
        $\leftrightarrow$
        \texttt{Elapsed Duration}:
        % This is jointly determined. and the weakest part of the causal identification without an accurate model of enrollment.
        This is one of the weakest parts of the causal inference.
        Without a well-defined model of enrollment, we cannot separate
        the effects of the enrollment status and the elapsed
        duration, which are jointly determined.
        For example, if enrollment is running slower than expected,
        the trial may be terminated due to concerns that it will not
        achieve the primary objectives or that costs will exceed
        the budget allocated to the project.
        \item
        \texttt{Decision to Proceed with Phase III}
        $\rightarrow$
        \texttt{Will Terminate?}:
        %obviously required. Maybe remove from listing and graph?
        This effect is fairly straightforward:
        a trial that never starts can be neither terminated nor completed.
        It is included to block a backdoor pathway between
        \texttt{Will Terminate?} and \texttt{Enrollment Status}
        through \texttt{Previously Observed Efficacy and Safety}.
    \end{enumerate}
\end{itemize}
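
Combining the pathways above, the primary effect can be sketched as an instance of the backdoor adjustment formula under two stated assumptions:
that the adjustment set consists of \texttt{Condition}, \texttt{Population}, \texttt{Market Measures}, and \texttt{Elapsed Duration},
and that the link between \texttt{Enrollment Status} and \texttt{Elapsed Duration} reflects a shared cause rather than a directed effect, which the discussion above flags as the weakest assumption.
Writing $T$ for \texttt{Will Terminate?}, $E$ for \texttt{Enrollment Status}, $C$ for \texttt{Condition}, $N$ for \texttt{Population}, $M$ for \texttt{Market Measures}, and $D$ for \texttt{Elapsed Duration} (symbols chosen here for illustration), with every probability computed over the Phase III sample,
\[
    P\bigl(T = 1 \mid \mathrm{do}(E = e)\bigr)
    = \sum_{c,\,n,\,m,\,d} P(T = 1 \mid E = e, C = c, N = n, M = m, D = d)\,
      P(C = c, N = n, M = m, D = d).
\]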

\end{document}