recording most recent updates

claude_rewrite
will king 1 year ago
parent 70c2eeb5f5
commit 0bea917eb3

@ -0,0 +1,18 @@
NEXT STEPS IN WRITING
- insert a description of the general approach I use:
- predicting, based on snapshots, the likelihood of termination.
- this needs to go between the description of the snapshots and the
causal inference introduction.
- Then I can use what I've written about the graph, and follow up with more information about the data.
Overall this would look like
- [x] Introduction of the question and general issues of confoundedness.
- [x] Clinical Trials Data Sources
- [x] Explain basic econometric modelling approach
- [ ] Then explain the graph, nodes, and confoundedness in more detail
- [ ] Then go over the rest of the data.
- [ ] Finally
- Discuss the number of datapoints.
- review major challenges to causal identification. (no enrollment model, small data size)

@ -10,9 +10,10 @@ and an operational concern
(the effect of a delay in closing enrollment), (the effect of a delay in closing enrollment),
we need to look at what confounds these effects and how we might measure them. we need to look at what confounds these effects and how we might measure them.
The primary effects one expects to see are that The primary effects one might expect to see are that
\begin{enumerate} \begin{enumerate}
\item Adding more drugs will make it harder to finish a trial as it is \item Adding more drugs to the market will make it harder to
finish a trial as it is
more likely to be terminated due to concerns about profitability. more likely to be terminated due to concerns about profitability.
\item Adding more drugs will make it harder to recruit, slowing enrollment. \item Adding more drugs will make it harder to recruit, slowing enrollment.
\item Enrollment challenges increase the likelihood that a trial will \item Enrollment challenges increase the likelihood that a trial will
@ -45,11 +46,11 @@ has severe downsides.
There are additional problems. There are additional problems.
One is in that the disease being treated affects the One is in that the disease being treated affects the
safety and efficacy profile that the drug will be held to. safety and efficacy standards that the drug will be held to.
For example, if a particular cancer is very deadly and does not respond well For example, if a particular cancer is very deadly and does not respond well
to current treatments, Phase I trials will enroll patients with that cancer, to current treatments, Phase I trials will enroll patients with that cancer,
as opposed to the standard of enrolling healthy volunteers as opposed to the standard of enrolling healthy volunteers
\cite{commissioner_DrugDevelopment_2020}. \cite{commissioner_DrugDevelopment_2020} to establish safe dosages.
The trial is more likely to be terminated early if the drug is unsafe or has no The trial is more likely to be terminated early if the drug is unsafe or has no
discernible effect, therefore termination depends in part on a compound-disease discernible effect, therefore termination depends in part on a compound-disease
interaction. interaction.
@ -67,9 +68,70 @@ Previously measured safety and efficacy inform the decision to start the trial
in the first place while currently observed safety and efficiency results in the first place while currently observed safety and efficiency results
help the sponsor judge whether or not to continue the trial. help the sponsor judge whether or not to continue the trial.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Clinical Trials Data Sources}
%% Describe data here
Since September 27, 2007, those who conduct clinical trials of FDA-controlled
drugs or devices on human subjects must register
their trial at \url{ClinicalTrials.gov}
(\cite{noauthor_fdaaa_nodate}).
This involves submitting information on the expected enrollment and duration of
trials, drugs or devices that will be used, treatment protocols and study arms,
as well as contact information for the trial sponsor and treatment sites.
When starting a new trial, the required information must be submitted
``\dots not later than 21 calendar days after enrolling the first human subject\dots''.
After the initial submission, the data is briefly reviewed for quality and
then the trial record is published and the trial is assigned a
National Clinical Trial (NCT) identifier
\cite{noauthor_fdaaa_nodate}.
Each trial's record is updated periodically, including a final update that must occur
within a year of completing the primary objective, although exceptions are
available for trials related to drug approvals or for trials with secondary
objectives that require further observation\footnote{This rule came into effect in 2017}
\cite{noauthor_fdaaa_nodate}.
Other than the requirements for the first and last submissions, all other
updates occur at the discretion of the trial sponsor.
Because the ClinicalTrials.gov website serves as a central point of information
on which trials are active or recruiting for a given condition or drug,
most trials are updated multiple times during their progression.
There are two primary ways to access data about clinical trials.
The first is to search individual trials on ClinicalTrials.gov with a web browser.
This web portal shows the current information about the trial and provides
access to snapshots of previously submitted information.
Together, these features fulfill most of the needs of those seeking
to join a clinical trial.
For this project I've been able to scrape these historical records to establish
snapshots of the records provided.
%include screenshots?
The second way to access the data is through a normalized database set up by
the
\href{https://aact.ctti-clinicaltrials.org/}{Clinical Trials Transformation Initiative}
called AACT. %TODO: Get CITATION
The AACT database is available as a PostgreSQL database dump or set of
flat-files.
These dumps match a near-current version of the ClinicalTrials.gov database.
This format is amenable to large-scale analysis, but does not contain
information about the past state of trials.
I combined these two sources, using the AACT dataset to select
trials of interest and then scraping \url{ClinicalTrials.gov} to get
a timeline of each trial.
%%%%%%%%%%%%%%%%%%%%%%%% Model Outline
The way I use this data is to predict the final status of the trial
from the snapshots that were taken, in effect asking:
``how does the probability of a termination change from the current state
of the trial if X changes?''
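One way to make this question concrete, as a sketch only (the symbols below
are illustrative notation and are not part of the model specification in this
draft):
let $T_i \in \{0, 1\}$ indicate that trial $i$ ends as \textit{terminated}
rather than \textit{completed},
let $X_{i,t}$ be the feature of interest recorded in the snapshot of trial $i$
taken at time $t$,
and let $Z_{i,t}$ be everything else recorded in that snapshot.
The question then concerns the contrast
\[
    \Pr(T_i = 1 \mid X_{i,t} = x', Z_{i,t} = z)
    - \Pr(T_i = 1 \mid X_{i,t} = x, Z_{i,t} = z),
\]
that is, how the predicted probability of termination moves when $X$ changes
from $x$ to $x'$ and the rest of the snapshot is held fixed.
On its own this is a predictive contrast; whether it can be read causally
depends on the identification argument.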
%% Return to causal identification
\subsection{Causal Identification}
Because running experiments on companies running clinical trials is not going Because running experiments on companies running clinical trials is not going
to happen anytime soon, causal identification depends on using an observational to happen anytime soon, causal identification depends on using a
approach and a structural causal model. structural causal model.
Because the data generating process for the clinical trials records is rather Because the data generating process for the clinical trials records is rather
straightforward, this is an ideal place to use straightforward, this is an ideal place to use
\authorcite{pearl_causality_2000} \authorcite{pearl_causality_2000}
@ -84,7 +146,6 @@ and provides some hypotheses that can be tested to ensure the model is
reasonably correct. reasonably correct.
In \cref{Fig:CausalModel} I diagram the directed acyclic graph that describes In \cref{Fig:CausalModel} I diagram the directed acyclic graph that describes
my proposed data generating process, my proposed data generating process,
It revolves around the decisions made by the study sponsor, It revolves around the decisions made by the study sponsor,
@ -130,6 +191,7 @@ A quick summary of the nodes of the DAG, the exact representation in the data, a
\begin{enumerate} \begin{enumerate}
\item \texttt{Will Terminate?}: \item \texttt{Will Terminate?}:
If the final status of the trial was \textit{terminated} If the final status of the trial was \textit{terminated}
and comes from the AACT dataset.
or \textit{completed}. or \textit{completed}.
\item \texttt{Enrollment Status}: \item \texttt{Enrollment Status}:
This describes the current enrollment status of the snapshot, e.g. This describes the current enrollment status of the snapshot, e.g.
@ -147,19 +209,25 @@ A quick summary of the nodes of the DAG, the exact representation in the data, a
\begin{enumerate} \begin{enumerate}
\item \texttt{Condition}: \item \texttt{Condition}:
The underlying condition, classified by ICD-10 group. The underlying condition, classified by ICD-10 group.
This impacts every other aspect of the model. This impacts every other aspect of the model and is pulled from
the AACT dataset.
\item \texttt{Population (market size)}: \item \texttt{Population (market size)}:
Multiple measures of the impact of the disease. Multiple measures of the impact of the disease.
These are measured by the DALY cost of the disease in countries that have a These are measured by the DALY cost of the disease, and is
High, High-Medium, Medium, Medium-Low, and Low development scores. separated by the impact on countries with
High, High-Medium, Medium, Medium-Low, and Low
development scores.
This data comes from the Institute for Health Metrics' Global Burden of Disease study. This data comes from the Institute for Health Metrics' Global Burden of Disease study.
\item \texttt{Elapsed Duration}: \item \texttt{Elapsed Duration}:
A normalized measure of the time elapsed in the trial. A normalized measure of the time elapsed in the trial.
Comes from the original estimate of the trial's primary completion date and the registered start date. Comes from the original estimate of the trial's primary completion date and the registered start date.
I take the difference in days between these, and get the percentage of that time that has elapsed. I take the difference in days between these, and get the percentage of that time that has elapsed.
This calculation is based on data from the snapshots and the
AACT final results.
\item \texttt{Decision to Proceed with Phase III}: \item \texttt{Decision to Proceed with Phase III}:
If the compound development has progressed to Phase III. If the compound development has progressed to Phase III.
This is included in the analysis by only including Phase III trials. This is included in the analysis by only including
Phase III trials registered in the AACT dataset.
\end{enumerate} \end{enumerate}
\item Unobserved Confounders (White Boxes) \item Unobserved Confounders (White Boxes)
\begin{enumerate} \begin{enumerate}
@ -168,14 +236,22 @@ A quick summary of the nodes of the DAG, the exact representation in the data, a
Cannot be observed, only estimated through scientific study. Cannot be observed, only estimated through scientific study.
\item \texttt{Previously observed Efficacy and Safety}: \item \texttt{Previously observed Efficacy and Safety}:
The information gathered in previous studies. The information gathered in previous studies.
This is not available in my dataset because I don't have links to prior studies. This is not available in my dataset because I don't
have links to prior studies.
\item \texttt{Currently observed Efficiency and Safety}: \item \texttt{Currently observed Efficiency and Safety}:
The information gathered during this study. The information gathered during this study.
This is only partially available, and so is treated as unavailable. This is only partially available, and so is
After a study is over, the investigators are supposed to publish information about adverse events. treated as unavailable.
After a study is over, the investigators
often publish information about adverse events, but only
those that meet a certain threshold.
As this information doesn't appear to be provided to
participants, we don't consider it.
\end{enumerate} \end{enumerate}
\end{itemize} \end{itemize}
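% Sketch of the Elapsed Duration calculation described in the list above.
% The symbols are illustrative and do not appear elsewhere in the draft;
% using the snapshot date as the reference point is an assumption.
As a sketch, the \texttt{Elapsed Duration} measure can be written as follows.
Let $d^{\mathrm{start}}_i$ be the registered start date of trial $i$,
$d^{\mathrm{pc}}_i$ the originally estimated primary completion date,
and $d_{i,t}$ the date of snapshot $t$, all measured in days.
Then
\[
    \mathrm{Elapsed}_{i,t}
    = \frac{d_{i,t} - d^{\mathrm{start}}_i}
           {d^{\mathrm{pc}}_i - d^{\mathrm{start}}_i},
\]
which is a fraction of the originally planned duration
(multiply by 100 for a percentage).
Under this definition the value would exceed 1 for a snapshot taken after the
originally estimated primary completion date.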
%
\begin{itemize} \begin{itemize}
\item Relationships of interest \item Relationships of interest
\begin{enumerate} \begin{enumerate}
