updated presentation and paper

claude_rewrite
Will King 3 years ago
parent f12268d33a
commit bd3fe80a5d

@ -47,8 +47,9 @@
\subfile{sections/05_LitReview} \subfile{sections/05_LitReview}
%--------------------------------------------------------------- %---------------------------------------------------------------
\section{Data}\label{SEC:Data} \section{Causal Story and Data}\label{SEC:Data}
%--------------------------------------------------------------- %---------------------------------------------------------------
\subfile{sections/10_CausalStory}
\subfile{sections/02_data} \subfile{sections/02_data}
%--------------------------------------------------------------- %---------------------------------------------------------------

@ -54,7 +54,7 @@ most trials are updated multiple times during their progression.
There are two primary ways to access data about clinical trials. There are two primary ways to access data about clinical trials.
The first is to search individual trials on ClinicalTrials.gov with a web browser. The first is to search individual trials on ClinicalTrials.gov with a web browser.
This web portal shows the current information about the trial and provides This web portal shows the current information about the trial and provides
access to snapshots of previous versions of the same information. access to snapshots of previously submitted information.
Together, these features fulfill most of the needs of those seeking Together, these features fulfill most of the needs of those seeking
to join a clinical trial. to join a clinical trial.
%include screenshots? %include screenshots?
@ -64,14 +64,13 @@ the
called AACT. %TODO: Get CITATION called AACT. %TODO: Get CITATION
The AACT database is available as a PostgreSQL database dump or set of pipe (``$\vert$'') The AACT database is available as a PostgreSQL database dump or set of pipe (``$\vert$'')
delimited files and matches the current version of the ClinicalTrials.gov database. delimited files and matches the current version of the ClinicalTrials.gov database.
This format is amenable to large scale analysis, but does not contain information about past This format is amenable to large scale analysis, but does not contain information about
state of trials. the past state of trials.
One of the main products of this research was the creation of a set of python scripts to I created a set of python scripts to
incorporate the historical data on clinical trials available through the web incorporate the historical data on clinical trials available through the web
portal and merge it into a local copy of the standard AACT database. portal and merge it into a local copy of the standard AACT database.
This novel dataset can be used to easily track changes across many trials, This novel dataset can be used to easily track changes as trials progress.
particularly in the areas of enrollment and expected duration.
%describe the data NCT, trial records, mesh_terms, etc %describe the data NCT, trial records, mesh_terms, etc
In this combined dataset of current and historical trial records, there are a few In this combined dataset of current and historical trial records, there are a few
@ -112,8 +111,8 @@ areas of particular interest.
\subsubsection{Drug Compounds and Structured Product Labels (SPLs)} \subsubsection{Drug Compounds and Structured Product Labels (SPLs)}
When a drug is licensed for sale in the U.S., it is not just the active When a drug is licensed for sale in the U.S., it is not just the active
ingredients that are licensed, but also the dosage. ingredients that are licensed, but also the dosage and route of administration.
Each of these combined dosage and compound pairs is assigned a unique Each of these compound/dosage/route combinations is assigned a unique
National Drug Code (NDC). National Drug Code (NDC).
%mention orange book %mention orange book
The list of approved NDCs is released regularly in the FDA's The list of approved NDCs is released regularly in the FDA's
@ -277,22 +276,6 @@ It is made available through an API hosted by the NLM.
One key feature is the ability to use a basic text search to find matching One key feature is the ability to use a basic text search to find matching
terms in various terminologies. terms in various terminologies.
In order to link clinical trials to standardized ICD-10 conditions and thus
to the Global Burdens of Disease Data, I wrote a python script to search the
UMLS system for ICD-10 codes that matched the MeSH descriptions for
each trial.
This search resulted in generally three categories of search results:
\begin{enumerate}
\item The results contained a few entries, one of which was obviously correct.
\item The results contained a large number of entries, a few of which were correct.
\item The results did not contain any matches.
\end{enumerate}
In these cases I needed a way to validate each match and potentially add my own
ICD-10 codes to each trial.
To this end I build a website that allows one to quickly review and edit these
records.
The effort to manually review this data is ongoing.
\subsection{Data Integration}\label{dataintegration} \subsection{Data Integration}\label{dataintegration}
@ -307,6 +290,7 @@ Below is more information about how the data was used in the analysis.
For clinical trials, I captured each update that occurred after the start date For clinical trials, I captured each update that occurred after the start date
and prior to the primary completion date of the trial. and prior to the primary completion date of the trial.
For clarity I will refer to these as a snapshot of the trial. For clarity I will refer to these as a snapshot of the trial.
For each snapshot I recorded the enrollment (actual or anticipated), For each snapshot I recorded the enrollment (actual or anticipated),
the date it was submitted, the planned primary completion date, the date it was submitted, the planned primary completion date,
and the trial's overall status at the time. and the trial's overall status at the time.
@ -332,11 +316,34 @@ matters it is only about $[0,3]$. %good to put a graph here
I also included the current status by encoding it to dummy parameters. I also included the current status by encoding it to dummy parameters.
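As a sketch of this encoding, using notation introduced here for illustration only (snapshot index $i$ and status categories $k = 1, \dots, K$, with one category assumed to be omitted as the reference level), each dummy parameter can be written as
\[
  d_{ik} = \left\{
  \begin{array}{ll}
    1 & \mbox{if snapshot } i \mbox{ has overall status } k, \\
    0 & \mbox{otherwise.}
  \end{array}
  \right.
\]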
%Describe linking drugs/getting number of brands %Describe linking drugs/getting number of brands
As a basic measure of market conditions I have gathered the number of brands As an initial measure of market conditions I have gathered the number of brands
that are producing drugs containing the compound(s) of interest in the trial. that are producing drugs containing the compound(s) of interest in the trial.
This was done by extracting the RxCUIs that represented the drugs of interest, This was done by extracting the RxCUIs that represented the drugs of interest,
then linking those to the RxCUIs that are brands containing those ingredients. then linking those to the RxCUIs that are brands containing those ingredients.
As a secondary measure of market conditions, I linked clinical trials to the
USP Drug Classification list.
Once I had linked the drugs used in a trial to the applicable USP DC category
and class, I could find the number of alternative brands in that class.
This matching was performed by hand, using a custom web interface to the database.
In order to link clinical trials to standardized ICD-10 conditions and thus
to the Global Burdens of Disease Data, I wrote a python script to search the
UMLS system for ICD-10 codes that matched the MeSH descriptions for
each trial.
This search generally produced three categories of results:
\begin{enumerate}
\item The results contained a few entries, one of which was obviously correct.
\item The results contained a large number of entries, a few of which were correct.
\item The results did not contain any matches.
\end{enumerate}
In these cases I needed a way to validate each match and potentially add my own
ICD-10 codes to each trial.
This matching was also performed by hand, using a separate custom web interface to the database.
The effort to manually match ICD-10 codes and USP DC categories and classes is ongoing.
%Describe linking icd10 codes to GBD %Describe linking icd10 codes to GBD
% Not every icd10 code maps, so some trials are excluded. % Not every icd10 code maps, so some trials are excluded.
%Describe categorizing icd10 codes %Describe categorizing icd10 codes
@ -348,4 +355,6 @@ To get the best estimate of the size of the population associated with a disease
each trial is linked to the most specific disease category applicable. each trial is linked to the most specific disease category applicable.
As not every ICD-10 code is linked to a condition in the GBD, those without any As not every ICD-10 code is linked to a condition in the GBD, those without any
applicable conditions are dropped from the dataset. applicable conditions are dropped from the dataset.
\end{document} \end{document}

@ -0,0 +1,82 @@
\documentclass[../Main.tex]{subfiles}
\graphicspath{{\subfix{Assets/img/}}}
\begin{document}
Because experimenting on the companies that run clinical trials is not
feasible, causal identification depends on specifying a
structural causal model.
In \cref{Fig:CausalModel} I diagram the directed acyclic graph that describes
the data generating model.
The proposed data generating model consists of a decision maker, the study
sponsor, who must decide whether to let a trial run to completion or terminate
the trial early.
While receiving updates regarding the status of the trial, they ask questions
such as:
\begin{itemize}
\item Do I need to terminate the trial due to safety incidents?
\item Does it appear that the drug is effective enough to achieve our
goals, justifying continuing the trial?
\item Are we recruiting enough participants to achieve the statistical
results we need?
\item Do the current market conditions and expectations about returns on
investment justify the expenditures we are making?
\end{itemize}
When appropriate, the study sponsor terminates the trial.
If there are not enough issues to terminate the trial, it continues until it
is completed.
While conducting a trial, the safety and efficacy of a drug are driven by
fundamental pharmacokinetic properties of the compounds.
These are only imperfectly measured both prior to and during any given trial.
Previously measured safety and efficacy inform the decision to start the trial
in the first place, while currently observed safety and efficacy results
help the sponsor judge whether or not to continue the trial.
Of course, these decisions are both affected by the specific condition being
treated due to differences in the severity of the symptoms.
Once a trial has been started, it comes time to recruit participants.
Participants frequently depend on the advice of their physician when deciding
whether to join a trial.
As these physicians have a duty to seek their patients' best interest, they, along
with their patients, will evaluate whether the previously observed safety and efficacy
results justify joining the trial over using current standard treatments.
Thus the current market conditions may affect the rate at which participants
enroll in the trial.
The enrollment of participants in a trial depends on a few other factors.
The condition or disease of interest and how it progresses will determine how long
recruitment is held open, as opposed to only observing the treatment arms.
Additionally, a trial that has already reached a high enough enrollment will often
close recruitment by switching to an ``Active, not recruiting'' stage to manage costs.
Finally, enrolling participants depends on how difficult it is to find people
who suffer from the condition of interest.
The preceding issue of population size also affects the number of alternatives available.
When fewer people are affected by the disease, the smaller market reduces
possible profitability, all else equal.
Thus the likelihood of companies paying the sunk costs to develop drugs for
these conditions may be lower.
Finally, the number of alternatives on the market may affect the return on
investment directly, causing a trial to terminate early if the return is
not high enough.
\begin{figure}[H] %use [H] to fix the figure here.
\includegraphics[width=\textwidth]{../assets/img/dagitty-model.jpg}
\caption{Causal Model}
\label{Fig:CausalModel}
\end{figure}
%
Using Judea Pearl's do-calculus, I can show that an adjustment
set consisting of the decision to conduct a phase III trial, the condition of interest,
the current status of the trial, and the population size causally
identifies the direct effects of enrollment and market alternatives on the
probability of termination.
This is verified through the backdoor criterion, which states that
if no member of the adjustment set is a descendant of the exposure and
every path between the exposure and outcome that starts with an arrow
flowing into the exposure is blocked by the adjustment set, then the effect
of the exposure on the outcome is causally identified
(\cite{pearl_causality_2000}).
This can be verified visually from the DAG in \cref{Fig:CausalModel}.
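As a sketch of what this adjustment implies, write $X$ for the exposure, $Y$ for the termination outcome, and $Z$ for the adjustment set (notation introduced here for illustration only).
Assuming $Z$ is discrete and satisfies the backdoor criterion relative to $(X, Y)$, the standard adjustment formula gives
\[
  P(Y = y \mid \mathrm{do}(X = x)) = \sum_{z} P(Y = y \mid X = x, Z = z) \, P(Z = z),
\]
so that the interventional distribution on the left is expressed entirely in terms of observational quantities.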
\end{document}

@ -152,7 +152,8 @@ Washington State University \\ % Your institution for the title page
Questions Questions
\begin{itemize} \begin{itemize}
\item How do the competitors on the market affect clinical trial completion? \item How do the competitors on the market affect clinical trial completion?
\item How is this effect moderated by the enrollment of participants? \item Can we tell how this effect is moderated by the enrollment of participants?
\item Is this effect consistent across different disease categories?
\end{itemize} \end{itemize}
\end{frame} \end{frame}
%------------------------------- %-------------------------------
@ -770,7 +771,7 @@ Washington State University \\ % Your institution for the title page
%TODO: Update %TODO: Update
\begin{itemize} \begin{itemize}
\item 6 chains \item 4 chains
\item 2,500 warm-up, 2,500 sampling runs \item 2,500 warm-up, 2,500 sampling runs
\item seed = 11021585 \item seed = 11021585
\end{itemize} \end{itemize}
