diff --git a/Latex/Paper/Main.tex b/Latex/Paper/Main.tex index 2e46390..8fec9a9 100644 --- a/Latex/Paper/Main.tex +++ b/Latex/Paper/Main.tex @@ -47,8 +47,9 @@ \subfile{sections/05_LitReview} %--------------------------------------------------------------- -\section{Data}\label{SEC:Data} +\section{Causal Story and Data}\label{SEC:Data} %--------------------------------------------------------------- +\subfile{sections/10_CausalStory} \subfile{sections/02_data} %--------------------------------------------------------------- diff --git a/Latex/Paper/sections/02_data.tex b/Latex/Paper/sections/02_data.tex index 1bf055c..9146418 100644 --- a/Latex/Paper/sections/02_data.tex +++ b/Latex/Paper/sections/02_data.tex @@ -54,7 +54,7 @@ most trials are updated multiple times during their progression. There are two primary ways to access data about clinical trials. The first is to search individual trials on ClinicalTrials.gov with a web browser. This web portal shows the current information about the trial and provides -access to snapshots of previous versions of the same information. +access to snapshots of previously submitted information. Together, these features fulfill most of the needs of those seeking to join a clinical trial. %include screenshots? @@ -64,14 +64,13 @@ the called AACT. %TODO: Get CITATION The AACT database is available as a PostgreSQL database dump or set of pipe (``$\vert$'') delimited files and matches the current version of the ClinicalTrials.gov database. -This format is ameniable to large scale analysis, but does not contain information about past -state of trials. +This format is amenable to large-scale analysis, but does not contain information about +the past state of trials. -One of the main products of this research was the creation of a set of python scripts to +I created a set of Python scripts to incorporate the historical data on clinical trials available through the web portal and merge it into a local copy of the standard AACT database. 
-This novel dataset can be used to easily track changes across many trials, -particularly in the areas of enrollment and expected duration. +This novel dataset can be used to easily track changes as trials progress. %describe the data NCT, trial records, mesh_terms, etc In this combined dataset of current and historical trial records, there are a few @@ -112,8 +111,8 @@ areas of particular interest. \subsubsection{Drug Compounds and Structured Product Labels (SPLs)} When a drug is licensed for sale in the U.S., it is not just the active -ingredients that are licensed, but also the dosage. -Each of these combined dosage and compound pairs is assigned a unique +ingredients that are licensed, but also the dosage and route of administration. +Each of these compound/dosage/route combinations is assigned a unique National Drug Code (NDC). %mention orange book The list of approved NDCs are released regularly in the FDA's @@ -277,22 +276,6 @@ It is made available through an API hosted by the NLM. One key feature is the ability to use a basic text search to find matching terms in various terminologies. -In order to link clinical trials to standardized ICD-10 conditions and thus -to the Global Burdens of Disease Data, I wrote a python script to search the -UMLS system for ICD-10 codes that matched the MeSH descriptions for -each trial. -This search resulted in generally three categories of search results: -\begin{enumerate} - \item The results contained a few entries, one of which was obviously correct. - \item The results contained a large number of entries, a few of which were correct. - \item The results did not contain any matches. -\end{enumerate} -In these cases I needed a way to validate each match and potentially add my own -ICD-10 codes to each trial. -To this end I build a website that allows one to quickly review and edit these -records. - -The effort to manually review this data is ongoing. 
\subsection{Data Integration}\label{dataintegration} @@ -307,6 +290,7 @@ Below is more information about how the data was used in the analysis. For clinical trials, I captured each update that occured after the start date and prior to the primary completion date of the trial. For clarity I will refer to these as a snapshot of the trial. + For each snapshot I recorded the enrollment (actual or anticipated), the date the it was submitted, the planned primary completion date, and the trial's overall status at the time. @@ -332,11 +316,34 @@ matters it is only about $[0,3]$. %good to put a graph here I also included the current status by encoding it to dummy parameters. %Describe linking drugs/getting number of brands -As a basic measure of market conditions I have gathered the number of brands +As an initial measure of market conditions I have gathered the number of brands that are producing drugs containing the compound(s) of interest in the trial. This was done by extracting the RxCUIs that represented the drugs of interest, then linking those to the RxCUIs that are brands containing those ingredients. + +As a secondary measure of market conditions, I linked clinical trials to the +USP Drug Classification list. +Once I had linked the drugs used in a trial to the applicable USP DC category +and class, I could find the number of alternative brands in that class. +This matching was performed by hand, using a custom web interface to the database. + +In order to link clinical trials to standardized ICD-10 conditions and thus +to the Global Burdens of Disease Data, I wrote a Python script to search the +UMLS system for ICD-10 codes that matched the MeSH descriptions for +each trial. +This search generally produced three categories of results: +\begin{enumerate} + \item The results contained a few entries, one of which was obviously correct. + \item The results contained a large number of entries, a few of which were correct. 
+ \item The results did not contain any matches. +\end{enumerate} +In these cases I needed a way to validate each match and potentially add my own +ICD-10 codes to each trial. +This matching was also performed by hand, using a separate custom web interface to the database. + +The effort to manually match ICD-10 codes and USP DC categories and classes is ongoing. + %Describe linking icd10 codes to GBD % Not every icd10 code maps, so some trials are excluded. %Describe categorizing icd10 codes @@ -348,4 +355,6 @@ To get the best estimate of the size of the population associated with a disease each trial is linked to the most specific disease category applicable. As not every ICD-10 code is linked to a condition in the GBD, those without any applicable conditions are dropped from the dataset. + + \end{document} diff --git a/Latex/Paper/sections/10_CausalStory.tex b/Latex/Paper/sections/10_CausalStory.tex new file mode 100644 index 0000000..7d8f6c8 --- /dev/null +++ b/Latex/Paper/sections/10_CausalStory.tex @@ -0,0 +1,82 @@ +\documentclass[../Main.tex]{subfiles} +\graphicspath{{\subfix{Assets/img/}}} + +\begin{document} + +Because running experiments on companies that conduct clinical trials is not +feasible, causal identification will depend on creating a +structural causal model. +In \cref{Fig:CausalModel} I diagram the directed acyclic graph that describes +the data generating model. +The proposed data generating model consists of a decision maker, the study +sponsor, who must decide whether to let a trial run to completion or terminate +the trial early. +While receiving updates regarding the status of the trial, they ask questions +such as: +\begin{itemize} + \item Do I need to terminate the trial due to safety incidents? + \item Does it appear that the drug is effective enough to achieve our + goals, justifying continuing the trial? + \item Are we recruiting enough participants to achieve the statistical + results we need? 
+ \item Do the current market conditions and expectations about returns on + investment justify the expenditures we are making? +\end{itemize} +When appropriate, the study sponsor terminates the trial. +If there are not enough issues to terminate the trial, it continues until it +is completed. + +During a trial, the safety and efficacy of a drug are driven by +fundamental pharmacokinetic properties of the compounds. +These are only imperfectly measured both prior to and during any given trial. +Previously measured safety and efficacy inform the decision to start the trial +in the first place, while currently observed safety and efficacy results +help the sponsor judge whether or not to continue the trial. +Of course, these decisions are both affected by the specific condition being +treated due to differences in the severity of the symptoms. + +Once a trial has started, it is time to recruit participants. +Participants frequently depend on the advice of their physician when deciding +to join a trial or not. +As these physicians have a duty to seek their patients' best interest, they, along +with their patients, will evaluate whether the previously observed safety and efficacy +results justify joining the trial over using current standard treatments. +Thus the current market conditions may affect the rate at which participants +enroll in the trial. + +The enrollment of participants in a trial depends on a few other factors. +The condition or disease of interest and how it progresses will determine how long +recruitment will be held open versus just an observation of treatment arms. +Additionally, a trial that has already reached a high enough enrollment will often +close recruitment by switching to an ``Active, not recruiting'' stage to manage costs. +Finally, enrolling participants depends on how difficult it is to find people +who suffer from the condition of interest. 
+ +The preceding issue of population size also affects the number of alternatives available. +When there are fewer people affected by the disease, the smaller market reduces +possible profitability, all else equal. +Thus the likelihood of companies paying the sunk costs to develop drugs for +these conditions may be lower. +Finally, the number of alternatives on the market may affect the return on +investment directly, causing a trial to terminate early if the return is +not high enough. + +\begin{figure}[H] %use [H] to fix the figure here. + \includegraphics[width=\textwidth]{../assets/img/dagitty-model.jpg} + \caption{Causal Model} + \label{Fig:CausalModel} +\end{figure} +% +By using Judea Pearl's do-calculus, I can show that an adjustment +set of the decision to conduct a phase III trial, the condition of interest, +the current status of the trial, and the population size will causally +identify the direct effects of enrollment and market alternatives on the +probability of termination. +This is easily verified through the backdoor criterion, which states that +if every path between the exposure and outcome that starts with an arrow +flowing into the exposure is blocked by one of the variables in the adjustment +set, then the effect of the exposure on the outcome is causally identified +(\cite{pearl_causality_2000}). +This can be verified visually from the DAG in \cref{Fig:CausalModel}. + +\end{document} diff --git a/Latex/Presentation/presentation.tex b/Latex/Presentation/presentation.tex index 3c14e17..eefeeb5 100644 --- a/Latex/Presentation/presentation.tex +++ b/Latex/Presentation/presentation.tex @@ -152,7 +152,8 @@ Washington State University \\ % Your institution for the title page Questions \begin{itemize} \item How do the competitors on the market affect clinical trial completion? - \item How is this effect moderated by the enrollment of participants? + \item Can we tell how this effect is moderated by the enrollment of participants? 
+ \item Is this effect consistent across different disease categories? \end{itemize} \end{frame} %------------------------------- @@ -770,7 +771,7 @@ Washington State University \\ % Your institution for the title page %TODO: Update \begin{itemize} - \item 6 chains + \item 4 chains \item 2,500 warm-up, 2,500 sampling runs \item seed = 11021585 \end{itemize}