updated presentation and paper

3 years ago · bd3fe80a5d
parent f12268d33a
commit bd3fe80a5d
4 changed files with 121 additions and 28 deletions
--- a/Latex/Paper/Main.tex
+++ b/Latex/Paper/Main.tex
@ -47,8 +47,9 @@
 \subfile{sections/05_LitReview}
 %---------------------------------------------------------------
-\section{Data}\label{SEC:Data}
+\section{Causal Story and Data}\label{SEC:Data}
 %---------------------------------------------------------------
 \subfile{sections/10_CausalStory}
 \subfile{sections/02_data}
 %---------------------------------------------------------------
--- a/Latex/Paper/sections/02_data.tex
+++ b/Latex/Paper/sections/02_data.tex
@ -54,7 +54,7 @@ most trials are updated multiple times during their progression.
 There are two primary ways to access data about clinical trials.
 The first is to search individual trials on ClinicalTrials.gov with a web browser.
 This web portal shows the current information about the trial and provides 
-access to snapshots of previous versions of the same information.
+access to snapshots of previously submitted information.
 Together, these features fulfill most of the needs of those seeking 
 to join a clinical trial.
 %include screenshots?
@ -64,14 +64,13 @@ the
 called AACT. %TODO: Get CITATION
 The AACT database is available as a PostgreSQL database dump or set of pipe (``$\vert$'') 
 delimited files and matches the current version of the ClinicalTrials.gov database.
-This format is ameniable to large scale analysis, but does not contain information about past 
+This format is amenable to large-scale analysis, but does not contain information about 
-state of trials.
+the past state of trials.
-One of the main products of this research was the creation of a set of python scripts to 
+I created a set of Python scripts to 
 incorporate the historical data on clinical trials available through the web
 portal and merge it into a local copy of the standard AACT database.
-This novel dataset can be used to easily track changes across many trials, 
+This novel dataset can be used to easily track changes as trials progress,
 particularly in the areas of enrollment and expected duration.
 %describe the data NCT, trial records, mesh_terms, etc
 In this combined dataset of current and historical trial records, there are a few 
@ -112,8 +111,8 @@ areas of particular interest.
 \subsubsection{Drug Compounds and Structured Product Labels (SPLs)}
 When a drug is licensed for sale in the U.S., it is not just the active 
-ingredients that are licensed, but also the dosage.
+ingredients that are licensed, but also the dosage and route of administration.
-Each of these combined dosage and compound pairs is assigned a unique 
+Each of these compound/dosage/route combinations is assigned a unique 
 National Drug Code (NDC).
 %mention orange book
 The list of approved NDCs is released regularly in the FDA's 
@ -277,22 +276,6 @@ It is made available through an API hosted by the NLM.
 One key feature is the ability to use a basic text search to find matching 
 terms in various terminologies.
 In order to link clinical trials to standardized ICD-10 conditions and thus
 to the Global Burdens of Disease Data, I wrote a Python script to search the 
 UMLS system for ICD-10 codes that matched the MeSH descriptions for
 each trial.
 This search resulted in generally three categories of search results:
 \begin{enumerate}
    \item The results contained a few entries, one of which was obviously correct.
    \item The results contained a large number of entries, a few of which were correct.
    \item The results did not contain any matches.
 \end{enumerate}
 In these cases I needed a way to validate each match and potentially add my own
 ICD-10 codes to each trial.
 To this end I built a website that allows one to quickly review and edit these 
 records.
 The effort to manually review this data is ongoing.
 \subsection{Data Integration}\label{dataintegration}
@ -307,6 +290,7 @@ Below is more information about how the data was used in the analysis.
 For clinical trials, I captured each update that occurred after the start date 
 and prior to the primary completion date of the trial.
 For clarity I will refer to these as a snapshot of the trial.
 For each snapshot I recorded the enrollment (actual or anticipated), 
 the date it was submitted, the planned primary completion date,
 and the trial's overall status at the time.
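The snapshot selection described above can be sketched in Python as follows. This is a minimal illustration, not the scripts used for the paper: the `Snapshot` class, its field names, the strict date comparisons, and the example records (including the NCT number) are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Snapshot:
    """One submitted version of a trial record (hypothetical field names)."""
    nct_id: str
    submitted: date
    enrollment: Optional[int]  # actual or anticipated
    planned_primary_completion: Optional[date]
    overall_status: str


def snapshots_in_window(updates: List[Snapshot], start: date,
                        primary_completion: date) -> List[Snapshot]:
    """Keep updates submitted after the start date and before the primary
    completion date (both bounds assumed strict), ordered by submission date."""
    kept = [u for u in updates if start < u.submitted < primary_completion]
    return sorted(kept, key=lambda u: u.submitted)


# Made-up records for a single trial, for illustration only.
updates = [
    Snapshot("NCT00000001", date(2015, 1, 10), 100, date(2017, 6, 1), "Not yet recruiting"),
    Snapshot("NCT00000001", date(2015, 9, 1), 120, date(2017, 9, 1), "Recruiting"),
    Snapshot("NCT00000001", date(2016, 5, 20), 118, date(2017, 9, 1), "Active, not recruiting"),
    Snapshot("NCT00000001", date(2018, 2, 1), 118, date(2017, 9, 1), "Completed"),
]
window = snapshots_in_window(updates, start=date(2015, 3, 1),
                             primary_completion=date(2017, 9, 1))
```

Here `window` holds only the two updates submitted while the trial was under way; the pre-start and post-completion records are excluded.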
@ -332,11 +316,34 @@ matters it is only about $[0,3]$. %good to put a graph here
 I also included the current status by encoding it to dummy parameters.
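As a rough sketch of this dummy encoding, assuming the status is a single string per snapshot: the function name, the column naming, and the choice of levels are hypothetical, and the list of levels is only an illustrative subset of the statuses a trial can report.

```python
from typing import Dict, List


def encode_status(status: str, levels: List[str]) -> Dict[str, int]:
    """Return one 0/1 indicator per level; a status outside `levels`
    maps to all zeros and so acts as the reference category."""
    return {f"status_{level}": int(status == level) for level in levels}


# Illustrative subset of overall-status values (an assumption of this sketch).
levels = ["Recruiting", "Active, not recruiting", "Enrolling by invitation"]
row = encode_status("Recruiting", levels)
```

Leaving one status out of `levels` keeps the indicators from being perfectly collinear with an intercept term.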
 %Describe linking drugs/getting number of brands
-As a basic measure of market conditions I have gathered the number of brands 
+As an initial measure of market conditions I have gathered the number of brands 
 that are producing drugs containing the compound(s) of interest in the trial.
 This was done by extracting the RxCUIs that represented the drugs of interest,
 then linking those to the RxCUIs that are brands containing those ingredients.
 As a secondary measure of market conditions, I linked clinical trials to the
 USP Drug Classification list.
 Once I had linked the drugs used in a trial to the applicable USP DC category 
 and class, I could find the number of alternative brands in that class.
 This matching was performed by hand, using a custom web interface to the database.
 In order to link clinical trials to standardized ICD-10 conditions and thus
 to the Global Burdens of Disease Data, I wrote a Python script to search the 
 UMLS system for ICD-10 codes that matched the MeSH descriptions for
 each trial.
 This search generally produced one of three kinds of results:
 \begin{enumerate}
    \item The results contained a few entries, one of which was obviously correct.
    \item The results contained a large number of entries, a few of which were correct.
    \item The results did not contain any matches.
 \end{enumerate}
 In these cases I needed a way to validate each match and potentially add my own
 ICD-10 codes to each trial.
 This matching was also performed by hand, using a separate custom web interface to the database.
 The effort to manually match ICD-10 codes and USP DC categories and classes is ongoing.
 %Describe linking icd10 codes to GBD
 % Not every icd10 code maps, so some trials are excluded.
 %Describe categorizing icd10 codes
@ -348,4 +355,6 @@ To get the best estimate of the size of the population associated with a disease
 each trial is linked to the most specific disease category applicable.
 As not every ICD-10 code is linked to a condition in the GBD, those without any 
 applicable conditions are dropped from the dataset.
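One way to picture the linking and dropping step is a longest-prefix lookup, sketched below. This is an assumption about how "most specific" could be operationalized, not the procedure used for the paper; the function name, the mapping, the category labels, and the trial identifiers are placeholders and do not come from the GBD data.

```python
from typing import Dict, Optional


def most_specific_category(icd10_code: str,
                           icd10_to_gbd: Dict[str, str]) -> Optional[str]:
    """Try the full code first, then progressively shorter prefixes, so the
    longest (most specific) matching entry wins. Returns None if nothing matches."""
    code = icd10_code.replace(".", "")
    for end in range(len(code), 0, -1):
        match = icd10_to_gbd.get(code[:end])
        if match is not None:
            return match
    return None


# Placeholder mapping and trials, for illustration only.
icd10_to_gbd = {"E11": "specific cause (placeholder)", "E1": "broad cause (placeholder)"}
trials = {"trial_a": "E11.9", "trial_b": "E10.1", "trial_c": "Z00.0"}

linked = {}
for trial_id, code in trials.items():
    category = most_specific_category(code, icd10_to_gbd)
    if category is not None:  # trials without any applicable condition are dropped
        linked[trial_id] = category
```

In this example `trial_c` has no applicable category and is therefore absent from `linked`.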
 \end{document}
--- a/Latex/Paper/sections/10_CausalStory.tex
+++ b/Latex/Paper/sections/10_CausalStory.tex
@ -0,0 +1,82 @@
 \documentclass[../Main.tex]{subfiles}
 \graphicspath{{\subfix{Assets/img/}}}
 \begin{document}
 Because running experiments on companies that conduct clinical trials is not
 feasible, causal identification will depend on creating a 
 structural causal model.
 In \cref{Fig:CausalModel} I diagram the directed acyclic graph that describes
 the data generating model.
 The proposed data generating model consists of a decision maker, the study 
 sponsor, who must decide whether to let a trial run to completion or terminate
 the trial early. 
 While receiving updates regarding the status of the trial, they ask questions
 such as:
 \begin{itemize}
    \item Do I need to terminate the trial due to safety incidents?
    \item Does it appear that the drug is effective enough to achieve our 
        goals, justifying continuing the trial?
    \item Are we recruiting enough participants to achieve the statistical
        results we need?
    \item Do the current market conditions and expectations about returns on 
        investment justify the expenditures we are making?
 \end{itemize}
 When appropriate, the study sponsor terminates the trial.
 If there are not enough issues to terminate the trial, it continues until it 
 is completed.
 While conducting a trial, the safety and efficacy of a drug are driven by
 fundamental pharmacokinetic properties of the compounds. 
 These are only imperfectly measured both prior to and during any given trial.
 Previously measured safety and efficacy inform the decision to start the trial
 in the first place, while currently observed safety and efficacy results
 help the sponsor judge whether or not to continue the trial.
 Of course, these decisions are both affected by the specific condition being
 treated due to differences in the severity of the symptoms.
 Once a trial has been started, it comes time to recruit participants.
 Participants frequently depend on the advice of their physician when deciding 
 whether to join a trial. 
 As these physicians have a duty to seek their patients' best interest, they, along
 with their patients, will evaluate whether the previously observed safety and efficacy
 results justify joining the trial over using current standard treatments.
 Thus the current market conditions may affect the rate at which participants 
 enroll in the trial.
 The enrollment of participants in a trial depends on a few other factors.
 The condition or disease of interest and how it progresses will determine how long
 recruitment is held open, as opposed to only observing the treatment arms.
 Additionally, a trial that has already reached a high enough enrollment will often
 close recruitment by switching to an ``Active, not recruiting'' stage to manage costs.
 Finally, enrolling participants depends on how difficult it is to find people 
 who suffer from the condition of interest.
 The preceding issue of population size also affects the number of alternatives available.
 When there are fewer people affected by the disease, the smaller market reduces 
 possible profitability, all else equal.
 Thus the likelihood of companies paying the sunk costs to develop drugs for
 these conditions may be lower.
 Finally, the number of alternatives on the market may affect the return on
 investment directly, causing a trial to terminate early if the return is
 not high enough.
 \begin{figure}[H] %use [H] to fix the figure here.
 	\includegraphics[width=\textwidth]{../assets/img/dagitty-model.jpg}
    \caption{Causal Model}
    \label{Fig:CausalModel}
 \end{figure}
 % 
 By using Judea Pearl's do-calculus, I can show that an adjustment set consisting
 of the decision to conduct a phase III trial, the condition of interest, 
 the current status of the trial, and the population size causally
 identifies the direct effects of enrollment and market alternatives on the
 probability of termination.
 This is easily verified through the backdoor criterion, which states that
 if every path between the exposure and outcome that starts with an arrow 
 flowing into the exposure is blocked by one of the values in the adjustment
 set, then the effect of the exposure on outcome is causally identified
 (\cite{pearl_causality_2000}).
 This can be verified visually from the DAG in \cref{Fig:CausalModel}.
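 When the backdoor criterion holds, the effect of an intervention can be 
 written in terms of observed quantities.
 As a sketch, write $X$ for the exposure, $Y$ for the outcome, and $Z$ for the 
 adjustment set (symbols introduced here for illustration only, not taken from
 the model); the standard backdoor adjustment formula for discrete $Z$ is then
 \[
     P(Y \mid \mathrm{do}(X = x)) = \sum_{z} P(Y \mid X = x, Z = z) \, P(Z = z).
 \]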
 \end{document}
--- a/Latex/Presentation/presentation.tex
+++ b/Latex/Presentation/presentation.tex
@ -152,7 +152,8 @@ Washington State University \\ % Your institution for the title page
    Questions
    \begin{itemize}
        \item How do the competitors on the market affect clinical trial completion?
-        \item How is this effect moderated by the enrollment of participants?
+        \item Can we tell how this effect is moderated by the enrollment of participants?
        \item Is this effect consistent across different disease categories?
    \end{itemize}
 \end{frame}
 %-------------------------------
@ -770,7 +771,7 @@ Washington State University \\ % Your institution for the title page
    %TODO: Update
    \begin{itemize}
-        \item 6 chains
+        \item 4 chains
        \item 2,500 warm-up, 2,500 sampling runs
        \item seed = 11021585
    \end{itemize}