merged in past version of presentation. Turns out I had left it on git and had not merged it.

3 years ago · 5a27e4a567
parent 8c46418c5f a26f87ea72
commit 5a27e4a567
64 changed files with 2610 additions and 936 deletions
--- a/.gitignore
+++ b/.gitignore
@ -12,3 +12,6 @@

 ## Ignore PDfs
 *.pdf
+*.dvi
+#ignore swap files
+*.swp
--- a/2
+++ b/2
@ -1 +1 @@
-Subproject commit a2c0e4dcc70a70041e4895698c9dd856defdb7ed
+Subproject commit 05a96a3a29861e682f01498c5499eb686d064409
--- a/Latex/Paper/ClinicalTrialsPaper.md
+++ b/Latex/Paper/ClinicalTrialsPaper.md
@ -0,0 +1,24 @@
+# Intro
+
+# Lit Review
+
+# Causal Identification
+https://xkcd.com/2726/
+
+Running an experimental trial on how clinical trial recruitment and
+drugs on the market affect clinical trial completion is going to be nigh impossible,
+and finding natural experiments may also be difficult.
+Instead, I am going to use a structural approach based on Pearl's do-calculus.
+
+Background on backdoor criterion. #can be ignored in this draft?
+
+Present Graph
+
+Discuss adjustment sets for total vs direct effects.
+
+# Conclusion
+
+# Appendix
+Include table of ?? Hierarchy
+
+# References
--- a/Latex/Paper/Main.tex
+++ b/Latex/Paper/Main.tex
@ -13,22 +13,34 @@

 \usepackage{float}

-\title{Title Goes Here \\ \small{subtitle goes here}}
+
+%setup paragraph level indexing
+\usepackage{titlesec}
+\setcounter{secnumdepth}{4}
+
+\titleformat{\paragraph}
+{\normalfont\normalsize\bfseries}{\theparagraph}{1em}{}
+\titlespacing*{\paragraph}
+{0pt}{3.25ex plus 1ex minus .2ex}{1.5ex plus .2ex}
+
+\title{The effects of market conditions on enrollment and completion of clinical trials\\ \small{Preliminary Draft}}
 \author{William King}

 \begin{document}
 \maketitle

 %Describe sections
+%\begin{center}
+%\textbf{Abstract}
+%\end{center}
+
+

 %---------------------------------------------------------------
-\section{Introduction}\label{SEC:Results}
+\section{Introduction}\label{SEC:Introduction}
 %---------------------------------------------------------------

-The paper is organized as follows.
-Section \ref{SEC:Models}, describes something important. 
-
-\subfile{sections/01_Introduction}
+\subfile{sections/01_introduction}
 %---------------------------------------------------------------
 \section{Literature Review}\label{SEC:LiteratureReview}
 %---------------------------------------------------------------
@ -52,11 +64,29 @@ Section \ref{SEC:Models}, describes something important.
 %---------------------------------------------------------------
 \section{Results}\label{SEC:Results}
 %---------------------------------------------------------------
+\subfile{sections/06_Results}
+
+%---------------------------------------------------------------
+\section{Improvements}\label{SEC:Improvements}
+%---------------------------------------------------------------
+\subfile{sections/08_PotentialImprovements}
+
+%---------------------------------------------------------------
+\section{Conclusion}\label{SEC:Conclusion}
+%---------------------------------------------------------------
+\subfile{sections/09_Conclusion}

 \newpage
+%---------------------------------------------------------------
 \section{References}
+%---------------------------------------------------------------
 \printbibliography

+\newpage
+%---------------------------------------------------------------
+\section{Appendices}
+%---------------------------------------------------------------
+
 \newpage
 \tableofcontents
 \end{document}
--- a/Latex/Paper/sections/01_introduction.tex
+++ b/Latex/Paper/sections/01_introduction.tex
@ -2,11 +2,52 @@
 \graphicspath{{\subfix{Assets/img/}}}

 \begin{document}
-In September of 2019, the European Space Agency (ESA) released a tweet 
-explaining that they had performed an appendicotomy in space using 
-nothing more than radiation from the sun.
-They are sad to announce that the patient died due to complications from 
-exposure to the cold vaccum of space.
+% hook - what makes drugs expensive? Mention high failure rate
+% describe current research 
+% - Examine mechanisms by which clinical trials fail.
+% - Mention data
+% - Results
+How to best address the high cost of pharmaceuticals is a crucial health 
+and fiscal policy question that has been debated for 
+decades.
+Due to the complicated legal and competitive landscape, unintended consequences
+are common 
+\cite{van_der_gronde_addressing_2017}.
+One critical step in successfully introducing a novel pharmaceutical, or even
+a generic compound, is establishing that the drug as packaged and sold will
+have acceptable safety and efficacy profiles.
+This is done using clinical trials.
+
+To adequately guide public policy it is crucial that robust, causally-identified
+statistical models are available to describe the interaction between
+various players within the space.
+While it is known that pharmaceutical companies withdraw some drugs from
+their development pipeline due to commercialization concerns 
+(\cite{khmelnitskaya_competition_2021} 
+and 
+\cite{van_der_gronde_addressing_2017}), 
+there are likely unseen
+effects that might affect the overall drug pipeline.
+One of these is the concern that when there are already approved therapies on 
+the market, patients might be loath to enroll in clinical trials,
+causing the trial to fail for reasons unrelated to the scientific or
+commercial viability of the therapy.
+
+This work endeavors to estimate the change in probability of successful completion
+of a clinical trial due to the existence of alternative drugs on the market. 
+In particular, it seeks to establish whether such an impact is mediated
+by enrollment patterns or is caused more directly.
+
+
+The paper proceeds as follows: a brief literature review in \cref{SEC:LiteratureReview}, 
+a description of the causal model in \cref{SEC:CausalIdentification},
+followed by a description of the data (\cref{SEC:Data}) and the 
+econometric model (\cref{SEC:EconometricModel}).
+Preliminary results are presented in \cref{SEC:Results} and a discussion
+of proposed improvements is included in \cref{SEC:Improvements}.
+
+


 \end{document}
--- a/Latex/Paper/sections/02_data.tex
+++ b/Latex/Paper/sections/02_data.tex
@ -2,9 +2,15 @@
 \graphicspath{{\subfix{Assets/img/}}}

 \begin{document}
-\subsection{Data Sources}
-The following are the data source I plan on using.
+In the sections below, I examine each source of data and its key features,
+and describe applicable terminology (\cref{datasources}).
+I then discuss how these sources were tied together (\cref{datalinks}) and 
+describe the specific data used in the analysis (\cref{dataintegration}).

+\subsection{Data Sources}\label{datasources}
+
+
+%----------------------------------------------------
 \subsubsection{Clinical Trials Data}
 %ClinicalTrials.gov
 %   Key features - brief description
@ -19,7 +25,117 @@ The following are the data source I plan on using.
 %       counts: number of sponsor changes since start.
 %   Links to data

-\subsubsection{Drugs on Market}
+Since September 27, 2007, those who conduct clinical trials of FDA-controlled 
+drugs or devices on human subjects have been required to register 
+their trial at \url{ClinicalTrials.gov}
+(\cite{noauthor_fdaaa_nodate}).
+This involves submitting information on the expected enrollment and duration of
+trials, drugs or devices that will be used, treatment protocols and study arms, 
+as well as contact information for the trial sponsor and treatment sites.
+
+When starting a new trial, the required information must be submitted 
+``\dots not later than 21 calendar days after enrolling the first human subject\dots''.
+After the initial submission, the data is briefly reviewed for quality and 
+then the trial record is published and the trial is assigned a 
+National Clinical Trial (NCT) identifier
+(\cite{noauthor_fdaaa_nodate}).
+
+Each trial's record is updated periodically, including a final update that must occur 
+within a year of completing the primary objective, although exceptions are
+available for trials related to drug approvals or for trials with secondary
+objectives that require further observation\footnote{This rule came into effect in 2017.}
+(\cite{noauthor_fdaaa_nodate}).
+Other than the requirements for the first and last submissions, all other
+updates occur at the discretion of the trial sponsor.
+Because the ClinicalTrials.gov website serves as a central point of information
+on which trials are active or recruiting for a given condition or drug,
+most trials are updated multiple times during their progression.
+
+There are two primary ways to access data about clinical trials.
+The first is to search individual trials on ClinicalTrials.gov with a web browser.
+This web portal shows the current information about the trial and provides 
+access to snapshots of previous versions of the same information.
+Together, these features fulfill most of the needs of those seeking 
+to join a clinical trial.
+%include screenshots?
+The second way to access the data is through a normalized database set up by
+the 
+\href{https://aact.ctti-clinicaltrials.org/}{Clinical Trials Transformation Initiative}
+called AACT. %TODO: Get CITATION
+The AACT database is available as a PostgreSQL database dump or set of pipe (``$\vert$'') 
+delimited files and matches the current version of the ClinicalTrials.gov database.
+This format is amenable to large-scale analysis, but does not contain information about the past 
+states of trials.
+
+One of the main products of this research was the creation of a set of Python scripts to 
+incorporate the historical data on clinical trials available through the web
+portal and merge it into a local copy of the standard AACT database.
+This novel dataset can be used to easily track changes across many trials, 
+particularly in the areas of enrollment and expected duration.
+
+%describe the data NCT, trial records, mesh_terms, etc
+In this combined dataset of current and historical trial records, there are a few 
+areas of particular interest.
+\begin{itemize}
+    \item NCT: As a unique identifier of a trial, it is used throughout to 
+        ensure data is linked to the appropriate trial.
+    \item Enrollment: This takes on two forms. 
+        At the beginning of a trial this is presented as ``Anticipated'' 
+        enrollment, while near or at the end of the trial it is reported
+        as ``Actual'' enrollment.
+    \item Overall Status: Each trial must be in one of a list of states. 
+        While a trial is running, it can be in any of the following states.
+        \begin{itemize}
+            \item Not yet recruiting
+            \item Recruiting
+            \item Enrolling by Invitation
+            \item Active, not recruiting
+            \item Suspended %I don't explicitly deal with this case
+        \end{itemize}
+	When a trial has ended it is in one of two states:
+        \begin{itemize}
+            \item Terminated: Trial has ended prematurely
+            \item Completed: Trial has ended after observing what they hoped to observe.
+        \end{itemize}
+    \item Start Date: The date that the first measurement was taken or that the 
+        first site was authorized to take measurements.
+    \item Primary Completion Date: The date the last measurement for the primary 
+        objective was taken. 
+        Prior to the actual primary completion date, this is an anticipated value.
+    \item Conditions: The conditions of interest in the trial.
+    \item Interventions: The drug(s) used in treatment.
+\end{itemize}
+
+
+
+%----------------------------------------------------
+\subsubsection{Drug Compounds and Structured Product Labels (SPLs)}
+
+When a drug is licensed for sale in the U.S., it is not just the active 
+ingredients that are licensed, but also the dosage.
+Each of these combined dosage and compound pairs is assigned a unique 
+National Drug Code (NDC).
+%mention orange book
+The list of approved NDCs is released regularly in the FDA's 
+Orange Book (small-molecule drugs) and Purple Book (biologics) publications.
+These two publications also contain information regarding which drugs are generics
+or biosimilars. %TODO: REF
+%which drugs are originators and which are generics (there is a better word for originator).
+
+Before a drug or drug compound is sold on the market, the FDA requires the seller
+to submit a standardized label and associated information called
+a Structured Product Label (SPL). 
+These SPLs include information about dosage, ingredients, warnings, and printed labels.
+Each NDC code can have multiple SPLs associated with it because each
+drug compound may be packaged in multiple ways, e.g. boxes with different 
+numbers of blister packs.
+These SPLs are made available for download so that they can be integrated 
+into patient health systems to improve patient safety (\cite{noauthor_indexing_nodate}).
+
+The FDA also published additional data in the NDC SPL Data Elements (NSDE) file.
+This file contains some of the data from the SPL files, as well as the dates 
+when each product was approved for sale and when it was removed from the market.
+
 %Structured Product Labels and dates of marketing
 %   Key features
 %   Why is it being included
@ -30,30 +146,206 @@ The following are the data source I plan on using.
 %       standardize start/end dates by getting a view: compound, dates, manufacturer.
 %   Links to data

+%----------------------------------------------------
 \subsubsection{Global Disease Burden Survey}
-%Dataset name
-%   Key features
-%   Why is it being included
-%   What specific data is used
-%   Data Manipulations for each
-%   Links to data

+The University of Washington's Institute for Health Metrics and Evaluation
+published a dataset called the Global Burden of Disease Study 2019 (GBD 2019).
+This dataset provides estimates of worldwide incidence of 
+various diseases and classes of diseases.
+%\footnote{A full list of the diseases and categories can be found in \ref{Appendix1}}
+The available measures of incidence include Deaths, Disability Adjusted Life Years (DALYs), 
+Years of Life Lost (YLL), and Years Lived with Disability (YLD) and come with 
+both an estimate and 95\% confidence interval bounds.
+Estimates are available for national, multinational, and global 
+populations 
+(\cite{vos_global_2020}).

-%Dataset name
-%   Key features
-%   Why is it being included
-%   What specific data is used
-%   Data Manipulations for each
-%   Links to data
+These classes of disease are organized in a hierarchy, with each subsuming category
+having its own estimates of disease incidence. 
+One understandable deficiency in this dataset is that it doesn't account for all
+diseases tracked in other datasets, but focuses on those
+that are most important from a public health perspective.
+%not quite sure how to fill this out. What I am hoping to do is to justify my use of 
+% the highest level (most precise categories of data. Might be better to discuss
+% the nested category outline.
+The IHME also provides a link between the disease/cause hierarchy and ICD-10 
+codes 
+(\cite{global_burden_of_disease_collaborative_network_global_2020}).
+
+
+%----------------------------------------------------
+\subsubsection{Medical and Pharmacological Terminologies}\label{datalinks}
+In order to link these disparate data sources I used multiple standardized 
+terminologies. 
+In each section below I briefly describe each terminology, its contents, and uses.
+
+\paragraph{Medical Subject Headings (MeSH) Thesaurus}
+
+The Medical Subject Headings (MeSH) Thesaurus is produced and maintained by the National
+Library of Medicine.
+It is used to index subjects in various NLM publications including PubMed 
+(\cite{noauthor_medical_nodate}).
+The AACT database contains a table that links clinical trials' clinical conditions
+and drug names to terms in the MeSH thesaurus.
+As this contains a standardized nomenclature, it simplified much of the 
+linking between clinical trials and other data sources.
+
+\paragraph{RxNorm}
+
+According to \cite{noauthor_rxnorm_nodate-1}
+\begin{displayquote}
+	What is RxNorm? \\
+	RxNorm is two things: a normalized naming system for generic and branded drugs; 
+	and a tool for supporting semantic interoperation between drug terminologies 
+	and pharmacy knowledge base systems.\dots \\
+\end{displayquote}
+Both of these functions are crucial to the analysis. 
+The normalized naming system allowed me to convert a diverse 
+set of names as recorded for each clinical trial into standardized identifiers.
+These standardized identifiers are known as RxCUIs, and they are used in RxNorm
+to identify not only individual drug components, but also brand names, licensed
+drug/dosage pairs, and packages.
+The links to other drug terminologies included links to SPL identifiers, which
+permitted me to link each trial to drugs on the market at any point in time.
+
+%How did I get and incoprorate this data.
+The RxNorm data is provided in multiple formats.
+The one I chose to use was a MariaDB database that backs a service called RxNav
+provided by the National Library of Medicine (NLM). 
+The NLM provides scripts to set up and host the backing databases on your
+own servers
+(\cite{noauthor_rxnav---box_nodate}). 
+After setting up the local server, I wrote a Python program to export 
+the data from the RxNorm database and import it into the AACT Database.
+This was required because the former uses a MariaDB database server
+and the latter uses a Postgres database server.
+
+With the data now available alongside the AACT database, I could link trials
+to various key drug concepts, including normalized drug ingredient names,
+NDCs incorporating those ingredients, and the brand names associated with the NDCs.
+
+\paragraph{International Classification of Diseases 10th revision (ICD-10)}
+
+%what it is
+The International Classification of Diseases 10th revision (ICD-10) is a 
+worldwide standard for categorizing human disease maintained by the 
+World Health Organization.
+Although the WHO version's last major update was in 2019 and it was officially
+superseded in 2022 by the 11th revision (\cite{noauthor_international_nodate}),
+the 10th revision is still in use in the United States as the 
+Centers for Medicare and Medicaid Services (CMS) continues to publish
+updated versions called 
+ICD-10-CM (Clinical Modification) (\cite{noauthor_2023_nodate}) and 
+ICD-10-PCS (Procedure Coding System) (\cite{noauthor_2023_nodate-1}) for use
+in medical billing.
+
+ICD-10 codes are organized in a hierarchy. 
+There are 22 highest-level categories, representing general categories such 
+as cancers, mental illness, and infectious diseases.
+The second layer of the hierarchy consists of about 225 second-level groupings.
+
+
+%how was it used
+The GBD database provided a mapping between their categories and ICD-10
+codes (\cite{global_burden_of_disease_collaborative_network_global_2020}).
+Unfortunately it appears to use a combination of the default WHO ICD-10 codes
+and the ICD-10-CM codes from the CMS. 
+Additionally, many diseases classified by ICD-10 codes do not correspond to 
+categories in the GBD database.
+
+%how it was obtained
+As I needed a combined list of ICD-10 codes, I first obtained the 2019 version
+of the ICD-10-CM codes from the CMS (\cite{noauthor_2019_nodate}).
+With the arrival of the ICD-11 system, it was difficult to find an official 
+source from which to download the WHO versions of ICD-10 codes. 
+Eventually I resorted to copying them from the navigation bar of the 
+\href{https://icd.who.int/browse10/2019/en}{official WHO ICD-10 (2019) website}
+(\cite{noauthor_icd-10_nodate}).
+After getting both sources into the same format,
+I combined them and removed duplicate codes, preferring to keep the descriptions
+from the WHO version.
+This was done using standard Unix scripting commands.
+I then imported the data into the Postgres Database alongside the AACT data.
+
+\paragraph{Unified Medical Language System (UMLS) Thesaurus}
+
+The NLM also publishes a medical terminology thesaurus
+known as the Unified Medical Language System (UMLS) which links terminologies 
+such as RxNorm, MeSH, and ICD-10.
+It is made available through an API hosted by the NLM.
+One key feature is the ability to use a basic text search to find matching 
+terms in various terminologies.
+
+In order to link clinical trials to standardized ICD-10 conditions and thus
+to the Global Burden of Disease data, I wrote a Python script to search the 
+UMLS system for ICD-10 codes that matched the MeSH descriptions for
+each trial.
+These searches generally produced three categories of results:
+\begin{enumerate}
+    \item The results contained a few entries, one of which was obviously correct.
+    \item The results contained a large number of entries, a few of which were correct.
+    \item The results did not contain any matches.
+\end{enumerate}
+In these cases I needed a way to validate each match and potentially add my own
+ICD-10 codes to each trial.
+To this end I built a website that allows one to quickly review and edit these 
+records.
+
+The effort to manually review this data is ongoing.
+
+
+\subsection{Data Integration}\label{dataintegration}
+%Goal - Help readers understand which data were used in the analysis
+Below is more information about how the data was used in the analysis.
+
+%Describe data pulled from AACT/historical snapshots
+% what are snapshots.
+% enrollment
+% elapsed_duration
+% current_status
+For clinical trials, I captured each update that occurred after the start date 
+and prior to the primary completion date of the trial.
+For clarity I will refer to each of these as a snapshot of the trial.
+For each snapshot I recorded the enrollment (actual or anticipated), 
+the date it was submitted, the planned primary completion date,
+and the trial's overall status at the time.
+I also extracted the anticipated enrollment closest to the actual start date 
+of the trial, which I will call the planned enrollment under the assumption
+that the sponsor is recording their current plan for enrollment.
+From these I constructed a couple of normalized values.

-\subsection{Data Linkages}
+The first is a normalized measure of enrollment. 
+This was constructed by dividing the snapshot enrollment by the planned enrollment.
+The purpose of this was to normalize enrollment to a scale roughly around 1 
+instead of the widely varying counts that raw enrollment would give.
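+Written out, with subscripts introduced here only for illustration
+($i$ indexes trials and $s$ indexes the snapshots of a trial), this is:
+\begin{align}
+	\text{Normalized Enrollment}_{i,s} = 
+	\frac{\text{Snapshot Enrollment}_{i,s}}{\text{Planned Enrollment}_{i}}
+\end{align}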
+The second was a measure of how far along the trial was
+in its planned duration, in other words a measure of elapsed duration.
+This was calculated for each snapshot as:
+\begin{align}
+	\text{Elapsed Duration} = 
+	\frac{\text{Snapshot Date} - \text{Start Date}}
+	    {\text{Primary Completion Date (anticipated)} - \text{Start Date}}
+\end{align}
+Note that this has a range of $[0,\infty)$, although in practice 
+it falls within about $[0,3]$. %good to put a graph here
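+As an illustration with hypothetical dates, a snapshot submitted 180 days
+after the start of a trial whose anticipated primary completion date is
+360 days after the start gives
+\begin{align}
+	\text{Elapsed Duration} = \frac{180}{360} = 0.5,
+\end{align}
+while a snapshot submitted at 540 days with the same anticipated date gives $540/360 = 1.5$.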
+I also included the current status by encoding it as dummy variables.

-%% Ideal
-% Link trial indications to a generic indication registry
-% Link SPL indications to a generic indication registry
-% Link Global disease burden to a generic indication registry
-% Link trial compounds a generic registry (MPT or whatever that was)?
-% Link SPL compounds to a generic registry
-% Link Market data to SPLs
+%Describe linking drugs/getting number of brands
+As a basic measure of market conditions I have gathered the number of brands 
+that are producing drugs containing the compound(s) of interest in the trial.
+This was done by extracting the RxCUIs that represented the drugs of interest,
+then linking those to the RxCUIs that are brands containing those ingredients.

+%Describe linking icd10 codes to GBD
+% Not every icd10 code maps, so some trials are excluded.
+%Describe categorizing icd10 codes
+After manually matching each trial to an ICD-10 code, each trial is easily linked to 
+either one of the 22 highest level categories or the 225 or so 2nd level 
+categories in the ICD-10 hierarchy.
+Linking to one of the disease categories in the GBD hierarchy is similarly easy.
+To get the best estimate of the size of the population associated with a disease,
+each trial is linked to the most specific disease category applicable.
+As not every ICD-10 code is linked to a condition in the GBD, those without any 
+applicable conditions are dropped from the dataset.
 \end{document}
--- a/Latex/Paper/sections/03_CausalIdentification.tex
+++ b/Latex/Paper/sections/03_CausalIdentification.tex
@ -3,103 +3,80 @@

 \begin{document}

-The identification strategy centers on the fact that, in the U.S., clinical trials
-update the publically available information on \url{ClinicalTrials.gov}, which are then made
-available as historical snapshots.
-These updates typically include information such as additional sites conducting the study, 
-the study status, and expected or current enrollment figures. 
-By measuring enrollment and other factors prior to the conclusion of a trial, we 
-can measure the effect of enrollment on trial conclusion 
-(specifically whether it is registered as completed or terminated).
-In particular, this avoids measuring the joint determination of enrollment and conclusion
-status arising from trials terminated early.
-Figure \ref{Fig:CausalModel} describes the structural causal model (SCM) used to justify
-the causal identification
+Because running experiments on companies running clinical trials is not going
+to happen anytime soon, causal identification will depend on creating a 
+structural causal model.
+In \cref{Fig:CausalModel} I diagram the directed acyclic graph that describes
+the data generating model.
+The proposed data generating model consists of a decision maker, the study 
+sponsor, who must decide whether to let a trial run to completion or terminate
+the trial early. 
+While receiving updates regarding the status of the trial, they ask questions
+such as:
+\begin{itemize}
+    \item Do I need to terminate the trial due to safety incidents?
+    \item Does it appear that the drug is effective enough to achieve our 
+        goals, justifying continuing the trial?
+    \item Are we recruiting enough participants to achieve the statistical
+        results we need?
+    \item Do the current market conditions and expectations about returns on 
+        investment justify the expenditures we are making?
+\end{itemize}
+When appropriate, the study sponsor terminates the trial.
+If there are not enough issues to terminate the trial, it continues until it 
+is completed.
+
+While conducting a trial, the safety and efficacy of a drug are driven by
+fundamental pharmacokinetic properties of the compounds. 
+These are only imperfectly measured both prior to and during any given trial.
+Previously measured safety and efficacy inform the decision to start the trial
+in the first place while currently observed safety and efficacy results
+help the sponsor judge whether or not to continue the trial.
+Of course, these decisions are both affected by the specific condition being
+treated due to differences in the severity of the symptoms.
+
+When a trial has been started, it comes time to recruit participants.
+Participants frequently depend on the advice of their physician when deciding 
+whether to join a trial. 
+As these physicians have a duty to seek their patients' best interest, they, along
+with their patients, will evaluate whether the previously observed safety and efficacy
+results justify joining the trial over using current standard treatments.
+Thus the current market conditions may affect the rate at which participants 
+enroll in the trial.
+
+The enrollment of participants in a trial depends on a few other factors.
+The condition or disease of interest and how it progresses will determine how long
+recruitment will be held open versus just an observation of treatment arms.
+Additionally, a trial that has already reached a high enough enrollment will often
+close recruitment by switching to an ``Active, not recruiting'' stage to manage costs.
+Finally, enrolling participants depends on how difficult it is to find people 
+who suffer from the condition of interest.
+
+The preceding issue of population size also affects the number of alternatives available.
+When there are fewer people affected by the disease, the smaller market reduces 
+possible profitability, all else equal.
+Thus the likelihood of companies paying the sunk costs to develop drugs for
+these conditions may be lower.
+Finally, the number of alternatives on the market may affect the return on
+investment directly, causing a trial to terminate early if the return is
+not high enough.

 \begin{figure}[H] %use [H] to fix the figure here.
-    \tikzfig{../assets/tikzit/CausalGraph}
+	\includegraphics[width=\textwidth]{../assets/img/dagitty-model.jpg}
    \caption{Causal Model}
    \label{Fig:CausalModel}
 \end{figure}
-
-The identification strategy is based on the backdoor criterion due to \cite{PEARL1995}.
-As the backdoor criteron depends on the SCM being a Directed Acyclic Graph (DAG, the first 
-step is to justify the DAG in \cref{FIG:CausalModel}.
-
-% The data consists of individual snapshots
-%   Describe "states" 
-%   Also, snapshot states are dependent across time
-%   Define conclusion state vs snapshot state.
-The key feature of the data is that it consists of sequences of trial snapshots for each trial.
-Snapshots prior to the start of the trial capture expected enrollment and time to completion,
-while snapshots during the trial record actual enrollement figures, current status, 
-and the date the snapshot was recorded.
-Finally, after a trial concludes, snapshots list final enrollment and the date at which the 
-last participant was examined\cite{CLINICALTRIALS-data_spec}.
-In the discussion below, I refer to a snapshot's ``state'' as the enrollment, duration, and status 
-recorded at the time of the snapshot.
-%TODO: make sure data section discusses the normalization of enrollment and duration.
-Additionally, I distinguish between the state at trial conclusion and state from a snapshot during the running trial as
-``conclusion state'' and ''active snapshot state''.
-%   Describe market conditions.
-Associated with each trial snapshot are the market conditions existing at that point in time.
-
-
-%Describe the observed and unobserved events and their supposed relationships.
-%%%%% Relationships of interest
-% Snapshot State -> Conclusion state
-% Discuss how the data captures this - time dependence
-%TODO
-
-% Market -> Snapshot state
-% Market -> Conclusion state
-
-
-
-%%%%% Confounding relationships and controls
-% Disease Burden -> Market Conditions, Snapshot State
-In addition to the relationships of interest between teh active snapshot states and 
-the conclusion states, there are various biasing effects that need to be accounted for.
-The first of these is the fact that enrollment and the drugs currently on the market are
-both affected by the number of people who are affected by the disease under examination.
-This biases not only the estimate of the total causal effect of market conditions 
-on conclusion state but also the direct effects of both 
-market conditions and active snapshot enrollment on conclusion state.
-Additionally, it biases the estimation of the effect of market conditions on
-active snapshot enrollment.
-I plan on using the WHO's Global Disease Burden Survey
-to control for population size. %CITE - ekaterina
-
-% Biasing Pathways
-
-%   Compound Safety -> Current Adverse Events -> Conclusion State. 
-%       Note: Compound Safety -> Current Adverse Events -/> Snapshot State. 
-%       Even if it were an issue, the direct events should still be identified?
-A second biasing effect is related to the fact that a compound's safety drives both beliefs about 
-the compound -- affecting active snapshot enrollment -- and the current adverse effects
-which directly influence the conclusion state by leading to terminations.
-The backdoor criterion implies that controlling for whether or not prior trials have
-occurred will eliminate bias.
-%TODO: discuss how you will be conditioning on prior trials, i.e. per compound or just phase 3 etc.
-
-%   Compound Efficacy -> Measured Effectiveness -> Conclusion State
-Similarly, the last confounding factor is that of measured effectiveness.
-When running a trial, the sponsor will get periodic updates as to the measured effectiveness. 
-If this is lower than expected, the trial may conclude early.
-Although this is a direct effect, the issue comes through the backdoor path through prior trials
-and beliefs about the compound.
-Thus controlling for prior trials eliminates this path as well.
-%   Control
-%       Compound Safety, Compound Efficacy -> Prior Trials -> Beliefs about Compound
 % 
-
-%%%% Variance controls
-% Sponsor Changes -> Conclusion Status
-Finally, the last control variable is that of sponsor changes. 
-As sponsors are captured at each snapshot, it is possible to measure when a sponsor has changed.
-Changing sponsors is a potentially disruptive event, and so it is likely to affect the probability
-that the trial is canceled early. 
-The purpose of including this control is to reduce the variance of our estimates.
-%Describe what causal effects are identified by the backdoor criterion.
+Using Judea Pearl's do-calculus, I can show that an adjustment set consisting
+of the decision to conduct a phase III trial, the condition of interest, 
+the current status of the trial, and the population size causally
+identifies the direct effects of enrollment and market alternatives on the
+probability of termination.
+This can be verified through the backdoor criterion, which states that
+if no variable in the adjustment set is a descendant of the exposure and
+every path between the exposure and outcome that starts with an arrow 
+flowing into the exposure is blocked by the adjustment set, then the
+effect of the exposure on the outcome is causally identified
+(\cite{pearl_causality_2000}).
+Inspecting the DAG confirms that this is the case.
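+
+As a reminder of what identification provides, the following is a generic
+sketch of the backdoor adjustment formula rather than the exact estimand used
+in this paper. 
+The symbols $X$ (exposure), $Y$ (outcome), and $S$ (an admissible adjustment
+set) are placeholders introduced here for illustration only:
+\begin{align}
+    \Pr(Y = y \mid \text{do}(X = x)) = \sum_{s} \Pr(Y = y \mid X = x, S = s) \Pr(S = s)
+\end{align}
+Under these assumptions, the interventional distribution on the left can be
+computed from purely observational quantities on the right.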

 \end{document}
--- a/Latex/Paper/sections/04_EconometricModel.tex
+++ b/Latex/Paper/sections/04_EconometricModel.tex
@ -11,50 +11,58 @@

 First, some notation:
 \begin{itemize}
-    \item $t,i$ index trial and snapshot of the trial respectively.
-    \item $S_{it}$ : status (``Terminated'', ``Concluded'') 
-        at the conclusion of the trial
-    \item $\tau_{it}$ : elapsed percentage of planned duration, i.e. the normalized elapsed duration.
-    \item $s_{it}$: current status variable(s) describing current trial status as one of:
-        ``Recruiting'', ``Active, not recruiting'', or ``Suspended''
-    \item $X_{it}$: data describing the trial at the snapshot level
-        \begin{itemize}
-            \item number of sponsor changes so far
-            \item percentage currently enrolled of planned enrolled
-        \end{itemize}
-    \item $M_{t}$: data describing market conditions at the time of the snapshot.
-        \begin{itemize}
-            \item number of compounds approved for the same indication
-            \item total number of marketers approved for the same indication
-            \item number of marketers for this specific compound
-        \end{itemize}
-    \item $T_t$: Trial Level Data
-        \begin{itemize}
-            \item Indication or indication class 
-            \item Sponsor Type 
-        \end{itemize}
+    \item $n$: indexes trial snapshots.
+    \item $y_n$: whether each trial terminated (true) or completed (false).
+    \item $d$: indexes ICD-10 disease categories.
+    \item $d_n$: represents the disease category of the trial associated with the snapshot $n$.
+    \item $x_n$: represents the independent variables associated with the snapshot.
+        This includes\footnote{No trials in the current dataset are ever suspended.}:
+        \begin{enumerate} 
+            \item Elapsed duration
+            \item arcsinh of the number of brands
+            \item arcsinh of the DALYs from high SDI countries
+            \item arcsinh of the DALYs from high-medium SDI countries
+            \item Enrollment (no distinction between anticipated or actual) 
+            \item Dummy Status: Not yet recruiting
+            \item Dummy Status: Recruiting
+            \item Dummy Status: Active, not recruiting
+            \item Dummy Status: Enrolling by invitation 
+        \end{enumerate} 
 \end{itemize} 
+The arcsinh transform is used because it behaves like a log transform for
+large values but is defined at zero, with $\text{arcsinh}(0)=0$.
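+To make the comparison with the log transform concrete, the identity below
+is a standard one, included only as an illustrative aside:
+\begin{align}
+    \text{arcsinh}(x) = \ln\left(x + \sqrt{x^2 + 1}\right)
+\end{align}
+For large $x$ this is approximately $\ln(2x) = \ln(x) + \ln(2)$, so the
+transform differs from the log by roughly a constant, while at $x = 0$ it
+evaluates to $\ln(1) = 0$.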

-Other variables used when selecting trials to include as observations are:
-\begin{itemize}
-    \item Trial Phase\footnote{
-            Conditioning on previous trials can be as simple as selecting only 
-            phase 2 or phase 3 trials.
-        }.
-    \item Does the trial have a Data Monitoring Committee?
-    \item Are the compounds an FDA regulated drug?
-\end{itemize}
-
-
-The goal is to estimate the probability distribution of conclusion status and duration conditional on the active snapshot status.

+The Bayesian model to measure the direct effects of enrollment and the number 
+of other brands is specified as a hierarchical logistic regression.
+\begin{align}
+    y_n \sim \text{Bernoulli}(p_n) \\
+	p_n = \text{logit}^{-1}(x_n \vec \beta(d_n))
+\end{align}
+Where beta is indexed by $k$ for each parameter in $x$, and by
+$d \in \{1,2,\dots,21,22\}$ for each general ICD-10 category.
+The betas are distributed
+\begin{align}
+    \beta_k(d) \sim \text{Normal}(\mu_k,\sigma_k)
+\end{align}
+With hyperparameters
 \begin{align}
-    \pr{ t, S | X,M,T} = \sum_{S\in\{\text{Terminated},\text{Concluded}\}} \pr{t|S,X,M,T} \pr{S|X,M,T}
+    \mu_k \sim \text{Normal}(0,1) \\
+    \sigma_k \sim \text{Gamma}(2,1)
 \end{align}
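+For interpretation, and reading the link above as the inverse logit (the
+usual convention for a Bernoulli-logit model, stated here as an assumption),
+the termination probability can be written out as:
+\begin{align}
+    p_n = \frac{1}{1 + \exp\left(-x_n \vec \beta(d_n)\right)}
+\end{align}
+so that a one unit increase in the $k$-th element of $x_n$ shifts the
+log-odds of termination, $\ln\left(p_n / (1 - p_n)\right)$, by $\beta_k(d_n)$.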
-Note how this has two component parts: 
+
+
+Other variables are implicitly conditioned on as they were used 
+to select trials of interest.
+These include:
 \begin{itemize}
-    \item The survival probability portion $\pr{t|S,X,M,T}$
-    \item The conclusion status portion $\pr{S|X,M,T}$
+    \item Is the trial Phase 3?\footnote{
+       Conditioning on phase 3 is equivalent to asserting that previous trials 
+       occurred and had acceptable safety and efficacy results.
+       }
+    \item Does the trial have a Data Monitoring Committee?
+    \item Are the compounds an FDA regulated drug?
 \end{itemize}
+%TODO: double check the sql used to select trials of interest.

 \end{document}
--- a/Latex/Paper/sections/05_LitReview.tex
+++ b/Latex/Paper/sections/05_LitReview.tex
@ -1,19 +1,97 @@
 \documentclass[../Main.tex]{subfiles}
 \graphicspath{{\subfix{Assets/img/}}}

-%%%%% Organization %%%%%
-% First read and write personal notes.
-% Then organizing/tagging the literature along these axes
-%   - Trial and Trial Sequence Completion/success rates
-%   - R&D efforts
-%   - FIND MORE
-%   - 
-%
-% Now figure out the order in which to present them and what they explain.
-% Now sumarize each one, within the context of the literature around it.
-% Now write them out together
 \begin{document}
-TODO

+This paper sits within an intersection of health and industrial organization economics
+that is frequently studied.
+Encouraging a strong supply of novel and generic pharmaceuticals contributes 
+in important ways to both public health and fiscal policy.
+Not only is the pathway to drug approval long, but as many as 90\% of compounds
+that begin human trials fail to gain approval 
+(\cite{khmelnitskaya_competition_2021}).
+Complicating this is the complex regulatory and competitive environment in
+which pharmaceutical companies operate. 
+
+%%%%%%%%% Why are drugs so expensive?
+
+% van der Grond, Uyle-de Groot, Pieters 2017
+% - What causes high costs of drugs? 
+% - High level synthesis of discussion regarding causes
+% - Academic and non-academic sources
+
+
+%%%%%%%%%%%%%%%% What do we know about clinical trials?
+
+% Hwang, Carpenter, Lauffenburger, et al (2016)
+% - Why do investigational new drugs fail during late stage trials?
+\citeauthor{hwang_failure_2016} (\citeyear{hwang_failure_2016}) 
+investigated causes for which late stage (Phase III)
+clinical trials fail across the USA, Europe, Japan, Canada, and Australia. 
+They found that for late stage trials that did not go on to receive approval,
+57\% failed on efficacy grounds, 17\% failed on safety grounds, and 22\% failed
+on commercial or other grounds.
+For context, this current work hopes to be able to distinguish some of the 
+mechanisms behind those commercial or other failures.
+
+% Abrantes-Metz, Adams, Metz (2004)
+% - What correlates with successfully passing clinical trials and FDA review?
+% - 
+In \citeyear{abrantes-metz_pharmaceutical_2004}, 
+\citeauthor{abrantes-metz_pharmaceutical_2004} 
+described the relationship between
+various drug characteristics and how the drug progressed through clinical trials.
+This non-causal estimate was notable for using a 
+mixed state proportional hazard model and estimating the impact of 
+observed characteristics in each of the three phases.
+They found that as trials last longer, the rate of failure increases for
+Phase I \& II trials, while Phase 3 trials generally have a higher rate of 
+success than failure after 91 months.
+
+% Ekaterina Khmelnitskaya (2021)
+% - separates scientific from market failure of the clinical drug pipeline
+In her doctoral dissertation, Ekaterina Khmelnitskaya studied the transition of
+drug candidates between clinical trial phases. 
+Her key contribution was to find ways to disentangle strategic exits from the 
+development pipeline and exits due to clinical failures.
+She found that overall 8.4\% of all pipeline exits are due to strategic 
+terminations and that the rate of new drug production would be about 23\% 
+higher if those strategic terminations were eliminated
+(\cite{khmelnitskaya_competition_2021}).
+
+% Waring, Arrosmith, Leach, et al (2015)
+% - Atrition of drug candidates from four major pharma companies
+% - Looked at how phisicochemical properties affected clinical failure due to safety issues
+%not in this version
+
+
+
+
+%%%%%%%%% What do we know about drug development incentives?
+
+% Dranov, Garthwaite, and Hermosilla (2022)
+% - does the demand-pull theory of R&D explain novel compound development?
+% - no, it is biased towards follow-on drug R&D.
+
+% Acemoglu and Linn
+% - Market size in innovation
+% - Exogenous demographic trends has a large impact on the entry of non-generic drugs and new molecular entitites.
+On the side of market analysis, %TODO:remove when other sections are written up.
+\citeauthor{acemoglu_market_2004} 
+(\citeyear{acemoglu_market_2004})
+used exogenous demographic changes to show that the
+entry of novel compounds is highly driven by the underlying aged population.
+They estimate that a 1\% increase in applicable demographics increases the
+entry of new drugs by 6\%, mostly concentrated among generics.
+Among non-generics, a 1\% increase in potential market size 
+(as measured by demographic groups) leads to a 4\% increase in novel therapies.
+
+% Gupta
+% - Inperfect intellectual property rights in the pharmaceutical industry
+%\cite{GupaPhd2023} 
+
+% Agarwal and Gaule 2022
+% - Retrospective on impact from COVID-19 pandemic
+% Not in this version

 \end{document}
--- a/Latex/Paper/sections/06_Results.tex
+++ b/Latex/Paper/sections/06_Results.tex
@ -1,12 +1,115 @@
 \documentclass[../Main.tex]{subfiles}
-\graphicspath{{\subfix{Assets/img/}}}

 \begin{document}
-In September of 2019, the European Space Agency (ESA) released a tweet 
-explaining that they had performed an appendicotomy in space using 
-nothing more than radiation from the sun.
-They are sad to announce that the patient died due to complications from 
-exposure to the cold vaccum of space.
+%\subsection{Data Exploration} %TODO: fill this out later.
+%look at trial 
+\subsection{Model Fitting}
+In this section we examine the results from fitting the econometric model using
+mc-stan (\cite{mc-stan}) through the rstan (\cite{rstan}) interface.

+%describe 
+The model was based on the hierarchical logistic regression model 
+presented in the Stan Users Guide (\cite{mc-stan}), 
+and was run with 2,500 warmup iterations and
+2,500 sampling iterations in six chains.
+There were various issues, including 160 divergent transitions and an R-hat 
+value of 1.49. 
+Together these suggest that the econometric model is incorrect as 
+written or requires reparameterization.
+%TODO: and info about how I learned about these diagnostics
+
+
+\subsubsection{Diagnostics}
+%Examine trank plots
+To identify which parameters were problematic, I first looked at trace rank 
+histograms.
+Under ideal circumstances, each line (representing a chain) should exchange 
+places with the other lines frequently.
+In both \cref{fig:mu_trank} and \cref{fig:sigma_trank}, most parameters seem
+to mix well but there are a couple of exceptions.
+This warrants further investigation.
+
+\begin{figure}[H]
+    \includegraphics[width=\textwidth]{../assets/img/mu_trank.png}
+    \caption{Trace Rank Histogram: Mu values}
+	\label{fig:mu_trank}
+\end{figure}
+
+\begin{figure}[H]
+    \includegraphics[width=\textwidth]{../assets/img/sigma_trank.png}
+    \caption{Trace Rank Histogram: Sigma values}
+	\label{fig:sigma_trank}
+\end{figure}
+
+%Take a look at batman and points for mu
+In the case of the Mu values, a parallel coordinates plot 
+doesn't seem to indicate any parameters as likely candidates
+for causing the issues with divergent transitions.
+\begin{figure}[H]
+    \includegraphics[width=\textwidth]{../assets/img/mu_batman.png}
+    \caption{Parallel Coordinate Plot: Mu values}
+	\label{fig:mu_batman}
+\end{figure}
+Note that at each parameter, there is some level of dispersion between 
+values that diverged.
+
+On the other hand, in the parallel coordinates plot for sigma values,
+it appears that most divergent transitions occur with values of 
+sigma[1], sigma[3], sigma[6], and sigma[7] close to zero.
+\begin{figure}[H]
+    \includegraphics[width=\textwidth]{../assets/img/sigma_batman.png}
+    \caption{Parallel Coordinate Plot: Sigma values}
+	\label{fig:sigma_batman}
+\end{figure}
+Overall this suggests that there is an issue with the specification
+of the covariance structures of the hyperparameters.
+
+Additional evidence that the covariance structure is incorrect comes from 
+plotting pairs of parameter values and examining the chains with divergent
+transitions.
+
+\begin{figure}[H]
+    \includegraphics[width=\textwidth]{../assets/img/sigma_pairs_5-9.png}
+	\caption{Parameter Pairs plots: Sigma[5] through Sigma[9]}
+	\label{fig:sigma_pairs_5-9.png}
+\end{figure}
+From this we can see that divergent transitions are concentrated in the cases
+where sigma[6] or sigma[7] is close to zero.
+This has an impact on the shape of both of those estimated parameters, causing
+both to be bimodal.
+
+
+\subsection{Interpretation}
+
+Ignoring the diagnosed issues with the model, we do see some interesting
+preliminary results.
+
+%in mu, mu[5] shifted strongly
+In \cref{fig:mu_posterior} we see that mu[5], the parameter corresponding
+to enrollment, appears to be strongly negative.
+This is consistent with the idea that enrollment close to planned enrollment
+decreases the probability of terminating the trial.
+In \cref{fig:sigma_posterior}, sigma[2] (corresponding to the number of brands
+selling the drug of interest) has a large variance and covers some relatively 
+high values.
+This suggests that the impact of how frequently the drug is sold varies greatly
+across different ICD-10 categories of disease.
+
+
+\begin{figure}[H]
+    \includegraphics[width=\textwidth]{../assets/img/mu_posterior.png}
+	\caption{Posterior Parameter Estimates: Mu}
+	\label{fig:mu_posterior}
+\end{figure}
+
+% Sigma[2] suggests there is a high variance in the impact that the number of drugs on the market has.
+\begin{figure}[H]
+    \includegraphics[width=\textwidth]{../assets/img/sigma_posterior.png}
+	\caption{Posterior Hyperparameter Estimates: Sigma}
+	\label{fig:sigma_posterior}
+\end{figure}
+
+Due to the deficiencies in the data and model, this is the limit of the 
+analysis I will perform at this time.

 \end{document}
--- a/Latex/Paper/sections/07_DataAppendix.tex
+++ b/Latex/Paper/sections/07_DataAppendix.tex
@ -0,0 +1,8 @@
+\documentclass[../Main.tex]{subfiles}
+\graphicspath{{\subfix{Assets/img/}}}
+
+\begin{document}
+\subsection{Appendix 1}\label{Appendix1}
+Insert a table containing the GBD Data Here
+
+\end{document}
--- a/Latex/Paper/sections/08_PotentialImprovements.tex
+++ b/Latex/Paper/sections/08_PotentialImprovements.tex
@ -0,0 +1,101 @@
+\documentclass[../Main.tex]{subfiles}
+\graphicspath{{\subfix{Assets/img/}}}
+
+\begin{document}
+
+As noted above, there are various issues with the analysis as completed so far.
+Below I discuss various steps that I believe will improve the analysis.
+
+\subsection{Increasing number of observations}
+
+The most important step is to increase the number of observations available.
+Currently this requires matching trials to ICD-10 codes by hand, but
+there are certainly some steps that can be taken to improve the speed with which
+this can be done.
+
+\subsection{Covariance Structure}
+
+As noted in the diagnostics section, many of the convergence issues seem
+to occur in the covariance structure. 
+Instead of representing the parameters $\beta$ as independently normal:
+\begin{align}
+    \beta_k(d) \sim \text{Normal}(\mu_k, \sigma_k)
+\end{align}
+I propose using a multivariate normal distribution:
+\begin{align}
+    \beta(d) \sim \text{MvNormal}(\mu, \Sigma)
+\end{align}
+I am not familiar with typical approaches to priors on the covariance matrix,
+so this will require a further literature search as to best practices.
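+As a starting point for that search, and purely as an assumption on my part
+rather than a settled modeling choice, one commonly used decomposition
+separates scales from correlations.
+Here $\tau$ (a vector of scales), $\Omega$ (a correlation matrix), and
+$\eta$ (a shape parameter) are symbols introduced only for this sketch:
+\begin{align}
+    \Sigma = \text{diag}(\tau) \, \Omega \, \text{diag}(\tau) \\
+    \Omega \sim \text{LKJ}(\eta)
+\end{align}
+A related option is a non-centered parameterization, where $L$ denotes the
+Cholesky factor of $\Omega$ (so that $L L^\top = \Omega$) and $z(d)$ is a
+vector of independent standard normal draws, both again hypothetical names:
+\begin{align}
+    \beta(d) = \mu + \text{diag}(\tau) \, L \, z(d) \\
+    z_k(d) \sim \text{Normal}(0, 1)
+\end{align}
+This implies the same distribution for $\beta(d)$ as the multivariate normal
+above, and it may help when divergent transitions cluster near small scale
+values, although I have not yet tested it on this model.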
+
+\subsection{Finding Reasonable Priors}
+
+In standard Bayesian regression, heavy-tailed priors are common. 
+When working with a Bayesian Bernoulli-logit model, this is not appropriate as 
+heavy tails cause the estimated probabilities $p_n$ to concentrate around the 
+values $0$ and $1$, and away from values such as $\frac{1}{2}$ as discussed in
+\cite{mcelreath_statistical_2020}. %TODO: double check the chapter for this.
+
+I intend to take the general approach recommended in \cite{mcelreath_statistical_2020} of using
+prior predictive checks to evaluate the implications of different priors
+for the distribution of $p_n$.
+This would consist of taking the independent variables and predicting the values
+of $p_n$ based on a proposed set of priors. 
+By plotting these predictions, I can ensure that the specific parameter priors 
+used are consistent with my prior beliefs on how $p_n$ behaves.
+Currently I believe that $p_n$ should be roughly uniform or unimodal, centered 
+around $p_n = \frac{1}{2}$.
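+
+To illustrate why the prior scale matters, consider a simplified sketch in
+which the linear predictor $\eta_n = x_n \vec \beta(d_n)$ is assumed to be
+distributed $\text{Normal}(0, s)$ a priori, with $s$ a standard deviation.
+The symbols $\eta_n$ and $s$ are introduced here only for illustration, and
+the normality of $\eta_n$ is an assumption rather than a property of the
+fitted model.
+Then $p = \text{logit}^{-1}(\eta_n)$ has the logit-normal density
+\begin{align}
+    f(p) = \frac{1}{s\sqrt{2\pi}} \, \frac{1}{p(1-p)} \,
+        \exp\left(-\frac{\text{logit}(p)^2}{2 s^2}\right)
+\end{align}
+Setting the derivative of $\ln f(p)$ to zero gives the condition
+\begin{align}
+    \text{logit}(p) = s^2 (2p - 1)
+\end{align}
+which always holds at $p = \frac{1}{2}$.
+Comparing slopes at that point ($4$ on the left, $2 s^2$ on the right)
+suggests that the density is unimodal around $\frac{1}{2}$ when $s^2 \le 2$
+and becomes bimodal, with mass pushed towards $0$ and $1$, when $s^2 > 2$.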
+
+
+\subsection{Imputing Enrollment}
+
+Finally, I must address the issue of how enrollment is reported.
+In many cases, the trial continues to report an anticipated enrollment value
+while the trial is still recruiting.
+Thus using anticipated enrollment figures is inappropriate.
+I am planning on using Bayesian imputation to estimate actual enrollment
+when it has not yet occurred. 
+This will require building a statistical model of the enrollment process.
+One advantage this dataset has is that trial sponsors provide their anticipated
+enrollment numbers, allowing me to use this in the prediction model.
+Additionally, each snapshot contains the elapsed duration and current status of
+the trial, which may help improve the prediction.
+Although predicted enrollment will be imprecise, Bayesian imputation explicitly
+accounts for uncertainty in the imputation and dependent calculations
+(\cite{mcelreath_statistical_2020}).
+
+\subsection{Improving Population Estimates}
+
+The Global Burden of Disease dataset contains the best estimates of disease
+population sizes that I have found so far. 
+Unfortunately, for some conditions it can be relatively imprecise due to 
+its focus on providing data geared towards public health policy.
+For example, GBD contains categories for both
+drug resistant and drug susceptible tuberculosis.
+In contrast, there is no category for non-age related macular degeneration.
+One resulting concern is that for a given ICD-10 code, the applicable GBD population 
+estimates may act as an estimate of the upper bound of population size
+(\cite{global_burden_of_disease_collective_network_global_2020}). %fix citation
+I would like to explicitly address this in my model, although I have not 
+found a way to do so.
+
+
+\subsection{Improving Measures of Market Conditions}
+
+Finally, the currently employed measure of market conditions -- the number of 
+brands using the same active ingredients -- is not a very good measure of 
+the options available to potential participants of a clinical trial.
+The ideal measures would capture the alternatives available to treat a given
+disease (drug meeting the given indication) at the time of the trial snapshot, 
+but this data is hard to come by.
+In addition to the fact that many diseases may be treated by non-pharmaceutical 
+means, off-label prescription of pharmaceuticals is legal at the federal level 
+(\cite{commissioner_understanding_2019}).
+These two facts both complicate measuring market conditions.
+
+One dataset that I have only investigated briefly is the \url{DrugCentral.org}
+database, which tracks official indications as well as some off-label
+indications
+(\cite{ursu_drugcentral_2017}).
+
+
+\end{document}
--- a/Latex/Paper/sections/09_Conclusion.tex
+++ b/Latex/Paper/sections/09_Conclusion.tex
@ -0,0 +1,14 @@
+\documentclass[../Main.tex]{subfiles}
+\graphicspath{{\subfix{Assets/img/}}}
+
+\begin{document}
+Identifying commercial impediments to successfully completing 
+clinical trials in otherwise capable pharmaceuticals will hopefully 
+lead to a more robust and competitive market.
+Although the current state of this research is insufficient to draw robust
+conclusions, early results suggest that enrollment rates have some impact
+on whether or not a clinical trial terminates early or continues
+to full completion.
+
+
+\end{document}
--- a/Latex/Presentation/.presentation.tex.swp
+++ b/Latex/Presentation/.presentation.tex.swp
--- a/Latex/Presentation/presentation.tex
+++ b/Latex/Presentation/presentation.tex
@ -30,12 +30,11 @@



-
 %----------------------------------------------------------------------------------------
 %    TITLE PAGE
 %----------------------------------------------------------------------------------------

-\title[Clinical Trials]{Pharmaceutial competitors and their effect on clinical trial completion} 
+\title[Clinical Trials]{The Effects of Market Conditions on Recruitment and Completion of Clinical Trials} 

 \author{Will King} % Your name
 \institute[WSU] % Your institution as it will appear on the bottom of every slide, may be shorthand to save space
@ -54,17 +53,50 @@ Washington State University \\ % Your institution for the title page
 \begin{frame}
 \titlepage % Print the title page as the first slide
 \end{frame}
-
-
+%----------------------------------
+\begin{frame} %Allow frame breaks
+\frametitle{Clinical Trials} % Table of contents slide, comment this out to remove it
+    % - Intro and hook (Clinical Trials are key part of pharmacological pipeline)
+    Pharmaceuticals are a frequently discussed aspect of health care cost management.
+    Their development is dictated by scientific and regulatory hurdles 
+    including passing clinical trials
+    (\cite{noauthor_fda_nodate}), 
+    while their market is characterized by strategic competition and ambiguous 
+    patent protection
+    (\cite{van_der_gronde_addressing_2017}).
+
+    \vspace{12pt}
+
+    This research investigates the pathways by which market conditions
+    affect clinical trial completion.
+\end{frame}
 %-------------------------------
 \begin{frame}
-    \frametitle{Introduction}
-
+    \frametitle{This research}
+    \textbf{Questions:} 
+    \begin{enumerate}
+        \item Does the existence of alternative drugs on the market make it
+            harder for clinical trials to complete successfully? 
+        \item How much of this occurs due to increased recruitment difficulty?
+    \end{enumerate}
+    
 \end{frame}
-%-------------------------------
+%--------------------------------
+\begin{frame}
+\frametitle{Thanks} % Table of contents slide, comment this out to remove it
+    Thanks to Chris Adams and Rebecca Sachs of the Congressional Budget Office.
+\end{frame}
+%--------------------------------
 \begin{frame}[allowframebreaks] %Allow frame breaks
 \frametitle{Overview} % Table of contents slide, comment this out to remove it
 \tableofcontents 
+% - Intro and hook
+% - Literature review
+% - Causal Identification
+% - Data
+% - Econometric model
+% - Results
+% - Improvements
 \end{frame}
 %-------------------------------

@ -100,11 +132,89 @@ Washington State University \\ % Your institution for the title page
 \end{frame}
 %-----------------------------
 \begin{frame}
-    \frametitle{Literature}
+    \frametitle{Literature Highlights}
+    \begin{itemize}
+        \item \cite{van_der_gronde_addressing_2017}: 
+            High level synthesis of overall discussion regarding drug costs. 
+            Both academic and non-academic sources.
+        \item \cite{hwang_failure_2016}:
+            Answered the question ``Why do late-stage (phase III) trials fail?''
+            Found that efficacy, safety, and commercial or other reasons
+            accounted for 57\%, 17\%, and 22\% respectively.
+        \item \cite{abrantes-metz_pharmaceutical_2004}:
+            Described how drugs progress through the 3 phases of clinical trials
+            and correlations between various trial characteristics and the 
+            clinical trial failures.
+        \item \cite{khmelnitskaya_competition_2021}: 
+            Modeled clinical trial life-cycle of drugs, found method to separate
+            scientific from competitive reasons for failure to progress to the 
+            next phase.
+%        \item \cite{}: 
+
+    \end{itemize}
+\end{frame}
+%-------------------------------
+\begin{frame}
+    \frametitle{This research, in context}
+
+    In contrast to previous work looking at multiple phases of trials,
+    I seek to figure out what causes individual trials to fail.
+
+    \vspace{12pt}
+
+    Instead of focusing on the drug development pipeline, I attempt to 
+    investigate the population of drug-based, phase III trials.
+\end{frame}
+%-------------------------------
+\begin{frame} %Allow frame breaks
+\frametitle{Why this approach?} % Table of contents slide, comment this out to remove it
+
+    \begin{figure}
+        \includegraphics[height=0.8\textheight]{../assets/img/methodology_trial.png}
+        \label{FIG:xkcd2726}
+        \caption{``If you think THAT'S unethical, you should see the stuff we approved via our Placebo IRB.'' 
+        - \url{https://xkcd.com/2726}
+        }
+    \end{figure}
+\end{frame}
+%-------------------------------------------------------------------------------------
+%%%%%%%%%%%%%%%%%%%% Causal Identification / DGP%%%%%%%%%%%%%%%%%%%%%%%%
+\section{Causal Model}
+% Data Generating process
+% - Agents and their decisions
+% - Factors that influence each decision
+% - 
+% - 
+%-------------------------------------------------------------------------------------
+%-------------------------------
+\begin{frame}
+    \frametitle{Data Generating Process}
+    % study sponsors
+    Study sponsors decide whether to start a Phase 3 trial and whether to terminate it.
+    \\
+    They ask themselves:
+    \begin{itemize}
+        \item Do safety incidents require terminating a trial?
+        \item Do efficacy results indicate the trial is worth continuing?
+        \item Is recruiting sufficient to achieve our results and contain costs?
+        \item Do expectations about future returns justify our expenditures?
+    \end{itemize}
+    
+\end{frame}
+%-------------------------------
+\begin{frame}
+    \frametitle{Data Generating Process}
+    % participants
+    Participants decide to enroll (and dis-enroll) themselves in a trial based on
+    \begin{itemize}
+        \item Disease severity
+        \item Relative safety/efficacy compared to other treatments
+    \end{itemize}
+
+    Study sponsors plan their enrollment considering
    \begin{itemize}
-        \item Ekaterina
-        \item Adams
-        \item 
+        \item Total population affected
+        \item Likely participant response rates
    \end{itemize}

 \end{frame}
@ -508,18 +618,357 @@ Washington State University \\ % Your institution for the title page

 %-------------------------------
 \begin{frame}
-    \frametitle{Equilibrium in the Money Market}
-    Interest rates are the price of money, so we need to compare 
-    interest rates to the quantity of money demanded. 
+    \frametitle{Data Generating Process}
+    % Trial Snapshots and dependencies.
+    During a trial, the study sponsor reports snapshots of their trial.
+    This includes updates to:
+
+    \begin{itemize}
+        \item enrollment (actual or anticipated)
+        \item current recruitment status (Recruiting, Active not recruiting, etc)
+        \item study sponsor
+        \item planned completion dates
+        \item elapsed duration
+    \end{itemize}
+
+    Note that final enrollment and the final status (Completed or Terminated) 
+    of the trial are jointly determined.
+\end{frame}
+%-------------------------------
+\begin{frame}
+    \frametitle{Causal Diagram: Key Pathways}
+    % Estimating Direct vs Total Effects
+    \begin{figure}
+        \resizebox{!}{0.5\textheight}{
+            \tikzfig{../assets/tikzit/CausalGraph}
+        }
+        \label{FIG:CausalDiagram}
+        \caption{Causal Diagram highlighting direct and total pathways}
+    \end{figure}
+\end{frame}
+%-------------------------------
+\begin{frame}
+    \frametitle{Causal Diagram: Backdoor Criterion}
+    \small
+    \begin{block}{$d$-separation}
+        A set $S$ of nodes blocks a path $p$ if either
+        \begin{enumerate}
+            \item $p$ contains at least one arrow-emitting node in $S$
+            \item $p$ contains at least one collision node $c$ that is outside $S$ 
+                and has no descendants in $S$.
+        \end{enumerate}
+        If $S$ blocks all paths from $X$ to $Y$, then it is said to ``$d$-separate'' 
+        $X$ and $Y$, and then $X \perp Y \mid S$.
+    \end{block}
+    \begin{block}{Back-Door Criterion}
+        A set $S$ of covariates is admissible as controls on the 
+        causal relationship $X \rightarrow Y$ if:
+        \begin{enumerate}
+            \item No element of $S$ is a descendant of $X$
+            \item The elements of $S$ d-separate all paths from $X$ to $Y$ that include
+                parents of $X$.
+        \end{enumerate}
+    \end{block}
+    \cite{pearl_causality_2000}
+\end{frame}
+%-------------------------------
+\begin{frame}
+    \frametitle{Causal Diagram}
+    Key takeaways
+    \begin{itemize}
+        \item Measuring enrollment prior to trial completion is necessary for causal identification.
+        \item The backdoor criterion gives us the following adjustment sets:
+            \begin{itemize}
+                \item Total Effect for Market on Termination; Population, Condition, Phase III
+                \item Direct Effects for Enrollment, Market on Termination; Population, Condition, Phase III, 
+                    Elapsed Duration, Planned Enrollment
+            \end{itemize}
+        \item Enrollment requires imputation
+    \end{itemize}
+\end{frame}
+%-------------------------------------------------------------------------------------
+%%%%%%%%%%%%%%%%%%%% Data %%%%%%%%%%%%%%%%%%%%%%%%
+\section{Data}
+%-------------------------------------------------------------------------------------
+%----------------------------------
+%%%%%%%%%%%%%%%%%%%% Sources
+\subsection{Sources}
+%----------------------------------
+%-------------------------------
+\begin{frame} %Allow frame breaks
+    \frametitle{Data Sources}
+    \begin{itemize}
+        \item ClinicalTrials.gov - AACT \& custom scripts
+            \begin{itemize}
+                \item Select trials of interest
+                \item Trial details: 
+                    \begin{itemize}
+                        \item conditions
+                        \item final status
+                        \item drugs/interventions
+                    \end{itemize}
+                \item Trial snapshots:
+                    \begin{itemize}
+                        \item enrollment (anticipated, planned, or actual)
+                        \item elapsed duration
+                        \item current status
+                    \end{itemize}
+            \end{itemize}
+        \item Medical Subject Headings (MeSH) Thesaurus
+            \begin{itemize}
+                \item A standardized nomenclature used to classify interventions 
+                    and conditions in the clinical trials database.
+            \end{itemize}
+    \end{itemize}
+\end{frame}
+%-------------------------------
+\begin{frame} %Allow frame breaks
+    \frametitle{Data Sources}
+    \begin{itemize}
+        \item NSDE Files (National Drug Code Structured Product Labeling Data Elements)
+            \begin{itemize}
+                \item Contains information about when a given drug was on the market.
+            \end{itemize}
+        \item RxNorm
+            \begin{itemize}
+                \item Links pharmaceuticals between MeSH standardized terms and 
+                    NSDE files.
+            \end{itemize}
+        \item Global Burden of Disease (GBD) Study (2019)
+            \begin{itemize}
+                \item Estimates of DALYs for categories of disease
+                \item Links of Categories to ICD-10 Codes
+            \end{itemize}
+        \item ICD-10 (2019)
+            \begin{itemize}
+                \item WHO version
+                \item CMS version (ICD-10-CM, Clinical Modification)
+                \item Used to group disease conditions in hierarchical model
+            \end{itemize}
+        \item Unified Medical Language System (UMLS) Metathesaurus
+            \begin{itemize}
+                \item Used to link MeSH standardized terms and ICD-10 conditions
+                \item Manual matching process
+            \end{itemize}
+    \end{itemize}
+\end{frame}
+%----------------------------------
+%%%%%%%%%%%%%%%%%%%% Integration
+\subsection{Integration}
+%----------------------------------
+%-------------------------------
+\begin{frame}
+    \frametitle{Data Summaries}
+    %put summaries now
+    \begin{itemize}
+        \item Number of Phase III, FDA-monitored Drug Trials: 1,981
+        \item Number of Trials matched to ICD-10: 186
+        \item Number of Trials matched to ICD-10 with population measures: 67 
+            (51 completed, 16 terminated)
+        \item Number of Snapshots: 616
+    \end{itemize}
+\end{frame}
+%-------------------------------
+\begin{frame}
+    \frametitle{Data used}
+    The following variables were used.
+    \begin{itemize}
+        \item elapsed duration 
+        \item asinh(number of brands)
+        \item asinh(high sdi DALY estimate)
+        \item asinh(high-medium sdi DALY estimate)
+        \item asinh(medium sdi DALY estimate)
+        \item asinh(low-medium sdi DALY estimate)
+        \item asinh(low sdi DALY estimate)
+    \end{itemize}
+    The asinh transformation was used because it tracks $\ln(x)$ (up to a constant) for
+    large values of $x$ but is also defined at zero, where $\text{asinh}(0)=0$.
+\end{frame}
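+%----------------------------------
+% Supplementary sketch, not part of the original talk: a short derivation of
+% why asinh behaves like a shifted logarithm for large inputs. It uses only
+% the standard closed form of asinh; nothing here is estimated from the data.
+\begin{frame}
+    \frametitle{Data used: asinh and ln (sketch)}
+    A brief justification of the transformation, using the closed form of asinh:
+    \begin{align*}
+        \text{asinh}(x) &= \ln\left(x + \sqrt{x^2 + 1}\right) \\
+                        &\approx \ln(2x) = \ln(x) + \ln(2) \quad \text{for large } x, \\
+        \text{asinh}(0) &= \ln(1) = 0.
+    \end{align*}
+    For large brand counts or DALY estimates the transform therefore differs from
+    $\ln(x)$ by roughly the constant $\ln(2) \approx 0.69$, while zeros stay defined.
+\end{frame}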
+%----------------------------------
+\begin{frame}
+    \frametitle{Summaries: Trial Durations}
+    \begin{figure}
+        \includegraphics[height=0.8\textheight]{../assets/img/2023-04-12_durations_hist.png}
+        \caption{Trial Durations (days)}
+        \label{FIG:durations}
+    \end{figure}
+\end{frame}
+%----------------------------------
+\begin{frame}
+    \frametitle{Summaries: Snapshots}
+    \begin{figure}
+        \includegraphics[height=0.8\textheight]{../assets/img/2023-04-12_snapshots_hist.png}
+        \caption{Number of Snapshots per matched trial}
+        \label{FIG:snapshots}
+    \end{figure}
+\end{frame}
+%----------------------------------
+\begin{frame}
+    \frametitle{Summaries: Snapshots}
+    \begin{figure}
+        \includegraphics[height=0.8\textheight]{../assets/img/2023-04-12_status_duration_snapshots_points.png}
+        \caption{Scatterplot of snapshot count and durations}
+        \label{FIG:snapshot_duration_scatter}
+    \end{figure}
+\end{frame}
+%-------------------------------------------------------------------------------------
+%%%%%%%%%%%%%%%%%%%% Econometric Model %%%%%%%%%%%%%%%%%%%%%%%%
+\section{Econometric model}
+%-------------------------------------------------------------------------------------
+%-------------------------------
+\begin{frame}
+    \frametitle{Econometric Model}
+    Estimating the total effect of market conditions (number of brands) on termination:
+    \begin{align}
+        y_n &\sim \text{Bernoulli}(p_n) \\
+        p_n &= \text{logistic}(x_n \cdot \beta(d_n)) \\
+        \beta_k(d) &\sim \text{Normal}(\mu_k, \sigma_k) \\
+        \mu_k &\sim \text{Normal}(0,1) \\
+        \sigma_k &\sim \text{Gamma}(2,1)
+    \end{align}
+    Here $k$ indexes parameters and $d_n$ denotes the ICD-10 group to which the trial corresponds.
+\end{frame}
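+%-------------------------------
+% Supplementary sketch, not part of the original talk: the second line of the
+% model written out in full. Assumptions (not confirmed by the slides): the
+% link is the standard inverse-logit function, the product of x_n and
+% beta(d_n) is an inner product, and K (introduced here) is the number of
+% covariates, which would be 7 if it matches the list of variables used.
+\begin{frame}
+    \frametitle{Econometric Model: Linear Predictor (sketch)}
+    Assuming a standard inverse-logit link and an inner product over $K$
+    covariates, the termination probability can be written out as
+    \begin{equation*}
+        p_n = \frac{1}{1 + \exp\left(-\sum_{k=1}^{K} x_{n,k} \, \beta_k(d_n)\right)}.
+    \end{equation*}
+    Each ICD-10 group $d$ then has its own coefficient vector $\beta(d)$, and the
+    coefficients $\beta_k(d)$ are partially pooled toward the shared $\mu_k$ with
+    spread governed by $\sigma_k$.
+\end{frame}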
+%-------------------------------------------------------------------------------------
+%%%%%%%%%%%%%%%%%%%% Results %%%%%%%%%%%%%%%%%%%%%%%%
+\section{Results}
+%-------------------------------------------------------------------------------------
+%-------------------------------
+\begin{frame}
+    \frametitle{Results}
+    Because Bayesian estimation is done numerically here, we first
+    check convergence.
+
+    Then we look at preliminary results.
+
+    Sampling details
+    \begin{itemize}
+        \item 6 chains
+        \item 2,500 warm-up and 2,500 sampling iterations
+        \item seed = 11021585
+    \end{itemize}
+\end{frame}
+%----------------------------------
+%%%%%%%%%%%%%%%%%%%% Convergence Tests
+\subsection{Convergence}
+%----------------------------------
+%-------------------------------
+\begin{frame}
+    \frametitle{Warnings}
+
+    \begin{itemize}
+        \item There were no divergent transitions.
+        \item There were 15,000 transitions that exceeded the maximum treedepth.
+            Sampling efficiency is poor.
+        \item All chains had a low Bayesian Fraction of Missing Information.
+            Some areas of the distribution were poorly explored.
+        \item $\hat{R} = 1.23$; the ideal is close to 1, so the chains did not mix well.
+        \item Bulk and tail effective sample sizes were low,
+            suggesting mean and variance/quantile estimates will be unreliable.
+    \end{itemize}
+    \cite{mc-stan}
+\end{frame}
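+%-------------------------------
+% Supplementary sketch, not part of the original talk: a quick arithmetic
+% check on the treedepth warning. Assumptions (not stated on the slides): the
+% 2,500 sampling iterations are per chain, and the warning counts only
+% post-warm-up transitions.
+\begin{frame}
+    \frametitle{Warnings: Treedepth Arithmetic (sketch)}
+    A quick check, assuming the sampling iterations are counted per chain and
+    the warning covers only post-warm-up transitions:
+    \begin{equation*}
+        6 \text{ chains} \times 2{,}500 \text{ sampling iterations} = 15{,}000 \text{ transitions}.
+    \end{equation*}
+    Under those assumptions, every post-warm-up transition reached the maximum
+    treedepth, which is consistent with the poor mixing reported on the previous slide.
+\end{frame}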
+%-------------------------------
+\begin{frame}
+    \frametitle{Convergence: Mu}
+    \begin{figure}
+        \includegraphics[height=0.9\textheight]{../assets/img/2023-04-11_mu_points.png}
+        \caption{Hyperparameter Points Plots: Mu}
+        \label{FIG:mu_points}
+    \end{figure}
+\end{frame}
+%-------------------------------
+\begin{frame}
+    \frametitle{Convergence: Sigma}
+    \begin{figure}
+        \includegraphics[height=0.8\textheight]{../assets/img/2023-04-11_sigma_points.png}
+        \caption{Hyperparameter Points Plots: Sigma}
+        \label{FIG:sigma_points}
+    \end{figure}
+\end{frame}
+%----------------------------------
+%%%%%%%%%%%%%%%%%%%% Preliminary Results
+\subsection{Preliminary Results}
+%----------------------------------
+%-------------------------------
+\begin{frame}
+    \frametitle{Preliminary Results: Mu}

+    \begin{columns}
+        \begin{column}{0.3\textwidth}
+            \begin{enumerate}
+                \item elapsed duration 
+                \item asinh(n\_brands)
+                \item asinh(high sdi)
+                \item asinh(high-medium sdi)
+                \item asinh(medium sdi)
+                \item asinh(low-medium sdi)
+                \item asinh(low sdi)
+            \end{enumerate}
+        \end{column}
+        \begin{column}{0.7\textwidth}
+            \begin{figure}
+                \includegraphics[height=0.8\textheight]{../assets/img/2023-04-11_mu_dist.png}
+                \caption{Hyperparameter Distribution: Mu}
+                \label{FIG:mu_dist}
+            \end{figure}
+        \end{column}
+    \end{columns}
+\end{frame}
+%-------------------------------
+\begin{frame}
+    \frametitle{Preliminary Results: Sigma}

    \begin{figure}
-%        \tikzfig{../Assets/tikzit/}
-        \label{FIG:costs}
-        \caption{Money Market Equilibrium}
+        \includegraphics[height=0.8\textheight]{../assets/img/2023-04-11_sigma_dist.png}
+        \caption{Hyperparameter Distribution: Sigma}
+        \label{FIG:sigma_dist}
    \end{figure}
 \end{frame}
 %-------------------------------
+\begin{frame}
+    \frametitle{Interpretation}
+    All of the following interpretations are made in the context of insufficient data.
+
+    \begin{enumerate}
+        \item Elapsed Duration (Mu[1]): trending negative, i.e., reduced probability of termination.
+        \item Number of Brands (Mu[2]): trending positive, i.e., increased probability of termination.
+        \item Population Measures (Mu[3]--Mu[7])
+            \begin{enumerate}
+                \item Most surprising is that these are both positive and negative.
+                    More data are probably needed.
+            \end{enumerate}
+        \item It is surprising to see the wide distribution in sigma values.
+    \end{enumerate}
+\end{frame}
+%-------------------------------------------------------------------------------------
+%%%%%%%%%%%%%%%%%%%% Improvements %%%%%%%%%%%%%%%%%%%%%%%%
+\section{Improvements}
+%-------------------------------------------------------------------------------------
+%-------------------------------
+\begin{frame}
+    \frametitle{Proposed improvements}
+    \begin{enumerate}
+        \item Match more trials to ICD-10 codes
+        \item Improve Measures of Market Conditions
+        \item Adjust Covariance Structure
+        \item Find Reasonable Priors
+        \item Remove disease categories that don't exist in the data from the priors
+        \item Impute Enrollment
+        \item Improve Population Estimates
+    \end{enumerate}
+\end{frame}
+%-------------------------------
+\begin{frame}
+    \frametitle{Questions?}
+    \begin{center}\huge Questions?\end{center}
+
+\end{frame}
+%-------------------------------
+\begin{frame}[allowframebreaks]
+    \frametitle{Bibliography}
+    \printbibliography
+\end{frame}
+%-------------------------------
 \end{document} 
 %=========================================
 %\begin{frame}
--- a/Latex/Presentation/presentation.tex.bak
+++ b/Latex/Presentation/presentation.tex.bak
@ -0,0 +1,573 @@
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+% Beamer Presentation
+% LaTeX Template
+% Version 1.0 (10/11/12)
+%
+% This template has been downloaded from:
+% http://www.LaTeXTemplates.com
+%
+% License:
+% CC BY-NC-SA 3.0 (http://creativecommons.org/licenses/by-nc-sa/3.0/)
+%
+% Changed theme to WSU by William King
+%
+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+%----------------------------------------------------------------------------------------
+%	PACKAGES AND THEMES
+%----------------------------------------------------------------------------------------
+
+\documentclass[xcolor=dvipsnames,aspectratio=169]{beamer}
+
+
+%Import Preamble bits
+\input{../assets/preambles/FormattingPreamble.tex}
+\input{../assets/preambles/TikzitPreamble.tex}
+\input{../assets/preambles/MathPreamble.tex}
+\input{../assets/preambles/BibPreamble.tex}
+\input{../assets/preambles/GeneralPreamble.tex}
+
+
+
+
+%----------------------------------------------------------------------------------------
+%	TITLE PAGE
+%----------------------------------------------------------------------------------------
+
+\title[Clinical Trials]{The Effects of Market Conditions on Recruitment and Completion of Clinical Trials} 
+
+\author{Will King} % Your name
+\institute[WSU] % Your institution as it will appear on the bottom of every slide, may be shorthand to save space
+{
+Washington State University \\ % Your institution for the title page
+\medskip
+\textit{william.f.king@wsu.edu} % Your email address
+}
+\date{\today} % Date, can be changed to a custom date
+
+
+
+
+
+\begin{document}
+\begin{frame}
+\titlepage % Print the title page as the first slide
+\end{frame}
+%----------------------------------
+\begin{frame} %Allow frame breaks
+\frametitle{Clinical Trials} % Table of contents slide, comment this out to remove it
+    % - Intro and hook (Clinical Trials are key part of pharmacological pipeline)
+    Pharmaceuticals are a frequently discussed aspect of health care cost management.
+    Their development is dictated by scientific and regulatory hurdles 
+    including passing clinical trials
+    (\cite{noauthor_fda_nodate}), 
+    while their market is characterized by strategic competition and ambiguous 
+    patent protection
+    (\cite{van_der_gronde_addressing_2017}).
+
+    \vspace{12pt}
+
+    This research investigates the pathways by which market conditions
+    affect clinical trial completion.
+\end{frame}
+%-------------------------------
+\begin{frame}
+    \frametitle{This research}
+    \textbf{Questions:} 
+    \begin{enumerate}
+        \item Does the existence of alternative drugs on the market make it
+            harder for clinical trials to complete successfully? 
+        \item How much of this occurs due to increased recruitment difficulty?
+    \end{enumerate}
+    
+\end{frame}
+%--------------------------------
+\begin{frame}
+\frametitle{Thanks} % Table of contents slide, comment this out to remove it
+    Thanks to Chris Adams and Rebecca Sachs of the Congressional Budget Office.
+\end{frame}
+%--------------------------------
+\begin{frame}[allowframebreaks] %Allow frame breaks
+\frametitle{Overview} % Table of contents slide, comment this out to remove it
+\tableofcontents 
+% - Intro and hook
+% - Literature review
+% - Causal Identification
+% - Data
+% - Econometric model
+% - Results
+% - Improvements
+\end{frame}
+%-------------------------------------------------------------------------------------
+%%%%%%%%%%%%%%%%%%%% Lit Review %%%%%%%%%%%%%%%%%%%%%%%%
+\section{Lit Review}
+% First slide:
+%-------------------------------------------------------------------------------------
+%-------------------------------
+\begin{frame}
+    \frametitle{Literature Highlights}
+    \begin{itemize}
+        \item \cite{van_der_gronde_addressing_2017}: 
+            High-level synthesis of the overall discussion regarding drug costs,
+            drawing on both academic and non-academic sources.
+        \item \cite{hwang_failure_2016}:
+            Answered the question ``Why do late-stage (phase III) trials fail?''
+            Found that efficacy, safety, and competition reasons accounted for
+            57\%, 17\%, and 22\%, respectively.
+        \item \cite{abrantes-metz_pharmaceutical_2004}:
+            Described how drugs progress through the 3 phases of clinical trials
+            and correlations between various trial characteristics and
+            clinical trial failures.
+        \item \cite{khmelnitskaya_competition_2021}: 
+            Modeled clinical trial lifecycle of drugs, found method to separate
+            scientific from competitive reasons for failure to progress to the 
+            next phase.
+%        \item \cite{}: 
+
+    \end{itemize}
+\end{frame}
+%-------------------------------
+\begin{frame}
+    \frametitle{This research, in context}
+
+    In contrast to previous work looking at multiple phases of trials,
+    I seek to identify what causes individual trials to fail.
+
+    \vspace{12pt}
+
+    Instead of focusing on the drug development pipeline, I attempt to 
+    investigate the population of drug-based, phase III trials.
+\end{frame}
+%-------------------------------
+\begin{frame} %Allow frame breaks
+\frametitle{Why this approach?} % Table of contents slide, comment this out to remove it
+
+    \begin{figure}
+        \includegraphics[height=0.8\textheight]{../assets/img/methodology_trial.png}
+        \caption{``If you think THAT'S unethical, you should see the stuff we approved via our Placebo IRB.'' 
+        - \url{https://xkcd.com/2726}
+        }
+        \label{FIG:xkcd2726}
+    \end{figure}
+\end{frame}
+%-------------------------------------------------------------------------------------
+%%%%%%%%%%%%%%%%%%%% Causal Identification / DGP%%%%%%%%%%%%%%%%%%%%%%%%
+\section{Causal Model}
+% Data Generating process
+% - Agents and their decisions
+% - Factors that influence each decision
+% - 
+% - 
+%-------------------------------------------------------------------------------------
+%-------------------------------
+\begin{frame}
+    \frametitle{Data Generating Process}
+    % study sponsors
+    Study sponsors decide whether to start a Phase III trial and whether to terminate it.
+    \\
+    They ask themselves:
+    \begin{itemize}
+        \item Do safety incidents require terminating a trial?
+        \item Do efficacy results indicate the trial is worth continuing?
+        \item Is recruiting sufficient to achieve our results and contain costs?
+        \item Do expectations about future returns justify our expenditures?
+    \end{itemize}
+    
+\end{frame}
+%-------------------------------
+\begin{frame}
+    \frametitle{Data Generating Process}
+    % participants
+    Participants decide to enroll (and disenroll) themselves in a trial based on:
+    \begin{itemize}
+        \item Disease severity
+        \item Relative safety/efficacy compared to other treatments
+    \end{itemize}
+
+    Study sponsors plan their enrollment considering
+    \begin{itemize}
+        \item Total population affected
+        \item Likely participant response rates
+    \end{itemize}
+
+\end{frame}
+%-------------------------------
+\begin{frame}
+    \frametitle{Data Generating Process}
+    % Trial Snapshots and dependencies.
+    During a trial, the study sponsor reports snapshots of their trial.
+    This includes updates to:
+
+    \begin{itemize}
+        \item enrollment (actual or anticipated)
+        \item current recruitment status (Recruiting, Active not recruiting, etc.)
+        \item study sponsor
+        \item planned completion dates
+        \item elapsed duration
+    \end{itemize}
+
+    Note that final enrollment and the final status (Completed or Terminated) 
+    of the trial are jointly determined.
+\end{frame}
+%-------------------------------
+\begin{frame}
+    \frametitle{Causal Diagram: Key Pathways}
+    % Estimating Direct vs Total Effects
+    \begin{figure}
+        \resizebox{!}{0.5\textheight}{
+            \tikzfig{../assets/tikzit/CausalGraph}
+        }
+        \caption{Causal Diagram highlighting direct and total pathways}
+        \label{FIG:CausalDiagram}
+    \end{figure}
+\end{frame}
+%-------------------------------
+\begin{frame}
+    \frametitle{Causal Diagram: Backdoor Criterion}
+    \small
+    \begin{block}{$d$-separation}
+        A set $S$ of nodes blocks a path $p$ if either
+        \begin{enumerate}
+            \item $p$ contains at least one arrow-emitting node in $S$, or
+            \item $p$ contains at least one collision node $c$ that is outside $S$
+                and has no descendants in $S$.
+        \end{enumerate}
+        If $S$ blocks all paths from $X$ to $Y$, then it is said to ``$d$-separate''
+        $X$ and $Y$, and then $X \perp Y \mid S$.
+    \end{block}
+    \begin{block}{Back-Door Criterion}
+        A set $S$ of covariates is admissible as controls on the
+        causal relationship $X \rightarrow Y$ if:
+        \begin{enumerate}
+            \item No element of $S$ is a descendant of $X$, and
+            \item The elements of $S$ block all paths between $X$ and $Y$ that contain
+                an arrow into $X$.
+        \end{enumerate}
+    \end{block}
+    \cite{pearl_causality_2000}
+\end{frame}
+%-------------------------------
+\begin{frame}
+    \frametitle{Causal Diagram}
+    Key takeaways
+    \begin{itemize}
+        \item Measuring enrollment prior to trial completion is necessary for causal identification.
+        \item The backdoor criterion gives us the following adjustment sets:
+            \begin{itemize}
+                \item Total effect of Market on Termination: Population, Condition, Phase III
+                \item Direct effects of Enrollment and Market on Termination: Population, Condition, Phase III,
+                    Elapsed Duration, Planned Enrollment
+            \end{itemize}
+        \item Enrollment requires imputation
+    \end{itemize}
+\end{frame}
+%-------------------------------------------------------------------------------------
+%%%%%%%%%%%%%%%%%%%% Data %%%%%%%%%%%%%%%%%%%%%%%%
+\section{Data}
+%-------------------------------------------------------------------------------------
+%----------------------------------
+%%%%%%%%%%%%%%%%%%%% Sources
+\subsection{Sources}
+%----------------------------------
+%-------------------------------
+\begin{frame} %Allow frame breaks
+    \frametitle{Data Sources}
+    \begin{itemize}
+        \item ClinicalTrials.gov - AACT \& custom scripts
+            \begin{itemize}
+                \item Select trials of interest
+                \item Trial details: 
+                    \begin{itemize}
+                        \item conditions
+                        \item final status
+                        \item drugs/interventions
+                    \end{itemize}
+                \item Trial snapshots:
+                    \begin{itemize}
+                        \item enrollment (anticipated, planned, or actual)
+                        \item elapsed duration
+                        \item current status
+                    \end{itemize}
+            \end{itemize}
+        \item Medical Subject Headings (MeSH) Thesaurus
+            \begin{itemize}
+                \item A standardized nomenclature used to classify interventions 
+                    and conditions in the clinical trials database.
+            \end{itemize}
+    \end{itemize}
+\end{frame}
+%-------------------------------
+\begin{frame} %Allow frame breaks
+    \frametitle{Data Sources}
+    \begin{itemize}
+        \item NSDE Files (National Drug Code Structured Product Labeling Data Elements)
+            \begin{itemize}
+                \item Contains information about when a given drug was on the market.
+            \end{itemize}
+        \item RxNorm
+            \begin{itemize}
+                \item Links pharmaceuticals between MeSH standardized terms and 
+                    NSDE files.
+            \end{itemize}
+        \item Global Burden of Disease (GBD) Study (2019)
+            \begin{itemize}
+                \item Estimates of DALYs for categories of disease
+                \item Links of Categories to ICD-10 Codes
+            \end{itemize}
+        \item ICD-10 (2019)
+            \begin{itemize}
+                \item WHO version
+                \item CMS version (ICD-10-CM, Clinical Modification)
+                \item Used to group disease conditions in hierarchical model
+            \end{itemize}
+        \item Unified Medical Language System (UMLS) Metathesaurus
+            \begin{itemize}
+                \item Used to link MeSH standardized terms and ICD-10 conditions
+                \item Manual matching process
+            \end{itemize}
+    \end{itemize}
+\end{frame}
+%----------------------------------
+%%%%%%%%%%%%%%%%%%%% Integration
+\subsection{Integration}
+%----------------------------------
+%-------------------------------
+\begin{frame}
+    \frametitle{Data Summaries}
+    %put summaries now
+    \begin{itemize}
+        \item Number of Phase III, FDA-monitored Drug Trials: 1,981
+        \item Number of Trials matched to ICD-10: 186
+        \item Number of Trials matched to ICD-10 with population measures: 67 
+            (51 completed, 16 terminated)
+        \item Number of Snapshots: 616
+    \end{itemize}
+\end{frame}
+%-------------------------------
+\begin{frame}
+    \frametitle{Data used}
+    The following variables were used.
+    \begin{itemize}
+        \item elapsed duration 
+        \item asinh(number of brands)
+        \item asinh(high sdi DALY estimate)
+        \item asinh(high-medium sdi DALY estimate)
+        \item asinh(medium sdi DALY estimate)
+        \item asinh(low-medium sdi DALY estimate)
+        \item asinh(low sdi DALY estimate)
+    \end{itemize}
+    The asinh transformation was used because it tracks $\ln(x)$ (up to a constant) for
+    large values of $x$ but is also defined at zero, where $\text{asinh}(0)=0$.
+\end{frame}
+%----------------------------------
+\begin{frame}
+    \frametitle{Summaries: Trial Durations}
+    \begin{figure}
+        \includegraphics[height=0.8\textheight]{../assets/img/2023-04-12_durations_hist.png}
+        \caption{Trial Durations (days)}
+        \label{FIG:durations}
+    \end{figure}
+\end{frame}
+%----------------------------------
+\begin{frame}
+    \frametitle{Summaries: Snapshots}
+    \begin{figure}
+        \includegraphics[height=0.8\textheight]{../assets/img/2023-04-12_snapshots_hist.png}
+        \caption{Number of Snapshots per matched trial}
+        \label{FIG:snapshots}
+    \end{figure}
+\end{frame}
+%----------------------------------
+\begin{frame}
+    \frametitle{Summaries: Snapshots}
+    \begin{figure}
+        \includegraphics[height=0.8\textheight]{../assets/img/2023-04-12_status_duration_snapshots_points.png}
+        \caption{Scatterplot of snapshot count and durations}
+        \label{FIG:snapshot_duration_scatter}
+    \end{figure}
+\end{frame}
+%-------------------------------------------------------------------------------------
+%%%%%%%%%%%%%%%%%%%% Econometric Model %%%%%%%%%%%%%%%%%%%%%%%%
+\section{Econometric model}
+%-------------------------------------------------------------------------------------
+%-------------------------------
+\begin{frame}
+    \frametitle{Econometric Model}
+    Estimating the total effect of market conditions (number of brands) on termination:
+    \begin{align}
+        y_n &\sim \text{Bernoulli}(p_n) \\
+        p_n &= \text{logistic}(x_n \cdot \beta(d_n)) \\
+        \beta_k(d) &\sim \text{Normal}(\mu_k, \sigma_k) \\
+        \mu_k &\sim \text{Normal}(0,1) \\
+        \sigma_k &\sim \text{Gamma}(2,1)
+    \end{align}
+    Here $k$ indexes parameters and $d_n$ denotes the ICD-10 group to which the trial corresponds.
+\end{frame}
+%-------------------------------------------------------------------------------------
+%%%%%%%%%%%%%%%%%%%% Results %%%%%%%%%%%%%%%%%%%%%%%%
+\section{Results}
+%-------------------------------------------------------------------------------------
+%-------------------------------
+\begin{frame}
+    \frametitle{Results}
+    Because Bayesian estimation is done numerically here, we first
+    check convergence.
+
+    Then we look at preliminary results.
+
+    Sampling details
+    \begin{itemize}
+        \item 6 chains
+        \item 2,500 warm-up and 2,500 sampling iterations
+        \item seed = 11021585
+    \end{itemize}
+\end{frame}
+%----------------------------------
+%%%%%%%%%%%%%%%%%%%% Convergence Tests
+\subsection{Convergence}
+%----------------------------------
+%-------------------------------
+\begin{frame}
+    \frametitle{Warnings}
+
+    \begin{itemize}
+        \item There were no divergent transitions.
+        \item There were 15,000 transitions that exceeded the maximum treedepth.
+            Sampling efficiency is poor.
+        \item All chains had a low Bayesian Fraction of Missing Information.
+            Some areas of the distribution were poorly explored.
+        \item $\hat{R} = 1.23$; the ideal is close to 1, so the chains did not mix well.
+        \item Bulk and tail effective sample sizes were low,
+            suggesting mean and variance/quantile estimates will be unreliable.
+    \end{itemize}
+    \cite{mc-stan}
+\end{frame}
+%-------------------------------
+\begin{frame}
+    \frametitle{Convergence: Mu}
+    \begin{figure}
+        \includegraphics[height=0.9\textheight]{../assets/img/2023-04-11_mu_points.png}
+        \caption{Hyperparameter Points Plots: Mu}
+        \label{FIG:mu_points}
+    \end{figure}
+\end{frame}
+%-------------------------------
+\begin{frame}
+    \frametitle{Convergence: Sigma}
+    \begin{figure}
+        \includegraphics[height=0.8\textheight]{../assets/img/2023-04-11_sigma_points.png}
+        \caption{Hyperparameter Points Plots: Sigma}
+        \label{FIG:sigma_points}
+    \end{figure}
+\end{frame}
+%----------------------------------
+%%%%%%%%%%%%%%%%%%%% Preliminary Results
+\subsection{Preliminary Results}
+%----------------------------------
+%-------------------------------
+\begin{frame}
+    \frametitle{Preliminary Results: Mu}
+
+    \begin{columns}
+        \begin{column}{0.3\textwidth}
+            \begin{enumerate}
+                \item elapsed duration 
+                \item asinh(n\_brands)
+                \item asinh(high sdi)
+                \item asinh(high-medium sdi)
+                \item asinh(medium sdi)
+                \item asinh(low-medium sdi)
+                \item asinh(low sdi)
+            \end{enumerate}
+        \end{column}
+        \begin{column}{0.7\textwidth}
+            \begin{figure}
+                \includegraphics[height=0.8\textheight]{../assets/img/2023-04-11_mu_dist.png}
+                \caption{Hyperparameter Distribution: Mu}
+                \label{FIG:mu_dist}
+            \end{figure}
+        \end{column}
+    \end{columns}
+\end{frame}
+%-------------------------------
+\begin{frame}
+    \frametitle{Preliminary Results: Sigma}
+
+    \begin{figure}
+        \includegraphics[height=0.8\textheight]{../assets/img/2023-04-11_sigma_dist.png}
+        \caption{Hyperparameter Distribution: Sigma}
+        \label{FIG:sigma_dist}
+    \end{figure}
+\end{frame}
+%-------------------------------
+\begin{frame}
+    \frametitle{Interpretation}
+    All of the following interpretations are made in the context of insufficient data.
+
+    \begin{enumerate}
+        \item Elapsed Duration (Mu[1]): trending negative, i.e., reduced probability of termination.
+        \item Number of Brands (Mu[2]): trending positive, i.e., increased probability of termination.
+        \item Population Measures (Mu[3]--Mu[7])
+            \begin{enumerate}
+                \item Most surprising is that these are both positive and negative.
+                    More data are probably needed.
+            \end{enumerate}
+        \item It is surprising to see the wide distribution in sigma values.
+    \end{enumerate}
+\end{frame}
+%-------------------------------------------------------------------------------------
+%%%%%%%%%%%%%%%%%%%% Improvements %%%%%%%%%%%%%%%%%%%%%%%%
+\section{Improvements}
+%-------------------------------------------------------------------------------------
+%-------------------------------
+\begin{frame}
+    \frametitle{Proposed improvements}
+    \begin{enumerate}
+        \item Match more trials to ICD-10 codes
+        \item Improve Measures of Market Conditions
+        \item Adjust Covariance Structure
+        \item Find Reasonable Priors
+        \item Remove disease categories that don't exist in the data from the priors
+        \item Impute Enrollment
+        \item Improve Population Estimates
+    \end{enumerate}
+\end{frame}
+%-------------------------------
+\begin{frame}
+    \frametitle{Questions?}
+    \begin{center}\huge Questions?\end{center}
+
+\end{frame}
+%-------------------------------
+\begin{frame}[allowframebreaks]
+    \frametitle{Bibliography}
+    \printbibliography
+\end{frame}
+%-------------------------------
+\end{document} 
+%=========================================
+%\begin{frame}
+%    \frametitle{MarginalRevenue}
+%    \begin{figure}
+%        \tikzfig{../Assets/owned/ch8_MarginalRevenue}
+%        \includegraphics[height=\textheight]{../Assets/copyrighted/KrugmanObsterfeldMeliz_fig8-7.jpg}
+%        \label{FIG:costs}
+%        \caption{Average Cost Curve as firms enter.}
+%    \end{figure}
+%\end{frame}
+%-------------------------------
+%\begin{frame}
+%    \frametitle{Columns}
+%     \begin{columns}
+%        \begin{column}{0.5\textwidth}
+%        \end{column}
+%        \begin{column}{0.5\textwidth}
+%               \begin{figure}
+%                   \tikzfig{../Assets/owned/ch7_EstablishedAdvantageExample2}
+%                   \label{FIG:costs}
+%                   \caption{Setting the Stage}
+%               \end{figure}
+%        \end{column}
+%     \end{columns}
+%\end{frame}
+% %---------------------------------------------------------------
--- a/Latex/assets/img/0000163.png
+++ b/Latex/assets/img/0000163.png
--- a/Latex/assets/img/0000164.png
+++ b/Latex/assets/img/0000164.png
--- a/Latex/assets/img/000016a.png
+++ b/Latex/assets/img/000016a.png
--- a/Latex/assets/img/000016b.png
+++ b/Latex/assets/img/000016b.png
--- a/Latex/assets/img/000018.png
+++ b/Latex/assets/img/000018.png
--- a/Latex/assets/img/00001a.png
+++ b/Latex/assets/img/00001a.png
--- a/Latex/assets/img/00001c.png
+++ b/Latex/assets/img/00001c.png
--- a/Latex/assets/img/2023-04-11_mu_dist.png
+++ b/Latex/assets/img/2023-04-11_mu_dist.png
--- a/Latex/assets/img/2023-04-11_mu_hist.png
+++ b/Latex/assets/img/2023-04-11_mu_hist.png
--- a/Latex/assets/img/2023-04-11_mu_points.png
+++ b/Latex/assets/img/2023-04-11_mu_points.png
--- a/Latex/assets/img/2023-04-11_sigma_dist.png
+++ b/Latex/assets/img/2023-04-11_sigma_dist.png
--- a/Latex/assets/img/2023-04-11_sigma_hist.png
+++ b/Latex/assets/img/2023-04-11_sigma_hist.png
--- a/Latex/assets/img/2023-04-11_sigma_points.png
+++ b/Latex/assets/img/2023-04-11_sigma_points.png
--- a/Latex/assets/img/2023-04-12_durations_hist.png
+++ b/Latex/assets/img/2023-04-12_durations_hist.png
--- a/Latex/assets/img/2023-04-12_snapshots_hist.png
+++ b/Latex/assets/img/2023-04-12_snapshots_hist.png
--- a/Latex/assets/img/2023-04-12_status_duration_snapshots_points.png
+++ b/Latex/assets/img/2023-04-12_status_duration_snapshots_points.png
--- a/Latex/assets/img/dagitty-model.jpg
+++ b/Latex/assets/img/dagitty-model.jpg
--- a/Latex/assets/img/dagitty-model.svg
+++ b/Latex/assets/img/dagitty-model.svg
--- a/Latex/assets/img/methodology_trial.png
+++ b/Latex/assets/img/methodology_trial.png
--- a/Latex/assets/img/mu_batman.png
+++ b/Latex/assets/img/mu_batman.png
--- a/Latex/assets/img/mu_mix.png
+++ b/Latex/assets/img/mu_mix.png
--- a/Latex/assets/img/mu_posterior.png
+++ b/Latex/assets/img/mu_posterior.png
--- a/Latex/assets/img/mu_trank.png
+++ b/Latex/assets/img/mu_trank.png
--- a/Latex/assets/img/sigma_batman.png
+++ b/Latex/assets/img/sigma_batman.png
--- a/Latex/assets/img/sigma_mix.png
+++ b/Latex/assets/img/sigma_mix.png
--- a/Latex/assets/img/sigma_pairs_5-9.png
+++ b/Latex/assets/img/sigma_pairs_5-9.png
--- a/Latex/assets/img/sigma_posterior.png
+++ b/Latex/assets/img/sigma_posterior.png
--- a/Latex/assets/img/sigma_trank.png
+++ b/Latex/assets/img/sigma_trank.png
--- a/Latex/assets/preambles/GeneralPreamble.tex
+++ b/Latex/assets/preambles/GeneralPreamble.tex
@ -12,5 +12,7 @@
 \usepackage{graphicx}
 \graphicspath{assets/img/}

-%setup paragraph level indexing
-\setcounter{secnumdepth}{5}
+
+
+%quotes
+\usepackage{csquotes}
--- a/Latex/assets/preambles/References.bib
+++ b/Latex/assets/preambles/References.bib
--- a/Latex/assets/preambles/WSU_Econ.tikzstyles
+++ b/Latex/assets/preambles/WSU_Econ.tikzstyles
@ -6,12 +6,14 @@
 % Node styles
 \tikzstyle{CrimsonNode}=[fill={rgb,255: red,152; green,30; blue,50}, draw={rgb,255: red,152; green,30; blue,50}, shape=circle, tikzit category=WSU, tikzit draw={rgb,255: red,152; green,30; blue,50}, tikzit fill={rgb,255: red,152; green,30; blue,50}]
 \tikzstyle{GreyNode}=[fill={rgb,255: red,94; green,106; blue,113}, draw={rgb,255: red,94; green,106; blue,113}, shape=circle, tikzit category=WSU, tikzit draw={rgb,255: red,94; green,106; blue,113}, tikzit fill={rgb,255: red,94; green,106; blue,113}]
-\tikzstyle{Box}=[fill={rgb,255: red,94; green,106; blue,113}, draw={rgb,255: red,94; green,106; blue,113}, shape=rectangle, tikzit draw={rgb,255: red,94; green,106; blue,113}, tikzit fill={rgb,255: red,94; green,106; blue,113}]
+\tikzstyle{Gray Box}=[fill={rgb,255: red,94; green,106; blue,113}, draw={rgb,255: red,94; green,106; blue,113}, shape=rectangle, tikzit draw={rgb,255: red,94; green,106; blue,113}, tikzit fill={rgb,255: red,94; green,106; blue,113}]
 \tikzstyle{Red Box}=[fill={rgb,255: red,152; green,30; blue,50}, draw={rgb,255: red,152; green,30; blue,50}, shape=rectangle]
 \tikzstyle{new style 0}=[fill=white, draw=black, shape=circle, tikzit draw=black]
 \tikzstyle{new style 1}=[fill={rgb,255: red,128; green,0; blue,128}, draw=black, shape=circle]
 \tikzstyle{emptyBox}=[fill=white, draw=black, shape=rectangle]
 \tikzstyle{rotated text}=[fill=none, draw=none, shape=circle, rotate=270, tikzit draw={rgb,255: red,191; green,191; blue,191}]
+\tikzstyle{GreyBoxDotted}=[fill={rgb,255: red,94; green,106; blue,113}, draw={rgb,255: red,64; green,64; blue,64}, shape=rectangle, tikzit draw=black, dashed, ultra thick]
+\tikzstyle{purple box}=[fill={rgb,255: red,128; green,0; blue,128}, draw=black, shape=rectangle, tikzit fill={rgb,255: red,128; green,0; blue,128}, tikzit draw=black]

 % Edge styles
 \tikzstyle{RightArrow}=[->]
@ -25,7 +27,7 @@
 \tikzstyle{lightgreybar}=[-, draw={rgb,255: red,191; green,191; blue,191}]
 \tikzstyle{lightred}=[-, draw={rgb,255: red,222; green,148; blue,178}]
 \tikzstyle{Purple}=[-, draw={rgb,255: red,128; green,0; blue,128}, tikzit draw={rgb,255: red,128; green,0; blue,128}, line width=1mm]
-\tikzstyle{new edge style 1}=[draw={rgb,255: red,121; green,23; blue,40}, ->]
+\tikzstyle{lightredarrow}=[draw={rgb,255: red,121; green,23; blue,40}, ->]
 \tikzstyle{filled2}=[-, fill={rgb,255: red,255; green,191; blue,191}, draw=black, tikzit draw=black, tikzit fill={rgb,255: red,255; green,191; blue,191}, opacity=0.5]
 \tikzstyle{filled1}=[-, fill={rgb,255: red,191; green,191; blue,191}, draw=black, tikzit draw=black, opacity=0.5, tikzit fill={rgb,255: red,191; green,191; blue,191}]
 \tikzstyle{emptyFill1}=[-, fill={rgb,255: red,255; green,191; blue,191}, draw=none, tikzit draw=blue, opacity=0.3]
--- a/Latex/assets/preambles/references/A.bib
+++ b/Latex/assets/preambles/references/A.bib
@ -0,0 +1,45 @@
+
+@book{pearl_causality_2000,
+	location = {Cambridge, U.K. ; New York},
+	title = {Causality: models, reasoning, and inference},
+	isbn = {978-0-521-89560-6 978-0-521-77362-1},
+	shorttitle = {Causality},
+	pagetotal = {384},
+	publisher = {Cambridge University Press},
+	author = {Pearl, Judea},
+	date = {2000},
+	langid = {english},
+	keywords = {Causation, Probabilities},
+	file = {Pearl - 2000 - Causality models, reasoning, and inference.pdf:/home/dad/Nextcloud/Zotero_data/storage/8GZJS832/Pearl - 2000 - Causality models, reasoning, and inference.pdf:application/pdf},
+}
+
+@thesis{khmelnitskaya_competition_2021,
+	title = {Competition and Attrition in Drug Development},
+	abstract = {With fewer than 10\% of new drugs reaching the market, the drug development process is notorious for its high attrition rate. However, we rarely observe the reason for a drug’s discontinuation. It is known that pharmaceutical firms withdraw drugs after clinical failures, such as when trial results do not demonstrate adequate safety or efficacy according to {FDA} standards. At the same time, surveys suggest that firms also withdraw drugs for strategic reasons, such as when competition makes it unprofitable to continue development. Disentangling these two sources of attrition is necessary in order to predict the effects a government policy would have on the number of drugs that reach consumers. In this paper, I propose an empirical framework to separately identify the two components of attrition for each disease. To this end, I build a continuous-time dynamic model of the drug development process. In the model, firms take competitors’ R\&D choices into account when they make exit decisions at different stages of the innovation process. To estimate the model, I use rich data on the development histories of experimental drugs, clinical trial outcomes, and disease-specific epidemiological characteristics. I find that, on average, strategic terminations account for 8.4\% of all attrition, and as much as 35\% for some diseases. Using these estimates in counterfactual simulations, I show that without strategic withdrawals, the rate at which new drugs reach consumers would be on average 23\% higher. Large subsidies for clinical trials help realize some of that gain, with better results found for diseases that have a higher share of strategic attrition. However, the overall effect of subsidies on the rate of new drug launches is small. Alternatively, the same effect can be achieved through any minor regulatory adjustment that marginally helps lower the probability of late-stage clinical failures.},
+	pagetotal = {55},
+	institution = {University of Virginia},
+	type = {phdthesis},
+	author = {Khmelnitskaya, Ekaterina},
+	urldate = {2023-04-10},
+	date = {2021-05},
+	langid = {english},
+	file = {1_Khmelnitskaya_Ekaterina_2021_PHD.pdf:/home/dad/Nextcloud/Zotero_data/storage/CSRFCIDB/1_Khmelnitskaya_Ekaterina_2021_PHD.pdf:application/pdf;Khmelnitskaya - Competition and Attrition in Drug Development.pdf:/home/dad/Nextcloud/Zotero_data/storage/QBXQ4ZLR/Khmelnitskaya - Competition and Attrition in Drug Development.pdf:application/pdf},
+}
+
+@article{ursu_drugcentral_2017,
+	title = {{DrugCentral}: online drug compendium},
+	volume = {45},
+	issn = {0305-1048, 1362-4962},
+	url = {https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gkw993},
+	doi = {10.1093/nar/gkw993},
+	shorttitle = {{DrugCentral}},
+	pages = {D932--D939},
+	issue = {D1},
+	journaltitle = {Nucleic Acids Research},
+	shortjournal = {Nucleic Acids Res},
+	author = {Ursu, Oleg and Holmes, Jayme and Knockel, Jeffrey and Bologa, Cristian G. and Yang, Jeremy J. and Mathias, Stephen L. and Nelson, Stuart J. and Oprea, Tudor I.},
+	urldate = {2023-04-10},
+	date = {2017-01-04},
+	langid = {english},
+	file = {Full Text:/home/dad/Nextcloud/Zotero_data/storage/7W6THRK6/Ursu et al. - 2017 - DrugCentral online drug compendium.pdf:application/pdf},
+}
--- a/Latex/assets/preambles/references/B.bib
+++ b/Latex/assets/preambles/references/B.bib
@ -0,0 +1,30 @@
+
+@article{van_der_gronde_addressing_2017,
+	title = {Addressing the challenge of high-priced prescription drugs in the era of precision medicine: A systematic review of drug life cycles, therapeutic drug markets and regulatory frameworks},
+	volume = {12},
+	issn = {1932-6203},
+	url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5559086/},
+	doi = {10.1371/journal.pone.0182613},
+	shorttitle = {Addressing the challenge of high-priced prescription drugs in the era of precision medicine},
+	abstract = {Context
+Recent public outcry has highlighted the rising cost of prescription drugs worldwide, which in several disease areas outpaces other health care expenditures and results in a suboptimal global availability of essential medicines.
+
+Method
+A systematic review of Pubmed, the Financial Times, the New York Times, the Wall Street Journal and the Guardian was performed to identify articles related to the pricing of medicines.
+
+Findings
+Changes in drug life cycles have dramatically affected patent medicine markets, which have long been considered a self-evident and self-sustainable source of income for highly profitable drug companies. Market failure in combination with high merger and acquisition activity in the sector have allowed price increases for even off-patent drugs. With market interventions and the introduction of {QALY} measures in health care, governments have tried to influence drug prices, but often encounter unintended consequences. Patent reform legislation, reference pricing, outcome-based pricing and incentivizing physicians and pharmacists to prescribe low-cost drugs are among the most promising short-term policy options. Due to the lack of systematic research on the effectiveness of policy measures, an increasing number of ad hoc decisions have been made with counterproductive effects on the availability of essential drugs. Future challenges demand new policies, for which recommendations are offered.
+
+Conclusion
+A fertile ground for high-priced drugs has been created by changes in drug life-cycle dynamics, the unintended effects of patent legislation, government policy measures and orphan drug programs. There is an urgent need for regulatory reform to curtail prices and safeguard equitable access to innovative medicines.},
+	pages = {e0182613},
+	number = {8},
+	journaltitle = {{PLoS} {ONE}},
+	shortjournal = {{PLoS} One},
+	author = {van der Gronde, Toon and Uyl-de Groot, Carin A. and Pieters, Toine},
+	urldate = {2023-03-11},
+	date = {2017-08-16},
+	pmid = {28813502},
+	pmcid = {PMC5559086},
+	file = {PubMed Central Full Text PDF:/home/dad/Nextcloud/Zotero_data/storage/7Y8KZSMU/van der Gronde et al. - 2017 - Addressing the challenge of high-priced prescripti.pdf:application/pdf},
+}
--- a/Latex/assets/preambles/references/C.bib
+++ b/Latex/assets/preambles/references/C.bib
@ -0,0 +1,20 @@
+
+@article{hwang_failure_2016,
+	title = {Failure of Investigational Drugs in Late-Stage Clinical Development and Publication of Trial Results},
+	volume = {176},
+	issn = {2168-6106},
+	url = {http://archinte.jamanetwork.com/article.aspx?doi=10.1001/jamainternmed.2016.6008},
+	doi = {10.1001/jamainternmed.2016.6008},
+	abstract = {{OBJECTIVE} To assess factors associated with regulatory approval or reasons for failure of investigational therapeutics in phase 3 or pivotal trials and rates of publication of trial results. {DESIGN}, {SETTING}, {AND} {PARTICIPANTS} Using public sources and commercial databases, we identified investigational therapeutics that entered pivotal trials between 1998 and 2008, with follow-up through 2015. Agents were classified by therapeutic area, orphan designation status, fast track designation, novelty of biological pathway, company size, and as a pharmacologic or biologic product. {MAIN} {OUTCOMES} {AND} {MEASURES} For each product, we identified reasons for failure (efficacy, safety, commercial) and assessed the rates of publication of trial results. We used multivariable logistic regression models to evaluate factors associated with regulatory approval.
+{RESULTS} Among 640 novel therapeutics, 344 (54\%) failed in clinical development, 230 (36\%) were approved by the {US} Food and Drug Administration ({FDA}), and 66 (10\%) were approved in other countries but not by the {FDA}. Most products failed due to inadequate efficacy (n = 195; 57\%), while 59 (17\%) failed because of safety concerns and 74 (22\%) failed due to commercial reasons. The pivotal trial results were published in peer-reviewed journals for 138 of the 344 (40\%) failed agents. Of 74 trials for agents that failed for commercial reasons, only 6 (8.1\%) were published. In analyses adjusted for therapeutic area, agent type, firm size, orphan designation, fast-track status, trial year, and novelty of biological pathway, orphan-designated drugs were significantly more likely than nonorphan drugs to be approved (46\% vs 34\%; adjusted odds ratio [{aOR}], 2.3; 95\% {CI}, 1.4-3.7). Cancer drugs (27\% vs 39\%; {aOR}, 0.5; 95\% {CI}, 0.3-0.9) and agents sponsored by small and medium-size companies (28\% vs 42\%; {aOR}, 0.4; 95\% {CI}, 0.3-0.7) were significantly less likely to be approved.
+{CONCLUSIONS} {AND} {RELEVANCE} Roughly half of investigational drugs entering late-stage clinical development fail during or after pivotal clinical trials, primarily because of concerns about safety, efficacy, or both. Results for the majority of studies of investigational drugs that fail are not published in peer-reviewed journals.},
+	pages = {1826},
+	number = {12},
+	journaltitle = {{JAMA} Internal Medicine},
+	shortjournal = {{JAMA} Intern Med},
+	author = {Hwang, Thomas J. and Carpenter, Daniel and Lauffenburger, Julie C. and Wang, Bo and Franklin, Jessica M. and Kesselheim, Aaron S.},
+	urldate = {2023-01-31},
+	date = {2016-12-01},
+	langid = {english},
+	file = {Hwang et al. - 2016 - Failure of Investigational Drugs in Late-Stage Cli.pdf:/home/dad/Nextcloud/Zotero_data/storage/JJC96CPC/Hwang et al. - 2016 - Failure of Investigational Drugs in Late-Stage Cli.pdf:application/pdf},
+}
--- a/Latex/assets/preambles/references/D.bib
+++ b/Latex/assets/preambles/references/D.bib
@ -0,0 +1,15 @@
+
+@article{abrantes-metz_pharmaceutical_2004,
+	title = {Pharmaceutical Development Phases: A Duration Analysis},
+	issn = {1556-5068},
+	url = {http://www.ssrn.com/abstract=607941},
+	doi = {10.2139/ssrn.607941},
+	shorttitle = {Pharmaceutical Development Phases},
+	journaltitle = {{SSRN} Electronic Journal},
+	shortjournal = {{SSRN} Journal},
+	author = {Abrantes-Metz, Rosa M. and Adams, Christopher and Metz, Albert D.},
+	urldate = {2023-01-31},
+	date = {2004},
+	langid = {english},
+	file = {Abrantes-Metz et al. - 2004 - Pharmaceutical Development Phases A Duration Anal.pdf:/home/dad/Nextcloud/Zotero_data/storage/LANZBC53/Abrantes-Metz et al. - 2004 - Pharmaceutical Development Phases A Duration Anal.pdf:application/pdf},
+}
--- a/Latex/assets/preambles/references/E.bib
+++ b/Latex/assets/preambles/references/E.bib
@ -0,0 +1,9 @@
+
+@article{acemoglu_market_2004,
+	title = {Market Size in Innovation: Theory and Evidence from the Pharmaceutical Industry},
+	journaltitle = {Quarterly Journal of Economics},
+	author = {Acemoglu, Daron and Linn, Joshua},
+	date = {2004-08},
+	langid = {english},
+	file = {Acemoglu and Linn - MARKET SIZE IN INNOVATION THEORY AND EVIDENCE FRO.pdf:/home/dad/Nextcloud/Zotero_data/storage/HYTY3E36/Acemoglu and Linn - MARKET SIZE IN INNOVATION THEORY AND EVIDENCE FRO.pdf:application/pdf},
+}
--- a/Latex/assets/preambles/references/F.bib
+++ b/Latex/assets/preambles/references/F.bib
@ -0,0 +1,88 @@
+
+@online{noauthor_fdaaa_nodate,
+	title = {{FDAAA} 801 and the Final Rule - {ClinicalTrials}.gov},
+	url = {https://clinicaltrials.gov/ct2/manage-recs/fdaaa},
+	urldate = {2023-04-08},
+	langid = {english},
+	file = {Snapshot:/home/dad/Nextcloud/Zotero_data/storage/V9YVGVK2/fdaaa.html:text/html},
+}
+
+@online{noauthor_frequently_nodate,
+	title = {Frequently Asked Questions - {ClinicalTrials}.gov},
+	url = {https://clinicaltrials.gov/ct2/manage-recs/faq#board},
+	urldate = {2023-04-08},
+	langid = {english},
+	file = {Snapshot:/home/dad/Nextcloud/Zotero_data/storage/GNBZDX5B/faq.html:text/html},
+}
+
+@online{noauthor_rxnorm_nodate,
+	title = {{RxNorm}},
+	rights = {Public Domain},
+	url = {https://www.nlm.nih.gov/research/umls/rxnorm/index.html},
+	type = {Product, Program, and Project Descriptions},
+	urldate = {2023-04-08},
+	note = {Publisher: U.S. National Library of Medicine},
+	file = {Snapshot:/home/dad/Nextcloud/Zotero_data/storage/UPPXYYW6/index.html:text/html},
+}
+
+@online{noauthor_rxnorm_nodate-1,
+	title = {{RxNorm} Overview},
+	rights = {Public Domain},
+	url = {https://www.nlm.nih.gov/research/umls/rxnorm/overview.html},
+	type = {Product, Program, and Project Descriptions},
+	urldate = {2023-04-08},
+	note = {Publisher: U.S. National Library of Medicine},
+	file = {Snapshot:/home/dad/Nextcloud/Zotero_data/storage/XI269ZNM/overview.html:text/html},
+}
+
+@online{noauthor_medical_nodate,
+	title = {Medical Subject Headings - Home Page},
+	rights = {Public Domain},
+	url = {https://www.nlm.nih.gov/mesh/meshhome.html},
+	type = {Product, Program, and Project Descriptions},
+	urldate = {2023-04-09},
+	note = {Publisher: U.S. National Library of Medicine},
+	file = {Snapshot:/home/dad/Nextcloud/Zotero_data/storage/RTW5EPBG/meshhome.html:text/html},
+}
+
+@online{noauthor_international_nodate,
+	title = {International Classification of Diseases ({ICD})},
+	url = {https://www.who.int/standards/classifications/classification-of-diseases},
+	abstract = {International Classification of Diseases ({ICD}) Revision},
+	urldate = {2023-04-09},
+	langid = {english},
+	file = {Snapshot:/home/dad/Nextcloud/Zotero_data/storage/4Y3F35AR/classification-of-diseases.html:text/html},
+}
+
+@online{noauthor_2023_nodate,
+	title = {2023 {ICD}-10-{CM} {\textbar} {CMS}},
+	url = {https://www.cms.gov/medicare/icd-10/2023-icd-10-cm},
+	urldate = {2023-04-09},
+}
+
+@online{noauthor_2023_nodate-1,
+	title = {2023 {ICD}-10-{PCS} {\textbar} {CMS}},
+	url = {https://www.cms.gov/medicare/icd-10/2023-icd-10-pcs},
+	urldate = {2023-04-09},
+	file = {2023 ICD-10-PCS | CMS:/home/dad/Nextcloud/Zotero_data/storage/4NLQJQT6/2023-icd-10-pcs.html:text/html},
+}
+
+@online{noauthor_2019_nodate,
+	title = {2019 {ICD}-10-{CM} {\textbar} {CMS}},
+	url = {https://www.cms.gov/Medicare/Coding/ICD10/2019-ICD-10-CM},
+	urldate = {2023-04-09},
+	file = {2019 ICD-10-CM | CMS:/home/dad/Nextcloud/Zotero_data/storage/S5ISTWEL/2019-ICD-10-CM.html:text/html},
+}
+
+@online{commissioner_understanding_2019,
+	title = {Understanding Unapproved Use of Approved Drugs "Off Label"},
+	url = {https://www.fda.gov/patients/learn-about-expanded-access-and-other-treatment-options/understanding-unapproved-use-approved-drugs-label},
+	abstract = {Understanding Unapproved Use of Approved Drugs "Off Label"},
+	titleaddon = {{FDA}},
+	author = {Commissioner, Office of the},
+	urldate = {2023-04-10},
+	date = {2019-04-18},
+	langid = {english},
+	note = {Publisher: {FDA}},
+	file = {Snapshot:/home/dad/Nextcloud/Zotero_data/storage/VAKSGTAP/understanding-unapproved-use-approved-drugs-label.html:text/html},
+}
--- a/Latex/assets/preambles/references/G.bib
+++ b/Latex/assets/preambles/references/G.bib
@ -0,0 +1,5 @@
+
+@misc{noauthor_indexing_nodate,
+	title = {Indexing {SPL} Fact Sheet},
+	file = {Indexing-SPL-Fact-Sheet.pdf:/home/dad/Nextcloud/Zotero_data/storage/KAHW2ABD/Indexing-SPL-Fact-Sheet.pdf:application/pdf},
+}
--- a/Latex/assets/preambles/references/H.bib
+++ b/Latex/assets/preambles/references/H.bib
--- a/Latex/assets/preambles/references/I.bib
+++ b/Latex/assets/preambles/references/I.bib
@ -0,0 +1,7 @@
+
+@online{noauthor_rxnav---box_nodate,
+	title = {{RxNav}-in-a-Box - {RxNav} Applications},
+	url = {https://lhncbc.nlm.nih.gov/RxNav/applications/RxNav-in-a-Box.html},
+	urldate = {2023-04-10},
+	file = {RxNav-in-a-Box - RxNav Applications:/home/dad/Nextcloud/Zotero_data/storage/A9S2NM29/RxNav-in-a-Box.html:text/html},
+}
--- a/Latex/assets/preambles/references/J.bib
+++ b/Latex/assets/preambles/references/J.bib
@ -0,0 +1,7 @@
+
+@online{noauthor_icd-10_nodate,
+	title = {{ICD}-10 Version:2019},
+	url = {https://icd.who.int/browse10/2019/en#/C00},
+	urldate = {2023-04-10},
+	file = {ICD-10 Version\:2019:/home/dad/Nextcloud/Zotero_data/storage/23DGMZ5X/en.html:text/html},
+}
--- a/Latex/assets/preambles/references/K.bib
+++ b/Latex/assets/preambles/references/K.bib
@ -0,0 +1,15 @@
+@Misc{rstan,
+    title = {{RStan}: the {R} interface to {Stan}},
+    author = {{Stan Development Team}},
+    note = {R package version 2.21.8},
+    year = {2023},
+    url = {https://mc-stan.org/},
+  }
+@Misc{mc-stan,
+    title = {Stan Modeling Language Users Guide and Reference Manual},
+    author = {{Stan Development Team}},
+    note = {R package version 2.26},
+    year = {2022},
+    url = {https://mc-stan.org/},
+  }
+
--- a/Latex/assets/preambles/references/L.bib
+++ b/Latex/assets/preambles/references/L.bib
@ -0,0 +1,13 @@
+
+@book{mcelreath_statistical_2020,
+	location = {Boca Raton},
+	edition = {2},
+	title = {Statistical rethinking: a Bayesian course with examples in R and Stan},
+	isbn = {978-0-367-13991-9},
+	series = {{CRC} texts in statistical science},
+	shorttitle = {Statistical rethinking},
+	abstract = {"Statistical Rethinking: A Bayesian Course with Examples in R and Stan, Second Edition builds knowledge/confidence in statistical modeling. Pushes readers to perform step-by-step calculations (usually automated.) Unique, computational approach ensures readers understand details to make reasonable choices and interpretations in their modeling work"--},
+	publisher = {Taylor and Francis, {CRC} Press},
+	author = {{McElreath}, Richard},
+	date = {2020},
+}
--- a/Latex/assets/preambles/references/M.bib
+++ b/Latex/assets/preambles/references/M.bib
@ -0,0 +1,9 @@
+
+@report{global_burden_of_disease_collaborative_network_global_2020-1,
+	location = {Seattle, United States of America},
+	title = {Global Burden of Disease Study 2019 ({GBD} 2019) Cause Hierarchy},
+	abstract = {Global Burden of Disease Collaborative Network. Global Burden of Disease Study 2019 ({GBD} 2019) Cause Hierarchy Seattle, United States of America: Institute for Health Metrics and Evaluation ({IHME}), 2020.},
+	institution = {Institute for Health Metrics and Evaluation ({IHME})},
+	author = {Global Burden of Disease Collaborative Network},
+	date = {2020},
+}
--- a/Latex/assets/preambles/references/N.bib
+++ b/Latex/assets/preambles/references/N.bib
@ -0,0 +1,18 @@
+
+@article{ursu_drugcentral_2017,
+	title = {{DrugCentral}: online drug compendium},
+	volume = {45},
+	issn = {0305-1048, 1362-4962},
+	url = {https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gkw993},
+	doi = {10.1093/nar/gkw993},
+	shorttitle = {{DrugCentral}},
+	pages = {D932--D939},
+	issue = {D1},
+	journaltitle = {Nucleic Acids Research},
+	shortjournal = {Nucleic Acids Res},
+	author = {Ursu, Oleg and Holmes, Jayme and Knockel, Jeffrey and Bologa, Cristian G. and Yang, Jeremy J. and Mathias, Stephen L. and Nelson, Stuart J. and Oprea, Tudor I.},
+	urldate = {2023-04-10},
+	date = {2017-01-04},
+	langid = {english},
+	file = {Full Text:/home/dad/Nextcloud/Zotero_data/storage/7W6THRK6/Ursu et al. - 2017 - DrugCentral online drug compendium.pdf:application/pdf},
+}
--- a/Latex/assets/preambles/references/O.bib
+++ b/Latex/assets/preambles/references/O.bib
@ -0,0 +1,10 @@
+
+@online{noauthor_fda_nodate,
+	title = {{FDA} Drug Approval Process},
+	url = {https://www.drugs.com/fda-approval-process.html},
+	abstract = {It can take up to \$2 billion and 12 to 15 years to get a drug from the test tube to the market. What happens at the {FDA} to get this drug safely to you?},
+	titleaddon = {Drugs.com},
+	urldate = {2023-04-12},
+	langid = {english},
+	file = {Snapshot:/home/dad/Nextcloud/Zotero_data/storage/VTIGXXJB/fda-approval-process.html:text/html},
+}
--- a/Latex/assets/preambles/references/compile.sh
+++ b/Latex/assets/preambles/references/compile.sh
@ -0,0 +1,2 @@
+
+cat ./*.bib > ../References.bib
--- a/Latex/assets/tikzit/CausalGraph.tikz
+++ b/Latex/assets/tikzit/CausalGraph.tikz
@ -1,34 +1,52 @@
 \begin{tikzpicture}
 	\begin{pgfonlayer}{nodelayer}
-		\node [style=emptyBox] (0) at (-18.5, 4) {Compound Safety};
-		\node [style=emptyBox] (1) at (-18.5, -3.5) {Compound Effecacy};
-		\node [style=Red Box] (3) at (-3.5, -4.25) {\begin{tabular}{l} Conclusion State \\ $\bullet$ Status\\  $\bullet$ Duration \\ $\bullet$ Enrollment \end{tabular}};
-		\node [style=Red Box] (4) at (-2, 1) {\begin{tabular}{l} Snapshot State \\ $\bullet$ Status\\  $\bullet$ Duration \\ $\bullet$ Enrollment \end{tabular}};
-		\node [style=Red Box] (5) at (3.25, -2) {Market Conditions};
-		\node [style=Box] (6) at (-17.5, -5.5) {Sponsor Changes};
-		\node [style=Box] (7) at (-15.25, 0.25) {\begin{tabular}{l}Prior \\ Trials \end{tabular}};
-		\node [style=emptyBox] (8) at (-2.25, 4) {Beliefs about Compound};
-		\node [style=emptyBox] (10) at (3.25, -5.25) {Unobserved};
-		\node [style=Box] (11) at (3.25, -6.25) {Observed: Control};
-		\node [style=Red Box] (12) at (3.25, -7.25) {Observed: Of interest};
-		\node [style=emptyBox] (13) at (-11, 4) {Current Adverse Events};
-		\node [style=emptyBox] (14) at (-10.5, -1.5) {Measured Effectiveness};
-		\node [style=Box] (15) at (4.75, 1) {Disease Burden};
+		\node [style=Red Box] (0) at (4, -1.5) {Will Terminate?};
+		\node [style=Red Box] (1) at (-4.25, -1.5) {Market Measures};
+		\node [style=emptyBox] (4) at (-6, -7.5) {Unobserved};
+		\node [style=purple box] (5) at (0, 2) {Enrollment};
+		\node [style=Red Box] (8) at (-5.75, -6) {Relationships of interest};
+		\node [style=emptyBox] (9) at (12.25, 7.5) {Fundamental Efficacy/Safety};
+		\node [style=emptyBox] (10) at (0, 9.25) {Previously Observed Efficacy/Safety};
+		\node [style=emptyBox] (12) at (12.25, 0) {Currently Observed Efficacy/Safety};
+		\node [style=Gray Box] (13) at (7, -6) {Observed, adjustment set 1};
+		\node [style=GreyBoxDotted] (14) at (7, -7.25) {Observed, adjustment set 2};
+		\node [style=GreyBoxDotted] (15) at (-4, 4.25) {Planned Enrollment};
+		\node [style=GreyBoxDotted] (16) at (4.5, 4.5) {Anticipated Enrollment};
+		\node [style=GreyBoxDotted] (17) at (7.5, 3) {Measured Enrollment};
+		\node [style=Gray Box] (18) at (-6.5, 1.5) {Population};
+		\node [style=GreyBoxDotted] (19) at (5.75, 1.5) {Current Status};
+		\node [style=Gray Box] (20) at (0, 7.5) {Decision to proceed with Phase III};
+		\node [style=Gray Box] (21) at (0, -3.5) {Condition};
+		\node [style=Gray Box] (22) at (14.5, -4.25) {Elapsed Duration};
+		\node [style=purple box] (23) at (7, -8.5) {Partially observed};
 	\end{pgfonlayer}
 	\begin{pgfonlayer}{edgelayer}
-		\draw [style=RightArrow] (4) to (3);
-		\draw [style=RightArrow] (5) to (3);
-		\draw [style=Light Arrow] (0) to (7);
-		\draw [style=Light Arrow] (1) to (7);
-		\draw [style=Light Arrow] (7) to (8);
-		\draw [style=Light Arrow] (8) to (4);
-		\draw [style=Light Arrow] (6) to (3);
-		\draw [style=Light Arrow] (0) to (13);
-		\draw [style=Light Arrow] (13) to (3);
-		\draw [style=Light Arrow] (1) to (14);
-		\draw [style=Light Arrow] (14) to (3);
-		\draw [style=Light Arrow] (15) to (5);
-		\draw [style=Light Arrow] (15) to (4);
-		\draw [style=RightArrow] (5) to (4);
+		\draw [style=lightredarrow] (1) to (0);
+		\draw [style=lightredarrow] (1) to (5);
+		\draw [style=lightredarrow] (5) to (0);
+		\draw [style=RightArrow] (15) to (5);
+		\draw [style=RightArrow] (5) to (16);
+		\draw [style=RightArrow] (5) to (17);
+		\draw [style=RightArrow] (5) to (19);
+		\draw [style=RightArrow] (18) to (5);
+		\draw [style=RightArrow] (18) to (1);
+		\draw [style=RightArrow] (9) to (10);
+		\draw [style=RightArrow] (10) to (20);
+		\draw [style=RightArrow] (9) to (12);
+		\draw [style=RightArrow] (12) to (0);
+		\draw [style=Light Arrow] (21) to (1);
+		\draw [style=Light Arrow, in=-75, out=105] (21) to (15);
+		\draw [style=Light Arrow] (21) to (0);
+		\draw [style=Light Arrow, in=-120, out=165, looseness=2.00] (21) to (18);
+		\draw [style=Light Arrow, in=180, out=180, looseness=2.75] (21) to (20);
+		\draw [style=Light Arrow, bend right=60, looseness=1.75] (21) to (9);
+		\draw [style=Light Arrow] (21) to (5);
+		\draw [style=lightredarrow, in=135, out=45, loop] (8) to ();
+		\draw [style=RightArrow, in=-180, out=165, looseness=2.00] (1) to (20);
+		\draw [style=RightArrow, in=120, out=180, looseness=1.25] (10) to (15);
+		\draw [style=RightArrow] (20) to (5);
+		\draw [style=RightArrow] (18) to (15);
+		\draw [style=RightArrow] (22) to (0);
+		\draw [style=RightArrow] (22) to (5);
 	\end{pgfonlayer}
 \end{tikzpicture}