
\documentclass[../Main.tex]{subfiles}
\graphicspath{{\subfix{Assets/img/}}}

\begin{document}
%% Describe goal
%   Estimate probability distribution of normalized durations and conclusion statuses.
%   Explain why this answers questions well.
%   How do I propose estimating that?

%%NOTATION

First, some notation:
\begin{itemize}
    \item $n$: indexes trial snapshots.
    \item $y_n$: indicates whether the trial associated with snapshot $n$ was terminated ($y_n = 1$) or completed ($y_n = 0$).
    \item $d$: indexes ICD-10 disease categories.
    \item $d_n$: represents the disease category of the trial associated with the snapshot $n$.
    \item $x_n$: represents the remaining independent variables associated with snapshot $n$.
        These include\footnote{No trials in the current dataset are ever suspended.}:
        \begin{enumerate}
            \item Elapsed duration
            \item arcsinh of the number of brands
            \item arcsinh of the DALYs from high SDI countries
            \item arcsinh of the DALYs from high-medium SDI countries
            \item Enrollment (no distinction between anticipated and actual)
            \item Dummy Status: Not yet recruiting
            \item Dummy Status: Recruiting
            \item Dummy Status: Active, not recruiting
            \item Dummy Status: Enrolling by invitation
        \end{enumerate}
\end{itemize}
The arcsinh transform is used because it is similar to a log transform but
is defined at zero, where $\text{arcsinh}(0)=0$.
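For reference, the transform and its behavior for large $x$ are
\begin{align}
    \text{arcsinh}(x) &= \ln\left(x + \sqrt{x^2 + 1}\right) \\
    \text{arcsinh}(x) &\approx \ln(2x) = \ln(x) + \ln(2) \quad \text{for large } x
\end{align}
so, for covariate values far from zero, a coefficient can be read roughly as it
would be for a log-transformed variable; this reading is only approximate near zero.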


The Bayesian model used to measure the direct effects of enrollment and the number
of other brands is specified as a hierarchical logistic regression:
\begin{align}
    y_n &\sim \text{Bernoulli}(p_n) \\
    p_n &= \text{logit}^{-1}(x_n \vec \beta(d_n))
\end{align}
where $\beta$ is indexed by $k$ for each variable in $x_n$, and by
$d \in \{1,2,\dots,22\}$ for each general ICD-10 category.
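Written out, and assuming $x_n$ is a row vector with entries $x_{n,k}$ (the
subscript notation $x_{n,k}$ is introduced here only for illustration), the
linear predictor is mapped to a probability through the logistic (inverse logit)
function:
\begin{align}
    p_n = \frac{1}{1 + \exp\left(-\sum_{k} x_{n,k}\, \beta_k(d_n)\right)}
\end{align}
Under this form, a positive $\beta_k(d)$ raises the probability of termination
in category $d$ as $x_{n,k}$ increases, holding the other variables fixed.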
The coefficients are distributed as
\begin{align}
    \beta_k(d) \sim \text{Normal}(\mu_k,\sigma_k)
\end{align}
with hyperparameters distributed as
\begin{align}
    \mu_k \sim \text{Normal}(0,1) \\
    \sigma_k \sim \text{Gamma}(2,1)
\end{align}
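Taken together, and assuming the snapshots are treated as conditionally
independent given the coefficients (an assumption of this sketch), the posterior
implied by the equations above is proportional to
\begin{align}
    p(\beta, \mu, \sigma \mid y, x) \propto{}
        & \prod_{n} p_n^{y_n} (1 - p_n)^{1 - y_n} \nonumber \\
        & \times \prod_{k} \prod_{d=1}^{22} \text{Normal}(\beta_k(d) \mid \mu_k, \sigma_k) \nonumber \\
        & \times \prod_{k} \text{Normal}(\mu_k \mid 0, 1)\, \text{Gamma}(\sigma_k \mid 2, 1)
\end{align}
Because each $\beta_k(d)$ shares $\mu_k$ and $\sigma_k$ across disease
categories, estimates for categories with few snapshots are pulled toward the
common mean $\mu_k$.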


Several other variables are conditioned on implicitly, because they were used
to select the trials of interest.
These include:
\begin{itemize}
    \item Is the trial Phase 3?\footnote{
       Conditioning on Phase 3 is equivalent to asserting that previous trials
       occurred and had acceptable safety and efficacy results.
       }
    \item Does the trial have a Data Monitoring Committee?
    \item Is the compound an FDA-regulated drug?
\end{itemize}
%TODO: double check the sql used to select trials of interest.

\end{document}