JobMarketPaper/Paper/sections/04_EconometricModel.tex

\documentclass[../Main.tex]{subfiles}
\graphicspath{{\subfix{Assets/img/}}}

\begin{document}
%% Describe goal
%   Estimate probability distribution of normalized durations and conclusion statuses.
%   Explain why this answers questions well.
%   How do I propose estimating that?

%%NOTATION
% change notation
% i indexes trials for y and d
% n indexes snapshots within the trial

First, some notation:
\begin{itemize}
    \item $i$: indexes trials
    \item $n$: indexes trial snapshots.
    \item $y_i$: whether each trial terminated (true) or completed (false).
    \item $d_i$: indexes the ICD-10 disease categories per trial.
    \item $x_{i,n}$: represents the other dependent
        variables associated with the snapshot.
        % This includes\footnote{No trials in the current dataset are ever suspended.}:
        % \begin{enumerate}
        %     \item Elapsed duration
        %     \item arcsinh of the number of brands
        %     \item arcsinh of the DALYs from high SDI countries
        %     \item arcsinh of the DALYs from high-medium SDI countries
        %     \item Enrollment (no distinction between anticipated or actual)
        %     \item Dummy Status: Not yet recruiting
        %     \item Dummy Status: Recruiting
        %     \item Dummy Status: Active, not recruiting
        %     \item Dummy Status: Enrolling by invitation
        % \end{enumerate}
\end{itemize}
% The arcsinh transform is used because it is similar to a log transform but
% maps $\text{arcsinh}(0)=0$.


The bayesian model to measure the direct effect of enrollment
is specified as a hierarchal logistic regression.
\begin{align}
    y_i \sim \text{Bernoulli}(p_{i,n}) \\
        p_{i,n} = \text{logit}(x_{i,n} \vec \beta(d_n))
\end{align}
Where beta is indexed by
$d \in \{1,2,\dots,21,22\}$
for each general ICD-10 category.
The betas are distributed
\begin{align}
    \beta(d) \sim \text{Normal}(\mu,\sigma I)
\end{align}
With hyperpriors
\begin{align}
    \mu_k \sim \text{Normal}(0,0.05) \\
    \sigma_k \sim \text{Gamma}(4,20)
\end{align}
\todo{Double check that these are the priors I used.}


Other variables are implicitly conditioned-on as they are used
to select the trials of interest.
I ensured that:
        \todo{double check these in the code.}
\begin{itemize}
    \item The trial is Phase 3.
    \item The trial has a Data Monitoring Committee.
    \item The compounds are FDA regulated drug.
    \item The trial was never suspended\footnote{
        This was because I wasn't sure how to handle it in the model
        when I started scraping the data.
        Later the website changed.
        This is technically post selection in some cases.
    }
\end{itemize}


\todo{Make sure data is described before this point.}
\todo{Put in a standard econometrics model}
\begin{equation}
    x\beta = \beta_0 + \beta_1 \times \text{test}
    \label{eq:test}
\end{equation}
\end{document}