diff --git a/Paper/Main.tex b/Paper/Main.tex
index 4149e61..fed77ef 100644
--- a/Paper/Main.tex
+++ b/Paper/Main.tex
@@ -68,10 +68,10 @@ Section \ref{SEC:Results} discusses the results of the analysis.
 \subfile{sections/10_CausalStory}
 \subfile{sections/02_data}
-%---------------------------------------------------------------
-\section{Causal Identification}\label{SEC:CausalIdentification}
-%---------------------------------------------------------------
-\subfile{sections/03_CausalIdentification}
+% %---------------------------------------------------------------
+% \section{Causal Identification}\label{SEC:CausalIdentification}
+% %---------------------------------------------------------------
+% \subfile{sections/03_CausalIdentification}
 %---------------------------------------------------------------
 \section{Econometric Model}\label{SEC:EconometricModel}
diff --git a/Paper/sections/04_EconometricModel.tex b/Paper/sections/04_EconometricModel.tex
index b8f2d6c..de205f9 100644
--- a/Paper/sections/04_EconometricModel.tex
+++ b/Paper/sections/04_EconometricModel.tex
@@ -8,61 +8,77 @@
 % How do I propose estimating that?
 %%NOTATION
+% change notation
+% i indexes trials for y and d
+% n indexes snapshots within the trial
 First, some notation:
 \begin{itemize}
+ \item $i$: indexes trials
  \item $n$: indexes trial snapshots.
- \item $y_n$: whether each trial terminated (true) or completed (false).
- \item $d$: indexes ICD-10 disease categories.
- \item $d_n$: represents the disease category of the trial associated with the snapshot $n$.
- \item $x_n$: represents the other dependent variables associated to the snapshot.
- This includes\footnote{No trials in the current dataset are ever suspended.}:
- \begin{enumerate}
- \item Elapsed duration
- \item arcsinh of the number of brands
- \item arcsinh of the DALYs from high SDI countries
- \item arcsinh of the DALYs from high-medium SDI countries
- \item Enrollment (no distinction between anticipated or actual)
- \item Dummy Status: Not yet recruiting
- \item Dummy Status: Recruiting
- \item Dummy Status: Active, not recruiting
- \item Dummy Status: Enrolling by invitation
- \end{enumerate}
+ \item $y_i$: whether each trial terminated (true) or completed (false).
+ \item $d_i$: the ICD-10 disease category of trial $i$.
+ \item $x_{i,n}$: represents the other independent
+ variables associated with the snapshot.
+ % This includes\footnote{No trials in the current dataset are ever suspended.}:
+ % \begin{enumerate}
+ % \item Elapsed duration
+ % \item arcsinh of the number of brands
+ % \item arcsinh of the DALYs from high SDI countries
+ % \item arcsinh of the DALYs from high-medium SDI countries
+ % \item Enrollment (no distinction between anticipated or actual)
+ % \item Dummy Status: Not yet recruiting
+ % \item Dummy Status: Recruiting
+ % \item Dummy Status: Active, not recruiting
+ % \item Dummy Status: Enrolling by invitation
+ % \end{enumerate}
 \end{itemize}
-The arcsinh transform is used because it is similar to a log transform but
-maps $\text{arcsinh}(0)=0$.
+% The arcsinh transform is used because it is similar to a log transform but
+% maps $\text{arcsinh}(0)=0$.
-The bayesian model to measure the direct effects of enrollment and the number
-of other brands is easily specified as a hierarchal logistic regression.
+The Bayesian model to measure the direct effect of enrollment
+is specified as a hierarchical logistic regression.
 \begin{align}
- y_n \sim \text{Bernoulli}(p_n) \\
- p_n = \text{logit}(x_n \vec \beta(d_n))
+ y_i \sim \text{Bernoulli}(p_{i,n}) \\
+ p_{i,n} = \text{logit}^{-1}(x_{i,n} \vec \beta(d_i))
 \end{align}
-Where beta is indexed by $k$ for each parameter in $x$, and by
-$d \in \{1,2,\dots,21,22\}$ for each general ICD-10 category.
+Where beta is indexed by
+$d \in \{1,2,\dots,21,22\}$
+for each general ICD-10 category.
 The betas are distributed
 \begin{align}
- \beta_k(d) \sim \text{Normal}(\mu_k,\sigma_k)
+ \beta(d) \sim \text{Normal}(\mu,\sigma I)
 \end{align}
-With hyperparameters
+With hyperpriors
 \begin{align}
  \mu_k \sim \text{Normal}(0,0.05) \\
  \sigma_k \sim \text{Gamma}(4,20)
 \end{align}
+\todo{Double check that these are the priors I used.}
-Other variables are implicitly conditioned on as they were used
-to select trials of interest.
-These include:
+Other variables are implicitly conditioned on, as they are used
+to select the trials of interest.
+I ensured that:
+ \todo{double check these in the code.}
 \begin{itemize}
- \item Is the trial Phase 3?\footnote{
- Conditioning on phase 3 is equivalent to asserting that previous trials
- occured and had acceptable safety and efficacy results.
- }
- \item Does the trial have a Data Monitoring Committee?
- \item Are the compounds an FDA regulated drug?
+ \item The trial is Phase 3.
+ \item The trial has a Data Monitoring Committee.
+ \item The compounds are FDA-regulated drugs.
+ \item The trial was never suspended\footnote{
+ This was because I wasn't sure how to handle it in the model
+ when I started scraping the data.
+ Later the website changed.
+ This is technically post-selection in some cases.
+ }
 \end{itemize}
-%TODO: double check the sql used to select trials of interest.
+
+\todo{Make sure data is described before this point.}
+\todo{Put in a standard econometrics model}
+\begin{equation}
+ x\beta = \beta_0 + \beta_1 \times \text{test}
+ \label{eq:test}
+\end{equation}
 \end{document}
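
Note on the model in 04_EconometricModel.tex: the hierarchical logistic regression in this hunk can be sketched as a small generative simulation, which may help when double-checking the priors flagged in the \todo. This is only an illustration, not the code used for the paper. It assumes that Normal(0, 0.05) is parameterized by its standard deviation, that Gamma(4, 20) means shape 4 and rate 20, and that the link is the inverse logit. All function names and the example covariates are hypothetical.

```python
import math
import random

N_CATEGORIES = 22  # general ICD-10 categories, d in {1, ..., 22}


def inverse_logit(z):
    """Map a real-valued linear predictor to a probability in [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # avoids overflow for large negative z
    return ez / (1.0 + ez)


def draw_coefficients(n_covariates, rng):
    """Draw beta[d][k] ~ Normal(mu_k, sigma_k) given the hyperpriors."""
    # Assumption: the second argument of Normal is a standard deviation.
    mu = [rng.gauss(0.0, 0.05) for _ in range(n_covariates)]
    # Assumption: Gamma(4, 20) is shape 4, rate 20, i.e. scale 1/20.
    sigma = [rng.gammavariate(4.0, 1.0 / 20.0) for _ in range(n_covariates)]
    return [
        [rng.gauss(mu[k], sigma[k]) for k in range(n_covariates)]
        for _ in range(N_CATEGORIES)
    ]


def termination_probability(x, beta_d):
    """Compute p_{i,n} for one snapshot with covariates x in category d."""
    return inverse_logit(sum(xk * bk for xk, bk in zip(x, beta_d)))


def simulate_outcome(x, d, beta, rng):
    """Draw y ~ Bernoulli(p); d is the 1-based ICD-10 category index."""
    p = termination_probability(x, beta[d - 1])
    return rng.random() < p


if __name__ == "__main__":
    rng = random.Random(0)
    beta = draw_coefficients(3, rng)
    x = [1.0, 0.5, -2.0]  # hypothetical snapshot covariates
    print(termination_probability(x, beta[4]), simulate_outcome(x, 5, beta, rng))
```

With priors this tight around zero, simulated termination probabilities stay close to 0.5 unless the covariates are large, which is one way to sanity-check whether the stated hyperpriors match what was actually fitted.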