\documentclass[../Main.tex]{subfiles}
\graphicspath{{\subfix{Assets/img/}}}

\begin{document}

%% Describe goal
The model I use is a hierarchical logistic regression model in which the hierarchy is based on disease categories.

%%NOTATION
% change notation
% i indexes trials for y and d
% n indexes snapshots within the trial

First, some notation:
\begin{itemize}
    \item $i$: indexes trials.
    \item $n$: indexes snapshots within a trial.
    \item $y_i$: whether trial $i$ terminated (true, 1) or completed (false, 0).
    \item $d_i$: the ICD-10 disease category of trial $i$.
    \item $x_{i,n}$: the independent variables associated with snapshot $n$ of trial $i$.
\end{itemize}

The goal is to take each snapshot and predict whether its trial will terminate or complete.
The specification of the model used to measure the direct effect of enrollment is:
\begin{align}
    y_i &\sim \text{Bernoulli}(p_{i,n}) \\
    p_{i,n} &= \text{logit}^{-1}(x_{i,n} \vec{\beta}(d_i))
\end{align}
where $\vec{\beta}$ is indexed by $d \in \{1,2,\dots,21,22\}$, one value for each general ICD-10 category.
Each coefficient $\beta_k(d)$, with $k$ indexing the independent variables, is distributed
\begin{align}
    \beta_k(d) \sim \text{Normal}(\mu_k,\sigma_k)
\end{align}
with hyperpriors
%Checked on 2024-11-27. Is correct.
\todo{Double check that these are the priors I used.}
\begin{align}
    \mu_k &\sim \text{Normal}(0,0.05) \\
    \sigma_k &\sim \text{Gamma}(4,20)
\end{align}
\todo{Double check actual spec}

The independent variables include:
\todo{Make sure data is described before this point.}
\begin{subequations}
    \begin{align}
        x_{i,n}\beta(d_i) = & \bx{1}{\text{Elapsed Duration}} \\
        &+ \bx{2}{\arcsinh \left(\text{\# Generic compounds}\right)} \\
        &+ \bx{3}{\arcsinh \left(\text{\# Branded compounds}\right)} \\
        &+ \bx{4}{\text{\# DALYs in High SDI Countries}} \\
        &+ \bx{5}{\text{\# DALYs in High-Medium SDI Countries}} \\
        &+ \bx{6}{\text{\# DALYs in Medium SDI Countries}} \\
        &+ \bx{7}{\text{\# DALYs in Low-Medium SDI Countries}} \\
        &+ \bx{8}{\text{\# DALYs in Low SDI Countries}} \\
        &+ \bxi{9}{\text{Not yet Recruiting}}{\text{Trial Status}}\\
        &+ \bxi{10}{\text{Recruiting}}{\text{Trial Status}}\\
        &+ \bxi{11}{\text{Enrolling by Invitation Only}}{\text{Trial Status}}\\
        &+ \bxi{12}{\text{Active, not recruiting}}{\text{Trial Status}}
    \end{align}
\end{subequations}

The arcsinh transform is used because it behaves like a log transform for large values but, unlike the log, is defined and differentiable at zero, so it handles counts of zero:
$\arcsinh(0) = \ln (0 + \sqrt{0^2 + 1}) = 0$.

Note that because this is a hierarchical model, each ICD-10 disease category gets its own set of parameters, which is why the $\beta$s are parameterized by $d_i$.

%%%% Not sure if space should go here. I think these work well together.

Other variables are implicitly controlled for because they are used to select the trials of interest.
These include:
\todo{double check these in the code.}
\begin{itemize}
    \item The trial is Phase 3.
    \item The trial has a Data Monitoring Committee.
    \item The compounds are FDA-regulated drugs.
    \item The trial was never suspended\footnote{
        I excluded suspended trials because I wasn't sure how to handle suspensions in the model when I started scraping the data.
        Later the website changed.
        This is technically post-selection.
        \todo{double check where this happened in the code.
        I may have only done it in the CBO analysis.}
    }
\end{itemize}

\end{document}