tweaked econometrics presentation, added todos

claude_rewrite
will king 1 year ago
parent 7d51cb10b3
commit 3e6a8f10d4

@@ -68,10 +68,10 @@ Section \ref{SEC:Results} discusses the results of the analysis.
 \subfile{sections/10_CausalStory}
 \subfile{sections/02_data}
-%---------------------------------------------------------------
-\section{Causal Identification}\label{SEC:CausalIdentification}
-%---------------------------------------------------------------
-\subfile{sections/03_CausalIdentification}
+% %---------------------------------------------------------------
+% \section{Causal Identification}\label{SEC:CausalIdentification}
+% %---------------------------------------------------------------
+% \subfile{sections/03_CausalIdentification}
 %---------------------------------------------------------------
 \section{Econometric Model}\label{SEC:EconometricModel}

@@ -8,61 +8,77 @@
 % How do I propose estimating that?
 %%NOTATION
+% change notation
+% i indexes trials for y and d
+% n indexes snapshots within the trial
 First, some notation:
 \begin{itemize}
+\item $i$: indexes trials
 \item $n$: indexes trial snapshots.
-\item $y_n$: whether each trial terminated (true) or completed (false).
-\item $d$: indexes ICD-10 disease categories.
-\item $d_n$: represents the disease category of the trial associated with the snapshot $n$.
-\item $x_n$: represents the other dependent variables associated to the snapshot.
-This includes\footnote{No trials in the current dataset are ever suspended.}:
-\begin{enumerate}
-\item Elapsed duration
-\item arcsinh of the number of brands
-\item arcsinh of the DALYs from high SDI countries
-\item arcsinh of the DALYs from high-medium SDI countries
-\item Enrollment (no distinction between anticipated or actual)
-\item Dummy Status: Not yet recruiting
-\item Dummy Status: Recruiting
-\item Dummy Status: Active, not recruiting
-\item Dummy Status: Enrolling by invitation
-\end{enumerate}
+\item $y_i$: whether each trial terminated (true) or completed (false).
+\item $d_i$: indexes the ICD-10 disease categories per trial.
+\item $x_{i,n}$: represents the other dependent
+variables associated with the snapshot.
+% This includes\footnote{No trials in the current dataset are ever suspended.}:
+% \begin{enumerate}
+% \item Elapsed duration
+% \item arcsinh of the number of brands
+% \item arcsinh of the DALYs from high SDI countries
+% \item arcsinh of the DALYs from high-medium SDI countries
+% \item Enrollment (no distinction between anticipated or actual)
+% \item Dummy Status: Not yet recruiting
+% \item Dummy Status: Recruiting
+% \item Dummy Status: Active, not recruiting
+% \item Dummy Status: Enrolling by invitation
+% \end{enumerate}
 \end{itemize}
-The arcsinh transform is used because it is similar to a log transform but
-maps $\text{arcsinh}(0)=0$.
-The bayesian model to measure the direct effects of enrollment and the number
-of other brands is easily specified as a hierarchal logistic regression.
+% The arcsinh transform is used because it is similar to a log transform but
+% maps $\text{arcsinh}(0)=0$.
+The bayesian model to measure the direct effect of enrollment
+is specified as a hierarchal logistic regression.
 \begin{align}
-y_n \sim \text{Bernoulli}(p_n) \\
-p_n = \text{logit}(x_n \vec \beta(d_n))
+y_i \sim \text{Bernoulli}(p_{i,n}) \\
+p_{i,n} = \text{logit}(x_{i,n} \vec \beta(d_n))
 \end{align}
-Where beta is indexed by $k$ for each parameter in $x$, and by
-$d \in \{1,2,\dots,21,22\}$ for each general ICD-10 category.
+Where beta is indexed by
+$d \in \{1,2,\dots,21,22\}$
+for each general ICD-10 category.
 The betas are distributed
 \begin{align}
-\beta_k(d) \sim \text{Normal}(\mu_k,\sigma_k)
+\beta(d) \sim \text{Normal}(\mu,\sigma I)
 \end{align}
-With hyperparameters
+With hyperpriors
 \begin{align}
 \mu_k \sim \text{Normal}(0,0.05) \\
 \sigma_k \sim \text{Gamma}(4,20)
 \end{align}
-Other variables are implicitly conditioned on as they were used
-to select trials of interest.
-These include:
+\todo{Double check that these are the priors I used.}
+Other variables are implicitly conditioned-on as they are used
+to select the trials of interest.
+I ensured that:
+\todo{double check these in the code.}
 \begin{itemize}
-\item Is the trial Phase 3?\footnote{
-Conditioning on phase 3 is equivalent to asserting that previous trials
-occured and had acceptable safety and efficacy results.
+\item The trial is Phase 3.
+\item The trial has a Data Monitoring Committee.
+\item The compounds are FDA regulated drug.
+\item The trial was never suspended\footnote{
+This was because I wasn't sure how to handle it in the model
+when I started scraping the data.
+Later the website changed.
+This is technically post selection in some cases.
 }
-\item Does the trial have a Data Monitoring Committee?
-\item Are the compounds an FDA regulated drug?
 \end{itemize}
-%TODO: double check the sql used to select trials of interest.
+\todo{Make sure data is described before this point.}
+\todo{Put in a standard econometrics model}
+\begin{equation}
+x\beta = \beta_0 + \beta_1 \times \text{test}
+\label{eq:test}
+\end{equation}
 \end{document}
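Taken together, the new-version equations define the model in pieces: $\beta(d)$ is written as a vector with covariance $\sigma I$, while the hyperpriors are indexed per coefficient $k$. One consistent way to write the whole model per coefficient is sketched below. It assumes the link is the inverse logit (logistic sigmoid), which maps the linear predictor $x_{i,n}\vec\beta$ to a probability in $(0,1)$, reads $\sigma I$ as a diagonal covariance with entries $\sigma_k$, and indexes the coefficients by the trial's category $d_i$ from the new notation list. The source does not say whether the second argument of Normal is a standard deviation or a variance, or whether Gamma(4, 20) uses a rate or a scale, so both are left as written.

```latex
\begin{align}
y_i &\sim \text{Bernoulli}(p_{i,n}) \\
p_{i,n} &= \text{logit}^{-1}\left(x_{i,n}\,\vec\beta(d_i)\right)
  = \frac{1}{1 + \exp\left(-x_{i,n}\,\vec\beta(d_i)\right)} \\
\beta_k(d) &\sim \text{Normal}(\mu_k, \sigma_k),
  \quad d \in \{1,2,\dots,22\} \\
\mu_k &\sim \text{Normal}(0, 0.05) \\
\sigma_k &\sim \text{Gamma}(4, 20)
\end{align}
```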

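To make the layers of the hierarchical logistic regression concrete, here is a minimal pure-Python sketch of its unnormalised log posterior, the quantity an MCMC sampler would target. It is not the author's estimation code. Assumptions, all hypothetical: Normal's second argument is a standard deviation, Gamma(4, 20) means shape 4 and rate 20, every snapshot $n$ of trial $i$ is scored against the trial-level outcome $y_i$, and the toy covariate rows, outcomes, and category labels are made-up values. The covariate column built with `math.asinh` also shows why the arcsinh transform suits count data: it grows like a log for large counts, yet `math.asinh(0)` is exactly 0 where a log would be undefined.

```python
import math

# Hypothetical sketch, not the author's code. Assumptions: Normal(m, s) takes a
# standard deviation s, Gamma(a, b) takes shape a and rate b, and each snapshot
# n of trial i is scored against the trial-level outcome y_i.


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def inv_logit(z: float) -> float:
    """Logistic sigmoid: maps the linear predictor x . beta to a probability."""
    return math.exp(-softplus(-z))


def normal_logpdf(x: float, mean: float, sd: float) -> float:
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def gamma_logpdf(x: float, shape: float, rate: float) -> float:
    if x <= 0:
        return -math.inf  # sigma_k must be positive
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1) * math.log(x) - rate * x)


def log_likelihood(X, y, d, beta) -> float:
    """Bernoulli log-likelihood summed over snapshots.

    X[n] is the covariate row of snapshot n, y[n] is 1 if its trial terminated
    and 0 if it completed, d[n] is the trial's disease category, and beta[c]
    is the coefficient vector for category c.
    """
    total = 0.0
    for x_n, y_n, c in zip(X, y, d):
        z = sum(xk * bk for xk, bk in zip(x_n, beta[c]))
        # log p = -softplus(-z) and log(1 - p) = -softplus(z)
        total += -softplus(-z) if y_n else -softplus(z)
    return total


def log_posterior(X, y, d, beta, mu, sigma) -> float:
    """Unnormalised log posterior: likelihood + category-level prior + hyperpriors."""
    if any(s <= 0 for s in sigma):
        return -math.inf  # outside the support of the Gamma hyperprior
    lp = log_likelihood(X, y, d, beta)
    for coefs in beta.values():  # beta_k(d) ~ Normal(mu_k, sigma_k)
        for k, b in enumerate(coefs):
            lp += normal_logpdf(b, mu[k], sigma[k])
    for k in range(len(mu)):  # hyperpriors on mu_k and sigma_k
        lp += normal_logpdf(mu[k], 0.0, 0.05)
        lp += gamma_logpdf(sigma[k], 4.0, 20.0)
    return lp


# Toy, made-up data: columns are elapsed duration and arcsinh(number of brands).
# math.asinh(0) == 0, so a snapshot with zero brands gets 0 in that column.
X = [[0.5, math.asinh(0)], [1.5, math.asinh(4)], [2.0, math.asinh(12)]]
y = [1, 1, 0]  # snapshots 0 and 1 belong to one terminated trial
d = [3, 3, 7]  # hypothetical ICD-10 chapter labels
beta = {3: [0.1, -0.2], 7: [0.0, 0.3]}
mu = [0.0, 0.0]
sigma = [0.2, 0.2]
print(log_posterior(X, y, d, beta, mu, sigma))
```

Keeping the likelihood, the category-level prior, and the hyperpriors as separate sums means each layer of the model maps to one loop; in practice the same structure would usually be written in a probabilistic-programming language such as Stan or PyMC rather than sampled by hand.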