|
|
|
@ -8,61 +8,77 @@
|
|
|
|
% How do I propose estimating that?
|
|
|
|
% How do I propose estimating that?
|
|
|
|
|
|
|
|
|
|
|
|
%%NOTATION
|
|
|
|
%%NOTATION
|
|
|
|
|
|
|
|
% change notation
|
|
|
|
|
|
|
|
% i indexes trials for y and d
|
|
|
|
|
|
|
|
% n indexes snapshots within the trial
|
|
|
|
|
|
|
|
|
|
|
|
First, some notation:
|
|
|
|
First, some notation:
|
|
|
|
\begin{itemize}
|
|
|
|
\begin{itemize}
|
|
|
|
|
|
|
|
\item $i$: indexes trials
|
|
|
|
\item $n$: indexes trial snapshots.
|
|
|
|
\item $n$: indexes trial snapshots.
|
|
|
|
\item $y_n$: whether each trial terminated (true) or completed (false).
|
|
|
|
\item $y_i$: whether each trial terminated (true) or completed (false).
|
|
|
|
\item $d$: indexes ICD-10 disease categories.
|
|
|
|
\item $d_i$: indexes the ICD-10 disease categories per trial.
|
|
|
|
\item $d_n$: represents the disease category of the trial associated with the snapshot $n$.
|
|
|
|
\item $x_{i,n}$: represents the other independent
|
|
|
|
\item $x_n$: represents the other independent variables associated with the snapshot.
|
|
|
|
variables associated with the snapshot.
|
|
|
|
This includes\footnote{No trials in the current dataset are ever suspended.}:
|
|
|
|
% This includes\footnote{No trials in the current dataset are ever suspended.}:
|
|
|
|
\begin{enumerate}
|
|
|
|
% \begin{enumerate}
|
|
|
|
\item Elapsed duration
|
|
|
|
% \item Elapsed duration
|
|
|
|
\item arcsinh of the number of brands
|
|
|
|
% \item arcsinh of the number of brands
|
|
|
|
\item arcsinh of the DALYs from high SDI countries
|
|
|
|
% \item arcsinh of the DALYs from high SDI countries
|
|
|
|
\item arcsinh of the DALYs from high-medium SDI countries
|
|
|
|
% \item arcsinh of the DALYs from high-medium SDI countries
|
|
|
|
\item Enrollment (no distinction between anticipated or actual)
|
|
|
|
% \item Enrollment (no distinction between anticipated or actual)
|
|
|
|
\item Dummy Status: Not yet recruiting
|
|
|
|
% \item Dummy Status: Not yet recruiting
|
|
|
|
\item Dummy Status: Recruiting
|
|
|
|
% \item Dummy Status: Recruiting
|
|
|
|
\item Dummy Status: Active, not recruiting
|
|
|
|
% \item Dummy Status: Active, not recruiting
|
|
|
|
\item Dummy Status: Enrolling by invitation
|
|
|
|
% \item Dummy Status: Enrolling by invitation
|
|
|
|
\end{enumerate}
|
|
|
|
% \end{enumerate}
|
|
|
|
\end{itemize}
|
|
|
|
\end{itemize}
|
|
|
|
The arcsinh transform is used because it is similar to a log transform but
|
|
|
|
% The arcsinh transform is used because it is similar to a log transform but
|
|
|
|
maps $\text{arcsinh}(0)=0$.
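As a brief illustration of that similarity (a standard identity, stated here for reference rather than taken from the analysis code):
\begin{align}
  \text{arcsinh}(x) = \ln\left(x + \sqrt{x^2 + 1}\right) \approx \ln(2x) = \ln(x) + \ln(2)
  \quad \text{for large } x > 0,
\end{align}
so for large values the transform differs from the log by roughly the constant $\ln 2$, while remaining defined at $x = 0$.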
|
|
|
|
% maps $\text{arcsinh}(0)=0$.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The Bayesian model to measure the direct effects of enrollment and the number
|
|
|
|
The Bayesian model to measure the direct effect of enrollment
|
|
|
|
of other brands is easily specified as a hierarchical logistic regression.
|
|
|
|
is specified as a hierarchical logistic regression.
|
|
|
|
\begin{align}
|
|
|
|
\begin{align}
|
|
|
|
y_n \sim \text{Bernoulli}(p_n) \\
|
|
|
|
y_i \sim \text{Bernoulli}(p_{i,n}) \\
|
|
|
|
p_n = \text{logit}^{-1}(x_n \vec \beta(d_n))
|
|
|
|
p_{i,n} = \text{logit}^{-1}(x_{i,n} \vec \beta(d_n))
|
|
|
|
\end{align}
|
|
|
|
\end{align}
|
|
|
|
where $\beta$ is indexed by $k$ for each parameter in $x$, and by
|
|
|
|
where $\beta$ is indexed by
|
|
|
|
$d \in \{1,2,\dots,21,22\}$ for each general ICD-10 category.
|
|
|
|
$d \in \{1,2,\dots,21,22\}$
|
|
|
|
|
|
|
|
for each general ICD-10 category.
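For the model above, and assuming the link is the standard logistic (inverse-logit) function so that each probability lies in $(0,1)$, the relationship between a linear predictor $\eta$ and the probability $p$ can be written out as
\begin{align}
  p = \frac{1}{1 + e^{-\eta}}
  \quad \Longleftrightarrow \quad
  \ln\left(\frac{p}{1-p}\right) = \eta,
\end{align}
where $\eta$ is shorthand, introduced here only for illustration, for the product of the snapshot's covariates with the coefficient vector of its disease category.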
|
|
|
|
The betas are distributed
|
|
|
|
The betas are distributed
|
|
|
|
\begin{align}
|
|
|
|
\begin{align}
|
|
|
|
\beta_k(d) \sim \text{Normal}(\mu_k,\sigma_k)
|
|
|
|
\beta(d) \sim \text{Normal}(\mu,\sigma I)
|
|
|
|
\end{align}
|
|
|
|
\end{align}
|
|
|
|
With hyperparameters
|
|
|
|
With hyperpriors
|
|
|
|
\begin{align}
|
|
|
|
\begin{align}
|
|
|
|
\mu_k \sim \text{Normal}(0,0.05) \\
|
|
|
|
\mu_k \sim \text{Normal}(0,0.05) \\
|
|
|
|
\sigma_k \sim \text{Gamma}(4,20)
|
|
|
|
\sigma_k \sim \text{Gamma}(4,20)
|
|
|
|
\end{align}
|
|
|
|
\end{align}
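To make the scale of the prior on $\sigma_k$ concrete, the moments below assume that $\text{Gamma}(4,20)$ is parameterized by shape and rate (the convention in Stan, for example); this is an assumption, and under a shape--scale convention the values would differ:
\begin{align}
  \text{E}[\sigma_k] = \frac{4}{20} = 0.2,
  \qquad
  \text{SD}[\sigma_k] = \frac{\sqrt{4}}{20} = 0.1 .
\end{align}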
|
|
|
|
|
|
|
|
\todo{Double check that these are the priors I used.}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Other variables are implicitly conditioned on as they were used
|
|
|
|
Other variables are implicitly conditioned on, as they are used
|
|
|
|
to select trials of interest.
|
|
|
|
to select the trials of interest.
|
|
|
|
These include:
|
|
|
|
I ensured that:
|
|
|
|
|
|
|
|
\todo{double check these in the code.}
|
|
|
|
\begin{itemize}
|
|
|
|
\begin{itemize}
|
|
|
|
\item Is the trial Phase 3?\footnote{
|
|
|
|
\item The trial is Phase 3.
|
|
|
|
Conditioning on phase 3 is equivalent to asserting that previous trials
|
|
|
|
\item The trial has a Data Monitoring Committee.
|
|
|
|
occurred and had acceptable safety and efficacy results.
|
|
|
|
\item The compounds are FDA-regulated drugs.
|
|
|
|
}
|
|
|
|
\item The trial was never suspended\footnote{
|
|
|
|
\item Does the trial have a Data Monitoring Committee?
|
|
|
|
This was because I wasn't sure how to handle it in the model
|
|
|
|
\item Are the compounds FDA-regulated drugs?
|
|
|
|
when I started scraping the data.
|
|
|
|
|
|
|
|
Later the website changed.
|
|
|
|
|
|
|
|
This is technically post-selection in some cases.
|
|
|
|
|
|
|
|
}
|
|
|
|
\end{itemize}
|
|
|
|
\end{itemize}
|
|
|
|
%TODO: double check the sql used to select trials of interest.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
\todo{Make sure data is described before this point.}
|
|
|
|
|
|
|
|
\todo{Put in a standard econometrics model}
|
|
|
|
|
|
|
|
\begin{equation}
|
|
|
|
|
|
|
|
x\beta = \beta_0 + \beta_1 \times \text{test}
|
|
|
|
|
|
|
|
\label{eq:test}
|
|
|
|
|
|
|
|
\end{equation}
|
|
|
|
\end{document}
|
|
|
|
\end{document}
|
|
|
|
|