\documentclass[../Main.tex]{subfiles}
\graphicspath{{\subfix{Assets/img/}}}
\begin{document}
As noted above, the analysis completed so far has several issues.
Below I discuss the steps that I believe will improve it.

\subsection{Increasing number of observations}
The most important step is to increase the number of observations available.
Currently this requires matching trials to ICD-10 codes by hand, but
there are certainly steps that can be taken to speed up this process.

\subsection{Covariance Structure}
As noted in the diagnostics section, many of the convergence issues seem
to occur in the covariance structure.
Instead of representing the parameters $\beta$ as independently normal:
\begin{align}
\beta_k(d) \sim \text{Normal}(\mu_k, \sigma_k)
\end{align}
I propose using a multivariate normal distribution:
\begin{align}
\beta(d) \sim \text{MvNormal}(\mu, \Sigma)
\end{align}
I am not familiar with typical approaches to priors on the covariance matrix,
so this will require a further literature search for best practices.

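As a starting point for that search, one decomposition used for hierarchical
models, for example in \cite{mcelreath_statistical_2020}, separates the scales
from the correlations.
The sketch below is only a candidate that has not been evaluated for this model;
the symbols $R$, $K$ and $\eta$ and the specific hyperparameter values are
assumptions introduced for illustration:
\begin{align}
\Sigma &= \text{diag}(\sigma) \, R \, \text{diag}(\sigma) \\
\sigma_k &\sim \text{Exponential}(1), \quad k = 1, \dots, K \\
R &\sim \text{LKJ}(\eta), \quad \eta = 2
\end{align}
Here $\sigma$ collects the $K$ per-parameter standard deviations $\sigma_k$ from
the independent specification, and $R$ is a correlation matrix.
Setting $R = I$ recovers the independent normal model, and values of $\eta > 1$
place more prior mass near $R = I$, i.e. on weak correlations.
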
\subsection{Finding Reasonable Priors}
In standard Bayesian regression, heavy-tailed priors are common.
In a Bayesian Bernoulli-logit model they are not appropriate, because
heavy tails on the logit scale cause the implied probabilities $p_n$ to concentrate around the
values $0$ and $1$, and away from values such as $\frac{1}{2}$, as discussed in
\cite{mcelreath_statistical_2020}. %TODO: double check the chapter for this.

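To make this concrete, consider a simplified illustration, assumed here for
exposition rather than taken from the model above, with a single intercept
$\alpha$ and $p = \text{logit}^{-1}(\alpha)$.
Values $|\alpha| > 3$ correspond to $p < 0.047$ or $p > 0.953$.
The prior mass placed on that region is
\begin{align}
\alpha \sim \text{Cauchy}(0, 1): \quad & \Pr(|\alpha| > 3) = 1 - \tfrac{2}{\pi}\arctan(3) \approx 0.205 \\
\alpha \sim \text{Normal}(0, 1): \quad & \Pr(|\alpha| > 3) = 2\,(1 - \Phi(3)) \approx 0.003
\end{align}
where $\Phi$ is the standard normal distribution function.
Under the heavy-tailed prior, roughly a fifth of the prior mass sits on
near-certain outcomes before any data are seen.
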
I intend to take the general approach recommended in \cite{mcelreath_statistical_2020} of using
prior predictive checks to evaluate the implications of different priors
for the distribution of $p_n$.
This would consist of taking the independent variables and predicting the values
of $p_n$ based on a proposed set of priors.
By plotting these predictions, I can ensure that the specific parameter priors
used are consistent with my prior beliefs about how $p_n$ behaves.
Currently I believe that the distribution of $p_n$ should be roughly uniform or unimodal, centered
around $p_n = \frac{1}{2}$.

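In symbols, a prior predictive check of this kind could proceed as follows.
This is a sketch under the assumption that the linear predictor has the form
$x_n^\top \beta(d_n)$, where $x_n$ is the covariate vector and $d_n$ the
category of observation $n$; these symbols, the hyperprior $\pi$, and the
number of draws $S$ are introduced here for illustration.
For each draw $s = 1, \dots, S$:
\begin{align}
(\mu^{(s)}, \sigma^{(s)}) &\sim \pi(\mu, \sigma) \\
\beta_k^{(s)}(d) &\sim \text{Normal}(\mu_k^{(s)}, \sigma_k^{(s)}) \\
p_n^{(s)} &= \text{logit}^{-1}\left( x_n^\top \beta^{(s)}(d_n) \right)
\end{align}
A histogram of the pooled values $p_n^{(s)}$ then shows what the proposed
priors imply about $p_n$ before conditioning on any outcomes.
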
\subsection{Imputing Enrollment}
I must also address the issue of how enrollment is reported.
In many cases, the trial continues to report an anticipated enrollment value
while the trial is still recruiting.
Thus using anticipated enrollment figures as if they were actual enrollment is inappropriate.
I am planning on using Bayesian imputation to estimate actual enrollment
when it has not yet occurred.
This will require building a statistical model of the enrollment process.
One advantage of this dataset is that trial sponsors provide their anticipated
enrollment numbers, allowing me to use them in the prediction model.
Additionally, each snapshot contains the elapsed duration and current status of
the trial, which may help improve the prediction.
Although predicted enrollment will be imprecise, Bayesian imputation explicitly
propagates the uncertainty in the imputed values into dependent calculations
\cite{mcelreath_statistical_2020}.

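One possible form for such a model is sketched below.
It is an illustration only: the log-linear form and the symbols $E_i$ (actual
enrollment), $A_i$ (anticipated enrollment), $t_i$ (elapsed duration), $s_i$
(current status), $\gamma$ and $\tau$ are assumptions introduced here, not a
settled specification.
\begin{align}
\log E_i &\sim \text{Normal}(\nu_i, \tau) \\
\nu_i &= \gamma_0 + \gamma_1 \log A_i + \gamma_2 t_i + \gamma_3(s_i)
\end{align}
For snapshots where $E_i$ is reported, it enters as data and informs $\gamma$
and $\tau$.
Where it is not yet reported, $E_i$ is treated as an unknown parameter with the
same distribution, so each posterior draw carries its own imputed value into
the main model.
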
\subsection{Improving Population Estimates}
The Global Burden of Disease (GBD) dataset contains the best estimates of disease
population sizes that I have found so far.
Unfortunately, for some conditions it can be relatively imprecise due to
its focus on providing data geared towards public health policy.
For example, GBD contains categories for both
drug-resistant and drug-susceptible tuberculosis.
In contrast, there is no category for non-age-related macular degeneration.
One resulting concern is that, for a given ICD-10 code, the applicable GBD population
estimate may act as an estimate of an upper bound on the population size
(\cite{global_burden_of_disease_collective_network_global_2020}). %fix citation
I would like to explicitly address this in my model, although I have not yet
found a way to do so.

\subsection{Improving Measures of Market Conditions}
Finally, the currently employed measure of market conditions, the number of
brands using the same active ingredients, is not a very good measure of
the options available to potential participants of a clinical trial.
The ideal measure would capture the alternatives available to treat a given
disease (drugs meeting the given indication) at the time of the trial snapshot,
but this data is hard to come by.
In addition to the fact that many diseases may be treated by non-pharmaceutical
means, off-label prescription of pharmaceuticals is legal at the federal level
(\cite{commissioner_understanding_2019}).
These two facts both complicate measuring market conditions.

One dataset that I have only investigated briefly is the \url{DrugCentral.org}
database, which tracks official indications as well as some off-label indications
(\cite{ursu_drugcentral_2017}).

\end{document}