Overarching Vision

I think I can get two, maybe three, papers out of this:

Estimate the joint probability of success and plan-normalized duration.
Estimate a model similar to EK's model of market vs scientific failure, incorporating the previous point.
Develop a structural model that includes a policy parameter relevant to a policy the CBO cares about (not a high priority yet).

Questions:

How can one statistically describe the way drugs pass through the FDA development pipeline?
What is the effect of policy on phase completion and phase transition?
- What is the impact of surrogate endpoints on phase completion? On phase transition?

Value Proposition:

Attempt to build a "canonical" probabilistic model of the clinical trials process that provides a simple way to test hypotheses about the impact of policies and practices on drug development.

Distinguishing features

Estimation is contingent on expectations at the beginning of a trial.
Estimation targets are conditional probability functions/distributions.

Desired Attributes of finished project

Straightforward to re-estimate the model and add extensions.
Data Processing is separate and mostly automated.
- An input data standard exists.
Inference method is well supported.
Documentation of code exists, and is useful.

Models

A couple of general principles.

Describe the model in terms of conditional probability distributions (not expectations).
Model based on information known at the beginning of the trial (a benefit of the historical data we captured).
- This might allow me to escape EK's Markov modeling approach for phase transitions.

Phase completion: Joint Probability Estimation (single paper?)

This is just following in the footsteps of AbrantesMetz-Adams-and-Metz, but with a possibly different probability estimation approach. Also, I think I can improve the richness of the terminal states.

Can probably use the data we have. This would include plan-normalized duration, which uses the planned completion date to compute the ratio: (actual completion date - start date) / (planned completion date - start date)
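The ratio above can be computed directly from three dates. The following is a minimal sketch, not the project's actual code: the function name `plan_normalized_duration` and its argument names are hypothetical, and the dates in the example are made up for illustration.

```python
from datetime import date


def plan_normalized_duration(start: date, planned_end: date, actual_end: date) -> float:
    """Return (actual_end - start) / (planned_end - start), measured in days.

    A value of 1.0 means the trial finished exactly on plan; values above 1.0
    mean it ran longer than planned.
    """
    planned_days = (planned_end - start).days
    if planned_days <= 0:
        # The ratio is undefined when the plan has zero or negative length.
        raise ValueError("planned completion date must fall after the start date")
    return (actual_end - start).days / planned_days


# Illustrative dates only: planned to run 365 days, actually ran 730 days.
ratio = plan_normalized_duration(date(2015, 1, 1), date(2016, 1, 1), date(2016, 12, 31))
```

Whether trials with missing or revised planned completion dates should be dropped or handled separately is a data-processing decision this sketch does not address.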

By estimating P(end condition | data) and P(plan-normalized duration | end condition & data), I can get P(end condition & plan-normalized duration | data), the more useful joint probability describing phase completion.
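In symbols, this is just the chain rule of probability. The notation here is an assumption introduced for illustration: $E$ is the end condition, $D$ is the plan-normalized duration, and $X$ is the data known at the start of the trial.

```latex
% Joint distribution of end condition and plan-normalized duration,
% built from the two conditional pieces that are estimated separately.
P(E = e,\; D \le d \mid X)
  \;=\; P(E = e \mid X)\,\cdot\, P(D \le d \mid E = e,\; X)
```

The first factor is a categorical model over terminal states and the second is a conditional distribution over durations, so the two can be estimated with different methods and multiplied afterwards.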

Phase Transition: Probability Estimation (single paper?)

Once I have the joint probability, I can begin estimating a model similar to Ekaterina Khmelnitskaya's model of drug development, trying to separate the probability of scientific drops from that of market drops.

This would use the IND or similar data to build the list of phase transitions.

Design tradeoffs.

Some design tradeoffs include:

Normalized or non-normalized transition paths? I'm particularly interested in including mixed-phase paths, i.e. non-normalized paths.

Overall Model

Overall, this would allow me to construct a probabilistic description of passing through the trials portion of the R&D pipeline. The probabilistic model could then be used to answer various questions, including causal/structural questions. It would also be straightforward to develop simulations from this approach, since the estimated probabilities can be sampled from directly.
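To illustrate how a simulation could follow from the estimated probabilities, here is a minimal sketch. Everything in it is an assumption: the probability values are hypothetical placeholders (not estimates from any data), the names `P_COMPLETE`, `P_TRANSITION`, `simulate_drug`, and `simulate_pipeline` are invented for this example, and the probabilities are held constant rather than conditioned on trial data or durations as the full model would require.

```python
import random

# Hypothetical placeholder values, NOT estimates from any data.
P_COMPLETE = {1: 0.9, 2: 0.8, 3: 0.7}  # P(complete phase k | entered phase k)
P_TRANSITION = {1: 0.6, 2: 0.5}        # P(enter phase k+1 | completed phase k)


def simulate_drug(rng: random.Random) -> int:
    """Simulate one drug; return the furthest phase completed (0 if none)."""
    furthest = 0
    for phase in (1, 2, 3):
        if rng.random() >= P_COMPLETE[phase]:
            break  # the phase was entered but not completed
        furthest = phase
        if phase < 3 and rng.random() >= P_TRANSITION[phase]:
            break  # the phase was completed but the drug did not move on
    return furthest


def simulate_pipeline(n_drugs: int, seed: int = 0) -> dict:
    """Return the share of simulated drugs whose furthest completed phase is 0..3."""
    rng = random.Random(seed)
    counts = {0: 0, 1: 0, 2: 0, 3: 0}
    for _ in range(n_drugs):
        counts[simulate_drug(rng)] += 1
    return {phase: count / n_drugs for phase, count in counts.items()}


shares = simulate_pipeline(10_000)
```

In the full model, each constant would be replaced by a draw from the corresponding estimated conditional distribution, and a policy counterfactual would amount to changing the conditioning information before simulating.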

Data

Phase Completion

Use the API access given to CBO to get:

Information at the beginning of the project.
Information after wrap-up of the project (completion status).

This would allow estimation to be contingent on beliefs at the beginning of the trial.

Estimation Strategy

Probably use a nonparametric (possibly nonparametric Bayesian?) approach to estimating the probability densities.
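As a point of reference for what a nonparametric density estimate looks like, here is a minimal sketch of a plain Gaussian kernel density estimator. This is a stand-in chosen for simplicity, not the nonparametric Bayesian approach mentioned above and not a committed design choice. The function name `gaussian_kde`, the bandwidth, and the sample durations are all made up for illustration.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of `samples` with Gaussian kernels."""
    n = len(samples)
    if n == 0 or bandwidth <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Average of Gaussian bumps centred on each observed sample.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Made-up plan-normalized durations, for illustration only.
durations = [0.8, 1.0, 1.1, 1.3, 2.0]
pdf = gaussian_kde(durations, bandwidth=0.2)
```

Fitting one such density per end condition would give a crude version of P(plan-normalized duration | end condition), though it ignores the other conditioning data that the real estimator would need to handle.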

Phase Completion Probabilities:

AbrantesMetz-Adams-and-Metz use a mixed state proportional hazards model.

The goal is to estimate

P(completing phase 1 & duration phase 1 | Data)
P(completing phase 2 & duration phase 2 | Data & transitioned to phase 2 & duration phase 1)
P(completing phase 3 & duration phase 3 | Data & transitioned to phase 3 & durations phases 1,2)

I may also include the combined phases

P(completing combined phase 1-2 & duration phase 1-2 | Data)
P(completing combined phase 2-3 & duration phase 2-3 | Data & transitioned to phase 2 & duration phase 1)

I think there might be a way of condensing these using "phase endpoints" as a marker, and not the phase type itself.

Phase Transition Probabilities:

EK used ... (TODO: finish describing)

The goal is to estimate

P(transitioning to phase 2 | completing phase 1 & duration phase 1 & Data & market data)
P(transitioning to combined phases 2-3 | completing phase 1 & duration phase 1 & Data & market data)
P(transitioning to phase 3 | completing phase 2 & durations phases 1,2 & Data & market data)
P(transitioning to phase 3 | completing combined phase 1-2 & duration phase 1-2 & Data & market data)
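The completion and transition terms can be chained to give the probability of a path through the pipeline. The sketch below covers only the standard phase 1 to phase 2 path and rests on assumptions that are not stated in these notes: $C_k$, $D_k$, and $T_k$ denote completing phase $k$, its duration, and transitioning into phase $k$; $X$ is the trial data and $M$ the market data; completion outcomes are taken to be independent of $M$ given the other conditioning information; and transitioning into phase 2 is taken to imply that phase 1 was completed.

```latex
% Chain-rule decomposition of the phase 1 -> phase 2 path,
% under the conditional-independence assumptions stated above.
P(C_1, D_1, T_2, C_2, D_2 \mid X, M)
  \;=\; \underbrace{P(C_1, D_1 \mid X)}_{\text{phase 1 completion}}
  \;\cdot\; \underbrace{P(T_2 \mid C_1, D_1, X, M)}_{\text{transition to phase 2}}
  \;\cdot\; \underbrace{P(C_2, D_2 \mid X, T_2, D_1)}_{\text{phase 2 completion}}
```

If market data does affect completion, the first and third factors would need $M$ added to their conditioning sets. Paths through the combined phases follow the same pattern with the corresponding terms swapped in.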

Identification Strategies

Below is a list of challenges to identification and how to overcome them.

Descriptive identification

Causal Identification
