# ClinicalTrialsDataProcessing

This represents my

## Prerequisites

> - Python >= 3.10 (required for the `match` statement)
> - Docker >= 20.10
> - Curl >= 7
> - Just >= 1.9

# Usage
## Basic usage

Check the prerequisites:

```bash
just check-status
```

Set up the underlying AACT database, which includes downloading both
the AACT dump and the historical data:

```bash
just create
just select-trials
just count=1000 get-histories
```

Replace the 1000 in `count=1000` with the number of trials you want to download.

## Advanced usage

If you need to reset the database without re-downloading the AACT dump:

```bash
just rebuild
just select-trials
just count=1000 get-histories
```

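The two command sequences above differ only in their first recipe (`create` versus `rebuild`). If you want to drive them from a Python script, a minimal sketch could look like the following. The function names `setup_commands` and `run_all` are hypothetical and not part of this repository; only the `just` recipes and the `count=` override come from this README.

```python
import subprocess


def setup_commands(count, rebuild=False):
    """Return the `just` invocations from this README as argument lists.

    `count` is the number of trials to download; `rebuild=True` swaps
    `just create` for `just rebuild`.
    """
    if not isinstance(count, int) or count < 1:
        raise ValueError("count must be a positive integer")
    first_recipe = "rebuild" if rebuild else "create"
    return [
        ["just", first_recipe],
        ["just", "select-trials"],
        ["just", f"count={count}", "get-histories"],
    ]


def run_all(count, rebuild=False):
    """Run each command in order, stopping at the first failure."""
    for command in setup_commands(count, rebuild=rebuild):
        subprocess.run(command, check=True)
```

Calling `run_all(1000)` would then mirror the basic usage, and `run_all(1000, rebuild=True)` the reset described here, assuming `just` is on your `PATH` and you run it from the repository root.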
### Description of all the `just` recipes
# Background information

This project is designed to run on a Linux machine with bash.
If you use a shell other than bash, you will need to know how to
run these commands through bash.

If any of the discussion below doesn't make sense, talk to your sysadmin,
a local Linux user, or reach out to the author.

## Just installation

I use the command runner `just` to automate and simplify setting up the
Docker containers and running many of the Python scripts.
It is similar to `make` in many ways but is designed to do less.

`just` can be installed from https://github.com/casey/just/

## Python installation

This project requires Python 3.10 or above because the HTML parser
uses match-case statements.

Check which version of Python you have by running `python --version`.
If you do not have the required version, I recommend installing
the conda package manager and setting up a conda environment with Python 3.10.
Instructions for doing so are available online.

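If you prefer to check the version from inside Python rather than the shell, a minimal sketch follows. The name `meets_minimum` is hypothetical and not part of this repository. Keep a check like this in a file that contains no match-case syntax, because an older interpreter would fail with a `SyntaxError` before the check could run.

```python
import sys

MIN_VERSION = (3, 10)  # match-case statements were added in Python 3.10


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if meets_minimum():
        print("Python version OK")
    else:
        found = ".".join(str(part) for part in sys.version_info[:3])
        sys.exit(f"Python 3.10 or newer is required, found {found}")
```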
## Docker and Postgres

Docker is a tool to manage and run OCI containers.
For this project, that means Docker makes it
easy to set up the containers the project uses.

Install Docker by following the instructions for your Linux distribution.

### Docker networking

I have the Docker container for the database attached to a
network called "pharmaceutical_research" because I have a
container with pgadmin4 running on that Docker network.
This can be adjusted in the dockerfile.

I also expose the database container on port 5432, the default
PostgreSQL port.

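To confirm that the database container is actually listening on port 5432, a quick TCP check can help. This is a minimal sketch using only the standard library; the function name `is_port_open` is hypothetical, and the default host `localhost` is an assumption about where the container's port is published. It only checks that something accepts connections on the port, not that it is Postgres.

```python
import socket


def is_port_open(host="localhost", port=5432, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    state = "reachable" if is_port_open() else "not reachable"
    print(f"Port 5432 on localhost is {state}")
```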
### Database logins

I have chosen the database user *root* with the password *root*
because I don't really need this database to be secure.
If you do need to think about the security of your database, I recommend
that you start by changing these.

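If you do change the credentials, one option is to read them from environment variables instead of hard-coding them in scripts. The sketch below builds a PostgreSQL connection URI with the defaults described in this README (*root*/*root*, port 5432). The environment variable names, the `build_dsn` function, and the database name `aact` are all assumptions made for illustration; check the repository's own configuration for the real values.

```python
import os
from urllib.parse import quote


def build_dsn(env=None):
    """Build a PostgreSQL connection URI, falling back to this README's defaults."""
    env = os.environ if env is None else env
    user = env.get("AACT_DB_USER", "root")          # hypothetical variable name
    password = env.get("AACT_DB_PASSWORD", "root")  # hypothetical variable name
    host = env.get("AACT_DB_HOST", "localhost")     # hypothetical variable name
    port = env.get("AACT_DB_PORT", "5432")          # hypothetical variable name
    dbname = env.get("AACT_DB_NAME", "aact")        # hypothetical name and default
    # Percent-encode the parts that may contain reserved characters such as '@' or '/'.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"postgresql://{credentials}@{host}:{port}/{quote(dbname, safe='')}"
```

The resulting string uses the standard `postgresql://` URI form, which common Postgres client libraries accept.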