# ClinicalTrialsDataProcessing

This represents my

## Prerequisites

> Python >= 3.10 (requires the `match` statement)
> Docker >= 20.10
> curl >= 7
> Just >= 1.9

# Usage

## Basic usage

Check the prerequisites:
```bash
just check-status
```

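If `just` itself is not installed yet, the `check-status` recipe cannot run. As a rough stand-in, the sketch below (not part of this repository; `missing_tools`, `python_is_new_enough`, and `REQUIRED_TOOLS` are hypothetical names) only checks that the tools listed under Prerequisites are on `PATH` and that the interpreter is new enough. It does not compare tool versions.

```python
import shutil
import sys

# Executables named in the Prerequisites list; adjust if you use podman.
REQUIRED_TOOLS = ("docker", "curl", "just")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_is_new_enough(version=sys.version_info):
    """True when the given version is at least Python 3.10."""
    return tuple(version[:2]) >= (3, 10)


if __name__ == "__main__":
    for tool in missing_tools():
        print(f"missing: {tool}")
    if not python_is_new_enough():
        print("Python 3.10 or newer is required")
```
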
Set up the underlying AACT database, including downloading both
the AACT dump and the historical data:
```bash
just create
just select-trials
just count=1000 get-histories
```
Replace the 1000 in `count=1000` with the number of trials you want to download.

## Advanced usage

If you need to reset the database without downloading the AACT dump again:
```bash
just rebuild
just select-trials
just count=1000 get-histories
```

### Description of all the `just` recipes

# Background information

This project is designed to run on a Linux machine with bash.
If you use a shell other than bash, make sure you know how to
run these commands under bash.

If any of the discussion below doesn't make sense, talk to your sysadmin,
a local Linux user, or reach out to the author.

## Just installation

I use the command runner `just` to automate and simplify setting up the
Docker containers and running many of the Python scripts.
It is similar to `make` in many ways but is designed to do less.

Just can be installed from https://github.com/casey/just/

## Python installation

This project requires Python 3.10 or above because the HTML parser
uses `match`/`case` statements.

Check which version of Python you have by typing `python --version`.
If you do not have the required version, I recommend installing
the conda Python manager and setting up a conda environment with Python 3.10.
Instructions for doing so are available online.

## Docker and Postgres

Docker is a tool to manage and run OCI containers.
For this project, that means Docker makes it
easy to set up the containers it needs.

Install Docker following the instructions for your Linux distribution.
I use podman (an alternative from Red Hat) because it can run without root permissions.

### Docker networking

It is helpful to create an external Docker network by running

`docker network create network_name`

and then including that network in `docker-compose.yaml`.

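If you script your setup in Python, the network creation step can be made repeatable. The following is only a sketch under stated assumptions: `ensure_network` is a hypothetical helper, not something this repository provides, and it relies on `docker network inspect` exiting with a non-zero status when the network does not exist.

```python
import subprocess


def ensure_network(name, docker="docker", run=subprocess.run):
    """Create the network `name` unless it already exists.

    Returns True if the network was created, False if it was already there.
    `docker` is the executable to call; pass "podman" if that is what you use.
    """
    inspect = run([docker, "network", "inspect", name], capture_output=True)
    if inspect.returncode == 0:
        return False  # the network already exists, nothing to do
    run([docker, "network", "create", name], check=True)
    return True
```

Calling `ensure_network("network_name")` before starting the containers has the same effect as the manual command above. In the compose file, a network created this way is normally declared under the top-level `networks:` key with `external: true`, so that Compose uses it instead of creating its own.
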
# Environment Variables (`.env` file)

I use a single `.env` file to set up the Docker containers and pass configuration variables to
the Python scripts. I suggest changing the default values in `sample.env` to match your needs.
If you need to think about the security of your database, I recommend
starting by changing these defaults.

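To make the `KEY=VALUE` format concrete, here is a minimal standard-library sketch of reading such a file. It is an illustration only: the project's scripts may load the file differently, `parse_env` is a hypothetical helper, and the variable names in the sample are made up rather than taken from `sample.env`.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and any quote characters at the ends.
        values[key.strip()] = value.strip().strip("'\"")
    return values


# Hypothetical variable names, for illustration only.
sample = """\
# database credentials
POSTGRES_USER=example_user
POSTGRES_PASSWORD="change-me"
"""
config = parse_env(sample)
```

Here `config` is a plain dictionary, so a script can read a setting with `config["POSTGRES_USER"]` and fail loudly with a `KeyError` if the variable is missing.
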