# ClinicalTrialsDataProcessing

This represents my

## Prerequisites

> Python >= 3.10 (requires match statement)
> Docker >= 20.10
> Curl >= 7
> Just >= 1.9

# Usage

## Basic usage

Check prerequisites:

```bash
just check-status
```

Set up the underlying AACT database, including downloading both the AACT dump and the historical data:

```bash
just create
just select-trials
just count=1000 get-histories
```

Replace the 1000 in `count=1000` with the number of trials you want to download.

## Advanced Usage

If you need to reset the database without downloading the AACT dump, run:

```bash
just rebuild
just select-trials
just count=1000 get-histories
```

### Description of all the `just` recipes

# Background information

This is designed to run on a Linux machine with bash.
If you are using a shell other than bash, you should be aware of what is needed to run all of this using bash.

If any of the discussions below don't make sense, talk to your sysadmin, a local Linux user, or reach out to the author.

## Just installation

I use the command runner `just` to automate and simplify setting up the Docker containers and running many of the Python scripts.
It is similar to `make` in many ways but is designed to do less.

Just can be installed from https://github.com/casey/just/

## Python installation

This requires Python 3.10 or above due to the use of match-case statements in the HTML parser.
Check which version of Python you have by typing `python --version`.

If you do not have the required version, I would recommend installing the conda Python manager and setting up a conda environment with Python 3.10.
Instructions for doing so are available online.

## Docker and Postgres

Docker is a tool to manage and run OCI containers.
For this project, that means Docker makes it easy to set up containers.

Install Docker based on the instructions for your Linux distribution.
I use podman (an alternative from Red Hat) because it can run containers without root permissions.

### Docker networking

It is helpful to create an external Docker network by running `docker network create network_name` and then including that network in `docker-compose.yaml`.

# Environment Variables (`.env` file)

I use a single `.env` file to set up the Docker containers and pass configuration variables to the Python scripts.
I would suggest changing the default values in `sample.env` to match your needs.
If you need to think about the security of your database, I would recommend you start by changing these.
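
To illustrate how a Python script could pick up values from a `.env` file, here is a minimal sketch using only the standard library. It is not the code this project uses; the function name `load_env_file` and the variable names in the usage note are hypothetical, and it assumes the file holds simple `KEY=VALUE` lines with optional `#` comments.

```python
from pathlib import Path


def load_env_file(path):
    """Parse KEY=VALUE lines from a .env-style file into a dict.

    Hypothetical helper for illustration; the project's scripts may read
    their configuration differently.
    """
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one layer of matching single or double quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values
```

For example, `load_env_file(".env").get("EXAMPLE_DB_PASSWORD")` would return the value of a hypothetical `EXAMPLE_DB_PASSWORD` entry, or `None` if it is not set. Check `sample.env` for the variable names this project actually uses.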