Link trial conditions to Burden of Disease data #36

Closed
opened 3 years ago by youainti · 16 comments
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

Use the WHO burden of disease data to approximate population sizes of different diseases.

Steps include:

  • Download WHO-BoD data
  • Link WHO-BoD to MeSH, ATC, or other standardized nomenclature
  • Figure out population measures
  • Link trial conditions to MeSH
  • Connect trial conditions to measures of population
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

https://meshb.nlm.nih.gov/record/ui?name=Global%20Burden%20of%20Disease

It looks like there is already some sort of linkage between GBD and MeSH

youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

Downloaded the data and wrote scripts to put it in the db.
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

To quote SNOMED on SNOMED CT <-> ICD-10 mappings:

ICD-10
The International Classification of Diseases (ICD) is the foundation for the identification of health trends and statistics globally, and the international standard for reporting diseases and health conditions, published by the World Health Organization. ICD-10 is the tenth ICD revision. - SNOMED CT mappings page (https://www.snomed.org/maps)

So it looks like I should map GBD to ICD-10 (https://ghdx.healthdata.org/record/ihme-data/gbd-2019-cause-icd-code-mappings) and also map the conditions from browse_conditions to ICD-10.
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)
ICD-10 background info: https://www.cdc.gov/nchs/icd/index.htm
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

https://www.cms.gov/Medicare/Coding/ICD10/2019-ICD-10-CM
The 2019 ICD-10-CM data, which should match the version used to link the Global Burden of Disease data, as that was last released for 2019.
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

The ICD-10 version from WHO: https://icd.who.int/browse10/2019/en

This does not seem to match the downloadable version of ICD-10-CM from CMS
https://www.cms.gov/Medicare/Coding/ICD10/2019-ICD-10-CM. In particular, there are codes in the WHO version that are missing from the CMS version, e.g.

F02.4 can be found in the WHO version but not the CMS version (search here: https://www.cms.gov/Medicare/Coding/ICD10/2019-ICD-10-CM)
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

I checked the 2020 CMS version of ICD-10-CM for F02.4 as well, and it is missing.

WHO does provide API access to ICD-10:2019: https://icd.who.int/icdapi

Details on Python requests here:
https://github.com/ICD-API/Python-samples/blob/master/sample.py
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

Downloaded the WHO ICD-10 (2019) categories by copying and pasting from the nav bar of https://icd.who.int/browse10/2019/en

youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

So it turns out that the ICD-10 codes used in the GBD data source are not consistent between WHO ICD-10 and CMS ICD-10-CM.

Examples:

  • F02.4 can be found in the WHO version but not the CMS version.
  • C46.5 can be found in the CMS version but not the WHO version.
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

As of 470dfc2611 I got a working merge of the WHO and CMS versions as well as code to generate it.
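
A minimal sketch of what such a merge could look like, assuming each release has already been loaded as a mapping from code to description. The function name, the data layout, and the placeholder descriptions are all hypothetical; this is not the code from the commit above.

```python
def merge_code_lists(who_codes, cms_codes):
    """Union two {code: description} mappings and record which source(s) list each code.

    When a code appears in both inputs, the description from the first
    source processed (WHO here) is kept.
    """
    merged = {}
    for source, codes in (("WHO", who_codes), ("CMS", cms_codes)):
        for code, description in codes.items():
            entry = merged.setdefault(
                code, {"description": description, "sources": []}
            )
            entry["sources"].append(source)
    return merged


# Toy inputs: descriptions are placeholders, not copied from either release.
who = {"F02.4": "placeholder WHO description", "A00": "placeholder shared description"}
cms = {"C46.5": "placeholder CMS description", "A00": "placeholder shared description"}

merged = merge_code_lists(who, cms)
for code in sorted(merged):
    print(code, merged[code]["sources"])
```

Keeping the list of sources per code makes it easy to see later which GBD codes were only resolvable through one of the two releases.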

youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

Current data state:

  1. trial -> mesh_term : ctgov
  2. mesh_term -> icd10 : UMLS api
  3. icd10 -> Global Burden of Disease sizes : DiseaseBurden
  4. icd10 -> Global Burden of Disease causes : DiseaseBurden

What I need to do:

  • Write python script to link that second step
  • Import links between icd10 and GBD (steps 3 & 4)
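
The chain of links listed above can be sketched as a series of lookups. This is only an illustration under assumptions: the pair lists, identifiers, and function names are made up, and the real links live in the database rather than in Python lists.

```python
from collections import defaultdict

# Hypothetical toy links; every identifier here is invented for illustration.
trial_to_mesh = [("NCT_A", "mesh_1"), ("NCT_A", "mesh_2"), ("NCT_B", "mesh_3")]
mesh_to_icd10 = [("mesh_1", "X00"), ("mesh_2", "X01"), ("mesh_3", "X00")]
icd10_to_cause = [("X00", "cause_alpha"), ("X01", "cause_beta")]


def index(pairs):
    """Turn (left, right) pairs into a {left: {right, ...}} lookup."""
    lookup = defaultdict(set)
    for left, right in pairs:
        lookup[left].add(right)
    return lookup


def link_trials_to_causes(trial_to_mesh, mesh_to_icd10, icd10_to_cause):
    """Follow trial -> mesh_term -> icd10 -> GBD cause and collect causes per trial."""
    mesh_idx = index(mesh_to_icd10)
    cause_idx = index(icd10_to_cause)
    result = defaultdict(set)
    for trial, mesh in trial_to_mesh:
        for icd in mesh_idx.get(mesh, ()):
            result[trial] |= cause_idx.get(icd, set())
    return dict(result)


links = link_trials_to_causes(trial_to_mesh, mesh_to_icd10, icd10_to_cause)
print(links)
```

In a database the same thing would be a chain of joins; the sketch just makes the many-to-many nature of each step explicit.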
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

I currently have a flask app that will simplify manual matching of the data I need.
I still have a couple of steps left:

  • Review how I am saving the data in the db.
    • I think there is probably a better way than the 2 tables that I am using
    • maybe a single table that gets updated instead of a separate table?
    • or a view that collects only the most recent values from match-status?
  • Ensure that the index page has each study show up exactly once.
  • Work through the data
  • Figure out how to keep the data consistent for analysis steps. Consider exporting backups?
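
As an illustration of the "view that collects only the most recent values" idea, here is a small sketch using an append-only log table plus a view. It uses SQLite in memory purely so it is self-contained; the table name `match_status_log`, the view name, and the columns are assumptions, not the project's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Append-only log: every review decision is a new row.
    CREATE TABLE match_status_log (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        nct_id TEXT NOT NULL,
        candidate TEXT NOT NULL,
        status TEXT NOT NULL
    );

    -- View exposing only the newest row per (trial, candidate) pair.
    CREATE VIEW current_match_status AS
    SELECT l.nct_id, l.candidate, l.status
    FROM match_status_log AS l
    JOIN (
        SELECT nct_id, candidate, MAX(id) AS latest_id
        FROM match_status_log
        GROUP BY nct_id, candidate
    ) AS newest ON newest.latest_id = l.id;
    """
)

conn.executemany(
    "INSERT INTO match_status_log (nct_id, candidate, status) VALUES (?, ?, ?)",
    [
        ("NCT_A", "X00", "unreviewed"),
        ("NCT_A", "X00", "accepted"),  # supersedes the row above
        ("NCT_B", "X01", "rejected"),
    ],
)

rows = conn.execute(
    "SELECT nct_id, candidate, status FROM current_match_status ORDER BY nct_id"
).fetchall()
print(rows)
```

The log keeps the full review history while analysis code reads only the view, which is one way to get the single-table feel without losing earlier decisions.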
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

I adjusted the schema for a simpler workflow. Everything is working fine. I will get the data together and then export a backup.

277b5b9

youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

Need to download the full data set so I can start setting things up for matching trials to causes.

youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

Have the linking script in the repo as of 9a718f7.

youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

Used the estimated values and matched each trial to the most specific population measure they had.
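
One way to pick "the most specific" measure is sketched below. It assumes each candidate carries a numeric `level` where a larger value means a narrower cause in the hierarchy; that field, the other field names, and the numbers are hypothetical, and the thread does not say how specificity was actually determined.

```python
def most_specific_measure(candidates):
    """Return the candidate with the greatest hierarchy level, or None if there are none.

    On ties, the first candidate encountered with the top level is returned.
    """
    if not candidates:
        return None
    return max(candidates, key=lambda c: c["level"])


# Toy data: causes, levels, and population values are invented.
candidates_by_trial = {
    "NCT_A": [
        {"cause": "broad_group", "level": 1, "population": 1000.0},
        {"cause": "narrow_cause", "level": 3, "population": 120.0},
    ],
    "NCT_B": [],  # trial with no matched measure
}

selected = {
    trial: most_specific_measure(candidates)
    for trial, candidates in candidates_by_trial.items()
}
print(selected)
```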

Reference: youainti/ClinicalTrialsDataProcessing#36