Link Drugs on Market to RxNorm #31

Closed
opened 3 years ago by youainti · 24 comments
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

Take two files, rxnorm mapping and Comprehensive NDC_SPL data elements file to link together.

  • Build Schema for files
  • Make script to upload files
  • build view to join as needed.
Take two files, rxnorm mapping and Comprehensive NDC_SPL data elements file to link together. - [x] Build Schema for files - [x] Make script to upload files - [x] build view to join as needed.
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

Get rxnorm mappings from https://dailymed-data.nlm.nih.gov/public-release-files/rxnorm_mappings.zip and unzip to get the file rxnorm_mappings.txt

Get rxnorm mappings from https://dailymed-data.nlm.nih.gov/public-release-files/rxnorm_mappings.zip and unzip to get the file `rxnorm_mappings.txt`
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

rxnorm mappings use rxnorm term types rxtty https://www.nlm.nih.gov/research/umls/rxnorm/docs/appendix5.html

rxnorm mappings use rxnorm term types rxtty https://www.nlm.nih.gov/research/umls/rxnorm/docs/appendix5.html
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)
the comprehensive NDC_SPL data elements can be found at the bottom of https://open.fda.gov/data/downloads/ The links I used were https://download.open.fda.gov/other/nsde/other-nsde-0001-of-0003.json.zip https://download.open.fda.gov/other/nsde/other-nsde-0002-of-0003.json.zip https://download.open.fda.gov/other/nsde/other-nsde-0003-of-0003.json.zip
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

266c1c9 has a working download and inserter.
Bug exists where start and end dates seem backwards.

266c1c9 has a working download and inserter. Bug exists where start and end dates seem backwards.
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

those json files are also available as a single csv at
https://www.fda.gov/industry/structured-product-labeling-resources/nsde

those json files are also available as a single csv at https://www.fda.gov/industry/structured-product-labeling-resources/nsde
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

https://lhncbc.nlm.nih.gov/RxNav/applications/RxNav-in-a-Box.html

Docker container to run rxnorm and rxclass locally

https://lhncbc.nlm.nih.gov/RxNav/applications/RxNav-in-a-Box.html Docker container to run rxnorm and rxclass locally
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

I setup RXnavinabox and it is stored on
[/media/will/2397706f-a8f8-4763-ac3f-8a9c07e6101f/Data/rxnav-in-a-box-20230103
]

I feel like I need to integrate it into setup.

I setup RXnavinabox and it is stored on [/media/will/2397706f-a8f8-4763-ac3f-8a9c07e6101f/Data/rxnav-in-a-box-20230103 ] I feel like I need to integrate it into setup.
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

insstead of using teh RxNorm Mappings, I am going to use the RxNavInABox api to link drugs (as listed as mesh-terms in ctgov.browse_interventions) to RxCUI's.

General process

  1. Get compounds from ctgov.browse_interventions (see #30 for details).
  2. Search for RxCUI on RxNav and get them for the PIN/IN/MIN
  3. Get the brand names associated with step 2.
  4. Pull class information from RxMix
  5. Match Brand name against spl.nsde to get marketing dates.
insstead of using teh RxNorm Mappings, I am going to use the RxNavInABox api to link drugs (as listed as mesh-terms in ctgov.browse_interventions) to RxCUI's. General process 1. Get compounds from ctgov.browse_interventions (see #30 for details). 1. Search for RxCUI on RxNav and get them for the PIN/IN/MIN 1. Get the brand names associated with step 2. 1. Pull class information from RxMix 1. Match Brand name against spl.nsde to get marketing dates.
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

List of steps

  • Build into a set of tickets
List of steps - [ ] Build into a set of tickets
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

I have got some basic api requests working as of 4cc4c5c99f.

Here are some approaches I am considering

Note that by storing the original datasource the analysis becomes repeatable.

Using APIs

Pro

  • Doesn't require someone to actually download the original data, as they can hit the official APIs.
  • Very minimal work integrating tables on my side
  • Matching ctgov.browse_interventions's mesh terms is automatic and reasonably justifiable.

Cons

  • Requires putting custom tables in postgres
  • Will require a multi-stage download-insert-query process as I select appropriate data

Importing

Pro

  • tables already in postgres once done.
  • Allows for more natural queries

Con

  • extra data in postgres
  • need to write import tools myself

Mixed

I could just use the API to get the rxcui and then interact with the db as necessary

I have got some basic api requests working as of 4cc4c5c99f12. Here are some approaches I am considering - move data from [mysql to postgres](https://wiki.postgresql.org/wiki/Converting_from_other_Databases_to_PostgreSQL#MySQL) - just use the api and build tables in postgres Note that by storing the original datasource the analysis becomes repeatable. ## Using APIs ### Pro - Doesn't require someone to actually download the original data, as they can hit the official APIs. - Very minimal work integrating tables on my side - Matching `ctgov.browse_interventions`'s mesh terms is automatic and reasonably justifiable. ### Cons - Requires putting custom tables in postgres - Will require a multi-stage download-insert-query process as I select appropriate data ## Importing ### Pro - tables already in postgres once done. - Allows for more natural queries - ### Con - extra data in postgres - need to write import tools myself ## Mixed I could just use the API to get the rxcui and then interact with the db as necessary
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

So the following has information on primary keys in the database and information on how to query it https://www.nlm.nih.gov/research/umls/rxnorm/docs/techdoc.html#s14_0

So the following has information on primary keys in the database and information on how to query it https://www.nlm.nih.gov/research/umls/rxnorm/docs/techdoc.html#s14_0
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

Consider the three following queries against the api for different RxCUIs related to levothyroxine.

Consider the three following queries against the api for different RxCUIs related to levothyroxine. - Levothyroxine (IN) http://will-office:4000/REST/rxcui/10582/property.json?propName=TTY - Levo-T (BN) http://will-office:4000/REST/rxcui/217996/property.json?propName=TTY - Levothyroxine Sodium (PIN) http://will-office:4000/REST/rxcui/40144/property.json?propName=TTY
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)
SELECT * FROM http.download_status 
where status = 'Downloaded'
limit 100;

--choose NCT00789633

select * from ctgov.browse_interventions
where nct_id = 'NCT00789633';

--choose gemcitabine

get rxcui that might apply

http://will-office:4000/REST/rxcui.json?name=gemcitabine&search=1&srclist=rxnorm

{"idGroup":{"rxnormId":["12574","236234"]}}

Scanning rxnav, those are IN and PIN respectively.

To find the tty throught the api, I should search

http://will-office:4000/REST/rxcui/12574/property.json?propName=TTY

{"propConceptGroup":{"propConcept":[{"propCategory":"ATTRIBUTES","propName":"TTY","propValue":"IN"}]}}

http://will-office:4000/REST/rxcui/236234/property.json?propName=TTY

{"propConceptGroup":{"propConcept":[{"propCategory":"ATTRIBUTES","propName":"TTY","propValue":"PIN"}]}}

A similar query that provides other identifiers
http://will-office:4000/REST/rxcui/12574/allProperties?prop=attributes+codes

I can now search for associated brand names

http://will-office:4000/REST/rxcui/12574/related.json?tty=BN
http://will-office:4000/REST/rxcui/236234/related.json?tty=BN

Note how the IN has brands but the PIN doesn't.

These can ?probably? be matched aganst the marketing info. If not, i believe there is an NDC or something similar that can be used.

```sql SELECT * FROM http.download_status where status = 'Downloaded' limit 100; --choose NCT00789633 select * from ctgov.browse_interventions where nct_id = 'NCT00789633'; --choose gemcitabine ``` get rxcui that might apply http://will-office:4000/REST/rxcui.json?name=gemcitabine&search=1&srclist=rxnorm > {"idGroup":{"rxnormId":["12574","236234"]}} Scanning rxnav, those are IN and PIN respectively. To find the tty throught the api, I should search http://will-office:4000/REST/rxcui/12574/property.json?propName=TTY > {"propConceptGroup":{"propConcept":[{"propCategory":"ATTRIBUTES","propName":"TTY","propValue":"IN"}]}} http://will-office:4000/REST/rxcui/236234/property.json?propName=TTY > {"propConceptGroup":{"propConcept":[{"propCategory":"ATTRIBUTES","propName":"TTY","propValue":"PIN"}]}} A similar query that provides other identifiers http://will-office:4000/REST/rxcui/12574/allProperties?prop=attributes+codes I can now search for associated brand names http://will-office:4000/REST/rxcui/12574/related.json?tty=BN http://will-office:4000/REST/rxcui/236234/related.json?tty=BN Note how the IN has brands but the PIN doesn't. These can ?probably? be matched aganst the marketing info. If not, i believe there is an NDC or something similar that can be used.
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

Need to make sure these queries limit to human prescription drugs. For example, toggling human and prescription on http://will-office:4000/RxNav/search?searchBy=RXCUI&searchTerm=10582 changes from 10 to 8 brands.

Need to make sure these queries limit to human prescription drugs. For example, toggling human and prescription on http://will-office:4000/RxNav/search?searchBy=RXCUI&searchTerm=10582 changes from 10 to 8 brands.
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)
-- this is the set of rxnav strings that I could possibly search.
SELECT * FROM rxnorm_current.rxnorm_props
where 
	propname='RXNAV_STR'
    and 
    propvalue1 in ('gemcitabine', 'ranibizumab','bevacizumab')
;

So I can link on this table.

```sql -- this is the set of rxnav strings that I could possibly search. SELECT * FROM rxnorm_current.rxnorm_props where propname='RXNAV_STR' and propvalue1 in ('gemcitabine', 'ranibizumab','bevacizumab') ; ``` So I can link on this table.
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

It appears that the best option is to import the appropriate data into postgres. the primary reason is that linking will probably go better.

I think I am going to do this manually the first time using mysql workbench to export the tables and then manually upload them to postgres.

It appears that the best option is to import the appropriate data into postgres. the primary reason is that linking will probably go better. I think I am going to do this manually the first time using mysql workbench to export the tables and then manually upload them to postgres.
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

IN 97af862 I created a tool to migrate tables from the mariadb backing RxNavInABox to postgres. I also created a view that links trials to drugs as measured by rxcui's.

IN 97af862 I created a tool to migrate tables from the mariadb backing RxNavInABox to postgres. I also created a view that links trials to drugs as measured by rxcui's.
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

At this point I can match trials to available brands. Now I need to link brands to marketing dates.

select * from spl.nsde n 
where marketing_category in ('NDA', 'ANDA');

select * from rxnorm_migrated.match_drugs_to_trials mdtt ;

I am wondering if I can grab (instead of brand names BN) the ndcs or package ndcs associated with the brands. If so then I can do a direct match of some sort.

At this point I can match trials to available brands. Now I need to link brands to marketing dates. ```sql select * from spl.nsde n where marketing_category in ('NDA', 'ANDA'); select * from rxnorm_migrated.match_drugs_to_trials mdtt ; ``` I am wondering if I can grab (instead of brand names `BN`) the ndcs or package ndcs associated with the brands. If so then I can do a direct match of some sort.
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

Figured out what to do.

If I grab every brand, then every branded drug or pack associated with that drug and then attach that to the nsde data I would get the marketing dates required.

trial -> mesh_term -> IN/MIN (rxcui) -> BN (rxcui) -> SBD/BPCK (rxcui) -> ndc11 -> nsde (marketing dates)

I need to bring the correct tables into postgres but the date columns need ignored or converted.

Figured out what to do. If I grab every brand, then every branded drug or pack associated with that drug and then attach that to the nsde data I would get the marketing dates required. > trial -> mesh_term -> IN/MIN (rxcui) -> BN (rxcui) -> SBD/BPCK (rxcui) -> ndc11 -> nsde (marketing dates) I need to bring the correct tables into postgres but the date columns need ignored or converted.
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

Link trials to ingredients, brands, and packs

WITH 
    trialncts as (
        SELECT DISTINCT nct_id  FROM history.trial_snapshots TS
    )
SELECT 
    bi.nct_id ,
    bi.downcase_mesh_term, 
    rp.rxcui  AS drug_rxcui,
    rn.tty1,
    rn.rxcui1 as ingredient_rxcui, --ingredients
    rn.tty2 ,
    rn.rxcui2 as brand_or_pack_rxcui --brand or pack
FROM ctgov.browse_interventions bi
    LEFT OUTER JOIN rxnorm_migrated.rxnorm_props AS rp
        on bi.downcase_mesh_term  = rp.propvalue1  --Link drug ingredient
    LEFT OUTER JOIN rxnorm_migrated.rxnorm_relations AS rn
        on rp.rxcui = rn.rxcui1 --Grab brand names
WHERE
    bi.nct_id in (
        SELECT nct_id  FROM trialncts
        )
    and
    bi.mesh_type='mesh-list'
    /*and 
    rn.tty1 in ('IN','MIN')
    and 
    rn.tty2 = 'BN'*/
Link trials to ingredients, brands, and packs ```sql WITH trialncts as ( SELECT DISTINCT nct_id FROM history.trial_snapshots TS ) SELECT bi.nct_id , bi.downcase_mesh_term, rp.rxcui AS drug_rxcui, rn.tty1, rn.rxcui1 as ingredient_rxcui, --ingredients rn.tty2 , rn.rxcui2 as brand_or_pack_rxcui --brand or pack FROM ctgov.browse_interventions bi LEFT OUTER JOIN rxnorm_migrated.rxnorm_props AS rp on bi.downcase_mesh_term = rp.propvalue1 --Link drug ingredient LEFT OUTER JOIN rxnorm_migrated.rxnorm_relations AS rn on rp.rxcui = rn.rxcui1 --Grab brand names WHERE bi.nct_id in ( SELECT nct_id FROM trialncts ) and bi.mesh_type='mesh-list' /*and rn.tty1 in ('IN','MIN') and rn.tty2 = 'BN'*/ ```
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

SQL Error [53100]: ERROR: could not write to file "base/pgsql_tmp/pgsql_tmp13033.6.sharedfileset/i14324of65536.p0.0": No space left on device

Sounds like I am running into some issues with the database. I think I need to restart everything from scratch with the docker containers storing things on my 4tb drive.

> SQL Error [53100]: ERROR: could not write to file "base/pgsql_tmp/pgsql_tmp13033.6.sharedfileset/i14324of65536.p0.0": No space left on device Sounds like I am running into some issues with the database. I think I need to restart everything from scratch with the docker containers storing things on my 4tb drive.
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

Turns out the error is related to the size of shared memory in the container. I upped it from 64mib to 1gib.

Turns out the error is related to the size of shared memory in the container. I upped it from 64mib to 1gib.
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

I have figured out that some of these are just going to take a long time to run.
I am trying to break them down into simpler tables/temp tables.

The current plan is just to get the earliest date that any brand containing all the ingredients was sold.

I have figured out that some of these are just going to take a long time to run. I am trying to break them down into simpler tables/temp tables. The current plan is just to get the earliest date that any brand containing all the ingredients was sold.
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

Linked trials to marketing start dates with the views created in Associating NCTs to NDCs and marketing dates

Note that I only linked to start dates, not the number of currently active brands.

Linked trials to marketing start dates with the views created in [Associating NCTs to NDCs and marketing dates](https://gitea.kgjk.icu/Research/ClinicalTrialsDataProcessing/src/commit/2ec314180f903cb94a89cb970ab9721db08927ac/development_sql/ASSOICATING%20NCTIDs%20to%20NDCs%20and%20Marketing%20dates.sql) Note that I only linked to start dates, not the number of currently active brands.
Sign in to join this conversation.
No project
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: youainti/ClinicalTrialsDataProcessing#31
Loading…
There is no content yet.