Match drugs to alternatives to get count of competitors. #40

Open
opened 3 years ago by youainti · 22 comments
youainti commented 3 years ago (Migrated from gitea.kgjk.icu)

Increase the number of drugs by matching using WHO ATC codes.

This data exists in the RxNav set.


Notes on using ATC:

  • https://www.whocc.no/atc/structure_and_principles/
  • The ATC groups drugs (as identified by chemicals/compounds) in a hierarchy with 5 levels:
    1. Anatomical or pharmacological group
    2. Pharmacological or therapeutic group
    3. Chemical, pharmacological or therapeutic subgroup
    4. Chemical, pharmacological or therapeutic subgroup
    5. Chemical substance

I need to choose which level defines "alternatives", but the levels aren't used consistently. For example, in J01, level 3 splits antibacterials into classes by what appears to be common lineage (a pharmacological grouping?). In contrast, in S01, level 3 splits by general area of interest (a therapeutic grouping?), and level 4 splits between antibiotics and antivirals.

This makes groupings for my purposes (finding competing drugs) difficult.
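A minimal Python sketch of how the choice of level changes the competitor set. It relies only on the fixed ATC code layout (1, 3, 4, 5 and 7 characters for levels 1 to 5); the four drugs and their codes are a small illustrative sample, and it assumes one ATC code per drug, although real drugs can carry several.

```python
# ATC code length at each hierarchy level: e.g. J / J01 / J01C / J01CA / J01CA04.
ATC_PREFIX_LENGTH = {1: 1, 2: 3, 3: 4, 4: 5, 5: 7}

# Illustrative sample only; assumes one ATC code per drug.
EXAMPLE_CODES = {
    "amoxicillin": "J01CA04",
    "ampicillin": "J01CA01",
    "azithromycin": "J01FA10",
    "clarithromycin": "J01FA09",
}


def atc_prefix(code: str, level: int) -> str:
    """Truncate a full ATC code to the group it belongs to at `level`."""
    if level not in ATC_PREFIX_LENGTH:
        raise ValueError(f"ATC level must be 1-5, got {level}")
    return code[: ATC_PREFIX_LENGTH[level]]


def competitors(drug: str, codes: dict, level: int) -> set:
    """Other drugs that share `drug`'s ATC group at the chosen level."""
    target = atc_prefix(codes[drug], level)
    return {
        other
        for other, code in codes.items()
        if other != drug and atc_prefix(code, level) == target
    }


if __name__ == "__main__":
    for level in (2, 3, 4):
        print(level, sorted(competitors("amoxicillin", EXAMPLE_CODES, level)))
```

With this sample, amoxicillin's competitors at level 2 (J01) include the macrolides, while at levels 3 and 4 only ampicillin remains, which is the sensitivity to level choice described above.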


USP classifications seem to make more sense for this purpose.

Human readable discussion of drug classes: https://www.verywellhealth.com/drug-classes-1123991

Therapeutic categories.
https://www.fda.gov/regulatory-information/fdaaa-implementation-chart/usp-therapeutic-categories-model-guidelines

I think this might be a decent alternative. In particular, the Pharmacological Classes (2nd level) would probably be useful.


I just confirmed that USP is one of the properties held in RxNorm. In my migrated data, the 3,600+ USP property rows can be counted with:

select count(*) from rxnorm_migrated.rxnorm_props rp
where propname = 'USP';

youainti changed title from Match drugs to others by WHO ATC codes to Match drugs to alternatives to get count of competitors. 3 years ago

Updated name to represent the goal better.

https://www.nlm.nih.gov/research/umls/rxnorm/sourcereleasedocs/usp.html

https://www.nlm.nih.gov/pubs/techbull/so18/brief/so18_rxnorm_usp.html

RxNorm Adds USP Compendial Nomenclature Data Source. NLM Tech Bull. 2018 Sep-Oct;(424):b5.
2018 September 06 [posted]
The National Library of Medicine (NLM) is pleased to announce the addition of United States Pharmacopeia (USP) Compendial Nomenclature as a new data source in RxNorm.
The addition of USP data to RxNorm helps support naming accuracy for drug substances in electronic environments. Integrating USP Compendial Nomenclature into RxNorm will help reduce nomenclature errors, particularly for substances with different salt forms (e.g. diclofenac potassium and diclofenac sodium). USP content will be integrated into RxNorm using a phased approach, beginning with substances, capsules, and tablets for the September release. USP Compendial Nomenclature is maintained and developed by the US Pharmacopeial Convention.

Based on this, it appears that only the nomenclature is included in RxNorm, while the classifications (including the pharmaceutical/therapeutic classifications) are found in the USP Drug Classification system (USP-DC): https://www.usp.org/health-quality-safety/usp-drug-classification-system

  • [x] Request data (the site keeps being intermittently unavailable)


The main issue here is that RxNorm doesn't have the actual hierarchy/classification in it. Instead it has just the mappings from RXCUIs to USP compendial nomenclature codes. I need a mapping of USP CN codes to pharmacological class. Here are the steps I need to take:

  • [x] Request USP DC and alignment tables.
  • [x] Ensure licensing etc.
  • [ ] Link ingredients to USP CN codes.
  • [ ] Further link USP CN codes to pharmacological classes.
  • [ ] Find other ingredients within the pharmacological classes.
  • [ ] Get counts of brand names etc. associated with the ingredients within a pharmacological class.
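The last four steps could be sketched in Python as a chain of lookups. All identifiers (`INGREDIENT_TO_USP_CN`, `USP_CN_TO_CLASS`, `INGREDIENT_TO_BRANDS`, and the toy values in them) are hypothetical placeholders standing in for the RxNorm USP mapping and the USP DC alignment tables, not their real layouts.

```python
from collections import defaultdict

# Hypothetical toy lookups. Real ones would come from RxNorm (ingredient
# RXCUI -> USP CN code) and the USP DC alignment table (USP CN code -> class).
INGREDIENT_TO_USP_CN = {"ing_a": "cn_1", "ing_b": "cn_2", "ing_c": "cn_3"}
USP_CN_TO_CLASS = {"cn_1": "class_x", "cn_2": "class_x", "cn_3": "class_y"}
INGREDIENT_TO_BRANDS = {
    "ing_a": {"brand_a1"},
    "ing_b": {"brand_b1", "brand_b2"},
    "ing_c": {"brand_c1"},
}


def class_of(ingredient):
    """Steps 3-4: ingredient -> USP CN code -> pharmacological class (or None)."""
    cn_code = INGREDIENT_TO_USP_CN.get(ingredient)
    return USP_CN_TO_CLASS.get(cn_code) if cn_code else None


def ingredients_by_class():
    """Invert the chain so each class lists its member ingredients."""
    grouped = defaultdict(set)
    for ingredient in INGREDIENT_TO_USP_CN:
        cls = class_of(ingredient)
        if cls is not None:
            grouped[cls].add(ingredient)
    return grouped


def competitor_brand_count(ingredient):
    """Steps 5-6: other ingredients in the same class, then count their brands."""
    cls = class_of(ingredient)
    if cls is None:
        return None  # unmatched ingredient: needs manual review
    others = ingredients_by_class()[cls] - {ingredient}
    brands = set()
    for other in others:
        brands |= INGREDIENT_TO_BRANDS.get(other, set())
    return len(brands)
```

Returning `None` rather than 0 for an unmatched ingredient keeps "not linked yet" distinct from "linked, but no competitors".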

Another option is to use the Veterans Affairs formulary and classes.

https://www.pbm.va.gov/nationalformulary.asp


As I've been thinking about how to set this up, I've been struck by the thought that using formularies to measure market competition is probably appropriate because those are the "approved competitors", as formularies determine what insurance will cover.

Some formularies or pharmaceutical classification systems I can probably get hold of:

  • Medicare or Medicaid
  • USP MMG
  • USP DC
  • Veterans Affairs

I am not sure how to include those in my model, but they are important.
Probably as a set of correlated coefficients.
I should probably start with the VA formulary.


I have requested the USP DC info.


Got the USP DC data from 2022 and 2023.

Also have the VA national formulary for 2023.


There was no discussion of licensing for USP or VA. I assume they are publicly available at this point.


So in the RxNorm aligned file from USP DC 2023, all drugs have USP categories, and most have USP Classes (but not all). I think I will match on both Class and Category.

To put this into the database, I will create three tables.
Two will represent the categories and classes.
The third will link RxCUIs to those categories and classes.

Then I will create views that:

  • Link an RXCUI to drugs in the same USP category
  • Link an RXCUI to drugs in the same USP class
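A sketch of the three tables and two views, using Python's built-in `sqlite3` in place of the project's actual database so it can run standalone. Table names, column names, and sample RXCUIs are hypothetical; the real import lives in the `"Formularies"` schema with whatever columns the USP DC file provides.

```python
import sqlite3

SCHEMA = """
CREATE TABLE usp_category (
    category_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE usp_class (
    class_id    INTEGER PRIMARY KEY,
    category_id INTEGER NOT NULL REFERENCES usp_category (category_id),
    name        TEXT NOT NULL
);
-- Link table: every RXCUI has a category; class_id is NULL when there is no class.
CREATE TABLE rxcui_usp (
    rxcui       TEXT NOT NULL,
    category_id INTEGER NOT NULL REFERENCES usp_category (category_id),
    class_id    INTEGER REFERENCES usp_class (class_id)
);
CREATE VIEW same_usp_category AS
SELECT DISTINCT a.rxcui AS rxcui, b.rxcui AS competitor_rxcui, a.category_id AS category_id
FROM rxcui_usp AS a
JOIN rxcui_usp AS b ON a.category_id = b.category_id AND a.rxcui <> b.rxcui;
CREATE VIEW same_usp_class AS
SELECT DISTINCT a.rxcui AS rxcui, b.rxcui AS competitor_rxcui, a.class_id AS class_id
FROM rxcui_usp AS a
JOIN rxcui_usp AS b ON a.class_id = b.class_id AND a.rxcui <> b.rxcui;
"""

VIEWS = ("same_usp_category", "same_usp_class")


def build_demo() -> sqlite3.Connection:
    """In-memory database loaded with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO usp_category VALUES (1, 'category_1')")
    conn.executemany(
        "INSERT INTO usp_class VALUES (?, ?, ?)",
        [(10, 1, "class_a"), (11, 1, "class_b")],
    )
    conn.executemany(
        "INSERT INTO rxcui_usp VALUES (?, ?, ?)",
        [("100", 1, 10), ("200", 1, 10), ("300", 1, 11), ("400", 1, None)],
    )
    return conn


def competitors_of(conn: sqlite3.Connection, view: str, rxcui: str) -> set:
    """Competitor RXCUIs for one drug from either view."""
    if view not in VIEWS:  # view names cannot be bound as query parameters
        raise ValueError(f"unknown view {view!r}")
    rows = conn.execute(f"SELECT competitor_rxcui FROM {view} WHERE rxcui = ?", (rxcui,))
    return {row[0] for row in rows}
```

Because `NULL = NULL` is not true in SQL, drugs with no USP class drop out of `same_usp_class` while still appearing in `same_usp_category`.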

For the VA data, the National Drug File extract with NDCs has both the VA class code and the VA class name. These appear to correspond with the class code list I have.

I have the following options in front of me:

  • Use the NDC extract to link drug NDCs to classes. Then I would find the NDCs that contain the ingredients of interest, then the classes they belong to, then match those classes back to NDCs, and then to brand names.
  • Use the national formulary and link ingredients to RxCUIs (manually I think) and then connect ingredient RxCUIs to the formulary and the VA classes.

After looking at these two, I think I want to do the former.
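The first (NDC-based) option might look like the following, assuming the extract can be reduced to rows of NDC, VA class code, ingredient set, and brand name. That row shape and all sample values are hypothetical, not the extract's real header.

```python
# Hypothetical rows reduced from the NDC extract:
# (ndc, va_class_code, ingredients, brand_name). Values are placeholders.
EXTRACT = [
    ("ndc_1", "CLASS_A", {"ing_a"}, "Brand A"),
    ("ndc_2", "CLASS_A", {"ing_b"}, "Brand B"),
    ("ndc_3", "CLASS_A", {"ing_b"}, "Brand B"),  # another package of the same brand
    ("ndc_4", "CLASS_B", {"ing_c"}, "Brand C"),
]


def va_competitor_brands(ingredient, rows=EXTRACT):
    """Ingredient -> NDCs containing it -> their VA classes -> other NDCs in
    those classes -> distinct brand names."""
    classes = {va_class for _, va_class, ings, _ in rows if ingredient in ings}
    return {
        brand
        for _, va_class, ings, brand in rows
        if va_class in classes and ingredient not in ings
    }
```

Products that contain the ingredient of interest are excluded, so a combination product with the same ingredient is not counted as a competitor; whether that is the right call is a separate judgment. Collecting brands in a set collapses multiple NDCs (packages) of one brand into a single competitor.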


USP and VA data uploaded in 1c3d749.


A final formulary dataset is the USP MMG (Medicare Model Guidelines), which is a baseline from which Medicare Part D plans can start.


I will need to figure out a way to combine these formularies into a single measure.

  1. I am thinking that I would take the max count of brands across the three formularies.
    That max operator will have a big impact if one of them is matched poorly, though.
  2. Another option is to include each formulary separately, with its own parameters.

If one of the formulary classes is usually poorly matched, then that should generally come to be disregarded as non-informative.
That would be an issue in and of itself because it affects inference.

I think the best option is the second one (include each formulary separately).
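A small sketch contrasting the two options, with per-formulary brand counts as input (`None` meaning the drug could not be matched in that formulary). The formulary keys and the choice to encode an unmatched formulary as a zero count plus a `_matched` indicator are assumptions for illustration.

```python
from typing import Dict, Optional

# Hypothetical keys for the three formularies.
FORMULARIES = ("usp_dc", "usp_mmg", "va")


def combine_max(counts: Dict[str, Optional[int]]) -> Optional[int]:
    """Option 1: one measure, the largest brand count across formularies.
    A single badly matched formulary can dominate this value."""
    observed = [c for c in counts.values() if c is not None]
    return max(observed) if observed else None


def separate_features(counts: Dict[str, Optional[int]]) -> Dict[str, int]:
    """Option 2: one count per formulary (each gets its own coefficient),
    plus an indicator for whether the drug was matched in that formulary."""
    features = {}
    for name in FORMULARIES:
        value = counts.get(name)
        features[f"{name}_brands"] = value if value is not None else 0
        features[f"{name}_matched"] = int(value is not None)
    return features
```

For example, with counts of 3 (USP DC), 2 (USP MMG) and 25 (VA, hypothetically mismatched), option 1 reports 25, while option 2 keeps all three values so the model can learn to down-weight the VA count.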


Potential issues

A potential issue with the matching is if a drug belongs to multiple classes, then it is possible we would pull in non-competitors.

  • The way to address this would be to do the matching and then verify which class matches the trial best. If none of the classes match well, we would flag it as "needs review" and then manually match it to a class. This would ensure correct matching between trial and class (and thus to competitors).

A potential issue is that a trial compound doesn't show up in any of the formularies (because formularies are not exhaustive of drugs used).

  • The way to address this is to match it manually to categories or classes in each formulary.
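The review logic for both issues could be sketched per trial compound and formulary. The `fit_scores` input (how well each class fits the trial, on a 0 to 1 scale) and the 0.5 threshold are hypothetical; producing such a score is the hard part and is not shown.

```python
from typing import Dict, List, Optional, Tuple


def triage_match(
    candidate_classes: List[str],
    fit_scores: Dict[str, float],
    threshold: float = 0.5,
) -> Tuple[str, Optional[str]]:
    """Return (status, chosen_class) for one trial compound in one formulary.

    fit_scores is a hypothetical 0-1 measure of how well each class fits the
    trial, e.g. from comparing the trial's conditions to the class description.
    """
    if not candidate_classes:
        # Compound is absent from this formulary: match it by hand.
        return ("manual_match", None)
    best = max(candidate_classes, key=lambda c: fit_scores.get(c, 0.0))
    if fit_scores.get(best, 0.0) < threshold:
        # No class fits well enough: flag for review instead of guessing.
        return ("needs_review", None)
    return ("matched", best)
```

Applying the same threshold to single-class matches keeps one rule for every case; a compound with only one class but a poor fit still ends up in review.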

Added the USP MMG data in 6a931b3. Main file of interest is the alignment file.

Note that this is version 8, which is current up to 2023-09-15, 8 days from now. That is when version 9 should be published.


Just imported USP DC, USP MMG (V8), and VA data into the database. This was done using the import function from DBeaver community edition.

Now I need to focus on linking drugs from trials to the different classes.


One identified issue. Based on:

select "USP Category", count(distinct "USP Class")
from "Formularies".usp_dc ud
group by "USP Category";

select "USP Class", count(rxcui)
from "Formularies".usp_dc ud
group by "USP Class"
order by "USP Class";

some categories hold drugs that don't fall into other classes.
I am not sure how to handle those drugs.
My first inclination is that if a drug in a category falls into that category's "No USP Class, $CATEGORY" class, it should not be treated as having no competitors.
I don't know what needs to be true in order to justify this.
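One way to encode that inclination, sketched in Python: a drug whose class is missing or is a catch-all "No USP Class, ..." bucket is compared against its whole category, while every other drug is compared only within its class. The tuple layout `(rxcui, usp_category, usp_class)` mirrors the columns queried above, but the sample rows are made up.

```python
NO_CLASS_PREFIX = "No USP Class"

# Made-up rows in the shape (rxcui, usp_category, usp_class).
ROWS = [
    ("1", "Cat A", "Class A1"),
    ("2", "Cat A", "Class A1"),
    ("3", "Cat A", "No USP Class, Cat A"),
    ("4", "Cat B", "Class B1"),
    ("5", "Cat B", None),
]


def is_catch_all(usp_class):
    """True when the drug has no real USP class."""
    return not usp_class or usp_class.startswith(NO_CLASS_PREFIX)


def usp_competitors(rxcui, rows=ROWS):
    rows = list(rows)  # iterated more than once below
    found = set()
    for drug, category, usp_class in rows:
        if drug != rxcui:
            continue
        if is_catch_all(usp_class):
            # Fall back to the whole category instead of "no competitors".
            found |= {o for o, o_cat, _ in rows if o_cat == category and o != rxcui}
        else:
            found |= {o for o, _, o_cls in rows if o_cls == usp_class and o != rxcui}
    return found
```

This rule is asymmetric: a catch-all drug counts the classed drugs in its category as competitors, but they do not count it back.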

Reference: youainti/ClinicalTrialsDataProcessing#40