Update 'Thinklab notes How should we construct a catalog of drug indications?'

master
Will King 3 years ago
parent 15997ca59c
commit 98e6c88798

@ -1,25 +1,24 @@
Daniel Himmelstein Researcher Jan. 14, 2015
We are looking to construct a catalog of indications (efficacious drug-disease pairs) with the following attributes (ordered by importance):
- automated and high-throughput construction
- high-quality, or varying levels of quality as long as quality level is annotated
- comprehensive
- disease-modifying rather than symptomatic
- compounds which map to PubChem
- contraindications and adverse effects are excluded and cataloged separately
- diseases which map to the Disease Ontology
- source is retrievable
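To make the wish list above concrete, here is a minimal Python sketch of a catalog record that carries the annotations we care about (quality level and a retrievable source), along with a function that sets contraindications and adverse effects aside in their own list. The `Indication` class, the `split_catalog` function, and the placeholder identifiers are hypothetical. The sketch assumes compounds are already mapped to PubChem and diseases to the Disease Ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indication:
    """A drug-disease pair plus the annotations listed above (hypothetical schema)."""
    compound_id: str  # assumed PubChem compound identifier
    disease_id: str   # assumed Disease Ontology identifier
    quality: str      # annotated quality level, e.g. "high" or "low"
    source: str       # resource the pair came from, so it stays retrievable


def split_catalog(candidates, adverse_pairs):
    """Split candidates into indications and pairs to catalog separately.

    adverse_pairs: set of (compound_id, disease_id) tuples known to be
    contraindications or adverse effects.
    """
    indications, excluded = [], []
    for ind in candidates:
        if (ind.compound_id, ind.disease_id) in adverse_pairs:
            excluded.append(ind)
        else:
            indications.append(ind)
    return indications, excluded


# Placeholder identifiers for illustration only.
candidates = [
    Indication("CID:1", "DOID:1", "high", "LabeledIn"),
    Indication("CID:2", "DOID:1", "low", "SIDER 2"),
]
indications, excluded = split_catalog(candidates, {("CID:2", "DOID:1")})
print(len(indications), len(excluded))  # 1 1
```

The excluded pairs are returned rather than discarded, so they can be cataloged separately as the list requires.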
A few options we can consider:
- LabeledIn: Curators manually identified indications from drug labels for 250 human prescription ingredients (drugs) [1].
- MEDI: Indications extracted from RxNorm, SIDER 2, MedlinePlus, and Wikipedia were integrated into a single resource. The high-precision subset (indications found in RxNorm or in at least two of the other resources) includes 13,304 unique indications for 2,136 medications [2]. Further work added indication prevalence information [3]. MEDI compares favorably to SemRep for extracting indications from clinical text [4].
- SemRep: "SemRep is a program that extracts semantic predications (subject-relation-object triples) from biomedical free text" [5]. SemRep has been used to extract TREAT relations from MeSH scope notes, DailyMed, DrugBank, and AHFS Consumer Medication Information [6]. SemRep has also been used to identify TREAT relations from Medline abstracts [7]. A project called SemMedDB provides the SemRep results from mining PubMed [8].
- SPL-X (Structured Product Labels eXtractor): Using MetaMap, this project extracted indications from DailyMed drug labels that were available as XML [9]. The data do not appear to be available.
- Comparative Toxicogenomics Database [10]: Literature curators manually annotated drug-disease pairs as 'therapeutic'. The resource is extensive (the threshold for 'therapeutic' was low) but incomplete.
- SIDER 2: In addition to extracting side effects from drug labels, SIDER also extracts indications [11]. Since the approach is automated, some side effects may be extracted as indications and vice versa. This approach would only provide information for drugs with labels from the US FDA or Canada.
Any additional resources or suggestions?
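The MEDI high-precision rule above is simple to express: keep a pair if RxNorm lists it, or if at least two of the other resources agree. This is only a sketch of that selection logic, not MEDI's actual code. It assumes drug and disease names have already been normalized, so that identical pairs compare equal.

```python
from collections import defaultdict

# The three non-RxNorm resources MEDI integrated.
OTHER_SOURCES = {"SIDER 2", "MedlinePlus", "Wikipedia"}


def high_precision_subset(records):
    """records: iterable of (drug, disease, source) tuples.

    Returns the set of (drug, disease) pairs found in RxNorm or in at
    least two of the other resources.
    """
    sources = defaultdict(set)
    for drug, disease, source in records:
        sources[(drug, disease)].add(source)
    return {
        pair
        for pair, found in sources.items()
        if "RxNorm" in found or len(found & OTHER_SOURCES) >= 2
    }


records = [
    ("drug_a", "disease_x", "RxNorm"),
    ("drug_b", "disease_y", "Wikipedia"),
    ("drug_b", "disease_y", "MedlinePlus"),
    ("drug_c", "disease_z", "SIDER 2"),
]
print(sorted(high_precision_subset(records)))
# [('drug_a', 'disease_x'), ('drug_b', 'disease_y')]
```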
@ -30,6 +29,11 @@ Any additional resources or suggestions?
Jesse Spaulding: The markdown error has been fixed.
-----
Daniel Himmelstein Researcher April 1, 2015
The initial LabeledIn [1] resource used expert curators. The team behind this project tested crowdsourced curation using Amazon Mechanical Turk workers [2]. They found the majority vote of workers on whether a disease within a label was an indication had a high accuracy (96%).
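For reference, majority-vote aggregation of worker judgments can be sketched as follows. The input layout (one boolean answer per worker per drug-disease pair) and the choice to reject ties are assumptions. They are not details taken from the LabeledIn crowdsourcing study.

```python
def majority_vote(judgments):
    """judgments: dict mapping (drug, disease) -> list of worker answers,
    where True means the worker judged the disease to be an indication.

    Returns the pairs that a strict majority of workers accepted; ties are rejected.
    """
    return {
        pair
        for pair, votes in judgments.items()
        if votes and sum(votes) > len(votes) / 2
    }


judgments = {
    ("drug_a", "disease_x"): [True, True, False],
    ("drug_a", "disease_y"): [True, False, False],
    ("drug_b", "disease_x"): [True, False],  # tie, so rejected
}
print(majority_vote(judgments))  # {('drug_a', 'disease_x')}
```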
@ -37,11 +41,19 @@ The initial LabeledIn [1] resource used expert curators. The team behind this pr
They assessed 3004 indications not already in LabeledIn corresponding to 706 new drug labels. We are looking to increase the coverage of the initial LabeledIn dataset by adding these crowdsourced indications.
-----
Benjamin Good April 3, 2015
This dataset might be worth looking into: drug-indication links captured from physicians in an EHR system [1]. Data appears to be available, though it's in a 200+ page PDF! http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3422843/bin/amiajnl-2012-000852-s1.pdf (I'm sure that was a journal requirement).
-----
Daniel Himmelstein Researcher April 3, 2015
Hey @b_good, thanks for the suggestion [1] and for tracking down the data supplement, which I cannot find on the article's JAMIA page. From here on, I will refer to this resource as ehrlink, unless anyone can find a previously used or author-preferred nickname.
@ -51,6 +63,10 @@ This resource is noteworthy because it will capture off-label usages better than
I converted the pdf file into a tsv file, which can be downloaded here.
-----
Benjamin Good April 7, 2015
You can access SemRep-extracted semantic relations (e.g. treats, causes) based on all PubMed abstracts (updated bi-annually) via the Semantic MEDLINE database. With a UMLS login, you can get the complete MySQL dump from the SemMedDB site (http://skr3.nlm.nih.gov/SemMedDB/). The main challenge here is ensuring quality (as with any NLP output).
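One simple way to trade recall for precision with SemRep output is to keep only TREATS predications that are supported by several distinct abstracts. The sketch below assumes the predications have already been exported from the SemMedDB dump as (subject CUI, predicate, object CUI, PMID) tuples. The tuple layout and the `min_pmids` threshold are hypothetical choices, not part of SemMedDB.

```python
from collections import defaultdict


def treats_pairs(predications, min_pmids=3):
    """predications: iterable of (subject_cui, predicate, object_cui, pmid).

    Returns {(subject_cui, object_cui): n_pmids} for TREATS predications
    asserted in at least min_pmids distinct PubMed articles.
    """
    support = defaultdict(set)
    for subject, predicate, obj, pmid in predications:
        if predicate == "TREATS":
            support[(subject, obj)].add(pmid)  # a set counts each PMID once
    return {pair: len(pmids) for pair, pmids in support.items()
            if len(pmids) >= min_pmids}


# Placeholder CUIs and PMIDs for illustration only.
predications = [
    ("C1", "TREATS", "C2", 101),
    ("C1", "TREATS", "C2", 102),
    ("C1", "TREATS", "C2", 102),  # duplicate mention in the same abstract
    ("C1", "TREATS", "C2", 103),
    ("C3", "TREATS", "C2", 104),
    ("C1", "CAUSES", "C4", 105),
]
print(treats_pairs(predications))  # {('C1', 'C2'): 3}
```

A support threshold only reduces noise; it would not catch a systematic extraction error that repeats across many abstracts.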
@ -58,6 +74,10 @@ You can access SemRep extracted semantic relations (e.g. treats, causes) based o
Daniel Himmelstein: Added the reference to my initial post. Given the quality issues, I do not plan to include this resource in our gold standard set of indications. It could be helpful later as a literature-derived set of potential indications.
-----
Daniel Himmelstein Researcher April 8, 2015
ehrlink problem and medication vocabularies
@ -78,6 +98,10 @@ medication_definition_id medication
My worry is that these identifiers may not correspond to a standardized vocabulary that we can access and easily map to. I will contact the authors for clarification.
-----
Tudor Oprea April 8, 2015
Just to let you guys know that, at UNM, Oleg Ursu and I have been constructing such a catalog for nearly eight years.
@ -93,6 +117,10 @@ A few pointers:
Daniel Himmelstein: +1 for grant agencies to fund this type of activity
-----
Allison McCoy April 8, 2015
My colleagues and I have worked on multiple approaches to create this knowledge in the papers below:
@ -105,6 +133,10 @@ My colleagues and I have worked on multiple approaches to create this knowledge
In the JAMIA paper mentioned above, we used what we called a crowdsourcing approach to get this data. We have recently validated that approach at another site, and that publication is coming out in ACI soon. Unfortunately, in the original version, as you suspected, our medications and problems were not mapped to any standardized terminology. The identifiers are local to the EHR, and while we have made some attempts to map them to RxNorm and SNOMED-CT, we were never able to get a really accurate set. However, the validation uses data from a different EHR, which I believe can be more easily mapped. Once the paper is out, I'll see if I can share that data.
-----
Tudor Oprea April 8, 2015
I find crowdsourcing useful when you use a team of experts. For example, a carefully selected team of experts working on the same problem can give surprisingly interesting feedback on an otherwise difficult problem.
@ -116,16 +148,28 @@ See this paper http://pubs.acs.org/doi/abs/10.1021/ci400099q for details (mine i
With this in mind, I want to point out that crowdsourcing problem-medication pairs by clinicians is an intriguing effort, and if the data is publicly available I would like to learn more. There are risks, because a) data entry was probably not verified at the point of entry (was the clinician familiar with both the drug and the disease?); and b) the person determining the problem would require training in pharmacovigilance, an understanding of known side effects, etc. I assume you have done that, and that you compared the sets? I apologize that I do not have time to access your papers right now.
-----
Allison McCoy April 8, 2015
To clarify the crowdsourcing approach, in our study the clinicians are completing the task because it is required during routine care, not solely for the purpose of creating a knowledge base. They are entering the data into the EHR because they are prescribing a medication to a patient and are often required to link it to one or more of the patient's problems for billing purposes. We did not ask them to do any additional work outside of their own routine clinical practice.
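Because each link is a byproduct of routine e-prescribing, a knowledge base could be built by aggregating links across many prescriptions. It would keep pairs that are both frequent and account for a meaningful share of a medication's links. The sketch below is an assumption-laden illustration: the `min_count` and `min_ratio` thresholds are hypothetical, and the published method may use different metrics.

```python
from collections import Counter


def score_links(links):
    """links: iterable of (medication, problem) pairs, one per e-prescribing link.

    Returns {(medication, problem): (count, ratio)}, where ratio is the share
    of that medication's links that point to this problem.
    """
    pair_counts = Counter(links)
    med_totals = Counter()
    for (med, _problem), n in pair_counts.items():
        med_totals[med] += n
    return {pair: (n, n / med_totals[pair[0]]) for pair, n in pair_counts.items()}


def likely_indications(links, min_count=2, min_ratio=0.2):
    """Keep pairs linked at least min_count times with at least min_ratio share."""
    return {pair for pair, (n, ratio) in score_links(links).items()
            if n >= min_count and ratio >= min_ratio}


links = ([("drug_a", "problem_x")] * 3
         + [("drug_a", "problem_y")]
         + [("drug_b", "problem_y")] * 2)
print(score_links(links)[("drug_a", "problem_x")])  # (3, 0.75)
print(sorted(likely_indications(links)))
# [('drug_a', 'problem_x'), ('drug_b', 'problem_y')]
```

The ratio helps filter out one-off links, such as a medication attached to whichever problem happened to be at the top of the list.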
-----
Tudor Oprea April 8, 2015
Thank you, I was wondering about that. This does make their work more reliable.
-----
Daniel Himmelstein Researcher April 9, 2015
My colleagues and I have worked on multiple approaches to create this knowledge in the papers [1, 2, 3]
@ -137,6 +181,10 @@ In terms of the mappings from the aforementioned study [2], we still may be able
@TIOprea mentioned the difficulty of identifying disease-modifying indications, even in a carefully hand-curated database. @allisonmccoy, does your method favor disease-modifying links? For example, if modafinil were prescribed to treat MS-induced fatigue, would the clinicians link modafinil to multiple sclerosis or fatigue?
-----
Allison McCoy April 9, 2015
Did any of the other papers you highlighted release data that could add value here?
@ -152,6 +200,10 @@ The 2nd reference uses RxNorm, SNOMED-CT, and NDF-RT, all of which is freely ava
It could be either, but more than likely it would be linked to MS, because that's what would already be on the problem list and easily linked during e-prescribing. In our evaluation, though, we would have counted either as correct. We actually had a lot of discussion about this while doing the evaluations, because it occurred frequently.
-----
Daniel Himmelstein Researcher April 9, 2015
@TIOprea, thanks for your insights. You touch on important points. In general, our method may not require a perfect indication catalog to succeed, so I am hopeful despite the difficulties you mention. Specifically,
@ -173,6 +225,10 @@ Thanks for the perspective. We won't include these as part of our gold standard.
This, I think, will be the biggest difficulty. One option could be to exclude drugs that mostly treat symptoms. We noticed that drugs with many indications tended to fall into this category. For multiple sclerosis, disease-modifying therapy is an established concept, currently covering 12 drugs. Unfortunately, the MS indications we've extracted from MEDI and LabeledIn are predominantly symptomatic. And to make matters worse, for most other diseases the disease-modifying status seems much more poorly defined.
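The first option, excluding drugs that treat many diseases because they tend to be symptomatic, could be sketched as a simple filter. The cutoff `max_diseases` is a hypothetical parameter that would need tuning. The heuristic does not itself identify disease-modifying indications.

```python
from collections import Counter


def drop_broad_drugs(indications, max_diseases=10):
    """indications: list of (drug, disease) pairs.

    Removes every pair whose drug is indicated for more than max_diseases
    distinct diseases; input order is preserved.
    """
    diseases_per_drug = Counter(drug for drug, _ in set(indications))
    return [(drug, disease) for drug, disease in indications
            if diseases_per_drug[drug] <= max_diseases]


indications = [("drug_a", "d1"), ("drug_a", "d2"), ("drug_a", "d3"), ("drug_b", "d1")]
print(drop_broad_drugs(indications, max_diseases=2))  # [('drug_b', 'd1')]
```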
-----
Daniel Himmelstein Researcher April 9, 2015
The 2nd reference [1] uses RxNorm, SNOMED-CT, and NDF-RT, all of which is freely available, so that knowledge base could easily be regenerated by another party.
@ -188,6 +244,10 @@ Then in the methods they state:
Do you know whether the RxNorm portion of MEDI relied on the same underlying NDF-RT data that you collected for the 2011 AMIA Proceedings Paper [1]?
-----
Allison McCoy April 9, 2015
Do you know whether the RxNorm portion of MEDI relied on the same underlying NDF-RT data that you collected for the 2011 AMIA Proceedings Paper [1]?
@ -197,6 +257,10 @@ Allison McCoy April 9, 2015
Daniel Himmelstein: Thanks for the clarification. We also plan to perform some indication propagation on the Disease Ontology hierarchy.
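Indication propagation on the Disease Ontology could take several forms. The sketch below assumes one direction: a drug indicated for a specific disease is also linked to each of that disease's ancestors. It also assumes the hierarchy is available as a mapping from each term to its parent terms. Both are illustrative assumptions.

```python
def ancestors(disease, parents):
    """Return all ancestors of disease; parents maps a term to its parent terms."""
    seen, stack = set(), list(parents.get(disease, []))
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(parents.get(term, []))
    return seen


def propagate_up(indications, parents):
    """Add (drug, ancestor) for every (drug, disease) indication."""
    propagated = set(indications)
    for drug, disease in indications:
        for term in ancestors(disease, parents):
            propagated.add((drug, term))
    return propagated


# Hypothetical three-level hierarchy: term_c is_a term_b is_a term_a.
parents = {"term_c": ["term_b"], "term_b": ["term_a"]}
print(sorted(propagate_up({("drug_1", "term_c")}, parents)))
# [('drug_1', 'term_a'), ('drug_1', 'term_b'), ('drug_1', 'term_c')]
```

The `seen` set keeps the traversal correct when a term reaches the same ancestor through more than one parent, which can happen in an ontology DAG.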
-----
Daniel Himmelstein Researcher April 21, 2015
PREDICT Indications
@ -215,6 +279,10 @@ We combined the supplementary datasets from the study to create a table of PREDI
Daniel Himmelstein: Fixed, thanks
-----
Daniel Himmelstein Researcher April 21, 2015
Indication Set
@ -236,6 +304,10 @@ Indication Links
We would still like a way to differentiate disease-modifying from symptomatic indications and will explore manually classifying a subset of indications and training a model.
-----
Antoine Lizee April 29, 2015
Hi Daniel,
@ -263,6 +335,10 @@ Potential future directions:
Daniel Himmelstein: Awesome, you have set the train in motion for us to include ehrlink indications. Let's move discussion to this new thread specifically for ehrlink analysis.
-----
Antoine Lizee May 3, 2015
UPDATE:
@ -281,6 +357,10 @@ I also created the QC file for the ambiguity resolution step.
SCD, SBD, SCDF, SBDF, BN, SCDC, SBDC, IN, MIN, PIN, GPCK, BPCK, SCDG, SBDG, DF, DFG
-----
Daniel Himmelstein Researcher May 5, 2015
Revised indications which include ehrlink
@ -294,6 +374,10 @@ Our indication catalog, which only includes DO slim diseases and approved small
The combined high and low-confidence indication set covers 107 diseases and 744 compounds. For more information see the notebook, table of indications with resource info, or table of collapsed indications.
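Coverage figures like these come from counting distinct diseases and compounds across the selected confidence levels. Here is a minimal sketch, assuming the table of collapsed indications has been loaded as (compound, disease, confidence) rows. The row layout and the confidence labels are hypothetical.

```python
def coverage(rows, levels=("high", "low")):
    """rows: iterable of (compound, disease, confidence) tuples.

    Returns (n_diseases, n_compounds) over rows whose confidence is in levels.
    """
    selected = [(c, d) for c, d, conf in rows if conf in levels]
    n_diseases = len({d for _, d in selected})
    n_compounds = len({c for c, _ in selected})
    return n_diseases, n_compounds


rows = [
    ("drug_a", "disease_x", "high"),
    ("drug_a", "disease_y", "low"),
    ("drug_b", "disease_x", "high"),
    ("drug_c", "disease_z", "excluded"),
]
print(coverage(rows))                    # (2, 2)
print(coverage(rows, levels=("high",)))  # (1, 2)
```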
-----
Allison McCoy May 27, 2015
Our validation manuscript has been published, @dhimmel: http://aci.schattauer.de/en/contents/current-issue/issue/special/manuscript/24377/show.html
@ -305,6 +389,10 @@ I'll see what I can do about sharing the data, but unfortunately I've got travel
Enjoy the travels, we would definitely appreciate the data!
-----
Daniel Himmelstein Researcher July 14, 2015
Expert curation of the indication catalog
@ -313,6 +401,10 @@ We have decided to filter our catalog for disease-modifying indications and are
@allisonmccoy, have you thought more about releasing the data from your recent publication [1]? If you can do this in the next week or two, we would be thrilled to include this data. Otherwise we will have to move ahead with only the ehrlink data from your initial study [2].
-----
Daniel Himmelstein Researcher March 16, 2016
PharmacotherapyDB Version 1.0
@ -323,6 +415,10 @@ Thanks @b_good, @TIOprea, @allisonmccoy, @ritukhare, and @alizee — your sugges
We'll keep this discussion alive for any suggestions of new resources or methods to improve future versions of PharmacotherapyDB.
-----
Daniel Himmelstein Researcher March 16, 2016
Therapeutic Target Database
@ -337,12 +433,20 @@ Update July 16, 2016: Qin Chu provided me the following additional information v
The mapping between diseases and drugs were done manually. We searched different sources of literature such as pharmacology textbooks, review articles and research papers. The methods to extract the related drug target and disease information from literature were described in the 2012 version of TTD update paper [3]. We mapped the disease information to ICD code in the 2014 update of TTD [4].
-----
Daniel Himmelstein Researcher April 10, 2016
Cheng et al. 2014
A 2014 study titled "Systematic evaluation of connectivity map for disease indications" compiled 890 indications between 152 drugs and 145 diseases [1]. They compiled the indications from FAERS and Pharmaprojects. The indications are available as free text in Table S2 of the supplementary Word document. I copied Table S2 into a TSV available here.
-----
Daniel Himmelstein Researcher July 8, 2017
Recently, three resources have been published that provide catalogs of indications and drug repurposing examples.
