We are looking to construct a catalog of indications (efficacious drug-disease pairs) with the following attributes (ordered by importance):
- automated and high-throughput construction
- high-quality, or varying levels of quality as long as the quality level is annotated
- comprehensive
- disease modifying rather than symptomatic
- compounds which map to PubChem
- contraindications and adverse effects are excluded and cataloged separately
- diseases which map to the Disease Ontology
- source is retrievable
A few options we can consider:
- LabeledIn: Curators manually identified indications from drug labels for 250 human prescription ingredients (drugs) [1].
- MEDI: Indications extracted from RxNorm, SIDER 2, MedlinePlus, and Wikipedia were integrated into a single resource. The high-precision subset (indications in RxNorm or two other resources) includes 13,304 unique indications for 2,136 medications [2]. Further work added indication prevalence information [3]. MEDI compares favorably to SemRep for extracting indications from clinical text [4].
- SemRep: "SemRep is a program that extracts semantic predications (subject-relation-object triples) from biomedical free text" [5]. SemRep has been used to extract TREAT relations from MeSH scope notes, DailyMed, DrugBank, and AHFS Consumer Medication Information [6]. SemRep has also been used to identify TREAT relations from Medline abstracts [7]. A project called SemMedDB provides the SemRep results from mining PubMed [8].
- SPL-X (Structured Product Labels eXtractor): Using MetaMap, this project extracted indications from DailyMed drug labels that were available as XML [9]. The data do not appear to be available.
- Comparative Toxicogenomics Database [10]: Manual literature curators annotated drug-disease pairs as 'therapeutic'. The resource is extensive (the 'therapeutic' threshold was low) but incomplete.
- SIDER 2: In addition to extracting side effects from drug labels, SIDER also extracts indications [11]. Since the approach is automated, some side effects may be extracted as indications and vice versa. This approach would only provide information for drugs with labels from the US FDA or Canada.
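To make the MEDI high-precision rule from the list above concrete, here is a minimal Python sketch. It reads "in RxNorm or two other resources" as "in RxNorm, or in at least two of the other three resources"; that reading, and the function name `is_high_precision`, are assumptions for illustration and not MEDI's actual code.

```python
from typing import Iterable

# Resource names as listed in the MEDI description above.
MEDI_RESOURCES = {"RxNorm", "SIDER 2", "MedlinePlus", "Wikipedia"}


def is_high_precision(resources: Iterable[str]) -> bool:
    """Return True if an indication would fall in the high-precision subset.

    Assumed rule: the indication is reported by RxNorm, or by at least
    two of the remaining resources.
    """
    found = set(resources) & MEDI_RESOURCES
    if "RxNorm" in found:
        return True
    return len(found - {"RxNorm"}) >= 2


# Hypothetical indications, each described by the resources that report it.
print(is_high_precision(["RxNorm"]))
print(is_high_precision(["SIDER 2", "Wikipedia"]))
print(is_high_precision(["Wikipedia"]))
```

A rule of this kind is one way to meet the "quality level is annotated" requirement: the same function could emit a label per indication instead of a hard filter.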
Any additional resources or suggestions?
@ -29,6 +28,11 @@ Any additional resources or suggestions?
Jesse Spaulding: The markdown error has been fixed.
-----
Daniel Himmelstein Researcher April 1, 2015
@ -36,11 +40,19 @@ The initial LabeledIn [1] resource used expert curators. The team behind this pr
They assessed 3004 indications not already in LabeledIn corresponding to 706 new drug labels. We are looking to increase the coverage of the initial LabeledIn dataset by adding these crowdsourced indications.
-----
Benjamin Good April 3, 2015
This dataset might be worth looking into: drug-indication links captured from physicians in an EHR system [1]. Data appears to be available, though it's in a 200+ page PDF! http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3422843/bin/amiajnl-2012-000852-s1.pdf (I'm sure that was a journal requirement).
-----
Daniel Himmelstein Researcher April 3, 2015
@ -50,6 +62,10 @@ This resource is noteworthy because it will capture off-label usages better than
I converted the pdf file into a tsv file, which can be downloaded here.
-----
Benjamin Good April 7, 2015
@ -58,6 +74,10 @@ You can access SemRep extracted semantic relations (e.g. treats, causes) based o
Daniel Himmelstein: Added the reference to my initial post. Given the quality issues, I do not plan to include this resource in our gold standard set of indications. It could be helpful later as a literature-derived set of potential indications.
My worry is that these identifiers may not correspond to a standardized vocabulary that we can access and easily map to. I will contact the authors for clarification.
-----
Tudor Oprea April 8, 2015
@ -93,6 +117,10 @@ A few pointers:
Daniel Himmelstein: +1 for grant agencies to fund this type of activity
-----
Allison McCoy April 8, 2015
My colleagues and I have worked on multiple approaches to create this knowledge in the papers below:
@ -104,6 +132,10 @@ My colleagues and I have worked on multiple approaches to create this knowledge
In the JAMIA paper mentioned above, we used what we called a crowdsourcing approach to get this data. We have recently validated that approach at another site, and that publication is coming out in ACI soon. Unfortunately, in the original version, as you suspected, our medications and problems were not mapped to any standardized terminology. The identifiers are local to the EHR, and while we have made some attempts to map them to RxNorm and SNOMED-CT, we were never able to get a really accurate set. However, the validation uses data from a different EHR, which I believe can be more easily mapped. Once the paper is out, I'll see if I can share that data.
-----
Tudor Oprea April 8, 2015
@ -115,16 +147,28 @@ See this paper http://pubs.acs.org/doi/abs/10.1021/ci400099q for details (mine i
With this in mind, I want to point out that crowdsourcing problem-medication pairs by clinicians is an intriguing effort, and if the data is publicly available I would like to learn more. There are risks because a) verification of data entry was probably not done at the entry level (was the clinician familiar with both the drug and the disease?); b) the person determining the problem would require training in pharmacovigilance, understanding of known side effects, etc. I assume you have done that, and that you compared the sets? I apologize that I do not have time to access your papers right now.
-----
Allison McCoy April 8, 2015
To clarify the crowdsourcing approach, in our study the clinicians are completing the task because it is required during routine care, not solely for the purpose of creating a knowledge base. They are entering the data into the EHR because they are prescribing a medication to a patient and are often required to link it to one or more of the patient's problems for billing purposes. We did not ask them to do any additional work outside of their own routine clinical practice.
-----
Tudor Oprea April 8, 2015
Thank you, I was wondering about that. This does make their work more reliable.
-----
Daniel Himmelstein Researcher April 9, 2015
@ -136,6 +180,10 @@ In terms of the mappings from the aforementioned study [2], we still may be able
@TIOprea mentioned the difficulty of identifying disease-modifying indications, even in a carefully hand-curated database. @allisonmccoy, does your method favor disease-modifying links? For example, if modafinil were prescribed to treat MS-induced fatigue, would the clinicians link modafinil to multiple sclerosis or fatigue?
-----
Allison McCoy April 9, 2015
@ -151,6 +199,10 @@ The 2nd reference uses RxNorm, SNOMED-CT, and NDF-RT, all of which is freely ava
It could be either, but more than likely it would be linked to MS, because that's what would already be on the problem list and easily linked during e-prescribing. In our evaluation, we would have counted either as correct. We actually had a lot of discussion about this while doing the evaluations, because it did occur frequently.
-----
Daniel Himmelstein Researcher April 9, 2015
@ -173,6 +225,10 @@ Thanks for the perspective. We won't include these as part of our gold standard.
This, I think, will be the biggest difficulty. One option could be to exclude drugs that mostly treat symptoms. We noticed that drugs with many indications tended to be in this category. For multiple sclerosis, disease modifying is an established concept, currently with 12 drugs. Unfortunately, the MS indications we've extracted from MEDI and LabeledIn are predominantly symptomatic. And to make matters worse, for most other diseases the disease-modifying (DM) status seems much more poorly defined.
-----
Daniel Himmelstein Researcher April 9, 2015
The 2nd reference [1] uses RxNorm, SNOMED-CT, and NDF-RT, all of which is freely available, so that knowledge base could easily be regenerated by another party.
@ -188,6 +244,10 @@ Then in the methods they state:
Do you know whether the RxNorm portion of MEDI relied on the same underlying NDF-RT data that you collected for the 2011 AMIA Proceedings Paper [1]?
-----
Allison McCoy April 9, 2015
Do you know whether the RxNorm portion of MEDI relied on the same underlying NDF-RT data that you collected for the 2011 AMIA Proceedings Paper [1]?
@ -196,6 +256,10 @@ Allison McCoy April 9, 2015
Daniel Himmelstein: Thanks for the clarification. We also plan to perform some indication propagation on the Disease Ontology hierarchy.
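As a rough illustration of what indication propagation on an ontology hierarchy could look like, the sketch below copies each indication from a disease term to all of its ancestors. The direction of propagation (child to ancestor), the toy hierarchy, and the names `ancestors` and `propagate` are all assumptions for illustration; the thread does not specify how the propagation is actually done.

```python
from typing import Dict, Set, Tuple

Indication = Tuple[str, str]  # (compound, disease term)


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every ancestor of a term by walking is-a edges upward."""
    seen: Set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def propagate(indications: Set[Indication],
              parents: Dict[str, Set[str]]) -> Set[Indication]:
    """Add each indication to all ancestors of its disease term (assumed direction)."""
    propagated = set(indications)
    for compound, disease in indications:
        for ancestor in ancestors(disease, parents):
            propagated.add((compound, ancestor))
    return propagated


# Toy hierarchy with illustrative term names, not real Disease Ontology identifiers.
toy_parents = {
    "relapsing-remitting MS": {"multiple sclerosis"},
    "multiple sclerosis": {"demyelinating disease"},
}
toy_indications = {("compound A", "relapsing-remitting MS")}
result = propagate(toy_indications, toy_parents)
print(sorted(result))
```

In practice the set of target terms would probably be restricted (for example to a slim subset of diseases), since propagating to very general ancestors produces uninformative indications.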
-----
Daniel Himmelstein Researcher April 21, 2015
PREDICT Indications
@ -214,6 +278,10 @@ We combined the supplementary datasets from the study to create a table of PREDI
Daniel Himmelstein: Fixed, thanks
-----
Daniel Himmelstein Researcher April 21, 2015
Indication Set
@ -235,6 +303,10 @@ Indication Links
We would still like a way to differentiate disease-modifying from symptomatic indications and will explore manually classifying a subset of indications and training a model.
-----
Antoine Lizee April 29, 2015
@ -263,6 +335,10 @@ Potential future directions:
Daniel Himmelstein: Awesome, you have set the train in motion for us to include ehrlink indications. Let's move discussion to this new thread specifically for ehrlink analysis.
-----
Antoine Lizee May 3, 2015
UPDATE:
@ -281,6 +357,10 @@ I also created the QC file for the ambiguity resolution step.
@ -294,6 +374,10 @@ Our indication catalog, which only includes DO slim diseases and approved small
The combined high and low-confidence indication set covers 107 diseases and 744 compounds. For more information see the notebook, table of indications with resource info, or table of collapsed indications.
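The relationship between a table of indications with resource info and a table of collapsed indications can be sketched as follows. This is only an illustration under stated assumptions: the rows are made up, the confidence assigned to each row is hypothetical, and the rule "a pair is high-confidence if any resource reports it as high" is an assumption, not the notebook's actual logic.

```python
from collections import defaultdict

# Hypothetical rows: (compound, disease, resource, confidence level).
rows = [
    ("compound A", "disease X", "MEDI", "high"),
    ("compound A", "disease X", "LabeledIn", "high"),
    ("compound B", "disease X", "MEDI", "low"),
    ("compound B", "disease Y", "PREDICT", "high"),
]


def collapse(rows):
    """Collapse per-resource rows into one entry per compound-disease pair."""
    resources = defaultdict(set)
    confidence = {}
    for compound, disease, resource, level in rows:
        pair = (compound, disease)
        resources[pair].add(resource)
        # Assumed rule: keep "high" if any resource reports the pair as high.
        if level == "high" or pair not in confidence:
            confidence[pair] = level
    return {pair: (sorted(resources[pair]), confidence[pair]) for pair in resources}


collapsed = collapse(rows)
n_compounds = len({compound for compound, _ in collapsed})
n_diseases = len({disease for _, disease in collapsed})
print(len(collapsed), "indications,", n_compounds, "compounds,", n_diseases, "diseases")
```

Coverage figures such as the number of diseases and compounds then follow from counting distinct values in the collapsed table.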
-----
Allison McCoy May 27, 2015
Our validation manuscript has been published, @dhimmel: http://aci.schattauer.de/en/contents/current-issue/issue/special/manuscript/24377/show.html
@ -305,6 +389,10 @@ I'll see what I can do about sharing the data, but unfortunately I've got travel
Enjoy the travels, we would definitely appreciate the data!
-----
Daniel Himmelstein Researcher July 14, 2015
Expert curation of the indication catalog
@ -312,6 +400,10 @@ We have decided to filter our catalog for disease-modifying indications and are
@allisonmccoy, have you thought more about releasing the data from your recent publication [1]? If you can do this in the next week or two, we would be thrilled to include this data. Otherwise we will have to move ahead with only the ehrlink data from your initial study [2].
-----
Daniel Himmelstein Researcher March 16, 2016
PharmacotherapyDB Version 1.0
@ -323,6 +415,10 @@ Thanks @b_good, @TIOprea, @allisonmccoy, @ritukhare, and @alizee — your sugges
We'll keep this discussion alive for any suggestions of new resources or methods to improve future versions of PharmacotherapyDB.
-----
Daniel Himmelstein Researcher March 16, 2016
Therapeutic Target Database
@ -337,12 +433,20 @@ Update July 16, 2016: Qin Chu provided me the following additional information v
The mapping between diseases and drugs were done manually. We searched different sources of literature such as pharmacology textbooks, review articles and research papers. The methods to extract the related drug target and disease information from literature were described in the 2012 version of TTD update paper [3]. We mapped the disease information to ICD code in the 2014 update of TTD [4].
-----
Daniel Himmelstein Researcher April 10, 2016
Cheng et al 2014
A 2014 study titled "Systematic evaluation of connectivity map for disease indications" compiled 890 indications between 152 drugs and 145 diseases [1]. They compiled the indications from FAERS and Pharmaprojects. The indications are available as free text in Table S2 of the supplementary Word document. I copied Table S2 into a TSV available here.
-----
Daniel Himmelstein Researcher July 8, 2017
Recently, three resources have been published that provide catalogs of indications and drug repurposing examples.