Update 'Thinklab notes How should we construct a catalog of drug indications?'

4 years ago · 98e6c88798
parent 15997ca59c
commit 98e6c88798
1 changed file with 119 additions and 15 deletions
--- a/Thinklab-notes---How-should-we-construct-a-catalog-of-drug-indications%3F.md
+++ b/Thinklab-notes---How-should-we-construct-a-catalog-of-drug-indications%3F.md
@@ -1,25 +1,24 @@
 Daniel Himmelstein Researcher   Jan. 14, 2015
 We are looking to construct a catalog of indications (efficacious drug-disease pairs) with the following attributes (ordered by importance):
-    automated and high-throughput construction
+    - automated and high-throughput construction
-    high-quality, or varying levels of quality as long as quality level is annotated
+    - high-quality, or varying levels of quality as long as the quality level is annotated
-    comprehensive
+    - comprehensive
-    disease modifying rather than symptomatic
+    - disease-modifying rather than symptomatic
-    compounds which map to pubchem
+    - compounds which map to PubChem
-    contraindications and adverse effects are excluded and cataloged separately
+    - contraindications and adverse effects are excluded and cataloged separately
-    diseases which map to the disease ontology
+    - diseases which map to the Disease Ontology
-    source is retrievable
+    - source is retrievable
 A few options we can consider:
-    LabeledIn — Curators manually identified indications from drug labels for 250 human prescription ingredients (drugs) [1].
+    - LabeledIn — Curators manually identified indications from drug labels for 250 human prescription ingredients (drugs) [1].
-    MEDI — Indications extracted from RxNorm, SIDER 2, MedlinePlus, and Wikipedia were integrated into a single resource. The high-precision subset (indications in RxNorm or two other resources) includes 13,304 unique indications for 2,136 medications [2]. Further work added indication prevalence information [3]. MEDI compares favorably to SemRep for extracting indications from clinical text [4].
+    - MEDI — Indications extracted from RxNorm, SIDER 2, MedlinePlus, and Wikipedia were integrated into a single resource. The high-precision subset (indications in RxNorm or in two other resources) includes 13,304 unique indications for 2,136 medications [2]. Further work added indication prevalence information [3]. MEDI compares favorably to SemRep for extracting indications from clinical text [4].
-    SemRep — "SemRep is a program that extracts semantic predications (subject-relation-object triples) from biomedical free text" [5]. SemRep has been used to extract TREAT relations from MeSH scope notes, Daily Med, DrugBank, and AHFS Consumer Medication Information [6]. SemRep has also been used to identify TREAT relations from Medline abstracts [7]. A project called SemMedDB provides the SemRep results from mining PubMed [8].
+    - SemRep — "SemRep is a program that extracts semantic predications (subject-relation-object triples) from biomedical free text" [5]. SemRep has been used to extract TREAT relations from MeSH scope notes, DailyMed, DrugBank, and AHFS Consumer Medication Information [6]. SemRep has also been used to identify TREAT relations from Medline abstracts [7]. A project called SemMedDB provides the SemRep results from mining PubMed [8].
-    SPL-X — Structured Product Labels eXtractor — Using MetaMap, this project extracted indications from DailyMed drug labels that were available as XML [9]. Data does not appear to be available.
+    - SPL-X — Structured Product Labels eXtractor — Using MetaMap, this project extracted indications from DailyMed drug labels that were available as XML [9]. Data does not appear to be available.
-    Comparative Toxicogenomics Database [10] — Manual literature curators annotated drug-disease pairs as 'therapeutic'. The resource is extensive (the 'therapeutic' threshold was low) but incomplete.
+    - Comparative Toxicogenomics Database [10] — Literature curators manually annotated drug-disease pairs as 'therapeutic'. The resource is extensive (the 'therapeutic' threshold was low) but incomplete.
-    SIDER 2 — In addition to extracting side effects from drug labels, SIDER also extracts indications [11]. Since the approach is automated, some side effects may be extracted as indications and vice versa. This approach would only provide information for drugs with labels from the US FDA or Canada.
+    - SIDER 2 — In addition to extracting side effects from drug labels, SIDER also extracts indications [11]. Since the approach is automated, some side effects may be extracted as indications and vice versa. This approach would provide information only for drugs with labels from the US FDA or Canada.
 Any additional resources or suggestions?
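The MEDI high-precision rule mentioned above (keep an indication if it appears in RxNorm, or in at least two of the other resources) can be sketched in Python as follows. This is our reading of the rule, not MEDI's code; the `(medication, disease, source)` record layout and the function name `high_precision_subset` are hypothetical.

```python
from collections import defaultdict


def high_precision_subset(records, anchor="RxNorm", min_other_sources=2):
    """Return (medication, disease) pairs that appear in the anchor source,
    or in at least `min_other_sources` of the other sources.

    records: iterable of (medication, disease, source) tuples (hypothetical layout).
    """
    # Collect the set of distinct sources reporting each (medication, disease) pair.
    sources_by_pair = defaultdict(set)
    for medication, disease, source in records:
        sources_by_pair[(medication, disease)].add(source)

    subset = set()
    for pair, sources in sources_by_pair.items():
        other_sources = sources - {anchor}
        if anchor in sources or len(other_sources) >= min_other_sources:
            subset.add(pair)
    return subset


# Usage with made-up records:
records = [
    ("drug_a", "disease_x", "RxNorm"),
    ("drug_b", "disease_y", "SIDER 2"),
    ("drug_b", "disease_y", "Wikipedia"),
    ("drug_c", "disease_z", "MedlinePlus"),
]
print(sorted(high_precision_subset(records)))
# [('drug_a', 'disease_x'), ('drug_b', 'disease_y')]
```

Because sources are stored in a set, the same resource reporting a pair twice still counts as one source.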
@@ -30,6 +29,11 @@ Any additional resources or suggestions?
    Jesse Spaulding: The markdown error has been fixed.
 -----
 Daniel Himmelstein Researcher  April 1, 2015
 The initial LabeledIn [1] resource used expert curators. The team behind this project tested crowdsourced curation using Amazon Mechanical Turk workers [2]. They found the majority vote of workers on whether a disease within a label was an indication had a high accuracy (96%).
@@ -37,11 +41,19 @@ The initial LabeledIn [1] resource used expert curators. The team behind this pr
 They assessed 3004 indications not already in LabeledIn corresponding to 706 new drug labels. We are looking to increase the coverage of the initial LabeledIn dataset by adding these crowdsourced indications.
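A minimal sketch of majority-vote aggregation and its accuracy against expert labels, assuming each worker gives a yes/no judgment per disease mention. The function names, the data layout, and the tie rule (ties return `None` and count as errors) are hypothetical; the study's own protocol may differ.

```python
from collections import Counter


def majority_vote(votes):
    """Aggregate worker judgments for one disease mention (True = indication).

    Returns True or False, or None on a tie (hypothetical tie rule).
    """
    counts = Counter(votes)
    if counts[True] == counts[False]:
        return None
    return counts[True] > counts[False]


def accuracy(crowd_labels, expert_labels):
    """Fraction of items whose crowd label equals the expert label.

    Both arguments map item -> bool; tied crowd labels (None) count as wrong.
    """
    if not expert_labels:
        raise ValueError("expert_labels is empty")
    matches = sum(
        crowd_labels.get(item) == label for item, label in expert_labels.items()
    )
    return matches / len(expert_labels)


# Usage with made-up votes from three workers per mention:
crowd = {
    "mention_1": majority_vote([True, True, False]),
    "mention_2": majority_vote([False, False, True]),
}
expert = {"mention_1": True, "mention_2": True}
print(accuracy(crowd, expert))  # 0.5
```

An odd number of workers per mention avoids ties entirely, which is one reason to prefer it when designing such a task.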
 -----
 Benjamin Good  April 3, 2015
 This dataset might be worth looking into: drug-indication links captured from physicians in an EHR system [1]. Data appears to be available, though it's in a 200+ page PDF! http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3422843/bin/amiajnl-2012-000852-s1.pdf (I'm sure that was a journal requirement).
 -----
 Daniel Himmelstein Researcher  April 3, 2015
 Hey @b_good, thanks for the suggestion [1] and for tracking down the data supplement, which I cannot find on the article's JAMIA page. From here on, I will refer to this resource as ehrlink, unless anyone can find a previously used or author-preferred nickname.
@@ -51,6 +63,10 @@ This resource is noteworthy because it will capture off-label usages better than
 I converted the pdf file into a tsv file, which can be downloaded here.
 -----
 Benjamin Good  April 7, 2015
 You can access SemRep-extracted semantic relations (e.g. treats, causes) from all PubMed abstracts (updated bi-annually) via the Semantic MEDLINE database (SemMedDB). With a UMLS login, you can get the complete MySQL dump via http://skr3.nlm.nih.gov/SemMedDB/ . The main challenge here is ensuring quality (as with any NLP output).
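Because quality is the main concern with SemRep output, one simple filter is to keep only TREATS pairs supported by several distinct PubMed articles. The sketch below uses Python's built-in `sqlite3` and a hypothetical, simplified `predication(pmid, subject_cui, predicate, object_cui)` table standing in for the SemMedDB dump; the real schema and the `min_pmids` threshold are assumptions to check before use.

```python
import sqlite3


def treats_pairs(conn, min_pmids=2):
    """Return (subject_cui, object_cui, n_pmids) for TREATS predications
    supported by at least `min_pmids` distinct PubMed IDs."""
    query = """
        SELECT subject_cui, object_cui, COUNT(DISTINCT pmid) AS n_pmids
        FROM predication
        WHERE predicate = 'TREATS'
        GROUP BY subject_cui, object_cui
        HAVING COUNT(DISTINCT pmid) >= ?
        ORDER BY n_pmids DESC, subject_cui, object_cui
    """
    return conn.execute(query, (min_pmids,)).fetchall()


def demo_connection(rows):
    """Build an in-memory database with the simplified, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE predication "
        "(pmid INTEGER, subject_cui TEXT, predicate TEXT, object_cui TEXT)"
    )
    conn.executemany("INSERT INTO predication VALUES (?, ?, ?, ?)", rows)
    return conn


# Usage with made-up rows (placeholder CUIs, not real UMLS concepts):
conn = demo_connection([
    (1, "C_DRUG_A", "TREATS", "C_DISEASE_X"),
    (2, "C_DRUG_A", "TREATS", "C_DISEASE_X"),
    (3, "C_DRUG_B", "TREATS", "C_DISEASE_Y"),
    (4, "C_DRUG_B", "CAUSES", "C_DISEASE_Z"),
])
print(treats_pairs(conn))  # [('C_DRUG_A', 'C_DISEASE_X', 2)]
```

Counting distinct PMIDs rather than rows keeps one article that repeats a claim from inflating its support.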
@@ -58,6 +74,10 @@ You can access SemRep extracted semantic relations (e.g. treats, causes) based o
    Daniel Himmelstein: Added the reference to my initial post. Given the quality issues, I do not plan to include this resource in our gold standard set of indications. It could be helpful later as a literature-derived set of potential indications.
 -----
 Daniel Himmelstein Researcher  April 8, 2015
 ehrlink problem and medication vocabularies
@@ -78,6 +98,10 @@ medication_definition_id	medication
 My worry is that these identifiers may not correspond to a standardized vocabulary that we can access and easily map to. I will contact the authors for clarification.
 -----
 Tudor Oprea  April 8, 2015
 Just to let you guys know that, at UNM, Oleg Ursu and I have been constructing such a catalog for nearly eight years.
@@ -93,6 +117,10 @@ A few pointers:
    Daniel Himmelstein: +1 for grant agencies to fund this type of activity
 -----
 Allison McCoy  April 8, 2015
 My colleagues and I have worked on multiple approaches to create this knowledge in the papers below:
@@ -105,6 +133,10 @@ My colleagues and I have worked on multiple approaches to create this knowledge
 In the JAMIA paper mentioned above, we used what we called a crowdsourcing approach to get this data. We have recently validated that approach at another site, and that publication is coming out in ACI soon. Unfortunately, in the original version, as you suspected, our medications and problems were not mapped to any standardized terminology. The identifiers are local to the EHR, and while we have made some attempts to map them to RxNorm and SNOMED-CT, we were never able to get a really accurate set. However, the validation uses data from a different EHR, which I believe can be more easily mapped. Once the paper is out, I'll see if I can share that data.
 -----
 Tudor Oprea  April 8, 2015
 I find crowdsourcing useful when you use a team of experts. So, for example, a carefully selected team of experts, when working on the same problem, can give surprisingly interesting feedback on an otherwise difficult problem.
@@ -116,16 +148,28 @@ See this paper http://pubs.acs.org/doi/abs/10.1021/ci400099q for details (mine i
 With this in mind, I want to point out that crowdsourcing problem-medication pairs from clinicians is an intriguing effort, and if the data is publicly available I would like to learn more. There are risks because a) verification of data entry was probably not done at the entry level (was the clinician familiar with both the drug and the disease?); b) the person determining the problem would require training in pharmacovigilance, understanding of known side effects, etc. I assume you have done that, and that you compared the sets? I apologize that I do not have time to access your papers right now.
 -----
 Allison McCoy  April 8, 2015
 To clarify the crowdsourcing approach, in our study the clinicians are completing the task because it is required during routine care, not solely for the purpose of creating a knowledge base. They are entering the data into the EHR because they are prescribing a medication to a patient and are often required to link it to one or more of the patient's problems for billing purposes. We did not ask them to do any additional work outside of their own routine clinical practice.
 -----
 Tudor Oprea  April 8, 2015
 Thank you, I was wondering about that. This does make their work more reliable.
 -----
 Daniel Himmelstein Researcher  April 9, 2015
    My colleagues and I have worked on multiple approaches to create this knowledge in the papers [1, 2, 3]
@@ -137,6 +181,10 @@ In terms of the mappings from the aforementioned study [2], we still may be able
 @TIOprea mentioned the difficulty of identifying disease-modifying indications, even in a carefully hand-curated database. @allisonmccoy, does your method favor disease-modifying links? For example, if modafinil were prescribed to treat MS-induced fatigue, would the clinicians link modafinil to multiple sclerosis or fatigue?
 -----
 Allison McCoy  April 9, 2015
    Did any of the other papers you highlighted release data that could add value here?
@@ -152,6 +200,10 @@ The 2nd reference uses RxNorm, SNOMED-CT, and NDF-RT, all of which is freely ava
 It could be either, but more than likely it would be linked to MS, because that's what would be on the problem list already and easily linked during e-prescribing, but in our evaluation, we would have counted either as correct. We actually had a lot of discussion about this while doing the evaluations, because it did occur frequently.
 -----
 Daniel Himmelstein Researcher  April 9, 2015
 @TIOprea, thanks for your insights. You touch on important points. In general, our method may not require a perfect indication catalog to succeed, so I am hopeful despite the difficulties you mention. Specifically,
@@ -173,6 +225,10 @@ Thanks for the perspective. We won't include these as part of our gold standard.
 This, I think, will be the biggest difficulty. One option could be to exclude drugs that mostly treat symptoms; we noticed that drugs with many indications tended to fall into this category. For multiple sclerosis, disease-modifying therapy is an established concept, currently with 12 drugs. Unfortunately, the MS indications we've extracted from MEDI and LabeledIn are predominantly symptomatic. To make matters worse, for most other diseases the disease-modifying (DM) status seems much more poorly defined.
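The heuristic of excluding drugs with many indications could be sketched as below. The cutoff `max_indications` is a hypothetical parameter (nothing above specifies a value), the function names are illustrative, and a high indication count is only a rough proxy for symptomatic use.

```python
from collections import Counter


def broad_compounds(indications, max_indications=5):
    """Return compounds indicated for more than `max_indications` distinct
    diseases. The cutoff is a hypothetical tuning parameter."""
    # Deduplicate pairs first so repeated rows do not inflate the count.
    per_compound = Counter(compound for compound, _ in set(indications))
    return {compound for compound, n in per_compound.items() if n > max_indications}


def drop_broad_compounds(indications, max_indications=5):
    """Remove every indication whose compound is flagged as broad."""
    flagged = broad_compounds(indications, max_indications)
    return [(c, d) for c, d in indications if c not in flagged]


# Usage with made-up pairs: drug_a has 7 indications, drug_b has 2.
indications = [("drug_a", f"disease_{i}") for i in range(7)]
indications += [("drug_b", "disease_0"), ("drug_b", "disease_1")]
print(broad_compounds(indications))  # {'drug_a'}
print(drop_broad_compounds(indications))
# [('drug_b', 'disease_0'), ('drug_b', 'disease_1')]
```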
 -----
 Daniel Himmelstein Researcher  April 9, 2015
    The 2nd reference [1] uses RxNorm, SNOMED-CT, and NDF-RT, all of which is freely available, so that knowledge base could easily be regenerated by another party.
@@ -188,6 +244,10 @@ Then in the methods they state:
 Do you know whether the RxNorm portion of MEDI relied on the same underlying NDF-RT data that you collected for the 2011 AMIA Proceedings Paper [1]?
 -----
 Allison McCoy  April 9, 2015
    Do you know whether the RxNorm portion of MEDI relied on the same underlying NDF-RT data that you collected for the 2011 AMIA Proceedings Paper [1]?
@@ -197,6 +257,10 @@ Allison McCoy  April 9, 2015
    Daniel Himmelstein: Thanks for the clarification. We also plan to perform some indication propagation on the Disease Ontology hierarchy.
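One way to propagate indications on the Disease Ontology hierarchy is to pass each compound-disease pair up to every ancestor of the disease. The upward direction, the `parents` mapping (each term to its direct parents), and the function names are assumptions for illustration, not the project's documented method.

```python
def ancestors(term, parents):
    """Return all ancestors of `term`; `parents` maps each term to its direct parents."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate_indications(indications, parents):
    """Add (compound, ancestor) for every ancestor of each indicated disease."""
    propagated = set(indications)
    for compound, disease in indications:
        for ancestor in ancestors(disease, parents):
            propagated.add((compound, ancestor))
    return propagated


# Usage with a made-up hierarchy (not real Disease Ontology terms):
parents = {
    "disease_subtype": ["disease"],
    "disease": ["disease_class"],
}
print(sorted(propagate_indications({("drug_a", "disease_subtype")}, parents)))
# [('drug_a', 'disease'), ('drug_a', 'disease_class'), ('drug_a', 'disease_subtype')]
```

The `seen` set lets the traversal handle terms with multiple parents (a DAG) without visiting a shared ancestor twice.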
 -----
 Daniel Himmelstein Researcher  April 21, 2015
 PREDICT Indications
@@ -215,6 +279,10 @@ We combined the supplementary datasets from the study to create a table of PREDI
    Daniel Himmelstein: Fixed, thanks
 -----
 Daniel Himmelstein Researcher  April 21, 2015
 Indication Set
@@ -236,6 +304,10 @@ Indication Links
 We would still like a way to differentiate disease-modifying from symptomatic indications and will explore manually classifying a subset of indications and training a model.
 -----
 Antoine Lizee  April 29, 2015
 Hi Daniel,
@@ -263,6 +335,10 @@ Potential future directions:
    Daniel Himmelstein: Awesome, you have set the train in motion for us to include ehrlink indications. Let's move discussion to this new thread specifically for ehrlink analysis.
 -----
 Antoine Lizee  May 3, 2015
 UPDATE:
@@ -281,6 +357,10 @@ I also created the QC file for the ambiguity resolution step.
    SCD, SBD, SCDF, SBDF, BN, SCDC, SBDC, IN, MIN, PIN, GPCK, BPCK, SCDG, SBDG, DF, DFG
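As an illustration of how a term-type list like the one above might be applied, the sketch below filters lines of RxNorm's pipe-delimited `RXNCONSO.RRF` file down to RxNorm-sourced atoms with those term types. How the list was actually used in the ambiguity-resolution step is not stated here, so this is an assumption. The column positions (RXCUI at 0, SAB at 11, TTY at 12, STR at 14) follow our understanding of the RXNCONSO layout and should be verified against the RxNorm technical documentation; `make_line` is a hypothetical test helper.

```python
# Term types listed in the post above.
TERM_TYPES = frozenset({
    "SCD", "SBD", "SCDF", "SBDF", "BN", "SCDC", "SBDC", "IN",
    "MIN", "PIN", "GPCK", "BPCK", "SCDG", "SBDG", "DF", "DFG",
})

# Assumed column positions in RXNCONSO.RRF (pipe-delimited); verify against
# the RxNorm technical documentation for your release.
RXCUI, SAB, TTY, STR = 0, 11, 12, 14


def filter_rxnconso(lines, term_types=TERM_TYPES):
    """Yield (rxcui, tty, name) for RxNorm-sourced atoms whose TTY is listed."""
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= STR:
            continue  # skip malformed or truncated lines
        if fields[SAB] == "RXNORM" and fields[TTY] in term_types:
            yield fields[RXCUI], fields[TTY], fields[STR]


def make_line(rxcui, sab, tty, name):
    """Build a synthetic 18-column RRF line (with trailing '|') for testing."""
    fields = [""] * 18
    fields[RXCUI], fields[SAB], fields[TTY], fields[STR] = rxcui, sab, tty, name
    return "|".join(fields) + "|\n"


# Usage with synthetic lines (made-up identifiers and names):
lines = [
    make_line("1001", "RXNORM", "IN", "exampleamine"),
    make_line("1002", "RXNORM", "SY", "example synonym"),
    make_line("1003", "MTHSPL", "SU", "other source"),
]
print(list(filter_rxnconso(lines)))  # [('1001', 'IN', 'exampleamine')]
```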
 -----
 Daniel Himmelstein Researcher  May 5, 2015
 Revised indications which include ehrlink
@@ -294,6 +374,10 @@ Our indication catalog, which only includes DO slim diseases and approved small
 The combined high and low-confidence indication set covers 107 diseases and 744 compounds. For more information see the notebook, table of indications with resource info, or table of collapsed indications.
 -----
 Allison McCoy  May 27, 2015
 Our validation manuscript has been published, @dhimmel: http://aci.schattauer.de/en/contents/current-issue/issue/special/manuscript/24377/show.html
@@ -305,6 +389,10 @@ I'll see what I can do about sharing the data, but unfortunately I've got travel
    Enjoy the travels, we would definitely appreciate the data!
 -----
 Daniel Himmelstein Researcher  July 14, 2015
 Expert curation of the indication catalog
@@ -313,6 +401,10 @@ We have decided to filter our catalog for disease-modifying indications and are
 @allisonmccoy, have you thought more about releasing the data from your recent publication [1]? If you can do this in the next week or two, we would be thrilled to include this data. Otherwise, we will have to move ahead with only the ehrlink data from your initial study [2].
 -----
 Daniel Himmelstein Researcher  March 16, 2016
 PharmacotherapyDB Version 1.0
@@ -323,6 +415,10 @@ Thanks @b_good, @TIOprea, @allisonmccoy, @ritukhare, and @alizee — your sugges
 We'll keep this discussion alive for any suggestions of new resources or methods to improve future versions of PharmacotherapyDB.
 -----
 Daniel Himmelstein Researcher  March 16, 2016
 Therapeutic Target Database
@@ -337,12 +433,20 @@ Update July 16, 2016: Qin Chu provided me the following additional information v
    The mapping between diseases and drugs were done manually. We searched different sources of literature such as pharmacology textbooks, review articles and research papers. The methods to extract the related drug target and disease information from literature were described in the 2012 version of TTD update paper [3]. We mapped the disease information to ICD code in the 2014 update of TTD [4].
 -----
 Daniel Himmelstein Researcher  April 10, 2016
 Cheng et al. 2014
 A 2014 study titled "Systematic evaluation of connectivity map for disease indications" compiled 890 indications between 152 drugs and 145 diseases [1]. They compiled the indications from FAERS and Pharmaprojects. The indications are available as free text in Table S2 of the supplementary word document. I copied Table S2 into a TSV available here.
 -----
 Daniel Himmelstein Researcher  July 8, 2017
 Recently, three resources have been published that provide catalogs of indications and drug repurposing examples.