@ -29,7 +29,7 @@ describe the specific data used in the analysis (\cref{dataintegration}).
Since September 27, 2007, those who conduct clinical trials of FDA-controlled
drugs or devices on human subjects must register
their trial at \url{ClinicalTrials.gov}
(\cite{noauthor_fdaaa_nodate}).
\cite{usnlm_fdaaa801finalrule}.
This involves submitting information on the expected enrollment and duration of
trials, drugs or devices that will be used, treatment protocols and study arms,
as well as contact information for the trial sponsor and treatment sites.
@ -39,13 +39,13 @@ When starting a new trial, the required information must be submitted
After the initial submission, the data is briefly reviewed for quality and
then the trial record is published and the trial is assigned a
National Clinical Trial (NCT) identifier.
(\cite{noauthor_fdaaa_nodate}).
\cite{usnlm_fdaaa801finalrule}.
Each trial's record is updated periodically, including a final update that must occur
within a year of completing the primary objective, although exceptions are
available for trials related to drug approvals or for trials with secondary
objectives that require further observation\footnote{This rule came into effect in 2017.}
(\cite{noauthor_fdaaa_nodate}).
\cite{usnlm_fdaaa801finalrule}.
Other than the requirements for the first and last submissions, all other
updates occur at the discretion of the trial sponsor.
Because the ClinicalTrials.gov website serves as a central point of information
@ -61,8 +61,8 @@ to join a clinical trial.
% include screenshots?
The second way to access the data is through a normalized database set up by
the
\href{https://aact.ctti-clinicaltrials.org/}{Clinical Trials Transformation Initiative}
called AACT. % TODO: Get CITATION
\authorcite{ctti_aact_2022}
called the Aggregate Analysis of ClinicalTrials.gov (AACT).
The AACT database is available as a PostgreSQL database dump or set of pipe (``$\vert$'')
delimited files and matches the current version of the ClinicalTrials.gov database.
This format is amenable to large-scale analysis, but does not contain information about
@ -130,7 +130,8 @@ Each NDC code can have multiple SPLs associated with it because each
drug compound may be packaged in multiple ways, e.g. boxes with different
numbers of blister packs.
These SPLs are made available for download so that they can be integrated
into patient health systems to improve patient safety (\cite{noauthor_indexing_nodate}).
into patient health systems to improve patient safety
\cite{indexingsplfactsheet_}.
The FDA also published additional data in the NDC SPL Data Elements (NSDE) file.
This file contains some of the data from the SPL files, as well as the dates
@ -159,7 +160,7 @@ Years of Life Lost (YLL), and Years Lived with Disability (YLD) and come with
both an estimate and 95\% confidence interval bounds.
Estimates are available for national, multinational, and global
populations
(\cite{vos_global_2020}).
\cite{vos_globalburden369_2020}.
These classes of disease are organized in a hierarchy, with each subsuming category
having its own estimates of disease incidence.
@ -171,7 +172,7 @@ that are most important from a public health perspective.
% the nested category outline.
The IHME also provides a link between the disease/cause hierarchy and ICD10
codes
(\cite{global_burden_of_disease_collaborative_network_global_2020}).
\cite{globalburdenofdiseasecollaborativenetwork_globalburdendisease_2020a}.
% ----------------------------------------------------
@ -185,7 +186,7 @@ In each section below I briefly describe each terminology, its contents, and use
The Medical Subject Headings (MeSH) Thesaurus is produced and maintained by the National
Library of Medicine.
It is used to index subjects in various NLM publications including PubMed
(\cite{noauthor_medical_nodate}).
\cite{medicalsubjectheadingshomepage_}.
The AACT database contains a table that links clinical trials' clinical conditions
and drug names to terms in the MeSH thesaurus.
As this contains a standardized nomenclature, it simplified much of the
@ -193,7 +194,7 @@ linking between clinical trials and other datasources.
\paragraph{RxNorm}
According to \cite{noauthor_rxnorm_nodate-1}
According to \cite{usnlm_rxnorm_2023}
\begin{displayquote}
What is RxNorm? \\
RxNorm is two things: a normalized naming system for generic and branded drugs;
@ -215,7 +216,7 @@ The one I chose to use was a MariaDB database that backs a service called RxNav
provided by the National Library of Medicine (NLM).
The NLM provides scripts to set up and host the backing databases on your
own servers
(\cite{noauthor_rxnav---box_nodate}).
\cite{usnlm_rxnaviabox_2023}.
After setting up the local server, I wrote a Python program to export
the data from the RxNorm database and import it into the AACT Database.
This was required because the former uses a MariaDB database server
@ -232,23 +233,29 @@ The International Classification of Diseases 10th revision (ICD-10) is a
worldwide standard for categorizing human disease maintained by the
World Health Organization.
Although the WHO version's last major update was in 2019 and it was officially
superseded in 2022 by the 11th revision (\cite{noauthor_international_nodate}),
superseded in 2022 by the 11th revision
\cite{who_icd-10_2023}
,
the 10th revision is still in use in the United States as the
Centers for Medicare and Medicaid Services (CMS) continues to publish
updated versions called
ICD-10-CM (Clinical Modification) (\cite{noauthor_2023_nodate}) and
ICD-10-PCS (Procedure Coding System) (\cite{noauthor_2023_nodate-1}) for use
in medical billing.
ICD-10-CM (Clinical Modification)
\cite{uscms_icd-10_cm_2022}
and ICD-10-PCS (Procedure Coding System)
\cite{uscms_icd-10_pcs_2022}
for use in medical billing.
ICD-10 codes are organized in a hierarchy.
There are 22 highest level categories, representing general categories such
as cancers, mental illness, and infectious diseases.
The second layer of the hierarchy consists of about 225 2nd level groupings.
The second layer of the hierarchy consists of about 225 groupings.
% how was it used
The GBD database provided a mapping between their categories and ICD-10
codes (\cite{global_burden_of_disease_collaborative_network_global_2020}).
codes
\cite{globalburdenofdiseasecollaborativenetwork_globalburdendisease_2020a}
.
Unfortunately it appears to use a combination of the default WHO ICD-10 codes
and the ICD-10-CM codes from the CMS.
Additionally, many diseases classified by ICD-10 codes do not correspond to
@ -256,26 +263,27 @@ categories in the GDB database.
% how it was obtained
As I needed a combined list of ICD-10 codes, I first obtained the 2019 version
of the ICD-10-CM codes from the CMS (\cite{noauthor_2019_nodate}).
of the ICD-10-CM codes from the CMS
(the most recent version corresponding to the GBD matching file).
With the arrival of the ICD-11 system, it was difficult to find an official
source from which to download the WHO versions of ICD-10 codes.
Eventually I resorted to copying them from the navigation bar of the
\href{https://icd.who.int/browse10/2019/en}{official WHO ICD-10 (2019) website}
(\cite{noauthor_icd-10_nodate}).
\cite{worldhealthorganization_icd10version2019_}.
After getting both sources into the same format,
I combined them and removed duplicate codes, preferring to keep the descriptions
from the WHO version.
This was done using standard Unix scripting commands.
I then imported the data into the PostgreSQL database alongside the AACT data.
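
As an illustration only, the merge rule described above (combine both lists,
drop duplicate codes, keep the WHO description when a code appears in both)
can be sketched in Python.
The merge itself was done with Unix commands, so this is not the script that was run.
The function name \texttt{combine\_icd10} and the sample rows are hypothetical,
and the sketch assumes both sources have already been reduced to
(code, description) pairs in the same format.

```python
def combine_icd10(who_rows, cm_rows):
    """Merge (code, description) pairs, keeping the WHO description on conflicts.

    Hypothetical helper: assumes both inputs already share one code format.
    """
    combined = {}
    # Load the ICD-10-CM rows first so that WHO rows overwrite any duplicates.
    for code, description in cm_rows:
        combined[code.strip().upper()] = description.strip()
    for code, description in who_rows:
        combined[code.strip().upper()] = description.strip()
    # Return a list sorted by code, one entry per unique code.
    return sorted(combined.items())


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    who = [("A00", "Cholera")]
    cm = [("A00", "Cholera (CM wording)"), ("A00.0", "Cholera, classical")]
    for code, description in combine_icd10(who, cm):
        print(f"{code}|{description}")
```

Writing the result as pipe-delimited text keeps it in the same format as the
AACT flat files, which may simplify loading it into the same database.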
\paragraph{Unified Medical Language System (UMLS) Thesaurus}
The NLM also publishes a medical terminology thesaurus
known as the Unified Medical Language System (UMLS) which links terminologies
such as RxNorm, MeSH, and ICD-10.
It is made available through an API hosted by the NLM.
One key feature is the ability to use a basic text search to find matching
terms in various terminologies.
% Did I use the UMLS in any specific way? Not that I remember, I just linked on RXCUI's
% \paragraph{Unified Medical Language System (UMLS) Thesaurus}
%
% The NLM also publishes a medical terminology thesaurus
% known as the Unified Medical Language System (UMLS) which links terminologies
% such as RxNorm, MeSH, and ICD-10.
% It is made available through an API hosted by the NLM.
% One key feature is the ability to use a basic text search to find matching
% terms in various terminologies.