MetaMap2016 Usage Notes
François-Michel Lang
metamap@nlm.nih.gov
July 2016
This document explains MetaMap’s command-line options, which support a wide variety of
processing. All options have a long name (e.g., --term_processing), and most have a short name
(e.g., -z) as well, for simplicity and ease of use.
All use of MetaMap requires a UMLS Metathesaurus license; see this page for all access to MetaMap,
including interactive and batch use from our website and downloading and running it locally at
user sites.
The MetaMap 2016 Release Notes are available here. Users are encouraged to review the MetaMap
Usage FAQ, which presents many use cases and scenarios, here.
Click on any of the following links for documentation about the various types of MetaMap options.
• Usage
• Data Options
• Output/Display Options
• Behavior Options
• Browse Mode Options
• Using User-Defined Acronyms/Abbreviations
• Restricting to/Excluding UMLS Sources and Semantic Types
• NegEx Options
• Server Options
• Miscellaneous Options
Usage
There are two ways to use MetaMap interactively, reading input text from the keyboard and seeing
output on the screen:
1. metamap [ options ]
then type your input text, e.g., lung cancer, at the “|:” prompt.
2. echo lung cancer | metamap [ options ]
For processing an input file:
metamap [ options ] InputFile OutputFile
The InputFile and OutputFile options, if specified, must be the last two arguments. If OutputFile
is not specified, it will default to InputFile.out. Note that if the output file (whether specified
on the command line or not) already exists, it will be overwritten and its original contents lost.
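The file-argument rules above can be sketched in a small wrapper. This is only an illustration, not part of MetaMap: the helper names default_output_path and build_metamap_argv and the allow_overwrite parameter are hypothetical, and the sketch assumes the executable is on the PATH as metamap.

```python
from pathlib import Path
from typing import List, Optional


def default_output_path(input_file: str) -> str:
    """Return the name MetaMap falls back to when no OutputFile is given."""
    return input_file + ".out"


def build_metamap_argv(options: List[str], input_file: str,
                       output_file: Optional[str] = None,
                       allow_overwrite: bool = False) -> List[str]:
    """Assemble an argument list with the file names as the last arguments.

    Raises FileExistsError if the effective output file already exists,
    since MetaMap would overwrite it and its original contents would be lost.
    """
    effective_output = output_file or default_output_path(input_file)
    if Path(effective_output).exists() and not allow_overwrite:
        raise FileExistsError(f"{effective_output} would be overwritten")
    argv = ["metamap", *options, input_file]
    if output_file is not None:
        argv.append(output_file)
    return argv
```

The resulting list could then be passed to subprocess.run; checking for an existing output file first is a design choice of this sketch, made because MetaMap itself overwrites silently.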
For processing another program’s output:
OtherProgram | metamap [ options ]
OtherProgram | metamap [ options ] > OutputFile
To generate a short list of all MetaMap options, simply call
metamap --help
Data Options
MetaMap’s data options determine the Knowledge Source, the Data Version, and the Data
Model used for processing.
Knowledge Source
-Z (--mm_data_year)
Sets the version of the UMLS Metathesaurus to use, e.g., 2017AA, 2017AB, etc.
Data Version
-V (--mm_data_version)
Sets MetaMap’s data version (Base, USAbase (the default), and NLM). See this page for more
information about MetaMap’s Base, USAbase, and NLM data versions.
Data Model
-A (--strict_model) (default)
-C (--relaxed_model)
Sets MetaMap’s data model (strict or relaxed). See this page for more information about MetaMap’s
strict and relaxed data models.
Output/Display Options
MetaMap provides a wide variety of options that control its output. The options that affect only
MetaMap’s human-readable output are labeled “HR only”; using those options with any output
format other than human-readable will generate a warning or, in certain cases, an error.
Display Tagger Output
-T (--tagger_output)
Displays the output of the MedPost/SKR tagger, lining up input words on one line with their tags
on a line below.
Hide Header Output
[no short option] --silent
Suppresses the display of header information such as that shown below.
Berkeley DB databases (USAbase 2015AB strict model) are open.
Static variants will come from table varsan in
/nfsvol/nls/II_Group_WorkArea/MetaMap_DB//DB.USAbase.2015AB.strict.
Derivational Variants: Adj/noun ONLY.
Variant generation mode: static.
Established connection $stream(140152552284000) to TAGGER Server on ii-server3.
a.out.Linux (2015)
Control options:
composite_phrases=4
lexicon=db
mm_data_year=2015AB
Display Variants
-v (--variants)
Displays the variants generated for each input word.
Hide Plain Syntax
-p (--hide_plain_syntax)
Disables the display of the words forming each phrase, as determined by the SPECIALIST parser;
HR only.
Syntax
-x (--syntax)
Displays the output of the SPECIALIST parser; HR only.
Show Candidates
-c (--show_candidates)
By default, MetaMap output contains only final mappings, but not the candidate concepts identified
in the text. This option forces the display of all Metathesaurus candidate concepts identified in the
text, regardless of whether they appear in MetaMap’s final mappings. Candidates are displayed
best to worst, according to the MetaMap evaluation metric.
Number Candidates
-n (--number_the_candidates)
Numbers the candidates in a displayed candidate list; HR only. Requires -c (--show_candidates).
Number Mappings
-f (--number_the_mappings)
Numbers the final mappings; HR only.
Short Semantic Types
-s (--short_semantic_types)
Displays the short form of UMLS Semantic Types rather than the long form, e.g., dsyn rather than
Disease or Syndrome; HR only.
Show CUIs
-I (--show_cuis)
Displays the UMLS CUI for each concept; HR only.
Machine Output
-q (--machine_output)
Generates Prolog terms rather than human-readable form. See this page for more information
about MetaMap’s Prolog Machine Output.
Formatted XML Output
[no short option] --XMLf
Generates formatted XML, one XML document per input record/citation. Formatted XML is
suitable for reading by humans, but more space intensive than unformatted XML. See this page
for detailed information about MetaMap’s XML output formats.
Unformatted XML Output
[no short option] --XMLn
Generates unformatted XML, one XML document per input record/citation. Unformatted XML is
not suitable for reading by humans, but more compact than formatted XML. See this page for
detailed information about MetaMap’s XML output formats.
Formatted JSON Output
[no short option] --JSONf New in MetaMap2016V2
Generates formatted JSON, one JSON document per input file. See this page for detailed
information about MetaMap’s JSON output formats.
Unformatted JSON Output
[no short option] --JSONn New in MetaMap2016V2
Generates unformatted JSON, one JSON document per input file. See this page for detailed
information about MetaMap’s JSON output formats.
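The four XML and JSON options above differ only in format and in whether the output is formatted for reading. A minimal sketch of choosing among them follows; the function name output_flag and its parameters are hypothetical, and only the four flag strings come from these notes.

```python
# Flags taken from the option descriptions above; the lookup itself is illustrative.
OUTPUT_FLAGS = {
    ("xml", True): "--XMLf",    # formatted XML, easier for humans to read
    ("xml", False): "--XMLn",   # unformatted XML, more compact
    ("json", True): "--JSONf",  # formatted JSON
    ("json", False): "--JSONn", # unformatted JSON
}


def output_flag(fmt: str, formatted: bool) -> str:
    """Return the MetaMap flag for the requested structured output format."""
    key = (fmt.lower(), formatted)
    if key not in OUTPUT_FLAGS:
        raise ValueError(f"unsupported output format: {fmt!r}")
    return OUTPUT_FLAGS[key]
```

Note that, per the descriptions above, the XML options produce one document per input record/citation, while the JSON options produce one document per input file.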
Formal Tagger Output
-F (--formal_tagger_output)
Displays the tagging information returned by the tagger server.
Fielded MMI Output
-N (--fielded_mmi_output)
Generates Fielded MMI (MetaMap Indexing) output. See this page for detailed information about
MetaMap’s MMI output.
Show Concept’s Sources
-G (--sources)
Displays the Metathesaurus sources for each candidate and mapping in the output; HR only. More
information about UMLS Source Vocabularies is available here.
Show Acronyms/Abbreviations (AAs)
-j (--dump_aas)
Displays the acronyms/abbreviations (AAs) discovered by MetaMap in the form below
(pretty-printed for readability); HR only.
AA | PMID | Acronym | Expansion | #Acronym Tokens | #Acronym Chars |
#Expansion Tokens | #Expansion Chars | Text Offsets
E.g., for the input confidence interval (CI), MetaMap would display
AA|00000000|CI|confidence interval|1|2|3|19|21:2
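A line in this pipe-delimited form can be split into named fields. The sketch below is an illustration only: the AARecord class and parse_aa_line function are hypothetical names, the Text Offsets field is kept verbatim because its internal structure is not spelled out here, and the parser assumes that no field contains a "|" character.

```python
from dataclasses import dataclass


@dataclass
class AARecord:
    pmid: str
    acronym: str
    expansion: str
    acronym_tokens: int
    acronym_chars: int
    expansion_tokens: int
    expansion_chars: int
    text_offsets: str  # kept as-is, e.g. "21:2"


def parse_aa_line(line: str) -> AARecord:
    """Parse one AA line; assumes exactly nine '|'-separated fields."""
    fields = line.rstrip("\n").split("|")
    if len(fields) != 9 or fields[0] != "AA":
        raise ValueError(f"not an AA line: {line!r}")
    _, pmid, acronym, expansion, a_tok, a_chr, e_tok, e_chr, offsets = fields
    return AARecord(pmid, acronym, expansion,
                    int(a_tok), int(a_chr), int(e_tok), int(e_chr), offsets)
```

For the example line above, the parsed record has acronym "CI" and expansion "confidence interval", whose 19 characters match the #Expansion Chars field.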
Show Bracketed Output
-+ (--bracketed_output)
Surrounds the Phrase, Candidates, and Mappings sections of output with >>>>> and <<<<< brackets;
HR only. E.g., when called with -c (--show_candidates):
>>>>> Phrase
heart attack
<<<<< Phrase
>>>>> Candidates
Meta Candidates (Total=6; Excluded=0; Pruned=0; Remaining=6)
  1000   Heart Attack (Myocardial Infarction) [Disease or Syndrome]
   861   HEART (Heart) [Body Part, Organ, or Organ Component]
   861   Attack (Onset of illness) [Temporal Concept]
   861   attack (Attack behavior) [Social Behavior]
   861   Heart (Entire heart) [Body Part, Organ, or Organ Component]
   861   Attack (Observation of attack) [Finding]
<<<<< Candidates
>>>>> Mappings
Meta Mapping (1000):
  1000   Heart Attack (Myocardial Infarction) [Disease or Syndrome]
<<<<< Mappings
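The brackets make the human-readable output easier to split programmatically. The following sketch, with the hypothetical function name split_bracketed_sections, collects the lines between each ">>>>> Name" and "<<<<< Name" pair; it assumes that sections are not nested and that a section name may recur (for example, once per phrase).

```python
from typing import Dict, List, Optional


def split_bracketed_sections(output: str) -> Dict[str, List[List[str]]]:
    """Group bracketed output by section name.

    Each name maps to a list of occurrences; each occurrence is the list of
    lines found between its opening and closing bracket lines.
    """
    sections: Dict[str, List[List[str]]] = {}
    current_name: Optional[str] = None
    current_lines: List[str] = []
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith(">>>>> "):
            current_name = stripped[len(">>>>> "):]
            current_lines = []
        elif stripped.startswith("<<<<< ") and current_name is not None:
            sections.setdefault(current_name, []).append(current_lines)
            current_name = None
        elif current_name is not None:
            current_lines.append(line)
    return sections
```

Lines outside any bracket pair are ignored by this sketch.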
Behavior Options
Behavior options control MetaMap’s search algorithms and therefore affect the choice of UMLS
concepts identified.
No Mappings
-m (--hide_mappings)
By default, MetaMap output contains only final mappings, and not all the candidate concepts
found in the text. This option disables the display of mappings. It is an error to use this option
without -c (--show_candidates).
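These notes state two option dependencies: -n requires -c, and -m is an error without -c. A wrapper script could check them before launching MetaMap. The sketch below is hypothetical (the names REQUIRES, ALIASES, and missing_requirements are not part of MetaMap) and handles only options passed as separate arguments, not bundled short flags.

```python
from typing import Iterable, List

# Option -> option it requires, as stated in these notes.
REQUIRES = {
    "-n": "-c",  # numbering candidates needs the candidate list displayed
    "-m": "-c",  # hiding mappings is an error unless candidates are shown
}

# Long names normalized to their short forms for the check.
ALIASES = {
    "--number_the_candidates": "-n",
    "--hide_mappings": "-m",
    "--show_candidates": "-c",
}


def missing_requirements(options: Iterable[str]) -> List[str]:
    """Return one message per option whose required companion is absent."""
    present = {ALIASES.get(opt, opt) for opt in options}
    problems = []
    for opt, needed in REQUIRES.items():
        if opt in present and needed not in present:
            problems.append(f"{opt} requires {needed}")
    return problems
```

An empty result means neither dependency is violated; it does not validate the options in any other way.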
Enable NegEx
[no short option] --negex
Displays information about negated UMLS concepts occurring in the input and the associated
strings that caused the negation; HR only. Negation information is always included in Prolog
Machine Output, XML Output, and Fielded MMI Output.
Turn on Conjunction Processing
[no short option] --conj New in MetaMap2016V2
Causes MetaMap’s phrase chunker to recombine smaller phrases separated by a conjunction. See
this page for more detailed information.
Composite Phrases
-Q (--composite_phrases)
Causes MetaMap to construct longer, composite phrases from the smaller phrases produced by the
parser; the integer operand specifies the number of prepositional phrases that can be attached to
the initial noun phrase. This option is on by default with a setting of 4, but can be overridden
(e.g., -Q 2 or -Q 0) to achieve greater processing efficiency, albeit possibly with poorer results.
For more information, see this page.
Prune Threshold
[no short option] --prune
Specifies the maximum number of candidate concepts used in creating final mappings. This option
should be used only if MetaMap runs for a very long time.
Disable Pruning
[no short option] --no_prune
Disables pruning of the candidate concept list.
No Text Tagging
-t (--no_tagging)
Bypasses the part-of-speech tagging. By default, the SPECIALIST parser will use the results of a
tagger to assist in parsing. MetaMap currently uses the MedPost/SKR tagger. See this page and
this page for more information about the MedPost tagger, which was developed at NCBI specifically
for tagging biomedical text; we modified it to use MetaMap’s part-of-speech tags.
No Derivational Variants
-d (--no_derivational_variants)
Prevents the use of any derivational variation in the computation of word variants. This option
exists because derivational variants can involve a significant change in meaning.
All Derivational Variants
-D (--all_derivational_variants)
Allows the use of all derivational variation, instead of only that between adjectives and nouns (the
default). Adjective/noun derivational variants are generally the best derivational variants.
Allow Acronym/Abbreviation Variants
-a (--all_acros_abbrs)
Allows the use of any acronym/abbreviation (AA) variants, which are the least reliable form of
variation because of the extreme ambiguity of AA variants.
Unique Acronym/Abbreviation Variants Only
-u (--unique_acros_abbrs_only)
Restricts the generation of acronym/abbreviation (AA) variants to those forms with unique
expansions. This option generally produces better results than allowing all forms of acronym/abbreviation
variants (using -a or --all_acros_abbrs), but our experience has shown that still better results are
produced by allowing no AA variants.
Allow Large N
-l (--allow_large_n)
Enables retrieval of Metathesaurus candidates for (a) two-character words occurring in more than
4,000 Metathesaurus strings and (b) one-character words occurring in more than 2,000
Metathesaurus strings. This option also allows retrieval for words that can be a preposition, conjunction,
or determiner.
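The two length-based cutoffs can be expressed as a simple predicate. This is a sketch of the rule as described above, not MetaMap's implementation: the function name is hypothetical, the string count would have to come from the Metathesaurus data, and the part-of-speech condition (prepositions, conjunctions, determiners) is not modeled.

```python
def skipped_without_allow_large_n(word: str, n_strings: int) -> bool:
    """Return True if candidates for `word` are retrieved only with -l.

    n_strings is the number of Metathesaurus strings containing the word.
    """
    if len(word) == 2:
        return n_strings > 4000  # two-character words: more than 4,000 strings
    if len(word) == 1:
        return n_strings > 2000  # one-character words: more than 2,000 strings
    return False                 # longer words are not subject to these cutoffs
```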
Threshold
-r (--threshold)
Restricts output to UMLS candidate concepts whose evaluation score equals or exceeds the specified
threshold. Judicious use of this option can exclude false positives when some input text has no
close matches in the Metathesaurus. An appropriate threshold can usually be determined simply
by examining MetaMap output for typical text in a given application.
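The effect of a threshold can be illustrated on a list of scored candidates. The sketch below only mimics the "equals or exceeds" comparison on scores as they appear in human-readable output; the function name apply_threshold and the (score, name) tuple layout are assumptions made for the example.

```python
from typing import List, Tuple

Candidate = Tuple[int, str]  # (evaluation score, concept name)


def apply_threshold(candidates: List[Candidate], threshold: int) -> List[Candidate]:
    """Keep candidates whose score equals or exceeds the threshold."""
    return [(score, name) for score, name in candidates if score >= threshold]
```

With the scores from the heart attack example in these notes (one candidate at 1000 and five at 861), a threshold of 900 would keep only the top candidate, while a threshold of 861 would keep all six.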
Ignore Word Order
-i (--ignore_word_order)
Allows MetaMap to ignore the order of words in the input text. MetaMap was originally developed
to process full text, and consequently depended very strongly on normal English word order. This
option avoids the use of specialized word indexes used for efficient candidate retrieval; it also ignores
word order when matching phrase text to candidate words; and it replaces the normal coverage
metric with an involvement metric for evaluating how well a candidate covers the words of a phrase.
Using this option tends to increase recall but decrease precision.
Prefer Multiple Concepts
-Y (--prefer_multiple_concepts)
Causes MetaMap to score mappings with more concepts higher than those with fewer concepts
(simply by inverting the normal cohesiveness value). For example, with this option, the input text
lung cancer will be mapped to the two concepts Lung and Cancer, rather than the single concept
Lung Cancer. This option is useful for discovering semantic relationships among concepts found
in text (e.g., lung-LOCATION OF-cancer).
Compute/Display All Mappings
-b (--compute_all_mappings)
Causes MetaMap to compute and display all mappings, rather than only the top-scoring one(s).
Note: It is rarely useful to display all mappings because of their large number.
Use Word-Sense Disambiguation
-y (--word_sense_disambiguation)
Causes MetaMap to attempt to disambiguate among concepts scoring equally well. More
information about MetaMap’s WSD is available here.
Browse Mode Options
Browse Mode is intended to cast as wide a net as possible in mapping text to UMLS concepts, and
will identify concepts even only vaguely related to the input text. For a fuller description, see this
page.
Term Processing
-z (--term_processing)
Processes terms, i.e., short text fragments, rather than a document containing complete sentences.
See this page for more information about term processing. A typical use of term processing involves
processing a list of terms or a list of terms with IDs.
Allow Overmatches
-o (--allow_overmatches)
Causes MetaMap to retrieve Metathesaurus candidates containing words on one or both ends that
do not match the text. For example, overmatches of medicine include Antibiotic Medicine, Medicine
Preparations, and Investigational Medicinal Product. This option greatly increases the number of
candidates retrieved and is consequently much slower than MetaMap without overmatches.
Allow Concept Gaps
-g (--allow_concept_gaps)
Causes MetaMap to retrieve Metathesaurus candidates with gaps. For example, with this option,
MetaMap maps the text chronic toxicity to the UMLS concept chronic radiation toxicity. The word
radiation is inserted into the gap between chronic and toxicity. This option does not appreciably
affect MetaMap’s performance, and is best suited for browsing purposes.
Single-Line-Delimited Input
[no short option] --sldi
Single-Line-Delimited Input with ID
[no short option] --sldiID
Causes MetaMap to recognize a list of terms (--sldi) or a list of terms with IDs (--sldiID)
rather than free text.
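Input for these two options can be prepared from an in-memory list. The sketch below assumes one term per line for --sldi and, for --sldiID, an ID and a term separated by "|" on each line; that separator is not stated in these notes and should be checked against the term processing documentation. The function names are hypothetical.

```python
from typing import Iterable, Tuple


def _check_single_line(text: str) -> str:
    """Reject values that would break the one-item-per-line layout."""
    if "\n" in text or "\r" in text:
        raise ValueError(f"value must fit on one line: {text!r}")
    return text


def format_sldi(terms: Iterable[str]) -> str:
    """One term per line, for use with --sldi."""
    return "".join(_check_single_line(term) + "\n" for term in terms)


def format_sldi_id(pairs: Iterable[Tuple[str, str]]) -> str:
    """One 'ID|term' pair per line, for use with --sldiID (assumed layout)."""
    return "".join(
        f"{_check_single_line(item_id)}|{_check_single_line(term)}\n"
        for item_id, term in pairs
    )
```

The resulting text could be written to a file and passed as InputFile, or piped to MetaMap on standard input.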
Using User-Defined Acronyms/Abbreviations
Use User-Defined Acronyms/Abbreviations
[no short option] --UDA
Allows users to specify acronyms and abbreviations (AAs) that are not defined in the input text
(“UDA” is a recursive acronym meaning “user-defined AA”). This option is designed specifically
for processing clinical text, which often contains undefined AAs. See this page for more information
about processing clinical text with MetaMap. More information about MetaMap’s UDA processing
is available here.
Restricting to/Excluding UMLS Sources and Semantic Types
Retain only Concepts in Specified UMLS Source Vocabularies
-R (--restrict_to_sources)
Uses only the specified UMLS Source Vocabularies while mapping concepts. E.g., -R ICD10CM,MSH.
More information about UMLS Source Vocabularies is available here.
Exclude Concepts in Specific UMLS Source Vocabularies
-e (--exclude_sources)
Excludes the UMLS Source Vocabularies specified in the comma-separated list while mapping
concepts. E.g., -e ICD10CM,MSH. More information about UMLS Source Vocabularies is available
here.
Retain only Concepts with Specified Semantic Types
-J (--restrict_to_sts)
Restricts output to those concepts with one of the semantic types specified in the comma-separated
list. E.g., -J dsyn,neop. More information about UMLS Semantic Types is available here.
Exclude Concepts with Specified Semantic Type(s)
-k (--exclude_sts)
Excludes concepts having one of the semantic types in the comma-separated list. E.g., -k dsyn,neop.
More information about UMLS Semantic Types is available here.
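All four options in this section take a comma-separated list as their operand. A minimal sketch of building those arguments from Python lists follows; the function name restriction_args and its keyword parameters are hypothetical, while the short flags and the list syntax come from the examples above.

```python
from typing import List, Optional, Sequence


def restriction_args(restrict_sources: Optional[Sequence[str]] = None,
                     exclude_sources: Optional[Sequence[str]] = None,
                     restrict_sts: Optional[Sequence[str]] = None,
                     exclude_sts: Optional[Sequence[str]] = None) -> List[str]:
    """Build -R/-e/-J/-k arguments, each followed by a comma-separated list."""
    pairs = [
        ("-R", restrict_sources),  # keep only these source vocabularies
        ("-e", exclude_sources),   # drop these source vocabularies
        ("-J", restrict_sts),      # keep only these semantic types
        ("-k", exclude_sts),       # drop these semantic types
    ]
    argv: List[str] = []
    for flag, values in pairs:
        if values:
            argv += [flag, ",".join(values)]
    return argv
```

The sketch does not check whether a given combination of these options is meaningful to MetaMap; it only formats the operands.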
NegEx Options
Add NegEx Triggers
[no short option] --negex_st_add
Delete NegEx Triggers
[no short option] --negex_st_del