MetaMap2016 Usage Notes
François-Michel Lang
metamap@nlm.nih.gov
July 2016

This document explains MetaMap’s command-line options, which support a wide variety of processing. All options have a long name (e.g., --term_processing), and most also have a short name (e.g., -z) for convenience. All use of MetaMap requires a UMLS Metathesaurus license; see this page for all access to MetaMap, including interactive and batch use from our website and downloading and running it locally at user sites. The MetaMap 2016 Release Notes are available here. Users are encouraged to review the MetaMap Usage FAQ, which presents many use cases and scenarios, here.

The following sections document the various types of MetaMap options.
• Usage
• Data Options
• Output/Display Options
• Behavior Options
• Browse Mode Options
• Using User-Defined Acronyms/Abbreviations
• Restricting to/Excluding UMLS Sources and Semantic Types
• NegEx Options
• Server Options
• Miscellaneous Options

Usage

There are two ways to use MetaMap interactively, reading input text from the keyboard and seeing output on the screen:
1. metamap [ options ]
   then type your input text, e.g., lung cancer, at the “|:” prompt.
2. echo lung cancer | metamap [ options ]
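The second interactive form above (piping text into metamap) can also be driven from a script. The following is a minimal sketch, not part of MetaMap itself: the function name pipe_text_to_metamap is hypothetical, and the sketch assumes that an executable called metamap is on the PATH and that whatever supporting services a local installation needs are already running.

```python
import shutil
import subprocess
import sys
from typing import List


def pipe_text_to_metamap(text: str, options: List[str],
                         executable: str = "metamap") -> str:
    """Send text to the program's standard input and return its standard output.

    Equivalent in spirit to: echo <text> | metamap [ options ]
    The executable name is an assumption; pass a full path if needed.
    """
    completed = subprocess.run(
        [executable, *options],   # options come before any file arguments
        input=text,               # text is written to the child's stdin
        capture_output=True,
        text=True,
        check=True,               # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout


if __name__ == "__main__":
    # Only attempt a real call when a metamap executable can be found.
    if shutil.which("metamap") is None:
        print("metamap not found on PATH; nothing to run", file=sys.stderr)
    else:
        print(pipe_text_to_metamap("lung cancer\n", []))
```

Passing the options as a list rather than a single shell string avoids quoting problems when the input text or option values contain spaces.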
For processing an input file:
metamap [ options ] InputFile OutputFile
The InputFile and OutputFile arguments, if specified, must be the last two arguments. If OutputFile is not specified, it defaults to InputFile.out. Note that if the output file (whether specified on the command line or not) already exists, it will be overwritten and its original contents lost.

For processing another program’s output:
OtherProgram | metamap [ options ]
OtherProgram | metamap [ options ] > OutputFile

To generate a short list of all MetaMap options, simply call
metamap --help

Data Options

MetaMap’s data options determine the Knowledge Source, the Data Version, and the Data Model used for processing.

Knowledge Source -Z (--mm_data_year)
Sets the version of the UMLS Metathesaurus to use, e.g., 2017AA, 2017AB, etc.

Data Version -V (--mm_data_version)
Sets MetaMap’s data version (Base, USAbase (the default), or NLM). See this page for more information about MetaMap’s Base, USAbase, and NLM data versions.

Data Model -A (--strict_model) (default) -C (--relaxed_model)
Sets MetaMap’s data model (strict or relaxed). See this page for more information about MetaMap’s strict and relaxed data models.

Output/Display Options

MetaMap provides a wide variety of options that control its output. The options that affect only MetaMap’s human-readable output are labeled “HR only”; using those options with any output format other than human-readable will generate a warning or, in certain cases, an error.

Display Tagger Output -T (--tagger_output)
Displays the output of the MedPost/SKR tagger, lining up input words on one line with their tags on the line below.

Hide Header Output [no short option] --silent
Suppresses the display of header information such as that shown below.
Berkeley DB databases (USAbase 2015AB strict model) are open.
Static variants will come from table varsan in /nfsvol/nls/II_Group_WorkArea/MetaMap_DB//DB.USAbase.2015AB.strict.
Derivational Variants: Adj/noun ONLY.
Variant generation mode: static.
Established connection $stream(140152552284000) to TAGGER Server on ii-server3.
a.out.Linux (2015)
Control options:
composite_phrases=4
lexicon=db
mm_data_year=2015AB

Display Variants -v (--variants)
Displays the variants generated for each input word.

Hide Plain Syntax -p (--hide_plain_syntax)
Disables the display of the words forming each phrase, as determined by the SPECIALIST parser; HR only.

Syntax -x (--syntax)
Displays the output of the SPECIALIST parser; HR only.

Show Candidates -c (--show_candidates)
By default, MetaMap output contains only final mappings, but not the candidate concepts identified in the text. This option forces the display of all Metathesaurus candidate concepts identified in the text, regardless of whether they appear in MetaMap’s final mappings. Candidates are displayed best to worst, according to the MetaMap evaluation metric.

Number Candidates -n (--number_the_candidates)
Numbers the candidates in a displayed candidate list; HR only. Requires -c (--show_candidates).

Number Mappings -f (--number_the_mappings)
Numbers the final mappings; HR only.

Short Semantic Types -s (--short_semantic_types)
Displays the short form of UMLS Semantic Types rather than the long form, e.g., dsyn rather than Disease or Syndrome; HR only.

Show CUIs -I (--show_cuis)
Displays the UMLS CUI for each concept; HR only.
Machine Output -q (--machine_output)
Generates Prolog terms rather than human-readable output. See this page for more information about MetaMap’s Prolog Machine Output.

Formatted XML Output [no short option] --XMLf
Generates formatted XML, one XML document per input record/citation. Formatted XML is suitable for reading by humans, but more space-intensive than unformatted XML. See this page for detailed information about MetaMap’s XML output formats.

Unformatted XML Output [no short option] --XMLn
Generates unformatted XML, one XML document per input record/citation. Unformatted XML is not suitable for reading by humans, but is more compact than formatted XML. See this page for detailed information about MetaMap’s XML output formats.

Formatted JSON Output [no short option] --JSONf
New in MetaMap2016V2
Generates formatted JSON, one JSON document per input file. See this page for detailed information about MetaMap’s JSON output formats.

Unformatted JSON Output [no short option] --JSONn
New in MetaMap2016V2
Generates unformatted JSON, one JSON document per input file. See this page for detailed information about MetaMap’s JSON output formats.

Formal Tagger Output -F (--formal_tagger_output)
Displays the tagging information returned by the tagger server.

Fielded MMI Output -N (--fielded_mmi_output)
Generates Fielded MMI (MetaMap Indexing) output. See this page for detailed information about MetaMap’s MMI output.

Show Concept’s Sources -G (--sources)
Displays the Metathesaurus sources for each candidate and mapping in the output; HR only. More information about UMLS Source Vocabularies is available here.

Show Acronyms/Abbreviations (AAs) -j (--dump_aas)
Displays the acronyms/abbreviations (AAs) discovered by MetaMap in the form below (pretty-printed for readability); HR only.
AA | PMID | Acronym | Expansion | #Acronym Tokens | #Acronym Chars | #Expansion Tokens | #Expansion Chars | Text Offsets

E.g., for the input confidence interval (CI), MetaMap would display
AA|00000000|CI|confidence interval|1|2|3|19|21:2

Show Bracketed Output -+ (--bracketed_output)
Surrounds the Phrase, Candidates, and Mappings sections of output with >>>>> and <<<<< brackets; HR only. E.g., when called with -c (--show_candidates):

>>>>> Phrase
heart attack
<<<<< Phrase
>>>>> Candidates
Meta Candidates (Total=6; Excluded=0; Pruned=0; Remaining=6)
1000 -- Heart Attack (Myocardial Infarction) [Disease or Syndrome]
 861    HEART (Heart) [Body Part, Organ, or Organ Component]
 861    Attack (Onset of illness) [Temporal Concept]
 861    attack (Attack behavior) [Social Behavior]
 861    Heart (Entire heart) [Body Part, Organ, or Organ Component]
 861    Attack (Observation of attack) [Finding]
<<<<< Candidates
>>>>> Mappings
Meta Mapping (1000):
1000 -- Heart Attack (Myocardial Infarction) [Disease or Syndrome]
<<<<< Mappings

Behavior Options

Behavior options control MetaMap’s search algorithms and therefore affect the choice of UMLS concepts identified.

No Mappings -m (--hide_mappings)
By default, MetaMap output contains only final mappings, and not all the candidate concepts found in the text. This option disables the display of mappings. It is an error to use this option without -c (--show_candidates).

Enable NegEx [no short option] --negex
Displays information about negated UMLS concepts occurring in the input and the associated strings that caused the negation; HR only. Negation information is always included in Prolog Machine Output, XML Output, and Fielded MMI Output.

Turn on Conjunction Processing [no short option] --conj
New in MetaMap2016V2
Causes MetaMap’s phrase chunker to recombine smaller phrases separated by a conjunction. See this page for more detailed information.

Composite Phrases -Q (--composite_phrases)
Causes MetaMap to construct longer, composite phrases from the smaller phrases produced by the parser; the integer operand specifies the number of prepositional phrases that can be attached to the initial noun phrase.
This option is on by default with a setting of 4, but can be overridden (e.g., -Q 2 or -Q 0) to achieve greater processing efficiency, possibly at the cost of poorer results. For more information, see this page.
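Two of the options described so far, -n (--number_the_candidates) and -m (--hide_mappings), are documented as requiring -c (--show_candidates). A wrapper script could check this before launching MetaMap. The following is a minimal sketch under that reading of the text; the function name missing_candidate_option is hypothetical, and the check does not handle bundled short options such as -cn.

```python
from typing import Iterable, List

# Either spelling of the Show Candidates option satisfies the requirement.
SHOW_CANDIDATES = {"-c", "--show_candidates"}

# Options documented above as requiring -c (--show_candidates),
# mapped to their long names for reporting.
NEEDS_CANDIDATES = {
    "-n": "--number_the_candidates",
    "--number_the_candidates": "--number_the_candidates",
    "-m": "--hide_mappings",
    "--hide_mappings": "--hide_mappings",
}


def missing_candidate_option(options: Iterable[str]) -> List[str]:
    """Return the long names of options that were given without -c."""
    given = list(options)
    if any(opt in SHOW_CANDIDATES for opt in given):
        return []
    found = {NEEDS_CANDIDATES[opt] for opt in given if opt in NEEDS_CANDIDATES}
    return sorted(found)


if __name__ == "__main__":
    print(missing_candidate_option(["-n", "-I"]))   # -n given without -c
    print(missing_candidate_option(["-c", "-m"]))   # requirement satisfied
```

An empty list means the option set is consistent with respect to this one rule; other constraints mentioned in this document (for example, the “HR only” restrictions) are not checked.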
Prune Threshold [no short option] --prune
Specifies the maximum number of candidate concepts used in creating final mappings. This option should be used only if MetaMap runs for a very long time.

Disable Pruning [no short option] --no_prune
Disables pruning of the candidate concept list.

No Text Tagging -t (--no_tagging)
Bypasses part-of-speech tagging. By default, the SPECIALIST parser uses the results of a tagger to assist in parsing. MetaMap currently uses the MedPost/SKR tagger. See this page and this page for more information about the MedPost tagger, which was developed at NCBI specifically for tagging biomedical text; we modified it to use MetaMap’s part-of-speech tags.

No Derivational Variants -d (--no_derivational_variants)
Prevents the use of any derivational variation in the computation of word variants. This option exists because derivational variants can involve a significant change in meaning.

All Derivational Variants -D (--all_derivational_variants)
Allows the use of all derivational variation, instead of only that between adjectives and nouns (the default). Adjective/noun derivational variants are generally the best derivational variants.

Allow Acronym/Abbreviation Variants -a (--all_acros_abbrs)
Allows the use of any acronym/abbreviation (AA) variants, which are the least reliable form of variation because of the extreme ambiguity of AA variants.

Unique Acronym/Abbreviation Variants Only -u (--unique_acros_abbrs_only)
Restricts the generation of acronym/abbreviation (AA) variants to those forms with unique expansions. This option generally produces better results than allowing all forms of acronym/abbreviation variants (using -a or --all_acros_abbrs), but our experience has shown that still better results are produced by allowing no AA variants.
Allow Large N -l (--allow_large_n)
Enables retrieval of Metathesaurus candidates for (a) two-character words occurring in more than 4,000 Metathesaurus strings and (b) one-character words occurring in more than 2,000 Metathesaurus strings. This option also allows retrieval for words that can be a preposition, conjunction, or determiner.

Threshold -r (--threshold)
Restricts output to UMLS candidate concepts whose evaluation score equals or exceeds the specified threshold. Judicious use of this option can exclude false positives when some input text has no close matches in the Metathesaurus. An appropriate threshold can usually be determined simply by examining MetaMap output for typical text in a given application.

Ignore Word Order -i (--ignore_word_order)
Allows MetaMap to ignore the order of words in the input text. MetaMap was originally developed to process full text, and consequently depended very strongly on normal English word order. This option avoids the use of specialized word indexes used for efficient candidate retrieval; it also ignores word order when matching phrase text to candidate words; and it replaces the normal coverage metric with an involvement metric for evaluating how well a candidate covers the words of a phrase. Using this option tends to increase recall but decrease precision.

Prefer Multiple Concepts -Y (--prefer_multiple_concepts)
Causes MetaMap to score mappings with more concepts higher than those with fewer concepts (simply by inverting the normal cohesiveness value). For example, with this option, the input text lung cancer will be mapped to the two concepts Lung and Cancer, rather than the single concept Lung Cancer. This option is useful for discovering semantic relationships among concepts found in text (e.g., lung-LOCATION OF-cancer).

Compute/Display All Mappings -b (--compute_all_mappings)
Causes MetaMap to compute and display all mappings, rather than only the top-scoring one(s). Note: It is rarely useful to display all mappings because of their large number.

Use Word-Sense Disambiguation -y (--word_sense_disambiguation)
Causes MetaMap to attempt to disambiguate among concepts scoring equally well. More information about MetaMap’s WSD is available here.

Browse Mode Options

Browse Mode is intended to cast as wide a net as possible in mapping text to UMLS concepts, and will identify concepts even only vaguely related to the input text. For a fuller description, see this page.

Term Processing -z (--term_processing)
Processes terms, i.e., short text fragments, rather than a document containing complete sentences. See this page for more information about term processing. A typical use of term processing involves processing a list of terms or a list of terms with IDs.
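The Threshold option (-r) described earlier in this section keeps a candidate only when its evaluation score equals or exceeds the given value. The following sketch illustrates that rule on the scores from the heart attack example; it is not MetaMap’s implementation, and the function name apply_threshold and the tuple layout are assumptions made for illustration.

```python
from typing import List, Tuple

# (evaluation score, concept) pairs taken from the bracketed-output example.
Candidate = Tuple[int, str]

CANDIDATES: List[Candidate] = [
    (1000, "Heart Attack (Myocardial Infarction)"),
    (861, "HEART (Heart)"),
    (861, "Attack (Onset of illness)"),
    (861, "attack (Attack behavior)"),
    (861, "Heart (Entire heart)"),
    (861, "Attack (Observation of attack)"),
]


def apply_threshold(candidates: List[Candidate], threshold: int) -> List[Candidate]:
    """Keep candidates whose score equals or exceeds the threshold."""
    return [cand for cand in candidates if cand[0] >= threshold]


if __name__ == "__main__":
    # A threshold of 900 keeps only the exact match in this example.
    for score, concept in apply_threshold(CANDIDATES, 900):
        print(score, concept)
```

Because the comparison is inclusive, a threshold of 861 would keep all six candidates, while 862 would keep only the top-scoring one.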
Allow Overmatches -o (--allow_overmatches)
Causes MetaMap to retrieve Metathesaurus candidates containing words on one or both ends that do not match the text. For example, overmatches of medicine include Antibiotic Medicine, Medicine Preparations, and Investigational Medicinal Product. This option greatly increases the number of candidates retrieved and is consequently much slower than MetaMap without overmatches.

Allow Concept Gaps -g (--allow_concept_gaps)
Causes MetaMap to retrieve Metathesaurus candidates with gaps. For example, with this option, MetaMap maps the text chronic toxicity to the UMLS concept chronic radiation toxicity. The word radiation is inserted into the gap between chronic and toxicity. This option does not appreciably affect MetaMap’s performance, and is best suited for browsing purposes.

Single-Line-Delimited Input [no short option] --sldi
Single-Line-Delimited Input with ID [no short option] --sldiID
Causes MetaMap to treat its input as a list of terms (--sldi) or a list of terms with IDs (--sldiID) rather than free text.

Using User-Defined Acronyms/Abbreviations

Use User-Defined Acronyms/Abbreviations [no short option] --UDA
Allows users to specify acronyms and abbreviations (AAs) that are not defined in the input text (“UDA” is a recursive acronym meaning “user-defined AA”). This option is designed specifically for processing clinical text, which often contains undefined AAs. See this page for more information about processing clinical text with MetaMap. More information about MetaMap’s UDA processing is available here.

Restricting to/Excluding UMLS Sources and Semantic Types

Retain only Concepts in Specified UMLS Source Vocabularies -R (--restrict_to_sources)
Uses only the UMLS Source Vocabularies specified in the comma-separated list while mapping concepts. E.g., -R ICD10CM,MSH. More information about UMLS Source Vocabularies is available here.

Exclude Concepts in Specific UMLS Source Vocabularies -e (--exclude_sources)
Excludes the UMLS Source Vocabularies specified in the comma-separated list while mapping concepts. E.g., -e ICD10CM,MSH. More information about UMLS Source Vocabularies is available here.

Retain only Concepts with Specified Semantic Types -J (--restrict_to_sts)
Restricts output to those concepts with one of the semantic types specified in the comma-separated list. E.g., -J dsyn,neop. More information about UMLS Semantic Types is available here.

Exclude Concepts with Specified Semantic Type(s) -k (--exclude_sts)
Excludes concepts having a semantic type in the comma-separated list. E.g., -k dsyn,neop. More information about UMLS Semantic Types is available here.

NegEx Options

Add NegEx Triggers [no short option] --negex_st_add

Delete NegEx Triggers [no short option] --negex_st_del