I GATE Basics
Introduction
How to Use this Text
Context
Overview
Developing and Deploying Language Processing Facilities
Built-In Components
Additional Facilities in GATE Developer/Embedded
An Example
Some Evaluations
Recent Changes
Next Release
Version 8.0 (May 2014)
Further Reading
Installing and Running GATE
Downloading GATE
Installing and Running GATE
The Easy Way
The Hard Way (1)
The Hard Way (2): Subversion
Running GATE Developer on Unix/Linux
Using System Properties with GATE
Changing GATE's launch configuration
Configuring GATE
Building GATE
Using GATE with Maven/Ivy
Uninstalling GATE
Troubleshooting
Using GATE Developer
The GATE Developer Main Window
Loading and Viewing Documents
Creating and Viewing Corpora
Working with Annotations
The Annotation Sets View
The Annotations List View
The Annotations Stack View
The Co-reference Editor
Creating and Editing Annotations
Schema-Driven Editing
Printing Text with Annotations
Using CREOLE Plugins
Installing and updating CREOLE Plugins
Loading and Using Processing Resources
Creating and Running an Application
Running an Application on a Datastore
Running PRs Conditionally on Document Features
Doing Information Extraction with ANNIE
Modifying ANNIE
Saving Applications and Language Resources
Saving Documents to File
Saving and Restoring LRs in Datastores
Saving Application States to a File
Saving an Application with its Resources (e.g. GATECloud.net)
Keyboard Shortcuts
Miscellaneous
Stopping GATE from Restoring Developer Sessions/Options
Working with Unicode
CREOLE: the GATE Component Model
The Web and CREOLE
The GATE Framework
The Lifecycle of a CREOLE Resource
Processing Resources and Applications
Language Resources and Datastores
Built-in CREOLE Resources
CREOLE Resource Configuration
Configuration with XML
Configuring Resources using Annotations
Mixing the Configuration Styles
Loading Third-Party Libraries using Apache Ivy
Tools: How to Add Utilities to GATE Developer
Putting Your Tools in a Sub-Menu
Adding Tools To Existing Resource Types
Language Resources: Corpora, Documents and Annotations
Features: Simple Attribute/Value Data
Corpora: Sets of Documents plus Features
Documents: Content plus Annotations plus Features
Annotations: Directed Acyclic Graphs
Annotation Schemas
Examples of Annotated Documents
Creating, Viewing and Editing Diverse Annotation Types
Document Formats
Detecting the Right Reader
XML
HTML
SGML
Plain text
RTF
Email
PDF Files and Office Documents
UIMA CAS Documents
CoNLL/IOB Documents
XML Input/Output
ANNIE: a Nearly-New Information Extraction System
Document Reset
Tokeniser
Tokeniser Rules
Token Types
English Tokeniser
Gazetteer
Sentence Splitter
RegEx Sentence Splitter
Part of Speech Tagger
Semantic Tagger
Orthographic Coreference (OrthoMatcher)
GATE Interface
Resources
Processing
Pronominal Coreference
Quoted Speech Submodule
Pleonastic It Submodule
Pronominal Resolution Submodule
Detailed Description of the Algorithm
A Walk-Through Example
Step 1 - Tokenisation
Step 2 - List Lookup
Step 3 - Grammar Rules
II GATE for Advanced Users
GATE Embedded
Quick Start with GATE Embedded
Resource Management in GATE Embedded
Using CREOLE Plugins
Language Resources
GATE Documents
Feature Maps
Annotation Sets
Annotations
GATE Corpora
Processing Resources
Controllers
Modelling Relations between Annotations
Duplicating a Resource
Sharable properties
Persistent Applications
Ontologies
Creating a New Annotation Schema
Creating a New CREOLE Resource
Adding Support for a New Document Format
Using GATE Embedded in a Multithreaded Environment
Using GATE Embedded within a Spring Application
Duplication in Spring
Spring pooling
Further reading
Using GATE Embedded within a Tomcat Web Application
Recommended Directory Structure
Configuration Files
Initialization Code
Groovy for GATE
Groovy Scripting Console for GATE
Groovy scripting PR
The Scriptable Controller
Utility methods
Saving Config Data to gate.xml
Annotation merging through the API
Using Resource Helpers to Extend the API
JAPE: Regular Expressions over Annotations
The Left-Hand Side
Matching Entire Annotation Types
Using Features and Values
Using Meta-Properties
Building complex patterns from simple patterns
Matching a Simple Text String
Using Templates
Multiple Pattern/Action Pairs
LHS Macros
Multi-Constraint Statements
Using Context
Negation
Escaping Special Characters
LHS Operators in Detail
Equality Operators
Comparison Operators
Regular Expression Operators
Contextual Operators
Custom Operators
The Right-Hand Side
A Simple Example
Copying Feature Values from the LHS to the RHS
Optional or Empty Labels
RHS Macros
Use of Priority
Using Phases Sequentially
Using Java Code on the RHS
A More Complex Example
Adding a Feature to the Document
Finding the Tokens of a Matched Annotation
Using Named Blocks
Java RHS Overview
Optimising for Speed
Ontology Aware Grammar Transduction
Serializing JAPE Transducer
How to Serialize?
How to Use the Serialized Grammar File?
Notes for Montreal Transducer Users
JAPE Plus
ANNIC: ANNotations-In-Context
Instantiating SSD
Search GUI
Overview
Syntax of Queries
Top Section
Central Section
Bottom Section
Using SSD from GATE Embedded
How to instantiate a searchable datastore
How to search in this datastore
Performance Evaluation of Language Analysers
Metrics for Evaluation in Information Extraction
Annotation Relations
Cohen's Kappa
Precision, Recall, F-Measure
Macro and Micro Averaging
The Annotation Diff Tool
Performing Evaluation with the Annotation Diff Tool
Creating a Gold Standard with the Annotation Diff Tool
Corpus Quality Assurance
Description of the interface
Step by step usage
Details of the Corpus statistics table
Details of the Document statistics table
GATE Embedded API for the measures
Quality Assurance PR
Corpus Benchmark Tool
Preparing the Corpora for Use
Defining Properties
Running the Tool
The Results
A Plugin Computing Inter-Annotator Agreement (IAA)
IAA for Classification
IAA For Named Entity Annotation
The BDM-Based IAA Scores
A Plugin Computing the BDM Scores for an Ontology
Quality Assurance Summariser for Teamware
Profiling Processing Resources
Overview
Features
Limitations
Graphical User Interface
Command Line Interface
Application Programming Interface
Log4j.properties
Benchmark log format
Enabling profiling
Reporting tool
Developing GATE
Reporting Bugs and Requesting Features
Contributing Patches
Creating New Plugins
What to Call your Plugin
Writing a New PR
Writing a New VR
Writing a 'Ready Made' Application
Distributing Your New Plugins
Updating this User Guide
Building the User Guide
Making Changes to the User Guide
III CREOLE Plugins
Gazetteers
Introduction to Gazetteers
ANNIE Gazetteer
Creating and Modifying Gazetteer Lists
ANNIE Gazetteer Editor
OntoGazetteer
Gaze Ontology Gazetteer Editor
The Gaze Gazetteer List and Mapping Editor
The Gaze Ontology Editor
Hash Gazetteer
Prerequisites
Parameters
Flexible Gazetteer
Gazetteer List Collector
OntoRoot Gazetteer
How Does it Work?
Initialisation of OntoRoot Gazetteer
Simple steps to run OntoRoot Gazetteer
Large KB Gazetteer
Quick usage overview
Dictionary setup
Additional dictionary configuration
Dictionary for Gazetteer List Files
Processing Resource Configuration
Runtime configuration
Semantic Enrichment PR
The Shared Gazetteer for multithreaded processing
Working with Ontologies
Data Model for Ontologies
Hierarchies of Classes and Restrictions
Instances
Hierarchies of Properties
URIs
Ontology Event Model
What Happens when a Resource is Deleted?
The Ontology Plugin: Current Implementation
The OWLIMOntology Language Resource
The ConnectSesameOntology Language Resource
The CreateSesameOntology Language Resource
The OWLIM2 Backwards-Compatible Language Resource
Using Ontology Import Mappings
Using BigOWLIM
The sesameCLI command line interface
The Ontology_OWLIM2 plugin: backwards-compatible implementation
The OWLIMOntologyLR Language Resource
GATE Ontology Editor
Ontology Annotation Tool
Viewing Annotated Text
Editing Existing Annotations
Adding New Annotations
Options
Relation Annotation Tool
Description of the two views
Create new annotation and instance from text selection
Create new annotation and add label to existing instance from text selection
Create and set properties for annotation relation
Delete instance, label or property
Differences with OAT and Ontology Editor
Using the ontology API
Using the ontology API (old version)
Ontology-Aware JAPE Transducer
Annotating Text with Ontological Information
Populating Ontologies
Ontology API and Implementation Changes
Differences between the implementation plugins
Changes in the Ontology API
Non-English Language Support
Language Identification
Fingerprint Generation
French Plugin
German Plugin
Romanian Plugin
Arabic Plugin
Chinese Plugin
Chinese Word Segmentation
Hindi Plugin
Russian Plugin
Bulgarian Plugin
Domain Specific Resources
Biomedical Support
ABNER
MetaMap
GSpell biomedical spelling suggestion and correction
BADREX
MiniChem/Drug Tagger
AbGene
GENIA
Penn BioTagger
MutationFinder
NormaGene
Tools for Social Media Data
Tools for Twitter
Twitter JSON format
Low-level PRs for Tweets
Handling multi-word hashtags
The TwitIE Pipeline
Parsers
MiniPar Parser
Platform Supported
Resources
Parameters
Prerequisites
Grammatical Relationships
RASP Parser
SUPPLE Parser
Requirements
Building SUPPLE
Running the Parser in GATE
Viewing the Parse Tree
System Properties
Configuration Files
Parser and Grammar
Mapping Named Entities
Upgrading from BuChart to SUPPLE
Stanford Parser
Input Requirements
Initialization Parameters
Runtime Parameters
Machine Learning
ML Generalities
Some Definitions
GATE-Specific Interpretation of the Above Definitions
Batch Learning PR
Batch Learning PR Configuration File Settings
Case Studies for the Three Learning Types
How to Use the Batch Learning PR in GATE Developer
Output of the Batch Learning PR
Using the Batch Learning PR from the API
Machine Learning PR
The DATASET Element
The ENGINE Element
The WEKA Wrapper
The MAXENT Wrapper
The SVM Light Wrapper
Example Configuration File
Tools for Alignment Tasks
Introduction
The Tools
Compound Document
CompoundDocumentFromXml
Compound Document Editor
Composite Document
DeleteMembersPR
SwitchMembersPR
Saving as XML
Alignment Editor
Saving Files and Alignments
Section-by-Section Processing
Crowdsourcing Data with GATE
The Basics
Entity classification
Creating a classification job
Loading data into a job
Importing the results
Entity annotation
Creating an annotation job
Loading data into a job
Importing the results
Combining GATE and UIMA
Embedding a UIMA AE in GATE
Mapping File Format
The UIMA Component Descriptor
Using the AnalysisEnginePR
Embedding a GATE CorpusController in UIMA
Mapping File Format
The GATE Application Definition
Configuring the GATEApplicationAnnotator
More (CREOLE) Plugins
Verb Group Chunker
Noun Phrase Chunker
Differences from the Original
Using the Chunker
TaggerFramework
TreeTagger—Multilingual POS Tagger
GENIA and Double Quotes
Chemistry Tagger
Using the Tagger
Zemanta Semantic Annotation Service
Lupedia Semantic Annotation Service
TextRazor Annotation Service
Annotating Numbers
Numbers in Words and Numbers
Roman Numerals
Annotating Measurements
Annotating and Normalizing Dates
Snowball Based Stemmers
Algorithms
GATE Morphological Analyzer
Rule File
Flexible Exporter
Configurable Exporter
Annotation Set Transfer
Schema Enforcer
Information Retrieval in GATE
Using the IR Functionality in GATE
Using the IR API
Websphinx Web Crawler
Using the Crawler PR
Proxy configuration
WordNet in GATE
The WordNet API
Kea - Automatic Keyphrase Detection
Using the 'KEA Keyphrase Extractor' PR
Using Kea Corpora
Annotation Merging Plugin
Copying Annotations between Documents
OpenCalais Plugin
LingPipe Plugin
LingPipe Tokenizer PR
LingPipe Sentence Splitter PR
LingPipe POS Tagger PR
LingPipe NER PR
LingPipe Language Identifier PR
OpenNLP Plugin
Init parameters and models
OpenNLP PRs
Obtaining and generating models
Stanford CoreNLP
Stanford Tagger
Stanford Parser
Stanford Named Entity Recognition
Content Detection Using Boilerpipe
Inter Annotator Agreement
Schema Annotation Editor
Coref Tools Plugin
Pubmed Format
MediaWiki Format
Fast Infoset Document Format
CSV Document Support
TermRaider term extraction tools
Termbank language resources
Termbank Score Copier
The PMI bank language resource
Document Normalizer
Developer Tools
Linguistic Simplifier
IV The GATE Family: Cloud, MIMIR, Teamware
GATE Cloud
GATE Cloud services: an overview
Comparison with other systems
How to buy services
Pricing and discounts
Annotation Jobs on GATECloud.net
The Annotation Service Charges Explained
Annotation Job Execution in Detail
Running Custom Annotation Jobs on GATECloud.net
Preparing Your Application: The Basics
The GATECloud.net environment
GATE Teamware: A Web-based Collaborative Corpus Annotation Tool
Introduction
Requirements for Multi-Role Collaborative Annotation Environments
Typical Division of Labour
Remote, Scalable Data Storage
Automatic annotation services
Workflow Support
Teamware: Architecture, Implementation, and Examples
Data Storage Service
Annotation Services
The Executive Layer
The User Interfaces
Practical Applications
GATE Mímir
Appendices
Change Log
Next Release
October 2014
September 2014
August 2014
July 2014
June 2014
May 2014
Version 8.0 (May 2014)
Major changes
Other new and improved plugins
Bug fixes and other improvements
For developers
Version 7.1 (November 2012)
New plugins
Library updates
GATE Embedded API changes
Version 7.0 (February 2012)
Major new features
Removal of deprecated functionality
Other enhancements and bug fixes
Version 6.1 (April 2011)
New CREOLE Plugins
Other new features and improvements
Version 6.0 (November 2010)
Major new features
Breaking changes
Other new features and bugfixes
Version 5.2.1 (May 2010)
Version 5.2 (April 2010)
JAPE and JAPE-related
Other Changes
Version 5.1 (December 2009)
New Features
JAPE improvements
Other improvements and bug fixes
Version 5.0 (May 2009)
Major New Features
Other New Features and Improvements
Specific Bug Fixes
Version 4.0 (July 2007)
Major New Features
Other New Features and Improvements
Bug Fixes and Optimizations
Version 3.1 (April 2006)
Major New Features
Other New Features and Improvements
Bug Fixes
January 2005
December 2004
September 2004
Version 3 Beta 1 (August 2004)
July 2004
June 2004
April 2004
March 2004
Version 2.2 – August 2003
Version 2.1 – February 2003
June 2002
Version 5.1 Plugins Name Map
Obsolete CREOLE Plugins
Ontotext JapeC Compiler
Google Plugin
Yahoo Plugin
Using the YahooPR
Gazetteer Visual Resource - GAZE
Display Modes
Linear Definition Pane
Linear Definition Toolbar
Operations on Linear Definition Nodes
Gazetteer List Pane
Mapping Definition Pane
Google Translator PR
Design Notes
Patterns
Components
Model, view, controller
Interfaces
Exception Handling
Ant Tasks for GATE
Declaring the Tasks
The packagegapp task - bundling an application with its dependencies
Introduction
Basic Usage
Handling Non-Plugin Resources
Streamlining your Plugins
Bundling Extra Resources
The expandcreoles Task - Merging Annotation-Driven Config into creole.xml
Named-Entity State Machine Patterns
Main.jape
first.jape
firstname.jape
name.jape
Person
Location
Organization
Ambiguities
Contextual information
name_post.jape
date_pre.jape
date.jape
reldate.jape
number.jape
address.jape
url.jape
identifier.jape
jobtitle.jape
final.jape
unknown.jape
name_context.jape
org_context.jape
loc_context.jape
clean.jape
Part-of-Speech Tags used in the Hepple Tagger
References