Joint entity and relation extraction based on a hybrid neural network
1 Introduction
2 Related works
2.1 Named entity recognition
2.2 Relation classification
2.3 Joint entity and relation extraction
2.4 LSTM and CNN models on NLP
3 Our method
3.1 Bidirectional LSTM encoding layer
3.2 Named entity recognition (NER) module
3.3 Relation classification (RC) module
3.4 Training and implementation
4 Experiment
4.1 Experimental setting
4.2 Results
5 Analysis and discussions
5.1 Analysis of named entity recognition module
5.2 Analysis of relation classification module
5.3 The effect of two entities’ distance
5.4 Error analysis
6 Conclusion
Acknowledgment
References
Joint entity and relation extraction based on a hybrid neural network

Suncong Zheng a, Yuexing Hao a, Dongyuan Lu b, Hongyun Bao a,∗, Jiaming Xu a, Hongwei Hao a, Bo Xu a,c

a Digital Content Technology Research Center, Institute of Automation, Chinese Academy of Sciences, China
b The School of Information Technology and Management, University of International Business and Economics, China
c Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, China

∗ Corresponding author. E-mail address: hongyun.bao@ia.ac.cn (H. Bao).

Article history: Received 15 July 2016; Revised 20 December 2016; Accepted 25 December 2016.

Keywords: Neural network; Information extraction; Tagging; Classification

Abstract

Entity and relation extraction is a task that combines detecting entity mentions and recognizing entities' semantic relationships from unstructured text. We propose a hybrid neural network model to extract entities and their relationships without any handcrafted features. The hybrid neural network contains a novel bidirectional encoder-decoder LSTM module (BiLSTM-ED) for entity extraction and a CNN module for relation classification. The contextual information of entities obtained by BiLSTM-ED is further passed to the CNN module to improve relation classification. We conduct experiments on the public dataset ACE05 (Automatic Content Extraction program) to verify the effectiveness of our method. The proposed method achieves state-of-the-art results on the entity and relation extraction task. © 2017 Elsevier B.V. All rights reserved.

1. Introduction

Entity and relation extraction detects entity mentions and recognizes their semantic relationships from text. It is an important issue in knowledge extraction and plays a vital role in the automatic construction of knowledge bases.

Traditional systems treat this task as a pipeline of two separated tasks, i.e., named entity recognition (NER) [1] and relation classification (RC) [2]. This separated framework makes the task easy to deal with and each component more flexible, but it pays little attention to the relevance of the two sub-tasks. A joint learning framework is an effective approach to correlating NER and RC, and it can also avoid cascading errors [3]. However, most existing joint methods are feature-based structured systems [3–7]. They need complicated feature engineering and rely heavily on supervised NLP toolkits, which might also lead to error propagation. To reduce the manual work in feature extraction, Miwa and Bansal [8] recently presented a neural network-based method for end-to-end entity and relation extraction. However, when detecting entities they use an NN structure to predict the entity tags, which neglects the long-distance relationships between tags.

Based on the above analysis, we propose a hybrid neural network model to address these problems, which contains a named entity recognition (NER) module and a relation classification (RC) module. NER and RC share the same bidirectional LSTM encoding layer, which encodes each input word by taking into account the context on both sides of the word.
Although a bidirectional LSTM can capture long-distance interactions between words, each output entity tag is predicted independently. Hence, we also adopt an LSTM structure to explicitly model tag interactions; it can capture the long-distance relationships between tags, compared with the NN decoding manner of [8]. As for relation classification, the sub-sentence between two entities has been proven to effectively reflect the entities' relationship [9,10]. Besides, the bidirectional LSTM encoding layer can obtain the entities' contextual information, which is also beneficial for identifying relationships between entities. Hence, we adopt a CNN model, which has achieved great success in extracting relations, to extract relations based on the encoding information of the entities and the sub-sentence information.

Our model not only considers the relevance of the NER module and the RC module, compared with classical pipeline methods, but also considers the long-distance relationships between entity tags without complicated feature engineering, compared with existing joint learning methods. We conduct experiments on the public dataset ACE05 (Automatic Content Extraction program).¹ Our method achieves state-of-the-art results on the entity and relation extraction task. Besides, we also analyze the performance of the two modules alone.

¹ http://www.itl.nist.gov/iad/mig//tests/ace/
On the entity detection task, our NER module achieves a 2% improvement compared with different kinds of LSTM structures, which verifies the effectiveness of the NER module. On the relation classification task, the results show that the entities' contextual information, obtained during encoding, can improve the accuracy of relation classification.

The remainder of the paper is structured as follows. In Section 2, we review related work on named entity recognition, relation classification and the neural networks used in this paper. Section 3 presents our hybrid neural network in detail. In Section 4, we describe the experimental setup and present the experimental results. Finally, we analyze the model in Section 5 and conclude in Section 6.

2. Related works

Entity and relation extraction is an important step in constructing a knowledge base, which can benefit many NLP tasks [11] and social media analysis tasks [12,13]. There are two main frameworks for extracting entities and their relationships: the pipeline method and the joint learning model. The pipeline method treats this task as a pipeline of two separated tasks, i.e., named entity recognition (NER) [14–17] and relation classification (RC) [2,9,10,18,19]. The joint model extracts entities and relations simultaneously. Hence, the problem we focus on in this paper is related to named entity recognition, relation classification and joint entity and relation extraction. The methods we use are related to long short-term memory networks (LSTM) and convolutional neural networks (CNN).

2.1. Named entity recognition

Named entity recognition is a classic task in NLP. Most existing NER models are traditional linear statistical models, such as Hidden Markov Models (HMM) and Conditional Random Fields (CRF) [14,20]. Their performance relies heavily on hand-crafted features extracted by NLP tools and external knowledge resources. Recently, several neural network architectures have been successfully applied to NER, which is regarded as a sequential token tagging task. Collobert et al. [21] used a CNN with a CRF on top, together with word embeddings. Nowadays, Recurrent Neural Networks (RNN) have shown better performance than other neural networks in many sequence-to-sequence tasks. Chiu and Nichols [15] proposed a hybrid model that learns both character-level and word-level features; they decoded each tag independently, based on a linear layer and a log-softmax layer. [16,17,22] proposed a BiLSTM with a CRF on top for joint tag decoding. Miwa and Bansal [8] proposed a BiLSTM for encoding and a single incremental NN structure to decode tags jointly. These RNN models all utilize BiLSTM as the encoding model, but their decoding manners differ.

2.2. Relation classification

Relation classification is a widely studied task in the NLP community, and various approaches have been proposed to accomplish it. Existing methods for relation classification can be divided into handcrafted-feature based methods [2,23], neural network based methods [19,24–27] and other valuable methods [25,28].

The handcrafted-feature based methods focus on using different natural language processing (NLP) tools and knowledge resources to obtain effective handcrafted features. Kambhatla [23] employs a Maximum Entropy model to combine diverse lexical, syntactic and semantic features derived from the text.
It is early work on relation classification, and the features used are not comprehensive. Rink [2] designs 16 kinds of features that are extracted using many supervised NLP toolkits and resources, including POS, WordNet, dependency parses, etc. It achieved the best result at SemEval-2010 Task 8 among handcrafted-feature based methods. However, it relied heavily on other NLP tools and also required a lot of work to design and extract features.

In recent years, deep neural models have made significant progress on the task of relation classification. These models can learn effective relation features from the given sentence without complicated feature engineering. The most common neural network based models applied to this task are Convolutional Neural Networks (CNN) [18,19,27,29,30] and sequential neural networks such as Recurrent Neural Networks (RNN) [31], Recursive Neural Networks (RecNN) [24,32] and Long Short-Term Memory networks (LSTM) [26,33]. There also exist other valuable methods such as kernel-based methods [28,34] and compositional models [25]. Nguyen et al. [28] explore the use of innovative kernels based on syntactic and semantic structures for the task, and Sun and Han [34] propose a new tree kernel, called the feature-enriched tree kernel (FTK), for relation extraction. The compositional model FCM [25] learns representations for the substructures of an annotated sentence. Compared to existing compositional models, FCM can easily handle arbitrary types of input and global information for composition.

2.3. Joint entity and relation extraction

Although the pipeline method allows a more flexible system design, it neglects the relevance of the sub-tasks and may also lead to error propagation [3]. Most existing joint methods are feature-based structured systems [3,4,35–37], which need complicated feature engineering. [35,36] proposed joint models that use the optimal results of the subtasks and seek a globally optimal solution. Singh et al. [37] proposed a single joint graphical model that represents the various dependencies between subtasks. Li and Ji [3] proposed the first model to incrementally predict entities and relations using a single joint model, a structured perceptron with efficient beam search. Miwa and Sasaki [4] introduced a table to represent the entity and relation structures in sentences, and proposed a history-based beam-search structured learning model. Recently, Miwa and Bansal [8] used an LSTM-based model to extract entities and relations, which can reduce the manual work.

2.4. LSTM and CNN models on NLP

The methods used in this paper are based on two neural network models: Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM). CNN was originally invented for computer vision [38] and is often used to extract image features [39,40]. In recent years, CNNs have been successfully applied to different NLP tasks and have shown effectiveness in extracting sentence semantics and keyword information [27,41–43]. LSTM is a specific kind of recurrent neural network (RNN). LSTM replaces the hidden vector of a recurrent neural network with memory blocks that are equipped with gates; it can keep long-term memory by training proper gating weights [44,45]. LSTM has also shown powerful capacity on many NLP tasks such as machine translation [46], sentence representation [47] and relation extraction [26].
In this paper, we propose a hybrid neural network for jointly learning entities and their relationships. It can learn related features from given sentences without complicated feature engineering work, compared with handcrafted-feature based methods. Compared with the other neural network based method [8], our method considers the long-distance relationships between entity tags.
Fig. 1. The framework of the hybrid neural network for jointly extracting entities and relations.

3. Our method

The framework of the hybrid neural network is shown in Fig. 1. The first layer of the hybrid neural network is a bidirectional LSTM encoding layer, which is shared by both the named entity recognition (NER) module and the relation classification (RC) module. There are two "channels" after the encoding layer: one links to the NER module, which is an LSTM decoding layer; the other feeds into a CNN layer to extract the relations. In the following parts, we describe these components in detail.

3.1. Bidirectional LSTM encoding layer

The Bi-LSTM encoding layer contains a word embedding layer, a forward LSTM layer, a backward LSTM layer and a concatenation layer. The word embedding layer converts each word from its 1-hot representation to an embedding vector. Hence, a sequence of words can be represented as W = \{w_1, \ldots, w_t, w_{t+1}, \ldots, w_n\}, where w_t \in \mathbb{R}^d is the d-dimensional word vector corresponding to the t-th word in the sentence and n is the length of the given sentence. After the word embedding layer, there are two parallel LSTM layers: the forward LSTM layer and the backward LSTM layer. For each word w_t, the forward layer encodes w_t by considering the contextual information from w_1 to w_t, which is marked as \overrightarrow{h}_t. In a similar way, the backward layer encodes w_t based on the contextual information from w_n to w_t, which is marked as \overleftarrow{h}_t.

The LSTM architecture consists of a set of recurrently connected subnets, known as memory blocks. Each time-step of the forward hidden layer and the backward hidden layer is an LSTM memory block. A block contains one or more self-connected memory cells and three multiplicative units (the input, output and forget gates) that provide continuous analogues of write, read and reset operations for the cells [45]. Fig. 2 provides an illustration of an LSTM memory block with a single cell.

Fig. 2. LSTM memory block with one cell.

At each time-step, an LSTM memory block computes the current hidden vector h_t based on the previous hidden vector h_{t-1}, the previous cell vector c_{t-1} and the current input word embedding w_t, which can be shortly denoted as:

\overrightarrow{h}_t = lstm(w_t, \overrightarrow{h}_{t-1}, \overrightarrow{c}_{t-1}), \qquad \overleftarrow{h}_t = lstm(w_t, \overleftarrow{h}_{t+1}, \overleftarrow{c}_{t+1}).

The detailed operation of the LSTM can be defined as follows:

i_t = \delta(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i),  (1)
f_t = \delta(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f),  (2)
z_t = \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c),  (3)
c_t = f_t \cdot c_{t-1} + i_t \cdot z_t,  (4)
o_t = \delta(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o),  (5)
h_t = o_t \cdot \tanh(c_t),  (6)

where i, f and o are the input gate, forget gate and output gate respectively, b is the bias term, c is the cell memory, \cdot denotes element-wise multiplication and W_{(\cdot)} are the parameters. Finally, we concatenate \overrightarrow{h}_t and \overleftarrow{h}_t to represent word t's encoded information, denoted as h_t = [\overrightarrow{h}_t, \overleftarrow{h}_t].
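As a concrete illustration of Eqs. (1)–(6) and the bidirectional concatenation, the following is a minimal NumPy sketch; the parameter names, random initialization and dimensions are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(d_in, d_h, rng):
    """Illustrative random weights; the Wc* peepholes are diagonal,
    stored as vectors and applied element-wise."""
    P = {}
    for g in "ifco":  # input, forget, cell candidate, output
        P["Wx" + g] = 0.01 * rng.standard_normal((d_h, d_in))
        P["Wh" + g] = 0.01 * rng.standard_normal((d_h, d_h))
        P["b" + g] = np.zeros(d_h)
    for g in "ifo":   # peephole connections to the cell state
        P["Wc" + g] = 0.01 * rng.standard_normal(d_h)
    return P

def lstm_step(x_t, h_prev, c_prev, P):
    """One LSTM memory block update, Eqs. (1)-(6)."""
    i = sigmoid(P["Wxi"] @ x_t + P["Whi"] @ h_prev + P["Wci"] * c_prev + P["bi"])  # (1)
    f = sigmoid(P["Wxf"] @ x_t + P["Whf"] @ h_prev + P["Wcf"] * c_prev + P["bf"])  # (2)
    z = np.tanh(P["Wxc"] @ x_t + P["Whc"] @ h_prev + P["bc"])                      # (3)
    c = f * c_prev + i * z                                                         # (4)
    o = sigmoid(P["Wxo"] @ x_t + P["Who"] @ h_prev + P["Wco"] * c + P["bo"])       # (5)
    return o * np.tanh(c), c                                                       # (6)

def bilstm_encode(words, P_fwd, P_bwd, d_h):
    """Run forward and backward passes and concatenate: h_t = [fwd_t, bwd_t]."""
    n = len(words)
    fwd, bwd = [None] * n, [None] * n
    h, c = np.zeros(d_h), np.zeros(d_h)
    for t in range(n):                      # left-to-right context
        h, c = lstm_step(words[t], h, c, P_fwd)
        fwd[t] = h
    h, c = np.zeros(d_h), np.zeros(d_h)
    for t in reversed(range(n)):            # right-to-left context
        h, c = lstm_step(words[t], h, c, P_bwd)
        bwd[t] = h
    return [np.concatenate([fwd[t], bwd[t]]) for t in range(n)]

rng = np.random.default_rng(0)
sentence = [rng.standard_normal(300) for _ in range(6)]  # six word embeddings, d = 300
H = bilstm_encode(sentence, init_params(300, 300, rng), init_params(300, 300, rng), 300)
assert H[0].shape == (600,)  # each h_t concatenates both directions
```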
3.2. Named entity recognition (NER) module

Each word is assigned an entity tag. The tags follow the commonly used encoding scheme BILOS (Begin, Inside, Last, Outside, Single) [22,48]; each tag contains the position information of a word within the entity. We also adopt an LSTM structure to explicitly model tag interactions. When detecting the entity tag of word t, the inputs of the decoding layer are: h_t obtained from the Bi-LSTM encoding layer, the former tag predicted vector T_{t-1}, and the former hidden state of the decoding LSTM, s_{t-1}. Each unit of the decoding LSTM is the same as the encoding LSTM memory block except for the input gate, which can be rewritten as:

i_t = \delta(W_{xi} h_t + W_{ti} T_{t-1} + W_{hi} s_{t-1} + b_i),  (7)

where the tag predicted vector T is transformed from the hidden state s as follows:

T_t = W_{ts} s_t + b_{ts}.  (8)

The final softmax layer computes normalized entity tag probabilities based on the tag predicted vector T_t:

y_t = W_y T_t + b_y,  (9)

p_t^i = \frac{\exp(y_t^i)}{\sum_{j=1}^{nt} \exp(y_t^j)},  (10)

where W_y is the softmax matrix and nt is the total number of entity tags. Because T is similar to a tag embedding and LSTM is capable of learning long-term dependencies, this manner can model tag interactions.
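Continuing the NumPy conventions of the encoder sketch above, one tag-decoding step of Eqs. (7)–(10) might look as follows; the peephole terms are omitted for brevity, and all weight names are illustrative.

```python
def softmax(y):
    e = np.exp(y - y.max())  # subtract max for numerical stability
    return e / e.sum()

def ner_decode_step(h_t, T_prev, s_prev, c_prev, P):
    """One decoding step: only the input gate differs from the encoder
    block, seeing h_t and the previous tag predicted vector T_prev."""
    i = sigmoid(P["Wxi"] @ h_t + P["Wti"] @ T_prev + P["Whi"] @ s_prev + P["bi"])  # Eq. (7)
    f = sigmoid(P["Wxf"] @ h_t + P["Whf"] @ s_prev + P["bf"])
    z = np.tanh(P["Wxc"] @ h_t + P["Whc"] @ s_prev + P["bc"])
    c = f * c_prev + i * z
    o = sigmoid(P["Wxo"] @ h_t + P["Who"] @ s_prev + P["bo"])
    s = o * np.tanh(c)                  # decoder hidden state s_t
    T = P["Wts"] @ s + P["bts"]         # tag predicted vector, Eq. (8)
    p = softmax(P["Wy"] @ T + P["by"])  # tag scores and probabilities, Eqs. (9)-(10)
    return p, T, s, c
```

Because the tag predicted vector T_t is fed back at step t+1, earlier tag decisions can influence later ones, which is how the decoder captures long-distance tag interactions.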
3.3. Relation classification (RC) module

When recognizing entities' semantic relationships, we merge the encoding information of the entities and the sub-sentence between the entities, then feed them into the CNN model [49]. This can be represented as:

R = CNN([h_{e1}, w_{e1}, w_{e1+1}, \ldots, w_{e2}, h_{e2}]),  (11)

where R is the relation label, h_e is the encoding information of an entity and w is the word embedding. Notably, an entity may contain two or more words; in that case, we sum up these words' encoding information to represent the whole entity. CNN denotes the convolutional operations shown in Fig. 3.

Fig. 3. The convolutional module for relation classification.

In the convolution layer, we use W_c^{(i)} \in \mathbb{R}^{k \times d} to represent the i-th convolution filter and br^{(i)} \in \mathbb{R} to represent the corresponding bias term, where k is the context window size of the filter. The filter W_c^{(i)} slides through the input sequence S = [h_{e1}, w_{e1}, w_{e1+1}, \ldots, w_{e2}, h_{e2}] to produce the latent features z^{(i)}. The sliding process can be represented as:

z_l^{(i)} = \sigma(W_c^{(i)} * s_{l:l+k-1} + br^{(i)}),  (12)

where z_l^{(i)} is the feature extracted by filter W_c^{(i)} from word s_l to word s_{l+k-1}. Hence, the latent features of the given sequence S are denoted as z^{(i)} = [z_1^{(i)}, \ldots, z_{L-k+1}^{(i)}]. We then apply the max-pooling operation to reserve the most prominent feature of filter W_c^{(i)} and denote it as:

z_{max}^{(i)} = \max\{z^{(i)}\} = \max\{z_1^{(i)}, \ldots, z_{L-k+1}^{(i)}\}.  (13)

We use multiple filters to extract multiple features. Therefore, the relation features of the given sequence are represented as R_s = [z_{max}^{(1)}, \ldots, z_{max}^{(nr)}], where nr is the number of filters.
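The construction of Eq. (11) and the convolution and pooling of Eqs. (12)–(13) can be sketched as below. For illustration it assumes that the encoder outputs and word embeddings share the same dimension d and that \sigma is tanh; the paper does not spell out either detail.

```python
def relation_features(H, E, e1_span, e2_span, filters, biases):
    """Build S = [h_e1, w_e1, ..., w_e2, h_e2] and apply conv + max-pooling.

    H: per-word encoder outputs; E: word embeddings (both length-n lists);
    e1_span, e2_span: (start, end) indices of the two entity mentions;
    filters: list of (k, d) arrays W_c; biases: list of scalars br."""
    h_e1 = sum(H[t] for t in range(e1_span[0], e1_span[1] + 1))  # multi-word entities:
    h_e2 = sum(H[t] for t in range(e2_span[0], e2_span[1] + 1))  # sum the word encodings
    sub = [E[t] for t in range(e1_span[0], e2_span[1] + 1)]      # sub-sentence w_e1..w_e2
    S = np.stack([h_e1] + sub + [h_e2])                          # input sequence, Eq. (11)
    R_s = []
    for W_c, br in zip(filters, biases):
        k = W_c.shape[0]
        z = [np.tanh(np.sum(W_c * S[l:l + k]) + br)              # sliding window, Eq. (12)
             for l in range(len(S) - k + 1)]
        R_s.append(max(z))                                       # max-pooling, Eq. (13)
    return np.array(R_s)                                         # one feature per filter
```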
After that, we apply a softmax layer [50] with dropout [51] to classify the relations based on the relation features R_s, which is defined as:

y_r = W_R \cdot (R_s \circ r) + b_R,  (14)

p_r^i = \frac{\exp(y_r^i)}{\sum_{j=1}^{nc} \exp(y_r^j)},  (15)

where W_R \in \mathbb{R}^{nr \cdot nc} is the softmax matrix, nc is the total number of relation classes, the symbol \circ denotes the element-wise multiplication operator, and r \in \mathbb{R}^{nr} is a binary mask vector drawn from a Bernoulli distribution with probability \rho. Dropout guards against overfitting, which makes the model more robust. In Formula (15), p_r^i means the probability that the sentence describes relation i.

3.4. Training and implementation

We train our models to maximize the log-likelihood of the data, and the optimization method we use is RMSprop, proposed by Hinton in [52]. The objective function of the NER module can be defined as:

L_{ner} = \max \sum_{j=1}^{|D|} \sum_{t=1}^{L_j} \log(p_t^{(j)} = y_t^{(j)} \mid x_j, ner),  (16)

where |D| is the size of the dataset, L_j is the length of sentence x_j, y_t^{(j)} is the label of word t in sentence x_j, and p_t^{(j)} is the normalized entity tag probability defined in Formula (10). Besides, the RC module's objective function is:

L_{rc} = \max \sum_{j=1}^{|D|} \log(p_r^{(j)} = y_r^{(j)} \mid x_j, rc),  (17)

where p_r^{(j)} is defined in Formula (15).

We first train the NER module to recognize the entities and obtain their encoding information, then further train the RC module to classify relations based on the encoding information and the entity combinations.

Specially, we find that if there is a relationship between two entities, the distance between them is almost always smaller than about 20 words, as shown in Fig. 4. Hence, when determining the relationship between two entities, we make full use of this property: if the distance between two entities is larger than L_max, we assume there is no relationship between them. L_max is around 20 on the ACE05 dataset, based on the statistical results of Fig. 4.

Fig. 4. The distribution of the dataset based on the distance between two entities. The horizontal axis is the distance between two entities and the vertical axis represents the number of samples corresponding to each distance.
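The distance heuristic described above amounts to a simple pre-filter on candidate entity pairs. A sketch follows, with L_max = 20 as estimated from Fig. 4; the exact distance definition (gap between spans rather than head-to-head) is our assumption.

```python
L_MAX = 20  # estimated from the distance distribution in Fig. 4

def candidate_pairs(entity_spans):
    """Keep only entity pairs whose distance does not exceed L_MAX;
    entity_spans is a list of (start, end) word indices, in sentence order."""
    pairs = []
    for a in range(len(entity_spans)):
        for b in range(a + 1, len(entity_spans)):
            distance = entity_spans[b][0] - entity_spans[a][1]  # words between the spans
            if distance <= L_MAX:
                pairs.append((entity_spans[a], entity_spans[b]))
    return pairs  # only these pairs are passed to the RC module

print(candidate_pairs([(0, 1), (5, 5), (40, 41)]))
# -> [((0, 1), (5, 5))]; the distant third entity is paired with neither
```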
4. Experiment

4.1. Experimental setting

Datasets. We use the public dataset ACE05 for entity and relation extraction, which has 6 coarse-grained relation types and an additional "other" relation to denote non-entity or non-relation classes. The 6 coarse-grained relation types are "ART (artifact)", "G-A (Gen-affiliation)", "O-A (Org-affiliation)", "P-W (PART-WHOLE)", "P-S (person-social)" and "PHYS (physical)". The same relation type with opposite directions is considered as two classes. For example, "PART-WHOLE(e1,e2)" and "PART-WHOLE(e2,e1)" are different relations: "PART-WHOLE(e1,e2)" means that e1 is a part of e2, and "PART-WHOLE(e2,e1)" means that e1 contains e2. Hence, there are 13 relation classes in total. The data pre-processing and settings in the experiments are the same as in [3].

Baselines. The baselines are recent methods for the ACE05 dataset, which include a classical pipeline model [3], a joint feature-based model called Joint w/Global [3], and an end-to-end NN-based model, SPTree [8].

• Pipeline (CRF+ME) [3] trains a linear-chain Conditional Random Fields model [53] for entity mention extraction and a Maximum Entropy model [54] for relation extraction. It is a classical pipeline method for the task.
• Joint w/Global [3] incrementally extracts entity mentions together with relations using a single model. They developed a number of new and effective global features as soft constraints to capture the interdependency among entity mentions and relations.
• SPTree [8] presents a novel end-to-end relation extraction model that represents both word sequence and dependency tree structures by using bidirectional sequential and bidirectional tree-structured LSTM-RNNs.

Metrics. To compare our model with the baselines, we use Precision (P), Recall (R) and F-measure (F1) on the task of joint entity and relation extraction. A relation instance is regarded as correct when its relation type and the head offsets of the two corresponding entities are both correct.

Hyper parameters. The hyper parameters used in the model are summarized in Table 1.

Table 1
Hyper parameters of the hybrid neural network.

Parameter   Description                                Value
d           Dimension of word embedding                300
ne          Number of hidden units in encode layer     300
nd          Number of hidden units in decode layer     300
k           Context window size of CNN module          3
nr          Filter number of CNN                       100
ρ           Ratio of dropout                           0.3

4.2. Results

The predicted results on the test set are shown in Table 2. Our method achieves an F1 of 52.1%, the best result among the existing methods. This illustrates the effectiveness of our proposed hybrid neural network on the task of jointly extracting entities and their relationships.

Table 2
Comparisons with the baselines on the ACE05 test set.

Model                    P (%)   R (%)   F1 (%)
Pipeline (CRF+ME) [3]    65.1    38.1    48.0
Joint w/Global [3]       65.4    39.8    49.5
SPTree [8]               65.8    42.9    51.9
Our method               61.9    45.0    52.1

Besides, the Joint w/Global [3] approach outperforms the pipelined method, and the neural network based methods (SPTree [8] and our model) achieve higher F1 results than the feature-based methods [3]. This shows that a neural network model accompanied by joint learning is a feasible way to extract entities and their relationships.

Especially, the precision results of these methods are similar; the difference is mainly concentrated in the recall results. Our method balances precision and recall, which yields a better F1 result.
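To make the evaluation criterion above precise, a relation instance can be reduced to a (type, head offset, head offset) triple; the scorer below is our illustrative reading of the metric, not released evaluation code.

```python
def relation_prf(gold, pred):
    """Micro P/R/F1: a predicted instance counts as correct only when its
    relation type and both entities' head offsets match a gold instance.

    gold, pred: sets of (relation_type, head_offset_e1, head_offset_e2)."""
    correct = len(gold & pred)
    p = correct / len(pred) if pred else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = {("P-W", 3, 7), ("PHYS", 12, 15)}
pred = {("P-W", 3, 7), ("ART", 3, 15)}
print(relation_prf(gold, pred))  # (0.5, 0.5, 0.5)
```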
5. Analysis and discussions

5.1. Analysis of named entity recognition module

The NER module contains a bidirectional LSTM encoding layer and an LSTM decoding layer; we use BiLSTM-ED to denote this structure. In order to further illustrate the effectiveness of BiLSTM-ED on the task of entity extraction, we compare BiLSTM-ED with its different variations and with other effective sequence labeling models. The contrast methods are:

• Forward-LSTM uses a unidirectional LSTM to encode the input sentence from w_1 to w_n, then also applies an LSTM structure to decode the entity tags.
• Backward-LSTM works in a similar manner to Forward-LSTM; the difference is the encoding order, which is from w_n to w_1.
• BiLSTM-NN uses a bi-directional LSTM to encode the input sentence and a feed-forward neural network (NN) architecture to predict the entity tags. It neglects the relationships between tags.
• BiLSTM-NN-2 [8] uses a bi-directional LSTM to encode the input sentence and a novel feed-forward neural network (NN) that considers adjacent tag information instead of the long-distance relationships between tags.
• CRF [53] is a classic and effective sequence labeling model. In this section, we use CRF as one of the powerful comparison methods; the features used in the CRF are the same as those used in [3].

We use the standard F1 to evaluate the performance of these methods and treat an entity as correct when its type and the region of its head are correct. Table 3 shows the results of the above methods on the task of named entity recognition.

Table 3
Comparisons with the different methods on the task of entity detection.

Methods         P (%)   R (%)   F1 (%)
Forward-LSTM    63.8    59.2    60.0
Backward-LSTM   65.3    60.0    61.0
CRF             83.2    73.6    78.1
BiLSTM-NN       83.3    83.0    82.2
BiLSTM-NN-2     85.5    81.2    83.3
BiLSTM-ED       85.2    85.4    84.2

Compared with Forward-LSTM and Backward-LSTM, the bi-directional LSTM encoding manner yields significant improvements. Bi-LSTM encoding considers the whole sentence's information, compared with uni-directional LSTM encoding, and hence achieves much higher accuracy on the tagging task. BiLSTM-NN-2 is better than BiLSTM-NN, which shows the need to consider the relationships between tags. Besides, BiLSTM-ED is better than BiLSTM-NN-2, which means that considering the long-distance relationships between tags is better than only considering adjacent tag information. We also compare BiLSTM-ED with the well-known sequential model CRF; the result again shows the effectiveness of BiLSTM-ED.

5.2. Analysis of relation classification module

In the relation classification module, we use two kinds of information: the sub-sentence between the entities and the encoding information of the entities obtained from the bidirectional LSTM layer.
In order to illustrate the effectiveness of this information, we compare our method with its different variations. We first use the NER module to detect the entities in a sentence, then use the correctly recognized entities from step 1 to test the RC module. We report the effect of this information on the relation classification task in Table 4.

Table 4
Comparisons of different information on the task of relation classification.

Methods     P (%)   R (%)   F1 (%)
Full-CNN    30.8    34.9    32.7
Sub-CNN     57.7    51.9    54.6
Sub-CNN-H   58.3    54.8    56.5

Full-CNN uses the whole sentence to recognize the relationships of entities. Sub-CNN only uses the sub-sentence between the two entities. Sub-CNN-H uses both the sub-sentence and the encoding information of the entities obtained from the bidirectional encoding layer. Comparing Full-CNN with Sub-CNN, the result shows that the sub-sentence yields a +20% improvement. This matches [9]'s analysis that most relationships can be reflected by the sub-sentence between the given two entities instead of the full sentence. When the encoding information of the entities is added to Sub-CNN, Sub-CNN-H further improves the accuracy of relation classification. This verifies that the entities' contextual information is also beneficial for identifying relationships between entities.

5.3. The effect of two entities' distance

From Fig. 4, we know that the data distribution shows a long-tail property when the horizontal axis is the distance between two entities. Hence, we set a threshold L_max to filter the data: if two entities' distance is larger than L_max, we treat the two entities as having no relationship. In order to analyze the effect of the threshold L_max, we use Sub-CNN to predict entity relationships based on different L_max values. The effect is shown in Fig. 5.

Fig. 5. The F1 results for different L_max values. The horizontal axis is the distance between two entities and its range is from 5 to 30. The vertical axis represents the F1 value on the relation classification task. In order to exclude the effect of encoding information, we use Sub-CNN to obtain the F1 results.

The smaller L_max is, the more data will be filtered. If L_max is too small, it may filter out correct data and make the F1 results decline. If L_max is too large, it cannot filter the noisy data, which may also hurt the final results. Fig. 5 shows that when L_max is between 10 and 25, the model performs well. This range also matches the statistical results of Fig. 4.

5.4. Error analysis

To analyze the errors of our method, we visualize the model's predicted results on the relation classification task in Fig. 6. The diagonal region indicates the correct prediction results and the other regions reflect the distribution of error samples. The highlighted diagonal region means that our method performs well on each relation class except for the relation "P-S". Because the test dataset contains only a few samples whose relation label is "P-S", the predicted distribution of "P-S" cannot fully reflect the true situation. Besides, "P-S" denotes the "person-social" relationship; the entities in "person-social" relations are often pronoun words in the dataset, so it is hard to recognize the "P-S" relationship based on these pronoun words.
Fig. 6. The distribution of the predicted results for each relation class. The horizontal axis is the target relation, and each target relation corresponds to a column of predicted relations. Point (X,Y) gives the ratio of samples whose target relation is X and whose predicted relation is Y. The values in each column sum to 1.

Furthermore, from Fig. 6 we can also see that the distribution of predicted relations is relatively dispersed in the first row, "OTHER", which means that many of the specific relation classes can be predicted as "OTHER". Namely, we fail to identify some relationships, and this directly leads to relatively low recall. From the first column, "OTHER", we can see that when there is no relationship between two entities, the model can discriminate this effectively.

Apart from the class "OTHER", the other problem is that the same relation type with opposite directions is easy to mix up,
such as P-W(e2,e1) and P-W(e1,e2), ART(e1,e2) and ART(e2,e1), O-A(e1,e2) and O-A(e2,e1). The reason is that the same relation type always has a similar description even when the directions differ.

6. Conclusion

Entity and relation extraction is an important issue in knowledge extraction and plays a vital role in the automatic construction of knowledge bases. In this paper, we propose a hybrid neural network model to extract entities and their semantic relationships without any handcrafted features. Compared with the other neural network based method, our method considers the long-distance relationships between entity tags. To illustrate the method's effectiveness, we conduct experiments on the public dataset ACE05 (Automatic Content Extraction program); the experimental results verify the effectiveness of our method.

In the future, we will explore how to better link the two modules within the neural network so that they can perform better. Besides, we also need to address the problem of missed relationships and try to improve the recall.

Acknowledgment

We thank Qi Li and Miwa for dataset details and helpful discussions. We also thank Qi Li for providing the partition of the dataset so that we could conduct contrast experiments in a fair environment. This work is also supported by the National High Technology Research and Development Program of China (863 Program) (Grant No. 2015AA015402), the Hundred Talents Program of the Chinese Academy of Sciences (No. Y3S4011D31) and the National Natural Science Foundation (Grant No. 71402178).

References

[1] D. Nadeau, S. Sekine, A survey of named entity recognition and classification, Lingvisticae Investigationes 30 (1) (2007) 3–26.
[2] B. Rink, UTD: classifying semantic relations by combining lexical and semantic resources, in: Proceedings of the 5th International Workshop on Semantic Evaluation, 2010, pp. 256–259.
[3] Q. Li, H. Ji, Incremental joint extraction of entity mentions and relations, in: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014, pp. 402–412.
[4] M. Miwa, Y. Sasaki, Modeling joint entity and relation extraction with table representation, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2014, pp. 1858–1869.
[5] Y.S. Chan, D. Roth, Exploiting syntactico-semantic structures for relation extraction, in: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, 2011, pp. 551–560.
[6] X. Yu, W. Lam, Jointly identifying entities and extracting relations in encyclopedia text via a graphical model approach, in: Proceedings of the 21st COLING International Conference, 2010, pp. 1399–1407.
[7] L. Li, J. Zhang, L. Jin, R. Guo, D. Huang, A distributed meta-learning system for Chinese entity relation extraction, Neurocomputing 149 (2015) 1135–1142.
[8] M. Miwa, M. Bansal, End-to-end relation extraction using LSTMs on sequences and tree structures, in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 2016.
[9] C.N. dos Santos, B. Xiang, B. Zhou, Classifying relations by ranking with convolutional neural networks, in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, vol. 1, 2015, pp. 626–634.
[10] Y. Xu, L. Mou, G. Li, Y. Chen, H. Peng, Z. Jin, Classifying relations via long short term memory networks along shortest dependency paths, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2015.
[11] L. Zou, R. Huang, H. Wang, J.X. Yu, W. He, D. Zhao, Natural language question answering over RDF: a graph data driven approach, in: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, ACM, 2014, pp. 313–324.
[12] J. Sang, C. Xu, J. Liu, User-aware image tag refinement via ternary semantic analysis, IEEE Trans. Multimed. 14 (3) (2012) 883–895.
[13] J. Sang, C. Xu, Right buddy makes the difference: an early exploration of social relation analysis in multimedia applications, in: Proceedings of the 20th ACM International Conference on Multimedia, ACM, 2012, pp. 19–28.
[14] G. Luo, X. Huang, C.-Y. Lin, Z. Nie, Joint entity recognition and disambiguation, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2015, pp. 879–888.
[15] J.P. Chiu, E. Nichols, Named entity recognition with bidirectional LSTM-CNNs, arXiv:1511.08308 (2015).
[16] Z. Huang, W. Xu, K. Yu, Bidirectional LSTM-CRF models for sequence tagging, arXiv:1508.01991 (2015).
[17] G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, C. Dyer, Neural architectures for named entity recognition, in: Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2016.
[18] K. Xu, Y. Feng, S. Huang, D. Zhao, Semantic relation classification via convolutional neural networks with simple negative sampling, arXiv:1506.07650 (2015).
[19] D. Zeng, K. Liu, G. Zhou, J. Zhao, Relation classification via convolutional deep neural network, in: Proceedings of the 25th COLING International Conference, 2014, pp. 2335–2344.
[20] A. Passos, V. Kumar, A. McCallum, Lexicon infused phrase embeddings for named entity resolution, in: Proceedings of the International Conference on Computational Linguistics, 2014, pp. 78–86.
[21] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, P. Kuksa, Natural language processing (almost) from scratch, J. Mach. Learn. Res. 12 (2011) 2493–2537.
[22] X. Ma, E. Hovy, End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF, arXiv:1603.01354 (2016).
[23] N. Kambhatla, Combining lexical, syntactic, and semantic features with maximum entropy models for extracting relations, in: Proceedings of the ACL International Conference, 2004, p. 22.
[24] R. Socher, B. Huval, C.D. Manning, A.Y. Ng, Semantic compositionality through recursive matrix-vector spaces, in: Proceedings of the EMNLP International Conference, 2012, pp. 1201–1211.
[25] M. Yu, M. Gormley, M. Dredze, Factor-based compositional embedding models, in: Proceedings of the NIPS Workshop on Learning Semantics, 2014.
[26] X. Yan, L. Mou, G. Li, Y. Chen, H. Peng, Z. Jin, Classifying relations via long short term memory networks along shortest dependency paths, in: Proceedings of the EMNLP International Conference, 2015.
[27] C.N. dos Santos, B. Xiang, B. Zhou, Classifying relations by ranking with convolutional neural networks, in: Proceedings of the 53rd ACL International Conference, vol. 1, 2015, pp. 626–634.
[28] T.-V.T. Nguyen, A. Moschitti, G. Riccardi, Convolution kernels on constituent, dependency and sequential structures for relation extraction, in: Proceedings of the EMNLP International Conference, 2009, pp. 1378–1387.
[29] P. Qin, W. Xu, J. Guo, An empirical convolutional neural network approach for semantic relation classification, Neurocomputing 190 (2016) 1–9.
[30] S. Zheng, J. Xu, P. Zhou, H. Bao, Z. Qi, B. Xu, A neural network framework for relation extraction: learning entity semantic and relation pattern, Knowl. Based Syst. 114 (2016) 12–23.
[31] D. Zhang, D. Wang, Relation classification via recurrent neural network, arXiv:1508.01006 (2015).
[32] J. Ebrahimi, D. Dou, Chain based RNN for relation classification, in: Proceedings of the NAACL International Conference, 2015, pp. 1244–1249.
[33] S. Zhang, D. Zheng, X. Hu, M. Yang, Bidirectional long short-term memory networks for relation classification, in: Proceedings of the Pacific Asia Conference on Language, Information and Computation, 2015, pp. 73–78.
[34] L. Sun, X. Han, A feature-enriched tree kernel for relation extraction, in: Proceedings of the 52nd ACL International Conference, 2014, pp. 61–67.
[35] D. Roth, W.-t. Yih, Global inference for entity and relation identification via a linear programming formulation, in: Introduction to Statistical Relational Learning, 2007, pp. 553–580.
[36] B. Yang, C. Cardie, Joint inference for fine-grained opinion extraction, in: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 2013, pp. 1640–1649.
[37] S. Singh, S. Riedel, B. Martin, J. Zheng, A. McCallum, Joint inference of entities, relations, and coreference, in: Proceedings of the 2013 Workshop on Automated Knowledge Base Construction, ACM, 2013, pp. 1–6.
[38] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proc. IEEE 86 (11) (1998) 2278–2324.
[39] J. Yu, X. Yang, F. Gao, D. Tao, Deep multimodal distance metric learning using click constraints for image ranking, IEEE Trans. Cybern. (2016), doi:10.1109/TCYB.2016.2591583.
[40] J. Yu, B. Zhang, Z. Kuang, D. Lin, J. Fan, Image privacy protection by identifying sensitive objects via deep multi-task learning, IEEE Trans. Inf. Forensics Secur. (2016).
[41] Y. Kim, Convolutional neural networks for sentence classification, in: Proceedings of the EMNLP International Conference, 2014.
[42] N. Kalchbrenner, E. Grefenstette, P. Blunsom, A convolutional neural network for modelling sentences, in: Proceedings of the 52nd ACL International Conference, 2014.
[43] P. Wang, B. Xu, J. Xu, G. Tian, C.-L. Liu, H. Hao, Semantic expansion using word embedding clustering and convolutional neural network for improving short text classification, Neurocomputing 174 (2016) 806–814.
[44] X. Zhu, P. Sobihani, H. Guo, Long short-term memory over recursive structures, in: Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 2015, pp. 1604–1612.
[45] A. Graves, Supervised Sequence Labelling, Springer, 2012.
[46] M.-T. Luong, I. Sutskever, Q.V. Le, O. Vinyals, W. Zaremba, Addressing the rare word problem in neural machine translation, in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, 2015, pp. 11–19.
[47] R. Kiros, Y. Zhu, R.R. Salakhutdinov, R. Zemel, R. Urtasun, A. Torralba, S. Fidler, Skip-thought vectors, in: Proceedings of the Advances in Neural Information Processing Systems, 2015, pp. 3276–3284.
[48] L. Ratinov, D. Roth, Design challenges and misconceptions in named entity recognition, in: Proceedings of the Thirteenth Conference on Computational Natural Language Learning, Association for Computational Linguistics, 2009, pp. 147–155.
[49] N. Kalchbrenner, E. Grefenstette, P. Blunsom, A convolutional neural network for modelling sentences, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2014.
[50] K. Duan, S.S. Keerthi, W. Chu, S.K. Shevade, A.N. Poo, Multi-category classification by soft-max combination of binary classifiers, in: Multiple Classifier Systems, Springer, 2003, pp. 125–134.
[51] G.E. Dahl, T.N. Sainath, G.E. Hinton, Improving deep neural networks for LVCSR using rectified linear units and dropout, in: Proceedings of the ICASSP, 2013, pp. 8609–8613.
[52] T. Tieleman, G. Hinton, Lecture 6.5-rmsprop, COURSERA: Neural Networks for Machine Learning (2012).
[53] J. Lafferty, A. McCallum, F. Pereira, Conditional random fields: probabilistic models for segmenting and labeling sequence data, in: Proceedings of the Eighteenth International Conference on Machine Learning, ICML, vol. 1, 2001, pp. 282–289.
[54] S.J. Phillips, R.P. Anderson, R.E. Schapire, Maximum entropy modeling of species geographic distributions, Ecol. Modell. 190 (3) (2006) 231–259.

Suncong Zheng is a Ph.D. candidate at the Institute of Automation, Chinese Academy of Sciences. He received his B.S. degree from Tianjin University, China, in 2012. His research interests include information extraction and web/text mining.

Yuexing Hao is an M.S. candidate at the Institute of Automation, Chinese Academy of Sciences. She received her B.S. degree from the University of Science and Technology Beijing, China, in 2014. Her research interests include information extraction and web/text mining.

Dongyuan Lu is a Lecturer with the School of Information Technology and Management, University of International Business and Economics, Beijing, China. She received the B.S. degree from Beijing Normal University, Beijing, China, in 2007, and the Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2012. She then continued her research work at the National University of Singapore as a research fellow for two years. Her research interests include social media analysis, information retrieval, and data mining.

Hongyun Bao is an assistant researcher at the Institute of Automation, Chinese Academy of Sciences. She received her B.S. degree from the School of Mathematical Sciences, Capital Normal University, China, in 2008, and her Ph.D. degree from the Chinese Academy of Sciences in 2013. Her research interests include information extraction and web/text mining.

Jiaming Xu is an assistant researcher at the Institute of Automation, Chinese Academy of Sciences. He received his M.S. degree from the University of Science and Technology Beijing, China, in 2012, and his Ph.D. degree from the Chinese Academy of Sciences in 2016. His research interests include information extraction and web/text mining.

Hongwei Hao is the deputy director of the Interactive Digital Media Technology Research Center, Institute of Automation, Chinese Academy of Sciences. His research interests include semantic computation, pattern recognition, machine learning, and image processing. He has published over 50 papers in Chinese journals, international journals and conferences.

Bo Xu is a Professor at the Institute of Automation, Chinese Academy of Sciences. He received the B.S. degree from Zhejiang University in 1988. From 1988 he worked on speech recognition research, receiving the Master's and Doctor's degrees in the field in 1992 and 1997, respectively. He is now the President of CASIA and serves on committees of the National High-tech Program in the fields of Chinese information processing, multimedia and virtuality. He has published more than 100 papers in major journals and proceedings, including IEEE Transactions.