
Deep Learning for Chinese Word Segmentation and POS Tagging

peghoty (peghoty@163.com)

Contents

1 Introduction
2 The Neural Network Architecture
  2.1 Mapping Characters into Feature Vectors
  2.2 Tag Scoring
  2.3 Tag Inference
  2.4 Training
    2.4.1 Sentence-Level Log-Likelihood
  2.5 A New Training Method
3 Experiments
  3.1 Tagging Scheme
  3.2 The Choice of Hyper-parameters
  3.3 Closed Test on the SIGHAN Bakeoff
  3.4 Combined Approach
4 Summary
"VjSZ (DL, Deep Learning) "; (CWS, Chinese word segmentation);! (POS tagging)dG. jSTTd26  ,?bYZ 6 (task-specific feature engineering). jCj!Y (unlabeled data)v"{_ (internal representation), >\jv_ (representation) x.. maximum-likelihoodheX{D," 5 perceptron-stylezed\9y { 1,b5Z2 W~C^. 2
§1 Introduction

Chinese word segmentation (CWS) and POS tagging are fundamental tasks in Chinese NLP, and most downstream applications build on their output. Two strategies are common: pipelined systems, which segment the sentence first and then tag the resulting words, and joint solutions, which perform segmentation and tagging simultaneously. Pipelined systems are simple but propagate segmentation errors into the tagging stage; joint solutions reach state-of-the-art accuracy, yet they come with several drawbacks:

1. the combined search space is much larger, which slows down decoding;
2. the number of features grows sharply, so the model is prone to overfitting;
3. heavy feature engineering and linguistic knowledge are required;
4. the computational cost is high, which limits practical use.

Remark 1.1. A pipelined system first segments the sentence into words and then assigns a POS tag to each word; a joint solution performs segmentation and tagging in a single step.

Current state-of-the-art systems all rely on large numbers of handcrafted features, and designing such features demands task-specific linguistic knowledge. The work summarized here therefore makes two contributions: (1) a perceptron-style training algorithm is proposed as an alternative to the sentence-level maximum-likelihood method; it is easier to implement and speeds up training considerably; (2) Deep Learning is applied to the two NLP tasks of CWS and POS tagging, with large-scale unlabeled data used to improve the internal representations of Chinese characters.
§2 The Neural Network Architecture

Traditional supervised approaches to CWS and POS tagging, such as those based on conditional random fields (CRFs), depend on manually designed feature templates. Designing good templates is labor-intensive and leans heavily on human ingenuity and linguistic intuition. In 2003, Bengio et al. proposed training a neural network as a language model ([2]); in 2011, Collobert et al. applied deep neural networks to a range of NLP tasks ([3]). In this line of work the features are not handcrafted: the network learns internal representations directly from data, and the same architecture transfers across tasks with little modification. Figure 1 shows the neural network architecture used here.

[Figure 1: The neural network architecture]

The network processes an input sentence in three stages: each character is first mapped to a feature vector; a scoring layer then produces, for every character, a score for each possible tag; finally, the scores form a lattice (graph) over which the Viterbi algorithm finds the best tag sequence (tag inference).
§2.1 Mapping Characters into Feature Vectors

To be processed by the network, each Chinese character must first be mapped to a feature vector. Let D be the character dictionary and M ∈ R^{d×|D|} the embedding matrix whose columns are the feature vectors; the dimension d is a hyper-parameter. Given a sentence c_{[1:n]} of n characters c_i, 1 ≤ i ≤ n, let k_i be the index of c_i in D, and let e_{k_i} ∈ R^{|D|} be the corresponding one-hot vector (the k_i-th component is 1, all others are 0). The feature vector of c_i is then

    ZD(c_i) = M e_{k_i} \in \mathbb{R}^d.                                  (2.1)

This layer is called the lookup table layer (also the projection layer).

The feature vectors (that is, the matrix M) are initialized randomly or by pre-training, and are updated by back-propagation together with the other parameters during training. The learned character representations are not tied to one task: they can be reused in related tasks such as named entity recognition. In addition, discrete features computed from unlabeled data, such as boundary entropy and accessor variety, can be appended to the character vectors as extra dimensions.

Remark 2.2 (boundary entropy and accessor variety). In 1948 Shannon introduced information entropy to measure the uncertainty of a random variable: the more uncertain the variable, the larger its entropy. The position right after a complete word can be followed by many different characters, so the uncertainty there is high; boundary entropy (BE) exploits this to locate word boundaries. For a string w_1 w_2 ⋯ w_k, let p(w | w_1 w_2 ⋯ w_k) denote the probability that character w follows w_1 w_2 ⋯ w_k in the corpus; depending on which side of the string is considered, there are a left and a right variant, BE_l and BE_r. Accessor variety (AV) captures the same intuition by counting: let RL_{av}(w_1 w_2 ⋯ w_k) be the number of distinct characters that can appear next to w_1 w_2 ⋯ w_k; again there are left and right variants, AV_l and AV_r. The two statistics are

    AV(w_1 w_2 \cdots w_k) = \log RL_{av}(w_1 w_2 \cdots w_k),             (2.2)

    BE(w_1 w_2 \cdots w_k) = -\sum_{w \in C} p(w \mid w_1 w_2 \cdots w_k) \log p(w \mid w_1 w_2 \cdots w_k),   (2.3)

where C is the character set.
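To make the lookup and the unlabeled-data statistics concrete, here is a minimal Python sketch of the lookup table layer (2.1) and of the right-side AV/BE statistics of (2.2)-(2.3). It is an illustration, not the paper's code: the toy dictionary D, the embedding dimension, the random initialization, and all function names are assumptions.

```python
import math
import numpy as np
from collections import Counter

# Toy dictionary D and embedding dimension d -- both illustrative.
D = {ch: k for k, ch in enumerate("我是中国人<>")}  # '<', '>' stand in for start/stop
d = 4                                        # embedding dimension (hyper-parameter)
rng = np.random.default_rng(0)
M = rng.normal(scale=0.1, size=(d, len(D)))  # M in R^{d x |D|}, updated by BP

def ZD(ch):
    """Eq. (2.1): ZD(c_i) = M e_{k_i}; multiplying M by the one-hot
    vector e_{k_i} just selects the k_i-th column of M."""
    e = np.zeros(len(D)); e[D[ch]] = 1.0
    return M @ e                             # identical to M[:, D[ch]]

def right_av_and_be(corpus, prefix):
    """Right accessor variety (2.2) and right boundary entropy (2.3)
    of `prefix`, estimated from an unlabeled corpus by naive scanning."""
    followers = Counter()
    k = len(prefix)
    for sent in corpus:
        for i in range(len(sent) - k):
            if sent[i:i + k] == prefix:
                followers[sent[i + k]] += 1
    total = sum(followers.values())
    if total == 0:
        return 0.0, 0.0
    av = math.log(len(followers))            # AV_r = log RL_av
    be = -sum(n / total * math.log(n / total) for n in followers.values())
    return av, be
```

The one-hot multiplication is spelled out only to mirror (2.1); in practice the lookup is implemented as a column slice, which is what makes this layer cheap.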
 ... ... (2.4) f 1 θ (ci) = ZD(ci) . ZD(ci− w ) 2   (· · · f 1 θ (·) · · · )). fθ(·) = f L θ (f L−1 §2.2 Tag Scoring X{TTd\b b\ θ! fθ(·),^X{ L0iTT, fθd \A/\/F^ ^E\X{,TT;Dd X{ tagG. ^\^g\X#, j376 (window approach),: fhe:VX{ tagVY6.~=4,}VX{E\ c[1:n], C> w:f (:f? c1a℄X  cn).^ ci,;}d_ w/^gDi/|{k,^g start stop.   2.3:f\ ciC,|:6,Z w,b< wg  . (2.5) w .~;/: ( 1SB). θ ∈ Rwd,DvE|{#5,X{j5/ V f 1  gj :xZj5 . T #!1,}T|{5, ;}d\_ ,TE W 2 ∈ RH×wd, W 3 ∈ R|T |×H,\2Re b2 ∈ RH, b3 ∈ R|T |`V {!, H_e"eE{ (dT!),/ gj sigmoid fθ(ci) j{}_ ci! T j{x (score).  2.4" [3]j/b HardTanh,V` {ZT,d2 sigmoid HardTanh2VXRR. θ (ci) + b2) + b3 ∈ R|T |, θ (ci)))) = W 3g(W 2f 1 fθ(ci) = f 3 θ (g(f 2 θ (f 1 HardT anh(x) = −1, x < −1;   −1 ≤ x ≤ 1; ZD(ci+ w ) 2 x > 1. g(x) = 1 . 1 + e−x (2.6) (2.7) (2.5) (2.8) 2 θ x, 1, 6
§2.3 Tag Inference

In sequence labeling, the tags of neighbouring characters are strongly dependent, so tag decisions should not be made for each character in isolation. Introduce a transition score A_{ij} for jumping from tag i ∈ T to tag j ∈ T in successive characters, and an initial score A_{0i} for starting from the i-th tag. For a sentence c_{[1:n]} the network outputs the scores f_θ(c_{[1:n]}): since each f_θ(c_i) is a |T|-dimensional vector, stacking f_θ(c_1), f_θ(c_2), ⋯, f_θ(c_n) as columns gives a matrix B = (B_{ij}) ∈ R^{|T|×n}, and we write f_θ(t|i) = B_{ti} for the score of tag t at the i-th character. The score of a tag path t_{[1:n]} is the sum of the transition and network scores along the path:

    s(c_{[1:n]}, t_{[1:n]}, \theta) = \sum_{i=1}^{n} \left( A_{t_{i-1} t_i} + f_\theta(t_i \mid i) \right).   (2.9)

Example 2.1. Take the tag set T = {S, B, I, E}. Consider a five-character sentence c_{[1:5]} whose first two characters are single-character words (tag S) and whose last three characters form one word (tags B, I, E), so that t_{[1:5]} = {1, 1, 2, 3, 4}, with the convention t_0 = 0. By (2.9),

    s(c_{[1:5]}, t_{[1:5]}, \theta) = (A_{01} + f_\theta(1|1)) + (A_{11} + f_\theta(1|2)) + (A_{12} + f_\theta(2|3)) + (A_{23} + f_\theta(3|4)) + (A_{34} + f_\theta(4|5)).

Given a sentence c_{[1:n]}, the predicted tag path is the one that maximizes the sentence score:

    t^*_{[1:n]} = \arg\max_{\tilde{t}_{[1:n]}} s(c_{[1:n]}, \tilde{t}_{[1:n]}, \theta),   (2.10)

and (2.10) is solved efficiently with the Viterbi algorithm.

Remark 2.5. Enumerating tag paths directly would require scoring |T|^n paths, a number exponential in the sentence length n, so brute force is impractical for long sentences. Note also that the transition scores A are themselves parameters, so the whole model (embeddings, scoring network, and transitions) can be trained jointly, end-to-end.
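Solving (2.10) by enumeration would touch |T|^n paths (Remark 2.5); the Viterbi recursion below does it in O(n·|T|²). A sketch under the same assumptions as above: B is the |T|×n score matrix, A the transition matrix, and A0 stands for the initial scores A_{0i}.

```python
import numpy as np

def viterbi(B, A, A0):
    """Eq. (2.10): t* = argmax over tag paths of the path score (2.9).
    B  : |T| x n matrix, B[t, i] = f_theta(t|i)
    A  : |T| x |T| transition scores A_ij
    A0 : length-|T| initial scores A_{0i}"""
    n_tags, n = B.shape
    delta = A0 + B[:, 0]              # best score of a path ending in tag t at i = 0
    back = np.zeros((n_tags, n), dtype=int)
    for i in range(1, n):
        cand = delta[:, None] + A     # cand[s, t]: best path ending in s, then s -> t
        back[:, i] = np.argmax(cand, axis=0)
        delta = np.max(cand, axis=0) + B[:, i]
    path = [int(np.argmax(delta))]    # best final tag
    for i in range(n - 1, 0, -1):
        path.append(int(back[path[-1], i]))
    return path[::-1]                 # t*_{[1:n]} as a list of tag indices
```

viterbi(B, A, A0) returns the indices of the best path; mapping them through T gives the tag sequence itself.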
§2.4 Training

All trainable parameters are collected in θ = (M, W^2, b^2, W^3, b^3, A). Given a training set R of pairs (c, t), where c is a sentence and t its tag path (both carry the subscript [1:n], dropped here for brevity), maximum-likelihood training maximizes the sentence-level log-likelihood

    \sum_{\forall (c,t) \in R} \log p(t \mid c, \theta),                   (2.11)

where the conditional probability p(·) is defined in §2.4.1 below. The maximization of (2.11) proceeds by stochastic gradient ascent: repeatedly draw a training pair (c, t) at random and update

    \theta \leftarrow \theta + \lambda \, \frac{\partial \log p(t \mid c, \theta)}{\partial \theta},   (2.12)

where λ is the learning rate (a hyper-parameter). The gradients in (2.12) are computed with the back-propagation algorithm.

Remark 2.6. [3] formulates training as stochastic gradient descent on a cost function (the negative log-likelihood); gradient ascent on the log-likelihood as in (2.12) is the same procedure with the sign flipped.

§2.4.1 Sentence-Level Log-Likelihood

The path score (2.9) is turned into a conditional probability by exponentiating and normalizing over all possible tag paths:

    p(t \mid c, \theta) = \frac{e^{s(c,t,\theta)}}{\sum_{\tilde{t}} e^{s(c,\tilde{t},\theta)}} \in (0, 1),   (2.13)

so the log-likelihood of a single sentence is

    \log p(t \mid c, \theta) = s(c,t,\theta) - \log \sum_{\tilde{t}} e^{s(c,\tilde{t},\theta)}.   (2.14)

Remark 2.7. For fixed θ, a sentence c_{[1:n]} admits |T|^n possible tag paths. Each path t receives a score s(c,t,θ); exponentiating makes the scores positive, and dividing e^{s(c,t,θ)} by the total over all paths yields a proper probability for path t, which is exactly (2.13).

Note that the logadd term in (2.14) sums over all (that is, |T|^n) tag paths; computed naively it is intractable, but it factorizes position by position and can be evaluated by a forward dynamic-programming recursion. The gradients of log p(t|c,θ) with respect to the network scores f_θ(t|i) and the transitions A_{ij} are obtained along the same recursion.
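The logadd in (2.14) has the same structure as the Viterbi recursion, with max replaced by logsumexp, so it also costs O(n·|T|²) rather than O(|T|^n). A sketch under the same assumptions (B, A, A0 as above); scipy's logsumexp is used for numerical stability, and the gradient step (2.12) is left to an autodiff framework or to hand-derived back-propagation.

```python
import numpy as np
from scipy.special import logsumexp

def log_partition(B, A, A0):
    """log sum over all |T|^n tag paths of exp(s(c, t, theta)) -- the
    second term of eq. (2.14) -- via a forward recursion over positions."""
    n_tags, n = B.shape
    alpha = A0 + B[:, 0]              # alpha[t] = logadd over paths ending in tag t
    for i in range(1, n):
        alpha = logsumexp(alpha[:, None] + A, axis=0) + B[:, i]
    return logsumexp(alpha)

def log_likelihood(B, A, A0, tags):
    """Eq. (2.14): log p(t|c,theta) = s(c,t,theta) - log sum exp(s(c,~t,theta)),
    with the path score s accumulated as in eq. (2.9)."""
    s = A0[tags[0]] + B[tags[0], 0]
    for i in range(1, len(tags)):
        s += A[tags[i - 1], tags[i]] + B[tags[i], i]
    return s - log_partition(B, A, A0)
```

Comparing log_partition with the viterbi sketch above makes the max/logadd correspondence explicit: the two loops are identical except for the reduction used.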