

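Reference Answers

1Q1 Let $\omega_{\max}(x)$ be the state of nature for which $P(\omega_{\max}|x) \ge P(\omega_i|x)$ for all $i$, $i = 1, \dots, c$.
(a) Show that $P(\omega_{\max}|x) \ge 1/c$.
(b) Show that for the minimum-error-rate decision rule the average probability of error is given by $P(\mathrm{error}) = 1 - \int P(\omega_{\max}|x)\,p(x)\,dx$.
(c) Use these two results to show that $P(\mathrm{error}) \le (c-1)/c$.
(d) Describe a situation for which $P(\mathrm{error}) = (c-1)/c$.

Solution:
(a) Since $P(\omega_{\max}|x) \ge P(\omega_i|x)$ for every $i$, adding up $c$ copies of the left-hand side gives
$$c\,P(\omega_{\max}|x) = \underbrace{P(\omega_{\max}|x) + \cdots + P(\omega_{\max}|x)}_{c} \ge P(\omega_1|x) + \cdots + P(\omega_c|x) = 1,$$
so $c\,P(\omega_{\max}|x) \ge 1$, that is, $P(\omega_{\max}|x) \ge 1/c$.
(b) For a given $x$, the posterior $P(\omega_{\max}|x)$ is the largest, so the minimum-error-rate rule assigns $x$ to $\omega_{\max}$ and
$$P(\mathrm{error}|x) = 1 - P(\omega_{\max}|x).$$
Therefore
$$P(\mathrm{error}) = \int P(\mathrm{error}, x)\,dx = \int P(\mathrm{error}|x)\,p(x)\,dx = \int [1 - P(\omega_{\max}|x)]\,p(x)\,dx = 1 - \int P(\omega_{\max}|x)\,p(x)\,dx.$$
(c) Because $P(\omega_{\max}|x) \ge 1/c$,
$$P(\mathrm{error}) = 1 - \int P(\omega_{\max}|x)\,p(x)\,dx \le 1 - \int \frac{1}{c}\,p(x)\,dx = 1 - \frac{1}{c} = \frac{c-1}{c}.$$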
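(d) By Bayes' rule, $P(\omega_i|x) = \dfrac{p(x|\omega_i)\,p(\omega_i)}{p(x)}$. When the value of $p(x|\omega_i)\,p(\omega_i)$ does not depend on $\omega_i$, every posterior equals $1/c$ and $P(\mathrm{error}) = (c-1)/c$.

1Q2 In many pattern classification problems one has the option either to assign the pattern to one of $c$ classes, or to reject it as being unrecognizable. If the cost for rejects is not too high, rejection may be a desirable action. Let
$$\lambda(\alpha_i|\omega_j) = \begin{cases} 0 & i = j, \quad i, j = 1, \dots, c \\ \lambda_r & i = c + 1 \\ \lambda_s & \text{otherwise} \end{cases}$$
where $\lambda_r$ is the loss incurred for choosing the $(c+1)$th action, rejection, and $\lambda_s$ is the loss incurred for making a substitution error. Show that the minimum risk is obtained if we decide $\omega_i$ if $P(\omega_i|x) \ge P(\omega_j|x)$ for all $j$ and if $P(\omega_i|x) \ge 1 - \lambda_r/\lambda_s$, and reject otherwise. What happens if $\lambda_r = 0$? What happens if $\lambda_r > \lambda_s$?

Solution:
For a sample $x$, consider the risk of assigning it to class $\omega_i$:
$$R_i = \sum_{j=1}^{c} \lambda(\alpha_i|\omega_j)\,P(\omega_j|x) = 0 \times P(\omega_i|x) + \sum_{j \ne i} \lambda_s\,P(\omega_j|x) = \lambda_s[1 - P(\omega_i|x)].$$
Among the $c$ classes, $R_i$ is smallest for the class with the largest posterior. If the sample is rejected instead, the risk is $R_r = \sum_{j=1}^{c} \lambda_r\,P(\omega_j|x) = \lambda_r$.
When $P(\omega_i|x) \ge 1 - \lambda_r/\lambda_s$, compare $R_i$ with $R_r$:
$$R_i - R_r = \lambda_s[1 - P(\omega_i|x)] - \lambda_r \le \lambda_s\left(1 + \frac{\lambda_r}{\lambda_s} - 1\right) - \lambda_r = 0,$$
so $R_i - R_r \le 0$: the risk of deciding $\omega_i$ is no larger than the risk of rejecting. Otherwise $R_i > R_r$ and rejection has the smaller risk.
When $\lambda_r = 0$, rejection carries no risk while classification carries a nonnegative risk, so every sample is rejected (unless some posterior equals 1).
When $\lambda_r > \lambda_s$, we have $R_r = \lambda_r > \lambda_s \ge R_i = \lambda_s[1 - P(\omega_i|x)]$, that is, $R_r > R_i$: the risk of rejecting always exceeds the risk of classifying, so the reject option is never used.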
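The reject rule derived in 1Q2 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `decide_with_reject` and its argument names are assumptions made here, not part of the original answer. It compares $R_i = \lambda_s[1 - P(\omega_i|x)]$ for the most probable class against $R_r = \lambda_r$.

```python
def decide_with_reject(posteriors, loss_reject, loss_substitution):
    """Return the index of the chosen class, or None to reject.

    posteriors: list of P(omega_i | x) summing to 1 (hypothetical input format).
    """
    # The class with the largest posterior has the smallest classification risk.
    best = max(range(len(posteriors)), key=lambda i: posteriors[i])
    risk_classify = loss_substitution * (1.0 - posteriors[best])  # R_i
    risk_reject = loss_reject                                     # R_r
    # Classify when R_i <= R_r, i.e. P(omega_i|x) >= 1 - lambda_r / lambda_s.
    return best if risk_classify <= risk_reject else None


posteriors = [0.7, 0.2, 0.1]
print(decide_with_reject(posteriors, 0.2, 1.0))  # threshold 0.8: posterior too low
print(decide_with_reject(posteriors, 0.5, 1.0))  # threshold 0.5: classify
print(decide_with_reject(posteriors, 2.0, 1.0))  # lambda_r > lambda_s: never reject
```

Comparing the two risks directly, rather than testing the posterior against $1 - \lambda_r/\lambda_s$, avoids a division and makes the two special cases ($\lambda_r = 0$ and $\lambda_r > \lambda_s$) fall out of the same comparison.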
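1Q3 Now we have $N$ samples, and each sample $x_i$, $i = 1, \dots, N$ has $d$ dimensions. Please provide us the proofs and the pseudo-codes of PCA algorithm.

Solution:
PCA is a dimensionality reduction method: it projects the samples onto a subspace chosen so that the variance of the projected samples is as large as possible.

Pseudo-code:
Step 1. Center the whole data set:
$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad x_i^c = x_i - \mu, \quad x_i \in \mathbb{R}^{d \times 1}, \; i = 1, \dots, N, \qquad X = (x_1^c, x_2^c, \dots, x_N^c).$$
Step 2. Compute the covariance (scatter) matrix $\Sigma = X X^T$.
Step 3. Compute the eigenvalues of $\Sigma$, sort them from largest to smallest as $\lambda_1, \lambda_2, \dots, \lambda_d$, and collect them in a diagonal matrix; the corresponding eigenvectors form a $d \times d$ matrix:
$$\Lambda = \mathrm{Diag}(\lambda_1, \lambda_2, \dots, \lambda_d), \qquad \Phi = (\varphi_1, \varphi_2, \dots, \varphi_d).$$
Step 4. Take the $m$ eigenvectors with the largest eigenvalues as the projection matrix $W = (\varphi_1, \varphi_2, \dots, \varphi_m)$.
Step 5. Reduce all the data to $m$ dimensions: $Z = W^T X$.

Proof:
PCA requires the variance of the data after projection to be as large as possible. Directions along which the projected variance is large are kept as principal components; directions along which it is small are discarded.
Choose a projection vector $\varphi_j$. The coordinate of the centered sample $x_i^c$ along $\varphi_j$ is $\varphi_j^T x_i^c$, so the total squared projection onto $\varphi_j$ is
$$E_j = \sum_{i=1}^{N} (\varphi_j^T x_i^c)^2 = \varphi_j^T X X^T \varphi_j = \varphi_j^T \Sigma \varphi_j.$$
The goal is to find the $\varphi_j$ that maximizes $E_j$ subject to the constraint $\varphi_j^T \varphi_j - 1 = 0$. Using a Lagrange multiplier $\lambda_j$, the objective becomes
$$J(\varphi_j) = \varphi_j^T \Sigma \varphi_j - \lambda_j(\varphi_j^T \varphi_j - 1).$$
Setting the derivative with respect to $\varphi_j$ to zero gives
$$\frac{\partial J}{\partial \varphi_j} = 2\Sigma \varphi_j - 2\lambda_j \varphi_j = 0 \quad\Longrightarrow\quad \Sigma \varphi_j = \lambda_j \varphi_j.$$
So $\varphi_j$ satisfies the eigenvector condition for $\Sigma$, and $E_j = \varphi_j^T \Sigma \varphi_j = \lambda_j$. The projected variance is therefore largest along the eigenvectors with the largest eigenvalues, which is why Step 4 selects them.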
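The PCA steps above can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the function name `pca`, the rows-as-samples input layout, and the synthetic test data are choices made here for illustration. It uses the unnormalized matrix $X X^T$ exactly as in the text, and `np.linalg.eigh` because that matrix is symmetric.

```python
import numpy as np


def pca(samples, m):
    """Project samples onto the m leading principal directions.

    samples: (N, d) array with one sample per row (assumed layout); m: target dimension.
    """
    X = np.asarray(samples, dtype=float).T       # d x N, columns are samples
    mu = X.mean(axis=1, keepdims=True)           # d x 1 sample mean
    Xc = X - mu                                  # centered data matrix
    scatter = Xc @ Xc.T                          # d x d matrix X X^T
    eigvals, eigvecs = np.linalg.eigh(scatter)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :m]                           # d x m projection matrix
    Z = W.T @ Xc                                 # m x N projected data
    return Z, W, eigvals, mu


# Synthetic data with very different variances along the three axes.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.1])
Z, W, eigvals, mu = pca(data, 2)
print(Z.shape, eigvals)
```

The sum of squares of the first row of `Z` equals the largest eigenvalue, which is the identity $E_j = \lambda_j$ from the proof.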
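1Q4 Consider the following decision rule for a two-category one-dimensional problem: Decide $\omega_1$ if $x > \theta$; otherwise decide $\omega_2$.
(a) Show the probability of error for this rule is given by
$$P(\mathrm{error}) = P(\omega_1) \int_{-\infty}^{\theta} p(x|\omega_1)\,dx + P(\omega_2) \int_{\theta}^{\infty} p(x|\omega_2)\,dx$$
(b) By differentiating, show that a necessary condition to minimize $P(\mathrm{error})$ is that $\theta$ satisfy $p(\theta|\omega_1)P(\omega_1) = p(\theta|\omega_2)P(\omega_2)$.
(c) Does this equation define $\theta$ uniquely?
(d) Give an example where a value of $\theta$ satisfying the equation actually maximizes the probability of error.

Solution:
(a) When $x \le \theta$ the rule decides $\omega_2$, so the error contributed by the interval $(-\infty, \theta]$ is the probability that $x$ actually belongs to $\omega_1$:
$$P_1(\mathrm{error}) = \int_{-\infty}^{\theta} p(\omega_1|x)\,p(x)\,dx = \int_{-\infty}^{\theta} \frac{p(x, \omega_1)}{p(x)}\,p(x)\,dx = \int_{-\infty}^{\theta} \frac{p(x|\omega_1)\,p(\omega_1)}{p(x)}\,p(x)\,dx = p(\omega_1) \int_{-\infty}^{\theta} p(x|\omega_1)\,dx.$$
Similarly, $P_2(\mathrm{error}) = p(\omega_2) \int_{\theta}^{\infty} p(x|\omega_2)\,dx$, so
$$P(\mathrm{error}) = p(\omega_1) \int_{-\infty}^{\theta} p(x|\omega_1)\,dx + p(\omega_2) \int_{\theta}^{\infty} p(x|\omega_2)\,dx.$$
(b) Differentiating with respect to $\theta$:
$$\frac{dP(\mathrm{error})}{d\theta} = p(\omega_1)\,p(\theta|\omega_1) - p(\omega_2)\,p(\theta|\omega_2).$$
Setting the derivative to zero gives the necessary condition $p(\theta|\omega_1)P(\omega_1) = p(\theta|\omega_2)P(\omega_2)$.
(c) No. The equation does not determine $\theta$ uniquely: more than one value of $\theta$ may satisfy
$$\frac{p(\theta|\omega_1)}{p(\theta|\omega_2)} = \frac{p(\omega_2)}{p(\omega_1)}.$$
(d) The rule decides $\omega_2$ when $x \le \theta$ and $\omega_1$ otherwise. If most of the probability mass of $\omega_2$ lies in $(\theta, \infty)$ while most of the probability mass of $\omega_1$ lies in $(-\infty, \theta)$, then the rule gets most samples wrong and its error rate is maximal. For instance, with $P(\omega_1) = P(\omega_2) = 1/2$, $p(x|\omega_1) = N(-1, 1)$ and $p(x|\omega_2) = N(1, 1)$, the value $\theta = 0$ satisfies the condition in (b) but maximizes $P(\mathrm{error})$.

1Q5 Consider the multivariate normal density for which $\sigma_{ij} = 0$ and $\sigma_{ii} = \sigma_i^2$, i.e., $\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \dots, \sigma_d^2)$.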
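(a) Show that the evidence is
$$p(x) = \frac{1}{\prod_{i=1}^{d} \sqrt{2\pi}\,\sigma_i} \exp\left[-\frac{1}{2} \sum_{i=1}^{d} \left(\frac{x_i - \mu_i}{\sigma_i}\right)^2\right]$$
(b) Plot and describe the contours of constant density.
(c) Write an expression for the Mahalanobis distance from $x$ to $\mu$.

Solution:
(a) For the general normal density,
$$p(x) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right].$$
Substituting the conditions of the problem ($|\Sigma|^{1/2} = \prod_{i=1}^{d} \sigma_i$ and $\Sigma^{-1} = \mathrm{diag}(1/\sigma_1^2, \dots, 1/\sigma_d^2)$) gives
$$p(x) = \frac{1}{(2\pi)^{d/2} \prod_{i=1}^{d} \sigma_i} \exp\left[-\frac{1}{2} \sum_{i=1}^{d} \left(\frac{x_i - \mu_i}{\sigma_i}\right)^2\right].$$
(b) Take two vectors $x^1$ and $x^2$ with $p(x^1) = p(x^2)$. Then
$$\sum_{i=1}^{d} \left(\frac{x_i^1}{\sigma_i} - \frac{\mu_i}{\sigma_i}\right)^2 = \sum_{i=1}^{d} \left(\frac{x_i^2}{\sigma_i} - \frac{\mu_i}{\sigma_i}\right)^2,$$
that is, the points $[x_1^1/\sigma_1, x_2^1/\sigma_2, \dots, x_d^1/\sigma_d]$ and $[x_1^2/\sigma_1, x_2^2/\sigma_2, \dots, x_d^2/\sigma_d]$ are at the same distance from the point $[\mu_1/\sigma_1, \dots, \mu_d/\sigma_d]$. The contours of constant density are therefore ellipses in two dimensions and ellipsoids in higher dimensions, centered at $\mu$ with axes parallel to the coordinate axes.
(c) The Mahalanobis distance from $x$ to $\mu$ is
$$r = \sqrt{\sum_{i=1}^{d} \left(\frac{x_i - \mu_i}{\sigma_i}\right)^2}.$$

1Q6 Let $p(x|\omega_i) \sim N(\mu_i, \sigma^2 I)$ for a two-category $d$-dimensional problem with $P(\omega_1) = P(\omega_2) = \frac{1}{2}$.
(a) Show that the minimum probability of error is given by
$$P_e = \frac{1}{\sqrt{2\pi}} \int_{a}^{\infty} e^{-u^2/2}\,du, \quad \text{where } a = \frac{\|\mu_2 - \mu_1\|}{2\sigma}.$$
(b) Let $\mu_1 = 0$ and $\mu_2 = (\mu_1, \dots, \mu_d)^t$. Use the inequality from [Pattern Classification, Chapter 2, Problem 31] to show that $P_e$ approaches zero as the dimension $d$ approaches infinity.
(c) Express the meaning of this result in words.

Solution:
(a)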
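Let $R_1$ be the region where we decide $\omega_1$ and $R_2$ the region where we decide $\omega_2$. Then
$$P(\mathrm{error}) = \int_{R_1} p(\omega_2|x)\,p(x)\,dx + \int_{R_2} p(\omega_1|x)\,p(x)\,dx = \int_{R_1} \frac{p(\omega_2, x)}{p(x)}\,p(x)\,dx + \int_{R_2} \frac{p(\omega_1, x)}{p(x)}\,p(x)\,dx$$
$$= p(\omega_2) \int_{R_1} p(x|\omega_2)\,dx + p(\omega_1) \int_{R_2} p(x|\omega_1)\,dx = \frac{1}{2} \int_{R_1} p(x|\omega_2)\,dx + \frac{1}{2} \int_{R_2} p(x|\omega_1)\,dx = \int_{R_1} p(x|\omega_2)\,dx,$$
where the last step uses the symmetry of the two classes.
For this problem the decision rule is concrete: decide $\omega_1$ if $x$ is closer to $\mu_1$, and decide $\omega_2$ if $x$ is closer to $\mu_2$. Consider the condition for deciding $\omega_1$, writing $\mu_{1,i}$ and $\mu_{2,i}$ for the $i$-th components of $\mu_1$ and $\mu_2$:
$$\sum_i \left[(x_i - \mu_{1,i})^2 - (x_i - \mu_{2,i})^2\right] = \sum_i \left[2(\mu_{2,i} - \mu_{1,i})\,x_i - (\mu_{2,i} - \mu_{1,i})(\mu_{2,i} + \mu_{1,i})\right] < 0.$$
Simplifying,
$$y = (\mu_2 - \mu_1)^T x < \frac{1}{2}\left(\mu_2^T \mu_2 - \mu_1^T \mu_1\right).$$
Here $y$ is a scalar. Under the two classes its means are $(\mu_2 - \mu_1)^T \mu_1$ and $(\mu_2 - \mu_1)^T \mu_2$ respectively, and its variance is $\sigma^2 \|\mu_2 - \mu_1\|^2$, so the vector problem reduces to a scalar one. The threshold equals the midpoint of the two means, so
$$P(\mathrm{error}) = \int_{R_1} p(x|\omega_2)\,dx = \frac{1}{\sqrt{2\pi}\,\sigma \|\mu_2 - \mu_1\|} \int_{-\infty}^{\frac{(\mu_2 - \mu_1)^T \mu_1 + (\mu_2 - \mu_1)^T \mu_2}{2}} \exp\left(-\frac{(y - (\mu_2 - \mu_1)^T \mu_2)^2}{2\sigma^2 \|\mu_2 - \mu_1\|^2}\right) dy.$$
Substituting $u = \dfrac{y - (\mu_2 - \mu_1)^T \mu_2}{\sigma \|\mu_2 - \mu_1\|}$ turns the upper limit into $-\dfrac{\|\mu_2 - \mu_1\|}{2\sigma}$, and by the symmetry of the integrand
$$P(\mathrm{error}) = \frac{1}{\sqrt{2\pi}} \int_{\frac{\|\mu_2 - \mu_1\|}{2\sigma}}^{\infty} \exp\left(-\frac{u^2}{2}\right) du.$$
(b) Combining this with the given inequality,
$$P_e \le \frac{1}{\sqrt{2\pi}\,\frac{\|\mu_2 - \mu_1\|}{2\sigma}}\, e^{-\frac{1}{2}\left(\frac{\|\mu_2 - \mu_1\|}{2\sigma}\right)^2},$$
and with $\mu_1 = 0$,
$$P_e \le \frac{2\sigma}{\sqrt{2\pi}\,\|\mu_2\|}\, e^{-\frac{\|\mu_2\|^2}{8\sigma^2}}.$$
As the dimension tends to infinity, the norm of the vector $\mu_2$ tends to infinity (provided the components do not vanish), and $P_e$ tends to zero.
(c) In words: the higher the dimension, the more separable the data become.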
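The result of 1Q6 can be checked numerically with the standard library alone. The sketch below is illustrative: the function names `bayes_error` and `tail_bound` and the choice of $\mu_2 = (1, \dots, 1)^t$, $\sigma = 1$ are assumptions made here. It evaluates $P_e = \frac{1}{\sqrt{2\pi}} \int_a^\infty e^{-u^2/2}\,du$ through the identity $P_e = \frac{1}{2}\,\mathrm{erfc}(a/\sqrt{2})$ and compares it with the upper bound used in part (b).

```python
import math


def bayes_error(mu1, mu2, sigma):
    """Minimum error for two equiprobable classes N(mu_i, sigma^2 I)."""
    dist = math.sqrt(sum((b - a) ** 2 for a, b in zip(mu1, mu2)))
    a = dist / (2.0 * sigma)
    # Gaussian tail: (1/sqrt(2*pi)) * integral from a to infinity of exp(-u^2/2) du
    return 0.5 * math.erfc(a / math.sqrt(2.0))


def tail_bound(a):
    """Upper bound on the Gaussian tail used in part (b); valid for a > 0."""
    return math.exp(-a * a / 2.0) / (math.sqrt(2.0 * math.pi) * a)


# mu1 = 0 and mu2 = (1, ..., 1): the distance grows like sqrt(d).
dims = (1, 4, 16, 64)
errors = [bayes_error([0.0] * d, [1.0] * d, 1.0) for d in dims]
for d, err in zip(dims, errors):
    a = math.sqrt(d) / 2.0
    print(d, err, tail_bound(a))
```

With these choices $a = \sqrt{d}/2$, so both the exact error and the bound shrink as $d$ grows, which is the behaviour part (b) describes.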
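2Q1 Let the sample mean $\hat{\mu}_n$ and the sample covariance matrix $C_n$ for a set of $n$ samples $x_1, \dots, x_n$ (each of which is $d$-dimensional) be defined by
$$\hat{\mu}_n = \frac{1}{n} \sum_{k=1}^{n} x_k \quad \text{and} \quad C_n = \frac{1}{n-1} \sum_{k=1}^{n} (x_k - \hat{\mu}_n)(x_k - \hat{\mu}_n)^t.$$
We call these the 'non-recursive' formulas.
(a) What is the computational complexity of calculating $\hat{\mu}_n$ and $C_n$ by these formulas?
(b) Show that alternative, 'recursive' techniques for calculating $\hat{\mu}_n$ and $C_n$ based on the successive addition of new samples $x_{n+1}$ can be derived using the recursion relations
$$\hat{\mu}_{n+1} = \hat{\mu}_n + \frac{1}{n+1}(x_{n+1} - \hat{\mu}_n) \quad \text{and} \quad C_{n+1} = \frac{n-1}{n} C_n + \frac{1}{n+1}(x_{n+1} - \hat{\mu}_n)(x_{n+1} - \hat{\mu}_n)^t.$$
(c) What is the computational complexity of finding $\hat{\mu}_n$ and $C_n$ by these recursive methods?

Solution:
(a) To compute the mean, each of the $n$ samples contributes $d$ additions, so the complexity is $O(dn)$. To compute the covariance matrix $C_n$, each sample requires $d^2$ operations and there are $n$ samples, so the complexity is $O(nd^2)$.
(b) Derivation of $\hat{\mu}_{n+1}$:
$$\hat{\mu}_{n+1} = \frac{1}{n+1} \sum_{i=1}^{n+1} x_i = \frac{1}{n+1}\left(x_{n+1} + \sum_{i=1}^{n} x_i\right) = \frac{1}{n+1}\left(x_{n+1} + n \times \frac{1}{n} \sum_{i=1}^{n} x_i\right) = \frac{1}{n+1}\,x_{n+1} + \frac{n}{n+1}\,\hat{\mu}_n = \hat{\mu}_n + \frac{1}{n+1}(x_{n+1} - \hat{\mu}_n).$$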
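The two recursion relations stated in part (b) can be sanity-checked numerically against the non-recursive formulas. The following pure-Python sketch is illustrative only: the function names and the small sample set are assumptions made here. Starting from a single sample, the initial covariance value is irrelevant because the factor $(n-1)/n$ is zero at $n = 1$.

```python
def batch_mean_cov(samples):
    """Non-recursive formulas: O(nd) for the mean, O(nd^2) for the covariance."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(d)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov


def recursive_update(mean, cov, n, x):
    """Fold sample x_{n+1} into the statistics of the first n samples (n >= 1)."""
    d = len(x)
    diff = [x[i] - mean[i] for i in range(d)]                   # x_{n+1} - mu_n
    new_mean = [mean[i] + diff[i] / (n + 1) for i in range(d)]  # O(d)
    scale = (n - 1) / n
    new_cov = [[scale * cov[i][j] + diff[i] * diff[j] / (n + 1)
                for j in range(d)] for i in range(d)]           # O(d^2)
    return new_mean, new_cov


def recursive_mean_cov(samples):
    d = len(samples[0])
    mean = list(samples[0])                    # mu_1 = x_1
    cov = [[0.0] * d for _ in range(d)]        # placeholder, multiplied by 0 at n = 1
    for n, x in enumerate(samples[1:], start=1):
        mean, cov = recursive_update(mean, cov, n, x)
    return mean, cov


samples = [[1.0, 2.0], [2.0, 0.5], [4.0, 3.0], [0.0, 1.5], [3.0, 2.5]]
print(batch_mean_cov(samples))
print(recursive_mean_cov(samples))
```

Each call to `recursive_update` touches every entry of the mean once and every entry of the covariance once, which matches the per-update costs discussed in part (c).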
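Derivation of $C_{n+1} = \frac{n-1}{n} C_n + \frac{1}{n+1}(x_{n+1} - \hat{\mu}_n)(x_{n+1} - \hat{\mu}_n)^t$:
$$C_{n+1} = \frac{1}{n} \sum_{k=1}^{n+1} (x_k - \hat{\mu}_{n+1})(x_k - \hat{\mu}_{n+1})^T$$
$$= \frac{1}{n} \sum_{k=1}^{n} (x_k - \hat{\mu}_{n+1})(x_k - \hat{\mu}_{n+1})^T + \frac{1}{n}(x_{n+1} - \hat{\mu}_{n+1})(x_{n+1} - \hat{\mu}_{n+1})^T$$
$$= \frac{1}{n} \sum_{k=1}^{n} \left[x_k - \hat{\mu}_n - \frac{1}{n+1}(x_{n+1} - \hat{\mu}_n)\right]\left[x_k - \hat{\mu}_n - \frac{1}{n+1}(x_{n+1} - \hat{\mu}_n)\right]^T + \frac{1}{n}\left[x_{n+1} - \hat{\mu}_n - \frac{1}{n+1}(x_{n+1} - \hat{\mu}_n)\right]\left[x_{n+1} - \hat{\mu}_n - \frac{1}{n+1}(x_{n+1} - \hat{\mu}_n)\right]^T$$
$$= \frac{1}{n} \sum_{k=1}^{n} (x_k - \hat{\mu}_n)(x_k - \hat{\mu}_n)^T - \frac{1}{n(n+1)}(x_{n+1} - \hat{\mu}_n) \sum_{k=1}^{n} (x_k - \hat{\mu}_n)^T - \frac{1}{n(n+1)} \sum_{k=1}^{n} (x_k - \hat{\mu}_n)(x_{n+1} - \hat{\mu}_n)^T + \frac{1}{(n+1)^2}(x_{n+1} - \hat{\mu}_n)(x_{n+1} - \hat{\mu}_n)^T + \frac{n}{(n+1)^2}(x_{n+1} - \hat{\mu}_n)(x_{n+1} - \hat{\mu}_n)^T.$$
Here $\sum_{k=1}^{n} (x_k - \hat{\mu}_n)^T$ is the zero vector, so the second and third terms vanish. The first term equals $\frac{n-1}{n} C_n$, and the last two terms combine to $\frac{1}{n+1}(x_{n+1} - \hat{\mu}_n)(x_{n+1} - \hat{\mu}_n)^T$, hence
$$C_{n+1} = \frac{n-1}{n} C_n + \frac{1}{n+1}(x_{n+1} - \hat{\mu}_n)(x_{n+1} - \hat{\mu}_n)^T.$$
(c) Updating $\hat{\mu}$ with a single new sample takes $d$ additions, so the complexity per update is $O(d)$. Updating $C$ with a single new sample takes $d^2$ operations, so the complexity per update is $O(d^2)$.

2Q2 In Pattern Classification, Chapter 3, Page 77, we have
$$f(\sigma, \sigma_n) = \int \exp\left[-\frac{1}{2}\,\frac{\sigma^2 + \sigma_n^2}{\sigma^2 \sigma_n^2}\left(\mu - \frac{\sigma_n^2 x + \sigma^2 \mu_n}{\sigma^2 + \sigma_n^2}\right)^2\right] d\mu.$$
Now please calculate $f(\sigma, \sigma_n)$, give the final result of $f$, and explain why $f$ has nothing to do with $x$.

Solution:
$$f(\sigma, \sigma_n) = \int \exp\left[-\frac{1}{2}\,\frac{\sigma^2 + \sigma_n^2}{\sigma^2 \sigma_n^2}\left(\mu - \frac{\sigma_n^2 x + \sigma^2 \mu_n}{\sigma^2 + \sigma_n^2}\right)^2\right] d\mu$$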