Kernel Estimator and Bandwidth Selection for Density
and its Derivatives
The kedd Package
Version 1.0.3
by Arsalane Chouaib Guidoum∗
Revised October 30, 2015
1 Introduction
In statistics, univariate kernel density estimation (KDE) is a non-parametric way to estimate
the probability density function f(x) of a random variable X. It is a fundamental data smoothing
problem where inferences about the population are made based on a finite data sample. These
techniques are widely used in various inference procedures such as signal processing, data mining
and econometrics, see e.g., Silverman [1986], Wand and Jones [1995], Jeffrey [1996], Wolfgang et
al. [2004], Alexandre [2009]. Kernel estimators are standard in many books, with applications
ranging from computer vision to computational complexity, and with implementations in S; see
Wolfgang [1991], Scott [1992], Bowman and Azzalini [1997], Venables and Ripley [2002] for an
overview. Estimation of density derivatives also comes up in various other applications, such as
estimation of modes and inflexion points of densities; a good list of applications which require
the estimation of density derivatives can be found in Singh [1977].
A number of packages that perform kernel density estimation already exist in R
(density in base R); see for example KernSmooth [Wand and Ripley, 2013], sm [Bowman and Az-
zalini, 2013], np [Tristen and Jeffrey, 2008] and feature [Duong and Matt, 2013]. Functions for
kernel density derivative estimation (KDDE) also exist, e.g., kdde in the ks package [Duong, 2007].
In this vignette we introduce a new R package kedd [Guidoum, 2015] for use with the statistical
programming environment R [R Development Core Team, 2015], which implements smoothing
techniques and bandwidth selectors for the rth derivative of a probability density f(x) for
univariate data, using several kernel functions.
2 Convolutions and derivatives in kernels
In non-parametric statistics, a kernel is a weighting function used in non-parametric estimation
techniques. Kernels are used in kernel density estimation to estimate the density function f(x) of
a random variable, or in kernel regression to estimate the conditional expectation of a random
variable, see e.g., Silverman [1986], Wand and Jones [1995]. In general, any function satisfying
the following assumptions can be used as a kernel:
(A1) K(x) ≥ 0 and \int_{\mathbb{R}} K(x)\,dx = 1.

(A2) Symmetric about the origin, e.g., \int_{\mathbb{R}} xK(x)\,dx = 0.

(A3) Has finite second moment, e.g., \mu_2(K) = \int_{\mathbb{R}} x^2 K(x)\,dx < \infty. We denote R(K) = \int_{\mathbb{R}} (K(x))^2\,dx.

∗Department of Probabilities & Statistics, Faculty of Mathematics, University of Science and Technology Houari Boumediene, BP 32 El-Alia, U.S.T.H.B, Algeria. acguidoum@usthb.dz
If K(x) is a kernel, then so is the function K̄(x) defined by K̄(x) = λK(λx), where λ > 0; this can
be used to select a scale that is appropriate for the data. The kernel function is very important for
spreading a probability mass of 1/n around each observation; the most widely used kernel is the
Gaussian with zero mean and unit variance. The classical kernel functions K(x; r) (where r is the
maximum derivative order of the kernel) in the kedd package are the following:
Kernel         K(x; r)                                          R(K)        µ2(K)
Gaussian       K(x;∞) = (1/√(2π)) exp(−x²/2) 1]−∞,+∞[          1/(2√π)     1
Epanechnikov   K(x;2) = (3/4)(1 − x²) 1(|x|≤1)                 3/5         1/5
Uniform        K(x;0) = (1/2) 1(|x|≤1)                         1/2         1/3
Triangular     K(x;1) = (1 − |x|) 1(|x|≤1)                     2/3         1/6
Triweight      K(x;6) = (35/32)(1 − x²)³ 1(|x|≤1)              350/429     1/9
Tricube        K(x;9) = (70/81)(1 − |x|³)³ 1(|x|≤1)            175/247     35/243
Biweight       K(x;4) = (15/16)(1 − x²)² 1(|x|≤1)              5/7         1/7
Cosine         K(x;∞) = (π/4) cos(πx/2) 1(|x|≤1)               π²/16       (π² − 8)/π²

Table 1: Kernel functions in kedd package.
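The constants R(K) and µ2(K) in Table 1 can be verified numerically; a minimal sketch for the
Epanechnikov kernel, using only base R (not a kedd function):

> K <- function(x) 0.75 * (1 - x^2) * (abs(x) <= 1)   # Epanechnikov kernel
> integrate(function(x) K(x)^2, -1, 1)$value          # R(K) = 3/5
[1] 0.6
> integrate(function(x) x^2 * K(x), -1, 1)$value      # mu2(K) = 1/5
[1] 0.2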
The rth derivative of the kernel function K(x) is written as:

K^{(r)}(x) = \frac{d^r}{dx^r} K(x)    (1)

and the convolution of K^{(r)}(x) with itself is:

K^{(r)} \ast K^{(r)}(x) = \int_{\mathbb{R}} K^{(r)}(y)\, K^{(r)}(x - y)\, dy    (2)
For example, the rth derivative of the Gaussian kernel is given by:

K^{(r)}(x) = (-1)^r H_r(x) K(x)

and the rth convolution can be written as:

K^{(r)} \ast K^{(r)}(x) = (-1)^{2r} \int_{\mathbb{R}} H_r(y) H_r(x - y) K(y) K(x - y)\, dy

where H_r(x) is the rth Hermite polynomial, see e.g., Olver et al. [2010]. We use kernel.fun for
the kernel derivative defined by (1), and kernel.conv for the kernel convolution defined by (2).
For example, the first derivative of the Gaussian kernel is displayed on the left of Figure 1; on the
right is the convolution of the first derivative of the Gaussian kernel with itself.
> library(kedd)
> kernel.fun(x = seq(-0.02,0.02,by=0.01), deriv.order = 1, kernel = "gaussian")$kx
[1] 0.007977250 0.003989223 0.000000000 -0.003989223 -0.007977250
> kernel.conv(x = seq(-0.02,0.02,by=0.01), deriv.order = 1, kernel = "gaussian")$kx
[1] -0.1410051 -0.1410368 -0.1410474 -0.1410368 -0.1410051
> plot(kernel.fun(deriv.order = 1, kernel = "gaussian"))
> plot(kernel.conv(deriv.order = 1, kernel = "gaussian"))
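The kernel.fun values above can be checked against the analytic form of the Gaussian first
derivative, K^{(1)}(x) = −xK(x), using base R:

> x <- seq(-0.02, 0.02, by = 0.01)
> -x * dnorm(x)   # analytic K^(1)(x) for the Gaussian kernel
[1]  0.007977250  0.003989223  0.000000000 -0.003989223 -0.007977250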
Figure 1: (Left) First derivative of the Gaussian kernel. (Right) Convolution of the first derivative
Gaussian kernel.
3 Kernel density derivative estimator
Let (X1, X2, . . . , Xn) be an independent and identically distributed sample of a continuous
random variable X, with density function f(x). If the kernel K is differentiable r times, then a
natural estimator of the rth derivative of f(x) is the rth derivative of the kernel density estimate
[Bhattacharya, 1967, Schuster, 1969, Alekseev, 1972]:

\hat{f}_h^{(r)}(x) = \frac{d^r}{dx^r}\, \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right) = \frac{1}{nh^{r+1}} \sum_{i=1}^{n} K^{(r)}\!\left(\frac{x - X_i}{h}\right)    (3)

where K^{(r)} is the rth derivative of the kernel function K, which we take to be a symmetric
probability density with at least r non-zero derivatives when estimating f^{(r)}(x), and h is the
bandwidth, a very important parameter that controls the degree of smoothing applied to the data.
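To make the estimator concrete, Equation (3) can be coded directly; the following is a minimal
sketch for the Gaussian kernel and r = 1 (for exposition only, not the package's dkde, and without
any input checking):

> kdde1 <- function(x, eval.points, h) {
+   sapply(eval.points, function(t) {
+     u <- (t - x) / h
+     sum(-u * dnorm(u)) / (length(x) * h^2)   # 1/(n h^(r+1)) sum K^(r)(u), r = 1
+   })
+ }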
We make the following assumptions on the density f^{(r)}(x), the bandwidth h, and the kernel K:

(A4) The (r + 2)th derivative f^{(r+2)}(x) is continuous, square integrable and ultimately monotone.

(A5) In the asymptotic framework, \lim_{n\to\infty} h_n = 0 and \lim_{n\to\infty} n h_n^{2r+1} = \infty, i.e., as the sample size n increases, h approaches zero at a rate slower than n^{-1/(2r+1)}.

(A6) The assumptions on K introduced in the previous section hold.
As seen in Equation (3), when working with a kernel estimator of the rth derivative two choices
must be made: the kernel function K and the smoothing parameter or bandwidth h. The choice
of K is a problem of less importance, because the estimator is not very sensitive to the shape of
the kernel, and different functions produce good results. In practice, the choice of an efficient
method for the computation of h for an observed data sample is a crucial problem, because of the
effect of the bandwidth on the shape of the corresponding estimator. If the bandwidth is small, we
obtain an undersmoothed estimator, with high variability. On the contrary, if the value of h
is big, the resulting estimator is oversmoothed and farther from the function that we are trying
to estimate.

An example is drawn in Figure 2, where we show on the left four different kernel estimators
(Gaussian, biweight, triweight and tricube) of the first derivative of a bimodal (separated) Gaussian
density (Equation 5), for a fixed value h = 0.6. On the right, the Gaussian kernel is used with four
different values of the bandwidth.
Figure 2: (Left) Different kernels for estimation, with h = 0.6. (Right) Effect of the bandwidth on
the kernel estimator.
We have implemented in R the function dkde, which computes the kernel density derivative
estimator (Equation 3). Eight possibilities are allowed for the kernel function; they are summarized
in Table 1. We enumerate the arguments and results of this function in Table 2.
Arguments     Description
x             The data sample.
y             The points of the grid at which the density derivative is to be estimated.
              The default is 4h outside of range(x).
deriv.order   Derivative order (scalar).
h             The smoothing bandwidth to be used. The default, "ucv", is unbiased
              cross-validation.
kernel        The kernel function (see Table 1), by default "gaussian".

Results       Description
eval.points   The coordinates of the points where the density derivative is estimated.
est.fx        The estimated density derivative values (Equation 3).

Table 2: Summary of arguments and results of dkde.
The dataset 'bimodal' corresponds to a sample of 200 random numbers from a bimodal (separated)
two-component Gaussian mixture density (Equation 4), with the following parameters:
−µ1 = µ2 = 3/2 and σ1 = σ2 = 1/2. The dkde function computes the rth derivative of the
kernel density estimator over a grid of points, with a bandwidth selected by the user, but it also
allows this parameter to be estimated directly by the unbiased cross-validation method h.ucv
(see the following section). We have chosen this method as the automatic one because it is the
fastest in terms of computation time. We now estimate the first three derivatives of f(x), which
can be written as:
f(x) = 0.5\,\phi(\mu_1, \sigma_1) + 0.5\,\phi(\mu_2, \sigma_2)    (4)

f^{(1)}(x) = 0.5(-4x - 6)\phi(\mu_1, \sigma_1) + 0.5(-4x + 6)\phi(\mu_2, \sigma_2)    (5)

f^{(2)}(x) = 0.5\left[(-4x - 6)^2 - 4\right]\phi(\mu_1, \sigma_1) + 0.5\left[(-4x + 6)^2 - 4\right]\phi(\mu_2, \sigma_2)    (6)

f^{(3)}(x) = 0.5(-4x - 6)\left[(-4x - 6)^2 - 12\right]\phi(\mu_1, \sigma_1) + 0.5(-4x + 6)\left[(-4x + 6)^2 - 12\right]\phi(\mu_2, \sigma_2)    (7)
where φ(µ, σ) denotes the Gaussian density with mean µ and standard deviation σ, evaluated at x.
> hatf <- dkde(bimodal, deriv.order = 0)

Data: bimodal (200 obs.);   Kernel: gaussian
Derivative order: 0;   Bandwidth 'h' = 0.2098

  eval.points          est.fx
 Min.   :-3.86436   Min.   :0.0000032
 1st Qu.:-1.98016   1st Qu.:0.0147846
 Median :-0.09595   Median :0.0737948
 Mean   :-0.09595   Mean   :0.1324227
 3rd Qu.: 1.78826   3rd Qu.:0.2326044
 Max.   : 3.67246   Max.   :0.4374314

> hatf1 <- dkde(bimodal, deriv.order = 1)

Data: bimodal (200 obs.);   Kernel: gaussian
Derivative order: 1;   Bandwidth 'h' = 0.259

  eval.points          est.fx
 Min.   :-4.06125   Min.   :-0.4870865
 1st Qu.:-2.07860   1st Qu.:-0.1521016
 Median :-0.09595   Median : 0.0009041
 Mean   :-0.09595   Mean   : 0.0000000
 3rd Qu.: 1.88670   3rd Qu.: 0.1731795
 Max.   : 3.86935   Max.   : 0.5038096

> hatf2 <- dkde(bimodal, deriv.order = 2)

Data: bimodal (200 obs.);   Kernel: gaussian
Derivative order: 2;   Bandwidth 'h' = 0.3017

  eval.points          est.fx
 Min.   :-4.23200   Min.   :-1.6800486
 1st Qu.:-2.16398   1st Qu.: 0.0012798
 Median :-0.09595   Median : 0.1421495
 Mean   :-0.09595   Mean   :-0.0000073
 3rd Qu.: 1.97208   3rd Qu.: 0.3389096
 Max.   : 4.04010   Max.   : 0.7457487

> hatf3 <- dkde(bimodal, deriv.order = 3)

Data: bimodal (200 obs.);   Kernel: gaussian
Derivative order: 3;   Bandwidth 'h' = 0.3367

  eval.points          est.fx
 Min.   :-4.37205   Min.   :-4.353602
 1st Qu.:-2.23400   1st Qu.:-0.472761
 Median :-0.09595   Median : 0.001312
 Mean   :-0.09595   Mean   :-0.000008
 3rd Qu.: 2.04210   3rd Qu.: 0.388689
 Max.   : 4.18016   Max.   : 3.614749
By default, the function dkde selects a grid of 512 points in the data range and uses the Gaussian
kernel. The output is a list containing the estimated values at the points of the grid, the grid
itself, and the bandwidth h (by default, selected by the unbiased cross-validation method). In
Figure 3 we show the density estimate and the first three derivative estimators of f(x), obtained
with the code:
> fx <- function(x) 0.5 * dnorm(x,-1.5,0.5) + 0.5 * dnorm(x,1.5,0.5)
> fx1 <- function(x) 0.5 * (-4*x-6) * dnorm(x,-1.5,0.5) + 0.5 * (-4*x+6) *
+        dnorm(x,1.5,0.5)
> fx2 <- function(x) 0.5 * ((-4*x-6)^2 - 4) * dnorm(x,-1.5,0.5) + 0.5 *
+        ((-4*x+6)^2 - 4) * dnorm(x,1.5,0.5)
> fx3 <- function(x) 0.5 * (-4*x-6) * ((-4*x-6)^2 - 12) * dnorm(x,-1.5,0.5) +
+        0.5 * (-4*x+6) * ((-4*x+6)^2 - 12) * dnorm(x,1.5,0.5)
> plot(hatf ,fx = fx)
> plot(hatf1,fx = fx1)
> plot(hatf2,fx = fx2)
> plot(hatf3,fx = fx3)
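The components of the returned list (Table 2) can also be inspected directly; a short usage sketch,
assuming (as the printed output suggests) that the bandwidth is stored in a component named h:

> head(hatf$eval.points)   # grid points
> head(hatf$est.fx)        # estimated density values at those points
> hatf$h                   # bandwidth selected by unbiased cross-validation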
Figure 3: Kernel density derivative estimates obtained with the function dkde. (top left) density
estimate \hat{f}_h(x). (top right) first derivative \hat{f}_h^{(1)}(x). (bottom left) second derivative \hat{f}_h^{(2)}(x). (bottom right) third derivative \hat{f}_h^{(3)}(x).
4 Bandwidth selections
Despite the great number of bandwidth selection techniques in kernel density estimator or regression
estimation, as for example Rudemo [1982], Bowman [1984], Scott and George [1987], Sheather
and Jones [1991], Chiu [1991a,b, 1992], Feluch and Koronacki [1992], Stute [1992], Jones et all
[1996], Sheather [2004], Duong and Hazelton [2003, 2005], Heidenreich et all [2013], to the best of
our knowledge, only few paper have been studied in the context of estimating the rth derivative
of a density f (x), see Peter and Marron [1987], Wolfgang et all [1990], Jones and Kappenman
[1991], Stoker [1993].
In this section we summarize the techniques of cross-validation methods
for bandwidth choice in the kernel estimation of the derivatives of a probability density. The
practicality of this methods is demonstrated by an example.
4.1 Optimal bandwidth
We consider the following AMISE version for the rth derivative of a probability density f(x) [Scott,
1992, p. 131]:

\mathrm{AMISE}(h, r) = \frac{R(K^{(r)})}{nh^{2r+1}} + \frac{h^4}{4}\, \mu_2^2(K)\, R(f^{(r+2)})    (8)

The optimal bandwidth minimizing (8) is:

h^{\ast} = \left[\frac{(2r+1)\, R(K^{(r)})}{\mu_2^2(K)\, R(f^{(r+2)})}\right]^{1/(2r+5)} n^{-1/(2r+5)}    (9)

whereof:

\mathrm{AMISE}(h^{\ast}, r) = \frac{2r+5}{4}\, R(K^{(r)})^{\frac{4}{2r+5}} \left[\frac{\mu_2^2(K)\, R(f^{(r+2)})}{2r+1}\right]^{\frac{2r+1}{2r+5}} n^{-\frac{4}{2r+5}}    (10)

which is the smallest possible AMISE for estimation of \hat{f}_h^{(r)}. The function h.amise provides the
optimal bandwidth under AMISE. The same possibilities for the kernel function as in the function
dkde appear here. We enumerate the arguments and results of this function in Table 3.
Arguments     Description
x             The data sample.
deriv.order   Derivative order (scalar).
lower,upper   Range over which to minimize. The default is almost always satisfactory;
              hos (over-smoothing) is calculated internally from the kernel.
tol           The convergence tolerance for optimize.
kernel        The kernel function (see Table 1), by default "gaussian".

Results       Description
h             Value of bandwidth (Equation 9).
amise         The AMISE value (Equation 10).

Table 3: Summary of arguments and results of h.amise.
The following example computes this bandwidth for the estimators of the first three derivatives of (4).
> h.amise(bimodal, deriv.order = 0)

Call:   Aymptotic Mean Integrated Squared Error
        Derivative order = 0

Data: bimodal (200 obs.);   Kernel: gaussian
Bandwidth 'h' = 1.284843;   AMISE = 0.002602521

> h.amise(bimodal, deriv.order = 1)

Call:   Aymptotic Mean Integrated Squared Error
        Derivative order = 1

Data: bimodal (200 obs.);   Kernel: gaussian
Bandwidth 'h' = 1.774593;   AMISE = 0.0009282042

> h.amise(bimodal, deriv.order = 2)

Call:   Aymptotic Mean Integrated Squared Error
        Derivative order = 2

Data: bimodal (200 obs.);   Kernel: gaussian
Bandwidth 'h' = 2.245869;   AMISE = 0.0003062873

> h.amise(bimodal, deriv.order = 3)

Call:   Aymptotic Mean Integrated Squared Error
        Derivative order = 3

Data: bimodal (200 obs.);   Kernel: gaussian
Bandwidth 'h' = 2.690288;   AMISE = 8.793292e-05
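To connect this output with Equation (9): when f is known, the optimal bandwidth can be
computed by hand. A minimal sketch for r = 0 and the Gaussian kernel, using the true mixture
density (4) (for exposition only; h.amise replaces the unknown R(f^{(r+2)}) by an internal
over-smoothing rule, as noted in Table 3, so its values differ):

> f2 <- function(x) 0.5 * ((-4*x-6)^2 - 4) * dnorm(x,-1.5,0.5) +
+                   0.5 * ((-4*x+6)^2 - 4) * dnorm(x,1.5,0.5)    # f''(x), Equation (6)
> Rf2 <- integrate(function(x) f2(x)^2, -Inf, Inf)$value         # R(f'')
> RK <- 1/(2*sqrt(pi))                     # R(K), Gaussian kernel (Table 1)
> (RK / (1^2 * Rf2))^(1/5) * 200^(-1/5)    # Equation (9): r = 0, mu2(K) = 1, n = 200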
4.2 Maximum likelihood cross-validation
This method was proposed by Habbema, Hermans and Van den Broek [1974] and Duin [1976].
They proposed to choose h so that the pseudo-likelihood \prod_{i=1}^{n} \hat{f}_h(X_i) is maximized. However, this
has a trivial maximum at h = 0, so the cross-validation principle is invoked by replacing \hat{f}_h(x) by
the leave-one-out estimator \hat{f}_{h,i}(x), where:

\hat{f}_{h,i}(X_i) = \frac{1}{(n-1)h} \sum_{j \neq i} K\!\left(\frac{X_j - X_i}{h}\right)

Define as good that h which achieves the finite maximum of MLCV(h):

h_{mlcv} = \underset{h>0}{\arg\max}\; \mathrm{MLCV}(h)    (11)

where:

\mathrm{MLCV}(h) = n^{-1} \sum_{i=1}^{n} \log\!\left[\sum_{j \neq i} K\!\left(\frac{X_j - X_i}{h}\right)\right] - \log[(n-1)h]    (12)
The function h.mlcv computes the maximum likelihood cross-validation bandwidth.
We enumerate the arguments and results of this function in Table 4.
Arguments     Description
x             The data sample.
lower,upper   Range over which to minimize. The default is almost always satisfactory.
tol           The convergence tolerance for optimize.
kernel        The kernel function (see Table 1), by default "gaussian".

Results       Description
h             Value of bandwidth (Equation 11).
mlcv          The MLCV value (Equation 12).

Table 4: Summary of arguments and results of h.mlcv.
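The criterion (12) is straightforward to code; a minimal sketch with a Gaussian kernel, maximized
with optimize() over an arbitrary search interval (for exposition only, not the package's h.mlcv):

> mlcv <- function(h, x) {
+   n <- length(x)
+   Kij <- dnorm(outer(x, x, "-") / h)   # K((X_i - X_j)/h), Gaussian kernel
+   diag(Kij) <- 0                       # leave-one-out: drop the j = i terms
+   mean(log(rowSums(Kij))) - log((n - 1) * h)   # Equation (12)
+ }
> optimize(mlcv, interval = c(0.05, 2), x = bimodal, maximum = TRUE)$maximum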
The following example computes this bandwidth for the bimodal Gaussian density (Equation 4),
using different kernels.