Semi-supervised deep extreme learning machine for Wi-Fi based localization
Introduction
Related works
Methodology – Semi-supervised deep extreme learning machine
A. Graph based Semi-supervised Learning
B. Deep Learning
C. Extreme Learning Machine
D. Proposed Semi-Supervised Deep Extreme Learning Machine
Performance Evaluation
A. Office environment
a) Model generation
b) Feature Extraction
c) Localization Performance
d) Man-hour Cost
B. Lobby environment
a) Feature Extraction
b) Localization Performance
Conclusion
Acknowledgement
References
Neurocomputing 166 (2015) 282–293

Semi-supervised deep extreme learning machine for Wi-Fi based localization

Yang Gu a,b,c, Yiqiang Chen a,b,*, Junfa Liu a,b, Xinlong Jiang a,b,c

a Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
b Beijing Key Laboratory of Mobile Computing and Pervasive Device, Beijing, China
c University of Chinese Academy of Sciences, Beijing, China
* Corresponding author at: Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China.

Article history: Received 12 September 2014; received in revised form 29 March 2015; accepted 2 April 2015; available online 23 April 2015. Communicated by G.-B. Huang.

Keywords: Wi-Fi indoor localization; semi-supervised learning; deep learning; Extreme Learning Machine (ELM)

Abstract: Along with the proliferation of mobile devices and wireless signal coverage, indoor localization based on Wi-Fi has gained great popularity. The fingerprint based method is the mainstream approach for Wi-Fi indoor localization, for it can achieve high localization performance as long as labeled data are sufficient. However, the number of labeled data is always limited due to the high cost of data acquisition. Nowadays, crowd sourcing has become an effective approach to gather a large amount of data; meanwhile, most of the data are unlabeled. Therefore, it is worth studying the use of unlabeled data to improve localization performance. To achieve this goal, a novel algorithm, the Semi-supervised Deep Extreme Learning Machine (SDELM), is proposed, which takes advantage of semi-supervised learning, Deep Learning (DL) and the Extreme Learning Machine (ELM), so that the localization performance can be improved both in the feature extraction procedure and in the classifier. The experimental results in real indoor environments show that the proposed SDELM not only outperforms other compared methods but also reduces the calibration effort with the help of unlabeled data.

http://dx.doi.org/10.1016/j.neucom.2015.04.011
© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Wireless localization based on Wi-Fi is quite popular, especially in indoor environments [1–3], for it does not need any extra infrastructure to be deployed. What is more, the widespread Access Points (APs) and smart mobile devices facilitate the development of Wi-Fi based indoor localization. To implement indoor localization, labeled data are traditionally required. In the Wi-Fi localization field, labeled data means that both the data and their corresponding locations (coordinates/classes) are known; unlabeled data means that only the data are available and their corresponding locations are unknown. Generally, there are two main types of Wi-Fi localization methods: propagation based methods and fingerprint based methods. The propagation based method takes advantage of the nonlinear fading characteristics of the wireless signal to set up a propagation model [1]. This kind of method can be easily implemented; however, the localization performance is not good enough, for it is difficult to set up an accurate propagation model in a complex and dynamic indoor environment. Compared with the propagation based method, the fingerprint based localization method is widely adopted. "Fingerprints" are features obtained by feature extraction methods and used to represent the corresponding locations; "localization" means
adopting a pattern recognition method to estimate the location. Theoretically, the more labeled data there are, the better the localization performance will be. However, the calibration procedure (also known as the acquisition of labeled data) always comes at the cost of time, man-hours and money, which leads to a limited number of labeled data. According to [4], "calibration procedures are applied in a great number of proposed location techniques and are considered to be not practical or a considerable barrier to wider adoption of such methods". As reported in [5], Ekahau [6], a commercial real-time localization system, spent $10,000 just to collect labeled data in a large office building, which clearly verifies the high cost of obtaining labeled data. In contrast to the acquisition of labeled data, the collection of unlabeled data can be carried out easily, especially when crowd sourcing is adopted. Crowd sourcing [7,8] is a distributed model to solve problems in an open way with different participants. For indoor localization, crowd sourcing (data collection) means gathering data from heterogeneous devices [9], and most of the data are unlabeled. In order to use unlabeled data, works based on manifold learning and semi-supervised learning have been proposed. These works indeed improved localization performance with unlabeled data; however, they paid less attention to the problems caused by crowd sourcing.

After data have been collected for the localization system, feature extraction becomes an important step. Though the Received Signal Strength (RSS) of the Wi-Fi signal has real physical meaning and can
be directly used as a feature, the highly dynamic indoor environment and heterogeneous devices lead to severe fluctuation of the wireless signal, which makes this kind of direct feature less representative. Traditional indoor localization methods mainly selected features manually or extracted features with shallow networks, which cannot reflect the Wi-Fi characteristics in complex environments well. Deep Learning (DL), a machine learning method that has been famous worldwide since 2006 [10], learns high level features and a distributed data structure, which can represent the original data better than shallow features. Though DL outperforms traditional neural networks, its training time consumption is high. Meanwhile, the Extreme Learning Machine (ELM), proposed in the same period as DL, is popular for its fast learning speed, which can make up for the time-consuming shortcoming of DL.

Considering the analysis above, we put forward a novel localization method, the Semi-supervised Deep Extreme Learning Machine (SDELM), to improve the localization performance with unlabeled data. The contributions of this work are: 1) utilize unlabeled data to obtain discriminative features and better classification ability; 2) propose a semi-supervised embedding for the deep learning network; 3) adopt a modified ELM to improve the learning speed of SDELM.

The rest of this paper is organized as follows: Section 2 reviews the related works in the Wi-Fi localization field. Section 3 introduces the proposed SDELM in detail. Section 4 evaluates the performance of SDELM in real wireless indoor environments. Section 5 concludes the work.

2. Related works

The mainstream approach of Wi-Fi based indoor localization, the fingerprint based method [2,3], consists of two phases: an off-line training phase and an online locating phase. In the training phase, features are extracted according to a certain rule so that the relationship between the features and their corresponding locations can be established, which forms a "radio map"; in the online locating phase, features of unknown locations are extracted with the same rule, pattern recognition methods are used to find the best matched locations on the "radio map", and the final location is estimated from those matched locations. Two parts of the fingerprint based method, feature extraction and pattern recognition (the classifier), involve unlabeled/labeled data and affect the overall localization performance. For these two parts, there are many related works.

For feature extraction, it is well known that the Wi-Fi signal is fickle: the unpredictable movement of people, the status of doors (open/closed), changes of temperature and humidity, the placement of objects, the type of device [11], etc., all lead to fluctuation of the Wi-Fi signal. Hence, the feature extraction work can greatly affect the final localization performance. Up till now, most fingerprint methods have extracted shallow level information from labeled data as the Wi-Fi feature. The mean value method is the commonest feature extraction method [12–15]: it collects multiple RSS vectors at the same location and averages those vectors to form the feature. Even though RSS changes unpredictably, the mean value can, from the statistical view, reduce the instability of RSS. Going further than the mean value method, the Gaussian model was put forward based on the assumption that RSS at the same location follows a Gaussian distribution.
[16] utilized a Gaussian process to eliminate the side effect caused by small-probability RSS values and outliers, and averaged high-probability RSS vectors to obtain the feature. Besides statistical feature extraction methods, unsupervised machine learning methods have also been adopted. [17] used Principal Component Analysis (PCA) to extract the Wi-Fi feature; the principal components of the multi-dimensional wireless signal not only offered a new feature but also reduced the effect of noise. AP selection methods were also put forward, since APs are the emission sources of the Wi-Fi signal. [18] picked out the k APs with the strongest RSS at each location as the feature. [19] proposed the InfoGain criterion to evaluate the merit of each AP in terms of power and selected the highest-ranked ones to form the feature. Other than common feature extraction approaches, special methods were also presented to solve specific problems. To overcome the variance caused by different devices, [20] came up with a hyperbolic method, utilizing signal strength ratios instead of absolute signal strengths to form the feature. And to mitigate the distortion caused by the human body, [21] proposed an elliptical signal attenuation model to generate an orientation-independent fingerprint database. Though many works have been proposed to extract features for Wi-Fi based localization, few utilized unlabeled data. Besides, most of them operated at a shallow level, which cannot represent the Wi-Fi characteristics in complex indoor environments well.

For fingerprint based localization methods, pattern recognition methods (classifiers) are used to estimate the final localization result, and most of the traditional methods are supervised [1]. However, considering the fact that labeled data are limited while unlabeled data are plentiful, manifold learning and semi-supervised learning methods have been widely adopted. Pan [22] proposed a manifold learning method to calculate users' locations and APs' locations. It first adopted an SVD based method with labeled and unlabeled data to calculate relative locations (a dimension reduction process), then used the manifold assumption to obtain the absolute locations with the assistance of labeled data. Pulkkinen [23] used the Isomap method to reduce the feature space to a low-dimensional manifold space, then transformed the manifold space to the location space to implement localization. By carefully selecting key points to represent fingerprints, this method can calibrate an entire radio map based on a small fraction of labeled data; according to the experiment, the mean localization error was 2.0 m with 66 testing points. Zhang [24] presented the LocMR method based on manifold regularization, which learns the mapping from the signal space to the location space; the novelty of the work is that a time span constraint is brought into the calculation of the Laplacian matrix, so that the manifold in the time domain can be maintained. In order to utilize unlabeled data, Lin [25] applied spectral decomposition of the Laplacian matrix to obtain the data representation in eigenvector space, then added labels to unlabeled data by aligning them with labeled data. The result showed that with only a small proportion of labeled data, such as 20%, the localization accuracy can reach 75%. A semi-supervised localization method, the Semi-supervised Extreme Learning Machine (SELM) [26], was put forward based on ELM, which brings a graph Laplacian constraint into ELM, so that a sparsely calibrated localization system can perform well with fast learning speed.
Though there are plenty of existing manifold and semi-supervised works that can improve the localization performance with unlabeled data, most of them are based on shallow features and do not take crowd sourcing into consideration. Therefore, considering practical localization problems such as feature extraction and the unlabeled data coming from crowd sourcing, we put forward the SDELM method on the basis of semi-supervised learning, deep learning and ELM, so that the overall localization performance can be improved.

3. Methodology – Semi-supervised deep extreme learning machine

Graph based semi-supervised learning is embedded into the deep learning structure to obtain discriminative features and a better classification boundary. To guarantee the learning speed, the semi-supervised learning is derived from ELM. Hence, in this section, we first
briefly introduce graph based semi-supervised learning, deep learning and ELM, and then elaborate the proposed SDELM in detail.

3.1. A. Graph based Semi-supervised Learning

Semi-supervised learning lies between unsupervised learning and supervised learning: it utilizes a small amount of labeled data and a large amount of unlabeled data to train a model. There are three assumptions for semi-supervised learning [27]: the smoothness assumption, the cluster assumption and the manifold assumption. The smoothness assumption implies that if two points are close in a high-density region, their corresponding outputs should also be close. The cluster assumption is used to find the boundary of each cluster, for "if points are in the same cluster, they are likely to be in the same class"; the cluster assumption can be seen as a special case of the smoothness assumption. The manifold assumption is used to avoid "the curse of dimensionality". Semi-supervised learning should satisfy at least one of the above assumptions.

Semi-supervised learning usually adopts graph based methods, in which the data points (both labeled and unlabeled) are the vertexes of the graph, and the goal of graph based methods can be expressed as:

\arg\min_{f} \; \frac{1}{l} \sum_{i=1}^{l} V(f(x_i), y_i) + \lambda_A \|f\|^2 + \lambda_I \int_{M} \|\nabla_M f(x)\|^2 \, dp(x)    (1)

where M is the manifold that the data lie on, λ_A and λ_I are parameters that control the smoothness in the ambient and intrinsic spaces, and l is the number of labeled samples. Among all the graph based methods, the Laplacian graph is the most widely adopted, which allows (1) to be approximately expressed as:

\arg\min_{f} \; \frac{1}{l} \sum_{i=1}^{l} V(f(x_i), y_i) + \lambda_A \|f\|^2 + \lambda_I f^T L f    (2)

where L is the Laplacian matrix.

3.2. B. Deep Learning

Deep learning is a branch of neural networks which simulates the multi-layer cognition of the human brain to obtain high level features and a distributed data structure. The basic idea of DL is first to use unsupervised learning methods to pre-train the network layer by layer, then to make the training output of the lower layer the input of the upper layer, and finally to adopt supervised methods to fine-tune the parameters of the whole network [28,29]. DL with different building blocks has been applied to many fields [30–34]; here we mainly introduce DL built up with the Auto Encoder (AE) [35]. Assume there is a DL network with M hidden layers, as illustrated in Fig. 1 (each node in a lower layer is connected to all the nodes in the upper layer; to keep the figure concise, only some connections are shown). L_k is the number of neurons of the kth hidden layer. Each AE layer consists of an encoding and a decoding procedure. Let X_k represent the input data, W_{k1} the encoding weight and b_{k1} the encoding bias; W_{k2} and b_{k2} are the corresponding decoding weight and bias. The encoding and decoding steps follow the rules [35]:

Encoding: X_k = f(z_k), \quad z_{k+1} = W_{k1} X_k + b_{k1}    (3)

Decoding: X_{M+k} = f(z_{M+k}), \quad z_{M+k+1} = W_{(M-k)2} X_{M+k} + b_{(M-k)2}    (4)

The encoding step encodes each layer in forward order, and the decoding step decodes the stack of AEs in reverse order (the encoding and decoding weights and biases can be different). The goal of AE based DL is to minimize the reconstruction error, making h_{W,b}(X_k) ≈ X_k, which equals minimizing the loss function:

J(W, b) = \min_{W,b} \; \frac{1}{N} \sum_{i=1}^{N} \frac{1}{2} \left\| h_{W,b}(X_k) - X_k \right\|^2    (5)

With the pre-training process, better initial values of the encoding (decoding) weights and biases can be calculated from (5).
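To make (3)–(5) concrete, below is a minimal NumPy sketch of a single AE layer and its reconstruction loss; the sigmoid activation, the sizes and the random initialization are illustrative assumptions rather than details of the authors' implementation, and the gradient updates of pre-training are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, L1, N = 24, 300, 100                    # assumed sizes, not taken from the paper
X = rng.random((N, n))                     # X_k: N input samples as rows

W1 = 0.1 * rng.standard_normal((n, L1))    # encoding weight W_k1
b1 = np.zeros(L1)                          # encoding bias  b_k1
W2 = 0.1 * rng.standard_normal((L1, n))    # decoding weight
b2 = np.zeros(n)                           # decoding bias

Z = sigmoid(X @ W1 + b1)                   # encoding step, Eq. (3)
X_rec = sigmoid(Z @ W2 + b2)               # decoding step, Eq. (4)

# Reconstruction loss of Eq. (5): (1/N) * sum of (1/2) ||h_{W,b}(X_k) - X_k||^2
loss = np.mean(0.5 * np.sum((X_rec - X) ** 2, axis=1))
print(loss)
```

During layer-wise pre-training, the encoding and decoding weights and biases would be updated by gradient descent on this loss.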
Based on these initial parameters, the final optimal parameters can be obtained by the fine-tuning process. The encoding output of the final layer is the high level feature that represents the original input data.

3.3. C. Extreme Learning Machine

The Extreme Learning Machine (ELM) is a supervised learning method proposed by Huang et al. [36–38]. The random generation of the input weights and biases of ELM [39] is the essential difference from other conventional neural networks, and it leads to the fast learning speed. Given N arbitrary distinct samples (X_i, T_i), where X_i = [X_{i1}, X_{i2}, ..., X_{in}] ∈ ℜ^n is the input feature, n is the input dimension, T_i = [T_{i1}, T_{i2}, ..., T_{im}] ∈ ℜ^m is the output vector representing the class and m is the output dimension, a neural network with L hidden neurons, as illustrated in Fig. 2, has the output:

f_L(X_j) = \sum_{i=1}^{L} \beta_i G(w_i, b_i, X_j), \quad w_i \in \Re^n, \; b_i \in \Re, \; \beta_i \in \Re^m    (6)

where w_i = [w_{i1}, w_{i2}, ..., w_{in}] is the weight vector connecting the ith hidden neuron and the input neurons, b_i is the threshold of the ith hidden neuron, G(w_i, b_i, X_j) is the output of the ith hidden neuron, and β_i = [β_{i1}, β_{i2}, ..., β_{im}]^T is the weight vector connecting the ith hidden neuron and the output neurons. If the activation function of the hidden neurons is g(x), then the output of the ith hidden neuron can be calculated as:

G(w_i, b_i, X_j) = g(w_i \cdot X_j + b_i), \quad w_i \in \Re^n, \; b_i \in \Re    (7)

Fig. 1. Deep network with M hidden layers.
Fig. 2. SLFN with L hidden neurons.
Equations (6) and (7) for all N samples can be summarized in matrix form as:

H\beta = T    (8)

where

H = \begin{bmatrix} G(w_1, b_1, X_1) & \cdots & G(w_L, b_L, X_1) \\ \vdots & \ddots & \vdots \\ G(w_1, b_1, X_N) & \cdots & G(w_L, b_L, X_N) \end{bmatrix}_{N \times L}, \quad \beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_L^T \end{bmatrix}_{L \times m}, \quad T = \begin{bmatrix} T_1^T \\ \vdots \\ T_N^T \end{bmatrix}_{N \times m}    (9)

The goal of ELM is to obtain β = argmin_β ||Hβ − T||, so that the error between the real output and the expected output is minimized. The solution of ELM is:

\beta = H^{\dagger} T = (H^T H)^{-1} H^T T    (10)

where "†" denotes the Moore–Penrose generalized inverse of a matrix. The above result is the minimum norm least-squares solution of Eq. (8), which guarantees the minimum error and the least norm at the same time.

3.4. D. Proposed Semi-Supervised Deep Extreme Learning Machine

As claimed above, for Wi-Fi indoor localization the limited number of labeled data is a practical problem. To solve it, we put forward the SDELM localization method, which utilizes both labeled and unlabeled data (from crowd sourcing) to improve localization performance. In SDELM, deep learning embedded with semi-supervised learning is used to obtain high level abstract features and better classification ability, and ELM is used to enhance the learning speed.

The original goal of ELM is Hβ = T. Here, in order to study the characteristics of both labeled and unlabeled data as an AE does, the goal is changed to:

H\beta = X    (11)

where X represents the N1 labeled data (X_i, T_i) (X_i = [X_{i1}, X_{i2}, ..., X_{in}] ∈ ℜ^n, T_i = [T_{i1}, T_{i2}, ..., T_{im}] ∈ ℜ^m) and the N2 unlabeled data X_j (X_j = [X_{j1}, X_{j2}, ..., X_{jn}] ∈ ℜ^n), with N1 + N2 = N. In order to ensure the smoothness of the proposed method, namely that "points which are close to each other are more likely to share a label", a smoothness constraint (C1/2)||f||^2 is brought into the target (f = Hβ and C1 is a balance parameter). Besides, in order to avoid over-fitting and to guarantee the generalization ability, the ℓ2,1 norm regularization (C2/2)||β||^2 is added to the target (C2 is also a balance parameter). Therefore the loss function for deep feature extraction becomes:

l_1 = \min_{f} \; \frac{1}{2}\|f - X\|^2 + \frac{C_1}{2}\|f\|^2 + \frac{C_2}{2}\|\beta\|^2    (12)

According to ELM, f = Hβ is substituted into (12), and the loss function becomes:

l_1 = \min_{\beta} \; \frac{1}{2}\|H\beta - X\|^2 + \frac{C_1}{2}\|H\beta\|^2 + \frac{C_2}{2}\|\beta\|^2    (13)

The derivative of l_1 with respect to β is:

\frac{\partial l_1}{\partial \beta} = (H\beta - X)^T H + C_1 (H\beta)^T H + C_2 \beta^T    (14)

Setting ∂l_1/∂β = 0 gives the solution of β:

\beta = \left( H^T \left( (C_1 + 1) \cdot I_1 \right) H + C_2 I_2 \right)^{-1} H^T X    (15)

where I_1 ∈ ℜ^{N×N} and I_2 ∈ ℜ^{L×L} are identity matrices (L is the number of hidden neurons). For the kth layer of the deep feature learning, the output weight is:

\beta_k = \left( H_k^T \left( (C_1 + 1) \cdot I_1 \right) H_k + C_2 I_2 \right)^{-1} H_k^T X_{k-1}    (16)

We adopt this modified ELM as the building block, and use β_k as the weight matrix to obtain each layer's output [40]. If the activation function is g(x), then the output of the kth layer becomes:

X_k = g\left( \beta_k^T X_{k-1} \right)    (17)

The framework of this deep feature extraction is illustrated in Fig. 3. To summarize, the deep feature extraction has three steps: 1) for each layer, calculate the output weight matrix based on (16); 2) obtain the output of each hidden layer with its corresponding output weight matrix by (17); 3) make the lower layer output the input of the next upper layer.

Fig. 3. Framework of deep feature extraction.

After the feature extraction process, the final high level features are put into the classifier. In order to guarantee the generalization ability of the classifier, the ℓ2,1 norm regularization is brought in, and the goal becomes:

\min_{\beta} \; \frac{1}{2}\|f - T\|^2 + \frac{C}{2}\|\beta\|^2    (18)

where C is a weight penalty.
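As a rough illustration of the feature extraction building block just summarized in (15)–(17), the sketch below computes one deep-feature layer with a randomly generated hidden layer and the closed-form output weights of (16), then applies (17); the sigmoid activation, the row-wise data layout, the concrete sizes and the use of numpy.linalg.solve are assumptions made for readability, not details taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sdelm_feature_layer(X_prev, n_hidden, C1, C2, rng):
    """One modified-ELM feature layer, following Eqs. (15)-(17).

    X_prev: (N, n) output of the previous layer, samples stored as rows.
    Returns the new representation X_k and the output weight matrix beta_k.
    """
    N, n = X_prev.shape
    # Random input weights and biases of the k-th hidden layer
    W = rng.standard_normal((n, n_hidden))
    b = rng.standard_normal(n_hidden)
    H = sigmoid(X_prev @ W + b)                       # hidden output H_k, (N, n_hidden)
    # Eq. (16): beta_k = ((C1 + 1) H_k^T H_k + C2 I)^(-1) H_k^T X_{k-1}
    A = (C1 + 1.0) * (H.T @ H) + C2 * np.eye(n_hidden)
    beta = np.linalg.solve(A, H.T @ X_prev)           # (n_hidden, n)
    # Eq. (17) with samples kept as rows: X_k = g(X_{k-1} beta_k^T)
    X_k = sigmoid(X_prev @ beta.T)                    # (N, n_hidden)
    return X_k, beta

# Example: stack two feature layers on random "RSS-like" data (sizes assumed)
rng = np.random.default_rng(0)
X0 = rng.random((200, 24))
X1, _ = sdelm_feature_layer(X0, n_hidden=300, C1=0.2, C2=0.01, rng=rng)
X2, _ = sdelm_feature_layer(X1, n_hidden=300, C1=0.2, C2=0.01, rng=rng)
```

Stacking such calls layer by layer yields the deep feature of Fig. 3, whose last output is fed to the classifier being defined here.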
Besides the ℓ2,1 norm constraint, a Laplacian graph regularization is also added to (18) to generate a precise classification boundary, which makes the final goal of the classifier:

\min_{\beta} \; \frac{1}{2}\|f - T\|^2 + \frac{C}{2}\|\beta\|^2 + \lambda \, \mathrm{Tr}(f^T L f)    (19)

where λ is a constraint penalty, Tr(·) denotes the trace of a matrix, and L is the Laplacian matrix calculated as:

L = D - W    (20)

D is a diagonal matrix given by D_{ii} = \sum_{j=1}^{N_1+N_2} W_{ij}, and W is the edge weight matrix, which is usually computed with a Gaussian kernel:

W_{ij} = \exp\left( -\frac{\|X_i - X_j\|^2}{2\sigma^2} \right)    (21)

With f = Hβ, the goal of the semi-supervised classifier becomes:

l_2 = \min_{\beta} \; \|J H \beta - T\|^2 + C\|\beta\|^2 + \lambda \, \mathrm{Tr}\left( (H\beta)^T L (H\beta) \right)    (22)

In (22), J ∈ ℜ^{N×N} is the indication matrix that distinguishes labeled data from unlabeled data, with J(i, i) = 1 for i = 1, 2, ..., N1 and all other entries equal to 0. For the N training data, T ∈ ℜ^{N×m} is the corresponding extended label matrix, where m is the number of classes. If a labeled data item belongs to class j, then the jth value of its label vector is set to 1 and the remaining m − 1
values are set to −1; the label vectors for unlabeled data are all set to 0. In order to solve this optimization problem, we take the derivative of l_2 in (22) with respect to β:

\frac{\partial l_2}{\partial \beta} = (J H \beta - T)^T J H + C \beta^T + \lambda (H\beta)^T L H    (23)

The solution of β satisfies the condition ∂l_2/∂β = 0, which gives:

\beta = \left( C \cdot I + H^T (J + \lambda L) H \right)^{-1} H^T J^T T    (24)

Eqs. (16) and (24) are the solutions for the weight matrices, which construct the structure of SDELM. The constraint parameters in those equations and the depth of the deep feature learning affect the overall performance of SDELM, so these parameters need to be tuned. There are mainly four parameters in SDELM to be tuned: the weights C1 and C2 in the deep learning procedure, and the weights λ and C in the classifier. Usually, cross validation is used to adjust these four parameters. However, if the parameters are tuned simultaneously, the time complexity is O(n^4), which is time-consuming. Therefore, in order to make the adjustment easier and less costly, we adopt a hierarchical adjustment mechanism to tune the parameters, which is similar to sparse coding [34] (sparse coding first optimizes the sparse coefficients with fixed bases, then tunes the bases with fixed coefficients). The parameters are split into two levels according to their function: the first level parameters are those in the classifier, and the second level parameters are those in deep learning. Firstly, we assign empirical values to these parameters and tune the first level parameters. After optimizing the parameters in the first level, we adjust the parameters in the second level. With this hierarchical adjustment mechanism, the time complexity can be decreased.

Here we show an example to make the scheme clear: for depth d_i, we adjust the parameters in the classifier first, then tune the parameters in the deep learning procedure with the locally optimized C and λ. When all four parameters are optimized, the final localization result r_{d_i} is obtained. Then we increase the depth by one, so that d_{i+1} = d_i + 1, adopt the same scheme to tune the parameters and obtain the corresponding result r_{d_{i+1}}. When, for the first time, r_{d_{i+1}} is no better than r_{d_i}, the final depth becomes dep = d_i; otherwise the depth keeps increasing. This greedy scheme is used to determine the depth of the network. With the optimized parameters and depth, the SDELM model can be set up.

To sum up, given N1 labeled wireless training data (X_i, T_i), where X_i = [X_{i1}, X_{i2}, ..., X_{in}] ∈ ℜ^n and T_i = [T_{i1}, T_{i2}, ..., T_{im}] ∈ ℜ^m, and N2 unlabeled training data X_j ∈ ℜ^n (n is the number of collected APs and m is the number of classes), the SDELM algorithm can be summarized as:

1) Prepare all the training data, including labeled data and unlabeled data, and normalize them into [0, 1];
2) Assign the initial depth value d_i = 0;
3) Determine the activation function g(x) and the number of hidden neurons hidnum_k for deep learning, and the activation function f(x) and the number of hidden neurons hidnum_c for the classifier;
4) Assign initial values for C1 and C2 in deep learning and for the constraint parameters C and λ in the classifier;
5) Randomly assign the weight matrix W_k and bias matrix b_k for the kth layer;
6) Calculate the kth hidden layer output by H_k = g(W_k · X_{k-1} + b_k);
7) Calculate the kth layer output weight matrix by (16);
8) Calculate the output of the kth layer by (17), and make it the input of the upper layer;
9) Transform the labels of the N1 labeled data into the N × m label matrix;
10) Calculate the classifier output weight by (24);
11) Put the last layer output of the deep learning into the semi-supervised classifier to get the result;
12) Utilize the hierarchical mechanism to adjust the parameters C1, C2, C and λ, then go through steps 4) to 11) to get the final result for the corresponding depth;
13) Make d_{i+1} = d_i + 1, and go through steps 4) to 12);
14) If r_{d_{i+1}} > r_{d_i}, go through steps 4) to 13); otherwise the model is set up.

The model generation framework of the proposed SDELM is illustrated in Fig. 4.

Fig. 4. Model generation of SDELM.

4. Performance Evaluation

In this section, the performance of the proposed SDELM is validated from four aspects: feature representation, localization accuracy, time consumption and man-hour cost, in real indoor environments.

4.1. A. Office environment

Firstly, the localization performance of SDELM is validated in a typical office environment. It is a 15 m × 10 m area on the 8th floor of a research building with 24 different locations, whose layout is shown in Fig. 5; the circle points are the locations where labeled data were collected.

Fig. 5. Indoor location environment.

The reason for choosing this environment as the test bed is that it is highly dynamic, full of people movement and other changing elements, which can show the complex characteristics of the Wi-Fi signal in an indoor environment. In this environment, we collected 5695 items of data in one month; 500 of them were randomly chosen as testing data and the rest became training data. The experimental configuration is as follows: a DELL PC with a 32-bit operating system, 2.00 GB RAM, an Intel Core2 CPU, and Matlab R2009b.

To show the statistical characteristics of RSS, another 500 data items were collected at the same location, and Fig. 6 illustrates the distribution of RSS for the same AP.

Fig. 6. RSS distribution for the same AP over a time period (probability vs. RSS in dBm).

Obviously, even though the signal was collected for the same AP at the same location by the same device, the RSS still fluctuates from −83 dBm to −60 dBm, a difference of more than 20 dBm. Therefore, if the discriminability of the feature is not good enough, the location will be wrongly estimated, which degrades the localization performance. Hence, a good feature extraction method is necessary, and we propose SDELM.

4.1.1. a) Model generation

To verify the performance of SDELM, an effective model should be set up first. Here, we randomly choose 500 labeled data and 3000 unlabeled data from the training data set; the number of hidden neurons for each layer is 300. The initial value of the constraint parameter C1 is set to 0.2, and C2 is set to 0.01.
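These values parameterize the deep feature layers of (16) and (17) sketched earlier; the semi-supervised classifier on top solves (22)–(24) in closed form, roughly as sketched below. The fully connected Gaussian-kernel graph, the row-wise data layout and all names are simplifying assumptions made for illustration (the experiments actually use a 5-nearest-neighbor Laplacian graph).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def semi_supervised_classifier(X_feat, labels, n_labeled, n_classes,
                               n_hidden=1000, C=0.001, lam=0.3, sigma=0.5,
                               rng=None):
    """Closed-form semi-supervised classifier of Eqs. (19)-(24).

    X_feat: (N, d) deep features with the n_labeled labeled samples first.
    labels: (n_labeled,) integer class labels in [0, n_classes).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    N, d = X_feat.shape

    # Random ELM hidden layer of the classifier
    W = rng.standard_normal((d, n_hidden))
    b = rng.standard_normal(n_hidden)
    H = sigmoid(X_feat @ W + b)                                  # (N, n_hidden)

    # Gaussian-kernel edge weights (21) and Laplacian L = D - W (20)
    sq = np.sum((X_feat[:, None, :] - X_feat[None, :, :]) ** 2, axis=-1)
    Wg = np.exp(-sq / (2.0 * sigma ** 2))
    L = np.diag(Wg.sum(axis=1)) - Wg

    # Indicator matrix J and extended label matrix T (1 / -1 / 0 coding)
    J = np.diag(np.r_[np.ones(n_labeled), np.zeros(N - n_labeled)])
    T = np.zeros((N, n_classes))
    T[:n_labeled, :] = -1.0
    T[np.arange(n_labeled), labels] = 1.0

    # Eq. (24): beta = (C*I + H^T (J + lam*L) H)^(-1) H^T J^T T
    A = C * np.eye(n_hidden) + H.T @ (J + lam * L) @ H
    beta = np.linalg.solve(A, H.T @ J.T @ T)
    return beta, W, b

# Prediction: pass new deep features through the same hidden mapping and
# take the class with the largest output:
#   y_hat = np.argmax(sigmoid(X_new @ W + b) @ beta, axis=1)
```

Replacing the dense kernel with the k-nearest-neighbor graph used in the experiments would make the edge weight matrix sparse and the Laplacian cheaper to build for large N.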
The parameters for the semi-supervised classifier are: "sig" activation function, C = 0.001, λ = 0.3, 1000 hidden neurons, σ = 0.5 for the Gaussian kernel, and 5 neighbors for the Laplacian graph. When the depth is 0, the model achieves its best accuracy of 30.2% with C = 0.0005 and λ = 0. We record this result and increase the depth to 1. When dep = 1, the parameters in the classifier are tuned first, and the corresponding result is given in Fig. 7: when C = 0.01 and λ = 0.3 (Lambda), the model gets its locally optimal result. Then, based on these values of C and λ, we adjust the parameters of the deep feature extraction procedure, and the result is illustrated in Fig. 8. From the experiment, when dep = 1, C1 = 0.1, C2 = 0.0, C = 0.01 and λ = 0.3, the network gets its best result, which is 31.2% (classification accuracy). This result is better than that at dep = 0. Hence, we increase the depth of the network to dep = 2 and tune the parameters again; the result is shown in Fig. 9. When dep = 2, the combination of parameters C1 = 0.2, C2 = 0.01, C = 0.001 and λ = 0.3 leads to the optimal localization accuracy of 34.4%, which is better than that at dep = 1. Therefore, the depth is increased to 3, and the result with different parameters is illustrated in Fig. 10. The combination of parameters C1 = 0.2, C2 = 0.01, C = 0.001 and λ = 0.4 leads to the best classification accuracy, 32%, which is not better than that at dep = 2. Hence, the model is generated with two layers of deep learning, one layer of classifier, and C1 = 0.2, C2 = 0.01, C = 0.001, λ = 0.3.

4.1.2. b) Feature Extraction

The discriminability of SDELM's feature is mainly compared with that of the shallow "mean value" feature. The different features are put into the same Nearest Neighbor (NN) classifier, since NN is the most basic classifier and needs no parameter tuning; thus the stronger the discriminability of the feature, the better the localization performance will be.
We test the feature extraction ability of the generated model with 500 labeled data, while the number of unlabeled data is varied as 0, 500, 1000, 2000 and 3000. The performance with the different features is shown in Fig. 11. Fig. 11 consists of six bars: the first one, "shallow", is the performance without deep learning, using only the shallow feature (mean value); the remaining five bars are the performance of the deep features with different numbers of unlabeled data (0, 500, 1000, 2000 and 3000). The localization accuracy here also represents the classification accuracy. From the first bar to the last, the accuracy changes as 18.6%, 19.9%, 20.5%, 21.6%, 22.8% and 23.0%, respectively. We find that, even with the most naïve NN classifier, the first two bars show that the localization performance of the deep feature is still improved by 1.3% with the same training data. And as the number of unlabeled data increases, the classification accuracy shows a rising trend. With 3000 unlabeled data, the localization accuracy improves by 4.4% over that of the shallow feature. This performance verifies that the extracted deep feature of SDELM can improve the discriminability of the fingerprints, and that unlabeled data have a positive effect on feature extraction.

Fig. 7. Localization accuracy when dep = 1 with different combinations of the classifier parameters (C and Lambda vs. localization accuracy).
Fig. 8. Localization accuracy when dep = 1 with optimized classifier parameters and different deep learning parameters (C1 and C2 vs. localization accuracy).

4.1.3. c) Localization Performance

After verifying the feature extraction ability of SDELM, we test its overall localization performance. Firstly, the localization performance of SDELM is compared with existing shallow indoor localization methods: RADAR [1], the Support Vector Machine (SVM) [41], and SELM [26]. We use the 'libsvm' [42] software package to implement SVM, and adopt cross-validation to find the best parameters c and g in the range [2^-24, 2^24] with step 2^0.5; finally c = 2^0.5 and g = 8. The parameters for SELM are: 250 hidden neurons, 5 neighbors for the Laplacian graph, σ = 0.5 for the Gaussian kernel, and λ = 0.5. The parameters for SDELM are the same as those in Section a).

The error distance criterion is adopted to measure the localization performance. Specifically, if the distance between the estimated location L_E and the real (expected) location L_R (usually the location information is a two-dimensional coordinate or a label, and each label is related to one coordinate) is smaller than a pre-defined error distance r, i.e., ||L_E − L_R|| ≤ r, then the localization is considered correct, and vice versa. The localization accuracy is the percentage of correct localizations among all localizations, i.e., acc = num_right / num_all. The localization performance of the different algorithms is illustrated in Fig. 12. In Fig. 12 it is obvious that, as the localization error distance increases, the localization accuracy of each algorithm improves.
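The error distance criterion just described can be written compactly as below, assuming locations are given as two-dimensional coordinates; the function and variable names are illustrative.

```python
import numpy as np

def localization_accuracy(est_xy, true_xy, r=1.0):
    """acc = num_right / num_all for a given error distance r (in metres)."""
    err = np.linalg.norm(np.asarray(est_xy) - np.asarray(true_xy), axis=1)  # ||L_E - L_R||
    return float(np.mean(err <= r))
```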
SELM, in particular, has better classification accuracy than RADAR and SVM, for SELM utilizes unlabeled data to achieve a more precise classification plane; SVM utilizes support vectors to mitigate the fluctuation of the wireless signal, so its performance is better than the baseline RADAR. SDELM achieves the best localization performance overall, for the extracted high level features and the semi-supervised classifier improve the localization ability.

Fig. 9. Localization accuracy when dep = 2 with different parameters: (a) different combinations of the classifier's parameters; (b) different combinations of the deep learning parameters.
Fig. 10. Localization accuracy when dep = 3 with different parameters: (a) different combinations of the classifier's parameters; (b) different combinations of the deep learning parameters.
Fig. 11. Localization performance with the same 500 labeled data and different numbers of unlabeled data.
Fig. 12. Localization performance for different localization methods (localization accuracy vs. error distance in metres, for RADAR, SVM, SELM and SDELM).

Besides the comparison with shallow localization methods, SDELM is also compared with deep learning methods: SAE (Stacked Auto Encoder) [35], DBN (Deep Belief Network) [10], and ML-ELM (Multi-Layer Extreme Learning Machine) [40]. The parameters for SDELM are the same as above, with structure 24-300-300-1000-24; the network structures for SAE, DBN and ML-ELM are 24-300-24, 24-200-24 and 24-500-2000-24, respectively. The localization performance is shown in Table 1.

Table 1. Localization performance for deep learning methods.

Method     Accuracy (%)   Training time (s)
SAE        29.80          110.3083
DBN        31.20          93.3978
ML-ELM     29.60          36.8318
SDELM      34.40          63.9448

From Table 1, we can see that, except for SDELM, the deep learning methods have similar localization performance, which is better than that of the shallow localization methods. SDELM slightly outperforms the other three methods, because it not only extracts high level features as the other deep learning methods do, but also utilizes the semi-supervised classifier to improve the classification ability. Besides the localization performance, we also compare the learning speed of the different deep learning methods. Apparently, SDELM requires less training time than SAE and DBN, for SDELM neither performs iterative optimization in the pre-training step nor needs the fine-tuning procedure. In contrast to ML-ELM, SDELM spends more time, as the calculation of the Laplacian matrix is time-consuming. However, considering the trade-off between accuracy and time consumption, SDELM is a reasonable choice for solving indoor localization problems.

4.1.4. d) Man-hour Cost

The goal of SDELM is to utilize unlabeled data to improve localization performance, so that the man-hours spent gathering labeled data can be reduced. Hence, how much calibration man-hour can be saved is tested in this part. In Fig. 12, in order to achieve approximately 34% localization accuracy (34.4%) when the error distance is 1 m, 500 labeled data and 3000 unlabeled data are needed. Here, we use only labeled data and change their number to reach the same localization performance; the result is shown in Fig. 13. The parameters of SDELM are the same as those in Section c). In Fig. 13, when the error distance is 1 m, the localization accuracy changes as 24.4%, 28.8%, 32.1%, 33.3% and 35.4% with the different numbers of labeled data. The result indicates that 1500 or 2000 labeled data are needed to achieve approximately 34% (33.3% or 35.4%) localization accuracy when the error distance is 1 m. However, when unlabeled data are used to assist localization, Fig. 12 shows that only 500 labeled data are required to achieve the same localization performance (34.4%), which means that, with the help of unlabeled data, man-hours can be saved while achieving the same result.