GaitGAN: Invariant Gait Feature Extraction Using Generative Adversarial Networks

Shiqi Yu, Haifeng Chen. Computer Vision Institute, College of Computer Science and Software Engineering, Shenzhen University, China. shiqi.yu@szu.edu.cn, chenhaifeng@email.szu.edu.cn
Edel B. García Reyes. Advanced Technologies Application Center, 7ma A 21406, Playa, Havana, Cuba. egarcia@cenatav.co.cu
Norman Poh. Department of Computer Science, University of Surrey, Guildford, Surrey, GU2 7XH, United Kingdom. n.poh@surrey.ac.uk

Abstract

The performance of gait recognition can be adversely affected by many sources of variation such as view angle, clothing, the presence and type of a carried bag, posture, and occlusion, among others. In order to extract invariant gait features, we propose a method named GaitGAN which is based on generative adversarial networks (GAN). In the proposed method, a GAN model is taken as a regressor to generate invariant gait images, that is, side-view gait images with normal clothing and without carried bags. A unique advantage of this approach is that the view angle and other variations do not need to be estimated before generating invariant gait images. The most important computational challenge, however, is how to retain useful identity information when generating the invariant gait images. To this end, our approach differs from the traditional GAN, which has only one discriminator, in that GaitGAN contains two discriminators. One is a fake/real discriminator which makes the generated gait images realistic. The other is an identification discriminator which ensures that the generated gait images contain human identification information. Experimental results show that GaitGAN can achieve state-of-the-art performance. To the best of our knowledge, this is the first gait recognition method based on GAN with encouraging results. Nevertheless, we have identified several research directions to further improve GaitGAN.

1. Introduction

Gait is a behavioural biometric modality with great potential for person identification because of its unique advantages: it is contactless, hard to fake and passive in nature, requiring no explicit cooperation from the subjects. Furthermore, gait features can be captured at a distance in uncontrolled scenarios. Therefore, gait recognition has potentially wide application in video surveillance. Indeed, many surveillance cameras have been installed in major cities around the world. With improved accuracy, gait recognition technology will certainly become another useful tool for crime prevention and forensic identification. Gait recognition is therefore, and will remain, an important research topic in the computer vision community.

Unfortunately, automatic gait recognition remains a challenging task because many variations, such as view, clothing and carried objects, can alter the human appearance drastically and greatly affect recognition accuracy. Among these variations, view is one of the most common, because we cannot control the walking directions of subjects in real applications; view invariance is the central focus of our work here.

Early literature [9] uses static body parameters measured from gait images as a kind of view-invariant feature. Kale et al. [10] used the perspective projection model to generate side-view features from arbitrary views.
Unfortunately, the relation between two views is hard to model with a simple linear model such as the perspective projection model. Other researchers have employed more complex models to handle this problem. The most commonly used model is the view transformation model (VTM), which can transform gait features from one view to another. Makihara et al. [15] designed a VTM named FD-VTM, which works in the frequency domain. Different from FD-VTM, RSVD-VTM proposed in [11] operates in the spatial domain; it uses reduced SVD to construct a VTM and optimizes Gait Energy Image (GEI) feature vectors based on linear discriminant analysis (LDA), achieving a relatively good improvement. Motivated by the capability of robust principal component analysis (RPCA) for feature
extraction, Zheng et al. [23] established a robust VTM via RPCA for view-invariant feature extraction. Kusakunniran et al. [12] considered view transformation as a regression problem and used sparse regression based on the elastic net as the regression function. Bashir et al. [1] formulated a Gaussian process classification framework to estimate the view angle of the probe set, and then used canonical correlation analysis (CCA) to model the correlation of gait sequences from different, arbitrary views. Luo et al. [14] proposed a gait recognition method based on partitioning and CCA: they divided the GEI into 5 non-overlapping parts and used CCA to model the correlation of each part. In [19], Xing et al. also used CCA, but they reformulated the traditional CCA to deal with high-dimensional matrices and reduced the computational burden of view-invariant feature extraction. Lu et al. [13] proposed a method which can handle arbitrary walking directions using cluster-based averaged gait images; however, if there is no view with a similar walking direction in the gallery set, the recognition rate decreases.

For most VTM-related methods, one view transformation model [2, 3, 11, 12, 15, 23] can only transform one specific view angle to another, and the model heavily depends on the accuracy of view angle estimation. If we want to transform gait images from arbitrary angles to a specific view, many models are needed. Other researchers have tried to achieve view invariance using only one model. Hu et al. [8] proposed a method named ViDP which extracts view-invariant features using a linear transform. Wu et al. [18] trained deep convolutional neural networks for arbitrary view pairs and achieved high accuracies.

Besides view variations, clothing can also change the human body appearance and shape greatly. Some clothes, such as long overcoats, can occlude the leg motion. Carrying condition is another factor which can affect feature extraction, since it is not easy to segment a carried object from the human body in images. Unlike their view-invariant counterparts, few methods in the literature handle clothing invariance in gait recognition. In [7], clothing invariance is achieved by dividing the human body into 8 parts, each of which is subjected to discrimination analysis. In [5], Guan et al. proposed a random subspace method (RSM) for clothing-invariant gait recognition by combining multiple inductive biases for classification. One recent method, SPAE [21], can extract invariant gait features using only one model.

In this paper, we propose to use generative adversarial networks (GAN) to address variations due to view, clothing and carrying condition simultaneously, using only one model. GAN is inspired by the two-person zero-sum game in game theory. It was developed by Goodfellow et al. [4] in 2014 and is composed of one generative model G and one discriminative model D. The generative model captures the distribution of the training data, and the discriminative model is a second classifier that determines whether its input is real or generated. The optimization of these two models is a minimax two-player game. The generative model produces a realistic image from an input random vector z. However, the early GAN model is too unconstrained in the images it generates. In [16], Mirza et al.
fed a conditional parameter y into both the discriminator and the generator as an additional input layer to increase the constraint. Meanwhile, Denton et al. proposed a method using a cascade of convolutional networks within a Laplacian pyramid framework to generate images in a coarse-to-fine fashion. As a matter of fact, GANs are hard to train and the generators often produce nonsensical outputs. The deep convolutional GAN [17] proposed by Radford et al. contains a series of strategies, such as fractional-strided convolutions and batch normalization, to make GAN training more stable. Recently, Yoo et al. [20] presented an image-conditional generation model which contains a vital component named the domain discriminator. This discriminator ensures that a generated image is relevant to its input image. Furthermore, this method performs domain transfer using GANs at the pixel level and is known as the pixel-level domain transfer GAN, or PixelDTGAN [20].

In the proposed method, GAN is taken as a regressor. Gait data captured with multiple variations can be transformed into the side view without knowing the specific view angle, clothing type or carried object. This method has great potential in real scenes.

The rest of the paper is organized as follows. Section 2 describes the proposed method. Experiments and evaluation are presented in Section 3. The last section, Section 4, gives the conclusions and identifies future work.

2. Proposed method

To reduce the effect of variations, GAN is employed as a regressor to generate invariant gait images, namely side-view gait images with normal clothing and without carried objects. Gait images at arbitrary views are converted to the side view because side-view data contains more dynamic information. While this is intuitively appealing, a key challenge that must be addressed is to preserve the human identification information in the generated gait images.

The GaitGAN model is trained on the training set to generate side-view gait images with normal clothing and without carried objects. In the test phase, gait images are sent to the GAN model and invariant gait images containing human identification information are generated. The difference between the proposed method and most other GAN-related methods is that the generated images here help to improve the discriminative capability,
rather than merely looking realistic. The most challenging part of the proposed method is to preserve human identification information while generating realistic gait images.

2.1. Gait energy image

The gait energy image (GEI) [6] is a popular gait feature, produced by averaging the silhouettes over one gait cycle of a gait sequence, as illustrated in Figure 1. GEI is well known for its robustness to noise and its efficient computation. The value of each pixel in a GEI can be interpreted as the probability that the corresponding position is occupied by the human body over one gait cycle. Given the success of GEI in gait recognition, we take GEIs as the input and target images of our method. The silhouettes and gait energy images used in the experiments are produced in the same way as described in [22].

Figure 1: A gait energy image (the right-most image) is produced by averaging all the silhouettes (the remaining images on the left) in one gait cycle.
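A minimal sketch of the GEI computation described above is given below. It assumes that the binary silhouettes have already been extracted, aligned and size-normalised as in [22]; the function name and the 64x64 example size are illustrative assumptions, not details taken from the paper.

    import numpy as np

    def gait_energy_image(silhouettes):
        """Average the binary silhouettes of one gait cycle into a GEI.

        silhouettes: array of shape (T, H, W) with values in {0, 1},
        already aligned and size-normalised. Returns an (H, W) array in
        [0, 1] whose pixels estimate the body-occupancy probability.
        """
        silhouettes = np.asarray(silhouettes, dtype=np.float32)
        return silhouettes.mean(axis=0)

    # Example with random stand-in "silhouettes" for a 30-frame cycle at 64x64:
    # gei = gait_energy_image(np.random.randint(0, 2, size=(30, 64, 64)))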
2.2. Generative adversarial networks for pixel-level domain transfer

Generative adversarial networks (GAN) [4] are a branch of unsupervised machine learning, implemented as a system of two neural networks competing against each other in a zero-sum game framework. A generative model G captures the data distribution. A discriminative model D takes either real data from the training set or a fake image generated by G and estimates the probability that its input came from the training data rather than from the generator. For image data, the eventual goal of the generator is to map a low-dimensional space z to the pixel-level image space so that the generator can produce a realistic image given an input random vector z. Both G and D can be non-linear mapping functions. When G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation.

The input of the generative model can be an image instead of a noise vector. A GAN can then realize pixel-level domain transfer between an input image and a target image, as in PixelDTGAN proposed by Yoo et al. [20]. PixelDTGAN can transfer a visual input into a different form which is visualized through the generated pixel-level image; in this way it simulates the creation of mental images from visual scenes and objects perceived by the human eye. In that work, the authors defined two domains, a source domain and a target domain, connected by a semantic meaning. For instance, the source domain is an image of a dressed person with variations in pose, and the target domain is an image of that person's shirt. PixelDTGAN can therefore transfer an image from the source domain, a photo of a dressed person, to a pixel-level target image of the shirt. Meanwhile, the transferred image should look realistic while preserving the semantic meaning. The framework consists of three important parts, as illustrated in Figure 2. While the real/fake discriminator ensures that the generated images are realistic, the domain discriminator ensures that the generated images contain the semantic information.

Figure 2: The framework of PixelDTGAN, which consists of three important parts.

The first important component is a pixel-level converter, composed of an encoder for semantic embedding of a source image and a decoder to produce a target image. The encoder and decoder are implemented by convolutional neural networks. However, training the converter is not straightforward because the target is not deterministic. Consequently, additional strategies, in the form of loss functions on top of the converter, are needed to constrain the produced target image. Therefore, Yoo et al. connected a separate network, named the domain discriminator, on top of the converter. The domain discriminator takes a pair consisting of a source image and a target image as input, and is trained to produce a scalar probability of whether the input pair is associated or not. The loss function \mathcal{L}_A^D of the domain discriminator D_A is defined as in [20]:

\mathcal{L}_A^D(I_S, I) = -t \cdot \log\big[D_A(I_S, I)\big] + (t-1) \cdot \log\big[1 - D_A(I_S, I)\big], \quad
t = \begin{cases} 1 & \text{if } I = I_T \\ 0 & \text{if } I = \hat{I}_T \\ 0 & \text{if } I = I_T^- \end{cases}    (1)
where I_S is the source image, I_T is the ground-truth target, I_T^- is an irrelevant target, and \hat{I}_T is the image generated by the converter.

The other component is the real/fake discriminator, which is similar to that of a traditional GAN in that it is supervised by real/fake labels so that the entire network produces realistic images. The discriminator outputs a scalar probability indicating whether the input image is real or not. Its loss function \mathcal{L}_R^D, according to [20], takes the form of binary cross-entropy:

\mathcal{L}_R^D(I) = -t \cdot \log\big[D_R(I)\big] + (t-1) \cdot \log\big[1 - D_R(I)\big], \quad
t = \begin{cases} 1 & \text{if } I \in \{I_i\} \\ 0 & \text{if } I \in \{\hat{I}_i\} \end{cases}    (2)

where \{I_i\} contains real training images and \{\hat{I}_i\} contains fake images produced by the generator.

Labels are given to the two discriminators, and they supervise the converter to produce images that are realistic while keeping the semantic meaning.

2.3. GaitGAN: GAN for gait recognition

Inspired by the pixel-level domain transfer in PixelDTGAN, we propose GaitGAN to transform gait data from any view, clothing and carrying condition to an invariant view, namely the side view with normal clothing and without carried objects, while preserving the identification information. We set the GEIs at all viewpoints, with clothing and carrying variations, as the source, and the GEIs of normal walking at 90° (side view) as the target, as shown in Figure 3. The converter contains an encoder and a decoder, as shown in Figure 4.

Figure 3: The source images and the target image. 'NM' stands for the normal condition, 'BG' for carrying a bag and 'CL' for dressing in a coat, as defined in the CASIA-B database.

Figure 4: The structure of the converter, which transforms source images to the target one shown in Figure 3.

There are two discriminators. The first is a real/fake discriminator which is trained to predict whether an image is real. If the input GEI comes from real gait data at the 90° view under normal walking, the discriminator should output 1; otherwise, it should output 0. The structure of the real/fake discriminator is shown in Figure 5.

Figure 5: The structure of the real/fake discriminator. The input is either a target image or a generated image.

With the real/fake discriminator alone, we can only generate side-view GEIs that look plausible, and the identification information of the subjects may be lost. To preserve the identification information, another discriminator, named the identification discriminator and similar to the domain discriminator in [20], is introduced. The identification discriminator takes a source image and a target image as input, and is trained to produce a scalar probability of whether the input pair shows the same person. If the two input images are from the same subject, the output should be 1. If they belong to two different subjects, the output should be 0. Likewise, if the input is a source image paired with a target generated by the converter, the discriminator should output 0. The structure of the identification discriminator is shown in Figure 6.

Figure 6: The structure of the identification discriminator. The input is two images concatenated along the channel dimension.
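To make the roles of the two discriminators concrete, the sketch below shows one way the losses of Eqs. (1) and (2), adapted to the identification discriminator, could be written with PyTorch-style binary cross-entropy. The function signature, tensor shapes and variable names are our own illustrative assumptions; the paper itself only specifies the layer configurations given later in Tables 2 and 3.

    import torch
    import torch.nn.functional as F

    def discriminator_losses(d_real_fake, d_ident, src, target_real, target_fake, src_other):
        """Illustrative losses for GaitGAN's two discriminators.

        d_real_fake(x)     -> probability (shape (N, 1)) that GEI x is a real side-view GEI
        d_ident(a, b)      -> probability (shape (N, 1)) that GEIs a and b show the same subject
        src, target_real   : source GEI and real side-view GEI of the same subject
        target_fake        : converter output for src
        src_other          : a GEI of a different subject
        """
        ones = torch.ones(src.size(0), 1)
        zeros = torch.zeros(src.size(0), 1)

        # Eq. (2): real/fake discriminator trained with binary cross-entropy.
        loss_rf = F.binary_cross_entropy(d_real_fake(target_real), ones) + \
                  F.binary_cross_entropy(d_real_fake(target_fake.detach()), zeros)

        # Eq. (1) in its identity form: same-subject pair -> 1,
        # different subject or generated target -> 0.
        loss_id = F.binary_cross_entropy(d_ident(src, target_real), ones) + \
                  F.binary_cross_entropy(d_ident(src_other, target_real), zeros) + \
                  F.binary_cross_entropy(d_ident(src, target_fake.detach()), zeros)

        return loss_rf, loss_id

Following the usual adversarial recipe, the converter would be updated with the opposite labels on its generated samples; the relative weighting of the two discriminator terms is not stated in the paper, so it is omitted here.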
3. Experiments and analysis

3.1. Dataset

The CASIA-B gait dataset [22] is one of the largest public gait databases. It was created by the Institute of Automation, Chinese Academy of Sciences, in January 2005 and consists of 124 subjects (31 females and 93 males) captured from 11 views. The view range is from 0° to 180°, with an 18° interval between two nearest views. There are 10 sequences for each subject: 6 sequences of normal walking ("nm"), 2 sequences of walking with a bag ("bg") and 2 sequences of walking in a coat ("cl").

3.2. Experimental design

In our experiments, all three types of gait data ("nm", "bg" and "cl") are involved. We put the six normal walking sequences, the two sequences with a coat and the two sequences with a bag of the first 62 subjects into the training set, and the remaining 62 subjects into the test set. In the test set, the first 4 normal walking sequences of each subject are put into the gallery set and the others into the probe set, as shown in Table 1. There are four probe sets to evaluate different kinds of variations.

Table 1: The experimental design.

Set                  Subjects       Sequences
Training set         ID: 001-062    nm01-nm06, bg01, bg02, cl01, cl02
Gallery set          ID: 063-124    nm01-nm04
Probe set ProbeNM    ID: 063-124    nm05, nm06
Probe set ProbeBG    ID: 063-124    bg01, bg02
Probe set ProbeCL    ID: 063-124    cl01, cl02
Probe set ProbeALL   ID: 063-124    nm05, nm06, bg01, bg02, cl01, cl02

3.3. Model parameters

In the experiments, we used a setup similar to that of [20], as shown in Figure 4. The converter is a unified network that is end-to-end trainable, but it can be divided into two parts, an encoder and a decoder. The encoder is composed of four convolutional layers that abstract the source image into another space which should capture the personal attributes of the source as well as possible. The resulting feature z is then fed into the decoder, which constructs a relevant target through four decoding layers. Each decoding layer conducts fractional-stride convolutions, where the convolution operates in the opposite direction. The details of the encoder and decoder structures are shown in Tables 2 and 3. The structures of the real/fake discriminator and the identification discriminator are similar to the encoder's four convolutional layers; all layers of the discriminators are convolutional.

Table 2: Details of the encoder. The first four layers of the encoder are the same as those of the real/fake discriminator and the identification discriminator; after Conv.4, the real/fake and identification discriminators connect a Conv.5 layer to output a binary value.

Layer    Number of filters   Filter size      Stride   Batch norm   Activation function
Conv.1   96                  4 x 4 x {1,1,2}  2        N            L-ReLU
Conv.2   192                 4 x 4 x 96       2        Y            L-ReLU
Conv.3   384                 4 x 4 x 192      2        Y            L-ReLU
Conv.4   768                 4 x 4 x 384      2        Y            L-ReLU

Table 3: Details of the decoder. F denotes fractional stride.

Layer      Number of filters   Filter size    Stride   Batch norm   Activation function
F-Conv.1   768                 4 x 4 x 384    1/2      Y            L-ReLU
F-Conv.2   384                 4 x 4 x 192    1/2      Y            L-ReLU
F-Conv.3   192                 4 x 4 x 96     1/2      Y            L-ReLU
F-Conv.4   96                  4 x 4 x 1      1/2      N            Tanh

Normally, a large number of training iterations is needed to achieve good performance with deep-learning-based methods. From Figure 7 we can see that more iterations do result in a higher recognition rate, but the rate peaks at around 450 epochs, so in our experiments the training was stopped after 450 epochs.
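As a concrete reading of Tables 2 and 3, the sketch below assembles an encoder-decoder converter with the listed filter counts, 4x4 kernels, stride-2 and fractional-stride (transposed) convolutions, batch normalisation and leaky-ReLU/tanh activations, using PyTorch. The input resolution, padding and leaky-ReLU slope are not given in the paper, so the values used here (64x64 single-channel GEIs, padding 1, slope 0.2) are assumptions for illustration only.

    import torch
    import torch.nn as nn

    def conv_block(c_in, c_out, batch_norm=True):
        # 4x4 convolution with stride 2 halves each spatial dimension (Table 2).
        layers = [nn.Conv2d(c_in, c_out, kernel_size=4, stride=2, padding=1)]
        if batch_norm:
            layers.append(nn.BatchNorm2d(c_out))
        layers.append(nn.LeakyReLU(0.2, inplace=True))
        return layers

    def deconv_block(c_in, c_out, batch_norm=True, last=False):
        # 4x4 fractional-stride (transposed) convolution doubles each dimension (Table 3).
        layers = [nn.ConvTranspose2d(c_in, c_out, kernel_size=4, stride=2, padding=1)]
        if batch_norm:
            layers.append(nn.BatchNorm2d(c_out))
        layers.append(nn.Tanh() if last else nn.LeakyReLU(0.2, inplace=True))
        return layers

    converter = nn.Sequential(
        # Encoder: 1 -> 96 -> 192 -> 384 -> 768 channels.
        *conv_block(1, 96, batch_norm=False),
        *conv_block(96, 192),
        *conv_block(192, 384),
        *conv_block(384, 768),
        # Decoder: 768 -> 384 -> 192 -> 96 -> 1 channels, tanh output.
        *deconv_block(768, 384),
        *deconv_block(384, 192),
        *deconv_block(192, 96),
        *deconv_block(96, 1, batch_norm=False, last=True),
    )

    # A batch of two 64x64 source GEIs maps to two 64x64 side-view GEIs.
    out = converter(torch.randn(2, 1, 64, 64))   # out.shape == (2, 1, 64, 64)

Under this reading, the real/fake and identification discriminators would reuse the same four convolutional blocks, with 1- and 2-channel inputs respectively (matching the {1,1,2} entry in Table 2), followed by a final convolution that outputs a single probability.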
3.4. Experimental results on CASIA-B dataset

To evaluate the robustness of the proposed GaitGAN, three sources of variation have been evaluated, covering view, clothing and carried objects. The performance of the model under these conditions is shown in Tables 4-6. For Table 4, the first four normal sequences at a specific view are put into the gallery set, and the last two normal sequences at another view are put into the probe set. Since there are 11 views in the database, there are 121 pairs of view combinations. In each table, each row corresponds to a view angle of the gallery set, whereas each column corresponds to a view angle of the probe set. The recognition rates of these combinations are listed in Table 4. For the results in Table 5, the main difference from Table 4 is the probe set: the probe data contains images of people carrying bags, so the carrying condition differs from that of the gallery set. The probe sets for Table 6 contain gait data with coats.

Figure 7: The recognition rate as a function of the number of iterations. The blue, red and green curves correspond to normal walking (ProbeNM), walking with a bag (ProbeBG) and walking in a coat (ProbeCL) sequences in the probe set, respectively.

Table 4: Recognition rates (%) of ProbeNM in experiment 2, training with sequences containing three conditions. Rows: gallery set view; columns: probe set view (normal walking, nm05, nm06).

Gallery\Probe     0      18     36     54     72     90    108    126    144    162    180
0              100.0   79.03  45.97  33.87  28.23  25.81  26.61  25.81  31.45  54.84  72.58
18              78.23  99.19  91.94  63.71  46.77  38.71  37.90  44.35  43.55  65.32  58.06
36              56.45  88.71  97.58  95.97  75.00  57.26  59.68  72.58  70.16  60.48  35.48
54              33.87  53.23  85.48  95.97  87.10  75.00  75.00  77.42  63.71  37.10  22.58
72              27.42  41.13  69.35  83.06 100.0   96.77  89.52  73.39  62.10  37.10  17.74
90              22.58  37.10  54.84  74.19  98.39  98.39  96.77  75.81  57.26  35.48  21.77
108             20.16  32.26  58.06  76.61  90.32  95.97  97.58  95.97  74.19  38.71  22.58
126             29.84  37.90  66.94  75.00  81.45  79.03  91.13  99.19  97.58  59.68  37.10
144             28.23  45.97  60.48  66.94  61.29  59.68  75.00  95.16  99.19  79.84  45.97
162             48.39  63.71  60.48  50.00  41.13  33.87  45.16  65.32  83.06  99.19  71.77
180             73.39  56.45  36.29  25.81  21.77  19.35  20.16  31.45  44.35  72.58 100.0

Table 5: Recognition rates (%) of ProbeBG in experiment 2, training with sequences containing three conditions. Rows: gallery set view; columns: probe set view (walking with a bag, bg01, bg02).

Gallery\Probe     0      18     36     54     72     90    108    126    144    162    180
0               79.03  45.97  33.06  14.52  16.13  14.52  11.29  15.32  22.58  33.87  41.13
18              54.84  76.61  58.87  31.45  26.61  16.13  24.19  29.84  32.26  41.94  32.26
36              36.29  58.87  75.81  53.23  44.35  30.65  34.68  46.77  42.74  34.68  20.16
54              25.00  45.16  66.13  68.55  57.26  42.74  41.13  45.97  40.32  20.16  13.71
72              20.16  24.19  38.71  41.94  65.32  56.45  57.26  51.61  39.52  16.94   8.87
90              15.32  27.42  37.90  38.71  62.10  64.52  62.10  61.29  38.71  20.97  12.10
108             16.13  25.00  41.13  42.74  58.87  58.06  69.35  70.16  53.23  24.19  11.29
126             19.35  29.84  41.94  45.16  46.77  52.42  58.06  73.39  66.13  41.13  22.58
144             26.61  32.26  48.39  37.90  37.10  36.29  38.71  67.74  73.39  50.00  32.26
162             29.03  34.68  36.29  25.00  19.35  16.13  20.16  37.90  51.61  76.61  41.94
180             42.74  28.23  24.19  12.90  11.29  11.29  14.52  21.77  30.65  49.19  77.42

Table 6: Recognition rates (%) of ProbeCL in experiment 2, training with sequences containing three conditions. Rows: gallery set view; columns: probe set view (walking in a coat, cl01, cl02).

Gallery\Probe     0      18     36     54     72     90    108    126    144    162    180
0               25.81  16.13  15.32  12.10   6.45   6.45   9.68   7.26  12.10  11.29  15.32
18              17.74  37.90  34.68  20.97  13.71   8.87  12.10  19.35  16.94  24.19  19.35
36              13.71  24.19  45.16  43.55  30.65  19.35  16.94  22.58  28.23  20.16  10.48
54               2.42  19.35  37.10  55.65  39.52  22.58  29.03  29.84  29.84  16.94   8.06
72               4.84  12.10  29.03  40.32  43.55  34.68  32.26  28.23  33.87  12.90   8.06
90               4.03  10.48  22.58  31.45  50.00  48.39  43.55  36.29  31.45  13.71   8.06
108              4.03  12.90  27.42  27.42  38.71  44.35  47.58  38.71  32.26  15.32   4.84
126             10.48  10.48  23.39  27.42  26.61  25.81  37.10  45.97  41.13  15.32  10.48
144              8.87  13.71  26.61  22.58  18.55  19.35  21.77  35.48  43.55  20.97  12.90
162             14.52  18.55  20.97  17.74  12.10  12.10  17.74  21.77  37.10  35.48  21.77
180             17.74  13.71  11.29   6.45  10.48   5.65   6.45   5.65  14.52  29.03  27.42

3.5. Comparisons with GEI+PCA and SPAE

Since GEIs are used as input in an attempt to extract invariant features, we first compare our method with GEI+PCA [6] and SPAE [21]. The experimental protocols, in terms of the gallery and probe sets, for GEI+PCA and SPAE are exactly the same as those presented in Table 1. Due to limited space, we only list four probe angles with a 36° interval; the compared angles are 36°, 72°, 108° and 144°, and each row of Figure 8 represents one probe angle. The first column of Figure 8 compares the recognition rates of the proposed GaitGAN with GEI+PCA and SPAE at different probe angles on normal walking sequences. The second column shows the comparison under different carrying conditions, and the third under different clothing. As illustrated in Figure 8, the proposed method outperforms GEI+PCA at all probe-angle and gallery-angle pairs. Meanwhile, its performance is comparable to that of SPAE and better than SPAE at many points. The results show that the proposed method can produce features that are as robust as those of the state-of-the-art methods in the presence of view, clothing and carrying-condition variations.

Figure 8: Comparison with GEI+PCA and SPAE at different probe angles. Each row represents a probe angle and each column represents a different type of probe sequence. The blue lines are achieved by the proposed method.

We also compared the recognition rates without view variation. This can be done by taking the average of the rates on the diagonals of Table 4, Table 5 and Table 6. The corresponding average rates of GEI+PCA and SPAE are obtained in the same manner. The results are shown in Figure 9. When there is no variation, the proposed method achieves a high recognition rate which is better than those of GEI+PCA and SPAE. When variations exist, the proposed method outperforms GEI+PCA greatly.

Figure 9: The average recognition rates on the diagonal compared with GEI+PCA and SPAE under the three conditions. The green bars are achieved by the proposed method.
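The cross-view protocol behind Tables 4-6 and the diagonal averages discussed above can be summarised in a short evaluation sketch. The paper does not spell out the matching rule applied to the generated side-view GEIs; a simple nearest-neighbour classifier on the converter outputs, as assumed below, is one common choice, and the function and variable names are ours.

    import numpy as np

    def rank1_rate(gallery_feats, gallery_ids, probe_feats, probe_ids):
        """Rank-1 recognition rate (%) with a nearest-neighbour classifier.

        gallery_feats, probe_feats: (N, D) arrays of flattened generated GEIs.
        gallery_ids, probe_ids:     (N,) arrays of subject labels.
        """
        correct = 0
        for feat, true_id in zip(probe_feats, probe_ids):
            dists = np.linalg.norm(gallery_feats - feat, axis=1)   # Euclidean distance
            correct += int(gallery_ids[np.argmin(dists)] == true_id)
        return 100.0 * correct / len(probe_ids)

    # One cell of Tables 4-6: gallery GEIs at view g, probe GEIs at view p,
    # both passed through the trained converter first (not shown here).
    # The diagonal averages of Figure 9 average the 11 cells with g == p.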
3.6. Comparisons with state-of-the-art

In order to better analyse the performance of the proposed method, we further compare GaitGAN with additional state-of-the-art methods, including FD-VTM [15], RSVD-VTM [11], RPCA-VTM [23], R-VTM [12], GP+CCA [1], C3A [19] and SPAE [21]. The probe angles selected are 54°, 90° and 126°, as in the experiments of those methods. The experimental results are shown in Figure 10. From the results we can see that the proposed method outperforms the others when the angle difference between the gallery and the probe is large, which shows that the model can handle large viewpoint variations well. When the viewpoint variation is smaller, the proposed method also improves the recognition rate noticeably.

Figure 10: Comparison with existing methods at probe angles (a) 54°, (b) 90° and (c) 126°. The gallery angles are the remaining 10 angles, excluding the corresponding probe angle.

In Table 7, the experimental results of C3A [19], ViDP [8], CNN [18], SPAE [21] and the proposed method are listed. Here we want to emphasise that the proposed method obtains comparable results using only one generative model for all views and for clothing and carrying-condition variations simultaneously. Meanwhile, this method is the first use of GAN for gait recognition, and the experimental results show that GAN is feasible for gait recognition under significant variations.

Table 7: Average recognition rates at probe angles 54°, 90° and 126°. The gallery angles are the remaining 10 angles, excluding the corresponding probe angle. The values in the right-most column are the average rates over the three probe angles.

Method       54°      90°      126°     Average
C3A [19]     56.64%   54.65%   58.38%   56.56%
ViDP [8]     64.2%    60.4%    65.0%    63.2%
CNN [18]     77.8%    64.9%    76.1%    72.9%
SPAE [21]    63.31%   62.1%    66.29%   63.9%
Proposed     64.52%   58.15%   65.73%   62.8%
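As a quick arithmetic check on Table 7 (not part of the original text), the right-most column is simply the mean of the three probe-angle columns; for the proposed method, (64.52 + 58.15 + 65.73) / 3 = 62.80, i.e. 62.8%. The same three per-angle values can also be recovered from Table 4 by averaging each probe column over the ten gallery angles other than the probe angle itself.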
4. Conclusions and future work

In this paper, we proposed to apply PixelDTGAN, a variant of generative adversarial networks, to deal with variations in viewpoint, clothing and carrying condition simultaneously in gait recognition. The resulting model is known as GaitGAN. Extensive experiments on the CASIA-B database show that GaitGAN can transform gait images obtained from any viewpoint to the side view, and abnormal walking sequences to normal walking sequences, without the need to estimate the subject's view angle, clothing type or carrying condition beforehand. Experimental results show that the recognition rate of the proposed model is comparable to that of the state-of-the-art methods. Indeed, GaitGAN is shown to be promising for practical applications in video surveillance.

There are, however, a number of limitations which need to be addressed in future work. The number of samples currently used for training is relatively small; in the future, we will use a larger database for training. In addition, more complex and powerful networks can be used. In this work, we have merely utilized the generated side-view image as the matching feature, without exploring intermediate-layer features or other effective feature extraction methods. We believe that better GAN technologies will further improve gait recognition in the future. Besides gait recognition, other recognition and classification problems could also benefit from the development of GAN.

Acknowledgment

The authors would like to thank Dr. Xianglei Xing for his support with the experimental results of some methods. The work is supported by the Science Foundation of Shenzhen (Grant No. JCYJ20150324141711699).

References

[1] K. Bashir, T. Xiang, and S. Gong. Cross-view gait recognition using correlation strength. In BMVC, 2010.
[2] X. Ben, W. Meng, R. Yan, and K. Wang. An improved biometrics technique based on metric learning approach. Neurocomputing, 97:44-51, 2012.
[3] X. Ben, P. Zhang, W. Meng, R. Yan, M. Yang, W. Liu, and H. Zhang. On the distance metric learning between cross-domain gaits. Neurocomputing, 208:153-164, 2016.
[4] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672-2680, 2014.
[5] Y. Guan, C. T. Li, and Y. Hu. Robust clothing-invariant gait recognition. In 2012 Eighth International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pages 321-324, July 2012.
[6] J. Han and B. Bhanu. Individual recognition using gait energy image. IEEE TPAMI, 28(2):316-322, 2006.
[7] M. A. Hossain, Y. Makihara, J. Wang, and Y. Yagi. Clothing-invariant gait identification using part-based clothing categorization and adaptive weight control. Pattern Recognition, 43(6):2281-2291, 2010.
[8] M. Hu, Y. Wang, Z. Zhang, J. J. Little, and D. Huang. View-invariant discriminative projection for multi-view gait-based human identification. IEEE TIFS, 8(12):2034-2045, Dec 2013.
[9] A. Y. Johnson and A. F. Bobick. A multi-view method for gait recognition using static body parameters. In Proc. of 3rd International Conference on Audio- and Video-Based Biometric Person Authentication, pages 301-311, Jun. 2001.
[10] A. Kale, A. K. R. Chowdhury, and R. Chellappa. Towards a view invariant gait recognition algorithm.
In IEEE Conference on Advanced Video and Signal Based Surveillance, pages 143-150, July 2003.
[11] W. Kusakunniran, Q. Wu, H. Li, and J. Zhang. Multiple views gait recognition using view transformation model based on optimized gait energy image. In ICCV Workshops, pages 1058-1064, 2009.
[12] W. Kusakunniran, Q. Wu, J. Zhang, and H. Li. Gait recognition under various viewing angles based on correlated motion regression. IEEE TCSVT, 22(6):966-980, 2012.
[13] J. Lu, G. Wang, and P. Moulin. Human identity and gender recognition from gait sequences with arbitrary walking directions. IEEE TIFS, 9(1):51-61, Jan 2014.
[14] C. Luo, W. Xu, and C. Zhu. Robust gait recognition based on partitioning and canonical correlation analysis. In IEEE International Conference on Imaging Systems and Techniques, 2015.
[15] Y. Makihara, R. Sagawa, Y. Mukaigawa, T. Echigo, and Y. Yagi. Gait recognition using a view transformation model in the frequency domain. In ECCV, pages 151-163, 2006.
[16] M. Mirza and S. Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
[17] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015.
[18] Z. Wu, Y. Huang, L. Wang, X. Wang, and T. Tan. A comprehensive study on cross-view gait based human identification with deep CNNs. IEEE TPAMI, 39(2):209-226, Feb 2017.
[19] X. Xing, K. Wang, T. Yan, and Z. Lv. Complete canonical correlation analysis with application to multi-view gait recognition. Pattern Recognition, 50:107-117, 2016.
[20] D. Yoo, N. Kim, S. Park, A. S. Paek, and I. S. Kweon. Pixel-level domain transfer. arXiv preprint arXiv:1603.07442, 2016.
[21] S. Yu, H. Chen, Q. Wang, L. Shen, and Y. Huang. Invariant feature extraction for gait recognition using only one uniform model. Neurocomputing, 239:81-93, 2017.
[22] S. Yu, D. Tan, and T. Tan. A framework for evaluating the effect of view angle, clothing and carrying condition on gait recognition. In ICPR, pages 441-444, August 2006.
[23] S. Zheng, J. Zhang, K. Huang, R. He, and T. Tan. Robust view transformation model for gait recognition. In ICIP, pages 2073-2076, 2011.