logo资料库

Scribble+based+3D+shape+segmentation+via+weakly-supervised+learn....pdf

第1页 / 共12页
第2页 / 共12页
第3页 / 共12页
第4页 / 共12页
第5页 / 共12页
第6页 / 共12页
第7页 / 共12页
第8页 / 共12页
资料共12页,剩余部分请下载后查看
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TVCG.2019.2892076, IEEE Transactions on Visualization and Computer Graphics IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. XX, NO. X, MONTH YEAR 1 Scribble based 3D shape segmentation via weakly-supervised learning Zhenyu Shu, Xiaoyong Shen, Shiqing Xin*, Qingjun Chang, Jieqing Feng, Ladislav Kavan, and Ligang Liu Abstract—Shape segmentation is a fundamental problem in shape analysis. Previous research shows that prior knowledge helps to improve the segmentation accuracy and quality. However, completely labeling each 3D shape in a large training data set requires a heavy manual workload. In this paper, we propose a novel weakly-supervised algorithm for segmenting 3D shapes using deep learning. Our method jointly propagates information from scribbles to unlabeled faces and learns deep neural network parameters. Therefore, it does not rely on completely labeled training shapes and only needs a really simple and convenient scribble-based partially labeling process, instead of the extremely time-consuming and tedious fully labeling processes. Various experimental results demonstrate the proposed method’s superior segmentation performance over the previous unsupervised approaches and comparable segmentation performance to the state-of-the-art fully supervised methods. Index Terms—3D shapes, Segmentation, Scribble, Weakly-supervised, Deep learning ! 1 INTRODUCTION A Utomatic segmentation of 3D shapes is a fundamental problem in shape analysis [1]. It is also central to many computer graphics problems, including mesh parameteri- zation [2], skeleton extraction [3], resolution modeling [4], shape retrieval [5] and so on. A commonly used idea in traditional segmentation algorithms is to first map all the tri- angular faces on 3D shapes, which are usually represented by 3D triangular meshes, into feature space using shape signatures, then clustering the faces on a mesh into several clusters in the feature space, and finally transforming the result onto the dual graph of the mesh to obtain the final seg- mentation result. Earlier segmentation work [1], [6], [7] does not utilize any prior knowledge and focuses on employing unsupervised methods to cluster faces in the feature space to obtain the desired results. Recent work [8], [9], [10] uses prior knowledge to improve performance by classifying faces in the feature space using supervised methods. There are two trends for improving the performance of 3D shape segmentation. One trend is to use multiple shape signatures simultaneously to improve the ability to describe a shape’s geometric features from different perspectives, as a single signature can only characterize a shape from a specific perspective. • Zhenyu Shu and Qingjun Chang are with School of Computer and Data Engineering, Ningbo Institute of Technology, Zhejiang University, Ningbo, PR China. E-mail: shuzhenyu@nit.zju.edu.cn (Zhenyu Shu) • Xiaoyong Shen is with Department of Computer Science and Engineering, • The Chinese University of Hong Kong, HongKong. Shiqing Xin is with School of Computer Science and Technology, Shan- Dong University, Jinan, PR China. Corresponding author. E-mail: xinshiqing@163.com (Shiqing Xin) Jieqing Feng is with State Key Lab of CAD&CG, Zhejiang University, Hangzhou, PR China. Ladislav Kavan is with School of Computing, University of Utah, Salt Lake City, USA. Ligang Liu is with Graphics & Geometric Computing Laboratory, School of Mathematical Sciences, University of Science and Technology of China, Anhui, PR China. • • • Manuscript received month day, year; revised month day, year. Fig. 1. Some representative results of our approach. The other trend is to use and learn from prior knowl- edge, such as training data, to improve the faces’ classifi- cation results in feature space. These two techniques work very well and are used in the state-of-the-art segmentation systems [8], [9], [10], [11], [12]. Due to their nature, most existing learning-based 3D shape segmentation methods rely heavily on high-quality training data to gain a signif- icant performance improvement. However, preparing high- quality training data for 3D shape segmentation is quite expensive because manually labeling each face on a 3D shape is a very tedious and time-consuming task, even though some tools [13] have been developed to accelerate the labeling process. As an example, preparing the training data for 380 3D shapes in the Princeton Shape Benchmark dataset, 12 volunteers took about 10 hours to complete the manual labeling. In this paper, we propose an effective weakly-supervised method for shape segmentation based on deep learning to address the problem of tedious manual labeling. In our method, we use sparse scribble-based labels, rather than the complete high-quality labels to train our 3D shape segmen- tation model. Our method does not need the tedious and time-consuming labeling process for preparing training data and therefore is much more practical than existing learning- 1077-2626 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. Authorized licensed use limited to: Shanghai Jiaotong University. Downloaded on February 21,2020 at 10:35:33 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TVCG.2019.2892076, IEEE Transactions on Visualization and Computer Graphics IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. XX, NO. X, MONTH YEAR 2 Fig. 2. Workflow of our proposed approach for 3D shape segmentation. based 3D shape segmentation approaches. Experimental results on the Princeton Shape Benchmark dataset [13] show that our method outperforms other state-of-the-art scribble- based or non-scribble-based segmentation methods, and can achieve comparable or only slightly worse perfor- mance when compared with existing fully supervised deep learning-based 3D shape segmentation methods. Figure 1 shows some representative results of our approach, and an overview of our algorithm is shown in Figure 2. The main contribution of this work is our novel shape segmentation method using weakly-supervised deep learn- ing techniques, eliminating the tedious and time-consuming training data preparation processes. The remaining parts of the paper are organized as follows. Section 2 reviews the related work on 3D shape segmentation. In Section 3, we give our novel scribble- based deep training model and the overall framework of our algorithm. After that, we show extensive experimental results, as well as comparisons with the state of the art, in Section 4. Finally, we discuss the limitations and future work in Section 5 and conclude this paper in Section 6. 2 RELATED WORK Shape segmentation [14], [15] aims at dividing a 3D shape into meaningful parts and plays an important role in shape analysis and shape understanding. Until now, many meth- ods have been proposed to address this problem. A com- mon practice is to first extract geometric feature vectors and then apply some decomposition techniques to obtain a satisfactory segmentation result: extreme learning ma- chine [9], approximate convexity analysis [16], concavity- aware fields [17], spectral clustering [18], K-Means [19], core extraction [20], randomized cuts [21], normalized cuts [21], graph cuts [21], [22], random walks [23]. The existing shape segmentation methods can be mainly divided into two categories, including unsupervised 3D shape segmentation and supervised 3D shape segmentation. Unsupervised 3D shape segmentation. Earlier work focuses on segmenting one single 3D shape without prior knowl- edge. For example, Sidi and van Kaick et al. [7] proposed an unsupervised method that treats segmentation as a clustering problem in a descriptor space. Golovinskiy and Funkhouser [24] use graph clustering techniques to segment a set of 3D shapes simultaneously. Their assumption is that a global rigid alignment exists between the input shapes, which facilitates iteratively establishing correspondence be- tween respective parts. Huang and Koltun et al. [25] sug- gested formulating the joint segmentation problem as an integer quadratic programming problem. Experimental re- sults show that segmenting a set of similar 3D shapes si- multaneously significantly outperforms traditional segmen- tation that targets a single shape. Hu and Fan et.al [26] also presented an unsupervised approach for segmenting a set of 3D shapes by over-segmenting the input shapes into primitive patches and then grouping similar patches via a subspace clustering scheme. Recently, Meng and Xia et al. [6] proposed to improve the segmentation results by a multi-label optimization process. Compared with segmenting a single 3D shape, segment- 1077-2626 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. Authorized licensed use limited to: Shanghai Jiaotong University. Downloaded on February 21,2020 at 10:35:33 UTC from IEEE Xplore. Restrictions apply. PredictFeature DescriptorsPredictPredictWeakly supervised Deep LearningFeature DescriptorsFeature DescriptorsOver-segmentationOver-segmentationOver-segmentationFeature Vectors(Partly with Labels)Graph CutsGraph CutsGraph CutsSegmentation on training shapesPredictPredictGraph CutsGraph CutsSegmentation on testing shapes
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TVCG.2019.2892076, IEEE Transactions on Visualization and Computer Graphics IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. XX, NO. X, MONTH YEAR 3 ing a family of 3D shapes in the same class can achieve better results, assuming that their underlying relevance is effectively utilized. Therefore, more work employing this idea was proposed by being combining with various techniques. Wu and Wang et al. [1] suggested a spectral clustering method that generates consistent segmentation by performing spectral clustering in a fused space of shape descriptors. Xu and Li et al. [27] used anisotropic part scales to part correspondence and their algorithm performs well on a diverse set of shapes. Later, Zheng and Cohen-Or et al. [28] suggested an indirect top-down approach to deal with large shape variations. A significant advantage of unsupervised 3D shape seg- mentation is that it does not need any training data, which can be expensive to prepare. However, due to the absence of prior knowledge, their segmentation performance may not be as good as supervised ones. Supervised 3D shape segmentation. Because 3D shape seg- mentation can actually be regarded as a face clustering process in the feature space, a data-driven method may be helpful to obtain better results. Therefore, more recent work tries to further improve performance by utilizing prior knowledge during segmentation. For example, Wang and Asafi et al. [29] presented a semi-supervised learning method where the user actively assists in the co-analysis by iteratively providing inputs. Their method requires only a sparse set of constraints to quickly converge toward a consistent and error-free semantic labeling of the set. Kalogerakis and Hertzmann et al. [8] proposed a supervised approach to do labeling and segmentation simultaneously. They showed that the segmentation performance can be dramatically improved by learning techniques. Van Kaick and Tagliasacchi et al.[30] introduced an approach to part correspondence which incorporates prior knowledge im- parted by a training set of pre-segmented, labeled models and combines that knowledge with content-driven analysis based on the geometric similarity between the matched shapes. Recently, Guo and Zou et al. [10] train convolutional neural networks to effectively classify faces in feature space to segment 3D shapes. Supervised segmentation methods usually outperform unsupervised ones. However, they often require a huge set of manually labeled segmentation results, which greatly limits their applicability. Existing learning-based segmentation methods [8], [9], [10] need a fully labeled training shapes to train a learning model. Due to the limited generalizability of learning-based methods, the prediction models must be learned by using 3D training shapes similar to the testing 3D shapes, to achieve satisfactory segmentation results. For example, to segment a chair shape, one has to construct a prediction model by learning from training shapes of chairs. If we want to segment a variety of 3D shapes using learning-based methods, we have to manually prepare many different train- ing 3D shapes, which usually require a different set of train- ing data for different types of 3D shapes, where each face on the 3D shapes needs to be labeled. Therefore, expanding the capabilities of the system to different types of shapes becomes prohibitively expensive in terms of the manual labor required for producing the training data. Although a tool [13] for manually segmenting 3D shapes has been provided by the computer graphics research community, the Fig. 3. Some examples of weakly labeled shapes with scribbles (top). The labeling information is automatically propagated to unlabeled parts on the 3D shapes by our algorithm (bottom). Note that we use different colors to show disconnected segments even if they have the same label. labeling process is still quite tedious and time-consuming. To reduce the time of training data preparation, we pro- pose a novel optimization model to remove the necessity of fully labeled training 3D shapes. With our new optimization model, only some sparsely labeled training 3D shapes are required for the learning process. The labeling information will be automatically propagated to the unlabeled parts of the training 3D shapes and given as a part of the optimization results in our algorithm. Figure 3 shows the weakly labeled training 3D shapes used as the input of our algorithm and the corresponding propagation results. To our best knowledge, we are the first to introduce weakly-supervised deep learning techniques to 3D shape segmentation. It is worth pointing out that the distinguish- ing feature of our method is that it does not need fully labeled training shapes for the learning process, which can greatly reduce the expenses spent on training data prepa- ration in existing learning-based segmentation method. For existing learning-based segmentation methods, a large number of similar fully labeled training shapes must be prepared. That means the users must fully label a lot of training shapes, which is quite expensive. Also, when segmenting different target 3D shapes, different groups of corresponding training 3D shapes are required for training the learning model, which is a significant drawback of the existing learning-based segmentation methods. Our method removes the necessity of the tedious and time-consuming training 3D shapes preparation process, and directly learns from scribble-based weakly labeled training 3D shapes. Therefore, our method is much more practical when com- pared with the existing learning-based methods, because it does not require fully labeled training 3D shapes. 3 SCRIBBLE-BASED LEARNING MODEL 3.1 Objective Function We use a graphical model to propagate information from scribbles to unknown faces. For acceleration, we over- segment 3D shapes first, where each shape is divided into 1077-2626 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. Authorized licensed use limited to: Shanghai Jiaotong University. Downloaded on February 21,2020 at 10:35:33 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TVCG.2019.2892076, IEEE Transactions on Visualization and Computer Graphics IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. XX, NO. X, MONTH YEAR 4 primitive patches, such as the ones achieved in [21]. We then build a dual graph of the primitive patches. Each primitive patch is regarded as a vertex in the dual graph and an edge in the graph represents the similarity between two patches. Given a training 3D shape M, the scribble labels of M are S = {sk, lk} where sk are the faces of a scribble k and lk is that scribble’s category label. For a primitive patch xi, we want to find its category label yi. In order to merge the initial small patches on the sur- face of a 3D shape into resulting segments, we have three considerations. Ei First, the segmentation results should respect the scribble labels provided by users and regard them as the ground truth. Therefore, we use the following unary energy term Ei scr (yi) to ensure the optimization results comply with the training labels provided by scribbles. if yi = lk and xi ∩ sk = ∅ if yi ∈ {lk} and xi ∩ S = ∅ otherwise  0 1|{lk}| − log ∞ scr (yi) = The meaning of Ei (1) scr (yi) is the following: If a primitive patch xi overlaps with a scribble sk, then its corresponding penalty is zero when assigned lk or infinity when the assigned label is not lk. Otherwise, xi can be assigned any labels with equal probability if it does not overlap with any scribble. However, it can never be assigned any labels that do not occur in this 3D shape. Second, the segmentation results should conform to the predicted results of our trained model, i.e., a deep neural network. We use a unary term Ei neu (yi, Θ) to respect the output of the deep neural network and define it as: neu (yi, Θ) = − log P (yi |X, Θ ) . Ei (2) Here Θ represents the parameters of the deep neural net- work. log P (yi |X, Θ ) denotes the log probability of predict- ing xi to have the label yi. It can be computed by summing the log probabilities of all faces in the patch xi. Finally, faces with similar geometric features should be merged into one common segment. To measure the simi- larity between two primitive patches in the feature space, we define a pairwise term Ei,j f ea (yi, yj) in our optimization problem as follows: −f (xi)−f (xj )2 2 δ2 exp 0 if yi = yj otherwise (3) Ei,j f ea (yi, yj) = where f (xi) and f (xj) represent the corresponding feature vectors of primitive patches xi and xj, respectively. The purpose of this term is to ensure that two primitive patches should be very different in the feature space if they have different labels. With the above considerations, we give the following ob- jective function to minimize by our scribble-based learning model: E = Escr (Y ) + Eneu (Y, Θ) + λEf ea (Y ) , where Y scr (yi), Eneu (Y, Θ) = denotes the Ei (4) {yi}, Escr (Y ) = neu (yi, Θ) and Ef ea (Y ) = Ei set of i Ei,j f ea (yi, yj). i i,j There are two sets of variables to be optimized, which are primitive patches’ category labels Y and the deep neural network’s parameters Θ, such that the above objective func- tion is minimized to get the trained deep learning models. In this paper, λ is empirically set to be 1.0. Equation (4) is only for one 3D shape. Our final energy function sums Equation (4) over all 3D shapes in the same category. 3.2 Optimization The optimization problem proposed in the previous section can be effectively solved by alternating optimization (also known as block coordinate descent). We fix Θ to solve for Y and then fix Y to solve for Θ. Propagating users’ scribble information to unlabeled faces. When Θ is fixed, the solver propagates labels to unmarked faces based on users’ scribble, geometric features, and deep neural network predictions. In this step, the objec- tive function can be effectively minimized by using graph cuts algorithm [31] because the objective function in Equa- tion (4) can be seen as a linear combination of the unary neu (yi, Θ)) and the pairwise term f ea (yi, yj)). This formulation is widely used for 3D scr (yi) + term ( ( i Ei,j Ei Ei i i,j shape segmentation [6], [9], [10]. Optimizing deep neural network parameters. When Y is fixed, minimizing the objective function turns into training a deep neural network and the solver learns a deep neural network for face-wise 3D shape segmentation. Given the labels Y for all faces on 3D shapes, the deep neural net- work can be trained by regarding feature vectors and label information of all faces as input vectors and output proba- bility vectors respectively. The last layer of the deep neural network outputs the face-wise log probabilities, which are used to update the unary term in our energy function. Our solver is initialized from the graph cuts step without using network prediction and then alternates between the two steps. In our experiments, the neural network contains 4 fully connected layers and the numbers of neurons in the two hidden layers are experimentally tuned to best fit the training set. The first layer and the last layer contains d and c neurons respectively, where d is the dimension of input feature vectors and c is the number of label categories provided by scribbles. The rectified linear unit [32] is em- ployed as the activation function for each layer except the last layer. The last layer of our neural network is a softmax classifier producing the probabilities of each faces belonging to different categories. Our solver stops iterations when the total energy does not decrease anymore. Note that here the graph cuts step is applied on the patch level, while the deep neural network is directly trained on the face level. Theoretically, we can change our objective function so that our method can apply the graph cuts step and the training step on the face level. However, applying graph cuts on the patch level instead of the face level can further reduce the computation time. Therefore, we apply graph cuts algorithm on the patch level (usually 50 patches) instead of the face level in our method. Figure 4 shows the optimization process of our algorithm for an ant model. In this figure, we can see that the interme- diate segmentations improve with each successive iteration. 1077-2626 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. Authorized licensed use limited to: Shanghai Jiaotong University. Downloaded on February 21,2020 at 10:35:33 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TVCG.2019.2892076, IEEE Transactions on Visualization and Computer Graphics IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. XX, NO. X, MONTH YEAR 5 (a) Scribble (b) Iteration 1 (c) Iteration 2 (d) Iteration 3 (e) Final result Fig. 4. The optimization process for an ant model. With the decreasing value of the objective function, the intermediate segmentation result of each iteration becomes better and better. The solver converges after 4 iterations. The top row and the bottom row show the ant model from the top view and bottom view respectively. Average geodesic distance, shape diameter function and Gaussian curvature are used as feature descriptors. In each iteration, the labeling information is learned from the result of the graph cut by the deep neural network, and then used to more accurately predict the labeling informa- tion. As the deep neural network gets updated, the labeling information from the network becomes more reliable, im- proving the accuracy propagated labeling information from the graph cuts. This propagated labeling information in turn improve the deep neural network learning. In our exper- iments, we empirically find that the optimization process converges after four iterations and the satisfactory result is obtained. 3.3 Shape signatures selection To extract feature vectors for faces and primitive patches, four widely known feature descriptors, including average geodesic distance (AGD) [33], shape diameter function (SDF) [3], Gaussian curvature (GC) [34] and scale-invariant heat kernel signatures (SIHKS) [35], are selected in this paper. These feature descriptors are deemed to have the capability of characterizing the geometric properties well from different perspectives. For each face, we concatenate the values (for AGD, SDF, and GC) and vectors (for SIHKS, the dimension is experimentally set to be 19) together and obtain a corresponding feature vector, whose dimension is 22, to be used in Equation (2). For each primitive patch, a different strategy is adopted for extracting feature vectors. For scalar fields defined on each face, such as AGD, SDF, and GC, we use histograms capturing the feature distribution to extract the feature vectors for each primitive patch. For SIHKS, which is defined as a vector field on each face, we extract feature vectors by employing the famous bag-of- feature (BoF) technique for each primitive patch. We note that the number of bins in the histograms and that of the bags for the bag-of-feature representation are both set to be B = 100. Like the face-level situation, the feature vectors from different shape signatures are also concatenated into a single feature vector to be used in Equation (1) and (3). In this way, any feature descriptor can be adapted into our algorithm framework. 3.4 Algorithm Our algorithm works on a set of similar 3D shapes and it consists of five stages, as illustrated in Figure 2. First, similar to the super-pixel based image segmentation [36], [37], we divide each shape into primitive patches for speed- up, like in [21]. Second, we calculate feature vectors for each patch, as described in Section 3.3. Then in the next stage, we manually scribble the shapes and get the weakly-labeled training data for our segmentation method. After that, with the scribble-based labels and the extracted feature vectors for each face and primitive patch, a deep learning model is constructed and trained to predict the label of each primitive patch in the shapes. Finally, the learned model is applied to the shapes to obtain the segmentation results. We apply graph cuts algorithm [6], [31] to further improve the quality of the segmentation. Our algorithm can be summarized as follows. Given a set of manually scribbled training 3D shapes, 1) 2) For each training 3D shape, divide it into several primitive patches. For each primitive patch, calculate corresponding feature vectors. 3) Minimize the objective function (4) using alternating optimization. a) Apply graph cuts technique to get an initial b) guess of Y . Fix Y and train deep neural network param- eters Θ to minimize the objective function (4). 1077-2626 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. Authorized licensed use limited to: Shanghai Jiaotong University. Downloaded on February 21,2020 at 10:35:33 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TVCG.2019.2892076, IEEE Transactions on Visualization and Computer Graphics IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. XX, NO. X, MONTH YEAR 6 Parameter settings. In this paper, the weight coefficient λ in the optimization problem (4) is set to 1.0. In our experiments, we tried various choices of L. We show the average Rand Index scores with different values of L on PSB in Figure 5. It can be seen that the Rand Index score gets worse when L is far less than 50 because small parts cannot be captured very well by our method in this case. However, if L is far larger than 50, too many patches will lead our method to be too inefficient. Based on the above observation, the number of patches for over-segmenting is experimentally set to L = 50. 4.1 Results Our method is a weakly-supervised one and the training dataset only requires partially annotated 3D shapes. It is worth pointing out that, although the 3D shapes in the training set only contain partially annotated information (obtained from scribbles) instead of fully annotated ones, all the 3D shapes in the training set are automatically fully annotated by our learning process as a by-product of our algorithm. Therefore, our algorithm can produce two sets of segmentation results. One set of results are the segmentation results on the testing set, obtained by applying our learned model to the testing set. The other set of results are the segmentation results on the training set, obtained from the learning process of our method. Figure 6 shows the segmentation results on the training shapes by using our method for some representative cate- gories of 3D shapes in PSB. Figure 7 shows the segmentation results on the testing shapes by using our method on the same dataset. Although there are large shape variations, a large majority of the segmentation results are desirable and consistent with our perception. The Rand Index score statistics of our segmentation for training shapes on the PSB dataset, as well as those of other methods, are detailed in Table 1. These were obtained by first computing the average Rand Index score for each category respectively, and then averaging over all categories in the corresponding dataset. From Table 1 we can see that our algorithm obtains an average Rand Index of 0.076, which outperforms related algorithms. 4.2 Sensitivities to scribble quality The performance of our algorithm relies on the scribbles provided by users. To test the sensitivity of our algorithm to scribble quality, we compute the Rand Index scores of our algorithm for each 3D shape in PSB, while varying the percentage of the surface covered by the scribbles.. For a 3D shape, let S be the set of faces covered by the scribbles provided by our volunteers. We gradually remove faces from S to reduce the coverage of scribbles. For each step, we randomly remove faces from S and get the corresponding Rand Index scores from our algorithm. Figure 8 shows the corresponding Rand Index scores after averaging over the whole PSB. We can see that although the average Rand In- dex score decreases rapidly when more scribble information provided by users, the score is already very small even when the scribbles provided by the user only cover 1.0% area of the whole surfaces of 3D shapes. Fig. 5. The average Rand Index scores with regard to L on the PSB dataset, where lower values indicate better results. c) Fix Θ and optimize Y to minimize the objec- tive function (4). d) Repeat step (3b) and step (3c) until conver- gence. The results of our algorithm include not only segmented training 3D shapes but also a neural network which can be used to predict and segment test 3D shapes. 4 EVALUATION Experimental dataset. We test our segmentation algorithm on the Princeton Segmentation Benchmark (PSB) dataset [13] and ShapeNetCore dataset [12], [38], which are both open datasets for 3D shape segmentation. The PSB dataset con- tains 19 different object categories and each category con- tains 20 3D shapes. For PSB dataset, we use the ground- truth provided by [8] for evaluation. For ShapeNetCore dataset, we use the ground-truth provided by [38], which are converted from point-based segmentation results pro- vided by [12]. It is worth pointing out that most of the 3D shapes in the ShapeNetCore dataset are non-manifold while our algorithm needs a manifold shape as the input. Therefore, we resample and reconstruct all the shapes in the dataset so that all 3D shapes become manifold. We developed a simple software tool to facilitate the labeling process and employed 3 volunteers to scribble the training 3D shapes. Some examples of scribble results are shown in Appendix. Evaluation metrics. We use four metrics, Rand Index, Cut Discrepancy, Hamming Distance and Consistency Error, which are defined by [13] to evaluate the segmentation results. Rand Index is used to compute the similarity be- tween two segmentations. Cut Discrepancy measures the similarity between different cuts by comparing the bound- aries. As opposed to Cut Discrepancy, Hamming Distance compares the regions and evaluates the minimum number of substitutions required when changing one region into another. Consistency Errors, including the global one (GCE) and the local one (LCE), measure the hierarchical differences between two segmentations of the same shape. 1077-2626 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. Authorized licensed use limited to: Shanghai Jiaotong University. Downloaded on February 21,2020 at 10:35:33 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TVCG.2019.2892076, IEEE Transactions on Visualization and Computer Graphics IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. XX, NO. X, MONTH YEAR 7 (a) AirPlane (b) Table (c) Teddy (d) Cup Fig. 6. Representative segmentation results produced by our approach (on training shapes) on the Princeton segmentation benchmark dataset. (a) AirPlane (b) Table (c) Teddy (d) Cup Fig. 7. Representative segmentation results produced by our approach (on testing shapes) on the Princeton segmentation benchmark dataset. The Rand Index scores of segmentation for each category of 3D shapes in PSB dataset with different methods. TABLE 1 Human Cup Octopus Ant Chair Glasses Airplane Object Categories Ours 0.064 0.021 0.048 0.081 0.007 0.029 0.022 0.014 0.068 0.078 0.050 0.100 0.092 0.065 0.222 0.128 0.106 0.101 0.148 0.076 Table Teddy Hand Plier Fish Bird Bust Mech Bearing Vase Fourleg Average Armadillo PMC WcSeg 0.059 0.084 0.125 0.089 0.124 0.167 0.070 0.108 0.009 0.057 0.066 0.116 0.026 0.066 0.025 0.056 0.040 0.294 0.103 0.220 0.078 0.110 0.144 0.471 0.094 0.212 0.195 0.076 0.189 0.256 0.064 0.132 0.087 0.159 0.109 0.351 0.139 0.175 0.162 0.098 RandCuts NormCuts RandWalks Kmeans 0.117 0.268 0.100 0.129 0.041 0.172 0.093 0.413 0.050 0.132 0.119 0.307 0.105 0.093 0.227 0.348 0.160 0.115 0.182 0.167 0.126 0.432 0.137 0.200 0.083 0.081 0.101 0.260 0.122 0.180 0.175 0.450 0.183 0.126 0.327 0.367 0.250 0.285 0.201 0.215 0.162 0.443 0.246 0.249 0.076 0.143 0.103 0.274 0.113 0.194 0.206 0.429 0.220 0.117 0.330 0.388 0.303 0.301 0.232 0.238 0.136 0.559 0.195 0.236 0.109 0.193 0.142 0.465 0.177 0.218 0.179 0.476 0.182 0.108 0.352 0.505 0.322 0.389 0.197 0.270 SDF 0.133 0.384 0.197 0.083 0.009 0.075 0.041 0.174 0.043 0.199 0.368 0.263 0.104 0.076 0.308 0.242 0.105 0.212 0.127 0.165 FitPrim CoreExtra 0.139 0.488 0.223 0.179 0.100 0.198 0.132 0.309 0.130 0.234 0.170 0.477 0.191 0.088 0.315 0.417 0.202 0.271 0.189 0.234 0.199 0.309 0.260 0.272 0.056 0.166 0.048 0.210 0.103 0.166 0.067 0.303 0.111 0.165 0.281 0.335 0.380 0.182 0.164 0.199 In order to obtain a desirable segmentation result, there are two key aspects. On the one hand, users’ input scribbles should be representative, i.e., a scribble should be able to reflect the key geometric features or the overall geometric shape of the target area dominated by the scribble. On the other hand, the segmentation algorithm should be suf- ficiently intelligent to enable different scribbles to snatch their own regions while maintaining a regional balance. Generally speaking, the two aspects are closely related and inseparable. If the geometric shapes of different patches are much different from each other, then it decouples the requirements of input scribbles. For example, for the desk model in Figure 9, whether long scribbles or short scribbles, the segmentation results are identical. However, when the geometric features of different patches are quite similar, scribbles that are more representative have to be required. If the input scribbles are very short and not representative (see the top row of Figure 10), our algorithm may report a low-quality segmentation result. But generally speaking, our algorithm is able to yield a high-quality segmentation result as long as the input scribbles are not intentionally bad. See the bottom row of Figure 10. 1077-2626 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. Authorized licensed use limited to: Shanghai Jiaotong University. Downloaded on February 21,2020 at 10:35:33 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TVCG.2019.2892076, IEEE Transactions on Visualization and Computer Graphics IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. XX, NO. X, MONTH YEAR 8 Fig. 8. The average Rand Index scores of our segmentation results on training shapes when using different percent of coverage with scribbles. Fig. 9. Segmentation results produced by using short and long scribbles for a Table model. The top line shows short scribbles and the corre- sponding segmentation results. The bottom line shows long scribbles and the corresponding segmentation results. 4.3 Comparison We compare our method with 12 other segmentation al- gorithms including Paint mesh cutting (PMC, [39]), ap- proximate convexity analysis (WcSeg, [16]), Randomized cuts [21], Normalized cuts [21], Random walks [23], K- Means [19], SDF [3], Fitting primitive [40], Core extrac- tion [20], FSDL [10], the method of Kalogerakis et al. (SB19, [8]) and Extreme Learning Machine (ELM, [9]) on the PSB database. All the related segmentation algorithms are compared against human-generated segmentations (pro- vided by [8]). Table 1 shows the comparison of results from our method on training shapes and other algorithms using Rand Index. Additionally, we also use the other four metrics for evaluation and the detailed statistics plots are shown in Figure 11. It can be seen that our segmentation method on training shapes outperforms other segmentation approaches no matter what kind of evaluation metric is used, except the WcSeg method. When comparing with the WcSeg method, our algorithm is also better for all metrics except Consistency Error. No matter what kind of metric is used, lower values mean closer similarity to the human- generated ground truth. Figure 12 provides a qualitative comparison on the Teddy model, the Cup model, and the Armadillo model. Comparison with another scribble-based method. Table 1 shows the comparison between our method and Paint mesh Fig. 10. Segmentation results produced by using short and long scrib- bles on a Bird model. The top line shows short scribbles and the cor- responding segmentation results. The bottom line shows long scribbles and the corresponding segmentation results. cutting (PMC, [39]), which is also a 3D shape segmentation algorithm based on users’ scribbles, with Rand Index score. From the results, we can see that our method outperforms PMC for 16 out of 19 categories. Furthermore, the average Rand Index score of our method for all categories is lower than PMCs’. Additionally, Figure 11 shows the superior performance of our method compared to PMC. Comparison with fully supervised deep learning method. We compare our method with three other 3D shape segmenta- tion methods (FSDL, [10], SB19, [8], ELM, [9]), which em- ploys fully supervised deep learning techniques to segment 3D shapes. Table 2 shows the Rand Index scores for each category of 3D shapes in PSB using these four methods. Note that the results from our method on testing shapes and other methods are obtained by regarding 19 3D shapes in each category (there are 20 3D shapes in each category in total) as training set and the one left out 3D shape is used as testing set. From the results, we can see that Rand Index scores of our method on training shapes for 7 categories are better than FSDLs’ and worse for the other 12 categories. The Rand Index scores of our method on testing shapes for 6 categories is better than FSDLs’ and worse for the other categories. The average performance of our method is worse than FSDL. Similarly, SB19 also achieve slightly better performance than our method. We believe the main reason is that our method is weakly-supervised and utilizes much fewer human labeled training samples than FSDL. Figure 13 shows a qualitative comparison between our method and FSDL on the Fish model, the Bird model, and the Armadillo model. Additionally, we compare our method with the state-of- the-art method PFCN proposed in [38] on ShapeNetCore dataset ([12]), which is also fully supervised. The method combines image-based fully convolutional networks and surface-based conditional random fields to obtain coherent segmentations of 3D shapes. The ShapeNetCore dataset contains 4916 3D shapes totally, where the most 3D shapes in the dataset are non-manifold. Because our method can only be applied to manifold 3D shapes, we remesh all the non-manifold shapes in ShapeNetCore and convert them to manifold version. The training and testing setting used in the two methods follows the one provided by PFCN. Table 3 shows the Rand Index scores of two methods for 14 categories of 3D shapes in the manifold ShapeNetCore dataset. We can see that Rand Index scores of our method on training shapes for 5 categories are better than PFCNs’ and worse for the other 9 categories. The Rand Index scores of 1077-2626 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. Authorized licensed use limited to: Shanghai Jiaotong University. Downloaded on February 21,2020 at 10:35:33 UTC from IEEE Xplore. Restrictions apply.
分享到:
收藏