
Research Paper - Bank Telemarketing Forecasting Model Based on t-SNE-SVM.pdf

Bank Telemarketing Forecasting Model Based on t-SNE-SVM
Abstract
Keywords
1. Introduction
2. Prerequisite Knowledge
2.1. t-Distributed Stochastic Neighbor Embedding (t-SNE)
2.2. Support Vector Machine (SVM)
3. Data and Empirical Analysis
3.1. Problem Description
3.2. Data Processing
3.3. Model Framework
4. Evaluation and Comparative Analysis
4.1. Confusion Matrix
4.2. ROC Curve
4.3. Lift Curve
5. Conclusion
Acknowledgements
Conflicts of Interest
References
Appendix
Journal of Service Science and Management, 2020, 13, 435-448
https://www.scirp.org/journal/jssm
ISSN Online: 1940-9907
ISSN Print: 1940-9893

Bank Telemarketing Forecasting Model Based on t-SNE-SVM

Jianguo Che1, Sai Zhao1, Yongfan Li2, Kai Li1
1Business School, Nankai University, Tianjin, China
2Mechanical Electrical Engineering School, Beijing Information Science & Technology University, Beijing, China

How to cite this paper: Che, J. G., Zhao, S., Li, Y. F., & Li, K. (2020). Bank Telemarketing Forecasting Model Based on t-SNE-SVM. Journal of Service Science and Management, 13, 435-448. https://doi.org/10.4236/jssm.2020.133029

Received: April 14, 2020
Accepted: May 15, 2020
Published: May 18, 2020

Copyright © 2020 by author(s) and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY 4.0). http://creativecommons.org/licenses/by/4.0/
Open Access

Abstract

As a low-cost marketing model, telemarketing has long been the most important channel for banks to promote wealth management products. Traditional telemarketing not only intrudes on many of the customers who are called, but also wastes the bank's own resources. To improve the success rate of bank telemarketing, it is necessary to predict in advance which customers are most likely to purchase the wealth management product, so as to achieve precision marketing. To address the complex, high-dimensional, nonlinear characteristics of the factors affecting the success rate of telemarketing, this paper first applies t-SNE (t-distributed stochastic neighbor embedding) for feature extraction, and then takes the extracted low-dimensional features as input to a nonlinear support vector machine (SVM) for training and prediction. The empirical results show that the bank telemarketing prediction model based on t-SNE-SVM proposed in this paper has good learning ability and generalization ability, and can provide a decision-making reference for banks and other industries seeking to achieve precision marketing.

Keywords

Telemarketing, t-SNE, Support Vector Machine (SVM), Forecasting, Precision Marketing

1. Introduction

As a typical strategy to promote business development, marketing activities can generally be divided into mass marketing and direct marketing. Mass marketing uses newspapers, radio, television and other media to promote products to the general public, while direct marketing contacts customers directly through mobile phones, fixed phones, email, etc. to promote products or provide customers with
discounts. In today’s highly competitive market environment, mass marketing is no longer an effective and reliable method, and marketing is moving from traditional mass marketing to direct marketing (Elsalamony, 2014). Within direct marketing, telemarketing has gradually become one of the most widely used channels because of its low cost and ease of communication. Compared with traditional account-manager marketing and branch marketing, telemarketing has significant advantages in promoting financial products to individual customers, and when the products are suitable, telephone marketing can generate profits directly, thereby providing banks with new marketing channels and profit channels (Song, 2011). However, with the increasing use of telemarketing, companies have also received more and more customer complaints (Moro et al., 2012). For most customers who are reluctant to purchase the product, marketing calls may be intrusive. On the other hand, many commercial banks have not implemented effective classified marketing strategies for their customers and often sell the same wealth management product to many customers. This fails to use information effectively to analyze customer needs and wastes a lot of manpower and other resources (Liu & Zhang, 2008). Therefore, companies implementing telemarketing need to analyze customer data in advance using predictive models in order to select the customers who are most likely to respond to targeted marketing (Sing’oei et al., 2013), which can not only improve the marketing efficiency of bank managers but also reduce intrusiveness to non-target customers as far as possible.

Banks usually have a large number of databases consisting of customer information and transaction information.
These databases can not only provide banks with accurate and timely business and management information, but also support functional query, analysis, and decision advice on that information, and provide detailed information support for marketing activities (An, 2007). Data mining technology can not only explain past marketing results, but also provide decision support for future marketing activities, so it is widely used in bank marketing business. Many data mining algorithms are used to predict the success rate of telemarketing, such as support vector machines (Moro et al., 2012), deep convolutional neural networks (Kim et al., 2015), and comparative analyses of various classification algorithms (Elsalamony, 2014; Moro et al., 2011, 2014). Amponsah et al. (2016) used the J48 decision tree and Naive Bayes classifier to conduct an empirical analysis of the survey data set of a rural bank in Ghana, and provided decision suggestions for bank staff selling a loan product to community residents. Mitik et al. (2017) constructed a data mining method based on profit-cost analysis. The empirical results show that this method slightly reduces total profit, but because total cost drops significantly, the total profit/cost ratio increases significantly. Villuendas-Rey et al. (2017) proposed a naive association classifier. This classifier uses a new similarity operator, which can handle missing values as well as mixed categorical and numerical variables. Experiments on related financial data sets verified that the classifier performs better than other traditional classifiers. Zakaryazad et al. (2016)
considered the cost of misclassification of each sample, modified the artificial neural network using a penalty function, and applied it to fraud detection and bank marketing. Although the empirical results were not as good as before, this method achieved better performance when profit indicators were considered.

In related financial and commercial fields, Pang et al. (2009) embedded Boosting technology into the decision tree C5.0 algorithm, established a personal credit rating model based on C5.0, and performed credit rating on personal credit data of a German bank. Li et al. (2011) introduced the cross-selling sequence model to the cross-selling analysis of domestic commercial banks, constructed a cross-selling sequence model for individual customers, and used Logistic regression to predict the probability of customers buying different products. The results show that it can help banks implement cross-selling strategies effectively. Fang et al. (2014) constructed a personal credit risk early-warning model based on the Lasso-Logistic model. An empirical analysis of credit card consumer credit default data showed that this model better captures the key factors affecting consumer credit risk and at the same time has higher prediction accuracy. Zhang et al. (2015) applied Logistic regression and SVM to bank credit risk early warning. This method can fully capture and characterize the linear and non-linear complex features of the factors influencing customer defaults. The model has better generalization ability and can warn of consumer credit risks accurately. Xiao et al. (2018) used a BP neural network to evaluate the credit of network lenders. Verification on actual P2P transaction data showed that the model has strong predictive ability and can be applied to the risk control of P2P online loan platforms to a certain extent. Zhang et al.
(2018) constructed a credit risk assessment model of P2P online loan borrowers based on non-equilibrium fuzzy approximate support vector machines. The empirical results show that the model has better classification accuracy and better adaptability than other models. In addition, it can effectively reduce the impact of sample imbalance on classification results.

Because a bank marketing data set generally contains many variables (such as numerous customer attributes), it slows model training and consumes more time and memory (the curse of dimensionality), and it easily causes over-learning, which makes the model difficult to understand. Dimension reduction is therefore necessary. Dimension reduction greatly reduces the time and memory requirements of the data mining algorithm, and reducing the original data to 2D or 3D makes the data easier to visualize (Tan et al., 2001). However, using automatic or semi-automatic feature selection methods to eliminate some input features may cause a loss of input information. To this end, this paper proposes a t-SNE-SVM-based bank telephone marketing prediction method. This method first uses the t-SNE algorithm to reduce, in a visualizable way, the dimensionality of the input attributes that may affect the success rate of telemarketing, reducing complexity while retaining as much of the information in the original input features as possible. The reduced low-dimensional data are then used as input to the SVM algorithm for learning and training, in order to predict which customers will buy the wealth management products.

2. Prerequisite Knowledge

2.1. t-Distributed Stochastic Neighbor Embedding (t-SNE)

t-distributed stochastic neighbor embedding (t-SNE) is a nonlinear dimensionality reduction and visualization algorithm proposed by Maaten et al. (2008), which maps multi-dimensional data to two or three dimensions suitable for human observation. t-SNE is derived from SNE (Stochastic Neighbor Embedding). The idea of the SNE algorithm is that, while mapping high-dimensional data to low-dimensional data, it tries to keep the probability distribution between the points unchanged, that is, data points that are similar in the high-dimensional space have similar distances in the low-dimensional space. The SNE algorithm has two main disadvantages: one is the large amount of gradient calculation caused by asymmetry, and the other is the crowding problem, that is, the clusters of different classes are crowded together and cannot be distinguished. In addition, the SNE algorithm only focuses on the locality of the data and ignores the globality of the data. The t-SNE algorithm improves on these defects. For a detailed introduction to the t-SNE algorithm, see Maaten et al. (2008).

2.2. Support Vector Machine (SVM)

Support Vector Machine (SVM) was first proposed by Corinna Cortes and Vapnik in 1995. It has many unique advantages in solving small-sample, non-linear and high-dimensional pattern recognition problems and is widely used in industry, computer science, finance and other fields. The support vector machine provides better decision boundaries than traditional neural networks. Its classification results are better than those of many parametric or non-parametric statistical techniques, and the problem of overfitting can be overcome through the concept of structural risk minimization (Kim et al., 2018).
By non-linearly mapping the input vector to a high-dimensional feature space, the support vector machine can use a linear model to construct a nonlinear classification boundary. Kernel functions are often used instead of dot product operations between vector pairs in the transformed space. Common positive definite kernel functions that conform to Mercer’s theorem are:

● Linear kernel $k(x, y) = x \cdot y$, used for linearly separable cases and less applicable to practically complex problems;
● Polynomial kernel $k(x, y) = (x \cdot y + 1)^d$, suitable for orthogonal normalized data. The larger the parameter $d$, the higher the dimension of the mapping and the more complicated the calculation, and the “over-fitting” phenomenon tends to occur;
● Radial basis kernel (Gaussian kernel) $k(x, y) = \exp(-\gamma \|x - y\|^2)$, the most widely used kernel function because it has fewer parameters and is easier to learn;
● Sigmoid kernel $k(x, y) = \tanh(a (x \cdot y) + b)$, derived from neural networks. If the Sigmoid function is used as the kernel function, the support vector machine is equivalent to a multilayer perceptron neural network.

In this paper, we use the radial basis function as the kernel function to construct a non-linear support vector machine model that learns from the low-dimensional input obtained by t-SNE feature extraction. We then use the constructed model to make classification predictions on the test samples, to effectively identify which customers are most likely to order bank wealth management products.

3. Data and Empirical Analysis

3.1. Problem Description

The data in this article come from the real telephone marketing data collected by Sérgio Moro, with 20 input variables, including 15 customer attributes such as customer age, occupation, and marital status, and five social and economic attributes such as employment change rate, consumer price index, and consumer confidence index (Moro et al., 2014). The output variable is whether the customer orders a certain financial product (yes/no). A detailed introduction of each variable is given in Table A1 in the Appendix.

3.2. Data Processing

The data set has a total of 41,188 customer records, of which 4640 customers successfully ordered the wealth management product, accounting for 11.27% of the total. The customer response rate of traditional telephone marketing is therefore very low, which means most customers are disturbed without result, and the bank's own resources are wasted. A review of the data shows 6 attributes with missing values (“unknown”), as shown in Table 1.
After inspection, it is found that 79.12% of the customers in the credit default (default) field have not experienced a credit default, the remaining 20.87% of the customers have an unknown default status, and only 3 customers (0.01%) have had a credit default (see http://archive.ics.uci.edu/ml/datasets/Bank+Marketing for a more detailed understanding of the data). Therefore, this field is filtered out during modeling.

Table 1. Attributes with missing values.

Attributes    Number of valid records    Effective record ratio
Job           40,858                     99.2%
Marital       41,108                     99.8%
Education     39,457                     95.8%
Default       32,591                     79.1%
Housing       40,198                     97.6%
Loan          40,198                     97.6%
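As a quick consistency check, the ratios in Table 1 can be recomputed from the valid-record counts and the total of 41,188 records. The sketch below is only illustrative: the dictionary keys and the helper `count_valid` are hypothetical names introduced here, not part of the paper.

```python
# Valid-record counts copied from Table 1; the total comes from Section 3.2.
TOTAL_RECORDS = 41_188
VALID_RECORDS = {
    "job": 40_858,
    "marital": 41_108,
    "education": 39_457,
    "default": 32_591,
    "housing": 40_198,
    "loan": 40_198,
}


def effective_ratio(valid, total=TOTAL_RECORDS):
    """Return the percentage of records whose value is not missing."""
    return 100.0 * valid / total


def count_valid(records, field, missing_marker="unknown"):
    """Count rows whose `field` is present (hypothetical helper for raw rows)."""
    return sum(1 for row in records if row[field] != missing_marker)


# Recompute the "effective record ratio" column, rounded to one decimal place.
ratios = {name: round(effective_ratio(n), 1) for name, n in VALID_RECORDS.items()}
for name, pct in ratios.items():
    print(f"{name:<10} {VALID_RECORDS[name]:>6} {pct:5.1f}%")
```

On raw rows, `count_valid(rows, "job")` would produce the count that feeds `effective_ratio`, assuming each row is a dictionary and missing values are stored as the string "unknown".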
Because the sample is large enough, customer records that contain missing values (unknown) are deleted to ensure the accuracy of the model. This leaves 38,245 customer records with complete information, of which 4258 (11.13%) customers successfully ordered products and 33,987 (88.87%) customers did not.

The imbalance between the two output classes (yes/no) hampers the learning process of the model, so that the model's predictions lack practical value. The most commonly used method to solve the problem of class imbalance is undersampling, that is, reducing the number of samples of the larger class so that the positive and negative samples are balanced. Therefore, this article keeps the 4258 customer records with successful orders and randomly selects 4258 samples from the 33,987 records without orders. The result is 8516 customer records, half of which belong to customers who ordered the product and half to customers who did not. The input variables contain 9 categorical variables, which need to be converted to numerical form first. Following the variable coding method used by Miguéis et al. (2017), categorical variables are represented with dummy variables. In this way, we obtain an input matrix of 8516 × 55, and the output is a binary variable (y = yes/no).

3.3. Model Framework

The input variables are standardized column by column to eliminate the influence of different variable dimensions, and the t-SNE algorithm is then used for dimensionality reduction and visualization, with the default output dimension of 2. The number of iterations is set to 1000; the relationship between the iteration error and the number of iterations is shown in Figure 1.

Figure 1. Iteration error of dimensionality reduction by t-SNE algorithm (vertical axis: iteration error; horizontal axis: iterations).
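The preprocessing steps described in Sections 3.2 and 3.3 (undersampling, dummy coding, column standardization) could be sketched as follows. This is a minimal standard-library illustration, not the authors' code: the function names, the toy records, the random seed, and the choice of one 0/1 column per category are all assumptions made here.

```python
import random
from statistics import mean, pstdev


def undersample(rows, label_key="y", positive="yes", seed=0):
    """Keep every minority-class row plus an equal-sized random draw from the majority class."""
    pos = [r for r in rows if r[label_key] == positive]
    neg = [r for r in rows if r[label_key] != positive]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    rng = random.Random(seed)  # fixed seed (an assumption) so the draw is repeatable
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced


def dummy_encode(rows, field):
    """Replace a categorical field with one 0/1 column per observed category."""
    categories = sorted({r[field] for r in rows})
    encoded = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != field}
        for c in categories:
            new[f"{field}={c}"] = 1 if r[field] == c else 0
        encoded.append(new)
    return encoded


def standardize(column):
    """Z-score a numeric column: zero mean, unit population standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma if sigma else 0.0 for v in column]


# Toy records with hypothetical values, not taken from the data set.
rows = [
    {"age": 30 + i,
     "marital": ("single", "married", "divorced")[i % 3],
     "y": "yes" if i % 5 == 0 else "no"}
    for i in range(20)
]
balanced = undersample(rows)                      # 4 "yes" rows + 4 sampled "no" rows
encoded = dummy_encode(balanced, "marital")       # "marital" becomes three 0/1 columns
ages = standardize([r["age"] for r in encoded])   # standardized numeric column
```

The standardized, dummy-coded matrix is what would then be passed to a t-SNE implementation with two output dimensions and 1000 iterations.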
The t-SNE algorithm minimizes the KL divergence between the sample distributions of the original space and the embedded space. Because this objective is non-convex, repeated iterations are required to reach a stable (convergent) solution. Figure 1 shows that the error has converged sufficiently after 1000 iterations, so the output is stable and reliable. Using the t-SNE and SNE algorithms for dimensionality reduction (the target dimension is 2 in both cases), the output results are visualized in Figure 2 and Figure 3.

Figure 2. Two-dimensional display of raw data by t-SNE algorithm (legend: 0, 1).

Figure 3. Two-dimensional display of raw data by SNE algorithm (legend: 0, 1).
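For reference, the KL-divergence objective mentioned in this section can be written out as defined in Maaten et al. (2008). The notation is introduced here and is not taken from this paper: $x_i$ are the standardized high-dimensional samples, $z_i$ their two-dimensional embeddings (written $z$ to avoid a clash with the output label $y$), $\sigma_i$ the per-point Gaussian bandwidth, and $n$ the number of samples.

```latex
\begin{aligned}
p_{j|i} &= \frac{\exp\!\left(-\|x_i - x_j\|^2 / 2\sigma_i^2\right)}
                {\sum_{k \neq i} \exp\!\left(-\|x_i - x_k\|^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}, \\
q_{ij} &= \frac{\left(1 + \|z_i - z_j\|^2\right)^{-1}}
               {\sum_{k \neq l} \left(1 + \|z_k - z_l\|^2\right)^{-1}}, \\
C &= \mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}.
\end{aligned}
```

The heavy-tailed Student-t form of $q_{ij}$ is what distinguishes t-SNE from SNE, which uses a Gaussian in the low-dimensional space as well.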
The red dots represent customers who did not order wealth management products (i.e., “y = 0”, see the legend at the top right of Figure 3), and the blue dots represent customers who ordered wealth management products (i.e., “y = 1”, see the legend at the top right of Figure 3). It can be seen that the t-SNE algorithm compresses many input variables into a two-dimensional space in which samples of different categories form block-like clusters, so it is possible to distinguish the two classes using the hyperplane of the nonlinear support vector machine. The visualization obtained with SNE dimensionality reduction is not ideal. A large number of samples overlap, and it is difficult to distinguish the clusters near the center. This may be due to the complex nonlinear, non-Gaussian nature of the customer attributes and social attributes used as input variables. In contrast, the t-SNE algorithm uses a long-tailed t distribution to fit the distribution of data in the low-dimensional space and is more robust, so it better captures the overall characteristics of the input data.

Using the radial basis function as the kernel function, we establish a nonlinear support vector machine classification (prediction) model. 70% of the samples are randomly selected for learning to determine the decision boundary parameters, i.e., $w$ and $b$, and the remaining 30% of the samples are used for testing, to predict whether customers will order wealth management products and how likely they are to do so. The optimization algorithm stops when the error between two iterations falls to $1 \times 10^{-3}$. In addition, several tests showed that taking $\gamma = 15$ guarantees the accuracy of the classification model without leading to over-fitting.

4. Evaluation and Comparative Analysis

4.1. Confusion Matrix

The confusion matrix is one of the most commonly used indicators to evaluate the quality of the prediction model.
Define the positive class in the data set as P (Positive; in this case, customers who order financial products) and the negative class as N (Negative; in this case, customers who have not ordered financial products). The confusion matrix is shown in Table 2. The relevant indicators are as follows:

Classification Rate/Accuracy: (TN + TP)/(TP + TN + FP + FN);
Precision: TP/(TP + FP), how many consumers who are predicted to order financial products will actually order;
Specificity: TN/(TN + FP), how many customers who will not actually order financial products are successfully identified;
Recall: TP/(TP + FN), how many customers who actually order financial products are successfully identified.

The confusion matrix of the test set obtained using the t-SNE-SVM prediction model in this paper is shown in Table 3. Calculating from Table 3, we get: 1) accuracy = 86.07%, 2) precision = 83.64%, 3) specificity = 82.82%, 4) recall = 89.38%.
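The four indicators defined above can be computed directly from the cells of a confusion matrix. The sketch below uses hypothetical counts chosen only for illustration; they are not the values of Table 3, and the function name is introduced here.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Compute the four indicators of Section 4.1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tn + tp) / total,   # share of all customers classified correctly
        "precision": tp / (tp + fp),     # predicted buyers who actually buy
        "specificity": tn / (tn + fp),   # actual non-buyers correctly identified
        "recall": tp / (tp + fn),        # actual buyers correctly identified
    }


# Hypothetical counts for illustration only (NOT the paper's Table 3).
metrics = confusion_metrics(tp=90, fn=10, fp=20, tn=80)
for name, value in metrics.items():
    print(f"{name:<12} {value:.2%}")
```

Substituting the actual cell counts from Table 3 into `confusion_metrics` should reproduce the percentages reported above.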