logo资料库

Instructor’s Solution Manual 课后题签字 答案也是英文的哦.pdf

第1页 / 共169页
第2页 / 共169页
第3页 / 共169页
第4页 / 共169页
第5页 / 共169页
第6页 / 共169页
第7页 / 共169页
第8页 / 共169页
资料共169页,剩余部分请下载后查看
Introduction to Data Mining Instructor’s Solution Manual Pang-Ning Tan Michael Steinbach Vipin Kumar Copyright c2006 Pearson Addison-Wesley. All rights reserved.
1 5 19 25 45 71 95 125 147 157 Contents 1 Introduction 2 Data 3 Exploring Data 4 Classification: Basic Concepts, Decision Trees, and Model Evaluation 5 Classification: Alternative Techniques 6 Association Analysis: Basic Concepts and Algorithms 7 Association Analysis: Advanced Concepts 8 Cluster Analysis: Basic Concepts and Algorithms 9 Cluster Analysis: Additional Issues and Algorithms 10 Anomaly Detection iii
1 Introduction 1. Discuss whether or not each of the following activities is a data mining task. (a) Dividing the customers of a company according to their gender. No. This is a simple database query. (b) Dividing the customers of a company according to their prof- itability. No. This is an accounting calculation, followed by the applica- tion of a threshold. However, predicting the profitability of a new customer would be data mining. (c) Computing the total sales of a company. No. Again, this is simple accounting. (d) Sorting a student database based on student identification num- bers. No. Again, this is a simple database query. (e) Predicting the outcomes of tossing a (fair) pair of dice. No. Since the die is fair, this is a probability calculation. If the die were not fair, and we needed to estimate the probabilities of each outcome from the data, then this is more like the problems considered by data mining. However, in this specific case, solu- tions to this problem were developed by mathematicians a long time ago, and thus, we wouldn’t consider it to be data mining. (f) Predicting the future stock price of a company using historical records. Yes. We would attempt to create a model that can predict the continuous value of the stock price. This is an example of the
2 Chapter 1 Introduction area of data mining known as predictive modelling. We could use regression for this modelling, although researchers in many fields have developed a wide variety of techniques for predicting time series. (g) Monitoring the heart rate of a patient for abnormalities. Yes. We would build a model of the normal behavior of heart rate and raise an alarm when an unusual heart behavior occurred. This would involve the area of data mining known as anomaly de- tection. This could also be considered as a classification problem if we had examples of both normal and abnormal heart behavior. (h) Monitoring seismic waves for earthquake activities. Yes. In this case, we would build a model of different types of seismic wave behavior associated with earthquake activities and raise an alarm when one of these different types of seismic activity was observed. This is an example of the area of data mining known as classification. (i) Extracting the frequencies of a sound wave. No. This is signal processing. 2. Suppose that you are employed as a data mining consultant for an In- ternet search engine company. Describe how data mining can help the company by giving specific examples of how techniques, such as clus- tering, classification, association rule mining, and anomaly detection can be applied. The following are examples of possible answers. “Sports,” “Politics,” etc. • Clustering can group results with a similar theme and present them to the user in a more concise form, e.g., by reporting the 10 most frequent words in the cluster. • Classification can assign results to pre-defined categories such as • Sequential association analysis can detect that that certain queries follow certain other queries with a high probability, allowing for more efficient caching. • Anomaly detection techniques can discover unusual patterns of user traffic, e.g., that one subject has suddenly become much more popular. Advertising strategies could be adjusted to take advantage of such developments.
3 3. For each of the following data sets, explain whether or not data privacy is an important issue. (a) Census data collected from 1900–1950. No (b) IP addresses and visit times of Web users who visit your Website. Yes (c) Images from Earth-orbiting satellites. No (d) Names and addresses of people from the telephone book. No (e) Names and email addresses collected from the Web. No
分享到:
收藏