UCAS Data Mining (Liu Ying), Homework 2.pdf

Published: 2022-06-08  Publisher: admin  Category: Manuals  File size: 0.94M  Format: pdf


Text preview

Part I
1. (a) 0.6 × 4 = 2.4, so the minimum support count is 3 (the smallest integer not less than 2.4). The frequent itemsets are therefore: {A}, {B}, {D}, {A,B}, {A,D}, {B,D}, {A,B,D}.
(b) {a, b} => {c}: confidence = 2/4 = 0.5
{c} => {a, b}: confidence = 2/2 = 1
So confidence is not a symmetric measure.
(c)
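The arithmetic in items 1(a) and 1(b) can be checked with a minimal sketch. The original transaction table is not part of this text preview, so the `transactions` list below is hypothetical and was chosen only so that the counts match the 2/4 and 2/2 in 1(b). The function names are likewise illustrative assumptions.

```python
from fractions import Fraction
from math import ceil

def min_support_count(ratio, n_transactions):
    """Smallest integer count that satisfies a relative support threshold.

    `ratio` is passed as a string (e.g. "0.6") so the product is exact
    and ceil() is not affected by floating-point rounding.
    """
    return ceil(Fraction(ratio) * n_transactions)

def confidence(antecedent, consequent, transactions):
    """confidence(X => Y) = count(X and Y together) / count(X)."""
    x = frozenset(antecedent)
    xy = x | frozenset(consequent)
    n_x = sum(1 for t in transactions if x <= t)
    n_xy = sum(1 for t in transactions if xy <= t)
    return n_xy / n_x

# HYPOTHETICAL transactions (not the assignment's data): {a, b} occurs
# 4 times, {c} occurs 2 times, and {a, b, c} occurs 2 times.
transactions = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b"},
    {"d"},
]

threshold = min_support_count("0.6", 4)                      # 3
conf_ab_c = confidence({"a", "b"}, {"c"}, transactions)      # 0.5
conf_c_ab = confidence({"c"}, {"a", "b"}, transactions)      # 1.0
```

Swapping antecedent and consequent changes only the denominator, which is why the two confidence values differ.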

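2. (a) (b)
Apriori: scans the database 3 times, and generates candidate itemsets that must then be pruned when they fail the minimum support, which is what forces the repeated scans.
FP-growth: scans the database 2 times and generates no candidate itemsets, so it is better than Apriori in this respect, but it has to build an FP-tree in order to derive the frequent itemsets.
3. (a) See the figure below.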
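Returning to the Apriori description in item 2: the level-wise candidate generation and the one-database-pass-per-level behaviour can be sketched as follows. This is a simplified illustration, not the course's reference implementation. The `transactions` list is hypothetical (the assignment's table is not in this preview) and was picked so that the frequent itemsets match the answer to 1(a); the pass count of 3 is a consequence of that choice.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Level-wise Apriori.

    Returns (frequent itemsets with their counts, number of database passes).
    """
    items = sorted(set().union(*transactions))
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    passes = 0
    k = 1
    while candidates:
        # One full pass over the database for the size-k candidates.
        passes += 1
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join step: merge frequent itemsets into size-k candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent, passes

# HYPOTHETICAL transactions (not the assignment's data).
transactions = [
    {"A", "B", "D"},
    {"A", "B", "C", "D"},
    {"A", "B", "D", "E"},
    {"C", "E"},
]

frequent, passes = apriori(transactions, min_count=3)
```

On this data the loop runs for itemset sizes 1, 2, and 3, so `passes` is 3. FP-growth avoids the join and prune steps entirely by compressing the database into an FP-tree in two passes; that part is not sketched here.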

Part II
Question 1
(1) Sort the rules by lift. The 5 rules with the highest lift are:
1. tomato souce => pasta
2. coffee^milk => pasta
3. biscuits^pasta => milk
4. water^pasta => milk
5. juices => milk
None of the top 5 rules ranked by lift is redundant.
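For reference, the three measures used to rank rules in this question (support, confidence, lift) can be computed as in the sketch below. The `baskets` list is hypothetical and is not the assignment's transaction file, so the resulting numbers and ordering illustrate the formulas only; the item names are borrowed from the rules above.

```python
def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence, and lift of the rule antecedent => consequent."""
    n = len(transactions)
    x = frozenset(antecedent)
    y = frozenset(consequent)
    n_x = sum(1 for t in transactions if x <= t)
    n_y = sum(1 for t in transactions if y <= t)
    n_xy = sum(1 for t in transactions if (x | y) <= t)
    support = n_xy / n
    confidence = n_xy / n_x
    # lift > 1 means X and Y co-occur more often than if independent.
    lift = confidence / (n_y / n)
    return {"support": support, "confidence": confidence, "lift": lift}

# HYPOTHETICAL baskets, for illustration only.
baskets = [
    {"milk", "pasta", "tomato souce"},
    {"milk", "pasta"},
    {"milk", "water"},
    {"milk"},
    {"pasta", "tomato souce"},
    {"water"},
]

rules = [
    (("pasta",), ("milk",)),
    (("tomato souce",), ("pasta",)),
    (("water",), ("milk",)),
]

scored = [(a, c, rule_metrics(a, c, baskets)) for a, c in rules]
by_lift = sorted(scored, key=lambda r: r[2]["lift"], reverse=True)

for antecedent, consequent, m in by_lift:
    print("^".join(antecedent), "=>", "^".join(consequent),
          f"lift={m['lift']:.2f} conf={m['confidence']:.2f} "
          f"supp={m['support']:.2f}")
```

Changing the sort key to `"support"` or `"confidence"` gives the other two rankings asked for in this question.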

(2) Sort the rules by support. The 5 rules with the highest support are:
1. milk
2. pasta => milk
3. water => milk
4. biscuits => milk
5. brioches => milk
The results show that rules 2, 3, 4, and 5 are redundant: milk can be sold on its own, without relying on pasta, water, biscuits, or brioches to sell it, so these rules can be dropped.
(3) Sort the rules by confidence.

The 5 rules with the highest confidence are:
1. biscuits^pasta => milk
2. water^pasta => milk
3. juices => milk
4. tomato souce => pasta
5. yoghurt => milk
None of the top 5 rules ranked by confidence is redundant.

Question 2
1. (a) Confusion matrix when Minimum records per child branch = 56
(b) Confusion matrix when Minimum records per child branch = 15
(c) Confusion matrix when Minimum records per child branch = 10

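2. Add an analysis module to the topology diagram to obtain the accuracy of each tree:

Minimum records per child branch    Accuracy    Error rate
56                                  71.5%       28.5%
15                                  84.5%       15.5%
10                                  84%         16%

Based on these accuracy figures, accuracy is highest when Minimum records per child branch = 15, so this decision tree model is selected.
3. First construct the data file to be predicted, predict_data.txt, then use the generated decision tree model to make the predictions. The prediction results are shown below; the last column, $C-pep, is the predicted RECOMMEND PEP result.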
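The model-selection step in item 2 amounts to reading accuracy off each confusion matrix and taking the maximum. A minimal sketch follows, assuming accuracy is the share of correctly classified records. The `reported` values are the percentages from the table above; the confusion matrices themselves are not in this text preview, so `example_matrix` is hypothetical.

```python
def accuracy(confusion):
    """Share of correctly classified records: diagonal sum / total."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total

# Accuracy (%) reported above, keyed by "Minimum records per child branch".
reported = {56: 71.5, 15: 84.5, 10: 84.0}
best_setting = max(reported, key=reported.get)
error_rates = {k: 100.0 - v for k, v in reported.items()}

# HYPOTHETICAL 2x2 confusion matrix (rows: actual, columns: predicted).
example_matrix = [
    [50, 10],
    [5, 35],
]
example_accuracy = accuracy(example_matrix)

print("best setting:", best_setting)
print("example accuracy:", example_accuracy)
```

The gap between the settings 15 and 10 is only half a percentage point, so a comparison on held-out data would give a firmer basis for the choice than a single accuracy figure.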
