Educational Data Mining: Applications and Trends.pdf

Preface
Contents
Part I Profile
1 Which Contribution Does EDM Provide to Computer-Based Learning Environments?
Abstract
1.1…Introduction
1.2…Educational Data Mining
1.2.1 Definition
1.2.2 Areas in Relation to EDM
1.2.3 Objectives of the EDM
1.2.4 The Used Methods
1.2.5 The Analyzed Data
1.2.6 Process of Applying the EDM
1.2.7 Some Technological Tools Used in EDM
1.3…Examples of EDM Applications in Computer-Based Learning Environments
1.3.1 EDM Applications for Predicting and Evaluating Learning Performance
1.3.2 EDM Applications for Analyzing Learners’ Behaviors
1.3.3 Discussion
1.4…Conclusions
References
2 A Survey on Pre-Processing Educational Data
Abstract
2.1…Introduction
2.2…Types of Educational Environments
2.2.1 Learning Management Systems
2.2.2 Massive Open Online Courses
2.2.3 Intelligent Tutoring Systems
2.2.4 Adaptive and Intelligent Hypermedia Systems
2.2.5 Test and Quiz Systems
2.2.6 Other Types of Educational Systems
2.3…Types of Data
2.3.1 Relational Data
2.3.2 Transactional Data
2.3.3 Temporal, Sequence and Time Series Data
2.3.4 Text Data
2.3.5 Multimedia Data
2.3.6 World Wide Web Data
2.4…Pre-Processing Tasks
2.4.1 Data Gathering
2.4.2 Data Aggregation/Integration
2.4.3 Data Cleaning
2.4.4 User and Session Identification
2.4.5 Attribute/Variable Selection
2.4.6 Data Filtering
2.4.7 Data Transformation
2.5…Pre-Processing Tools
2.5.1 General Purpose Data Pre-Processing Tools
2.5.2 Specific Purpose Data Pre-Processing Tools
2.6…Conclusions
Acknowledgments
References
3 How Educational Data Mining Empowers State Policies to Reform Education: The Mexican Case Study
Abstract
3.1…Introduction
3.2…Domain Study
3.2.1 A Glance at Data Mining
3.2.2 Educational Data Mining in a Nutshell
3.3…Related Works
3.4…Context
3.4.1 The Mexican State
3.4.2 Educational Community
3.4.3 National Assessments
3.4.4 The Constitutional Reform in Education
3.4.5 Community Reaction
3.5…Source Data
3.5.1 EXCALE Databases
3.5.2 Source Data Students’ Opinions
3.5.3 Framework
3.5.4 Exploration Analysis
3.6…Educational Data Mining Approach
3.6.1 Essential Mining
3.6.2 Supplementary Mining
3.7…Discussion
3.7.1 Interpretations of the Basic Findings
3.7.2 Interpretation of Supplementary Mining
3.7.3 A Diagnostic of Students Opinions
3.8…Conclusions
Acknowledgments
References
Part II Student Modeling
4 Modeling Student Performance in Higher Education Using Data Mining
Abstract
4.1…Introduction
4.2…Background
4.2.1 The Decision Tree Classification Model
4.2.2 The Decision Tree Mechanism
4.3…System Overview, Software Interface and Architecture
4.4…Case Study: Modeling Student Performance
4.4.1 Data Description
4.4.2 Data Preparation
4.4.3 Analyzer Model
4.5…Discussion of Results
4.6…Conclusions
References
5 Using Data Mining Techniques to Detect the Personality of Players in an Educational Game
Abstract
5.1…Introduction
5.2…Literature Review
5.2.1 Personality in Computer-Based Learning Environments
5.2.2 Emotion Detection Using Leary’s Rose Frameboard
5.2.3 Automatic Detection of Personality
5.2.4 Personality and Student Behavior
5.2.5 The Relationship Between Personality Traits and Information Competency
5.2.6 Personality Traits and Learning Style in Academic Performance
5.2.7 A Neural Network Model for Human Personality
5.2.8 Relationships Between Academic Motivation and Personality Among the Students
5.2.9 Relation Between Learning from Errors and Personality
5.2.10 Academic Achievement and Big Five Model
5.2.11 The Big Five Personality, Learning Styles, and Academic Achievement
5.2.12 Using Personality and Cognitive Ability to Predict Academic Achievement
5.3…Leary’s Interpersonal Frame Board
5.3.1 Land Science Game
5.3.2 Participants and Data Set Construction
5.4…Annotation Scheme
5.4.1 Human Annotation
5.5…Model
5.5.1 Lexicon Resources
5.5.2 Feature Extraction
5.5.3 The Linguistic Inquiry and Word Count Features
5.5.4 Automated Approaches to Personality Classification
5.5.5 Classification Method
5.6…Experience and Results
5.6.1 Classification Results
5.7…Discussion and Analysis
5.7.1 Personality Trait Tracking Analysis
5.7.2 ANOVA Analysis
5.8…Conclusion and Future Research
Acknowledgments
References
6 Students’ Performance Prediction Using Multi-Channel Decision Fusion
Abstract
6.1…Introduction
6.2…Student Modeling
6.3…Performance Prediction
6.3.1 Performance Prediction in ITS
6.3.2 Data Mining Approaches for Prediction
6.4…Multi-Channel Decision Fusion Performance Prediction
6.4.1 Determining the Performance Level in Assignment Categories
6.4.2 Determining Overall Performance Levels
6.4.3 Mapping from the Performance in Assignment Categories to Overall Performance
6.4.4 The Characteristics of Assignment Categories
6.5…Experimental Results and Discussion
6.6…Conclusion and Future Work
Acknowledgements
References
7 Predicting Student Performance from Combined Data Sources
Abstract
7.1…Introduction
7.2…Defining the Problem
7.2.1 Problem Specification 1
7.2.2 Problem Specification 2
7.2.3 Problem Specification 3
7.2.4 Problem Specification 4
7.3…Sources of Student Data
7.3.1 Student Activity Data from the Virtual Learning Environment
7.3.2 Demographic Data
7.3.3 Past Study
7.3.4 Assessment Data
7.4…Feature Selection and Data-Filtering
7.5…Classifiers for Predicting Student Outcome
7.5.1 Support Vector Machines and Decision Trees
7.5.2 General Unary Hypotheses Automaton
7.5.3 Bayesian Networks and Regression
7.6…Evaluation Framework
7.7…Real-Time Prediction
7.8…Revisiting the Problem Specification in Light of Results
7.8.1 Problem Specification 4 (Revised)
7.8.2 Problem Specification 5
7.8.3 Problem Specification 6
7.9…Developing and Testing Models on Open University Data (A Case Study)
7.10…Beyond OU: Applying Models on Alternative Data Sources
7.11…Conclusions
Acknowledgments
References
8 Predicting Learner Answers Correctness Through Eye Movements with Random Forest
Abstract
8.1…Introduction
8.2…Background
8.2.1 Related Work
8.2.2 Cognitive Processes
8.2.3 Eye Movement Data
8.2.4 Random Forest
8.3…Method
8.3.1 The Purpose of the Study
8.3.2 Design
8.3.3 Pre-Application
8.3.4 Application
8.3.5 Study Group
8.3.6 Data Collection Instruments
8.4…Analyses of Results
8.5…Conclusion and Discussion
8.6…Future Work
A.1. Appendix: Supplementary
A.1.1 Question-1
A.1.2 Question-2
A.1.3 Question-3
A.1.4 Question-4
A.1.5 Question-5
A.1.6 Question-6
A.1.7 Question-7
A.1.8 Question-8
A.1.9 Question-9
References
Part III Assessment
9 Mining Domain Knowledge for Coherence Assessment of Students Proposal Drafts
Abstract
9.1…Introduction
9.2…Background
9.2.1 Global Coherence
9.2.2 Latent Semantic Analysis
9.2.3 Related Work
9.3…Analyzer Model of Global Coherence
9.4…Data Description (Corpus)
9.5…Experiments
9.5.1 Experimental Design
9.5.2 Agreement Evaluation
9.5.3 Across Section Exploration
9.6…Analysis and Discussion of Results
9.6.1 Across Section Exploration
9.7…System Overview
9.7.1 Intelligent Tutoring System
9.7.2 Web Interface
9.8…Conclusions
Acknowledgments
References
10 Adaptive Testing in Programming Courses Based on Educational Data Mining Techniques
Abstract
10.1…Introduction
10.2…Related Work
10.3…Background
10.3.1 Environment
10.3.2 Data Set
10.4…Modeling Programming Knowledge
10.4.1 Programming Knowledge Overview
10.4.2 Modeling Programming Competencies
10.4.3 Modeling Programming Concepts of the C Language
10.5…Estimating Test Difficulty
10.5.1 Estimating Test Item Difficulty
10.5.2 Estimating Student Capacity
10.6…Test Generation Algorithm
10.7…Application and Results
10.8…Conclusion and Future Work
Acknowledgments
References
11 Plan Recognition and Visualization in Exploratory Learning Environments
Abstract
11.1…Introduction
11.2…Related Work
11.2.1 Plan Recognition
11.2.2 Assessment of Students’ Activities
11.3…The Virtual Labs Domain
11.4…Plan Recognition in Virtual Laboratories
11.4.1 Actions, Recipes, and Plans
11.4.2 The Plan Recognition Algorithm
11.4.3 Empirical Methodology
11.4.4 Complete Algorithms
11.5…Visualizing Students’ Activities
11.5.1 Visualization Methods
11.5.2 Empirical Methodology
11.5.3 Results
11.5.4 Discussion
11.6…Conclusion and Future Work
11.7…Experimental Problems
11.8…The Recipe Library for the Dilution Problem
11.8.1 Dilution Problem Recipes
11.8.2 Recipes Explanation
11.9…User Study Questionnaire
References
12 Finding Dependency of Test Items from Students’ Response Data
Abstract
12.1…Introduction
12.2…Related Work
12.3…Mutual Independency Measure
12.3.1 Preliminaries
12.3.2 Mutual Information Measure
12.3.3 Finding the Best Dependency Tree
12.3.4 An Example
12.3.5 Extensions
12.4…Proof-of-Concept Experiments
12.4.1 Data
12.4.2 Results on Synthetic Data Sets
12.4.3 Results on Real Data
12.5…Conclusions and Future Work
Acknowledgment
References
Part IV Trends
13 Mining Texts, Learner Productions and Strategies with ReaderBench
Abstract
13.1…Introduction
13.2…Data and Text Mining for Educational Applications
13.2.1 Predicting Learner Comprehension
13.3…Textual Complexity Assessment for Comprehension Prediction
13.3.1 The Impact of Reading Strategies Extracted from Self-Explanations for Comprehension Assessment
13.4…Cohesion-Based Discourse Analysis: Building the Cohesion Graph
13.5…Topics Extraction
13.6…Cohesion-Based Scoring Mechanism of the Analysis Elements
13.7…Identification Heuristics for Reading Strategies
13.8…Multi-Dimensional Model for Assessing Textual Complexity
13.9…Results
13.10…A Comparison of ReaderBench with Previous Work
13.11…Conclusions
Acknowledgments
References
14 Maximizing the Value of Student Ratings Through Data Mining
Abstract
14.1…Introduction
14.2…Description of the Data Set
14.2.1 The Process for Collecting Evaluations and Presenting Results
14.2.2 Details About the Data Set
14.2.3 Questions and Variables of Interest
14.2.4 Selected Results from the Statistical Analysis
14.3…The Methodology
14.3.1 A High-Level View of the Process
14.3.2 Corpus Word Analysis
14.3.3 Category Selection
14.3.4 The Domain-Specific Lexicon
14.3.5 The Assessment Process
14.3.6 Refining the Lexicon
14.3.7 The Algorithm
14.4…Assessment Results
14.4.1 Qualitative Validity Assessment of Category Scores by Teaching and Learning Specialists
14.4.2 Quantitative Assessment Through the Comparison of Summary Scores with Overall Instructor Performance Ratings
14.4.3 Quantitative Assessment Through the Comparison of Category and Summary Scores for Teaching Award Winners with All Instructors
14.5…Applications of the Methodology
14.5.1 Evaluation of Instruction at the University of Mississippi
14.5.2 Other Educational Applications
14.6…Future Work
Acknowledgments
References
15 Data Mining and Social Network Analysis in the Educational Field: An Application for Non-Expert Users
Abstract
15.1…Introduction
15.2…Background and Related Work
15.2.1 Social Network Analysis
15.2.2 Classification Applied to the Educational Context: Students’ Performance and Dropout
15.2.3 Data Mining Tools for Non-expert Users
15.3…E-Learning Web Miner
15.3.1 Description of E-Learning Web Miner
15.3.2 General View of the E-Learning Web Miner Architecture
15.3.3 New Services Provided
15.3.4 Mode of Working
15.4…Case Study
15.4.1 Courses
15.4.2 Social Network Analysis in E-Learning Courses
15.4.3 Prediction of Students’ Performance and Dropouts
15.5…Conclusions
Acknowledgements
References
16 Collaborative Learning of Students in Online Discussion Forums: A Social Network Analysis Perspective
Abstract
16.1…Introduction
16.2…Background and Related Works
16.2.1 On Collaborative Learning and E-Learning: An Educational Perspective
16.2.2 Social Networks: A Data Mining Perspective
16.2.3 Social Network Analysis of Online Educational Forums: Related Works
16.3…Network Analysis in E-Learning
16.3.1 Students Interaction Network
16.3.2 Term Co-Occurrence Network
16.4…Case Studies
16.4.1 Extracting Networks
16.4.2 Interpreting Students Interaction Network
16.4.3 Interpreting Term Co-Occurrence Network
16.4.4 Objective Evaluation
16.5…Conclusions
References
Author Index
Studies in Computational Intelligence 524
Alejandro Peña-Ayala, Editor
Educational Data Mining: Applications and Trends
Studies in Computational Intelligence, Volume 524
Series Editor: Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
e-mail: kacprzyk@ibspan.waw.pl
For further volumes: http://www.springer.com/series/7092
About this Series
The series "Studies in Computational Intelligence" (SCI) publishes new developments and advances in the various areas of computational intelligence, quickly and with a high quality. The intent is to cover the theory, applications, and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics and life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, self-organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the worldwide distribution, which enable both wide and rapid dissemination of research output.
Editor
Alejandro Peña-Ayala
World Outreach Light to the Nations Ministries
Escuela Superior de Ingeniería Mecánica y Eléctrica, Zacatenco
Instituto Politécnico Nacional
Mexico City, Mexico

ISSN 1860-949X
ISSN 1860-9503 (electronic)
ISBN 978-3-319-02737-1
ISBN 978-3-319-02738-8 (eBook)
DOI 10.1007/978-3-319-02738-8
Springer Cham Heidelberg New York Dordrecht London
Library of Congress Control Number: 2013953247
© Springer International Publishing Switzerland 2014

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
Preface

Educational Data Mining (EDM) is a new discipline based on the Data Mining (DM) grounds (i.e., the baseline is composed of models, tasks, methods, and algorithms) to explore data from educational settings to find out descriptive patterns and predictions that characterize learners' behaviors and achievements, domain knowledge content, assessments, educational functionalities, and applications.

This book introduces concepts, models, frameworks, tasks, methods, and algorithms, as well as tools and case studies of the EDM field. The chapters make up a sample of the work currently achieved in countries from the five continents, which illustrates the worldwide work in the EDM arena.

According to the nature of the contributions accepted for this volume, four kinds of topics are identified as follows:

• Profile shapes a conceptual view of EDM. It provides an introduction to its nature, purpose, components, processes, and applications. Through this section, readers are encouraged to make an incursion into the EDM field, facilitate the extraction of source data to be mined, and become aware of the usefulness of this sort of approach to support education policies.
• Student Modeling is an essential functionality of Computer-Based Educational Systems (CBES) to adapt their performance according to users' needs. Most of the EDM approaches are oriented to characterize diverse student traits, such as behavior, acquired domain knowledge, personality, and academic achievements, by means of machine learning methods.
• Assessment evaluates learners’ domain knowledge acquisition, skills development, and achieved outcomes; reflection, inquiry, and sentiments are also essential subjects to be taken into account by CBES. The purpose is to differentiate student proficiency at a finer-grained level through static and dynamic testing, as well as online and offline assessment.
• Trends focus on some of the new demands for applying EDM, such as text mining and social network analysis. Both targets represent challenges to cope with the huge, dynamic, and heterogeneous information that new generations of students produce in their everyday life. These paradigms represent new educational settings such as ubiquitous learning and educational networking.
This volume is the result of one year of effort, in which more than 30 chapters were rigorously peer reviewed by a team of 60 reviewers. After several cycles of chapter submission, revision, and tuning based on the Springer quality principles, 16 works were approved, edited as chapters, and organized according to the prior four topics. Part I corresponds to Profile and includes Chaps. 1–3; Part II represents Student Modeling, which embraces Chaps. 4–8; Part III concerns Assessment and has Chaps. 9–12; Part IV is related to Trends through Chaps. 13–16. A profile of the chapters is given next:

1. Chapter 1 provides a bibliographic review of studies made in the field of Educational Data Mining (EDM) to identify diverse aspects related to techniques and contributions in the field of computer-based learning. The authors seek to facilitate the use and understanding of Data Mining (DM) techniques to help educational specialists develop EDM approaches.
2. Chapter 2 addresses the lack of data preprocessing literature through a detailed exposition of the tasks involved in extracting, cleaning, transforming, and providing suitable data worthy of being mined. The work depicts educational environments and the data they offer, and gives examples of Moodle data and tools.
3. Chapter 3 illustrates how EDM is able to support government policies devoted to enhancing education. The work shapes the context of basic education and how the government aims at reforming the current practices for evaluating academics and students. Several findings extracted from surveys are shown to highlight the opinion of the community and provide an initial diagnostic.
4. Chapter 4 presents the Student Knowledge Discovery Software, a tool to explore the factors having an impact on student success based on student profiling. The authors outline in depth how to implement the software to help educational organizations better understand knowledge discovery processes.
5. Chapter 5 explains how to automate the detection of students' personality and behavior in an educational game called Land Science. The work includes a model to learn vector space representations for various extracted features. Learner personality is detected by combining the feature spaces from psycholinguistics and computational linguistics.
6. Chapter 6 attempts to predict student performance to better adjust educational materials and strategies throughout the learning process. To this end, the authors design a multichannel decision fusion approach to estimate the overall student performance. Such an approach is based on the performance achieved in assignment categories.
7. Chapter 7 explores predictive modeling methods for identifying students who will most benefit from tutor interventions. The authors show how the predictive capacity of diverse sources of data changes as the course progresses, as well as how a student’s pattern of behavior changes during the course.
8. Chapter 8 predicts learner achievements by recording learner eye movements and mouse click counts. The findings indicate that the most important eye metrics that predict answers in reasoning questions include total fixation duration, number of mouse clicks, fixation count, and visit duration.
9. Chapter 9 focuses on coherence expressed in research protocols and theses. The authors develop a coherence analyzer that employs Latent Semantic Analysis to mine domain knowledge. The analysis outcomes are used to grade students and provide online support with the aim of improving their writing.
10. Chapter 10 tailors an approach to automatically generate tests. It recognizes competence areas and matches the overall competence level of target students. The approach makes use of a concept map of programming competencies and a method for estimating the test item difficulty. The contribution is evaluated in a setting where its results are compared against a solution that randomly searches within the item space to find an adequate test.
11. Chapter 11 outlines methods oriented to support teachers’ understanding of students’ activities in Exploratory Learning Environments (ELE). The work includes an algorithm that intelligently recognizes student activities and visualization facilities for presenting these activities to teachers. The approach is evaluated using real data obtained from students using an ELE to solve six representative problems from introductory chemistry courses.
12. Chapter 12 adopts the concept of entropy from information theory to find the most dependent test items in student responses. The work defines a distance metric that estimates the amount of mutual independency between two items and is used to quantify how independent two items are in a test. The trials show that the approach for finding the best dependency tree is fast and scalable.
13. Chapter 13 proposes ReaderBench, an environment for assessing learner productions and supporting teachers. It applies text mining to perform assessment of the reading materials, assignment of texts to learners, detection of reading strategies, and comprehension evaluation to foster the learner’s self-regulation process. All of these tasks were subject to empirical validation.
14. Chapter 14 analyzes a data set consisting of student narrative comments that were collected using an online process. The approach uses category vectors to depict instructor traits and a domain-specific lexicon. Sentiment analysis is also used to detect and gauge attitudes embedded in comments about each category. The approach is useful to instructors and administrators, and is a vehicle to analyze student perceptions of teaching and feed them back into the educational process.
15. Chapter 15 introduces E-learning Web Miner, a tool that assists academics in discovering student behavior profiles, models of how students collaborate, and their performance, with the purpose of enhancing the teaching-learning process. The tool applies Social Network Analysis (SNA) and classification techniques.
16. Chapter 16 depicts an approach to assess students’ participation through the analysis of their interactions in social networks. It includes metrics for ranking and determining roles to analyze the student communications, the forming of groups, the role changes, and the interpretation of exchanged messages.
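The summary of Chapter 12 refers to an information-theoretic dependency measure between test items and to a "best dependency tree". The sketch below illustrates that general idea only; this preview does not show the chapter's actual metric or algorithm, so the sketch assumes plain empirical mutual information and a maximum-weight spanning tree built with Prim's method. The function names and the toy response data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(x, y):
    """Empirical mutual information (in bits) between two equal-length answer vectors."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def dependency_tree(responses):
    """Maximum-weight spanning tree over items, weighted by pairwise mutual information.

    responses: dict mapping item name -> list of 0/1 answers, one entry per student.
    Returns a list of (item_in_tree, new_item, mutual_information) edges.
    This is an assumed, generic construction, not the chapter's own procedure.
    """
    items = list(responses)
    weights = {
        frozenset((a, b)): mutual_information(responses[a], responses[b])
        for a, b in combinations(items, 2)
    }
    in_tree = {items[0]}
    edges = []
    while len(in_tree) < len(items):
        # Prim's step: pick the strongest edge that leaves the current tree.
        best = max(
            ((a, b) for a in in_tree for b in items if b not in in_tree),
            key=lambda edge: weights[frozenset(edge)],
        )
        edges.append((best[0], best[1], weights[frozenset(best)]))
        in_tree.add(best[1])
    return edges


# Hypothetical toy data: q1 and q2 are answered identically, q3 is unrelated to q1.
toy_responses = {
    "q1": [0, 1, 0, 1],
    "q2": [0, 1, 0, 1],
    "q3": [0, 0, 1, 1],
}
tree = dependency_tree(toy_responses)
```

With this toy data the first edge links q1 and q2, since their answers coincide and therefore share the most information. Computing all pairwise weights is quadratic in the number of items, which is acceptable for a single test but is a design choice of this sketch, not a claim about the chapter's scalability results.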