This course is motivated by the rapid development of information technology. The amount of traffic data collected is growing at an increasing rate. At the same time, users of these data expect more sophisticated analysis of these large data sets. The field of data mining has developed over the last decade to address this problem.
Data mining is often defined as discovering useful but hidden patterns or relationships in a database, and it is one of the hottest fields in computer science. It is a good field to study not only for computer science students but also for transportation students and students in many other engineering disciplines, because the same techniques can be used to solve many data-related problems that may arise during their future careers.
This course covers the basic concepts of data mining, including data preprocessing, instance-based learning, decision trees, support vector machines, neural networks, outlier detection, and ensemble learning, as well as specific applications to transportation systems. The instructors will introduce what the techniques are, what they can do, how they are used, and how they work.
We welcome you to join us.
Week 1. Introduction to data mining
1.1 What is data mining?
1.2 Data mining functionality
1.3 Data Mining Techniques
1.4 Summary
Slides
Topic for Discussion: Week 1
Python Foundations
Sklearn
Test 1
Term Project
Week 2. Data pre-processing
2.1 Why preprocess the data?
2.2 Data cleaning
2.3 Data integration
2.4 Data reduction
2.5 Data transformation
2.6 Summary
Slides
Topic for Discussion: Week 2
Test 2
Week 3. Instance-based learning
3.1 Overview of IBL
3.2 Components of kNN
3.3 Variants of kNN
3.4 Summary
Slides
Topic for Discussion: Week 3
Test 3
Week 4. Decision Trees
4.1 Decision Tree Representation
4.2 Constructing a Decision Tree
4.3 Overfitting and Tree Pruning
4.4 Pros and Cons of DTs
Slides
Topic for Discussion: Week 4
Test 4
Week 5. Support Vector Machine
5.1 Linear SVMs
5.2 Non-linear SVMs
5.3 Multiclass
5.4 Support Vector Regression
5.5 Summary
Slides
Topic for Discussion: Week 5
Test 5
Week 6. Outlier Mining
6.1 Background of Outlier Detection
6.2 Statistics-based Method
6.3 Distance-based Method
6.4 Density-based Method
6.5 Conclusions
Slides
Topic for Discussion: Week 6
Test 6
Week 7. Ensemble Learning
7.1 General Idea of Ensemble Methods
7.2 Popular methods for ensemble
7.3 Class-Imbalanced Data
7.4 Summary
Slides
Topic for Discussion: Week 7
Test 7
Week 8. Clustering
8.1 Introduction to Clustering
8.2 K-means and K-medoids
8.3 DBSCAN
8.4 Model Based Clustering
Test 8
International professors' teaching resources
Satish
Fengxiang Qiao
Bilal
Course Projects
Detection of abnormal driving behavior
Analysis of shared parking choice behavior
Identification of factors influencing drowsy driving
Fuel consumption estimation of vehicles
Emission estimation of vehicles
Emissions analysis for LNG bus
Coding with Python and scikit-learn
Code
Academic Writing and Presentation
Writing skills
Presentation skills