Data Mining I

Sofokli Garo, PhD

Code: CMP 401
Name: Data Mining I
Semester: 1
Lecture hours: 3.00
Seminar hours: 1.00
Laboratory hours: 0.00
Credits: 3.50
ECTS: 6.00
Description

This course explores the concepts and techniques of knowledge discovery and data mining. As a multidisciplinary field, data mining draws on work from areas including statistics, machine learning, pattern recognition, database technology, information retrieval, network science, knowledge-based systems, artificial intelligence, high-performance computing, and data visualization. This course focuses on issues relating to the feasibility, usefulness, effectiveness, and scalability of techniques for the discovery of patterns hidden in large data sets. As a result, this course is not intended as an introduction to statistics, machine learning, database systems, or other such areas, although it does provide some background knowledge to facilitate the student's comprehension of their respective roles in data mining.

Objectives

This course aims to:
- Familiarize students with data types.
- Equip students with different techniques for analyzing large amounts of data.
- Provide students with the skills to implement data preprocessing methods.
- Explain the importance and influence of Data Mining in the field of computer science for finding valuable information in Big Data.
- Develop students' critical thinking in analyzing and finding patterns in multidimensional data.

Java
Topic
1
Introduction to Data Mining This topic provides an overview of the course. It covers what Data Mining is, the origins of and reasons for its development, and the main tasks that can be performed using Data Mining techniques and methods. (Main Lit., pg. 21-42)
2
Types of Data This lecture covers attribute types, measurement and values, data categorization and transformation, data sets and their types, and the analysis of data quality problems that arise from data measurement and data collection. (Main Lit., pg. 43-69)
3
Data Preprocessing - 1 This lecture covers similarity and distance measures. The main distances treated are Euclidean distance, Minkowski distance, and Mahalanobis distance; the main similarities covered are similarity between binary vectors, cosine similarity, and Pearson correlation. (Main Lit., pg. 91-110)
4
Data Preprocessing - 2 This lecture will cover data preprocessing techniques such as aggregation, sampling, dimensionality reduction, feature subset selection, new feature creation, discretization, binarization, and variable transformation, as well as information-based measures. (Main Lit., pg. 70-90)
5
Data Exploration - 1 This lecture will cover the basic elements of data summary statistics, such as types of means, types of distributions, various measures of similarity and dissimilarity between data objects, types of proximity measures, mutual information, and techniques for selecting an appropriate measure. (Recommended Lit., pg. 44-55)
6
Data Exploration - 2 This lecture will cover the basic elements of data summary statistics, such as types of means, types of distributions, various measures of similarity and dissimilarity between data objects, types of proximity measures, mutual information, and techniques for selecting an appropriate measure. (Recommended Lit., pg. 56-64 and Main Lit., pg. 110-132)
7
Classification: Basic Concepts and Techniques - 1 This lecture will cover the basic concepts of classification, binary and multiclass classification, general approaches for building a classification model, basic methods for expressing attribute test conditions, the calculation of impurity measures for different data types, and basic classification algorithms. (Main Lit., pg. 133-167)
8
Midterm Exam
9
Classification: Basic Concepts and Techniques - 2 This lecture will cover model overfitting, the evaluation and selection of classification models, hyperparameters, and the limitations of the basic classification algorithms. (Main Lit., pg. 167-212)
10
Association Rules: Basic Concepts and Algorithms This lecture will cover the basic concepts and algorithms of association rules such as the Apriori principle, frequent itemset generation, candidate generation and pruning, techniques and methods for generation of association rules as well as the computational complexity of the basic association rules algorithms. (Main Lit., pg. 213-239)
11
Association Rules: Issues in Model Selection and Evaluation This lecture will cover topics such as compact representations of frequent itemsets, alternative methods for generating frequent itemsets, the FP-growth algorithm, the evaluation of association patterns, and the effect of skewed support distributions. (Main Lit., pg. 240-306)
12
Cluster Analysis: Basic Concepts and Algorithms This lecture will cover the basic concepts and algorithms of cluster analysis, including what cluster analysis is, the different types of clustering methods, the different types of clusters, and a detailed analysis of the K-means algorithm. (Main Lit., pg. 307-335)
13
Cluster Analysis: Issues in Model Selection and Evaluation This lecture will cover the basic concepts and algorithms of cluster analysis such as hierarchical agglomerative clustering, detailed analysis of the DBSCAN algorithm, as well as different techniques and methods for cluster evaluation. (Main Lit., pg. 336-394)
14
Classification: Alternative Techniques This lecture will cover topics such as types of classifiers, rule-based classifiers, nearest neighbor classifiers, Naïve Bayes classifiers, logistic regression, artificial neural networks, and support vector machines. (Main Lit., pg. 395-463, 478-498)
15
Project Presentation and General Review
16
Final Exam
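As an illustration of the proximity measures named in Topic 3, the following is a minimal sketch in plain Python. The function names are chosen here for illustration and do not come from the course materials. Mahalanobis distance is left out because it needs an inverse covariance matrix, which is easier to handle with a numerical library.

```python
import math

def minkowski(x, y, r):
    """Minkowski distance of order r between two equal-length vectors."""
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)

def euclidean(x, y):
    """Euclidean distance is the Minkowski distance with r = 2."""
    return minkowski(x, y, 2)

def cosine_similarity(x, y):
    """Cosine of the angle between x and y (assumes neither is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def pearson(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centered_x = [a - mean_x for a in x]
    centered_y = [b - mean_y for b in y]
    return cosine_similarity(centered_x, centered_y)

def simple_matching(x, y):
    """Simple matching coefficient for binary vectors: share of equal positions."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

def jaccard(x, y):
    """Jaccard coefficient for binary vectors: 1-1 matches over non 0-0 positions."""
    both_one = sum(a == 1 and b == 1 for a, b in zip(x, y))
    any_one = sum(a == 1 or b == 1 for a, b in zip(x, y))
    return both_one / any_one if any_one else 0.0
```

For example, `euclidean([0, 0], [3, 4])` evaluates to 5, while `minkowski([0, 0], [3, 4], 1)` gives the city-block distance 7. The Jaccard coefficient ignores positions where both vectors are 0, which is why it is often preferred over simple matching for sparse binary data.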
1
Students will be able to understand different types of data.
2
Students will have knowledge of statistical data features and units of measurement.
3
Students will be able to understand the importance of Data Mining in finding valuable information.
4
Students will acquire the most important concepts related to the basic models and algorithms and will have the skills for finding valuable information.
5
Students will be ready to apply the acquired knowledge in practice.
6
Students will be equipped with sufficient theoretical and practical knowledge to continue with other subsequent subjects related to Data Mining.
Assessment                       Quantity  Percentage  Total percent
Midterms                         1         30%         30%
Quizzes                          0         0%          0%
Projects                         1         30%         30%
Term projects                    0         0%          0%
Laboratories                     0         0%          0%
Class participation              0         0%          0%
Total term evaluation percent                          60%
Final exam percent                                     40%
Total percent                                          100%
Activity                                 Quantity  Duration (hours)  Total (hours)
Course duration (including exam weeks)   16        4                 64
Off class study hours                    14        4                 56
Duties                                   1         10                10
Midterms                                 1         10                10
Final exam                               1         10                10
Other                                    0         0                 0
Total workload                                                       150
Total workload / 25 (hours)                                          6.00
ECTS                                                                 6.00
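The workload figures can be checked with a few lines of arithmetic. The sketch below takes the quantities and durations from the workload table; the divisor of 25 hours per ECTS credit also comes from that table, while the variable names are illustrative.

```python
# (quantity, duration in hours) for each workload row in the table
workload_rows = {
    "Course duration (including exam weeks)": (16, 4),
    "Off class study hours": (14, 4),
    "Duties": (1, 10),
    "Midterms": (1, 10),
    "Final exam": (1, 10),
    "Other": (0, 0),
}

HOURS_PER_ECTS = 25  # divisor stated in the table

# Total hours is the sum of quantity * duration over all rows
total_hours = sum(quantity * duration for quantity, duration in workload_rows.values())
ects = total_hours / HOURS_PER_ECTS

print(total_hours, ects)  # 150 6.0
```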