- Code
- CMP 401
- Name
- Data Mining I
- Semester
- 1
- Lecture hours
- 3.00
- Seminar hours
- 1.00
- Laboratory hours
- 0.00
- Credits
- 3.50
- ECTS
- 6.00
- Description
-
This course explores the concepts and techniques of knowledge discovery and data mining. As a multidisciplinary field, data mining draws on work from areas including statistics, machine learning, pattern recognition, database technology, information retrieval, network science, knowledge-based systems, artificial intelligence, high-performance computing, and data visualization. The course focuses on issues relating to the feasibility, usefulness, effectiveness, and scalability of techniques for discovering hidden patterns in large data sets. As a result, it is not intended as an introduction to statistics, machine learning, database systems, or other such fields, although it does provide some background knowledge to help the reader understand their respective roles in data mining.
- Objectives
-
This course aims to: • Familiarize students with data types. • Equip students with the different techniques and ways of analyzing large amounts of data. • Provide students with the skills to implement data preprocessing methods. • Explain the importance and influence of Data Mining in the field of computer science for finding valuable information in Big Data. • Develop students' critical thinking in analyzing and finding patterns in multi-dimensional data.
- Week
- Topic
- 1
- Introduction to Data Mining This topic provides an overview of the course, and it will cover topics such as what is Data Mining, the origin and reason for the development of Data Mining as well as the main tasks that can be performed by using Data Mining techniques and methods. (Main Lit., pg. 21-42)
- 2
- Types of Data This lecture covers topics such as attribute types, measurement and values, data categorization and transformation, data sets and their types, as well as data quality analysis due to data measurement and data collection problems. (Main Lit., pg. 43-69)
- 3
- Data Preprocessing - 1 This lecture covers similarity and distance measures. The main distances treated are Euclidean distance, Minkowski distance, and Mahalanobis distance; the main similarities covered are similarity between binary vectors, cosine similarity, and Pearson correlation. (Main Lit., pg. 91-110)
- 4
- Data Preprocessing - 2 This lecture will cover data preprocessing techniques such as Aggregation, Sampling, Dimensionality Reduction, Feature Subset Selection, New Feature Creation, Discretization, Binarization, Variable Transformation as well as measurement units based on information. (Main Lit., pg. 70-90)
- 5
- Data Exploration - 1 This lecture will cover the basic elements of data summary statistics, such as types of means, types of distributions, various measures of similarity and dissimilarity between data objects, types of proximity measures, mutual information, and techniques for selecting the appropriate proximity measure. (Recommended Lit., pg. 44 – 55)
- 6
- Data Exploration - 2 This lecture will cover the basic elements of data summary statistics, such as types of means, types of distributions, various measures of similarity and dissimilarity between data objects, types of proximity measures, mutual information, and techniques for selecting the appropriate proximity measure. (Recommended Lit., pg. 56 – 64 and Main Lit., pg. 110-132)
- 7
- Classification: Basic Concepts and Techniques - 1 This lecture will cover the basic concepts of classification, multiclass and binary classification, general approaches for building a classification model, basic methods for presenting test conditions, the calculation of different impurity measures for different data types, and basic classification algorithms. (Main Lit., pg. 133-167)
- 8
- Midterm Exam
- 9
- Classification: Basic Concepts and Techniques - 2 This lecture will cover overfitting in classification models, evaluation and selection of different classification models, hyperparameters, and the limitations of the basic classification algorithms. (Main Lit., pg. 167-212)
- 10
- Association Rules: Basic Concepts and Algorithms This lecture will cover the basic concepts and algorithms of association rules such as the Apriori principle, frequent itemset generation, candidate generation and pruning, techniques and methods for generation of association rules as well as the computational complexity of the basic association rules algorithms. (Main Lit., pg. 213-239)
- 11
- Association Rules: Issues in Model Selection and Evaluation This lecture will cover topics such as compact representation of frequent itemsets, alternative methods for generating frequent itemsets, the FP-growth algorithm, evaluation of association patterns and the effect of skewed support distribution. (Main Lit., pg. 240-306)
- 12
- Cluster Analysis: Basic Concepts and Algorithms This lecture will cover the basic concepts and algorithms of cluster analysis such as what is Cluster Analysis, the different types of clustering methods, the different types of clusters as well as a detailed analysis of the K-means algorithm. (Main Lit., pg. 307-335)
- 13
- Cluster Analysis: Issues in Model Selection and Evaluation This lecture will cover the basic concepts and algorithms of cluster analysis such as hierarchical agglomerative clustering, detailed analysis of the DBSCAN algorithm, as well as different techniques and methods for cluster evaluation. (Main Lit., pg. 336-394)
- 14
- Classification: Alternative Techniques This lecture will cover topics such as types of classifiers, rule based classifiers, nearest neighbor classifiers, Naïve Bayes classifiers, Logistic Regression, Artificial Neural Network and Support Vector Machine. (Main Lit., pg. 395-463, 478 - 498)
- 15
- Project Presentation and General Review
- 16
- Final Exam
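As an illustration of the week 3 topic, the distance and similarity measures named there can be sketched in a few lines of plain Python. This is a minimal sketch, not course material: the function names are hypothetical, the Jaccard coefficient is chosen here as one common example of a similarity between binary vectors, and Mahalanobis distance is left out because it needs a covariance matrix.

```python
import math

def minkowski(x, y, r):
    """Minkowski distance of order r (r=1 is Manhattan, r=2 is Euclidean)."""
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)

def euclidean(x, y):
    """Euclidean distance, the r=2 special case of Minkowski distance."""
    return minkowski(x, y, 2)

def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)

def jaccard(x, y):
    """Jaccard coefficient for 0/1 vectors: 1-1 matches over non 0-0 positions."""
    both_one = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    mismatches = sum(1 for a, b in zip(x, y) if a != b)
    return both_one / (both_one + mismatches)

print(euclidean([0, 0], [3, 4]))          # 5.0
print(minkowski([0, 0], [3, 4], 1))       # 7.0
print(jaccard([1, 0, 1, 0], [1, 1, 0, 0]))
```

Distances grow as objects become less alike, while the similarity measures peak at 1 for identical direction (cosine), perfect linear relation (Pearson), or identical sets of 1-valued attributes (Jaccard).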
- 1
- Students will be able to understand the different types of data.
- 2
- Students will have knowledge of the statistical features and measures of data.
- 3
- Students will master the most important concepts related to the basic models and algorithms for finding valuable information.
- 4
- Students will be able to understand the importance of Data Mining in finding valuable information.
- 5
- Students will be ready to apply in practice the basic knowledge provided.
- 6
- Students will be equipped with sufficient theoretical and practical knowledge to continue with subsequent courses.
- Quantity Percentage Total percent
- Midterms
- 1 30% 30%
- Quizzes
- 0 0% 0%
- Projects
- 0 0% 0%
- Term projects
- 1 30% 30%
- Laboratories
- 0 0% 0%
- Class participation
- 0 0% 0%
- Total term evaluation percent
- 60%
- Final exam percent
- 40%
- Total percent
- 100%
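The weights in the evaluation table combine into a final mark as a weighted average. The sketch below assumes each component is scored on a 0-100 scale, which the syllabus does not state; the function name and the example scores are hypothetical.

```python
# Weights taken from the evaluation table: one midterm, one term project, final exam.
WEIGHTS = {"midterm": 0.30, "term_project": 0.30, "final_exam": 0.40}

def course_grade(scores):
    """Weighted average of component scores (assumed to be on a 0-100 scale)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

# Hypothetical scores, for illustration only.
example_scores = {"midterm": 70, "term_project": 90, "final_exam": 80}
print(round(course_grade(example_scores), 2))
```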
- Quantity Duration (hours) Total (hours)
- Course duration (including exam weeks)
- 16 4 64
- Off class study hours
- 14 4 56
- Duties
- 1 8 8
- Midterms
- 1 10 10
- Final exam
- 1 12 12
- Other
- 0 0 0
- Total workload
- 150
- Total workload / 25 (hours)
- 6.00
- ECTS
- 6.00
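The workload figures can be checked with a short calculation: each activity contributes quantity times duration, and the total is divided by 25 hours per ECTS credit, as in the table. The dictionary keys below are hypothetical labels for the table rows.

```python
# (quantity, hours each) per activity, copied from the workload table.
WORKLOAD = {
    "course_duration": (16, 4),
    "off_class_study": (14, 4),
    "duties": (1, 8),
    "midterms": (1, 10),
    "final_exam": (1, 12),
    "other": (0, 0),
}
HOURS_PER_ECTS = 25  # divisor used in the table

total_hours = sum(quantity * hours for quantity, hours in WORKLOAD.values())
ects = total_hours / HOURS_PER_ECTS
print(total_hours, ects)  # 150 6.0
```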