Data Science (a.k.a. Data Mining) is about explaining the past and predicting the future by means of data analysis. It is a multi-disciplinary field that combines statistics, machine learning, artificial intelligence and database technology. Data science applications are often considered highly valuable: many businesses have accumulated large amounts of data over years of operation, and data science can extract useful knowledge from that data. Businesses can then turn this knowledge into more clients, more sales, and greater profits. The same holds in the engineering and medical fields.
An Introduction to Data Science
Explain the Past and Predict the Future
Course Syllabus
Part I. Problem Definition
- Data, Database, Data Science
- Data Science 6-Step Process
Part II. Data Preparation
- Extraction, Transformation and Loading (ETL)
- Data Cleaning and Wrangling
- Data Labeling
Part III. Data Exploration
- Univariate Statistics and Visualization
- Bivariate Statistics and Visualization
- Principal Component Analysis (PCA)
Part IV. Predictive Modeling
- Classification Models
- Regression Models
- Clustering Models
- Association Rules
Part V. Model Evaluation
- Evaluating Classification Models
- Evaluating Regression Models
- Evaluating Clustering Models
Part VI. Model Deployment
- A/B Testing
Credits
3
Textbooks
- An Introduction to Data Science, Saed Sayad, online book, 2010-2023.
- Introduction to Data Science, Rafael A. Irizarry, Chapman and Hall/CRC, 2019.
Expected Work
- Quizzes (10%)
- Projects (40%)
- Midterm Exam (20%)
- Final Exam (30%)
Learning Goals
This course will increase your marketability in the fast-paced data science industry. With extensive theoretical knowledge of and practical experience with these in-demand technical skills, as well as the soft skills (e.g., project management) employers seek, you will be prepared to apply for top data science positions.