Capstone for Data Science

How to Become a Solution Architect

You learn data science by doing data science. The capstone project class lets you define, design, develop, and deploy data science projects that you can use to demonstrate your skills to potential employers. Projects are drawn from real-world problems and are conducted with industry, government, and academic partners.

You learn to integrate and apply the knowledge and skills gained during the program through a series of hands-on projects, supported by our industry partners, in which you build a full data science pipeline: understanding the problem, preparing the data, analyzing and visualizing the data, and building and evaluating models. You also learn the importance of A/B testing in model deployment. Finally, you learn how to communicate effectively and present the insights and recommendations derived from data mining using visualization and storytelling techniques.

Course Syllabus

Part I. Problem Definition

  1. Review the data science 6-step life cycle
  2. Understanding and defining the business question
  3. Understanding and defining the success criteria
  4. How to become a good solution architect

Part II. Data Preparation

  1. Extraction, Transformation and Loading (ETL)
  2. Data Cleansing
  3. Data Wrangling
  4. How to become a good DBA (Database Administrator)

Part III. Data Exploration

  1. Learn to choose the best set of univariate statistical and visualization methods
  2. Learn to choose the best set of bivariate statistical and visualization methods
  3. Learn to choose the best set of multivariate statistical and visualization methods

Part IV. Predictive Modeling

  1. Building and optimizing linear classification and regression models
  2. Building and optimizing non-linear classification and regression models
  3. Building and optimizing deep learning classification and regression models

Part V. Model Evaluation

  1. Classification model evaluation in action
  2. Regression model evaluation in action
  3. Clustering model evaluation in action

Part VI. Model Deployment

  1. A/B Testing
  2. Model deployment using Python
  3. Model deployment using R

Credits

3

Textbooks

  1. Data Science Solutions with Python, Tshepo Chris Nokeri, Apress, 2022
  2. Practical Data Science with R, Nina Zumel and John Mount, Manning, 2019

Expected Work

  1. Presentation (40%)
  2. Final Exam (60%)

Learning Goals

Students enrolled in this class will be prepared to contribute to a rapidly changing field by acquiring a thorough grounding in the core principles and foundations of relational and NoSQL database systems. They will also gain a deeper understanding of (elective) topics of more specialized interest and will be able to critically review, assess, and communicate current developments in the field.

Onsite

€100 / hour

€500 / day

Online

€60 / hour

€300 / day