# Data Science

### Statistics is simple if we ask the right questions!

Statistics is the science of collecting, classifying, summarizing, organizing, analyzing, and interpreting data. The Statistics for Data Science course introduces the principles of the statistical methods and procedures used in data science. After completing it, you will have practical knowledge of core topics in statistics, including data gathering, summarizing data with descriptive statistics, displaying and visualizing data, examining relationships between variables, probability distributions, expected values, hypothesis testing, an introduction to ANOVA (analysis of variance), and regression and correlation analysis. You will take a hands-on approach to statistical analysis using Python and Jupyter Notebooks, tools widely used by data scientists and data analysts. At the end of the course, you will complete a project that applies the course concepts to a data science problem based on a realistic scenario, demonstrating an understanding of foundational statistical thinking and reasoning.

## Course Syllabus

#### Part I. Basics

1. Descriptive Statistics
2. Inferential Statistics
3. Predictive Statistics

#### Part II. Univariate Statistics

1. Categorical variables – Statistics & Visualization
2. Numerical variables – Statistics & Visualization
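
For a single variable, the appropriate summary depends on its type. A categorical variable calls for counts, proportions, and the mode. A numerical variable calls for center, spread, and quartiles. The minimal standard-library sketch below uses hypothetical survey data. A text bar chart stands in for a real plot here; in practice a plotting library such as matplotlib or seaborn would draw it.

```python
import statistics
from collections import Counter

# Hypothetical categorical variable: preferred transport mode of respondents.
transport = ["car", "bus", "car", "bike", "car", "bus", "walk", "car"]

# Categorical: frequency table, proportions, and mode.
counts = Counter(transport)
n = len(transport)
for category, count in counts.most_common():
    # Text bar chart as a stand-in for a bar plot.
    print(f"{category:<5} {'#' * count} {count / n:.0%}")
mode = statistics.mode(transport)

# Hypothetical numerical variable: commute times in minutes.
commute = [12, 15, 22, 25, 25, 30, 41, 60]

# Numerical: center, spread, and quartiles (default 'exclusive' method).
q1, q2, q3 = statistics.quantiles(commute, n=4)
iqr = q3 - q1
print(f"mean={statistics.mean(commute):.1f} median={statistics.median(commute)}")
print(f"Q1={q1} Q3={q3} IQR={iqr} stdev={statistics.stdev(commute):.1f}")
```

Note that a mean is meaningless for the transport variable. The commute times, in contrast, are better described by the median and IQR, because the 60-minute value skews the mean.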

#### Part III. Bivariate Statistics

1. Categorical-Categorical variables – Statistics & Visualization
2. Numerical-Numerical variables – Statistics & Visualization
3. Categorical-Numerical variables – Statistics & Visualization

#### Part IV. Multivariate Statistics

1. Principal Component Analysis (PCA)
2. t-distributed Stochastic Neighbor Embedding (t-SNE)
3. Multivariate Data Visualization


#### Textbooks

1. Practical Statistics for Data Scientists, Peter Bruce, Andrew Bruce, Peter Gedeck, O’Reilly Media, 2020.
2. Statistics for Data Science, James D. Miller, Packt Publishing, 2017.

#### Expected Work

1. Quizzes (10%)
2. Projects (40%)
3. Midterm Exam (20%)
4. Final Exam (30%)

### Learning Goals

The focus is on developing a clear, intuitive understanding of which approaches suit which data types, assessing whether a proposed method is appropriate, using R or Python to analyze data, and interpreting the output accurately. This course helps you start your journey toward data- and statistics-driven roles such as data scientist, data analyst, business analyst, statistician, and researcher.