Data Science Course - Indic Technologies

Module - 1 : Introduction to data science

What is Data Science
How it is different from Big Data and Data Analytics
Data Driven decision making
Purpose and Business problems
How Data Scientists work
Skills of a data scientist
Different sectors using Data science
Real World Applications
Future of AI and how the world is changing

Module - 2 : Statistics

Introduction to Statistics
- Statistical and Non-Statistical Analysis
- Major categories of statistics – Frequentist and Bayesian
- Difference between Statistics and Probability
- Statistical terms
- Difference between Descriptive Statistics and Inferential Statistics
- Understanding of Population and Samples
Descriptive Statistics
Inferential Statistics
Central Limit Theorem
Types of variables
- Nominal/Categorical
- Ordinal
- Interval/Ratio
- Continuous, Time Series
Central Tendency
Mean
Median
Mode
Measure of Statistical dispersions
Variance and Bessel's correction
Standard Deviation
Standard Error
Margin of Error
IQR
Range
Mean absolute difference
Median absolute deviation
Coefficient of variation
Skewness
Law of Large Numbers
Confidence Level & Interval
P value and its interpretation
Correlation, autocorrelation & correlation matrix
Correlation ratio

Sampling Techniques
Sampling errors
Sample size estimation
Point estimation & margin of error
Multicollinearity
Covariance and correlation
P-value and critical value approach
T-Distribution and T-Statistics
Hypothesis Testing
What is Hypothesis Testing
Different types of Errors (Type I and Type II Errors)
Z-test
T-test
Chi-square test
ANOVA (one way and two way)
F-test & F-score
P-Value & Significance Level

Module - 3 : Probability

Probability
Venn diagram
Counting (permutations & combinations)
Expectation
Rules of Probabilities
Bayesian Network
Random Variables and Expected Values
Bayes theorem
Maximum likelihood estimation
Probability Distributions
- Continuous Distributions (Normal, uniform, T, F, chi-square)
- Discrete Distributions (Bernoulli, binomial, Poisson)
- Empirical Rule with Z-Score

Module - 4 : Python

Why Python for data analysis
How to install Anaconda
Running a few simple programs using Python
Python objects
- Lists
- Strings
- Tuples
- Dictionaries
- Arrays, Data frames in Python
Python Libraries
- NumPy
- SciPy
- Matplotlib
- Pandas
- Scikit Learn
- Seaborn
- regular expressions
Introduction to Series and Data frames
Math functions
User defined Functions
Parameters and arguments of functions
Recursive function and its examples
Conditionals in Python
- if statement
- elif
- if elif else
Loops in Python
- for loop
- while loop
Introduction to pandas
Broadcasting in Python
Array shape manipulations
Data structures in pandas
- Series
- Data frame
- Panel
Various Data Frame Operations
- Selection
- Deletion etc.
- Grouping, Merging, and Reshaping of Data
Creating matrices using NumPy
Statistical operators using NumPy

Module - 5 : Data and Data Science Thinking

Basics of data categorization and different formats of data
- Structured Data
- Unstructured Data
- Time Series
Why and how to raise the right question
Correlation is not causation, and why it matters
Limitations as a data scientist
Transition from intuition-based to data-driven decision making
Storytelling

Module - 6 : Data Analytics Overview

Data Analytics Process
Exploratory Data Analysis (EDA)
How to start a Data Analytics project
Intro to Web Scraping and Beautiful Soup

Module - 7 : Machine Learning, Data Science and Artificial Intelligence

Supervised Learning
Unsupervised Learning
Difference between Classification and Regression
Data pre-processing
What is a data set
What is a training set
What is a test set and why it is needed
Expectation-Maximization technique for missing values using Gradient
Feature scaling
Binning
One-hot encoding
Feature engineering
Outliers treatment
Bias and Variance trade-off
Overfitting and Underfitting
Exploratory Data Analysis (EDA)
- Univariate analysis
- Bivariate Analysis
- Feature Engineering
- Variable transformation
- Variable/Feature Creation
- Project
Supervised Regression Algorithms
- Simple Linear Regression
- Multiple Linear Regression
- Ordinary Least Squares (OLS)
- Decision tree Regression
- Random Forest Regression
- GLM (Poisson regression, spline)
- Support Vector Machines Regression
- Error and Accuracy
- Gradient Descent
- Regularization Techniques
- Maximum Likelihood Estimation (MLE)
- Probabilistic diagnosis of outliers
- L2 and L1 Norms
- Ridge Regression
- Lasso Regression and ElasticNet
- Project
Supervised Classification Algorithms
- Logistic regression classification
- Multiclass Classification using Logistic Regression
- Decision tree Classification
- Random Forest classification
- Support Vector Machines classification
- What is the Naïve Bayes theorem and its limitations
- Naïve Bayes Classification
- AdaBoost (Adaptive Boosting) Algorithm
- GBM
- Probability in Classification
- Creating the log loss formula with entropy
- Softmax Function
- MLE in classification
- Understanding the Neural Networks
- SVM
- Gradient Boosting
- XGBoost (Extreme Gradient Boosting)
- Project
Unsupervised Algorithms
K-means Clustering
Hierarchical clustering
Association Rule Mining
KNN Classifier
PCA
Project
Model Evaluation Metrics
ROC Curves
Confusion matrix
Accuracy
Recall & Precision
Specificity & Sensitivity
Receiver Operating Characteristic (ROC) curve
Area Under Curve (AUC)
F1-Score
AIC & BIC Scores
R squared & Adjusted R squared
RMSE, MSE
Model selection Techniques
Cross validation
Bootstrap
Model selection using Statistical tests
Grid search
Evaluation Matrix
Natural Language Processing (NLP)
- What is NLP
- Cleaning Text
- Tokenization
- Term Frequency (TF)
- Term Frequency – Inverse Document Frequency (TF-IDF)
- Document Term Matrix
AI and Deep Learning
- Introduction to Deep Learning and Neural Networks
- Introduction to Linear Algebra
- Artificial Neural Networks
- Activation Functions
- Backpropagation
- Chain Rule of Differentiation
- Vanishing Gradient problem
- Exploding Gradient problem
- Dropout Layers in Multi-Layer Neural Networks
- Activation Functions in Deep Learning: ELU, PReLU, Softmax, Swish, and Softplus
- Weight Initialization Techniques
- Gradient Descent vs Stochastic Gradient Descent
- AdaGrad Optimizers
- Hyper Parameter Tuning
- CNN
- CNN vs ANN
- LSTM
- Bi-LSTM
Generative AI
- Introduction to Generative AI
- Introduction to LangChain
- Memory in LangChain
- Introduction to Vector Databases for AI & Large Language Models (LLMs)