- What is Data Science
- How It Differs from Big Data and Data Analytics
- Data-Driven Decision Making
- Purpose and Business Problems
- How Data Scientists Work
- Skills of a Data Scientist
- Sectors Using Data Science
- Real-World Applications
- The Future of AI and How the World Is Changing
- Introduction to Statistics
- Statistical and Non-Statistical Analysis
- Major Schools of Statistics: Frequentist and Bayesian
- Difference between Statistics and Probability
- Statistical Terms
- Difference between Descriptive and Inferential Statistics
- Understanding Populations and Samples
- Descriptive Statistics
- Inferential Statistics
- Central Limit Theorem
- Types of variables
- Nominal/Categorical
- Ordinal
- Interval/Ratio
- Continuous, Time Series
- Central Tendency
- Mean
- Median
- Mode
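The three measures of central tendency can be compared on one small sample. This is a minimal sketch using Python's standard-library `statistics` module; the `scores` list is hypothetical illustration data.

```python
from statistics import mean, median, mode, multimode

# Hypothetical exam scores, used only to illustrate the three measures
scores = [72, 85, 85, 90, 64, 85, 78, 90]

avg = mean(scores)             # arithmetic mean: sum / count
mid = median(scores)           # middle of the sorted data (average of the two middle values here)
most = mode(scores)            # single most frequent value
all_modes = multimode(scores)  # every value tied for most frequent

print(avg, mid, most, all_modes)  # prints: 81.125 85.0 85 [85]
```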
- Measures of Statistical Dispersion
- Variance and Bessel's Correction
- Standard Deviation
- Standard Error
- Margin of Error
- Interquartile Range (IQR)
- Range
- Mean Absolute Difference
- Median Absolute Deviation
- Coefficient of Variation
- Skewness
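Most of the dispersion measures above can be computed by hand in a few lines, which makes the role of Bessel's correction (dividing by n - 1 instead of n) explicit. A sketch on a hypothetical sample; the mean absolute difference uses one common convention (averaging over all ordered pairs i != j), and other texts divide by n² instead.

```python
import math
import statistics

# Hypothetical sample, used only for illustration
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)
m = statistics.mean(data)
squared_devs = [(x - m) ** 2 for x in data]

# Population variance divides by n; the sample variance applies Bessel's
# correction and divides by n - 1 so the variance estimate is unbiased.
pop_var = sum(squared_devs) / n
sample_var = sum(squared_devs) / (n - 1)
sample_sd = math.sqrt(sample_var)

# Standard error of the mean: sample SD / sqrt(n)
std_error = sample_sd / math.sqrt(n)

# Range and interquartile range. statistics.quantiles uses the "exclusive"
# method by default; other tools may report a slightly different IQR.
data_range = max(data) - min(data)
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Median absolute deviation: median of |x - median(x)|
med = statistics.median(data)
mad = statistics.median([abs(x - med) for x in data])

# Mean absolute difference: average |xi - xj| over all ordered pairs i != j
mean_abs_diff = sum(abs(a - b) for a in data for b in data) / (n * (n - 1))

# Coefficient of variation: SD relative to the mean (meaningful for ratio data)
cv = sample_sd / m
```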
- Law of Large Numbers
- Confidence Level & Interval
- P-value and Its Interpretation
- Correlation, Autocorrelation & Correlation Matrix
- Correlation ratio
- Sampling Techniques
- Sampling errors
- Sample size estimation
- Point estimation & margin of error
- Multicollinearity
- Covariance and Correlation
- P-value and Critical Value Approaches
- t-Distribution and t-Statistic
- Hypothesis Testing
- What is Hypothesis Testing
- Types of Errors (Type I and Type II)
- Z-test
- t-test
- Chi-square Test
- ANOVA (One-Way and Two-Way)
- F-test & F-statistic
- P-Value & Significance Level
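A one-sample z-test ties together the test statistic, the p-value, the significance level, and the critical-value approach. A minimal sketch using the standard-library `statistics.NormalDist`; it assumes the population standard deviation is known (otherwise a t-test is the usual choice), and all numbers are hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, pop_mean, pop_sd, n):
    """Two-sided one-sample z-test, assuming the population SD is known."""
    standard_error = pop_sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails
    return z, p_value

# Hypothetical numbers: H0 says mu = 100; sigma = 15 is assumed known
z, p = one_sample_z_test(sample_mean=104.5, pop_mean=100, pop_sd=15, n=36)
alpha = 0.05  # significance level

# P-value approach: reject H0 when p < alpha
reject_by_p = p < alpha

# Critical-value approach: reject H0 when |z| exceeds the two-sided critical value
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
reject_by_crit = abs(z) > z_crit

print(f"z = {z:.2f}, p = {p:.4f}, z_crit = {z_crit:.2f}")
print("Reject H0" if reject_by_p else "Fail to reject H0")
```

Both approaches always agree for the same alpha; here z = 1.8 gives p of roughly 0.072, so H0 is not rejected at the 5% level.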
- Probability
- Venn Diagrams
- Counting (Permutations & Combinations)
- Expectation
- Rules of Probability
- Bayesian Networks
- Random Variables and Expected Values
- Bayes' Theorem
- Maximum Likelihood Estimation
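Counting and Bayes' theorem both reduce to short formulas. The sketch below uses `math.perm` and `math.comb` (Python 3.8+) for counting, then applies Bayes' theorem, with the law of total probability in the denominator, to a classic diagnostic-test example. The prevalence, sensitivity, and false-positive rate are hypothetical, and `bayes_posterior` is an illustrative helper, not a library function.

```python
from math import comb, perm

# Counting: ordered arrangements vs. unordered selections of 2 items from 5
ordered_pairs = perm(5, 2)    # 5 * 4 = 20
unordered_pairs = comb(5, 2)  # 20 / 2! = 10

def bayes_posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive test) = P(pos | cond) * P(cond) / P(pos)."""
    # Law of total probability: P(pos) over "has condition" and "does not"
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false-positive rate
posterior = bayes_posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(ordered_pairs, unordered_pairs, round(posterior, 3))  # prints: 20 10 0.161
```

Even with an accurate test, a positive result here means only about a 16% chance of having the condition, because the prior is so small.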
- Probability Distributions
- Continuous Distributions (Normal, Uniform, t, F, Chi-square)
- Discrete Distributions (Bernoulli, Binomial, Poisson)
- Empirical Rule with Z-Scores
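The empirical (68-95-99.7) rule and z-scores can be checked directly against a normal CDF. A small sketch with the standard-library `NormalDist`; the height distribution (mean 170 cm, SD 10 cm) is a hypothetical example.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical distribution: heights with mean 170 cm and SD 10 cm
mu, sigma = 170, 10
heights = NormalDist(mu=mu, sigma=sigma)

# Empirical rule: share of values within k SDs of the mean
shares = {k: heights.cdf(mu + k * sigma) - heights.cdf(mu - k * sigma) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} SD: {share:.4f}")  # about 0.6827, 0.9545, 0.9973

print(z_score(185, mu, sigma))  # 1.5: 185 cm is 1.5 SDs above the mean
```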
- Why Python for Data Analysis
- How to Install Anaconda
- Running a Few Simple Programs in Python
- Python Objects
- Lists
- Strings
- Tuples
- Dictionaries
- Arrays and DataFrames in Python
- Python Libraries
- NumPy
- SciPy
- Matplotlib
- Pandas
- Scikit Learn
- Seaborn
- Regular Expressions
- Introduction to Series and DataFrames
- Math Functions
- User-Defined Functions
- Parameters and Arguments of Functions
- Recursive Functions and Examples
- Conditionals in Python
- if statement
- elif
- if-elif-else
- Loops in Python
- for loop
- while loop
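The function, conditional, and loop topics fit into one short script. Note that `if` is a conditional statement rather than a loop: it runs a block at most once. The function names (`power`, `factorial`, `sign_label`) are hypothetical examples chosen for illustration.

```python
def power(base, exponent=2):
    """Parameters: base is required, exponent has a default value."""
    return base ** exponent

def factorial(n):
    """Recursive function: n! = n * (n - 1)!, with 0! = 1 as the base case."""
    if n < 0:
        raise ValueError("n must be non-negative")
    elif n == 0:
        return 1
    else:
        return n * factorial(n - 1)

def sign_label(x):
    """if / elif / else: exactly one branch runs."""
    if x > 0:
        return "positive"
    elif x < 0:
        return "negative"
    else:
        return "zero"

# Arguments can be passed by position or by keyword
print(power(3))                   # 9, uses the default exponent
print(power(base=2, exponent=5))  # 32

# for loop: iterate over the items of a sequence
for value in [-2, 0, 3]:
    print(value, sign_label(value))

# while loop: repeat while the condition holds
countdown = 3
while countdown > 0:
    print(countdown)
    countdown -= 1

print(factorial(5))  # 120
```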
- Introduction to pandas
- Broadcasting in NumPy
- Array shape manipulations
- Data structures in pandas
- Series
- DataFrame
- Panel (deprecated; removed in pandas 0.25)
- Various DataFrame Operations
- Selection
- Deletion, etc.
- Grouping, Merging, and Reshaping Data
- Creating Matrices Using NumPy
- Statistical operators using NumPy
- Basics of Data Categorization and Different Data Formats
- Structured Data
- Unstructured Data
- Time Series
- Why and How to Ask the Right Questions
- Correlation Is Not Causation, and Why It Matters
- Limitations as a Data Scientist
- Transforming Intuition-Based Decision Making into Data-Driven Decision Making
- Storytelling
- Data Analytics Process
- Exploratory Data Analysis (EDA)
- How to Start a Data Analytics Project
- Introduction to Web Scraping and Beautiful Soup
- Supervised Learning
- Unsupervised Learning
- Difference between Classification and Regression
- Data pre-processing
- What Is a Dataset
- What Is a Training Set
- What Is a Test Set and Why It Is Needed
- Expectation-Maximization Technique for Missing Values
- using Gradient
- Feature scaling
- Binning
- One-Hot Encoding
- Feature Engineering
- Outlier Treatment
- Bias-Variance Trade-off
- Overfitting and Underfitting
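Several pre-processing steps listed above (train/test splitting, feature scaling, binning, and one-hot encoding) are simple enough to write from scratch. This is a plain-Python sketch; the helper names are hypothetical, and `split_train_test` is a simplified stand-in, not scikit-learn's `train_test_split`.

```python
import bisect
import random
import statistics

def min_max_scale(values):
    """Rescale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Z-score scaling: subtract the mean, divide by the sample SD."""
    mu, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mu) / sd for v in values]

def one_hot(labels):
    """Return the sorted categories and one 0/1 indicator row per label."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows

def bin_values(values, edges):
    """Bin index = number of edges <= value, so edges [18, 65] give bins <18, 18-64, >=65."""
    return [bisect.bisect_right(edges, v) for v in values]

def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows, then hold out the last test_fraction as the test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

train, test = split_train_test(list(range(8)))
print(len(train), len(test))                          # 6 2
print(min_max_scale([10, 20, 30]))                    # [0.0, 0.5, 1.0]
print(one_hot(["red", "blue", "red"]))                # (['blue', 'red'], [[0, 1], [1, 0], [0, 1]])
print(bin_values([10, 18, 50, 70], [18, 40, 65]))     # [0, 1, 2, 3]
```

In a real pipeline the scaling parameters (min/max or mean/SD) are computed on the training set only and then reused on the test set, so no information leaks from the test data.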
- Exploratory Data Analysis (EDA)
- Univariate analysis
- Bivariate Analysis
- Feature Engineering
- Variable transformation
- Variable /Feature Creation
- Project
- Supervised Regression Algorithms
- Simple Linear Regression
- Multiple Linear Regression
- Ordinary Least Squares (OLS)
- Decision Tree Regression
- Random Forest Regression
- GLM (Poisson Regression, Splines)
- Support Vector Machine Regression
- Error and Accuracy
- Gradient Descent
- Regularization Techniques
- Maximum Likelihood Estimation (MLE)
- Probabilistic diagnosis of outliers
- L2 and L1 Norms
- Ridge Regression
- Lasso Regression and ElasticNet
- Project
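Simple linear regression can be fitted two ways: with the closed-form OLS formulas, or iteratively with gradient descent on the mean squared error. The sketch below does both and adds an optional L2 (ridge) penalty on the slope only, leaving the intercept unpenalized, which is the usual convention. The data, learning rate, and epoch count are hypothetical choices for this toy example.

```python
def ols_fit(xs, ys):
    """Closed-form OLS estimates for y = b0 + b1 * x."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def gd_fit(xs, ys, lr=0.01, epochs=5000, l2=0.0):
    """Batch gradient descent on MSE; l2 > 0 adds a ridge penalty l2 * b1**2 on the slope."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        grad_b0 = 2 * sum(errors) / n
        grad_b1 = 2 * sum(e * x for e, x in zip(errors, xs)) / n + 2 * l2 * b1
        b0 -= lr * grad_b0
        b1 -= lr * grad_b1
    return b0, b1

# Hypothetical data lying exactly on y = 1 + 2x
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

print(ols_fit(xs, ys))         # exact solution: (1.0, 2.0)
print(gd_fit(xs, ys))          # iterative solution, very close to (1, 2)
print(gd_fit(xs, ys, l2=1.0))  # ridge penalty shrinks the slope toward 0 (about 1.333)
```

For this penalty the ridge slope has the closed form sxy / (sxx + n * l2) = 20 / 15, which the gradient-descent result should approach.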
- Supervised Classification Algorithms
- Logistic Regression Classification
- Multiclass Classification Using Logistic Regression
- Decision Tree Classification
- Random Forest Classification
- Support Vector Machine Classification
- What Naïve Bayes Assumes and Its Limitations
- Naïve Bayes Classification
- AdaBoost (Adaptive Boosting) Algorithm
- GBM
- Probability in Classification
- Deriving the Log-Loss Formula from Cross-Entropy
- Softmax Function
- MLE in Classification
- Understanding Neural Networks
- SVM
- Gradient Boosting
- XGBoost (Extreme Gradient Boosting)
- Project
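The probabilistic pieces of classification (the sigmoid used by logistic regression, softmax for multiclass output, and log loss) are short formulas. Minimizing log loss is equivalent to maximizing the Bernoulli (or categorical) likelihood, which is the MLE link listed above. A plain-Python sketch; the scores and labels are hypothetical, and the `eps` clipping value is an arbitrary safeguard.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real z to (0, 1); written to avoid overflow."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)

def softmax(scores):
    """Turn K real-valued scores into K probabilities that sum to 1."""
    m = max(scores)  # subtracting the max leaves the result unchanged but avoids overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy: -[y*log(p) + (1 - y)*log(1 - p)]."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log never sees 0
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def categorical_log_loss(true_class, probs, eps=1e-15):
    """Multiclass cross-entropy for one example: -log(probability of the true class)."""
    return -math.log(max(probs[true_class], eps))

# Hypothetical scores for a 3-class problem
probs = softmax([2.0, 1.0, 0.1])
print(probs, categorical_log_loss(0, probs))
print(sigmoid(0.0), binary_log_loss([1, 0], [0.9, 0.2]))
```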
- Unsupervised Algorithms
- K-means Clustering
- Hierarchical clustering
- Association Rule Mining
- KNN Classifier
- PCA
- Project
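K-means clustering alternates two steps until the centroids stop moving: assign each point to its nearest centroid, then move each centroid to the mean of its points (Lloyd's algorithm). A teaching sketch for 2-D points; the random initialization from the data, the iteration cap, and the sample points are hypothetical choices, and production libraries typically add smarter seeding such as k-means++.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm for k-means on 2-D points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                xs, ys = zip(*cluster)
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two well-separated groups
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```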
- Model Evaluation Metrics
- ROC Curves
- Confusion matrix
- Accuracy
- Recall & Precision
- Specificity & Sensitivity
- Receiver Operating Characteristic (ROC) curve
- Area Under Curve (AUC)
- F1-Score
- AIC & BIC Scores
- R-squared & Adjusted R-squared
- RMSE, MSE
- Model Selection Techniques
- Cross-Validation
- Bootstrap
- Model Selection Using Statistical Tests
- Grid Search
- Evaluation Metrics
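Most of the evaluation metrics above follow directly from the four confusion-matrix cells (for classification) or from residual sums of squares (for regression). A from-scratch sketch; the labels, predictions, and `n_features` value are hypothetical, and `safe_div` simply returns 0.0 for an empty denominator, which is one convention among several.

```python
import math

def confusion_counts(y_true, y_pred):
    """Confusion-matrix cells (TP, FP, TN, FN) for binary labels where 1 is positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

def safe_div(a, b):
    return a / b if b else 0.0

def classification_metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # recall is the same as sensitivity
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }

def regression_metrics(y_true, y_pred, n_features=1):
    n = len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    mse = ss_res / n
    r2 = 1 - ss_res / ss_tot
    # Adjusted R-squared penalizes extra features
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return {"mse": mse, "rmse": math.sqrt(mse), "r2": r2, "adj_r2": adj_r2}

# Hypothetical labels and predictions
clf = classification_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 1, 1, 0, 1])
reg = regression_metrics([3, 5, 7, 9], [2.5, 5, 7.5, 9])
print(clf)
print(reg)
```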
- Natural Language Processing (NLP)
- What is NLP
- Cleaning Text
- Tokenization
- Term Frequency (TF)
- Term Frequency – Inverse Document Frequency (TF-IDF)
- Document Term Matrix
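The NLP steps above (cleaning, tokenization, a document-term matrix, TF, and TF-IDF) can be traced on a toy corpus. This sketch uses the plain textbook formulas tf = count / document length and idf = log(N / df); libraries such as scikit-learn use smoothed variants by default, so their numbers will differ. The corpus and the letters-only tokenizer are hypothetical simplifications (note that "cat" and "cats" stay separate tokens without stemming).

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Cleaning + tokenization: lowercase, keep runs of letters a-z only."""
    return re.findall(r"[a-z]+", text.lower())

# Hypothetical toy corpus
docs = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
    "Cats and dogs!",
]
tokenized = [tokenize(d) for d in docs]
counts = [Counter(tokens) for tokens in tokenized]
vocab = sorted(set().union(*tokenized))

# Document-term matrix: one row per document, one column per vocabulary term
dtm = [[c[term] for term in vocab] for c in counts]

# Term frequency: count normalized by document length
tf = [{term: c[term] / len(tokens) for term in c} for c, tokens in zip(counts, tokenized)]

# Inverse document frequency: log(N / df), df = number of docs containing the term
n_docs = len(docs)
df = Counter(term for tokens in tokenized for term in set(tokens))
idf = {term: math.log(n_docs / df[term]) for term in vocab}

# TF-IDF: frequent in this document but rare across the corpus => high weight
tfidf = [{term: freq * idf[term] for term, freq in doc_tf.items()} for doc_tf in tf]

print(vocab)
print(dtm)
```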
- AI and Deep Learning
- Introduction to Deep Learning and Neural Networks
- Introduction to Linear Algebra
- Artificial Neural Networks
- Activation Functions
- Backpropagation
- Chain Rule of Differentiation
- Vanishing Gradients
- Exploding Gradients
- Dropout Layers in Multilayer Neural Networks
- Deep Learning Activation Functions: ELU, PReLU, Softmax, Swish, and Softplus
- Weight Initialization Techniques
- Gradient Descent vs. Stochastic Gradient Descent
- AdaGrad Optimizer
- Hyperparameter Tuning
- CNN
- CNN vs ANN
- LSTM
- Bi-LSTM
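The activation functions listed above are one-line formulas, and the sigmoid's derivative shows why deep sigmoid networks suffer from vanishing gradients: backpropagation applies the chain rule, multiplying local derivatives layer by layer, and the sigmoid's derivative never exceeds 0.25. A scalar sketch; the PReLU slope `a` is fixed here for illustration (in a real network it is learned), and the ten-layer calculation deliberately ignores the weights, treating them as 1.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    e = math.exp(x)
    return e / (1 + e)

def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x)), at most 0.25 (at x = 0)."""
    s = sigmoid(x)
    return s * (1 - s)

def relu(x):
    return max(0.0, x)

def elu(x, alpha=1.0):
    """ELU: x for x > 0, alpha * (e^x - 1) otherwise (smooth negative part)."""
    return x if x > 0 else alpha * (math.exp(x) - 1)

def prelu(x, a=0.25):
    """PReLU: slope a for x < 0; in a real network a is learned (fixed here)."""
    return x if x > 0 else a * x

def swish(x, beta=1.0):
    """Swish: x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)

def softplus(x):
    """Softplus log(1 + e^x), rewritten as max(x, 0) + log1p(e^-|x|) to avoid overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# Chain-rule view of vanishing gradients: with weights taken as 1, the gradient
# reaching the first of ten sigmoid layers is at most 0.25 ** 10.
max_grad_10_layers = sigmoid_grad(0.0) ** 10
print(max_grad_10_layers)  # about 9.5e-07
```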
- Generative AI
- Introduction to Generative AI
- Introduction to LangChain
- Memory in LangChain
- Introduction to Vector Databases for AI & Large Language Models (LLMs)