DATA SCIENCE COURSE CONTENT

Who is a Data scientist?

What is business analytics / data analytics?

Analytics

Business analytics

Business intelligence (BI)

Why do organizations need BI?

Challenges of building BI solutions

Data warehousing

Users of business intelligence (BI)

What is predictive analytics?

Statistics

Artificial intelligence

Machine learning

Predictive analytics software

Predictive analytics data flow

Data science components

Prospects

R – pros & cons

R & other analytical products

Data science tools & technologies

Introduction to Data analytics

Origin of R

Downloading & installing R, RStudio

Interface of R

R components

Data types

Data structures

Data

Definition of data

Types of data

Raw data

Processed or transformed data

Information

Decision making / decision support system

Statistics: Making sense of data

Definition of data analysis

Variables

What are variables?

Variable types

Qualitative variables

Quantitative variables

Continuous variables

Categorical variables

Discrete versus continuous variables

Data types

Strings

Vector

Data frame

List

Factors

Vector

Vector Creation

Single Element Vector

Multiple Elements Vector

Using the sequence function, seq()

Using the c() function

Accessing Vector Elements

Vector Manipulation

Vector element recycling

Vector Element Sorting

Data Frame

Create Data Frame

Structure of the Data Frame

Summary of Data in Data Frame

Extract Data from Data Frame

Expand Data Frame

Data Frame Column Slice

Data Frame Row Slice

Merging data frames

List

Create a list containing strings, numbers, vectors and logical values.

Create a list containing a vector, a matrix and a list.

Accessing List Elements

Manipulating List Elements

Merging Lists

Converting List to Vector

Factors

Build Factors

Factors in Data Frame

Changing the Order of Levels

Generating Factor Levels

Array

Create Array

Naming Columns and Rows

Accessing Array Elements

Manipulating Array Elements

Calculations Across Array Elements

String

Creating a string

Valid Strings

Invalid Strings

String Manipulation

Formatting numbers & strings

Counting number of characters in a string

Changing the case

Extracting parts of a string

R data interfaces

R – CSV files

R – Excel files

R – binary files

R – XML files

R – JSON files

R – web data

R – database

Reading tabular data files

Writing data

Functions in R

Numeric functions

Character functions

Statistical probability functions

Other statistical functions

Other useful functions

Operators

Logical operators

Relational operators

Aggregations

Data Aggregation

Multiple Aggregations

Control structures & functions

Debugging

Statistics

Univariate analysis

Measure of central tendency

Mean

Median

Mode

Dispersion techniques

Range

IQR

Variance

Standard deviation
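
The central-tendency and dispersion measures above can be illustrated with a minimal Python sketch (Python is covered later in this course). It assumes sample (n - 1) variance and standard deviation, as R's var() and sd() use, and computes quartiles by linear interpolation (method="inclusive"), which should agree with R's default quantile() type 7. The helper name describe and the data values are illustrative only.

```python
import statistics


def describe(values):
    """Summarise a numeric sample with common central-tendency and spread measures."""
    data = sorted(values)
    # Quartiles by linear interpolation between order statistics (R type 7 style)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),          # most frequent value
        "range": data[-1] - data[0],            # max - min
        "iqr": q3 - q1,                         # spread of the middle 50%
        "variance": statistics.variance(data),  # sample variance, divides by n - 1
        "sd": statistics.stdev(data),           # square root of the sample variance
    }


summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
print(summary)
```

For this toy sample the mean is 5, the median 4.5, the mode 4, the range 7 and the IQR 1.5; the median and IQR are the measures to prefer when the data are skewed or contain outliers.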

Distributions

Frequency distributions

Symmetric / asymmetric

Skewness

Kurtosis

Normal distribution

Binomial distributions

Poisson distributions
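
The binomial and Poisson probability mass functions can be written directly from their formulas. This is a minimal Python sketch; the function names are illustrative, and in R the same probabilities come from dbinom(3, 10, 0.5) and dpois(2, 4).

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam): lam^k * e^(-lam) / k!."""
    return lam**k * exp(-lam) / factorial(k)


# Probability of exactly 3 heads in 10 fair coin tosses: 120 / 1024
print(binomial_pmf(3, 10, 0.5))  # 0.1171875
# Probability of exactly 2 events when the average rate is 4 per interval
print(poisson_pmf(2, 4.0))
```

The binomial model counts successes in a fixed number of independent trials; the Poisson model counts events in a fixed interval when only the average rate is known.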

Tests

Hypothesis

Chi-square test

T-test

F-test

Z-test

ANOVA

Bivariate / multivariate analysis

Correlation

Regression analysis

Regression Analysis

Linear regression models

Non linear regression models

Logistic regression

Poisson regression

Data science with R

Data mining

Analyzing the past

Predicting the future

Data exploration

Variable identification

Univariate analysis

Bi-variate analysis

Missing Value Treatment

Why is missing value treatment required?

Why does data have missing values?

What are the methods to treat missing values?

Techniques Of Outlier Detection And Treatment

What is an outlier?

What are the types of outliers?

What are the causes of outliers?

What is the impact of outliers on a dataset?

How to detect outliers?

How to remove outliers?
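
One common way to detect outliers is the 1.5 x IQR rule (Tukey's fences): values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR are flagged. The Python sketch below assumes that rule and R-style (type 7) quartiles, and treats outliers by dropping them. Capping or transforming are alternatives. The function name iqr_outliers and the sample data are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Split values into (outliers, kept) using Tukey's fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr  # the "fences"
    outliers = [v for v in values if v < lower or v > upper]
    kept = [v for v in values if lower <= v <= upper]  # removal as the treatment
    return outliers, kept


data = [10, 12, 11, 13, 12, 11, 14, 95]  # illustrative values with one extreme point
outliers, cleaned = iqr_outliers(data)
print(outliers)  # [95]
print(cleaned)   # [10, 12, 11, 13, 12, 11, 14]
```

Raising k (for example to 3) flags only more extreme values; whether to drop, cap or keep flagged points depends on whether they are errors or genuine observations.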

The art of feature engineering

What is feature engineering?

What is the process of feature engineering?

What is variable transformation?

When should we use variable transformation?

What is feature variable creation, and what are its benefits?

Data manipulation

What is Data manipulation?

Different ways to manipulate / treat data

List of packages for data manipulation

Working with packages for data manipulation

Data visualisation

How to create a scatter plot?

How to create a histogram?

How to create a bar chart?

How to create a stacked bar chart?

How to create a box plot?

How to create an area chart?

How to create a heat map?

How to create a correlogram?

How to plot a geographical map?

How to plot the entire data in a single command?

Machine learning

Introduction to machine learning

Categories of machine learning algorithms

Supervised learning

Unsupervised learning

Reinforcement learning

Classification

Regression

Classification vs regression

Clustering

Cluster analysis

What is cluster analysis?

Why clustering?

Similarity / dissimilarity

Similarity measurement

Dissimilarity measurement

Classification of clustering methods

K-means clustering

Process flow of k-means

Choosing the number of clusters (k)

Case study

K-means clustering implementation
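
The k-means process flow (pick k starting centres, assign each point to its nearest centre, move each centre to the mean of its points, repeat until nothing moves) can be sketched in plain Python. This is a minimal illustration of Lloyd's algorithm, assuming Euclidean distance, k fixed in advance and random initial centres drawn from the data; the function name and the toy points are hypothetical.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means on a list of coordinate tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    clusters = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean)
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no centroid moved, so assignments are stable
            break
        centroids = new_centroids
    return centroids, clusters


# Illustrative toy data: two well-separated groups of 2-D points
pts = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centres, groups = kmeans(pts, k=2)
print(centres)
print(groups)
```

Because the result depends on the starting centres, practical implementations (such as R's kmeans() with its nstart argument) usually run several random starts and keep the solution with the smallest within-cluster sum of squares.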

K-Nearest Neighbour

Introduction

What is the KNN algorithm?

How to select an appropriate k value?

Calculating distance

KNN algorithm – pros and cons

Case study

KNN algorithm implementation
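
The KNN idea (measure the distance from a new point to every training point, take the k closest, and let them vote) fits in a few lines of Python. This sketch assumes Euclidean distance and a simple majority vote; the function name and the toy data are illustrative, not a particular library's API.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Order training rows by Euclidean distance to the query point
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    nearest_labels = [train_y[i] for i in order[:k]]
    # Majority vote; on a tie, the label met first in distance order wins
    return Counter(nearest_labels).most_common(1)[0][0]


X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X, y, (1.1, 1.0)))  # a
print(knn_predict(X, y, (4.9, 5.0)))  # b
```

An odd k avoids ties between two classes; since distances drive every prediction, features are usually scaled to comparable ranges before applying KNN.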

Regression

Linear regression

Logistic regression

Tree-based models

What is a decision tree?

Types of decision tree

Decision trees terminology

Advantages:

Disadvantages:

Decision tree algorithms

How does it work?

Case study

Implementation

Ensemble methods for tree-based models

Random forest

What is random forest?

How does it work?

Advantages of random forest

Disadvantages of random forest

Case study

Random forest implementation

Bagging

What is bagging?

How does it work?

Working with gbm in R

Case study

Implementation

Boosting

What is boosting?

How does it work?

Working with xgboost in R

Deep Learning and Neural Networks

Introduction

Artificial neural networks

What is a neural network?

What is deep learning?

How does a single neuron work?

Why are multi-layer networks useful?

General structure of a neural network

Back-propagation

Support Vector Machines

Overview

Maximal Margin Classifier

What is a Hyperplane?

Classification Using a Separating Hyperplane

The Maximal Margin Classifier

Non-separable Case

Support Vector Classifiers

Details

Support Vector Machines

Classification with non-linear boundaries

The SVM

SVMs with More than Two Classes

One-Vs-One Classification

One-Vs-All Classification

Model evaluations

Mean Squared Error

K-fold cross-validation
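
Mean squared error and k-fold cross-validation work together: the data are split into k folds, each fold is held out once while a model is trained on the rest, and the held-out MSEs are averaged. The Python sketch below assumes contiguous, unshuffled folds (shuffle the rows first if they are ordered) and uses a hypothetical least-squares line as the model; all function names are illustrative.

```python
def mean_squared_error(actual, predicted):
    """MSE = (1/n) * sum of (y_i - yhat_i)^2."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def k_fold_indices(n, k):
    """Split row indices 0..n-1 into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_mse(x, y, k, fit, predict):
    """Hold out each fold once, train on the rest, and average the test MSEs."""
    scores = []
    for test_idx in k_fold_indices(len(y), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(model, x[i]) for i in test_idx]
        scores.append(mean_squared_error([y[i] for i in test_idx], preds))
    return sum(scores) / k


# Hypothetical model: simple least-squares line y = slope * x + intercept
def fit_line(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    return slope, my - slope * mx


def predict_line(model, x_value):
    slope, intercept = model
    return slope * x_value + intercept


xs = list(range(10))
ys = [2 * v + 1 for v in xs]  # toy data lying exactly on a line
print(cross_val_mse(xs, ys, 5, fit_line, predict_line))  # close to 0
```

Passing fit and predict as functions keeps the cross-validation loop independent of the model, so the same loop can compare, for example, a line against a more flexible model on the same folds.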

Text analytics

Natural language processing

Text mining

Sentiment analysis

Social network Analysis

PCA

Introduction to PCA

What is Principal Component Analysis?

What are principal components?

Why is normalization of variables necessary?

PCA run with unscaled and scaled predictors

Implement PCA in R

Association rule mining

Market basket analysis – concepts

Lift

Support

Confidence

Implement & inspect rules
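
Support, confidence and lift for a rule A -> B follow from simple counts: support is the share of baskets containing both A and B, confidence is that count divided by the number of baskets containing A, and lift is confidence divided by the share of baskets containing B. A minimal Python sketch, with a hypothetical function name and made-up baskets:

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    count_a = sum(1 for t in transactions if a <= t)          # baskets containing A
    count_c = sum(1 for t in transactions if c <= t)          # baskets containing B
    count_ac = sum(1 for t in transactions if (a | c) <= t)   # baskets containing both
    support = count_ac / n
    confidence = count_ac / count_a
    lift = count_ac * n / (count_a * count_c)  # = confidence / support(B)
    return support, confidence, lift


baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
print(rule_metrics(baskets, {"bread"}, {"milk"}))  # (0.6, 0.75, 0.9375)
```

A lift above 1 means the items occur together more often than they would if bought independently; here a lift slightly below 1 suggests bread does not make milk more likely in these toy baskets.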

Time series analysis

Importance

What is time series analysis in business analytics?

How is it implemented in business analytics?

Forecasting in business analytics

Time series forecasting in R

Time series decomposition in R

Time series best practices

Data science in python

Basics of Python for data analysis

Why learn Python for data analysis?

Python 2.7 vs. 3.4

How to install Python?

Running a few simple programs in python

Python libraries and data structures

Python data structures

Python iteration and conditional constructs

Python libraries

Exploratory analysis in Python using pandas

Introduction to Series and DataFrames

DATA MUNGING IN PYTHON USING PANDAS

Pandas for Data Wrangling

Overview

Reading data

Exploration

GroupBy

Plotting

Advanced Indexing

Categorical Data

Building predictive models using machine learning algorithms in Python

Linear Regression

Logistic regression

Decision tree

Random forest

Boosting

Dictionary

Creating a Dictionary

Accessing Values in a Dictionary

Updating Dictionary

Delete Dictionary Elements

Properties of Dictionary Keys
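
The dictionary operations listed above (create, access, update, delete, and the rules for keys) can be shown in one short Python sketch. The variable names and values are made up for illustration.

```python
# Creating a dictionary: each key maps to a value
student = {"name": "Asha", "age": 21, "course": "Data Science"}

# Accessing values: [] raises KeyError for a missing key; .get() returns a default
print(student["name"])              # Asha
print(student.get("grade", "n/a"))  # n/a

# Updating: assigning to an existing key replaces it, a new key is added
student["age"] = 22
student["grade"] = "A"

# Deleting elements: del removes a key; pop() removes it and returns its value
del student["course"]
removed = student.pop("grade")

# Key properties: a repeated key keeps only the last value ...
dup = {"a": 1, "a": 2}
# ... and keys must be hashable, so tuples work but lists do not
coords = {(0, 0): "origin"}
try:
    bad = {["x"]: 1}
except TypeError as err:
    print("unhashable key:", err)

print(student)  # {'name': 'Asha', 'age': 22}
```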

Tuples

Creating a Tuple

Accessing Values in Tuples

Updating Tuples

Delete Tuple Elements

Basic Tuples Operations

Strings

Creating a String

Accessing Values in Strings

Updating Strings

List

Creating a list

Accessing Values in Lists

Updating Lists

Delete List Elements

Data warehouse concepts

Introduction to OLTP

Introduction to DWH/OLAP

Reporting fundamentals

Differences between OLTP and OLAP

DWH detailed

Dimensional modeling

Star schema

Snowflake schema

Fact constellation schema

Dimensions

Fact tables

Data modeling

Relational modeling – normalized schema

Data visualization with Tableau

Tableau – Desktop

Overview

Tableau products

Why visualization?

Getting started:

Tableau workspace

Data window

Toolbar

Cards & shelves

Workbook

Creating dashboards

Connecting to data

Data sources to connect to Tableau

Open data source

Developing a sample worksheet

Using Show Me

Show Me with many fields

Save your work

Joining multiple tables

Copying and pasting formatting

Creating an extract

Analysis

Visualization charts

Types of charts in tableau

Bar chart

Heat map

Scatter plot

Building a map view

Map options

Pie charts

Analyzing

Sorting & grouping

Sorting specific fields

Grouping

Creating groups

Aliases

Filtering

Quick filters

Text table

Drilling and drill through

Trend lines and statistics

Formatting

Annotations and marks labels

Point annotations

Area annotations

Titles

Captions

Calculated fields

How to create a calculated field

String functions

Date functions

Logical functions

Aggregate functions

Building workbooks & interactive dashboards

Create workbooks

Dashboards

Creating dashboards

Adding sheets to the dashboards

Adding dashboard objects

Sharing and saving workbooks to Tableau Public

Publish as PDF

Tableau with R