All you need to know about data science

Published:

Statistics and Math

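Some Important Regression Models

OLS (Ordinary Least Squares)

The classical least-squares model: it chooses the coefficients that minimize the sum of squared residuals between the observed and the fitted values.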
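
As a rough illustration of least squares for the line y = kx + b, here is a minimal sketch using the closed-form solution. The function name fit_ols and the data points are made up for this example.

```python
def fit_ols(xs, ys):
    """Fit y = k*x + b by minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    k = sxy / sxx
    # The fitted line always passes through the point of means.
    b = mean_y - k * mean_x
    return k, b


# Made-up points lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
k, b = fit_ols(xs, ys)
print(k, b)
```

With more than one predictor the same idea is usually written in matrix form and solved with a linear algebra library.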

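WLS (Weighted Least Squares)

Each observation gets a weight, so some observations influence the fit more than others. A common choice is a weight inversely proportional to the error variance of the observation. In locally weighted variants, data nearer to the point being predicted gets more weight.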
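
A minimal sketch of weighted least squares for y = kx + b, assuming the weights are already known. The function name fit_wls, the data, and the weights are made up for this example; a weight of 0 simply removes a point from the fit.

```python
def fit_wls(xs, ys, ws):
    """Fit y = k*x + b by minimizing sum(w_i * (y_i - k*x_i - b)**2)."""
    total_w = sum(ws)
    # Weighted means replace the plain means used in ordinary least squares.
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total_w
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total_w
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    k = sxy / sxx
    b = mean_y - k * mean_x
    return k, b


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 20.0]  # the last point is a made-up outlier

# Equal weights reproduce ordinary least squares.
equal_k, equal_b = fit_wls(xs, ys, [1.0, 1.0, 1.0, 1.0])
# Giving the outlier zero weight recovers the line through the other points.
down_k, down_b = fit_wls(xs, ys, [1.0, 1.0, 1.0, 0.0])
print(equal_k, equal_b)
print(down_k, down_b)
```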

Regularized Regression
Fixed Effects

In y=kx+b, the group-specific effect (usually an intercept b that differs by group) is allowed to be correlated with x.

Random Effects

In y=kx+b, the group-specific effect (usually an intercept b that differs by group) is assumed to be uncorrelated with x.

Time Series

Bayes' Theorem and Bayesian Estimation

The Basic Knowledge about Statistics

Confidence Analysis

Let T be the confidence level. Then (1-T) is the significance level, that is, the probability of rejecting the hypothesis H0 when H0 is actually true. The more the analyst trusts H0, the higher T can be set, which makes H0 harder to reject.

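The P-Value

If p > 1-T, do not reject H0 (strictly speaking H0 is not "accepted", there is just not enough evidence against it). If p <= 1-T, reject H0.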
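
The rule above can be sketched in a few lines. This example assumes a two-sided z-test, which the notes do not specify; the function names and the z values are made up for illustration.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value of a z statistic under a standard normal null."""
    # P(|Z| > |z|) for Z ~ N(0, 1) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def decide(p_value, confidence_level):
    """Compare p with the significance level 1 - T."""
    alpha = 1.0 - confidence_level
    return "reject H0" if p_value <= alpha else "fail to reject H0"


p_small = two_sided_p_value(2.5)  # a statistic far from zero
p_large = two_sided_p_value(1.0)  # a statistic close to zero
print(p_small, decide(p_small, 0.95))
print(p_large, decide(p_large, 0.95))
```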

The Design of Experiment

Compute the chi-square statistic, then compare it with the critical value in the chi-square table for the chosen significance level and the degrees of freedom.
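
A minimal sketch of that calculation, assuming the experiment is summarized as a contingency table of counts and the test is Pearson's chi-square test of independence. The counts are made up; 3.841 is the table value for 1 degree of freedom at the 0.05 significance level.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Made-up counts: rows are two groups, columns are success / failure.
observed = [[30, 10],
            [20, 40]]
stat = chi_square_statistic(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

CRITICAL_005_DF1 = 3.841  # chi-square table, 1 degree of freedom, alpha = 0.05
reject_independence = stat > CRITICAL_005_DF1
print(stat, dof, reject_independence)
```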

Power Analysis
Hypothesis Testing

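The Measures for Choosing Variables and Dimensionality Reduction

PCA

PCA (principal component analysis) reduces the dimension of the data by projecting it onto the few directions along which it varies the most.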
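
One common way to carry out PCA is through the eigenvectors of the covariance matrix. The sketch below assumes NumPy is available; the function name pca and the synthetic two-column data are made up for this example.

```python
import numpy as np


def pca(data, n_components):
    """Project data onto its top principal components."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back ascending.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    order = np.argsort(eigenvalues)[::-1][:n_components]
    components = eigenvectors[:, order]
    explained = eigenvalues[order] / eigenvalues.sum()
    return centered @ components, components, explained


# Synthetic data: the second column is roughly twice the first.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
data = np.column_stack([t, 2.0 * t + rng.normal(scale=0.1, size=200)])

projected, components, explained = pca(data, n_components=1)
print(projected.shape, explained)
```

Here two correlated columns collapse into a single column that keeps almost all of the variance.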

Regularized Regression

Sampling Method (The Steps and the Reason)

Random Sampling
Stratified Sampling

Divide the whole population into different layers (strata). Elements in different layers should be as different as possible, and elements in the same layer should be as similar as possible. Then draw a sample from every layer.
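
A minimal sketch of stratified sampling, assuming the same fraction is drawn from every layer (proportional allocation). The function name, the stratum_of callback, and the population are made up for this example.

```python
import random
from collections import defaultdict


def stratified_sample(items, stratum_of, fraction, seed=0):
    """Draw the same fraction from every stratum, without replacement."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)
    sample = []
    for members in strata.values():
        # Take at least one element so that no stratum is left out.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Made-up population: 80 "young" and 20 "senior" records.
population = [("young", i) for i in range(80)] + [("senior", i) for i in range(20)]
sample = stratified_sample(population, stratum_of=lambda p: p[0], fraction=0.1)
print(len(sample))
```

The sample keeps the 80/20 split of the population, which plain random sampling only achieves on average.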

MCMC

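MCMC (Markov chain Monte Carlo) draws samples that follow a target distribution by running a Markov chain. One such method, Metropolis-Hastings, proposes a move at each step and accepts or rejects it with an acceptance probability.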
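
A minimal sketch of the accept/reject idea, using the random-walk Metropolis algorithm (the special case of Metropolis-Hastings with a symmetric proposal). The target here is a standard normal chosen only for illustration; the function name, step size, and burn-in length are assumptions.

```python
import math
import random


def metropolis(log_density, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    rng = random.Random(seed)
    current = start
    current_logp = log_density(current)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_logp = log_density(proposal)
        # Acceptance probability: min(1, p(proposal) / p(current)).
        accept_prob = math.exp(min(0.0, proposal_logp - current_logp))
        if rng.random() < accept_prob:
            current, current_logp = proposal, proposal_logp
        # A rejected proposal repeats the current state in the chain.
        samples.append(current)
    return samples


def standard_normal_log_density(x):
    """Log density of N(0, 1) up to an additive constant."""
    return -0.5 * x * x


chain = metropolis(standard_normal_log_density, start=3.0, n_samples=20000)
kept = chain[1000:]  # drop the early part of the chain as burn-in
mean = sum(kept) / len(kept)
variance = sum((x - mean) ** 2 for x in kept) / len(kept)
print(mean, variance)
```

Only the ratio of densities is needed, so the target can be known up to a normalizing constant.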

Optimization

Machine Learning

Algorithms for Classifying

Decision Tree / Regression Tree
Random Forest
Logistic Regression

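To classify, pass a linear score z through the logistic function g(z) = 1/(1+e^{-z}), which maps z to a probability between 0 and 1. The class is then chosen by comparing that probability with a threshold such as 0.5.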
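
A minimal sketch of the logistic function and how it turns a score into a class. The weights, bias, and threshold are made-up values for illustration; in practice the weights are learned from data.

```python
import math


def logistic(z):
    """g(z) = 1 / (1 + e^{-z}), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def predict(features, weights, bias, threshold=0.5):
    """Return the predicted probability and the 0/1 class label."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    probability = logistic(z)
    return probability, int(probability >= threshold)


# Hypothetical weights and bias, giving the score z = 1.0.
probability, label = predict([2.0, 1.0], weights=[1.0, -1.0], bias=0.0)
print(probability, label)
```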

XGBoost

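A gradient boosting method: it builds many trees one after another, and each new tree is fitted to correct the errors of the trees before it. The leaves of the fitted trees can also be used as new features.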

Algorithms for Clustering

K-means
Hierarchical

Assessment Metrics

R squared
AUC
Sensitivity vs. Specificity

Cross Validation

Programming (The details will be in the next post)

SQL

See it on another blog.

Python Pandas and Numpy

See it on another blog.

Data Visualization

See it on another blog.

Modularize and Productionize

See it on another blog.

Some Others