All you need to know about data science
Statistics and Math
Some Important Regression Models
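OLS (Ordinary Least Squares)
The traditional least squares model: it chooses the coefficients that minimize the sum of squared residuals.
WLS (Weighted Least Squares)
Each observation gets its own weight in the sum of squared residuals, so some points influence the fit more than others. A common choice is weights inversely proportional to each observation's error variance; in locally weighted schemes, data nearer to the point being predicted gets more weight.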
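A minimal sketch of OLS and WLS for a single feature, assuming the model y ≈ a + b*x. The function name `weighted_least_squares` and the example data are illustrative; the weights are simply an input, so any weighting scheme (inverse variance, nearness, and so on) can be plugged in. With all weights equal to 1 it reduces to OLS.

```python
def weighted_least_squares(x, y, w=None):
    """Fit y ≈ a + b*x by minimizing sum(w_i * (y_i - a - b*x_i)**2).

    With w=None all weights are 1, which is ordinary least squares (OLS).
    Returns (intercept a, slope b).
    """
    if w is None:
        w = [1.0] * len(x)
    total_w = sum(w)
    # Weighted means of x and y
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Weighted cross-deviation and squared deviation of x
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


x = [0.0, 1.0, 2.0]
y = [0.0, 1.0, 0.0]
print(weighted_least_squares(x, y))              # equal weights (OLS): flat line
print(weighted_least_squares(x, y, [1, 1, 10]))  # line tilts toward the heavily weighted last point
```

With equal weights the symmetric data give a slope of 0; putting weight 10 on the last point makes the slope negative, because the fit now cares mostly about passing near (2, 0).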
Regularized Regression
Fixed Effect
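The group-specific intercept b_i in y = kx + b_i may be correlated with x; each b_i is estimated as a separate fixed parameter, so that correlation does not bias the estimate of k.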
Random Effect
The group-specific intercept b_i in y = kx + b_i is treated as a random draw that is assumed to be uncorrelated with x.
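A minimal sketch of the fixed-effect (within) estimator, assuming panel data where each group i has its own intercept in y = kx + b_i. Subtracting each group's mean from x and y removes b_i, so k is estimated only from variation inside groups. The function name and the toy data are illustrative assumptions.

```python
from collections import defaultdict


def fixed_effects_slope(groups, x, y):
    """Within (fixed-effects) estimate of k in y = k*x + b_g, one intercept b_g per group.

    Returns (k, {group: b_g}).
    """
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    # Group means of x and y
    x_mean = {g: sum(x[i] for i in idx) / len(idx) for g, idx in members.items()}
    y_mean = {g: sum(y[i] for i in idx) / len(idx) for g, idx in members.items()}
    # OLS on the group-demeaned data (no intercept needed after demeaning)
    num = sum((x[i] - x_mean[g]) * (y[i] - y_mean[g]) for i, g in enumerate(groups))
    den = sum((x[i] - x_mean[g]) ** 2 for i, g in enumerate(groups))
    k = num / den
    # Recover each group's intercept from its means
    b = {g: y_mean[g] - k * x_mean[g] for g in members}
    return k, b


groups = ["A", "A", "A", "B", "B", "B"]
x = [1, 2, 3, 4, 5, 6]
y = [12, 14, 16, 3, 5, 7]  # group A: y = 2x + 10, group B: y = 2x - 5
k, b = fixed_effects_slope(groups, x, y)
print(k, b)
```

On this data the within estimator recovers k = 2 with intercepts 10 and -5, whereas a pooled OLS fit that ignores the groups gives a slope of about -1.86, because the group with larger x happens to have a lower intercept.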
Time Series
Bayes' Theorem and Bayesian Estimation
Basic Knowledge about Statistics
Confidence Analysis
Let T denote the confidence level (e.g. 0.95). Then 1 - T is the significance level: the probability of rejecting the hypothesis H0 when H0 is actually true. A tester who wants stronger protection against wrongly rejecting H0 can choose a higher T.
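The P-Value
The p-value is the probability, assuming H0 is true, of getting a result at least as extreme as the one observed. If p > 1 - T, we fail to reject H0 (the data are not strong evidence against it); if p ≤ 1 - T, we reject H0.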
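A minimal sketch of the decision rule, assuming a two-sided z-test on a sample mean with a known population standard deviation (the z-test is a chosen example, not the only test this rule applies to). The helper names and the numbers are hypothetical; T is the confidence level, so 1 - T is the significance level.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_test_p_value(sample_mean, mu0, sigma, n):
    """Two-sided p-value for H0: mean == mu0, assuming a known population sigma."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    return 2.0 * (1.0 - normal_cdf(abs(z)))


def decide(p, T=0.95):
    """Compare the p-value with the significance level 1 - T."""
    return "reject H0" if p <= 1.0 - T else "fail to reject H0"


p = z_test_p_value(sample_mean=10.4, mu0=10.0, sigma=1.0, n=25)
print(round(p, 4), decide(p, T=0.95), decide(p, T=0.99))
```

Here z = 2 and p is about 0.0455, so H0 is rejected at T = 0.95 but not at T = 0.99: raising T makes rejection harder.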
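The Design of Experiments
Calculate the chi-square statistic from the observed and expected counts, then compare it with the critical value in a chi-square table for the appropriate degrees of freedom and significance level; if the statistic exceeds the critical value, reject H0.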
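A minimal sketch of a Pearson chi-square test of independence on a contingency table (no continuity correction). The function name and the counts are illustrative; the critical value 3.841 is the standard chi-square table entry for 1 degree of freedom at significance 0.05.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / grand total
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Rows: control / treatment; columns: converted / not converted (illustrative counts)
table = [[10, 20], [30, 40]]
stat, dof = chi_square_independence(table)
CRITICAL_DF1_ALPHA_005 = 3.841  # chi-square table value for df = 1, significance 0.05
print(round(stat, 3), dof, "reject H0" if stat > CRITICAL_DF1_ALPHA_005 else "fail to reject H0")
```

The statistic here is 50/63, about 0.794, well below 3.841, so these counts give no evidence that the row and column variables are related.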
Power Analysis
Hypothesis Testing
Methods for Variable Selection and Dimensionality Reduction
PCA
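A method that reduces dimensionality by projecting the data onto a few directions (the principal components) that capture the most variance.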
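A minimal sketch of PCA via the singular value decomposition, assuming NumPy is available. The data are centered, the right singular vectors give the principal directions, and squared singular values give each component's share of the variance. The function name `pca` and the sample data are illustrative.

```python
import numpy as np


def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_ratio = S**2 / np.sum(S**2)  # share of total variance per component
    return X_centered @ components.T, components, explained_ratio[:n_components]


X = np.array([[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8]])
Z, components, ratio = pca(X, n_components=1)
print(Z.shape, ratio)
```

Because the points lie almost on a line, one component keeps nearly all the variance, and the 2-D data can be replaced by a single coordinate per point.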
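Regularized Regression (e.g. Lasso, whose L1 penalty can shrink some coefficients to exactly zero and so drops those variables)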
Sampling Methods (Steps and Rationale)
Random Sampling
Stratified Sampling
Divide the whole population into strata (layers) so that elements in the same stratum are as similar as possible and elements in different strata are as different as possible, then draw a random sample from each stratum.
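A minimal sketch of stratified sampling with proportional allocation: group the population by stratum, then take a simple random sample inside each stratum. The function name, the `stratum_of` key function, and the toy population are illustrative assumptions.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, n, seed=None):
    """Proportional stratified sampling: split into strata, then randomly sample each one."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)
    sample = []
    for members in strata.values():
        # Allocate sample size in proportion to the stratum's share of the population
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))  # simple random sampling within the stratum
    return sample


population = [("A", i) for i in range(60)] + [("B", i) for i in range(40)]
sample = stratified_sample(population, stratum_of=lambda item: item[0], n=10, seed=0)
print(sum(1 for s, _ in sample if s == "A"), sum(1 for s, _ in sample if s == "B"))
```

Each stratum is guaranteed its proportional share (6 from A, 4 from B here), which plain random sampling does not promise. With awkward proportions the rounded allocations may not add up exactly to n, so a real implementation would need a rule for the leftover units.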
MCMC
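Draws samples from a target distribution, often one known only up to a normalizing constant, by running a Markov chain whose stationary distribution is that target. Metropolis-Hastings, for example, accepts each proposed move with an acceptance probability and otherwise stays at the current point.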
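A minimal sketch of random-walk Metropolis (the symmetric-proposal special case of Metropolis-Hastings), assuming we can evaluate the log of an unnormalized target density. The function name, step size, burn-in length, and the standard-normal target are illustrative choices.

```python
import math
import random


def metropolis(log_target, x0, n_samples, step=1.0, burn_in=1000, seed=None):
    """Random-walk Metropolis sampler for a density known up to a constant."""
    rng = random.Random(seed)
    x, samples = x0, []
    for i in range(n_samples + burn_in):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        # Acceptance probability min(1, p(proposal) / p(x)), computed in log space
        log_ratio = log_target(proposal) - log_target(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        if i >= burn_in:  # discard early draws taken before the chain settles
            samples.append(x)
    return samples


# Unnormalized standard normal: p(x) proportional to exp(-x^2 / 2)
draws = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_samples=50_000, seed=42)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(round(mean, 2), round(var, 2))  # should be close to 0 and 1
```

The normalizing constant cancels in the ratio p(proposal) / p(x), which is why the method only needs the density up to a constant.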
Optimization
Machine Learning
Algorithms for Classification
Decision Tree / Regression Tree
Random Forest
Logistic Regression
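It feeds a linear score z = w^T x + b into the logistic function g(z) = 1/(1+e^{-z}), which maps z to a probability between 0 and 1; the predicted class is 1 when g(z) exceeds a threshold (usually 0.5).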
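A minimal sketch of one-feature logistic regression trained with batch gradient descent on the log-loss. The learning rate, epoch count, function names, and toy data are illustrative assumptions; libraries typically use more sophisticated optimizers.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, epochs=1000):
    """One-feature logistic regression trained by batch gradient descent on mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean log-loss: (g(w*x + b) - y) * x for w, (g(w*x + b) - y) for b
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
w, b = fit_logistic(xs, ys)
prob = sigmoid(w * 1.5 + b)  # estimated P(y = 1 | x = 1.5)
print(prob, "class 1" if prob >= 0.5 else "class 0")
```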
XGBoost
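A gradient-boosted tree ensemble: trees are added one at a time, each new tree fitted to the errors (the gradients of the loss) left by the trees before it, with regularization that penalizes tree complexity.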
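A minimal sketch of the boosting idea behind XGBoost, written as plain gradient boosting with one-split trees (stumps) and squared loss. This is not XGBoost itself: XGBoost adds refinements such as regularization of tree complexity that this sketch omits. All function names and the toy data are illustrative.

```python
def mean(values):
    return sum(values) / len(values)


def fit_stump(x, r):
    """Single-split regression stump minimizing squared error on residuals r.

    Returns (threshold, left_value, right_value).
    """
    best = (float("inf"), mean(r), mean(r))  # fallback: no split, predict the mean
    best_sse = sum((ri - mean(r)) ** 2 for ri in r)
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold between neighboring x values
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = mean(left), mean(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if sse < best_sse:
            best, best_sse = (t, lm, rm), sse
    return best


def stump_predict(stump, xi):
    t, left_value, right_value = stump
    return left_value if xi <= t else right_value


def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Squared-loss gradient boosting: each new stump is fitted to the current residuals."""
    base = mean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For the loss (1/2)(y - f)^2, the negative gradient is the residual y - f
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump_predict(stump, xi) for pi, xi in zip(pred, x)]
    return base, learning_rate, stumps


def boost_predict(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, xi) for s in stumps)


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 1, 1, 1, 5, 5, 5, 5]
model = gradient_boost(x, y)
print(boost_predict(model, 2), boost_predict(model, 7))  # approach 1 and 5 as rounds increase
```

The small learning rate means each tree corrects only part of the remaining error, which is the usual trade-off between more rounds and less overfitting.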
Algorithms for Clustering
K-means
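A minimal sketch of K-means using Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the centroids stop changing. Initializing from k random data points, the function name, and the toy points are illustrative assumptions (requires Python 3.8+ for `math.dist`).

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=None):
    """Lloyd's algorithm on a list of coordinate tuples. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for each point
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its assigned points
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return centroids, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2, seed=0)
print(centroids, labels)
```

K-means only finds a local optimum that depends on the initial centroids, so in practice it is often run several times with different seeds.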
Hierarchical
Assessment Metrics
R squared
AUC
Sensitivity vs. Specificity
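A minimal sketch of the three metrics above, using their standard definitions: R squared as 1 - SS_res / SS_tot, AUC as the probability that a randomly chosen positive gets a higher score than a randomly chosen negative (ties count one half), and sensitivity / specificity from confusion-matrix counts. Function names and the toy labels and scores are illustrative.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot: share of the variance in y_true explained by y_pred."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def auc(labels, scores):
    """ROC AUC: probability a random positive scores above a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, preds):
    """Sensitivity = TP / (TP + FN) (true positive rate); specificity = TN / (TN + FP)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # threshold 0.5
print(r_squared([3, 5, 7], [2.5, 5, 7.5]))
print(auc(labels, scores))
print(sensitivity_specificity(labels, preds))
```

Sensitivity and specificity depend on the chosen threshold: raising it can only keep or lower sensitivity and can only keep or raise specificity. AUC summarizes the ranking over all thresholds at once.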
Cross-Validation
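A minimal sketch of k-fold cross-validation: shuffle the indices, split them into k folds, train on k - 1 folds and score on the held-out fold, then average the k scores. The `fit` / `score` callables, the slope-through-origin model, and the noisy toy data are illustrative assumptions.

```python
import random


def k_fold_indices(n, k, seed=None):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(x, y, k, fit, score, seed=None):
    """Average held-out score over k folds; each fold serves as the validation set once."""
    folds = k_fold_indices(len(x), k, seed)
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(x)) if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        scores.append(score(model, [x[i] for i in fold], [y[i] for i in fold]))
    return sum(scores) / len(scores)


def fit_slope(xs, ys):
    # Least squares slope through the origin (illustrative model)
    return sum(a * b for a, b in zip(xs, ys)) / sum(a * a for a in xs)


def mse(slope, xs, ys):
    return sum((slope * a - b) ** 2 for a, b in zip(xs, ys)) / len(xs)


xs = [float(i) for i in range(1, 21)]
ys = [2.0 * v + (1.0 if i % 2 else -1.0) for i, v in enumerate(xs)]  # line plus +/-1 noise
print(cross_validate(xs, ys, k=5, fit=fit_slope, score=mse, seed=0))
```

Because every point is scored only by a model that never saw it, the averaged error estimates performance on new data rather than on the training set.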
Programming (details will be covered in the next post)
SQL
See another blog post.
Python: pandas and NumPy
See another blog post.
Data Visualization
See another blog post.
Modularize and Productionize
See another blog post.
