Inferential Statistics and Hypothesis Testing
Categories: IBM Machine Learning
Updated:
Estimation and Inference
Estimation is the application of an algorithm,
to estimate parameter, e.g. mean, variance, etc.
Inference involves putting an accuracy on the
estimated value ? Statistical significancy
Machine Learning and Statistical inference are similar. ML uses
data to learn/infer qualities of a distirbution that generated
the data, which is data-generating process.\
Codes
sns.barplot(x="variable", y="value", data=df)
sns.barplot(y, x=pd.cut(df.variable, bins=#), data=df)
pairplot = data[['x', 'y', 'z']]
sns.pairplot(pairplot, hue = "variable")
sns.jointplot(x="x", y="y", data=df, kind='hex') # hexbin plot
Parametric vs Non-parametric
Non-parametric is creating a distribution(CDF) of the data using a histogram.
Parametric:
Parametric model is a prticular type of statistical model. e.g.)
Nomal distribution. Customer
lifetime value (CLV) is a parametric model.\
Maximum Likelihood Estimation (MLE)
likelihood function is related to probability and is a function of the parameters of the model
Frequentist vs Bayesian
Frequentist
frequentist is concerened with repeated observations in the limit. Processes may have true frequencies, but we focus on repetition of experiment.
- Derive the probabilistic property of a procedure
-
Apply the probability directly to the observed data
Bayesian
Bayesian describes parameters by orobability distributions. Prior distribution is formulated, this prior is updated after seeing data into posterior distbution.
Hypothesis testing
Hypothesis is a statement about a population parameter
- null hypothesis: and alternative hypothesis:
- p-value: In Bayesian inference, we don’t get decision boundary.
Bayesian interpretation
Given Priors
Then by Bayes’ Rule, likelihood ratio is defined as below.
Likelihood ratio tells how we should update the priors in reation to seeing a given set of data.
Types of Error
Neyman-Pearson paradigm (1993)
non-bayesian inference
Accept | Reject | |
---|---|---|
Correct | Type 1 Error | |
Type 2 Error | Correct |
Power of a test: 1 - P(Type 1 Error)
Terminology
test statistics, rejeciton region, acceptance region, null distribution
Leave a comment