Inferential Statistics and Hypothesis Testing
Categories: IBM Machine Learning
Updated:
Estimation and Inference
Estimation is the application of an algorithm
to estimate a parameter, e.g. the mean or the variance.
Inference involves putting an accuracy on the
estimated value, i.e. assessing its statistical significance.
Machine learning and statistical inference are similar: ML uses
data to learn/infer qualities of the distribution that generated
the data, which is called the data-generating process.
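As a minimal sketch of the distinction, the snippet below first computes point estimates (estimation) and then attaches a standard error and an approximate 95% interval to the estimated mean (inference). The sample values are made up, and the normal approximation for the interval is an assumption chosen for simplicity.

```python
import math
import statistics

# Hypothetical sample; the values are made up for illustration.
sample = [4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 4.7, 5.4]

# Estimation: point estimates of the mean and the variance.
mean_hat = statistics.mean(sample)
var_hat = statistics.variance(sample)  # unbiased sample variance (divides by n - 1)

# Inference: quantify how accurate the estimated mean is.
std_err = math.sqrt(var_hat / len(sample))
z = statistics.NormalDist().inv_cdf(0.975)  # normal approximation (assumption)
ci = (mean_hat - z * std_err, mean_hat + z * std_err)

print(f"mean = {mean_hat:.3f}, standard error = {std_err:.3f}")
print(f"approximate 95% interval: ({ci[0]:.3f}, {ci[1]:.3f})")
```

With a sample this small, a t-based interval would usually be preferred; the normal quantile is used here only to keep the sketch dependency-free.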
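Codes
import pandas as pd
import seaborn as sns
# df is assumed to be an existing DataFrame; n_bins is a placeholder for the number of bins
sns.barplot(x="variable", y="value", data=df)  # mean of "value" per category
sns.barplot(x=pd.cut(df["variable"], bins=n_bins), y="value", data=df)  # bin a numeric column first
pairplot = df[["x", "y", "z", "variable"]]  # the hue column must be part of the frame
sns.pairplot(pairplot, hue="variable")
sns.jointplot(x="x", y="y", data=df, kind="hex")  # hexbin plot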
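Parametric vs Non-parametric
Non-parametric inference does not assume that the data come from a particular family of distributions. An example is building the distribution (CDF) of the data directly from a histogram.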
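A rough sketch of the non-parametric idea: the snippet below builds an empirical CDF straight from the observations, with no distributional assumption. It uses the empirical CDF rather than a binned histogram, which is a substitution made here for brevity; the function name `empirical_cdf` and the data values are hypothetical.

```python
import bisect

def empirical_cdf(data):
    """Return a function F where F(x) is the fraction of observations <= x."""
    ordered = sorted(data)
    n = len(ordered)

    def cdf(x):
        # bisect_right returns how many sorted values are <= x
        return bisect.bisect_right(ordered, x) / n

    return cdf

# Hypothetical observations, made up for illustration.
data = [2.1, 3.4, 1.9, 5.0, 3.3, 4.2, 2.8, 3.9]
F = empirical_cdf(data)
print(F(3.0))  # fraction of observations at or below 3.0
```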
Parametric:
A parametric model is a particular type of statistical model that is described by a finite set of parameters, e.g. the
normal distribution (mean and variance). Customer
lifetime value (CLV) estimation is an example of parametric modeling.
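Maximum Likelihood Estimation (MLE)
The likelihood function is related to probability: it is the probability (or density) of the observed data, viewed as a function of the parameters of the model. MLE chooses the parameter values that maximize this function.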
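To make this concrete, here is a small sketch assuming a normal model, for which the maximizing parameters have a closed form. The data values and the helper `normal_log_likelihood` are illustrative, not part of the original notes.

```python
import math

def normal_log_likelihood(data, mu, sigma):
    """Log-likelihood of i.i.d. normal data as a function of (mu, sigma)."""
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return (-n * math.log(sigma)
            - 0.5 * n * math.log(2 * math.pi)
            - squared_error / (2 * sigma ** 2))

# Hypothetical sample; the values are made up for illustration.
data = [4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 4.7, 5.4]

# Closed-form maximum likelihood estimates for the normal model.
mu_mle = sum(data) / len(data)
sigma_mle = math.sqrt(sum((x - mu_mle) ** 2 for x in data) / len(data))  # divides by n, not n - 1

best = normal_log_likelihood(data, mu_mle, sigma_mle)
shifted = normal_log_likelihood(data, mu_mle + 0.1, sigma_mle)
print(f"log-likelihood at the MLE: {best:.4f}, at a shifted mean: {shifted:.4f}")
```

Any other choice of `mu` or `sigma` gives a lower log-likelihood than `best`, which is what "maximum likelihood" means for this model.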
Frequentist vs Bayesian
Frequentist
The frequentist approach is concerned with repeated observations in the limit. Processes may have true frequencies, but the focus is on repetition of the experiment.
- Derive the probabilistic properties of a procedure
- Apply the probability directly to the observed data
Bayesian
The Bayesian approach describes parameters by probability distributions. A prior distribution is formulated, and this prior is updated into a posterior distribution after seeing the data.
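One simple way to see a prior turn into a posterior is the conjugate Beta-Binomial case, sketched below. The choice of a Beta(2, 2) prior, the counts, and the helper name `update_beta` are all assumptions made for illustration.

```python
def update_beta(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior plus binomial data gives a Beta posterior."""
    return alpha + successes, beta + failures

# Assumed prior on a success probability: Beta(2, 2), centered at 0.5.
prior = (2, 2)

# Hypothetical data: 7 successes and 3 failures.
posterior = update_beta(*prior, successes=7, failures=3)

prior_mean = prior[0] / sum(prior)
post_mean = posterior[0] / sum(posterior)
print(f"prior mean = {prior_mean:.3f}, posterior mean = {post_mean:.3f}")
```

The posterior mean sits between the prior mean and the observed success rate, and moves closer to the data as more observations are added.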
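Hypothesis testing
A hypothesis is a statement about a population parameter.
- null hypothesis: $H_0$ and alternative hypothesis: $H_1$
- p-value: the probability, under the null hypothesis, of a result at least as extreme as the one observed. In Bayesian inference, we don’t get a decision boundary.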
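As an illustration of a p-value, the sketch below computes an exact one-sided binomial p-value for a coin-flip experiment. The scenario (16 heads in 20 flips, null hypothesis of a fair coin) and the function name `binomial_p_value` are hypothetical.

```python
from math import comb

def binomial_p_value(k, n, p0=0.5):
    """One-sided p-value: P(X >= k) for X ~ Binomial(n, p0) under the null hypothesis."""
    return sum(comb(n, i) * p0 ** i * (1 - p0) ** (n - i) for i in range(k, n + 1))

# Hypothetical experiment: 16 heads in 20 flips, null hypothesis p = 0.5.
p_value = binomial_p_value(k=16, n=20)
print(f"p-value = {p_value:.4f}")
```

A p-value this small (about 0.0059) would lead to rejecting the null hypothesis at the conventional 0.05 level.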
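Bayesian interpretation
Given priors $P(H_0)$ and $P(H_1)$ and observed data $x$,
Bayes’ Rule gives the posterior odds as the prior odds times the likelihood ratio:
$$\frac{P(H_0 \mid x)}{P(H_1 \mid x)} = \frac{P(H_0)}{P(H_1)} \cdot \frac{P(x \mid H_0)}{P(x \mid H_1)}$$
The likelihood ratio $P(x \mid H_0) / P(x \mid H_1)$ tells how we should update the priors in relation to seeing a given set of data.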
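A small numerical sketch of this update, assuming two simple hypotheses about a coin (H0: p = 0.5, H1: p = 0.7), equal priors, and 7 heads in 10 flips. All of these numbers are made up; the relation used is posterior odds = prior odds × likelihood ratio.

```python
from math import comb

def binomial_likelihood(k, n, p):
    """Probability of k successes in n trials when the success probability is p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Assumed priors on the two hypotheses.
prior_h0, prior_h1 = 0.5, 0.5

# Hypothetical data: 7 heads in 10 flips.
k, n = 7, 10

# Likelihood ratio of H0 (p = 0.5) to H1 (p = 0.7).
likelihood_ratio = binomial_likelihood(k, n, 0.5) / binomial_likelihood(k, n, 0.7)

prior_odds = prior_h0 / prior_h1
posterior_odds = prior_odds * likelihood_ratio
posterior_h0 = posterior_odds / (1 + posterior_odds)

print(f"likelihood ratio = {likelihood_ratio:.3f}, P(H0 | data) = {posterior_h0:.3f}")
```

A likelihood ratio below 1 means the data favor H1, so the posterior probability of H0 drops below its prior value.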
Types of Error
Neyman-Pearson paradigm (1933)
non-Bayesian inference
| | Accept null | Reject null |
|---|---|---|
| Null hypothesis true | Correct | Type 1 Error |
| Alternative hypothesis true | Type 2 Error | Correct |
Power of a test: 1 - P(Type 2 Error)
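As a sketch of how power relates to the two error types, the snippet below computes the power of a one-sided z-test analytically, where power is the probability of rejecting the null hypothesis when the alternative is true, i.e. 1 - P(Type 2 Error). The test setup (known standard deviation, null mean of 0) and the function name `z_test_power` are assumptions for illustration.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()

def z_test_power(mu1, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mu = 0 against H1: mu = mu1 > 0."""
    z_crit = std_normal.inv_cdf(1 - alpha)   # reject H0 when Z > z_crit
    shift = mu1 * math.sqrt(n) / sigma       # mean of Z when H1 is true
    type2 = std_normal.cdf(z_crit - shift)   # P(fail to reject | H1 true)
    return 1 - type2

# Hypothetical design: true mean 0.5, known sigma 1.0, 25 observations.
power = z_test_power(mu1=0.5, sigma=1.0, n=25)
print(f"power = {power:.3f}")
```

Setting `mu1=0.0` returns `alpha`, the Type 1 error rate, since in that case the null hypothesis is true and every rejection is a false one.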
Terminology
test statistic, rejection region, acceptance region, null distribution
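These terms can be tied together in a short sketch of a two-sided z-test. The sample, the null mean of 5.0, and the known standard deviation of 0.25 are all assumed values chosen for illustration.

```python
import math
from statistics import NormalDist, mean

# Null distribution: the distribution of the test statistic when H0 is true.
null_dist = NormalDist(0, 1)

def z_statistic(sample, mu0, sigma):
    """Test statistic: standardized distance of the sample mean from mu0."""
    return (mean(sample) - mu0) / (sigma / math.sqrt(len(sample)))

alpha = 0.05
z_crit = null_dist.inv_cdf(1 - alpha / 2)
# Rejection region: |z| > z_crit. Acceptance region: |z| <= z_crit.

# Hypothetical sample; the values are made up for illustration.
sample = [4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 4.7, 5.4]
z = z_statistic(sample, mu0=5.0, sigma=0.25)
in_rejection_region = abs(z) > z_crit

print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {in_rejection_region}")
```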