Hypothesis Testing
What is hypothesis testing?
Hypothesis testing is a statistical method used to make statistical decisions from experimental data. A hypothesis is basically an assumption we make about a population parameter.
Example : you say the class average is 40, or that boys are taller than girls.
All such assumptions need some statistical way to be proven; we need a mathematical conclusion that whatever we are assuming is true.
Why do we use it?
A hypothesis test evaluates two
mutually exclusive statements about a population to determine which statement
is best supported by the sample data. When we say that a
finding is statistically significant, it’s thanks to a hypothesis test.
What are the basics of hypothesis testing?
The basics of hypothesis testing are the normal distribution and the standard normal distribution; all our hypotheses revolve around these two concepts. The z-score comes into the picture when we work with standardised normal data.
Normal Distribution :-
A variable is said to be normally distributed, or to have a normal distribution, if its distribution has the shape of a normal curve, a special bell-shaped curve in which the mean, median, and mode are equal.

Standardised Normal Distribution :-
A standard normal distribution is a normal distribution with mean 0 and standard deviation 1.
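To make the z-score idea concrete, here is a minimal Python sketch that standardises one observation against an assumed population mean and standard deviation (the numbers are made up for illustration):

```python
# z = (x - mu) / sigma : how many standard deviations x lies from the mean.
# mu, sigma and x below are illustrative values, not from the article.
mu = 40      # assumed population mean
sigma = 5    # assumed population standard deviation
x = 48       # a single observed value

z = (x - mu) / sigma
print(f"z-score = {z:.2f}")  # 1.60 -> x lies 1.6 standard deviations above the mean
```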

What are the important parameters of hypothesis testing?
Null hypothesis :- In inferential statistics, the null hypothesis is a general statement or default position that there is no relationship between two measured phenomena, or no association among groups. In other words, it is a basic assumption made based on domain or problem knowledge.
Example : a company's production is = 50 units per day.
Alternative hypothesis :- The alternative hypothesis is the hypothesis used in hypothesis testing that is contrary to the null hypothesis. It is usually taken to be that the observations are the result of a real effect (with some amount of chance variation superposed).
Example : a company's production is != 50 units per day.
Level of significance :- Refers to the degree of significance at which we accept or reject the null hypothesis. Since 100% accuracy is not possible when accepting or rejecting a hypothesis, we select a level of significance, usually 5%. It is normally denoted by alpha (α) and is generally 0.05 or 5%, which means you require 95% confidence that a similar result would be obtained in each sample.
Type I error :- Rejecting the null hypothesis even though it is true. The Type I error rate is denoted by alpha. In hypothesis testing, the critical region of the normal curve is called the alpha region.
Type II error :- Accepting (failing to reject) the null hypothesis even though it is false. The Type II error rate is denoted by beta. In hypothesis testing, the acceptance region of the normal curve is called the beta region.
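As a rough illustration of the Type I error rate, the sketch below (all numbers assumed, using NumPy and SciPy) repeatedly samples from a population for which the null hypothesis is true and counts how often a 1-sample t test wrongly rejects it; the fraction should land near alpha:

```python
import numpy as np
from scipy import stats

# When H0 is true, the proportion of samples we wrongly reject (Type I errors)
# should be close to the chosen alpha. All numbers here are illustrative.
rng = np.random.default_rng(0)
alpha = 0.05
trials = 10_000
rejections = 0

for _ in range(trials):
    sample = rng.normal(loc=50, scale=10, size=30)      # H0 is true: true mean is 50
    _, p_value = stats.ttest_1samp(sample, popmean=50)
    if p_value < alpha:
        rejections += 1                                  # a Type I error

print(f"Estimated Type I error rate: {rejections / trials:.3f}")  # roughly 0.05
```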
One-tailed test :- A test of a statistical hypothesis where the region of rejection is on only one side of the sampling distribution is called a one-tailed test.
Example :- a college has ≥ 4000 students, or data science is adopted by ≤ 80% of organisations.
Two-tailed test :- A two-tailed test is a statistical test in
which the critical area of a distribution is two-sided and tests whether
a sample is greater than or less than a certain range of values. If the sample
being tested falls into either of the critical areas, the alternative
hypothesis is accepted instead of the null hypothesis.
Example : a college has != 4000 students, or data science is adopted by != 80% of organisations.
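A small sketch (with an assumed test statistic) showing how the same z value gives different p-values under a one-tailed versus a two-tailed test:

```python
from scipy import stats

z = 1.8  # a hypothetical z statistic, just for illustration

p_one_tailed = 1 - stats.norm.cdf(z)              # area in the upper tail only
p_two_tailed = 2 * (1 - stats.norm.cdf(abs(z)))   # area in both tails

print(f"one-tailed p-value = {p_one_tailed:.4f}")  # ~0.036
print(f"two-tailed p-value = {p_two_tailed:.4f}")  # ~0.072
```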
P-value :- The P value, or calculated probability, is the probability of finding the observed, or more extreme, results when the null hypothesis (H0) of a study question is true; the definition of ‘extreme’ depends on how the hypothesis is being tested.
If your P value is less than the chosen
significance level then you reject the null hypothesis i.e. accept that your
sample gives reasonable evidence to support the alternative hypothesis. It does
NOT imply a “meaningful” or “important” difference; that is for you to decide
when considering the real-world relevance of your result.
Example : you have a coin and you don’t know whether it is fair or tricky, so let’s set up the null and alternative hypotheses:
H0 : the coin is a fair coin.
H1 : the coin is a tricky coin.
alpha = 5% or 0.05
Now let’s toss the coin and calculate the p-value (probability value).
Toss the coin a 1st time and the result is tails: p-value = 50% (as heads and tails have equal probability).
Toss the coin a 2nd time and the result is tails again: now p-value = 50/2 = 25%.
Similarly, if we toss 6 consecutive times and get tails every time, the p-value is about 1.5%. We set our significance level at 5% (95% confidence), and the p-value has fallen below that level, i.e. our null hypothesis does not hold good, so we reject it and propose that this coin is a tricky coin, which it actually is.
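The running p-value above is just P(all tails so far | fair coin) = 0.5^n. Here is a quick sketch that reproduces the 6-toss number and, for comparison, the two-sided binomial test SciPy provides (binomtest needs a reasonably recent SciPy); both lead to rejecting H0 at alpha = 0.05:

```python
from scipy import stats

n_tosses = 6
n_tails = 6
alpha = 0.05

# The article's running p-value: probability of 6 tails in a row if the coin is fair.
p_consecutive = 0.5 ** n_tosses                                # 0.015625, i.e. ~1.5%

# A more standard route: a two-sided binomial test of 6 tails out of 6 tosses.
p_binomial = stats.binomtest(n_tails, n_tosses, p=0.5).pvalue  # 0.03125

print(f"P(6 tails in a row | fair coin) = {p_consecutive:.4f}")
print(f"binomial test p-value           = {p_binomial:.4f}")
print("Reject H0: the coin looks tricky" if p_consecutive < alpha else "Fail to reject H0")
```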
Degrees of freedom :- Imagine you’re doing data analysis and you have a data set with 10 values. If you’re not estimating anything, each value can take on any number; each value is completely free to vary. But suppose you want to test the population mean with a sample of 10 values, using a 1-sample t test. You now have a constraint: the estimation of the mean. What is that constraint, exactly? By definition of the mean, the following relationship must hold: the sum of all values in the data must equal n x mean, where n is the number of values in the data set.
So if a data set has 10 values, the sum
of the 10 values must equal the mean x 10. If the mean of the
10 values is 3.5 (you could pick any number), this constraint requires that the
sum of the 10 values must equal 10 x 3.5 = 35.
With that constraint, the first value in the data set is free to vary. Whatever value it is, it’s still possible for the sum of all 10 numbers to be 35. The second value is also free to vary, because whatever value you choose, it still allows for the possibility that the sum of all the values is 35. In fact the first 9 values are all free to vary, but the 10th is not: it must be exactly the number that makes the sum come out to 35. Only n - 1 values are free, so
Degrees of freedom = (n - 1)
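Tying this back to the t test: a minimal sketch of a 1-sample t test on 10 made-up values (so df = 10 - 1 = 9), using SciPy’s ttest_1samp:

```python
import numpy as np
from scipy import stats

# Ten illustrative values and a hypothesised population mean of 3.5.
data = np.array([3.1, 3.8, 3.4, 3.9, 3.2, 3.7, 3.5, 3.6, 3.3, 3.5])
hypothesised_mean = 3.5

t_stat, p_value = stats.ttest_1samp(data, popmean=hypothesised_mean)
df = len(data) - 1   # degrees of freedom = n - 1 = 9

# Here the sample mean is exactly 3.5, so t is ~0 and p is ~1: no evidence against H0.
print(f"t = {t_stat:.3f}, df = {df}, p-value = {p_value:.3f}")
```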

