# Hypothesis Testing with Controlled Experiments: Computing P-value using Z-statistic

Google the word “experiment” and the answer returned is “a scientific procedure undertaken to make a discovery, test a hypothesis, or demonstrate a known fact”.

While “experiment” is a broad term, a controlled experiment is specifically about testing the impact of a single factor (variable) while all other variables remain constant.

Confused? Don’t worry, read ahead. I will attempt to explain this using a scenario.

Imagine Saidulu is an ice-cream manufacturer who wants to increase the sales of his product “Kya toh bhi Ice-cream”. His friend, Panthulu, suggests that he double the sugar content in the ice-cream in order to achieve higher sales.

Will Saidulu go ahead and increase the sugar in his ice-cream right away? Without evidence, that would be a bad business move. Fortunately Saidulu is smart: he conducts a controlled experiment. Voila!

How does he do that?

First, Saidulu tries to understand his customer segments. He narrows down the attributes of his major consumer segment on parameters like age, gender, income, geography etc.

Now he conducts his controlled experiment on 200 people who fit his customer parameters. He divides them into two equal, uniform groups of 100 people each: control and experimental. The “Control Group” gets ice-cream made with the old recipe, while the “Experimental Group” gets ice-cream made with the exact same recipe except for one difference. Their ice-cream has double the amount of sugar!

Both the control group and the experimental group taste the ice-cream and provide their ratings on a scale of 1 to 5. This concludes the experiment itself.

Saidulu took utmost care to make sure the right people were selected for the experiment and that they were divided to form homogeneous groups (control and experimental). This means that the data from the experiment is more likely to yield reliable results. All Saidulu needs to do now is compare the collected data and see if the increase in sugar has any impact on customer satisfaction.

The data from the experiment shows the following:

| Group | Average Rating | Standard Deviation |
|---|---|---|
| Control Group – Ice cream with same amount of sugar | 3.5 | 0.2 |
| Experimental Group – Ice cream with double the sugar | 3.55 | 0.199 |

Now, what do you see? There is clearly a jump in the average rating from 3.5 to 3.55. Does that mean Saidulu can safely change his recipe to the one served to the Experimental Group? Not just yet! What if the increase in average rating is pure coincidence? There is no way to know for sure, but there is a way to estimate how likely such a coincidence would be, and it is called “Hypothesis Testing”.

What exactly is Hypothesis Testing?

It is a statistical method of determining whether the data supports a given statement or not. The statement itself is called a “Hypothesis”.

How do I perform Hypothesis Testing?

You start performing “Hypothesis Testing” by designing two statements or Hypotheses – A “Null Hypothesis” and an “Alternate Hypothesis”

How to design a Null Hypothesis?

A null hypothesis states that the change in the variable did not impact the system at all. It is represented by the symbol “H0” (read “H-naught”).

In case of Saidulu the null hypothesis will be:

H0: Change in Sugar content did not impact the average rating of the population.

In other words, the additional sugar in the ice-cream does not change people’s perception of the ice-cream. It was just a matter of chance that the average rating of the Experimental Group was higher than that of the Control Group.

How to design an Alternate Hypothesis?

As you might have guessed already, an alternate hypothesis states that the change in variable had an impact on the system. It is represented by the symbol “H1” (H one)

H1: Change in Sugar content impacted the average rating of the population.

Now that we have defined our “Null Hypothesis” and “Alternate Hypothesis”, it is time to perform our hypothesis testing. It’s simple: we assume the Null Hypothesis is true and calculate the probability of seeing a result at least as extreme as the one we observed. This probability is what we call the “P-value”.

If the P-value is low, the observed data would be unlikely under the null hypothesis, so we reject H0 in favor of the alternate hypothesis (H1). This means that the change in the variable had an impact on the system. But if the P-value is high, we fail to reject the null hypothesis (H0): the data gives no convincing evidence that the change in the variable had an impact. Whether the P-value is high or low is determined by comparing it against a predetermined significance level α (alpha).

If p-value > α then the result is not statistically significant and we fail to reject H0.

If p-value < α then the result is statistically significant and we reject H0 in favor of H1.

Saidulu says his threshold is 5%, which means:

If p-value > 0.05 then the result is not statistically significant and he cannot reject his null hypothesis, which says “increase in sugar had no impact on the rating”. The increase in the average rating of the experimental group could plausibly be just chance. It could have happened even if the same old ice-cream had been served to them.

If p-value < 0.05 then the result is statistically significant and he rejects H0 in favor of H1. The higher average rating seen in the experimental group is unlikely to be due to chance. It is attributed to the increase in sugar content in their ice-cream.
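The decision rule can be sketched in a few lines of Python. This is only an illustration: the function name `decide` and the sample p-values passed to it are made up for the example, and the default threshold of 0.05 is Saidulu's chosen α.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value against the significance level alpha.

    `decide` is a hypothetical helper name used only for illustration.
    """
    if p_value < alpha:
        # Result is statistically significant: reject H0 in favor of H1.
        return "reject H0"
    # Result is not statistically significant: H0 cannot be rejected.
    return "fail to reject H0"


# Made-up p-values, just to exercise both branches.
print(decide(0.03))
print(decide(0.20))
```

Note that “fail to reject H0” is weaker than “H0 is true”: it only says the experiment did not produce enough evidence against it.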

How do I compute the P-Value or P-statistic?

From here on, our discussion is going to get a little more technical in nature. As you know, the P-value is the probability of us seeing a rating increase at least this large when the increased sugar content has no impact on user satisfaction. This means we have to assume that the population parameters (mean and standard deviation) remained constant. Our P-value is represented by the shaded region in the picture below.

P-value = (Area of shaded region) / (Area of region under curve)

What you see above is the distribution of means from n random samples. We computed the parameters for this distribution using the mean and standard deviation of the control group of the experiment. Below is how the parameters were computed.

1. Mean of “distribution of sample means”
• The control group mean is 3.5. It is reasonable to assume that 3.5 is an approximation of the population mean, given that the control group size was 100 people.
• The mean of the distribution of sample means is equal to the population mean.
• Therefore, the mean of the distribution of sample means is 3.5
2. Standard Deviation of “distribution of sample means”
• The standard deviation of the control group is 0.2. It is reasonable to assume that 0.2 is an approximation of the population standard deviation, given that the control group size was 100 people.
• According to the Central Limit Theorem, the distribution of sample means is approximately normal, with Standard Deviation of sample-mean distribution = (Population Standard Deviation)/Square-Root(Sample size)
• Therefore, the standard deviation of the distribution of sample means is 0.2/Square-Root(100) = 0.02
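The two parameters above can be reproduced with a short Python sketch. The variable names are illustrative; the numbers are the control group figures from the table, used here as stand-ins for the population values, which is the same assumption the text makes.

```python
import math

# Control group figures from the experiment, treated as estimates
# of the population mean and standard deviation (an assumption).
control_mean = 3.5
control_sd = 0.2
sample_size = 100

# Mean of the distribution of sample means equals the population mean.
sampling_mean = control_mean

# Standard deviation of the distribution of sample means
# (often called the standard error): sd / sqrt(n).
standard_error = control_sd / math.sqrt(sample_size)

print(sampling_mean, round(standard_error, 4))  # 3.5 0.02
```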

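To determine the P-value we need to compute the distance of the “Experimental Group Mean Rating” from the mean of the sample-mean distribution. The distance is usually measured in terms of number of standard deviations and it is called the Z-distance or Z-score:

Z-score = (Experimental Group Mean – Mean of sample-mean distribution) / (Standard Deviation of sample-mean distribution) = (3.55 – 3.5) / 0.02 = 2.5

We now have the Z-score. To determine the probability or P-value for a given Z-score, we need to look up the Z table (at the bottom of the page) for the value of 2.5. If you notice, the value for the row saying “2.5” is 0.9938. The value in the Z table gives the probability of the green region in the diagram below, which is the area to the left of the Z-score. But since we are asking how likely a rating this high or higher would be, we need the value for the red region in the diagram below, which is the area to the right of the Z-score. So we subtract 0.9938 from 1.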

Therefore,

P-value = 1-0.9938 = 0.0062 = 0.62%

We finally have the P-value. You could also have computed the same value using the formula “=1-NORMSDIST(2.5)” in Excel.
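The same lookup can be done without a Z table or a spreadsheet. Below is a minimal Python sketch using `statistics.NormalDist` from the standard library (Python 3.8 or later). The function name `one_tailed_p_value` is made up for this example, and it follows the article's approach of using only the control group's standard deviation.

```python
from statistics import NormalDist


def one_tailed_p_value(sample_mean, null_mean, standard_error):
    """Return (z, p) for an upper-tail Z-test.

    Hypothetical helper: z is the distance of the sample mean from the
    null mean in standard errors, p is the area to the right of z
    under the standard normal curve.
    """
    z = (sample_mean - null_mean) / standard_error
    p = 1.0 - NormalDist().cdf(z)  # same idea as 1 - NORMSDIST(z)
    return z, p


z, p = one_tailed_p_value(sample_mean=3.55, null_mean=3.5, standard_error=0.02)
print(f"z = {z:.2f}, p-value = {p:.4f}")  # z = 2.50, p-value = 0.0062
```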

The p-value is less than our predetermined threshold α of 5%. That means the result is statistically significant and therefore we reject H0 in favor of H1.

Our alternate hypothesis, “Change in Sugar content impacted the average rating of the population”, is supported by the data. This means that if Saidulu increases the sugar content in his ice-creams, he will likely end up selling more, assuming people eat more of the ice-cream that they rated higher. Despite knowing that higher sugar content is not healthy for consumers, Saidulu tells himself “whatever sells bro!”. He increases the sugar content, sells more ice-cream and makes more money.
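As a rough sanity check on that number, one can simulate the “pure coincidence” scenario directly. The sketch below is not part of Saidulu's experiment: it assumes individual ratings are normally distributed with mean 3.5 and standard deviation 0.2 (a simplification, since real ratings sit on a 1 to 5 scale), repeatedly draws groups of 100 raters who all got the old recipe, and counts how often their average reaches 3.55 or more. The function name, trial count and seed are arbitrary choices.

```python
import random


def simulate_p_value(null_mean=3.5, sd=0.2, n=100, observed_mean=3.55,
                     trials=10_000, seed=0):
    """Estimate the upper-tail p-value by simulation under H0.

    Assumes normally distributed ratings (an illustrative simplification).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Average rating of one simulated group eating the old recipe.
        sample_mean = sum(rng.gauss(null_mean, sd) for _ in range(n)) / n
        if sample_mean >= observed_mean:
            hits += 1
    return hits / trials


estimate = simulate_p_value()
print(f"simulated p-value: {estimate:.4f}")
```

The estimate should land in the neighborhood of the 0.62% obtained from the Z table, with some run-to-run noise that shrinks as `trials` grows.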