Random Variables are a means of assigning numbers to outcomes of random processes or experiments or activities.

An example of a random variable is the “outcome of rolling a die”. Let us denote this random variable by the symbol “O_{dice-roll}”. It can take any integer value from 1 to 6.

O_{dice-roll} is a discrete random variable, which means it can take only specific values, here one of {1,2,3,4,5,6}. It cannot take non-integer values or values outside the 1 to 6 range.

The counterpart of a “discrete random variable” is a “continuous random variable”. A continuous random variable can take any real value (both integer and non-integer) in a given range. An example of a continuous random variable is the “height of a person”. Let us represent it by the symbol “H_{person}”. The variable “H_{person}” can take any value between 0 feet and 10 feet.

A function that mathematically describes the outcomes of a random variable is called the “Distribution” or “Distribution Function” of that random variable.

For the dice roll above, the distribution function will be

O_{dice-roll} = X, where X belongs to {1,2,3,4,5,6}

For a fair dice, each of these six values is equally likely.

A distribution curve is a plot of the outcomes of the distribution function. It shows the number of occurrences of each value.

Our distribution function for “outcomes from rolling dice” can be easily plotted on a 2-dimensional surface. Assuming that you rolled the dice a hundred times, you will see the distribution below:

As you can note, we don’t see a line in the graph but only points. This is because our function is a discrete one which can only take point values from 1 to 6.
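The hundred rolls described above can be simulated with a short sketch. This is only an illustration: the function name `roll_counts` and the seed value are assumptions made here, and the exact counts will differ from the original plot because the rolls are random.

```python
import random
from collections import Counter

def roll_counts(n_rolls, seed=None):
    """Roll a fair six-sided dice n_rolls times and count each face."""
    rng = random.Random(seed)  # a seed makes the run repeatable
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    tally = Counter(rolls)
    # Report every face from 1 to 6, even one that never came up.
    return {face: tally.get(face, 0) for face in range(1, 7)}

counts = roll_counts(100, seed=42)  # seed chosen arbitrarily
for face, count in counts.items():
    # One '#' per occurrence gives a rough text version of the plot.
    print(face, "#" * count)
```

Each row is a separate point value, which is why the plot of a discrete variable shows points rather than a continuous line.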

If I were to plot a distribution curve for a continuous random variable like “H_{person}” or “Height of a person”, you would likely see a curve like the one below. As you can see, the curve is continuous, implying that the height of a person can take any real value within a reasonable range.

Hope this helps your understanding. Thank you for reading, please feel free to share your views and feedback through comments.

In the upcoming articles I will be explaining the concepts Mean, Standard-Deviation and Normal distribution.

]]>While “Experiment” is a broader term, a controlled experiment is specifically about testing the impact of a single factor or variable while all other variables are held constant.

Confused? Don’t worry, read ahead. I will attempt to explain this using a scenario.

Imagine Saidulu is an ice-cream manufacturer who wants to increase the sales of his product “Kya toh bhi Ice-cream”. His friend, Panthulu, suggests that he double the sugar content of the ice-cream in order to achieve higher sales.

Will Saidulu simply go ahead and increase the sugar in his ice-cream? Doing so without evidence would be a bad business move. Fortunately Saidulu is smart: he conducts a controlled experiment. Voila!

How does he do that?

First, Saidulu tries to understand his customer segments. He narrows down the attributes of his major consumer segment using parameters like age, gender, income and geography.

Now he conducts his controlled experiment on 200 people who fit his customer parameters. He divides his customers into two uniform groups – control and experimental. “Control group” gets ice-cream with old recipe while “Experimental Group” gets ice-cream with exact same recipe except for one difference. Their ice-cream has double the amount of sugar!
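The article does not say how Saidulu split the 200 people. One common way to get two comparable groups is random assignment, sketched below under that assumption. The participant IDs and the function name `split_into_groups` are hypothetical.

```python
import random

def split_into_groups(participants, seed=None):
    """Shuffle the participants and cut the list into two equal halves."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the input is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

participant_ids = list(range(1, 201))  # hypothetical IDs for 200 people
control_group, experimental_group = split_into_groups(participant_ids, seed=7)
print(len(control_group), len(experimental_group))
```

Random assignment means that neither group is systematically different from the other before the ice-cream is served.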

Both the control group and the experimental group taste the ice-cream and provide their ratings on a scale of 1 to 5. This concludes the experiment itself.

Saidulu took utmost care to make sure the right people were selected for the experiment and that they were divided into homogeneous groups (control and experimental). This means that the data from the experiment is more likely to yield reliable results. All Saidulu needs to do now is compare the collected data and see if the increase in sugar has any impact on customer satisfaction.

The data from the experiment shows the following:

| Group | Average Rating | Standard Deviation |
| --- | --- | --- |
| Control Group: ice-cream with the same amount of sugar | 3.5 | 0.2 |
| Experimental Group: ice-cream with double the sugar | 3.55 | 0.199 |

Now, what do you see? The average rating rises from 3.5 to 3.55. Does that mean Saidulu can safely change his recipe to the one served to the Experimental Group? Not just yet! What if the increase in average rating is pure coincidence? There is no way to know for sure, but there is a way to quantify how likely such a coincidence is, and it is called “Hypothesis Testing”.

What exactly is Hypothesis Testing?

It is a statistical method of deciding whether the data support a given statement. The statement itself is called a “Hypothesis”.

How do I perform Hypothesis Testing?

You start performing “Hypothesis Testing” by framing two statements or Hypotheses: a “Null Hypothesis” and an “Alternate Hypothesis”.

How to design a Null Hypothesis?

A null hypothesis states that the change in the variable did not impact the system at all. It is represented by the symbol “H_{0}” (H naught).

In case of Saidulu the null hypothesis will be:

*H _{0}: Change in Sugar content did not impact the average rating of the population.*

In other words, the additional sugar in the ice-cream does not change people’s perception of the ice-cream. It was just a matter of chance that the average rating of the experimental group was higher than that of the control group.

How to design an Alternate Hypothesis?

As you might have guessed already, an alternate hypothesis states that the change in the variable had an impact on the system. It is represented by the symbol “H_{1}” (H one).

Saidulu’s alternate Hypothesis will read:

*H _{1}: Change in Sugar content impacted the average rating of the population.*

Now that we have defined our “Null Hypothesis” and “Alternate Hypothesis”, it is time to perform our hypothesis test. We assume the Null Hypothesis is true and calculate the probability of seeing a result at least as extreme as the one we observed. This probability is what we call the “P-value”.

If the “P-value” is low, the observed result would be very unlikely under the null hypothesis, so we reject the null hypothesis (H_{0}) in favour of the alternate hypothesis (H_{1}). This means the change in the variable had an impact on the system. If the “P-value” is high, we fail to reject the null hypothesis (H_{0}): the data do not show that the change in the variable had an impact. Whether the p-value is high or low is determined by comparing it against a predetermined level α (alpha).

If p-value > α, the result is not statistically significant and we fail to reject H_{0}.

If p-value < α, the result is statistically significant: we reject H_{0} and accept H_{1}.

Saidulu says his threshold is 5%, which means:

If p-value > 0.05, the result is not statistically significant and his null hypothesis, which says “increase in sugar had no impact on the rating”, stands. The increase in the average rating of the experimental group could well have been by chance. It could have happened even if the same old ice-cream was served to them.

If p-value < 0.05, the result is statistically significant: H_{0} is rejected and H_{1} is accepted. The higher average rating seen in the experimental group is unlikely to be by chance and is attributed to the increase in sugar content in their ice-cream.

How do I compute the P-Value or P-statistic?

From here on, our discussion is going to get a little more technical. As discussed, the P-value is the probability of us seeing a rating this high even though the increased sugar content had no impact on user satisfaction. This means we have to assume that the population parameters (mean and standard deviation) remained constant. Our P-value is represented by the shaded region in the picture below.

P-value = (Area of shaded region)/ (Area of region under curve)

What you see above is the distribution of means from n random samples. We computed the parameters of this distribution using the mean and standard deviation of the control group of the experiment. Below is how the parameters were computed.

__Mean of “distribution of sample means”__

- Control group mean is 3.5. It is reasonable to assume that 3.5 is an approximation of population mean, given the control group size was 100 people.
- Mean of distribution of sample mean will be equal to population mean.
- Therefore, Mean of sample mean distribution is 3.5

__Standard Deviation of “distribution of sample means”__

- Standard Deviation of control group is 0.2. It is reasonable to assume that 0.2 is an approximation of population Standard Deviation, given the control group size was 100 people.
- According to Central Limit Theorem, Standard Deviation of sample-mean distribution = (Population Standard Deviation)/Square-Root(Sample size)
- Therefore, Standard Deviation of distribution of sample means is 0.2/Square-Root(100) = 0.02

To determine the P-value we need to compute the distance of the “Experimental Group Mean Rating” from the mean of the sample-mean distribution. The distance is usually measured in number of standard deviations and is called the Z-distance or Z-score. Here, Z = (3.55 - 3.5)/0.02 = 2.5.

We have the Z-distance value. To determine the probability or P-value for a given Z-score, we look up the value 2.5 in the Z-table (at the bottom of the page). The entry for the row “2.5” is 0.9938. The value in the Z-table gives the probability of the green region in the diagram below.
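The standard deviation and Z-score steps above can be written out as a small sketch. The numbers come from the experiment table. The function names `standard_error` and `z_score` are chosen here for illustration only.

```python
import math

# Figures taken from the experiment described in the article.
control_mean = 3.5
control_sd = 0.2
sample_size = 100
experimental_mean = 3.55

def standard_error(sd, n):
    """Standard deviation of the distribution of sample means."""
    return sd / math.sqrt(n)

def z_score(observed, mean, se):
    """Distance of the observed mean from the expected mean, in standard errors."""
    return (observed - mean) / se

se = standard_error(control_sd, sample_size)
z = z_score(experimental_mean, control_mean, se)
print(round(se, 4), round(z, 2))  # prints: 0.02 2.5
```

A Z-score of 2.5 says the experimental mean sits two and a half standard errors above the control mean.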

But since we need value for red region in below diagram, we will need to subtract 0.9938 from 1.

Therefore,

* P-value = 1-0.9938 = 0.0062 = 0.62%*

We finally have the P-value. You could also have computed the same value using the formula *“=1-NORMSDIST(2.5)”* in Excel.

The p-value is less than our predetermined threshold or α of 5%. That means the result is statistically significant, and therefore we reject H_{0} and accept H_{1}.
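The same lookup can be done in Python with the standard library’s `statistics.NormalDist`, as a rough equivalent of the Z-table and the Excel formula. This sketch computes the upper-tail probability only, matching the red region described above. The function name `upper_tail_p_value` is an assumption of this example.

```python
from statistics import NormalDist

def upper_tail_p_value(z):
    """Probability of a standard normal value being greater than z."""
    return 1.0 - NormalDist().cdf(z)

alpha = 0.05  # Saidulu's threshold of 5%
p_value = upper_tail_p_value(2.5)
reject_null = p_value < alpha

print(round(p_value, 4))  # prints: 0.0062
print(reject_null)        # prints: True
```

The result agrees with the table-based value of 0.0062 to four decimal places.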

Our Alternate Hypothesis *“Change in Sugar content impacted the average rating of the population.”* is accepted. This suggests that if Saidulu increases the sugar content in his ice-creams he will end up selling more, assuming people buy more of the ice-cream that they rated higher. Despite knowing that higher sugar content is not healthy for consumers, Saidulu tells himself “whatever sells bro!”. He increases the sugar content, sells more ice-cream and makes more money.

Thank you for reading through. Hope this article helps your understanding of Hypothesis testing.

For any further questions, suggestions or comments, please comment below and I shall respond as soon as possible.

Z-table Below:

]]>Machine learning is the ability of a computer to derive rules and patterns from a given set of data. To derive these insights, statistical tools are deployed through machine learning algorithms. Machine Learning algorithms analyse the data (aka training data) to formulate best-fit mathematical models. Machine Learning can be broadly classified into two types:

- Unsupervised Machine Learning
- Supervised Machine Learning

Clustering is the most common form of Unsupervised Machine Learning. In it, the learning algorithm tries to group a given set of data points into different groups based on similarities in predetermined features.

Example: if you feed a clustering algorithm 1000 images, it will try to group them based on similarity in the image features. These features can be faces, colors, dimensions etc.

It is called “unsupervised” because you don’t need to specify any outputs with corresponding inputs in this type of learning. All you need to do is hand the available data to the algorithm to learn from. This will become clearer when you look at Supervised Machine Learning next.
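To make the idea of grouping by similarity concrete, here is a minimal sketch of k-means, one of several clustering algorithms (the article does not name a specific one). It works on a single numeric feature. The data values and the function name `kmeans_1d` are made up for illustration.

```python
def kmeans_1d(values, k, iterations=20):
    """Group numbers into k clusters by repeatedly moving k centers."""
    ordered = sorted(values)
    # Simple deterministic start: k evenly spaced points from the sorted data.
    step = (len(ordered) - 1) / (k - 1) if k > 1 else 0
    centers = [ordered[round(i * step)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assign every value to its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the average of the values assigned to it.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

feature_values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]  # hypothetical feature values
centers, clusters = kmeans_1d(feature_values, k=2)
print(clusters)
```

Note that no labels are supplied anywhere: the two groups emerge from the data alone.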

The info-graphic below illustrates a clustering algorithm at work.

In supervised machine learning, the learning algorithm tries to identify a relationship between a given set of inputs and their corresponding outputs. There are two types of supervised machine learning algorithms:

- Regression
- Classification

Regression is used for systems where the value being predicted falls somewhere on a continuous real number range. These systems help us with questions of “How much?” or “How many?”.

Example: Assume you have historical data of “House prices” vs “House area”

Supervised Machine Learning (Linear Regression) can be used to determine the relation between a given house’s area and its price, expressed as an equation. This equation can then be used to predict the prices of newer houses in the market.
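A minimal sketch of fitting such an equation with ordinary least squares is shown below. The areas and prices are invented for illustration and deliberately lie on a straight line so the result is easy to check. Real data would be noisier.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical historical data: area in square feet, price in thousands.
areas = [500, 750, 1000, 1250, 1500]
prices = [60, 85, 110, 135, 160]

slope, intercept = fit_line(areas, prices)
predicted_price = slope * 1100 + intercept  # price estimate for a new house
print(round(slope, 3), round(intercept, 3), round(predicted_price, 3))
```

The inputs (areas) and outputs (prices) are both given to the algorithm, which is what makes this supervised learning.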

Classification is used for systems where output value being predicted is a category. Classification algorithms help us answer questions like “Is this tumor cancerous?”, “Does this cookie meet our quality standards?” etc.

Example: You have historical data of engine noise and vibrations from a car manufacturing company, along with whether each of the engines cleared quality checks or not. You can use this data to build a machine learning model that can predict if a given engine with certain noise and vibration levels will clear quality checks.

In conclusion, the info-graphic below gives an easily understandable view.
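The engine example can be sketched with a nearest-centroid classifier, one of the simplest classification methods. It is used here as an illustration and is not something the article prescribes. All readings, labels and function names below are hypothetical.

```python
import math

def centroid(points):
    """Average position of a list of (noise, vibration) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def train(samples):
    """Compute one centroid per label from ((noise, vibration), label) pairs."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(points) for label, points in by_label.items()}

def predict(model, features):
    """Return the label whose centroid is closest to the given features."""
    return min(model, key=lambda label: math.dist(model[label], features))

# Hypothetical history: (noise in dB, vibration in mm/s) and the QC outcome.
history = [
    ((60, 1.0), "pass"), ((62, 1.2), "pass"), ((58, 0.9), "pass"),
    ((80, 3.0), "fail"), ((85, 3.5), "fail"), ((78, 2.8), "fail"),
]
model = train(history)
print(predict(model, (61, 1.1)), predict(model, (82, 3.1)))
```

The output is a category (“pass” or “fail”) and not a number on a continuous range, which is the difference between classification and regression.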

]]>

Capacity Planning is the process of estimating the resources required by an organization to meet production demands over time.

Organizations usually face fluctuating production demands. Most of these fluctuations are seasonal and therefore can be anticipated and catered to with some amount of smart planning. Capacity Planning can help achieve high service-levels at lower costs enabling you to differentiate in the market while delivering customer delight.

It is in fact straight-forward and easy to perform. Let us take an example of a biscuit manufacturing unit. Follow the below steps:

**Step 1:** Identify your unit of product, which here is a box of biscuits

**Step 2:** Identify resources that determine your production capacity, which can be the number of machines employed or people employed

**Step 3:** Estimate your demand numbers per unit time. If you are doing annual capacity planning, demand per month can be a good unit to use. As can be seen below, the demand is unevenly distributed.

**Step 4:** Link your demand to the capacity that you might need to maintain. For example, if 1 machine can produce 4,000 boxes of biscuits a month then you will need 6 machines to produce 23,000 boxes of biscuits (23,000/4,000 = 5.75, rounded up). Similarly, if one employee can contribute to 6,000 boxes per month then 4 employees will be needed (23,000/6,000 ≈ 3.83, rounded up). Assuming these capacities, let us look at the employees and machines you will need to meet each month’s demand.
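The Step 4 arithmetic can be sketched as below. The per-machine and per-employee capacities are the ones stated in the article. The monthly demand figures other than 23,000 are placeholders invented for this example and do not come from the original table.

```python
import math

BOXES_PER_MACHINE = 4000   # monthly capacity of one machine (from the article)
BOXES_PER_EMPLOYEE = 6000  # monthly contribution of one employee (from the article)

def units_needed(demand, capacity_per_unit):
    """Smallest whole number of units that covers the demand."""
    return math.ceil(demand / capacity_per_unit)

# Hypothetical demand in boxes per month; only 23,000 appears in the article.
monthly_demand = {"Jan": 12000, "Feb": 15000, "Mar": 23000}

for month, demand in monthly_demand.items():
    machines = units_needed(demand, BOXES_PER_MACHINE)
    employees = units_needed(demand, BOXES_PER_EMPLOYEE)
    print(month, demand, machines, employees)
```

Rounding up matters here: 5.75 machines is not something you can install, so the plan has to provision 6.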

You will note that in the above table you have higher needs for capacity in months of March, April, September and October. With this information in hand you will need to plan your capacity to meet the production demands.

You can devise some smart strategies to optimally plan for capacity:

- **Aggregate your production:** Aggregate your production to a single location to smoothen the sharp demand fluctuations.
- **Manufacture in advance:** If your product has a higher shelf life, try to maximize capacity utilization during low demand periods and increase your inventory for demand spikes. This strategy is contingent on your cost of storage for finished goods.
- **Explore Just-in-time ramping up:** Explore strategies to ramp up production for those shorter higher demand periods. Can you lease machinery? Can you hire skilled employees on contract?
- **Develop fungible resources:** This strategy mainly applies to human resources. Can your employees be trained on multiple skills? Assuming you have other products, say cakes, whose demand inversely correlates with biscuits, you can shuffle excess capacity from one product to another.

Apart from the simple heuristics outlined above, you can apply sophisticated statistical methods to estimate your capacity more accurately. More often than not, though, the above guiding principles should suffice for your capacity planning needs.

Please let me know your thoughts/ suggestions through comments.

Also, in case you need assistance with your business, get in touch with me at juned.bizadvisory@gmail.com

]]>Too right? (or) Too left?

Too religious? (or) Too atheistic?

There are many more such choices of ideology that shape our lives. Each of us lies at some point on the spectrum, between the extremes, often tending towards one side or the other.

How do I ensure a balance between these extremes?

- Start with realising the side you are inclined towards.
- Qualify your decisions with questions pertaining to the opposite inclination. A believer may ask “Do I miss the practicality of an atheist?”. An atheist may ask “Can I display faith while everything fails me?”.
- Relook at your choices now.

If nothing else, such reflection would at least enable us to understand the other person better.

To a peaceful coexistence! Cheers!!

]]>To realise any of your dreams, you need a desire strong enough to push you to act. The absence of a want is always filled with a justification.

Overcome the explanations to take that first step. Be greeted by a barrage of setbacks, but persist. Keep inching forward building just a bit each day. The culmination of all those micro-efforts will snowball into a manifestation of your dream.

Go ahead! Believe!!!

]]>Addictions allow you to lose yourself. They can take any form, ranging from love to art to work to shopping to smoking to alcoholism. People perhaps find solace in indulging and forgetting themselves.

]]>I wish I had an opportunity to repay the affection I received. Ignorant I was, lost in the moment assuming it would never pass. Lost in the selfish intentions of gratifying self. As I grow old, the unidirectional flow of time becomes more and more apparent.

]]>