Measures of central tendency give you a value that is representative of a given data-set. There are three measures of central tendency:

- Mean μ
- Median
- Mode


Well, now you know what a measure of central tendency is. But is central tendency in itself enough to describe a data-set? I don’t think so, and neither does Hans Rosling, who says in his book Factfulness:

*“When we compare two averages, we risk misleading ourselves even more by focusing on the gap between two single numbers, and missing the overlapping spreads, the overlapping ranges of numbers, that make each average.”*

Averages (or measures of central tendency) alone can be misleading. Looking at central tendencies alone might give you a distorted world view. You need to couple the measure of central tendency with a measure of the spread (or measures of dispersion).

**Range**

Range is the simplest measure of dispersion. It is the difference between the maximum and minimum values in a given data-set. In the case of household incomes, it will be the difference between the lowest income and the highest income. But how effective is the range? You need to be careful in applying range when the data is not evenly spread out. As we know, household income is very unevenly distributed, and therefore the range value you get might be much larger because of outliers at both the higher and lower ends.

Example: Consider a data-set of household incomes where highest and lowest incomes are as below:

Highest income: 100,00,000 (1 crore or 10 million)

Second highest income: 10,000 ( ten thousand)

Second lowest income: 1000 (one thousand)

Lowest income: 0

The range according to the above data will be 1 crore or 10 million (highest minus lowest). But is this an accurate representation of the data? No, because much of the data lies between 1,000 and 10,000, making the effective range 9,000.

An adaptation of range is the inter-quartile range, where the range is computed only for the middle 50% of the data-set.
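To make the two measures concrete, here is a small Python sketch. Only the endpoint incomes come from the example above; the middle values are invented for illustration, and the quartile interpolation used here is one of several common conventions (library routines such as `statistics.quantiles` may differ slightly at the edges).

```python
# Range vs. inter-quartile range on a skewed income data-set.

def value_range(data):
    return max(data) - min(data)

def iqr(data):
    s = sorted(data)
    def quartile(p):
        # linear interpolation between the closest ranks
        idx = p * (len(s) - 1)
        lo = int(idx)
        frac = idx - lo
        return s[lo] + frac * (s[min(lo + 1, len(s) - 1)] - s[lo])
    return quartile(0.75) - quartile(0.25)

# endpoints from the example; middle values are hypothetical filler
incomes = [0, 1000, 2000, 3000, 5000, 8000, 10000, 10_000_000]

print(value_range(incomes))  # 10000000 -- dominated by the outlier
print(iqr(incomes))          # a far smaller, more representative spread
```

Note how the range is swallowed by the single 1-crore outlier, while the inter-quartile range reflects where most of the data actually sits.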

However, even the inter-quartile range can be susceptible to the same issues regarding the distribution of data. A better approach would be one which takes into consideration the distance of each of the points from the center. My first thought was to compute the distance of each data point from the mean and then calculate the average of all those distances. But the sum of the distances of all the data points from the center (mean) will be zero. Therefore, I should use the squares of the distances, to avoid the negative distances canceling out the positive ones.

Example: Take data points whose values are 100, 200, 300 and 400. The mean value is 1000/4, which is 250. Now the mean of the distances of each data point from 250 will be zero:

{(100-250)+(200-250)+(300-250)+(400-250)}/4 = zero

So we take the mean of the squares of the distances instead and get the value 12,500.

**Variance**

The above discussed example is variance. Variance for a given data-set of discrete values can be calculated using the formula below:

σ² = (1/N) Σ (xᵢ - μ)²

where N is the number of data points, xᵢ is each data point and μ is the mean. As you can see, the value of variance for a data-set depends collectively on the distance of each of the data points from the mean of that data-set.
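As a quick check, the variance of the 100, 200, 300, 400 example can be computed in a few lines of Python:

```python
# Population variance, exactly as described above:
# the average of squared distances from the mean.

def variance(data):
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)

print(variance([100, 200, 300, 400]))  # 12500.0
```

Python's built-in `statistics.pvariance` returns the same population variance.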

**Standard Deviation**

Standard Deviation is a derivation from variance: it is the square-root of variance. Though the two can be used interchangeably, I believe standard deviation is preferred over variance because it has the same dimensions as the original data-set.

For a data-set with discrete real values, standard deviation can be computed by the formula below:

σ = √[ (1/N) Σ (xᵢ - μ)² ]

Example: The mean of 1, 1, 2, 3, 4, 4 is given by the expression (1+1+2+3+4+4)/6

μ = x̄ = 2.5

Using the mean value computed above, we compute the standard deviation:

σ = s = 1.258306
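The same computation in Python, using the population formula (dividing by N):

```python
import math

def std_dev(data):
    # population standard deviation: sqrt of the mean squared distance from the mean
    mu = sum(data) / len(data)
    var = sum((x - mu) ** 2 for x in data) / len(data)
    return math.sqrt(var)

print(round(std_dev([1, 1, 2, 3, 4, 4]), 6))  # 1.258306
```

`statistics.pstdev([1, 1, 2, 3, 4, 4])` gives the same value; note that `statistics.stdev` divides by N-1 (the sample formula) and returns a slightly larger number.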

Do the above concepts still seem abstract to you? Please read How to apply Mean and Standard Deviation? for a better understanding.

Hope this helps your understanding. Thank you!

This tells us that there are different types of data-sets, and the values they contain depend on the random variables they represent. Let us look at the classification around the types of data.

There are two major types of data:

- Qualitative Data
- Quantitative Data

**Qualitative Data**

This type of data contains non-numerical values. The values taken by qualitative random variables are categorical in nature. There are two broad classes of Qualitative Data:

1. Nominal Data: The values represent categories which are not in any order.

Example: Color of the flower: Red, blue, green

2. Ordinal Data: The values represent categories which follow a particular order.

Example: Income Class: Higher Income, Middle Income, Lower Income

**Quantitative Data**

Here the random variable takes numerical values. There are two types of numerical values:

1. Continuous: The variable can take any possible numerical value within given bounds.

Example: Height of a person: 5.5456 feet, 6 feet, 4.00001 feet

2. Discrete: The variable can take only some values within given bounds.

Example: Number of students in a class. The value can only be an integer.

In the upcoming posts we will look at how statistical tools can be applied to each of these types of Random Variables. Thank you!


Mean of any given distribution is a measure of central tendency of that distribution. Mean is also known as Arithmetic Mean or Average Value or Expected Value.

For a data-set with discrete real values, the mean can simply be computed as the sum of all the values in the data-set divided by the number of values:

μ = x̄ = (Σ xᵢ)/N

Example: The mean of 1, 1, 2, 3, 4, 4 is given by the expression (1+1+2+3+4+4)/6

μ = x̄ = 2.5

Standard Deviation of any given distribution is a measure of dispersion of that distribution. It tells you how spread out the distribution is with respect to its mean.

For a data-set with discrete real values, standard deviation can be computed by the formula below:

σ = √[ (1/N) Σ (xᵢ - μ)² ]

Example: The mean of 1, 1, 2, 3, 4, 4 is given by the expression (1+1+2+3+4+4)/6

μ = x̄ = 2.5

Using the mean value computed above, we compute the standard deviation:

σ = s = 1.258306

Let us look at an example. Zen is an undergraduate student who wants to evaluate and choose between two post-graduate colleges (College A and College B) she got accepted into. She thinks of her three most important criteria for choosing grad school.

- Faculty
- Location
- Expected salary after graduating

Zen does not find any significant difference between colleges A & B along her first two criteria. Now the only deciding factor will be the “Expected Salary”. Zen collects 1,000 salary data points from each college to evaluate which college will help her earn a higher salary. The issue Zen faces is that she has to evaluate 1,000 data points from each college to come to a decision. Making sense of such a huge number of data points seemed challenging. But then Zen remembered that she had learnt about statistical tools that can help her make a better decision. Below is how she went about evaluating:

Applying the Measure of Central Tendency (or) Applying Mean: Zen asks herself, “If I were to come up with a numeric value for each college that is representative of the salary earned at that college, what would it be?”. She realizes that she can compute the mean salary for each college. The value of the mean salary indicates the central tendency of the data-set. This means that if you were to visualize the distribution of the salaries, the mean salary would lie between the lowest and highest salaries, with its value gravitating towards the most frequent salary bucket.

The mean salary for colleges A & B turns out to be 95,000 and 97,000 respectively. If Zen were to use only the mean value as the deciding factor, she could simply join College B. But then she wonders, “The mean salary only represents the central tendency of those 1,000 data points from each college. What about how spread out the salary numbers are?”. To get an understanding of how spread out the salary numbers are, she finds the highest and lowest salaries for both colleges. Below is how the data looks:

| | Lowest Salary | Highest Salary |
|---|---|---|
| College A | 80,000 | 110,000 |
| College B | 50,000 | 150,000 |

As can be seen, College B has both a higher highest salary and a lower lowest salary, i.e., a wider range. Does that mean Zen should simply take College B? Not just yet! There is an important measure she needs to think about: Standard Deviation.

Applying the Measure of Dispersion (or) Applying Standard Deviation: Using the formulas learnt above, Zen computes the Standard Deviations for colleges A and B. Below is how the Standard Deviation looks:

| | Standard Deviation of Salary |
|---|---|
| College A | 3,000 |
| College B | 7,000 |

From the above numbers it is evident that College B has a higher Standard Deviation, which means that salaries at College B are more spread out than at College A. Taking College B could mean a much higher or a much lower salary compared to College A. Zen thinks that College B presents a high-risk, high-reward opportunity, while College A is more likely to fetch a salary closer to its mean of 95,000. Zen, being a risk-averse person, chooses College A.
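Zen's comparison can be sketched in Python using the summary figures from this example. The mean ± 2σ "typical range" used here is a heuristic assumption: it covers roughly 95% of salaries only if they are approximately normally distributed.

```python
# Summary figures from the example: (mean salary, standard deviation)
colleges = {"A": (95_000, 3_000), "B": (97_000, 7_000)}

# mean ± 2 standard deviations as a rough "typical salary" band
ranges = {
    name: (mean - 2 * sd, mean + 2 * sd)
    for name, (mean, sd) in colleges.items()
}

for name, (low, high) in ranges.items():
    print(f"College {name}: typical salary range {low:,} to {high:,}")
```

College A's band is narrow (89,000 to 101,000) while College B's is wide (83,000 to 111,000), which is exactly the high-risk, high-reward trade-off Zen reasons about.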

Such simple tools like Mean and Standard Deviation can enable you to think along dimensions which were hitherto non-existent. Thank you!

Random Variables are a means of assigning numbers to outcomes of random processes, experiments or activities.

An example of a random variable is the “outcome of rolling a die”. Let us denote this random variable by the symbol O_dice-roll. The values this random variable can take are the integers 1 through 6.

O_dice-roll is a discrete random variable, which means it can take only specific integer values: one among {1, 2, 3, 4, 5, 6}. It cannot take non-integer values or values outside the 1 to 6 range.

The opposite of a “discrete random variable” is a “continuous random variable”. A continuous random variable can take any real value (both integer and non-integer) in a given range. An example of a continuous random variable is the “height of a person”. Let us represent it by the symbol H_person. The variable H_person can take any value between 0 feet and 10 feet.

A function which mathematically represents the outcomes from this random variable is called a “Distribution” or a “Distribution Function” of the random variable.

In the above case, the distribution function will be

O_dice-roll = X; where X belongs to {1, 2, 3, 4, 5, 6}

A distribution curve is a plot of the outcomes of the distribution function. It represents the number of occurrences of each of the values.

Our distribution function for “outcomes from rolling a die” can easily be plotted on a 2-dimensional surface. Assuming you rolled the die a hundred times, you will see the below distribution:

As you can note, we don’t see a line in the graph but only points. This is because our function is a discrete one which can only take point values from 1 to 6.
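A quick way to convince yourself of this is to simulate the rolls. The sketch below (plain Python, seeded for reproducibility) rolls a fair die 100 times and prints a crude text version of the distribution of occurrences:

```python
import random
from collections import Counter

random.seed(42)  # fixed seed so the sketch is reproducible

# simulate 100 rolls of a fair six-sided die
rolls = [random.randint(1, 6) for _ in range(100)]
counts = Counter(rolls)

# crude text "distribution plot": one '#' per occurrence
for face in range(1, 7):
    print(face, "#" * counts[face])
```

Only the six discrete points 1 through 6 ever receive any count, which is what the point-only plot above reflects.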

If I were to plot a distribution curve for a continuous random variable like H_person, or “height of a person”, you are likely to see a curve like the one below. As you can see, the curve is continuous, implying that the height of a person can take any real value within a reasonable range.

Hope this helps your understanding. Thank you for reading, please feel free to share your views and feedback through comments.

In the upcoming articles I will be explaining the concepts Mean, Standard-Deviation and Normal distribution.

While “Experiment” is a broader term, a controlled experiment specifically is about testing the impact of a single factor/variable while the other variables remain constant.

Confused? Don’t worry, read ahead. I will attempt to explain this using a scenario.

Imagine Saidulu is an ice-cream manufacturer who wants to increase the sales of his product “Kya toh bhi Ice-cream”. His friend, Panthulu, suggests that he double the sugar content in the ice-cream in order to achieve higher sales.

Will Saidulu go ahead and increase the sugar in his ice-cream? Acting on the suggestion blindly would be a bad business move. Fortunately, Saidulu is smart: he conducts a controlled experiment. Voila!

How does he do that?

First, Saidulu tries to understand his customer segments. He narrows down the attributes of his major consumer segment on parameters like age, gender, income, geography, etc.

Now he conducts his controlled experiment on 200 people who fit his customer parameters. He divides them into two uniform groups: control and experimental. The “Control Group” gets ice-cream with the old recipe, while the “Experimental Group” gets ice-cream with the exact same recipe except for one difference: their ice-cream has double the amount of sugar!

Both the control group and the experimental group taste the ice-cream and provide their ratings on a scale of 1 to 5. This concludes the experiment.

Saidulu took utmost care to make sure the right people were selected for the experiment and that they were divided into homogeneous groups (control and experimental). This means the data from the experiment is more likely to yield reliable results. All Saidulu needs to do now is compare the collected data and see if the increase in sugar has any impact on customer satisfaction.

The data from the experiment shows the following:

| Group | Average Rating | Standard Deviation |
|---|---|---|
| Control Group (ice-cream with the same amount of sugar) | 3.5 | 0.2 |
| Experimental Group (ice-cream with double the sugar) | 3.55 | 0.199 |

Now, what do you see? There is clearly a jump in the average rating, from 3.5 to 3.55. Does that mean Saidulu can safely change his recipe to the one served to the Experimental Group? Not just yet! What if the increase in average rating is pure coincidence? There is no way to know for sure, but there is a way to determine the probability, and it is called “Hypothesis Testing”.

What exactly is Hypothesis Testing?

It is a statistical method of determining whether a given statement is true or not. The statement itself is called a “Hypothesis”.

How do I perform Hypothesis Testing?

You start performing “Hypothesis Testing” by designing two statements or Hypotheses – A “Null Hypothesis” and an “Alternate Hypothesis”

How to design a Null Hypothesis?

A null hypothesis states that the change in the variable did not impact the system at all. It is represented by the symbol H₀ (read “H-naught”).

In case of Saidulu the null hypothesis will be:

*H₀: Change in Sugar content did not impact the average rating of the population.*

In other words, the additional sugar added to the ice-cream does not change people’s perception of it; it was just a matter of chance that the average rating of the Experimental Group was higher than that of the Control Group.

How to design an Alternate Hypothesis?

As you might have guessed already, an alternate hypothesis states that the change in the variable had an impact on the system. It is represented by the symbol H₁ (read “H-one”).

Saidulu’s alternate Hypothesis will read:

*H₁: Change in Sugar content impacted the average rating of the population.*

Now that we have defined our “Null Hypothesis” and “Alternate Hypothesis”, it is time to perform the hypothesis test. It’s simple: we calculate the probability of observing a result at least as extreme as ours if the Null Hypothesis were true, and this probability is what we call the “P-value”.

If the P-value is low, then our result is very unlikely under the null hypothesis, so we reject it and accept the alternate hypothesis (H₁): the change in the variable had an impact on the system. But if the P-value is high, we fail to reject the null hypothesis (H₀): the change in the variable had no demonstrated impact. Whether the p-value is high or low is determined by comparing it against a predetermined level α (alpha).

If p-value > α, the result is not statistically significant and we fail to reject H₀.

If p-value < α, the result is statistically significant: we reject H₀ and accept H₁.

Saidulu sets his threshold at 5%, which means:

A p-value > 0.05 means his null hypothesis, which says “increase in sugar had no impact on the rating”, stands. The increase in the average rating of the experimental group was just chance; it would have happened even if the same old ice-cream had been served to them.

A p-value < 0.05 means the result is statistically significant: H₀ is rejected and H₁ is accepted. The higher average rating seen in the experimental group was not by chance; it was in fact due to the increase in sugar content in their ice-cream.

How do I compute the P-Value or P-statistic?

From here on, our discussion is going to get a little more technical. The P-value is the probability of finding such an increased rating purely by chance, i.e., assuming the increased sugar content had no impact on user satisfaction. This means we have to assume that the population parameters (mean, standard deviation) remained constant. Our P-value is represented by the shaded region in the picture below.

P-value = (Area of shaded region)/ (Area of region under curve)

What you see above is the distribution of means from n random samples. We computed the parameters of this distribution using the mean and standard deviation of the control group. Below is how the parameters were computed.

__Mean of the “distribution of sample means”__

- The control group mean is 3.5. It is reasonable to assume that 3.5 approximates the population mean, given the control group size was 100 people.
- The mean of the distribution of sample means is equal to the population mean.
- Therefore, the mean of the sample-mean distribution is 3.5.

__Standard Deviation of the “distribution of sample means”__

- The standard deviation of the control group is 0.2. It is reasonable to assume that 0.2 approximates the population standard deviation, given the control group size was 100 people.
- According to the Central Limit Theorem, Standard Deviation of the sample-mean distribution = (Population Standard Deviation)/√(Sample size).
- Therefore, the standard deviation of the distribution of sample means is 0.2/√100 = 0.02.

To determine the P-value we need to compute the distance of the “Experimental Group Mean Rating” (3.55) from the mean of the sample-mean distribution (3.5). The distance is usually measured in number of standard deviations, and it is called the Z-distance or Z-score. Here, z = (3.55 - 3.5)/0.02 = 2.5.

We now have the z-score, 2.5. To determine the probability, or P-value, for a given z-score, we look it up in the Z-table (at the bottom of the page). The value for the row “2.5” is 0.9938. The value in the z-table gives the probability of the green region in the diagram below.

But since we need value for red region in below diagram, we will need to subtract 0.9938 from 1.

Therefore,

* P-value = 1-0.9938 = 0.0062 = 0.62%*

We finally have the P-value. You could also have computed the same value using the formula *“=1-NORMSDIST(2.5)”* in Excel.
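For reference, the same computation can be done in Python's standard library; `math.erf` gives us the standard normal CDF, playing the role of NORMSDIST:

```python
import math

def normal_cdf(z):
    # cumulative distribution function of the standard normal
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# z-score of the experimental group mean under the sample-mean distribution
z = (3.55 - 3.50) / 0.02          # = 2.5
p_value = 1 - normal_cdf(z)       # upper-tail probability

print(round(z, 2), round(p_value, 4))  # 2.5 0.0062
```

This matches the z-table lookup of 1 - 0.9938 = 0.0062.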

The p-value (0.62%) is less than our predetermined threshold, α, of 5%. That means the result is statistically significant; therefore we reject H₀ and accept H₁.

Our Alternate Hypothesis, *“Change in Sugar content impacted the average rating of the population”*, holds. This means that if Saidulu increases the sugar content in his ice-creams he will end up selling more, assuming people will eat more of the ice-cream they rated higher. Despite knowing that higher sugar content is not healthy for consumers, Saidulu tells himself, “whatever sells bro!”. He increases the sugar content, sells more ice-cream and makes more money.

Thank you for reading through. Hope this article helps your understanding of Hypothesis testing.

For any further questions or suggestions or comments, please comment below and I shall revert as soon as possible.

Z-table Below:

Machine learning is the ability of a computer to derive rules and patterns from a given set of data. To derive these insights, statistical tools are deployed through machine learning algorithms. Machine Learning algorithms break down the data (aka training data) to formulate best-fit mathematical models. Machine Learning can be broadly classified into two types:

- Unsupervised Machine Learning
- Supervised Machine Learning

The most common form of Unsupervised Machine Learning is clustering. In clustering, the learning algorithm tries to group a given set of data points into different groups based on similarities in predetermined features.

Example: if you feed a clustering algorithm 1000 images, it will try to group them based on similarity in the image features. These features can be faces, colors, dimensions, etc.

It is called “unsupervised” because you don’t need to specify outputs for corresponding inputs in this type of learning. All you need to do is feed the available data for the algorithm to learn from. This will become clearer when you look at Supervised Machine Learning next.

The below info-graphic will give you an indication of a clustering algorithm at work:

In supervised machine learning, the learning algorithm tries to identify a relationship between a given set of inputs and their corresponding outputs. There are two types of supervised machine learning algorithms:

- Regression
- Classification

Regression is used for systems where the value being predicted falls somewhere on a continuous real number range. These systems help us with questions of “How much?” or “How many?”.

Example: Assume you have historical data of “House prices” vs “House area”

Supervised Machine Learning (Linear Regression) can be used to determine the relation between a given house’s area and its price. The resulting equation can be used to predict prices of newer houses in the market.
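A minimal least-squares sketch of that idea in pure Python. The house data below is entirely made up for illustration; `fit_line` implements the textbook simple-linear-regression formulas for slope and intercept.

```python
def fit_line(xs, ys):
    # ordinary least squares for a single feature:
    # slope = cov(x, y) / var(x), intercept = mean_y - slope * mean_x
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# hypothetical training data: house area (sq ft) vs. price
areas = [1000, 1500, 2000, 2500]
prices = [200_000, 290_000, 410_000, 490_000]

slope, intercept = fit_line(areas, prices)
predicted = slope * 1800 + intercept  # price estimate for an 1800 sq ft house
print(slope, intercept, round(predicted))
```

Once fitted, the same line (price = slope × area + intercept) is what gets reused to price new houses.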

Classification is used for systems where output value being predicted is a category. Classification algorithms help us answer questions like “Is this tumor cancerous?”, “Does this cookie meet our quality standards?” etc.

Example: You have historical data of engine noise and vibrations from a car manufacturing company, along with whether each of the engines cleared quality checks or not. You can use this data to build a machine learning model that can predict if a given engine with certain noise and vibration levels will clear quality checks.
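As a toy sketch of such a classifier, here is a nearest-centroid rule over hypothetical (noise, vibration) readings. This is only one simple way to classify; the data values and thresholds are invented, and a real model would be trained on the full historical data described above.

```python
# hypothetical historical readings: (noise_db, vibration_mm_s, passed_qc)
history = [
    (60, 2.0, True), (62, 2.2, True), (61, 1.9, True),
    (75, 4.5, False), (78, 4.8, False), (74, 4.2, False),
]

def centroid(points):
    # average point of a group of (noise, vibration) readings
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

passed = centroid([(n, v) for n, v, ok in history if ok])
failed = centroid([(n, v) for n, v, ok in history if not ok])

def predict(noise, vibration):
    # classify by which group's centroid is closer (squared distance)
    d_pass = (noise - passed[0]) ** 2 + (vibration - passed[1]) ** 2
    d_fail = (noise - failed[0]) ** 2 + (vibration - failed[1]) ** 2
    return d_pass < d_fail  # True -> predicted to clear QC

print(predict(63, 2.1))  # True
print(predict(77, 4.6))  # False
```

A quiet, low-vibration engine lands near the "passed" centroid; a loud, shaky one lands near the "failed" centroid.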

In conclusion, the below info-graphic will give you an easily understandable view.


Capacity Planning is the process of estimating the resources required by an organization to meet production demands over time.

Organizations usually face fluctuating production demands. Most of these fluctuations are seasonal and therefore can be anticipated and catered to with some amount of smart planning. Capacity Planning can help achieve high service-levels at lower costs enabling you to differentiate in the market while delivering customer delight.

It is in fact straightforward and easy to perform. Let us take the example of a biscuit manufacturing unit and follow the steps below:

**Step 1:** Identify your unit of product, which here is a box of biscuits

**Step 2:** Identify resources that determine your production capacity, which can be the number of machines employed or people employed

**Step 3:** Estimate your demand numbers per unit time. If you are doing an annual capacity planning, demand per month can be a good unit to use. As can be seen below, the demand is unevenly distributed.

**Step 4:** Link your demand to the capacity you need to maintain. For example, if one machine can produce 4,000 boxes of biscuits a month, then you will need 6 machines to produce 23,000 boxes (23,000/4,000 = 5.75, rounded up). Similarly, if one employee can contribute 6,000 boxes per month, then 4 employees will be needed. Assuming these capacities, let us look at the employees and machines you will need to meet each month’s demand.
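The Step 4 rule is just a ceiling division, which can be sketched as:

```python
import math

def required_units(demand, capacity_per_unit):
    # round up: a fractional machine or employee still requires a whole one
    return math.ceil(demand / capacity_per_unit)

# figures from the example: 23,000 boxes in a peak month
print(required_units(23_000, 4_000))  # 6 machines
print(required_units(23_000, 6_000))  # 4 employees
```

Applying this to every month's demand produces the capacity table discussed next.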

You will note in the above table that you have higher capacity needs in the months of March, April, September and October. With this information in hand, you can plan your capacity to meet the production demands.

You can devise some smart strategies to optimally plan for capacity:

- **Aggregate your production:** Aggregate your production to a single location to smoothen the sharp demand fluctuations.
- **Manufacture in advance:** If your product has a longer shelf life, try to maximize capacity utilization during low-demand periods and build up inventory for demand spikes. This strategy is contingent on your cost of storage for finished goods.
- **Explore just-in-time ramp-ups:** Explore strategies to ramp up production for those shorter, higher-demand periods. Can you lease machinery? Can you hire skilled employees on contract?
- **Develop fungible resources:** This strategy mainly applies to human resources. Can your employees be trained in multiple skills? Assuming you have other products, say cakes, whose demand inversely correlates with biscuits, you can shuffle excess capacity from one product to another.

Apart from the simple heuristics outlined above, you can apply sophisticated statistical methods to estimate your capacity more accurately. More often than not, though, the above guiding principles should suffice for your capacity planning needs.

Please let me know your thoughts/ suggestions through comments.

Also, in case you need assistance with your business, get in touch with me at juned.bizadvisory@gmail.com

Too right? (or) Too left?

Too religious? (or) Too atheistic?

There are many more such choices of ideologies that shape our lives. Each of us lies at some point on the spectrum, between the extremes, often tending towards one side or the other.

How do I ensure a balance between these extremes?

- Start with realising the side you are inclined towards.
- Qualify your decisions with questions pertaining to the opposite inclination. A believer may ask, “Do I miss the practicality of an atheist?”. An atheist may ask, “Can I display faith while everything fails me?”.
- Relook at your choices now.

If nothing else, such reflection would at least enable us to understand the other person better.

To a peaceful coexistence! Cheers!!

To realise any of your dreams, you need a desire strong enough to push you to act. The absence of a want is always filled with a justification.

Overcome the explanations to take that first step. Be greeted by a barrage of setbacks, but persist. Keep inching forward building just a bit each day. The culmination of all those micro-efforts will snowball into a manifestation of your dream.

Go ahead! Believe!!!

Addictions allow you to lose yourself. They can take any form, ranging from love to art to work to shopping to smoking to alcoholism. People perhaps find solace in indulging and forgetting themselves.

]]>