The central limit theorem (CLT) is one of the most important theorems in statistics. To understand it, one first has to understand the concepts of population and sample. Let us understand these with a small example. Suppose Apple's CEO wants to find out which iPhone features customers are very happy with and which features they are unhappy with. Should we go and ask every single iPhone user? Would that even be feasible? And even if it were, how costly would it be? The solution to all of these problems is sampling. Sampling means picking a subset of the population and analysing that subset to draw inferences about the whole population. Simple random sampling, in which every member of the population has an equal chance of being selected, is the best-known technique for selecting a sample.
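As a minimal sketch of simple random sampling, the Python snippet below draws 500 users without replacement from a hypothetical population of 100,000 satisfaction scores and compares the sample mean with the population mean. The scores, the population size, the sample size, and the helper name `simple_random_sample` are all illustrative assumptions, not real iPhone data.

```python
import random
import statistics

# Hypothetical population: satisfaction scores (1-10) for 100,000 users.
# The values are randomly generated for illustration only.
rng = random.Random(42)
population = [rng.randint(1, 10) for _ in range(100_000)]


def simple_random_sample(data, k, rng):
    """Pick k items without replacement; every subset of size k is equally likely."""
    return rng.sample(data, k)


sample = simple_random_sample(population, 500, rng)

print("population mean:", round(statistics.fmean(population), 3))
print("sample mean:    ", round(statistics.fmean(sample), 3))
```

Analysing 500 scores instead of 100,000 is far cheaper, yet the sample mean typically lands close to the population mean.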
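If the population follows a normal distribution, the sample mean also follows a normal distribution, whatever the sample size. Even if the population does not follow a normal distribution, the distribution of the sample mean approaches a normal distribution as the sample size grows, provided the observations are independent and the population has a finite variance; a common rule of thumb treats a sample size of 30 or more as large enough. This second statement is exactly what the central limit theorem says.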
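The following Python sketch illustrates the theorem by simulation under assumed parameters: the population is an exponential distribution with mean 1 (strongly right-skewed, so clearly not normal), and for each sample size n we draw 5,000 samples and measure the skewness of their means. Skewness is 0 for a normal distribution, so values shrinking towards 0 as n grows show the sample means becoming more normal. The helper names `skewness` and `sample_means` are hypothetical.

```python
import random
import statistics


def skewness(values):
    """Population-style skewness: 0 for a perfectly symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)


def sample_means(n, repeats, rng):
    """Draw `repeats` samples of size n from an exponential population (mean 1)
    and return the mean of each sample."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(repeats)
    ]


rng = random.Random(0)
results = {}
for n in (1, 5, 50):
    means = sample_means(n, 5_000, rng)
    results[n] = skewness(means)
    print(f"n = {n:>2}: skewness of the sample means = {results[n]:.2f}")
```

For an exponential population the theoretical skewness of the mean of n observations is 2/√n, so the printed values should come out close to 2, 0.89, and 0.28, up to simulation noise.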
Now we need to understand what a normal distribution is. Before that, let us understand probability distributions. There are two main types of probability distribution: discrete and continuous. A probability distribution defined over a discrete variable is called a discrete probability distribution, and one defined over a continuous variable is called a continuous probability distribution. A discrete variable can take only separate, countable values (for example, the number of iPhones a household owns), whereas a continuous variable can take any value within a range (for example, a person's height). When a discrete variable has a very large number of possible values, its distribution often looks and behaves like a continuous one; for instance, the binomial distribution with many trials is well approximated by a normal distribution.
There are several terms and concepts one should know to understand the idea behind a probability distribution. First, let us discuss the histogram, which is an extension of the bar plot. In a bar plot, each category is represented by a bar. A histogram is a bar representation of numerical data in which the values are grouped into consecutive intervals called bins, and each bar corresponds to one bin. On the X-axis we plot the variable of interest, and on the Y-axis we plot the frequency of occurrence, i.e. how many values fall into each bin. If we plot probabilities (relative frequencies) on the Y-axis instead of raw counts, the histogram represents a probability distribution.
A normal distribution is a symmetric, bell-shaped distribution in which most values cluster around the mean and values become less and less frequent the farther they are from it; it is fully described by its mean and standard deviation. Human height is a common example: only a small proportion of people are very short or very tall, and the majority have a height between roughly 5 feet and 6 feet.
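To make the histogram idea concrete, this Python sketch draws 10,000 hypothetical heights from a normal distribution (the mean of 5.5 ft and standard deviation of 0.25 ft are assumed values, not measured data), groups them into 0.25 ft bins, and divides each bin's count by the total so the Y-axis shows probability instead of raw count. It prints a text histogram and the share of heights between 5 and 6 feet. The helper `bin_index` and the constants `START` and `BIN_WIDTH` are illustrative choices.

```python
import math
import random
from collections import Counter

# Hypothetical heights in feet: normal with mean 5.5 ft and standard deviation 0.25 ft.
rng = random.Random(1)
heights = [rng.gauss(5.5, 0.25) for _ in range(10_000)]

START = 4.5       # left edge of bin 0, in feet
BIN_WIDTH = 0.25  # width of each bin, in feet


def bin_index(x):
    """Index i of the bin [START + i*BIN_WIDTH, START + (i+1)*BIN_WIDTH) containing x."""
    return math.floor((x - START) / BIN_WIDTH)


counts = Counter(bin_index(h) for h in heights)
total = len(heights)
# Dividing raw counts by the total turns a frequency histogram into a probability histogram.
probabilities = {i: c / total for i, c in sorted(counts.items())}

for i, p in probabilities.items():
    low = START + i * BIN_WIDTH
    print(f"{low:4.2f}-{low + BIN_WIDTH:4.2f} ft | {'#' * round(p * 100)} {p:.3f}")

within = sum(1 for h in heights if 5.0 <= h <= 6.0) / total
print(f"Share of heights between 5 and 6 ft: {within:.3f}")
```

With these assumed parameters, 5 to 6 feet spans two standard deviations on either side of the mean, so roughly 95% of the simulated heights fall in that range, and the two tallest bars sit just either side of 5.5 ft.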
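For samples of size n, the sampling distribution of the sample mean $\bar{x}$ has the following mean and standard deviation (the latter is called the standard error of the mean):
$$\mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
where μ = population mean, σ = population standard deviation, and n = sample size.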
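The two formulas can be checked numerically. This Python sketch assumes a uniform population on [0, 10] (clearly not normal), whose mean is 5 and whose standard deviation is 10/√12 ≈ 2.887. It draws 20,000 samples, each made of n = 25 independent values, and compares the mean and standard deviation of the resulting sample means with μ and σ/√n. The population, the sample size, and the number of repetitions are illustrative assumptions.

```python
import math
import random
import statistics

rng = random.Random(7)
MU = 5.0                    # mean of the assumed Uniform(0, 10) population
SIGMA = 10 / math.sqrt(12)  # standard deviation of Uniform(0, 10), about 2.887
n = 25                      # sample size
repeats = 20_000            # number of samples drawn

# Each sample: n independent draws from the population; keep only its mean.
means = [statistics.fmean([rng.uniform(0, 10) for _ in range(n)]) for _ in range(repeats)]

observed_mean = statistics.fmean(means)
observed_se = statistics.stdev(means)
predicted_se = SIGMA / math.sqrt(n)

print(f"mean of sample means: {observed_mean:.3f}  (theory: mu = {MU})")
print(f"std of sample means:  {observed_se:.3f}  (theory: sigma/sqrt(n) = {predicted_se:.3f})")
```

The observed standard deviation of the sample means should land close to the predicted standard error of about 0.577, which is five times smaller than σ because √25 = 5.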