Statistics is the study of data collected for a specific purpose: the data is analyzed, summarized, and then interpreted. Representing the data graphically gives a clear and coherent idea of its salient features, and summarizing it by a single representative value gives a measure of central tendency. We have come across these measures as the mean, median, and mode, and they tell us where the data is centered. However, it is equally important to know how widely the data is scattered, or how closely it is bunched around the measure of central tendency, whether the data is grouped or ungrouped. In simple words, what a measure of central tendency cannot tell us, a measure of dispersion can. In this article, we will look at what measures of dispersion mean and how to calculate them.
Definition of Measures of Dispersion
Dispersion in simple words means “scattered” or “spread”. In statistical data, dispersion refers to the extent to which the data is distributed. It can either be tightly clustered or widely scattered.
For example, the values 40, 80, 120, 160, … are widely scattered, while the values 1, 1, 2, 2, 3, 4, 4, … are tightly clustered.
Characteristics of Measures of Dispersion
- A measure of dispersion should be rigidly defined.
- It should be easy to calculate and comprehend.
- It should be based on all the observations.
- It should not be unduly affected by fluctuations in the observations, such as a few extreme values.
Classification of Measures of Dispersion
Measures of Dispersion are classified broadly into two types:
- Absolute Measure of Dispersion
It includes the range, quartile deviation, mean deviation, and the standard deviation.
- Range
Range can be defined as the difference that exists between the maximum and the minimum value of a data set.
For example, if a batsman scored 36 in the first game, 45 in the second, 68 in the third, and 78 in the fourth, the range is 78 − 36 = 42.
Range is the easiest measure of dispersion to comprehend for any given set of values.
Range = Xmax − Xmin
where Xmax and Xmin are the two extreme observations, the largest and the smallest.
Features:
- It is easy to find.
- It is easy to comprehend.
- It is the simplest measure of dispersion.
- It is independent of a change of origin: adding the same constant to every value leaves the range unchanged.
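The range calculation can be sketched in a few lines of Python. The function name `value_range` is only an illustrative choice; the scores are the batsman's four scores from the example above.

```python
def value_range(values):
    """Return the range: the largest observation minus the smallest."""
    if not values:
        raise ValueError("range is undefined for an empty data set")
    return max(values) - min(values)


scores = [36, 45, 68, 78]  # the batsman's scores from the example
print(value_range(scores))  # 42

# Adding the same constant to every score shifts both extremes equally,
# so the range stays the same.
shifted = [s + 10 for s in scores]
print(value_range(shifted))  # 42
```

Because only the two extreme values are used, a single unusually high or low score changes the range completely.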
- Quartile deviation
The word quartile is derived from “quarter”, meaning one of four parts, so quartiles are the values that divide an ordered set of data into four equal parts. Any given set of data has a smallest value, a largest value, and a median. The middle value between the smallest value and the median is called the first quartile, or Q1. The median of the data set is the second quartile, or Q2. The middle value between the median and the largest value is the third quartile, or Q3.
The quartile deviation, denoted by Q, is calculated as
Q = ½ × (Q3 − Q1)
Features:
- It is more reliable than the range, but it is not the most reliable measure.
- It uses only the middle half of the given data.
- It is independent of a change of origin.
- It is an appropriate measure of dispersion for open-ended distributions.
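As a rough sketch, the quartile deviation can be computed as follows. It assumes the median-of-halves convention described above (Q1 is the median of the lower half, Q3 the median of the upper half, and the overall median is left out of both halves when the number of observations is odd). Other quartile conventions exist and can give slightly different values. The function names and the sample data are illustrative.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = data[:half]
    # For an odd count, skip the middle observation itself.
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)


def quartile_deviation(values):
    """Q = (Q3 - Q1) / 2, half the spread of the middle 50% of the data."""
    q1, _, q3 = quartiles(values)
    return (q3 - q1) / 2


sample = [12, 15, 18, 21, 24, 27]  # illustrative data
print(quartiles(sample))           # (15, 19.5, 24)
print(quartile_deviation(sample))  # 4.5
```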
- Mean deviation
The mean is the average of all the given values, and the mean deviation is the mean of the absolute deviations of the observations from a measure of central tendency. The deviations may be taken about the mean, the median, or the mode. If x1, x2, x3, …, xn are the observations, then the mean deviation of x about the average A (mean, median, or mode) is
$$\text{Mean deviation about } A = \frac{1}{n}\sum_{i=1}^{n} |x_i - A|$$
For a grouped frequency distribution, the mean deviation is calculated as:
$$\text{Mean deviation about } A = \frac{1}{N}\sum_{i} f_i\,|x_i - A|, \qquad N = \sum_{i} f_i$$
Here, $x_i$ is the mid-value of the ith class interval, and
$f_i$ is the frequency of the ith class interval.
Mean deviation can be calculated for three types of series. They are:
- Individual data series: Each observation is listed individually, as the name suggests.
Items | 12 | 15 | 18 | 21 | 24 | 27 |
- Discrete data series: The frequency of each individual value is given, without any class intervals.
Items | 10 | 20 | 30 | 40 | 50 |
Frequency | 5 | 3 | 6 | 4 | 2 |
- Continuous data series: The frequencies are given for class intervals, that is, for ranges of values.
Items | 0-10 | 10-20 | 20-30 | 30-40 | 40-50 |
Frequency | 4 | 5 | 7 | 9 | 3 |
Features:
- All the observations are taken into account.
- It is independent of a change of origin.
- It takes its smallest value when the deviations are taken from the median.
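The two formulas above differ only in the frequencies, so one function can cover all three types of series. The sketch below is a minimal version: the name `mean_deviation` and its parameters are illustrative, and when no central value is passed it takes the deviations about the arithmetic mean. The data comes from the three tables above, with the class mid-values 5, 15, 25, 35, 45 standing in for the continuous series.

```python
def mean_deviation(values, freqs=None, centre=None):
    """Mean of the absolute deviations of `values` about `centre`.

    values: observations, or class mid-values for a continuous series
    freqs:  matching frequencies (None means every frequency is 1)
    centre: the average A to measure from (None means the arithmetic mean)
    """
    if freqs is None:
        freqs = [1] * len(values)
    total = sum(freqs)  # N = sum of the frequencies
    if centre is None:
        centre = sum(f * x for x, f in zip(values, freqs)) / total
    return sum(f * abs(x - centre) for x, f in zip(values, freqs)) / total


# Individual series
individual = mean_deviation([12, 15, 18, 21, 24, 27])
print(individual)  # 4.5

# Discrete series
discrete = mean_deviation([10, 20, 30, 40, 50], [5, 3, 6, 4, 2])
print(discrete)  # 11.0

# Continuous series: each class is represented by its mid-value
mid_values = [5, 15, 25, 35, 45]
continuous = mean_deviation(mid_values, [4, 5, 7, 9, 3])
print(round(continuous, 3))
```

Passing `centre` explicitly, for example the median, gives the mean deviation about that average instead.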
- Standard deviation
Standard deviation, one of the most widely used measures of dispersion, is the positive square root of the mean of the squared deviations of the given values from their mean. In simple words, it is the square root of the variance, where the variance is the average of the squared differences from the mean. It is denoted by the Greek letter sigma (σ) and is also called the root mean square deviation.
The formula for standard deviation is:
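$$\sigma = \sqrt{\frac{1}{n}\sum_{i}(y_i - \bar{y})^2} = \sqrt{\frac{1}{n}\sum_{i} y_i^2 - \bar{y}^2}$$
For a grouped frequency distribution, it is
$$\sigma = \sqrt{\frac{1}{N}\sum_{i} f_i\,(y_i - \bar{y})^2} = \sqrt{\frac{1}{N}\sum_{i} f_i\,y_i^2 - \bar{y}^2}, \qquad N = \sum_{i} f_i$$
The square of the standard deviation is the variance. It is also a measure of dispersion.
$$\sigma^2 = \frac{1}{n}\sum_{i}(y_i - \bar{y})^2 = \frac{1}{n}\sum_{i} y_i^2 - \bar{y}^2$$
For a grouped frequency distribution, it is
$$\sigma^2 = \frac{1}{N}\sum_{i} f_i\,(y_i - \bar{y})^2 = \frac{1}{N}\sum_{i} f_i\,y_i^2 - \bar{y}^2$$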
Features:
- It is the most widely accepted measure of dispersion.
- It is not much affected by fluctuations in observations.
- It is zero if all the observations are equal.
- It is independent of a change of origin but dependent on a change of scale.
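The sketch below computes the standard deviation of ungrouped data in two ways, directly from the definition and from the mean of the squares minus the square of the mean, and checks both against `statistics.pstdev` from the Python standard library. The function names are illustrative, and the data is the batsman's scores from the range example.

```python
from math import isclose, sqrt
from statistics import pstdev


def std_dev_definition(values):
    """Square root of the mean squared deviation from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((y - mean) ** 2 for y in values) / n)


def std_dev_shortcut(values):
    """Square root of (mean of the squares minus square of the mean)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum(y * y for y in values) / n - mean ** 2
    # Rounding error can push a zero variance slightly below zero.
    return sqrt(max(variance, 0.0))


scores = [36, 45, 68, 78]
sd = std_dev_definition(scores)

# Both forms agree with each other and with the library function.
print(isclose(sd, std_dev_shortcut(scores)))  # True
print(isclose(sd, pstdev(scores)))            # True

# Change of origin: adding 100 to every value leaves sigma unchanged.
print(isclose(std_dev_definition([s + 100 for s in scores]), sd))  # True

# Change of scale: doubling every value doubles sigma.
print(isclose(std_dev_definition([2 * s for s in scores]), 2 * sd))  # True
```

Note that `pstdev` divides by n, as the formulas here do. `statistics.stdev` divides by n − 1 and gives the sample standard deviation instead.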
- Relative Measure of Dispersion
It is used to compare the dispersion of two or more data sets whose averages differ widely.
Relative measures of dispersion include:
- Coefficient of Range
- Coefficient of Variation
- Coefficient of Standard Deviation
- Coefficient of Quartile Deviation
- Coefficient of Mean Deviation
Based on different measures of dispersion, the coefficients of dispersion (C.D.) are
- C.D. based on range = (Xmax − Xmin) ⁄ (Xmax + Xmin)
- C.D. based on quartile deviation = (Q3 − Q1) ⁄ (Q3 + Q1)
- C.D. based on mean deviation = Mean deviation ⁄ Average from which it is calculated
- C.D. based on standard deviation = S.D. ⁄ Mean
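Two of these coefficients can be sketched in Python as follows. The function names are illustrative, the data is assumed to be positive so that the denominators are not zero, and the coefficient of variation is expressed as a percentage (100 × S.D. ⁄ Mean).

```python
from math import isclose
from statistics import mean, pstdev


def coefficient_of_range(values):
    """(Xmax - Xmin) / (Xmax + Xmin)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


def coefficient_of_variation(values):
    """100 * S.D. / mean, using the population standard deviation."""
    return 100 * pstdev(values) / mean(values)


scores = [36, 45, 68, 78]  # the batsman's scores from the range example
print(round(coefficient_of_range(scores), 3))  # 0.368

# Relative measures have no units: rescaling the data (here, doubling
# every value) leaves the coefficient of variation unchanged.
doubled = [2 * s for s in scores]
print(isclose(coefficient_of_variation(scores),
              coefficient_of_variation(doubled)))  # True
```

This unit-free property is what makes the coefficients suitable for comparing data sets with very different averages.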
Sample questions
Question. The following table shows the results for companies A and B.
Particulars | Company A | Company B |
Number of employees | 900 | 1000 |
Average daily wage | Rs. 250 | Rs. 220 |
Variance in the distribution of wages | 100 | 144 |
- Which of the companies, A or B, has a larger wage bill?
- Calculate the coefficient of variation for each of the companies A and B.
- Calculate the average daily wage and the variance of the distribution of wages of all the employees of both companies taken together.
Solution:
For Company A
Number of employees, i.e. $n_1 = 900$, and
Average daily wage, i.e. $\bar{y}_1$ = Rs. 250
Now, as we know,
Average daily wage = Total wages ⁄ Total number of employees
or, Total wages = Total employees × average daily wage = 900 × 250 = Rs. 225000 …(i)
For Company B
Number of employees, i.e. $n_2 = 1000$, and
Average daily wage, i.e. $\bar{y}_2$ = Rs. 220
So, Total wages = Total employees × average daily wage = 1000 × 220 = Rs. 220000 …(ii)
Comparing the wages above from (i) and (ii), we see that Company A has a larger wage bill.
For Company A
Variance of the distribution of wages, i.e. $\sigma_1^2 = 100$
C.V. of the distribution of wages = 100 × S.D. of the distribution of wages ⁄ Average daily wage
or, C.V. (A) = 100 × √100 ⁄ 250
= 100 × 10 ⁄ 250
= 4 …(iii)
For Company B
Variance of the distribution of wages, i.e. $\sigma_2^2 = 144$
C.V. (B) = 100 × √144 ⁄ 220
= 100 × 12 ⁄ 220
= 5.45 …(iv)
Comparing (iii) and (iv), we can see that Company B has the greater variability in wages.
For Companies A and B taken together
The average daily wage when both companies are taken together is
$$\bar{y} = \frac{n_1\bar{y}_1 + n_2\bar{y}_2}{n_1 + n_2} = \frac{900 \times 250 + 1000 \times 220}{900 + 1000} = \frac{445000}{1900} = \text{Rs. } 234.21$$
The combined variance is
$$\sigma^2 = \frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}$$
Here,
$$d_1 = \bar{y}_1 - \bar{y} = 250 - 234.21 = 15.79, \qquad d_2 = \bar{y}_2 - \bar{y} = 220 - 234.21 = -14.21$$
Hence,
$$\sigma^2 = \frac{900 \times (100 + 15.79^2) + 1000 \times (144 + (-14.21)^2)}{900 + 1000} = \frac{314391.69 + 345924.10}{1900} = 347.53$$
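The arithmetic for the two companies can be checked with a short Python sketch. The function name `combine` and its parameter names are illustrative; the inputs are the figures from the table in the question.

```python
from math import sqrt


def combine(n1, mean1, var1, n2, mean2, var2):
    """Return the mean and variance of two groups taken together."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    d1, d2 = mean1 - mean, mean2 - mean  # each group mean minus combined mean
    var = (n1 * (var1 + d1 ** 2) + n2 * (var2 + d2 ** 2)) / n
    return mean, var


# Coefficient of variation of each company: 100 * S.D. / mean
cv_a = 100 * sqrt(100) / 250
cv_b = 100 * sqrt(144) / 220
print(cv_a, round(cv_b, 2))  # 4.0 5.45

combined_mean, combined_var = combine(900, 250, 100, 1000, 220, 144)
print(round(combined_mean, 2), round(combined_var, 2))  # 234.21 347.53
```

The combined variance is larger than either company's own variance because it also includes the gap between each company's average wage and the overall average.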
Question: A teacher asked students to complete a total of 60 pages of a record notebook. Eight students have so far completed 32, 35, 37, 30, 33, 36, 35, and 37 pages. Find the standard deviation of the number of pages yet to be completed by them.
Solution:
By Assumed Mean Method:
Pages yet to be completed are 28, 25, 23, 30, 27, 24, 25, and 23
Assumed mean A = 25 and n = 8
xi | di = xi − A = xi − 25 | di² |
23 | -2 | 4 |
23 | -2 | 4 |
24 | -1 | 1 |
25 | 0 | 0 |
25 | 0 | 0 |
27 | 2 | 4 |
28 | 3 | 9 |
30 | 5 | 25 |
Total | ∑di = 5 | ∑di² = 47 |
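Standard Deviation
$$\sigma = \sqrt{\frac{\sum d_i^2}{n} - \left(\frac{\sum d_i}{n}\right)^2} = \sqrt{\frac{47}{8} - \left(\frac{5}{8}\right)^2} = \sqrt{\frac{47}{8} - \frac{25}{64}} = \sqrt{\frac{351}{64}} = \frac{18.735}{8} = 2.34$$
Therefore, the S.D. of the pages yet to be completed is 2.34.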
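The assumed mean method used in this solution can be sketched in Python and checked against `statistics.pstdev`. The function name `std_dev_assumed_mean` is illustrative; the data and the assumed mean A = 25 are those of the question.

```python
from math import isclose, sqrt
from statistics import pstdev


def std_dev_assumed_mean(values, assumed_mean):
    """Standard deviation from deviations about an assumed mean A."""
    n = len(values)
    d = [x - assumed_mean for x in values]  # d_i = x_i - A
    return sqrt(sum(di ** 2 for di in d) / n - (sum(d) / n) ** 2)


completed = [32, 35, 37, 30, 33, 36, 35, 37]
remaining = [60 - pages for pages in completed]  # pages yet to be completed

sd = std_dev_assumed_mean(remaining, 25)
print(round(sd, 2))  # 2.34

# Any assumed mean gives the same result as the direct calculation.
print(isclose(sd, pstdev(remaining)))                        # True
print(isclose(sd, std_dev_assumed_mean(remaining, 0)))       # True

# Subtracting every value from 60 does not change the spread, so the
# pages already completed have the same standard deviation.
print(isclose(sd, pstdev(completed)))                        # True
```

Choosing A close to the data only keeps the deviations small, which makes the hand calculation easier; it does not affect the answer.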