Statistics is a field of mathematics concerned with the study of data collection, analysis, interpretation, presentation, and organization. It is mostly used to acquire a better understanding of data and to focus on specific applications.
- Statistics is the process of gathering, assessing, and summarising data into a mathematical form.
- Historically, statistics was associated with state science.
- It was used to gather and analyze facts and data about a country's economy and population.
- A statistical analysis starts from a statistical population or a statistical model.
- Statistics specify ways in which data can be used to solve complex problems.
- It draws on linear algebra, differential equations, and probability theory.
- Weather forecasting, health insurance, and sales tracking are examples of applications of statistics.
Key Terms: Statistics, Population, Mean, Median, Mode, Variance, Standard Deviation, Descriptive Statistics, Inferential Statistics, Central tendency, Dispersion, Skewness, Bar Graph, Line Chart, Pie Graph
What is Statistics?
Statistics is a field of mathematics used for analyzing and manipulating data. It works on every aspect, including planning, collecting, and representing data.
- Statistics provides a clear picture of the work you do regularly.
- Data is divided into quantitative and qualitative information.
- According to Sir Arthur Lyon Bowley, statistics are numerical statements of facts in any department of enquiry, placed in relation to each other.
- The process is used to measure central tendency and dispersion.
Example of Statistics: Statistics is used to analyse the traffic situation of a city. Engineers regularly monitor the total traffic across the city to decide whether roads should be added or removed, so that traffic flows smoothly.
Types of Statistics
In mathematical statistics, there are two types of statistics for analysing data that are widely used:
Descriptive Statistics
Descriptive statistics describe and summarise the collected data and its attributes, mainly through measures of central tendency and measures of dispersion.
Inferential Statistics
This statistical strategy is used to generate conclusions from data. Inferential statistics rely on statistical tests on samples to make inferences. It does so by discovering differences between the two groups.
- The p-value is calculated and compared with the significance level (α), which is commonly set to 0.05.
- If the p-value is less than or equal to α, the result is considered statistically significant.
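As a rough illustration of comparing a p-value with α, the sketch below runs an exact permutation test on the difference between two group means. The choice of test, the function name `permutation_p_value`, and the scores are all assumptions made for this example; the text does not say which statistical test is used.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference of group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    count = 0
    # Enumerate every way of choosing which pooled values form "group A".
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count


ALPHA = 0.05                      # significance level
group_a = [12, 14, 15, 16, 18]    # hypothetical scores
group_b = [21, 23, 24, 26, 27]    # hypothetical scores

p_value = permutation_p_value(group_a, group_b)
significant = p_value <= ALPHA
print(f"p-value = {p_value:.4f}, significant: {significant}")
```

The p-value here is the share of all possible regroupings that produce a difference at least as large as the observed one, so a small value means the observed difference would rarely arise by chance alone.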
What is Data in Statistics?
A data set is a collection of observations and information. These facts and observations can be expressed as numbers, measures, or statements.
- Qualitative and quantitative data are the two types of data available.
- When the data is descriptive or categorical, it is called qualitative data.
- When the data is numerical, it is called quantitative data.
After we know the data gathering procedures, we want to depict the obtained data in several forms of graphs such as a bar graph, line graph, pie chart, stem and leaf plots, scatter plot, and so on.
- Outliers caused by variability or measurement error are usually identified, and removed where appropriate, before the data is analysed.
Representation of Data
Let's have a look at the many types of data representation used in statistics.
Bar Graph
A bar graph is a collection of data depicted by rectangular bars with lengths proportional to the values. The bars can be plotted either in vertical or horizontal orientation.
Pie Chart
The pie chart is a graph in which a circle is divided into sectors, each representing a percentage of the total.
Line Graph
The data is represented in a line graph as a series of points, referred to as markers, connected by straight line segments.
Pictograph
A pictograph is data presented in the form of visuals. Pictorial symbols stand for words, objects, or sentences, and each symbol can represent a given number of items.
Histogram
The histogram is a graph made of rectangles in which the area of each rectangle is proportional to the frequency of a variable, and the width is equal to the class interval.
Frequency Distribution
In statistics, the frequency distribution table shows the data in ascending order with their corresponding frequencies. The letter f frequently symbolises the frequency of data.
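A frequency distribution table can be sketched in a few lines. The list `marks` is a small hypothetical data set chosen for illustration.

```python
from collections import Counter

marks = [73, 89, 73, 79, 73, 60]   # hypothetical observations

# Count how often each distinct value occurs.
frequency = Counter(marks)

# Print the table in ascending order of the observed value.
print("value\tf")
for value in sorted(frequency):
    print(f"{value}\t{frequency[value]}")
```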
Measures of Central Tendency
The basis of descriptive statistics is the Measures of Central Tendency and the Measures of Dispersion. The measure of central tendency tells us where data points are centered.
- It is called the representative value for the given data.
- This is done to see how the data is dispersed around the centered metric.
- To find the central tendency, we use the mean, median, and mode.
Example of Central Tendency: In daily life we come across the average height of students, the average wealth, the average exam result, or the average height of players.
The following are the many measures of central tendency for the data:
Mean
Mean is defined as the arithmetic average of a data set. It is calculated by adding all of the numbers in the set and dividing by the number of observations in the data set.
Example of Mean: Consider the following data set, which represents the marks obtained by different students in a subject: 73, 89, 73, 79, 73, 60. Calculate the mean.
Ans. The mean of the marks obtained by students: (73 + 89 + 73 + 79 + 73 + 60)/6 = 447/6 = 74.5
Median
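The median is the middle value of the data set once the values are arranged in ascending or descending order. When the number of observations is even, the median is the average of the two middle values.
Example of Median: Consider the following data set, which represents the marks obtained by different students in a subject (the data set used in the mean example): 73, 89, 73, 79, 73, 60. Calculate the median.
Ans. Arranged in ascending order, the marks are 60, 73, 73, 73, 79, 89. The number of terms n = 6 is even, so the median is the average of the (n/2)th and (n/2 + 1)th terms, i.e., the 3rd and 4th terms: (73 + 73)/2 = 73.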
Mode
The mode is the value that appears most often in a data set.
Example of Mode: Consider the following data set, which represents the marks obtained by different students in a subject (the data set used in the mean example): 73, 89, 73, 79, 73, 60. Calculate the mode.
Ans. The value 73 appears three times, more often than any other value, so the mode of the marks obtained by students is 73.
Formula for Mean Median and Mode
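As a quick check of the three measures, the sketch below uses Python's standard `statistics` module on the marks 73, 89, 73, 79, 73, 60. The variable names are chosen for this example only.

```python
from statistics import mean, median, mode

marks = [73, 89, 73, 79, 73, 60]

mean_marks = mean(marks)       # sum of the values divided by their count
median_marks = median(marks)   # average of the two middle values (n is even)
mode_marks = mode(marks)       # most frequent value

print("mean:", mean_marks)
print("median:", median_marks)
print("mode:", mode_marks)
```

For an even number of observations, `median` sorts the data and averages the two middle values, which matches the definition of the median given earlier.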
Measures of Dispersion
The measures of central tendency are insufficient to convey all of the information about a set of data. As a result, we must use a measure of dispersion to characterize the variability.
- Dispersion describes how spread out the values in a data set are.
- It determines the extent to which numerical data vary with respect to the average value.
- A measure of dispersion is also known as a measure of variability.
The following are the various measures of dispersion:
Range
In statistics, the range is the difference between the maximum and minimum values of the data points.
Range = Maximum Value – Minimum Value
Standard Deviation and Variance
The variance and standard deviation are two more well-known measures of dispersion. When calculating the mean deviation about the mean or the median, absolute values of the deviations are taken so that positive and negative deviations do not cancel. Another way to avoid this sign problem is to square each deviation.
Consider the sum of squared deviations \(\sum_{i=1}^{n}(X_i - \bar{X})^2\).
- When this sum is zero, every observation equals the mean and there is no dispersion at all.
- The observations are closer to the mean if the sum is small, indicating a reduced degree of dispersion.
- There is a greater degree of dispersion of the observations from the mean when the sum is large.
- The mean of the squared deviations is termed the variance and is denoted as
\(\sigma^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n}\)
- The standard deviation is the positive square root of the variance and is denoted as
\(\sigma = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n}}\)
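The range, population variance, and standard deviation can be sketched directly from their definitions. The helper names and the data set below are assumptions made for illustration; the variance divides by n, as in the formula above.

```python
from math import sqrt


def data_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)


def population_variance(values):
    """Mean of the squared deviations from the mean (divides by n)."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / n


def population_std(values):
    """Positive square root of the population variance."""
    return sqrt(population_variance(values))


values = list(range(1, 11))   # the numbers 1 to 10

print("range:", data_range(values))
print("variance:", population_variance(values))
print("standard deviation:", round(population_std(values), 4))
```

The standard library functions `statistics.pvariance` and `statistics.pstdev` compute the same population quantities, while `statistics.variance` and `statistics.stdev` divide by n - 1 instead.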
Quartile Deviation
The quartile deviation is an absolute measure of dispersion. The quartiles divide the ordered data into four equal parts. First calculate the median of the data points.
- The lower quartile is the median of the data points to the left of (below) this median.
- The upper quartile is the median of the data points to the right of (above) this median.
- The interquartile range is defined as the difference between the upper and lower quartiles.
- The quartile deviation is half of the interquartile range.
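A minimal sketch of this procedure follows, taking the lower quartile as the median of the lower half of the sorted data and the upper quartile as the median of the upper half. Several quartile conventions exist, so other tools may return slightly different values; the function name and data are hypothetical.

```python
from statistics import median


def quartile_deviation(values):
    """Return (Q1, Q3, IQR, quartile deviation) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    # For an odd count, the overall median itself is left out of both halves.
    upper_half = data[half + 1:] if n % 2 else data[half:]
    q1 = median(lower_half)
    q3 = median(upper_half)
    iqr = q3 - q1
    return q1, q3, iqr, iqr / 2


q1, q3, iqr, qd = quartile_deviation([56, 60, 50, 42, 78, 40, 20])
print("Q1:", q1, "Q3:", q3, "IQR:", iqr, "quartile deviation:", qd)
```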
Mean Deviation
The mean deviation is a statistical measure used to calculate the average absolute difference between the items in a distribution and the series' mean or median.
Mean Deviation For ungrouped data
The frequency distributions of data in statistics can be discrete or continuous. For ungrouped data consisting of n individual observations,
\(X_1, X_2, X_3, \ldots, X_n\)
- The following formula is used to compute the mean deviation about the mean. For the mean deviation about the median, replace \(\bar{X}\) with the median.
- Mean Deviation for ungrouped data = Sum of absolute deviations / number of observations
= \(\frac{\sum_{i=1}^{n}|X_i - \bar{X}|}{n}\)
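The calculation can be sketched as follows, taking absolute deviations about either the mean or the median. The function name `mean_deviation`, its `about` parameter, and the data are assumptions made for this example.

```python
from statistics import mean, median


def mean_deviation(values, about="mean"):
    """Average absolute deviation about the mean (default) or the median."""
    center = mean(values) if about == "mean" else median(values)
    return sum(abs(x - center) for x in values) / len(values)


marks = [73, 89, 73, 79, 73, 60]   # hypothetical observations

md_mean = mean_deviation(marks, about="mean")
md_median = mean_deviation(marks, about="median")
print("mean deviation about mean:", round(md_mean, 3))
print("mean deviation about median:", round(md_median, 3))
```

Without the absolute value, the deviations about the mean always sum to zero, which is why the absolute value is essential here.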
Coefficient of Variation
The coefficient of variation is used to compare the variability of two or more frequency distributions. It is the ratio of the standard deviation to the mean, expressed as a percentage.
\(CV = \frac{\sigma}{\bar{X}} \times 100\)
The distribution with a higher coefficient of variation has more variability around the central value than the distribution with a lower coefficient of variation value.
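The comparison can be sketched as below, using the population standard deviation. The two series are hypothetical and were chosen to have the same mean but different spread.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """CV = (standard deviation / mean) * 100, using the population std."""
    return pstdev(values) / mean(values) * 100


series_a = [48, 50, 52]   # hypothetical, tightly clustered around 50
series_b = [30, 50, 70]   # hypothetical, widely spread around 50

cv_a = coefficient_of_variation(series_a)
cv_b = coefficient_of_variation(series_b)
print(f"CV of A: {cv_a:.2f}%  CV of B: {cv_b:.2f}%")
print("More variable series:", "B" if cv_b > cv_a else "A")
```

Because the coefficient of variation is a ratio, it has no units, so it can compare distributions measured on different scales. It is not meaningful when the mean is zero.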
Different Models of Statistics
The different models of statistics are as follows:
Skewness
Skewness is a measure of the asymmetry of a probability distribution, that is, of how far it departs from a symmetric distribution such as the normal distribution. Its value can be positive, negative, or zero.
- The curve is shifted to the left or right when it is skewed.
- When the tail of the curve extends toward the right, the distribution is positively skewed.
- When the tail of the curve extends toward the left, the distribution is negatively skewed.
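The text does not give a formula for skewness. One common choice, assumed here, is the moment coefficient of skewness: the third central moment divided by the second central moment raised to the power 3/2. The data sets below are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    x_bar = sum(values) / n
    m2 = sum((x - x_bar) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - x_bar) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5


right_tailed = [1, 2, 2, 3, 3, 3, 10]   # long tail toward the right
symmetric = [1, 2, 3, 4, 5]             # mirror image about its mean
left_tailed = [-x for x in right_tailed]

for name, data in [("right-tailed", right_tailed),
                   ("symmetric", symmetric),
                   ("left-tailed", left_tailed)]:
    print(f"{name}: skewness = {skewness(data):.3f}")
```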
ANOVA Statistics
ANOVA stands for Analysis of Variance. The model is used to test whether the mean values of two or more groups differ, by comparing the variation between the groups with the variation within each group.
Degree of Freedom
The degrees of freedom are the number of values in a calculation that are free to vary independently. For example, once the mean of n observations is fixed, only n – 1 of the deviations from the mean can vary freely.
Regression Analysis
Regression Analysis is a type of statistics model that is used to determine the relationship between variables. It gives the relation between dependent variables and independent variables.
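As a small illustration, the sketch below fits a straight line y = intercept + slope · x by ordinary least squares. The text does not specify a regression method, so simple linear regression is an assumption here, and the data (hours studied against exam score) are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


hours = [1, 2, 3, 4, 5]          # independent variable (hypothetical)
scores = [52, 54, 56, 58, 60]    # dependent variable (hypothetical)

slope, intercept = fit_line(hours, scores)
print(f"score = {intercept:.1f} + {slope:.1f} * hours")
```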
Mean Deviation for Discrete Grouped data
In a discrete frequency distribution, each distinct observation is listed together with its frequency. Let's say there are n different data points:
\(x_1, x_2, \ldots, x_n\) with frequencies \(f_1, f_2, \ldots, f_n\)
- Mean deviation about mean
The mean is the total of the products of the observations xi and their respective frequencies fi, divided by the sum of the frequencies:
\(\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{N}, \quad N = \sum_{i=1}^{n} f_i\)
The mean deviation about the mean is then
\(MD(\bar{x}) = \frac{\sum_{i=1}^{n} f_i |x_i - \bar{x}|}{N}\)
- Mean deviation about median
To find the median, arrange the observations in increasing order. Calculate the cumulative frequencies. Then find the observation whose cumulative frequency is equal to or just greater than N/2, where N = sum of frequencies.
- This observation is the median M.
- We then take the absolute values of the deviations from the median and calculate
\(MD(M) = \frac{\sum_{i=1}^{n} f_i |x_i - M|}{N}\)
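The calculation for a discrete frequency table can be sketched as follows. The function name is hypothetical and the values and frequencies are illustrative.

```python
def grouped_mean_deviation(values, freqs):
    """Return (mean, mean deviation about the mean) for a frequency table."""
    total = sum(freqs)                                   # N = sum of frequencies
    mean = sum(f * x for x, f in zip(values, freqs)) / total
    deviation = sum(f * abs(x - mean) for x, f in zip(values, freqs)) / total
    return mean, deviation


sizes = [2, 4, 6, 8, 10]        # observations x_i
frequencies = [2, 4, 5, 3, 1]   # frequencies f_i

mean_x, md_x = grouped_mean_deviation(sizes, frequencies)
print("mean:", round(mean_x, 3))
print("mean deviation about mean:", round(md_x, 3))
```

Each absolute deviation is weighted by its frequency, which is equivalent to writing out every observation individually and applying the ungrouped formula.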
Continuous Grouped Data Mean Deviation
The data points in this case can take any value within a range and are continuous. Intervals on the real number line can be used to measure and represent them. The data is organized into class intervals, and the frequency of each class is recorded.
- Mean deviation about mean
For a continuous frequency distribution, the frequency in each class is assumed to be centered at the midpoint of the class. Then, like in the case of discrete frequency distribution, the same process is performed using the midpoints as the observations.
- Mean deviation about median
\(\text{Median} = l + \frac{\frac{N}{2} - C}{f} \times h\)
- where the median class is the class interval whose cumulative frequency is equal to or just greater than N/2, N being the sum of frequencies.
- l, f, and h are the lower limit, frequency, and width of the median class, respectively.
- C is the cumulative frequency of the class immediately preceding the median class.
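A minimal sketch of locating the median class and applying the interpolation formula Median = l + ((N/2 - C)/f) × h is given below. The function name, its parameters, and the equal class width are assumptions for this example.

```python
def grouped_median(lower_limits, width, freqs):
    """Median of a continuous frequency distribution with equal class widths."""
    total = sum(freqs)                    # N = sum of frequencies
    if total == 0:
        raise ValueError("frequency table is empty")
    cumulative = 0                        # C: cumulative frequency before the class
    for lower, f in zip(lower_limits, freqs):
        # The median class is the first class whose cumulative frequency reaches N/2.
        if cumulative + f >= total / 2:
            return lower + ((total / 2 - cumulative) / f) * width
        cumulative += f
    raise ValueError("median class not found")


lower_limits = [0, 10, 20, 30, 40]    # classes 0-10, 10-20, ..., 40-50
frequencies = [13, 22, 25, 20, 10]

median_value = grouped_median(lower_limits, width=10, freqs=frequencies)
print("median:", median_value)
```

Once the median is known, the mean deviation about the median follows the discrete procedure, with the class midpoints taking the place of the observations.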
Things to remember
- Statistics is the science of data collecting and organisation.
- Measures of central tendency and measures of dispersion are used to analyse and interpret the data.
- Bar graphs, histograms, pie charts, stem and leaf plots, line graphs, and ogives are all used to show the frequency distribution of data.
- The data collected may be quantitative (numerical, either discrete or continuous) or qualitative (descriptive or categorical).
Sample questions
Ques. Calculate the standard deviation for the numbers 8,10,12,14,16. (3 marks)
Ques. Calculate the mean deviation from the mean using the given information. (3 marks)
Size(x) | 2 | 4 | 6 | 8 | 10 |
---|---|---|---|---|---|
Frequency f | 2 | 4 | 5 | 3 | 1 |
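Ques. The variance of five observations is 8.24, and the mean is 4.4. Find the other two observations if three of the observations are 1, 2, and 6. (5 marks)
Ans. Let's call the remaining two observations a and b.
Mean = (1 + 2 + 6 + a + b)/5 = 4.4, so a + b = 22 – 9 = 13.
Variance = (Σxi²)/5 – (mean)² = 8.24, so Σxi² = 5 × (8.24 + 19.36) = 138 and a² + b² = 138 – (1 + 4 + 36) = 97.
Then 2ab = (a + b)² – (a² + b²) = 169 – 97 = 72, so ab = 36.
The two numbers with sum 13 and product 36 are 4 and 9.
4 and 9 are the other two observations.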
Ques. The table given below shows the marks obtained by 90 students in class. What will be the mean marks of the students? Use the assumed mean method. (5 marks)
Class | 0-10 | 10-20 | 20-30 | 30-40 | 40-50 |
---|---|---|---|---|---|
Frequency | 13 | 22 | 25 | 20 | 10 |
Ans. The given data is:
Class (Ci) | Frequency (fi) | Class mark (xi) | di = xi – a | fidi |
---|---|---|---|---|
0-10 | 13 | 5 | 5 – 25 = – 20 | -260 |
10-20 | 22 | 15 | 15 – 25 = – 10 | -220 |
20-30 | 25 | 25 = a | 25-25 = 0 | 0 |
30-40 | 20 | 35 | 35-25 = 10 | 200 |
40-50 | 10 | 45 | 45-25 = 20 | 200 |
Total | Σfi = 90 | | | Σfidi = -80 |
- Let us take Assumed mean = a = 25
- Using the Assumed Mean Method
- Mean = a + (Σfidi / Σfi)
- = 25 + (-80 / 90)
- = 25 – 8/9
- = (225 – 8) / 9
- = 217 / 9 ≈ 24.1
Thus, the mean marks of the students in the class are approximately 24.1.
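The assumed mean method can be sketched as a short function. The function name and parameter names are hypothetical; the class marks and frequencies are those of the table above.

```python
def assumed_mean_method(class_marks, freqs, assumed_mean):
    """Mean = a + (sum of f_i * d_i) / (sum of f_i), where d_i = x_i - a."""
    sum_fd = sum(f * (x - assumed_mean) for x, f in zip(class_marks, freqs))
    return assumed_mean + sum_fd / sum(freqs)


class_marks = [5, 15, 25, 35, 45]     # midpoints of 0-10, 10-20, ..., 40-50
frequencies = [13, 22, 25, 20, 10]

mean_marks = assumed_mean_method(class_marks, frequencies, assumed_mean=25)
print("mean marks:", round(mean_marks, 2))

# The result does not depend on which value is chosen as the assumed mean.
check = assumed_mean_method(class_marks, frequencies, assumed_mean=15)
print("same result with a = 15:", abs(mean_marks - check) < 1e-9)
```

The assumed mean only shifts the numbers to make hand calculation easier, so any convenient class mark gives the same final mean.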
Ques. The average of the five integers is 20. If one of the numbers is left out, the mean is 16. Find the number that isn't included. (5 marks)
Ans. Given, n = 5, X̄ = 20
X̄ = (∑xi)/n
∑xi= 5 × 20 = 100
Thus, the sum of 5 observations = 100
Let "a" be the excluded number
Total of 4 numbers = 100 - a
Mean of 4 numbers = (100 - a)/4
16 = (100 - a)/4
100 - a = 64
a = 36
⇒ The missing number is 36.
Ques. Consider the following numbers: 56, 60, 50, 42, 78, 40, 20. What is the median? (3 marks)
Ans. In increasing order, the numbers are 20,40,42,50,56,60,78
n (number of observations) = 7 in this case.
Since n is odd, the median is the ((n + 1)/2)th observation.
As a result, (7 + 1)/2 = 4
4th observation = median
50 is the median.
Ques. Find the variance of the numbers 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. (4 marks)
Ans: Find the mean value of 10 values given above
Mean= (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10) / 10
Mean = 55/ 10
Mean = 5.5
Value N | N – N̄ | (N – N̄)² |
---|---|---|
1 | -4.5 | 20.25 |
2 | -3.5 | 12.25 |
3 | -2.5 | 6.25 |
4 | -1.5 | 2.25 |
5 | -0.5 | 0.25 |
6 | +0.5 | 0.25 |
7 | +1.5 | 2.25 |
8 | +2.5 | 6.25 |
9 | +3.5 | 12.25 |
10 | +4.5 | 20.25 |
Total | 0 | 82.50 |
Now, to find population variance:
\(\sigma^2 = \frac{\Sigma (N - \bar{N})^2}{n}\)
= 82.5/10
= 8.25
Ques. 140 cm, 150 cm, 139 cm, 150 cm, and 123 cm are the heights of five persons, find the mean height. (3 marks)
Ans. Mean height = (140 + 150 + 139 + 150 + 123)/5 = 702/5 = 140.4
As a result, the mean height is 140.4 cm.
Ques. A group of students surveyed 200 homes in a locality on the number of plants they have in their homes. Find the mean number of plants per house. (5 marks)
Number of Plants | 0 - 2 | 2 - 4 | 4 - 6 | 6 - 8 | 8 - 10 | 10 - 12 | 12 - 14 |
---|---|---|---|---|---|---|---|
Number of Houses | 10 | 20 | 10 | 50 | 60 | 20 | 30 |
Ans. The data is given as:
No. of Plants | No.of Houses (fi) | Xi | di= xi - a | fidi |
---|---|---|---|---|
0-2 | 10 | 1 | 1-7=-6 | -60 |
2-4 | 20 | 3 | 3-7=-4 | -80 |
4-6 | 10 | 5 | 5-7=-2 | -20 |
6-8 | 50 | 7=a | 7-7=0 | 0 |
8-10 | 60 | 9 | 9-7=2 | 120 |
10-12 | 20 | 11 | 11-7=4 | 80 |
12-14 | 30 | 13 | 13-7=6 | 180 |
Total | Σfi = 200 | | | Σfidi = 220 |
- We have taken 7 as the assumed mean here.
- Using the assumed mean method,
- Mean = a + (Σfidi / Σfi)
- = 7 + (220 / 200)
- = 7 + 1.1
- = 8.1
The required answer is 8.1 plants per house.
Ques. Find the median of the data 30, 32, 41, 44, 42, 24, 40, 49, 33. (2 marks)
Ans. Arrange the data in ascending order we get,
24, 30, 32, 33, 40, 41, 42, 44, 49
Here, n = 9 which is odd
Median = Value of ((9 + 1)/2)th observation = Value of 5th observation = 40.
Ques. The average of the five integers is 30. If one of the numbers is left out, the mean is 15. Find the number that isn't included. (5 marks)
Ans. Given, n = 5, X̄ = 30
X̄ = (∑xi)/n
∑xi= 5 × 30 = 150
Thus, the sum of 5 observations = 150
Let "a" be the excluded number
Total of 4 numbers = 150 - a
Mean of 4 numbers = (150 - a)/4
15 = (150 - a)/4
150 - a = 60
a = 90
⇒ The missing number is 90.