Gate 2 Quality
Basic Statistics

Statistics is one of the most powerful tools available and, like many other things, it is strongly inspired by nature. It is not only a subject but also a tool that can be used in day-to-day life. When we go to the market to buy rice, we pick up a handful, inspect it, and decide whether it is acceptable; this is nothing but what statistics calls SAMPLING. Statistics comprises a large number of tools that are now used in different industries to monitor, control and improve existing processes. Six Sigma, SPC/SQC (Statistical Process Control / Statistical Quality Control), Reliability and DOE (Design of Experiments) are some of the best-known and most commonly used tools in industry.

Statistics: The science that deals with the collection, classification and analysis of data or information, and with making inferences from it. It is the art of making decisions about a process or population based on the analysis of the information contained in a sample from that population. It is divided into two branches:
Descriptive: Describing the characteristics of a product or process using the information collected on it. It uses simple numerical and graphical techniques to summarize the information in the data.
Inferential: Drawing conclusions about unknown process parameters on the basis of the information gathered from a sample.
Descriptive and inferential statistics are inter-related, i.e. each makes use of the other.

Data:
The information collected on the product or process of interest is called data. Data is described by a random variable, e.g. length, height, weight, etc. Random variables can be of two types:
 
Continuous: A random variable that can assume any value within a given range, e.g. length, weight, etc.

Discrete/Attribute: A random variable that can assume only a finite or countably infinite set of values, such as whole numbers, e.g. the number of defective screws.

Accuracy & Precision:
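Accuracy and precision are the two most important factors that must both be achieved in order to obtain the desired result again and again. Accuracy is the closeness of the results to the true or target value; precision is the closeness of repeated results to one another.

[Figure: four cases, labelled from left to right "None", "Precise but not Accurate", "Accurate but not Precise" and "Accurate and Precise"]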
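One possible way to put numbers on these cases is sketched below in Python. It assumes that accuracy is summarized by the bias (the mean of the measurements minus the true value) and precision by the sample standard deviation; that choice, the function name `bias_and_spread` and the measurement values are illustrative assumptions, not part of the original text.

```python
import statistics

def bias_and_spread(measurements, true_value):
    """Return (bias, spread) for repeated measurements of one quantity.

    bias   = mean of the measurements minus the true value (small = accurate)
    spread = sample standard deviation (small = precise)
    """
    bias = statistics.mean(measurements) - true_value
    spread = statistics.stdev(measurements)
    return bias, spread

# Hypothetical repeated measurements of a part whose true length is 50.0 mm.
true_length = 50.0
precise_not_accurate = [52.0, 52.1, 51.9, 52.0]  # tightly grouped, off target
accurate_not_precise = [48.0, 52.0, 49.0, 51.0]  # centred on target, scattered

bias_a, spread_a = bias_and_spread(precise_not_accurate, true_length)
bias_b, spread_b = bias_and_spread(accurate_not_precise, true_length)
print(bias_a, spread_a)  # large bias, small spread
print(bias_b, spread_b)  # bias near zero, large spread
```

A process that is both accurate and precise would show a bias near zero and a small spread at the same time.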

Measure of Location / Central Tendency: As the name suggests, this tells us about the location of the data. It describes where the data cluster and gives information about the center of the data. The measures of location / central tendency are

Mean:
It is simply the numerical average of the data. It shows where, on average, the data lie. The mean calculated from the sample observations is called the SAMPLE MEAN and is denoted by "Xbar", whereas the mean calculated from the whole population is called the POPULATION MEAN and is denoted by "µ". The population mean is also written E(X), the Expected Value of the random variable X; the sample mean is an estimate of it.

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

where $\bar{X}$ is "Xbar" and n is the sample size.

The mean is the most widely used measure of central tendency because it is easy to calculate and understand.
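As a minimal sketch of the sample mean formula, the Python snippet below adds up the observations and divides by their count. The function name `sample_mean` and the data values are made up for illustration.

```python
def sample_mean(values):
    """Return the sum of the observations divided by their count n."""
    return sum(values) / len(values)

# Hypothetical sample: five measured lengths in mm (illustrative values only).
lengths = [10.0, 12.0, 11.0, 13.0, 14.0]

x_bar = sample_mean(lengths)
print(x_bar)  # 12.0
```

Python's standard library offers the same calculation as `statistics.mean`.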

Median:
If the data are ranked in either ascending or descending order, then the value in the middle is the median. The median is the value that has 50% of the values less than or equal to it. If the number of data points is ODD, the median is simply the middle value; if it is EVEN, the median is the average of the two middle values.
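The odd and even cases can be sketched in Python as follows. The function assumes a non-empty list of numbers, and the data are illustrative.

```python
def median(values):
    """Return the middle value of the ranked data (non-empty input assumed)."""
    ordered = sorted(values)      # rank the data in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # odd count: take the single middle value
        return ordered[mid]
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [7, 3, 9, 1, 5]        # ranked: 1, 3, 5, 7, 9
even_data = [7, 3, 9, 1]          # ranked: 1, 3, 7, 9
print(median(odd_data))   # 5
print(median(even_data))  # 5.0
```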

Mode:
The value with the highest frequency in the data set is the mode. Because it is the value that occurs most often, the mode can be taken as the typical value for the data set.
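A short Python sketch of finding the mode by counting frequencies is given below. Because two or more values can tie for the highest frequency, this version returns a list; that design choice, the function name `modes` and the data are assumptions made for illustration.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)              # value -> number of occurrences
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

# Hypothetical counts of defective screws found in seven lots.
defects_per_lot = [2, 3, 2, 5, 3, 2, 4]
print(modes(defects_per_lot))  # [2]
```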

Measure of Dispersion: When data are collected for a process or product, the values are not exactly equal every time; they are scattered around some value. This property of the data is called dispersion. The measures of dispersion are

Range:
As the name suggests, it is simply the spread of the data set. The range is the difference between the maximum and the minimum values of the data set. It is denoted by "R" and is given by

$$R = X_{\max} - X_{\min}$$
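In Python the range formula is a one-liner; the function name `data_range` and the data below are illustrative.

```python
def data_range(values):
    """Return the maximum value minus the minimum value."""
    return max(values) - min(values)

# Hypothetical weights of five packets, in grams.
weights = [498, 502, 500, 497, 505]
print(data_range(weights))  # 8
```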

Variance: It is a measure of the scatter or fluctuation of the data points about the mean. It is the sum of the squared differences between the data points and the mean, divided by the degrees of freedom.
For a random variable "X"

$$\mathrm{Var}(X) = E(X^2) - \{E(X)\}^2$$
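This identity follows from expanding the square in the definition of variance as the expected squared deviation from the mean. Writing $\mu = E(X)$ for brevity:

```latex
\begin{aligned}
\mathrm{Var}(X) &= E\left[(X - \mu)^2\right] \\
                &= E(X^2) - 2\mu\,E(X) + \mu^2 \\
                &= E(X^2) - 2\mu^2 + \mu^2 \\
                &= E(X^2) - \{E(X)\}^2
\end{aligned}
```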
 
Population variance is denoted by $\sigma^2$ and is given by

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

where N = population size
 
Sample variance is denoted by $s^2$ and is given by

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

where n = sample size
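The two formulas differ only in the divisor, which the following Python sketch makes explicit. The function names and the data set are illustrative assumptions; the sample version needs at least two observations.

```python
def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

# Illustrative data: mean is 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))  # 4.0  (32 / 8)
print(sample_variance(data))      # 32 / 7, slightly larger
```

The standard library provides the same calculations as `statistics.pvariance` and `statistics.variance`.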
 
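Standard Deviation: It is the positive square root of the variance, so it is expressed in the same units as the data. The population standard deviation is denoted by σ and the sample standard deviation by s.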
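A minimal Python sketch of the standard deviation, covering both the population and sample forms, might look like this. The function name `std_dev`, its `sample` flag and the data are hypothetical.

```python
import math

def std_dev(values, sample=True):
    """Square root of the variance.

    sample=True divides by n - 1 (sample form); sample=False divides by n
    (population form).
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if sample else n
    return math.sqrt(squared_deviations / divisor)

# Illustrative data: population variance is 4.0, so sigma is 2.0.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(std_dev(data, sample=False))  # 2.0
print(std_dev(data))                # sample form, slightly larger
```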

Measure of Association:
Suppose that for a process or product we collect data on a variable that we know has an effect on the overall output. To find out how the output is related to this variable, we use a measure of association. It shows what happens to the other variable(s) when one variable increases or decreases.
 
The measure of association is
Correlation Coefficient: It gives a numerical value to the strength of the linear relationship between two variables. It is denoted by "r" and is given by

$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2}\,\sqrt{\sum (Y_i - \bar{Y})^2}}$$

The value of "r" always lies between -1 and +1. "r = -1" shows perfect negative correlation, i.e. the points lie exactly on a straight line and the second variable decreases as the first increases, whereas "r = +1" shows perfect positive correlation, i.e. the points lie exactly on a straight line and the second variable increases as the first increases. "r = 0" shows that there is no linear correlation between the variables.
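The correlation formula and its three reference values can be checked with a short Python sketch. The function name `correlation` and the data are illustrative; the function assumes two equal-length lists in which neither variable is constant, since a constant variable would make the denominator zero.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r between two equal-length lists of numbers."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of products of deviations from the two means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations for each variable.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))

x = [1, 2, 3, 4, 5]
r_positive = correlation(x, [2, 4, 6, 8, 10])  # y rises with x on a line
r_negative = correlation(x, [10, 8, 6, 4, 2])  # y falls with x on a line
r_zero = correlation(x, [4, 1, 0, 1, 4])       # y = (x - 3)^2, a curve
print(r_positive, r_negative, r_zero)
```

The last case gives r = 0 even though y is completely determined by x, which illustrates that r measures only the linear part of a relationship.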
 