STA-2023: Statistics for Business and Economics
Textbook: McClave, Benson and Sincich, 10th edition
Vocabulary


 
 

Chapter 1: Statistical Thinking


 

Statistics is a branch of science dealing with methods for collecting, organizing, summarizing, analyzing and interpreting sets of data.

Descriptive Statistics consists of the procedures used to organize and summarize sets of data, as well as to describe their major characteristics.

Inferential Statistics consists of the procedures used to draw conclusions about a population based on the information contained in a representative sample.

Population is the set of all units (subjects or objects) of interest in any statistical study.

Census is a type of statistical study conducted on the entire population.

Sample is a subset of units chosen from the defined population with the purpose of making a statistical inference.

Representative Sample is a sample that reflects the relevant characteristics of the population. A representative sample can be obtained by using sampling techniques.

Simple Random Sampling is the most basic probability sampling technique. It starts from a list of all units in the population and selects the sample so that every unit has an equal chance of being included.
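
As a minimal sketch of simple random sampling, the Python snippet below draws 5 units from a hypothetical population of 100 numbered units. The population list, the sample size, and the seed are assumptions chosen only for illustration.

```python
import random

# Hypothetical sampling frame: a list of all 100 units in the population.
population = list(range(1, 101))

# Fixed seed so the illustration is reproducible (an assumption, not a requirement).
rng = random.Random(2023)

# random.sample draws without replacement, giving every unit
# the same chance of being included in the sample.
sample = rng.sample(population, k=5)
print(sample)
```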

Sample survey is a type of statistical study in which a questionnaire is administered to a sample of units from the defined population.


 
 









Chapter 2: Descriptive Statistics


 

Individuals are the units (subjects or objects) included in any statistical study.

Variables are characteristics that vary from one individual to another.

Data/Data Set is the set of all observations/measurements collected for one or more variables on a particular set of individuals.

Categorical (Qualitative) data are the observations describing a categorized attribute of the individuals.

Quantitative data are the observations/measurements describing an intrinsically numerical characteristic of the individuals. (Note: numerical codes such as zip codes and Social Security numbers are not quantitative data.)
 
 

Frequency table is a way of organizing and summarizing the information contained in a data set.


 

Elements of a frequency table

Classes

Class limits

Class boundaries

Absolute frequency or frequency

Relative frequency

Percent frequency
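
The frequency columns listed above can be sketched in Python as follows. The data set is hypothetical and categorical, so each class is simply a category; class limits and class boundaries, which apply to interval classes, are not shown.

```python
from collections import Counter

# Hypothetical categorical data: payment method used by 10 customers.
data = ["cash", "card", "card", "check", "card",
        "cash", "card", "cash", "card", "check"]

counts = Counter(data)   # absolute frequency of each class
n = len(data)

table = {}
for cls, freq in counts.items():
    rel = freq / n                        # relative frequency
    table[cls] = (freq, rel, 100 * rel)   # percent frequency = 100 * relative

for cls, (freq, rel, pct) in table.items():
    print(f"{cls:<6} {freq:>3} {rel:>6.2f} {pct:>6.1f}%")
```

The absolute frequencies add up to the number of observations, and the relative frequencies add up to 1.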


 

Frequency graphs for categorical data

·Bar graphs consist of separated (non-touching) bars on a rectangular coordinate system. The heights of the bars are associated with the class frequencies.

·Pareto charts consist of a bar graph in which the bars are arranged in decreasing order of frequency from left to right.

·Pie charts consist of circle graphs in which the sizes of the pie slices are associated with the class percent frequencies.


 

Frequency graphs for quantitative data

·Histograms consist of adjacent (touching) bars on a rectangular coordinate system. The heights of the bars are associated with the class frequencies.

·Stem & Leaf Plots consist of several rows, each representing a class interval (the stem) together with the data values that fall in it (the leaves).


 

Frequency Distribution Curves


 

·Frequency Distribution Curve for a quantitative variable is a smooth curve obtained from the relative frequency histogram. There are three typical patterns:

·Bell curve is symmetric and mound shaped

·Skewed to the right has a long right tail

·Skewed to the left has a long left tail


 

Measures of Central Tendency (Center) for Quantitative Data 

·Mean is the simple average calculated over all data points.

·Median is a value located at the middle of the distribution when the data points are arranged in order.

·Mode is the most frequent or repeated data point.
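
The three measures of center can be computed with Python's standard `statistics` module. The data set below is hypothetical and is used only to illustrate the definitions.

```python
import statistics

# Hypothetical data set: number of sales calls made by 7 employees.
data = [2, 3, 3, 5, 7, 8, 14]

mean = statistics.mean(data)      # sum of the values divided by their count
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)
```

In this data set the mean is 6 and the median is 5; the single large value 14 pulls the mean above the median.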


 

Modal Class of grouped data is the class (category or interval) with the highest frequency.


 

Measures of Variability (Spread) for Quantitative Data 

·Range is the distance between the endpoints (highest and lowest values) of a data set.

·Standard Deviation is a measure of the typical distance between the data points and the mean.

·Variance is the square of the standard deviation.
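
A short sketch of the three measures of spread, using a hypothetical data set. It assumes the sample formulas (dividing by n - 1), which is what `statistics.variance` and `statistics.stdev` compute; the population versions are `statistics.pvariance` and `statistics.pstdev`.

```python
import statistics

# Hypothetical data set (the values are an assumption for illustration).
data = [2, 3, 3, 5, 7, 8, 14]

data_range = max(data) - min(data)     # highest value minus lowest value
variance = statistics.variance(data)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)       # sample standard deviation

print(data_range, round(variance, 3), round(std_dev, 3))
```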


 

Outliers are extremely high or low data values disconnected from the rest of the data set.

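Chebyshev's Rule and the Empirical Rule describe the percent of data points falling within 1, 2, and 3 standard deviations of the mean. Chebyshev's Rule gives minimum percents that hold for any data set (at least 75% within 2 and at least about 89% within 3 standard deviations; it gives no useful information for 1 standard deviation). The Empirical Rule gives approximate percents for bell-shaped data (about 68%, 95%, and 99.7%).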
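
As a rough sketch, the two rules can be written down as follows. Chebyshev's Rule guarantees at least 1 - 1/k^2 of the data within k standard deviations of the mean (for k > 1), while the Empirical Rule percentages are approximations that apply only to bell-shaped data. The function and dictionary names are illustrative.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Rule is informative only for k > 1")
    return 1 - 1 / k**2

# Approximate proportions from the Empirical Rule (bell-shaped data only).
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k), 3), EMPIRICAL_RULE[k])
```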

Percentiles are measures of relative standing that describe the percent of data points falling below or at any given data value.

Quartiles are special percentiles (the 25th, 50th, and 75th) that divide the ordered data set into four subsets, each containing about 25% of the data points.

Interquartile Range is the distance between the first and third quartiles (Q3 - Q1). It describes the spread of the central 50% of the data set.

Five Number Summary is a way of describing a data set using five special percentiles (Min, Q1, Q2, Q3, Max).
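
The sketch below computes a five-number summary and the interquartile range for a hypothetical data set. Quartile conventions differ between textbooks and software; this example assumes the default ("exclusive") method of Python's `statistics.quantiles`, so other conventions may give slightly different quartiles.

```python
import statistics

# Hypothetical data set (the values are an assumption for illustration).
data = [2, 3, 3, 5, 7, 8, 14]

# n=4 splits the data into four parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)

five_number = (min(data), q1, q2, q3, max(data))
iqr = q3 - q1   # spread of the central 50% of the data

print(five_number, iqr)
```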


 
 

Chapter 3: Probability


 

Random Experiment is an observable activity whose outcome cannot be predicted with certainty.

Sample Space is the set of all basic outcomes of a random experiment.

Sample points are the elements of the sample space.

Event is any subset of basic outcomes of a random experiment.

Impossible Event is an event containing no sample points.

Certain Event is an event containing all sample points of the sample space.

Tree Diagram is a graphical tool used to determine the sample space of random experiments.

Venn Diagram is a way of graphically portraying the sample space and various events.

Mutually Exclusive Events are events that do not share any sample point.


 

Compound Events:

·Intersection of two events A and B is the compound event containing the sample points that belong to both A and B.

·Union of two events A and B is the compound event containing the sample points that belong to A, to B, or to both.

·Complement of any event A is the compound event containing the sample points that belong to S (sample space) and do not belong to A.
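
Compound events map directly onto set operations. The sketch below uses Python sets and a hypothetical experiment (one roll of a six-sided die); the events A and B are assumptions chosen for illustration.

```python
# Sample space for one roll of a six-sided die (hypothetical experiment).
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event "even number"
B = {4, 5, 6}   # event "greater than 3"

intersection = A & B   # sample points in both A and B
union = A | B          # sample points in A, in B, or in both
complement_A = S - A   # sample points in S that are not in A

# An event and its complement share no sample points (mutually exclusive).
mutually_exclusive = A.isdisjoint(complement_A)

print(intersection, union, complement_A, mutually_exclusive)
```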


 

Conditional Probability is a probability calculated on a reduced sample space. This reduced sample space is defined by a pre-established event (condition).

Independent Events are events such that the occurrence of one of them does not affect the probability of the other.
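
A small sketch of conditional probability and independence, assuming a fair six-sided die so that all sample points are equally likely. The events and the helper function `prob` are hypothetical names introduced for illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space for one roll of a fair die
A = {2, 4, 6}            # event "even number"
B = {1, 2, 3, 4}         # event "at most 4" (the condition)

def prob(event, space=S):
    """Probability of an event when all points of `space` are equally likely."""
    return Fraction(len(event & space), len(space))

# Conditional probability: P(A | B) = P(A and B) / P(B).
p_A_given_B = prob(A & B) / prob(B)

# Same value, computed directly on the reduced sample space B.
p_A_reduced = prob(A, space=B)

# A and B are independent when knowing B does not change the probability of A.
independent = p_A_given_B == prob(A)

print(p_A_given_B, p_A_reduced, independent)
```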

Contingency Table is a two-way table containing frequency data on two categorical variables.

Probability Tree is a tree diagram involving probabilities of given events.


 
 

Chapter 4: Probability Distributions
 

Random Variable is a numerical variable whose values are associated with a random experiment.

Types of random variables: Discrete and Continuous.

Discrete Random Variables are random variables defined on isolated real numbers. They are typically used for counting.

Continuous Random Variables are random variables defined on a line interval of real numbers. They are typically used for measuring. 

Discrete Probability Distribution is a table, graph or formula assigning probabilities to each value of a discrete random variable.

Probability Histogram is a graphical representation of a discrete probability distribution associating the heights of rectangles with the given probabilities.

Probability Point Graph is a graphical representation of a discrete probability distribution associating the heights of vertical lines with the given  probabilities. 

Mean of a discrete probability distribution is the expected value of any given discrete random variable “X”. The expected value of “X” takes into account not only the X-values but also their associated probabilities.

Standard Deviation of a discrete probability distribution describes the variability of any discrete random variable “X” relative to the mean μ. It takes into account not only the deviation of X-values but also their associated probabilities.
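
As a sketch of these two definitions, the mean is the sum of x * P(x) and the variance is the sum of (x - mean)^2 * P(x), with the standard deviation as its square root. The distribution below (number of heads in two tosses of a fair coin) is a hypothetical example.

```python
import math

# Hypothetical distribution: X = number of heads in two tosses of a fair coin.
distribution = {0: 0.25, 1: 0.50, 2: 0.25}

# Mean (expected value): each X-value weighted by its probability.
mu = sum(x * p for x, p in distribution.items())

# Variance: squared deviations from the mean, weighted by probability.
variance = sum((x - mu) ** 2 * p for x, p in distribution.items())

# Standard deviation: square root of the variance.
sigma = math.sqrt(variance)

print(mu, variance, round(sigma, 4))
```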

Binomial experiment is a random experiment involving a number of identical and independent trials in which there are only two possible outcomes (success and failure).

Binomial random variable is a discrete random variable describing the number of successes in a binomial experiment.

Parameters of the binomial probability distribution are the number of trials “n” and the rate of success “p” (probability of success for each trial).
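
The binomial probabilities follow from the two parameters n and p. The sketch below uses the standard binomial formula, C(n, x) * p^x * (1 - p)^(n - x); the function name and the parameter values are assumptions for illustration.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a binomial random variable with n trials and success rate p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Hypothetical experiment: 10 independent trials, each with success rate 0.3.
n, p = 10, 0.3
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, prob_x in enumerate(probs):
    print(x, round(prob_x, 4))
```

The probabilities over x = 0, 1, ..., n add up to 1.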

Poisson experiment is a random experiment in which the number of occurrences of a given event during a specified period of time is observed. The occurrences of the event are assumed to be random and independent of one another.

Poisson random variable is a discrete random variable describing the number of occurrences of a given event during a specified period of time.

Parameter of the Poisson probability distribution is the average number of occurrences of the given event during a specified period of time.
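
A minimal sketch of the Poisson probabilities, using the standard formula lam^x * e^(-lam) / x!, where `lam` stands for the average number of occurrences per period. The function name and the value of `lam` are assumptions for illustration.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson random variable with average rate lam."""
    return lam**x * exp(-lam) / factorial(x)

# Hypothetical example: an average of 2 occurrences per period.
lam = 2.0
for x in range(6):
    print(x, round(poisson_pmf(x, lam), 4))
```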


 

Normal random variable is a continuous random variable with a frequency curve that is smooth, symmetric, and bell-shaped. 

Normal or Bell Curve is the frequency curve for a normal random variable.

Normal probability distribution is the probability model for a normal random variable.

Parameters of a normal probability distribution are the mean and standard deviation of the associated normal random variable. 

Normal population is a population in which a normal random variable has been defined.

Standard normal variable is a normal random variable with a mean of zero and standard deviation of one.

Z-scores are the values of the standard normal variable. They indicate the number of standard deviations that any value of a normal random variable deviates from the mean.
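
The z-score calculation, z = (x - mean) / standard deviation, can be sketched with `statistics.NormalDist` from the Python standard library. The mean, standard deviation, and x-value below are hypothetical.

```python
from statistics import NormalDist

# Hypothetical normal random variable: mean 100, standard deviation 15.
mu, sigma = 100, 15
x = 130

# z-score: number of standard deviations that x lies from the mean.
z = (x - mu) / sigma

# P(X < x) computed two equivalent ways: through the standard normal
# variable and directly from the original normal distribution.
p_below = NormalDist(mu=0, sigma=1).cdf(z)
p_below_direct = NormalDist(mu, sigma).cdf(x)

print(z, round(p_below, 4), round(p_below_direct, 4))
```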


 

Sampling Distributions

Parameter is a descriptive numerical measure of the population. Parameters are fixed numbers that are usually unknown because the associated population is very large.

Statistic is a descriptive numerical measure of a sample. It varies from sample to sample. Statistics are used to estimate parameters.
 
Sampling Distribution is the probability distribution (model) associated with any statistic when repeated random samples are drawn from the defined population.

Central Limit Theorem is a statistical property stating that the sampling distribution of the sample mean is approximately normal when the sample size is large enough.
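
The idea of a sampling distribution can be illustrated with a small simulation. The sketch below assumes a skewed population (exponential with mean 1 and standard deviation 1), a sample size of 40, and 2000 repeated samples; all three choices are assumptions for illustration.

```python
import random
import statistics

rng = random.Random(1)   # fixed seed so the illustration is reproducible

def sample_mean(n: int) -> float:
    """Mean of one random sample of size n from the skewed population."""
    return sum(rng.expovariate(1.0) for _ in range(n)) / n

# Repeated random samples: each one yields a value of the statistic.
n = 40
means = [sample_mean(n) for _ in range(2000)]

center = statistics.mean(means)    # should be close to the population mean, 1
spread = statistics.stdev(means)   # should be close to 1 / sqrt(n)

print(round(center, 3), round(spread, 3), round(1 / n**0.5, 3))
```

A histogram of `means` would look roughly bell-shaped even though the population itself is strongly skewed to the right.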


 

Chapter 5: Estimation with Confidence Intervals


 

Estimation is the process of estimating or predicting the value of a population parameter using a random sample and an estimator.

Estimator is a formula or statistic defined on sample data with the purpose of estimating a parameter.

Estimate is the numerical result obtained by substituting the sample data into any given estimator.

Types of estimates: point estimate and interval estimate.

Point estimate consists of a single figure predicting the parameter value. 

Interval estimate consists of a numerical range where the parameter is expected to fall with certain confidence.

Confidence coefficient is a probability that measures the reliability of any interval estimate.

Confidence level is the confidence coefficient expressed as a percentage.

Confidence interval is an interval estimate calculated with a specified confidence level.

Margin of error is a measure of the error of estimation that involves the given confidence level and sample size.

Precision of any confidence interval is associated with the width of the interval estimate. The smaller the margin of error, the narrower the interval and the better the precision.
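
As a sketch of how the confidence level and the sample size enter the margin of error, the example below computes a large-sample (z-based) confidence interval for a population mean. It assumes the interval has the form sample mean ± z * s / sqrt(n); the function name and the summary statistics are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, s, n, confidence=0.95):
    """Large-sample confidence interval for a mean: (lower, upper, margin)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical z-value
    margin = z * s / sqrt(n)                  # margin of error
    return x_bar - margin, x_bar + margin, margin

# Hypothetical sample summary: mean 50, standard deviation 10, n = 100.
low, high, margin = z_confidence_interval(x_bar=50, s=10, n=100)
print(round(low, 2), round(high, 2), round(margin, 2))

# Quadrupling the sample size halves the margin of error (better precision).
margin_400 = z_confidence_interval(x_bar=50, s=10, n=400)[2]
print(round(margin_400, 2))
```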


 

Chapter 6: Hypothesis Testing (using a single sample)

 
Hypothesis Testing is the statistical procedure designed to test a claim about a population parameter based on the statistical evidence contained in a representative sample of the population.
 
Research hypothesis is a claim about a population parameter that can be tested using sample data.

Statistical hypotheses are the null and alternative hypotheses.

Alternative hypothesis (Ha) describes the research hypothesis of the problem. It contains a strict inequality (<, >, or ≠).

Null hypothesis (Ho) describes the opposite of the alternative hypothesis. The null hypothesis must contain the equality sign.

Test Statistic is a formula that summarizes the statistical evidence collected against the null hypothesis (or in favor of the alternative/research hypothesis).

Rejection region is the set of values of the test statistic indicating convincing evidence against Ho.

Type I error consists of rejecting the null hypothesis Ho when Ho is actually true.

Type II error consists of failing to reject Ho when Ho is actually false.
 
Alpha designates the probability of type I error. 

Beta designates the probability of type II error.

 
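p-value is a probability that measures the strength of the evidence against Ho (that is, in favor of Ha). It is the probability, calculated assuming Ho is true, of obtaining a test statistic at least as extreme as the one actually observed; the smaller the p-value, the stronger the evidence against Ho. It is also called the observed significance level.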
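
The sketch below ties the test statistic, alpha, and the p-value together for one specific case: a large-sample, upper-tailed z-test of Ho: mean = mu0 against Ha: mean > mu0. The function name, the summary statistics, and the choice of alpha = 0.05 are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test_upper_tail(x_bar, mu0, s, n):
    """Test statistic and p-value for Ho: mean = mu0 vs Ha: mean > mu0."""
    z = (x_bar - mu0) / (s / sqrt(n))   # test statistic
    p_value = 1 - NormalDist().cdf(z)   # area to the right of z under Ho
    return z, p_value

# Hypothetical sample summary: mean 52, standard deviation 10, n = 100.
z, p_value = z_test_upper_tail(x_bar=52, mu0=50, s=10, n=100)

alpha = 0.05                  # chosen probability of a type I error
reject_h0 = p_value < alpha   # reject Ho when the p-value is below alpha

print(round(z, 2), round(p_value, 4), reject_h0)
```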