Chapter 1: Statistical Thinking
Statistics
is a branch of science dealing with methods for collecting, organizing,
summarizing, analyzing and interpreting sets of data.
Descriptive Statistics
consists of the procedures used to organize and summarize sets of data,
as well as to describe their major characteristics.
Inferential Statistics
consists of the procedures used to draw conclusions about a population
based on the information contained in a representative sample.
Population
is the set of all units (subjects or objects) of interest in any statistical
study.
Census
is a type of statistical study conducted on the entire population.
Sample
is a subset of units chosen from the defined population with the purpose
of making a statistical inference.
Representative Sample
is a sample that reflects the relevant characteristics of the population.
A representative sample can be obtained by using sampling techniques.
Simple Random Sampling
is the most basic probability sampling technique. It starts from a list of
all units of the population, each of which is given an equal chance of being
included in the sample.
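As a minimal sketch of simple random sampling, the snippet below draws a sample without replacement from a hypothetical list of 500 unit IDs. The population list, the sample size of 25, and the seed are illustrative assumptions, not values taken from these notes.

```python
import random

# Hypothetical sampling frame: ID numbers for a population of 500 units.
population = list(range(1, 501))

# A fixed seed is used only so that the example is repeatable.
rng = random.Random(2024)

# random.sample draws distinct units, so every unit in the list has the
# same chance of ending up in the sample.
sample = rng.sample(population, k=25)
print(sorted(sample))
```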
Sampling survey
is a type of statistical study involving a sample of units from the defined
population and a questionnaire.
Chapter 2: Descriptive Statistics
Individuals
are the units (subjects or objects) included in any statistical study.
Variables
are characteristics that vary from one individual to another.
Data/Data Set
is the set of all observations/measurements collected for one or more variables
on a particular set of individuals.
Categorical (Qualitative) data
are the observations describing a categorized attribute of the individuals.
Quantitative data
are the observations/measurements describing an intrinsically numerical
characteristic of the individuals. (Note: numerical codes, such as zip codes
and SSNs, are not quantitative data.)
Frequency table
is a way of organizing and summarizing the information contained in a data
set.
Elements of a frequency table:
·Classes
·Class limits
·Class boundaries
·Absolute frequency (or simply frequency)
·Relative frequency
·Percent frequency
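The frequency columns listed above can be sketched for a small categorical data set. The blood-type data below are invented for illustration; the relative frequency is the class frequency divided by the number of observations, and the percent frequency is that value times 100.

```python
from collections import Counter

# Hypothetical categorical data: blood types of 20 individuals.
data = ["O", "A", "A", "B", "O", "O", "AB", "A", "O", "B",
        "A", "O", "O", "A", "B", "O", "A", "O", "AB", "A"]

counts = Counter(data)   # absolute frequency of each class
n = len(data)            # total number of observations

# class -> (frequency, relative frequency, percent frequency)
table = {cls: (freq, freq / n, 100 * freq / n)
         for cls, freq in counts.most_common()}

print(f"{'Class':<6}{'Freq':>5}{'Rel':>7}{'Pct':>8}")
for cls, (freq, rel, pct) in table.items():
    print(f"{cls:<6}{freq:>5}{rel:>7.2f}{pct:>7.1f}%")
```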
Frequency graphs for categorical data
·Bar graphs:
consist of unattached bars on a rectangular coordinate system. The heights
of the bars are associated with the class frequencies.
·Pareto Chart:
consists of a bar graph in which the bars are arranged in decreasing order of
frequency from left to right.
·Pie charts:
consist of circle graphs in which the sizes of the pie slices are associated
with the class percent frequencies.
Frequency graphs for quantitative data
·Histograms:
consist of attached bars on a rectangular coordinate system. The heights
of the bars are associated with the class frequencies.
·Stem & Leaf Plots:
consist of several rows representing the different class intervals (stems)
and the data values associated with each (leaves).
Frequency Distribution Curves
·Frequency Distribution Curve
for a quantitative variable is a smooth curve obtained from the relative
frequency histogram. There are three typical patterns:
·Symmetric has left and right sides that mirror each other
·Skewed to the right has a long right tail
·Skewed to the left has a long left tail
Measures of Central Tendency (Center) for Quantitative Data
·Mean
is the arithmetic average calculated over all data points.
·Median
is the value located at the middle of the distribution when the data points
are arranged in order.
·Mode
is the most frequently occurring data value.
Modal Class
of grouped data is the class (category or interval) with the highest frequency.
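The three measures of center can be computed with Python's standard statistics module. The test scores below are made-up values chosen only to illustrate the definitions.

```python
import statistics

# Hypothetical test scores for nine students.
scores = [72, 85, 85, 90, 64, 78, 85, 91, 70]

mean = statistics.mean(scores)      # sum of the values divided by their count
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequently occurring value

print(mean, median, mode)  # 80 85 85
```

Here the mean (80) sits below the median (85) because the low score of 64 pulls the average down.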
Measures of Variability (Spread) for Quantitative Data
·Range
is the difference between the highest and lowest values of a data set.
·Standard Deviation
is a measure of the typical distance of the data points from the mean.
·Variance
is the square of the standard deviation.
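A short sketch of the three measures of spread, using invented data. It assumes the data are a sample, so it uses the sample versions from the statistics module, which divide by n - 1; the population versions (pvariance, pstdev) divide by n instead.

```python
import statistics

# Hypothetical sample of seven measurements.
data = [4, 8, 6, 5, 3, 7, 9]

data_range = max(data) - min(data)     # highest value minus lowest value
variance = statistics.variance(data)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)       # sample standard deviation

print(data_range, round(variance, 3), round(std_dev, 3))
```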
Outliers
are extremely high or low data values disconnected from the rest of the
data set.
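Chebyshev and Empirical Rules
describe the percent of data points falling within 1, 2, and 3 standard
deviations of the mean. Chebyshev's Rule gives a minimum percent that holds
for any data set; the Empirical Rule gives approximate percents for
bell-shaped distributions.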
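The two rules can be compared side by side. The sketch below uses the standard statement of Chebyshev's Rule (at least 1 - 1/k^2 of the data lie within k standard deviations, for k greater than 1) and the usual rounded Empirical Rule percentages; neither formula appears in these notes, so treat both as assumptions of the example.

```python
def chebyshev_min_fraction(k):
    """Minimum fraction of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Rule is only informative for k > 1")
    return 1 - 1 / k ** 2

# Approximate fractions for bell-shaped data (Empirical Rule).
EMPIRICAL_FRACTION = {1: 0.68, 2: 0.95, 3: 0.997}

for k in (2, 3):
    print(f"k={k}: Chebyshev at least {chebyshev_min_fraction(k):.1%}, "
          f"Empirical about {EMPIRICAL_FRACTION[k]:.1%}")
```

Chebyshev's guarantee is weaker because it makes no assumption about the shape of the distribution.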
Percentiles
are measures of relative standing that describe the percent of data points
falling below or at any given data value.
Quartiles
are special percentiles that divide the data set in four (evenly weighted)
subsets.
Five Number Summary
is a way of describing a data set using five special percentiles (Min,
Q1, Q2, Q3, Max).
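A five number summary can be sketched with statistics.quantiles. Textbooks and software use slightly different quartile conventions; the "inclusive" method chosen here is one of them and is an assumption of this example, as are the data values.

```python
import statistics

# Hypothetical data set of nine values (order does not matter).
data = [9, 1, 15, 5, 11, 3, 17, 7, 13]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

summary = (min(data), q1, q2, q3, max(data))
print(summary)  # (1, 5.0, 9.0, 13.0, 17)
```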
Chapter 3: Probability
Random Experiment
is an observable activity whose outcome cannot be predicted with certainty.
Sample Space
is the set of all basic outcomes of a random experiment.
Sample points
are the elements of the sample space.
Event
is any subset of basic outcomes of a random experiment.
Impossible Event
is an event containing no sample points.
Certain Event
is an event containing all sample points of the sample space.
Tree Diagram
is a graphical tool used to determine the sample space of random
experiments.
Venn Diagram
is a way of graphically portraying the sample space and various events.
Mutually Exclusive Events
are events that do not share any sample point.
Compound Events:
·Intersection
of two events A and B is the compound event containing the sample points
that belong to both A and B.
·Union
of two events A and B is the compound event containing the sample points
that belong to A, to B, or to both.
·Complement
of any event A is the compound event containing the sample points that
belong to S (sample space) and do not belong to A.
Conditional Probability
is a probability calculated on a reduced sample space. This reduced sample
space is defined by a pre-established event (condition).
Independent Events
are events such that the occurrence of one of them does not affect the
probability of the other.
Contingency Table
is a two-way table containing frequency data on two categorical variables.
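The definitions of conditional probability and independence can be illustrated with a small contingency table. The counts below are invented for illustration, and prob is a hypothetical helper written for this sketch.

```python
# Hypothetical counts for 200 individuals (illustrative, not real data).
counts = {
    ("smoker", "exercises"): 20,
    ("smoker", "does not exercise"): 30,
    ("non-smoker", "exercises"): 90,
    ("non-smoker", "does not exercise"): 60,
}
total = sum(counts.values())

def prob(row=None, col=None):
    """Probability that a random individual matches the given row and/or column."""
    matching = sum(c for (r, k), c in counts.items()
                   if (row is None or r == row) and (col is None or k == col))
    return matching / total

# Marginal probability of exercising, over the full sample space.
p_exercise = prob(col="exercises")

# Conditional probability: the sample space is reduced to smokers only.
p_exercise_given_smoker = prob("smoker", "exercises") / prob(row="smoker")

# The events are independent only if the condition leaves the probability unchanged.
independent = abs(p_exercise - p_exercise_given_smoker) < 1e-9
print(p_exercise, p_exercise_given_smoker, independent)
```

With these counts the probability of exercising drops from 0.55 overall to 0.40 among smokers, so the two events are not independent.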
Probability Tree
is a tree diagram involving probabilities of given events.
Chapter 4: Probability Distributions
Random Variable
is a numerical variable whose values are associated with a random experiment.
Types of random variables:
Discrete and Continuous.
Discrete Random Variables
are random variables whose possible values are isolated real numbers. They are
typically used for counting.
Continuous Random Variables
are random variables whose possible values fill an interval of real numbers.
They are typically used for measuring.
Discrete Probability Distribution
is a table, graph or formula assigning probabilities to each value of a
discrete random variable.
Probability Histogram
is a graphical representation of a discrete probability distribution associating
the heights of rectangles with the given probabilities.
Probability Point Graph
is a graphical representation of a discrete probability distribution associating
the heights of vertical lines with the given probabilities.
Mean of a discrete probability distribution
is the expected value of any given discrete random variable “X”. The expected
value of “X” takes into account not only the X-values but also their associated
probabilities.
Standard Deviation of a discrete probability distribution
describes the variability of any discrete random variable “X” relative
to the mean μ. It takes into account not only the deviations
of the X-values from μ but also their associated probabilities.
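As a sketch of how the probabilities enter both calculations, the snippet below uses the standard formulas: the mean is the sum of each value times its probability, and the variance is the sum of each squared deviation from the mean times its probability. The distribution itself is a made-up example.

```python
from math import sqrt

# Hypothetical discrete probability distribution: value -> probability.
distribution = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}

# A valid distribution must have probabilities that add up to 1.
assert abs(sum(distribution.values()) - 1.0) < 1e-9

# Mean (expected value): each value weighted by its probability.
mu = sum(x * p for x, p in distribution.items())

# Variance: each squared deviation from the mean weighted by its probability.
variance = sum((x - mu) ** 2 * p for x, p in distribution.items())
sigma = sqrt(variance)

print(round(mu, 4), round(sigma, 4))  # 1.7 0.9
```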
Binomial experiment
is a random experiment involving a number of identical and independent
trials in which there are only two possible outcomes (success and failure).
Binomial random variable
is a discrete random variable describing the number of successes in a binomial
experiment.
Parameters of the binomial probability distribution
are the number of trials “n” and the rate of success “p” (probability of
success for each trial).
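The role of the two parameters can be seen in the standard binomial probability formula, P(X = k) = C(n, k) p^k (1 - p)^(n - k), which is not stated in these notes and is assumed here. The function name binomial_pmf and the coin-toss values are illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical example: 10 tosses of a fair coin, success = heads.
n, p = 10, 0.5
p_five_heads = binomial_pmf(5, n, p)
print(p_five_heads)  # 0.24609375

# The probabilities over all possible counts 0..n add up to 1.
total_probability = sum(binomial_pmf(k, n, p) for k in range(n + 1))
```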
Poisson experiment
is a random experiment in which the number of occurrences of a given event
during a specified period of time is observed. The occurrences of the event
are assumed to be random and independent of one another.
Poisson random variable
is a discrete random variable describing the number of occurrences of a
given event during a specified period of time.
Parameter of the Poisson probability distribution
is the average number of occurrences of the given event during a specified
period of time.
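A minimal sketch of the Poisson probability formula, P(X = k) = e^(-λ) λ^k / k!, where λ is the average number of occurrences per period. The formula is the standard one and is assumed here rather than taken from these notes; the rate of 3 occurrences per period is invented.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k occurrences when the average per period is lam."""
    return exp(-lam) * lam ** k / factorial(k)

# Hypothetical example: an average of 3 occurrences per period.
lam = 3
p_none = poisson_pmf(0, lam)  # probability of no occurrences in the period
p_at_most_two = sum(poisson_pmf(k, lam) for k in range(3))

print(round(p_none, 4), round(p_at_most_two, 4))
```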
Normal random variable
is a continuous random variable with a frequency curve
that is smooth, symmetric, and bell-shaped.
Normal or Bell Curve
is the frequency curve for a normal random variable.
Normal probability distribution
is the probability model for a normal random variable.
Parameters of a normal probability distribution
are the mean and standard deviation
of the associated normal random variable.
Normal population
is a population in which a normal random variable has been defined.
Standard normal variable
is a normal random variable with a mean of zero and standard deviation
of one.
Z-scores
are the values of the standard normal variable. They indicate the number
of standard deviations that any value of a normal random variable deviates
from the mean.
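The z-score definition corresponds to the formula z = (x - mean) / standard deviation. The sketch below applies it to a hypothetical normal variable with mean 100 and standard deviation 15, then uses the standard normal curve from Python's statistics module to find the proportion of values falling below x.

```python
from statistics import NormalDist

def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

# Hypothetical normal variable: mean 100, standard deviation 15.
z = z_score(130, mean=100, std_dev=15)
print(z)  # 2.0

# Area under the standard normal curve to the left of z.
proportion_below = NormalDist().cdf(z)
print(round(proportion_below, 4))
```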
Sampling Distributions
Chapter 5: Estimation with Confidence Intervals
Estimation is the process of estimating or predicting the value of a population parameter using a random sample and an estimator.
Estimator
is a formula or statistic defined on sample data with the purpose of estimating
a parameter.
Estimate
is the numerical result obtained by substituting the sample data into any
given estimator.
Types of estimates:
point estimate and interval estimate.
Point estimate
consists of a single figure predicting the parameter value.
Interval estimate
consists of a numerical range where the parameter is expected to fall with
certain confidence.
Confidence coefficient
is a probability that measures the reliability of any interval estimate.
Confidence level
is the confidence coefficient expressed as a percentage.
Confidence interval
is an interval estimate calculated with a specified confidence level.
Margin of error
is a measure of the error of estimation that depends on the given confidence
level and sample size.
Precision
of any confidence interval is associated with the width of the interval
estimate. The smaller the margin of error, the better the precision.
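To show how confidence level and sample size drive the margin of error, the sketch below builds a confidence interval for a population mean. These notes do not give a formula, so the example assumes one common case: a z-based interval with a known population standard deviation, where margin = z × σ / √n. The function name z_interval and all numbers are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper, margin) for a z-based interval, assuming sigma is known."""
    # Critical z-value leaving (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin, margin

# Hypothetical sample: mean 50, known sigma 12, sample size 36.
lower, upper, margin = z_interval(50, sigma=12, n=36)
print(round(lower, 2), round(upper, 2), round(margin, 2))

# Quadrupling the sample size halves the margin of error (better precision).
_, _, margin_large_n = z_interval(50, sigma=12, n=144)
```

Raising the confidence level has the opposite effect: the critical value grows, so the interval widens and precision falls.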
Chapter 6: Hypothesis Testing (using a single sample)