
Like every branch of mathematics, statistics uses many symbols and abbreviations. Below is a list of some that you may see when studying descriptive and inferential statistics, including probability and hypothesis testing. In general, English letters represent values computed from a sample (statistics), and Greek letters represent values that describe a population (parameters).

Sample Data

Here are symbols used to represent the values of a set of data and the statistics that describe those data.

n   The number of values in the sample.

x   A data point or value in the sample; the measured value of the variable for a single observation.

f   The frequency of a value: the number of times it occurs in the sample.

x̄   The mean of a set of data values.

Σx [Sigma x] Sum of all of the values in the data set.

s   The standard deviation of a set of data values. This is a measure of variability in the data. (s² is the variance of the data; the standard deviation is the square root of the variance and is more commonly reported.)

Consider the following table of values taken from a sample of ages of children in a pre-kindergarten class:

Age in years (x)     Frequency (f)
3                    7
4                    9
5                    6

For this sample:

x = 3, 4 and 5

These are the data values (in this sample, ages).

f = 7, 9 and 6

These are the frequencies (how often each data value occurs).

n = 22

This is the sum of all the frequency (f) values: 7 + 9 + 6 = 22.

Σx = 87

This is calculated by multiplying each data value by its frequency and adding the products: 3 ⋅ 7 + 4 ⋅ 9 + 5 ⋅ 6 = 21 + 36 + 30 = 87. (For frequency data, this sum is often written Σfx.)

x̄ ≈ 3.95 years

Formula: Σx/n. Calculation: 87/22 ≈ 3.95

s = 0.79

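Roughly, this is the typical distance of the data values from the mean. It is the square root of the variance: s = √[Σf(x − x̄)² / (n − 1)] = √(12.95/21) ≈ 0.79. The divisor n − 1 (rather than n) is used because the data are a sample.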
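The calculations above can be reproduced with a short Python sketch. It assumes the data are stored as a dictionary mapping each value to its frequency; the function name `sample_summary` is hypothetical, and s is computed with the n − 1 divisor of the sample standard deviation.

```python
import math

# Frequency table from the pre-kindergarten example: age -> frequency (f)
freq_table = {3: 7, 4: 9, 5: 6}

def sample_summary(table):
    """Return n, Σx, x̄ and s for data given as {value: frequency}."""
    n = sum(table.values())                       # n = Σf
    total = sum(x * f for x, f in table.items())  # Σx, computed as Σfx
    mean = total / n                              # x̄ = Σx / n
    # Sum of squared deviations from the mean, each weighted by its frequency
    ss = sum(f * (x - mean) ** 2 for x, f in table.items())
    s = math.sqrt(ss / (n - 1))                   # sample standard deviation
    return n, total, mean, s

n, total, mean, s = sample_summary(freq_table)
print(n, total, round(mean, 2), round(s, 2))  # 22 87 3.95 0.79
```

Working from the frequency table avoids writing out all 22 ages; the result is the same as listing every value individually.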

Population Parameters

The following symbols are used when referring to values that describe the entire population. In practice, these values are often unknown or difficult to obtain. When necessary, they are estimated from sample statistics or hypothesized in advance.

N          The number of values in the entire population from which a sample is drawn.

μ         [mu] The mean of the population.

σ        [sigma] The standard deviation of the population.
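The practical difference between σ and s is the divisor. The sketch below uses Python's standard `statistics` module on the 22 ages from the sample example; treating those 22 children as an entire population is an assumption made only for illustration. `pstdev` divides by N, while `stdev` divides by n − 1.

```python
import statistics

# The 22 ages from the sample example, expanded from the frequency table
ages = [3] * 7 + [4] * 9 + [5] * 6

# If these 22 children were the entire population (illustrative assumption),
# the parameters would be computed with divisor N:
mu = statistics.mean(ages)       # population mean μ
sigma = statistics.pstdev(ages)  # population standard deviation σ (divides by N)

# Treated as a sample, the standard deviation divides by n - 1 instead:
s = statistics.stdev(ages)

print(round(mu, 2), round(sigma, 2), round(s, 2))  # 3.95 0.77 0.79
```

The mean is the same either way; only the spread changes, and the sample version is slightly larger.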

Analysis of Data and Hypothesis Testing

H₀        The null hypothesis in a research study, which states that there is no effect or no difference between the things being measured.

H₀: there is no change in the rate at which a person types after drinking a cup of caffeinated coffee.

H₁        The alternative or research hypothesis in a research study, which states that there is some effect or difference between the things being measured. (Some studies use Hₐ instead of H₁.)

H₁: the rate at which a person types increases after drinking a cup of caffeinated coffee.

z           Standard score for the standard normal (z) distribution (often used with large samples). The z-score measures how many standard deviations a value lies above or below the mean: z = (x − x̄)/s.

For example, in a data set with mean (x̄) equal to 100 and standard deviation (s) equal to 15, a value of 130 has a z-score of (130 − 100)/15 = 2, because 130 is two standard deviations above the mean.
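The z-score formula translates directly into code. In this minimal sketch the function name `z_score` is hypothetical; the second call shows that values below the mean get negative scores.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(130, 100, 15))  # 2.0
print(z_score(85, 100, 15))   # -1.0
```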

t           Test statistic for the t distribution (used with small samples, typically when the population standard deviation σ is unknown).
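The list above does not give a formula for t. One common case is the one-sample t statistic, t = (x̄ − μ₀)/(s/√n), which compares a sample mean with a hypothesized population mean μ₀. The sketch below applies it to the ages from the sample example; the hypothesized mean of 4 years and the function name `one_sample_t` are illustrative assumptions, not part of the original example.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """t = (x̄ - μ0) / (s / √n) for a hypothesized population mean mu0."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    return (x_bar - mu0) / (s / math.sqrt(n))

ages = [3] * 7 + [4] * 9 + [5] * 6
t = one_sample_t(ages, 4.0)  # hypothetical null value: mean age of 4 years
print(round(t, 2))           # -0.27
```

A t value this close to 0 means the sample mean is very near the hypothesized mean relative to the sampling variability.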

F          Test statistic for the F distribution, computed as a ratio of two variances (used, for example, to compare two variances or standard deviations).
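In its simplest form, the F statistic for comparing two variances is the ratio of the two sample variances. The sketch below uses hypothetical data for two groups; `f_ratio` is an illustrative name, and `statistics.variance` computes the sample variance with the n − 1 divisor.

```python
import statistics

def f_ratio(sample_a, sample_b):
    """F statistic as the ratio of two sample variances, s_a² / s_b²."""
    return statistics.variance(sample_a) / statistics.variance(sample_b)

# Hypothetical data for two groups
group_a = [2, 4, 4, 4, 5, 5, 7, 9]  # sample variance 32/7 ≈ 4.57
group_b = [1, 2, 3, 4, 5]           # sample variance 10/4 = 2.5
print(round(f_ratio(group_a, group_b), 2))  # 1.83
```

An F value near 1 suggests similar variances; how far from 1 counts as a real difference depends on the F distribution for the two sample sizes.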

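r           Correlation coefficient. A measure of the strength and direction of the linear relationship between two variables, with a value between −1 and 1. The closer the value is to −1 or 1, the stronger the relationship; a value near 0 indicates little or no linear relationship.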

Scatter plot of five points with a positive correlation (0 < r < 1).

Scatter plot of five points with no correlation (r ≈ 0).

Scatter plot of five points with a negative correlation (−1 < r < 0).
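The text does not say which correlation coefficient is meant; the most common choice is Pearson's r, sketched below. The function name `pearson_r` and the five-point data sets are hypothetical, chosen to show one positive and one perfectly negative relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data (xs[i], ys[i])."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-variation
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(round(pearson_r(x, [2, 4, 5, 4, 5]), 2))   # 0.77 (positive)
print(round(pearson_r(x, [10, 8, 6, 4, 2]), 2))  # -1.0 (perfect negative)
```

Because the deviations are divided by the spread of each variable, r has no units and always falls between −1 and 1.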

α       [alpha] The significance level: the threshold for concluding that a statistically significant outcome has occurred (commonly 0.05).

p        The p-value is the probability, assuming the null hypothesis is true, of obtaining a test result at least as extreme as the one observed. If the p-value is less than α, then the conclusion of the research is that there is evidence that the independent variable has a statistically significant effect on the dependent variable.

In a statistical test, if α = 0.05 and p = 0.031, then p < α, so we reject the null hypothesis and say that there is a statistically significant effect.
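The decision rule can be sketched in a few lines. As an illustrative assumption, the sketch also converts a z statistic into a two-sided p-value using `statistics.NormalDist` from Python's standard library; the function names are hypothetical, and the default α = 0.05 matches the example above.

```python
from statistics import NormalDist

def two_sided_p_from_z(z):
    """Two-sided p-value for a z statistic under the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def is_significant(p, alpha=0.05):
    """Decision rule: reject H0 when p < alpha."""
    return p < alpha

print(is_significant(0.031, 0.05))     # True, as in the example above
p = two_sided_p_from_z(2.0)
print(round(p, 4), is_significant(p))  # 0.0455 True
```

A z statistic of 2 falls just beyond the usual two-sided cutoff at α = 0.05, so it is (narrowly) significant.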

Probability

P(A)     Probability of a specific event, A, occurring.

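P(choosing a picture card from a standard deck of cards) = 12/52 = 3/13, since a standard 52-card deck has 12 picture cards (a jack, queen, and king in each of the four suits).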
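For equally likely outcomes, P(A) is the number of favorable outcomes divided by the total number of outcomes. This sketch builds a standard deck by enumeration and counts the picture cards; the rank and suit labels are illustrative.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) cards

# Picture cards: jack, queen and king in every suit
picture = [card for card in deck if card[0] in {"J", "Q", "K"}]

p = Fraction(len(picture), len(deck))  # favorable outcomes / total outcomes
print(len(picture), len(deck), p)      # 12 52 3/13
```

Using `Fraction` keeps the probability exact and reduces 12/52 to 3/13 automatically.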

nPr     Number of permutations of n items selected r at a time.

12P5 = the number of permutations (ordered arrangements) of 5 elements from a group of 12 elements

12P5 = 12!/(12 − 5)! = 12!/7! = (12 ⋅ 11 ⋅ 10 ⋅ 9 ⋅ 8 ⋅ 7 ⋅ 6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)/(7 ⋅ 6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1) = 12 ⋅ 11 ⋅ 10 ⋅ 9 ⋅ 8 = 95,040
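The permutation formula nPr = n!/(n − r)! can be checked with Python's `math` module. The helper `permutations_count` is a hypothetical name that spells out the formula; `math.perm` is the standard-library equivalent (available since Python 3.8).

```python
import math

def permutations_count(n, r):
    """nPr = n! / (n - r)!, the number of ordered arrangements of r of n items."""
    return math.factorial(n) // math.factorial(n - r)

print(permutations_count(12, 5))  # 95040
print(math.perm(12, 5))           # 95040
```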

nCr     Number of combinations of n items selected r at a time.

12C5 = the number of combinations (unordered arrangements) of 5 elements from a group of 12 elements

12C5 = 12!/(5! ⋅ 7!) = (12 ⋅ 11 ⋅ 10 ⋅ 9 ⋅ 8 ⋅ 7 ⋅ 6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)/((7 ⋅ 6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)(5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)) = (12 ⋅ 11 ⋅ 10 ⋅ 9 ⋅ 8)/(5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1) = 95,040/120 = 792
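The combination formula nCr = n!/(r!(n − r)!) can be checked the same way. `combinations_count` is a hypothetical helper; `math.comb` is the standard-library function (Python 3.8+). The last line shows the link to permutations: each unordered group of 5 appears 5! times among the ordered arrangements.

```python
import math

def combinations_count(n, r):
    """nCr = n! / (r! (n - r)!), the number of unordered selections of r of n items."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(combinations_count(12, 5))              # 792
print(math.comb(12, 5))                       # 792
print(math.perm(12, 5) // math.factorial(5))  # 792
```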
