Sampling Data

There are two broad categories of statistical tests: parametric and non-parametric. Most introductory and general statistics courses cover parametric tests, in particular z-tests, t-tests, analysis of variance, correlation, and regression. Parametric tests rest on assumptions, including that the data come from a normally distributed population (one whose distribution follows a bell curve), that the data are measured at the interval or ratio level, and that the sample is randomly chosen from the population. In many cases, though, one or more of these assumptions do not hold, so the researcher turns to non-parametric tests to analyze the data.

Non-parametric tests have several advantages: most depend on few assumptions, their computations can often be done quickly and by hand, and they can be used on lower-level data (counts and rankings). One disadvantage is that, because the calculations are easy, they are sometimes used where a more powerful parametric test would be appropriate. A second disadvantage is that using a non-parametric test in place of a parametric test discards much of the available information in the data.

Types of Non-parametric Tests

A large number of non-parametric tests have evolved over the past few centuries.  The ones listed below are intended to show the range of tests available for a variety of situations.

Sign Test

In the sign test, one of the oldest non-parametric tests, each value in the data set is given a sign, positive or negative. (For example, to test whether a hypothesized median for a population is accurate, each value in a data set can be subtracted from the hypothesized median, which gives positive and negative differences.) The test statistic is calculated from the counts of positive and negative values (that is, from the signs of the numbers rather than their sizes).
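The sign-counting step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `sign_test` and the data are made up for the example, and the p-value is computed directly from binomial probabilities rather than looked up in a table.

```python
from math import comb

def sign_test(values, hypothesized_median):
    """Two-sided sign test; returns (n_plus, n_minus, p_value)."""
    # Differences are taken as value minus median; reversing the order only
    # swaps the two counts and leaves the two-sided p-value unchanged.
    diffs = [v - hypothesized_median for v in values]
    n_plus = sum(1 for d in diffs if d > 0)
    n_minus = sum(1 for d in diffs if d < 0)
    n = n_plus + n_minus  # values equal to the median carry no sign and are dropped
    k = min(n_plus, n_minus)
    # Under the null hypothesis each sign is + or - with probability 1/2.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_plus, n_minus, min(1.0, 2 * tail)

sample = [12, 15, 9, 20, 17, 14, 22, 18, 16, 11]  # made-up data
print(sign_test(sample, 10))  # 9 positive, 1 negative, p is about 0.021
```

With nine of ten signs positive, a median of 10 looks doubtful for these illustrative values.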

Binomial Test

The binomial test is used to test a hypothesis about a population proportion when each element of the population falls into one of only two categories. Tables of binomial probabilities for a sample of size n containing r elements of one type, given a hypothesized proportion p, are used to determine how many elements of that type should be expected in a sample if the hypothesized proportion is correct. The test compares the count actually observed with what would be expected under the hypothesized proportion.
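The table lookup described above amounts to evaluating the binomial formula. A small sketch, assuming a one-sided (upper-tail) alternative; the function names and the example numbers are illustrative only.

```python
from math import comb

def binomial_pmf(r, n, p):
    """Probability of exactly r elements of one type in a sample of n."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def binomial_upper_tail(r, n, p):
    """P(X >= r) when the hypothesized proportion p is correct."""
    return sum(binomial_pmf(k, n, p) for k in range(r, n + 1))

# Made-up example: 8 of 10 sampled elements are of one type, hypothesized p = 0.5.
print(binomial_upper_tail(8, 10, 0.5))  # 56/1024, about 0.055
```

A small tail probability suggests the observed count is larger than the hypothesized proportion would usually produce.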

Mann-Whitney Test

The Mann-Whitney test can be used to compare the medians of two independent samples, which need not be the same size. In essence, all the values from both sets of data are rank-ordered together and the ranks of the values from each set are totaled. The test statistic takes the sample sizes into account in determining whether the rank totals, and thus the medians, differ.
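The ranking and totaling steps can be sketched as follows. The helper `average_ranks` (which gives tied values the average of their ranks) and the data are assumptions made for this example; the U values would then be compared against a table or a normal approximation, which is not shown.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (rank total of x, U for x, U for y) from the pooled ranking."""
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:len(x)])
    u1 = r1 - len(x) * (len(x) + 1) / 2  # adjusts the total for sample size
    u2 = len(x) * len(y) - u1
    return r1, u1, u2

# Made-up samples of unequal size.
print(mann_whitney_u([1.1, 2.3, 3.0], [2.5, 4.2, 5.1, 6.0]))  # (7.0, 1.0, 11.0)
```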

Ansari-Bradley Test

One way to test for the equality of two measures of variation, such as standard deviation, is the Ansari-Bradley test. Like the Mann-Whitney test, the Ansari-Bradley test assigns ranks to the combined set of values from the two samples. However, in this test, each rank is assigned twice, once from each end of the combined list of values (e.g., the highest and lowest values are both assigned a rank of 1). The data set with the lower rank total has the greater dispersion; the test statistic compares the totals, and tables of values are used to determine whether they differ significantly.
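The two-ended ranking can be sketched as below. This is an illustration under simplifying assumptions: no tied values, equal sample sizes (so the raw totals are directly comparable), and made-up data. The function name is hypothetical.

```python
def ansari_bradley_totals(x, y):
    """Return the rank totals of x and y when ranks count in from both ends."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(combined)
    totals = [0, 0]
    for pos, (_, label) in enumerate(combined, start=1):
        # Position 1 and position n both get rank 1, the next pair rank 2, etc.
        totals[label] += min(pos, n + 1 - pos)
    return tuple(totals)

spread_out = [1, 3, 18, 20]   # made-up sample with values at the extremes
clustered = [8, 9, 10, 11]    # made-up sample bunched in the middle
print(ansari_bradley_totals(spread_out, clustered))  # (6, 14)
```

The spread-out sample collects the low ranks at both ends of the list, so its total is the smaller one.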

Chi-Square Test

Perhaps the best-known and most versatile non-parametric test is the chi-square test. Among other things, it can be used to test the goodness of fit of a sample to a specified type of distribution and to test whether two categorical variables are independent. The test uses formulas to compute the expected count for each category of data and compares these expected counts with the observed counts to determine whether significant differences exist.
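Both uses come down to the same comparison of observed and expected counts. A short sketch with made-up counts; the function names are illustrative, and the resulting statistic would still need to be compared with a chi-square table for the appropriate degrees of freedom.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def independence_expected(table):
    """Expected counts for a two-way table: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]

# Goodness of fit: 60 made-up die rolls against a fair die (10 expected per face).
print(chi_square_statistic([8, 12, 9, 11, 10, 10], [10] * 6))  # 1.0

# Independence: a made-up 2x2 table of counts.
table = [[10, 20], [30, 40]]
expected = independence_expected(table)
flat_obs = [o for row in table for o in row]
flat_exp = [e for row in expected for e in row]
print(chi_square_statistic(flat_obs, flat_exp))  # 50/63, about 0.79
```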

Wilcoxon Matched-Pairs Signed-Ranks Test

When two samples are related (such as when the same subjects are measured twice in some way), the Wilcoxon Matched-Pairs Signed-Ranks test can be used to determine whether the medians of the samples are the same. In this test, the differences between matched pairs of values are ranked by absolute size, and the signs of those differences are then attached to the ranks to compare the medians.
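The signed-rank totals can be sketched as follows. For brevity this example assumes that no two absolute differences are tied (ties would need averaged ranks); the function name and the before/after measurements are made up.

```python
def wilcoxon_signed_rank_totals(first, second):
    """Return (sum of ranks of positive differences, sum of ranks of negative ones)."""
    # Pairs with no difference carry no sign and are dropped.
    diffs = [a - b for a, b in zip(first, second) if a != b]
    ordered = sorted(diffs, key=abs)  # rank 1 goes to the smallest absolute difference
    w_plus = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
    w_minus = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)
    return w_plus, w_minus

before = [125, 130, 118, 140, 135, 128]  # made-up first measurements
after = [120, 132, 110, 129, 128, 124]   # made-up second measurements
print(wilcoxon_signed_rank_totals(before, after))  # (20, 1)
```

The smaller of the two totals is the value usually compared with tabulated critical values.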

Kruskal-Wallis One-Way Analysis of Variance by Ranks

To compare the medians of three or more samples, the Kruskal-Wallis One-Way Analysis of Variance by Ranks can be used. As with other rank-based non-parametric tests, the Kruskal-Wallis test ranks all the data from the samples together and uses the sum of the ranks for each sample to determine whether the samples are the same or whether one or more differ significantly from the others. The test statistic takes differences in sample sizes into account.
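A minimal sketch of the statistic, usually written H = 12 / (N(N + 1)) * sum(R_i^2 / n_i) - 3(N + 1), where R_i is the rank total and n_i the size of sample i. The example assumes no tied values across the pooled data, and the data and function name are made up.

```python
def kruskal_wallis_h(*samples):
    """Kruskal-Wallis H from pooled ranks (assumes all values are distinct)."""
    pooled = sorted(v for s in samples for v in s)
    rank = {v: i for i, v in enumerate(pooled, start=1)}
    n_total = len(pooled)
    # Each sample contributes its squared rank total divided by its own size,
    # which is how unequal sample sizes are taken into account.
    term = sum(sum(rank[v] for v in s) ** 2 / len(s) for s in samples)
    return 12 / (n_total * (n_total + 1)) * term - 3 * (n_total + 1)

a = [27, 31, 35]          # made-up samples of unequal size
b = [40, 42, 46]
c = [50, 53, 58, 61]
print(kruskal_wallis_h(a, b, c))  # about 8.02
```

H is then compared with a chi-square table (or exact tables for small samples) to judge significance.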

Kolmogorov-Smirnov One-Sample Test

Goodness of fit (whether a set of data conforms to a hypothesized distribution) can be checked with the Kolmogorov-Smirnov One-Sample test. (There is also a Kolmogorov-Smirnov Two-Sample test.) Like chi-square, this test compares what is observed in a sample with what should occur under a hypothesized distribution; here the comparison is made between cumulative distributions. The test statistic is based on the largest difference between the observed and hypothesized values, and it is used to judge whether the two are close enough to be consistent.
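The largest-difference calculation can be sketched as below. The hypothesized distribution here is uniform on [0, 1], chosen only to keep the arithmetic easy to follow; the function names and the three data points are made up.

```python
def ks_statistic(sample, cdf):
    """Largest gap between the sample's cumulative proportions and cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The observed cumulative proportion jumps from (i - 1)/n to i/n at x,
        # so the gap is checked on both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def uniform_cdf(x):
    """Cumulative distribution of a uniform variable on [0, 1]."""
    return min(1.0, max(0.0, x))

print(ks_statistic([0.1, 0.4, 0.7], uniform_cdf))  # about 0.3
```

Any other cumulative distribution function can be passed in place of `uniform_cdf`, for example `statistics.NormalDist(mu, sigma).cdf` from the standard library.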

Spearman Rank Correlation Coefficient

To measure the association of two variables, the Spearman Rank Correlation Coefficient can be computed. In this test, the data come from paired observations (measurements on the same subject) and rankings are made for each type of observation independently (that is, the two sets of data are not intermingled and ranked together). The test statistic uses the rankings of the values for each subject to determine how closely the two sets of rankings agree.
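When there are no ties, the coefficient is commonly computed as 1 - 6 * sum(d^2) / (n(n^2 - 1)), where d is the difference between a subject's two ranks. A short sketch under that no-ties assumption, with made-up paired data and illustrative function names:

```python
def simple_ranks(values):
    """1-based ranks within one variable (assumes all values are distinct)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman rank correlation from squared rank differences (no ties)."""
    rx, ry = simple_ranks(x), simple_ranks(y)  # each variable is ranked separately
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

first = [86, 97, 99, 100, 101]  # made-up measurements on five subjects
second = [2, 20, 28, 27, 50]
print(spearman_rho(first, second))  # about 0.9
```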

Brown-Mood Method

The Brown-Mood Method is a way to determine the slope and y-intercept of a regression line that best fits a set of paired data. The method involves plotting the values on a coordinate system, splitting the points into two groups at the median of the x-values, and using the medians of each variable within each group (in essence, quartile values) to estimate the regression line.
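A rough sketch of a first-approximation line, assuming the common description of the method: split the points at the median of x, take the median of x and of y on each side, and draw the line through those two median points. Any later adjustment of the line is not shown, points whose x equals the median are simply left out here, and the function name and data are made up.

```python
from statistics import median

def brown_mood_first_line(x, y):
    """Slope and intercept of the line through the two group-median points."""
    split = median(x)
    lower = [(a, b) for a, b in zip(x, y) if a < split]
    upper = [(a, b) for a, b in zip(x, y) if a > split]
    x1 = median([a for a, _ in lower])
    y1 = median([b for _, b in lower])
    x2 = median([a for a, _ in upper])
    y2 = median([b for _, b in upper])
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept

xs = [1, 2, 3, 4, 5, 6]                      # made-up paired data
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
print(brown_mood_first_line(xs, ys))  # slope about 1.97, intercept about -0.03
```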

Reference

Daniel, W. W. (1990). Applied Nonparametric Statistics, 2nd edition. PWS-Kent Publishing Company.