# Difference between revisions of "Ranking vectors of data"

Jwdietrich (talk | contribs) (→Advanced algorithm addressing potential ties: Removing the variable k, which is not needed in this simplified example.) |
Jwdietrich (talk | contribs) (→Simple algorithm not caring for ties: fixing typo in line 14 of the code (originally reported on Twitter by Jesús Gómez)) |
||

Line 18: | Line 18: | ||

if x[j] < x[i] then | if x[j] < x[i] then | ||

inc(lessCount); | inc(lessCount); | ||

− | result[i] := lessCount | + | result[i] := lessCount + 1; |

end; | end; | ||

end; | end; |

## Latest revision as of 08:48, 7 March 2021

**Ranking** is an important method in statistics and data science. It is a preparative step that replaces each value by a rank in the order of magnitude of the original data. This operation is necessary for many non-parametric methods, e.g. the U test by Wilcoxon-Mann-Whitney or Spearman's rank correlation.

## Contents

## Simple algorithm not caring for ties

In the simplest case, where every value is distinct without identical data, the following frugal algorithm may be sufficient:

```
function rank(x: array of longint): array of real;
var
i, j, n: integer;
lessCount: integer; { number of observations less than a given observation }
begin
n := length(x);
SetLength(result, n);
for i := 0 to n - 1 do
begin
lessCount := 0;
for j := 0 to n - 1 do
if x[j] < x[i] then
inc(lessCount);
result[i] := lessCount + 1;
end;
end;
```

This function delivers a list of ranks that replace the original data. It doesn't deliver satisfactory results if the data contain identical values, e.g. in the vector 3, 4, 4, 9, 0, 1. In statistical terminology, repeated observation-values are referred to as *ties*. The potential presence of ties requires a more complex algorithm:

## Advanced algorithm addressing potential ties

```
type
TTiesMethod = (ascend, descend, average, min, max);
function rank(x: array of longint;
TiesMethod: TTiesMethod = average): array of real;
var
i, j, n: integer;
equalCount, lessCount: array of integer;
begin
n := length(x);
SetLength(lessCount, n);
SetLength(equalCount, n);
SetLength(Result, n);
for i := 0 to n - 1 do
begin
equalCount[i] := 0;
lessCount[i] := 0;
for j := 0 to n - 1 do
case TiesMethod of
average, min, max:
if x[j] = x[i] then
Inc(equalCount[i])
else if x[j] < x[i] then
Inc(lessCount[i]);
ascend:
if (j <= i) and (x[j] = x[i]) then
Inc(equalCount[i])
else if x[j] < x[i] then
Inc(lessCount[i]);
descend:
if (j >= i) and (x[j] = x[i]) then
Inc(equalCount[i])
else if x[j] < x[i] then
Inc(lessCount[i]);
end;
end;
for i := 0 to n - 1 do
case TiesMethod of
average:
Result[i] := lessCount[i] + (equalCount[i] + 1) / 2;
min:
Result[i] := lessCount[i] + 1;
max, ascend, descend:
Result[i] := lessCount[i] + equalCount[i];
end;
end;
```

This function delivers a vector of ranks that is corrected for ties. Similar to the function *rank* of the statistical programming language S (e.g. in the environment R) it supports different methods to address potential ties. They include *average* (default method, the rank for each occurrence of a repeated number is given as *rank*[*i*] = *lessCount* + (*equalCount* + 1) / 2), *ascend* (permutation with increasing values at each index set of ties), *descend* (decreasing values), *min* (replaces values with ties with their minimum) and *max* (replacement by maximum).

## See also

- Samples and permutations
- Dev random
- Functions for descriptive statistics
- Generating Random Numbers
- Marsaglia's pseudo random number generators
- A simple implementation of the Mersenne twister
- Delphi compatible LCG Random
- EverettRandom

## References

- Cooke D, Craven AH, Clarke GM. Statistical Computing in Pascal. Edward Arnold (Publishers), London 1985
- Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
- R Documentation: Sample Ranks.