Introduction
Suppose you have a list of countries and their populations. The quantile rank (and percentile rank) of your country corresponds to the fraction of countries with a population lower than or equal to that of your country.
The difference is that the quantile goes from 0 to 1, and the percentile goes from 0% to 100%.
- 0.25 quantile = 25th percentile = lower quartile
- 0.5 quantile = 50th percentile = median
- 0.75 quantile = 75th percentile = upper quartile
- etc.
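As a quick check of the equivalences listed above, here is a minimal sketch using Python's standard `statistics` module. The `data` values are made up for illustration, and `statistics.quantiles` uses its default "exclusive" interpolation method, so other tools may return slightly different cut points on the same data.

```python
import statistics

# Made-up toy values, already sorted for readability
data = [2, 4, 4, 5, 7, 9, 10, 12, 15]

# Quartiles: the three cut points that split the data into four parts
q1, q2, q3 = statistics.quantiles(data, n=4)

# Percentiles: 99 cut points; index k-1 holds the k-th percentile
percentiles = statistics.quantiles(data, n=100)

# 0.25 quantile = 25th percentile = lower quartile
print(q1, percentiles[24])
# 0.5 quantile = 50th percentile = median
print(q2, percentiles[49], statistics.median(data))
# 0.75 quantile = 75th percentile = upper quartile
print(q3, percentiles[74])
```

Each printed line shows the same cut point reached under its different names.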
So if your country has more inhabitants than 75% of the other countries in the world, it is
- in the 0.75 quantile
- in the 75th percentile
- in the upper quartile.
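To make the definition concrete before moving to real data, here is a small sketch in plain Python. The function name `quantile_rank` and the toy population figures are hypothetical and only serve as an illustration.

```python
def quantile_rank(values, x):
    """Fraction of entries in `values` that are lower than or equal to `x`."""
    if not values:
        raise ValueError("values must not be empty")
    n_lower = sum(1 for v in values if v <= x)
    return n_lower / len(values)


# Hypothetical populations in millions, made up for illustration
toy_populations = [1, 5, 10, 40]

rank = quantile_rank(toy_populations, 10)  # 3 of the 4 values are <= 10
print(rank)          # quantile rank, between 0 and 1
print(100 * rank)    # percentile rank, between 0 and 100
```

Note that with this "lower than or equal to" definition, the largest value always has a quantile rank of 1.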
Let’s compute the quantile rank of your country.
Practice
import pandas as pd
import numpy as np
We will use a simplified version of the WorldBank population per country dataset – the original csv file is available here.
df = pd.read_csv("../data/countries-population-2018.csv")
df = df.dropna()
df['population'] = df['population'].astype(int)
df.to_csv("../data/countries-population-2018.csv", index=False)
df.head(3)
| | country | population |
|---|---|---|
| 0 | aruba | 105845 |
| 1 | afghanistan | 37172386 |
| 2 | angola | 30809762 |
def QuantileRank(df, country):
    # your country's population (first matching row)
    population = int(df.loc[df['country'] == country, 'population'].iloc[0])
    # countries with a population lower than or equal to your country's
    lower = df[df['population'] <= population]
    # number of such countries
    n_lower = len(lower.index)
    # total number of countries
    n_countries = len(df.index)
    # quantile rank
    quantile_rank = n_lower / n_countries
    return quantile_rank
def PercentileRank(df, country):
    # This is just the quantile rank, times 100
    quantile_rank = QuantileRank(df, country)
    percentile_rank = 100.0 * quantile_rank
    return percentile_rank
Canada is in the 81st percentile
PercentileRank(df, 'canada')
81.73076923076923
India is in the 99th percentile
PercentileRank(df, 'india')
99.51923076923077
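If you want the rank of every country at once rather than one lookup at a time, pandas can compute it in a single call. The sketch below is an alternative to the functions above, not the method used in this post. It relies on `Series.rank` with `method="max"` and `pct=True`, which should match the "lower than or equal to" definition when there are no missing values. The `toy` DataFrame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data, standing in for the real dataset
toy = pd.DataFrame({
    "country": ["a", "b", "c", "d"],
    "population": [100, 200, 200, 400],
})

# method="max" gives tied values the highest rank in their group, i.e. the
# count of values <= each value; pct=True divides that by the number of rows
toy["quantile_rank"] = toy["population"].rank(method="max", pct=True)
toy["percentile_rank"] = 100.0 * toy["quantile_rank"]

print(toy)
```

Here the two tied countries both get a quantile rank of 0.75, because three of the four populations are lower than or equal to 200.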
Full code on my GitHub here.