WorldBankDataTd.jl

WorldBankDataTd package for Julia language

Documentation

1 World Bank Data in Julia

As part of the World Bank's Open Data initiative, the World Bank provides free access to their extensive database through their Data Catalog API. This Julia package makes use of the API to interactively access and download World Bank data in Julia.

Similar to the WDI R package, data is downloaded with function wdi. It accepts indicators and countries as either a String or an Array{String, 1}; countries must be given by their World Bank iso2c codes. Two optional Int arguments specify the first and last year of the data period. Setting keyword argument extra to true attaches further country information (capital, longitude, latitude, income range, etc.) to each observation. Output is of type TimeData by default, but can be switched to DataFrame through keyword argument format.

Pkg.clone("https://github.com/JuliaFinMetriX/WorldBankDataTd.jl.git")
using WorldBankDataTd

## single indicator, single country
gnp = WorldBankDataTd.wdi("NY.GNP.PCAP.CD", "BR")

## single indicator, multiple countries
gnp = WorldBankDataTd.wdi("NY.GNP.PCAP.CD", ["BR", "US", "DE"])

## multiple indicators, single country
data = WorldBankDataTd.wdi(["NY.GNP.PCAP.CD", "SP.DYN.LE00.IN"],
                         "BR")

## multiple indicators, multiple countries
data = WorldBankDataTd.wdi(["NY.GNP.PCAP.CD", "SP.DYN.LE00.IN"],
                         ["BR", "US", "DE"])

## same, but as DataFrame
data = WorldBankDataTd.wdi(["NY.GNP.PCAP.CD", "SP.DYN.LE00.IN"],
                         ["BR", "US", "DE"],
                         format = DataFrame)

## multiple indicators, multiple countries, additional information
data = WorldBankDataTd.wdi(["NY.GNP.PCAP.CD", "SP.DYN.LE00.IN"],
                         ["BR", "US", "DE"], extra = true)

data[1:5, :]
idx | iso2c | country | NY.GNP.PCAP.CD | SP.DYN.LE00.IN | iso3c | name | region | regionId | capital | longitude | latitude | income | incomeId | lending | lendingId
1960-12-31 | BR | Brazil | NA | 54.6921463414634 | BRA | Brazil | Latin America & Caribbean (all income levels) | LCN | Brasilia | -47.9292 | -15.7801 | Upper middle income | UMC | IBRD | IBD
1961-12-31 | BR | Brazil | NA | 55.1696341463415 | BRA | Brazil | Latin America & Caribbean (all income levels) | LCN | Brasilia | -47.9292 | -15.7801 | Upper middle income | UMC | IBRD | IBD
1962-12-31 | BR | Brazil | 230 | 55.6330975609756 | BRA | Brazil | Latin America & Caribbean (all income levels) | LCN | Brasilia | -47.9292 | -15.7801 | Upper middle income | UMC | IBRD | IBD
1963-12-31 | BR | Brazil | 250 | 56.08 | BRA | Brazil | Latin America & Caribbean (all income levels) | LCN | Brasilia | -47.9292 | -15.7801 | Upper middle income | UMC | IBRD | IBD
1964-12-31 | BR | Brazil | 270 | 56.5102926829268 | BRA | Brazil | Latin America & Caribbean (all income levels) | LCN | Brasilia | -47.9292 | -15.7801 | Upper middle income | UMC | IBRD | IBD

2 Indicator and country metadata

In addition to the indicator data itself, the World Bank provides extensive metadata for countries and indicators, which can be downloaded with function getWBMeta.

countryData = getWBMeta("countries")
countryData[1:5, :]
iso3c | iso2c | name | region | regionId | capital | longitude | latitude | income | incomeId | lending | lendingId
ABW | AW | Aruba | Latin America & Caribbean (all income levels) | LCN | Oranjestad | -70.0167 | 12.5167 | High income: nonOECD | NOC | Not classified | LNX
AFG | AF | Afghanistan | South Asia | SAS | Kabul | 69.1761 | 34.5228 | Low income | LIC | IDA | IDX
AFR | A9 | Africa | Aggregates | NA | NA | NA | NA | Aggregates | NA | Aggregates | NA
AGO | AO | Angola | Sub-Saharan Africa (all income levels) | SSF | Luanda | 13.242 | -8.81155 | Upper middle income | UMC | IBRD | IBD
ALB | AL | Albania | Europe & Central Asia (all income levels) | ECS | Tirane | 19.8172 | 41.3317 | Upper middle income | UMC | IBRD | IBD

Country metadata provides information for each country or aggregated entity with respect to the following characteristics:

names(countryData)
iso3c
iso2c
name
region
regionId
capital
longitude
latitude
income
incomeId
lending
lendingId

Overall, the country metadata is a DataFrame of the following size:

size(countryData)
262
12

Similarly, the complete indicator metadata can be accessed by passing "indicators".

indicatorData = getWBMeta("indicators")
indicatorData[1:5, :]
indicator | name | description | source_database | source_databaseId | source_organization
1.0.HCount.1.25usd | Poverty Headcount ($1.25 a day) | The poverty headcount index measures the proportion of the population with daily per capita income below the poverty line. | LAC Equity Lab | 37 | LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank).
1.0.HCount.10usd | Under Middle Class ($10 a day) Headcount | The poverty headcount index measures the proportion of the population with daily per capita income below the poverty line. | LAC Equity Lab | 37 | LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank).
1.0.HCount.2.5usd | Poverty Headcount ($2.50 a day) | The poverty headcount index measures the proportion of the population with daily per capita income below the poverty line. | LAC Equity Lab | 37 | LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank).
1.0.HCount.Mid10to50 | Middle Class ($10-50 a day) Headcount | The poverty headcount index measures the proportion of the population with daily per capita income below the poverty line. | LAC Equity Lab | 37 | LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank).
1.0.HCount.Ofcl | Official Moderate Poverty Rate-National | The poverty headcount index measures the proportion of the population with daily per capita income below the poverty line. | LAC Equity Lab | 37 | LAC Equity Lab tabulations of data from National Statistical Offices.

The metadata contains information about each indicator regarding the following characteristics:

names(indicatorData)
indicator
name
description
source_database
source_databaseId
source_organization

The size of this table gives an impression of the large number of indicators that the World Bank provides.

size(indicatorData)
13074
6

You can use these two DataFrames of country and indicator metadata to easily search for information through function search_wdi. Before looking at that, however, let's first spend a few words on downloading and caching of country and indicator metadata.

Function getWBMeta always forces a fresh download of metadata. In general, however, you do not need to download metadata every time you want to conduct a search. You should therefore use function loadWBMeta instead, which downloads metadata only if it can be accessed neither from disk nor from the workspace of the current Julia session. Call getWBMeta directly only if you suspect that your local metadata has been corrupted in some way and needs to be refreshed.

Calling loadWBMeta will first try to access metadata stored in global variables WorldBankDataTd.country_cache and WorldBankDataTd.indicator_cache of the current session. If these variables have not been assigned yet, it looks for a local version of the data in directory WorldBankDataTd/data. If there is no local version yet, it will automatically call getWBMeta, which will download the data and store it both on local disk and as caching variables in the current session.

Since we have already loaded the data, we can access the cached versions in our current workspace.

WorldBankDataTd.country_cache[1:5, :]
iso3c | iso2c | name | region | regionId | capital | longitude | latitude | income | incomeId | lending | lendingId
ABW | AW | Aruba | Latin America & Caribbean (all income levels) | LCN | Oranjestad | -70.0167 | 12.5167 | High income: nonOECD | NOC | Not classified | LNX
AFG | AF | Afghanistan | South Asia | SAS | Kabul | 69.1761 | 34.5228 | Low income | LIC | IDA | IDX
AFR | A9 | Africa | Aggregates | NA | NA | NA | NA | Aggregates | NA | Aggregates | NA
AGO | AO | Angola | Sub-Saharan Africa (all income levels) | SSF | Luanda | 13.242 | -8.81155 | Upper middle income | UMC | IBRD | IBD
ALB | AL | Albania | Europe & Central Asia (all income levels) | ECS | Tirane | 19.8172 | 41.3317 | Upper middle income | UMC | IBRD | IBD
WorldBankDataTd.indicator_cache[1:5, :]
indicator | name | description | source_database | source_databaseId | source_organization
1.0.HCount.1.25usd | Poverty Headcount ($1.25 a day) | The poverty headcount index measures the proportion of the population with daily per capita income below the poverty line. | LAC Equity Lab | 37 | LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank).
1.0.HCount.10usd | Under Middle Class ($10 a day) Headcount | The poverty headcount index measures the proportion of the population with daily per capita income below the poverty line. | LAC Equity Lab | 37 | LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank).
1.0.HCount.2.5usd | Poverty Headcount ($2.50 a day) | The poverty headcount index measures the proportion of the population with daily per capita income below the poverty line. | LAC Equity Lab | 37 | LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank).
1.0.HCount.Mid10to50 | Middle Class ($10-50 a day) Headcount | The poverty headcount index measures the proportion of the population with daily per capita income below the poverty line. | LAC Equity Lab | 37 | LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank).
1.0.HCount.Ofcl | Official Moderate Poverty Rate-National | The poverty headcount index measures the proportion of the population with daily per capita income below the poverty line. | LAC Equity Lab | 37 | LAC Equity Lab tabulations of data from National Statistical Offices.
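
The lookup order described above for loadWBMeta (session cache first, then local disk, then download) can be illustrated with a minimal, self-contained sketch. This is not the package's actual code: the names memory_cache, fake_download, refresh_meta and load_meta are hypothetical, plain string vectors stand in for the metadata DataFrames, and a temporary directory stands in for WorldBankDataTd/data.

```julia
# Hypothetical stand-in for the session-level caching variables.
const memory_cache = Dict{String,Vector{String}}()

# Counts how often the (fake) download is triggered.
const download_count = Ref(0)

# Placeholder for the real download step; no network access happens here.
function fake_download(kind::String)
    download_count[] += 1
    return ["$(kind) row 1", "$(kind) row 2"]
end

# Forced refresh: download, store on disk, and update the session cache.
function refresh_meta(kind::String, dir::String)
    rows = fake_download(kind)
    mkpath(dir)
    write(joinpath(dir, "$(kind).txt"), join(rows, "\n"))
    memory_cache[kind] = rows
    return rows
end

# Lookup order: session cache, then local file, then download.
function load_meta(kind::String, dir::String)
    haskey(memory_cache, kind) && return memory_cache[kind]
    path = joinpath(dir, "$(kind).txt")
    if isfile(path)
        rows = readlines(path)
        memory_cache[kind] = rows
        return rows
    end
    return refresh_meta(kind, dir)
end

dir = mktempdir()
first_call  = load_meta("countries", dir)   # nothing cached yet: triggers the download
second_call = load_meta("countries", dir)   # served from the session cache
```

Under these assumptions, only the first call triggers a download. A new session with an empty in-memory cache would read the file from disk instead of downloading again.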

3 Searching

The most convenient way to explore indicators is probably still the World Bank webpage, where you can easily use the search functionality to find what you are looking for. Once you have found the indicator of interest, you can read off the indicator shortcut name (e.g. SP.DYN.LE00.IN) from the URL of the indicator webpage.

Alternatively, this package also lets you search the database interactively from within Julia. Function search_wdi uses the cached country and indicator metadata to speed up search operations. The first argument selects the metadata to be searched, either "countries" or "indicators". The second argument specifies the column to be searched, and the third argument gives the search term as a regex.

For example, searching the :name column of countries for a case insensitive occurrence of "united":

res = search_wdi("countries", :name, r"united"i)
res
iso3c | iso2c | name | region | regionId | capital | longitude | latitude | income | incomeId | lending | lendingId
ARE | AE | United Arab Emirates | Middle East & North Africa (all income levels) | MEA | Abu Dhabi | 54.3705 | 24.4764 | High income: nonOECD | NOC | Not classified | LNX
GBR | GB | United Kingdom | Europe & Central Asia (all income levels) | ECS | London | -0.126236 | 51.5002 | High income: OECD | OEC | Not classified | LNX
USA | US | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX

Or, searching indicators by some given description:

res = search_wdi("indicators", :description, r"gross national expenditure"i)
res[:name]
Gross national expenditure deflator (base year varies by country)
Gross national expenditure (current US$)
Gross national expenditure (current LCU)
Gross national expenditure (constant 2005 US$)
Gross national expenditure (constant LCU)
Gross national expenditure (% of GDP)

Some further search examples:

search_wdi("countries", :iso2c, r"TZ"i)
search_wdi("countries", :income, r"upper middle"i)
search_wdi("countries", :region, r"Latin America"i)
search_wdi("countries", :capital, r"^Ka"i)
search_wdi("countries", :lending, r"IBRD"i)
search_wdi("indicators", :name, r"gross national expenditure"i)
search_wdi("indicators", :description, r"gross national expenditure"i)
search_wdi("indicators", :source_database, r"Sustainable"i)
search_wdi("indicators", :source_organization,
           r"Global Partnership"i)[1:5, :]
indicator | name | description | source_database | source_databaseId | source_organization
2.1PRE.PRIMARY.GER | School enrolment, preprimary, national source (% gross) | Pre-Primary Gross Enrolment Rate (GER): The number of pupils enrolled in pre-primary school, regardless of age, expressed as a percentage of the population in the theoretical age group in pre-primary school. The purpose of this indicator is to measure the general level of participation of children in Early Childhood Education (ECE) programs. Country-specific definition, method and targets are determined by countries themselves. | Global Partnership for Education | 34 | Data were collected from national and other publicly available sources, and validated by the Local Education Group (LEG) in each country. LEGs are typically led by the Ministry of Education and include development partners and other education stakeholders. Data were not processed or analyzed by the Global Partnership for Education. It is reported as it was presented in the original sources, or as it was communicated to us through the Coordinating Agency or Lead Donor of the LEG.
2.2GIR | Gross intake ratio in grade 1, total, national source (% of relevant age group) | Gross intake ratio (GIR): This indicator measures the total number of new entrants in the first grade of primary education, regardless of age, expressed as a percentage of the population at the official primary school-entrance age. Country-specific definition, method and targets are determined by countries themselves. | Global Partnership for Education | 34 | Data were collected from national and other publicly available sources, and validated by the Local Education Group (LEG) in each country. LEGs are typically led by the Ministry of Education and include development partners and other education stakeholders. Data were not processed or analyzed by the Global Partnership for Education. It is reported as it was presented in the original sources, or as it was communicated to us through the Coordinating Agency or Lead Donor of the LEG.
2.3GIR.GPI | Gender parity index for gross intake ratio in grade 1 | Ratio of female to male values of gross intake ratio for primary first grade. Country-specific definition, method and targets are determined by countries themselves. | Global Partnership for Education | 34 | Data were collected from national and other publicly available sources, and validated by the Local Education Group (LEG) in each country. LEGs are typically led by the Ministry of Education and include development partners and other education stakeholders. Data were not processed or analyzed by the Global Partnership for Education. It is reported as it was presented in the original sources, or as it was communicated to us through the Coordinating Agency or Lead Donor of the LEG.
2.4OOSC.RATE | Rate of out of school children, national source (% of relevant age group) | Number of children of official primary school age who are not enrolled in primary or secondary school, expressed as a percentage of the population of official primary school age. This indicator is intended to measure the size of the population in the official primary school age range that should be targeted by policies and efforts to achieve universal primary education. Country-specific definition, method and targets are determined by countries themselves. | Global Partnership for Education | 34 | Data were collected from national and other publicly available sources, and validated by the Local Education Group (LEG) in each country. LEGs are typically led by the Ministry of Education and include development partners and other education stakeholders. Data were not processed or analyzed by the Global Partnership for Education. It is reported as it was presented in the original sources, or as it was communicated to us through the Coordinating Agency or Lead Donor of the LEG.
2.5PCR | Primary completion rate, total, national source (% of relevant age group) | The Primary Completion Rate (PCR) is the percentage of pupils who completed the last year of primary schooling. It is computed by dividing the total number of students in the last grade of primary school minus repeaters in that grade, divided by the total number of children of official completing age. Country-specific definition, method and targets are determined by countries themselves. | Global Partnership for Education | 34 | Data were collected from national and other publicly available sources, and validated by the Local Education Group (LEG) in each country. LEGs are typically led by the Ministry of Education and include development partners and other education stakeholders. Data were not processed or analyzed by the Global Partnership for Education. It is reported as it was presented in the original sources, or as it was communicated to us through the Coordinating Agency or Lead Donor of the LEG.
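
To make the role of the regex argument more concrete, the following self-contained sketch filters a small hand-made table in the same spirit as the searches above. It is only an illustration and not how search_wdi is implemented: the helper search_rows is hypothetical, and the rows are a reduced copy of a few country entries shown earlier, stored as named tuples rather than as a DataFrame.

```julia
# Hypothetical miniature of the country metadata: one NamedTuple per row.
countries = [
    (iso2c = "AE", name = "United Arab Emirates", capital = "Abu Dhabi"),
    (iso2c = "AF", name = "Afghanistan",          capital = "Kabul"),
    (iso2c = "GB", name = "United Kingdom",       capital = "London"),
    (iso2c = "US", name = "United States",        capital = "Washington D.C."),
]

# Keep the rows whose entry in `column` contains a match for `pattern`.
search_rows(rows, column::Symbol, pattern::Regex) =
    filter(row -> occursin(pattern, getproperty(row, column)), rows)

# The trailing `i` flag makes the match case insensitive.
united = search_rows(countries, :name, r"united"i)

# `^` anchors the match at the start of the entry.
ka = search_rows(countries, :capital, r"^Ka"i)
```

Here united holds the rows for AE, GB and US, while ka holds only the row for AF, since Kabul is the only capital in this small table that starts with "Ka".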

4 Tips and Tricks

You can subset your data with respect to rows, columns or individual entries through the ordinary TimeData functions. For example, selecting entries of US only:

data = wdi("NY.GNP.PCAP.CD", ["US","BR"], 1980, 2012, extra = true)
usData = chkDates(x-> x[:iso2c] .== "US", eachdate(data)) |>
         x -> asArr(x, Bool, false) |>
         x -> data[x[:], :]
usData
idx | iso2c | country | NY.GNP.PCAP.CD | iso3c | name | region | regionId | capital | longitude | latitude | income | incomeId | lending | lendingId
1980-12-31 | US | United States | 13410 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
1981-12-31 | US | United States | 14400 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
1982-12-31 | US | United States | 14230 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
1983-12-31 | US | United States | 14590 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
1984-12-31 | US | United States | 16230 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
1985-12-31 | US | United States | 17510 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
1986-12-31 | US | United States | 19160 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
1987-12-31 | US | United States | 21460 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
1988-12-31 | US | United States | 23580 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
1989-12-31 | US | United States | 23860 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
1990-12-31 | US | United States | 24150 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
1991-12-31 | US | United States | 24370 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
1992-12-31 | US | United States | 25780 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
1993-12-31 | US | United States | 26480 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
1994-12-31 | US | United States | 27750 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
1995-12-31 | US | United States | 29150 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
1996-12-31 | US | United States | 30380 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
1997-12-31 | US | United States | 31390 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
1998-12-31 | US | United States | 32150 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
1999-12-31 | US | United States | 33800 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
2000-12-31 | US | United States | 36090 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
2001-12-31 | US | United States | 36840 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
2002-12-31 | US | United States | 37460 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
2003-12-31 | US | United States | 39950 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
2004-12-31 | US | United States | 43690 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
2005-12-31 | US | United States | 46350 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
2006-12-31 | US | United States | 48080 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
2007-12-31 | US | United States | 48640 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
2008-12-31 | US | United States | 49350 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
2009-12-31 | US | United States | 48040 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
2010-12-31 | US | United States | 48960 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
2011-12-31 | US | United States | 50660 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX
2012-12-31 | US | United States | 52350 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX

Furthermore, data can be visualized through the TimeData plotting functions. Simply call function loadPlotting to load the Winston and Gadfly packages, and then plot the data as a Timenum object through wstPlot or gdfPlot:

data = wdi("AG.LND.ARBL.HA.PC", "US", 1900, 2011)
arableLand = convert(Timematr, data[symbol("AG.LND.ARBL.HA.PC")])

loadPlotting()

## using Winston
wstPlot(arableLand)

## using Gadfly
gdfPlot(arableLand)

If the requested data is missing, wdi returns an empty TimeData object without a warning.

dfAS = wdi("EN.ATM.CO2E.KT", "AS")

5 Acknowledgement

Most of this package was originally developed by 4gh (Frank Herrmann); the original package is still available through the official Julia package repository and on GitHub.
