Documentation
Table of Contents
1 World Bank Data in Julia
As part of the World Bank's Open Data initiative, the World Bank provides free access to their extensive database through their Data Catalog API. This Julia package makes use of the API to interactively access and download World Bank data in Julia.
Similar to the WDI R package, data can be downloaded using function
wdi
. This function accepts indicators and countries as String
or
Array{String, 1}
. Thereby, countries must be given as World Bank's
iso2c code. Two additional Int
inputs allow specifying beginning
and end of the data period, while input true
for keyword argument
extra
attaches further country information like the capital,
longitude, latitude, income range, etc. to each observation. Output
will be of type TimeData
by default, but can be set to DataFrame
through keyword argument format
.
Pkg.clone("https://github.com/JuliaFinMetriX/WorldBankDataTd.jl.git")
using WorldBankDataTd ## single indicator, single country gnp = WorldBankDataTd.wdi("NY.GNP.PCAP.CD", "BR") ## single indicator, multiple countries gnp = WorldBankDataTd.wdi("NY.GNP.PCAP.CD", ["BR", "US", "DE"]) ## multiple indicators, single country data = WorldBankDataTd.wdi(["NY.GNP.PCAP.CD", "SP.DYN.LE00.IN"], "BR") ## multiple indicators, multiple countries data = WorldBankDataTd.wdi(["NY.GNP.PCAP.CD", "SP.DYN.LE00.IN"], ["BR", "US", "DE"]) ## same, but as DataFrame data = WorldBankDataTd.wdi(["NY.GNP.PCAP.CD", "SP.DYN.LE00.IN"], ["BR", "US", "DE"], format = DataFrame) ## multiple indicators, multiple countries, additional information data = WorldBankDataTd.wdi(["NY.GNP.PCAP.CD", "SP.DYN.LE00.IN"], ["BR", "US", "DE"], extra = true) data[1:5, :]
idx | iso2c | country | NY.GNP.PCAP.CD | SP.DYN.LE00.IN | iso3c | name | region | regionId | capital | longitude | latitude | income | incomeId | lending | lendingId |
1960-12-31 | BR | Brazil | NA | 54.6921463414634 | BRA | Brazil | Latin America & Caribbean (all income levels) | LCN | Brasilia | -47.9292 | -15.7801 | Upper middle income | UMC | IBRD | IBD |
1961-12-31 | BR | Brazil | NA | 55.1696341463415 | BRA | Brazil | Latin America & Caribbean (all income levels) | LCN | Brasilia | -47.9292 | -15.7801 | Upper middle income | UMC | IBRD | IBD |
1962-12-31 | BR | Brazil | 230 | 55.6330975609756 | BRA | Brazil | Latin America & Caribbean (all income levels) | LCN | Brasilia | -47.9292 | -15.7801 | Upper middle income | UMC | IBRD | IBD |
1963-12-31 | BR | Brazil | 250 | 56.08 | BRA | Brazil | Latin America & Caribbean (all income levels) | LCN | Brasilia | -47.9292 | -15.7801 | Upper middle income | UMC | IBRD | IBD |
1964-12-31 | BR | Brazil | 270 | 56.5102926829268 | BRA | Brazil | Latin America & Caribbean (all income levels) | LCN | Brasilia | -47.9292 | -15.7801 | Upper middle income | UMC | IBRD | IBD |
2 Indicator and country metadata
In addition to the indicator itself, the World Bank also provides a
lot of metadata for countries and indicators which can be downloaded
with function getWBMeta
.
countryData = getWBMeta("countries") countryData[1:5, :]
iso3c | iso2c | name | region | regionId | capital | longitude | latitude | income | incomeId | lending | lendingId |
ABW | AW | Aruba | Latin America & Caribbean (all income levels) | LCN | Oranjestad | -70.0167 | 12.5167 | High income: nonOECD | NOC | Not classified | LNX |
AFG | AF | Afghanistan | South Asia | SAS | Kabul | 69.1761 | 34.5228 | Low income | LIC | IDA | IDX |
AFR | A9 | Africa | Aggregates | NA | NA | NA | NA | Aggregates | NA | Aggregates | NA |
AGO | AO | Angola | Sub-Saharan Africa (all income levels) | SSF | Luanda | 13.242 | -8.81155 | Upper middle income | UMC | IBRD | IBD |
ALB | AL | Albania | Europe & Central Asia (all income levels) | ECS | Tirane | 19.8172 | 41.3317 | Upper middle income | UMC | IBRD | IBD |
Country metadata provides information for each country or aggregated entity with respect to the following characteristics:
names(countryData)
iso3c |
iso2c |
name |
region |
regionId |
capital |
longitude |
latitude |
income |
incomeId |
lending |
lendingId |
Overall, the country metadata is a DataFrame
of the following size:
size(countryData)
262 |
12 |
Quite similarly, you can access the complete indicator metadata
through input indicators
.
indicatorData = getWBMeta("indicators") indicatorData[1:5, :]
indicator | name | description | sourcedatabase | sourcedatabaseId | sourceorganization |
1.0.HCount.1.25usd | Poverty Headcount ($1.25 a day) | The poverty headcount index measures the proportion of the population with daily per capita income below the poverty line. | LAC Equity Lab | 37 | LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank). |
1.0.HCount.10usd | Under Middle Class ($10 a day) Headcount | The poverty headcount index measures the proportion of the population with daily per capita income below the poverty line. | LAC Equity Lab | 37 | LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank). |
1.0.HCount.2.5usd | Poverty Headcount ($2.50 a day) | The poverty headcount index measures the proportion of the population with daily per capita income below the poverty line. | LAC Equity Lab | 37 | LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank). |
1.0.HCount.Mid10to50 | Middle Class ($10-50 a day) Headcount | The poverty headcount index measures the proportion of the population with daily per capita income below the poverty line. | LAC Equity Lab | 37 | LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank). |
1.0.HCount.Ofcl | Official Moderate Poverty Rate-National | The poverty headcount index measures the proportion of the population with daily per capita income below the poverty line. | LAC Equity Lab | 37 | LAC Equity Lab tabulations of data from National Statistical Offices. |
The metadata contains information about each indicator regarding the following characteristics:
names(indicatorData)
indicator |
name |
description |
sourcedatabase |
sourcedatabaseId |
sourceorganization |
Just take a look at the size of this table in order to get an impression about the huge amount of indicators that are provided by the World Bank.
size(indicatorData)
13074 |
6 |
You can use these two DataFrames
of country and indicator metadata
to easily search for information through function search_wdi
. Before
looking at that, however, let's first spend a few words on downloading
and caching of country and indicator metadata.
Function getWBMeta
is required to force a fresh download of
metadata. In general, however, you do not need to download metadata
each time that you want to conduct a search. Hence, you should rather
use function loadWBMeta
instead, which only downloads metadata in
case that it can neither be accessed from disk nor from the workspace
of the current Julia session. getWBMeta
should be called only if you
think that your local metadata was corrupted in some way and needs to
be refreshed.
Calling loadWBMeta
will first try to access metadata stored in
global variables WorldBankDataTd.country_cache
and
WorldBankDataTd.indicator_cache
of the current session. If these
variables have not been assigned yet, it looks for a local version of
the data in directory WorldBankDataTd/data. If there is no local
version yet, it will automatically call getWBMeta
, which will
download the data and store it both on local disk and as caching
variables in the current session.
Hence, as we already did load the data, we can access the cached versions in our current workspace.
WorldBankDataTd.country_cache[1:5, :]
iso3c | iso2c | name | region | regionId | capital | longitude | latitude | income | incomeId | lending | lendingId |
ABW | AW | Aruba | Latin America & Caribbean (all income levels) | LCN | Oranjestad | -70.0167 | 12.5167 | High income: nonOECD | NOC | Not classified | LNX |
AFG | AF | Afghanistan | South Asia | SAS | Kabul | 69.1761 | 34.5228 | Low income | LIC | IDA | IDX |
AFR | A9 | Africa | Aggregates | NA | NA | NA | NA | Aggregates | NA | Aggregates | NA |
AGO | AO | Angola | Sub-Saharan Africa (all income levels) | SSF | Luanda | 13.242 | -8.81155 | Upper middle income | UMC | IBRD | IBD |
ALB | AL | Albania | Europe & Central Asia (all income levels) | ECS | Tirane | 19.8172 | 41.3317 | Upper middle income | UMC | IBRD | IBD |
WorldBankDataTd.indicator_cache[1:5, :]
indicator | name | description | sourcedatabase | sourcedatabaseId | sourceorganization |
1.0.HCount.1.25usd | Poverty Headcount ($1.25 a day) | The poverty headcount index measures the proportion of the population with daily per capita income below the poverty line. | LAC Equity Lab | 37 | LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank). |
1.0.HCount.10usd | Under Middle Class ($10 a day) Headcount | The poverty headcount index measures the proportion of the population with daily per capita income below the poverty line. | LAC Equity Lab | 37 | LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank). |
1.0.HCount.2.5usd | Poverty Headcount ($2.50 a day) | The poverty headcount index measures the proportion of the population with daily per capita income below the poverty line. | LAC Equity Lab | 37 | LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank). |
1.0.HCount.Mid10to50 | Middle Class ($10-50 a day) Headcount | The poverty headcount index measures the proportion of the population with daily per capita income below the poverty line. | LAC Equity Lab | 37 | LAC Equity Lab tabulations of SEDLAC (CEDLAS and the World Bank). |
1.0.HCount.Ofcl | Official Moderate Poverty Rate-National | The poverty headcount index measures the proportion of the population with daily per capita income below the poverty line. | LAC Equity Lab | 37 | LAC Equity Lab tabulations of data from National Statistical Offices. |
3 Searching
The most convenient way to explore indicators probably still is the World Bank webpage, where you can easily use the search functionality to find what your are looking for. Once you found the indicator of interest, you can read off the indicator shortcut name (e.g. SP.DYN.LE00.IN) from the URL of the indicator webpage.
Alternatively, however, this package also contains functionality to
interactively search the database from Julia itself. Thereby, function
search_wdi
makes use of cached country and indicator metadata to
speed up search operations. As first argument, you need to choose the
metadata to be searched. This can be either "countries" or
"indicators". The second argument needs to specify the column to be
searched, while the actual search term needs to be given as regex as
third argument.
For example, searching the :name
column of countries for a case
insensitive occurrence of "united":
res = search_wdi("countries", :name, r"united"i) res
iso3c | iso2c | name | region | regionId | capital | longitude | latitude | income | incomeId | lending | lendingId |
ARE | AE | United Arab Emirates | Middle East & North Africa (all income levels) | MEA | Abu Dhabi | 54.3705 | 24.4764 | High income: nonOECD | NOC | Not classified | LNX |
GBR | GB | United Kingdom | Europe & Central Asia (all income levels) | ECS | London | -0.126236 | 51.5002 | High income: OECD | OEC | Not classified | LNX |
USA | US | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
Or, searching indicators by some given description:
res = search_wdi("indicators", :description, r"gross national expenditure"i) res[:name]
Gross national expenditure deflator (base year varies by country) |
Gross national expenditure (current US$) |
Gross national expenditure (current LCU) |
Gross national expenditure (constant 2005 US$) |
Gross national expenditure (constant LCU) |
Gross national expenditure (% of GDP) |
Some further search examples:
search_wdi("countries", :iso2c, r"TZ"i) search_wdi("countries", :income, r"upper middle"i) search_wdi("countries", :region, r"Latin America"i) search_wdi("countries", :capital, r"^Ka"i) search_wdi("countries", :lending, r"IBRD"i) search_wdi("indicators", :name, r"gross national expenditure"i) search_wdi("indicators", :description, r"gross national expenditure"i) search_wdi("indicators", :source_database, r"Sustainable"i) search_wdi("indicators", :source_organization, r"Global Partnership"i)[1:5, :]
indicator | name | description | sourcedatabase | sourcedatabaseId | sourceorganization |
2.1PRE.PRIMARY.GER | School enrolment, preprimary, national source (% gross) | Pre-Primary Gross Enrolment Rate (GER): The number of pupils enrolled in pre-primary school, regardless of age, expressed as a percentage of the population in the theoretical age group in pre-primary school. The purpose of this indicator is to measure the general level of participation of children in Early Childhood Education (ECE) programs. Country-specific definition, method and targets are determined by countries themselves. | Global Partnership for Education | 34 | Data were collected from national and other publicly available sources, and validated by the Local Education Group (LEG) in each country. LEGs are typically led by the Ministry of Education and include development partners and other education stakeholders. Data were not processed or analyzed by the Global Partnership for Education. It is reported as it was presented in the original sources, or as it was communicated to us through the Coordinating Agency or Lead Donor of the LEG. |
2.2GIR | Gross intake ratio in grade 1, total, national source (% of relevant age group) | Gross intake ratio (GIR): This indicator measures the total number of new entrants in the first grade of primary education, regardless of age, expressed as a percentage of the population at the official primary school-entrance age. Country-specific definition, method and targets are determined by countries themselves. | Global Partnership for Education | 34 | Data were collected from national and other publicly available sources, and validated by the Local Education Group (LEG) in each country. LEGs are typically led by the Ministry of Education and include development partners and other education stakeholders. Data were not processed or analyzed by the Global Partnership for Education. It is reported as it was presented in the original sources, or as it was communicated to us through the Coordinating Agency or Lead Donor of the LEG. |
2.3GIR.GPI | Gender parity index for gross intake ratio in grade 1 | Ratio of female to male values of gross intake ratio for primary first grade. Country-specific definition, method and targets are determined by countries themselves. | Global Partnership for Education | 34 | Data were collected from national and other publicly available sources, and validated by the Local Education Group (LEG) in each country. LEGs are typically led by the Ministry of Education and include development partners and other education stakeholders. Data were not processed or analyzed by the Global Partnership for Education. It is reported as it was presented in the original sources, or as it was communicated to us through the Coordinating Agency or Lead Donor of the LEG. |
2.4OOSC.RATE | Rate of out of school children, national source (% of relevant age group) | Number of children of official primary school age who are not enrolled in primary or secondary school, expressed as a percentage of the population of official primary school age. This indicator is intended to measure the size of the population in the official primary school age range that should be targeted by policies and efforts to achieve universal primary education. Country-specific definition, method and targets are determined by countries themselves. | Global Partnership for Education | 34 | Data were collected from national and other publicly available sources, and validated by the Local Education Group (LEG) in each country. LEGs are typically led by the Ministry of Education and include development partners and other education stakeholders. Data were not processed or analyzed by the Global Partnership for Education. It is reported as it was presented in the original sources, or as it was communicated to us through the Coordinating Agency or Lead Donor of the LEG. |
2.5PCR | Primary completion rate, total, national source (% of relevant age group) | The Primary Completion Rate (PCR) is the percentage of pupils who completed the last year of primary schooling. It is computed by dividing the total number of students in the last grade of primary school minus repeaters in that grade, divided by the total number of children of official completing age. Country-specific definition, method and targets are determined by countries themselves. | Global Partnership for Education | 34 | Data were collected from national and other publicly available sources, and validated by the Local Education Group (LEG) in each country. LEGs are typically led by the Ministry of Education and include development partners and other education stakeholders. Data were not processed or analyzed by the Global Partnership for Education. It is reported as it was presented in the original sources, or as it was communicated to us through the Coordinating Agency or Lead Donor of the LEG. |
4 Tips and Tricks
You can subset your data with respect to rows, columns or individual
entries through the ordinary TimeData
functions. For example,
selecting entries of US only:
data = wdi("NY.GNP.PCAP.CD", ["US","BR"], 1980, 2012, extra = true) usData = chkDates(x-> x[:iso2c] .== "US", eachdate(data)) |> x -> asArr(x, Bool, false) |> x -> data[x[:], :] usData
idx | iso2c | country | NY.GNP.PCAP.CD | iso3c | name | region | regionId | capital | longitude | latitude | income | incomeId | lending | lendingId |
1980-12-31 | US | United States | 13410 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
1981-12-31 | US | United States | 14400 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
1982-12-31 | US | United States | 14230 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
1983-12-31 | US | United States | 14590 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
1984-12-31 | US | United States | 16230 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
1985-12-31 | US | United States | 17510 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
1986-12-31 | US | United States | 19160 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
1987-12-31 | US | United States | 21460 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
1988-12-31 | US | United States | 23580 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
1989-12-31 | US | United States | 23860 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
1990-12-31 | US | United States | 24150 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
1991-12-31 | US | United States | 24370 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
1992-12-31 | US | United States | 25780 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
1993-12-31 | US | United States | 26480 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
1994-12-31 | US | United States | 27750 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
1995-12-31 | US | United States | 29150 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
1996-12-31 | US | United States | 30380 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
1997-12-31 | US | United States | 31390 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
1998-12-31 | US | United States | 32150 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
1999-12-31 | US | United States | 33800 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
2000-12-31 | US | United States | 36090 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
2001-12-31 | US | United States | 36840 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
2002-12-31 | US | United States | 37460 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
2003-12-31 | US | United States | 39950 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
2004-12-31 | US | United States | 43690 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
2005-12-31 | US | United States | 46350 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
2006-12-31 | US | United States | 48080 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
2007-12-31 | US | United States | 48640 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
2008-12-31 | US | United States | 49350 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
2009-12-31 | US | United States | 48040 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
2010-12-31 | US | United States | 48960 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
2011-12-31 | US | United States | 50660 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
2012-12-31 | US | United States | 52350 | USA | United States | North America | NAC | Washington D.C. | -77.032 | 38.8895 | High income: OECD | OEC | Not classified | LNX |
Furthermore, data can be visualized through the TimeData
plotting
functions. Simply call function loadPlotting
to load Winston
and
Gadfly
packages, and directly plot the data as Timenum
object
through wstPlot
or gdfPlot
:
data = wdi("AG.LND.ARBL.HA.PC", "US", 1900, 2011) arableLand = convert(Timematr, data[symbol("AG.LND.ARBL.HA.PC")]) loadPlotting() ## using Winston wstPlot(arableLand) ## using Gadfly gdfPlot(arableLand)
In case of missing values, wdi
will return an empty TimeData object
without warning.
dfAS = wdi("EN.ATM.CO2E.KT", "AS")