EconDatasets package for the Julia language



1 Motivation

The package aims to make various econometric data sets accessible, much as the RDatasets package provides access to the standard data sets available in R.

The problem with some econometric data sets, however, is that although the data are freely available on the web, redistributing them is not permitted. Hence, some of the data sets are not shipped with the package; they must first be downloaded into the package's data directory.

Once a given data set has been downloaded, it can be loaded in Julia with syntax similar to that of RDatasets:

using EconDatasets
sectors = dataset("Sectors")

To download a given data set, first call the function getDataset. This downloads the data into the package's data directory, making it available for future use with the function dataset.

getDataset("FFF") # Fama French factors
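The split between a one-time download step and a repeatable load step can be illustrated with a small, self-contained sketch in Julia 1.x. It is not the EconDatasets implementation: the functions store_dataset and load_dataset, the fetch_raw callback, the ".csv" file naming, and the temporary directory standing in for the package's data directory are all hypothetical.

```julia
# Hypothetical sketch (not the EconDatasets API): a "download" step that
# writes raw data into a data directory, and a "load" step that reads it back.
# `fetch_raw` stands in for whatever actually retrieves the data from the web.

function store_dataset(name::AbstractString, datadir::AbstractString, fetch_raw::Function)
    mkpath(datadir)                          # create the data directory if needed
    path = joinpath(datadir, name * ".csv")  # assumed file naming scheme
    write(path, fetch_raw(name))             # store the raw text on disk
    return path
end

function load_dataset(name::AbstractString, datadir::AbstractString)
    path = joinpath(datadir, name * ".csv")
    isfile(path) || error("data set \"$name\" not found; download it first")
    return read(path, String)                # later sessions only read from disk
end

# Usage with a dummy fetcher and a temporary directory:
dir = mktempdir()
fake_fetch(name) = "date,value\n2020-01-31,1.0\n"   # dummy content, not real data
store_dataset("FFF", dir, fake_fetch)
txt = load_dataset("FFF", dir)
```

Keeping the two steps separate means the network is touched only once, while every later session reads the local copy.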

In addition, there are functions that load data directly into an interactive Julia session without storing them on disk for future use. For some of these data sets, however, minor manual work may be required to label and process the data appropriately.
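What such manual labeling might look like is sketched below in Julia 1.x. The sketch assumes, purely for illustration, that an interactive reader returned an unlabeled numeric matrix whose first column is a yyyymm date code; the helpers label_rows and yyyymm_to_date, the column names, and the values are hypothetical and do not describe what readFamaFrenchRaw or the other functions actually return.

```julia
using Dates

# Hypothetical raw block: an unlabeled numeric matrix whose first
# column is assumed to be a yyyymm date code.
raw = [202001.0 0.5 -0.1;
       202002.0 0.3  0.2]      # dummy values, not real data

# Attach user-chosen column names to each row.
function label_rows(raw::AbstractMatrix, names::NTuple{N,Symbol}) where {N}
    size(raw, 2) == N || throw(ArgumentError("expected $N columns, got $(size(raw, 2))"))
    return [NamedTuple{names}(Tuple(raw[i, :])) for i in axes(raw, 1)]
end

# Convert a yyyymm code such as 202002 into a Date (first day of the month).
yyyymm_to_date(code::Real) = Date(div(Int(code), 100), rem(Int(code), 100))

rows = label_rows(raw, (:yyyymm, :factor1, :factor2))   # hypothetical labels
dates = [yyyymm_to_date(r.yyyymm) for r in rows]
```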

2 Table of data sets

Name          Description
------------  ------------------------------------------
Sectors       Sector affiliations for SP500 components
SP500         Stock price data for SP500 components
UMD           Fama-French momentum portfolio
FFF           Fama-French factors
SP500Ticker   SP500 ticker symbols from Wikipedia
Indices       Major stock price indices
Treasuries    US Treasury rates for several maturities
DieboldLi     Fixed-maturity yields used in Diebold-Li

Of these data sets, only the following are already included in the repository:

  • Sectors
  • SP500Ticker

3 Data sets to be downloaded first

The following data sets do not ship with the package, as they may not be redistributed. Hence, they need to be downloaded first, which can be done with the function getDataset. Some of these data sets are obtained by running a script rather than a function, so they will create variables in your workspace. Also, if running a script through the high-level function getDataset causes an error, it may still work when you run the script directly with include:

include(joinpath(Pkg.dir("EconDatasets"), "src", "getDataset", "getSP500.jl"))

  • Data sets downloaded by functions
    • FFF
    • UMD
    • Indices
    • Treasuries
    • DieboldLi
  • Data sets downloaded by scripts
    • SP500 (uses parallel computing)
    • SP500Ticker (uses the Gumbo package)
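The difference between data sets fetched by scripts and those fetched by functions comes down to scope: include evaluates a file in the global scope of the current module, so every top-level assignment in the script becomes a workspace variable, whereas a function keeps its intermediate variables local. The following Julia 1.x sketch illustrates this; the temporary script, the function load_prices, and the variable names are hypothetical stand-ins for scripts such as getSP500.jl.

```julia
# Hypothetical stand-in for a download script: it assigns at top level.
script = tempname() * ".jl"
write(script, """
raw_prices = [10.0, 10.5, 10.2]   # dummy values
n_obs = length(raw_prices)
""")

# A function computes the same thing but keeps its variables local.
function load_prices()
    local_prices = [10.0, 10.5, 10.2]   # dummy values
    return local_prices
end

include(script)          # defines raw_prices and n_obs in the current module
prices = load_prices()   # local_prices stays inside the function
```

After the include call, raw_prices and n_obs are visible in the workspace, while only the returned value of load_prices is.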

4 Table of functions to interactively download data

Name                Description
------------------  ------------------
readFamaFrenchRaw   see IJulia example
readYahooFinance    see blog post
readYahooAdjClose   see blog post

5 Acknowledgement

Of course, any package can only be as good as the individual parts it builds on. Accordingly, I would like to thank everyone involved in developing the functions that this package builds upon. In particular, I want to thank the developers of

  • the Julia language, for their continuous and tremendous efforts during the creation of this free, fast and highly flexible programming language!
  • the DataFrames package, which definitely provides the best representation for general types of data in data analysis. It is a role model that every last bit of TimeData's code depends on, and the interface that every statistics package should use.
  • the Datetime package, which is a thoughtful implementation of dates, time and durations, and the backbone of all time components in TimeData.
  • the TimeSeries package, which takes a different approach to handling time series data. Since it pursues a quite similar goal, it was a great inspiration for me, and occasionally I could even borrow code from it (for example, from an old version of the function readtime).
