Documentation
1 Getting started
The goal of the package TimeData is to provide a fast, robust and convenient representation of time series data. This is achieved by introducing several new composite types that match the characteristics of various types of time series data more closely.
Behavior of TimeData types in most cases borrows heavily from DataFrames, as this package provides an excellent way of representing quite general data. In particular, DataFrames already allow for named columns, as well as missing observations. In many situations, however, the time information of the data adds a separate and unique dimension to the data. The different characteristics of time data and observations would then not be accounted for sufficiently if time was only one further column amongst the other observations.
The TimeData package provides users with the well-known experience of DataFrames whenever possible. It only deviates where additional convenience can be achieved through a more explicit separation between time information and observations. In particular, the following features allow convenient data handling.
1.1 Intuitive indexing
Extending the fabulous ways of indexing already provided by DataFrames, data can easily be accessed through, amongst others, variable names and dates.
using TimeData
using Dates
fileName = joinpath(Pkg.dir("TimeData"), "data/logRet.csv")
tm = TimeData.readTimedata(fileName)[1:10, 1:4]
idx | MMM | ABT | ACE | ACT |
2012-01-03 | 2.12505 | 0.88718 | 0.29744 | 0.47946 |
2012-01-04 | 0.82264 | -0.38476 | -0.95495 | -0.52919 |
2012-01-05 | -0.44787 | -0.23157 | 0.28445 | 2.74752 |
2012-01-06 | -0.51253 | -0.93168 | 0.23891 | 1.94894 |
2012-01-09 | 0.58732 | 0 | 0.46128 | 0.28436 |
2012-01-10 | 0.52193 | 0.46693 | 1.31261 | 1.85986 |
2012-01-11 | -0.63413 | -0.38895 | -1.52066 | -3.06604 |
2012-01-12 | 0.60934 | -0.46875 | 0.50453 | -0.93039 |
2012-01-13 | -0.80912 | 0.50771 | -0.47478 | 0.25752 |
2012-01-17 | 0.74711 | 0.50515 | 0.297 | -7.04176 |
Using a range of dates:
tm[Date(2012, 1, 4):Date(2012, 1, 10), 1:2]
idx | MMM | ABT |
2012-01-04 | 0.82264 | -0.38476 |
2012-01-05 | -0.44787 | -0.23157 |
2012-01-06 | -0.51253 | -0.93168 |
2012-01-09 | 0.58732 | 0 |
2012-01-10 | 0.52193 | 0.46693 |
Using numeric indexing:
tm[3:8, 2:3]
idx | ABT | ACE |
2012-01-05 | -0.23157 | 0.28445 |
2012-01-06 | -0.93168 | 0.23891 |
2012-01-09 | 0 | 0.46128 |
2012-01-10 | 0.46693 | 1.31261 |
2012-01-11 | -0.38895 | -1.52066 |
2012-01-12 | -0.46875 | 0.50453 |
Using column names:
tm[3:8, [:ABT, :MMM]]
idx | ABT | MMM |
2012-01-05 | -0.23157 | -0.44787 |
2012-01-06 | -0.93168 | -0.51253 |
2012-01-09 | 0 | 0.58732 |
2012-01-10 | 0.46693 | 0.52193 |
2012-01-11 | -0.38895 | -0.63413 |
2012-01-12 | -0.46875 | 0.60934 |
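The date-based row selection shown above can be illustrated with a minimal sketch in plain Julia. It does not use TimeData itself; the type `MiniSeries` and the function `rows_between` are hypothetical names chosen for this example, and the package's own implementation may differ.

```julia
using Dates

# Hypothetical stand-in for a TimeData object: an index plus a matrix of observations.
struct MiniSeries
    idx::Vector{Date}
    vals::Matrix{Float64}
end

# Select all rows whose date lies within [from, to].
# Dates inside the range that carry no observation are simply not part of the result.
function rows_between(ts::MiniSeries, from::Date, to::Date)
    keep = findall(d -> from <= d <= to, ts.idx)
    return MiniSeries(ts.idx[keep], ts.vals[keep, :])
end

ts = MiniSeries(
    [Date(2012, 1, 3), Date(2012, 1, 4), Date(2012, 1, 5), Date(2012, 1, 6), Date(2012, 1, 9)],
    [2.12505 0.88718; 0.82264 -0.38476; -0.44787 -0.23157; -0.51253 -0.93168; 0.58732 0.0],
)

sub = rows_between(ts, Date(2012, 1, 4), Date(2012, 1, 8))
```

Because the index is a separate field, the lookup only has to compare dates and never needs to know which columns hold observations.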
1.2 Separate time information
Through the hard-coded distinction between time data and observations, code becomes cleaner and less error-prone, as users cannot mess with time information as easily anymore. Functions that apply to observations only do not need to explicitly take the time information into account anymore. For example, simply rescaling observations by a factor of 2 becomes:
newTm = 2.*tm
idx | MMM | ABT | ACE | ACT |
2012-01-03 | 4.2501 | 1.77436 | 0.59488 | 0.95892 |
2012-01-04 | 1.64528 | -0.76952 | -1.9099 | -1.05838 |
2012-01-05 | -0.89574 | -0.46314 | 0.5689 | 5.49504 |
2012-01-06 | -1.02506 | -1.86336 | 0.47782 | 3.89788 |
2012-01-09 | 1.17464 | 0 | 0.92256 | 0.56872 |
2012-01-10 | 1.04386 | 0.93386 | 2.62522 | 3.71972 |
2012-01-11 | -1.26826 | -0.7779 | -3.04132 | -6.13208 |
2012-01-12 | 1.21868 | -0.9375 | 1.00906 | -1.86078 |
2012-01-13 | -1.61824 | 1.01542 | -0.94956 | 0.51504 |
2012-01-17 | 1.49422 | 1.0103 | 0.594 | -14.08352 |
Functions that do not naturally extend to time data can simply be overloaded and delegated to observations only.
Calculating mean values for each column:
mean(tm, 1)
MMM | ABT | ACE | ACT |
0.300974 | -0.0038740000000000107 | 0.04458300000000001 | -0.398972 |
Calculating mean values per date:
rowmeans(tm)
idx | x1 |
2012-01-03 | 0.9472824999999999 |
2012-01-04 | -0.261565 |
2012-01-05 | 0.5881325000000001 |
2012-01-06 | 0.18590999999999996 |
2012-01-09 | 0.33324 |
2012-01-10 | 1.0403325 |
2012-01-11 | -1.402445 |
2012-01-12 | -0.0713175 |
2012-01-13 | -0.1296675 |
2012-01-17 | -1.373125 |
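As a rough sketch of the idea, the following plain-Julia example forwards an operator and two summary functions to the observations only. The type `Obs` and the helper `colmeans` are hypothetical and only mimic the behavior described above; they are not the package's actual code.

```julia
using Dates
using Statistics

# Hypothetical minimal container: the time index is kept apart from the observations.
struct Obs
    idx::Vector{Date}
    vals::Matrix{Float64}
end

# Scalar rescaling touches the observations only; the index is passed through unchanged.
Base.:*(a::Real, x::Obs) = Obs(x.idx, a .* x.vals)

# Column means drop the time dimension, so a plain 1 x m matrix is returned.
colmeans(x::Obs) = mean(x.vals, dims=1)

# Row means keep one value per date, so the index is attached to the result again.
rowmeans(x::Obs) = Obs(x.idx, mean(x.vals, dims=2))

x = Obs([Date(2012, 1, 3), Date(2012, 1, 4)], [2.0 4.0; 1.0 3.0])
y = 2 * x
```

None of the three methods has to inspect or skip a date column, which is the practical benefit of the separation.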
2 TimeData types
All types introduced are subtypes of abstract type AbstractTimedata and have two fields, separating time information from observations:
- vals: a DataFrame
- idx: an Array{T, 1}, consisting of either type Integer, Date, or DateTime (from package Dates)
However, accessing these fields directly is considered poor style, as one could circumvent any constraints on the individual fields this way! Hence, only reference the fields directly if you really know what you are doing.
Given these commonalities of all TimeData types, there exist several distinct types that implement different constraints on the observations. Sorted from most general to most specific case, the following types are introduced:
- Timedata: no further restrictions on observations
- Timenum: observations may consist of numeric values or missing values only
- Timematr: observations must be numeric values only
Given that these constraints are fulfilled, one is able to define and use more specific functions matching the data characteristics more closely. For example, Timematr instances can directly make use of fast and numerically optimized methods using Array{Float64, 2} under the hood. Hence, it is important that these constraints are reliably fulfilled. They are guaranteed as they are hard-coded into variables at time of creation through the individual constructors. So far, only a few methods exist that allow editing fields for data manipulation, and they are all designed to maintain all restrictions on the data.
The package generally tries to achieve high performance by delegating functionality to the most specialized case. For example, methods for instances of Timematr are delegated to Array{Float64, 2}, as they are not allowed to contain NAs anyway.
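How constraints can be enforced at creation time may be sketched with an inner constructor. The example below is an illustration under simplifying assumptions: `StrictSeries` is a hypothetical type, and `NaN` stands in for a missing observation (NA), which is not how the package represents missing values.

```julia
using Dates

# Hypothetical type mirroring the idea behind Timematr: numeric observations without gaps.
struct StrictSeries
    idx::Vector{Date}
    vals::Matrix{Float64}

    # Defining an inner constructor replaces the default one,
    # so every instance has to pass the checks below.
    function StrictSeries(idx::Vector{Date}, vals::Matrix{Float64})
        length(idx) == size(vals, 1) ||
            throw(ArgumentError("number of dates must match number of rows"))
        any(isnan, vals) &&
            throw(ArgumentError("missing observations (NaN) are not allowed"))
        return new(idx, vals)
    end
end

ok = StrictSeries([Date(2013, 7, 1), Date(2013, 7, 2)], [1.0 2.0; 3.0 4.0])

# A violation is rejected at construction time.
rejected = try
    StrictSeries([Date(2013, 7, 1)], [1.0 NaN])
    false
catch err
    err isa ArgumentError
end
```

Since the struct is immutable and can only be built through the checked constructor, functions receiving a `StrictSeries` can rely on the constraints without testing them again.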
3 Constructors
For each type, variables can be created by directly handing over observations as DataFrame and time information as Array to the inner constructor.
vals = rand(4, 3);
dats = [Date(2013, 7, ii) for ii=1:4];
nams = [:A, :B, :C];
valsDf = composeDataFrame(vals, nams);
tm = Timematr(valsDf, dats)
idx | A | B | C |
2013-07-01 | 0.3668076837657581 | 0.6132177634585174 | 0.7271614060006228 |
2013-07-02 | 0.8402870142445031 | 0.8268085074961338 | 0.7631093687543671 |
2013-07-03 | 0.6880695323767783 | 0.03210864954311465 | 0.9955387409975662 |
2013-07-04 | 0.8182674804881986 | 0.7627315792913878 | 0.966201684866371 |
Besides, there also exist several outer constructors for each type, allowing more convenient creation. In particular, if observations do not contain any NAs, there is no need to wrap them up into DataFrames beforehand; TimeData objects can simply be created from Arrays. Also, there might be situations where variable names and / or dates are missing. For these cases, there exist more convenient outer constructors, too, which generally follow the convention that dates never precede variable names as arguments.
td = Timedata(vals, nams, dats)
td = Timedata(vals, nams)
td = Timedata(vals, dats)
td = Timedata(vals)
idx | x1 | x2 | x3 |
1 | 0.3668076837657581 | 0.6132177634585174 | 0.7271614060006228 |
2 | 0.8402870142445031 | 0.8268085074961338 | 0.7631093687543671 |
3 | 0.6880695323767783 | 0.03210864954311465 | 0.9955387409975662 |
4 | 0.8182674804881986 | 0.7627315792913878 | 0.966201684866371 |
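The argument convention can be sketched with a small set of outer constructors. Everything here is hypothetical (the type `Labeled` and the helpers `default_names` and `default_index`); the sketch only mirrors the defaults visible in the output above, namely names x1, x2, ... and an integer index.

```julia
using Dates

# Hypothetical container; the index may hold integers or dates.
struct Labeled{T}
    idx::Vector{T}
    names::Vector{Symbol}
    vals::Matrix{Float64}
end

# Defaults used when names or dates are not supplied.
default_names(vals) = [Symbol("x", j) for j in 1:size(vals, 2)]
default_index(vals) = collect(1:size(vals, 1))

# Outer constructors: observations first, then names, and dates always last.
Labeled(vals::Matrix{Float64}, names::Vector{Symbol}, idx::Vector) = Labeled(idx, names, vals)
Labeled(vals::Matrix{Float64}, names::Vector{Symbol}) = Labeled(default_index(vals), names, vals)
Labeled(vals::Matrix{Float64}, idx::Vector) = Labeled(idx, default_names(vals), vals)
Labeled(vals::Matrix{Float64}) = Labeled(default_index(vals), default_names(vals), vals)

vals = [0.1 0.2 0.3; 0.4 0.5 0.6]
dats = [Date(2013, 7, 1), Date(2013, 7, 2)]

a = Labeled(vals, [:A, :B, :C], dats)
b = Labeled(vals, dats)
c = Labeled(vals)
```

Dispatch on the argument types decides which defaults are filled in, so a fixed argument order keeps the two-argument forms unambiguous.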
4 Indexing
The idea of getindex is to stick with the behavior of DataFrames as far as possible for the basics, while extending it to allow indexing of rows by dates. Hence, indexing TimeData types should hopefully fit seamlessly into behavior familiar from other important types, with only intuitive extensions. However, it is important to note that indexing deviates from DataFrame behavior in one aspect: getindex will NEVER change the type of the variable! If you call it on a Timematr variable, it will also return a Timematr variable, and if you call it on type Timenum it will return Timenum as well. This deviates from DataFrame behavior in that, for example, DataFrames return Array for single columns.
typeof(valsDf[:, 1])
typeof(td[:, 1])
typeof(valsDf[1, 1])
typeof(td[1, 1])
## empty instance
typeof(td[4:3, 5:4])
This will print:
Array{Float64,1}
Timedata{Int64} (constructor with 1 method)
Float64
Timedata{Int64} (constructor with 1 method)
Timedata{Int64} (constructor with 1 method)
Possible ways of indexing are:
## indexing by numeric indices
tmp = tm[2:3]
tmp = tm[1:3, 1:2]
tmp = tm[2, :]
tmp = tm[2]
tmp = tm[1:2, 2]
tmp = tm[3, 3]

## indexing with column names as symbols
tmp = tm[:A]
tmp = tm[2, [:A, :B]]

## logical indexing
logicCol = [true, false, true]
logicRow = repmat([true, false], 2, 1)[:]
tmp = tm[logicCol]
tmp = tm[logicRow, logicCol]
tmp = tm[logicRow, :]

## indexing with Dates
DatesToFind = [Date(2013, 7, ii) for ii=2:3]
tmp = tm[DatesToFind]
tm[Date(2013,7,1):Date(2013,7,3)]
tm[Date(2013,7,2):Date(2013,7,3), :B]
tm[Date(2013,7,3):Date(2013,7,12), [true, false, false]]
As all getindex methods are designed to retain the TimeData structure, we need an additional get method to actually get the value of an entry itself.
## returning the first value only
get(tm, 1, 1)

## returning all values as Array{Any,2}
kk = get(tm)
isa(kk, Array{Any})
true
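The split between type-preserving indexing and a separate value accessor can be sketched as follows. The type `Boxed` and the functions `as_range` and `value` are hypothetical; in particular, `value` merely plays the role that get has in the package and is named differently to avoid extending Base.get in an example.

```julia
using Dates

# Hypothetical container used only for illustration.
struct Boxed
    idx::Vector{Date}
    vals::Matrix{Float64}
end

# Turn a single position into a one-element range so that slices keep two dimensions.
as_range(i::Integer) = i:i
as_range(r) = r

# Indexing always returns the container type, even for a single entry.
Base.getindex(x::Boxed, rows, cols) =
    Boxed(x.idx[as_range(rows)], x.vals[as_range(rows), as_range(cols)])

# A separate accessor returns the raw value.
value(x::Boxed, i::Integer, j::Integer) = x.vals[i, j]

x = Boxed([Date(2013, 7, 1), Date(2013, 7, 2)], [1.0 2.0; 3.0 4.0])

entry = x[1, 1]      # still a Boxed, holding a 1 x 1 matrix
column = x[:, 1]     # still a Boxed, holding a 2 x 1 matrix
raw = value(x, 2, 1) # a plain Float64
```

Keeping the return type fixed means code that consumes an indexed result never has to branch on whether it received a scalar, a vector or a table.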
5 Read, write, io
Data can easily be imported from csv-files using function readTimedata. Under the hood, the function makes use of readtable from the DataFrames package. Additionally, columns are parsed for dates. The first column matching the regexp for dates will be chosen as time identifier.
filePath = joinpath(Pkg.dir("TimeData"), "data", "logRet.csv");
tm = readTimedata(filePath)
tm[1:5, 1:4]
idx | MMM | ABT | ACE | ACT |
2012-01-03 | 2.12505 | 0.88718 | 0.29744 | 0.47946 |
2012-01-04 | 0.82264 | -0.38476 | -0.95495 | -0.52919 |
2012-01-05 | -0.44787 | -0.23157 | 0.28445 | 2.74752 |
2012-01-06 | -0.51253 | -0.93168 | 0.23891 | 1.94894 |
2012-01-09 | 0.58732 | 0 | 0.46128 | 0.28436 |
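The date detection step can be illustrated with a small sketch. It assumes ISO-style dates (yyyy-mm-dd) and columns that have already been read as strings; the function `find_date_column` and the pattern are hypothetical, and the regexp actually used by readTimedata may differ.

```julia
using Dates

# Return the position of the first column whose entries all look like dates,
# or nothing if no column qualifies. The default pattern is an assumption.
function find_date_column(columns::Vector{Vector{String}};
                          pattern::Regex = r"^\d{4}-\d{2}-\d{2}$")
    return findfirst(col -> all(s -> occursin(pattern, s), col), columns)
end

columns = [
    ["MMM", "ABT"],
    ["2012-01-03", "2012-01-04"],
    ["2.12505", "0.82264"],
]

pos = find_date_column(columns)

# The matching column becomes the time index.
dates = Date.(columns[pos])
```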
In the REPL itself, Julia calls the display method to show information about an instance.
tm
type: Timematr{Date}
dimensions: (333,348)
333x6 DataFrame
|-------|------------|----------|----------|----------|----------|----------|
| Row # | idx | MMM | ABT | ACE | ACT | ADBE |
| 1 | 2012-01-03 | 2.12505 | 0.88718 | 0.29744 | 0.47946 | 1.0556 |
| 2 | 2012-01-04 | 0.82264 | -0.38476 | -0.95495 | -0.52919 | -1.02024 |
| 3 | 2012-01-05 | -0.44787 | -0.23157 | 0.28445 | 2.74752 | 0.70472 |
| 4 | 2012-01-06 | -0.51253 | -0.93168 | 0.23891 | 1.94894 | 0.83917 |
| 5 | 2012-01-09 | 0.58732 | 0.0 | 0.46128 | 0.28436 | -0.66376 |
| 6 | 2012-01-10 | 0.52193 | 0.46693 | 1.31261 | 1.85986 | 2.32125 |
| 7 | 2012-01-11 | -0.63413 | -0.38895 | -1.52066 | -3.06604 | 0.41012 |
| 8 | 2012-01-12 | 0.60934 | -0.46875 | 0.50453 | -0.93039 | -0.30743 |
⋮
| 325 | 2013-04-19 | 0.69118 | 0.86745 | 0.77089 | 1.84469 | 0.6278 |
| 326 | 2013-04-22 | 0.08606 | -0.84023 | 0.27067 | -0.64178 | -0.47048 |
| 327 | 2013-04-23 | 1.48952 | 0.86721 | 0.8188 | 0.93582 | 0.76063 |
| 328 | 2013-04-24 | 0.451 | -1.8794 | -0.51518 | -0.49734 | -0.44673 |
| 329 | 2013-04-25 | -2.81414 | -0.08252 | -0.04492 | 0.61876 | 0.84708 |
| 330 | 2013-04-26 | -1.04683 | -0.08259 | -0.63106 | 2.05182 | -0.31125 |
| 331 | 2013-04-29 | 0.03897 | 0.74085 | -0.02261 | 4.49427 | 0.33344 |
| 332 | 2013-04-30 | 0.84381 | 0.51807 | 0.24845 | 0.14197 | 0.04438 |
| 333 | 2013-05-01 | -0.14498 | -0.08162 | -0.94057 | -1.27548 | -0.82415 |
As one can see, the display method will show the type of the variable, together with its dimensions and a snippet of the first values. Note that the number of columns does not include the dates column, but only counts the columns of the remaining variables. Inherently, display makes use of the display method that is implemented for DataFrames, which is the reason for the somewhat misleading output line "333x348 DataFrame:". This is an issue that still needs to be fixed. However, html display in IJulia already shows an improved table output.
An even more elaborate way of looking at the data contained in a TimeData type is function str (following the name used in R), which will print:
## str(tm) # uncomment for execution
type: Timematr{Date}
:vals DataFrame
:idx Array{Date,1}
dimensions of vals: (333,348)
-------------------------------------------
From: 2012-01-03, To: 2013-05-01
-------------------------------------------
333x6 DataFrame
|-------|------------|----------|----------|----------|----------|----------|
| Row # | idx | MMM | ABT | ACE | ACT | ADBE |
| 1 | 2012-01-03 | 2.12505 | 0.88718 | 0.29744 | 0.47946 | 1.0556 |
| 2 | 2012-01-04 | 0.82264 | -0.38476 | -0.95495 | -0.52919 | -1.02024 |
| 3 | 2012-01-05 | -0.44787 | -0.23157 | 0.28445 | 2.74752 | 0.70472 |
| 4 | 2012-01-06 | -0.51253 | -0.93168 | 0.23891 | 1.94894 | 0.83917 |
| 5 | 2012-01-09 | 0.58732 | 0.0 | 0.46128 | 0.28436 | -0.66376 |
| 6 | 2012-01-10 | 0.52193 | 0.46693 | 1.31261 | 1.85986 | 2.32125 |
| 7 | 2012-01-11 | -0.63413 | -0.38895 | -1.52066 | -3.06604 | 0.41012 |
| 8 | 2012-01-12 | 0.60934 | -0.46875 | 0.50453 | -0.93039 | -0.30743 |
⋮
| 325 | 2013-04-19 | 0.69118 | 0.86745 | 0.77089 | 1.84469 | 0.6278 |
| 326 | 2013-04-22 | 0.08606 | -0.84023 | 0.27067 | -0.64178 | -0.47048 |
| 327 | 2013-04-23 | 1.48952 | 0.86721 | 0.8188 | 0.93582 | 0.76063 |
| 328 | 2013-04-24 | 0.451 | -1.8794 | -0.51518 | -0.49734 | -0.44673 |
| 329 | 2013-04-25 | -2.81414 | -0.08252 | -0.04492 | 0.61876 | 0.84708 |
| 330 | 2013-04-26 | -1.04683 | -0.08259 | -0.63106 | 2.05182 | -0.31125 |
| 331 | 2013-04-29 | 0.03897 | 0.74085 | -0.02261 | 4.49427 | 0.33344 |
| 332 | 2013-04-30 | 0.84381 | 0.51807 | 0.24845 | 0.14197 | 0.04438 |
| 333 | 2013-05-01 | -0.14498 | -0.08162 | -0.94057 | -1.27548 | -0.82415 |
This additionally shows the names of the fields of the object, and also explicitly displays the time period of the data.
To save an object to disk, simply call function writeTimedata, which internally uses writetable from the DataFrames package. In accordance with writetable, the first argument is the filename as string, while the second argument is the variable to be saved.
# writeTimedata("data/logRet2.csv", tm) # uncomment for execution
6 Conversion
To DataFrame:
- td.vals (without index)
- convert(DataFrame, td)
Get column as DataArray:
- td.vals[ii]
- td.vals[:, ii]
As array:
- asArr
Array with NaN to DataFrame with NA:
- still missing?
Remove NAs:
- dropna (DataArray)
- completecases (DataFrame)
- narm (TimeData)
7 Data manipulation
The following data manipulation functions could be used from DataFrames and are not yet implemented for TimeData types.
- sort
- select / groupby
- stack / melt (wide to long format)
- unstack / cast (long to wide)
- merge / join
- normalization
8 Functions and operators
Mathematical operators and functions are only implemented for Timematr and Timenum types, since they are not well defined operations for general data (strings, …).
Whenever possible, functions apply element-wise to observations only, and you should get back the same type that you did call the function on. In case that this is not possible, the type that you get back should be the natural first choice. For example, element-wise comparisons should return a logical value for each entry, which by definition could not be of type Timenum where only numeric values are allowed.
typeof(tm .+ tm)
typeof(tm .> 0.5)

Timematr{Date} (constructor with 1 method)
Timedata{Date} (constructor with 1 method)
The standard library for TimeData comprises all standard operators and mathematical functions. As expected, these functions all apply elementwise, and leave the time information untouched. Where additional arguments are allowed for Arrays, they are allowed for TimeData types as well.
tm[1:3, 1:3] .> 0.5
exp(tm[1:3, 1:3])
round(tm[1:3, 1:3], 2)

type: Timedata{Date}
dimensions: (3,3)
3x4 DataFrame
|-------|------------|-------|-------|-------|
| Row # | idx | MMM | ABT | ACE |
| 1 | 2012-01-03 | true | true | false |
| 2 | 2012-01-04 | true | false | false |
| 3 | 2012-01-05 | false | false | false |

type: Timematr{Date}
dimensions: (3,3)
3x4 DataFrame
|-------|------------|----------|----------|----------|
| Row # | idx | MMM | ABT | ACE |
| 1 | 2012-01-03 | 8.37332 | 2.42827 | 1.34641 |
| 2 | 2012-01-04 | 2.2765 | 0.680614 | 0.384831 |
| 3 | 2012-01-05 | 0.638988 | 0.793287 | 1.32903 |

type: Timematr{Date}
dimensions: (3,3)
3x4 DataFrame
|-------|------------|-------|-------|-------|
| Row # | idx | MMM | ABT | ACE |
| 1 | 2012-01-03 | 2.13 | 0.89 | 0.3 |
| 2 | 2012-01-04 | 0.82 | -0.38 | -0.95 |
| 3 | 2012-01-05 | -0.45 | -0.23 | 0.28 |
9 Iterators and map
The idea of iterators and map is to allow application of a given function successively to parts of the data set, such that the resulting values will automatically be embedded into the metadata again. Applying a function iteratively and simply capturing the results in an Array{Any, 1} is a quite easy task with comprehensions. However, automatically storing the data in a meaningful way encapsulated by its respective metadata is the hard part of the task. This, however, can be easily achieved through a combination of iterators and map for a quite general set of tasks.
9.1 Iterators
For rectangular data, there basically exist three ways of iteratively stepping through:
- element-wise
- row-wise
- column-wise
In addition, functions can either apply to individual values only or to values that are embedded with metadata (values plus time information and / or variable name). Hence, the package implements six different standard iterators, each returning different slices of a given TimeData object:
| | value | value plus metadata |
|---|---|---|
| element-wise | eachentry | eachobs |
| row-wise | eachrow | eachdate |
| column-wise | eachcol | eachvar |
All iterators containing metadata will provide slices of the data in the exact same type as the original data. Hence, eachdate applied to a Timematr will return Timematr objects of dimension \(1\times m\), while eachvar applied to a Timedata object will return \(n \times 1\) Timedata objects.
Iterators without metadata provide data in the following formats:
- eachentry: individual entries only
- eachrow: \(1\times m\) DataFrame - this is different to eachrow for DataFrames
- eachcol: tuple (nam, col), where nam is the variable name as Symbol and col is the data as DataArray or Array
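The difference between iterating over values only and iterating over values with metadata can be sketched for the row-wise case. The type `Panel` and the generators `rows_plain` and `rows_dated` are hypothetical and use a plain matrix instead of a DataFrame; they only illustrate the two slice formats.

```julia
using Dates

# Hypothetical container used only for illustration.
struct Panel
    idx::Vector{Date}
    vals::Matrix{Float64}
end

# Values only: each step yields a plain 1 x m matrix.
rows_plain(x::Panel) = (x.vals[i:i, :] for i in 1:size(x.vals, 1))

# Values plus metadata: each step yields a 1 x m Panel that carries its own date.
rows_dated(x::Panel) = (Panel(x.idx[i:i], x.vals[i:i, :]) for i in 1:size(x.vals, 1))

p = Panel([Date(2013, 7, 1), Date(2013, 7, 2)], [1.0 2.0 3.0; 4.0 5.0 6.0])

plain = collect(rows_plain(p))
dated = collect(rows_dated(p))
```

A function that needs the date of each row (for example, to treat month ends differently) would consume the second iterator, while a purely numeric function can work with the first.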
9.2 Map
With iterators specifying how to slice a given data set, we still need to define what type of function should be applied to each part. Thereby, we again distinguish functions into two groups. Functions that preserve the original dimensions of the data are implemented through function map, while functions that reduce the original dimension are implemented through collapse (see below).
Hence, any operation that will map a data row to a row again, and a data column to a column again, is implemented through function map. Furthermore, as we do not know a priori which type to expect of an arbitrary function, output values will always be captured in the most general format as type Timedata. Hence, the output of any function applied iteratively to data with map will always be a Timedata object of dimensions equal to the original data set.
Although we know that map will always map a row to a row, there still are multiple formats that conceptually could represent row-wise data: a \(1 \times m\) DataFrame, a \(1 \times m\) TimeData object, a \(1 \times m\) Array or an \(m \times 1\) Array. In other words: some mapping functions preserve metadata, and some don't.
So far, map automatically handles several function output formats for any given input iterator:
iterator | output formats |
---|---|
eachentry | a single value of any standard type |
eachrow | mx1 Array or 1xm Array, 1xm DataFrame |
eachcol | nx1 Array or DataArray |
eachobs | 1x1 Array, DataFrame or TimeData |
eachdate | mx1 Array or 1xm Array, 1xm DataFrame or TimeData |
eachvar | nx1 Array or DataFrame or DataArray or TimeData |
Internally, column data is extracted from possible metadata by function getColData, row data by nthRowElem and singleton entries by get(x, :). Only the extracted values themselves then will be used in the output.
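A dimension-preserving row-wise map can be sketched as below. The sketch is simplified: `Frame` and `map_rows` are hypothetical, the result stays numeric instead of being captured in the general Timedata type, and only two of the output formats listed above (a 1xm or mx1 Array) are accepted.

```julia
using Dates

# Hypothetical container used only for illustration.
struct Frame
    idx::Vector{Date}
    vals::Matrix{Float64}
end

# Apply f to every row and write the result back into a matrix of the original size.
# vec() flattens the output, so f may return a 1 x m matrix or a vector of length m.
function map_rows(f, x::Frame)
    out = similar(x.vals)
    for i in 1:size(x.vals, 1)
        res = vec(collect(f(x.vals[i:i, :])))
        length(res) == size(x.vals, 2) ||
            throw(DimensionMismatch("f must return one value per column"))
        out[i, :] = res
    end
    # Only the extracted values are used; the original index is attached again.
    return Frame(x.idx, out)
end

x = Frame([Date(2013, 7, 1), Date(2013, 7, 2)], [1.0 3.0; 2.0 6.0])

# Demean each row: the output has the same shape and the same dates as the input.
demeaned = map_rows(r -> r .- sum(r) / length(r), x)
```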
9.3 Collapse
While map is designed to always match the dimension of the original input data, there are also situations where a function collapses a complete row or column to a single value. Such mappings are implemented by two collapse functions, each of them being accessible with two different iterators: one returning values only, and one returning values with metadata. Also, the output of both functions differs:
- collapseDates:
  - output: \(1 \times m\) DataFrame
  - possible iterators: eachrow, eachdate
- collapseVars:
  - output: \(n \times 1\) TimeData
  - possible iterators: eachcol, eachvar
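The two output shapes can be illustrated with a short sketch. The names `Block`, `reduce_over_time` and `reduce_over_columns` are hypothetical and deliberately differ from the package's function names; the sketch only shows why one result loses its index while the other keeps it.

```julia
using Dates

# Hypothetical container used only for illustration.
struct Block
    idx::Vector{Date}
    vals::Matrix{Float64}
end

# One value per column: the time dimension disappears,
# so a plain 1 x m matrix without dates is returned.
reduce_over_time(f, x::Block) =
    reshape([f(x.vals[:, j]) for j in 1:size(x.vals, 2)], 1, :)

# One value per date: the index is still meaningful,
# so an n x 1 Block with the original dates is returned.
reduce_over_columns(f, x::Block) =
    Block(x.idx, reshape([f(x.vals[i, :]) for i in 1:size(x.vals, 1)], :, 1))

x = Block([Date(2013, 7, 1), Date(2013, 7, 2)], [1.0 2.0; 3.0 4.0])

per_column = reduce_over_time(sum, x)
per_date = reduce_over_columns(sum, x)
```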
9.4 Check condition
For some special cases there exist more fine-tailored implementations of iterative data mapping in order to allow for more efficient data handling. For example, chk... functions are successively checking whether a given condition is fulfilled for the individual data slices, and hence will return one of only three possible values for each slice:
- true
- false
- NA
Again, we can check whether a certain condition holds based on three different ways of slicing the data: the condition can be a property of either individual values (chkElw), individual rows (chkDates) or individual columns (chkVars). This way, output dimensions and formats will vary depending on the condition used:
- chkElw: \(n \times m\) TimeData object
- chkDates: \(n \times 1\) TimeData object
- chkVars: \(1 \times m\) DataFrame
Still, however, all individual entries of any output must be either Bool or NA.
Also, a certain condition can be a property of the plain data values themselves, or of data combined with metadata. Hence, all three chk... functions again allow for two different iterators each:
- chkElw: either eachentry or eachobs
- chkDates: either eachrow or eachdate
- chkVars: either eachcol or eachvar
One frequent application of chk... functions is selecting data slices based on whether a certain condition is fulfilled. This selection generally comprises three steps:
- checking whether the condition is met for data slices
- deciding how to interpret slices with unknown property regarding the condition: should NA be interpreted as true or false?
- converting the result to Array{Bool, 1} or Array{Bool, 2} to use it for logical indexing
Thereby the selection of certain rows or columns will again result in a rectangular shaped data set. chkElw, however, selects individual entries from possibly different columns and rows, so that the data has to be returned as TimeData object in long format: in addition to the index, the result comprises columns variable and value.
For example, selecting all columns with a minimum value below -25:
filePath = joinpath(Pkg.dir("TimeData"), "data", "logRet.csv");
tm = readTimedata(filePath)
columnsMeetingCondition = chkVars(x -> minimum(x, 1)[1, 1] .< -25, eachvar(tm)) |>
    x -> asArr(x, Bool, false) |>
    x -> tm[x[:]]
minimum(columnsMeetingCondition, 1)
APOL | CTL | FOSL |
-25.04314 | -25.61743 | -47.11015 |
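The three selection steps can also be traced in a self-contained sketch that does not depend on the package. Julia's `missing` stands in for NA here, and the function `check_columns` is a hypothetical substitute for a column-wise check; it is not the package's chkVars.

```julia
# Column-wise check that returns true, false or missing (standing in for NA).
function check_columns(pred, vals::Matrix{Union{Missing, Float64}})
    return [pred(vals[:, j]) for j in 1:size(vals, 2)]
end

vals = Union{Missing, Float64}[1.0 -30.0 missing; 2.0 5.0 -40.0]

# Step 1: check the condition. minimum propagates missing,
# so the result for the third column is undecided.
flags = check_columns(col -> minimum(col) < -25, vals)

# Step 2: decide how to read undecided slices; here missing counts as false.
mask = Bool[coalesce(f, false) for f in flags]

# Step 3: use the Bool vector for logical indexing of columns.
selected = vals[:, mask]
```

Interpreting the undecided case as false is the conservative choice: a column is only selected if the condition is known to hold.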
10 Plotting
problem with dates:
- as strings:
- not working for Winston
- interpreted as categorical data by gadfly: very slow
10.1 Winston
10.2 Gadfly
only numeric values
11 Additional functions
Besides basic mathematical functions and operators, there are already quite some additional functions that are defined for several TimeData types. You can find them in the online documentation.
12 Under the hood: implementation
The balancing act between emulating and extending DataFrames is implemented in Julia maybe a bit less naturally than in traditional object oriented programming languages. There, one can easily inherit behavior from other classes through subclasses, thereby overwriting inherited methods whenever desired. In Julia, however, composite types are not allowed to be subtypes of other composite types, but only abstract types may act as parent. Under the hood, TimeData types hence inherit their behavior by owning a field of type DataFrame. This way, functions can easily be delegated to this field whenever necessary. For a more elaborate discussion on this topic and the interior design of TimeData, take a look at this post on my blog.
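Delegation through an owned field can be sketched with a loop that generates forwarding methods. This is a general Julia pattern shown under simplifying assumptions: the type `Wrapped` is hypothetical, it owns a plain matrix rather than a DataFrame, and the package's own delegation code may be organized differently.

```julia
# Hypothetical wrapper that owns a matrix instead of subtyping it.
struct Wrapped
    idx::Vector{Int}
    vals::Matrix{Float64}
end

# Element-wise functions: forward to the field and wrap the result again.
for f in (:exp, :abs, :sign)
    @eval Base.$f(x::Wrapped) = Wrapped(x.idx, broadcast($f, x.vals))
end

# Shape queries: forward to the field and return the result unchanged.
for f in (:size, :length)
    @eval Base.$f(x::Wrapped) = $f(x.vals)
end

w = Wrapped([1, 2], [-1.0 2.0; 3.0 -4.0])
a = abs(w)
```

Each generated method is an ordinary method for `Wrapped`, so behavior can still be overridden for individual functions by defining a more specific method by hand.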
13 Current state
All TimeData types should already provide a convenient way to represent and handle time series data. In extension to the already implemented functions directly applying to types of the TimeData package, any DataFrame functionality in principle can easily be regained by delegating functions to field vals. So far, I only tested TimeData types with Date type from the Dates package myself, and not yet with type DateTime.
14 Acknowledgement
Of course, any package can only be as good as the individual parts that it builds on. Accordingly, I'd like to thank all people that were involved in the development of all the functions that were made ready to use for me to build this package upon. In particular, I want to thank the developers of
- the Julia language, for their continuous and tremendous efforts during the creation of this free, fast and highly flexible programming language!
- the DataFrames package, which definitely provides the best representation for general types of data in data analysis. It's a role model that every last bit of code of TimeData depends on, and the interface that every statistics package should use.
- the Dates package, which is a thoughtful implementation of dates, time and durations, and the backbone of all time components in TimeData.
- the TimeSeries package, which follows a different approach to handling time series data. Having a quite similar goal in mind, the package was a great inspiration for me, and occasionally I even could borrow parts of code from it (for example, from an old version of function readtime).