Working with dates and times

Mahbubul Majumder, PhD
Oct 9, 2014

What is a date?

  • A date is just a count of days from a specific origin, written in a specific format
as.Date(730000, origin = '0000-01-01')
[1] "1998-09-03"
as.Date(35000, origin = '1900-01-01')
[1] "1995-10-30"
  • The Julian day is the number of days between a date and an origin; $yday gives the zero-based day of the year
as.POSIXlt(Sys.Date())
[1] "2014-10-09 UTC"
as.POSIXlt(Sys.Date())$yday
[1] 281
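To make the "count of days" idea concrete, here is a minimal sketch in base R. It uses a fixed date instead of Sys.Date() so the numbers are reproducible; the variable names are only illustrative.

```r
# A Date is stored as the number of days since 1970-01-01 (R's default origin).
d <- as.Date("2014-10-09")
n <- as.numeric(d)                          # days since 1970-01-01
back <- as.Date(n, origin = "1970-01-01")   # rebuild the date from the count

# $yday counts from 0, so add 1 to get the usual 1-based day of the year.
day_of_year <- as.POSIXlt(d)$yday + 1

print(n)            # 16352
print(back)         # "2014-10-09"
print(day_of_year)  # 282
```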

Date and time

  • Current date according to the computer's clock
myDate <- Sys.Date()
myDate
[1] "2014-10-09"
  • Current date and time
date()
[1] "Thu Oct  9 20:28:10 2014"
  • Generate a vector of dates incremented by 10 days (notice the last day)
myDates <- seq(myDate, length=4, by = '10 day')
myDates
[1] "2014-10-09" "2014-10-19" "2014-10-29" "2014-11-08"
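The same seq() call accepts other step sizes. The sketch below assumes a fixed start date (so the output does not depend on today's date) and compares a 10-day step with a calendar-month step.

```r
start <- as.Date("2014-10-09")   # fixed date so the result is reproducible

every_10_days <- seq(start, by = "10 days", length.out = 4)
monthly       <- seq(start, by = "month",   length.out = 3)

# The gaps stay at 10 days even across the October/November boundary,
# which is why the last date is November 8 and not November 9.
gaps <- diff(every_10_days)

print(every_10_days)
print(monthly)   # "2014-10-09" "2014-11-09" "2014-12-09"
print(gaps)
```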

Converting text to dates

  • Dates are a special data type
tDate <- c("2014-10-09")
myDt <- as.Date(tDate)
myDt
[1] "2014-10-09"
  • Formatting the dates
as.Date(tDate,'%Y-%m-%d')
[1] "2014-10-09"
format(myDt,format="%B %d %Y")
[1] "October 09 2014"
  • Time zone setting
x <- as.POSIXct(format(myDt), tz = "GMT")
x
[1] "2014-10-09 GMT"
  • Time formatting
format(x, "%H:%M:%S")
[1] "00:00:00"
  • See ?format and ?strptime for the available format codes
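Text dates rarely arrive in ISO layout, so a short sketch of parsing other layouts may help. The input strings below are made up for illustration, and only numeric format codes are used so the result does not depend on the locale.

```r
# Parse dates written in non-ISO layouts by spelling out the format.
us <- as.Date("10/09/2014", format = "%m/%d/%Y")   # month/day/year
eu <- as.Date("09.10.2014", format = "%d.%m.%Y")   # day.month.year

# Both strings describe the same day.
same_day <- us == eu

# Write the date back out in a different numeric layout.
out <- format(us, "%d/%m/%Y")

# A string that does not match the format becomes NA rather than an error.
bad <- as.Date("2014-13-45", format = "%Y-%m-%d")

print(same_day)  # TRUE
print(out)       # "09/10/2014"
print(bad)       # NA
```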

Exploring weekdays and months

str(myDates)
 Date[1:4], format: "2014-10-09" "2014-10-19" "2014-10-29" "2014-11-08"
weekdays(myDates)
[1] "Thursday"  "Sunday"    "Wednesday" "Saturday" 
weekdays(myDates+1)
[1] "Friday"   "Monday"   "Thursday" "Sunday"  
months(myDates)
[1] "October"  "October"  "October"  "November"
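weekdays() and months() return names in the current locale. When numbers are easier to work with, one option is to go through POSIXlt, as in this sketch. The dates are the same four as above, retyped so the block runs on its own.

```r
dates <- as.Date(c("2014-10-09", "2014-10-19", "2014-10-29", "2014-11-08"))
lt <- as.POSIXlt(dates)

day_num    <- lt$wday              # 0 = Sunday, ..., 6 = Saturday
month_num  <- lt$mon + 1           # $mon is 0-based, so add 1
is_weekend <- day_num %in% c(0, 6)

print(day_num)     # 4 0 3 6
print(month_num)   # 10 10 10 11
print(is_weekend)  # FALSE TRUE FALSE TRUE
```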

Date differences

  • Date differences in days
days <- myDates[1] - myDates[2]
days
Time difference of -10 days
  • Differences in time
difftime(myDates[2], myDates[1], units="secs")
Time difference of 864000 secs
difftime(myDates[2], myDates[1], units="mins")
Time difference of 14400 mins
difftime(myDates[2], myDates[1], units="hours")
Time difference of 240 hours
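A difftime prints with its unit, but later calculations usually need a plain number. A minimal sketch of extracting it, using two fixed dates ten days apart:

```r
d1 <- as.Date("2014-10-09")
d2 <- as.Date("2014-10-19")

gap <- d2 - d1                                   # a difftime object

n_days  <- as.numeric(gap, units = "days")       # plain number of days
n_hours <- as.numeric(gap, units = "hours")      # same gap, in hours
n_weeks <- as.numeric(difftime(d2, d1, units = "weeks"))

print(n_days)   # 10
print(n_hours)  # 240
print(n_weeks)  # 10/7, about 1.43
```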

Plotting dates as text vs. as dates

      dates rents
 2012-10-01  3010
 2012-10-02  2390
 2012-10-03  2455
 2012-10-04  2552
 2012-10-05  3699
 2012-10-06  2790
 2012-10-07  1974
 2012-10-08  2394
 2012-10-09  2643
 2012-10-10  1698
 2012-10-11  2934
 2012-10-12  2411
 2012-10-13  1952
 2012-10-14  1451
 2012-10-15  2978
 2012-10-16  2801
 2012-10-17  2750
 2012-10-18  3085
 2012-10-19  2298
 2012-10-20  3512

[Plot: rents against dates stored as text]

  • But when we convert the text to dates

[Plot: rents against dates converted to Date]
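The plotting code is not shown on the slide, so the following is only a sketch of how the conversion might be done. It retypes the first five rows of the table above; the data frame name rent and the plot settings are assumptions.

```r
# First five rows of the table above, with dates stored as text.
rent <- data.frame(
  dates = c("2012-10-01", "2012-10-02", "2012-10-03", "2012-10-04", "2012-10-05"),
  rents = c(3010, 2390, 2455, 2552, 3699),
  stringsAsFactors = FALSE
)
class_before <- class(rent$dates)   # "character"

# Convert the text column to Date so the x-axis becomes a real time axis.
rent$dates <- as.Date(rent$dates)
class_after <- class(rent$dates)    # "Date"

plot(rent$dates, rent$rents, type = "b", xlab = "Date", ylab = "Rents")
```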

Challenges of working with dates and times

  • They look like numbers but they are not
    • special formats
    • leap years
    • different time zones
    • simple arithmetic does not work
  • If you are still not scared enough, think of daylight saving time
    • how do you add or subtract an hour at the moment daylight saving time changes?
    • after you add an hour, the daylight saving shift may leave the clock time unchanged
  • Dates and times are among the scariest data types. If these challenges are not handled correctly, the whole analysis may be wrong.
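Two of these pitfalls can be reproduced in a few lines of base R. The sketch assumes the system time zone database knows "America/Chicago", where daylight saving time ended on 2014-11-02 at 2:00 am.

```r
# Leap year: the same calendar span differs by a day between 2012 and 2013.
leap     <- as.numeric(as.Date("2012-03-01") - as.Date("2012-02-28"))  # 2
non_leap <- as.numeric(as.Date("2013-03-01") - as.Date("2013-02-28"))  # 1

# Daylight saving time: start half an hour after midnight on the changeover day.
t0    <- as.POSIXct("2014-11-02 00:30:00", tz = "America/Chicago")
plus1 <- t0 + 3600       # one hour later
plus2 <- t0 + 2 * 3600   # two hours later

# The clock shows 01:30 both times; only the zone abbreviation differs.
print(format(plus1, "%H:%M %Z"))  # "01:30 CDT"
print(format(plus2, "%H:%M %Z"))  # "01:30 CST"
```

Adding an hour to plus1 moves the instant forward by 3600 seconds, yet the wall clock reads the same, which is exactly the situation described above.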

Working with dates conveniently

  • Package lubridate by Garrett Grolemund and Hadley Wickham
# install.packages("lubridate")
library(lubridate)
  • Base R functions can be unintuitive when dealing with dates
  • lubridate makes dates easier to handle
    • algebra with dates becomes intuitive
    • function names are easy to remember
ymd('1971-12-29')
[1] "1971-12-29 UTC"
dmy(29121971)
[1] "1971-12-29 UTC"

Parse text to date for use

  • At first, parse characters to dates
Function    Description
ymd()       year, month, day
ydm()       year, day, month
mdy()       month, day, year
dmy()       day, month, year
hm()        hour, minute
hms()       hour, minute, second
ymd_hms()   year, month, day, hour, minute, second
  • Example
x <- ymd_hms('20141009 20:45:36')
x
[1] "2014-10-09 20:45:36 UTC"
  • QUIZ: weekdays(x) = ?
Function    Description
year()      Year
month()     Month
week()      Week
yday()      Day of year
mday()      Day of month
wday()      Day of week
hour()      Hour
minute()    Minute
second()    Second
tz()        Time zone
  • Once parsed, we can explore
minute(x)
[1] 45
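The parsers and accessors in the two tables combine naturally. This sketch assumes lubridate is installed; the input strings are made up to show three layouts of the same day.

```r
library(lubridate)

# Three layouts of the same day, each parsed by the matching function.
d_ymd <- ymd("2014-10-09")
d_mdy <- mdy("10/09/2014")
d_dmy <- dmy("09-10-2014")
all_same <- d_ymd == d_mdy && d_mdy == d_dmy

# Pull individual components out of a parsed date-time.
x <- ymd_hms("2014-10-09 20:45:36")
parts <- c(year = year(x), month = month(x), mday = mday(x),
           hour = hour(x), minute = minute(x), second = second(x))

print(all_same)  # TRUE
print(parts)
print(yday(x))   # 282 (lubridate counts the day of year from 1)
print(tz(x))     # "UTC"
```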

Exploring dates

date <- now()
month(date)
[1] 10
month(date, label=TRUE)
[1] Oct
12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
wday(date, label=TRUE, abbr=FALSE)
[1] Thursday
7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday

Dates and times look simple!

  • Not all years have the same length
  • Not all months have the same length
  • What about days, minutes and seconds?
  • How can we add, subtract or divide?
    • add 1 day to February 28
    • subtract 1 day from June 1
    • day difference from February 28 to June 1
x <- mdy("02282012")
x
[1] "2012-02-28 UTC"
x + days(1)
[1] "2012-02-29 UTC"
y <- mdy("06012012")
y
[1] "2012-06-01 UTC"
y - days(1)
[1] "2012-05-31 UTC"
y - x
Time difference of 94 days
xx <- x + years(1)
yy <- y + years(1)
yy - xx
Time difference of 93 days
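Unequal month lengths cause a related surprise when adding months. The sketch below, which assumes lubridate is installed, contrasts plain `+` with lubridate's `%m+%` operator, which rolls back to the last valid day of the month.

```r
library(lubridate)

jan31 <- ymd("2012-01-31")

naive  <- jan31 + months(1)       # NA, because 2012-02-31 does not exist
rolled <- jan31 %m+% months(1)    # rolls back to the last day of February

print(naive)             # NA
print(rolled)            # 2012-02-29
print(leap_year(2012))   # TRUE
print(leap_year(2013))   # FALSE
```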

Time zone

myTime <- ymd_hms('20141009 23:59:59')
nyTime <- force_tz(myTime, tz = "America/New_York")
nyTime
[1] "2014-10-09 23:59:59 EDT"
chTime <- force_tz(myTime, tz = "America/Chicago")
chTime
[1] "2014-10-09 23:59:59 CDT"
  • The same clock time in different time zones refers to different instants
nyTime - chTime
Time difference of -1 hours
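force_tz() keeps the clock reading and changes the instant. Its counterpart with_tz() keeps the instant and changes the clock reading. A short sketch of the contrast, assuming lubridate is installed:

```r
library(lubridate)

utc <- ymd_hms("2014-10-09 23:59:59", tz = "UTC")

# Same instant, shown on a Chicago clock.
same_instant <- with_tz(utc, "America/Chicago")
# Same clock reading, reinterpreted as Chicago time (a different instant).
same_clock <- force_tz(utc, "America/Chicago")

print(same_instant)                               # 18:59:59 CDT
print(same_clock)                                 # 23:59:59 CDT
print(same_instant == utc)                        # TRUE
print(difftime(same_clock, utc, units = "hours")) # 5 hours
```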

Date time algebra

chTime
[1] "2014-10-09 23:59:59 CDT"
  • Add one hour, one minute and one second to the Chicago time
chTime + dhours(1) + dminutes(1) + dseconds(1)
[1] "2014-10-10 01:01:00 CDT"
  • Add one year, one month and one day to the Chicago time. Notice the change from CDT to CST: daylight saving time is taken into account in the algebra.
chTime + years(1) + months(1) + days(1)
[1] "2015-11-10 23:59:59 CST"
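The d-prefixed functions (dhours(), ddays(), ...) add an exact number of seconds, while days(), months() and years() move the calendar. The two disagree across a daylight saving change, as this sketch shows. It assumes lubridate is installed and uses the 2014-11-02 changeover in Chicago.

```r
library(lubridate)

before <- ymd_hms("2014-11-01 12:00:00", tz = "America/Chicago")

by_period   <- before + days(1)    # same clock time on the next calendar day
by_duration <- before + ddays(1)   # exactly 86400 seconds later

print(by_period)     # 2014-11-02 12:00:00 CST
print(by_duration)   # 2014-11-02 11:00:00 CST
```

The extra hour created when the clocks fall back is why the duration lands one hour earlier on the clock than the period.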

Case Study: Boston Hubway

[Image: Boston Hubway]

Case Study: when do Bostonians bike most?

  • Boston Hubway data
    • Saturday and Sunday make sense
    • Why does Wednesday get crazy?

[Plot: Hubway trips by day of the week]

  • Are the patterns for each of the days same? What plot should display that?
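One way to approach that question is to count trips by weekday and by hour. The Hubway data and its column names are not shown here, so the sketch below uses a handful of made-up start times purely to illustrate the mechanics; every value and name in it is hypothetical.

```r
# Made-up trip start times, for illustration only (not Hubway data).
start_time <- as.POSIXct(
  c("2012-10-01 08:15:00", "2012-10-01 17:40:00", "2012-10-03 08:05:00",
    "2012-10-06 14:30:00", "2012-10-07 11:20:00"),
  tz = "America/New_York"
)

lt <- as.POSIXlt(start_time)
day_names <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")

trips <- data.frame(
  day  = factor(day_names[lt$wday + 1], levels = day_names),  # $wday: 0 = Sunday
  hour = lt$hour
)

by_day      <- table(trips$day)               # trips per weekday
by_day_hour <- table(trips$day, trips$hour)   # weekday by hour of day

print(by_day)
print(by_day_hour)
```

A table like by_day_hour could then feed a plot with one panel or one line per weekday, which lets the daily patterns be compared directly.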

Case study: Boston Hubway data

  • What are those peaks? Data does not know how to lie.

[Plot: Boston Hubway data]

Case study: Boston Hubway data

[Plot: Boston Hubway data]

Reading assignment and references