Add timestamp library and tests #403

Open
wants to merge 2 commits into
base: dev

launeh (Contributor) commented Nov 23, 2022

Motivation and Context

This PR adds a Timestamp class that wraps a UNIX timestamp with functionality for datetime parsing and formatting. Secondary changes add supporting library/toolkit functionality for processing datetimes. Notably, all the changes are in typed_python and Entrypointable.

Approach

The Timestamp class wraps a UNIX timestamp. This UNIX timestamp can be provided, parsed from a string representing a datetime, or constructed from a set of values representing a datetime.

For example, you can create a Timestamp directly from a UNIX timestamp or from date fields with any of the following statements.

ts1 = Timestamp.make(1654615145)
ts2 = Timestamp(ts=1654615145)
ts3 = Timestamp.from_date(year=2022, month=11, day=20)
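
As a point of reference for what a from_date-style constructor has to compute, here is a minimal sketch using only the standard library. The helper name unix_from_date is hypothetical and is not part of this PR; it assumes the date fields are given in UTC.

```python
import calendar


def unix_from_date(year, month, day, hour=0, minute=0, second=0):
    """Return seconds since 1970-01-01T00:00:00 UTC for a UTC date and time.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    # calendar.timegm interprets the tuple as UTC (unlike time.mktime).
    return calendar.timegm((year, month, day, hour, minute, second))


print(unix_from_date(2022, 11, 20))            # 1668902400
print(unix_from_date(2022, 6, 7, 15, 19, 5))   # 1654615145
```

The second call shows that the timestamp 1654615145 used in the examples here corresponds to 2022-06-07T15:19:05 UTC.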

The module provides three ways to create timestamps from string representations of dates.
1: You can tell the parser the format of the provided date string. This is the most efficient option and is equivalent to datetime.strptime(). E.g.

ts1 = Timestamp.parse_with_format(date_str="2022-01-05", format="%Y-%m-%d")
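
For comparison, the standard-library equivalent looks roughly like this. It is a sketch of the datetime.strptime behaviour the text refers to, assuming the parsed fields are interpreted as UTC; strptime_to_unix is a hypothetical name.

```python
from datetime import datetime, timezone


def strptime_to_unix(date_str, fmt):
    """Parse date_str with fmt and return a UNIX timestamp, treating the result as UTC."""
    naive = datetime.strptime(date_str, fmt)
    # Attach UTC explicitly; a naive datetime would otherwise use the local zone.
    return naive.replace(tzinfo=timezone.utc).timestamp()


print(strptime_to_unix("2022-01-05", "%Y-%m-%d"))  # 1641340800.0
```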

2: If the string is any variant of an ISO 8601 formatted string, you can use the .parse_iso_str method. This method is slightly more permissive than the ISO 8601 standard: it allows a space as the date/time separator (in addition to 'T') and it allows timezone abbreviations. E.g.

  ts1 = Timestamp.parse_iso_str("2022-01-05T10:11:12")
  ts2 = Timestamp.parse_iso_str("2022-01-05 10:11:12")
  ts3 = Timestamp.parse_iso_str("2022-01-05 10:11:12-0500")
  ts4 = Timestamp.parse_iso_str("2022-01-05 10:11:12ET")
  ts5 = Timestamp.parse_iso_str("2022-01-05 10:11:12NYC")
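
To make the "slightly more permissive" rules concrete, the following sketch accepts either 'T' or a space as the separator and an optional numeric UTC offset. It is an illustration only: parse_iso_sketch is a hypothetical name, it does not handle timezone abbreviations such as "ET" or "NYC", and it does not validate field ranges the way a real parser should.

```python
import calendar
import re

# Date, optional time (separated by 'T' or a space), optional 'Z' or +hh:mm / +hhmm offset.
_ISO_RE = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})"
    r"(?:[T ](\d{2}):(\d{2}):(\d{2}))?"
    r"(?:(Z)|([+-])(\d{2}):?(\d{2}))?$"
)


def parse_iso_sketch(date_str):
    """Return a UNIX timestamp for a restricted set of ISO 8601-like strings."""
    m = _ISO_RE.match(date_str)
    if m is None:
        raise ValueError(f"not a supported ISO 8601 string: {date_str!r}")
    year, month, day = int(m[1]), int(m[2]), int(m[3])
    hour, minute, second = (int(g) if g else 0 for g in (m[4], m[5], m[6]))
    # Seconds since the epoch if the wall-clock fields were already UTC.
    wall_clock = calendar.timegm((year, month, day, hour, minute, second))
    offset = 0
    if m[8]:
        offset = int(m[9]) * 3600 + int(m[10]) * 60
        if m[8] == "-":
            offset = -offset
    # A local time at UTC+offset corresponds to (local - offset) in UTC.
    return wall_clock - offset


print(parse_iso_sketch("2022-01-05T10:11:12"))       # 1641377472
print(parse_iso_sketch("2022-01-05 10:11:12-0500"))  # 1641395472
```

Note the sign convention: a string tagged -0500 is five hours behind UTC, so the resulting timestamp is 18000 seconds larger than the same wall-clock time in UTC.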

3: For other inputs, .parse_non_iso_str can parse a range of non-ISO date formats. E.g.

  ts1 = Timestamp.parse_non_iso_str("January 1, 1997")
  ts2 = Timestamp.parse_non_iso_str("Jan-1-1997")
  ts3 = Timestamp.parse_non_iso_str("1-Jan-1997")

For convenience, there's a multi-use .parse() entry point that parses a date string with a format if one is provided. If no format string is provided, .parse attempts to parse the date_str as an ISO 8601 string. Failing that, it attempts to parse using the supported non-ISO formats.

  ts1 = Timestamp.parse("January 01, 1997", "%B %d, %Y")
  ts2 = Timestamp.parse("1997-01-01")
  ts3 = Timestamp.parse("1-Jan-1997")
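
The fallback order described above (explicit format, then ISO 8601, then non-ISO formats) can be sketched with the standard library as follows. This is not the PR's parser: parse_sketch and NON_ISO_FORMATS are hypothetical, the list of formats is an assumption chosen to cover the examples in this description, and naive results are treated as UTC.

```python
from datetime import datetime, timezone

# Assumed list of non-ISO layouts, for illustration only.
NON_ISO_FORMATS = ("%B %d, %Y", "%b %d, %Y", "%b-%d-%Y", "%d-%b-%Y", "%d-%B-%Y")


def _to_unix(dt):
    """Convert a datetime to a UNIX timestamp, treating naive values as UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.timestamp()


def parse_sketch(date_str, fmt=None):
    # 1. An explicit format wins and is the cheapest path.
    if fmt is not None:
        return _to_unix(datetime.strptime(date_str, fmt))
    # 2. Otherwise try ISO 8601.
    try:
        return _to_unix(datetime.fromisoformat(date_str))
    except ValueError:
        pass
    # 3. Finally try each known non-ISO layout in turn.
    for candidate in NON_ISO_FORMATS:
        try:
            return _to_unix(datetime.strptime(date_str, candidate))
        except ValueError:
            continue
    raise ValueError(f"unsupported date string: {date_str!r}")


print(parse_sketch("January 01, 1997", "%B %d, %Y"))  # 852076800.0
print(parse_sketch("1997-01-01"))                     # 852076800.0
print(parse_sketch("1-Jan-1997"))                     # 852076800.0
```

Because every layout in this sketch uses %Y, a two-digit year such as "1-Jan-97" matches none of them and raises ValueError, which mirrors the "ambiguous dates are not supported" rule.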

You can convert Timestamps to strings using standard python time format directives. E.g:

ts = Timestamp.make(1654615145)
print(ts.format(utc_offset=144000))  # 2022-06-09T07:19:05
print(ts.format(format="%Y-%m-%d"))  # 2022-06-07 (UTC, no offset applied)
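
The effect of utc_offset can be reproduced with the standard library, which is also a handy way to sanity-check expected outputs. This is a sketch under the assumption that the offset is simply added to the timestamp before formatting; format_sketch is a hypothetical stand-in, not the PR's code.

```python
from datetime import datetime, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%S"


def format_sketch(ts, utc_offset=0, format=ISO_FORMAT):
    """Format a UNIX timestamp, shifted by utc_offset seconds, using strftime directives."""
    shifted = datetime.fromtimestamp(ts + utc_offset, tz=timezone.utc)
    return shifted.strftime(format)


print(format_sketch(1654615145))                     # 2022-06-07T15:19:05
print(format_sketch(1654615145, utc_offset=-14400))  # 2022-06-07T11:19:05
print(format_sketch(1654615145, utc_offset=144000))  # 2022-06-09T07:19:05
print(format_sketch(1654615145, format="%Y-%m-%d"))  # 2022-06-07
```

An offset of 144000 seconds is 40 hours, which is why it moves the date forward by two days; a realistic offset such as -14400 (EDT) stays on the same day.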

The functionality for parsing datestrings is implemented in the reusable DateParser component. Specifically, the component exposes DateParser.parse which in turn proxies to DateParser.parse_iso_format and DateParser.parse_non_iso_format. These methods convert a string representation of a datetime to a UNIX timestamp. E.g.

  time = DateParser.parse("2022-01-05T10:11:12+00:15")
  time = DateParser.parse("2022-01-05T10:11:12NYC")

DateParser additionally depends on Timezone. Timestamps are pegged to UTC and do not store timezone information, so the parser needs to adjust the parsed time by the appropriate offset from UTC. Timezone provides support for converting a timezone abbreviation to a utc_offset. It also supports relative zones: if the zone is "ET" (Eastern Time) or "NYC", it returns the offset for either EST (Eastern Standard Time) or EDT (Eastern Daylight Time), as appropriate for the date.
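
To illustrate what resolving a relative zone involves, the sketch below picks EST or EDT for a local New York wall-clock time. It is not the Timezone component's API: the function names are hypothetical, it hard-codes the US rule in force since 2007 (second Sunday of March to first Sunday of November, switching at 02:00 local time), and it ignores the ambiguous and skipped hours at the transitions.

```python
from datetime import date, datetime, time

EST_OFFSET = -5 * 3600  # seconds relative to UTC
EDT_OFFSET = -4 * 3600


def nth_sunday(year, month, n):
    """Return the date of the n-th Sunday (1-based) of the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday is 0 and Sunday is 6.
    days_to_sunday = (6 - first.weekday()) % 7
    return date(year, month, 1 + days_to_sunday + 7 * (n - 1))


def eastern_utc_offset(local):
    """Return the UTC offset in seconds for a naive US Eastern wall-clock datetime."""
    dst_start = datetime.combine(nth_sunday(local.year, 3, 2), time(2, 0))
    dst_end = datetime.combine(nth_sunday(local.year, 11, 1), time(2, 0))
    return EDT_OFFSET if dst_start <= local < dst_end else EST_OFFSET


print(eastern_utc_offset(datetime(2022, 1, 5, 10, 11, 12)))  # -18000 (EST)
print(eastern_utc_offset(datetime(2022, 6, 7, 11, 19, 5)))   # -14400 (EDT)
```

With the January result, "2022-01-05 10:11:12NYC" corresponds to 15:11:12 UTC, that is, the wall-clock value minus the (negative) offset.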

Note: the date parsing logic handles a useful range of non-ISO date formats. For example, it will correctly parse dates like "Jan 2, 1997", "Jan-1-1997" or "1-January-1997". However, parsing of ambiguous dates is NOT supported. For example, attempting to parse a date with a two-digit year causes the parser to throw an error.

The supporting functionality for formatting Timestamps as strings is implemented in the reusable DateFormatter component. E.g.

print(DateFormatter.format(ts=1654615145, utc_offset=144000))  # 2022-06-09T07:19:05
print(DateFormatter.format(ts=1654615145, format="%Y-%m-%d"))  # 2022-06-07

By default DateFormatter.format outputs an ISO 8601 formatted string (YYYY-MM-DDTHH:MM:SS). However, it also accepts a format string (E.g. "%Y-%m-%d") using standard python format directives.

By default DateFormatter.format returns a date string in UTC. However, it also accepts a utc_offset (in seconds) as input.

DateParser and DateFormatter both depend on some low-level datetime processing/validation algorithms. For example, these algorithms let you convert a timestamp to (day, month, year, day of week, weekday, hour, etc.) values and vice versa. These algorithms are implemented in the Chrono component.
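
For readers unfamiliar with this kind of low-level conversion, here is a sketch of one well-known approach: Howard Hinnant's days_from_civil / civil_from_days algorithms, which map between a proleptic Gregorian date and a day count relative to 1970-01-01. Whether Chrono uses these exact algorithms is an assumption; the function names below are not taken from the PR.

```python
def days_from_civil(year, month, day):
    """Days since 1970-01-01 for a proleptic Gregorian date."""
    if month <= 2:
        year -= 1                      # treat Jan/Feb as months 11/12 of the prior year
    era = year // 400                  # 400-year cycle index (floor division)
    yoe = year - era * 400             # year of era, 0..399
    mp = month - 3 if month > 2 else month + 9   # March-based month, 0..11
    doy = (153 * mp + 2) // 5 + day - 1          # day of March-based year, 0..365
    doe = yoe * 365 + yoe // 4 - yoe // 100 + doy
    return era * 146097 + doe - 719468


def civil_from_days(days):
    """Inverse of days_from_civil: return (year, month, day)."""
    z = days + 719468
    era = z // 146097
    doe = z - era * 146097
    yoe = (doe - doe // 1460 + doe // 36524 - doe // 146096) // 365
    doy = doe - (365 * yoe + yoe // 4 - yoe // 100)
    mp = (5 * doy + 2) // 153
    day = doy - (153 * mp + 2) // 5 + 1
    month = mp + 3 if mp < 10 else mp - 9
    year = yoe + era * 400 + (1 if month <= 2 else 0)
    return year, month, day


def day_of_week(days):
    """0 = Sunday ... 6 = Saturday; 1970-01-01 (day 0) was a Thursday."""
    return (days + 4) % 7


days, seconds = divmod(1654615145, 86400)
print(civil_from_days(days), seconds // 3600, day_of_week(days))  # (2022, 6, 7) 15 2
```

Splitting a timestamp with divmod by 86400 gives the day count and the seconds within the day, from which hour, minute and second follow by further division.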

How Has This Been Tested?

This PR adds tests for the individual components (DateParser, DateFormatter, Timezone, Chrono). It also adds extensive unit tests for the main Timestamp component.

The unit tests compare against standard Python objects/builtins where relevant. This means, for example, that the Timestamp.format functionality is checked for correctness against Python's datetime.strftime, and the Timestamp.parse* methods are checked for correctness against datetime.strptime.

For exhaustiveness (and possibly overkill), the tests run over long time ranges (e.g. 'all the days over a span of two years' or 'all the seconds over a span of 3 months').
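
The shape of such a range test is roughly as follows. This is a sketch rather than the PR's test code: format_under_test is a hypothetical stand-in (built here on time.gmtime so the example is self-contained) for the formatter being verified, and datetime.strftime serves as the reference.

```python
import time
from datetime import datetime, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%S"


def format_under_test(ts, fmt=ISO_FORMAT):
    """Stand-in for the implementation being verified (assumption, not the PR's code)."""
    return time.strftime(fmt, time.gmtime(ts))


def find_mismatches(start_ts, count, step_seconds, fmt=ISO_FORMAT):
    """Compare the implementation against datetime.strftime over a range of timestamps."""
    mismatches = []
    for i in range(count):
        ts = start_ts + i * step_seconds
        expected = datetime.fromtimestamp(ts, tz=timezone.utc).strftime(fmt)
        actual = format_under_test(ts, fmt)
        if actual != expected:
            mismatches.append((ts, actual, expected))
    return mismatches


# Every day over two years, starting at 2020-01-01T00:00:00 UTC (2020 is a leap year).
START = 1577836800
print(len(find_mismatches(START, 731, 86400)))  # 0 when the implementations agree
```

Collecting mismatches instead of asserting inside the loop makes a failure report show every divergent timestamp at once, which helps when an error only appears around leap days or year boundaries.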

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

@launeh launeh force-pushed the laune-timestamp-module branch 8 times, most recently from b4ab8bd to 075df9f Compare November 26, 2022 13:25
@launeh launeh force-pushed the laune-timestamp-module branch 2 times, most recently from 9283c0b to 6ff6a6f Compare December 1, 2022 15:23
@launeh launeh force-pushed the laune-timestamp-module branch from 7298776 to 6dacbeb Compare December 13, 2022 19:25
@launeh launeh force-pushed the laune-timestamp-module branch from 6dacbeb to 6a9de9a Compare December 18, 2022 01:28