Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Throw invalid_time_zone exception from tzdb [4/4] #12475

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions velox/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ add_subdirectory(vector)
add_subdirectory(row)
add_subdirectory(flag_definitions)
add_subdirectory(external/date)
add_subdirectory(external/tzdb)
add_subdirectory(external/md5)
add_subdirectory(external/hdfs)
#
Expand Down
36 changes: 18 additions & 18 deletions velox/docs/develop/timestamp.rst
Original file line number Diff line number Diff line change
Expand Up @@ -73,9 +73,9 @@ used to efficiently represent timezones, preventing the use of inefficient
timezone string names like ``America/Los_Angeles``. Considering there are about
2k valid timezone definitions, 12 bits are enough to represent timezone IDs. 

Timezone IDs in Velox are based on the id map used by Presto and are
`available here <https://github.com/prestodb/presto/blob/master/presto-common/src/main/resources/com/facebook/presto/common/type/zone-index.properties>`_.
They are automatically generated using `this script <https://github.com/facebookincubator/velox/blob/main/velox/type/tz/gen_timezone_database.py>`_.
Timezone IDs in Velox are based on the id map used by Presto and are
`available here <https://github.com/prestodb/presto/blob/master/presto-common/src/main/resources/com/facebook/presto/common/type/zone-index.properties>`_.
They are automatically generated using `this script <https://github.com/facebookincubator/velox/blob/main/velox/type/tz/gen_timezone_database.py>`_.
While timezone IDs are an implementation detail and ideally should not leak
outside of Velox execution, they are exposed if data containing
TimestampWithTimezones are serialized, for example.
Expand All @@ -98,7 +98,7 @@ ID).

However, unpacking/converting a TimestampWithTimezone into an absolute time
definition requires a
`timezone conversion <https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/DateTimeFunctions.h#L74-L84>`_.
`timezone conversion <https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/DateTimeFunctions.h#L74-L84>`_.

Conversions Across Timezones
----------------------------
Expand All @@ -119,7 +119,7 @@ for Linux. 
In Velox, Timezone conversions are done using std::chrono. Starting in C++20,
std::chrono `supports conversion of timestamp across timezones <https://en.cppreference.com/w/cpp/chrono/time_zone>`_.
To support older versions of the C++ standard, in Velox we vendor an
implementation of this API at `velox/external/date/ <https://github.com/facebookincubator/velox/tree/main/velox/external/date>`_.
implementation of this API at `velox/external/tzdb/ <https://github.com/facebookincubator/velox/tree/main/velox/external/tzdb>`_.
This class handles timezone conversions by leveraging APIs provided by the
operating system, based on the tzdata database installed locally. If systems
happen to have inconsistent or older versions of the tzdata database, Velox’s
Expand Down Expand Up @@ -153,23 +153,23 @@ on the string on not:
::

SELECT typeof(TIMESTAMP '1970-01-01 00:00:00'); -- timestamp
SELECT typeof(TIMESTAMP '1970-01-01 00:00:00 UTC'); -- timestamp with time zone
SELECT typeof(TIMESTAMP '1970-01-01 00:00:00 UTC'); -- timestamp with time zone

Converting a TimestampWithTimezone into a Timestamp works by dropping the
timezone information and returning only the timestamp portion:

::

SELECT cast(TIMESTAMP '1970-01-01 00:00:00 UTC' as timestamp); -- 1970-01-01 00:00:00.000
SELECT cast(TIMESTAMP '1970-01-01 00:00:00 America/New_York' as timestamp); -- 1970-01-01 00:00:00.000
SELECT cast(TIMESTAMP '1970-01-01 00:00:00 UTC' as timestamp); -- 1970-01-01 00:00:00.000
SELECT cast(TIMESTAMP '1970-01-01 00:00:00 America/New_York' as timestamp); -- 1970-01-01 00:00:00.000

To convert a Timestamp into a TimestampWithTimezone, one needs to specify a
timezone. In Presto, the session timezone is used by default:

::

SELECT current_timezone(); -- America/Los_Angeles
SELECT cast(TIMESTAMP '1970-01-01 00:00:00' as timestamp with time zone); -- 1970-01-01 00:00:00.000 America/Los_Angeles
SELECT cast(TIMESTAMP '1970-01-01 00:00:00' as timestamp with time zone); -- 1970-01-01 00:00:00.000 America/Los_Angeles

Conversion across TimestampWithTimezone can be done using the AT TIME ZONE
construct. 
Expand All @@ -180,15 +180,15 @@ the clock/calendar read at the target timezone (Los Angeles)?

::

SELECT TIMESTAMP '1970-01-01 00:00:00 UTC' AT TIME ZONE 'America/Los_Angeles'; -- 1969-12-31 16:00:00.000 America/Los_Angeles
SELECT TIMESTAMP '1970-01-01 00:00:00 UTC' AT TIME ZONE 'UTC'; -- 1970-01-01 00:00:00.000 UTC
SELECT TIMESTAMP '1970-01-01 00:00:00 UTC' AT TIME ZONE 'America/Los_Angeles'; -- 1969-12-31 16:00:00.000 America/Los_Angeles
SELECT TIMESTAMP '1970-01-01 00:00:00 UTC' AT TIME ZONE 'UTC'; -- 1970-01-01 00:00:00.000 UTC

Strings can be converted into Timestamp and TimestampWithTimezone:

::

SELECT cast('1970-01-01 00:00:00' as timestamp); -- 1970-01-01 00:00:00.000
SELECT cast('1970-01-01 00:00:00 America/Los_Angeles' as timestamp with time zone); -- 1970-01-01 00:00:00.000 America/Los_Angeles
SELECT cast('1970-01-01 00:00:00' as timestamp); -- 1970-01-01 00:00:00.000
SELECT cast('1970-01-01 00:00:00 America/Los_Angeles' as timestamp with time zone); -- 1970-01-01 00:00:00.000 America/Los_Angeles

One can also convert a TimestampWithTimezone into a unix epoch/time. The
semantic of this operation is: at the absolute point in time described by the
Expand All @@ -197,16 +197,16 @@ that unix epoch is the number of seconds since ``1970-01-01 00:00:00`` in UTC:

::

SELECT to_unixtime(TIMESTAMP '1970-01-01 00:00:00 UTC'); -- 0.0
SELECT to_unixtime(TIMESTAMP '1970-01-01 00:00:00 America/Los_Angeles'); -- 28800.0
SELECT to_unixtime(TIMESTAMP '1970-01-01 00:00:00 UTC'); -- 0.0
SELECT to_unixtime(TIMESTAMP '1970-01-01 00:00:00 America/Los_Angeles'); -- 28800.0

The opposite conversion can be achieved using ``from_unixtime()``. The function
may take an optional second parameter to specify the timezone, having the same
semantic as AT TIME ZONE described above:

::

SELECT from_unixtime(0); -- 1970-01-01 00:00:00.000
SELECT from_unixtime(0); -- 1970-01-01 00:00:00.000
SELECT from_unixtime(0, 'UTC'); -- 1970-01-01 00:00:00.000 UTC 
SELECT from_unixtime(0, 'America/Los_Angeles'); -- 1969-12-31 16:00:00.000 America/Los_Angeles

Expand All @@ -225,8 +225,8 @@ timestamps have a different semantic:
::

SET SESSION legacy_timestamp = true;
SELECT cast(TIMESTAMP '1970-01-01 00:00:00 UTC' as timestamp); -- 1969-12-31 16:00:00.000
SELECT cast('1970-01-01 00:00:00 UTC' as timestamp); -- 1969-12-31 16:00:00.000
SELECT cast(TIMESTAMP '1970-01-01 00:00:00 UTC' as timestamp); -- 1969-12-31 16:00:00.000
SELECT cast('1970-01-01 00:00:00 UTC' as timestamp); -- 1969-12-31 16:00:00.000

To support the two timestamp semantics, the
``core::QueryConfig::kAdjustTimestampToTimezone`` query flag was added to Velox.
Expand Down
1 change: 0 additions & 1 deletion velox/dwio/parquet/tests/reader/ParquetTableScanTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,6 @@
#include "velox/exec/tests/utils/HiveConnectorTestBase.h" // @manual
#include "velox/exec/tests/utils/PlanBuilder.h"
#include "velox/exec/tests/utils/TempDirectoryPath.h"
#include "velox/external/date/tz.h"
#include "velox/type/tests/SubfieldFiltersBuilder.h"
#include "velox/type/tz/TimeZoneMap.h"

Expand Down
1 change: 0 additions & 1 deletion velox/expression/CastExpr-inl.h
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,6 @@
#include "velox/common/base/Exceptions.h"
#include "velox/core/CoreTypeSystem.h"
#include "velox/expression/StringWriter.h"
#include "velox/external/date/tz.h"
#include "velox/type/Type.h"
#include "velox/vector/SelectivityVector.h"

Expand Down
60 changes: 16 additions & 44 deletions velox/expression/tests/CastExprTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -623,11 +623,17 @@ TEST_F(CastExprTest, stringToTimestamp) {
"1970-01-01 00:00 America/Sao_Paulo",
"2000-01-01 12:21:56Z",
"2000-01-01 12:21:56+01:01:01",
"2045-12-31 18:00:00",
// Test going back and forth across DST boundaries.
"2024-03-10 09:59:59 -00:00:02",
"2024-03-10 10:00:01 +00:00:02",
"2024-11-03 08:59:59 -00:00:02",
"2024-11-03 09:00:01 +00:00:02",
// Test going back and forth across DST boundaries in the distant future.
"2100-03-14 09:59:59 -00:00:02",
"2100-03-14 10:00:01 +00:00:02",
"2100-11-07 09:59:59 -00:00:02",
"2100-11-07 10:00:01 +00:00:02",
};
expected = {
Timestamp(28800, 0),
Expand All @@ -636,10 +642,15 @@ TEST_F(CastExprTest, stringToTimestamp) {
Timestamp(10800, 0),
Timestamp(946729316, 0),
Timestamp(946725655, 0),
Timestamp(2398384800, 0),
Timestamp(1710064801, 0),
Timestamp(1710064799, 0),
Timestamp(1730624401, 0),
Timestamp(1730624399, 0),
Timestamp(4108701601, 0),
Timestamp(4108701599, 0),
Timestamp(4129264801, 0),
Timestamp(4129264799, 0),
};
testCast<std::string, Timestamp>("timestamp", input, expected);

Expand All @@ -656,14 +667,6 @@ TEST_F(CastExprTest, stringToTimestamp) {
(evaluateOnce<Timestamp, std::string>(
"try_cast(c0 as timestamp)", "201915-04-23 11:46:00.000")),
"Timepoint is outside of supported year range");
VELOX_ASSERT_THROW(
(evaluateOnce<Timestamp, std::string>(
"cast(c0 as timestamp)", "2045-12-31 18:00:00")),
"Unable to convert timezone 'America/Los_Angeles' past 2037-11-01 09:00:00");
VELOX_ASSERT_THROW(
(evaluateOnce<Timestamp, std::string>(
"try_cast(c0 as timestamp)", "2045-12-31 18:00:00")),
"Unable to convert timezone 'America/Los_Angeles' past 2037-11-01 09:00:00");
// Only one white space is allowed before the offset string.
VELOX_ASSERT_THROW(
(evaluateOnce<Timestamp, std::string>(
Expand Down Expand Up @@ -777,34 +780,18 @@ TEST_F(CastExprTest, timestampToString) {
Timestamp(0, 0),
Timestamp(946729316, 123),
Timestamp(-50049331622, 0),
Timestamp(253405036800, 0),
Timestamp(-62480038022, 0),
std::nullopt,
},
{
"1969-12-31 16:00:00.000",
"2000-01-01 04:21:56.000",
"0384-01-01 00:00:00.000",
"10000-02-01 08:00:00.000",
"-0010-02-01 02:00:00.000",
std::nullopt,
});

// Ensure external/date throws since it doesn't know how to convert large
// timestamps.
auto mustThrow = [&]() {
return testCast<Timestamp, std::string>(
"string", {Timestamp(253405036800, 0)}, {"10000-02-01 08:00:00.000"});
};
VELOX_ASSERT_THROW(
mustThrow(), "Unable to convert timezone 'America/Los_Angeles' past");

// try_cast should also throw since it's runtime error.
auto tryCastMustThrow = [&]() {
return testTryCast<Timestamp, std::string>(
"string", {Timestamp(253405036800, 0)}, {"10000-02-01 08:00:00.000"});
};
VELOX_ASSERT_THROW(
tryCastMustThrow(),
"Unable to convert timezone 'America/Los_Angeles' past");
}

TEST_F(CastExprTest, dateToTimestamp) {
Expand All @@ -831,6 +818,7 @@ TEST_F(CastExprTest, timestampToDate) {
Timestamp(0, 0),
Timestamp(946684800, 0),
Timestamp(1257724800, 0),
Timestamp(2534050368, 0),
std::nullopt,
};

Expand All @@ -841,6 +829,7 @@ TEST_F(CastExprTest, timestampToDate) {
0,
10957,
14557,
29329,
std::nullopt,
},
TIMESTAMP(),
Expand All @@ -854,28 +843,11 @@ TEST_F(CastExprTest, timestampToDate) {
-1,
10956,
14556,
29328,
std::nullopt,
},
TIMESTAMP(),
DATE());

// Ensure external/date throws since it doesn't know how to convert large
// timestamps.
auto mustThrow = [&]() {
return testCast<Timestamp, int32_t>(
"date", {Timestamp(253405036800, 0)}, {0});
};
VELOX_ASSERT_THROW(
mustThrow(), "Unable to convert timezone 'America/Los_Angeles' past");

// try_cast should also throw since it's runtime error.
auto tryCastMustThrow = [&]() {
return testTryCast<Timestamp, int32_t>(
"date", {Timestamp(253405036800, 0)}, {0});
};
VELOX_ASSERT_THROW(
tryCastMustThrow(),
"Unable to convert timezone 'America/Los_Angeles' past");
}

TEST_F(CastExprTest, timestampInvalid) {
Expand Down
6 changes: 2 additions & 4 deletions velox/external/date/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,5 @@
# See the License for the specific language governing permissions and
# limitations under the License.


velox_add_library(velox_external_date tz.cpp)
velox_include_directories(velox_external_date SYSTEM PUBLIC velox/external)
velox_compile_definitions(velox_external_date PRIVATE USE_OS_TZDB)
add_library(velox_external_date INTERFACE)
velox_include_directories(velox_external_date INTERFACE velox/external/date)
56 changes: 0 additions & 56 deletions velox/external/date/ios.h

This file was deleted.

Loading
Loading