Oh nice with this PR :) I was just opening an issue that DuckDB could really be a quicker way to do data analytics on GTFS datasets. |
3b631aa to 68b2d4e
Member
Author
|
I'm making progress! With the current state, importing the |
Member
Author
|
I stumbled upon this weird behaviour (bug?) in DuckDB v1.2.2's query plan output. I redefined `arrivals_departures` as follows:

CREATE OR REPLACE VIEW "main.arrivals_departures" AS
SELECT
  (
    to_base64(encode(trip_id))
    || ':' || to_base64(encode(
      extract(ISOYEAR FROM "date")
      || '-' || lpad(extract(MONTH FROM "date")::text, 2, '0')
      || '-' || lpad(extract(DAY FROM "date")::text, 2, '0')
    ))
    || ':' || to_base64(encode(stop_sequence::text))
    -- frequencies_row
    || ':' || to_base64(encode('-1'))
    -- frequencies_it
    || ':' || to_base64(encode('-1'))
  ) as arrival_departure_id,
  -- todo: expose local arrival/departure "wall clock time"?
  -1 AS frequencies_row,
  -1 AS frequencies_it,
  stop_times_based.*
  EXCLUDE (
    arrival_time,
    departure_time
  )
FROM (
  SELECT
    agency.agency_id,
    trips.route_id,
    route_short_name,
    route_long_name,
    route_type,
    s.trip_id,
    trips.direction_id,
    trips.trip_headsign,
    trips.wheelchair_accessible,
    trips.bikes_allowed,
    service_days.service_id,
    trips.shape_id,
    "date",
    stop_sequence,
    stop_sequence_consec,
    stop_headsign,
    pickup_type,
    drop_off_type,
    shape_dist_traveled,
    timepoint,
    agency.agency_timezone as tz,
    arrival_time,
    (
      make_timestamptz(
        date_part('year', "date")::int,
        date_part('month', "date")::int,
        date_part('day', "date")::int,
        12, 0, 0,
        agency.agency_timezone
      )
      - INTERVAL '12 hours'
      + arrival_time
    ) t_arrival,
    departure_time,
    (
      make_timestamptz(
        date_part('year', "date")::int,
        date_part('month', "date")::int,
        date_part('day', "date")::int,
        12, 0, 0,
        agency.agency_timezone
      )
      - INTERVAL '12 hours'
      + departure_time
    ) t_departure,
    trip_start_time,
    s.stop_id, stops.stop_name,
    stations.stop_id station_id, stations.stop_name station_name,
    -- todo: PR #47
    coalesce(
      nullif(stops.wheelchair_boarding, 'no_info_or_inherit'),
      nullif(stations.wheelchair_boarding, 'no_info_or_inherit'),
      'no_info_or_inherit'
    ) AS wheelchair_boarding
  FROM (
    "main.stop_times" s
    JOIN "main.stops" stops ON s.stop_id = stops.stop_id
    LEFT JOIN "main.stops" stations ON stops.parent_station = stations.stop_id
    JOIN "main.trips" trips ON s.trip_id = trips.trip_id
    JOIN "main.routes" routes ON trips.route_id = routes.route_id
    LEFT JOIN "main.agency" agency ON (
      -- The GTFS spec allows routes.agency_id to be NULL if there is exactly one agency in the feed.
      -- Note: We implicitly rely on other parts of the code base to validate that agency has just one row!
      -- It seems that GTFS has allowed this at least since 2016:
      -- https://github.com/google/transit/blame/217e9bf/gtfs/spec/en/reference.md#L544-L554
      routes.agency_id IS NULL -- match first (and only) agency
      OR routes.agency_id = agency.agency_id -- match by ID
    )
    JOIN "main.service_days" service_days ON trips.service_id = service_days.service_id
  )
  -- todo: this slows down slightly
  -- ORDER BY route_id, s.trip_id, "date", stop_sequence
) stop_times_based;

Look at the time ( query plan

edit: maybe duckdb/duckdb#17607 is related, but most likely not |
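As a reading aid for the `arrival_departure_id` expression in the view above, here is a rough Python sketch of the same string construction: each component is Base64-encoded and the pieces are joined with `:`. The function name and the sample values are hypothetical and not part of this PR. The sketch deliberately mirrors the SQL as written, including its use of the ISO year for the date part, and hard-codes `-1` for `frequencies_row` and `frequencies_it` as the view does.

```python
import base64
from datetime import date


def b64(text: str) -> str:
    """Base64-encode the UTF-8 bytes of a string, like to_base64(encode(...))."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def arrival_departure_id(trip_id: str, service_date: date, stop_sequence: int) -> str:
    """Hypothetical helper mirroring the view's arrival_departure_id expression."""
    # extract(ISOYEAR FROM "date") in the SQL; isocalendar()[0] is the ISO year.
    iso_year = service_date.isocalendar()[0]
    date_part = f"{iso_year}-{service_date.month:02d}-{service_date.day:02d}"
    parts = [
        b64(trip_id),
        b64(date_part),
        b64(str(stop_sequence)),
        b64("-1"),  # frequencies_row
        b64("-1"),  # frequencies_it
    ]
    return ":".join(parts)


if __name__ == "__main__":
    # Sample values are made up for illustration.
    print(arrival_departure_id("abc", date(2025, 5, 17), 3))
```

Because the Base64 alphabet does not contain `:`, the ID can be split on `:` and each part decoded again to recover the original components.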
8ff6f4f to a7b26f8
9e32723 to 00895ea
4d2805f to dfe8b8d
This works around a crash I couldn't make sense of.
0.8.0.