-
Notifications
You must be signed in to change notification settings - Fork 6.4k
Covering index for website_event and query optimization for pageview and session metrics #3966
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
@ahocevar is attempting to deploy a commit to the umami-software Team on Vercel. A member of the Team first needs to authorize it. |
Greptile SummaryOptimized pageview and session metrics queries for high-traffic scenarios by adding a covering index on Key changes:
Performance impact: Testing recommendations:
Confidence Score: 4/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Client
participant API
participant QueryRouter
participant OptimizedQuery
participant Database
participant Index as Covering Index
Client->>API: Request pageview/session metrics
API->>QueryRouter: getPageviewMetrics() or getSessionMetrics()
alt SESSION_COLUMNS && no EVENT_COLUMNS filters
QueryRouter->>OptimizedQuery: Use CTE optimization path
OptimizedQuery->>Database: Scan website_event (CTE)
Database->>Index: Use covering index
Index-->>Database: (website_id, created_at, event_type, session_id)
Database-->>OptimizedQuery: Distinct session_ids
OptimizedQuery->>Database: JOIN to session table
Database-->>OptimizedQuery: Session data with filters
OptimizedQuery->>Database: GROUP BY and COUNT
Database-->>OptimizedQuery: Aggregated metrics
OptimizedQuery-->>API: Results
else Original query path
QueryRouter->>Database: Full table scan with COUNT DISTINCT
Database-->>API: Results
end
API-->>Client: Return metrics
|
Greptile's behavior is changing!From now on, if a review finishes with no comments, we will not post an additional "statistics" comment to confirm that our review found nothing to comment on. However, you can confirm that we reviewed your changes in the status check section. This feature can be toggled off in your Code Review Settings by deselecting "Create a status check for each PR". |
|
hi @ahocevar , interesting PR
do you have a rough idea of how much traffic the site had when things started slowing down? Like approximate daily pageviews/sessions? |
|
@IndraGunawan The site had approximately 30000 daily page views when performance degraded severely. |
|
Thanks @ahocevar, that's helpful. If you don’t mind sharing, do you also have an approximate number of daily sessions around that time? one thought: instead of adding a separate index, we could extend the existing index Line 137 in 63d2bfe
eventType as the left-most column.
since |
franciscao633
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If these don't make sense, please let me know. Thank you for your contributions!
| } | ||
|
|
||
| if ( | ||
| SESSION_COLUMNS.includes(type) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think if statement would ever apply. If you look at src\app\api\websites[websiteId]\metrics\route.ts, this query is only called if EVENT_COLUMNS.includes(type). I would just remove changes to this file.
6731cbc to
23cbff3
Compare
|
There were about 7000 sessions during that time. And by checking that, I found that also The index I had previously added makes sense and should, in my opinion, not be replaced with your suggestion. sessionId varies and the resulting sorting is by sessionId first and then by createdAt - this means that for every single session the timestamp must be scanned. You were absolutely right regarding the if-clause though. The dead code in there was a copy/paste from the session metrics optimization without the required changes anyway, so I got rid of that. Sorry for the confusion. Instead, I added a few more changes to make use of strict event typing, for even more benefit from the new index. |
I've been using Umami for a website which started skyrocketing a few days ago. And as soon as it did, the performance of statistics generation severely degraded. I was able to track things down to database queries for pageview and session metrics. To optimize them, I did the following:
WebsiteEventgetPageviewMetrix.tsandgetSessionMetrics.tsto use a common table expression when there are no filters set. Instead of scanning events, join all to session, then group and count distinct, we now scan events (faster thanks to the covering index), then get distinct ids, then join once to session, then count.With these changes, I get superb performance again.