@ahocevar
Contributor

@ahocevar ahocevar commented Jan 20, 2026

I've been using Umami for a website whose traffic started skyrocketing a few days ago. As soon as it did, the performance of statistics generation degraded severely. I tracked the problem down to the database queries for pageview and session metrics. To optimize them, I did the following:

  • Added a covering index on WebsiteEvent
  • Rewrote the logic in getPageviewMetrics.ts and getSessionMetrics.ts to use a common table expression when no filters are set. Instead of scanning events, joining them all to session, and then grouping and counting distinct, we now scan events (faster thanks to the covering index), get the distinct session ids, join once to session, and then count.

With these changes, I get superb performance again.
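
To illustrate the reordering described above, here is a minimal in-memory sketch rather than the actual SQL from the PR. The row shapes, the `browser` grouping column, and both function names are assumptions made for the example. It shows that both shapes produce the same counts, while the second does one session lookup per distinct session instead of one per event.

```typescript
// Hypothetical in-memory model; the real queries run as SQL in PostgreSQL.
interface EventRow { sessionId: string }
interface SessionRow { browser: string }
type SessionTable = Record<string, SessionRow>; // keyed by session id
type Counts = Record<string, number>;

// Old shape: join every event to its session, then count distinct sessions per group.
function countJoinThenDistinct(events: EventRow[], sessions: SessionTable): Counts {
  const seen: Record<string, Record<string, true>> = {};
  for (const e of events) {
    const s = sessions[e.sessionId]; // one lookup per event
    if (!s) continue;
    const ids = seen[s.browser] || (seen[s.browser] = {});
    ids[e.sessionId] = true;
  }
  const out: Counts = {};
  for (const browser of Object.keys(seen)) {
    out[browser] = Object.keys(seen[browser] || {}).length;
  }
  return out;
}

// New shape: reduce events to distinct session ids first, join once per session, then count.
function countDistinctThenJoin(events: EventRow[], sessions: SessionTable): Counts {
  const distinctIds: Record<string, true> = {};
  for (const e of events) distinctIds[e.sessionId] = true;
  const out: Counts = {};
  for (const id of Object.keys(distinctIds)) {
    const s = sessions[id]; // one lookup per distinct session
    if (!s) continue;
    out[s.browser] = (out[s.browser] || 0) + 1;
  }
  return out;
}

// Made-up sample data: three sessions, six events.
const sampleSessions: SessionTable = {
  s1: { browser: "firefox" },
  s2: { browser: "chrome" },
  s3: { browser: "chrome" },
};
const sampleEvents: EventRow[] = [
  { sessionId: "s1" }, { sessionId: "s1" }, { sessionId: "s2" },
  { sessionId: "s3" }, { sessionId: "s3" }, { sessionId: "s3" },
];

console.log(countJoinThenDistinct(sampleEvents, sampleSessions));
console.log(countDistinctThenJoin(sampleEvents, sampleSessions));
```

With the sample data, both functions report one firefox session and two chrome sessions. The saving grows with the ratio of events to sessions.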

@vercel

vercel bot commented Jan 20, 2026

@ahocevar is attempting to deploy a commit to the umami-software Team on Vercel.

A member of the Team first needs to authorize it.

@greptile-apps
Contributor

greptile-apps bot commented Jan 20, 2026

Greptile Summary

Optimized pageview and session metrics queries for high-traffic scenarios by adding a covering index on website_event and introducing CTE-based query paths that reduce expensive COUNT DISTINCT operations.

Key changes:

  • Added covering index (website_id, created_at, event_type, session_id) on website_event table to enable index-only scans
  • Refactored getPageviewMetrics.ts and getSessionMetrics.ts to use CTEs when querying SESSION_COLUMNS without EVENT_COLUMNS filters
  • New query path: scan events → get distinct session_ids → join session table → aggregate (vs old: scan events → join session → count distinct)

Performance impact:
The optimization targets the common case where users query session-level metrics (browser, OS, device, location) without event-level filters. By getting distinct sessions before joining, the query avoids counting duplicates across the larger website_event table. The covering index ensures Postgres can satisfy the CTE entirely from the index without touching table data.
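
As a rough sketch of what such a CTE-based query could look like, the helper below builds the SQL text for one session column. It is not the code from the PR: the function name, the column whitelist, the `$1`..`$4` placeholders, and the join on `session.session_id` are assumptions for illustration. The CTE reads only the four indexed columns, which is what makes an index-only scan possible. Whether PostgreSQL actually chooses one depends on the planner and the visibility map, so checking with EXPLAIN is advisable.

```typescript
// Hypothetical whitelist; the real SESSION_COLUMNS list lives in the Umami source.
const ALLOWED_SESSION_COLUMNS = ["browser", "os", "device", "country"];

// Builds the SQL text for the "distinct sessions first" path.
// The column name is interpolated, so it is checked against the whitelist first.
function buildSessionMetricSql(column: string): string {
  if (ALLOWED_SESSION_COLUMNS.indexOf(column) === -1) {
    throw new Error("Unsupported session column: " + column);
  }
  return [
    "with distinct_sessions as (",
    "  select distinct session_id",      // only columns covered by the index
    "  from website_event",
    "  where website_id = $1",
    "    and created_at between $2 and $3",
    "    and event_type = $4",
    ")",
    "select s." + column + " as x, count(*) as y",
    "from distinct_sessions ds",
    "join session s on s.session_id = ds.session_id", // assumed join key
    "group by s." + column,
    "order by y desc",
  ].join("\n");
}

console.log(buildSessionMetricSql("browser"));
```

Because each session id appears once in the CTE result, a plain `count(*)` after the join replaces the `COUNT DISTINCT` of the original shape.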

Testing recommendations:

  • Verify index build time on production-sized website_event tables
  • Confirm query performance improvements match expectations under load
  • Check that session metrics return identical results for both query paths

Confidence Score: 4/5

  • This PR is safe to merge with moderate risk - the optimization logic is sound but requires database testing
  • The PR implements a well-designed query optimization using CTEs and a covering index. The logic correctly identifies when to use the optimized path (SESSION_COLUMNS without EVENT_COLUMNS filters). However, score is 4/5 because: (1) The optimization changes query execution plans significantly and should be tested under production-like load, (2) The covering index will need to be built on existing data which could be slow, (3) The string interpolation of column names follows existing patterns but isn't parameterized
  • Test the query files (getPageviewMetrics.ts, getSessionMetrics.ts) under high traffic conditions to verify performance gains match expectations

Important Files Changed

| Filename | Overview |
| --- | --- |
| prisma/migrations/15_add_website_event_covering_index/migration.sql | Adds covering index on website_event for (website_id, created_at, event_type, session_id) to optimize queries scanning events by website and time range |
| prisma/schema.prisma | Schema update adds index definition matching the migration, correctly positioned at line 141 |
| src/queries/sql/pageviews/getPageviewMetrics.ts | Adds optimized CTE query path for SESSION_COLUMNS when no EVENT_COLUMNS filters present, reducing joins by getting distinct sessions first |
| src/queries/sql/sessions/getSessionMetrics.ts | Adds optimized CTE query path when no EVENT_COLUMNS filters present, includes proper country handling for city/region queries |

Sequence Diagram

sequenceDiagram
    participant Client
    participant API
    participant QueryRouter
    participant OptimizedQuery
    participant Database
    participant Index as Covering Index

    Client->>API: Request pageview/session metrics
    API->>QueryRouter: getPageviewMetrics() or getSessionMetrics()
    
    alt SESSION_COLUMNS && no EVENT_COLUMNS filters
        QueryRouter->>OptimizedQuery: Use CTE optimization path
        OptimizedQuery->>Database: Scan website_event (CTE)
        Database->>Index: Use covering index
        Index-->>Database: (website_id, created_at, event_type, session_id)
        Database-->>OptimizedQuery: Distinct session_ids
        OptimizedQuery->>Database: JOIN to session table
        Database-->>OptimizedQuery: Session data with filters
        OptimizedQuery->>Database: GROUP BY and COUNT
        Database-->>OptimizedQuery: Aggregated metrics
        OptimizedQuery-->>API: Results
    else Original query path
        QueryRouter->>Database: Full table scan with COUNT DISTINCT
        Database-->>API: Results
    end
    
    API-->>Client: Return metrics
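
The branch condition in the diagram above could be expressed roughly as follows. This is a sketch only: the two column lists and the function name are hypothetical stand-ins for the SESSION_COLUMNS and EVENT_COLUMNS constants in the Umami source, whose exact contents are not shown in this conversation.

```typescript
// Hypothetical column lists; the real ones are defined in the Umami source.
const SESSION_COLUMN_NAMES = ["browser", "os", "device", "country", "region", "city"];
const EVENT_COLUMN_NAMES = ["path", "referrer", "title", "query", "event"];

// Returns true when the metric is session-level and no event-level filter is set,
// which is the case the CTE path is meant for.
function shouldUseCtePath(type: string, filters: Record<string, unknown>): boolean {
  const isSessionMetric = SESSION_COLUMN_NAMES.indexOf(type) !== -1;
  const hasEventFilter = Object.keys(filters).some(function (key) {
    return EVENT_COLUMN_NAMES.indexOf(key) !== -1 && filters[key] !== undefined;
  });
  return isSessionMetric && !hasEventFilter;
}

console.log(shouldUseCtePath("browser", {}));                // session metric, no filters
console.log(shouldUseCtePath("browser", { path: "/docs" })); // event filter present
console.log(shouldUseCtePath("path", {}));                   // not a session metric
```

Any request that fails the check would fall through to the original query path.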

@IndraGunawan
Contributor

hi @ahocevar , interesting PR

> the performance of statistics generation severely degraded

do you have a rough idea of how much traffic the site had when things started slowing down? Like approximate daily pageviews/sessions?

@ahocevar
Contributor Author

@IndraGunawan The site had approximately 30000 daily page views when performance degraded severely.

@IndraGunawan
Contributor

Thanks @ahocevar, that's helpful. If you don’t mind sharing, do you also have an approximate number of daily sessions around that time?


one thought: instead of adding a separate index, we could extend the existing index

@@index([websiteId, sessionId, createdAt])
by adding eventType as the left-most column.

since eventType has low cardinality, PostgreSQL 18 can skip the left-most column when it's not referenced in the query, so queries that don't filter on eventType should still benefit. reference: https://neon.com/postgresql/postgresql-18/skip-scan-btree

Collaborator

@franciscao633 franciscao633 left a comment


If these don't make sense, please let me know. Thank you for your contributions!

}

if (
SESSION_COLUMNS.includes(type) &&

I don't think this if statement would ever apply. If you look at src\app\api\websites\[websiteId]\metrics\route.ts, this query is only called if EVENT_COLUMNS.includes(type). I would just remove the changes to this file.

@ahocevar ahocevar force-pushed the statistics-performance branch from 6731cbc to 23cbff3 Compare January 27, 2026 15:24
@ahocevar
Contributor Author

There were about 7000 sessions during that time. While checking that, I found that getWebsiteSessions.ts could also benefit from query optimization, so I added that as well.

The index I added earlier makes sense and, in my opinion, should not be replaced with your suggestion. Because sessionId varies, the suggested index is sorted by sessionId first and by createdAt second, which means the timestamp range has to be checked separately for every single session.
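
The ordering argument can be illustrated with a simplified model that treats an index as a sorted array and counts how many separate stretches of it a time-range query has to visit. This is not how PostgreSQL executes anything internally. The entry shape, function names, and sample data are made up for the example, and a single website is assumed so that websiteId can be left out.

```typescript
interface IndexEntry { sessionId: string; createdAt: number }
type Comparator = (a: IndexEntry, b: IndexEntry) => number;

// Sorts entries the way the index would store them, then counts the
// contiguous runs of entries whose createdAt falls inside [from, to].
function countMatchingRuns(entries: IndexEntry[], order: Comparator, from: number, to: number): number {
  const sorted = entries.slice().sort(order);
  let runs = 0;
  let inRun = false;
  for (const e of sorted) {
    const matches = e.createdAt >= from && e.createdAt <= to;
    if (matches && !inRun) runs++;
    inRun = matches;
  }
  return runs;
}

// Index ordered by createdAt first (as in the covering index).
const byTime: Comparator = function (a, b) {
  return a.createdAt - b.createdAt;
};

// Index ordered by sessionId first, then createdAt.
const bySessionThenTime: Comparator = function (a, b) {
  if (a.sessionId < b.sessionId) return -1;
  if (a.sessionId > b.sessionId) return 1;
  return a.createdAt - b.createdAt;
};

// Made-up data: three sessions with interleaved timestamps.
const sampleEntries: IndexEntry[] = [
  { sessionId: "a", createdAt: 1 }, { sessionId: "a", createdAt: 5 }, { sessionId: "a", createdAt: 9 },
  { sessionId: "b", createdAt: 2 }, { sessionId: "b", createdAt: 6 }, { sessionId: "b", createdAt: 10 },
  { sessionId: "c", createdAt: 3 }, { sessionId: "c", createdAt: 7 }, { sessionId: "c", createdAt: 11 },
];

console.log(countMatchingRuns(sampleEntries, byTime, 5, 7));            // 1
console.log(countMatchingRuns(sampleEntries, bySessionThenTime, 5, 7)); // 3
```

With createdAt leading, the matching entries form one contiguous range. With sessionId leading, they split into one range per session, so the number of ranges to locate grows with the number of sessions.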

You were absolutely right regarding the if-clause though. The dead code in there was a copy/paste from the session metrics optimization without the required changes anyway, so I got rid of that. Sorry for the confusion.

Instead, I added a few more changes to make use of strict event typing, for even more benefit from the new index.

3 participants