Performance: ap_actor table grows unbounded, hourly refresh job doesn't scale #2763

@kraftbj

Summary

The ap_actor custom post type accumulates entries indefinitely from any ActivityPub interaction (comments, mentions, replies, etc.), while the hourly activitypub_update_remote_actors cron job attempts to refresh ALL cached actors, regardless of whether they are followers or accounts seen in a single one-time interaction.

On sites with significant ActivityPub activity, this causes severe performance degradation.

Example Scenario

A site with:

  • 122 actual followers (stored in ap_inbox)
  • 19,381 cached actor profiles (stored in ap_actor)

The hourly job tries to refresh all 19,381 actors, causing:

  • 180% CPU spike
  • 502 timeouts
  • php-fpm workers exhausted

Technical Analysis

Data Structure

  • ap_inbox - Stores follower relationships, has _activitypub_user_id meta
  • ap_actor - Caches remote actor profiles, purely a cache with no follower relationship

These are separate. An ap_actor entry is created for ANY remote actor the site interacts with:

  • Someone who comments once
  • A user mentioned in a post
  • A reply target
  • Any ActivityPub interaction

The Hourly Job (update_remote_actors)

// class-scheduler.php:201-229
public static function update_remote_actors() {
    $number = 5;  // Only 5 per run (50 with DISABLE_WP_CRON)
    $actors = Remote_Actors::get_outdated( $number );
    
    foreach ( $actors as $actor ) {
        $meta = get_remote_metadata_by_actor( $actor->guid, false );  // HTTP request
        // ... updates actor
    }
}

// class-remote-actors.php:461-476 - "Outdated" definition
'date_query' => array(
    'column' => 'post_modified_gmt',
    'before' => gmdate( 'Y-m-d', time() - DAY_IN_SECONDS ),  // > 24 hours old
),

Problems:

  1. Queries ALL ap_actor entries, not just followers
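  2. With 19,381 actors at 5 per hour, one full pass takes about 3,877 hours (roughly 161 days), yet every actor counts as outdated again after about a day, so the backlog never clears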
  3. Each requires an HTTP request to the remote server
  4. No distinction between important actors (followers) and irrelevant ones (one-time commenters)
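
The cycle time in point 2 can be reproduced with a few lines of arithmetic. This is a rough sketch in plain PHP with no WordPress dependencies; full_cycle_hours() is a hypothetical helper, not plugin code, and the batch sizes (5 and 50) are the ones from the code comment in the scheduler excerpt.

```php
<?php
// Hypothetical helper, not part of the plugin: hours needed to touch every
// cached actor once when each hourly run refreshes a fixed-size batch.
function full_cycle_hours( int $total_actors, int $per_run ): int {
    return (int) ceil( $total_actors / $per_run );
}

$hours_default = full_cycle_hours( 19381, 5 );  // 3877
$hours_no_cron = full_cycle_hours( 19381, 50 ); // 388

printf( "5 per run:  %d hours (%.1f days)\n", $hours_default, $hours_default / 24 );
printf( "50 per run: %d hours (%.1f days)\n", $hours_no_cron, $hours_no_cron / 24 );

// Actors count as outdated again after roughly a day, so 24 hourly runs of
// 5 can keep at most 120 actors fresh at any time.
$max_fresh = 5 * 24;
printf( "Actors that can stay fresh at 5 per run: %d\n", $max_fresh );
```

Note that 120 is below even the 122 followers in the example scenario, so at the default batch size the job could not keep followers alone inside the 24-hour window.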

The Cleanup Job (cleanup_remote_actors)

Only removes actors that:

  • Return a Tombstone (explicitly deleted on remote server)
  • Fail 5+ consecutive fetch attempts

Does NOT clean up actors that:

  • Are not followers
  • Have not interacted in months or years
  • Belong to defunct servers that still respond

Investigation Queries

-- Actors used by comments
SELECT COUNT(DISTINCT meta_value) FROM wp_commentmeta 
WHERE meta_key = '_activitypub_remote_actor_id';
-- Result: 42

-- Actors that are followers
SELECT COUNT(DISTINCT p.ID) FROM wp_posts p
WHERE p.post_type = 'ap_actor'
AND p.guid IN (
  SELECT DISTINCT pm.meta_value FROM wp_postmeta pm
  INNER JOIN wp_posts inbox ON inbox.ID = pm.post_id
  WHERE inbox.post_type = 'ap_inbox'
  AND pm.meta_key = '_activitypub_activity_remote_actor'
);
-- Result: 23

-- Orphaned actors (not used by anything)
-- Result: 19,329 out of 19,381 total

Suggested Fixes

Option 1: Scope actor refresh to relevant actors only

Only refresh actors that:

  • Are actual followers (have a corresponding ap_inbox entry)
  • Have interacted within a configurable timeframe
  • Are being actively followed by the site
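
As a rough illustration of that selection rule, here is a sketch in plain PHP. The array shape (is_follower, is_followed, last_interaction) and the select_relevant_actors() helper are assumptions made up for this example; the plugin stores actors as ap_actor posts and does not currently record a last-interaction time.

```php
<?php
const SECONDS_PER_DAY = 86400;

// Hypothetical helper: keep followers, accounts the site follows, and
// actors with an interaction inside the window. Everything else is skipped.
function select_relevant_actors( array $actors, int $now, int $window_days ): array {
    $cutoff = $now - $window_days * SECONDS_PER_DAY;

    return array_values( array_filter(
        $actors,
        static function ( array $actor ) use ( $cutoff ): bool {
            return $actor['is_follower']
                || $actor['is_followed']
                || $actor['last_interaction'] >= $cutoff;
        }
    ) );
}

$now = 1700000000; // fixed timestamp so the example is deterministic
$old = $now - 400 * SECONDS_PER_DAY;

$actors = array(
    array( 'guid' => 'https://a.example/users/follower', 'is_follower' => true, 'is_followed' => false, 'last_interaction' => $old ),
    array( 'guid' => 'https://b.example/users/followed', 'is_follower' => false, 'is_followed' => true, 'last_interaction' => $old ),
    array( 'guid' => 'https://c.example/users/recent', 'is_follower' => false, 'is_followed' => false, 'last_interaction' => $now - 10 * SECONDS_PER_DAY ),
    array( 'guid' => 'https://d.example/users/one-time', 'is_follower' => false, 'is_followed' => false, 'last_interaction' => $old ),
);

$relevant = select_relevant_actors( $actors, $now, 90 );
echo implode( "\n", array_column( $relevant, 'guid' ) ), "\n";
```

With a 90-day window only the one-time actor is dropped. In the plugin, the same idea would presumably have to be expressed as query arguments in Remote_Actors::get_outdated() rather than filtered in PHP after loading.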

Option 2: Add proper cleanup for stale actors

Create a cleanup job that removes ap_actor entries for:

  • Non-followers with no interaction in X days (configurable)
  • Actors not referenced by any comments or interactions
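
A sketch of how such a cleanup could pick its candidates, again in plain PHP. find_stale_actor_ids() and the input shapes are hypothetical. It also assumes the two criteria above are combined conservatively, so an actor is only a deletion candidate when it is a non-follower AND unreferenced by comments AND idle for longer than the limit.

```php
<?php
const SECONDS_PER_DAY = 86400;

// Hypothetical helper: return IDs of actors that are safe-looking deletion
// candidates under the combined (conservative) rule described above.
function find_stale_actor_ids(
    array $actors,
    array $follower_guids,
    array $commenter_actor_ids,
    int $now,
    int $max_idle_days
): array {
    $cutoff     = $now - $max_idle_days * SECONDS_PER_DAY;
    $followers  = array_flip( $follower_guids );      // guid => index, for O(1) lookups
    $commenters = array_flip( $commenter_actor_ids ); // id => index
    $stale      = array();

    foreach ( $actors as $actor ) {
        if ( isset( $followers[ $actor['guid'] ] ) ) {
            continue; // follower: always keep
        }
        if ( isset( $commenters[ $actor['id'] ] ) ) {
            continue; // still referenced by a comment
        }
        if ( $actor['last_interaction'] >= $cutoff ) {
            continue; // interacted recently
        }
        $stale[] = $actor['id'];
    }

    return $stale;
}

$now = 1700000000; // fixed timestamp so the example is deterministic
$old = $now - 400 * SECONDS_PER_DAY;

$actors = array(
    array( 'id' => 1, 'guid' => 'https://a.example/users/follower', 'last_interaction' => $old ),
    array( 'id' => 2, 'guid' => 'https://b.example/users/commenter', 'last_interaction' => $old ),
    array( 'id' => 3, 'guid' => 'https://c.example/users/recent', 'last_interaction' => $now - SECONDS_PER_DAY ),
    array( 'id' => 4, 'guid' => 'https://d.example/users/orphan', 'last_interaction' => $old ),
);

$stale_ids = find_stale_actor_ids( $actors, array( 'https://a.example/users/follower' ), array( 2 ), $now, 180 );
print_r( $stale_ids );
```

Only actor 4 is returned here. Requiring all three conditions errs on the side of keeping data; a looser rule would shrink the table faster but risks breaking avatars or names on old comments.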

Option 3: Add interaction tracking

Store _activitypub_last_interaction timestamp on actors to enable:

  • Smart refresh prioritization (recent interactions first)
  • Cleanup of truly stale actors
  • Skip refreshing actors that haven't been relevant in months
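
To show what that prioritization might look like, here is a sketch that takes a map of actor GUID to last-interaction timestamp and returns the next batch to refresh. pick_refresh_batch() is a hypothetical helper, and the timestamps stand in for the proposed _activitypub_last_interaction meta, which does not exist in the plugin today.

```php
<?php
const SECONDS_PER_DAY = 86400;

// Hypothetical helper: drop actors idle for longer than $max_age_days, then
// return up to $batch_size GUIDs, most recent interaction first.
function pick_refresh_batch( array $last_interaction_by_guid, int $now, int $batch_size, int $max_age_days ): array {
    $cutoff = $now - $max_age_days * SECONDS_PER_DAY;

    $recent = array_filter(
        $last_interaction_by_guid,
        static function ( int $timestamp ) use ( $cutoff ): bool {
            return $timestamp >= $cutoff;
        }
    );

    arsort( $recent ); // sort by timestamp, newest first, keeping GUID keys

    return array_slice( array_keys( $recent ), 0, $batch_size );
}

$now = 1700000000; // fixed timestamp so the example is deterministic

$last_interaction = array(
    'https://a.example/users/alice' => $now - 2 * SECONDS_PER_DAY,
    'https://b.example/users/bob'   => $now - 200 * SECONDS_PER_DAY,
    'https://c.example/users/carol' => $now - 3600,
    'https://d.example/users/dave'  => $now - 30 * SECONDS_PER_DAY,
);

$batch = pick_refresh_batch( $last_interaction, $now, 2, 90 );
print_r( $batch ); // carol, then alice
```

Bob is skipped entirely because his last interaction is older than the 90-day limit, and Dave simply waits for a later run.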

Option 4: Reduce refresh frequency for non-followers

  • Followers: refresh every 24 hours (current behavior)
  • Non-followers: refresh only on interaction, or weekly/monthly
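
A minimal sketch of the tiered schedule, assuming the weekly variant for non-followers. Both function names are hypothetical and the 7-day figure is just one of the intervals suggested above, not a tested recommendation.

```php
<?php
const SECONDS_PER_DAY = 86400;

// Hypothetical helper: followers keep the current 24-hour interval,
// everyone else is refreshed weekly.
function refresh_interval_seconds( bool $is_follower ): int {
    return $is_follower ? SECONDS_PER_DAY : 7 * SECONDS_PER_DAY;
}

// Hypothetical helper: true when the actor's cached profile is older than
// the interval for its tier.
function is_due_for_refresh( int $last_refreshed, bool $is_follower, int $now ): bool {
    return ( $now - $last_refreshed ) >= refresh_interval_seconds( $is_follower );
}

$now          = 1700000000; // fixed timestamp so the example is deterministic
$two_days_ago = $now - 2 * SECONDS_PER_DAY;

var_dump( is_due_for_refresh( $two_days_ago, true, $now ) );  // bool(true)
var_dump( is_due_for_refresh( $two_days_ago, false, $now ) ); // bool(false)
```

A profile refreshed two days ago is due again for a follower but not for a non-follower, which is the whole point of the tiering: the hourly batch is spent on actors the site actually depends on.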

Workaround

Site owners can manually clean up orphaned actors:

-- Delete actors not used by comments or followers
DELETE wp_posts FROM wp_posts
LEFT JOIN wp_commentmeta cm ON wp_posts.ID = cm.meta_value 
  AND cm.meta_key = '_activitypub_remote_actor_id'
LEFT JOIN (
  SELECT DISTINCT pm.meta_value as actor_guid
  FROM wp_postmeta pm
  INNER JOIN wp_posts inbox ON inbox.ID = pm.post_id
  WHERE inbox.post_type = 'ap_inbox'
  AND pm.meta_key = '_activitypub_activity_remote_actor'
) followers ON wp_posts.guid = followers.actor_guid
WHERE wp_posts.post_type = 'ap_actor'
AND cm.meta_value IS NULL
AND followers.actor_guid IS NULL;

-- Clean up orphaned postmeta
DELETE pm FROM wp_postmeta pm
LEFT JOIN wp_posts p ON pm.post_id = p.ID
WHERE p.ID IS NULL;

Environment

  • ActivityPub plugin 7.8.4
  • PHP 8.3
  • WordPress 6.x
  • VPS with limited PHP workers (5-10)
