Summary
We should enhance the existing libp2p resource management and rate limiting to support adaptive scaling based on hardware capabilities. Currently, resource limits are hard-coded and static (5000 incoming streams, 10000 outgoing streams, 200 connections per IP). This enhancement would make limits dynamic and adaptive, scaling from resource-constrained devices up to high-performance servers, while adding a soft connection trimmer and optional private network support.
Motivation
Bee nodes currently have basic rate limiting and per-IP connection limits, but they don't adapt to hardware capabilities:
- Static Limits Don't Scale: Hard-coded stream limits (5000/10000) and per-IP limits (200 connections) are the same for a 1GB Raspberry Pi and a 32GB server. A Raspberry Pi could be under-utilized or overloaded, while a server is artificially constrained.
- No Soft Limit Management: Nodes hit hard resource limits abruptly with no graceful degradation. There's no connection trimmer to reduce load before hitting system limits, leading to sudden disconnections.
- Limited Rate Limiting: While per-IP rate limiting exists (10 conn/sec, burst 40), it's not integrated with the overall resource strategy or configurable per deployment.
- Private Network Limitations: Private IP ranges (10.x, 192.168.x) cannot be exempted from rate limits, breaking local cluster deployments that developers need for testing.
- No Bootnode Prioritization: Bootnodes aren't automatically exempted from rate limits, potentially causing bootstrap failures under high load.
- No Protocol Prioritization: All protocols consume resources equally; critical protocols like Hive could be starved by background traffic.
Implementation
- Hardware-Adaptive Scaling:
  - Move from hard-coded limits to auto-scaled limits based on available system memory
  - Define reasonable base limits for constrained devices
  - Scale limits proportionally upward as more memory becomes available
  - Preserve the existing rate-limiting approach but integrate it with the dynamic system limits
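To make the scaling behaviour concrete, here is a minimal sketch of memory-proportional limits. The base values, the 1 GiB reference point, and the scale cap are illustrative guesses, not Bee's actual constants:

```go
package main

import "fmt"

// Hypothetical base limits for a constrained (~1 GiB) device; real values
// would need tuning against actual deployments.
const (
	baseMemoryBytes     = 1 << 30 // 1 GiB reference point
	baseStreamsInbound  = 500
	baseStreamsOutbound = 1000
	maxScaleFactor      = 16 // cap so a huge server doesn't scale unbounded
)

// scaledStreamLimits grows the base limits linearly with available memory,
// with a floor of 1x for sub-1-GiB devices and a ceiling of maxScaleFactor.
func scaledStreamLimits(availableMemory uint64) (inbound, outbound int) {
	factor := availableMemory / baseMemoryBytes
	if factor < 1 {
		factor = 1
	}
	if factor > maxScaleFactor {
		factor = maxScaleFactor
	}
	return baseStreamsInbound * int(factor), baseStreamsOutbound * int(factor)
}

func main() {
	in, out := scaledStreamLimits(8 << 30) // e.g. an 8 GiB host
	fmt.Println(in, out)                   // 4000 8000
}
```

A Raspberry Pi with 1 GiB would keep the base limits, while a 32 GiB server would hit the cap; linear scaling with a floor and ceiling keeps the behaviour predictable for operators.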
- Improved Per-IP Connection Limits:
  - Replace the fixed 200-per-IP limit with a dynamic calculation based on total system connections
  - Scale fairly: smaller nodes get smaller per-IP allowances, larger servers allow more
  - Maintain the existing IPv4 /32 and IPv6 /56 subnet-based approach
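One way the dynamic per-IP calculation could look: derive the allowance as a fixed fraction of the system connection limit, with a floor so small nodes still accept several peers behind one NAT. The 1/32 fraction and the floor of 8 are illustrative, not proposed final values:

```go
package main

import "fmt"

// perIPLimit derives a per-IP (or per-subnet) connection allowance from the
// node's total system connection limit, replacing the fixed 200-per-IP cap.
func perIPLimit(systemConnLimit int) int {
	limit := systemConnLimit / 32 // a single IP may use at most 1/32 of capacity
	if limit < 8 {
		limit = 8 // floor: small nodes still tolerate a few peers behind one NAT
	}
	return limit
}

func main() {
	fmt.Println(perIPLimit(512), perIPLimit(8192)) // 16 256
}
```

Because the system connection limit itself scales with memory, the per-IP allowance scales with it automatically: smaller nodes get smaller allowances and larger servers allow more, as described above.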
- Connection Manager (Soft Limits):
  - Add a connection manager that trims excess connections before hard limits are hit
  - Implement configurable grace periods to protect new connections
  - Use hysteresis to prevent rapid connection cycling under load
  - This prevents the abrupt failures that currently occur when limits are exceeded
- Bootnode Allowlisting:
  - Automatically exempt bootnode multiaddrs from rate limits
  - Ensure reliable bootstrap connectivity regardless of load conditions
  - Gracefully handle invalid bootnode addresses
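A sketch of the allowlisting step, assuming the bootnode IPs are extracted from their configured multiaddrs at startup. The hand-rolled string parsing here is for illustration only; a real implementation would use the go-multiaddr package. The key property is the last one in the list above: invalid entries are skipped rather than aborting startup:

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// bootnodeAllowlist extracts the literal IP from each bootnode multiaddr
// (e.g. "/ip4/1.2.3.4/tcp/1634/p2p/Qm...") so those IPs can be exempted
// from rate limiting. Entries without a parseable IP are skipped.
func bootnodeAllowlist(multiaddrs []string) []net.IP {
	var ips []net.IP
	for _, ma := range multiaddrs {
		parts := strings.Split(ma, "/")
		// Expect "/ip4/<addr>/..." or "/ip6/<addr>/...".
		if len(parts) < 3 || (parts[1] != "ip4" && parts[1] != "ip6") {
			continue // DNS multiaddrs, junk, etc.: skip gracefully
		}
		if ip := net.ParseIP(parts[2]); ip != nil {
			ips = append(ips, ip)
		}
	}
	return ips
}

func main() {
	addrs := []string{
		"/ip4/203.0.113.7/tcp/1634/p2p/QmExamplePeerID", // hypothetical bootnode
		"/dns4/bootnode.example.org/tcp/1634",           // no literal IP: skipped here
		"not-a-multiaddr",                               // invalid: skipped, no error
	}
	fmt.Println(bootnodeAllowlist(addrs)) // [203.0.113.7]
}
```

Resolving DNS multiaddrs to IPs before allowlisting is an open detail this sketch does not cover.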
- Private CIDR Support (new optional flag):
  - Add an --allow-private-cidrs flag to exempt private IP ranges from rate limiting
  - Enable local cluster deployments and development setups
  - Disabled by default for security on public nodes
- Encapsulated Resource Manager:
  - Extract resource-manager configuration into a separate module for maintainability
  - Centralize all resource-limit logic in one place
  - Prepare the codebase for future enhancements (e.g., per-protocol limits, left as a commented example)
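What the extracted module's surface could look like: one struct holding every tunable, one constructor deriving them from memory, so call sites never compute limits themselves. All names and values are illustrative sketches, not the proposed API:

```go
package main

import "fmt"

// limits centralizes every resource tunable in one place, so the rest of
// the codebase consumes derived values instead of scattered constants.
type limits struct {
	StreamsInbound  int
	StreamsOutbound int
	ConnsPerIP      int
	LowWater        int // connection-manager soft limits
	HighWater       int
	// Per-protocol limits could be added here later, e.g.:
	// ProtocolLimits map[protocol.ID]int
}

// newLimits is the single entry point: memory in, all limits out.
func newLimits(availableMemory uint64) limits {
	factor := int(availableMemory / (1 << 30))
	if factor < 1 {
		factor = 1
	}
	return limits{
		StreamsInbound:  500 * factor,
		StreamsOutbound: 1000 * factor,
		ConnsPerIP:      4 * factor,
		LowWater:        100 * factor,
		HighWater:       150 * factor,
	}
}

func main() {
	fmt.Printf("%+v\n", newLimits(4<<30))
}
```

Funneling every limit through one constructor is also what makes the later per-protocol enhancement cheap: it becomes one more field here rather than another scattered constant.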
Key Differences from Current Implementation:
- Replaces static stream limits (5000/10000) with adaptive scaling
- Replaces static per-IP limit (200) with dynamic calculation
- Adds connection manager for graceful load management
- Adds bootnode allowlisting for reliable bootstrap
- Adds optional private network support
- Consolidates resource manager logic into dedicated module
Drawbacks
- Configuration Complexity: Understanding how system resources map to connection limits requires more knowledge; operators need to be aware of the auto-scaling behavior.
- Testing Requirements: Validation across diverse hardware profiles and network conditions is needed to ensure the limits behave correctly in varied deployments.
- Tuning Uncertainty: The initial scaling factors are educated estimates; real-world deployments may reveal suboptimal values requiring adjustment.
- Private CIDR Security: If the --allow-private-cidrs flag is accidentally enabled on public nodes, it bypasses rate limits for entire private ranges; clear documentation and warnings are needed.
- Soft Limit Interactions: Connection-trimming behavior adds complexity and requires careful testing to ensure it doesn't cause unintended disconnections.
- Upgrade Impact: Nodes will enforce the new adaptive limits as they upgrade, potentially causing temporary connection fluctuations; network stability during the upgrade window should be monitored.
- Memory Calculation Variability: Auto-scaling based on detected system memory may be inaccurate for containerized deployments or NUMA systems, potentially requiring manual calibration.