Query Throttler

VTTablet runs a query throttler that protects tablets from being overloaded by incoming queries. Unlike the Tablet Throttler, which manages outgoing operations like VReplication and OnlineDDL, the query throttler manages incoming user queries to prevent database overload.

Why throttle incoming queries?

A tablet that receives more queries than it can serve becomes overloaded, which can cause:

  • Increased query latency: High query volume increases query execution times as the database struggles to process all requests.
  • Resource exhaustion: Too many concurrent queries can consume all available connections, memory, or CPU resources.
  • Cascading failures: An overloaded tablet can increase replication lag, which affects the entire shard and can lead to system-wide issues.
  • Degraded user experience: When tablets are overwhelmed, all users suffer from poor performance instead of just lower-priority workloads.

The query throttler monitors tablet health metrics and selectively rejects queries when the tablet is under stress. This keeps critical queries running with acceptable performance while temporarily rejecting lower-priority traffic.

How it works

The query throttler evaluates each incoming query before execution. When enabled, it:

  1. Checks the query's priority (if specified via the PRIORITY comment directive)
  2. Determines which throttling strategy to apply based on configuration
  3. Evaluates current system metrics (replication lag, load average, running threads, etc.)
  4. Makes a decision to allow or reject the query based on configured thresholds
  5. Returns a RESOURCE_EXHAUSTED error if the query should be throttled

The throttler adds minimal overhead in healthy conditions (typically less than 5% latency increase) through fast-path optimization and aggressive caching.

Architecture

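The query throttler uses a pluggable architecture. The logic that decides whether to throttle a query lives in a strategy, and the active strategy is selected through configuration.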

Strategies

The throttler supports different throttling strategies, which can be selected via configuration:

  • NoOp: The default strategy. Does not throttle any queries. This is a safe fallback that ensures queries are never blocked if configuration is missing or invalid.
  • TabletThrottler: A production-ready strategy that uses the existing tablet throttler's metrics to make throttling decisions. This strategy can be configured with detailed rules for different tablet types and SQL statement types.
  • Custom strategies: The architecture supports custom throttling strategies through a registry system, allowing you to implement your own logic.
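A strategy registry with a safe fallback might look like the following sketch. `Register`, `Select`, the `Strategy` interface, and the `blockSelects` example strategy are hypothetical, and the real registry in Vitess may use different names and signatures. The sketch illustrates the documented behavior that a missing or unknown strategy resolves to NoOp.

```go
package main

import (
	"fmt"
	"sync"
)

// Strategy is a hypothetical minimal interface for illustration.
type Strategy interface {
	Name() string
	ShouldThrottle(stmtType string) bool
}

// noOp never throttles; it is the safe fallback.
type noOp struct{}

func (noOp) Name() string               { return "NoOp" }
func (noOp) ShouldThrottle(string) bool { return false }

// Factory builds a strategy instance.
type Factory func() Strategy

var (
	mu       sync.RWMutex
	registry = map[string]Factory{}
)

// Register adds a named strategy factory to the registry.
func Register(name string, f Factory) {
	mu.Lock()
	defer mu.Unlock()
	registry[name] = f
}

// Select returns the named strategy, falling back to NoOp when the name is unknown.
func Select(name string) Strategy {
	mu.RLock()
	f, ok := registry[name]
	mu.RUnlock()
	if !ok {
		return noOp{}
	}
	return f()
}

// blockSelects is a hypothetical custom strategy that throttles every SELECT.
type blockSelects struct{}

func (blockSelects) Name() string                    { return "BlockSelects" }
func (blockSelects) ShouldThrottle(stmt string) bool { return stmt == "SELECT" }

func main() {
	Register("BlockSelects", func() Strategy { return blockSelects{} })
	fmt.Println(Select("BlockSelects").Name())
	fmt.Println(Select("Misspelled").Name()) // unknown name falls back to NoOp
}
```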

Configuration

The throttler loads configuration from a JSON file at /config/throttler-config.json by default. Configuration is refreshed periodically (default: every 1 minute) without requiring tablet restarts.

Basic configuration

A minimal JSON configuration enables the throttler with the NoOp strategy:

{
  "enabled": true,
  "strategy": "NoOp"
}

TabletThrottler strategy

When using the TabletThrottler strategy, you can define rules for different tablet types and SQL statement types:

{
  "enabled": true,
  "strategy": "TabletThrottler",
  "tablet_strategy_config": {
    "tablet_rules": {
      "PRIMARY": {
        "INSERT": {
          "lag": {
            "thresholds": [
              {"above": 10.0, "throttle": 25}
            ]
          }
        }
      }
    }
  }
}

This configuration:

  • Enables the throttler
  • Uses the TabletThrottler strategy
  • Applies a rule to PRIMARY tablets for INSERT statements
  • When replication lag exceeds 10 seconds, throttles 25% of INSERT queries

Advanced configuration

You can define multiple thresholds per metric for graduated throttling, and monitor several metrics for the same statement type:

{
  "enabled": true,
  "strategy": "TabletThrottler",
  "tablet_strategy_config": {
    "tablet_rules": {
      "PRIMARY": {
        "INSERT": {
          "lag": {
            "thresholds": [
              {"above": 5.0, "throttle": 10},
              {"above": 15.0, "throttle": 25},
              {"above": 30.0, "throttle": 50}
            ]
          },
          "threads_running": {
            "thresholds": [
              {"above": 50, "throttle": 15},
              {"above": 100, "throttle": 35}
            ]
          }
        },
        "UPDATE": {
          "lag": {
            "thresholds": [
              {"above": 10.0, "throttle": 20}
            ]
          }
        }
      },
      "REPLICA": {
        "SELECT": {
          "lag": {
            "thresholds": [
              {"above": 60.0, "throttle": 20}
            ]
          },
          "loadavg": {
            "thresholds": [
              {"above": 4.0, "throttle": 25},
              {"above": 8.0, "throttle": 50}
            ]
          }
        }
      }
    }
  }
}

This configuration:

  • Sets different rules for PRIMARY and REPLICA tablets
  • Uses graduated thresholds (higher metric values trigger more aggressive throttling)
  • Monitors multiple metrics simultaneously (lag, threads_running, loadavg)
  • Applies different rules for different SQL statement types (INSERT, UPDATE, SELECT)

Supported metrics

The TabletThrottler strategy can monitor the same metrics as the Tablet Throttler:

  • lag: Replication lag in seconds
  • threads_running: MySQL's Threads_running status value
  • loadavg: Load average per core on the tablet server
  • mysqld-loadavg: Load average per core on the MySQL server
  • custom: The result of a user-defined custom query
  • mysqld-datadir-used-ratio: Disk space usage (0.0 to 1.0)
  • history_list_length: InnoDB's history list length

Priority-based throttling

The query throttler supports priority-based query execution using the PRIORITY comment directive. Queries with priority 0 are never throttled, while lower-priority queries are checked against the throttling rules more often and are therefore throttled more aggressively.

How priority works

Priority is specified as a value from 0 to 100, with 0 being the highest priority and 100 the lowest. The value determines how likely the query is to be checked against the throttling rules; whether a checked query is actually rejected depends on the current configuration and system state:

  • Priority 0: Never throttled. Reserved for the most critical queries.
  • Priority 1-99: Probabilistically checked. Higher values are checked more often, so they are more likely to be throttled.
  • Priority 100: Always evaluated for potential throttling.

If no priority is specified, queries default to priority 100.

Using priority in queries

Specify priority using the PRIORITY comment directive:

SELECT /*vt+ PRIORITY=0 */ * FROM critical_table;
SELECT /*vt+ PRIORITY=50 */ * FROM normal_table;
SELECT /*vt+ PRIORITY=100 */ * FROM batch_table;
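Extracting the directive from a query comment can be sketched with a regular expression. This is a simplification: vttablet parses comments with the Vitess SQL parser rather than a regex, and `parseDirectives` and `priorityOf` are hypothetical helpers. A missing directive defaults to 100, as documented. Returning an error for a non-numeric or out-of-range value is an assumption. The same helper also picks up other directives such as WORKLOAD_NAME.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// vtCommentRe matches /*vt+ ... */ comments and captures their body.
var vtCommentRe = regexp.MustCompile(`/\*vt\+\s+(.*?)\*/`)

// parseDirectives returns the KEY=VALUE pairs found in vt+ comments.
func parseDirectives(query string) map[string]string {
	out := map[string]string{}
	for _, m := range vtCommentRe.FindAllStringSubmatch(query, -1) {
		for _, kv := range strings.Fields(m[1]) {
			if k, v, ok := strings.Cut(kv, "="); ok {
				out[strings.ToUpper(k)] = v
			}
		}
	}
	return out
}

const defaultPriority = 100

// priorityOf returns the PRIORITY directive, or 100 when it is absent.
func priorityOf(query string) (int, error) {
	v, ok := parseDirectives(query)["PRIORITY"]
	if !ok {
		return defaultPriority, nil
	}
	p, err := strconv.Atoi(v)
	if err != nil || p < 0 || p > 100 {
		return 0, fmt.Errorf("invalid PRIORITY %q: must be an integer from 0 to 100", v)
	}
	return p, nil
}

func main() {
	for _, q := range []string{
		"SELECT /*vt+ PRIORITY=0 */ * FROM critical_table",
		"SELECT /*vt+ PRIORITY=50 */ * FROM normal_table",
		"SELECT * FROM batch_table",
	} {
		p, err := priorityOf(q)
		fmt.Println(p, err)
	}
}
```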

Priority evaluation

The throttler uses probabilistic priority checking:

  1. Generate a random integer between 0 and 99 (inclusive)
  2. If the random number is less than the query's priority, evaluate throttling rules
  3. If the random number is greater than or equal to the priority, allow the query without checking metrics

This means:

  • Priority 0 queries always skip throttling (random 0-99 is never < 0)
  • Priority 50 queries are checked 50% of the time
  • Priority 100 queries are always checked

Workload classification

The query throttler can track metrics by workload using the WORKLOAD_NAME comment directive. This lets you monitor which workloads are being throttled most frequently.

SELECT /*vt+ WORKLOAD_NAME=analytics */ * FROM large_table;

When combined with the --enable-per-workload-table-metrics flag on vttablet, you can track throttling behavior per workload in the QueryThrottlerRequests and QueryThrottlerThrottled metrics.

Monitoring

The query throttler emits several metrics:

  • QueryThrottlerRequests: Total number of queries evaluated by the throttler
  • QueryThrottlerThrottled: Number of queries that were throttled
  • QueryThrottlerTotalLatencyNs: Total latency added by throttler evaluation
  • QueryThrottlerEvaluateLatencyNs: Latency of the throttling decision evaluation

These metrics include labels for:

  • strategy: The throttling strategy used (NoOp, TabletThrottler)
  • workload: The workload name (if specified via WORKLOAD_NAME directive)
  • priority: The query priority (if specified via PRIORITY directive)
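The labels turn the request and throttled counters into a per-strategy, per-workload, per-priority breakdown. The sketch below models this with an in-memory map keyed by the label tuple. It is a hypothetical stand-in, not how vttablet exports metrics. `throttlerMetrics`, `record`, and `throttleRatio` are illustrative names.

```go
package main

import (
	"fmt"
	"sync"
)

// labels mirrors the documented label set; empty strings mean "not specified".
type labels struct {
	Strategy string
	Workload string
	Priority string
}

// throttlerMetrics is a hypothetical in-process stand-in for the exported
// QueryThrottlerRequests and QueryThrottlerThrottled counters.
type throttlerMetrics struct {
	mu        sync.Mutex
	requests  map[labels]int64
	throttled map[labels]int64
}

func newThrottlerMetrics() *throttlerMetrics {
	return &throttlerMetrics{requests: map[labels]int64{}, throttled: map[labels]int64{}}
}

// record counts one evaluated query and, if it was rejected, one throttled query.
func (m *throttlerMetrics) record(l labels, wasThrottled bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.requests[l]++
	if wasThrottled {
		m.throttled[l]++
	}
}

// throttleRatio returns throttled/requests for one label set (0 if unseen).
func (m *throttlerMetrics) throttleRatio(l labels) float64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.requests[l] == 0 {
		return 0
	}
	return float64(m.throttled[l]) / float64(m.requests[l])
}

func main() {
	m := newThrottlerMetrics()
	analytics := labels{Strategy: "TabletThrottler", Workload: "analytics", Priority: "100"}
	for i := 0; i < 4; i++ {
		m.record(analytics, i == 0) // one of four queries throttled
	}
	fmt.Println(m.throttleRatio(analytics)) // 0.25
}
```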

See Query Serving Metrics for details.

Error messages

When a query is throttled, the query throttler returns a RESOURCE_EXHAUSTED error with details about why the query was rejected:

vttablet: rpc error: code = ResourceExhausted desc = [VTTabletThrottler] Query throttled: metric=lag value=15.23 breached threshold=10.00 throttle=25%

The error message includes:

  • The metric that triggered throttling
  • The current metric value
  • The configured threshold
  • The throttle percentage applied

Differences from tablet throttler

The query throttler differs from the Tablet Throttler in several ways:

| Feature          | Tablet Throttler                                        | Query Throttler                                        |
|------------------|---------------------------------------------------------|--------------------------------------------------------|
| Purpose          | Throttles outgoing operations (VReplication, OnlineDDL) | Throttles incoming user queries                        |
| What it protects | Prevents background jobs from overloading the database  | Prevents user queries from overloading the database    |
| Default behavior | Enabled by default                                      | Disabled by default (NoOp strategy)                    |
| Strategies       | Single strategy                                         | Pluggable strategies (NoOp, TabletThrottler, or custom) |

Both throttlers can monitor the same set of metrics and can coexist in the same cluster.

Performance considerations

The query throttler adds minimal overhead:

  • Healthy systems: Less than 5% latency increase due to fast-path optimization
  • Cache hit rate: Greater than 95% in normal operations, reducing the need for metric collection
  • Under load: Graduated throttling (for example, the 10-50% throttle rates in the advanced configuration) prevents complete overload while still allowing some queries through
  • Priority 0 queries: Negligible overhead, because they bypass all throttling checks without evaluating any metrics

Best practices

  1. Start with NoOp: Begin with the NoOp strategy to verify that configuration loading and metrics work without impacting traffic
  2. Use priority carefully: Reserve priority 0 for truly critical queries only
  3. Set graduated thresholds: Use multiple threshold levels to gradually increase throttling as metrics worsen
  4. Monitor metrics: Watch the QueryThrottlerRequests and QueryThrottlerThrottled metrics to understand throttling behavior
  5. Test in development: Test throttling configurations in non-production environments first
  6. Combine with workload names: Use WORKLOAD_NAME to track which workloads are being throttled most
  7. Adjust thresholds: Start with conservative thresholds and adjust based on observed behavior

Flags

The following vttablet flag configures query throttler behavior:

  • --query-throttler-config-refresh-interval: How frequently to refresh configuration (default: 1 minute)

See also