Concurrency Caps by Target Type: How to Scale Without Triggering Blocks

If you scale traffic the same way to every site, you’ll get blocked. The safest way to grow throughput is to set concurrency caps by target type—practical limits on simultaneous requests per site, per IP, and per session that reflect how that target detects abuse. The short answer: start conservative, model your rate budget, adapt in real time based on error signals, and scale horizontally with more identities (IPs, sessions) instead of cranking threads on a single identity.

This article shows you how to pick safe per-target caps, model your capacity with simple math, and implement adaptive throttling. You’ll also get starting points by vertical (SERPs, e‑commerce, APIs, social), a checklist, and troubleshooting tips to avoid blocks while maintaining speed.

Why blocks happen (and how to spot them early)

Most targets don’t just look at raw request volume. They correlate multiple signals, and the way they respond (the last two items below) is your early warning:

  • Rate spikes: bursts of parallel requests or sudden RPS (requests per second) jumps.
  • Identity concentration: too many hits from one IP, ASN, device fingerprint, or account.
  • Behavioral anomalies: zero dwell time, uniform paths, headless browser traits, no assets.
  • Error and challenge responses: HTTP 429 (Too Many Requests), 403 (Forbidden), 503/520, TCP RST, JS challenges, or CAPTCHAs.
  • Latency inflation: server-side throttling quietly increases response time as an early warning.

Direct answer: If your per-IP concurrency or burst RPS exceeds a target’s tolerance, you’ll trigger one or more of the responses above. The fix is to cap concurrent requests by target type, smooth bursts, and spread load across more identities.

A practical framework: rate budget and the concurrency ladder

Let’s define terms:

  • Concurrency: how many requests are in flight at the same time for a given identity (IP/session).
  • Throughput (RPS): completed requests per second.
  • Latency (RTT): average time a request spends in flight.
  • Rate budget: the safe throughput a target allows per identity before triggering defenses.

Use Little’s Law to reason about safe concurrency:

  • Concurrency ≈ Throughput × Latency

If a safe per-IP throughput for a target is ~0.5 RPS and median latency is 1.5 s, then safe concurrency per IP is ~0.75. You can’t run a fraction of a request, so cap it at 1 concurrent request and let pacing enforce the 0.5 RPS budget. If you need 50 RPS, you’ll scale horizontally with ~100 IPs (50 ÷ 0.5) instead of running 50 threads on one IP.
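
The arithmetic above can be written as two small helpers. This is a minimal sketch; the function names are illustrative, and flooring the result at 1 is an assumption that mirrors the advice to keep one request in flight per IP and scale out.

```python
import math

def safe_concurrency(safe_rps: float, median_latency_s: float) -> int:
    """Little's Law: requests in flight ~= throughput x latency, never below 1."""
    raw = safe_rps * median_latency_s
    # Below 1, a single in-flight request is still needed, so pacing
    # (not parallelism) has to enforce the RPS budget.
    return max(1, math.floor(raw))

def required_ips(desired_rps: float, safe_rps_per_ip: float) -> int:
    """Identities needed to reach desired_rps without exceeding the per-IP budget."""
    # round() guards against float artifacts before taking the ceiling.
    return math.ceil(round(desired_rps / safe_rps_per_ip, 9))

print(safe_concurrency(0.5, 1.5))  # 0.75 in flight on average, capped at 1
print(required_ips(50, 0.5))       # 50 / 0.5 = 100 IPs
```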

The Concurrency Ladder (how to scale safely):

  1. Start with 1 concurrent request per IP per target.
  2. Increase gradually (e.g., +1 every 5–10 minutes) while monitoring 429/403 rates, latency, and challenge frequency.
  3. When error or challenge rate exceeds a threshold (e.g., >1–2% sustained), step down and widen the identity pool.
  4. Never increase concurrency and per-IP RPS at the same time; change one variable at a time.

Token bucket for burst control:

  • Assign each IP a small burst allowance (tokens) and a refill rate that matches safe RPS.
  • This prevents short spikes that often trip defenses even if your long-term average looks safe.
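
A token bucket along these lines might look like the sketch below. The class and parameter names are illustrative, and the injectable clock is an assumption added so the behavior can be checked without waiting in real time.

```python
import time

class TokenBucket:
    """Per-identity bucket: `capacity` bounds bursts, `refill_rate` is the steady RPS."""

    def __init__(self, refill_rate: float, capacity: float, clock=time.monotonic):
        self.refill_rate = refill_rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity      # start full, so one small burst is allowed
        self._last = clock()

    def try_acquire(self) -> bool:
        """Spend one token if available; return False when the caller should wait."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False

# Deterministic demo with a fake clock: 0.5 RPS steady, burst of 2.
fake_now = [0.0]
bucket = TokenBucket(refill_rate=0.5, capacity=2, clock=lambda: fake_now[0])
burst = [bucket.try_acquire() for _ in range(3)]  # third request exceeds the burst
fake_now[0] += 2.0                                # 2 s at 0.5 tokens/s refills one token
after_wait = bucket.try_acquire()
print(burst, after_wait)
```

In a real scheduler you would keep one bucket per IP and per target, and sleep or requeue when `try_acquire()` returns False.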

Build a per-target profile before you scale

Not all targets are equal. Profile each one on a small sample first:

  • Identity sensitivity: Does the site tie behavior to IP, cookie, account, or TLS/JA3 fingerprint?
  • Surface type: Static HTML/CDN, dynamic SPA, API with explicit rate limits, search results, member-only pages.
  • Latency spread: Median and p95 RTT to compute safe concurrency accurately.
  • Allowed behavior: robots.txt, published API limits, or ToS (follow site rules and your legal counsel’s guidance).
  • Error thresholds: At what concurrency do you see 429/403/CAPTCHAs? Capture the exact point and back off 30–50%.

Document these in a per-target config so your scheduler doesn’t treat all domains the same.
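
One way to hold such a profile is a small record per target. The sketch below is an assumption about shape, not a prescribed schema: the field names, the `example.com` domain, and the default 50% back-off (the conservative end of the 30–50% range) are all illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TargetProfile:
    domain: str
    surface_type: str          # e.g. "serp", "static", "api"
    identity_keys: tuple       # what the site correlates on, e.g. ("ip", "cookie")
    median_rtt_s: float
    p95_rtt_s: float
    block_concurrency: int     # per-IP concurrency where 429/403/CAPTCHAs first appeared
    backoff_fraction: float = 0.5  # back off 30-50% from that point

    def concurrency_cap(self) -> int:
        """Per-IP cap: the observed blocking point reduced by the back-off, at least 1."""
        cap = int(self.block_concurrency * (1.0 - self.backoff_fraction))
        return max(1, cap)

profile = TargetProfile(
    domain="example.com",      # placeholder domain
    surface_type="static",
    identity_keys=("ip",),
    median_rtt_s=0.8,
    p95_rtt_s=1.4,
    block_concurrency=6,
)
print(profile.concurrency_cap())  # 6 reduced by 50%
```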

These are conservative starting points for per-IP concurrency and per-IP steady RPS. Tune up or down based on your profile and legal constraints. “Burst” is short spikes (≤10 s).

| Target type | Per-IP concurrency (start) | Per-IP steady RPS | Burst notes |
|---|---|---|---|
| Search engines (SERPs) | 1 | 0.2–0.3 | Long delays between requests; vary params; expect aggressive defenses |
| E‑commerce product pages | 1–2 | 0.3–0.5 | Spread across categories; avoid cart/checkout URLs |
| Price/stock JSON APIs | 1 | Respect published limits | Use API keys if available; enforce strict token bucket |
| Social profile pages | 1 | 0.1–0.2 | Heavy identity correlation; prefer session-bound crawling |
| Classifieds/directories | 1–2 | 0.3–0.6 | Sensitive to patterns; randomize paths and times |
| News/blogs on CDN | 2–4 | 0.7–1.5 | CDNs are tolerant but watch for WAF rules and geo anomalies |
| Documentation/static sites | 2–4 | 1.0–2.0 | Cache-friendly; still smooth bursts |
| Internal or partner APIs | As per contract | As per contract | Follow SLA/rate headers strictly |

Guideline: When in doubt, start at 1 concurrent request per IP and ramp slowly.
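
If your scheduler reads caps from code, the starting points above could be encoded as a lookup with a conservative fallback. In this sketch the keys are hypothetical labels, each entry takes the low end of the range from the table, and the 0.1 RPS fallback for unknown targets is an assumption. Internal or partner APIs are left out because their caps come from the contract.

```python
from typing import NamedTuple, Optional

class StartingCap(NamedTuple):
    concurrency: int              # per-IP concurrent requests to start with
    steady_rps: Optional[float]   # per-IP steady RPS; None means "use the published limit"

STARTING_CAPS = {
    "serp": StartingCap(1, 0.2),
    "ecommerce_product": StartingCap(1, 0.3),
    "price_stock_api": StartingCap(1, None),
    "social_profile": StartingCap(1, 0.1),
    "classifieds": StartingCap(1, 0.3),
    "news_cdn": StartingCap(2, 0.7),
    "docs_static": StartingCap(2, 1.0),
}

# "When in doubt": 1 concurrent request per IP. The RPS value is an assumed floor.
DEFAULT_CAP = StartingCap(1, 0.1)

def starting_cap(target_type: str) -> StartingCap:
    """Return the starting cap for a target type, or the conservative default."""
    return STARTING_CAPS.get(target_type, DEFAULT_CAP)

print(starting_cap("news_cdn"))
print(starting_cap("something_new"))
```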

Session and identity strategy: don’t overload a single identity

Most blocking correlates across these identity layers:

  • IP address and ASN (autonomous system).
  • Cookie/session or login account.
  • User agent, viewport, and browser features.
  • TLS and HTTP/2 fingerprinting.

Best practices:

  • Bind session cookies to a single IP. Mixing many IPs per session or many sessions per IP raises flags.
  • Keep per-session concurrency at 1 unless the target is tolerant (e.g., static CDN content).
  • Vary user agents and minor request headers across identities, but keep them fixed within each identity rather than changing them per request.
  • Respect geographic expectations: some content is geo-scoped. Use locations that match the target’s audience.

If you need more throughput, scale with more identities rather than raising threads per identity.
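
The binding rules above can be sketched as one object per identity that owns its IP, cookie jar, user agent, and concurrency slots. Everything here is illustrative: the `Identity` class, the placeholder user-agent labels, and the `fake_fetch` stand-in are assumptions, and a real client would pass the IP and cookies to its HTTP library.

```python
import asyncio
import hashlib

# Placeholder labels, not real user-agent strings.
USER_AGENTS = ("ua-profile-a", "ua-profile-b", "ua-profile-c")

class Identity:
    """One IP bound to one cookie jar and one stable user agent."""

    def __init__(self, proxy_ip: str, max_in_flight: int = 1):
        self.proxy_ip = proxy_ip
        self.cookies = {}  # cookie jar used only with this IP
        # Derive the user agent from the IP so it stays fixed per identity.
        digest = hashlib.sha256(proxy_ip.encode()).digest()
        self.user_agent = USER_AGENTS[digest[0] % len(USER_AGENTS)]
        self._slots = asyncio.Semaphore(max_in_flight)

    async def run(self, fetch):
        """Run fetch(identity) while holding one of this identity's slots."""
        async with self._slots:
            return await fetch(self)

async def demo():
    identity = Identity("203.0.113.10")  # documentation-range address
    in_flight = 0
    peak = 0

    async def fake_fetch(ident):
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0)  # stand-in for network I/O
        in_flight -= 1
        return ident.user_agent

    agents = await asyncio.gather(*(identity.run(fake_fetch) for _ in range(5)))
    return peak, agents

peak, agents = asyncio.run(demo())
print(peak, set(agents))  # five queued requests, never more than one in flight
```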

Proxy planning: capacity math and safe scaling

Dedicated private proxies give you stable identities you control. Plan pool size using your rate budget per target:

  • Required IPs ≈ Desired RPS per target ÷ Safe per-IP RPS
  • Then size per-IP concurrency from latency: Safe concurrency per IP ≈ Safe per-IP RPS × Median RTT

Example: You need 20 RPS on product pages. Starting safe per-IP RPS is 0.4. You’ll want ~50 IPs (20 ÷ 0.4). With median RTT 1.2 s, per-IP concurrency ~0.48 (keep it at 1 concurrent request per IP and scale out).

Where to start with proxies:

  • Use dedicated private proxies you control, ideally with location choice to match target geography. See InstantProxies’ private proxies and available locations.
  • Warm up new IPs gradually: begin at 0.05–0.1 RPS and increase over hours/days.
  • Avoid sudden pool-wide bursts when you add capacity; ramp each IP individually.
  • Keep health stats per IP: success rate, average RTT, challenge rate, and cool‑off timers.
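
A warm-up schedule can be as simple as a ramp keyed on how long an IP has been in service. The sketch below assumes a linear ramp over 24 hours starting at 0.05 RPS; the linear shape, the duration, and the function name are assumptions to tune per target.

```python
def warmup_rps(age_hours: float, target_rps: float,
               start_rps: float = 0.05, warmup_hours: float = 24.0) -> float:
    """Allowed per-IP RPS for an IP that has been in service for age_hours."""
    start = min(start_rps, target_rps)   # never start above the target
    if age_hours >= warmup_hours:
        return target_rps
    progress = max(0.0, age_hours) / warmup_hours
    return start + (target_rps - start) * progress

for hours in (0, 6, 12, 24):
    print(hours, round(warmup_rps(hours, target_rps=0.4), 4))
```

Feeding this value into each IP's token bucket refill rate keeps new capacity from arriving as a pool-wide burst.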

If you’re budgeting, InstantProxies’ pricing makes it straightforward to estimate cost per added identity. Even with conservative per-IP caps, scaling horizontally is usually safer and cheaper than fighting WAFs with complexity.

Adaptive throttling: let the target tell you the limit

Hard-coded caps break when conditions change. Implement adaptive control:

  • Signals to watch per target and per identity:
    • 429/403 rates over sliding windows (e.g., 2–5 minutes).
    • Median and p95 latency drift (a slowdown often precedes explicit blocks).
    • Challenge/captcha frequency.
  • Actions:
    • Additive increase, multiplicative decrease (AIMD): slowly raise concurrency by small steps; on errors, cut by 30–50%.
    • Token bucket refill tuning: decrease refill rate when signals degrade; restore slowly as they improve.
    • Cool‑off timers: park an identity after consecutive 429/403s for 15–60 minutes.
    • Path spreading: diversify URLs and time-of-day to avoid hot spots.
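
To act on these signals you first need them per identity over a sliding window. Here is a minimal sketch of a 429/403 rate tracker; the class name and the 3-minute default window are assumptions, and timestamps are passed in explicitly so the logic is easy to test.

```python
from collections import deque

class SlidingErrorRate:
    """Share of 429/403 responses among requests seen in the last window_s seconds."""

    def __init__(self, window_s: float = 180.0):
        self.window_s = window_s
        self._events = deque()  # (timestamp, is_block_response)

    def record(self, timestamp: float, status: int) -> None:
        self._events.append((timestamp, status in (403, 429)))

    def rate(self, now: float) -> float:
        # Drop events that have aged out of the window.
        while self._events and self._events[0][0] <= now - self.window_s:
            self._events.popleft()
        if not self._events:
            return 0.0
        blocked = sum(1 for _, is_block in self._events if is_block)
        return blocked / len(self._events)

window = SlidingErrorRate(window_s=180)
for t in range(100):
    window.record(float(t), 429 if t % 50 == 0 else 200)
rate_now = window.rate(now=100.0)     # 2 blocks out of 100 requests
rate_later = window.rate(now=1000.0)  # everything has aged out
print(rate_now, rate_later)
```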

Pseudo-flow:

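for each (target, identity), once per evaluation window of N minutes:
  if error_rate > 1% or p95_latency > threshold:
    concurrency = max(1, floor(concurrency * 0.7))   # multiplicative decrease, never below 1
    refill_rate *= 0.8
    set_cooloff_if_persistent()
  else if success_rate_stable:
    concurrency += 1                                  # additive increase, at most one step per window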
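
A runnable version of this flow might look like the following sketch. It assumes the caller invokes `adjust` once per evaluation window, and the latency threshold, concurrency ceiling, persistence count, and 15-minute cool-off are illustrative defaults, not recommendations from a specific target. Restoring the refill rate after recovery is left out for brevity.

```python
import math
from dataclasses import dataclass

@dataclass
class IdentityState:
    concurrency: int = 1
    refill_rate: float = 0.4     # token bucket refill, in requests per second
    bad_windows: int = 0         # consecutive windows with degraded signals
    parked_until: float = 0.0    # identity is cooling off until this timestamp

def adjust(state: IdentityState, error_rate: float, p95_latency_s: float, now: float,
           *, latency_threshold_s: float = 3.0, max_concurrency: int = 4,
           persist_windows: int = 3, cooloff_s: float = 900.0) -> IdentityState:
    """One AIMD step: cut on degraded signals, otherwise add one slot."""
    if error_rate > 0.01 or p95_latency_s > latency_threshold_s:
        # Multiplicative decrease, never below one in-flight request.
        state.concurrency = max(1, math.floor(state.concurrency * 0.7))
        state.refill_rate *= 0.8
        state.bad_windows += 1
        if state.bad_windows >= persist_windows:
            state.parked_until = now + cooloff_s   # persistent trouble: park the identity
    else:
        state.bad_windows = 0
        # Additive increase, bounded by the per-target ceiling.
        state.concurrency = min(max_concurrency, state.concurrency + 1)
    return state

state = IdentityState(concurrency=4, refill_rate=0.5)
adjust(state, error_rate=0.05, p95_latency_s=1.0, now=0.0)
print(state.concurrency, round(state.refill_rate, 2))
```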

Implementation checklist

  • Define target profiles: type, geo, auth status, robots/API limits.
  • Establish initial caps: per-IP concurrency = 1; per-IP RPS from the table above.
  • Measure latency: compute median and p95 RTT to inform caps via Little’s Law.
  • Add burst smoothing: token bucket with small bursts and steady refill.
  • Separate identities: bind session, headers, and cookie jar to one IP.
  • Warm up: ramp each new IP’s RPS gradually.
  • Monitor: 2xx/3xx, 4xx/5xx, challenge counts, RTT, retry rates.
  • Backoff: AIMD with cool‑off windows for unhappy identities.
  • Rotate work: distribute domains evenly across the proxy pool.
  • Log and tune: store per-target cap decisions and revisit weekly.

Troubleshooting: signals your caps are too high (or too low)

Too high:

  • Rising 429/403s or intermittent 5xx after increases.
  • CAPTCHAs appear suddenly or much more often.
  • RTT increases 2–3× with no network change.
  • Short runs succeed, long runs degrade.

Fixes: Decrease concurrency 30–50%, add IPs, lengthen delays, increase randomness, and re‑profile.

Too low:

  • CPU/worker idle while queues grow; consistent 200s with room to increase.
  • RTT stable, zero challenges for hours.

Fixes: Increase concurrency by small steps and re-measure.

Mini-scenarios to make it concrete

Scenario 1: Tracking search visibility on SERPs

  • Goal: 2 RPS sustained, 10k queries/day.
  • Start: 1 concurrent per IP, 0.2 RPS per IP, random user agents, long think time (5–15 s) between queries.
  • Capacity plan: Need ~10 IPs to reach 2 RPS. Expect aggressive defenses—watch for JS challenges.
  • Adaptation: If challenge rate >0.5%, slow to 0.15 RPS/IP and grow the pool to ~14–15 IPs so total throughput stays at 2 RPS.

Scenario 2: Price checks on e‑commerce product pages

  • Goal: 15 RPS across many categories.
  • Start: 1–2 concurrent per IP, 0.4 RPS/IP, randomized paths. Avoid cart/account flows.
  • Capacity plan: ~38–40 IPs. Stagger crawl over 24 hours to avoid time-of-day spikes.
  • Adaptation: If p95 latency doubles, cut concurrency to 1/IP and add 10 more IPs.

Scenario 3: Polling a documented JSON API

  • Goal: Respect 60 requests/min per key.
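  • Start: Size concurrency by RTT. The limit of 60 requests/min is 1 RPS; taking 0.5 RPS (half of that) as the working rate at 500 ms RTT gives 0.5 RPS × 0.5 s = 0.25 requests in flight, so cap concurrency at 1 per key and pace requests to stay inside the explicit 60/min window.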
  • Capacity plan: Scale with more keys/accounts if permitted by the provider and ToS; otherwise queue.
  • Adaptation: Parse rate-limit headers; back off precisely when the server instructs you to.
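
For the header parsing step, the one header you can rely on being standardized is `Retry-After`, which carries either a number of seconds or an HTTP date. The sketch below handles only that header; provider-specific headers such as remaining-quota counters vary by API and are not covered. The function name is illustrative.

```python
import time
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional

def retry_after_seconds(headers: Mapping[str, str],
                        now: Optional[float] = None) -> Optional[float]:
    """Seconds to wait according to Retry-After, or None if absent or unparseable."""
    lowered = {key.lower(): value for key, value in headers.items()}
    value = lowered.get("retry-after")
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():                  # delta-seconds form, e.g. "120"
        return float(value)
    try:
        when = parsedate_to_datetime(value)   # HTTP-date form
    except (TypeError, ValueError):
        return None
    now = time.time() if now is None else now
    return max(0.0, when.timestamp() - now)

print(retry_after_seconds({"Retry-After": "120"}))
print(retry_after_seconds({"Content-Type": "application/json"}))
```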

Scenario 4: Crawling news on a CDN-backed site

  • Goal: 50 RPS to fetch headlines and article pages.
  • Start: 2–3 concurrent/IP, 1.0 RPS/IP. Avoid hammering sitemaps; spread across sections.
  • Capacity plan: ~50 IPs. CDNs tolerate more, but WAF rules can still trip on bursts.
  • Adaptation: If 403s rise, drop to 0.6 RPS/IP and widen the pool.

Decision criteria: when to scale up threads vs. add IPs

  • Scale threads (increase per-IP concurrency) if:
    • Target is tolerant (static/CDN) and your 4xx rate is near zero for days.
    • Median RTT is high, but error rates are flat and burst smoothing is in place.
  • Add IPs (horizontal scale) if:
    • Any 429/403s appear as you increase threads.
    • You see CAPTCHAs or JS challenges.
    • The target binds behavior tightly to IP or session.

Heuristic: If you double threads and errors rise, revert and add 50–100% more IPs instead.
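
These criteria reduce to a small decision function. This is a sketch with hypothetical names; the "hold" outcome for targets that are neither clearly tolerant nor showing trouble is an assumption, since the criteria above do not cover that case.

```python
import math

def scaling_action(*, tolerant_target: bool, blocks_seen: bool,
                   challenges_seen: bool, identity_bound: bool) -> str:
    """Return "add_ips", "scale_threads", or "hold"."""
    # Any block, challenge, or tight identity binding means scale horizontally.
    if blocks_seen or challenges_seen or identity_bound:
        return "add_ips"
    if tolerant_target:
        return "scale_threads"
    return "hold"

def pool_after_failed_doubling(current_ips: int, growth: float = 0.5) -> int:
    """Pool size after reverting a thread increase: add 50-100% more IPs (growth 0.5 to 1.0)."""
    return current_ips + math.ceil(current_ips * growth)

print(scaling_action(tolerant_target=True, blocks_seen=False,
                     challenges_seen=False, identity_bound=False))
print(pool_after_failed_doubling(40))
```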

Measuring success: a simple scorecard

Track these per target and per identity:

  • Success rate: target ≥98% 2xx/3xx sustained.
  • Challenge rate: target ≤0.2% for long-running jobs.
  • Latency stability: p95 within 1.5–2× of median.
  • Throughput stability: sustained RPS with <10% variance hour to hour.

If you hold these for a week while raising output, your concurrency caps are working.
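
The scorecard can be checked mechanically. In the sketch below the field names are illustrative, the latency check uses the outer 2× bound, and "variance hour to hour" is interpreted as the largest deviation of any hourly RPS from the mean, which is an assumption about what the 10% figure measures.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class WindowStats:
    success_rate: float      # fraction of 2xx/3xx responses
    challenge_rate: float    # fraction of responses that were challenges
    median_rtt_s: float
    p95_rtt_s: float
    hourly_rps: List[float]  # sustained RPS, one value per hour

def scorecard(stats: WindowStats) -> Dict[str, bool]:
    """Pass/fail per scorecard line for one target or identity."""
    mean_rps = sum(stats.hourly_rps) / len(stats.hourly_rps)
    worst_deviation = max(abs(r - mean_rps) for r in stats.hourly_rps) / mean_rps
    return {
        "success_rate": stats.success_rate >= 0.98,
        "challenge_rate": stats.challenge_rate <= 0.002,
        "latency_stability": stats.p95_rtt_s <= 2.0 * stats.median_rtt_s,
        "throughput_stability": worst_deviation < 0.10,
    }

healthy = WindowStats(0.99, 0.001, 1.0, 1.8, [10.0, 10.5, 9.8])
print(scorecard(healthy))
```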

Frequently Asked Questions

  • What’s the difference between concurrency and rate limit?
    • Concurrency is the number of requests in flight at once; a rate limit caps how many requests you may make in a time window. High concurrency with high latency can still mean low RPS.

  • How many threads per proxy is safe?
    • For sensitive targets (SERPs, social), 1 thread/IP. For tolerant targets (static/CDN), 2–4 threads/IP. Always validate with measurements.

  • Do headless browsers change the caps?
    • Usually yes: they’re heavier and more fingerprintable. Keep per-IP concurrency at 1 for headless unless you’ve verified tolerance.

  • Can I just randomize delays and skip proxies?
    • Randomization helps, but identity concentration remains a risk. Dedicated private proxies let you spread load safely and predictably.

  • How do geo and ASN affect blocking?
    • Some sites expect traffic from specific regions or network types. Matching geography using proxies in relevant locations can reduce false positives.

Key takeaways

  • Set concurrency caps by target type, not one-size-fits-all.
  • Use Little’s Law and a token bucket to model safe per-IP concurrency and burst behavior.
  • Scale horizontally with more identities (IPs/sessions) before raising per-IP threads.
  • Implement adaptive throttling—watch 429/403s, latency, and challenges to steer your caps in real time.
  • Start conservative: 1 concurrent per IP is often the right beginning; prove tolerance before increasing.

When you’re ready to scale out, a stable pool of dedicated private proxies with the right geographies makes it straightforward to keep concurrency per identity low while total throughput climbs. If you need predictable capacity, review InstantProxies’ private proxies, available locations, and straightforward pricing as part of your plan.