Rate limiting caps the number of requests an API or service accepts within a defined time window. Rules like "60 requests per minute" or "1,000 requests per day" are enforced, and requests exceeding the limit receive an HTTP 429 (Too Many Requests) response.
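As a minimal sketch of such enforcement, a per-client counter can track requests in the current window and signal when the caller should respond with HTTP 429. The `FixedWindowLimiter` class and its names below are illustrative, not taken from any particular library:

```python
import time

# Illustrative fixed-window limiter: "limit requests per window_seconds" per client.
class FixedWindowLimiter:
    def __init__(self, limit=60, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # client_id -> (window_start, count)

    def allow(self, client_id):
        now = time.time()
        start, count = self.counts.get(client_id, (now, 0))
        if now - start >= self.window:   # window expired: reset the counter
            start, count = now, 0
        if count >= self.limit:          # over the limit: caller should send HTTP 429
            return False
        self.counts[client_id] = (start, count + 1)
        return True

limiter = FixedWindowLimiter(limit=3, window_seconds=60)
results = [limiter.allow("alice") for _ in range(4)]
print(results)  # -> [True, True, True, False]
```

In production the counter would live in shared storage (e.g. Redis) rather than in-process memory, so all API servers see the same counts.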
Rate limiting serves three purposes: server protection (preventing service outages from request floods), fairness (stopping individual users from monopolizing resources), and cost management (controlling pay-per-use cloud billing).
URL shortening services apply rate limiting in several areas: URL shortening API request limits (preventing spam-like mass generation), access limits on shortened URLs (mitigating DDoS attacks), and login attempt limits on dashboards (blocking brute-force attacks).
Four main rate limiting algorithms exist. Fixed window resets a counter at regular intervals. Sliding window continuously calculates requests within the most recent time frame. Token bucket replenishes tokens at a steady rate, consuming one per request. Leaky bucket processes requests at a constant rate, queuing any excess.
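Of these, the token bucket is compact enough to sketch in a few lines. The `TokenBucket` class below is an illustrative, single-process sketch, not a production implementation:

```python
import time

# Token-bucket sketch: tokens refill at `rate` per second up to `capacity`;
# each request consumes one token. A full bucket allows short bursts.
class TokenBucket:
    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate                 # tokens added per second
        self.tokens = float(capacity)    # start full
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=2, rate=0.0)  # rate 0 keeps the demo deterministic
allowed = [bucket.allow() for _ in range(3)]
print(allowed)  # burst of 2 allowed, then rejected -> [True, True, False]
```

The capacity controls burst size while the refill rate controls the sustained average, which is why token bucket is a common default for public APIs.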
Developers working with rate-limited APIs should check response headers (X-RateLimit-Limit, X-RateLimit-Remaining, Retry-After) and throttle request frequency based on the remaining quota. On a 429 response, the recommended approach is to wait for the duration given in the Retry-After header when the server provides one, and otherwise to retry with exponential backoff, doubling the wait after each failed attempt.
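That retry policy can be sketched as follows. Here `send_request` is a hypothetical stand-in for your HTTP call, returning a status code, a headers dict, and a body; the function name and signature are assumptions for illustration:

```python
import random
import time

# Client-side retry sketch: honor Retry-After when the server sends it,
# otherwise fall back to exponential backoff with jitter.
def call_with_backoff(send_request, max_retries=5, base_delay=1.0):
    for attempt in range(max_retries):
        status, headers, body = send_request()
        if status != 429:
            return status, body
        retry_after = headers.get("Retry-After")
        if retry_after is not None:
            delay = float(retry_after)            # server-specified wait
        else:
            delay = base_delay * (2 ** attempt)   # 1s, 2s, 4s, ...
            delay += random.uniform(0, delay)     # jitter avoids synchronized retries
        time.sleep(delay)
    raise RuntimeError("rate limited: retries exhausted")

# Usage with a fake server that rejects the first two calls:
responses = iter([(429, {"Retry-After": "0"}, ""), (429, {}, ""), (200, {}, "ok")])
status, body = call_with_backoff(lambda: next(responses), base_delay=0.01)
print(status, body)  # -> 200 ok
```

The jitter term matters in practice: without it, many clients rate-limited at the same moment would all retry at the same moment and trip the limit again.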