Crawl budget is the number of URLs a search engine crawler (such as Googlebot) can and wants to crawl on a given site within a given period. Google adjusts crawl frequency per site so that crawling does not place excessive load on the server.
Crawl budget is determined by two factors: the crawl rate limit (the maximum amount of simultaneous fetching Googlebot will do, which rises when the server responds quickly and reliably and falls when it slows down or returns errors) and crawl demand (how popular the site's URLs are and how often its content changes).
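To see how much crawling a site actually receives, a common approach is to count crawler requests in the server's access logs. The following is a minimal Python sketch, assuming a combined-format access.log in the working directory; the file path, regex, and simple user-agent check are assumptions, not part of any official tooling (a production version should also verify Googlebot via reverse DNS, since the user-agent string can be spoofed):

```python
import re
from collections import Counter

LOG_PATH = "access.log"  # hypothetical path; point this at your real log

# Matches the date portion of a combined-log-format timestamp and the final
# quoted field on the line (the user-agent string).
LINE_RE = re.compile(r'\[(\d{2}/\w{3}/\d{4}):[^\]]*\].*"([^"]*)"\s*$')

def googlebot_hits_per_day(path: str) -> Counter:
    """Count requests per day whose user-agent claims to be Googlebot."""
    hits = Counter()
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            m = LINE_RE.search(line)
            if m and "Googlebot" in m.group(2):
                hits[m.group(1)] += 1  # key looks like "10/Oct/2024"
    return hits

if __name__ == "__main__":
    for day, count in sorted(googlebot_hits_per_day(LOG_PATH).items()):
        print(f"{day}: {count} Googlebot requests")
```

A sudden drop in daily hits can signal server trouble throttling the crawl rate limit, while a sustained rise often follows an increase in crawl demand.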
Small sites (a few hundred pages or fewer) rarely need to worry about crawl budget. Google's Gary Illyes has noted that crawl budget is not something most publishers need to worry about, and that sites with fewer than a few thousand URLs are generally crawled efficiently without special effort. For large sites with tens of thousands to millions of pages, however, crawl budget optimization becomes a critical SEO concern.
The relationship between URL shortening services and crawl budget is indirect but important. A service that generates large volumes of shortened URLs can expose each shortened URL's page (such as an interstitial preview page) as a crawl target. To prevent wasted crawling, control crawler access with robots.txt and keep unneeded pages out of the index with noindex tags. Note that the two mechanisms should not be combined on the same URL: a page blocked by robots.txt is never fetched, so a noindex tag on it will never be seen.
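As an illustrative sketch, assuming a shortener that serves preview pages under a hypothetical /preview/ path on example.com, the robots.txt rules might look like this:

```text
# robots.txt for example.com (hypothetical paths)
# Keep crawlers out of auto-generated preview pages
User-agent: *
Disallow: /preview/

# Point crawlers at the pages that matter
Sitemap: https://example.com/sitemap.xml
```

For pages that should remain crawlable but stay out of the index, a robots meta tag in the page head does the job (again, only effective if the page is not also blocked in robots.txt):

```html
<!-- In the <head> of a page that may be crawled but should not be indexed -->
<meta name="robots" content="noindex">
```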
Key techniques for optimizing crawl budget include:

- keeping server response times fast, which raises the crawl rate limit;
- removing duplicate and low-quality pages, which narrows the crawl scope;
- using XML sitemaps to highlight important pages, which guides crawl priority (see the sketch after this list);
- blocking unnecessary paths with robots.txt, which eliminates wasted crawls.
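As a minimal sketch of the sitemap technique, the XML below lists only canonical, index-worthy URLs; the domain, paths, and dates are hypothetical:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- List only canonical pages worth crawling; entries are illustrative -->
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-05-01</lastmod>
  </url>
  <url>
    <loc>https://example.com/important-page</loc>
    <lastmod>2024-04-20</lastmod>
  </url>
</urlset>
```

Keeping redirecting, duplicate, and noindexed URLs out of the sitemap ensures the crawl budget is spent on the pages that should rank.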