
Robots.txt

How to configure robots.txt for search engine crawlers

The robots.txt file tells search engines which pages to crawl and index. Use generateRobots to create one.

Basic Usage

app/robots.ts
import { generateRobots } from "@startupkit/seo"

export default function robots() {
  return generateRobots({
    baseUrl: "https://myapp.com"
  })
}

This generates /robots.txt:

User-agent: *
Allow: /
Disallow: /api/
Disallow: /dashboard/
Disallow: /auth/

Sitemap: https://myapp.com/sitemap.xml

Custom Disallow Paths

app/robots.ts
import { generateRobots } from "@startupkit/seo"

export default function robots() {
  return generateRobots({
    baseUrl: "https://myapp.com",
    disallowPaths: [
      "/api/",
      "/dashboard/",
      "/auth/",
      "/admin/",
      "/settings/",
      "/checkout/",
      "/_next/"
    ]
  })
}

Parameters

Parameter | Type | Default | Description
baseUrl | string | Required | Full site URL
disallowPaths | string[] | ["/api/", "/dashboard/", "/auth/"] | Paths to block
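
Put together, the options correspond roughly to the call signature sketched below. This is inferred from the examples on this page, not the package's actual type definitions, so treat the names as assumptions and check @startupkit/seo's own exports:

import type { MetadataRoute } from "next"

// Hypothetical option shape for generateRobots, inferred from the
// parameter table above
interface GenerateRobotsOptions {
  baseUrl: string          // Full site URL (required)
  disallowPaths?: string[] // Defaults to ["/api/", "/dashboard/", "/auth/"]
}

// app/robots.ts must return Next.js's MetadataRoute.Robots structure,
// so generateRobots is assumed to produce one
declare function generateRobots(
  options: GenerateRobotsOptions
): MetadataRoute.Robots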

Common Disallow Patterns

Web Applications

disallowPaths: [
  "/api/",           // API endpoints
  "/dashboard/",     // User dashboard
  "/auth/",          // Auth pages
  "/settings/",      // User settings
  "/checkout/",      // Checkout flow
  "/admin/"          // Admin panel
]

E-commerce

disallowPaths: [
  "/api/",
  "/cart/",
  "/checkout/",
  "/account/",
  "/search?*",       // Search with params
  "/wishlist/"
]

Content Sites

disallowPaths: [
  "/api/",
  "/admin/",
  "/draft/",         // Draft content
  "/preview/"        // Preview pages
]

Advanced Configuration

For more control, use Next.js's native robots format:

app/robots.ts
import type { MetadataRoute } from "next"

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        userAgent: "*",
        allow: "/",
        disallow: ["/api/", "/dashboard/"]
      },
      {
        userAgent: "Googlebot",
        allow: "/",
        disallow: "/private/"
      },
      {
        userAgent: "BadBot",
        disallow: "/"
      }
    ],
    sitemap: "https://myapp.com/sitemap.xml"
  }
}

Per-Agent Rules

Target specific crawlers:

rules: [
  {
    userAgent: "Googlebot",
    allow: "/",
    crawlDelay: 2
  },
  {
    userAgent: "Bingbot",
    allow: "/",
    crawlDelay: 5
  }
]

Environment-Based Configuration

Block crawlers in non-production:

app/robots.ts
import { generateRobots } from "@startupkit/seo"

export default function robots() {
  // Block all crawlers in preview deployments
  if (process.env.VERCEL_ENV === "preview") {
    return {
      rules: {
        userAgent: "*",
        disallow: "/"
      }
    }
  }

  return generateRobots({ baseUrl: "https://myapp.com" })
}
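
The check above relies on Vercel's VERCEL_ENV variable. On other hosts the same pattern works with whatever variable your platform exposes; ENVIRONMENT below is only a placeholder name:

import { generateRobots } from "@startupkit/seo"

export default function robots() {
  // ENVIRONMENT is a placeholder; substitute the variable your host sets
  if (process.env.ENVIRONMENT !== "production") {
    return {
      rules: {
        userAgent: "*",
        disallow: "/"
      }
    }
  }

  return generateRobots({ baseUrl: "https://myapp.com" })
}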

What to Block

Always Block

Path | Reason
/api/ | API endpoints aren't content
/auth/ | Sign in/up pages
/dashboard/ | User-specific content
/_next/ | Next.js internals

Consider Blocking

Path | Reason
/search?* | Avoid duplicate content from search
/checkout/ | No value for SEO
/print/ | Print-friendly versions
/*?ref=* | URLs with tracking params

Don't Block

Path | Reason
/ | Homepage needs indexing
/about | Important for SEO
/pricing | High-value pages
/blog/ | Content for indexing

Testing

Check Robots.txt

Visit https://myapp.com/robots.txt directly.

Google's Robots Testing Tool

  1. Go to Search Console
  2. Select your property
  3. Go to Settings → robots.txt Tester
  4. Enter a URL to test

Test Specific URLs

# Fetch the served robots.txt as Googlebot, then check your URL against its rules
curl -A "Googlebot" https://myapp.com/robots.txt

Common Issues

Pages Not Indexed

If pages aren't being indexed:

  1. Check they're not covered by a disallow rule (see the sketch after this list)
  2. Verify sitemap includes them
  3. Check page has proper metadata
  4. Use Search Console's URL Inspection
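
For step 1, a small script like the one below can check a URL against the rules your site actually serves. It's a hypothetical helper, not part of @startupkit/seo, and it only does prefix matching (real crawlers also honor * wildcards and $ anchors):

check-robots.ts
// Fetch the live robots.txt and report whether a path falls under a
// Disallow rule in the "User-agent: *" group
const BASE_URL = "https://myapp.com"

async function isDisallowed(path: string): Promise<boolean> {
  const res = await fetch(`${BASE_URL}/robots.txt`)
  const lines = (await res.text()).split("\n")

  let inWildcardGroup = false
  const disallows: string[] = []

  for (const raw of lines) {
    const line = raw.trim()
    const colon = line.indexOf(":")
    if (colon === -1) continue

    const key = line.slice(0, colon).trim().toLowerCase()
    const value = line.slice(colon + 1).trim()

    if (key === "user-agent") {
      inWildcardGroup = value === "*"
    } else if (inWildcardGroup && key === "disallow" && value) {
      disallows.push(value)
    }
  }

  // Prefix match only; real crawlers also apply * and $ patterns
  return disallows.some((rule) => path.startsWith(rule))
}

isDisallowed("/blog/my-post").then((blocked) => {
  console.log(blocked ? "Blocked by robots.txt" : "Not blocked")
})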

Crawl Budget

For large sites, be strategic about blocking to preserve crawl budget:

disallowPaths: [
  "/api/",
  "/tag/*",          // Low-value tag pages
  "/author/*",       // Author archive pages
  "/*?sort=*",       // Sorted variations
  "/*?page=*"        // Paginated URLs
]

Conflicting Rules

When allow and disallow rules overlap, crawlers such as Googlebot apply the most specific (longest) matching rule rather than the first one listed, so a narrow allow can carve an exception out of a broader disallow:

rules: [
  {
    userAgent: "Googlebot",
    allow: "/api/public/",    // Allow specific API
    disallow: "/api/"         // Block rest of API
  }
]

Relationship with noindex

robots.txt and noindex serve different purposes:

Behavior | robots.txt | noindex
Blocks crawling | Yes | No
Blocks indexing | No | Yes
Where defined | robots.txt | Page meta tag
Use for | Crawler access | Indexing control

For pages you never want indexed:

  1. Add a noindex meta tag (stops indexing)
  2. Block in robots.txt only after the page has dropped out of the index; if crawling is blocked first, crawlers never see the noindex tag, and the URL can still be indexed from external links
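
In the Next.js App Router, the noindex tag can be set per page through the Metadata API. The route below is only an example path:

app/drafts/[slug]/page.tsx
import type { Metadata } from "next"

// Rendered as <meta name="robots" content="noindex, follow">
export const metadata: Metadata = {
  robots: {
    index: false,
    follow: true
  }
}

export default function DraftPage() {
  return <h1>Draft preview</h1>
}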
