
Robots.txt

How to configure robots.txt for search engine crawlers

The robots.txt file tells search engines which pages to crawl and index. Use generateRobots to create one.

Basic Usage

app/robots.ts
import { generateRobots } from "@startupkit/seo"

export default function robots() {
  return generateRobots({
    baseUrl: "https://myapp.com"
  })
}

This generates /robots.txt:

User-agent: *
Allow: /
Disallow: /api/
Disallow: /dashboard/
Disallow: /auth/

Sitemap: https://myapp.com/sitemap.xml

Custom Disallow Paths

app/robots.ts
import { generateRobots } from "@startupkit/seo"

export default function robots() {
  return generateRobots({
    baseUrl: "https://myapp.com",
    disallowPaths: [
      "/api/",
      "/dashboard/",
      "/auth/",
      "/admin/",
      "/settings/",
      "/checkout/",
      "/_next/"
    ]
  })
}

Parameters

Parameter | Type | Default | Description
baseUrl | string | Required | Full site URL
disallowPaths | string[] | ["/api/", "/dashboard/", "/auth/"] | Paths to block
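
Put together, the options correspond roughly to the call signature sketched below. This is inferred from the examples on this page, not the package's actual type definitions, so treat the names as assumptions and check @startupkit/seo's own exports:

import type { MetadataRoute } from "next"

// Hypothetical option shape for generateRobots, inferred from the
// parameter table above
interface GenerateRobotsOptions {
  baseUrl: string          // Full site URL (required)
  disallowPaths?: string[] // Defaults to ["/api/", "/dashboard/", "/auth/"]
}

// app/robots.ts must return Next.js's MetadataRoute.Robots structure,
// so generateRobots is assumed to produce one
declare function generateRobots(
  options: GenerateRobotsOptions
): MetadataRoute.Robots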

Common Disallow Patterns

Web Applications

disallowPaths: [
  "/api/",           // API endpoints
  "/dashboard/",     // User dashboard
  "/auth/",          // Auth pages
  "/settings/",      // User settings
  "/checkout/",      // Checkout flow
  "/admin/"          // Admin panel
]

E-commerce

disallowPaths: [
  "/api/",
  "/cart/",
  "/checkout/",
  "/account/",
  "/search?*",       // Search with params
  "/wishlist/"
]

Content Sites

disallowPaths: [
  "/api/",
  "/admin/",
  "/draft/",         // Draft content
  "/preview/"        // Preview pages
]

Advanced Configuration

For more control, use Next.js's native robots format:

app/robots.ts
import type { MetadataRoute } from "next"

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        userAgent: "*",
        allow: "/",
        disallow: ["/api/", "/dashboard/"]
      },
      {
        userAgent: "Googlebot",
        allow: "/",
        disallow: "/private/"
      },
      {
        userAgent: "BadBot",
        disallow: "/"
      }
    ],
    sitemap: "https://myapp.com/sitemap.xml"
  }
}

Per-Agent Rules

Target specific crawlers:

rules: [
  {
    userAgent: "Googlebot",
    allow: "/",
    crawlDelay: 2
  },
  {
    userAgent: "Bingbot",
    allow: "/",
    crawlDelay: 5
  }
]

Environment-Based Configuration

Block crawlers in non-production:

app/robots.ts
import { generateRobots } from "@startupkit/seo"

export default function robots() {
  // Block all crawlers in preview deployments
  if (process.env.VERCEL_ENV === "preview") {
    return {
      rules: {
        userAgent: "*",
        disallow: "/"
      }
    }
  }

  return generateRobots({ baseUrl: "https://myapp.com" })
}
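
The check above relies on Vercel's VERCEL_ENV variable. On other hosts the same pattern works with whatever variable your platform exposes; ENVIRONMENT below is only a placeholder name:

import { generateRobots } from "@startupkit/seo"

export default function robots() {
  // ENVIRONMENT is a placeholder; substitute the variable your host sets
  if (process.env.ENVIRONMENT !== "production") {
    return {
      rules: {
        userAgent: "*",
        disallow: "/"
      }
    }
  }

  return generateRobots({ baseUrl: "https://myapp.com" })
}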

What to Block

Always Block

Path | Reason
/api/ | API endpoints aren't content
/auth/ | Sign in/up pages
/dashboard/ | User-specific content
/_next/ | Next.js internals

Consider Blocking

Path | Reason
/search?* | Avoid duplicate content from search
/checkout/ | No value for SEO
/print/ | Print-friendly versions
/*?ref=* | URLs with tracking params

Don't Block

Path | Reason
/ | Homepage needs indexing
/about | Important for SEO
/pricing | High-value pages
/blog/ | Content for indexing

Testing

Check Robots.txt

Visit https://myapp.com/robots.txt directly.

Google's Robots Testing Tool

  1. Go to Search Console
  2. Select your property
  3. Go to Settings → robots.txt Tester
  4. Enter a URL to test

Test Specific URLs

# Fetch the served robots.txt as Googlebot, then check your URL against its rules
curl -A "Googlebot" https://myapp.com/robots.txt

Common Issues

Pages Not Indexed

If pages aren't being indexed:

  1. Check they're not covered by a disallow rule (see the sketch after this list)
  2. Verify sitemap includes them
  3. Check page has proper metadata
  4. Use Search Console's URL Inspection
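
For step 1, a small script like the one below can check a URL against the rules your site actually serves. It's a hypothetical helper, not part of @startupkit/seo, and it only does prefix matching (real crawlers also honor * wildcards and $ anchors):

check-robots.ts
// Fetch the live robots.txt and report whether a path falls under a
// Disallow rule in the "User-agent: *" group
const BASE_URL = "https://myapp.com"

async function isDisallowed(path: string): Promise<boolean> {
  const res = await fetch(`${BASE_URL}/robots.txt`)
  const lines = (await res.text()).split("\n")

  let inWildcardGroup = false
  const disallows: string[] = []

  for (const raw of lines) {
    const line = raw.trim()
    const colon = line.indexOf(":")
    if (colon === -1) continue

    const key = line.slice(0, colon).trim().toLowerCase()
    const value = line.slice(colon + 1).trim()

    if (key === "user-agent") {
      inWildcardGroup = value === "*"
    } else if (inWildcardGroup && key === "disallow" && value) {
      disallows.push(value)
    }
  }

  // Prefix match only; real crawlers also apply * and $ patterns
  return disallows.some((rule) => path.startsWith(rule))
}

isDisallowed("/blog/my-post").then((blocked) => {
  console.log(blocked ? "Blocked by robots.txt" : "Not blocked")
})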

Crawl Budget

For large sites, be strategic about blocking to preserve crawl budget:

disallowPaths: [
  "/api/",
  "/tag/*",          // Low-value tag pages
  "/author/*",       // Author archive pages
  "/*?sort=*",       // Sorted variations
  "/*?page=*"        // Paginated URLs
]

Conflicting Rules

When allow and disallow rules overlap, crawlers such as Googlebot apply the most specific (longest) matching rule rather than the first one listed, so a narrow allow can carve an exception out of a broader disallow:

rules: [
  {
    userAgent: "Googlebot",
    allow: "/api/public/",    // Allow specific API
    disallow: "/api/"         // Block rest of API
  }
]

Relationship with noindex

robots.txt and noindex serve different purposes:

Behavior | robots.txt | noindex
Blocks crawling | Yes | No
Blocks indexing | No | Yes
Where defined | robots.txt | Page meta tag
Use for | Crawler access | Indexing control

For pages you never want indexed:

  1. Add a noindex meta tag (stops indexing)
  2. Block in robots.txt only after the page has dropped out of the index; if crawling is blocked first, crawlers never see the noindex tag, and the URL can still be indexed from external links
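
In the Next.js App Router, the noindex tag can be set per page through the Metadata API. The route below is only an example path:

app/drafts/[slug]/page.tsx
import type { Metadata } from "next"

// Rendered as <meta name="robots" content="noindex, follow">
export const metadata: Metadata = {
  robots: {
    index: false,
    follow: true
  }
}

export default function DraftPage() {
  return <h1>Draft preview</h1>
}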
