SEO
Robots.txt
How to configure robots.txt for search engine crawlers
The robots.txt file tells search engine crawlers which paths they may crawl. Use `generateRobots` to create one.
Basic Usage
```ts
import { generateRobots } from "@startupkit/seo"

export default function robots() {
  return generateRobots({
    baseUrl: "https://myapp.com"
  })
}
```

This generates `/robots.txt`:

```
User-agent: *
Allow: /
Disallow: /api/
Disallow: /dashboard/
Disallow: /auth/
Sitemap: https://myapp.com/sitemap.xml
```

Custom Disallow Paths
```ts
import { generateRobots } from "@startupkit/seo"

export default function robots() {
  return generateRobots({
    baseUrl: "https://myapp.com",
    disallowPaths: [
      "/api/",
      "/dashboard/",
      "/auth/",
      "/admin/",
      "/settings/",
      "/checkout/",
      "/_next/"
    ]
  })
}
```

Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `baseUrl` | `string` | Required | Full site URL |
| `disallowPaths` | `string[]` | `["/api/", "/dashboard/", "/auth/"]` | Paths to block |
Common Disallow Patterns
Web Applications
```ts
disallowPaths: [
  "/api/",        // API endpoints
  "/dashboard/",  // User dashboard
  "/auth/",       // Auth pages
  "/settings/",   // User settings
  "/checkout/",   // Checkout flow
  "/admin/"       // Admin panel
]
```

E-commerce
```ts
disallowPaths: [
  "/api/",
  "/cart/",
  "/checkout/",
  "/account/",
  "/search?*",   // Search with params
  "/wishlist/"
]
```

Content Sites
```ts
disallowPaths: [
  "/api/",
  "/admin/",
  "/draft/",     // Draft content
  "/preview/"    // Preview pages
]
```

Advanced Configuration
For more control, use Next.js's native robots format:
```ts
import type { MetadataRoute } from "next"

export default function robots(): MetadataRoute.Robots {
  return {
    rules: [
      {
        userAgent: "*",
        allow: "/",
        disallow: ["/api/", "/dashboard/"]
      },
      {
        userAgent: "Googlebot",
        allow: "/",
        disallow: "/private/"
      },
      {
        userAgent: "BadBot",
        disallow: "/"
      }
    ],
    sitemap: "https://myapp.com/sitemap.xml"
  }
}
```

Per-Agent Rules
Target specific crawlers:
```ts
rules: [
  {
    userAgent: "Googlebot",
    allow: "/",
    crawlDelay: 2
  },
  {
    userAgent: "Bingbot",
    allow: "/",
    crawlDelay: 5
  }
]
```

Environment-Based Configuration
Block crawlers in non-production:
```ts
import { generateRobots } from "@startupkit/seo"

export default function robots() {
  // Block all crawlers in preview deployments
  // (VERCEL_ENV shown as an example; use your host's equivalent)
  if (process.env.VERCEL_ENV === "preview") {
    return {
      rules: {
        userAgent: "*",
        disallow: "/"
      }
    }
  }

  return generateRobots({ baseUrl: "https://myapp.com" })
}
```

What to Block
Always Block
| Path | Reason |
|---|---|
| `/api/` | API endpoints aren't content |
| `/auth/` | Sign in/up pages |
| `/dashboard/` | User-specific content |
| `/_next/` | Next.js internals |
Consider Blocking
| Path | Reason |
|---|---|
| `/search?*` | Avoid duplicate content from search |
| `/checkout/` | No value for SEO |
| `/print/` | Print-friendly versions |
| `/*?ref=*` | URLs with tracking params |
Don't Block
| Path | Reason |
|---|---|
| `/` | Homepage needs indexing |
| `/about` | Important for SEO |
| `/pricing` | High-value pages |
| `/blog/` | Content for indexing |
Testing
Check Robots.txt
Visit https://myapp.com/robots.txt directly.
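This check can also be scripted. The sketch below is a hypothetical smoke test, not part of @startupkit/seo: it fetches the generated file with Node's built-in fetch and asserts that one expected rule is present. The URL and the rule are placeholders.

```ts
// Hypothetical smoke test: fetch the deployed robots.txt and check an expected rule.
async function checkRobots(): Promise<void> {
  const res = await fetch("https://myapp.com/robots.txt") // replace with your own URL
  const body = await res.text()

  if (!body.includes("Disallow: /api/")) {
    throw new Error("robots.txt is missing the expected 'Disallow: /api/' rule")
  }

  console.log("robots.txt contains the expected rules")
}

checkRobots().catch((err) => {
  console.error(err)
  process.exit(1)
})
```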
Google's Robots Testing Tool
- Go to Search Console
- Select your property
- Go to Settings → robots.txt Tester
- Enter a URL to test
Test Specific URLs
```bash
# Fetch robots.txt as Googlebot would see it
curl -A "Googlebot" https://myapp.com/robots.txt
```

Common Issues
Pages Not Indexed
If pages aren't being indexed:
- Check they're not in `disallow`
- Verify the sitemap includes them
- Check page has proper metadata
- Use Search Console's URL Inspection
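For a quick local sanity check, the snippet below is a hypothetical helper, not provided by @startupkit/seo, that tests whether a path falls under your `disallowPaths` using plain prefix matching. Real robots.txt matching also honors wildcards, which this sketch ignores.

```ts
// Hypothetical helper: naive prefix check of a path against disallowPaths.
// Real robots.txt matching also supports wildcards (*) and end anchors ($),
// which this simple check does not handle.
const disallowPaths = ["/api/", "/dashboard/", "/auth/"]

function isDisallowed(path: string): boolean {
  return disallowPaths.some((rule) => path.startsWith(rule))
}

console.log(isDisallowed("/api/users"))   // true  — blocked
console.log(isDisallowed("/blog/post-1")) // false — crawlable
```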
Crawl Budget
For large sites, be strategic about blocking to preserve crawl budget:
```ts
disallowPaths: [
  "/api/",
  "/tag/*",      // Low-value tag pages
  "/author/*",   // Author archive pages
  "/*?sort=*",   // Sorted variations
  "/*?page=*"    // Paginated URLs
]
```

Conflicting Rules
Rules are evaluated in order. More specific rules should come first:
```ts
rules: [
  {
    userAgent: "Googlebot",
    allow: "/api/public/",  // Allow specific API
    disallow: "/api/"       // Block rest of API
  }
]
```

Relationship with noindex
robots.txt and noindex serve different purposes:
| | robots.txt | noindex |
|---|---|---|
| Blocks crawling | Yes | No |
| Blocks indexing | No | Yes |
| Where defined | robots.txt | Page meta tag |
| Use for | Crawler access | Indexing control |
For pages you never want indexed:
- Block in `robots.txt` (stops crawling)
- Add a `noindex` meta tag (stops indexing if the page is still reached via links); see the sketch below
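In Next.js, the noindex half can be set through a page's metadata export. This is a minimal sketch using Next.js's built-in Metadata API rather than a @startupkit/seo helper; the page path is only an example.

```tsx
// app/dashboard/page.tsx — example page that should never be indexed
import type { Metadata } from "next"

export const metadata: Metadata = {
  // Renders <meta name="robots" content="noindex, nofollow"> for this page
  robots: {
    index: false,
    follow: false
  }
}

export default function DashboardPage() {
  return <h1>Dashboard</h1>
}
```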
Next Steps
- Sitemaps - Help crawlers find pages
- Page metadata - Control per-page indexing