Firecrawl Lua API for KosmoKrator Agents

Agent-facing Lua documentation and function reference for the Firecrawl KosmoKrator integration.

6 functions · 6 read · 0 write · API key auth

Lua Namespace

Agents call this integration through app.integrations.firecrawl.*. Use lua_read_doc("integrations.firecrawl") inside KosmoKrator to discover the same reference at runtime.
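As a minimal sketch of runtime discovery, an agent can fetch this same reference with the `lua_read_doc` helper mentioned above (printing the result is illustrative):

```lua
-- Fetch the Firecrawl reference from inside a KosmoKrator agent.
local docs = lua_read_doc("integrations.firecrawl")
print(docs)
```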

Agent-Facing Lua Docs

This is the rendered version of the full Lua documentation exposed to agents when they inspect the integration namespace.

Firecrawl — Lua API Reference

scrape

Scrape a single URL and extract its content in the requested format.

Parameters

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `url` | string | yes | The URL to scrape (e.g., `"https://example.com"`) |
| `formats` | array | no | Output formats: `"markdown"`, `"html"`, `"rawHtml"`, `"content"`, `"links"`, `"screenshot"`. Default: `["markdown"]` |
| `onlyMainContent` | boolean | no | Extract only main content, removing nav/footers. Default: `true` |
| `includeTags` | array | no | CSS selectors to include |
| `excludeTags` | array | no | CSS selectors to exclude |
| `waitFor` | integer | no | Milliseconds to wait for dynamic content |
| `timeout` | integer | no | Timeout in ms (default: 30000) |
| `actions` | array | no | Actions to perform before scraping (click, scroll, wait, screenshot) |

Example

```lua
local result = app.integrations.firecrawl.scrape({
  url = "https://example.com",
  formats = {"markdown", "links"},
  onlyMainContent = true
})

print(result.data.markdown)
```

crawl

Start an asynchronous crawl job to scrape all pages from a website. Returns a job ID for status checking.

Parameters

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `url` | string | yes | Root URL to crawl from |
| `limit` | integer | no | Max pages to crawl. Default: 10 |
| `maxDepth` | integer | no | Max depth from root URL |
| `formats` | array | no | Output formats per page. Default: `["markdown"]` |
| `excludePaths` | array | no | URL path patterns to exclude |
| `includePaths` | array | no | Only crawl URLs matching these patterns |
| `allowBackwardLinks` | boolean | no | Allow crawling parent page links. Default: `false` |
| `allowExternalLinks` | boolean | no | Allow crawling external domains. Default: `false` |
| `onlyMainContent` | boolean | no | Extract only main content per page. Default: `true` |

Example

```lua
local job = app.integrations.firecrawl.crawl({
  url = "https://example.com",
  limit = 50,
  formats = {"markdown"}
})

print("Crawl started with ID: " .. job.id)

-- Poll for results
local status = app.integrations.firecrawl.get_crawl_status({
  id = job.id
})

if status.status == "completed" then
  for _, page in ipairs(status.data) do
    print(page.metadata.sourceURL .. ": " .. #page.markdown .. " chars")
  end
end
```

get_crawl_status

Check the status and retrieve results of a crawl job.

Parameters

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `id` | string | yes | The crawl job ID returned by `crawl` |

Status Values

`scraping`, `completed`, `failed`, `cancelled`

Example

```lua
local result = app.integrations.firecrawl.get_crawl_status({
  id = "crawl_abc123"
})

print("Status: " .. result.status)
print("Pages scraped: " .. #result.data)
```

map

Discover all URLs on a website without scraping content.

Parameters

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `url` | string | yes | Root URL to map |
| `limit` | integer | no | Max URLs to return |
| `includeSubdomains` | boolean | no | Include subdomain URLs. Default: `false` |
| `search` | string | no | Filter URLs matching this term |
| `ignoreSitemap` | boolean | no | Skip sitemap.xml. Default: `false` |
| `includePaths` | array | no | Only include URLs matching these patterns |
| `excludePaths` | array | no | Exclude URLs matching these patterns |

Example

```lua
local result = app.integrations.firecrawl.map({
  url = "https://example.com",
  limit = 100,
  includePaths = {"/docs/*"}
})

for _, url in ipairs(result.links) do
  print(url)
end
```

extract

Extract structured data from one or more URLs using AI.

Parameters

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `urls` | array | yes | List of URLs to extract from |
| `prompt` | string | no | Natural language description of what to extract |
| `schema` | object | no | JSON schema for expected output structure |
| `systemPrompt` | string | no | System prompt to guide AI behavior |
| `allowExternalLinks` | boolean | no | Follow external domain links. Default: `false` |
| `enableWebSearch` | boolean | no | Supplement with web search. Default: `false` |
| `includeSubdomains` | boolean | no | Include subdomains. Default: `false` |

Example

```lua
local result = app.integrations.firecrawl.extract({
  urls = {"https://example.com/product/1"},
  prompt = "Extract the product name, price, and description"
})

print(result.data.product_name)
print(result.data.price)
```

With JSON schema

```lua
local result = app.integrations.firecrawl.extract({
  urls = {"https://example.com/product/1"},
  schema = {
    type = "object",
    properties = {
      name = {type = "string"},
      price = {type = "number"},
      description = {type = "string"},
      inStock = {type = "boolean"}
    }
  }
})
```

get_current_user

Get the authenticated user’s account information and usage stats.

Parameters

None.

Example

```lua
local user = app.integrations.firecrawl.get_current_user({})
print("Account: " .. user.email)
print("Plan: " .. user.plan)
```

Multi-Account Usage

If you have multiple Firecrawl accounts configured, use account-specific namespaces:

```lua
-- Default account (always works)
app.integrations.firecrawl.scrape({url = "https://example.com"})

-- Explicit default (portable across setups)
app.integrations.firecrawl.default.scrape({url = "https://example.com"})

-- Named accounts
app.integrations.firecrawl.production.scrape({url = "https://example.com"})
app.integrations.firecrawl.staging.scrape({url = "https://staging.example.com"})
```

All functions are identical across accounts — only the credentials differ.
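The reference does not specify how failures (network errors, an invalid API key) are reported, so treating them as raised Lua errors is an assumption. Under that assumption, a defensive sketch wraps calls in `pcall`:

```lua
-- Defensive call: pcall traps a raised error. If the runtime instead
-- returns an error table, inspect `result` directly (assumption noted above).
local ok, result = pcall(app.integrations.firecrawl.scrape, {
  url = "https://example.com"
})

if ok then
  print(result.data.markdown)
else
  print("scrape failed: " .. tostring(result))
end
```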

Raw agent markdown
# Firecrawl — Lua API Reference

## scrape

Scrape a single URL and extract its content in the requested format.

### Parameters

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `url` | string | yes | The URL to scrape (e.g., `"https://example.com"`) |
| `formats` | array | no | Output formats: `"markdown"`, `"html"`, `"rawHtml"`, `"content"`, `"links"`, `"screenshot"`. Default: `["markdown"]` |
| `onlyMainContent` | boolean | no | Extract only main content, remove nav/footers. Default: `true` |
| `includeTags` | array | no | CSS selectors to include |
| `excludeTags` | array | no | CSS selectors to exclude |
| `waitFor` | integer | no | Milliseconds to wait for dynamic content |
| `timeout` | integer | no | Timeout in ms (default: 30000) |
| `actions` | array | no | Actions before scraping (click, scroll, wait, screenshot) |

### Example

```lua
local result = app.integrations.firecrawl.scrape({
  url = "https://example.com",
  formats = {"markdown", "links"},
  onlyMainContent = true
})

print(result.data.markdown)
```

---

## crawl

Start an asynchronous crawl job to scrape all pages from a website. Returns a job ID for status checking.

### Parameters

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `url` | string | yes | Root URL to crawl from |
| `limit` | integer | no | Max pages to crawl. Default: 10 |
| `maxDepth` | integer | no | Max depth from root URL |
| `formats` | array | no | Output formats per page. Default: `["markdown"]` |
| `excludePaths` | array | no | URL path patterns to exclude |
| `includePaths` | array | no | Only crawl URLs matching these patterns |
| `allowBackwardLinks` | boolean | no | Allow crawling parent page links. Default: `false` |
| `allowExternalLinks` | boolean | no | Allow crawling external domains. Default: `false` |
| `onlyMainContent` | boolean | no | Extract only main content per page. Default: `true` |

### Example

```lua
local job = app.integrations.firecrawl.crawl({
  url = "https://example.com",
  limit = 50,
  formats = {"markdown"}
})

print("Crawl started with ID: " .. job.id)

-- Poll for results
local status = app.integrations.firecrawl.get_crawl_status({
  id = job.id
})

if status.status == "completed" then
  for _, page in ipairs(status.data) do
    print(page.metadata.sourceURL .. ": " .. #page.markdown .. " chars")
  end
end
```

---

## get_crawl_status

Check the status and retrieve results of a crawl job.

### Parameters

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `id` | string | yes | The crawl job ID returned by `crawl` |

### Status Values

`scraping`, `completed`, `failed`, `cancelled`

### Example

```lua
local result = app.integrations.firecrawl.get_crawl_status({
  id = "crawl_abc123"
})

print("Status: " .. result.status)
print("Pages scraped: " .. #result.data)
```

---

## map

Discover all URLs on a website without scraping content.

### Parameters

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `url` | string | yes | Root URL to map |
| `limit` | integer | no | Max URLs to return |
| `includeSubdomains` | boolean | no | Include subdomain URLs. Default: `false` |
| `search` | string | no | Filter URLs matching this term |
| `ignoreSitemap` | boolean | no | Skip sitemap.xml. Default: `false` |
| `includePaths` | array | no | Only include URLs matching these patterns |
| `excludePaths` | array | no | Exclude URLs matching these patterns |

### Example

```lua
local result = app.integrations.firecrawl.map({
  url = "https://example.com",
  limit = 100,
  includePaths = {"/docs/*"}
})

for _, url in ipairs(result.links) do
  print(url)
end
```

---

## extract

Extract structured data from one or more URLs using AI.

### Parameters

| Name | Type | Required | Description |
|------|------|----------|-------------|
| `urls` | array | yes | List of URLs to extract from |
| `prompt` | string | no | Natural language description of what to extract |
| `schema` | object | no | JSON schema for expected output structure |
| `systemPrompt` | string | no | System prompt to guide AI behavior |
| `allowExternalLinks` | boolean | no | Follow external domain links. Default: `false` |
| `enableWebSearch` | boolean | no | Supplement with web search. Default: `false` |
| `includeSubdomains` | boolean | no | Include subdomains. Default: `false` |

### Example

```lua
local result = app.integrations.firecrawl.extract({
  urls = {"https://example.com/product/1"},
  prompt = "Extract the product name, price, and description"
})

print(result.data.product_name)
print(result.data.price)
```

### With JSON schema

```lua
local result = app.integrations.firecrawl.extract({
  urls = {"https://example.com/product/1"},
  schema = {
    type = "object",
    properties = {
      name = {type = "string"},
      price = {type = "number"},
      description = {type = "string"},
      inStock = {type = "boolean"}
    }
  }
})
```

---

## get_current_user

Get the authenticated user's account information and usage stats.

### Parameters

None.

### Example

```lua
local user = app.integrations.firecrawl.get_current_user({})
print("Account: " .. user.email)
print("Plan: " .. user.plan)
```

---

## Multi-Account Usage

If you have multiple Firecrawl accounts configured, use account-specific namespaces:

```lua
-- Default account (always works)
app.integrations.firecrawl.scrape({url = "https://example.com"})

-- Explicit default (portable across setups)
app.integrations.firecrawl.default.scrape({url = "https://example.com"})

-- Named accounts
app.integrations.firecrawl.production.scrape({url = "https://example.com"})
app.integrations.firecrawl.staging.scrape({url = "https://staging.example.com"})
```

All functions are identical across accounts — only the credentials differ.

Metadata-Derived Lua Example

```lua
-- Placeholder values; array parameters take Lua tables, not strings.
local result = app.integrations.firecrawl.firecrawl_scrape({
  url = "https://example.com",
  formats = {"markdown"},
  onlyMainContent = true,
  includeTags = {"article"},
  excludeTags = {"nav"},
  waitFor = 1000,
  timeout = 30000
})
print(result)
```

Functions

firecrawl_scrape

Scrape a single URL and extract its content. Returns the page content in the requested format (markdown by default). Supports actions like waiting for JavaScript, taking screenshots, and extracting specific elements.

Operation: read
Full name: `firecrawl.firecrawl_scrape`

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `url` | string | yes | The URL to scrape (e.g., `"https://example.com"`). |
| `formats` | array | no | Output formats to return. Options: `"markdown"`, `"html"`, `"rawHtml"`, `"content"`, `"links"`, `"screenshot"`, `"actions"`. Default: `["markdown"]`. |
| `onlyMainContent` | boolean | no | Extract only the main content, removing navigation, footers, etc. Default: `true`. |
| `includeTags` | array | no | CSS selectors to include. Only these elements will be scraped. |
| `excludeTags` | array | no | CSS selectors to exclude. These elements will be removed from the result. |
| `waitFor` | integer | no | Time in milliseconds to wait for dynamic content to load before scraping. |
| `timeout` | integer | no | Timeout in milliseconds for the scrape request. Default: 30000. |
| `actions` | array | no | List of actions to perform before scraping (e.g., click, scroll, wait, screenshot). |

firecrawl_crawl

Start a crawl job to scrape all pages from a website starting at the given URL. Returns a crawl job ID — use firecrawl_get_crawl_status to check progress and retrieve results.

Operation: read
Full name: `firecrawl.firecrawl_crawl`

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `url` | string | yes | The root URL to start crawling from (e.g., `"https://example.com"`). |
| `limit` | integer | no | Maximum number of pages to crawl. Default: 10. |
| `maxDepth` | integer | no | Maximum crawl depth from the root URL. Default: based on plan. |
| `formats` | array | no | Output formats for each page. Options: `"markdown"`, `"html"`, `"rawHtml"`, `"content"`, `"links"`. Default: `["markdown"]`. |
| `excludePaths` | array | no | URL path patterns to exclude from crawling (e.g., `["/blog/*"]`). |
| `includePaths` | array | no | Only crawl URLs matching these path patterns (e.g., `["/docs/*"]`). |
| `allowBackwardLinks` | boolean | no | Allow crawling links that go back to parent pages. Default: `false`. |
| `allowExternalLinks` | boolean | no | Allow crawling links to external domains. Default: `false`. |
| `onlyMainContent` | boolean | no | Extract only main content from each page. Default: `true`. |
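Since crawl jobs are asynchronous, callers typically poll the status until it reaches a terminal value. A sketch of that loop, assuming the runtime offers some way to pause between polls (the `sleep` helper here is hypothetical; substitute whatever your agent environment actually provides):

```lua
local job = app.integrations.firecrawl.crawl({
  url = "https://example.com",
  limit = 20
})

local status
repeat
  sleep(5)  -- hypothetical pause primitive; not part of this reference
  status = app.integrations.firecrawl.get_crawl_status({ id = job.id })
until status.status == "completed"
   or status.status == "failed"
   or status.status == "cancelled"

print("Final status: " .. status.status)
```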

firecrawl_get_crawl_status

Check the status and retrieve results of a crawl job. Returns the current status (scraping, completed, failed, cancelled) and all scraped data once complete.

Operation: read
Full name: `firecrawl.firecrawl_get_crawl_status`

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `id` | string | yes | The crawl job ID returned by the `firecrawl_crawl` tool. |

firecrawl_map

Map a website to discover all linked URLs. Returns a list of all URLs found on the site without scraping full content. Useful for understanding site structure before crawling.

Operation: read
Full name: `firecrawl.firecrawl_map`

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `url` | string | yes | The root URL to map (e.g., `"https://example.com"`). |
| `limit` | integer | no | Maximum number of URLs to return. Default: based on plan. |
| `includeSubdomains` | boolean | no | Include URLs from subdomains. Default: `false`. |
| `search` | string | no | Filter URLs that match a search term (only returns URLs containing this string). |
| `ignoreSitemap` | boolean | no | Skip sitemap.xml discovery and only use on-page links. Default: `false`. |
| `includePaths` | array | no | Only include URLs matching these path patterns. |
| `excludePaths` | array | no | Exclude URLs matching these path patterns. |
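Mapping pairs naturally with scraping: discover URLs first, then fetch only the pages you need. A sketch combining the two (the `/docs/*` filter and the three-page cap are illustrative choices, not requirements):

```lua
-- Discover documentation URLs, then scrape the first few of them.
local mapped = app.integrations.firecrawl.map({
  url = "https://example.com",
  includePaths = {"/docs/*"},
  limit = 25
})

for i, url in ipairs(mapped.links) do
  if i > 3 then break end  -- scrape only the first three discovered pages
  local page = app.integrations.firecrawl.scrape({ url = url })
  print(url .. " -> " .. #page.data.markdown .. " chars")
end
```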

firecrawl_extract

Extract structured data from one or more URLs using AI. Provide a prompt describing what to extract, or a JSON schema for the expected output format. Ideal for pulling specific data points from web pages.

Operation: read
Full name: `firecrawl.firecrawl_extract`

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `urls` | array | yes | List of URLs to extract data from (e.g., `["https://example.com/about"]`). |
| `prompt` | string | no | Natural language description of what data to extract from the pages. |
| `schema` | object | no | JSON schema defining the expected output structure. The response will conform to this schema. |
| `systemPrompt` | string | no | System prompt to guide the AI extraction behavior. |
| `allowExternalLinks` | boolean | no | Allow following links to external domains during extraction. Default: `false`. |
| `enableWebSearch` | boolean | no | Enable web search to supplement extraction with additional context. Default: `false`. |
| `includeSubdomains` | boolean | no | Include subdomains when following links. Default: `false`. |

firecrawl_get_current_user

Get the authenticated user's account information, including plan details and usage statistics. Useful for verifying API key validity and checking remaining credits.

Operation: read
Full name: `firecrawl.firecrawl_get_current_user`

No parameters.