automation site/scraper agent workflow

1. User opens the Web Scraping chat: projects/automations/app/chat/[agent]/page.tsx:1 validates /scrape against lib/automations.ts and renders the generic ChatView.
2. Frontend sends messages to the scrape API: projects/automations/components/ChatView.tsx:421 uses DefaultChatTransport({ api: "/api/chat/scrape" }). ChatView also knows how to render scrape tool cards for crawl_url, search_sources, list_new_models, get_model_details, and mark_reviewed.
3. API creates MCP tools, then runs the agent: projects/automations/app/api/chat/scrape/route.ts:1 does three things:
   - reads the chat messages
   - calls getScrapeMcpTools()
   - streams the response with createScrapeAgent(tools)
4. Scrape agent wraps an LLM with deterministic tools: projects/automations/agents/scrape/agent.ts:1 creates a ToolLoopAgent named aura_scrape, using gpt-4o-mini from projects/automations/agents/_shared/model.ts:1.
5. Instructions force tool-first behavior: projects/automations/agents/scrape/instruction.ts:1 tells the agent not to invent model-launch records, to call tools for crawl/search/list/detail/review tasks, and to handle "latest," "Chinese providers," date ranges, and review-status updates deterministically.
6. MCP server is launched over stdio: projects/automations/lib/mcp.ts:35 starts /root/projects/stdio-mcp/dist/index.js with:

   ALLOWED_TOOLS=crawl_url,search_sources,list_new_models,get_model_details,mark_reviewed

   So the scrape agent only sees those model-monitor tools.

7. MCP server registers and filters tools: in /root/projects/stdio-mcp, projects/stdio-mcp/src/index.ts:1 starts the MCP server, projects/stdio-mcp/src/tools/modelMonitor/index.ts:1 registers the five model-monitor tools, and projects/stdio-mcp/src/routes/toolRoutes.ts:1 validates each tool's input with Zod before executing it.
8. Tool behavior:
   - crawl_url: fetches an allowlisted HTTPS URL, extracts markdown/text/HTML, and saves crawl and source records.
   - search_sources: searches stored candidate source records.
   - list_new_models: queries stored model launches with filters.
   - get_model_details: returns one full model-launch record.
   - mark_reviewed: updates a record's review status and requires a reason when the status is set to ignored.
9. Storage layer: the scrape workflow uses Postgres via DATABASE_URL / MODEL_MONITOR_DATABASE_URL, or Supabase's REST API configured through env vars. projects/automations/prisma/schema.prisma:55 defines the tables ModelMonitorCrawl, ModelMonitorSource, ModelLaunch, and ModelLaunchSource.
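
The first step hinges on lib/automations.ts acting as a registry that the [agent] page checks its slug against. A minimal sketch of that lookup, assuming a flat array of entries; the Automation shape, AUTOMATIONS, and findAutomation are hypothetical names for illustration, not the real contents of lib/automations.ts.

```typescript
// Hypothetical registry entry; the real lib/automations.ts may use a different shape.
interface Automation {
  slug: string;    // matches the [agent] route segment, e.g. "scrape"
  title: string;   // label shown in the UI
  apiPath: string; // chat endpoint handed to the transport
}

// Hypothetical registry contents; only the scrape entry reflects the note.
const AUTOMATIONS: readonly Automation[] = [
  { slug: "scrape", title: "Web Scraping", apiPath: "/api/chat/scrape" },
];

// Returns the matching entry, or undefined for an unknown slug.
function findAutomation(slug: string): Automation | undefined {
  return AUTOMATIONS.find((a) => a.slug === slug);
}

console.log(findAutomation("scrape")?.apiPath); // prints /api/chat/scrape
console.log(findAutomation("unknown"));         // prints undefined
```

In a Next.js App Router page, an undefined result would typically be turned into a 404 by calling notFound() from next/navigation.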
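
Steps 6 and 7 describe the stdio server exposing only the tools named in ALLOWED_TOOLS. The sketch below shows one way that filtering could work, assuming the variable is a plain comma-separated list. ToolDefinition, parseAllowedTools, and filterTools are hypothetical, and treating an unset variable as "no filter" is an assumption, since the note does not say how stdio-mcp handles that case.

```typescript
// Minimal tool shape for the sketch; the real stdio-mcp types are not shown in the note.
interface ToolDefinition {
  name: string;
  description: string;
}

// Parses a comma-separated ALLOWED_TOOLS value into a set of names.
// Returns null when the value is unset or blank (assumed here to mean "no filter").
function parseAllowedTools(raw: string | undefined): Set<string> | null {
  if (raw === undefined || raw.trim() === "") return null;
  const names = raw
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  return new Set(names);
}

// Keeps only the registered tools whose names are in the allowlist.
function filterTools(tools: ToolDefinition[], allowed: Set<string> | null): ToolDefinition[] {
  if (allowed === null) return tools;
  return tools.filter((t) => allowed.has(t.name));
}

const registered: ToolDefinition[] = [
  { name: "crawl_url", description: "Fetch an allowlisted HTTPS URL" },
  { name: "search_sources", description: "Search stored candidate sources" },
  { name: "other_tool", description: "Hypothetical tool outside the model monitor" },
];

const allowed = parseAllowedTools(
  "crawl_url,search_sources,list_new_models,get_model_details,mark_reviewed",
);
console.log(filterTools(registered, allowed).map((t) => t.name).join(",")); // prints crawl_url,search_sources
```

In the server itself the raw value would come from process.env.ALLOWED_TOOLS; it is passed in as a string here so the sketch runs without Node typings. Filtering on the server side means the agent never receives a tool it is not meant to call, regardless of what the prompt says.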
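
Step 7 says toolRoutes.ts validates input with Zod before execution, and step 8 says mark_reviewed requires a reason when a record is ignored. The dependency-free sketch below mirrors that kind of check without Zod; the status values other than "ignored", the id/status/reason field names, and validateMarkReviewed are assumptions. With Zod, the same rule would usually be a z.object schema plus a refinement on the whole object.

```typescript
// Hypothetical status values; the note only names "ignored".
const REVIEW_STATUSES = ["new", "reviewed", "ignored"] as const;
type ReviewStatus = (typeof REVIEW_STATUSES)[number];

// Hypothetical argument shape for mark_reviewed.
interface MarkReviewedInput {
  id: string;
  status: ReviewStatus;
  reason?: string;
}

type ValidationResult =
  | { ok: true; value: MarkReviewedInput }
  | { ok: false; error: string };

function isReviewStatus(s: string): s is ReviewStatus {
  return (REVIEW_STATUSES as readonly string[]).includes(s);
}

// Validates untrusted tool arguments before any database write happens.
function validateMarkReviewed(input: unknown): ValidationResult {
  if (typeof input !== "object" || input === null) {
    return { ok: false, error: "input must be an object" };
  }
  const { id, status, reason } = input as Record<string, unknown>;
  if (typeof id !== "string" || id.length === 0) {
    return { ok: false, error: "id must be a non-empty string" };
  }
  if (typeof status !== "string" || !isReviewStatus(status)) {
    return { ok: false, error: `status must be one of: ${REVIEW_STATUSES.join(", ")}` };
  }
  if (reason !== undefined && typeof reason !== "string") {
    return { ok: false, error: "reason must be a string" };
  }
  const reasonText = typeof reason === "string" ? reason : undefined;
  // Rule from the note: ignoring a record requires a reason.
  if (status === "ignored" && (reasonText === undefined || reasonText.trim() === "")) {
    return { ok: false, error: "reason is required when status is ignored" };
  }
  return { ok: true, value: { id, status, reason: reasonText } };
}

const result = validateMarkReviewed({ id: "m1", status: "ignored" });
console.log(result.ok ? "accepted" : result.error); // prints reason is required when status is ignored
```

Returning a result object instead of throwing lets the route send the error text back to the agent as a tool result, so the model can retry with a reason rather than fail the whole turn.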
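
Step 8 says crawl_url only fetches allowlisted HTTPS URLs. A minimal sketch of that gate using the standard URL class, assuming the allowlist is a list of hostnames where subdomains also count; ALLOWED_HOSTS with its example.com/example.org entries and isAllowedCrawlUrl are placeholders, not the server's real configuration.

```typescript
// Placeholder allowlist; the real hosts crawl_url permits are not listed in the note.
const ALLOWED_HOSTS: readonly string[] = ["example.com", "example.org"];

// Returns true only for absolute https: URLs on an allowlisted host or one of its subdomains.
function isAllowedCrawlUrl(raw: string, allowedHosts: readonly string[] = ALLOWED_HOSTS): boolean {
  let url: URL;
  try {
    url = new URL(raw); // throws on relative or malformed input
  } catch {
    return false;
  }
  if (url.protocol !== "https:") return false;
  const host = url.hostname.toLowerCase();
  // Require an exact match or a "." boundary so lookalikes such as evilexample.com fail.
  return allowedHosts.some((h) => host === h || host.endsWith("." + h));
}

for (const candidate of [
  "https://example.com/blog/new-model",
  "https://news.example.org/",
  "http://example.com/",
  "https://evilexample.com/",
]) {
  console.log(candidate, isAllowedCrawlUrl(candidate)); // prints true, true, false, false in order
}
```

A stricter version would also re-check the final URL after redirects, since an allowlisted page can redirect elsewhere; this sketch validates only the URL it is given.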
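
Step 5 asks the agent to handle "latest," "Chinese providers," and date ranges deterministically, and step 8 says list_new_models queries stored launches with filters. The sketch below shows such a query over an in-memory array. The ModelLaunch fields (providerCountry, announcedAt), the ListFilters shape, listNewModels, and the sample records are all made up for illustration; the real tool queries Postgres or Supabase rather than an array.

```typescript
// Hypothetical record shape; the real ModelLaunch columns in schema.prisma are not shown in the note.
interface ModelLaunch {
  id: string;
  name: string;
  provider: string;
  providerCountry: string; // assumed ISO 3166-1 alpha-2 code, e.g. "CN"
  announcedAt: string;     // assumed ISO 8601 date, e.g. "2025-02-03"
}

// Hypothetical filter object the agent fills in from the user's wording.
interface ListFilters {
  country?: string; // "Chinese providers" -> "CN"
  from?: string;    // inclusive start date
  to?: string;      // inclusive end date
  limit?: number;   // "latest" -> newest first with a small limit
}

// Filters and sorts deterministically. YYYY-MM-DD strings compare correctly as plain strings.
function listNewModels(launches: readonly ModelLaunch[], f: ListFilters = {}): ModelLaunch[] {
  return launches
    .filter((m) => f.country === undefined || m.providerCountry === f.country)
    .filter((m) => f.from === undefined || m.announcedAt >= f.from)
    .filter((m) => f.to === undefined || m.announcedAt <= f.to)
    // Newest first; filter() already returned a copy, so the input is not mutated.
    .sort((a, b) => (a.announcedAt < b.announcedAt ? 1 : a.announcedAt > b.announcedAt ? -1 : 0))
    .slice(0, f.limit ?? launches.length);
}

// Made-up sample records for illustration only.
const sample: ModelLaunch[] = [
  { id: "1", name: "model-a", provider: "provider-x", providerCountry: "CN", announcedAt: "2025-01-10" },
  { id: "2", name: "model-b", provider: "provider-y", providerCountry: "US", announcedAt: "2025-02-03" },
  { id: "3", name: "model-c", provider: "provider-z", providerCountry: "CN", announcedAt: "2025-03-15" },
];

// "Latest launches from Chinese providers since February 2025":
const hits = listNewModels(sample, { country: "CN", from: "2025-02-01", limit: 5 });
console.log(hits.map((m) => m.name).join(",")); // prints model-c
```

Because the filters are plain data, the model only has to translate a phrase like "latest Chinese launches" into something like { country: "CN", limit: 5 }; the ordering and cutoff are then computed by code rather than by the LLM.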
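
Step 9 says storage is Postgres through DATABASE_URL or MODEL_MONITOR_DATABASE_URL, or Supabase through REST env vars. The note gives neither the precedence nor the Supabase variable names, so the sketch below assumes the monitor-specific URL wins, then DATABASE_URL, then Supabase. SUPABASE_URL, SUPABASE_SERVICE_ROLE_KEY, resolveStorage, and StorageConfig are hypothetical.

```typescript
// Possible storage targets for the sketch.
type StorageConfig =
  | { kind: "postgres"; url: string }
  | { kind: "supabase-rest"; url: string; key: string }
  | { kind: "none" };

// Env is passed in explicitly; in the app this would be process.env.
type Env = Record<string, string | undefined>;

function resolveStorage(env: Env): StorageConfig {
  // Assumption: the monitor-specific URL takes precedence over DATABASE_URL.
  // `||` also treats an empty string as unset.
  const pgUrl = env["MODEL_MONITOR_DATABASE_URL"] || env["DATABASE_URL"];
  if (pgUrl) return { kind: "postgres", url: pgUrl };

  // Hypothetical Supabase variable names; the note only mentions "Supabase REST env vars".
  const sbUrl = env["SUPABASE_URL"];
  const sbKey = env["SUPABASE_SERVICE_ROLE_KEY"];
  if (sbUrl && sbKey) return { kind: "supabase-rest", url: sbUrl, key: sbKey };

  return { kind: "none" };
}

console.log(resolveStorage({ DATABASE_URL: "postgres://localhost:5432/app" }).kind); // prints postgres
```

Returning a tagged union forces tool code to branch on kind before it can read a URL or key, and the "none" case can surface a clear configuration error instead of a failed connection at query time.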

Shivam's Notes
