Semantic Fetch Intelligence - AI Browser Automation Extension

AI-Powered Browser Automation for Everyone

Natural language commands. No coding required. Works on any regular website.

Developer Automation

Smarter than Playwright

Skip brittle CSS selectors and complex scripting. Use natural language to automate testing, scraping, and workflows. AI adapts to page changes automatically.

✓ No selector maintenance - semantic understanding

✓ Write tests in plain English, not code

✓ Self-healing automation that adapts to UI changes

Example command:


                                "Go to GitHub, search for React hooks, and open the first result"

Elderly Assistant

Simplifying the web

Help seniors navigate complex websites, fill forms, book appointments, and find information, all through simple, conversational commands.

✓ Simple commands, no technical knowledge needed

✓ Helps with banking, shopping, healthcare portals

✓ Reduces frustration with confusing interfaces

Example command:


                                "Go to my pharmacy website and refill my prescription"

Use Cases

🔍 Research Automation

Automatically search multiple sources, navigate documentation, and gather information.


                                "Go to Wikipedia and search for quantum computing"

🛒 E-commerce Tasks

Search products, compare prices, and navigate shopping sites autonomously.


                                "Search for wireless headphones on Amazon"

📝 Form Filling

Detects form fields and fills them in based on the context of the page.


                                "Fill out the contact form with my details"

🔗 Multi-step Workflows

Chain complex actions across multiple pages and sites.


                                "Go to GitHub and search for React components"

Ready to Get Started?

Semantic Fetch Intelligence v1.0.0

Free to use • MIT Licensed • Commercial use allowed

Get on Millpond.ai

Available on Millpond.ai • Chrome & Edge compatible • Requires Gemini API key

Quick Start Guide

Get the Extension

Visit Millpond.ai and get fetch-extension-v1.0.0.zip
Extract the ZIP file
Open Chrome/Edge and navigate to chrome://extensions/
Enable "Developer mode" (toggle in top right)
Click "Load unpacked" and select the extracted dist folder

Configure API Key

Get a free Gemini API key from ai.google.dev
Click the extension icon in your browser toolbar
Click the settings (⚙️) icon
Paste your API key and save

Example API key format:

AIzaSyDxxxxxxxxxxxxxxxxxxxxxxxxxxx

Start Automating

Open the extension side panel
Type a natural language command
Press Enter or click the send button
Watch Fetch autonomously complete the task!

💡 Try these examples:

→ "Go to Wikipedia and search for artificial intelligence"
→ "Search for wireless mouse on amazon.com"
→ "Navigate to reddit.com and search for machine learning"

Common Issues

Extension won't load?

Make sure you extracted the ZIP first and, under "Load unpacked", selected the dist folder (the folder that contains manifest.json).

API quota exceeded?

The free tier limits both requests per minute and requests per day. Wait for the quota to reset, or upgrade your API plan at ai.google.dev/pricing.

Agent not starting?

The active tab must show a regular webpage. Browsers do not let extensions script internal pages such as chrome:// or edge://, so open any website (for example google.com) first.
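You can catch this case before the agent starts with a simple URL check. The following is a minimal sketch, not the extension's own code. `isAutomatableUrl` is a hypothetical name. The sketch assumes only http:// and https:// pages should be automated, so it also rejects file:// pages, which Chrome blocks unless the user allows file access for the extension.

```typescript
// Hypothetical pre-flight check: can the extension script this tab's URL?
export function isAutomatableUrl(url: string | undefined): boolean {
  if (!url) return false; // blank tab, or URL not visible to the extension

  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    return false; // not a parseable absolute URL
  }

  // Only ordinary web pages: chrome://, edge://, about:, chrome-extension://,
  // view-source:, file:// and similar schemes are all rejected here.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return false;

  // Chrome also refuses extension scripts on the Chrome Web Store.
  return parsed.hostname !== "chromewebstore.google.com";
}
```

Checking the scheme with an allowlist rather than a blocklist means any new internal scheme is rejected by default.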

Technical Specifications

Core Technology

Extension Type: Chrome Extension MV3
Framework: React 19 + TypeScript
AI Model: Gemini 2.5 Flash
Styling: Tailwind CSS
Build Tool: Vite

Required Permissions

✓ sidePanel - Display extension UI in the browser side panel
✓ activeTab - Access the current tab for automation
✓ scripting - Inject scripts to interact with pages
✓ storage - Store API key and settings
✓ tabs - Create and navigate tabs, and read their URLs

Key Features

→ Semantic DOM analysis
→ Autonomous agent loop
→ Natural language commands
→ Multi-step task execution
→ Error recovery & retry logic
→ Real-time status updates
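This page does not describe the "Error recovery & retry logic" feature in detail. One common way to absorb transient failures, such as rate-limit errors from the Gemini API, is a retry wrapper with exponential backoff. The following is a minimal sketch under that assumption. `withRetry`, its options, and the delay schedule are hypothetical, not the extension's documented behavior.

```typescript
// Hypothetical retry helper: re-runs an async operation with exponential backoff.
export interface RetryOptions {
  retries: number; // extra attempts after the first failure
  baseDelayMs: number; // wait before the first retry; doubles on each retry
  shouldRetry?: (error: unknown) => boolean; // return false to fail fast
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

export async function withRetry<T>(
  operation: (attempt: number) => Promise<T>,
  { retries, baseDelayMs, shouldRetry = () => true }: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      // Stop on the last attempt or when the error is not worth retrying.
      if (attempt === retries || !shouldRetry(error)) break;
      await sleep(baseDelayMs * 2 ** attempt); // 1x, 2x, 4x, ... the base delay
    }
  }
  throw lastError;
}
```

For example, the loop could wrap each model call as `withRetry(() => gemini.getNextAction(dom, goal, history), { retries: 3, baseDelayMs: 1000 })`. The `shouldRetry` hook lets permanent errors, such as an invalid API key, fail immediately.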

Requirements

• Browser: Chrome/Edge (MV3 support)
• API Key: Free Gemini API key from Google
• Network: Internet connection for AI inference

Version 1.0.0

Release Date: January 2026
License: MIT (Free to use)
Package Size: 113 KB
Availability: Millpond.ai
Framework: Cormorant Foraging v1.0

System Architecture

Autonomous Agent Loop

┌─────────────────┐
│   User Input    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Start Agent    │
└────────┬────────┘
         │
         ▼
   ┌────────────┐        ┌──────────────────┐
   │ Valid Tab? │── No ─▶│  Create New Tab  │
   └─────┬──────┘        └────────┬─────────┘
         │ Yes                    │
         │◀───────────────────────┘
         ▼
┌──────────────────┐
│   Analyze DOM    │◀───────────────────────────────────┐
└────────┬─────────┘                                    │
         ▼                                              │
┌──────────────────┐                                    │
│ Semantic         │                                    │
│ Anchoring        │                                    │
└────────┬─────────┘                                    │
         ▼                                              │
┌──────────────────┐                                    │
│ Gemini AI        │                                    │
│ Thinking         │                                    │
└────────┬─────────┘                                    │
         ▼                                              │
    ┌─────────┐                                         │
    │Decision?│                                         │
    └────┬────┘                                         │
         │                                              │
         ├── Navigate ──▶ Update URL ──▶ Wait Load ─────┤
         ├── Click ─────▶ Execute Click ────────────────┤
         ├── Type ──────▶ Execute Type ─────────────────┤
         ├── Submit ────▶ Execute Submit ───────────────┘
         │
         └── Finish ────▶ Task Complete
                                │
                                ▼
                       ┌─────────────────┐
                       │ Display Results │
                       └─────────────────┘
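The loop above can be sketched in TypeScript. This is a minimal sketch under stated assumptions, not the code in App.tsx. The `AgentDeps` interface, the `maxSteps` cap, and the format of the history lines are all hypothetical. Only the status values and action names come from this page (see TypeScript Interfaces). The dependencies are injected, so the loop can run without a browser.

```typescript
// Mirrors AgentStatus and AgentAction from types.ts; AgentStatus is written as a
// const object so this sketch has no other dependencies.
const AgentStatus = {
  IDLE: "Idle",
  ANALYZING_DOM: "Analyzing page...",
  THINKING: "Thinking...",
  EXECUTING: "Executing action...",
  WAITING_FOR_NAV: "Navigating...",
  FINISHED: "Complete",
  ERROR: "Error",
} as const;
type AgentStatus = (typeof AgentStatus)[keyof typeof AgentStatus];

interface AgentAction {
  thought_process: string;
  action: "navigate" | "click" | "type" | "submit" | "finish";
  selector?: string;
  value?: string;
  url?: string;
}

// Hypothetical dependencies; the extension would back these with the content
// script (DOM Observer), GeminiService, and chrome.tabs.
interface AgentDeps {
  ensureValidTab(): Promise<void>; // "Valid Tab?" / "Create New Tab"
  analyzeDom(): Promise<string>; // semantic DOM as JSON text
  decide(dom: string, goal: string, history: string): Promise<AgentAction>;
  navigate(url: string): Promise<void>; // update URL, then wait for load
  execute(action: AgentAction): Promise<{ success: boolean; message: string }>;
  onStatus(status: AgentStatus): void; // drives the side-panel status text
}

export async function runAgent(goal: string, deps: AgentDeps, maxSteps = 20): Promise<string[]> {
  const history: string[] = [];
  try {
    await deps.ensureValidTab();
    for (let step = 0; step < maxSteps; step++) {
      deps.onStatus(AgentStatus.ANALYZING_DOM);
      const dom = await deps.analyzeDom();

      deps.onStatus(AgentStatus.THINKING);
      const next = await deps.decide(dom, goal, history.join("\n"));

      if (next.action === "finish") {
        deps.onStatus(AgentStatus.FINISHED);
        return history;
      }
      if (next.action === "navigate") {
        if (!next.url) throw new Error("navigate action without a url");
        deps.onStatus(AgentStatus.WAITING_FOR_NAV);
        await deps.navigate(next.url);
        history.push(`navigate ${next.url}`);
      } else {
        deps.onStatus(AgentStatus.EXECUTING);
        const result = await deps.execute(next);
        // Failures are recorded rather than thrown, so the model can try another approach.
        history.push(`${next.action} ${next.selector ?? ""}: ${result.message}`);
      }
    }
    throw new Error(`Task not finished after ${maxSteps} steps`);
  } catch (error) {
    deps.onStatus(AgentStatus.ERROR);
    throw error;
  }
}
```

Every non-finish branch returns to DOM analysis, just as the arrows return to "Analyze DOM" in the diagram. The step cap is an assumption that stops a confused model from looping forever.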

Component Architecture

┌──────────────────┐
│  Side Panel UI   │
└────────┬─────────┘
         │
         ▼
    ┌─────────┐
    │App.tsx  │
    └────┬────┘
         │
         ├────────────────┬─────────────────┬──────────────────┐
         │                │                 │                  │
         ▼                ▼                 ▼                  ▼
┌────────────────┐  ┌──────────┐  ┌──────────────┐  ┌──────────────────┐
│ GeminiService  │  │   DOM    │  │ChatMessage   │  │  Settings UI     │
│                │  │ Observer │  │    List      │  │                  │
└───────┬────────┘  └────┬─────┘  └──────────────┘  └────────┬─────────┘
        │                │                                    │
        ▼                ▼                                    ▼
┌────────────────┐  ┌──────────┐                    ┌──────────────────┐
│  Gemini API    │  │ Content  │                    │  Storage API     │
│                │  │  Script  │                    │                  │
└────────────────┘  └────┬─────┘                    └──────────────────┘
                         │
                         ▼
                   ┌──────────┐
                   │ Web Page │
                   └──────────┘

┌──────────────────┐
│ Background       │
│ Service Worker   │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ Tab Management   │
└──────────────────┘

Data Flow

sequenceDiagram
    participant U as User
    participant UI as Side Panel
    participant A as Agent Loop
    participant D as DOM Observer
    participant G as Gemini AI
    participant P as Web Page
    U->>UI: Enter command
    UI->>A: Start agent
    A->>D: Request DOM
    D->>P: Extract semantic DOM
    P-->>D: Return simplified DOM
    D-->>A: DOM structure
    A->>G: Analyze + decide
    G-->>A: Next action
    A->>P: Execute action
    P-->>A: Action result
    A->>UI: Update status
    Note over A,G: Loop continues until task complete

Core Modules

App.tsx

Main React component managing agent state, UI, and orchestration of the autonomous loop.

GeminiService.ts

AI inference layer communicating with Gemini API for decision-making and action planning.

domObserver.ts

Semantic DOM extraction using anchoring techniques to identify actionable elements.

background.js

Service worker handling tab lifecycle, navigation events, and extension lifecycle.

Code Examples

Basic Agent Invocation

// User types in natural language
const userInput = "Go to Wikipedia and search for artificial intelligence";

// Agent starts autonomous loop
startAgent(userInput);

// Agent will:
// 1. Navigate to wikipedia.org
// 2. Find search input
// 3. Type the query
// 4. Submit the form
// 5. Report completion

Semantic DOM Extraction

// domObserver.ts - Semantic anchoring
export function getSemanticDOM(): string {
  const elements = document.querySelectorAll('a, button, input, textarea, select');

  const anchors = Array.from(elements).map((el, idx) => {
    const tag = el.tagName.toLowerCase();
    const text = el.textContent?.trim().slice(0, 50);
    const id = el.id;
    const name = el.getAttribute('name');
    const type = el.getAttribute('type');

    return {
      index: idx,
      tag,
      text,
      id,
      name,
      type,
      selector: generateSelector(el) // helper not shown on this page
    };
  });

  return JSON.stringify(anchors, null, 2);
}
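`getSemanticDOM` calls `generateSelector`, which is not shown here. The following hypothetical version assumes the goal is a selector that `document.querySelector` can resolve back to the same element. It uses a plain `id` when one is safe to use. Otherwise it walks up to the nearest ancestor with an id (or the root) using `:nth-child` steps. It is typed against a small structural interface that real DOM `Element`s should fit, so you can test it without a browser. It assumes ids are unique, which real pages do not always guarantee.

```typescript
// Hypothetical sketch of generateSelector; DOM Elements provide all four fields.
interface SelectorNode {
  id: string;
  tagName: string;
  parentElement: SelectorNode | null;
  children: ArrayLike<SelectorNode>;
}

// Ids made of plain identifier characters can be written as "#id" without
// escaping. In a browser, CSS.escape(id) would cover the general case.
const SAFE_ID = /^[A-Za-z][\w-]*$/;

export function generateSelector(el: SelectorNode): string {
  const steps: string[] = [];
  let node: SelectorNode | null = el;
  while (node) {
    if (node.id && SAFE_ID.test(node.id)) {
      steps.unshift(`#${node.id}`); // an id anchors the rest of the path
      break;
    }
    const tag = node.tagName.toLowerCase();
    const parent: SelectorNode | null = node.parentElement;
    if (!parent) {
      steps.unshift(tag); // reached the root element (<html>)
      break;
    }
    // :nth-child is 1-based and counts all element siblings, not just same-tag ones.
    const position = Array.prototype.indexOf.call(parent.children, node) + 1;
    steps.unshift(`${tag}:nth-child(${position})`);
    node = parent;
  }
  return steps.join(" > ");
}
```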

AI Decision Making

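// gemini.ts - Get next action from AI
// Assumes: import { GoogleGenAI, Type } from '@google/genai';
//          import type { AgentAction } from './types';
// with this.ai = new GoogleGenAI({ apiKey }) and SYSTEM_PROMPT defined elsewhere.
async getNextAction(dom: string, goal: string, history: string): Promise<AgentAction> {
  const response = await this.ai.models.generateContent({
    model: 'gemini-2.5-flash',
    contents: `
      GOAL: ${goal}
      HISTORY: ${history}
      CURRENT DOM: ${dom}

      What is the next step?
    `,
    config: {
      systemInstruction: SYSTEM_PROMPT,
      responseMimeType: "application/json",
      responseSchema: {
        type: Type.OBJECT,
        properties: {
          thought_process: { type: Type.STRING },
          // Restrict the model to the five supported actions
          action: {
            type: Type.STRING,
            enum: ['navigate', 'click', 'type', 'submit', 'finish']
          },
          selector: { type: Type.STRING },
          value: { type: Type.STRING },
          url: { type: Type.STRING }
        },
        required: ['thought_process', 'action']
      }
    }
  });

  // response.text is undefined when the model returns no text (e.g. a blocked reply)
  const text = response.text;
  if (!text) {
    throw new Error('Empty response from Gemini');
  }
  return JSON.parse(text) as AgentAction;
}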

Executing Actions on Page

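// domObserver.ts - Execute action in page context
export function executeAction(
  action: string,
  selector: string,
  value: string
): { success: boolean; message: string } {
  try {
    const element = document.querySelector(selector);

    if (!element) {
      return { success: false, message: `Element not found: ${selector}` };
    }

    switch (action) {
      case 'click':
        (element as HTMLElement).click();
        return { success: true, message: 'Clicked element' };

      case 'type': {
        if (!(element instanceof HTMLInputElement || element instanceof HTMLTextAreaElement)) {
          return { success: false, message: `Element is not a text field: ${selector}` };
        }
        // Use the native value setter so frameworks that track input values
        // (e.g. React) register the change, then fire the events they listen for.
        const proto = element instanceof HTMLInputElement
          ? HTMLInputElement.prototype
          : HTMLTextAreaElement.prototype;
        Object.getOwnPropertyDescriptor(proto, 'value')?.set?.call(element, value);
        element.dispatchEvent(new Event('input', { bubbles: true }));
        element.dispatchEvent(new Event('change', { bubbles: true }));
        return { success: true, message: `Typed "${value}"` };
      }

      case 'submit':
        if (element instanceof HTMLFormElement) {
          // requestSubmit() fires the submit event and runs validation;
          // form.submit() would bypass the page's JavaScript submit handlers.
          element.requestSubmit();
        } else {
          (element as HTMLElement).click();
        }
        return { success: true, message: 'Submitted form' };

      default:
        // Other actions (e.g. 'navigate', 'finish') are not handled in the page
        return { success: false, message: `Unknown action: ${action}` };
    }
  } catch (error) {
    // catch variables are `unknown` under strict TypeScript
    const message = error instanceof Error ? error.message : String(error);
    return { success: false, message };
  }
}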

Agent Action Schema

{
  "thought_process": "The user wants to search Wikipedia. I need to navigate to wikipedia.org first.",
  "action": "navigate",
  "url": "https://www.wikipedia.org",
  "selector": null,
  "value": null
}

// After navigation completes...
{
  "thought_process": "I'm on Wikipedia homepage. I can see a search input with id 'searchInput'. I'll type the query.",
  "action": "type",
  "selector": "#searchInput",
  "value": "artificial intelligence",
  "url": null
}

// After typing...
{
  "thought_process": "Query typed. Now I'll submit the search form.",
  "action": "submit",
  "selector": "form[role='search']",
  "value": null,
  "url": null
}

// After results load...
{
  "thought_process": "Search completed successfully. Results are displayed.",
  "action": "finish",
  "selector": null,
  "value": null,
  "url": null
}
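Each step above depends on what came before, and `getNextAction` receives that context as its `history` string. This page does not show how that string is built. One plausible, hypothetical format is a numbered log of recent actions and their results. A log like this lets the model see, for example, that it has typed a query but not yet submitted it. `formatHistory`, `HistoryEntry`, and the 10-entry window are assumptions. The nullable fields match the `null` values in the examples above.

```typescript
// Hypothetical builder for the HISTORY string sent to getNextAction.
interface AgentAction {
  thought_process: string;
  action: "navigate" | "click" | "type" | "submit" | "finish";
  selector?: string | null;
  value?: string | null;
  url?: string | null;
}

interface HistoryEntry {
  action: AgentAction;
  result: string; // e.g. the message returned by executeAction
}

export function formatHistory(entries: HistoryEntry[], maxEntries = 10): string {
  if (entries.length === 0) return "(no actions yet)";
  // Keep only the most recent steps so long tasks do not bloat the prompt.
  const start = Math.max(0, entries.length - maxEntries);
  return entries
    .slice(start)
    .map(({ action, result }, i) => {
      const parts = [`${start + i + 1}.`, action.action];
      const target = action.url ?? action.selector; // null/undefined skip to the next field
      if (target) parts.push(target);
      if (action.value) parts.push(`"${action.value}"`);
      return `${parts.join(" ")} -> ${result}`;
    })
    .join("\n");
}
```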

API Reference

Agent Actions

navigate

Navigate to a specified URL

{ "action": "navigate", "url": "https://example.com" }

click

Click an element by selector

{ "action": "click", "selector": "button.submit" }

type

Type text into an input element

{ "action": "type", "selector": "#search", "value": "query text" }

submit

Submit a form by selector

{ "action": "submit", "selector": "form#login" }

finish

Mark task as complete

{ "action": "finish", "thought_process": "Task completed successfully" }
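The model's reply is untrusted text, so it is worth checking it against these five actions before anything runs in the page. The following is a minimal sketch. It assumes the required fields are the ones the examples show: `url` for navigate, `selector` for click, type, and submit, and `value` for type. `parseAgentAction` is a hypothetical name. The sketch also turns the `null` fields in the Agent Action Schema examples into missing fields, matching the optional properties of the `AgentAction` interface.

```typescript
// Hypothetical validator for the model's JSON reply.
type ActionName = "navigate" | "click" | "type" | "submit" | "finish";

interface AgentAction {
  thought_process: string;
  action: ActionName;
  selector?: string;
  value?: string;
  url?: string;
}

const ACTIONS: readonly string[] = ["navigate", "click", "type", "submit", "finish"];
const NEEDS_SELECTOR: readonly string[] = ["click", "type", "submit"];

// Strings pass through; null, missing, or non-string values become undefined.
function stringField(obj: Record<string, unknown>, key: string): string | undefined {
  const v = obj[key];
  return typeof v === "string" ? v : undefined;
}

export function parseAgentAction(text: string): AgentAction {
  const raw: unknown = JSON.parse(text); // throws SyntaxError on malformed JSON
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("Expected a JSON object");
  }
  const obj = raw as Record<string, unknown>;
  const action = stringField(obj, "action");
  if (action === undefined || !ACTIONS.includes(action)) {
    throw new Error(`Unknown action: ${String(obj.action)}`);
  }

  const parsed: AgentAction = {
    thought_process: stringField(obj, "thought_process") ?? "",
    action: action as ActionName,
    selector: stringField(obj, "selector"),
    value: stringField(obj, "value"),
    url: stringField(obj, "url"),
  };

  // Per-action requirements, following the examples above.
  if (parsed.action === "navigate" && !parsed.url) {
    throw new Error("navigate requires a url");
  }
  if (NEEDS_SELECTOR.includes(parsed.action) && !parsed.selector) {
    throw new Error(`${parsed.action} requires a selector`);
  }
  if (parsed.action === "type" && parsed.value === undefined) {
    throw new Error("type requires a value");
  }
  return parsed;
}
```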

Agent Status States

IDLE

Agent waiting for user input

ANALYZING_DOM

Extracting page structure

THINKING

AI deciding next action

EXECUTING

Performing action on page

WAITING_FOR_NAV

Waiting for page navigation

FINISHED

Task completed successfully

ERROR

Error occurred, agent stopped
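In the `WAITING_FOR_NAV` state, the agent pauses until the new page is ready and only then analyzes the DOM again. One way to implement that pause is a small polling helper with a timeout. The `waitFor` helper below is hypothetical. In the extension, the probe might check whether `chrome.tabs.get(tabId)` reports `status === "complete"`, which is an assumption about how the real code waits.

```typescript
// Hypothetical polling helper: resolves true once `probe` returns true,
// or false if `timeoutMs` passes first. Accepts sync or async probes.
export async function waitFor(
  probe: () => boolean | Promise<boolean>,
  timeoutMs = 10_000,
  intervalMs = 100,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await probe()) return true;
    if (Date.now() >= deadline) return false;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Possible use in the extension (assumes the "tabs" permission):
// const loaded = await waitFor(async () => (await chrome.tabs.get(tabId)).status === "complete");
```

Returning `false` instead of throwing lets the loop decide whether a slow page counts as an error or just gets analyzed as-is.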

TypeScript Interfaces

// types.ts
export enum AgentStatus {
  IDLE = 'Idle',
  ANALYZING_DOM = 'Analyzing page...',
  THINKING = 'Thinking...',
  EXECUTING = 'Executing action...',
  WAITING_FOR_NAV = 'Navigating...',
  FINISHED = 'Complete',
  ERROR = 'Error'
}

export interface AgentAction {
  thought_process: string;
  action: 'navigate' | 'click' | 'type' | 'submit' | 'finish';
  selector?: string;
  value?: string;
  url?: string;
}

export interface ChatMessage {
  role: 'user' | 'model' | 'system';
  content: string;
  timestamp: number;
  type: 'text' | 'action_log';
  metadata?: AgentAction;
}