Use natural language to control your browser, powered by an LLM and Playwright
- ✨ Use natural language to write and run e2e test cases
- 🧪 Generate traditional e2e test cases after executing the cases
- 💭 More features are coming soon! Feel free to open an issue or submit a pull request
demo.mp4
User Task
Click search and input "Web API", press "arrow down" once to select the second result. Then press "ENTER" to search it. Find "Keyboard API" nearby title "K" and click it
- Only the claude-3-5-sonnet LLM is supported for now
- Only the Playwright framework is supported for now
- Node.js >= 18
An example is provided in the example directory. Follow these steps to try it:
# clone the repository
git clone https://github.com/lvqq/intelli-browser.git
# install dependencies and browsers
pnpm i
# build all packages
pnpm -r build
# add your ANTHROPIC_API_KEY
# run demo
pnpm run demo
# use npm
npm install @intelli-browser/core
# use yarn
yarn add @intelli-browser/core
# use pnpm
pnpm add @intelli-browser/core
import { IntelliBrowser } from '@intelli-browser/core';
const client = new IntelliBrowser({
apiKey: '', // add apiKey or provide ANTHROPIC_API_KEY in .env file
})
await client.run({
page, // playwright Page instance
message: 'Click search and input "Web API", press "arrow down" to select the second result. then press "ENTER" to search it', // user prompt
})
If you want to generate traditional E2E test cases after execution, use the value returned by client.run:
import { IntelliBrowser } from '@intelli-browser/core';
const client = new IntelliBrowser({
apiKey: '', // add apiKey or provide ANTHROPIC_API_KEY in .env file
})
// returns the generated e2e cases as a string array
const e2e = await client.run({
page, // playwright Page instance
message: 'Click search and input "Web API", press "arrow down" to select the second result. then press "ENTER" to search it', // user prompt
})
console.log(e2e)
For the task shown in the demo video, the returned array is:
[
'await page.mouse.move(1241.61, 430.2)',
'await page.waitForTimeout(2266)',
'await page.mouse.down()',
'await page.mouse.up()',
'await page.waitForTimeout(3210)',
"await page.mouse.type('Web API')",
'await page.waitForTimeout(3064)',
"await page.keyboard.press('ArrowDown')",
'await page.waitForTimeout(2917)',
"await page.keyboard.press('Enter')",
'await page.waitForTimeout(6471)',
"await page.keyboard.press('PageDown')",
'await page.waitForTimeout(7021)',
'await page.mouse.move(687.39, 923.4)',
'await page.waitForTimeout(4501)',
'await page.mouse.down()',
'await page.mouse.up()'
]
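The returned strings are plain Playwright statements, so they can be pasted into a test body. A minimal sketch of that step is shown below. The helper toPlaywrightSpec is hypothetical and not part of @intelli-browser/core, the URL is a placeholder, and it assumes the test runner is @playwright/test.

```typescript
// Hypothetical helper: not part of @intelli-browser/core.
// Wraps the step strings returned by client.run in a Playwright test file.
function toPlaywrightSpec(title: string, url: string, steps: string[]): string {
  const body = steps
    .map((step) => step.trim())
    .filter((step) => step.length > 0)
    // Each step becomes a statement, so make sure it ends with a semicolon.
    .map((step) => `  ${step.endsWith(';') ? step : step + ';'}`);

  return [
    "import { test } from '@playwright/test';",
    '',
    `test(${JSON.stringify(title)}, async ({ page }) => {`,
    `  await page.goto(${JSON.stringify(url)});`,
    ...body,
    '});',
    '',
  ].join('\n');
}

// Placeholder URL and a shortened step list, for illustration only.
const spec = toPlaywrightSpec('search Web API', 'https://example.com', [
  "await page.keyboard.press('ArrowDown')",
  'await page.waitForTimeout(2917)',
  "await page.keyboard.press('Enter')",
]);
console.log(spec);
```

Because the recorded steps use absolute mouse coordinates, they will likely replay correctly only at the viewport size used during the original run.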
By default, LLM conversations and actions are logged, as the demo shows. To turn logging off, set verbose to false:
import { IntelliBrowser } from '@intelli-browser/core';
const client = new IntelliBrowser({
apiKey: '', // add apiKey or provide ANTHROPIC_API_KEY in .env file
verbose: false, // don't log conversations and actions
})
By default, the context is cleared after each client.run call to save tokens. If you want to retain the context across calls, set autoClean to false:
import { IntelliBrowser } from '@intelli-browser/core';
const client = new IntelliBrowser({
apiKey: '', // add apiKey or provide ANTHROPIC_API_KEY in .env file
autoClean: false, // don't auto clean context
})
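The trade-off behind autoClean can be illustrated with a toy model. The sketch below is not the library's implementation: ConversationContext and its run method are hypothetical, and it simply counts how many messages would be sent to the LLM on each call.

```typescript
// Hypothetical model of the autoClean option; the library's internals may differ.
type ChatMessage = { role: 'user' | 'assistant'; content: string };

class ConversationContext {
  private messages: ChatMessage[] = [];
  private autoClean: boolean;

  constructor(autoClean = true) {
    this.autoClean = autoClean;
  }

  // Simulates one run: returns how many messages the LLM would receive.
  run(prompt: string): number {
    this.messages.push({ role: 'user', content: prompt });
    const sent = this.messages.length;
    this.messages.push({ role: 'assistant', content: `done: ${prompt}` });
    // With autoClean on, the next run starts from an empty history.
    if (this.autoClean) this.messages = [];
    return sent;
  }
}

const retained = new ConversationContext(false);
const cleaned = new ConversationContext(true);
for (const prompt of ['open search', 'type "Web API"', 'press Enter']) {
  console.log(retained.run(prompt), cleaned.run(prompt));
}
```

In this model the retained context grows with every call while the cleaned one stays constant, which is the token cost that the default setting avoids.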
- Inspired by the claude-3.5-sonnet computer use function, it simulates browser use and combines it with e2e test cases
- The user prompt and page info are sent to the LLM, which analyzes the page content and interactive elements
- Intelli-Browser executes the actions planned by the LLM and feeds the results back to it
- The run ends when the LLM plans no further action or determines that the task cannot be achieved
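The steps above amount to an observe, plan, act loop. The following is a rough sketch of that loop under stated assumptions: every name in it (runTask, BrowserAction, Plan, Planner, Executor, maxSteps) is hypothetical, the planner stands in for the LLM call, and it is kept synchronous for brevity where a real LLM call would be asynchronous.

```typescript
// All names here are hypothetical; this is not the library's internal API.
type BrowserAction = { kind: 'click' | 'type' | 'press'; detail: string };
type Plan =
  | { status: 'act'; action: BrowserAction }
  | { status: 'done' }
  | { status: 'give_up'; reason: string };

type Planner = (task: string, pageInfo: string, feedback: string[]) => Plan;
type Executor = (action: BrowserAction) => string; // returns feedback for the planner

function runTask(
  task: string,
  observe: () => string,
  plan: Planner,
  execute: Executor,
  maxSteps = 20,
): { ok: boolean; steps: BrowserAction[] } {
  const feedback: string[] = [];
  const steps: BrowserAction[] = [];
  for (let i = 0; i < maxSteps; i++) {
    // 1. Send the user prompt and current page info to the planner (the LLM).
    const next = plan(task, observe(), feedback);
    // 2. Stop when nothing is left to do or the goal cannot be reached.
    if (next.status === 'done') return { ok: true, steps };
    if (next.status === 'give_up') return { ok: false, steps };
    // 3. Execute the planned action and feed the result back.
    feedback.push(execute(next.action));
    steps.push(next.action);
  }
  return { ok: false, steps }; // step budget exhausted
}

// A scripted planner replaces the LLM for this example.
const script: BrowserAction[] = [
  { kind: 'click', detail: 'search box' },
  { kind: 'type', detail: 'Web API' },
  { kind: 'press', detail: 'Enter' },
];
const result = runTask(
  'Search for Web API',
  () => 'page snapshot',
  (_task, _info, feedback) =>
    feedback.length < script.length
      ? { status: 'act', action: script[feedback.length] }
      : { status: 'done' },
  (action) => `${action.kind} ${action.detail}: ok`,
);
console.log(result.ok, result.steps.length);
```

The maxSteps cap is an addition for safety in this sketch; the source does not say whether the library bounds the number of actions.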
Released under the MIT License