To initialize:
You can view the integration sample in quick-start.
In the following documentation, you may see functions called with the mid. prefix. If you use destructuring in Playwright, like async ({ ai, aiQuery }) => { /* ... */ }, you can call the functions without this prefix. It's just a matter of syntax.
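The two calling styles can be sketched as follows. This is a minimal sketch under stated assumptions: the AiFns type and the recording mid object are hypothetical stand-ins written for illustration, not Midscene's real fixtures, and they only log prompts instead of calling an AI service. The point is that both styles end up invoking the same function with the same prompt.

```typescript
// Hypothetical stand-in for the functions Midscene exposes; the real ones call an AI service.
type AiFns = {
  ai: (steps: string) => Promise<void>;
  aiQuery: (dataDemand: unknown) => Promise<unknown>;
};

// A stub "mid" object that only records the prompts it receives.
const calls: string[] = [];
const mid: AiFns = {
  ai: async (steps) => {
    calls.push(`ai: ${steps}`);
  },
  aiQuery: async (dataDemand) => {
    calls.push(`aiQuery: ${String(dataDemand)}`);
    return [];
  },
};

// Style 1: call through the mid. prefix.
async function withPrefix(): Promise<void> {
  await mid.ai('type "Headphones" in the search box, then hit Enter');
}

// Style 2: destructure the functions first, the way a Playwright test callback receives them.
async function withDestructuring({ ai, aiQuery }: AiFns): Promise<void> {
  await ai('type "Headphones" in the search box, then hit Enter');
  await aiQuery('string[], the titles of the search results');
}
```

Running withPrefix() and then withDestructuring(mid) records the same ai prompt twice, followed by the aiQuery prompt.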
.aiAction(steps: string) or .ai(steps: string) - Control the page
You can use .aiAction to perform a series of actions. It accepts a steps: string as a parameter, which describes the actions. In the prompt, you should clearly describe the steps. Midscene will take care of the rest.
.ai is the shortcut for .aiAction.
These are some good samples:
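A sketch of what such prompts might look like, assuming a to-do list page. The StubAgent class is hypothetical: it only records each prompt instead of planning and executing it with AI, and the prompts themselves are illustrative rather than taken from Midscene's samples. Each prompt names the target element, the input, and the follow-up action explicitly.

```typescript
// Hypothetical stub: records each prompt instead of sending it to the AI planner.
class StubAgent {
  readonly prompts: string[] = [];

  async aiAction(steps: string): Promise<void> {
    this.prompts.push(steps);
  }

  // .ai is a shortcut for .aiAction, so the stub simply delegates.
  async ai(steps: string): Promise<void> {
    return this.aiAction(steps);
  }
}

async function addAndCompleteTask(agent: StubAgent): Promise<void> {
  // Spell out the element, the text to enter, and the follow-up key press.
  await agent.aiAction('Type "Learn JS today" into the task input box, then press Enter to create the task');
  // Describe where the control is relative to other elements.
  await agent.ai('Move the mouse over the second item in the task list, then click the Delete button on its right');
  await agent.ai('Click the checkbox to the left of the first task to mark it as completed');
}
```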
Steps should always be clearly and thoroughly described. A very brief prompt like 'Tweet "Hello World"' will result in unstable performance and a high likelihood of failure.
Under the hood, Midscene will plan the detailed steps by sending your page context and a screenshot to the AI. After that, Midscene will execute the steps one by one. If Midscene deems it impossible to execute, an error will be thrown.
The main capabilities of Midscene are as follows, and your task will be split into these types. You can see them in the visualization tools:
Currently, Midscene can't plan steps that include conditions and loops.
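One way to work within this limitation is to keep the loop and the condition in ordinary code and give each AI call a single unconditional step. The sketch below assumes a to-do list page; the Agent interface mirrors the signatures documented on this page, while clearAllTasks and fakeAgent are hypothetical helpers, and the fake agent simulates page state so the sketch runs without a browser or AI service.

```typescript
// Hypothetical agent interface matching the signatures documented on this page.
interface Agent {
  aiAction(steps: string): Promise<void>;
  aiQuery(dataDemand: unknown): Promise<unknown>;
}

// The loop and the stop condition live in TypeScript; each AI call handles one step.
async function clearAllTasks(agent: Agent, maxRounds = 20): Promise<number> {
  let removed = 0;
  for (let round = 0; round < maxRounds; round++) {
    const remaining = (await agent.aiQuery("number, how many tasks are in the task list")) as number;
    if (remaining === 0) break; // evaluated by code, not by the AI planner
    await agent.aiAction("Move the mouse over the first task in the list, then click its Delete button");
    removed++;
  }
  return removed;
}

// A fake page state so the sketch runs without a browser or an AI service.
function fakeAgent(initialTasks: number): Agent {
  let tasks = initialTasks;
  return {
    async aiAction() {
      tasks = Math.max(0, tasks - 1);
    },
    async aiQuery() {
      return tasks;
    },
  };
}
```

The maxRounds cap keeps a misread count from looping forever.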
.aiQuery(dataDemand: any) - extract any data from page
You can extract customized data from the UI. As long as the multimodal AI can infer it, Midscene can return both data written directly on the page and data derived from "understanding" the page. The return value can be any valid JSON-compatible value, such as a string, number, object, or array. Just describe it in the dataDemand.
For example, to parse detailed information from page:
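A sketch of an object-shaped dataDemand, where each key is a field to extract and each value describes its meaning and type in plain language. The field names, the PageInfo interface, and the stub aiQuery (which returns a canned response instead of calling the AI) are all hypothetical. Because the model's output is not type-checked at runtime, the sketch validates the part it relies on.

```typescript
// Shape we expect back; it should agree with the descriptions in the demand below.
interface PageInfo {
  time: string;
  userInfo: { name: string };
  tableFields: string[];
}

// Each key names a field; each value describes its meaning and type.
const dataDemand = {
  time: "date and time shown in the page header, string",
  userInfo: "information about the signed-in user, {name: string}",
  tableFields: "field names in the header row of the table, string[]",
};

// Hypothetical stub for .aiQuery; a real call would send the demand and a screenshot to the AI.
async function aiQuery(_demand: Record<string, string>): Promise<unknown> {
  return { time: "2024-06-01 10:00", userInfo: { name: "Alice" }, tableFields: ["ID", "Name", "Status"] };
}

async function readPageInfo(): Promise<PageInfo> {
  const result = (await aiQuery(dataDemand)) as PageInfo;
  // The cast does not check anything at runtime, so validate what later code depends on.
  if (!Array.isArray(result.tableFields)) throw new Error("tableFields should be an array");
  return result;
}
```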
You can also describe the expected return value format as a plain string:
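A sketch of the plain-string form, assuming the convention of a leading type hint followed by a description of the content. The stub aiQuery is hypothetical and returns a canned list; the runtime check guards against the model returning a different shape.

```typescript
// Hypothetical stub for .aiQuery returning a canned answer instead of calling an AI service.
async function aiQuery(_dataDemand: string): Promise<unknown> {
  return ["Buy milk", "Learn JS today"];
}

async function readTaskNames(): Promise<string[]> {
  // "string[]" hints at the expected shape; the rest describes what to extract.
  const names = await aiQuery("string[], the names of all tasks in the list");
  if (!Array.isArray(names) || !names.every((n) => typeof n === "string")) {
    throw new Error("expected an array of strings");
  }
  return names;
}
```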
.aiAssert(assertion: string, errorMsg?: string) - do an assertion
.aiAssert works just like the normal assert method, except that the condition is a prompt string written in natural language. Midscene will call AI to determine if the assertion is true. If the condition is not met, an error will be thrown containing errorMsg and a detailed reason generated by AI.
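A sketch of the documented contract, not Midscene's implementation: the hypothetical Judge function stands in for the AI call, and the exact error message format is an assumption. What it illustrates is that a failed natural-language condition produces an error combining errorMsg with the AI's reason.

```typescript
// Hypothetical stand-in for the AI call that judges a natural-language condition.
type Judge = (assertion: string) => Promise<{ pass: boolean; reason: string }>;

// Builds an aiAssert-like function around a judge.
function makeAiAssert(judge: Judge) {
  return async (assertion: string, errorMsg?: string): Promise<void> => {
    const { pass, reason } = await judge(assertion);
    if (!pass) {
      // The error carries errorMsg plus the reason; the exact format here is an assumption.
      throw new Error(`${errorMsg ?? "Assertion failed"}: ${reason}`);
    }
  };
}

// Usage with a fake judge that rejects every assertion.
const aiAssert = makeAiAssert(async () => ({ pass: false, reason: "no price label is visible" }));
```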
Assertions are usually a very important part of your script. To guard against AI hallucinations (especially false negatives), you can also replace .aiAssert calls with .aiQuery plus normal JavaScript assertions.
For example, to replace the previous assertion:
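A sketch of the pattern, assuming a hypothetical shopping cart page and a stub aiQuery that returns canned prices. Instead of asking the model whether a condition holds, the code extracts the raw values and makes the pass/fail decision itself, so the verdict comes from deterministic code rather than from the model (the extraction itself can still be wrong).

```typescript
// Hypothetical stub for .aiQuery that returns a canned price list.
async function aiQuery(_dataDemand: string): Promise<unknown> {
  return [19.99, 5, 42];
}

// Replaces something like: aiAssert("every item in the cart has a price greater than 0")
async function checkCartPrices(): Promise<number[]> {
  const prices = await aiQuery("number[], the price of each item in the cart");
  if (!Array.isArray(prices) || prices.length === 0) {
    throw new Error("could not read any prices from the page");
  }
  // The condition is checked by ordinary code.
  for (const p of prices) {
    if (typeof p !== "number" || !(p > 0)) throw new Error(`unexpected price: ${String(p)}`);
  }
  return prices as number[];
}
```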
.aiWaitFor(assertion: string, {timeoutMs?: number, checkIntervalMs?: number }) - wait until the assertion is met
.aiWaitFor repeatedly checks whether your assertion has been met, and throws a timeout error if it is still unmet when timeoutMs elapses. To limit AI service cost, checks are started at most once every checkIntervalMs milliseconds. The default config sets timeoutMs to 15 seconds and checkIntervalMs to 3 seconds, i.e. if every check fails and the AI service always responds immediately, the assertion is checked at most 5 times.
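The timing described above can be sketched as a polling loop. This is an illustration of the documented behavior, not Midscene's implementation: waitFor, Clock, and fakeClock are hypothetical, and the simulated clock advances time on sleep() so the arithmetic can be checked without real waiting. With the defaults and a check that always fails instantly, checks start at 0, 3, 6, 9, and 12 seconds, and the loop stops at 15 seconds.

```typescript
// Minimal clock abstraction so the sketch can run without real waiting.
interface Clock {
  now(): number;
  sleep(ms: number): Promise<void>;
}

// A simulated clock: sleep() just advances the current time.
function fakeClock(): Clock {
  let t = 0;
  return {
    now: () => t,
    sleep: async (ms) => {
      t += ms;
    },
  };
}

// Re-run `check` until it passes or timeoutMs elapses, starting checks at least checkIntervalMs apart.
async function waitFor(
  check: () => Promise<boolean>,
  clock: Clock,
  { timeoutMs = 15000, checkIntervalMs = 3000 }: { timeoutMs?: number; checkIntervalMs?: number } = {},
): Promise<number> {
  const start = clock.now();
  let attempts = 0;
  while (clock.now() - start < timeoutMs) {
    const checkStart = clock.now();
    attempts++;
    if (await check()) return attempts;
    // Wait out the rest of the interval; a slow check uses up part of its own interval.
    const spent = clock.now() - checkStart;
    if (spent < checkIntervalMs) await clock.sleep(checkIntervalMs - spent);
  }
  throw new Error(`waitFor timed out after ${attempts} checks`);
}
```

If checks themselves take longer than checkIntervalMs, the loop starts the next one immediately, so fewer checks fit into the timeout.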
Because every check waits for the AI service to respond, .aiWaitFor may not be very efficient. A simple sleep can be a useful alternative to .aiWaitFor.
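A minimal sketch of the sleep alternative, assuming a page where a short animation must finish before a single check. The sleep helper is plain TypeScript; in a Playwright test, page.waitForTimeout(ms) serves the same purpose. The afterAnimation function and its injected aiAssert parameter are hypothetical.

```typescript
// Plain promise-based sleep.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(() => resolve(), ms));
}

// Wait a fixed time, then run one AI check instead of polling with aiWaitFor.
async function afterAnimation(aiAssert: (assertion: string) => Promise<void>): Promise<number> {
  const start = Date.now();
  await sleep(200);
  await aiAssert("the confirmation dialog is visible");
  return Date.now() - start; // elapsed milliseconds
}
```

The trade-off is a fixed delay that may be longer than necessary, in exchange for a single AI call.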
By setting MIDSCENE_DEBUG_AI_PROFILE, you can take a look at the time and token consumption of AI calls.
LangSmith is a platform for debugging LLM applications. To integrate LangSmith, follow these steps:
Launch Midscene, and you should see logs like this: