Agent Evaluation Report

Eval ID: bu-benchmark-p22ml
Generated: 2026-05-20T01:37:02.709Z
Agent Type: pilo
LLM Provider: vertex
Model: gemini-2.5-flash
Browser: chrome
Completed Tasks: 20 / 20
Successful (Verdict): 7
Success Rate: 35%
Total Tokens: 1293117
Total Events: 1576

Failure Classifications

agent_gave_up_early: 10
element_interaction_failed: 3

7 successful, 13 failed. Files: failures.txt, analysis-summary.md
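The headline figures follow directly from the verdict tally. A minimal Python sketch of that arithmetic, assuming the per-task verdicts are available as a plain list (the `verdicts` list and the `summarize` helper are illustrative names, not part of the report tooling):

```python
from collections import Counter

# Verdict counts as reported for this run (7 successes, 13 failures).
verdicts = (
    ["SUCCESS"] * 7
    + ["agent_gave_up_early"] * 10
    + ["element_interaction_failed"] * 3
)

def summarize(verdicts):
    """Tally verdicts and derive the success rate as a percentage."""
    counts = Counter(verdicts)
    total = len(verdicts)
    successful = counts.get("SUCCESS", 0)
    # Every verdict other than SUCCESS is treated as a failure classification.
    failures = {k: v for k, v in counts.items() if k != "SUCCESS"}
    rate = 100 * successful / total if total else 0.0
    return {
        "total": total,
        "successful": successful,
        "failed": total - successful,
        "success_rate_pct": rate,
        "failure_classifications": failures,
    }

summary = summarize(verdicts)
print(summary)
```

With the counts above this gives 7 of 20 tasks successful, which is the 35% success rate shown in the summary.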

Task Results

| Task ID | Website | Question | Verdict | Tokens | Events | Duration |
|---|---|---|---|---|---|---|
| browser-use-benchmark--296beb37-09d9-49f6-b644-1efa0383a483 | 296beb37-09d9-49f6-b644-1efa0383a483 | Go to the URL and complete the Ember form by filling in all required fields and submitting. If needed, create a file. Validate that the form was filled and submitted successfully. https://browser-use.github.io/stress-tests/challenges/ember-form.html | agent_gave_up_early | 50567 | 80 | 8.2s |
| browser-use-benchmark--bb85cb1e-679f-455d-95e5-edd421ea8205 | bb85cb1e-679f-455d-95e5-edd421ea8205 | Go to the URL and complete the React Hook Form by filling in all required fields and submitting. If needed, create a file. Validate that the form was filled and submitted successfully. https://browser-use.github.io/stress-tests/challenges/react-hook-form.html | agent_gave_up_early | 56728 | 81 | 10.4s |
| browser-use-benchmark--fe521e79-3b4d-4218-8bf9-f3421e44cb5c | fe521e79-3b4d-4218-8bf9-f3421e44cb5c | Go to the URL and complete the Formik form by filling in all required fields and submitting. If needed, create a file. Validate that the form was filled and submitted successfully. https://browser-use.github.io/stress-tests/challenges/formik-form.html | agent_gave_up_early | 47744 | 72 | 9.8s |
| browser-use-benchmark--a34dfbf1-3e36-4099-bb52-f6c688453375 | a34dfbf1-3e36-4099-bb52-f6c688453375 | Go to the URL and complete the jQuery Bootstrap form by filling in all required fields and submitting. If needed, create a file. Validate that the form was filled and submitted successfully. https://browser-use.github.io/stress-tests/challenges/jquery-bootstrap-form.html | agent_gave_up_early | 81524 | 96 | 16.3s |
| browser-use-benchmark--0a8d83a8-32c5-4609-ac78-2e8c784315bb | 0a8d83a8-32c5-4609-ac78-2e8c784315bb | Go to the URL and complete the Shadow DOM form by filling in all required fields and submitting. If needed, create a file. Validate that the form was filled and submitted successfully. https://browser-use.github.io/stress-tests/challenges/shadow-dom-form.html | SUCCESS | 82341 | 104 | 4.8s |
| browser-use-benchmark--7fa92efb-8237-4c81-82dc-e0a5bdc0b675 | 7fa92efb-8237-4c81-82dc-e0a5bdc0b675 | Go to the URL and complete the Vue form by filling in all required fields and submitting. If needed, create a file. Validate that the form was filled and submitted successfully. https://browser-use.github.io/stress-tests/challenges/vue-form.html | element_interaction_failed | 63433 | 88 | 21.3s |
| browser-use-benchmark--97c99135-3aff-4831-af24-c9fcf8d92ef7 | 97c99135-3aff-4831-af24-c9fcf8d92ef7 | Go to the URL and complete the Wufoo-style form by filling in all required fields and submitting. If needed, create a file. Validate that the form was filled and submitted successfully. https://browser-use.github.io/stress-tests/challenges/wufoo-style-form.html | agent_gave_up_early | 62407 | 80 | 6.8s |
| browser-use-benchmark--78bdcade-3ac8-46a9-bd6f-1f4c1c219a5f | 78bdcade-3ac8-46a9-bd6f-1f4c1c219a5f | Go to the URL and complete the form with hidden labels by filling in all required fields and submitting. If needed, create a file. Validate that the form was filled and submitted successfully. https://browser-use.github.io/stress-tests/challenges/hidden-labels-form.html | agent_gave_up_early | 40855 | 64 | 10.3s |
| browser-use-benchmark--9be48103-a449-4247-9177-d6e90c76576e | 9be48103-a449-4247-9177-d6e90c76576e | Go to the URL and complete the iframe inception challenge level 3. https://browser-use.github.io/stress-tests/challenges/iframe-inception-level3.html | SUCCESS | 73424 | 96 | 2.9s |
| browser-use-benchmark--cecf0cdc-87eb-44ec-9c8e-17c8335afa5f | cecf0cdc-87eb-44ec-9c8e-17c8335afa5f | Go to the URL and complete the Material-UI form by filling in all required fields and submitting. If needed, create a file. Validate that the form was filled and submitted successfully. https://browser-use.github.io/stress-tests/challenges/material-ui-form.html | agent_gave_up_early | 76225 | 97 | 13.9s |
| browser-use-benchmark--45789896-66b8-4b38-810d-6fc839df03da | 45789896-66b8-4b38-810d-6fc839df03da | Go to the URL and complete the iframe inception challenge level 2. https://browser-use.github.io/stress-tests/challenges/iframe-inception-level2.html | SUCCESS (stale refs ×3) | 304677 | 192 | 3.4s |
| browser-use-benchmark--7d17ab49-6539-40d0-a67d-68f12958620f | 7d17ab49-6539-40d0-a67d-68f12958620f | Go to the URL and drag the slider completely to the right to complete the task. https://browser-use.github.io/stress-tests/challenges/slider-drag.html | SUCCESS | 8781 | 24 | 3.5s |
| browser-use-benchmark--4df60129-66b6-49e4-979d-f201930d732a | 4df60129-66b6-49e4-979d-f201930d732a | Go to the URL and complete the AngularJS form by filling in all required fields and submitting. If needed, create a file. Validate that the form was filled and submitted successfully. https://browser-use.github.io/stress-tests/challenges/angularjs-form.html | agent_gave_up_early | 50473 | 72 | 8.9s |
| browser-use-benchmark--53cd3515-d11a-4340-b352-e73f49e70d09 | 53cd3515-d11a-4340-b352-e73f49e70d09 | Go to the URL and complete the Svelte form by filling in all required fields and submitting. If needed, create a file. Validate that the form was filled and submitted successfully. https://browser-use.github.io/stress-tests/challenges/svelte-form.html | element_interaction_failed | 53768 | 80 | 9.0s |
| browser-use-benchmark--2a8e2322-f7b3-418e-a871-0819dcc55474 | 2a8e2322-f7b3-418e-a871-0819dcc55474 | Go to the URL and complete the table-based form by filling in all required fields and submitting. If needed, create a file. Validate that the form was filled and submitted successfully. https://browser-use.github.io/stress-tests/challenges/table-form.html | agent_gave_up_early | 60842 | 72 | 7.9s |
| browser-use-benchmark--7db8369d-727b-485e-9d7a-e0f0ecdd964d | 7db8369d-727b-485e-9d7a-e0f0ecdd964d | Go to the URL and complete the iframe inception challenge level 1. https://browser-use.github.io/stress-tests/challenges/iframe-inception-level1.html | SUCCESS (stale refs ×3) | 26102 | 40 | 3.7s |
| browser-use-benchmark--36f4e2db-4387-4163-99c3-c221d63a9733 | 36f4e2db-4387-4163-99c3-c221d63a9733 | Go to the URL and read the text from the canvas CAPTCHA and enter it to complete the task. https://browser-use.github.io/stress-tests/challenges/canvas-captcha.html | SUCCESS (CAPTCHA) | 8923 | 24 | 2.9s |
| browser-use-benchmark--c1f60ee6-506c-4ddb-b4eb-64106930667b | c1f60ee6-506c-4ddb-b4eb-64106930667b | Go to the URL and complete the form with non-Latin characters by filling in all required fields and submitting. If needed, create a file. Validate that the form was filled and submitted successfully. https://browser-use.github.io/stress-tests/challenges/non-latin-form.html | element_interaction_failed | 57888 | 80 | 18.5s |
| browser-use-benchmark--9ba7ecfc-e5ad-43d3-9e0d-e380bb8891b6 | 9ba7ecfc-e5ad-43d3-9e0d-e380bb8891b6 | Go to the URL and complete the animated form by filling in all required fields and submitting. If needed, create a file. Validate that the form was filled and submitted successfully. https://browser-use.github.io/stress-tests/challenges/animated-form.html | SUCCESS | 40741 | 62 | 4.9s |
| browser-use-benchmark--755c9f4c-02e3-47fe-99ff-847a8037e227 | 755c9f4c-02e3-47fe-99ff-847a8037e227 | Go to the URL and complete the rich text form by filling in all required fields including the rich text editor. If needed, create a file. Validate that the form was filled and submitted successfully. https://browser-use.github.io/stress-tests/challenges/rich-text-form.html | agent_gave_up_early | 45674 | 72 | 18.2s |
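The per-task Tokens and Events figures above can be cross-checked against the Total Tokens and Total Events reported in the summary. A small Python sketch of that check, assuming the rows have been transcribed into `(tokens, events)` pairs in table order (the `rows` list and variable names are illustrative, not part of the report tooling):

```python
# (tokens, events) per task, transcribed in the order the tasks are listed.
rows = [
    (50567, 80), (56728, 81), (47744, 72), (81524, 96), (82341, 104),
    (63433, 88), (62407, 80), (40855, 64), (73424, 96), (76225, 97),
    (304677, 192), (8781, 24), (50473, 72), (53768, 80), (60842, 72),
    (26102, 40), (8923, 24), (57888, 80), (40741, 62), (45674, 72),
]

total_tokens = sum(tokens for tokens, _ in rows)
total_events = sum(events for _, events in rows)

# The single most token-heavy task and its share of the run's total.
max_tokens = max(tokens for tokens, _ in rows)
max_share_pct = 100 * max_tokens / total_tokens

print(f"tasks={len(rows)} tokens={total_tokens} events={total_events}")
print(f"largest task: {max_tokens} tokens ({max_share_pct:.1f}% of total)")
```

Both sums agree with the reported totals (1293117 tokens, 1576 events). The largest single task is iframe inception level 2 at 304677 tokens.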