Google's Computer Use model is here!
Early this morning, Google DeepMind released Gemini 2.5 Computer Use, a computer-use model built on Gemini 2.5.
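Given that Google released Chrome DevTools (MCP) only a few days ago, the arrival of Gemini 2.5 Computer Use is not especially surprising. Put simply, like OpenAI's Computer-Using Agent (CUA), DeepMind's model lets an AI directly control the user's browser: building on visual understanding and reasoning, it can carry out actions such as clicking, scrolling, and typing in the browser on the user's behalf.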
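To make the idea concrete, a browser-control agent of this kind can be pictured as a loop: capture the screen, ask the model for the next action, apply it, and repeat. The sketch below is only an illustration under stated assumptions. `Action`, `FakeBrowser`, `run_agent`, and `scripted_model` are hypothetical names invented here, the "model" is a fixed script, and none of this reflects the actual Gemini 2.5 Computer Use API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """A hypothetical UI action proposed by the model (not a real API type)."""
    kind: str          # "click", "scroll", "type", or "done"
    x: int = 0         # click coordinates
    y: int = 0
    text: str = ""     # text to type
    dy: int = 0        # vertical scroll distance


@dataclass
class FakeBrowser:
    """Stand-in for a real browser driver; it only records requested actions."""
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real driver would return image bytes; a string is enough here.
        return f"screenshot after {len(self.log)} actions"

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            self.log.append(f"click({action.x}, {action.y})")
        elif action.kind == "scroll":
            self.log.append(f"scroll({action.dy})")
        elif action.kind == "type":
            self.log.append(f"type({action.text!r})")
        else:
            raise ValueError(f"unsupported action: {action.kind}")


def run_agent(
    browser: FakeBrowser,
    propose: Callable[[str, List[str]], Action],
    max_steps: int = 10,
) -> List[str]:
    """Observe, ask for an action, apply it; stop on "done" or after max_steps."""
    for _ in range(max_steps):
        action = propose(browser.screenshot(), browser.log)
        if action.kind == "done":
            break
        browser.apply(action)
    return browser.log


def scripted_model(screenshot: str, history: List[str]) -> Action:
    """Placeholder for the model: replays a fixed script, one step per call."""
    script = [
        Action("click", x=120, y=80),
        Action("type", text="hello"),
        Action("scroll", dy=300),
        Action("done"),
    ]
    return script[len(history)]


if __name__ == "__main__":
    print(run_agent(FakeBrowser(), scripted_model))
```

In a real setup, `scripted_model` would be replaced by a call to the model with the screenshot and task prompt, and `FakeBrowser` by an actual browser automation driver. The `max_steps` cap is a design choice assumed here to keep a misbehaving agent from looping forever.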
First, two official demos.
Prompt: From https://tinyurl.com/pet-care-signup , get all details for any pet with a California residency and add them as a guest in my spa CRM at https://pet-luxe-spa.web.app/. Then, set up a follow up visit appointment with the specialist Anima Lavar for October 10th anytime after 8am. The reason for the visit is the same as their requested treatment.
Prompt: My art club brainstormed tasks ahead of our fair. The board is chaotic and I need your help organizing the tasks into some categories I created. Go to sticky-note-jam.web.app and ensure notes are clearly in the right sections. Drag them there if not.
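As the demos show, whether the task is gathering information from the web and acting on it or sorting out a messy board of notes, Gemini 2.5 Computer Use completes it accurately, and quite quickly as well.
On the relevant benchmarks, Gemini 2.5 Computer Use also reaches SOTA-level performance: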