Let a large model do the grinding: auto-farming upgrade materials in Honkai: Star Rail. Claude's computer use can do that too!
Source: 机器之心
2024-12-24 09:51:52
Where does a large model's ability to act come from?






Paper: https://arxiv.org/pdf/2411.10323
Project: https://github.com/showlab/computer_use_ootb


System prompt
<section><code>System Overview</code></section>
<section><code>* You have access to a set of functions that allow you to interact with a sandboxed computing environment.</code></section>
<section><code>* You do NOT have access to external resources, except through the functions provided below.</code></section>
<section><code>* You can invoke one or more functions by writing a &lt;function_calls&gt; block like this:</code></section>
<section><code>&lt;function_calls&gt;</code></section>
<section><code>&lt;invoke name="$FUNCTION_NAME"&gt;</code></section>
<section><code>&lt;parameter name="$PARAMETER_NAME"&gt;$PARAMETER_VALUE&lt;/parameter&gt;</code></section>
<section><code>...</code></section>
<section><code>&lt;/invoke&gt;</code></section>
<section><code>&lt;invoke name="$FUNCTION_NAME2"&gt;</code></section>
<section><code>...</code></section>
<section><code>&lt;/invoke&gt;</code></section>
<section><code>&lt;/function_calls&gt;</code></section>
<section><code>* String and scalar parameters should be passed as is. Lists and objects should be passed in JSON format.</code></section>
<section><code>* The output or any errors will appear in a subsequent &lt;function_results&gt; block. If a &lt;function_results&gt; block does NOT appear, your function call was likely malformatted.</code></section>
<section><code>Available Functions</code></section>
<section><code>1. Computer Interaction (GUI):</code></section>
<section><code>* Description: Use a mouse and keyboard to interact with the computer and take screenshots.</code></section>
<section><code>You can only interact with the desktop GUI (no terminal or application menu access).</code></section>
<section><code>* Actions include:</code></section>
<section><code>* key: Press a key or key-combination.</code></section>
<section><code>* type: Type a string of text.</code></section>
<section><code>* mouse_move: Move the cursor to specified coordinates.</code></section>
<section><code>* left_click, right_click, middle_click, double_click: Perform mouse clicks.</code></section>
<section><code>* left_click_drag: Click and drag the cursor.</code></section>
<section><code>* screenshot: Take a screenshot of the screen.</code></section>
<section><code>* Important Notes:</code></section>
<section><code>* The screen resolution is [SCREEN_RESOLUTION, e.g., 1024x768].</code></section>
<section><code>* Always check the coordinates of elements via screenshots before moving the cursor.</code></section>
<section><code>* If a click fails, adjust your cursor position and retry.</code></section>
<section><code>* Parameters:</code></section>
<section><code>* action (required): The action to perform, such as key, type, etc.</code></section>
<section><code>* coordinate: The (x, y) coordinates for mouse-related actions.</code></section>
<section><code>* text: The text to type or key to press for type and key actions.</code></section>
<section><code>2. Bash Shell Commands:</code></section>
<section><code>* Description: Run commands in a bash shell.</code></section>
<section><code>* Parameters:</code></section>
<section><code>* command (required): The bash command to run.</code></section>
<section><code>* restart: If true, restarts the tool.</code></section>
<section><code>3. File Editing Tool:</code></section>
<section><code>* Description: View, create, and edit files.</code></section>
<section><code>* view: Displays a file or lists directory contents.</code></section>
<section><code>* create: Creates a new file (fails if the file already exists).</code></section>
<section><code>* str_replace: Replaces a specific string in a file.</code></section>
<section><code>* insert: Inserts a string after a specified line.</code></section>
<section><code>* Parameters:</code></section>
<section><code>* path (required): The absolute path to the file or directory.</code></section>
<section><code>* write_text: The content for creating a file.</code></section>
<section><code>* str: Strings for replacing or inserting content.</code></section>
<section><code>* line: Line number for inserting content.</code></section>
<section><code>* view_range: Specify range of lines to view.</code></section>
<section><code>System Capabilities</code></section>
<section><code>* You are using an Ubuntu virtual machine with aarch64 architecture.</code></section>
<section><code>* You can install applications using apt or pip.</code></section>
<section><code>* Firefox is installed (use the firefox-esr version).</code></section>
<section><code>* GUI applications can be started from the Bash shell using DISPLAY=:1.</code></section>
<section><code>* The current date is [DATETIME, e.g., Wednesday, October 23, 2024].</code></section>
<section><code>Important Notes</code></section>
<section><code>* If the startup wizard for Firefox appears, ignore it. Do not click "skip this step." Instead, click on the address bar and enter the appropriate URL or search there.</code></section>
<section><code>* For handling PDFs, it may be better to download using a URL and convert it to text using pdftotext for easier reading.</code></section>
<section><code>Summary of How to Use the Tools</code></section>
<section><code>* Function Invocation: To interact with the environment, use the &lt;function_calls&gt; block.</code></section>
<section><code>* Error Handling: If no &lt;function_results&gt; appear, check for malformatted calls.</code></section>
<section><code>* Multiple Calls: Where possible, chain multiple function calls to optimize workflow.</code></section>
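The prompt above carries two bracketed placeholders, the screen resolution and the current date, that have to be filled in before the prompt is sent. A minimal sketch of that substitution follows; `PROMPT_TEMPLATE` and `render_prompt` are hypothetical names used for illustration, and the template reproduces only the two placeholder lines rather than the full prompt.

```python
from datetime import date

# Hypothetical template: just the two placeholder-bearing lines of the prompt.
PROMPT_TEMPLATE = (
    "* The screen resolution is {resolution}.\n"
    "* The current date is {today}."
)


def render_prompt(width: int, height: int, today: date) -> str:
    """Fill in the resolution and date placeholders of the template."""
    resolution = f"{width}x{height}"
    # Produces text such as "Wednesday, October 23, 2024" (day not zero-padded).
    today_text = f"{today:%A, %B} {today.day}, {today.year}"
    return PROMPT_TEMPLATE.format(resolution=resolution, today=today_text)


print(render_prompt(1024, 768, date(2024, 10, 23)))
```

The weekday and month names come from `strftime`, so they follow the process locale; the default C locale yields English names.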
State observation
Reasoning paradigm
Agent tools

<section><code>{</code></section><section><code>"properties": {</code></section><section><code>"action": {</code></section><section><code>"description": """The action to perform. The available actions are:</code></section><section><code>* key: Press a key or key-combination on the keyboard.</code></section><section><code>* This supports xdotool's key syntax.</code></section><section><code>* Examples: "a", "Return", "alt+Tab", "ctrl+s", "Up", "KP_0" (for the numpad 0 key).</code></section><section><code>* type: Type a string of text on the keyboard.</code></section><section><code>* cursor_position: Get the current (x, y) pixel coordinate of the cursor on the screen.</code></section><section><code>* mouse_move: Move the cursor to a specified (x, y) pixel coordinate on the screen.</code></section><section><code>* left_click: Click the left mouse button.</code></section><section><code>* left_click_drag: Click and drag the cursor to a specified (x, y) pixel coordinate on the screen.</code></section><section><code>* right_click: Click the right mouse button.</code></section><section><code>* middle_click: Click the middle mouse button.</code></section><section><code>* double_click: Double-click the left mouse button.</code></section><section><code>* screenshot: Take a screenshot of the screen.</code></section><section><code>""",</code></section><section><code>"enum": [</code></section><section><code>"key",</code></section><section><code>"type",</code></section><section><code>"mouse_move",</code></section><section><code>"left_click",</code></section><section><code>"left_click_drag",</code></section><section><code>"right_click",</code></section><section><code>"middle_click",</code></section><section><code>"double_click",</code></section><section><code>"screenshot",</code></section><section><code>"cursor_position"</code></section><section><code>],</code></section><section><code>"type": "string"</code></section><section><code>},</code></section><section><code>"coordinate": 
{</code></section><section><code>"description": "(x, y): The x (pixels from the left edge) and y (pixels from the top edge) coordinates to move the mouse to. Required only by action=mouse_move and action=left_click_drag.",</code></section><section><code>"type": "array"</code></section><section><code>},</code></section><section><code>"text": {</code></section><section><code>"description": "Required only by action=type and action=key.",</code></section><section><code>"type": "string"</code></section><section><code>}</code></section><section><code>},</code></section><section><code>"required": ["action"],</code></section><section><code>"type": "object"</code></section><section><code>}</code></section>
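The schema above states three constraints: `action` must be one of the enumerated values, `coordinate` is required by `mouse_move` and `left_click_drag`, and `text` is required by `type` and `key`. The sketch below checks a call against those constraints only. The function name `validate_computer_call` and the error messages are illustrative assumptions, not part of the project, and the non-negative integer check on coordinates is an added assumption since the schema only says `"type": "array"`.

```python
from typing import Any, Dict, List

# Action names copied from the "enum" list in the schema above.
ACTIONS = {
    "key", "type", "mouse_move", "left_click", "left_click_drag",
    "right_click", "middle_click", "double_click", "screenshot",
    "cursor_position",
}
NEEDS_COORDINATE = {"mouse_move", "left_click_drag"}
NEEDS_TEXT = {"type", "key"}


def validate_computer_call(args: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the call looks valid."""
    action = args.get("action")
    if action not in ACTIONS:
        return [f"unknown or missing action: {action!r}"]

    problems = []
    if action in NEEDS_COORDINATE:
        coord = args.get("coordinate")
        # Assumption: coordinates are two non-negative integer pixel values.
        ok = (
            isinstance(coord, (list, tuple))
            and len(coord) == 2
            and all(type(v) is int and v >= 0 for v in coord)
        )
        if not ok:
            problems.append(f"{action} requires coordinate [x, y]")
    if action in NEEDS_TEXT and not isinstance(args.get("text"), str):
        problems.append(f"{action} requires text")
    return problems


print(validate_computer_call({"action": "mouse_move", "coordinate": [100, 200]}))
print(validate_computer_call({"action": "type"}))
```

Rejecting a malformed call before it reaches the GUI layer gives the model a textual error it can react to, instead of a silent no-op on screen.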

<section><code>{</code></section><section><code>"properties": {</code></section><section><code>"command": {</code></section><section><code>"description": "The commands to run. Allowed options are:`view`,`create`,`str_replace`,`insert`,`undo_edit`.",</code><code> </code></section><section><code> "enum": ["view", "create", "str_replace", "insert", "undo_edit"],</code></section><section><code>"type": "string"</code></section><section><code>},</code></section><section><code>"file_text": {</code></section><section><code>"description": "Required parameter of`create`command, with the content of the file to be created.",</code></section><section><code>"type": "string"</code></section><section><code>},</code></section><section><code>"insert_line": {</code></section><section><code>"description": "Required parameter of`insert`command. The`new_str`will be inserted AFTER the line`insert_line`of`path`.",</code></section><section><code>"type": "integer"</code></section><section><code>},</code></section><section><code>"new_str": {</code></section><section><code>"description": "Optional parameter of`str_replace`command containing the new string (if not given, no string will be added). Required parameter of`insert`command containing the string to insert.",</code></section><section><code>"type": "string"</code></section><section><code>},</code></section><section><code>"old_str": {</code></section><section><code>"description": "Required parameter of`str_replace`command containing the string in`path`to replace.",</code></section><section><code>"type": "string"</code></section><section><code>},</code></section><section><code>"path": {</code></section><section><code>"description": "Absolute path to file or directory, e.g.,`/repo/file.py`or`/repo/`.",</code></section><section><code>"type": "string"</code></section><section><code>},</code></section><section><code>"view_range": {</code></section><section><code>"description": "Optional parameter of`view`command when`path`points to a file. 
If none is given, the full file is shown. If provided, the file will be shown in the indicated line number range, e.g., [11, 12] will show lines 11 and 12. Indexing starts at 1. Setting`[start_line, -1]`shows all lines from`start_line`to the end of the file.",</code></section><section><code>"items": { "type": "integer" },</code></section><section><code>"type": "array"</code></section><section><code>}</code></section><section><code>},</code></section><section><code>"required": ["command", "path"],</code></section><section><code>"type": "object"</code></section><section><code>}</code></section>
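The descriptions in the schema above pin down a few semantics that are easy to get wrong: `view_range` is 1-indexed and inclusive with `-1` meaning "to the end of the file", and `insert` places `new_str` after the given line. The following is a minimal in-memory sketch of those three commands operating on strings rather than files. The function names are illustrative, and two behaviours are assumptions not stated in the schema: `str_replace` insists on exactly one match, and `insert_line=0` inserts at the top.

```python
from typing import List, Optional


def view(text: str, view_range: Optional[List[int]] = None) -> List[str]:
    """Return the requested lines; view_range is 1-indexed and inclusive."""
    lines = text.splitlines()
    if view_range is None:
        return lines
    start, end = view_range
    if end == -1:  # [start_line, -1] means "through the end of the file"
        end = len(lines)
    if not 1 <= start <= end <= len(lines):
        raise ValueError(f"invalid view_range: {view_range}")
    return lines[start - 1:end]


def str_replace(text: str, old_str: str, new_str: str = "") -> str:
    """Replace old_str with new_str (assumption: exactly one match required)."""
    count = text.count(old_str)
    if count != 1:
        raise ValueError(f"expected exactly one match, found {count}")
    return text.replace(old_str, new_str, 1)


def insert(text: str, insert_line: int, new_str: str) -> str:
    """Insert new_str AFTER line insert_line (assumption: 0 means the top)."""
    lines = text.splitlines()
    if not 0 <= insert_line <= len(lines):
        raise ValueError(f"invalid insert_line: {insert_line}")
    lines[insert_line:insert_line] = new_str.splitlines()
    # Note: a trailing newline in the input is not preserved by this sketch.
    return "\n".join(lines)


doc = "a\nb\nc"
print(view(doc, [2, -1]))
print(str_replace(doc, "b", "B"))
print(insert(doc, 1, "x"))
```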

<section><code>{</code></section><section><code>"properties": {</code><code> </code></section><section><code> "command": {</code></section><section><code>"description": "The bash command to run. Required unless the tool is being restarted.",</code></section><section><code>"type": "string"</code></section><section><code>},</code></section><section><code>"restart": {</code></section><section><code>"description": "Specifying true will restart this tool. Otherwise, leave this unspecified.",</code></section><section><code>"type": "boolean"</code></section><section><code>}</code></section><section><code>}</code></section><section><code>}</code></section>
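The schema above implies one rule: `command` is required unless `restart` is true. A minimal sketch of a handler that enforces this rule is shown below. `run_bash_tool` is a hypothetical name, and the sketch makes simplifying assumptions that differ from a real tool: each command runs in a fresh process through the system default shell (which may not be bash), no state persists between calls, and `restart` only returns a message.

```python
import subprocess
from typing import Optional


def run_bash_tool(command: Optional[str] = None, restart: bool = False) -> str:
    """Validate the tool input, then run the command and return its output."""
    if restart:
        # A real tool would tear down and recreate a persistent shell here.
        return "tool has been restarted"
    if not command:
        raise ValueError("command is required unless restart is true")
    # Assumption: one fresh shell process per call, with a 30-second limit.
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=30
    )
    # Return stdout and stderr together so the model can see any errors.
    return result.stdout + result.stderr


print(run_bash_tool("echo hello"))
```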
Action space
Agent memory





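Planning: evaluates the model's ability to generate an executable plan from the user's input. The plan should be a correct workflow that lets the software run successfully end to end, with every step clear and executable.
Action: evaluates whether the model can accurately identify and operate interactive GUI elements while carrying out concrete operations step by step according to the derived plan.
Reflection: measures the model's awareness of the dynamic environment, including its ability to adjust based on the outcome of its actions, for example retrying when a task fails or stopping promptly once the task is complete.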
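The three dimensions above can be read as one control loop: follow the plan step by step, act, then check the outcome and either retry or stop. The sketch below illustrates that loop against a simulated environment. It is not the paper's evaluation code; `run_task`, `flaky_act`, and the step names are hypothetical, and "success" is reduced to a boolean returned by the action.

```python
from typing import Callable, Dict, List


def run_task(
    plan: List[str], act: Callable[[str], bool], max_retries: int = 2
) -> List[str]:
    """Execute each planned step, retry failures, and log what happened."""
    log: List[str] = []
    for step in plan:  # planning: an ordered list of executable steps
        for attempt in range(1, max_retries + 2):
            ok = act(step)  # action: operate on the (simulated) GUI
            log.append(f"{step}: attempt {attempt} {'ok' if ok else 'failed'}")
            if ok:  # reflection: observe the result and move on
                break
        else:
            # Reflection: retries exhausted, so stop instead of looping forever.
            log.append(f"{step}: giving up")
            return log
    log.append("done")  # reflection: terminate once the task is complete
    return log


# Simulated environment: "click Start" fails once before succeeding.
failures: Dict[str, int] = {"click Start": 1}


def flaky_act(step: str) -> bool:
    if failures.get(step, 0) > 0:
        failures[step] -= 1
        return False
    return True


log = run_task(["open game", "click Start"], flaky_act)
print("\n".join(log))
```

In a real agent the boolean would come from inspecting a fresh screenshot after each action, which is where the reflection ability described above is actually exercised.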








