
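Welcome to Eye on AI! AI reporter Sharon Goldman here, filling in for Jeremy Kahn, who is on holiday. In this edition: the General Services Administration's approval of OpenAI, Google, and Anthropic as federal AI vendors; the consequences of the AI spending boom for the U.S. economy; and Clay AI's $100 million raise at a $3.1 billion valuation.
Only in the Bay Area does spending a Saturday geeking out about AI agents, alongside some 2,000 students, researchers, and tech insiders crammed into UC Berkeley, feel like a totally normal weekend plan. As I picked up my badge for the day-long Agentic AI Summit and watched the line snake through the student union lobby, it felt less like an academic conference and more like Silicon Valley’s version of a buzzy New York brunch spot.
The turnout was certainly helped by the speaker lineup, which was stacked with top AI researchers and scientists. They included Jakub Pachocki, chief scientist at OpenAI; Ed Chi, VP of research at Google DeepMind; Bill Dally, chief scientist at Nvidia; Ion Stoica, cofounder of Databricks and Anyscale and a UC Berkeley professor; and Dawn Song, a pioneering UC Berkeley professor focused on AI security.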
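The crowd may also have been drawn by the buzzy topic itself: AI agents. These are generally defined as AI-powered systems that can complete tasks, mostly autonomously, using other software tools. Think of a chatbot that not only suggests a vacation itinerary but also books the flight and makes the hotel reservation.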
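To make the tool-calling behavior described above concrete, here is a minimal Python sketch of the loop most agent designs share. The agent decides on the next tool call, runs the tool, records the result, and repeats until the task is done. This is an illustration under loud assumptions, not any vendor's implementation. The tools `book_flight` and `book_hotel` are hypothetical stubs. `toy_planner` is a hand-written rule standing in for the language model that would normally choose the next step. The `history` list plays the role of the agent's memory: if it were dropped between steps, the planner would keep re-booking the flight, a toy version of the memory problem raised at the summit.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


# Hypothetical stub tools: a real agent would call airline and hotel APIs here.
def book_flight(city: str) -> str:
    return f"flight to {city} booked"


def book_hotel(city: str) -> str:
    return f"hotel in {city} reserved"


TOOLS: Dict[str, Callable[[str], str]] = {
    "book_flight": book_flight,
    "book_hotel": book_hotel,
}


@dataclass
class Step:
    tool: Optional[str]  # name of the tool to call; None means "task finished"
    arg: str = ""


def toy_planner(goal: str, history: List[str]) -> Step:
    """Stand-in for the language model: picks the next action from the goal
    and from what has already happened. Rule-based, for illustration only."""
    city = goal.split()[-1]
    if not any("flight" in h for h in history):
        return Step("book_flight", city)
    if not any("hotel" in h for h in history):
        return Step("book_hotel", city)
    return Step(None)


def run_agent(goal: str, max_steps: int = 5) -> List[str]:
    history: List[str] = []  # the agent's memory of earlier tool results
    for _ in range(max_steps):  # cap the loop so a confused planner cannot run forever
        step = toy_planner(goal, history)
        if step.tool is None:
            break
        history.append(TOOLS[step.tool](step.arg))
    return history
```

Calling `run_agent("plan a trip to Lisbon")` returns the flight booking followed by the hotel reservation, after which the planner signals that it is finished. The `max_steps` cap is a common safeguard in agent loops, since a model that misreads its own history could otherwise call tools indefinitely.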
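As my colleague Jeremy Kahn wrote in a recent article, “This kind of automation is a perennial C-suite fever dream. Over the past decade, companies embraced ‘robotic process automation,’ or RPA. This was software that could automate repetitive tasks, such as cutting and pasting between database programs. But traditional RPA systems are inflexible and unable to deal with exceptions, and can usually handle only one narrow task.” Agentic AI is meant to be both more flexible and more powerful, adapting to business needs.
In a January 2025 blog post, OpenAI CEO Sam Altman wrote, “We believe that, in 2025, we may see the first AI agents ‘join the workforce’ and materially change the output of companies.”
But despite the hype, the overall message at the Agentic AI Summit was cautious and grounded. Agents may be the buzziest trend in AI right now, but the technology still has a long way to go. Unfortunately, AI agents aren’t always reliable, and they may not remember what came before.
Google DeepMind’s Chi, for example, stressed the gap between what agents can do in curated demos and what is still needed in real-world production environments. Pachocki highlighted concerns about the safety, security, and trustworthiness of agentic systems, particularly when they are integrated into sensitive applications or operate autonomously.
“I still don’t think agents have really lived up to their promise,” said Sherwin Wu, head of engineering for the OpenAI API. “Certain more generic cases have worked, but my day-to-day work doesn’t really feel that different with agents.”
Today’s agents don’t yet live up to the massive hype. Consider Salesforce CEO Marc Benioff’s recent claim that a shift to digital labor means he will be the “last CEO of Salesforce who only managed humans.” Even so, the speakers at the Agentic AI Summit still had plenty of optimism to share. Databricks’ Stoica expressed enthusiasm about infrastructure improvements that are making agentic systems easier to build. Nvidia’s Dally suggested that continued hardware advances will enable more powerful and efficient agent behavior. Several speakers pointed to “narrow wins” in specific domains, such as coding.
Today’s AI agents may still have growing pains, but judging by the packed UC Berkeley venue, the industry is keeping its eye on the prize: AI agents that can operate reliably in the real world. The payoff, practitioners believe, will be well worth the wait.
With that, here’s more AI news.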
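AI IN THE NEWS
U.S. agency approves OpenAI, Google, and Anthropic for federal AI vendor list. Reuters reported that the General Services Administration (GSA), the U.S. government’s central purchasing arm, has added OpenAI’s ChatGPT, Google’s Gemini, and Anthropic’s Claude to a list of approved AI vendors. The goal is to accelerate government agencies’ adoption of the technology. The tools will be available to agencies through a platform with contract terms in place. The GSA said approved AI providers “are committed to responsible use and compliance with federal standards.”
The AI spending boom could have real consequences for the U.S. economy. According to the Washington Post, Big Tech’s record-breaking investment in artificial intelligence is becoming a major economic force, even as the broader U.S. economy shows signs of slowing. Google, Meta, Amazon, and Microsoft are spending more than $350 billion on AI this year. With job growth cooling, that spending spree is fueling data center construction and driving demand for chips, servers, and networking gear. It could boost GDP growth by up to 0.7% in 2025. But economists warn that relying on tech giants to prop up the economy is risky: if the AI boom loses steam, the economic fallout could be significant.
AI sales tool Clay raises $100 million at a $3.1 billion valuation. The New York Times’ DealBook reported that Clay has raised $100 million (about RMB 730 million) in a Series C round, at a post-money valuation of $3.1 billion (about RMB 22.29 billion). Clay is an AI platform that helps sales reps and marketers find new leads and turn them into customers. The round was led by CapitalG, the investment arm of Google parent Alphabet, with participation from Meritech Capital Partners and Sequoia Capital. It comes about six months after the startup raised money at a $1.25 billion valuation.
EYE ON AI RESEARCH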
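Google DeepMind’s new Genie 3 “world model” creates real-time interactive simulations. Google DeepMind has unveiled Genie 3, a new AI system that can generate rich, interactive virtual worlds from simple text prompts. Users can navigate these dynamic environments in real time at 24 frames per second. It’s tempting to jump straight to the ultimate gaming experience. But the model is really the latest step in the company’s long-term push toward “world models”: AI systems that learn how the world works and can simulate real-world environments. Such models are seen as key to training advanced agents and, eventually, to achieving artificial general intelligence (AGI). Unlike prior video generators, Genie 3 produces environments that stay visually consistent for several minutes as users move through them. These environments also respond to commands such as “make it snow” or “add a character.” For now, DeepMind is limiting access to a small group of researchers and creators while it explores responsible deployment and potential risks.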
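Two rough numbers help explain why this is hard. At 24 frames per second, each frame has to be generated in about 41.7 milliseconds. One minute of consistent exploration means 1,440 consecutive frames that must agree with one another. The Python sketch below also shows, under heavy assumptions, what makes a world model interactive rather than a plain video generator. Frames are produced one at a time, each conditioned on the history so far and on the user’s latest command. `toy_world_model` is a hypothetical hand-written stand-in for a learned model, and frames are reduced to small dictionaries. Nothing here reflects Genie 3’s actual architecture or interface.

```python
from typing import Callable, Dict, List, Optional

FPS = 24  # frame rate reported for Genie 3

Frame = Dict[str, object]  # toy "frame": a few scene attributes instead of pixels
WorldModel = Callable[[List[Frame], Optional[str]], Frame]


def frame_budget_ms(fps: int = FPS) -> float:
    """Milliseconds available to generate each frame at the given frame rate."""
    return 1000.0 / fps


def frames_needed(minutes: float, fps: int = FPS) -> int:
    """Consecutive frames that must stay consistent over `minutes`."""
    return round(minutes * 60 * fps)


def toy_world_model(history: List[Frame], action: Optional[str]) -> Frame:
    """Hypothetical hand-written stand-in for a learned model: carries the last
    frame forward (keeping the scene consistent) and applies the user's command."""
    frame = dict(history[-1])
    if action == "move right":
        frame["x"] = int(frame["x"]) + 1
    elif action == "make it snow":
        frame["weather"] = "snow"
    return frame


def rollout(model: WorldModel, first: Frame, actions: List[Optional[str]]) -> List[Frame]:
    frames = [first]
    for action in actions:  # one generated frame per tick, conditioned on everything so far
        frames.append(model(frames, action))
    return frames
```

Consider a rollout from `{"x": 0, "weather": "clear"}` with the actions `"move right"`, no action, and `"make it snow"`. It yields four frames, the last being `{"x": 1, "weather": "snow"}`. The position persists across frames, which is the kind of consistency a learned model has to achieve without being given the rules.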
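BRAIN FOOD
Could “depth of thought” be key to AI reasoning?
A tiny new AI model is challenging what we know about how models learn to reason. Researchers at Singapore-based Sapient Intelligence recently released the Hierarchical Reasoning Model (HRM), which draws inspiration from the brain’s layered thinking process, and the results have the AI community talking. HRM is 100 times smaller than ChatGPT and was trained on just 1,000 examples, with no internet data or step-by-step guidance. Even so, it solves tough logic problems such as Sudoku and maze navigation, as well as abstract reasoning tasks, that stump much larger models. Instead of mimicking human language, HRM reasons internally. It quietly works through problems in hidden loops, much as a person thinks through a puzzle in their head. Its success hints at a potential shift in AI, one in which depth of thought might matter more than scale.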
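The “hidden loops” can be illustrated with a heavily simplified Python sketch of HRM’s two-speed recurrence. A fast low-level module updates several times for every single update of a slow high-level module. Everything specific here is an assumption for illustration. The hypothetical `low_step` and `high_step` functions use fixed `tanh` mixing in place of trained neural-network modules, and the values of `n_cycles` and `t_steps` are arbitrary. The point is the shape of the computation. The same small modules are reused over and over, so the number of sequential reasoning steps (the “depth”) grows with the loop counts rather than with the parameter count.

```python
import math
from typing import List, Tuple

Vec = List[float]


def low_step(z_l: Vec, z_h: Vec, x: Vec) -> Vec:
    # Fast module (toy): refine the working state using the current plan and the input.
    return [math.tanh(0.5 * (l + h + xi)) for l, h, xi in zip(z_l, z_h, x)]


def high_step(z_h: Vec, z_l: Vec) -> Vec:
    # Slow module (toy): revise the plan from the state the fast module settled on.
    return [math.tanh(0.5 * (h + l)) for h, l in zip(z_h, z_l)]


def hierarchical_forward(x: Vec, n_cycles: int = 2, t_steps: int = 3) -> Tuple[Vec, int]:
    """Run n_cycles slow updates, each preceded by t_steps fast updates.
    Returns the final high-level state (which an output head would decode)
    and the number of sequential update steps performed."""
    z_h = [0.0] * len(x)  # slow, abstract "planning" state
    z_l = [0.0] * len(x)  # fast, detailed "working" state
    depth = 0
    for _ in range(n_cycles):
        for _ in range(t_steps):
            z_l = low_step(z_l, z_h, x)
            depth += 1
        z_h = high_step(z_h, z_l)
        depth += 1
    return z_h, depth
```

With the defaults (2 cycles of 3 fast steps each), the function performs 8 sequential updates. Raising the counts to 4 cycles of 8 steps gives 36 updates without adding any parameters.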
Translator: Liang Yu
Reviewer: Xia Lin