reinforcement learning - i-N. News Station

Content tagged "reinforcement learning"

Qwen Team's Deep Research assistant efficiently generates reliable reports, now open for a free trial

AI妹 5 个月前 17 0

In the digital age, with an overwhelming amount of information and high-intensity task pressure, s

free experience Agent capability QwenChat online information integration long context window

OpenAI launches Codex, an AI coding assistant that integrates with GitHub to significantly boost developer productivity

AI妹 5 个月前 19 0

Recently, OpenAI has launched a brand-new AI programming assistant called Codex. This intelligent

Qwen releases the open-source 72B-parameter WorldPM preference-modeling series, a key new breakthrough for AI developers worldwide

AI妹 5 个月前 16 0

Qwen, a team under Alibaba Group, has announced the release of a new series of preference modeling

72 billion parameters WorldPM-72B-RLHFLow supervised fine-tuning scaling laws human preferences

Omni-R1 audio question-answering model: GRPO optimization sets a new MMAU record, with text reasoning proving key

AI妹 5 个月前 16 0

Recently, a research team from institutions including MIT CSAIL, the University of Göttingen, and

arXiv:2505.09439 SARI text reasoning capabilities Group Relative Policy Optimization GPU 48GB memory

Tencent releases WeChat-YATT, a large-model training library that tackles core bottlenecks in multimodal and RLHF training

AI妹 5 个月前 18 0

Tencent recently released the large model training library WeChat-YATT (Yet Another Transformer Tr

asynchronous checkpoint saving serial scheduling complex tasks with frequent interactions parallel controller architecture independent deployment

NVIDIA releases the Cosmos-Reason1 model series to improve AI physical common sense and embodied reasoning

AI妹 5 个月前 16 0

Recently, NVIDIA released its latest Cosmos-Reason1 series models, aimed at enhancing AI capabilit

Physical AI (PAI) autonomous driving Cosmos-Reason1-56B Cosmos-Reason1 series NVIDIA

Devstral open-source model: 24B parameters, optimized for software agents, outperforms many competitors

AI妹 5 个月前 19 0

Mistral, a French artificial intelligence model manufacturer, quickly returned to the open-source

Codestral-Mamba SWE-bench Verified open-source language model SWE-Agent 128000-token context window

Palisade Research reveals that some AI models ignore shutdown instructions, prompting reflection on AI autonomy

AI妹 5 个月前 16 0

Recently, Palisade Research released a striking study revealing that some artificial intelligence

ethical issues Grok model o4-mini model reinforcement learning AI shutdown resistance

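Alibaba officially releases QwenLong-L1-32B, a long-context reasoning model, accelerating the industrialization of long-context AI applications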

AI妹 5 个月前 18 0

Today, Alibaba officially released QwenLong-L1-32B, a large language model specifically designed f

Tongyi-Zhiwen QwenLong-L1-32B legal research long-context reasoning GRPO

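Quark launches the industry's first deep-search feature for gaokao (college entrance exam) applications, helping students choose their preferences accurately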

AI妹 5 个月前 19 0

"I got 549 points in the second simulation exam of senior high school in Zhengzhou, Henan Province

simulated application function AI chat tools deep search self-built college entrance examination knowledge base simulation exam score
