Content tagged "Efficient Pretraining Length Scaling"

ByteDance's PHD-Transformer Breaks the Efficiency Bottleneck in 2M-Context LLM Pretraining
AI妹 · 1 month ago

ByteDance has announced the launch of Efficient Pretraining Length Scaling, leveraging a novel Par