1. Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190
2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049
[ "赵军,男,中国科学院自动化研究所研究员,博士生导师;中国科学院特聘核心岗位研究员;中国科学院大学人工智能学院岗位教授。研究领域为自然语言处理和知识工程。承担科技部新一代人工智能重大项目、国家自然科学基金重点项目等国家级科研项目。出版专著《知识图谱》、《知识图谱:算法与实践》;发表学术论文80余篇,Google Scholar引用近两万次。曾获第25届国际计算语言学大会最佳论文奖(2014),中国中文信息学会钱伟长中文信息处理科学技术奖一等奖(2018),北京市科学技术进步奖一等奖(2019),中国科学院朱李月华优秀教师奖(2020)。Email:jzhao@nlpr.ia.ac.cn" ]
[ "曹鹏飞,男,中国科学院自动化研究所助理研究员,研究方向为自然语言处理、知识图谱、预训练语言模型,在AAAI、ACL、EMNLP等人工智能与自然语言处理国际会议上发表论文多篇。博士期间曾获中国科学院院长奖学金、国家奖学金、攀登奖学金、北京市优秀毕业生等荣誉。曾任中国中文信息学会青年工作委员会学生执委会的执行委员,第一届中国自然语言处理学生研讨会的博士生论坛主席,并担任TKDE、TALLIP、ACL、AAAI、IJCAI、EMNLP、COLING等国际高水平学术期刊和会议的审稿人。Email:pengfei.cao@nlpr.ia.ac.cn" ]
Print publication date: 2023-03-25
Received: 2023-01-30
Revised: 2023-03-10
ZHAO Jun,CAO Pengfei.The Large Language Model - A Milestone in the History of Artificial Intelligence[J].Emerging Science and Technology,2023,2(1):80-88. DOI: 10.12405/j.issn.2097-1486.2023.01.009.
In recent years, large language model technology has advanced rapidly, greatly promoting the development of natural language processing and, more broadly, artificial intelligence. This paper first introduces the technical principles of large language models and reviews the development of language models. It then analyzes the key capabilities of large language models and the methods used to implement them. Finally, it summarizes the current status of large language models and discusses the challenges they face.
artificial intelligence; natural language processing; large language models; in-context learning; chain-of-thought; instruction tuning