1. Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190
2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049
[ "赵军,男,中国科学院自动化研究所研究员,博士生导师;中国科学院特聘核心岗位研究员;中国科学院大学人工智能学院岗位教授。研究领域为自然语言处理和知识工程。承担科技部新一代人工智能重大项目、国家自然科学基金重点项目等国家级科研项目。出版专著《知识图谱》、《知识图谱:算法与实践》;发表学术论文80余篇,Google Scholar引用近两万次。曾获第25届国际计算语言学大会最佳论文奖(2014),中国中文信息学会钱伟长中文信息处理科学技术奖一等奖(2018),北京市科学技术进步奖一等奖(2019),中国科学院朱李月华优秀教师奖(2020)。Email:jzhao@nlpr.ia.ac.cn" ]
[ "曹鹏飞,男,中国科学院自动化研究所助理研究员,研究方向为自然语言处理、知识图谱、预训练语言模型,在AAAI、ACL、EMNLP等人工智能与自然语言处理国际会议上发表论文多篇。博士期间曾获中国科学院院长奖学金、国家奖学金、攀登奖学金、北京市优秀毕业生等荣誉。曾任中国中文信息学会青年工作委员会学生执委会的执行委员,第一届中国自然语言处理学生研讨会的博士生论坛主席,并担任TKDE、TALLIP、ACL、AAAI、IJCAI、EMNLP、COLING等国际高水平学术期刊和会议的审稿人。Email:pengfei.cao@nlpr.ia.ac.cn" ]
Print publication date: 2023-03-25
Received: 2023-01-30
Revised: 2023-03-10
ZHAO Jun,CAO Pengfei.The Large Language Model - A Milestone in the History of Artificial Intelligence[J].Emerging Science and Technology,2023,2(1):80-88. DOI: 10.12405/j.issn.2097-1486.2023.01.009.
In recent years, large language model technology has advanced rapidly, greatly promoting the development of natural language processing and, more broadly, artificial intelligence. This paper first introduces the technical principles of large language models and reviews the development of language models. It then analyzes the key capabilities of large language models and the methods used to implement them. Finally, it summarizes the current status of large language models and discusses the challenges they face.
artificial intelligence; natural language processing; large language models; in-context learning; chain-of-thought; instruction tuning