1. Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
Received: 30 January 2023
Revised: 10 March 2023
Published: 25 March 2023
赵军,曹鹏飞.大语言模型—人工智能发展史上的里程碑[J].新兴科学和技术趋势,2023,2(1):80-88. DOI: 10.12405/j.issn.2097-1486.2023.01.009.
ZHAO Jun, CAO Pengfei. The Large Language Model - A Milestone in the History of Artificial Intelligence[J]. Emerging Science and Technology, 2023, 2(1): 80-88. DOI: 10.12405/j.issn.2097-1486.2023.01.009.
近几年,大语言模型技术突飞猛进,极大地推动了自然语言处理乃至人工智能领域的发展。本文介绍大语言模型的技术原理,回顾大语言模型的发展历程,分析大语言模型的关键技术,梳理大语言模型的实现方法,并探讨大语言模型面临的挑战。
In recent years, large language model technology has advanced rapidly, greatly promoting the development of natural language processing and of artificial intelligence more broadly. This paper first introduces the technical principles of large language models and reviews the development of language models. It then analyzes the key capabilities of large language models and the methods by which they are implemented. Finally, it summarizes the current status of large language models and discusses the challenges they face.
人工智能；自然语言处理；大语言模型；情境学习；思维链；指令微调
artificial intelligence; natural language processing; large language models; in-context learning; chain-of-thought; instruction tuning