1. Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
Received: 30 January 2023
Revised: 10 March 2023
Published: 25 March 2023
赵军,曹鹏飞.大语言模型—人工智能发展史上的里程碑[J].新兴科学和技术趋势,2023,2(1):80-88. DOI: 10.12405/j.issn.2097-1486.2023.01.009.
ZHAO Jun, CAO Pengfei. The Large Language Model - A Milestone in the History of Artificial Intelligence[J]. Emerging Science and Technology, 2023, 2(1): 80-88. DOI: 10.12405/j.issn.2097-1486.2023.01.009.
近几年,大语言模型技术突飞猛进,极大地推动了自然语言处理乃至人工智能领域的发展。本文介绍大语言模型的技术原理,回顾大语言模型的发展历程,分析大语言模型的关键技术,梳理大语言模型的实现方法,并探讨大语言模型面临的挑战。
In recent years, large language model technology has advanced rapidly, greatly promoting the development of natural language processing and of artificial intelligence more broadly. This paper first introduces the technical principles of large language models and reviews the development of language models. It then analyzes the key capabilities of large language models and the methods by which they are implemented. Finally, it summarizes the current status of large language models and discusses the challenges they face.
人工智能；自然语言处理；大语言模型；情境学习；思维链；指令微调
artificial intelligence; natural language processing; large language models; in-context learning; chain-of-thought; instruction tuning