A Step-by-step Guide to Building Large Custom Language Models (101 slides)

常华Andy Andy730 2024-03-16

Source: https://www.nvidia.com/en-us/on-demand/session/gtcfall21-a31082/



The last two years have seen unprecedented progress in natural language processing (NLP), with models such as BERT, RoBERTa, ELECTRA, and now GPT-3 transforming countless NLP-based applications. Given the number of languages across the globe and the complexity of domain-specific language (e.g., specialized medical, engineering, and financial text), these advances are only beginning to make an impact outside of general-purpose English. This walkthrough not only provides an end-to-end demonstration of how to train custom large language models (from obtaining the training data, through cleaning and quality assessment, to distributed training and evaluation) but also shows how to deploy them efficiently to production (including an overview of techniques for model compression and distributed/model-parallel inference). We'll discuss technologies such as Megatron-LM, Microsoft DeepSpeed, and, for inference, TensorRT, FasterTransformer, and Triton Inference Server.
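The cleaning/quality-assessment step mentioned above can be sketched with a few heuristic filters commonly applied to raw LLM training corpora: a minimum document length, a cap on the ratio of non-alphanumeric characters, and exact deduplication by content hash. This is a minimal illustrative sketch, not the pipeline used in the session; the function name and thresholds are assumptions.

```python
import hashlib

def clean_corpus(docs, min_chars=200, max_symbol_ratio=0.3):
    """Heuristic corpus cleaning: drop too-short documents, symbol-heavy
    documents, and exact duplicates (case-insensitive SHA-256 hash)."""
    seen = set()
    kept = []
    for doc in docs:
        text = doc.strip()
        # Filter 1: drop documents shorter than the minimum length.
        if len(text) < min_chars:
            continue
        # Filter 2: drop documents dominated by punctuation/markup noise.
        alnum = sum(ch.isalnum() or ch.isspace() for ch in text)
        if 1 - alnum / len(text) > max_symbol_ratio:
            continue
        # Filter 3: exact deduplication via content hash.
        digest = hashlib.sha256(text.lower().encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(text)
    return kept
```

Real pipelines layer further stages on top of this (language identification, perplexity-based quality scoring, fuzzy near-duplicate detection with MinHash), but the shape is the same: a cascade of cheap filters applied before the expensive training run.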
