
Tutorials — NVIDIA NeMo Framework User Guide

Important

You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, please refer to the NeMo 24.07 documentation.

Tutorials

The best way to get started with NeMo is to work through one of our tutorials. They span several domains and include both introductory and advanced topics.

These tutorials can be run from inside the NeMo Framework Docker Container.

Large Language Models

Data Curation

Explore examples of data curation techniques using NeMo Curator:

Title with Link	Description
Distributed Data Classification	The notebook shows how to use NeMo Curator with two distinct classifiers: one that evaluates data quality and another that identifies data domains. Integrating these classifiers streamlines the annotation process, making it easier to combine the diverse datasets needed to train foundation models.
PEFT Curation	The tutorial demonstrates how to use the NeMo Curator Python API to curate a dataset for PEFT. It uses the Enron dataset, which contains emails along with classification labels; each email entry includes a subject, body, and category (class label). The tutorial then walks through several filtering and processing operations that can be applied to each record.
Single Node Data Curation Pipeline	The notebook walks through a typical data curation pipeline built with NeMo Curator, using the Thai Wikipedia dataset as an example. It demonstrates how to download Wikipedia data with NeMo Curator, perform language separation using FastText, apply GPU-based exact and fuzzy deduplication, and run CPU-based heuristic filtering.
NeMo Curator Python API with TinyStories	The tutorial shows how to use the NeMo Curator Python API to curate the TinyStories dataset. TinyStories is a dataset of short stories generated by GPT-3.5 and GPT-4 that use only words a typical 3- to 4-year-old understands. The small size of this dataset makes it ideal for creating and validating data curation pipelines.
Curating Datasets for Parameter Efficient Fine-tuning (PEFT) with Synthetic Data Generation (SDG)	The tutorial demonstrates how to use NeMo Curator's Python API for data curation, synthetic data generation, and qualitative score assignment to prepare a dataset for PEFT of LLMs.
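Several of the tutorials above combine per-record heuristic filtering with exact deduplication. The pure-Python sketch below illustrates those two ideas on a small in-memory list; it is not NeMo Curator's API. The `Record` type, the `min_words`/`max_words` thresholds, and hashing an MD5 digest of lowercased, whitespace-collapsed text are all illustrative assumptions (a strictly exact match would hash the raw text). NeMo Curator performs the equivalent steps at scale, for example with GPU-based exact deduplication.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical record shape, loosely modeled on the Enron example
    # (subject, body, category); this is not a NeMo Curator type.
    subject: str
    body: str
    category: str


def word_count_ok(record: Record, min_words: int = 3, max_words: int = 500) -> bool:
    """Heuristic filter: keep records whose body word count is in range."""
    n = len(record.body.split())
    return min_words <= n <= max_words


def exact_dedup(records: list[Record]) -> list[Record]:
    """Keep the first record for each distinct (lightly normalized) body.

    Bodies are lowercased and whitespace-collapsed, then hashed with MD5;
    any record whose digest has already been seen is dropped.
    """
    seen: set[str] = set()
    kept: list[Record] = []
    for record in records:
        normalized = " ".join(record.body.lower().split())
        digest = hashlib.md5(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(record)
    return kept


def curate(records: list[Record]) -> list[Record]:
    """Apply the heuristic filter first, then deduplicate the survivors."""
    return exact_dedup([r for r in records if word_count_ok(r)])


if __name__ == "__main__":
    data = [
        Record("Q3", "Please review the attached budget report.", "business"),
        Record("Fwd: Q3", "please review  the attached budget report.", "business"),
        Record("hi", "ok", "personal"),
    ]
    for r in curate(data):
        print(r.subject)
```

Filtering before deduplication means short or oversized records are never hashed; here the forwarded copy is removed as a duplicate and the one-word record fails the length filter, so only the first record survives.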

Training & Customization

Title with Link	Description
Quickstart with NeMo 2.0 API	The example shows how to run a simple training loop using NeMo 2.0. It uses the train API from the NeMo Framework LLM collection.
Pre-training & PEFT Quickstart with NeMo Run	An introduction to running any of the supported NeMo 2.0 recipes using NeMo-Run. This tutorial takes a pretraining recipe and a fine-tuning recipe and shows how to run them both locally and remotely on a Slurm-based cluster.
Long-Context LLM Training with NeMo Run	Demonstrates how to use NeMo 2.0 recipes with NeMo-Run to train long-context models and to extend the context length of an existing pretrained model.
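One widely used way to extend the context length of a pretrained model that uses rotary position embeddings (RoPE) is linear position interpolation: positions in the longer target window are scaled down so they fall inside the range seen during pretraining, and the model is then briefly fine-tuned. The tutorial description above does not say which technique it uses, so treat this as a sketch of the general idea only. The function names, `base=10000`, and the 4096 to 8192 lengths are assumptions chosen for illustration.

```python
import math


def rope_angles(position: float, dim: int, base: float = 10000.0) -> list[float]:
    """Rotation angle for each of the dim/2 RoPE frequency pairs at a position."""
    return [position * base ** (-2 * i / dim) for i in range(dim // 2)]


def interpolated_position(position: int, trained_len: int, target_len: int) -> float:
    """Linear position interpolation: map [0, target_len) onto [0, trained_len)."""
    scale = trained_len / target_len
    return position * scale


def rotate_pair(x0: float, x1: float, angle: float) -> tuple[float, float]:
    """Apply the 2-D rotation that RoPE uses for one pair of features."""
    c, s = math.cos(angle), math.sin(angle)
    return x0 * c - x1 * s, x0 * s + x1 * c


if __name__ == "__main__":
    trained_len, target_len, dim = 4096, 8192, 8
    last = target_len - 1
    p = interpolated_position(last, trained_len, target_len)
    print(p)  # 4095.5, inside the pretrained range [0, 4096)
    # The rotated query/key features now use angles the model saw in pretraining.
    print(rotate_pair(1.0, 0.0, rope_angles(p, dim)[0]))
```

The design trade-off is that interpolation keeps every angle within the pretrained range at the cost of packing neighboring positions closer together, which is why a short fine-tuning phase on long sequences usually follows.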

Speech AI

Most NeMo Speech AI tutorials can be run on Google Colab.

Running Tutorials on Colab

To run a tutorial:

1. Click the Colab link for the tutorial you are interested in from the tables below.
2. Once in Colab, connect to an instance with a GPU by clicking Runtime > Change runtime type and selecting GPU as the hardware accelerator.
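After changing the runtime type, you can confirm that a GPU is actually attached before running a notebook. This is a minimal sketch, assuming the NVIDIA driver's `nvidia-smi` tool is on the `PATH` (as it normally is on Colab GPU runtimes); the `gpu_visible` helper name is hypothetical.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi is installed and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False  # no NVIDIA driver tools: most likely a CPU-only runtime
    try:
        result = subprocess.run(
            ["nvidia-smi", "--list-gpus"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # On success, --list-gpus prints one "GPU <index>: ..." line per device.
    return result.returncode == 0 and "GPU" in result.stdout


if __name__ == "__main__":
    print("GPU attached" if gpu_visible() else "No GPU: change the runtime type")
```

If PyTorch is already installed in the runtime, `torch.cuda.is_available()` is an equivalent check from inside Python.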

Speech AI Fundamentals

Title	GitHub / Colab URL
Getting Started: NeMo Fundamentals	NeMo Fundamentals
Getting Started: Audio translator example	Audio translator example
Getting Started: Voice swap example	Voice swap example
Getting Started: NeMo Models	NeMo Models
Getting Started: NeMo Adapters	NeMo Adapters
Getting Started: NeMo Models on Hugging Face Hub	NeMo Models on HF Hub

Automatic Speech Recognition (ASR) Tutorials

Title	GitHub / Colab URL
ASR with NeMo	ASR with NeMo
ASR with Subword Tokenization	ASR with Subword Tokenization
Offline ASR	Offline ASR
Online ASR Microphone Cache Aware Streaming	Online ASR Microphone Cache Aware Streaming
Online ASR Microphone Buffered Streaming	Online ASR Microphone Buffered Streaming
ASR CTC Language Fine-Tuning	ASR CTC Language Fine-Tuning
Intro to Transducers	Intro to Transducers
ASR with Transducers	ASR with Transducers
ASR with Adapters	ASR with Adapters
Speech Commands	Speech Commands
Online Offline Microphone Speech Commands	Online Offline Microphone Speech Commands
Voice Activity Detection	Voice Activity Detection
Online Offline Microphone VAD	Online Offline Microphone VAD
Speaker Recognition and Verification	Speaker Recognition and Verification
Speaker Diarization Inference	Speaker Diarization Inference
ASR with Speaker Diarization	ASR with Speaker Diarization
Online Noise Augmentation	Online Noise Augmentation
ASR for Telephony Speech	ASR for Telephony Speech
Streaming inference	Streaming inference
Buffered Transducer inference	Buffered Transducer inference
Buffered Transducer inference with LCS Merge	Buffered Transducer inference with LCS Merge
Offline ASR with VAD for CTC models	Offline ASR with VAD for CTC models
Self-supervised Pre-training for ASR	Self-supervised Pre-training for ASR
Multi-lingual ASR	Multi-lingual ASR
Hybrid ASR-TTS Models	Hybrid ASR-TTS Models
ASR Confidence Estimation	ASR Confidence Estimation
Confidence-based Ensembles	Confidence-based Ensembles

Text-to-Speech (TTS) Tutorials

Title	GitHub / Colab URL
Basic and Advanced: NeMo TTS Primer	NeMo TTS Primer
Basic and Advanced: TTS Speech/Text Aligner Inference	TTS Speech/Text Aligner Inference
Basic and Advanced: FastPitch and MixerTTS Model Training	FastPitch and MixerTTS Model Training
Basic and Advanced: FastPitch Finetuning	FastPitch Finetuning
Basic and Advanced: FastPitch and HiFiGAN Model Training for German	FastPitch and HiFiGAN Model Training for German
Basic and Advanced: Tacotron2 Model Training	Tacotron2 Model Training
Basic and Advanced: FastPitch Duration and Pitch Control	FastPitch Duration and Pitch Control
Basic and Advanced: FastPitch Speaker Interpolation	FastPitch Speaker Interpolation
Basic and Advanced: TTS Inference and Model Selection	TTS Inference and Model Selection
Basic and Advanced: TTS Pronunciation Customization	TTS Pronunciation Customization

Tools and Utilities

Title	GitHub / Colab URL
Utility Tools for Speech and Text: NeMo Forced Aligner	NeMo Forced Aligner
Utility Tools for Speech and Text: Speech Data Explorer	Speech Data Explorer
Utility Tools for Speech and Text: CTC Segmentation	CTC Segmentation

Text Processing (TN/ITN) Tutorials

Title	GitHub / Colab URL
Text Normalization Techniques: Text Normalization	Text Normalization
Text Normalization Techniques: Inverse Text Normalization with Thutmose Tagger	Inverse Text Normalization with Thutmose Tagger
Text Normalization Techniques: WFST Tutorial	WFST Tutorial