**Important:** You are viewing the NeMo 2.0 documentation. This release introduces significant changes to the API and a new library, NeMo Run. We are currently porting all features from NeMo 1.0 to 2.0. For documentation on previous versions or features not yet available in 2.0, refer to the NeMo 24.07 documentation.
# Tutorials

The best way to get started with NeMo is to work through one of our tutorials. They cover a range of domains and include both introductory and advanced topics.

These tutorials can be run from inside the NeMo Framework Docker container.
## Large Language Models

### Data Curation

Explore examples of data curation techniques using NeMo Curator:
| Title with Link | Description |
|---|---|
| | The notebook showcases how to use NeMo Curator with two distinct classifiers: one for evaluating data quality and another for identifying data domains. The integration of these classifiers streamlines the annotation process, thereby enhancing the combination of diverse datasets essential for training foundation models. |
| | The tutorial demonstrates how to use the NeMo Curator Python API to curate a dataset for PEFT. Specifically, it uses the Enron dataset, which contains emails along with classification labels. Each email entry includes a subject, body, and category (class label). Throughout the tutorial, different filtering and processing operations that can be applied to each record are demonstrated. |
| | The notebook provides a typical data curation pipeline using NeMo Curator, with the Thai Wikipedia dataset as an example. It demonstrates how to download Wikipedia data with NeMo Curator, perform language separation using FastText, apply GPU-based exact and fuzzy deduplication, and run CPU-based heuristic filtering. |
| | The tutorial shows how to use the NeMo Curator Python API to curate the TinyStories dataset. TinyStories is a dataset of short stories generated by GPT-3.5 and GPT-4, featuring words that are understood by 3- to 4-year-olds. The small size of this dataset makes it ideal for creating and validating data curation pipelines. |
| Curating Datasets for Parameter Efficient Fine-tuning (PEFT) with Synthetic Data Generation (SDG) | The tutorial demonstrates how to use NeMo Curator's Python API for data curation, synthetic data generation, and quality score assignment to prepare a dataset for PEFT of LLMs. |
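These notebooks share a common set of NeMo Curator building blocks. As a rough, hedged orientation, a minimal heuristic-filtering pipeline might look like the sketch below; the dataset path, filter choices, and thresholds are illustrative assumptions, and the notebooks above are the authoritative, tested examples.

```python
# Minimal sketch of a NeMo Curator heuristic-filtering pipeline.
# Paths, filters, and thresholds are illustrative assumptions.
from nemo_curator import ScoreFilter, Sequential
from nemo_curator.datasets import DocumentDataset
from nemo_curator.filters import RepeatingTopNGramsFilter, WordCountFilter

# Load JSONL records with a "text" field; keep source filenames for output.
dataset = DocumentDataset.read_json("my_data/*.jsonl", add_filename=True)

# Chain heuristic filters: drop very short and highly repetitive documents.
pipeline = Sequential([
    ScoreFilter(WordCountFilter(min_words=80)),
    ScoreFilter(RepeatingTopNGramsFilter(n=3, max_repeating_ngram_ratio=0.18)),
])

curated = pipeline(dataset)
curated.to_json("curated_data/", write_to_filename=True)
```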
### Training & Customization

| Title with Link | Description |
|---|---|
| | The example showcases running a simple training loop using NeMo 2.0. It uses the train API from the NeMo Framework LLM collection. |
| | An introduction to running any of the supported NeMo 2.0 recipes using NeMo-Run. This tutorial takes a pretraining recipe and a fine-tuning recipe and shows how to run them locally, as well as remotely on a Slurm-based cluster. |
| | Demonstrates using NeMo 2.0 recipes with NeMo-Run for long-context model training, as well as extending the context length of an existing pretrained model. |
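As a hedged preview of the workflow these tutorials cover, the sketch below configures and launches a NeMo 2.0 pretraining recipe with NeMo-Run; the recipe choice (llama3_8b), checkpoint directory, and GPU counts are assumptions made for illustration.

```python
# Sketch: run a NeMo 2.0 pretraining recipe with NeMo-Run.
# Recipe choice, paths, and GPU counts are illustrative assumptions.
import nemo_run as run
from nemo.collections import llm

# Build a preconfigured pretraining recipe from the LLM collection.
recipe = llm.llama3_8b.pretrain_recipe(
    name="llama3_8b_pretrain",
    dir="/checkpoints",      # where logs and checkpoints are written
    num_nodes=1,
    num_gpus_per_node=8,
)

# Recipes are plain configs, so fields can be overridden before launching.
recipe.trainer.max_steps = 100

# Execute locally; a run.SlurmExecutor(...) would target a Slurm cluster instead.
run.run(recipe, executor=run.LocalExecutor())
```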
## Speech AI

Most NeMo Speech AI tutorials can be run on Google Colab.

### Running Tutorials on Colab

To run a tutorial:

1. Click the Colab link associated with the tutorial you are interested in from the table below.
2. Once in Colab, connect to an instance with a GPU by clicking Runtime > Change runtime type and selecting GPU as the hardware accelerator.
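Most notebooks begin by installing NeMo into the fresh Colab runtime. A typical first cell looks roughly like the one below; the "all" extra is an assumption, and each notebook pins the exact dependencies and branch it needs.

```python
# First notebook cell: install NeMo into the Colab runtime.
# The "all" extra is an assumption; individual tutorials pin their own dependencies.
!pip install "nemo_toolkit[all]"
```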
### Speech AI Fundamentals

| Title | GitHub / Colab URL |
|---|---|
| Getting Started: NeMo Fundamentals | |
| Getting Started: Audio translator example | |
| Getting Started: Voice swap example | |
| Getting Started: NeMo Models | |
| Getting Started: NeMo Adapters | |
| Getting Started: NeMo Models on Hugging Face Hub | |
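These fundamentals notebooks center on NeMo's pretrained model classes and the portable .nemo checkpoint format. A minimal hedged sketch, with the checkpoint name as an illustrative assumption:

```python
# Sketch: download a pretrained NeMo model and round-trip it through a .nemo file.
# The checkpoint name is an illustrative assumption.
import nemo.collections.asr as nemo_asr

# Fetch a pretrained checkpoint by name (cached locally).
model = nemo_asr.models.ASRModel.from_pretrained(model_name="stt_en_conformer_ctc_small")

# A .nemo file bundles weights and config, so the model restores without extra setup.
model.save_to("my_model.nemo")
restored = nemo_asr.models.ASRModel.restore_from("my_model.nemo")
```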
### Automatic Speech Recognition (ASR) Tutorials

| Title | GitHub / Colab URL |
|---|---|
| ASR with NeMo | |
| ASR with Subword Tokenization | |
| Offline ASR | |
| Online ASR Microphone Cache Aware Streaming | |
| Online ASR Microphone Buffered Streaming | |
| ASR CTC Language Fine-Tuning | |
| Intro to Transducers | |
| ASR with Transducers | |
| ASR with Adapters | |
| Speech Commands | |
| Online Offline Microphone Speech Commands | |
| Voice Activity Detection | |
| Online Offline Microphone VAD | |
| Speaker Recognition and Verification | |
| Speaker Diarization Inference | |
| ASR with Speaker Diarization | |
| Online Noise Augmentation | |
| ASR for Telephony Speech | |
| Streaming inference | |
| Buffered Transducer inference | |
| Buffered Transducer inference with LCS Merge | |
| Offline ASR with VAD for CTC models | |
| Self-supervised Pre-training for ASR | |
| Multi-lingual ASR | |
| Hybrid ASR-TTS Models | |
| ASR Confidence Estimation | |
| Confidence-based Ensembles | |
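For a taste of what the introductory ASR notebooks build toward, the sketch below runs offline transcription with a pretrained model; the checkpoint name and audio path are illustrative assumptions.

```python
# Sketch: offline transcription with a pretrained NeMo ASR model.
# Checkpoint name and audio path are illustrative assumptions.
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="stt_en_conformer_ctc_small")

# transcribe() accepts a list of audio files; 16 kHz mono WAV is the safest input.
transcriptions = asr_model.transcribe(["sample.wav"])
print(transcriptions[0])
```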
### Text-to-Speech (TTS) Tutorials

| Title | GitHub / Colab URL |
|---|---|
| Basic and Advanced: NeMo TTS Primer | |
| Basic and Advanced: TTS Speech/Text Aligner Inference | |
| Basic and Advanced: FastPitch and MixerTTS Model Training | |
| Basic and Advanced: FastPitch Finetuning | |
| Basic and Advanced: FastPitch and HiFiGAN Model Training for German | |
| Basic and Advanced: Tacotron2 Model Training | |
| Basic and Advanced: FastPitch Duration and Pitch Control | |
| Basic and Advanced: FastPitch Speaker Interpolation | |
| Basic and Advanced: TTS Inference and Model Selection | |
| Basic and Advanced: TTS Pronunciation Customization | |
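NeMo TTS inference is typically a two-stage pipeline, a spectrogram generator followed by a vocoder, which several of these notebooks assemble step by step. A hedged sketch, with both checkpoint names as assumptions:

```python
# Sketch: two-stage TTS inference (FastPitch spectrogram generator + HiFi-GAN vocoder).
# Checkpoint names are illustrative assumptions.
import soundfile as sf
from nemo.collections.tts.models import FastPitchModel, HifiGanModel

spec_generator = FastPitchModel.from_pretrained("tts_en_fastpitch")
vocoder = HifiGanModel.from_pretrained("tts_en_hifigan")

# Text -> tokens -> mel spectrogram -> waveform.
tokens = spec_generator.parse("Hello, welcome to NeMo text to speech!")
spectrogram = spec_generator.generate_spectrogram(tokens=tokens)
audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)

# These English checkpoints generate audio at 22.05 kHz.
sf.write("speech.wav", audio.detach().cpu().numpy().squeeze(), samplerate=22050)
```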
### Tools and Utilities

| Title | GitHub / Colab URL |
|---|---|
| Utility Tools for Speech and Text: NeMo Forced Aligner | |
| Utility Tools for Speech and Text: Speech Data Explorer | |
| Utility Tools for Speech and Text: CTC Segmentation | |
### Text Processing (TN/ITN) Tutorials

| Title | GitHub / Colab URL |
|---|---|
| Text Normalization Techniques: Text Normalization | |
| Text Normalization Techniques: Inverse Text Normalization with Thutmose Tagger | |
| Text Normalization Techniques: WFST Tutorial | |
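For context on what these notebooks cover, NeMo's WFST-based (inverse) text normalization is exposed through the nemo_text_processing package. A minimal hedged sketch, where the language and example strings are assumptions:

```python
# Sketch: WFST-based text normalization (TN) and inverse text normalization (ITN).
# Language and example strings are illustrative assumptions.
from nemo_text_processing.text_normalization.normalize import Normalizer
from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer

# TN rewrites written form into spoken form, e.g. "$4.50" into words.
normalizer = Normalizer(input_case="cased", lang="en")
print(normalizer.normalize("It costs $4.50."))

# ITN goes the other way, e.g. "twenty three" into "23".
inverse_normalizer = InverseNormalizer(lang="en")
print(inverse_normalizer.inverse_normalize("it costs twenty three dollars", verbose=False))
```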