Icefall for dummies tutorial

This tutorial walks you step by step through creating a simple ASR (Automatic Speech Recognition) system with Next-gen Kaldi.

We use the yesno dataset for demonstration. We chose it for two reasons:

  • It is quite tiny, containing only about 12 minutes of data.

  • The training can be finished within 20 seconds on CPU.

That also means you don’t need a GPU to run this tutorial.

Let’s get started!

Please follow the items below in order.

Note

The Data Preparation step runs only on Linux and macOS. All other parts run on Linux, macOS, and Windows.

Help from the community in porting the Data Preparation step to Windows is appreciated.
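
If you are unsure whether your machine can run the Data Preparation step, the platform note above can be checked with a few lines of Python. This is only a minimal sketch: the helper name `can_run_data_preparation` and the constant `DATA_PREP_PLATFORMS` are hypothetical and are not part of icefall. It relies on the standard-library function `platform.system()`, which reports macOS as "Darwin".

```python
import platform

# Operating systems on which the note says Data Preparation runs.
# platform.system() returns "Darwin" on macOS.
DATA_PREP_PLATFORMS = {"Linux", "Darwin"}


def can_run_data_preparation(system_name=None):
    """Return True if Data Preparation is expected to run on this OS.

    system_name defaults to the current OS; pass a name to check another one.
    This helper is illustrative only and is not provided by icefall.
    """
    name = platform.system() if system_name is None else system_name
    return name in DATA_PREP_PLATFORMS


if __name__ == "__main__":
    if can_run_data_preparation():
        print("Data Preparation should run on this machine.")
    else:
        print("Data Preparation is not supported here; the other parts still run.")
```

The check covers only the operating system; it says nothing about whether the required packages are installed.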