Demo of Singing Voice Synthesis in Muskits-ESPnet
This is the demo page of our toolkit Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm.
Singing Voice Synthesis (SVS) takes a music score as input and generates singing vocals in the voice of a specific singer.
A music score usually includes the lyrics, as well as the duration and pitch of each word in the lyrics.
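As a rough illustration of the score described above, each sung unit can be thought of as a (lyric, duration, pitch) triple. The sketch below is not the toolkit's internal representation; the `Note` class, its field names, and the sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Note:
    """One entry of a music score (hypothetical structure, for illustration)."""
    lyric: str       # one lyric token, e.g. a single Chinese character
    duration: float  # duration of the token, as a float
    pitch: int       # MIDI note number, e.g. 69 for A4


# A three-note score with made-up values.
score = [Note("小", 0.5, 60), Note("星", 0.5, 60), Note("星", 1.0, 67)]

# Total length of the score is the sum of the per-note durations.
total = sum(note.duration for note in score)
print(total)
```

The three sequences entered in the demo (lyrics, durations, pitches) correspond to the three fields of such a structure, which is why they must have the same length.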
How to use:
- Choose Model-Language:
  - Choose "zh" for Chinese lyrics input or "jp" for Japanese lyrics input.
  - For example, "Model②(Multilingual)-zh" means the model "Model②(Multilingual)" with lyrics input in Chinese.
- [Optional] Choose Singer: Choose a singer from the drop-down menu.
- Input lyrics:
  - Input Chinese characters for "zh" and hiragana for "jp".
  - You may include special symbols: 'AP' for breath, 'SP' for silence, and '-' for slur (Chinese lyrics only).
  - Separate each lyric by either a space (' ') or a newline ('\n') (no quotation marks needed).
- Input durations:
  - Input durations as floating-point numbers.
  - The duration sequence should match the lyric sequence in length, with each duration aligned to a lyric.
  - Separate each duration by a space (' ') or a newline ('\n') (no quotation marks needed).
- Input pitches:
  - Input MIDI note names or MIDI note numbers (e.g., the MIDI note number "69" corresponds to the note name "A4", and other notes follow accordingly).
  - The pitch sequence should match the lyric sequence in length, with each pitch corresponding to a lyric.
  - Separate each pitch by a space (' ') or a newline ('\n') (no quotation marks needed).
- Hit "Generate" and listen:
  - "Running Status" shows the status of singing generation. If an error occurs, the error information is shown here.
  - "Pseudo MOS" is the predicted mean opinion score (MOS) for the generated song.
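The input rules above can be checked before hitting "Generate". The following is a minimal sketch of such a check, not the demo's own code: the helper names `to_midi_number` and `parse_inputs` are hypothetical, and support for accidentals written as `#` or `b` is an assumption. The note-name conversion uses the convention stated above, where A4 is MIDI note number 69.

```python
import re

# Semitone offsets of the natural notes within one octave.
_SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
# Note letter, optional accidental (assumed '#' or 'b'), then octave number.
_NOTE_RE = re.compile(r"^([A-Ga-g])([#b]?)(-?\d+)$")


def to_midi_number(token: str) -> int:
    """Convert a pitch token such as "69" or "A4" to a MIDI note number."""
    if token.isdigit():
        return int(token)
    match = _NOTE_RE.match(token)
    if match is None:
        raise ValueError(f"unrecognized pitch: {token!r}")
    letter, accidental, octave = match.groups()
    shift = {"#": 1, "b": -1, "": 0}[accidental]
    # With A4 = 69, octave -1 starts at MIDI note number 0.
    return 12 * (int(octave) + 1) + _SEMITONES[letter.upper()] + shift


def parse_inputs(lyrics: str, durations: str, pitches: str):
    """Split the three text fields and check that they align one-to-one."""
    # str.split() with no argument splits on spaces and newlines alike.
    lyric_seq = lyrics.split()
    duration_seq = [float(d) for d in durations.split()]
    pitch_seq = [to_midi_number(p) for p in pitches.split()]
    if not (len(lyric_seq) == len(duration_seq) == len(pitch_seq)):
        raise ValueError(
            f"length mismatch: {len(lyric_seq)} lyrics, "
            f"{len(duration_seq)} durations, {len(pitch_seq)} pitches"
        )
    return list(zip(lyric_seq, duration_seq, pitch_seq))


notes = parse_inputs("小 星\n星", "0.5 0.5\n1.0", "60 C4 G4")
print(notes)
```

A length mismatch or an unparseable pitch raises `ValueError` here; how the demo itself reports such problems is shown in "Running Status".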
Notice:
- Plenty of examples are provided.
- Extreme values may result in suboptimal generation quality!
Examples
| Model-Language | Singer | Lyrics | Duration | Pitch |
|---|---|---|---|---|
References: Muskits-ESPnet paper | espnet | Model①(Chinese) | Model②(Multilingual) | SingMOS