Job description
I am looking for a freelancer to create a Colaboratory script that lets me train and test a text-to-speech (TTS) model based on the NVIDIA NeMo framework. It is important that the script can also modify, fake, augment, or swap voices: I want to train on a large amount of data and then swap the voice based on a smaller sample of data. In more detail:
Colaboratory script:
- It should be clear and easy to understand, with comments and a description of what each code cell does.
- It should clearly mark the places where I can change training parameters or methods, with a comment listing the possible values for each.
- After training, there should be an option to enter text and get a generated audio file.
- After training, there should be an option to measure mel-cepstral distortion (MCD).
- Once the above is done, the next cell should be able to modify, fake, augment, or swap the voice based on a small audio sample of a target voice. For this task I would like to be able to use either my trained model or a pre-trained model from NVIDIA NeMo.
- There should be an option to upload previously trained models (with and without voice swap) from my computer and use them for TTS or MCD.
Training data:
- I need the script to be able to process M-AILABS and Common Voice 7.0, either separately or combined.
- I also need to be able to provide my own training data, so please define a simple data format in which I can prepare it.
- Since the free tier of Colaboratory has usage limits, it is OK to train on fragments of the datasets mentioned above.
- You can test on any language, but before delivery the script must work with a West Slavic language that I will specify.
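As one possible answer to the data-format requirement: NeMo's speech data loaders typically read a JSON-lines "manifest", one object per utterance, with fields such as `audio_filepath`, `text` and `duration` (seconds). The exact fields a given NeMo TTS model expects should be confirmed against its config. The sketch below assumes that three-field layout; the helper names `dumps_manifest`, `loads_manifest` and `write_manifest` are hypothetical, not NeMo functions. The same layout could also serve as the common target when converting M-AILABS and Common Voice.

```python
import json
from typing import Dict, Iterable, List

# Assumed minimal manifest fields; "duration" is the clip length in seconds.
REQUIRED_KEYS = ("audio_filepath", "text", "duration")


def dumps_manifest(entries: Iterable[Dict]) -> str:
    """Serialize entries as JSON lines, checking the required keys."""
    lines = []
    for entry in entries:
        missing = [k for k in REQUIRED_KEYS if k not in entry]
        if missing:
            raise ValueError(f"entry is missing keys: {missing}")
        # ensure_ascii=False keeps diacritics (e.g. ą, ř, ł) readable in the file
        lines.append(json.dumps(entry, ensure_ascii=False))
    return ("\n".join(lines) + "\n") if lines else ""


def loads_manifest(text: str) -> List[Dict]:
    """Parse JSON-lines text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def write_manifest(entries: Iterable[Dict], manifest_path: str) -> None:
    """Write a UTF-8 manifest file to manifest_path."""
    with open(manifest_path, "w", encoding="utf-8") as f:
        f.write(dumps_manifest(entries))
```

A single line of such a file would look like `{"audio_filepath": "wavs/0001.wav", "text": "Dzień dobry", "duration": 1.5}`.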