https://touch-sp.hatenablog.com/entry/2022/07/11/232407

Introduction

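Last time I used ESPnet to do speech synthesis.
touch-sp.hatenablog.com

It appears that changing just one part of the script is enough to synthesize a variety of voices, so this time I tried a few of them.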

Results (three voices)

jsut

Same as last time.

from espnet2.bin.tts_inference import Text2Speech
from espnet2.utils.types import str_or_none

text2speech = Text2Speech.from_pretrained(
    model_tag=str_or_none('kan-bayashi/jsut_full_band_vits_prosody'),
    vocoder_tag=str_or_none('none'),
    device="cuda"
)

tsukuyomi

text2speech = Text2Speech.from_pretrained(
    model_tag=str_or_none('kan-bayashi/tsukuyomi_full_band_vits_prosody'),
    vocoder_tag=str_or_none('none'),
    device="cuda"
)

jvs

text2speech = Text2Speech.from_pretrained(
    model_tag=str_or_none('kan-bayashi/jvs_jvs010_vits_prosody'),
    vocoder_tag=str_or_none('none'),
    device="cuda"
)
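
Since the three snippets differ only in the model tag, the switch can be collected into one small script. The following is a minimal sketch, not the script from the previous post: the helper names `build_kwargs` and `synthesize`, the output path, and the use of `soundfile` for writing the WAV file are all assumptions made for illustration. It passes `None` for `vocoder_tag` directly, which is the value `str_or_none('none')` evaluates to.

```python
# Voice name -> pretrained model tag, copied from the three snippets above.
MODEL_TAGS = {
    "jsut": "kan-bayashi/jsut_full_band_vits_prosody",
    "tsukuyomi": "kan-bayashi/tsukuyomi_full_band_vits_prosody",
    "jvs": "kan-bayashi/jvs_jvs010_vits_prosody",
}


def build_kwargs(voice, device="cuda"):
    """Return from_pretrained() arguments for one voice (hypothetical helper)."""
    return {
        "model_tag": MODEL_TAGS[voice],
        "vocoder_tag": None,  # same value that str_or_none('none') yields
        "device": device,
    }


def synthesize(voice, text, out_path, device="cuda"):
    """Synthesize `text` with the chosen voice and save it as a WAV file.

    Hypothetical helper; assumes espnet and soundfile are installed.
    """
    # Imported here so the tag table can be used without ESPnet installed.
    import soundfile as sf
    from espnet2.bin.tts_inference import Text2Speech

    text2speech = Text2Speech.from_pretrained(**build_kwargs(voice, device))
    wav = text2speech(text)["wav"]  # 1-D torch tensor of samples
    sf.write(out_path, wav.view(-1).cpu().numpy(), text2speech.fs, "PCM_16")
```

One possible call would be `synthesize("tsukuyomi", "こんにちは", "tsukuyomi.wav")`, where both the text and the file name are placeholders. Pass `device="cpu"` if no GPU is available.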