SynthV 2 has worse accents than SynthV 1, and worst of all, bad Korean.
I can’t get a good Mandarin Chinese “u” with a Japanese voicebank. It all sounds like “uhh” instead of “ooo”, much like it does in a Japanese accent. An English voicebank struggles with Chinese “ts`h” and Spanish “ll”, and no non-Korean voicebank can say the Korean syllable “시” properly. It’s supposed to sound exactly like Japanese “し” (somewhere in between English “sea” and “she” but MUCH, MUCH closer to “she”), and yet all non-Korean voicebanks say it more like “sea” instead. The voicebanks can say “し” pretty well, but “시”, which is supposed to be identical, ends up sounding totally different and “wrong” (the initial consonant is the issue: it has too little palatalization).

The previous version (SynthV 1.11.2) could say all of these sounds correctly, without any noticeable accent, but SynthV 2 feels like starting from SynthV 1.5.0 all over again: a bad “non-native feeling” accent that’s hard to get rid of, many phonemes that sound “wrong”, vowels and consonants that feel like they’re spoken by a “no sabo” kid, and place and manner of articulation bleeding together in unexpected ways, which makes the whole sound feel “wrong”. Mai’s English is too “Japanese accented”: “ae” gets confused with “ah”, “ax” with “ao”, and “uw” with “uh”. Liam’s Chinese sounds “spotty”, and in every voicebank I have, the Korean “시” and “ㅢ” are bad, with “ㅡ” getting mixed up with “ㅜ”, “ㅢ” with “ㅟ” (or with Lojban “.ui”), and “ㅝ”/wʌ/ with “ᄝᅩ”/wo/.

The previous version had far fewer issues like that, and they could be fixed with simple phoneme parameter editing. The new version is a lot more jarring, and no matter how many parameters I add, I can’t get rid of that ugly “uhh” which shouldn’t exist in Chinese, or that sticky Spanish “ll” which sounds too much like “jju”, like a wet piece of saliva stuck on the roof of your mouth.
Update: I’m thinking of workarounds to avoid the problems. The Korean 시 (and also 씨) is surprisingly simple: replace it with “shi” (JP) or “xi” (ZH). If you want a softer pronunciation (시), you can decrease the strength and optionally change the duration and mouth opening. For a hard pronunciation (씨), I just increase the strength or duration, and when it sounds bad I double the phoneme so it’s [sh sh i] or [s\ s\ i], for Japanese and Chinese respectively.
I found some issues with Spanish too
exactly what? Is it the “sticky jju” I was talking about?
it should sound like “lyyu” (with a prominent lateral sound, the way people without yeismo say their ll’s in Spanish; it’s very close to “gli” in Italian), but I’m getting a “jju” (more similar to Italian “giu”). What I get is more of a fricative or affricate, but the intended sound is an approximant. For more info about “lyyu” and “jju”, watch this video: Spanish speakers CAN’T pronounce this Spanish sound! (LL vs Y) - YouTube. (At about 1:49 she says the word “yeve” with the sticky “jju”, and a bit earlier she says “lleve” with “lyyu”.)
the sticky jju I hear is probably a crude imitation of a yeismo sound but with a strong foreign accent (possibly a Chinese or Japanese accent), despite the fact I’m literally using the phoneme that was specifically added to handle NON-YEISMO PRONUNCIATIONS
the Korean 우 issue can be fixed by replacing the Hangul character with the Latin letters “uq”.

Now let me talk more about the sticky “JJU” issue. The sticky jju is a sound produced by sticking your wet, salivated tongue to the roof of your mouth and saying the word “jju”. It incorrectly shows up in place of “ll”. To get the non-yeismo sound from https://www.youtube.com/watch?v=SgGmXr0DzA8, I had to think of ways to intentionally misspell the word. I started with a simple VCV sequence like /aʎa/, which when spelled normally (as “alla”) mistakenly came out as /aʑ͡ʝa/. Given that Italian, a language closely related to Spanish, spells a similar sound as “gli”, I tried “aglia”. Surprisingly, even though the phonemes were wrong, it gave me the correct sound! Why? Because the consonant cluster in Spanish “aglia” (phonemes [a g l y a]) is a combination of 3 phonemes: a soft “g” (IPA /ɣ/ or sometimes /ɰ/) to give us that sudden drop in sound intensity at the start of the lateral approximant, an “l” for the lateral “lyyu”, and the palatal “y” for the intense palatal effect right where it transitions to the vowel. So essentially, we’re dealing with IPA /ɰlj/, and when it’s spoken quickly, the 3 sounds slur together into /ʎ/ or /ʎ̟/, which is exactly what I wanted.
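If you keep your phonemes as plain lists before typing them into the editor, the “aglia” trick can be sketched as a tiny substitution. This is only an illustration, not anything SynthV provides: the function name and the placeholder token "ll" for the non-yeismo phoneme are assumptions (use whatever symbol your voicebank actually shows), and only the [g l y] replacement comes from the experiment above.

```python
# Hypothetical helper: swaps a placeholder "ll" token for the [g l y] cluster
# that sounded right in the "aglia" test. "ll" is NOT a confirmed SynthV symbol.

def replace_ll(phonemes, ll_symbol="ll"):
    """Return a new phoneme list with every ll_symbol replaced by g, l, y."""
    result = []
    for phoneme in phonemes:
        if phoneme == ll_symbol:
            result.extend(["g", "l", "y"])  # soft g + lateral + palatal glide
        else:
            result.append(phoneme)
    return result


# "alla" written with the placeholder token becomes the "aglia" phonemes.
print(" ".join(replace_ll(["a", "ll", "a"])))
```

The output sequence is what you would type into the note’s phoneme field by hand.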
In Spanish I am having more issues with “RR” (it comes out as a soft “r”), “J” (it comes out as an aspirated “h”) and “Z” or “C” (they come out as “s”… very strong cases of “seseo”).
It is difficult to get a “cerveza” instead of “servesa”, a “perro” instead of “pero” or “Naranja” instead of “naranha”.
for some reason the “use European pronunciation” box doesn’t work. But try replacing the “s” phoneme with “C”, and sometimes “b” with “B” because the letter “v” might default to a hard “b” sometimes.

for “r” I would suggest shortening it and changing the pitch bend to a “/” shape instead of a "///" shape, and if it doesn’t work maybe replace it with English “dx”, “er dx”, “dx er”, or “er dx er” depending on what comes before and after it, so it doesn’t become a rr
VrV → V dx V
CrV → C er dx V
VrC → V dx er C
CrC → C er dx er C
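The four rules above boil down to “always use dx, and pad with er on any side that touches a consonant”. Here is a minimal sketch of that, assuming you already know whether the neighbouring sounds are vowels. The function name and the two boolean flags are made up for illustration; only the “dx” and “er” symbols come from the rules.

```python
# Hypothetical helper for the soft-r workaround. Not a SynthV API.

def soft_r(prev_is_vowel, next_is_vowel):
    """Return the phonemes that stand in for one soft Spanish "r"."""
    phonemes = ["dx"]
    if not prev_is_vowel:
        phonemes.insert(0, "er")  # CrV and CrC: "er" before the tap
    if not next_is_vowel:
        phonemes.append("er")     # VrC and CrC: "er" after the tap
    return phonemes


print("VrV:", " ".join(soft_r(True, True)))
print("CrV:", " ".join(soft_r(False, True)))
print("VrC:", " ".join(soft_r(True, False)))
print("CrC:", " ".join(soft_r(False, False)))
```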
Yes, I came to the same conclusion yesterday: the “use European pronunciation” option is broken and behind some of the issues I found.
Thanks for all your suggestions!
for rr, use [er dx er dx er dx er], alternating between er and dx.
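If you want to experiment with a longer or shorter trill, the alternating pattern can be generated instead of typed out. A rough sketch: the function name and the `taps` parameter are my own invention, and three taps is simply what reproduces the sequence above, not a tested optimum.

```python
# Hypothetical helper: builds the alternating er/dx pattern for a trilled "rr".

def trilled_rr(taps=3):
    """Return "er" followed by `taps` repetitions of "dx er"."""
    if taps < 1:
        raise ValueError("a trill needs at least one tap")
    phonemes = ["er"]
    for _ in range(taps):
        phonemes.extend(["dx", "er"])
    return phonemes


# With the default of 3 taps this gives the 7-phoneme sequence suggested above.
print(" ".join(trilled_rr()))
```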

the sticky jju is avoided, as I said, by the “g, l, i” sequence.

/ʑ͡ʝ/ is a voiced, only partially sibilant palatal fricative, somewhere in between /ʑ/ (fully sibilant) and /ʝ/ (non-sibilant). The sound /ʑ͡ʝ/ is a fricative, and is made with significant friction in the “sticky” region of the tongue, right near where it meets the salivary gland on the hard palate, which is why there’s a significant amount of saliva. There is also an affricate variant /ɟ͡ʑ/, a voiceless fricative variant /ɕ͡ç/, and a voiceless affricate /c͡ɕ/. SynthV uses all four of /ʑ͡ʝ/, /ɟ͡ʑ/, /ɕ͡ç/, /c͡ɕ/ interchangeably but incorrectly. I believe it might be a Japanese influence.
/ʑ͡ʝ/ might sound similar to じ.
/ɟ͡ʑ/ might sound similar to ぢ or ぎ
/ɕ͡ç/ might sound similar to し or ひ
/c͡ɕ/ might sound similar to き or ち
The sounds /ʑ͡ʝ/ and /ɟ͡ʑ/ are made by sticking your wet, salivated tongue to the roof of your mouth and saying the word “jju”.