Can we have something like a "global phoneme palette"?

Instead of each language having its own separate phoneme inventory, what about merging them into one global phoneme palette? This way, a language would be able to borrow a phoneme from another language.

My proposal is like this:

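| Symbol | IPA | English | Japanese | Mandarin | Cantonese | Spanish | Korean |
| --- | --- | --- | --- | --- | --- | --- | --- |
| m | [m] | Mine | Ma (ま) | M (ㄇ) | M (ㄇ) | Madre | M (ㅁ) |
| n | [n] | Nice | Na (な) | N (ㄋ) | N (ㄋ) | Nido | N (ㄴ) |
| nj | [ɲ] | - | - | - | - | Ñandú | - |
| ng | [ŋ] | loNG | - | aNG (ㄤ) | NG (ㄫ) | ciNco | NG (ㅇ; coda) |
| p | [p] | Pill | Pa (ぱ) | B (ㄅ) | B (ㄅ) | Pozo | PP (ㅃ) |
| p_h | [pʰ] | - | - | P (ㄆ) | P (ㄆ) | - | P (ㅍ) |
| p0 | [p̚] | - | - | - | P (ㄆ; coda) | - | P (ㅂ; coda) |
| b | [b] | Bit | Ba (ば) | - | - | Bestia | B (ㅂ) |
| t | [t] | Ten | Ta (た) | D (ㄉ) | D (ㄉ) | Tamis | TT (ㄸ) |
| t_h | [tʰ] | - | - | T (ㄊ) | T (ㄊ) | - | T (ㅌ) |
| t0 | [t̚] | - | - | - | T (ㄊ; coda) | - | T (ㄷ; coda) |
| d | [d] | Dive | Da (だ) | - | - | Dedo | D (ㄷ) |
| k | [k] | Key | Ka (か) | G (ㄍ) | G (ㄍ) | Kilo | KK (ㄲ) |
| k_h | [kʰ] | - | - | K (ㄎ) | K (ㄎ) | - | K (ㅋ) |
| k0 | [k̚] | - | - | - | K (ㄎ; coda) | - | K (ㄱ; coda) |
| g | [g] | Gas | Ga (が) | - | - | Gato | G (ㄱ) |
| Q | [ʔ] | (glottal stop) | T (っ) | (glottal stop) | (glottal stop) | (glottal stop) | (glottal stop) |
| r | [r] | - | - | - | - | Rumbo | - |
| dr | [ɾ~ɺ] | - | Ra (ら) | - | - | caRo | R (ㄹ) |
| V | [β] | - | - | - | - | beBé | - |
| f | [f] | Fine | - | F (ㄈ) | F (ㄈ) | Fase | - |
| v | [v] | Vine | - | - | - | aFgano | - |
| th | [θ] | absinTHe | - | - | - | Cereal | - |
| dh | [ð] | boTHer | - | - | - | dáDiva | - |
| s | [s] | Song | Sa (さ) | S (ㄙ) | S (ㄒ) | Saco | SS (ㅆ) |
| s_h | [sʰ] | - | - | - | - | - | S (ㅅ) |
| z | [z] | Zoo | Za (ざ) | - | - | iSla | - |
| S | [ʃ~ʂ] | SHin | - | SH (ㄕ) | - | SHanghái | - |
| Z | [ʒ] | aZure | - | - | - | - | - |
| sj | [ɕ] | - | - | X (ㄒ) | S (ㄒ; iotated) | - | SS (ㅆ; iotated) |
| sj_h | [ɕʰ] | - | SHi (し) | - | - | - | S (ㅅ; iotated) |
| J | [ʝ~ʎ] | - | - | - | - | LLave | - |
| gh | [ɣ] | - | - | - | - | triGo | - |
| h | [x~h] | Honey | Ha (は) | H (ㄏ) | H (ㄏ) | Jamón | H (ㅎ) |
| ts | [ts] | piZZa | TSu (つ) | Z (ㄗ) | DZ (ㄐ) | queTZal | - |
| ts_h | [tsʰ] | - | - | C (ㄘ) | TS (ㄑ) | - | - |
| dz | [dz] | paDS | - | - | - | - | - |
| tS | [tʃ~ʈʂ] | - | - | ZH (ㄓ) | - | - | - |
| tS_h | [tʃʰ~ʈʂʰ] | CHop | - | CH (ㄔ) | - | oCHo | - |
| dZ | [dʒ] | Jump | - | - | - | - | - |
| tsj | [tɕ] | - | - | J (ㄐ) | DZ (ㄐ; iotated) | - | JJ (ㅉ) |
| tsj_h | [tɕʰ] | - | CHi (ち) | Q (ㄑ) | TS (ㄑ; iotated) | - | CH (ㅊ) |
| dzj | [dʑ] | - | - | - | - | - | J (ㅈ) |
| rH | [ɻ] | Run | - | R (ㄖ) | - | - | - |
| j | [j] | Yes | Ya (や) | Y (ㄧ) | Y (ㄧ) | haY | Ya (ㅑ) |
| wj | [ɥ] | - | - | YU (ㄩ) | YU (ㄩ) | - | Wi (ㅟ) |
| w | [w] | We | Wa (わ) | W (ㄨ) | W (ㄨ) | HUeso | Wa (ㅘ) |
| l | [l] | Line | - | L (ㄌ) | L (ㄌ) | Lino | L (ㄹ; coda) |
| i | [i] | flEEce | I (い) | I (ㄧ) | I (ㄧ) | mÍo | I (ㅣ) |
| y | [y] | - | - | YU (ㄩ) | YU (ㄩ) | - | - |
| uu | [ɨ~ɯ] | - | - | I (ㄭ; after z/c/s) | - | - | EU (ㅡ) |
| u | [u] | gOOse | - | U (ㄨ) | U (ㄨ) | dÚo | U (ㅜ) |
| I | [ɪ] | kIt | - | - | - | - | - |
| U | [ʊ] | fOOt | U (う) | - | - | - | - |
| e | [e~ɛ] | drEss | E (え) | Ê (ㄝ) | E (ㄝ) | mÉxico | E (ㅔ) |
| o | [o~ɔ] | thOUGHt | O (お) | O (ㄛ) | O (ㄛ) | acciÓn | O (ㅗ) |
| E | [ə~ɜ] | commA | - | E (ㄜ) | A (ㆿ) | - | EO (ㅓ) |
| oe | [œ~ø~ɵ] | - | - | - | OE (ㆾ) | - | OE (ㅚ) |
| ae | [æ] | trAp | - | - | - | - | AE (ㅐ; nonstandard) |
| AE | [ɐ] | strUt | - | - | - | - | - |
| a | [a] | mOuth | A (あ) | A (ㄚ) | A (ㄚ) | Álbum | A (ㅏ) |
| A | [ɑ] | pAlm | - | - | - | - | - |
| m= | [m̩] | - | - | - | M (ㄇ; syllabic) | - | - |
| N= | [ŋ̩~ɴ̩] | - | N (ん) | - | NG (ㄫ; syllabic) | - | - |
| rH= | [ɚ~ɻ̩] | lettER | - | I (ㄭ; after zh/ch/sh) | - | - | - |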
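
To make the "borrowing" idea a bit more concrete, here is a minimal sketch of how a few rows of the palette could be stored and queried. The symbols, IPA values and language columns are copied from the table; the `PALETTE` structure and the `is_native` / `donors` functions are purely hypothetical names for illustration and are not part of any existing SynthesizerV API.

```python
# Hypothetical excerpt of the palette: symbol -> IPA and the languages
# that list the phoneme in the table. Names and layout are assumptions.
PALETTE = {
    "p":   {"ipa": "p",  "langs": {"English", "Japanese", "Mandarin",
                                   "Cantonese", "Spanish", "Korean"}},
    "p_h": {"ipa": "pʰ", "langs": {"Mandarin", "Cantonese", "Korean"}},
    "nj":  {"ipa": "ɲ",  "langs": {"Spanish"}},
    "th":  {"ipa": "θ",  "langs": {"English", "Spanish"}},
    "y":   {"ipa": "y",  "langs": {"Mandarin", "Cantonese"}},
}


def is_native(symbol, language):
    """True if the table lists this phoneme for the given language."""
    return language in PALETTE[symbol]["langs"]


def donors(symbol, language):
    """Languages the phoneme could be borrowed from.

    Returns an empty list when the language already has the phoneme.
    """
    if is_native(symbol, language):
        return []
    return sorted(PALETTE[symbol]["langs"])
```

With this layout, asking for `nj` in English would point at Spanish as the donor, while asking for `p` in English would need no borrowing at all.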

This would be more along the lines of what Vocaloid does.

However, it would be a step in the wrong direction.

Each speaker applies their own accent to phonemes, which means that despite how a phoneme appears on paper, it will sound different with different speakers.

You could argue that, by definition, the speaker is doing it wrong if the phonemes can’t be transferred to different languages. But that would assume that some sort of abstract, language-independent version of each phoneme exists. The reality is that phonemes are abstractions, and the actual sound depends on the context.

For example, the same phoneme is realized differently depending on where it sits in a word. Think of the English /p/: at the start of a word it is usually released with a puff of air, a sort of /h/ sound at the end, while at the end of a word it often has no release sound at all. You’ll see an attempt to capture that in some phoneme charts with something like a /ph/.

To have a universal system that accurately captures this sort of variety, you’d need to add a massive list of allophones to that chart, and that would be too unwieldy to use.
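
As a rough illustration of why the list grows so quickly, here is a small sketch in which the realization of a phoneme depends on the language and on its position in the word. The rule table and the `realize` function are hypothetical; the output symbols (`p`, `p_h`, `p0`) are borrowed from the chart in the first post.

```python
# Hypothetical context rules: (language, phoneme, position) -> palette symbol.
# Only /p/ is covered here; a real system would need rules like these for
# every phoneme, language and context.
ALLOPHONES = {
    ("English", "p", "initial"): "p_h",  # aspirated, as in "pill"
    ("English", "p", "final"): "p0",     # often unreleased, as in "stop"
    ("Spanish", "p", "initial"): "p",    # unaspirated, as in "pozo"
}


def realize(language, phoneme, position):
    """Pick the surface symbol for a phoneme in a given context.

    Falls back to the plain phoneme symbol when no rule matches.
    """
    return ALLOPHONES.get((language, phoneme, position), phoneme)
```

Even this toy version needs one entry per language, phoneme and position, which is the bookkeeping a per-language phoneme set currently hides from the user.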

One of the strengths of SynthesizerV is its ability to capture those language differences, which gives a more natural output that comes “for free” to the user, but results in a more restrictive set of phonemes per language.


OK, how about an addendum to OP’s proposal:
Alongside the 6 available languages (and hopefully more in future), why not have a bonus selectable language in the dropdown? It could be called “Multilingual” or “Global”, and then the accent, as you mentioned, could be selected from a set of checkboxes (like the relaxed-phonemes checkbox English gets).
I think this feature of having a toggle-able ‘accent override’ for voices could be REALLY helpful beyond just using it with a “Multilingual” language selection.

The way “Accent Override” would work would be as follows:
For each selectable language, you could choose between the “System Accent” and the “Character Accent”. That way, when you use a Japanese voice such as Yuma, who has a VERY noticeable accent, you could switch between his usual accent, baked into the voicebank by the voice actor, and an averaged-out accent that every voicebank has access to. Sure, it’d probably make the voice sound a little different, but it’d be a nice creative tool, and I assume it would not be enabled by default.

This would also be available during XSL. Say you’re using Yuma and he is struggling with English when using his Character Accent: you could select System Accent, and it would more aggressively override his pronunciations to match an average English accent.
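
Here is a minimal sketch of how that per-language toggle could be modeled, just to pin the idea down. Everything in it (`Accent`, `VoiceSettings`, `accent_for`) is a made-up name for illustration, not an existing SynthesizerV setting, and it assumes the Character Accent stays the default unless an override is set.

```python
from dataclasses import dataclass, field
from enum import Enum


class Accent(Enum):
    CHARACTER = "character"  # accent baked into the voicebank
    SYSTEM = "system"        # averaged-out accent shared by all voicebanks


@dataclass
class VoiceSettings:
    """Hypothetical per-voice settings with per-language accent overrides."""
    voice: str
    native_language: str
    overrides: dict = field(default_factory=dict)  # language -> Accent


def accent_for(settings, language):
    """Return the accent to use for a language.

    The override is opt-in, so the Character Accent is the default.
    """
    return settings.overrides.get(language, Accent.CHARACTER)
```

For example, `VoiceSettings("Yuma", "Japanese", {"English": Accent.SYSTEM})` would keep his own accent in Japanese and switch to the System Accent only for English.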

Where this feature could become REALLY COOL is in adding regional accents to the program. It would be really neat if they added some regional English accents… Although I think there’s such a large variety of English accents that they’d probably want a drop-down selection for this, with accents such as “American” (I don’t think any of the current English voices have a STRONG accent in any direction, so this’d probably be what we currently have?? idk), “Southern USA”, “New York”, “London”, “Manchester” (idk much about UK accents, I think there’s supposed to be a lot of em tho), “Australia”… As you can see, I’m not really sure if they should be labeled by region, city, or country…

But yeah, I still agree with OP, aside from my addition of a selectable accent. We need a language option with access to every phoneme, because that could open up custom dictionaries for not-yet-supported languages. It’d also be useful for combining phonemes that come from different languages within a single note.
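
A quick sketch of what such a custom dictionary could look like, assuming entries are simply words mapped to palette symbols. French is used here purely as an example of a language outside the six in the table; the dictionary format and the `unknown_symbols` helper are hypothetical.

```python
# Hypothetical excerpt of the palette symbols from the table in the first post.
PALETTE_SYMBOLS = {"t", "y", "w", "i", "m", "a"}

# Hypothetical user dictionary: word -> list of palette symbols.
# "tu" mixes a common phoneme (t) with one the table only lists for
# Mandarin and Cantonese (y).
CUSTOM_DICT = {
    "tu": ["t", "y"],
    "oui": ["w", "i"],
}


def unknown_symbols(entries, palette):
    """Return {word: [symbols missing from the palette]} for bad entries."""
    problems = {}
    for word, symbols in entries.items():
        missing = [s for s in symbols if s not in palette]
        if missing:
            problems[word] = missing
    return problems
```

An empty result means every entry can be spelled with the palette as it stands; anything else shows which symbols the palette would still need.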
