Singer timing is wrong

Hey guys! I bought Synth V a year ago and loved playing around with it. Now that I’m at ease with music production, I decided to use it for real… and realized there is a HUGE timing error between the AI singer and the written notes.

[Image: timing error]

You can see it here. The soundwave starts much earlier than the note! Using Synth V by itself, I never realized since I had nothing to compare the voice to.
But now I’m using it with instruments, and the singer is just completely off.

How do you guys deal with that issue?
For reference, I’m using Mai, but I also bought Frimomen and use free voices; the problem is systematic.

If you look, it’s just the phoneme ‘k’, which is one of the cases where a singer would voice it prior to the written note so the long ‘aaaa’ starts on cue. It sounds natural, so I leave it exactly where it is.
It is actually clever programming by the Dreamtronics whizzes.

Sorry, but I disagree. The k should start at the note. I hear the discrepancy very clearly. I’ve spent my weekend ripping apart my project thinking it was due to a setting of mine, only to discover this today…

idk what you could do other than adding a sil before ka, mess around with the slider etc… but almost all vocal synths work like this.

Then drag the note sideways slightly. If your chosen grid resolution is too coarse and ‘snap to grid’ is selected, the note will leap sideways to the next grid line, but you can either choose a finer grid or deselect snapping. Then you can align the voice wherever you prefer.

Thank you for your suggestions, guys! I’ll keep digging and come back if I find something. Who knows, maybe the problem lies somewhere else ^^.

Maybe what you’re hearing comes from this song’s particular tempo and wouldn’t be audible in other songs (slower or faster).
In some cases a real singer even sings with a small delay; it can be a way of interpreting the song or of making the melody move. In some songs, I add a small delay directly on my DAW track. It’s easy in Cubase, though all the notes end up delayed. The best part is that you don’t need to touch the melody in SynthV.

You can disagree, but you’re wrong. :smile:

As a rule, the start of the vowel is aligned with the start of the note - not with the consonants.

The consonants are placed at the tail of the prior vowel. If there is no vowel, they are placed in the prior silence. There are some exceptions, like with semi-vowels /R/ and /L/.

So when you write something like:

| ay | s ih ng | aa | s aa ng |

SynthesizerV (and other vocal synthesis programs) will place the consonants as if they were part of the prior note:

| ay s | ih ng | aa s | aa ng |
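Here’s a rough sketch of that rule in Python, just to make the regrouping concrete. It is not SynthesizerV’s actual code: the function name, the `VOWELS` set, and the "anything before the first vowel moves back" logic are my own assumptions, and it ignores the semi-vowel exceptions mentioned above.

```python
# Hypothetical illustration only: not taken from SynthesizerV or any other synth.
# Assumed set of Arpabet-style vowel symbols.
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ay", "eh", "er",
          "ey", "ih", "iy", "ow", "oy", "uh", "uw"}

def shift_leading_consonants(notes):
    """Move each note's leading consonants to the tail of the prior note.

    `notes` is a list of phoneme lists, one per written note.
    Returns (silence, shifted): `silence` holds consonants that had no
    prior note to attach to; `shifted` is the regrouped note list.
    """
    silence = []
    shifted = []
    for phonemes in notes:
        # Index of the first vowel; if the note has no vowel, move nothing.
        first_vowel = next(
            (i for i, p in enumerate(phonemes) if p in VOWELS), 0
        )
        leading, rest = phonemes[:first_vowel], phonemes[first_vowel:]
        # Leading consonants go to the prior note, or to the prior silence.
        target = shifted[-1] if shifted else silence
        target.extend(leading)
        shifted.append(list(rest))
    return silence, shifted

silence, regrouped = shift_leading_consonants(
    [["ay"], ["s", "ih", "ng"], ["aa"], ["s", "aa", "ng"]]
)
print(" | ".join(" ".join(note) for note in regrouped))
# prints: ay s | ih ng | aa s | aa ng
```

A phrase that starts with a consonant, such as `[["k", "aa"]]`, comes back with the `k` in `silence`, which corresponds to the waveform starting before the first note.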

You can check it out yourself with your own singing using a tool like Praat.

If you want, you can manually adjust the note so the consonant falls where you want it. But there’s no way to override the behavior, because that’s not how leading consonants are sung.

As jfa notes, it’s pretty easy to make adjustments in your DAW when you want to “humanize” a line a bit more without fighting with the grid.

The consonant and the vowel aren’t separate things, despite the way we humans decode sounds ^^'. The vowel starts as soon as the consonant does. You don’t say k-ah, you say kah. Now if you stress the consonant, then I can see why you’d start earlier; but by default, that’s not the case.

Now the software does split them up. But just load any combination of consonant and vowel and look for the beginning of the vowel. Even the vowel starts too early, so even if you insist that they are separate things, well, there’s still a problem.

You can think I’m wrong all you want, buddy. I’ll trust my ears; I do hear a desync in my projects. For now I’m doing just what you suggested ^^. If the syncing annoys me, I shift the vocal track. The problem is that not every consonant has the same “delay”. So I always have to find a compromise for all notes…
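To put rough numbers on that compromise, here is a small Python sketch. The lead times are made-up values, not measurements from any voice, and using the median as the single track shift is just one reasonable choice (it minimizes the total absolute error); the function names are hypothetical.

```python
from statistics import median

def compromise_shift_ms(lead_times_ms):
    """Pick one delay for the whole vocal track: the median consonant lead."""
    return median(lead_times_ms)

def residual_errors_ms(lead_times_ms, shift_ms):
    """How early (+) or late (-) each consonant still is after the shift."""
    return [lead - shift_ms for lead in lead_times_ms]

# Made-up example: how far before the note each consonant starts, in ms.
leads = [90, 40, 60, 120, 30]
shift = compromise_shift_ms(leads)
print(shift)                             # 60
print(residual_errors_ms(leads, shift))  # [30, -20, 0, 60, -30]
```

With a single shift, only the consonant whose lead equals the shift lands exactly on the grid; the others stay early or late, which is why per-note adjustment is the only way to line up every onset.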

So, you say “kah” is correct, somehow claiming that this makes consonant and vowel simultaneous? NOPE! I agree there is no gap but you will struggle to pronounce both simultaneously!

You can move each word independently (a normal part of tuning a voice) if you insist that the default timing is erroneous. I do this with some words anyway to humanise the singing, as we (real people) do NOT sing ‘to the grid’; we regularly start a word before or after the actual beat for dramatic effect.

@dcuny is, as always, spot on.

This is correct. Even if you could get the synth to pronounce both the k and the ah almost simultaneously, it would sound like GAH and not KAH.

If you mean to say that there’s no break between the consonant and the vowel, that’s accurate.

There’s no break between words, either. The phonemes are all connected.

It sounds like you’re saying they start at the same time.

I don’t think that’s what you intended to say.

Just to be clear on terms, here’s a waveform and spectrogram of me saying “cat”:

We agree - the phonemes are smoothly connected together.

Well, you can see there’s a stop before the /t/. There’s also one before the /k/, because they’re “stop” consonants.

But in the large sense, you’re right - “cat” is pronounced as one continuous word, and not broken into individual phonemes with rests in between them.

Except for stop consonants. :wink:

For that matter, words are also smoothly connected. While we write them with spaces between them, in speech there are no gaps for spaces.

I’m not insisting they are separate things.

I’m saying that the part of the word that falls at the start of the note is the vowel, not the consonant.

Why do I insist on this? For one thing, analyzing examples of singing will show this to be the case.

For another, I’ve written my own singing synthesis program. I’ve tried putting the consonants first, and it sounds wrong.

Do whatever you want to make the music you want. That’s what these tools are all about. :+1:

But consonants typically do not fall on the start of the note, and the SynthesizerV behavior is correct.

You’re not alone, I’ve experienced this too! Sometimes notes just sound like they’re coming in too early even though they’re snapped to the grid and it really throws off the flow of things; I don’t find it happens in every session but still enough to cause an annoyance. I end up just chopping up the WAV in Pro Tools after I’ve rendered it out of Synth V.

Can you tell us more about that? Is this something that is publicly available?

I never released it because I was never happy with how it sounded.

But you can read all about it here:

Wow, that is insane!

I have just skimmed through your documentation - a decade’s worth - and I now have eye strain and the beginning of a headache. The amount of detail in this write-up is fantastic; anyone here interested in the inner workings of speech synthesis should take a look.
I admire your dedication in pursuing your research for so long and recording your findings so carefully. :man_bowing:

Why would you do that when you could simply adjust the timing of every word quite easily in SynthV before rendering? You could even alter the timing of every phoneme if you wished.

I think if you believe in your abilities and skills for good development, you should not stop, absolutely. :wink:

Thanks, but the reasons for my motivations are no longer the same as when I started the project.

One of the main goals was to fill a gap in the market, for people who wanted free software other than UTAU for singing synthesis. But the free version of SynthesizerV fills that gap.

I really wanted to be able to accomplish the synthesis using formant synthesis, but that technology’s not up to the task. If I get some spare time I may continue working on the project, but even with the results I was getting with the WORLD vocoder, I just couldn’t imagine people wanting to use it when AI technologies outpaced what I was able to do.

I understand. I can only say that with such knowledge as a programmer and developer I would be more ambitious. :wink:
