ITTO940649A1

ITTO940649A1 - PROCEDURE AND DEVICE FOR THE SYNTHESIS OF VOICE SIGNALS FOR EXAMPLE FOR ELECTROMUSICAL APPLICATIONS.

Info

Publication number: ITTO940649A1
Application number: IT94TO000649A
Authority: IT
Inventors: Andrea Paladin; Paolo Andrenacci
Original assignee: Iris S R L
Priority date: 1994-08-04
Filing date: 1994-08-04
Publication date: 1996-02-04
Also published as: IT1266910B1; ITTO940649A0

Abstract

The solution according to the invention makes it possible to use the usual keyboard (K) of an electronic instrument to determine the melody and the tempo of a generic, previously recorded sung part. The purposes of this application can be performance-related, since it gives the musician the opportunity to "play" a singing voice, or educational, when a beginner tries to reproduce on the keyboard (K) the original melodic line of the song (fig. 1).

Description

DESCRIPTION of the industrial invention entitled: "Procedure and device for the synthesis of vocal signals, for example for electromusical applications"

TEXT OF THE DESCRIPTION

The present invention relates to techniques for the synthesis of vocal signals and has been developed with particular attention to its possible use in the electromusical field.

In recent decades, rather intense research has been devoted to speech-synthesis techniques. Part of this work targets general-purpose applications in telecommunications (for example, generating warning messages with a synthetic voice, or reconstructing an intelligible voice signal from a signal transmitted with a high degree of redundancy reduction).

It is likewise known that, in recent years, electromusical applications have seen a large-scale development of synthesis techniques and, more generally, of techniques for controlling the generation of musical signals.

The object of the present invention is to provide a solution that combines the synthesis of a vocal signal, for example one corresponding to a generic previously recorded sung part, with a performance played on the keyboard of an electronic instrument. The aim is that the melody and the tempo are determined by the player, while the lyrics and the timbre of the voice remain those of the original singer.

The reasons for seeking such a solution are manifold. They can be performance-related, since the musician is given the opportunity to "play" a singing voice, or educational, when a beginner tries to reproduce on the keyboard the original melodic line, for example of a song. Such a solution is also advantageous in applications where the singing must be sped up or slowed down without distorting the voice, in order to synchronize it with other sound or visual sources.

According to the present invention, this object is achieved by a method having the characteristics specifically recited in the claims that follow. The invention also relates to a device for implementing this method.

The invention will now be described, purely by way of non-limiting example, with reference to the attached drawings, in which:

fig. 1 is a block diagram illustrating the structure of a synthesis device operating according to the invention,

fig. 2 schematically illustrates how the synthesis functions are carried out in a device according to the invention,

fig. 3 is a flow chart illustrating the sequence of the various functions performed in a synthesis device operating according to the invention, and

fig. 4 schematically illustrates further characteristics of the invention.

Principles underlying the invention

Before describing the invention in detail, it is useful to recall some of the basic principles on which it is founded.

To produce a synthetic voice, the invention uses a model in which vocal emission is divided into two independent phases: excitation and resonance.

The excitation phase takes into account whether the sound is classified as voiced or unvoiced.

In the case of a voiced event, the excitation is produced by generating a train of pulses that simulates the train of pulses produced in human phonation by the vocal cords before they enter the vocal tract. The frequency of the emitted note (that is, its pitch) depends solely on this element.

In the case of unvoiced sounds, in which the vocal cords are not involved, the excitation is in practice implemented as a noise source.

The resonance phase, by contrast, accounts for the filtering effect of the oral cavity on the sound produced by the excitation. The spectral envelope of the emitted sound, which determines the difference between two different vowels, depends solely on this element.

This excitation/resonance split also reflects the fact that the oral cavity moves much more slowly than the vocal cords vibrate, so the two phenomena can be treated as decoupled and analyzed separately.

At the model level, the oral cavity can be simulated by discretizing it into a set of cascaded tubes. The relative widths of the tubes at a given instant determine the resonance frequencies of the cavity, called formants, which characterize the sound produced.

If, for example, a set of 25 cascaded tubes is considered, the shape of the vocal tract at a given instant can be modeled with 25 parameters, each corresponding to the diameter of the respective tube.
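
The patent describes the tube model only qualitatively. As a sketch that is not taken from the patent, the lossless acoustic-tube model treated by Rabiner and Schafer links adjacent cross-sectional areas through a reflection coefficient at each junction. The convention r_k = (A[k+1] - A[k]) / (A[k+1] + A[k]) is an assumption here (sign conventions differ between texts), and because the patent's parameters are diameters, each area is taken as proportional to the squared diameter. The function name is hypothetical.

```python
def reflection_coefficients(diameters):
    """Reflection coefficient at each junction of a cascade of lossless tubes.

    Assumes r_k = (A[k+1] - A[k]) / (A[k+1] + A[k]), with each area A
    proportional to the squared diameter (the constant pi/4 cancels out).
    N tubes give N - 1 junctions.
    """
    if any(d <= 0 for d in diameters):
        raise ValueError("tube diameters must be positive")
    areas = [d * d for d in diameters]  # proportional to cross-sectional area
    return [(a2 - a1) / (a2 + a1) for a1, a2 in zip(areas, areas[1:])]


# 25 tubes, as in the example above, give 24 junctions; a uniform tube reflects nothing.
uniform = reflection_coefficients([1.0] * 25)
print(len(uniform), max(uniform))  # 24 0.0
```

Every coefficient lies strictly between -1 and 1 as long as all diameters are positive, which is what keeps the corresponding all-pole filter stable.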

Because of its mechanical inertia, the vocal tract moves relatively slowly, so the model can evolve correspondingly slowly: experimentation shows, for example, that it is sufficient to update the above parameters every 10-30 ms without the ear perceiving any discontinuity. The phenomenon is similar to the frames of a film or a television recording, which give the impression of movement even though they are a sequence of static images updated, for example, every 40 ms.

For further study of the criteria described above, see the reference work "Digital Processing of Speech Signals" by L. R. Rabiner and R. W. Schafer, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1978.

Description of an example of embodiment of the invention

In one possible embodiment, the method according to the invention is implemented in an electromusical signal synthesis system configured according to the architecture illustrated in detail in fig. 1.

Specifically, such a system includes the following elements:

a keyboard K (for example a so-called "master keyboard" or equivalent) on which a performer can issue commands corresponding to the notes to be played and to the way they are played (choice of timbres, etc.),

a synthesis unit or module S, which receives the input signals from the keyboard K and generates from them digital signals corresponding to the synthesized sound,

a conversion interface I, which receives the aforesaid digital signals from the synthesis module S and converts them into an analog electrical signal, amplifying it so that it can be reproduced, for example through a loudspeaker L, or recorded on a recording medium.

Synthesis systems of the type specified above are widely known in the art, both as electronic musical instruments and as systems for processing electromusical signals, for example systems operating according to the standard known as MIDI.

In a typical embodiment, the invention provides, within the synthesis unit S, processing means (capable of operating in parallel and in coordination with all the other functions that are or may be provided in such a unit S) that perform the two functions of excitation and resonance described above in two respective modules, designated 1 and 2.

In particular, the operation of the excitation function 1 is based on two parameters, namely:

a voiced/unvoiced indicator (VUV), which selects between noise and a pulse train, and

in the second case (voiced event), the frequency (pitch) at which this pulse train is to be emitted, chosen according to the position of the key on the keyboard K or of the so-called pitch bender (in the case of MIDI control).

The VUV indicator is usually generated during analysis (as described in more detail below), optionally with the possibility of entering a command from the keyboard that forces a constantly unvoiced sound (VUV = unvoiced).
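
The patent states only that the pitch follows the key position or the pitch bender. A minimal sketch, assuming standard MIDI note numbering (note 69 = A4 = 440 Hz, equal temperament) and a 14-bit pitch-bend value centred on 8192 with a ±2 semitone range (a common default; real instruments let this range be changed), could map a key to the pulse-train frequency like this. The function name is hypothetical.

```python
def key_to_pitch_hz(note, bend=8192, bend_range_semitones=2.0):
    """Frequency for MIDI note `note`, offset by a 14-bit pitch-bend value.

    bend = 8192 means no bend; 0 and 16383 are roughly -/+ bend_range_semitones.
    """
    semitones = (note - 69) + (bend - 8192) / 8192.0 * bend_range_semitones
    return 440.0 * 2.0 ** (semitones / 12.0)


print(key_to_pitch_hz(69))            # 440.0
print(round(key_to_pitch_hz(60), 2))  # middle C, 261.63
```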

In practice, the voiced/unvoiced selection is implemented in a module 3, which sends to function 2 (resonance) either the pulse train (glottal pulse) generated by a module 4, at the frequency set by the position of the key pressed on the keyboard K, or a pseudo-random sequence generated by a module 5 and having the characteristics of noise (white or colored).
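
Module 3 can be sketched as a switch between two sources. The version below is an assumption for illustration, not the patent's implementation: module 4 is reduced to a train of unit impulses (a real glottal pulse has a shaped waveform), and module 5 to uniform white noise from Python's `random` module. The position within the pitch period is carried from one call to the next so that pulse spacing stays regular across frame boundaries; the function and parameter names are hypothetical.

```python
import random


def excitation_frame(n_samples, voiced, pitch_hz=0.0, fs=16000,
                     countdown=0.0, rng=None):
    """Return (samples, countdown) for one frame of excitation.

    voiced=True : unit impulses spaced fs / pitch_hz samples apart (module 4).
    voiced=False: white noise uniformly distributed in [-1, 1] (module 5).
    `countdown` is the number of samples until the next impulse; pass the
    returned value to the next call to keep the pulse train continuous.
    """
    if not voiced:
        rng = rng or random.Random()
        return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)], countdown
    if pitch_hz <= 0:
        raise ValueError("a voiced frame needs a positive pitch")
    period = fs / pitch_hz
    out = []
    for _ in range(n_samples):
        if countdown <= 0.0:
            out.append(1.0)      # start of a new pitch period
            countdown += period
        else:
            out.append(0.0)
        countdown -= 1.0
    return out, countdown


# 100 ms of a 100 Hz pulse train at 16 kHz, split into two 800-sample frames.
first, state = excitation_frame(800, True, pitch_hz=100.0)
second, state = excitation_frame(800, True, pitch_hz=100.0, countdown=state)
print(sum(first) + sum(second))  # 10.0
```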

It should be emphasized that, as already mentioned with reference to the general characteristics of module S, the above processing functions operate on digital signals, under the overall control of a reading module 6.

Function 2, which performs the resonance, essentially consists of two modules, namely a filter bank 7 and a gain adjustment element 8.
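
Filter 7 and gain element 8 can be sketched together as a direct-form all-pole (IIR) filter. The sketch assumes the sign convention of Rabiner and Schafer, H(z) = G / (1 - sum_k a_k z^-k), so each output sample is y[n] = G x[n] + sum_k a_k y[n-k]. The filter memory is returned so that it carries over when the coefficients are replaced at the next frame, instead of restarting from zero. The names are hypothetical.

```python
def all_pole_filter(excitation, coeffs, gain, memory=None):
    """Filter one frame of excitation through G / (1 - sum_k a_k z^-k).

    coeffs[k-1] holds a_k. `memory` holds the last p outputs, most recent
    first; the updated memory is returned for use with the next frame.
    """
    p = len(coeffs)
    hist = list(memory) if memory is not None else [0.0] * p
    out = []
    for x in excitation:
        y = gain * x + sum(a * h for a, h in zip(coeffs, hist))
        out.append(y)
        if p:
            hist = [y] + hist[:-1]  # shift: newest output goes first
    return out, hist


# A single pole at 0.5 turns an impulse into a decaying exponential,
# and the memory keeps the decay going into the next frame.
y1, mem = all_pole_filter([1.0, 0.0], [0.5], gain=1.0)
y2, mem = all_pole_filter([0.0, 0.0], [0.5], gain=1.0, memory=mem)
print(y1 + y2)  # [1.0, 0.5, 0.25, 0.125]
```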

As already mentioned, the operation of modules 3, 7 and 8 is controlled, in ways illustrated in greater detail below, by the reading module 6, which acts on a memory 9 (typically a RAM) storing the information relating to the synthesis signal. In this regard, the invention can use either signals recorded once and for all (and therefore available as a "sound bank", for example on a floppy disk) or signals obtained each time by sampling a vocal sound event picked up, for example, with a microphone M and analyzing it inside module S, so as to obtain the respective coefficients to be written into memory 9.

Whatever its origin, the voice signal to be used for synthesis is divided into consecutive time slices, or frames, with a length of, for example, 25 ms. For each signal frame, the set of parameters of the resonance function 2 defined above can be obtained automatically using an analytical method called linear predictive coding, or LPC for short. In this method, the set of tubes (that is, the set of their characteristic parameters) is described by an all-pole filter (implemented as digital filter 7). The parameters describing the tube sections become the coefficients of this filter, while a further gain parameter, used to drive element 8, is calculated from the amplitude of the sound entering the filter. A second analysis is needed to determine whether the signal frame is unvoiced or voiced (for the operation of module 3) and, for a voiced event, to calculate the emitted frequency (the pitch of the sound), although this information is not used when the pitch is determined by the keyboard.
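
The patent leaves the LPC computation to the literature it cites. As an illustration only, the autocorrelation method with the Levinson-Durbin recursion (as presented by Rabiner and Schafer and by Makhoul) yields the coefficients for filter 7 and a gain for element 8, here taken as the square root of the final prediction-error energy. In practice the frame is usually windowed (for example with a Hamming window) before analysis; that step is left to the caller to keep the sketch short. The coefficients follow the convention s[n] ≈ sum_k a_k s[n-k], matching an all-pole filter G / (1 - sum_k a_k z^-k).

```python
import math


def lpc(frame, order):
    """Return (coeffs, gain) for one frame by the autocorrelation method.

    coeffs[k-1] is a_k in s[n] ~ sum_k a_k s[n-k]; gain is the square root
    of the final prediction-error energy.
    """
    n = len(frame)
    if order >= n:
        raise ValueError("order must be smaller than the frame length")
    r = [sum(frame[i] * frame[i + k] for i in range(n - k))
         for k in range(order + 1)]
    if r[0] == 0.0:  # silent frame: no resonance, no gain
        return [0.0] * order, 0.0
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # numerically singular: keep the coefficients found so far
            break
        # reflection (PARCOR) coefficient for stage i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], math.sqrt(max(err, 0.0))


# A decaying exponential is an order-1 autoregressive signal with a_1 = 0.9.
coeffs, gain = lpc([0.9 ** n for n in range(200)], order=1)
print(round(coeffs[0], 3), round(gain, 3))  # 0.9 1.0
```

With order=25 the same function returns the twenty-five coefficients per frame used in the embodiment described here.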

Once these parameters, which describe excitation and resonance separately, have been obtained, the voice can be resynthesized with the frequency of the emitted note (and therefore the melodic line of the song, or the prosody of speech) controlled independently of the articulation of the vocal tract (and therefore of the sequence of words associated with the melodic line).

The diagram of fig. 2 shows how the associated parameters are obtained for each frame of the voice signal, allowing the points at which the melodic line changes to be marked.

The upper part of the diagram of fig. 2 shows, as a function of time t and against an ordinate indicating amplitude in arbitrary units, the waveform of a vocal sound signal V.

As stated, the voice signal V(t) (generally assumed to be in digital form, that is, as a sequence of digitized samples) is divided into consecutive slices (frames) of a given frame length. The frames are offset from one another by a given amount (indicated in fig. 2 as the "displacement" ΔT), chosen to be smaller than the width of the window corresponding to each frame. As a result, the signal segments corresponding to two adjacent frames are at least partly joined, that is, they overlap.
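
The overlapping segmentation can be sketched as follows. The 25 ms frame length comes from the example in the text; the 10 ms default hop ΔT and the 16 kHz sampling rate in the usage line are assumptions chosen only to satisfy the condition that ΔT be shorter than the frame. The function name is hypothetical.

```python
def split_into_frames(samples, fs, frame_ms=25.0, hop_ms=10.0):
    """Cut a sampled signal into overlapping frames.

    Consecutive frames start hop_ms apart (the displacement ΔT) and each
    lasts frame_ms; hop_ms < frame_ms makes adjacent frames overlap.
    A trailing partial frame is dropped.
    """
    frame_len = round(fs * frame_ms / 1000.0)
    hop = round(fs * hop_ms / 1000.0)
    if not 0 < hop < frame_len:
        raise ValueError("the hop must be positive and shorter than the frame")
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


frames = split_into_frames(list(range(16000)), fs=16000)  # one second of samples
print(len(frames), len(frames[0]), frames[1][0])  # 98 400 160
```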

Consequently, in the synthesis phase (see the following part of the description), the frequency of the voice signal is given only by element 4, and in particular by the time interval separating the pulses generated by this element, which simulates the function of the vocal cords. Likewise, in the synthesis phase, the speed at which the phrase unfolds is given only by element 7, that is, by the rate at which its coefficients are updated. To speak faster (that is, for the filter to modulate the same phrase faster), the interval ΔT between loading the parameters of two adjacent frames is reduced.

The signal segment corresponding to each frame is analyzed by the linear predictive coding (LPC) program so as to generate, for each frame, a respective record (R1, R2, ..., Rn in the schematic representation of fig. 2, where Rn generally denotes the record of the n-th frame) containing the information for that frame.

In particular, for the excitation function, each record contains the voiced/unvoiced information (VUV flag) and, for voiced emission, the pitch information, that is, the frequency of the pulse train (oscillator module 4) that is to drive the synthesis when the pitch is not set from the keyboard.
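
The text does not say how the second analysis (voiced/unvoiced decision and pitch) is performed. One common textbook approach, shown here only as a hypothetical sketch, looks for the strongest normalized autocorrelation peak within a plausible range of pitch periods; a weak peak marks the frame as unvoiced. The 60-500 Hz search range and the 0.3 threshold are arbitrary assumptions, and the pitch resolution is limited to whole-sample lags.

```python
def voicing_and_pitch(frame, fs, fmin=60.0, fmax=500.0, threshold=0.3):
    """Return (voiced, pitch_hz) for one frame; pitch_hz is None if unvoiced."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC so it cannot fake periodicity
    r0 = sum(s * s for s in x)
    if r0 == 0.0:
        return False, None
    lag_min = max(1, int(fs / fmax))
    lag_max = min(n - 1, int(fs / fmin))
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # biased autocorrelation: longer lags sum fewer terms, which
        # discourages picking a multiple of the true period
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / r0
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None or best_r < threshold:
        return False, None
    return True, fs / best_lag


pulses = [1.0 if i % 80 == 0 else 0.0 for i in range(400)]  # period of 80 samples
print(voicing_and_pitch(pulses, fs=16000))  # (True, 200.0)
```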

For the resonance function, each record essentially contains the following two items of information:

the coefficients (twenty-five in the illustrated embodiment) that implement the resonance function in filter 7, and

gain information for driving element 8.

In addition, the solution according to the invention allows the points at which the melodic line changes to be marked. In particular, each record carries a marker for the corresponding frame, which is set to "stop" if, during synthesis, playback must halt and wait for a new note from the keyboard K, and to "go" otherwise.
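
One possible in-memory layout for a record, with the stop/go marker driving playback, is sketched below. The field and function names are hypothetical, and the patent does not say exactly where a "stop" is placed; the sketch assumes it marks the last frame of a note, so each key press plays frames up to and including the next "stop", and the following key press resumes from the frame after it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FrameRecord:
    voiced: bool               # VUV flag
    pitch_hz: Optional[float]  # analysed pitch, used only if the keyboard does not set it
    coeffs: List[float]        # resonance coefficients for filter 7 (25 in the example)
    gain: float                # gain for element 8
    stop: bool                 # True = "stop" marker, False = "go"


def frames_for_note(records: List[FrameRecord], start: int) -> Tuple[List[int], int]:
    """Frame indices played for one key press, and where the next press resumes."""
    end = start
    while end < len(records) - 1 and not records[end].stop:
        end += 1
    return list(range(start, end + 1)), end + 1


# Six frames forming two notes: "stop" markers on frames 2 and 5.
song = [FrameRecord(True, 220.0, [0.0] * 25, 1.0, stop=(i in (2, 5)))
        for i in range(6)]
first, nxt = frames_for_note(song, 0)
second, _ = frames_for_note(song, nxt)
print(first, second)  # [0, 1, 2] [3, 4, 5]
```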

Each of the resonance parameters (twenty-five of them, in what follows) is stored by the program sequentially: if, for example, C_n(i) denotes the n-th coefficient of the reconstruction filter for frame i, the coefficients are arranged as follows:

After this phase, all the parameters resulting from the analysis of the source voice are available.

For clarity, note that the coefficients are calculated according to criteria widely known in the art, which therefore need not be described here. For a general illustration of how an LPC algorithm is applied in the present context, see the article "Linear Prediction: A Tutorial Review" by John Makhoul, Proceedings of the IEEE, April 1975, pp. 561-580.

Once stored in memory 9, these data are used to drive functions 1 and 2 of fig. 1, usually implemented on a microprocessor dedicated to fast computation (DSP), which allows the original signal to be resynthesized and modified (through control devices associated with the keyboard K and/or, for example, through typical MIDI functions) for particular effects or musical purposes.

Some of the parameters that can be varied during resynthesis are discussed below.

First, the type of excitation can be varied. For example, the vocal tract can be excited with a pulse similar to the one produced by the vocal cords, yielding a realistic voice. However, it is also possible to use the sound of an orchestra, giving the impression that the orchestra is speaking, or to use noise alone, which gives the same effect as speaking without using the vocal cords.
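The source-filter arrangement described above can be sketched as follows, assuming direct-form all-pole filtering with coefficients in the LPC convention a[0] = 1. Any list of samples, for instance a recorded orchestral passage, can be passed as excitation in place of the pulse train or the noise. The function names are illustrative and do not come from the patent.

```python
import random


def pulse_train(f0, fs, n):
    """Voiced excitation: one unit pulse every fs / f0 samples (oscillator 4)."""
    period = fs / f0
    out = [0.0] * n
    t = 0.0
    while t < n:
        out[int(t)] = 1.0
        t += period
    return out


def white_noise(n, seed=0):
    """Unvoiced excitation: uniform white noise (generator 5)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def all_pole_filter(excitation, a, gain=1.0, state=None):
    """Synthesis filter 7: y[n] = gain * x[n] - sum(a[k] * y[n - k], k = 1..p).

    `state` holds the previous outputs (most recent first) so that the filter
    can be run frame by frame without clicks; the updated state is returned.
    """
    order = len(a) - 1
    hist = list(state) if state is not None else [0.0] * order
    out = []
    for x in excitation:
        y = gain * x - sum(a[k] * hist[k - 1] for k in range(1, order + 1))
        out.append(y)
        if order:
            hist = [y] + hist[:-1]
    return out, hist
```

For example, `voice, _ = all_pole_filter(pulse_train(220.0, 16000, 400), a)` excites one frame's coefficients `a` with a 220 Hz pulse train; swapping in `white_noise(400)` gives the whispered variant.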

Another variable parameter is the speed at which the phrase is scanned, forward or backward. These speed variations do not affect the naturalness of the synthesized voice, unlike what would happen with a recorded voice. When a recording is played faster, its fundamental frequency (pitch) rises as well and the original spectrum is distorted; with a synthesis technique such as LPC synthesis, by contrast, the pitch and the spectral shape do not depend on how quickly the tubes describing the vocal tract, i.e. the coefficients of the synthesis filter, are made to move. In practice, with reference to fig. 2, the speed at which the phrase is scanned forward or backward can be varied completely independently of the pitch, which is governed solely by the excitation frequency of oscillator 4.
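The independence of scanning speed and pitch follows from the fact that two separate counters drive the synthesis: one selects the current frame of coefficients, the other times the excitation pulses. The hypothetical helper below only computes these indices (it produces no audio) to make the separation explicit; a negative frame rate scans the phrase backward.

```python
import math


def scan_positions(n_samples, fs, f0, frames_per_second, start_frame=0.0):
    """Return (frame index used at each sample, sample indices of pulses).

    The frame read position moves by frames_per_second / fs per sample and is
    the only thing the speed control changes; pulse spacing (fs / f0 samples)
    depends on the pitch alone.
    """
    frame_pos = start_frame
    frame_step = frames_per_second / fs
    period = fs / f0
    next_pulse = 0.0
    frames, pulses = [], []
    for n in range(n_samples):
        frames.append(math.floor(frame_pos))
        if n >= next_pulse:
            pulses.append(n)
            next_pulse += period
        frame_pos += frame_step
    return frames, pulses
```

Doubling `frames_per_second` makes the phrase pass twice as fast while the list of pulse positions, and hence the pitch, stays the same.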

A further possibility is to choose between using the frequency provided by the analysis and using a frequency selected on keyboard K. For a singing voice, this means being able either to keep the original melody or to play a new melodic line on the keyboard. In this way a voice can be made to sing by pressing the keys and, if the wrong key is pressed, it will be heard singing a wrong note.
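Choosing between the analysed pitch and the keyboard pitch amounts to a per-frame switch. A minimal sketch, assuming the keyboard delivers MIDI note numbers and equal temperament with A4 = 440 Hz (neither is specified in the patent):

```python
def midi_to_hz(note, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI note number (69 = A4)."""
    return a4_hz * 2.0 ** ((note - 69) / 12.0)


def excitation_frequency(analysed_f0, pressed_note, use_keyboard):
    """Frequency for oscillator 4: keyboard note if selected, else the analysed pitch."""
    if use_keyboard and pressed_note is not None:
        return midi_to_hz(pressed_note)
    return analysed_f0
```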

These last two features in particular open up numerous applications in music and music education.

For example, they make it easier to generate particularly sophisticated vocal harmonies that use intervals which are difficult to sing in tune.

A singer's voice can also be slowed down or sped up by means of a control without losing its naturalness, while the melody of the song is played on the keyboard at the same time.

In summary, the solution according to the invention requires the following preliminary steps:

acquisition of the voice singing a given song (by direct sampling or from a recording medium associated with system S), LPC analysis and storage of the resulting coefficients (carried out according to criteria known per se); and

(preferably, but not necessarily) marking with flags or markers ("stop" or "go") of the points where the notes of the original melody change, and storage of these markers.

It should be noted that this last function, although preferably performed by a human operator, lends itself (at least in principle) to being performed automatically as well, for example using programs for the automatic reading of musical scores.

Once the data are available in memory 9, the analysed phrase can be synthesized under the guidance of keyboard K in the manner briefly summarized below.

When a key on keyboard K is pressed for the first time, the algorithm implemented by functions 1 and 2 of fig. 1 starts synthesizing the voice. In particular, the frequency of oscillator 4 is determined by the key pressed. The coefficients of the synthesis filter (resonators) 7 are updated periodically at a rate set by the control: the faster this update, the faster the phrase is traversed.

When updating the filter coefficients, it is preferable to interpolate the parameters so that an abrupt change in a coefficient does not introduce discontinuities into the synthesis. For example, for coefficient $C(K)$ of the filter, the transition between two temporally consecutive values $C_T(K)$ and $C_{T+1}(K)$ can be carried out by interpolating between them.
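The original text breaks off at the point where the interpolating function is specified. Purely as an assumption, linear interpolation over an update interval of N output samples would read $C(K) = (1-\alpha)\,C_T(K) + \alpha\,C_{T+1}(K)$ with $\alpha = i/N$, $i = 0, \dots, N-1$, and could be sketched as:

```python
def interpolate_coefficients(c_t, c_next, n_steps):
    """Yield n_steps coefficient sets moving linearly from c_t toward c_next.

    Step i uses alpha = i / n_steps, so the last set yielded stops one step
    short of c_next, which becomes the starting point of the next interval.
    """
    for i in range(n_steps):
        alpha = i / n_steps
        yield [(1.0 - alpha) * a + alpha * b for a, b in zip(c_t, c_next)]
```

Interpolating direct-form LPC coefficients does not always yield a stable filter; interpolating reflection coefficients or line spectral frequencies is a common, safer alternative.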

When the synthesis reaches the marker signalling a note change in the melody, the coefficients of filter 7 are no longer updated, so the synthesis stays on the last sound produced before the note change until a new key is pressed. The new key press sets the frequency of the excitation oscillator to the value corresponding to that key and resumes the periodic updating of the filter coefficients. The procedure continues until the end of the phrase.
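The hold-on-marker behaviour can be reduced to a single control tick, sketched below with hypothetical names. Here the marker is assumed to sit on the first frame of the new note, so the frame before it is the one held; a new key supplies the frequency for oscillator 4 and releases the hold.

```python
def advance(frame, freq, flags, key_hz=None):
    """One control tick of the marker logic.

    frame: index of the frame whose coefficients drive filter 7;
    freq: current oscillator frequency in Hz; flags[i]: 'stop' or 'go';
    key_hz: frequency of a newly pressed key, or None. Returns (frame, freq).
    """
    nxt = min(frame + 1, len(flags) - 1)
    if key_hz is not None:
        return nxt, key_hz          # new key: retune oscillator 4 and resume
    if flags[nxt] == "stop":
        return frame, freq          # marker ahead: hold, coefficients frozen
    return nxt, freq
```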

Fig. 3 shows in greater detail, as a flow chart, the implementation according to the invention of a procedure for synthesizing a singing voice while controlling the melodic line and the speed of the singing with keyboard K.

In human singing, however, even on very long held notes the timbre and the pitch do not remain constant for the whole duration of the note but are slightly modulated. These small modulations are almost always cyclical, and their presence contributes greatly to the naturalness of the sound. A simple change in how the filter-coefficient update phase is computed simulates this kind of modulation: with reference to the algorithm of fig. 3, when a marker is reached, instead of stopping the parameter update, the update is made to cycle among the last frames preceding the marker.

Fig. 4 shows an example graph, with time on the abscissa and the filter-coefficient update phase on the ordinate:

a) at a given instant t1, a key is pressed, which starts the parameter update from the current marker, assumed here to be mark1;

b) when the parameter update reaches the next marker, it starts cycling among the last values sent, over a range of a few frames;

c) at instant t2, another key is pressed: the parameter update resumes from the next marker (mark2) and continues in the same way.
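The cycling of fig. 4 can be sketched as the list of frame indices produced while waiting at a marker. The patent says only that the update cycles among the last few frames before the marker; the back-and-forth (ping-pong) order used here, chosen to avoid a jump at the loop seam, and the loop length are assumptions.

```python
def sustain_frames(marker, loop_len, n_ticks):
    """Frame indices to use while waiting at `marker` for the next key.

    Cycles back and forth over the loop_len frames that precede the marker,
    starting from the last one, so a held note keeps a slight periodic
    modulation instead of freezing on a single frame.
    """
    lo = max(0, marker - loop_len)
    hi = marker - 1
    if hi <= lo:
        return [max(hi, 0)] * n_ticks        # too few frames to loop: hold
    cycle = list(range(hi, lo - 1, -1)) + list(range(lo + 1, hi))
    return [cycle[i % len(cycle)] for i in range(n_ticks)]
```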

Naturally, it is assumed that all the parameters needed for the voice synthesis (records R1, R2, ... Rn of fig. 2) have previously been made available (according to the criteria described in greater detail above), including the flags defining the points where a note change is required.

Each of these data items is stored sequentially in a table held in memory 9 of system S.
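One possible in-memory layout for the records R1 ... Rn is sketched below. The field names and types are hypothetical; the description mentions only the filter coefficients, a per-frame gain (claim 3), the voiced/unvoiced character, the analysed pitch and the stop/go flag.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameRecord:
    """One record R_i of the table in memory 9 (hypothetical field names)."""
    coeffs: List[float]   # LPC coefficients a[0..p] for filter 7
    gain: float           # per-frame gain applied by block 8
    voiced: bool          # selects pulse train (4) or noise (5) via switch 3
    f0: float             # pitch found by the analysis, in Hz
    flag: str = "go"      # 'stop' where the original melody changes note


def mark_note_changes(records, change_frames):
    """Set the 'stop' flag on the frames where a new note begins."""
    for i in change_frames:
        records[i].flag = "stop"
    return records
```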

After the system has been initialized (step 100 of the diagram of fig. 3), step 101 determines the initial signal frame to be synthesized by pointing to the locations of memory 9 that contain the current parameter values. This is done as a function of a counting parameter PH, which is initially set to a starting value (0) and then gradually incremented by a step value. The step value is one of the parameters that the user can set selectively (for example from keyboard K, typically via an associated register) and determines the speed at which the sung line is delivered.

Step 102, which follows step 101 in the flow chart of fig. 3, is essentially a wait step in which the program waits for keyboard K to signal that a particular key has been pressed.

Once this has happened, the program moves on to step 103, which reads the flag; as described above with specific reference to the schematic representation of fig. 2, the flag can take the logical values "stop" and "go". The value of the flag is tested in the subsequent decision step 104.

If the test is positive (the flag has the value "stop"), the program immediately returns to the point after step 101 and waits for a new key to be pressed.

Otherwise, the program proceeds with the scanning of the phrase, which comprises several successive steps.

In a first step 105, the current reading phase is sent to module 6, which drives the synthesis with the corresponding parameters. In the immediately following step 106, the program loads from the control device (typically keyboard K) the data relating to the key (pitch) and to the desired scanning speed, then moves on to step 107, in which the current value of the phase is incremented by the step value, read from the control device (keyboard K), that determines the delivery speed of the sung line. The next step 108 sends to oscillator 4 the value of the frequency that the oscillator must use to generate the pulse train fed to the LPC synthesis filter 7.

At the end of these operations, a test (step 109) checks whether the phase has reached the end of the song or piece.

If the test is negative, the program returns to the point after step 102.

If it is positive, the program returns to initialization step 100; in other words, it waits again for a key to be pressed on the control device (keyboard K).

It will be appreciated that when the program returns from step 109 to the point after step 102, in step 103 it reads the flag corresponding to the new frame to be synthesized (in step 107 the corresponding value was incremented by one step). The new flag tells the control process whether to continue scanning the phrase according to the steps described above or to keep synthesizing the current frame until a key is pressed.
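The control loop of fig. 3 can be simulated at the level of events, as in the sketch below. Several points are assumptions rather than statements of the patent: a new key press is taken to release the "stop" frame it finds (read literally, the flow chart would re-read the same flag), step 106 is simplified to one pitch per key press and a fixed positive step, held ticks are not recorded, and the loop clamps PH so that a step larger than one frame cannot jump over a marker.

```python
def run_phrase(flags, step, key_pitches):
    """Event-level simulation of steps 100-109 of fig. 3.

    flags[i] is 'stop' or 'go' for frame i; step is the PH increment per tick
    (the speed register); key_pitches are the frequencies of successive keys.
    Returns the (frame, pitch) pairs sent to the synthesizer, one per tick.
    """
    if step <= 0:
        raise ValueError("this sketch only scans forward (step > 0)")
    sent = []
    ph = 0.0                                     # steps 100-101: PH = 0
    for pitch in key_pitches:                    # step 102: a key is pressed
        just_pressed = True
        while int(ph) < len(flags):              # step 109: end of phrase?
            frame = int(ph)
            if flags[frame] == "stop" and not just_pressed:
                break                            # steps 103-104: wait for a key
            just_pressed = False
            sent.append((frame, pitch))          # steps 105 and 108
            new_ph = ph + step                   # step 107
            last = min(int(new_ph), len(flags) - 1)
            for f in range(frame + 1, last + 1):  # never skip over a marker
                if flags[f] == "stop":
                    new_ph = float(f)
                    break
            ph = new_ph
        if int(ph) >= len(flags):
            break                                # phrase finished: back to 100
    return sent
```

With flags `["go", "go", "stop", "go", "stop", "go"]` and three key presses, each key plays two frames on its own pitch, mirroring the segment-by-segment behaviour of the example that follows.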

As a typical example of application of the solution according to the invention, consider the following.

Assume that the instrument holds, in the required format, the data for a voice singing a song such as "Fra Martino". When a key is pressed on keyboard K, the instrument plays the first text segment, "Fra", at the frequency of the note corresponding to the key pressed, and stays on the final vowel "a" waiting for a new key to be pressed. When this happens, the second segment ("Mar") is played on the new note determined by the key pressed, again staying on the "r" while waiting for a new keyboard event, and so on until the end of the song. Naturally, when the markers are not used, it is up to the performer to follow the sung signal correctly, ensuring the correct evolution of the melodic line through the key presses.

As the example shows, the melody (essentially dictated by the sequence of pitches of the various notes) and the tempo are determined by the player, while the lyrics and the timbre of the voice (essentially dictated by the LPC predictive filtering model) remain those sung by the original performer.

Naturally, the principle of the invention remaining the same, the details of construction and the embodiments may vary widely with respect to what has been described and illustrated, without thereby departing from the scope of the present invention.

Claims

1. Process for the synthesis of vocal signals, for example of a musical type, said synthesis being selectively controlled as a function of note signals indicative of the pitch of said vocal signal and of the voiced or unvoiced character of the vocal signal itself, characterized in that it comprises the operations of: storing (9) the information corresponding to the speech signal to be synthesized in the form of frames of a given length staggered in time by a given displacement amount; generating a resonance model corresponding to the filtering effect that the oral cavity has on sound, in the form of a filtering function (7), preferably of the LPC type, identified by respective coefficients; and representing each of said signal frames with a respective set of coefficients of said filtering function; the synthesis operation thus including, for the sequence of said frames, the following phases: generating (4) a train of excitation pulses with a frequency corresponding to said pitch, at least when the signal to be synthesized is voiced; generating (5) a noise signal when the signal to be synthesized is unvoiced; feeding said train of pulses (4) or, alternatively (3), said noise signal (5) to said filtering function (7), whose respective coefficients are held at the values determined for the frame currently being synthesized; the output of said filtering function thus constituting the synthesis signal.

2. Process according to claim 1, characterized in that it further comprises the operation of associating with each frame a marker which takes a first value if, in the synthesis phase, the synthesis must stop and wait for a new note signal, or a second value otherwise, and in that, in the synthesis phase, the coefficients of said filtering function (7) are held at the values determined for the frame currently being synthesized as a function of the value of the respective marker.

3. Process according to claim 1 or claim 2, characterized in that, during the synthesis phase, a selectively variable gain (8), stored (9) for each frame, is applied to the synthesis signal coming from said filtering function (7).

4. Process according to any one of claims 1 to 3, characterized in that it comprises the operation of selectively varying the value of said displacement amount between successive frames of said signal.

5. Process according to any one of claims 1 to 4, characterized in that the signal synthesis operation is performed by taking the frames in reverse order with respect to their order in the stored voice signal (9).

6. Process according to any one of the preceding claims, characterized in that for said resonance model a number of filtering coefficients of the order of twenty-five is chosen for each frame.

7. Process according to any one of the preceding claims, characterized in that an all-pole filtering function (7) is used.

8. Process according to any one of the preceding claims, characterized in that said frames are chosen so as to correspond to speech signal sections having a duration of the order of 25 ms.

9. Device for the synthesis of a vocal signal, for example of a musical type, comprising: input means (K) for emitting synthesis control signals corresponding to the pitch of said voice signal and to the voiced or unvoiced character of the voice signal itself; memory means (9) for storing the information corresponding to the speech signal to be synthesized in the form of frames of a given length offset in time by a given displacement amount, each of said signal frames being represented by a respective set of coefficients of a resonance model corresponding to the filtering effect that the oral cavity has on sound, in the form of a filter function (7), preferably of the LPC type, identified by respective coefficients of said filter; a first generator (4) for generating a train of excitation pulses with a frequency corresponding to said pitch, at least when the signal to be synthesized is voiced; a second generator (5) for generating a noise signal when the signal to be synthesized is unvoiced; filtering means (7) corresponding to said resonance model; and switching means (3) for alternatively supplying said train of pulses or said noise signal to said filtering means (7), whose respective coefficients are held at the values determined for the frame currently being synthesized as a function of the respective marker; the output of said filtering means (7) constituting the synthesis signal.

10. Device according to claim 9, wherein in said memory means (9) a marker is associated with each frame, which takes a first value if, in the synthesis phase, the synthesis must stop and wait for a new note control signal from said input means (K), or a second value otherwise, and wherein the respective coefficients of said filtering means (7) are held at the values determined for the frame currently being synthesized as a function of the respective marker.

11. Device according to claim 9 or claim 10, characterized in that it comprises gain control means (8) for applying to the synthesis signal a selectively variable gain stored (9) for each frame.

12. Device according to any one of claims 9 to 11, characterized in that it comprises means (9) for selectively varying the value of said displacement amount between successive sequences of said voice signal.

13. Device according to any one of claims 9 to 12, characterized in that said memory means (9) are associated with access means (8) capable of reading said frames during the synthesis phase in reverse order with respect to the order of said frames in the stored voice signal (9).

14. Device according to any one of claims 9 to 13, characterized in that said filtering means (7) operate with a number of filtering coefficients of the order of twenty-five.

15. Device according to any one of claims 9 to 14, characterized in that said filtering means (7) are all-pole filtering means.