EP1197952A1 - Method for coding prosody for a very low bit rate speech coder - Google Patents
- Publication number
- EP1197952A1
- Authority
- EP
- European Patent Office
- Prior art keywords
- coding
- energy
- speech
- representatives
- recognized
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/0018—Speech coding using phonetic or linguistical decoding of the source; Reconstruction using text-to-speech synthesis
Definitions
- The present invention relates to a method for coding speech at a very low bit rate, and to the associated system. It applies in particular to speech coding-decoding systems based on the indexing of variable-size units.
- The speech coding technique generally used at low bit rates is the vocoder, which relies on a fully parametric model of the speech signal.
- The parameters used relate to voicing, which describes the periodic or random character of the signal; the fundamental frequency of voiced sounds, also known by the English term "pitch"; the temporal evolution of the energy; and the spectral envelope of the signal, generally modeled by an LPC (Linear Predictive Coding) filter.
- These parameters are estimated periodically on the speech signal, typically every 10 to 30 ms. They are computed in an analysis device and are generally transmitted to a remote synthesis device, which reproduces the speech signal from the quantized values of the model parameters.
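As an illustration of this periodic analysis, the sketch below computes one of the parameters, the frame energy in dB, on non-overlapping frames. The 8 kHz sampling rate and the 20 ms frame length are assumptions chosen within the 10 to 30 ms range quoted above; `frame_energy_db` is a hypothetical helper, not part of the patent.

```python
import numpy as np


def frame_energy_db(signal, sample_rate=8000, frame_ms=20.0, eps=1e-10):
    """Mean energy of consecutive non-overlapping frames, in dB.

    Assumed parameters: 8 kHz sampling and 20 ms frames (160 samples).
    Trailing samples that do not fill a whole frame are ignored.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    x = np.asarray(signal, dtype=float)
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    power = np.mean(frames ** 2, axis=1)
    # eps avoids log10(0) on silent frames
    return 10.0 * np.log10(power + eps)
```

For one second of signal at these settings, the function returns 50 energy values, one per 20 ms frame.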
- One way to reduce the bit rate is to use segmental vocoders of the phonetic type, with segments of variable duration, which combine the principles of speech recognition and speech synthesis.
- The encoding procedure essentially relies on a continuous-speech automatic recognition system, which segments and "labels" the speech signal according to a number of variable-size speech units. These phonetic units are coded by their index in a small dictionary. Decoding is based on concatenative speech synthesis driven by the indices of the phonetic units and by the prosody.
- The term "prosody" mainly covers the following parameters: signal energy, pitch, voicing information and, possibly, temporal rhythm.
- This type of coder operates in two main stages, shown in Figure 1: a learning stage and a coding-decoding stage.
- An automatic procedure determines, for example after a parametric analysis (1) and a segmentation step (2), a set of 64 classes of acoustic units, designated "UA".
- Each of these acoustic unit classes is associated with a statistical model (3) of the hidden Markov model (HMM) type, together with a small number of units representing the class, referred to as "representatives" (4).
- The representatives are simply the 8 longest units belonging to the same acoustic class. They can also be determined as the N units most representative of the acoustic unit.
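The simplest selection rule above (keep the 8 longest units of each class) can be sketched as follows. The `(class_id, unit_id, length)` tuple layout is an assumption made for illustration; ties on length are broken by unit identifier so that the result is deterministic.

```python
from collections import defaultdict


def select_representatives(units, n_keep=8):
    """Keep the n_keep longest training units of each acoustic class.

    units: iterable of (class_id, unit_id, length_in_frames) tuples
    (hypothetical layout). Returns {class_id: [unit_id, ...]}, longest first.
    """
    by_class = defaultdict(list)
    for class_id, unit_id, length in units:
        by_class[class_id].append((length, unit_id))
    reps = {}
    for class_id, items in by_class.items():
        # Sort by decreasing length, then by unit_id to break ties.
        items.sort(key=lambda item: (-item[0], item[1]))
        reps[class_id] = [unit_id for _, unit_id in items[:n_keep]]
    return reps
```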
- A recognition procedure (6, 7) using a Viterbi algorithm determines the succession of acoustic units in the speech signal and identifies the "best representative" to be used for speech synthesis.
- This choice is made, for example, using a spectral distance criterion such as the DTW (Dynamic Time Warping) algorithm.
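A minimal DTW sketch for choosing the best representative is shown below. It assumes that each segment is a sequence of feature vectors (for example cepstral coefficients, as mentioned for Figure 4), a Euclidean local distance and the usual three-step local path constraint (diagonal, vertical, horizontal); the patent does not specify these details. The function also returns the warping path, which is what the alignment coding step works from.

```python
import numpy as np


def dtw(x, y):
    """Dynamic Time Warping between sequences x (n frames) and y (m frames).

    Frames are scalars or feature vectors; the local distance is Euclidean.
    Returns (total_cost, path) where path is a list of (i, j) frame pairs.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    n, m = len(x), len(y)
    local = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    # acc[i, j]: best cumulative cost aligning x[:i] with y[:j]
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )
    # Backtrack from the end, preferring the diagonal step on ties.
    i, j = n, m
    path = [(n - 1, m - 1)]
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)], key=lambda s: acc[s])
        path.append((i - 1, j - 1))
    path.reverse()
    return float(acc[n, m]), path


def best_representative(segment, representatives):
    """Index and cost of the representative with the lowest DTW distortion."""
    costs = [dtw(segment, rep)[0] for rep in representatives]
    best = int(np.argmin(costs))
    return best, costs[best]
```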
- The number of the acoustic class, the index of this representative unit, the length of the segment, the DTW content and the prosodic information derived from the parametric analysis are transmitted to the decoder. Speech is synthesized by concatenating the best representatives, possibly using an LPC-type parametric synthesizer.
- Compared with a simple concatenation of waveforms, this parametric approach allows prosody modifications, such as changes to the temporal evolution or to the fundamental frequency (pitch).
- The parametric speech model used for analysis/synthesis can be of the LPC-10 type, with binary voiced/unvoiced excitation, as described in T. Tremain, "The government standard linear predictive coding algorithm: LPC-10", Speech Technology, vol. 1, no. 2, pp. 40-49.
- This technique codes the spectral envelope of the signal at around 185 bit/s for a single-speaker system, with an average of about 21 segments per second.
- The object of the present invention is a method for coding and decoding prosody in a very low bit rate speech coder, using in particular the best representatives.
- The invention relates to a speech coding-decoding method using a very low bit rate coder that comprises a learning step for identifying "representatives" of the speech signal and a coding step for segmenting the speech signal and determining the "best representative" associated with each recognized segment. It is characterized in that it comprises at least one step of coding-decoding at least one of the prosody parameters of the recognized segments, such as the energy and/or the pitch and/or the voicing and/or the length of the segments, using prosody information from the "best representatives".
- The prosody information of the representatives that is used is, for example, the energy contour, the voicing, the length of the segments or the pitch.
- The step of coding the length of the recognized segments consists, for example, in coding the difference between the length of a recognized segment and the length of the "best representative" multiplied by a given factor.
- The method comprises, for example, a step of coding the time alignment of the best representatives using the DTW path and searching for the nearest neighbor in a table of shapes.
- The energy coding step may comprise a step of determining, for each start of a "recognized segment", the difference ΔE(j) between the energy value Erd(j) of the "best representative" and the energy value Esd(j) at the start of the "recognized segment". The decoding step then comprises, for each recognized segment, a first step of translating the energy contour of the best representative by the quantity ΔE(j) so as to make the first energy Erd(j) of the "best representative" coincide with the first energy Esd(j+1) of the recognized segment of index j+1.
- The voicing coding step comprises, for example, a step of determining the differences ΔTk, for each end of a voicing zone of index k, between the voicing curve of the recognized segments and that of the best representatives. The decoding step comprises, for example, for each end of a voicing zone of index k, a step of correcting the time position of this end by the corresponding value ΔTk and/or a step of deleting or inserting a transition.
- The invention also relates to a speech coding-decoding system comprising at least one memory for storing a dictionary containing a set of representatives of the speech signal, and a microprocessor adapted to determine the recognized segments, to reconstruct speech from the "best representatives" and to implement the steps of the method according to one of the aforementioned characteristics.
- The dictionary of representatives is, for example, common to the coder and the decoder of the coding-decoding system.
- The method and the system according to the invention can be used for speech coding-decoding at bit rates below 800 bit/s, and preferably below 400 bit/s.
- The coding-decoding method and system according to the invention offer in particular the advantage of coding prosody at a very low bit rate, thereby providing a complete coder for this field of application.
- The coding principle according to the invention is based on using the "best representatives", including their prosody information, to code and/or decode at least one of the prosody parameters of a speech signal, for example the pitch, the signal energy, the voicing or the length of the recognized segments.
- The dictionary of representatives is known to both the coder and the decoder. It corresponds, for example, to one or more languages and to one or more speakers.
- The coding-decoding system comprises, for example, a memory for storing the dictionary and a microprocessor adapted to determine the recognized segments, to implement the various steps of the method according to the invention, and to reconstruct speech from the best representatives.
- The method according to the invention implements at least one of the following steps: coding the length of the segments; coding the time alignment of the "best representatives"; coding and/or decoding the energy; coding and/or decoding the voicing information; coding and/or decoding the pitch; and/or decoding the length of the segments and the time alignment.
- On average, the coding system determines a number Ns of segments per second, for example 21 segments.
- The size of these segments varies with the class of acoustic units UA. For most UA classes, the number of segments decreases according to a 1/x^2.6 law, where x is the length of the segment.
- In an alternative embodiment of the method according to the invention, the difference between the length of the "recognized segment" and the length of the "best representative" is coded with a variable-length code, according to the scheme shown in Figure 2.
- In this scheme, the left column gives the length of the code word to use and the right column gives the difference between the length of the segment recognized by the coder in the speech signal and the length of the best representative.
- The absolute length of a recognized segment is coded with a variable-length code similar to a Huffman code, known to those skilled in the art, which gives a bit rate of around 55 bit/s.
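The text only says that the variable-length code is similar to a Huffman code. For reference, a textbook Huffman construction over observed segment lengths (or length differences) can be sketched as follows; the symbol counts in the test are invented for illustration.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(symbols):
    """Build a Huffman prefix code {symbol: bit string} from observed symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # Degenerate case: a single symbol still needs a 1-bit code word.
        return {next(iter(freq)): "0"}
    tie = count()  # tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, low = heapq.heappop(heap)
        f2, _, high = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in low.items()}
        merged.update({s: "1" + w for s, w in high.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]
```

Frequent lengths receive short code words, which is what brings the average cost per segment down; the resulting bit rate is this average multiplied by the number of segments per second.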
- The factor applied to the length of the best representative can lie between 0 (absolute coding) and 1 (difference coding).
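The effect of this factor can be sketched as follows, calling it `alpha` (a name introduced here). The coder transmits the recognized length minus `alpha` times the representative length, and the decoder adds the same prediction back, since it knows the representative from the shared dictionary. Rounding the prediction to a whole number of frames is an assumption.

```python
def length_residual(seg_len, rep_len, alpha):
    """Value actually coded for a recognized segment length (in frames).

    alpha = 0 gives absolute coding, alpha = 1 gives difference coding.
    """
    return seg_len - round(alpha * rep_len)


def decode_length(residual, rep_len, alpha):
    """Inverse of length_residual; rep_len comes from the shared dictionary."""
    return residual + round(alpha * rep_len)
```

When representatives have lengths close to those of the segments they code, a value of `alpha` near 1 concentrates the residuals around zero, which is where a short variable-length code pays off.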
- The time alignment is carried out, for example, by following the DTW (Dynamic Time Warping) path that was determined when searching for the "best representative" used to code the "recognized segment".
- Figure 4 shows the DTW path (C) corresponding to the time alignment that minimizes the distortion between the parameter to be coded (abscissa), for example the vector of cepstral coefficients, and the "best representative" (ordinate).
- The alignment of the "best representatives" is coded by finding the nearest neighbor in a table containing standard shapes.
- These standard shapes are chosen, for example, by a statistical approach, such as training on a speech database, or by an algebraic approach, for example a description by parameterized mathematical equations; these methods are known to those skilled in the art.
- Alternatively, the method aligns the segments along the diagonal rather than along the exact DTW path. The bit rate is then zero.
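One way to code the alignment with a table of standard shapes is sketched below. The DTW path is turned into a normalized warping curve sampled at a fixed number of points, and the index of the nearest table entry is transmitted. The 8-point resampling, the averaging of several representative frames mapped to the same segment frame, and the three-entry table (diagonal, slow start, fast start) are illustrative assumptions; the text only states that the table may be learned statistically or defined algebraically.

```python
import numpy as np

N_POINTS = 8
_grid = np.linspace(0.0, 1.0, N_POINTS)
# Hypothetical shape table: index 0 is the diagonal (plain linear alignment).
SHAPE_TABLE = np.stack([_grid, _grid ** 2, np.sqrt(_grid)])


def path_to_shape(path, n_points=N_POINTS):
    """Normalized warping curve of a DTW path [(i, j), ...].

    Assumes both sequences have at least two frames. Where several
    representative frames j map to the same segment frame i, their mean is used.
    """
    i, j = np.asarray(path, dtype=float).T
    xs = np.unique(i)
    ys = np.array([j[i == x].mean() for x in xs])
    return np.interp(np.linspace(0.0, 1.0, n_points), xs / xs[-1], ys / j.max())


def nearest_shape(shape, table=SHAPE_TABLE):
    """Index of the closest table entry (squared Euclidean distance)."""
    return int(np.argmin(np.sum((table - shape) ** 2, axis=1)))
```

With the diagonal variant, no index is sent at all and the decoder simply stretches the representative linearly, which corresponds to always using entry 0.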
- Energy coding is described below with reference to Figures 5 and 6, in which the ordinate axis is the energy of the speech signal to be coded, in dB, and the abscissa axis is time, in frames.
- Figure 5 shows curve (III), which gathers the energy contours of the aligned best representatives, and curve (IV), the energy contours of the recognized segments, separated by * in the figure.
- A recognized segment of index j is delimited by two points with coordinates [Esd(j); Tsd(j)] and [Esf(j); Tsf(j)], where Esd(j) is the energy at the start of the segment and Esf(j) the energy at its end, at the corresponding instants Tsd(j) and Tsf(j).
- Erd(j) and Erf(j) denote the energy values at the start and at the end of a "best representative", and ΔE(j) denotes the translation determined for the recognized segment of index j.
- The method comprises a first step of determining the translation to be applied.
- For each start of a recognized segment, the difference ΔE(j) between the energy value Erd(j) of the best representative (curve III) and the energy value Esd(j) at the start of the recognized segment (curve IV) is determined.
- This yields a set of values ΔE(j), which are quantized, for example uniformly, so that the translation to be applied during decoding is known. The quantization is carried out, for example, with methods known to those skilled in the art.
- The method essentially uses the energy contours of the best representatives (curve III) to reconstruct the energy contours of the signal to be coded (curve IV).
- A first step consists in translating the energy contour of the best representative by the translation ΔE(j), defined for example in the coding step, so that its first energy Erd(j) is moved onto the value Esd(j).
- The method then comprises a step of modifying the slope of the energy contour of the best representative so as to join the last energy value of the "best representative" to the first energy Esd(j+1) of the next segment, of index j+1.
- Figure 6 shows curves (VI) and (VII), corresponding respectively to the original energy contour of the speech signal to be coded and to the energy contour decoded after applying the steps described above.
- Coding the start energy of each segment on 4 bits gives a bit rate of around 80 bit/s for segmental energy coding.
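The energy coding and decoding steps can be sketched as follows. Several details are assumptions not fixed by the text: ΔE(j) is taken as Esd(j) - Erd(j) so that the decoder adds it; the 4-bit quantizer is uniform with a hypothetical 1.5 dB step; and the slope correction is a linear ramp that leaves the first frame at the decoded start energy and brings the last frame of segment j onto the decoded start energy of segment j+1.

```python
import numpy as np


def quantize_delta(delta_db, step_db=1.5, n_bits=4):
    """Uniform quantizer for delta_E(j); returns the transmitted index."""
    half = 2 ** (n_bits - 1)  # 4 bits -> indices -8..7
    return int(np.clip(np.round(delta_db / step_db), -half, half - 1))


def encode_energy(seg_start_db, rep_start_db, step_db=1.5):
    """One index per recognized segment: delta_E(j) = Esd(j) - Erd(j), quantized."""
    return [quantize_delta(s - r, step_db) for s, r in zip(seg_start_db, rep_start_db)]


def decode_energy(rep_contours_db, indices, step_db=1.5):
    """Rebuild segment energy contours from the aligned best representatives."""
    # Step 1: translate each representative contour by its decoded delta_E(j).
    shifted = [np.asarray(c, dtype=float) + k * step_db
               for c, k in zip(rep_contours_db, indices)]
    out = []
    for j, contour in enumerate(shifted):
        if j + 1 < len(shifted):
            # Step 2: tilt the contour so its last frame lands on the decoded
            # start energy Esd(j+1) of the next segment.
            target = shifted[j + 1][0]
            contour = contour + np.linspace(0.0, target - contour[-1], len(contour))
        out.append(contour)
    return out
```

With about 21 segments per second, 4 bits per segment amounts to 84 bit/s, consistent with the figure of around 80 bit/s given above.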
- Figure 7 shows the temporal evolution of the binary voicing information of four successive segments 35, 36, 37 for the signal to be coded (curve VII) and for the best representatives (curve VIII) after time alignment by DTW.
- The method performs a step of coding the voicing information, for example by scanning the temporal evolution of the voicing information of the recognized segments and that of the aligned best representatives (curve VIII), and by coding the differences ΔTk between these two curves.
- These differences ΔTk can be an advance a, in frames; a delay b, in frames; or the absence and/or presence of a transition c (k being the index of an end of a voicing zone).
- A variable-length code, an example of which is given in Table I below, is used to code the correction to be applied to each voicing transition of each recognized segment. Since not all segments contain a voicing transition, the bit rate associated with voicing can be reduced by coding only the voicing transitions that exist in the voicing to be coded and in the best representatives.
- The voicing information is coded at approximately 22 bit/s.
- The coding of the voicing information also comprises coding the variation of the voicing proportion.
- The decoder has the voicing information of the aligned "best representatives", obtained at the coder level.
- The correction is made, for example, as follows:
- For each end of a voicing zone, the method supplies the decoder with additional information, namely the correction to be applied to this end.
- The correction can be an advance a or a delay b to be applied to this end. This time offset is expressed, for example, in number of frames, so as to obtain the exact position of the end of voicing in the original speech signal.
- The correction can also take the form of the deletion or insertion of a transition.
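The voicing correction loop can be sketched as below, using the 3-bit codes of the example table of voicing transitions (000 delete, 001/010 shift one frame right/left, 011/100 shift two frames right/left, 101 insert, 110 no shift, 111 larger displacement). The table states that the insertion and larger-displacement codes are followed by another code whose format is not given, so this sketch simply carries the raw frame value alongside those codes. Several choices are assumptions: "right" means later in time, transitions are matched greedily to the nearest unused original transition, and the search window is 5 frames.

```python
SHIFT_CODES = {0: "110", 1: "001", -1: "010", 2: "011", -2: "100"}  # +: right (later)
DELETE, INSERT, ESCAPE = "000", "101", "111"


def transitions(voicing):
    """Frame indices at which a binary voicing sequence changes value."""
    return [i for i in range(1, len(voicing)) if voicing[i] != voicing[i - 1]]


def encode_voicing(rep_voicing, orig_voicing, max_search=5):
    """List of (code, value) corrections turning the representative's voicing
    transitions into those of the original. value is the shift in frames for
    shift/escape codes, the frame position for insertions, None for deletions."""
    rep_t = transitions(rep_voicing)
    unused = set(transitions(orig_voicing))
    coded = []
    for t in rep_t:
        near = [u for u in unused if abs(u - t) <= max_search]
        if not near:
            coded.append((DELETE, None))  # no matching transition in the original
            continue
        u = min(near, key=lambda c: (abs(c - t), c))
        unused.discard(u)
        coded.append((SHIFT_CODES.get(u - t, ESCAPE), u - t))
    # Original transitions with no counterpart in the representative.
    coded += [(INSERT, u) for u in sorted(unused)]
    return coded


def decode_voicing(rep_voicing, coded):
    """Transition positions of the decoded voicing."""
    rep_t = transitions(rep_voicing)
    out = [t + value for t, (code, value) in zip(rep_t, coded) if code != DELETE]
    out += [value for _, value in coded[len(rep_t):]]  # inserted transitions
    return sorted(out)
```

The sketch returns transition positions only; rebuilding the binary voicing curve additionally requires the voicing state at the start of the segment.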
- For pitch coding, the method uses, for example, a predictive scalar quantizer, for example on 5 bits, applied to the logarithm of the pitch.
- The prediction is, for example, the first pitch value of the best representative at the position of the pitch to be decoded, multiplied by a prediction factor lying, for example, between 0 and 1.
- Alternatively, the prediction can be the minimum pitch value of the speech recording to be coded.
- This value can be transmitted to the decoder by scalar quantization, for example on 8 bits.
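A minimal sketch of the predictive log-pitch quantizer follows. It assumes that the prediction factor `beta` is applied in the log domain (prediction = beta * log of the representative's first pitch value) and uses a hypothetical step of 0.04 in log units; the text only fixes the 5-bit size and a factor between 0 and 1. For the alternative predictor, `prediction_log` would instead be the log of the recording's minimum pitch value, sent once on 8 bits.

```python
import math


def rep_prediction(rep_first_pitch, beta=0.95):
    """Log-domain prediction from the best representative's first pitch value.

    Assumption: the prediction factor beta (0 <= beta <= 1) multiplies log-pitch.
    """
    return beta * math.log(rep_first_pitch)


def quantize_log_pitch(pitch, prediction_log, step=0.04, n_bits=5):
    """5-bit predictive scalar quantizer on log-pitch; returns the transmitted index."""
    half = 2 ** (n_bits - 1)  # 5 bits -> indices -16..15
    idx = round((math.log(pitch) - prediction_log) / step)
    return max(-half, min(half - 1, idx))


def dequantize_log_pitch(idx, prediction_log, step=0.04):
    """Decoder side: add the same prediction back and leave the log domain."""
    return math.exp(prediction_log + idx * step)
```

With `beta` = 1 the quantizer codes only the log ratio between the pitch and the representative's pitch; with `beta` = 0 it reduces to a plain scalar quantizer on log-pitch.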
- The method comprises a step in which the temporal spacing between each of these pitch values is specified, for example in number of frames.
- A variable-length code makes it possible, for example, to code these spacings on 2 bits on average.
- This procedure gives a bit rate of approximately 65 bit/s for a maximum distance of 7 samples on the pitch period.
- The decoding step first comprises a step of decoding the temporal spacing between the different transmitted pitch values, in order to recover the pitch update instants and the pitch value at each of these instants.
- The pitch value for each frame of the voiced zone is then reconstructed, for example, by linear interpolation between the transmitted values.
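On the decoding side, the transmitted spacings are accumulated into update instants and the per-frame pitch of the voiced zone is filled in by linear interpolation. This sketch assumes that the first spacing is the offset from the start of the voiced zone, and that frames before the first or after the last update keep the nearest transmitted value (the behaviour of `numpy.interp` at the ends).

```python
import numpy as np


def decode_pitch_track(spacings, values, n_frames):
    """Per-frame pitch of a voiced zone from (spacing, value) pairs.

    spacings[0] is assumed to be the offset of the first value from the zone
    start; each later spacing is the number of frames since the previous value.
    """
    times = np.cumsum(spacings)
    frames = np.arange(n_frames)
    # Linear interpolation between update instants; constant outside them.
    return np.interp(frames, times, np.asarray(values, dtype=float))
```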
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)
Abstract
Description
The number of the acoustic class, the index of this representative unit, the length of the segment, the DTW content and the prosodic information derived from the parametric analysis are transmitted to the decoder. Speech is synthesized by concatenating the best representatives, possibly using an LPC-type parametric synthesizer.
- the term "representative" refers to one of the segments of the training database that was judged representative of one of the acoustic unit classes,
- the expression "recognized segment" refers to a speech segment that the coder has identified as belonging to one of the acoustic classes,
- the expression "best representative" designates the representative, determined at the coding stage, that best represents the recognized segment.
- Figure 1 shows a diagram of speech learning, coding and decoding according to the prior art,
- Figures 2 and 3 describe examples of coding the length of the recognized segments,
- Figure 4 schematically shows a time-alignment model of the "best representatives",
- Figures 5 and 6 show energy curves of the signal to be coded and of the aligned representatives, as well as the initial and decoded energy contours obtained by implementing the method according to the invention,
- Figure 7 schematically shows the coding of the voicing of the speech signal, and
- Figure 8 is an example of pitch coding.
- several classes of acoustic units UA, each class being determined from a statistical model,
- for each class of acoustic units, a set of representatives.
Example coding table for voicing transitions (Table I):
Code | Interpretation |
---|---|
000 | Transition to be deleted |
001 | Shift 1 frame to the right |
010 | Shift 1 frame to the left |
011 | Shift 2 frames to the right |
100 | Shift 2 frames to the left |
101 | Insert a transition (a code specifying the position of the transition follows) |
110 | No shift |
111 | Displacement greater than 3 frames (another code follows) |
- the subband voicing rate; the analysis of this information uses a method described, for example, in D.W. Griffin and J.S. Lim, "Multiband Excitation Vocoder", IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. 36, no. 8, pp. 1223-1235, 1988;
- the transition frequency between a voiced low band and an unvoiced high band; the coding uses a method such as the one described in C. Laflamme, R. Salami, R. Matmti and J-P. Adoul, "Harmonic Stochastic Excitation (HSX) speech coding below 4 kbits/s", IEEE International Conference on Acoustics, Speech, and Signal Processing, Atlanta, May 1996, pp. 204-207.
- The method considers only the pitch values at the start of the recognized segments. Starting from the straight line Di joining the pitch values at the two ends of the voiced zone, the method searches for the segment start whose pitch value lies farthest from this line, at a distance dmax. It compares dmax with a threshold dseuil. If dmax is greater than dseuil, the method splits the initial line Di into two lines Di1 and Di2, taking the start of the segment found as a new pitch value to be transmitted. This operation is repeated on the two new voiced zones delimited by Di1 and Di2 until the distance dmax found is less than dseuil.
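The recursive splitting just described resembles the Douglas-Peucker line simplification algorithm. A minimal sketch follows, assuming that the distance to the line Di is measured vertically (in pitch units), that frame positions are strictly increasing, and that the first and last entries of `times`/`pitches` are the two ends of the voiced zone while the others are recognized-segment starts. `d_threshold` plays the role of dseuil.

```python
def select_pitch_points(times, pitches, d_threshold):
    """Indices of the pitch values to transmit for one voiced zone.

    times/pitches: frame positions and pitch values; index 0 and the last index
    are the two ends of the zone, the others are recognized-segment starts.
    Distance to the line is measured vertically (assumption).
    """
    keep = {0, len(times) - 1}

    def split(a, b):
        if b - a < 2:
            return  # no segment start strictly between a and b
        slope = (pitches[b] - pitches[a]) / (times[b] - times[a])
        best, d_max = None, 0.0
        for k in range(a + 1, b):
            d = abs(pitches[k] - (pitches[a] + slope * (times[k] - times[a])))
            if d > d_max:
                best, d_max = k, d
        if best is not None and d_max > d_threshold:
            keep.add(best)  # new pitch value to transmit: line splits into Di1, Di2
            split(a, best)
            split(best, b)

    split(0, len(times) - 1)
    return sorted(keep)
```

The selected points are then transmitted with their spacings, and the decoder rebuilds the intermediate frames by linear interpolation.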
Claims (11)
- A speech coding-decoding method using a very low bit rate coder, comprising a learning step for identifying "representatives" of the speech signal and a coding step for segmenting the speech signal and determining the "best representative" associated with each recognized segment, characterized in that it comprises at least one step of coding-decoding at least one of the prosody parameters of the recognized segments, such as the energy and/or the pitch and/or the voicing and/or the length of the segments, using prosody information from the "best representatives".
- The method according to claim 1, characterized in that the prosody information of the representatives that is used is the energy contour or the voicing or the length of the segments or the pitch.
- The method according to claim 1, characterized in that it comprises a step of coding the length of the recognized segments, consisting in coding the difference between the length of a recognized segment and the length of the "best representative" multiplied by a given factor.
- The method according to claim 1, characterized in that it comprises a step of coding the time alignment of the best representatives using the DTW path and searching for the nearest neighbor in a table of shapes.
- The method according to one of claims 1 to 4, characterized in that the energy coding step comprises a step of determining, for each start of a "recognized segment", the difference ΔE(j) between the energy value Erd(j) of the "best representative" and the energy value Esd(j) at the start of the "recognized segment".
- The method according to claim 5, characterized in that the energy decoding step comprises, for each recognized segment, a first step consisting in translating the energy contour of the best representative by a quantity ΔE(j) so as to make the first energy Erd(j) of the "best representative" coincide with the first energy Esd(j+1) of the recognized segment of index j+1.
- The method according to one of claims 1 to 4, characterized in that the voicing coding step comprises a step of determining the existing differences ΔTk, for each end of a voicing zone of index k, between the voicing curve of the recognized segments and that of the best representatives.
- The method according to claim 7, characterized in that the decoding step comprises, for each end of a voicing zone of index k, a step of correcting the time position of this end by a corresponding value ΔTk and/or a step of deleting or inserting a transition.
- A speech coding-decoding system comprising at least one memory for storing a dictionary containing a set of representatives of the speech signal, and a microprocessor adapted to determine the recognized segments, to reconstruct speech from the "best representatives" and to implement the steps of the method according to one of claims 1 to 8.
- The system according to claim 9, characterized in that the dictionary of representatives is common to the coder and the decoder of the coding-decoding system.
- Use of the method according to one of claims 1 to 8 or of the system according to one of claims 9 and 10 for speech coding-decoding at bit rates below 800 bit/s, and preferably below 400 bit/s.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR0013628A FR2815457B1 (fr) | 2000-10-18 | 2000-10-18 | Method for coding prosody for a very low bit rate speech coder |
FR0013628 | 2000-10-18 |
Publications (2)
Publication Number | Publication Date |
---|---|
EP1197952A1 (fr) | 2002-04-17 |
EP1197952B1 (fr) | 2009-12-02 |
Family
ID=8855687
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP01402684A Expired - Lifetime EP1197952B1 (fr) | 2000-10-18 | 2001-10-17 | Method for coding prosody for a very low bit rate speech coder |
Country Status (10)
Country | Link |
---|---|
US (1) | US7039584B2 (fr) |
EP (1) | EP1197952B1 (fr) |
JP (1) | JP2002207499A (fr) |
KR (1) | KR20020031305A (fr) |
AT (1) | ATE450856T1 (fr) |
CA (1) | CA2359411C (fr) |
DE (1) | DE60140651D1 (fr) |
ES (1) | ES2337020T3 (fr) |
FR (1) | FR2815457B1 (fr) |
IL (1) | IL145992A0 (fr) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110265049A (zh) * | 2019-05-27 | 2019-09-20 | 重庆高开清芯科技产业发展有限公司 | Speech recognition method and speech recognition system |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2388439A1 (fr) * | 2002-05-31 | 2003-11-30 | Voiceage Corporation | Method and device for concealing frame erasures in linear-prediction-based speech codecs |
US20040166481A1 (en) * | 2003-02-26 | 2004-08-26 | Sayling Wen | Linear listening and followed-reading language learning system & method |
JP4256189B2 (ja) * | 2003-03-28 | 2009-04-22 | 株式会社ケンウッド | Speech signal compression device, speech signal compression method and program |
US20050091044A1 (en) * | 2003-10-23 | 2005-04-28 | Nokia Corporation | Method and system for pitch contour quantization in audio coding |
FR2861491B1 (fr) * | 2003-10-24 | 2006-01-06 | Thales Sa | Method for selecting synthesis units |
KR101410230B1 (ko) * | 2007-08-17 | 2014-06-20 | 삼성전자주식회사 | Audio signal encoding method and apparatus, and audio signal decoding method and apparatus, that process terminating sinusoidal signals and ordinary continuous sinusoidal signals differently |
US8374873B2 (en) * | 2008-08-12 | 2013-02-12 | Morphism, Llc | Training and applying prosody models |
US8670990B2 (en) * | 2009-08-03 | 2014-03-11 | Broadcom Corporation | Dynamic time scale modification for reduced bit rate audio coding |
CN107256710A (zh) * | 2017-08-01 | 2017-10-17 | 中国农业大学 | Humming melody recognition method based on a dynamic time warping algorithm |
US11830473B2 (en) * | 2020-01-21 | 2023-11-28 | Samsung Electronics Co., Ltd. | Expressive text-to-speech system and method |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5933805A (en) * | 1996-12-13 | 1999-08-03 | Intel Corporation | Retaining prosody during speech analysis for later playback |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4802223A (en) * | 1983-11-03 | 1989-01-31 | Texas Instruments Incorporated | Low data rate speech encoding employing syllable pitch patterns |
US5305421A (en) * | 1991-08-28 | 1994-04-19 | Itt Corporation | Low bit rate speech coding system and compression |
US5233660A (en) * | 1991-09-10 | 1993-08-03 | At&T Bell Laboratories | Method and apparatus for low-delay celp speech coding and decoding |
US5682464A (en) * | 1992-06-29 | 1997-10-28 | Kurzweil Applied Intelligence, Inc. | Word model candidate preselection for speech recognition using precomputed matrix of thresholded distance values |
EP0706172A1 (fr) * | 1994-10-04 | 1996-04-10 | Hughes Aircraft Company | Low bit rate speech coder and decoder |
US6393391B1 (en) * | 1998-04-15 | 2002-05-21 | Nec Corporation | Speech coder for high quality at low bit rates |
JPH10260692A (ja) * | 1997-03-18 | 1998-09-29 | Toshiba Corp | Speech recognition/synthesis encoding/decoding method and speech encoding/decoding system |
US6456965B1 (en) * | 1997-05-20 | 2002-09-24 | Texas Instruments Incorporated | Multi-stage pitch and mixed voicing estimation for harmonic speech coders |
FR2784218B1 (fr) * | 1998-10-06 | 2000-12-08 | Thomson Csf | Low bit rate speech coding method |
FR2786908B1 (fr) * | 1998-12-04 | 2001-06-08 | Thomson Csf | Method and device for processing sounds for hearing correction of the hearing-impaired |
WO2002027709A2 (fr) * | 2000-09-29 | 2002-04-04 | Lernout & Hauspie Speech Products N.V. | Corpus-based prosody translation system |
2000
- 2000-10-18 FR FR0013628A patent/FR2815457B1/fr not_active Expired - Fee Related
2001
- 2001-10-17 JP JP2001319231A patent/JP2002207499A/ja not_active Withdrawn
- 2001-10-17 AT AT01402684T patent/ATE450856T1/de not_active IP Right Cessation
- 2001-10-17 DE DE60140651T patent/DE60140651D1/de not_active Expired - Lifetime
- 2001-10-17 EP EP01402684A patent/EP1197952B1/fr not_active Expired - Lifetime
- 2001-10-17 ES ES01402684T patent/ES2337020T3/es not_active Expired - Lifetime
- 2001-10-17 CA CA2359411A patent/CA2359411C/fr not_active Expired - Fee Related
- 2001-10-17 IL IL14599201A patent/IL145992A0/xx unknown
- 2001-10-18 US US09/978,680 patent/US7039584B2/en not_active Expired - Fee Related
- 2001-10-18 KR KR1020010064436A patent/KR20020031305A/ko not_active Application Discontinuation
Non-Patent Citations (5)
Title |
---|
BAUDOIN G ET AL: "Speech coding at low and very low bit rates", ANNALES DES TELECOMMUNICATIONS, SEPT.-OCT. 2000, EDITIONS HERMES, FRANCE, vol. 55, no. 9-10, pages 462 - 482, XP001010733, ISSN: 0003-4347 * |
CERNOCKY J ET AL: "Very low bit rate speech coding: comparison of data-driven units with syllable segments", TEXT, SPEECH AND DIALOGUE. SECOND INTERNATIONAL WORKSHOP, TDS'99. PROCEEDINGS (LECTURE NOTES IN ARTIFICIAL INTELLIGENCE VOL.1692), PLZEN, CZECH REPUBLIC, 13-17 SEPT. 1999, 1999, Berlin, Germany, Springer-Verlag, Germany, pages 262 - 267, XP001010738, ISBN: 3-540-66494-7 * |
FELICI M ET AL: "Very low bit rate speech coding using a diphone-based recognition and synthesis approach", ELECTRONICS LETTERS,IEE STEVENAGE,GB, vol. 34, no. 9, 30 April 1998 (1998-04-30), pages 859 - 860, XP006009638, ISSN: 0013-5194 * |
LEE K -S ET AL: "TTS BASED VERY LOW BIT RATE SPEECH CODER", PHOENIX, AZ, MARCH 15 - 19, 1999,NEW YORK, NY: IEEE,US, 15 March 1999 (1999-03-15), pages 181 - 184, XP000898289, ISBN: 0-7803-5042-1 * |
NAKACHE ET AL.: "Prosody coding for a very low bit rate speech coder by indexing variable-size units" (original title: "Codage de la prosodie pour un codeur de parole à très bas débit par indexation d'unités de taille variable"), CORESA'2000, 19 October 2000 (2000-10-19) - 20 October 2000 (2000-10-20), Poitiers, XP002170481 * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110265049A (zh) * | 2019-05-27 | 2019-09-20 | Chongqing Gaokai Qingxin Technology Industry Development Co., Ltd. | Speech recognition method and speech recognition system |
Also Published As
Publication number | Publication date |
---|---|
ES2337020T3 (es) | 2010-04-20 |
US20020065655A1 (en) | 2002-05-30 |
ATE450856T1 (de) | 2009-12-15 |
CA2359411C (fr) | 2010-07-06 |
FR2815457B1 (fr) | 2003-02-14 |
DE60140651D1 (de) | 2010-01-14 |
IL145992A0 (en) | 2002-07-25 |
FR2815457A1 (fr) | 2002-04-19 |
CA2359411A1 (fr) | 2002-04-18 |
US7039584B2 (en) | 2006-05-02 |
JP2002207499A (ja) | 2002-07-26 |
KR20020031305A (ko) | 2002-05-01 |
EP1197952B1 (fr) | 2009-12-02 |
Similar Documents
Publication | Title |
---|---|
EP1372289B1 (fr) | Creation of a silence description frame for generating comfort noise |
EP2277172B1 (fr) | Transmission error concealment in a digital audio signal in a hierarchical decoding structure |
EP1692689B1 (fr) | Optimized multiple coding method |
FR2813722A1 (fr) | Error concealment method and device, and transmission system comprising such a device |
EP1051703B1 (fr) | Method for decoding an audio signal with correction of transmission errors |
FR2907586A1 (fr) | Synthesis of lost blocks of a digital audio signal, with pitch period correction |
EP1197952B1 (fr) | Prosody coding method for a very low bit rate speech coder |
EP2080194B1 (fr) | Attenuation of over-voicing, in particular for generating an excitation at a decoder in the absence of information |
WO2005066936A1 (fr) | Transcoding between indices of multi-pulse codebooks used in compression coding of digital signals |
EP1526508B1 (fr) | Method for selecting synthesis units |
WO1998047134A1 (fr) | Method and device for coding an audio-frequency signal by "forward" and "backward" LPC analysis |
EP1836699B1 (fr) | Method and device for optimized audio coding between two long-term prediction models |
EP0428445A1 (fr) | Method and device for coding the predictor filters of very low bit rate vocoders |
EP3138095B1 (fr) | Improved frame loss correction with voicing information |
WO2023165946A1 (fr) | Optimized coding and decoding of an audio signal using a neural-network-based autoencoder |
EP2203915A1 (fr) | Transmission error concealment in a digital signal with complexity distribution |
EP1756806B1 (fr) | Quantization method for a very low bit rate speech coder |
JP3019342B2 (ja) | Speech coding method |
GB2626841A (en) | Voice audio compression using neural networks |
FR2988894A1 (fr) | Voice detection method |
FR2661541A1 (fr) | Method and device for low bit rate speech coding |
WO2001091106A1 (fr) | Adaptive analysis windows for speech recognition |
FR2581272A1 (fr) | Differential PCM coding method and information transmission installation using such coding |
JPH03156498A (ja) | Speech coding method |
JPH04243300A (ja) | Speech coding method |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
AK | Designated contracting states | Kind code of ref document: A1 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR |
AX | Request for extension of the european patent | Free format text: AL;LT;LV;MK;RO;SI |
17P | Request for examination filed | Effective date: 20021014 |
AKX | Designation fees paid | Free format text: AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR |
GRAP | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
GRAS | Grant fee paid | Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
RAP1 | Party data changed (applicant data changed or rights of an application transferred) | Owner name: THALES |
GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
AK | Designated contracting states | Kind code of ref document: B1 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR |
REG | Reference to a national code | Ref country code: GB Ref legal event code: FG4D Free format text: NOT ENGLISH |
REG | Reference to a national code | Ref country code: CH Ref legal event code: EP |
REG | Reference to a national code | Ref country code: IE Ref legal event code: FG4D |
REF | Corresponds to: | Ref document number: 60140651 Country of ref document: DE Date of ref document: 20100114 Kind code of ref document: P |
REG | Reference to a national code | Ref country code: SE Ref legal event code: TRGR |
REG | Reference to a national code | Ref country code: NL Ref legal event code: VDEP Effective date: 20091202 |
REG | Reference to a national code | Ref country code: ES Ref legal event code: FG2A Ref document number: 2337020 Country of ref document: ES Kind code of ref document: T3 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: CY Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20091202 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: AT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20091202 |
REG | Reference to a national code | Ref country code: IE Ref legal event code: FD4D |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: NL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20091202 Ref country code: PT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20100402 Ref country code: IE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20091202 |
PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20100303 |
26N | No opposition filed | Effective date: 20100903 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20091202 |
BERE | Be: lapsed | Owner name: THALES Effective date: 20101031 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: MC Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20101031 |
REG | Reference to a national code | Ref country code: CH Ref legal event code: PL |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: CH Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20101031 Ref country code: LI Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20101031 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: BE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20101031 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: LU Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20101017 |
REG | Reference to a national code | Ref country code: FR Ref legal event code: PLFP Year of fee payment: 16 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: ES Payment date: 20160927 Year of fee payment: 16 Ref country code: TR Payment date: 20160926 Year of fee payment: 16 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: FI Payment date: 20161010 Year of fee payment: 16 Ref country code: DE Payment date: 20161011 Year of fee payment: 16 Ref country code: GB Payment date: 20161005 Year of fee payment: 16 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: IT Payment date: 20161024 Year of fee payment: 16 Ref country code: SE Payment date: 20161011 Year of fee payment: 16 |
REG | Reference to a national code | Ref country code: FR Ref legal event code: PLFP Year of fee payment: 17 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: FR Payment date: 20170929 Year of fee payment: 17 |
REG | Reference to a national code | Ref country code: DE Ref legal event code: R119 Ref document number: 60140651 Country of ref document: DE |
REG | Reference to a national code | Ref country code: SE Ref legal event code: EUG |
GBPC | Gb: european patent ceased through non-payment of renewal fee | Effective date: 20171017 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: FI Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20171017 Ref country code: DE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20180501 Ref country code: GB Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20171017 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: SE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20171018 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: IT Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20171017 |
REG | Reference to a national code | Ref country code: ES Ref legal event code: FD2A Effective date: 20181221 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: ES Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20171018 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: FR Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20181031 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: TR Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20171017 |