WO2000074035A1 - Method and arrangement for speech coding by means of phonetic decoding and transmission of speaker characteristics
- Publication number
- WO2000074035A1 (PCT/DE2000/001662)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- data signals
- transmitter
- receiver
- transmission
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/0018—Speech coding using phonetic or linguistical decoding of the source; Reconstruction using text-to-speech synthesis
Definitions
- The invention relates to a method for transmitting data signals with individual features, in particular voice signals, according to the preamble of claim 1, and to an arrangement for transmitting data signals with individual features, in particular voice signals, according to the preamble of claim 7.
- Voice transmission is one of the most important telecommunications services, if not the most important.
- Due to the limited resources, transmission rates must be kept as low as possible; at the same time, the poorer and strongly fluctuating transmission properties, as compared to wired transmission, generally result in relatively high error rates.
- Redundancy reduction eliminates redundant signal content before transmission; this content is identified from prior knowledge of certain (for example statistical) parameters of the signal. If these redundant signal components are re-impressed on the signal after transmission, there is no transmission-related loss of quality. Irrelevance reduction eliminates, before transmission, signal components that are assumed to be irrelevant to the receiver. If these signal components are not re-impressed after transmission, there are objective differences between the speech signal generated on the receiver side and the original speech signal, but these are accepted or (at best) cannot be heard.
- Known methods are based either on waveform coding, in which the analog speech signal is digitized on the transmitter side and an error-free conversion back into an analog signal takes place on the receiver side, and which achieves acceptable speech quality at bit rates of approximately 16 kbit/s to 64 kbit/s, or on the principle of parametric representation (vocoder principle), which reduces the bit rate (to between 400 bit/s and 5 kbit/s) but generally achieves only limited speech quality.
- The speech signal is segmented into short sections during which it changes only insignificantly and can be characterized by certain excitation or filter parameters. It is not the actual signal that is transmitted, but rather the sequence of excitation or filter parameters. Individual characteristics of speech (emphasis, accents and sentence melody) can be conveyed only to a very limited extent with this method.
- The invention includes the basic technical idea of separating individual features from the overall data signals on the transmitter side and separately transmitting the remaining, standardized (and compressed) data signals on the one hand and the individualization data corresponding to the individual features on the other. Depending on the specific application, this separate transmission can take place at different times or essentially simultaneously.
- A knowledge base regarding the individual features can be built up in advance on the receiver side, from which the individual features are then re-impressed after transmission of the standardized data signals.
- A receiver-side knowledge base regarding the individual features can also be built up successively in the course of the transmission, in particular by accessing a corresponding transmitter-side knowledge base.
- This access to the transmitter-side knowledge base is controlled in such a way that the prioritized transmission of the standardized, compressed data signals as the main information carrier is not disturbed. In the case of speech transmission, it takes place in particular in speech pauses, in sections of pronounced word stretching, or at times in which a higher channel bandwidth is available.
- The individual features are separated from the overall data signal by a coder/decoder unit, the decoder part of which corresponds to the decoder provided on the receiver side for the data signal coded on the transmitter side, a delay stage connected in parallel with this unit, and a unit connected to the outputs of both components for obtaining a suitably structured difference signal between the total data signal present at the transmitter input and the normalized data signal after passing the codec.
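The separation principle described above (codec path, parallel delay stage, difference unit) can be sketched as follows. This is only a toy illustration under stated assumptions, not the patent's algorithm: the coarse quantizer standing in for the coder/decoder unit, the sample-based latency, and the names `delay`, `toy_codec` and `separate_individual_features` are all hypothetical.

```python
from typing import List, Tuple


def delay(samples: List[float], n: int) -> List[float]:
    """Delay a signal by n samples, zero-padding the start and keeping the length."""
    return ([0.0] * n + list(samples))[: len(samples)]


def toy_codec(samples: List[float], step: float, latency: int) -> List[float]:
    """Hypothetical codec stand-in: coarse quantization plus a processing latency."""
    quantized = [round(s / step) * step for s in samples]
    return delay(quantized, latency)


def separate_individual_features(
    samples: List[float], step: float = 0.5, latency: int = 1
) -> Tuple[List[float], List[float]]:
    """Return (normalized signal, difference signal) for the input samples."""
    normalized = toy_codec(samples, step, latency)  # path through coder and decoder
    reference = delay(samples, latency)             # parallel delay stage
    # The difference unit sees both paths time-aligned, sample by sample.
    difference = [r - q for r, q in zip(reference, normalized)]
    return normalized, difference
```

Adding `difference` back to `normalized` recovers the delayed input, which is why the two paths have to be time-aligned before the subtraction.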
- The transmitter has speech recognition means, known per se, for converting speech into data signals in the form of characters, and the receiver has speech synthesis means for synthesizing acoustically outputtable speech from the characters.
- The proposed arrangement is suitable not only for voice communication but in principle for any transmission of signals with individual characteristics, for example also for the compressed transmission of manuscripts or images with an artistic "handwriting" (paintings, graphics etc.).
- The delay time of the delay stage provided in the preferred embodiment of the separation means is preferably controllable, in adaptation to the current processing time caused by the signal processing in the transmitter, i.e. the coding/decoding and, if necessary, speech analysis. This can save considerable time in signal processing as a whole, since a fixed signal transit time for the processing operations associated with separating the individual features would have to be set for the "worst case" of a data signal sequence that requires maximally time-consuming processing.
- The receiver has an individual feature knowledge base, connected on the input side to the separating means and on the output side at least indirectly to an input of the speech synthesis means, for storing individual features in association with the associated characters as a representation of the standardized data signals.
- The transmitter contains a first such knowledge base and the receiver contains a second such knowledge base, and control means are provided for transmitting new data sets from the inventory of the first to supplement the memory content of the second; these ensure efficient and secure transmission of the corresponding individualization data via the separate channel.
- This transmission takes place, in particular, with a lower priority than that of the standardized data signals, especially during pauses in the transmission of the latter.
- The figure shows only the components essential for explaining the invention; the usual components of a mobile radio transmitting or receiving part are omitted for the sake of clarity.
- The sound waves picked up by a microphone 7 are digitized in an A/D converter 9, if necessary after preprocessing that includes amplification and/or filtering or interference suppression, and at the output of the A/D converter 9 the signal path splits at a node 11 into two partial paths.
- In a first partial path 13a, the digitized speech signal is first subjected to a speech recognition algorithm (known per se) in a speech recognition stage 15, where the received speech signals are converted into characters, and is then coded in an encoder 17.
- The specific code used for each of the previously formed characters depends on its predecessor, since the probability of a syllable or a word, and also of its emphasis, depends on the preceding one.
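One way to picture such predecessor-dependent coding is a rank table per predecessor: each character is sent as its rank among the likely successors of the previous character, so probable sequences map to small numbers that an entropy coder (not shown) could represent with short codewords. The table contents, the start marker `"^"` and the function names below are hypothetical, and the source does not specify the actual code used by encoder 17.

```python
from typing import Dict, List

# Hypothetical table: for each predecessor, its successors ordered by likelihood.
# "^" marks the start of an utterance.
RANKED: Dict[str, List[str]] = {
    "^": ["he", "lo", "l"],
    "he": ["l", "lo", "he"],
    "l": ["lo", "he", "l"],
    "lo": ["he", "l", "lo"],
}


def encode(units: List[str]) -> List[int]:
    """Replace each unit by its rank among the successors of its predecessor."""
    prev, codes = "^", []
    for unit in units:
        codes.append(RANKED[prev].index(unit))
        prev = unit
    return codes


def decode(codes: List[int]) -> List[str]:
    """Invert encode() by walking the same predecessor-dependent table."""
    prev, units = "^", []
    for code in codes:
        unit = RANKED[prev][code]
        units.append(unit)
        prev = unit
    return units
```

The decoder needs the same table as the encoder; with it, `decode(encode(units))` returns the original sequence.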
- The coded characters are prepared for transmission in a transmission stage, which is known per se and therefore not shown in the figure, and are sent via a first logical channel CH1 to the receiver. There they are first RF-preprocessed in a reception stage (likewise known and therefore omitted here), possibly also processed by despreading, descrambling or the like in accordance with a specific mobile radio protocol, and then fed to a speech decoder 19.
- The further signal processing on the receiver side is described further below.
- The coded speech data available at the output of the transmitter-side encoder 17 are not only transmitted to the receiver but are also immediately decoded again in a transmitter-side speech decoder 21 that corresponds functionally to the receiver-side speech decoder 19.
- A transformation into an n-dimensional state space then takes place in a first transformation stage 23a using algorithms known per se.
- The signal branched off at the node 11 into the second partial signal path 13b is subjected to a corresponding transformation in a second transformation stage 23b after it has been delayed in a delay stage 25, whose delay is synchronized with the signal propagation time in the first partial signal path 13a.
- The data signal present at the input of the first transformation stage 23a is a standardized speech signal, reduced by the process of coding and subsequent decoding in stages 17 and 21, whereas the data signal present at the input of the second transformation stage 23b is still the total data signal, merely suitably delayed.
- The delay time impressed by the delay stage 25 is controlled in adaptation to the running time in the processing chain of stages 15, 17 and 21; the figure shows (somewhat simplified) a control depending on the result of the speech recognition, i.e. starting from speech recognition stage 15.
- The individual features obtained from the signal paths 13a, 13b can be uniquely assigned to the characters obtained as the result of the speech recognition and can be stored in this assignment in an individual feature knowledge base 29 on the transmitter side.
- The individualization data representing the individual features are transmitted to the receiver 5 via a separate, second logical channel CH2, at the beginning and end of which there is a codec 31, 33.
- In the receiver, these data are first checked, using a special control channel CH3, in a comparator and memory control stage (not shown in the figure) as to whether they are already contained in a receiver-side individual feature knowledge base 35. If not, they are stored in the receiver-side knowledge base 35, again in assignment to the corresponding characters of the standardized speech signals (transmitted separately to the receiver 5).
- The data stock of the receiver-side knowledge base 35 thus "tracks" that of the transmitter-side knowledge base 29, so that only information on individual features that is not already present in the receiver-side knowledge base 35 has to be transmitted via the separate channel. The amount of data to be transmitted can therefore be kept comparatively small.
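The tracking of one knowledge base by the other can be sketched as a simple delta update. The sketch assumes, purely for illustration, that both knowledge bases are dictionaries keyed by character and that the missing entries can be determined (for instance via the control channel); the feature tuples and the names `missing_entries` and `track` are hypothetical.

```python
from typing import Dict, Tuple

Features = Tuple[float, ...]  # hypothetical feature vector per character


def missing_entries(
    sender_kb: Dict[str, Features], receiver_kb: Dict[str, Features]
) -> Dict[str, Features]:
    """Entries of the sender knowledge base that the receiver lacks or holds differently."""
    return {
        char: feats
        for char, feats in sender_kb.items()
        if receiver_kb.get(char) != feats
    }


def track(sender_kb: Dict[str, Features], receiver_kb: Dict[str, Features]) -> int:
    """Bring the receiver knowledge base up to date; return the number of entries sent."""
    delta = missing_entries(sender_kb, receiver_kb)
    receiver_kb.update(delta)  # only the delta crosses the separate channel
    return len(delta)
```

Once the receiver holds all entries, a further call to `track` transmits nothing, which mirrors the small data volume noted above.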
- Since the individualization data also have practically no indispensable information value, they are transmitted with lower priority than the standardized speech signals. These data can be transmitted, for example, only during speech pauses or in intervals with strong word stretching.
- The proposed solution therefore requires a separate logical data channel, but no additional channel resources.
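The priority rule can be illustrated with a slot-based scheduler: a slot carrying speech always goes to CH1, and queued individualization data use CH2 only in slots where no speech is pending. The slot model, the use of `None` for a speech pause and the function name `schedule` are assumptions made for this sketch; the source does not describe the control means at this level.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


def schedule(
    speech_slots: List[Optional[str]], pending_features: Deque[str]
) -> List[Tuple[str, str]]:
    """Assign each time slot to CH1 (speech) or, in pauses, to CH2 (features)."""
    sent: List[Tuple[str, str]] = []
    for frame in speech_slots:
        if frame is not None:
            sent.append(("CH1", frame))  # standardized speech has priority
        elif pending_features:
            sent.append(("CH2", pending_features.popleft()))  # fill the pause
        # otherwise the slot stays idle
    return sent
```

Because feature data only occupy slots that speech leaves empty, the second logical channel needs no capacity beyond what the speech channel already has.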
- Transmitting the individualization data separately from the standardized speech data, in conjunction with providing a "learning" knowledge base for the individualization data at least in the receiver (preferably in transmitter and receiver), even achieves a strong reduction in the required transmission bandwidth while attaining a relatively high voice quality (which of course depends on the effort invested in processing the individualization features).
- The individualization data available in the receiver-side individual feature knowledge base 35 are fed, again synchronized with the normalized speech signal data leaving the decoder 19, to a speech synthesis unit ("voice generator") 37, where the standardized speech signals are linked with the individualization data; an acoustic output unit 39 is connected to its output for sound conversion.
- The mode of operation of the speech synthesis unit 37 and of the output unit 39 is known per se and is therefore not explained further here.
- A special feature of the speech synthesis unit 37 is the additional input for the individualization data and the implementation of an algorithm suitable for linking these individualization data with the standardized speech signals.
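The linking step at the additional input of the synthesis unit can be pictured as a lookup: each decoded character is paired with its stored individual features, with a neutral default when the knowledge base has no entry yet. The representation of features as a (pitch scale, duration scale) pair, the `NEUTRAL` default and the name `link_features` are hypothetical; the source leaves the linking algorithm open.

```python
from typing import Dict, List, Tuple

Prosody = Tuple[float, float]  # hypothetical (pitch scale, duration scale)
NEUTRAL: Prosody = (1.0, 1.0)  # used when no individual features are stored


def link_features(
    characters: List[str], knowledge_base: Dict[str, Prosody]
) -> List[Tuple[str, Prosody]]:
    """Pair each decoded character with its individual features for synthesis."""
    return [(char, knowledge_base.get(char, NEUTRAL)) for char in characters]
```

With a default in place, speech can be synthesized from the standardized signal alone and becomes more speaker-specific as the knowledge base fills.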
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP00941925A EP1181685A1 (de) | 1999-06-01 | 2000-05-24 | Method and arrangement for speech coding by means of phonetic decoding and transmission of speaker characteristics |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE19925264.5 | 1999-06-01 | ||
DE1999125264 DE19925264A1 (de) | 1999-06-01 | 1999-06-01 | Method and arrangement for transmitting data signals with individual features, in particular speech signals |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2000074035A1 (de) | 2000-12-07 |
Family
ID=7910007
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/DE2000/001662 WO2000074035A1 (de) | Method and arrangement for speech coding by means of phonetic decoding and transmission of speaker characteristics | 1999-06-01 | 2000-05-24 |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP1181685A1 (de) |
DE (1) | DE19925264A1 (de) |
WO (1) | WO2000074035A1 (de) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0706172A1 (de) * | 1994-10-04 | 1996-04-10 | Hughes Aircraft Company | Low bit rate speech coder and decoder |
FR2771544A1 (fr) * | 1997-11-21 | 1999-05-28 | Sagem | Speech coding method and terminals for implementing the method |
- 1999-06-01: DE application DE1999125264, patent DE19925264A1 (de), not active (Ceased)
- 2000-05-24: EP application EP00941925A, patent EP1181685A1 (de), not active (Withdrawn)
- 2000-05-24: WO application PCT/DE2000/001662, patent WO2000074035A1 (de), not active (Application Discontinuation)
Non-Patent Citations (1)
Title |
---|
FELICI M ET AL: "VERY LOW BIT RATE SPEECH CODING USING A DIPHONE-BASED RECOGNITION AND SYNTHESIS APPROACH", ELECTRONICS LETTERS,GB,IEE STEVENAGE, vol. 34, no. 9, 30 April 1998 (1998-04-30), pages 859 - 860, XP000799124, ISSN: 0013-5194 * |
Also Published As
Publication number | Publication date |
---|---|
DE19925264A1 (de) | 2000-12-14 |
EP1181685A1 (de) | 2002-02-27 |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A1; Designated state(s): CN US |
AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101) | |
WWE | WIPO information: entry into national phase | Ref document number: 2000941925; Country of ref document: EP |
WWP | WIPO information: published in national office | Ref document number: 2000941925; Country of ref document: EP |
WWE | WIPO information: entry into national phase | Ref document number: 09980400; Country of ref document: US |
WWW | WIPO information: withdrawn in national office | Ref document number: 2000941925; Country of ref document: EP |