WO2003049389A1 - Method and device for transferring sound and/or voice data in a packet-oriented communication system
- Publication number
- WO2003049389A1 (application PCT/EP2001/014359)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- sound
- speech
- packet
- data packets
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/64—Hybrid switching systems
- H04L12/6418—Hybrid transport
- H04L2012/6481—Speech, voice
- H04L2012/6494—Silence suppression
Definitions
- the invention relates to a method for transmitting sound and/or voice data in a packet-oriented communication system with the generic features of claim 1, and to an apparatus for performing such a method.
- GSM: Global System for Mobile Communications
- GPRS: General Packet Radio Service
- recorded speech is digitized. Regardless of the information in the data stream, the digitization is carried out continuously in equidistant steps, with an equivalent digital value being assigned to every instantaneous analog value at every sampling time of the speech signal.
- the digital values obtained in this way can additionally be compressed in a subsequent processing step. The information or values obtained in this way are then packed in the usual way into data packets that are always of the same size, as can also be seen from FIG. 3.
- the individual data are then transmitted to the receiver via the communication network with the aid of transmission devices.
- the information from the data packets is reconstructed both in terms of content and with regard to the temporal behavior during the subsequent playback.
- the reconstruction of the speech signal is very sensitive to fluctuations in the transmission duration, that is to say to transmission delays. Ultimately, this leads to a deteriorated or incomplete speech quality during playback.
- the object of the invention is to improve a method and a device for transmitting sound and/or voice data in a packet-oriented communication system.
- This object is achieved by a method for transmitting sound and/or speech data with the features of claim 1 or a transmission and/or
- An advantageous method for reproducing such speech or sound data is the subject of claim 8 with independent inventive significance.
- Word parts, e.g. single syllables, and if possible whole words are not to be split across data packets.
- the division of words into data packets should take place in such a way that the beginning of a word or part of a word coincides with the beginning of the data packet or its useful data section, while there may be free spaces towards the end of the data packet. Such free spaces can expediently be filled with blank data or other information data.
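The alignment rule above can be sketched as follows. This is a minimal illustration, not the method claimed in the application: it assumes a fixed payload size, one speech unit per packet, and a zero byte as blank filler. The names `pack_aligned`, `PAYLOAD_SIZE` and `FILLER` are hypothetical.

```python
PAYLOAD_SIZE = 8          # assumed payload size in bytes (illustrative only)
FILLER = b"\x00"          # assumed blank value used to fill free space


def pack_aligned(units, payload_size=PAYLOAD_SIZE, filler=FILLER):
    """Pack speech units so that each unit starts at the beginning of a payload.

    `units` is a list of byte strings, one per word or word part.
    A unit longer than one payload is continued in follow-up packets.
    """
    packets = []
    for unit in units:
        for start in range(0, len(unit), payload_size):
            chunk = unit[start:start + payload_size]
            # Free space at the end of the payload is filled with blank data.
            packets.append(chunk + filler * (payload_size - len(chunk)))
    return packets


packets = pack_aligned([b"hello", b"wonderful"])
```

Every payload has the same length here, and each word begins at offset 0 of its packet; the free space sits only at the end, as the text describes.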
- By using speech recognition, it is particularly easy to recognize speech structures, in particular words or syllables, and to distribute them across individual data packets accordingly.
- a kind of dictionary, as is known per se from speech recognition programs, can expediently also be stored in a memory or memory section, so that the analysis of the speech structure can be refined further with the aid of stored sample words.
- Fig. 1 shows an arrangement for recording, digitizing and sending data, and for receiving, reconstructing and reproducing data in a communication system
- Fig. 2 shows an analog speech diagram with an amplitude distribution over time and markings of the boundaries for packing individual speech parts into different data packets
- Fig. 3 shows such a diagram to illustrate the assignment of the voice information to individual data packets according to the prior art.
- an exemplary transmission device SE can consist of a large number of individual components, but these can also be partially omitted and/or incorporated in other devices.
- a microphone MIC, which is connected to an analog/digital converter A/D, is used to record speech or other sound sequences.
- the analog/digital converter A/D converts the analog voice signal into a digital signal. Digitization usually takes place without
- the digitized data values are input from the analog/digital converter A/D into a processor, in particular a microprocessor µPS.
- the processor µPS can also have a further input for entering existing digital data values.
- the processor µPS forwards the processed data values to a transmitting device, which in the preferred embodiment is designed as a transmitting/receiving device S/R.
- the transceiver prepares the received data values for transmission via an interface. As an interface for outputting the
- an antenna A is connected to the transmitting/receiving device S/R, wherein any other transmission paths, in particular line-bound interfaces, can also be used instead of the radio interface V shown.
- a receiving device RE has a large number of corresponding components. About one
- the signal carrying the data values, sent by the transmitting device SE via the interface V, is received and preprocessed by a receiving device, in the preferred exemplary embodiment shown a transmitting/receiving device S/R.
- the transmitting/receiving device S/R forwards the corresponding preprocessed signal or the corresponding preprocessed data values to a processor, in the exemplary embodiment shown a microprocessor µPR.
- the received data values are processed in the processor µPR and then output to a digital/analog converter D/A, which converts them into an analog signal.
- the analog signal output by the digital/analog converter D/A is then passed via an amplifier to a loudspeaker Sp, which reproduces the originally spoken speech for a listener. Additionally or alternatively, an interface for a digital output of the voice data can be provided at the receiving device RE.
- independent transmitter devices SE and independent receiver devices RE can be provided, but so can combined transmitter/receiver devices that have both the modules and functions of the transmitter device SE and the modules and functions of the receiver device RE.
- Digitized data values are input into the processor µPS in the transmission device SE and ultimately represent the course shown in FIG. 3 as a continuous signal.
- the corresponding amplitudes of the signal, or of the digital values formed from it after sampling, lie around the dynamic zero value "0*" over the time axis t.
- the digital data is currently packaged by packing a fixed number of data values into the user data block of a packet (packet 1, packet 2, ..., packet 5, ...). These data packets, transmitted via the interface V, are then unpacked in the receiving device by the processor µPR and reconstructed into a data sequence again.
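For comparison, the conventional fixed-count packaging just described can be sketched as below. The function names `pack_fixed` and `unpack` and the sample values are hypothetical. The point of the sketch is that packet boundaries depend only on the value count, never on the content.

```python
def pack_fixed(samples, values_per_packet):
    """Split a sample sequence into packets holding a fixed number of values."""
    return [samples[i:i + values_per_packet]
            for i in range(0, len(samples), values_per_packet)]


def unpack(packets):
    """Concatenate packet payloads back into one data sequence."""
    return [value for packet in packets for value in packet]


samples = [0, 3, 7, 2, 0, 0, 5, 9, 4, 1]
packets = pack_fixed(samples, 4)
restored = unpack(packets)
```

With four values per packet, the second packet starts in the middle of the quiet stretch and ends in the middle of the following sound, which is the kind of arbitrary cut the description criticizes.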
- the individual packets are reproduced on the receiver side in the receiving device RE, for example in accordance with a chronological sequence, in such a way that data values of a packet arriving too late are unpacked after a corresponding, artificially generated speech pause and reproduced via the loudspeaker Sp. If the subsequent data packet arrives punctually at the receiving device RE via a shorter data path or via an undelayed path, it is unpacked and the data values are reproduced directly via the loudspeaker Sp in accordance with the specification of the smallest possible time delay. The reproduction of data values of packet 1 that have not yet been sent is suppressed for this purpose.
- Such a procedure creates unnatural gaps in speech in the middle of a word or even in the middle of a phoneme, i.e. a sound or a natural sequence of sounds.
- parts of words, words or phonemes are left out, also in places where they interfere with speech or even understanding.
- a structure recognition step is placed upstream of the packaging of speech data or sound data, which also includes music data.
- the natural speech structure is analyzed, the criteria for the analysis being the search for speech pauses between words, the search for syllables or the search for phonemes.
- the sensible boundaries shown in FIG. 2 for separating speech, sound or corresponding data values that belong together because of their structure lie, for example, in areas in which the amplitudes d of the data values do not leave a predetermined differential dynamic range Δd over a certain period of time Δt.
- Such amplitude values over a corresponding time period Δt are, for example, a sign of a pause between two words.
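A minimal sketch of this pause criterion follows. It assumes that the dynamic range is centred on zero and that the time period is expressed as a minimum number of consecutive samples. `find_pauses`, `delta_d` and `min_len` are hypothetical names chosen for illustration.

```python
def find_pauses(samples, delta_d, min_len):
    """Return (start, end) index pairs of runs where |amplitude| <= delta_d
    for at least min_len consecutive samples (end is exclusive)."""
    pauses = []
    run_start = None
    for i, value in enumerate(samples):
        if abs(value) <= delta_d:
            if run_start is None:
                run_start = i            # a quiet run begins here
        else:
            if run_start is not None and i - run_start >= min_len:
                pauses.append((run_start, i))
            run_start = None
    # Close a quiet run that lasts until the end of the signal.
    if run_start is not None and len(samples) - run_start >= min_len:
        pauses.append((run_start, len(samples)))
    return pauses


signal = [5, -6, 4, 0, 1, 0, -1, 7, -8, 0, 6]
pauses = find_pauses(signal, delta_d=1, min_len=3)
```

The single quiet value near the end is ignored because it is shorter than the required duration, so only the longer stretch qualifies as a candidate packet boundary.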
- all those positions that are mathematically characterized in that the first derivative of the function describing the speech is zero, or does not exceed an optionally predeterminable interval around the zero line, over a longer, optionally predeterminable duration are particularly suitable as packet boundaries.
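The derivative criterion can be approximated on sampled data with first differences. The sketch below assumes that approximation. `flat_regions`, `epsilon` and `min_len` are hypothetical names, with `epsilon` standing for the interval around the zero line and `min_len` for the required duration in difference steps.

```python
def flat_regions(samples, epsilon, min_len):
    """Return (start, end) sample index pairs (end exclusive) where the
    first difference stays within +/- epsilon for at least min_len steps."""
    diffs = [b - a for a, b in zip(samples, samples[1:])]
    regions = []
    run_start = None
    for i, d in enumerate(diffs + [None]):   # sentinel closes a trailing run
        flat = d is not None and abs(d) <= epsilon
        if flat and run_start is None:
            run_start = i
        elif not flat and run_start is not None:
            if i - run_start >= min_len:
                # diffs[run_start:i] cover samples run_start .. i inclusive
                regions.append((run_start, i + 1))
            run_start = None
    return regions


signal = [0, 4, 9, 3, 3, 3, 3, 8, 2]
regions = flat_regions(signal, epsilon=0, min_len=3)
```

Unlike a threshold around zero, this test also finds a flat stretch that sits at a nonzero level, as in the example signal.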
- a first data packet, packet 1, is filled with only a small number of data values, while a longer speech or sound sequence, or its data values, is placed in the second data packet, packet 2.
- the second data packet is followed by a longer speech pause or gap, the data of which are preferably not packed into any packet at all in order to reduce the data and signaling load on the communication network.
- the third data packet, packet 3, again contains a longer sequence of data values before there is another speech pause.
- a compulsory limit can of course also be set, so that in such a case faults such as in the prior art are necessarily accepted.
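Structure-based packaging with a compulsory size limit might look like the sketch below. It assumes the pauses have already been located as index ranges. `pack_by_structure`, `pauses` and `max_values` are hypothetical names, and the sample values are made up.

```python
def pack_by_structure(samples, pauses, max_values):
    """Build variable-length packets from the speech between pauses.

    `pauses` is a sorted list of (start, end) index pairs (end exclusive);
    samples inside a pause are not packed at all. A segment longer than
    `max_values` is split at the compulsory limit.
    """
    packets = []
    position = 0
    # The sentinel pair flushes the speech after the last pause.
    for start, end in pauses + [(len(samples), len(samples))]:
        segment = samples[position:start]
        for i in range(0, len(segment), max_values):
            packets.append(segment[i:i + max_values])
        position = end
    return packets


samples = [5, -6, 4, 0, 0, 0, 7, -8, 6, 9, -3, 0, 0, 0, 2]
packets = pack_by_structure(samples, pauses=[(3, 6), (11, 14)], max_values=4)
```

The middle segment exceeds the limit and is therefore cut after four values, which is the case where a disturbance like that of the prior art has to be accepted.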
- any other criteria can of course also be used.
- the basic dynamic level may, for example, lie above this limit Δd, which is why it can be useful not only to analyze limit values around the zero range, but also to investigate generally whether the amplitude values of the speech or sound data remain within a certain dynamic range over a certain period of time.
- a large number of conventional phonemes are stored in a table or a memory M, which is expediently connected to the processor µPS.
- Spoken and digitized data values that arrive at the microprocessor µPS are then compared, as a data value sequence, with the corresponding data value sequences of the phonemes stored in the memory M.
- if a phoneme is recognized, its end is marked or registered as a possible boundary.
- during the actual packaging, the boundaries determined in this way can then be searched for, so that the data value sequences can be packed optimally into the data packets.
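The table comparison can be illustrated with a deliberately naive sketch that slides each stored sequence over the incoming values. This is an assumption made for illustration: practical recognizers compare derived features rather than raw values. `mark_phoneme_ends`, the table contents and `tolerance` are hypothetical.

```python
def mark_phoneme_ends(samples, phoneme_table, tolerance=0):
    """Return sorted indices just past each stored phoneme pattern found in samples.

    `phoneme_table` maps a label to a stored data value sequence. A match
    requires every value to differ by at most `tolerance` from the pattern.
    """
    ends = set()
    for pattern in phoneme_table.values():
        n = len(pattern)
        for start in range(len(samples) - n + 1):
            window = samples[start:start + n]
            if all(abs(a - b) <= tolerance for a, b in zip(window, pattern)):
                ends.add(start + n)      # end of the phoneme: a possible boundary
    return sorted(ends)


table = {"a": [3, 7, 3], "t": [9, 1]}
stream = [0, 3, 7, 3, 0, 9, 1, 0]
boundaries = mark_phoneme_ends(stream, table)
```

The returned indices are candidates only; the packaging step can then pick whichever of them best fits the packet size.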
- the number of data values to be packed per data packet is kept low.
- the usual packet sizes are 1500, 9800 or 64000 bytes.
- voice data at the usual sampling rates of e.g. 8 kHz and a typical phoneme duration of the order of a few tenths of a second only use data volumes of approx. 500 bytes per data packet.
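The size estimate depends on details the text does not state, namely the sample width and any compression. The sketch below works the arithmetic for two assumed cases: uncompressed 8-bit samples at 8 kHz, and a codec at 13 kbit/s (the rate of the GSM full-rate codec). Both rates, the 0.3 s duration and the name `payload_bytes` are assumptions.

```python
def payload_bytes(duration_s, bit_rate):
    """Bytes needed to carry `duration_s` seconds of audio at `bit_rate` bit/s."""
    return duration_s * bit_rate / 8


# Uncompressed: 8 kHz sampling with an assumed 8 bits per sample.
pcm_rate = 8000 * 8                      # 64000 bit/s
pcm_phoneme = payload_bytes(0.3, pcm_rate)

# Compressed: an assumed 13 kbit/s speech codec.
coded_phoneme = payload_bytes(0.3, 13000)
```

Under the compressed assumption a 0.3 s phoneme needs about 488 bytes, which is in line with the figure of approx. 500 bytes; under the uncompressed assumption it would need about 2400 bytes.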
- the data packets are unpacked immediately after receipt and the reproduction of the sound or speech structure is effected via the loudspeaker Sp.
- if a first speech packet that ends with a longer natural speech or sound pause, such as packet 2 from FIG. 2, arrives at the receiver and the subsequent packet, i.e. packet 3 from FIG. 2, then arrives late, the natural speech or sound pause at the end of packet 2 can be artificially extended without any problems.
- because the data in packet 2 is packed with a speech or sound pause at the end, the sound or speech sensation is disturbed only slightly or not at all during such a reproduction.
- analogously to the extension of sound gaps, the data processing in the processor µPR of the receiving device RE can be carried out in such a way that the last sound or tone is reproduced twice, three times, and so on, which appears as a prolonged stretching of the sound and likewise has only a negligible effect on the sensation of sound or speech.
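The two receiver-side options, extending a trailing pause or repeating the last sound, can be sketched as follows. The function `conceal_delay`, the `tail_len` window and the use of 0 as the silence value are assumptions for illustration only.

```python
def conceal_delay(last_packet, missing_values, tail_len=2, silence=0):
    """Return fill-in values to play while the next packet is late.

    If the last packet ended in a pause (its final `tail_len` values equal
    `silence`), the pause is simply extended. Otherwise the last `tail_len`
    values are repeated, which stretches the final sound.
    """
    tail = last_packet[-tail_len:]
    if all(value == silence for value in tail):
        return [silence] * missing_values
    repeats = -(-missing_values // len(tail))      # ceiling division
    return (tail * repeats)[:missing_values]


after_pause = conceal_delay([4, 7, 0, 0], missing_values=3)
after_sound = conceal_delay([4, 7, 5, 6], missing_values=5)
```

Because structure-based packets tend to end at a pause or at the end of a sound, either fill-in continues something the listener already perceives as finished, rather than interrupting a word.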
- the duration covered by the data values packed into a data packet is expediently chosen to be long enough that a sufficient number of phonemes, syllables and/or words can be inserted, depending on the requirement of the separation criterion, so that ideally there is always a number of unoccupied data values after the useful data values, which are overwritten by the first data values of the next data packet when it arrives during playback in the receiving device RE.
- the method can of course also be used for the preservation of speech documents, for example to store a historically important speech in packets in a memory in the meantime and enable later playback.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2002219159A AU2002219159A1 (en) | 2001-12-06 | 2001-12-06 | Method and device for transferring sound and/or voice data in a packet-oriented communication system |
PCT/EP2001/014359 WO2003049389A1 (en) | 2001-12-06 | 2001-12-06 | Method and device for transferring sound and/or voice data in a packet-oriented communication system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2001/014359 WO2003049389A1 (en) | 2001-12-06 | 2001-12-06 | Method and device for transferring sound and/or voice data in a packet-oriented communication system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2003049389A1 | 2003-06-12 |
Family
ID=8164717
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2001/014359 WO2003049389A1 (en) | Method and device for transferring sound and/or voice data in a packet-oriented communication system | 2001-12-06 | 2001-12-06 |
Country Status (2)
Country | Link |
---|---|
AU (1) | AU2002219159A1 (fr) |
WO (1) | WO2003049389A1 (fr) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107924610A (zh) * | 2015-06-24 | 2018-04-17 | Volkswagen AG | Method and device for increasing safety during remote triggering, motor vehicle |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0859353A2 (en) * | 1997-02-13 | 1998-08-19 | Siemens Business Communication Systems, Inc. | Method and device for speech processing using logical speech boundaries |
- 2001-12-06: AU2002219159A1, status: not active (Abandoned)
- 2001-12-06: PCT/EP2001/014359 (WO2003049389A1), status: not active (Application Discontinuation)
Non-Patent Citations (1)
Title |
---|
BOCCI P ET AL: "DYNAMIC DATA PACKET SIZING BASED ON REAL TIME MONITORING OF SYSTEM VOICE ACTIVITY", MOTOROLA TECHNICAL DEVELOPMENTS, MOTOROLA INC. SCHAUMBURG, ILLINOIS, US, vol. 31, 1 June 1997 (1997-06-01), pages 172, XP000741064, ISSN: 0887-5286 * |
Also Published As
Publication number | Publication date |
---|---|
AU2002219159A1 (en) | 2003-06-17 |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A1. Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZM ZW |
AL | Designated countries for regional patents | Kind code of ref document: A1. Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
REG | Reference to national code | Ref country code: DE. Ref legal event code: 8642 |
122 | Ep: pct application non-entry in european phase | |
NENP | Non-entry into the national phase | Ref country code: JP |
WWW | Wipo information: withdrawn in national office | Country of ref document: JP |