WO2003049389A1 - Method and device for transferring sound and/or voice data in a packet-oriented communication system - Google Patents

Method and device for transferring sound and/or voice data in a packet-oriented communication system

Info

Publication number
WO2003049389A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
sound
speech
packet
data packets
Prior art date
Application number
PCT/EP2001/014359
Other languages
German (de)
English (en)
Inventor
Klaus Huenlich
Original Assignee
Siemens Aktiengesellschaft
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Siemens Aktiengesellschaft
Priority to AU2002219159A (AU2002219159A1)
Priority to PCT/EP2001/014359 (WO2003049389A1)
Publication of WO2003049389A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00: Data switching networks
    • H04L12/64: Hybrid switching systems
    • H04L12/6418: Hybrid transport
    • H04L2012/6481: Speech, voice
    • H04L2012/6494: Silence suppression

Definitions

  • the invention relates to a method for transmitting sound and / or voice data in a packet-oriented communication system with the generic features of claim 1 and an apparatus for performing such a method.
  • GSM: Global System for Mobile Communications
  • GPRS: General Packet Radio Service
  • recorded speech is digitized. Regardless of the information in the data stream, the digitization is carried out continuously in equidistant steps, with an equivalent digital value being assigned to every instantaneous analog value at every sampling time of the speech signal.
  • the digital values obtained in this way can additionally be compressed in a subsequent processing step. The information or values obtained in this way are then packed in the usual way into data packets that are always of the same size, as can also be seen from FIG. 3.
  • the individual data are then transmitted to the receiver via the communication network with the aid of transmission devices.
  • the information from the data packets is reconstructed both in terms of content and with regard to the temporal behavior during the subsequent playback.
  • the reconstruction of the speech signal is very sensitive to fluctuations in the transmission duration, that is to say to transmission delays. Ultimately, this leads to a deteriorated or incomplete speech quality during playback.
  • the object of the invention is to improve a method and a device for transmitting sound and / or voice data in a packet-oriented communication system.
  • This object is achieved by a method for transmitting sound and / or speech data with the features of claim 1 or a transmission and / or
  • An advantageous method for reproducing such speech or sound data is the subject of claim 8 with independent inventive significance.
  • Word parts e.g. single syllables, and if possible not to separate whole words individually.
  • the division of words into data packets should take place in such a way that the beginning of a word or part of a word coincides with the beginning of the data packet or its useful data section, while there may be free spaces towards the end of the data packet. Such free spaces can expediently be filled with blank data or other information data.
  • By using such speech recognition, it becomes particularly easy to recognize speech structures, in particular words or syllables, and to distribute them over individual data packets accordingly.
  • a memory or memory section can expediently also hold a kind of dictionary, as is known per se from speech recognition programs, so that the analysis of the speech structure can be refined further with the aid of stored sample words.
  • Fig. 1 shows an arrangement for recording, digitizing and sending data, and for receiving, reconstructing and reproducing data in a communication system,
  • Fig. 2 shows an analog speech diagram with a temporal amplitude distribution and marked boundaries for packing individual speech parts into different data packets, and
  • Fig. 3 shows such a diagram to illustrate the assignment of the voice information to individual data packets according to the prior art.
  • an exemplary transmission device SE can consist of a large number of individual components, but these can also be partially omitted and / or incorporated in other devices.
  • a microphone MIC which is connected to an analog / digital converter A / D, is used to record speech or other sound sequences.
  • the analog / digital converter A / D converts the analog voice signal into a digital signal. Digitization usually takes place without
  • the digitized data values are input from the analog / digital converter A / D into a processor, in particular a microprocessor μPS.
  • the processor μPS can also have a further input for entering existing digital data values.
  • the processor μPS forwards the processed data values to a transmitting device, which in the preferred embodiment is designed as a transmitting / receiving device S / R.
  • the transceiver prepares the received data values for transmission via an interface. As an interface for outputting the
  • an antenna A is connected to the transmitting / receiving device S / R, wherein any other transmission paths, in particular line-bound interfaces, can also be used instead of a radio interface V shown.
  • a receiving device RE has a large number of corresponding components. About one
  • the signal carrying the data values, sent by the transmitting device SE via the interface V, is received and preprocessed by a receiving device, in the preferred exemplary embodiment shown a transmitting / receiving device S / R.
  • the transmitting / receiving device S / R forwards the corresponding preprocessed signal or the corresponding preprocessed data values to a processor, in the exemplary embodiment shown a microprocessor μPR.
  • the received data values are processed in the processor μPR and then output to a digital / analog converter D / A, which converts them into an analog signal.
  • the analog signal output by the digital / analog converter D / A is then passed via an amplifier to a loudspeaker Sp, which reproduces the originally spoken speech for a listener. Additionally or alternatively, an interface for a digital output of the voice data can be provided at the receiving device RE.
  • independent transmitter devices SE and independent receiver devices RE can be provided, but so can combined transmitter / receiver devices that have both the modules and functions of the transmitter devices SE and the modules and functions of the receiver devices RE.
  • Digitized data values are input into the processor μPS in the transmission device SE and ultimately represent the course shown in FIG. 3 as a continuous signal.
  • the corresponding amplitudes of the signal, or of the digital values formed from it after sampling, lie around the dynamic zero value “0*” over the time axis t.
  • the digital data are currently packaged by packing a fixed number of data values into the user data block of a packet (packet 1, packet 2, ..., packet 5, ...). These data packets, transmitted via the interface V, are then unpacked in the receiving device by the processor μPR and reconstructed into a data sequence again.
  • the individual packets are reproduced on the receiver side in the receiving device RE, for example in accordance with a chronological sequence, in such a way that data values of a packet arriving too late are unpacked after a corresponding, artificially generated speech pause and reproduced via the loudspeaker Sp. If the subsequent data packet arrives punctually at the receiving device RE via a shorter data path or via an undelayed path, it is unpacked and the data values are reproduced directly via the loudspeaker Sp in accordance with the specification of the smallest possible time delay. The reproduction of data values of packet 1 that have not yet been sent is suppressed for this purpose.
  • Such a procedure creates unnatural speech gaps in the middle of a word or even in the middle of a phoneme, i.e. a sound or a natural sequence of sounds.
  • parts of words, words or phonemes are dropped, including at places where the omission impairs the flow of speech or even comprehension.
  • structure recognition is therefore performed upstream of the packaging of speech data or sound data, which also includes music data.
  • the natural speech structure is analyzed, the criteria for the analysis being the search for speech pauses between words, the search for syllables or the search for phonemes.
  • the sensible limits shown in FIG. 2 for separating speech, sound or corresponding data values that belong together because of their structure are located, for example, in areas in which the amplitudes d of the data values do not leave a predetermined differential dynamic range Δd over a certain period of time Δt.
  • Such amplitude values over a corresponding time period Δt are, for example, a sign of a pause between two words.
  • particularly suitable as packet boundaries are all those positions that are characterized mathematically by the first derivative of the function describing the speech remaining at zero, or not leaving an optionally predeterminable interval around the zero line, for a longer, optionally predeterminable duration.
  • a first data packet, packet 1, is filled with only a small number of data values, while a longer speech or sound sequence, or its data values, is placed in the second data packet, packet 2.
  • the second data packet is followed by a longer speech pause or speech gap, the data of which are preferably not packaged in any packet at all in order to reduce the data and signaling load on the communication network.
  • the third data packet, packet 3, again contains a longer sequence of data values before there is another speech pause.
  • a compulsory limit can of course also be set, so that in such a case faults such as in the prior art are necessarily accepted.
  • any other criteria can of course also be used.
  • the basic dynamic level may, for example, lie above this limit Δd, which is why it can be useful not only to analyze limit values around the zero range but also to investigate more generally whether the amplitude values of the speech or sound data stay within a certain dynamic range over a certain period of time.
  • a large number of conventional phonemes are stored in a table or a memory M, which is expediently connected to the processor μPS.
  • Spoken and digitized data values that arrive at the microprocessor μPS are then compared, as a data value sequence, with the corresponding data value sequences of the phonemes stored in the memory M.
  • If a phoneme is recognized, its end is marked or registered as a possible limit.
  • the actual packaging can then search for the limits determined in this way so that the data value sequences can be packed optimally into the data packets.
  • the number of data values to be packed per data packet is kept low.
  • the usual packet sizes are 1500, 9800 or 64000 bytes.
  • voice data at the usual sampling rates of e.g. 8 kHz and a typical phoneme duration of the order of a few tenths of a second only use data volumes of approx. 500 bytes per data packet.
  • the data packets are unpacked immediately after receipt and the reproduction of the sound or speech structure is effected via the loudspeaker Sp.
  • If a first speech packet with a longer natural speech or sound pause arrives at the receiver, such as packet 2 from FIG. 2, and the subsequent packet, i.e. packet 3 from FIG. 2, then arrives late, the natural speech or sound pause at the end of packet 2 can be artificially extended without any problems.
  • in such a reproduction, the sound or speech sensation is disturbed only slightly or not at all, because the data in packet 2 were packaged with a speech or sound pause at the end.
  • analogously to the extension of sound gaps, the data processing in the processor μPR of the receiving device RE can be carried out in such a way that the last sound or tone is reproduced two, three, ... times, which is perceived as a prolonged sound and likewise has only a negligible effect on the sensation of sound or speech.
  • the duration of the data values to be packed into the data packets is expediently chosen to be long enough that a sufficient number of phonemes, syllables and / or words can be inserted, depending on the requirement of the separation criterion, so that ideally a number of unoccupied data values always remains after the useful data values, which are overwritten by the first data values of the next data packet when it arrives during playback in the receiving device RE.
  • the method can of course also be used for the preservation of speech documents, for example to store a historically important speech in packets in a memory in the meantime so that it can be played back later.
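The receiver-side behaviour described above, where a late packet simply lengthens the natural pause at the end of the previous packet, can be sketched as follows. This is only an illustration under stated assumptions, not the implementation from the application: the function name `play_out`, the convention of measuring arrival times in sample periods, and the choice to fill the gap with the zero level are all hypothetical. The alternative of repeating the last sound is not modelled.

```python
from typing import List, Sequence, Tuple


def play_out(packets: Sequence[Tuple[int, List[int]]], fill_value: int = 0) -> List[int]:
    """Rebuild the output sample stream from packets given in sending order.

    Each packet is (arrival, payload), where `arrival` is the earliest output
    position (in sample periods, an assumption of this sketch) at which the
    payload is available. If a packet is late, the gap before it is filled
    with `fill_value`, which lengthens the pause that ends the previous packet.
    """
    out: List[int] = []
    for arrival, payload in packets:
        if arrival > len(out):
            # Packet is late: extend the pause at the end of the previous one.
            out.extend([fill_value] * (arrival - len(out)))
        # Packet is on time (or the gap is filled): play its data values.
        out.extend(payload)
    return out


if __name__ == "__main__":
    stream = play_out([(0, [5, 7, -6]), (3, [9, -8]), (8, [4, 3])])
    print(stream)
```

Because every packet ends at a pause, the inserted fill values fall between speech units rather than inside a word, which is the effect the description aims for.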

Landscapes

  • Engineering & Computer Science
  • Computer Networks & Wireless Communication
  • Signal Processing
  • Data Exchanges In Wide-Area Networks

Abstract

The invention relates to a method for distributing sound and/or voice data into data packets for transmission via a packet-switched communication network. The aim of the invention is to disturb the sound and/or speech sensation as little as possible when the reconstructed voice and/or sound data are reproduced. To this end, the sound and/or speech structure is analyzed before data values are packed into the data packets, and a decision is made as to which data value sequences are to be inserted into each data packet. It is particularly suitable to fill the data packets with data values in such a way that, if possible, each data packet ends with a speech and/or sound pause, a phoneme boundary, a word-part boundary or a word boundary.
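The sender-side idea summarized in the abstract, analyzing the signal before packing and letting each packet end at a pause, can be sketched as follows. This is a simplified illustration under assumptions, not the method as claimed: a pause is taken to be a run of samples whose absolute amplitude stays within `max_dev` for at least `min_len` samples, pause samples are not packed at all, and `max_payload` acts as a forced upper limit on packet size. All names and parameters are hypothetical.

```python
from typing import List, Tuple


def find_pauses(samples: List[int], max_dev: int, min_len: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of runs with |sample| <= max_dev
    that last at least min_len samples (end is exclusive)."""
    pauses = []
    run_start = None
    for i, s in enumerate(samples):
        if abs(s) <= max_dev:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                pauses.append((run_start, i))
            run_start = None
    if run_start is not None and len(samples) - run_start >= min_len:
        pauses.append((run_start, len(samples)))
    return pauses


def packetize(samples: List[int], max_dev: int, min_len: int, max_payload: int) -> List[List[int]]:
    """Split samples into variable-length payloads that end where a pause begins.
    Pause samples are dropped; segments longer than max_payload are cut by force."""
    packets = []
    pos = 0
    for start, end in find_pauses(samples, max_dev, min_len):
        segment = samples[pos:start]
        for k in range(0, len(segment), max_payload):
            packets.append(segment[k:k + max_payload])
        pos = end  # skip the pause itself
    tail = samples[pos:]
    for k in range(0, len(tail), max_payload):
        packets.append(tail[k:k + max_payload])
    return packets


if __name__ == "__main__":
    signal = [5, 7, -6, 0, 0, 0, 0, 9, -8, 0, 4, 3, 0, 0, 0]
    print(packetize(signal, max_dev=1, min_len=3, max_payload=100))
```

The single quiet sample inside the second segment is too short to count as a pause, so it stays inside the packet. A real system would more likely work on a short-term energy or envelope measure than on raw sample amplitudes.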
PCT/EP2001/014359 2001-12-06 2001-12-06 Method and device for transferring sound and/or voice data in a packet-oriented communication system WO2003049389A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU2002219159A AU2002219159A1 (en) 2001-12-06 2001-12-06 Method and device for transferring sound and/or voice data in a packet-oriented communication system
PCT/EP2001/014359 WO2003049389A1 (fr) 2001-12-06 2001-12-06 Method and device for transferring sound and/or voice data in a packet-oriented communication system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2001/014359 WO2003049389A1 (fr) 2001-12-06 2001-12-06 Method and device for transferring sound and/or voice data in a packet-oriented communication system

Publications (1)

Publication Number Publication Date
WO2003049389A1 (fr) 2003-06-12

Family

ID=8164717

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2001/014359 WO2003049389A1 (fr) 2001-12-06 2001-12-06 Method and device for transferring sound and/or voice data in a packet-oriented communication system

Country Status (2)

Country Link
AU (1) AU2002219159A1 (fr)
WO (1) WO2003049389A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107924610A (zh) * 2015-06-24 2018-04-17 Volkswagen AG Method and device for increasing safety during remote triggering, motor vehicle

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0859353A2 (fr) * 1997-02-13 1998-08-19 Siemens Business Communication Systems, Inc. Method and device for speech processing using logical speech boundaries

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0859353A2 (fr) * 1997-02-13 1998-08-19 Siemens Business Communication Systems, Inc. Method and device for speech processing using logical speech boundaries

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BOCCI P ET AL: "DYNAMIC DATA PACKET SIZING BASED ON REAL TIME MONITORING OF SYSTEM VOICE ACTIVITY", MOTOROLA TECHNICAL DEVELOPMENTS, MOTOROLA INC. SCHAUMBURG, ILLINOIS, US, vol. 31, 1 June 1997 (1997-06-01), pages 172, XP000741064, ISSN: 0887-5286 *

Also Published As

Publication number Publication date
AU2002219159A1 (en) 2003-06-17

Similar Documents

Publication Publication Date Title
DE60223131T2 (de) Method and device for encoding and decoding pause information
DE69634645T2 (de) Method and device for speech coding
DE69735097T2 (de) Method and device for improving speech quality in tandem speech coders
DE60034484T2 (de) Method and device in a communication system
DE69839312T2 (de) Coding method for vibration waves
DE69910240T2 (de) Device and method for restoring the high-frequency component of an oversampled synthesized wideband signal
DE60126513T2 (de) Method for changing the size of a jitter buffer for time alignment, communication system, receiver side and transcoder
DE60029147T2 (de) Quality improvement of an audio signal in a digital network
DE69310990T2 (de) Method for inserting digital data into an audio signal before channel coding
DE69923346T2 (de) Device and method for IP communication with speech-generated text
DE69910837T2 (de) Elimination of tone detection
DE69613611T2 (de) System for storing and accessing speech information
EP2245621B1 (fr) Method and means for encoding background noise information
DE69730721T2 (de) Methods and devices for noise conditioning of signals representing audio information in compressed and digitized form
EP1051701B1 (fr) Method for transmitting voice data
DE60220307T2 (de) Method for transmitting wideband audio signals over a transmission channel with reduced bandwidth
DE60118922T2 (de) Measurement of perceived speech quality during operation by measuring objective error parameters
DE69815562T2 (de) Method and device for signal processing using logical speech boundaries
WO2002058054A1 (fr) Method and device for generating a scalable data stream and method and device for decoding a scalable data stream
DE69828849T2 (de) Signal processing apparatus and method, and information recording apparatus
WO2003049389A1 (fr) Method and device for transferring sound and/or voice data in a packet-oriented communication system
EP0658874A1 (fr) Method and circuit arrangement for increasing the bandwidth of narrowband speech signals
EP1062487B1 (fr) Microphone device for speech recognition under variable spatial conditions
DE2303497C2 (de) Method for transmitting speech signals
DE102013005844B3 (de) Method and device for measuring the quality of a speech signal

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP