WO2004053837A1 - User terminal and method for distributed speech recognition - Google Patents

User terminal and method for distributed speech recognition

Info

Publication number
WO2004053837A1
Authority
WO
WIPO (PCT)
Prior art keywords
speech
user terminal
application
voice activity
transmission
Prior art date
Application number
PCT/EP2003/050686
Other languages
English (en)
Inventor
David Pearce
Holly Kelleher
Original Assignee
Motorola Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Motorola Inc filed Critical Motorola Inc
Priority to AU2003282110A1
Publication of WO2004053837A1

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16Vocoder architecture
    • G10L19/18Vocoders using multiple modes
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/28Constructional details of speech recognition systems
    • G10L15/30Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78Detection of presence or absence of voice signals
    • G10L2025/783Detection of presence or absence of voice signals based on threshold decision

Definitions

  • the present invention relates to the field of speech transmission.
  • In speech transmission, delay is an obvious aspect of quality to the user. The degree of impact varies from application to application, but it is desirable to minimise the total transmission delay between a client device and a server application (either a voice-enabled service or a router to another user).
  • a client device is a portable radio communications device, such as a mobile or portable radio, or a mobile phone. This device may be wirelessly linked to a network, the server being part of the network.
  • An example of transmission to a voice enabled service is provided by distributed speech recognition.
  • the front-end processing (feature extraction) is performed by the client application in the user terminal.
  • the back-end processing (speech recognition) is performed at a server somewhere in a network.
  • the front-end features are transmitted over the network from the client to the server.
  • the network may either be terrestrial, such as the Internet, or wireless, such as GPRS or 3G.
  • In terrestrial networks the bandwidth is comparatively high and error rates are comparatively low, and consequently good recognition performance is obtained.
  • For wireless networks the bandwidth tends to be lower and the transmission error rates higher, resulting in poorer recognition performance.
  • the user experience of DSR is strongly influenced by two important factors. The first is the recognition performance, which is dependent on the quality (integrity) of the data. The second is the latency in recognition due to transmission delays. In existing implementations there is a trade-off between the recognition performance and latency, especially for poor quality transmission channels.
  • Mitigating techniques, such as allowing packet retransmissions over the network, can reduce the performance degradation caused by transmission errors. However, each packet retransmission increases the delay.
  • the upper part of figure 1 shows an uneven trace, which represents the speech energy received by a user terminal plotted against time.
  • This prior art terminal transmits to a server at a constant high quality level H1.
  • the transmission continues at level H1, even after the received speech energy has fallen to zero.
  • the system shown in the upper part of figure 1 would continue to use the high quality transmission level until, for example, the user of the terminal 'hung up' the call, thereby terminating the call.
  • the lower part of figure 1 shows the same trace of speech energy received at the user terminal as in the upper part of figure 1.
  • transmission by the user terminal to the server only continues at high quality level H2 for a finite time.
  • the transmission ceases a certain time after the cessation of speech.
  • This transmission scheme is referred to as discontinuous transmission.
  • the time between the cessation of speech and the cessation of transmission at level H2 is referred to as the 'hangover time'.
  • a user terminal as claimed in claim 1.
  • a method for transmission as claimed in claim 15. Further aspects of the present invention are defined in the dependent claims.
  • FIG. 1 illustrates the signalling schemes of two known prior art arrangements;
  • FIG. 2 illustrates the general signalling scheme in accordance with the invention;
  • FIG. 3 is a more detailed illustration of various signals that may be generated by a device in accordance with the invention;
  • FIG. 4 illustrates a determination that may be made by an enhanced version of the invention;
  • FIG. 5 is a flowchart illustrating a method in accordance with the invention;
  • FIG. 6 illustrates a mobile radio communications device, which is one example of the user terminal 2 of the invention; and
  • FIG. 7 illustrates a communications system in accordance with the invention.
  • the present invention alleviates the trade-off between quality and latency. This is done by regularly updating the configuration of the communication process, either at the application level or at the network level, using information about voice activity within a user's utterance.
  • Speech data is sent over wireless networks using the Real-time Transport Protocol (RTP).
  • a sequence of RTP payloads is used to transport speech data to the recognition application.
  • the speech data represents the user's utterance at the client terminal.
  • a signalling scheme in accordance with the invention is illustrated in figure 2.
  • the apparatus of the invention comprises a user terminal 2, which will be described in more detail in relation to figure 6.
  • the user terminal 2 is for use in a speech recognition system, and includes a client application, wherein, in use, the client application is connected to a server application 54 over a network 52. This arrangement is illustrated in figure 7.
  • the server application 54 performs speech recognition processing.
  • Communication between the client application of the user terminal 2 and the server application 54 depends on communication settings. These communication settings are dynamic, and their state at any particular time depends on the output of a voice activity detector.
  • the voice activity detector is part of the user terminal 2.
  • the voice activity detector provides an indication of the state of an utterance on a frame-by-frame basis. Voice activity detectors are themselves known, and therefore will not be described in further detail here.
  • the voice activity detector generates information that indicates which of a plurality of states is represented by user utterance data.
  • the user terminal 2 is adapted to choose the communication settings, at any or all stages of the communication link between the client application and the server application, in dependence on the indicated state of the utterance data.
  • the available communication settings comprise at least: (i) high quality transmission H3; and (ii) low quality transmission L1.
  • Figure 2 illustrates the high quality transmission, shown as H3.
  • Figure 2 also illustrates the low quality transmission, shown as L1.
  • Transmission at quality L1 commences at the end of the utterance, a transition indicated by the output of the voice activity detector at this point.
  • the low quality transmission L1 in figure 2 is a period in which transmission of data packets from the user terminal 2 to the server application 54 can still occur. This period L1 allows relatively rapid transmission, since the transmission is at low quality. The transmission at L1 ensures that the system has caught up with all necessary packet transmissions by the time that an utterance is at an end. It is also advantageous over the scheme shown in the lower trace of figure 1, because the scheme of figure 2 does not completely stop transmission; if it did, a substantial time would be needed to re-commence transmission.
  • Figure 3 illustrates the energy, SD and AFE payload values that may be observed and generated in a user terminal in accordance with the invention.
  • the upper 'energy' trace of figure 3 shows that possible speech energy is identifiable between points a and b, d and e, and h and i of the input signal.
  • The 'SD' trace relates to speech detection.
  • In the detection of speech at the client device, the user terminal 2 classifies frames as belonging to speech or non-speech.
  • Non-speech frames may comprise noise or quiet. This classification is the output of a signal processing algorithm, rather than the actual speech endpoints. In particular, these positions may differ in high background noise.
  • the speech detection algorithm includes any handling of intra-word gaps, and intra-word silence is marked as speech. Examples of intra-word gaps include stop gaps before plosives or unvoiced phonemes that may have a low energy in the input signal. This low energy may be due to reduced bandwidth, or to being hidden in background noise.
  • An utterance segment is a group of one or more spoken words, grouped together based on their temporal proximity. This is defined by the constraint that the start of a word in an utterance is not separated from the end of the previous word by more than a specified duration.
  • a speech segment is a group of one or more spoken words resulting from speech detection, plus additional frames at the start and end.
  • a speech segment contains all the frames that are needed by the recogniser to achieve good recognition performance. Typically extra frames are needed before and after speech detection to compensate for SD overshoot or undershoot in background noise. These extra frames correspond to c-d, e-f, g-h and i-j in the lower trace of figure 3, and T1 and T2 in figure 4.
  • the resulting payload in the lower 'AFE payload' trace begins at a point c, before the start of speech point d.
  • the point c is where the voice activity detector first indicates speech. This portion of speech continues to point e, but the voice activity detector will continue to indicate speech until point f.
  • the time periods c to d and e to f are dealt with more thoroughly in connection with figure 4, below.
  • the 'zig-zag' line from e to h indicates a time for which the present invention may judge that one utterance is continuing, even though the voice activity detector ceases indicating speech at point f.
  • the present invention may judge the entire time period from c to the end of the zig-zag after point j as being one utterance.
  • the invention can use: (i) high quality transmission for the periods c-f and g-j; and (ii) low quality transmission for the intervening period f-g.
  • FIG. 4 illustrates voice activity states.
  • the upper trace of figure 4 shows that voice activity detection information may indicate one of the following states for the current frame of utterance data: i) speech T1, S, T2; or ii) an intra-speech gap G.
  • the user terminal 2 will be adapted to choose: (i) high quality transmission from the client application to the server application, when the voice activity detection indicates speech T1, S, T2; or (ii) low quality transmission from the client application to the server application, when the voice activity detection indicates an intra-speech gap G.
  • the voice activity detector is adapted to indicate the presence of speech whilst speech S is received from the user of terminal 2, within the first threshold period T1 before speech commences, and until the second threshold period T2 has elapsed since speech was last received.
  • the first threshold period T1 is commensurate with typical speech attack times, preferably about 50 ms.
  • the second threshold period T2 is commensurate with typical speech decay times, preferably about 150 ms.
  • the periods T1 and T2 can be viewed as delays within the voice activity detector circuitry.
  • the voice activity detector can indicate the presence of an intra-speech gap G. This occurs once the second threshold period T2 has elapsed since speech was last received, and lasts until the start of the first threshold period T1 before speech next commences.
  • the voice activity detector is adapted to continue to indicate an intra-speech gap G whilst either silence or noise is received.
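  • As a rough illustration of the T1/T2 behaviour just described, the following Python sketch labels a sequence of raw per-frame detector decisions with the states of figure 4. The frame duration, function name and default values are assumptions made for illustration; the patent does not prescribe an implementation.

```python
# Illustrative sketch of the T1 (attack) / T2 (hangover) extension of
# raw voice activity decisions. Frame length and defaults are
# assumptions: with 10 ms frames, t1_frames=5 and t2_frames=15 give
# the preferred attack and decay times of about 50 ms and 150 ms.

SPEECH, GAP = "speech", "intra-speech gap"

def label_frames(raw_vad, t1_frames=5, t2_frames=15):
    """Map raw per-frame decisions (True = speech energy detected) to
    the states of figure 4: frames within T1 before detected speech
    and within T2 after it are also labelled speech; all remaining
    frames form intra-speech gaps G."""
    n = len(raw_vad)
    labels = [GAP] * n
    for i, active in enumerate(raw_vad):
        if active:
            start = max(0, i - t1_frames)    # extend back by T1
            end = min(n, i + t2_frames + 1)  # extend forward by T2
            for j in range(start, end):
                labels[j] = SPEECH
    return labels
```

  • Note that the backward extension by T1 implies a small buffering delay at the terminal, consistent with the preceding speech buffer discussed later in the description.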
  • the arrangement of the present invention may employ discontinuous transmission.
  • Such a transmission scheme would mean that the user terminal would cease even low quality transmission L1 under certain conditions. These conditions can be set by means of a third time threshold T3. If the voice activity detector indicates that an intra-speech gap G exceeds the threshold period T3, then the user terminal 2 may be arranged to cease transmission.
  • the lower trace of figure 4 shows an example of how the threshold T3 can operate.
  • In the gap period G1, threshold T3 is not exceeded. Gap G1 might be the pause in an utterance where the speaker is drawing breath. In gap period G2, threshold T3 is exceeded. Gap G2 might be a gap of several seconds, during which a speaker is looking for a new page of notes from which to read.
  • the voice activity detection information further indicates the end of the complete utterance for the current frame of utterance data.
  • the user terminal is adapted to discontinue transmission from the client to the application server at this point.
  • the period T3 may typically be in the range of 1-3 seconds, preferably being about 1.5 seconds.
  • the user terminal 2 may be adapted to alter communication settings that control any or all of the following, in dependence on the voice activity detection information: i) Application level protocol; ii) Transmission quality of service; and iii) Error mitigation scheme.
  • This control of the transmission quality of service may take the form of requesting or allowing a greater number of permitted retransmissions when a speech packet (T1, S, T2) is indicated than when an intra-speech gap packet (G) is indicated.
  • control of the transmission quality of service may also comprise the assignment of different coding schemes, using a more robust coding scheme when a speech packet (T1, S, T2) is indicated than when an intra-speech gap packet (G) is indicated.
  • control of the application level protocol may be achieved by preferring TCP when a speech packet (T1, S, T2) is indicated, and preferring UDP when an intra-speech gap packet (G) is indicated.
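  • Taken together, these controls amount to a mapping from the indicated state to a set of communication settings. The Python sketch below combines them; the particular protocol names, retransmission count and GPRS coding schemes are examples consistent with the description, not values mandated by the patent.

```python
# Illustrative state-to-settings mapping combining the three controls:
# application level protocol, quality of service (retransmissions) and
# coding scheme. Values are examples, not mandated by the patent.

def choose_settings(state):
    if state == "speech":  # T1, S, T2: favour recognition performance
        return {
            "protocol": "TCP",         # guaranteed delivery, more latency
            "max_retransmissions": 3,  # more repair attempts permitted
            "coding_scheme": "CS1",    # robust GPRS coding
        }
    # intra-speech gap G: favour low latency
    return {
        "protocol": "UDP",             # may lose packets, less latency
        "max_retransmissions": 0,
        "coding_scheme": "CS4",        # high-throughput GPRS coding
    }
```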
  • FIG. 5 shows a flowchart, which illustrates a method in accordance with the invention.
  • the method for transmission between the user terminal 2 and the server application 54 involves the user terminal 2, with its client application.
  • the client application is connected to the server application over the network 52, the server application performing speech recognition processing, and communication between the client application and the server application depending on communication settings.
  • the flowchart of figure 5 shows a method of deriving those settings.
  • the voice activity detector of the user terminal generates information that indicates which of a plurality of states is represented by the user utterance data.
  • the user terminal 2 chooses communication settings, at any or all stages of the communication link between the client application and the server application, in dependence on the indicated state of the utterance data, the available communication settings comprising at least: (i) high quality transmission; and (ii) low quality transmission.
  • signal 510 is provided to voice activity detector 512. If decision box 514 indicates that speech is present, then a clock is reset to zero, box 518. Decision box 514 indicates that speech is present during the periods T1, S and T2; this is the time for which the voice activity detector indicates that speech is present.
  • If in box 520 the clock value is found to be zero, the voice activity detector indicates that speech is present, see box 522, and the flowchart returns to box 514. If in box 520 the clock value is found to be greater than zero, then a check is made in box 524 as to whether the clock value exceeds a threshold E. If yes, then the method determines that the utterance has ended, see box 528. If the result of box 524 is no, then an indication can be made that there is an intra-speech gap, see box 526.
  • the indication of an intra-speech gap clearly corresponds to gap 'G' shown in Fig. 4.
  • the clock threshold E can be set to determine whether a gap G is treated as just part of one utterance, or as the break between different utterances. The value E thus determines the threshold T3.
  • E could, for example, correspond to a time greater than gap G1 in figure 4, but less than the time corresponding to gap G2.
  • the flowchart of figure 5 would classify gap G1 as simply part of one continuous utterance, see box 526 of figure 5.
  • Gap G2, however, would mark the end of an utterance, box 528 of figure 5.
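  • The figure-5 logic can be stated compactly as a per-frame update of the clock value. The sketch below is one plausible reading of the flowchart, collapsing boxes 520 to 528 into a single function; the function name and the frame-based clock units are assumptions.

```python
# One plausible per-frame reading of the figure-5 flowchart. The clock
# counts frames since speech was last detected; e_threshold is the
# frame-count equivalent of E (e.g. 150 frames of 10 ms for T3 = 1.5 s).

def classify_frame(speech_present, clock, e_threshold=150):
    """Return (indication, new_clock) following boxes 514-528."""
    if speech_present:                 # box 514: speech detected
        return "speech", 0             # box 518: reset clock; box 522
    clock += 1                         # no speech: clock advances
    if clock > e_threshold:            # box 524: clock exceeds E?
        return "end of utterance", clock    # box 528
    return "intra-speech gap", clock        # box 526
```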
  • voice activity detection information from figure 5 may indicate, for the current frame of utterance data, either: (i) speech T1, S, T2; or (ii) an intra-speech gap G.
  • the user terminal 2 then may choose communication settings that provide the following: (i) high quality transmission when speech is indicated; or (ii) low quality transmission when an intra-speech gap is indicated.
  • transmission may be discontinued if the indicated intra-speech gap G exceeds a threshold period T3.
  • the voice activity detection information further indicates the end of the complete utterance for the current frame of utterance data.
  • the user terminal can then discontinue transmission from the client to the application server.
  • the period T3 may be in the range of 1-3 seconds, preferably being about 1.5 seconds.
  • Figure 6 illustrates a mobile radio communications device, which is one example of the user terminal 2 of the invention.
  • the user terminal may, for example, be either a portable or a mobile radio.
  • the radio 2 of figure 6 can transmit speech from a user of the radio.
  • the radio comprises a microphone 34, which provides a signal for transmission by the radio.
  • the signal from the microphone is transmitted by transmission circuit 22.
  • Transmission circuit 22 transmits via switch 24 and antenna 26.
  • the transmitter 2 also has a controller 30 and a read only memory (ROM) 32.
  • Controller 30 may be a microprocessor.
  • ROM 32 is a permanent memory, and may be a non-volatile Electrically Erasable Programmable Read Only Memory (EEPROM) .
  • the radio 2 also comprises a display 42 and keypad 44, which serve as part of the user interface circuitry of the radio. At least the keypad 44 portion of the user interface circuitry is activatable by the user. Voice activation of the radio, or other means of interaction with a user, may also be employed.
  • Signals received by the radio are routed by the switch 24 to receiving circuitry 28. From there, the received signals are routed to controller 30 and audio processing circuit 38.
  • a loudspeaker 40 is connected to audio circuit 38.
  • Loudspeaker 40 forms a further part of the user interface.
  • Controller 30 performs the function of the voice activity detector of the present invention.
  • a data terminal 36 may be provided. Terminal 36 would provide a signal comprising data for transmission by transmitter circuit 22, switch 24 and antenna 26.
  • Figure 7 illustrates the relationship between the user terminal 2 of the present invention, and the network 52 and server application 54.
  • the server application is either a Distributed Speech Recognition (DSR) application or an Automatic Speech Recognition (ASR) application.
  • the user terminal 2 may be adapted to communicate with the server via a packet-switched radio transmission network, the indicated state of an entire packet being determined by the indicated states of the data frames within the packet.
  • User terminal 2 may take the form of a portable or mobile radio, a wirelessly linked laptop PC, a Personal Digital Assistant or personal organiser, or a mobile telephone.
  • Network 52 and one or more user terminals 2 together constitute a communication system.
  • the total length of the data segment to be transmitted consists of a whole utterance. Each whole utterance is made up of both speech and the gaps within the speech, provided that those gaps do not exceed period T3.
  • the length of the gap determines the segmentation: while the gaps remain within threshold T3, speech instances are categorised as part of the same utterance; if a gap is longer than this threshold, subsequent speech is categorised as part of a new utterance.
  • a complete utterance consists of the actual speech together with intra-speech gaps of up to (for example) 1.5 seconds between words, and a final gap of typically 1.5 seconds at the end.
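  • Stated as code, the segmentation rule might look like the sketch below, which groups detected speech intervals into utterances using T3. The interval representation and function name are assumptions for illustration.

```python
# Illustrative grouping of detected speech intervals into utterances:
# gaps of up to t3 seconds keep words in the same utterance, a longer
# gap starts a new one.

def segment_utterances(speech_intervals, t3=1.5):
    """speech_intervals: ascending list of (start, end) times in
    seconds for detected speech. Returns a list of utterances, each a
    list of the intervals it groups."""
    utterances, current = [], []
    for start, end in speech_intervals:
        if current and start - current[-1][1] > t3:
            utterances.append(current)  # gap exceeds T3: new utterance
            current = []
        current.append((start, end))
    if current:
        utterances.append(current)
    return utterances

# Example: a 0.8 s gap stays within one utterance, a 4 s gap splits it.
# segment_utterances([(0.0, 1.2), (2.0, 3.0), (7.0, 8.5)])
# -> [[(0.0, 1.2), (2.0, 3.0)], [(7.0, 8.5)]]
```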
  • Frames designated 'speech' in this scheme may include a number of frames preceding/following actual detected speech to form a buffer.
  • the respective number of frames would be commensurate with typical speech attack and decay times; typically 50 ms for attack and 150 ms for decay, but varying with the vocabulary used on the system.
  • the preceding speech buffer would require a small delay.
  • the voice activity detector may indicate confidence in these states, and/or sub-categorise the states, for example sub-categorising speech as voiced and unvoiced speech.
  • the indicated states of each frame in the utterance are used to control the trade-off between recognition performance and latency. This is done by selecting communication settings emphasising recognition performance during speech (T1, S, T2), and selecting communication settings emphasising low latency during intra-speech gaps (G). This selection must be done as permitted by current transmission conditions, such as packet data size (i.e. a single packet of data may span both conditions), or service availability. For communication settings operating on whole packets, a state for the whole packet can be determined from the packet content.
  • simple rules can be employed to determine more sophisticated decisions for the situation of a packet spanning several states.
  • An example would be deciding whether the amount of speech in a packet is significant depending on the percentage of speech frames within the packet, and/or whether they are contiguous frames; a sketch of such a rule is given below.
  • Other rules appropriate to the circumstances may equally be employed.
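  • The sketch below gives one such rule in Python: a packet is treated as speech if a sufficient fraction of its frames are speech frames, or if it contains a long enough contiguous run of them. The 50% fraction and the run length of five frames are example thresholds, not values taken from the patent.

```python
# Illustrative packet-level state decision from per-frame states.
# Thresholds are example choices, not values from the patent.

def packet_state(frame_states, min_fraction=0.5, min_run=5):
    speech = [s == "speech" for s in frame_states]
    fraction = sum(speech) / len(speech)
    longest = run = 0
    for flag in speech:               # longest contiguous speech run
        run = run + 1 if flag else 0
        longest = max(longest, run)
    if fraction >= min_fraction or longest >= min_run:
        return "speech"
    return "intra-speech gap"
```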
  • the 'indicated state' refers to either the indicated state of the data frame or the indicated state of the data packet, as appropriate.
  • an example of application level protocol control would be to choose between TCP or UDP protocols depending on the indicated state of the packet. This would involve using the TCP protocol for the speech components of the utterance, which would guarantee their transmission but can incur latency. It would conversely involve using UDP for the intra-speech gaps, which would risk their loss in the network, but reduce overall latency. Clearly, any appropriate protocols available on a given network may be used in a similar manner, if they exhibit similar tradeoffs.
  • An example of transmission quality of service control would be to define the number of permitted retransmissions for a packet depending on the indicated state of the packet. A packet containing a significant amount of speech would be permitted more retransmissions than one predominantly comprising an intra-speech gap.
  • GPRS provides four coding levels, CS1 through to CS4.
  • CS1 is robust to channel errors but contains relatively little data.
  • CS4 is not very robust to channel errors but contains a relatively large amount of data.
  • the coding decision could be either based on the indicated state of the packet, or the indicated state of the constituent data frames. This would depend on the relative size of the RTP packet and the GPRS transmission blocks.
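  • One way of handling that choice of granularity is sketched below: where a GPRS transmission block carries a whole RTP packet, the packet's indicated state is used; where it carries only a few frames, the decision is based on those frames alone. The helper name, block layout and the CS1/CS4 mapping are assumptions.

```python
# Illustrative coding decision at packet or frame granularity.
# Names, block sizes and the CS1/CS4 mapping are assumptions.

def coding_for_block(frame_states, frames_per_block, block_index,
                     packet_state=None):
    if packet_state is not None:
        # The RTP packet maps onto the whole block: use its state.
        states = [packet_state]
    else:
        # The block carries part of a packet: use just its own frames.
        lo = block_index * frames_per_block
        states = frame_states[lo:lo + frames_per_block]
    # Robust coding whenever any speech is present in the block.
    return "CS1" if any(s == "speech" for s in states) else "CS4"
```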
  • the effect of the control provided by the present invention is to minimise latency, whilst preserving the robustness of the speech within an utterance, thereby providing a more effective means of balancing the recognition performance versus latency trade-off for distributed speech recognition systems.
  • the above mechanisms of the present invention are employed within the user's terminal 2.
  • If the voice activity indication is transmitted to, or derivable by, the server, one may employ state-dependent schemes at the server also.
  • An example of error mitigation scheme control based on the transmitted voice activity indication from the user terminal would be to select different schemes depending on the indicated state of the data frames. For intra-speech gaps, low latency but relatively poor methods could be used. Such a method would be, for example, copy-forward error correction. For speech, higher latency methods, that require both the last and next good packet, could be employed.
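  • The sketch below illustrates how such a state-dependent choice might look at the server, with copy-forward repair (repeating the last good frame) for gaps and a simple interpolation between the last and next good frames for speech. The repair strategies and the representation of frames as feature vectors are illustrative assumptions.

```python
# Illustrative state-dependent concealment of a lost frame at the
# server. Frames are feature vectors (lists of floats); the
# strategies are assumptions consistent with the description.

def conceal_lost_frame(prev_frame, next_frame, state):
    if state == "intra-speech gap":
        # Copy-forward: repeat the last good frame. No waiting for
        # the next packet, so latency stays low.
        return list(prev_frame)
    # Speech: interpolate between last and next good frames, which
    # requires buffering until the next good packet arrives.
    return [(a + b) / 2.0 for a, b in zip(prev_frame, next_frame)]
```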
  • the selection of different schemes could also be used for other recognition server based tasks, such as frame error mitigation and/or the adjustment of recognition complexity parameters (such as beamwidth) within the recogniser itself.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention relates to a user terminal (2) for use in a speech recognition system, comprising a client application which, in use, is connected to a server application (54) over a network (52), the server application performing speech recognition processing, and communication between the client application and the server application depending on communication settings. The user terminal (2) comprises a voice activity detector that generates information indicating which of a plurality of states (T1, S, T2) is represented by the user's utterance data. The user terminal (2) is also adapted to choose the communication settings, at any or all stages of the communication link between the client application and the server application, in dependence on the indicated state of the utterance data, the available communication settings comprising at least high quality transmission (H3) and lower quality transmission (L1).
PCT/EP2003/050686 2002-12-10 2003-10-03 User terminal and method for distributed speech recognition WO2004053837A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2003282110A AU2003282110A1 (en) 2002-12-10 2003-10-03 A user terminal and method for distributed speech recognition

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0228765A GB2396271B (en) 2002-12-10 2002-12-10 A user terminal and method for voice communication
GB0228765.4 2002-12-10

Publications (1)

Publication Number Publication Date
WO2004053837A1 (fr) 2004-06-24

Family

ID=9949410

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2003/050686 WO2004053837A1 (fr) 2002-12-10 2003-10-03 User terminal and method for distributed speech recognition

Country Status (3)

Country Link
AU (1) AU2003282110A1 (fr)
GB (1) GB2396271B (fr)
WO (1) WO2004053837A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1840877A4 (fr) * 2005-01-18 2008-05-21 Fujitsu Ltd Speech speed changing method and speech speed changing device
FR2881867A1 (fr) * 2005-02-04 2006-08-11 France Telecom Method for transmitting end-of-speech marks in a speech recognition system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3090842B2 (ja) * 1994-04-28 2000-09-25 Oki Electric Industry Co., Ltd. Transmitter adapted to Viterbi decoding
JP2596388B2 (ja) * 1994-10-28 1997-04-02 NEC Corporation Digital cordless telephone system
NZ508340A (en) * 2000-11-22 2002-10-25 Tait Electronics Ltd Mobile radio half duplex communication with synchronisation

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1107231A2 (fr) * 1991-06-11 2001-06-13 QUALCOMM Incorporated Variable rate vocoder
WO2002093555A1 (fr) * 2001-05-17 2002-11-21 Qualcomm Incorporated System and method for transmitting speech activity in a distributed voice recognition system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ETSI: "ETSI ES 202 050 V1.1.1", ETSI STANDARD, October 2002 (2002-10-01), SOPHIA ANTIPOLIS, FRANCE, XP002270548 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8451823B2 (en) 2005-12-13 2013-05-28 Nuance Communications, Inc. Distributed off-line voice services

Also Published As

Publication number Publication date
GB2396271B (en) 2005-08-10
AU2003282110A1 (en) 2004-06-30
GB0228765D0 (en) 2003-01-15
GB2396271A (en) 2004-06-16

Similar Documents

Publication Publication Date Title
US9047863B2 (en) Systems, methods, apparatus, and computer-readable media for criticality threshold control
KR100575193B1 (ko) Decoding method and system including an adaptive postfilter
US20070206645A1 (en) Method of dynamically adapting the size of a jitter buffer
TWI390505B (zh) Method for discontinuous transmission and accurate reproduction of background noise information
US7983906B2 (en) Adaptive voice mode extension for a voice activity detector
EP1224659B1 (fr) Complex signal activity detection for improved speech/noise classification of an audio signal
US8019599B2 (en) Speech codecs
US9712287B2 (en) System and method of redundancy based packet transmission error recovery
US7979272B2 (en) System and methods for concealing errors in data transmission
US7573907B2 (en) Discontinuous transmission of speech signals
KR20030048067A (ko) Improved spectral parameter replacement for frame error concealment in a speech decoder
EP2055055A2 (fr) Jitter buffer adjustment
WO2007132377A1 (fr) Adaptive jitter management control in decoder
US8631295B2 (en) Error concealment
US20170187635A1 (en) System and method of jitter buffer management
EP1982331B1 (fr) Method and arrangement for speech coding in wireless communication systems
US7231348B1 (en) Tone detection algorithm for a voice activity detector
KR101002405B1 (ko) Time-scaling control of an audio signal
US8144862B2 (en) Method and apparatus for the detection and suppression of echo in packet based communication networks using frame energy estimation
EP2109950B1 (fr) Method of transmitting data in a communication system
US20100251051A1 (en) Error concealment
US20080103765A1 (en) Encoder Delay Adjustment
JPH10190498A (ja) Improved method of generating comfort noise during discontinuous transmission
WO2004053837A1 (fr) User terminal and method for distributed speech recognition
US8204753B2 (en) Stabilization and glitch minimization for CCITT recommendation G.726 speech CODEC during packet loss scenarios by regressor control and internal state updates of the decoding process

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP