EP0140249A1 - Speech analysis and synthesis with energy normalization - Google Patents
Speech analysis and synthesis with energy normalization
- Publication number
- EP0140249A1 (application EP84112266A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- frames
- speech
- energy
- frame
- parameters
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0316—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
- G10L21/0364—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L2025/783—Detection of presence or absence of voice signals based on threshold decision
- G10L2025/786—Adaptive threshold
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
Definitions
- the present invention relates to voice coding systems.
- voice coding systems including voice mail in microcomputer networks, voice mail sent and received over telephone lines by microcomputers, user-programmed synthetic speech, etc.
- a particular problem in such applications is energy variation. That is, not only will a speaker's voice intensity typically contain a large dynamic range related to sentence inflection, but different speakers will have different volume levels, and the same speaker's voice level may vary widely at different times. Untrained speakers are especially likely to use nonuniform, uncontrolled variations in volume, which the listener normally ignores. This large dynamic range means that the voice coding method used must accommodate a wide dynamic range, and therefore an increased number of bits would be required for coding at reasonable resolution.
- Energy normalization also improves the intelligibility of the speech received. That is, the dynamic range available from audio amplifiers and loudspeakers is much less than that which can easily be perceived by the human ear. In fact, the dynamic range of loudspeakers is typically much less than that of microphones. This means that a dynamic range which is perfectly intelligible to a human listener may be hard to understand if communicated through a loudspeaker, even if absolutely perfect encoding and decoding is used.
- a further desideratum is that, in many attractive applications, the person listening to synthesized speech should not be required to twiddle a volume control frequently. Where a volume control is available, dynamic range can be analog-adjusted for each received synthetic speech signal, to shift the narrow window provided by the loudspeaker's narrow dynamic range, but this is obviously undesirable for voice mail systems and many other applications.
- analog automatic gain controls have been used to achieve energy normalization of raw signals.
- analog automatic gain controls distort the signal input to the analog to digital converter. That is, where (e.g.) reflection coefficients are used to encode speech data, use of an automatic gain control in the analog signal will introduce error into the calculated reflection coefficients. While it is hard to analyze the nature of this error, error is in fact introduced.
- use of an analog automatic gain control requires an analog part, and every introduction of special analog parts into a digital system greatly increases the cost of the digital system. If an AGC circuit having a fast response is used, the energy levels of consecutive allophones may be inappropriate.
- the sibilant /s/ will normally show a much lower energy than the vowel /i/. If a fast-response AGC circuit is used, the energy-normalized word "six" is left sounding extremely hissy, since the initial /s/ will be raised, inappropriately, to the same energy as the /i/. Even if a slower-response AGC circuit is used, substantial problems may still exist, such as raising the noise floor up to signal levels during periods of silence, or inadequate limiting of a loud utterance following a silent period.
- a further general problem with energy normalization is caused by the existence of noise during silent periods. That is, if an energy normalization system brings the noise floor up towards the expected normal energy level during periods when no speech signal is present, the intelligibility of speech will be degraded and the speech will be unpleasant to listen to. In addition, substantial bandwidth will be wasted encoding noise signals during speech silence periods.
- the present invention solves the problems of energy normalization digitally, by using look-ahead energy normalization. That is, an adaptive energy normalization parameter is carried from frame to frame during a speech analysis portion of an analysis-synthesis system. Speech frames are buffered for a fairly long period, e.g. ½ second, and then are normalized according to the current energy normalization parameter. That is, energy normalization is "look-ahead" normalization in that each frame of speech (e.g. each 20 millisecond interval of speech) is normalized according to the energy normalization value from much later, e.g. from 25 frames later. The energy normalization value is calculated for the frames as received by using a fast-rising slow-falling peak-tracking value.
- a novel silence suppression scheme is used.
- Silence suppression is achieved by tracking 2 additional energy contours.
- One contour is a slow-rising fast-falling value, which is updated only during unvoiced speech frames, and therefore tracks a lower envelope of the energy contour. (This in effect tracks the ambient noise level.)
- the other parameter is a fast-rising slow-falling parameter, which is updated only during voiced speech frames, and thus tracks an upper envelope of the energy contour. (This in effect tracks the average speech level.)
- a threshold value is calculated as the maximum of respective multiples of these 2 parameters, e.g. the greater of: (5 times the lower envelope parameter), and (one fifth of the upper envelope parameter).
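The two envelopes and the threshold described above can be sketched roughly as follows. This is a minimal illustration, not the patent's own update equations: the one-pole smoothing form, the function names, and the coefficients `fast` and `slow` are assumptions chosen only to give the stated fast/slow behaviour. The multipliers 5 and one fifth are the example values given in the text.

```python
from typing import Tuple

def update_envelopes(elow: float, ehigh: float, energy: float, voiced: bool,
                     fast: float = 0.5, slow: float = 0.02) -> Tuple[float, float]:
    """Update the lower/upper energy envelopes for one frame.

    Assumption: each envelope is a one-pole smoother whose coefficient
    depends on whether the frame energy is above or below the envelope.
    """
    if voiced:
        # Upper envelope: fast-rising, slow-falling, voiced frames only.
        c = fast if energy > ehigh else slow
        ehigh = c * energy + (1.0 - c) * ehigh
    else:
        # Lower envelope: slow-rising, fast-falling, unvoiced frames only.
        c = slow if energy > elow else fast
        elow = c * energy + (1.0 - c) * elow
    return elow, ehigh

def silence_threshold(elow: float, ehigh: float) -> float:
    """Greater of 5x the lower envelope and one fifth of the upper envelope."""
    return max(5.0 * elow, ehigh / 5.0)
```

Taking the maximum of the two scaled envelopes keeps the threshold above the ambient noise estimate while still tying it to the current speech level.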
- Speech is not considered to have begun until a first frame is detected which both has an energy above the threshold level and is also voiced.
- the system then backtracks among the buffered frames to include as "speech" all immediately preceding frames which also have energy greater than the threshold. That is, after a period during which the frames of parameters received have been identified as silent frames, all succeeding frames are tentatively identified as silent frames, until a super-threshold-energy voiced frame is found.
- the silence suppression system backtracks among frames immediately preceding this super-threshold-energy voiced frame until an unbroken string of subthreshold-energy frames at least 0.4 seconds long is found. When such a 0.4 second interval of silence is found, backtracking ceases, and only those frames after the 0.4 seconds of silence and before the first voiced super-threshold-energy frame are identified as non-silent frames.
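The backtracking step can be illustrated with a short sketch. The function name `find_speech_start`, the fixed threshold argument, and the choice to keep every buffered frame when no 0.4 second silence is found are assumptions made for illustration.

```python
from typing import Sequence

def find_speech_start(energies: Sequence[float], threshold: float, trigger: int,
                      frame_sec: float = 0.02, gap_sec: float = 0.4) -> int:
    """Backtrack from the first voiced super-threshold frame (index `trigger`).

    Returns the index of the first frame to treat as speech: the frame just
    after the most recent unbroken run of sub-threshold frames lasting at
    least `gap_sec`.
    """
    gap_frames = round(gap_sec / frame_sec)
    quiet_run = 0
    i = trigger - 1
    while i >= 0:
        if energies[i] > threshold:
            quiet_run = 0          # run of silence was broken
        else:
            quiet_run += 1
            if quiet_run >= gap_frames:
                return i + quiet_run   # first frame after the silent run
        i -= 1
    # Assumption: no long silence in the buffer, so keep everything buffered.
    return 0
```

Short dips below the threshold (shorter than the gap) stay inside the utterance, which is how a weak initial consonant ahead of the first voiced frame is retained.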
- a waiting counter is started. If the waiting counter reaches an upper limit (e.g. 0.4 seconds) without the energy again increasing above T, the utterance is considered to have stopped.
- a speech coding system comprising:
- the present invention provides a novel speech analysis/synthesis system, which can be configured in a wide variety of embodiments.
- the presently preferred embodiment uses a VAX 11/780 computer, coupled with a Digital Sound Corporation Model 200 A/D and D/A converter to provide high-resolution high-bit-rate digitizing and to provide speech synthesis.
- a conventional microphone and loudspeaker, with an analog amplifier such as a Digital Sound Corporation Model 240 are also used in conjunction with the system.
- the present invention contains novel teachings which are also particularly applicable to microcomputer-based systems. That is, the high resolution provided by the above digitizer is not necessary, and the computing power available on the VAX is also not necessary.
- TM TI Professional Computer
- the system configuration of the presently preferred embodiment is shown schematically in Figure 5. That is, a raw voice input is received by microphone 10, amplified by microphone amplifier 12, and digitized by A/D converter 14.
- the A/D converter used in the presently preferred embodiment is an expensive high-resolution instrument, which provides 16 bits of resolution at a sample rate of 8 kHz.
- the data received at this high sample rate will be transformed to provide speech parameters at a desired frame rate.
- the frame rate is 50 frames per second, but the frame period can easily range between 10 milliseconds and 30 milliseconds, or over an even wider range.
- linear predictive coding based analysis is used to encode the speech. That is, the successive samples (at the original high sample rate of, in this example, 8000 per second) are used as inputs to derive a set of linear predictive coding parameters, for example 10 reflection coefficients $k_1$ to $k_{10}$ plus pitch and energy, as described below.
- the audible speech is first translated into a meaningful input for the system.
- a microphone within range of the audible speech is connected to a microphone preamplifier and to an analog-to-digital converter.
- the input stream is sampled 8000 times per second, to an accuracy of 16 bits.
- the stream of input data is then arbitrarily divided up into successive "frames", and, in the presently preferred embodiment, each frame is defined to include 160 samples. That is, the interval between frames is 20 msec, but the LPC parameters of each frame are calculated over a range of 240 samples (30 msec).
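A minimal sketch of this framing is shown below. The text does not say how the 240-sample analysis range is positioned relative to the 160-sample frame, so this sketch assumes it starts at the frame start; the function name and the handling of the incomplete tail are also assumptions.

```python
from typing import List, Sequence

FRAME_HOP = 160      # samples between frame starts (20 ms at 8 kHz)
ANALYSIS_LEN = 240   # samples analysed per frame (30 ms at 8 kHz)

def split_into_frames(samples: Sequence[float],
                      hop: int = FRAME_HOP,
                      length: int = ANALYSIS_LEN) -> List[List[float]]:
    """Return overlapping analysis windows, one per frame.

    Assumption: a trailing window that would run past the end of the
    input is dropped rather than zero-padded.
    """
    frames = []
    start = 0
    while start + length <= len(samples):
        frames.append(list(samples[start:start + length]))
        start += hop
    return frames
```

With one second of input at 8000 samples per second this yields 49 complete windows, each overlapping its neighbour by 80 samples.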
- the sequence of samples in each speech input frame is first transformed into a set of inverse filter coefficients $a_k$, as conventionally defined.
- $a_k$: the predictor coefficients with which a signal $S_k$ in a time series can be modeled as the sum of an input $u_k$ and a linear combination of past values $S_{k-n}$ in the series. That is: $S_k = u_k + \sum_{n=1}^{N} a_n S_{k-n}$, where $N$ is the model order.
- Each input frame contains a large number of sampling points, and the sampling points within any one input frame can themselves be considered as a time series.
- the actual derivation of the filter coefficients $a_k$ for the sample frame is as follows: First, the time-series autocorrelation values $R_i$ are computed as $R_i = \sum_j S_j S_{j+i}$, where the summation is taken over the range of samples within the input frame. In this embodiment, 11 autocorrelation values are calculated ($R_0$ to $R_{10}$). A recursive procedure is now used to derive the inverse filter coefficients as follows:
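As an illustration of this kind of recursion, the sketch below computes the autocorrelation values and then applies the well-known Levinson-Durbin recursion. It is a stand-in chosen for clarity, not a reproduction of the procedure set out in the source. The function names are hypothetical, and the sign convention for the reflection coefficients is one of several in common use.

```python
from typing import List, Sequence, Tuple

def autocorrelation(frame: Sequence[float], order: int = 10) -> List[float]:
    """R_i = sum_j S_j * S_(j+i), summed over the samples of the frame."""
    n = len(frame)
    return [sum(frame[j] * frame[j + i] for j in range(n - i))
            for i in range(order + 1)]

def levinson_durbin(r: Sequence[float]) -> Tuple[List[float], List[float], float]:
    """Return (a, k, err) for the model S_k = u_k + sum_n a_n * S_(k-n).

    a[n-1] is the predictor coefficient a_n, k[i-1] the i-th reflection
    coefficient (sign convention assumed), err the residual energy.
    """
    p = len(r) - 1
    if r[0] == 0.0:                      # silent frame: nothing to model
        return [0.0] * p, [0.0] * p, 0.0
    a = [0.0] * (p + 1)
    k = []
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        ki = acc / err
        new_a = a[:]
        new_a[i] = ki
        for j in range(1, i):
            new_a[j] = a[j] - ki * a[i - j]
        a = new_a
        k.append(ki)
        err *= (1.0 - ki * ki)
    return a[1:], k, err
```

For autocorrelations of the form 1, 0.5, 0.25 (a first-order process), the recursion recovers a single non-zero coefficient of 0.5 and a second coefficient of zero.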
- the presently preferred embodiment uses a procedure due to Leroux-Gueguen.
- the normalized error energy E i.e. the self-residual energy of the input frame
- the Leroux-Gueguen algorithm also produces the reflection coefficients (also referred to as partial correlation coefficients) $k_i$.
- the reflection coefficients $k_i$ are very stable parameters, and are insensitive to coding errors (quantization noise).
- the Leroux-Gueguen procedure is set forth, for example, in IEEE Transactions on Acoustics, Speech, and Signal Processing, page 257 (June 1977), which is hereby incorporated by reference.
- This algorithm is a recursive procedure, defined as follows:
- This algorithm computes the reflection coefficients $k_i$ using as intermediaries impulse response estimates $e_k$ rather than the filter coefficients $a_k$.
- Linear predictive coding models generally are well known in the art, and can be found extensively discussed in such references as Rabiner and Schafer, Digital Processing of Speech Signals (1978), and Markel and Gray, Linear Predictive Coding of Speech (1976), which are hereby incorporated by reference, and in many other widely available publications.
- the excitation coding transmitted need not be merely energy and pitch, but may also contain some additional information regarding a residual signal. For example, it would be possible to encode a bandwidth of the residual signal which was an integral multiple of the pitch, and approximately equal to 1000 Hz, as an excitation signal.
- Such a technique is extensively discussed in Patent Application No. 484,720, filed April 13, 1983, which is hereby incorporated by reference.
- the LPC parameters can be encoded in various ways. For example, as is also well known in the art, there are numerous equivalent formulations of linear predictive coefficients. These can be expressed as the LPC filter coefficients $a_k$, or as the reflection coefficients $k_i$, or as the autocorrelations $R_i$, or as other parameter sets such as the impulse response estimate parameters E(i) which are provided by the Leroux-Gueguen procedure. Moreover, the LPC model order is not necessarily 10, but can be 8, 12, 14, or other.
- the present invention does not necessarily have to be used in combination with an LPC speech encoding model at all. That is, the present invention provides an energy normalization method which digitally modifies only the energy of each of a sequence of speech frames, with regard to only the energy and voicing of each of a sequence of speech frames.
- the present invention is applicable to energy normalization of systems using any one of a great variety of speech encoding methods, including transform techniques, formant encoding techniques, etc.
- the present invention operates on the energy value of the data vectors.
- the encoded parameters are the reflection coefficients $k_1$ to $k_{10}$, the energy, and pitch.
- ENORM is subsequently updated, for each successive frame, as follows:
- If E(i) is greater than ENORM(i-1), ENORM(i) is set equal to alpha times E(i) + (1 - alpha) times ENORM(i-1); otherwise, ENORM(i) is set equal to beta times E(i) + (1 - beta) times ENORM(i-1), where alpha is given a value close to 1 to provide a fast rising time constant (preferably about 0.1 seconds), and beta is given a value close to 0, to provide a slow falling time constant (preferably in the neighborhood of 4 seconds).
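A minimal sketch of such a fast-rising, slow-falling peak tracker follows. It assumes the rising case uses alpha in the same weighted-average form that the text gives for beta. The numeric defaults for `alpha` and `beta`, the function names, and the need for a caller-supplied initial value are illustrative assumptions, not values taken from the source.

```python
from typing import List, Sequence

def update_enorm(prev: float, energy: float,
                 alpha: float = 0.9, beta: float = 0.01) -> float:
    """One ENORM update: weight alpha when rising, beta when falling."""
    coeff = alpha if energy > prev else beta
    return coeff * energy + (1.0 - coeff) * prev

def track_enorm(energies: Sequence[float], initial: float,
                alpha: float = 0.9, beta: float = 0.01) -> List[float]:
    """Return ENORM(i) for every frame, starting from `initial`."""
    out = []
    enorm = initial
    for e in energies:
        enorm = update_enorm(enorm, e, alpha, beta)
        out.append(enorm)
    return out
```

A loud frame pulls the tracker up almost immediately, while a quiet frame lowers it only slightly, so the value follows the peaks of the energy contour.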
- the adaptive parameter ENORM provides an envelope tracking measure, which tracks the peak energy of the sequence of frames i.
- This adaptive peak-tracking parameter ENORM(i) is used to normalize the energy of the frames, but this is not done directly.
- the energy of each frame i is normalized by dividing it by a look-ahead normalized energy ENORM*(i), where ENORM*(i) is defined to be equal to ENORM(i+d), where d represents a number of frames of delay which is typically chosen to be equivalent to ½ second (but may be in the range of 0.1 to 2 seconds, or even have values outside this range).
- Because the falling time constant (corresponding to the parameter beta) is so long, energy normalization at the end of a word will not be distorted by the approximately zero-energy value of the following frames of silence.
- the silence suppression will prevent ENORM from falling very far in this situation.
- the long time constant corresponding to beta will mean that the energy normalization value ENORM of the silent frames ½ second after the end of a word will still be dominated by the voiced phonemes immediately preceding the final unvoiced consonant.
- the final unvoiced consonant will be normalized with respect to preceding voiced frames, and its energy also will not be unduly raised.
- the foregoing steps provide a normalized energy E * (i) for each speech frame i.
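The look-ahead division can be sketched as follows, given per-frame energies and an already computed ENORM sequence. The function name, the clamping of the look-ahead index at the end of the buffer, and the zero output for a non-positive normalizer are assumptions; the default delay of 25 frames corresponds to the ½ second example at 20 ms per frame.

```python
from typing import List, Sequence

def normalize_lookahead(energies: Sequence[float], enorm: Sequence[float],
                        delay: int = 25) -> List[float]:
    """E*(i) = E(i) / ENORM(i + delay).

    Assumption: near the end of the buffer, where frame i + delay does not
    exist yet, the last available ENORM value is used instead.
    """
    last = len(enorm) - 1
    out = []
    for i, e in enumerate(energies):
        ref = enorm[min(i + delay, last)]
        out.append(e / ref if ref > 0 else 0.0)
    return out
```

Because each frame is divided by a value computed from later frames, a weak consonant at the start of a word is scaled by the level of the vowel that follows it rather than by the preceding silence.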
- a further novel step is used to suppress silent periods.
- silence detection is used to selectively prevent certain frames from being encoded. Those frames which are encoded are encoded with a normalized energy E*(i), together with the remaining speech parameters in the chosen model (which in the presently preferred embodiment are the pitch P and the reflection coefficients $k_1$ to $k_{10}$).
- Silence suppression is accomplished in a further novel aspect of the present invention, by carrying 2 envelope parameters: ELOW and EHIGH. Both of these parameters are started from some initial value (e.g. 10.0) and then are updated depending on the energy E(i) of each frame i and on the voiced or unvoiced status of that frame. If the frame is unvoiced, then only the lower parameter ELOW is updated as follows:
- the 2 envelope parameters ELOW and EHIGH are then used to generate 2 threshold parameters TLOW and THIGH, defined as:
- the end of the word i.e. the beginning of "silent" frames which need not be encoded
- a voiced frame is found which has its energy E(i) less than T
- a waiting counter is started. If the waiting counter reaches an upper limit (e.g. 0.4 seconds) without the energy ever rising above T, then speech is determined to have stopped, and frames after the last frame which had energy E(i) greater than T are considered to be silent frames. These frames are therefore not encoded.
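The waiting-counter logic can be sketched as below. For simplicity this sketch looks only at frame energy against a fixed threshold T and ignores the voicing flag; the function name and the `None` return for an utterance that has not yet ended are assumptions.

```python
from typing import Optional, Sequence

def find_speech_end(energies: Sequence[float], t: float, start: int = 0,
                    frame_sec: float = 0.02, wait_sec: float = 0.4) -> Optional[int]:
    """Return the index of the last frame with energy above t, once the
    energy has stayed at or below t for `wait_sec`; None if speech has
    not (yet) stopped within the given frames.
    """
    limit = round(wait_sec / frame_sec)
    last_loud = start
    waiting = 0
    for i in range(start, len(energies)):
        if energies[i] > t:
            last_loud = i
            waiting = 0            # energy rose again: restart the wait
        else:
            waiting += 1
            if waiting >= limit:
                return last_loud   # later frames are treated as silent
    return None
```

Frames after the returned index would be dropped from encoding, while short pauses inside the utterance reset the counter and are kept.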
- the energy normalization and silence suppression features of the system of the present invention are both dependent in important ways on the voicing decision. It is preferable, although not strictly necessary, that the voicing decision be made by means of a dynamic programming procedure which makes pitch and voicing decisions simultaneously, using an interrelated distance measure. Such a system is presently preferred, and is described in greater detail in U.S. Patent Application No. 484,718, filed April 13, 1983, which is hereby incorporated by reference (TI-9623). It should also be noted that this system tends to classify low-energy frames as unvoiced. This is desirable.
- the actual encoding can now be performed with a minimum bit rate.
- 5 bits are used to encode the energy of each frame, 3 bits are used for each of the ten reflection coefficients, and 5 bits are used for the pitch.
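As a worked example of the resulting bit rate, the arithmetic below combines this bit allocation with the 50 frames per second frame rate stated earlier. It covers encoded frames only, before any delta coding, and the constant and function names are illustrative.

```python
FRAME_RATE_HZ = 50          # frames per second (20 ms frames)
BITS_ENERGY = 5
BITS_PER_REFLECTION = 3
NUM_REFLECTION = 10
BITS_PITCH = 5

def bits_per_frame() -> int:
    """Total bits for one encoded frame: energy + reflection coeffs + pitch."""
    return BITS_ENERGY + BITS_PER_REFLECTION * NUM_REFLECTION + BITS_PITCH

def bit_rate(frame_rate_hz: int = FRAME_RATE_HZ) -> int:
    """Bits per second while frames are being encoded."""
    return bits_per_frame() * frame_rate_hz
```

This gives 40 bits per frame, or 2000 bits per second during speech; frames suppressed as silence contribute nothing.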
- this bit rate can be further compressed by one of the many variations of delta coding, e.g. by fitting a polynomial to the sequence of parameter values across successive frames and then encoding merely the coefficients of that polynomial, by simple linear delta coding, or by any of the various well known methods.
- an analysis system as described above is combined with speech synthesis capability, to provide a voice mail station, or a station capable of generating user-generated spoken reminder messages.
- the encoded output of the analysis section, as described above, is connected to a data channel of some sort.
- This may be a wire to which an RS-232 UART chip is connected, or may be a telephone line accessed by a modem, or may be simply a local data bus which is also connected to a memory board or memory chips, or may of course be any of a tremendous variety of other data channels.
- connection to any of these normal data channels is easily and conveniently made two way, so that data may be received from a communications channel or recalled from memory. Such data received from the channel will thus contain a plurality of speech parameters, including an energy value.
- the encoded data received from the data channel will contain LPC filter parameters for each speech frame, as well as some excitation information.
- the data vector for each speech frame contains 10 reflection coefficients as well as pitch and energy. The reflection coefficients configure a tenth-order lattice filter, and an excitation signal is generated from the excitation parameters and provided as input to this lattice filter.
- the excitation parameters are pitch and energy
- a pulse at intervals equal to the pitch period, is provided as the excitation function during voiced frames (i.e.
- the energy parameter can be used to define the power provided in the excitation function.
- the output of the lattice filter provides the LPC-modeled synthetic signal, which will typically be of good intelligible quality, although not absolutely transparent. This output is then digital-to-analog converted, and the analog output of the D/A converter is provided to an audio amplifier, which drives a loudspeaker or headphones.
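A rough sketch of the synthesis path for voiced frames is given below: a pulse train at the pitch period drives an all-pole lattice filter configured by the reflection coefficients. The function names, the `gain` parameter standing in for the energy, and the sign convention of the reflection coefficients are assumptions; unvoiced excitation is not covered.

```python
from typing import List, Sequence

def pulse_train(n: int, pitch_period: int, gain: float = 1.0) -> List[float]:
    """Voiced excitation: one pulse of height `gain` every pitch period."""
    return [gain if i % pitch_period == 0 else 0.0 for i in range(n)]

def lattice_synthesize(excitation: Sequence[float],
                       k: Sequence[float]) -> List[float]:
    """All-pole lattice filter of order len(k).

    Convention assumed: a first-order filter with k[0] = c produces
    y[n] = u[n] + c * y[n-1].
    """
    p = len(k)
    b = [0.0] * (p + 1)            # b[i] holds the backward signal b_i(n-1)
    out = []
    for u in excitation:
        f = u                      # forward signal entering the top stage
        new_b = [0.0] * (p + 1)
        for i in range(p, 0, -1):
            f = f + k[i - 1] * b[i - 1]
            new_b[i] = b[i - 1] - k[i - 1] * f
        new_b[0] = f               # filter output feeds the backward path
        b = new_b
        out.append(f)
    return out
```

For example, `lattice_synthesize(pulse_train(160, 80), k)` would produce one 20 ms frame with a 10 ms pitch period, given ten coefficients in `k`.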
- such a voice mail system is configured in a microcomputer-based system.
- a Texas Instruments Professional Computer (TM), with a speech board incorporated, is used as a voice mail terminal. Additional information regarding this hardware configuration is provided in Appendix B attached hereto, which is hereby incorporated by reference.
- This configuration uses an 8088-based system, together with a special board having a TMS 320 numeric processor chip mounted thereon. The fast multiply provided by the TMS 320 is very convenient in performing signal processing functions.
- a pair of audio amplifiers for input and output is also provided on the speech board, as is an 8 bit mu-law codec.
- the function of this embodiment is essentially identical to that of the VAX embodiment described above, except for a slight difference regarding the converters.
- the 8 bit codec performs mu-law conversion, which is non linear but provides enhanced dynamic range.
- a lookup table is used to transform the 8 bit mu-law output provided from the codec chip into a 13 bit linear output.
- the linear output of the lattice filter operation is pre-converted, using the same lookup table, to an 8-bit word which will give an appropriate analog output signal from the codec.
- This microcomputer embodiment also includes an internal speaker, and a microphone jack.
- a further preferred realization is the use of multiple micro-computer based voice mail stations, as described above, to configure a microcomputer-based voice mail system.
- microcomputers are conventionally connected in a local area network, using one of the many conventional LAN protocols, or are connected using PBX tie lines.
- Substantial background information regarding such embodiments is contained in Appendix C, which is hereby incorporated by reference.
- the only slightly distinctive feature of this voice mail system embodiment is that the transfer mechanism used must be able to pass binary data, and not merely ASCII data.
- As between microcomputer stations which have the voice mail capability, the voice mail operation is simply a straightforward file transfer, wherein a file representing encoded speech data is generated by an analysis operation at one station, is transferred as a file to another station, and then is converted to analog speech data by a synthesis operation at the second station.
- the crucial changes taught by the present invention are changes in the analysis portion of an analysis/synthesis system, but these changes affect the system as a whole. That is, the system as a whole will achieve higher throughput of intelligible speech information per transmitted bit, better perceptual quality of synthesized sound at the synthesis section, and other system-level advantages.
- microcomputer network voice mail systems perform better with minimized channel loading according to the present invention.
- the present invention advantageously provides the objects described above, of energy normalization and of silence suppression, as well as other objects.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US541410 | 1983-10-13 | ||
US06/541,497 US4696039A (en) | 1983-10-13 | 1983-10-13 | Speech analysis/synthesis system with silence suppression |
US06/541,410 US4696040A (en) | 1983-10-13 | 1983-10-13 | Speech analysis/synthesis system with energy normalization and silence suppression |
US541497 | 1983-10-13 |
Publications (2)
Publication Number | Publication Date |
---|---|
EP0140249A1 (de) | 1985-05-08 |
EP0140249B1 (de) | 1988-08-10 |
Family
ID=27066699
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP19840112266 Expired EP0140249B1 (de) | 1983-10-13 | 1984-10-12 | Speech analysis and synthesis with energy normalization |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP0140249B1 (de) |
JP (1) | JPH0644195B2 (de) |
DE (1) | DE3473373D1 (de) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0341128A1 (de) * | 1988-05-04 | 1989-11-08 | Thomson-Csf | Verfahren und Anordnung zur Feststellung der Anwesenheit von Sprachsignalen |
EP0459363A1 (de) * | 1990-05-28 | 1991-12-04 | Matsushita Electric Industrial Co., Ltd. | Sprachkodierer |
FR2686183A1 (fr) * | 1992-01-15 | 1993-07-16 | Idms Sa | Systeme de numerisation d'un signal audio, procede et dispositif de mise en œuvre pour constituer une base de donnees numeriques. |
EP0683482A2 (de) * | 1994-05-13 | 1995-11-22 | Sony Corporation | Verfahren zur Rauschreduktion eines Sprachsignals und zur Detektion des Rauschbereichs |
WO1999044191A1 (en) * | 1998-02-27 | 1999-09-02 | At & T Corp. | System and method for noise threshold adaptation for voice activity detection in nonstationary noise environments |
EP1168306A2 (de) * | 2000-06-01 | 2002-01-02 | Avaya Technology Corp. | Verfahren und Vorrichtung zur Verbesserung von der Verständlichkeit eines digital komprimierten Sprachsignals |
GB2367467A (en) * | 2000-09-30 | 2002-04-03 | Mitel Corp | Noise level calculation, e.g. for an echo canceller |
DE19952538C2 (de) * | 1998-11-06 | 2003-03-27 | Ibm | Automatische Verstärkungsregelung in einem Spracherkennungssystem |
WO2005038773A1 (en) * | 2003-10-16 | 2005-04-28 | Koninklijke Philips Electronics N.V. | Voice activity detection with adaptive noise floor tracking |
US7529670B1 (en) | 2005-05-16 | 2009-05-05 | Avaya Inc. | Automatic speech recognition system for people with speech-affecting disabilities |
US7653543B1 (en) | 2006-03-24 | 2010-01-26 | Avaya Inc. | Automatic signal adjustment based on intelligibility |
US7660715B1 (en) | 2004-01-12 | 2010-02-09 | Avaya Inc. | Transparent monitoring and intervention to improve automatic adaptation of speech models |
US7675411B1 (en) | 2007-02-20 | 2010-03-09 | Avaya Inc. | Enhancing presence information through the addition of one or more of biotelemetry data and environmental data |
US7925508B1 (en) | 2006-08-22 | 2011-04-12 | Avaya Inc. | Detection of extreme hypoglycemia or hyperglycemia based on automatic analysis of speech patterns |
US7962342B1 (en) | 2006-08-22 | 2011-06-14 | Avaya Inc. | Dynamic user interface for the temporarily impaired based on automatic analysis for speech patterns |
US8041344B1 (en) | 2007-06-26 | 2011-10-18 | Avaya Inc. | Cooling off period prior to sending dependent on user's state |
WO2014165032A1 (en) * | 2013-03-12 | 2014-10-09 | Aawtend, Inc. | Integrated sensor-array processor |
US10049685B2 (en) | 2013-03-12 | 2018-08-14 | Aaware, Inc. | Integrated sensor-array processor |
US10204638B2 (en) | 2013-03-12 | 2019-02-12 | Aaware, Inc. | Integrated sensor-array processor |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2308248A1 (fr) * | 1975-04-18 | 1976-11-12 | Siemens Ag | Dispositif pour le reglage automatique de niveau sonore |
US4071695A (en) * | 1976-08-12 | 1978-01-31 | Bell Telephone Laboratories, Incorporated | Speech signal amplitude equalizer |
FR2380612A1 (fr) * | 1977-02-09 | 1978-09-08 | Thomson Csf | Dispositif de discrimination des signaux de parole et systeme d'alternat comportant un tel dispositif |
FR2451680A1 (fr) * | 1979-03-12 | 1980-10-10 | Soumagne Joel | Discriminateur parole/silence pour interpolation de la parole |
EP0027066A1 (de) * | 1979-09-28 | 1981-04-15 | Thomson-Csf | Detektorvorrichtung für Sprachsignale und eine solche Vorrichtung enthaltendes Umschaltsystem |
US4280192A (en) * | 1977-01-07 | 1981-07-21 | Moll Edward W | Minimum space digital storage of analog information |
EP0047589A1 (de) * | 1980-09-09 | 1982-03-17 | Northern Telecom Limited | Method and apparatus for detecting speech signals in a transmission channel |
US4351983A (en) * | 1979-03-05 | 1982-09-28 | International Business Machines Corp. | Speech detector with variable threshold |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS58171099A (ja) * | 1982-03-31 | 1983-10-07 | Fujitsu Ltd. | Speech parameter correction method |
1984
- 1984-10-12: DE, application DE8484112266T, publication DE3473373D1, status: Expired
- 1984-10-12: EP, application EP19840112266, publication EP0140249B1, status: Expired
- 1984-10-13: JP, application JP59215061A, publication JPH0644195B2, status: Expired - Lifetime
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2308248A1 (fr) * | 1975-04-18 | 1976-11-12 | Siemens Ag | Device for automatic sound level adjustment |
US4071695A (en) * | 1976-08-12 | 1978-01-31 | Bell Telephone Laboratories, Incorporated | Speech signal amplitude equalizer |
US4280192A (en) * | 1977-01-07 | 1981-07-21 | Moll Edward W | Minimum space digital storage of analog information |
FR2380612A1 (fr) * | 1977-02-09 | 1978-09-08 | Thomson Csf | Speech signal discrimination device and half-duplex system comprising such a device |
US4351983A (en) * | 1979-03-05 | 1982-09-28 | International Business Machines Corp. | Speech detector with variable threshold |
FR2451680A1 (fr) * | 1979-03-12 | 1980-10-10 | Soumagne Joel | Speech/silence discriminator for speech interpolation |
EP0027066A1 (de) * | 1979-09-28 | 1981-04-15 | Thomson-Csf | Speech signal detection device and switching system comprising such a device |
EP0047589A1 (de) * | 1980-09-09 | 1982-03-17 | Northern Telecom Limited | Method and apparatus for detecting speech signals in a transmission channel |
Non-Patent Citations (6)
Title |
---|
ELECTRONICS LETTERS, vol. 9, no. 14, 12th July 1973, pages 298-300, Stevenage, GB; M.G. CROLL et al.: " 'Nearly instantaneous' digital compandor for transmitting six sound-programme signals in a 2.048Mbit/s multiplex" * |
IBM TECHNICAL DISCLOSURE BULLETIN, vol. 20, no. 12, May 1978, pages 5437-5440, New York, US; S.J. BOIES et al.: "Amplitude-detection method for producing rate-controlled speech" * |
IBM TECHNICAL DISCLOSURE BULLETIN, vol. 25, no. 7B, December 1982, pages 3678-3680, New York, US; D.R. IRVIN: "Voice activity detector" * |
ICASSP 79, 1979 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, Washington, D.C., 2nd-4th April 1979, pages 212-215, IEEE, New York, US; R.D. PREUSS: "A frequency domain noise cancelling preprocessor for narrowband speech communications systems" * |
ICASSP 83, PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, Boston, Massachusetts, 14th-16th April 1983, vol. 2, pages 511-514, IEEE, New York, US; J.A. FELDMAN et al.: "A custom IC for automatic gain control in LPC vocoders" * |
NEW ELECTRONICS, vol. 15, no. 2, January 1982, pages 30-32, London, GB; B. DANCE: "A digital speech compressor" * |
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2631147A1 (fr) * | 1988-05-04 | 1989-11-10 | Thomson Csf | Method and device for detecting voice signals |
US4982341A (en) * | 1988-05-04 | 1991-01-01 | Thomson Csf | Method and device for the detection of vocal signals |
EP0341128A1 (de) * | 1988-05-04 | 1989-11-08 | Thomson-Csf | Method and arrangement for determining the presence of speech signals |
US5652843A (en) * | 1990-05-27 | 1997-07-29 | Matsushita Electric Industrial Co. Ltd. | Voice signal coding system |
EP0747879A1 (de) * | 1990-05-28 | 1996-12-11 | Matsushita Electric Industrial Co., Ltd. | Speech coder |
EP0459363A1 (de) * | 1990-05-28 | 1991-12-04 | Matsushita Electric Industrial Co., Ltd. | Speech coder |
FR2686183A1 (fr) * | 1992-01-15 | 1993-07-16 | Idms Sa | System for digitizing an audio signal, and method and device for implementing it to build a digital database |
EP0683482A2 (de) * | 1994-05-13 | 1995-11-22 | Sony Corporation | Method for reducing noise in a speech signal and for detecting the noise domain |
EP0683482A3 (de) * | 1994-05-13 | 1997-12-03 | Sony Corporation | Method for reducing noise in a speech signal and for detecting the noise domain |
EP1065657A1 (de) * | 1994-05-13 | 2001-01-03 | Sony Corporation | Method for detecting the noise domain |
WO1999044191A1 (en) * | 1998-02-27 | 1999-09-02 | At & T Corp. | System and method for noise threshold adaptation for voice activity detection in nonstationary noise environments |
DE19952538C2 (de) * | 1998-11-06 | 2003-03-27 | Ibm | Automatic gain control in a speech recognition system |
EP1168306A2 (de) * | 2000-06-01 | 2002-01-02 | Avaya Technology Corp. | Method and apparatus for improving the intelligibility of a digitally compressed speech signal |
US6889186B1 (en) | 2000-06-01 | 2005-05-03 | Avaya Technology Corp. | Method and apparatus for improving the intelligibility of digitally compressed speech |
EP1168306A3 (de) * | 2000-06-01 | 2002-10-02 | Avaya Technology Corp. | Method and apparatus for improving the intelligibility of a digitally compressed speech signal |
GB2367467B (en) * | 2000-09-30 | 2004-12-15 | Mitel Corp | Noise level calculator for echo canceller |
FR2814875A1 (fr) * | 2000-09-30 | 2002-04-05 | Zarlink Semiconductor Inc | Method for calculating the noise level in a signal for an echo canceller |
GB2367467A (en) * | 2000-09-30 | 2002-04-03 | Mitel Corp | Noise level calculation, e.g. for an echo canceller |
US7146003B2 (en) | 2000-09-30 | 2006-12-05 | Zarlink Semiconductor Inc. | Noise level calculator for echo canceller |
WO2005038773A1 (en) * | 2003-10-16 | 2005-04-28 | Koninklijke Philips Electronics N.V. | Voice activity detection with adaptive noise floor tracking |
US7660715B1 (en) | 2004-01-12 | 2010-02-09 | Avaya Inc. | Transparent monitoring and intervention to improve automatic adaptation of speech models |
US7529670B1 (en) | 2005-05-16 | 2009-05-05 | Avaya Inc. | Automatic speech recognition system for people with speech-affecting disabilities |
US7653543B1 (en) | 2006-03-24 | 2010-01-26 | Avaya Inc. | Automatic signal adjustment based on intelligibility |
US7925508B1 (en) | 2006-08-22 | 2011-04-12 | Avaya Inc. | Detection of extreme hypoglycemia or hyperglycemia based on automatic analysis of speech patterns |
US7962342B1 (en) | 2006-08-22 | 2011-06-14 | Avaya Inc. | Dynamic user interface for the temporarily impaired based on automatic analysis for speech patterns |
US7675411B1 (en) | 2007-02-20 | 2010-03-09 | Avaya Inc. | Enhancing presence information through the addition of one or more of biotelemetry data and environmental data |
US8041344B1 (en) | 2007-06-26 | 2011-10-18 | Avaya Inc. | Cooling off period prior to sending dependent on user's state |
WO2014165032A1 (en) * | 2013-03-12 | 2014-10-09 | Aawtend, Inc. | Integrated sensor-array processor |
US9443529B2 (en) | 2013-03-12 | 2016-09-13 | Aawtend, Inc. | Integrated sensor-array processor |
US9721583B2 (en) | 2013-03-12 | 2017-08-01 | Aawtend Inc. | Integrated sensor-array processor |
US10049685B2 (en) | 2013-03-12 | 2018-08-14 | Aaware, Inc. | Integrated sensor-array processor |
US10204638B2 (en) | 2013-03-12 | 2019-02-12 | Aaware, Inc. | Integrated sensor-array processor |
Also Published As
Publication number | Publication date |
---|---|
JPS60107700A (ja) | 1985-06-13 |
DE3473373D1 (en) | 1988-09-15 |
JPH0644195B2 (ja) | 1994-06-08 |
EP0140249B1 (de) | 1988-08-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US4696039A (en) | Speech analysis/synthesis system with silence suppression | |
US4696040A (en) | Speech analysis/synthesis system with energy normalization and silence suppression | |
EP0140249B1 (de) | Speech analysis and synthesis with energy normalization | |
US6092039A (en) | Symbiotic automatic speech recognition and vocoder | |
US6041297A (en) | Vocoder for coding speech by using a correlation between spectral magnitudes and candidate excitations | |
EP0993670B1 (de) | Method and apparatus for speech enhancement in a speech transmission system | |
US6889186B1 (en) | Method and apparatus for improving the intelligibility of digitally compressed speech | |
EP1391879A2 (de) | Apparatus for speech coding, linear predictive analysis, and noise reduction | |
EP0814458A2 (de) | Improvements in or relating to speech coding | |
US5706392A (en) | Perceptual speech coder and method | |
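JPH08254994A (ja) | Reconstruction of an array of speech coding parameters by classification and contour inventory | |
KR0155315B1 (ko) | Pitch search method for a CELP vocoder using LSP | |
JPH07199997A (ja) | Method for processing speech signals in a speech signal processing system and method for shortening the processing time of that processing | |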
Van Schalkwyk et al. | Linear predictive speech coding at 2400 b/s | |
GB2336978A (en) | Improving speech intelligibility in presence of noise | |
CN117153196B (zh) | PCM speech signal processing method, apparatus, device, and medium | |
Xydeas | Differential encoding techniques applied to speech signals | |
JP2003323200A (ja) | Gradient descent optimization of linear prediction coefficients for speech coding | |
Yegnanarayana | Effect of noise and distortion in speech on parametric extraction | |
JPH0414813B2 (de) | ||
KR0138878B1 (ko) | Method for shortening pitch search processing time for a vocoder | |
JP2847730B2 (ja) | Speech coding system | |
Viswanathan et al. | Medium and low bit rate speech transmission | |
Chilton | Factors affecting the quality of linear predictive coding of speech at low bit-rates | |
GB2266213A (en) | Digital signal coding |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
AK | Designated contracting states | Designated state(s): DE FR GB |
17P | Request for examination filed | Effective date: 19851023 |
17Q | First examination report despatched | Effective date: 19861212 |
GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
AK | Designated contracting states | Kind code of ref document: B1 Designated state(s): DE FR GB |
REF | Corresponds to: | Ref document number: 3473373 Country of ref document: DE Date of ref document: 19880915 |
ET | Fr: translation filed | |
PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
26N | No opposition filed | |
REG | Reference to a national code | Ref country code: GB Ref legal event code: IF02 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to EPO] | Ref country code: GB Payment date: 20030915 Year of fee payment: 20 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to EPO] | Ref country code: FR Payment date: 20031003 Year of fee payment: 20 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to EPO] | Ref country code: DE Payment date: 20031031 Year of fee payment: 20 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to EPO] | Ref country code: GB Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION Effective date: 20041011 |
REG | Reference to a national code | Ref country code: GB Ref legal event code: PE20 |