US20010001141A1 - System and method for noise-compensated speech recognition - Google Patents

System and method for noise-compensated speech recognition

Publication number
US20010001141A1
Authority
US
United States
Prior art keywords
noise
speech recognition
speech
database
input signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US09/728,650
Inventor
Gilbert Sih
Ning Bi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to US09/728,650
Publication of US20010001141A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise

Definitions

  • the present invention relates to speech processing. More particularly, the present invention relates to a system and method for the automatic recognition of spoken words or phrases.
  • Speech recognition allows the driver to place telephone calls while continuously watching the road and maintaining both hands on the steering wheel. Handsfree carkits containing speech recognition will likely be a legislated requirement in future systems for safety reasons.
  • Speaker-dependent speech recognition, the most common type in use today, operates in two phases: a training phase and a recognition phase.
  • the speech recognition system prompts the user to speak each of the words in the vocabulary once or twice so it can learn the characteristics of the user's speech for these particular words or phrases.
  • the recognition vocabulary sizes are typically small (less than 50 words) and the speech recognition system will only achieve high recognition accuracy on the user that trained it.
  • An example of a vocabulary for a handsfree carkit system would include the digits on the keypad, the keywords “call”, “send”, “dial”, “cancel”, “clear”, “add”, “delete”, “history”, “program”, “yes”, and “no”, as well as 20 names of commonly-called coworkers, friends, or family members.
  • the user can initiate calls in the recognition phase by speaking the trained keywords. For example, if the name “John” was one of the trained names, the user can initiate a call to John by saying the phrase “Call John.”
  • the speech recognition system recognizes the words “Call” and “John”, and dials the number that the user had previously entered as John's telephone number.
  • Training unit 6 receives as input s(n), a set of digitized speech samples for the word or phrase to be trained.
  • Parameter determination unit 7 may implement any of a number of speech parameter determination techniques, many of which are well-known in the art.
  • An exemplary embodiment of a parameter determination technique is the vocoder encoder described in U.S. Pat. No.
  • An alternative embodiment of a parameter determination technique is a fast Fourier transform (FFT), where the N parameters are the N FFT coefficients. Other embodiments derive parameters based on the FFT coefficients.
  • Each spoken word or phrase produces one template of N parameters that is stored in template database 8.
  • After training is completed over M vocabulary words, template database 8 contains M templates, each containing N parameters. Template database 8 is stored in some type of non-volatile memory so that the templates stay resident when the power is turned off.
  • FIG. 2 is a block diagram of speech recognition unit 10, which operates during the recognition phase of a speaker-dependent speech recognition system.
  • Speech recognition unit 10 comprises template database 14, which in general will be template database 8 from training unit 6.
  • The input to speech recognition unit 10 is digitized input speech x(n), which is the speech to be recognized.
  • The input speech x(n) is passed into parameter determination block 12, which performs the same parameter determination technique as parameter determination block 7 of training unit 6.
  • Recognition template t(n) is then passed to pattern comparison block 16, which performs a pattern comparison between template t(n) and all the templates stored in template database 14.
  • The distances between template t(n) and each of the templates in template database 14 are forwarded to decision block 18, which selects from template database 14 the template that most closely matches recognition template t(n).
  • the output of decision block 18 is the decision as to which word in the vocabulary was spoken.
  • Recognition accuracy is a measure of how well a recognition system correctly recognizes spoken words or phrases in the vocabulary. For example, a recognition accuracy of 95% indicates that the recognition unit correctly recognizes words in the vocabulary 95 times out of 100. In a traditional speech recognition system, the recognition accuracy is severely degraded in the presence of noise.
  • the main reason for the loss of accuracy is that the training phase typically occurs in a quiet environment but the recognition typically occurs in a noisy environment.
  • a handsfree carkit speech recognition system is usually trained while the car is sitting in a garage or parked in the driveway, so the engine and air conditioning are not running and the windows are usually rolled up.
  • recognition is normally used while the car is moving, so the engine is running, there is road and wind noise present, the windows may be down, etc.
  • the recognition template does not form a good match with any of the templates obtained during training. This increases the likelihood of a recognition error or failure.
  • FIG. 3 illustrates a speech recognition unit 20 which must perform speech recognition in the presence of noise.
  • As shown in FIG. 3, summer 22 adds speech signal x(n) with noise signal w(n) to produce noise-corrupted speech signal r(n).
  • The noise-corrupted speech signal r(n) is input to parameter determination block 24, which produces noise-corrupted template t1(n).
  • Pattern comparison block 28 compares template t1(n) with all the templates in template database 26, which was constructed in a quiet environment. Since noise-corrupted template t1(n) does not exactly match any of the training templates, there is a high probability that the decision produced by decision block 30 may be a recognition error or failure.
  • the present invention is a system and method for the automatic recognition of spoken words or phrases in the presence of noise.
  • Speaker-dependent speech recognition systems operate in two phases: a training phase and a recognition phase.
  • In the training phase of a traditional speech recognition system, a user is prompted to speak all the words or phrases in a specified vocabulary.
  • the digitized speech samples for each word or phrase are processed to produce a template of parameters characterizing the spoken words.
  • the output of the training phase is a library of such templates.
  • In the recognition phase, the user speaks a particular word or phrase to initiate a desired action.
  • the spoken word or phrase is digitized and processed to produce a template, which is compared with all the templates produced during training. The closest match determines the action that will be performed.
  • the main impairment limiting the accuracy of speech recognition systems is the presence of noise.
  • the addition of noise during recognition severely degrades recognition accuracy, because this noise was not present during training when the template database was produced.
  • the invention recognizes the need to account for the particular noise conditions that are present at the time of recognition to improve recognition accuracy.
  • the improved speech processing system and method stores the digitized speech samples for each spoken word or phrase in the training phase.
  • the training phase output is therefore a digitized speech database.
  • the noise characteristics in the audio environment are continually monitored.
  • a noise-compensated template database is constructed by adding a noise signal to each of the signals in the speech database and performing parameter determination on each of the speech plus noise signals.
  • This added noise signal is an artificially-synthesized noise signal with characteristics similar to that of the actual noise.
  • An alternative embodiment is a recording of the time window of noise that occurred just before the user spoke the word or phrase to initiate recognition. Since the template database is constructed using the same type of noise that is present in the spoken word or phrase to be recognized, the speech recognition unit can find a good match between templates, improving the recognition accuracy.
  • FIG. 1 is a block diagram of a training unit of a speech recognition system
  • FIG. 2 is a block diagram of a speech recognition unit
  • FIG. 3 is a block diagram of a speech recognition unit which performs speech recognition on a speech input corrupted by noise;
  • FIG. 4 is a block diagram of an improved training unit of a speech recognition system.
  • FIG. 5 is a block diagram of an exemplary improved speech recognition unit.
  • This invention provides a system and method for improving speech recognition accuracy when noise is present. It takes advantage of the recent advances in computation power and memory integration and modifies the training and recognition phases to account for the presence of noise during recognition.
  • the function of a speech recognition unit is to find the closest match to a recognition template that is computed on noise-corrupted speech. Since the characteristics of the noise may vary with time and location, the invention recognizes that the best time to construct the template database is during the recognition phase.
  • FIG. 4 shows a block diagram of an improved training unit 40 of a speech recognition system.
  • training unit 40 is modified to eliminate the parameter determination step. Instead of storing templates of parameters, digitized speech samples of the actual words and phrases are stored.
  • Thus, training unit 40 receives as input speech samples s(n) and stores digitized speech samples s(n) in speech database 42.
  • After training, speech database 42 contains M speech signals, where M is the number of words in the vocabulary.
  • FIG. 5 shows a block diagram of an improved speech recognition unit 50 for use in conjunction with training unit 40.
  • the input to speech recognition unit 50 is noise corrupted speech signal r(n).
  • Noise-corrupted speech signal r(n) is generated by summer 52 adding speech signal x(n) with noise signal w(n).
  • summer 52 is not a physical element of the system, but is an artifact of a noisy environment.
  • Speech recognition unit 50 comprises speech database 60, which contains the digitized speech samples that were recorded during the training phase. Speech recognition unit 50 also comprises parameter determination block 54, through which noise-corrupted speech signal r(n) is passed to produce noise-corrupted template t1(n). As in a traditional voice recognition system, parameter determination block 54 may implement any of a number of speech parameter determination techniques.
  • An exemplary parameter determination technique uses linear predictive coding (LPC) analysis techniques.
  • LPC analysis techniques model the vocal tract as a digital filter.
  • LPC cepstral coefficients c(m) may be computed to be the parameters for representing the speech signal.
  • the coefficients c(m) are computed using the following steps. First, the noise-corrupted speech signal r(n) is windowed over a frame of speech samples by applying a window function v(n):
  • In the exemplary embodiment, the window function v(n) is a Hamming window and the frame size N is equal to 160.
  • In the exemplary embodiment, P, the number of autocorrelation coefficients to be computed, is equal to the order of the LPC predictor, which is 10.
  • the LPC coefficients are then computed directly from the autocorrelation values using Durbin's recursion algorithm.
  • the algorithm may be stated as follows:
  • the signal r(n) is passed to speech detection block 56 which determines the presence or absence of speech.
  • Speech detection block 56 may determine the presence or absence of speech using any of a number of techniques.
  • One such method is disclosed in the aforementioned U.S. Pat. No. 5,414,796, entitled “VARIABLE RATE VOCODER.” This technique analyzes the level of speech activity to make the determination regarding the presence or absence of speech.
  • the level of speech activity is based on the energy of the signal in comparison with the background noise energy estimate.
  • the energy E(n) is computed for each frame, which in a preferred embodiment is composed of 160 samples.
  • The background noise energy estimate B(n) may then be calculated using the equation:
  • $T_1(B(n)) = -(5.544613 \times 10^{-6})\,B^2(n) + 4.047152\,B(n) + 362$  (14)
  • $T_1(B(n)) = -(9.043945 \times 10^{-8})\,B^2(n) + 3.535748\,B(n) - 62071$  (17)
  • $T_2(B(n)) = -(1.986007 \times 10^{-7})\,B^2(n) + 4.941658\,B(n) + 223951$  (18)
  • This speech detection method indicates the presence of speech when energy E(n) is greater than threshold T2(B(n)), and indicates the absence of speech when energy E(n) is less than threshold T2(B(n)).
  • this method can be extended to compute background noise energy estimates and thresholds in two or more frequency bands. Additionally, it should be understood that the values provided in Equations (13)-(19) are experimentally determined, and may be modified depending on the circumstances.
  • When speech detection block 56 determines that speech is absent, it sends a control signal that enables noise analysis, modeling, and synthesis block 58. It should be noted that in the absence of speech, the received signal r(n) is the same as the noise signal w(n).
  • When noise analysis, modeling, and synthesis block 58 is enabled, it analyzes the characteristics of noise signal r(n), models it, and synthesizes a noise signal w1(n) that has similar characteristics to the actual noise w(n).
  • An exemplary embodiment for performing noise analysis, modeling, and synthesis is disclosed in U.S. Pat. No. 5,646,991, entitled “NOISE REPLACEMENT SYSTEM AND METHOD IN AN ECHO CANCELLER,” which is assigned to the assignee of the present invention and incorporated by reference herein.
  • This method performs noise analysis by passing the noise signal r(n) through a prediction error filter given by:
  • The output is the synthesized noise w1(n).
  • The synthesized noise w1(n) is added to each set of digitized speech samples in speech database 60 by summer 62 to produce sets of synthesized noise-corrupted speech samples. Then, each set of synthesized noise-corrupted speech samples is passed through parameter determination block 64, which generates a set of parameters for each set of synthesized noise-corrupted speech samples using the same parameter determination technique as that used in parameter determination block 54.
  • Parameter determination block 64 produces a template of parameters for each set of speech samples, and the templates are stored in noise-compensated template database 66.
  • Noise-compensated template database 66 is a set of templates that is constructed as if traditional training had taken place in the same type of noise that is present during recognition.
  • There are many possible methods for producing estimated noise w1(n) in addition to the method disclosed in U.S. Pat. No. 5,646,991.
  • An alternative embodiment is to simply record a time window of the actual noise present when the user is silent and use this noise signal as the estimated noise w1(n).
  • the time window of noise recorded right before the word or phrase to be recognized is spoken is an exemplary embodiment of this method.
  • Still another method is to average various windows of noise obtained over a specified period.
  • Pattern comparison block 68 compares the noise-corrupted template t1(n) with all the templates in noise-compensated template database 66. Since the noise effects are included in the templates of noise-compensated template database 66, decision block 70 is able to find a good match for t1(n). By accounting for the effects of noise in this manner, the accuracy of the speech recognition system is improved.
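The speech detection and noise-recording steps described above can be illustrated with a minimal Python sketch. It is not the method of U.S. Pat. No. 5,414,796: the class name `NoiseWindowRecorder`, the caller-supplied `threshold_fn`, the first-frame initialization, and the exponential smoothing of the background estimate are all assumptions made for illustration. The sketch flags a 160-sample frame as speech when its energy exceeds a threshold computed from the background estimate, and otherwise keeps the frame as part of the most recent noise window.

```python
from collections import deque

import numpy as np

FRAME = 160  # samples per frame, as in the preferred embodiment


def frame_energy(frame):
    """Energy E(n) of one frame: the sum of squared samples."""
    frame = np.asarray(frame, dtype=float)
    return float(np.dot(frame, frame))


class NoiseWindowRecorder:
    """Hypothetical helper: detects speech and keeps recent noise-only frames."""

    def __init__(self, threshold_fn, max_frames=10):
        self.threshold_fn = threshold_fn  # maps background estimate to a threshold
        self.background = None            # background noise energy estimate
        self.frames = deque(maxlen=max_frames)

    def push(self, frame):
        """Process one frame and return True if it is classified as speech."""
        energy = frame_energy(frame)
        if self.background is None:
            # Assumption: the very first frame contains noise only.
            self.background = energy
        if energy > self.threshold_fn(self.background):
            return True
        # Speech absent, so the received signal equals the noise: keep it.
        self.frames.append(np.asarray(frame, dtype=float))
        # Hypothetical smoothing; a stand-in for the source's own update rule.
        self.background = 0.9 * self.background + 0.1 * energy
        return False

    def noise_estimate(self):
        """Concatenate the stored noise frames into one estimated noise signal."""
        if not self.frames:
            return np.zeros(0)
        return np.concatenate(list(self.frames))


recorder = NoiseWindowRecorder(threshold_fn=lambda b: 4.0 * b)
quiet_is_speech = recorder.push(0.01 * np.ones(FRAME))
loud_is_speech = recorder.push(np.ones(FRAME))
noise = recorder.noise_estimate()
```

Because the deque has a fixed maximum length, the stored window always holds the frames observed just before the most recent speech onset, which matches the recorded-noise alternative described above.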

Abstract

A system and method for improving speech recognition accuracy in the presence of noise is provided. The speech recognition training unit is modified to store digitized speech samples into a speech database that can be accessed at recognition time. The improved recognition unit comprises a noise analysis, modeling, and synthesis unit which continually analyzes the noise characteristics present in the audio environment and produces an estimated noise signal with similar characteristics. The recognition unit then constructs a noise-compensated template database by adding the estimated noise signal to each of the speech samples in the speech database and performing parameter determination on the resulting sums. This procedure accounts for the presence of noise in the recognition phase by retraining all the templates using an estimated noise signal with similar characteristics as the actual noise signal that corrupted the word to be recognized. This method improves the likelihood of a good template match, which increases the recognition accuracy.

Description

    CROSS REFERENCE
  • 1. This application is a continuation of co-pending application Ser. No. 09/018,257, filed Feb. 4, 1998, entitled “System and Method for Noise-Compensated Speech Recognition.”
  • BACKGROUND OF THE INVENTION
  • 2. I. Field of the Invention
  • 3. The present invention relates to speech processing. More particularly, the present invention relates to a system and method for the automatic recognition of spoken words or phrases.
  • 4. II. Description of the Related Art
  • 5. Digital processing of speech signals has found widespread use, particularly in cellular telephone and PCS applications. One digital speech processing technique is that of speech recognition. The use of speech recognition is gaining importance due to safety reasons. For example, speech recognition may be used to replace the manual task of pushing buttons on a cellular phone keypad. This is especially important when a user is initiating a telephone call while driving a car. When using a phone without speech recognition, the driver must remove one hand from the steering wheel and look at the phone keypad while pushing the buttons to dial the call. These acts increase the likelihood of a car accident. Speech recognition allows the driver to place telephone calls while continuously watching the road and maintaining both hands on the steering wheel. Handsfree carkits containing speech recognition will likely be a legislated requirement in future systems for safety reasons.
  • 6. Speaker-dependent speech recognition, the most common type in use today, operates in two phases: a training phase and a recognition phase. In the training phase, the speech recognition system prompts the user to speak each of the words in the vocabulary once or twice so it can learn the characteristics of the user's speech for these particular words or phrases. The recognition vocabulary sizes are typically small (less than 50 words) and the speech recognition system will only achieve high recognition accuracy on the user that trained it. An example of a vocabulary for a handsfree carkit system would include the digits on the keypad, the keywords “call”, “send”, “dial”, “cancel”, “clear”, “add”, “delete”, “history”, “program”, “yes”, and “no”, as well as 20 names of commonly-called coworkers, friends, or family members. Once training is complete, the user can initiate calls in the recognition phase by speaking the trained keywords. For example, if the name “John” was one of the trained names, the user can initiate a call to John by saying the phrase “Call John.” The speech recognition system recognizes the words “Call” and “John”, and dials the number that the user had previously entered as John's telephone number.
  • 7. A block diagram of a training unit 6 of a speaker-dependent speech recognition system is shown in FIG. 1. Training unit 6 receives as input s(n), a set of digitized speech samples for the word or phrase to be trained. The speech signal s(n) is passed through parameter determination block 7, which produces a template of N parameters {p(n), n = 1, ..., N} capturing the characteristics of the user's pronunciation of the particular word or phrase. Parameter determination block 7 may implement any of a number of speech parameter determination techniques, many of which are well-known in the art. An exemplary embodiment of a parameter determination technique is the vocoder encoder described in U.S. Pat. No. 5,414,796, entitled “VARIABLE RATE VOCODER,” which is assigned to the assignee of the present invention and incorporated by reference herein. An alternative embodiment of a parameter determination technique is a fast Fourier transform (FFT), where the N parameters are the N FFT coefficients. Other embodiments derive parameters based on the FFT coefficients. Each spoken word or phrase produces one template of N parameters that is stored in template database 8. After training is completed over M vocabulary words, template database 8 contains M templates, each containing N parameters. Template database 8 is stored in some type of non-volatile memory so that the templates stay resident when the power is turned off.
  • 8. FIG. 2 is a block diagram of speech recognition unit 10, which operates during the recognition phase of a speaker-dependent speech recognition system. Speech recognition unit 10 comprises template database 14, which in general will be template database 8 from training unit 6. The input to speech recognition unit 10 is digitized input speech x(n), which is the speech to be recognized. The input speech x(n) is passed into parameter determination block 12, which performs the same parameter determination technique as parameter determination block 7 of training unit 6. Parameter determination block 12 produces a recognition template of N parameters {t(n), n = 1, ..., N} that models the characteristics of input speech x(n). Recognition template t(n) is then passed to pattern comparison block 16, which performs a pattern comparison between template t(n) and all the templates stored in template database 14. The distances between template t(n) and each of the templates in template database 14 are forwarded to decision block 18, which selects from template database 14 the template that most closely matches recognition template t(n). The output of decision block 18 is the decision as to which word in the vocabulary was spoken.
  • 9. Recognition accuracy is a measure of how well a recognition system correctly recognizes spoken words or phrases in the vocabulary. For example, a recognition accuracy of 95% indicates that the recognition unit correctly recognizes words in the vocabulary 95 times out of 100. In a traditional speech recognition system, the recognition accuracy is severely degraded in the presence of noise. The main reason for the loss of accuracy is that the training phase typically occurs in a quiet environment but the recognition typically occurs in a noisy environment. For example, a handsfree carkit speech recognition system is usually trained while the car is sitting in a garage or parked in the driveway, so the engine and air conditioning are not running and the windows are usually rolled up. However, recognition is normally used while the car is moving, so the engine is running, there is road and wind noise present, the windows may be down, etc. As a result of the disparity in noise level between the training and recognition phases, the recognition template does not form a good match with any of the templates obtained during training. This increases the likelihood of a recognition error or failure.
  • 10. FIG. 3 illustrates a speech recognition unit 20 which must perform speech recognition in the presence of noise. As shown in FIG. 3, summer 22 adds speech signal x(n) with noise signal w(n) to produce noise-corrupted speech signal r(n). It should be understood that summer 22 is not a physical element of the system, but is an artifact of a noisy environment. The noise-corrupted speech signal r(n) is input to parameter determination block 24, which produces noise-corrupted template t1(n). Pattern comparison block 28 compares template t1(n) with all the templates in template database 26, which was constructed in a quiet environment. Since noise-corrupted template t1(n) does not exactly match any of the training templates, there is a high probability that the decision produced by decision block 30 may be a recognition error or failure.
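The traditional recognition flow of FIGS. 2 and 3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `fft_parameters` and `recognize` are hypothetical, FFT magnitudes stand in for the parameter determination technique, Euclidean distance stands in for the pattern comparison (the text does not fix a distance measure), and pure tones stand in for spoken words. It shows that a template computed on noise-corrupted speech no longer matches the quiet-environment template exactly.

```python
import numpy as np


def fft_parameters(samples, n_params=16):
    """Parameter determination: the first n_params FFT magnitudes (illustrative)."""
    spectrum = np.fft.rfft(np.asarray(samples, dtype=float))
    return np.abs(spectrum[:n_params])


def recognize(template, template_db):
    """Pattern comparison and decision: pick the word with the smallest distance."""
    distances = {word: float(np.linalg.norm(template - reference))
                 for word, reference in template_db.items()}
    return min(distances, key=distances.get), distances


n = np.arange(160)
# Stand-ins for the digitized training utterances s(n) of two vocabulary words.
training = {
    "call": np.sin(2 * np.pi * 5 * n / 160),
    "dial": np.sin(2 * np.pi * 9 * n / 160),
}
template_db = {word: fft_parameters(s) for word, s in training.items()}

# Recognition in a quiet environment: x(n) matches its training template.
clean_word, clean_dist = recognize(fft_parameters(training["call"]), template_db)

# Recognition in noise: r(n) = x(n) + w(n) drifts away from every stored template.
rng = np.random.default_rng(0)
noisy = training["call"] + 0.5 * rng.standard_normal(160)
noisy_word, noisy_dist = recognize(fft_parameters(noisy), template_db)
```

Comparing `clean_dist` with `noisy_dist` shows the growing distance to the correct template, which is the mismatch the background section describes.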
  • SUMMARY OF THE INVENTION
  • 11. The present invention is a system and method for the automatic recognition of spoken words or phrases in the presence of noise. Speaker-dependent speech recognition systems operate in two phases: a training phase and a recognition phase. In the training phase of a traditional speech recognition system, a user is prompted to speak all the words or phrases in a specified vocabulary. The digitized speech samples for each word or phrase are processed to produce a template of parameters characterizing the spoken words. The output of the training phase is a library of such templates. In the recognition phase, the user speaks a particular word or phrase to initiate a desired action. The spoken word or phrase is digitized and processed to produce a template, which is compared with all the templates produced during training. The closest match determines the action that will be performed. The main impairment limiting the accuracy of speech recognition systems is the presence of noise. The addition of noise during recognition severely degrades recognition accuracy, because this noise was not present during training when the template database was produced. The invention recognizes the need to account for the particular noise conditions that are present at the time of recognition to improve recognition accuracy.
  • 12. Instead of storing templates of parameters, the improved speech processing system and method stores the digitized speech samples for each spoken word or phrase in the training phase. The training phase output is therefore a digitized speech database. In the recognition phase, the noise characteristics in the audio environment are continually monitored. When the user speaks a word or phrase to initiate recognition, a noise-compensated template database is constructed by adding a noise signal to each of the signals in the speech database and performing parameter determination on each of the speech plus noise signals. One embodiment of this added noise signal is an artificially-synthesized noise signal with characteristics similar to that of the actual noise. An alternative embodiment is a recording of the time window of noise that occurred just before the user spoke the word or phrase to initiate recognition. Since the template database is constructed using the same type of noise that is present in the spoken word or phrase to be recognized, the speech recognition unit can find a good match between templates, improving the recognition accuracy.
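The construction of the noise-compensated template database summarized above can be sketched in Python. The sketch makes several assumptions: FFT magnitudes serve as the parameters, the function names are hypothetical, pure tones stand in for stored utterances, and the noise is a low-frequency hum whose recorded window differs in phase from the noise that corrupts the spoken word. The noise estimate is tiled or truncated to the utterance length with `np.resize`, which is also an illustrative choice.

```python
import numpy as np


def fft_parameters(samples, n_params=16):
    """Parameter determination: the first n_params FFT magnitudes (illustrative)."""
    spectrum = np.fft.rfft(np.asarray(samples, dtype=float))
    return np.abs(spectrum[:n_params])


def build_noise_compensated_db(speech_db, noise):
    """Add the noise estimate to each stored utterance, then parameterize the sum."""
    compensated = {}
    for word, samples in speech_db.items():
        samples = np.asarray(samples, dtype=float)
        # Assumption: repeat or truncate the noise window to the utterance length.
        noise_fit = np.resize(noise, samples.shape)
        compensated[word] = fft_parameters(samples + noise_fit)
    return compensated


def closest(template, template_db):
    """Return the vocabulary word whose template is nearest to the input template."""
    return min(template_db, key=lambda w: np.linalg.norm(template - template_db[w]))


n = np.arange(160)
# Speech database: digitized samples stored during training (tones as stand-ins).
speech_db = {
    "call": np.sin(2 * np.pi * 5 * n / 160),
    "dial": np.sin(2 * np.pi * 9 * n / 160),
}
# Noise recorded just before the word, and the noise actually corrupting it.
recorded_noise = 0.8 * np.sin(2 * np.pi * 2 * n / 160 + 0.3)
actual_noise = 0.8 * np.sin(2 * np.pi * 2 * n / 160 + 1.0)

quiet_db = {word: fft_parameters(s) for word, s in speech_db.items()}
compensated_db = build_noise_compensated_db(speech_db, recorded_noise)

# Noise-corrupted input r(n) = x(n) + w(n) and its template t1(n).
t1 = fft_parameters(speech_db["call"] + actual_noise)
dist_quiet = float(np.linalg.norm(t1 - quiet_db["call"]))
dist_compensated = float(np.linalg.norm(t1 - compensated_db["call"]))
decision = closest(t1, compensated_db)
```

In this idealized setting the compensated template matches far more closely than the quiet one, even though the recorded noise is not sample-for-sample identical to the actual noise. Real noise estimates would match less perfectly.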
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • 13. The features, objects, and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings in which like reference characters identify correspondingly throughout and wherein:
  • 14. FIG. 1 is a block diagram of a training unit of a speech recognition system;
  • 15. FIG. 2 is a block diagram of a speech recognition unit;
  • 16. FIG. 3 is a block diagram of a speech recognition unit which performs speech recognition on a speech input corrupted by noise;
  • 17. FIG. 4 is a block diagram of an improved training unit of a speech recognition system; and
  • 18. FIG. 5 is a block diagram of an exemplary improved speech recognition unit.
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • 19. This invention provides a system and method for improving speech recognition accuracy when noise is present. It takes advantage of the recent advances in computation power and memory integration and modifies the training and recognition phases to account for the presence of noise during recognition. The function of a speech recognition unit is to find the closest match to a recognition template that is computed on noise-corrupted speech. Since the characteristics of the noise may vary with time and location, the invention recognizes that the best time to construct the template database is during the recognition phase.
  • 20. FIG. 4 shows a block diagram of an improved training unit 40 of a speech recognition system. As opposed to the traditional training method shown in FIG. 1, training unit 40 is modified to eliminate the parameter determination step. Instead of storing templates of parameters, digitized speech samples of the actual words and phrases are stored. Thus, training unit 40 receives as input speech samples s(n), and stores digitized speech samples s(n) in speech database 42. After training, speech database 42 contains M speech signals, where M is the number of words in the vocabulary. Whereas the previous system and method of performing parameter determination loses information about the speech characteristics by only storing speech parameters, this system and method may preserve all the speech information for use in the recognition phase.
  • 21.FIG. 5 shows a block diagram of an improved speech recognition unit 50 for use in conjunction with training unit 40. The input to speech recognition unit 50 is noise corrupted speech signal r(n). Noise-corrupted speech signal r(n) is generated by summer 52 adding speech signal x(n) with noise signal w(n). As before, summer 52 is not a physical element of the system, but is an artifact of a noisy environment.
  • 22. Speech recognition unit 50 comprises speech database 60, which contains the digitized speech samples that were recorded during the training phase. Speech recognition unit 50 also comprises parameter determination block 54, through which noise-corrupted speech signal r(n) is passed to produce noise-corrupted template t1(n). As in a traditional voice recognition system, parameter determination block 54 may implement any of a number of speech parameter determination techniques.
  • 23. An exemplary parameter determination technique uses linear predictive coding (LPC) analysis techniques. LPC analysis techniques model the vocal tract as a digital filter. Using LPC analysis, LPC cepstral coefficients c(m) may be computed to be the parameters for representing the speech signal. The coefficients c(m) are computed using the following steps. First, the noise-corrupted speech signal r(n) is windowed over a frame of speech samples by applying a window function v(n):
  • y(n)=r(n)v(n) 0<=n<=N−1  (1)
  • 24. In the exemplary embodiment, the window function v(n) is a Hamming window and the frame size N is equal to 160. Next, the autocorrelation coefficients are computed on the windowed samples using the equation:
  • $R(k) = \sum_{m=0}^{N-k} y(m)\, y(m+k)$, $k = 1, 2, \ldots, P$  (2)
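As an illustration of the windowing and autocorrelation steps, the following Python sketch processes a single frame. The function names are hypothetical and not part of the disclosure. The sketch assumes the common 0.54/0.46 form of the Hamming window, treats samples beyond the end of the frame as zero (so the sum runs only over index pairs inside the frame), and also returns lag 0, since R(0) is needed by the recursion that follows.

```python
import math


def hamming_window(N):
    """Hamming window v(n) for 0 <= n <= N-1 (common 0.54/0.46 form, assumed here)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]


def windowed_frame(r, v):
    """Apply the window sample by sample: y(n) = r(n) * v(n)."""
    return [rn * vn for rn, vn in zip(r, v)]


def autocorrelation(y, P):
    """Return [R(0), R(1), ..., R(P)] for one windowed frame.

    Samples outside the frame are taken to be zero (an assumption),
    so each sum only covers pairs (m, m + k) that lie inside the frame.
    """
    N = len(y)
    return [sum(y[m] * y[m + k] for m in range(N - k)) for k in range(P + 1)]


if __name__ == "__main__":
    N, P = 160, 10  # frame size and predictor order of the exemplary embodiment
    r = [math.sin(2.0 * math.pi * 5.0 * n / N) for n in range(N)]  # toy input frame
    y = windowed_frame(r, hamming_window(N))
    print(autocorrelation(y, P)[:3])
```

Returning lag 0 alongside lags 1 through P keeps the result directly usable as input to Durbin's recursion.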
  • 25. In the exemplary embodiment, P, the number of autocorrelation coefficients to be computed, is equal to the order of the LPC predictor, which is 10. The LPC coefficients are then computed directly from the autocorrelation values using Durbin's recursion algorithm. The algorithm may be stated as follows:
  • 1. $E^{(0)} = R(0)$, $i = 1$  (3)
  • 26. 2. $k_i = \left\{ R(i) - \sum_{j=1}^{i-1} \alpha_j^{(i-1)} R(i-j) \right\} / E^{(i-1)}$  (4)
  • 3. $\alpha_i^{(i)} = k_i$  (5)
  • 27. 4. $\alpha_j^{(i)} = \alpha_j^{(i-1)} - k_i\, \alpha_{i-j}^{(i-1)}$, $1 \le j \le i-1$  (6)
  • 5. $E^{(i)} = (1 - k_i^2)\, E^{(i-1)}$  (7)
  • 6. If $i < P$, then go to step 2 with $i = i + 1$.  (8)
  • 7. The final solution for the LPC coefficients is given as $a_j = \alpha_j^{(P)}$, $1 \le j \le P$  (9)
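The recursion in steps 1 through 7 can be sketched in Python as follows. This is a minimal illustration rather than the implementation of the disclosure: the function name `durbin` is hypothetical, the input list is assumed to hold R(0) through R(P), and the check for a non-positive prediction error is an added safeguard not described in the text.

```python
def durbin(R, P):
    """Durbin's recursion: autocorrelation values R[0..P] -> LPC coefficients.

    Returns (a, E), where a[j-1] holds a_j for 1 <= j <= P and E is the
    final prediction error energy E^(P).
    """
    E = R[0]                      # step 1: E^(0) = R(0)
    alpha = [0.0] * (P + 1)       # alpha[j] holds alpha_j of the previous order
    for i in range(1, P + 1):
        if E <= 0.0:
            # Added safeguard (assumption): a silent or degenerate frame
            # would otherwise cause a division by zero in step 2.
            raise ValueError("prediction error energy must be positive")
        # step 2: reflection coefficient k_i
        k = (R[i] - sum(alpha[j] * R[i - j] for j in range(1, i))) / E
        new_alpha = alpha[:]
        new_alpha[i] = k          # step 3: alpha_i^(i) = k_i
        for j in range(1, i):     # step 4: update the lower-order coefficients
            new_alpha[j] = alpha[j] - k * alpha[i - j]
        alpha = new_alpha
        E = (1.0 - k * k) * E     # step 5: update the prediction error
    return alpha[1:], E           # step 7: a_j = alpha_j^(P)


if __name__ == "__main__":
    # Autocorrelation of a first-order process with R(k) = 0.5**k:
    # the recursion should recover a_1 = 0.5 and a_2 = 0.
    print(durbin([1.0, 0.5, 0.25], 2))
```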
  • 28. The LPC coefficients are then converted to LPC cepstral coefficients using the following equations:
  • $c(0) = \ln(R(0))$  (10)
  • $c(m) = a_m + \sum_{k=1}^{m-1} \left(\frac{k}{m}\right) c_k\, a_{m-k}$, $1 \le m \le P$  (11)
  • $c(m) = \sum_{k=1}^{m-1} \left(\frac{k}{m}\right) c_k\, a_{m-k}$, $m > P$  (12)
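The conversion from LPC coefficients to cepstral coefficients might be sketched in Python as shown below. The function name and the `num_coeffs` parameter are hypothetical. The sketch assumes that a_j is taken as zero for j > P, which lets a single loop cover both the case m <= P and the case m > P.

```python
import math


def lpc_to_cepstrum(a, R0, num_coeffs):
    """Convert LPC coefficients to LPC cepstral coefficients.

    a          : list where a[j-1] holds a_j, 1 <= j <= P
    R0         : autocorrelation at lag 0 (must be positive)
    num_coeffs : highest cepstral index to compute (may exceed P)
    Returns [c(0), c(1), ..., c(num_coeffs)].
    """
    P = len(a)

    def a_at(j):
        # a_j, with a_j = 0 outside 1..P (assumption used to merge both cases)
        return a[j - 1] if 1 <= j <= P else 0.0

    c = [0.0] * (num_coeffs + 1)
    c[0] = math.log(R0)
    for m in range(1, num_coeffs + 1):
        acc = sum((k / m) * c[k] * a_at(m - k) for k in range(1, m))
        c[m] = a_at(m) + acc      # a_at(m) vanishes when m > P
    return c


if __name__ == "__main__":
    # For a single coefficient a_1 = 0.5, c(m) should equal 0.5**m / m.
    print(lpc_to_cepstrum([0.5], 1.0, 3))
```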
  • 29. It should be understood that other techniques may be used for parameter determination instead of the LPC cepstral coefficients.
  • 30. In addition, the signal r(n) is passed to speech detection block 56 which determines the presence or absence of speech. Speech detection block 56 may determine the presence or absence of speech using any of a number of techniques. One such method is disclosed in the aforementioned U.S. Pat. No. 5,414,796, entitled “VARIABLE RATE VOCODER.” This technique analyzes the level of speech activity to make the determination regarding the presence or absence of speech. The level of speech activity is based on the energy of the signal in comparison with the background noise energy estimate. First, the energy E(n) is computed for each frame, which in a preferred embodiment is composed of 160 samples. The background noise energy estimate B(n) may then be calculated using the equation:
  • B(n)=min[E(n), 5059644, max (1.00547*B(n−1), B(n−1)+1)].  (13)
  • 31. If B(n)<160000, three thresholds are computed using B(n) as follows:
  • $T_1(B(n)) = -(5.544613 \times 10^{-6})\, B^2(n) + 4.047152\, B(n) + 362$  (14)
  • $T_2(B(n)) = -(1.529733 \times 10^{-5})\, B^2(n) + 8.750045\, B(n) + 1136$  (15)
  • $T_3(B(n)) = -(3.957050 \times 10^{-5})\, B^2(n) + 18.89962\, B(n) + 3347$  (16)
  • 32. If B(n)>160000, the three thresholds are computed as:
  • $T_1(B(n)) = -(9.043945 \times 10^{-8})\, B^2(n) + 3.535748\, B(n) - 62071$  (17)
  • $T_2(B(n)) = -(1.986007 \times 10^{-7})\, B^2(n) + 4.941658\, B(n) + 223951$  (18)
  • $T_3(B(n)) = -(4.838477 \times 10^{-7})\, B^2(n) + 8.630020\, B(n) + 645864$  (19)
  • 33. This speech detection method indicates the presence of speech when energy E(n) is greater than threshold T2(B(n)), and indicates the absence of speech when energy E(n) is less than threshold T2(B(n)). In an alternative embodiment, this method can be extended to compute background noise energy estimates and thresholds in two or more frequency bands. Additionally, it should be understood that the values provided in Equations (13)-(19) are experimentally determined, and may be modified depending on the circumstances.
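The speech detection rule built from Equations (13) through (19) can be illustrated with the Python sketch below. The function names are hypothetical. Two points are assumptions rather than statements of the text: the frame energy is computed as the sum of squared samples, and a background estimate exactly equal to 160000, which the text leaves unspecified, is handled by the second set of thresholds.

```python
def frame_energy(frame):
    """Frame energy E(n), assumed here to be the sum of squared samples."""
    return sum(x * x for x in frame)


def update_background(E, B_prev):
    """Equation (13): background noise energy estimate B(n)."""
    return min(E, 5059644, max(1.00547 * B_prev, B_prev + 1))


def thresholds(B):
    """Equations (14)-(19): return (T1, T2, T3) for background estimate B."""
    if B < 160000:
        T1 = -5.544613e-6 * B * B + 4.047152 * B + 362
        T2 = -1.529733e-5 * B * B + 8.750045 * B + 1136
        T3 = -3.957050e-5 * B * B + 18.89962 * B + 3347
    else:
        # B == 160000 is not covered by the text; it is placed here by assumption.
        T1 = -9.043945e-8 * B * B + 3.535748 * B - 62071
        T2 = -1.986007e-7 * B * B + 4.941658 * B + 223951
        T3 = -4.838477e-7 * B * B + 8.630020 * B + 645864
    return T1, T2, T3


def speech_present(E, B):
    """Speech is indicated when the frame energy exceeds threshold T2."""
    return E > thresholds(B)[1]


if __name__ == "__main__":
    B = 500.0
    for E in (450.0, 480.0, 90000.0):   # toy frame energies
        B = update_background(E, B)
        print(E, B, speech_present(E, B))
```

The constants are the experimentally determined values quoted in the equations and would need retuning for a different signal scale.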
  • 34. When speech detection block 56 determines that speech is absent, it sends a control signal that enables noise analysis, modeling, and synthesis block 58. It should be noted that in the absence of speech, the received signal r(n) is the same as the noise signal w(n).
  • 35. When noise analysis, modeling, and synthesis block 58 is enabled, it analyzes the characteristics of noise signal r(n), models it, and synthesizes a noise signal w1(n) that has similar characteristics to the actual noise w(n). An exemplary embodiment for performing noise analysis, modeling, and synthesis is disclosed in U.S. Pat. No. 5,646,991, entitled “NOISE REPLACEMENT SYSTEM AND METHOD IN AN ECHO CANCELLER,” which is assigned to the assignee of the present invention and incorporated by reference herein. This method performs noise analysis by passing the noise signal r(n) through a prediction error filter given by:
  • $A(z) = 1 - \sum_{i=1}^{P} a_i z^{-i}$  (20)
  • 36. where P, the order of the predictor, is 5 in the exemplary embodiment. The LPC coefficients a_i are computed as explained earlier using equations (1) through (9). Once the LPC coefficients are obtained, synthesized noise samples can be generated with the same spectral characteristics by passing white noise through the noise synthesis filter given by:
  • $\frac{1}{A(z)} = \frac{1}{1 - \sum_{i=1}^{P} a_i z^{-i}}$  (21)
  • 37. which is just the inverse of the filter used for noise analysis. After applying a scaling factor to each of the synthesized noise samples to make the synthesized noise energy equal to the actual noise energy, the output is the synthesized noise w1(n).
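A minimal Python sketch of the noise synthesis step is given below. The function name, the `seed` parameter, and the use of Gaussian white noise are assumptions made for illustration. The sketch also applies one common gain to the whole block so that its total energy matches a supplied target, which is one possible reading of the scaling step.

```python
import math
import random


def synthesize_noise(a, target_energy, num_samples, seed=0):
    """Pass white noise through the all-pole filter 1/A(z) and rescale.

    a             : list where a[i-1] holds LPC coefficient a_i, 1 <= i <= P
    target_energy : energy (sum of squares) the output block should have
    num_samples   : number of synthesized samples to produce
    seed          : hypothetical parameter, only for repeatable output
    """
    rng = random.Random(seed)
    P = len(a)
    out = []
    for n in range(num_samples):
        e = rng.gauss(0.0, 1.0)   # white excitation (Gaussian by assumption)
        # 1/A(z) as a difference equation: w(n) = e(n) + sum_i a_i * w(n - i)
        feedback = sum(a[i - 1] * out[n - i] for i in range(1, P + 1) if n - i >= 0)
        out.append(e + feedback)
    energy = sum(x * x for x in out)
    gain = math.sqrt(target_energy / energy) if energy > 0.0 else 0.0
    return [gain * x for x in out]


if __name__ == "__main__":
    w1 = synthesize_noise([0.5], target_energy=100.0, num_samples=160)
    print(len(w1), sum(x * x for x in w1))
```

The filter is only stable for suitable coefficients; coefficients obtained from the autocorrelation method, as in the text, normally satisfy this.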
  • 38. The synthesized noise w1(n) is added to each set of digitized speech samples in speech database 60 by summer 62 to produce sets of synthesized noise corrupted speech samples. Then, each set of synthesized noise corrupted speech samples is passed through parameter determination block 64, which generates a set of parameters for each set of synthesized noise corrupted speech samples using the same parameter determination technique as that used in parameter determination block 54. Parameter determination block 64 produces a template of parameters for each set of speech samples, and the templates are stored in noise-compensated template database 66. Noise-compensated template database 66 is a set of templates that is constructed as if traditional training had taken place in the same type of noise that is present during recognition. Note that there are many possible methods for producing estimated noise w1(n) in addition to the method disclosed in U.S. Pat. No. 5,646,991. An alternative embodiment is to simply record a time window of the actual noise present when the user is silent and use this noise signal as the estimated noise w1(n). The time window of noise recorded right before the word or phrase to be recognized is spoken is an exemplary embodiment of this method. Still another method is to average various windows of noise obtained over a specified period.
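The construction of the noise-compensated template database can be sketched in Python as follows. All names are hypothetical. The parameter determination technique is passed in as a callable so that the same function can be used for the input signal and for the stored samples, and the noise block is repeated cyclically when a stored utterance is longer than the noise, which is an assumption of this sketch.

```python
def build_noise_compensated_templates(speech_db, noise, param_fn):
    """Add estimated noise to every stored utterance and extract a template.

    speech_db : dict mapping each vocabulary word to its digitized samples
    noise     : list of estimated noise samples w1(n), non-empty
    param_fn  : callable mapping a list of samples to a template of parameters
    Returns a dict mapping each word to its noise-compensated template.
    """
    templates = {}
    for word, samples in speech_db.items():
        # Sample-wise sum of speech and noise; the noise is tiled cyclically
        # when the utterance is longer than the noise block (assumption).
        noisy = [s + noise[n % len(noise)] for n, s in enumerate(samples)]
        templates[word] = param_fn(noisy)
    return templates


if __name__ == "__main__":
    # Toy example: the "parameters" are just the energy of the noisy samples.
    db = {"yes": [1.0, 2.0, 3.0], "no": [0.0, 0.0]}
    energy = lambda x: [sum(v * v for v in x)]
    print(build_noise_compensated_templates(db, [0.5, -0.5], energy))
```

Because the stored speech is kept as raw samples, the templates can be rebuilt whenever the noise estimate changes.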
  • 39. Referring still to FIG. 5, pattern comparison block 68 compares the noise corrupted template t1(n) with all the templates in noise compensated template database 66. Since the noise effects are included in the templates of noise compensated template database 66, decision block 70 is able to find a good match for t1(n). By accounting for the effects of noise in this manner, the accuracy of the speech recognition system is improved.
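The comparison and decision steps might look like the following Python sketch. The text does not specify a distance measure, so the Euclidean distance used here is an assumption chosen for simplicity, and the function name is hypothetical. Templates are assumed to be equal-length lists of parameters.

```python
import math


def best_match(t1, template_db):
    """Return (word, distance) of the stored template closest to t1.

    t1          : template of parameters computed from the noisy input
    template_db : dict mapping each word to its noise-compensated template
    Distance is Euclidean (an assumption; the measure is not specified).
    """
    best_word, best_dist = None, math.inf
    for word, template in template_db.items():
        dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(t1, template)))
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word, best_dist


if __name__ == "__main__":
    db = {"yes": [1.0, 0.0], "no": [0.0, 1.0]}
    print(best_match([0.9, 0.1], db))
```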
  • 40. The previous description of the preferred embodiments is provided to enable any person skilled in the art to make or use the present invention. The various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without the use of the inventive faculty. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (7)

We claim:
1. A speech recognition system, comprising:
a training unit for receiving signals of words or phrases to be trained, generating digitized samples for each said words or phrases, and storing said digitized samples in a speech database; and
a speech recognition unit for receiving a noise corrupted input signal to be recognized, generating a noise compensated template database by applying the effects of noise to said digitized samples of said speech database, and providing a speech recognition outcome for said noise corrupted input signal based on said noise compensated template database.
2. The speech recognition system of claim 1, wherein said speech recognition unit comprises:
a first parameter determination unit for receiving said noise corrupted input signal and generating a template of parameters representative of said input signal in accordance with a predetermined parameter determination technique;
a second parameter determination unit for receiving said speech database with the effects of noise applied to said digitized samples, and generating said noise compensated template database in accordance with said predetermined parameter determination technique; and
a pattern comparison unit for comparing said template of parameters representative of said input signal with the templates of said noise compensated template database to determine the best match and thereby identify said speech recognition outcome.
3. The speech recognition system of claim 2, wherein said parameter determination technique is a linear predictive coding (LPC) analysis technique.
4. A speech recognition unit of a speech recognition system for recognizing an input signal, said speech recognition unit accounting for the effects of a noisy environment, comprising:
means for storing digitized samples of words or phrases of a vocabulary in a speech database;
means for applying the effects of noise to said digitized samples of said vocabulary to generate noise corrupted digitized samples of said vocabulary;
means for generating a noise compensated template database based on said noise corrupted digitized samples; and
means for determining a speech recognition outcome for said input signal based on said noise compensated template database.
5. The speech recognition unit of claim 4, further comprising:
first parameter determination means for receiving said input signal and generating a template of parameters representative of said input signal in accordance with a predetermined parameter determination technique; and
second parameter determination means for receiving said noise corrupted digitized samples of said vocabulary and generating the templates of said noise compensated template database in accordance with said predetermined parameter determination technique;
wherein said means for determining said speech recognition outcome compares said template of parameters representative of said input signal with the templates of said noise compensated template database to determine the best match and thereby identify said speech recognition outcome.
6. A method for speech recognition accounting for the effects of a noisy environment, comprising the steps of:
generating digitized samples of each word or phrase trained, each said word or phrase belonging to a vocabulary;
storing said digitized samples in a speech database;
receiving an input signal to be recognized;
applying the effects of noise to said digitized samples of said vocabulary to generate noise corrupted digitized samples of said vocabulary;
generating a noise compensated template database based on said noise corrupted digitized samples; and
providing a speech recognition outcome for said noise corrupted input signal based on said noise compensated template database.
7. The method of speech recognition of claim 6, further comprising the steps of:
generating a template of parameters representative of said input signal in accordance with a predetermined parameter determination technique; and
generating templates for said noise compensated template database in accordance with said predetermined parameter determination technique;
wherein said step of providing a speech recognition outcome compares said template of parameters representative of said input signal with said templates of said noise compensated template database to determine the best match and thereby identify said speech recognition outcome.
US09/728,650 1998-02-04 2000-12-01 System and method for noise-compensated speech recognition Abandoned US20010001141A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US09/728,650 US20010001141A1 (en) 1998-02-04 2000-12-01 System and method for noise-compensated speech recognition

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/018,257 US6381569B1 (en) 1998-02-04 1998-02-04 Noise-compensated speech recognition templates
US09/728,650 US20010001141A1 (en) 1998-02-04 2000-12-01 System and method for noise-compensated speech recognition

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US09/018,257 Continuation US6381569B1 (en) 1998-02-04 1998-02-04 Noise-compensated speech recognition templates

Publications (1)

Publication Number Publication Date
US20010001141A1 true US20010001141A1 (en) 2001-05-10

Family

ID=21787025

Family Applications (2)

Application Number Title Priority Date Filing Date
US09/018,257 Expired - Lifetime US6381569B1 (en) 1998-02-04 1998-02-04 Noise-compensated speech recognition templates
US09/728,650 Abandoned US20010001141A1 (en) 1998-02-04 2000-12-01 System and method for noise-compensated speech recognition

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US09/018,257 Expired - Lifetime US6381569B1 (en) 1998-02-04 1998-02-04 Noise-compensated speech recognition templates

Country Status (9)

Country Link
US (2) US6381569B1 (en)
EP (1) EP1058925B1 (en)
JP (1) JP4750271B2 (en)
KR (1) KR100574594B1 (en)
CN (1) CN1228761C (en)
AU (1) AU2577499A (en)
DE (1) DE69916255T2 (en)
HK (1) HK1035600A1 (en)
WO (1) WO1999040571A1 (en)

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4933973A (en) 1988-02-29 1990-06-12 Itt Corporation Apparatus and methods for the selective addition of noise to templates employed in automatic speech recognition systems
US5095503A (en) 1989-12-20 1992-03-10 Motorola, Inc. Cellular telephone controller with synthesized voice feedback for directory number confirmation and call status
DE69233794D1 (en) 1991-06-11 2010-09-23 Qualcomm Inc Vocoder with variable bit rate
US5307405A (en) 1992-09-25 1994-04-26 Qualcomm Incorporated Network echo canceller
DE4340679A1 (en) 1993-11-30 1995-06-01 Detecon Gmbh Speech module providing playback of short message in mobile station
US5845246A (en) * 1995-02-28 1998-12-01 Voice Control Systems, Inc. Method for reducing database requirements for speech recognition systems
IL116103A0 (en) 1995-11-23 1996-01-31 Wireless Links International L Mobile data terminals with text to speech capability
US5778342A (en) * 1996-02-01 1998-07-07 Dspc Israel Ltd. Pattern recognition system and method
US5950123A (en) 1996-08-26 1999-09-07 Telefonaktiebolaget L M Cellular telephone network support of audible information delivery to visually impaired subscribers

Also Published As

Publication number Publication date
JP4750271B2 (en) 2011-08-17
CN1296607A (en) 2001-05-23
DE69916255D1 (en) 2004-05-13
AU2577499A (en) 1999-08-23
KR100574594B1 (en) 2006-04-28
WO1999040571A1 (en) 1999-08-12
US6381569B1 (en) 2002-04-30
CN1228761C (en) 2005-11-23
JP2002502993A (en) 2002-01-29
KR20010040669A (en) 2001-05-15
HK1035600A1 (en) 2001-11-30
DE69916255T2 (en) 2005-04-14
EP1058925A1 (en) 2000-12-13
EP1058925B1 (en) 2004-04-07

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION