WO2000031728A1 - Method of voice recognition in a noisy acoustic signal and system implementing this method - Google Patents

Method of voice recognition in a noisy acoustic signal and system implementing this method

Info

Publication number
WO2000031728A1
WO2000031728A1 PCT/FR1999/002852
Authority
WO
WIPO (PCT)
Prior art keywords
noise
series
energy
frames
acoustic signal
Prior art date
Application number
PCT/FR1999/002852
Other languages
English (en)
French (fr)
Inventor
Pierre-Albert Breton
Original Assignee
Thomson-Csf Sextant
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Thomson-Csf Sextant filed Critical Thomson-Csf Sextant
Priority to DE69906569T priority Critical patent/DE69906569T2/de
Priority to US09/831,344 priority patent/US6868378B1/en
Priority to EP99956096A priority patent/EP1131813B1/de
Publication of WO2000031728A1 publication Critical patent/WO2000031728A1/fr

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/20 — Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech

Definitions

  • the present invention relates to a method of voice recognition in a noisy acoustic signal.
  • the invention also relates to a voice recognition system implementing this method.
  • the invention therefore relates to the processing of acoustic signals containing speech picked up in noisy environments. It accordingly finds its main, though not exclusive, application in telephone or radiotelephone communications, voice recognition, sound recording on board civil or military aircraft, and more generally in all noisy vehicles, on-board intercoms, etc.
  • the noises result from the engines, from the air conditioning, from the ventilation of the on-board equipment or from the aerodynamic noises. All these noises are picked up, at least partially, by the microphone in which the pilot or another member of the crew speaks.
  • one of the characteristics of these noises is that they are very variable over time. Indeed, they depend strongly on the engine operating speed (take-off phase, stabilized speed, etc.).
  • the useful signals, that is to say the signals representing the conversations, also have peculiarities: they are most often of short duration.
  • a system 1 comprises two main functional blocks: a parameterization block 11 for a time signal received from an electroacoustic transducer, for example a microphone, via analog-digital conversion circuits 10, and a shape classification block 12.
  • the parameterization transforms the time signal received from the analog-digital conversion circuits 10, namely a series of digital samples, into a series of parameter vectors, each vector representing a time segment called a frame, as recalled previously.
  • the advantage of parameterization is to express the acoustic content of the signal in a reduced number of values. In the application considered, a frame of 256 samples is typically represented by a vector of 8 parameters.
  • the shape recognition block itself comprises two modules: a shape recognition module proper 121 and an acoustic reference memorization module 120.
  • the module 121 compares the series of vectors resulting from the parameterization with a series of vectors obtained during a learning phase, a phase during which acoustic footprints of each word or phoneme are determined. The comparison makes it possible to establish a "distance" between the spoken sentence and the sentences of the syntax.
  • the syntax sentence with the smallest distance a priori represents the sentence to be recognized.
  • the digital signals representing the recognized sentence are transmitted to a user unit 13.
  • the useful signals, that is to say the voice signals, are more or less tainted with noise, since the microphone picks up sounds foreign to speech, as noted.
  • This noise is the main source of errors in the speech recognition process.
  • Noise masks part of the acoustic signal, which results in a loss of resolution of the recognition. This phenomenon is all the more pronounced as the noise level rises.
  • the useful signal is completely "drowned" in the noise.
  • noise processing must be efficient under the two conditions of sound environment.
  • noise reduction treatments are applied, prior to voice recognition, that is to say treatments aimed at minimizing the effects of noise.
  • these treatments only limit the degradation of the recognition rate caused by noise; at high noise levels, they cannot maintain a sufficient degree of performance.
  • the signal after denoising operations, remains tainted with a so-called residual noise level. Although lower than the initial level, this residual noise level remains, in the majority of cases, not negligible and greatly disturbs the speech recognition process. This method alone is therefore not sufficient to eliminate the nuisance.
  • the object of the invention is to provide a method which makes it possible, at the same time, to keep a high sensitivity of a parameterization chain when the ambient noise level is low, or even almost non-existent, and to make it robust in the presence of a high noise level.
  • the method according to the invention adapts, in real time, the degree of robustness of the parameterization chain, so as to obtain, at all times, the best possible compromise between robustness and sensitivity, whatever the level of noise.
  • the subject of the invention is therefore a method of voice recognition in a noisy acoustic signal, the method comprising at least one phase of digitizing and cutting said acoustic signal into a series of time frames of predetermined duration, a phase of parameterizing said time frames so as to transform them into a first series of parameter vectors in the frequency domain, and a phase of comparing said parameter vectors of the first series with parameter vectors of a second series, prerecorded in a so-called preliminary learning phase, so as to obtain said recognition by determining a minimum distance between the vectors of the first series and particular vectors of the second series, characterized in that said parameterization phase comprises the following steps:
  • the invention also relates to a voice recognition system for implementing this method.
  • FIG. 1 schematically illustrates, in the form of a block diagram, a voice recognition system of the known art, operating according to a so-called global method;
  • Figure 2 illustrates, in more detail, a parameterization block, component of the system according to Figure 1;
  • FIG. 3 is a diagram illustrating the configuration of the so-called Bark windows;
  • FIG. 4 is a diagram illustrating the shape of curves representing Qlog-type functions;
  • FIG. 5 illustrates a parameterization chain for the implementation of the voice recognition method according to a first embodiment of the invention;
  • FIG. 6 illustrates a complete system implementing the voice recognition method according to a preferred embodiment of the invention;
  • FIG. 7 shows a typical example of an acoustic signal from a noise pickup;
  • FIG. 8 is a flowchart showing the steps of a particular method of finding a noise model.
  • FIG. 2 illustrates such a parameterization block 11. This includes three functional modules, 110 to 112.
  • the first module, 110 makes it possible to determine the spectral energy.
  • the input signals Se consist of digital frames generated by the circuits 10 (FIG. 1).
  • the spectrum of each time frame is squared.
  • Weighting windows are then applied to the digital values obtained, preferably sixteen so-called Bark windows reproducing the shape of the filters of the human auditory system, so as to obtain sixteen energy values in frequency channels.
  • FIG. 3 is a diagram illustrating the shape of the sixteen applied Bark windows FBi, with 1 ≤ i ≤ 16. The amplitude of the weighting coefficients is represented on the ordinate and the frequency (in Hz) on the abscissa.
  • the first windows have a high amplitude peak and a narrow bandwidth, while the amplitude of the windows of higher rank decreases and the bandwidth widens.
  • the Bark windows FBi overlap two by two. The exact characteristics of these Bark windows are well known in the state of the art and there is no need to describe them further. For more details, one can profitably refer to the book "Speech and its automatic processing", Calliope, Masson, 1989, more particularly page 268.
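As a rough sketch of this step, the fragment below builds sixteen triangular windows on a Bark-spaced frequency grid and applies them to the squared spectrum of a frame. The window shapes, the sampling rate, and the Hz-to-Bark formula (Traunmüller's approximation) are stand-in assumptions, not the exact windows of FIG. 3:

```python
import math

FS = 8000        # assumed sampling rate (Hz)
N_FFT = 256      # frame length quoted in the text
N_BANDS = 16     # number of Bark channels

def hz_to_bark(f):
    """Traunmüller's approximation of the Hz -> Bark scale."""
    return 26.81 * f / (1960.0 + f) - 0.53

def power_spectrum(frame):
    """Squared modulus of the DFT (naive, for illustration only)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        spec.append(re * re + im * im)
    return spec

def bark_filterbank():
    """Triangular windows on a Bark-spaced grid, overlapping two by two."""
    freqs = [k * FS / N_FFT for k in range(N_FFT // 2 + 1)]
    barks = [hz_to_bark(f) for f in freqs]
    step = (barks[-1] - barks[0]) / (N_BANDS + 1)
    edges = [barks[0] + j * step for j in range(N_BANDS + 2)]
    fb = []
    for i in range(N_BANDS):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        fb.append([max(0.0, min((b - lo) / (mid - lo), (hi - b) / (hi - mid)))
                   for b in barks])
    return fb

def bark_energies(frame):
    """One energy value per Bark channel, as produced by module 110."""
    power = power_spectrum(frame)
    return [sum(w * p for w, p in zip(win, power)) for win in bark_filterbank()]

tone = [math.sin(2 * math.pi * 1000 * i / FS) for i in range(N_FFT)]  # 1 kHz tone
energies = bark_energies(tone)
```

With a pure 1 kHz tone, the energy concentrates in the channels whose windows cover 1 kHz, mirroring the channel layout of FIG. 3.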
  • the sixteen values obtained are then transmitted in the form of digital signals to logarithmic compression circuits 111.
  • the compression function is a Qlog function.
  • Such a function is represented on the diagram of FIG. 4 by the curve referenced C1.
  • the Qlog function takes the value zero at the origin.
  • the Qlog function has a logarithmic behavior for abscissa values greater than zero.
  • the compressed digital signals are then transmitted to the module 112 which performs a discrete cosine transform. We then select the coefficients 2 to 9 of this transform. These coefficients constitute the sought parameter vector which is presented to the shape recognition block ( Figure 1: 12).
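The remainder of the chain (modules 111 and 112) can be sketched as follows. The patent does not give the exact expression of the Qlog function, so log(1 + x) is used here as a stand-in that is zero at the origin and logarithmic beyond it, and the transform is a plain DCT-II:

```python
import math

def qlog(x):
    # assumed Qlog form: zero at the origin, logarithmic for x > 0
    return math.log(1.0 + x)

def dct_ii(values):
    """Plain discrete cosine transform (DCT-II) of a list of values."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, v in enumerate(values))
            for k in range(n)]

def parameter_vector(bark_energies):
    """Qlog compression of the 16 Bark energies, then DCT,
    keeping coefficients 2 to 9 as the 8-parameter vector."""
    compressed = [qlog(e) for e in bark_energies]
    coeffs = dct_ii(compressed)
    return coeffs[1:9]   # coefficients 2 to 9 (1-based in the text)

vec = parameter_vector([float(i) for i in range(1, 17)])
```

This reproduces the ratio quoted earlier: a 256-sample frame is reduced to a vector of 8 parameters.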
  • an offset value Ki is added to each of the Bark coefficients Bi.
  • the parameterization chain modified according to the method of the invention, now referenced 11 ′, is illustrated in FIG. 5.
  • the circuits 110, 111 and 112 are strictly identical, as regards the functions fulfilled, to the circuits with the same reference in FIG. 2, and there is no need to describe them again.
  • an additional circuit 113 is inserted between the circuits 110 and 111. This circuit has the function of applying the above-mentioned offset to the sixteen Bark coefficients.
  • FIG. 4 shows the Qlog curve modified by the application of the above-mentioned offsets Ki.
  • Ki is, for example, equal to 2000.
  • the modified Qlog curve C2 passes through the ordinate 40 at the origin and has a lower initial slope than the unmodified Qlog curve C1, which translates into less sensitivity to variations in the Bark coefficients at low levels.
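The reduced low-level sensitivity can be checked numerically, again under the assumed Qlog(x) = log(1 + x) form: with an offset K = 2000, a small variation of a Bark energy produces a far smaller swing of the compressed output than without the offset.

```python
import math

def qlog(x):
    # assumed Qlog form: zero at the origin, logarithmic for x > 0
    return math.log(1.0 + x)

K = 2000.0      # offset value quoted in the text

b_low = 10.0    # a small Bark energy
delta = 5.0     # a small variation of that energy

# output swing caused by the variation, without and with the offset
swing_plain = qlog(b_low + delta) - qlog(b_low)
swing_offset = qlog(b_low + K + delta) - qlog(b_low + K)
```

Here swing_plain is roughly 0.37 while swing_offset is on the order of 0.002: the offset flattens the compression curve at low levels, which is exactly the robustness effect described above.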
  • the results obtained will be further improved, by making the robustness adaptive, in real time, to the level of ambient noise.
  • the sixteen offset values being independent, it is therefore possible to choose distinct values for the sixteen frequency channels of the Bark type.
  • the choice of offset values will be determined by two main factors.
  • the robustness must be greater for the channels with the highest noise level.
  • the noise picked up by the pilot's oxygen mask typically has a peak at 2 kHz. It is therefore advantageous for the offset corresponding to this frequency value to be high.
  • the voice recognition system receives a time signal U (t) as an input.
  • This signal can consist of a pure useful signal, that is to say speech not marred by noise, an acoustic signal more or less drowned in noise, or a noise signal only.
  • a first module, 30, discriminates in the incoming time signal U (t), the speech signal and the noise signal.
  • the noise segments are isolated and transmitted to a noise modeling module 31.
  • Speech detection is a conventional and well-known signal processing. Different methods have been proposed.
  • the preliminary treatment briefly recalled above may be of a known type.
  • the development of a noise model for a noisy signal is a conventional operation in itself.
  • the method used for this operation can be a method of the known art, but also an original method.
  • This method is based on a permanent and automatic search for a noise model.
  • the search is carried out on the samples of the signal U (t), digitized and stored in an input buffer memory (not shown).
  • This memory is capable of simultaneously memorizing all the samples of several frames of the input signal (at least two frames and, in the general case, N frames).
  • the noise model sought is made up of a succession of several frames whose energy stability and relative energy level suggest that it is ambient noise and not a speech signal or another disturbing noise. We will see later how this automatic search is performed.
  • the starting postulates for the automatic development of a noise model are as follows:
  • the noise that we want to eliminate is the ambient background noise
  • noise and speech are superimposed in terms of signal energy, so that a signal containing speech or disturbing noise, including breathing into the microphone, necessarily contains more energy than an ambient noise signal.
  • ambient noise is a signal having a minimum energy stable in the short term.
  • the number of frames intended to evaluate the stability of the noise is from 5 to 20.
  • the energy must be stable over several frames, otherwise it must be assumed that the signal contains speech or a noise other than ambient noise. It must also be minimal, failing which it is considered that the signal contains breathing or noise-like phonetic elements of speech superimposed on the ambient noise.
  • FIG. 7 represents a typical configuration of the temporal evolution of the energy of a microphone signal at the start of speech emission: a phase of breath noise, which dies out for a few tens to hundreds of milliseconds leaving ambient noise alone, then a high energy level indicating the presence of speech, and finally a return to ambient noise.
  • the digital values of all the samples of these N frames are stored.
  • This set of NxP samples constitutes the current noise model. It is used in denoising. Analysis of the following frames continues.
  • the noise model is generally locked onto the permanent ambient noise. Just before speech, which is preceded by breathing, there is a phase where ambient noise alone is present for long enough to be taken into account as an active noise model. This phase of ambient noise alone, after breathing, is brief. The number N1 is therefore chosen relatively low, so that there is time to readjust the noise model to the ambient noise after the breathing phase.
  • if the ambient noise changes slowly, the change will be taken into account, since the comparison threshold with the stored model is greater than 1. If it changes more rapidly in the increasing direction, the change may not be taken into account, so that it is preferable to plan, from time to time, a re-initialization of the search for a noise model.
  • at a standstill, the ambient noise is relatively low; during the take-off phase, the noise model should not remain fixed on what it was at a standstill, because a noise model is only replaced by a model that is less energetic, or only slightly more energetic.
  • the re-initialization methods envisaged will be explained below.
  • FIG. 8 represents a flow diagram of the operations for automatically searching for an ambient noise model.
  • the input signal U(t), sampled at the frequency Fe = 1/Te and digitized by an analog-digital converter, is stored in a buffer memory.
  • The number of the current frame in a search operation for a noise model is designated by n and is counted by a counter as the search proceeds. At the initialization of the search, n is set to 1. This number is incremented as a model of several successive frames is built up. When analyzing the current frame n, the model already includes, by hypothesis, n-1 successive frames meeting the conditions imposed to be part of a model.
  • the signal energy of the frame is calculated by summing the squares of the digital values of the samples of the frame. It is kept in memory.
  • the ratio between the energies of the two frames is calculated. If this ratio is between two thresholds S and S', one of which is greater than 1 and the other of which is less than 1, it is considered that the energies of the two frames are close and that the two frames can be part of a noise model.
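A minimal sketch of this per-frame test, with illustrative threshold values (the patent does not fix S and S'):

```python
# Energy of a frame is the sum of squared samples; two frames are
# "compatible" when their energy ratio lies between S' < 1 < S.
# The threshold values below are illustrative, not taken from the patent.
S_HIGH = 1.5
S_LOW = 1.0 / S_HIGH

def frame_energy(samples):
    """Sum of the squares of the digital values of the samples."""
    return sum(s * s for s in samples)

def frames_compatible(e_current, e_previous):
    """True when the two energies are close enough to belong
    to the same noise model."""
    if e_previous == 0:
        return e_current == 0
    ratio = e_current / e_previous
    return S_LOW < ratio < S_HIGH

quiet = [0.1] * 256   # a stable low-energy frame (ambient-noise-like)
loud = [1.0] * 256    # a high-energy frame (speech- or breath-like)
```

Two quiet frames pass the test (ratio close to 1); a loud frame against a quiet one fails it, which resets the search as described below.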
  • the frames are declared incompatible and the search is reset by resetting n to 1.
  • the rank n of the current frame is incremented, and, in an iterative loop, the energy of the next frame is calculated and compared with the energy of the previous frame or frames, using the thresholds S and S'.
  • the first type of comparison consists in comparing only the energy of frame n to the energy of frame n-1.
  • the second type consists in comparing the energy of frame n to that of each of the frames 1 to n-1.
  • the second way leads to greater homogeneity of the model, but it has the disadvantage of not taking sufficiently into account the cases where the noise level increases or decreases quickly.
  • the energy of the frame of rank n is compared with the energy of the frame of rank n-1 and possibly of other previous frames (not necessarily all of them for that matter).
  • - either n is less than or equal to a minimum number N1 below which the model cannot be considered representative of the ambient noise, the duration of homogeneity being too short, for example N1 = 5; in this case the model being developed is abandoned, and the search is reset from the beginning by setting n back to 1; - or n is greater than the minimum number N1.
  • N1: minimum number of frames below which the model cannot be considered representative of the ambient noise, the duration of homogeneity being too short
  • the number N2 is chosen so as to limit the computation time in the subsequent operations for estimating the spectral noise density.
  • the homogeneous frame is added to the previous ones to help build the noise model, n is incremented and the next frame is analyzed.
  • the frame is also added to the n-1 previous homogeneous frames and the model of n homogeneous frames is stored to be used for noise elimination.
  • the search for a model is also reset by setting n to 1.
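The search loop of FIG. 8 can be summarized as follows. N1, N2 and the thresholds are illustrative values, and only the first type of comparison (frame n against frame n-1) is used:

```python
N1 = 5                 # minimum number of homogeneous frames for a valid model
N2 = 20                # maximum number of frames kept in a model
S_HIGH = 1.5           # upper energy-ratio threshold S (illustrative)
S_LOW = 1.0 / S_HIGH   # lower threshold S'

def frame_energy(samples):
    """Sum of squared sample values, as in the text."""
    return sum(s * s for s in samples)

def compatible(e_new, e_prev):
    """First type of comparison: frame n against frame n-1 only."""
    return e_prev > 0 and S_LOW < e_new / e_prev < S_HIGH

def search_noise_model(frames):
    """Return the last noise model found (a list of frames), or None."""
    model = None
    run = []                           # current run of homogeneous frames
    for frame in frames:
        e = frame_energy(frame)
        if run and not compatible(e, frame_energy(run[-1])):
            if len(run) >= N1:         # long enough to be significant
                model = list(run)      # store it as the current model
            run = []                   # reset the search (n back to 1)
        run.append(frame)
        if len(run) >= N2:             # model full: store it and restart
            model = list(run)
            run = []
    if len(run) >= N1:                 # a run still open at the end counts
        model = list(run)
    return model

# eight stable quiet frames followed by three loud ones: the quiet run
# is kept as the noise model, the loud segment is rejected as speech
model = search_noise_model([[0.1] * 16] * 8 + [[1.0] * 16] * 3)
```

A run shorter than N1 frames is discarded as non-significant, exactly as in the flowchart's reset branch.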
  • the previous steps relate to the first model search. But once a model has been stored, it can be replaced at any time by a more recent model.
  • the replacement condition is still an energy condition, but this time it relates to the average energy of the model and no longer to the energy of each frame.
  • the new model is considered to be better and it is stored in place of the previous one. Otherwise, the new model is rejected and the old one remains in force.
  • the threshold SR is preferably slightly greater than 1.
  • the threshold SR is approximately 1.5. Above this threshold the old model is kept; below it, the old model is replaced by the new one. In both cases, the search is reinitialized by restarting the reading of a first frame of the input signal U(t) and setting n to 1.
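A sketch of this replacement rule, with SR = 1.5 as suggested: a candidate model replaces the stored one only when the ratio of its average energy to the stored model's average energy stays below SR, i.e. when the candidate is quieter or only slightly more energetic.

```python
SR = 1.5   # replacement threshold, slightly greater than 1

def average_energy(model):
    """Average per-frame energy of a model (sum of squared samples)."""
    return sum(sum(s * s for s in frame) for frame in model) / len(model)

def keep_or_replace(old_model, new_model):
    """Return the model that remains active after the comparison."""
    if old_model is None:
        return new_model
    if average_energy(new_model) < SR * average_energy(old_model):
        return new_model        # quieter or barely louder: replace
    return old_model            # much louder: keep the old model

stored = [[1.0] * 4] * 5        # active model, average energy 4.0
quieter = [[0.5] * 4] * 5       # candidate, average energy 1.0
louder = [[10.0] * 4] * 5       # candidate, average energy 400.0
```

This is what keeps the model locked on ambient noise: a sudden loud segment (breath, engine acceleration) cannot displace it, while a quieter model always can.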
  • the digital signal processing commonly used in speech detection makes it possible to identify the presence of speech based on the characteristic spectra of periodicity of certain phonemes, in particular the phonemes corresponding to vowels or to voiced consonants.
  • The purpose of this inhibition is to avoid certain sounds being taken for noise when they are in fact useful phonemes, a noise model based on these sounds being stored, and the noise suppression carried out after the development of such a model then tending to suppress all similar sounds.
  • Ambient noise can indeed increase significantly and rapidly, for example during the acceleration phase of the engines of an airplane or other vehicle, air, land or sea.
  • the threshold SR requires that the previous noise model be kept when the average noise energy increases too quickly.
  • the simplest way is to reinitialize the model periodically by searching for a new model and by imposing it as active model independently of the comparison between this model and the previously stored model.
  • the periodicity can be based on the average duration of speech in the envisaged application; for example, the speaking times are on average a few seconds for the crew of an airplane, and the re-initialization can take place with a periodicity of a few seconds.
  • the implementation of the method for developing a noise model can be done using non-specialized computers, provided with the necessary calculation programs and receiving the samples of the digitized signals as they are supplied by an analog-digital converter, via a suitable port.
  • This implementation can also be done using a specialized computer based on digital signal processors, which makes it possible to process a larger number of digital signals more quickly.
  • the computers are associated, as is well known, with different types of memories, static and dynamic, for recording the programs and the intermediate data, as well as with circulating memories of the type
  • the method according to the invention introduces a dependence between the energy of the noise measured and the parameterization chain.
  • the operation has two main stages.
  • the first step carried out in the single module 40 of block 4, consists in quantifying the energy of the noise.
  • This module directly receives the speech signals and the noise signals from the noise modeling module 31.
  • the energy of the noise is determined and compared to a pre-defined series of energy values.
  • the energy of a signal can be obtained simply by taking the quadratic average of samples.
  • each interval, bounded by a minimum and a maximum noise energy value, will correspond to a predetermined level of robustness.
  • within a given interval, the parameterization chain is constant. Naturally, the different intervals are contiguous.
  • the determination of the operating ranges is carried out, a priori, once and for all, during a preliminary phase, depending on the precise application envisaged.
  • the second step is to selectively modify the parameterization chain.
  • each operating range corresponds to a different parameterization chain.
  • the parameterization chain proper is represented by a block 5, itself comprising several modules: a module 50, which will for convenience be called a switch, an optional module 51 for denoising the speech signal, a module 52 for calculating the Bark coefficients, a module 53 for offset configuration and a module 54 for calculating cepstra.
  • the parameterization chain 5 has a configuration close to that described with regard to FIG. 5.
  • the module 52 makes it possible to determine the spectral energy contained in the Bark windows. It is identical to the module 110 in FIG. 5.
  • the module 54 groups, for its part, the modules 111 and 112 of FIG. 5. These modules are moreover common with the known art (FIG. 2: 110 to 112).
  • the calculation of the spectral energy can optionally be preceded by a denoising of the speech signal, carried out in the module 51.
  • the module 51 can conventionally include a Wiener filter or a generalized Wiener filter.
  • Wiener filters are described in the following books, to which one can profitably refer:
  • the Wiener filter receives, at its input, a so-called useful digital signal tainted with noise, for example the speech signal in the application described, and delivers at its output this same signal theoretically freed of its noise component.
  • in practice, however, a residual noise of generally non-negligible amplitude remains.
  • the parameterization chain 5 comprises two components specific to the invention.
  • the first, referenced 53 allows, as in the case of the configuration described with reference to FIG. 5, to add offset values to the Bark coefficients.
  • according to an advantageous characteristic specific to the preferred embodiment, illustrated in FIG. 6, the offset values are no longer constant over time.
  • a separate offset configuration is applied for each of the above energy ranges.
  • the number of ranges depends on the specific application envisaged. In the example illustrated in FIG. 6, it has been assumed that there are five ranges, and therefore five different offset configurations.
  • the second specific module is constituted by what has been called the switch 50.
  • This member detects the result of the comparison carried out in the quantization module 40.
  • the result of the comparison can be constituted by a binary control word, representing the numbers 1 to 5, and more generally 1 to n if there are n distinct configurations.
  • the member 50 cooperates with the module 53 and transmits this binary command word to it via the link 500.
  • the latter can include a memory area 530, for example made up of registers storing the different offset configurations, and an area of logic circuits 531 allowing the addition of the offsets Ki to the sixteen Bark coefficients.
  • the binary word transmitted by the link 500 makes it possible to select one of the offset configurations recorded in the zone 530 and its application to the Bark coefficients, in the manner previously described, before the logarithmic compression operation carried out in the module 54.
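The cooperation between the switch 50 and the module 53 can be sketched as follows; the five offset configurations are made-up placeholders for the contents of zone 530:

```python
N_BANDS = 16

# zone 530: one offset configuration per robustness level
# (dummy values -- the real configurations are fixed per application)
OFFSET_CONFIGS = {
    level: [500.0 * level] * N_BANDS
    for level in range(1, 6)
}

def apply_offsets(bark_coeffs, control_word):
    """zone 531: add the selected offsets Ki, term by term,
    to the sixteen Bark coefficients."""
    offsets = OFFSET_CONFIGS[control_word]
    return [b + k for b, k in zip(bark_coeffs, offsets)]
```

In a real implementation the configurations need not be uniform across channels; as noted above, the offset of the channel around 2 kHz would typically be raised to match the noise peak of an oxygen mask.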
  • the shape recognition module 6 can be produced using a conventional and well-known method per se, for example of the so-called "DTW" (for "Dynamic Time Warping") or "HMM" (for "Hidden Markov Model") type. However, it is advantageous for the shape recognition module to be informed of the offset configuration chosen. Indeed, in conventional pattern recognition methods, it is customary to take into account a threshold parameter known as "pruning", which corresponds to the maximum distortion authorized for a given utterance. This parameter determines the response time of the system.
  • the module 50 therefore also transmits, via a link 501, a control word to the pattern recognition module 6, in order to adjust the so-called "Pruning" threshold for each offset configuration adopted.
  • the necessary adaptation of block 6 requires only a minimal modification of the standard circuits used for this purpose.
  • the control word can also be the same as that transmitted to the module 53. In this case, the same transmission link 500 is used.
  • the digital signals representing it are transmitted to a user unit 7: headphones, recorder, etc.
  • the implementation of the method according to the invention does not appreciably modify the workload of the operator, with regard to the acoustic references (memorized references).
  • the computation time of the station in which learning is carried out increases. This is due to the fact that it is necessary to carry out learning for each preset offset configuration.
  • these calculations are carried out once and for all, or at least on rare occasions: system modifications, increase in the learning corpus, etc.
  • the recognition time is identical to that of a recognition with conventional cepstral parameterization.
  • the additional computation time required by the quantification of the noise energy and the loading of the offset configuration values is completely negligible compared to the computation time linked to the actual recognition.

PCT/FR1999/002852 1998-11-20 1999-11-19 Procede de reconnaissance vocale dans un signal acoustique bruite et systeme mettant en oeuvre ce procede WO2000031728A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
DE69906569T DE69906569T2 (de) 1998-11-20 1999-11-19 Verfahren und vorrichtung zur spracherkennung eines mit störungen behafteten akustischen signals
US09/831,344 US6868378B1 (en) 1998-11-20 1999-11-19 Process for voice recognition in a noisy acoustic signal and system implementing this process
EP99956096A EP1131813B1 (de) 1998-11-20 1999-11-19 Verfahren und vorrichtung zur spracherkennung eines mit störungen behafteten akustischen signals

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR9814641A FR2786308B1 (fr) 1998-11-20 1998-11-20 Procede de reconnaissance vocale dans un signal acoustique bruite et systeme mettant en oeuvre ce procede
FR98/14641 1998-11-20

Publications (1)

Publication Number Publication Date
WO2000031728A1 true WO2000031728A1 (fr) 2000-06-02

Family

ID=9533000

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FR1999/002852 WO2000031728A1 (fr) 1998-11-20 1999-11-19 Procede de reconnaissance vocale dans un signal acoustique bruite et systeme mettant en oeuvre ce procede

Country Status (5)

Country Link
US (1) US6868378B1 (de)
EP (1) EP1131813B1 (de)
DE (1) DE69906569T2 (de)
FR (1) FR2786308B1 (de)
WO (1) WO2000031728A1 (de)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6868378B1 (en) * 1998-11-20 2005-03-15 Thomson-Csf Sextant Process for voice recognition in a noisy acoustic signal and system implementing this process

Families Citing this family (12)

Publication number Priority date Publication date Assignee Title
US7165028B2 (en) * 2001-12-12 2007-01-16 Texas Instruments Incorporated Method of speech recognition resistant to convolutive distortion and additive distortion
US7155388B2 (en) 2004-06-30 2006-12-26 Motorola, Inc. Method and apparatus for characterizing inhalation noise and calculating parameters based on the characterization
US7139701B2 (en) * 2004-06-30 2006-11-21 Motorola, Inc. Method for detecting and attenuating inhalation noise in a communication system
US7254535B2 (en) * 2004-06-30 2007-08-07 Motorola, Inc. Method and apparatus for equalizing a speech signal generated within a pressurized air delivery system
US7436969B2 (en) * 2004-09-02 2008-10-14 Hewlett-Packard Development Company, L.P. Method and system for optimizing denoising parameters using compressibility
US7774202B2 (en) * 2006-06-12 2010-08-10 Lockheed Martin Corporation Speech activated control system and related methods
US8239190B2 (en) * 2006-08-22 2012-08-07 Qualcomm Incorporated Time-warping frames of wideband vocoder
US7873114B2 (en) * 2007-03-29 2011-01-18 Motorola Mobility, Inc. Method and apparatus for quickly detecting a presence of abrupt noise and updating a noise estimate
FR2938396A1 (fr) * 2008-11-07 2010-05-14 Thales Sa Procede et systeme de spatialisation du son par mouvement dynamique de la source
JP6169849B2 (ja) * 2013-01-15 2017-07-26 本田技研工業株式会社 音響処理装置
CN113794979B (zh) * 2021-08-30 2023-05-12 航宇救生装备有限公司 低阻抗送话器匹配低阻抗音频控制模块时的评价控制方法
CN114743562B (zh) * 2022-06-09 2022-11-01 成都凯天电子股份有限公司 一种飞机声纹识别方法、系统、电子设备及存储介质

Citations (2)

Publication number Priority date Publication date Assignee Title
WO1997033273A1 (en) * 1996-03-08 1997-09-12 Motorola Inc. Method and recognizer for recognizing a sampled sound signal in noise
US5696878A (en) * 1993-09-17 1997-12-09 Panasonic Technologies, Inc. Speaker normalization using constrained spectra shifts in auditory filter domain

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5583961A (en) * 1993-03-25 1996-12-10 British Telecommunications Public Limited Company Speaker recognition using spectral coefficients normalized with respect to unequal frequency bands
US6570991B1 (en) * 1996-12-18 2003-05-27 Interval Research Corporation Multi-feature speech/music discrimination system
FR2771542B1 (fr) * 1997-11-21 2000-02-11 Sextant Avionique Frequency filtering method applied to the denoising of sound signals using a Wiener filter
FR2786308B1 (fr) * 1998-11-20 2001-02-09 Sextant Avionique Process for voice recognition in a noisy acoustic signal and system implementing this process

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
COLE R A ET AL: "Experiments with a spoken dialogue system for taking the US census", Speech Communication, vol. 23, no. 3, 1 November 1997, pages 243-260, XP004112587, ISSN: 0167-6393 *
HERMANSKY H: "Automatic speech recognition and human auditory perception", European Conference on Speech Technology, Edinburgh, September 1987, vol. 1, pages 79-82, XP000010688 *
VERGIN R ET AL: "Compensated mel frequency cepstrum coefficients", 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Atlanta, GA, USA, 1996, pages 323-326, vol. 1, XP002110436, ISBN: 0-7803-3192-3 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6868378B1 (en) * 1998-11-20 2005-03-15 Thomson-Csf Sextant Process for voice recognition in a noisy acoustic signal and system implementing this process

Also Published As

Publication number Publication date
DE69906569T2 (de) 2004-01-08
DE69906569D1 (de) 2003-05-08
US6868378B1 (en) 2005-03-15
EP1131813A1 (de) 2001-09-12
FR2786308B1 (fr) 2001-02-09
EP1131813B1 (de) 2003-04-02
FR2786308A1 (fr) 2000-05-26

Similar Documents

Publication Publication Date Title
EP1154405B1 (de) Method and device for speech recognition in an environment with a variable noise level
EP0918317B1 (de) Frequency filtering method using a Wiener filter for noise reduction in audio signals
EP0993671B1 (de) Method for determining a noise model in a noisy audio signal
EP0768770B1 (de) Method and device for generating background noise in a digital transmission system
EP1789956B1 (de) Method for processing a noisy sound signal and device for implementing the method
EP2415047B1 (de) Classification of the background noise contained in an audio signal
EP1593116B1 (de) Method for differentiated digital processing of voice and music, noise filtering and creation of special effects, and device for carrying out the method
EP1131813B1 (de) Method and device for speech recognition of a noise-affected acoustic signal
EP0867856A1 Method and device for speech detection
EP2772916B1 (de) Method for denoising an audio signal using a variable-spectral-gain algorithm with dynamically modulatable hardness
WO2003048711A2 (fr) System for detecting speech in an audio signal in a noisy environment
WO2008096084A1 (fr) Synthesis of lost blocks of a digital audio signal, with pitch period correction
EP0906613B1 (de) Method and device for coding an audio signal using "forward" and "backward" LPC analysis
CA2404441C (fr) Robust parameters for noisy speech recognition
EP3627510A1 (de) Filtering of a sound signal picked up by a voice recognition system
FR2751776A1 (fr) Method for extracting the fundamental frequency of a speech signal
FR2856506A1 (fr) Method and device for detecting speech in an audio signal
WO2002082424A1 (fr) Method and device for extracting acoustic parameters from a voice signal
WO1999027523A1 (fr) Method for reconstructing sound signals after denoising
WO2001091106A1 (fr) Adaptive analysis windows for speech recognition
FR2864319A1 (fr) Method and device for detecting speech in an audio signal
FR2796486A1 (fr) Methods and devices for substituting a dynamically synthesized voice for automatically identified vocabularies

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 1999956096

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 09831344

Country of ref document: US

WWP Wipo information: published in national office

Ref document number: 1999956096

Country of ref document: EP

WWG Wipo information: grant in national office

Ref document number: 1999956096

Country of ref document: EP