EP1062659A1 - Method and device for processing a sound signal - Google Patents
- Publication number
- EP1062659A1
- Authority
- EP
- European Patent Office
- Prior art keywords
- segments
- signal
- determined
- envelope
- audio signal
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
Definitions
- The invention relates to a method and a device for processing a sound signal.
- A wavelet transformation is known from [2].
- A wavelet transformation is preferably carried out in several transformation stages, each stage dividing a pattern into a high-pass and a low-pass component.
- The respective high-pass or low-pass component preferably has a lower resolution than the pattern (technical term: subsampling, i.e. a reduced sampling rate and hence reduced resolution). From the high-pass and the …
- The wavelet transformation can be one-dimensional, two-dimensional or multidimensional.
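The stage-wise split into a low-pass and a high-pass component can be sketched with the Haar wavelet, the simplest wavelet. This is an illustration only, assuming a one-dimensional pattern of even length; the text does not say which wavelet [2] uses, and the function names `haar_stage` and `haar_transform` are hypothetical. Each stage halves the length (subsampling by 2), and every further stage re-splits the previous low-pass part.

```python
import math


def haar_stage(pattern):
    """Split a pattern into low-pass and high-pass parts (one Haar stage).

    Each output has half the input length, i.e. it is subsampled by 2.
    """
    if len(pattern) % 2:
        raise ValueError("pattern length must be even")
    scale = 1 / math.sqrt(2)  # keeps the total energy unchanged
    pairs = range(0, len(pattern), 2)
    low = [(pattern[i] + pattern[i + 1]) * scale for i in pairs]   # local averages
    high = [(pattern[i] - pattern[i + 1]) * scale for i in pairs]  # local differences
    return low, high


def haar_transform(pattern, stages):
    """Apply several stages; each stage re-splits the previous low-pass part.

    Returns the final low-pass part and the list of high-pass parts, finest first.
    """
    low = list(pattern)
    details = []
    for _ in range(stages):
        low, high = haar_stage(low)
        details.append(high)
    return low, details
```

For example, two stages on `[4, 6, 10, 12, 8, 6, 5, 5]` leave a low-pass part of length 2 and high-pass parts of lengths 4 and 2; the sum of squares of all outputs equals that of the input.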
- A sound signal comprises a useful signal and a disturbance signal, and the strength of the disturbance signal depends on the environment. It is essential for further processing of the audio signal …
- The object of the invention is to provide a method and a device that process a sound signal in such a way that the disadvantage described above is avoided.
- FFT: fast Fourier transform, a frequency transform.
- Instead of the FFT, a wavelet transformation or any other transformation can also be used to map the time domain into the frequency domain.
- A method for processing an audio signal is specified in which the audio signal is transformed into the frequency domain. For at least one predetermined frequency of the audio signal, the envelope of the transformed audio signal is determined over time. The envelope is subdivided into a set of segments, each of predetermined duration. A maximum of the envelope is determined for each segment of the set. The smallest of these maxima is determined over a predetermined number of segments of the set. The audio signal is processed by subtracting the smallest maximum, weighted by a factor, from the audio signal.
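A minimal sketch of the per-frequency core of this method, assuming the envelope of one frequency is already available as a list of non-negative samples and that every segment spans the same number of samples. The helper names are hypothetical, and clamping negative results to zero after the subtraction is an added assumption (a common spectral-subtraction safeguard) that the text does not state.

```python
def segment_maxima(envelope, seg_len):
    """Maximum of the envelope in each complete segment of seg_len samples."""
    n_full = len(envelope) // seg_len  # an incomplete trailing segment is ignored
    return [max(envelope[j * seg_len:(j + 1) * seg_len]) for j in range(n_full)]


def smallest_maximum(envelope, seg_len, n_segments):
    """Smallest segment maximum over the last n_segments complete segments."""
    maxima = segment_maxima(envelope, seg_len)
    if not maxima:
        raise ValueError("envelope is shorter than one segment")
    return min(maxima[-n_segments:])


def subtract_noise(values, noise, factor=1.0):
    """Subtract the weighted noise estimate; clamping at zero is an assumption."""
    return [max(v - factor * noise, 0.0) for v in values]
```

With the envelope `[2, 3, 9, 2, 1, 2, 2, 1, 3, 8, 7, 2]` and 4-sample segments, the segment maxima are 9, 2 and 8, so the estimate is 2: the quiet middle segment sets the noise level, while the peaks in the other segments (assumed to be speech) do not.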
- The smallest maximum is thus determined over a predetermined duration for each frequency whose envelope is tracked over time. For an audio signal that comprises a useful signal and an interference signal, the smallest maximum preferably captures the interference signal.
- Using several segments captures the dynamic course of the interference signal over time.
- For example, the disturbance signal can be the engine noise of a motor vehicle that accelerates continuously over a period of time.
- The disturbance signal in the motor vehicle therefore increases over time (during acceleration). Because the smallest maximum is determined over a given number of segments, it is re-determined over time for each such group of segments, so that the dynamic development of the interference signal is also taken into account.
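Re-determining the smallest maximum as new segments arrive can be sketched with a fixed-length queue of segment maxima: when a segment completes, its maximum is appended and the oldest one drops out, so the estimate follows a slowly rising noise floor such as engine noise during acceleration. The class `SmallestMaxTracker` and its parameters are hypothetical, and updating after every completed segment is only one possible reading of the update schedule.

```python
from collections import deque


class SmallestMaxTracker:
    """Track the smallest segment maximum over the last n_segments segments.

    Envelope samples arrive one at a time; after every seg_len samples the
    segment maximum is stored and the estimate is re-determined.
    """

    def __init__(self, seg_len, n_segments):
        self.seg_len = seg_len
        self.maxima = deque(maxlen=n_segments)  # the oldest maximum drops out automatically
        self._count = 0
        self._current_max = float("-inf")
        self.estimate = None  # no estimate until the first segment is complete

    def push(self, sample):
        """Feed one envelope sample and return the current noise estimate."""
        self._current_max = max(self._current_max, sample)
        self._count += 1
        if self._count == self.seg_len:
            self.maxima.append(self._current_max)
            self.estimate = min(self.maxima)
            self._count = 0
            self._current_max = float("-inf")
        return self.estimate
```

With a steadily rising input, the estimate lags behind by up to `n_segments` segments. A smaller `n_segments` tracks faster but is more likely to mistake a long stretch of speech for noise.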
- In a further development of the invention, a minimum is determined over a further number of segments of the set of segments, and the audio signal is processed by subtracting the smallest maximum combined with this minimum from the audio signal.
- The minimum determined over a further number of segments proves to be very advantageous for adapting the estimate of the interference signal that is subtracted from the audio signal in order to obtain the useful signal.
- The minimum characterizes the interference signal and is therefore subtracted from the sound signal.
- The coefficients are to be chosen so that the interference signal is reduced in a way that suits the application.
- In an advantageous development, an update is carried out each time the number or the further number of segments has elapsed, so that an updated interference signal is subtracted from the sound signal.
- The sound signal is a speech signal, preferably naturally spoken speech.
- The processed audio signal is used for speech recognition.
- A clean useful signal, ideally without any disturbance component, is an advantageous prerequisite, especially for a speech recognition system.
- The clearer the useful signal, the better the speech recognition system recognizes the spoken language.
- The useful signal can also be output.
- A device for processing a sound signal has a processor unit that is set up so that the sound signal can be transformed into the frequency domain. For at least one predetermined frequency, the envelope of the transformed sound signal can be determined over time.
- The envelope can be subdivided into a set of segments, each of predetermined duration. A maximum of the envelope is determined for each segment of the set.
- The smallest maximum is determined for a number of segments of the set.
- The audio signal is processed by subtracting the smallest maximum, weighted by a factor, from the audio signal.
- In a possible development of the device, the processor unit is set up so that a minimum is determined for a further number of segments, and the sound signal is processed by subtracting the combination of the smallest maximum and the minimum from the sound signal.
- The device is particularly suitable for carrying out the method according to the invention or one of its developments described above.
- FIG. 1 is a block diagram showing steps of a method for processing a sound signal; FIG. 2 shows the profile of an envelope f(t) of a frequency f_i over time t;
- Fig. 1 shows a block diagram of the steps of a method for processing a sound signal. Two variants for processing the sound signal are described below with reference to Fig. 1a and Fig. 1b.
- The sound signal is transformed into the frequency domain for at least one frequency (see step 101).
- This transformation is preferably a fast Fourier transform (FFT).
- The transformation is carried out at specific points in time t_k, so that the course of at least one frequency over the points in time t_k is obtained.
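To illustrate how the course of one frequency over the times t_k arises, the sketch below evaluates a single DFT bin on successive frames. It computes the DFT sum directly for one bin, which gives the same value as the corresponding output bin of an FFT; a practical implementation would use an FFT library. Frame length, hop size and the function names are assumptions for illustration, and no window function is applied.

```python
import cmath
import math


def bin_magnitude(frame, k):
    """Magnitude of DFT bin k of one frame, computed directly from the DFT sum."""
    n = len(frame)
    return abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                   for i, x in enumerate(frame)))


def bin_course(signal, frame_len, hop, k):
    """Magnitude of bin k for frames starting at t_k = 0, hop, 2*hop, ... samples."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return [bin_magnitude(signal[s:s + frame_len], k) for s in starts]
```

For a sine with exactly 4 periods per 32-sample frame, bin 4 has magnitude 16 (half the frame length) in every frame, and the neighbouring bins are essentially zero.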
- In a step 102, an envelope is determined from this time-dependent course of the frequency. This is done for at least one frequency, in particular for several significant frequencies of the audio signal.
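The text does not specify how the envelope is formed from the course of a frequency. One common choice, used here purely as an assumption, is a peak follower: it jumps to each new peak and otherwise decays geometrically by a release factor per frame. The function name `envelope` and the `release` parameter are hypothetical.

```python
def envelope(course, release=0.9):
    """Peak-following envelope of a non-negative course of magnitudes.

    The envelope jumps to each new peak and otherwise decays by the factor
    `release` per frame, but never drops below the current value.
    """
    if not 0.0 <= release < 1.0:
        raise ValueError("release must be in [0, 1)")
    env = []
    level = 0.0
    for value in course:
        level = max(value, level * release)
        env.append(level)
    return env
```

A release factor close to 1 gives a smooth envelope that bridges short gaps between peaks; a smaller factor follows the raw course more closely.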
- The respective envelope is subdivided into a set of segments, which preferably all have the same duration. A maximum of the envelope is determined for each segment (see step 104).
- In a step 105, the smallest maximum over a predetermined number of segments is determined, and this smallest maximum, in particular weighted by a factor, is subtracted from the audio signal in order to reduce the interference signal and obtain the strongest possible useful signal (see step 106).
- The smallest maximum is determined over a certain number of previous segments; after a predefined time it is updated, now taking into account the predefined number of past segments at that new time.
- The smallest maximum for the envelope of the respective frequency is thus adjusted dynamically over time, always over the number N of previous segments.
- For example, the disturbance signal is the engine noise of an accelerating vehicle, which increases over time in accordance with the acceleration.
- The estimate of the disturbance signal follows the increasing engine noise because the smallest maximum is updated at predetermined times for the envelopes of predetermined frequencies; in this way a high-quality useful signal is obtained from the audio signal.
- Fig. 1b shows the blocks 101, 102, 103, 104 and 105 as in Fig. 1a.
- After step 103, in addition to determining the maxima (steps 104 and 105), a minimum of the envelope of the frequency under examination is determined over a predetermined time (see step 107).
- Of interest is the (smallest) minimum over a predetermined number of previous segments, i.e. the minimum of the envelope over the period to be taken into account, counted back from the current point in time.
- The smallest maximum and the minimum are combined in order to obtain a disturbance signal to be subtracted from the audio signal, which decisively improves the quality of the useful signal. The combination is
$N = a \cdot \mathit{max} + b \cdot \mathit{min}$
where
- a denotes a first predetermined coefficient,
- b a second predetermined coefficient,
- max the smallest maximum,
- min the minimum, and
- N a noise estimate or a value strongly correlated with the noise.
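A sketch of the combined estimate, assuming the weighted-sum form N = a·max + b·min with both quantities taken over the last `n_segments` complete segments of the envelope. The text allows the minimum to use a different ("further") number of segments; sharing one window is a simplification here, and the function name is hypothetical. With a + b = 1 the result is the weighted average mentioned for Fig. 2.

```python
def noise_estimate(envelope, seg_len, n_segments, a, b):
    """N = a * (smallest segment maximum) + b * (minimum), both taken over
    the last n_segments complete segments of the envelope."""
    n_full = len(envelope) // seg_len
    if n_full == 0:
        raise ValueError("envelope is shorter than one segment")
    first = max(0, n_full - n_segments)
    segments = [envelope[j * seg_len:(j + 1) * seg_len] for j in range(first, n_full)]
    smallest_max = min(max(seg) for seg in segments)  # lowest of the segment peaks
    minimum = min(min(seg) for seg in segments)       # overall minimum in the window
    return a * smallest_max + b * minimum
```

As an invented example in the spirit of Fig. 2, take six 3-sample segments `[4, 7, 5]`, `[3, 1, 6]`, `[5, 8, 6]`, `[4, 5, 3]`, `[6, 9, 7]`, `[5, 7, 4]`. The segment maxima are 7, 6, 8, 5, 9, 7, so the smallest maximum is 5; the minimum 1 lies in the second segment. With a = b = 0.5 the estimate is 0.5 · 5 + 0.5 · 1 = 3.0.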
- This combination also takes the temporal variation of the interference signal into account. If a constant interference signal is superimposed on the useful signal, exactly this interference signal, or a component proportional to it, is eliminated.
- The time interval T used to determine the minimum, and possibly also the smallest maximum, is the duration of the number of past segments. It is chosen in particular so that T is longer than a spoken word (the sound signal corresponds to naturally spoken language).
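The choice of T translates into segment counts with simple integer arithmetic. The helper below is hypothetical, and the durations in the example (T = 1.5 s, 100 ms segments, 10 ms frame hop) are assumed values, not taken from the text. Durations are kept in integer milliseconds so that the ceiling division is exact and free of floating-point rounding surprises.

```python
def segment_plan(interval_ms, segment_ms, hop_ms):
    """Return (frames per segment, number of segments N) such that the
    N segments together span at least the interval T."""
    if segment_ms % hop_ms:
        raise ValueError("segment duration must be a multiple of the frame hop")
    frames_per_segment = segment_ms // hop_ms
    n_segments = -(-interval_ms // segment_ms)  # ceiling division on integers
    return frames_per_segment, n_segments
```

With the assumed values, `segment_plan(1500, 100, 10)` gives 10 frames per segment and N = 15 segments; extending T to 1550 ms needs a 16th segment.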
- In Fig. 2, the ordinate shows the amplitude A_f of the frequency f_i and the abscissa shows the time t.
- The time axis t is divided into segments SEG_j, where j indexes the segments in time.
- The segments SEG1, SEG2, ..., SEG6 are shown in FIG. 2 as an example.
- For each segment SEG_j, a maximum Max_j is determined, which is the maximum of the envelope f(t) of the frequency f_i within that segment.
- This yields the maxima Max1, Max2, ..., Max6.
- The smallest of these maxima is then determined.
- Over the segments shown, the minimum Min lies in segment SEG2.
- The smallest maximum and the minimum determined in this way are combined as described above and subtracted from the sound signal, i.e. from the frequency f_i, in order to improve the useful signal (again with respect to the frequency f_i).
- A weighted average of the smallest maximum and the minimum is subtracted from the audio signal (in each case with respect to the frequency f_i under consideration).
- The smallest maximum and the minimum at a time t_k are determined from a predetermined number N of segments before this time t_k.
- The smallest maximum and the minimum are determined again at other times t_k, combined with one another and subtracted from the sound signal (with respect to the respective frequency f_i).
- FIG. 2 shows an example of the envelope f(t) for a predetermined frequency f_i.
- After a transformation of the sound signal x(t) into the frequency domain, for example by an FFT, exactly one amplitude value A_f is obtained for each frequency f_i at the respective time t.
- Carrying out transformations into the frequency domain at different times t yields the time course f_i(t) of a predetermined frequency.
- The envelope f(t) is determined from this time course f_i(t).
- This envelope f(t) is shown in Fig. 2.
- An envelope f(t) is determined for several frequencies f_i, so that the invention is applied to several envelopes f(t) representing the course of several frequencies f_i over time. Subtracting the determined interference signal from an information-bearing sound signal thus yields a significant improvement of the sound signal.
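Applying the method to several frequencies amounts to running the per-frequency estimate on every column of a magnitude spectrogram. In this sketch, `frames[t][f]` is assumed to hold the magnitude of frequency bin f at frame t (for example from successive FFTs); the function name, the clamping at zero and the use of the weighted smallest maximum alone (without the minimum) are illustrative assumptions.

```python
def denoise_spectrogram(frames, seg_len, n_segments, factor=1.0):
    """frames[t][f] is the magnitude of frequency bin f at frame t.

    For every bin, the noise is estimated as the smallest segment maximum over
    the last n_segments complete segments and subtracted, weighted by factor,
    from every frame; results are clamped at zero.
    Returns (cleaned_frames, per_bin_noise).
    """
    n_full = len(frames) // seg_len
    if n_full == 0:
        raise ValueError("fewer frames than one segment")
    first = max(0, n_full - n_segments)
    n_bins = len(frames[0])
    noise = []
    for f in range(n_bins):
        # maximum of bin f within each segment of the window
        maxima = [max(frames[t][f] for t in range(j * seg_len, (j + 1) * seg_len))
                  for j in range(first, n_full)]
        noise.append(min(maxima))
    cleaned = [[max(frame[f] - factor * noise[f], 0.0) for f in range(n_bins)]
               for frame in frames]
    return cleaned, noise
```

Each bin gets its own noise level, so a strong stationary component in one band (such as an engine harmonic) is removed without lowering the other bands.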
- The processor unit PRZE comprises a processor CPU, a memory SPE and an input/output interface IOS, which is used in various ways via an interface IFC: output is displayed on a monitor MON and/or issued on a printer PRT via a graphics interface. Input is made using a mouse MAS or a keyboard TAST.
- The processor unit PRZE also has a data bus BUS, which connects a memory MEM, the processor CPU and the input/output interface IOS. Additional components can also be connected to the data bus BUS, e.g. additional memory, data storage (hard disk) or a scanner.
- Fig. … shows a speech recognition system.
- A prerequisite for recognizing naturally spoken language is a suitable formalism for representing knowledge.
- A complete speech recognition system comprises several levels of processing, in particular acoustic phonetics, intonation, syntax, semantics and pragmatics.
- Fig. 4 shows the processing levels during recognition (see [1]).
- The natural speech signal SPRS enters the speech recognition system.
- There, feature extraction is carried out in a component MEX.
- MEX, SPLE: acoustic-phonetic units.
- This involves the calculation of acoustic distance parameters.
- Lexical decoding then takes place in a block LDK with the aid of the pronunciation model or word lexicon WOLX, followed by a syntax analysis SYAL with the help of the language model GRSML, which includes the grammar.
- Word recognition LDK and syntax analysis SYAL together constitute the search for a match for the speech signal.
- Finally, semantic postprocessing is carried out in a block SENB, taking into account context knowledge and pragmatics KWPM, which yields the speech ERSPR recognized by the speech recognition system.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE19812207 | 1998-03-19 | ||
PCT/DE1999/000615 WO1999048084A1 (en) | 1998-03-19 | 1999-03-08 | Method and device for processing a sound signal |
Publications (2)
Publication Number | Publication Date |
---|---|
EP1062659A1 | 2000-12-27 |
EP1062659B1 | 2002-01-30 |
Family
ID=7861632
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP99917771A Expired - Lifetime EP1062659B1 (en) | 1998-03-19 | 1999-03-08 | Method and device for processing a sound signal |
Country Status (5)
Country | Link |
---|---|
US (1) | US6804646B1 (en) |
EP (1) | EP1062659B1 (en) |
JP (1) | JP4276781B2 (en) |
DE (1) | DE59900797D1 (en) |
WO (1) | WO1999048084A1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8092034B2 (en) * | 2007-11-07 | 2012-01-10 | Richard David Ashoff | Illuminated tile systems and methods for manufacturing the same |
US8321209B2 (en) | 2009-11-10 | 2012-11-27 | Research In Motion Limited | System and method for low overhead frequency domain voice authentication |
US8326625B2 (en) * | 2009-11-10 | 2012-12-04 | Research In Motion Limited | System and method for low overhead time domain voice authentication |
CN111387978B (en) * | 2020-03-02 | 2023-09-26 | 京东科技信息技术有限公司 | Method, device, equipment and medium for detecting action segment of surface electromyographic signal |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3196212A (en) | 1961-12-07 | 1965-07-20 | Ibm | Local amplitude detector |
US4185168A (en) * | 1976-05-04 | 1980-01-22 | Causey G Donald | Method and means for adaptively filtering near-stationary noise from an information bearing signal |
US4888806A (en) * | 1987-05-29 | 1989-12-19 | Animated Voice Corporation | Computer speech system |
KR950013552B1 (en) * | 1990-05-28 | 1995-11-08 | 마쯔시다덴기산교 가부시기가이샤 | Voice signal processing device |
JPH04150522A (en) * | 1990-10-15 | 1992-05-25 | Sony Corp | Digital signal processor |
US5323337A (en) * | 1992-08-04 | 1994-06-21 | Loral Aerospace Corp. | Signal detector employing mean energy and variance of energy content comparison for noise detection |
US5479560A (en) | 1992-10-30 | 1995-12-26 | Technology Research Association Of Medical And Welfare Apparatus | Formant detecting device and speech processing apparatus |
JP3237089B2 (en) * | 1994-07-28 | 2001-12-10 | 株式会社日立製作所 | Acoustic signal encoding / decoding method |
JP3765171B2 (en) * | 1997-10-07 | 2006-04-12 | ヤマハ株式会社 | Speech encoding / decoding system |
1999
- 1999-03-08 JP JP2000537202A patent/JP4276781B2/en not_active Expired - Fee Related
- 1999-03-08 EP EP99917771A patent/EP1062659B1/en not_active Expired - Lifetime
- 1999-03-08 US US09/646,593 patent/US6804646B1/en not_active Expired - Fee Related
- 1999-03-08 WO PCT/DE1999/000615 patent/WO1999048084A1/en active IP Right Grant
- 1999-03-08 DE DE59900797T patent/DE59900797D1/en not_active Expired - Lifetime
Non-Patent Citations (1)
Title |
---|
See references of WO9948084A1 * |
Also Published As
Publication number | Publication date |
---|---|
US6804646B1 (en) | 2004-10-12 |
DE59900797D1 (en) | 2002-03-14 |
WO1999048084A1 (en) | 1999-09-23 |
EP1062659B1 (en) | 2002-01-30 |
JP4276781B2 (en) | 2009-06-10 |
JP2002507775A (en) | 2002-03-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
DE69726526T2 (en) | Scheme and model adaptation for pattern recognition based on Taylor expansion | |
DE102007001255B4 (en) | Audio signal processing method and apparatus and computer program | |
DE69131776T2 (en) | METHOD FOR VOICE ANALYSIS AND SYNTHESIS | |
DE69725802T2 (en) | Pre-filtering using lexical trees for speech recognition | |
DE69826446T2 (en) | VOICE CONVERSION | |
DE69830017T2 (en) | Method and device for speech recognition | |
EP0690436B1 (en) | Detection of the start/end of words for word recognition | |
EP2158588B1 (en) | Spectral smoothing method for noisy signals | |
EP0076234B1 (en) | Method and apparatus for reduced redundancy digital speech processing | |
DE69720861T2 (en) | Methods of sound synthesis | |
EP0994461A2 (en) | Method for automatically recognising a spelled speech utterance | |
EP1280138A1 (en) | Method for audio signals analysis | |
EP0076233B1 (en) | Method and apparatus for redundancy-reducing digital speech processing | |
EP0285222B1 (en) | Method for detecting associatively pronounced words | |
DE69918635T2 (en) | Apparatus and method for speech processing | |
EP0987682B1 (en) | Method for adapting linguistic language models | |
EP1193689A2 (en) | Method for the computation of an eigenspace for the representation of a plurality of training speakers | |
DE19581667C2 (en) | Speech recognition system and method for speech recognition | |
DE69922769T2 (en) | Apparatus and method for speech processing | |
DE3228757A1 (en) | METHOD AND DEVICE FOR PERIODIC COMPRESSION AND SYNTHESIS OF AUDIBLE SIGNALS | |
DE69723930T2 (en) | Method and device for speech synthesis and data carriers therefor | |
EP1062659B1 (en) | Method and device for processing a sound signal | |
DE60224100T2 (en) | GENERATION OF LSF VECTORS | |
EP1136982A2 (en) | Generation of a language model and an acoustic model for a speech recognition system | |
DE602004007223T2 (en) | Continuous vocal tract resonance tracking method using piecewise linear approximations |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
17P | Request for examination filed | Effective date: 20000417 |
AK | Designated contracting states | Kind code of ref document: A1; designated state(s): DE FR GB |
17Q | First examination report despatched | Effective date: 20010212 |
GRAG | Despatch of communication of intention to grant | Free format text: ORIGINAL CODE: EPIDOS AGRA |
RIC1 | Information provided on ipc code assigned before grant | Free format text: 7G 10L 21/02 A |
GRAG | Despatch of communication of intention to grant | Free format text: ORIGINAL CODE: EPIDOS AGRA |
GRAH | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOS IGRA |
GRAH | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOS IGRA |
GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
REG | Reference to a national code | Ref country code: GB; ref legal event code: IF02 |
AK | Designated contracting states | Kind code of ref document: B1; designated state(s): DE FR GB |
REF | Corresponds to: | Ref document number: 59900797; country of ref document: DE; date of ref document: 20020314 |
GBT | Gb: translation of ep patent filed (gb section 77(6)(a)/1977) | Effective date: 20020407 |
ET | Fr: translation filed | |
PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
26N | No opposition filed | |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: DE; free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; effective date: 20111001 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: FR; payment date: 20130408; year of fee payment: 15 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: DE; payment date: 20130521; year of fee payment: 15 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: GB; payment date: 20140312; year of fee payment: 16 |
REG | Reference to a national code | Ref country code: DE; ref legal event code: R119; ref document number: 59900797; country of ref document: DE |
REG | Reference to a national code | Ref country code: FR; ref legal event code: ST; effective date: 20141128 |
REG | Reference to a national code | Ref country code: DE; ref legal event code: R119; ref document number: 59900797; country of ref document: DE; effective date: 20141001 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: FR; free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; effective date: 20140331 |
GBPC | Gb: european patent ceased through non-payment of renewal fee | Effective date: 20150308 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: GB; free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; effective date: 20150308 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: DE; free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; effective date: 20141001 |