WO2019100500A1 - Speech signal denoising method and device - Google Patents

Speech signal denoising method and device

Info

Publication number
WO2019100500A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
sample
speech
frame
voice
Prior art date
Application number
PCT/CN2017/117553
Other languages
English (en)
French (fr)
Inventor
陈维亮
Original Assignee
歌尔科技有限公司
Priority date
Filing date
Publication date
Application filed by 歌尔科技有限公司 filed Critical 歌尔科技有限公司
Priority to US16/766,236 priority Critical patent/US11475907B2/en
Publication of WO2019100500A1 publication Critical patent/WO2019100500A1/zh

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02087 Noise filtering the noise being separate speech, e.g. cocktail party
    • G10L21/0272 Voice signal separating
    • G10L21/0308 Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques

Definitions

  • the present invention relates to the field of signal processing technologies, and in particular, to a voice signal denoising method and device.
  • at present, the signal input by the user is mainly subjected to noise reduction processing using a Least Mean Square (LMS) algorithm.
  • the LMS algorithm is mainly used to filter out environmental noise signals. If the user's input signal is mixed with other people's voice signals in addition to the environmental noise signal, applying the LMS algorithm for noise reduction still yields an unclear effective voice signal. It can be seen that a more effective speech signal denoising method is needed to remove the various noises in a speech signal and obtain a clear and effective speech signal.
  • aspects of the present invention provide a speech signal denoising method and apparatus for effectively removing ambient noise signals and other noise signals in a speech signal to obtain a clear speech signal.
  • the invention provides a speech signal denoising method, comprising:
  • the obtaining, from the voice signal sample library, of the sample signal that matches the first voice signal includes:
  • a sample signal having the highest similarity to the spectral features of the first speech signal is used as a sample signal matching the first speech signal.
  • performing the voiceprint recognition on the first voice signal to obtain a spectral feature of the first voice signal includes:
  • the extracting the spectral features of the at least one frame of the frequency domain signal to obtain the spectral features of the first voice signal includes:
  • a gray value corresponding to each frequency in the first frequency domain signal is used as a spectral feature of the first voice signal.
  • filtering the other noise signals in the first voice signal according to the sample signal matched with the first voice signal to obtain a valid voice signal includes:
  • Each frame of valid time domain signals is sequentially combined to obtain the valid speech signal.
  • before the filtering of the ambient noise signal in the original input signal according to the interference signal related to the ambient noise signal, the method further includes:
  • the second specified distance is greater than the first specified distance.
  • filtering the ambient noise signal in the original input signal according to the interference signal related to the ambient noise signal in the original input signal to obtain the first voice signal includes:
  • the ambient noise signal in the original input signal is filtered according to an interference signal associated with the ambient noise signal in the original input signal using a least mean square algorithm to obtain the first speech signal.
  • the present invention also provides an electronic device comprising: a processor, and a memory connected to the processor;
  • the memory is configured to store one or more computer instructions
  • the processor is configured to execute the one or more computer instructions for:
  • when the processor acquires a sample signal that matches the first voice signal from the voice signal sample library, the processor is specifically configured to:
  • a sample signal having the highest similarity to the spectral features of the first speech signal is used as a sample signal matching the first speech signal.
  • when the processor performs voiceprint recognition on the first voice signal to obtain a spectral feature of the first voice signal, the processor is specifically configured to:
  • the first noise reduction of the original input signal is achieved, that is, the environmental noise signal is filtered out; on this basis, the sample signal matched with the first voice signal is used to filter out other noise signals in the first voice signal to obtain a valid voice signal. In this way, the valid voice signal sent by the user can be retained, and noise signals other than the valid signal can be filtered out according to the matched sample signal, achieving a second noise reduction. Especially when the other noise signals are speech from other speakers, they can be effectively filtered out according to the sample signal.
  • the environmental noise signal and other noise signals in the original input signal are sequentially filtered out by the two noise reductions, and the obtained effective voice signal is clearer.
  • FIG. 1 is a schematic flowchart of a method for reducing noise of a voice signal according to an embodiment of the present invention
  • FIG. 3 is a schematic structural diagram of an electronic device according to another embodiment of the present invention.
  • FIG. 1 is a schematic flowchart of a method for reducing noise of a voice signal according to an embodiment of the present invention. As shown in Figure 1, the method includes the following steps:
  • S101 Filter an ambient noise signal in the original input signal according to an interference signal related to an ambient noise signal in the original input signal to obtain a first voice signal.
  • S102 Acquire, from the voice signal sample library, a sample signal that matches the first voice signal.
  • S103 Filter out other noise signals in the first voice signal according to the sample signal matched with the first voice signal to obtain a valid voice signal.
  • the original input signal refers to the voice signal input by the user through a microphone on a device such as a headset or a mobile phone. Due to environmental noise and the presence of other speakers, the original input signal includes ambient noise signals and other noise signals in addition to the valid voice signals emitted by the user. Among them, the environmental noise signal refers to the sound signal generated in industrial production, construction, transportation and social life that interferes with the surrounding living environment. Other noise signals may refer to noise signals other than ambient noise, such as voice signals from other speakers than the user.
  • the first noise reduction process is first performed on the original input signal to filter out the ambient noise signal in the original input signal. Then, the second noise reduction process is performed on the voice signal after the first noise reduction process to filter out other noise signals such as voice signals from other speakers, thereby obtaining a clearer voice signal.
  • the finally obtained speech signal is referred to as an effective speech signal.
  • the first noise reduction process filtering the ambient noise signal in the original input signal according to the interference signal associated with the ambient noise signal in the original input signal to obtain the first voice signal (ie, step S101).
  • the interference signal is a signal that is acquired from the same environment as the original input signal.
  • the original input signal is collected from a rainy environment, and the interference signal is also collected from the rainy environment.
  • the interference signal is mainly composed of an environmental noise signal, and has a fluctuating relationship with the environmental noise signal, that is, the interference signal is related to the environmental noise signal.
  • a noise reduction algorithm such as a least mean square algorithm, can be used to obtain a signal that approximates the ambient noise signal based on the interference signal.
  • the signal obtained from the interference signal is then subtracted from the original input signal to obtain a speech signal with the ambient noise signal filtered out.
  • a speech signal that filters out an environmental noise signal is referred to as a first speech signal.
  • the first speech signal includes other noise signals in addition to the effective speech signal, and based on this, the second speech noise reduction process is performed on the first speech signal.
  • a second noise reduction process acquiring a sample signal matching the first voice signal from the voice signal sample library (ie, step S102); filtering the first voice signal according to the sample signal matched with the first voice signal Other noise signals to obtain a valid speech signal (ie, step S103).
  • At least one sample signal is stored in the speech signal sample library.
  • These sample signals may be voice signals that are pre-inputted by the user in a relatively quiet environment, and these sample signals may be considered as valid voice signals that do not include noise.
  • one user can correspond to one sample signal or multiple sample signals.
  • for example, the user can store one sample signal each for the normal condition and for the condition when the user's throat is inflamed.
  • Matching the sample signal with the first speech signal means that the sample signal matches the time domain waveform, spectral characteristics, or statistical characteristics of the first speech signal. If the first speech signal matches a sample signal, it indicates that the first speech signal includes a valid speech signal sent by the user, and the first speech signal may be further subjected to noise reduction processing according to the sample signal to obtain an effective speech signal.
  • since the first speech signal matches the sample signal, the sample signal is correlated with the effective speech signal in the first speech signal and uncorrelated with the other noise signals.
  • the signal related to the sample signal, that is, the effective speech signal, is retained, and the signal unrelated to the sample signal, that is, the other noise signals, is filtered out.
  • the other noise signals are, for example, speech signals of other speakers.
  • the sample signal that matches the first speech signal is, for example, signal A. Since the vocal systems of other speakers are different from the vocal system of the user, the voice signals from other speakers are not correlated with signal A. Based on this, the speech signals of other speakers in the first speech signal can be filtered out to obtain the valid speech signal emitted by the user.
  • the first noise reduction of the original input signal is achieved, that is, the environmental noise signal is filtered out; on this basis, the sample signal matched with the first voice signal is used to filter out other noise signals in the first voice signal to obtain a valid voice signal. In this way, the valid voice signal sent by the user can be retained, and noise signals other than the valid signal can be filtered out according to the matched sample signal, achieving a second noise reduction. Especially when the other noise signals are speech from other speakers, they can be effectively filtered out according to the sample signal.
  • the environmental noise signal and other noise signals in the original input signal are sequentially filtered out by the two noise reductions, and the obtained effective voice signal is clearer.
  • acquiring a sample signal matching the first voice signal from the voice signal sample library includes: performing voiceprint recognition on the first voice signal to obtain a spectral feature of the first voice signal; calculating a similarity between the spectral feature of the first speech signal and the spectral feature of each sample signal stored in the speech signal sample library; and using the sample signal having the highest similarity with the spectral feature of the first speech signal as the sample signal matching the first speech signal.
  • the voiceprint is a sound wave spectrum carrying speech information displayed by an electroacoustic instrument.
  • the sound wave spectrum carrying the speech information in the first speech signal can be obtained by performing voiceprint recognition on the first speech signal, and the feature of the acoustic wave spectrum is extracted from the acoustic wave spectrum as the spectral feature of the first speech signal.
  • the spectral characteristics of different people's speech signals are different. The more similar the spectral features of two speech signals are, the higher the probability that the speakers of the two signals are the same person, and the better the two speech signals match. Based on this, the similarity between the spectral features of the first speech signal and the spectral features of each sample signal stored in the speech signal sample library is calculated, and the sample signal having the highest similarity with the spectral features of the first speech signal is used as the sample signal matching the first speech signal.
  • the voice signal sample library may store spectral features corresponding to each sample signal, so as to directly compare the similarity between the first voice signal and each sample signal.
  • a difference between amplitudes of the first speech signal and each sample signal at the same frequency may be calculated.
  • the sample signal having the highest similarity to the spectral features of the first speech signal is used as the sample signal matching the first speech signal.
  • in the voice signal sample library, the user's voice signal may not be stored, in which case there is no sample signal matching the first voice signal. Based on this, a similarity threshold can be set. A sample signal having the highest similarity with the spectral features of the first speech signal, with a similarity greater than the similarity threshold, is used as the sample signal matching the first speech signal, and the subsequent noise reduction operation is then performed. If no sample signal in the voice signal sample library has a similarity with the spectral features of the first voice signal greater than the similarity threshold, the first voice signal can be directly used as the effective voice signal, and the operation ends.
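The matching step described above can be sketched as follows — a minimal illustration assuming spectral features are stored as equal-length amplitude vectors. The function name, the mapping of per-frequency amplitude differences to a [0, 1] similarity score, and the threshold value of 0.8 are all illustrative assumptions, not taken from the patent:

```python
import numpy as np

def match_sample(first_features, sample_library, similarity_threshold=0.8):
    """Pick the library sample whose spectral features are most similar to
    those of the first speech signal; return None if no sample clears the
    similarity threshold (so the first signal is used as-is)."""
    best_name, best_score = None, -1.0
    first = np.asarray(first_features, dtype=float)
    for name, feats in sample_library.items():
        # smaller per-frequency amplitude differences -> higher similarity
        diff = np.mean(np.abs(first - np.asarray(feats, dtype=float)))
        score = 1.0 / (1.0 + diff)
        if score > best_score:
            best_name, best_score = name, score
    # only accept the best match if it clears the threshold
    return best_name if best_score > similarity_threshold else None
```

A library entry identical to the query scores 1.0 and is returned; a query far from every stored sample returns None, matching the fallback behavior described above.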
  • in performing voiceprint recognition on the first voice signal to obtain a spectral feature of the first voice signal, the first voice signal may first be windowed to obtain at least one frame of speech signal; then, a Fourier transform is performed on the at least one frame of speech signal to obtain at least one frame of frequency domain signal; and then the spectral features of the at least one frame of frequency domain signal are extracted to obtain the spectral feature of the first speech signal.
  • an infinitely long signal cannot be processed directly; instead, a finite time segment of it is taken for analysis.
  • since the speech signal is a short-term stationary signal, it is generally considered that the characteristics of a speech signal remain basically constant, or change only slowly, within 10 to 30 ms, so a small segment of the speech signal can be intercepted for spectrum analysis.
  • the first speech signal can be split into signals of at least one time segment by a window function, and the signal of each time segment can be referred to as a frame speech signal.
  • the length of the time segment may be any length of 10 to 30 ms.
  • the first voice signal may not be windowed, and the first voice signal may be directly used as a frame voice signal.
  • At least one frame of the speech signal is a time domain signal.
  • at least one frame of the speech signal may be Fourier transformed to obtain at least one frame of the frequency domain signal.
  • the Fourier transform may be implemented as a fast Fourier transform (FFT). FFT is a general term for efficient algorithms for computing the discrete Fourier transform (DFT) on a computer. With such an algorithm, the number of multiplications required to compute the DFT is greatly reduced; the more samples are transformed, the more significant the computational savings of the FFT algorithm.
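The windowing and FFT steps above can be sketched as follows; the 20 ms frame length (within the 10-30 ms range noted above), the 16 kHz sample rate, the Hamming window, and all names are illustrative assumptions:

```python
import numpy as np

def frame_and_fft(signal, fs=16000, frame_ms=20):
    """Split a time-domain signal into short non-overlapping frames,
    apply a window to each frame, and FFT each frame to obtain
    one frequency-domain signal per frame."""
    frame_len = int(fs * frame_ms / 1000)     # samples per frame
    n_frames = len(signal) // frame_len       # drop any trailing partial frame
    window = np.hamming(frame_len)
    frames = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len] * window
        frames.append(np.fft.rfft(frame))     # one frame of frequency-domain signal
    return frames
```

For real-valued input of frame length N, `np.fft.rfft` returns N/2 + 1 complex bins, one per non-negative frequency.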
  • spectral features of at least one frame of the frequency domain signal are extracted to obtain a spectral feature of the first speech signal.
  • a frame frequency domain signal may be selected from the at least one frame frequency domain signal as the first frequency domain signal; and the spectral feature of the first frequency domain signal is extracted as the spectrum feature of the first voice signal.
  • one frame of the at least one frame frequency domain signal may be selected as the first frequency domain signal.
  • the spectrum of a speech signal refers to the correspondence between the frequencies of the speech signal and the amplitude of the signal.
  • an amplitude-gradation mapping relationship may be preset, and the signal amplitude corresponding to each frequency is expressed by the corresponding gray scale.
  • the amplitude range of the signal amplitude corresponding to each frequency is quantized into 256 quantized values, 0 represents black, 255 represents white, and the larger the amplitude value, the smaller the corresponding gray value.
  • the gray value corresponding to the signal amplitude at each frequency in the first frequency domain signal is searched to map the signal amplitude on each frequency to a gray value. Then, the gray value corresponding to each frequency in the first frequency domain signal is used as the spectral feature of the first voice signal.
  • each frequency in the first frequency domain signal is, for example, 0 Hz, 400 Hz, 800 Hz, 1200 Hz, 1600 Hz, and 2000 Hz.
  • the gray values corresponding to the respective frequencies are 255, 0, 155, 255, 50, and 200, respectively.
  • the gray value corresponding to each of these frequencies is the spectral characteristic of the first speech signal.
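The amplitude-to-gray mapping described above can be sketched as follows; a simple linear mapping normalized to the frame's peak amplitude is assumed here as one possible realization of the 256-level quantization (the patent does not fix the exact mapping):

```python
import numpy as np

def amplitude_to_gray(freq_domain_frame):
    """Quantize per-frequency amplitudes into 256 gray levels: larger
    amplitudes map to smaller (darker) gray values, as described above."""
    amp = np.abs(freq_domain_frame)
    peak = amp.max()
    if peak == 0:
        # silent frame: every frequency maps to white (255)
        return np.full(amp.shape, 255, dtype=np.uint8)
    # zero amplitude -> 255 (white), peak amplitude -> 0 (black)
    gray = np.round(255 * (1 - amp / peak)).astype(np.uint8)
    return gray
```

The resulting vector of gray values, one per frequency bin, serves as the spectral feature of the frame.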
  • the envelope information corresponding to the frequency-decibel curve of the first frequency domain signal may be used as the spectral feature of the first speech signal.
  • the amplitude corresponding to each frequency in the first frequency domain signal is logarithmically calculated to obtain a decibel corresponding to each frequency, thereby obtaining a correspondence between each frequency and a decibel. Then, a frequency-decibel curve is obtained according to the correspondence between each frequency and decibel, and then the envelope information corresponding to the frequency-decibel curve is obtained.
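The frequency-decibel envelope extraction can be sketched as follows; the moving-average smoothing used here as the envelope estimate, and the window size, are illustrative assumptions:

```python
import numpy as np

def db_envelope(freq_domain_frame, smooth=5):
    """Convert per-frequency amplitude to decibels (logarithmic scale) and
    smooth the frequency-decibel curve to obtain a simple envelope estimate."""
    amp = np.abs(freq_domain_frame)
    db = 20 * np.log10(amp + 1e-12)          # amplitude -> decibels (epsilon avoids log(0))
    kernel = np.ones(smooth) / smooth        # moving-average kernel
    envelope = np.convolve(db, kernel, mode="same")
    return envelope
```

The envelope vector has the same length as the input frame and can be used as the spectral feature in place of the gray-value representation.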
  • the sample signal having the gray value corresponding to each frequency in the first frequency domain signal may be acquired from the voice signal sample library.
  • the gray value corresponding to each frequency in the sample signal may be pre-stored in the voice signal sample library. If the difference between the gray value corresponding to each frequency in the first frequency domain signal and the gray value corresponding to the same frequency in the sample signal is within a specified threshold range, the sample signal may be considered to match the first frequency domain signal, and further, to match the first speech signal.
  • the method for acquiring the gray value corresponding to each frequency in the sample signal is similar to the method for acquiring the gray value corresponding to each frequency in the first frequency domain signal.
  • to obtain a sample signal, a sample signal input by the user can be received; this sample signal is a time domain signal. Then, the time domain sample signal is windowed and Fourier transformed to obtain at least one frame of frequency domain sample signal.
  • a frame of the frequency domain sample signal is selected from the at least one frame of the frequency domain sample signal as the first frequency domain sample signal.
  • the signal amplitude at each frequency in the first frequency domain sample signal is mapped to a gray value according to a preset amplitude-gradation mapping relationship.
  • the gray value corresponding to each frequency in the first frequency domain sample signal is used as the spectral feature of the first frequency domain sample, that is, the spectral characteristic of the sample signal.
  • the frame length of each frame of the frequency domain signal should be the same as the frame length of each frame of the sample signal. If the time length of a frame of the frequency domain signal is 10 ms, the frame length of the sample signal matched against that frame should also be 10 ms.
  • filtering out other noise signals in the first voice signal according to the sample signal matched with the first voice signal to obtain an effective voice signal includes: according to the sample signal matched with the first voice signal, using a least mean square algorithm to calculate the other noise values in each frame of the frequency domain signal; subtracting the other noise values in each frame of the frequency domain signal from that frame to obtain a valid frequency domain signal for each frame; performing an inverse Fourier transform on each frame of the valid frequency domain signal to obtain a valid time domain signal for each frame; and sequentially combining the valid time domain signals of each frame to obtain the effective voice signal.
  • the least mean square algorithm is an adaptive iterative algorithm that minimizes the mean square value of the error between the expected response and the filter output: the gradient vector is estimated from the input signal during each iteration, and the weight coefficients are updated accordingly to approach the optimum.
  • the least mean square algorithm is a form of steepest gradient descent, and its notable advantages are simplicity and speed.
  • the first voice signal can be converted into at least one frame frequency domain signal, and the method for filtering out other noise signals is the same for each frame frequency domain signal in at least one frame frequency domain signal.
  • the following uses the first frame frequency domain signal as an example to illustrate a method of filtering out other noise signals in the first frame frequency domain signal.
  • the first frame frequency domain signal is weighted to obtain the first frame weighted signal.
  • the sample signal matched with the first speech signal and the first frame weighted signal are taken as inputs, and the other noise values in the first frame frequency domain signal are taken as the desired output.
  • the weight function in the first frame weighted signal is iterated multiple times, so that the first frame weighted signal is approximated to the sample signal.
  • the weight function may be referred to as an optimal weight function.
  • the weight function in the first frame weighted signal may be iterated a specified number of times to obtain the optimal weight function; alternatively, the weight function may be iterated repeatedly until the difference between the first frame weighted signal and the sample signal is within a specified error range, at which point the resulting weight function is the optimal weight function.
  • the sample signal is subtracted from the product of the optimal weight function and the first frame frequency domain signal to obtain other noise values.
  • the first frame frequency domain signal is subtracted from other noise values in the first frame frequency domain signal to obtain a valid signal in the first frame frequency domain signal.
  • a valid speech signal in the frequency domain signal of each frame can be obtained.
  • the effective speech signal in each frame of the frequency domain signal obtained above is a frequency domain signal, which must still be converted into a time domain signal. Based on this, an inverse Fourier transform is performed on each frame of the valid frequency domain signal to obtain a valid time domain signal for each frame; then, the valid time domain signals of each frame are combined sequentially in time order, establishing the connection relationship between frames, to obtain the effective speech signal in the time domain.
  • the effective speech signal in the time domain is a signal from which the environmental noise signal and other noise signals have been removed, and can be used for output to a speaker, speech recognition, voice communication, and the like.
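The per-frame subtraction, inverse transform, and recombination steps described above can be sketched as follows; non-overlapping frames produced with `np.fft.rfft` and precomputed per-frame noise spectra are assumed:

```python
import numpy as np

def reconstruct_valid_signal(frame_spectra, noise_spectra):
    """Subtract the estimated noise from each frame's spectrum, inverse-FFT
    each frame back to the time domain, and concatenate the frames in time
    order to obtain the effective speech signal."""
    valid_frames = []
    for spectrum, noise in zip(frame_spectra, noise_spectra):
        valid_spectrum = spectrum - noise                  # valid frequency-domain frame
        valid_frames.append(np.fft.irfft(valid_spectrum))  # valid time-domain frame
    return np.concatenate(valid_frames)                    # effective speech signal
```

With a zero noise estimate the round trip through `rfft`/`irfft` reproduces the original frame samples, which is a convenient sanity check.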
  • the least mean square algorithm may also be used in this embodiment to filter out the ambient noise signal in the original input signal according to the interference signal related to the ambient noise signal, to obtain the first speech signal.
  • the interference signal is weighted to obtain a weighted signal.
  • x(n) is the interference signal
  • w(n) is the weight function
  • the original input signal is d(n) = s(n) + N0(n), where s(n) is the first voice signal and N0(n) is the ambient noise signal; N0(n) is correlated with N1(n).
  • the interference signal and the original input signal are taken as inputs, and the first speech signal is taken as a desired output, and the weight function in the weighted signal is iterated multiple times through the least mean square algorithm, so that the weighted signal approaches the environmental noise signal.
  • the weight function at this time can be called the optimal weight function.
  • the original input signal is subtracted from the product of the optimal weight function and the interference signal to obtain a first speech signal.
  • the desired output is the difference between the original input signal and the weighted signal, that is, the error signal, as shown in equation (2).
  • the mean square error of the expected output is:
  • the initial value of the weight function can be set to 0, and then the weight function is adaptively updated.
  • the adaptive update process of the weight function is as follows.
  • the error signal e(n) is calculated.
  • equation (10) can be obtained according to equation (9).
  • the updated weight function can be expressed as:
  • the step size is a relatively small value, so that the update algorithm of the weight function converges, thus ensuring the accuracy of the algorithm.
  • the weight function can be substituted into equation (6) to obtain a weighted signal that is close to the ambient noise signal, i.e., y(n). Then, the weighted signal is subtracted from the original input signal, i.e., d(n) - y(n), to obtain the first speech signal with the ambient noise signal filtered out.
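The adaptive noise cancellation described above can be sketched as follows; the filter order, step size, and tapped-delay-line structure are illustrative choices, not taken from the patent:

```python
import numpy as np

def lms_denoise(d, x, mu=0.005, order=8):
    """LMS adaptive noise canceller. d(n) = s(n) + N0(n) is the primary
    input, x(n) the reference interference correlated with N0(n). The
    filter output y(n) tracks the noise, and the error signal
    e(n) = d(n) - y(n) approximates the first speech signal s(n)."""
    w = np.zeros(order)                        # weight function initialized to 0
    e = np.zeros(len(d))                       # error signal (first samples left unconverged)
    for n in range(order - 1, len(d)):
        x_vec = x[n - order + 1:n + 1][::-1]   # x(n), x(n-1), ..., most recent first
        y = np.dot(w, x_vec)                   # weighted signal y(n), approximates N0(n)
        e[n] = d[n] - y                        # subtract estimated noise
        w = w + 2 * mu * e[n] * x_vec          # gradient-descent weight update
    return e
```

Driving this with a clean sinusoid buried in scaled white noise, and the same white noise as the reference, lets the weights converge toward the scaling factor so that the error output approaches the clean signal.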
  • the original input signal may be acquired by a first microphone within a first specified distance from the sound source; and the interference signal may be acquired by a second microphone outside the first specified distance but within a second specified distance from the sound source.
  • the second specified distance is greater than the first specified distance.
  • when the first microphone and the second microphone are mounted on a headset and the sound source is the mouth of the user, the first microphone may be disposed at a position within the first specified distance from the mouth of the user, that is, near the mouth of the user, such as the position on the headset corresponding to the corner of the mouth.
  • the second microphone may be disposed at a position outside the first specified distance but within the second specified distance from the mouth of the user, that is, away from the mouth of the user, such as the position on the headset corresponding to the head.
  • the first microphone and the second microphone should be in the same environment, so that the ambient noise signal in the original input signal collected by the first microphone is related to the interference signal collected by the second microphone.
  • the first microphone is close to the sound source and the second microphone is away from the sound source, so that most of the original input signal collected by the first microphone is the valid voice signal, with a small part being the ambient noise signal and other noise signals, while most of the interference signal collected by the second microphone is the ambient noise signal, with a small part being the valid voice signal. Based on this, the ambient noise signal in the original input signal collected by the first microphone can be filtered out according to the interference signal collected by the second microphone to obtain the first voice signal.
  • FIG. 3 is a schematic structural diagram of an electronic device according to another embodiment of the present invention.
  • the electronic device 200 includes a processor 201, and a memory 202 connected to the processor 201;
  • the memory 202 is configured to store one or more computer instructions.
  • the processor 201 is configured to execute the one or more computer instructions to: filter out an ambient noise signal in an original input signal according to an interference signal correlated with the ambient noise signal in the original input signal, to obtain a first voice signal; acquire, from a voice signal sample library, a sample signal that matches the first voice signal; and filter out other noise signals in the first voice signal according to the sample signal that matches the first voice signal, to obtain a valid voice signal.
  • filtering the ambient noise signal out of the original input signal to obtain the first voice signal achieves a first noise reduction of the original input signal. On this basis, the other noise signals in the first voice signal are filtered out according to the sample signal that matches the first voice signal, to obtain the valid voice signal, so that the valid voice signal uttered by the user is retained while noise signals other than the valid signal are filtered out, achieving a second noise reduction. In particular, when the other noise signals are speech from other speakers, they can be effectively filtered out according to the sample signal.
  • through the two noise reductions, the ambient noise signal and the other noise signals in the original input signal are filtered out in turn, and the resulting valid voice signal is clearer.
  • when acquiring the sample signal that matches the first voice signal from the voice signal sample library, the processor 201 is specifically configured to: perform voiceprint recognition on the first voice signal to obtain a spectral feature of the first voice signal; calculate the similarity between the spectral feature of the first voice signal and the spectral feature of each sample signal stored in the voice signal sample library; and take the sample signal whose spectral feature has the highest similarity with that of the first voice signal as the sample signal matching the first voice signal.
  • when performing voiceprint recognition on the first voice signal to obtain its spectral feature, the processor 201 is specifically configured to: window the first voice signal to obtain at least one frame of voice signal; perform a Fourier transform on the at least one frame of voice signal to obtain at least one frame of frequency-domain signal; and extract a spectral feature of the at least one frame of frequency-domain signal to obtain the spectral feature of the first voice signal.
  • when extracting the spectral feature of the at least one frame of frequency-domain signal to obtain the spectral feature of the first voice signal, the processor 201 is specifically configured to: select one frame of frequency-domain signal from the at least one frame of frequency-domain signal as a first frequency-domain signal; map the signal amplitude at each frequency in the first frequency-domain signal to a gray value according to a preset amplitude-to-grayscale mapping; and take the gray values corresponding to the frequencies in the first frequency-domain signal as the spectral feature of the first voice signal.
  • when filtering out the other noise signals in the first voice signal according to the sample signal that matches the first voice signal, to obtain the valid voice signal, the processor 201 is specifically configured to: calculate, with a least mean square algorithm, the other-noise values in each frame of frequency-domain signal according to the matching sample signal; subtract those other-noise values from each frame of frequency-domain signal to obtain each frame of valid frequency-domain signal; perform an inverse Fourier transform on each frame of valid frequency-domain signal to obtain each frame of valid time-domain signal; and combine the frames of valid time-domain signal in sequence to obtain the valid voice signal.
  • before filtering out the ambient noise signal in the original input signal according to the interference signal correlated with the ambient noise signal in the original input signal, the processor 201 is further configured to: collect the original input signal by a first microphone within a first specified distance from the sound source; and collect the interference signal by a second microphone outside the first specified distance but within a second specified distance from the sound source, where the second specified distance is greater than the first specified distance.
  • when filtering out the ambient noise signal in the original input signal according to the correlated interference signal to obtain the first voice signal, the processor 201 is specifically configured to: use the least mean square algorithm to filter out the ambient noise signal in the original input signal according to the interference signal correlated with the ambient noise signal, to obtain the first voice signal.
  • an embodiment of the invention further provides a computer storage medium storing one or more computer instructions which, when executed by a computer, implement: filtering out an ambient noise signal in an original input signal according to an interference signal correlated with the ambient noise signal in the original input signal, to obtain a first voice signal; acquiring, from a voice signal sample library, a sample signal that matches the first voice signal; and filtering out other noise signals in the first voice signal according to the matching sample signal, to obtain a valid voice signal.
  • embodiments of the present invention can be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the invention can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The present invention provides a voice signal noise reduction method and device. The method includes the following steps: filtering out an ambient noise signal in an original input signal according to an interference signal correlated with the ambient noise signal in the original input signal, to obtain a first voice signal; acquiring, from a voice signal sample library, a sample signal that matches the first voice signal; and filtering out other noise signals in the first voice signal according to the sample signal that matches the first voice signal, to obtain a valid voice signal. The method provided by the present invention can effectively filter out both the ambient noise signal and the other noise signals in a voice signal.

Description

Voice signal noise reduction method and device
Technical Field
The present invention relates to the field of signal processing, and in particular to a voice signal noise reduction method and device.
Background
With the development of technology, many devices with voice input functions have appeared, such as mobile phones, robots, and smart speakers. While a user inputs a voice signal through a microphone on a headset, noise signals are mixed in along with the user's voice; these noise signals interfere with the input voice signal and reduce the clarity of the valid voice signal.
At present, the least mean square (LMS) algorithm is mainly used to denoise the signal input by the user. The LMS algorithm is primarily used to filter out ambient noise signals; if the input signal also contains other people's voice signals in addition to ambient noise, the valid voice signal obtained after LMS denoising is still unclear. A more effective voice signal noise reduction method is therefore needed to remove the various kinds of noise in a voice signal and obtain a clear, valid voice signal.
Summary of the Invention
Aspects of the present invention provide a voice signal noise reduction method and device for effectively removing the ambient noise signal and other noise signals in a voice signal, so as to obtain a clear voice signal.
The present invention provides a voice signal noise reduction method, including:
filtering out an ambient noise signal in an original input signal according to an interference signal correlated with the ambient noise signal in the original input signal, to obtain a first voice signal;
acquiring, from a voice signal sample library, a sample signal that matches the first voice signal; and
filtering out other noise signals in the first voice signal according to the sample signal that matches the first voice signal, to obtain a valid voice signal.
Optionally, acquiring, from the voice signal sample library, the sample signal that matches the first voice signal includes:
performing voiceprint recognition on the first voice signal to obtain a spectral feature of the first voice signal;
calculating the similarity between the spectral feature of the first voice signal and the spectral feature of each sample signal stored in the voice signal sample library; and
taking the sample signal whose spectral feature has the highest similarity with that of the first voice signal as the sample signal that matches the first voice signal.
Optionally, performing voiceprint recognition on the first voice signal to obtain the spectral feature of the first voice signal includes:
windowing the first voice signal to obtain at least one frame of voice signal;
performing a Fourier transform on the at least one frame of voice signal to obtain at least one frame of frequency-domain signal; and
extracting a spectral feature of the at least one frame of frequency-domain signal to obtain the spectral feature of the first voice signal.
Optionally, extracting the spectral feature of the at least one frame of frequency-domain signal to obtain the spectral feature of the first voice signal includes:
selecting one frame of frequency-domain signal from the at least one frame of frequency-domain signal as a first frequency-domain signal;
mapping the signal amplitude at each frequency in the first frequency-domain signal to a gray value according to a preset amplitude-to-grayscale mapping; and
taking the gray values corresponding to the frequencies in the first frequency-domain signal as the spectral feature of the first voice signal.
Optionally, filtering out the other noise signals in the first voice signal according to the sample signal that matches the first voice signal, to obtain the valid voice signal, includes:
calculating, with a least mean square algorithm, the other-noise values in each frame of frequency-domain signal according to the sample signal that matches the first voice signal;
subtracting from each frame of frequency-domain signal the other-noise values in that frame, to obtain each frame of valid frequency-domain signal;
performing an inverse Fourier transform on each frame of valid frequency-domain signal to obtain each frame of valid time-domain signal; and
combining the frames of valid time-domain signal in sequence to obtain the valid voice signal.
Optionally, before filtering out the ambient noise signal in the original input signal according to the interference signal correlated with the ambient noise signal in the original input signal, the method further includes:
collecting the original input signal by a first microphone within a first specified distance from a sound source; and
collecting the interference signal by a second microphone outside the first specified distance but within a second specified distance from the sound source;
where the second specified distance is greater than the first specified distance.
Optionally, filtering out the ambient noise signal in the original input signal according to the interference signal correlated with the ambient noise signal in the original input signal, to obtain the first voice signal, includes:
using a least mean square algorithm to filter out the ambient noise signal in the original input signal according to the interference signal correlated with the ambient noise signal in the original input signal, to obtain the first voice signal.
The present invention further provides an electronic device, including a processor and a memory connected to the processor;
the memory is configured to store one or more computer instructions;
the processor is configured to execute the one or more computer instructions to:
filter out an ambient noise signal in an original input signal according to an interference signal correlated with the ambient noise signal in the original input signal, to obtain a first voice signal;
acquire, from a voice signal sample library, a sample signal that matches the first voice signal; and
filter out other noise signals in the first voice signal according to the sample signal that matches the first voice signal, to obtain a valid voice signal.
Optionally, when acquiring the sample signal that matches the first voice signal from the voice signal sample library, the processor is specifically configured to:
perform voiceprint recognition on the first voice signal to obtain a spectral feature of the first voice signal;
calculate the similarity between the spectral feature of the first voice signal and the spectral feature of each sample signal stored in the voice signal sample library; and
take the sample signal whose spectral feature has the highest similarity with that of the first voice signal as the sample signal that matches the first voice signal.
Optionally, when performing voiceprint recognition on the first voice signal to obtain the spectral feature of the first voice signal, the processor is specifically configured to:
window the first voice signal to obtain at least one frame of voice signal;
perform a Fourier transform on the at least one frame of voice signal to obtain at least one frame of frequency-domain signal; and
extract a spectral feature of the at least one frame of frequency-domain signal to obtain the spectral feature of the first voice signal.
In the present invention, the ambient noise signal in the original input signal is filtered out to obtain the first voice signal, achieving a first noise reduction of the original input signal, i.e., the ambient noise signal is removed. On this basis, the other noise signals in the first voice signal are filtered out according to the sample signal that matches the first voice signal, to obtain the valid voice signal. In this way, the valid voice signal uttered by the user is retained according to the matching sample signal while the noise signals other than the valid signal are filtered out, achieving a second noise reduction. In particular, when the other noise signals are speech from other speakers, they can be effectively filtered out according to the sample signal. In this embodiment, the ambient noise signal and the other noise signals in the original input signal are filtered out in turn by the two noise reductions, and the resulting valid voice signal is clearer.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a voice signal noise reduction method according to an embodiment of the present invention;
FIG. 2 shows the gray values corresponding to the frequencies in a first frame of frequency-domain signal according to another embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device according to another embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to specific embodiments and the accompanying drawings. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The technical solutions provided by the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
FIG. 1 is a schematic flowchart of a voice signal noise reduction method according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
S101: Filter out an ambient noise signal in an original input signal according to an interference signal correlated with the ambient noise signal in the original input signal, to obtain a first voice signal.
S102: Acquire, from a voice signal sample library, a sample signal that matches the first voice signal.
S103: Filter out other noise signals in the first voice signal according to the sample signal that matches the first voice signal, to obtain a valid voice signal.
The original input signal is the voice signal input by a user through a microphone on a device such as a headset or a mobile phone. Because of ambient noise and other speakers, the original input signal includes, in addition to the valid voice signal uttered by the user, an ambient noise signal and other noise signals. The ambient noise signal refers to sound generated in industrial production, construction, transportation, and social life that disturbs the surrounding living environment. The other noise signals refer to noise other than ambient noise, for example, voice signals uttered by speakers other than the user.
In this embodiment, a first noise reduction is first applied to the original input signal to filter out the ambient noise signal. A second noise reduction is then applied to the once-denoised voice signal to filter out the other noise signals, such as voice signals uttered by other speakers, so as to obtain a clearer voice signal. For convenience of description, the finally obtained voice signal is called the valid voice signal.
The two noise reduction processes applied to the original input signal are described in detail below.
First noise reduction: filter out the ambient noise signal in the original input signal according to the interference signal correlated with the ambient noise signal in the original input signal, to obtain the first voice signal (step S101).
The interference signal is collected from the same environment as the original input signal. For example, if the original input signal is collected in a rainy environment, the interference signal is also collected in that rainy environment. The interference signal consists mainly of the ambient noise signal and varies together with it; that is, the interference signal is correlated with the ambient noise signal.
Based on this correlation, a noise reduction algorithm such as the least mean square algorithm can be used to derive, from the interference signal, a signal that approximates the ambient noise signal. The signal derived from the interference signal is then subtracted from the original input signal to obtain a voice signal with the ambient noise signal filtered out. For convenience of description, this voice signal is called the first voice signal.
Besides the valid voice signal, the first voice signal still contains other noise signals. On this basis, a second noise reduction is applied to the first voice signal, as follows.
Second noise reduction: acquire, from the voice signal sample library, the sample signal that matches the first voice signal (step S102); filter out the other noise signals in the first voice signal according to the matching sample signal, to obtain the valid voice signal (step S103).
The voice signal sample library stores at least one sample signal. These sample signals may be voice signals input in advance by users in a relatively quiet environment and can be regarded as valid voice signals containing no noise. One user may correspond to one sample signal or to several; for example, a user may store one sample signal recorded with a normal throat and another recorded with an inflamed throat.
A sample signal matching the first voice signal means that the sample signal matches the first voice signal in terms of time-domain waveform, spectral characteristics, statistical characteristics, or the like. If the first voice signal matches a sample signal, the first voice signal contains the valid voice signal uttered by the user, and the first voice signal can be denoised again according to the sample signal to obtain the valid voice signal.
Because the first voice signal matches the sample signal, the sample signal is correlated with the valid voice signal in the first voice signal and uncorrelated with the other noise signals. Accordingly, the signal correlated with the sample signal, i.e., the valid voice signal, can be retained, and the signals uncorrelated with the sample signal, i.e., the other noise signals, can be filtered out.
In one example, the other noise signals are voice signals of other speakers, and the sample signal matching the first voice signal is, for example, signal A. Because other speakers' vocal systems differ from the user's, the voice signals they utter are uncorrelated with signal A. Accordingly, the other speakers' voice signals in the first voice signal can be filtered out to obtain the valid voice signal uttered by the user.
In this embodiment, the ambient noise signal in the original input signal is filtered out to obtain the first voice signal, achieving a first noise reduction; on this basis, the other noise signals in the first voice signal are filtered out according to the matching sample signal to obtain the valid voice signal, so that the valid voice signal uttered by the user is retained and the other noise signals are filtered out, achieving a second noise reduction. In particular, when the other noise signals are speech from other speakers, they can be effectively filtered out according to the sample signal. Through the two noise reductions, the ambient noise signal and the other noise signals in the original input signal are filtered out in turn, and the resulting valid voice signal is clearer.
In the foregoing or following embodiments, acquiring the sample signal that matches the first voice signal from the voice signal sample library includes: performing voiceprint recognition on the first voice signal to obtain a spectral feature of the first voice signal; calculating the similarity between the spectral feature of the first voice signal and the spectral feature of each sample signal stored in the voice signal sample library; and taking the sample signal whose spectral feature has the highest similarity with that of the first voice signal as the matching sample signal.
A voiceprint is a sound-wave spectrum, displayed by an electro-acoustic instrument, that carries speech information. Voiceprint recognition on the first voice signal yields the sound-wave spectrum carrying the speech information in the first voice signal, from which features can be extracted as the spectral feature of the first voice signal.
Different people's voice signals have different spectral features. The more similar the spectral features of two voice signals, the higher the probability that they were uttered by the same person, and the better the two signals match. Accordingly, the similarity between the spectral feature of the first voice signal and the spectral feature of each sample signal stored in the voice signal sample library is calculated, and the sample signal with the highest similarity is taken as the matching sample signal.
Optionally, in addition to the sample signals themselves, the voice signal sample library may store the spectral feature corresponding to each sample signal, so that the similarity between the first voice signal and each sample signal can be compared directly.
Optionally, the difference between the amplitudes of the first voice signal and a sample signal at the same frequency can be calculated. A larger difference means a lower similarity between the first voice signal and the sample signal, and hence a lower probability that the first voice signal contains the user's voice; a smaller difference means a higher similarity and a higher probability. On this basis, the sample signal with the highest spectral-feature similarity to the first voice signal is taken as the matching sample signal.
The voice signal sample library may not store the user's voice signal, in which case no sample signal matches the first voice signal. A similarity threshold can therefore be set: the sample signal with the highest similarity to the spectral feature of the first voice signal, provided that the similarity exceeds the threshold, is taken as the matching sample signal, and the subsequent noise reduction is performed. If no sample signal in the library both has the highest similarity and exceeds the threshold, the first voice signal can be taken directly as the valid voice signal and the procedure ends.
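The matching step described above (per-frequency amplitude differences, highest similarity wins, with a similarity threshold for the no-match case) can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the `1/(1 + diff)` similarity score and the threshold value are assumptions introduced here for demonstration, and the library keys and feature vectors are made up.

```python
import numpy as np

def spectral_similarity(feat_a, feat_b):
    """Similarity in (0, 1] from per-frequency amplitude differences:
    smaller differences give a higher similarity, as described in the text.
    The 1/(1 + mean-abs-diff) score is an assumed choice."""
    diff = np.mean(np.abs(np.asarray(feat_a, float) - np.asarray(feat_b, float)))
    return 1.0 / (1.0 + diff)

def best_match(query_feat, sample_library, threshold=0.1):
    """Return the key of the sample with the highest similarity, or None
    when the best score does not exceed `threshold` (no matching sample
    stored, so the first voice signal is used directly)."""
    scored = {k: spectral_similarity(query_feat, v) for k, v in sample_library.items()}
    key = max(scored, key=scored.get)
    return key if scored[key] > threshold else None

# hypothetical library: gray-value features for one user in two voice states
library = {"user_normal": [255, 0, 155, 255, 50, 200],
           "user_hoarse": [240, 10, 160, 250, 60, 190]}
match = best_match([250, 5, 150, 255, 55, 198], library)
```

Here `match` picks `"user_normal"`, whose gray values differ least from the query at every frequency.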
In the foregoing or following embodiments, in performing voiceprint recognition on the first voice signal to obtain its spectral feature, the first voice signal may first be windowed to obtain at least one frame of voice signal; a Fourier transform is then applied to the at least one frame of voice signal to obtain at least one frame of frequency-domain signal; and a spectral feature of the at least one frame of frequency-domain signal is extracted to obtain the spectral feature of the first voice signal.
When signals are processed on a computer, an infinitely long signal cannot be processed; instead, finite time segments are taken for analysis. Moreover, a voice signal is short-time stationary: its characteristics are generally considered essentially unchanged, or only slowly varying, within 10-30 ms, so a short segment can be taken for spectral analysis. Accordingly, a window function can split the first voice signal into at least one time segment, each segment being called one frame of voice signal. The segment length can be any duration within 10-30 ms.
Optionally, if the first voice signal is already 10-30 ms long, windowing can be skipped and the first voice signal used directly as one frame of voice signal.
The at least one frame of voice signal is a time-domain signal. To obtain the spectral feature of the voice signal in the frequency domain, a Fourier transform is applied to the at least one frame of voice signal to obtain at least one frame of frequency-domain signal. Optionally, a fast Fourier transform (FFT) can be applied. FFT is the general term for efficient, fast computer algorithms for the discrete Fourier transform (DFT). It greatly reduces the number of multiplications the computer needs, and the savings grow with the number of sampled points being transformed.
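The framing step described above (10-30 ms segments followed by an FFT per frame) can be sketched like this. It is an illustrative sketch: the 16 kHz sampling rate, the 20 ms frame length, and the Hamming window are assumed values within the ranges the text allows, not choices fixed by the patent.

```python
import numpy as np

def frames_to_spectra(signal, fs=16000, frame_ms=20):
    """Split `signal` into short frames (10-30 ms; 20 ms assumed here),
    apply a window function to each frame, and FFT each windowed frame
    into one frame of frequency-domain signal."""
    frame_len = int(fs * frame_ms / 1000)       # samples per frame (320 here)
    n_frames = len(signal) // frame_len
    window = np.hamming(frame_len)              # assumed window choice
    spectra = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len] * window
        spectra.append(np.fft.rfft(frame))      # one frame of frequency-domain signal
    return np.array(spectra)

# 1 second of a 400 Hz tone as a stand-in for a voice signal
sig = np.sin(2 * np.pi * 400 * np.arange(16000) / 16000)
spectra = frames_to_spectra(sig)
```

With these values each frame is 320 samples, the frequency bins are 50 Hz apart, and the 400 Hz tone peaks in bin 8 of every frame.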
Next, a spectral feature of the at least one frame of frequency-domain signal is extracted to obtain the spectral feature of the first voice signal.
The spectral features of the individual frames of voice signal are nearly identical. Therefore, one frame of frequency-domain signal can be selected from the at least one frame as the first frequency-domain signal, and its spectral feature extracted as the spectral feature of the first voice signal.
Optionally, any frame of the at least one frame of frequency-domain signal can be chosen as the first frequency-domain signal.
The spectrum of a voice signal is the correspondence between its frequencies and signal amplitudes. To reflect the spectral feature of the voice signal clearly and intuitively, an amplitude-to-grayscale mapping can be preset so that the signal amplitude at each frequency is expressed as a corresponding gray level. Optionally, the amplitude range covered by the signal amplitudes at the frequencies is quantized into 256 levels, with 0 representing black and 255 representing white; the larger the amplitude, the smaller the corresponding gray value.
Then, for each frequency in the first frequency-domain signal, the gray value corresponding to the signal amplitude is looked up in the amplitude-to-grayscale mapping, so that the amplitude at each frequency is mapped to a gray value. The gray values corresponding to the frequencies in the first frequency-domain signal are then taken as the spectral feature of the first voice signal.
In one example, as shown in FIG. 2, the frequencies in the first frequency-domain signal are, for example, 0 Hz, 400 Hz, 800 Hz, 1200 Hz, 1600 Hz, and 2000 Hz, with corresponding gray values 255, 0, 155, 255, 50, and 200. These gray values are the spectral feature of the first voice signal.
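A minimal sketch of the amplitude-to-grayscale mapping described above, assuming a linear quantization of the frame's own amplitude range into 256 levels (the patent leaves the exact mapping open). As stated in the text, larger amplitudes map to smaller (darker) gray values; the sample amplitudes below are made up to roughly echo the gray values in the FIG. 2 example.

```python
import numpy as np

def amplitude_to_gray(amplitudes):
    """Map the signal amplitudes of one frame to 256 gray levels:
    the amplitude range is quantized to 0..255, with larger amplitudes
    getting smaller gray values (0 = black, 255 = white)."""
    a = np.abs(np.asarray(amplitudes, dtype=float))
    if a.max() == a.min():
        return np.full(a.shape, 255, dtype=int)   # flat spectrum: all white
    scaled = (a - a.min()) / (a.max() - a.min())  # normalize to [0, 1]
    return np.round(255 * (1.0 - scaled)).astype(int)

# hypothetical amplitudes at 0, 400, 800, 1200, 1600, 2000 Hz
gray = amplitude_to_gray([0.0, 1.0, 0.4, 0.0, 0.8, 0.2])
```

Here `gray` comes out as `[255, 0, 153, 255, 51, 204]`: the largest amplitude (at 400 Hz) maps to 0 and zero amplitudes map to 255.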
Optionally, instead of the gray values corresponding to the frequencies in the first frequency-domain signal, the envelope information of the frequency-decibel curve of the first frequency-domain signal can be used as the spectral feature of the first voice signal.
Optionally, the amplitude at each frequency in the first frequency-domain signal is converted to decibels by taking the logarithm, giving the correspondence between each frequency and its decibel value. A frequency-decibel curve is then obtained from this correspondence, and from it the envelope information of the curve.
After the gray values corresponding to the frequencies in the first frequency-domain signal are obtained, a sample signal whose gray values are close to those of the first frequency-domain signal can be acquired from the voice signal sample library.
Optionally, the gray values corresponding to the frequencies in each sample signal can be stored in advance in the voice signal sample library. If the difference between the gray value at each frequency in the first frequency-domain signal and the gray value at the same frequency in a sample signal is within a specified threshold range, the sample signal can be considered to match the first frequency-domain signal, and further, to match the first voice signal.
The gray values corresponding to the frequencies in a sample signal are obtained similarly to those of the first frequency-domain signal. For a given sample signal, the sample signal input by the user, a time-domain signal, is received; the time-domain sample signal is windowed and Fourier-transformed to obtain at least one frame of frequency-domain sample signal; one frame of frequency-domain sample signal is selected as the first frequency-domain sample signal; and the signal amplitudes at its frequencies are mapped to gray values according to the preset amplitude-to-grayscale mapping.
Further, the gray values corresponding to the frequencies in the first frequency-domain sample signal are taken as the spectral feature of the first frequency-domain sample, i.e., the spectral feature of that sample signal.
It should be noted that the frame length of each frame of frequency-domain signal should equal the frame length of each frame of the sample signal: if a frame of frequency-domain signal is 10 ms long, the frames of the matching sample signal should also be 10 ms long.
In the foregoing or following embodiments, filtering out the other noise signals in the first voice signal according to the matching sample signal, to obtain the valid voice signal, includes: calculating, with the least mean square algorithm, the other-noise values in each frame of frequency-domain signal according to the matching sample signal; subtracting those other-noise values from each frame of frequency-domain signal to obtain each frame of valid frequency-domain signal; applying an inverse Fourier transform to each frame of valid frequency-domain signal to obtain each frame of valid time-domain signal; and combining the frames of valid time-domain signal in sequence to obtain the valid voice signal.
The least mean square algorithm minimizes the mean square value of the error between the desired response and the output signal; it is an adaptive iterative algorithm that estimates the gradient vector from the input signal during iteration and updates the weight coefficients toward the optimum. As a steepest-descent gradient method, its notable characteristics and advantages are simplicity and speed.
The first voice signal can be converted into at least one frame of frequency-domain signal, and the other noise signals are filtered out of each frame in the same way. Taking the first frame of frequency-domain signal as an example, the method is as follows.
A weight function is applied to the first frame of frequency-domain signal to obtain a first frame of weighted signal. The sample signal matching the first voice signal and the first weighted signal serve as inputs, and the other-noise values in the first frame of frequency-domain signal serve as the desired output. Through the least mean square algorithm, the weight function in the first frame of weighted signal is iterated repeatedly so that the first frame of weighted signal approaches the sample signal; when it does, the weight function can be called the optimal weight function.
Optionally, the weight function in the first frame of weighted signal can be iterated a specified number of times to obtain the optimal weight function; alternatively, it can be iterated repeatedly until the difference between the first frame of weighted signal and the sample signal is within a specified error range, at which point the resulting weight function is the optimal weight function.
Then, the product of the optimal weight function and the first frame of frequency-domain signal is subtracted from the sample signal to obtain the other-noise values. Finally, those other-noise values are subtracted from the first frame of frequency-domain signal to obtain the valid signal in the first frame. The valid voice signal in every other frame of frequency-domain signal is obtained in the same way.
The valid voice signal obtained above for each frame is a frequency-domain signal and must still be converted to a time-domain signal. Accordingly, an inverse Fourier transform is applied to each frame of valid frequency-domain signal to obtain each frame of valid time-domain signal; the frames of valid time-domain signal are then combined in chronological order, establishing the frame-to-frame connections, to obtain the valid voice signal in the time domain. This signal, with the ambient noise signal and the other noise signals removed, can be output to a speaker or used for speech recognition, voice communication, and other operations.
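The frame-wise second denoising (subtract the estimated other-noise values from each frame's spectrum, inverse-transform each frame, and concatenate the frames in order) can be sketched as below. The noise spectra are given directly here for demonstration; in the method they would come from the LMS step against the matching sample signal, and the frame length and tone frequencies are assumed values.

```python
import numpy as np

def remove_noise_per_frame(frame_spectra, noise_spectra, frame_len):
    """Subtract the estimated other-noise spectrum from each frame of
    frequency-domain signal, inverse-FFT each frame back to the time
    domain, and join the valid time-domain frames in sequence."""
    pieces = []
    for spec, noise in zip(frame_spectra, noise_spectra):
        clean_spec = spec - noise                             # per-frame subtraction
        pieces.append(np.fft.irfft(clean_spec, n=frame_len))  # back to time domain
    return np.concatenate(pieces)                             # frames joined in order

frame_len = 320
t = np.arange(frame_len) / 16000.0
speech_frame = np.sin(2 * np.pi * 400 * t)          # valid signal in one frame
noise_frame = 0.3 * np.sin(2 * np.pi * 3000 * t)    # other-noise in one frame
mixed = np.fft.rfft(speech_frame + noise_frame)     # frame of frequency-domain signal
noise_est = np.fft.rfft(noise_frame)                # stand-in for the LMS noise estimate
recovered = remove_noise_per_frame([mixed, mixed], [noise_est, noise_est], frame_len)
```

Because the FFT is linear, subtracting the exact noise spectrum recovers the clean frame; with an LMS estimate the recovery is approximate.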
Optionally, similarly to the method of filtering out the other noise signals, this embodiment can also use the least mean square algorithm to filter out the ambient noise signal in the original input signal according to the interference signal correlated with the ambient noise signal, to obtain the first voice signal.
First, as shown in equation (1), the interference signal is weighted to obtain a weighted signal:
y(n) = w(n)x(n), n = 1, ..., M; x(n) = N1(n)  (1)
where M is the number of iterations, x(n) is the interference signal, and w(n) is the weight function.
The original input signal is d(n) = s(n) + N0(n), where s(n) is the first voice signal and N0(n) is the ambient noise signal; N0(n) is correlated with N1(n).
Then, with the interference signal and the original input signal as inputs and the first voice signal as the desired output, the weight function in the weighted signal is iterated repeatedly by the least mean square algorithm so that the weighted signal approaches the ambient noise signal. The weight function at that point can be called the optimal weight function. The product of the optimal weight function and the interference signal is then subtracted from the original input signal to obtain the first voice signal.
Specifically, the desired output is the difference between the original input signal and the weighted signal, i.e., the error signal, as in equation (2):
e(n) = d(n) - y(n) = s(n) + N0(n) - y(n)  (2)
The mean square value of the desired output is:
E[e^2(n)] = E[(s(n) + N0(n) - y(n))^2]
= E[s^2(n)] + E[(N0(n) - y(n))^2] + 2E[s(n)·(N0(n) - y(n))]  (3)
Since s(n) is uncorrelated with N0(n) and uncorrelated with N1(n), equation (4) holds:
E[s(n)·(N0(n) - y(n))] = 0  (4)
Substituting equation (4) into equation (3) gives equation (5):
E[e^2(n)] = E[s^2(n)] + E[(N0(n) - y(n))^2]  (5)
Since s(n) is fixed, minimizing E[e^2(n)] requires equation (6):
N0(n) = y(n) = w(n)x(n) = w(n)N1(n)  (6)
Substituting equation (6) into equation (5) then gives equation (7):
e(n) = s(n)  (7)
At the start of the least mean square algorithm, the weight function can be set to 0; it is then updated adaptively. The adaptive update of the weight function proceeds as follows.
As shown in equation (8), the error signal e(n) is computed:
e(n) = d(n) - y(n) = d(n) - w(n)x(n)  (8)
Then the mean square error ξ(n) of the error signal e(n) is computed:
ξ(n) = E[e^2(n)] = E[d^2(n) - 2d(n)y(n) + y^2(n)]  (9)
Let R be the autocorrelation matrix of x(n) and P the cross-correlation matrix of x(n) and d(n); then equation (10) follows from equation (9):
ξ(n) = E[e^2(n)] = E[d^2(n)] + w(n)Rw(n) - 2Pw(n)  (10)
Then the gradient of the mean square error is computed:
∇ξ(n) = 2Rw(n) - 2P  (11)
and simplified with the instantaneous estimate:
∇ξ(n) ≈ -2e(n)x(n)  (12)
The weight function is iterated repeatedly until the iteration count reaches M. The updated weight function can be expressed as:
w(n+1) = w(n) + 2μe(n)x(n)  (13)
where μ is a relatively small value, chosen so that the weight-update algorithm converges, which in turn ensures the accuracy of the algorithm.
After the weight function of each iteration is obtained, it can be substituted into equation (6) to obtain the weighted signal y(n) that approximates the ambient noise signal. The weighted signal is then subtracted from the original input signal, i.e., d(n) - y(n), to obtain the first voice signal with the ambient noise signal filtered out.
In the foregoing or following embodiments, to collect the interference signal and the original input signal accurately, before filtering out the ambient noise signal in the original input signal according to the correlated interference signal, the original input signal can be collected by a first microphone within a first specified distance from the sound source, and the interference signal by a second microphone outside the first specified distance but within a second specified distance from the sound source, where the second specified distance is greater than the first specified distance.
If the first microphone and the second microphone are mounted on a headset and the sound source is the user's mouth, the first microphone can be placed within the first specified distance from the user's mouth, i.e., near the mouth, for example at the position on the headset corresponding to the corner of the mouth. The second microphone can be placed outside the first specified distance but within the second specified distance from the user's mouth, i.e., away from the mouth, for example at the position on the headset corresponding to the top of the head.
In this embodiment, the first microphone and the second microphone should be in the same environment, so that the ambient noise signal in the original input signal collected by the first microphone is correlated with the interference signal collected by the second microphone. The first microphone is close to the sound source and the second microphone is far from it, so the original input signal collected by the first microphone is mostly the valid voice signal, with a small portion being the ambient noise signal and other noise signals, while the interference signal collected by the second microphone is mostly the ambient noise signal, with a small portion being the valid voice signal. On this basis, the ambient noise signal in the original input signal collected by the first microphone can be filtered out according to the interference signal collected by the second microphone, to obtain the first voice signal.
FIG. 3 is a schematic structural diagram of an electronic device according to another embodiment of the present invention. As shown in FIG. 3, the electronic device 200 includes a processor 201 and a memory 202 connected to the processor 201.
The memory 202 is configured to store one or more computer instructions.
The processor 201 is configured to execute the one or more computer instructions to: filter out an ambient noise signal in an original input signal according to an interference signal correlated with the ambient noise signal in the original input signal, to obtain a first voice signal; acquire, from a voice signal sample library, a sample signal that matches the first voice signal; and filter out other noise signals in the first voice signal according to the matching sample signal, to obtain a valid voice signal.
In this embodiment, the ambient noise signal in the original input signal is filtered out to obtain the first voice signal, achieving a first noise reduction. On this basis, the other noise signals in the first voice signal are filtered out according to the sample signal that matches the first voice signal, to obtain the valid voice signal, so that the valid voice signal uttered by the user is retained and the noise signals other than the valid signal are filtered out, achieving a second noise reduction. In particular, when the other noise signals are speech from other speakers, they can be effectively filtered out according to the sample signal. Through the two noise reductions, the ambient noise signal and the other noise signals in the original input signal are filtered out in turn, and the resulting valid voice signal is clearer.
Optionally, when acquiring the sample signal that matches the first voice signal from the voice signal sample library, the processor 201 is specifically configured to: perform voiceprint recognition on the first voice signal to obtain a spectral feature of the first voice signal; calculate the similarity between the spectral feature of the first voice signal and the spectral feature of each sample signal stored in the voice signal sample library; and take the sample signal whose spectral feature has the highest similarity with that of the first voice signal as the matching sample signal.
Optionally, when performing voiceprint recognition on the first voice signal to obtain its spectral feature, the processor 201 is specifically configured to: window the first voice signal to obtain at least one frame of voice signal; perform a Fourier transform on the at least one frame of voice signal to obtain at least one frame of frequency-domain signal; and extract a spectral feature of the at least one frame of frequency-domain signal to obtain the spectral feature of the first voice signal.
Optionally, when extracting the spectral feature of the at least one frame of frequency-domain signal to obtain the spectral feature of the first voice signal, the processor 201 is specifically configured to: select one frame of frequency-domain signal from the at least one frame of frequency-domain signal as a first frequency-domain signal; map the signal amplitude at each frequency in the first frequency-domain signal to a gray value according to a preset amplitude-to-grayscale mapping; and take the gray values corresponding to the frequencies in the first frequency-domain signal as the spectral feature of the first voice signal.
Optionally, when filtering out the other noise signals in the first voice signal according to the matching sample signal to obtain the valid voice signal, the processor 201 is specifically configured to: calculate, with the least mean square algorithm, the other-noise values in each frame of frequency-domain signal according to the matching sample signal; subtract those other-noise values from each frame of frequency-domain signal to obtain each frame of valid frequency-domain signal; perform an inverse Fourier transform on each frame of valid frequency-domain signal to obtain each frame of valid time-domain signal; and combine the frames of valid time-domain signal in sequence to obtain the valid voice signal.
Optionally, before filtering out the ambient noise signal in the original input signal according to the correlated interference signal, the processor 201 is further configured to: collect the original input signal by a first microphone within a first specified distance from the sound source; and collect the interference signal by a second microphone outside the first specified distance but within a second specified distance from the sound source, where the second specified distance is greater than the first specified distance.
Optionally, when filtering out the ambient noise signal in the original input signal according to the correlated interference signal to obtain the first voice signal, the processor 201 is specifically configured to: use the least mean square algorithm to filter out the ambient noise signal in the original input signal according to the interference signal correlated with the ambient noise signal, to obtain the first voice signal.
An embodiment of the present invention further provides a computer storage medium storing one or more computer instructions which, when executed by a computer, implement: filtering out an ambient noise signal in an original input signal according to an interference signal correlated with the ambient noise signal in the original input signal, to obtain a first voice signal; acquiring, from a voice signal sample library, a sample signal that matches the first voice signal; and filtering out other noise signals in the first voice signal according to the matching sample signal, to obtain a valid voice signal.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.

Claims (10)

  1. A voice signal noise reduction method, characterized by comprising:
    filtering out an ambient noise signal in an original input signal according to an interference signal correlated with the ambient noise signal in the original input signal, to obtain a first voice signal;
    acquiring, from a voice signal sample library, a sample signal that matches the first voice signal; and
    filtering out other noise signals in the first voice signal according to the sample signal that matches the first voice signal, to obtain a valid voice signal.
  2. The method according to claim 1, characterized in that acquiring, from the voice signal sample library, the sample signal that matches the first voice signal comprises:
    performing voiceprint recognition on the first voice signal to obtain a spectral feature of the first voice signal;
    calculating the similarity between the spectral feature of the first voice signal and the spectral feature of each sample signal stored in the voice signal sample library; and
    taking the sample signal whose spectral feature has the highest similarity with that of the first voice signal as the sample signal that matches the first voice signal.
  3. The method according to claim 2, characterized in that performing voiceprint recognition on the first voice signal to obtain the spectral feature of the first voice signal comprises:
    windowing the first voice signal to obtain at least one frame of voice signal;
    performing a Fourier transform on the at least one frame of voice signal to obtain at least one frame of frequency-domain signal; and
    extracting a spectral feature of the at least one frame of frequency-domain signal to obtain the spectral feature of the first voice signal.
  4. The method according to claim 3, characterized in that extracting the spectral feature of the at least one frame of frequency-domain signal to obtain the spectral feature of the first voice signal comprises:
    selecting one frame of frequency-domain signal from the at least one frame of frequency-domain signal as a first frequency-domain signal;
    mapping the signal amplitude at each frequency in the first frequency-domain signal to a gray value according to a preset amplitude-to-grayscale mapping; and
    taking the gray values corresponding to the frequencies in the first frequency-domain signal as the spectral feature of the first voice signal.
  5. The method according to claim 3, characterized in that filtering out the other noise signals in the first voice signal according to the sample signal that matches the first voice signal, to obtain the valid voice signal, comprises:
    calculating, with a least mean square algorithm, the other-noise values in each frame of frequency-domain signal according to the sample signal that matches the first voice signal;
    subtracting from each frame of frequency-domain signal the other-noise values in that frame, to obtain each frame of valid frequency-domain signal;
    performing an inverse Fourier transform on each frame of valid frequency-domain signal to obtain each frame of valid time-domain signal; and
    combining the frames of valid time-domain signal in sequence to obtain the valid voice signal.
  6. The method according to claim 1, characterized in that, before filtering out the ambient noise signal in the original input signal according to the interference signal correlated with the ambient noise signal in the original input signal, the method further comprises:
    collecting the original input signal by a first microphone within a first specified distance from a sound source; and
    collecting the interference signal by a second microphone outside the first specified distance but within a second specified distance from the sound source;
    wherein the second specified distance is greater than the first specified distance.
  7. The method according to claim 1, characterized in that filtering out the ambient noise signal in the original input signal according to the interference signal correlated with the ambient noise signal in the original input signal, to obtain the first voice signal, comprises:
    using a least mean square algorithm to filter out the ambient noise signal in the original input signal according to the interference signal correlated with the ambient noise signal in the original input signal, to obtain the first voice signal.
  8. An electronic device, characterized by comprising: a processor, and a memory connected to the processor;
    the memory being configured to store one or more computer instructions; and
    the processor being configured to execute the one or more computer instructions to:
    filter out an ambient noise signal in an original input signal according to an interference signal correlated with the ambient noise signal in the original input signal, to obtain a first voice signal;
    acquire, from a voice signal sample library, a sample signal that matches the first voice signal; and
    filter out other noise signals in the first voice signal according to the sample signal that matches the first voice signal, to obtain a valid voice signal.
  9. The electronic device according to claim 8, characterized in that, when acquiring the sample signal that matches the first voice signal from the voice signal sample library, the processor is specifically configured to:
    perform voiceprint recognition on the first voice signal to obtain a spectral feature of the first voice signal;
    calculate the similarity between the spectral feature of the first voice signal and the spectral feature of each sample signal stored in the voice signal sample library; and
    take the sample signal whose spectral feature has the highest similarity with that of the first voice signal as the sample signal that matches the first voice signal.
  10. The electronic device according to claim 9, characterized in that, when performing voiceprint recognition on the first voice signal to obtain the spectral feature of the first voice signal, the processor is specifically configured to:
    window the first voice signal to obtain at least one frame of voice signal;
    perform a Fourier transform on the at least one frame of voice signal to obtain at least one frame of frequency-domain signal; and
    extract a spectral feature of the at least one frame of frequency-domain signal to obtain the spectral feature of the first voice signal.
PCT/CN2017/117553 2017-11-27 2017-12-20 Voice signal noise reduction method and device WO2019100500A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/766,236 US11475907B2 (en) 2017-11-27 2017-12-20 Method and device of denoising voice signal

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711207556.8A CN107945815B (zh) 2017-11-27 2017-11-27 Voice signal noise reduction method and device
CN201711207556.8 2017-11-27

Publications (1)

Publication Number Publication Date
WO2019100500A1 true WO2019100500A1 (zh) 2019-05-31

Family

ID=61949069

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/117553 WO2019100500A1 (zh) 2017-11-27 2017-12-20 语音信号降噪方法及设备

Country Status (3)

Country Link
US (1) US11475907B2 (zh)
CN (1) CN107945815B (zh)
WO (1) WO2019100500A1 (zh)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108831440A (zh) * 2018-04-24 2018-11-16 中国地质大学(武汉) Voiceprint noise reduction method and system based on machine learning and deep learning
CN108847208B (zh) * 2018-05-04 2020-11-27 歌尔科技有限公司 Noise reduction processing method and apparatus, and earphone
CN108965904B (zh) * 2018-09-05 2021-08-06 阿里巴巴(中国)有限公司 Volume adjustment method for a live streaming room, and client
CN109120947A (zh) * 2018-09-05 2019-01-01 北京优酷科技有限公司 Voice private chat method for a live streaming room, and client
CN109005419B (zh) * 2018-09-05 2021-03-19 阿里巴巴(中国)有限公司 Voice information processing method and client
CN109104616B (zh) * 2018-09-05 2022-01-14 阿里巴巴(中国)有限公司 Voice co-streaming method for a live streaming room, and client
CN109273020B (zh) * 2018-09-29 2022-04-19 阿波罗智联(北京)科技有限公司 Audio signal processing method, apparatus, device, and storage medium
CN109410975B (zh) * 2018-10-31 2021-03-09 歌尔科技有限公司 Voice noise reduction method, device, and storage medium
CN109635759B (zh) * 2018-12-18 2020-10-09 北京嘉楠捷思信息技术有限公司 Signal processing method and apparatus, and computer-readable storage medium
CN109946023A (zh) * 2019-04-12 2019-06-28 西南石油大学 Pipeline gas leakage discrimination device and identification method
CN110232905B (zh) * 2019-06-12 2021-08-27 会听声学科技(北京)有限公司 Uplink noise reduction method and apparatus, and electronic device
CN111383653A (zh) * 2020-03-18 2020-07-07 北京海益同展信息科技有限公司 Voice processing method and apparatus, storage medium, and robot
CN111583946A (zh) * 2020-04-30 2020-08-25 厦门快商通科技股份有限公司 Voice signal enhancement method, apparatus, and device
CN112331225B (zh) * 2020-10-26 2023-09-26 东南大学 Method and device for assisted hearing in high-noise environments
CN113539291A (zh) * 2021-07-09 2021-10-22 北京声智科技有限公司 Audio signal noise reduction method and apparatus, electronic device, and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103514884A (zh) * 2012-06-26 2014-01-15 华为终端有限公司 Call voice noise reduction method and terminal
CN105719659A (zh) * 2016-02-03 2016-06-29 努比亚技术有限公司 Recording file separation method and device based on voiceprint recognition

Family Cites Families (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7099821B2 (en) * 2003-09-12 2006-08-29 Softmax, Inc. Separation of target acoustic signals in a multi-transducer arrangement
US20060282264A1 (en) * 2005-06-09 2006-12-14 Bellsouth Intellectual Property Corporation Methods and systems for providing noise filtering using speech recognition
US7970564B2 (en) * 2006-05-02 2011-06-28 Qualcomm Incorporated Enhancement techniques for blind source separation (BSS)
EP2058803B1 (en) * 2007-10-29 2010-01-20 Harman/Becker Automotive Systems GmbH Partial speech reconstruction
DE602008002695D1 (de) * 2008-01-17 2010-11-04 Harman Becker Automotive Sys Postfilter für einen Strahlformer in der Sprachverarbeitung
FR2932332B1 (fr) * 2008-06-04 2011-03-25 Parrot Systeme de controle automatique de gain applique a un signal audio en fonction du bruit ambiant
US8639516B2 (en) * 2010-06-04 2014-01-28 Apple Inc. User-specific noise suppression for voice quality improvements
US9330675B2 (en) * 2010-11-12 2016-05-03 Broadcom Corporation Method and apparatus for wind noise detection and suppression using multiple microphones
CN102497613A (zh) * 2011-11-30 2012-06-13 江苏奇异点网络有限公司 Dual-channel real-time voice output method for classroom sound amplification
CN104160443B (zh) * 2012-11-20 2016-11-16 统一有限责任两合公司 用于音频数据处理的方法、设备和系统
US9685171B1 (en) * 2012-11-20 2017-06-20 Amazon Technologies, Inc. Multiple-stage adaptive filtering of audio signals
US10134401B2 (en) * 2012-11-21 2018-11-20 Verint Systems Ltd. Diarization using linguistic labeling
EP2947658A4 (en) * 2013-01-15 2016-09-14 Sony Corp MEMORY CONTROL DEVICE, READ CONTROL DEVICE, AND RECORDING MEDIUM
US9117457B2 (en) * 2013-02-28 2015-08-25 Signal Processing, Inc. Compact plug-in noise cancellation device
US10306389B2 (en) * 2013-03-13 2019-05-28 Kopin Corporation Head wearable acoustic system with noise canceling microphone geometry apparatuses and methods
CN104050971A (zh) * 2013-03-15 2014-09-17 杜比实验室特许公司 Acoustic echo mitigation apparatus and method, audio processing apparatus, and voice communication terminal
CN105144594B (zh) * 2013-05-14 2017-05-17 三菱电机株式会社 Echo cancellation device
JP6261043B2 (ja) * 2013-08-30 2018-01-17 本田技研工業株式会社 Speech processing device, speech processing method, and speech processing program
US9177567B2 (en) * 2013-10-17 2015-11-03 Globalfoundries Inc. Selective voice transmission during telephone calls
TWI543151B (zh) * 2014-03-31 2016-07-21 Kung Lan Wang Voiceprint data processing method, trading method and system based on voiceprint data
US10332541B2 (en) * 2014-11-12 2019-06-25 Cirrus Logic, Inc. Determining noise and sound power level differences between primary and reference channels
CN105989836B (zh) * 2015-03-06 2020-12-01 腾讯科技(深圳)有限公司 Voice collection method, apparatus, and terminal device
CN104898836B (zh) * 2015-05-19 2017-11-24 广东欧珀移动通信有限公司 Rotating camera adjustment method and user terminal
CN106486130B (zh) * 2015-08-25 2020-03-31 百度在线网络技术(北京)有限公司 Noise cancellation and speech recognition method and apparatus
US10331312B2 (en) * 2015-09-08 2019-06-25 Apple Inc. Intelligent automated assistant in a media environment
CN106971733A (zh) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 Voiceprint recognition method and system based on voice noise reduction, and intelligent terminal
CN105632493A (zh) * 2016-02-05 2016-06-01 深圳前海勇艺达机器人有限公司 Method for controlling and waking a robot by voice
US20170294185A1 (en) * 2016-04-08 2017-10-12 Knuedge Incorporated Segmentation using prior distributions
JP6878776B2 (ja) * 2016-05-30 2021-06-02 富士通株式会社 Noise suppression device, noise suppression method, and computer program for noise suppression
CN106935248B (zh) * 2017-02-14 2021-02-05 广州孩教圈信息科技股份有限公司 Voice similarity detection method and apparatus
US10170137B2 (en) * 2017-05-18 2019-01-01 International Business Machines Corporation Voice signal component forecaster
US10558421B2 (en) * 2017-05-22 2020-02-11 International Business Machines Corporation Context based identification of non-relevant verbal communications
CN108305615B (zh) * 2017-10-23 2020-06-16 腾讯科技(深圳)有限公司 Object recognition method, device, storage medium, and terminal
CN108109619B (zh) * 2017-11-15 2021-07-06 中国科学院自动化研究所 Auditory selection method and apparatus based on memory and attention models

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103514884A (zh) * 2012-06-26 2014-01-15 华为终端有限公司 Call voice noise reduction method and terminal
CN105719659A (zh) * 2016-02-03 2016-06-29 努比亚技术有限公司 Recording file separation method and device based on voiceprint recognition

Also Published As

Publication number Publication date
US11475907B2 (en) 2022-10-18
CN107945815A (zh) 2018-04-20
US20200372925A1 (en) 2020-11-26
CN107945815B (zh) 2021-09-07

Similar Documents

Publication Publication Date Title
WO2019100500A1 (zh) 2019-05-31 Voice signal noise reduction method and device
US10504539B2 (en) Voice activity detection systems and methods
EP2643834B1 (en) Device and method for producing an audio signal
EP3164871B1 (en) User environment aware acoustic noise reduction
EP2643981B1 (en) A device comprising a plurality of audio sensors and a method of operating the same
JP4532576B2 (ja) Processing device, speech recognition device, speech recognition system, speech recognition method, and speech recognition program
WO2018223727A1 (zh) Method, apparatus, device, and medium for voiceprint recognition
JP2007523374A (ja) Method and system for generating training data for an automatic speech recognizer
CN108172231A (zh) Dereverberation method and system based on Kalman filtering
JP2011530091A (ja) Apparatus and method for processing an audio signal for speech enhancement using feature extraction
CN108847253B (zh) Vehicle model recognition method, apparatus, computer device, and storage medium
CN110383798A (zh) Acoustic signal processing device, acoustic signal processing method, and hands-free calling device
WO2022256577A1 (en) A method of speech enhancement and a mobile computing device implementing the method
TWI523006B (zh) Method for speech recognition using voiceprint identification, and electronic device thereof
Martín-Doñas et al. Dual-channel DNN-based speech enhancement for smartphones
WO2019119593A1 (zh) Voice enhancement method and apparatus
CN111968651A (zh) WT-based voiceprint recognition method and system
CN116312561A (zh) Method, system, and device for voiceprint recognition authentication, noise reduction, and speech enhancement of power dispatching system personnel
JP6142402B2 (ja) Acoustic signal analysis device, method, and program
CN112997249A (zh) Voice processing method and apparatus, storage medium, and electronic device
CN115223584A (zh) Audio data processing method, apparatus, device, and storage medium
Nataraj et al. Single channel speech enhancement using adaptive filtering and best correlating noise identification
WO2022068440A1 (zh) Howling suppression method and apparatus, computer device, and storage medium
US20240005937A1 (en) Audio signal processing method and system for enhancing a bone-conducted audio signal using a machine learning model
US20230154481A1 (en) Devices, systems, and methods of noise reduction

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17932676

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17932676

Country of ref document: EP

Kind code of ref document: A1