WO2017063516A1 - Noise signal determining method, speech denoising method, and device - Google Patents

Noise signal determining method, speech denoising method, and device

Info

Publication number
WO2017063516A1
WO2017063516A1 PCT/CN2016/101444 CN2016101444W WO2017063516A1 WO 2017063516 A1 WO2017063516 A1 WO 2017063516A1 CN 2016101444 W CN2016101444 W CN 2016101444W WO 2017063516 A1 WO2017063516 A1 WO 2017063516A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
variance
segment
determining
speech
Application number
PCT/CN2016/101444
Other languages
English (en)
French (fr)
Inventor
杜志军 (Du Zhijun)
Original Assignee
阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Application filed by 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Priority to JP2018519388A (JP6784758B2)
Priority to ES16854895T (ES2807529T3)
Priority to EP16854895.6A (EP3364413B1)
Priority to SG11201803004YA
Priority to KR1020187013177A (KR102208855B1)
Priority to PL16854895T (PL3364413T3)
Publication of WO2017063516A1
Priority to US15/951,928 (US10796713B2)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L25/84 Detection of presence or absence of voice signals for discriminating voice from noise
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0324 Details of processing therefor
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L2021/02168 Noise filtering characterised by the method used for estimating noise the estimation exclusively taking place during speech pauses
    • G10L2025/783 Detection of presence or absence of voice signals based on threshold decision

Definitions

  • The present application relates to the field of speech denoising technology and, in particular, to a noise signal determining method, a speech denoising method, and corresponding devices.
  • Speech denoising is a technique that improves speech quality by removing ambient noise from speech signals. In speech denoising, the power spectrum of the noise signal contained in the speech signal must first be determined. Denoising is then performed according to that noise power spectrum.
  • In the prior art, the power spectrum of the noise signal in a speech signal is generally determined by assuming that the first N frames of the speech signal are noise (i.e., they contain no human voice). Those N frames are then analyzed to obtain the noise power spectrum.
  • However, the prior art designates the first N frames as noise purely by assumption. Frames chosen this way may not match the actual noise signal, which degrades the accuracy of the obtained noise power spectrum.
  • The embodiments of the present application aim to provide a noise signal determining method, a speech denoising method, and corresponding devices that solve this problem. In the prior art, the first N frames assumed to be noise may not match the actual noise signal, and this reduces the accuracy of the obtained noise power spectrum.
  • The noise signal determining method, the speech denoising method, and the devices provided by the embodiments of the present application are implemented as follows:
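  • A noise signal determining method, comprising:
    - performing a Fourier transform on each frame signal in a speech signal segment to be analyzed, to obtain the power spectrum of each frame signal;
    - determining, according to the power spectra, the variance of each frame signal's power values across frequencies; and
    - determining, according to the variance, whether each frame signal in the segment is a noise signal.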
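  • A speech denoising method, comprising:
    - determining a speech signal segment to be analyzed in the speech to be processed;
    - performing a Fourier transform on each frame signal in the segment, to obtain its power spectrum;
    - determining the variance of each frame signal's power values across frequencies;
    - determining, according to the variance, whether each frame signal is a noise signal, thereby obtaining the noise frames contained in the segment; and
    - determining the average power of those noise frames and denoising the speech to be processed according to that average power.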
  • A noise signal determining device, comprising:
  • a power spectrum acquisition unit configured to perform a Fourier transform on each frame signal in the speech signal segment to be analyzed, to obtain the power spectrum of each frame signal in the segment;
  • a variance determining unit configured to determine, according to the power spectrum of each frame signal, the variance of that frame signal's power values across frequencies; and
  • a noise determining unit configured to determine, according to the variance, whether each frame signal in the speech signal segment is a noise signal.
  • A speech denoising device, comprising:
  • a segment determining unit configured to determine the speech signal segment to be analyzed in the speech to be processed;
  • a power spectrum acquisition unit configured to perform a Fourier transform on each frame signal in the speech signal segment to be analyzed, to obtain the power spectrum of each frame signal in the segment;
  • a variance determining unit configured to determine, according to the power spectrum of each frame signal, the variance of that frame signal's power values across frequencies;
  • a noise determining unit configured to determine, according to the variance, whether each frame signal in the speech signal segment is a noise signal, and to obtain the noise frames contained in the segment; and
  • a speech denoising unit configured to determine the average power of the noise frames contained in the segment, and to denoise the speech to be processed according to that average power.
  • The technical solutions above work as follows. The noise signal determining method, the speech denoising method, and the devices provided by the embodiments of the present application:
    - perform a Fourier transform on the speech signal segment to be analyzed, to obtain the power spectrum of each frame signal;
    - determine the variance of each frame signal's power values across frequencies; and
    - finally determine, from that variance, whether each frame signal is a noise signal.
    This accurately identifies the noise frames contained in the segment. During speech denoising, the speech to be processed is then denoised according to the average power of those noise frames, which improves the denoising result.
  • FIG. 1 is a flowchart of a noise signal determining method according to an embodiment of the present application.
  • FIG. 2 is a flowchart of the step of determining whether a frame signal is a noise signal in an embodiment of the present application.
  • FIG. 3 is a flowchart of the step of determining the variance of a frame signal's power values at each sampling point in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of variance curves in an embodiment of the present application.
  • FIG. 5 is a flowchart of a speech denoising method according to an embodiment of the present application.
  • FIG. 6 is a block diagram of a noise signal determining apparatus according to an embodiment of the present application.
  • FIG. 7 is a block diagram of a speech denoising device according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a hardware implementation of the apparatus provided by the present application.
  • The noise signal determining method of this embodiment includes the following steps:
  • S101 Perform a Fourier transform on each frame signal in the speech signal segment to be analyzed, to obtain the power spectrum of each frame signal in the segment.
  • The speech signal segment to be analyzed may be extracted from the speech to be processed according to certain rules.
  • For example, it may be a "suspected noise frame segment", i.e., a segment that appears on initial inspection to contain mostly noise frames.
  • In this case, the method further includes:
  • determining, according to the amplitude variation of the time-domain signal of the speech to be processed, which segment of that speech has an amplitude variation less than a preset threshold. That segment is taken as the speech signal segment to be analyzed.
  • This works because noise usually appears as a segment with small or relatively uniform amplitude. A segment containing human speech, by contrast, usually fluctuates considerably.
  • A preset threshold can therefore be set for identifying a "suspected noise frame segment" in the speech to be processed (i.e., the speech to be denoised). A segment whose amplitude variation is less than this threshold may be taken as the speech signal segment to be analyzed.
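The text does not say how "amplitude variation" is measured. The sketch below assumes, purely for illustration, that it is the standard deviation of the samples in fixed-length blocks. It returns the first run of consecutive blocks whose variation is below the threshold. The function name `find_quiet_segment` and the default `block` and `threshold` values are hypothetical.

```python
import numpy as np

def find_quiet_segment(x, block=1024, threshold=0.05):
    """Return (start, end) sample indices of the first run of consecutive
    blocks whose amplitude variation is below `threshold`, or None.

    Assumption: amplitude variation = standard deviation of the samples
    in a block. Trailing samples that do not fill a block are ignored.
    """
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // block
    start = None
    for b in range(n_blocks):
        quiet = np.std(x[b * block:(b + 1) * block]) < threshold
        if quiet and start is None:
            start = b * block          # a quiet run begins here
        elif not quiet and start is not None:
            return start, b * block    # the quiet run ends at this block
    if start is None:
        return None
    return start, n_blocks * block     # quiet run lasts to the last full block
```

For example, a signal made of low-level noise followed by a loud tone yields a segment covering only the noise part. The tone blocks have a much larger standard deviation.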
  • The speech signal is first divided into frames. A frame signal is a single frame of the speech signal, and a speech signal segment contains several frames.
  • A frame signal may contain a number of sampling points, for example 1024. Adjacent frames may overlap, for example by 50%.
  • The power spectrum (frequency domain) of the speech signal can be obtained by applying a short-time Fourier transform (STFT) to the time-domain speech signal.
  • The power spectrum contains multiple power values corresponding to different frequencies, for example 1024 power values.
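The sketch below implements the framing and power-spectrum step in NumPy. It assumes 1024-sample frames with 50% overlap (a hop of 512), as in the example above, and no analysis window, since the text does not mention one. It uses a full-length FFT so that each frame yields 1024 power values, matching the description. For a real-valued frame, only the first 513 of these values are distinct. The helper names are hypothetical.

```python
import numpy as np

def frame_signal(x, frame_len=1024, hop=512):
    """Split a 1-D signal into overlapping frames, one frame per row.

    hop=512 with frame_len=1024 gives the 50% overlap used as an example.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])

def power_spectra(frames):
    """Power value a^2 + b^2 of every FFT bin a + bi, for every frame."""
    spec = np.fft.fft(frames, axis=1)  # full FFT: frame_len complex bins per frame
    return spec.real ** 2 + spec.imag ** 2
```

`P = power_spectra(frame_signal(x))` then holds one row of 1024 power values per frame.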
  • In a speech signal that contains a human voice, the portion before the person starts speaking (for example, the first 1.5 s) may be a noise signal (ambient noise).
  • The embodiments of the present application may therefore take the first N frames of the speech signal as the speech signal segment to be analyzed. For example, the segment may be the first 1.5 seconds, $\{f'_1, f'_2, \dots, f'_n\}$, where $f'_1, f'_2, \dots, f'_n$ denote the individual frame signals it contains.
  • The purpose of the embodiments is to determine which of these frames are noise signals.
  • To this end, a set of power values is calculated for each frame signal.
  • Suppose the Fourier coefficient of a frame signal at a certain frequency is $a+bi$, with real part $a$ and imaginary part $b$. Its magnitude is then $\sqrt{a^2+b^2}$, and the power value of the frame signal at that frequency is $a^2+b^2$.
  • If each frame signal in $\{f'_1, f'_2, \dots, f'_n\}$ contains 1024 sampling points, 1024 power values at different frequencies can be obtained for each frame from its power spectrum.
  • This yields one set of 1024 power values for each of the frame signals $f'_1, f'_2, \dots, f'_n$.
  • S102 Determine, according to a power spectrum of the frame signal, a variance of each frame signal in the voice signal segment with respect to a power value at each frequency.
  • Using the variance formula, the variances of the power values of the frame signals $f'_1, f'_2, \dots, f'_n$ across frequencies, $\{\mathrm{Var}(f'_1), \mathrm{Var}(f'_2), \dots, \mathrm{Var}(f'_n)\}$, can be calculated respectively.
  • Let $p_{j,k}$ be the power value of frame $f'_j$ at the $k$-th frequency, with $k = 1, \dots, K$ (e.g., $K = 1024$). Let $\bar{p}_j$ be the mean of these $K$ values. Then $\mathrm{Var}(f'_j) = \frac{1}{K}\sum_{k=1}^{K}\left(p_{j,k} - \bar{p}_j\right)^2$.
  • S103 Determine, according to the variance, whether each frame signal in the voice signal segment is a noise signal.
  • The energy (i.e., power values) of a frame signal that contains speech varies greatly across frequency bands.
  • By contrast, the energy of a frame signal that contains no speech (i.e., a noise signal) varies relatively little across frequency bands and is distributed fairly uniformly. Whether a frame signal is a noise signal can therefore be determined from the variance of its power values.
  • Specifically, step S103 may include:
  • S1031 Determine whether the variance of the frame signal's power values is greater than a first threshold $T_1$.
  • If the variance of a frame signal's power values exceeds $T_1$, the variation of the frame's energy (i.e., power values) across frequency bands exceeds the first threshold. The frame signal can then be determined not to be a noise signal.
  • If the variance does not exceed $T_1$, the variation of the frame's energy across frequency bands stays within the first threshold. The frame signal can then be determined to be a noise signal.
  • In this way, the frames of the segment to be analyzed, $\{f'_1, f'_2, \dots, f'_n\}$, can be classified in turn into two groups:
    - the frames $\{f'_1, f'_2, \dots, f'_m\}$, which are noise signals; and
    - the frames $\{f'_{m+1}, f'_{m+2}, \dots, f'_n\}$, which are not.
    This identifies the noise signals contained in a speech signal. Speech denoising can then be performed according to the noise signals $\{f'_1, f'_2, \dots, f'_m\}$.
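The sketch below covers step S102 and the single-threshold test of S1031. It assumes the population variance (NumPy's default `ddof=0`) over all frequency bins of a frame. The argument `t1` stands in for $T_1$, which the text says is obtained from test samples, so any concrete value is an assumption.

```python
import numpy as np

def frame_variances(power):
    """Var(f'_j) for every frame: population variance (ddof=0) of the
    frame's power values across all frequency bins (one row per frame)."""
    return np.var(power, axis=1)

def classify_frames(power, t1):
    """Boolean mask, True where a frame is judged to be noise, i.e. where
    its variance does not exceed the first threshold t1 (T_1)."""
    return frame_variances(power) <= t1
```

A frame with a flat power spectrum has zero variance and is classified as noise. A frame whose power is concentrated in one bin has a large variance and is classified as non-noise.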
  • Step S102 may specifically include:
  • computing variance statistics for each frame signal in the frequency domain. Non-noise signals are generally concentrated in the low and middle frequency bands, while noise signals are generally distributed uniformly across all bands. The variance of each frame signal's power values is therefore calculated separately for at least two different frequency bands (i.e., the frequency intervals below).
  • For example, the first frequency interval may be 0 to 2000 Hz (low band), and the second frequency interval may be 2000 to 4000 Hz (high band).
  • The 1024 power values of each frame signal are then classified by frequency interval into two sets: a first power value set A corresponding to 0 to 2000 Hz, and a second power value set B corresponding to 2000 to 4000 Hz.
  • For example, each of the 1024 power values of frame $f'_1$ is assigned to set A or set B according to the frequency interval it falls in. The other frames are handled in the same way.
  • More than two frequency bands may also be used, in which case the variance of the power values in each band is computed separately.
  • S1021 Determine a first variance of the power values included in the first power value set.
  • Taking frame $f'_1$ as an example, once the power values in the first power value set A are obtained, the first variance $\mathrm{Var}_{low}(f'_1)$ of these power values can be calculated with the variance formula.
  • S1022 Determine a second variance of the power values included in the second power value set.
  • Once the power values of $f'_1$ in the second power value set B are obtained, the second variance $\mathrm{Var}_{high}(f'_1)$ of these power values can be calculated with the variance formula.
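The sketch below implements the two-band split. It assumes a sampling rate of 8 kHz, so that the bins of a full FFT span 0 to 4000 Hz in absolute frequency. The bins then fall into the example intervals 0 to 2000 Hz (set A) and 2000 to 4000 Hz (set B). The sampling rate, the split point, and the function name are assumptions for illustration.

```python
import numpy as np

def band_variances(power, fs=8000.0, split_hz=2000.0):
    """Return (var_low, var_high), one value per frame (row of `power`).

    var_low:  variance of the power values with |f| <  split_hz (set A)
    var_high: variance of the power values with |f| >= split_hz (set B)
    Assumes `power` holds full-FFT power spectra of signals sampled at fs.
    """
    n_fft = power.shape[1]
    freqs = np.abs(np.fft.fftfreq(n_fft, d=1.0 / fs))  # |frequency| of each bin, Hz
    low = freqs < split_hz   # bins in the first power value set A
    high = ~low              # bins in the second power value set B
    return power[:, low].var(axis=1), power[:, high].var(axis=1)
```

Using absolute frequency assigns the mirrored negative-frequency bins of the full FFT to the same band as their positive counterparts, so all 1024 values are classified.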
  • FIG. 4 is a schematic diagram of variance curves in an embodiment of the present application. The horizontal axis represents the frame number of the frame signal, and the vertical axis represents the magnitude of the variance.
  • The first variance curve shows the trend of the first variance across the frame signals, and the second variance curve shows the trend of the second variance across the frame signals.
  • Step S1031 may specifically include:
  • determining whether the first variance of the frame signal's power values is greater than the first threshold $T_1$. If not, the frame signal may be determined to be a noise signal. Taking frame $f'_1$ as an example, it is determined whether $\mathrm{Var}_{low}(f'_1) > T_1$. In general, for frame $f'_i$ the condition is $\mathrm{Var}_{low}(f'_i) > T_1$ (1).
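  • Step S103 may further include:
  • determining whether the difference between the first variance and the second variance of the frame signal's power values is greater than a second threshold $T_2$. If not, the frame signal is determined to be a noise signal.
  • For frame $f'_i$, this difference test is $\mathrm{Var}_{low}(f'_i) - \mathrm{Var}_{high}(f'_i) > T_2$ (2).
  • In this way, it can be determined in turn which frame signals in the segment to be analyzed, $\{f'_1, f'_2, \dots, f'_n\}$, are noise signals.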
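The sketch below implements the per-frame test using both band variances. It assumes that the first-variance test is $\mathrm{Var}_{low}(f'_i) > T_1$ and the difference test is $\mathrm{Var}_{low}(f'_i) - \mathrm{Var}_{high}(f'_i) > T_2$. A frame is treated as noise only when neither test holds. The function name and the threshold values in the check are illustrative assumptions.

```python
import numpy as np

def is_noise_two_band(var_low, var_high, t1, t2):
    """Per-frame noise decision from the two band variances.

    Assumed tests: Var_low > t1 (first variance above T_1) or
    Var_low - Var_high > t2 (difference above T_2) marks a frame as
    non-noise; a frame passing neither test is treated as noise.
    """
    var_low = np.asarray(var_low, dtype=float)
    var_high = np.asarray(var_high, dtype=float)
    speech_like = (var_low > t1) | (var_low - var_high > t2)
    return ~speech_like
```

The difference test catches frames whose overall variance is moderate but whose energy is concentrated in the low band, which the text identifies as typical of speech.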
  • Between step S102 and step S103, the method may further include:
  • sorting the frame signals in the speech signal segment to be analyzed by the magnitude of their variances;
  • in which case determining, according to the variance, whether each frame signal in the segment is a noise signal comprises: determining, according to the sorted variances of the frame signals' power values across frequencies, whether each frame signal in the segment is a noise signal.
  • Specifically, this embodiment first determines the variances of the power values of the frame signals $\{f'_1, f'_2, \dots, f'_n\}$, namely $\{\mathrm{Var}(f'_1), \mathrm{Var}(f'_2), \dots, \mathrm{Var}(f'_n)\}$.
  • The frame signals are then sorted by variance in ascending order. The smaller the variance, the more likely the frame is a noise signal, so sorting moves the noise frames of the segment to the front.
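Sorting by variance takes a single NumPy call. This sketch returns frame indices in ascending order of variance using a stable sort, so frames with equal variance keep their original order. The function name is hypothetical.

```python
import numpy as np

def sort_frames_by_variance(variances):
    """Frame indices ordered by ascending variance (most noise-like first).

    kind="stable" keeps frames with equal variance in their original order.
    """
    return np.argsort(np.asarray(variances, dtype=float), kind="stable")
```

Given a power matrix `P` with one row per frame, `P[sort_frames_by_variance(np.var(P, axis=1))]` reorders the frames so that the flattest spectra come first.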
  • When the variances of a low band (for example, 0 to 2000 Hz) and a high band (for example, 2000 to 4000 Hz) are computed separately, each power value of each frame signal $\{f'_1, f'_2, \dots, f'_n\}$ is classified by the frequency interval its frequency falls in. It goes into either the first power value set A, corresponding to the first frequency interval (for example, 0 to 2000 Hz), or the second power value set B, corresponding to the second frequency interval (for example, 2000 to 4000 Hz).
  • The first variances of the power values in the first power value sets of the frame signals $\{f'_1, f'_2, \dots, f'_n\}$ are then determined: $\{\mathrm{Var}_{low}(f'_1), \mathrm{Var}_{low}(f'_2), \dots, \mathrm{Var}_{low}(f'_n)\}$. The second variances of the power values in their second power value sets are determined likewise: $\{\mathrm{Var}_{high}(f'_1), \mathrm{Var}_{high}(f'_2), \dots, \mathrm{Var}_{high}(f'_n)\}$.
  • Step S104 above may determine the noise signal contained in the speech signal to be analyzed (which may be the speech signal after sorting by variance) as follows:
  • For each frame signal $f'_i$, compare the second variance of the following frame $f'_{i+1}$ with that of the preceding frame $f'_{i-1}$. Determine in turn whether the difference exceeds a third threshold $T_3$, i.e., whether $\mathrm{Var}_{high}(f'_{i+1}) - \mathrm{Var}_{high}(f'_{i-1}) > T_3$ (3). If not, $f'_i$ is determined to be a noise frame. The set of noise frames so determined is taken as the noise signal.
  • Likewise, compare the first variance of the following frame $f'_{i+1}$ with that of the preceding frame $f'_{i-1}$. Determine in turn whether the difference exceeds a fourth threshold $T_4$, i.e., whether $\mathrm{Var}_{low}(f'_{i+1}) - \mathrm{Var}_{low}(f'_{i-1}) > T_4$ (4). If not, $f'_i$ is determined to be a noise frame. The set of noise frames so determined is taken as the noise signal.
  • The noise frames contained in the speech signal to be analyzed can thus be identified by formulas (1) to (4) above. If a frame signal $f'_i$ satisfies any one of formulas (1) to (4), it is determined to be a non-noise signal (the noise cutoff frame). If it satisfies none of them, it is determined to be a noise signal.
  • If the noise cutoff frame is $f'_m$, the noise frames are $\{f'_1, f'_2, \dots, f'_{m-1}\}$.
  • The noise cutoff frame may also be determined using only some of formulas (1) to (4), for example formulas (1) and (2), or formulas (2) and (3). Furthermore, the embodiments of the present application are not limited to the formulas listed above for determining the noise cutoff frame.
  • The thresholds $T_1$, $T_2$, $T_3$, and $T_4$ above are all obtained statistically from a large number of test samples.
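The noise cutoff search can be done in a single pass over the (possibly sorted) frames. This sketch assumes the four tests take these forms:
(1) $\mathrm{Var}_{low}(f'_i) > T_1$;
(2) $\mathrm{Var}_{low}(f'_i) - \mathrm{Var}_{high}(f'_i) > T_2$;
(3) $\mathrm{Var}_{high}(f'_{i+1}) - \mathrm{Var}_{high}(f'_{i-1}) > T_3$;
(4) $\mathrm{Var}_{low}(f'_{i+1}) - \mathrm{Var}_{low}(f'_{i-1}) > T_4$.
The forms of (1) and (2) are an assumption. The text does not say how boundary frames are handled, so the sketch skips the neighbour tests (3) and (4) for the first and last frames. Indices are 0-based, so a return value of `m` means frames `0` to `m - 1` are noise. The function name is hypothetical.

```python
import numpy as np

def noise_cutoff(var_low, var_high, t1, t2, t3, t4):
    """Index m of the first frame satisfying any of formulas (1) to (4),
    i.e. the noise cutoff frame; frames 0..m-1 are treated as noise.
    Returns len(var_low) if no frame triggers a cutoff."""
    vl = np.asarray(var_low, dtype=float)
    vh = np.asarray(var_high, dtype=float)
    n = len(vl)
    for i in range(n):
        if vl[i] > t1:                       # formula (1)
            return i
        if vl[i] - vh[i] > t2:               # formula (2)
            return i
        if 0 < i < n - 1:                    # (3) and (4) need both neighbours
            if vh[i + 1] - vh[i - 1] > t3:   # formula (3)
                return i
            if vl[i + 1] - vl[i - 1] > t4:   # formula (4)
                return i
    return n
```

Because of the neighbour tests, the frame just before a sharp rise in variance can itself become the cutoff frame, even though its own variance is small.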
  • FIG. 5 is a flowchart of a voice denoising method according to an embodiment of the present application, including:
  • S201 Determine a segment of the speech signal to be analyzed included in the to-be-processed speech.
  • S202 Perform Fourier transform on each frame signal in the segment of the speech signal to be analyzed to obtain a power spectrum of each frame signal in the segment of the speech signal.
  • S203 Determine, according to a power spectrum of the frame signal, a variance of each frame signal in the voice signal segment with respect to a power value at each frequency.
  • S204 Determine, according to the variance, whether each frame signal in the voice signal segment is a noise signal, and obtain a plurality of noise frames included in the voice signal segment.
  • S205 Determine a power average corresponding to the plurality of noise frames included in the voice signal segment, and perform voice denoising processing on the to-be-processed voice according to the power average of the noise frame.
  • Speech denoising can then be performed according to this power average. The denoising method itself is well known in the art, so it is not described in detail here.
  • In some embodiments, the step of sorting the frame signals by variance may be omitted. Each frame of the original signal is then examined directly to determine which frames are noise frames.
  • Usually only part of the noise frames is used to calculate the noise power spectrum estimate $P_{noise}$. For example, if 50 noise frames are determined, the first 30 may be used to calculate $P_{noise}$, which improves the accuracy of the estimate.
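The text leaves the denoising step to well-known methods. As one such method, the sketch below uses basic power spectral subtraction, which is a standard technique and not necessarily the one the applicant uses. $P_{noise}$ is the average power spectrum of at most the first 30 noise frames. Each frame's power is then reduced by $P_{noise}$, with a small spectral floor to avoid negative power. The function names and the `floor` value are assumptions.

```python
import numpy as np

def estimate_noise_power(power, noise_idx, n_use=30):
    """P_noise: mean power spectrum of at most the first n_use noise frames.

    `power` has one row per frame; `noise_idx` lists the noise frame rows.
    """
    idx = list(noise_idx)[:n_use]
    if not idx:
        raise ValueError("no noise frames to average")
    return power[idx].mean(axis=0)

def spectral_subtract(spec, p_noise, floor=0.01):
    """Basic power spectral subtraction on complex STFT frames `spec`.

    The phase is kept; the magnitude is scaled so the output power is
    max(|X|^2 - P_noise, floor * |X|^2).
    """
    power = np.abs(spec) ** 2
    clean_power = np.maximum(power - p_noise, floor * power)
    gain = np.sqrt(clean_power / np.maximum(power, 1e-12))  # avoid divide-by-zero
    return spec * gain
```

With 50 detected noise frames, `estimate_noise_power(P, noise_idx)` averages only the first 30, following the example above.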
  • The embodiments of the present application further provide a noise signal determining device.
  • The device may be implemented in software, in hardware, or in a combination of hardware and software. In a software implementation, for example, the device as a logical entity is formed when the CPU (Central Processing Unit) of the server reads the corresponding computer program instructions into memory.
  • A hardware structure of the device is shown in FIG. 8.
  • FIG. 6 is a block diagram of a noise signal determining apparatus according to an embodiment of the present application.
  • the functions of the units in the device may correspond to the functions in the steps of the noise signal determining method.
  • the noise signal determining apparatus 100 includes:
  • the power spectrum acquisition unit 101 is configured to perform a Fourier transform on each frame signal in the speech signal segment to be analyzed, to obtain the power spectrum of each frame signal in the segment;
  • the variance determining unit 102 is configured to determine, according to a power spectrum of the frame signal, a variance of each frame signal in the voice signal segment with respect to a power value at each frequency;
  • the noise determining unit 103 is configured to determine, according to the variance, whether each frame signal in the voice signal segment is a noise signal.
  • In some embodiments, the device further includes a segment obtaining unit configured to:
  • determine, according to the amplitude variation of the time-domain signal of the speech to be processed, which segment of that speech has an amplitude variation less than a preset threshold, and take that segment as the speech signal segment to be analyzed.
  • the noise determining unit 103 is configured to:
  • the frame signal is determined to be a noise signal.
  • the variance determination unit 102 is configured to:
  • the noise determining unit 103 is configured to:
  • the frame signal is determined to be a noise signal.
  • the variance determining unit 102 is specifically configured to:
  • classify the power value of the frame signal at each frequency into either a first power value set corresponding to a first frequency interval or a second power value set corresponding to a second frequency interval, wherein the first frequency interval is lower than the second frequency interval;
  • the noise determining unit 103 is configured to:
  • the frame signal is determined to be a noise signal.
  • The embodiments of the present application further provide a speech denoising device.
  • The device may be implemented in software, in hardware, or in a combination of hardware and software. In a software implementation, for example, the device as a logical entity is formed when the CPU (Central Processing Unit) of the server reads the corresponding computer program instructions into memory.
  • A hardware structure of the device is shown in FIG. 8.
  • FIG. 7 is a block diagram of a speech denoising apparatus according to an embodiment of the present application.
  • the functions of the units in the device may correspond to the functions in the steps of the voice denoising method.
  • the voice denoising apparatus 200 includes:
  • a segment determining unit 201 configured to determine the speech signal segment to be analyzed in the speech to be processed;
  • a power spectrum acquisition unit 202 configured to perform a Fourier transform on each frame signal in the speech signal segment to be analyzed, to obtain the power spectrum of each frame signal in the segment;
  • a variance determining unit 203 configured to determine, according to the power spectrum of each frame signal, the variance of that frame signal's power values across frequencies;
  • a noise determining unit 205 configured to determine, according to the variance, whether each frame signal in the speech signal segment is a noise signal, and to obtain the noise frames contained in the segment;
  • a speech denoising unit 10 configured to determine the average power of the noise frames contained in the segment, and to denoise the speech to be processed according to that average power.
  • The apparatus further includes a sorting unit 204, configured to sort the frame signals in the speech signal segment to be analyzed by the magnitude of their variances;
  • the noise determining unit 205 is then specifically configured to determine, based on the sorted variances of the frame signals' power values at the respective frequencies, whether each frame signal in the speech signal segment is a noise signal.
  • In the noise signal determining method, the voice denoising method, and the apparatuses, the power spectrum of each frame signal is obtained by Fourier-transforming the speech signal segment to be analyzed, the variance of each frame signal's power values at the respective frequencies is determined, and whether each frame signal is a noise signal is finally decided from that variance, so that the noise frames contained in the segment are obtained accurately. During voice denoising, the to-be-processed speech can then be denoised according to the mean power of the determined noise frames, which improves the denoising result.
  • Embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • Embodiments of the present application may be provided as a method, a system, or a computer program product.
  • The present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware.
  • The present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
  • The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules.
  • Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network.
  • In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Telephone Function (AREA)
  • Telephonic Communication Services (AREA)
  • Noise Elimination (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

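A noise signal determining method, a voice denoising method, and apparatuses. The noise signal determining method includes: performing a Fourier transform on each frame signal in a speech signal segment to be analyzed to obtain the power spectrum of each frame signal in the segment (S101); determining, according to the power spectrum of each frame signal, the variance of the power values of each frame signal in the segment at the respective frequencies (S102); and determining, according to the variance, whether each frame signal in the segment is a noise signal (S103). The noise frames contained in the speech signal segment to be analyzed can thus be obtained accurately, which improves the voice denoising result.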

Description

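Noise signal determining method, voice denoising method, and apparatus
This application claims priority to Chinese Patent Application No. 201510670697.8, filed on October 13, 2015 and entitled "Noise signal determining method, voice denoising method, and apparatus", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the technical field of voice denoising, and in particular to a noise signal determining method, a voice denoising method, and apparatuses.
Background
Voice denoising improves speech quality by removing environmental noise from a speech signal. During voice denoising, the power spectrum of the noise signal in the speech signal must first be determined; denoising is then performed according to that noise power spectrum.
In the prior art, the power spectrum of the noise signal is usually determined by assuming that the first N frames of a speech signal are noise (that is, contain no human speech) and analyzing those N frames to obtain the noise power spectrum.
In practical application scenarios, however, the first N frames designated as noise by this assumption often do not match the actual noise signal, which degrades the accuracy of the obtained noise power spectrum.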
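Summary
The embodiments of the present application aim to provide a noise signal determining method, a voice denoising method, and apparatuses that solve the prior-art problem that the first N frames obtained by assumption do not match the actual noise signal, which degrades the accuracy of the obtained noise power spectrum.
To solve the above technical problem, the noise signal determining method, voice denoising method, and apparatuses provided by the embodiments of the present application are implemented as follows:
A noise signal determining method, including:
performing a Fourier transform on each frame signal in a speech signal segment to be analyzed to obtain the power spectrum of each frame signal in the speech signal segment;
determining, according to the power spectrum of each frame signal, the variance of the power values of each frame signal in the speech signal segment at the respective frequencies;
determining, according to the variance, whether each frame signal in the speech signal segment is a noise signal.
A voice denoising method, including:
determining a speech signal segment to be analyzed contained in to-be-processed speech;
performing a Fourier transform on each frame signal in the speech signal segment to be analyzed to obtain the power spectrum of each frame signal in the speech signal segment;
determining, according to the power spectrum of each frame signal, the variance of the power values of each frame signal in the speech signal segment at the respective frequencies;
determining, according to the variance, whether each frame signal in the speech signal segment is a noise signal, thereby obtaining the noise frames contained in the speech signal segment;
determining the mean power corresponding to the noise frames contained in the speech signal segment, and denoising the to-be-processed speech according to the mean power of the noise frames.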
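A noise signal determining apparatus, including:
a power spectrum acquisition unit, configured to perform a Fourier transform on each frame signal in a speech signal segment to be analyzed to obtain the power spectrum of each frame signal in the speech signal segment;
a variance determining unit, configured to determine, according to the power spectrum of each frame signal, the variance of the power values of each frame signal in the speech signal segment at the respective frequencies;
a noise determining unit, configured to determine, according to the variance, whether each frame signal in the speech signal segment is a noise signal.
A voice denoising apparatus, including:
a segment determining unit, configured to determine a speech signal segment to be analyzed contained in to-be-processed speech;
a power spectrum acquisition unit, configured to perform a Fourier transform on each frame signal in the speech signal segment to be analyzed to obtain the power spectrum of each frame signal in the speech signal segment;
a variance determining unit, configured to determine, according to the power spectrum of each frame signal, the variance of the power values of each frame signal in the speech signal segment at the respective frequencies;
a noise determining unit, configured to determine, according to the variance, whether each frame signal in the speech signal segment is a noise signal, thereby obtaining the noise frames contained in the speech signal segment;
a voice denoising unit, configured to determine the mean power corresponding to the noise frames contained in the speech signal segment, and to denoise the to-be-processed speech according to the mean power of the noise frames.
As the technical solutions provided by the above embodiments show, the noise signal determining method, voice denoising method, and apparatuses of the present application obtain the power spectrum of each frame signal by Fourier-transforming the speech signal segment to be analyzed, determine the variance of each frame signal's power values at the respective frequencies, and finally decide from that variance whether each frame signal is a noise signal, so that the noise frames contained in the segment are obtained accurately. During voice denoising, the to-be-processed speech can be denoised according to the mean power of the determined noise frames, which improves the denoising result.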
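Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments recorded in the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a noise signal determining method according to an embodiment of the present application;
FIG. 2 is a flowchart of the step of determining whether a frame signal is a noise signal according to an embodiment of the present application;
FIG. 3 is a flowchart of the step of determining the variance of a frame signal's power values at the sampling points according to an embodiment of the present application;
FIG. 4 is a graph of the variance curves of the power values according to an embodiment of the present application;
FIG. 5 is a flowchart of a voice denoising method according to an embodiment of the present application;
FIG. 6 is a block diagram of a noise signal determining apparatus according to an embodiment of the present application;
FIG. 7 is a block diagram of a voice denoising apparatus according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a hardware structure of the apparatus provided by the present application.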
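Detailed Description
To help those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
FIG. 1 is a flowchart of a noise signal determining method according to an embodiment of the present application. To determine the noise signal in a speech signal segment to be analyzed, the noise signal determining method of this embodiment includes the following steps:
S101: Perform a Fourier transform on each frame signal in the speech signal segment to be analyzed to obtain the power spectrum of each frame signal in the speech signal segment.
The speech signal segment to be analyzed may be extracted from the to-be-processed speech according to certain rules. It may be a "suspected noise segment" that a preliminary judgment indicates is likely to contain many noise frames. Preferably, before step S101, the method further includes:
determining, according to the amplitude variation of the time-domain signal of the to-be-processed speech, a speech signal segment of the to-be-processed speech whose amplitude variation is less than a preset threshold as the speech signal segment to be analyzed;
or, taking the first N frames of the to-be-processed speech as the speech signal segment to be analyzed.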
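In the embodiments of the present application, in the time domain a noise signal is generally a segment whose amplitude varies little or stays fairly uniform, whereas a segment containing human speech generally fluctuates strongly in amplitude. Based on this rule, a preset threshold can be set in advance for identifying the "suspected noise segment" contained in the to-be-processed speech (that is, the speech to be denoised), and a segment of the to-be-processed speech whose amplitude variation is less than the preset threshold can then be determined as the speech signal segment to be analyzed.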
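A minimal Python sketch of this segment selection, assuming (the text does not define the measure) that "amplitude variation" is the spread between the largest and smallest per-frame peak amplitudes over a run of consecutive frames. The helper names `frame_peaks` and `find_suspected_noise_segment`, the use of non-overlapping frames, and picking the longest qualifying run are illustrative assumptions, not part of the described method.

```python
def frame_peaks(samples, frame_len):
    """Peak absolute amplitude of each non-overlapping frame (a trailing partial frame is dropped)."""
    return [max(abs(x) for x in samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def find_suspected_noise_segment(samples, frame_len, threshold):
    """Return (start, end) frame indices of the longest run of consecutive frames whose
    peak amplitudes stay within `threshold` of each other (hypothetical criterion)."""
    peaks = frame_peaks(samples, frame_len)
    best_start, best_len = 0, 0
    for start in range(len(peaks)):
        lo = hi = peaks[start]
        end = start
        # Extend the run while the spread of peak amplitudes stays below the threshold.
        while end < len(peaks):
            lo, hi = min(lo, peaks[end]), max(hi, peaks[end])
            if hi - lo >= threshold:
                break
            end += 1
        if end - start > best_len:
            best_start, best_len = start, end - start
    return best_start, best_start + best_len
```

The alternative given in the text, taking the first N frames, corresponds to simply using frames `0` to `N - 1`.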
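In the embodiments of the present application, the speech signal is first divided into frames. A frame signal is a single frame of the speech signal, and a speech signal contains multiple frame signals. A frame signal may include a number of sampling points, for example 1024, and adjacent frame signals may overlap (for example, by 50%). In this embodiment, the power spectrum (frequency domain) of the speech signal can be obtained by applying a short-time Fourier transform (STFT) to the time-domain speech signal. The power spectrum contains multiple power values corresponding to different frequencies, for example 1024 power values.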
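The framing described above (frames of, for example, 1024 sampling points with 50% overlap between neighbours) can be sketched as follows. This is an illustrative helper, not code from the application; dropping trailing samples that do not fill a whole frame is an assumption, and a practical STFT front end would usually also apply a window function, which the text does not mention.

```python
def split_frames(samples, frame_len=1024, overlap=0.5):
    """Split a sample sequence into frames of `frame_len` samples, adjacent frames
    overlapping by the fraction `overlap`. Samples that do not fill a final frame are dropped."""
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    hop = max(1, int(frame_len * (1 - overlap)))  # distance between consecutive frame starts
    return [list(samples[start:start + frame_len])
            for start in range(0, len(samples) - frame_len + 1, hop)]
```

For example, 4096 samples with the defaults give 7 frames starting at samples 0, 512, ..., 3072.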
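In the embodiments of the present application, in a speech signal containing human speech, the stretch of signal before the person starts speaking (for example, the first 1.5 s) can by default be regarded as noise (environmental noise). The embodiments can therefore take the speech signal to be analyzed to be the first N frames of a speech signal, for example the first 1.5 s: $\{f'_1, f'_2, \ldots, f'_n\}$, where $f'_1, f'_2, \ldots, f'_n$ denote the individual frame signals it contains. The purpose of the embodiments is to determine which frame signals of the speech signal to be analyzed are noise signals.
Based on the power spectrum of the speech signal to be analyzed, $\{f'_1, f'_2, \ldots, f'_n\}$, obtained by the short-time Fourier transform, multiple power values can be computed for each frame signal. Suppose the Fourier coefficient of a frame signal at a given frequency is $a + bi$, with real part $a$ and imaginary part $b$; the power value of the frame signal at that frequency is then $a^2 + b^2$, the squared magnitude of the coefficient. In this way the power value of each frame signal at each frequency is obtained. For example, if each frame signal in $\{f'_1, f'_2, \ldots, f'_n\}$ contains 1024 sampling points, 1024 power values at different frequencies can be obtained for each frame from the power spectrum. The power values of frame $f'_1$ are:
[Formula image PCTCN2016101444-appb-000001: the 1024 power values of frame $f'_1$]
The power values of frame $f'_2$ are:
[Formula image PCTCN2016101444-appb-000002: the 1024 power values of frame $f'_2$]
and so on; the power values of frame $f'_n$ are:
[Formula image PCTCN2016101444-appb-000003: the 1024 power values of frame $f'_n$]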
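As a sketch of how the power values $a^2 + b^2$ of one frame could be computed, the direct discrete Fourier transform below avoids third-party dependencies at the cost of $O(N^2)$ work; in practice an FFT routine would be used. Returning all $N$ bins mirrors the text's example of 1024 power values for 1024 sampling points (for a real-valued frame, the bins above $N/2$ mirror the lower ones). The function name `power_values` is hypothetical, and no window function is applied.

```python
import cmath


def power_values(frame):
    """Power value a^2 + b^2 of every DFT coefficient a + bi of one frame (direct O(N^2) DFT)."""
    n = len(frame)
    powers = []
    for k in range(n):
        # k-th DFT coefficient of the frame
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame))
        powers.append(coeff.real ** 2 + coeff.imag ** 2)
    return powers
```

A constant frame puts all of its power in bin 0, while a single impulse spreads equal power over every bin.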
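S102: Determine, according to the power spectrum of each frame signal, the variance of the power values of each frame signal in the speech signal segment at the respective frequencies.
Based on the power values of each frame signal in $\{f'_1, f'_2, \ldots, f'_n\}$ at each frequency, the variances of the frames' power values, $\{\mathrm{Var}(f'_1), \mathrm{Var}(f'_2), \ldots, \mathrm{Var}(f'_n)\}$, can be computed with the variance formula. Taking 1024 sampling points as an example, $\mathrm{Var}(f'_1)$ is the variance of
[Formula image PCTCN2016101444-appb-000004: the 1024 power values of frame $f'_1$],
$\mathrm{Var}(f'_2)$ is the variance of
[Formula image PCTCN2016101444-appb-000005: the 1024 power values of frame $f'_2$],
and so on, and $\mathrm{Var}(f'_n)$ is the variance of
[Formula image PCTCN2016101444-appb-000006: the 1024 power values of frame $f'_n$].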
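Written out, if frame $f'_i$ has $K$ power values $P_{i,1}, \ldots, P_{i,K}$ (notation introduced here for illustration), one common form of the variance formula is the population variance $\mathrm{Var}(f'_i) = \frac{1}{K}\sum_{k=1}^{K}\left(P_{i,k} - \bar{P}_i\right)^2$ with $\bar{P}_i = \frac{1}{K}\sum_{k=1}^{K} P_{i,k}$. The text does not say whether the population or the sample ($1/(K-1)$) variance is meant; the sketch below assumes the population form, via Python's `statistics.pvariance`.

```python
from statistics import pvariance


def frame_variances(power_spectra):
    """Var(f'_i) for each frame: population variance of that frame's power values across frequencies."""
    return [pvariance(powers) for powers in power_spectra]
```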
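S103: Determine, according to the variance, whether each frame signal in the speech signal segment is a noise signal.
In the embodiments of the present application, the energy (that is, the power values) of a frame signal containing speech usually varies considerably across frequency bands, whereas the energy of a frame signal containing no speech (that is, a noise signal) varies relatively little across bands and is distributed more evenly. Whether a frame signal is a noise signal can therefore be determined from the variance of its power values.
FIG. 2 is a flowchart of the step of determining whether a frame signal is a noise signal according to an embodiment of the present application. In this embodiment, step S103 may include:
S1031: Determine whether the variance of the frame signal's power values is greater than a first threshold $T_1$.
S1032: If not, determine the frame signal to be a noise signal.
If the variance of a frame signal's power values exceeds the first threshold $T_1$, the spread of the frame's energy (power values) across frequency bands exceeds $T_1$, and the frame signal can be determined not to be a noise signal; conversely, if the variance does not exceed $T_1$, the spread of the frame's energy across bands does not exceed $T_1$, and the frame signal can be determined to be a noise signal.
Through the above process, the frame signals of the speech signal to be analyzed, $\{f'_1, f'_2, \ldots, f'_n\}$, can be classified in turn into noise frames $\{f'_1, f'_2, \ldots, f'_m\}$ and non-noise frames $\{f'_{m+1}, f'_{m+2}, \ldots, f'_n\}$. The noise signals contained in a speech signal are thus identified, and voice denoising can be performed based on the noise signals $\{f'_1, f'_2, \ldots, f'_m\}$.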
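Steps S1031 and S1032 reduce to one comparison per frame. A minimal Python sketch, assuming each frame's power values are already available as a list and using the population variance; the function name and its return format (indices of the noise frames) are illustrative.

```python
from statistics import pvariance


def noise_frames_by_variance(power_spectra, t1):
    """Indices of frames whose power-value variance is not greater than T1 (S1031/S1032)."""
    return [i for i, powers in enumerate(power_spectra) if pvariance(powers) <= t1]
```

For instance, with $T_1 = 1$ a flat spectrum `[1, 1, 1, 1]` (variance 0) is classified as noise, while `[0, 10, 0, 10]` (variance 25) is not.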
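As shown in FIG. 3, in the embodiments of the present application, step S102 may specifically include:
S1021: According to the frequency interval in which each frequency of the power spectrum of the frame signals $\{f'_1, f'_2, \ldots, f'_n\}$ falls, classify the power values of each frame signal at the respective frequencies into at least a first power value set corresponding to a first frequency interval and a second power value set corresponding to a second frequency interval, where the first frequency interval is lower than the second frequency interval.
In a specific embodiment, variance statistics are computed for each frame signal in the frequency domain. Because non-noise signals are generally concentrated in the low and middle frequency bands, whereas noise is generally distributed fairly evenly across all bands, the variance of each frame's power values is computed separately for at least two different bands (that is, the frequency intervals above).
For example, the first frequency interval may be 0 to 2000 Hz (low band) and the second frequency interval 2000 to 4000 Hz (high band). If each frame signal contains 1024 sampling points, the 1024 power values of each frame are assigned, according to the frequency interval they fall in, to a first power value set A corresponding to 0 to 2000 Hz and a second power value set B corresponding to 2000 to 4000 Hz. Taking frame $f'_1$ as an example, its 1024 power values are:
[Formula image PCTCN2016101444-appb-000007: the 1024 power values of frame $f'_1$]
By frequency interval, the power values in the first power value set A are, for example:
[Formula image PCTCN2016101444-appb-000008: the power values of $f'_1$ in set A (0 to 2000 Hz)]
and the power values in the second power value set B are, for example:
[Formula image PCTCN2016101444-appb-000009: the power values of $f'_1$ in set B (2000 to 4000 Hz)]
The other frames are handled in the same way.
It is worth noting that in other embodiments of the present application, more than two frequency bands may be defined, and the variance of the signal power values in each band computed separately.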
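S1022: Determine a first variance of the power values contained in the first power value set.
As described above, taking frame $f'_1$ as an example, the power values contained in the first power value set A are, for example:
[Formula image PCTCN2016101444-appb-000010: the power values of $f'_1$ in set A]
The variance formula then gives the first variance $\mathrm{Var}_{low}(f'_1)$ of the power values
[Formula image PCTCN2016101444-appb-000011: the power values of $f'_1$ in set A]
S1023: Determine a second variance of the power values contained in the second power value set.
As described above, taking frame $f'_1$ as an example, the power values contained in the second power value set B are, for example:
[Formula image PCTCN2016101444-appb-000012: the power values of $f'_1$ in set B]
The variance formula then gives the second variance $\mathrm{Var}_{high}(f'_1)$ of the power values
[Formula image PCTCN2016101444-appb-000013: the power values of $f'_1$ in set B]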
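The band split of S1021 and the two variances of S1022 and S1023 can be sketched as below. Two assumptions fill gaps the text leaves open: DFT bin $k$ of an $N$-point frame is taken to lie at $k \cdot f_s / N$ Hz for a sampling rate $f_s$, and the population variance is used. With $f_s = 8000$ Hz and $N = 1024$, bins 0 to 255 fall in 0 to 2000 Hz and bins 256 to 511 in 2000 to 4000 Hz; the mirrored bins at or above 4000 Hz fall outside both bands. Following formulas (1) to (4), the low-band (first) variance is called `Var_low` and the high-band (second) variance `Var_high`. Passing more than two `(low_hz, high_hz)` pairs covers the multi-band variant. The name `band_variances` is hypothetical.

```python
from statistics import pvariance


def band_variances(powers, sample_rate, bands=((0, 2000), (2000, 4000))):
    """Variance of one frame's power values within each frequency band.

    With the default bands, result[0] is the first (low-band) variance and
    result[1] the second (high-band) variance.
    """
    n = len(powers)
    result = []
    for low_hz, high_hz in bands:
        # Collect the power values whose bin frequency k * sample_rate / n lies in [low_hz, high_hz).
        band = [p for k, p in enumerate(powers) if low_hz <= k * sample_rate / n < high_hz]
        if not band:
            raise ValueError(f"no DFT bins fall in {low_hz}-{high_hz} Hz")
        result.append(pvariance(band))
    return result
```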
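FIG. 4 is a schematic diagram of the variance curves in an embodiment of the present application. The horizontal axis is the frame index and the vertical axis is the variance; the first variance curve shows the trend of each frame's first variance, and the second variance curve shows the trend of each frame's second variance. The figure shows that in the high band (2000 to 4000 Hz) the variance fluctuates little, whereas in the low band (0 to 2000 Hz) it fluctuates considerably, which confirms that non-noise signals are concentrated mainly in the low band.
As described above, in a preferred embodiment of the present application, step S1031 may specifically include:
determining whether the first variance of the frame signal's power values is greater than the first threshold $T_1$, and if not, determining the frame signal to be a noise signal. Taking frame $f'_1$ as an example, it is determined whether the first variance $\mathrm{Var}_{low}(f'_1)$ is greater than $T_1$.
In the embodiments of the present application, step S103 may further specifically include:
determining whether the difference between the first variance and the second variance is greater than a second threshold $T_2$;
if not, determining the frame signal to be a noise signal.
Taking frame $f'_1$ as an example, the difference between the first and second variances is $|\mathrm{Var}_{high}(f'_1) - \mathrm{Var}_{low}(f'_1)|$; if $|\mathrm{Var}_{high}(f'_1) - \mathrm{Var}_{low}(f'_1)| \le T_2$, frame $f'_1$ is determined to be a noise signal. Following this step, it can be determined in turn which frame signals of the speech signal to be analyzed, $\{f'_1, f'_2, \ldots, f'_n\}$, are noise signals.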
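Applied to a whole segment, the difference test can be sketched as follows, assuming the first (low-band) and second (high-band) variances have already been computed for every frame as in S1022 and S1023. The function name is hypothetical, and "not greater than $T_2$" is implemented as `<=`.

```python
def noise_frames_by_band_difference(var_low, var_high, t2):
    """Indices of frames whose first and second variances differ by at most T2 (noise frames).
    var_low[i] and var_high[i] are frame i's first (low-band) and second (high-band) variances."""
    if len(var_low) != len(var_high):
        raise ValueError("need one low-band and one high-band variance per frame")
    return [i for i, (lo, hi) in enumerate(zip(var_low, var_high)) if abs(hi - lo) <= t2]
```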
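In the embodiments of the present application, between step S102 and step S103, the method further includes:
sorting the frame signals in the speech signal segment to be analyzed by the magnitude of their variances;
then, determining, according to the variance, whether each frame signal in the speech signal segment is a noise signal includes: determining, based on the sorted variances of the frame signals' power values at the respective frequencies, whether each frame signal in the speech signal segment is a noise signal.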
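A sketch of the sorting step: rather than reordering the frames themselves, the snippet below sorts frame indices by ascending variance, which keeps each frame's original position available for the later mean-power step (S205 maps the noise frames back to their frame numbers before sorting). Sorting indices instead of frames is an implementation choice assumed here, not something the text prescribes.

```python
def sort_frames_by_variance(variances):
    """Frame indices ordered by ascending variance, so the most noise-like frames come first.
    Python's sort is stable, so frames with equal variance keep their original order."""
    return sorted(range(len(variances)), key=lambda i: variances[i])
```

For variances `[5.0, 1.0, 3.0]` the resulting order is `[1, 2, 0]`.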
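As described above, this embodiment can determine the variances $\{\mathrm{Var}(f'_1), \mathrm{Var}(f'_2), \ldots, \mathrm{Var}(f'_n)\}$ of the power values of frames $\{f'_1, f'_2, \ldots, f'_n\}$. The frames are sorted by variance in ascending order; because a smaller variance makes a frame more likely to be noise, sorting moves the noise frames of the speech signal to be analyzed to the front. In the embodiments of the present application, if variances are computed separately for the low band (for example, 0 to 2000 Hz) and the high band (for example, 2000 to 4000 Hz), the power values of each frame at the respective frequencies are assigned, according to the frequency interval in which each frequency of the power spectrum of $\{f'_1, f'_2, \ldots, f'_n\}$ falls, to a first power value set A corresponding to the first frequency interval (for example, 0 to 2000 Hz) and a second power value set B corresponding to the second frequency interval (for example, 2000 to 4000 Hz). Then the first variances $\{\mathrm{Var}_{low}(f'_1), \mathrm{Var}_{low}(f'_2), \ldots, \mathrm{Var}_{low}(f'_n)\}$ of the power values in the first sets and the second variances $\{\mathrm{Var}_{high}(f'_1), \mathrm{Var}_{high}(f'_2), \ldots, \mathrm{Var}_{high}(f'_n)\}$ of the power values in the second sets are determined. Based on these low-band and high-band variance statistics, step S103 can determine the noise signals contained in the speech signal to be analyzed (which may be the speech signal after sorting by variance) as follows:
$\mathrm{Var}_{low}(f'_i) > T_1$  (1)
$|\mathrm{Var}_{high}(f'_i) - \mathrm{Var}_{low}(f'_i)| > T_2$  (2)
$\mathrm{Var}_{high}(f'_{i+1}) - \mathrm{Var}_{high}(f'_{i-1}) > T_3$  (3)
$\mathrm{Var}_{low}(f'_{i+1}) - \mathrm{Var}_{low}(f'_{i-1}) > T_4$  (4)
where $i \in (1, n)$, that is, $1 < i < n$. With formula (1), it is determined in turn whether the first variance of the power values of each frame $f'_i$ is greater than the first threshold $T_1$; if not, frame $f'_i$ is determined to be a noise frame, and the set of noise frames so determined is taken as the noise signal.
With formula (2), it is determined in turn whether the absolute difference between the second and first variances of each frame $f'_i$ is greater than the second threshold $T_2$; if not, frame $f'_i$ is determined to be a noise frame, and the set of noise frames so determined is taken as the noise signal.
With formula (3), it is determined in turn whether the difference $\mathrm{Var}_{high}(f'_{i+1}) - \mathrm{Var}_{high}(f'_{i-1})$ between the second variance of the next frame $f'_{i+1}$ and the second variance of the previous frame $f'_{i-1}$ is greater than the third threshold $T_3$; if not, frame $f'_i$ is determined to be a noise frame, and the set of noise frames so determined is taken as the noise signal.
With formula (4), it is determined in turn whether the difference $\mathrm{Var}_{low}(f'_{i+1}) - \mathrm{Var}_{low}(f'_{i-1})$ between the first variance of the next frame $f'_{i+1}$ and the first variance of the previous frame $f'_{i-1}$ is greater than the fourth threshold $T_4$; if not, frame $f'_i$ is determined to be a noise frame, and the set of noise frames so determined is taken as the noise signal.
In the embodiments of the present application, the noise frames contained in the speech signal to be analyzed can be identified with formulas (1) to (4). That is, any frame $f'_i$ that satisfies any one of formulas (1) to (4) is determined to be a non-noise signal (the noise cutoff frame); in other words, a frame $f'_i$ that satisfies none of formulas (1) to (4) is determined to be a noise signal. Through this process the noise cutoff frame $f'_m$ can be determined, and the noise frames are $\{f'_1, f'_2, \ldots, f'_{m-1}\}$.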
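The scan for the noise cutoff frame with formulas (1) to (4) can be sketched in Python as below, using 0-based indices so that frames `0` to `m - 1` are noise and frame `m` is the cutoff frame. The text states the formulas only for $1 < i < n$; here formulas (1) and (2) are also checked for the first and last frames, while (3) and (4), which need both neighbours, are checked only for interior frames. That boundary handling, and returning `n` when no frame triggers, are assumptions.

```python
def find_noise_cutoff(var_low, var_high, t1, t2, t3, t4):
    """Return m such that frames 0..m-1 are noise and frame m is the noise cutoff frame.

    Frame i is non-noise if it satisfies any of formulas (1)-(4):
      (1) var_low[i] > t1
      (2) |var_high[i] - var_low[i]| > t2
      (3) var_high[i+1] - var_high[i-1] > t3
      (4) var_low[i+1] - var_low[i-1] > t4
    (3) and (4) need both neighbours, so they are only checked for interior frames.
    If no frame triggers, all n frames are treated as noise and n is returned.
    """
    n = len(var_low)
    for i in range(n):
        if var_low[i] > t1 or abs(var_high[i] - var_low[i]) > t2:
            return i
        if 0 < i < n - 1 and (var_high[i + 1] - var_high[i - 1] > t3
                              or var_low[i + 1] - var_low[i - 1] > t4):
            return i
    return n
```

Because formulas (3) and (4) compare a frame's neighbours, a jump in the low-band variance from 1 to 9 at index 4 already stops the scan at index 3 in this sketch.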
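It is worth mentioning that in other embodiments of the present application, the noise cutoff frame may be determined with only some of formulas (1) to (4), for example formulas (1) and (2), or formulas (2) and (3). Moreover, the formulas used to determine the noise cutoff frame in the embodiments of the present application are not limited to those listed above. The thresholds $T_1$, $T_2$, $T_3$, and $T_4$ are all obtained statistically from a large number of test samples.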
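FIG. 5 shows the flow of a voice denoising method according to an embodiment of the present application, which includes:
S201: Determine the speech signal segment to be analyzed contained in the to-be-processed speech.
S202: Perform a Fourier transform on each frame signal in the speech signal segment to be analyzed to obtain the power spectrum of each frame signal in the speech signal segment.
S203: Determine, according to the power spectrum of each frame signal, the variance of the power values of each frame signal in the speech signal segment at the respective frequencies.
S204: Determine, according to the variance, whether each frame signal in the speech signal segment is a noise signal, thereby obtaining the noise frames contained in the speech signal segment.
S205: Determine the mean power corresponding to the noise frames contained in the speech signal segment, and denoise the to-be-processed speech according to the mean power of the noise frames.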
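In the embodiments of the present application, after the noise frames $\{f'_1, f'_2, \ldots, f'_{m-1}\}$ contained in a speech segment to be analyzed have been obtained by the above method, the frame indices of these noise frames in the original signal (before sorting) can be determined, and the mean power of these frame signals computed to obtain the noise power spectrum estimate $P_{noise}$. Once $P_{noise}$ has been obtained, voice denoising can be performed. Because denoising methods are well known to those of ordinary skill in the art, they are not described in detail here.
Of course, in other feasible embodiments of the present application, the step of sorting the frame signals by variance may be omitted, and the noise frames determined directly from the variances of the original signal. In addition, to avoid overestimation, usually only some of the determined noise frames are used to compute $P_{noise}$; for example, if 50 noise frames are determined, the first 30 of them may be used, which improves the accuracy of the power spectrum estimate.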
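The noise estimate and one possible use of it can be sketched as follows. `estimate_noise_power` averages, bin by bin, the power values of at most `max_frames` noise frames (30 in the example above), indexed by their positions in the original, unsorted signal. The text leaves the denoising method open; `spectral_subtract` shows basic power spectral subtraction with a spectral floor, a common textbook technique swapped in here for illustration, and the `floor` value is an arbitrary assumption. Both function names are hypothetical.

```python
def estimate_noise_power(power_spectra, noise_indices, max_frames=30):
    """P_noise: per-frequency mean power over at most max_frames noise frames.
    noise_indices are positions in the original (unsorted) frame order; the first
    max_frames of them, in the order given, are kept to limit overestimation."""
    chosen = list(noise_indices)[:max_frames]
    if not chosen:
        raise ValueError("no noise frames to average")
    n_bins = len(power_spectra[chosen[0]])
    return [sum(power_spectra[i][k] for i in chosen) / len(chosen) for k in range(n_bins)]


def spectral_subtract(powers, p_noise, floor=0.01):
    """Basic power spectral subtraction (one common choice, not specified by the text):
    subtract P_noise per bin and clamp each result to a small fraction of the original power."""
    return [max(p - n, floor * p) for p, n in zip(powers, p_noise)]
```

The floor keeps bins where the noise estimate exceeds the frame's power from going negative; a full denoiser would also recombine the cleaned magnitudes with the original phase and resynthesize the time signal.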
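Corresponding to the above process, an embodiment of the present application further provides a noise signal determining apparatus. The apparatus may be implemented in software, in hardware, or in a combination of software and hardware. Taking a software implementation as an example, the apparatus in the logical sense is formed by the CPU (Central Processing Unit) of a server reading the corresponding computer program instructions into memory and running them. A hardware structure of the apparatus is shown in FIG. 8.
FIG. 6 is a block diagram of a noise signal determining apparatus according to an embodiment of the present application. In this embodiment, the functions of the units in the apparatus may correspond to the functions of the steps of the above noise signal determining method; for details, refer to the method embodiments above. The noise signal determining apparatus 100 includes:
a power spectrum acquisition unit 101, configured to perform a Fourier transform on each frame signal in the speech signal segment to be analyzed to obtain the power spectrum of each frame signal in the speech signal segment;
a variance determining unit 102, configured to determine, according to the power spectrum of each frame signal, the variance of the power values of each frame signal in the speech signal segment at the respective frequencies;
a noise determining unit 103, configured to determine, according to the variance, whether each frame signal in the speech signal segment is a noise signal.
Preferably, the apparatus further includes a segment acquisition unit, configured to:
determine, according to the amplitude variation of the time-domain signal of the to-be-processed speech, a speech signal segment of the to-be-processed speech whose amplitude variation is less than a preset threshold as the speech signal segment to be analyzed;
or, take the first N frames of the to-be-processed speech as the speech signal segment to be analyzed.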
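Preferably, the noise determining unit 103 is configured to:
determine whether the variance corresponding to each frame signal in the speech signal segment is greater than a first threshold;
if not, determine the frame signal to be a noise signal.
Preferably, the variance determining unit 102 is configured to:
classify, according to the frequency interval in which each frequency of the power spectrum falls, at least the power values of the frame signal at the respective frequencies into a first power value set corresponding to a first frequency interval;
determine a first variance of the power values contained in the first power value set;
the noise determining unit 103 is then configured to:
determine whether the first variance is greater than the first threshold;
if not, determine the frame signal to be a noise signal.
Preferably, the variance determining unit 102 is specifically configured to:
classify, according to the frequency interval in which the frequency of each power value of each frame signal falls, the power values of the frame signal at the respective frequencies into at least a first power value set corresponding to a first frequency interval and a second power value set corresponding to a second frequency interval, where the first frequency interval is lower than the second frequency interval;
determine a first variance of the power values contained in the first power value set;
determine a second variance of the power values contained in the second power value set;
the noise determining unit 103 is then configured to:
determine whether the difference between the first variance and the second variance corresponding to each frame signal is greater than a second threshold;
if not, determine the frame signal to be a noise signal.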
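Corresponding to the above process, an embodiment of the present application further provides a voice denoising apparatus. The apparatus may be implemented in software, in hardware, or in a combination of software and hardware. Taking a software implementation as an example, the apparatus in the logical sense is formed by the CPU (Central Processing Unit) of a server reading the corresponding computer program instructions into memory and running them. A hardware structure of the apparatus is shown in FIG. 8.
FIG. 7 is a block diagram of a voice denoising apparatus according to an embodiment of the present application. In this embodiment, the functions of the units in the apparatus may correspond to the functions of the steps of the above voice denoising method; for details, refer to the method embodiments above. In this embodiment, the voice denoising apparatus 200 includes:
a segment determining unit 201, configured to determine the speech signal segment to be analyzed contained in the to-be-processed speech;
a power spectrum acquisition unit 202, configured to perform a Fourier transform on each frame signal in the speech signal segment to be analyzed to obtain the power spectrum of each frame signal in the speech signal segment;
a variance determining unit 203, configured to determine, according to the power spectrum of each frame signal, the variance of the power values of each frame signal in the speech signal segment at the respective frequencies;
a noise determining unit 205, configured to determine, according to the variance, whether each frame signal in the speech signal segment is a noise signal, thereby obtaining the noise frames contained in the speech signal segment;
a voice denoising unit 10, configured to determine the mean power corresponding to the noise frames contained in the speech signal segment, and to denoise the to-be-processed speech according to the mean power of the noise frames.
Preferably, the apparatus further includes a sorting unit 204, configured to:
sort the frame signals in the speech signal segment to be analyzed by the magnitude of the variance;
the noise determining unit 205 is then specifically configured to:
determine, based on the sorted variances of the frame signals' power values at the respective frequencies, whether each frame signal in the speech signal segment is a noise signal.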
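The noise signal determining method, voice denoising method, and apparatuses provided by the embodiments of the present application obtain the power spectrum of each frame signal by Fourier-transforming the speech signal segment to be analyzed, determine the variance of each frame signal's power values at the respective frequencies, and finally decide from that variance whether each frame signal is a noise signal, so that the noise frames contained in the segment are obtained accurately. During voice denoising, the to-be-processed speech can be denoised according to the mean power of the determined noise frames, which improves the denoising result.
For ease of description, the above apparatus is described in terms of functionally divided units. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, commodity, or device that includes the element.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described progressively; for identical or similar parts of the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiments are described briefly because they are basically similar to the method embodiments; for relevant parts, refer to the description of the method embodiments.
The above are merely embodiments of the present application and are not intended to limit it. Various modifications and variations of the present application are possible for those skilled in the art. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the present application shall fall within the scope of the claims of the present application.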

Claims (18)

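  1. A noise signal determining method, characterized by comprising:
    performing a Fourier transform on each frame signal in a speech signal segment to be analyzed to obtain the power spectrum of each frame signal in the speech signal segment;
    determining, according to the power spectrum of the frame signals, the variance of the power values of each frame signal in the speech signal segment at the respective frequencies;
    determining, according to the variance, whether each frame signal in the speech signal segment is a noise signal.
  2. The method according to claim 1, characterized in that, before performing a Fourier transform on each frame signal in the speech signal segment to be analyzed to obtain the power spectrum of each frame signal in the speech signal segment, the method further comprises:
    determining, according to the amplitude variation of the time-domain signal of to-be-processed speech, a speech signal segment of the to-be-processed speech whose amplitude variation is less than a preset threshold as the speech signal segment to be analyzed;
    or, taking the first N frames of the to-be-processed speech as the speech signal segment to be analyzed.
  3. The method according to claim 1, characterized in that determining, according to the variance, whether each frame signal in the speech signal segment is a noise signal comprises:
    determining whether the variance corresponding to each frame signal in the speech signal segment is greater than a first threshold;
    if not, determining the frame signal to be a noise signal.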
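  4. The method according to claim 3, characterized in that determining, according to the power spectrum of the frame signals, the variance of the power values of each frame signal in the speech signal segment at the respective frequencies comprises:
    classifying, according to the frequency interval in which each frequency of the power spectrum falls, at least the power values of the frame signal at the respective frequencies into a first power value set corresponding to a first frequency interval;
    determining a first variance of the power values contained in the first power value set;
    and determining whether the variance is greater than the first threshold comprises:
    determining whether the first variance is greater than the first threshold.
  5. The method according to claim 1, characterized in that determining, according to the power spectrum of the frame signals, the variance of the power values of each frame signal in the speech signal segment at the respective frequencies comprises:
    classifying, according to the frequency interval in which the frequency of each power value of each frame signal falls, the power values of the frame signal at the respective frequencies into at least a first power value set corresponding to a first frequency interval and a second power value set corresponding to a second frequency interval, wherein the first frequency interval is lower than the second frequency interval;
    determining a first variance of the power values contained in the first power value set;
    determining a second variance of the power values contained in the second power value set;
    and determining, according to the variance, whether each frame signal in the speech signal segment is a noise signal comprises:
    determining whether the difference between the first variance and the second variance corresponding to each frame signal is greater than a second threshold;
    if not, determining the frame signal to be a noise signal.
  6. The method according to claim 1, characterized in that, after determining, according to the power spectrum of the frame signals, the variance of the power values of each frame signal in the speech signal segment at the respective frequencies, and before determining, according to the variance, whether each frame signal in the speech signal segment is a noise signal, the method further comprises:
    sorting the frame signals in the speech signal segment to be analyzed by the magnitude of the variance;
    and determining, according to the variance, whether each frame signal in the speech signal segment is a noise signal comprises:
    determining, based on the sorted variances of the frame signals' power values at the respective frequencies, whether each frame signal in the speech signal segment is a noise signal.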
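  7. A voice denoising method, characterized by comprising:
    determining a speech signal segment to be analyzed contained in to-be-processed speech;
    performing a Fourier transform on each frame signal in the speech signal segment to be analyzed to obtain the power spectrum of each frame signal in the speech signal segment;
    determining, according to the power spectrum of the frame signals, the variance of the power values of each frame signal in the speech signal segment at the respective frequencies;
    determining, according to the variance, whether each frame signal in the speech signal segment is a noise signal, thereby obtaining the noise frames contained in the speech signal segment;
    determining the mean power corresponding to the noise frames contained in the speech signal segment, and denoising the to-be-processed speech according to the mean power of the noise frames.
  8. The method according to claim 7, characterized in that determining a speech signal segment to be analyzed contained in to-be-processed speech comprises:
    determining, according to the amplitude variation of the time-domain signal of the to-be-processed speech, a speech signal segment of the to-be-processed speech whose amplitude variation is less than a preset threshold as the speech signal segment to be analyzed;
    or, taking the first N frames of the to-be-processed speech as the speech signal segment to be analyzed.
  9. The method according to claim 7, characterized in that determining, according to the variance, whether each frame signal in the speech signal segment is a noise signal comprises:
    determining whether the variance corresponding to each frame signal in the speech signal segment is greater than a first threshold;
    if not, determining the frame signal to be a noise signal.
  10. The method according to claim 9, characterized in that determining, according to the power spectrum of the frame signals, the variance of the power values of each frame signal in the speech signal segment at the respective frequencies comprises:
    classifying, according to the frequency interval in which each frequency of the power spectrum falls, at least the power values of the frame signal at the respective frequencies into a first power value set corresponding to a first frequency interval;
    determining a first variance of the power values contained in the first power value set;
    and determining whether the variance is greater than the first threshold comprises:
    determining whether the first variance is greater than the first threshold.
  11. The method according to claim 7, characterized in that determining, according to the power spectrum of the frame signals, the variance of the power values of each frame signal in the speech signal segment at the respective frequencies comprises:
    classifying, according to the frequency interval in which the frequency of each power value of each frame signal falls, the power values of the frame signal at the respective frequencies into at least a first power value set corresponding to a first frequency interval and a second power value set corresponding to a second frequency interval, wherein the first frequency interval is lower than the second frequency interval;
    determining a first variance of the power values contained in the first power value set;
    determining a second variance of the power values contained in the second power value set;
    and determining, according to the variance, whether each frame signal in the speech signal segment is a noise signal comprises:
    determining whether the difference between the first variance and the second variance corresponding to each frame signal is greater than a second threshold;
    if not, determining the frame signal to be a noise signal.
  12. The method according to claim 7, characterized in that, after determining, according to the power spectrum of the frame signals, the variance of the power values of each frame signal in the speech signal segment at the respective frequencies, and before determining, according to the variance, whether each frame signal in the speech signal segment is a noise signal, the method further comprises:
    sorting the frame signals in the speech signal segment to be analyzed by the magnitude of the variance;
    and determining, according to the variance, whether each frame signal in the speech signal segment is a noise signal comprises:
    determining, based on the sorted variances of the frame signals' power values at the respective frequencies, whether each frame signal in the speech signal segment is a noise signal.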
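  13. A noise signal determining apparatus, characterized by comprising:
    a power spectrum acquisition unit, configured to perform a Fourier transform on each frame signal in a speech signal segment to be analyzed to obtain the power spectrum of each frame signal in the speech signal segment;
    a variance determining unit, configured to determine, according to the power spectrum of the frame signals, the variance of the power values of each frame signal in the speech signal segment at the respective frequencies;
    a noise determining unit, configured to determine, according to the variance, whether each frame signal in the speech signal segment is a noise signal.
  14. The apparatus according to claim 13, characterized in that the apparatus further comprises:
    a segment acquisition unit, configured to:
    determine, according to the amplitude variation of the time-domain signal of to-be-processed speech, a speech signal segment of the to-be-processed speech whose amplitude variation is less than a preset threshold as the speech signal segment to be analyzed;
    or, take the first N frames of the to-be-processed speech as the speech signal segment to be analyzed.
  15. The apparatus according to claim 13, characterized in that the noise determining unit is configured to:
    determine whether the variance corresponding to each frame signal in the speech signal segment is greater than a first threshold;
    if not, determine the frame signal to be a noise signal.
  16. The apparatus according to claim 13, characterized in that the variance determining unit is configured to:
    classify, according to the frequency interval in which each frequency of the power spectrum falls, at least the power values of the frame signal at the respective frequencies into a first power value set corresponding to a first frequency interval;
    determine a first variance of the power values contained in the first power value set;
    and the noise determining unit is configured to:
    determine whether the first variance is greater than a first threshold;
    if not, determine the frame signal to be a noise signal.
  17. The apparatus according to claim 13, characterized in that the variance determining unit is specifically configured to:
    classify, according to the frequency interval in which the frequency of each power value of each frame signal falls, the power values of the frame signal at the respective frequencies into at least a first power value set corresponding to a first frequency interval and a second power value set corresponding to a second frequency interval, wherein the first frequency interval is lower than the second frequency interval;
    determine a first variance of the power values contained in the first power value set;
    determine a second variance of the power values contained in the second power value set;
    and the noise determining unit is configured to:
    determine whether the difference between the first variance and the second variance corresponding to each frame signal is greater than a second threshold;
    if not, determine the frame signal to be a noise signal.
  18. A voice denoising apparatus, characterized by comprising:
    a segment determining unit, configured to determine a speech signal segment to be analyzed contained in to-be-processed speech;
    a power spectrum acquisition unit, configured to perform a Fourier transform on each frame signal in the speech signal segment to be analyzed to obtain the power spectrum of each frame signal in the speech signal segment;
    a variance determining unit, configured to determine, according to the power spectrum of the frame signals, the variance of the power values of each frame signal in the speech signal segment at the respective frequencies;
    a noise determining unit, configured to determine, according to the variance, whether each frame signal in the speech signal segment is a noise signal, thereby obtaining the noise frames contained in the speech signal segment;
    a voice denoising unit, configured to determine the mean power corresponding to the noise frames contained in the speech signal segment, and to denoise the to-be-processed speech according to the mean power of the noise frames.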
PCT/CN2016/101444 2015-10-13 2016-10-08 Noise signal determining method, voice denoising method, and apparatus WO2017063516A1 (zh)

Priority Applications (7)

Application Number Priority Date Filing Date Title
JP2018519388A JP6784758B2 (ja) 2015-10-13 2016-10-08 Noise signal determination method and apparatus, and voice noise removal method and apparatus
ES16854895T ES2807529T3 (es) 2015-10-13 2016-10-08 Method for determining a noise signal and apparatus thereof
EP16854895.6A EP3364413B1 (en) 2015-10-13 2016-10-08 Method of determining noise signal and apparatus thereof
SG11201803004YA SG11201803004YA (en) 2015-10-13 2016-10-08 Noise signal determining method and apparatus and voice denoising method and apparatus
KR1020187013177A KR102208855B1 (ko) 2015-10-13 2016-10-08 Noise signal determining method and apparatus, and voice noise removal method and apparatus
PL16854895T PL3364413T3 (pl) 2015-10-13 2016-10-08 Method for determining a noise signal and device therefor
US15/951,928 US10796713B2 (en) 2015-10-13 2018-04-12 Identification of noise signal for voice denoising device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510670697.8 2015-10-13
CN201510670697.8A CN106571146B (zh) 2015-10-13 2015-10-13 Noise signal determining method, voice denoising method, and apparatus

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/951,928 Continuation US10796713B2 (en) 2015-10-13 2018-04-12 Identification of noise signal for voice denoising device

Publications (1)

Publication Number Publication Date
WO2017063516A1 true WO2017063516A1 (zh) 2017-04-20

Family

ID=58508605

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/101444 WO2017063516A1 (zh) 2015-10-13 2016-10-08 Noise signal determining method, voice denoising method, and apparatus

Country Status (9)

Country Link
US (1) US10796713B2 (zh)
EP (1) EP3364413B1 (zh)
JP (1) JP6784758B2 (zh)
KR (1) KR102208855B1 (zh)
CN (1) CN106571146B (zh)
ES (1) ES2807529T3 (zh)
PL (1) PL3364413T3 (zh)
SG (2) SG11201803004YA (zh)
WO (1) WO2017063516A1 (zh)

Also Published As

Publication number Publication date
SG10202005490WA (en) 2020-07-29
US20180293997A1 (en) 2018-10-11
EP3364413B1 (en) 2020-06-10
PL3364413T3 (pl) 2020-10-19
US10796713B2 (en) 2020-10-06
KR102208855B1 (ko) 2021-01-29
JP2018534618A (ja) 2018-11-22
EP3364413A4 (en) 2019-06-26
KR20180067608A (ko) 2018-06-20
ES2807529T3 (es) 2021-02-23
JP6784758B2 (ja) 2020-11-11
SG11201803004YA (en) 2018-05-30
CN106571146B (zh) 2019-10-15
EP3364413A1 (en) 2018-08-22
CN106571146A (zh) 2017-04-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16854895

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 11201803004Y

Country of ref document: SG

WWE Wipo information: entry into national phase

Ref document number: 2018519388

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 20187013177

Country of ref document: KR

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 2016854895

Country of ref document: EP