WO2017063516A1 - Noise signal determining method, voice denoising method, and device

Noise signal determining method, voice denoising method, and device

Info

Publication number
WO2017063516A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
variance
segment
determining
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2016/101444
Other languages
English (en)
French (fr)
Chinese (zh)
Inventor
杜志军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to PL16854895T (patent PL3364413T3, pl)
Priority to EP16854895.6A (patent EP3364413B1, en)
Priority to JP2018519388A (patent JP6784758B2, ja)
Priority to KR1020187013177A (patent KR102208855B1, ko)
Priority to SG11201803004YA (patent SG11201803004YA, en)
Priority to ES16854895T (patent ES2807529T3, es)
Publication of WO2017063516A1 (zh)
Priority to US15/951,928 (patent US10796713B2, en)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L25/84 Detection of presence or absence of voice signals for discriminating voice from noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0324 Details of processing therefor
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02168 Noise filtering characterised by the method used for estimating noise the estimation exclusively taking place during speech pauses
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L2025/783 Detection of presence or absence of voice signals based on threshold decision

Definitions

  • The present application relates to the field of voice denoising technology, and in particular to a noise signal determining method, a voice denoising method, and a device.
  • Speech denoising is a technique that improves speech quality by removing ambient noise from speech signals. In speech denoising, the power spectrum of the noise signal in the speech signal must first be determined, and denoising is then performed according to that power spectrum.
  • In the prior art, the power spectrum of the noise signal in a voice signal is generally determined as follows: the first N frames of the voice signal are assumed to be a noise signal (i.e., to contain no human voice), and these first N frames are analyzed to obtain the power spectrum of the noise signal.
  • Because the prior art merely assumes that the first N frames of the voice signal are noise, those frames may not match the actual noise signal, which affects the accuracy of the obtained noise power spectrum.
  • The purpose of the embodiments of the present application is to provide a noise signal determining method, a voice denoising method, and a device, so as to solve the problem that the first N frames assumed to be noise in the prior art may not match the actual noise signal, which affects the accuracy of the obtained noise power spectrum.
  • The noise signal determining method, the voice denoising method, and the devices provided by the embodiments of the present application are implemented as follows:
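  • a method for determining a noise signal, comprising: performing a Fourier transform on each frame signal in the speech signal segment to be analyzed, to obtain a power spectrum of each frame signal in the speech signal segment; determining, according to the power spectrum of the frame signal, a variance of each frame signal in the speech signal segment with respect to the power values at each frequency; and determining, according to the variance, whether each frame signal in the speech signal segment is a noise signal.
  • a speech denoising method, comprising: determining a speech signal segment to be analyzed included in the to-be-processed speech; performing a Fourier transform on each frame signal in the segment to obtain a power spectrum of each frame signal; determining, according to the power spectrum, a variance of each frame signal with respect to the power values at each frequency; determining, according to the variance, whether each frame signal is a noise signal, to obtain a plurality of noise frames included in the segment; and determining a power average corresponding to the noise frames and performing voice denoising on the to-be-processed speech according to that power average.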
  • a noise signal determining device includes:
  • a power spectrum acquisition unit configured to perform Fourier transform on each frame signal in the speech signal segment to be analyzed, to obtain a power spectrum of each frame signal in the speech signal segment;
  • a variance determining unit configured to determine, according to a power spectrum of the frame signal, a variance of each frame signal in the voice signal segment with respect to a power value at each frequency
  • a noise determining unit configured to determine, according to the variance, whether each frame signal in the segment of the voice signal is a noise signal.
  • a speech denoising device comprising:
  • a segment determining unit configured to determine a segment of the speech signal to be analyzed included in the to-be-processed speech
  • a power spectrum acquisition unit configured to perform Fourier transform on each frame signal in the speech signal segment to be analyzed, to obtain a power spectrum of each frame signal in the speech signal segment;
  • a variance determining unit configured to determine, according to a power spectrum of the frame signal, a variance of each frame signal in the voice signal segment with respect to a power value at each frequency
  • a noise determining unit configured to determine, according to the variance, whether each frame signal in the voice signal segment is a noise signal, and obtain a plurality of noise frames included in the voice signal segment;
  • a voice denoising unit configured to determine a power average corresponding to the plurality of noise frames included in the voice signal segment, and perform voice denoising processing of the to-be-processed voice according to the power average of the noise frame.
  • As can be seen from the technical solutions above, the noise signal determining method, the voice denoising method, and the devices provided by the embodiments of the present application perform a Fourier transform on the speech signal segment to be analyzed to obtain the power spectrum of each frame signal, determine the variance of each frame signal with respect to the power values at each frequency, and finally determine from that variance whether the frame signal is a noise signal, so that the noise frames included in the segment are identified accurately. During voice denoising, the speech to be processed is denoised according to the power average of the noise frames so determined, which improves the denoising effect.
  • FIG. 1 is a flowchart of a method for determining a noise signal according to an embodiment of the present application.
  • FIG. 2 is a flowchart of the step of determining whether a frame signal is a noise signal in an embodiment of the present application.
  • FIG. 3 is a flowchart of the step of determining the variance of the power values of a frame signal at each sampling point in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of variance curves in an embodiment of the present application.
  • FIG. 5 is a flowchart of a voice denoising method according to an embodiment of the present application.
  • FIG. 6 is a block diagram of a noise signal determining apparatus according to an embodiment of the present application.
  • FIG. 7 is a block diagram of a voice denoising device according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a hardware implementation of the apparatus provided by the present application.
  • The noise signal determining method of this embodiment includes the following steps:
  • S101: Perform a Fourier transform on each frame signal in the speech signal segment to be analyzed, to obtain the power spectrum of each frame signal in the speech signal segment.
  • The speech signal segment to be analyzed may be extracted from the speech to be processed according to certain rules.
  • The speech signal segment to be analyzed may be a "suspected noise frame segment", that is, a segment that a preliminary check suggests contains many noise frames.
  • The method further includes:
  • determining, according to the amplitude variation of the time-domain signal of the to-be-processed speech, a speech signal segment whose amplitude variation is less than a preset threshold as the speech signal segment to be analyzed.
  • A noise signal is usually a segment with small or relatively uniform amplitude, whereas a segment containing human speech usually fluctuates greatly.
  • Accordingly, a preset threshold may be set for identifying the "suspected noise frame segment" contained in the speech to be processed (i.e., the speech to be denoised).
  • A speech signal segment in the to-be-processed speech whose amplitude variation is less than the preset threshold may then be determined as the speech signal segment to be analyzed.
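  • The segment selection described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how "amplitude variation" is measured, so the sketch assumes it is the range (maximum minus minimum) of the samples in a fixed-length window. The function name, window length, and threshold value are hypothetical.

```python
def find_suspected_noise_segment(samples, window, threshold):
    """Return (start, end) of the first window whose amplitude range is
    below the threshold, or None if no window qualifies.

    Assumption: "amplitude variation" is measured as max - min per window.
    """
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        variation = max(chunk) - min(chunk)  # assumed variation measure
        if variation < threshold:
            return start, start + window
    return None


# Quiet samples followed by loud, speech-like samples (made-up values).
signal = [0.01, -0.02, 0.015, -0.01, 0.8, -0.7, 0.9, -0.85]
segment = find_suspected_noise_segment(signal, window=4, threshold=0.1)
```

In this toy input the first window is quiet and is returned as the candidate segment, while the second window fluctuates too much to qualify.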
  • The speech signal is first divided into frames. A frame signal is a single frame of the speech signal, and a speech signal segment contains several frame signals.
  • A frame signal may contain a number of sampling points (for example, 1024), and two adjacent frame signals may overlap each other (for example, by 50%).
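  • A minimal sketch of framing with overlap is given below, using the example figures from the text (1024 sampling points per frame, 50% overlap). The function name is hypothetical, and dropping a trailing partial frame is an assumption of this sketch, not something the source specifies.

```python
def split_into_frames(samples, frame_len=1024, overlap=0.5):
    """Split a sample list into fixed-length frames that overlap.

    A trailing partial frame is dropped (assumption of this sketch).
    """
    hop = max(1, int(frame_len * (1 - overlap)))  # step between frame starts
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop)]


# 4096 samples with 1024-sample frames and 50% overlap.
frames = split_into_frames(list(range(4096)))
```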
  • The power spectrum (frequency domain) of the speech signal can be obtained by performing a short-time Fourier transform (STFT) on the time-domain speech signal.
  • The power spectrum contains a plurality of power values corresponding to different frequencies, for example 1024 power values.
  • In a speech signal that contains a human voice, the period before the person starts speaking (e.g., the first 1.5 s) may be a noise signal (ambient noise).
  • Accordingly, the embodiment of the present application may take the first N frames of a speech signal as the speech signal to be analyzed. For example, the speech signal to be analyzed is the first 1.5 seconds of speech, $\{f'_1, f'_2, \ldots, f'_n\}$, where $f'_1, f'_2, \ldots, f'_n$ are the frame signals contained in the speech signal.
  • The purpose of the embodiment of the present application is to determine which frame signals of the analyzed speech signal are noise signals.
  • From the power spectrum, a plurality of power values can be calculated for each frame signal.
  • For example, if the Fourier coefficient of a frame signal at a certain frequency is $a+bi$, where $a$ is the real part and $b$ is the imaginary part, then the power value of the frame signal at that frequency is $a^2+b^2$. The amplitude is $\sqrt{a^2+b^2}$ and the phase is the argument of $a+bi$.
  • If each frame signal in $\{f'_1, f'_2, \ldots, f'_n\}$ contains 1024 sampling points, 1024 power values at different frequencies can be obtained for each frame signal from its power spectrum.
  • In this way, a set of power values is obtained for each of the frame signals $f'_1$, $f'_2$, ..., $f'_n$.
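  • The power value $a^2+b^2$ at each frequency can be illustrated with a small, deliberately naive discrete Fourier transform. This is a sketch only: a practical implementation would use an FFT, usually with a window function, and the function name and the 8-sample test frame are assumptions made for the example.

```python
import cmath
import math


def power_spectrum(frame):
    """Return one power value a^2 + b^2 per DFT bin of the frame."""
    n = len(frame)
    powers = []
    for k in range(n):
        # Naive O(n^2) DFT coefficient for bin k.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        a, b = coeff.real, coeff.imag
        powers.append(a * a + b * b)
    return powers


# One cosine cycle over 8 samples: energy concentrates in bins 1 and 7.
frame = [math.cos(2 * math.pi * t / 8) for t in range(8)]
powers = power_spectrum(frame)
```

A frame of N sampling points yields N power values, which matches the 1024-point example in the text.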
  • S102: Determine, according to the power spectrum of the frame signal, the variance of each frame signal in the speech signal segment with respect to the power values at each frequency.
  • From the power values of the frame signals $\{f'_1, f'_2, \ldots, f'_n\}$ at the respective frequencies, the variances $\{Var(f'_1), Var(f'_2), \ldots, Var(f'_n)\}$ of the power values can be calculated according to the variance formula.
  • Here $Var(f'_1)$ is the variance of the power values of $f'_1$, $Var(f'_2)$ is the variance of the power values of $f'_2$, ..., and $Var(f'_n)$ is the variance of the power values of $f'_n$.
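  • The variance formula itself is not reproduced in this text. One conventional form is shown below as an assumption: it uses the population variance, and the symbols $p_{i,k}$ (power value of frame $f'_i$ at the $k$-th frequency), $\bar{p}_i$ (mean power of frame $f'_i$), and $K$ (number of frequencies, e.g. 1024) are introduced here for illustration and do not come from the source.

```latex
\bar{p}_i = \frac{1}{K}\sum_{k=1}^{K} p_{i,k},
\qquad
\mathrm{Var}(f'_i) = \frac{1}{K}\sum_{k=1}^{K}\left(p_{i,k}-\bar{p}_i\right)^{2}
```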
  • S103: Determine, according to the variance, whether each frame signal in the speech signal segment is a noise signal.
  • The energy (i.e., the power values) of a frame signal that contains speech varies greatly across frequency bands.
  • The energy of a frame signal that does not contain speech (i.e., a noise signal) varies relatively little across frequency bands and is distributed relatively uniformly. Therefore, whether a frame signal is a noise signal can be determined according to the variance of its power values.
  • Step S103 may include:
  • S1031: Determine whether the variance of the frame signal with respect to the power values is greater than a first threshold $T_1$.
  • If the variance of a frame signal with respect to the power values exceeds the first threshold $T_1$, the energy of the frame signal varies across frequency bands by more than the threshold allows, so it can be determined that the frame signal is not a noise signal.
  • If the variance of a frame signal with respect to the power values does not exceed the first threshold $T_1$, the variation of its energy across frequency bands stays within the threshold, so it can be determined that the frame signal is a noise signal.
  • In this way, the frame signals in $\{f'_1, f'_2, \ldots, f'_n\}$ that belong to the noise signal, $\{f'_1, f'_2, \ldots, f'_m\}$, and those that do not, $\{f'_{m+1}, f'_{m+2}, \ldots, f'_n\}$, can be determined in turn. The noise signals contained in a speech signal are thereby identified, and speech denoising can be performed according to the noise signals $\{f'_1, f'_2, \ldots, f'_m\}$.
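  • The threshold test of step S1031 can be sketched as follows. The population variance from the Python standard library is assumed as the variance formula, and the function name, the threshold value, and the sample power values are hypothetical. As noted later in the text, real thresholds are obtained from statistics over test samples.

```python
from statistics import pvariance


def is_noise_frame(powers, t1):
    """A frame counts as noise when the variance of its power values
    does not exceed the first threshold T1."""
    return pvariance(powers) <= t1


flat = [1.0, 1.1, 0.9, 1.0]    # energy spread evenly: noise-like
peaky = [0.1, 9.0, 0.2, 0.1]   # energy concentrated: speech-like
labels = [is_noise_frame(p, t1=0.5) for p in (flat, peaky)]
```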
  • Step S102 may specifically include:
  • S1021: According to the frequency interval in which each frequency of the power spectrum lies, classify the power values of the frame signal at the respective frequencies into a first power value set corresponding to a first frequency interval and a second power value set corresponding to a second frequency interval.
  • Variance statistics are computed for each frame signal in the frequency domain. Non-noise signals are generally concentrated in the low and middle frequency bands, whereas noise signals are generally distributed uniformly over all frequency bands. Therefore, for the power values of each frame signal at the respective frequencies, variances are calculated separately for at least two different frequency bands (i.e., the frequency intervals above).
  • For example, the first frequency interval may be 0 to 2000 Hz (low frequency band), and the second frequency interval may be 2000 to 4000 Hz (high frequency band).
  • The 1024 power values of each frame signal are classified by frequency interval into the first power value set A, corresponding to 0 to 2000 Hz, and the second power value set B, corresponding to 2000 to 4000 Hz.
  • Taking the frame signal $f'_1$ as an example, its 1024 power values are divided by frequency interval into the power values contained in the first power value set A and those contained in the second power value set B, and likewise for the other frame signals.
  • More than two frequency bands may also be used, in which case the variance of the signal power values is computed separately for each band.
  • S1022: Determine a first variance of the power values included in the first power value set.
  • For the frame signal $f'_1$, the first variance $Var_{low}(f'_1)$ is calculated from the power values in the first power value set A according to the variance formula.
  • S1023: Determine a second variance of the power values included in the second power value set.
  • For the frame signal $f'_1$, the second variance $Var_{high}(f'_1)$ is calculated from the power values in the second power value set B according to the variance formula.
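  • The band split and the two per-band variances can be sketched as below. Several details are assumptions of this sketch and are not given in the source: a sampling rate of 8000 Hz (so that 4000 Hz is the Nyquist frequency), the mapping of bin k to frequency k * sample_rate / n, the use of only the non-redundant bins 0 to n/2, and the assignment of the boundary frequency 2000 Hz to the high band. The function name is hypothetical.

```python
from statistics import pvariance


def band_variances(powers, sample_rate=8000, split_hz=2000, max_hz=4000):
    """Return (low-band variance, high-band variance) of the power values.

    Assumes bin k corresponds to frequency k * sample_rate / n and that
    only bins 0..n/2 are used.
    """
    n = len(powers)
    low, high = [], []
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if freq < split_hz:
            low.append(powers[k])    # first power value set A
        elif freq <= max_hz:
            high.append(powers[k])   # second power value set B
    return pvariance(low), pvariance(high)


# 16 bins at 500 Hz spacing: uneven low band, flat high band.
powers = [4.0, 0.0, 4.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0] + [0.0] * 7
var_low, var_high = band_variances(powers)
```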
  • FIG. 4 is a schematic diagram of the variance curves in the embodiment of the present application. The horizontal axis represents the frame number of the frame signal and the vertical axis represents the magnitude of the variance. The first variance curve shows the trend of the first variance of the frame signals above, and the second variance curve shows the trend of their second variance.
  • Step S1031 may specifically include:
  • determining whether the first variance of the frame signal with respect to the power values is greater than the first threshold $T_1$; if not, the frame signal is determined to be a noise signal. Taking the frame signal $f'_1$ as an example, it is determined whether the first variance $Var_{low}(f'_1)$ is greater than the first threshold $T_1$.
  • Step S103 may further include:
  • determining whether the difference between the first variance and the second variance is greater than a second threshold $T_2$; if not, the frame signal is determined to be a noise signal.
  • For the frame signal $f'_1$, the difference between the first variance and the second variance is computed from $Var_{low}(f'_1)$ and $Var_{high}(f'_1)$.
  • In this way, it can be determined in turn which frame signals in the speech signal to be analyzed, $\{f'_1, f'_2, \ldots, f'_n\}$, are noise signals.
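  • A sketch of combining the first-variance test with the test on the difference between the two variances follows. The difference formula is not reproduced in this text, so the sketch assumes an absolute difference compared against the second threshold $T_2$. It also assumes the first variance is the low-band one. The function name and all numeric values are hypothetical.

```python
def is_noise_by_bands(var_low, var_high, t1, t2):
    """Combine the two tests: the first (low-band) variance must not
    exceed T1, and the gap between the two variances must not exceed T2.

    Assumption: the gap is taken as an absolute difference.
    """
    if var_low > t1:
        return False
    return abs(var_low - var_high) <= t2


results = [
    is_noise_by_bands(0.2, 0.1, t1=1.0, t2=0.5),  # both tests pass
    is_noise_by_bands(5.0, 0.1, t1=1.0, t2=0.5),  # first variance too large
    is_noise_by_bands(0.9, 0.1, t1=1.0, t2=0.5),  # gap between bands too large
]
```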
  • Between step S102 and step S103, the method further includes:
  • sorting the frame signals in the speech signal segment to be analyzed according to the magnitude of the variance;
  • Accordingly, determining, according to the variance, whether each frame signal in the speech signal segment is a noise signal comprises: determining, according to the variances of the sorted frame signals with respect to the power values at each frequency, whether each frame signal in the speech signal segment is a noise signal.
  • This embodiment determines, for the frame signals $\{f'_1, f'_2, \ldots, f'_n\}$, the variances of the power values $\{Var(f'_1), Var(f'_2), \ldots, Var(f'_n)\}$.
  • The frame signals are sorted by the variance of their power values from small to large. The smaller the variance, the more likely the frame is a noise signal, so sorting brings the frame signals that belong to the noise signal to the front.
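  • Sorting the frames by variance, from small to large, can be sketched as follows. The population variance is assumed, and the function name and sample data are hypothetical. Keeping the original frame index alongside each variance is a choice of this sketch so that the frames can still be located after sorting.

```python
from statistics import pvariance


def sort_frames_by_variance(frame_powers):
    """Return (variance, original_index) pairs in ascending variance order."""
    indexed = [(pvariance(p), i) for i, p in enumerate(frame_powers)]
    return sorted(indexed)


frame_powers = [
    [0.1, 9.0, 0.2, 0.1],  # frame 0: speech-like, largest variance
    [1.0, 1.1, 0.9, 1.0],  # frame 1: flattest spectrum
    [2.0, 1.0, 2.0, 1.0],  # frame 2: in between
]
order = [i for _, i in sort_frames_by_variance(frame_powers)]
```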
  • When the variances of the low frequency band (for example, 0 to 2000 Hz) and the high frequency band (for example, 2000 to 4000 Hz) are computed separately, the power values of each frame signal in $\{f'_1, f'_2, \ldots, f'_n\}$ at each frequency are classified, according to the frequency interval in which the corresponding frequency of the power spectrum lies, into the first power value set A corresponding to the first frequency interval (for example, 0 to 2000 Hz) and the second power value set B corresponding to the second frequency interval (for example, 2000 to 4000 Hz).
  • The first variances $\{Var_{low}(f'_1), Var_{low}(f'_2), \ldots, Var_{low}(f'_n)\}$ of the power values in the first power value sets of the frame signals $\{f'_1, f'_2, \ldots, f'_n\}$ are determined, and the second variances $\{Var_{high}(f'_1), Var_{high}(f'_2), \ldots, Var_{high}(f'_n)\}$ of the power values in their second power value sets are determined.
  • The above step S104 may determine the noise signals included in the speech signal to be analyzed (which may be a speech signal sorted by variance) as follows:
  • For each frame signal $f'_i$ in turn, it is determined whether the difference $Var_{high}(f'_{i+1}) - Var_{high}(f'_{i-1})$ between the second variance $Var_{high}(f'_{i+1})$ of the following frame signal $f'_{i+1}$ and the second variance $Var_{high}(f'_{i-1})$ of the previous frame signal $f'_{i-1}$ is greater than a third threshold $T_3$. If not, the frame signal $f'_i$ is determined to be a noise frame signal. The set of noise frame signals so determined is the noise signal.
  • Likewise, for each frame signal $f'_i$ in turn, it is determined whether the difference $Var_{low}(f'_{i+1}) - Var_{low}(f'_{i-1})$ between the first variance $Var_{low}(f'_{i+1})$ of the following frame signal $f'_{i+1}$ and the first variance $Var_{low}(f'_{i-1})$ of the previous frame signal $f'_{i-1}$ is greater than a fourth threshold $T_4$. If not, the frame signal $f'_i$ is determined to be a noise frame signal. The set of noise frame signals so determined is the noise signal.
  • The noise frames included in the speech signal to be analyzed may be identified by the above formulas (1) to (4). That is, for any frame signal $f'_i$, if it satisfies any one of formulas (1) to (4), it can be determined that the frame signal is a non-noise signal (the noise cutoff frame). In other words, if a frame signal $f'_i$ satisfies none of formulas (1) to (4), it can be determined that the frame signal is a noise signal.
  • If the noise cutoff frame is $f'_m$, then the noise frames are $\{f'_1, f'_2, \ldots, f'_{m-1}\}$.
  • The noise cutoff frame can also be determined by only some of formulas (1) to (4), for example formula (1) and formula (2), or formula (2) and formula (3). Furthermore, the formulas for determining the noise cutoff frame in the embodiment of the present application are not limited to those listed above.
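  • The neighbor-difference test on variance-sorted frames can be sketched as below for a single band and a single threshold. Formulas (1) to (4) are not reproduced in this text, so this is not a reconstruction of them; it only illustrates comparing the variances of the frames on either side of frame i. The function name, the handling of the first and last frames, the return value when no cutoff is found, and the numbers are all assumptions.

```python
def find_noise_cutoff(sorted_variances, t3):
    """Return the index of the first frame whose neighbors' variances
    differ by more than the threshold; frames before it are noise.

    Assumptions: frames are already sorted by variance, the first and
    last frames are skipped because they lack a neighbor, and the list
    length is returned when no cutoff is found.
    """
    for i in range(1, len(sorted_variances) - 1):
        jump = sorted_variances[i + 1] - sorted_variances[i - 1]
        if jump > t3:
            return i
    return len(sorted_variances)


variances = [0.01, 0.02, 0.03, 0.04, 2.5, 3.0]  # made-up sorted variances
cutoff = find_noise_cutoff(variances, t3=1.0)
noise_variances = variances[:cutoff]
```

Note that the cutoff frame itself is excluded from the noise frames, in line with the text's statement that the frames before the cutoff frame are the noise frames.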
  • The thresholds $T_1$, $T_2$, $T_3$, and $T_4$ above are all obtained from statistics over a large number of test samples.
  • FIG. 5 is a flowchart of a voice denoising method according to an embodiment of the present application. The method includes:
  • S201: Determine the speech signal segment to be analyzed included in the to-be-processed speech.
  • S202: Perform a Fourier transform on each frame signal in the speech signal segment to be analyzed, to obtain the power spectrum of each frame signal in the segment.
  • S203: Determine, according to the power spectrum of the frame signal, the variance of each frame signal in the speech signal segment with respect to the power values at each frequency.
  • S204: Determine, according to the variance, whether each frame signal in the speech signal segment is a noise signal, and obtain the plurality of noise frames included in the segment.
  • S205: Determine the power average corresponding to the plurality of noise frames included in the speech signal segment, and perform voice denoising on the to-be-processed speech according to the power average of the noise frames.
  • Once the power average of the noise frames is obtained, the speech denoising process can be performed. Since denoising methods are well known in the art, they are not described in detail herein.
  • The step of sorting the frame signals according to the variance may also be omitted, in which case each frame of the original signal is examined directly to determine which frames are noise frames.
  • Usually only some of the noise frames are used to calculate the power spectrum estimate $P_{noise}$. For example, if 50 frames were determined to be noise, the first 30 frames may be taken to calculate $P_{noise}$, which improves the accuracy of the power spectrum estimate.
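  • A sketch of computing the noise power estimate $P_{noise}$ from a limited number of noise frames follows. The source leaves the denoising step itself to known techniques, so the second function uses plain power spectral subtraction with a zero floor purely as a stand-in chosen for this example. It is not stated in the source. Function names and sample values are hypothetical.

```python
def estimate_noise_power(noise_frame_powers, max_frames=30):
    """Average the power values per frequency over the first max_frames
    noise frames to obtain the estimate P_noise."""
    used = noise_frame_powers[:max_frames]
    if not used:
        raise ValueError("at least one noise frame is required")
    n_bins = len(used[0])
    return [sum(frame[k] for frame in used) / len(used)
            for k in range(n_bins)]


def subtract_noise(frame_powers, p_noise):
    """Stand-in denoising step (assumption): power spectral subtraction,
    with negative results clamped to zero."""
    return [max(p - n, 0.0) for p, n in zip(frame_powers, p_noise)]


noise_frames = [[1.0, 2.0], [3.0, 4.0]]  # two noise frames, two bins each
p_noise = estimate_noise_power(noise_frames)
cleaned = subtract_noise([5.0, 2.5], p_noise)
```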
  • The embodiment of the present application further provides a noise signal determining device.
  • The device can be implemented by software, by hardware, or by a combination of hardware and software. Taking software implementation as an example, the device in the logical sense is formed by the CPU (Central Processing Unit) of the server reading the corresponding computer program instructions into the memory.
  • A hardware structure of the device is shown in FIG. 8.
  • FIG. 6 is a block diagram of a noise signal determining apparatus according to an embodiment of the present application.
  • the functions of the units in the device may correspond to the functions in the steps of the noise signal determining method.
  • the noise signal determining apparatus 100 includes:
  • the power spectrum acquisition unit 101 is configured to perform Fourier transform on each frame signal in the segment of the speech signal to be analyzed, to obtain a power spectrum of each frame signal in the segment of the speech signal;
  • the variance determining unit 102 is configured to determine, according to a power spectrum of the frame signal, a variance of each frame signal in the voice signal segment with respect to a power value at each frequency;
  • the noise determining unit 103 is configured to determine, according to the variance, whether each frame signal in the voice signal segment is a noise signal.
  • the device further includes a segment obtaining unit, configured to determine, according to the amplitude variation of the time-domain signal of the to-be-processed speech, a speech signal segment included in the to-be-processed speech whose amplitude variation is less than a preset threshold as the speech signal segment to be analyzed.
  • the noise determining unit 103 is configured to:
  • the frame signal is determined to be a noise signal.
  • the variance determination unit 102 is configured to:
  • the noise determining unit 103 is configured to:
  • the frame signal is determined to be a noise signal.
  • the variance determining unit 102 is specifically configured to:
  • classify the power values of the frame signal at each frequency into a first power value set corresponding to a first frequency interval and a second power value set corresponding to a second frequency interval, wherein the frequencies of the first frequency interval are lower than those of the second frequency interval;
  • the noise determining unit 103 is configured to:
  • the frame signal is determined to be a noise signal.
  • the embodiment of the present application further provides a voice denoising device.
  • The device can be implemented by software, by hardware, or by a combination of hardware and software. Taking software implementation as an example, the device in the logical sense is formed by the CPU (Central Processing Unit) of the server reading the corresponding computer program instructions into the memory.
  • A hardware structure of the device is shown in FIG. 8.
  • FIG. 7 is a block diagram of a speech denoising apparatus according to an embodiment of the present application.
  • the functions of the units in the device may correspond to the functions in the steps of the voice denoising method.
  • the voice denoising apparatus 200 includes:
  • a segment determining unit 201, configured to determine a speech signal segment to be analyzed that is included in the to-be-processed speech;
  • a power spectrum acquisition unit 202, configured to perform a Fourier transform on each frame signal in the speech signal segment to be analyzed to obtain a power spectrum of each frame signal in the speech signal segment;
  • a variance determining unit 203, configured to determine, according to the power spectrum of each frame signal, the variance of the power values at each frequency for each frame signal in the speech signal segment;
  • a noise determining unit 205, configured to determine, according to the variance, whether each frame signal in the speech signal segment is a noise signal, and to obtain the plurality of noise frames included in the speech signal segment;
  • the voice denoising unit 10 is configured to determine a power average corresponding to the plurality of noise frames included in the voice signal segment, and perform voice denoising processing of the to-be-processed voice according to the power average of the noise frame.
  • the apparatus further comprises a sorting unit 204, configured to:
  • sort the frame signals in the speech signal segment to be analyzed by the magnitude of their variance;
  • the noise determining unit 205 is specifically configured to:
  • each frame signal in the segment of the speech signal is a noise signal.
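The sorting and noise-frame selection steps, followed by the power average over the selected noise frames, might look like the sketch below. The function names and the threshold rule (frames whose variance is below `threshold` are treated as noise) are assumptions made for illustration; the text above does not state the exact selection criterion applied after sorting.

```python
import numpy as np

def select_noise_frames(power, threshold):
    """Sort frames by variance (ascending) and return indices of assumed noise frames.

    power: array of shape (n_frames, n_bins) holding each frame's power spectrum.
    """
    variances = power.var(axis=1)
    order = np.argsort(variances)  # frame indices from smallest to largest variance
    sorted_var = variances[order]
    # Because the variances are sorted, the noise frames form a prefix of `order`.
    n_noise = int(np.searchsorted(sorted_var, threshold, side="left"))
    return order[:n_noise]

def noise_power_average(power, noise_idx):
    """Average power per frequency bin over the selected noise frames."""
    if len(noise_idx) == 0:
        raise ValueError("no noise frames found in the segment")
    return power[noise_idx].mean(axis=0)
```

Sorting first means the threshold test only has to locate a single cut point in the ordered variances rather than test every frame separately.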
  • the noise signal determining method and the voice denoising method and apparatus obtain a power spectrum of each frame signal by applying a Fourier transform to the speech signal segment to be analyzed, determine the variance of the power values at each frequency for each frame signal in that segment, and finally determine from that variance whether the frame signal is a noise signal, thereby accurately obtaining the plurality of noise frames included in the speech signal segment to be analyzed. In the process of speech denoising, the to-be-processed speech can then be denoised according to the power average of the plurality of noise frames determined above, thereby improving the speech denoising effect.
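One common way to denoise with a noise power average is power spectral subtraction, sketched below. Note that the passage above only says denoising is performed "according to the power average" of the noise frames; the choice of spectral subtraction, the zero floor, and the function name `denoise_frame` are assumptions made for illustration.

```python
import numpy as np

def denoise_frame(frame, noise_avg):
    """Subtract an average noise power spectrum from one frame (assumed technique).

    frame:     1-D array of time-domain samples.
    noise_avg: average noise power per rfft bin, shape (len(frame) // 2 + 1,).
    """
    spectrum = np.fft.rfft(frame)
    power = np.abs(spectrum) ** 2
    # Remove the estimated noise power, never going below zero.
    clean_power = np.maximum(power - noise_avg, 0.0)
    # Scale each bin's magnitude while keeping the original phase.
    gain = np.sqrt(clean_power / np.maximum(power, 1e-12))
    return np.fft.irfft(gain * spectrum, n=len(frame))
```

With a zero noise estimate the frame passes through essentially unchanged, and bins whose power does not exceed the noise estimate are silenced.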
  • embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, the invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
  • the computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising an instruction device that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • these computer program instructions can also be loaded onto a computer or other programmable data processing device such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • embodiments of the present application can be provided as a method, system, or computer program product.
  • the present application can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment in combination of software and hardware.
  • the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • the application can be described in the general context of computer-executable instructions executed by a computer, such as a program module.
  • program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the present application can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices that are connected through a communication network.
  • program modules can be located in both local and remote computer storage media including storage devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Telephone Function (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Telephonic Communication Services (AREA)
  • Noise Elimination (AREA)
  • Mobile Radio Communication Systems (AREA)
PCT/CN2016/101444 2015-10-13 2016-10-08 Noise signal determining method, and voice de-noising method and apparatus Ceased WO2017063516A1 (zh)

Priority Applications (7)

Application Number Priority Date Filing Date Title
PL16854895T PL3364413T3 (pl) 2015-10-13 2016-10-08 Sposób określania sygnału szumu i przeznaczone do tego urządzenie
EP16854895.6A EP3364413B1 (en) 2015-10-13 2016-10-08 Method of determining noise signal and apparatus thereof
JP2018519388A JP6784758B2 (ja) 2015-10-13 2016-10-08 ノイズ信号判定方法及び装置並びに音声ノイズ除去方法及び装置
KR1020187013177A KR102208855B1 (ko) 2015-10-13 2016-10-08 노이즈 신호 결정 방법과 장치, 및 음성 노이즈 제거 방법과 장치
SG11201803004YA SG11201803004YA (en) 2015-10-13 2016-10-08 Noise signal determining method and apparatus and voice denoising method and apparatus
ES16854895T ES2807529T3 (es) 2015-10-13 2016-10-08 Método para la determinación de señal de ruido y aparato del mismo
US15/951,928 US10796713B2 (en) 2015-10-13 2018-04-12 Identification of noise signal for voice denoising device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510670697.8 2015-10-13
CN201510670697.8A CN106571146B (zh) 2015-10-13 2015-10-13 Noise signal determining method, and voice de-noising method and apparatus

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/951,928 Continuation US10796713B2 (en) 2015-10-13 2018-04-12 Identification of noise signal for voice denoising device

Publications (1)

Publication Number Publication Date
WO2017063516A1 (zh) 2017-04-20

Family

ID=58508605

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/101444 Ceased WO2017063516A1 (zh) 2015-10-13 2016-10-08 Noise signal determining method, and voice de-noising method and apparatus

Country Status (9)

Country Link
US (1) US10796713B2
EP (1) EP3364413B1
JP (1) JP6784758B2
KR (1) KR102208855B1
CN (1) CN106571146B
ES (1) ES2807529T3
PL (1) PL3364413T3
SG (2) SG10202005490WA
WO (1) WO2017063516A1

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108986839A (zh) * 2017-06-01 2018-12-11 瑟恩森知识产权控股有限公司 减少音频信号中的噪声
CN115249484A (zh) * 2021-04-27 2022-10-28 大众问问(北京)信息科技有限公司 语音信号处理方法、装置、计算机设备和存储介质

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102096533B1 (ko) * 2018-09-03 2020-04-02 국방과학연구소 음성 구간을 검출하는 방법 및 장치
CN110689901B (zh) * 2019-09-09 2022-06-28 苏州臻迪智能科技有限公司 语音降噪的方法、装置、电子设备及可读存储介质
JP7331588B2 (ja) * 2019-09-26 2023-08-23 ヤマハ株式会社 情報処理方法、推定モデル構築方法、情報処理装置、推定モデル構築装置およびプログラム
CN114746939B (zh) * 2019-12-13 2025-09-30 三菱电机株式会社 信息处理装置、检测方法和记录介质
KR102784793B1 (ko) 2020-08-06 2025-03-21 라인플러스 주식회사 딥러닝을 이용한 시간 및 주파수 분석 기반의 노이즈 제거 방법 및 장치
WO2022141364A1 (zh) 2020-12-31 2022-07-07 深圳市韶音科技有限公司 生成音频的方法和系统
CN112967738B (zh) * 2021-02-01 2024-06-14 腾讯音乐娱乐科技(深圳)有限公司 人声检测方法、装置及电子设备和计算机可读存储介质
US20240257823A1 (en) * 2023-01-30 2024-08-01 MIXHalo Corp. Systems and methods for remote real-time audio monitoring
CN119865647A (zh) * 2024-12-23 2025-04-22 海信视像科技股份有限公司 一种显示设备、服务器及其音频降噪和模型训练方法

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH03180900A (ja) * 1989-12-11 1991-08-06 Sanyo Electric Co Ltd 音声認識装置の雑音除去システム
EP2031583A1 (en) * 2007-08-31 2009-03-04 Harman Becker Automotive Systems GmbH Fast estimation of spectral noise power density for speech signal enhancement
JP2009216733A (ja) * 2008-03-06 2009-09-24 Nippon Telegr & Teleph Corp <Ntt> フィルタ推定装置、信号強調装置、フィルタ推定方法、信号強調方法、プログラム、記録媒体
CN101853661A (zh) * 2010-05-14 2010-10-06 中国科学院声学研究所 基于非监督学习的噪声谱估计与语音活动度检测方法
CN101968957A (zh) * 2010-10-28 2011-02-09 哈尔滨工程大学 一种噪声条件下的语音检测方法
CN102314883A (zh) * 2010-06-30 2012-01-11 比亚迪股份有限公司 一种判断音乐噪声的方法以及语音消噪方法
CN102800322A (zh) * 2011-05-27 2012-11-28 中国科学院声学研究所 一种噪声功率谱估计与语音活动性检测方法
CN103489446A (zh) * 2013-10-10 2014-01-01 福州大学 复杂环境下基于自适应能量检测的鸟鸣识别方法
CN103632677A (zh) * 2013-11-27 2014-03-12 腾讯科技(成都)有限公司 带噪语音信号处理方法、装置及服务器
CN103903629A (zh) * 2012-12-28 2014-07-02 联芯科技有限公司 基于隐马尔科夫链模型的噪声估计方法和装置

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0836400A (ja) * 1994-07-25 1996-02-06 Kokusai Electric Co Ltd 音声状態判定回路
US6529868B1 (en) * 2000-03-28 2003-03-04 Tellabs Operations, Inc. Communication system noise cancellation power signal calculation techniques
US7299173B2 (en) * 2002-01-30 2007-11-20 Motorola Inc. Method and apparatus for speech detection using time-frequency variance
CN101197130B (zh) 2006-12-07 2011-05-18 华为技术有限公司 声音活动检测方法和声音活动检测器
WO2008111462A1 (ja) * 2007-03-06 2008-09-18 Nec Corporation 雑音抑圧の方法、装置、及びプログラム
JP4327886B1 (ja) 2008-05-30 2009-09-09 株式会社東芝 音質補正装置、音質補正方法及び音質補正用プログラム
EP2546831B1 (en) * 2010-03-09 2020-01-15 Mitsubishi Electric Corporation Noise suppression device
JP4937393B2 (ja) 2010-09-17 2012-05-23 株式会社東芝 音質補正装置及び音声補正方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3364413A4 *

Also Published As

Publication number Publication date
EP3364413A1 (en) 2018-08-22
CN106571146B (zh) 2019-10-15
KR20180067608A (ko) 2018-06-20
US10796713B2 (en) 2020-10-06
JP6784758B2 (ja) 2020-11-11
SG11201803004YA (en) 2018-05-30
EP3364413A4 (en) 2019-06-26
EP3364413B1 (en) 2020-06-10
KR102208855B1 (ko) 2021-01-29
US20180293997A1 (en) 2018-10-11
PL3364413T3 (pl) 2020-10-19
JP2018534618A (ja) 2018-11-22
ES2807529T3 (es) 2021-02-23
SG10202005490WA (en) 2020-07-29
CN106571146A (zh) 2017-04-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16854895

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 11201803004Y

Country of ref document: SG

WWE Wipo information: entry into national phase

Ref document number: 2018519388

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 20187013177

Country of ref document: KR

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 2016854895

Country of ref document: EP