CN106504763A - Microphone array multi-target speech enhancement method based on blind source separation and spectral subtraction - Google Patents
Microphone array multi-target speech enhancement method based on blind source separation and spectral subtraction
- Publication number: CN106504763A (application CN201611191478.2A)
- Authority: CN (China)
- Prior art keywords: signal, frame, spectrum, speech, noise
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Abstract
The invention discloses a microphone array multi-target speech enhancement method based on blind source separation and spectral subtraction, comprising the following steps: collecting multi-channel multi-target signals with a microphone array; band-pass filtering each collected single-channel signal to suppress non-speech-band noise and interference, followed by pre-emphasis; windowing and framing the speech to obtain frame signals, transforming each frame to the frequency domain with the short-time Fourier transform, and extracting the magnitude spectrum and phase spectrum of each frame; detecting the start and end endpoints of the speech signal and estimating the noise power spectrum; reducing the background noise of the speech frames by spectral subtraction; combining the spectral-subtraction output with the phase spectrum and applying the inverse short-time Fourier transform to obtain the time-domain speech signal; and finally performing blind source separation to obtain each target signal. The method is simple to implement, demands few resources, has low computational complexity, and can enhance multiple target signals.
Description
Technical field
The invention belongs to the fields of signal processing and computer speech signal processing, and in particular relates to a speech enhancement method based on a microphone array.
Background art
The goal of speech enhancement is to extract the original speech, as clean as possible, from a noisy speech signal, suppressing background noise, improving speech quality, and increasing listener comfort so that listeners do not tire. It plays an increasingly important role in combating noise pollution, improving speech quality, and raising speech intelligibility. Speech enhancement is a problem that speech signal processing must urgently solve as it matures into practical use. In speech recognition, noise robustness is an important factor in recognition accuracy; as recognition applications expand and enter practical deployment, more effective speech enhancement techniques are urgently needed to strengthen speech features and make speech easier to recognize. A speech signal is a complex nonlinear signal; isolating the desired speech from a mixture of signals, particularly from co-channel speech interference, is a very difficult digital signal processing problem. No algorithm can remove noise completely, and all struggle to maintain high subjective and objective performance under every noise condition.
The typical workflow of a microphone-array speech enhancement method is shown in Fig. 1 and mainly includes the following steps:
1) Design a microphone array structure that meets the application's requirements.
2) Use a multi-channel speech acquisition system to collect the multi-channel speech signals.
3) Preprocess the collected multi-channel speech signals: operations such as voice activity detection, channel delay estimation, and target signal direction estimation.
4) Apply an array speech enhancement algorithm to obtain a cleaner speech signal.
In step 1), designing a suitable microphone array structure is very important. Microphone array topologies can be divided into one-dimensional linear arrays (including uniformly spaced, nested linear, and non-uniformly spaced arrays), two-dimensional planar arrays (including uniform and non-uniform circular arrays and square arrays), and three-dimensional volumetric arrays. In practice, uniform linear arrays, nested linear arrays, and uniform planar arrays are the most widely used. Research shows that the array topology has a large impact on a microphone-array speech system, and the topology design is closely tied to the choice of multi-channel signal model.
According to the distance between the source and the array, acoustic signal models can be divided into far-field and near-field models. The difference is: the far-field model uses a plane-wave model, which ignores the amplitude differences between channels; the source has a single incidence angle with respect to the array, and the delays between array elements are linear. The near-field model uses a spherical wavefront; it accounts for the amplitude differences between received signals, assigns an incidence angle to each element, and the inter-element delays follow no simple linear relation. There is no absolute criterion separating near field and far field; it is generally accepted that a source is in the far field when its distance to the array center is much larger than the signal wavelength, and in the near field otherwise.
Generally, a microphone array can be regarded as a spatial sampling device; analogous to time sampling, the array's sampling frequency must be high enough to avoid spatial angle ambiguity, i.e. spatial aliasing. For a uniform linear array, the spatial sampling rate is defined as U_s = 1/d, i.e. the spatial sampling frequency U_s is determined by the microphone spacing d. Treating the difference between adjacent spatial samples of the same signal as a phase shift, the normalized spatial frequency is defined as U = (d/λ)·sin Φ, where λ is the wavelength and Φ the incidence angle. To avoid spatial aliasing, the normalized frequency must satisfy |U| ≤ 1/2. Since the incidence angle ranges over −90° ≤ Φ ≤ 90°, the spacing between adjacent microphones should satisfy d ≤ λ_min/2.
This spatial sampling theorem relates the microphone spacing, the signal frequency, and the direction of arrival (incidence angle Φ). If it is not satisfied, spatial aliasing occurs.
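The spacing bound above is simple to evaluate. The following is a minimal Python sketch (not from the patent) of the rule d ≤ λ_min/2 = c/(2·f_max) for a uniform linear array; the speed-of-sound constant and the function name are assumptions for illustration.

```python
# Anti-aliasing spacing bound for a uniform linear array.
# Over incidence angles -90..90 degrees, spatial aliasing is avoided when
# d <= lambda_min / 2 = c / (2 * f_max).
C_SOUND = 340.0  # nominal speed of sound in air, m/s (assumed)

def max_mic_spacing(f_max_hz, c=C_SOUND):
    """Largest element spacing in metres that avoids spatial aliasing
    for signal content up to f_max_hz."""
    return c / (2.0 * f_max_hz)

if __name__ == "__main__":
    # Speech band-limited to 3400 Hz (the pre-filter upper cutoff used later)
    print(round(max_mic_spacing(3400.0), 4))  # 0.05 m, i.e. 5 cm
```

For full-band 16 kHz audio the bound drops to about 2.1 cm, which is why wideband arrays often use nested spacings.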
For a uniform linear microphone array, define r_m as the straight-line distance from the source to the m-th microphone. The discrete signal output by the m-th microphone can then be expressed as x_m[n] = s[n − Δn_m] + η_m[n], where s[n] is the source signal, Δn_m is the delay in samples between the signal received at the m-th microphone and the source signal, and η_m[n] is the noise received at the m-th microphone. Let Δτ_m be the corresponding delay in seconds between the m-th microphone's signal and the source signal; then Δn_m = f_s·Δτ_m = f_s·r_m/c, where f_s is the time sampling frequency and c is the propagation speed of sound in space. The array output can thus be written as the signal set:
x_1[n] = s[n − Δn_1] + η_1[n]
x_2[n] = s[n − Δn_2] + η_2[n]
...
x_N[n] = s[n − Δn_N] + η_N[n]
where N is the number of microphones in the array.
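Under stated simplifications (delays rounded to whole samples, white sensor noise), the model x_m[n] = s[n − Δn_m] + η_m[n] with Δn_m = f_s·r_m/c can be simulated directly. The helper below is an illustrative sketch, not part of the patent.

```python
import numpy as np

def simulate_array(s, r, fs=16000, c=340.0, noise_std=0.0, seed=0):
    """Simulate x_m[n] = s[n - dn_m] + eta_m[n] for microphone distances
    r (metres), with dn_m = round(fs * r_m / c)."""
    rng = np.random.default_rng(seed)
    delays = np.round(fs * np.asarray(r) / c).astype(int)
    X = np.zeros((len(r), len(s)))
    for m, dn in enumerate(delays):
        X[m, dn:] = s[: len(s) - dn]                  # delayed copy of the source
        X[m] += noise_std * rng.standard_normal(len(s))  # additive sensor noise
    return X, delays
```

With f_s = 16000 Hz and c = 340 m/s, a path-length difference of 17 cm corresponds to exactly 8 samples of delay.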
In step 3), individual preprocessing operations may be added or omitted depending on the enhancement method used.
In preprocessing, pre-emphasis and pre-filtering are determined by the characteristics of the speech signal. Pre-filtering serves two purposes: 1. suppressing all components of the input signal whose frequency exceeds f_s/2, to prevent aliasing; 2. suppressing 50 Hz mains hum. The pre-filter must therefore be a band-pass filter; with upper and lower cutoff frequencies f_H and f_L, typical values are f_H = 3400 Hz, f_L = 60–100 Hz, and sampling frequency f_s = 16000 Hz.
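A band-pass pre-filter with roughly these cutoffs can be sketched as a windowed-sinc FIR. This is a textbook design under assumed parameters (tap count, Hamming window), not the patent's filter.

```python
import numpy as np

def bandpass_fir(fl=60.0, fh=3400.0, fs=16000.0, numtaps=301):
    """Windowed-sinc band-pass FIR with cutoffs fl..fh: rejects DC and mains
    hum below fl, and components near fs/2, as the pre-filter requires."""
    n = np.arange(numtaps) - (numtaps - 1) / 2

    def lowpass(fc):
        # ideal low-pass impulse response, Hamming-windowed, unit DC gain
        h = np.sinc(2 * fc / fs * n) * np.hamming(numtaps)
        return h / h.sum()

    return lowpass(fh) - lowpass(fl)   # band-pass = LP(fh) - LP(fl)
```

By construction the DC gain is exactly zero, and the stopband above f_H is set by the Hamming window's roughly −53 dB side lobes; sharper rejection of 50 Hz hum so close to f_L would need a longer filter.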
Because the average power spectrum of speech is shaped by glottal excitation and lip/nose radiation, it falls off at about 6 dB/octave above roughly 800 Hz, so higher-frequency components are weaker and the high-frequency part of the spectrum is harder to estimate than the low-frequency part. Pre-emphasis is therefore applied during preprocessing. Its purpose is to boost the high-frequency part so that the spectrum becomes flat and can be estimated with the same signal-to-noise ratio over the whole band from low to high frequency, which benefits spectral analysis and vocal-tract parameter analysis. Pre-emphasis can be realized with a first-order digital filter that boosts the high-frequency characteristics; from its operating principle, the pre-emphasized signal is s′(n) = s(n) − α·s(n+1). To recover the original signal, the pre-emphasized signal must be de-emphasized: s″(n) = s′(n) + β·s′(n+1), where s(n) is the source signal, s′(n) the pre-emphasized signal, and s″(n) the de-emphasized signal; α and β are weighting factors, typically taken as 0.8–0.95.
Speech is a non-stationary, time-varying signal, produced by the motion of the articulatory organs; since the articulators change state far more slowly than the acoustic vibration, speech can be regarded as short-time stationary. Research shows that within 5–50 ms the spectral characteristics and certain physical parameters of speech remain essentially constant. Methods and theory for stationary processes can therefore be brought into short-time speech processing by dividing the signal into many short speech segments, each called an analysis frame. Processing each frame is then equivalent to processing a signal with fixed characteristics. Frames may be contiguous or overlapping; the frame length is generally 10–30 ms. The overlap between the previous and the next frame is called the frame shift, and the ratio of frame shift to frame length is typically 0–1/2. Each extracted speech frame is windowed, i.e. multiplied by a window function w(n), forming the windowed speech. The main role of windowing is to reduce the spectral leakage introduced by framing: framing truncates the speech abruptly, which corresponds to a periodic convolution of the signal spectrum with the spectrum of a rectangular window. Because the side lobes of the rectangular window's spectrum are high, the signal spectrum acquires a "tail", i.e. spectral leakage. A Hamming window can be used instead: its side lobes are lower, effectively suppressing leakage, and its smoother low-pass characteristic yields a smoother spectrum.
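The framing-plus-Hamming-window step can be sketched as below, using a 25 ms frame with 50% overlap at 16 kHz (typical values chosen from the ranges in the text; the function name is illustrative).

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=200):
    """Split x into overlapping frames (25 ms frames, 12.5 ms hop at 16 kHz)
    and apply a Hamming window to each frame, as described in the text."""
    n_frames = 1 + (len(x) - frame_len) // hop
    w = np.hamming(frame_len)
    return np.stack([x[i * hop : i * hop + frame_len] * w
                     for i in range(n_frames)])
```

Each row of the result is one windowed analysis frame, ready for the per-frame short-time Fourier transform used in the later steps.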
Estimating the delays between array elements plays a very important role in microphone-array speech enhancement: together with the signal frequency, the delays determine the beam directivity and support estimation of the source direction. The accuracy of the delay estimate directly affects the performance of the speech processing system. Because the array samples the sound field spatially, each microphone's signal has a certain delay relative to a reference microphone. Keeping the expected speech received by all microphones synchronized, so that the beamformer's output maximum points at the target source, is the key means of solving this problem. Typical delay estimation methods include generalized cross-correlation, adaptive-filtering-based estimation, adaptive eigen-decomposition, and higher-order statistics methods; among these, generalized cross-correlation is the most widely used. Assume the signals received by a pair of microphones are modeled as x_1(t) = s(t) + η_1 and x_2(t) = s(t − D) + η_2, where s(t) is the source signal, x_1(t) and x_2(t) are the two received signals, D is the propagation delay between the two microphones, and η_1, η_2 are additive background noise. Assume s(t), η_1, η_2 are mutually uncorrelated, and ignore amplitude decay. The generalized cross-correlation between x_1(t) and x_2(t) is then
R_12(τ) = ∫ ψ_12(ω) X_1(ω) X_2*(ω) e^{jωτ} dω
where X_1(ω) and X_2(ω) are the Fourier transforms of x_1(t) and x_2(t), and ψ_12(ω) is the generalized cross-correlation weighting function. Choosing an appropriate weighting function for the situation sharpens the peak of R_12(τ); the location of the peak is the delay between the two microphones.
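One common concrete choice of the weighting is the PHAT weighting ψ_12(ω) = 1/|X_1(ω)X_2*(ω)|, giving GCC-PHAT. The sketch below is an illustration of that choice, not the patent's estimator; it returns the integer-sample delay D such that x_2[n] ≈ x_1[n − D].

```python
import numpy as np

def gcc_phat(x1, x2, max_lag=50):
    """GCC with PHAT weighting: whiten the cross-spectrum so only phase
    remains, then locate the correlation peak within +/- max_lag samples."""
    n = len(x1) + len(x2)
    G = np.fft.rfft(x2, n) * np.conj(np.fft.rfft(x1, n))
    G /= np.abs(G) + 1e-12                       # PHAT: keep phase only
    r = np.fft.irfft(G, n)                       # circular cross-correlation
    lags = np.arange(-max_lag, max_lag + 1)
    return lags[int(np.argmax(r[lags % n]))]     # circular index per lag
```

Because the magnitude is discarded, the peak sharpness depends only on the phase coherence, which is what makes PHAT robust for broadband speech in mild reverberation.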
Voice activity detection, also known as speech detection or speech endpoint detection, accurately determines the start and end of the input speech and is essential for good speech-processing performance: speech and noise are processed differently, and if the current frame cannot be classified as a noisy-speech frame or a noise frame, it cannot be handled appropriately. In a speech enhancement system, to gather more characteristics of the background noise, endpoint detection is especially concerned with accurately detecting the speech-free segments. Both the learning of speech knowledge and the accumulation of noise-source statistics rely on accurate endpoint detection. Voice activity detection commonly operates on speech frames of 10–30 ms. The method can be summarized as: extract one or a series of contrast feature parameters from the input signal and compare them against one or a series of thresholds; exceeding the threshold indicates a speech segment, otherwise a speech-free segment.
Speech detection generally takes two steps:
Step 1: Based on features of the speech signal, such as energy, zero-crossing rate, entropy, and pitch, together with their derived parameters, classify the speech/non-speech segments in the signal stream.
Step 2: Once speech is detected in the signal stream, decide whether the point is a start or an end of speech. In a speech system, the variable background and the natural dialogue pattern make pauses (non-speech) inside a sentence likely, especially the silent gap before a plosive initial, so the start/end decision is particularly important.
Current speech endpoint detection methods fall broadly into two classes:
The first class detects speech endpoints under noise using HMM-based models; it requires the background noise to be stationary and the signal-to-noise ratio to be fairly high.
The second class is based on the short-time energy of the signal: statistics of the background noise energy are used to construct an energy threshold, which then determines the speech starting point.
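The second class can be sketched as a short-time energy detector. The threshold rule below (a fixed multiple of the energy averaged over the first few frames, assumed noise-only) is an illustrative simplification of such statistics.

```python
import numpy as np

def energy_vad(x, frame_len=400, hop=200, noise_frames=5, factor=3.0):
    """Energy-threshold VAD: estimate the noise energy from the first few
    frames (assumed speech-free) and flag frames whose short-time energy
    exceeds factor * that estimate as speech."""
    n_frames = 1 + (len(x) - frame_len) // hop
    e = np.array([np.sum(x[i * hop : i * hop + frame_len] ** 2)
                  for i in range(n_frames)])
    thresh = factor * e[:noise_frames].mean()
    return e > thresh
```

Practical detectors add hangover smoothing and secondary features (zero-crossing rate, entropy) to survive low-energy consonants; this sketch shows only the core thresholding.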
In step 4), a cleaner speech signal is obtained with a speech enhancement algorithm.
Speech enhancement techniques can be broadly divided into single-channel methods and multi-channel (microphone-array) methods. Single-channel methods are numerous, mostly combining various noise cancellation schemes with speech signal features into targeted algorithms; the most theoretically mature and the simplest effective one is spectral subtraction (SS: Spectral Subtraction). A single pickup sensor, however, is constrained by venue, distance, and application scenario, so its pickup quality is compromised and subsequent enhancement becomes difficult.
The basic principle of spectral subtraction is: subtract the frequency-domain noise power spectrum from the power spectrum of the noisy speech to obtain an estimate of the speech power spectrum; its square root gives the speech magnitude estimate, and after restoring the phase the time-domain signal is recovered with the inverse Fourier transform. Considering that the human ear is insensitive to phase, the phase used during phase restoration is the phase information of the noisy speech. Because speech is short-time stationary, it is treated as a stationary random signal in the short-time magnitude estimation.
Let s(n), η(n), and x(n) denote the speech, the noise, and the noisy speech, with short-time spectra S(ω), Γ(ω), and X(ω) respectively. Assume s(n) and η(n) are uncorrelated and the noise is additive, giving the additive signal model x(n) = s(n) + η(n). Writing the windowed signals as x_w(n), s_w(n), η_w(n), we have x_w(n) = s_w(n) + η_w(n); taking the Fourier transform gives X_w(ω) = S_w(ω) + Γ_w(ω), and hence for the power spectrum:
|X_w(ω)|² = |S_w(ω)|² + |Γ_w(ω)|² + S_w(ω)Γ_w*(ω) + S_w*(ω)Γ_w(ω)
Only |X_w(ω)|² can be estimated from the observed data; the remaining terms must be approximated by their statistical averages. Since s(n) and η(n) are independent, the statistical average of the cross-power terms is 0, so the estimate of the original speech is:
|Ŝ_w(ω)|² = |X_w(ω)|² − E[|Γ_w(ω)|²]
The estimate |Ŝ_w(ω)|² is not guaranteed to be non-negative, because the noise estimate has errors: when the estimated average noise power exceeds the noisy-speech power of some frame, that frame's estimate becomes negative. These negative values can be flipped in sign to positive, or simply set to zero. Restoring the phase of Ŝ_w(ω) and applying the inverse short-time Fourier transform (IFFT) then yields the time-domain estimate of the speech signal.
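The whole per-frame procedure — power subtraction, zero-clipping of negative estimates, noisy-phase reuse, and overlap-add resynthesis — can be sketched as below. The frame size, window, and noise-averaging scheme are assumptions for illustration, not the exact flow of Fig. 4.

```python
import numpy as np

def spectral_subtract(x, noise, frame_len=512, hop=256):
    """Basic power spectral subtraction: noise PSD averaged over a
    noise-only segment, negative estimates set to zero, noisy phase reused,
    windowed overlap-add resynthesis."""
    w = np.hamming(frame_len)
    n_frames = 1 + (len(noise) - frame_len) // hop
    N = np.mean([np.abs(np.fft.rfft(noise[i * hop : i * hop + frame_len] * w)) ** 2
                 for i in range(n_frames)], axis=0)   # average noise power spectrum
    out = np.zeros(len(x))
    norm = np.zeros(len(x))
    for i in range(1 + (len(x) - frame_len) // hop):
        seg = x[i * hop : i * hop + frame_len] * w
        X = np.fft.rfft(seg)
        P = np.maximum(np.abs(X) ** 2 - N, 0.0)       # subtract, zero-clip negatives
        S = np.sqrt(P) * np.exp(1j * np.angle(X))     # reuse the noisy phase
        out[i * hop : i * hop + frame_len] += np.fft.irfft(S, frame_len) * w
        norm[i * hop : i * hop + frame_len] += w ** 2
    return out / np.maximum(norm, 1e-8)
```

Zero-clipping is the simplest fix for negative power estimates; it trades residual "musical noise" for guaranteed non-negativity, as discussed above.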
Current microphone-array speech enhancement mainly uses beamforming, subspace decomposition, blind source separation, and similar techniques. Blind source separation (BSS) refers to recovering the source signals from the observed signals alone, when the source signals and the mixing process are unknown or unavailable. Because it does not depend on prior knowledge of the scene, blind source separation can enhance speech with fewer microphones; the core problem the algorithm solves is separating each speaker's voice when several voices interfere and alias, thereby enhancing each target voice.
Independent component analysis (ICA) is one of the effective approaches to blind signal separation and belongs to linear instantaneous-mixture signal processing. It does not rely on detailed knowledge of the source signal types or accurate identification of the transmission channel characteristics, making it an effective redundancy-removal technique. Different cost functions lead to different ICA algorithms, such as information maximization (Infomax), FastICA, maximum entropy (ME) and minimum mutual information (MMI) algorithms, and maximum likelihood (ML) algorithms. The basic principle is: regard the observed signals as target signals mixed through a linear transformation; to recover the target signals, one needs to find an inverse linear transformation that decomposes the acquired signals, thereby achieving source separation.
In the noise-free case, let X = [x_1(t) x_2(t) … x_N(t)]′ denote a group of observed signals received by the microphone array, where t is the time or sample index and N is the number of microphones. Assume X is a linear mixture of independent components S = [s_1(t) s_2(t) … s_N(t)]′, where A is an unknown non-singular matrix; the vector expression of the signal model is then X = AS.
In the presence of noise, assume the noise is additive; the signal model becomes X = AS + Γ, where Γ = [η_1 η_2 … η_N]′ is the noise vector. Rewriting X = AS + Γ as X = A(S + Γ_0) with Γ = AΓ_0, i.e. Γ_0 = A⁻¹Γ, shows that the noisy model is still the basic ICA model, only with the independent components changed from S to S + Γ_0. Under the basic ICA model, let W be the separation matrix to be found and Y the matrix of separated signals; then Y = WX = WAS. The ultimate aim of ICA is to find an optimal, or at least good, separation matrix W such that the signals in the separated matrix Y are mutually independent and approximate the source signals as closely as possible.
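One standard way to find W is FastICA: whiten the mixtures, then run a symmetric fixed-point iteration with the tanh nonlinearity, re-orthogonalizing W each step. The sketch below is generic textbook ICA, not the patent's separation step.

```python
import numpy as np

def fastica(X, n_iter=200, seed=0):
    """Symmetric FastICA: whiten X, then iterate
    W <- E[g(WZ)Z'] - diag(E[g'(WZ)]) W  with g = tanh,
    followed by symmetric decorrelation W <- (W W')^{-1/2} W."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(X))       # eigendecomposition of covariance
    Z = (E / np.sqrt(d)).T @ X             # whitened mixtures: cov(Z) = I
    n = X.shape[0]
    W = rng.standard_normal((n, n))
    for _ in range(n_iter):
        Y = W @ Z
        g, gp = np.tanh(Y), 1.0 - np.tanh(Y) ** 2
        W = (g @ Z.T) / Z.shape[1] - np.diag(gp.mean(axis=1)) @ W
        u, _, vt = np.linalg.svd(W)        # (W W')^{-1/2} W  ==  U V'
        W = u @ vt
    return W @ Z                           # separated signals
```

As with all ICA, the recovered components come back with arbitrary order and sign, so evaluation should match each source to its best-correlated output.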
At present, speech enhancement based on microphone arrays is all carried out for a single target, which limits the effective pickup capability of array pickup devices; traditional single-target enhancement cannot meet the demands of practical applications.
Summary of the invention
To solve the current technical problem of multi-target enhancement based on array speech signals, the present invention proposes a microphone array multi-target speech enhancement method based on blind source separation and spectral subtraction.
The microphone array multi-target speech enhancement method based on blind source separation and spectral subtraction of the present invention includes the following steps:
Step 1: Collect noisy speech signals with a two-dimensional microphone array, obtaining the collected signal of each channel, the number of microphones in the array being greater than or equal to 4.
Step 2: Apply steps 201–205 to the collected signal of each channel:
Step 201: Band-pass filter the collected signal to suppress non-speech-band noise and interference; then apply pre-emphasis to the band-pass-filtered signal, and frame and window it to obtain frame signals.
Step 202: Transform every frame signal to the frequency domain, i.e. apply the short-time Fourier transform to each frame, and compute each frame's power spectrum; at the same time compute and retain each frame's phase spectrum, for phase restoration during spectral subtraction.
Step 203: Perform speech detection on every frame signal, deciding whether the current frame is a speech frame or a noise frame, and estimate the noise power spectrum from the noise frames.
Step 204: Remove the noise power spectrum from the power spectrum of each speech frame by spectral subtraction, obtaining each frame's speech power spectrum estimate.
Step 205: Take the square root of the speech power spectrum estimate, restore the phase using the corresponding frame's phase spectrum, and then apply the inverse short-time Fourier transform, obtaining the time-domain estimate of the speech frame.
Step 2 preprocesses each channel's collected signal, dividing it into many short (noisy) speech segments, i.e. frame signals, and then applies spectral subtraction to each frame to reduce the background noise of the speech frames.
Step 3: Apply blind source separation to the time-domain speech-frame estimates of all channels to separate the sources, obtaining the target signal of each source.
Step 4: De-emphasize, un-window, and re-assemble the frames of the target signals of each source, obtaining the target speech signal of each source.
In summary, thanks to the above technical solution, the beneficial effects of the invention are: (1) it solves the technical problems of traditional single-channel speech enhancement in handling environmental background noise, while the algorithm stays simple and the resource requirements modest; (2) it no longer relies on array signal processing algorithms for spatial filtering and does not need a wideband beamforming algorithm, reducing the structural complexity of the algorithm; (3) it uses a blind source separation algorithm to enhance the target signals, no longer enhancing a single target one at a time.
Description of the drawings
Fig. 1 is a schematic diagram of a conventional speech enhancement system.
Fig. 2 is a schematic diagram of the system realized by the specific embodiment of the invention.
Fig. 3 is the flow chart of speech detection.
Fig. 4 is the flow chart of the spectral-subtraction single-channel speech enhancement method.
Specific embodiment
For making the object, technical solutions and advantages of the present invention clearer, with reference to embodiment and accompanying drawing, to this
Bright it is described in further detail.
Referring to Fig. 2, the multi-target speech enhancement method of the present invention first performs signal preprocessing on each single-channel signal (speech signal) collected by a two-dimensional microphone array: each single-channel speech signal is divided into many short-time speech segments, yielding frame signals for the subsequent voice activity detection and spectral-subtraction processing. The signal preprocessing includes band-pass filtering, pre-emphasis, overlapped framing, and Hamming windowing.
Voice activity detection and spectral subtraction are then performed on the frame signals of each channel; blind source separation is next applied across all channels of the same speech frame, obtaining the target signals of the different sources. Finally, the inverse operations of the signal preprocessing are applied: the target signal of each source is de-emphasized, the Hamming window is removed, and the frames are recombined, yielding each target speech signal and thereby realizing the enhancement of multiple target speech signals.
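The preprocessing chain just described (pre-emphasis, overlapped framing, Hamming windowing) can be sketched as follows. This is a minimal illustration, not the patented implementation: the band-pass filtering stage is omitted, and the frame length, hop size and pre-emphasis coefficient are assumed values the passage does not specify.

```python
import numpy as np

def preprocess(signal, frame_len=256, hop=128, alpha=0.97):
    """Pre-emphasis, overlapped framing and Hamming windowing of one channel."""
    # Pre-emphasis: x'[n] = x[n] - alpha * x[n-1], boosting high frequencies
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Overlapped frames (50% overlap when hop = frame_len // 2)
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    # Hamming window applied to every frame
    return frames * np.hamming(frame_len)
```

The inverse operations of step 4 (de-emphasis, window removal, overlap-add recombination) would undo these stages in reverse order.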
For the sound-source and noise-field characteristics of an indoor environment, a scattered (diffuse) noise-field model and a near-field source model are used to model the multi-channel noisy speech signals of the actual environment. The speech signals in the space are collected by an 8 × 8 planar array composed of 64 microphones.
Let X = [x1(t) x2(t) … xj(t) … xN(t)]′ denote the noisy speech signals output by the channels, where j is the microphone channel index.
After signal preprocessing of the noisy speech signal output by each channel, the resulting array signal (frame signal) is Xpw = [x1pw(n) x2pw(n) … xjpw(n) … xNpw(n)]′, where n = 1, 2, …, L, L is the frame length, and w is the frame number.
A short-time Fourier transform is applied to the frame signal Xpw, giving the amplitude spectrum |Xpw(ω)| and the phase spectrum Φpw(ω), where the frequency variable ω is stepped over N uniformly spaced sample points of the angular frequency from 0 to 2π. Thus:
|Xpw| = [|X1pw(ω)| |X2pw(ω)| … |Xjpw(ω)| … |XNpw(ω)|]′
Using |Xpw| = [|X1pw(ω)| |X2pw(ω)| … |Xjpw(ω)| … |XNpw(ω)|]′, the detection of the speech start endpoint and end endpoint is carried out according to the flow chart shown in Fig. 3, i.e. each current frame is judged to be a noise frame or a speech frame, and the decision is used for spectral-subtraction de-noising. The detection of the speech start endpoint (start frame) and end endpoint (end frame) proceeds as follows:
Using the formula M_w = Σ_ω |X_w(ω)|², the speech energy of each frame is calculated, where N is the frame length, w is the frame number, 1 ≤ w ≤ L, L is the number of frames, and ω runs over the frequency points of the frame;
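The per-frame energy M_w = Σ_ω |X_w(ω)|² can be computed from the short-time Fourier transform of the windowed frames. A minimal sketch (the function name and the frames-by-rows array layout are assumptions for illustration):

```python
import numpy as np

def frame_energies(frames):
    """Speech energy of each frame: M_w = sum over omega of |X_w(omega)|^2."""
    spectra = np.fft.fft(frames, axis=1)   # short-time Fourier transform per frame
    mags = np.abs(spectra)                 # amplitude spectrum |X_w(omega)|
    phases = np.angle(spectra)             # phase spectrum Phi_w(omega), kept for later recovery
    energies = np.sum(mags ** 2, axis=1)   # M_w for every frame w
    return energies, mags, phases
```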
The threshold T is initialized: by statistics of the background-noise energy, the initial value of T is set.
Each frame is then classified against the threshold T, i.e. the current frame is judged to be a noise frame or a speech frame, while T is updated from the most recent k noise frames:
a. The speech energy M_w of the current frame is calculated. If M_w is greater than T, the current frame is judged to be a speech frame; otherwise it is judged to be a noise frame.
b. If the current frame is a noise frame, the threshold T is updated from the most recent k noise frames (k is an empirical value, usually greater than or equal to 10):
b1: the average speech energy E_MN, the maximum energy E_MAX and the minimum energy E_MIN of the most recent k noise frames are calculated;
b2: the updated threshold T is obtained from the formula T = min[a × (E_MAX − E_MIN) + E_MN, b × E_MN] (0 < a < 1, 1 < b < 10).
c. If the current frame is a speech frame, it is checked whether all frames have been processed; if so, the endpoint detection is finished; otherwise, steps a to c are repeated for the next frame.
Further, the speech-frame and noise-frame decisions can also be verified with the short-time zero-crossing rate to prevent misjudgment.
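The adaptive-threshold detection of steps a to c can be sketched as follows. The initialization from the first few frames and the values of a, b and k are illustrative assumptions (the passage only constrains 0 < a < 1, 1 < b < 10 and k ≥ 10), and the short-time zero-crossing-rate check is omitted:

```python
import numpy as np
from collections import deque

def detect_speech(energies, k=10, a=0.5, b=2.0, init_noise_frames=5):
    """Classify each frame as speech (True) or noise (False) using a
    threshold T updated from the most recent k noise frames."""
    noise_hist = deque(maxlen=k)
    # Initialise T from the first few frames, assumed to be background noise
    noise_hist.extend(energies[:init_noise_frames])
    T = b * np.mean(noise_hist)
    flags = []
    for M_w in energies:
        is_speech = M_w > T                      # step a: compare energy with threshold
        flags.append(bool(is_speech))
        if not is_speech:                        # step b: noise frame updates T
            noise_hist.append(M_w)
            E_MN = np.mean(noise_hist)           # b1: mean noise energy
            E_MAX, E_MIN = max(noise_hist), min(noise_hist)
            T = min(a * (E_MAX - E_MIN) + E_MN, b * E_MN)  # b2: updated threshold
    return np.array(flags)
```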
Referring to Fig. 4, the noise power spectrum can be estimated from all of the detected noise frames; the estimated noise is then removed from each speech frame by spectral subtraction, i.e. the currently estimated noise power spectrum is subtracted from the power spectrum of the speech frame, giving the speech power-spectrum estimate of the frame. The square root of the speech power-spectrum estimate is taken, the phase is restored from the phase spectrum of the speech frame, and an inverse short-time Fourier transform is applied, giving the time-domain estimated signal of the speech frame, i.e. the enhanced single-channel speech signal.
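This per-frame spectral-subtraction step can be sketched as follows, assuming the noise power spectrum is estimated as the average over the detected noise frames; the half-wave rectification (flooring negative power at zero) is a common simple choice that this passage does not prescribe:

```python
import numpy as np

def spectral_subtract(frames, speech_flags):
    """Spectral subtraction with phase recovery.

    frames: windowed time-domain frames, shape (n_frames, frame_len)
    speech_flags: boolean numpy array, True where a frame is speech
    """
    spectra = np.fft.fft(frames, axis=1)
    power = np.abs(spectra) ** 2
    phase = np.angle(spectra)
    # Noise power spectrum estimated as the mean over detected noise frames
    noise_power = power[~speech_flags].mean(axis=0)
    # Subtract the noise power spectrum; floor at zero to avoid negative power
    clean_power = np.maximum(power - noise_power, 0.0)
    # Square root back to magnitude, restore the phase, inverse FFT to time domain
    clean_spec = np.sqrt(clean_power) * np.exp(1j * phase)
    return np.real(np.fft.ifft(clean_spec, axis=1))
```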
After the above is completed, blind source separation is performed using natural-gradient ICA. The concrete processing procedure is:
(1) if the mean of the current observation signal X (the sequence of enhanced single-channel speech signals) is not zero, the mean is first subtracted from the observation signal X;
(2) a matrix B is selected such that the covariance matrix E{VV^T} is the identity matrix I, where V = BX; the components of the vector V are uncorrelated and have unit variance;
(3) whitening based on singular value decomposition: first the covariance R_x = E{XX^T} of X is estimated; R_x is a real symmetric (Hermitian) matrix. The singular value decomposition R_x = UΣU^T is then computed, where the columns of U = [u1, u2, …, un] are the left singular vectors of R_x. The purpose of the whitening is to weaken the correlation of the mixed speech signals;
(4) since σ1 ≥ σ2 ≥ … ≥ σm > 0 and σ_{m+1} = … = σ_n = 0 (m ≤ n), the estimated number of source signals is m;
(5) finally the orthogonal transformation is carried out: with U_m = [u1, u2, …, um], the whitened signal is V = BX, where B = Σ_m^{−1/2} U_m^T.
According to the formula Y = WX = WAS = PS, the recovered signal Y is obtained, where P = WA. P may be called the performance matrix or the convergence matrix, W denotes the separation matrix in independent component analysis (ICA), A denotes the mixing matrix in ICA, and S denotes the source signals.
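A minimal sketch of the whitening and natural-gradient separation described in (1) to (5). The learning rate, iteration count and tanh nonlinearity are illustrative choices not taken from this document, and the recovered sources are only determined up to permutation and scaling, as is usual for ICA:

```python
import numpy as np

def natural_gradient_ica(X, n_iter=300, lr=0.1):
    """SVD-based whitening followed by natural-gradient ICA.

    X: observed mixtures, shape (n_channels, n_samples).
    Returns Y = W V, an estimate of the sources up to permutation and scaling.
    """
    # (1) remove the mean of each observed channel
    X = X - X.mean(axis=1, keepdims=True)
    # (3) whitening: R_x = E{X X^T} = U Sigma U^T
    Rx = X @ X.T / X.shape[1]
    U, sigma, _ = np.linalg.svd(Rx)
    # (4) estimated number of sources = number of non-negligible singular values
    m = int(np.sum(sigma > 1e-10 * sigma[0]))
    # (5) B = Sigma_m^{-1/2} U_m^T, so that V = B X satisfies E{V V^T} = I
    B = np.diag(sigma[:m] ** -0.5) @ U[:, :m].T
    V = B @ X
    # Natural-gradient update: W += lr * (I - E{g(Y) Y^T}) W
    W = np.eye(m)
    n = V.shape[1]
    for _ in range(n_iter):
        Y = W @ V
        g = np.tanh(Y)                     # nonlinearity suited to super-Gaussian speech
        W += lr * (np.eye(m) - g @ Y.T / n) @ W
    return W @ V
```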
The above is only a specific embodiment of the present invention. Any feature disclosed in this specification may, unless specifically stated otherwise, be replaced by other equivalent features or alternative features serving a similar purpose; and all of the disclosed features, or all of the steps of the disclosed methods or processes, may be combined in any manner, except for mutually exclusive features and/or steps.
Claims (3)
1. A microphone-array multi-target speech enhancement method based on blind source separation and spectral subtraction, characterized by comprising the following steps:
Step 1: Noisy speech signals are collected by a microphone array arranged as a two-dimensional array, obtaining the collected signal of each channel of the microphone array, wherein the number of microphone array elements is greater than or equal to 4;
Step 2: Steps 201 to 205 are executed on the collected signal of each channel:
Step 201: Band-pass filtering is applied to the collected signal to reject non-speech noise and interference; the band-pass-filtered signal is then pre-emphasized, framed and windowed, obtaining frame signals;
Step 202: A short-time Fourier transform is applied to the frame signal of each frame, and the power spectrum and phase spectrum of each frame are calculated;
Step 203: Speech detection is performed on the frame signal of each frame to judge whether the current frame is a speech frame or a noise frame, and the noise power spectrum is estimated from the noise frames;
Step 204: The noise power spectrum is removed from the power spectrum of each speech frame by spectral subtraction, obtaining the speech power-spectrum estimate of each frame;
Step 205: The square root of the speech power-spectrum estimate is taken, the phase is restored from the phase spectrum of the corresponding frame, and an inverse short-time Fourier transform is applied, obtaining the time-domain estimated signal of the speech frame;
Step 3: Source separation is performed on the time-domain estimated signals of the speech frames of all channels using a blind source separation method, obtaining the target signals of the various sources;
Step 4: The target signal of each source is de-emphasized, de-windowed and frame-recombined, obtaining the target speech signal of each source.
2. the method for claim 1, it is characterised in that the microphone array is classified as 8*8 rectangle plane arrays, each battle array
Unit is uniformly distributed on the whole, can carry out Subarray partition, and different submatrixs can work independently.
3. The method of claim 1 or 2, characterized in that in step 3, the source separation is performed using adaptive natural-gradient blind source separation.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2015109672348 | 2015-12-22 | ||
CN201510967234 | 2015-12-22 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106504763A true CN106504763A (en) | 2017-03-15 |
Family
ID=58333455
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611191478.2A Pending CN106504763A (en) | 2015-12-22 | 2016-12-21 | Based on blind source separating and the microphone array multiple target sound enhancement method of spectrum-subtraction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106504763A (en) |
- 2016-12-21 CN CN201611191478.2A patent/CN106504763A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102750956A (en) * | 2012-06-18 | 2012-10-24 | 歌尔声学股份有限公司 | Method and device for removing reverberation of single channel voice |
CN202749088U (en) * | 2012-08-08 | 2013-02-20 | 滨州学院 | Voice reinforcing system using blind source separation algorithm |
US20150078571A1 (en) * | 2013-09-17 | 2015-03-19 | Lukasz Kurylo | Adaptive phase difference based noise reduction for automatic speech recognition (asr) |
CN103854660A (en) * | 2014-02-24 | 2014-06-11 | 中国电子科技集团公司第二十八研究所 | Four-microphone voice enhancement method based on independent component analysis |
CN104935546A (en) * | 2015-06-18 | 2015-09-23 | 河海大学 | MIMO-OFDM (Multiple Input Multiple Output-Orthogonal Frequency Division Multiplexing) signal blind separation method for increasing natural gradient algorithm convergence speed |
Non-Patent Citations (4)
Title |
---|
李蕴华: "基于盲源分离的单通道语音信号增强", 《计算机仿真》 * |
杨震等: "基于SB卡的语音识别实时仿真系统", 《南京邮电学院学报》 * |
职振华: "语音盲分离算法的研究", 《中国硕士学位论文全文数据库(电子期刊)信息科技辑》 * |
陈为国: "实时语音信号处理系统理论和应用", 《中国博士学位论文全文数据库(电子期刊)信息科技辑》 * |
Cited By (56)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107102296B (en) * | 2017-04-27 | 2020-04-14 | 大连理工大学 | Sound source positioning system based on distributed microphone array |
CN107102296A (en) * | 2017-04-27 | 2017-08-29 | 大连理工大学 | A kind of sonic location system based on distributed microphone array |
CN107293305A (en) * | 2017-06-21 | 2017-10-24 | 惠州Tcl移动通信有限公司 | It is a kind of to improve the method and its device of recording quality based on blind source separation algorithm |
CN107785029B (en) * | 2017-10-23 | 2021-01-29 | 科大讯飞股份有限公司 | Target voice detection method and device |
CN107785029A (en) * | 2017-10-23 | 2018-03-09 | 科大讯飞股份有限公司 | Target voice detection method and device |
US11308974B2 (en) | 2017-10-23 | 2022-04-19 | Iflytek Co., Ltd. | Target voice detection method and apparatus |
US11869481B2 (en) | 2017-11-30 | 2024-01-09 | Alibaba Group Holding Limited | Speech signal recognition method and device |
CN109859749A (en) * | 2017-11-30 | 2019-06-07 | 阿里巴巴集团控股有限公司 | A kind of voice signal recognition methods and device |
US11482237B2 (en) | 2017-12-01 | 2022-10-25 | Tencent Technology (Shenzhen) Company Limited | Method and terminal for reconstructing speech signal, and computer storage medium |
CN109887494A (en) * | 2017-12-01 | 2019-06-14 | 腾讯科技(深圳)有限公司 | The method and apparatus of reconstructed speech signal |
WO2019105238A1 (en) * | 2017-12-01 | 2019-06-06 | 腾讯科技(深圳)有限公司 | Method and terminal for speech signal reconstruction and computer storage medium |
CN108831500A (en) * | 2018-05-29 | 2018-11-16 | 平安科技(深圳)有限公司 | Sound enhancement method, device, computer equipment and storage medium |
CN108899052A (en) * | 2018-07-10 | 2018-11-27 | 南京邮电大学 | A kind of Parkinson's sound enhancement method based on mostly with spectrum-subtraction |
CN108899052B (en) * | 2018-07-10 | 2020-12-01 | 南京邮电大学 | Parkinson speech enhancement method based on multi-band spectral subtraction |
CN109671439B (en) * | 2018-12-19 | 2024-01-19 | 成都大学 | Intelligent fruit forest bird pest control equipment and bird positioning method thereof |
CN109671439A (en) * | 2018-12-19 | 2019-04-23 | 成都大学 | A kind of intelligence fruit-bearing forest bird pest prevention and treatment equipment and its birds localization method |
CN109884591A (en) * | 2019-02-25 | 2019-06-14 | 南京理工大学 | A kind of multi-rotor unmanned aerial vehicle acoustical signal Enhancement Method based on microphone array |
US11049509B2 (en) | 2019-03-06 | 2021-06-29 | Plantronics, Inc. | Voice signal enhancement for head-worn audio devices |
US11664042B2 (en) | 2019-03-06 | 2023-05-30 | Plantronics, Inc. | Voice signal enhancement for head-worn audio devices |
CN110111806A (en) * | 2019-03-26 | 2019-08-09 | 广东工业大学 | A kind of blind separating method of moving source signal aliasing |
CN110060704A (en) * | 2019-03-26 | 2019-07-26 | 天津大学 | A kind of sound enhancement method of improved multiple target criterion study |
CN110491410A (en) * | 2019-04-12 | 2019-11-22 | 腾讯科技(深圳)有限公司 | Speech separating method, audio recognition method and relevant device |
CN110223708A (en) * | 2019-05-07 | 2019-09-10 | 平安科技(深圳)有限公司 | Sound enhancement method and relevant device based on speech processes |
CN110223708B (en) * | 2019-05-07 | 2023-05-30 | 平安科技(深圳)有限公司 | Speech enhancement method based on speech processing and related equipment |
CN111986692A (en) * | 2019-05-24 | 2020-11-24 | 腾讯科技(深圳)有限公司 | Sound source tracking and pickup method and device based on microphone array |
CN112289335A (en) * | 2019-07-24 | 2021-01-29 | 阿里巴巴集团控股有限公司 | Voice signal processing method and device and pickup equipment |
CN110459236B (en) * | 2019-08-15 | 2021-11-30 | 北京小米移动软件有限公司 | Noise estimation method, apparatus and storage medium for audio signal |
CN110459236A (en) * | 2019-08-15 | 2019-11-15 | 北京小米移动软件有限公司 | Noise estimation method, device and the storage medium of audio signal |
CN110459234A (en) * | 2019-08-15 | 2019-11-15 | 苏州思必驰信息科技有限公司 | For vehicle-mounted audio recognition method and system |
CN110459234B (en) * | 2019-08-15 | 2022-03-22 | 思必驰科技股份有限公司 | Vehicle-mounted voice recognition method and system |
CN111128217A (en) * | 2019-12-31 | 2020-05-08 | 杭州爱莱达科技有限公司 | Distributed multi-channel voice coherent laser radar interception method and device |
CN111239680B (en) * | 2020-01-19 | 2022-09-16 | 西北工业大学太仓长三角研究院 | Direction-of-arrival estimation method based on differential array |
CN111239680A (en) * | 2020-01-19 | 2020-06-05 | 西北工业大学太仓长三角研究院 | Direction-of-arrival estimation method based on differential array |
CN113314137B (en) * | 2020-02-27 | 2022-07-26 | 东北大学秦皇岛分校 | Mixed signal separation method based on dynamic evolution particle swarm shielding EMD |
CN113314137A (en) * | 2020-02-27 | 2021-08-27 | 东北大学秦皇岛分校 | Mixed signal separation method based on dynamic evolution particle swarm shielding EMD |
CN111402917A (en) * | 2020-03-13 | 2020-07-10 | 北京松果电子有限公司 | Audio signal processing method and device and storage medium |
CN111627456A (en) * | 2020-05-13 | 2020-09-04 | 广州国音智能科技有限公司 | Noise elimination method, device, equipment and readable storage medium |
CN113763982A (en) * | 2020-06-05 | 2021-12-07 | 阿里巴巴集团控股有限公司 | Audio processing method and device, electronic equipment and readable storage medium |
CN112309414A (en) * | 2020-07-21 | 2021-02-02 | 东莞市逸音电子科技有限公司 | Active noise reduction method based on audio coding and decoding, earphone and electronic equipment |
CN112309414B (en) * | 2020-07-21 | 2024-01-12 | 东莞市逸音电子科技有限公司 | Active noise reduction method based on audio encoding and decoding, earphone and electronic equipment |
CN112151036A (en) * | 2020-09-16 | 2020-12-29 | 科大讯飞(苏州)科技有限公司 | Anti-sound-crosstalk method, device and equipment based on multi-pickup scene |
CN112151036B (en) * | 2020-09-16 | 2021-07-30 | 科大讯飞(苏州)科技有限公司 | Anti-sound-crosstalk method, device and equipment based on multi-pickup scene |
CN112735464A (en) * | 2020-12-21 | 2021-04-30 | 招商局重庆交通科研设计院有限公司 | Tunnel emergency broadcast sound effect information detection method |
CN113030862A (en) * | 2021-03-12 | 2021-06-25 | 中国科学院声学研究所 | Multi-channel speech enhancement method and device |
WO2022198820A1 (en) * | 2021-03-22 | 2022-09-29 | 北京搜狗科技发展有限公司 | Speech processing method and apparatus, and apparatus for speech processing |
CN113077808A (en) * | 2021-03-22 | 2021-07-06 | 北京搜狗科技发展有限公司 | Voice processing method and device for voice processing |
CN113077808B (en) * | 2021-03-22 | 2024-04-26 | 北京搜狗科技发展有限公司 | Voice processing method and device for voice processing |
CN113160845A (en) * | 2021-03-29 | 2021-07-23 | 南京理工大学 | Speech enhancement algorithm based on speech existence probability and auditory masking effect |
CN113329288A (en) * | 2021-04-29 | 2021-08-31 | 开放智能技术(南京)有限公司 | Bluetooth headset noise reduction method based on notch technology |
CN113053406A (en) * | 2021-05-08 | 2021-06-29 | 北京小米移动软件有限公司 | Sound signal identification method and device |
CN113314135A (en) * | 2021-05-25 | 2021-08-27 | 北京小米移动软件有限公司 | Sound signal identification method and device |
CN113314135B (en) * | 2021-05-25 | 2024-04-26 | 北京小米移动软件有限公司 | Voice signal identification method and device |
CN114639398B (en) * | 2022-03-10 | 2023-05-26 | 电子科技大学 | Broadband DOA estimation method based on microphone array |
CN114639398A (en) * | 2022-03-10 | 2022-06-17 | 电子科技大学 | Broadband DOA estimation method based on microphone array |
CN117238278A (en) * | 2023-11-14 | 2023-12-15 | 三一智造(深圳)有限公司 | Speech recognition error correction method and system based on artificial intelligence |
CN117238278B (en) * | 2023-11-14 | 2024-02-09 | 三一智造(深圳)有限公司 | Speech recognition error correction method and system based on artificial intelligence |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106504763A (en) | Based on blind source separating and the microphone array multiple target sound enhancement method of spectrum-subtraction | |
CN106251877B (en) | Voice Sounnd source direction estimation method and device | |
Hoshen et al. | Speech acoustic modeling from raw multichannel waveforms | |
Erdogan et al. | Improved mvdr beamforming using single-channel mask prediction networks. | |
WO2020108614A1 (en) | Audio recognition method, and target audio positioning method, apparatus and device | |
EP3360250B1 (en) | A sound signal processing apparatus and method for enhancing a sound signal | |
CN109830245A (en) | A kind of more speaker's speech separating methods and system based on beam forming | |
WO2015196729A1 (en) | Microphone array speech enhancement method and device | |
CN106057210B (en) | Quick speech blind source separation method based on frequency point selection under binaural distance | |
Niwa et al. | Post-filter design for speech enhancement in various noisy environments | |
JP6225245B2 (en) | Signal processing apparatus, method and program | |
KR20210137146A (en) | Speech augmentation using clustering of queues | |
Velasco et al. | Novel GCC-PHAT model in diffuse sound field for microphone array pairwise distance based calibration | |
Martín-Doñas et al. | Dual-channel DNN-based speech enhancement for smartphones | |
CN112394324A (en) | Microphone array-based remote sound source positioning method and system | |
EP3847645B1 (en) | Determining a room response of a desired source in a reverberant environment | |
Bohlender et al. | Neural networks using full-band and subband spatial features for mask based source separation | |
Paikrao et al. | Consumer Personalized Gesture Recognition in UAV Based Industry 5.0 Applications | |
Yegnanarayana et al. | Determining mixing parameters from multispeaker data using speech-specific information | |
Bavkar et al. | PCA based single channel speech enhancement method for highly noisy environment | |
Zhu et al. | Modified complementary joint sparse representations: a novel post-filtering to MVDR beamforming | |
Firoozabadi et al. | Combination of nested microphone array and subband processing for multiple simultaneous speaker localization | |
Pfeifenberger et al. | Blind source extraction based on a direction-dependent a-priori SNR. | |
Taghia et al. | Dual-channel noise reduction based on a mixture of circular-symmetric complex Gaussians on unit hypersphere | |
Hu et al. | Robust binaural sound localisation with temporal attention |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170315 ||