CN108597532A - Hidden method for acoustic based on MVDR - Google Patents
- Publication number
- CN108597532A (application CN201810221809.5A)
- Authority
- CN
- China
- Prior art keywords
- mvdr
- signal
- array
- sound
- sound source
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0224—Processing in the time domain
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2430/00—Signal processing covered by H04R, not provided for in its groups
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Circuit For Audible Band Transducer (AREA)
- Soundproofing, Sound Blocking, And Sound Damping (AREA)
Abstract
The invention discloses an MVDR-based hidden sound method. The method first uses the short-time Fourier transform (STFT) to decompose the time-domain signals received by the sensor array into frequency-domain sub-band signals. An MVDR-based hidden sound filter is then constructed on each sub-band so that the sound signal from a source within the hidden sound distance passes through the filter without attenuation, and the estimated signal is finally obtained through the inverse STFT. The invention achieves a good hidden sound effect.
Description
Technical Field
The invention relates to hidden sound technology for microphone arrays, and in particular to a hidden sound method based on the minimum variance distortionless response (MVDR) beamformer.
Background
Research on hidden sound technology has a long history. In the field of sound signal sensing, two approaches have mainly been explored: bone conduction microphones and ultrasonic microphones. In the field of acoustic signal processing, signal separation techniques and differential microphone arrays have been explored.
A bone conduction microphone picks up the slight vibrations of the bones of the head and neck caused by speech and converts them into electrical signals. Because it does not pick up sound through air conduction as a conventional microphone does, it can transmit speech clearly even in very noisy environments. Bone conduction technology has had practical applications for hundreds of years, but the voice quality of early bone conduction microphones was poor, especially at high frequencies, so they could only be used to assist conventional microphones, for example for voice endpoint detection to improve single-channel speech noise reduction. Only in the last twenty years have bone conduction microphones attracted real attention, and their performance has improved greatly. Several types of communication headsets based on bone conduction microphones are now on the market. In May 2013, Beijing Meiersion scientific and technological development limited company developed a bone conduction microphone, the first such product developed independently in China, which has since been put on the market. Recently, other acoustic sensors with a principle similar to bone conduction have also attracted attention, such as the DAIKIN-D Talk Mic headset, which uses a highly sensitive microphone to pick up the speaker's vocal cord vibrations and convert them into an electrical signal. This way of picking up sound closely resembles that of bone conduction microphones.
Although bone conduction microphone technology has made breakthroughs, communication headsets based on it have not yet been widely adopted. The main problems are: 1) poor dynamic performance; 2) high cost; 3) poor packaging; 4) poor sound quality.
More than a decade ago, a group of scientists at Bell Laboratories in the United States designed an ultrasonic microphone. It consists of a small ultrasonic transmitter and a large conventional broadband microphone. In operation, the ultrasonic transmitter emits a periodic sequence of wideband pulses with frequencies between 20 kHz and 70 kHz. After the pulses are reflected by the speaker's vocal tract, the microphone receives the reflected signal, and a back-end digital signal processing stage uses the transmitted and reflected signals to estimate the shape parameters of the vocal tract and thereby synthesize the speech uttered by the speaker. Because the microphone works in the ultrasonic band, it is not disturbed by audio signals in the range of auditory perception and can be used for voice communication in cocktail-party-like environments. The Bell Laboratories scientists built a working prototype system and fed the received and synthesized vowels to a speech recognition system, which achieved a 95% recognition rate. Preliminary listening experiments also confirmed that the quality of the synthesized vowels essentially matched the voice quality of conventional microphones. Many problems nevertheless remain to be solved before the microphone becomes truly practical; the biggest technical problem is that the quality and intelligibility of the synthesized speech are low for phonemes whose vocal tract characteristics are not distinctive, such as nasals.
Hidden sound technology can be viewed theoretically as a sub-problem of signal separation or enhancement. In a complex acoustic environment, the signal that a microphone picks up from a given sound source is almost without exception contaminated. According to how the contamination arises, noise in speech signal processing is classified into four types: ambient noise, echo, reverberation, and interference. To separate the sound source from the noise, a specific method is used for each type:
Ambient noise (Noise): ambient noise is unavoidable and ubiquitous, and it can seriously degrade the quality and intelligibility of a speech signal and the human ear's perception of spatial information. Ambient noise is usually relatively stationary, that is, the statistics of the noise at the current time can be replaced by the statistics of the noise at earlier times. From the statistics of the noisy signal and of the noise, a filter can be designed to filter the observed signal so that the speech is enhanced and the background noise is suppressed. Noise reduction can use a single-channel or a multi-channel pickup system, corresponding to single-channel and multi-channel noise reduction techniques, respectively. In contrast to single-channel noise reduction, which distorts the speech while suppressing noise, multi-channel noise reduction reduces speech distortion while achieving the same output signal-to-noise ratio.
Echo (Echo): acoustic echo is generated by acoustic coupling between the microphone and the loudspeaker, and it can severely impair multi-party full-duplex interaction. The distinguishing feature of echo is that its source signal is known. As long as the acoustic propagation channel from the loudspeaker to the microphone can be estimated, the echo component of the signal picked up by the microphone can be estimated, and the echo can be removed by subtracting this estimate from the picked-up signal. This technique is known as echo cancellation.
Reverberation (Reverberation): reverberation is caused by reflections from the boundaries of a room (multipath effects). Reflections are classified into early and late reflections. Early reflections (typically within 40 ms) usually carry useful information; for example, the size of the room can be estimated by analyzing their structure. Early reflections can also enhance the harmonic components of music and improve the listening experience. Late reflections, however, cause spectral distortion, which reduces speech quality and intelligibility and blurs the location information of the sound source. In a voice communication system, late reflections seriously degrade the quality of voice communication, so dereverberation is required. One dereverberation technique first estimates the channel blindly and then applies equalization. Another is superdirective array beamforming, which extracts the sound source signal from the desired direction while suppressing signals from other directions. Because reverberation arrives from all directions, a superdirective array can suppress it to some extent.
Interference (Interference) from other sound sources: an interfering signal is point-source noise arriving from a particular direction in space. In voice communication there are often many people and other sound sources nearby, so multiple sound sources are inevitable at each end of the link, and their signals interfere with one another. The typical technique for interference suppression is beamforming, which forms a spatial filter whose maximum response is steered toward the desired sound source; the degree of interference suppression depends on the magnitude of the array response in the direction of the interference. Speech noise reduction, source separation, and beamforming all aim to separate the desired signal from other interfering signals, so these techniques can be used for hidden sound processing. However, the separation performance achievable with current techniques is limited and cannot meet the requirements of hidden sound applications.
The aforementioned signal separation techniques require microphone arrays. Microphone array research has been under way for more than forty years, during which many array designs and processing methods have been developed. According to how they respond to the sound field, these arrays fall into two broad categories: additive microphone arrays (AMA) and differential microphone arrays (DMA). An additive array is typically large; each microphone measures the sound pressure of the sound field, and the beamformer as a whole also responds to the sound pressure field. Most of the existing literature concerns additive arrays and methods for processing them. By comparison, a differential array responds to the spatial differential of the sound pressure field. It has a small array size and a beam pattern that is consistent across frequency, and it can maximize the array directivity for a given number of array elements.
Disclosure of Invention
The invention aims to provide an MVDR-based hidden sound method with a good hidden sound effect.
To solve the above technical problem, the invention adopts the following technical scheme: the short-time Fourier transform (STFT) is used to decompose the time-domain signals received by the sensor array into sub-band signals, and an MVDR-based hidden sound filter is constructed on each sub-band so that the sound signal from a source within the hidden sound distance passes through the filter without attenuation; the estimated signal is finally obtained through the inverse STFT.
Further, the method comprises the following steps:
S1: Construct the steering vector $\mathbf{d}_s(\omega, r_s, \theta_s)$ from parameters such as the array geometry, the number of array elements, and the position of the sound source.
S2: Divide the signal $y_m(k) = x_m(k) + v_m(k)$, $m = 1, 2, \ldots, M$, received by each sensor of the microphone array into short frames with a certain overlap ratio; the frame length can range from a few milliseconds to tens of milliseconds. Then apply the short-time Fourier transform to each frame of each of the $M$ channels to obtain $Y_m(\omega, i)$, where $i$ denotes the $i$-th frame, and construct
$$\mathbf{y}(\omega, i) = [Y_1(\omega, i)\; Y_2(\omega, i)\; \cdots\; Y_M(\omega, i)]^T.$$
S3: Decompose the time-domain signals received by the sensor array into sub-band signals using the short-time Fourier transform.
S4: Construct the MVDR-based hidden sound filter $\mathbf{h}(\omega)$ on the sub-band with frequency $\omega$.
S5: On each sub-band, process the signal of the $i$-th frame with the hidden sound filter $\mathbf{h}(\omega)$: $Z(\omega, i) = \mathbf{h}^H(\omega)\, \mathbf{y}(\omega, i)$.
S6: Transform $Z(\omega, i)$ back to the time domain using the inverse short-time Fourier transform and the overlap-add method, obtaining the beamformed time-domain signal $z(k)$.
Further, the specific steps of decomposing the time-domain signals received by the sensor array into sub-band signals using the short-time Fourier transform are as follows:
Assume that the spacing between two adjacent microphones is $\delta$. Because hidden sound technology is used to pick up a sound source at close range, an ideal near-field sound source and interference are assumed to act on the microphone array in the acoustic environment. Let the distances from the sound source to the microphones be $r_{s,1}, r_{s,2}, \ldots, r_{s,M}$. Taking the center of the array as the reference point, let the distance from the sound source to the reference point be $r_s$ and the incident angle be $\theta_s$. The distance $r_{s,m}$ from the sound source to the $m$-th microphone is then determined by $r_s$, $\theta_s$, and the position of the $m$-th microphone relative to the reference point.
Assume that at discrete time $k$ the signal emitted by the sound source is $x(k)$. If absorption loss during propagation is neglected, the signal picked up by the $m$-th microphone differs from the source signal only by a phase delay and by an amplitude attenuation inversely proportional to the distance, so the observation can be expressed as:
$$y_m(k) = x_m(k) + v_m(k), \quad m = 1, 2, \ldots, M, \tag{1}$$
where $x_m(k)$ is the sound source signal picked up by the $m$-th microphone, that is, $x(k)$ delayed by $\tau_m$ and attenuated in inverse proportion to $r_{s,m}$; $v_m(k)$ is the noise signal picked up by the $m$-th microphone; and $\tau_m$ is the time delay of the $m$-th microphone relative to the sound source.
Because the wavefront of an ideal near-field sound source is spherical, $\tau_m$ is determined by the distance $r_{s,m}$ and the speed of sound in air, $c$.
In the frequency domain, equation (1) becomes:
$$Y_m(\omega) = X_m(\omega) + V_m(\omega),$$
where $\omega = 2\pi f$ is the angular frequency, $f$ is the frequency, $\omega / c$ is the wave number, $j = \sqrt{-1}$ is the imaginary unit, and $Y_m(\omega)$, $X_m(\omega)$, and $V_m(\omega)$ are the Fourier transforms of $y_m(k)$, $x_m(k)$, and $v_m(k)$, respectively.
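The relations described in this signal model can be written out explicitly. The following is a sketch of one consistent form, not the patent's own typeset equations. It assumes a uniform linear array lying along the axis from which $\theta_s$ is measured, a microphone offset $\delta_m$ from the array center (a symbol introduced here), unit attenuation at unit distance, and a delay understood in units of the sampling period in the discrete-time line.

```latex
\begin{align}
\delta_m &= \left(m - \frac{M+1}{2}\right)\delta, \qquad m = 1, 2, \ldots, M, \\
r_{s,m} &= \sqrt{r_s^2 + \delta_m^2 - 2\, r_s\, \delta_m \cos\theta_s}, \\
\tau_m &= \frac{r_{s,m}}{c}, \\
x_m(k) &= \frac{1}{r_{s,m}}\, x(k - \tau_m), \\
X_m(\omega) &= \frac{1}{r_{s,m}}\, e^{-j \omega r_{s,m} / c}\, X(\omega).
\end{align}
```

The second line is the law of cosines applied to the triangle formed by the source, the array center, and the $m$-th microphone.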
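Further, from parameters such as the array geometry, the number of array elements, and the position of the sound source, a steering vector $\mathbf{d}(\omega, r, \theta)$ of length $M$ is constructed; its $m$-th element describes the attenuation and phase delay from a point at distance $r$ and angle $\theta$ to the $m$-th microphone. Setting $r = r_s$ and $\theta = \theta_s$ gives the steering vector of the sound source, $\mathbf{d}_s(\omega, r_s, \theta_s)$. The signals picked up by the $M$ microphones can then be written as the vector:
$$\mathbf{y}(\omega) = [Y_1(\omega)\; Y_2(\omega)\; \cdots\; Y_M(\omega)]^T = \mathbf{d}_s(\omega, r_s, \theta_s)\, X(\omega) + \mathbf{v}(\omega),$$
where the superscript $T$ denotes the vector transpose. After $\mathbf{y}(\omega)$ is processed by the array, the output signal is:
$$Z(\omega) = \mathbf{h}^H(\omega)\, \mathbf{y}(\omega),$$
where $Z(\omega)$ is an estimate of the sound source signal $X(\omega)$, the superscript $H$ denotes the conjugate transpose, and $\mathbf{h}(\omega) = [H_1(\omega)\; H_2(\omega)\; \cdots\; H_M(\omega)]^T$ is the weighting vector applied by the microphone array to the input signal, that is, the filter coefficients of the microphone array.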
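To make the steering vector and the array output concrete, here is a minimal NumPy sketch. The function names, the uniform linear array geometry, the 1 cm spacing, and the simple matched weights are all assumptions made for illustration; the weights are not the MVDR solution, they only satisfy the distortionless condition $\mathbf{h}^H \mathbf{d}_s = 1$.

```python
import numpy as np

def near_field_steering(omega, r, theta, M=8, delta=0.01, c=343.0):
    """Near-field steering vector for a uniform linear array (assumed geometry)."""
    offsets = (np.arange(M) - (M - 1) / 2) * delta      # mic positions about the array center
    dist = np.sqrt(r**2 + offsets**2 - 2 * r * offsets * np.cos(theta))
    return np.exp(-1j * omega * dist / c) / dist        # 1/r attenuation and phase delay

def beamformer_output(h, y):
    """Z(omega) = h^H(omega) y(omega)."""
    return np.vdot(h, y)    # np.vdot conjugates its first argument

omega = 2 * np.pi * 1300.0
d_s = near_field_steering(omega, r=0.05, theta=0.0)
X = 1.0 + 0.5j                      # one frequency-domain source sample
y = d_s * X                         # noise-free observation y = d_s X
h = d_s / np.vdot(d_s, d_s)         # simple weights satisfying h^H d_s = 1
Z = beamformer_output(h, y)         # recovers X in the noise-free case
```

With noise present, the same call returns $X(\omega)$ plus the residual noise $\mathbf{h}^H(\omega)\mathbf{v}(\omega)$, which is the term the MVDR design minimizes.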
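Further, in S4, on the sub-band with frequency $\omega$, the MVDR-based hidden sound filter is constructed as follows. The MVDR hidden sound filter is obtained by requiring that the close-range sound source signal not be attenuated while minimizing the variance of the residual noise at the array output. Mathematically:
$$\mathbf{h}_{\mathrm{MVDR}}(\omega) = \arg\min_{\mathbf{h}(\omega)} \mathbf{h}^H(\omega)\, \mathbf{R}_{vv}(\omega)\, \mathbf{h}(\omega) \quad \text{subject to} \quad \mathbf{h}^H(\omega)\, \mathbf{d}_s(\omega, r_s, \theta_s) = 1, \tag{2}$$
where $\mathbf{R}_{vv}(\omega)$ is the correlation matrix of the noise vector $\mathbf{v}(\omega)$. Solving equation (2) with the Lagrange multiplier method gives:
$$\mathbf{h}_{\mathrm{MVDR}}(\omega) = \frac{\mathbf{R}_{vv}^{-1}(\omega)\, \mathbf{d}_s(\omega, r_s, \theta_s)}{\mathbf{d}_s^H(\omega, r_s, \theta_s)\, \mathbf{R}_{vv}^{-1}(\omega)\, \mathbf{d}_s(\omega, r_s, \theta_s)}.$$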
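As a brief sketch of the Lagrange multiplier step, the constrained problem in equation (2) can be worked as follows. The symbols $\mathcal{L}$ and $\lambda$ are introduced here for illustration, the frequency argument is dropped, and $\mathbf{R}_{vv}$ is assumed to be Hermitian and invertible.

```latex
\begin{align}
\mathcal{L}(\mathbf{h}, \lambda) &= \mathbf{h}^H \mathbf{R}_{vv} \mathbf{h}
  + \lambda \left(\mathbf{h}^H \mathbf{d}_s - 1\right)
  + \lambda^{*} \left(\mathbf{d}_s^H \mathbf{h} - 1\right), \\
\frac{\partial \mathcal{L}}{\partial \mathbf{h}^{*}} = \mathbf{R}_{vv} \mathbf{h} + \lambda \mathbf{d}_s = \mathbf{0}
  &\;\Rightarrow\; \mathbf{h} = -\lambda\, \mathbf{R}_{vv}^{-1} \mathbf{d}_s, \\
\mathbf{d}_s^H \mathbf{h} = 1
  &\;\Rightarrow\; -\lambda = \frac{1}{\mathbf{d}_s^H \mathbf{R}_{vv}^{-1} \mathbf{d}_s}, \\
\mathbf{h}_{\mathrm{MVDR}} &= \frac{\mathbf{R}_{vv}^{-1} \mathbf{d}_s}{\mathbf{d}_s^H \mathbf{R}_{vv}^{-1} \mathbf{d}_s}.
\end{align}
```

The denominator is real and positive when $\mathbf{R}_{vv}$ is positive definite, so the filter is a scaled version of $\mathbf{R}_{vv}^{-1} \mathbf{d}_s$ normalized to unit response toward the source.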
Further, assuming the noise is isotropic, the $(m, n)$-th element of its normalized correlation matrix $\boldsymbol{\Gamma}_{dn}(\omega)$ is a function of $\omega \tau_0$, where $\tau_0$ is the delay associated with the sensor spacing. There are two extreme cases: 1) if $\omega \tau_0$ is very large, that is, at high frequencies or large spacings, the noise signals received by the two sensors are nearly uncorrelated and the isotropic noise approaches spatially white noise; 2) if $\omega \tau_0$ is very small, that is, at low frequencies or small spacings, the noise signals received by the two sensors are nearly coherent and the isotropic noise approaches point-source noise.
The MVDR-based hidden sound filter involves a matrix inversion. When the number of microphone elements is large, the matrix becomes ill-conditioned and the filter extremely unstable. To avoid this instability, the conventional remedy of diagonal loading is adopted: a diagonal matrix is added to the matrix before inversion. The diagonally loaded matrix can be written as $[\boldsymbol{\Gamma}_{dn}(\omega) + \varepsilon \mathbf{I}]$, where $\mathbf{I}$ is the $M \times M$ identity matrix and $\varepsilon$ is the loading coefficient. The coefficients of the MVDR-based hidden sound filter then become:
$$\mathbf{h}_{\mathrm{MVDR}}(\omega) = \frac{[\boldsymbol{\Gamma}_{dn}(\omega) + \varepsilon \mathbf{I}]^{-1}\, \mathbf{d}_s(\omega, r_s, \theta_s)}{\mathbf{d}_s^H(\omega, r_s, \theta_s)\, [\boldsymbol{\Gamma}_{dn}(\omega) + \varepsilon \mathbf{I}]^{-1}\, \mathbf{d}_s(\omega, r_s, \theta_s)}.$$
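The diagonally loaded design can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the patent's implementation: the coherence is taken to be the spherically isotropic form $\sin(\omega (n-m) \tau_0) / (\omega (n-m) \tau_0)$ with $\tau_0 = \delta / c$, and the function names, array size, spacing, loading value, and end-fire source position are all hypothetical.

```python
import numpy as np

def isotropic_coherence(omega, M, delta, c=343.0):
    """Gamma[m, n] = sin(omega*(n-m)*tau0) / (omega*(n-m)*tau0), tau0 = delta/c (assumed form)."""
    tau0 = delta / c
    idx = np.arange(M)
    lag = (idx[None, :] - idx[:, None]) * tau0
    return np.sinc(omega * lag / np.pi)     # np.sinc(x) = sin(pi x)/(pi x), equals 1 at x = 0

def mvdr_weights(gamma, d, eps=1e-3):
    """h = (Gamma + eps I)^-1 d / (d^H (Gamma + eps I)^-1 d)."""
    loaded = gamma + eps * np.eye(gamma.shape[0])
    num = np.linalg.solve(loaded, d)        # solve instead of forming the inverse explicitly
    return num / np.vdot(d, num)

# Hypothetical 8-element array, 1 cm spacing, source at 5 cm in the end-fire direction.
M, delta, c = 8, 0.01, 343.0
omega = 2 * np.pi * 1300.0
offsets = (np.arange(M) - (M - 1) / 2) * delta
dist = np.abs(0.05 - offsets)               # end-fire geometry: source on the array axis
d_s = np.exp(-1j * omega * dist / c) / dist
gamma = isotropic_coherence(omega, M, delta, c)
h = mvdr_weights(gamma, d_s)
distortionless = np.vdot(h, d_s)            # h^H d_s, close to 1
```

Larger values of `eps` make the inversion better conditioned at the cost of pulling the filter toward a simple matched design, so the loading coefficient trades stability against noise suppression.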
The beneficial effects of the invention are as follows: the invention targets a small microphone array platform. Its core is to decompose the time-domain signals received by the sensor array into sub-band signals using the short-time Fourier transform and to construct a suitable hidden sound filter on each sub-band, so that the sound signal from a source within the hidden sound distance passes through the filter without attenuation.
Drawings
To illustrate the technical solutions in the embodiments of the invention more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a diagram of the hidden sound model.
Fig. 2 is a schematic diagram of a system for processing speech with a small microphone array.
Fig. 3 is a diagram of the pickup and processing of near-field acoustic signals by a uniform linear array.
Fig. 4 is a plot of the hidden sound attenuation of the 8-element MVDR-based hidden sound method.
Fig. 5 shows the beam patterns of the 8-element MVDR-based hidden sound method at 1300 Hz and 3300 Hz (blue, red, purple, and black indicate r = 5 cm, 10 cm, 30 cm, and 50 cm, respectively).
Fig. 6 is a signal-to-interference ratio plot of the 8-element MVDR-based hidden sound method for a sound source in the end-fire direction at 0cm and interference in the end-fire direction.
Detailed Description
The technical solution of the invention is described clearly and completely in the following detailed description.
The MVDR-based hidden sound method uses the short-time Fourier transform to decompose the time-domain signals received by the sensor array into sub-band signals, constructs a suitable hidden sound filter on each sub-band so that the sound signal from a source within the hidden sound distance passes through the filter without attenuation, and finally obtains the estimated signal through the inverse STFT.
The method of the invention comprises the following steps:
S1: Construct the steering vector $\mathbf{d}_s(\omega, r_s, \theta_s)$ from parameters such as the array geometry, the number of array elements, and the position of the sound source.
S2: Divide the signal $y_m(k) = x_m(k) + v_m(k)$, $m = 1, 2, \ldots, M$, received by each sensor of the microphone array into short frames with a certain overlap ratio; the frame length can range from a few milliseconds to tens of milliseconds. Then apply the short-time Fourier transform to each frame of each of the $M$ channels to obtain $Y_m(\omega, i)$, where $i$ denotes the $i$-th frame, and construct
$$\mathbf{y}(\omega, i) = [Y_1(\omega, i)\; Y_2(\omega, i)\; \cdots\; Y_M(\omega, i)]^T.$$
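The framing and transform in step S2 can be sketched with NumPy alone. The function name, the 16 kHz sampling rate, the 256-sample frames, the 50 % overlap, and the square-root Hann window are assumptions chosen for illustration; the patent only states that frames overlap and last from a few to tens of milliseconds.

```python
import numpy as np

def stft_multichannel(y, frame_len=256, hop=128):
    """STFT of an (M, N) multichannel signal; returns an array of shape (M, frames, bins)."""
    # Periodic square-root Hann window, a common choice for 50 % overlap.
    window = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * np.arange(frame_len) / frame_len))
    n_frames = 1 + (y.shape[1] - frame_len) // hop
    starts = hop * np.arange(n_frames)
    # frames[m, i, :] holds the i-th overlapping frame of channel m.
    frames = np.stack([y[:, s:s + frame_len] for s in starts], axis=1)
    return np.fft.rfft(frames * window, axis=-1)

# Hypothetical input: M = 8 channels, 1 s of noise at 16 kHz (16 ms frames, 50 % overlap).
rng = np.random.default_rng(0)
y = rng.standard_normal((8, 16000))
Y = stft_multichannel(y)          # Y[m, i, b] plays the role of Y_m(omega_b, i)
y_vec = Y[:, 10, 20]              # the length-M vector y(omega, i) for frame 10, bin 20
```

Slicing across the channel axis at a fixed frame and bin yields exactly the stacked vector that the per-sub-band filter operates on.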
S3: Decompose the time-domain signals received by the sensor array into sub-band signals using the short-time Fourier transform. Assume that the spacing between two adjacent microphones is $\delta$. Because hidden sound technology is used to pick up a sound source at close range, an ideal near-field sound source and interference are assumed to act on the microphone array in the acoustic environment. Let the distances from the sound source to the microphones be $r_{s,1}, r_{s,2}, \ldots, r_{s,M}$. Taking the center of the array as the reference point, let the distance from the sound source to the reference point be $r_s$ and the incident angle be $\theta_s$. The distance $r_{s,m}$ from the sound source to the $m$-th microphone is then determined by $r_s$, $\theta_s$, and the position of the $m$-th microphone relative to the reference point.
Assume that at discrete time $k$ the signal emitted by the sound source is $x(k)$. If absorption loss during propagation is neglected, the signal picked up by the $m$-th microphone differs from the source signal only by a phase delay and by an amplitude attenuation inversely proportional to the distance, so the observation can be expressed as:
$$y_m(k) = x_m(k) + v_m(k), \quad m = 1, 2, \ldots, M, \tag{1}$$
where $x_m(k)$ is the sound source signal picked up by the $m$-th microphone, that is, $x(k)$ delayed by $\tau_m$ and attenuated in inverse proportion to $r_{s,m}$; $v_m(k)$ is the noise signal picked up by the $m$-th microphone; and $\tau_m$ is the time delay of the $m$-th microphone relative to the sound source.
Because the wavefront of an ideal near-field sound source is spherical, $\tau_m$ is determined by the distance $r_{s,m}$ and the speed of sound in air, $c$.
In the frequency domain, equation (1) becomes:
$$Y_m(\omega) = X_m(\omega) + V_m(\omega),$$
where $\omega = 2\pi f$ is the angular frequency, $f$ is the frequency, $\omega / c$ is the wave number, $j = \sqrt{-1}$ is the imaginary unit, and $Y_m(\omega)$, $X_m(\omega)$, and $V_m(\omega)$ are the Fourier transforms of $y_m(k)$, $x_m(k)$, and $v_m(k)$, respectively.
From parameters such as the array geometry, the number of array elements, and the position of the sound source, a steering vector $\mathbf{d}(\omega, r, \theta)$ of length $M$ is constructed; its $m$-th element describes the attenuation and phase delay from a point at distance $r$ and angle $\theta$ to the $m$-th microphone. Setting $r = r_s$ and $\theta = \theta_s$ gives the steering vector of the sound source, $\mathbf{d}_s(\omega, r_s, \theta_s)$. The signals picked up by the $M$ microphones can then be written as the vector:
$$\mathbf{y}(\omega) = [Y_1(\omega)\; Y_2(\omega)\; \cdots\; Y_M(\omega)]^T = \mathbf{d}_s(\omega, r_s, \theta_s)\, X(\omega) + \mathbf{v}(\omega),$$
where the superscript $T$ denotes the vector transpose. After $\mathbf{y}(\omega)$ is processed by the array, the output signal is:
$$Z(\omega) = \mathbf{h}^H(\omega)\, \mathbf{y}(\omega),$$
where $Z(\omega)$ is an estimate of the sound source signal $X(\omega)$, the superscript $H$ denotes the conjugate transpose, and $\mathbf{h}(\omega) = [H_1(\omega)\; H_2(\omega)\; \cdots\; H_M(\omega)]^T$ is the weighting vector applied by the microphone array to the input signal, that is, the filter coefficients of the microphone array.
Before deriving the hidden sound filter, several important indices for measuring hidden sound performance are introduced.
The component of the array output related to the source signal is $\mathbf{h}^H(\omega)\, \mathbf{d}_s(\omega, r_s, \theta_s)\, X(\omega)$. The response of the array to the sound source signal is therefore:
$$\mathbf{h}^H(\omega)\, \mathbf{d}_s(\omega, r_s, \theta_s).$$
This response depends on three variables: $\omega$, $r_s$, and $\theta_s$. Fixing two of them and letting the third vary yields three indices: 1) with $\omega$ and $\theta_s$ fixed, the variation of the response with $r_s$ is called the hidden sound attenuation function; 2) with $r_s$ and $\theta_s$ fixed, the variation of the response with $\omega$ is called the hidden sound frequency response function; 3) with $\omega$ and $r_s$ fixed, the variation of the response with $\theta_s$ is called the beam pattern.
Hidden sound attenuation function
The hidden sound attenuation function describes the gain of the array for a single-frequency sound source signal at different distances. It is the response above regarded as a function of $r_s$ with $\omega$ and $\theta_s$ fixed.
Hidden sound frequency response function
The hidden sound frequency response function describes the gain of the array for a broadband signal. It is the response above regarded as a function of $\omega$ with $r_s$ and $\theta_s$ fixed.
Beam pattern
The beam pattern describes the sensitivity of the array to signals incident from different directions. It is the response above regarded as a function of $\theta_s$ with $\omega$ and $r_s$ fixed.
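The three indices can be evaluated numerically by sweeping one argument of the array response while holding the other two fixed. The sketch below is illustrative only: the array geometry, the decibel scaling, the helper names, and the simple matched filter standing in for the hidden sound filter are assumptions, and the frequency sweep evaluates each per-frequency filter at a hypothetical off-design distance of 50 cm.

```python
import numpy as np

M, DELTA, C = 8, 0.01, 343.0     # hypothetical array: 8 elements, 1 cm spacing

def steering(omega, r, theta):
    """Near-field steering vector of a uniform linear array (assumed geometry)."""
    offsets = (np.arange(M) - (M - 1) / 2) * DELTA
    dist = np.sqrt(r**2 + offsets**2 - 2 * r * offsets * np.cos(theta))
    return np.exp(-1j * omega * dist / C) / dist

def design_filter(omega, r, theta):
    """Placeholder filter with unit response at (omega, r, theta); not the MVDR solution."""
    d = steering(omega, r, theta)
    return d / np.vdot(d, d)

def response_db(h, omega, r, theta):
    """Magnitude of h^H d(omega, r, theta) in decibels."""
    return 20 * np.log10(np.abs(np.vdot(h, steering(omega, r, theta))))

omega0, r0, theta0 = 2 * np.pi * 1300.0, 0.05, 0.0
h0 = design_filter(omega0, r0, theta0)

# 1) Attenuation function: sweep the distance, frequency and angle fixed.
distances = np.array([0.05, 0.10, 0.30, 0.50])
attenuation = np.array([response_db(h0, omega0, r, theta0) for r in distances])

# 2) Frequency response: sweep the frequency, redesigning the filter per sub-band.
freqs = np.linspace(300.0, 3400.0, 32)
freq_resp = np.array([
    response_db(design_filter(2 * np.pi * f, r0, theta0), 2 * np.pi * f, 0.50, theta0)
    for f in freqs
])

# 3) Beam pattern: sweep the angle, frequency and distance fixed.
angles = np.linspace(0.0, np.pi, 181)
beam = np.array([response_db(h0, omega0, r0, th) for th in angles])
```

Replacing `design_filter` with an MVDR design leaves the sweeps unchanged, since each index only needs the filter and the steering vector at the evaluation point.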
Signal-to-interference ratio and signal-to-interference ratio gain
The signal-to-noise ratio measures the relative magnitude of the source signal and the noise signal, and comparing the input and output signal-to-noise ratios measures the performance of the beamformer. The invention is concerned with interference noise. The input signal-to-interference ratio of the array is defined as the ratio of $\phi_{X_0}(\omega)$ to $\phi_{V_0}(\omega)$, where $X_0(\omega)$ is the sound source signal picked up at the reference position, $V_0(\omega)$ is the interference signal picked up at the reference position, $r_n$ is the distance from the interference noise to the center of the array, and $\phi_{X_0}(\omega)$ and $\phi_{V_0}(\omega)$ are the powers of $X_0(\omega)$ and $V_0(\omega)$, respectively.
The output signal-to-interference ratio of the array is the ratio of the source power to the interference power at the array output. It is expressed in terms of $\theta_n$, the direction of the interference noise, and $\phi_X(\omega)$ and $\phi_V(\omega)$, the powers of $X(\omega)$ and $V(\omega)$, respectively.
The signal-to-interference ratio gain is thus the ratio of the output signal-to-interference ratio to the input signal-to-interference ratio.
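Under the near-field model used in this description, these definitions can be written out as follows. This is a sketch under stated assumptions rather than the patent's own typeset formulas: it assumes a point interference $V(\omega)$ at distance $r_n$ and angle $\theta_n$ with steering vector $\mathbf{d}(\omega, r_n, \theta_n)$, amplitude attenuation proportional to $1/r$, and the symbols $\mathrm{iSIR}$, $\mathrm{oSIR}$, and $\mathcal{G}$, which are introduced here.

```latex
\begin{align}
\mathrm{iSIR}(\omega) &= \frac{\phi_{X_0}(\omega)}{\phi_{V_0}(\omega)}
  = \frac{\phi_X(\omega) / r_s^2}{\phi_V(\omega) / r_n^2}, \\
\mathrm{oSIR}(\omega) &= \frac{\phi_X(\omega)\, \left|\mathbf{h}^H(\omega)\, \mathbf{d}_s(\omega, r_s, \theta_s)\right|^2}
  {\phi_V(\omega)\, \left|\mathbf{h}^H(\omega)\, \mathbf{d}(\omega, r_n, \theta_n)\right|^2}, \\
\mathcal{G}(\omega) &= \frac{\mathrm{oSIR}(\omega)}{\mathrm{iSIR}(\omega)}
  = \frac{r_s^2\, \left|\mathbf{h}^H(\omega)\, \mathbf{d}_s(\omega, r_s, \theta_s)\right|^2}
  {r_n^2\, \left|\mathbf{h}^H(\omega)\, \mathbf{d}(\omega, r_n, \theta_n)\right|^2}.
\end{align}
```

In this form the source and interference powers cancel, so the gain depends only on the filter, the two steering vectors, and the two distances.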
S4: On the subband with frequency ω, the MVDR-based hidden-sound filter is constructed as follows. The MVDR hidden-sound filter is obtained by passing the close-range sound source signal without attenuation while minimizing the variance of the residual noise at the array output. Mathematically:
$$\mathbf{h}_{\mathrm{MVDR}}=\arg\min_{\mathbf{h}(\omega)}\ \mathbf{h}^{H}\mathbf{R}_{vv}\mathbf{h}\quad\text{subject to}\quad \mathbf{h}^{H}(\omega)\,\mathbf{d}_{s}(\omega,r_{s},\theta_{s})=1, \tag{2}$$
Equation (2) can be solved using the Lagrange multiplier method to obtain
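For a Hermitian positive-definite noise covariance, the well-known closed-form solution of a constrained problem of this form is h = R⁻¹d / (dᴴR⁻¹d). The sketch below implements that textbook expression and checks the distortionless constraint on random test data. The function name `mvdr_weights` and the random covariance and steering vector are illustrative assumptions, not values from the patent.

```python
import numpy as np

def mvdr_weights(R_vv, d_s):
    """Textbook solution of: min h^H R h  subject to  h^H d = 1."""
    Rinv_d = np.linalg.solve(R_vv, d_s)       # R^{-1} d without forming the inverse
    return Rinv_d / np.vdot(d_s, Rinv_d)      # divide by d^H R^{-1} d

rng = np.random.default_rng(0)
M = 8
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R_vv = A @ A.conj().T + M * np.eye(M)         # random Hermitian positive-definite matrix
d_s = rng.standard_normal(M) + 1j * rng.standard_normal(M)

h = mvdr_weights(R_vv, d_s)
constraint = np.vdot(h, d_s)                  # h^H d_s
noise_var = np.vdot(h, R_vv @ h).real         # h^H R h

# Any other distortionless filter should leave at least as much residual noise.
h_ref = d_s / np.vdot(d_s, d_s)
ref_var = np.vdot(h_ref, R_vv @ h_ref).real
print(abs(constraint), noise_var <= ref_var)
```

Using `np.linalg.solve` rather than an explicit inverse is a design choice that tends to be more accurate when the matrix is poorly conditioned.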
Further, assuming that the noise is isotropic, the (m, n)-th element of its normalized correlation matrix can be written as follows:
There are two extreme cases: 1) if ωτ_0 is very large, i.e. at high frequency or large spacing, the noise signals received by the two sensors are nearly uncorrelated and the isotropic noise approaches spatially white noise; 2) if ωτ_0 is very small, i.e. at low frequency or small spacing, the noise signals received by the two sensors are nearly coherent and the isotropic noise approaches point-source noise.
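The two limiting cases can be checked numerically. The sketch below assumes the standard spherically isotropic model for a uniform linear array, in which the (m, n)-th element is sin(ω(m−n)τ_0) / (ω(m−n)τ_0) with τ_0 = δ/c. Both the model and the definition of τ_0 are assumptions here, and `isotropic_coherence` is a hypothetical name.

```python
import numpy as np

def isotropic_coherence(omega, M=8, delta=0.011, c=343.0):
    """Assumed model: Gamma[m, n] = sin(w (m-n) tau0) / (w (m-n) tau0), tau0 = delta / c."""
    tau0 = delta / c
    idx = np.arange(M)
    lag = idx[:, None] - idx[None, :]
    # np.sinc(x) is sin(pi x) / (pi x), hence the division by pi.
    return np.sinc(omega * lag * tau0 / np.pi)

# Small omega * tau0: low frequency, 1.1 cm spacing -> nearly coherent noise.
low = isotropic_coherence(2 * np.pi * 10.0)
# Large omega * tau0: 3 kHz with an exaggerated 5 m spacing -> nearly white noise.
high = isotropic_coherence(2 * np.pi * 3000.0, delta=5.0)

off_diag = ~np.eye(8, dtype=bool)
print(low[off_diag].min(), np.abs(high[off_diag]).max())
```

In the first case every entry is close to 1, so the matrix is close to rank one; in the second the matrix is close to the identity.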
The MVDR-based hidden-sound filter involves a matrix inversion. When the number of microphone elements is large, the matrix becomes ill-conditioned and the filter is extremely unstable. To avoid this instability, the conventional diagonal loading technique is used: a diagonal matrix is added to the matrix, so that the diagonally loaded matrix can be expressed as [Γ_dn(ω) + εI], where I is the M × M identity matrix and ε is the loading coefficient. In this case, the coefficients of the MVDR-based hidden-sound filter are:
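The effect of diagonal loading on conditioning can be seen in a short sketch. It substitutes Γ_dn(ω) + εI for the noise matrix in the textbook MVDR expression. The sinc-shaped isotropic-noise matrix, the far-field end-fire steering vector used as a placeholder, the value ε = 1e-3 and the name `loaded_mvdr` are all assumptions for illustration.

```python
import numpy as np

def loaded_mvdr(Gamma, d_s, eps=1e-3):
    """MVDR weights computed from the diagonally loaded matrix Gamma + eps * I."""
    G = Gamma + eps * np.eye(len(d_s))
    Ginv_d = np.linalg.solve(G, d_s)
    return Ginv_d / np.vdot(d_s, Ginv_d)

M, delta, c = 8, 0.011, 343.0
omega = 2 * np.pi * 500.0                       # low frequency: Gamma is nearly singular
lag = np.subtract.outer(np.arange(M), np.arange(M))
Gamma = np.sinc(omega * lag * delta / c / np.pi)            # assumed isotropic-noise model
d_s = np.exp(-1j * omega * np.arange(M) * delta / c)        # placeholder steering vector

cond_raw = np.linalg.cond(Gamma)
cond_loaded = np.linalg.cond(Gamma + 1e-3 * np.eye(M))
h = loaded_mvdr(Gamma, d_s)

print(f"{cond_raw:.3g} -> {cond_loaded:.3g}", abs(np.vdot(h, d_s)))
```

A larger ε makes the inversion more stable but moves the filter away from the unloaded MVDR solution, so the loading coefficient is a trade-off between robustness and noise rejection.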
S5: On each subband, the signal of the i-th frame is processed using the hidden-sound filter h(ω)
S6: Z(ω, i) is transformed using the inverse short-time Fourier transform and the overlap-add method, yielding the beamformed time-domain signal z(k).
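The framing, per-subband filtering and overlap-add steps can be sketched end to end with NumPy alone. This is a minimal illustration, assuming the output is formed as Z(ω, i) = h^H(ω) y(ω, i), a 256-sample frame with 50% overlap, and a square-root Hann window for analysis and synthesis. The frame parameters and the names `stft`, `istft` and `beamform` are hypothetical choices, not taken from the patent.

```python
import numpy as np

FRAME, HOP = 256, 128
# Periodic square-root Hann window: analysis * synthesis sums to 1 at 50% overlap.
WIN = np.sqrt(np.hanning(FRAME + 1)[:-1])

def stft(x):
    """Split x into overlapping windowed frames and transform each one."""
    n_frames = 1 + (len(x) - FRAME) // HOP
    frames = np.stack([x[i * HOP:i * HOP + FRAME] * WIN for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T            # shape (bins, frames)

def istft(Z):
    """Inverse transform each frame and recombine by overlap-add."""
    frames = np.fft.irfft(Z.T, n=FRAME, axis=1) * WIN
    out = np.zeros(HOP * (len(frames) - 1) + FRAME)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + FRAME] += frame
    return out

def beamform(y, H):
    """y: (M, samples) microphone signals; H: (M, bins) filter h(omega) per subband."""
    Y = np.stack([stft(ch) for ch in y])            # (M, bins, frames)
    Z = np.einsum('mb,mbf->bf', H.conj(), Y)        # Z = h^H y in every bin and frame
    return istft(Z)

rng = np.random.default_rng(0)
y = rng.standard_normal((2, 2048))
H = np.zeros((2, FRAME // 2 + 1), dtype=complex)
H[0] = 1.0                                          # trivial filter: pass microphone 0 only
z = beamform(y, H)
print(np.max(np.abs(z[HOP:-HOP] - y[0, HOP:-HOP])))
```

With the trivial filter the output reproduces the first channel everywhere except the first and last half frame, which are covered by only one window. In practice `H` would hold the MVDR coefficients computed for each subband.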
To demonstrate the effect of the present invention, the following specific example is given to verify the correctness of the proposed algorithm. The hidden-sound attenuation function, the hidden-sound frequency response function and the beam pattern are MATLAB simulation results, and the signal-to-interference ratio is an experimental result measured in the fully anechoic chamber of the Center of Intelligent Acoustics and Immersive Communications at Northwestern Polytechnical University. The experimental conditions were set as follows:
The microphone array is a uniform linear array with M = 8 microphones and an inter-element spacing of δ = 1.1 cm.
Sound source position: θ_s = 0°, r_s = 5 cm.
Fig. 4(a) shows the hidden-sound attenuation curve of the 8-element MVDR-based hidden-sound method. It can be seen that the point source signal is attenuated rapidly within 20 cm, and the closer the distance, the faster the attenuation; beyond 20 cm, the attenuation is almost inversely proportional to the distance. Therefore, when the sound source distance does not exceed 20 cm, the 8-element MVDR-based hidden-sound method can hide distant interference. In other words, the hidden-sound radius r_0 is about 20 cm: when the sound source distance satisfies r_s < r_0, the method can hide interference outside r_0.
Fig. 4(b) shows the hidden-sound frequency response of the 8-element MVDR-based hidden-sound method. It can be seen that at 2000 Hz the response to a point source at 5 cm, 10 cm, 30 cm and 50 cm is approximately 0 dB, 19 dB, 33 dB and 38 dB, respectively. This shows that, when the sound source is at 5 cm, the method can hide interference located at 10 cm, 30 cm and 50 cm in the end-fire direction.
Fig. 5 shows the beam patterns of the 8-element MVDR-based hidden-sound method at 1300 Hz and 3300 Hz. It can be seen that, at both frequencies, when the sound source is at 5 cm the method can hide interference at 10 cm, 30 cm and 50 cm in any direction, and that the hiding effect is better for interference outside the end-fire direction.
Fig. 6 shows the signal-to-interference ratio achieved by the 8-element MVDR-based hidden-sound method for a sound source at 0cm in the end-fire direction and interference in the end-fire direction. It can be seen that, when the interference is at 50 cm, the signal-to-interference ratio improves by about 10 dB for the speech interference in Fig. 6(a), by about 6 dB for the white Gaussian noise interference in Fig. 6(b), by about 7 dB for the chirp interference in Fig. 6(c), and by about 7 dB for the single-frequency interference in Fig. 6(d). This shows that the 8-element MVDR-based hidden-sound method has a good hiding effect.
The above embodiments merely describe preferred embodiments of the present invention and do not limit its concept or scope. Without departing from the design concept of the present invention, any modifications and improvements made to its technical solutions by those skilled in the art shall fall within the scope of protection of the present invention. The technical content claimed by the present invention is fully set forth in the claims.
Claims (10)
1. An MVDR-based hidden-sound method, characterized in that: a time-domain signal received by a sensor array is decomposed into subband signals by the short-time Fourier transform; an MVDR-based hidden-sound filter is constructed on each subband, so that sound signals from a sound source within the hidden-sound distance pass through the hidden-sound filter without attenuation; and an estimated signal is finally obtained through the inverse STFT.
2. The MVDR-based hidden-sound method of claim 1, wherein the method comprises the following steps:
S1: Constructing a steering vector according to parameters such as the array structure, the number of array elements and the position of the sound source
S2: The signal y_m(k) = x_m(k) + v_m(k), m = 1, 2, …, M, received by each sensor in the microphone array is divided into short frames with a certain overlap ratio, where the frame length may range from several milliseconds to tens of milliseconds; a short-time Fourier transform is then applied to each frame of each of the M channels to obtain Y_m(ω, i), where i denotes the i-th frame; the following vector is then constructed:
$$\mathbf{y}(\omega,i)=[\,Y_1(\omega,i)\ \ Y_2(\omega,i)\ \ \cdots\ \ Y_M(\omega,i)\,]^{T}.$$
S3: Decomposing the time-domain signal received by the sensor array into subband signals using the short-time Fourier transform;
S4: Constructing an MVDR-based hidden-sound filter on the subband with frequency ω
S5: On each subband, the signal of the i-th frame is processed using the hidden-sound filter h(ω)
S6: Z(ω, i) is transformed using the inverse short-time Fourier transform and the overlap-add method, yielding the beamformed time-domain signal z(k).
3. The MVDR-based hidden-sound method of claim 1, wherein the time-domain signals received by the sensor array are decomposed into subband signals by the short-time Fourier transform, with the following specific steps: Assume that the distance between two adjacent microphones is δ. Because the hidden-sound technique is used to pick up a sound source at close range, an ideal near-field sound source and interference are assumed to act on the microphone array in the acoustic environment. Let the distances from the sound source to the microphones be r_{s,1}, r_{s,2}, …, r_{s,M}. Taking the center of the array as the reference point, with the sound source at distance r_s from the reference point and incident angle θ_s, the distance from the sound source to the m-th microphone can be expressed as:
wherein,
Assume that at discrete time k the signal emitted by the sound source is x(k). Neglecting absorption loss during propagation, the signal picked up by the m-th microphone exhibits not only a phase delay relative to the sound source signal but also an amplitude attenuation inversely proportional to the distance, and can be expressed as:
where x_m(k) denotes the sound source signal picked up by the m-th microphone, v_m(k) denotes the noise signal picked up by the m-th microphone, and τ_m denotes the time delay of the m-th microphone relative to the sound source.
Since the wavefront of an ideal near-field sound source is spherical, τ_m can be expressed as:
where c represents the speed of sound in air,
In the frequency domain, equation (1) becomes:
where the wave number is ω/c, ω = 2πf denotes the angular frequency, f denotes the temporal frequency, √(−1) is the imaginary unit, and Y_m(ω), X_m(ω) and V_m(ω) denote the Fourier transforms of y_m(k), x_m(k) and v_m(k), respectively.
4. The MVDR-based hidden-sound method of claim 3, wherein a steering vector of length M is constructed according to parameters such as the array structure, the number of array elements and the position of the sound source, in which the superscript T denotes the vector transpose operation. Letting r = r_s and θ = θ_s, the signals picked up by the M microphones are represented in vector form as:
$$\mathbf{y}(\omega)=[\,Y_1(\omega)\ \ Y_2(\omega)\ \ \cdots\ \ Y_M(\omega)\,]^{T}=\mathbf{d}_{s}(\omega,r_{s},\theta_{s})\,X(\omega)+\mathbf{v}(\omega),$$
y(ω) is processed by the array, and the resulting output signal is:
where Z(ω) is an estimate of the sound source signal X(ω), the superscript H denotes the conjugate transpose operation, and h(ω) = [H_1(ω) H_2(ω) … H_M(ω)]^T is the weighting vector applied by the microphone array to the input signal, i.e. the filter coefficients of the microphone array.
5. The MVDR-based hidden-sound method of claim 1, wherein in S3 the MVDR-based hidden-sound filter is constructed on the subband with frequency ω as follows: the MVDR hidden-sound filter is obtained by passing the close-range sound source signal without attenuation while minimizing the variance of the residual noise at the array output, expressed mathematically as:
$$\mathbf{h}_{\mathrm{MVDR}}=\arg\min_{\mathbf{h}(\omega)}\ \mathbf{h}^{H}\mathbf{R}_{vv}\mathbf{h}\quad\text{subject to}\quad \mathbf{h}^{H}(\omega)\,\mathbf{d}_{s}(\omega,r_{s},\theta_{s})=1, \tag{2}$$
Equation (2) can be solved using the Lagrange multiplier method to obtain
6. The MVDR-based hidden-sound method of claim 5, wherein, assuming that the noise is isotropic, the (m, n)-th element of its normalized correlation matrix can be written as follows:
There are two extreme cases: 1) if ωτ_0 is very large, i.e. at high frequency or large spacing, the noise signals received by the two sensors are nearly uncorrelated and the isotropic noise approaches spatially white noise; 2) if ωτ_0 is very small, i.e. at low frequency or small spacing, the noise signals received by the two sensors are nearly coherent and the isotropic noise approaches point-source noise;
the MVDR-based hidden-sound filter involves a matrix inversion; when the number of microphone elements is large, the matrix becomes ill-conditioned and the filter is extremely unstable; to avoid this instability, the conventional diagonal loading technique is used, in which a diagonal matrix is added to the matrix, so that the diagonally loaded matrix can be expressed as [Γ_dn(ω) + εI], where I is the M × M identity matrix and ε is the loading coefficient; in this case, the coefficients of the MVDR-based hidden-sound filter are:
7. The MVDR-based hidden-sound method of claim 1, wherein the microphone array includes but is not limited to a uniform linear microphone array, a non-uniform linear microphone array and a circular microphone array.
8. The MVDR-based hidden-sound method of claim 1, wherein the microphone array is one of a small microphone array and a large microphone array.
9. The MVDR-based hidden-sound method of claim 1, wherein the speech signal processing method is used to process narrowband signals or wideband signals.
10. The MVDR-based hidden-sound method of claim 1, wherein the hidden-sound radius of the microphone array increases as the number of microphones and the inter-element spacing increase.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710163190.2A CN107170462A (en) | 2017-03-19 | 2017-03-19 | Hidden method for acoustic based on MVDR |
CN2017101631902 | 2017-03-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108597532A (en) | 2018-09-28 |
Family
ID=59848862
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710163190.2A Withdrawn CN107170462A (en) | 2017-03-19 | 2017-03-19 | Hidden method for acoustic based on MVDR |
CN201810221809.5A Pending CN108597532A (en) | 2017-03-19 | 2018-03-18 | Hidden method for acoustic based on MVDR |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710163190.2A Withdrawn CN107170462A (en) | 2017-03-19 | 2017-03-19 | Hidden method for acoustic based on MVDR |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN107170462A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110148420A (en) * | 2019-06-30 | 2019-08-20 | 桂林电子科技大学 | A kind of audio recognition method suitable under noise circumstance |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110018465B (en) * | 2018-01-09 | 2020-11-06 | 中国科学院声学研究所 | MVDR beam forming method based on full-phase preprocessing |
CN108717855B (en) * | 2018-04-27 | 2020-07-28 | 深圳市沃特沃德股份有限公司 | Noise processing method and device |
CN112420068B (en) * | 2020-10-23 | 2022-05-03 | 四川长虹电器股份有限公司 | Quick self-adaptive beam forming method based on Mel frequency scale frequency division |
CN116013239B (en) * | 2022-12-07 | 2023-11-17 | 广州声博士声学技术有限公司 | Active noise reduction algorithm and device for air duct |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103491397A (en) * | 2013-09-25 | 2014-01-01 | 歌尔声学股份有限公司 | Method and system for achieving self-adaptive surround sound |
CN103856866A (en) * | 2012-12-04 | 2014-06-11 | 西北工业大学 | Low-noise differential microphone array |
CN104464739A (en) * | 2013-09-18 | 2015-03-25 | 华为技术有限公司 | Audio signal processing method and device and difference beam forming method and device |
CN104902418A (en) * | 2014-03-07 | 2015-09-09 | 奥迪康有限公司 | Multi-microphone method for estimation of target and noise spectral variances |
EP2916320A1 (en) * | 2014-03-07 | 2015-09-09 | Oticon A/s | Multi-microphone method for estimation of target and noise spectral variances |
CN104980865A (en) * | 2014-04-03 | 2015-10-14 | 奥迪康有限公司 | Binaural hearing assistance system comprising binaural noise reduction |
Also Published As
Publication number | Publication date |
---|---|
CN107170462A (en) | 2017-09-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180928 |