CN108597532A - Hidden method for acoustic based on MVDR - Google Patents

Hidden method for acoustic based on MVDR

Info

Publication number
CN108597532A
CN108597532A (application CN201810221809.5A)
Authority
CN
China
Prior art keywords
mvdr
signal
array
sound
sound source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810221809.5A
Other languages
Chinese (zh)
Inventor
陈景东
梁菲菲
王雪瀚
黄海
聂玮奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Linjing Acoustics Technology Jiangsu Co Ltd
Original Assignee
Linjing Acoustics Technology Jiangsu Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Linjing Acoustics Technology Jiangsu Co Ltd filed Critical Linjing Acoustics Technology Jiangsu Co Ltd
Publication of CN108597532A publication Critical patent/CN108597532A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L21/0224Processing in the time domain
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166Microphone arrays; Beamforming
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00Signal processing covered by H04R, not provided for in its groups

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)

Abstract

The invention discloses an MVDR-based acoustic concealment method. The method first uses the short-time Fourier transform (STFT) to decompose the time-domain signals received by a sensor array into frequency-domain sub-band signals, and constructs an MVDR-based concealment filter on each sub-band so that the acoustic signal of a sound source within the concealment distance passes through the concealment filter without attenuation; the estimated signal is finally obtained through the inverse STFT. The invention achieves a good concealment effect.

Description

MVDR-based acoustic concealment method
Technical Field
The invention relates to acoustic concealment (sound-hiding) techniques for microphone arrays, and in particular to an MVDR-based acoustic concealment method.
Background
Research on acoustic concealment has a long history. In the field of acoustic sensing, two approaches have mainly been explored: bone-conduction microphones and ultrasonic microphones. In the field of acoustic signal processing, signal separation techniques and differential microphone arrays have been explored.
A bone-conduction microphone collects sound and converts it into an electrical signal by sensing the slight vibration of the bones of the head and neck caused by speaking. Because it differs from a conventional microphone, which picks up sound conducted through the air, it can transmit speech clearly even in very noisy environments. Bone conduction has had practical applications for over a hundred years, but the voice quality of early bone-conduction microphones was poor, especially at high frequencies, so they could only be used to assist conventional microphones, for example for voice endpoint detection to improve the performance of single-channel speech noise reduction. In the last twenty years bone-conduction microphones have attracted real attention and their performance has improved greatly; several bone-conduction communication headsets have appeared on the market. A bone-conduction microphone developed in May 2013 by Beijing Meiersion Science and Technology Development Co., Ltd. was the first such independently developed domestic product and has since been commercialized. Recently, other acoustic sensors based on a similar principle have drawn attention, such as the DAIKIN-D Talk Mic headset, which uses a highly sensitive microphone to pick up the speaker's vocal-cord vibration and convert it into an electrical signal, a pickup mechanism very similar to bone conduction. Although bone-conduction microphone technology has made breakthroughs, such communication headsets have not yet become widespread, mainly because of four problems: 1) poor dynamic performance; 2) high cost; 3) poor packaging; 4) poor sound quality.
More than a decade ago, a group of scientists at Bell Laboratories in the United States designed an ultrasonic microphone consisting of a small ultrasonic transmitter and a broadband conventional microphone. In operation, the transmitter emits a periodic sequence of wideband pulses with frequencies between 20 kHz and 70 kHz. After the pulses are reflected by the speaker's vocal tract, the microphone receives the reflected signal, and a back-end digital signal processing stage estimates the shape parameters of the vocal tract from the transmitted and reflected signals and synthesizes the speech being spoken. Because the device works in the ultrasonic band, it is not disturbed by audio signals in the audible frequency range and can be used to address voice communication in cocktail-party-like environments. The Bell Laboratories scientists built a working prototype: feeding the received and synthesized vowels to a speech recognition system gave a recognition rate of 95%, and preliminary listening experiments confirmed that the quality of the synthesized vowels was close to that of a conventional microphone. Of course, many problems remain before such a microphone becomes practical; the biggest is that, for phonemes whose vocal-tract characteristics are not prominent, such as nasals, the quality and intelligibility of the synthesized speech are low.
Acoustic concealment can in theory be viewed as a sub-problem of signal separation or enhancement. In a complex acoustic environment, a signal picked up by a microphone from a given sound source is almost always contaminated. According to the mechanism by which the contamination arises, noise in speech signal processing is classified into four types: ambient noise, echo, reverberation and interference. To separate the sound source from the noise, a specific method is used for each type:
Ambient noise (Noise): environmental noise is unavoidable and ubiquitous, and its presence seriously degrades the quality and intelligibility of the speech signal and the human ear's perception of spatial information. Ambient noise is usually relatively stationary, i.e., the statistical properties of the noise at the current time can be replaced by those of the noise at earlier times. From the statistics of the noisy signal and of the noise, a filter can be designed to filter the observed signal so that the speech signal is enhanced and the background noise is suppressed. Noise reduction can use either a single-channel or a multi-channel pickup system, corresponding to single-channel and multi-channel noise reduction techniques, respectively. In contrast to single-channel noise reduction, which introduces speech distortion while suppressing noise, multi-channel noise reduction reduces speech distortion for the same output signal-to-noise ratio.
Echo (Echo): acoustic echo is generated by the acoustic coupling between a microphone and a loudspeaker, and its presence severely affects multi-party duplex interaction. Its characteristic is that the source signal is known: as long as the acoustic propagation channel from the loudspeaker to the microphone can be estimated, the echo component of the observed signal picked up by the microphone can be estimated and subtracted from the picked-up signal. This technique is known as echo cancellation.
Reverberation (Reverberation): reverberation is caused by interface reflections (multipath effects) in a room. Reflections are classified into early and late reflections. Early reflections (typically within 40 ms) usually carry useful information, such as the size of the room, which can be estimated by analyzing their structure; early reflections can also reinforce the harmonic components of music and improve the listening experience. Late reflections, however, cause spectral distortion, which reduces speech quality and intelligibility and blurs the location information of the sound source. In voice communication systems the late reflections produce reverberation that seriously degrades communication quality, so dereverberation techniques are needed. One approach first blindly estimates the channel and then removes the reverberation by equalization; another is super-directive array beamforming, which extracts the sound-source signal from the desired direction while suppressing signals from other directions. Since reverberation arrives from all directions, a super-directive array can suppress it to some extent.
Interference (Interference) from other sound sources: an interfering signal is produced by a point noise source and arrives from a particular direction in space. In voice communication there are often other people and other sound sources nearby, so multiple sources are unavoidable at each end and the signals from different sources interfere with one another. The typical technique for interference suppression is beamforming, whose essence is to form a spatial filter and steer the direction of maximum response toward the desired source; the degree of interference suppression depends on the array response in the direction of the interference. Speech noise reduction, source separation and beamforming all aim to separate the desired signal from other interfering signals, so these techniques can be used for concealment processing. However, the separation performance achievable with current techniques is limited and cannot meet the requirements of concealment applications.
The signal separation techniques mentioned above require microphone arrays. Microphone array research has been carried out for more than forty years, during which many array geometries and processing methods have been developed. According to their response to the sound field, these arrays fall into two broad categories: additive microphone arrays (AMAs) and differential microphone arrays (DMAs). An additive array is usually large; each microphone measures the sound pressure of the field, and the array beamformer also responds to the sound-pressure field. Much of the existing literature deals with additive arrays and their processing methods. By comparison, a differential array responds to the spatial differential of the sound-pressure field; it has the advantages of small size, good frequency consistency of the beam pattern, and maximal directivity for a given number of elements.
Disclosure of Invention
The aim of the invention is to provide an MVDR-based acoustic concealment method with a good concealment effect.
To solve the above technical problem, the invention adopts the following technical solution: the time-domain signals received by the sensor array are decomposed into sub-band signals with the short-time Fourier transform, and an MVDR-based concealment filter is constructed on each sub-band so that sound signals from a sound source within the concealment distance pass through the concealment filter without attenuation; the estimated signal is finally obtained through the inverse STFT.
Further, the method comprises the following steps:
S1: construct the steering vector d_s(ω, r_s, θ_s) according to the structure of the array, the number of array elements, the position of the sound source and other parameters;
S2: divide the signal received by each sensor of the microphone array, y_m(k) = x_m(k) + v_m(k), m = 1, 2, …, M, into short frames with a certain overlap ratio (the frame length may range from a few milliseconds to tens of milliseconds), and apply the short-time Fourier transform to each frame of each of the M channels to obtain Y_m(ω, i), where i denotes the i-th frame; then construct
y(ω, i) = [Y_1(ω, i)  Y_2(ω, i)  …  Y_M(ω, i)]^T;
S3: the time-domain signals received by the sensor array are thereby decomposed into sub-band signals by the short-time Fourier transform;
S4: on the sub-band with frequency ω, construct the MVDR-based concealment filter h(ω);
S5: on that sub-band, process the signal of the i-th frame with the concealment filter h(ω), giving Z(ω, i) = h^H(ω) y(ω, i);
S6: transform Z(ω, i) with the inverse short-time Fourier transform and the overlap-add method to obtain the beamformed time-domain signal z(k).
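The following is a minimal sketch of this sub-band processing pipeline (steps S2/S3, S5 and S6), under the assumption that a concealment filter h(ω) has already been designed for every frequency bin as described later in the text. It uses scipy.signal.stft/istft for the framing and overlap-add steps; the function name, sampling rate and frame parameters are illustrative choices, not values prescribed by the patent.

```python
import numpy as np
from scipy.signal import stft, istft

def subband_conceal(y, h, fs=16000, frame_len=512, overlap=0.75):
    """Apply a per-bin concealment filter to a multichannel recording.

    y : (M, N) array of time-domain sensor signals y_m(k)
    h : (F, M) complex array, one filter h(w) per STFT bin (F = frame_len//2 + 1)
    """
    noverlap = int(frame_len * overlap)
    # S2/S3: overlapping frames + STFT of each of the M channels -> Y_m(w, i)
    _, _, Y = stft(y, fs=fs, nperseg=frame_len, noverlap=noverlap)   # shape (M, F, I)
    # S5: per-bin beamforming, Z(w, i) = h(w)^H y(w, i)
    Z = np.einsum('fm,mfi->fi', h.conj(), Y)
    # S6: inverse STFT with overlap-add -> beamformed time-domain signal z(k)
    _, z = istft(Z, fs=fs, nperseg=frame_len, noverlap=noverlap)
    return z
```

With the default Hann window and 75% overlap the overlap-add condition is satisfied, so with a trivial filter that selects one channel the chain simply reconstructs that reference channel.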
Further, the specific steps of decomposing the time-domain signals received by the sensor array into sub-band signals with the short-time Fourier transform are as follows:
Assume that the spacing between two adjacent microphones is δ. Because the concealment technique picks up a sound source at close range, an ideal near-field sound source and interference act on the microphone array in the acoustic environment; the distances from the sound source to the microphones are r_{s,1}, r_{s,2}, …, r_{s,M}. Taking the center of the array as the reference point, let the distance of the sound source from the reference point be r_s and its angle of incidence be θ_s. The distance from the sound source to the m-th microphone can then be expressed as
r_{s,m} = sqrt(r_s^2 + ψ_m^2 - 2 r_s ψ_m cos θ_s),
where ψ_m = (m - (M + 1)/2) δ is the position of the m-th microphone relative to the array center.
Assume that at discrete time k the signal emitted by the sound source is x(k). Neglecting absorption losses during propagation, the signal picked up by the m-th microphone has only a phase delay with respect to the source signal, together with an amplitude attenuation inversely proportional to the distance, and can be expressed as
y_m(k) = x_m(k) + v_m(k) = (1/r_{s,m}) x(k - τ_m) + v_m(k),   (1)
where x_m(k) denotes the source signal picked up by the m-th microphone, v_m(k) denotes the noise signal picked up by the m-th microphone, and τ_m denotes the time delay of the m-th microphone relative to the sound source.
Since the wavefront of a near-field ideal sound source is spherical, τ_m can be expressed as
τ_m = r_{s,m} / c,
where c denotes the speed of sound in air.
In the frequency domain, equation (1) becomes
Y_m(ω) = (1/r_{s,m}) e^{-j ω r_{s,m} / c} X(ω) + V_m(ω),
where ω/c denotes the wavenumber, ω = 2πf denotes the angular frequency, f denotes the frequency, j denotes the imaginary unit, and Y_m(ω), X_m(ω), V_m(ω) denote the Fourier transforms of y_m(k), x_m(k), v_m(k), respectively.
Further, according to the structure of the array, the number of array elements, the position of the sound source and other parameters, a steering vector of length M is constructed:
d_s(ω, r, θ) = [ (1/r_1) e^{-j ω r_1 / c}  (1/r_2) e^{-j ω r_2 / c}  …  (1/r_M) e^{-j ω r_M / c} ]^T,
where the superscript T denotes the transpose operation and r_m is the distance from a source at (r, θ) to the m-th microphone. Letting r = r_s and θ = θ_s, the signals picked up by the M microphones are written in vector form as
y(ω) = [Y_1(ω)  Y_2(ω)  …  Y_M(ω)]^T = d_s(ω, r_s, θ_s) X(ω) + v(ω).
y(ω) is processed by the array, and the resulting output signal is
Z(ω) = h^H(ω) y(ω),
where Z(ω) is an estimate of the source signal X(ω), the superscript H denotes the conjugate transpose operation, and h(ω) = [H_1(ω)  H_2(ω)  …  H_M(ω)]^T is the weighting vector applied by the microphone array to the input signal, i.e., the filter coefficients of the microphone array.
Further, in S4, the MVDR-based concealment filter is constructed on the sub-band with frequency ω as follows. The MVDR concealment filter is obtained by leaving the close-range source signal unattenuated while minimizing the variance of the residual noise at the array output, i.e.,
h_MVDR(ω) = argmin_{h(ω)} h^H(ω) R_vv(ω) h(ω)  subject to  h^H(ω) d_s(ω, r_s, θ_s) = 1,   (2)
where R_vv(ω) is the noise correlation matrix. Equation (2) can be solved with the Lagrange multiplier method, giving
h_MVDR(ω) = R_vv^{-1}(ω) d_s(ω, r_s, θ_s) / [ d_s^H(ω, r_s, θ_s) R_vv^{-1}(ω) d_s(ω, r_s, θ_s) ].
Further, assuming the noise is isotropic, the (m, n)-th element of its normalized correlation matrix can be written as
[Γ_dn(ω)]_{m,n} = sin(ω τ_0 (n - m)) / (ω τ_0 (n - m)),
where τ_0 = δ/c. There are two extreme cases: 1) if ω τ_0 is very large, i.e., at high frequency or large spacing, the noise signals received by two sensors are nearly uncorrelated and the isotropic noise is close to spatially white noise; 2) if ω τ_0 is very small, i.e., at low frequency or small spacing, the noise signals received by two sensors are nearly coherent and the isotropic noise is close to point-source noise.
The MVDR-based concealment filter involves a matrix inversion. When the number of microphone elements is large, the matrix becomes ill-conditioned and the filter becomes highly unstable. To avoid this, the conventional diagonal-loading technique is adopted: a scaled identity matrix is added to the matrix, and the diagonally loaded matrix can be expressed as [Γ_dn(ω) + εI], where I is the M × M identity matrix and ε is the loading coefficient. The coefficients of the MVDR-based concealment filter then become
h_MVDR(ω) = [Γ_dn(ω) + εI]^{-1} d_s(ω, r_s, θ_s) / { d_s^H(ω, r_s, θ_s) [Γ_dn(ω) + εI]^{-1} d_s(ω, r_s, θ_s) }.
the invention has the beneficial effects that: the invention is a small microphone array on a platform, the core of the invention is that time domain signals received by a sensor array are decomposed into sub-band signals by utilizing short-time Fourier transform, and a proper sound hiding filter is constructed on each sub-band, so that sound signals from a sound source within a sound hiding distance pass through the sound hiding filter without attenuation.
Drawings
To more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings required for the embodiments are briefly described below. The drawings described below are obviously only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a diagram of the acoustic concealment model.
FIG. 2 is a schematic diagram of a system for processing speech with a small microphone array.
Fig. 3 is a diagram of the pick-up and processing of near-field acoustic signals by a uniform linear array.
Fig. 4 shows the concealment attenuation (a) and the concealment frequency response (b) of the 8-element MVDR-based concealment method.
Fig. 5 shows the beam patterns of the 8-element MVDR-based concealment method at 1300 Hz and 3300 Hz (blue, red, purple and black indicate r = 5 cm, 10 cm, 30 cm and 50 cm, respectively).
Fig. 6 shows the signal-to-interference ratios of the 8-element MVDR-based concealment method for a sound source at 0° in the end-fire direction with interference also in the end-fire direction.
Detailed Description
The technical solution of the present invention is described clearly and completely in the following detailed description.
In the MVDR-based acoustic concealment method, the time-domain signals received by the sensor array are decomposed into sub-band signals with the short-time Fourier transform, and a suitable concealment filter is constructed on each sub-band so that sound signals from a sound source within the concealment distance pass through the concealment filter without attenuation; the estimated signal is finally obtained through the inverse STFT.
The method of the invention comprises the following steps:
S1: construct the steering vector d_s(ω, r_s, θ_s) according to the structure of the array, the number of array elements, the position of the sound source and other parameters;
S2: divide the signal received by each sensor of the microphone array, y_m(k) = x_m(k) + v_m(k), m = 1, 2, …, M, into short frames with a certain overlap ratio (the frame length may range from a few milliseconds to tens of milliseconds), and apply the short-time Fourier transform to each frame of each of the M channels to obtain Y_m(ω, i), where i denotes the i-th frame; then construct
y(ω, i) = [Y_1(ω, i)  Y_2(ω, i)  …  Y_M(ω, i)]^T.
S3: decompose the time-domain signals received by the sensor array into sub-band signals with the short-time Fourier transform. Assume that the spacing between two adjacent microphones is δ. Because the concealment technique picks up a sound source at close range, an ideal near-field sound source and interference act on the microphone array in the acoustic environment; the distances from the sound source to the microphones are r_{s,1}, r_{s,2}, …, r_{s,M}. Taking the center of the array as the reference point, let the distance of the sound source from the reference point be r_s and its angle of incidence be θ_s. The distance from the sound source to the m-th microphone can then be expressed as
r_{s,m} = sqrt(r_s^2 + ψ_m^2 - 2 r_s ψ_m cos θ_s),
where ψ_m = (m - (M + 1)/2) δ is the position of the m-th microphone relative to the array center.
Assume that at discrete time k the signal emitted by the sound source is x(k). Neglecting absorption losses during propagation, the signal picked up by the m-th microphone has only a phase delay with respect to the source signal, together with an amplitude attenuation inversely proportional to the distance, and can be expressed as
y_m(k) = x_m(k) + v_m(k) = (1/r_{s,m}) x(k - τ_m) + v_m(k),   (1)
where x_m(k) denotes the source signal picked up by the m-th microphone, v_m(k) denotes the noise signal picked up by the m-th microphone, and τ_m denotes the time delay of the m-th microphone relative to the sound source.
Since the wavefront of a near-field ideal sound source is spherical, τ_m can be expressed as
τ_m = r_{s,m} / c,
where c denotes the speed of sound in air.
In the frequency domain, equation (1) becomes
Y_m(ω) = (1/r_{s,m}) e^{-j ω r_{s,m} / c} X(ω) + V_m(ω),
where ω/c denotes the wavenumber, ω = 2πf denotes the angular frequency, f denotes the frequency, j denotes the imaginary unit, and Y_m(ω), X_m(ω), V_m(ω) denote the Fourier transforms of y_m(k), x_m(k), v_m(k), respectively.
According to the structure of the array, the number of array elements, the position of the sound source and other parameters, a steering vector of length M is constructed:
d_s(ω, r, θ) = [ (1/r_1) e^{-j ω r_1 / c}  (1/r_2) e^{-j ω r_2 / c}  …  (1/r_M) e^{-j ω r_M / c} ]^T,
where the superscript T denotes the transpose operation and r_m is the distance from a source at (r, θ) to the m-th microphone. Letting r = r_s and θ = θ_s, the signals picked up by the M microphones are written in vector form as
y(ω) = [Y_1(ω)  Y_2(ω)  …  Y_M(ω)]^T = d_s(ω, r_s, θ_s) X(ω) + v(ω).
y(ω) is processed by the array, and the resulting output signal is
Z(ω) = h^H(ω) y(ω),
where Z(ω) is an estimate of the source signal X(ω), the superscript H denotes the conjugate transpose operation, and h(ω) = [H_1(ω)  H_2(ω)  …  H_M(ω)]^T is the weighting vector applied by the microphone array to the input signal, i.e., the filter coefficients of the microphone array.
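As an illustration of the near-field model above, the sketch below builds the steering vector d_s(ω, r, θ) and the array output Z(ω) = h^H(ω) y(ω) for a uniform linear array. The element-position convention ψ_m = (m - (M+1)/2)δ about the array center and the default speed of sound of 343 m/s are assumptions of this sketch rather than values stated in the patent.

```python
import numpy as np

def nearfield_steering_vector(omega, r, theta, M, delta, c=343.0):
    """d_s(omega, r, theta) for a uniform linear array (theta in radians)."""
    m = np.arange(1, M + 1)
    psi = (m - (M + 1) / 2.0) * delta                  # element positions about the array center
    r_m = np.sqrt(r**2 + psi**2 - 2.0 * r * psi * np.cos(theta))   # source-to-element distances
    # amplitude falls off as 1/r_m, phase delay is omega * r_m / c
    return np.exp(-1j * omega * r_m / c) / r_m

def array_output(h, y):
    """Z(omega) = h(omega)^H y(omega) for one frequency bin (vdot conjugates h)."""
    return np.vdot(h, y)
```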
Before deriving the concealment filter, several important measures of concealment performance are introduced.
The component of the array output that is related to the source signal is h^H(ω) d_s(ω, r_s, θ_s) X(ω). The response of the array to the source signal is therefore h^H(ω) d_s(ω, r_s, θ_s).
This response depends on three variables: ω, r_s and θ_s. Fixing two of them and letting the response vary with the third yields three measures: 1) with ω and θ_s fixed, the response of the array to the source signal as a function of r_s is called the concealment attenuation function; 2) with r_s and θ_s fixed, the response as a function of ω is called the concealment frequency response function; 3) with ω and r_s fixed, the response as a function of θ_s is called the beam pattern.
Concealment attenuation function
The concealment attenuation function describes the gain of the array, at different distances, for a single-frequency source signal; it is defined as
A(ω, r) = |h^H(ω) d_s(ω, r, θ_s)|.
Concealment frequency response function
The concealment frequency response function describes the gain of the array for a broadband signal; for a source at distance r in the direction θ_s it is defined as
F(ω, r) = |h^H(ω) d_s(ω, r, θ_s)|.
Beam pattern
The beam pattern describes the sensitivity of the array to signals arriving from different directions; it is defined as
B(ω, θ) = |h^H(ω) d_s(ω, r_s, θ)|.
signal to interference ratio and signal to interference ratio gain
The signal-to-noise ratio measures the relative magnitude of the source signal and the noise signal. By comparing the input-output signal-to-noise ratio, the performance of the beamformer can be measured. The invention relates toThe heart is the interference noise. The input signal-to-interference ratio of the array is defined as:
wherein, X0(ω) is the sound source signal picked up by the reference position,V0(ω) is the interference signal picked up by the reference position,rnrepresenting the distance of the interference noise from the center of the array.Respectively represent X0(ω),V0(ω) power.
The output signal-to-interference ratio of the array is expressed as:
wherein theta isnIndicating the direction of the interference noise, phiX(ω)、ΦVAnd (ω) represents the power of X (ω) and V (ω), respectively.
Thus, the signal-to-interference ratio gain is:
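The performance measures above can be evaluated numerically as in the sketch below, which reuses the nearfield_steering_vector helper from the previous sketch; the function names and the 1/r reference-point amplitudes used in the SIR gain follow the near-field model assumed here and are illustrative, not the patent's notation.

```python
import numpy as np
# assumes nearfield_steering_vector(...) from the earlier sketch is in scope

def response(h, omega, r, theta, M, delta):
    """|h^H(omega) d_s(omega, r, theta)|."""
    return abs(np.vdot(h, nearfield_steering_vector(omega, r, theta, M, delta)))

def concealment_attenuation(h, omega, theta_s, distances, M, delta):
    """Response versus source distance r (omega and theta_s fixed)."""
    return np.array([response(h, omega, r, theta_s, M, delta) for r in distances])

def beampattern(h, omega, r_s, angles, M, delta):
    """Response versus incidence angle theta (omega and r_s fixed)."""
    return np.array([response(h, omega, r_s, theta, M, delta) for theta in angles])

def sir_gain(h, omega, r_s, theta_s, r_n, theta_n, M, delta):
    """Output SIR divided by input SIR for a point interferer at (r_n, theta_n)."""
    g_s = response(h, omega, r_s, theta_s, M, delta)    # array gain toward the source
    g_n = response(h, omega, r_n, theta_n, M, delta)    # array gain toward the interference
    # with the 1/r model, the reference-point amplitudes are 1/r_s and 1/r_n
    return (g_s**2 / g_n**2) * (r_s**2 / r_n**2)
```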
s4: on the subband with frequency ω, the MVDR-based implicit filter method is constructed as follows: the MVDR hidden sound filter is obtained by not attenuating the close-distance sound source signal and minimizing the variance of the residual noise at the output end of the array, and the mathematical expression is as follows:
hMVDR=argminh(ω)hHRvvh subject to hH(ω)ds(ω,rs,θs)=1, (2)
equation (2) can be solved using the Lagrange multiplier method to obtain
Further, assuming that the noise is isotropic noise, the (m, n) th element of its normalized correlation matrix can be written as follows:
wherein,there are two extreme cases: 1) if ω τ0Very large, i.e. at high frequency or large spacing, the noise signals received by the two sensors are nearly uncorrelated, and the isotropic noise is close to spatial white noise; 2) if ω τ0Very small, i.e. at low frequencies or small spacings
Next, noise signals received by the two sensors are close to coherent, and isotropic noise is close to point source noise;
the MVDR-based implicit acoustic filter has a matrix inversion operation, when the number of microphone elements is large, the matrix is ill-conditioned, the filter is extremely unstable, and in order to avoid the inversion instability, the traditional method is adopted: the diagonal loading technique is to add the matrix and a diagonal matrix, and the matrix after diagonal loading can be expressed as [ gamma ] (gamma)dn(ω)+εI]Where I is an identity matrix of size M and ε is a loading coefficient, in which case MVDR-basedThe coefficients of the implicit sound filter are:
S5: on that sub-band, the signal of the i-th frame is processed with the concealment filter h(ω), giving Z(ω, i) = h^H(ω) y(ω, i).
S6: Z(ω, i) is transformed with the inverse short-time Fourier transform and the overlap-add method to obtain the beamformed time-domain signal z(k).
To demonstrate the effect of the invention, the following specific example verifies the correctness of the proposed algorithm. The concealment attenuation function, the concealment frequency response function and the beam patterns are MATLAB simulation results; the signal-to-interference ratios were measured in the fully anechoic chamber of the Center of Intelligent Acoustics and Immersive Communications at Northwestern Polytechnical University. The experimental conditions are as follows:
The microphone array is a uniform linear array with M = 8 microphones and an element spacing of δ = 1.1 cm.
Sound source position: θ_s = 0°, r_s = 5 cm.
Fig. 4(a) shows the concealment attenuation of the 8-element MVDR-based concealment method. It can be seen that the point-source signal is attenuated rapidly within 20 cm, and the closer the distance the faster the attenuation; beyond 20 cm the attenuation is almost inversely proportional to the distance. This shows that the concealment radius r_0 is about 20 cm: when the sound source distance satisfies r_s < r_0, the 8-element MVDR-based concealment method can conceal interference from beyond r_0.
Fig. 4(b) shows the concealment frequency response of the 8-element MVDR-based concealment method. It can be seen that, at 2000 Hz, the responses to a point source at 5 cm, 10 cm, 30 cm and 50 cm correspond to attenuations of approximately 0 dB, 19 dB, 33 dB and 38 dB, respectively. This shows that, when the sound source is at 5 cm, the 8-element MVDR-based concealment method can conceal interference at 10 cm, 30 cm and 50 cm in the end-fire direction.
Fig. 5 shows the beam patterns of the 8-element MVDR-based concealment method at 1300 Hz and 3300 Hz. It can be seen that, at different frequencies, when the sound source is at 5 cm the method can conceal interference at 10 cm, 30 cm and 50 cm from any direction, and the concealment of interference away from the end-fire direction is even better.
Fig. 6 shows the signal-to-interference ratios of the 8-element MVDR-based concealment method for a sound source at 0° in the end-fire direction with interference also in the end-fire direction. It can be seen that, when the interference is at 50 cm, the signal-to-interference ratio is improved by about 10 dB for the speech interference in Fig. 6(a), about 6 dB for the white Gaussian noise interference in Fig. 6(b), about 7 dB for the chirp interference in Fig. 6(c), and about 7 dB for the single-frequency interference in Fig. 6(d). This shows that the 8-element MVDR-based concealment method has a good concealment effect.
The above is merely a description of preferred embodiments of the present invention and does not limit its concept or scope. Various modifications and improvements made by those skilled in the art to the technical solution of the present invention without departing from its design concept shall fall within the protection scope of the present invention; the technical content claimed is set forth in the claims.

Claims (10)

1. An MVDR-based acoustic concealment method, characterized in that: the method decomposes the time-domain signals received by a sensor array into sub-band signals with the short-time Fourier transform and constructs an MVDR-based concealment filter on each sub-band, so that sound signals from a sound source within the concealment distance pass through the concealment filter without attenuation, and finally obtains the estimated signal through the inverse STFT.
2. The MVDR-based acoustic concealment method of claim 1, wherein the method comprises the following steps:
S1: construct the steering vector d_s(ω, r_s, θ_s) according to the structure of the array, the number of array elements, the position of the sound source and other parameters;
S2: divide the signal received by each sensor of the microphone array, y_m(k) = x_m(k) + v_m(k), m = 1, 2, …, M, into short frames with a certain overlap ratio (the frame length may range from a few milliseconds to tens of milliseconds), and apply the short-time Fourier transform to each frame of each of the M channels to obtain Y_m(ω, i), where i denotes the i-th frame; then construct
y(ω, i) = [Y_1(ω, i)  Y_2(ω, i)  …  Y_M(ω, i)]^T;
S3: the time-domain signals received by the sensor array are thereby decomposed into sub-band signals by the short-time Fourier transform;
S4: on the sub-band with frequency ω, construct the MVDR-based concealment filter h(ω);
S5: on that sub-band, process the signal of the i-th frame with the concealment filter h(ω), giving Z(ω, i) = h^H(ω) y(ω, i);
S6: transform Z(ω, i) with the inverse short-time Fourier transform and the overlap-add method to obtain the beamformed time-domain signal z(k).
3. The MVDR-based acoustic concealment method of claim 1, wherein the specific steps of decomposing the time-domain signals received by the sensor array into sub-band signals with the short-time Fourier transform are as follows: assume that the spacing between two adjacent microphones is δ; because the concealment technique picks up a sound source at close range, an ideal near-field sound source and interference act on the microphone array in the acoustic environment, and the distances from the sound source to the microphones are r_{s,1}, r_{s,2}, …, r_{s,M}; taking the center of the array as the reference point, let the distance of the sound source from the reference point be r_s and its angle of incidence be θ_s; the distance from the sound source to the m-th microphone can then be expressed as
r_{s,m} = sqrt(r_s^2 + ψ_m^2 - 2 r_s ψ_m cos θ_s),
where ψ_m = (m - (M + 1)/2) δ is the position of the m-th microphone relative to the array center;
assume that at discrete time k the signal emitted by the sound source is x(k); neglecting absorption losses during propagation, the signal picked up by the m-th microphone has only a phase delay with respect to the source signal, together with an amplitude attenuation inversely proportional to the distance, and can be expressed as
y_m(k) = x_m(k) + v_m(k) = (1/r_{s,m}) x(k - τ_m) + v_m(k),   (1)
where x_m(k) denotes the source signal picked up by the m-th microphone, v_m(k) denotes the noise signal picked up by the m-th microphone, and τ_m denotes the time delay of the m-th microphone relative to the sound source;
since the wavefront of a near-field ideal sound source is spherical, τ_m can be expressed as
τ_m = r_{s,m} / c,
where c denotes the speed of sound in air;
in the frequency domain, equation (1) becomes
Y_m(ω) = (1/r_{s,m}) e^{-j ω r_{s,m} / c} X(ω) + V_m(ω),
where ω/c denotes the wavenumber, ω = 2πf denotes the angular frequency, f denotes the frequency, j denotes the imaginary unit, and Y_m(ω), X_m(ω), V_m(ω) denote the Fourier transforms of y_m(k), x_m(k), v_m(k), respectively.
4. The MVDR-based acoustic concealment method of claim 3, wherein a steering vector of length M is constructed according to the structure of the array, the number of array elements, the position of the sound source and other parameters:
d_s(ω, r, θ) = [ (1/r_1) e^{-j ω r_1 / c}  (1/r_2) e^{-j ω r_2 / c}  …  (1/r_M) e^{-j ω r_M / c} ]^T,
where the superscript T denotes the transpose operation and r_m is the distance from a source at (r, θ) to the m-th microphone; letting r = r_s and θ = θ_s, the signals picked up by the M microphones are written in vector form as
y(ω) = [Y_1(ω)  Y_2(ω)  …  Y_M(ω)]^T = d_s(ω, r_s, θ_s) X(ω) + v(ω);
y(ω) is processed by the array, and the resulting output signal is
Z(ω) = h^H(ω) y(ω),
where Z(ω) is an estimate of the source signal X(ω), the superscript H denotes the conjugate transpose operation, and h(ω) = [H_1(ω)  H_2(ω)  …  H_M(ω)]^T is the weighting vector applied by the microphone array to the input signal, i.e., the filter coefficients of the microphone array.
5. The MVDR-based acoustic concealment method of claim 1, wherein, in S4, the MVDR-based concealment filter is constructed on the sub-band with frequency ω as follows: the MVDR concealment filter is obtained by leaving the close-range source signal unattenuated while minimizing the variance of the residual noise at the array output, i.e.,
h_MVDR(ω) = argmin_{h(ω)} h^H(ω) R_vv(ω) h(ω)  subject to  h^H(ω) d_s(ω, r_s, θ_s) = 1,   (2)
where R_vv(ω) is the noise correlation matrix; equation (2) can be solved with the Lagrange multiplier method, giving
h_MVDR(ω) = R_vv^{-1}(ω) d_s(ω, r_s, θ_s) / [ d_s^H(ω, r_s, θ_s) R_vv^{-1}(ω) d_s(ω, r_s, θ_s) ].
6. The MVDR-based acoustic concealment method of claim 5, wherein, assuming the noise is isotropic, the (m, n)-th element of its normalized correlation matrix can be written as
[Γ_dn(ω)]_{m,n} = sin(ω τ_0 (n - m)) / (ω τ_0 (n - m)),
where τ_0 = δ/c; there are two extreme cases: 1) if ω τ_0 is very large, i.e., at high frequency or large spacing, the noise signals received by two sensors are nearly uncorrelated and the isotropic noise is close to spatially white noise; 2) if ω τ_0 is very small, i.e., at low frequency or small spacing, the noise signals received by two sensors are nearly coherent and the isotropic noise is close to point-source noise;
the MVDR-based concealment filter involves a matrix inversion; when the number of microphone elements is large, the matrix becomes ill-conditioned and the filter becomes highly unstable; to avoid this, the conventional diagonal-loading technique is adopted, in which a scaled identity matrix is added to the matrix; the diagonally loaded matrix can be expressed as [Γ_dn(ω) + εI], where I is the M × M identity matrix and ε is the loading coefficient, and the coefficients of the MVDR-based concealment filter become
h_MVDR(ω) = [Γ_dn(ω) + εI]^{-1} d_s(ω, r_s, θ_s) / { d_s^H(ω, r_s, θ_s) [Γ_dn(ω) + εI]^{-1} d_s(ω, r_s, θ_s) }.
7. The MVDR-based acoustic concealment method of claim 1, wherein the microphone array includes, but is not limited to, a uniform linear microphone array, a non-uniform linear microphone array, or a circular microphone array.
8. The MVDR-based acoustic concealment method of claim 1, wherein the microphone array is either a small microphone array or a large microphone array.
9. The MVDR-based acoustic concealment method of claim 1, wherein the method is used to process narrowband signals or wideband signals.
10. The MVDR-based acoustic concealment method of claim 1, wherein the concealment radius of the microphone array increases as the number of microphones and the inter-element spacing increase.
CN201810221809.5A 2017-03-19 2018-03-18 Hidden method for acoustic based on MVDR Pending CN108597532A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710163190.2A CN107170462A (en) 2017-03-19 2017-03-19 Hidden method for acoustic based on MVDR
CN2017101631902 2017-03-19

Publications (1)

Publication Number Publication Date
CN108597532A true CN108597532A (en) 2018-09-28

Family

ID=59848862

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201710163190.2A Withdrawn CN107170462A (en) 2017-03-19 2017-03-19 Hidden method for acoustic based on MVDR
CN201810221809.5A Pending CN108597532A (en) 2017-03-19 2018-03-18 Hidden method for acoustic based on MVDR

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201710163190.2A Withdrawn CN107170462A (en) 2017-03-19 2017-03-19 Hidden method for acoustic based on MVDR

Country Status (1)

Country Link
CN (2) CN107170462A (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110018465B (en) * 2018-01-09 2020-11-06 中国科学院声学研究所 MVDR beam forming method based on full-phase preprocessing
CN108717855B (en) * 2018-04-27 2020-07-28 深圳市沃特沃德股份有限公司 Noise processing method and device
CN112420068B (en) * 2020-10-23 2022-05-03 四川长虹电器股份有限公司 Quick self-adaptive beam forming method based on Mel frequency scale frequency division
CN116013239B (en) * 2022-12-07 2023-11-17 广州声博士声学技术有限公司 Active noise reduction algorithm and device for air duct

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103856866A (en) * 2012-12-04 2014-06-11 西北工业大学 Low-noise differential microphone array
CN104464739A (en) * 2013-09-18 2015-03-25 华为技术有限公司 Audio signal processing method and device and difference beam forming method and device
CN103491397A (en) * 2013-09-25 2014-01-01 歌尔声学股份有限公司 Method and system for achieving self-adaptive surround sound
CN104902418A (en) * 2014-03-07 2015-09-09 奥迪康有限公司 Multi-microphone method for estimation of target and noise spectral variances
EP2916320A1 (en) * 2014-03-07 2015-09-09 Oticon A/s Multi-microphone method for estimation of target and noise spectral variances
CN104980865A (en) * 2014-04-03 2015-10-14 奥迪康有限公司 Binaural hearing assistance system comprising binaural noise reduction

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110148420A (en) * 2019-06-30 2019-08-20 桂林电子科技大学 A kind of audio recognition method suitable under noise circumstance

Also Published As

Publication number Publication date
CN107170462A (en) 2017-09-15

Similar Documents

Publication Publication Date Title
JP5323995B2 (en) System, method, apparatus and computer readable medium for dereverberation of multi-channel signals
CN108597532A (en) Hidden method for acoustic based on MVDR
US7366662B2 (en) Separation of target acoustic signals in a multi-transducer arrangement
US9485574B2 (en) Spatial interference suppression using dual-microphone arrays
US8098844B2 (en) Dual-microphone spatial noise suppression
US9100734B2 (en) Systems, methods, apparatus, and computer-readable media for far-field multi-source tracking and separation
CN108337605A (en) The hidden method for acoustic formed based on Difference Beam
TW201030733A (en) Systems, methods, apparatus, and computer program products for enhanced active noise cancellation
US8615392B1 (en) Systems and methods for producing an acoustic field having a target spatial pattern
CN113782046B (en) Microphone array pickup method and system for long-distance voice recognition
Yamamoto et al. Spherical microphone array post-filtering for reverberation suppression using isotropic beamformings
Šarić et al. Performance analysis of MVDR beamformer applied on an end-fire microphone array composed of unidirectional microphones
CN112017684B (en) Closed space reverberation elimination method based on microphone array
Li et al. Speech Enhancement Using Robust Generalized Sidelobe Canceller with Multi-Channel Post-Filtering in Adverse Environments
Zhang et al. Speech enhancement using compact microphone array and applications in distant speech acquisition
Zhang et al. A frequency domain approach for speech enhancement with directionality using compact microphone array.
Zhang et al. A compact-microphone-array-based speech enhancement algorithm using auditory subbands and probability constrained postfilter
Datla Implementation and evaluation of spectral subtraction (SS) with minimum statistics and wiener beamformer combination
Li et al. A Subband Feedback Controlled Generalized Sidelobe Canceller in Frequency Domain with Multi-Channel Postfilter
Tashev Defeating ambient noise: practical approaches for noise reduction and suppression
Nilsson Suppression of reverberation in hearing aids
Tashev Microsoft Research Redmond, USA

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180928

RJ01 Rejection of invention patent application after publication