CN114979904B

CN114979904B - Binaural wiener filtering method based on single external wireless acoustic sensor rate optimization

Info

Publication number: CN114979904B
Application number: CN202210547834.9A
Authority: CN
Inventors: 张结
Original assignee: University of Science and Technology of China USTC
Current assignee: University of Science and Technology of China USTC
Priority date: 2022-05-18
Filing date: 2022-05-18
Publication date: 2024-02-23
Anticipated expiration: 2042-05-18
Also published as: CN114979904A

Abstract

The invention discloses a binaural wiener filtering method based on single external wireless acoustic sensor rate optimization, which comprises the following steps: obtaining a noise covariance matrix and a relative acoustic transfer function by carrying out parameter estimation; according to a preset lower bound of the expected output signal-to-noise ratio, processing a noise covariance matrix and a multi-sound source relative acoustic transfer function by using a single external wireless acoustic sensor to obtain the lowest transmission bit rate of the single external wireless acoustic sensor; constructing an unconstrained binaural multi-channel wiener filter with a weighted part noise estimation term through a binaural hearing aid according to the lowest transmission bit rate, and obtaining filter coefficients of the binaural multi-channel wiener filter with the weighted part noise estimation term according to the binaural multi-channel wiener filter and a noise covariance matrix; and carrying out frequency domain beam forming by utilizing short-time inverse Fourier transform according to the filter coefficients to obtain a binaural output voice signal, and outputting binaural spatial characteristic clues by utilizing a binaural multichannel wiener filter.

Description

Binaural wiener filtering method based on single external wireless acoustic sensor rate optimization

Technical Field

The invention relates to the field of speech signal processing, and in particular to a binaural Wiener filtering method and device based on rate optimization of a single external wireless acoustic sensor.

Background

In complex acoustic scenes, the target sound source is usually submerged in various interfering sound sources and background noise. A person with normal hearing can naturally localize multiple sound sources and understand the target speaker at the same time, but this is very difficult for a hearing-impaired person. Wearing hearing aids or other hearing assistance devices can improve the hearing of a hearing-impaired user to some extent. This requires the hearing assistance device to provide both speech enhancement and multi-source localization. Sound source localization relies on spatial cues, so preserving binaural spatial cues is particularly critical, since in reality a sound source may be a danger signal. Binaural spatial cues include the interaural time difference (ITD), the interaural level difference (ILD) and the interaural phase difference (IPD), which are the group delay, amplitude response and phase response, respectively, of the interaural transfer function (ITF). Although the purpose of speech enhancement is to suppress the energy of interfering sound sources, if their spatial cues are retained in the output speech, the hearing aid user can still determine the locations of the interfering sound sources from those cues.

Traditional hearing aids suffer from problems such as severe distortion, poor noise reduction performance, low accuracy of spatial cue preservation, and high battery consumption.

Disclosure of Invention

In order to solve the above technical problems, the present invention provides a binaural wiener filtering method and device based on rate optimization of a single external wireless acoustic sensor, so as to solve at least one of the above problems.

According to a first aspect of the present invention, there is provided a binaural wiener filtering method based on single external wireless acoustic sensor rate optimization, comprising:

carrying out parameter estimation on a noise frame by using a sample moving average method to obtain a noise covariance matrix, and estimating the noise covariance matrix by using a covariance matrix difference method or a covariance matrix whitening method to obtain a relative acoustic transfer function, wherein the noise frame is obtained by processing multi-microphone noisy speech by using an end point detector;

according to a preset lower bound of the expected output signal-to-noise ratio, processing a noise covariance matrix and a relative acoustic transfer function by using the single external wireless acoustic sensor to obtain the lowest transmission bit rate of the single external wireless acoustic sensor;

constructing an unconstrained binaural multi-channel wiener filter with a weighted part noise estimation term through a binaural hearing aid according to the lowest transmission bit rate, and obtaining filter coefficients of the binaural multi-channel wiener filter with the weighted part noise estimation term according to the binaural multi-channel wiener filter and a noise covariance matrix;

and carrying out frequency domain beam forming on the binaural microphone voice signals by utilizing short-time inverse Fourier transform according to the filter coefficients to obtain binaural output voice signals, and outputting binaural spatial characteristic clues by utilizing the binaural multichannel wiener filter.

According to an embodiment of the present invention, the above-mentioned relative acoustic transfer function is a normalized principal eigenvector of a speech covariance matrix or a normalized generalized principal eigenvector of a mixed covariance matrix and a noise covariance matrix;

wherein the speech covariance matrix is the difference between the mixed covariance matrix and the noise covariance matrix;

the mixed covariance matrix is obtained by the following steps:

the end point detector processes the multi-microphone noisy speech to obtain a noise frame and a noise-plus-speech frame;

carrying out parameter estimation on the noise-plus-speech frame by using a sample moving average method to obtain a mixed covariance matrix;

and estimating the voice covariance matrix by using a covariance matrix difference method or a covariance matrix whitening method to obtain an acoustic transfer function.

According to an embodiment of the present invention, the processing the noise covariance matrix and the relative acoustic transfer function by using the single external wireless acoustic sensor according to the preset lower bound of the expected output signal-to-noise ratio, to obtain the lowest transmission bit rate of the single external wireless acoustic sensor includes:

determining a noise characteristic clue by using the power spectral density of the binaural hearing aid, the power spectral density of the single external wireless acoustic sensor, the relative acoustic transfer function and the acoustic transfer function, and quantizing the noise characteristic clue to obtain a quantized noise characteristic clue;

determining a rate optimization model of the single external wireless acoustic sensor by restraining the output signal-to-noise ratio according to a preset lower bound of the expected output signal-to-noise ratio;

and inputting the quantized noise characteristic clues into a rate optimization model to obtain the minimum transmission bit rate of the single external wireless acoustic sensor.

According to an embodiment of the present invention, constructing an unconstrained binaural multi-channel wiener filter with weighted partial noise estimation terms from a binaural hearing aid according to the lowest transmission bit rate, and obtaining filter coefficients of the binaural multi-channel wiener filter with weighted partial noise estimation terms from the binaural multi-channel wiener filter and the noise covariance matrix comprises:

quantizing the audio signal transmitted by the single external wireless acoustic sensor by using the lowest transmission bit rate and wirelessly transmitting the quantized audio signal;

receiving the quantized audio signal by using a binaural hearing aid, and constructing an unconstrained binaural multichannel wiener filter model with a weighted part noise estimation term by a minimized speech distortion method and a weighted output noise power summation method;

obtaining a closed-form solution of the binaural multichannel wiener filter model by taking the derivative of the objective function of the binaural multichannel wiener filter model with respect to the filter coefficients of the binaural multichannel wiener filter model;

and inputting the noise covariance matrix into a closed solution to obtain the filter coefficients of the binaural multi-channel wiener filter with the weighted partial noise estimation terms.

According to an embodiment of the present invention, the binaural multi-channel wiener filter with weighted partial noise estimation terms described above includes a binaural multi-channel wiener filter with weighted partial noise estimation terms based on a full-rate single external acoustic sensor and a binaural multi-channel wiener filter with weighted partial noise estimation terms based on a limited-rate single external acoustic sensor.

According to an embodiment of the present invention, the binaural multi-channel wiener filter with weighted partial noise estimation term based on the full-rate single external acoustic sensor described above is determined by equation (1):

a binaural multi-channel wiener filter with weighted partial noise estimation terms based on a finite rate single external acoustic sensor is determined by equation (2):

wherein Tr denotes the matrix trace operation; the speech covariance matrix and the noise covariance matrix are the (M+1)-dimensional speech and noise covariance matrices of the microphone speech signals including the single external wireless acoustic sensor; e_i denotes an (M+1)-dimensional column vector; the quantized speech covariance matrix and the quantized noise covariance matrix are the (M+1)-dimensional quantized speech and noise covariance matrices of the microphone speech signals including the single external wireless acoustic sensor; L denotes the left-ear microphone and R denotes the right-ear microphone; and M is a positive integer.

According to an embodiment of the present invention, the performing frequency domain beamforming on binaural microphone speech using short-time inverse fourier transform according to the filter coefficients to obtain a binaural output speech signal, and outputting binaural spatial signature cues using a binaural multichannel wiener filter includes:

performing short-time Fourier transform on the noisy speech received by the binaural hearing aid to obtain column vector features of the noisy speech;

performing inner product operation on the column vector characteristics of the voice with noise and the filter coefficients to obtain binaural voice output signals, wherein the binaural voice output signals comprise left ear voice output signals and right ear voice output signals;

calculating a binaural spatial signature cue of the binaural speech output signal using the binaural multi-channel wiener filter, wherein the binaural spatial signature cue comprises an output noise inter-channel transfer function, an output noise channel time difference, and an output noise channel phase difference;

and determining a binaural noise source positioning algorithm according to the binaural spatial characteristic clues.

According to an embodiment of the present invention, the binaural wiener filtering method based on single external wireless acoustic sensor rate optimization further includes:

the performance of the binaural multi-channel wiener filter with weighted partial noise estimate term is determined using the output signal quality and/or noise spatial cue preservation accuracy.

According to an embodiment of the present invention, the above-mentioned lowest transmission bit rate is determined by the formula (3):

wherein A_e denotes the amplitude range of the microphone signal of the single external wireless acoustic sensor, δ_e denotes the uncorrelated noise variance of the microphone signal of the single external wireless acoustic sensor, and the remaining quantity is the sum of that uncorrelated noise variance and the quantization noise variance of the same microphone signal, representing the overall uncorrelated noise power of the single external wireless acoustic sensor.

According to a second aspect of the present invention, there is provided a binaural wiener filtering device based on single external wireless acoustic sensor rate optimization, comprising:

the parameter estimation module is used for carrying out parameter estimation on a noise frame by utilizing a sample moving average method to obtain a noise covariance matrix, and carrying out estimation on the noise covariance matrix by utilizing a covariance matrix difference method or a covariance matrix whitening method to obtain a relative acoustic transfer function, wherein the noise frame is obtained by processing multi-microphone noisy speech through an end point detector;

the rate optimization module is used for processing the noise covariance matrix and the relative acoustic transfer function by using the single external wireless acoustic sensor according to a preset lower bound of the expected output signal-to-noise ratio to obtain the lowest transmission bit rate of the single external wireless acoustic sensor;

the filter construction module is used for constructing an unconstrained binaural multi-channel wiener filter with a weighted part noise estimation item through the binaural hearing aid according to the lowest transmission bit rate, and obtaining the filter coefficient of the binaural multi-channel wiener filter with the weighted part noise estimation item according to the binaural multi-channel wiener filter and the noise covariance matrix;

and the beam forming module is used for carrying out frequency domain beam forming on the binaural microphone voice signals by utilizing short-time inverse Fourier transform according to the filter coefficients to obtain binaural output voice signals, and outputting binaural spatial characteristic clues by utilizing the binaural multichannel wiener filter.

The invention provides a binaural Wiener filtering speech enhancement method based on rate optimization of a single external acoustic sensor. The method has the following advantages. First, building on the conventional BMWF-PNE method, it improves the performance of the binaural hearing aid system in speech enhancement and noise spatial cue preservation by incorporating the audio data of an external wireless acoustic sensor. Second, the rate is optimized by minimizing the quantized transmission rate subject to a constraint on the speech enhancement signal-to-noise ratio, which effectively reduces the communication rate, saves the battery resources of the external sensor, and ensures the desired hearing perception of the hearing assistance system. Experimental results show that real multi-microphone audio data contains substantial redundant information. Optimizing the rate reduces this redundancy to a certain extent while maintaining near-optimal speech processing performance, and adjusting the quantization rate yields the speech enhancement and noise spatial cue preservation performance desired by the user, achieving the desired trade-off.

Drawings

Fig. 1 is a flow chart of a binaural wiener filtering method based on single external wireless acoustic sensor rate optimization in accordance with an embodiment of the invention;

Fig. 2 is a block diagram of a binaural wiener filtering method based on single external wireless acoustic sensor rate optimization in accordance with an embodiment of the invention;

Fig. 3 is a flow chart of obtaining a minimum transmission bit rate for a single external wireless acoustic sensor according to an embodiment of the present invention;

Fig. 4 is a flow chart of obtaining filter coefficients of a binaural multi-channel wiener filter with weighted partial noise estimate terms according to an embodiment of the invention;

Fig. 5 is a flowchart of deriving binaural output speech signals and binaural spatial signature cues according to an embodiment of the invention;

Fig. 6 is a schematic diagram of a binaural wiener filtering device based on single external wireless acoustic sensor rate optimization according to an embodiment of the invention;

Fig. 7 is an experimental view of an external wireless microphone based hearing aid according to an embodiment of the present invention;

Fig. 8 is a graph of output signal-to-noise ratio versus quantized transmission rate in accordance with an embodiment of the invention;

Fig. 9 is a graph of signal-to-noise ratio gain, ILD, and IPD errors versus parameters in accordance with an embodiment of the present invention;

Fig. 10 is a graph of input signal-to-noise ratio, signal-to-noise ratio gain, ILD and IPD error versus external microphone angle, according to an embodiment of the present invention.

Detailed Description

The present invention will be further described in detail below with reference to specific embodiments and with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent.

Wearing a hearing assistance device can improve the hearing of a hearing-impaired person. The device must simultaneously enhance the target speech, suppress interfering sound sources, and preserve the binaural cues of the noise sources so that the complete sound field information is recovered. Traditional hearing assistance speech enhancement techniques rely on binaural beamforming methods whose performance depends heavily on the size of the binaural microphone array, so binaural hearing assistance systems based on small microphone arrays have very limited spatial source perception performance. With the popularity of wireless electronic devices, the data resources of the hearing assistance system can be increased by wirelessly sharing the audio data of external wireless acoustic sensors. The invention therefore provides a binaural Wiener filtering method based on a single external wireless acoustic sensor, which reduces the battery energy consumption of the acoustic sensor by optimizing the transmission rate subject to a constraint on binaural speech enhancement performance, so as to achieve the desired speech enhancement target and improve the spatial audio perception performance of the hearing assistance system.

Conventional binaural hearing aids typically employ a bilateral structure, i.e. a speech enhancement algorithm is designed independently on each of the two hearing aids (each typically carrying a small array of 2-4 microphones). Although the target sound source is well enhanced and the noise sources are suppressed, the output spatial cues of the noise sources are severely distorted. In recent years, with the rapid development of wireless technology, hearing aids have gained wireless transmission capability; if the two hearing aids can communicate with each other, a binaural speech enhancement method can be designed by means of data sharing. Representative methods include binaural multichannel Wiener filtering (BMWF), the binaural minimum variance distortionless response beamformer (BMVDR), and the binaural linearly constrained minimum variance beamformer (BLCMV). The BMWF and BMVDR methods design a filter on each of the two hearing aids and achieve better noise reduction than the bilateral structure, but the output noise is perceived as coming from the direction of the target sound source. BLCMV obtains undistorted noise spatial cues by adding linear constraints on those cues to BMVDR. Studies have shown that the BMVDR and BLCMV methods are more effective for point sources, whereas the actual noise field is typically diffuse (a diffuse noise field), which requires the use of BMWF techniques.

It can be seen that the conventional BMWF, BMVDR and BLCMV approaches do not offer controllable speech enhancement or spatial cue preservation. Because the degree of hearing loss and the listening habits differ between users, adjustable hearing assistance performance is desirable. To this end, a BMWF-PNE method can be designed by adding a weighted partial noise estimation (PNE) term to the cost function of the BMWF. The weighting factor eta controls the reference noise component included in the output speech: increasing the weighting factor retains more of the noise component and therefore gives better spatial cue preservation, whereas decreasing it gives better speech enhancement at the cost of spatial cue preservation accuracy.

Although BMWF-PNE offers more flexible performance tuning than BMWF in diffuse noise fields, the upper bound on its speech enhancement and spatial cue preservation performance is low because the number of binaural microphones is very limited. With the widespread use of wireless electronic devices in recent years, hearing aid users often carry devices such as mobile phones and computers that can both capture sound and transmit wirelessly. If the audio data of these external devices is shared with the hearing aids, the data resources of the hearing assistance system increase and binaural hearing perception improves. On the other hand, external wireless devices have limited battery resources, and battery usage depends strongly on the data transmission rate. A BMWF-PNE binaural speech enhancement method based on a remote microphone has been designed, but it assumes an ideal lossless microphone signal, which means that the external microphone signal would have to be quantized and transmitted at an infinite communication rate, severely draining the battery of the external sensor.

Therefore, to address the insufficient sound source perception performance of traditional binaural hearing assistance systems, the invention expands the hearing perception space by sharing the audio data of a single external wireless acoustic sensor with the hearing assistance system, and establishes a BMWF-PNE method based on the single external wireless acoustic sensor by optimizing the communication transmission rate subject to a constraint on the speech enhancement performance, thereby saving the battery resources of the external acoustic sensor (prolonging its service life) while meeting the desired speech enhancement target. Experimental results show that the method can use a lower communication rate to achieve performance comparable to full-rate transmission schemes.

Fig. 1 is a flow chart of a binaural wiener filtering method based on single external wireless acoustic sensor rate optimization according to an embodiment of the invention.

As shown in fig. 1, the binaural wiener filtering method based on the single external wireless acoustic sensor rate optimization includes operations S110 to S140.

In operation S110, a parameter estimation is performed on a noise frame by using a sample moving average method to obtain a noise covariance matrix, and the noise covariance matrix is estimated by using a covariance matrix difference method or a covariance matrix whitening method to obtain a relative acoustic transfer function, wherein the noise frame is obtained by processing multi-microphone noisy speech by using an end point detector.

Since the noise covariance matrix and the (relative) acoustic transfer function are necessary parameters of the BMWF-PNE binaural beamforming method, they are estimated first. In a stationary noise scenario (e.g., a diffuse noise field) the noise covariance matrix may be estimated by the sample correlation matrix, i.e. by a sample moving average (average smoothing) over the noise frames. The relative acoustic transfer function can be estimated by either the covariance matrix difference method or the covariance matrix whitening method, because the speech covariance matrix equals the mixed covariance matrix minus the noise covariance matrix, and the relative acoustic transfer function of a single sound source is the normalized principal eigenvector of the speech covariance matrix, which also equals the normalized generalized principal eigenvector of the mixed covariance matrix and the noise covariance matrix.
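The two estimation routes described above can be illustrated with a minimal NumPy sketch for a single frequency bin. The function names, the choice of microphone 0 as the reference, and the plain sample average are assumptions made for illustration; they are not taken from the patent.

```python
import numpy as np

def sample_covariance(frames):
    """Sample average of y y^H over STFT frames of one frequency bin.
    frames: complex array of shape (num_frames, num_mics)."""
    return frames.T @ frames.conj() / frames.shape[0]

def rtf_covariance_subtraction(R_y, R_n, ref=0):
    """RTF as the normalized principal eigenvector of R_x = R_y - R_n."""
    R_x = R_y - R_n
    _, vecs = np.linalg.eigh(R_x)      # eigenvalues in ascending order
    v = vecs[:, -1]                    # principal eigenvector
    return v / v[ref]                  # normalize to the reference microphone

def rtf_covariance_whitening(R_y, R_n, ref=0):
    """RTF via whitening: R_n = L L^H, principal eigenvector of
    L^{-1} R_y L^{-H}, then de-whitening with L."""
    L = np.linalg.cholesky(R_n)
    X = np.linalg.solve(L, R_y)                 # L^{-1} R_y
    R_w = np.linalg.solve(L, X.conj().T)        # L^{-1} R_y L^{-H} (Hermitian)
    _, vecs = np.linalg.eigh(R_w)
    a = L @ vecs[:, -1]                         # back to the microphone domain
    return a / a[ref]
```

With exact covariance matrices of a single source both functions return the same vector; with finite data the whitening route is usually preferred when the noise is spatially correlated, although that is a general observation rather than a statement from the source.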

In operation S120, according to a preset lower bound of the expected output signal-to-noise ratio, the noise covariance matrix and the relative acoustic transfer function are processed by using the single external wireless acoustic sensor, so as to obtain the lowest transmission bit rate of the single external wireless acoustic sensor.

Using the parameter estimates obtained in operation S110 and a given lower bound on the expected output signal-to-noise ratio, a rate optimization problem with a quadratic constraint is established by minimizing the transmission rate (equivalently, minimizing the transmission energy consumption). The problem can be converted into finding the roots of a univariate quadratic equation, which yields the lowest transmission bit rate.
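The idea of operation S120 can be sketched numerically. The sketch below does not reproduce the closed-form root of formula (3). It assumes a standard uniform quantizer model (step A_e / 2^b, quantization noise variance step^2 / 12 added to the external microphone's diagonal entry of the noise covariance matrix), assumes the single-source output SNR expression sigma_s^2 a^H R_n^{-1} a, places the external sensor at the last index, and searches integer bit depths instead of solving the quadratic equation. All names are hypothetical.

```python
import numpy as np

def output_snr(a, sigma_s2, R_n):
    """Output SNR of a multichannel Wiener filter for one point source
    (assumed model): sigma_s^2 * a^H R_n^{-1} a."""
    return float(np.real(sigma_s2 * (a.conj() @ np.linalg.solve(R_n, a))))

def min_bit_rate(a, sigma_s2, R_n, amp_range, snr_lower_bound, max_bits=32):
    """Smallest number of bits per sample for the external sensor such that
    the output SNR stays at or above the lower bound; None if infeasible."""
    ext = len(a) - 1                       # external sensor assumed last
    for bits in range(1, max_bits + 1):
        step = amp_range / (2 ** bits)     # uniform quantizer step size
        R_q = np.array(R_n, dtype=complex) # copy, then add quantization noise
        R_q[ext, ext] += step ** 2 / 12.0
        if output_snr(a, sigma_s2, R_q) >= snr_lower_bound:
            return bits
    return None
```

Because the quantization noise only lowers the output SNR, a lower bound above the full-rate SNR is infeasible at any rate, which is why the function can return None.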

In operation S130, an unconstrained binaural multi-channel wiener filter with weighted partial noise estimation terms is constructed by the binaural hearing aid based on the lowest transmission bit rate, and filter coefficients of the binaural multi-channel wiener filter with weighted partial noise estimation terms are obtained based on the binaural multi-channel wiener filter and the noise covariance matrix.

The audio signal of the external acoustic sensor is uniformly quantized and wirelessly transmitted at the transmission rate obtained in operation S120. At the binaural hearing aid, an unconstrained BMWF-PNE filter design criterion is established as a weighted sum of the speech distortion and the output noise power. A closed-form solution of the BMWF-PNE filter is obtained by taking the derivative of the objective function with respect to the filter coefficients and setting it to zero, and the covariance matrices are then substituted into the closed-form solution to obtain the filter coefficients.
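As an illustration of such a closed-form solution, the sketch below uses the partial noise estimation cost commonly found in the literature, E|x_ref + eta * n_ref - w^H y|^2, whose minimizer is w = R_y^{-1} (R_x + eta * R_n) e_ref. This is an assumed formulation and may differ from equations (1) and (2) of the patent; in the limited-rate case the covariance matrices would be the quantized ones. The function name and arguments are hypothetical.

```python
import numpy as np

def bmwf_pne_filter(R_x, R_n, ref, eta):
    """Filter for one ear under the assumed PNE cost.
    R_x, R_n: (M+1)x(M+1) speech and noise covariance matrices.
    ref: index of that ear's reference microphone.
    eta: weighting factor in [0, 1] controlling the retained noise."""
    R_y = R_x + R_n                        # mixed covariance matrix
    e_ref = np.zeros(R_y.shape[0])
    e_ref[ref] = 1.0                       # selects the reference microphone
    return np.linalg.solve(R_y, (R_x + eta * R_n) @ e_ref)

# Left and right filters differ only in the reference microphone, e.g.
# w_L = bmwf_pne_filter(R_x, R_n, ref=0, eta=0.2)
# w_R = bmwf_pne_filter(R_x, R_n, ref=1, eta=0.2)
```

Under this formulation the filter equals (1 - eta) times the plain Wiener filter plus eta times the selection vector, so eta = 0 gives maximum noise reduction and eta = 1 passes the reference microphone through unchanged, matching the trade-off described for the weighting factor.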

In operation S140, beam forming is performed using a short-time fourier transform according to the filter coefficients, resulting in binaural output speech signals, and binaural spatial signature cues are output using binaural multi-channel wiener filters.

The (M+1)-dimensional left-ear and right-ear beamformers obtained in operation S130 are each applied, as an inner product in the short-time frequency domain, to the M+1 microphone signals available at the left and right ears to obtain the left-ear and right-ear output speech signals. In addition, the output signal-to-noise ratio and the noise spatial cues can be calculated by applying the binaural filters to the speech and noise covariance matrices, and a binaural sound source localization algorithm can be designed based on these cues.
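The inner-product beamforming and the cue computation can be sketched for one frequency bin as follows. The cue definitions used here (ITF as the output noise cross-power over the right-ear output noise power, ILD as the ratio of output noise powers, IPD as the phase of the cross-power) are the usual textbook definitions and are assumed rather than quoted from the patent; the function names are hypothetical.

```python
import numpy as np

def beamform(w, Y):
    """Apply a filter to STFT frames of one frequency bin.
    Y: complex array (num_frames, num_mics); returns w^H y for each frame."""
    return Y @ w.conj()

def output_noise_cues(w_L, w_R, R_n):
    """Spatial cues of the noise at the binaural outputs (assumed definitions)."""
    cross = w_L.conj() @ R_n @ w_R               # E{n_L,out * conj(n_R,out)}
    p_L = np.real(w_L.conj() @ R_n @ w_L)        # left output noise power
    p_R = np.real(w_R.conj() @ R_n @ w_R)        # right output noise power
    itf = cross / p_R                            # interaural transfer function
    ild = p_L / p_R                              # interaural level difference
    ipd = np.angle(cross)                        # interaural phase difference
    return itf, ild, ipd
```

Passing selection vectors for the two reference microphones as w_L and w_R yields the input cues, so the same function can be used to measure cue preservation errors between input and output.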


Fig. 2 is a block diagram of a binaural wiener filtering method based on single external wireless acoustic sensor rate optimization in accordance with an embodiment of the invention.

The binaural wiener filtering method based on single external wireless acoustic sensor rate optimization provided by the present invention is described in further detail below in conjunction with fig. 2.

As shown in fig. 2, the input of the end-point detector (VAD, voice activity detector) is the multi-microphone noisy speech signal, and its output is the noise frames and the noise-plus-speech frames. A moving average technique is applied to the noise frames and the noise-plus-speech frames to estimate the noise covariance matrix and the mixed covariance matrix, respectively. Based on the estimated covariance matrices, the relative acoustic transfer function estimation module uses the covariance matrix whitening method to estimate the relative acoustic transfer functions of multiple sound sources. The noise covariance matrix and the relative acoustic transfer function are input to the external wireless acoustic sensor rate optimization module to obtain the minimum quantization rate. The external microphone signal is then uniformly quantized and wirelessly transmitted at that rate. After the hearing aid receives the external microphone signal, the BMWF-PNE filter is designed. Finally, the binaural microphone signals are beamformed to obtain the binaural output audio signals, the output noise spatial cues (used for the subsequent design of a sound source localization algorithm) are calculated, and the speech enhancement performance and the noise spatial cue preservation errors are evaluated.
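The uniform quantization step in this pipeline can be sketched as below. The mid-rise quantizer, the symmetric range [-A_e/2, A_e/2], and the function name are assumptions for illustration; the patent text only states that the external microphone signal is uniformly quantized at the optimized rate.

```python
import numpy as np

def uniform_quantize(x, bits, amp_range):
    """Mid-rise uniform quantizer with 2**bits levels over
    [-amp_range / 2, amp_range / 2] (assumed signal range)."""
    step = amp_range / (2 ** bits)
    half_levels = 2 ** (bits - 1)
    # Index of the quantization cell, clipped to the representable range.
    idx = np.clip(np.floor(np.asarray(x, dtype=float) / step),
                  -half_levels, half_levels - 1)
    return (idx + 0.5) * step              # reconstruct at the cell centre
```

For in-range samples the quantization error is at most half a step, which is consistent with the step^2 / 12 noise variance model often used for uniform quantizers.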

Fig. 3 is a flow chart of obtaining a minimum transmission bit rate for a single external wireless acoustic sensor according to an embodiment of the present invention.

As shown in fig. 3, the above-mentioned processing the noise covariance matrix and the relative acoustic transfer function by using the single external wireless acoustic sensor according to the preset lower bound of the desired output signal-to-noise ratio, and obtaining the lowest transmission bit rate of the single external wireless acoustic sensor includes operations S310 to S330.

In operation S310, a noise feature cue is determined using the power spectral density of the binaural hearing aid, the power spectral density of the single external wireless acoustic sensor, the relative acoustic transfer function, and the noise feature cue is quantized to obtain a quantized noise feature cue.

In operation S320, a rate optimization model of the single external wireless acoustic sensor is determined by constraining the output signal-to-noise ratio according to a preset lower bound of the desired output signal-to-noise ratio.

In operation S330, the quantized noise signature cues are input to a rate optimization model, resulting in a minimum transmission bit rate for the single external wireless acoustic sensor.

Fig. 4 is a flow chart of obtaining filter coefficients for a binaural multi-channel wiener filter with weighted partial noise estimate terms according to an embodiment of the invention.

As shown in fig. 4, constructing, through the binaural hearing aid and according to the lowest transmission bit rate, the unconstrained binaural multi-channel wiener filter with weighted partial noise estimation terms, and obtaining the filter coefficients of that filter according to the binaural multi-channel wiener filter and the noise covariance matrix, as described above, comprises operations S410 to S440.

In operation S410, the audio signal of the single external wireless acoustic sensor is quantized at the lowest transmission bit rate, and the quantized audio signal is wirelessly transmitted.

In operation S420, the quantized audio signal is received with the binaural hearing aid, and an unconstrained binaural multi-channel wiener filter model with weighted partial noise estimation terms is constructed as a weighted sum of the speech distortion to be minimized and the output noise power.

In operation S430, a closed-form solution of the binaural multi-channel wiener filter model is obtained by taking the derivative of the objective function of the model with respect to the filter coefficients and setting that derivative to zero.

In operation S440, the noise covariance matrix is input into the closed-form solution, resulting in the filter coefficients of the binaural multi-channel wiener filter with weighted partial noise estimation terms.

where Tr denotes the matrix trace operation; $\mathbf{R}_{x_e x_e}$ denotes the (M+1)-dimensional speech covariance matrix of the microphone speech signal that includes the single external wireless acoustic sensor; $\mathbf{R}_{n_e n_e}$ denotes the corresponding (M+1)-dimensional noise covariance matrix; $\mathbf{e}_i$ denotes an (M+1)-dimensional column vector; $\mathbf{R}_{x_{eq} x_{eq}}$ and $\mathbf{R}_{n_{eq} n_{eq}}$ denote the (M+1)-dimensional quantized speech covariance matrix and quantized noise covariance matrix of the microphone speech signal that includes the single external wireless acoustic sensor; L denotes the left ear microphone and R denotes the right ear microphone; and M is a positive integer.

Fig. 5 is a flow chart of deriving binaural output speech signals and binaural spatial signature cues according to an embodiment of the invention.

As shown in fig. 5, performing frequency domain beamforming on the binaural microphone voice signal by using short-time inverse fourier transform according to the filter coefficients to obtain a binaural output voice signal, and outputting binaural spatial signature cues by using the binaural multi-channel wiener filter includes operations S510 to S540.

In operation S510, a short-time Fourier transform is performed on the noisy speech received by the binaural hearing aid to obtain column vector features of the noisy speech.

In operation S520, an inner product operation is performed on the column vector features of the noisy speech and the filter coefficients to obtain binaural speech output signals, wherein the binaural speech output signals include a left ear speech output signal and a right ear speech output signal.

In operation S530, binaural spatial signature cues of the binaural speech output signal are calculated using the binaural multi-channel wiener filter, wherein the binaural spatial signature cues comprise an output noise inter-channel transfer function (ITF), an output noise inter-channel level difference (ILD), and an output noise inter-channel phase difference (IPD).

In operation S540, a binaural noise source localization algorithm is determined from the binaural spatial signature cues.
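As an illustration only, the inner product of operation S520 can be sketched in Python for a single time-frequency bin. The sketch assumes the filter is applied as $\mathbf{w}^H\mathbf{y}$ (coefficients conjugated); the function name `apply_filter` and the three-channel values are hypothetical and are not taken from the embodiment.

```python
def apply_filter(w, y):
    """Return the inner product w^H y for one time-frequency bin."""
    return sum(wk.conjugate() * yk for wk, yk in zip(w, y))


# Hypothetical noisy STFT coefficients of three channels in one bin.
y = [1 + 1j, 0.5 - 0.2j, -0.3 + 0.8j]

# Hypothetical filters: each one simply selects one channel.
w_left = [1 + 0j, 0j, 0j]
w_right = [0j, 0j, 1 + 0j]

z_left = apply_filter(w_left, y)    # left ear output for this bin
z_right = apply_filter(w_right, y)  # right ear output for this bin
```

Repeating this for every frame and frequency, followed by an inverse short-time Fourier transform, would yield the time-domain left and right outputs.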

where $A_e$ denotes the amplitude range of the microphone signal of the single external wireless acoustic sensor, $\delta_e$ denotes the uncorrelated noise variance of that microphone signal, and $\delta_{eq}$, the sum of $\delta_e$ and the quantization noise variance of that microphone signal, denotes the overall uncorrelated noise power of the single external wireless acoustic sensor.

In order to better understand the technical solutions provided by the present invention for those skilled in the art, the technical solutions provided by the present invention are described in further detail below with reference to the specific embodiments.

(1) Signal model

Consider a binaural hearing aid system in which each hearing aid comprises M/2 microphones, for M microphones in total. In the short-time frequency domain, with l and ω denoting the frame index and the frequency index, respectively, the noisy speech signal collected by the k-th microphone can be represented by equation (4):

$$Y_k(l,\omega) = X_k(l,\omega) + N_k(l,\omega), \quad k = 1, \dots, M \tag{4}$$

where $X_k(l,\omega)$ and $N_k(l,\omega)$ represent the target signal component and the noise component on the k-th microphone, respectively. The short-time Fourier transform domain signals of the M microphones are stacked into a column vector $\mathbf{y} = [Y_1(l,\omega), Y_2(l,\omega), \dots, Y_M(l,\omega)]^T$. The other variables can be written in column vector form in the same way, so that the signal model can be represented by equation (5):

$$\mathbf{y} = \mathbf{x} + \mathbf{n} \in \mathbb{C}^{M} \tag{5}$$

where $\mathbf{x}$ represents the speech column vector, $\mathbf{n}$ represents the noise column vector, $\mathbb{C}^{M}$ represents the set of M-dimensional complex vectors, and the time-frequency index (l, ω) is omitted for ease of expression. Assuming that the target sound source and the noise component are uncorrelated, the covariance matrix of the noisy microphone signals can be written as the sum of the speech covariance matrix and the noise covariance matrix, as shown in equation (6):

$$\mathbf{R}_{yy} = \varepsilon[\mathbf{y}\mathbf{y}^H] = \varepsilon[\mathbf{x}\mathbf{x}^H] + \varepsilon[\mathbf{n}\mathbf{n}^H] = \mathbf{R}_{xx} + \mathbf{R}_{nn} \tag{6}$$

where H denotes the conjugate transpose. The speech covariance matrix is shown in formula (7):

$$\mathbf{R}_{xx} = \sigma_s^2\,\mathbf{a}\mathbf{a}^H = \sigma_{X_i}^2\,\mathbf{h}_i\mathbf{h}_i^H, \quad i \in \{L, R\} \tag{7}$$

where $\mathbf{a}$ and $\mathbf{h}_i$ denote the acoustic transfer function (ATF) and the relative acoustic transfer function (RTF), respectively; $\sigma_s^2$ denotes the power spectral density of the target sound source; $\sigma_{X_i}^2$ denotes the power spectral density of the target sound source component on the left or right ear reference microphone; $\mathbf{R}_{nn}$ denotes the noise covariance matrix; $\mathbf{R}_{xx}$ denotes the speech covariance matrix; ε denotes the averaging (expectation) operation; and L and R are the reference microphone indexes of the left and right ears. Using a voice activity detector (VAD, i.e. endpoint detection) module, the noisy microphone signal can be separated into noise frames and speech-plus-noise frames, over which $\mathbf{R}_{nn}$ and $\mathbf{R}_{yy}$ are respectively estimated using a moving average technique, as shown in formulas (8) and (9):

$$\hat{\mathbf{R}}_{nn} = \frac{1}{L_n}\sum_{l=1}^{L_n}\mathbf{n}(l)\,\mathbf{n}^H(l) \tag{8}$$

$$\hat{\mathbf{R}}_{yy} = \frac{1}{L_y}\sum_{l=1}^{L_y}\mathbf{y}(l)\,\mathbf{y}^H(l) \tag{9}$$

In addition, the speech power spectral density (formula (10)) and the noise power spectral density (formula (11)) can be calculated from the covariance matrices:

$$\sigma_{X_i}^2 = \mathbf{e}_i^T\,\mathbf{R}_{xx}\,\mathbf{e}_i = \mathbf{e}_i^T(\mathbf{R}_{yy} - \mathbf{R}_{nn})\,\mathbf{e}_i \tag{10}$$

$$\sigma_{N_i}^2 = \mathbf{e}_i^T\,\mathbf{R}_{nn}\,\mathbf{e}_i \tag{11}$$

where $\mathbf{e}_i$ is a column vector whose i-th element is 1 and whose other elements are 0, $L_y$ represents the number of noisy speech frames, and $L_n$ represents the number of noise frames.
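The estimation procedure described above (VAD split, frame averaging, covariance subtraction, PSD read-out) can be sketched as follows. This is a minimal Python illustration under stated assumptions: a plain sample average stands in for the moving average, the VAD decisions are given as a list of booleans, and the names `sample_covariance` and `estimate_statistics` as well as the toy frames are hypothetical.

```python
def sample_covariance(frames):
    """Average of u u^H over a list of frames; each frame is a list of complex numbers."""
    m = len(frames[0])
    acc = [[0j] * m for _ in range(m)]
    for u in frames:
        for r in range(m):
            for c in range(m):
                acc[r][c] += u[r] * u[c].conjugate()
    return [[v / len(frames) for v in row] for row in acc]


def estimate_statistics(frames, vad_flags, ref):
    """Split frames by VAD flag, then estimate R_nn, R_yy, R_xx and the reference PSDs."""
    noise_frames = [f for f, speech in zip(frames, vad_flags) if not speech]
    mixed_frames = [f for f, speech in zip(frames, vad_flags) if speech]
    r_nn = sample_covariance(noise_frames)
    r_yy = sample_covariance(mixed_frames)
    m = len(r_nn)
    # Speech and noise are assumed uncorrelated, so R_xx = R_yy - R_nn.
    r_xx = [[r_yy[r][c] - r_nn[r][c] for c in range(m)] for r in range(m)]
    # e_ref^T R e_ref simply picks the ref-th diagonal entry of R.
    return r_nn, r_yy, r_xx, r_xx[ref][ref].real, r_nn[ref][ref].real


# Toy two-microphone data: two noise-only frames, then two speech-plus-noise frames.
frames = [[1 + 0j, 0j], [0j, 1 + 0j], [2 + 0j, 0j], [0j, 2 + 0j]]
vad_flags = [False, False, True, True]
r_nn, r_yy, r_xx, psd_speech, psd_noise = estimate_statistics(frames, vad_flags, ref=0)
```

In practice the averages would be updated recursively per frequency bin, and the subtraction may need safeguarding so that the estimated speech covariance stays positive semidefinite.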

(2) BMWF-PNE based on full-rate single external acoustic sensor

Assuming that the external acoustic sensor transmits the acquired sound signal to the binaural hearing aid in a full-rate lossless manner, the hearing aid system has an (M+1)-dimensional microphone signal $\mathbf{y}_e = [\mathbf{y}^T, Y_e]^T$. Similarly, $\mathbf{x}_e = [\mathbf{x}^T, X_e]^T$, $\mathbf{n}_e = [\mathbf{n}^T, N_e]^T$ and $\mathbf{a}_e = [\mathbf{a}^T, a_e]^T$ can be defined, and the corresponding covariance matrices have the form shown in formula (12):

where $Y_e$ represents the noisy speech signal of the external microphone, $X_e$ represents the clean speech signal of the external microphone, $N_e$ represents the noise signal of the external microphone, $a_e$ represents the acoustic transfer function from the sound source to the external microphone, and $\mathbf{R}_{x_e x_e}$, $\mathbf{R}_{y_e y_e}$ and $\mathbf{R}_{n_e n_e}$ represent the (M+1)-dimensional speech covariance matrix, noisy speech covariance matrix and noise covariance matrix, respectively, that include the external microphone (i.e., the single external wireless acoustic sensor).

The BMWF-PNE method designs the filter using a criterion that minimizes the weighted sum of the speech distortion and the output noise power, as shown in equation (13):

$$\mathbf{w}_i = \arg\min_{\mathbf{w}}\ \varepsilon\{|X_i - \mathbf{w}^H\mathbf{x}_e|^2\} + \mu\,\varepsilon\{|\eta N_i - \mathbf{w}^H\mathbf{n}_e|^2\}, \quad i \in \{L, R\} \tag{13}$$

The optimal solution of the above unconstrained optimization problem is shown in equation (14):

$$\mathbf{w}_i = (1-\eta)\,\frac{\mathbf{R}_{n_e n_e}^{-1}\mathbf{R}_{x_e x_e}\mathbf{e}_i}{\mu + \mathrm{Tr}\left(\mathbf{R}_{n_e n_e}^{-1}\mathbf{R}_{x_e x_e}\right)} + \eta\,\mathbf{e}_i \tag{14}$$

where Tr represents the matrix trace operation, μ represents the speech distortion weighting factor, and η represents the parameter that balances BMWF speech enhancement against binaural spatial cue preservation. When η = 0, the BMWF-PNE is equivalent to the conventional BMWF filter, and it can be proved that the output noise cues are then equal to the cues of the target sound source; when η = 1, the output signal of the filter is equal to the reference microphone signal, which is equivalent to no processing, so the noise spatial cues are fully preserved. Thus, η controls the trade-off between speech enhancement and noise spatial cue preservation: the larger η is, the poorer the noise reduction performance but the higher the noise cue preservation accuracy; the smaller η is, the better the noise reduction performance but the larger the noise cue preservation error. For simplicity, let $\lambda = \mathrm{Tr}\left(\mathbf{R}_{n_e n_e}^{-1}\mathbf{R}_{x_e x_e}\right)$, where λ represents the normalized output signal-to-noise ratio.
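The following Python sketch illustrates the closed-form solution of the BMWF-PNE criterion on a toy three-channel example. It assumes the partial noise estimate term is $\eta N_i$, so that setting the gradient to zero gives $(\mathbf{R}_{xx} + \mu\mathbf{R}_{nn})\mathbf{w} = (\mathbf{R}_{xx} + \mu\eta\mathbf{R}_{nn})\mathbf{e}_i$, and it assumes a rank-one speech covariance matrix for the trace form. The helper names and all numerical values are hypothetical.

```python
def solve(a, b):
    """Solve a x = b for a small complex system by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


def bmwf_pne(r_xx, r_nn, ref, mu, eta):
    """General form: w = (R_xx + mu R_nn)^{-1} (R_xx + mu*eta*R_nn) e_ref."""
    n = len(r_xx)
    lhs = [[r_xx[r][c] + mu * r_nn[r][c] for c in range(n)] for r in range(n)]
    rhs = [r_xx[r][ref] + mu * eta * r_nn[r][ref] for r in range(n)]
    return solve(lhs, rhs)


def bmwf_pne_trace_form(r_xx, r_nn, ref, mu, eta):
    """Rank-one form: w = (1-eta) R_nn^{-1} R_xx e_ref / (mu + Tr(R_nn^{-1} R_xx)) + eta e_ref."""
    n = len(r_xx)
    # Column c of R_nn^{-1} R_xx, obtained by solving R_nn z = (column c of R_xx).
    cols = [solve(r_nn, [r_xx[r][c] for r in range(n)]) for c in range(n)]
    lam = sum(cols[c][c] for c in range(n)).real
    return [(1 - eta) * cols[ref][r] / (mu + lam) + (eta if r == ref else 0.0)
            for r in range(n)]


# Toy data: two hearing aid channels plus one external channel.
a = [1 + 0j, 0.5j, 0.8 + 0j]           # hypothetical ATF vector
sigma_s2 = 2.0                          # hypothetical source PSD
r_xx = [[sigma_s2 * ai * aj.conjugate() for aj in a] for ai in a]
r_nn = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.1j], [0.0, -0.1j, 0.5]]

w_general = bmwf_pne(r_xx, r_nn, ref=0, mu=1.0, eta=0.3)
w_trace = bmwf_pne_trace_form(r_xx, r_nn, ref=0, mu=1.0, eta=0.3)
w_eta1 = bmwf_pne(r_xx, r_nn, ref=0, mu=1.0, eta=1.0)
```

For a rank-one speech covariance matrix the two forms agree, and with η = 1 the filter reduces to the selection vector of the reference microphone, which matches the no-processing behaviour described above.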

(3) BMWF-PNE based on limited-rate single external acoustic sensor

Assuming that the external acoustic sensor uniformly quantizes each sample using b bits, the quantization noise variance is as shown in equation (15):

$$\sigma_q^2 = \frac{\Delta^2}{12}, \qquad \Delta = \frac{A_e}{2^{b}} \tag{15}$$

where $A_e$ represents the amplitude range of the external microphone signal. If b bits per sample are used for lossless transmission, it can be shown that the transmission energy consumption grows exponentially with the bit rate; that is, the higher the rate, the higher the transmission energy consumption [11].
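As a numerical illustration of the uniform quantization noise model, the Python sketch below assumes the textbook case: a mid-rise quantizer with step $A_e/2^b$ over the range $[-A_e/2, A_e/2)$ and a noise variance of one twelfth of the squared step. The function names and the test signal (a dense ramp across the full range) are hypothetical.

```python
import math


def quant_noise_variance(amplitude_range, bits):
    """Predicted variance of uniform quantization noise: step^2 / 12."""
    step = amplitude_range / (2 ** bits)
    return step ** 2 / 12.0


def uniform_quantize(x, amplitude_range, bits):
    """Mid-rise uniform quantizer for x in [-A/2, A/2)."""
    step = amplitude_range / (2 ** bits)
    return (math.floor(x / step) + 0.5) * step


A, b, n = 2.0, 4, 2 ** 16
# Dense ramp covering the full amplitude range, so the error is spread evenly per cell.
samples = [-A / 2 + A * (k + 0.5) / n for k in range(n)]
errors = [x - uniform_quantize(x, A, b) for x in samples]
measured = sum(e * e for e in errors) / n
predicted = quant_noise_variance(A, b)
```

Each additional bit halves the step and therefore reduces the predicted noise variance by a factor of four (about 6 dB).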

Let Q represent the quantization operation and define $\mathbf{y}_{eq} = [\mathbf{y}^T, Q(Y_e)]^T$. Similarly, $\mathbf{x}_{eq} = [\mathbf{x}^T, Q(X_e)]^T$ and $\mathbf{n}_{eq} = [\mathbf{n}^T, Q(N_e)]^T$ can be defined, where $\mathbf{y}_{eq}$ represents the quantized noisy speech column vector, $\mathbf{x}_{eq}$ represents the quantized clean speech column vector, $\mathbf{n}_{eq}$ represents the quantized noise column vector, and $\mathbf{R}_{y_{eq} y_{eq}}$ and $\mathbf{R}_{n_{eq} n_{eq}}$ represent the quantized noisy speech covariance matrix and the quantized noise covariance matrix, respectively, of the microphone speech signal that includes the single external acoustic sensor (i.e., the external microphone). Then, in the same way as for the BMWF-PNE filter above, the BMWF-PNE expression based on a limited-rate single external acoustic sensor can be obtained, as shown in equation (16):

$$\mathbf{w}_{i} = (1-\eta)\,\frac{\mathbf{R}_{n_{eq} n_{eq}}^{-1}\mathbf{R}_{x_{eq} x_{eq}}\mathbf{e}_i}{\mu + \mathrm{Tr}\left(\mathbf{R}_{n_{eq} n_{eq}}^{-1}\mathbf{R}_{x_{eq} x_{eq}}\right)} + \eta\,\mathbf{e}_i \tag{16}$$

Let $\lambda_{eq} = \mathrm{Tr}\left(\mathbf{R}_{n_{eq} n_{eq}}^{-1}\mathbf{R}_{x_{eq} x_{eq}}\right)$, where $\lambda_{eq}$ represents the quantized normalized output signal-to-noise ratio. It can be proved that $\lambda_{eq} \le \lambda$, with equality holding only when b tends to infinity.

(4) Binaural speech enhancement performance assessment index

To evaluate the designed binaural filter, an input signal-to-noise ratio (SNR) is defined as shown in equation (17):

The output signal-to-noise ratio can be calculated by equation (18) as follows:

It can be shown that the output SNR is a monotonically increasing function of $\lambda_{eq}$; that is, using low-rate transmission sacrifices some of the speech enhancement SNR. In addition, the input inter-channel transfer function (ITF) of the target sound source is defined as shown in formula (19):

The input ILD and IPD of the target sound source are the amplitude response and the phase response of the ITF, respectively. The output ITF of the sound source is calculated by equation (20) as follows:

Thus, the spatial cues of the target sound source are entirely preserved by the BMWF-PNE filter. Similarly, the noise input ITF is defined as shown in equation (21):

The output noise ITF can be calculated using formula (22):

The output noise ILD and IPD can be calculated similarly. It can be shown that when η = 1, the input and output ITFs of the noise are equal, that is, the noise spatial cues are completely preserved, and that reducing η increases the noise spatial cue preservation error.

(5) External wireless acoustic sensor rate optimization

In the presence of a single interfering sound source, let $\mathbf{v}_e$ denote the acoustic transfer function of the interference source, and let $\delta$ and $\delta_e$ denote the uncorrelated noise variances on the binaural microphones and on the external microphone, respectively. It is assumed here that the uncorrelated noise on all binaural microphones has the same variance (i.e., power spectral density), because the microphones are very close together (the microphone spacing in real hearing aids is small). Equation (23) can then be derived:

where * denotes complex conjugation, and the inequality shown in formula (24) can be obtained:

where $\delta_{eq}$, the sum of $\delta_e$ and the quantization noise variance, represents the overall uncorrelated noise power of the external microphone. From this it can be seen that $\lambda_{eq}$ is monotonically decreasing with respect to $\delta_{eq}$ and monotonically increasing with respect to the bit rate. On this basis, minimizing the transmission energy consumption of the external acoustic sensor is equivalent to minimizing its transmission rate, and optimizing the transmission rate under a constraint on the output signal-to-noise ratio can be described by the optimization problem shown in equation (25):

where β is as shown in formula (26):

where $\mathrm{SNR}_{\max}$ and $\mathrm{SNR}_{\min}$ represent the maximum and minimum output signal-to-noise ratios, respectively: the output SNR is maximal when the external microphone signal is transmitted losslessly to the hearing aid (b = ∞) and minimal when the external microphone signal is not used (b = 0); α controls the desired output signal-to-noise ratio. Substituting the expression for the SNR, the constraint in the above optimization problem takes the form shown in formula (27):

where $z_1$, $z_2$ and $z_3$ are as shown in formulas (28) to (30):

Because the output SNR is monotonically increasing in $\lambda_{eq}$, the larger b is, the higher the output signal-to-noise ratio, and the minimum bit rate therefore satisfies the constraint with equality. The minimum bit rate can be solved in two steps: first, the optimal $\lambda_{eq}$, denoted $\lambda_{eq}^{*}$, is solved from the resulting quadratic equation in one variable; then $\lambda_{eq}^{*}$ is substituted into the expression for $\lambda_{eq}$ to solve for the corresponding $\delta_{eq}^{*}$, as shown in formula (31):

Further, the minimum bit rate can be solved as shown in equation (32):

where $\lceil\cdot\rceil$ represents the rounding-up (ceiling) operation.
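The last step of the rate optimization, turning an admissible overall noise power into a whole number of bits, can be sketched in Python as follows. The sketch does not reproduce formulas (25) to (32); it assumes a uniform quantizer with step $A_e/2^b$ and noise variance of one twelfth of the squared step, and it cross-checks the closed-form inversion against a plain search that relies only on the output SNR being monotonic in b. The names `bits_for_noise_budget`, `min_bits_by_search` and `toy_snr`, and all numbers, are hypothetical.

```python
import math


def quant_noise_variance(amplitude_range, bits):
    """Assumed uniform quantization noise variance: (A / 2^b)^2 / 12."""
    return (amplitude_range / (2 ** bits)) ** 2 / 12.0


def bits_for_noise_budget(amplitude_range, delta_e, delta_eq_target):
    """Smallest integer b with delta_e + quant_noise_variance(A, b) <= delta_eq_target."""
    budget = delta_eq_target - delta_e
    if budget <= 0:
        raise ValueError("target must exceed the sensor's own noise variance")
    b = 0.5 * math.log2(amplitude_range ** 2 / (12.0 * budget))
    return max(0, math.ceil(b))


def min_bits_by_search(snr_of_bits, beta, max_bits=32):
    """Smallest b in 0..max_bits whose SNR meets beta, or None if none does."""
    for b in range(max_bits + 1):
        if snr_of_bits(b) >= beta:
            return b
    return None


A_e, delta_e, delta_eq_target = 2.0, 0.01, 0.02


def toy_snr(b):
    """Toy monotone stand-in for the output SNR: inverse of the overall noise power."""
    return 1.0 / (delta_e + quant_noise_variance(A_e, b))


b_closed_form = bits_for_noise_budget(A_e, delta_e, delta_eq_target)
b_search = min_bits_by_search(toy_snr, beta=1.0 / delta_eq_target)
```

The search variant needs no closed form, so it can also be used when the SNR expression is only available numerically; the closed-form variant avoids evaluating the SNR for each candidate rate.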

(6) Binaural beamforming module

Based on the PUB beamformer, the binaural output speech signal may be beamformed in the STFT domain, as shown in equations (33) and (34):

In addition, the BMWF-PNE based on a limited-rate single external acoustic sensor can be used to calculate the output noise ITF, ILD and IPD, and a noise source localization algorithm can be designed from these output binaural spatial cues.

Fig. 6 is a schematic diagram of a binaural wiener filtering device based on single external wireless acoustic sensor rate optimization according to an embodiment of the invention.

As shown in fig. 6, the apparatus 600 includes a parameter estimation module 610, a rate optimization module 620, a filter construction module 630, and a beamforming module 640.

The parameter estimation module 610 is configured to perform parameter estimation on a noise frame by using a sample moving average method to obtain a noise covariance matrix, and estimate the noise covariance matrix by using a covariance matrix difference method or a covariance matrix whitening method to obtain a relative acoustic transfer function, where the noise frame is obtained by processing multi-microphone noisy speech through an endpoint detector.
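One common form of the covariance matrix difference method can be sketched in Python as below. It assumes a rank-one speech covariance matrix, so that the RTF is the reference column of the difference between the noisy and noise covariance matrices, normalized by its reference entry. The function name and the toy values are hypothetical, and the whitening variant is not shown.

```python
def rtf_by_covariance_subtraction(r_yy, r_nn, ref):
    """Estimate the RTF as the ref-th column of (R_yy - R_nn), scaled so entry ref is 1."""
    m = len(r_yy)
    r_xx = [[r_yy[r][c] - r_nn[r][c] for c in range(m)] for r in range(m)]
    column = [r_xx[r][ref] for r in range(m)]
    return [v / column[ref] for v in column]


# Toy three-microphone example with a known ATF and spatially white noise.
a = [1 + 1j, 2 + 0j, -1j]     # hypothetical ATF vector
sigma_s2 = 0.5                 # hypothetical source PSD
r_nn = [[0.3, 0.0, 0.0], [0.0, 0.3, 0.0], [0.0, 0.0, 0.3]]
r_yy = [[sigma_s2 * ai * aj.conjugate() + r_nn[i][j] for j, aj in enumerate(a)]
        for i, ai in enumerate(a)]

h = rtf_by_covariance_subtraction(r_yy, r_nn, ref=0)
```

With exact covariance matrices the result equals the ATF divided by its reference entry; with estimated matrices the quality depends on how well the noise statistics match between noise-only and speech-plus-noise frames.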

The rate optimization module 620 is configured to process the noise covariance matrix and the relative acoustic transfer function by using the single external wireless acoustic sensor according to a preset lower bound of the expected output signal-to-noise ratio, so as to obtain a minimum transmission bit rate of the single external wireless acoustic sensor.

A filter construction module 630 for constructing an unconstrained binaural multi-channel wiener filter with weighted partial noise estimation terms from the binaural hearing aid based on the lowest transmission bit rate, and obtaining filter coefficients of the binaural multi-channel wiener filter with weighted partial noise estimation terms from the binaural multi-channel wiener filter and the noise covariance matrix.

The beam forming module 640 is configured to perform frequency domain beam forming on the binaural microphone voice signal by using short-time inverse fourier transform according to the filter coefficients, obtain a binaural output voice signal, and output a binaural spatial feature cue by using the binaural multi-channel wiener filter.

In order to verify the effectiveness of the above method or device provided by the present invention, the above method or device provided by the present invention will be further described below by designing an experiment in combination with the related drawings.

Fig. 7 is an experimental view of an external wireless microphone based hearing aid according to an embodiment of the present invention.

Fig. 8 is a graph of output signal-to-noise ratio versus quantized transmission rate in accordance with an embodiment of the invention.

Fig. 9 is a graph of signal-to-noise ratio gain, ILD and IPD error versus parameters according to an embodiment of the present invention.

(1) Experimental setup

The experimental configuration is shown in fig. 7: the target speaker is located in front of the user (90°), one interfering sound source is located at −90°, and the external microphone is located at 90°. The target sound source, the interference source, and the external wireless microphone are all 3 meters from the user's head. Each hearing aid contains 3 behind-the-ear microphones, so the total number of microphones is M = 6. The multichannel binaural room impulse responses (BRIRs) are taken from the published database of the University of Oldenburg, Germany, the acoustic transfer function from the sound source to the external microphone is generated using the image method, and the two front microphones are designated as the reference microphones. All sound sources are speech signals from the TIMIT data set. The target sound source and the interfering sound source have equal input power; the microphone self-noise is white Gaussian noise at a signal-to-noise ratio of 50 dB. The sampling frequency of all sound source signals is fixed at 16 kHz; the short-time Fourier transform uses a square-root Hann window of 32 ms with a frame shift of 16 ms.
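For reference, the stated STFT settings translate into sample counts as in the short Python sketch below. It assumes a periodic Hann window, which the text does not specify, and checks that the squared square-root window overlap-adds to a constant at 50% overlap; the variable names are illustrative.

```python
import math

FS = 16000        # sampling frequency in Hz
WIN_MS = 32       # window length in milliseconds
HOP_MS = 16       # frame shift in milliseconds

win_len = FS * WIN_MS // 1000   # 512 samples
hop = FS * HOP_MS // 1000       # 256 samples

# Square-root of a periodic Hann window (assumption: periodic, not symmetric).
window = [math.sqrt(0.5 * (1.0 - math.cos(2.0 * math.pi * n / win_len)))
          for n in range(win_len)]

# Analysis window times synthesis window, summed over the two overlapping frames.
overlap_add = [window[n] ** 2 + window[n + hop] ** 2 for n in range(hop)]
```

Because the overlap-add sum is constant, using the same square-root window for analysis and synthesis gives perfect reconstruction when the spectrum is left unmodified.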

(2) Experimental results

First, the output signal-to-noise ratio of the rate-optimized BMWF-PNE method proposed by the present invention is observed as a function of the quantized transmission rate, as shown in fig. 8. The output signal-to-noise ratio of the rate optimization method increases with the rate, from a minimum value (when no external microphone data is used) to a maximum value (when the external microphone data is transmitted losslessly). By adjusting α, the transmission rate can be optimized while the desired speech enhancement objective is always met.

Next, the SNR gain (i.e., the output SNR minus the input SNR) and the noise spatial cue preservation error (i.e., the absolute value of the difference between the output and input ILD or IPD) are observed, as shown in fig. 9. The performance of the conventional BMWF-PNE method depends only on η, whereas the rate-optimized BMWF-PNE method proposed by the present invention depends on both η and the bit rate. Using a higher bit rate for quantized transmission improves the SNR gain and reduces the noise cue preservation error. However, increasing the rate beyond 2 bits does not significantly improve performance: because the external microphone audio data is highly correlated with the binaural microphone signals, the audio data collected by the multi-microphone speech information processing system is highly redundant, and rate optimization can reduce this redundancy to some extent. Therefore, in practical applications, quantized transmission at a lower rate rather than at full rate not only achieves near-optimal performance but also reduces the energy consumption of the sensor.
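The evaluation quantities used in this comparison can be sketched in Python as follows. The sketch assumes that the ILD is expressed in decibels as 20·log10 of the ITF magnitude and that the IPD error is the wrapped phase difference in radians; these conventions, the function names and the example values are assumptions rather than the exact definitions of the embodiment.

```python
import cmath
import math


def snr_gain_db(sig_in, noise_in, sig_out, noise_out):
    """Output SNR minus input SNR, both in dB, from signal and noise powers."""
    return 10 * math.log10(sig_out / noise_out) - 10 * math.log10(sig_in / noise_in)


def ild_error_db(itf_in, itf_out):
    """Absolute difference between output and input ILD in dB."""
    return abs(20 * math.log10(abs(itf_out)) - 20 * math.log10(abs(itf_in)))


def ipd_error_rad(itf_in, itf_out):
    """Absolute phase difference; the phase of the ratio keeps it within [0, pi]."""
    return abs(cmath.phase(itf_out / itf_in))


# Hypothetical noise ITFs before and after filtering.
itf_in = cmath.rect(2.0, math.pi / 4)
itf_out = cmath.rect(1.0, math.pi / 2)

gain = snr_gain_db(sig_in=1.0, noise_in=1.0, sig_out=10.0, noise_out=1.0)
ild_err = ild_error_db(itf_in, itf_out)
ipd_err = ipd_error_rad(itf_in, itf_out)
```

In an experiment these values would be computed per frequency bin and then averaged over frequency.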

Finally, the trend of the snr gain and noise clue preservation error with the angle of the external microphone is observed, as shown in fig. 10, wherein the input snr of the external microphone is also related to the angle thereof, and the quantization rate of the external microphone is fixed to 4 bits/sample. It can be seen that the closer the external microphone is to the target sound source, the higher its input signal-to-noise ratio, the greater the signal-to-noise ratio gain, and the smaller the noise spatial cue preservation error.

The foregoing embodiments are provided to illustrate the general principles of the present invention and are not intended to limit its scope.

Claims

1. A binaural wiener filtering method based on single external wireless acoustic sensor rate optimization, comprising:

carrying out parameter estimation on a noise frame by using a sample moving average method to obtain a noise covariance matrix, and estimating the noise covariance matrix by using a covariance matrix difference method or a covariance matrix whitening method to obtain a relative acoustic transfer function, wherein the noise frame is obtained by processing multi-microphone noise voice through an end point detector;

According to a preset lower bound of the expected output signal-to-noise ratio, the single external wireless acoustic sensor is utilized to process the noise covariance matrix and the relative acoustic transfer function, and the lowest transmission bit rate of the single external wireless acoustic sensor is obtained;

the minimum transmission bit rate is obtained by establishing a rate optimization problem that minimizes the transmission rate under a quadratic constraint condition and converting the rate optimization problem under the quadratic constraint condition into a root-finding problem of a quadratic equation in one variable;

constructing an unconstrained binaural multi-channel wiener filter with a weighted part noise estimation term through a binaural hearing aid according to the minimum transmission bit rate, and obtaining a filter coefficient of the binaural multi-channel wiener filter with the weighted part noise estimation term according to the binaural multi-channel wiener filter and the noise covariance matrix;

uniformly quantizing and wirelessly transmitting an audio signal of the external acoustic sensor by using the lowest transmission bit rate of the single external wireless acoustic sensor, establishing a binaural multi-channel wiener design criterion with a weighted part noise estimation term in an unconstrained manner by using a weighted summation mode of minimizing voice distortion and output noise power at the binaural hearing aid end, setting the derivative of the filter coefficient to be zero by solving the derivative of an objective function on the filter coefficient to obtain a closed solution of the binaural multi-channel wiener with the weighted part noise estimation term, and substituting a covariance matrix into the closed solution to obtain the filter coefficient of the binaural multi-channel wiener filter with the weighted part noise estimation term;

Performing frequency domain beam forming on the binaural microphone voice signal by utilizing short-time inverse Fourier transform according to the filter coefficients to obtain a binaural output voice signal, and outputting a binaural spatial characteristic clue by utilizing the binaural multichannel wiener filter;

the binaural filter calculates a voice covariance matrix and the noise covariance matrix to obtain an output signal-to-noise ratio and a noise space characteristic clue, and designs a binaural sound source positioning algorithm based on the noise space characteristic clue.

2. The method of claim 1, wherein the relative acoustic transfer function is a normalized principal eigenvector of a speech covariance matrix or a normalized generalized principal eigenvector of a hybrid covariance matrix and the noise covariance matrix;

wherein the speech covariance matrix is a difference between the hybrid covariance matrix and the noise covariance matrix;

the mixed covariance matrix is obtained by the following steps:

the end point detector processes the multi-microphone noisy speech to obtain a noise frame and a noise-speech frame;

And estimating the voice covariance matrix by using the covariance matrix difference method or the covariance matrix whitening method to obtain an acoustic transfer function.

3. The method of claim 2, wherein the processing the noise covariance matrix and the relative acoustic transfer function with the single external wireless acoustic sensor according to a preset desired output signal-to-noise ratio lower bound to obtain a lowest transmission bit rate for the single external wireless acoustic sensor comprises:

and inputting the quantized noise characteristic clues into the rate optimization model to obtain the lowest transmission bit rate of the single external wireless acoustic sensor.

4. The method of claim 1, wherein constructing an unconstrained binaural multi-channel wiener filter with weighted partial noise estimation terms from the binaural hearing aid based on the lowest transmission bit rate, and deriving filter coefficients of the binaural multi-channel wiener filter with weighted partial noise estimation terms from the binaural multi-channel wiener filter and the noise covariance matrix comprises:

Quantizing the audio signal transmitted by the single external wireless acoustic sensor by using the minimum transmission bit rate and wirelessly transmitting the quantized audio signal;

receiving the quantized audio signal by using a binaural hearing aid, and constructing an unconstrained binaural multi-channel wiener filter model with a weighted part noise estimation term by a minimized speech distortion method and a weighted output noise power summation method;

obtaining a closed solution of the binaural multi-channel wiener filter model by solving a derivative of an objective function of the binaural multi-channel wiener filter model on a filter coefficient of the binaural multi-channel wiener filter model;

5. The method of claim 4, wherein the binaural multi-channel wiener filter with weighted partial noise estimation terms comprises a binaural multi-channel wiener filter with weighted partial noise estimation terms based on a full-rate single external acoustic sensor and a binaural multi-channel wiener filter with weighted partial noise estimation terms based on a limited-rate single external acoustic sensor.

6. The method of claim 5, wherein the binaural multi-channel wiener filter with weighted partial noise estimation term based on a full-rate single external acoustic sensor is determined by equation (1):

the binaural multi-channel wiener filter with weighted partial noise estimation term based on a finite rate single external acoustic sensor is determined by equation (2):

wherein Tr represents a matrix trace operation, $\mathbf{R}_{x_e x_e}$ represents the (M+1)-dimensional speech covariance matrix of the microphone speech signal comprising the single external wireless acoustic sensor, $\mathbf{R}_{n_e n_e}$ represents the (M+1)-dimensional noise covariance matrix of the microphone speech signal comprising the single external wireless acoustic sensor, $\mathbf{e}_i$ represents an (M+1)-dimensional column vector, $\mathbf{R}_{x_{eq} x_{eq}}$ represents the (M+1)-dimensional quantized speech covariance matrix of the microphone speech signal comprising the single external wireless acoustic sensor, $\mathbf{R}_{n_{eq} n_{eq}}$ represents the (M+1)-dimensional quantized noise covariance matrix of the microphone speech signal comprising the single external wireless acoustic sensor, L represents the left ear microphone, and R represents the right ear microphone, where M is a positive integer.

7. The method of claim 1, wherein the performing frequency domain beamforming on the binaural microphone speech signal using a short-time inverse fourier transform according to the filter coefficients to obtain a binaural output speech signal, and outputting binaural spatial signature cues using the binaural multi-channel wiener filter comprises:

performing inner product operation on the column vector characteristics of the noisy speech and the filter coefficients to obtain binaural speech output signals, wherein the binaural speech output signals comprise left ear speech output signals and right ear speech output signals;

calculating a binaural spatial signature cue for the binaural speech output signal using the binaural multi-channel wiener filter, wherein the binaural spatial signature cue comprises an output noise inter-channel transfer function, an output noise inter-channel level difference, and an output noise inter-channel phase difference;

and determining a binaural noise source positioning algorithm according to the binaural spatial feature cues.

8. The method of claim 1, further comprising:

the performance of the binaural multi-channel wiener filter with weighted partial noise estimate term is determined using the output signal quality and/or the noise spatial cue preservation accuracy.

9. The method of claim 1, wherein the lowest transmission bit rate is determined by equation (3):

10. A binaural wiener filtering device based on single external wireless acoustic sensor rate optimization, comprising:

the parameter estimation module is used for carrying out parameter estimation on a noise frame by utilizing a sample moving average method to obtain a noise covariance matrix, and estimating the noise covariance matrix by utilizing a covariance matrix difference method or a covariance matrix whitening method to obtain a relative acoustic transfer function, wherein the noise frame is obtained by processing multi-microphone noise-carrying voice through an end point detector;

A filter construction module, configured to construct, according to the lowest transmission bit rate, an unconstrained binaural multi-channel wiener filter with a weighted part noise estimation term through a binaural hearing aid, and obtain, according to the binaural multi-channel wiener filter and the noise covariance matrix, a filter coefficient of the binaural multi-channel wiener filter with the weighted part noise estimation term;

the beam forming module is used for carrying out frequency domain beam forming on the binaural microphone voice signal by utilizing short-time inverse Fourier transform according to the filter coefficient to obtain a binaural output voice signal, and outputting a binaural spatial characteristic clue by utilizing the binaural multichannel wiener filter;