CN114613383A - Multi-input voice signal beam forming information complementation method under airborne environment - Google Patents
- Publication number
- CN114613383A (application CN202210246203.3A)
- Authority
- CN
- China
- Prior art keywords
- signal
- matrix
- optimal
- representing
- voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10K—SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
- G10K11/00—Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
- G10K11/16—Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
- G10K11/175—Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound
- G10K11/178—Methods or devices for protecting against, or for damping, noise or other acoustic waves in general using interference effects; Masking sound by electro-acoustically regenerating the original acoustic waves in anti-phase
- G10K11/1785—Methods, e.g. algorithms; Devices
- G10K11/17853—Methods, e.g. algorithms; Devices of the filter
- G10K11/17854—Methods, e.g. algorithms; Devices of the filter the filter being an adaptive filter
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention discloses a multi-input speech-signal beamforming information complementation method for an airborne environment, belonging to the field of airborne speech-signal processing, comprising the following steps: S1, preprocess the input signals; S2, perform voice activity detection on the preprocessed signals; S3, estimate and synchronize the time delay, adjust the ranges of the corresponding speech and noise segments, and judge whether the delay between the synchronized signals is less than the filter length; if so, proceed to matrix estimation, otherwise continue delay synchronization; S4, perform noise matrix estimation and noisy-speech matrix estimation, and estimate the optimal matrix from the two; S5, estimate the optimal weight vector using the optimal matrix to obtain the optimal filter, and pass the input signals through the optimal filter to output a synthesized signal. The invention ensures information integrity and enhances the quality and stability of air-ground communication: it preserves complete speech information while effectively improving the signal-to-noise ratio.
Description
Technical Field
The invention relates to the field of airborne voice signal processing, in particular to a multi-input voice signal beam forming information complementation method under an airborne environment.
Background
During a flight mission, an ultrashort-wave communication system suffers from speech-signal discontinuity caused by multiple factors, such as incomplete spatial coverage by the aircraft's multiple antennas, the poor diffraction capability of ultrashort waves, and electromagnetic interference within the aircraft's systems. Existing solutions to this problem mainly include selective combining, equal-gain combining, and microphone-array beamforming.
Selective combining is a combining method in the diversity-combining family: it outputs the single channel with the best performance. Because only one signal is selected for output, information is lost. The problem is illustrated in fig. 1. Taking four antennas receiving the speech 123456789 as an example, when speech interruption occurs, the multichannel speech signals are compared and gated. Since only gating is performed, the output signal still suffers interruptions and dropped words, and complete information cannot be obtained.
Equal-gain combining is also a combining method in the diversity-combining family; it can only guarantee in-phase addition. If the inputs are unbalanced, weak signals are easily over-amplified in the combination, introducing more noise and possibly even causing combining loss.
In the microphone-array beamforming approach, the gain applied to the output of one or more microphones in the array can be controlled by beamforming, ideally maximizing the array gain obtained from beamforming; however, increasing the gain can also increase the internal noise (self-noise) of the system.
In summary, the existing selective-combining scheme outputs a single selected signal, causing signal loss, while the existing equal-gain combining scheme easily introduces more noise and causes combining loss.
Disclosure of Invention
The present invention aims to overcome the deficiencies of the prior art by providing a method for complementing multi-input speech-signal beamforming information in an airborne environment, which solves the problems set forth in the Background.
The purpose of the invention is realized by the following scheme:
a multi-input voice signal beam forming information complementation method under an airborne environment comprises the following steps:
step S1, preprocessing the input signal;
step S2, voice activity detection is carried out on the preprocessed signals, and a voice section range of the input signals and a noise section range of the input signals are obtained;
step S3, estimating and synchronizing time delay, adjusting the range of the corresponding voice section and noise section, judging whether the time delay among the signals after synchronization is less than the length of the filter, if so, estimating the matrix, otherwise, continuing the time delay synchronization;
step S4, carrying out noise matrix estimation and noisy speech matrix estimation, and carrying out optimal matrix estimation by the two;
and step S5, estimating the optimal weight vector by using the optimal matrix to obtain an optimal filter, and outputting a synthesized signal by using the input signal through the optimal filter.
Further, in step S1, the preprocessing includes a framing windowing process.
Further, in step S2, the method includes the sub-steps of: performing speech endpoint detection on the speech signals, and jointly determining the speech interval with the short-time energy method and the short-time zero-crossing rate method to obtain an accurate endpoint detection result.
Further, in step S3, the method includes the following sub-steps: at time k, the two input signal models are set as

y_i(k) = α_i · s(k − τ_i) + v_i(k),

where i = 1, 2, …, N; s(k) denotes the original clean speech signal; τ_i denotes the relative delay of the speech signal received on each channel with respect to the original clean speech signal; v_i(k) denotes the noise of the speech signal received on each channel relative to the original clean speech signal;

R_{y1y2}(τ) = E[y_1(k) · y_2(k − τ)] is the cross-correlation function of the two input speech signals, y_1(k) denotes the first received signal and y_2(k) denotes the second received signal;

τ = τ_1 − τ_2 is the delay between the two signals; α_1 denotes the ratio coefficient of the first received signal to the clean original speech signal, and α_2 denotes the ratio coefficient of the second received signal to the clean original speech signal;

if τ − (τ_1 − τ_2) = 0, the autocorrelation function R_ss(τ − (τ_1 − τ_2)) of s(k) takes its maximum value, R_{y1y2}(τ) reaches its maximum, and the two signals attain maximum correlation; the corresponding displacement point count λ is computed, and the delay τ of the two signal segments is calculated from the sampling rate f_s and the point count λ:

τ = λ / f_s
and after obtaining the time delay estimation result, carrying out displacement synchronization to obtain a signal without time delay difference.
Further, in step S4, the method includes the following sub-steps: compute the autocorrelation matrix R_yy of the noisy speech,

R_yy = E[y(k) · y^T(k)],

and compute the noise autocorrelation matrix R_vv,

R_vv = E[v(k) · v^T(k)].
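A minimal numpy sketch of such a sample autocorrelation-matrix estimate; averaging over overlapping length-L_h snapshots is our assumption, since the patent shows the estimator only as an image:

```python
import numpy as np

def autocorr_matrix(x, Lh):
    """Sample estimate of the Lh x Lh autocorrelation matrix
    R = E[x(k) x(k)^T], averaged over overlapping length-Lh snapshots
    of the segment x (a noisy-speech segment for R_yy, a detected
    noise segment for R_vv)."""
    snapshots = np.lib.stride_tricks.sliding_window_view(x, Lh)
    return snapshots.T @ snapshots / len(snapshots)
```

R_yy would be estimated from the detected noisy-speech segment and R_vv from the detected noise segment, using the segment ranges found by the voice activity detection.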
Further, in step S5, the optimal matrix estimation includes the following sub-steps: the optimal matrix W_{i,0} is calculated as follows:

L_h denotes the filter length, I_{L_h} denotes the identity matrix of order L_h, and W_0 denotes the optimal matrix constructed from the optimal matrices W_{i,0} of all channels.
Further, in step S5, estimating the optimal weight vector by using the optimal matrix includes the following sub-steps: the optimal weight vector is calculated from the two constrained problems

min over h_y of h_y^T R_yy h_y subject to W^T h_y = u′, and
min over h_v of h_v^T R_vv h_v subject to W^T h_v = u′,

where u′ = [1, 0, …, 0]^T is a vector of length L_h, h denotes the optimal filter, h_y^T R_yy h_y and h_v^T R_vv h_v denote the output power of the noisy speech and of the noise respectively, s.t. denotes the constraint, and W^T denotes the transpose of the optimal filter matrix;

the two optimization problems are solved by the Lagrange multiplier method:

L_y(h_y, λ_y) = h_y^T R_yy h_y + λ_y (W^T h_y − u′)

L_v(h_v, λ_v) = h_v^T R_vv h_v + λ_v (W^T h_v − u′)

where L_y(h_y, λ_y) and L_v(h_v, λ_v) denote the Lagrange functions of the noisy speech and of the noise under the constraints, and λ_y and λ_v denote the Lagrange multiplier vector parameters;

taking the derivative with respect to h in L_y(h_y, λ_y) and L_v(h_v, λ_v) gives:

L′_y(h_y, λ_y) = R_yy h_y + W λ_y^T

L′_v(h_v, λ_v) = R_vv h_v + W λ_v^T

where L′_y(h_y, λ_y) and L′_v(h_v, λ_v) are the derivatives of L_y(h_y, λ_y) and L_v(h_v, λ_v), respectively; setting both L′_y(h_y, λ_y) and L′_v(h_v, λ_v) equal to 0 gives

h_y = −R_yy^{−1} W λ_y^T and h_v = −R_vv^{−1} W λ_v^T;

substituting both into the constraints W^T h_y = u′ and W^T h_v = u′ yields

h_y = R_yy^{−1} W (W^T R_yy^{−1} W)^{−1} u′ and h_v = R_vv^{−1} W (W^T R_vv^{−1} W)^{−1} u′;

substituting the matrix W_0, which contains the space-time information, yields

h_{ST,y} = R_yy^{−1} W_0 (W_0^T R_yy^{−1} W_0)^{−1} u′ and h_{ST,v} = R_vv^{−1} W_0 (W_0^T R_vv^{−1} W_0)^{−1} u′,

where h_{ST,y} denotes the optimal filter found for the noisy speech and h_{ST,v} denotes the optimal filter found for the noise.
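The constrained minimization min h^T R h subject to W^T h = u′ has the standard Lagrange-multiplier closed form h = R^{−1} W (W^T R^{−1} W)^{−1} u′, and that form can be checked numerically; this is a sketch under that reading, with names of our own choosing:

```python
import numpy as np

def constrained_min_filter(R, W, u):
    """Minimize h^T R h subject to W^T h = u via the Lagrange closed form
    h = R^{-1} W (W^T R^{-1} W)^{-1} u.  With R = R_vv and W = W_0 this
    plays the role of h_{ST,v}; with R = R_yy, of h_{ST,y}."""
    RinvW = np.linalg.solve(R, W)              # R^{-1} W without an explicit inverse
    return RinvW @ np.linalg.solve(W.T @ RinvW, u)
```

Using np.linalg.solve instead of forming R^{−1} explicitly is numerically safer when R is ill-conditioned, which can happen for short noise segments.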
Further, in step S5, outputting the synthesized signal by passing the input signals through the optimal filter includes the following sub-steps:

using h_{ST,v} as the filter matrix, the synthesized signal output by the optimal filter is

x̂(k) = Σ_{i=1}^{N} h_{i,ST,v}^T y_i(k) = Σ_{i=1}^{N} [x_{ir}(k) + v_{ir}(k)],

where x̂(k) is the filtered output signal, h_{i,ST,v} denotes the optimal filter matrix of channel i, and x_{ir}(k) and v_{ir}(k) denote the speech and the residual noise, respectively, after filtering by the optimal filter.
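The final filter-and-sum step can be sketched as follows; summing the per-channel filtered signals is our reading of the multi-channel structure, and all names are illustrative:

```python
import numpy as np

def synthesize(channels, filters):
    """Filter each delay-aligned input channel with its per-channel optimal
    filter h_{i,ST,v} and sum the results into one complemented output."""
    n = len(channels[0])
    out = np.zeros(n)
    for y_i, h_i in zip(channels, filters):
        out += np.convolve(y_i, h_i)[:n]       # keep the common signal length
    return out
```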
Further, jointly determining the speech interval by the short-time energy method and the short-time zero-crossing rate method includes the following sub-steps:

let the speech signal of the n-th frame be x_n(k); the short-time energy of the frame is

E_n = Σ_k x_n^2(k),

and the short-time zero-crossing rate is

Z_n = (1/2) Σ_k |sgn[x_n(k)] − sgn[x_n(k − 1)]|,

where sgn[·] denotes the sign (step) function, k denotes the time index, and N denotes the total number of frames.
The beneficial effects of the invention include:
the invention prevents the problem of voice interruption by utilizing mutual supplement among multi-input voice information, ensures the information integrity, and enhances the communication quality and the communication stability between the air and the machine. Compared with a comparative gating method or an equal gain combining method, the method not only keeps complete voice information, but also can effectively improve the signal-to-noise ratio.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic diagram of the prior art method for selecting and combining to perform multi-channel speech signal comparison gating;
FIG. 2 is a flow chart of steps of a method according to an embodiment of the present invention;
FIG. 3 is a diagram showing the relationship between the number of estimated points corresponding to a noise segment and the output SNR;
FIG. 4 is a graph of the relationship between a noisy speech segment Ly and the output signal-to-noise ratio;
FIG. 5 is a graph of maximum delay length (0-1000) versus output signal-to-noise ratio;
FIG. 6 shows normalized speech waveforms of four input signals with speech discontinuities, each at a signal-to-noise ratio of 5 dB;
FIG. 7 is a speech waveform of the four input signals of FIG. 6 output using the method of the present invention;
FIG. 8 is a flowchart illustrating steps of a method according to an embodiment of the present invention.
Detailed Description
All features disclosed in all embodiments in this specification, or all methods or process steps implicitly disclosed, may be combined and/or expanded, or substituted, in any way, except for mutually exclusive features and/or steps.
Aiming at the problem that, during flight missions, speech signals in an ultrashort-wave communication system are interrupted by multiple factors, such as incomplete spatial coverage by the aircraft's multiple antennas, the poor diffraction capability of ultrashort waves, and electromagnetic interference within the aircraft's systems, and at the deficiencies of the existing solutions to this problem (selective combining, equal-gain combining), the embodiments of the invention provide a beamforming method for multi-input speech signals in an airborne environment.
The technical scheme of the embodiment of the invention is as follows: performing frame-division windowing pretreatment on an input signal; carrying out voice endpoint detection on each input signal to determine whether the input signal is a voice section; carrying out time delay estimation processing on the existing voice section, and carrying out time delay synchronization to ensure that the maximum time delay is not greater than the length of a filter; determining a noise section according to the voice section information of each input signal after time delay synchronization, performing cross-correlation matrix estimation of the voice section and the noise section, and calculating an optimal filtering matrix and an optimal weight vector according to the results of the voice section and the noise section; and finally filtering the output signal.
As shown in fig. 2, the method comprises the following steps:
preprocessing input signals by framing, windowing and the like;
performing voice activity detection on the framed signal to obtain a voice segment range of the input signal and a noise segment range of the input signal;
carrying out time delay estimation preprocessing, estimating the maximum time delay, synchronizing, adjusting the range of the corresponding voice section and noise section, judging whether the time delay among the signals after synchronization is less than the length of a filter, carrying out matrix estimation if the time delay is less than the length of the filter, and otherwise, continuing time delay synchronization;
carrying out noise matrix estimation and noisy speech matrix estimation, and carrying out optimal matrix estimation by using the noise matrix estimation and the noisy speech matrix estimation;
and estimating the optimal weight vector by using the optimal matrix to obtain an optimal filter, and outputting a synthesized signal by using the input signal through the filter.
In the specific implementation process, the method comprises the following sub-steps:
firstly, preprocessing such as framing and windowing;
frame dividing time: 25ms
Windowing: sw(n) ═ S (n) (w (n)), where Sw(n) is the windowed function, S (n) is the function to be windowed, w (n) is the window function, w (n) selects the Hamming window,
secondly, voice endpoint detection is carried out on the voice signals;
short-time energy: let the speech signal of the nth frame be xn(k) Then the short-time energy of the frame isShort-time zero-crossing rate of
An accurate end point detection result can be obtained by mutually determining the interval by using a short-time energy method and a short-time zero crossing rate method.
Third, delay estimation and synchronization are performed. The delay among the multiple input channels must be eliminated or kept small, at least below the filter order; otherwise the quality of the received speech degrades (see fig. 5), the positions of the speech and noise segments cannot be unified for the calculation, and the computation load increases;
and (3) time delay estimation: suppose that the two input signal models are respectively
Wherein i is 1, 2.. times.n; s (k) represents the original clean speech signal; tau isiRepresenting the relative time delay of the voice signal received by each channel relative to the original pure voice signal; v. ofi(k) Representing the noise of the received speech signal for each channel relative to the original clean speech signal.
If tau- (tau)1-τ2) When R is equal to 0, then Rss(τ-(τ1-τ2) To take the maximum value of the maximum value,the maximum correlation of the two paths of signals is obtained, the corresponding displacement point number lambda can be obtained, and the sampling rate f is usedsAnd the relation between the point number lambda can calculate the time difference tau of two sections of signals.
And after obtaining the time delay estimation result, carrying out displacement synchronization to obtain a signal without time delay difference.
Fourth, estimate the cross-correlation matrices of the speech segment and the noise segment; the numbers of effective points in the speech and noise segments greatly influence the result (see figs. 3 and 4);
Fifthly, estimating an optimal filter matrix;
where i denotes the channel index and W_{i,0} denotes the optimal filter matrix of channel i.
Sixth, calculate the optimal weight vector from the two constrained problems

min over h_y of h_y^T R_yy h_y subject to W^T h_y = u′, and
min over h_v of h_v^T R_vv h_v subject to W^T h_v = u′,

where u′ = [1, 0, …, 0]^T is a vector of length L_h, h denotes the optimal filter, h_y^T R_yy h_y and h_v^T R_vv h_v denote the output power of the noisy speech and of the noise respectively, s.t. denotes the constraint, and W^T denotes the transpose of the optimal filter matrix;

the two optimization problems are solved by the Lagrange multiplier method:

L_y(h_y, λ_y) = h_y^T R_yy h_y + λ_y (W^T h_y − u′)

L_v(h_v, λ_v) = h_v^T R_vv h_v + λ_v (W^T h_v − u′)

where L_y(h_y, λ_y) and L_v(h_v, λ_v) denote the Lagrange functions of the noisy speech and of the noise under the constraints, and λ_y and λ_v denote the Lagrange multiplier vector parameters;

taking the derivative with respect to h in L_y(h_y, λ_y) and L_v(h_v, λ_v) gives:

L′_y(h_y, λ_y) = R_yy h_y + W λ_y^T

L′_v(h_v, λ_v) = R_vv h_v + W λ_v^T

where L′_y(h_y, λ_y) and L′_v(h_v, λ_v) are the derivatives of L_y(h_y, λ_y) and L_v(h_v, λ_v), respectively; setting both L′_y(h_y, λ_y) and L′_v(h_v, λ_v) equal to 0 gives

h_y = −R_yy^{−1} W λ_y^T and h_v = −R_vv^{−1} W λ_v^T;

substituting both into the constraints W^T h_y = u′ and W^T h_v = u′ yields

h_y = R_yy^{−1} W (W^T R_yy^{−1} W)^{−1} u′ and h_v = R_vv^{−1} W (W^T R_vv^{−1} W)^{−1} u′;

substituting the matrix W_0, which contains the space-time information, yields

h_{ST,y} = R_yy^{−1} W_0 (W_0^T R_yy^{−1} W_0)^{−1} u′ and h_{ST,v} = R_vv^{−1} W_0 (W_0^T R_vv^{−1} W_0)^{−1} u′,

where h_{ST,y} denotes the optimal filter found for the noisy speech and h_{ST,v} denotes the optimal filter found for the noise.
Seventh, under the algorithm's assumption, speech and noise are completely uncorrelated, so when the output power of the filtered noisy speech as a whole is minimal, the output power of the noise is simultaneously minimal. In practice this does not hold exactly; therefore, to prevent the information of the speech segments from being filtered out, h_{ST,v} is used here as the filter matrix to filter the output signal:

x̂(k) = Σ_{i=1}^{N} h_{i,ST,v}^T y_i(k) = Σ_{i=1}^{N} [x_{ir}(k) + v_{ir}(k)],

where x̂(k) is the filtered output signal, h_{i,ST,v} denotes the optimal filter matrix of channel i, and x_{ir}(k) and v_{ir}(k) denote the speech and the residual noise, respectively, after filtering by the optimal filter.
As shown in fig. 3, the number of effective points in the noise segment correlates with the output signal-to-noise ratio. In this experiment, variables such as the delay and the number of effective points in the speech segment are held fixed; only the number of effective points in the corresponding noise segment is varied, with two 5 dB signals as input.

As shown in fig. 4, the number of effective points in the speech segment correlates with the output signal-to-noise ratio. In this experiment, other variables such as the delay and the number of effective points in the noise segment are held fixed; only the number of effective points in the speech segment is varied, with two 5 dB signals as input. The horizontal axis indicates how many characters of the same spoken sentence fall within the selection range: the more characters, the larger the number of effective points in the speech segment.

As shown in fig. 5, the maximum delay length between the multiple inputs is related to the output signal-to-noise ratio. In this experiment, other variables such as the numbers of effective points in the speech and noise segments are held fixed; only the maximum delay between the input signals is varied, with two 5 dB signals as input. The filter order is set to 64; it can be seen that when the number of delay points exceeds the filter order, the output signal-to-noise ratio drops sharply.
As shown in fig. 6, there are four input signals with speech discontinuities, each with a signal-to-noise ratio of 5 dB.

As shown in fig. 7, these are the speech waveforms output by this method for the four input signals of fig. 6.
Example 1
As shown in fig. 8, a method for complementing multi-input speech signal beam forming information in an airborne environment includes the following steps:
step S1, preprocessing the input signal;
step S2, voice activity detection is carried out on the preprocessed signals, and a voice section range of the input signals and a noise section range of the input signals are obtained;
step S3, estimating and synchronizing time delay, adjusting the range of the corresponding voice section and noise section, judging whether the time delay among the signals after synchronization is less than the length of the filter, if so, estimating the matrix, otherwise, continuing the time delay synchronization;
step S4, carrying out noise matrix estimation and noisy speech matrix estimation, and carrying out optimal matrix estimation by the two;
and step S5, estimating the optimal weight vector by using the optimal matrix to obtain an optimal filter, and outputting a synthesized signal by using the input signal through the optimal filter.
Example 2
Based on embodiment 1, in step S1, the preprocessing includes a framing windowing process.
Example 3
Based on embodiment 1, in step S2, the method includes the sub-steps of: performing speech endpoint detection on the speech signals, and jointly determining the speech interval with the short-time energy method and the short-time zero-crossing rate method to obtain an accurate endpoint detection result.
Example 4
Based on embodiment 1, in step S3, the method includes the sub-steps of: at time k, the two input signal models are set as

y_i(k) = α_i · s(k − τ_i) + v_i(k),

where i = 1, 2, …, N; s(k) denotes the original clean speech signal; τ_i denotes the relative delay of the speech signal received on each channel with respect to the original clean speech signal; v_i(k) denotes the noise of the speech signal received on each channel relative to the original clean speech signal;

R_{y1y2}(τ) = E[y_1(k) · y_2(k − τ)] is the cross-correlation function of the two input speech signals, y_1(k) denotes the first received signal and y_2(k) denotes the second received signal;

τ = τ_1 − τ_2 is the delay between the two signals; α_1 denotes the ratio coefficient of the first received signal to the clean original speech signal, and α_2 denotes the ratio coefficient of the second received signal to the clean original speech signal;

if τ − (τ_1 − τ_2) = 0, the autocorrelation function R_ss(τ − (τ_1 − τ_2)) of s(k) takes its maximum value, R_{y1y2}(τ) reaches its maximum, and the two signals attain maximum correlation; the corresponding displacement point count λ is computed, and the delay τ of the two signal segments is calculated from the sampling rate f_s and the point count λ:

τ = λ / f_s
and after obtaining the time delay estimation result, carrying out displacement synchronization to obtain a signal without time delay difference.
Example 5
Based on embodiment 1, in step S4, the method includes the sub-steps of: compute the autocorrelation matrix R_yy of the noisy speech,

R_yy = E[y(k) · y^T(k)],

and compute the noise autocorrelation matrix R_vv,

R_vv = E[v(k) · v^T(k)].
Example 6
Based on embodiment 5, in step S5, the optimal matrix estimation includes the sub-steps of: the optimal matrix W_{i,0} is calculated as follows:

L_h denotes the filter length, I_{L_h} denotes the identity matrix of order L_h, and W_0 denotes the optimal matrix constructed from the optimal matrices W_{i,0} of all channels; this construction guarantees that the matrix has full rank, which facilitates taking derivatives in the subsequent Lagrange-multiplier step.
Example 7
Based on embodiment 6, in step S5, estimating the optimal weight vector by using the optimal matrix includes the following sub-steps: the optimal weight vector is calculated from the two constrained problems

min over h_y of h_y^T R_yy h_y subject to W^T h_y = u′, and
min over h_v of h_v^T R_vv h_v subject to W^T h_v = u′,

where u′ = [1, 0, …, 0]^T is a vector of length L_h, h denotes the optimal filter, h_y^T R_yy h_y and h_v^T R_vv h_v denote the output power of the noisy speech and of the noise respectively, s.t. denotes the constraint, and W^T denotes the transpose of the optimal filter matrix;

the two optimization problems are solved by the Lagrange multiplier method:

L_y(h_y, λ_y) = h_y^T R_yy h_y + λ_y (W^T h_y − u′)

L_v(h_v, λ_v) = h_v^T R_vv h_v + λ_v (W^T h_v − u′)

where L_y(h_y, λ_y) and L_v(h_v, λ_v) denote the Lagrange functions of the noisy speech and of the noise under the constraints, and λ_y and λ_v denote the Lagrange multiplier vector parameters;

taking the derivative with respect to h in L_y(h_y, λ_y) and L_v(h_v, λ_v) gives:

L′_y(h_y, λ_y) = R_yy h_y + W λ_y^T

L′_v(h_v, λ_v) = R_vv h_v + W λ_v^T

where L′_y(h_y, λ_y) and L′_v(h_v, λ_v) are the derivatives of L_y(h_y, λ_y) and L_v(h_v, λ_v), respectively; setting both L′_y(h_y, λ_y) and L′_v(h_v, λ_v) equal to 0 gives

h_y = −R_yy^{−1} W λ_y^T and h_v = −R_vv^{−1} W λ_v^T;

substituting both into the constraints W^T h_y = u′ and W^T h_v = u′ yields

h_y = R_yy^{−1} W (W^T R_yy^{−1} W)^{−1} u′ and h_v = R_vv^{−1} W (W^T R_vv^{−1} W)^{−1} u′;

substituting the matrix W_0, which contains the space-time information, yields

h_{ST,y} = R_yy^{−1} W_0 (W_0^T R_yy^{−1} W_0)^{−1} u′ and h_{ST,v} = R_vv^{−1} W_0 (W_0^T R_vv^{−1} W_0)^{−1} u′,

where h_{ST,y} denotes the optimal filter found for the noisy speech and h_{ST,v} denotes the optimal filter found for the noise.
Example 8
Based on embodiment 7, in step S5: since speech and noise are completely uncorrelated under the algorithm's assumption, when the output power of the filtered noisy speech as a whole is minimal, the output power of the noise is simultaneously minimal. In practice this is not exactly the case, so to prevent the information of the speech segments from being filtered out, the embodiment of the invention uses h_{ST,v} as the filter matrix here. Outputting the synthesized signal by passing the input signals through the optimal filter includes the sub-steps of:

the synthesized signal output by the optimal filter is

x̂(k) = Σ_{i=1}^{N} h_{i,ST,v}^T y_i(k) = Σ_{i=1}^{N} [x_{ir}(k) + v_{ir}(k)],

where x̂(k) is the filtered output signal, h_{i,ST,v} denotes the optimal filter matrix of channel i, and x_{ir}(k) and v_{ir}(k) denote the speech and the residual noise, respectively, after filtering by the optimal filter.
Example 9
Based on embodiment 3, jointly determining the speech interval by the short-time energy method and the short-time zero-crossing rate method includes the following sub-steps:

let the speech signal of the n-th frame be x_n(k); the short-time energy of the frame is

E_n = Σ_k x_n^2(k),

and the short-time zero-crossing rate is

Z_n = (1/2) Σ_k |sgn[x_n(k)] − sgn[x_n(k − 1)]|,

where sgn[·] denotes the sign (step) function, k denotes the time index, and N denotes the total number of frames.
Other embodiments than the above examples may be devised by those skilled in the art based on the foregoing disclosure, or by adapting and using knowledge or techniques of the relevant art, and features of various embodiments may be interchanged or substituted and such modifications and variations that may be made by those skilled in the art without departing from the spirit and scope of the present invention are intended to be within the scope of the following claims.
Claims (9)
1. A multi-input voice signal beam forming information complementation method under an airborne environment is characterized by comprising the following steps:
step S1, preprocessing the input signal;
step S2, voice activity detection is carried out on the preprocessed signals, and a voice section range of the input signals and a noise section range of the input signals are obtained;
step S3, estimating and synchronizing time delay, adjusting the range of the corresponding voice section and noise section, judging whether the time delay among the signals after synchronization is less than the length of the filter, if so, estimating the matrix, otherwise, continuing the time delay synchronization;
step S4, carrying out noise matrix estimation and noisy speech matrix estimation, and carrying out optimal matrix estimation by the two;
and step S5, estimating the optimal weight vector by using the optimal matrix to obtain an optimal filter, and outputting a synthesized signal by using the input signal through the optimal filter.
2. The method for complementing beamforming information of multiple input voice signals according to claim 1, wherein in step S1, the pre-processing comprises framing and windowing.
3. The method for complementing beamforming information of multiple input voice signals under airborne environment according to claim 1, wherein step S2 comprises the sub-steps of: carrying out voice endpoint detection on the voice signals, and mutually determining the interval by using the short-time energy method and the short-time zero-crossing rate method to obtain an accurate endpoint detection result.
4. The method for complementing beamforming information of multiple input voice signals under airborne environment according to claim 1, wherein step S3 comprises the sub-steps of: setting the two input signal models at time k as:
y_i(k) = α_i s(k − τ_i) + v_i(k)
wherein i = 1, 2, ..., n; s(k) represents the original clean speech signal; τ_i represents the relative time delay of the speech signal received by each channel with respect to the original clean speech signal; v_i(k) represents the noise of the speech signal received by each channel with respect to the original clean speech signal;
R_y1y2(τ) = E[y_1(k) y_2(k − τ)], wherein R_y1y2(τ) is the cross-correlation function of the two input speech signals, y_1(k) represents the first received signal, and y_2(k) represents the second received signal;
τ = τ_1 − τ_2 is the time delay between the two signals; α_1 represents the ratio coefficient of the first received signal to the clean original speech signal, and α_2 represents the ratio coefficient of the second received signal to the clean original speech signal;
if τ − (τ_1 − τ_2) = 0, the autocorrelation function R_ss(τ − (τ_1 − τ_2)) of s(k) attains its maximum, so R_y1y2(τ) attains its maximum and the correlation of the two signals is maximal; the corresponding number of shifted sample points λ is obtained, and the time delay τ of the two signal segments is calculated from the relation between the sampling rate f_s and the number of points λ as τ = λ / f_s;
and after obtaining the time delay estimation result, carrying out displacement synchronization to obtain a signal without time delay difference.
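The delay estimation via the cross-correlation peak can be illustrated as (a minimal sketch; the function name and the use of `np.correlate` are assumptions, not the patent's implementation):

```python
import numpy as np

def estimate_delay(y1, y2, fs):
    # Find the lag lambda at which the cross-correlation of the two
    # channels peaks; a positive lag means y1 is delayed relative to y2.
    corr = np.correlate(y1, y2, mode="full")
    lag = int(np.argmax(corr)) - (len(y2) - 1)
    return lag, lag / fs  # shift in samples, delay tau = lambda / fs
```

Shifting one channel by `lag` samples then yields the delay-free signals used for matrix estimation.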
5. The method for complementing beamforming information of multiple input voice signals under airborne environment according to claim 1, wherein step S4 comprises the sub-steps of: calculating the autocorrelation matrix of the noisy speech, R_yy = E[y(k) y^T(k)], and calculating the noise autocorrelation matrix, R_vv = E[v(k) v^T(k)].
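Sample estimates of these expectation matrices can be sketched as (assuming frame vectors stacked row-wise; the averaging estimator shown is illustrative, not the patent's exact form):

```python
import numpy as np

def autocorr_matrix(frames):
    # Sample estimate of R = E[y(k) y^T(k)]: average of the outer
    # products of the frame vectors (rows of `frames`).
    return frames.T @ frames / frames.shape[0]
```

R_yy would be estimated from frames in the voice-section range and R_vv from frames in the noise-section range found in step S2.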
6. The method for complementing multi-input speech signal beam-forming information according to claim 5, wherein in step S5, the optimal matrix estimation comprises the sub-steps of: calculating the optimal matrix W_i,0.
7. The method for complementing beamforming information of multiple input voice signals according to claim 6, wherein in step S5, the estimating of the optimal weight vector using the optimal matrix comprises the following sub-steps: calculating the optimal weight vector from the two constrained optimization problems
min h_y^T R_yy h_y  s.t.  W^T h_y = u'
min h_v^T R_vv h_v  s.t.  W^T h_v = u'
wherein u' = [1, 0, ..., 0]^T is a vector of length L_h; h represents the optimal filter; h_y^T R_yy h_y and h_v^T R_vv h_v represent the output power of the noisy speech and of the noise, respectively; s.t. denotes "subject to" the constraints; and W^T represents the transpose of the optimal matrix;
solving the two optimization problems by a Lagrange multiplier method:
L_y(h_y, λ_y) = h_y^T R_yy h_y + λ_y(W^T h_y − u')
L_v(h_v, λ_v) = h_v^T R_vv h_v + λ_v(W^T h_v − u')
wherein L_y(h_y, λ_y) and L_v(h_v, λ_v) respectively represent the Lagrange functions of the noisy speech and the noise under the constraints, and λ_y, λ_v represent the Lagrange multiplier vector parameters;
taking the derivative of L_y(h_y, λ_y) and L_v(h_v, λ_v) with respect to h yields:
L'_y(h_y, λ_y) = R_yy h_y + W λ_y^T
L'_v(h_v, λ_v) = R_vv h_v + W λ_v^T
wherein L'_y(h_y, λ_y) and L'_v(h_v, λ_v) are the derivatives of L_y(h_y, λ_y) and L_v(h_v, λ_v), respectively; setting both L'_y(h_y, λ_y) and L'_v(h_v, λ_v) equal to 0 gives:
h_y = −R_yy^(-1) W λ_y^T,  h_v = −R_vv^(-1) W λ_v^T
substituting both into the constraints W^T h_y = u' and W^T h_v = u' yields:
λ_y^T = −(W^T R_yy^(-1) W)^(-1) u',  λ_v^T = −(W^T R_vv^(-1) W)^(-1) u'
and substituting the matrix W_0 containing the space-time information yields:
h_ST,y = R_yy^(-1) W_0 (W_0^T R_yy^(-1) W_0)^(-1) u',  h_ST,v = R_vv^(-1) W_0 (W_0^T R_vv^(-1) W_0)^(-1) u'
wherein h_ST,y represents the optimal filter found for the noisy speech, and h_ST,v represents the optimal filter found for the noise.
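The closed-form solution h = R^(-1) W (W^T R^(-1) W)^(-1) u' of this constrained minimization can be sketched numerically (a minimal illustration; the function name and matrix shapes are assumptions):

```python
import numpy as np

def constrained_filter(R, W, u):
    # Minimise h^T R h subject to W^T h = u via Lagrange multipliers:
    # h = R^(-1) W (W^T R^(-1) W)^(-1) u
    Rinv_W = np.linalg.solve(R, W)  # R^(-1) W without forming an explicit inverse
    return Rinv_W @ np.linalg.solve(W.T @ Rinv_W, u)
```

Calling this once with R_yy and once with R_vv (with W = W_0) would produce h_ST,y and h_ST,v respectively.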
8. The method for complementing beamforming information of a multi-input speech signal under airborne environment according to claim 7, wherein in step S5, the step of outputting the synthesized signal by the input signal through an optimal filter comprises the sub-steps of:
using h_ST,v as the filter matrix, the synthesized signal output by the optimal filter is z(k) = h_ST,v^T y(k), wherein y(k) denotes the input signal vector.
9. The method of complementing beamforming information for multiple input speech signals according to claim 3, wherein said mutually determining the interval by using the short-time energy method and the short-time zero-crossing rate method comprises the sub-steps of:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210246203.3A CN114613383B (en) | 2022-03-14 | 2022-03-14 | Multi-input voice signal beam forming information complementation method in airborne environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114613383A true CN114613383A (en) | 2022-06-10 |
CN114613383B CN114613383B (en) | 2023-07-18 |
Family
ID=81863801
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210246203.3A Active CN114613383B (en) | 2022-03-14 | 2022-03-14 | Multi-input voice signal beam forming information complementation method in airborne environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114613383B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1633121A1 (en) * | 2004-09-03 | 2006-03-08 | Harman Becker Automotive Systems GmbH | Speech signal processing with combined adaptive noise reduction and adaptive echo compensation |
CN102611669A (en) * | 2010-12-29 | 2012-07-25 | Zte维创通讯公司 | Channel estimation filtering |
US20150019213A1 (en) * | 2013-07-15 | 2015-01-15 | Rajeev Conrad Nongpiur | Measuring and improving speech intelligibility in an enclosure |
CN104952459A (en) * | 2015-04-29 | 2015-09-30 | 大连理工大学 | Distributed speech enhancement method based on distributed uniformity and MVDR (minimum variance distortionless response) beam forming |
CN105223544A (en) * | 2015-08-26 | 2016-01-06 | 南京信息工程大学 | The constant Beamforming Method of the near field linear constraint adaptive weighted frequency of minimum variance |
CN106782590A (en) * | 2016-12-14 | 2017-05-31 | 南京信息工程大学 | Based on microphone array Beamforming Method under reverberant ambiance |
CN107610713A (en) * | 2017-10-23 | 2018-01-19 | 科大讯飞股份有限公司 | Echo cancel method and device based on time delay estimation |
CN110045334A (en) * | 2019-02-28 | 2019-07-23 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Sidelobe null Beamforming Method |
CN110111807A (en) * | 2019-04-27 | 2019-08-09 | 南京理工大学 | A kind of indoor sound source based on microphone array follows and Enhancement Method |
US20190341054A1 (en) * | 2018-05-07 | 2019-11-07 | Microsoft Technology Licensing, Llc | Multi-modal speech localization |
CN110473564A (en) * | 2019-07-10 | 2019-11-19 | 西北工业大学深圳研究院 | A kind of multi-channel speech enhancement method based on depth Wave beam forming |
CN111508516A (en) * | 2020-03-31 | 2020-08-07 | 上海交通大学 | Voice beam forming method based on channel correlation time frequency mask |
Non-Patent Citations (1)
Title |
---|
王秋菊 (Wang Qiuju): "Research on Speech Enhancement in an Airborne Noise Environment" * |
Also Published As
Publication number | Publication date |
---|---|
CN114613383B (en) | 2023-07-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||