CN116343808A - Flexible microphone array voice enhancement method and device, electronic equipment and medium - Google Patents

Flexible microphone array voice enhancement method and device, electronic equipment and medium

Info

Publication number
CN116343808A
Authority
CN
China
Prior art keywords
microphone array
voice
flexible
sound source
signal
Prior art date
Legal status
Pending
Application number
CN202310349782.9A
Other languages
Chinese (zh)
Inventor
王若凡
施钧辉
王钰琪
张劲
阮永都
Current Assignee
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202310349782.9A priority Critical patent/CN116343808A/en
Publication of CN116343808A publication Critical patent/CN116343808A/en

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161: Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166: Microphone arrays; Beamforming
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78: Detection of presence or absence of voice signals

Abstract

The invention discloses a flexible microphone array voice enhancement method and device, electronic equipment, and a medium, comprising the following steps: acquiring a voice threshold value; dividing the signal received by the microphone array into a plurality of sub-bands, calculating the sum of the binary Gaussian log-likelihood ratios of all the sub-bands, and detecting the presence of voice based on the voice threshold value; computing a spectral function from the covariance matrix of the signals received by the microphone array, performing a spectral peak search, and finding the maxima of the spectral function, where the angle corresponding to a maximum is the sound source direction angle; and performing beam response optimization on the signal in the sound source direction to enhance the voice signal in that direction, and outputting the enhanced voice signal after wiener filtering. The method achieves real-time voice signal separation and enhancement in multi-person conversation scenes in noisy environments.

Description

Flexible microphone array voice enhancement method and device, electronic equipment and medium
Technical Field
The present invention relates to flexible circuits, and more particularly, to a method and apparatus for voice enhancement of a flexible microphone array, an electronic device, and a medium.
Background
Multi-person voice recognition and separation in complex environments is an important and practical task. Many everyday scenarios, such as indoor multi-person conferences and outdoor team activities, take place in noisy environments. The signals recorded by a traditional sensor system contain background noise and multiple people's voice signals at the same time, so the position and speech content of each speaker cannot be effectively distinguished. A conventional audio transceiving system therefore cannot enhance and relay a desired sound source signal from a specific direction.
A microphone array is a group of acoustic sensors (microphones) placed in an orderly arrangement; by exploiting the small time differences between the arrival of sound waves at each microphone in the array, it achieves better spatial directivity than a single microphone. Microphone arrays are commonly used for sound source localization, background-noise suppression, and signal extraction and separation. A microphone array does not restrict the speaker's movement and can localize a sound source at any position in space, making it an important basic device in human-computer interaction and directional voice pickup. The voice separation problem originally derives from the "cocktail party problem" and aims to separate the desired speaker's voice alone from a noisy environment (other voices or background noise), making the desired voice clearer. Existing microphone array technology has some limitations: for example, the number of microphones in an array is constrained by device size and power consumption and cannot be increased greatly, and when the array is far from the sound source, the recorded audio has a low signal-to-noise ratio.
A wearable device is a portable device that fits closely to the user; it can be used in fields such as health monitoring and virtual display. Most existing wearable devices take the form of watches, earphones, or glasses, and their styles are relatively fixed. A flexible wearable device has high mechanical flexibility, conforms well to the skin, and achieves better integration of person and sensor. A flexible MEMS microphone array is small, has low power consumption, attaches well to the surface of human skin, and is convenient to wear. During actual movement, it integrates real-time voice-signal acquisition, storage, and processing, and feeds the desired voice signal back to designated personnel in real time.
The invention provides a voice enhancement method based on wearable equipment so as to realize multi-person voice separation and enhancement in a complex environment.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a flexible microphone array voice enhancement method and device, electronic equipment and medium.
According to a first aspect of an embodiment of the present invention, there is provided a flexible microphone array speech enhancement method, including:
acquiring a voice threshold value;
detecting the existence of voice based on the voice threshold value;
solving a spectrum function of a covariance matrix of signals received by the microphone array, searching spectrum peaks, and finding out a maximum value of the spectrum function, wherein an angle corresponding to the maximum value is a sound source direction angle;
and carrying out beam response optimization on the signal in the sound source direction to strengthen the voice signal in the sound source direction, and outputting the strengthened voice signal after wiener filtering processing.
Further, obtaining the voice threshold value includes:
the voice threshold value L_τ is expressed as follows:
L_τ = (1-β)L_0 + βL_1
where L_0 is the energy of general environmental noise, L_1 is the field-environment energy collected within a preset time, and β is a weight coefficient.
Further, performing voice presence detection based on the voice threshold value includes:
dividing signals received by a microphone array into a plurality of sub-bands;
calculating the binary Gaussian log-likelihood ratio of each sub-band;
weighting and summing the binary Gaussian log-likelihood ratios of all the sub-bands;
and when the sum of the binary Gaussian log-likelihood ratios of all the sub-bands is larger than the voice threshold value, judging that voice exists in the signals received by the microphone array.
Further, calculating a spectrum function of a covariance matrix of signals received by the microphone array, searching spectrum peaks, and finding a maximum value of the spectrum function, wherein an angle corresponding to the maximum value, namely a sound source direction angle, comprises:
constructing a covariance matrix of the received signals of the microphone array;
performing eigenvalue decomposition on the covariance matrix to obtain eigenvalues λ_1, λ_2, …, λ_M;
arranging the eigenvalues λ_1, λ_2, …, λ_M in descending order, i.e. λ_1 ≥ … ≥ λ_j > λ_{j+1} = … = λ_M = σ², where σ² is the noise power;
spanning the noise subspace with the eigenvectors corresponding to the eigenvalues λ_{j+1}, …, λ_M;
and constructing a spectral function based on the steering vector of the flexible annular microphone array and the noise subspace, performing a spectral peak search on the spectral function, and finding all maxima of the spectral function within 0 to 360 degrees, where the angles corresponding to the maxima are the angle estimates of the sound source directions.
Further, constructing a spectral function based on the steering vector and the noise subspace of the flexible annular microphone array includes:
the expression of the spectral function is as follows:
P(θ) = 1 / (A^H(θ) V_Noise V_Noise^H A(θ))
where A(θ) = [α(θ_1), α(θ_2), …, α(θ_N)] is the steering vector of the flexible annular microphone array, whose elements α(θ_n) are narrowband phase-delay terms determined by the array geometry (given as equation images in the source); θ_n, n = 1, 2, …, N, is the angle of incidence of the signal with respect to the individual microphones in the array; c is the speed of sound in air; d is the spacing between the microphones in the annular array; and V_Noise represents the noise subspace.
Further, performing beam response optimization on the signal in the sound source direction to enhance the voice signal in the sound source direction includes:
acquiring a single-sound-source-direction beam response S(θ_i) = W^H A(θ), i = 1, 2, …, D, where D is the number of sound sources determined by the spectral peak search and W is a weight vector;
enhancing the voice signal in the sound source direction by minimizing the beam response S(θ_i).
Further, enhancing the voice signal in the sound source direction by minimizing the beam response S(θ_i) includes:
the minimization problem for the beam response S(θ_i) is expressed as:
min_W W^H A(θ)
s.t. W^H A(θ_0) = 1
where A(θ_0) is the steering vector of the enhanced sound source direction;
adding a penalty term to the minimization problem updates it to a penalized form (the penalized objective and the penalty parameter are given as equation images in the source);
introducing an auxiliary variable b, the following minimization problem is solved for W^H (given as an equation image in the source), where q is the iteration number;
differentiating with respect to the variable W^H in (1) and setting the result equal to 0 yields an expression for W^H (given as an equation image in the source);
expanding the Laplacian operator in (2) using the central difference yields the numerical format for iteratively computing W^H (given as an equation image in the source);
the stopping condition of the iterative optimization is that the change between iterations falls below ε (given as an equation image in the source), where ε is a real number greater than 0.
According to a second aspect of an embodiment of the present invention, there is provided a flexible microphone array voice enhancement apparatus for implementing the above flexible microphone array voice enhancement method, the apparatus comprising an annular flexible microphone array; a plurality of acoustic sensors are arranged at equal intervals on the annular flexible microphone array, which is further provided with a power key, an enhanced-sound-source switching key, and an earphone.
According to a third aspect of embodiments of the present invention, there is provided an electronic device comprising a memory and a processor, the memory being coupled to the processor; the memory is used for storing program data, and the processor is used for executing the program data to realize the flexible microphone array voice enhancement method.
According to a fourth aspect of embodiments of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described flexible microphone array speech enhancement method.
Compared with the prior art, the invention has the following beneficial effects. The invention provides a flexible microphone array voice enhancement method that computes a spectral function from the covariance matrix of the signals received by the microphone array, performs a spectral peak search, and finds the maxima of the spectral function, with the angle corresponding to each maximum taken as a sound source direction angle; the method achieves separation and enhancement of multi-person voice in complex environments. The invention also provides a flexible microphone array voice enhancement device in which a key switches among the enhanced voices of the individual sound source directions. It solves the problem that mixed multi-person voice is difficult to distinguish in complex scenes, conveniently performs real-time voice processing and enhancement, and is particularly suitable for multi-person conversation scenes in outdoor noisy environments.
Drawings
Fig. 1 is a schematic diagram of a flexible microphone array arrangement of the invention;
FIG. 2 is a flow chart of the present invention for speech separation and enhancement;
FIG. 3 is a schematic diagram of the result of the calculation of the sum of the likelihood ratios for voice presence detection of the present invention;
FIG. 4 is a schematic illustration of the result of the voice presence detection of the present invention;
FIG. 5 is a schematic illustration of the sound source localization results of the present invention;
FIG. 6 is a waveform diagram of a single sensor input signal to a microphone array in an embodiment of the invention;
FIG. 7 is a waveform diagram of a voice signal of a sound source direction after signal enhancement by a single sensor in an embodiment of the present invention;
fig. 8 is a schematic diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to specific embodiments and figures. The following examples are presented only to aid in the understanding of the invention. It should be noted that it will be apparent to those skilled in the art that various modifications and adaptations of the invention can be made without departing from the principles of the invention and these modifications and adaptations are intended to be within the scope of the invention as defined in the following claims.
In order to solve the problem of distinguishing mixed multi-person voice in a multi-person conversation scene in a noisy environment, an embodiment of the invention provides a flexible microphone array voice enhancement method. As shown in fig. 1, the method is implemented on a flexible microphone array voice enhancement device comprising an annular flexible microphone array on which a plurality of acoustic sensors are arranged at equal intervals; the array is also provided with a power key B1 and an enhanced-sound-source switching key B2. Pressing the power key B1 turns the device on; pressing and holding B1 for 3 seconds turns it off. Each press of the enhanced-sound-source switching key B2 switches among the desired sound source signals separated by the device, accommodating multi-source separation scenarios. The annular array also carries a small earphone through which the user listens to the array's received signal and the enhanced voice signal. The annular flexible microphone array is attached to the ring-shaped area formed by the forehead, above the left ear, the back of the head, and above the right ear, and can be worn on the head like a headband.
Fig. 2 shows a flowchart of a method for enhancing voice of a flexible microphone array according to an embodiment of the present invention, where the method specifically includes the following steps:
step S1, a voice threshold value is obtained.
In step S1, each time the flexible microphone array voice enhancement device is turned on, the sound sources present in the field environment are collected for 10 seconds, the device is automatically calibrated, and a threshold value L_τ for determining whether voice exists in the environment is calculated.
Here L_τ is an automatically updated threshold whose expression is:
L_τ = (1-β)L_0 + βL_1
where L_0 is the common environmental-noise energy collected a priori, and L_1 is the field-environment energy collected by the device in the first 10 seconds after each power-on; β is a weight coefficient, taken empirically as β = 0.95 in this embodiment. The field-environment energy L_1 is computed by squaring and summing the flexible-microphone-array signals: L_1 = Σ_t X²(t), where t denotes time. The voice signal acquired by the flexible microphone array is denoted X(t) = (x_1(t), …, x_N(t)), where x_n(t), n = 1, 2, …, N, is the voice signal picked up by a single microphone and N is the number of microphones in the array; N = 32 in this embodiment.
Step S2: the presence of voice is detected based on the voice threshold value obtained in step S1.
When no voice is present, voice separation is unnecessary, so the flexible microphone array voice enhancement device must first perform voice presence detection on the collected signals.
The invention judges whether the signal received by the flexible microphone array contains voice or not by calculating the sub-band log likelihood ratio of the array signal. Specifically, in this example, the signal received by the flexible microphone array is divided into a plurality of sub-bands, the sum of binary gaussian log likelihood ratios of all the sub-bands is calculated, and the voice existence detection is performed based on the voice threshold value obtained in step S1. Wherein, the signal received by the nth microphone is expressed as:
x_n(t) = g_{n,d}(t) * s_d(t) + g_{n,i}(t) * s_i(t) + v_n(t)
where s_d(t) denotes the desired voice signal, s_i(t) denotes the interfering acoustic signal, and v_n(t) is other ambient noise; g_{n,d}(t) denotes the acoustic impulse response between the nth microphone and the desired voice source, g_{n,i}(t) that between the nth microphone and the interfering source, and * denotes convolution.
The voice and interfering acoustic signals are regarded as two independent, uncorrelated variables, with the subscript d marking quantities associated with the desired voice signal. In this example, the received signal band is divided into 4 sub-bands: 100-400 Hz, 400-1000 Hz, 1000-2000 Hz, and 2000-3500 Hz.
The binary Gaussian log-likelihood ratio of each sub-band is calculated as follows (the closed-form expression is given as an equation image in the source):
where μ_ds is the power mean of the voice component within a sub-band and μ_is is that of the interfering acoustic component; σ_ds and σ_is are the corresponding power variances; μ_dE and σ_dE are the power mean and variance of the noise associated with the voice signal within a sub-band; μ_iE and σ_iE are those of the noise associated with the interfering acoustic signals and ambient noise; k denotes the sub-band index.
The binary Gaussian log-likelihood ratios of all the sub-bands are weighted and summed; when the sum exceeds the voice threshold value, voice is judged to be present in the signals received by the microphone array. The expression is:
L = Σ_k α_k L_k
where α_k is the weight of each sub-band and L_k is the log-likelihood ratio of sub-band k. If L ≥ L_τ, voice is considered present in the received signal.
FIG. 3 shows the computed sum of the log-likelihood ratios of the 4 sub-bands over a period of time in this embodiment; the signal inside the box in FIG. 4, representing the detected voice interval, corresponds to the region in FIG. 3 where the sum of the likelihood ratios exceeds the threshold L_τ. In the semi-anechoic-chamber test environment, L_τ = 3.1 and α_k = 0.25.
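The sub-band decision logic of step S2 can be sketched as follows. Since the patent gives its binary Gaussian log-likelihood ratio only as an equation image, the per-sub-band statistic below is an illustrative stand-in: a Gaussian log-likelihood ratio of sub-band power under an assumed voice-present model (mu_s, sig_s) versus a noise-only model (mu_n, sig_n). All function names and model parameters here are assumptions, not the patent's exact formula.

```python
import numpy as np

SUBBANDS_HZ = [(100, 400), (400, 1000), (1000, 2000), (2000, 3500)]  # from the embodiment

def subband_powers(x, fs):
    """Mean power of signal x in each of the four sub-bands, via the FFT."""
    spec = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return np.array([spec[(freqs >= lo) & (freqs < hi)].sum() for lo, hi in SUBBANDS_HZ])

def subband_llrs(x, fs, mu_n, sig_n, mu_s, sig_s):
    """Per-sub-band log-likelihood ratio (illustrative Gaussian stand-in)."""
    p = subband_powers(x, fs)
    ll_voice = -0.5 * ((p - mu_s) / sig_s) ** 2 - np.log(sig_s)
    ll_noise = -0.5 * ((p - mu_n) / sig_n) ** 2 - np.log(sig_n)
    return ll_voice - ll_noise

def voice_present(x, fs, tau, mu_n, sig_n, mu_s, sig_s, alpha=0.25):
    """Weighted sum of sub-band log-likelihood ratios compared with L_tau."""
    return float(np.sum(alpha * subband_llrs(x, fs, mu_n, sig_n, mu_s, sig_s))) >= tau
```

With equal weights alpha = 0.25, as in the semi-anechoic test, the decision is simply the (scaled) sum of the four sub-band statistics against the calibrated threshold.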
Step S3, a spectrum function is obtained for a covariance matrix of signals received by the microphone array, spectrum peak search is carried out, and a maximum value of the spectrum function is found, wherein an angle corresponding to the maximum value is a sound source direction angle.
The covariance matrix of the received signals of the microphone array is expressed as:
R = E[X(t) X^H(t)]
where H is the conjugate transpose operator, E[·] denotes expectation, and n indexes the individual microphones in the array; in practice R is estimated by averaging X(t)X^H(t) over snapshots.
Eigenvalue decomposition is performed on the covariance matrix R, and a voice-signal subspace and a noise subspace are constructed from the eigenvectors; because the voice signal and the noise are mutually independent, the two subspaces are orthogonal. This yields:
R = V Σ V^H
where V = [V_Speech, V_Noise] and Σ = diag(λ_1, λ_2, …, λ_M); λ_1, …, λ_M are the eigenvalues of the covariance matrix R, arranged in descending order, i.e. λ_1 ≥ … ≥ λ_j > λ_{j+1} = … = λ_M = σ², where σ² is the noise power. The first j eigenvalues are associated with the voice signal and are greater than σ²; the eigenvectors corresponding to these j eigenvalues span the voice-signal subspace V_Speech, and the eigenvectors corresponding to λ_{j+1} through λ_M span the noise subspace V_Noise.
The source direction angle θ of the voice signal is obtained by searching the spectral peaks of a spatial spectral function, whose expression is:
P(θ) = 1 / (A^H(θ) V_Noise V_Noise^H A(θ))
where A(θ) = [α(θ_1), α(θ_2), …, α(θ_N)] is the steering vector of the flexible annular microphone array, whose elements α(θ_n) are narrowband phase-delay terms determined by the array geometry (given as equation images in the source); θ_n, n = 1, 2, …, N, is the angle of incidence of the signal with respect to the individual microphones in the array; c is the speed of sound in air; and d is the spacing between the microphones in the annular array.
A spectral peak search is performed on the spectral function P(θ), and all maxima of P(θ) within 0 to 360 degrees are found; the θ corresponding to each maximum is the angle estimate of the sound source direction.
In the case of multiple sound sources, a sound-source-localization direction vector Θ = (θ_1, θ_2, …, θ_D) is output, where D is the number of sound sources determined by the spectral peak search. As shown in fig. 5, three sound sources are found in this embodiment, with direction angles of 40 degrees, 128 degrees, and 220 degrees, respectively.
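The subspace localization of step S3 is the classic MUSIC procedure (covariance matrix, eigendecomposition, noise subspace, spectral peak search). The sketch below illustrates it; because the patent's annular-array steering-vector elements are given only as images, the usage example substitutes a uniform linear array steering function (ula_steering), which is an assumption for demonstration only.

```python
import numpy as np

def music_spectrum(X, steering, angles_deg, n_sources):
    """MUSIC spatial spectrum P(theta) from array snapshots.

    X        : (n_mics, n_snapshots) complex narrowband snapshots.
    steering : function angle_deg -> (n_mics,) steering vector; the patent's
               annular-array form is given as equation images, so any
               geometry-appropriate steering function can be plugged in.
    Peaks of the returned spectrum give the direction estimates.
    """
    R = X @ X.conj().T / X.shape[1]              # sample covariance matrix
    w, V = np.linalg.eigh(R)                     # eigenvalues in ascending order
    Vn = V[:, : X.shape[0] - n_sources]          # noise subspace: smallest eigenvalues
    P = []
    for th in angles_deg:
        a = steering(th)
        denom = np.real(a.conj() @ Vn @ Vn.conj().T @ a)
        P.append(1.0 / max(denom, 1e-12))        # MUSIC pseudo-spectrum
    return np.asarray(P)

# Usage sketch: uniform linear array stand-in, one source at 60 degrees
def ula_steering(th, n_mics=8, d_over_lambda=0.5):
    n = np.arange(n_mics)
    return np.exp(-2j * np.pi * d_over_lambda * n * np.cos(np.deg2rad(th)))

rng = np.random.default_rng(1)
a = ula_steering(60.0)
S = rng.standard_normal(200) + 1j * rng.standard_normal(200)
X = np.outer(a, S) + 0.01 * (rng.standard_normal((8, 200)) + 1j * rng.standard_normal((8, 200)))
angles = np.arange(0, 181)
P = music_spectrum(X, ula_steering, angles, n_sources=1)
est = angles[np.argmax(P)]
```

For D sources, the D largest peaks of P give the direction vector Θ, matching the multi-source output described above.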
Step S4: beam response optimization is performed on the sound-source-direction signal to enhance the voice signal in the sound source direction, and the enhanced voice signal is output after wiener filtering.
The single-sound-source-direction beam response is S(θ_i) = W^H A(θ), i = 1, 2, …, D. Voice enhancement in the sound source direction is achieved by minimizing the beam response S(θ_i) to obtain the weight vector W (given as an equation image in the source).
This optimization problem can be expressed as:
min_W W^H A(θ)
s.t. W^H A(θ_0) = 1
where A(θ_0) is the steering vector of the enhanced sound source direction.
Adding a penalty term to this constrained minimization problem changes the optimization problem to a penalized form (the penalized objective and the penalty parameter are given as equation images in the source).
An auxiliary variable b is introduced, and the following minimization problem is solved for W^H (given as an equation image in the source), where q is the number of iterations.
Differentiating with respect to the variable W^H in (1) and setting the result equal to 0 yields an expression for W^H (given as an equation image in the source).
Expanding the Laplacian operator in (2) using the central difference yields the numerical format for iteratively computing W^H (given as an equation image in the source).
The stopping condition for the iterative computation is that the change between successive iterations falls below a tolerance (given as an equation image in the source).
Here ε is a real number greater than 0. The noise of the output voice signal of the desired sound source is further suppressed by wiener filtering, improving the voice-enhancement effect in the desired sound source direction and yielding the final enhanced output voice signal.
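The patent's penalty-based iteration for the weight vector is given only as equation images. As an illustrative stand-in under the same distortionless constraint W^H A(θ_0) = 1, the sketch below uses the classic minimum-variance (MVDR) closed-form beamformer together with a standard Wiener post-filter gain; this is a named substitution, not the patent's method, and the diagonal-loading parameter is an assumption.

```python
import numpy as np

def mvdr_weights(R, a0, loading=1e-3):
    """Closed-form weights for min_W W^H R W  s.t.  W^H a0 = 1 (MVDR).

    Stand-in for the patent's penalty-based iteration: same distortionless
    constraint toward the enhanced sound source direction a0 = A(theta_0).
    Diagonal loading (assumed parameter) keeps the inverse well conditioned.
    """
    M = R.shape[0]
    Ri = np.linalg.inv(R + loading * np.real(np.trace(R)) / M * np.eye(M))
    num = Ri @ a0
    return num / (np.conj(a0) @ num)   # normalize so that W^H a0 = 1

def wiener_gain(noisy_psd, noise_psd, floor=1e-3):
    """Frequency-domain Wiener post-filter gain G = SNR / (SNR + 1)."""
    snr = np.maximum(noisy_psd / np.maximum(noise_psd, 1e-12) - 1.0, 0.0)
    return np.maximum(snr / (snr + 1.0), floor)

# Usage sketch: the distortionless constraint holds exactly at a0
a0 = np.exp(-2j * np.pi * 0.5 * np.arange(4) * np.cos(np.deg2rad(40.0)))
w = mvdr_weights(np.eye(4, dtype=complex), a0)
```

Applying wiener_gain per frequency bin to the beamformer output spectrum suppresses the residual noise of the desired-source signal, mirroring the wiener post-processing step described above.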
Further, the user can switch the enhanced voice signals of different sound source directions through the enhanced sound source switching key B2.
The mixed voice signal collected by the 4th microphone of the flexible microphone array in this embodiment is shown in fig. 6, and the waveform of the separated, enhanced voice signal in the 40-degree sound source direction is shown in fig. 7. Comparing the two waveforms, the enhanced voice signal shows strong suppression of the interfering signals, leaving only the amplitude of the desired voice signal prominent.
As shown in fig. 8, an embodiment of the present application provides an electronic device comprising a memory 101 for storing one or more programs and a processor 102; when the one or more programs are executed by the processor 102, the method of any of the first aspects described above is implemented.
And a communication interface 103, where the memory 101, the processor 102 and the communication interface 103 are electrically connected directly or indirectly to each other to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The memory 101 may be used to store software programs and modules that are stored within the memory 101 for execution by the processor 102 to perform various functional applications and data processing. The communication interface 103 may be used for communication of signaling or data with other node devices.
The memory 101 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), etc.
The processor 102 may be an integrated circuit chip with signal processing capabilities. The processor 102 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In the embodiments provided in the present application, it should be understood that the disclosed method and system may be implemented in other manners. The method and system embodiments described above are merely illustrative; for example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods, systems, and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
In another aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program which, when executed by the processor 102, implements a method as in any of the first aspects described above. The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a U-disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
The above embodiments are merely intended to illustrate the design concept and features of the present invention and to enable those skilled in the art to understand and implement it; the scope of the present invention is not limited to these embodiments. Accordingly, all equivalent changes or modifications made according to the principles and design ideas of the present invention fall within the scope of the present invention.

Claims (10)

1. A method of flexible microphone array speech enhancement, the method comprising:
acquiring a voice threshold value;
detecting the existence of voice based on the voice threshold value;
constructing a spectral function from the covariance matrix of the signals received by the microphone array, performing a spectral peak search, and finding the maxima of the spectral function, wherein the angle corresponding to each maximum is a sound source direction angle;
and performing beam-response optimization on the signal in the sound source direction to enhance the speech signal from that direction, and outputting the enhanced speech signal after Wiener filtering.
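The final Wiener-filtering step of claim 1 can be sketched as a single-channel spectral gain applied to the beamformed output. This is a minimal illustration only: it assumes an a-priori SNR estimated by simple spectral subtraction and a gain floor, neither of which the claim fixes.

```python
import numpy as np

def wiener_postfilter(noisy_spec, noise_psd, floor=1e-3):
    """Single-channel Wiener post-filter sketch.

    noisy_spec -- complex STFT frame of the beamformed signal
    noise_psd  -- estimated noise power per frequency bin
    The gain xi/(1 + xi) uses an a-priori SNR xi estimated by spectral
    subtraction (one common choice, assumed here for illustration).
    """
    power = np.abs(noisy_spec) ** 2
    xi = np.maximum(power / np.maximum(noise_psd, 1e-12) - 1.0, 0.0)  # a-priori SNR estimate
    gain = np.maximum(xi / (1.0 + xi), floor)                         # Wiener gain with spectral floor
    return gain * noisy_spec
```

High-SNR bins pass nearly unchanged, while noise-only bins are attenuated to the floor.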
2. The method of claim 1, wherein obtaining a speech threshold value comprises:
The speech threshold value L_τ is expressed as follows:
L_τ = (1 − β)L_0 + βL_1
wherein L_0 is the energy of general environmental noise, L_1 is the field environment energy collected within the preset time, and β is a weight coefficient.
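The adaptive threshold of claim 2 maps directly to code. The `frame_energy` helper is a hypothetical way to obtain L_1 from a captured signal window; the claim itself does not specify how the field energy is measured.

```python
import numpy as np

def speech_threshold(L0, L1, beta):
    """Adaptive speech threshold L_tau = (1 - beta) * L0 + beta * L1.

    L0   -- energy of general environmental noise
    L1   -- field-environment energy collected over the preset time window
    beta -- weight coefficient trading off the two estimates
    """
    return (1.0 - beta) * L0 + beta * L1

def frame_energy(frame):
    """Mean squared amplitude of one signal frame (one possible L1 estimate)."""
    frame = np.asarray(frame, dtype=float)
    return float(np.mean(frame ** 2))
```

With beta near 1 the threshold tracks the current acoustic scene; with beta near 0 it stays close to the long-term noise floor.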
3. The flexible microphone array voice enhancement method of claim 1 or 2, wherein detecting the presence of voice based on a voice threshold value comprises:
dividing signals received by a microphone array into a plurality of sub-bands;
calculating the binary Gaussian log-likelihood ratio of each sub-band;
weighting and summing the binary Gaussian log-likelihood ratios of all the sub-bands;
and when the sum of the binary Gaussian log-likelihood ratios of all the sub-bands is larger than the voice threshold value, judging that voice exists in the signals received by the microphone array.
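The subband detection of claim 3 can be sketched as follows. The claim does not reproduce its log-likelihood-ratio statistic, so this sketch assumes the common simplification of the binary Gaussian (speech-present vs. speech-absent) LLR, gamma − log(gamma) − 1 with a-posteriori SNR gamma, obtained by plugging the maximum-likelihood a-priori SNR into the Gaussian-model ratio.

```python
import numpy as np

def subband_llr(signal_psd, noise_psd):
    """Per-subband binary Gaussian log-likelihood ratio (simplified form).

    gamma_k = S_k / N_k is the a-posteriori SNR of subband k; under Gaussian
    speech/noise models with ML a-priori SNR the LLR reduces to
    gamma_k - log(gamma_k) - 1 (an assumption, not the patent's exact form).
    """
    gamma = np.maximum(np.asarray(signal_psd, float) / np.asarray(noise_psd, float), 1e-12)
    return gamma - np.log(gamma) - 1.0

def detect_speech(signal_psd, noise_psd, weights, threshold):
    """Weighted sum of subband LLRs compared against the speech threshold."""
    llr = subband_llr(signal_psd, noise_psd)
    return float(np.dot(weights, llr)) > threshold
```

A frame whose subbands sit well above the noise floor yields a large weighted LLR sum and is flagged as speech.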
4. The flexible microphone array voice enhancement method according to claim 1, wherein constructing a spectral function from the covariance matrix of the signals received by the microphone array, performing a spectral peak search, and finding the maxima of the spectral function, the angle corresponding to each maximum being a sound source direction angle, comprises:
constructing a covariance matrix of the received signals of the microphone array;
performing eigenvalue decomposition on the covariance matrix to obtain the eigenvalues λ_1, λ_2, …, λ_M;
arranging the eigenvalues in descending order, i.e. λ_1 ≥ … > λ_j > λ_{j+1} = … = λ_M = σ², wherein σ² is the noise power;
spanning the noise subspace with the eigenvectors corresponding to the eigenvalues λ_{j+1}, …, λ_M;
and constructing a spectral function based on the steering vector of the flexible annular microphone array and the noise subspace, performing a spectral peak search on the spectral function, and finding all maxima of the spectral function within 0 to 360 degrees, wherein the angles corresponding to the maxima are the estimated sound source direction angles.
5. The flexible microphone array speech enhancement method of claim 1 or 4, wherein constructing a spectral function based on steering vectors and noise subspaces of the flexible annular microphone array comprises:
the expression of the spectral function is as follows:
P(θ) = 1 / (α^H(θ) V_Noise V_Noise^H α(θ))
wherein α(θ) = [α(θ_1), α(θ_2), …, α(θ_N)] is the steering vector of the flexible annular microphone array;
[formula images in the original give the element-wise expressions of α(θ_n); not reproduced here]
θ_n, n = 1, 2, …, N, is the incidence angle of the signal with respect to the individual microphones in the array; c is the sound velocity in air and d is the spacing between the microphones in the annular array; V_Noise represents the noise subspace.
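The eigendecomposition and peak search of claims 4 and 5 follow the classical MUSIC procedure, which can be sketched as below. Because the exact steering-vector expressions for the flexible ring array (in terms of d, c and θ_n) survive only as formula images in the original, the steering function is supplied by the caller and a generic array response is assumed.

```python
import numpy as np

def music_spectrum(snapshots, steering, n_sources):
    """MUSIC pseudospectrum sketch for a generic microphone array.

    snapshots -- M x T matrix of array snapshots (M microphones, T frames)
    steering  -- function: angle in radians -> length-M steering vector
    n_sources -- number of sources D; the smallest M - D eigenvalues of the
                 covariance matrix model the noise power
    """
    X = np.asarray(snapshots)
    M, T = X.shape
    R = X @ X.conj().T / T                  # covariance matrix of received signals
    w, V = np.linalg.eigh(R)                # eigenvalues in ascending order
    En = V[:, : M - n_sources]              # noise subspace (smallest M - D eigenvectors)
    angles = np.arange(0.0, 360.0)          # search grid in degrees, 0..359
    P = np.empty(len(angles))
    for i, deg in enumerate(angles):
        a = steering(np.deg2rad(deg))
        denom = np.abs(a.conj().T @ En @ En.conj().T @ a)
        P[i] = 1.0 / max(denom, 1e-12)      # peaks mark source direction angles
    return angles, P
```

A spectral peak search over the returned grid (e.g. `np.argmax` for a single source) yields the direction estimates of the claim.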
6. The flexible microphone array speech enhancement method of claim 1, wherein performing beam-response optimization on the signal in the sound source direction to enhance the sound source direction speech signal comprises:
acquiring the single-source-direction beam response S(θ_i) = W^H A(θ_i), i = 1, 2, …, D, wherein D is the number of sound sources determined by the spectral peak search and W is a weight vector;
and enhancing the source-direction speech signal by minimizing the beam response S(θ_i).
7. The method of claim 6, wherein enhancing the source-direction speech signal by minimizing the beam response S(θ_i) comprises:
minimizing the beam response S(θ_i), the minimization problem being expressed as:
min_W W^H A(θ_i)
s.t. W^H A(θ_0) = 1
wherein A(θ_0) is the steering vector of the enhanced sound source direction;
adding a penalty term to the minimization problem to update it to:
[formula image in the original; not reproduced]
wherein the coefficient [formula image in the original; not reproduced] is called the penalty parameter;
introducing an auxiliary variable b and solving the following minimization problem for W^H:
[formula image in the original; not reproduced]
wherein q is the iteration number;
differentiating with respect to the variable W^H in (1) and setting the result equal to 0 gives the following expression for W^H:
[formula image in the original; not reproduced]
expanding the Laplacian operator in (2) by central differences gives the numerical format for iteratively computing W^H:
[formula image in the original; not reproduced]
setting the stopping condition of the iterative optimization as:
[formula image in the original; not reproduced]
wherein ε is a real number greater than 0.
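The patent's penalized iterative scheme for claim 7 survives only as formula images, so it cannot be reproduced here. As a stand-in, the sketch below enforces the same constraints in closed form: unit response toward the enhanced direction θ_0 (W^H A(θ_0) = 1) and zero response toward the other source directions θ_i (the strongest form of a minimized beam response), via the minimum-norm solution of the constraint system. This is an illustrative substitute, not the claimed algorithm.

```python
import numpy as np

def constrained_weights(a0, A_interf):
    """Minimum-norm weight vector W with W^H a0 = 1 and W^H a_i = 0.

    a0       -- length-M steering vector A(theta_0) of the direction to enhance
    A_interf -- M x D matrix whose columns are the steering vectors A(theta_i)
                of the other source directions, driven to zero response
    """
    a0 = np.asarray(a0, dtype=complex).reshape(-1, 1)
    C = np.hstack([a0, np.asarray(A_interf, dtype=complex)])  # constraint matrix
    g = np.zeros(C.shape[1], dtype=complex)
    g[0] = 1.0                                                # unit gain toward theta_0
    # minimum-norm W satisfying C^H W = g
    W = C @ np.linalg.solve(C.conj().T @ C, g)
    return W
```

Applying W to the array snapshots passes the θ_0 source at unit gain while nulling the other detected directions.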
8. A flexible microphone array speech enhancement device for implementing the flexible microphone array speech enhancement method of any of claims 1-7, the device comprising an annular flexible microphone array, wherein a plurality of acoustic sensors are arranged at equal intervals in the annular flexible microphone array, and a power switch key, an enhanced-sound-source switching key, and an earphone are further arranged on the annular flexible microphone array.
9. An electronic device comprising a memory and a processor, wherein the memory is coupled to the processor; wherein the memory is for storing program data and the processor is for executing the program data to implement the flexible microphone array speech enhancement method of any of claims 1-7.
10. A computer readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the flexible microphone array speech enhancement method of any of claims 1-7.
CN202310349782.9A 2023-03-28 2023-03-28 Flexible microphone array voice enhancement method and device, electronic equipment and medium Pending CN116343808A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310349782.9A CN116343808A (en) 2023-03-28 2023-03-28 Flexible microphone array voice enhancement method and device, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310349782.9A CN116343808A (en) 2023-03-28 2023-03-28 Flexible microphone array voice enhancement method and device, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN116343808A true CN116343808A (en) 2023-06-27

Family

ID=86894837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310349782.9A Pending CN116343808A (en) 2023-03-28 2023-03-28 Flexible microphone array voice enhancement method and device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN116343808A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117037836A (en) * 2023-10-07 2023-11-10 之江实验室 Real-time sound source separation method and device based on signal covariance matrix reconstruction
CN117037836B (en) * 2023-10-07 2023-12-29 之江实验室 Real-time sound source separation method and device based on signal covariance matrix reconstruction

Similar Documents

Publication Publication Date Title
Ishi et al. Evaluation of a MUSIC-based real-time sound localization of multiple sound sources in real noisy environments
US9837099B1 (en) Method and system for beam selection in microphone array beamformers
US7613310B2 (en) Audio input system
CN110770827B (en) Near field detector based on correlation
US10957338B2 (en) 360-degree multi-source location detection, tracking and enhancement
US20070100605A1 (en) Method for processing audio-signals
Taseska et al. Informed spatial filtering for sound extraction using distributed microphone arrays
JP2002062348A (en) Apparatus and method for processing signal
Koldovský et al. Spatial source subtraction based on incomplete measurements of relative transfer function
CN107211225A (en) Hearing assistant system
Grondin et al. Time difference of arrival estimation based on binary frequency mask for sound source localization on mobile robots
Talagala et al. Binaural sound source localization using the frequency diversity of the head-related transfer function
TW202147862A (en) Robust speaker localization in presence of strong noise interference systems and methods
CN116343808A (en) Flexible microphone array voice enhancement method and device, electronic equipment and medium
Levin et al. Near-field signal acquisition for smartglasses using two acoustic vector-sensors
Hosseini et al. Time difference of arrival estimation of sound source using cross correlation and modified maximum likelihood weighting function
Corey et al. Motion-tolerant beamforming with deformable microphone arrays
Choi et al. Convolutional neural network-based direction-of-arrival estimation using stereo microphones for drone
Carabias-Orti et al. Multi-source localization using a DOA Kernel based spatial covariance model and complex nonnegative matrix factorization
CN110858485B (en) Voice enhancement method, device, equipment and storage medium
CN113223552A (en) Speech enhancement method, speech enhancement device, speech enhancement apparatus, storage medium, and program
CA3146517A1 (en) Speech-tracking listening device
Karthik et al. Subband Selection for Binaural Speech Source Localization.
Kowalczyk et al. Embedded system for acquisition and enhancement of audio signals
CN117037836B (en) Real-time sound source separation method and device based on signal covariance matrix reconstruction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination