CN116343808A - Flexible microphone array voice enhancement method and device, electronic equipment and medium - Google Patents
- Publication number: CN116343808A (application CN202310349782.9A)
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Abstract
The invention discloses a flexible microphone array speech enhancement method and apparatus, an electronic device, and a medium. The method comprises: acquiring a speech threshold value; dividing the signal received by the microphone array into several sub-bands, computing the weighted sum of the binary Gaussian log-likelihood ratios of all sub-bands, and detecting the presence of speech against the speech threshold value; computing a spectral function from the covariance matrix of the signals received by the microphone array, performing a spectral-peak search, and finding the maxima of the spectral function, with the angle corresponding to each maximum taken as a sound-source direction angle; and optimizing the beam response in the sound-source direction to enhance the speech signal from that direction, then outputting the enhanced speech signal after Wiener filtering. The method enables real-time speech-signal separation and enhancement for multi-speaker conversation in a noisy environment.
Description
Technical Field
The present invention relates to flexible circuits, and more particularly, to a method and apparatus for voice enhancement of a flexible microphone array, an electronic device, and a medium.
Background
Multi-speaker voice recognition and separation in complex environments is an important and practical task. Many everyday scenarios, such as indoor multi-person conferences and outdoor team activities, take place in noisy environments. The signals recorded by a traditional sensor system contain background noise and multiple speakers' voices simultaneously, so the position and speech content of each speaker cannot be effectively distinguished. A conventional audio transceiver system therefore cannot enhance and relay a desired sound-source signal from a specific direction.
A microphone array is a group of acoustic sensors (microphones) placed in an ordered arrangement. By exploiting the small time differences between the arrivals of a sound wave at the individual microphones, an array achieves better spatial directivity than a single microphone. Microphone arrays are commonly used for sound-source localization, background-noise suppression, and signal extraction and separation. A microphone array does not restrict the speaker's movement and can localize a sound source at any position in space, making it an important basic device for human-computer interaction and directional speech pickup. The speech-separation problem originates from the "cocktail party problem": isolating the desired speaker's voice from a noisy environment (other speakers' interference or background noise) so that the desired voice becomes clearer. Existing microphone-array technology has limitations: the number of microphones in an array is constrained by device size and power consumption and cannot be increased greatly, and the distance from the array to the sound source is often large enough that the recorded audio has a low signal-to-noise ratio.
A wearable device is a portable device that fits closely to the user and can be used in fields such as health monitoring and virtual displays. Most existing wearable devices take the form of watches, earphones, or glasses, with relatively fixed styles. A flexible wearable device has high mechanical flexibility, conforms well to the skin, and achieves better integration of person and sensor. A flexible MEMS microphone array is small, consumes little power, attaches well to the surface of human skin, and is convenient to wear. During actual movement it integrates real-time speech-signal acquisition, storage, and processing, and feeds the desired speech signal back to designated personnel in real time.
The invention provides a voice enhancement method based on wearable equipment so as to realize multi-person voice separation and enhancement in a complex environment.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a flexible microphone array speech enhancement method and apparatus, an electronic device, and a medium.
According to a first aspect of an embodiment of the present invention, there is provided a flexible microphone array speech enhancement method, including:
acquiring a voice threshold value;
detecting the existence of voice based on the voice threshold value;
solving a spectrum function of a covariance matrix of signals received by the microphone array, searching spectrum peaks, and finding out a maximum value of the spectrum function, wherein an angle corresponding to the maximum value is a sound source direction angle;
and carrying out beam response optimization on the signal in the sound source direction to strengthen the voice signal in the sound source direction, and outputting the strengthened voice signal after wiener filtering processing.
Further, obtaining the voice threshold value includes:
The speech threshold value L_τ is expressed as follows:

L_τ = (1 - β)L_0 + βL_1

wherein L_0 is the energy of general environmental noise, L_1 is the field environmental energy collected within the preset time, and β is a weight coefficient.
Further, performing voice presence detection based on the voice threshold value includes:
dividing signals received by a microphone array into a plurality of sub-bands;
calculating the binary Gaussian log-likelihood ratio of each sub-band;
weighting and summing the binary Gaussian log-likelihood ratios of all the sub-bands;
and when the sum of the binary Gaussian log-likelihood ratios of all the sub-bands is larger than the voice threshold value, judging that voice exists in the signals received by the microphone array.
Further, calculating a spectrum function of a covariance matrix of signals received by the microphone array, searching spectrum peaks, and finding a maximum value of the spectrum function, wherein an angle corresponding to the maximum value, namely a sound source direction angle, comprises:
constructing a covariance matrix of the received signals of the microphone array;
performing eigenvalue decomposition on the covariance matrix to obtain the eigenvalues λ_1, λ_2, …, λ_M;

arranging the eigenvalues λ_1, λ_2, …, λ_M in descending order, i.e. λ_1 ≥ … ≥ λ_j > λ_{j+1} = … = λ_M = σ², wherein σ² is the noise power;

spanning the noise subspace with the eigenvectors corresponding to the eigenvalues λ_{j+1}, …, λ_M;

and constructing a spectral function based on the steering vector of the flexible annular microphone array and the noise subspace, performing a spectral-peak search on the spectral function, and finding all maxima of the spectral function within 0 to 360 degrees, wherein the angles corresponding to the maxima are the angle estimates of the sound-source directions.
Further, constructing a spectral function based on the steering vector and the noise subspace of the flexible annular microphone array includes:
the expression of the spectral function is as follows:
wherein a (θ) = [ α (θ) 1 ),α(θ 2 )…α(θ n )]Is a steering vector for the flexible annular microphone array, θ n n=1, 2 … N, the angle of incidence of the signal with respect to the individual microphones in the array; c is the sound velocity in the air and d is the spacing between the microphones in the annular array; v (V) Noise Representing the noise subspace.
Further, performing beam response optimization on the signal of the sound source direction to enhance the sound source direction voice signal includes:
acquiring the single-sound-source-direction beam response S(θ_i) = W^H A(θ), i = 1, 2, …, D, wherein D is the number of sound sources determined by the spectral-peak search and W is a weight vector;

enhancing the sound-source-direction speech signal by minimizing the beam response S(θ_i).
Further, enhancing the sound-source-direction speech signal by minimizing the beam response S(θ_i) includes:

the minimization problem for the beam response S(θ_i) is expressed as:

min W^H A(θ)

s.t. W^H A(θ_0) = 1

wherein A(θ_0) is the steering vector of the enhanced sound-source direction;

adding a penalty term to the minimization problem to update it to an unconstrained problem, wherein the coefficient of the penalty term is called the penalty parameter;

introducing an auxiliary variable b and solving the resulting minimization problem for W^H, wherein q is the iteration number;

differentiating (1) with respect to the variable W^H and setting the result equal to 0 to obtain an expression for W^H;

expanding the Laplacian operator in (2) using the central difference to obtain a numerical scheme for iteratively computing W^H;

setting the stopping condition of the iterative optimization to be that the iteration error falls below ε, wherein ε is a real number greater than 0.
According to a second aspect of an embodiment of the present invention, there is provided a flexible microphone array speech enhancement apparatus for implementing the above flexible microphone array speech enhancement method. The apparatus comprises an annular flexible microphone array on which a plurality of acoustic sensors are arranged at equal intervals; the annular array is further provided with a power key, an enhanced-sound-source switching key, and an earphone.
According to a third aspect of embodiments of the present invention, there is provided an electronic device comprising a memory and a processor, the memory being coupled to the processor; the memory is used for storing program data, and the processor is used for executing the program data to realize the flexible microphone array voice enhancement method.
According to a fourth aspect of embodiments of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described flexible microphone array speech enhancement method.
Compared with the prior art, the invention has the following beneficial effects. The flexible microphone array speech enhancement method computes a spectral function from the covariance matrix of the signals received by the microphone array, performs a spectral-peak search, and finds the maxima of the spectral function, taking the angle corresponding to each maximum as a sound-source direction angle; the method thereby achieves separation and enhancement of multiple speakers' speech in complex environments. The invention also provides a flexible microphone array speech enhancement apparatus with which the user can switch, by key press, among the enhanced speech of the individual sound-source directions. This solves the difficulty of distinguishing mixed multi-speaker speech in complex scenes, allows convenient real-time speech processing and enhancement, and is particularly suitable for multi-person conversation in noisy outdoor environments.
Drawings
Fig. 1 is a schematic diagram of a flexible microphone array arrangement of the invention;
FIG. 2 is a flow chart of the present invention for speech separation and enhancement;
FIG. 3 is a schematic diagram of the result of the calculation of the sum of the likelihood ratios for voice presence detection of the present invention;
FIG. 4 is a schematic illustration of the result of the voice presence detection of the present invention;
FIG. 5 is a schematic illustration of the sound source localization results of the present invention;
FIG. 6 is a waveform diagram of a single sensor input signal to a microphone array in an embodiment of the invention;
FIG. 7 is a waveform diagram of a voice signal of a sound source direction after signal enhancement by a single sensor in an embodiment of the present invention;
fig. 8 is a schematic diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to specific embodiments and figures. The following examples are presented only to aid in the understanding of the invention. It should be noted that it will be apparent to those skilled in the art that various modifications and adaptations of the invention can be made without departing from the principles of the invention and these modifications and adaptations are intended to be within the scope of the invention as defined in the following claims.
To address the difficulty of distinguishing mixed multi-speaker speech in a noisy multi-person conversation scene, the embodiment of the invention provides a flexible microphone array speech enhancement method. As shown in Fig. 1, the method is implemented on a flexible microphone array speech enhancement device comprising an annular flexible microphone array on which a plurality of acoustic sensors are arranged at equal intervals, together with a power key B1 and an enhanced-sound-source switching key B2. Pressing the power key B1 turns the device on; pressing and holding B1 for 3 seconds turns it off. Each press of the switching key B2 switches among the desired sound-source signals separated by the device, supporting multi-source separation scenarios. A small earphone is also provided on the annular array, through which the user listens to the array's received signal and the enhanced speech signal. The annular flexible microphone array fits the ring-shaped region formed by the forehead, the area above the left ear, the back of the head, and the area above the right ear, and can be worn on the head like a headband.
Fig. 2 shows a flowchart of a method for enhancing voice of a flexible microphone array according to an embodiment of the present invention, where the method specifically includes the following steps:
step S1, a voice threshold value is obtained.
In step S1, each time the flexible microphone array speech enhancement device is turned on, it records the sound sources present in the field environment for 10 seconds, automatically calibrates itself, and computes the threshold value L_τ used to determine whether speech is present in the environment.
L_τ is an automatically updated threshold, expressed as:

L_τ = (1 - β)L_0 + βL_1

wherein L_0 is the general environmental noise energy collected a priori, and L_1 is the field environmental energy collected during the first 10 seconds after each power-on of the flexible microphone array speech enhancement device; β is a weight coefficient, taken empirically as 0.95 in this example. The field environmental energy L_1 is computed by squaring and summing the signals of the flexible microphone array: L_1 = Σ X²(t), where t denotes time. The speech signal acquired by the flexible microphone array is denoted X(t) = (x_1(t), …, x_N(t)), where x_n(t), n = 1, 2, …, N, is the speech signal picked up by a single microphone and N is the number of microphones in the array; N = 32 in this embodiment.
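The threshold update above can be sketched in a few lines. The function name and the (N, T) signal shape are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def speech_threshold(x_calib, L0, beta=0.95):
    """Auto-updating speech threshold L_tau = (1 - beta) * L0 + beta * L1.

    x_calib : (N, T) array of the N microphone signals recorded during the
              10-second calibration window after power-on (assumed shape).
    L0      : energy of general environmental noise, collected a priori.
    beta    : empirical weight coefficient (0.95 in the embodiment).
    """
    L1 = np.sum(x_calib ** 2)  # field environment energy: sum of squared samples
    return (1.0 - beta) * L0 + beta * L1
```

Because β is close to 1, the threshold is dominated by the energy actually observed at the current site, while the a-priori noise energy L_0 keeps it sensible when the calibration window is unusually quiet.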
And S2, detecting the existence of the voice based on the voice threshold value obtained in the step S1.
When no speech is present, speech separation is unnecessary, so the flexible microphone array speech enhancement device must first perform speech-presence detection on the collected signals.
The invention determines whether the signal received by the flexible microphone array contains speech by computing the sub-band log-likelihood ratios of the array signal. Specifically, in this example the signal received by the flexible microphone array is divided into several sub-bands, the sum of the binary Gaussian log-likelihood ratios of all sub-bands is computed, and speech-presence detection is performed against the speech threshold value obtained in step S1. The signal received by the n-th microphone is expressed as:

x_n(t) = g_{n,d}(t) * s_d(t) + g_{n,i}(t) * s_i(t) + v_n(t)

wherein s_d(t) denotes the desired speech signal, s_i(t) denotes the interfering sound signal, and v_n(t) is other environmental noise; g_{n,d}(t) denotes the acoustic impulse response between the n-th microphone and the desired speech source, g_{n,i}(t) that between the n-th microphone and the interfering source, and * denotes convolution.
The speech and interfering sound signals are treated as two independent, uncorrelated variables; the subscript d marks quantities associated with the desired speech signal. In this example the received signal band is divided into 4 sub-bands: 100-400 Hz, 400-1000 Hz, 1000-2000 Hz, and 2000-3500 Hz.
The binary Gaussian log-likelihood ratio of each sub-band is computed from the following statistics: μ_ds is the power mean of the desired speech signal within a sub-band and μ_is is that of the interfering sound signal; σ_ds is the power variance of the desired speech signal within a sub-band and σ_is is that of the interfering sound signal; μ_dE is the power mean of the noise associated with the speech signal within a sub-band and μ_iE is that of the noise associated with the interfering sound signal and environmental noise; σ_dE and σ_iE are the corresponding power variances; k denotes the sub-band index.
The binary Gaussian log-likelihood ratios of all sub-bands are weighted and summed, and speech is judged to be present in the signals received by the microphone array when the sum exceeds the speech threshold value:

L = Σ_k α_k L_k

wherein α_k is the weight of sub-band k and L_k is its binary Gaussian log-likelihood ratio. If L ≥ L_τ, speech is considered to be present in the received signal.
FIG. 3 shows the computed sum of the log-likelihood ratios of the 4 sub-bands over a period of time in this embodiment. The signal inside the box in FIG. 4, representing the detected speech interval, corresponds to the region in FIG. 3 where the likelihood-ratio sum exceeds the threshold L_τ. In the semi-anechoic-chamber test environment, L_τ = 3.1 and α_k = 0.25.
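The sub-band detection step can be sketched as follows. The patent's exact likelihood-ratio formula appears only as an image in the source, so a standard two-hypothesis Gaussian log-likelihood ratio on sub-band power is used here as a stand-in, the sub-band statistics `stats` are assumed known, and the FFT band mask replaces whatever filter bank the device actually uses.

```python
import numpy as np

# Sub-band edges used in the embodiment (Hz).
BANDS = [(100, 400), (400, 1000), (1000, 2000), (2000, 3500)]

def subband_power(x, fs, lo, hi):
    """Mean power of x restricted to the band [lo, hi) Hz, via an FFT mask."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(x.size, d=1.0 / fs)
    xb = np.fft.irfft(np.where((f >= lo) & (f < hi), X, 0.0), n=x.size)
    return np.mean(xb ** 2)

def gaussian_llr(p, mu_s, var_s, mu_n, var_n):
    """Binary Gaussian log-likelihood ratio of an observed sub-band power p
    under a 'speech' model N(mu_s, var_s) versus a 'noise' model N(mu_n, var_n)."""
    def log_gauss(v, mu, var):
        return -0.5 * np.log(2.0 * np.pi * var) - (v - mu) ** 2 / (2.0 * var)
    return log_gauss(p, mu_s, var_s) - log_gauss(p, mu_n, var_n)

def voice_present(x, fs, stats, L_tau, alpha=0.25):
    """Weighted sum of sub-band LLRs compared against the threshold L_tau
    (alpha = 0.25 for every sub-band, as in the semi-anechoic-chamber test).
    stats[k] = (mu_s, var_s, mu_n, var_n) for sub-band k, assumed known."""
    total = sum(alpha * gaussian_llr(subband_power(x, fs, lo, hi), *stats[k])
                for k, (lo, hi) in enumerate(BANDS))
    return total >= L_tau, total
```

In use, `stats` would be estimated from labeled speech and noise recordings for each sub-band; the decision is simply whether the weighted sum clears L_τ.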
Step S3, a spectrum function is obtained for a covariance matrix of signals received by the microphone array, spectrum peak search is carried out, and a maximum value of the spectrum function is found, wherein an angle corresponding to the maximum value is a sound source direction angle.
The covariance matrix of the received signals of the microphone array is expressed as:

R = E[X(t) X^H(t)]

wherein H is the conjugate-transpose operator and n indexes the individual microphones of the array in X(t) = (x_1(t), …, x_N(t)).
Eigenvalue decomposition is performed on the covariance matrix R, and the speech-signal subspace and noise subspace are constructed from the eigenvectors; because the speech signal and the noise are mutually independent, the two subspaces are orthogonal. This yields:

R = V Σ V^H

wherein V = [V_Speech, V_Noise] and Σ = diag(λ_1, λ_2, …, λ_M), with λ_1, …, λ_M the eigenvalues of the covariance matrix R arranged in descending order, i.e. λ_1 ≥ … ≥ λ_j > λ_{j+1} = … = λ_M = σ², where σ² is the noise power. The first j eigenvalues are associated with the speech signal and exceed σ²; the eigenvectors corresponding to these j eigenvalues span the speech-signal subspace V_Speech, while the eigenvectors corresponding to λ_{j+1} through λ_M span the noise subspace V_Noise.
The source direction angle θ of the speech signal is obtained by a spectral-peak search of the spatial spectral function:

P(θ) = 1 / (a^H(θ) V_Noise V_Noise^H a(θ))

wherein a(θ) = [α(θ_1), α(θ_2), …, α(θ_N)] is the steering vector of the flexible annular microphone array, θ_n (n = 1, 2, …, N) is the incidence angle of the signal with respect to the individual microphones in the array; c is the speed of sound in air and d is the spacing between the microphones in the annular array.

A spectral-peak search is performed on the spectral function P(θ), finding all maxima of P(θ) within 0 to 360 degrees; the θ corresponding to each maximum is an angle estimate of the sound-source direction.
In the case of multiple sound sources, a sound-source localization direction vector Θ = (θ_1, θ_2, …, θ_D) is output, where D is the number of sound sources determined by the spectral-peak search. As shown in FIG. 5, three sound sources are found in this embodiment, with direction angles of 40 degrees, 128 degrees, and 220 degrees.
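The localization step above (covariance matrix, eigendecomposition, noise-subspace scan) can be sketched as a MUSIC-style search. The steering function `steer` is geometry-specific and an assumption here; the patent's device uses a flexible annular array, whereas the sketch leaves the geometry to the caller.

```python
import numpy as np

def music_spectrum(X, steer, angles, n_sources=1):
    """Sketch of step S3: form the covariance matrix of the array snapshots,
    eigen-decompose it, span the noise subspace with the eigenvectors of the
    smallest eigenvalues, and scan the pseudo-spectrum
        P(theta) = 1 / |a^H(theta) Vn Vn^H a(theta)|.

    X      : (N, T) complex array of N microphone snapshots over T samples.
    steer  : function theta_deg -> (N,) complex steering vector (assumed).
    angles : sequence of candidate angles in degrees.
    """
    N, T = X.shape
    R = X @ X.conj().T / T                 # covariance matrix of the received signals
    w, V = np.linalg.eigh(R)               # eigenvalues in ascending order
    Vn = V[:, : N - n_sources]             # noise subspace: smallest eigenvalues
    P = np.empty(len(angles))
    for i, th in enumerate(angles):
        a = steer(th)
        P[i] = 1.0 / max(np.abs(a.conj() @ Vn @ Vn.conj().T @ a), 1e-12)
    return P
```

The direction estimates are the angles at the maxima of the returned spectrum; for a uniform linear array with half-wavelength spacing, `steer` would be `lambda th: np.exp(-1j * np.pi * np.arange(N) * np.sin(np.radians(th)))`.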
And S4, carrying out beam response optimization on the sound source direction signal to strengthen the sound source direction voice signal, and outputting the strengthened voice signal after wiener filtering processing.
The single-sound-source-direction beam response is S(θ_i) = W^H A(θ), i = 1, 2, …, D. Speech enhancement in the sound-source direction is achieved by minimizing the beam response S(θ_i) to obtain the weight vector W. This optimization problem can be expressed as:

min W^H A(θ)

s.t. W^H A(θ_0) = 1

wherein A(θ_0) is the steering vector of the enhanced sound-source direction.
Adding a penalty term to this constrained minimization problem changes it to an unconstrained optimization problem whose penalty coefficient is called the penalty parameter.

An auxiliary variable b is introduced, and the resulting minimization problem is solved for W^H, where q is the iteration number.

Differentiating (1) with respect to the variable W^H and setting the result equal to 0 yields an expression for W^H.

Expanding the Laplacian operator in (2) using the central difference gives a numerical scheme for iteratively computing W^H.

The stopping condition of the iterative computation is that the iteration error falls below ε, a real number greater than 0. The output speech signal of the desired sound source is then further denoised by Wiener filtering, improving the speech-enhancement effect in the desired sound-source direction and yielding the final enhanced output speech signal.
Further, the user can switch the enhanced voice signals of different sound source directions through the enhanced sound source switching key B2.
The mixed speech signal collected by the 4th microphone of the flexible microphone array in this embodiment is shown in FIG. 6, and the waveform of the enhanced speech signal output for the 40-degree sound-source direction after separation is shown in FIG. 7. Comparing FIG. 7 with the mixed-signal waveform of FIG. 6 shows that the enhanced waveform strongly suppresses the interfering signals, leaving only the amplitude of the desired speech signal prominent.
As shown in fig. 8, an embodiment of the present application provides an electronic device comprising a memory 101 for storing one or more programs, a processor 102, and a communication interface 103; the method of any one of the first aspects above is implemented when the one or more programs are executed by the processor 102.

The memory 101, the processor 102, and the communication interface 103 are electrically connected to one another, directly or indirectly, to enable data transmission or interaction; for example, the components may be connected via one or more communication buses or signal lines. The memory 101 may be used to store software programs and modules, which the processor 102 executes to perform various functional applications and data processing. The communication interface 103 may be used for signaling or data communication with other node devices.
The memory 101 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), or electrically erasable programmable read-only memory (EEPROM).

The processor 102 may be an integrated circuit chip with signal-processing capability. It may be a general-purpose processor, including a central processing unit (CPU) or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In the embodiments provided in the present application, it should be understood that the disclosed method and system may be implemented in other manners. The above-described method and system embodiments are merely illustrative, for example, flow charts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods and systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
In another aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as in any of the first aspect described above. The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the part of it that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
The above embodiments merely illustrate the design concept and features of the present invention and are intended to enable those skilled in the art to understand and implement the content of the present invention; the scope of the present invention is not limited to these embodiments. Therefore, all equivalent changes or modifications made according to the principles and design ideas of the present invention fall within the scope of the present invention.
Claims (10)
1. A method of flexible microphone array speech enhancement, the method comprising:
acquiring a voice threshold value;
detecting the existence of voice based on the voice threshold value;
solving a spectrum function of the covariance matrix of the signals received by the microphone array, performing a spectral peak search to find the maximum of the spectrum function, wherein the angle corresponding to the maximum is the sound source direction angle;
and performing beam response optimization on the signal in the sound source direction to enhance the voice signal in the sound source direction, and outputting the enhanced voice signal after Wiener filtering.
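The final step of claim 1 applies a Wiener post-filter to the beamformed output. A minimal single-channel sketch, assuming an STFT-domain signal and a separately estimated residual-noise PSD (the function name and both inputs are illustrative, not from the patent):

```python
import numpy as np

def wiener_postfilter(noisy_stft, noise_psd, eps=1e-12):
    """Apply a single-channel Wiener gain to an STFT of the beamformed signal.

    noisy_stft : complex array (frames, bins) of the enhanced-beam output
    noise_psd  : real array (bins,), estimate of the residual noise power
    """
    # Spectral-subtraction estimate of the clean-speech PSD, floored at zero.
    signal_psd = np.maximum(np.abs(noisy_stft) ** 2 - noise_psd, 0.0)
    # Wiener gain H = S / (S + N), bounded in [0, 1].
    gain = signal_psd / (signal_psd + noise_psd + eps)
    return gain * noisy_stft
```

With a zero noise estimate the gain is (numerically) one and the signal passes unchanged; with noise dominating, the gain collapses toward zero.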
2. The method of claim 1, wherein obtaining a speech threshold value comprises:
the speech threshold value L_τ is given by:

L_τ = (1 − β)·L_0 + β·L_1

wherein L_0 is the energy of general environmental noise, L_1 is the on-site environmental energy collected within a preset time, and β is a weight coefficient.
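The weighted threshold of claim 2 can be sketched directly; the function name and the bounds check on β are assumptions, not from the patent:

```python
def speech_threshold(l0, l1, beta):
    """L_tau = (1 - beta) * L0 + beta * L1 (claim 2).

    l0   : energy of general environmental noise
    l1   : on-site environmental energy collected within a preset time
    beta : weight coefficient, assumed here to lie in [0, 1]
    """
    assert 0.0 <= beta <= 1.0, "weight coefficient expected in [0, 1]"
    return (1.0 - beta) * l0 + beta * l1
```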
3. The flexible microphone array voice enhancement method of claim 1 or 2, wherein detecting the presence of voice based on a voice threshold value comprises:
dividing signals received by a microphone array into a plurality of sub-bands;
calculating the binary Gaussian log-likelihood ratio of each sub-band;
weighting and summing the binary Gaussian log-likelihood ratios of all the sub-bands;
and when the sum of the binary Gaussian log-likelihood ratios of all the sub-bands is larger than the voice threshold value, judging that voice exists in the signals received by the microphone array.
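The patent does not give the exact per-sub-band likelihood model. The sketch below uses the classic Sohn-style two-hypothesis Gaussian log-likelihood ratio as one plausible instantiation of claim 3; `band_edges`, `weights`, and the prior-SNR estimate are illustrative assumptions:

```python
import numpy as np

def subband_vad(power_spec, noise_psd, band_edges, weights, threshold):
    """Sub-band Gaussian LLR voice-activity detection (a sketch of claim 3)."""
    gamma = power_spec / np.maximum(noise_psd, 1e-12)   # posterior SNR per bin
    xi = np.maximum(gamma - 1.0, 1e-6)                  # crude prior-SNR estimate
    # Per-bin two-hypothesis Gaussian log-likelihood ratio.
    llr_bins = gamma * xi / (1.0 + xi) - np.log1p(xi)
    # Average the LLR inside each sub-band, then take the weighted sum.
    llr_bands = [llr_bins[a:b].mean()
                 for a, b in zip(band_edges[:-1], band_edges[1:])]
    score = float(np.dot(weights, llr_bands))
    return score > threshold, score
```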
4. The flexible microphone array voice enhancement method according to claim 1, wherein solving the spectrum function of the covariance matrix of the signals received by the microphone array, performing a spectral peak search, and finding the maximum of the spectrum function, the angle corresponding to the maximum being the sound source direction angle, comprises:
constructing a covariance matrix of the received signals of the microphone array;
performing eigenvalue decomposition on the covariance matrix to obtain eigenvalues λ_1, λ_2, …, λ_M;
arranging the eigenvalues λ_1, λ_2, …, λ_M in descending order, i.e. λ_1 ≥ … > λ_j > λ_{j+1} = … = λ_M = σ², wherein σ² is the noise power;
spanning the noise subspace with the eigenvectors corresponding to the eigenvalues λ_{j+1}, …, λ_M;
and constructing a spectrum function based on the steering vector of the flexible annular microphone array and the noise subspace, performing a spectral peak search on the spectrum function, and finding all maxima of the spectrum function within 0 to 360 degrees, wherein the angles corresponding to the maxima are the estimated sound source direction angles.
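The covariance/eigendecomposition/noise-subspace procedure of claim 4 is the MUSIC algorithm. A compact NumPy sketch; the `steering` callback and the angle grid are interface assumptions, not the patent's implementation:

```python
import numpy as np

def music_doa(snapshots, steering, n_sources, grid_deg):
    """MUSIC pseudo-spectrum peak search (a sketch of claim 4).

    snapshots : (M, T) complex array of microphone snapshots
    steering  : function theta_deg -> (M,) complex array steering vector
    """
    m, t = snapshots.shape
    r = snapshots @ snapshots.conj().T / t            # sample covariance matrix
    _, v = np.linalg.eigh(r)                          # eigenvalues ascending
    vn = v[:, : m - n_sources]                        # noise subspace (smallest eigenvalues)
    spectrum = []
    for theta in grid_deg:
        a = steering(theta)
        denom = np.linalg.norm(vn.conj().T @ a) ** 2  # projection onto noise subspace
        spectrum.append(1.0 / max(denom, 1e-12))      # MUSIC pseudo-spectrum P(theta)
    spectrum = np.asarray(spectrum)
    return grid_deg[int(np.argmax(spectrum))], spectrum
```

Steering vectors orthogonal to the noise subspace make the denominator vanish, so peaks of the pseudo-spectrum mark the source direction angles.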
5. The flexible microphone array speech enhancement method of claim 1 or 4, wherein constructing a spectral function based on steering vectors and noise subspaces of the flexible annular microphone array comprises:
the expression of the spectral function is:

P(θ) = 1 / (α^H(θ) · V_Noise · V_Noise^H · α(θ))

wherein A(θ) = [α(θ_1), α(θ_2), …, α(θ_N)] is the steering vector of the flexible annular microphone array, θ_n (n = 1, 2, …, N) is the incidence angle of the signal with respect to each microphone in the array, c is the speed of sound in air, d is the spacing between the microphones in the annular array, and V_Noise denotes the noise subspace.
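For a uniform circular (annular) array, the steering vector can be built from the microphone spacing d and the speed of sound c that claim 5 mentions. The phase convention and the chord-to-radius conversion below are common choices, not taken from the patent:

```python
import numpy as np

def uca_steering(theta_deg, n_mics, d, freq, c=343.0):
    """Steering vector of a uniform circular (annular) array.

    d is the spacing between adjacent microphones on the ring; the ring radius
    follows from the chord length: r = d / (2 sin(pi / N)).
    """
    r = d / (2.0 * np.sin(np.pi / n_mics))
    # Angular positions of the N microphones, equally spaced on the ring.
    phi = 2.0 * np.pi * np.arange(n_mics) / n_mics
    theta = np.radians(theta_deg)
    # Far-field plane-wave phase at each microphone for a source at theta.
    return np.exp(1j * 2.0 * np.pi * freq * r * np.cos(theta - phi) / c)
```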
6. The flexible microphone array speech enhancement method of claim 1, wherein beam response optimizing the signal of the sound source direction to enhance the sound source direction speech signal comprises:
acquiring the single-source-direction beam response at angle θ: S(θ_i) = W^H A(θ_i), i = 1, 2, …, D, wherein D is the number of sound sources determined by the spectral peak search and W is a weight vector;
minimizing the beam response S(θ_i) to enhance the sound-source-direction voice signal.
7. The flexible microphone array voice enhancement method according to claim 6, wherein minimizing the beam response S(θ_i) to enhance the sound-source-direction voice signal comprises:

formulating the minimization of the beam response S(θ_i) as:

min_W W^H A(θ_i)

s.t. W^H A(θ_0) = 1

wherein A(θ_0) is the steering vector of the sound source direction to be enhanced;
adding a penalty term to the minimization problem to obtain an updated minimization problem, wherein the coefficient of the penalty term is called the penalty parameter;
introducing an auxiliary variable b and solving the resulting minimization problem for W^H, wherein q is the iteration number;

taking the derivative of the minimization problem with respect to the variable W^H and setting the result equal to zero to obtain an expression for W^H;

and expanding the Laplacian operator in that expression using the central difference to obtain a numerical format for iteratively calculating W^H.
8. A flexible microphone array speech enhancement device for implementing the flexible microphone array speech enhancement method of any of claims 1-7, the device comprising: an annular flexible microphone array, wherein a plurality of acoustic sensors are arranged at equal intervals in the annular flexible microphone array, and a power switch key, an enhanced-sound-source switching key, and an earphone are further provided on the annular flexible microphone array.
9. An electronic device comprising a memory and a processor, wherein the memory is coupled to the processor; wherein the memory is for storing program data and the processor is for executing the program data to implement the flexible microphone array speech enhancement method of any of claims 1-7.
10. A computer readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the flexible microphone array speech enhancement method of any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310349782.9A CN116343808A (en) | 2023-03-28 | 2023-03-28 | Flexible microphone array voice enhancement method and device, electronic equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116343808A true CN116343808A (en) | 2023-06-27 |
Family
ID=86894837
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310349782.9A Pending CN116343808A (en) | 2023-03-28 | 2023-03-28 | Flexible microphone array voice enhancement method and device, electronic equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116343808A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117037836A (en) * | 2023-10-07 | 2023-11-10 | 之江实验室 | Real-time sound source separation method and device based on signal covariance matrix reconstruction |
CN117037836B (en) * | 2023-10-07 | 2023-12-29 | 之江实验室 | Real-time sound source separation method and device based on signal covariance matrix reconstruction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||