WO2023066213A1 - Microphone array and signal processing method and apparatus therefor, and device and medium - Google Patents


Info

Publication number
WO2023066213A1
Authority
WO
WIPO (PCT)
Prior art keywords
frequency band
frequency
main lobe
array
weighting coefficient
Application number
PCT/CN2022/125739
Other languages
French (fr)
Chinese (zh)
Inventor
李天宇 (Li Tianyu)
Original Assignee
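广州视源电子科技股份有限公司 (Guangzhou Shiyuan Electronic Technology Co., Ltd.)
广州视源人工智能创新研究院有限公司 (Guangzhou Shiyuan Artificial Intelligence Innovation Research Institute Co., Ltd.)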
Application filed by 广州视源电子科技股份有限公司 and 广州视源人工智能创新研究院有限公司


Classifications

    • G PHYSICS
        • G01 MEASURING; TESTING
            • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
                • G01S19/00 Satellite radio beacon positioning systems; Determining position, velocity or attitude using signals transmitted by such systems
                    • G01S19/01 Satellite radio beacon positioning systems transmitting time-stamped messages, e.g. GPS [Global Positioning System], GLONASS [Global Orbiting Navigation Satellite System] or GALILEO
                        • G01S19/13 Receivers
                            • G01S19/21 Interference related issues; Issues related to cross-correlation, spoofing or other methods of denial of service
        • G10 MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
                • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
                    • G10L21/003 Changing voice quality, e.g. pitch or formants
                    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
                        • G10L21/0208 Noise filtering
                            • G10L21/0216 Noise filtering characterised by the method used for estimating noise
                                • G10L21/0232 Processing in the frequency domain
    • H ELECTRICITY
        • H04 ELECTRIC COMMUNICATION TECHNIQUE
            • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
                • H04R1/00 Details of transducers, loudspeakers or microphones
                    • H04R1/20 Arrangements for obtaining desired frequency or directional characteristics
                        • H04R1/22 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired frequency characteristic only
                            • H04R1/26 Spatial arrangements of separate transducers responsive to two or more frequency ranges
                        • H04R1/32 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
                • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
        • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
            • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
                • Y02D30/00 Reducing energy consumption in communication networks
                    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Definitions

  • The present application relates to the field of circuit technology, for example, to a microphone array and a signal processing method, apparatus, device, and medium therefor.
  • In a conventional uniform linear microphone array, the microphones are evenly arranged on a straight line, and the spacing between adjacent microphones is equal.
  • Beamforming techniques are usually applied, and they generally require the microphone spacing to be comparable to the signal wavelength.
  • Speech signals, however, occupy a wide frequency band.
  • Effective beamforming for high-frequency signals requires a sufficiently small spacing between microphone array elements, and effective beamforming for low-frequency signals requires a sufficiently large array aperture.
  • The inventors found that, with the existing array structure of evenly arranged microphones, meeting the requirements of both high-frequency and low-frequency beamforming requires a large number of microphones, which increases not only the hardware cost and structural complexity but also the computational load of the beamforming algorithm.
  • The existing beamforming algorithm is usually the delay-and-sum method.
  • When this method is used for beamforming on wideband speech signals, three problems arise. First, the shape of the beam pattern depends on frequency, so the main-lobe width varies with frequency. Second, noise attenuation is non-uniform across the band, which produces artificial noise in the beam output. Third, when the direction of incidence of the sound wave deviates from the main-lobe direction, beamforming acts as a low-pass filter and distorts the output signal.
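The first problem can be seen numerically. The minimal sketch below assumes a conventional uniform linear array of 8 microphones at 40 mm spacing, a far-field plane-wave model, and a speed of sound of 330 m/s (all illustrative choices, not values from the application). It computes the delay-and-sum beam pattern at two frequencies and measures the width of the region around the look direction where the response stays above -3 dB.

```python
import numpy as np

C_SOUND = 330.0  # assumed speed of sound in m/s

def steering_vectors(positions, freq, angles_deg, c=C_SOUND):
    """Far-field steering vectors a(theta), shape (num_mics, num_angles)."""
    theta = np.deg2rad(angles_deg)
    delays = np.outer(positions, np.cos(theta)) / c  # propagation delay per mic and angle
    return np.exp(-2j * np.pi * freq * delays)

def delay_and_sum_pattern(positions, freq, angles_deg, look_deg=90.0):
    """Magnitude response |w^H a(theta)| of a delay-and-sum beamformer steered to look_deg."""
    a_look = steering_vectors(positions, freq, [look_deg])[:, 0]
    w = a_look / len(positions)  # phase-aligning weights with unity gain at look_deg
    return np.abs(w.conj() @ steering_vectors(positions, freq, angles_deg))

def mainlobe_width_deg(pattern, angles_deg, look_deg=90.0, level=1 / np.sqrt(2)):
    """Width of the contiguous region around look_deg where the response is >= level (-3 dB)."""
    i0 = int(np.argmin(np.abs(np.asarray(angles_deg) - look_deg)))
    lo = hi = i0
    while lo > 0 and pattern[lo - 1] >= level:
        lo -= 1
    while hi < len(pattern) - 1 and pattern[hi + 1] >= level:
        hi += 1
    return angles_deg[hi] - angles_deg[lo]

angles = np.arange(0, 181)              # 0 to 180 degrees in 1-degree steps
uniform = (np.arange(8) - 3.5) * 0.04   # 8 mics, 40 mm spacing, centred at x = 0
for f in (500.0, 4000.0):
    p = delay_and_sum_pattern(uniform, f, angles)
    print(f, "Hz: -3 dB main-lobe width =", mainlobe_width_deg(p, angles), "deg")
```

Under these assumptions the main lobe at 500 Hz is several times wider than at 4000 Hz, which is the frequency-dependent main lobe described as the first problem.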
  • The purpose of this application is to provide a non-uniform linear microphone array structure and a corresponding audio signal processing method, so as to obtain better sound pickup with the same number of microphones, or to reduce the number of microphones in the array while maintaining the sound pickup performance.
  • The application provides a non-uniform linear microphone array comprising a central microphone pair and several extended microphones arranged symmetrically on both sides of it. The central microphone pair and the extended microphones lie on the same straight line, and the spacings between adjacent microphones are unequal: the farther an extended microphone is from the central microphone pair, the larger its spacing to the adjacent microphone on the side nearer the array center.
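  • The present application provides an audio signal processing method for a microphone array, the method including:
  • calculating a corresponding array steering vector group according to the structural parameters of the microphone array and the signal acquisition channel, wherein the array steering vector group includes several array steering vector matrices corresponding to different frequency points;
  • selecting, according to the frequency band to which each frequency point belongs, the corresponding constraints and cost function and optimizing the weighting coefficients, wherein the cost function is calculated from the array steering vector matrix and the weighting coefficients, and the weighting coefficients are smoothed in the frequency domain after the frequency-independent optimization is completed;
  • extracting the time-domain audio signals corresponding to the signal acquisition channel from the multi-channel signal collected by the microphone array, and performing a discrete Fourier transform on them to obtain the corresponding multi-channel time-frequency domain signals;
  • performing weighted summation of the multi-channel time-frequency domain signals at the corresponding frequency points using the weighting coefficients to obtain a time-frequency domain beam output signal;
  • performing an inverse discrete Fourier transform on the time-frequency domain beam output signal to obtain a target audio signal.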
  • The present application also proposes an audio signal processing apparatus for a microphone array, which includes:
  • an array steering vector group calculation unit, configured to calculate the corresponding array steering vector group according to the structural parameters of the microphone array and the signal acquisition channel, wherein the array steering vector group includes several array steering vector matrices corresponding to different frequency points;
  • a weighting coefficient solving unit, configured to select the corresponding constraints and cost function according to the frequency band to which each frequency point belongs and to optimize the weighting coefficients, wherein the cost function is calculated from the array steering vector matrix and the weighting coefficients, and frequency-domain smoothing is performed after the frequency-independent optimization of the weighting coefficients is completed;
  • a signal extraction unit, configured to extract the time-domain audio signals corresponding to the signal acquisition channel from the multi-channel signal collected by the microphone array and to perform a discrete Fourier transform on them to obtain the corresponding multi-channel time-frequency domain signals;
  • a spatial filtering unit, configured to perform weighted summation of the multi-channel time-frequency domain signals at the corresponding frequency points using the weighting coefficients, obtaining a time-frequency domain beam output signal and thereby completing spatial filtering;
  • a signal generation module, configured to perform an inverse discrete Fourier transform on the time-frequency domain beam output signal to obtain a target audio signal.
  • The present application also proposes a computer device including a memory and a processor. The memory stores a computer program, and when the processor executes the computer program, an audio signal processing method for a microphone array is implemented, applied to the microphone array described above. The method includes: calculating the corresponding array steering vector group according to the structural parameters and signal acquisition channels of the microphone array, wherein the array steering vector group includes several array steering vector matrices corresponding to different frequency points; selecting, according to the frequency band to which each frequency point belongs, the corresponding constraints and cost function and optimizing the weighting coefficients, wherein the cost function is calculated from the array steering vector matrix and the weighting coefficients; extracting the time-domain audio signals corresponding to the signal acquisition channel from the multi-channel signal collected by the microphone array and performing a discrete Fourier transform on them to obtain the corresponding multi-channel time-frequency domain signals; performing weighted summation of the multi-channel time-frequency domain signals at the corresponding frequency points using the weighting coefficients to obtain a time-frequency domain beam output signal; and performing an inverse discrete Fourier transform on the time-frequency domain beam output signal to obtain a target audio signal.
  • The present application also proposes a computer-readable storage medium storing a computer program which, when executed by a processor, implements an audio signal processing method for a microphone array, applied to the microphone array described above.
  • The method includes: calculating the corresponding array steering vector group according to the structural parameters and signal acquisition channels of the microphone array, wherein the array steering vector group includes several array steering vector matrices corresponding to different frequency points; selecting, according to the frequency band to which each frequency point belongs, the corresponding constraints and cost function and optimizing the weighting coefficients, wherein the cost function is calculated from the array steering vector matrix and the weighting coefficients; extracting the time-domain audio signals corresponding to the signal acquisition channel from the multi-channel signal collected by the microphone array and performing a discrete Fourier transform on them to obtain the corresponding multi-channel time-frequency domain signals; performing weighted summation of the multi-channel time-frequency domain signals at the corresponding frequency points using the weighting coefficients to obtain a time-frequency domain beam output signal; and performing an inverse discrete Fourier transform on the time-frequency domain beam output signal to obtain a target audio signal.
  • The non-uniform linear microphone array of the present application arranges the extended microphones non-uniformly around the center microphone pair, with the spacing between adjacent extended microphones growing with distance from the center. It thereby satisfies both the small element spacing required for high-frequency beamforming and the large aperture required for low-frequency beamforming.
  • With the same number of array elements, a wider range of wavelengths can be covered.
  • With the same array area, fewer microphone elements are required.
  • Furthermore, the audio signal processing method for a microphone array provided by this application uses different cost functions for different angle ranges and frequency ranges to attenuate the output power of the original audio signal acquired by the above non-uniform linear microphone array, thereby improving the quality of the resulting target audio signal.
  • FIG. 1 is a structural schematic diagram of a non-uniform linear microphone array according to an embodiment.
  • FIG. 2 is a specific structural schematic diagram of a non-uniform linear microphone array according to an embodiment.
  • FIG. 3 is a specific structural schematic diagram of a non-uniform linear microphone array according to an embodiment.
  • FIG. 4 is a schematic flowchart of an audio signal processing method of a microphone array according to an embodiment.
  • FIG. 5 is a schematic flowchart of an audio signal processing method of a microphone array according to an embodiment.
  • FIG. 6 is a broadband beam pattern of an audio signal in the prior art.
  • FIG. 7 is a broadband beam pattern of the target audio signal according to an embodiment.
  • FIG. 8 is a schematic block diagram of the structure of an audio signal processing apparatus for a microphone array according to an embodiment of the present application.
  • FIG. 9 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • Reference numerals (microphone array): 101, center microphone pair; 102, extended microphones.
  • Referring to FIG. 1, the non-uniform linear microphone array disclosed in the present application includes a center microphone pair and several extended microphones arranged symmetrically on both sides of it. The center microphone pair and the extended microphones lie on the same straight line, and the spacings between adjacent microphones are unequal: the farther an extended microphone is from the center microphone pair, the larger its spacing to the adjacent microphone on the side nearer the array center.
  • The non-uniform linear microphone array of this embodiment is typically used in large audio and video conferencing screens, smart blackboards, and other devices with demanding sound pickup requirements, where it collects speech around the device.
  • The non-uniform linear microphone array includes a center microphone pair, consisting of two center microphones, and at least two extended microphones; the center microphone pair and the extended microphones are arranged on the same straight line.
  • The spacing d0 of the center microphone pair can be reserved according to its specific size. Taking the perpendicular bisector of the line connecting the center microphone pair as the axis of symmetry, several extended microphones are arranged symmetrically along the extension of that line, and the spacings between each extended microphone and its neighbor on the array-center side are d1, d2, d3 from the inside out. Because the microphone array is often used together with a camera module, in this embodiment d0 is chosen to suit the actual situation, and the remaining spacings are designed to satisfy d1 < d2 < d3.
  • The spacing between adjacent extended microphones increases with their distance from the center microphone pair.
  • A non-uniform microphone array arranged according to these principles achieves both a small spacing between adjacent microphone units and a large overall aperture: the center microphone pair is separated from its adjacent extended microphones by smaller-than-average spacings, while the outer microphones on both sides have larger-than-average spacings to their neighbors. With the same number of microphones, the frequency range over which good beamforming is possible is therefore wider, which improves pickup quality.
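The geometry rule above (mirror symmetry about the center pair, with outer spacings d1 < d2 < d3 growing outward) can be checked with a short sketch. The helper name `nonuniform_positions` and the spacing values d0 = 40 mm, d1 = 25 mm, d2 = 45 mm, d3 = 75 mm are hypothetical, chosen only for illustration; they happen to give a 25 mm minimum spacing and a 330 mm aperture.

```python
import numpy as np

def nonuniform_positions(d0, outer_spacings):
    """x-coordinates (m) of a symmetric non-uniform linear array centred at x = 0.

    d0: spacing of the center microphone pair.
    outer_spacings: (d1, d2, ...) for one side, listed from the inside out.
    """
    if any(b <= a for a, b in zip(outer_spacings, outer_spacings[1:])):
        raise ValueError("outer spacings must strictly increase: d1 < d2 < d3 ...")
    right = [d0 / 2.0]                      # right-hand center microphone
    for d in outer_spacings:
        right.append(right[-1] + d)         # each extended mic sits d further out
    right = np.array(right)
    return np.concatenate([-right[::-1], right])  # mirror about the perpendicular bisector

pos = nonuniform_positions(0.040, [0.025, 0.045, 0.075])  # hypothetical spacings in metres
print(np.round(pos, 3))                 # microphone coordinates
print(np.round(np.diff(pos), 3))        # adjacent spacings: small near the centre, large outside
print(round(pos[-1] - pos[0], 3))       # overall aperture in metres
```

Passing non-increasing outer spacings raises an error, which mirrors the design rule rather than any constraint stated as mandatory in the application.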
  • The non-uniform linear microphone array structure described above covers both a single hardware structure and a combination of several non-uniform, asymmetric microphone array hardware structures arranged as sub-arrays.
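  • FIG. 4 is a schematic flowchart of the audio signal processing method of a microphone array disclosed in the present application. The method includes:
  • S1: calculating the array steering vector group according to the structural parameters of the microphone array and the signal acquisition channels;
  • S2: selecting, according to the frequency band to which each frequency point belongs, the corresponding constraints and cost function and optimizing the weighting coefficients, wherein the cost function is calculated from the array steering vector matrix and the weighting coefficients, and the weighting coefficients are smoothed in the frequency domain after the frequency-independent optimization is completed;
  • S3: extracting the time-domain audio signals of the signal acquisition channels and performing a discrete Fourier transform to obtain the multi-channel time-frequency domain signals;
  • S4: performing weighted summation of the multi-channel time-frequency domain signals using the weighting coefficients to obtain the beam output signal;
  • S5: performing an inverse discrete Fourier transform on the beam output signal to obtain the target audio signal.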
  • Steps S1 and S2 need to be performed only once: the frequency-domain beamforming weighting coefficients are computed and stored, and they are not modified as the received signal changes until the structural parameters of the microphone array change. The structural parameters include the number of microphones in the array.
  • In step S1, the array steering vector group is first calculated according to the structural parameters of the microphone array and the signal acquisition channels. Specifically, the actual audio signal is simulated by a hypothetical signal, and an array steering vector is calculated for each analysis frequency point and each direction of arrival based on the non-uniform linear microphone array structure, the signal acquisition channels, the signal sampling rate, and the number of analysis frequency points.
  • In one example, the array steering vector group consists of 512 array steering vector matrices of dimension 8 × 181, each containing 181 steering vectors corresponding to different directions of arrival.
  • When six channels are selected as the signal acquisition channels, the dimension of each array steering vector matrix is 6 × 181.
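A minimal sketch of the steering vector group in step S1, assuming a far-field plane-wave model, a speed of sound of 330 m/s, a 16 kHz sampling rate, 512 analysis frequency points spread uniformly from 0 to fs/2, and 181 directions of arrival at 1° steps from 0° to 180°. The sampling rate, frequency grid, function name, and microphone positions are assumptions for illustration; this section of the application does not state them.

```python
import numpy as np

def steering_vector_group(positions, fs=16000.0, num_freqs=512, c=330.0, angles_deg=None):
    """Array steering vector group with shape (num_freqs, num_mics, num_angles).

    Entry [k, m, i] is exp(-j*2*pi*f_k*tau_m(theta_i)), the phase of a unit plane wave
    from direction theta_i at microphone m, where tau_m = x_m*cos(theta_i)/c.
    """
    if angles_deg is None:
        angles_deg = np.arange(0, 181)                    # 181 directions, 1-degree steps
    freqs = np.arange(num_freqs) * fs / (2 * num_freqs)   # assumed grid from 0 up to fs/2
    delays = np.outer(positions, np.cos(np.deg2rad(angles_deg))) / c  # (num_mics, num_angles)
    return np.exp(-2j * np.pi * freqs[:, None, None] * delays[None, :, :])

# Hypothetical symmetric 8-microphone layout in metres.
pos = np.array([-0.165, -0.09, -0.045, -0.02, 0.02, 0.045, 0.09, 0.165])
group = steering_vector_group(pos)
print(group.shape)             # (512, 8, 181): 512 matrices of size 8 x 181
print(group[:, 1:7, :].shape)  # (512, 6, 181): six selected channels give 6 x 181 matrices
```

Because the group depends only on geometry and analysis settings, it can be computed once, which matches the statement that steps S1 and S2 are run only when the array structure changes.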
  • Optimizing the weighting coefficients by sub-band means the following: according to the signal sampling rate and the number of analysis frequency points, the whole processing band is divided into low, middle, and high bands, and different constraints and cost functions are used when optimizing the frequency-domain beamforming weighting coefficients in each of these bands.
  • After the frequency-independent optimization of the weighting coefficients is completed, frequency-domain smoothing is applied.
  • Because beamforming is performed in the frequency domain, the original microphone signals must first be transformed into the time-frequency domain.
  • Either all channels of the microphone array or only some of them can be extracted, and the positions of the microphones corresponding to the extracted channels may be asymmetric. Referring to Fig.
  • The sound pressure signals at the M microphone positions are denoted $x_1(t), \ldots, x_M(t)$. After sampling and buffering, the corresponding multi-channel time-domain audio signals $x_1(l), \ldots, x_M(l)$ are obtained, and a discrete Fourier transform (DFT) then yields a multi-channel time-frequency domain signal with K analysis frequency points, namely $x_1(1), \ldots, x_1(K), \ldots, x_M(1), \ldots, x_M(K)$.
  • A suitable analysis window is usually applied when performing the time-frequency conversion.
  • For example, when the microphone array consists of 8 microphones, the number of analysis frequency points is 512, and all channels are selected for beamforming, one frame of the multi-channel time-frequency domain signal is an 8 × 512 matrix.
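A minimal sketch of the per-frame time-frequency conversion in step S3 under the example settings (8 channels, 512 analysis frequency points). The Hann window, the frame length equal to the number of analysis frequency points, and the function name are assumptions; a streaming implementation would also use overlapping frames, which this sketch omits.

```python
import numpy as np

def frame_to_time_frequency(frame, window=None):
    """Window one frame of M-channel samples, shape (M, N), and return its DFT, shape (M, N).

    Row m holds x_m(1), ..., x_m(K) with K = N analysis frequency points.
    """
    frame = np.asarray(frame, dtype=float)
    if window is None:
        window = np.hanning(frame.shape[1])  # assumed analysis window
    return np.fft.fft(frame * window, axis=1)

rng = np.random.default_rng(0)
frame = rng.standard_normal((8, 512))  # one frame: 8 channels x 512 samples (synthetic)
X = frame_to_time_frequency(frame)
print(X.shape)                          # (8, 512)
```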
  • In step S4, after the multi-channel audio signal has been transformed into the frequency domain, it is weighted and summed using the weighting coefficient matrix calculated in step S2 to obtain the beam output frequency-domain signal $Y(1), \ldots, Y(K)$.
  • Finally, the beam output frequency-domain signals $Y(1), \ldots, Y(K)$ at the K frequency points are transformed by an inverse discrete Fourier transform (IDFT) to obtain the time-domain audio signal $y(l)$.
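Steps S4 and S5 for one frame can be sketched as follows. The sketch assumes the weighted sum takes the conjugated-weight form $Y(k) = \sum_m w_m^*(k)\, x_m(k)$, a common beamforming convention (the application only says "weighted summation"), and it omits the overlap-add needed to join consecutive frames. The function name is hypothetical.

```python
import numpy as np

def beamform_frame(X, W):
    """Weighted summation (step S4) and inverse DFT (step S5) for one frame.

    X: (M, K) multi-channel time-frequency signal x_m(k).
    W: (M, K) weighting coefficients w_m(k) from step S2.
    Returns the single-channel time-domain beam output y(l) of length K.
    """
    Y = np.sum(np.conj(W) * X, axis=0)  # Y(k) = sum over m of conj(w_m(k)) * x_m(k)
    # Taking the real part assumes W is conjugate-symmetric over k, so y is real-valued.
    return np.real(np.fft.ifft(Y))

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 512))   # synthetic 8-channel frame
X = np.fft.fft(x, axis=1)
W = np.full((8, 512), 1 / 8)        # equal weights: the beam output is the channel average
y = beamform_frame(X, W)
print(y.shape)                      # (512,)
```

With equal real weights the output reduces to the mean of the input channels, a quick sanity check that the weighted sum and IDFT are wired correctly.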
  • The signal acquisition channels are designated in one of the following ways:
  • a subset of channels is selected symmetrically about the center microphone pair and designated as the signal acquisition channels;
  • a subset of channels is selected asymmetrically and designated as the signal acquisition channels; for example, an asymmetric multi-channel signal made up of the first seven microphones of the array shown in FIG. 2 is extracted;
  • all channels are selected and designated as the signal acquisition channels. In each case, the time-domain audio signals corresponding to the signal acquisition channels are extracted from the multi-channel signal collected by the microphone array, and a discrete Fourier transform is performed on them to obtain the corresponding multi-channel time-frequency domain signals.
  • When all channels are selected, the number of gated microphones equals the total number of microphones in the array, and their positions are symmetric.
  • The dimension of the array steering vector matrix determined by this channel selection equals the number of microphone units in the array, and the desired beam response used when optimizing the weighting coefficients is symmetric.
  • Alternatively, a subset of symmetric channels can be designated as the signal acquisition channels for subsequent beamforming.
  • In this case the number of gated microphones is smaller than the total number of microphones in the array, and their positions are symmetric.
  • The dimension of the array steering vector matrix determined by this channel selection is smaller than the number of microphone units in the array, and the desired beam response during weight optimization is symmetric.
  • Because fewer channels take part in beamforming, fewer weighting coefficients are needed and the computational load of beamforming is reduced.
  • In the multi-channel signal selection step, besides selecting all channels or a symmetric subset, a subset of asymmetric channels can also be designated as the signal acquisition channels for subsequent beamforming.
  • The number of gated microphones is then smaller than the number of microphone units in the array, and their positions are asymmetric.
  • The dimension of the array steering vector matrix determined by this channel selection is smaller than the number of microphone units in the array, and the desired beam response during weight optimization is asymmetric.
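The three channel-selection modes can be illustrated by slicing the steering vector group and checking whether the gated microphones are symmetric about the array center. The helper name `select_channels`, the placeholder steering group, and the 8-microphone positions are hypothetical; "first seven" mirrors the FIG. 2 example using these assumed positions, not the actual FIG. 2 geometry.

```python
import numpy as np

def select_channels(steering_group, positions, channels):
    """Keep only the gated channels and report whether their layout is symmetric.

    steering_group: (K, M, A) array steering vector group for all M microphones.
    channels: indices of the microphones designated as signal acquisition channels.
    """
    idx = np.asarray(channels)
    sub_pos = np.asarray(positions, dtype=float)[idx]
    # Symmetric about the array center (midpoint of the center microphone pair at x = 0)?
    symmetric = np.allclose(np.sort(sub_pos), np.sort(-sub_pos))
    return steering_group[:, idx, :], symmetric

pos = np.array([-0.165, -0.09, -0.045, -0.02, 0.02, 0.045, 0.09, 0.165])  # hypothetical
group = np.ones((512, len(pos), 181), dtype=complex)  # placeholder steering vector group
for name, chans in [("all", range(8)), ("middle six", range(1, 7)), ("first seven", range(7))]:
    sub, sym = select_channels(group, pos, list(chans))
    print(name, sub.shape, "symmetric" if sym else "asymmetric")
```

The row count of each reduced steering matrix equals the number of gated channels, which is why fewer channels means fewer weighting coefficients to optimize and apply.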
  • The frequency bands include a low band, a middle band, and a high band.
  • They are obtained by dividing the whole processing band according to the signal sampling rate and the number of analysis frequency points, and when the frequency-domain beamforming weighting coefficients are optimized, each band uses its own constraints and cost function.
  • The processing band is divided according to how the spacing between microphone array elements compares with the half-wavelength of the signal.
  • The mid-frequency band is the frequency range in which this array structure can form a close-to-ideal desired beam response by optimizing the weighting coefficients. The cut-off frequency between the low and middle bands is the frequency whose half-wavelength is close to the maximum microphone spacing of the array; similarly, the cut-off between the middle and high bands is the frequency whose half-wavelength is close to the minimum microphone spacing of the array.
  • In the example array, the distance between the first and last microphones is the largest, 330 mm, and the distance between the center microphone pair and the adjacent extended microphones is the smallest, 25 mm.
  • The half-wavelengths at the lower and upper limit frequencies of the mid-frequency band are therefore 330 mm and 25 mm respectively.
  • The corresponding frequencies are 500 Hz and 6600 Hz. A division into low, middle, and high bands suited to this microphone array structure is accordingly shown in Table 1.
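The band cut-offs follow from the half-wavelength rule $d = c/(2f)$, i.e. $f = c/(2d)$. The sketch below assumes a speed of sound of 330 m/s, the value under which a 25 mm minimum spacing gives exactly 6600 Hz and a 330 mm aperture gives 500 Hz; with 343 m/s the cut-offs would be roughly 520 Hz and 6860 Hz. The function names are hypothetical, and how a frequency exactly at a cut-off is assigned is an assumption, since Table 1 is not reproduced here.

```python
def band_cutoffs(max_spacing_m, min_spacing_m, c=330.0):
    """Frequencies (Hz) whose half-wavelength c / (2 f) equals the given spacings."""
    return c / (2.0 * max_spacing_m), c / (2.0 * min_spacing_m)

def band_of(freq_hz, f_low_mid, f_mid_high):
    """Classify a frequency point into the low, middle or high band (boundary handling assumed)."""
    if freq_hz < f_low_mid:
        return "low"
    if freq_hz <= f_mid_high:
        return "mid"
    return "high"

f_lm, f_mh = band_cutoffs(0.330, 0.025)
print(round(f_lm), round(f_mh))          # 500 6600
print([band_of(f, f_lm, f_mh) for f in (300.0, 1000.0, 7000.0)])
```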
  • The weighting coefficients obtained by the optimization are discontinuous across frequency points; in particular, there are jumps at the boundaries between the low, middle, and high bands.
  • Such pronounced discontinuities introduce a degree of artificial noise into the corresponding beam output signal.
  • Therefore, the main discontinuity points of each channel's weighting coefficients in the frequency domain are identified first. A transition band covering several frequency points around each discontinuity is then set, and the weighting coefficients at the frequency points within the transition band are smoothed, which reduces artifacts in the beam output audio signal.
  • Here, the adjacent frequency points of a given frequency point are the frequency points immediately before and after it.
  • For example, if a given frequency point is the 256th of the 512 analysis frequency points, its adjacent frequency points are the 255th and 257th, and the corresponding first-order difference average is the mean of the first-order differences at the 255th and 257th frequency points.
  • If the relative deviation between the first-order difference at the 256th frequency point and this average exceeds a preset deviation threshold, the weighting coefficient changes too quickly at the 256th frequency point, i.e., it is not smooth enough. The interval from the 255th to the 257th frequency point is then taken as the interval to be smoothed, and the weighting coefficients within it are smoothed, for example by adjusting them according to the first-order difference average.
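A minimal sketch of the discontinuity check and transition-band smoothing described above, applied to each channel's weights across frequency. The deviation threshold (3.0), the transition-band half-width (one frequency point on each side), the function names, and the use of straight-line interpolation across the transition band are illustrative assumptions; the application only gives smoothing based on the first-order difference average as an example.

```python
import numpy as np

def find_discontinuities(w_row, rel_threshold=3.0, eps=1e-12):
    """Indices k where |w[k] - w[k-1]| deviates strongly from the mean of its neighbours' differences."""
    diff = np.abs(np.diff(w_row))  # diff[j] is the first-order difference at frequency point j + 1
    points = []
    for j in range(1, len(diff) - 1):
        neighbour_avg = 0.5 * (diff[j - 1] + diff[j + 1])  # differences at points j and j + 2
        rel_dev = (diff[j] - neighbour_avg) / (neighbour_avg + eps)
        if rel_dev > rel_threshold:
            points.append(j + 1)
    return points

def smooth_weights(W, rel_threshold=3.0, half_width=1):
    """Smooth each channel's weights (rows of an M x K array) around detected discontinuities."""
    W = np.array(W, dtype=complex)  # work on a copy
    M, K = W.shape
    for m in range(M):
        # Discontinuities are detected once per channel, before that channel is modified.
        for k in find_discontinuities(W[m], rel_threshold):
            lo, hi = max(k - half_width, 0), min(k + half_width, K - 1)
            t = np.linspace(0.0, 1.0, hi - lo + 1)
            # Replace the transition band by a straight line between its end points.
            W[m, lo:hi + 1] = (1 - t) * W[m, lo] + t * W[m, hi]
    return W

W = np.array([[1, 1, 1, 1, 2, 2, 2, 2]], dtype=complex)  # one channel with a step between bands
print(smooth_weights(W).real)
```

In this toy example the jump at index 4 is detected and that weight becomes 1.5, halfway between its neighbors, while the rest of the channel is left unchanged.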
  • The array steering vector matrix is calculated from hypothetical signals arriving from all directions of arrival, and the method further includes:
  • the constraints of the low frequency band include: the norm of the weighting coefficients is less than the low-band weighting-coefficient norm threshold; the cost function of the low frequency band is the beamforming output power of the hypothetical signals from all directions of arrival within the low-band beam attenuation angle range;
  • the constraints of the mid-frequency band include: the norm of the weighting coefficients is less than the mid-band weighting-coefficient norm threshold, and the deviation between the main lobe of the mid-band beam output and the expected mid-band main-lobe response is smaller than the mid-band main-lobe deviation threshold; the cost function of the mid-frequency band is the beamforming output power of the hypothetical signals from all directions of arrival within the mid-band beam attenuation angle range;
  • the constraints of the high frequency band include: the norm of the weighting coefficients is less than the high-band weighting-coefficient norm threshold, and the deviation between the main lobe of the high-band beam output and the expected high-band main-lobe response is smaller than the high-band main-lobe deviation threshold; the cost function of the high frequency band is the beamforming output power of the hypothetical signals from all directions of arrival within the high-band beam attenuation angle range;
  • the output power is calculated from the array steering vector matrix and the weighting coefficients; the main lobes of the mid-band and high-band beam outputs are obtained, at the corresponding frequency points, by weighted summation of the array steering vector matrix with the weighting coefficients;
  • the constraints of the low, middle, and high frequency bands further include: the output gain, after beamforming, of a hypothetical signal arriving from the direction of the main-lobe center is 1.
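Collecting the constraints above, the per-frequency problem for a frequency point $k$ in band $b$ can be written as below. The symbols are introduced here for illustration only: $\mathbf w_k$ is the weight vector at frequency point $k$, $\mathbf a(\theta, f_k)$ the steering vector for direction $\theta$ (one column of the array steering vector matrix), $C_b$ and $D_b$ the attenuation and main-lobe angle ranges of band $b$, $\theta_0$ the main-lobe center, $B_b(\theta)$ the expected main-lobe response, and $\varepsilon_b$, $\delta_b$ the norm and main-lobe deviation thresholds. The response convention $\mathbf w^H \mathbf a$ and the use of a maximum absolute deviation are assumptions.

$$
\begin{aligned}
\min_{\mathbf w_k}\quad & J(\mathbf w_k) = \sum_{\theta \in C_b} \left| \mathbf w_k^{H} \mathbf a(\theta, f_k) \right|^2 \\
\text{s.t.}\quad & \|\mathbf w_k\|_2 \le \varepsilon_b, \qquad \mathbf w_k^{H} \mathbf a(\theta_0, f_k) = 1, \\
& \max_{\theta \in D_b} \left| \mathbf w_k^{H} \mathbf a(\theta, f_k) - B_b(\theta) \right| \le \delta_b \quad \text{for } b \in \{\text{mid}, \text{high}\}.
\end{aligned}
$$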
  • First, the frequency range, beam main-lobe angle range, beam attenuation angle range, and weighting-coefficient norm threshold of each frequency band are specified, together with the expected main-lobe responses of the middle and high bands.
  • The main-lobe deviation thresholds of the middle and high bands are then specified, followed by the specific constraints and cost function of each band.
  • the number of analysis frequency points is selected to be 512.
  • a frequency point to be optimized is selected, and the beam outputs of all directions of arrival are calculated according to the array steering vector matrix of the corresponding frequency point and the weighting coefficients to be optimized, and then the cost function is calculated according to the definition and optimized under constraints.
  • an 8 × 512 weighting coefficient matrix is obtained, containing the frequency-domain weighting coefficients of the 8 microphone channels at the 512 frequency points; it is used for the weighted summation of the multi-channel time-frequency domain signals in beamforming.
  • different constraint conditions and cost functions are used when optimizing and solving the frequency-domain beamforming weighting coefficients in the low, medium and high frequency bands.
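The text does not say which solver is used for each frequency point. As a simplified stand-in, the sketch below replaces the inequality-constrained problem with a closed-form, LCMV-style solution: it minimizes the attenuation-range output power plus a diagonal-loading term (a soft substitute for the norm threshold), subject to unit gain at the main lobe center. The main lobe deviation constraint is not enforced. The layout, the speed of sound and the `loading` value are assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)
POSITIONS = [-0.17, -0.09, -0.04, -0.01, 0.01, 0.04, 0.09, 0.17]  # hypothetical (m)


def steering_vector(freq_hz, theta_deg):
    """Far-field steering vector; theta measured from the array axis."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    cos_t = math.cos(math.radians(theta_deg))
    return [cmath.exp(-1j * k * x * cos_t) for x in POSITIONS]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / aug[r][r]
    return x


def optimize_weights(freq_hz, atten_angles, loading=1e-3, center_deg=90.0):
    """Minimise w^H (R + loading*I) w subject to w^H a0 = 1, where
    R = sum of a(t) a(t)^H over the attenuation angles."""
    m = len(POSITIONS)
    r = [[0j] * m for _ in range(m)]
    for t in atten_angles:
        a = steering_vector(freq_hz, t)
        for i in range(m):
            for j in range(m):
                r[i][j] += a[i] * a[j].conjugate()
    for i in range(m):
        r[i][i] += loading  # diagonal loading keeps the weight norm bounded
    a0 = steering_vector(freq_hz, center_deg)
    u = solve(r, a0)  # (R + loading*I)^-1 a0
    denom = sum(ai.conjugate() * ui for ai, ui in zip(a0, u))  # a0^H (R + loading*I)^-1 a0
    return [ui / denom for ui in u]


atten = [t for t in range(181) if t < 60 or t > 120]
w_2k = optimize_weights(2000.0, atten)
print("weight norm at 2 kHz:", math.sqrt(sum(abs(x) ** 2 for x in w_2k)))
```

Repeating this for every frequency point and stacking the results gives a weight matrix with one column per frequency point. Larger `loading` values trade suppression for robustness, loosely mirroring the role of the norm thresholds; reproducing the embodiment itself would require a constrained solver that enforces the norm and deviation thresholds directly.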
  • the frequency ranges of the specified frequency bands are as shown in Table 1 above.
  • the angular range 60° to 120° can be taken as the expected beam main lobe angle range D2 of the middle frequency band. Since it is difficult to form a distinct beam in the low frequency band, 50° to 130° is taken as the expected beam main lobe angle range D1 of the low frequency band.
  • the expected beam main lobe angle range D3 in the high frequency band will be given later.
  • In all frequency bands, the angular ranges outside the expected beam main lobe angle range are designated as the beam attenuation angle ranges (C1, C2, C3).
  • the norm of the weighting coefficients needs to be constrained. Since an ideal beam pattern is difficult to obtain by optimization in the low frequency band, a reasonable design approach is to tighten the constraint progressively as frequency increases. The weighting coefficient norm thresholds of the low, middle and high frequency bands are therefore set to 1.5, 1.2 and 1.0, respectively.
  • the deviation between the optimized beam main lobe response and the expected beam main lobe response is constrained by using the beam main lobe deviation threshold.
  • the main lobe deviation threshold of the middle frequency band is set to 0.7.
  • the expected beam main lobe angle range of the high frequency band will be given later. Since it is difficult to form a distinct beam in the low frequency band, the main lobe deviation constraint is not applied in that band.
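The band parameters stated above can be collected in a small data structure. In this sketch the angle grid (0° to 180° in 1° steps) and the choice to treat the boundaries of D as part of the main lobe are assumptions; the high-band main lobe range is left out because it is derived from the optimized middle band.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class BandSpec:
    """Per-band design parameters taken from the embodiment text."""
    name: str
    main_lobe_deg: Tuple[int, int]        # expected main lobe angle range D (inclusive)
    norm_threshold: float                 # weighting coefficient norm threshold
    deviation_threshold: Optional[float]  # None: main lobe deviation constraint not applied

    def attenuation_angles(self, step_deg: int = 1) -> List[int]:
        """Beam attenuation range C: every grid angle outside D (1-degree grid assumed)."""
        lo, hi = self.main_lobe_deg
        return [t for t in range(0, 181, step_deg) if t < lo or t > hi]


LOW = BandSpec("low", (50, 130), 1.5, None)
MID = BandSpec("mid", (60, 120), 1.2, 0.7)
HIGH_NORM_THRESHOLD = 1.0  # D3 for the high band is derived from the optimized mid band
```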
  • this embodiment does not give an analytical expression for the expected main lobe response. Instead, the expected main lobe response is specified as the beam pattern obtained by uniformly weighting the microphone array at a chosen frequency, here set to twice the lower limit f2L of the middle frequency band.
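A sketch of how the expected mid-band main lobe response could be computed: the beam magnitude of the uniformly weighted array (w_m = 1/M) at 2·f2L, sampled over D2. The value of `F2L_HZ`, the microphone layout and the speed of sound are hypothetical placeholders; Table 1 gives the actual band edges.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)
POSITIONS = [-0.17, -0.09, -0.04, -0.01, 0.01, 0.04, 0.09, 0.17]  # hypothetical (m)
F2L_HZ = 1000.0  # hypothetical lower limit of the middle band


def expected_main_lobe(main_angles_deg, freq_hz=2.0 * F2L_HZ):
    """|y(theta)| of the uniformly weighted array (w_m = 1/M) at freq_hz."""
    m = len(POSITIONS)
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    response = []
    for t in main_angles_deg:
        cos_t = math.cos(math.radians(t))
        y = sum(cmath.exp(-1j * k * x * cos_t) for x in POSITIONS) / m
        response.append(abs(y))
    return response


MAIN_ANGLES = list(range(60, 121))  # mid-band D2
Y2 = expected_main_lobe(MAIN_ANGLES)
```

Using a uniform-weighting pattern as the target gives a smooth, physically achievable main lobe shape without having to write down an analytical formula.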
  • the expected main lobe response of the high frequency band is obtained as follows:
  • the array steering vector matrix is weighted and summed with the optimized weighting coefficients to obtain the beam main lobe shape, and this beam main lobe shape is used as the expected high-band main lobe response.
  • the output main lobe shape of the optimized mid-band beam can be used as the expected main lobe response y3 when the high-band weighting coefficients are optimized. Because this main lobe shape is not known until the middle frequency band has been optimized, the high-band beam main lobe angle range D3 can be specified as the angular range corresponding to the -6 dB beamwidth of the optimized mid-band output main lobe.
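The -6 dB rule for D3 can be sketched as a scan outward from 90° over a sampled beam pattern until the level falls more than 6 dB below the center value. The function name and the assumption of a uniformly sampled angle grid are illustrative; in practice the input would be |y(θ)| computed from the optimized mid-band weights.

```python
import math


def minus_6db_range(angles_deg, magnitudes, center_deg=90.0):
    """Return (start, end) of the contiguous region around center_deg whose level
    is within 6 dB of the level at center_deg."""
    ic = min(range(len(angles_deg)), key=lambda i: abs(angles_deg[i] - center_deg))
    ref = magnitudes[ic]

    def within_6db(i):
        return 20.0 * math.log10(max(magnitudes[i], 1e-12) / ref) >= -6.0

    lo = ic
    while lo > 0 and within_6db(lo - 1):
        lo -= 1
    hi = ic
    while hi < len(angles_deg) - 1 and within_6db(hi + 1):
        hi += 1
    return angles_deg[lo], angles_deg[hi]
```

For example, a triangular test pattern that falls linearly from 1 at 90° to 0 at 30° and 150° yields the range (61, 119) on a 1° grid.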
  • the full-band beam pattern obtained with the above optimized weighting coefficients is shown in Figure 6, and Figure 7 shows the full-band beam pattern of a prior-art microphone array. It can be seen that, with the audio signal processing method for the microphone array provided by this embodiment, within the main speech band (0.5 kHz to 6.0 kHz) the main lobe width of the beam pattern is approximately constant, the gain in the 90° direction stays at 0 dB, and signals from the 0° to 60° and 120° to 180° directions are strongly attenuated; that is, noise and interference are filtered out more effectively.
  • the output signal within the beam main lobe angle deviates only slightly from the preset beam main lobe response and is highly robust; by using the output signal power within the beam attenuation angle as the cost function, signals arriving from directions outside the beam main lobe are suppressed to the greatest extent; by designing different constraint conditions and cost functions for the low, middle and high frequency bands, the situation in which the optimization problem has no solution is avoided; by setting a transition band that covers several frequency points near each discontinuity of the weighting coefficients and smoothing the weighting coefficients of the frequency points in the transition band, artificial noise in the beam output audio signal is reduced; and through the weighting coefficient design method of independent per-band optimization followed by frequency-domain smoothing, a set of multi-channel weighting coefficients is obtained, according to which the multi-channel audio signal collected by the non-uniform linear microphone array can be spatially filtered.
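The text does not specify the smoothing rule for the transition band. The sketch below assumes a linear crossfade: weights inside a transition band of `half_width` bins on each side of a discontinuity are replaced by an interpolation between the weights just outside it. `weights[k][m]` is the weight of channel m at frequency bin k, a hypothetical data layout.

```python
def smooth_transition(weights, boundary, half_width):
    """Linearly crossfade weights across bins boundary-half_width .. boundary+half_width.

    weights[k][m] is the (complex) weight of channel m at frequency bin k.
    Returns a new list; bins outside the transition band are copied unchanged.
    """
    left = boundary - half_width - 1   # last bin kept from the lower band
    right = boundary + half_width + 1  # first bin kept from the upper band
    if left < 0 or right >= len(weights):
        raise ValueError("transition band extends past the ends of the spectrum")
    out = [list(row) for row in weights]
    span = right - left
    for k in range(left + 1, right):
        alpha = (k - left) / span
        out[k] = [(1.0 - alpha) * wl + alpha * wr
                  for wl, wr in zip(weights[left], weights[right])]
    return out
```

A convenient property of this choice: because the broadside steering vector is all ones at every frequency, any convex combination of two weight vectors that both satisfy w^H a0 = 1 at 90° also satisfies it, so the unit-gain constraint survives the smoothing.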
  • an audio signal processing apparatus for a microphone array, including:
  • the array steering vector group calculation unit 100, which is configured to calculate the corresponding array steering vector group according to the structural parameters and signal acquisition channels of the microphone array, wherein the array steering vector group includes several array steering vector matrices corresponding to different frequency points;
  • the weighting coefficient solving unit 200, which is configured to select the corresponding constraint conditions and cost function according to the frequency band to which each frequency point belongs and to solve for the weighting coefficients by optimization, wherein the cost function is calculated from the array steering vector matrix and the weighting coefficients, and frequency-domain smoothing is applied after the frequency-independent weighting coefficient optimization is completed;
  • the signal extraction unit 300, which is configured to extract the time-domain audio signals corresponding to the signal acquisition channels from the multi-channel signal collected by the microphone array, and to perform a discrete Fourier transform on these time-domain audio signals to obtain the corresponding multi-channel time-frequency domain signals;
  • the spatial filtering unit 400 is configured to perform weighted summation of the multi-channel time-frequency domain signals of corresponding frequency points through the weighting coefficients, to obtain a time-frequency domain beam output signal, and complete spatial filtering;
  • the signal generation module 500 is configured to perform inverse discrete Fourier transform on the time-frequency domain beam output signal to calculate and obtain a target audio signal.
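The chain formed by units 300 to 500 can be sketched for a single frame as a DFT per channel, a per-bin weighted sum, and an inverse DFT. This is an illustrative sketch rather than the apparatus itself: it uses a naive O(N²) DFT, a single unwindowed frame instead of overlap-add, and the convention y = w^H X; `weights[k][m]` holds the weight of channel m at bin k.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (O(N^2)); fine for a short illustrative frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def beamform_frame(frames, weights):
    """frames[m]: time-domain frame of channel m; weights[k][m]: weight of channel m at bin k."""
    spectra = [dft(frame) for frame in frames]           # multi-channel time-frequency signal
    n_bins = len(spectra[0])
    beam = [sum(weights[k][m].conjugate() * spectra[m][k] for m in range(len(frames)))
            for k in range(n_bins)]                      # spatial filtering: y_k = w_k^H X_k
    return [sample.real for sample in idft(beam)]        # back to the time domain
```

Taking the real part is only valid when the weights at bins k and N - k are complex conjugates of each other, which is the usual arrangement for real-valued input. With identical channels and uniform weights of 1/M, the output frame reproduces the input frame.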
  • the designated manner of the signal acquisition channel is one of the following manners:
  • a part of channels is symmetrically selected with the center microphone pair as the center, and designated as the signal acquisition channel;
  • the frequency bands include low frequency bands, middle frequency bands and high frequency bands;
  • the frequency band is obtained by dividing all processing frequency bands according to the signal sampling rate and the number of analysis frequency points; the low frequency band, middle frequency band and high frequency band respectively correspond to different constraint conditions and cost functions.
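A sketch of the band assignment, assuming the 512 analysis points are the first half of a 1024-point DFT (so point k sits at k·fs/1024) and using hypothetical band edges of 1 kHz and 3 kHz with a hypothetical 16 kHz sampling rate; the real edges come from Table 1.

```python
from typing import List


def assign_bands(fs_hz: float, num_points: int,
                 low_mid_edge_hz: float, mid_high_edge_hz: float) -> List[str]:
    """Label each analysis point 'low', 'mid' or 'high' by its centre frequency."""
    fft_size = 2 * num_points  # assumption: points 0..num_points-1 of a 2*num_points DFT
    labels = []
    for k in range(num_points):
        f = k * fs_hz / fft_size
        if f < low_mid_edge_hz:
            labels.append("low")
        elif f < mid_high_edge_hz:
            labels.append("mid")
        else:
            labels.append("high")
    return labels


BANDS = assign_bands(16000.0, 512, 1000.0, 3000.0)
```

Each label then selects the constraint conditions and cost function used when optimizing the weights at that point.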
  • the spatial domain filtering unit 400 is further configured to:
  • the array steering vector matrix is calculated from hypothetical signals from all directions of arrival, and the weighting coefficient solving unit 200 is further configured such that:
  • the constraint conditions of the low frequency band include: the norm of the weighting coefficients is less than the low-band weighting coefficient norm threshold; the cost function of the low frequency band is the beamforming output power of hypothetical signals from all directions of arrival within the low-band beam attenuation angle range;
  • the constraint conditions of the middle frequency band include: the norm of the weighting coefficients is less than the mid-band weighting coefficient norm threshold, and the deviation between the main lobe of the mid-band beam output and the expected mid-band main lobe response is less than the mid-band main lobe deviation threshold; the cost function of the middle frequency band is the beamforming output power of hypothetical signals from all directions of arrival within the mid-band beam attenuation angle range;
  • the constraint conditions of the high frequency band include: the norm of the weighting coefficients is less than the high-band weighting coefficient norm threshold, and the deviation between the main lobe of the high-band beam output and the expected high-band main lobe response is less than the high-band main lobe deviation threshold; the cost function of the high frequency band is the beamforming output power of hypothetical signals from all directions of arrival within the high-band beam attenuation angle range;
  • the output power is calculated from the array steering vector matrix and the weighting coefficients;
  • the output main lobe of the mid-band beam is obtained by weighted summation of the array steering vector matrix with the weighting coefficients optimized for the middle frequency band, and the output main lobe of the high-band beam is obtained by weighted summation of the array steering vector matrix with the weighting coefficients optimized for the high frequency band;
  • the constraint conditions of the low, middle and high frequency bands further include: after beamforming, a hypothetical signal arriving from the main lobe center direction has an output gain of 1.
  • the expected main lobe response of the high frequency band is obtained as follows:
  • the array steering vector matrix is weighted and summed with the optimized weighting coefficients to obtain the beam main lobe shape, and this beam main lobe shape is used as the expected high-band main lobe response.
  • an embodiment of the present application also provides a computer device, which may be a server, and its internal structure may be as shown in FIG. 9 .
  • the computer device includes a processor, a memory, a network interface and a database connected via a system bus. The processor of the computer device provides computation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer programs and a database.
  • the internal memory provides an environment for running the operating system and computer programs stored in the non-volatile storage medium.
  • the database of the computer device is used to store data such as that used by the audio signal processing method of the microphone array.
  • the network interface of the computer device is used to communicate with external terminals via a network connection.
  • when the computer program is executed by the processor, an audio signal processing method for a microphone array is implemented.
  • the audio signal processing method of the microphone array is applied to a microphone array, wherein the microphone array includes a central microphone pair and several extended microphones arranged symmetrically on both sides of the central microphone pair; the central microphone pair and the extended microphones are arranged on the same straight line, and the spacings between adjacent microphones are unequal, wherein the larger the distance between an extended microphone and the central microphone pair, the larger the spacing between that extended microphone and its adjacent microphone on the side closer to the array center.
  • the method includes: calculating the corresponding array steering vector group according to the structural parameters and signal acquisition channels of the microphone array, wherein the array steering vector group includes several array steering vector matrices corresponding to different frequency points; selecting the corresponding constraint conditions and cost function according to the frequency band to which each frequency point belongs and solving for the weighting coefficients by optimization, wherein the cost function is calculated from the array steering vector matrix and the weighting coefficients, and frequency-domain smoothing is applied after the frequency-independent weighting coefficient optimization is completed; extracting the time-domain audio signals corresponding to the signal acquisition channels from the multi-channel signal collected by the microphone array, and performing a discrete Fourier transform on them to obtain the corresponding multi-channel time-frequency domain signals; performing weighted summation on the multi-channel time-frequency domain signals of the corresponding frequency points with the weighting coefficients to obtain a time-frequency domain beam output signal, completing spatial filtering; and performing an inverse discrete Fourier transform on the time-frequency domain beam output signal to calculate the target audio signal.
  • the designated manner of the signal acquisition channel is one of the following manners:
  • a part of channels is symmetrically selected with the center microphone pair as the center, and designated as the signal acquisition channel;
  • the frequency bands include low frequency bands, middle frequency bands and high frequency bands;
  • the frequency band is obtained by dividing all processing frequency bands according to the signal sampling rate and the number of analysis frequency points; the low frequency band, middle frequency band and high frequency band respectively correspond to different constraint conditions and cost functions.
  • the array steering vector matrix is calculated from hypothetical signals from all directions of arrival, and the method further includes:
  • the constraint conditions of the low frequency band include: the norm of the weighting coefficients is less than the low-band weighting coefficient norm threshold; the cost function of the low frequency band is the beamforming output power of hypothetical signals from all directions of arrival within the low-band beam attenuation angle range;
  • the constraint conditions of the middle frequency band include: the norm of the weighting coefficients is less than the mid-band weighting coefficient norm threshold, and the deviation between the main lobe of the mid-band beam output and the expected mid-band main lobe response is less than the mid-band main lobe deviation threshold; the cost function of the middle frequency band is the beamforming output power of hypothetical signals from all directions of arrival within the mid-band beam attenuation angle range;
  • the constraint conditions of the high frequency band include: the norm of the weighting coefficients is less than the high-band weighting coefficient norm threshold, and the deviation between the main lobe of the high-band beam output and the expected high-band main lobe response is less than the high-band main lobe deviation threshold; the cost function of the high frequency band is the beamforming output power of hypothetical signals from all directions of arrival within the high-band beam attenuation angle range;
  • the output power is calculated from the array steering vector matrix and the weighting coefficients; the output main lobes of the mid-band beam and of the high-band beam are each obtained by weighted summation of the array steering vector matrix with the weighting coefficients at the corresponding frequency point;
  • the constraint conditions of the low, middle and high frequency bands further include: after beamforming, a hypothetical signal arriving from the main lobe center direction has an output gain of 1.
  • the expected main lobe response of the high frequency band is obtained as follows:
  • the array steering vector matrix is weighted and summed with the optimized weighting coefficients to obtain the beam main lobe shape, and this beam main lobe shape is used as the expected high-band main lobe response.
  • An embodiment of the present application also provides a computer-readable storage medium, which may be volatile or non-volatile and on which a computer program is stored; when the computer program is executed by a processor, an audio signal processing method for a microphone array is implemented.
  • the audio signal processing method of the microphone array is applied to a microphone array;
  • the microphone array includes a central microphone pair and several extended microphones arranged symmetrically on both sides of the central microphone pair; the central microphone pair and the extended microphones are arranged on the same straight line, and the spacings between adjacent microphones are unequal, wherein the larger the distance between an extended microphone and the central microphone pair, the larger the spacing between that extended microphone and its adjacent microphone on the side closer to the array center.
  • the method includes: calculating the corresponding array steering vector group according to the structural parameters and signal acquisition channels of the microphone array, wherein the array steering vector group includes several array steering vector matrices corresponding to different frequency points; selecting the corresponding constraint conditions and cost function according to the frequency band to which each frequency point belongs and solving for the weighting coefficients by optimization, wherein the cost function is calculated from the array steering vector matrix and the weighting coefficients; extracting the time-domain audio signals corresponding to the signal acquisition channels from the multi-channel signal collected by the microphone array, and performing a discrete Fourier transform on them to obtain the corresponding multi-channel time-frequency domain signals; performing weighted summation on the multi-channel time-frequency domain signals of the corresponding frequency points with the weighting coefficients to obtain a time-frequency domain beam output signal, completing spatial filtering; and performing an inverse discrete Fourier transform on the time-frequency domain beam output signal to calculate the target audio signal.
  • the audio signal processing method of the microphone array implemented above uses the weighting coefficient norm and the expected main lobe response as constraints, so that the output signal within the beam main lobe angle deviates only slightly from the preset beam main lobe response and is highly robust; by using the output signal power within the beam attenuation angle as the cost function, it suppresses signals arriving from directions outside the beam main lobe to the greatest extent; by designing different constraint conditions and cost functions for the low, middle and high frequency bands, it avoids the situation in which the optimization problem has no solution; by setting a transition band that covers several frequency points near each discontinuity of the weighting coefficients and smoothing the weighting coefficients of the frequency points covered by the transition band, it reduces artificial noise in the beam output audio signal; and through the weighting coefficient design method of independent per-band optimization followed by frequency-domain smoothing, it obtains a set of multi-channel weighting coefficients with which the multi-channel audio signal collected by the non-uniform linear microphone array is spatially filtered, improving the signal-to-noise ratio of the target audio signal obtained after processing.
  • in addition, the multi-channel signal can be selected in different ways for this processing, so as to meet the requirements of other subsequent multi-channel audio processing algorithms; replacing the real channel signals in a subsequent multi-channel algorithm with spatially filtered signals reduces the number of channels and improves signal quality.
  • the designated manner of the signal acquisition channel is one of the following manners:
  • a part of channels is symmetrically selected with the center microphone pair as the center, and designated as the signal acquisition channel;
  • the frequency bands include low frequency bands, middle frequency bands and high frequency bands;
  • the frequency band is obtained by dividing all processing frequency bands according to the signal sampling rate and the number of analysis frequency points; the low frequency band, middle frequency band and high frequency band respectively correspond to different constraint conditions and cost functions.
  • the array steering vector matrix is calculated from hypothetical signals from all directions of arrival, and the method further includes:
  • the constraint conditions of the low frequency band include: the norm of the weighting coefficients is less than the low-band weighting coefficient norm threshold; the cost function of the low frequency band is the beamforming output power of hypothetical signals from all directions of arrival within the low-band beam attenuation angle range;
  • the constraint conditions of the middle frequency band include: the norm of the weighting coefficients is less than the mid-band weighting coefficient norm threshold, and the deviation between the main lobe of the mid-band beam output and the expected mid-band main lobe response is less than the mid-band main lobe deviation threshold; the cost function of the middle frequency band is the beamforming output power of hypothetical signals from all directions of arrival within the mid-band beam attenuation angle range;
  • the constraint conditions of the high frequency band include: the norm of the weighting coefficients is less than the high-band weighting coefficient norm threshold, and the deviation between the main lobe of the high-band beam output and the expected high-band main lobe response is less than the high-band main lobe deviation threshold; the cost function of the high frequency band is the beamforming output power of hypothetical signals from all directions of arrival within the high-band beam attenuation angle range;
  • the output power is calculated from the array steering vector matrix and the weighting coefficients; the output main lobes of the mid-band beam and of the high-band beam are each obtained by weighted summation of the array steering vector matrix with the weighting coefficients at the corresponding frequency point;
  • the constraint conditions of the low, middle and high frequency bands further include: after beamforming, a hypothetical signal arriving from the main lobe center direction has an output gain of 1.
  • the expected main lobe response of the high frequency band is obtained as follows:
  • the array steering vector matrix is weighted and summed with the optimized weighting coefficients to obtain the beam main lobe shape, and this beam main lobe shape is used as the expected high-band main lobe response.
  • Nonvolatile memory can include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in many forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link (Synchlink) DRAM (SLDRAM), memory bus (Rambus) direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), etc.
  • SRAM Static RAM
  • DRAM Dynamic RAM
  • SDRAM Synchronous DRAM
  • DDRSDRAM Double Data Rate SDRAM
  • ESDRAM Enhanced SDRAM
  • SLDRAM Synchronous Link (Synchlink) DRAM
  • RDRAM Rambus direct RAM
  • DRDRAM direct memory bus dynamic RAM
  • RDRAM memory bus dynamic RAM

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Remote Sensing (AREA)
  • Computational Linguistics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Otolaryngology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Physics & Mathematics (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The present application provides a non-uniform linear microphone array. The microphone array comprises a central microphone pair, and a plurality of extended microphones, which are symmetrically arranged on two sides of the central microphone pair, the central microphone pair and the extended microphones being arranged on the same straight line, and the distances between adjacent microphones being unequal, wherein the larger the distance between the extended microphone and the central microphone pair, the larger the distance between the extended microphone and an adjacent microphone at the side close to the center of the array. By means of the structure, a better sound pickup effect can be achieved when the same number of microphones is used, or the number of microphones in an array can be reduced while the sound pickup effect is ensured. By means of an audio signal processing method and apparatus for a microphone array, and a medium of the present application, weighting coefficient optimization solving can be performed on an audio signal that is collected by the non-uniform linear microphone array, and spatial filtering is performed on the audio signal by means of a weighting coefficient, thereby improving the quality of the signal.

Description

Microphone array and signal processing method and apparatus therefor, and device and medium
This application claims priority to Chinese patent application No. 202111228311X, filed with the China Patent Office on October 21, 2021 and entitled "Microphone array and signal processing method, apparatus, device and medium therefor", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of circuit technology, for example, to a microphone array and a signal processing method, apparatus, device and medium therefor.
Background
In existing uniform linear microphone arrays, the microphones are evenly arranged on a straight line with equal spacing between adjacent microphones. To improve the sound pickup quality of such arrays, beamforming is usually applied, which typically requires the microphone spacing to be comparable to the signal wavelength. Speech signals occupy a wide frequency band: effective beamforming of high-frequency signals requires a sufficiently small spacing between microphone elements, while effective beamforming of low-frequency signals requires a sufficiently large array aperture. The inventors found that if the existing array structure with evenly arranged microphones is used, meeting the beamforming requirements at both high and low frequencies requires a large number of microphones, which increases not only the hardware cost and structural complexity but also the computational load of the beamforming algorithm.
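The trade-off can be quantified with the standard spatial anti-aliasing condition d ≤ λ/2 at the highest frequency, combined with a rough rule of thumb (an assumption here, not a statement from this application) that the aperture should be about one wavelength at the lowest frequency. The sketch below applies both to the 0.5 kHz to 6 kHz speech band with c = 343 m/s.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)


def uniform_array_requirements(f_min_hz, f_max_hz, c=SPEED_OF_SOUND):
    """Rough sizing of a uniform linear array covering [f_min_hz, f_max_hz]."""
    max_spacing = c / (2.0 * f_max_hz)  # half wavelength at the highest frequency (no spatial aliasing)
    aperture = c / f_min_hz             # about one wavelength at the lowest frequency (rule of thumb)
    n_mics = math.ceil(aperture / max_spacing - 1e-9) + 1  # small tolerance absorbs rounding
    return max_spacing, aperture, n_mics


spacing, aperture, n = uniform_array_requirements(500.0, 6000.0)
print(f"spacing <= {spacing * 100:.2f} cm, aperture ~ {aperture * 100:.1f} cm, {n} microphones")
```

Under these assumptions a uniform array needs 25 microphones (spacing of about 2.86 cm over a 68.6 cm aperture), which illustrates how quickly the microphone count grows when one uniform spacing must serve the whole band.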
In addition, the existing beamforming algorithm is usually the delay-and-sum method, which has three problems when applied to wideband speech signals: first, the shape of its beam pattern depends on frequency, and the main lobe width decreases as frequency increases; second, noise is attenuated non-uniformly across the band, which introduces artificial noise into the beam output; third, when the incident direction of the sound wave deviates from the main lobe direction, beamforming introduces a low-pass filtering effect that distorts the output signal.
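The first problem can be reproduced numerically. The sketch below computes the delay-and-sum (uniform weight) response of a hypothetical 8-microphone uniform array with 4 cm spacing, steered to broadside, and measures the -3 dB main lobe width by scanning outward from 90° in 0.1° steps. The layout and c = 343 m/s are assumptions chosen only for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)
# Hypothetical uniform linear array: 8 microphones, 4 cm spacing, centred at 0.
UNIFORM = [(i - 3.5) * 0.04 for i in range(8)]


def das_response(positions, freq_hz, theta_deg, steer_deg=90.0):
    """Normalised delay-and-sum magnitude response for a far-field source at theta_deg."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    diff = math.cos(math.radians(theta_deg)) - math.cos(math.radians(steer_deg))
    return abs(sum(cmath.exp(-1j * k * x * diff) for x in positions)) / len(positions)


def main_lobe_width_deg(positions, freq_hz, level_db=-3.0, step_deg=0.1):
    """Full main lobe width around broadside (the response is symmetric about 90 deg)."""
    threshold = 10.0 ** (level_db / 20.0)
    offset = 0.0
    while offset < 90.0 and das_response(positions, freq_hz, 90.0 + offset) >= threshold:
        offset += step_deg
    return 2.0 * offset


for f in (1000.0, 4000.0):
    print(f"{f:.0f} Hz: -3 dB main lobe width ~ {main_lobe_width_deg(UNIFORM, f):.1f} deg")
```

For this layout the main lobe at 4 kHz is noticeably narrower than at 1 kHz, which is the frequency-dependent beam shape that a fixed-width design aims to avoid.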
Technical Problem
The purpose of this application is to provide a non-uniform linear microphone array structure and a corresponding audio signal processing method, so as to obtain a better sound pickup effect with the same number of microphones, or to reduce the number of microphones in the array while maintaining the sound pickup effect.
Technical Solution
The present application provides a non-uniform linear microphone array, which includes a central microphone pair and several extended microphones arranged symmetrically on both sides of the central microphone pair; the central microphone pair and the extended microphones are arranged on the same straight line, and the spacings between adjacent microphones are unequal, wherein the larger the distance between an extended microphone and the central microphone pair, the larger the spacing between that extended microphone and its adjacent microphone on the side closer to the array center.
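The spacing rule can be expressed as a check on a candidate layout. In the sketch, `build_positions` mirrors a list of outward gaps about the central pair; both function names are hypothetical, and the example gaps (2 cm central pair, then 3, 5 and 8 cm) are illustrative values chosen only to satisfy the rule, not dimensions from this application.

```python
def build_positions(center_gap, outward_gaps):
    """Mirror-symmetric positions (m): central pair center_gap apart, then outward_gaps per side."""
    right = [center_gap / 2.0]
    for gap in outward_gaps:
        right.append(right[-1] + gap)
    return [-x for x in reversed(right)] + right


def satisfies_spacing_rule(positions, tol=1e-9):
    """Check the layout rule: symmetric about the centre, and each extended microphone's
    gap to its inner neighbour grows with its distance from the central pair."""
    pos = sorted(positions)
    n = len(pos)
    if n < 4 or n % 2:
        return False
    if any(abs(pos[i] + pos[n - 1 - i]) > tol for i in range(n // 2)):
        return False
    right = pos[n // 2:]  # inner microphone of the central pair, then the extended microphones
    gaps = [b - a for a, b in zip(right, right[1:])]
    return all(outer > inner + tol for inner, outer in zip(gaps, gaps[1:]))


LAYOUT = build_positions(0.02, [0.03, 0.05, 0.08])
print(LAYOUT, satisfies_spacing_rule(LAYOUT))
```

A uniform layout fails the check because its gaps do not grow outward, which is exactly the property that distinguishes the claimed array from a uniform linear array.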
The present application provides an audio signal processing method for a microphone array, wherein the method includes:
Calculating the corresponding array steering vector group according to the structural parameters and signal acquisition channels of the microphone array, wherein the array steering vector group includes several array steering vector matrices corresponding to different frequency points;
Selecting the corresponding constraint conditions and cost function according to the frequency band to which each frequency point belongs, and solving for the weighting coefficients by optimization, wherein the cost function is calculated from the array steering vector matrix and the weighting coefficients; after the frequency-independent weighting coefficient optimization is completed, the weighting coefficients are smoothed in the frequency domain;
Extracting the time-domain audio signals corresponding to the signal acquisition channels from the multi-channel signal collected by the microphone array, and performing a discrete Fourier transform on these time-domain audio signals to obtain the corresponding multi-channel time-frequency domain signals;
Performing weighted summation on the multi-channel time-frequency domain signals of the corresponding frequency points with the weighting coefficients to obtain a time-frequency domain beam output signal, completing spatial filtering;
Performing an inverse discrete Fourier transform on the time-frequency domain beam output signal to calculate the target audio signal.
The present application also proposes an audio signal processing apparatus for a microphone array, wherein the apparatus includes:
an array steering vector group calculation unit, configured to calculate a corresponding array steering vector group according to the structural parameters and signal acquisition channels of the microphone array, wherein the array steering vector group includes several array steering vector matrices corresponding to different frequency points;
a weighting coefficient solving unit, configured to select, according to the frequency band to which each frequency point belongs, the corresponding constraint conditions and cost function, and to optimize the weighting coefficients, wherein the cost function is calculated from the array steering vector matrix and the weighting coefficients, and the weighting coefficients, once optimized independently at each frequency, are smoothed in the frequency domain;
a signal extraction unit, configured to extract, from the multi-channel signal collected by the microphone array, the time-domain audio signals corresponding to the signal acquisition channels, and to perform a discrete Fourier transform on these time-domain audio signals to obtain the corresponding multi-channel time-frequency domain signal;
a spatial filtering unit, configured to perform, with the weighting coefficients, a weighted summation of the multi-channel time-frequency domain signals at each corresponding frequency point to obtain a time-frequency domain beam output signal, thereby completing spatial filtering;
a signal generation module, configured to perform an inverse discrete Fourier transform on the time-frequency domain beam output signal to obtain a target audio signal.
The present application also proposes a computer device, including a memory and a processor, the memory storing a computer program. When executing the computer program, the processor implements an audio signal processing method for a microphone array, applied to the microphone array described above. The audio signal processing method includes: calculating a corresponding array steering vector group according to the structural parameters and signal acquisition channels of the microphone array, wherein the array steering vector group includes several array steering vector matrices corresponding to different frequency points; selecting, according to the frequency band to which each frequency point belongs, the corresponding constraint conditions and cost function, and optimizing the weighting coefficients, wherein the cost function is calculated from the array steering vector matrix and the weighting coefficients; extracting, from the multi-channel signal collected by the microphone array, the time-domain audio signals corresponding to the signal acquisition channels, and performing a discrete Fourier transform on them to obtain the corresponding multi-channel time-frequency domain signal; performing, with the weighting coefficients, a weighted summation of the multi-channel time-frequency domain signals at each corresponding frequency point to obtain a time-frequency domain beam output signal, thereby completing spatial filtering; and performing an inverse discrete Fourier transform on the time-frequency domain beam output signal to obtain a target audio signal.
The present application also proposes a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements an audio signal processing method for a microphone array, applied to the microphone array described above. The audio signal processing method includes: calculating a corresponding array steering vector group according to the structural parameters and signal acquisition channels of the microphone array, wherein the array steering vector group includes several array steering vector matrices corresponding to different frequency points; selecting, according to the frequency band to which each frequency point belongs, the corresponding constraint conditions and cost function, and optimizing the weighting coefficients, wherein the cost function is calculated from the array steering vector matrix and the weighting coefficients; extracting, from the multi-channel signal collected by the microphone array, the time-domain audio signals corresponding to the signal acquisition channels, and performing a discrete Fourier transform on them to obtain the corresponding multi-channel time-frequency domain signal; performing, with the weighting coefficients, a weighted summation of the multi-channel time-frequency domain signals at each corresponding frequency point to obtain a time-frequency domain beam output signal, thereby completing spatial filtering; and performing an inverse discrete Fourier transform on the time-frequency domain beam output signal to obtain a target audio signal.
Beneficial effects
In the non-uniform linear microphone array of the present application, the expansion microphones are arranged non-uniformly outward from the central microphone pair, and the farther an expansion microphone is from the center, the larger the spacing between adjacent expansion microphones. This arrangement satisfies both the small minimum element spacing required for high-frequency beamforming and the large array aperture required for low-frequency beamforming. With the same number of elements, the array covers a wider wavelength range; with the same array area, fewer microphone elements are needed. Furthermore, the audio signal processing method of the present application applies different loss functions to the original audio signal acquired by the non-uniform linear microphone array according to different angle ranges and frequency ranges, attenuating the output power accordingly, which improves the quality of the obtained target audio signal.
Description of drawings
Fig. 1 is a schematic structural diagram of a non-uniform linear microphone array according to an embodiment;
Fig. 2 is a detailed schematic structural diagram of a non-uniform linear microphone array according to an embodiment;
Fig. 3 is a detailed schematic structural diagram of a non-uniform linear microphone array according to an embodiment;
Fig. 4 is a schematic flowchart of an audio signal processing method for a microphone array according to an embodiment;
Fig. 5 is a detailed schematic flowchart of an audio signal processing method for a microphone array according to an embodiment;
Fig. 6 is a broadband beam pattern of an audio signal in the prior art;
Fig. 7 is a broadband beam pattern of the target audio signal according to an embodiment;
Fig. 8 is a schematic structural block diagram of an audio signal processing apparatus for a microphone array according to an embodiment of the present application;
Fig. 9 is a schematic structural block diagram of a computer device according to an embodiment of the present application.
In Figs. 1 to 3:
1. Microphone array; 101. Central microphone pair; 102. Expansion microphone.
The realization of the purpose, the functional features and the advantages of the present application will be further described with reference to the embodiments and the accompanying drawings.
Best mode for carrying out the invention
Referring to Fig. 1, the present application discloses a non-uniform linear microphone array, which includes a central microphone pair and several expansion microphones arranged symmetrically on both sides of the central microphone pair. The central microphone pair and the expansion microphones lie on the same straight line, and the spacings between adjacent microphones are unequal. The greater the distance between an expansion microphone and the central microphone pair, the greater the spacing between that expansion microphone and its adjacent microphone on the side closer to the array center.
In one embodiment, the non-uniform linear microphone array is typically used in large-screen audio/video conferencing displays, smart blackboards and other devices with certain requirements on sound pickup quality, where it collects the sound around the device. The array includes a central microphone pair and at least two expansion microphones; the central microphone pair consists of two central microphones, and the central pair and the expansion microphones lie on the same straight line. Specifically, referring to Fig. 2, when an installation position for a camera or similar device must be reserved within the microphone array, the central-pair spacing d0 can be set according to the dimensions of that device. Taking the perpendicular bisector of the line connecting the two central microphones as the axis of symmetry, several expansion microphones are arranged symmetrically along the extension of that line. From the inside outward, the spacings between each expansion microphone and its neighbor on the array-center side are d1, d2 and d3. Because the microphone array is often used together with a camera module, in this embodiment d0 can be chosen according to the actual situation, while the remaining spacings are designed to satisfy d1 < d2 < d3.
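The layout rule of this embodiment can be illustrated with a short sketch. It is a minimal Python example under stated assumptions: coordinates are measured in millimetres from the array center, the function names mic_positions and spacing_grows_outward are hypothetical, and the spacings used are the example values given in the band-division embodiment of this description (d0 = 30 mm, d1 = 25 mm, d2 = 35 mm, d3 = 90 mm).

```python
def mic_positions(d0, spacings):
    """Sorted x-coordinates (mm) of a symmetric non-uniform linear array.

    d0:       spacing of the central microphone pair
    spacings: [d1, d2, ...], spacing between each expansion microphone and its
              neighbour on the array-centre side, listed from the centre outward
    """
    # The two central microphones sit at +/- d0/2 around the array centre.
    right = [d0 / 2]
    for d in spacings:
        right.append(right[-1] + d)
    # Mirror the right half about the perpendicular bisector (x = 0).
    left = [-x for x in reversed(right)]
    return left + right


def spacing_grows_outward(spacings):
    """Design rule of this embodiment: d1 < d2 < d3 < ..."""
    return all(a < b for a, b in zip(spacings, spacings[1:]))


positions = mic_positions(30, [25, 35, 90])
print(positions)                            # [-165.0, -75.0, -40.0, -15.0, 15.0, 40.0, 75.0, 165.0]
print(spacing_grows_outward([25, 35, 90]))  # True
```

The array therefore has 8 microphones and an overall aperture of 330 mm.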
In one embodiment, to reduce the number of microphones while maintaining signal quality, the spacing between adjacent expansion microphones increases with their distance from the central microphone pair. Compared with an array in which the same number of microphones is uniformly spaced, a non-uniform array arranged according to this principle achieves a smaller spacing between adjacent microphone units and a larger overall array aperture at the same time. The spacing between the central microphone pair and its adjacent expansion microphones is smaller than the average spacing, while the spacing between the outermost microphones and their neighbors is larger than the average spacing. With the same number of microphones, this yields a wider frequency range over which beamforming can be optimized, improving sound pickup quality.
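The claim that the non-uniform layout combines a small minimum spacing with a large aperture can be checked numerically. This is a minimal sketch, assuming the 8-microphone example positions (in mm) derived from d0 = 30 mm, d1 = 25 mm, d2 = 35 mm, d3 = 90 mm; the helper names gap_stats and uniform_positions are hypothetical. A uniform array with the same microphone count can match either the 25 mm minimum spacing or the 330 mm aperture, but not both.

```python
def gap_stats(positions):
    """(smallest adjacent spacing, mean adjacent spacing, aperture) of a linear array."""
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return min(gaps), sum(gaps) / len(gaps), positions[-1] - positions[0]


def uniform_positions(n, gap):
    """n microphones with constant spacing `gap`, centred on zero."""
    return [(i - (n - 1) / 2) * gap for i in range(n)]


nonuniform = [-165.0, -75.0, -40.0, -15.0, 15.0, 40.0, 75.0, 165.0]  # mm
n = len(nonuniform)

# Non-uniform: 25 mm minimum spacing and 330 mm aperture at the same time.
print(gap_stats(nonuniform))
# Uniform, same count, same 25 mm spacing: aperture shrinks to 7 * 25 = 175 mm.
print(gap_stats(uniform_positions(n, 25.0)))
# Uniform, same count, same 330 mm aperture: spacing grows to 330 / 7 (about 47.1 mm).
print(gap_stats(uniform_positions(n, 330.0 / 7)))
```

In the non-uniform case the 25 mm inner spacing is below the 47.1 mm mean and the 90 mm outer spacing is above it, as described above.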
In one embodiment, referring to Fig. 3, the non-uniform linear microphone array structure may be implemented as a single hardware structure, or as a combination of several non-uniform, asymmetric microphone array hardware structures acting as sub-arrays.
Referring to Fig. 4, a schematic flowchart of an audio signal processing method for a microphone array disclosed in the present application is shown. The method includes:
S1. Calculate a corresponding array steering vector group according to the structural parameters and signal acquisition channels of the microphone array, wherein the array steering vector group includes several array steering vector matrices corresponding to different frequency points;
S2. Select, according to the frequency band to which each frequency point belongs, the corresponding constraint conditions and cost function, and optimize the weighting coefficients, wherein the cost function is calculated from the array steering vector matrix and the weighting coefficients; after the weighting coefficients have been optimized independently at each frequency, smooth them in the frequency domain;
S3. Extract, from the multi-channel signal collected by the microphone array, the time-domain audio signals corresponding to the signal acquisition channels, and perform a discrete Fourier transform on them to obtain the corresponding multi-channel time-frequency domain signal;
S4. Perform, with the weighting coefficients, a weighted summation of the multi-channel time-frequency domain signals at each corresponding frequency point to obtain a time-frequency domain beam output signal, thereby completing spatial filtering;
S5. Perform an inverse discrete Fourier transform on the time-frequency domain beam output signal to obtain the target audio signal.
In practice, steps S1 and S2 usually need to be performed only once. The resulting frequency-domain beamforming weighting coefficients are stored and are not modified as the received signal changes, until the structural parameters of the microphone array change. The structural parameters include the number of microphones in the microphone array.
As described in step S1, the array steering vector group is first calculated from the structural parameters and signal acquisition channels of the microphone array. Specifically, hypothetical signals are used to simulate actual audio signals, and one array steering vector is calculated for each analysis frequency point and each direction of arrival, based on the non-uniform linear array structure, the signal acquisition channels, the signal sampling rate and the number of analysis frequency points. For example, suppose the microphone array consists of 8 microphones, all 8 channels are designated as signal acquisition channels, the directions of arrival from 0° to 180° are discretized at 1° intervals into 181 directions, and the number of analysis frequency points is 512. The array steering vector group then consists of 512 array steering vector matrices of dimension 8×181, each containing 181 steering vectors corresponding to different directions of arrival. As another example, if the 6 channels other than the first and last channels are designated as signal acquisition channels, each array steering vector matrix has dimension 6×181.
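Step S1 can be sketched as follows for the 8-microphone example. This is a minimal Python illustration, not the author's implementation, and it rests on several assumptions: far-field plane waves, angles measured from the array axis (so 90° is broadside), a speed of sound of 330 m/s as in the band-division example, a phase convention of exp(-j 2π f x cos θ / c), a sampling rate of 16 kHz, and analysis bin k centred at k·fs/(2·512). The function names steering_matrix and steering_vector_group are hypothetical.

```python
import cmath
import math


def steering_matrix(positions_m, freq_hz, c=330.0, angles_deg=range(181)):
    """Steering matrix for one frequency point, as M rows x len(angles_deg) columns.

    Entry [m][n] = exp(-j * 2*pi * f * x_m * cos(theta_n) / c): the phase of a
    unit far-field plane wave from direction theta_n at microphone m, relative
    to the array centre. theta is measured from the array axis (90 deg = broadside).
    """
    thetas = [math.radians(a) for a in angles_deg]
    return [[cmath.exp(-2j * math.pi * freq_hz * x * math.cos(t) / c) for t in thetas]
            for x in positions_m]


def steering_vector_group(positions_m, fs, n_bins, c=330.0):
    """One steering matrix per analysis frequency point.

    Assumption: n_bins points cover 0 .. fs/2, bin k centred at k * fs / (2 * n_bins).
    """
    return [steering_matrix(positions_m, k * fs / (2 * n_bins), c) for k in range(n_bins)]


mics_m = [x / 1000 for x in (-165, -75, -40, -15, 15, 40, 75, 165)]  # all 8 channels
group = steering_vector_group(mics_m, fs=16000, n_bins=512)
print(len(group), len(group[0]), len(group[0][0]))   # 512 8 181

# Excluding the first and last channels gives 6 x 181 matrices instead.
print(len(steering_matrix(mics_m[1:-1], 1000.0)))     # 6
```

Because the group depends only on geometry and channel choice, it can be computed once and cached, consistent with the note above that S1 and S2 are run only when the array structure changes.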
As described in step S2, optimizing the weighting coefficients band by band means the following: the entire processing band is divided into low, mid and high processing bands according to the signal sampling rate and the number of analysis frequency points. When the frequency-domain beamforming weighting coefficients are optimized in the low, mid and high bands, different constraint conditions and cost functions are used; after this frequency-independent optimization is completed, the weighting coefficients are smoothed in the frequency domain.
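The per-band cost and constraints of step S2 (detailed for each band in the embodiment describing the low, mid and high bands) can be evaluated for a candidate weight vector as sketched below. This is a minimal Python illustration under assumptions: the beam response is taken as w^H a(θ), each hypothetical signal has unit power, and the example weights are plain delay-and-sum weights toward broadside rather than optimized ones. The names steering_column, beam_response and evaluate_band are hypothetical. The actual minimization of the cost subject to these constraints would be handed to a constrained (for example convex) optimizer, which is not shown.

```python
import cmath
import math


def steering_column(positions_m, freq_hz, theta_deg, c=330.0):
    """Far-field steering vector a(theta, f) for one direction (theta from the array axis)."""
    t = math.radians(theta_deg)
    return [cmath.exp(-2j * math.pi * freq_hz * x * math.cos(t) / c) for x in positions_m]


def beam_response(w, a):
    """Beamformer response w^H a (conjugated weights times steering vector)."""
    return sum(wm.conjugate() * am for wm, am in zip(w, a))


def evaluate_band(w, positions_m, freq_hz, atten_angles, norm_max,
                  main_angles=(), desired=None, dev_max=None):
    """Cost and constraint check for one frequency point.

    cost:     output power of unit-power hypothetical signals from every
              direction in the beam attenuation angle range
    feasible: ||w||_2 < norm_max and, when a desired main-lobe response is
              given (mid and high bands), max deviation over main_angles < dev_max
    """
    cost = sum(abs(beam_response(w, steering_column(positions_m, freq_hz, th))) ** 2
               for th in atten_angles)
    feasible = math.sqrt(sum(abs(x) ** 2 for x in w)) < norm_max
    if desired is not None:
        dev = max(abs(beam_response(w, steering_column(positions_m, freq_hz, th)) - desired(th))
                  for th in main_angles)
        feasible = feasible and dev < dev_max
    return cost, feasible


# Example: uniform (delay-and-sum) weights steered to broadside, evaluated at 1 kHz.
mics_m = [x / 1000 for x in (-165, -75, -40, -15, 15, 40, 75, 165)]
w = [complex(1 / len(mics_m))] * len(mics_m)
atten = list(range(0, 61)) + list(range(120, 181))
cost, ok = evaluate_band(w, mics_m, 1000.0, atten, norm_max=1.0,
                         main_angles=range(80, 101), desired=lambda th: 1.0, dev_max=0.5)
print(round(cost, 3), ok)
```

For the low band, calling evaluate_band without desired reproduces the norm-only constraint.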
As described in step S3, beamforming is performed in the frequency domain, so the signals captured by the microphones must first be transformed into the time-frequency domain. In a specific application, depending on the beamforming purpose, the signals of all channels of the microphone array may be extracted, or only some of the channels, and the microphone positions of the extracted channels may be asymmetric. Referring to Fig. 5, assuming that M channels are currently extracted, the sound pressure signals at the M microphone positions are denoted x_1(t), ..., x_M(t). After sampling and buffering, the corresponding multi-channel time-domain audio signals x_1(l), ..., x_M(l) are obtained, and a discrete Fourier transform (DFT) then yields a multi-channel time-frequency domain signal with K analysis frequency points, namely x_1(1), ..., x_1(K), ..., x_M(1), ..., x_M(K). In practice, a specific analysis window is usually also applied during the time-frequency conversion. For example, if the microphone array consists of 8 microphones, the number of analysis frequency points is 512, and all channels are used for beamforming, then one frame of the multi-channel time-frequency domain signal is an 8×512 matrix.
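The framing, windowing and DFT of step S3 can be sketched as follows. This is a minimal Python illustration with assumed details: a periodic Hann analysis window, a direct O(n²) DFT in place of an FFT for readability, and K = n_fft / 2 analysis frequency points taken as bins 0 to K-1. The helper names hann, dft and analyze_frame are hypothetical. The demo uses short 64-point frames; 1024-sample frames would give the 8×512 matrix mentioned above.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window (an assumed choice of analysis window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(frame):
    """Direct O(n^2) DFT; a real system would use an FFT."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def analyze_frame(channels, start, n_fft):
    """One frame of the multi-channel time-frequency signal as an M x K matrix.

    channels: M lists of time samples x_m(l).
    K = n_fft // 2 analysis frequency points (bins 0 .. K-1); for a real
    signal the upper half of the DFT is a conjugate mirror of the lower half.
    """
    win = hann(n_fft)
    k_points = n_fft // 2
    return [dft([x[start + i] * win[i] for i in range(n_fft)])[:k_points]
            for x in channels]


# 8 channels carrying the same 1 kHz tone, fs = 16 kHz, 64-point frames.
fs, n_fft = 16000, 64
tone = [math.sin(2 * math.pi * 1000 * l / fs) for l in range(n_fft)]
X = analyze_frame([tone] * 8, start=0, n_fft=n_fft)
print(len(X), len(X[0]))   # 8 32
# The strongest bin is k = 1000 * n_fft / fs = 4.
print(max(range(len(X[0])), key=lambda k: abs(X[0][k])))   # 4
```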
As described in step S4, after the multi-channel audio signal is transformed into the frequency domain, a weighted summation is performed using the weighting coefficient matrix calculated in step S2, giving the beam output frequency-domain signals Y(1), ..., Y(K) for the K frequency points.
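The weighted summation of step S4 reduces to one inner product per frequency point. The sketch below is a minimal Python illustration assuming the common conjugate convention Y(k) = Σ_m conj(w_m(k))·x_m(k), i.e. w^H x; the source does not state which convention is used. The function name beamform_frame is hypothetical. The example shows that weights equal to the steering vector divided by M pass a signal from the steered direction without distortion.

```python
import cmath


def beamform_frame(X, W):
    """Spatial filtering of one frame: Y(k) = sum_m conj(W[m][k]) * X[m][k].

    X, W: M x K matrices (channels x analysis frequency points).
    The conjugated (w^H x) convention is an assumption.
    """
    M, K = len(X), len(X[0])
    return [sum(W[m][k].conjugate() * X[m][k] for m in range(M)) for k in range(K)]


# A signal s arriving with steering phases a_m, weighted by w_m = a_m / M,
# passes through undistorted: Y = sum conj(a_m) a_m s / M = s.
M = 4
a = [cmath.exp(0.3j * m) for m in range(M)]
s = 2 - 1j
X = [[a[m] * s] for m in range(M)]          # K = 1 frequency point
W = [[a[m] / M] for m in range(M)]
Y = beamform_frame(X, W)
print(abs(Y[0] - s) < 1e-12)                 # True
```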
As described in step S5, an inverse discrete Fourier transform (IDFT) is performed on the beam output frequency-domain signals Y(1), ..., Y(K) of the K frequency points, using the windowing strategy that matches step S3, finally yielding the time-domain beam output audio signal y(l).
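A matching synthesis for step S5 can be sketched with per-frame inverse DFTs followed by overlap-add. This is a minimal Python illustration, not the disclosed windowing strategy: it assumes a periodic Hann analysis window with 50% overlap (hop = n/2), under which the shifted windows sum to one, so plain overlap-add restores the signal. A pass-through "beamformer" (Y(k) = X(k) of a single channel) is used so that the round trip can be checked. The helper names rdft, irdft and overlap_add are hypothetical.

```python
import cmath
import math


def hann(n):
    # Periodic Hann window: at 50 % overlap its shifted copies sum to 1.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def rdft(frame):
    """Bins 0 .. n//2 of the DFT of a real frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def irdft(half, n):
    """Inverse DFT of a real signal from bins 0 .. n//2 (conjugate symmetry)."""
    full = half + [half[n - k].conjugate() for k in range(n // 2 + 1, n)]
    return [sum(full[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def overlap_add(frames, hop):
    out = [0.0] * (hop * (len(frames) - 1) + len(frames[0]))
    for i, frame in enumerate(frames):
        for t, v in enumerate(frame):
            out[i * hop + t] += v
    return out


# Round trip with a pass-through "beamformer" (Y(k) = X(k) for one channel).
n, hop = 8, 4
x = [math.sin(0.7 * l) for l in range(32)]
win = hann(n)
frames = []
for start in range(0, len(x) - n + 1, hop):
    Y = rdft([x[start + i] * win[i] for i in range(n)])   # S3, then S4 as identity
    frames.append(irdft(Y, n))                            # S5: IDFT per frame
y = overlap_add(frames, hop)
# Samples covered by two overlapping frames match x up to round-off.
print(max(abs(y[l] - x[l]) for l in range(hop, len(x) - hop)))
```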
In summary, performing time-frequency transformation and beamforming on the original audio signals collected by the non-uniform linear microphone array provides a sound pickup function with a specific directivity and improves the signal-to-noise ratio of the picked-up audio signal.
In one embodiment, the signal acquisition channels are designated in one of the following ways:
selecting all channels of the microphone array as the signal acquisition channels;
selecting some channels of the microphone array symmetrically about the central microphone pair as the signal acquisition channels;
selecting some channels of the microphone array asymmetrically as the signal acquisition channels, for example, extracting an asymmetric multi-channel signal consisting of the first seven microphones of the microphone array shown in Fig. 2.
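The three ways of designating signal acquisition channels, and the resulting steering matrix sizes for 181 directions of arrival, can be sketched as follows. This is a minimal Python illustration assuming 0-based channel indices numbered left to right along a symmetric 8-microphone array; the names select_channels and is_symmetric are hypothetical.

```python
def select_channels(n_mics, mode, count=None):
    """0-based indices (left to right) of the signal acquisition channels.

    "all":       every microphone
    "symmetric": `count` microphones centred on the central pair
    "first":     the first `count` microphones (an asymmetric subset)
    """
    if mode == "all":
        return list(range(n_mics))
    if mode == "symmetric":
        if count is None or count > n_mics or (n_mics - count) % 2:
            raise ValueError("symmetric selection needs n_mics - count to be even")
        drop = (n_mics - count) // 2          # channels dropped at each end
        return list(range(drop, n_mics - drop))
    if mode == "first":
        return list(range(count))
    raise ValueError(f"unknown mode: {mode}")


def is_symmetric(indices, n_mics):
    """True if mirroring every index about the centre of a symmetric array maps the set onto itself."""
    s = set(indices)
    return all(n_mics - 1 - i in s for i in s)


for mode, count in [("all", None), ("symmetric", 6), ("first", 7)]:
    ch = select_channels(8, mode, count)
    kind = "symmetric" if is_symmetric(ch, 8) else "asymmetric"
    print(f"{mode:9} {ch} -> steering matrices {len(ch)}x181, {kind}")
```

Each selection implies its own steering vector group and its own set of weighting coefficients, which is why several beamforming outputs can be produced from one array.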
As described above, in the multi-channel signal selection stage, all channels may be designated as the signal acquisition channels. The time-domain audio signals of these channels are extracted from the multi-channel signal collected by the microphone array and transformed by a DFT into the corresponding multi-channel time-frequency domain signal. In this case, the number of selected microphones equals the total number of microphones in the array and their positions are symmetric. Accordingly, the channel dimension of the array steering vector matrix determined by this selection equals the number of microphone units in the array, and the desired beam response used in optimizing the weighting coefficients is symmetric.
In the multi-channel signal selection stage, instead of designating all channels, a symmetric subset of channels may be designated as the signal acquisition channels for subsequent beamforming. In this case, the number of selected microphones is smaller than the total number of microphones in the array, and their positions are symmetric. Compared with selecting all channels, the channel dimension of the array steering vector matrix determined by this selection is smaller than the number of microphone units in the array, while the desired beam response used in optimizing the weighting coefficients remains symmetric. Because fewer channels take part in beamforming, fewer weighting coefficients are needed and the computational load of beamforming is reduced.
In the multi-channel signal selection stage, besides designating all channels or a symmetric subset, an asymmetric subset of channels may also be designated as the signal acquisition channels for subsequent beamforming. In this case, the number of selected microphones is smaller than the number of microphone units in the array and their positions are asymmetric. Accordingly, the channel dimension of the array steering vector matrix determined by this selection is smaller than the number of microphone units in the array, and the desired beam response used in optimizing the weighting coefficients is asymmetric.
In summary, by combining the above three ways of designating signal acquisition channels, several different sets of weighting coefficients can be obtained, each corresponding to one channel designation scheme. This produces multiple beamforming results, which provide a multi-channel signal for subsequent signal processing stages.
In one embodiment, the frequency bands include a low band, a mid band and a high band;
wherein the frequency bands are obtained by dividing the entire processing band according to the signal sampling rate and the number of analysis frequency points, and the low, mid and high bands correspond to different constraint conditions and cost functions.
As described above, in one embodiment the entire processing band is divided into low, mid and high processing bands, and different constraint conditions and cost functions are used when optimizing the frequency-domain beamforming weighting coefficients in each band. The bands are divided according to whether the microphone spacing is close to the half-wavelength of the signal. The mid band is the frequency range in which the array structure can form a near-ideal desired beam response through optimization of the weighting coefficients. Accordingly, the boundary between the low and mid bands is the frequency whose half-wavelength is close to the maximum microphone spacing of the array, and the boundary between the mid and high bands is the frequency whose half-wavelength is close to the minimum microphone spacing of the array. For example, suppose the non-uniform linear microphone array is designed with the spacings d0 = 30 mm, d1 = 25 mm, d2 = 35 mm and d3 = 90 mm. The distance between the first and last microphones is then the largest, 330 mm (30 + 2 × (25 + 35 + 90)), and the spacing between the central microphone pair and each adjacent expansion microphone is the smallest, 25 mm. According to the band division principle, the half-wavelengths at the lower and upper limits of the mid band are 330 mm and 25 mm respectively; with a speed of sound of 330 m/s, the corresponding frequencies are 500 Hz and 6600 Hz. A division of the low, mid and high bands suitable for this microphone array structure is therefore shown in Table 1.
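The half-wavelength rule can be turned into a small calculation, since λ/2 = d gives f = c / (2d). This is a minimal Python sketch, assuming that the low/mid edge uses the largest microphone distance (the aperture) and the mid/high edge uses the smallest adjacent spacing, as in the example above. The mapping of analysis bins to frequencies (16 kHz sampling, 1024-point DFT, first 512 bins) and the treatment of exact edge frequencies as mid band are illustrative assumptions, and the names band_edges and band_of_bin are hypothetical.

```python
from collections import Counter


def band_edges(positions_mm, c=330.0):
    """Mid-band limits (Hz) from the half-wavelength rule lambda/2 = d, i.e. f = c / (2 d).

    Lower limit: half-wavelength equals the largest microphone distance (aperture).
    Upper limit: half-wavelength equals the smallest adjacent spacing.
    """
    gaps = [b - a for a, b in zip(positions_mm, positions_mm[1:])]
    aperture = positions_mm[-1] - positions_mm[0]
    # 1000 * c / (2 * d_mm) is c / (2 * d) with d converted from mm to m.
    return 1000 * c / (2 * aperture), 1000 * c / (2 * min(gaps))


def band_of_bin(k, fs, n_fft, f_low, f_high):
    """Assign analysis bin k (centre k * fs / n_fft) to the low, mid or high band."""
    f = k * fs / n_fft
    if f < f_low:
        return "low"
    return "mid" if f <= f_high else "high"


positions = [-165, -75, -40, -15, 15, 40, 75, 165]   # mm, d0=30, d1=25, d2=35, d3=90
f_low, f_high = band_edges(positions)
print(f_low, f_high)                                   # 500.0 6600.0
# Hypothetical framing: fs = 16 kHz, 1024-point DFT, first 512 bins analysed.
counts = Counter(band_of_bin(k, 16000, 1024, f_low, f_high) for k in range(512))
print(counts["low"], counts["mid"], counts["high"])    # 32 391 89
```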
Table 1 (division of the low, mid and high frequency bands; table image not reproduced)
In one embodiment, after the weighting coefficients are optimized, the method further includes:
calculating the first-order difference of each channel's weighting coefficients at each frequency point, and the average of the first-order differences at the adjacent frequency points of each frequency point;
if the relative deviation between the first-order difference at a frequency point and the corresponding average is greater than a preset deviation threshold, taking that frequency point as a discontinuity point of the weighting coefficients;
selecting the discontinuity point and its adjacent frequency points as an interval to be smoothed, smoothing the weighting coefficients within that interval, and updating the weighting coefficients in the interval with the smoothed values.
As described above, because the loss function is optimized independently at each frequency point, the resulting weighting coefficients are discontinuous across frequency points. In particular, a series of pronounced discontinuities appears at the boundaries between the low, mid and high bands, which introduces a degree of artificial noise into the corresponding beam output signal. The main discontinuity points of each channel's weighting coefficients in the frequency domain are therefore identified first; a transition band covering several frequency points is then set around each discontinuity point, and the weighting coefficients within it are smoothed, which reduces the artificial noise in the beam output audio signal.
Specifically, the adjacent frequency points of a given frequency point are the frequency points immediately before and immediately after it. For example, if a frequency point is the 256th of the 512 analysis frequency points, its adjacent frequency points are the 255th and 257th, and the corresponding first-order difference average is the average of the first-order differences at the 255th and 257th frequency points. When the relative deviation between the first-order difference at the 256th frequency point and this average exceeds the preset deviation threshold, the weighting coefficient changes too rapidly at the 256th frequency point, i.e., it is not smooth enough. The 255th to 257th frequency points are then taken as the interval to be smoothed, and the weighting coefficients in this interval are smoothed, for example by setting the weighting coefficient equal to the first-order difference average.
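The discontinuity test described above can be sketched for one channel as follows. This is a minimal Python illustration under assumptions: the first-order difference at frequency point k is taken as D(k) = w(k) - w(k-1), the relative deviation is |D(k) - avg| / |avg| with a small guard against division by zero, and the interval k-1 to k+1 is smoothed by linear interpolation between w(k-2) and w(k+2). That interpolation is a swapped-in smoothing rule, not the one stated in the text. The names find_discontinuities and smooth_weights are hypothetical; the code also works on complex coefficients.

```python
def find_discontinuities(w, threshold, eps=1e-12):
    """Indices k whose first-order difference D(k) = w(k) - w(k-1) deviates, relative to
    the mean of the neighbouring differences (D(k-1) + D(k+1)) / 2, by more than threshold."""
    d = [None] + [w[k] - w[k - 1] for k in range(1, len(w))]
    found = []
    # k runs from 2 to len(w) - 3 so that w(k-2) and w(k+2) exist for smoothing.
    for k in range(2, len(w) - 2):
        avg = (d[k - 1] + d[k + 1]) / 2
        if abs(d[k] - avg) / max(abs(avg), eps) > threshold:
            found.append(k)
    return found


def smooth_weights(w, threshold):
    """Replace w(k-1), w(k), w(k+1) around each discontinuity k with a straight line
    from w(k-2) to w(k+2); linear interpolation is an assumed smoothing rule."""
    out = list(w)
    for k in find_discontinuities(w, threshold):
        a, b = w[k - 2], w[k + 2]
        for j in (-1, 0, 1):
            out[k + j] = a + (b - a) * (j + 2) / 4
    return out


# One channel's coefficients across 10 frequency points: a ramp with a jump
# at k = 5, as might appear at a band boundary.
w = [0.1 * k + (0.5 if k >= 5 else 0.0) for k in range(10)]
print(find_discontinuities(w, threshold=1.0))          # [5]
print([round(v, 3) for v in smooth_weights(w, 1.0)])
```

In a full system this would run once per channel over all K analysis frequency points, after the per-band optimization of step S2.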
In one embodiment, the array steering vector matrix is calculated from hypothetical signals arriving from all directions, and the method further includes:
Obtaining, for each frequency band, the frequency range, beam main-lobe angle range, beam attenuation angle range and weighting coefficient norm threshold, and obtaining the expected main-lobe response of the mid band and the main-lobe deviation thresholds of the mid and high bands;
The constraint for the low band is that the norm of the weighting coefficients is less than the low-band norm threshold. The cost function for the low band is the beamformed output power of hypothetical signals from all directions of arrival within the low-band beam attenuation angle range;
The constraints for the mid band are that the norm of the weighting coefficients is less than the mid-band norm threshold, and that the deviation between the mid-band beam output main lobe and the mid-band expected main-lobe response is less than the mid-band main-lobe deviation threshold. The cost function for the mid band is the beamformed output power of hypothetical signals from all directions of arrival within the mid-band beam attenuation angle range;
The constraints for the high band are that the norm of the weighting coefficients is less than the high-band norm threshold, and that the deviation between the high-band beam output main lobe and the high-band expected main-lobe response is less than the high-band main-lobe deviation threshold. The cost function for the high band is the beamformed output power of hypothetical signals from all directions of arrival within the high-band beam attenuation angle range;
Here, the output power is calculated from the array steering vector matrix and the weighting coefficients. The mid-band and high-band beam output main lobes are each obtained, at the corresponding frequency point, as the weighted sum of the array steering vector matrix using the weighting coefficients;
The constraints for the low, mid and high bands each further include that a hypothetical signal arriving from the main-lobe center direction has a beamformed output gain of 1.
As described above, optimizing the weighting coefficients band by band begins with specifying several parameters. For each band, these are the frequency range, beam main-lobe angle range, beam attenuation angle range and weighting coefficient norm threshold. They also include the expected main-lobe response of the mid band and the main-lobe deviation thresholds of the mid and high bands. These parameters in turn determine the specific constraints and cost function of each band.
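For reference, the per-band problem described above can be written as follows for a frequency point f in band b (1 = low, 2 = mid, 3 = high). Several parts of the notation are assumptions for illustration, because the source does not fix them in this section:

- a(f, θ), a column of the array steering vector matrix;
- w(f), the weight vector;
- the 2-norm for the weight norm;
- a sum over discretized angles as the output power;
- a maximum absolute deviation as the main-lobe error.

C_b, D_b, α_b and β_b are the attenuation range, main-lobe range, norm threshold and deviation threshold of band b. y_b is the expected main-lobe response of band b. θ_0 = 90° is the main-lobe center.

```latex
\begin{aligned}
\min_{\mathbf{w}(f)}\quad & \sum_{\theta \in C_b} \bigl|\mathbf{w}^{H}(f)\,\mathbf{a}(f,\theta)\bigr|^{2} \\
\text{s.t.}\quad & \mathbf{w}^{H}(f)\,\mathbf{a}(f,\theta_0) = 1, \qquad \theta_0 = 90^\circ, \\
& \|\mathbf{w}(f)\|_2 \le \alpha_b, \\
& \max_{\theta \in D_b} \Bigl|\,\bigl|\mathbf{w}^{H}(f)\,\mathbf{a}(f,\theta)\bigr| - y_b(\theta)\Bigr| \le \beta_b \qquad (b = 2, 3 \text{ only}).
\end{aligned}
```

The low band (b = 1) keeps only the first two constraints, matching the statement that no main-lobe deviation constraint is applied there.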
For example, suppose the microphone array consists of 8 microphones and the number of analysis frequency points is 512. The procedure is as follows:

1. Select a frequency point to be optimized.
2. Compute the beam outputs for all directions of arrival from the array steering vector matrix at that frequency point and the weighting coefficients being optimized.
3. Evaluate the cost function as defined and minimize it subject to the constraints.

Repeating this for every frequency point yields an 8×512 weighting coefficient matrix. It contains the frequency-domain weighting coefficients of the 8 microphone channels at the 512 frequency points. In beamforming, these coefficients are used to form the weighted sum of the multi-channel time-frequency domain signals.
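The following is a minimal per-frequency sketch of this loop, with deliberate simplifications. It keeps the unit-gain constraint and the output-power cost over the mid-band attenuation range C_2 = [0°, 60°) ∪ (120°, 180°], and it uses that range for every bin. It replaces the norm bound with diagonal loading and drops the main-lobe deviation constraint. With those changes the problem has the closed-form linearly constrained minimum-power solution w = Q⁻¹a₀ / (a₀ᴴQ⁻¹a₀), a swapped-in technique rather than the method described in the text. The following are all assumptions:

- the speed of sound (343 m/s);
- the 16 kHz sampling rate;
- the bin grid;
- the far-field steering convention;
- the microphone positions, which are one reading of the spacings given for Table 2;
- the names `steering`, `solve_bin` and `design_weights`, which are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed)

def steering(positions, freq, angles_deg):
    """Far-field steering vectors a(f, theta) as columns, shape (M, n_angles); 90 deg is broadside."""
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))
    tau = np.outer(positions, np.cos(theta)) / SPEED_OF_SOUND  # per-mic delays in seconds
    return np.exp(-2j * np.pi * freq * tau)

def solve_bin(positions, freq, atten_deg, look_deg=90.0, loading=1e-2):
    """Minimize output power over atten_deg subject to w^H a0 = 1 (diagonal loading replaces the norm bound)."""
    A = steering(positions, freq, atten_deg)
    Q = A @ A.conj().T + loading * np.eye(len(positions))      # power over C plus loading
    a0 = steering(positions, freq, [look_deg])[:, 0]
    q_inv_a0 = np.linalg.solve(Q, a0)
    return q_inv_a0 / (a0.conj() @ q_inv_a0)                   # scales so that w^H a0 = 1

def design_weights(positions, fs=16000.0, n_bins=512, loading=1e-2):
    """Solve every bin independently; returns an (M, n_bins) weight matrix."""
    angles = np.arange(0.0, 181.0)
    atten = angles[(angles < 60.0) | (angles > 120.0)]         # C_2, used for all bins here
    freqs = np.arange(n_bins) * (fs / 2) / n_bins
    return np.stack([solve_bin(positions, f, atten, loading=loading) for f in freqs], axis=1)

positions = np.array([-165, -75, -40, -15, 15, 40, 75, 165]) * 1e-3  # assumed layout in meters
W = design_weights(positions)  # shape (8, 512)
```

A faithful implementation would hand the norm and main-lobe inequality constraints to a constrained convex solver instead of using diagonal loading.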
In one embodiment, different constraints and cost functions are used when solving for the frequency-domain beamforming weighting coefficients in the low, mid and high bands. For example, suppose the frequency ranges of the bands are specified as in Table 1 above, and consider the actual usage scenario of the microphone array. In a conference room, the array is usually placed directly in front of the users' seats, so a user speaking straight toward the array produces a signal with an incidence angle of 90°. A meeting usually involves more than one speaker, however, and the speakers may sit on both sides of a long rectangular table. The raw audio captured by the array therefore usually arrives from directions between 60° and 120°.

For these reasons, 60° to 120° can be taken as the expected beam main-lobe angle range D_2 of the mid band. Because a pronounced beam is difficult to form in the low band, 50° to 130° is taken as the expected beam main-lobe angle range D_1 of the low band. The expected main-lobe angle range D_3 of the high band is given later. In all bands, the angles outside the expected main-lobe angle range form the beam attenuation angle range (C_1, C_2, C_3).
To ensure a degree of robustness, the norm of the weighting coefficients must be constrained during optimization. An ideal beam pattern is hard to obtain in the low band, so a reasonable design is to tighten this constraint as frequency increases. The norm thresholds (α_1, α_2, α_3) of the low, mid and high bands are therefore set to 1.5, 1.2 and 1.0, respectively.
During optimization, the main-lobe deviation threshold bounds the deviation between the optimized beam main-lobe response and the expected main-lobe response. In the mid band, the main-lobe deviation threshold β_2 is set to 0.7. The high-band main-lobe deviation threshold β_3 is given later. Because a pronounced beam is difficult to form in the low band, no such constraint is applied there.
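The per-band settings stated so far can be collected into a small configuration, together with a helper that splits a 1° angle grid into the main-lobe range D_b and its complement C_b. This is a sketch only. The dictionary layout and the names `BAND_PARAMS` and `angle_sets` are hypothetical. D_3 and β_3 are set to `None` here because the text derives D_3 later and gives β_3 only in Table 2. The band frequency ranges (Table 1) are omitted because they are not reproduced in this passage.

```python
import numpy as np

BAND_PARAMS = {
    "low":  {"main_lobe_deg": (50.0, 130.0), "alpha": 1.5, "beta": None},  # no main-lobe deviation constraint
    "mid":  {"main_lobe_deg": (60.0, 120.0), "alpha": 1.2, "beta": 0.7},
    "high": {"main_lobe_deg": None,          "alpha": 1.0, "beta": None},  # D_3 derived later; beta_3 from Table 2
}

def angle_sets(main_lobe_deg, step=1.0):
    """Split [0, 180] degrees into the main-lobe range D_b and the attenuation range C_b (its complement)."""
    angles = np.arange(0.0, 180.0 + step, step)
    lo, hi = main_lobe_deg
    in_main = (angles >= lo) & (angles <= hi)
    return angles[in_main], angles[~in_main]
```

Whether the boundary angles belong to D_b or C_b is not specified in the text. Here they are assigned to the main lobe.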
When optimizing the mid-band weighting coefficients, the shape of the expected beam main-lobe response must be specified. The non-uniform linear microphone array can be laid out in many different ways. To keep the design method general, this embodiment therefore does not give an analytical expression for the expected main-lobe response. Instead, the expected main-lobe response is specified as the beam pattern obtained by uniformly weighting the microphone array at a particular frequency. Here, that frequency is chosen as twice the lower limit f_2L of the mid band.
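The following sketch computes the mid-band expected main-lobe response as the uniformly weighted beam pattern |wᴴa(f, θ)| at f = 2·f_2L, with w = 1/M. The following are hypothetical, because Table 1 is not reproduced in this passage:

- f_2L = 1000 Hz;
- the speed of sound;
- the steering convention;
- the microphone positions.

The function name is also hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed)

def uniform_beam_pattern(positions, freq, angles_deg):
    """Magnitude of the uniformly weighted (w = 1/M) array response over the given angles."""
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))
    tau = np.outer(positions, np.cos(theta)) / SPEED_OF_SOUND
    A = np.exp(-2j * np.pi * freq * tau)          # steering vectors as columns, (M, n_angles)
    w = np.full(len(positions), 1.0 / len(positions))
    return np.abs(w @ A)                          # w is real, so w^H a = w @ a

positions = np.array([-165, -75, -40, -15, 15, 40, 75, 165]) * 1e-3  # assumed layout in meters
angles = np.arange(0, 181)
f_2L = 1000.0                                     # hypothetical lower limit of the mid band
y2 = uniform_beam_pattern(positions, 2 * f_2L, angles)
```

Because the layout is symmetric about the array center, this pattern is symmetric about 90° and equals 1 (0 dB) at broadside.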
In one embodiment, the expected main-lobe response of the high band is obtained as follows:
After the mid-band weighting coefficients have been optimized, the array steering vector matrix is weighted and summed using the optimized coefficients to obtain the beam main-lobe shape. This shape is then used as the expected main-lobe response of the high band.
As described above, when the high-band weighting coefficients are optimized, the signal wavelength becomes smaller than the minimum element spacing of the microphone array. This makes it difficult to specify the form of the expected main-lobe response before optimization. The optimized mid-band beam output main-lobe shape can therefore serve as the expected main-lobe response y_3 for the high-band optimization. Because this optimized main-lobe shape is not known in advance, the high-band main-lobe angle range D_3 can be specified as the angle range corresponding to the -6 dB beamwidth of the optimized mid-band output main lobe.
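The following sketch derives y_3 and D_3 from a set of mid-band weights. It evaluates |wᴴa(f, θ)| over angle and then walks outward from 90° until the pattern drops more than 6 dB below its broadside value. The text does not say at which mid-band frequency the main lobe is evaluated, so the 2000 Hz reference frequency is an assumption. The uniform weights are a placeholder standing in for the optimized mid-band weights. The layout and function names are also hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed)

def beam_pattern(w, positions, freq, angles_deg):
    """|w^H a(f, theta)|: magnitude of the weighted sum of steering vectors over angle."""
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))
    tau = np.outer(positions, np.cos(theta)) / SPEED_OF_SOUND
    A = np.exp(-2j * np.pi * freq * tau)
    return np.abs(w.conj() @ A)

def minus_6db_range(angles_deg, pattern, center_deg=90.0):
    """Contiguous angle range around center_deg where the pattern stays within 6 dB of its value at center."""
    angles = np.asarray(angles_deg, dtype=float)
    c = int(np.argmin(np.abs(angles - center_deg)))
    level_db = 20 * np.log10(np.maximum(pattern, 1e-12) / pattern[c])
    lo = hi = c
    while lo > 0 and level_db[lo - 1] >= -6.0:
        lo -= 1
    while hi < len(angles) - 1 and level_db[hi + 1] >= -6.0:
        hi += 1
    return angles[lo], angles[hi]

positions = np.array([-165, -75, -40, -15, 15, 40, 75, 165]) * 1e-3  # assumed layout in meters
angles = np.arange(0, 181)
w_mid = np.full(8, 1 / 8, dtype=complex)            # placeholder for optimized mid-band weights
y3 = beam_pattern(w_mid, positions, 2000.0, angles)  # expected high-band main-lobe response
D3 = minus_6db_range(angles, y3)                     # (lower, upper) edge of D_3 in degrees
```

Walking outward from the center, rather than thresholding the whole pattern, keeps any sidelobes above -6 dB from being counted as part of the main lobe.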
The above example parameter choices apply to a non-uniform linear microphone array defined by the following spacings: d0 = 30 mm, d1 = 25 mm, d2 = 35 mm and d3 = 90 mm. All parameters are summarized in Table 2 below.
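The spacing list maps to microphone coordinates as follows, under one assumed reading that is not stated explicitly in this passage:

- d0 is the spacing of the central microphone pair;
- d1, d2 and d3 are the successive spacings of the extended microphones moving outward on each side.

This reading gives 8 microphones, which matches the 8-microphone example above. The sketch also checks the layout rule stated later: the farther an extended microphone is from the central pair, the larger its spacing to its inner neighbor. The function names are hypothetical.

```python
import numpy as np

def mic_positions(d0, outer_spacings):
    """Symmetric linear layout: central pair at +/- d0/2, then successive outward spacings on each side."""
    right = [d0 / 2]
    for d in outer_spacings:
        right.append(right[-1] + d)
    right = np.array(right)
    return np.concatenate([-right[::-1], right])

def spacing_rule_holds(outer_spacings):
    """True if each extended microphone's inward spacing grows with its distance from the center."""
    return all(b > a for a, b in zip(outer_spacings, outer_spacings[1:]))

positions_mm = mic_positions(30.0, [25.0, 35.0, 90.0])  # -> [-165, -75, -40, -15, 15, 40, 75, 165]
```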
Table 2 [image: summary of all index parameters]
Figure 6 shows the full-band beam pattern obtained with the optimized weighting coefficients, and Figure 7 shows the full-band beam pattern of a prior-art microphone array. The comparison shows three properties of the beam pattern produced by the method of this embodiment in the main speech band (0.5 kHz to 6.0 kHz):

- its main-lobe width is approximately constant;
- its gain in the 90° direction stays at 0 dB;
- it strongly attenuates signals arriving from 0° to 60° and from 120° to 180°.

The method therefore filters out noise and interference more effectively.
In summary, the audio signal processing method for the microphone array provided in the embodiments of the present application has the following effects:

- Using the norm of the weighting coefficients and the expected main-lobe response as constraints keeps the output within the main-lobe angle range close to the preset main-lobe response and makes the design robust.
- Minimizing the output power within the beam attenuation angle range as the cost function suppresses, as far as possible, signals arriving from outside the main lobe.
- Designing different constraints and cost functions for the low, mid and high bands avoids optimization problems with no feasible solution.
- Placing a transition band covering several frequency points around each discontinuity of the weighting coefficients, and smoothing the coefficients within it, reduces artificial noise in the beam output audio signal.
- Optimizing each band independently and then smoothing in the frequency domain yields a set of multi-channel weighting coefficients. Using them to spatially filter the multi-channel audio captured by the non-uniform linear microphone array improves the signal-to-noise ratio of the resulting target audio signal.
- The multi-channel signals to be processed can be selected in different ways to meet the requirements of subsequent multi-channel audio processing algorithms. Replacing the real channel signals in those algorithms with spatially filtered signals reduces the number of channels and improves signal quality.
Referring to FIG. 8, the present application further provides an audio signal processing apparatus for a microphone array, comprising:
The array steering vector group calculation unit 100 is configured to calculate the corresponding array steering vector group from the structural parameters and signal acquisition channels of the microphone array. The array steering vector group comprises a plurality of array steering vector matrices corresponding to different frequency points;
The weighting coefficient solving unit 200 is configured to select the constraints and cost function corresponding to the frequency band of each frequency point and to solve for the weighting coefficients. The cost function is calculated from the array steering vector matrix and the weighting coefficients. After the frequency-independent optimization is complete, the unit smooths the weighting coefficients in the frequency domain;
The signal extraction unit 300 is configured to extract, from the multi-channel signal captured by the microphone array, the time-domain audio signals of the signal acquisition channels. It then applies a discrete Fourier transform to these signals to obtain the corresponding multi-channel time-frequency domain signals;
The spatial filtering unit 400 is configured to form, at each frequency point, the weighted sum of the multi-channel time-frequency domain signals using the corresponding weighting coefficients. This yields the time-frequency domain beam output signal and completes spatial filtering;
The signal generation module 500 is configured to apply an inverse discrete Fourier transform to the time-frequency domain beam output signal to obtain the target audio signal.
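Units 300 to 500 can be sketched as a short-time DFT, a per-bin weighted sum and an inverse DFT with overlap-add. The choices below are assumptions, not the apparatus's specified framing:

- a periodic square-root Hann window with 50% overlap, so that analysis times synthesis windows sum to one and interior samples are reconstructed exactly;
- the convention y = wᴴx for the weighted sum;
- n_fft = 1024, giving 513 one-sided bins, whereas the text's 512 analysis bins may use a different convention.

The function names are hypothetical.

```python
import numpy as np

def _window(n_fft):
    """Periodic square-root Hann window (analysis x synthesis = Hann, which overlap-adds to 1 at hop n_fft/2)."""
    return np.sqrt(np.hanning(n_fft + 1)[:-1])

def stft(x, n_fft=1024, hop=512):
    """Frame, window and DFT one channel; returns (n_frames, n_fft // 2 + 1)."""
    win = _window(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)

def istft(X, n_fft=1024, hop=512):
    """Inverse DFT, synthesis window and overlap-add."""
    frames = np.fft.irfft(X, n=n_fft, axis=-1) * _window(n_fft)
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    for i, fr in enumerate(frames):
        out[i * hop:i * hop + n_fft] += fr
    return out

def beamform(x_multi, W, n_fft=1024, hop=512):
    """x_multi: (M, T) real signals; W: (M, n_fft // 2 + 1) complex weights. Returns the beam output."""
    X = np.stack([stft(ch, n_fft, hop) for ch in x_multi])  # (M, frames, bins)
    Y = np.einsum("mk,mfk->fk", W.conj(), X)                 # y = w^H x in every bin of every frame
    return istft(Y, n_fft, hop)
```

With identical channels and uniform weights 1/M, the output reproduces the input away from the first and last half-frame. This gives a quick check that the analysis and synthesis stages are consistent.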
In one embodiment, the signal acquisition channels are designated in one of the following ways:
Selecting all channels of the microphone array as the signal acquisition channels;
Selecting a subset of channels of the microphone array, symmetric about the central microphone pair, as the signal acquisition channels;
Selecting a subset of channels of the microphone array in an asymmetric manner as the signal acquisition channels.
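The three channel-selection modes, and the steering vector group built for whichever channels are selected, might look like the following. The sketch makes several assumptions:

- channels are indexed left to right;
- with an even channel count, the central pair is (M/2 - 1, M/2);
- "symmetric" keeps the innermost `n_pairs` pairs, which is one of several symmetric choices;
- the far-field steering convention and the speed of sound are illustrative.

The function names are hypothetical.

```python
import numpy as np

def select_channels(n_mics, mode="all", n_pairs=None, indices=None):
    """Return channel indices; for even n_mics the central pair is (n_mics // 2 - 1, n_mics // 2)."""
    center = n_mics // 2
    if mode == "all":
        return list(range(n_mics))
    if mode == "symmetric":   # innermost n_pairs pairs, symmetric about the central pair
        return list(range(center - n_pairs, center + n_pairs))
    if mode == "asymmetric":  # any user-chosen subset
        return sorted(indices)
    raise ValueError(f"unknown mode: {mode}")

def steering_group(positions, freqs, angles_deg, c=343.0):
    """Array steering vector group: one (n_channels, n_angles) matrix per frequency point."""
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))
    tau = np.outer(positions, np.cos(theta)) / c             # delays in seconds, (M, A)
    f = np.asarray(freqs, dtype=float)[:, None, None]
    return np.exp(-2j * np.pi * f * tau[None, :, :])         # (n_freqs, M, A)

positions = np.array([-165, -75, -40, -15, 15, 40, 75, 165]) * 1e-3  # assumed layout in meters
idx = select_channels(8, "symmetric", n_pairs=2)                     # -> [2, 3, 4, 5]
G = steering_group(positions[idx], np.arange(512) * 15.625, np.arange(181))
```

Only the steering vectors of the selected channels enter the weight design, so each selection mode yields its own weighting coefficient matrix.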
In one embodiment, the frequency bands comprise a low band, a mid band and a high band;
The frequency bands are obtained by dividing the entire processing band according to the signal sampling rate and the number of analysis frequency points. The low, mid and high bands each correspond to different constraints and cost functions.
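Dividing the processing band reduces to converting each analysis bin index to a frequency and comparing it with the band edges. The Table 1 band edges are not reproduced in this passage, so the sketch uses hypothetical values: f_2L = 1000 Hz for the low/mid edge and f_3L = 4000 Hz for the mid/high edge. It also assumes 512 bins spaced evenly from 0 up to (but excluding) fs/2. The function name is hypothetical.

```python
import numpy as np

def band_of_bins(fs, n_bins, f2L, f3L):
    """Map each analysis bin to 'low', 'mid' or 'high' from its center frequency."""
    freqs = np.arange(n_bins) * (fs / 2) / n_bins           # bin spacing fs / (2 * n_bins)
    bands = np.where(freqs < f2L, "low", np.where(freqs < f3L, "mid", "high"))
    return freqs, bands

freqs, bands = band_of_bins(fs=16000.0, n_bins=512, f2L=1000.0, f3L=4000.0)  # hypothetical edges
```

Each bin's label then selects that band's constraints and cost function during the per-bin optimization.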
In one embodiment, the spatial filtering unit 400 is further configured to:
Calculate the first-order difference of each channel's weighting coefficients at each frequency point, and the average of the first-order differences at the adjacent frequency points of each frequency point;
If the relative deviation between the first-order difference at a frequency point and the corresponding first-order difference average exceeds a preset deviation threshold, take that frequency point as a discontinuity point of the weighting coefficients;
Select the discontinuity point and its adjacent frequency points as the interval to be smoothed, smooth the weighting coefficients within that interval, and replace them with the smoothed weighting coefficients.
In one embodiment, the array steering vector matrix is calculated from hypothetical signals arriving from all directions, and the weighting coefficient solving unit 200 is further configured to:
Obtain, for each frequency band, the frequency range, beam main-lobe angle range, beam attenuation angle range and weighting coefficient norm threshold, and obtain the expected main-lobe response of the mid band and the main-lobe deviation thresholds of the mid and high bands;
The constraint for the low band is that the norm of the weighting coefficients is less than the low-band norm threshold. The cost function for the low band is the beamformed output power of hypothetical signals from all directions of arrival within the low-band beam attenuation angle range;
The constraints for the mid band are that the norm of the weighting coefficients is less than the mid-band norm threshold, and that the deviation between the mid-band beam output main lobe and the mid-band expected main-lobe response is less than the mid-band main-lobe deviation threshold. The cost function for the mid band is the beamformed output power of hypothetical signals from all directions of arrival within the mid-band beam attenuation angle range;
The constraints for the high band are that the norm of the weighting coefficients is less than the high-band norm threshold, and that the deviation between the high-band beam output main lobe and the high-band expected main-lobe response is less than the high-band main-lobe deviation threshold. The cost function for the high band is the beamformed output power of hypothetical signals from all directions of arrival within the high-band beam attenuation angle range;
Here, the output power is calculated from the array steering vector matrix and the weighting coefficients. The mid-band beam output main lobe is the weighted sum of the array steering vector matrix using the weighting coefficients optimized within the mid band. The high-band beam output main lobe is the weighted sum using the weighting coefficients optimized within the high band;
The constraints for the low, mid and high bands each further include that a hypothetical signal arriving from the main-lobe center direction has a beamformed output gain of 1.
In one embodiment, the expected main-lobe response of the high band is obtained as follows:
After the mid-band weighting coefficients have been optimized, the array steering vector matrix is weighted and summed using the optimized coefficients to obtain the beam main-lobe shape. This shape is then used as the expected main-lobe response of the high band.
Referring to FIG. 9, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in FIG. 9. The computer device includes a processor, a memory, a network interface and a database connected via a system bus. The processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database, and the internal memory provides an environment for running that operating system and computer program. The database stores data used by the audio signal processing method of the microphone array. The network interface communicates with external terminals over a network connection. When executed by the processor, the computer program implements an audio signal processing method for a microphone array.

The method is applied to a microphone array that includes a central microphone pair and a plurality of extended microphones arranged symmetrically on both sides of the central pair. The central pair and the extended microphones lie on the same straight line, and the spacings between adjacent microphones are unequal: the farther an extended microphone is from the central pair, the larger its spacing from the adjacent microphone on the side closer to the array center. The method includes the following steps:

- calculating the corresponding array steering vector group from the structural parameters and signal acquisition channels of the microphone array, the group comprising a plurality of array steering vector matrices corresponding to different frequency points;
- selecting the constraints and cost function corresponding to the frequency band of each frequency point and solving for the weighting coefficients, where the cost function is calculated from the array steering vector matrix and the weighting coefficients, and then smoothing the coefficients in the frequency domain once the frequency-independent optimization is complete;
- extracting, from the multi-channel signal captured by the microphone array, the time-domain audio signals of the signal acquisition channels, and applying a discrete Fourier transform to them to obtain the corresponding multi-channel time-frequency domain signals;
- forming, at each frequency point, the weighted sum of the multi-channel time-frequency domain signals using the weighting coefficients, which yields the time-frequency domain beam output signal and completes spatial filtering;
- applying an inverse discrete Fourier transform to the time-frequency domain beam output signal to obtain the target audio signal.
In one embodiment, the signal acquisition channels are designated in one of the following ways:
Selecting all channels of the microphone array as the signal acquisition channels;
Selecting a subset of channels of the microphone array, symmetric about the central microphone pair, as the signal acquisition channels;
Selecting a subset of channels of the microphone array in an asymmetric manner as the signal acquisition channels.
In one embodiment, the frequency bands comprise a low band, a mid band and a high band;
The frequency bands are obtained by dividing the entire processing band according to the signal sampling rate and the number of analysis frequency points. The low, mid and high bands each correspond to different constraints and cost functions.
In one embodiment, after the weighting coefficients have been solved, the method further includes:
Calculating the first-order difference of each channel's weighting coefficients at each frequency point, and the average of the first-order differences at the adjacent frequency points of each frequency point;
If the relative deviation between the first-order difference at a frequency point and the corresponding first-order difference average exceeds a preset deviation threshold, taking that frequency point as a discontinuity point of the weighting coefficients;
Selecting the discontinuity point and its adjacent frequency points as the interval to be smoothed, smoothing the weighting coefficients within that interval, and replacing them with the smoothed weighting coefficients.
In one embodiment, the array steering vector matrix is calculated from hypothetical signals arriving from all directions, and the method further includes:
Obtaining, for each frequency band, the frequency range, beam main-lobe angle range, beam attenuation angle range and weighting coefficient norm threshold, and obtaining the expected main-lobe response of the mid band and the main-lobe deviation thresholds of the mid and high bands;
The constraint for the low band is that the norm of the weighting coefficients is less than the low-band norm threshold. The cost function for the low band is the beamformed output power of hypothetical signals from all directions of arrival within the low-band beam attenuation angle range;
The constraints for the mid band are that the norm of the weighting coefficients is less than the mid-band norm threshold, and that the deviation between the mid-band beam output main lobe and the mid-band expected main-lobe response is less than the mid-band main-lobe deviation threshold. The cost function for the mid band is the beamformed output power of hypothetical signals from all directions of arrival within the mid-band beam attenuation angle range;
The constraints for the high band are that the norm of the weighting coefficients is less than the high-band norm threshold, and that the deviation between the high-band beam output main lobe and the high-band expected main-lobe response is less than the high-band main-lobe deviation threshold. The cost function for the high band is the beamformed output power of hypothetical signals from all directions of arrival within the high-band beam attenuation angle range;
Here, the output power is calculated from the array steering vector matrix and the weighting coefficients. The mid-band and high-band beam output main lobes are each obtained, at the corresponding frequency point, as the weighted sum of the array steering vector matrix using the weighting coefficients;
The constraints for the low, mid and high bands each further include that a hypothetical signal arriving from the main-lobe center direction has a beamformed output gain of 1.
In one embodiment, the expected main-lobe response of the high band is obtained as follows:
After the mid-band weighting coefficients have been optimized, the array steering vector matrix is weighted and summed using the optimized coefficients to obtain the beam main-lobe shape. This shape is then used as the expected main-lobe response of the high band.
An embodiment of the present application further provides a computer-readable storage medium, which may be a volatile or a non-volatile storage medium. It stores a computer program that, when executed by a processor, implements an audio signal processing method for a microphone array.

The microphone array comprises a central microphone pair and a plurality of extension microphones arranged symmetrically on both sides of the central microphone pair. The central microphone pair and the extension microphones lie on the same straight line, and adjacent microphones are unequally spaced: the greater the distance between an extension microphone and the central microphone pair, the greater the spacing between that extension microphone and its adjacent microphone on the side nearer the array center.

The method comprises:
- calculating a corresponding array steering vector group according to the structural parameters and signal acquisition channels of the microphone array, the array steering vector group comprising a plurality of array steering vector matrices corresponding to different frequency bins;
- selecting, according to the frequency band to which each frequency bin belongs, the corresponding constraints and cost function, and solving an optimization for the weighting coefficients, the cost function being calculated from the array steering vector matrix and the weighting coefficients;
- extracting, from the multichannel signal captured by the microphone array, the time-domain audio signals corresponding to the signal acquisition channels, and performing a discrete Fourier transform on them to obtain the corresponding multichannel time-frequency-domain signals;
- performing, with the weighting coefficients, a weighted summation of the multichannel time-frequency-domain signals at each corresponding frequency bin to obtain a time-frequency-domain beam output signal, thereby completing spatial filtering;
- performing an inverse discrete Fourier transform on the time-frequency-domain beam output signal to obtain the target audio signal.
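The first two ingredients of this method, the array geometry and the steering vector group, can be sketched as follows:

- `check_layout` tests a coordinate list against the spacing rule in the paragraph above.
- `steering_vector_group` builds one steering matrix per DFT bin for the selected channels.

The patent does not specify the far-field model, the angle convention, the 343 m/s speed of sound or the function names. All of these are assumptions made for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed, the text does not give a value


def check_layout(positions, tol=1e-9):
    """Check a layout against the non-uniform spacing rule.

    Assumes an even number of microphones with coordinates (metres) measured
    from the array centre. Requires mirror symmetry about the centre and
    extension-mic gaps that grow strictly from the central pair outwards.
    """
    x = np.sort(np.asarray(positions, dtype=float))
    m = len(x)
    if m < 2 or m % 2:
        return False
    if not np.allclose(x, -x[::-1], atol=tol):
        return False
    gaps = np.diff(x)          # gaps[m//2 - 1] is the central-pair spacing
    outer = gaps[m // 2:]      # right-hand extension gaps, inner to outer
    return bool(np.all(np.diff(outer) > tol))


def steering_vector_group(positions, channels, fs, n_fft, angles_deg, c=SPEED_OF_SOUND):
    """One steering matrix per one-sided DFT bin k, at f_k = k * fs / n_fft.

    Returns shape (n_fft // 2 + 1, len(channels), len(angles_deg)).
    Assumed model: far field, angles from the array axis (90 deg = broadside),
    a_m(f, theta) = exp(-j*2*pi*f*x_m*cos(theta)/c).
    """
    x = np.asarray(positions, dtype=float)[list(channels)]
    freqs = np.arange(n_fft // 2 + 1) * fs / n_fft
    cos_th = np.cos(np.deg2rad(np.asarray(angles_deg, dtype=float)))
    delays = x[:, None] * cos_th[None, :] / c                 # (C, A) seconds
    return np.exp(-2j * np.pi * freqs[:, None, None] * delays[None, :, :])


# Hypothetical layout (m) and a centred 4-channel selection.
positions = [-0.15, -0.07, -0.02, 0.02, 0.07, 0.15]
layout_ok = check_layout(positions)
group = steering_vector_group(positions, channels=[1, 2, 3, 4], fs=16000,
                              n_fft=512, angles_deg=np.arange(0, 181))
```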
In the audio signal processing method performed above, the benefits of each design choice are as follows:

- Using the weighting-coefficient norm and the desired main-lobe response as constraints keeps the output within the beam main-lobe angle close to the preset main-lobe response, with high robustness.
- Minimizing, as the cost function, the output power within the beam attenuation angle range suppresses signals arriving from outside the beam main lobe as far as possible.
- Designing different constraints and cost functions for the low, mid and high bands avoids optimization problems that have no solution.
- Placing a transition band covering several frequency bins around each discontinuity of the weighting coefficients, and smoothing the coefficients of the bins in that band, reduces artificial noise in the beam output audio signal.
- Optimizing each band independently and then smoothing across frequency yields a set of multichannel weighting coefficients. Spatially filtering the multichannel audio captured by the non-uniform linear microphone array with these coefficients improves the signal-to-noise ratio of the resulting target audio signal.
- The channels to be processed can be selected in different ways, to meet the requirements of subsequent multichannel audio processing algorithms. Replacing real channel signals in those algorithms with spatially filtered signals reduces the number of channels and improves signal quality.
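The spatial filtering chain could look roughly like the sketch below. It takes a DFT, forms a weighted sum in each bin and applies the inverse DFT.

The text only specifies a discrete Fourier transform and its inverse. The following are assumptions, chosen so that overlap-add reconstructs the signal exactly:

- the frame length;
- the 50 % hop;
- the square-root Hann window;
- the conjugate-weight convention y = w^H x.

Names such as `spatial_filter` are hypothetical.

```python
import numpy as np


def _window(n_fft):
    # Periodic sqrt-Hann: analysis x synthesis window = Hann, which
    # overlap-adds to exactly 1 at a hop of n_fft / 2.
    return np.sqrt(np.hanning(n_fft + 1)[:-1])


def stft(x, n_fft, hop):
    """x: (M, T) real multichannel signal -> (M, n_frames, n_fft//2 + 1) spectra."""
    win = _window(n_fft)
    n_frames = 1 + (x.shape[1] - n_fft) // hop
    frames = np.stack([x[:, i * hop:i * hop + n_fft] * win for i in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=-1)


def istft(spec, n_fft, hop, length):
    """spec: (n_frames, n_fft//2 + 1) single-channel spectra -> (length,) signal."""
    win = _window(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=-1) * win
    y = np.zeros(length)
    for i, frame in enumerate(frames):
        y[i * hop:i * hop + n_fft] += frame      # overlap-add
    return y


def spatial_filter(x, weights, n_fft=512, hop=256):
    """Per-bin weighted sum Y(t, k) = sum_m conj(w[k, m]) * X_m(t, k), then inverse DFT.

    weights: (n_fft//2 + 1, M) complex array, one weight vector per frequency bin.
    """
    spec = stft(x, n_fft, hop)                                # (M, T, K)
    beam = np.einsum('km,mtk->tk', np.conj(weights), spec)    # (T, K)
    return istft(beam, n_fft, hop, x.shape[1])
```

A weight matrix that selects a single channel is a convenient sanity check for the framing. With such weights, the interior of the output (away from the first and last half-frame) reproduces that channel sample for sample.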
In one embodiment, the signal acquisition channels are designated in one of the following ways:
selecting all channels of the microphone array as the signal acquisition channels;
selecting, from the microphone array, a subset of channels symmetric about the central microphone pair as the signal acquisition channels;
selecting, from the microphone array, a subset of channels in an asymmetric manner as the signal acquisition channels.
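The three channel-designation modes can be sketched minimally as below. The sketch assumes microphones indexed 0 to n-1 from one end of the array, and an even count, so that the central pair is (n/2 - 1, n/2).

The `pair_ranks` parameter is a hypothetical way to express symmetric selection. Rank 0 is the central pair, and rank r is the r-th extension pair outward.

```python
def select_channels(n_mics, mode, pair_ranks=None, indices=None):
    """Return the (sorted) channel indices used for processing.

    mode "all":        every microphone.
    mode "symmetric":  whole pairs mirrored about the central pair.
    mode "asymmetric": an explicit, user-supplied index list.
    """
    if n_mics % 2:
        raise ValueError("expected an even number of microphones")
    if mode == "all":
        return list(range(n_mics))
    if mode == "symmetric":
        left_c, right_c = n_mics // 2 - 1, n_mics // 2   # central pair
        chosen = set()
        for r in pair_ranks:
            if not 0 <= r < n_mics // 2:
                raise ValueError(f"pair rank {r} out of range")
            chosen.update((left_c - r, right_c + r))
        return sorted(chosen)
    if mode == "asymmetric":
        if any(not 0 <= i < n_mics for i in indices):
            raise ValueError("channel index out of range")
        return sorted(set(indices))
    raise ValueError(f"unknown mode: {mode}")
```

For example, with eight microphones, `select_channels(8, "symmetric", pair_ranks=[0, 3])` keeps the central pair and the outermost pair, giving channels [0, 3, 4, 7].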
In one embodiment, the frequency bands comprise a low band, a mid band and a high band;
wherein the frequency bands are obtained by dividing the entire processing bandwidth according to the signal sampling rate and the number of analysis frequency bins, and the low band, the mid band and the high band correspond to different constraints and cost functions respectively.
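Dividing the processing band by sampling rate and number of analysis bins can be sketched as below. One-sided DFT bin k sits at f_k = k·fs/N, and each bin is labelled by comparing f_k with two band edges.

The edge frequencies used in the example (1 kHz and 4 kHz) and the function names are hypothetical. The text gives no values.

```python
def bin_frequencies(fs, n_fft):
    """Centre frequency (Hz) of each one-sided DFT bin: f_k = k * fs / n_fft."""
    return [k * fs / n_fft for k in range(n_fft // 2 + 1)]


def assign_bands(fs, n_fft, low_edge_hz, high_edge_hz):
    """Label every bin 'low', 'mid' or 'high' using two (hypothetical) band edges."""
    labels = []
    for f in bin_frequencies(fs, n_fft):
        if f < low_edge_hz:
            labels.append("low")
        elif f < high_edge_hz:
            labels.append("mid")
        else:
            labels.append("high")
    return labels


bands = assign_bands(fs=16000, n_fft=512, low_edge_hz=1000.0, high_edge_hz=4000.0)
```

At 16 kHz with 512 points, the bins are 31.25 Hz apart, so this split gives 32 low, 96 mid and 129 high bins.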
In one embodiment, after solving the optimization for the weighting coefficients, the method further comprises:
calculating the first-order difference of each channel's weighting coefficients at each frequency bin, and the average first-order difference over the neighboring bins of each frequency bin;
if the relative deviation between the first-order difference at a frequency bin and the corresponding average first-order difference is greater than a preset deviation threshold, taking that frequency bin as a discontinuity of the weighting coefficients;
selecting the discontinuity and its neighboring frequency bins as an interval to be smoothed, smoothing the weighting coefficients within the interval, and updating the weighting coefficients within the interval to the smoothed weighting coefficients.
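The detection and smoothing steps could be realised as in the sketch below. It makes the following choices, none of which the text specifies:

- the first-order difference is the magnitude of the complex difference between adjacent bins;
- the neighbour average uses up to `n_neighbors` differences on each side;
- smoothing is a linear interpolation across the interval.

These choices, the threshold values and the function names are all assumptions. In particular, the text does not give a smoothing method.

```python
import numpy as np


def find_discontinuities(w, n_neighbors=2, rel_threshold=3.0, eps=1e-12):
    """Bins where one channel's weights jump across frequency.

    w: (K,) complex weights of one channel over K frequency bins.
    d[i] = |w[i+1] - w[i]| is the first-order difference landing on bin i+1.
    A bin is flagged when d deviates from the mean of up to n_neighbors
    differences on each side by more than rel_threshold (relative).
    """
    d = np.abs(np.diff(w))
    flagged = []
    for i in range(len(d)):
        lo, hi = max(0, i - n_neighbors), min(len(d), i + n_neighbors + 1)
        neighbours = np.concatenate([d[lo:i], d[i + 1:hi]])
        if neighbours.size == 0:
            continue
        avg = neighbours.mean()
        if abs(d[i] - avg) / (avg + eps) > rel_threshold:
            flagged.append(i + 1)
    return flagged


def smooth_interval(w, center, half_width=2):
    """Linearly interpolate w over [center - half_width, center + half_width]
    between the two bins just outside that interval (a stand-in smoother)."""
    w = np.array(w, dtype=complex)
    start = max(center - half_width, 1)
    stop = min(center + half_width, len(w) - 2)
    left, right = start - 1, stop + 1
    t = (np.arange(start, stop + 1) - left) / (right - left)
    w[start:stop + 1] = (1 - t) * w[left] + t * w[right]
    return w


def smooth_weights(W, half_width=2, **detect_kwargs):
    """Apply detection and smoothing to every channel of W, shape (K, M)."""
    W = np.array(W, dtype=complex)
    for m in range(W.shape[1]):
        for k in find_discontinuities(W[:, m], **detect_kwargs):
            W[:, m] = smooth_interval(W[:, m], k, half_width)
    return W
```

The relative-deviation threshold is set above 1 so that the bins next to a jump are not flagged. Their own differences are smaller than a neighbour average that the jump has inflated, so their relative deviation stays below 1.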
In one embodiment, the array steering vector matrix is calculated from hypothetical signals from all directions of arrival, and the method further comprises:
obtaining, for each frequency band, a frequency range, a beam main-lobe angle range, a beam attenuation angle range and a weighting-coefficient norm threshold, and obtaining a desired main-lobe response for the mid band and main-lobe deviation thresholds for the mid and high bands;
the constraint for the low band comprises: the norm of the weighting coefficients is less than a low-band weighting-coefficient norm threshold; and the cost function for the low band is: the beamforming output power of hypothetical signals from all directions of arrival within the low-band beam attenuation angle range;
the constraints for the mid band comprise: the norm of the weighting coefficients is less than a mid-band weighting-coefficient norm threshold, and the deviation between the mid-band beam output main lobe and the mid-band desired main-lobe response is less than the mid-band main-lobe deviation threshold; and the cost function for the mid band is: the beamforming output power of hypothetical signals from all directions of arrival within the mid-band beam attenuation angle range;
the constraints for the high band comprise: the norm of the weighting coefficients is less than a high-band weighting-coefficient norm threshold, and the deviation between the high-band beam output main lobe and the high-band desired main-lobe response is less than the high-band main-lobe deviation threshold; and the cost function for the high band is: the beamforming output power of hypothetical signals from all directions of arrival within the high-band beam attenuation angle range;
wherein the output power is calculated from the array steering vector matrix and the weighting coefficients, and the mid-band beam output main lobe and the high-band beam output main lobe are obtained, at the corresponding frequency bin, by weighting and summing the array steering vector matrix with the weighting coefficients;
the constraints for the low band, the mid band and the high band each further comprise: the beamforming output gain for a hypothetical signal from the main-lobe center direction is 1.
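For the low band, the problem above is a norm-constrained minimum-variance problem: minimize the output power over the attenuation range, subject to unit gain at the main-lobe centre and a norm bound.

One standard way to solve it is diagonal loading. It is used here in place of whatever solver the authors use, which the text does not name. The loading level is the multiplier of the norm constraint, and it can be found by bisection because ||w||² decreases as the loading grows.

The sketch covers the low band only. The mid and high bands add the main-lobe deviation constraint, which makes the problem a second-order cone program; that would typically be handed to a convex solver such as CVXPY. The steering model, the 343 m/s speed of sound and all names are assumptions.

```python
import numpy as np


def steering_matrix(positions, freq, angles_deg, c=343.0):
    """Far-field linear-array steering vectors (columns); angles from the array axis."""
    x = np.asarray(positions, dtype=float)[:, None]
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))[None, :]
    return np.exp(-2j * np.pi * freq * x * np.cos(theta) / c)


def loaded_mvdr(Q, a0, lam):
    """Minimiser of w^H (Q + lam*I) w subject to w^H a0 = 1."""
    z = np.linalg.solve(Q + lam * np.eye(len(a0)), a0)
    return z / np.vdot(a0, z)            # np.vdot conjugates a0, so w^H a0 = 1


def low_band_weights(positions, freq, center_deg, atten_deg, norm_sq_max, iters=60):
    """Sketch of the low-band design at one frequency bin:
    minimise sum over attenuation angles of |w^H a(theta)|^2 = w^H Q w
    subject to w^H a(center) = 1 and ||w||^2 <= norm_sq_max.
    """
    a0 = steering_matrix(positions, freq, [center_deg])[:, 0]
    A = steering_matrix(positions, freq, atten_deg)
    Q = A @ A.conj().T                     # sum_theta a(theta) a(theta)^H
    M = len(a0)
    # ||w||^2 approaches 1/M (delay-and-sum, a0/M) only as loading -> infinity.
    if norm_sq_max <= 1.0 / M:
        raise ValueError("norm threshold must exceed 1/M")

    def norm_sq(lam):
        return np.linalg.norm(loaded_mvdr(Q, a0, lam)) ** 2

    lam_lo = 1e-9 * np.trace(Q).real / M   # nearly unloaded solution
    if norm_sq(lam_lo) <= norm_sq_max:
        return loaded_mvdr(Q, a0, lam_lo)
    lam_hi = lam_lo
    while norm_sq(lam_hi) > norm_sq_max:   # bracket the feasible loading level
        lam_hi *= 10.0
    for _ in range(iters):                 # bisection on log(lam)
        lam_mid = np.sqrt(lam_lo * lam_hi)
        if norm_sq(lam_mid) > norm_sq_max:
            lam_lo = lam_mid
        else:
            lam_hi = lam_mid
    return loaded_mvdr(Q, a0, lam_hi)      # feasible side of the bracket
```

Because the output power can only rise as the loading grows, the result never attenuates worse than plain delay-and-sum, which is the infinite-loading limit.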
In one embodiment, the high-band desired main-lobe response is obtained as follows:
After the optimization of the mid-band weighting coefficients is completed, the array steering vector matrix is weighted and summed with the optimized weighting coefficients to obtain a beam main-lobe shape, and this beam main-lobe shape is used as the high-band desired main-lobe response.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods.

Any reference to memory, storage, a database or other media provided in this application and used in the embodiments may include non-volatile and/or volatile memory:
- Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory.
- Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).

Claims (20)

  1. A non-uniform linear microphone array, comprising a central microphone pair and a plurality of extension microphones arranged symmetrically on both sides of the central microphone pair, wherein the central microphone pair and the extension microphones are arranged on the same straight line with unequal spacing between adjacent microphones, and wherein the greater the distance between an extension microphone and the central microphone pair, the greater the spacing between that extension microphone and its adjacent microphone on the side nearer the array center.
  2. An audio signal processing method for a microphone array, applied to the microphone array according to claim 1, wherein the method comprises:
    calculating a corresponding array steering vector group according to structural parameters and signal acquisition channels of the microphone array, wherein the array steering vector group comprises a plurality of array steering vector matrices corresponding to different frequency bins;
    selecting, according to the frequency band to which each frequency bin belongs, corresponding constraints and a corresponding cost function, and solving an optimization for weighting coefficients, wherein the cost function is calculated from the array steering vector matrix and the weighting coefficients;
    extracting, from a multichannel signal captured by the microphone array, time-domain audio signals corresponding to the signal acquisition channels, and performing a discrete Fourier transform on the time-domain audio signals to obtain corresponding multichannel time-frequency-domain signals;
    performing, with the weighting coefficients, a weighted summation of the multichannel time-frequency-domain signals at each corresponding frequency bin to obtain a time-frequency-domain beam output signal, thereby completing spatial filtering; and
    performing an inverse discrete Fourier transform on the time-frequency-domain beam output signal to obtain a target audio signal.
  3. The audio signal processing method for a microphone array according to claim 2, wherein the signal acquisition channels are designated in one of the following ways:
    selecting all channels of the microphone array as the signal acquisition channels;
    selecting, from the microphone array, a subset of channels symmetric about the central microphone pair as the signal acquisition channels; or
    selecting, from the microphone array, a subset of channels in an asymmetric manner as the signal acquisition channels.
  4. The audio signal processing method for a microphone array according to claim 2, wherein the frequency bands comprise a low band, a mid band and a high band;
    wherein the frequency bands are obtained by dividing the entire processing bandwidth according to the signal sampling rate and the number of analysis frequency bins, and the low band, the mid band and the high band correspond to different constraints and cost functions respectively.
  5. The audio signal processing method for a microphone array according to claim 2, wherein after solving the optimization for the weighting coefficients, the method further comprises:
    calculating the first-order difference of each channel's weighting coefficients at each frequency bin, and the average first-order difference over the neighboring bins of each frequency bin;
    if the relative deviation between the first-order difference at a frequency bin and the corresponding average first-order difference is greater than a preset deviation threshold, taking that frequency bin as a discontinuity of the weighting coefficients; and
    selecting the discontinuity and its neighboring frequency bins as an interval to be smoothed, smoothing the weighting coefficients within the interval, and updating the weighting coefficients within the interval to the smoothed weighting coefficients.
  6. The audio signal processing method for a microphone array according to claim 4, wherein the array steering vector matrix is calculated from hypothetical signals from all directions of arrival, and the method further comprises:
    obtaining, for each frequency band, a frequency range, a beam main-lobe angle range, a beam attenuation angle range and a weighting-coefficient norm threshold, and obtaining a desired main-lobe response for the mid band and main-lobe deviation thresholds for the mid and high bands;
    the constraint for the low band comprises: the norm of the weighting coefficients is less than a low-band weighting-coefficient norm threshold; and the cost function for the low band is: the beamforming output power of hypothetical signals from all directions of arrival within the low-band beam attenuation angle range;
    the constraints for the mid band comprise: the norm of the weighting coefficients is less than a mid-band weighting-coefficient norm threshold, and the deviation between the mid-band beam output main lobe and the mid-band desired main-lobe response is less than the mid-band main-lobe deviation threshold; and the cost function for the mid band is: the beamforming output power of hypothetical signals from all directions of arrival within the mid-band beam attenuation angle range;
    the constraints for the high band comprise: the norm of the weighting coefficients is less than a high-band weighting-coefficient norm threshold, and the deviation between the high-band beam output main lobe and the high-band desired main-lobe response is less than the high-band main-lobe deviation threshold; and the cost function for the high band is: the beamforming output power of hypothetical signals from all directions of arrival within the high-band beam attenuation angle range;
    wherein the output power is calculated from the array steering vector matrix and the weighting coefficients, and the mid-band beam output main lobe and the high-band beam output main lobe are obtained, at the corresponding frequency bin, by weighting and summing the array steering vector matrix with the weighting coefficients; and
    the constraints for the low band, the mid band and the high band each further comprise: the beamforming output gain for a hypothetical signal from the main-lobe center direction is 1.
  7. The audio signal processing method for a microphone array according to claim 5, wherein the high-band desired main-lobe response is obtained by:
    after the optimization of the mid-band weighting coefficients is completed, performing a weighted summation of the array steering vector matrix with the optimized weighting coefficients to obtain a beam main-lobe shape, and using the beam main-lobe shape as the high-band desired main-lobe response.
  8. An audio signal processing apparatus for a microphone array, wherein the apparatus comprises:
    an array steering vector group calculation unit, configured to calculate a corresponding array steering vector group according to structural parameters and signal acquisition channels of the microphone array, wherein the array steering vector group comprises a plurality of array steering vector matrices corresponding to different frequency bins;
    a weighting coefficient solving unit, configured to select, according to the frequency band to which each frequency bin belongs, corresponding constraints and a corresponding cost function, and to solve an optimization for weighting coefficients, wherein the cost function is calculated from the array steering vector matrix and the weighting coefficients;
    a signal extraction unit, configured to extract, from a multichannel signal captured by the microphone array, time-domain audio signals corresponding to the signal acquisition channels, and to perform a discrete Fourier transform on the time-domain audio signals to obtain corresponding multichannel time-frequency-domain signals;
    a spatial filtering unit, configured to perform, with the weighting coefficients, a weighted summation of the multichannel time-frequency-domain signals at each corresponding frequency bin to obtain a time-frequency-domain beam output signal, thereby completing spatial filtering; and
    a signal generation module, configured to perform an inverse discrete Fourier transform on the time-frequency-domain beam output signal to obtain a target audio signal.
  9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements an audio signal processing method for a microphone array, applied to the microphone array according to claim 1, wherein the audio signal processing method comprises:
    calculating a corresponding array steering vector group according to structural parameters and signal acquisition channels of the microphone array, wherein the array steering vector group comprises a plurality of array steering vector matrices corresponding to different frequency bins;
    selecting, according to the frequency band to which each frequency bin belongs, corresponding constraints and a corresponding cost function, and solving an optimization for weighting coefficients, wherein the cost function is calculated from the array steering vector matrix and the weighting coefficients;
    extracting, from a multichannel signal captured by the microphone array, time-domain audio signals corresponding to the signal acquisition channels, and performing a discrete Fourier transform on the time-domain audio signals to obtain corresponding multichannel time-frequency-domain signals;
    performing, with the weighting coefficients, a weighted summation of the multichannel time-frequency-domain signals at each corresponding frequency bin to obtain a time-frequency-domain beam output signal, thereby completing spatial filtering; and
    performing an inverse discrete Fourier transform on the time-frequency-domain beam output signal to obtain a target audio signal.
  10. The computer device according to claim 9, wherein the signal acquisition channels are designated in one of the following ways:
    selecting all channels of the microphone array as the signal acquisition channels;
    selecting, from the microphone array, a subset of channels symmetric about the central microphone pair as the signal acquisition channels; or
    selecting, from the microphone array, a subset of channels in an asymmetric manner as the signal acquisition channels.
  11. The computer device according to claim 9, wherein the frequency bands comprise a low band, a mid band and a high band;
    wherein the frequency bands are obtained by dividing the entire processing bandwidth according to the signal sampling rate and the number of analysis frequency bins, and the low band, the mid band and the high band correspond to different constraints and cost functions respectively.
  12. The computer device according to claim 9, wherein after solving the optimization for the weighting coefficients, the method further comprises:
    calculating the first-order difference of each channel's weighting coefficients at each frequency bin, and the average first-order difference over the neighboring bins of each frequency bin;
    if the relative deviation between the first-order difference at a frequency bin and the corresponding average first-order difference is greater than a preset deviation threshold, taking that frequency bin as a discontinuity of the weighting coefficients; and
    selecting the discontinuity and its neighboring frequency bins as an interval to be smoothed, smoothing the weighting coefficients within the interval, and updating the weighting coefficients within the interval to the smoothed weighting coefficients.
  13. The computer device according to claim 11, wherein the array steering vector matrix is calculated from hypothetical signals from all directions of arrival, and the method further comprises:
    obtaining, for each frequency band, a frequency range, a beam main-lobe angle range, a beam attenuation angle range and a weighting-coefficient norm threshold, and obtaining a desired main-lobe response for the mid band and main-lobe deviation thresholds for the mid and high bands;
    the constraint for the low band comprises: the norm of the weighting coefficients is less than a low-band weighting-coefficient norm threshold; and the cost function for the low band is: the beamforming output power of hypothetical signals from all directions of arrival within the low-band beam attenuation angle range;
    the constraints for the mid band comprise: the norm of the weighting coefficients is less than a mid-band weighting-coefficient norm threshold, and the deviation between the mid-band beam output main lobe and the mid-band desired main-lobe response is less than the mid-band main-lobe deviation threshold; and the cost function for the mid band is: the beamforming output power of hypothetical signals from all directions of arrival within the mid-band beam attenuation angle range;
    the constraints for the high band comprise: the norm of the weighting coefficients is less than a high-band weighting-coefficient norm threshold, and the deviation between the high-band beam output main lobe and the high-band desired main-lobe response is less than the high-band main-lobe deviation threshold; and the cost function for the high band is: the beamforming output power of hypothetical signals from all directions of arrival within the high-band beam attenuation angle range;
    wherein the output power is calculated from the array steering vector matrix and the weighting coefficients, and the mid-band beam output main lobe and the high-band beam output main lobe are obtained, at the corresponding frequency bin, by weighting and summing the array steering vector matrix with the weighting coefficients; and
    the constraints for the low band, the mid band and the high band each further comprise: the beamforming output gain for a hypothetical signal from the main-lobe center direction is 1.
  14. The computer device according to claim 12, wherein the high-band desired main-lobe response is obtained by:
    after the optimization of the mid-band weighting coefficients is completed, performing a weighted summation of the array steering vector matrix with the optimized weighting coefficients to obtain a beam main-lobe shape, and using the beam main-lobe shape as the high-band desired main-lobe response.
  15. A computer-readable storage medium storing a computer program, characterized in that, when executed by a processor, the computer program implements an audio signal processing method for a microphone array, applied to the microphone array according to claim 1; wherein the audio signal processing method for the microphone array comprises:
    calculating a corresponding array steering vector group according to the structural parameters of the microphone array and the signal acquisition channels, wherein the array steering vector group comprises a plurality of array steering vector matrices corresponding to different frequency bins;
    selecting, according to the frequency band to which each frequency bin belongs, the corresponding constraints and cost function, and solving an optimization for the weighting coefficients, wherein the cost function is calculated from the array steering vector matrix and the weighting coefficients;
    extracting, from the multi-channel signals collected by the microphone array, the time-domain audio signals corresponding to the signal acquisition channels, and applying a discrete Fourier transform to these time-domain audio signals to obtain the corresponding multi-channel time-frequency domain signals;
    weighting and summing the multi-channel time-frequency domain signals at each frequency bin with the corresponding weighting coefficients to obtain a time-frequency domain beam output signal, thereby completing spatial filtering;
    applying an inverse discrete Fourier transform to the time-frequency domain beam output signal to obtain a target audio signal.
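The last three steps of claim 15 (DFT, per-bin weighted sum, inverse DFT) can be sketched for a single frame of each selected channel. This is an illustration under simplifying assumptions: a rectangular frame, a direct O(N²) DFT instead of an FFT, and weights supplied only for bins 0 to N/2 and mirrored by conjugate symmetry so the output is real. A practical implementation would typically use windowed, overlapping STFT frames with overlap-add and an FFT such as `numpy.fft.rfft`/`irfft`.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) discrete Fourier transform (adequate for a short illustration)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum):
    """Inverse DFT matching dft() above, with 1/N scaling on the inverse."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def full_band_weights(half_weights, n):
    """Expand weights for bins 0..n//2 to all n bins by conjugate symmetry, so the
    beamformed spectrum is Hermitian and its inverse DFT is real."""
    full = list(half_weights[: n // 2 + 1])
    for k in range(n // 2 + 1, n):
        full.append([w.conjugate() for w in half_weights[n - k]])
    return full


def beamform_frame(frames, half_weights):
    """frames: one time-domain frame per selected channel, all of length n.
    half_weights[k][m]: complex weight of channel m at bin k, for k = 0..n//2."""
    n = len(frames[0])
    weights = full_band_weights(half_weights, n)
    spectra = [dft(frame) for frame in frames]             # DFT of each channel
    beam = [sum(weights[k][m].conjugate() * spectra[m][k]   # w^H x at bin k
                for m in range(len(frames)))
            for k in range(n)]
    return [sample.real for sample in idft(beam)]          # back to the time domain


# Two channels carrying the same signal, weighted 0.5 each at every bin.
x = [math.sin(2 * math.pi * 3 * t / 16) for t in range(16)]
y = beamform_frame([x, x], [[0.5 + 0j, 0.5 + 0j] for _ in range(9)])
```

With identical channel signals (a broadside arrival) and weights of 0.5 per channel, the weighted sum has unit gain, so the beam output reproduces the input frame.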
  16. The computer-readable storage medium according to claim 15, wherein the signal acquisition channels are designated in one of the following ways:
    all channels of the microphone array are designated as the signal acquisition channels;
    a subset of channels of the microphone array, selected symmetrically about the center microphone pair, is designated as the signal acquisition channels;
    a subset of channels of the microphone array, selected asymmetrically, is designated as the signal acquisition channels.
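The three designation modes in claim 16 can be sketched as an index-selection helper. It assumes, hypothetically (the array geometry is defined in claim 1, which is not reproduced here), that microphones are indexed 0 to M-1 along the array with M even, so the center microphone pair is (M/2 - 1, M/2). The function and parameter names are illustrative only.

```python
def select_channels(num_mics, mode, offsets=None, indices=None):
    """Return the channel indices used as signal acquisition channels.

    Assumes microphones are indexed 0..num_mics-1 along the array, with an even
    count so that the center pair is (num_mics//2 - 1, num_mics//2).
      mode "all":        every channel.
      mode "symmetric":  for each offset o >= 0, the pair (left - o, right + o).
      mode "asymmetric": an explicit, caller-chosen list of channel indices.
    """
    if mode == "all":
        return list(range(num_mics))
    if mode == "symmetric":
        if num_mics % 2:
            raise ValueError("symmetric selection here assumes an even number of mics")
        left, right = num_mics // 2 - 1, num_mics // 2
        chosen = set()
        for o in offsets or [0]:
            if not 0 <= o <= left:
                raise ValueError(f"offset {o} falls outside the array")
            chosen.update((left - o, right + o))
        return sorted(chosen)
    if mode == "asymmetric":
        if not indices or any(not 0 <= i < num_mics for i in indices):
            raise ValueError("asymmetric selection needs valid channel indices")
        return sorted(set(indices))
    raise ValueError(f"unknown mode: {mode}")
```

For example, with eight microphones, offsets [0, 3] select the center pair and the outermost pair, giving channels [0, 3, 4, 7].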
  17. The computer-readable storage medium according to claim 15, wherein the frequency bands include a low frequency band, a middle frequency band, and a high frequency band;
    wherein the frequency bands are obtained by dividing the entire processing band according to the signal sampling rate and the number of analysis frequency bins; the low, middle, and high frequency bands each correspond to different constraints and cost functions.
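Claim 17 divides the processing band using the sampling rate and the number of analysis bins. A minimal sketch, assuming a real-input DFT of size N (so bin k sits at k·fs/N for k = 0..N/2) and two hypothetical cut-off frequencies, since the claims do not give the band edges:

```python
def bin_frequencies(sample_rate_hz, n_fft):
    """Center frequency of each non-negative DFT bin: f_k = k * fs / N."""
    return [k * sample_rate_hz / n_fft for k in range(n_fft // 2 + 1)]


def assign_bands(sample_rate_hz, n_fft, low_upper_hz, mid_upper_hz):
    """Group bin indices into low / mid / high bands using two cut-off frequencies."""
    bands = {"low": [], "mid": [], "high": []}
    for k, f in enumerate(bin_frequencies(sample_rate_hz, n_fft)):
        if f < low_upper_hz:
            bands["low"].append(k)
        elif f < mid_upper_hz:
            bands["mid"].append(k)
        else:
            bands["high"].append(k)
    return bands


# Hypothetical settings: 16 kHz sampling, 512-point analysis, edges at 1 kHz and 4 kHz.
bands = assign_bands(16000, 512, low_upper_hz=1000.0, mid_upper_hz=4000.0)
```

With fs = 16 kHz and N = 512 the bin spacing is 31.25 Hz, so 1 kHz falls exactly on bin 32 and 4 kHz on bin 128.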
  18. The computer-readable storage medium according to claim 15, wherein, after the optimization of the weighting coefficients, the method further comprises:
    calculating the first-order difference of each channel's weighting coefficients at each frequency bin, and the average of the first-order differences at the frequency bins adjacent to each frequency bin;
    if, at a given frequency bin, the relative deviation between the first-order difference and the corresponding average first-order difference exceeds a preset deviation threshold, treating that frequency bin as a discontinuity of the weighting coefficients;
    selecting the discontinuity and its adjacent frequency bins as an interval to be smoothed, smoothing the weighting coefficients within that interval, and replacing the weighting coefficients in the interval with the smoothed weighting coefficients.
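The discontinuity check in claim 18 can be sketched per channel as follows. Several details are assumptions, not stated in the claim: the first-order difference is taken as the magnitude of the complex difference between consecutive bins, the "adjacent bins" are those within a radius of 2 (excluding the bin itself), the threshold value is arbitrary, and smoothing replaces the interval by linear interpolation between the nearest bins outside it.

```python
def find_discontinuities(weights, neighbor_radius=2, threshold=2.0, eps=1e-12):
    """Return bins whose first-order difference deviates from the mean of the
    neighboring bins' differences by more than `threshold` (relative)."""
    # diffs[k] = |w[k] - w[k-1]| for k >= 1; diffs[0] is an unused placeholder.
    diffs = [0.0] + [abs(weights[k] - weights[k - 1]) for k in range(1, len(weights))]
    marks = []
    for k in range(1, len(weights)):
        lo, hi = max(1, k - neighbor_radius), min(len(weights) - 1, k + neighbor_radius)
        neighbors = [diffs[j] for j in range(lo, hi + 1) if j != k]
        if not neighbors:
            continue
        mean = sum(neighbors) / len(neighbors)
        if abs(diffs[k] - mean) / max(mean, eps) > threshold:
            marks.append(k)
    return marks


def smooth_interval(weights, center, half_width=1):
    """Replace the bins around `center` (clipped to the array) by a straight line
    between the nearest untouched bins on either side (one simple smoothing choice)."""
    lo = max(1, center - half_width)
    hi = min(len(weights) - 2, center + half_width)
    left, right = weights[lo - 1], weights[hi + 1]
    span = hi - lo + 2
    out = list(weights)
    for k in range(lo, hi + 1):
        t = (k - lo + 1) / span
        out[k] = (1 - t) * left + t * right
    return out


def repair_channel(weights, neighbor_radius=2, threshold=2.0, half_width=1):
    """Detect discontinuities on the original weights, then smooth each interval."""
    out = list(weights)
    for k in find_discontinuities(weights, neighbor_radius, threshold):
        out = smooth_interval(out, k, half_width)
    return out


# Synthetic example: a linear ramp across 16 bins with a spike at bin 8.
w = [complex(0.1 * k) for k in range(16)]
w[8] += 1.0
fixed = repair_channel(w)
```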
  19. The computer-readable storage medium according to claim 17, wherein the array steering vector matrix is calculated from hypothetical signals from all directions of arrival, and the method further comprises:
    obtaining the frequency range, beam main lobe angle range, beam attenuation angle range, and weighting coefficient norm threshold of each frequency band, and obtaining the expected main lobe response of the middle frequency band and the main lobe deviation thresholds of the middle and high frequency bands;
    the constraint for the low frequency band includes: the norm of the weighting coefficients is less than the low-band weighting coefficient norm threshold; the cost function for the low frequency band is the beamforming output power of hypothetical signals from all directions of arrival within the low-band beam attenuation angle range;
    the constraints for the middle frequency band include: the norm of the weighting coefficients is less than the mid-band weighting coefficient norm threshold, and the deviation between the mid-band beam output main lobe and the mid-band expected main lobe response is less than the mid-band main lobe deviation threshold; the cost function for the middle frequency band is the beamforming output power of hypothetical signals from all directions of arrival within the mid-band beam attenuation angle range;
    the constraints for the high frequency band include: the norm of the weighting coefficients is less than the high-band weighting coefficient norm threshold, and the deviation between the high-band beam output main lobe and the high-band expected main lobe response is less than the high-band main lobe deviation threshold; the cost function for the high frequency band is the beamforming output power of hypothetical signals from all directions of arrival within the high-band beam attenuation angle range;
    wherein the output power is calculated from the array steering vector matrix and the weighting coefficients; the mid-band beam output main lobe and the high-band beam output main lobe are obtained, at the corresponding frequency bin, by weighting and summing the array steering vector matrix with the weighting coefficients;
    the constraints for the low, middle, and high frequency bands each further include: the beamforming output gain for a hypothetical signal arriving from the main lobe center direction is 1.
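One way to approximate the claim 19 problem for a single bin is sketched below. It swaps in a standard technique rather than the patent's own solver: the norm constraint is handled by diagonal loading (adding μI to the attenuation-region covariance R and doubling μ until the weight norm falls below the threshold), which together with the unit-gain constraint gives the closed form w = (R + μI)⁻¹a0 / (a0ᴴ(R + μI)⁻¹a0). The main lobe deviation constraint of the middle and high bands is omitted; enforcing it would normally require a convex solver (for example, second-order cone programming). The linear geometry, speed of sound, angle grids, and threshold are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed


def steering_vector(freq_hz, theta_deg, positions_m):
    """Far-field steering vector for an assumed linear microphone layout."""
    theta = math.radians(theta_deg)
    return [cmath.exp(-2j * math.pi * freq_hz * p * math.cos(theta) / SPEED_OF_SOUND)
            for p in positions_m]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs for complex entries (Gaussian elimination, partial pivoting)."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def attenuation_power(weights, freq_hz, positions_m, atten_deg):
    """Cost: sum over attenuation directions of |w^H a(f, theta)|^2."""
    total = 0.0
    for th in atten_deg:
        a = steering_vector(freq_hz, th, positions_m)
        total += abs(sum(w.conjugate() * x for w, x in zip(weights, a))) ** 2
    return total


def loaded_weights(freq_hz, positions_m, center_deg, atten_deg, loading):
    """Minimize w^H (R + loading*I) w subject to w^H a0 = 1, R = sum of a a^H."""
    n = len(positions_m)
    r = [[complex(loading) if i == j else 0j for j in range(n)] for i in range(n)]
    for th in atten_deg:
        a = steering_vector(freq_hz, th, positions_m)
        for i in range(n):
            for j in range(n):
                r[i][j] += a[i] * a[j].conjugate()
    a0 = steering_vector(freq_hz, center_deg, positions_m)
    r_inv_a0 = solve(r, a0)
    denom = sum(x.conjugate() * y for x, y in zip(a0, r_inv_a0))  # a0^H R^-1 a0
    return [v / denom for v in r_inv_a0]


def band_weights(freq_hz, positions_m, center_deg, atten_deg, norm_max,
                 loading=1e-4, max_iter=60):
    """Double the loading until ||w|| drops below norm_max (the claimed norm limit)."""
    for _ in range(max_iter):
        w = loaded_weights(freq_hz, positions_m, center_deg, atten_deg, loading)
        if math.sqrt(sum(abs(x) ** 2 for x in w)) < norm_max:
            return w
        loading *= 2.0
    raise ValueError("norm_max unreachable; it must exceed 1/sqrt(number of mics)")


positions = [-0.045, -0.015, 0.015, 0.045]  # hypothetical geometry
atten = [float(t) for t in range(0, 61, 5)] + [float(t) for t in range(120, 181, 5)]
w = band_weights(2000.0, positions, 90.0, atten, norm_max=1.0)
```

Because delay-and-sum weights (a0/M) have the smallest norm among all weights with unit gain toward θ0, the loaded solution never has more attenuation-region power than delay-and-sum, which makes a convenient sanity check.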
  20. The computer-readable storage medium according to claim 18, wherein the high-band expected main lobe response is obtained as follows:
    after the mid-band weighting coefficients have been optimized, the array steering vector matrix is weighted and summed with the optimized weighting coefficients to obtain the beam main lobe shape, and this beam main lobe shape is used as the high-band expected main lobe response.
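Claim 20 repeats the rule of claim 14. The sketch below shows how a high-band bin could be checked against that expected response, assuming the deviation is measured as the largest magnitude gap over the main lobe angles (the claims do not specify the measure). Presumably the point of borrowing the mid-band shape is to keep the main lobe width more consistent across frequency, since the main lobe of a fixed-aperture array narrows as frequency rises. The usage lines illustrate this with placeholder delay-and-sum weights at a hypothetical 1 kHz and 4 kHz.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed


def steering_vector(freq_hz, theta_deg, positions_m):
    """Far-field steering vector for an assumed linear microphone layout."""
    theta = math.radians(theta_deg)
    return [cmath.exp(-2j * math.pi * freq_hz * p * math.cos(theta) / SPEED_OF_SOUND)
            for p in positions_m]


def main_lobe_shape(weights, freq_hz, positions_m, angles_deg):
    """|w^H a(f, theta)| at each main lobe angle."""
    shape = []
    for th in angles_deg:
        a = steering_vector(freq_hz, th, positions_m)
        shape.append(abs(sum(w.conjugate() * x for w, x in zip(weights, a))))
    return shape


def main_lobe_deviation(weights, freq_hz, expected_shape, positions_m, angles_deg):
    """Largest absolute gap between this bin's main lobe and the expected response."""
    actual = main_lobe_shape(weights, freq_hz, positions_m, angles_deg)
    return max(abs(a - e) for a, e in zip(actual, expected_shape))


def delay_and_sum(freq_hz, center_deg, positions_m):
    """Placeholder (non-optimized) weights with unit gain toward center_deg."""
    a0 = steering_vector(freq_hz, center_deg, positions_m)
    return [x / len(positions_m) for x in a0]


positions = [-0.045, -0.015, 0.015, 0.045]  # hypothetical geometry
angles = [70.0, 80.0, 90.0, 100.0, 110.0]   # hypothetical main lobe range
expected = main_lobe_shape(delay_and_sum(1000.0, 90.0, positions), 1000.0, positions, angles)
dev_mid = main_lobe_deviation(delay_and_sum(1000.0, 90.0, positions), 1000.0,
                              expected, positions, angles)
dev_high = main_lobe_deviation(delay_and_sum(4000.0, 90.0, positions), 4000.0,
                               expected, positions, angles)
```

Comparing `dev_high` with the high-band main lobe deviation threshold is the kind of test the high-band constraint imposes; an optimizer would adjust the high-band weights until the gap falls below it.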
PCT/CN2022/125739 2021-10-21 2022-10-17 Microphone array and signal processing method and apparatus therefor, and device and medium WO2023066213A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111228311.XA CN116017230A (en) 2021-10-21 2021-10-21 Microphone array, signal processing method, device, equipment and medium thereof
CN202111228311.X 2021-10-21

Publications (1)

Publication Number Publication Date
WO2023066213A1 true WO2023066213A1 (en) 2023-04-27

Family

ID=86032244

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/125739 WO2023066213A1 (en) 2021-10-21 2022-10-17 Microphone array and signal processing method and apparatus therefor, and device and medium

Country Status (2)

Country Link
CN (1) CN116017230A (en)
WO (1) WO2023066213A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011015050A (en) * 2009-06-30 2011-01-20 Nittobo Acoustic Engineering Co Ltd Array for beam forming, and sound source exploration/measurement system using the same
CN104360355A (en) * 2014-12-05 2015-02-18 北京北斗星通导航技术股份有限公司 Anti-interference method and device
CN104811867A (en) * 2015-04-29 2015-07-29 西安电子科技大学 Spatial filtering method for microphone array based on virtual array extension
CN111954121A (en) * 2020-08-21 2020-11-17 云知声智能科技股份有限公司 Microphone array directional pickup method and system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116962948A (en) * 2023-07-31 2023-10-27 南京航空航天大学 Non-uniform linear sparse microphone array design method
CN116962948B (en) * 2023-07-31 2024-06-11 南京航空航天大学 Non-uniform linear sparse microphone array design method

Also Published As

Publication number Publication date
CN116017230A (en) 2023-04-25

Similar Documents

Publication Publication Date Title
JP7011075B2 (en) Target voice acquisition method and device based on microphone array
US7336793B2 (en) Loudspeaker system for virtual sound synthesis
Coleman et al. Personal audio with a planar bright zone
US9143856B2 (en) Apparatus and method for spatially selective sound acquisition by acoustic triangulation
CN110557710B (en) Low complexity multi-channel intelligent loudspeaker with voice control
CN107018470B (en) A kind of voice recording method and system based on annular microphone array
WO2016074495A1 (en) Signal processing method and device
US20120314885A1 (en) Signal processing using spatial filter
CN110648678A (en) Scene identification method and system for conference with multiple microphones
JP2013543987A (en) System, method, apparatus and computer readable medium for far-field multi-source tracking and separation
Molés-Cases et al. Weighted pressure matching with windowed targets for personal sound zones
EP3275208A1 (en) Sub-band mixing of multiple microphones
Betlehem et al. Two dimensional sound field reproduction using higher order sources to exploit room reflections
WO2023066213A1 (en) Microphone array and signal processing method and apparatus therefor, and device and medium
CN113766396B (en) Speaker control
CN103916730B (en) A kind of sound field focusing method and system that can improve tonequality
TWI465121B (en) System and method for utilizing omni-directional microphones for speech enhancement
Saruwatari et al. Musical noise controllable algorithm of channelwise spectral subtraction and adaptive beamforming based on higher order statistics
Pepe et al. Deep learning for individual listening zone
Priyanka et al. Adaptive Beamforming Using Zelinski-TSNR Multichannel Postfilter for Speech Enhancement
Kovalyov et al. Dfsnet: A steerable neural beamformer invariant to microphone array configuration for real-time, low-latency speech enhancement
Yu et al. Speech enhancement based on the generalized sidelobe cancellation and spectral subtraction for a microphone array
CN114724574A (en) Double-microphone noise reduction method with adjustable expected sound source direction
Liu et al. A new neural beamformer for multi-channel speech separation
Yang et al. Binaural Angular Separation Network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22882819

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE