WO2020037555A1 - Method, device, apparatus, and system for evaluating microphone array consistency - Google Patents

Method, device, apparatus, and system for evaluating microphone array consistency

Info

Publication number
WO2020037555A1
WO2020037555A1 PCT/CN2018/101766 CN2018101766W WO2020037555A1 WO 2020037555 A1 WO2020037555 A1 WO 2020037555A1 CN 2018101766 W CN2018101766 W CN 2018101766W WO 2020037555 A1 WO2020037555 A1 WO 2020037555A1
Authority
WO
WIPO (PCT)
Prior art keywords
microphone
microphones
signal
reference microphone
difference
Prior art date
Application number
PCT/CN2018/101766
Other languages
French (fr)
Chinese (zh)
Inventor
李国梁
罗朝洪
程树青
Original Assignee
深圳市汇顶科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市汇顶科技股份有限公司
Priority to CN201880001199.6A (CN109313909B)
Priority to PCT/CN2018/101766 (WO2020037555A1)
Priority to CN202310466643.4A (CN116437280A)
Publication of WO2020037555A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R29/00: Monitoring arrangements; Testing arrangements
    • H04R29/001: Monitoring arrangements; Testing arrangements for loudspeakers

Definitions

  • the present application relates to the field of voice communication and intelligent voice interaction, and more particularly, to a method, a device, an apparatus, and a system for evaluating the consistency of a microphone array.
  • speech enhancement technology can improve people's hearing experience and improve the intelligibility of speech communication.
  • in intelligent voice interaction applications, speech enhancement technology can improve the accuracy of speech recognition and enhance the user experience. Therefore, speech enhancement technology is vital in both traditional voice communication and voice interaction.
  • speech enhancement technology is divided into single-channel speech enhancement technology and multi-channel speech enhancement technology.
  • single-channel speech enhancement technology can eliminate steady-state noise but cannot eliminate non-steady-state noise, and it improves the signal-to-noise ratio at the expense of speech damage: the more the signal-to-noise ratio is increased, the greater the speech damage. Multi-channel speech enhancement technology uses a microphone array to collect multiple signals, and uses phase information and coherence information between the multiple microphone signals to eliminate noise, which can eliminate non-steady-state noise while causing little speech damage.
  • the consistency between different microphones in the microphone array directly affects the performance of the algorithm.
  • the existing scheme proposes an improved algorithm for the multi-channel enhancement technology, which increases the robustness of the algorithm and at the same time reduces the requirement on microphone consistency. However, when the consistency between microphones is very low, the algorithm performance will still be affected, which will affect the user experience.
  • the present application provides a method, device, apparatus, and system for evaluating the consistency of a microphone array, which can evaluate the consistency between different microphones in the microphone array, thereby guiding the calibration of the microphone array and evaluating the robustness of the multi-channel enhancement algorithm based on the consistency evaluation result, which improves the user experience.
  • a method for assessing the consistency of a microphone array is provided, including:
  • obtaining N audio signals respectively collected by N microphones, where the N microphones form a microphone array, and N ≥ 2;
  • determining, according to the N audio signals, a phase spectrum difference and/or a power spectrum difference between each of the N microphones except the reference microphone and the reference microphone, where the reference microphone is any one of the N microphones;
  • performing a consistency evaluation on the N microphones according to the phase spectrum difference and/or the power spectrum difference between each of the N microphones except the reference microphone and the reference microphone.
  • the consistency evaluation of the N microphones can be used to guide the microphone distribution in the microphone array, or to redesign the microphone distribution in the microphone array, or to redesign the microphone array, or to evaluate the robustness of a multi-channel enhancement algorithm.
  • the distribution of microphone 1 or microphone 2 in the microphone array can be guided, or the microphone 1 or microphone 2 can be redesigned.
  • the distribution of the microphone 1 in the microphone array can be guided, or the microphone 1 can be redesigned, or the microphone array can be redesigned.
  • a phase spectrum difference and/or a power spectrum difference between each microphone and a reference microphone is determined according to the N audio signals respectively collected by the N microphones, so as to perform a consistency evaluation on the N microphones, eliminate the impact of consistency between microphones on multi-channel speech enhancement algorithms, and improve the user experience.
  • performing a consistency evaluation on the N microphones according to a phase spectrum difference value between each of the N microphones except the reference microphone and the reference microphone includes:
  • according to the phase spectrum difference between each of the N microphones except the reference microphone and the reference microphone, the phase consistency between the corresponding microphone and the reference microphone is evaluated.
  • for example, if the phase spectrum difference between microphone 1 and the reference microphone is A, then the smaller A is, the better the phase consistency between microphone 1 and the reference microphone is.
  • a threshold can be set. If the phase spectrum difference between two microphones is smaller than this threshold, the phase consistency between the two microphones meets the design requirements, and the effect of the consistency between the two microphones on the multi-channel speech enhancement algorithm can be ignored, or the consistency between the two microphones has no effect on the multi-channel speech enhancement algorithm.
  • thresholds can be flexibly configured according to different multi-channel speech enhancement algorithms.
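The threshold test can be sketched as follows. This is a minimal illustration, not the application's procedure: the function name `meets_threshold`, the choice of the maximum absolute difference as the compared quantity, and the threshold value are all assumptions made for the example.

```python
import numpy as np

def meets_threshold(spectrum_diff, threshold):
    """Return True if the largest absolute difference stays below the threshold.

    Assumption: the worst-case (maximum absolute) difference over all
    frequencies is the quantity compared with the threshold.
    """
    spectrum_diff = np.asarray(spectrum_diff, dtype=float)
    return bool(np.max(np.abs(spectrum_diff)) < threshold)

# Hypothetical phase spectrum differences (radians) relative to the reference microphone
good_mic = [0.01, -0.02, 0.015]
bad_mic = [0.01, 0.30, -0.02]
print(meets_threshold(good_mic, threshold=0.1))  # True
print(meets_threshold(bad_mic, threshold=0.1))   # False
```

A different multi-channel speech enhancement algorithm would simply be evaluated with a different `threshold` value.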
  • the method further includes:
  • the phase spectrum difference values between each of the N microphones except the reference microphone and the reference microphone are calibrated respectively.
  • for example, if the fixed phase difference between microphone 1 and the reference microphone is A
  • and the measured phase spectrum difference between microphone 1 and the reference microphone is B
  • then the calibrated phase spectrum difference between microphone 1 and the reference microphone is C
  • where $C = B - A$.
  • the calculating a fixed phase difference between each of the N microphones except the reference microphone and the reference microphone according to the measured distance includes:
  • $Y_i(\omega)$ represents the frequency spectrum of the i-th microphone
  • $Y_1(\omega)$ represents the frequency spectrum of the reference microphone
  • $\omega$ represents the frequency
  • $d_i$ represents the difference between the distances from the i-th microphone and from the reference microphone to the sound source
  • $c$ denotes the speed of sound
  • $2\pi\omega d_i / c$ represents a fixed phase difference between the i-th microphone and the reference microphone.
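The calibration step can be sketched as follows, under stated assumptions: the frequency is given in Hz, the fixed phase difference is taken as 2π·f·d/c, and the calibrated value is the measured phase spectrum difference minus that fixed term. The function names, the speed-of-sound constant, and the sign convention are illustrative; phase wrapping is not handled.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 °C

def fixed_phase_difference(freq_hz, path_diff_m, c=SPEED_OF_SOUND):
    """Phase offset (radians) caused purely by the path-length difference."""
    return 2.0 * np.pi * np.asarray(freq_hz, dtype=float) * path_diff_m / c

def calibrate(measured_phase_diff, freq_hz, path_diff_m, c=SPEED_OF_SOUND):
    """Remove the geometric (fixed) phase offset from a measured phase difference."""
    measured = np.asarray(measured_phase_diff, dtype=float)
    return measured - fixed_phase_difference(freq_hz, path_diff_m, c)

# Hypothetical measurement: geometry contributes the fixed term,
# the microphone mismatch contributes an extra 0.02 rad at each frequency.
freqs = np.array([500.0, 1000.0])
path_diff = 0.0343  # metres
measured = fixed_phase_difference(freqs, path_diff) + 0.02
print(calibrate(measured, freqs, path_diff))
```

What remains after calibration is the part of the phase difference attributable to the microphones themselves rather than to their placement.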
  • performing a consistency evaluation on the N microphones according to a power spectrum difference value between each of the N microphones except the reference microphone and the reference microphone includes:
  • according to the power spectrum difference between each of the N microphones except the reference microphone and the reference microphone, the amplitude consistency between the corresponding microphone and the reference microphone is evaluated.
  • for example, if the power spectrum difference between microphone 1 and the reference microphone is A, then the smaller A is, the better the amplitude consistency between microphone 1 and the reference microphone is.
  • a threshold can be set. If the power spectrum difference between two microphones is smaller than this threshold, the amplitude consistency between the two microphones meets the design requirements, and the effect of the consistency between the two microphones on the multi-channel speech enhancement algorithm can be ignored, or the consistency between the two microphones has no effect on the multi-channel speech enhancement algorithm.
  • thresholds can be flexibly configured according to different multi-channel speech enhancement algorithms.
  • the N audio signals are signals collected in an environment in which the frequency-sweep signal data is played.
  • the N audio signals are signals collected in an environment where Gaussian white noise data or frequency-sweep signal data is played.
  • the frequency sweep signal is any one of a linear frequency sweep signal, a logarithmic frequency sweep signal, a linear step frequency sweep signal, and a logarithmic step frequency sweep signal.
  • determining a phase spectrum difference and/or a power spectrum difference between each of the N microphones except the reference microphone and the reference microphone includes:
  • a phase spectrum difference value and / or a power spectrum difference value between each of the N microphones except the reference microphone and the reference microphone are determined.
  • K represents the total number of frames of the signal collected by each microphone.
  • each of the K signal frames may be processed by adding a Hamming window.
  • any two adjacent signal frames in the K signal frames overlap by R%, where R > 0.
  • R is 25 or 50.
  • the signal amplitude remains unchanged after overlapping and windowing.
  • each frame of the signal after the overlap has a component of the previous frame to prevent discontinuity between the two frames.
  • $x_i(t) = [x_{i,1}(t), x_{i,2}(t), \ldots, x_{i,K}(t)]^T$
  • $x_i(t)$ represents the i-th audio signal
  • $K$ represents the total number of frames collected by each microphone
  • $[\,\cdot\,]^T$ represents the transpose of a vector or a matrix.
  • determining, according to the K target signal frames corresponding to each audio signal, the phase spectrum difference between each of the N microphones except the reference microphone and the reference microphone includes:
  • imag () means take the imaginary part
  • ln () means take the natural logarithm
  • the two remaining symbols represent the j-th target signal frame of the i-th microphone and the main frequency, respectively.
  • determining, according to the K target signal frames corresponding to each audio signal, the power spectrum difference between each of the N microphones except the reference microphone and the reference microphone includes:
  • a power spectrum difference between each of the N microphones except the reference microphone and the reference microphone is determined.
  • determining the power spectrum of each audio signal according to the K target signal frames corresponding to each audio signal includes:
  • $P_i(\omega)$ represents the power spectrum of the i-th audio signal
  • $Y_{i,j}(\omega)$ represents the j-th target signal frame in the i-th audio signal
  • $K$ represents the total number of frames of the signal received by each microphone
  • $\omega$ represents frequency
  • the determining a power spectrum difference between each of the N microphones except the reference microphone and the reference microphone according to the power spectrum of each audio signal includes:
  • $PD_i(\omega)$ represents the power spectrum difference between the i-th microphone and the reference microphone
  • $P_1(\omega)$ represents the power spectrum of the reference microphone
  • $P_i(\omega)$ represents the power spectrum of the i-th microphone
  • the acquiring the N audio signals respectively acquired by the N microphones includes:
  • the frequency sweep signal data is composed of M + 1 segments of equal length and different frequencies.
  • the number of FFT points $N_{fft}$ is an even number, generally 32, 64, 128, ..., 1024, etc.; the greater the number of points, the greater the saving in the amount of calculation.
  • $f_i$ represents the frequency of the i-th segment signal
  • $F_s$ represents the sampling frequency
  • $N_{fft}$ represents the number of FFT points
  • $S_i(t)$ represents the i-th segment signal
  • the frequency sweep signal data played by the speaker can be written in the following vector form:
  • $S(t)$ represents the frequency sweep signal data played by the speaker
  • $S_i(t)$ represents the i-th segment signal
  • [] T represents the transpose of a vector or matrix.
  • the N microphones respectively collect N audio signals
  • the audio signal collected by the i-th microphone is represented as $x_i(t)$
  • $x_i(t)$ can be written in the following vector form:
  • $x_i(t) = [x_{i,1}(t), x_{i,2}(t), \ldots, x_{i,K}(t)]^T$
  • $x_i(t)$ represents the audio signal collected by the i-th microphone
  • $K$ represents the total number of frames of the signal collected by each microphone
  • $[\,\cdot\,]^T$ represents the transpose of the vector or matrix.
  • the acquiring the N audio signals respectively acquired by the N microphones includes:
  • placing the N microphones in a test room in which a speaker is arranged, the N microphones being located directly in front of the speaker;
  • the test room has an anechoic room environment
  • the speaker is an artificial mouth dedicated for audio testing
  • the artificial mouth is calibrated with a standard microphone before use.
  • before controlling the speaker to play Gaussian white noise data or frequency sweep signal data, the method further includes:
  • a device for evaluating the consistency of a microphone array including:
  • an obtaining unit configured to obtain N audio signals respectively collected by N microphones, where the N microphones form a microphone array, and N ≥ 2;
  • a processing unit configured to determine, according to the N audio signals, a phase spectrum difference and/or a power spectrum difference between each of the N microphones except the reference microphone and the reference microphone,
  • where the reference microphone is any one of the N microphones;
  • the processing unit is further configured to perform a consistency evaluation on the N microphones based on the phase spectrum difference and/or the power spectrum difference between each of the N microphones except the reference microphone and the reference microphone.
  • the processing unit is specifically configured to:
  • the phase consistency between the corresponding microphone and the reference microphone is evaluated.
  • the processing unit is further configured to:
  • the corresponding phase spectrum difference values thereof are respectively calibrated.
  • the processing unit is specifically configured to:
  • $Y_i(\omega)$ represents the frequency spectrum of the i-th microphone
  • $Y_1(\omega)$ represents the frequency spectrum of the reference microphone
  • $\omega$ represents the frequency
  • $d_i$ represents the difference between the distances from the i-th microphone and from the reference microphone to the sound source
  • $c$ denotes the speed of sound
  • $2\pi\omega d_i / c$ represents a fixed phase difference between the i-th microphone and the reference microphone.
  • the processing unit is specifically configured to:
  • An amplitude consistency between a corresponding microphone and the reference microphone is evaluated according to a power spectrum difference between each of the N microphones except the reference microphone and the reference microphone.
  • the N audio signals are signals collected in an environment in which the frequency sweep signal data is played.
  • the N audio signals are signals collected in an environment in which Gaussian white noise data or frequency-sweep signal data is played.
  • the frequency sweep signal is any one of a linear frequency sweep signal, a logarithmic frequency sweep signal, a linear step frequency sweep signal, and a logarithmic step frequency sweep signal.
  • the processing unit is specifically configured to:
  • any two adjacent signal frames in the K signal frames overlap by R%, where R > 0.
  • R is 25 or 50.
  • $x_i(t) = [x_{i,1}(t), x_{i,2}(t), \ldots, x_{i,K}(t)]^T$
  • $x_i(t)$ represents the i-th audio signal
  • $K$ represents the total number of frames collected by each microphone
  • $[\,\cdot\,]^T$ represents the transpose of a vector or a matrix.
  • the processing unit is specifically configured to:
  • imag () means take the imaginary part
  • ln () means take the natural logarithm
  • the two remaining symbols represent the j-th target signal frame of the i-th microphone and the main frequency, respectively.
  • the processing unit is specifically configured to:
  • a power spectrum difference between each of the N microphones except the reference microphone and the reference microphone is determined.
  • the processing unit is specifically configured to:
  • $P_i(\omega)$ represents the power spectrum of the i-th audio signal
  • $Y_{i,j}(\omega)$ represents the j-th target signal frame in the i-th audio signal
  • $K$ represents the total number of frames of the signal collected by each microphone
  • $\omega$ represents frequency
  • the processing unit is specifically configured to:
  • $PD_i(\omega)$ represents the power spectrum difference between the i-th microphone and the reference microphone
  • $P_1(\omega)$ represents the power spectrum of the reference microphone
  • $P_i(\omega)$ represents the power spectrum of the i-th microphone
  • the processing unit is specifically configured to:
  • determine the sampling frequency $F_s$ and the number of FFT points $N_{fft}$ of the N microphones during audio signal collection, use a speaker to play Gaussian white noise data or frequency sweep signal data, and control the N microphones to acquire the N audio signals, wherein, if the data played by the speaker is frequency-sweep signal data, the frequency-sweep signal data is composed of M + 1 segments of equal length and different frequencies.
  • the processing unit is further configured to:
  • $f_i$ represents the frequency of the i-th segment signal
  • $F_s$ represents the sampling frequency
  • $N_{fft}$ represents the number of FFT points
  • $S_i(t)$ represents the i-th segment signal
  • the frequency sweep signal data played by the speaker is written in the following vector form:
  • $S(t)$ represents the frequency sweep signal data played by the speaker
  • $S_i(t)$ represents the i-th segment signal
  • [] T represents the transpose of a vector or matrix.
  • the N microphones respectively collect N audio signals
  • the audio signal collected by the i-th microphone is represented as $x_i(t)$
  • $x_i(t)$ can be written in the following vector form:
  • $x_i(t) = [x_{i,1}(t), x_{i,2}(t), \ldots, x_{i,K}(t)]^T$
  • $x_i(t)$ represents the audio signal collected by the i-th microphone
  • $K$ represents the total number of frames of the signal collected by each microphone
  • $[\,\cdot\,]^T$ represents the transpose of the vector or matrix.
  • the obtaining unit is specifically configured to:
  • the test room has an anechoic room environment
  • the speaker is an artificial mouth dedicated for audio testing
  • the artificial mouth is calibrated with a standard microphone before use.
  • before the processing unit controls the speaker to play Gaussian white noise data or frequency sweep signal data,
  • the obtaining unit is further configured to:
  • trigger the processing unit to calculate the signal-to-noise ratio (SNR) according to a formula, and ensure that the SNR is greater than a first threshold.
  • an apparatus for evaluating the consistency of a microphone array, including:
  • a memory for storing programs and data, and a processor configured to call and run the programs and data stored in the memory
  • the apparatus is configured to perform the method in the first aspect described above or any possible implementation thereof.
  • a system for assessing the consistency of a microphone array including:
  • N microphones forming a microphone array, N ≥ 2;
  • at least one audio source;
  • a device comprising a memory for storing programs and data and a processor for calling and running the programs and data stored in the memory, the device being configured to perform the method in the first aspect or any possible implementation thereof.
  • a computer storage medium stores program code, and the program code may be used to instruct execution of the method in the first aspect or any possible implementation manner thereof.
  • a computer program product containing instructions which, when run on a computer, causes the computer to execute the method in the first aspect or any possible implementation thereof.
  • FIG. 1 is a schematic flowchart of a method for evaluating consistency of a microphone array according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a test environment according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of calculating a phase spectrum difference according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of calculating a power spectrum difference according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a phase spectrum difference between two microphones according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a phase spectrum difference value after calibration between two microphones according to an embodiment of the present application.
  • FIG. 7a is a schematic diagram of a power spectrum of two microphones according to an embodiment of the present application.
  • FIG. 7b is a schematic diagram of a power spectrum difference between two microphones according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a device for evaluating consistency of a microphone array according to an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of an apparatus for evaluating consistency of a microphone array according to an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a system for evaluating consistency of a microphone array according to an embodiment of the present application.
  • Microphone array refers to a system composed of a certain number of microphones (acoustic sensors) that is used to sample and process the spatial characteristics of the sound field. The difference between the phases of the sound waves received by the two microphones is used to filter the sound waves, which can eliminate the ambient background sound to the maximum, leaving only the required sound waves.
  • the multi-channel speech enhancement technology algorithm assumes that the target speech components of multiple microphones in the microphone array are highly correlated, and the target speech is not related to non-target interference, so the consistency between different microphones in the microphone array directly affects the algorithm performance.
  • Quantitative evaluation of microphone consistency can be used to guide the design of microphones and the design of microphone arrays.
  • Microphone array circuits, electronic components, and acoustic structures all affect the consistency of microphones.
  • the effect of each of these factors on consistency can be tested item by item, so that the designed microphone consistency meets the system requirements.
  • Quantitative evaluation of microphone consistency can be used to compare the robustness of different algorithms. For the same speech enhancement performance, the lower the consistency an algorithm requires, the better its robustness.
  • consistency is measured from two aspects, amplitude spectrum difference and phase spectrum difference, which is objective and accurate. The quantitative consistency evaluation method can objectively guide the design of the microphone array and can also objectively compare the robustness of multi-channel speech enhancement algorithms.
  • FIG. 1 is a schematic flowchart of a method for evaluating consistency of a microphone array according to an embodiment of the present application. It should be understood that FIG. 1 shows steps or operations of the method, but these steps or operations are merely examples, and other operations or variations of each operation in FIG. 1 may be performed in the embodiment of the present application.
  • the method may be executed by a device for evaluating the consistency of the microphone array, where the device for evaluating the consistency of the microphone array may be a mobile phone, a tablet computer, a portable computer, a Personal Digital Assistant (PDA), or the like.
  • S110: Obtain N audio signals collected by N microphones respectively, where the N microphones form a microphone array, and N ≥ 2.
  • a microphone array 201 composed of the N microphones is placed in a test room 202, and a speaker 203 is disposed in the test room 202.
  • the microphone array 201 is located directly in front of the speaker 203.
  • the microphone array 201 and the speaker 203 are connected to a control device 204, such as a computer.
  • the control device 204 can control the speaker 203 to play specific audio data, for example, to play Gaussian white noise data or frequency-sweep signal data.
  • the control device 204 can obtain, from the microphone array 201, the audio signals collected by the N microphones.
  • the microphone consistency evaluation requires that the signal-to-noise ratio of the collected audio signal is sufficiently high and the background noise is sufficiently weak, so the test environment is required to be in a quiet environment.
  • an anechoic room environment is required in the test room 202.
  • the speaker 203 requires a high signal-to-noise ratio and a flat frequency response curve.
  • the speaker uses an artificial mouth dedicated for audio testing, and is calibrated with a standard microphone before use.
  • the microphone array 201 is placed directly in front of the speaker 203, and in particular, it is required to be placed at a position calibrated by a standard microphone.
  • first, first audio data $X_1(n)$ collected by the N microphones within a first duration $T_1$ is acquired; then, in the environment where Gaussian white noise data or frequency-sweep signal data is played (that is, the control device 204 controls the speaker 203 to play Gaussian white noise data or frequency-sweep signal data), second audio data $X_2(n)$ collected by the N microphones within a second duration $T_2$ is acquired,
  • and the signal-to-noise ratio (SNR) is calculated according to formula 1; finally, when the SNR is greater than a set threshold, the detection passes, otherwise the detection fails.
  • $T_1$ represents the first duration
  • $T_2$ represents the second duration
  • $X_1(n)$ represents the first audio data
  • $X_2(n)$ represents the second audio data.
  • if the test fails, the test environment needs to be adjusted or calibrated to eliminate factors that may affect the signal-to-noise ratio, until the SNR calculated according to formula 1 is greater than the set threshold.
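Formula 1 is not reproduced in this text, so the following is only a sketch of one common way to carry out such a check. It assumes the first recording contains background noise only, the second contains the played signal plus the same noise, and the SNR is the ratio of the estimated signal power to the noise power in dB. The function name and the pass threshold in the usage example are hypothetical.

```python
import numpy as np

def estimate_snr_db(noise_only, signal_plus_noise):
    """Estimate SNR in dB from a noise-only recording and a playback recording.

    Assumption: signal power = (power during playback) - (power of noise alone).
    """
    x1 = np.asarray(noise_only, dtype=float)
    x2 = np.asarray(signal_plus_noise, dtype=float)
    noise_power = np.mean(x1 ** 2)                 # average power over the first duration
    signal_power = np.mean(x2 ** 2) - noise_power  # average power over the second duration, noise removed
    if noise_power <= 0 or signal_power <= 0:
        raise ValueError("cannot estimate SNR from these recordings")
    return 10.0 * np.log10(signal_power / noise_power)

# Synthetic example: noise power 0.01, playback power 1.01
x1 = np.full(1000, 0.1)
x2 = np.full(2000, np.sqrt(1.01))
snr = estimate_snr_db(x1, x2)
print(snr > 15.0)  # compare with a hypothetical threshold of 15 dB
```

The two recordings may have different lengths because each power is averaged over its own duration.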
  • acquiring audio signals by using the test environment shown in FIG. 2 described above may specifically include:
  • the sampling frequency F s and the number of FFT points N fft of the N microphones during audio signal collection are determined, and Gaussian white noise data or frequency-sweep signal data is played using a speaker, and the N microphones collect the N audio signals.
  • the frequency-sweep signal data is composed of M + 1 segments of equal length and different frequencies.
  • the frequency of each segment in the M + 1 segment signals can be calculated according to the following formula 2
  • each segment in the M + 1 segment signals can be calculated according to the following formula 3.
  • $f_i$ is the frequency of the i-th segment signal
  • $F_s$ is the sampling frequency
  • $N_{fft}$ is the number of FFT points.
  • $S_i(t)$ represents the i-th segment signal
  • $f_i$ is the frequency of the i-th segment signal.
  • the frequency sweep signal data played by the speaker can be written in the following vector form:
  • $S(t)$ represents the frequency sweep signal data played by the speaker
  • $S_i(t)$ represents the i-th segment signal
  • [] T represents the transpose of a vector or matrix.
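Formulas 2 and 3 are not reproduced in this text. The sketch below shows one plausible construction of a stepped sweep made of M + 1 equal-length segments, assuming that each segment frequency sits on an FFT bin centre (f_i = i·F_s/N_fft for i = 0..M with M = N_fft/2) and that each segment is a unit-amplitude sine. Both assumptions, the function name, and the segment length are illustrative.

```python
import numpy as np

def stepped_sweep(fs, n_fft, seg_len):
    """Build M + 1 equal-length sine segments, one per assumed FFT bin frequency.

    Returns (freqs, segments) where segments has shape (M + 1, seg_len).
    """
    m = n_fft // 2                             # assumed: M = N_fft / 2
    t = np.arange(seg_len) / fs                # time axis of one segment, in seconds
    freqs = np.arange(m + 1) * fs / n_fft      # assumed f_i = i * F_s / N_fft
    segments = np.sin(2.0 * np.pi * freqs[:, None] * t[None, :])
    return freqs, segments

freqs, segments = stepped_sweep(fs=16000, n_fft=32, seg_len=512)
sweep = segments.reshape(-1)   # concatenate the segments into one playback signal
print(freqs[1], segments.shape, sweep.shape)
```

Under these particular assumptions the first segment (0 Hz) and the last segment (half the sampling rate) are silent, so a practical test signal might skip them or use a different phase.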
  • the N microphones respectively acquire N audio signals
  • the audio signal collected by the i-th microphone is represented as $x_i(t)$
  • $x_i(t)$ can be written in the following vector form:
  • $x_i(t) = [x_{i,1}(t), x_{i,2}(t), \ldots, x_{i,K}(t)]^T$
  • $x_i(t)$ represents the audio signal collected by the i-th microphone
  • $K$ represents the total number of frames of the signal collected by each microphone
  • $[\,\cdot\,]^T$ represents the transpose of the vector or matrix.
  • the reference microphone is any one of the N microphones.
  • the audio signals may be framed, each frame of the audio signals may be windowed, and each windowed frame may be FFT-transformed to obtain the phase spectrum difference between different microphones.
  • the N audio signals are $x_1(t), x_2(t), \ldots, x_N(t)$, and each of the N audio signals is divided into frames to obtain K signal frames of equal length, K ≥ 2.
  • for example, the i-th audio signal is divided into K signal frames of equal length, written in the following vector form:
  • $x_i(t) = [x_{i,1}(t), x_{i,2}(t), \ldots, x_{i,K}(t)]^T$
  • $x_i(t)$ represents the i-th audio signal
  • $K$ represents the total number of frames collected by each microphone
  • $[\,\cdot\,]^T$ represents the transpose of a vector or a matrix
  • each of the K signal frames is windowed to obtain K windowed signal frames. For example, the j-th frame $x_{i,j}$ of the i-th audio signal is windowed to obtain the j-th windowed signal frame of the i-th audio signal,
  • $y_{i,j} = x_{i,j} \cdot \mathrm{Win}$;
  • imag () means take the imaginary part
  • ln () means take the natural logarithm
  • the two remaining symbols represent the j-th target signal frame of the i-th microphone and the main frequency, respectively.
  • the first microphone is used as the reference microphone, that is, the phase spectrum difference between each microphone except the first microphone and the first microphone is calculated separately, where
  • the first microphone corresponds to the audio signal $x_1(t)$
  • the second microphone corresponds to the audio signal $x_2(t)$
  • ... the N-th microphone corresponds to the audio signal $x_N(t)$.
  • K represents the total number of frames of signals received by each microphone.
  • each of the K signal frames may be processed by adding a Hamming window.
  • any two adjacent signal frames in the K signal frames overlap by R%, where R > 0.
  • R is 25 or 50.
  • any two adjacent signal frames in the K signal frames overlap by 25% or 50%.
  • the signal amplitude remains unchanged after overlapping and windowing.
  • each frame of the signal after the overlap has a component of the previous frame to prevent discontinuity between the two frames.
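The framing, overlap, and Hamming-window steps can be sketched as follows. The function name, the frame length, and the decision to drop trailing samples that do not fill a whole frame are assumptions made for the example; `np.hamming` is NumPy's standard Hamming window.

```python
import numpy as np

def frame_and_window(x, frame_len, overlap_pct=50):
    """Split x into overlapping frames and apply a Hamming window to each frame.

    Returns an array of shape (K, frame_len); trailing samples that do not
    fill a complete frame are dropped (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    hop = int(round(frame_len * (1 - overlap_pct / 100.0)))  # step between frame starts
    if hop <= 0 or len(x) < frame_len:
        raise ValueError("invalid overlap or signal shorter than one frame")
    k = 1 + (len(x) - frame_len) // hop                      # number of complete frames
    idx = np.arange(frame_len)[None, :] + hop * np.arange(k)[:, None]
    return x[idx] * np.hamming(frame_len)                    # y_{i,j} = x_{i,j} * Win

frames = frame_and_window(np.ones(1024), frame_len=256, overlap_pct=50)
print(frames.shape)  # (7, 256)
```

With 50% overlap each frame shares half of its samples with the previous frame, which is what keeps adjacent frames from being discontinuous.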
  • the N audio signals are signals collected in an environment where the frequency sweep signal data is played.
  • the phase difference at any frequency $\omega$ can be calculated, that is, the phase spectrum difference $PDiff_i(\omega)$ between the i-th microphone and the reference microphone.
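The phase spectrum difference formula itself is not reproduced in this text; only its ingredients (imag(), ln(), the per-frame spectrum, and the main frequency) are listed. The sketch below combines them in one plausible way: for each frame, take the FFT, pick the main frequency as the strongest bin of the reference microphone, and take the imaginary part of the natural logarithm of the ratio of the two spectra at that bin. The direction of the ratio, the bin-selection rule, and the function name are assumptions.

```python
import numpy as np

def phase_spectrum_difference(frames_i, frames_ref):
    """Per-frame phase difference at the main frequency of each frame.

    frames_i, frames_ref: arrays of shape (K, frame_len).
    Returns (main_bins, phase_diff), each of length K.
    """
    y_i = np.fft.rfft(frames_i, axis=1)
    y_ref = np.fft.rfft(frames_ref, axis=1)
    main_bins = np.argmax(np.abs(y_ref), axis=1)    # assumed: strongest bin = main frequency
    rows = np.arange(y_ref.shape[0])
    ratio = y_i[rows, main_bins] / y_ref[rows, main_bins]
    return main_bins, np.imag(np.log(ratio))        # imag(ln(.)) is the phase angle in (-pi, pi]

# Synthetic check: a 1 kHz tone that lags the reference by 0.3 rad
fs, n = 16000, 256
t = np.arange(n) / fs
ref = np.sin(2 * np.pi * 1000 * t)[None, :]
mic = np.sin(2 * np.pi * 1000 * t - 0.3)[None, :]
bins, pdiff = phase_spectrum_difference(mic, ref)
print(bins[0], round(float(pdiff[0]), 3))
```

With a stepped sweep, each frame contributes the phase difference at its own main frequency, so the K results together trace out the phase spectrum difference over frequency.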
  • the audio signals may be framed, each frame of the audio signals is windowed, and each windowed frame is subjected to an FFT transformation. After the power spectrum of each FFT-transformed frame is obtained, the power spectrum difference between different microphones is found.
  • the N audio signals are $x_1(t), x_2(t), \ldots, x_N(t)$, and each of the N audio signals is divided into frames.
  • frame the i-th audio signal to obtain K signal frames of equal length and write the following vector form:
  • x i (t) [x i, 1 (t), x i, 2 (t), ..., x i, K (t)] T
  • x i (t) represents the i-th audio signal
  • K represents the total number of frames received by each microphone
  • [] T represents the transpose of a vector or a matrix
  • K windowed signal frames. For example, window the j-th frame x i, j of the i-th audio signal to obtain the j-th windowed signal frame of the i-th audio signal, y i, j = x i, j · Win;
  • according to the power spectrum of each audio signal, determine the power spectrum difference between each of the N microphones except the reference microphone and the reference microphone. For example, calculate the power spectrum difference between the i-th microphone and the reference microphone.
  • P i (ω) represents the power spectrum of the i-th audio signal
  • Y i, j (ω) represents the j-th target signal frame in the i-th audio signal
  • ω represents the frequency
  • K represents the total number of frames of the signal collected by each microphone.
  • PD i (ω) represents the power spectrum difference between the i-th microphone and the reference microphone
  • P 1 (ω) represents the power spectrum of the reference microphone
  • P i (ω) represents the power spectrum of the i-th microphone
  • the first microphone is used as the reference microphone, that is, the power spectrum difference between each microphone except the first microphone and the first microphone is calculated separately,
  • the first microphone corresponds to the audio signal x 1 (t)
  • the second microphone corresponds to the audio signal x 2 (t)
  • ... the Nth microphone corresponds to the audio signal x N (t).
  • each of the K signal frames may be processed by adding a Hamming window.
  • any two adjacent signal frames in the K signal frames overlap by R%, and R> 0.
  • R is 25 or 50.
  • any two adjacent signal frames in the K signal frames overlap by 25% or 50%.
  • the signal amplitude remains unchanged after overlapping and windowing.
  • each frame of the signal after the overlap has a component of the previous frame to prevent discontinuity between the two frames.
  • the N audio signals are signals collected in an environment in which Gaussian white noise data or frequency-sweep signal data is played.
  • S130 Perform a consistency evaluation on the N microphones according to a phase spectrum difference and / or a power spectrum difference between each of the N microphones except the reference microphone and the reference microphone.
  • phase spectrum difference value is used for phase consistency evaluation
  • power spectrum difference value is used for amplitude consistency evaluation
  • phase consistency between a corresponding microphone and the reference microphone is evaluated according to a phase spectrum difference value between each of the N microphones except the reference microphone and the reference microphone.
  • the phase spectrum difference between the microphone 1 and the reference microphone is A, and the smaller A is, the better the phase consistency between the microphone 1 and the reference microphone is.
  • a threshold can be set. If the phase spectrum difference between the two microphones is smaller than this threshold, it means that the phase consistency between the two microphones meets the design requirements, and the consistency between the two microphones The effect on the multi-channel speech enhancement algorithm can be ignored, or the consistency between the two microphones has no effect on the multi-channel speech enhancement algorithm.
  • thresholds can be flexibly configured according to different multi-channel speech enhancement algorithms.
  • the phase spectrum difference value may be calibrated by using a fixed phase difference.
  • d i represents the difference between the distance from the i-th microphone to the sound source and the distance from the reference microphone to the sound source
  • a fixed phase difference between each of the N microphones except the reference microphone and the reference microphone is calculated.
  • the i-th microphone and the reference microphone can be calculated according to the following formula 7.
  • phase spectrum difference values are calibrated respectively.
  • Y i (ω) represents the frequency spectrum of the i-th microphone
  • Y 1 (ω) represents the frequency spectrum of the reference microphone
  • ω represents the frequency
  • d i represents the difference between the distance from the i-th microphone to the sound source and the distance from the reference microphone to the sound source
  • c denotes the speed of sound
  • 2πωd i / c represents a fixed phase difference between the i-th microphone and the reference microphone.
  • the linear phase can be used to determine the fixed phase difference.
  • the fixed phase difference between the microphone 1 and the reference microphone is A
  • the phase spectrum difference between the microphone 1 and the reference microphone is B
  • the straight line represents the fitting between the microphone 1 and the reference microphone.
  • the amplitude consistency between the corresponding microphone and the reference microphone is evaluated based on the power spectrum difference between each of the N microphones except the reference microphone and the reference microphone.
  • FIG. 7a shows the power spectrum of the microphone 1 and the power spectrum of the reference microphone
  • FIG. 7b shows the power spectrum difference between the microphone 1 and the reference microphone.
  • ⁇ 1 decibel (dB) the maximum value of the power spectrum difference
  • a threshold can be set. If the power spectrum difference between the two microphones is smaller than this threshold, it means that the amplitude consistency between the two microphones meets the design requirements and the consistency between the two microphones The effect on the multi-channel speech enhancement algorithm can be ignored, or the consistency between the two microphones has no effect on the multi-channel speech enhancement algorithm.
  • thresholds can be flexibly configured according to different multi-channel speech enhancement algorithms.
  • the influence of factors such as the circuit, electronic components, and acoustic structure of the microphone array on the consistency of the microphone can be tested item by item to guide the calibration of the microphone array.
  • the phase spectrum difference and / or power spectrum difference between each microphone and the reference microphone may be determined according to the N audio signals collected by the N microphones, so as to perform a consistency evaluation on the N microphones, eliminate the impact of consistency between microphones on multi-channel speech enhancement algorithms, and improve user experience.
  • an embodiment of the present application provides a device 800 for evaluating the consistency of a microphone array, including:
  • the obtaining unit 810 is configured to obtain N audio signals collected by N microphones respectively, where the N microphones form a microphone array, and N ≥ 2;
  • a processing unit 820 configured to determine a phase spectrum difference and / or a power spectrum difference between each of the N microphones except the reference microphone and the reference microphone according to the N audio signals,
  • the reference microphone is any one of the N microphones;
  • the processing unit 820 is further configured to perform a consistency evaluation on the N microphones according to a phase spectrum difference value and / or a power spectrum difference value between each of the N microphones except the reference microphone and the reference microphone.
  • processing unit 820 is specifically configured to:
  • the phase consistency between the corresponding microphone and the reference microphone is evaluated.
  • processing unit 820 is further configured to:
  • the corresponding phase spectrum difference values thereof are respectively calibrated.
  • processing unit 820 is specifically configured to:
  • Y i (ω) represents the frequency spectrum of the i-th microphone
  • Y 1 (ω) represents the frequency spectrum of the reference microphone
  • ω represents the frequency
  • d i represents the difference between the distance from the i-th microphone to the sound source and the distance from the reference microphone to the sound source
  • c denotes the speed of sound
  • 2πωd i / c represents a fixed phase difference between the i-th microphone and the reference microphone.
  • processing unit 820 is specifically configured to:
  • An amplitude consistency between a corresponding microphone and the reference microphone is evaluated according to a power spectrum difference between each of the N microphones except the reference microphone and the reference microphone.
  • the N audio signals are signals collected in an environment where the frequency sweep signal data is played.
  • the N audio signals are signals collected in an environment where Gaussian white noise data or frequency-sweep signal data is played.
  • the frequency sweep signal is any one of a linear frequency sweep signal, a logarithmic frequency sweep signal, a linear step frequency sweep signal, and a logarithmic step frequency sweep signal.
  • processing unit 820 is specifically configured to:
  • any two adjacent signal frames of the K signal frames overlap by R%, and R> 0.
  • the R is 25 or 50.
  • x i (t) = [x i, 1 (t), x i, 2 (t), ..., x i, K (t)] T
  • x i (t) represents the i-th audio signal
  • K represents the total number of frames collected by each microphone
  • [] T represents the transpose of a vector or a matrix.
  • processing unit 820 is specifically configured to:
  • imag () means take the imaginary part
  • ln () means take the natural logarithm
  • Represents the j-th target signal frame of the i-th microphone Indicates the main frequency.
  • processing unit 820 is specifically configured to:
  • a power spectrum difference between each of the N microphones except the reference microphone and the reference microphone is determined.
  • processing unit 820 is specifically configured to:
  • P i (ω) represents the power spectrum of the i-th audio signal
  • Y i, j (ω) represents the j-th target signal frame in the i-th audio signal
  • K represents the total number of frames of the signal collected by each microphone
  • ω represents the frequency
  • processing unit 820 is specifically configured to:
  • PD i (ω) represents the power spectrum difference between the i-th microphone and the reference microphone
  • P 1 (ω) represents the power spectrum of the reference microphone
  • P i (ω) represents the power spectrum of the i-th microphone
  • processing unit 820 is specifically configured to:
  • the sampling frequency F s and FFT points N fft of the N microphones during audio signal collection using a speaker to play Gaussian white noise data or frequency sweep signal data, and controlling the N microphones to acquire the N audio signals, wherein, if the data played by the speaker is frequency-sweep signal data, the frequency-sweep signal data is composed of M + 1 segments of equal length and different frequencies.
  • processing unit 820 is further configured to:
  • f i represents the frequency of the i-th stage signal
  • F s represents the sampling frequency
  • N fft represents the number of FFT points
  • S i (t) represents the i-th stage signal
  • the frequency sweep signal data played by the speaker is written in the following vector form:
  • S (t) represents the frequency sweep signal data played by the speaker
  • S i (t) represents the i-th segment signal
  • [] T represents the transpose of a vector or matrix.
  • the N microphones respectively acquire N audio signals, where the audio signal collected by the i-th microphone is represented as x i (t), and x i (t) can be written in the following vector form:
  • x i (t) = [x i, 1 (t), x i, 2 (t), ..., x i, K (t)] T
  • x i (t) represents the audio signal collected by the i-th microphone
  • K represents the total number of frames of the signal collected by each microphone
  • [] T represents the transpose of the vector or matrix.
  • the obtaining unit 810 is specifically configured to:
  • the test room has an anechoic room environment
  • the speaker is an artificial mouth dedicated for audio testing
  • the artificial mouth is calibrated with a standard microphone before use.
  • the processing unit 820 controls the speaker to play Gaussian white noise data or frequency sweep signal data
  • the obtaining unit 810 is further configured to:
  • trigger the processing unit 820 to calculate the signal-to-noise ratio (SNR) according to a formula, and ensure that the SNR is greater than a first threshold.
  • an embodiment of the present application provides a device 900 for evaluating consistency of a microphone array, including:
  • a memory 910 for storing programs and data
  • a processor 920 configured to call and run a program and data stored in the memory
  • the device 900 is configured to perform the methods shown in FIGS. 1 to 7 described above.
  • an embodiment of the present application provides a system 1000 for evaluating consistency of a microphone array, including:
  • at least one audio source 1020;
  • the device 1030 includes a memory 1031 for storing programs and data and a processor 1032 for calling and running the programs and data stored in the memory, and the device 1030 is configured to perform the methods shown in FIGS. 1 to 7 described above.
  • the size of the sequence numbers of the above processes does not mean the order of execution.
  • the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • the disclosed systems, devices, and methods may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the unit is only a logical function division.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, which may be electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each of the units may exist separately physically, or two or more units may be integrated into one unit.
  • the above integrated unit may be implemented in the form of hardware or in the form of software functional unit.
  • the integrated unit When the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • the technical solution of this application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present application.
  • the foregoing storage media include: USB flash drive, removable hard disk, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store program code.

Abstract

Embodiments of the present application provide a method, device, apparatus, and system for evaluating microphone array consistency, being capable of evaluating the consistency among different microphones in a microphone array, thereby guiding calibration of the microphone array and evaluating robustness of a multi-channel enhancement algorithm according to the consistency evaluation result, and improving the user experience. The method comprises: obtaining N audio signals collected by N microphones respectively, the N microphones forming the microphone array, and N being greater than and equal to 2; determining, according to the N audio signals, a phase spectrum difference value and/or a power spectrum difference value between the microphones except for a reference microphone in the N microphones and the reference microphone, the reference microphone being any one microphone in the N microphones; and performing consistency evaluation on the N microphones according to the phase spectrum difference value and/or the power spectrum difference value between the microphones except for the reference microphone in the N microphones and the reference microphone.

Description

Method, device, apparatus, and system for evaluating microphone array consistency
Technical Field
The present application relates to the field of voice communication and intelligent voice interaction, and more particularly, to a method, device, apparatus, and system for evaluating the consistency of a microphone array.
Background
In voice communication applications, speech enhancement technology can improve the listening experience and the intelligibility of speech. In intelligent voice interaction applications, it can improve the accuracy of speech recognition and the user experience. Speech enhancement is therefore vital in both traditional voice communication and voice interaction. Speech enhancement technology is divided into single-channel and multi-channel speech enhancement. Single-channel speech enhancement can eliminate stationary noise but not non-stationary noise, and it improves the signal-to-noise ratio at the cost of speech damage: the more the signal-to-noise ratio is improved, the greater the speech damage. Multi-channel speech enhancement uses a microphone array to collect multiple signals and uses the phase information and coherence information among the microphone signals to eliminate noise; it can eliminate non-stationary noise and causes less speech damage.
In multi-channel speech enhancement, the consistency between different microphones in the microphone array directly affects algorithm performance. Existing solutions propose improved multi-channel enhancement algorithms that are more robust and place lower requirements on the consistency between microphones. However, when the consistency between microphones is very low, algorithm performance is still affected, which degrades the user experience.
Summary of the Invention
The present application provides a method, device, apparatus, and system for evaluating the consistency of a microphone array, which can evaluate the consistency between different microphones in the microphone array, so that the consistency evaluation result can guide calibration of the microphone array and evaluation of the robustness of a multi-channel enhancement algorithm, improving the user experience.
In a first aspect, a method for evaluating the consistency of a microphone array is provided, including:
obtaining N audio signals collected by N microphones respectively, where the N microphones form a microphone array and N ≥ 2;
determining, according to the N audio signals, a phase spectrum difference and/or a power spectrum difference between each of the N microphones except a reference microphone and the reference microphone, where the reference microphone is any one of the N microphones; and
performing a consistency evaluation on the N microphones according to the phase spectrum difference and/or the power spectrum difference between each of the N microphones except the reference microphone and the reference microphone.
It should be noted that the consistency evaluation of the N microphones can be used to guide the distribution of microphones in the microphone array, to guide a redesign of that distribution, to guide a redesign of the microphone array, or to evaluate the robustness of a multi-channel enhancement algorithm.
For example, when the evaluation result shows that the consistency between microphone 1 and microphone 2 is poor, the result can guide adjusting the distribution of microphone 1 or microphone 2 in the microphone array, or redesigning microphone 1 or microphone 2.
For another example, when the evaluation result shows that the consistency between microphone 1 and several other microphones is poor, the result can guide adjusting the distribution of microphone 1 in the microphone array, redesigning microphone 1, or redesigning the microphone array.
In the embodiments of the present application, the phase spectrum difference and/or the power spectrum difference between each microphone and the reference microphone is determined according to the N audio signals collected by the N microphones respectively, so that a consistency evaluation can be performed on the N microphones, eliminating the impact of inter-microphone consistency on multi-channel speech enhancement algorithms and improving the user experience.
In some possible implementations, performing the consistency evaluation on the N microphones according to the phase spectrum difference between each of the N microphones except the reference microphone and the reference microphone includes:
evaluating the phase consistency between the corresponding microphone and the reference microphone according to the phase spectrum difference between each of the N microphones except the reference microphone and the reference microphone.
It should be noted that the smaller the phase spectrum difference between two microphones, the better the phase consistency between them.
For example, if the phase spectrum difference between microphone 1 and the reference microphone is A, then the smaller A is, the better the phase consistency between microphone 1 and the reference microphone.
Optionally, a threshold can be set. If the phase spectrum difference between two microphones is smaller than this threshold, the phase consistency between the two microphones meets the design requirements, and the effect of their consistency on the multi-channel speech enhancement algorithm is negligible or nil.
It should be noted that the above threshold can be flexibly configured for different multi-channel speech enhancement algorithms.
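The threshold check described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `is_consistent`, the use of the largest absolute per-frequency difference as the compared quantity, and the example threshold value are all hypothetical and are not specified by the application.

```python
def is_consistent(diff_spectrum, threshold):
    """Return True when every per-frequency difference is below the threshold.

    Assumption: the comparison uses the largest absolute difference over all
    frequencies; the application only states that the difference must be
    smaller than a configurable threshold.
    """
    return max(abs(d) for d in diff_spectrum) < threshold


# Hypothetical phase spectrum differences (radians) at three frequencies.
phase_diff_rad = [0.01, -0.03, 0.02]
print(is_consistent(phase_diff_rad, threshold=0.05))
```

The same helper could be applied to a power spectrum difference expressed in decibels, with a threshold chosen for the target multi-channel speech enhancement algorithm.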
In some possible implementations, the method further includes:
measuring, for each of the N microphones except the reference microphone, the difference between its distance to the sound source and the reference microphone's distance to the sound source;
calculating, according to the measured distance differences, a fixed phase difference between each of the N microphones except the reference microphone and the reference microphone; and
calibrating the corresponding phase spectrum difference according to the fixed phase difference between each of the N microphones except the reference microphone and the reference microphone.
For example, if the fixed phase difference between microphone 1 and the reference microphone is A and the phase spectrum difference between microphone 1 and the reference microphone is B, then after calibration the phase spectrum difference between microphone 1 and the reference microphone is C, where C = B − A.
In some possible implementations, calculating, according to the measured distance differences, the fixed phase difference between each of the N microphones except the reference microphone and the reference microphone includes:
calculating, according to the formula
Figure PCTCN2018101766-appb-000001
the fixed phase difference between each of the N microphones except the reference microphone and the reference microphone,
where Y_i(ω) represents the frequency spectrum of the i-th microphone, Y_1(ω) represents the frequency spectrum of the reference microphone, ω represents the frequency, d_i represents the difference between the distance from the i-th microphone to the sound source and the distance from the reference microphone to the sound source, c denotes the speed of sound, and 2πωd_i/c represents the fixed phase difference between the i-th microphone and the reference microphone.
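The fixed phase difference 2πωd_i/c and the calibration step C = B − A can be sketched as below. The function names and the default speed of sound (343 m/s, roughly air at room temperature) are assumptions for illustration; ω is taken in hertz, which is how the 2π factor in the text reads.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not given in the application


def fixed_phase_difference(freq_hz, distance_diff_m, c=SPEED_OF_SOUND):
    """Fixed phase difference 2*pi*omega*d_i/c in radians."""
    return 2.0 * math.pi * freq_hz * distance_diff_m / c


def calibrate(phase_diff, freq_hz, distance_diff_m, c=SPEED_OF_SOUND):
    """Calibrated phase spectrum difference C = B - A.

    B is the measured phase spectrum difference and A is the fixed phase
    difference caused only by the path-length difference d_i.
    """
    return phase_diff - fixed_phase_difference(freq_hz, distance_diff_m, c)


# A 1 cm path difference at 1 kHz contributes about 0.183 rad.
print(fixed_phase_difference(1000.0, 0.01))
```

This sketch does not handle phase wrapping: a measured phase difference is usually confined to one 2π interval, while the fixed phase difference grows linearly with frequency, so a practical implementation would likely need to wrap or unwrap one of the two terms before subtracting.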
In some possible implementations, performing the consistency evaluation on the N microphones according to the power spectrum difference between each of the N microphones except the reference microphone and the reference microphone includes:
evaluating the amplitude consistency between the corresponding microphone and the reference microphone according to the power spectrum difference between each of the N microphones except the reference microphone and the reference microphone.
It should be noted that the smaller the power spectrum difference between two microphones, the better the amplitude consistency between them.
For example, if the power spectrum difference between microphone 1 and the reference microphone is A, then the smaller A is, the better the amplitude consistency between microphone 1 and the reference microphone.
Optionally, a threshold can be set. If the power spectrum difference between two microphones is smaller than this threshold, the amplitude consistency between the two microphones meets the design requirements, and the effect of their consistency on the multi-channel speech enhancement algorithm is negligible or nil.
It should be noted that the above threshold can be flexibly configured for different multi-channel speech enhancement algorithms.
In some possible implementations, when the phase consistency evaluation is performed, the N audio signals are signals collected in an environment in which frequency-sweep signal data is played.
In some possible implementations, when the amplitude consistency evaluation is performed, the N audio signals are signals collected in an environment in which Gaussian white noise data or frequency-sweep signal data is played.
In some possible implementations, the frequency-sweep signal is any one of a linear frequency-sweep signal, a logarithmic frequency-sweep signal, a linear stepped frequency-sweep signal, and a logarithmic stepped frequency-sweep signal.
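As a rough illustration of one of the listed options, the sketch below builds a linear stepped frequency-sweep signal from equal-length sine segments. The function name, the unit amplitude, and in particular the choice of segment frequencies on FFT bin centres (a multiple of F_s / N_fft, starting at bin 1) are assumptions; the application's own expression for the segment frequencies is not reproduced in this text.

```python
import math


def stepped_sweep(fs, n_fft, num_steps, seg_len, start_bin=1):
    """Return (freqs, samples) for a linear stepped sweep.

    The sweep has num_steps + 1 sine segments of equal length seg_len.
    Assumption: segment i sits on FFT bin (start_bin + i), that is, at
    frequency (start_bin + i) * fs / n_fft.
    """
    freqs = [(start_bin + i) * fs / n_fft for i in range(num_steps + 1)]
    samples = []
    for f in freqs:
        samples.extend(math.sin(2.0 * math.pi * f * n / fs) for n in range(seg_len))
    return freqs, samples


freqs, samples = stepped_sweep(fs=16000, n_fft=512, num_steps=3, seg_len=512)
print(freqs)         # segment frequencies in Hz
print(len(samples))  # total number of samples
```

Placing each segment on a bin centre means a frame taken from inside one segment has a single dominant FFT bin, which is convenient when a per-frame main frequency is needed.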
In some possible implementations, determining, according to the N audio signals, the phase spectrum difference and/or the power spectrum difference between each of the N microphones except the reference microphone and the reference microphone includes:
framing each of the N audio signals to obtain K signal frames of equal length, K ≥ 2;
windowing each of the K signal frames to obtain K windowed signal frames;
performing a fast Fourier transform (FFT) on each of the K windowed signal frames to obtain K target signal frames; and
determining, according to the K target signal frames corresponding to each audio signal, the phase spectrum difference and/or the power spectrum difference between each of the N microphones except the reference microphone and the reference microphone.
Optionally, K represents the total number of frames of the signal collected by each microphone.
It should be noted that windowing is used to eliminate the truncation effect introduced by framing. Optionally, a Hamming window may be applied to each of the K signal frames.
In some possible implementations, any two adjacent signal frames among the K signal frames overlap by R%, R > 0. For example, R is 25 or 50.
Optionally, the signal amplitude remains unchanged after overlapping and windowing.
It should be understood that, after overlapping, each frame of the signal contains a component of the previous frame, which prevents discontinuity between two frames.
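The framing and windowing steps above can be sketched as follows, using only the Python standard library. The function names, the symmetric form of the Hamming window, and the choice to drop trailing samples that do not fill a whole frame are assumptions for illustration.

```python
import math


def hamming(n):
    """Symmetric Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def frame_and_window(x, frame_len, overlap_pct):
    """Split x into equal-length frames overlapping by overlap_pct percent,
    then multiply each frame by a Hamming window.

    Assumption: samples at the end that do not fill a whole frame are dropped.
    """
    if frame_len < 2:
        raise ValueError("frame_len must be at least 2")
    hop = frame_len - int(frame_len * overlap_pct / 100)
    if hop <= 0:
        raise ValueError("overlap must be smaller than 100 percent")
    win = hamming(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frames.append([x[start + k] * win[k] for k in range(frame_len)])
    return frames


frames = frame_and_window([1.0] * 16, frame_len=8, overlap_pct=50)
print(len(frames), len(frames[0]))
```

With R = 50 the hop is half a frame, so each frame shares its first half with the previous frame; each windowed frame would then be passed to an FFT to obtain a target signal frame.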
In some possible implementations, the i-th audio signal is framed to obtain K signal frames of equal length, which can be written in the following vector form:
x_i(t) = [x_{i,1}(t), x_{i,2}(t), …, x_{i,K}(t)]^T
where x_i(t) represents the i-th audio signal, K represents the total number of frames of the signal collected by each microphone, and []^T represents the transpose of a vector or matrix.
In some possible implementations, determining, according to the K target signal frames corresponding to each audio signal, the phase spectrum difference between the reference microphone and each of the N microphones other than the reference microphone includes:
determining the phase spectrum difference between the reference microphone and each of the N microphones other than the reference microphone according to the formula
[Figure PCTCN2018101766-appb-000002]
where imag() takes the imaginary part, ln() takes the natural logarithm, [Figure PCTCN2018101766-appb-000003] denotes the phase spectrum difference between the i-th microphone and the reference microphone, [Figure PCTCN2018101766-appb-000004] denotes the j-th target signal frame of the reference microphone, [Figure PCTCN2018101766-appb-000005] denotes the j-th target signal frame of the i-th microphone, and [Figure PCTCN2018101766-appb-000006] denotes the main frequency.
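The formula itself is available here only as an image, so the following is a hedged sketch of how imag() and ln() can yield a phase difference. The ratio order (reference over i-th microphone), the choice of the strongest reference bin as the "main frequency", and all function names are assumptions of this sketch.

```python
import cmath


def phase_difference(y_ref, y_mic):
    """imag(ln(y_ref / y_mic)): phase of y_ref minus phase of y_mic, in (-pi, pi]."""
    return cmath.log(y_ref / y_mic).imag


def phase_spectrum_difference(ref_frames, mic_frames):
    """For each frame j, compare the two spectra at that frame's dominant bin.

    ref_frames and mic_frames are lists of K spectra (lists of complex bins,
    at least 4 bins each). Returns a list of (bin index, phase difference).
    """
    out = []
    for y_ref, y_mic in zip(ref_frames, mic_frames):
        half = len(y_ref) // 2
        # Assumed definition of the main frequency: strongest non-DC bin.
        k = max(range(1, half), key=lambda b: abs(y_ref[b]))
        out.append((k, phase_difference(y_ref[k], y_mic[k])))
    return out


# Usage: one frame whose bin 3 carries the tone; the second microphone lags by 0.25 rad.
ref = [[0j, 0j, 0j, 1 + 0j, 0j, 0j, 0j, 0j]]
mic = [[0j, 0j, 0j, cmath.exp(-0.25j), 0j, 0j, 0j, 0j]]
result = phase_spectrum_difference(ref, mic)
```

Taking the imaginary part of the logarithm of a complex ratio isolates the angle of that ratio, which is why the magnitudes of the two spectra do not affect the result.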
In some possible implementations, determining, according to the K target signal frames corresponding to each audio signal, the power spectrum difference between the reference microphone and each of the N microphones other than the reference microphone includes:
determining the power spectrum of each audio signal according to the K target signal frames corresponding to that audio signal;
determining, according to the power spectrum of each audio signal, the power spectrum difference between the reference microphone and each of the N microphones other than the reference microphone.
In some possible implementations, determining the power spectrum of each audio signal according to the K target signal frames corresponding to that audio signal includes:
calculating the power spectrum of each audio signal according to the formula
[Figure PCTCN2018101766-appb-000007]
where $P_i(\omega)$ denotes the power spectrum of the i-th audio signal, $Y_{i,j}(\omega)$ denotes the j-th target signal frame of the i-th audio signal, K denotes the total number of frames of the signal received by each microphone, and $\omega$ denotes frequency.
In some possible implementations, determining, according to the power spectrum of each audio signal, the power spectrum difference between the reference microphone and each of the N microphones other than the reference microphone includes:
calculating the power spectrum difference between the reference microphone and each of the N microphones other than the reference microphone according to the formula $PD_i(\omega) = P_1(\omega) - P_i(\omega)$,
where $PD_i(\omega)$ denotes the power spectrum difference between the i-th microphone and the reference microphone, $P_1(\omega)$ denotes the power spectrum of the reference microphone, and $P_i(\omega)$ denotes the power spectrum of the i-th microphone.
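As a hedged illustration of the two steps above: the power spectrum formula is available here only as an image, so averaging the squared magnitude over the K frames and expressing the result in dB are assumptions of this sketch, as are the function names. Only the difference $PD_i(\omega) = P_1(\omega) - P_i(\omega)$ is taken directly from the text.

```python
import math


def power_spectrum_db(frames):
    """Average |Y_{i,j}(w)|^2 over the K frames, per bin, in dB (assumed form)."""
    k = len(frames)
    n_bins = len(frames[0])
    eps = 1e-20  # guards against log10(0) on empty bins
    return [10 * math.log10(sum(abs(f[b]) ** 2 for f in frames) / k + eps)
            for b in range(n_bins)]


def power_spectrum_difference(ref_frames, mic_frames):
    """PD_i(w) = P_1(w) - P_i(w), with microphone 1 as the reference."""
    p_ref = power_spectrum_db(ref_frames)
    p_mic = power_spectrum_db(mic_frames)
    return [a - b for a, b in zip(p_ref, p_mic)]


# Usage: the second microphone has half the amplitude in bin 0 and matches in bin 1.
ref = [[1 + 0j, 2 + 0j], [1 + 0j, 2 + 0j]]
mic = [[0.5 + 0j, 2 + 0j], [0.5 + 0j, 2 + 0j]]
pd = power_spectrum_difference(ref, mic)
```

Under these assumptions a halved amplitude shows up as a difference of about 6 dB in that bin, and matched bins give a difference of zero.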
In some possible implementations, acquiring the N audio signals respectively collected by the N microphones includes:
determining the sampling frequency $F_s$ and the number of FFT points $N_{fft}$ used by the N microphones for audio signal collection, playing Gaussian white noise data or frequency sweep signal data through a speaker, and collecting the N audio signals with the N microphones, where, if the data played by the speaker is frequency sweep signal data, the frequency sweep signal data consists of M+1 segments of equal length and different frequencies,
[Figure PCTCN2018101766-appb-000008]
It should be noted that the number of FFT points $N_{fft}$ is even, typically 32, 64, 128, ..., 1024, and so on; the more points there are, the greater the computational saving of the FFT over a direct discrete Fourier transform.
In some possible implementations, the frequency of each of the M+1 segments is calculated according to the formula
[Figure PCTCN2018101766-appb-000009]
and each of the M+1 segments is calculated according to the formula $S_i(t) = \sin(2\pi f_i t)$,
where $f_i$ denotes the frequency of the i-th segment, $F_s$ denotes the sampling frequency, $N_{fft}$ denotes the number of FFT points, $S_i(t)$ denotes the i-th segment, and the length of $S_1(t)$ is an integer multiple of the period T, with $T = 1/f_1$.
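A stepped sweep of this kind can be sketched as below. The frequency formula is available here only as an image, so the choices $f_i = i \cdot F_s / N_{fft}$ for i = 0, ..., M and M = N_fft/2 (one segment per FFT bin centre) are assumptions of this sketch, as are the function name and the number of periods per segment. Only $S_i(t) = \sin(2\pi f_i t)$ and the whole-period length of $S_1(t)$ come from the text.

```python
import math


def stepped_sweep(fs, n_fft, periods=4):
    """Return M+1 equal-length sine segments, one per assumed frequency f_i."""
    m = n_fft // 2                      # assumed number of steps above DC
    seg_len = periods * n_fft           # whole number of periods of f_1 = fs / n_fft
    segments = []
    for i in range(m + 1):
        f_i = i * fs / n_fft            # assumed: centre frequency of FFT bin i
        segments.append([math.sin(2 * math.pi * f_i * n / fs)
                         for n in range(seg_len)])
    return segments


# Usage: 16 kHz sampling with a 32-point FFT gives 17 segments of 128 samples.
sweep = stepped_sweep(fs=16000, n_fft=32)
```

Under this assumed frequency grid the first segment (f_0 = 0) is silent, so a practical signal might start from i = 1; the text does not say which convention applies.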
In some possible implementations, the frequency sweep signal data played by the speaker can be written in the following vector form:
$S(t) = [S_0(t), S_1(t), \ldots, S_M(t)]^T$
where $S(t)$ denotes the frequency sweep signal data played by the speaker, $S_i(t)$ denotes the i-th segment, [Figure PCTCN2018101766-appb-000010], and $[\,\cdot\,]^T$ denotes the transpose of a vector or matrix.
In some possible implementations, the N microphones respectively collect N audio signals, where the audio signal collected by the i-th microphone is denoted $x_i(t)$, and $x_i(t)$ can be written in the following vector form:
$x_i(t) = [x_{i,1}(t), x_{i,2}(t), \ldots, x_{i,K}(t)]^T$
where $x_i(t)$ denotes the audio signal collected by the i-th microphone, K denotes the total number of frames of the signal collected by each microphone, and $[\,\cdot\,]^T$ denotes the transpose of a vector or matrix.
In some possible implementations, acquiring the N audio signals respectively collected by the N microphones includes:
placing the N microphones in a test room in which a speaker is arranged, the N microphones being located directly in front of the speaker;
controlling the speaker to play Gaussian white noise data or frequency sweep signal data, and controlling the N microphones to respectively collect the N audio signals.
In some possible implementations, the test room provides an anechoic chamber environment, the speaker is an artificial mouth dedicated to audio testing, and the artificial mouth is calibrated with a standard microphone before use.
In some possible implementations, before controlling the speaker to play Gaussian white noise data or frequency sweep signal data, the method further includes:
in a quiet environment, acquiring first audio data $X_1(n)$ collected by the N microphones within a first duration $T_1$;
in an environment in which Gaussian white noise data or frequency sweep signal data is played, acquiring second audio data $X_2(n)$ collected by the N microphones within a second duration $T_2$;
calculating the signal-to-noise ratio (SNR) according to the formula
[Figure PCTCN2018101766-appb-000011]
and ensuring that the SNR is greater than a first threshold.
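The SNR check can be illustrated as follows. The SNR formula is available here only as an image, so the form used below (ratio of the mean power of the recording made during playback to the mean power of the quiet recording, in dB) is an assumption, and the function names and the 30 dB threshold are hypothetical.

```python
import math


def mean_power(x):
    """Mean squared sample value of a recording."""
    return sum(v * v for v in x) / len(x)


def snr_db(quiet, playing):
    """Assumed SNR: power during playback over power in silence, in dB."""
    return 10 * math.log10(mean_power(playing) / mean_power(quiet))


def environment_ok(quiet, playing, threshold_db):
    """True when the measured SNR exceeds the first threshold."""
    return snr_db(quiet, playing) > threshold_db


# Usage: noise floor of amplitude 0.01 against a test signal of amplitude 1.
quiet = [0.01, -0.01] * 50      # first audio data X1(n), duration T1
playing = [1.0, -1.0] * 100     # second audio data X2(n), duration T2
snr = snr_db(quiet, playing)
```

Dividing each sum by its own length lets the two recordings have different durations $T_1$ and $T_2$ without biasing the ratio.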
In a second aspect, a device for evaluating the consistency of a microphone array is provided, including:
an obtaining unit, configured to obtain N audio signals respectively collected by N microphones, where the N microphones form a microphone array and N≥2;
a processing unit, configured to determine, according to the N audio signals, the phase spectrum difference and/or the power spectrum difference between a reference microphone and each of the N microphones other than the reference microphone, where the reference microphone is any one of the N microphones;
the processing unit being further configured to evaluate the consistency of the N microphones according to the phase spectrum difference and/or the power spectrum difference between the reference microphone and each of the N microphones other than the reference microphone.
In some possible implementations, the processing unit is specifically configured to:
evaluate, according to the phase spectrum difference between the reference microphone and each of the N microphones other than the reference microphone, the phase consistency between the corresponding microphone and the reference microphone.
In some possible implementations, the processing unit is further configured to:
measure, for each of the N microphones other than the reference microphone, the difference between its distance to the sound source and the distance from the reference microphone to the sound source;
calculate, according to the measured distance differences, the fixed phase difference between the reference microphone and each of the N microphones other than the reference microphone;
calibrate the corresponding phase spectrum difference according to the fixed phase difference between the reference microphone and each of the N microphones other than the reference microphone.
In some possible implementations, the processing unit is specifically configured to:
calculate the fixed phase difference between the reference microphone and each of the N microphones other than the reference microphone according to the formula
[Figure PCTCN2018101766-appb-000012]
where $Y_i(\omega)$ denotes the spectrum of the i-th microphone, $Y_1(\omega)$ denotes the spectrum of the reference microphone, $\omega$ denotes frequency, $d_i$ denotes the difference between the distances from the i-th microphone and from the reference microphone to the sound source, c denotes the speed of sound, and $2\pi\omega d_i/c$ denotes the fixed phase difference between the i-th microphone and the reference microphone.
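A hedged numerical sketch of the fixed phase term $2\pi\omega d_i/c$ and its use for calibration follows. The speed of sound of 343 m/s, the sign convention (subtracting the fixed term from the measured difference), the wrapping of the result, and the function names are assumptions of this sketch; the full formula is available here only as an image.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def fixed_phase_difference(freq_hz, distance_diff_m, c=SPEED_OF_SOUND):
    """Geometric phase offset 2*pi*f*d/c caused by unequal source distances."""
    return 2 * math.pi * freq_hz * distance_diff_m / c


def calibrate(measured_phase_diff, freq_hz, distance_diff_m, c=SPEED_OF_SOUND):
    """Remove the geometric term and wrap the residual to [-pi, pi]."""
    residual = measured_phase_diff - fixed_phase_difference(freq_hz, distance_diff_m, c)
    return math.atan2(math.sin(residual), math.cos(residual))


# Usage: a 3.43 cm path difference at 1 kHz is one tenth of a wavelength.
fixed = fixed_phase_difference(1000.0, 0.0343)
residual = calibrate(fixed + 0.05, 1000.0, 0.0343)
```

What remains after calibration is the part of the phase difference attributable to the microphones themselves rather than to their positions.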
In some possible implementations, the processing unit is specifically configured to:
evaluate, according to the power spectrum difference between the reference microphone and each of the N microphones other than the reference microphone, the amplitude consistency between the corresponding microphone and the reference microphone.
In some possible implementations, the N audio signals are signals collected in an environment in which frequency sweep signal data is played.
In some possible implementations, the N audio signals are signals collected in an environment in which Gaussian white noise data or frequency sweep signal data is played.
In some possible implementations, the frequency sweep signal is any one of a linear sweep signal, a logarithmic sweep signal, a linear stepped sweep signal, and a logarithmic stepped sweep signal.
In some possible implementations, the processing unit is specifically configured to:
divide each of the N audio signals into K signal frames of equal length, where K≥2;
perform windowing on each of the K signal frames to obtain K windowed signal frames;
perform an FFT on each of the K windowed signal frames to obtain K target signal frames;
determine, according to the K target signal frames corresponding to each audio signal, the phase spectrum difference and/or the power spectrum difference between the reference microphone and each of the N microphones other than the reference microphone.
In some possible implementations, any two adjacent frames among the K signal frames overlap by R%, where R > 0.
In some possible implementations, R is 25 or 50.
In some possible implementations, the i-th audio signal is divided into K signal frames of equal length, written in the following vector form:
$x_i(t) = [x_{i,1}(t), x_{i,2}(t), \ldots, x_{i,K}(t)]^T$
where $x_i(t)$ denotes the i-th audio signal, K denotes the total number of frames of the signal collected by each microphone, and $[\,\cdot\,]^T$ denotes the transpose of a vector or matrix.
In some possible implementations, the processing unit is specifically configured to:
determine the phase spectrum difference between the reference microphone and each of the N microphones other than the reference microphone according to the formula
[Figure PCTCN2018101766-appb-000013]
where imag() takes the imaginary part, ln() takes the natural logarithm, [Figure PCTCN2018101766-appb-000014] denotes the phase spectrum difference between the i-th microphone and the reference microphone, [Figure PCTCN2018101766-appb-000015] denotes the j-th target signal frame of the reference microphone, [Figure PCTCN2018101766-appb-000016] denotes the j-th target signal frame of the i-th microphone, and [Figure PCTCN2018101766-appb-000017] denotes the main frequency.
In some possible implementations, the processing unit is specifically configured to:
determine the power spectrum of each audio signal according to the K target signal frames corresponding to that audio signal;
determine, according to the power spectrum of each audio signal, the power spectrum difference between the reference microphone and each of the N microphones other than the reference microphone.
In some possible implementations, the processing unit is specifically configured to:
calculate the power spectrum of each audio signal according to the formula
[Figure PCTCN2018101766-appb-000018]
where $P_i(\omega)$ denotes the power spectrum of the i-th audio signal, $Y_{i,j}(\omega)$ denotes the j-th target signal frame of the i-th audio signal, K denotes the total number of frames of the signal collected by each microphone, and $\omega$ denotes frequency.
In some possible implementations, the processing unit is specifically configured to:
calculate the power spectrum difference between the reference microphone and each of the N microphones other than the reference microphone according to the formula $PD_i(\omega) = P_1(\omega) - P_i(\omega)$,
where $PD_i(\omega)$ denotes the power spectrum difference between the i-th microphone and the reference microphone, $P_1(\omega)$ denotes the power spectrum of the reference microphone, and $P_i(\omega)$ denotes the power spectrum of the i-th microphone.
In some possible implementations, the processing unit is specifically configured to:
determine the sampling frequency $F_s$ and the number of FFT points $N_{fft}$ used by the N microphones for audio signal collection, play Gaussian white noise data or frequency sweep signal data through a speaker, and control the N microphones to collect the N audio signals, where, if the data played by the speaker is frequency sweep signal data, the frequency sweep signal data consists of M+1 segments of equal length and different frequencies,
[Figure PCTCN2018101766-appb-000019]
In some possible implementations, the processing unit is further configured to:
calculate the frequency of each of the M+1 segments according to the formula
[Figure PCTCN2018101766-appb-000020]
and calculate each of the M+1 segments according to the formula $S_i(t) = \sin(2\pi f_i t)$,
where $f_i$ denotes the frequency of the i-th segment, $F_s$ denotes the sampling frequency, $N_{fft}$ denotes the number of FFT points, $S_i(t)$ denotes the i-th segment, and the length of $S_1(t)$ is an integer multiple of the period T, with $T = 1/f_1$.
In some possible implementations, the frequency sweep signal data played by the speaker is written in the following vector form:
$S(t) = [S_0(t), S_1(t), \ldots, S_M(t)]^T$
where $S(t)$ denotes the frequency sweep signal data played by the speaker, $S_i(t)$ denotes the i-th segment, [Figure PCTCN2018101766-appb-000021], and $[\,\cdot\,]^T$ denotes the transpose of a vector or matrix.
In some possible implementations, the N microphones respectively collect N audio signals, where the audio signal collected by the i-th microphone is denoted $x_i(t)$, and $x_i(t)$ can be written in the following vector form:
$x_i(t) = [x_{i,1}(t), x_{i,2}(t), \ldots, x_{i,K}(t)]^T$
where $x_i(t)$ denotes the audio signal collected by the i-th microphone, K denotes the total number of frames of the signal collected by each microphone, and $[\,\cdot\,]^T$ denotes the transpose of a vector or matrix.
In some possible implementations, the obtaining unit is specifically configured to:
place the N microphones in a test room in which a speaker is arranged, the N microphones being located directly in front of the speaker;
control the speaker to play Gaussian white noise data or frequency sweep signal data, and control the N microphones to respectively collect the N audio signals.
In some possible implementations, the test room provides an anechoic chamber environment, the speaker is an artificial mouth dedicated to audio testing, and the artificial mouth is calibrated with a standard microphone before use.
In some possible implementations, before the processing unit controls the speaker to play Gaussian white noise data or frequency sweep signal data, the obtaining unit is further configured to:
in a quiet environment, acquire first audio data $X_1(n)$ collected by the N microphones within a first duration $T_1$;
in an environment in which Gaussian white noise data or frequency sweep signal data is played, acquire second audio data $X_2(n)$ collected by the N microphones within a second duration $T_2$;
trigger the processing unit to calculate the signal-to-noise ratio (SNR) according to the formula
[Figure PCTCN2018101766-appb-000022]
and ensure that the SNR is greater than a first threshold.
In a third aspect, an apparatus for evaluating the consistency of a microphone array is provided, including:
a memory, configured to store programs and data; and
a processor, configured to call and run the programs and data stored in the memory;
the apparatus being configured to perform the method in the first aspect or any possible implementation thereof.
In a fourth aspect, a system for evaluating the consistency of a microphone array is provided, including:
N microphones forming a microphone array, where N≥2;
at least one audio source;
an apparatus including a memory for storing programs and data and a processor for calling and running the programs and data stored in the memory, the apparatus being configured to perform the method in the first aspect or any possible implementation thereof.
In a fifth aspect, a computer storage medium is provided, the computer storage medium storing program code that can be used to instruct execution of the method in the first aspect or any possible implementation thereof.
In a sixth aspect, a computer program product containing instructions is provided which, when run on a computer, causes the computer to execute the method in the first aspect or any possible implementation thereof.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic flowchart of a method for evaluating the consistency of a microphone array according to an embodiment of the present application.
FIG. 2 is a schematic diagram of a test environment according to an embodiment of the present application.
FIG. 3 is a schematic diagram of calculating a phase spectrum difference according to an embodiment of the present application.
FIG. 4 is a schematic diagram of calculating a power spectrum difference according to an embodiment of the present application.
FIG. 5 is a schematic diagram of the phase spectrum difference between two microphones according to an embodiment of the present application.
FIG. 6 is a schematic diagram of the phase spectrum difference between two microphones after calibration according to an embodiment of the present application.
FIG. 7a is a schematic diagram of the power spectra of two microphones according to an embodiment of the present application.
FIG. 7b is a schematic diagram of the power spectrum difference between two microphones according to an embodiment of the present application.
FIG. 8 is a schematic structural diagram of a device for evaluating the consistency of a microphone array according to an embodiment of the present application.
FIG. 9 is a schematic structural diagram of an apparatus for evaluating the consistency of a microphone array according to an embodiment of the present application.
FIG. 10 is a schematic structural diagram of a system for evaluating the consistency of a microphone array according to an embodiment of the present application.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present application are described clearly below with reference to the drawings in the embodiments of the present application.
A microphone array is a system composed of a certain number of microphones (acoustic sensors) that samples and processes the spatial characteristics of a sound field. By using the difference between the phases of the sound waves received by two microphones to filter the sound waves, ambient background sound can be removed to the greatest extent, leaving only the desired sound.
Multi-channel speech enhancement algorithms assume that the target speech components at the multiple microphones of the array are highly correlated and that the target speech is uncorrelated with non-target interference. The consistency between different microphones in the array therefore directly affects algorithm performance.
Quantitative evaluation of microphone consistency can guide the design of microphones and of microphone arrays. The circuitry, electronic components, and acoustic structure of a microphone array all affect microphone consistency; when designing an array, the effect of each factor on consistency can be tested item by item so that the consistency of the design meets the system requirements.
Quantitative evaluation of microphone consistency can also be used to compare the robustness of different algorithms: for the same speech enhancement performance, the lower the consistency an algorithm requires, the more robust it is.
In the embodiments of the present application, consistency is measured in terms of both the amplitude spectrum difference and the phase spectrum difference, which is objective and accurate. A quantitative consistency evaluation method can objectively guide the design of a microphone array and can also objectively compare the robustness of multi-channel speech enhancement algorithms.
The method for evaluating the consistency of a microphone array according to the embodiments of the present application is described in detail below with reference to FIG. 1 to FIG. 7.
FIG. 1 is a schematic flowchart of a method for evaluating the consistency of a microphone array according to an embodiment of the present application. It should be understood that FIG. 1 shows steps or operations of the method, but these are merely examples; the embodiments of the present application may also perform other operations or variations of the operations in FIG. 1. The method may be executed by an apparatus for evaluating the consistency of a microphone array, which may be a mobile phone, a tablet computer, a portable computer, a personal digital assistant (PDA), or the like.
S110,获取N个麦克风分别采集的N个音频信号,该N个麦克风构成麦克风阵列,N≥2。S110: Obtain N audio signals collected by N microphones respectively, where the N microphones form a microphone array, and N ≧ 2.
在对N个麦克风进行一致性评估时,需要限制N个麦克风所处的环境,即该N个音频信号是在特殊的测试环境下采集的。When performing consistency evaluation on N microphones, it is necessary to limit the environment in which the N microphones are located, that is, the N audio signals are collected in a special test environment.
具体地,如图2所示,将由该N个麦克风构成的麦克风阵列201放置于测试房间202内,且在该测试房间202内配置有扬声器203,该麦克风阵列201具体位于该扬声器203的正前方,该麦克风阵列201与该扬声器203连接诸如计算机的控制设备204。该控制设备204可以控制该扬声器203播放特定的音频数据,例如,播放高斯白噪声数据或者扫频信号数据,同时,该 控制设备204可以从该麦克风阵列201处获取该N个麦克风分布采集的N个音频信号。Specifically, as shown in FIG. 2, a microphone array 201 composed of the N microphones is placed in a test room 202, and a speaker 203 is disposed in the test room 202. The microphone array 201 is located directly in front of the speaker 203. The microphone array 201 is connected to the speaker 203, such as a computer control device 204. The control device 204 can control the speaker 203 to play specific audio data, for example, to play Gaussian white noise data or frequency-sweep signal data. At the same time, the control device 204 can obtain the N microphones from the microphone array 201. Audio signals.
需要注意的是,麦克风一致性评估要求采集的音频信号的信噪比足够高,背景噪声足够弱,因此测试环境要求在安静环境下。特别地,测试房间202内要求具有消音室环境。扬声器203要求信噪比较高,且频率响应曲线平坦,特别地,扬声器使用音频测试专用人工嘴,且使用之前用标准麦克风校准。麦克风阵列201放置在扬声器203的正前方,特别地,要求放置在标准麦克风校准的位置。It should be noted that the microphone consistency evaluation requires that the signal-to-noise ratio of the collected audio signal is sufficiently high and the background noise is sufficiently weak, so the test environment is required to be in a quiet environment. In particular, an anechoic room environment is required in the test room 202. The speaker 203 requires a high signal-to-noise ratio and a flat frequency response curve. In particular, the speaker uses an artificial mouth dedicated for audio testing, and is calibrated with a standard microphone before use. The microphone array 201 is placed directly in front of the speaker 203, and in particular, it is required to be placed at a position calibrated by a standard microphone.
可选地,在进行正式的音频信号采集之前,还需要对上述测试环境进行信噪比(signal-to-noise ratio,SNR)检测。Optionally, before performing formal audio signal acquisition, it is also necessary to perform signal-to-noise ratio (SNR) detection on the above-mentioned test environment.
具体地,在如图2所示的测试环境下,首先,在安静的环境下(即扬声器203处于关闭状态),获取该N个麦克风在第一时长T 1内采集的第一音频数据X 1(n);然后,在播放高斯白噪声数据或者扫频信号数据的环境下(即该控制设备204控制该扬声器203播放高斯白噪声数据或者扫频信号数据),获取该N个麦克风在第二时长T 2内采集的第二音频数据X 2(n);接着,根据如下公式1计算SNR;最后,当SNR大于设定阈值时,则检测通过,否则检测不通过。 Specifically, in the test environment shown in FIG. 2, first, in a quiet environment (that is, the speaker 203 is turned off), first audio data X 1 collected by the N microphones within a first duration T 1 is acquired. (n); then, in the environment where Gaussian white noise data or frequency-sweep signal data is played (that is, the control device 204 controls the speaker 203 to play Gaussian white noise data or frequency-sweep signal data), obtain the N microphones at the second The second audio data X 2 (n) collected within the duration T 2 ; then, the SNR is calculated according to the following formula 1; finally, when the SNR is greater than a set threshold, the detection passes, otherwise the detection fails.
Figure PCTCN2018101766-appb-000023 (Formula 1)
where $T_1$ denotes the first duration, $T_2$ denotes the second duration, $X_1(n)$ denotes the first audio data, and $X_2(n)$ denotes the second audio data.
It should be noted that if the check fails, the test environment needs to be adjusted or calibrated to eliminate factors that may affect the signal-to-noise ratio, until the SNR calculated according to Formula 1 is greater than the set threshold.
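As a non-limiting illustration, the SNR check described above might be sketched as follows in Python with NumPy. Formula 1 is available here only as an image, so the sketch assumes a common definition: the ratio of the mean power recorded while the speaker plays to the mean power recorded in silence, expressed in decibels. The function names and the threshold value are hypothetical.

```python
import numpy as np

def estimate_snr_db(quiet, playing):
    """Mean power with the speaker on over mean power with it off, in dB.

    Assumed definition; the source's Formula 1 may differ in detail.
    quiet   : samples X1(n) recorded over T1 with the speaker off
    playing : samples X2(n) recorded over T2 with the speaker on
    """
    quiet = np.asarray(quiet, dtype=float)
    playing = np.asarray(playing, dtype=float)
    p_quiet = np.mean(quiet ** 2)      # normalised by the T1 sample count
    p_playing = np.mean(playing ** 2)  # normalised by the T2 sample count
    return 10.0 * np.log10(p_playing / p_quiet)

def environment_passes(quiet, playing, threshold_db):
    """The check passes only when the SNR exceeds the set threshold."""
    return bool(estimate_snr_db(quiet, playing) > threshold_db)
```

Because each recording is normalised by its own length, the two durations need not be equal. A recording of exact digital silence would make the ratio undefined, so a real test rig would likely guard against that case.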
Optionally, in the embodiments of this application, collecting audio signals in the test environment shown in FIG. 2 may specifically include:
determining the sampling frequency $F_s$ and the number of FFT points $N_{fft}$ used by the N microphones for audio signal acquisition, playing Gaussian white noise data or frequency-sweep signal data through the speaker, and collecting the N audio signals with the N microphones.
Optionally, the number of FFT points $N_{fft}$ is even, typically 32, 64, 128, ..., 1024; the more points there are, the greater the computational saving that the FFT offers over a direct DFT.
It should be noted that if the data played by the speaker is frequency-sweep signal data, the frequency-sweep signal data consists of M+1 signal segments of equal length and different frequencies, where (Figure PCTCN2018101766-appb-000024).
Optionally, the frequency of each of the M+1 signal segments can be calculated according to Formula 2 below, and each of the M+1 signal segments can be calculated according to Formula 3 below.
Figure PCTCN2018101766-appb-000025 (Formula 2)
where $f_i$ is the frequency of the i-th signal segment, $F_s$ is the sampling frequency, and $N_{fft}$ is the number of FFT points.
$S_i(t) = \sin(2\pi f_i t)$ (Formula 3)
where $S_i(t)$ denotes the i-th signal segment and $f_i$ is the frequency of the i-th signal segment.
It should be noted that the length of the first signal segment $S_1(t)$ is an integer multiple of the period $T$, where $T = 1/f_1$.
Optionally, the frequency-sweep signal data played by the speaker can be written in the following vector form:
$S(t) = [S_0(t), S_1(t), \ldots, S_M(t)]^{T}$
where $S(t)$ denotes the frequency-sweep signal data played by the speaker, $S_i(t)$ denotes the i-th signal segment, (Figure PCTCN2018101766-appb-000026), and $[\cdot]^{T}$ denotes the transpose of a vector or matrix.
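As a non-limiting illustration, a stepped sweep built from equal-length sine segments $S_i(t) = \sin(2\pi f_i t)$ might be generated as sketched below. Formula 2 is available here only as an image, so the sketch takes the step frequencies as an input rather than computing them. The example frequencies (FFT bin centres) and the segment length are hypothetical choices, not values taken from the source.

```python
import numpy as np

def stepped_sweep(freqs_hz, fs, seg_len):
    """Concatenate equal-length segments S_i(t) = sin(2*pi*f_i*t).

    freqs_hz : step frequencies f_i in Hz (supplied by the caller)
    fs       : sampling frequency F_s in Hz
    seg_len  : samples per segment, the same for every segment
    """
    freqs = np.asarray(freqs_hz, dtype=float)
    t = np.arange(seg_len) / fs                      # time axis of one segment
    segments = np.sin(2.0 * np.pi * freqs[:, None] * t[None, :])
    return segments.reshape(-1)                      # segment 0, then 1, ...

# Hypothetical example: one step per FFT bin centre, excluding DC and Nyquist.
fs, n_fft = 16000, 64
freqs = np.arange(1, n_fft // 2) * fs / n_fft
# 4 * n_fft samples is four whole periods of the lowest step (fs / n_fft).
sweep = stepped_sweep(freqs, fs, seg_len=4 * n_fft)
```

Choosing a segment length that is a whole number of periods of the lowest step frequency mirrors the requirement that the length of $S_1(t)$ be an integer multiple of $T = 1/f_1$. When every step is a multiple of that lowest frequency, every segment also contains whole periods.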
Optionally, the N microphones respectively collect N audio signals, where the audio signal collected by the i-th microphone is denoted $x_i(t)$, and $x_i(t)$ can be written in the following vector form:
$x_i(t) = [x_{i,1}(t), x_{i,2}(t), \ldots, x_{i,K}(t)]^{T}$
where $x_i(t)$ denotes the audio signal collected by the i-th microphone, $K$ denotes the total number of signal frames collected by each microphone, and $[\cdot]^{T}$ denotes the transpose of a vector or matrix.
S120: according to the N audio signals, determine a phase spectrum difference and/or a power spectrum difference between each of the N microphones other than a reference microphone and the reference microphone, where the reference microphone is any one of the N microphones.
Optionally, in the embodiments of this application, after the N audio signals have been collected, the phase spectrum difference between different microphones can be obtained by dividing each audio signal into frames, windowing each frame, and applying an FFT to each windowed frame.
Specifically, as shown in FIG. 3, assume that the N audio signals are $x_1(t), x_2(t), \ldots, x_N(t)$. Each of the N audio signals is divided into K signal frames of equal length, $K \geq 2$. For example, the K equal-length signal frames obtained by framing the i-th audio signal can be written in the following vector form:
$x_i(t) = [x_{i,1}(t), x_{i,2}(t), \ldots, x_{i,K}(t)]^{T}$
where $x_i(t)$ denotes the i-th audio signal, $K$ denotes the total number of signal frames collected by each microphone, and $[\cdot]^{T}$ denotes the transpose of a vector or matrix;
each of the K signal frames is windowed to obtain K windowed signal frames; for example, the j-th frame $x_{i,j}$ of the i-th audio signal is windowed to obtain the j-th windowed signal frame of the i-th audio signal, $y_{i,j} = x_{i,j} \times \mathrm{Win}$;
an FFT is applied to each of the K windowed signal frames to obtain K target signal frames; for example, an FFT is applied to the j-th windowed signal frame $y_{i,j}(t)$ of the i-th audio signal to obtain the j-th target signal frame $Y_{i,j}(\omega)$ of the i-th audio signal;
according to the K target signal frames corresponding to each audio signal, the phase spectrum difference between each of the N microphones other than the reference microphone and the reference microphone is determined. For example, assuming that the main frequency of the j-th target signal frame is (Figure PCTCN2018101766-appb-000027), the phase spectrum difference between the i-th microphone and the reference microphone at the main frequency (Figure PCTCN2018101766-appb-000028) can be calculated according to Formula 4 below.
Figure PCTCN2018101766-appb-000029 (Formula 4)
where imag() denotes taking the imaginary part, ln() denotes taking the natural logarithm, (Figure PCTCN2018101766-appb-000030) denotes the phase spectrum difference between the i-th microphone and the reference microphone, (Figure PCTCN2018101766-appb-000031) denotes the j-th target signal frame of the reference microphone, (Figure PCTCN2018101766-appb-000032) denotes the j-th target signal frame of the i-th microphone, and (Figure PCTCN2018101766-appb-000033) denotes the main frequency.
It should be noted that in FIG. 3 the first microphone is used as the reference microphone; that is, the phase spectrum difference between the first microphone and each of the other microphones is calculated separately, where the first microphone corresponds to the audio signal $x_1(t)$, the second microphone corresponds to the audio signal $x_2(t)$, ..., and the N-th microphone corresponds to the audio signal $x_N(t)$.
Optionally, $K$ denotes the total number of signal frames received by each microphone.
It should be noted that windowing is used to eliminate the truncation effect introduced by framing. Optionally, a Hamming window may be applied to each of the K signal frames.
In some possible implementations, any two adjacent signal frames among the K signal frames overlap by R%, $R > 0$. For example, R is 25 or 50; in other words, any two adjacent signal frames among the K signal frames overlap by 25% or 50%.
Optionally, the signal amplitude remains unchanged after overlapping and windowing.
It should be understood that after overlapping, each frame contains part of the previous frame, which prevents discontinuities between adjacent frames.
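As a non-limiting illustration, the framing, windowing, and FFT steps described above might be sketched as follows. The function name, the use of a one-sided real FFT, and the way the hop size is derived from the overlap percentage are assumptions made for this sketch.

```python
import numpy as np

def framed_spectra(x, frame_len, overlap_percent=50):
    """Split x into K equal-length overlapping frames, window each, FFT each.

    Returns a complex array of shape (K, frame_len // 2 + 1) holding one
    one-sided spectrum Y_j per frame.
    """
    x = np.asarray(x, dtype=float)
    # Adjacent frames share overlap_percent of their samples.
    hop = frame_len - (frame_len * overlap_percent) // 100
    if len(x) < frame_len or hop <= 0:
        raise ValueError("signal too short or overlap too large")
    k = 1 + (len(x) - frame_len) // hop              # number of full frames
    frames = np.stack([x[j * hop: j * hop + frame_len] for j in range(k)])
    window = np.hamming(frame_len)                   # reduces truncation effects
    return np.fft.rfft(frames * window, axis=1)
```

With `frame_len=64` and a 50% overlap the hop is 32 samples, so a 1024-sample recording yields 31 frames; a 25% overlap gives a hop of 48 samples and 21 frames.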
Optionally, in the embodiments of this application, when phase consistency is evaluated, the N audio signals are signals collected while frequency-sweep signal data is being played. In other words, when the above phase spectrum difference is calculated, the N audio signals are signals collected while frequency-sweep signal data is being played.
Therefore, the phase difference at any frequency $\omega$ can be calculated; that is, the phase spectrum difference $PDiff_i(\omega)$ between the i-th microphone and the reference microphone is obtained, namely the above (Figure PCTCN2018101766-appb-000034).
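As a non-limiting illustration, the phase spectrum difference at the main frequency of each frame might be computed as sketched below. Formula 4 is available here only as an image; the sketch relies on the stated use of imag() and ln(), and on the fact that the imaginary part of the natural logarithm of a complex number is its phase angle. Taking the ratio as reference over microphone i is an assumption, as is choosing the main frequency as the strongest bin of the reference frame.

```python
import numpy as np

def phase_difference(Y_ref, Y_mic):
    """Phase difference at the dominant bin of each frame.

    Y_ref, Y_mic : complex arrays of shape (K, n_bins), one spectrum per frame.
    Assumption: the ratio is reference over microphone i; the source's
    Formula 4 may use the opposite order, which only flips the sign.
    """
    Y_ref = np.asarray(Y_ref)
    Y_mic = np.asarray(Y_mic)
    main_bin = np.argmax(np.abs(Y_ref), axis=1)   # dominant bin of each frame
    rows = np.arange(Y_ref.shape[0])
    ratio = Y_ref[rows, main_bin] / Y_mic[rows, main_bin]
    # imag(ln(z)) is the phase angle of z, in (-pi, pi]
    return main_bin, np.imag(np.log(ratio))
```

Under this convention a positive value means that microphone i lags the reference at that frequency. Collecting the values frame by frame over a stepped sweep gives the phase spectrum difference as a function of frequency.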
Optionally, in the embodiments of this application, after the N audio signals have been collected, the power spectrum difference between different microphones can be obtained by dividing each audio signal into frames, windowing each frame, applying an FFT to each windowed frame, and computing the power spectrum of the transformed frames.
Specifically, as shown in FIG. 4, assume that the N audio signals are $x_1(t), x_2(t), \ldots, x_N(t)$. Each of the N audio signals is divided into K signal frames of equal length, $K \geq 2$. For example, the K equal-length signal frames obtained by framing the i-th audio signal can be written in the following vector form:
$x_i(t) = [x_{i,1}(t), x_{i,2}(t), \ldots, x_{i,K}(t)]^{T}$
where $x_i(t)$ denotes the i-th audio signal, $K$ denotes the total number of signal frames received by each microphone, and $[\cdot]^{T}$ denotes the transpose of a vector or matrix;
each of the K signal frames is windowed to obtain K windowed signal frames; for example, the j-th frame $x_{i,j}$ of the i-th audio signal is windowed to obtain the j-th windowed signal frame of the i-th audio signal, $y_{i,j} = x_{i,j} \times \mathrm{Win}$;
an FFT is applied to each of the K windowed signal frames to obtain K target signal frames; for example, an FFT is applied to the j-th windowed signal frame $y_{i,j}(t)$ of the i-th audio signal to obtain the j-th target signal frame $Y_{i,j}(\omega)$ of the i-th audio signal;
according to the K target signal frames corresponding to each audio signal, the power spectrum of each audio signal is determined; for example, the power spectrum of the i-th audio signal is calculated according to Formula 5 below;
according to the power spectrum of each audio signal, the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone is determined; for example, the power spectrum difference between the i-th microphone and the reference microphone is calculated according to Formula 6 below.
Figure PCTCN2018101766-appb-000035 (Formula 5)
where $P_i(\omega)$ denotes the power spectrum of the i-th audio signal, $Y_{i,j}(\omega)$ denotes the j-th target signal frame of the i-th audio signal, $\omega$ denotes the frequency, and $K$ denotes the total number of signal frames collected by each microphone.
$PD_i(\omega) = P_1(\omega) - P_i(\omega)$ (Formula 6)
where $PD_i(\omega)$ denotes the power spectrum difference between the i-th microphone and the reference microphone, $P_1(\omega)$ denotes the power spectrum of the reference microphone, and $P_i(\omega)$ denotes the power spectrum of the i-th microphone.
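As a non-limiting illustration, the power spectrum and the power spectrum difference might be computed as sketched below. Formula 5 is available here only as an image, so the sketch assumes that the power spectrum is the squared magnitude averaged over the K frames. Expressing it in decibels is a further assumption, suggested by the decibel values quoted for FIG. 7. The difference follows Formula 6, reference minus microphone i.

```python
import numpy as np

def power_spectrum_db(Y, eps=1e-20):
    """Average |Y_j(w)|^2 over the K frames and convert to dB.

    Y : complex array of shape (K, n_bins), one spectrum per frame.
    The frame averaging and the dB scale are assumptions of this sketch;
    eps only guards against log10(0) in empty bins.
    """
    p = np.mean(np.abs(np.asarray(Y)) ** 2, axis=0)
    return 10.0 * np.log10(p + eps)

def power_difference_db(Y_ref, Y_mic):
    """PD_i(w) = P_1(w) - P_i(w): reference minus microphone i (Formula 6)."""
    return power_spectrum_db(Y_ref) - power_spectrum_db(Y_mic)
```

On a decibel scale, a microphone whose output amplitude is half that of the reference at every frequency gives a flat difference of about 6.02 dB.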
It should be noted that in FIG. 4 the first microphone is used as the reference microphone; that is, the power spectrum difference between the first microphone and each of the other microphones is calculated separately, where the first microphone corresponds to the audio signal $x_1(t)$, the second microphone corresponds to the audio signal $x_2(t)$, ..., and the N-th microphone corresponds to the audio signal $x_N(t)$.
It should be noted that windowing is used to eliminate the truncation effect introduced by framing. Optionally, a Hamming window may be applied to each of the K signal frames.
In some possible implementations, any two adjacent signal frames among the K signal frames overlap by R%, $R > 0$. For example, R is 25 or 50; in other words, any two adjacent signal frames among the K signal frames overlap by 25% or 50%.
Optionally, the signal amplitude remains unchanged after overlapping and windowing.
It should be understood that after overlapping, each frame contains part of the previous frame, which prevents discontinuities between adjacent frames.
Optionally, in the embodiments of this application, when amplitude consistency is evaluated, the N audio signals are signals collected while Gaussian white noise data or frequency-sweep signal data is being played. In other words, when the above power spectrum difference is calculated, the N audio signals are signals collected while Gaussian white noise data or frequency-sweep signal data is being played.
S130: perform consistency evaluation on the N microphones according to the phase spectrum difference and/or the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone.
Specifically, the phase spectrum difference is used for phase consistency evaluation, and the power spectrum difference is used for amplitude consistency evaluation.
Optionally, in the embodiments of this application, the phase consistency between each of the N microphones other than the reference microphone and the reference microphone is evaluated according to the phase spectrum difference between that microphone and the reference microphone.
It should be noted that the smaller the phase spectrum difference between two microphones, the better the phase consistency between them.
For example, if the phase spectrum difference between microphone 1 and the reference microphone is A, then the smaller A is, the better the phase consistency between microphone 1 and the reference microphone.
Optionally, a threshold can be set. If the phase spectrum difference between two microphones is smaller than this threshold, the phase consistency between the two microphones meets the design requirements, and the effect of the consistency between the two microphones on the multi-channel speech enhancement algorithm is negligible or absent.
It should be noted that the above threshold can be configured flexibly for different multi-channel speech enhancement algorithms.
It should be noted that, because it is difficult to make the distances from different microphones to the sound source exactly equal during data collection, there is a fixed phase difference between different microphones.
Optionally, in the embodiments of this application, the above phase spectrum difference can be calibrated using the fixed phase difference.
Specifically, the difference between the distance from each of the N microphones other than the reference microphone to the sound source and the distance from the reference microphone to the sound source is measured; for example, $d_i$ denotes the difference between the distances from the i-th microphone and from the reference microphone to the sound source;
according to the measured distance differences, the fixed phase difference between each of the N microphones other than the reference microphone and the reference microphone is calculated; for example, the fixed phase difference between the i-th microphone and the reference microphone can be calculated according to Formula 7 below;
according to the fixed phase difference between each of the N microphones other than the reference microphone and the reference microphone, the corresponding phase spectrum difference is calibrated.
Figure PCTCN2018101766-appb-000036 (Formula 7)
where $Y_i(\omega)$ denotes the spectrum of the i-th microphone, $Y_1(\omega)$ denotes the spectrum of the reference microphone, $\omega$ denotes the frequency, $d_i$ denotes the difference between the distances from the i-th microphone and from the reference microphone to the sound source, $c$ denotes the speed of sound, and $2\pi\omega d_i/c$ denotes the fixed phase difference between the i-th microphone and the reference microphone.
It should be noted that the fixed phase difference is linear in the signal frequency, so the fixed phase difference can be determined by linear fitting.
For example, let the fixed phase difference between microphone 1 and the reference microphone be A, and let the phase spectrum difference between microphone 1 and the reference microphone be B. In FIG. 5, the straight line shows the fitted fixed phase difference between microphone 1 and the reference microphone, and the curve shows the phase spectrum difference between them; overall, as the frequency increases from 0 Hz to 8000 Hz, the phase spectrum difference decreases from 0 radians to -2 radians. After calibration, the phase spectrum difference between microphone 1 and the reference microphone is C, shown as the curve in FIG. 6, where C = B - A; overall, as the frequency increases from 0 Hz to 8000 Hz, the calibrated phase spectrum difference fluctuates around 0 radians within ±0.5 radians.
A comparison of FIG. 5 and FIG. 6 shows that the fixed phase difference has a large effect on the phase spectrum difference between two microphones. Therefore, when the phase consistency of two microphones is evaluated, the effect of the fixed phase difference between them needs to be eliminated.
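As a non-limiting illustration, the calibration C = B - A might be sketched as follows, with the fixed phase difference A estimated by fitting a straight line to the measured phase spectrum difference B. Constraining the line to pass through the origin (zero phase difference at 0 Hz) and unwrapping the phase before fitting are assumptions of this sketch; the function name is hypothetical.

```python
import numpy as np

def remove_fixed_phase(freqs_hz, phase_diff):
    """Fit a line through the origin to phase_diff versus frequency, subtract it.

    freqs_hz   : frequencies in ascending order, in Hz
    phase_diff : measured phase spectrum difference B at those frequencies, rad
    Returns (C, slope): the calibrated difference C = B - A, where
    A = slope * f is the fitted fixed phase difference, and the slope in rad/Hz.
    """
    f = np.asarray(freqs_hz, dtype=float)
    b = np.unwrap(np.asarray(phase_diff, dtype=float))  # undo 2*pi jumps
    slope = np.dot(f, b) / np.dot(f, f)                 # least squares, no intercept
    return b - slope * f, slope
```

For a measured difference that falls linearly from 0 radians at 0 Hz to -2 radians at 8000 Hz, the fitted slope is -2.5e-4 rad/Hz and the calibrated difference is zero at every frequency.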
Optionally, in the embodiments of this application, the amplitude consistency between each of the N microphones other than the reference microphone and the reference microphone is evaluated according to the power spectrum difference between that microphone and the reference microphone.
It should be noted that the smaller the power spectrum difference between two microphones, the better the amplitude consistency between them.
For example, as shown in FIG. 7, FIG. 7a shows the power spectrum of microphone 1 and the power spectrum of the reference microphone, and FIG. 7b shows the power spectrum difference between microphone 1 and the reference microphone. The two power spectra differ only slightly, and the power spectrum difference stays within ±1 decibel (dB).
Optionally, a threshold can be set. If the power spectrum difference between two microphones is smaller than this threshold, the amplitude consistency between the two microphones meets the design requirements, and the effect of the consistency between the two microphones on the multi-channel speech enhancement algorithm is negligible or absent.
It should be noted that the above threshold can be configured flexibly for different multi-channel speech enhancement algorithms.
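As a non-limiting illustration, the threshold comparison might be sketched as follows. Interpreting "smaller than the threshold" as the largest absolute difference over all evaluated frequencies is an assumption of this sketch, and the function name and the example threshold are hypothetical.

```python
import numpy as np

def is_consistent(diff, threshold):
    """True when the largest absolute difference stays below the threshold.

    diff      : phase spectrum difference (rad) or power spectrum difference (dB)
    threshold : limit chosen for the target multi-channel enhancement algorithm
    """
    return bool(np.max(np.abs(np.asarray(diff, dtype=float))) < threshold)

# Hypothetical example: a 1 dB limit on the power spectrum difference.
print(is_consistent([0.2, -0.8, 0.5], threshold=1.0))
```

The same function can serve the phase check and the amplitude check; only the units of the threshold differ.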
Optionally, in the embodiments of this application, the influence of factors such as the circuit, electronic components, and acoustic structure of the microphone array on microphone consistency can be tested item by item, so as to guide the calibration of the microphone array. Specifically, this can guide the design of the microphones and of the microphone array, and support evaluating the robustness of multi-channel enhancement algorithms.
Therefore, in the embodiments of this application, the phase spectrum difference and/or the power spectrum difference between each microphone and the reference microphone can be determined according to the N audio signals respectively collected by the N microphones, so that consistency evaluation is performed on the N microphones, the effect of the consistency between microphones on multi-channel speech enhancement algorithms is eliminated, and the user experience is improved.
Optionally, as shown in FIG. 8, an embodiment of this application provides a device 800 for evaluating microphone array consistency, including:
an acquisition unit 810, configured to acquire N audio signals respectively collected by N microphones, where the N microphones form a microphone array and $N \geq 2$;
a processing unit 820, configured to determine, according to the N audio signals, a phase spectrum difference and/or a power spectrum difference between each of the N microphones other than a reference microphone and the reference microphone, where the reference microphone is any one of the N microphones;
the processing unit 820 is further configured to perform consistency evaluation on the N microphones according to the phase spectrum difference and/or the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone.
Optionally, the processing unit 820 is specifically configured to:
evaluate the phase consistency between each of the N microphones other than the reference microphone and the reference microphone according to the phase spectrum difference between that microphone and the reference microphone.
Optionally, the processing unit 820 is further configured to:
measure the difference between the distance from each of the N microphones other than the reference microphone to the sound source and the distance from the reference microphone to the sound source;
calculate, according to the measured distance differences, the fixed phase difference between each of the N microphones other than the reference microphone and the reference microphone;
calibrate the corresponding phase spectrum difference according to the fixed phase difference between each of the N microphones other than the reference microphone and the reference microphone.
Optionally, the processing unit 820 is specifically configured to:
calculate, according to the formula (Figure PCTCN2018101766-appb-000037), the fixed phase difference between each of the N microphones other than the reference microphone and the reference microphone,
where $Y_i(\omega)$ denotes the spectrum of the i-th microphone, $Y_1(\omega)$ denotes the spectrum of the reference microphone, $\omega$ denotes the frequency, $d_i$ denotes the difference between the distances from the i-th microphone and from the reference microphone to the sound source, $c$ denotes the speed of sound, and $2\pi\omega d_i/c$ denotes the fixed phase difference between the i-th microphone and the reference microphone.
Optionally, the processing unit 820 is specifically configured to:
evaluate the amplitude consistency between each of the N microphones other than the reference microphone and the reference microphone according to the power spectrum difference between that microphone and the reference microphone.
Optionally, the N audio signals are signals collected while frequency-sweep signal data is being played.
Optionally, the N audio signals are signals collected while Gaussian white noise data or frequency-sweep signal data is being played.
Optionally, the frequency-sweep signal is any one of a linear frequency-sweep signal, a logarithmic frequency-sweep signal, a linear stepped frequency-sweep signal, and a logarithmic stepped frequency-sweep signal.
Optionally, the processing unit 820 is specifically configured to:
divide each of the N audio signals into K signal frames of equal length, $K \geq 2$;
window each of the K signal frames to obtain K windowed signal frames;
apply an FFT to each of the K windowed signal frames to obtain K target signal frames;
determine, according to the K target signal frames corresponding to each audio signal, the phase spectrum difference and/or the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone.
Optionally, any two adjacent signal frames among the K signal frames overlap by R%, $R > 0$.
Optionally, R is 25 or 50.
Optionally, the K equal-length signal frames obtained by framing the i-th audio signal are written in the following vector form:
$x_i(t) = [x_{i,1}(t), x_{i,2}(t), \ldots, x_{i,K}(t)]^{T}$
where $x_i(t)$ denotes the i-th audio signal, $K$ denotes the total number of signal frames collected by each microphone, and $[\cdot]^{T}$ denotes the transpose of a vector or matrix.
Optionally, the processing unit 820 is specifically configured to:
determine, according to the formula (Figure PCTCN2018101766-appb-000038), the phase spectrum difference between each of the N microphones other than the reference microphone and the reference microphone,
where imag() denotes taking the imaginary part, ln() denotes taking the natural logarithm, (Figure PCTCN2018101766-appb-000039) denotes the phase spectrum difference between the i-th microphone and the reference microphone, (Figure PCTCN2018101766-appb-000040) denotes the j-th target signal frame of the reference microphone, (Figure PCTCN2018101766-appb-000041) denotes the j-th target signal frame of the i-th microphone, and (Figure PCTCN2018101766-appb-000042) denotes the main frequency.
Optionally, the processing unit 820 is specifically configured to:
determine a power spectrum of each audio signal according to the K target signal frames corresponding to that audio signal; and
determine, according to the power spectrum of each audio signal, a power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone.
Optionally, the processing unit 820 is specifically configured to:
calculate the power spectrum of each audio signal according to the formula
Figure PCTCN2018101766-appb-000043
where P_i(ω) denotes the power spectrum of the i-th audio signal, Y_{i,j}(ω) denotes the j-th target signal frame of the i-th audio signal, K denotes the total number of frames of the signal collected by each microphone, and ω denotes frequency.
Optionally, the processing unit 820 is specifically configured to:
calculate the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone according to the formula PD_i(ω) = P_1(ω) - P_i(ω),
where PD_i(ω) denotes the power spectrum difference between the i-th microphone and the reference microphone, P_1(ω) denotes the power spectrum of the reference microphone, and P_i(ω) denotes the power spectrum of the i-th microphone.
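As an illustration of the two steps above, the following sketch computes a per-bin power spectrum from K target signal frames and then the difference PD_i(ω) = P_1(ω) - P_i(ω). The power spectrum formula is contained in a figure that is not reproduced here, so the averaged squared magnitude used below is an assumption; if the original formula is expressed in decibels, a logarithm would have to be applied before subtracting. The function names are hypothetical.

```python
def power_spectrum(frames):
    """Per-bin power, assumed here to be the mean of |Y_ij(w)|^2 over K frames."""
    k = len(frames)
    n_bins = len(frames[0])
    return [sum(abs(frame[b]) ** 2 for frame in frames) / k for b in range(n_bins)]


def power_spectrum_difference(p_ref, p_mic):
    """PD_i(w) = P_1(w) - P_i(w), evaluated bin by bin."""
    return [p1 - pi for p1, pi in zip(p_ref, p_mic)]
```

A microphone that matches the reference in amplitude would give values close to zero in every bin.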
Optionally, the processing unit 820 is specifically configured to:
determine the sampling frequency F_s and the number of FFT points N_fft used by the N microphones for audio signal collection, play Gaussian white noise data or frequency-sweep signal data through a speaker, and control the N microphones to collect the N audio signals, where, if the data played by the speaker is frequency-sweep signal data, the frequency-sweep signal data consists of M+1 segments of equal length and different frequencies,
Figure PCTCN2018101766-appb-000044
Optionally, the processing unit 820 is further configured to:
calculate the frequency of each of the M+1 signal segments according to the formula
Figure PCTCN2018101766-appb-000045
and calculate each of the M+1 signal segments according to the formula S_i(t) = sin(2πf_i t),
where f_i denotes the frequency of the i-th signal segment, F_s denotes the sampling frequency, N_fft denotes the number of FFT points, S_i(t) denotes the i-th signal segment, and the length of S_1(t) is an integer multiple of the period T, T = 1/f_1.
Optionally, the frequency-sweep signal data played by the speaker is written in the following vector form:
S(t) = [S_0(t), S_1(t), …, S_M(t)]^T
where S(t) denotes the frequency-sweep signal data played by the speaker, S_i(t) denotes the i-th signal segment,
Figure PCTCN2018101766-appb-000046
and [ ]^T denotes the transpose of a vector or matrix.
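The stepped frequency-sweep signal described above can be sketched as follows. The formula for f_i and the definition of M are contained in figures that are not reproduced here, so this sketch assumes that the segment frequencies lie on the FFT bin centres, f_i = i·F_s/N_fft; that assumption is at least consistent with the stated requirement that the segment length be an integer multiple of T = 1/f_1, which then equals N_fft samples. The function name and the `periods` parameter are hypothetical.

```python
import math


def stepped_sweep(fs, n_fft, m, periods=1):
    """Concatenate M+1 equal-length sine segments S_i(t) = sin(2*pi*f_i*t).

    fs: sampling frequency F_s in Hz; n_fft: number of FFT points N_fft.
    Assumption: f_i = i * fs / n_fft for i = 0..m (FFT bin centres).
    """
    seg_len = periods * n_fft  # samples per segment, a multiple of T = 1/f_1
    sweep = []
    for i in range(m + 1):
        f_i = i * fs / n_fft
        sweep.extend(math.sin(2 * math.pi * f_i * n / fs) for n in range(seg_len))
    return sweep
```

Under this assumption every segment holds a whole number of periods, so each tone falls on a single FFT bin when the frame length equals N_fft.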
Optionally, the N microphones respectively collect N audio signals, where the audio signal collected by the i-th microphone is denoted x_i(t), and x_i(t) can be written in the following vector form:
x_i(t) = [x_{i,1}(t), x_{i,2}(t), …, x_{i,K}(t)]^T
where x_i(t) denotes the audio signal collected by the i-th microphone, K denotes the total number of frames of the signal collected by each microphone, and [ ]^T denotes the transpose of a vector or matrix.
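The decomposition of x_i(t) into K frames, together with the windowing and FFT steps recited in the claims, can be sketched as follows. The Hann window is an assumption, since the window type is not stated in this part of the text, and a naive DFT stands in for the FFT so that the sketch needs only the standard library. All function names are hypothetical.

```python
import cmath
import math


def split_frames(x, frame_len, overlap_pct=50):
    """Split x into equal-length frames; adjacent frames overlap by overlap_pct %."""
    hop = frame_len * (100 - overlap_pct) // 100
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, hop)]


def hann(n):
    """Periodic Hann window of length n (assumed window type)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def dft(frame):
    """Naive O(n^2) DFT, used here in place of an FFT routine."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * b * t / n) for t in range(n))
        for b in range(n)
    ]


def target_frames(x, frame_len, overlap_pct=50):
    """Frame, window and transform x, returning K complex spectra."""
    w = hann(frame_len)
    return [
        dft([s * wk for s, wk in zip(frame, w)])
        for frame in split_frames(x, frame_len, overlap_pct)
    ]
```

In practice the frame length would normally be set to N_fft and an FFT library would replace `dft`.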
Optionally, the obtaining unit 810 is specifically configured to:
place the N microphones in a test room in which a speaker is arranged, the N microphones being located directly in front of the speaker; and
control the speaker to play Gaussian white noise data or frequency-sweep signal data, and control the N microphones to respectively collect the N audio signals.
Optionally, the test room provides an anechoic chamber environment, the speaker is an artificial mouth dedicated to audio testing, and the artificial mouth is calibrated with a standard microphone before use.
Optionally, before the processing unit 820 controls the speaker to play Gaussian white noise data or frequency-sweep signal data, the obtaining unit 810 is further configured to:
acquire, in a quiet environment, first audio data X_1(n) collected by the N microphones within a first duration T_1;
acquire, in an environment where Gaussian white noise data or frequency-sweep signal data is played, second audio data X_2(n) collected by the N microphones within a second duration T_2; and
trigger the processing unit 820 to calculate the signal-to-noise ratio (SNR) according to the formula
Figure PCTCN2018101766-appb-000047
and ensure that the SNR is greater than a first threshold.
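The SNR check can be sketched as follows. The SNR formula is contained in a figure that is not reproduced here, so the sketch assumes the common definition in decibels, namely ten times the base-10 logarithm of the ratio of the mean power of the playback recording X_2(n) to the mean power of the quiet recording X_1(n). The function names and the threshold argument are hypothetical.

```python
import math


def snr_db(quiet, playback):
    """Assumed SNR: 10*log10(mean power of X_2(n) / mean power of X_1(n))."""
    p_noise = sum(v * v for v in quiet) / len(quiet)
    p_signal = sum(v * v for v in playback) / len(playback)
    return 10 * math.log10(p_signal / p_noise)


def snr_ok(quiet, playback, threshold_db):
    """True if the measured SNR exceeds the first threshold."""
    return snr_db(quiet, playback) > threshold_db
```

If the check fails, the playback level or the test environment would need to be adjusted before the consistency measurement is taken.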
Optionally, as shown in FIG. 9, an embodiment of the present application provides an apparatus 900 for evaluating the consistency of a microphone array, including:
a memory 910, configured to store programs and data; and
a processor 920, configured to call and run the programs and data stored in the memory.
The apparatus 900 is configured to perform the methods shown in FIGS. 1 to 7 above.
Optionally, as shown in FIG. 10, an embodiment of the present application provides a system 1000 for evaluating the consistency of a microphone array, including:
N microphones constituting a microphone array 1010, N≥2;
at least one audio source 1020; and
an apparatus 1030, including a memory 1031 configured to store programs and data and a processor 1032 configured to call and run the programs and data stored in the memory, the apparatus 1030 being configured to perform the methods shown in FIGS. 1 to 7 above.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above processes do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the sequence numbers should not constitute any limitation on the implementation of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the present application.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be found in the corresponding processes of the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (48)

  1. A method for evaluating the consistency of a microphone array, comprising:
    acquiring N audio signals respectively collected by N microphones, the N microphones forming a microphone array, N≥2;
    determining, according to the N audio signals, a phase spectrum difference and/or a power spectrum difference between each of the N microphones other than a reference microphone and the reference microphone, the reference microphone being any one of the N microphones; and
    performing a consistency evaluation on the N microphones according to the phase spectrum difference and/or the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone.
  2. The method according to claim 1, wherein performing the consistency evaluation on the N microphones according to the phase spectrum difference between each of the N microphones other than the reference microphone and the reference microphone comprises:
    evaluating, according to the phase spectrum difference between each of the N microphones other than the reference microphone and the reference microphone, the phase consistency between the corresponding microphone and the reference microphone.
  3. The method according to claim 2, further comprising:
    measuring, for each of the N microphones other than the reference microphone, the difference between the distance from that microphone to the sound source and the distance from the reference microphone to the sound source;
    calculating, according to the measured distance differences, a fixed phase difference between each of the N microphones other than the reference microphone and the reference microphone; and
    calibrating the corresponding phase spectrum difference according to the fixed phase difference between each of the N microphones other than the reference microphone and the reference microphone.
  4. The method according to claim 3, wherein calculating, according to the measured distances, the fixed phase difference between each of the N microphones other than the reference microphone and the reference microphone comprises:
    calculating the fixed phase difference between each of the N microphones other than the reference microphone and the reference microphone according to the formula
    Figure PCTCN2018101766-appb-100001
    where Y_i(ω) denotes the spectrum of the i-th microphone, Y_1(ω) denotes the spectrum of the reference microphone, ω denotes frequency, d_i denotes the difference between the distances from the i-th microphone and from the reference microphone to the sound source, c denotes the speed of sound, and 2πωd_i/c denotes the fixed phase difference between the i-th microphone and the reference microphone.
  5. The method according to any one of claims 1 to 4, wherein performing the consistency evaluation on the N microphones according to the phase spectrum difference between each of the N microphones other than the reference microphone and the reference microphone comprises:
    evaluating, according to the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone, the amplitude consistency between the corresponding microphone and the reference microphone.
  6. The method according to any one of claims 2 to 4, wherein the N audio signals are signals collected in an environment in which frequency-sweep signal data is played.
  7. The method according to claim 5, wherein the N audio signals are signals collected in an environment in which Gaussian white noise data or frequency-sweep signal data is played.
  8. The method according to claim 6 or 7, wherein the frequency-sweep signal is any one of a linear frequency-sweep signal, a logarithmic frequency-sweep signal, a linear stepped frequency-sweep signal, and a logarithmic stepped frequency-sweep signal.
  9. The method according to any one of claims 1 to 8, wherein determining, according to the N audio signals, the phase spectrum difference and/or the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone comprises:
    framing each of the N audio signals to obtain K signal frames of equal length, K≥2;
    windowing each of the K signal frames to obtain K windowed signal frames;
    performing an FFT on each of the K windowed signal frames to obtain K target signal frames; and
    determining, according to the K target signal frames corresponding to each audio signal, the phase spectrum difference and/or the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone.
  10. The method according to claim 9, wherein any two adjacent signal frames of the K signal frames overlap by R%, R>0.
  11. The method according to claim 10, wherein R is 25 or 50.
  12. The method according to any one of claims 9 to 11, wherein the K signal frames of equal length obtained by framing the i-th audio signal are written in the following vector form:
    x_i(t) = [x_{i,1}(t), x_{i,2}(t), …, x_{i,K}(t)]^T
    where x_i(t) denotes the i-th audio signal, K denotes the total number of frames of the signal collected by each microphone, and [ ]^T denotes the transpose of a vector or matrix.
  13. The method according to any one of claims 9 to 12, wherein determining, according to the K target signal frames corresponding to each audio signal, the phase spectrum difference between each of the N microphones other than the reference microphone and the reference microphone comprises:
    determining, according to the formula
    Figure PCTCN2018101766-appb-100002
    the phase spectrum difference between each of the N microphones other than the reference microphone and the reference microphone,
    where imag( ) takes the imaginary part, ln( ) takes the natural logarithm,
    Figure PCTCN2018101766-appb-100003
    denotes the phase spectrum difference between the i-th microphone and the reference microphone,
    Figure PCTCN2018101766-appb-100004
    denotes the j-th target signal frame of the reference microphone,
    Figure PCTCN2018101766-appb-100005
    denotes the j-th target signal frame of the i-th microphone, and
    Figure PCTCN2018101766-appb-100006
    denotes the main frequency.
  14. The method according to any one of claims 9 to 13, wherein determining, according to the K target signal frames corresponding to each audio signal, the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone comprises:
    determining a power spectrum of each audio signal according to the K target signal frames corresponding to that audio signal; and
    determining, according to the power spectrum of each audio signal, the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone.
  15. The method according to claim 14, wherein determining the power spectrum of each audio signal according to the K target signal frames corresponding to that audio signal comprises:
    calculating the power spectrum of each audio signal according to the formula
    Figure PCTCN2018101766-appb-100007
    where P_i(ω) denotes the power spectrum of the i-th audio signal, Y_{i,j}(ω) denotes the j-th target signal frame of the i-th audio signal, K denotes the total number of frames of the signal collected by each microphone, and ω denotes frequency.
  16. The method according to claim 14 or 15, wherein determining, according to the power spectrum of each audio signal, the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone comprises:
    calculating the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone according to the formula PD_i(ω) = P_1(ω) - P_i(ω),
    where PD_i(ω) denotes the power spectrum difference between the i-th microphone and the reference microphone, P_1(ω) denotes the power spectrum of the reference microphone, and P_i(ω) denotes the power spectrum of the i-th microphone.
  17. The method according to any one of claims 1 to 16, wherein acquiring the N audio signals respectively collected by the N microphones comprises:
    determining the sampling frequency F_s and the number of FFT points N_fft used by the N microphones for audio signal collection, and playing Gaussian white noise data or frequency-sweep signal data through a speaker, the N microphones collecting the N audio signals, where, if the data played by the speaker is frequency-sweep signal data, the frequency-sweep signal data consists of M+1 segments of equal length and different frequencies,
    Figure PCTCN2018101766-appb-100008
  18. The method according to claim 17, further comprising:
    calculating the frequency of each of the M+1 signal segments according to the formula
    Figure PCTCN2018101766-appb-100009
    and calculating each of the M+1 signal segments according to the formula S_i(t) = sin(2πf_i t),
    where f_i denotes the frequency of the i-th signal segment, F_s denotes the sampling frequency, N_fft denotes the number of FFT points, S_i(t) denotes the i-th signal segment, and the length of S_1(t) is an integer multiple of the period T, T = 1/f_1.
  19. The method according to claim 18, wherein the frequency-sweep signal data played by the speaker is written in the following vector form:
    S(t) = [S_0(t), S_1(t), …, S_M(t)]^T
    where S(t) denotes the frequency-sweep signal data played by the speaker, S_i(t) denotes the i-th signal segment,
    Figure PCTCN2018101766-appb-100010
    and [ ]^T denotes the transpose of a vector or matrix.
  20. The method according to any one of claims 1 to 19, wherein the N microphones respectively collect N audio signals, the audio signal collected by the i-th microphone is denoted x_i(t), and x_i(t) can be written in the following vector form:
    x_i(t) = [x_{i,1}(t), x_{i,2}(t), …, x_{i,K}(t)]^T
    where x_i(t) denotes the audio signal collected by the i-th microphone, K denotes the total number of frames of the signal collected by each microphone, and [ ]^T denotes the transpose of a vector or matrix.
  21. The method according to any one of claims 1 to 20, wherein acquiring the N audio signals respectively collected by the N microphones comprises:
    placing the N microphones in a test room in which a speaker is arranged, the N microphones being located directly in front of the speaker; and
    controlling the speaker to play Gaussian white noise data or frequency-sweep signal data, and controlling the N microphones to respectively collect the N audio signals.
  22. The method according to claim 21, wherein the test room provides an anechoic chamber environment, the speaker is an artificial mouth dedicated to audio testing, and the artificial mouth is calibrated with a standard microphone before use.
  23. The method according to claim 21 or 22, wherein, before controlling the speaker to play Gaussian white noise data or frequency-sweep signal data, the method further comprises:
    acquiring, in a quiet environment, first audio data X_1(n) collected by the N microphones within a first duration T_1;
    acquiring, in an environment where Gaussian white noise data or frequency-sweep signal data is played, second audio data X_2(n) collected by the N microphones within a second duration T_2; and
    calculating the signal-to-noise ratio (SNR) according to the formula
    Figure PCTCN2018101766-appb-100011
    and ensuring that the SNR is greater than a first threshold.
  24. A device for evaluating the consistency of a microphone array, comprising:
    an obtaining unit, configured to acquire N audio signals respectively collected by N microphones, the N microphones forming a microphone array, N≥2; and
    a processing unit, configured to determine, according to the N audio signals, a phase spectrum difference and/or a power spectrum difference between each of the N microphones other than a reference microphone and the reference microphone, the reference microphone being any one of the N microphones;
    the processing unit being further configured to perform a consistency evaluation on the N microphones according to the phase spectrum difference and/or the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone.
  25. The device according to claim 24, wherein the processing unit is specifically configured to:
    evaluate, according to the phase spectrum difference between each of the N microphones other than the reference microphone and the reference microphone, the phase consistency between the corresponding microphone and the reference microphone.
  26. The device according to claim 25, wherein the processing unit is further configured to:
    measure, for each of the N microphones other than the reference microphone, the difference between the distance from that microphone to the sound source and the distance from the reference microphone to the sound source;
    calculate, according to the measured distance differences, a fixed phase difference between each of the N microphones other than the reference microphone and the reference microphone; and
    calibrate the corresponding phase spectrum difference according to the fixed phase difference between each of the N microphones other than the reference microphone and the reference microphone.
  27. The device according to claim 26, wherein the processing unit is specifically configured to:
    calculate the fixed phase difference between each of the N microphones other than the reference microphone and the reference microphone according to the formula
    Figure PCTCN2018101766-appb-100012
    where Y_i(ω) denotes the spectrum of the i-th microphone, Y_1(ω) denotes the spectrum of the reference microphone, ω denotes frequency, d_i denotes the difference between the distances from the i-th microphone and from the reference microphone to the sound source, c denotes the speed of sound, and 2πωd_i/c denotes the fixed phase difference between the i-th microphone and the reference microphone.
  28. The device according to any one of claims 24 to 27, wherein the processing unit is specifically configured to:
    evaluate, according to the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone, the amplitude consistency between the corresponding microphone and the reference microphone.
  29. 根据权利要求25至27中任一项所述的设备,其特征在于,所述N个音频信号是在播放扫频信号数据的环境下采集的信号。The device according to any one of claims 25 to 27, wherein the N audio signals are signals collected in an environment in which the frequency sweep signal data is played.
  30. 根据权利要求28所述的设备,其特征在于,所述N个音频信号是在播放高斯白噪声数据或者扫频信号数据的环境下采集的信号。The device according to claim 28, wherein the N audio signals are signals collected in an environment in which Gaussian white noise data or frequency-sweep signal data is played.
  31. 根据权利要求29或30所述的设备,其特征在于,所述扫频信号为线性扫频信号、对数扫频信号、线性步进扫频信号、对数步进扫频信号中的任意一种。The device according to claim 29 or 30, wherein the frequency sweep signal is any one of a linear frequency sweep signal, a logarithmic frequency sweep signal, a linear step frequency sweep signal, and a logarithmic step frequency sweep signal. Species.
  32. The device according to any one of claims 24 to 31, wherein the processing unit is specifically configured to:
    divide each of the N audio signals into frames to obtain K signal frames of equal length, K≥2;
    apply a window to each of the K signal frames to obtain K windowed signal frames;
    perform an FFT on each of the K windowed signal frames to obtain K target signal frames;
    determine, according to the K target signal frames corresponding to each audio signal, the phase spectrum difference and/or the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone.
  33. The device according to claim 32, wherein any two adjacent signal frames among the K signal frames overlap by R%, R>0.
  34. The device according to claim 33, wherein R is 25 or 50.
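The framing, windowing and transform steps of claims 32 to 34 can be sketched as follows. This is a minimal illustration under stated assumptions: the Hann window is one possible choice (the claims do not name a window), the function names are hypothetical, and a direct DFT from the standard library stands in for an optimized FFT so that the example stays self-contained; both yield the same spectrum up to rounding.

```python
import cmath
import math


def split_frames(signal, frame_len, overlap_percent):
    """Split a signal into equal-length frames whose neighbours overlap by R%."""
    hop = frame_len - (frame_len * overlap_percent) // 100
    if hop <= 0:
        raise ValueError("overlap must leave a positive hop size")
    last_start = len(signal) - frame_len
    return [signal[s:s + frame_len] for s in range(0, last_start + 1, hop)]


def hann(n):
    """Symmetric Hann window of length n (assumed window type)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def dft(frame):
    """Direct DFT of one frame; an FFT routine would replace this in practice."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * m / n) for m, x in enumerate(frame))
        for k in range(n)
    ]


def windowed_spectra(signal, frame_len=8, overlap_percent=50):
    """Return the K target signal frames (spectra) of one audio signal."""
    window = hann(frame_len)
    frames = split_frames(signal, frame_len, overlap_percent)
    return [dft([w * x for w, x in zip(window, f)]) for f in frames]


# Example: 20 samples, frames of 8 samples with 50% overlap.
signal = [math.sin(2 * math.pi * 2 * n / 8) for n in range(20)]
spectra = windowed_spectra(signal, frame_len=8, overlap_percent=50)
```

With R = 50 each new frame starts half a frame after the previous one; with R = 25 it starts three quarters of a frame later, so fewer frames are produced from the same recording.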
  35. The device according to any one of claims 32 to 34, wherein the K signal frames of equal length obtained by framing the i-th audio signal are written in the following vector form:
    x_i(t) = [x_{i,1}(t), x_{i,2}(t), …, x_{i,K}(t)]^T
    where x_i(t) denotes the i-th audio signal, K denotes the total number of frames of the signal collected by each microphone, and [ ]^T denotes the transpose of a vector or matrix.
  36. The device according to any one of claims 32 to 35, wherein the processing unit is specifically configured to:
    determine, according to the formula
    Figure PCTCN2018101766-appb-100013
    the phase spectrum difference between each of the N microphones other than the reference microphone and the reference microphone,
    where imag( ) denotes taking the imaginary part, ln( ) denotes taking the natural logarithm,
    Figure PCTCN2018101766-appb-100014
    denotes the phase spectrum difference between the i-th microphone and the reference microphone,
    Figure PCTCN2018101766-appb-100015
    denotes the j-th target signal frame of the reference microphone,
    Figure PCTCN2018101766-appb-100016
    denotes the j-th target signal frame of the i-th microphone, and
    Figure PCTCN2018101766-appb-100017
    denotes the dominant frequency.
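The formula of claim 36 survives here only as a figure reference, so it cannot be reproduced. The sketch below illustrates only the building block the claim names: the imaginary part of the natural logarithm of a complex number is its phase angle. Taking the ratio of the i-th microphone's spectrum to the reference spectrum, and averaging over the K frames, are assumptions made for illustration and may differ from the claimed formula in form or sign.

```python
import cmath


def phase_difference(y_ref, y_i):
    """Phase of y_i relative to y_ref, computed as imag(ln(y_i / y_ref)).

    The direction of the ratio is an assumption; the claimed formula is
    available only as a figure.
    """
    return cmath.log(y_i / y_ref).imag


def mean_phase_difference(ref_frames, mic_frames, bin_index):
    """Average the per-frame phase difference at one frequency bin.

    Assumed aggregation over the K target signal frames, for illustration.
    """
    diffs = [
        phase_difference(r[bin_index], m[bin_index])
        for r, m in zip(ref_frames, mic_frames)
    ]
    return sum(diffs) / len(diffs)


# Example: two frames, one bin each; the microphone lags the reference
# by 0.3 rad in both frames, regardless of the amplitude mismatch.
ref_frames = [[cmath.rect(1.0, 0.2)], [cmath.rect(1.0, -0.1)]]
mic_frames = [[cmath.rect(2.0, 0.5)], [cmath.rect(0.5, 0.2)]]
delta = mean_phase_difference(ref_frames, mic_frames, 0)
```

Dividing the two spectra before taking the logarithm cancels the magnitudes into the real part, so the imaginary part carries only the phase offset.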
  37. The device according to any one of claims 32 to 36, wherein the processing unit is specifically configured to:
    determine the power spectrum of each audio signal according to the K target signal frames corresponding to that audio signal;
    determine, according to the power spectrum of each audio signal, the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone.
  38. The device according to claim 37, wherein the processing unit is specifically configured to:
    calculate the power spectrum of each audio signal according to the formula
    Figure PCTCN2018101766-appb-100018
    where P_i(ω) denotes the power spectrum of the i-th audio signal, Y_{i,j}(ω) denotes the j-th target signal frame of the i-th audio signal, K denotes the total number of frames of the signal collected by each microphone, and ω denotes frequency.
  39. The device according to claim 37 or 38, wherein the processing unit is specifically configured to:
    calculate the power spectrum difference between each of the N microphones other than the reference microphone and the reference microphone according to the formula PD_i(ω) = P_1(ω) - P_i(ω),
    where PD_i(ω) denotes the power spectrum difference between the i-th microphone and the reference microphone, P_1(ω) denotes the power spectrum of the reference microphone, and P_i(ω) denotes the power spectrum of the i-th microphone.
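The power spectrum difference PD_i(ω) = P_1(ω) - P_i(ω) of claim 39 can be sketched as below. The power spectrum formula of claim 38 is available only as a figure, so the estimate used here, the squared magnitude averaged over the K target signal frames on a linear scale, is an assumption; the claimed formula may, for example, use a logarithmic scale. Function names are hypothetical.

```python
def power_spectrum(frames):
    """Average |Y_{i,j}(w)|^2 over the K target signal frames of one signal.

    Assumed estimator; the claimed formula is given only as a figure.
    """
    k = len(frames)
    n_bins = len(frames[0])
    return [sum(abs(frame[b]) ** 2 for frame in frames) / k for b in range(n_bins)]


def power_spectrum_difference(p_ref, p_i):
    """PD_i(w) = P_1(w) - P_i(w), evaluated bin by bin."""
    return [a - b for a, b in zip(p_ref, p_i)]


# Example: two frames of two bins each. The second microphone is weaker
# than the reference in bin 0 and identical in bin 1.
ref_frames = [[1 + 0j, 2j], [1 + 0j, 2j]]
mic_frames = [[0.5 + 0j, 2j], [0.5 + 0j, 2j]]
p_ref = power_spectrum(ref_frames)
p_mic = power_spectrum(mic_frames)
pd = power_spectrum_difference(p_ref, p_mic)
```

A difference close to zero across all bins indicates good amplitude consistency with the reference microphone; a large value in some band points to a gain mismatch there.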
  40. The device according to any one of claims 24 to 39, wherein the processing unit is specifically configured to:
    determine the sampling frequency F_s and the number of FFT points N_fft used by the N microphones for audio signal collection, play Gaussian white noise data or frequency-sweep signal data through a speaker, and control the N microphones to collect the N audio signals, wherein, if the data played by the speaker is frequency-sweep signal data, the frequency-sweep signal data consists of M+1 signal segments of equal length and different frequencies,
    Figure PCTCN2018101766-appb-100019
  41. The device according to claim 40, wherein the processing unit is further configured to:
    calculate the frequency of each of the M+1 signal segments according to the formula
    Figure PCTCN2018101766-appb-100020
    and
    calculate each of the M+1 signal segments according to the formula S_i(t) = sin(2πf_i t),
    where f_i denotes the frequency of the i-th signal segment, F_s denotes the sampling frequency, N_fft denotes the number of FFT points, S_i(t) denotes the i-th signal segment, and the length of S_1(t) is an integer multiple of the period T, T = 1/f_1.
  42. The device according to claim 41, wherein the frequency-sweep signal data played by the speaker is written in the following vector form:
    S(t) = [S_0(t), S_1(t), …, S_M(t)]^T
    where S(t) denotes the frequency-sweep signal data played by the speaker, S_i(t) denotes the i-th signal segment,
    Figure PCTCN2018101766-appb-100021
    and [ ]^T denotes the transpose of a vector or matrix.
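The stepped sweep of claims 40 to 42 can be sketched as follows. The segment formula S_i(t) = sin(2πf_i t) and the M+1 equal-length segments come from the claims; the frequency rule f_i = i·F_s/N_fft is an assumption, since the claimed expression for f_i is available only as a figure, and the parameter values are hypothetical.

```python
import math


def stepped_sweep(fs, n_fft, m, segment_len):
    """Build M+1 equal-length sine segments S_0 ... S_M, sampled at fs.

    Assumption: f_i = i * fs / n_fft, i.e. one segment per FFT bin centre.
    """
    segments = []
    for i in range(m + 1):
        f_i = i * fs / n_fft  # assumed frequency rule, not taken from the source
        segment = [
            math.sin(2 * math.pi * f_i * n / fs) for n in range(segment_len)
        ]
        segments.append(segment)
    return segments


# Example with hypothetical parameters: 16 kHz sampling, 8-point FFT,
# M = 4, and 16 samples per segment.
sweep = stepped_sweep(fs=16000, n_fft=8, m=4, segment_len=16)
```

Under this assumed rule every segment frequency falls on an FFT bin, and a segment whose length is a whole number of periods transforms without spectral leakage, which is in line with the requirement that the length of S_1(t) be an integer multiple of T = 1/f_1.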
  43. The device according to any one of claims 24 to 42, wherein the N microphones respectively collect the N audio signals, the audio signal collected by the i-th microphone is denoted x_i(t), and x_i(t) can be written in the following vector form:
    x_i(t) = [x_{i,1}(t), x_{i,2}(t), …, x_{i,K}(t)]^T
    where x_i(t) denotes the audio signal collected by the i-th microphone, K denotes the total number of frames of the signal collected by each microphone, and [ ]^T denotes the transpose of a vector or matrix.
  44. The device according to any one of claims 24 to 43, wherein the obtaining unit is specifically configured to:
    place the N microphones in a test room in which a speaker is arranged, the N microphones being located directly in front of the speaker;
    control the speaker to play Gaussian white noise data or frequency-sweep signal data, and control the N microphones to respectively collect the N audio signals.
  45. The device according to claim 44, wherein the test room provides an anechoic chamber environment, the speaker is an artificial mouth dedicated to audio testing, and the artificial mouth is calibrated with a standard microphone before use.
  46. The device according to claim 44 or 45, wherein, before the processing unit controls the speaker to play Gaussian white noise data or frequency-sweep signal data, the obtaining unit is further configured to:
    acquire, in a quiet environment, first audio data X_1(n) collected by the N microphones within a first duration T_1;
    acquire, in an environment in which Gaussian white noise data or frequency-sweep signal data is played, second audio data X_2(n) collected by the N microphones within a second duration T_2;
    trigger the processing unit to calculate the signal-to-noise ratio SNR according to the formula
    Figure PCTCN2018101766-appb-100022
    and ensure that the SNR is greater than a first threshold.
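The pre-test check of claim 46 can be sketched as below. The claimed SNR formula is available only as a figure, so the common definition used here, ten times the base-10 logarithm of the ratio of mean powers of the two recordings, is an assumption, as are the function names and the 20 dB threshold.

```python
import math


def mean_power(samples):
    """Mean squared value of a block of samples."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(quiet, playing):
    """Assumed SNR estimate in dB: power while playing over power when quiet."""
    return 10.0 * math.log10(mean_power(playing) / mean_power(quiet))


# Hypothetical first threshold; the source does not give a value.
FIRST_THRESHOLD_DB = 20.0

# Example: X_1(n) recorded in quiet, X_2(n) recorded while the test signal plays.
x1 = [0.01, -0.01] * 4
x2 = [1.0, -1.0] * 4
snr = snr_db(x1, x2)
snr_ok = snr > FIRST_THRESHOLD_DB
```

If the check fails, the playback level or the room noise would need to be adjusted before the consistency measurement is trusted.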
  47. An apparatus for evaluating the consistency of a microphone array, comprising:
    a memory, configured to store programs and data; and
    a processor, configured to call and run the programs and data stored in the memory;
    wherein the apparatus is configured to perform the method according to any one of claims 1 to 23.
  48. A system for evaluating the consistency of a microphone array, comprising:
    N microphones forming a microphone array, N≥2;
    at least one audio source; and
    an apparatus comprising a memory for storing programs and data and a processor for calling and running the programs and data stored in the memory, the apparatus being configured to:
    perform the method according to any one of claims 1 to 23.
PCT/CN2018/101766 2018-08-22 2018-08-22 Method, device, apparatus, and system for evaluating microphone array consistency WO2020037555A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201880001199.6A CN109313909B (en) 2018-08-22 2018-08-22 Method, device, apparatus and system for evaluating consistency of microphone array
PCT/CN2018/101766 WO2020037555A1 (en) 2018-08-22 2018-08-22 Method, device, apparatus, and system for evaluating microphone array consistency
CN202310466643.4A CN116437280A (en) 2018-08-22 2018-08-22 Method, device, apparatus and system for evaluating consistency of microphone array

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/101766 WO2020037555A1 (en) 2018-08-22 2018-08-22 Method, device, apparatus, and system for evaluating microphone array consistency

Publications (1)

Publication Number Publication Date
WO2020037555A1 true WO2020037555A1 (en) 2020-02-27

Family

ID=65221692

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/101766 WO2020037555A1 (en) 2018-08-22 2018-08-22 Method, device, apparatus, and system for evaluating microphone array consistency

Country Status (2)

Country Link
CN (2) CN109313909B (en)
WO (1) WO2020037555A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110111807B (en) * 2019-04-27 2022-01-11 南京理工大学 Microphone array-based indoor sound source following and enhancing method
CN110636432A (en) * 2019-09-29 2019-12-31 深圳市火乐科技发展有限公司 Microphone testing method and related equipment
CN111065036B (en) * 2019-12-26 2021-08-31 北京声智科技有限公司 Frequency response testing method and device of microphone array
CN112672265B (en) * 2020-10-13 2022-06-28 珠海市杰理科技股份有限公司 Method and system for detecting microphone consistency and computer readable storage medium
CN112889299B (en) * 2021-01-12 2022-07-22 华为技术有限公司 Method and apparatus for evaluating microphone array consistency
CN113259830B (en) * 2021-04-26 2023-03-21 歌尔股份有限公司 Multi-microphone consistency test system and method
CN114390421A (en) * 2021-12-03 2022-04-22 伟创力电子技术(苏州)有限公司 Automatic testing method for microphone matrix and loudspeaker
CN114222234A (en) * 2021-12-31 2022-03-22 思必驰科技股份有限公司 Microphone array consistency detection method, electronic device and storage medium
CN114449434B (en) * 2022-04-07 2022-08-16 北京荣耀终端有限公司 Microphone calibration method and electronic equipment
CN115776626B (en) * 2023-02-10 2023-05-02 杭州兆华电子股份有限公司 Frequency response calibration method and system for microphone array

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103871420A (en) * 2012-12-13 2014-06-18 华为技术有限公司 Signal processing method and signal processing device for microphone array
CN106161751A (en) * 2015-04-14 2016-11-23 电信科学技术研究院 A kind of noise suppressing method and device
US20170078819A1 (en) * 2014-05-05 2017-03-16 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. System, apparatus and method for consistent acoustic scene reproduction based on adaptive functions
CN107864444A (en) * 2017-11-01 2018-03-30 大连理工大学 A kind of microphone array frequency response calibration method

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006033734A (en) * 2004-07-21 2006-02-02 Sanyo Electric Co Ltd Sound inspection method and device of electric product
CN1756444B (en) * 2004-09-30 2011-09-28 富迪科技股份有限公司 Self detection and correction method for electroacoustic system
US8126156B2 (en) * 2008-12-02 2012-02-28 Hewlett-Packard Development Company, L.P. Calibrating at least one system microphone
US8620672B2 (en) * 2009-06-09 2013-12-31 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for phase-based processing of multichannel signal
US9113264B2 (en) * 2009-11-12 2015-08-18 Robert H. Frater Speakerphone and/or microphone arrays and methods and systems of the using the same
CN102111697B (en) * 2009-12-28 2015-03-25 歌尔声学股份有限公司 Method and device for controlling noise reduction of microphone array
CN102075848B (en) * 2011-02-17 2014-05-21 深圳市豪恩声学股份有限公司 Method and system for testing array microphone and rotating device
EP2565667A1 (en) * 2011-08-31 2013-03-06 Friedrich-Alexander-Universität Erlangen-Nürnberg Direction of arrival estimation using watermarked audio signals and microphone arrays
US9609141B2 (en) * 2012-10-26 2017-03-28 Avago Technologies General Ip (Singapore) Pte. Ltd. Loudspeaker localization with a microphone array
CN103247298B (en) * 2013-04-28 2015-09-09 华为技术有限公司 A kind of sensitivity correction method and audio frequency apparatus
CN103559330B (en) * 2013-10-10 2017-04-12 上海华为技术有限公司 Method and system for detecting data consistency
WO2016209098A1 (en) * 2015-06-26 2016-12-29 Intel Corporation Phase response mismatch correction for multiple microphones
CN105554674A (en) * 2015-12-28 2016-05-04 努比亚技术有限公司 Microphone calibration method, device and mobile terminal

Also Published As

Publication number Publication date
CN109313909A (en) 2019-02-05
CN116437280A (en) 2023-07-14
CN109313909B (en) 2023-05-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18930587

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18930587

Country of ref document: EP

Kind code of ref document: A1