CN108549051A

CN108549051A - Real-time near-field sound source localization system based on a microphone array

Info

Publication number: CN108549051A
Application number: CN201810507372.1A
Authority: CN
Inventors: 李秀坤; 叶春煦; 贾红剑
Original assignee: Harbin Engineering University
Current assignee: Harbin Engineering University
Priority date: 2018-04-17
Filing date: 2018-05-24
Publication date: 2018-09-18

Abstract

The invention discloses a real-time near-field sound source localization system based on a microphone array, belonging to the technical field of sound source localization. The system comprises a microphone array with its amplifier circuit module, a signal acquisition and processing module, a localization result storage module, and other auxiliary modules. Sound signals are received by the microphone array and amplified by an amplifier circuit built around a dedicated microphone amplifier with automatic gain control. The signal acquisition and processing module is built around a DSP chip with an on-chip analog-to-digital converter and uses the direct memory access function inside the DSP chip. The near-field two-dimensional wideband MUSIC algorithm is further optimized: only selected sub-bands of the microphone array signal spectrum are retained, and the MUSIC spectrum is estimated from the sub-band matrices, which reduces the computational load on the DSP chip and improves real-time performance. The localization result storage module uses an SD card as the storage medium, which broadens the system's range of application, and builds the cluster chain for the speaker position data with a "pre-built cluster chain" method, which further improves real-time performance.

Description

A real-time near-field sound source localization system based on a microphone array

Technical field

The invention belongs to the technical field of sound source localization, and in particular relates to a real-time near-field sound source localization system based on a microphone array.

Background art

The microphone is a common acoustic-to-electrical transducer for acquiring sound signals, especially speech, and is widely used in multimedia conferencing, teaching, communication, vibration and noise measurement of mechanical equipment, military command and reconnaissance, and other fields [39]. Audio acquisition products currently on the market mostly use a single microphone as the audio sampling unit; some high-end products use microphone arrays of various forms as the sensor for audio acquisition and processing. A traditional meeting usually captures the speaker's voice with a single microphone, and the recording is organized and proofread manually afterwards. This falls far short of the requirements of modern meetings, and building intelligent meeting rooms has become a new trend. An intelligent meeting room generally includes the following: 1. a device control module that centrally manages all equipment in the room, such as light switches, conference lifts, large screens, conference cameras, and air conditioning; 2. a screen-casting module that lets the host and participants freely cast the screens of smart devices such as computers, tablets, and mobile phones; 3. an audio-visual system in which conference cameras, microphones, and audio equipment record, switch, and store automatically. The audio-visual system is the core of the intelligent meeting room, and sound source localization of the speaker is the key to automating it. Given prior information about the seating arrangement, the positions of the speakers at different times can be used to distinguish them, providing more information for manual minute-taking after the meeting or for the meeting room's control center.

To suppress interference such as reverberation and noise, and to obtain the phase information of sound signals so that the sound source position can be determined more accurately, researchers in China and abroad have studied microphone arrays with specific geometries extensively and in depth. Each microphone in the array is an "array element", and the redundant information acquired by multiple elements yields more information about the sound source. The main application areas of microphone arrays are far-field direction-of-arrival estimation and near-field localization of sound sources, speaker identification, speech enhancement, and semantic understanding. Sound source localization and voiceprint recognition are the key technologies for determining the speaker's position and distinguishing different speakers. A hardware and software system that determines when and where each person in a meeting room speaks, and thereby distinguishes the speakers, has become an urgent need.

Summary of the invention

The purpose of the invention is to provide a real-time near-field sound source localization system based on a microphone array that localizes a speaker in real time within the near-field range of the microphones, stores the localization result, and supplies an intelligent meeting room system with more accurate position coordinates of the speaker in a two-dimensional plane region.

The purpose of the invention is achieved by the following technical solution:

In a first aspect, the invention provides a real-time near-field sound source localization system based on a microphone array. The overall structure of the system includes:

An electret microphone array arranged as a uniform linear array acquires the speaker's voice. For a given number of elements, the uniform linear arrangement makes the overall array as large as possible, which extends the system's near-field range; it also combines easily with furniture such as desks in practical settings, which simplifies installation and use.

Each electret microphone is followed by an amplifier circuit, which uses a highly integrated dedicated microphone amplifier with automatic gain control (such as the MAX9814L) to amplify the weak signal output by the electret microphone. When designing the circuit, the attack time t_attack and release time t_release of the amplifier's automatic gain control module must be set so that neither "clipping" nor "pumping" occurs. Using a highly integrated dedicated microphone amplifier also reduces the number of nodes in the circuit and improves system stability.

The output of each electret microphone amplifier circuit is acquired by the microphone array signal acquisition and processing module, which is built around a DSP chip with an on-chip analog-to-digital converter. The sampling resolution is no less than 10 bits, which is sufficient to resolve the phase and amplitude differences of the speaker's voice signal received by different array elements. To reduce the number of circuit nodes and the cost, no sample-and-hold circuit is needed between the microphone array and the signal acquisition module: raising the sampling frequency of the analog-to-digital converter so that the sampling interval between adjacent elements does not exceed 1.6 μs makes the error caused by non-simultaneous sampling negligible.
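The claim that a 1.6 μs sampling interval makes the non-simultaneous sampling error negligible can be checked with a short calculation. The sketch below is illustrative only: the function name and the test frequencies are assumptions, and it simply converts a time skew into the phase error it causes for a sinusoid of a given frequency.

```python
def skew_phase_error_deg(freq_hz: float, skew_s: float) -> float:
    """Phase error in degrees when two channels are sampled skew_s seconds apart."""
    # A sinusoid of frequency f advances 360 * f * dt degrees in dt seconds.
    return 360.0 * freq_hz * skew_s


MAX_SKEW_S = 1.6e-6  # upper bound on the inter-element sampling interval from the text

for freq_hz in (300.0, 1000.0, 3400.0):  # illustrative speech-band frequencies (assumed)
    error = skew_phase_error_deg(freq_hz, MAX_SKEW_S)
    print(f"{freq_hz:6.0f} Hz -> {error:.3f} degrees")
```

At 1000 Hz the skew corresponds to roughly 0.58 degrees of phase, which is small compared with the inter-element phase differences a localization algorithm works with.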

Using the direct memory access (DMA) function in the DSP chip improves the system's operating efficiency and also makes it convenient to split the microphone array signals into frames. After each frame of microphone array signals has been acquired and transferred, the system automatically enters the DMA interrupt, and the speaker is localized in the two-dimensional plane region inside the DMA interrupt. Once the localization result is obtained, it is stored in real time on an SD card formatted with the FAT32 file system.

In a second aspect, the invention provides a real-time near-field sound source localization method corresponding to the above system. The method includes:

The uniform linear microphone array acquires the time-domain voice signal of a speaker talking within the near-field range of the system and stores the acquired signal in a matrix X(t), which holds 10 ms to 40 ms of microphone array signals.

After the matrix X(t) has been completely updated once, a fast Fourier transform is applied to each row of the matrix, that is, to the frame received by each array element, giving the spectral matrix X(ω, t) of the microphone array time-domain signal.

The spectral matrix X(ω, t) is truncated to two sub-bands with center frequencies of 300 Hz and 1000 Hz and a bandwidth of 100 Hz each; alternatively, 2 to 3 other sub-bands are selected according to the voice characteristics of the speakers in a voiceprint library.
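Truncating the spectrum amounts to keeping only the FFT bins that fall inside each sub-band. The following sketch shows one way to pick those bins; the sampling rate and FFT length are assumptions chosen for illustration (the source does not state them), and the function name is hypothetical.

```python
def subband_bins(center_hz, bandwidth_hz, sample_rate_hz, n_fft):
    """Return the indices k of FFT bins whose frequency k * fs / N lies in the band."""
    low = center_hz - bandwidth_hz / 2.0
    high = center_hz + bandwidth_hz / 2.0
    resolution = sample_rate_hz / n_fft  # frequency spacing between bins
    return [k for k in range(n_fft // 2 + 1) if low <= k * resolution <= high]


SAMPLE_RATE_HZ = 16000  # assumed; not given in the source
N_FFT = 512             # assumed; a 32 ms frame at 16 kHz, inside the 10-40 ms range

# The two sub-bands named in the text: 300 Hz and 1000 Hz centers, 100 Hz wide.
bands = {fc: subband_bins(fc, 100.0, SAMPLE_RATE_HZ, N_FFT) for fc in (300.0, 1000.0)}
print(bands)
```

With these assumed values only seven of the 257 non-negative-frequency bins are kept, which is where the reduction in computation comes from.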

The correlation matrix R(ω_k) of each of these sub-bands is computed as follows:

R(ω_k) = E{ X(ω_k) X^H(ω_k) }
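In practice the expectation in this formula has to be replaced by an average over available data. The sketch below assumes that the average is taken over a set of frequency-domain snapshots (for example the bins of one sub-band, or the same bin across several frames); that choice and the function name are assumptions, not details given in the source.

```python
def correlation_matrix(snapshots):
    """Estimate R = E{x x^H} as the average of x x^H over the given snapshots.

    Each snapshot is a list of M complex values, one per array element.
    """
    m = len(snapshots[0])
    total = [[0j] * m for _ in range(m)]
    for x in snapshots:
        for i in range(m):
            for j in range(m):
                total[i][j] += x[i] * x[j].conjugate()  # (x x^H)[i][j]
    count = len(snapshots)
    return [[value / count for value in row] for row in total]


# Two-element toy example with two snapshots.
r = correlation_matrix([[1 + 0j, 1j], [1 + 0j, -1j]])
print(r)
```

The result is an M x M Hermitian matrix, so only M x M values per sub-band enter the eigendecomposition regardless of the frame length.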

R(ω_k) is eigendecomposed, the eigenvectors corresponding to the small eigenvalues are collected to form the noise subspace U_N, and the MUSIC spectrum of the sub-band is obtained from the following formula:

P_MUSIC(ω_k, x, y) = 1 / ‖a(ω_k, x, y) U_N‖^2
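The source does not define the steering vector a(ω_k, x, y). A minimal sketch, assuming the common spherical-wave model for a linear array lying on the x-axis, with phases referenced to the first element and amplitude differences ignored, is given below. The speed of sound, the function names, and the convention of evaluating the product as a^H U_N are all assumptions.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_vector(freq_hz, source_xy, mic_x):
    """Near-field steering vector for microphones at (mic_x[m], 0), referenced to mic 0."""
    x, y = source_xy
    distances = [math.hypot(x - mx, y) for mx in mic_x]
    wavenumber = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    return [cmath.exp(-1j * wavenumber * (d - distances[0])) for d in distances]


def music_value(a, noise_vectors):
    """Evaluate 1 / ||a^H U_N||^2, with U_N given as a list of column vectors."""
    total = 0.0
    for u in noise_vectors:
        projection = sum(ai.conjugate() * ui for ai, ui in zip(a, u))
        total += abs(projection) ** 2
    return 1.0 / max(total, 1e-12)  # guard against division by zero at an exact null


a = steering_vector(1000.0, (0.10, 0.50), [0.0, 0.05, 0.10, 0.15])
print([round(abs(v), 6) for v in a])
```

Evaluating `music_value` on a grid of candidate (x, y) points gives the two-dimensional spectrum of one sub-band; the value becomes large where the steering vector is nearly orthogonal to the noise subspace.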

The MUSIC spectra of the sub-bands are summed, and a spectral peak search is carried out over the two-dimensional plane region; the coordinates of the peak are the speaker's position in that region.
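The summation and single peak search described here can be sketched as a plain grid search. The grid layout (spectra indexed as `[iy][ix]`) and the function name are assumptions made for the example.

```python
def locate_peak(spectra, xs, ys):
    """Sum the per-sub-band spectra point by point and return the (x, y) of the maximum.

    Each spectrum is a 2-D list indexed as spectrum[iy][ix] over the same grid.
    """
    best_value = float("-inf")
    best_xy = None
    for iy, y in enumerate(ys):
        for ix, x in enumerate(xs):
            total = sum(spectrum[iy][ix] for spectrum in spectra)
            if total > best_value:
                best_value, best_xy = total, (x, y)
    return best_xy


# Toy 2 x 3 grid with two sub-band spectra.
xs = [0.0, 0.5, 1.0]
ys = [0.5, 1.0]
band_a = [[1.0, 2.0, 1.0], [1.0, 5.0, 1.0]]
band_b = [[1.0, 1.0, 1.0], [2.0, 4.0, 1.0]]
print(locate_peak([band_a, band_b], xs, ys))
```

Because the spectra are added before searching, the grid is scanned once per localization run rather than once per sub-band.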

With this method, each localization run requires only one spectral peak search, and because the spectral matrix X(ω, t) has been truncated, computing its correlation matrices requires fewer operations.

In a third aspect, for the above real-time near-field sound source localization system based on a microphone array, the invention provides a method of creating and writing the localization result file that improves the system's real-time performance. The method includes:

The invention optimizes the traditional file creation and writing procedure. The common procedure before optimization is shown in Fig. 4, and the optimized flow is shown in Fig. 5. At system initialization, while the basic file information is written to the directory area, a cluster chain of about 60 clusters is created in the FAT table; the exact number can be adjusted as needed. The corresponding file data area can store 240 KB of data. The data generated by one run of the system is generally no larger than this, so there is no need to search for empty clusters or modify the FAT table during sound source localization. This method is referred to as the "pre-built cluster chain" method, and the cluster chain established during initialization is referred to here as the "initial cluster chain".

The "initial cluster chain" that the "pre-built cluster chain" method creates in the FAT table still resides on the SD card, and reading the links between clusters of the file data area from the SD card still takes considerable time, limited by the SD card's read/write speed. To further improve operating efficiency, the "initial cluster chain" is backed up to external SRAM at system initialization. Because the read/write speed of SRAM is far higher than that of the SD card, this further speeds up the system and removes unnecessary time overhead.

The beneficial effects of the invention are as follows:

The invention provides a near-field sound source localization system based on a microphone array. Compared with traditional sound localization systems and methods, it uses highly integrated chips wherever possible to reduce the number of nodes in the hardware circuit and improve system stability. For a given number of elements, the uniform linear microphone array makes the overall array size as large as possible, and with the method provided by the invention a uniform linear array localizes sound sources with high accuracy along the direction parallel to the array, so the array is well suited to a meeting room environment. Further optimizing the existing near-field two-dimensional wideband MUSIC localization algorithm reduces the amount of computation and improves real-time performance. Storing the localization result file on an SD card formatted with FAT32 improves the compatibility and general applicability of the system. Optimizing the existing embedded file creation and writing method with the "pre-built cluster chain" method further improves real-time performance.

Description of the drawings

Fig. 1 is a block diagram of the overall structure of the invention;

Fig. 2 is a schematic diagram of a single electret microphone and its amplifier circuit;

Fig. 3 is a flowchart of the transfer of microphone array signal data acquired by the DSP chip;

Fig. 4 is a flowchart of localization file creation and storage before optimization;

Fig. 5 is a flowchart of localization file creation and storage after optimization;

Fig. 6 is a schematic diagram of the external circuit of the ADC module;

Fig. 7 is a workflow diagram of the TMS320F28335 ADC module in cascaded mode.

Detailed description of the embodiments

Specific embodiments of the invention are further described below with reference to the accompanying drawings:

A real-time near-field sound source localization system based on a microphone array: an electret microphone array arranged as a uniform linear array acquires the speaker's voice signal; the electrical signal output by each electret microphone is amplified by a dedicated microphone amplifier containing an automatic gain control module; the amplifier outputs are acquired by a DSP chip with an on-chip analog-to-digital converter; the digitized samples are transferred to external SRAM by the direct memory access (DMA) function in the DSP chip; the optimized near-field two-dimensional wideband MUSIC localization algorithm runs inside the DMA interrupt; and the localization result is stored, in a file format specified by the user, on an SD card formatted with the FAT32 file system.

The microphone array used to acquire the speaker's voice is a uniform linear array with no fewer than 4 elements and an element spacing of no more than 5 cm (about half the minimum wavelength of the detected sound signal). The number of elements and the overall array size are determined by the needs of the practical application, subject to the near-field condition r ≤ 2L^2/λ, where L is the array size, r is the detection range of the system, and λ is the wavelength of the detected signal.
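Both constraints in this paragraph are easy to evaluate numerically. The sketch below is illustrative: it assumes a speed of sound of 343 m/s, takes the array size L to be the distance between the first and last element, and uses a hypothetical 16-element array; none of these values are stated in the source.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed


def near_field_limit_m(n_elements, spacing_m, freq_hz):
    """Largest range r satisfying r <= 2 * L**2 / wavelength for the given array."""
    aperture_m = (n_elements - 1) * spacing_m  # L, assumed to be the end-to-end length
    wavelength_m = SPEED_OF_SOUND / freq_hz
    return 2.0 * aperture_m ** 2 / wavelength_m


def max_frequency_hz(spacing_m):
    """Highest frequency for which the spacing is still at most half a wavelength."""
    return SPEED_OF_SOUND / (2.0 * spacing_m)


print(max_frequency_hz(0.05))              # frequency limit implied by 5 cm spacing
print(near_field_limit_m(16, 0.05, 1000))  # hypothetical 16-element array at 1 kHz
```

Under these assumptions a 5 cm spacing corresponds to a half-wavelength limit of about 3.4 kHz, and the near-field range grows with the square of the array length, which is why the text favors making the array as long as possible.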

The amplifier circuit uses a dedicated microphone amplifier with an automatic gain module (such as the MAX9814L) to amplify the output of each electret microphone; a highly integrated dedicated microphone amplifier reduces the number of system nodes and improves stability. The attack time t_attack and release time t_release of the automatic gain module must be strictly controlled: t_attack ranges from 0.8 ms to 1.4 ms, and t_release is 400 to 600 times t_attack. Under these conditions the amplified microphone array signals show no "clipping", and the sound does not exhibit the "pumping" effect of abruptly rising and falling loudness.

The signal acquisition and processing module is built around a DSP chip with an analog-to-digital conversion function. The resolution of the on-chip analog-to-digital converter is no less than 10 bits, and using an on-chip converter reduces the number of circuit nodes and improves stability. The microphone array signals acquired by the converter are transferred to external SRAM for storage by the direct memory access (DMA) function. Because speech is short-time stationary, the microphone array signals are split into frames of 10 ms to 40 ms each, and the localization computation is carried out in the DMA interrupt after each frame has been acquired.
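The framing step can be illustrated with a few lines of code. This is a sketch under assumptions: the sampling rate is not given in the source, the frames are taken as non-overlapping, and the helper names are hypothetical.

```python
def frame_length_samples(sample_rate_hz, frame_ms):
    """Number of samples per channel in one frame of the given duration."""
    if not 10 <= frame_ms <= 40:
        raise ValueError("frame duration should lie between 10 ms and 40 ms")
    return int(round(sample_rate_hz * frame_ms / 1000.0))


def split_frames(samples, frame_len):
    """Split one channel into consecutive non-overlapping frames, dropping any remainder."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


frame_len = frame_length_samples(16000, 32)  # 16 kHz is an assumed sampling rate
print(frame_len)
print(split_frames(list(range(10)), 4))
```

On the DSP the same effect is obtained by sizing the DMA transfer to one frame, so that the DMA interrupt fires exactly when a complete frame is available.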

No sample-and-hold circuit is needed between the microphone array with its amplifier circuit and the signal acquisition and processing module. Setting a sufficiently high sampling frequency, so that the sampling interval between adjacent elements does not exceed 1.6 μs, is enough to resolve the phase and amplitude differences of the speaker's voice signal acquired by adjacent elements, which reduces the number of system nodes, improves stability, and lowers the manufacturing cost.

The localization algorithm is the optimized near-field two-dimensional wideband MUSIC algorithm. The spectrum of the acquired microphone array signals is computed by fast Fourier transform and, based on the characteristics of the speaker's voice signal, truncated: two sub-bands with center frequencies of 300 Hz and 1000 Hz and a bandwidth of 100 Hz replace the time-domain array signal used in the classical MUSIC algorithm. The MUSIC spectrum of each sub-band is computed, the spectra are summed, and a spectral peak search determines the speaker's position coordinates in the two-dimensional region.

If the speakers at the site where the system is used are relatively fixed, a voiceprint library of the speakers can be established. By comparing the voiceprint distribution of each speaker in the library, 2 to 3 sub-bands are selected whose center frequencies are high-energy frequencies common to the voice spectra of all speakers in the library.

The two sub-bands with center frequencies of 300 Hz and 1000 Hz and a bandwidth of 100 Hz were chosen by examining a large number of spectrograms. To further improve the accuracy of the position estimate where the speakers are relatively fixed, the speakers' voiceprints can be collected in advance to build a voiceprint library, and 2 to 3 high-energy sub-bands common to all speakers to be detected can be chosen in place of the two sub-bands selected in claim 5, giving a more accurate position estimate.

The localization result is stored as a file on an SD card formatted with the FAT32 file system to maximize the compatibility of the system. The SD card operates in SPI mode, so the DSP chip needs neither a hardware MMC interface nor a software driver for one, and no dedicated MMC chip has to be added to the circuit. This lowers the requirements on the DSP chip, reduces the number of circuit nodes, reduces the cost and complexity of the system, and improves its stability.

When the localization result file is created and written on the SD card, the "pre-built cluster chain" method finds and records, at system initialization, the "empty clusters" on the SD card to which data can be written. The number of empty clusters found is no less than the size of the localization result file generated by one run of the system. While the localization algorithm runs and the result file is written, the cluster chain does not have to be searched again, which effectively improves the continuity and real-time performance of the system.
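The idea of finding the empty clusters once and linking them ahead of time can be modelled on an in-memory FAT table. The sketch below is a simplified illustration rather than the patent's firmware: the table size, the 4 KB cluster size, and the function name are assumptions, while the free marker 0 and the end-of-chain marker 0x0FFFFFFF follow the usual FAT32 conventions.

```python
FREE = 0x00000000
END_OF_CHAIN = 0x0FFFFFFF


def prebuild_chain(fat, n_clusters):
    """Link the first n_clusters free entries of fat into one chain and return them."""
    free = [c for c in range(2, len(fat)) if fat[c] == FREE][:n_clusters]
    if len(free) < n_clusters:
        raise RuntimeError("not enough free clusters on the card")
    for current, following in zip(free, free[1:]):
        fat[current] = following  # each entry points at the next cluster of the file
    fat[free[-1]] = END_OF_CHAIN
    return free


CLUSTER_BYTES = 4096          # assumed cluster size; 60 clusters then hold 240 KB
fat = [FREE] * 128            # toy FAT table, far smaller than a real one
fat[0], fat[1] = 0x0FFFFFF8, 0x0FFFFFFF  # the first two entries are reserved
fat[5] = END_OF_CHAIN         # pretend cluster 5 already belongs to another file

chain = prebuild_chain(fat, 60)
capacity = len(chain) * CLUSTER_BYTES
print(chain[:5], capacity)
```

After this step every later write only follows links that already exist, so no free-cluster search or FAT update is needed while localization is running.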

Embodiment 1:

In the microphone array and its amplifier circuit, the electret microphones are NMI9745 electret microphones and the amplifiers are MAX9814L dedicated microphone amplifiers, connected according to the schematic of a single electret microphone and its amplifier circuit shown in Fig. 3.

When the AGC module inside the MAX9814L detects that the output voltage exceeds the preset threshold, it reduces the gain with a selectable time constant, referred to as the "Attack Time". After the output amplitude falls, the gain is held at the reduced level for a short period, referred to as the "Hold Time", and then slowly rises back to its normal value over a period referred to as the "Release Time". The gain-reduction time constant t_attack is given by the following formula.

t_attack=2400 × C_CT

The ratio t_attack/t_release of the Attack Time (t_attack) to the Release Time (t_release) is set through the A/R pin; for the relationship between the A/R pin state and t_attack/t_release, refer to the official MAX9814 datasheet. Experimental comparison shows that for normal speech (a sound level of about 60 dB at the microphone), too short a t_attack produces audible "pumping", while too long a t_attack produces obvious "clipping". Here C_CT = 470 nF, giving t_attack = 1.1 ms and t_release = 550 ms. The AGC threshold is set through the TH pin; the TH pin voltage is the mean value of the output signal and is usually chosen slightly below the supply voltage. Here it is set with a series resistor divider; the divider resistor values and the circuit diagram of the microphone amplifier module are shown in Fig. 3. The electret microphones are arranged as a uniform linear array with an element spacing of no more than 5 cm.
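The timing values quoted above follow from the formula t_attack = 2400 × C_CT. The short check below assumes a release-to-attack ratio of 500, which is consistent with the 1.1 ms and 550 ms figures in the text; the function name and the default ratio are assumptions for illustration.

```python
def agc_times(c_ct_farads, release_ratio=500):
    """Return (t_attack, t_release) in seconds for a given timing capacitor C_CT."""
    t_attack = 2400.0 * c_ct_farads       # formula given in the text
    t_release = t_attack * release_ratio  # ratio assumed to be selected via the A/R pin
    return t_attack, t_release


t_attack, t_release = agc_times(470e-9)   # C_CT = 470 nF as in the embodiment
print(f"t_attack = {t_attack * 1e3:.3f} ms, t_release = {t_release * 1e3:.0f} ms")
```

The unrounded result is about 1.13 ms and 564 ms, which the text rounds to 1.1 ms and 550 ms; both lie inside the 0.8 ms to 1.4 ms window and the 400 to 600 ratio range stated earlier.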

The DSP chip is the TMS320F28335 produced by TI. The F28335 has two groups of ADC inputs, A and B, each with 8 analog input channels; the ADC portion of the F28335 circuit is shown in Fig. 6. MIC8 to MIC15 are connected to the outputs of the microphone amplifier circuits. ADCREFP and ADCREFM are the positive and middle reference outputs inside the ADC module; to prevent unexpected jitter, each of these two pins is connected to analog ground through a 2.2 μF ceramic capacitor with low equivalent series resistance (ESR). The ADCLO pin is the low reference of the ADC module and is connected directly to analog ground. The ADCRESEXT external bias resistor pin is connected to analog ground through a 24.9 kΩ resistor with 1% tolerance.

The core of the F28335 on-chip ADC is a 12-bit analog-to-digital conversion module. Under the control of the system control module, the 16 analog input channels time-share this module through two sample-and-hold units, SH-A and SH-B, and the conversion results are stored in the result registers ADCRESULT0 to ADCRESULT15 in the order set by the sequencers. The sequencers SEQ1 and SEQ2 have two modes of operation: cascaded operation and dual-sequencer operation. In cascaded operation, SEQ1 and SEQ2 are combined into a single sequencer that responds to one SOC trigger source; in dual-sequencer operation, SEQ1 and SEQ2 each respond to their own SOC trigger source and are responsible for ordering the conversion results of the ADCINAx and ADCINBx channels respectively.

The main difference between cascaded and dual-sequencer operation is the number of SOC trigger sources they respond to. The system designed here has a single SOC trigger source, EPWMA, so cascaded mode is used. In cascaded mode, the two sequencers SEQ1 and SEQ2 form one 16-state sequencer SEQ. SEQ controls the ADC module so that conversion results are stored in the result registers in the preset order, and an ADC conversion interrupt is generated after every channel has completed one conversion. The workflow of the TMS320F28335 ADC module in cascaded mode is shown in Fig. 7.

Following the near-field two-dimensional wideband MUSIC localization algorithm described in the second aspect above, the MUSIC spectra of the two high-energy sub-bands are summed to give the optimized MUSIC spectrum, and a spectral peak search yields the speaker's position coordinates in the two-dimensional plane region.

The SD card is connected to the TMS320F28335 DSP chip in SPI mode, and the localization results are written to the data area of the SD card in the required file format, following the file-writing flow shown in Fig. 5. When a cluster is full, writing jumps to the next cluster according to the "initial cluster chain" in SRAM and continues until the user ends the system's current task.
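Writing along a chain that is already cached in RAM can be sketched as follows. This is an illustrative model only: the dictionary stands in for the SD card data area, the cluster size is an assumed 4 KB, and the function and parameter names are hypothetical.

```python
def write_with_cached_chain(data, cached_chain, cluster_bytes, write_cluster):
    """Write data cluster by cluster, following the chain held in RAM."""
    needed = -(-len(data) // cluster_bytes)  # ceiling division
    if needed > len(cached_chain):
        raise ValueError("data does not fit in the pre-built chain")
    for i in range(needed):
        chunk = data[i * cluster_bytes:(i + 1) * cluster_bytes]
        write_cluster(cached_chain[i], chunk)  # the FAT on the card is never read here
    return cached_chain[:needed]


# In-memory stand-in for the SD card data area, keyed by cluster number.
card = {}
used = write_with_cached_chain(b"x" * 10000, [2, 3, 4, 6, 7], 4096, card.__setitem__)
print(used, len(card[4]))
```

The next cluster number comes from a list lookup instead of an SD card read, which is the time saving the SRAM copy of the "initial cluster chain" is meant to provide.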

The above is only a preferred embodiment of the invention and is not intended to limit it; for those skilled in the art, the invention may be modified and varied in many ways. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. A real-time near-field sound source localization system based on a microphone array, characterized in that: an electret microphone array arranged as a uniform linear array acquires the speaker's voice signal; the electrical signal output by each electret microphone is amplified by a dedicated microphone amplifier containing an automatic gain control module; the amplifier outputs are acquired by a DSP chip containing an on-chip analog-to-digital converter; the digitized samples are transferred to external SRAM by the direct memory access function in the DSP chip; the optimized near-field two-dimensional wideband MUSIC localization algorithm is executed in the DMA interrupt; and the localization result is stored, in a file format specified by the user, on an SD card formatted with the FAT32 file system.

2. The real-time near-field sound source localization system based on a microphone array according to claim 1, characterized in that: the microphone array used to acquire the speaker's voice is a uniform linear array with no fewer than 4 elements and an element spacing of no more than 5 cm, and the number of elements and the overall array size satisfy the near-field condition r ≤ 2L^2/λ, where L is the array size, r is the detection range of the system, and λ is the wavelength of the detected signal.

3. The real-time near-field sound source localization system based on a microphone array according to claim 1 and claim 2, characterized in that: the amplifier circuit uses a highly integrated dedicated microphone amplifier containing an automatic gain module to amplify the output of each electret microphone; for the attack time t_attack and release time t_release of the automatic gain module, t_attack ranges from 0.8 ms to 1.4 ms and t_release is 400 to 600 times t_attack.

4. The real-time near-field sound source localization system based on a microphone array according to claim 1, wherein the signal acquisition and processing module is built around a DSP chip with an analog-to-digital conversion function; the resolution of the on-chip analog-to-digital converter is no less than 10 bits; the microphone array signals are split into frames of 10 ms to 40 ms each; and the localization computation is carried out in the DMA interrupt after each frame of microphone array signals has been acquired.

5. The real-time near-field sound source localization system based on a microphone array according to claim 1, claim 3 and claim 4, characterized in that: a sufficiently high sampling frequency is set so that the sampling interval between adjacent elements does not exceed 1.6 μs.

6. The real-time near-field sound source localization system based on a microphone array according to claim 1 and claim 4, characterized in that the localization algorithm is the optimized near-field two-dimensional wideband MUSIC algorithm:

The spectrum of the acquired microphone array signals is computed by fast Fourier transform and, based on the characteristics of the speaker's voice signal, truncated: two sub-bands with center frequencies of 300 Hz and 1000 Hz and a bandwidth of 100 Hz replace the time-domain array signal used in the classical MUSIC algorithm; the MUSIC spectrum of each sub-band is computed, the spectra are summed, and a spectral peak search determines the speaker's position coordinates in the two-dimensional region.

7. The real-time near-field sound source localization system based on a microphone array according to claim 1 and claim 6, characterized in that: when the speakers at the site where the system is used are fixed, a voiceprint library of the speakers can be established, and by comparing the voiceprint distribution of each speaker in the library, 2 to 3 sub-bands are selected whose center frequencies are high-energy frequencies common to the voice spectra of all speakers in the library.

8. The real-time near-field sound source localization system based on a microphone array according to claim 1, claim 4 and claim 5, characterized in that: the two sub-bands with center frequencies of 300 Hz and 1000 Hz and a bandwidth of 100 Hz are chosen by examining a large number of spectrograms; where the speakers are fixed, the speakers' voiceprints are collected in advance to build a voiceprint library, and 2 to 3 high-energy sub-bands common to all speakers to be detected are chosen in place of the two sub-bands selected in claim 5, giving a more accurate estimate of the speaker's position.

9. The real-time near-field sound source localization system based on a microphone array according to claim 1, characterized in that: the localization result is stored as a file on an SD card formatted with the FAT32 file system; the SD card operates in SPI mode, so the DSP chip needs neither a hardware MMC interface nor a software driver for one, and no dedicated MMC chip has to be added to the circuit.

10. The real-time near-field sound source localization system based on a microphone array according to claim 1 and claim 7, wherein, when the localization result file is created and written on the SD card, the "pre-built cluster chain" method finds and records, at system initialization, the "empty clusters" on the SD card to which data can be written, and the number of empty clusters found is no less than the size of the localization result file generated by one run of the system.