CN106887236A

CN106887236A - Remote speech acquisition device with joint acoustic-image localization

Info

Publication number: CN106887236A
Application number: CN201510973445.2A
Authority: CN
Inventors: 朱沄杰; 徐伟明; 何颋; 黄松岳
Original assignee: Ningbo Sangdena Electronic Technology Co Ltd
Current assignee: Ningbo Sangdena Electronic Technology Co Ltd
Priority date: 2015-12-16
Filing date: 2015-12-16
Publication date: 2017-06-23

Abstract

Speech acquisition at long range against background noise is difficult. Approaches that rely on a camera alone, on a highly directional microphone, or on a microphone array alone have trouble determining the target speaker or require a mechanical rotation mechanism. The invention proposes a speech acquisition device that combines a surveillance camera with microphone array beamforming to locate the target speaker by joint acoustic-image determination, thereby improving the performance of outdoor remote speech enhancement and acquisition under ambient noise.

Description

Remote speech acquisition device with joint acoustic-image localization

Technical field

The present invention relates to a speech acquisition device, and more particularly to a remote speech acquisition device with joint acoustic-image localization.

Background technology

In fields such as security and public safety, video surveillance systems of all kinds are in wide use. With them, persons of interest can be identified and screened from long-range video. If spoken dialogue could also be captured by remote speech acquisition while a suspect is being identified and screened through the video surveillance system, working efficiency would improve considerably. Remote speech acquisition under real-world background noise, however, remains very difficult.

Because of ambient noise, a remote speech acquisition device must be highly directional to guarantee the quality of the captured speech. Current remote speech acquisition devices mainly obtain high directivity with shotgun microphones of interference-tube construction.

For example, Chinese patent ZL 2010101269089 discloses a sound pickup device comprising a housing, a first piezoelectric element, a second piezoelectric element, and a circuit unit. The housing has a sound pickup opening. The first piezoelectric element is arranged in the housing to sense the vibration of high-frequency sound waves and convert it into an output signal; the second piezoelectric element is arranged in the housing to sense the vibration of low-frequency sound waves and convert it into an output signal. The circuit unit is electrically connected to both piezoelectric elements, receives their signals, and processes them to produce a sound signal. The device therefore has better sensing sensitivity and a wider audio range, and can improve sound quality.

Chinese patent ZL2010591158.2 discloses a long-range sound pickup device with video positioning. Sound-focusing barrel structures with 2 built-in directional microphones are mounted on a circle centered on the video camera to form high directivity, and 2 omnidirectional microphones mounted outside the barrel side walls pick up ambient noise as a noise reference. The sound-focusing barrels rotate with the camera. Only after an operator has performed video positioning based on the video image content does the device capture the sound signal from the direction in which the camera points, and it applies adaptive noise reduction using a digital signal processor.

However, the highly directional devices described above can only form a fixed high-directivity beam directly in front of the device. In practice the device must be rotated to aim at a moving remote speaker, which adds the cost of mechanical servo control. Moreover, video surveillance has a wide field of view on a remote target: a person can be seen in the long-range video image by focusing, but speaking behaviour often cannot be discerned directly. When the camera and the remote pickup device are moved mechanically to aim at the target speaker, it is therefore hard to keep them synchronised, which makes the surveillance system inconvenient to design and use.

A microphone array consists of multiple microphones arranged in a given topology. A beamforming algorithm gives the array different responses to signals from different directions, i.e. spatial directivity, so that the array can perform sound source localization and tracking, speech extraction and separation, and noise reduction. This improves speech signal quality against complex backgrounds, makes up for the inability of an isolated microphone to obtain and use spatial information, and avoids the need for a mechanical rotation mechanism to aim at the target speaker.

Chinese patent ZL 2013102011025 discloses a new model-domain compensation method for remote speech recognition. To address the difficulty of indoor remote speech acquisition and recognition with a microphone array, the method simulates the indoor reverberant acoustic environment and generates room impulse response sequences for different positions from the input room dimensions, so that indoor remote speech is compensated in the model domain and acquisition and recognition performance improve.

However, in outdoor remote speech acquisition for fields such as security and public safety, the distance to the target speaker is much greater than in indoor applications, and ambient noise is severe. Under these conditions it is difficult to obtain the speaker direction, and hence to enhance and acquire the speech, using microphone array algorithms alone.

Summary of the invention

Speech acquisition at long range against background noise is difficult. Approaches that rely on a camera alone, on a highly directional microphone, or on a microphone array alone have trouble determining the target speaker or require a mechanical rotation mechanism. The invention therefore proposes a speech acquisition device that combines a surveillance camera with microphone array beamforming to locate the target speaker by joint acoustic-image determination, thereby improving the performance of outdoor remote speech enhancement and acquisition under ambient noise.

A remote speech acquisition device with joint acoustic-image localization comprises the following modules:

Surveillance camera: acquires long-range video images;

Microphone array: performs multi-channel acquisition, pre-processing, and analog-to-digital conversion of the sound signal;

Beam scanning module, whose input is connected to the output of the microphone array: performs beam scanning to obtain the directional distribution of remote speech and noise;

Joint acoustic-image processing module, whose inputs are connected to the outputs of the surveillance camera and of the beam scanning module: takes the image information from the surveillance camera and the speech and noise direction information from the beam scanning module and, after coordinate conversion, sends them to the joint acoustic-image monitoring display for joint localization display;

Joint acoustic-image monitoring display, whose input is connected to the output of the joint acoustic-image processing module: receives the joint acoustic-image information sent by that module and displays it on screen;

Target selection module: lets the monitoring operator select the target speaker based on the combined image and sound information on the joint acoustic-image monitoring display;

Beam alignment module, whose inputs are connected to the outputs of the target selection module and of the microphone array: steers the microphone array beam toward the direction of the target speaker selected by the target selection module;

Speech acquisition module, whose input is connected to the output of the beam alignment module: acquires the speech information output by the beam alignment module.

The microphone array includes an enhancement module. The speech signal output of each microphone array channel is connected through the enhancement module to the beam scanning module and to the beam alignment module. The enhancement module strengthens the speech information from the microphone array.

The enhancement module includes a preamplifier circuit and an analog-to-digital converter.

The microphone array includes reflectors, which focus the sound signal onto the microphones.

The target selection module has a mouse input. The operator watches the joint acoustic-image monitoring display and selects the target speaker with the mouse; after coordinate conversion, the target selection module outputs the direction information of the target speaker to the beam alignment module.

The steps for using the remote speech acquisition device with joint acoustic-image localization are as follows:

An initialization step: the parameters of each module are initialized;

A video acquisition step: the surveillance camera acquires long-range video images;

A beam scanning step: the microphone array scans the remote speech and signals by direction to obtain directional distribution information;

A joint acoustic-image processing step: the beam scanning result is coordinate-converted and merged into the video image to form a joint acoustic-image video image;

A joint acoustic-image display step: the joint acoustic-image display shows the joint result;

A target selection step: the operator uses the image and sound information on the joint acoustic-image display to select the target speaker with the mouse, and the direction information of the target speaker is output after coordinate conversion;

A beam alignment step: the direction of the selected target speaker is passed to the microphone array for beam alignment;

A speech acquisition step: the beam-aligned microphone array signal is acquired.

Brief description of the drawings

Fig. 1 is a structural block diagram of the embodiment of the present invention;

Fig. 2 is a schematic diagram of the microphone reflector of the embodiment;

Fig. 3 is a circuit diagram of the 5-element microphone array of the embodiment and its connection to the microprocessor;

Fig. 4 is a schematic diagram of the beam scanning principle of the embodiment;

Fig. 5 is a circuit diagram of the connection between the camera of the embodiment and the microprocessor.

Specific embodiment

To make the technical content, features, and advantages of the invention clearer, the invention is further described below through an embodiment with reference to the drawings.

In this embodiment of the array-type remote speech acquisition device with joint acoustic-image localization, the microphone array is a linear array of 5 equally spaced microphones (m0, m1, ..., m4). Each microphone in the array is fitted with the reflector shown in Fig. 2, whose reflecting surface makes an angle of 45 degrees with the axis. The reflector is made of stainless steel so that the device can be installed outdoors. To focus remote speech, the reflector diameter is set to d₀ = 40 cm in this embodiment. The sound signals captured by the microphone array are beam-scanned with a beam scanning algorithm to obtain the directional distribution of remote speech and noise.

The microphone array consists of microphones and hardware circuitry: omnidirectional microphones m0, ..., m4 that are small, structurally simple, and have good electroacoustic performance; preamplifier circuits built from NJM2100 operational amplifier chips; and a MAX118 analog-to-digital converter chip (as shown in Fig. 3). To capture remote speech, the microphone spacing is set to d = 40 cm in this embodiment.

The beam scanning module, the joint acoustic-image processing module, the beam alignment and enhancement module, the target selection module, and the other such modules are digital signal processing modules. In this embodiment they are implemented in software on an ARM9 S3C2440 microprocessor.

The microphone array is connected to the microprocessor as follows. The output signals of the 5 microphones are amplified by the 2-stage preamplifier circuits built from the operational amplifiers shown in Fig. 2 and fed to the multi-channel analog-to-digital converter chip MAX118. The S3C2440 microprocessor controls the MAX118 input channel pins A1, A2, A3 through I/O ports GPB2, GPB3, GPB4, and controls the MAX118 write/read pins WR and RD through timer output pins TOUT0 and TOUT1, performing analog-to-digital conversion at a sampling rate of 16 ksps. The 8-bit conversion results are transferred to the S3C2440 microprocessor over data lines DATA0 to DATA7.
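The following minimal sketch shows how the 8-bit conversion results described above might be arranged into per-channel sample arrays for later processing. It rests on assumptions that the text does not state: that the bytes arrive interleaved in the order m0, m1, ..., m4, and that code 128 corresponds to zero signal. The function name `deinterleave` is hypothetical.

```python
import numpy as np

def deinterleave(raw, n_channels=5):
    """Split an interleaved stream of 8-bit ADC codes into per-channel floats.

    Assumptions (not from the source): samples arrive in the order
    m0, m1, ..., m4, m0, ... and code 128 is the zero-signal level.
    """
    codes = np.frombuffer(raw, dtype=np.uint8)
    usable = len(codes) - len(codes) % n_channels   # drop an incomplete last frame
    frames = codes[:usable].reshape(-1, n_channels).T
    # Map unsigned codes 0..255 to roughly -1.0 .. +1.0
    return (frames.astype(np.float64) - 128.0) / 128.0

# Two sampling instants for 5 channels
samples = deinterleave(bytes([128, 0, 255, 128, 64] * 2))
print(samples.shape)
```

At 16 ksps per channel and one byte per sample, five channels produce 80,000 bytes per second under these assumptions.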

After the multi-channel speech signals in this embodiment have been digitized and passed to the microprocessor, the data and control flow between the digital signal processing modules, which run as software, is connected as shown in Fig. 3 and is described as follows:

The beam scanning module applies a stepwise delay adjustment to each channel of the microphone array and then sums the channels, yielding the beamformed signal for each beam. The beam scanning principle is described with reference to Fig. 3. In this embodiment, the horizontal line through the 5-element linear microphone array is taken as the X axis and the position of the middle microphone m2 as the coordinate origin; the element spacing of the linear array is d. Beam scanning uses the centre element m2 as the reference: the speech signal received by m2 is not delay-compensated, while the speech signal x_i received by each of the other channels is delay-compensated as follows to give x′_i (as shown in Fig. 4):

x′_i(k, j)=x_i(k′)；

where i is the channel number in the linear array. With a beam scanning interval of 1.25 degrees, scanning the 180-degree range in front of the linear array takes 144 scans, 72 on each side, so j = 0, ±1, ±2, ±3, ..., ±72 is the beam scan index. θ_j is the scanning beam formed after each delay adjustment, C is the speed of sound in air (340 m/s in this embodiment), f_s is the sampling frequency of the microphone array speech signal (in Hz; 16000 Hz in this embodiment), and round() denotes rounding. Summing the delay-compensated channel signals x′_i for each scan step realises a beam scan over ±90 degrees (in this embodiment, the 180-degree range in front of the linear array). Beam scanning the received noisy speech within a calculation window of length L (L = 800 in this embodiment) yields the beam information E(θ_j), j = 0, ±1, ±2, ±3, ..., ±72, over the ±90-degree range, which contains the directions of the remote speech source and the noise sources.
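The beam scan described above can be sketched as a delay-and-sum loop. The expression for k′ is not given in this text, so the sketch assumes the usual far-field form k′ = k − round((i − 2)·d·sin θ_j·f_s / C) with θ_j = 1.25°·j, which uses only the symbols defined above. Taking E(θ_j) as the summed squared output over the window, and using a circular shift inside the window, are further simplifying assumptions. The function names are hypothetical.

```python
import numpy as np

C = 340.0        # speed of sound in air, m/s (embodiment value)
FS = 16000       # sampling frequency f_s, Hz
D = 0.40         # microphone spacing d, m
L_WIN = 800      # calculation window length L, samples
STEP_DEG = 1.25  # beam scanning interval, degrees
N_MICS = 5
REF = 2          # index of the centre microphone m2 (no delay compensation)

def steering_delays(theta_deg):
    """Integer sample delay per channel relative to m2 (assumed far-field formula)."""
    offsets = (np.arange(N_MICS) - REF) * D   # microphone positions on the X axis, m
    return np.round(offsets * np.sin(np.radians(theta_deg)) * FS / C).astype(int)

def delay_and_sum(frame, theta_deg):
    """Delay-compensate each channel, x'_i(k) = x_i(k - delay_i), then add the channels."""
    out = np.zeros(frame.shape[1])
    for i, n in enumerate(steering_delays(theta_deg)):
        out += np.roll(frame[i], n)           # circular shift: a simplification
    return out

def beam_scan(frame):
    """Return the scan angles theta_j and the beam energy E(theta_j) for j = -72..72."""
    thetas = np.arange(-72, 73) * STEP_DEG
    energy = np.array([np.sum(delay_and_sum(frame, t) ** 2) for t in thetas])
    return thetas, energy

# Synthetic check: a noise-like source arriving from 30 degrees
rng = np.random.default_rng(0)
s = rng.standard_normal(L_WIN)
frame = np.stack([np.roll(s, -n) for n in steering_delays(30.0)])
thetas, energy = beam_scan(frame)
print(thetas[np.argmax(energy)])
```

In this synthetic case the energy curve peaks at the simulated arrival angle because all five channels add coherently only for the matching delay set.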

Surveillance camera video acquisition: because video acquisition with a surveillance camera is common technology in the field, it is not described in detail in this embodiment. The embodiment uses a CMOS camera with an OV9650 chip, as is common in the field, for long-range video acquisition. The video images captured by the camera are fed to the S3C2440 microprocessor over a USB interface for joint acoustic-image processing.

Joint acoustic-image processing: this step jointly processes the beam scanning result from the microphone array and the video image from the camera. In the S3C2440 microprocessor, the beam information obtained by beam scanning is coordinate-transformed according to the camera's field of view. The OV9650 camera in this embodiment is a fixed-focus camera (video image format set to 640 × 320, frame rate 15 fps). The embodiment uses the OV9650 camera to monitor a fixed site 60 meters away. With the camera axis as the centre, the horizontal angle subtended in the camera's field of view by the monitored fixed site at 60 meters was measured as ±45 degrees. The beam scanning result is then converted to the beam data within the camera's field of view by the following coordinate conversion:

Specifically, after joint acoustic-image processing the display shows, as a red highlighted curve at the bottom of the 640 × 320 video image, the sound energy beam pattern at the angles corresponding to the coordinate-converted image information. From this combined display of image and sound energy beams the monitoring operator can easily select the speaker whose speech is to be acquired. After the coordinate transform, the beam data within the camera's field of view is interpolated, using an interpolation algorithm common in the field, to a 320-point beam curve, which is superimposed on the image data captured by the camera as a red highlighted curve.

Joint acoustic-image display: the camera image with the superimposed beam curve is sent to a monitoring display of the kind common in the field. In this embodiment, the distribution of noise and signal energy within the ±45-degree horizontal range corresponding to the site 60 meters away in the OV9650 camera's field of view is thus shown intuitively as a red highlighted curve overlaid on the 640 × 320 display.
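As a minimal sketch of the conversion and interpolation just described, the function below keeps only the beams that fall inside the ±45-degree camera field of view and resamples them to a 320-point curve. Linear interpolation via `numpy.interp` and a uniform angle grid across the field of view are assumptions; the text only refers to an interpolation algorithm common in the field. The name `beams_to_overlay` is hypothetical.

```python
import numpy as np

def beams_to_overlay(thetas_deg, energy, half_fov_deg=45.0, n_points=320):
    """Restrict the scan result to the camera field of view and resample it.

    thetas_deg must be in ascending order, as produced by a scan over j = -72..72.
    Returns the angle grid and the interpolated beam curve, both of length n_points.
    """
    thetas_deg = np.asarray(thetas_deg, dtype=float)
    energy = np.asarray(energy, dtype=float)
    inside = np.abs(thetas_deg) <= half_fov_deg          # beams visible to the camera
    grid = np.linspace(-half_fov_deg, half_fov_deg, n_points)
    curve = np.interp(grid, thetas_deg[inside], energy[inside])  # linear interpolation
    return grid, curve

# Example with a 1.25-degree scan over +/-90 degrees and a dummy energy profile
thetas = np.arange(-72, 73) * 1.25
energy = np.cos(np.radians(thetas)) ** 2
grid, curve = beams_to_overlay(thetas, energy)
print(len(curve), curve.max())
```

With a 1.25-degree scan interval, 73 beams lie inside ±45 degrees, so the 320-point curve is an upsampled version of those 73 values.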

Target selection: by watching the 640 × 320 monitoring image directly, the operator sees both the people in the image at 60 meters and the speech and noise energy beams in the corresponding field of view. Especially when the field of view contains several people, several vehicles, or other noise sources, the operator can readily determine the target speaker from the video image together with the sound beam curve and confirm the target speaker with the mouse. Once the target speaker has been confirmed with the mouse, the horizontal coordinate z of the selected point on the screen can be converted to the corresponding target angle θ_t using technology common in the field. The conversion principle is:
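One plausible form of the pixel-to-angle conversion is sketched below. It assumes a linear mapping between the horizontal pixel coordinate z (0 to 639) and the ±45-degree field of view, with the image centre on the camera axis; the source's own conversion may differ, and a pinhole camera model would use an arctangent instead. The function name `pixel_to_angle` is hypothetical.

```python
def pixel_to_angle(z, width=640, half_fov_deg=45.0):
    """Map a horizontal pixel coordinate z (0 .. width-1) to an angle in degrees.

    Assumption: angle varies linearly with pixel position across the field of view.
    """
    centre = (width - 1) / 2.0
    return (z - centre) / centre * half_fov_deg

# A click at the left edge, the centre, and the right edge of the image
for z in (0, 319.5, 639):
    print(z, pixel_to_angle(z))
```

The resulting angle θ_t uses the same sign convention as the beam scan only if the camera axis and the array broadside direction are aligned, which this sketch also assumes.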

Beam alignment and enhancement: in this embodiment, once the direction of the remote target speaker has been determined by joint acoustic-image localization, the delay of each microphone array channel corresponding to the target angle θ_t is calculated and the channel signals are aligned accordingly. The aligned channel signals are then weighted and summed to give the beamformed output aimed at the target speaker, which is the enhanced remote speech.

Speech acquisition: the beam-aligned and enhanced remote speech is acquired using technology common in the field and saved to the configured storage medium.

The foregoing is only a preferred embodiment of the invention and is not intended to limit it. The chief feature of the disclosed array-type remote speech acquisition device with joint acoustic-image localization is that it locates the remote target speaker by combining the noise and speech direction information provided by the reflector-equipped microphone array with the target video image provided by the surveillance camera. This overcomes the difficulty conventional methods have in determining the target speaker at long range and under ambient noise. In particular, once the target speaker's direction has been determined by joint localization, the reflectors together with microphone array speech enhancement further suppress the influence of ambient noise and improve remote speech acquisition performance.
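The alignment and weighted summation can be sketched as follows. The per-channel delay uses the same assumed far-field expression, round((i − 2)·d·sin θ_t·f_s / C), and uniform weights of 1/5 are an assumption, since the text does not state the weights. Zero padding is used at the window edges. The function name `align_beam` is hypothetical.

```python
import numpy as np

def align_beam(channels, theta_t_deg, d=0.40, fs=16000, c=340.0, weights=None):
    """Align every channel to the target angle and form the weighted sum.

    channels: array of shape (n_mics, n_samples); the centre microphone is the reference.
    weights:  per-channel weights; uniform 1/n_mics if omitted (an assumption).
    """
    channels = np.asarray(channels, dtype=float)
    n_mics, n_samples = channels.shape
    if weights is None:
        weights = np.full(n_mics, 1.0 / n_mics)
    offsets = (np.arange(n_mics) - (n_mics - 1) / 2.0) * d
    delays = np.round(offsets * np.sin(np.radians(theta_t_deg)) * fs / c).astype(int)
    max_d = int(np.max(np.abs(delays)))
    padded = np.pad(channels, ((0, 0), (max_d, max_d)))   # zeros beyond the window
    out = np.zeros(n_samples)
    for i in range(n_mics):
        start = max_d - delays[i]                         # out[k] += w_i * x_i(k - delay_i)
        out += weights[i] * padded[i, start:start + n_samples]
    return out

# Synthetic check: a burst arriving from 30 degrees is recovered after alignment
rng = np.random.default_rng(1)
s = np.zeros(800)
s[100:700] = rng.standard_normal(600)
n = np.round((np.arange(5) - 2) * 0.40 * np.sin(np.radians(30.0)) * 16000 / 340.0).astype(int)
channels = np.stack([np.roll(s, -k) for k in n])
print(np.allclose(align_beam(channels, 30.0), s))
```

Non-uniform weights could be passed through `weights` to trade main-lobe width against side-lobe level, which is a common design choice in delay-and-sum beamforming.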

Claims

1. A remote speech acquisition device with joint acoustic-image localization, characterised in that it comprises the following modules:

Surveillance camera: acquires long-range video images;

Beam scanning module, whose input is connected to the output of the microphone array: performs beam scanning to obtain the directional distribution of remote speech and noise;

Joint acoustic-image processing module, whose inputs are connected to the outputs of the surveillance camera and of the beam scanning module: takes the image information from the surveillance camera and the speech and noise direction information from the beam scanning module and, after coordinate conversion, sends them to the joint acoustic-image monitoring display for joint localization display;

Joint acoustic-image monitoring display, whose input is connected to the output of the joint acoustic-image processing module: receives the joint acoustic-image information sent by that module and displays it on screen;

Target selection module: lets the monitoring operator select the target speaker based on the combined image and sound information on the joint acoustic-image monitoring display;

Beam alignment module, whose inputs are connected to the outputs of the target selection module and of the microphone array: steers the microphone array beam toward the direction of the target speaker selected by the target selection module;

Speech acquisition module, whose input is connected to the output of the beam alignment module: acquires the speech information output by the beam alignment module.

2. The remote speech acquisition device with joint acoustic-image localization according to claim 1, characterised in that: the microphone array includes an enhancement module; the speech signal output of each microphone array channel is connected through the enhancement module to the beam scanning module and to the beam alignment module; and the enhancement module strengthens the speech information from the microphone array.

3. The remote speech acquisition device with joint acoustic-image localization according to claim 2, characterised in that: the enhancement module includes a preamplifier circuit and an analog-to-digital converter.

4. The remote speech acquisition device with joint acoustic-image localization according to claim 1, characterised in that: the microphone array includes reflectors, which focus the sound signal onto the microphones.

5. The remote speech acquisition device with joint acoustic-image localization according to claim 1, characterised in that: the target selection module has a mouse input; the operator watches the joint acoustic-image monitoring display and selects the target speaker with the mouse; and after coordinate conversion the target selection module outputs the direction information of the target speaker to the beam alignment module.

6. Steps for using the remote speech acquisition device with joint acoustic-image localization according to any one of claims 1-5:

An initialization step: the parameters of each module are initialized;

A beam scanning step: the microphone array scans the remote speech and signals by direction to obtain directional distribution information;

A joint acoustic-image processing step: the beam scanning result is coordinate-converted and merged into the video image to form a joint acoustic-image video image;

A beam alignment step: the direction of the selected target speaker is passed to the microphone array for beam alignment;