CN202068548U - Three-dimensional space high-definition voice acquisition subsystem of video sensing system

Info

Publication number: CN202068548U
Application number: CN2011200124345U
Authority: CN
Inventors: 穆科明; 王兴国; 方汝松
Original assignee: 穆科明; 王兴国; 方汝松
Current assignee: NANJING JIEMAI VIDEO TECHNOLOGY CO., LTD.
Priority date: 2011-01-17
Filing date: 2011-01-17
Publication date: 2011-12-07
Anticipated expiration: 2021-01-17

Abstract

The utility model discloses a three-dimensional space high-definition voice acquisition subsystem of a video sensing system. In the subsystem, the output of each microphone is connected to the input of a microphone programmable gain amplifier; the signal output of the microphone programmable gain amplifier is connected to the input of an analog-to-digital conversion (A/D) module; the output of the A/D module is connected to the input of an audio preprocessing module; the output of the audio preprocessing module is connected to the input of a three-dimensional space high-definition voice processing system; the output of that system is connected to the input interface of an audio and video signal processor; the output interface of the audio and video signal processor is connected to the input interface of an audio decompression module; and the output interface of the audio decompression module is connected to a standard audio input interface. The advantages are that the subsystem uses a microphone array to acquire, identify, and confirm sound sources in the surrounding environment, which reduces echo; voice amplification control is self-adjusting, so that wideband voice signals can be processed; and sound quality and effect are optimized.

Description

Three-dimensional space high-definition voice acquisition subsystem of a video sensing system

Technical field

The utility model relates to a three-dimensional space high-definition voice acquisition subsystem of a video sensing system. It belongs to the technical field of applications such as security monitoring systems, intelligent transportation systems, high-definition video conferencing systems, high-definition medical video diagnostic equipment, and distance education systems.

Background technology

At present, the audio in common surveillance cameras, video conferencing systems, and medical devices is generally acquired with a single microphone, as shown in Figure 1. The voice acquisition module in a conventional video sensing system basically consists of four parts:

1. The audio signal sampling front end, generally implemented with a single microphone. The microphone converts the sound signal into an analog signal, which is fed into the system through a power amplifier.

2. The analog-to-digital conversion (A/D) module, which converts the analog signal delivered by the sampling front end into a digital signal and passes it to the audio preprocessing module.

3. Audio preprocessing, which mainly applies processing such as noise reduction to the input signal in the digital domain.

4. Audio compression, which compresses the processed voice signal into the required format, such as MP3. The result is then sent to the audio and video processor for storage and transmission.

The audio acquisition subsystem of a conventional video sensing system has two serious problems. First, because a single signal input device is used, the microphone must be as close as possible to the voice source to obtain a good-quality signal. In practice, however, it is difficult to require users to speak close to the microphone. A surveillance camera, for example, is often mounted high up or concealed, so the monitored subject is rarely near it. Without a good data source, the audio acquisition front end cannot obtain a high-quality signal. Second, some application settings, such as outdoor sites and exhibition halls, contain many audio signals. A conventional audio sampling system cannot identify the target signal source; it can only mechanically capture all signals and compress them. The resulting voice signal is very noisy, and the target speaker's signal is hard to extract, so the audio acquisition system cannot deliver the best sound quality in every situation.

Summary of the invention

The utility model proposes a three-dimensional space high-definition voice acquisition subsystem of a video sensing system. Its purpose is to overcome the defects of the prior art and solve the audio quality problem in video sensing systems. A microphone array is used to acquire high-definition voice: the target sound source is selected in three-dimensional space and is amplified and processed specifically, while noise, echo, and other non-target sounds are suppressed and eliminated. A microphone array is a group of microphones placed close together in an ordered arrangement; it exploits the differences in the arrival time of a sound wave at the different microphones to obtain better directivity.

Technical solution of the utility model: the subsystem comprises a microphone array, a microphone programmable gain amplifier, an analog-to-digital conversion (A/D) module, an audio preprocessing module, a three-dimensional space voice processing system, an audio and video signal processor, an audio decompression module, and a standard audio output interface. The output of each microphone is connected to the input of the microphone programmable gain amplifier; the signal output of the microphone programmable gain amplifier is connected to the input of the A/D module; the output of the A/D module is connected to the input of the audio preprocessing module; the output of the audio preprocessing module is connected to the input of the three-dimensional space high-definition voice processing system; the output of the three-dimensional space high-definition voice processing system is connected to the input interface of the audio and video signal processor; the output interface of the audio and video signal processor is connected to the input interface of the audio decompression module; and the output interface of the audio decompression module is connected to the standard audio input interface.

Advantages of the utility model: a microphone array is used, so that multiple microphones acquire the sound sources in the surrounding environment and the target signal source is identified. Once a signal source has been confirmed, a dedicated microphone in the array acquires its signal. Echo is reduced. Voice amplification control is self-adjusting, and wideband voice signals can be processed. Because the three-dimensional space high-definition voice acquisition system uses multiple microphones, its acquisition capability across frequency bands is increased, and sound quality and effect are optimized.

Description of drawings

Figure 1 is a schematic diagram of the structure of the voice acquisition subsystem in a conventional video sensing system.

Figure 2 is a structural diagram of the three-dimensional space high-definition voice acquisition subsystem of the video sensing system.

Figure 3 is a schematic diagram of the structure of the three-dimensional space voice processing system.

Embodiment

Referring to Figure 2, the subsystem comprises a microphone array, a microphone programmable gain amplifier, an analog-to-digital conversion (A/D) module, an audio preprocessing module, a three-dimensional space voice processing system, an audio and video signal processor, an audio decompression module, and a standard audio output interface. The output of each microphone is connected to the input of the microphone programmable gain amplifier; the signal output of the microphone programmable gain amplifier is connected to the input of the A/D module; the output of the A/D module is connected to the input of the audio preprocessing module; the output of the audio preprocessing module is connected to the input of the three-dimensional space high-definition voice processing system; the output of the three-dimensional space high-definition voice processing system is connected to the input interface of the audio and video signal processor; the output interface of the audio and video signal processor is connected to the input interface of the audio decompression module; and the output interface of the audio decompression module is connected to the standard audio input interface.

The three-dimensional space high-definition voice acquisition system uses a microphone array to acquire high-definition voice in a video sensing system. It can select the target sound source in three-dimensional space and amplify and process it specifically, while suppressing and eliminating noise, echo, and other non-target sounds. A microphone array is a group of microphones placed close together in an ordered arrangement. Compared with a conventional single microphone, a microphone array exploits the differences in the arrival time of a sound wave at the different microphones to obtain better directivity. The system implements three key techniques:

1. Beamforming. By combining the signals from the different microphones in the array, the array becomes equivalent to a single highly directional microphone and forms a highly directional voice capture beam. The beam can be steered toward the target sound source. The array's search engine searches for the target sound source in real time and positions the capture beam at its current location. This high directivity greatly reduces the ambient noise and echo signals that enter the system.

2. Array directivity. Because the noise and echo in the output of a microphone array are much lower than in the output of a single microphone, the array also suppresses stationary noise better than a single microphone. Consider, for example, the typical directional pattern of a microphone array's voice capture beam at 1000 Hz: it is far better than that of an expensive, high-quality, highly unidirectional microphone. During voice acquisition, the array control software searches for the target sound source and points the capture beam in its direction. If the target source moves, the capture beam tracks it. The mechanism is equivalent to two highly directional microphones: one continuously scans three-dimensional space and tests each incoming voice signal, and the other, the voice capture microphone, is pointed at the source with the best sound quality, which is the target sound source.

3. Constant beamwidth. The normal working band for voice acquisition is 200 Hz to 7000 Hz, over which the wavelength varies by a factor of 35. It is therefore difficult to find a microphone or microphone array whose beamwidth stays constant over the whole working band. Fortunately, in a typical working environment most noise lies at relatively low frequencies, generally below 750 Hz. Echo is also concentrated in the low band and is almost absent above 4000 Hz. A linear microphone array of this kind can provide a constant beamwidth from 300 Hz to 5000 Hz, which basically covers the working band for voice acquisition.

Using multiple microphones to form an array makes it possible to identify the target voice source automatically and effectively, and to lock onto and track it dynamically. In the video sensing system we use an array formed by 4 microphones. Through real-time control of each microphone's preamplification, analog-to-digital (A/D) conversion, and audio preprocessing, the system ultimately obtains a high-definition voice signal. This real-time control is performed by the three-dimensional space voice processing system.
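For illustration only, the beamforming and source-search steps described above can be sketched with a basic delay-and-sum beamformer. This is not the implementation disclosed in the utility model: the 5 cm microphone spacing, the 48 kS/s sampling rate, the 343 m/s speed of sound, the rounding of delays to whole samples, the sign convention for the angle, and all function names are assumptions made for the example. Only the 4-microphone linear array comes from the text.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s (assumed value for air at room temperature)


def steering_delays(num_mics, spacing_m, angle_deg, fs):
    """Per-microphone delays (in whole samples) for a plane wave arriving
    from angle_deg, measured from broadside of a uniform linear array.
    Positive angles mean the wave reaches microphone 0 first (assumed convention)."""
    per_mic = spacing_m * math.sin(math.radians(angle_deg)) * fs / SPEED_OF_SOUND
    return [round(m * per_mic) for m in range(num_mics)]


def delay_and_sum(channels, delays):
    """Advance each channel by its delay so the wavefronts line up, then average."""
    n_samples = len(channels[0])
    out = []
    for n in range(n_samples):
        acc = 0.0
        for channel, delay in zip(channels, delays):
            k = n + delay
            if 0 <= k < n_samples:  # samples outside the buffer count as silence
                acc += channel[k]
        out.append(acc / len(channels))
    return out


def scan_for_source(channels, spacing_m, fs, angles):
    """Steer the beam over candidate angles and return the one with most output energy."""
    def energy(angle):
        delays = steering_delays(len(channels), spacing_m, angle, fs)
        return sum(v * v for v in delay_and_sum(channels, delays))
    return max(angles, key=energy)


def simulate_array(source, per_mic_delay, num_mics):
    """Hypothetical test signal: microphone m hears the source m * per_mic_delay samples late."""
    channels = []
    for m in range(num_mics):
        shift = m * per_mic_delay
        channels.append([source[n - shift] if n >= shift else 0.0
                         for n in range(len(source))])
    return channels


# Wideband test source reaching each successive microphone 3 samples later.
rng = random.Random(0)
source = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
channels = simulate_array(source, per_mic_delay=3, num_mics=4)
best_angle = scan_for_source(channels, spacing_m=0.05, fs=48000,
                             angles=range(-90, 91, 5))
print("estimated direction (degrees):", best_angle)
```

In this sketch the angle scan plays the role of the scanning microphone and the aligned sum plays the role of the capture beam. The factor of 35 quoted for the wavelength range follows directly from the frequency ratio 7000 Hz / 200 Hz.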

Referring to Figure 3, the three-dimensional space voice processing system comprises a microphone programmable gain amplifier, an analog-to-digital conversion module, a high-definition voice controller, and a standard audio output interface. The signal output of the microphone programmable gain amplifier is connected, through the A/D module, to the signal input of the digital filter in the high-definition voice controller, and the signal output of the parameter-programmable IIR filter in the high-definition voice controller is connected to the standard audio output interface.

The high-definition voice controller comprises a digital filter, a 5-band equalizer, a central processing unit, a programmable high-pass filter, an automatic gain controller, and a parameter-programmable IIR filter. The signal output of the digital filter is connected to the signal input of the 5-band equalizer; the first signal output of the 5-band equalizer is connected to the first signal input of the automatic gain controller; the second signal output of the 5-band equalizer is connected to the first signal input of the programmable high-pass filter; the first signal output of the central processing unit is connected to the second signal input of the programmable high-pass filter; the second signal output of the central processing unit is connected to the second signal input of the automatic gain controller; the third signal output of the central processing unit is connected to the first signal input of the parameter-programmable IIR filter; and the signal output of the programmable high-pass filter is connected to the second signal input of the parameter-programmable IIR filter.

1.) microphone programmable gain amplifier

The microphone programmable gain amplifier comprises a programmable microphone gain amplifier and a fixed-gain amplifier. Under the control of the automatic gain controller, it keeps the level of the analog voice signal delivered to the analog-to-digital conversion module constant.

2.) analog-to-digital conversion module

The analog-to-digital conversion module uses a multi-bit, high-order sampling architecture. It supports sampling rates from the standard voice rate of 8 kS/s up to the high-definition voice rate of 48 kS/s.

3.) high definition voice controller

The high definition voice controller is made up of 6 modules:

■ Digital filter

Digital decimation and interpolation filters with a Sigma-Delta structure output a high-definition digital voice signal at sampling rates between 8 kS/s and 48 kS/s. They can suppress abnormal noise, such as wind noise in outdoor environments.
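As a rough illustration of rate reduction between the two sampling rates mentioned above, the sketch below decimates a 48 kS/s stream to 8 kS/s by averaging blocks of six samples. The text does not specify the filter design; a practical decimation stage would use a dedicated low-pass filter (for example a CIC or FIR filter), so the boxcar average and the function name here are simplifying assumptions.

```python
def decimate_boxcar(samples, factor):
    """Reduce the sampling rate by an integer factor, averaging each block
    of `factor` samples (a crude low-pass filter followed by downsampling)."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    usable = len(samples) - len(samples) % factor  # drop an incomplete final block
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, usable, factor)]


# One second of audio at 48 kS/s becomes 8000 samples at 8 kS/s.
one_second_48k = [0.0] * 48000
one_second_8k = decimate_boxcar(one_second_48k, 48000 // 8000)
print(len(one_second_8k))
```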

■ 5-band equalizer

A dynamic volume equalizer adjusts the relative volume of individual bands, giving the voice a more three-dimensional sound. For sound across the 20 Hz-16 kHz range, the 5-band equalizer applies a gain or attenuation of -12 dB to +12 dB to the signal in each band, defined by its center and cutoff frequencies. The voice signal that passes through the equalizer is clear and pleasant, not thin.
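The -12 dB to +12 dB range can be made concrete by converting band settings to linear amplitude factors. The sketch below is an assumption-laden illustration: the function name, the clamping behaviour, and the five example settings are hypothetical, and the text gives no band center frequencies, so none are invented here.

```python
def band_gain_linear(gain_db, limit_db=12.0):
    """Clamp a band setting to +/- limit_db and convert it to a linear
    amplitude factor using gain = 10 ** (dB / 20)."""
    clamped = max(-limit_db, min(limit_db, gain_db))
    return 10.0 ** (clamped / 20.0)


# Hypothetical settings for the five bands, in dB (lowest band first).
band_settings_db = [-12.0, -3.0, 0.0, 6.0, 20.0]  # 20 dB is clamped to 12 dB
band_gains = [band_gain_linear(g) for g in band_settings_db]
print([round(g, 3) for g in band_gains])
```

A setting of +12 dB corresponds to roughly a fourfold amplitude increase, and -12 dB to roughly a fourfold reduction.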

■ Programmable high-pass filter

The high-pass filter passes high-frequency signals and attenuates signals below its cutoff frequency. The attenuation at each frequency is programmable. The high-pass filter in this system supports two modes: a first-order IIR filter with a cutoff frequency of 3.7 Hz, and a second-order high-pass filter with a programmable cutoff frequency.

■ Parameter-programmable IIR filter

The IIR filter eliminates narrow-band noise at specified frequencies in the voice signal, such as 50 Hz-60 Hz noise interference. It supports different center frequencies and bandwidths, all of which are set through programmable parameters.
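One common way to realize a narrow-band IIR filter with a programmable center frequency and bandwidth is a second-order notch section. The sketch below uses the widely published audio "cookbook" biquad notch formulas; the text does not state which filter structure the controller uses, so the formulas, the Q value, the 8 kS/s rate, and the function names are assumptions for illustration.

```python
import cmath
import math


def notch_coefficients(center_hz, q, fs):
    """Second-order notch (biquad) coefficients, normalized so that a[0] == 1.
    A higher q gives a narrower notch around center_hz."""
    w0 = 2.0 * math.pi * center_hz / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def apply_biquad(b, a, samples):
    """Filter samples with a direct-form I biquad."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def magnitude_at(b, a, freq_hz, fs):
    """Magnitude of the filter's frequency response at freq_hz."""
    z = cmath.exp(-1j * 2.0 * math.pi * freq_hz / fs)
    return abs((b[0] + b[1] * z + b[2] * z * z) /
               (a[0] + a[1] * z + a[2] * z * z))


# Notch out 50 Hz mains hum at an 8 kS/s sampling rate (example parameters).
b, a = notch_coefficients(center_hz=50.0, q=5.0, fs=8000.0)
hum = [math.sin(2.0 * math.pi * 50.0 * n / 8000.0) for n in range(8000)]
filtered = apply_biquad(b, a, hum)
print(magnitude_at(b, a, 50.0, 8000.0), magnitude_at(b, a, 1000.0, 8000.0))
```

Changing `center_hz` and `q` corresponds to reprogramming the center frequency and bandwidth described above.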

■ Automatic gain controller

The automatic gain controller controls the programmable microphone gain amplifier in real time according to the amplified input signal. It contains a digital peak detector that continuously compares the input signal with a preset threshold.
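The peak-detector loop described above can be sketched as a simple block-based gain update. All numeric values (target peak, hysteresis, 1 dB step, 0 to 40 dB gain range) and the function names are hypothetical; the text states only that a digital peak detector compares the amplified signal with a preset threshold and that the result controls the programmable gain amplifier.

```python
def amplify(block, gain_db):
    """Apply a gain given in dB to a block of samples."""
    factor = 10.0 ** (gain_db / 20.0)
    return [v * factor for v in block]


def agc_step(gain_db, amplified_block, target_peak=0.5, hysteresis=0.1,
             step_db=1.0, min_db=0.0, max_db=40.0):
    """Return the next amplifier gain based on the peak of the amplified block."""
    peak = max((abs(v) for v in amplified_block), default=0.0)
    if peak > target_peak + hysteresis:
        gain_db -= step_db  # too loud: back the gain off
    elif peak < target_peak - hysteresis:
        gain_db += step_db  # too quiet: raise the gain
    return max(min_db, min(max_db, gain_db))


# A quiet talker: the loop raises the gain until the peak sits near the target.
raw_block = [0.01, -0.01, 0.005]
gain_db = 20.0
for _ in range(50):
    gain_db = agc_step(gain_db, amplify(raw_block, gain_db))
print(gain_db)
```

The hysteresis band keeps the gain from toggling on every block once the level is close to the target.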

■ Central processing unit

The central processing unit adjusts the parameters of each module in real time according to each module's output and the system presets.

4.) standard audio output

The high-definition voice controller uses a standard voice output interface. The protocol of its output data is programmable and can support I2S, DSP Mode, MSB-First L, MSB-First R, and others. It can operate in master mode or slave mode.
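The difference between two of the listed output formats can be illustrated by how a sample is placed in a serial slot. The sketch assumes that "MSB-First L" and "MSB-First R" denote left-justified and right-justified framing, which the text does not confirm; the 16-bit sample width, the 32-bit slot width, and the function name are likewise assumptions.

```python
def pack_sample(sample, sample_bits=16, slot_bits=32, justify="left"):
    """Place a signed sample into an unsigned slot word.
    'left' aligns the sample's MSB with the slot's MSB;
    'right' aligns the sample's LSB with the slot's LSB."""
    low = -(1 << (sample_bits - 1))
    high = 1 << (sample_bits - 1)
    if not low <= sample < high:
        raise ValueError("sample does not fit in sample_bits")
    raw = sample & ((1 << sample_bits) - 1)  # two's-complement bit pattern
    if justify == "left":
        return raw << (slot_bits - sample_bits)
    if justify == "right":
        return raw
    raise ValueError("justify must be 'left' or 'right'")


print(hex(pack_sample(1, justify="left")), hex(pack_sample(1, justify="right")))
```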

Claims

1. A three-dimensional space high-definition voice acquisition subsystem of a video sensing system, characterized in that it comprises a microphone array, a microphone programmable gain amplifier, an analog-to-digital conversion (A/D) module, an audio preprocessing module, a three-dimensional space voice processing system, an audio and video signal processor, an audio decompression module, and a standard audio output interface, wherein the output of each microphone is connected to the input of the microphone programmable gain amplifier; the signal output of the microphone programmable gain amplifier is connected to the input of the A/D module; the output of the A/D module is connected to the input of the audio preprocessing module; the output of the audio preprocessing module is connected to the input of the three-dimensional space high-definition voice processing system; the output of the three-dimensional space high-definition voice processing system is connected to the input interface of the audio and video signal processor; the output interface of the audio and video signal processor is connected to the input interface of the audio decompression module; and the output interface of the audio decompression module is connected to the standard audio input interface.

2. The three-dimensional space high-definition voice acquisition subsystem of a video sensing system according to claim 1, characterized in that the three-dimensional space voice processing system comprises a microphone programmable gain amplifier, an analog-to-digital conversion module, a high-definition voice controller, and a standard audio output interface, wherein the signal output of the microphone programmable gain amplifier is connected, through the A/D module, to the signal input of the digital filter in the high-definition voice controller, and the signal output of the parameter-programmable IIR filter in the high-definition voice controller is connected to the standard audio output interface.

3. The three-dimensional space high-definition voice acquisition subsystem of a video sensing system according to claim 2, characterized in that the high-definition voice controller comprises a digital filter, a 5-band equalizer, a central processing unit, a programmable high-pass filter, an automatic gain controller, and a parameter-programmable IIR filter, wherein the signal output of the digital filter is connected to the signal input of the 5-band equalizer; the first signal output of the 5-band equalizer is connected to the first signal input of the automatic gain controller; the second signal output of the 5-band equalizer is connected to the first signal input of the programmable high-pass filter; the first signal output of the central processing unit is connected to the second signal input of the programmable high-pass filter; the second signal output of the central processing unit is connected to the second signal input of the automatic gain controller; the third signal output of the central processing unit is connected to the first signal input of the parameter-programmable IIR filter; and the signal output of the programmable high-pass filter is connected to the second signal input of the parameter-programmable IIR filter.