CN111932619A

CN111932619A - Microphone tracking system and method combining image recognition and voice positioning

Info

Publication number: CN111932619A
Application number: CN202010718515.0A
Authority: CN
Inventors: 虞焰兴
Original assignee: Anhui Semxum Information Technology Co ltd
Current assignee: Anhui Semxum Information Technology Co ltd
Priority date: 2020-07-23
Filing date: 2020-07-23
Publication date: 2020-11-13

Abstract

The invention discloses a microphone tracking system and method combining image recognition and voice positioning, and relates to the technical field of voice positioning. The system comprises a camera, a microphone and a background server. The microphone comprises a sound acquisition module and a sound processing module: the sound acquisition module acquires sound from the current scene, and the sound processing module enhances the sound according to the scene in which the microphone is currently located. The background server calculates the distance between the microphone and the speaker's mouth and adjusts the distance and the elevation angle of the microphone. The cameras acquire two views of the scene, a top view and a side view, and the microphone in the scene is taken as the origin of the three-dimensional coordinate system. The distance and the elevation angle between the microphone and the mouth are calculated using a spatial filtering algorithm, the current scene is judged to be a near-field scene or a far-field scene, and the sound processing module adjusts the strength of the sound accordingly. The optimal angle and distance are thereby adjusted automatically, so the method suits complex environments and improves the user experience.

Description

Microphone tracking system and method combining image recognition and voice positioning

Technical Field

The invention belongs to the technical field of sound positioning, and particularly relates to a microphone tracking system and method combining image recognition and voice positioning.

Background

Existing voice positioning systems and methods rely on a microphone array to complete positioning and cannot track in real time: the microphone array repositions only after the positioning system has been woken by voice, so real-time tracking and monitoring are not possible and the user experience is poor.

Meanwhile, existing voice positioning systems and methods place high demands on the operating environment because of their inherent limitations. On the one hand, their anti-interference capability is poor, in particular against echo: for a voice positioning system integrated into equipment such as a television or a sound system, the sound emitted by the equipment itself interferes with positioning. On the other hand, they adapt poorly to complex environments: positioning accuracy drops in noisy environments, especially under non-stationary noise such as several people speaking at the same time, and it is also degraded by room reverberation, for example in a highly reverberant room surrounded by hard reflective surfaces such as glass.

In addition, existing voice positioning systems and methods are limited by the microphone array: a two-microphone array can only achieve 180° planar positioning, and a four-microphone array can only achieve 360° planar positioning. Spatial positioning usually requires a microphone array with a complex geometry, so three-dimensional spatial positioning is difficult to achieve with simpler equipment.
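The 180° limit of a two-microphone array can be illustrated with the time difference of arrival (TDOA) between the two microphones. The following is a minimal sketch under stated assumptions, not part of the disclosed method: a far-field source, a hypothetical microphone spacing of 0.1 m, a speed of sound of 343 m/s, and an illustrative function name `tdoa`. A source at +60° and its mirror image at -60° across the array axis produce the same delay, so the array cannot tell them apart.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air (assumption)


def tdoa(spacing_m, azimuth_deg):
    """Far-field time difference of arrival between two microphones.

    azimuth_deg is measured from the line joining the two microphones.
    """
    return spacing_m * math.cos(math.radians(azimuth_deg)) / SPEED_OF_SOUND


front = tdoa(0.1, 60.0)
back = tdoa(0.1, -60.0)  # mirror image across the array axis

# The two directions are indistinguishable from the delay alone.
ambiguous = math.isclose(front, back)
print(ambiguous)
```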

Disclosure of Invention

The invention aims to provide a microphone tracking system and a microphone tracking method combining image recognition and voice positioning.

In order to solve the technical problems, the invention is realized by the following technical scheme:

the invention relates to a microphone tracking system combining image recognition and voice positioning, which comprises a camera, a microphone and a background server, wherein the camera is connected with the microphone;

the camera is used for acquiring an image sequence of a current scene and sending the acquired image sequence to the background server for processing;

the microphone comprises a sound acquisition module and a sound processing module; the sound acquisition module is used for acquiring sound of the current scene; the sound processing module is used for weakening or enhancing sound according to a near-field scene or a far-field scene where the current microphone is located;

the background server comprises an image recognition unit and a microphone tracking unit; the image recognition unit is used for identifying the mouth position of a person and the microphone position in the image sequence, taking the microphone as the origin of the three-dimensional coordinate system, calculating the distance between the microphone and the mouth according to a spatial filtering algorithm, and judging whether the current scene is a near-field scene or a far-field scene by using a preset distance threshold; the microphone tracking unit is used for adjusting the distance and the elevation angle of the microphone according to the calculated distance between the microphone and the mouth.

Preferably, the camera position and the microphone position are unified into one three-dimensional coordinate system.

Preferably, the image recognition unit calculates the distance between the person's mouth and the microphone, then judges the current scene and locates the direction of the mouth relative to the microphone, and feeds the scene and the direction back to the sound processing module of the microphone; the sound processing module enhances the sound signal from the located direction while suppressing sound signals from other directions.

Preferably, the microphone is a two-microphone array fixed directly in front of the lectern; the camera is a set of cameras, one of which is located directly above the microphone, and another of which is fixed at one side of the lectern at the same height as the microphone.

The invention relates to a microphone tracking method combining image recognition and voice positioning, which comprises the following steps:

step S1: acquiring a current scene image sequence;

step S2: recognizing a human face and the microphone in the image sequence, and caching the recognized positions as three-dimensional coordinates with the microphone as the origin;

step S3: calculating the distance and angle between the microphone and the mouth according to a spatial filtering algorithm;

step S4: judging whether the current scene is in a near-field scene or a far-field scene by using a preset distance threshold value;

step S5: the microphone tracking unit adjusts the distance and the elevation angle between the microphone and the human mouth;

step S6: the sound processing module weakens or enhances the sound depending on whether the current scene is a near-field scene or a far-field scene.
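Steps S4 and S6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the threshold value of 0.5 m, the gain values, and all names (`NEAR_FIELD_THRESHOLD_M`, `classify_scene`, `decide_processing`, `apply_gain`) are hypothetical, and the mapping "weaken in the near field, enhance in the far field" is an assumed reading of the text, which does not state which scene receives which treatment.

```python
from dataclasses import dataclass

NEAR_FIELD_THRESHOLD_M = 0.5  # hypothetical preset distance threshold, in metres


@dataclass
class SceneDecision:
    scene: str   # "near" or "far"
    gain: float  # linear gain to apply to the captured sound


def classify_scene(distance_m, threshold_m=NEAR_FIELD_THRESHOLD_M):
    """Step S4: compare the mouth-to-microphone distance with the threshold."""
    return "near" if distance_m <= threshold_m else "far"


def decide_processing(distance_m, near_gain=0.5, far_gain=2.0):
    """Step S6 (assumed mapping): weaken near-field sound, enhance far-field sound."""
    scene = classify_scene(distance_m)
    gain = near_gain if scene == "near" else far_gain
    return SceneDecision(scene=scene, gain=gain)


def apply_gain(samples, gain):
    """Scale every audio sample by the chosen gain."""
    return [s * gain for s in samples]


decision = decide_processing(1.2)  # speaker 1.2 m away from the microphone
processed = apply_gain([0.1, -0.2], decision.gain)
```

In practice the threshold and the gains would be tuned to the room and the microphone; fixed constants are used here only to keep the sketch short.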

The invention has the following beneficial effects:

According to the invention, the cameras acquire two views of the scene, a top view and a side view, and the microphone in the scene is taken as the origin of the three-dimensional coordinate system. The distance and the elevation angle between the microphone and the mouth of a person are calculated using a spatial filtering algorithm, the current scene is judged to be a near-field scene or a far-field scene, and the sound processing module adjusts the strength of the sound accordingly. The optimal angle and distance are thereby adjusted automatically, so the method suits complex environments and improves the user experience.

Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of a microphone tracking system incorporating image recognition and voice localization according to the present invention;

FIG. 2 is a diagram of the steps of a microphone tracking method combining image recognition and voice localization according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to FIG. 1, the present invention is a microphone tracking system combining image recognition and voice positioning, including a camera, a microphone and a background server;

the camera is used for acquiring an image sequence of the current scene and sending the acquired image sequence to the background server for processing. The cameras collect a plurality of images, from which two optimal pictures are finally selected: a top view collected by the camera directly above and a side view collected by the camera at the side.

The camera position and the microphone position are unified into one three-dimensional coordinate system with the microphone as the origin. The X-axis and Z-axis coordinates of the person's mouth relative to the microphone origin are obtained from the side view, and the X-axis and Y-axis coordinates of the mouth relative to the microphone origin are obtained from the top view. The exact position of the mouth in the coordinate system with the microphone as the origin is thus obtained, from which the true distance and angle between the microphone and the mouth can be conveniently calculated.
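The fusion of the two views can be sketched as follows. This is a minimal illustration that uses plain Euclidean geometry rather than the spatial filtering algorithm named in the text, which is not detailed in the source. It assumes that the per-view offsets have already been converted from pixels to metres, and that the X estimates from the two views are simply averaged; the function names `mouth_position` and `distance_and_elevation` are hypothetical.

```python
import math


def mouth_position(side_xz, top_xy):
    """Fuse side-view (x, z) and top-view (x, y) offsets into one 3D point.

    Both inputs are assumed to be in metres, relative to the microphone origin.
    """
    x_side, z = side_xz
    x_top, y = top_xy
    # Both views observe X; averaging the two estimates is an assumption,
    # not something the source specifies.
    x = (x_side + x_top) / 2.0
    return (x, y, z)


def distance_and_elevation(point):
    """Return (distance in metres, elevation angle in degrees) from the origin."""
    x, y, z = point
    distance = math.sqrt(x * x + y * y + z * z)
    horizontal = math.hypot(x, y)  # length of the projection onto the X-Y plane
    elevation = math.degrees(math.atan2(z, horizontal))
    return distance, elevation


# Mouth 0.30 m in front of and 0.40 m above the microphone, no sideways offset.
mouth = mouth_position(side_xz=(0.30, 0.40), top_xy=(0.30, 0.00))
dist, elev = distance_and_elevation(mouth)
print(f"distance={dist:.3f} m, elevation={elev:.2f} deg")
# prints: distance=0.500 m, elevation=53.13 deg
```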

The image recognition unit calculates the distance between the person's mouth and the microphone, then judges the current scene and locates the direction of the mouth relative to the microphone, and feeds the scene and the direction back to the sound processing module of the microphone; the sound processing module enhances the sound signal from the located direction while suppressing sound signals from other directions.
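One common way to enhance sound from a known direction with a two-microphone array is a delay-and-sum beamformer. The source does not say which technique is used, so the following is only a sketch of that substitute technique under stated assumptions: a far-field source, whole-sample delays, a speed of sound of 343 m/s, an angle measured from the array axis with 0° on the side of microphone 1, and hypothetical function names.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air (assumption)


def steering_delay_samples(spacing_m, angle_deg, sample_rate):
    """Whole-sample lag of microphone 2 behind microphone 1 for a far-field source."""
    tau = spacing_m * math.cos(math.radians(angle_deg)) / SPEED_OF_SOUND
    return round(tau * sample_rate)


def delay_and_sum(mic1, mic2, delay):
    """Advance mic2 by `delay` samples to align it with mic1, then average.

    Sound arriving with exactly this lag adds coherently; sound arriving
    with a different lag is attenuated relative to it.
    """
    n = len(mic1)
    out = []
    for i in range(n):
        j = i + delay
        s2 = mic2[j] if 0 <= j < n else 0.0  # zero-pad outside the buffer
        out.append(0.5 * (mic1[i] + s2))
    return out


# A click reaches microphone 1 at sample 2 and microphone 2 two samples later.
mic1 = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
mic2 = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]

steered = delay_and_sum(mic1, mic2, delay=2)    # steered at the source
unsteered = delay_and_sum(mic1, mic2, delay=0)  # steered elsewhere
```

With the correct delay the click keeps its full amplitude, whereas with the wrong delay it is split into two half-amplitude copies, which is the enhancement and suppression effect described above in its simplest form.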

The microphone is a two-microphone array fixed directly in front of the lectern. The camera is a set of cameras: one is located directly above the microphone, and another is fixed at one side of the lectern at the same height as the microphone, so that the captured images make it easy to establish a three-dimensional spatial coordinate system.

Referring to FIG. 2, the present invention is a microphone tracking method combining image recognition and voice positioning, including the following steps:

step S1: acquiring a current scene image sequence;

It should be noted that, in the above system embodiment, each included unit is only divided according to functional logic, but is not limited to the above division as long as the corresponding function can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.

In addition, it is understood by those skilled in the art that all or part of the steps in the method for implementing the embodiments described above may be implemented by a program instructing associated hardware, and the corresponding program may be stored in a computer-readable storage medium.

The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims

1. A microphone tracking system combining image recognition and voice positioning, comprising a camera, a microphone and a background server, characterized in that:

2. The system of claim 1, wherein the camera position and the microphone position are unified into one three-dimensional coordinate system.

3. The microphone tracking system combining image recognition and voice positioning as claimed in claim 1, wherein the image recognition unit calculates the distance between the person's mouth and the microphone, then judges the current scene and locates the direction of the mouth relative to the microphone, and feeds the scene and the direction back to the sound processing module of the microphone; the sound processing module enhances the sound signal from the located direction while suppressing sound signals from other directions.

4. The system of claim 1, wherein the microphone is a two-microphone array fixed directly in front of the lectern; the camera is a set of cameras, one of which is located directly above the microphone, and another of which is fixed at one side of the lectern at the same height as the microphone.

5. A microphone tracking method combining image recognition and voice localization, comprising the steps of:

step S1: acquiring a current scene image sequence;