CN111768787A - Multifunctional auxiliary audio-visual method and system - Google Patents
Multifunctional auxiliary audio-visual method and system
- Publication number
- CN111768787A (application CN202010592121.5A)
- Authority
- CN
- China
- Prior art keywords
- visual
- audio
- voice
- voice signal
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
- G10L21/028—Voice signal separating using properties of sound source
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/57—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for processing of video signals
Abstract
The invention discloses a multifunctional auxiliary audio-visual method comprising the following steps: acquiring first voice signals collected by the voice acquisition modules of at least three audio-visual acquisition systems, and acquiring video signals collected by the video acquisition modules of the same systems; acquiring a second voice signal collected by the voice acquisition module of an audio-visual AR system; analyzing the first and second voice signals, and comparing the high-frequency component ratio of the first voice signal with a preset threshold to obtain an analysis result; processing the first and second voice signals according to the analysis result to obtain a processing result; and controlling an AR display module in the audio-visual AR system to display the processing result. By combining sound source localization and speech-to-text conversion, the collected voice signals are processed and the results are displayed, addressing the problem that a hearing-impaired person cannot hear danger warnings coming from blind spots outside the field of view or cannot converse with others.
Description
Technical Field
The invention relates to the field of augmented reality, in particular to a multifunctional auxiliary audio-visual method and a multifunctional auxiliary audio-visual system.
Background
People with hearing impairment face many difficulties and dangers in daily life: conversing with others is troublesome, and walking along roads with passing vehicles can be dangerous. The hardships that hearing impairment brings to everyday life remain a problem to be solved.
With advances in science and technology, many high-tech products have appeared, including an endless stream of products aimed at hearing-impaired people. For example, the hearing aids commonly seen today place a considerable burden on the ear when worn for long periods, and they often introduce noise, so that the wearer still cannot judge the direction of a danger or hear the other party clearly. An intelligent multifunctional auxiliary audio-visual system is therefore urgently needed.
Disclosure of Invention
The invention aims to provide a multifunctional auxiliary audio-visual method and system that solve the current problem that people with hearing impairment cannot judge the direction of a danger or hear the other party in a conversation clearly.
To achieve this purpose, the invention adopts the following technical scheme:
a multifunctional auxiliary audio-visual method, comprising the following steps:
the method comprises the steps of firstly, acquiring first voice signals acquired by voice acquisition modules in at least three audio-visual acquisition systems, and acquiring video signals acquired by video acquisition modules of at least three audio-visual acquisition systems;
secondly, acquiring a second voice signal acquired by a voice acquisition module in the audio-visual AR system;
thirdly, analyzing the first voice signal and the second voice signal, and comparing the high-frequency component proportion of the first voice signal with a preset threshold value to obtain an analysis result;
fourthly, processing the first voice signal and the second voice signal according to the analysis result to obtain a processing result;
and fifthly, controlling an AR display module in the audio-visual AR system to display a processing result.
As an embodiment, when the high-frequency component ratio of the first voice signal exceeds the preset threshold, the sound source position is located by a sound source localization algorithm, and the video signal collected by the video acquisition module nearest the sound source position is retrieved and overlaid with warning text as the processing result.
As an embodiment, when the high-frequency component ratio of the first voice signal is below the preset threshold and a second voice signal is collected, the voice signal is converted into text by an online speech recognition algorithm as the processing result.
Furthermore, the sound source localization algorithm monitors the first voice signal at the voice acquisition module of each audio-visual acquisition system, and localizes the sound source using the time differences between the first voice signals received by the voice acquisition modules of the different audio-visual acquisition systems.
Further, the preset threshold value is 10% -20%.
The invention also discloses a multifunctional auxiliary audio-visual system comprising a central processing module, a power supply module, at least three audio-visual acquisition systems, and an audio-visual AR system. Each audio-visual acquisition system comprises a voice acquisition module and a video acquisition module, both communicatively connected to the central processing module. The audio-visual AR system comprises a voice acquisition module and an AR display module, both communicatively connected to the central processing module. The power supply module is electrically connected to the central processing module, the at least three audio-visual acquisition systems, and the audio-visual AR system respectively.
Further, the at least three audio-visual acquisition systems are arranged non-collinearly.
Further, the audio-visual AR system is disposed at the front of the wearer's head.
Compared with the prior art, the invention has the following advantages and beneficial effects:
An array of voice acquisition modules collects high-frequency sounds such as alarms and horns. The time differences with which the sound reaches the individual voice acquisition modules are used to calculate the distance and direction of the sound source, achieving sound source localization. According to the localization result, the corresponding video acquisition module is started to capture video of the area where the sound source is located. Based on the actual situation, the central processing module generates warning text such as 'siren detected on the left'. Using the AR display module, the warning text is superimposed on the image captured by the camera and displayed before the wearer's eyes, so that the hearing-impaired wearer can grasp the situation around the sound source through the warning text and the projected video. On the other hand, when the hearing-impaired person communicates with a hearing person, the voice acquisition module mounted at the front of the multifunctional auxiliary audio-visual system collects the other party's speech in real time; the central processing module performs online speech recognition, converting the input voice signal into text; and the recognized text is sent to the AR device and displayed before the wearer's eyes, helping the hearing-impaired person "hear" the sound.
Drawings
FIG. 1 is a flow chart of a method according to a first embodiment of the present invention;
FIG. 2 is an information flow diagram of a first embodiment of the present invention;
FIG. 3 is a system configuration diagram of a second embodiment of the present invention;
FIG. 4 is a block diagram of an audio-visual acquisition system according to a second embodiment of the present invention;
FIG. 5 is a block diagram showing the structure of an audiovisual AR system in a second embodiment of the present invention;
fig. 6 is a schematic diagram of three-microphone sound source localization according to the first embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1:
as shown in fig. 1, the invention discloses a multifunctional auxiliary audio-visual method, which comprises the following steps:
the method comprises the steps of firstly, acquiring first voice signals acquired by voice acquisition modules in at least three audio-visual acquisition systems, and acquiring video signals acquired by the video acquisition modules of the at least three audio-visual acquisition systems; the voice acquisition module is preferably a high-sensitivity microphone.
Secondly, acquiring a second voice signal acquired by a voice acquisition module in the audio-visual AR system;
Thirdly, analyzing the first voice signal and the second voice signal, and comparing the high-frequency component ratio of the first voice signal with a preset threshold (user-settable, generally 10%-20%) to obtain an analysis result;
fourthly, processing the first voice signal and the second voice signal according to the analysis result to obtain a processing result;
and fifthly, controlling an AR display module in the audio-visual AR system to display a processing result.
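The "high-frequency component ratio" used in step three is not formally defined in the text; one plausible reading is the fraction of spectral energy above some cutoff frequency. The sketch below assumes a 2 kHz cutoff and a 15% threshold (inside the stated 10%-20% range); both values and all names are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def high_freq_ratio(signal, sample_rate, cutoff_hz=2000.0):
    """Fraction of spectral energy at or above cutoff_hz.

    The 2 kHz cutoff is an assumed value; the text does not say
    where 'high frequency' begins.
    """
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = power.sum()
    return 0.0 if total == 0 else power[freqs >= cutoff_hz].sum() / total

THRESHOLD = 0.15  # user-settable, inside the 10%-20% range given above

# A pure 3 kHz tone (a horn-like component) lies entirely above the
# cutoff, so nearly all of its energy counts as high-frequency.
t = np.arange(0, 1.0, 1.0 / 16000)
ratio = high_freq_ratio(np.sin(2 * np.pi * 3000 * t), 16000)
exceeds = ratio > THRESHOLD
```

With `exceeds` true the method branches to sound source localization; otherwise it branches to speech recognition, as described below.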
Specifically, when the high-frequency component ratio of the first voice signal exceeds the preset threshold, the sound source position is located by the sound source localization algorithm, and the video signal collected by the video acquisition module nearest the sound source is retrieved and overlaid with warning text as the processing result. An example application scenario: a hearing-impaired person wearing the multifunctional auxiliary audio-visual system is active outdoors, and a vehicle 20 meters behind sounds its horn continuously to alert passers-by. The system perceives the sound and, after analysis, determines that its high-frequency component ratio is 30%. It immediately performs sound source localization, calculating the distance and direction of the source, then starts the rear video acquisition device and projects its video before the wearer's eyes, with the warning text "Attention: continuous horn 20 meters behind" superimposed on the video. The hearing-impaired person need not look around and can fully grasp the surrounding situation by means of the multifunctional auxiliary audio-visual system.
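Selecting the video acquisition module "close to the sound source position" can be sketched as picking the module whose mounting bearing is angularly closest to the localized source. The bearings below are assumptions matching the left/right/rear placement described in embodiment 2; the patent does not specify numeric angles.

```python
# Assumed bearings (degrees, counterclockwise from the wearer's facing
# direction) of the three video acquisition modules.
CAMERA_BEARINGS = {"left": 90.0, "right": -90.0, "rear": 180.0}

def angular_gap(a, b):
    """Smallest absolute angle between two bearings, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def nearest_camera(source_bearing):
    """Pick the capture module angularly closest to the localized source."""
    return min(CAMERA_BEARINGS,
               key=lambda name: angular_gap(CAMERA_BEARINGS[name], source_bearing))

# A horn roughly behind the wearer (bearing ~170 deg) selects the rear camera.
selected = nearest_camera(170.0)
```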
The sound source localization algorithm monitors the first voice signal at the voice acquisition module of each audio-visual acquisition system, and localizes the sound source using the time differences between the first voice signals received by the different modules. For example, three audio-visual acquisition systems are used for localization, as shown in fig. 6, with three microphones serving as the voice acquisition modules. Taking the midpoint of the line connecting microphones M0 and M1... More precisely, taking the midpoint of the line connecting M1 and M2 as the origin and that line as the x-axis, a microphone-array coordinate system Oxy is established, with microphone M0 lying on the y-axis. Under this definition, the coordinates of M0, M1, M2 are (0, l2), (-l1, 0) and (l1, 0) respectively, the coordinates of the sound source S are (Rcosθ, Rsinθ), and R is the distance from S to the origin.
According to fig. 6, let the speed of sound be c, and let the sound reach microphones M0, M1, M2 at times τ0, τ1, τ2 respectively, with time differences τ01 = τ1 - τ0, τ02 = τ2 - τ0, τ12 = τ2 - τ1. The geometric relationship then gives the equation set (reconstructed from the stated geometry; the original rendering is an image):
c·τ01 = √((Rcosθ + l1)² + (Rsinθ)²) − √((Rcosθ)² + (Rsinθ − l2)²)
c·τ02 = √((Rcosθ − l1)² + (Rsinθ)²) − √((Rcosθ)² + (Rsinθ − l2)²)
The above is a system of equations in R and θ; solving this system of two equations in two unknowns yields the direction and distance of the sound source.
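The localization just described can be sketched numerically. Here a simple grid search stands in for the closed-form solution of the equation system; the microphone offsets, grid ranges, and resolutions are assumed values chosen for illustration.

```python
import numpy as np

C = 343.0                # speed of sound in air, m/s
L1, L2 = 0.10, 0.10      # assumed microphone offsets, m
MICS = np.array([[0.0, L2], [-L1, 0.0], [L1, 0.0]])  # M0, M1, M2

def tdoas(src):
    """Arrival-time differences (tau01, tau02) for a source at `src`."""
    t = np.linalg.norm(MICS - src, axis=1) / C
    return t[1] - t[0], t[2] - t[0]

def localize(tau01, tau02):
    """Grid search over (R, theta) minimizing the TDOA residual."""
    best, best_err = (None, None), float("inf")
    for R in np.linspace(0.5, 50.0, 100):
        for theta in np.linspace(-np.pi, np.pi, 360):
            src = np.array([R * np.cos(theta), R * np.sin(theta)])
            e01, e02 = tdoas(src)
            err = (e01 - tau01) ** 2 + (e02 - tau02) ** 2
            if err < best_err:
                best, best_err = (R, theta), err
    return best

# Simulate a source 10 m away at bearing 2 rad and recover it from
# the time differences alone.
true_src = np.array([10.0 * np.cos(2.0), 10.0 * np.sin(2.0)])
R_est, theta_est = localize(*tdoas(true_src))
```

With a small baseline and a distant source the distance R is weakly constrained (a known limitation of time-difference localization), while the bearing θ is recovered well; the warning-text use case above mainly needs the bearing.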
When the high-frequency component ratio of the first voice signal is below the preset threshold and a second voice signal is collected, the voice signal is converted into text by an online speech recognition algorithm as the processing result. An example application scenario: a hearing-impaired person wearing the multifunctional auxiliary audio-visual system talks face to face with a hearing person; the voice acquisition module in the audio-visual AR system collects the other party's speech, the central processing module recognizes the speech online and converts it into text, and the AR display module projects the text before the hearing-impaired person's eyes, letting the hearing-impaired person "hear" the sound.
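The two branches of the method (warning overlay vs. subtitle) can be sketched as a small dispatcher. The threshold value, result type, warning string, and stub recognizer below are illustrative assumptions; a real deployment would call an actual online speech recognition service.

```python
from dataclasses import dataclass
from typing import Callable, Optional

THRESHOLD = 0.15  # assumed value within the stated 10%-20% range

@dataclass
class ProcessingResult:
    mode: str   # "warning_overlay" or "subtitle"
    text: str

def dispatch(high_freq_ratio: float,
             second_signal: Optional[bytes],
             recognize: Callable[[bytes], str]) -> Optional[ProcessingResult]:
    """Route the signals according to the analysis result (sketch)."""
    if high_freq_ratio > THRESHOLD:
        # Alarm-like sound: produce warning text for the AR overlay.
        return ProcessingResult("warning_overlay",
                                "attention: alarm-like sound detected")
    if second_signal is not None:
        # Conversation case: online speech recognition (stubbed here).
        return ProcessingResult("subtitle", recognize(second_signal))
    return None

# A stub recognizer stands in for the online speech recognition service.
subtitle = dispatch(0.05, b"audio", lambda audio: "nice to meet you")
warning = dispatch(0.30, None, lambda audio: "")
```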
Example 2:
as shown in fig. 3 and 4, the invention further discloses a multifunctional auxiliary audio-visual system comprising a central processing module, a power supply module, at least three audio-visual acquisition systems, and an audio-visual AR system. Each audio-visual acquisition system comprises a voice acquisition module and a video acquisition module, both communicatively connected to the central processing module. The audio-visual AR system comprises a voice acquisition module and an AR display module, both communicatively connected to the central processing module. The power supply module is electrically connected to the central processing module, the at least three audio-visual acquisition systems, and the audio-visual AR system respectively.
The central processing module may be an STM32-series microcontroller, the voice acquisition module a high-sensitivity microphone, the video acquisition module a high-definition miniature camera, and the AR display module a pair of VUFINE augmented reality glasses.
The at least three audio-visual acquisition systems are arranged non-collinearly. Preferably there are exactly three, placed on the left, right, and rear of the wearer's head, with the audio-visual AR system placed at the front of the head.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
In the description of the present invention, "a plurality" means two or more unless otherwise specified; the terms "upper", "lower", "left", "right", "inner", "outer", "front", "rear", "head", "tail", and the like, indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are only for convenience in describing and simplifying the description, and do not indicate or imply that the device or element referred to must have a particular orientation, be constructed in a particular orientation, and be operated, and thus, should not be construed as limiting the invention. Furthermore, the terms "first," "second," "third," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it is to be noted that, unless otherwise explicitly specified or limited, the terms "mounted" and "connected" are to be interpreted broadly: a connection may be fixed, detachable, or integral; mechanical or electrical; and direct, or indirect through an intermediary. The specific meanings of the above terms in the present invention can be understood by those skilled in the art on a case-by-case basis.
Claims (8)
1. A multifunctional auxiliary audio-visual method is characterized by comprising the following steps:
acquiring first voice signals acquired by voice acquisition modules in at least three audio-visual acquisition systems, and acquiring video signals acquired by video acquisition modules of at least three audio-visual acquisition systems;
acquiring a second voice signal acquired by a voice acquisition module in the audio-visual AR system;
analyzing the first voice signal and the second voice signal, and comparing the high-frequency component ratio of the first voice signal with a preset threshold value to obtain an analysis result;
processing the first voice signal and the second voice signal according to the analysis result to obtain a processing result;
and controlling an AR display module in the audio-visual AR system to display the processing result.
2. The multifunctional auxiliary audio-visual method according to claim 1, wherein when the high-frequency component ratio of the first voice signal exceeds the preset threshold, the sound source position is located by a sound source localization algorithm, and the video signal acquired by the video acquisition module close to the sound source position is retrieved and overlaid with warning text as the processing result.
3. The multifunctional auxiliary audio-visual method according to claim 1, wherein when the high-frequency component ratio of the first voice signal is below the preset threshold and the second voice signal is collected, the voice signal is converted into text by an online speech recognition algorithm as the processing result.
4. The method of claim 2, wherein the sound source localization algorithm monitors the first voice signal of the voice acquisition module in each of the audio-visual acquisition systems, and localizes the sound source using the time differences between the first voice signals acquired by the voice acquisition modules of the different audio-visual acquisition systems.
5. The method of claim 1, wherein the predetermined threshold is 10% to 20%.
6. A multifunctional auxiliary audio-visual system, characterized in that it comprises a central processing module, a power supply module, at least three audio-visual acquisition systems, and an audio-visual AR system; each audio-visual acquisition system comprises a voice acquisition module and a video acquisition module, both communicatively connected to the central processing module; the audio-visual AR system comprises a voice acquisition module and an AR display module, both communicatively connected to the central processing module; and the power supply module is electrically connected to the central processing module, the at least three audio-visual acquisition systems, and the audio-visual AR system respectively.
7. A multi-functional assisted-audio-visual system according to claim 6, characterized in that said at least three audio-visual acquisition systems are arranged non-collinearly.
8. The system of claim 6, wherein said audio-visual AR system is disposed in front of the head.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010592121.5A CN111768787A (en) | 2020-06-24 | 2020-06-24 | Multifunctional auxiliary audio-visual method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111768787A true CN111768787A (en) | 2020-10-13 |
Family
ID=72721801
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010592121.5A Pending CN111768787A (en) | 2020-06-24 | 2020-06-24 | Multifunctional auxiliary audio-visual method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111768787A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112349182A (en) * | 2020-11-10 | 2021-02-09 | 中国人民解放军海军航空大学 | Deaf-mute conversation auxiliary system |
CN112927704A (en) * | 2021-01-20 | 2021-06-08 | 中国人民解放军海军航空大学 | Silent all-weather individual communication system |
CN115064036A (en) * | 2022-04-26 | 2022-09-16 | 北京亮亮视野科技有限公司 | AR technology-based danger early warning method and device |
CN115079833A (en) * | 2022-08-24 | 2022-09-20 | 北京亮亮视野科技有限公司 | Multilayer interface and information visualization presenting method and system based on somatosensory control |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106504754A (en) * | 2016-09-29 | 2017-03-15 | 浙江大学 | A kind of real-time method for generating captions according to audio output |
CN106686490A (en) * | 2016-12-20 | 2017-05-17 | 安徽乐年健康养老产业有限公司 | Voice acquisition processing method |
CN108762494A (en) * | 2018-05-16 | 2018-11-06 | 北京小米移动软件有限公司 | Show the method, apparatus and storage medium of information |
CN109065055A (en) * | 2018-09-13 | 2018-12-21 | 三星电子(中国)研发中心 | Method, storage medium and the device of AR content are generated based on sound |
WO2019237427A1 (en) * | 2018-06-11 | 2019-12-19 | 北京佳珥医学科技有限公司 | Method, apparatus and system for assisting hearing-impaired people, and augmented reality glasses |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111768787A (en) | Multifunctional auxiliary audio-visual method and system | |
WO2016086440A1 (en) | Wearable guiding device for the blind | |
US10111013B2 (en) | Devices and methods for the visualization and localization of sound | |
CN108957761B (en) | Display device and control method thereof, head-mounted display device and control method thereof | |
US7415123B2 (en) | Method and apparatus for producing spatialized audio signals | |
JP2004077277A (en) | Visualization display method for sound source location and sound source location display apparatus | |
KR101421046B1 (en) | Glasses and control method thereof | |
CN110673819A (en) | Information processing method and electronic equipment | |
CN105561543A (en) | Underwater glasses and control method thereof | |
US11328692B2 (en) | Head-mounted situational awareness system and method of operation | |
US20020158816A1 (en) | Translating eyeglasses | |
CN110351631A (en) | Deaf-mute's alternating current equipment and its application method | |
CN105527711A (en) | Smart glasses with augmented reality | |
CN104090385B (en) | A kind of anti-cheating intelligent glasses | |
CN108563020A (en) | A kind of intelligent MR rescue helmets with thermal infrared imager | |
CN206574215U (en) | It is a kind of to carry out leading passive monitoring sacurity alarm system and buckle by mobile phone | |
KR20130133932A (en) | A wearable type head mounted display device for hearing impaired person | |
JP2000325389A (en) | Visual sense assisting device | |
CN112002186A (en) | Information barrier-free system and method based on augmented reality technology | |
CN112396718A (en) | On-site construction safety and quality supervision research system based on AR technology | |
CN213903982U (en) | Novel intelligent glasses and remote visualization system | |
CN209899996U (en) | Blind guiding system based on video communication | |
CN218045797U (en) | Smart cloud glasses and system worn by blind person | |
CN211653603U (en) | Image processing system | |
CN215821381U (en) | Visual field auxiliary device of AR & VR head-mounted typoscope in coordination |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||