CN111768787A - Multifunctional auxiliary audio-visual method and system - Google Patents
Multifunctional auxiliary audio-visual method and system
- Publication number
- CN111768787A (application CN202010592121.5A)
- Authority
- CN
- China
- Prior art keywords
- visual
- audio
- voice
- voice signal
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
- G10L21/028—Voice signal separating using properties of sound source
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/57—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for processing of video signals
Abstract
The invention discloses a multifunctional auxiliary audio-visual method comprising the following steps: acquiring first voice signals collected by the voice acquisition modules of at least three audio-visual acquisition systems, and acquiring video signals collected by the video acquisition modules of the same systems; acquiring a second voice signal collected by the voice acquisition module of an audio-visual AR system; analyzing the first and second voice signals, and comparing the high-frequency component ratio of the first voice signal with a preset threshold to obtain an analysis result; processing the first and second voice signals according to the analysis result to obtain a processing result; and controlling an AR display module in the audio-visual AR system to display the processing result. By combining sound source localization and speech-to-text conversion, the collected voice signals are processed and the results are displayed, addressing the problem that a hearing-impaired person cannot hear danger warnings coming from blind spots outside the field of view or cannot converse with others.
Description
Technical Field
The invention relates to the field of augmented reality, in particular to a multifunctional auxiliary audio-visual method and a multifunctional auxiliary audio-visual system.
Background
People with hearing impairment face many difficulties and dangers in daily life: conversing with others is troublesome, and walking along roads with passing vehicles can be dangerous. The hardships that hearing impairment brings to everyday life remain a problem to be solved.
With advances in science and technology, many high-tech products have appeared, including an endless stream of products aimed at hearing-impaired people. For example, the hearing aids commonly seen today place a considerable burden on the ear when worn for long periods, and they often introduce noise, so that the wearer still cannot judge the direction of a danger or hear the other party clearly. An intelligent multifunctional auxiliary audio-visual system is therefore urgently needed.
Disclosure of Invention
The invention aims to provide a multifunctional auxiliary audio-visual method and system that solve the current problem that people with hearing impairment cannot judge the direction of a danger or hear the other party in a conversation clearly.
To achieve this purpose, the invention adopts the following technical scheme:
a multifunctional auxiliary audio-visual method, comprising the following steps:
the method comprises the steps of firstly, acquiring first voice signals acquired by voice acquisition modules in at least three audio-visual acquisition systems, and acquiring video signals acquired by video acquisition modules of at least three audio-visual acquisition systems;
secondly, acquiring a second voice signal acquired by a voice acquisition module in the audio-visual AR system;
thirdly, analyzing the first voice signal and the second voice signal, and comparing the high-frequency component proportion of the first voice signal with a preset threshold value to obtain an analysis result;
fourthly, processing the first voice signal and the second voice signal according to the analysis result to obtain a processing result;
and fifthly, controlling an AR display module in the audio-visual AR system to display a processing result.
As an embodiment, when the high-frequency component ratio of the first voice signal exceeds the preset threshold, the sound source position is located by a sound source localization algorithm, and the video signal collected by the video acquisition module nearest the sound source position is retrieved and overlaid with warning text as the processing result.
As an embodiment, when the high-frequency component ratio of the first voice signal is below the preset threshold and a second voice signal is collected, the voice signal is converted into text by an online speech recognition algorithm as the processing result.
Furthermore, the sound source localization algorithm monitors the first voice signal at the voice acquisition module of each audio-visual acquisition system, and localizes the sound source using the time differences between the first voice signals received by the voice acquisition modules of the different audio-visual acquisition systems.
Further, the preset threshold value is 10% -20%.
The invention also discloses a multifunctional auxiliary audio-visual system comprising a central processing module, a power supply module, at least three audio-visual acquisition systems, and an audio-visual AR system. Each audio-visual acquisition system comprises a voice acquisition module and a video acquisition module, both communicatively connected to the central processing module. The audio-visual AR system comprises a voice acquisition module and an AR display module, both communicatively connected to the central processing module. The power supply module is electrically connected to the central processing module, the at least three audio-visual acquisition systems, and the audio-visual AR system respectively.
Further, the at least three audio-visual acquisition systems are arranged non-collinearly.
Further, the audio-visual AR system is disposed at the front of the wearer's head.
Compared with the prior art, the invention has the following advantages and beneficial effects:
An array of voice acquisition modules collects high-frequency sounds such as alarms and horns. The time differences with which the sound reaches the individual voice acquisition modules are used to calculate the distance and direction of the sound source, achieving sound source localization. According to the localization result, the corresponding video acquisition module is started to capture video of the area where the sound source is located. Based on the actual situation, the central processing module generates warning text such as 'siren detected on the left'. Using the AR display module, the warning text is superimposed on the image captured by the camera and displayed before the wearer's eyes, so that the hearing-impaired wearer can grasp the situation around the sound source through the warning text and the projected video. On the other hand, when the hearing-impaired person communicates with a hearing person, the voice acquisition module mounted at the front of the multifunctional auxiliary audio-visual system collects the other party's speech in real time; the central processing module performs online speech recognition, converting the input voice signal into text; and the recognized text is sent to the AR device and displayed before the wearer's eyes, helping the hearing-impaired person "hear" the sound.
Drawings
FIG. 1 is a flow chart of a method according to a first embodiment of the present invention;
FIG. 2 is an information flow diagram of a first embodiment of the present invention;
FIG. 3 is a system configuration diagram of a second embodiment of the present invention;
FIG. 4 is a block diagram of an audio-visual acquisition system according to a second embodiment of the present invention;
FIG. 5 is a block diagram showing the structure of an audiovisual AR system in a second embodiment of the present invention;
fig. 6 is a schematic diagram of three-microphone sound source localization according to the first embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1:
as shown in fig. 1, the invention discloses a multifunctional auxiliary audio-visual method, which comprises the following steps:
the method comprises the steps of firstly, acquiring first voice signals acquired by voice acquisition modules in at least three audio-visual acquisition systems, and acquiring video signals acquired by the video acquisition modules of the at least three audio-visual acquisition systems; the voice acquisition module is preferably a high-sensitivity microphone.
Secondly, acquiring a second voice signal acquired by a voice acquisition module in the audio-visual AR system;
Thirdly, analyzing the first voice signal and the second voice signal, and comparing the high-frequency component ratio of the first voice signal with a preset threshold (user-settable, generally 10%-20%) to obtain an analysis result;
fourthly, processing the first voice signal and the second voice signal according to the analysis result to obtain a processing result;
and fifthly, controlling an AR display module in the audio-visual AR system to display a processing result.
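The "high-frequency component ratio" used in step three is not formally defined in the text; one plausible reading is the fraction of spectral energy above some cutoff frequency. The sketch below assumes a 2 kHz cutoff and a 15% threshold (inside the stated 10%-20% range); both values and all names are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def high_freq_ratio(signal, sample_rate, cutoff_hz=2000.0):
    """Fraction of spectral energy at or above cutoff_hz.

    The 2 kHz cutoff is an assumed value; the text does not say
    where 'high frequency' begins.
    """
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = power.sum()
    return 0.0 if total == 0 else power[freqs >= cutoff_hz].sum() / total

THRESHOLD = 0.15  # user-settable, inside the 10%-20% range given above

# A pure 3 kHz tone (a horn-like component) lies entirely above the
# cutoff, so nearly all of its energy counts as high-frequency.
t = np.arange(0, 1.0, 1.0 / 16000)
ratio = high_freq_ratio(np.sin(2 * np.pi * 3000 * t), 16000)
exceeds = ratio > THRESHOLD
```

With `exceeds` true the method branches to sound source localization; otherwise it branches to speech recognition, as described below.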
Specifically, when the high-frequency component ratio of the first voice signal exceeds the preset threshold, the sound source position is located by the sound source localization algorithm, and the video signal collected by the video acquisition module nearest the sound source is retrieved and overlaid with warning text as the processing result. An example application scenario: a hearing-impaired person wearing the multifunctional auxiliary audio-visual system is active outdoors, and a vehicle 20 meters behind sounds its horn continuously to alert passers-by. The system perceives the sound and, after analysis, determines that its high-frequency component ratio is 30%. It immediately performs sound source localization, calculating the distance and direction of the source, then starts the rear video acquisition device and projects its video before the wearer's eyes, with the warning text "Attention: continuous horn 20 meters behind" superimposed on the video. The hearing-impaired person need not look around and can fully grasp the surrounding situation by means of the multifunctional auxiliary audio-visual system.
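Selecting the video acquisition module "close to the sound source position" can be sketched as picking the module whose mounting bearing is angularly closest to the localized source. The bearings below are assumptions matching the left/right/rear placement described in embodiment 2; the patent does not specify numeric angles.

```python
# Assumed bearings (degrees, counterclockwise from the wearer's facing
# direction) of the three video acquisition modules.
CAMERA_BEARINGS = {"left": 90.0, "right": -90.0, "rear": 180.0}

def angular_gap(a, b):
    """Smallest absolute angle between two bearings, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def nearest_camera(source_bearing):
    """Pick the capture module angularly closest to the localized source."""
    return min(CAMERA_BEARINGS,
               key=lambda name: angular_gap(CAMERA_BEARINGS[name], source_bearing))

# A horn roughly behind the wearer (bearing ~170 deg) selects the rear camera.
selected = nearest_camera(170.0)
```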
The sound source localization algorithm monitors the first voice signal at the voice acquisition module of each audio-visual acquisition system, and localizes the sound source using the time differences between the first voice signals received by the different modules. For example, three audio-visual acquisition systems are used for localization, as shown in fig. 6, with three microphones serving as the voice acquisition modules. Taking the midpoint of the line connecting microphones M0 and M1... More precisely, taking the midpoint of the line connecting M1 and M2 as the origin and that line as the x-axis, a microphone-array coordinate system Oxy is established, with microphone M0 lying on the y-axis. Under this definition, the coordinates of M0, M1, M2 are (0, l2), (-l1, 0) and (l1, 0) respectively, the coordinates of the sound source S are (Rcosθ, Rsinθ), and R is the distance from S to the origin.
According to fig. 6, let the speed of sound be c, and let the sound reach microphones M0, M1, M2 at times τ0, τ1, τ2 respectively, with time differences τ01 = τ1 - τ0, τ02 = τ2 - τ0, τ12 = τ2 - τ1. The geometric relationship then gives the equation set (reconstructed from the stated geometry; the original rendering is an image):
c·τ01 = √((Rcosθ + l1)² + (Rsinθ)²) − √((Rcosθ)² + (Rsinθ − l2)²)
c·τ02 = √((Rcosθ − l1)² + (Rsinθ)²) − √((Rcosθ)² + (Rsinθ − l2)²)
The above is a system of equations in R and θ; solving this system of two equations in two unknowns yields the direction and distance of the sound source.
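The localization just described can be sketched numerically. Here a simple grid search stands in for the closed-form solution of the equation system; the microphone offsets, grid ranges, and resolutions are assumed values chosen for illustration.

```python
import numpy as np

C = 343.0                # speed of sound in air, m/s
L1, L2 = 0.10, 0.10      # assumed microphone offsets, m
MICS = np.array([[0.0, L2], [-L1, 0.0], [L1, 0.0]])  # M0, M1, M2

def tdoas(src):
    """Arrival-time differences (tau01, tau02) for a source at `src`."""
    t = np.linalg.norm(MICS - src, axis=1) / C
    return t[1] - t[0], t[2] - t[0]

def localize(tau01, tau02):
    """Grid search over (R, theta) minimizing the TDOA residual."""
    best, best_err = (None, None), float("inf")
    for R in np.linspace(0.5, 50.0, 100):
        for theta in np.linspace(-np.pi, np.pi, 360):
            src = np.array([R * np.cos(theta), R * np.sin(theta)])
            e01, e02 = tdoas(src)
            err = (e01 - tau01) ** 2 + (e02 - tau02) ** 2
            if err < best_err:
                best, best_err = (R, theta), err
    return best

# Simulate a source 10 m away at bearing 2 rad and recover it from
# the time differences alone.
true_src = np.array([10.0 * np.cos(2.0), 10.0 * np.sin(2.0)])
R_est, theta_est = localize(*tdoas(true_src))
```

With a small baseline and a distant source the distance R is weakly constrained (a known limitation of time-difference localization), while the bearing θ is recovered well; the warning-text use case above mainly needs the bearing.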
When the high-frequency component ratio of the first voice signal is below the preset threshold and a second voice signal is collected, the voice signal is converted into text by an online speech recognition algorithm as the processing result. An example application scenario: a hearing-impaired person wearing the multifunctional auxiliary audio-visual system talks face to face with a hearing person; the voice acquisition module in the audio-visual AR system collects the other party's speech, the central processing module recognizes the speech online and converts it into text, and the AR display module projects the text before the hearing-impaired person's eyes, letting the hearing-impaired person "hear" the sound.
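The two branches of the method (warning overlay vs. subtitle) can be sketched as a small dispatcher. The threshold value, result type, warning string, and stub recognizer below are illustrative assumptions; a real deployment would call an actual online speech recognition service.

```python
from dataclasses import dataclass
from typing import Callable, Optional

THRESHOLD = 0.15  # assumed value within the stated 10%-20% range

@dataclass
class ProcessingResult:
    mode: str   # "warning_overlay" or "subtitle"
    text: str

def dispatch(high_freq_ratio: float,
             second_signal: Optional[bytes],
             recognize: Callable[[bytes], str]) -> Optional[ProcessingResult]:
    """Route the signals according to the analysis result (sketch)."""
    if high_freq_ratio > THRESHOLD:
        # Alarm-like sound: produce warning text for the AR overlay.
        return ProcessingResult("warning_overlay",
                                "attention: alarm-like sound detected")
    if second_signal is not None:
        # Conversation case: online speech recognition (stubbed here).
        return ProcessingResult("subtitle", recognize(second_signal))
    return None

# A stub recognizer stands in for the online speech recognition service.
subtitle = dispatch(0.05, b"audio", lambda audio: "nice to meet you")
warning = dispatch(0.30, None, lambda audio: "")
```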
Example 2:
as shown in fig. 3 and 4, the invention further discloses a multifunctional auxiliary audio-visual system comprising a central processing module, a power supply module, at least three audio-visual acquisition systems, and an audio-visual AR system. Each audio-visual acquisition system comprises a voice acquisition module and a video acquisition module, both communicatively connected to the central processing module. The audio-visual AR system comprises a voice acquisition module and an AR display module, both communicatively connected to the central processing module. The power supply module is electrically connected to the central processing module, the at least three audio-visual acquisition systems, and the audio-visual AR system respectively.
The central processing module may be an STM32-series microcontroller, the voice acquisition module a high-sensitivity microphone, the video acquisition module a high-definition miniature camera, and the AR display module a pair of VUFINE augmented reality glasses.
The at least three audio-visual acquisition systems are arranged non-collinearly. Preferably there are exactly three, placed on the left, right, and rear of the wearer's head, with the audio-visual AR system placed at the front of the head.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
In the description of the present invention, "a plurality" means two or more unless otherwise specified; the terms "upper", "lower", "left", "right", "inner", "outer", "front", "rear", "head", "tail", and the like, indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are only for convenience in describing and simplifying the description, and do not indicate or imply that the device or element referred to must have a particular orientation, be constructed in a particular orientation, and be operated, and thus, should not be construed as limiting the invention. Furthermore, the terms "first," "second," "third," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it is to be noted that, unless otherwise explicitly specified or limited, the terms "mounted" and "connected" are to be interpreted broadly: a connection may be fixed, detachable, or integral; mechanical or electrical; and direct, or indirect through an intermediary. The specific meanings of the above terms in the present invention can be understood by those skilled in the art on a case-by-case basis.
Claims (8)
1. A multifunctional auxiliary audio-visual method is characterized by comprising the following steps:
acquiring first voice signals acquired by voice acquisition modules in at least three audio-visual acquisition systems, and acquiring video signals acquired by video acquisition modules of at least three audio-visual acquisition systems;
acquiring a second voice signal acquired by a voice acquisition module in the audio-visual AR system;
analyzing the first voice signal and the second voice signal, and comparing the high-frequency component ratio of the first voice signal with a preset threshold value to obtain an analysis result;
processing the first voice signal and the second voice signal according to the analysis result to obtain a processing result;
and controlling an AR display module in the audio-visual AR system to display the processing result.
2. The multifunctional auxiliary audio-visual method according to claim 1, wherein when the high-frequency component ratio of the first voice signal exceeds the preset threshold, the sound source position is located by a sound source localization algorithm, and the video signal acquired by the video acquisition module close to the sound source position is retrieved and overlaid with warning text as the processing result.
3. The multifunctional auxiliary audio-visual method according to claim 1, wherein when the high-frequency component ratio of the first voice signal is below the preset threshold and the second voice signal is collected, the voice signal is converted into text by an online speech recognition algorithm as the processing result.
4. The method of claim 2, wherein the sound source localization algorithm monitors the first voice signal of the voice acquisition module in each of the audio-visual acquisition systems, and localizes the sound source using the time differences between the first voice signals acquired by the voice acquisition modules of the different audio-visual acquisition systems.
5. The method of claim 1, wherein the predetermined threshold is 10% to 20%.
6. A multifunctional auxiliary audio-visual system, characterized in that it comprises a central processing module, a power supply module, at least three audio-visual acquisition systems, and an audio-visual AR system; each audio-visual acquisition system comprises a voice acquisition module and a video acquisition module, both communicatively connected to the central processing module; the audio-visual AR system comprises a voice acquisition module and an AR display module, both communicatively connected to the central processing module; and the power supply module is electrically connected to the central processing module, the at least three audio-visual acquisition systems, and the audio-visual AR system respectively.
7. A multi-functional assisted-audio-visual system according to claim 6, characterized in that said at least three audio-visual acquisition systems are arranged non-collinearly.
8. The system of claim 6, wherein said audio-visual AR system is disposed in front of the head.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010592121.5A CN111768787A (en) | 2020-06-24 | 2020-06-24 | Multifunctional auxiliary audio-visual method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111768787A true CN111768787A (en) | 2020-10-13 |
Family
ID=72721801
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010592121.5A Pending CN111768787A (en) | 2020-06-24 | 2020-06-24 | Multifunctional auxiliary audio-visual method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111768787A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112349182A (en) * | 2020-11-10 | 2021-02-09 | 中国人民解放军海军航空大学 | Deaf-mute conversation auxiliary system |
CN112927704A (en) * | 2021-01-20 | 2021-06-08 | 中国人民解放军海军航空大学 | Silent all-weather individual communication system |
CN115064036A (en) * | 2022-04-26 | 2022-09-16 | 北京亮亮视野科技有限公司 | AR technology-based danger early warning method and device |
CN115079833A (en) * | 2022-08-24 | 2022-09-20 | 北京亮亮视野科技有限公司 | Multilayer interface and information visualization presenting method and system based on somatosensory control |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106504754A (en) * | 2016-09-29 | 2017-03-15 | 浙江大学 | A kind of real-time method for generating captions according to audio output |
CN106686490A (en) * | 2016-12-20 | 2017-05-17 | 安徽乐年健康养老产业有限公司 | Voice acquisition processing method |
CN108762494A (en) * | 2018-05-16 | 2018-11-06 | 北京小米移动软件有限公司 | Show the method, apparatus and storage medium of information |
CN109065055A (en) * | 2018-09-13 | 2018-12-21 | 三星电子(中国)研发中心 | Method, storage medium and the device of AR content are generated based on sound |
WO2019237427A1 (en) * | 2018-06-11 | 2019-12-19 | 北京佳珥医学科技有限公司 | Method, apparatus and system for assisting hearing-impaired people, and augmented reality glasses |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111768787A (en) | Multifunctional auxiliary audio-visual method and system | |
WO2016086440A1 (en) | Wearable guiding device for the blind | |
US10111013B2 (en) | Devices and methods for the visualization and localization of sound | |
CN108957761B (en) | Display device and control method thereof, head-mounted display device and control method thereof | |
US7415123B2 (en) | Method and apparatus for producing spatialized audio signals | |
JP2004077277A (en) | Visualization display method for sound source location and sound source location display apparatus | |
KR101421046B1 (en) | Glasses and control method thereof | |
CN110673819A (en) | Information processing method and electronic equipment | |
CN105561543A (en) | Underwater glasses and control method thereof | |
US11328692B2 (en) | Head-mounted situational awareness system and method of operation | |
US20020158816A1 (en) | Translating eyeglasses | |
CN110351631A (en) | Deaf-mute's alternating current equipment and its application method | |
CN105527711A (en) | Smart glasses with augmented reality | |
CN104090385B (en) | A kind of anti-cheating intelligent glasses | |
CN108563020A (en) | A kind of intelligent MR rescue helmets with thermal infrared imager | |
CN206574215U (en) | It is a kind of to carry out leading passive monitoring sacurity alarm system and buckle by mobile phone | |
KR20130133932A (en) | A wearable type head mounted display device for hearing impaired person | |
JP2000325389A (en) | Visual sense assisting device | |
CN112002186A (en) | Information barrier-free system and method based on augmented reality technology | |
CN112396718A (en) | On-site construction safety and quality supervision research system based on AR technology | |
CN213903982U (en) | Novel intelligent glasses and remote visualization system | |
CN209899996U (en) | Blind guiding system based on video communication | |
CN218045797U (en) | Smart cloud glasses and system worn by blind person | |
CN211653603U (en) | Image processing system | |
CN215821381U (en) | Visual field auxiliary device of AR & VR head-mounted typoscope in coordination |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||