CN111768787A - Multifunctional auxiliary audio-visual method and system - Google Patents

Multifunctional auxiliary audio-visual method and system

Publication number
CN111768787A
CN111768787A CN202010592121.5A CN202010592121A CN111768787A CN 111768787 A CN111768787 A CN 111768787A CN 202010592121 A CN202010592121 A CN 202010592121A CN 111768787 A CN111768787 A CN 111768787A
Authority
CN
China
Prior art keywords
visual
audio
voice
voice signal
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010592121.5A
Other languages
Chinese (zh)
Inventor
张龙杰
孙涛
王诚成
邓博渊
刘玄冰
赵祖星
刘厚君
林衍
刘子谦
李浩杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Naval Aeronautical University
Original Assignee
Naval Aeronautical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Naval Aeronautical University
Priority to CN202010592121.5A
Publication of CN111768787A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G10L21/028 Voice signal separating using properties of sound source
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/57 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for processing of video signals

Abstract

The invention discloses a multifunctional auxiliary audio-visual method comprising the following steps: acquiring first voice signals collected by the voice acquisition modules of at least three audio-visual acquisition systems, and video signals collected by the video acquisition modules of the at least three audio-visual acquisition systems; acquiring a second voice signal collected by the voice acquisition module of an audio-visual AR system; analyzing the first voice signal and the second voice signal, and comparing the high-frequency component ratio of the first voice signal with a preset threshold to obtain an analysis result; processing the first voice signal and the second voice signal according to the analysis result to obtain a processing result; and controlling an AR display module in the audio-visual AR system to display the processing result. By applying sound source localization and speech-to-text conversion to the collected voice signals and displaying the result, the method addresses the problem that a hearing-impaired person cannot hear danger warnings coming from outside the field of view and cannot converse with others.

Description

Multifunctional auxiliary audio-visual method and system
Technical Field
The invention relates to the field of augmented reality, in particular to a multifunctional auxiliary audio-visual method and a multifunctional auxiliary audio-visual system.
Background
People with hearing impairment face many difficulties and dangers in daily life. Conversing with others is troublesome, and walking along a road with passing vehicles is dangerous. The difficulties caused by hearing impairment are a problem that needs to be solved.
With the progress of science and technology, many high-tech products have entered daily life, and products intended to help hearing-impaired people are numerous. The hearing aid is a common example, but wearing one for a long time places a heavy burden on the ear, and hearing aids often introduce noise, so the wearer still cannot judge the direction of a danger or clearly hear what the other party is saying. An intelligent multifunctional auxiliary audio-visual system is therefore urgently needed.
Disclosure of Invention
The invention aims to provide a multifunctional auxiliary audio-visual method and a multifunctional auxiliary audio-visual system, which solve the problem that people with hearing impairment cannot judge dangerous directions or hear the conversation of the other party clearly at present.
In order to achieve the purpose, the invention adopts the technical scheme that:
A multifunctional auxiliary audio-visual method comprises the following steps:
first, acquiring first voice signals collected by the voice acquisition modules of at least three audio-visual acquisition systems, and video signals collected by the video acquisition modules of the at least three audio-visual acquisition systems;
second, acquiring a second voice signal collected by the voice acquisition module of the audio-visual AR system;
third, analyzing the first voice signal and the second voice signal, and comparing the high-frequency component ratio of the first voice signal with a preset threshold to obtain an analysis result;
fourth, processing the first voice signal and the second voice signal according to the analysis result to obtain a processing result;
fifth, controlling an AR display module in the audio-visual AR system to display the processing result.
In one embodiment, when the high-frequency component ratio of the first voice signal exceeds the preset threshold, the sound source is located using a sound source localization algorithm, and the video signal collected by the video acquisition module nearest the sound source is retrieved and overlaid with warning text as the processing result.
In one embodiment, when the high-frequency component ratio of the first voice signal is below the preset threshold and the second voice signal is collected, the voice signal is converted into text by an online speech recognition algorithm as the processing result.
Further, the sound source localization algorithm monitors the first voice signal from the voice acquisition module of each audio-visual acquisition system and locates the sound source from the differences in the times at which the first voice signal arrives at those voice acquisition modules.
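The text does not say how the arrival-time differences are measured. One common option, shown here only as a minimal sketch, is to cross-correlate the signals from two microphones and take the lag with the highest correlation. The function name `estimate_delay` and the parameters `sample_rate` and `max_lag` are illustrative assumptions, not part of the described method.

```python
def estimate_delay(reference, delayed, sample_rate, max_lag):
    """Estimate how many seconds `delayed` lags behind `reference`.

    Both inputs are equal-rate sample sequences; max_lag is the largest
    shift (in samples) that is searched in either direction.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlation of reference[i] with delayed[i + lag] over the overlap.
        score = 0.0
        for i in range(len(reference)):
            j = i + lag
            if 0 <= j < len(delayed):
                score += reference[i] * delayed[j]
        if score > best_score:
            best_lag, best_score = lag, score
    # A positive result means the sound reached the second microphone later.
    return best_lag / sample_rate
```

Applied to each microphone pair, an estimate of this kind would supply the time differences that the localization step consumes.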
Further, the preset threshold is 10% to 20%.
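The text does not define how the high-frequency component ratio is computed. A minimal sketch of one possible reading follows: the share of spectral energy at or above a cutoff frequency, compared with a threshold in the 10% to 20% range. The cutoff frequency, the energy-based definition and the names `high_frequency_ratio` and `exceeds_threshold` are all assumptions made for illustration.

```python
import cmath
import math


def high_frequency_ratio(samples, sample_rate, cutoff_hz):
    """Return the fraction of spectral energy at or above cutoff_hz (assumed definition)."""
    n = len(samples)
    total = 0.0
    high = 0.0
    # Naive DFT over bins k = 1 .. n//2 (DC excluded); fine for short frames.
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        energy = abs(coeff) ** 2
        total += energy
        if k * sample_rate / n >= cutoff_hz:
            high += energy
    return high / total if total > 0 else 0.0


def exceeds_threshold(ratio, threshold=0.15):
    """Compare the ratio with a preset threshold (0.15 is one value in the 10%-20% range)."""
    return ratio > threshold
```

For longer frames an FFT would be the usual replacement for the naive transform; the comparison logic stays the same.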
The invention also discloses a multifunctional auxiliary audio-visual system comprising a central processing module, a power supply module, at least three audio-visual acquisition systems and an audio-visual AR system. Each audio-visual acquisition system comprises a voice acquisition module and a video acquisition module, both in communication connection with the central processing module. The audio-visual AR system comprises a voice acquisition module and an AR display module, both in communication connection with the central processing module. The power supply module is electrically connected to the central processing module, to the at least three audio-visual acquisition systems and to the audio-visual AR system.
Further, the at least three audio-visual acquisition systems are arranged non-collinearly.
Further, the audio-visual AR system is disposed at the front of the head.
Compared with the prior art, the invention has the following advantages and beneficial effects:
An array of voice acquisition modules collects high-frequency sounds such as alarms and vehicle horns; the distance and direction of the sound source are calculated from the differences in the times at which the sound reaches each voice acquisition module, thereby locating the sound source; the corresponding video acquisition module is started according to the localization result and captures video of the area where the sound source is located; the central processing module generates warning text appropriate to the situation, such as "siren detected on the left"; the AR display module superimposes the warning text on the image captured by the camera and displays it in front of the wearer's eyes, so that the hearing-impaired wearer can grasp the general situation around the sound source from the warning text and the projected video. In addition, when the hearing-impaired person talks with a hearing person, the voice acquisition module mounted at the front of the multifunctional auxiliary audio-visual system is started and collects the other party's speech in real time; the central processing module performs online speech recognition and converts the input voice signal into text; the recognized text is sent to the AR device and displayed in front of the wearer's eyes, helping the hearing-impaired person "hear" the sound.
Drawings
FIG. 1 is a flow chart of a method according to a first embodiment of the present invention;
FIG. 2 is an information flow diagram of a first embodiment of the present invention;
FIG. 3 is a system configuration diagram of a second embodiment of the present invention;
FIG. 4 is a block diagram of an audio-visual acquisition system according to a second embodiment of the present invention;
FIG. 5 is a block diagram showing the structure of an audiovisual AR system in a second embodiment of the present invention;
FIG. 6 is a schematic diagram of three-microphone sound source localization according to the first embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1:
As shown in FIG. 1, the invention discloses a multifunctional auxiliary audio-visual method comprising the following steps:
first, acquiring first voice signals collected by the voice acquisition modules of at least three audio-visual acquisition systems, and video signals collected by the video acquisition modules of the at least three audio-visual acquisition systems; the voice acquisition module is preferably a high-sensitivity microphone;
second, acquiring a second voice signal collected by the voice acquisition module of the audio-visual AR system;
third, analyzing the first voice signal and the second voice signal, and comparing the high-frequency component ratio of the first voice signal with a preset threshold to obtain an analysis result, wherein the threshold can be set by the user and is generally 10% to 20%;
fourth, processing the first voice signal and the second voice signal according to the analysis result to obtain a processing result;
fifth, controlling an AR display module in the audio-visual AR system to display the processing result.
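The branching implied by the third to fifth steps can be sketched as a small decision function. This is an illustrative sketch only: `AnalysisResult`, `choose_action` and the action names are hypothetical, the default threshold of 0.15 is simply one value inside the 10% to 20% range, and the "idle" outcome is an assumption, because the text does not say what happens when neither condition holds or when the ratio equals the threshold exactly.

```python
from dataclasses import dataclass


@dataclass
class AnalysisResult:
    high_freq_ratio: float        # high-frequency component ratio of the first voice signal
    second_signal_present: bool   # whether the AR system's microphone picked up speech


def choose_action(result, threshold=0.15):
    """Map an analysis result to the processing branch described in the text."""
    if result.high_freq_ratio > threshold:
        # Alarm-like sound: locate the source and overlay warning text on video.
        return "locate_and_warn"
    if result.second_signal_present:
        # Conversation: convert the second voice signal to text.
        return "speech_to_text"
    # Not covered by the text; assumed to mean nothing is displayed.
    return "idle"
```

Keeping the decision separate from the signal processing makes the user-adjustable threshold a single parameter.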
Specifically, when the high-frequency component ratio of the first voice signal exceeds the preset threshold, the sound source is located using the sound source localization algorithm, and the video signal collected by the video acquisition module nearest the sound source is retrieved and overlaid with warning text as the processing result. An example application scenario: a hearing-impaired person wearing the multifunctional auxiliary audio-visual system is outdoors, and a vehicle 20 meters behind sounds its horn continuously to alert pedestrians. The system picks up the sound, determines after analysis that its high-frequency component ratio is 30%, and immediately performs sound source localization to calculate the distance and direction of the source. It then starts the rear video acquisition device and projects the video of that area in front of the wearer's eyes, with a warning such as "Attention: continuous horn sounding 20 meters behind" superimposed on the video. The hearing-impaired person does not need to look around and can understand the surrounding situation through the system.
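As a rough illustration of the last part of this scenario, the sketch below picks the acquisition unit nearest the located source and composes a warning string. Everything specific in it is an assumption: the unit positions (10 cm offsets, with the rear unit on the positive y-axis), the wording of the message and the names `nearest_unit` and `warning_text` are hypothetical and are not taken from the patent.

```python
import math

# Hypothetical unit positions in metres; the rear unit is placed on the +y axis.
UNIT_POSITIONS = {"left": (-0.1, 0.0), "right": (0.1, 0.0), "rear": (0.0, 0.1)}
PHRASES = {"left": "on the left", "right": "on the right", "rear": "behind"}


def nearest_unit(source_xy, units=UNIT_POSITIONS):
    """Return the name of the acquisition unit closest to the source position."""
    return min(units, key=lambda name: math.dist(units[name], source_xy))


def warning_text(distance_m, azimuth_rad, sound_label, units=UNIT_POSITIONS):
    """Return (unit name, warning string) for a source given in polar form."""
    source_xy = (distance_m * math.cos(azimuth_rad),
                 distance_m * math.sin(azimuth_rad))
    unit = nearest_unit(source_xy, units)
    text = f"Attention: {sound_label} about {distance_m:.0f} m {PHRASES[unit]}"
    return unit, text
```

With these assumed positions, a source 20 m away at an azimuth of 90 degrees selects the rear unit, which matches the horn scenario described above.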
The sound source localization algorithm monitors the first voice signal from the voice acquisition module of each audio-visual acquisition system and locates the sound source from the differences in the times at which the signal arrives at those modules. Take localization with three audio-visual acquisition systems as an example, with three microphones serving as the voice acquisition modules, as shown in FIG. 6. With the midpoint of microphones $M_1$ and $M_2$ as the origin and the line through $M_1$ and $M_2$ as the x-axis, a microphone array coordinate system $Oxy$ is established, and microphone $M_0$ lies on the y-axis. In this coordinate system, the coordinates of $M_0$, $M_1$ and $M_2$ are $(0, l_2)$, $(-l_1, 0)$ and $(l_1, 0)$ respectively, and the coordinates of the sound source $S$ are $(R\cos\theta, R\sin\theta)$, where $R$ is the distance from $S$ to the origin.
According to FIG. 6, let the speed of sound be $c$ and let the sound from the source reach microphones $M_0$, $M_1$ and $M_2$ at times $\tau_0$, $\tau_1$ and $\tau_2$ respectively, with time differences $\tau_{01} = \tau_1 - \tau_0$, $\tau_{02} = \tau_2 - \tau_0$ and $\tau_{12} = \tau_2 - \tau_1$. The geometric relationship then gives
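[Equation image BDA0002555992150000041: system of equations relating the time differences to R and θ; not reproduced in this text]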
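Because the equation image is not available here, the following is only a plausible reconstruction and may differ from the patent's own notation. It assumes the layout stated for FIG. 6, with M0 at (0, l2), M1 at (-l1, 0), M2 at (l1, 0) and S at (R cos θ, R sin θ), and it reads the time differences as τ01 = τ1 - τ0, τ02 = τ2 - τ0 and τ12 = τ2 - τ1. Setting each difference in source-to-microphone distance equal to c times the corresponding time difference gives:

```latex
\begin{aligned}
\sqrt{R^2 + 2 R l_1 \cos\theta + l_1^2} - \sqrt{R^2 - 2 R l_2 \sin\theta + l_2^2} &= c\,\tau_{01} \\
\sqrt{R^2 - 2 R l_1 \cos\theta + l_1^2} - \sqrt{R^2 - 2 R l_2 \sin\theta + l_2^2} &= c\,\tau_{02} \\
\sqrt{R^2 - 2 R l_1 \cos\theta + l_1^2} - \sqrt{R^2 + 2 R l_1 \cos\theta + l_1^2} &= c\,\tau_{12}
\end{aligned}
```

Under this reading only two of the three equations are independent, since τ12 = τ02 - τ01, which is consistent with having two unknowns, R and θ.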
The above system of equations is in the two unknowns R and θ; solving this system of quadratic equations in two unknowns gives the azimuth and distance of the sound source.
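The text does not spell out the solution procedure, so the sketch below uses a standard closed-form approach as a stand-in. It assumes microphones at M0 = (0, l2), M1 = (-l1, 0), M2 = (l1, 0), time differences defined as τ01 = τ1 - τ0 and τ02 = τ2 - τ0, and a sound speed of 343 m/s. Subtracting squared distances yields two equations that are linear in the source position once the distance d0 from the source to M0 is treated as a parameter, and d0 itself then satisfies a quadratic. The function name `locate_source` and its parameters are hypothetical.

```python
import math


def locate_source(tau01, tau02, l1, l2, c=343.0):
    """Return candidate (R, theta) pairs, farthest first, for the assumed layout."""
    m0, m1, m2 = (0.0, l2), (-l1, 0.0), (l1, 0.0)
    a1, a2 = c * tau01, c * tau02          # range differences d1 - d0 and d2 - d0

    # Linear part: (Mi - M0) . S = u_i + v_i * d0.
    b11, b12 = m1[0] - m0[0], m1[1] - m0[1]
    b21, b22 = m2[0] - m0[0], m2[1] - m0[1]
    det = b11 * b22 - b12 * b21            # zero when the microphones are collinear
    if abs(det) < 1e-12:
        raise ValueError("microphones must not be collinear")
    n0 = m0[0] ** 2 + m0[1] ** 2
    u1 = (m1[0] ** 2 + m1[1] ** 2 - n0 - a1 ** 2) / 2.0
    u2 = (m2[0] ** 2 + m2[1] ** 2 - n0 - a2 ** 2) / 2.0

    def solve(w1, w2):
        # Solve the 2x2 linear system by Cramer's rule.
        return ((b22 * w1 - b12 * w2) / det, (b11 * w2 - b21 * w1) / det)

    px, py = solve(u1, u2)                 # S = P + Q * d0
    qx, qy = solve(-a1, -a2)

    # Quadratic in d0 from d0^2 = |P + Q * d0 - M0|^2.
    ex, ey = px - m0[0], py - m0[1]
    qa = qx * qx + qy * qy - 1.0
    qb = 2.0 * (ex * qx + ey * qy)
    qc = ex * ex + ey * ey
    if abs(qa) < 1e-12:
        roots = [-qc / qb] if qb != 0 else []
    else:
        disc = qb * qb - 4.0 * qa * qc
        if disc < 0:
            return []
        s = math.sqrt(disc)
        roots = [(-qb + s) / (2.0 * qa), (-qb - s) / (2.0 * qa)]

    candidates = []
    for d0 in roots:
        # Keep only roots that give non-negative distances to every microphone.
        if d0 > 0 and d0 + a1 >= 0 and d0 + a2 >= 0:
            x, y = px + qx * d0, py + qy * d0
            candidates.append((math.hypot(x, y), math.atan2(y, x)))
    return sorted(candidates, reverse=True)
```

The function can return two candidates, because three microphones in a plane may leave a two-fold ambiguity; how to choose between them, for example by preferring the farther one or by using an additional microphone, is left open here. The collinearity check reflects why the acquisition systems need a non-collinear arrangement.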
When the high-frequency component ratio of the first voice signal is below the preset threshold and the second voice signal is collected, the voice signal is converted into text by an online speech recognition algorithm as the processing result. An example application scenario: a hearing-impaired person wearing the multifunctional auxiliary audio-visual system talks face to face with a hearing person. The voice acquisition module in the audio-visual AR system collects the other party's speech, the central processing module recognizes the speech online and converts it into text, and the AR display module projects the text in front of the hearing-impaired person's eyes, so that the person can "hear" the sound.
Example 2:
As shown in FIG. 3 and FIG. 4, the invention further discloses a multifunctional auxiliary audio-visual system comprising a central processing module, a power supply module, at least three audio-visual acquisition systems and an audio-visual AR system. Each audio-visual acquisition system comprises a voice acquisition module and a video acquisition module, both in communication connection with the central processing module. The audio-visual AR system comprises a voice acquisition module and an AR display module, both in communication connection with the central processing module. The power supply module is electrically connected to the central processing module, to the at least three audio-visual acquisition systems and to the audio-visual AR system.
The central processing module can be an STM32-series microcontroller, the voice acquisition module can be a high-sensitivity microphone, the video acquisition module can be a high-definition miniature camera, and the AR display module can be VUFINE augmented reality glasses.
The at least three audio-visual acquisition systems are arranged non-collinearly. Preferably there are three, arranged on the left side, the right side and the rear of the wearer's head respectively, and the audio-visual AR system is arranged at the front of the wearer's head.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
In the description of the present invention, "a plurality" means two or more unless otherwise specified; the terms "upper", "lower", "left", "right", "inner", "outer", "front", "rear", "head", "tail", and the like, indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are only for convenience in describing and simplifying the description, and do not indicate or imply that the device or element referred to must have a particular orientation, be constructed in a particular orientation, and be operated, and thus, should not be construed as limiting the invention. Furthermore, the terms "first," "second," "third," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it is to be noted that, unless otherwise explicitly specified or limited, the terms "connected" and "connected" are to be interpreted broadly, e.g., as being fixed or detachable or integrally connected; can be mechanically or electrically connected; may be directly connected or indirectly connected through an intermediate. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.

Claims (8)

1. A multifunctional auxiliary audio-visual method is characterized by comprising the following steps:
acquiring first voice signals acquired by voice acquisition modules in at least three audio-visual acquisition systems, and acquiring video signals acquired by video acquisition modules of at least three audio-visual acquisition systems;
acquiring a second voice signal acquired by a voice acquisition module in the audio-visual AR system;
analyzing the first voice signal and the second voice signal, and comparing the high-frequency component ratio of the first voice signal with a preset threshold value to obtain an analysis result;
processing the first voice signal and the second voice signal according to the analysis result to obtain a processing result;
and controlling an AR display module in the audio-visual AR system to display the processing result.
2. The multifunctional auxiliary audio-visual method according to claim 1, wherein when the high-frequency component ratio of the first voice signal exceeds the preset threshold, the sound source position is located according to a sound source localization algorithm, and the video signal collected by the video acquisition module close to the sound source position is retrieved and overlaid with warning text as the processing result.
3. The multifunctional auxiliary audio-visual method according to claim 1, wherein when the high-frequency component ratio of the first voice signal is lower than the preset threshold and the second voice signal is collected, the voice signal is converted into text according to an online speech recognition algorithm as the processing result.
4. The multifunctional auxiliary audio-visual method according to claim 2, wherein the sound source localization algorithm monitors the first voice signal of the voice acquisition module in each of the audio-visual acquisition systems and locates the sound source by using the time differences of the first voice signal collected by the voice acquisition module in each of the audio-visual acquisition systems.
5. The multifunctional auxiliary audio-visual method according to claim 1, wherein the preset threshold is 10% to 20%.
6. A multifunctional auxiliary audio-visual system, characterized by comprising a central processing module, a power supply module, at least three audio-visual acquisition systems and an audio-visual AR system, wherein each audio-visual acquisition system comprises a voice acquisition module and a video acquisition module, the voice acquisition module and the video acquisition module being in communication connection with the central processing module; the audio-visual AR system comprises a voice acquisition module and an AR display module, the voice acquisition module and the AR display module being in communication connection with the central processing module; and the power supply module is electrically connected to the central processing module, the at least three audio-visual acquisition systems and the audio-visual AR system respectively.
7. The multifunctional auxiliary audio-visual system according to claim 6, wherein the at least three audio-visual acquisition systems are arranged non-collinearly.
8. The multifunctional auxiliary audio-visual system according to claim 6, wherein the audio-visual AR system is disposed in front of the head.
CN202010592121.5A 2020-06-24 2020-06-24 Multifunctional auxiliary audio-visual method and system Pending CN111768787A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010592121.5A CN111768787A (en) 2020-06-24 2020-06-24 Multifunctional auxiliary audio-visual method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010592121.5A CN111768787A (en) 2020-06-24 2020-06-24 Multifunctional auxiliary audio-visual method and system

Publications (1)

Publication Number Publication Date
CN111768787A true CN111768787A (en) 2020-10-13

Family

ID=72721801

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010592121.5A Pending CN111768787A (en) 2020-06-24 2020-06-24 Multifunctional auxiliary audio-visual method and system

Country Status (1)

Country Link
CN (1) CN111768787A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112349182A (en) * 2020-11-10 2021-02-09 中国人民解放军海军航空大学 Deaf-mute conversation auxiliary system
CN112927704A (en) * 2021-01-20 2021-06-08 中国人民解放军海军航空大学 Silent all-weather individual communication system
CN115064036A (en) * 2022-04-26 2022-09-16 北京亮亮视野科技有限公司 AR technology-based danger early warning method and device
CN115079833A (en) * 2022-08-24 2022-09-20 北京亮亮视野科技有限公司 Multilayer interface and information visualization presenting method and system based on somatosensory control

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106504754A (en) * 2016-09-29 2017-03-15 浙江大学 A kind of real-time method for generating captions according to audio output
CN106686490A (en) * 2016-12-20 2017-05-17 安徽乐年健康养老产业有限公司 Voice acquisition processing method
CN108762494A (en) * 2018-05-16 2018-11-06 北京小米移动软件有限公司 Show the method, apparatus and storage medium of information
CN109065055A (en) * 2018-09-13 2018-12-21 三星电子(中国)研发中心 Method, storage medium and the device of AR content are generated based on sound
WO2019237427A1 (en) * 2018-06-11 2019-12-19 北京佳珥医学科技有限公司 Method, apparatus and system for assisting hearing-impaired people, and augmented reality glasses

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106504754A (en) * 2016-09-29 2017-03-15 浙江大学 A kind of real-time method for generating captions according to audio output
CN106686490A (en) * 2016-12-20 2017-05-17 安徽乐年健康养老产业有限公司 Voice acquisition processing method
CN108762494A (en) * 2018-05-16 2018-11-06 北京小米移动软件有限公司 Show the method, apparatus and storage medium of information
WO2019237427A1 (en) * 2018-06-11 2019-12-19 北京佳珥医学科技有限公司 Method, apparatus and system for assisting hearing-impaired people, and augmented reality glasses
CN109065055A (en) * 2018-09-13 2018-12-21 三星电子(中国)研发中心 Method, storage medium and the device of AR content are generated based on sound

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112349182A (en) * 2020-11-10 2021-02-09 中国人民解放军海军航空大学 Deaf-mute conversation auxiliary system
CN112927704A (en) * 2021-01-20 2021-06-08 中国人民解放军海军航空大学 Silent all-weather individual communication system
CN115064036A (en) * 2022-04-26 2022-09-16 北京亮亮视野科技有限公司 AR technology-based danger early warning method and device
CN115079833A (en) * 2022-08-24 2022-09-20 北京亮亮视野科技有限公司 Multilayer interface and information visualization presenting method and system based on somatosensory control
CN115079833B (en) * 2022-08-24 2023-01-06 北京亮亮视野科技有限公司 Multilayer interface and information visualization presenting method and system based on somatosensory control

Similar Documents

Publication Publication Date Title
CN111768787A (en) Multifunctional auxiliary audio-visual method and system
WO2016086440A1 (en) Wearable guiding device for the blind
US10111013B2 (en) Devices and methods for the visualization and localization of sound
CN108957761B (en) Display device and control method thereof, head-mounted display device and control method thereof
US7415123B2 (en) Method and apparatus for producing spatialized audio signals
JP2004077277A (en) Visualization display method for sound source location and sound source location display apparatus
KR101421046B1 (en) Glasses and control method thereof
CN110673819A (en) Information processing method and electronic equipment
CN105561543A (en) Underwater glasses and control method thereof
US11328692B2 (en) Head-mounted situational awareness system and method of operation
US20020158816A1 (en) Translating eyeglasses
CN110351631A (en) Deaf-mute's alternating current equipment and its application method
CN105527711A (en) Smart glasses with augmented reality
CN104090385B (en) A kind of anti-cheating intelligent glasses
CN108563020A (en) A kind of intelligent MR rescue helmets with thermal infrared imager
CN206574215U (en) It is a kind of to carry out leading passive monitoring sacurity alarm system and buckle by mobile phone
KR20130133932A (en) A wearable type head mounted display device for hearing impaired person
JP2000325389A (en) Visual sense assisting device
CN112002186A (en) Information barrier-free system and method based on augmented reality technology
CN112396718A (en) On-site construction safety and quality supervision research system based on AR technology
CN213903982U (en) Novel intelligent glasses and remote visualization system
CN209899996U (en) Blind guiding system based on video communication
CN218045797U (en) Smart cloud glasses and system worn by blind person
CN211653603U (en) Image processing system
CN215821381U (en) Visual field auxiliary device of AR & VR head-mounted typoscope in coordination

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination