CN114120960B - Auxiliary space sensing system and method based on hearing - Google Patents

Auxiliary space sensing system and method based on hearing

Info

Publication number
CN114120960B
CN114120960B (application CN202111373446.5A)
Authority
CN
China
Prior art keywords
module
audio
information
image
spatial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111373446.5A
Other languages
Chinese (zh)
Other versions
CN114120960A (en)
Inventor
费腾
李阳春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202111373446.5A priority Critical patent/CN114120960B/en
Publication of CN114120960A publication Critical patent/CN114120960A/en
Application granted granted Critical
Publication of CN114120960B publication Critical patent/CN114120960B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 - Speech synthesis; Text to speech systems
    • G10L13/02 - Methods for producing synthetic speech; Speech synthesisers
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 - Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013 - Eye tracking input arrangements
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 - Sound input; Sound output
    • G06F3/162 - Interface to dedicated audio devices, e.g. audio drivers, interface to CODECs
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/40 - Processing or translation of natural language
    • G06F40/58 - Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/70 - Determining position or orientation of objects or cameras
    • G - PHYSICS
    • G08 - SIGNALLING
    • G08B - SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B21/00 - Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
    • G08B21/18 - Status alarms
    • G08B21/24 - Reminder alarms, e.g. anti-loss alarms
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Emergency Management (AREA)
  • Business, Economics & Management (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention relates to a hearing-based auxiliary spatial perception system comprising a data acquisition module, a human-machine interaction module, a control module, a walking mode module and a gaze mode module. The system provides two working modes, a walking mode and a gaze mode, corresponding to two hearing-based auxiliary spatial perception methods. The walking mode method delivers a larger amount of spatial information and effectively solves the problem that existing sensory substitution devices cannot meet the needs of visually impaired people because they provide insufficient spatial information; the gaze mode method provides more refined and concentrated spatial descriptive information and effectively solves the problem that existing sensory substitution devices are of little practical use because the information they provide is redundant. Combined, the two methods meet the different usage needs of visually impaired users in different scenes.

Description

Auxiliary space sensing system and method based on hearing
Technical Field
The invention belongs to the technical field of sensory substitution, and particularly relates to an auxiliary space sensing system and method based on hearing.
Background
Data published by the World Health Organization show that about 253 million people worldwide are visually impaired. Because they lack vision, visually impaired people face many difficulties when travelling in daily life. As society develops, the quality of life and mobility of visually impaired people are receiving more and more attention. Helping visually impaired people perceive space, so as to improve their independent travel ability and make daily life easier, is therefore a problem that urgently needs to be solved.
At present, the traditional walking aids for blind people are mainly the white cane and the guide dog. Both have shortcomings: the detection range of the white cane is limited, while guide dogs are costly to train and restricted in where they can be used. Moreover, these traditional aids can only help visually impaired people avoid obstacles on the road; they cannot convey the spatial structure and scene information of the surrounding environment.
With the development of computer science and sensor technology, sensory substitution devices have been studied as aids for spatial perception by visually impaired people. Because auditory signals are intuitive and offer many usable parameters, most sensory substitution devices attempt to use hearing in place of vision to convey scene information to the user. However, such studies tend to suffer from one of two problems: the spatial information provided is insufficient and cannot meet the needs of visually impaired people, or the information provided is so redundant that the device has little practical value.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides a hearing-based auxiliary spatial perception system and method that use hearing in place of vision and convert spatial scene information into either a non-speech spatial audio signal or a speech description signal. The system not only helps visually impaired people avoid obstacles effectively while walking and improves the user's spatial cognition, but also conveys environmental information effectively and improves the user's scene understanding.
A hearing-based auxiliary spatial perception system comprises a data acquisition module, a human-machine interaction module, a control module, a walking mode module and a gaze mode module, wherein the human-machine interaction module is connected with the control module, and the control module and the data acquisition module are connected with the walking mode module and the gaze mode module.
The data acquisition module is used for acquiring spatial scene information, including a depth image data stream and an RGB image data stream.
The human-machine interaction module is used for conveying user instructions and sending walking mode and gaze mode instructions to the control module.
According to the walking mode instruction from the human-machine interaction module, the control module processes the depth image data acquired by the data acquisition module, maps the depth image information to spatial audio and outputs the spatial audio through earphones; according to the gaze mode instruction from the human-machine interaction module, the control module constructs semantic information from the RGB image data acquired by the data acquisition module, synthesizes it into speech and outputs the corresponding speech through the earphones.
The walking mode module is used for detecting and outputting the direction and distance information of objects in the spatial scene, and comprises: a depth image preprocessing sub-module for processing the depth image data stream; a spatial audio generation sub-module for mapping the depth image information to audio parameters and spatializing the audio; and a spatial audio output sub-module for outputting the spatial audio.
The gaze mode module is used for identifying and outputting attribute and state information of objects in the spatial scene, and comprises: an RGB image semantic construction sub-module for converting the RGB image data stream into sentences; a speech synthesis sub-module for converting the sentences into speech; and a speech output sub-module for outputting the speech signal.
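To make the module interaction concrete, the following is a minimal sketch (not part of the patent text) of how a control module might dispatch the two mode instructions; the class and method names are hypothetical assumptions.

```python
# Illustrative sketch only: a minimal control-module dispatch loop. The
# DataAcquisition interface and method names are hypothetical.

class ControlModule:
    def __init__(self, data_source, walking_module, gaze_module):
        self.data_source = data_source   # data acquisition module
        self.walking = walking_module    # walking mode module
        self.gaze = gaze_module          # gaze mode module

    def handle_instruction(self, instruction):
        """Dispatch a user instruction from the human-machine interaction module."""
        if instruction == "walking":
            depth_frame = self.data_source.next_depth_frame()
            self.walking.process(depth_frame)   # -> spatial audio to earphones
        elif instruction == "gaze":
            rgb_frame = self.data_source.next_rgb_frame()
            self.gaze.process(rgb_frame)        # -> descriptive speech to earphones
```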
The walking mode auxiliary spatial perception method realized by the above hearing-based auxiliary spatial perception system comprises the following steps:
Step 101: receiving the depth image data stream transmitted in real time by the data acquisition module and filling the null values of each frame of the depth image, namely assigning to each null pixel the average of its non-null eight-neighborhood pixel values, traversing the image and repeating this operation until no null value remains in the image;
Step 102: applying Gaussian low-pass filtering to the depth image processed in step 101, so as to filter out noise and blur fine details in the image;
Step 103: downsampling the depth image obtained in step 102, reducing the image from its original size of 225×315 to 5×7;
Step 104: mapping the pixel values of the downsampled depth image from step 103 to audio parameters;
Step 105: spatializing the audio information generated in step 104 using head-related transfer function (HRTF) techniques, namely mapping the coordinates (x, y) of each pixel in the image to a sound source position (θ, φ);
Step 106: outputting the spatial audio in real time to the earphones worn by the user, the audio conveying spatial structure information through non-speech sounds.
Moreover, the downsampling rule in step 103 is: assign to each target pixel the minimum pixel value within its eight-neighborhood.
Moreover, the rule for converting image information into audio in step 104 is as follows: extract the minimum pixel value of each column in the image, set a threshold D, and compare the minimum value with D. When the minimum pixel value is less than or equal to D, the object at that position is close to the user; the minimum pixel value is mapped to the loudness and pitch of a beep: the smaller the pixel value, the closer the object and the greater the loudness and the higher the pitch of the mapped beep, and conversely the smaller the loudness and the lower the pitch, so as to prompt the user to avoid the obstacle. When the minimum pixel value is greater than D, the object at that position is far from the user and poses no collision threat for the moment; the pixel information is represented by a water-drop sound of fixed loudness and pitch, which can be regarded as a "safety tone" indicating that objects are far away.
Furthermore, after the spatialization processing in step 105, the user perceives each beep or water-drop sound as coming from a particular direction in space. The specific mapping rules are given by equations (1) and (2) (rendered as formula images in the original filing): equation (1) maps the column number y of a pixel to the horizontal angle θ of its virtual sound source, and equation (2) maps the row number x of the pixel to the elevation angle φ, where x is the row number of the pixel in the image and y is its column number. A three-dimensional coordinate system is constructed with the center of the head as the origin O, the x-axis passing through the ears, the y-axis passing through the nose, and the z-axis perpendicular to the xOy plane; θ is the horizontal angle between the yOz plane and the line connecting the sound source position with the origin, and φ is the elevation angle between the xOy plane and that line.
Also, for each frame of the depth image, step 106 generates 7 audio clips with different horizontal angles, and these clips are played in sequence from left to right. When the audio is a water-drop sound, the object at that position is far from the user; when the audio is a beep, the object at that position is less than D meters from the user, and the higher the pitch and the greater the loudness of the beep, the closer the object. The user can thus judge the distance and direction of obstacles from the pitch, loudness and sound source position of the audio, and avoid them while walking.
A gaze mode auxiliary spatial perception method realized by the above hearing-based auxiliary spatial perception system comprises the following steps:
Step 201: receiving the RGB image transmitted in real time by the data acquisition module and generating an English descriptive text of the image by calling the Computer Vision API service provided by Microsoft;
Step 202: converting the English text generated in step 201 into Chinese text through the Baidu Translate API service;
Step 203: converting the Chinese text generated in step 202 into speech using the pyttsx module in Python;
Step 204: outputting the speech in real time to the earphones worn by the user, the speech being a descriptive sentence about the current scene in the field of view.
Compared with the prior art, the invention has the following advantages:
1) The system provided by the invention includes two working modes, a walking mode and a gaze mode, corresponding to two hearing-based auxiliary spatial perception methods; combined, the two methods meet the different usage needs of visually impaired users in different scenes.
2) The walking mode auxiliary spatial perception method uses sonification to convert spatial scene information into spatial audio signals in real time, so that visually impaired people can quickly obtain spatial structure information; it effectively helps them avoid obstacles while walking and improves their spatial cognition. The method delivers a large amount of spatial information and effectively solves the problem that existing sensory substitution devices cannot meet users' needs because the spatial information they provide is insufficient.
3) The gaze mode auxiliary spatial perception method converts spatial scene information into spoken descriptions in real time, so that visually impaired people can quickly obtain descriptive information about the spatial scene; it effectively helps them acquire environmental information and improves their scene understanding. The method provides more refined and concentrated spatial descriptive information and effectively solves the problem that existing sensory substitution devices are impractical because the information they provide is redundant.
Drawings
Fig. 1 is a schematic structural diagram of an auditory-based auxiliary spatial perception system according to an embodiment of the present invention.
FIG. 2 is a flow chart of the walking mode auxiliary spatial perception method of the present invention.
FIG. 3 is a schematic diagram of the horizontal angle θ and the elevation angle φ of the sound source position used in the walking mode auxiliary spatial perception method of the present invention.
Fig. 4 is a flow chart of the gaze mode auxiliary spatial perception method of the present invention.
Detailed Description
The invention provides a hearing-based auxiliary spatial perception system and method that use hearing in place of vision, convert spatial scene information into a non-speech spatial audio signal or a speech description signal, and assist the user in perceiving space and understanding the scene.
The technical scheme of the invention is further described below with reference to the accompanying drawings and examples.
As shown in fig. 1, the present invention provides a hearing-based auxiliary spatial perception system comprising a data acquisition module, a human-machine interaction module, a control module, a walking mode module and a gaze mode module, wherein the human-machine interaction module is connected with the control module, and the control module and the data acquisition module are connected with the walking mode module and the gaze mode module.
The data acquisition module is used for acquiring spatial scene information, including a depth image data stream and an RGB image data stream.
The human-machine interaction module is used for conveying user instructions and sending walking mode and gaze mode instructions to the control module.
According to the walking mode instruction from the human-machine interaction module, the control module processes the depth image data acquired by the data acquisition module, maps the depth image information to audio parameters and outputs spatial audio through earphones; according to the gaze mode instruction from the human-machine interaction module, the control module constructs semantic information from the RGB image data acquired by the data acquisition module, synthesizes it into speech and outputs the corresponding speech through the earphones.
The walking mode module is used for detecting and outputting the direction and distance information of objects in the spatial scene, and comprises: a depth image preprocessing sub-module for processing the depth image data stream; a spatial audio generation sub-module for mapping the depth image information to audio parameters and spatializing the audio; and a spatial audio output sub-module for outputting the spatial audio.
The gaze mode module is used for identifying and outputting attribute and state information of objects in the spatial scene, and comprises: an RGB image semantic construction sub-module for converting the RGB image data stream into sentences; a speech synthesis sub-module for converting the sentences into speech; and a speech output sub-module for outputting the speech signal.
Through the human-machine interaction module, the user can freely select a mode and obtain different types of information about the surrounding spatial scene through different types of sound output. Through the walking mode module, the user can quickly perceive the spatial structure of the surrounding environment and grasp the direction and distance of obstacles, so that obstacles are avoided effectively and the user stays safe while walking. Through the gaze mode module, the user can quickly obtain descriptive information about the surrounding environment, which helps the user understand the spatial scene.
The system includes two working modes, a walking mode and a gaze mode, corresponding to two hearing-based auxiliary spatial perception methods: a walking mode auxiliary spatial perception method and a gaze mode auxiliary spatial perception method. The walking mode method uses sonification to convert spatial scene information into spatial audio signals in real time, so that visually impaired people can quickly obtain spatial structure information, avoid obstacles effectively while walking and improve their spatial cognition. The gaze mode method converts spatial scene information into spoken descriptions in real time, so that visually impaired people can quickly obtain descriptive information about the spatial scene, acquire environmental information effectively and improve their scene understanding.
As shown in fig. 2, the walking mode auxiliary spatial perception method includes the following steps.
Step 101: receiving the depth image data stream transmitted in real time by the data acquisition module and filling the null values of each frame of the depth image, namely assigning to each null pixel the average of its non-null eight-neighborhood pixel values, traversing the image and repeating this operation until no null value remains in the image.
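As an illustration of the null-filling rule of step 101, the following Python sketch assigns each null pixel the mean of its non-null eight-neighborhood values and repeats the pass until no null remains; it is one interpretation of the step, not code from the patent.

```python
import numpy as np

def fill_nulls(depth, null_value=0):
    """Step 101 sketch: replace null pixels by the mean of their non-null
    8-neighbours, sweeping the image repeatedly until no null value remains."""
    depth = depth.astype(np.float32)
    while np.any(depth == null_value):
        filled = depth.copy()
        rows, cols = np.nonzero(depth == null_value)
        for r, c in zip(rows, cols):
            window = depth[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            neighbours = window[window != null_value]
            if neighbours.size:              # skip if all neighbours are still null
                filled[r, c] = neighbours.mean()
        if np.array_equal(filled, depth):    # guard against an all-null image
            break
        depth = filled
    return depth
```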
Step 102: applying Gaussian low-pass filtering to the depth image processed in step 101, so as to filter out noise and blur fine details in the image.
Step 103: downsampling the depth image obtained in step 102, reducing the image from its original size of 225×315 to 5×7, where the downsampling rule is: assign to each target pixel the minimum pixel value within its eight-neighborhood.
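A minimal sketch of steps 102 and 103 follows; reading the minimum-value rule as a block-wise minimum over 45×45 blocks is an assumption made here for illustration, as is the choice of the Gaussian sigma.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def preprocess(depth, out_shape=(5, 7), sigma=2.0):
    """Steps 102-103 sketch: Gaussian low-pass filtering, then reduction of the
    225x315 image to 5x7 by taking the minimum depth in each 45x45 block
    (one plausible reading of the eight-neighborhood minimum rule)."""
    smoothed = gaussian_filter(depth.astype(np.float32), sigma=sigma)  # step 102
    h, w = smoothed.shape
    bh, bw = h // out_shape[0], w // out_shape[1]                      # 45 x 45 blocks
    blocks = smoothed[:bh * out_shape[0], :bw * out_shape[1]]
    blocks = blocks.reshape(out_shape[0], bh, out_shape[1], bw)
    return blocks.min(axis=(1, 3))                                     # step 103
```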
Step 104: mapping the pixel values of the downsampled depth image from step 103 to audio parameters.
The rule for converting image information into audio is as follows: extract the minimum pixel value of each column in the image, set a threshold D (3 meters in this embodiment), and compare the minimum value with D. When the minimum pixel value is less than or equal to D, the object at that position is close to the user; the minimum pixel value is mapped to the loudness and pitch of a beep: the smaller the pixel value, the closer the object and the greater the loudness and the higher the pitch of the mapped beep, and conversely the smaller the loudness and the lower the pitch, so as to prompt the user to avoid the obstacle. When the minimum pixel value is greater than D, the object at that position is far from the user and poses no collision threat for the moment; the pixel information is represented by a water-drop sound of fixed loudness and pitch, which can be regarded as a "safety tone" indicating that objects are far away.
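A hedged sketch of the step 104 mapping is given below; the text only states the qualitative relationship (closer object, louder and higher beep), so the specific loudness and pitch ranges used here are illustrative assumptions.

```python
def column_to_audio(depth_5x7, D=3.0):
    """Step 104 sketch: map the minimum depth of each of the 7 columns to
    audio parameters. Loudness/pitch ranges below are assumptions."""
    cues = []
    for y in range(depth_5x7.shape[1]):          # columns, left to right
        d = float(depth_5x7[:, y].min())         # nearest object in this column
        if d <= D:                               # obstacle: beep, closer = louder/higher
            closeness = 1.0 - d / D              # 1 at 0 m, 0 at D meters
            cues.append({"column": y,
                         "sound": "beep",
                         "loudness_db": -30 + 25 * closeness,   # assumed range
                         "pitch_hz": 400 + 800 * closeness})    # assumed range
        else:                                    # far away: fixed "safety tone"
            cues.append({"column": y,
                         "sound": "water_drop",
                         "loudness_db": -25,
                         "pitch_hz": 600})
    return cues
```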
Step 105: spatializing the audio information generated in step 104 using head-related transfer function (HRTF) techniques, namely mapping the coordinates (x, y) of each pixel in the image to a sound source position (θ, φ). After spatialization, the user perceives each beep or water-drop sound as coming from a particular direction in space.
The specific mapping rules are given by equations (1) and (2) (rendered as formula images in the original filing): equation (1) maps the column number y of a pixel to the horizontal angle θ of its virtual sound source, and equation (2) maps the row number x of the pixel to the elevation angle φ, where x is the row number of the pixel in the image and y is its column number. As shown in fig. 3, a three-dimensional coordinate system is constructed with the center of the head as the origin O, the x-axis passing through the ears, the y-axis passing through the nose, and the z-axis perpendicular to the xOy plane; θ is the horizontal angle between the yOz plane and the line connecting the sound source position with the origin, and φ is the elevation angle between the xOy plane and that line.
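Because the explicit forms of equations (1) and (2) are not reproduced here, the sketch below assumes a simple linear spread of azimuth over the 7 columns and of elevation over the 5 rows, and uses constant-power panning plus an interaural time difference as a crude stand-in for full HRTF rendering; the angle ranges and the panning method are assumptions, not the patented formulas.

```python
import numpy as np

def pixel_to_direction(x, y, rows=5, cols=7,
                       az_range=(-60.0, 60.0), el_range=(-30.0, 30.0)):
    """Map pixel (row x, column y) of the 5x7 image to a horizontal angle theta
    and an elevation angle phi. Linear spreads and ranges are assumptions."""
    theta = np.interp(y, [0, cols - 1], az_range)       # column -> horizontal angle
    phi = np.interp(x, [0, rows - 1], el_range[::-1])   # row -> elevation (top row = up)
    return theta, phi

def simple_pan(mono, theta, sr=44100):
    """Crude stand-in for HRTF rendering: constant-power panning plus an
    interaural time difference so the sound is heard from direction theta.
    Elevation is not rendered by this simple panner."""
    theta_rad = np.deg2rad(theta)
    left_gain = np.cos((theta_rad + np.pi / 2) / 2)
    right_gain = np.sin((theta_rad + np.pi / 2) / 2)
    itd_samples = int(abs(np.sin(theta_rad)) * 0.0007 * sr)   # ~0.7 ms max ITD
    left = np.pad(mono * left_gain, (itd_samples if theta > 0 else 0, 0))
    right = np.pad(mono * right_gain, (itd_samples if theta < 0 else 0, 0))
    n = max(len(left), len(right))
    left = np.pad(left, (0, n - len(left)))
    right = np.pad(right, (0, n - len(right)))
    return np.stack([left, right], axis=1)                    # stereo signal
```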
Step 106: outputting the spatial audio in real time to the earphones worn by the user, the audio conveying spatial structure information through non-speech sounds.
For each frame of the depth image, 7 audio clips with different horizontal angles are generated and played in sequence from left to right. When the audio is a water-drop sound, the object at that position is far from the user; when the audio is a beep, the object at that position is less than D meters from the user, and the higher the pitch and the greater the loudness of the beep, the closer the object. The user can thus judge the distance and direction of obstacles from the pitch, loudness and sound source position of the audio, and avoid them while walking.
As shown in fig. 4, the gaze mode auxiliary spatial perception method comprises the following steps.
Step 201: receiving the RGB image transmitted in real time by the data acquisition module and generating an English descriptive text of the image by calling the Computer Vision API service provided by Microsoft.
Step 202: converting the English text generated in step 201 into Chinese text through the Baidu Translate API service.
Step 203: converting the Chinese text generated in step 202 into speech using the pyttsx module in Python.
Step 204: outputting the speech in real time to the earphones worn by the user, the speech being a descriptive sentence about the current scene in the field of view.
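The gaze mode pipeline of steps 201-204 might be sketched as follows. The Azure Computer Vision "describe" endpoint, the Baidu Translate endpoint and signature scheme, and the use of pyttsx3 (the successor of the pyttsx module named in the text) reflect their public documentation as recalled here; the keys and endpoint values are placeholders and should be verified against the current API documentation.

```python
import hashlib
import random
import requests
import pyttsx3   # the patent names the pyttsx module; pyttsx3 is its successor

def describe_image(image_bytes, endpoint, key):
    """Step 201: English caption of the RGB frame via Microsoft Computer Vision."""
    r = requests.post(
        f"{endpoint}/vision/v3.2/describe",
        headers={"Ocp-Apim-Subscription-Key": key,
                 "Content-Type": "application/octet-stream"},
        data=image_bytes)
    r.raise_for_status()
    return r.json()["description"]["captions"][0]["text"]

def translate_to_chinese(text, appid, secret):
    """Step 202: English to Chinese via the Baidu Translate API."""
    salt = str(random.randint(1, 1 << 30))
    sign = hashlib.md5((appid + text + salt + secret).encode("utf-8")).hexdigest()
    r = requests.get("https://fanyi-api.baidu.com/api/trans/vip/translate",
                     params={"q": text, "from": "en", "to": "zh",
                             "appid": appid, "salt": salt, "sign": sign})
    r.raise_for_status()
    return r.json()["trans_result"][0]["dst"]

def speak(text):
    """Steps 203-204: text-to-speech output to the user's earphones."""
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()
```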
The specific embodiments described herein are offered by way of example only to illustrate the spirit of the invention. Those skilled in the art may make various modifications or additions to the described embodiments or substitutions thereof without departing from the spirit of the invention or exceeding the scope of the invention as defined in the accompanying claims.

Claims (7)

1. A hearing-based auxiliary spatial perception system, characterized by comprising a data acquisition module, a human-machine interaction module, a control module, a walking mode module and a gaze mode module, wherein the human-machine interaction module is connected with the control module, and the control module and the data acquisition module are connected with the walking mode module and the gaze mode module;
the data acquisition module is used for acquiring spatial scene information, including a depth image data stream and an RGB image data stream;
the human-machine interaction module is used for conveying user instructions and sending walking mode and gaze mode instructions to the control module;
according to the walking mode instruction from the human-machine interaction module, the control module processes the depth image data acquired by the data acquisition module, maps the depth image information to audio parameters and outputs spatial audio through earphones; according to the gaze mode instruction from the human-machine interaction module, the control module constructs semantic information from the RGB image data acquired by the data acquisition module, synthesizes it into speech and outputs the corresponding speech through the earphones;
the walking mode module is used for detecting and outputting the direction and distance information of objects in the spatial scene, and comprises: a depth image preprocessing sub-module for processing the depth image data stream; a spatial audio generation sub-module for mapping the depth image information to spatial audio and spatializing the audio; and a spatial audio output sub-module for outputting the spatial audio;
the gaze mode module is used for identifying and outputting attribute and state information of objects in the spatial scene, and comprises: an RGB image semantic construction sub-module for converting the RGB image data into sentences; a speech synthesis sub-module for converting the sentences into speech; and a speech output sub-module for outputting the speech signal.
2. A walking mode auxiliary spatial perception method implemented using the hearing-based auxiliary spatial perception system of claim 1, characterized by comprising the following steps:
Step 101: receiving the depth image data stream transmitted in real time by the data acquisition module and filling the null values of each frame of the depth image, namely assigning to each null pixel the average of its non-null eight-neighborhood pixel values, traversing the image and repeating this operation until no null value remains in the image;
Step 102: applying Gaussian low-pass filtering to the depth image processed in step 101, so as to filter out noise and blur fine details in the image;
Step 103: downsampling the depth image obtained in step 102, reducing the image from its original size of 225×315 to 5×7;
Step 104: mapping the pixel values of the downsampled depth image from step 103 to audio parameters;
Step 105: spatializing the audio information generated in step 104 using head-related transfer function (HRTF) techniques, namely mapping the coordinates (x, y) of each pixel in the image to a sound source position (θ, φ);
Step 106: outputting the spatial audio in real time to the earphones worn by the user, the audio conveying spatial structure information through non-speech sounds.
3. The walking mode auxiliary spatial perception method according to claim 2, characterized in that the downsampling rule in step 103 is: assign to each target pixel the minimum pixel value within its eight-neighborhood.
4. The walking mode auxiliary spatial perception method according to claim 2, characterized in that the rule for converting image information into audio in step 104 is: extract the minimum pixel value of each column in the image, set a threshold D, and compare the minimum value with D; when the minimum pixel value is less than or equal to D, the object at that position is close to the user, and the minimum pixel value is mapped to the loudness and pitch of a beep: the smaller the pixel value, the closer the object and the greater the loudness and the higher the pitch of the mapped beep, and conversely the smaller the loudness and the lower the pitch, so as to prompt the user to avoid the obstacle; when the minimum pixel value is greater than D, the object at that position is far from the user and poses no collision threat for the moment, and the pixel information is represented by a water-drop sound of fixed loudness and pitch, which can be regarded as a "safety tone" indicating that objects are far away.
5. The walking mode auxiliary spatial perception method according to claim 2, characterized in that after the spatialization processing in step 105 the user perceives each beep or water-drop sound as coming from a particular direction in space, and the specific mapping rules are given by equations (1) and (2) (rendered as formula images in the original filing): equation (1) maps the column number y of a pixel to the horizontal angle θ of its virtual sound source, and equation (2) maps the row number x of the pixel to the elevation angle φ, where x is the row number of the pixel in the image and y is its column number; a three-dimensional coordinate system is constructed with the center of the head as the origin O, the x-axis passing through the ears, the y-axis passing through the nose, and the z-axis perpendicular to the xOy plane; θ is the horizontal angle between the yOz plane and the line connecting the sound source position with the origin, and φ is the elevation angle between the xOy plane and that line.
6. The walking mode auxiliary spatial perception method according to claim 2, characterized in that for each frame of the depth image, 7 audio clips with different horizontal angles are generated in step 106 and played in sequence from left to right; when the audio is a water-drop sound, the object at that position is far from the user; when the audio is a beep, the object at that position is less than D meters from the user, and the higher the pitch and the greater the loudness of the beep, the closer the object; the user can judge the distance and direction of obstacles from the pitch, loudness and sound source position of the audio, and thus avoid obstacles while walking.
7. A gaze mode auxiliary spatial perception method implemented with the hearing-based auxiliary spatial perception system of claim 1, characterized by comprising the following steps:
Step 201: receiving the RGB image transmitted in real time by the data acquisition module and generating an English descriptive text of the image by calling the Computer Vision API service provided by Microsoft;
Step 202: converting the English text generated in step 201 into Chinese text through the Baidu Translate API service;
Step 203: converting the Chinese text generated in step 202 into speech using the pyttsx module in Python;
Step 204: outputting the speech in real time to the earphones worn by the user, the speech being a descriptive sentence about the current scene in the field of view.
CN202111373446.5A 2021-11-19 2021-11-19 Auxiliary space sensing system and method based on hearing Active CN114120960B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111373446.5A CN114120960B (en) 2021-11-19 2021-11-19 Auxiliary space sensing system and method based on hearing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111373446.5A CN114120960B (en) 2021-11-19 2021-11-19 Auxiliary space sensing system and method based on hearing

Publications (2)

Publication Number Publication Date
CN114120960A CN114120960A (en) 2022-03-01
CN114120960B (en) 2024-05-03

Family

ID=80396465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111373446.5A Active CN114120960B (en) 2021-11-19 2021-11-19 Auxiliary space sensing system and method based on hearing

Country Status (1)

Country Link
CN (1) CN114120960B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20060133140A (en) * 2005-06-20 2006-12-26 경북대학교 산학협력단 Reproduction of visual image using auditory stimulation and method controlling of the same
CN109085926A (en) * 2018-08-21 2018-12-25 华东师范大学 A kind of the augmented reality system and its application of multi-modality imaging and more perception blendings
CN113038322A (en) * 2021-03-04 2021-06-25 聆感智能科技(深圳)有限公司 Method and device for enhancing environmental perception by hearing
CN113196390A (en) * 2021-03-09 2021-07-30 曹庆恒 Perception system based on hearing and use method thereof

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11369543B2 (en) * 2016-09-17 2022-06-28 Noah E Gamerman Non-visual precision spatial awareness device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20060133140A (en) * 2005-06-20 2006-12-26 경북대학교 산학협력단 Reproduction of visual image using auditory stimulation and method controlling of the same
CN109085926A (en) * 2018-08-21 2018-12-25 华东师范大学 A kind of the augmented reality system and its application of multi-modality imaging and more perception blendings
CN113038322A (en) * 2021-03-04 2021-06-25 聆感智能科技(深圳)有限公司 Method and device for enhancing environmental perception by hearing
CN113196390A (en) * 2021-03-09 2021-07-30 曹庆恒 Perception system based on hearing and use method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
AudioMan: Design and implementation of an electronic travel aid system; 徐洁, 方志刚, 鲍福良, 张丽红; Journal of Image and Graphics (中国图象图形学报); 2007-07-15 (No. 07); full text *

Also Published As

Publication number Publication date
CN114120960A (en) 2022-03-01

Similar Documents

Publication Publication Date Title
CN108245385B (en) A kind of device helping visually impaired people's trip
CN107223277A (en) A kind of deaf-mute's householder method, device and electronic equipment
US20150227778A1 (en) Intelligent glasses for the visually impaired
CN111597828B (en) Translation display method, device, head-mounted display equipment and storage medium
CN104983511A (en) Voice-helping intelligent glasses system aiming at totally-blind visual handicapped
CN108245384A (en) Binocular vision apparatus for guiding blind based on enhancing study
Sáez et al. Aerial obstacle detection with 3-D mobile devices
US20130216093A1 (en) Walking assistance system and method
KR20110004064A (en) Method for guiding a blind person and system therefor
Liu et al. Electronic travel aids for the blind based on sensory substitution
CN114973412A (en) Lip language identification method and system
JP2016194612A (en) Visual recognition support device and visual recognition support program
KR101187600B1 (en) Speech Recognition Device and Speech Recognition Method using 3D Real-time Lip Feature Point based on Stereo Camera
Ali et al. Blind navigation system for visually impaired using windowing-based mean on Microsoft Kinect camera
Kaur et al. A scene perception system for visually impaired based on object detection and classification using multi-modal DCNN
CN111904806A (en) Blind guiding system
CN114120960B (en) Auxiliary space sensing system and method based on hearing
EP3058926A1 (en) Method of transforming visual data into acoustic signals and aid device for visually impaired or blind persons
CN116572260A (en) Emotion communication accompanying and nursing robot system based on artificial intelligence generated content
CN111539408A (en) Intelligent point reading scheme based on photographing and object recognizing
Nazim et al. Smart glasses: A visual assistant for the blind
Bhat et al. Vision sensory substitution to aid the blind in reading and object recognition
CN111273598A (en) Underwater information interaction system, diver safety guarantee method and storage medium
CN111862932A (en) Wearable blind assisting system and method for converting image into sound
Bourbakis et al. A 2D vibration array for sensing dynamic changes and 3D space for Blinds' navigation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant