CN114120960A - Auxiliary space perception system and method based on hearing - Google Patents
- Publication number
- CN114120960A (application number CN202111373446.5A)
- Authority
- CN
- China
- Prior art keywords
- audio
- module
- information
- image
- spatial
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- G06F3/013—Eye tracking input arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/162—Interface to dedicated audio devices, e.g. audio drivers, interface to CODECs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
-
- G—PHYSICS
- G08—SIGNALLING
- G08B—SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
- G08B21/00—Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
- G08B21/18—Status alarms
- G08B21/24—Reminder alarms, e.g. anti-loss alarms
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- Acoustics & Sound (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Signal Processing (AREA)
- Business, Economics & Management (AREA)
- Emergency Management (AREA)
- Artificial Intelligence (AREA)
- Processing Or Creating Images (AREA)
Abstract
The invention relates to a hearing-based auxiliary spatial perception system comprising a data acquisition module, a human-computer interaction module, a control module, a walking mode module and a gaze mode module. The system offers two working modes, a walking mode and a gaze mode, corresponding to two hearing-based auxiliary spatial perception methods. The walking mode method conveys a larger amount of spatial information and effectively solves the problem that existing sensory substitution devices cannot meet the needs of the visually impaired because they provide insufficient spatial information; the gaze mode method provides more refined and condensed descriptive information about the scene and effectively solves the poor practicality caused by existing devices delivering redundant information. Combined, the two methods meet the different needs of visually impaired users in different scenarios.
Description
Technical Field
The invention belongs to the technical field of sensory substitution, and in particular relates to a hearing-based auxiliary spatial perception system and method.
Background
Data published by the World Health Organization show that about 253 million people worldwide live with vision impairment. Because they lack vision, visually impaired people face many difficulties when travelling in daily life. With the development of society, the quality of life and mobility of visually impaired people are receiving more and more attention. Helping visually impaired people perceive space, improving their ability to walk independently and making their daily lives easier is therefore a problem in urgent need of a solution.
At present, the traditional aids for blind travel are mainly the white cane and the guide dog. Both have shortcomings: the detection range of the white cane is limited, and guide dogs are costly to train and limited in availability. Moreover, these traditional aids can only help visually impaired people avoid obstacles along a route; they cannot convey the spatial structure and scene information of the surrounding environment.
With the development of computer science and sensor technology, sensory substitution devices have been studied to assist the spatial perception of visually impaired people. Because hearing is intuitive and offers many usable parameters, most sensory substitution devices attempt to use auditory signals in place of visual signals to convey scene information to the user. However, these studies tend to suffer from one of two problems: they provide insufficient spatial information and fail to satisfy the needs of visually impaired users, or they provide excessively redundant information and are therefore of limited practical use.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a hearing-based auxiliary spatial perception system and method that use hearing in place of vision and convert spatial scene information into either a non-speech spatial audio signal or a spoken description. This not only helps visually impaired users avoid obstacles effectively while walking and improves their spatial awareness, but also conveys environmental information effectively and improves their scene understanding.
A hearing-based auxiliary spatial perception system comprises a data acquisition module, a human-computer interaction module, a control module, a walking mode module and a gaze mode module; the human-computer interaction module is connected with the control module, and the control module and the data acquisition module are connected with the walking mode module and the gaze mode module.
The data acquisition module is used for acquiring spatial scene information, including a depth image data stream and an RGB image data stream.
The human-computer interaction module is used for transmitting user instructions, sending walking mode instructions and gaze mode instructions to the control module.
On receiving a walking mode instruction from the human-computer interaction module, the control module processes the depth image data acquired by the data acquisition module, maps the depth image information to spatial audio and outputs the spatial audio through earphones; on receiving a gaze mode instruction, the control module constructs semantic information from the RGB image data acquired by the data acquisition module, synthesizes the corresponding speech and outputs it through earphones.
The walking mode module is used for detecting and outputting the direction and distance of objects in the spatial scene, and comprises: a depth image preprocessing submodule for processing the depth image data stream; a spatial audio generation submodule for mapping depth image information to audio parameters and spatializing the audio; and a spatial audio output submodule for outputting the spatial audio.
The gaze mode module is used for identifying and outputting the attributes and states of objects in the spatial scene, and comprises: an RGB image semantic construction submodule for converting the RGB image data stream into sentences; a speech synthesis submodule for converting the sentences into speech; and a speech output submodule for outputting the speech signal.
A walking mode auxiliary space perception method realized by the hearing-based auxiliary space perception system comprises the following steps:
step 101, receiving the depth image data stream transmitted by the data acquisition module in real time and filling the null values in each frame of depth image, namely assigning the average of the non-null values among the eight neighbors of each null pixel to that pixel, traversing the image and repeating the operation until no null values remain in the image;
step 102, performing Gaussian low-pass filtering on the depth image processed in step 101 to suppress noise and blur fine details in the image;
step 103, down-sampling the depth image obtained in step 102, which has an original size of 225 × 315, to 5 × 7;
104, mapping pixel values of the depth image subjected to downsampling processing in the step 103 to audio parameters;
step 105, spatializing the audio generated in step 104 using head-related transfer function techniques, namely mapping the coordinates (x, y) of each pixel in the image to a sound source position (θ, φ);
and step 106, outputting the spatial audio to earphones worn by the user in real time, the audio conveying the spatial structure information as non-speech sound.
In step 103, the down-sampling rule is: assign the minimum pixel value within the eight-neighborhood of the pixel to be computed to that pixel.
Moreover, the rule for converting image information into audio in step 104 is as follows: extract the minimum pixel value of each column in the image, set a threshold D and compare the minimum with D. When the minimum pixel value is less than or equal to D, the object at that position is close to the user; the minimum pixel value is mapped to the loudness and pitch of a beep, such that the smaller the pixel value, the closer the object and the louder and higher-pitched the beep, and conversely the quieter and lower-pitched the beep, prompting the user to avoid the obstacle. When the minimum pixel value is greater than D, the object at that position is far from the user and poses no immediate collision threat; the pixel information is represented by a water-drop sound of fixed loudness and pitch, which can be regarded as a "safety sound" indicating that objects are far away.
Furthermore, after the processing of step 105 the beep or water-drop sound is spatialized and the user perceives the direction it comes from; the mapping rule is as follows:

θ = -120° + 30° × y        (1)

where x is the row index of the pixel in the image and y is its column index; a three-dimensional coordinate system is constructed with the center of the skull as the origin O, the x-axis passing through the ears, the y-axis through the nose and the z-axis perpendicular to the xOy plane; θ is the horizontal angle between the line connecting the sound source position to the origin O and the yOz plane, and φ is the elevation angle between that line and the xOy plane.
Moreover, in step 106, for each frame of depth image, 7 audio segments with different horizontal angles are generated and played in order from left to right. When a segment is the water-drop sound, the object at that position is far from the user; when it is a beep, the object at that position is less than D meters from the user, and the higher the pitch and the louder the beep, the closer the object. From the timbre, pitch, loudness and sound source position of the audio, the user can judge the distance and direction of obstacles and thereby avoid them while walking.
A gaze mode auxiliary space perception method realized by the hearing-based auxiliary space perception system comprises the following steps:
step 201, receiving an RGB image transmitted by a data acquisition module in real time, and generating an English descriptive text of the image by calling a Computer Vision API service provided by Microsoft;
step 202, converting the English text generated in step 201 into Chinese text through the Baidu Translate API service;
step 203, converting the Chinese text generated in step 202 into speech using the pyttsx module in Python;
and step 204, outputting voice to an earphone worn by the user in real time, wherein the voice is a descriptive sentence of the current view scene information.
Compared with the prior art, the invention has the following advantages:
1) The system provided by the invention offers two working modes, a walking mode and a gaze mode, corresponding to two hearing-based auxiliary spatial perception methods; combined, the two methods meet the different needs of a visually impaired user in different scenarios.
2) The walking mode assisted spatial perception method uses sonification to convert spatial scene information into a spatial audio signal in real time, so that visually impaired users can quickly grasp the spatial structure, effectively helping them avoid obstacles while walking and improving their spatial awareness. The method conveys a larger amount of spatial information and effectively solves the problem that existing sensory substitution devices cannot meet users' needs because they provide insufficient spatial information.
3) The gaze mode assisted spatial perception method converts spatial scene information into speech and reads it aloud in real time, so that visually impaired users can quickly obtain descriptive information about the scene, effectively helping them acquire environmental information and improving their scene understanding. The method provides more refined and condensed descriptive information and effectively solves the poor practicality caused by existing sensory substitution devices delivering redundant information.
Drawings
Fig. 1 is a schematic structural diagram of an auxiliary space perception system based on hearing according to an embodiment of the invention.
FIG. 2 is a flowchart illustrating a walking mode assisted spatial awareness method according to the present invention.
FIG. 3 is a schematic diagram of the horizontal angle θ and the elevation angle φ of the sound source position used in the walking mode assisted spatial perception method of the present invention.
FIG. 4 is a flow chart of a gaze-mode assisted spatial awareness method of the present invention.
Detailed Description
The invention provides a hearing-based auxiliary spatial perception system and method that use hearing in place of vision, converting spatial scene information into a non-speech spatial audio signal or a spoken description so as to help the user perceive space and understand the scene.
The technical solution of the present invention is further explained with reference to the drawings and the embodiments.
As shown in fig. 1, the present invention provides a hearing-based auxiliary spatial perception system comprising a data acquisition module, a human-computer interaction module, a control module, a walking mode module and a gaze mode module; the human-computer interaction module is connected with the control module, and the control module and the data acquisition module are connected with the walking mode module and the gaze mode module.
The data acquisition module is used for acquiring spatial scene information, including a depth image data stream and an RGB image data stream.
The human-computer interaction module is used for transmitting user instructions, sending walking mode instructions and gaze mode instructions to the control module.
On receiving a walking mode instruction from the human-computer interaction module, the control module processes the depth image data acquired by the data acquisition module, maps the depth image information to audio parameters and outputs spatial audio through earphones; on receiving a gaze mode instruction, the control module constructs semantic information from the RGB image data acquired by the data acquisition module, synthesizes the corresponding speech and outputs it through earphones.
The walking mode module is used for detecting and outputting the direction and distance of objects in the spatial scene, and comprises: a depth image preprocessing submodule for processing the depth image data stream; a spatial audio generation submodule for mapping depth image information to audio parameters and spatializing the audio; and a spatial audio output submodule for outputting the spatial audio.
The gaze mode module is used for identifying and outputting the attributes and states of objects in the spatial scene, and comprises: an RGB image semantic construction submodule for converting the RGB image data stream into sentences; a speech synthesis submodule for converting the sentences into speech; and a speech output submodule for outputting the speech signal.
By providing the human-computer interaction module, the user can freely select a mode and obtain different types of information about the surrounding scene through different types of sound output. By providing the walking mode module, the user can quickly perceive the spatial structure of the surrounding environment and grasp the direction and distance of obstacles, so as to avoid them effectively and stay safe while walking. By providing the gaze mode module, the user can quickly obtain descriptive information about the surrounding environment, which helps the user understand the spatial scene.
The system offers two working modes, a walking mode and a gaze mode, corresponding to two hearing-based auxiliary spatial perception methods: a walking mode assisted spatial perception method and a gaze mode assisted spatial perception method. The walking mode method uses sonification to convert spatial scene information into a spatial audio signal in real time, so that visually impaired users can quickly grasp the spatial structure, avoid obstacles effectively while walking and improve their spatial awareness. The gaze mode method converts spatial scene information into speech and reads it aloud in real time, so that visually impaired users can quickly obtain descriptive information about the scene, acquire environmental information and improve their scene understanding.
As shown in fig. 2, the walking mode assisted spatial perception method includes the following steps:
Step 101, receiving the depth image data stream transmitted by the data acquisition module in real time and filling the null values in each frame of depth image, namely assigning the average of the non-null values among the eight neighbors of each null pixel to that pixel, traversing the image and repeating the operation until no null values remain in the image.
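As a concrete illustration of step 101, the following Python sketch fills each missing depth value with the mean of its non-null eight-neighbors and repeats the sweep until the frame is complete. It assumes missing depth is encoded as 0, which is an assumption about the sensor rather than something stated in this description.

```python
import numpy as np

def fill_nulls(depth: np.ndarray) -> np.ndarray:
    """Step 101 sketch: replace each null pixel with the mean of its non-null
    eight-neighbors, sweeping the image until no null values remain.
    Assumes missing depth is encoded as 0 (an assumption, not from the patent)."""
    d = depth.astype(float)
    d[d == 0] = np.nan                               # mark missing values
    while np.isnan(d).any():
        filled = d.copy()
        for r, c in zip(*np.where(np.isnan(d))):
            win = d[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            vals = win[~np.isnan(win)]
            if vals.size:                            # fill only if a neighbor is known
                filled[r, c] = vals.mean()
        if np.isnan(filled).sum() == np.isnan(d).sum():
            break                                    # no progress (e.g. an all-null frame)
        d = filled
    return d
```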
Step 102, performing Gaussian low-pass filtering on the depth image processed in step 101 to suppress noise and blur fine details in the image.
Step 103, down-sampling the depth image obtained in step 102, which has an original size of 225 × 315, to 5 × 7, wherein the down-sampling rule is: assign the minimum pixel value within the eight-neighborhood of the pixel to be computed to that pixel.
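Steps 102 and 103 could be implemented as in the sketch below. The Gaussian kernel width sigma is an assumed value, and the down-sampling is shown as block-wise minimum pooling over 45 × 45 blocks, which keeps the nearest depth in each region; the literal eight-neighborhood rule of this embodiment could be substituted without changing the rest of the pipeline.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_and_downsample(depth: np.ndarray, sigma: float = 1.5) -> np.ndarray:
    """Steps 102-103 sketch: Gaussian low-pass filtering followed by
    down-sampling of a 225x315 depth image to 5x7 by keeping the minimum
    (closest) depth in each 45x45 block. sigma is an assumed value."""
    smoothed = gaussian_filter(depth, sigma=sigma)   # suppress noise, blur fine detail
    assert smoothed.shape == (225, 315), "expects the original 225x315 frame"
    return smoothed.reshape(5, 45, 7, 45).min(axis=(1, 3))
```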
And step 104, mapping the pixel values of the depth image subjected to the downsampling processing in the step 103 to the audio parameters.
The rule for converting image information into audio is as follows: the minimum pixel value of each column in the image is extracted, a threshold D is set (D = 3 meters in this embodiment), and the minimum is compared with D. When the minimum pixel value is less than or equal to D, the object at that position is close to the user; the minimum pixel value is mapped to the loudness and pitch of a beep, such that the smaller the pixel value, the closer the object and the louder and higher-pitched the beep, and conversely the quieter and lower-pitched the beep, prompting the user to avoid the obstacle. When the minimum pixel value is greater than D, the object at that position is far from the user and poses no immediate collision threat; the pixel information is represented by a water-drop sound of fixed loudness and pitch, which can be regarded as a "safety sound" indicating that objects are far away.
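Step 104 might then look like the sketch below; the threshold D = 3 meters comes from this embodiment, while the loudness and pitch ranges are illustrative assumptions.

```python
import numpy as np

D = 3.0                                    # distance threshold in meters (this embodiment)

def columns_to_audio_params(small_depth: np.ndarray):
    """Step 104 sketch: one sound per column. Close obstacles (<= D) become a
    beep whose loudness and pitch rise as the depth shrinks; anything farther
    away becomes the fixed 'safety' water-drop sound. Ranges are assumptions."""
    params = []
    for y in range(small_depth.shape[1]):              # 7 columns, left to right
        d_min = float(small_depth[:, y].min())
        if d_min <= D:
            closeness = 1.0 - d_min / D                # 0 at D meters, 1 at contact
            params.append(("beep", 0.3 + 0.7 * closeness, 400.0 + 800.0 * closeness))
        else:
            params.append(("drop", 0.5, 1000.0))       # fixed loudness and pitch
    return params                                      # list of (kind, loudness, pitch_hz)
```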
Step 105, spatializing the audio generated in step 104 using head-related transfer function techniques, namely mapping the coordinates (x, y) of each pixel in the image to a sound source position (θ, φ). After spatialization, the beep or water-drop sound is perceived as located in space, and the user can tell the direction it comes from.
The specific mapping rule is shown in the following formula:

θ = -120° + 30° × y        (1)

where x is the row index of the pixel in the image and y is its column index; as shown in FIG. 3, a three-dimensional coordinate system is constructed with the center of the skull as the origin O, the x-axis passing through the ears, the y-axis through the nose and the z-axis perpendicular to the xOy plane. θ is the horizontal angle between the line connecting the sound source position to the origin O and the yOz plane, and φ is the elevation angle between that line and the xOy plane.
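A minimal sketch of the direction mapping follows. The horizontal angle implements formula (1); since the elevation-angle formula is not reproduced above, φ is taken from a caller-supplied mapping of the row index and defaults to 0°, which is a placeholder rather than the patented rule.

```python
def source_direction(x: int, y: int, elevation_of_row=None):
    """Step 105 sketch: map a pixel's 1-indexed (row x, column y) position to a
    sound source direction (theta, phi) in degrees."""
    theta = -120.0 + 30.0 * y                  # formula (1): columns 1..7 -> -90..+90 deg
    phi = elevation_of_row(x) if elevation_of_row else 0.0   # elevation formula not given
    return theta, phi
```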
And step 106, outputting the spatial audio to earphones worn by the user in real time, the audio conveying the spatial structure information as non-speech sound.
For each frame of depth image, 7 audio segments with different horizontal angles are generated and played in order from left to right. When a segment is the water-drop sound, the object at that position is far from the user; when it is a beep, the object at that position is less than D meters from the user, and the higher the pitch and the louder the beep, the closer the object. From the timbre, pitch, loudness and sound source position of the audio, the user can judge the distance and direction of obstacles and thereby avoid them while walking.
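Putting the pieces together, one frame could be rendered as below. Simple stereo amplitude panning stands in for the head-related transfer function rendering of step 105, and the segment length and sample rate are assumptions; the resulting buffer can then be written to a WAV file or streamed to the earphones with any audio playback library.

```python
import numpy as np

FS = 44100                                     # sample rate in Hz (assumption)
SEG = 0.15                                     # seconds per column segment (assumption)

def render_frame(params, directions) -> np.ndarray:
    """Step 106 sketch: play the 7 column sounds one after another, left to
    right, returning a stereo buffer. Amplitude panning approximates the
    HRTF spatialization described in step 105."""
    t = np.linspace(0.0, SEG, int(FS * SEG), endpoint=False)
    segments = []
    for (kind, loudness, pitch_hz), (theta, _phi) in zip(params, directions):
        tone = loudness * np.sin(2 * np.pi * pitch_hz * t)
        pan = (theta + 90.0) / 180.0           # -90..+90 deg -> 0 (left) .. 1 (right)
        segments.append(np.column_stack([(1.0 - pan) * tone, pan * tone]))
    return np.concatenate(segments)            # shape: (7 * FS * SEG, 2)
```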
As shown in fig. 4, the gaze mode assisted spatial perception method comprises the following steps:
Step 201, receiving the RGB image transmitted by the data acquisition module in real time and generating an English descriptive text of the image by calling the Computer Vision API service provided by Microsoft.
Step 202, converting the English text generated in step 201 into Chinese text through the Baidu Translate API service.
Step 203, converting the Chinese text generated in step 202 into speech using the pyttsx module in Python.
And step 204, outputting voice to an earphone worn by the user in real time, wherein the voice is a descriptive sentence of the current view scene information.
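The gaze mode pipeline of steps 201 to 204 might be sketched as follows. The Computer Vision "describe" endpoint, the Baidu Translate request fields and the response paths shown are assumptions based on the public documentation of those services, and pyttsx3 is used as the currently maintained successor of the pyttsx package named in step 203.

```python
import requests
import pyttsx3

def describe_and_speak(rgb_jpeg: bytes, cv_endpoint: str, cv_key: str,
                       baidu_params: dict) -> None:
    """Steps 201-204 sketch: caption the frame, translate the caption into
    Chinese, then read it aloud. Endpoint paths and field names are assumed;
    baidu_params must carry the account fields (appid, salt, sign)."""
    # Step 201: English caption from the Microsoft Computer Vision service
    resp = requests.post(f"{cv_endpoint}/vision/v3.2/describe",
                         headers={"Ocp-Apim-Subscription-Key": cv_key,
                                  "Content-Type": "application/octet-stream"},
                         data=rgb_jpeg)
    english = resp.json()["description"]["captions"][0]["text"]

    # Step 202: English -> Chinese via the Baidu Translate API
    trans = requests.get("https://fanyi-api.baidu.com/api/trans/vip/translate",
                         params={**baidu_params, "q": english, "from": "en", "to": "zh"})
    chinese = trans.json()["trans_result"][0]["dst"]

    # Steps 203-204: synthesize the Chinese sentence and play it to the earphones
    engine = pyttsx3.init()
    engine.say(chinese)
    engine.runAndWait()
```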
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.
Claims (7)
1. A hearing-based auxiliary spatial perception system, characterized by comprising a data acquisition module, a human-computer interaction module, a control module, a walking mode module and a gaze mode module, wherein the human-computer interaction module is connected with the control module;
the data acquisition module is used for acquiring spatial scene information, including a depth image data stream and an RGB image data stream;
the human-computer interaction module is used for transmitting user instructions, sending walking mode instructions and gaze mode instructions to the control module;
the control module processes the depth image data acquired by the data acquisition module according to a walking mode instruction from the human-computer interaction module, maps the depth image information to audio parameters and outputs spatial audio through earphones; the control module constructs semantic information from the RGB image data acquired by the data acquisition module according to a gaze mode instruction from the human-computer interaction module, and synthesizes and outputs the corresponding speech through earphones;
the walking mode module is used for detecting and outputting the direction and distance of objects in the spatial scene, and comprises: a depth image preprocessing submodule for processing the depth image data stream; a spatial audio generation submodule for mapping depth image information to spatial audio and spatializing the audio; and a spatial audio output submodule for outputting the spatial audio;
the gaze mode module is used for identifying and outputting the attributes and states of objects in the spatial scene, and comprises: an RGB image semantic construction submodule for converting the RGB image data into sentences; a speech synthesis submodule for converting the sentences into speech; and a speech output submodule for outputting the speech signal.
2. A walking mode aided spatial perception method implemented using the hearing based aided spatial perception system of claim 1, comprising the steps of:
step 101, receiving the depth image data stream transmitted by the data acquisition module in real time and filling the null values in each frame of depth image, namely assigning the average of the non-null values among the eight neighbors of each null pixel to that pixel, traversing the image and repeating the operation until no null values remain in the image;
step 102, performing Gaussian low-pass filtering on the depth image processed in step 101 to suppress noise and blur fine details in the image;
step 103, down-sampling the depth image obtained in step 102, which has an original size of 225 × 315, to 5 × 7;
104, mapping pixel values of the depth image subjected to downsampling processing in the step 103 to audio parameters;
step 105, spatializing the audio generated in step 104 using head-related transfer function techniques, namely mapping the coordinates (x, y) of each pixel in the image to a sound source position (θ, φ);
and step 106, outputting the spatial audio to an earphone worn by the user in real time, wherein the audio transmits the non-voice sound of the spatial structure information.
3. A walking mode aided spatial perception method as claimed in claim 2, characterized by: the down-sampling rule in step 103 is to assign the minimum pixel value within the eight-neighborhood of the pixel to be computed to that pixel.
4. A walking mode aided spatial perception method as claimed in claim 2, characterized by: the rule for converting image information into audio in step 104 is as follows: extract the minimum pixel value of each column in the image, set a threshold D and compare the minimum with D; when the minimum pixel value is less than or equal to D, the object at that position is close to the user, and the minimum pixel value is mapped to the loudness and pitch of a beep, such that the smaller the pixel value, the closer the object and the louder and higher-pitched the beep, and conversely the quieter and lower-pitched the beep, prompting the user to avoid the obstacle; when the minimum pixel value is greater than D, the object at that position is far from the user and poses no immediate collision threat, and the pixel information is represented by a water-drop sound of fixed loudness and pitch, which can be regarded as a "safety sound" indicating that objects are far away.
5. A walking mode aided spatial perception method as claimed in claim 2, characterized by: after the spatialization of step 105 the beep or water-drop sound is perceived as located in space and the user can tell the direction it comes from; the mapping rule is as follows:

θ = -120° + 30° × y        (1)

where x is the row index of the pixel in the image and y is its column index; a three-dimensional coordinate system is constructed with the center of the skull as the origin O, the x-axis passing through the ears, the y-axis through the nose and the z-axis perpendicular to the xOy plane; θ is the horizontal angle between the line connecting the sound source position to the origin and the yOz plane, and φ is the elevation angle between that line and the xOy plane.
6. A walking mode aided spatial perception method as claimed in claim 2, characterized by: in step 106, for each frame of depth image, 7 audio segments with different horizontal angles are generated and played in order from left to right; when a segment is the water-drop sound, the object at that position is far from the user; when it is a beep, the object at that position is less than D meters from the user, and the higher the pitch and the louder the beep, the closer the object; the user can judge the distance and direction of obstacles from the timbre, pitch, loudness and sound source position of the audio, and thereby avoid them while walking.
7. A gaze-mode aided spatial perception method implemented using the hearing-based aided spatial perception system of claim 1, comprising the steps of:
step 201, receiving an RGB image transmitted by a data acquisition module in real time, and generating an English descriptive text of the image by calling a Computer Vision API service provided by Microsoft;
step 202, converting the English text generated in the step 201 into a Chinese text through Baidu translation API service;
step 203, based on a pyttsx module in python software, converting the Chinese text generated in the step 202 into voice;
and step 204, outputting voice to an earphone worn by the user in real time, wherein the voice is a descriptive sentence of the current view scene information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111373446.5A CN114120960B (en) | 2021-11-19 | 2021-11-19 | Auxiliary space sensing system and method based on hearing |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111373446.5A CN114120960B (en) | 2021-11-19 | 2021-11-19 | Auxiliary space sensing system and method based on hearing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114120960A true CN114120960A (en) | 2022-03-01 |
CN114120960B CN114120960B (en) | 2024-05-03 |
Family
ID=80396465
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111373446.5A Active CN114120960B (en) | 2021-11-19 | 2021-11-19 | Auxiliary space sensing system and method based on hearing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114120960B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20060133140A (en) * | 2005-06-20 | 2006-12-26 | 경북대학교 산학협력단 | Reproduction of visual image using auditory stimulation and method controlling of the same |
US20180078444A1 (en) * | 2016-09-17 | 2018-03-22 | Noah Eitan Gamerman | Non-visual precision spatial awareness device. |
CN109085926A (en) * | 2018-08-21 | 2018-12-25 | 华东师范大学 | A kind of the augmented reality system and its application of multi-modality imaging and more perception blendings |
CN113038322A (en) * | 2021-03-04 | 2021-06-25 | 聆感智能科技(深圳)有限公司 | Method and device for enhancing environmental perception by hearing |
CN113196390A (en) * | 2021-03-09 | 2021-07-30 | 曹庆恒 | Perception system based on hearing and use method thereof |
Non-Patent Citations (1)
Title |
---|
XU Jie; FANG Zhigang; BAO Fuliang; ZHANG Lihong: "AudioMan: Design and Implementation of an Electronic Travel Aid System", Journal of Image and Graphics, no. 07, 15 July 2007 (2007-07-15) *
Also Published As
Publication number | Publication date |
---|---|
CN114120960B (en) | 2024-05-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019196133A1 (en) | Head-mounted visual aid device | |
CN206214373U (en) | Object detection from visual information to blind person, analysis and prompt system for providing | |
CN111597828B (en) | Translation display method, device, head-mounted display equipment and storage medium | |
CN106327584B (en) | Image processing method and device for virtual reality equipment | |
JP6771548B2 (en) | A portable system that allows the blind or visually impaired to interpret the surrounding environment by voice or touch. | |
CN108245385A (en) | A kind of device for helping visually impaired people's trip | |
KR102441171B1 (en) | Apparatus and Method for Monitoring User based on Multi-View Face Image | |
CN108245384A (en) | Binocular vision apparatus for guiding blind based on enhancing study | |
Liu et al. | Electronic travel aids for the blind based on sensory substitution | |
CN1969781A (en) | Guide for blind person | |
Blessenohl et al. | Improving indoor mobility of the visually impaired with depth-based spatial sound | |
CN105976675A (en) | Intelligent information exchange device and method for deaf-mute and average person | |
CN114973412A (en) | Lip language identification method and system | |
JP2016194612A (en) | Visual recognition support device and visual recognition support program | |
CN116572260A (en) | Emotion communication accompanying and nursing robot system based on artificial intelligence generated content | |
CN110717344A (en) | Auxiliary communication system based on intelligent wearable equipment | |
KR20120091625A (en) | Speech recognition device and speech recognition method using 3d real-time lip feature point based on stereo camera | |
Kaur et al. | A scene perception system for visually impaired based on object detection and classification using multi-modal DCNN | |
EP3058926A1 (en) | Method of transforming visual data into acoustic signals and aid device for visually impaired or blind persons | |
WO2022048455A1 (en) | Barrier free information access system and method employing augmented reality technology | |
Nazim et al. | Smart glasses: A visual assistant for the blind | |
CN114120960A (en) | Auxiliary space perception system and method based on hearing | |
US20210097888A1 (en) | Transmodal translation of feature vectors to audio for assistive devices | |
Scalvini et al. | Visual-auditory substitution device for indoor navigation based on fast visual marker detection | |
CN111273598A (en) | Underwater information interaction system, diver safety guarantee method and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |