CN114120960A - Auxiliary space perception system and method based on hearing - Google Patents

Auxiliary space perception system and method based on hearing

Info

Publication number
CN114120960A
CN114120960A (application CN202111373446.5A)
Authority
CN
China
Prior art keywords
audio
module
information
image
spatial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111373446.5A
Other languages
Chinese (zh)
Other versions
CN114120960B (en)
Inventor
费腾
李阳春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202111373446.5A priority Critical patent/CN114120960B/en
Publication of CN114120960A publication Critical patent/CN114120960A/en
Application granted granted Critical
Publication of CN114120960B publication Critical patent/CN114120960B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00Speech synthesis; Text to speech systems
    • G10L13/02Methods for producing synthetic speech; Speech synthesisers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013Eye tracking input arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/162Interface to dedicated audio devices, e.g. audio drivers, interface to CODECs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G08SIGNALLING
    • G08BSIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
    • G08B21/00Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
    • G08B21/18Status alarms
    • G08B21/24Reminder alarms, e.g. anti-loss alarms
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Emergency Management (AREA)
  • Artificial Intelligence (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention relates to an auditory-based auxiliary spatial perception system comprising a data acquisition module, a human-computer interaction module, a control module, a walking mode module and a gaze mode module. The system offers two working modes, a walking mode and a gaze mode, corresponding to two hearing-based auxiliary spatial perception methods. The walking-mode method provides a larger amount of spatial information, effectively solving the problem that existing sensory substitution devices cannot meet the needs of the visually impaired because they convey too little spatial information; the gaze-mode method provides more refined and focused spatial descriptive information, effectively solving the problem that existing sensory substitution devices are of limited practical use because they provide redundant information. Combined, the two methods meet the different usage needs of visually impaired users in different scenarios.

Description

Auxiliary space perception system and method based on hearing
Technical Field
The invention belongs to the technical field of sensory substitution, and particularly relates to an auditory-based auxiliary spatial perception system and method.
Background
Data published by the World Health Organization show that about 253 million people worldwide live with vision impairment. Lacking sight, visually impaired people face many difficulties when travelling in daily life. With the development of society, the quality of life and mobility of visually impaired people are receiving more and more attention. Helping visually impaired people perceive space, improving their ability to travel and making their daily lives easier is therefore a problem that urgently needs to be solved.
At present, the traditional aids for blind travel are mainly the white cane and the guide dog. Both have shortcomings: the detection range of a white cane is limited, while guide dogs are expensive to train and limited in availability. Moreover, these traditional aids only help visually impaired people avoid obstacles along a route; they cannot give them an understanding of the spatial structure and scene information of the surrounding environment.
With the development of computer science and sensor technology, sensory substitution devices for assisting spatial perception in visually impaired people are being studied. Because hearing is intuitive and offers many usable parameters, most sensory substitution devices attempt to use auditory signals in place of visual signals to convey scene information to the user. However, these studies tend to suffer from one of two problems: they provide too little spatial information to satisfy the needs of visually impaired people, or they provide so much redundant information that they are of limited practical use.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an auditory-based auxiliary spatial perception system and method that use hearing in place of vision to convert spatial scene information into non-speech spatial audio signals or speech description signals. This not only helps visually impaired users effectively avoid obstacles while walking and improves their spatial awareness, but also effectively conveys environmental information and improves their scene understanding.
An auditory-based auxiliary spatial perception system comprises a data acquisition module, a human-computer interaction module, a control module, a walking mode module and a gaze mode module; the human-computer interaction module is connected with the control module, and the control module and the data acquisition module are connected with the walking mode module and the gaze mode module.
The data acquisition module is used for acquiring spatial scene information, including a depth image data stream and an RGB image data stream.
The human-computer interaction module is used for transmitting user instructions, sending walking mode instructions and gaze mode instructions to the control module.
According to a walking mode instruction from the human-computer interaction module, the control module processes the depth image data acquired by the data acquisition module, maps the depth image information to spatial audio and outputs it through an earphone; according to a gaze mode instruction from the human-computer interaction module, the control module constructs semantic information from the RGB image data acquired by the data acquisition module, synthesizes the corresponding speech and outputs it through the earphone.
The walking mode module is used for detecting and outputting the direction and distance of objects in a spatial scene, and comprises: a depth image preprocessing submodule for processing the depth image data stream; a spatial audio generation submodule for mapping depth image information to audio parameters and spatializing the audio; and a spatial audio output submodule for outputting the spatial audio.
The gaze mode module is used for identifying and outputting the attribute and state information of objects in a spatial scene, and comprises: an RGB image semantic construction submodule for converting the RGB image data stream into sentences; a speech synthesis submodule for converting the sentences into speech; and a speech output submodule for outputting the speech signal.
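As an illustration only (not the patented implementation), the following minimal sketch shows how such a control module could dispatch frames from the data acquisition module to the walking mode module or the gaze mode module according to the instruction received from the human-computer interaction module; all class and method names here are hypothetical.

```python
# Illustrative sketch of mode dispatch in the control module. The class and
# method names are hypothetical placeholders, not part of the invention.
from enum import Enum, auto

class Mode(Enum):
    WALKING = auto()
    GAZE = auto()

class ControlModule:
    def __init__(self, data_acquisition, walking_module, gaze_module):
        self.data_acquisition = data_acquisition   # provides depth + RGB frames
        self.walking_module = walking_module       # depth image -> spatial audio
        self.gaze_module = gaze_module             # RGB image -> descriptive speech
        self.mode = Mode.WALKING

    def on_user_instruction(self, instruction: str) -> None:
        # The human-computer interaction module forwards "walking" or "gaze".
        self.mode = Mode.WALKING if instruction == "walking" else Mode.GAZE

    def step(self) -> None:
        if self.mode is Mode.WALKING:
            depth = self.data_acquisition.read_depth_frame()
            self.walking_module.process(depth)     # outputs spatial audio to earphone
        else:
            rgb = self.data_acquisition.read_rgb_frame()
            self.gaze_module.process(rgb)          # outputs synthesized speech to earphone
```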
A walking mode auxiliary space perception method realized by the hearing-based auxiliary space perception system comprises the following steps:
step 101, receiving the depth image data stream transmitted by the data acquisition module in real time, and filling the null values in each frame of the depth image, namely assigning to each null pixel the average of its non-null eight-neighborhood pixel values, traversing the image and repeating the operation until no null values remain in the image;
step 102, performing Gaussian low-pass filtering on the depth image processed in step 101 to suppress noise and blur fine details in the image;
step 103, down-sampling the depth image obtained in step 102, which has an original size of 225 × 315, to a 5 × 7 image;
step 104, mapping the pixel values of the depth image down-sampled in step 103 to audio parameters;
step 105, spatializing the audio information generated in step 104 using head-related transfer function techniques, namely mapping the coordinates (x, y) of each pixel in the image to a sound source position (θ, φ);
step 106, outputting the spatial audio in real time to an earphone worn by the user, the audio conveying spatial structure information as non-speech sound.
In step 103, the down-sampling rule is: the minimum pixel value in the eight-neighborhood of the pixel to be computed is assigned to that pixel.
Moreover, the rule for converting image information into audio in step 104 is as follows: the minimum pixel of each column in the image is extracted, a threshold D is set, and the minimum pixel is compared with D. When the minimum pixel value is less than or equal to D, the object at that position is close to the user, and the pixel value is mapped to the loudness and pitch of a beep; the smaller the pixel value (the closer the object), the greater the loudness and the higher the pitch of the beep, and vice versa, thereby prompting the user to avoid the obstacle. When the minimum pixel value is greater than D, the object at that position is farther from the user and poses no immediate collision threat, and the pixel information is represented by a water-drop sound of fixed loudness and pitch, which can be regarded as a "safety sound" indicating that objects are far away.
Furthermore, after the processing of step 105, the beep or water-drop sound is spatialized and the user perceives the direction from which it comes; the mapping rule is as follows:
θ = -120° + 30° × y    (1)
[Formula (2), which maps the row number x to the elevation angle φ, is shown as an image in the original document.]
where x is the row number of the pixel in the image and y is the column number of the pixel in the image; a three-dimensional coordinate system is constructed with the center of the skull as the origin O, the x-axis passing through the ears, the y-axis passing through the nose, and the z-axis perpendicular to the xOy plane; θ is the horizontal angle between the line connecting the sound source position with the origin O and the yOz plane, and φ is the elevation angle between that line and the xOy plane.
Moreover, in step 106, for each frame of the depth image, 7 audio clips with different horizontal angles are generated and played sequentially from left to right; when an audio clip is a water-drop sound, the object at that position is far from the user; when it is a beep, the object at that position is less than D meters from the user, and the higher the pitch and the greater the loudness of the beep, the closer the object; the user can judge the distance and direction of obstacles from the timbre, pitch, loudness and sound-source position of the audio, and thereby avoid obstacles while walking.
A gaze mode auxiliary space perception method realized by the hearing-based auxiliary space perception system comprises the following steps:
step 201, receiving the RGB image transmitted by the data acquisition module in real time, and generating an English descriptive text of the image by calling the Computer Vision API service provided by Microsoft;
step 202, converting the English text generated in step 201 into Chinese text through the Baidu translation API service;
step 203, converting the Chinese text generated in step 202 into speech based on the pyttsx module in Python;
step 204, outputting the speech in real time to an earphone worn by the user, the speech being a descriptive sentence about the scene currently in view.
Compared with the prior art, the invention has the following advantages:
1) The system provided by the invention has two working modes, a walking mode and a gaze mode, corresponding to two hearing-based auxiliary spatial perception methods; combined, the two methods meet the different usage needs of visually impaired users in different scenarios.
2) The walking-mode assisted spatial perception method uses sonification to convert spatial scene information into spatial audio signals in real time, so that visually impaired users can quickly acquire spatial structure information; it effectively helps them avoid obstacles while walking and improves their spatial awareness. This method provides a larger amount of spatial information, effectively solving the problem that existing sensory substitution devices cannot meet the needs of the visually impaired because they convey too little spatial information.
3) The gaze-mode assisted spatial perception method converts spatial scene information into speech and reads it out in real time, so that visually impaired users can quickly acquire descriptive information about the spatial scene; it effectively helps them obtain environmental information and improves their scene understanding. This method provides more refined and focused spatial descriptive information, effectively solving the problem that existing sensory substitution devices are of limited practical use because they provide redundant information.
Drawings
Fig. 1 is a schematic structural diagram of an auxiliary space perception system based on hearing according to an embodiment of the invention.
FIG. 2 is a flowchart illustrating a walking mode assisted spatial awareness method according to the present invention.
FIG. 3 is a schematic diagram of the horizontal angle θ and the elevation angle φ of the sound source position used in the walking mode assisted spatial perception method of the present invention.
FIG. 4 is a flow chart of a gaze-mode assisted spatial awareness method of the present invention.
Detailed Description
The invention provides an auditory-based auxiliary spatial perception system and method that use hearing in place of vision to convert spatial scene information into non-speech spatial audio signals or speech description signals, thereby assisting the user in perceiving space and understanding the scene.
The technical solution of the present invention is further explained with reference to the drawings and the embodiments.
As shown in Fig. 1, the present invention provides an auditory-based auxiliary spatial perception system comprising a data acquisition module, a human-computer interaction module, a control module, a walking mode module and a gaze mode module; the human-computer interaction module is connected with the control module, and the control module and the data acquisition module are connected with the walking mode module and the gaze mode module.
The data acquisition module is used for acquiring spatial scene information, including a depth image data stream and an RGB image data stream.
The human-computer interaction module is used for transmitting user instructions, sending walking mode instructions and gaze mode instructions to the control module.
According to a walking mode instruction from the human-computer interaction module, the control module processes the depth image data acquired by the data acquisition module, maps the depth image information to audio parameters and outputs spatial audio through an earphone; according to a gaze mode instruction from the human-computer interaction module, the control module constructs semantic information from the RGB image data acquired by the data acquisition module, synthesizes the corresponding speech and outputs it through the earphone.
The walking mode module is used for detecting and outputting the direction and distance of objects in a spatial scene, and comprises: a depth image preprocessing submodule for processing the depth image data stream; a spatial audio generation submodule for mapping depth image information to audio parameters and spatializing the audio; and a spatial audio output submodule for outputting the spatial audio.
The gaze mode module is used for identifying and outputting the attribute and state information of objects in a spatial scene, and comprises: an RGB image semantic construction submodule for converting the RGB image data stream into sentences; a speech synthesis submodule for converting the sentences into speech; and a speech output submodule for outputting the speech signal.
By providing the human-computer interaction module, the user can freely select a mode and obtain different kinds of information about the surrounding spatial scene through different kinds of sound output; by providing the walking mode module, the user can quickly perceive the spatial structure of the surrounding environment and grasp the direction and distance of obstacles, so as to avoid them effectively and stay safe while walking; by providing the gaze mode module, the user can quickly obtain descriptive information about the surrounding environment, which helps the user understand the spatial scene.
The system has two working modes, a walking mode and a gaze mode, corresponding to two hearing-based auxiliary spatial perception methods: a walking-mode assisted spatial perception method and a gaze-mode assisted spatial perception method. The walking-mode method uses sonification to convert spatial scene information into spatial audio signals in real time, so that visually impaired users can quickly acquire spatial structure information, effectively avoid obstacles while walking and improve their spatial awareness. The gaze-mode method converts spatial scene information into speech and reads it out in real time, so that visually impaired users can quickly acquire descriptive information about the spatial scene, effectively obtain environmental information and improve their scene understanding.
As shown in fig. 2, the walking mode assisted spatial perception method includes the following steps:
Step 101, receiving the depth image data stream transmitted by the data acquisition module in real time, and filling the null values in each frame of the depth image, namely assigning to each null pixel the average of its non-null eight-neighborhood pixel values, traversing the image and repeating the operation until no null values remain in the image.
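A minimal sketch of this null-filling rule is given below, assuming the depth frame is held in a NumPy array and that null depth values are encoded as 0; the array layout and the no-progress safeguard are assumptions, while the eight-neighborhood averaging follows the step as described.

```python
# Sketch of the null-filling rule in step 101, assuming null depth values are
# encoded as 0 in a NumPy array; the eight-neighborhood mean fill is repeated
# until no null pixels remain.
import numpy as np

def fill_nulls(depth: np.ndarray) -> np.ndarray:
    depth = depth.astype(float).copy()
    while np.any(depth == 0):
        filled = depth.copy()
        for i, j in zip(*np.nonzero(depth == 0)):
            # Collect the non-null pixels among the eight neighbours.
            block = depth[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
            neighbours = block[block > 0]
            if neighbours.size:
                filled[i, j] = neighbours.mean()
        if np.array_equal(filled, depth):
            break  # no progress (e.g. a fully null frame); avoid an infinite loop
        depth = filled
    return depth
```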
Step 102, performing Gaussian low-pass filtering on the depth image processed in step 101 to suppress noise and blur fine details in the image.
Step 103, down-sampling the depth image obtained in step 102, which has an original size of 225 × 315, to a 5 × 7 image; the down-sampling rule is: the minimum pixel value in the eight-neighborhood of the pixel to be computed is assigned to that pixel.
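Steps 102 and 103 could be sketched as follows, assuming the minimum-value down-sampling rule can be realized as block-wise min pooling (a 225 × 315 image divides evenly into 5 × 7 blocks of 45 × 45 pixels); the Gaussian kernel size and the use of OpenCV are illustrative choices, not specified by the invention.

```python
# Sketch of steps 102-103: Gaussian smoothing followed by block-wise min
# pooling that keeps the nearest (smallest) depth in each 45 x 45 block.
import cv2
import numpy as np

def preprocess(depth: np.ndarray) -> np.ndarray:
    # Step 102: Gaussian low-pass filtering to suppress noise (kernel size assumed).
    smoothed = cv2.GaussianBlur(depth.astype(np.float32), (5, 5), 0)
    # Step 103: down-sample 225 x 315 -> 5 x 7, keeping the minimum of each block.
    blocks = smoothed.reshape(5, 45, 7, 45)
    return blocks.min(axis=(1, 3))
```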
Step 104, mapping the pixel values of the depth image down-sampled in step 103 to audio parameters.
The rule for converting image information into audio is as follows: the minimum pixel of each column in the image is extracted, a threshold D is set (D = 3 meters in this embodiment), and the minimum pixel is compared with D. When the minimum pixel value is less than or equal to D, the object at that position is close to the user, and the pixel value is mapped to the loudness and pitch of a beep; the smaller the pixel value (the closer the object), the greater the loudness and the higher the pitch of the beep, and vice versa, thereby prompting the user to avoid the obstacle. When the minimum pixel value is greater than D, the object at that position is farther from the user and poses no immediate collision threat, and the pixel information is represented by a water-drop sound of fixed loudness and pitch, which can be regarded as a "safety sound" indicating that objects are far away.
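A sketch of this column-to-audio-parameter mapping is shown below; the concrete loudness and pitch ranges are illustrative assumptions, since the invention only states the qualitative rule (a closer object gives a louder, higher-pitched beep, while distances beyond D give a fixed water-drop "safety sound").

```python
# Sketch of the column-to-audio mapping in step 104. The numeric loudness and
# frequency ranges are assumptions used only for illustration.
import numpy as np

D = 3.0  # distance threshold in metres (value used in this embodiment)

def column_audio_params(depth_5x7: np.ndarray):
    params = []
    for col in range(depth_5x7.shape[1]):
        d = float(depth_5x7[:, col].min())      # nearest object in this column
        row = int(depth_5x7[:, col].argmin())   # its row drives the elevation angle
        if d <= D:
            closeness = 1.0 - d / D             # 0 at distance D, 1 when touching
            params.append({
                "sound": "beep",
                "loudness": 0.2 + 0.8 * closeness,     # assumed gain range
                "pitch_hz": 400.0 + 800.0 * closeness,  # assumed pitch range
                "row": row, "col": col,
            })
        else:
            # Fixed-loudness, fixed-pitch "safety sound" for distant objects.
            params.append({"sound": "water_drop", "loudness": 0.3,
                           "pitch_hz": 600.0, "row": row, "col": col})
    return params
```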
Step 105, spatializing the audio information generated in step 104 using head-related transfer function (HRTF) techniques, namely mapping the coordinates (x, y) of each pixel in the image to a sound source position (θ, φ). After spatialization, the beep or water-drop sound is perceived as spatial, and the user senses the direction from which it comes.
The specific mapping rule is shown as the following formula:
θ = -120° + 30° × y    (1)
[Formula (2), which maps the row number x to the elevation angle φ, is shown as an image in the original document.]
where x is the row number of the pixel in the image and y is the column number of the pixel in the image. As shown in Fig. 3, a three-dimensional coordinate system is constructed with the center of the skull as the origin O, the x-axis passing through the ears, the y-axis passing through the nose, and the z-axis perpendicular to the xOy plane. θ is the horizontal angle between the line connecting the sound source position with the origin O and the yOz plane, and φ is the elevation angle between that line and the xOy plane.
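The angle mapping and a simplified rendering could be sketched as follows. The horizontal angle follows formula (1) exactly; because formula (2) is reproduced only as an image in the source, the elevation mapping below is an assumed linear analogue, and the constant-power stereo pan is merely a stand-in used to illustrate direction-dependent output, not the HRTF spatialization actually used by the invention.

```python
# Sketch of step 105. theta follows formula (1); phi is an ASSUMED linear
# mapping (rows 1..5 spread over +60..-60 degrees), not formula (2). The pan
# below is a crude interaural-level-difference approximation, not an HRTF.
import numpy as np

def pixel_to_angles(x: int, y: int):
    theta = -120.0 + 30.0 * y          # formula (1): horizontal angle, degrees
    phi = 60.0 - 30.0 * (x - 1)        # assumed elevation mapping (not formula (2))
    return theta, phi

def pan_stereo(mono: np.ndarray, theta_deg: float) -> np.ndarray:
    # Constant-power pan: theta = -90 degrees -> left ear, +90 degrees -> right ear.
    pan = np.clip(theta_deg / 90.0, -1.0, 1.0)
    left = np.cos((pan + 1.0) * np.pi / 4.0)
    right = np.sin((pan + 1.0) * np.pi / 4.0)
    return np.stack([mono * left, mono * right], axis=1)
```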
Step 106, outputting the spatial audio in real time to an earphone worn by the user; the audio conveys spatial structure information as non-speech sound.
For each frame of the depth image, 7 audio clips with different horizontal angles are generated and played sequentially from left to right. When an audio clip is a water-drop sound, the object at that position is far from the user; when it is a beep, the object at that position is less than D meters from the user, and the higher the pitch and the greater the loudness of the beep, the closer the object. The user can judge the distance and direction of obstacles from the timbre, pitch, loudness and sound-source position of the audio, and thereby avoid obstacles while walking.
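Putting the pieces together, one frame of the walking mode could be rendered as in the sketch below, which reuses the hypothetical helpers from the previous sketches (column_audio_params, pixel_to_angles, pan_stereo) and plays the 7 clips from left to right; the sounddevice package and the clip length are illustrative choices, and both sound types are synthesized here as plain tones rather than the beep and water-drop samples described above.

```python
# End-to-end sketch for one depth frame in walking mode, reusing the helper
# functions sketched above. Playback via the third-party `sounddevice` package
# is purely illustrative; any audio backend could be substituted.
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 44100
CLIP_SECONDS = 0.15  # assumed clip length

def play_frame(depth_5x7: np.ndarray) -> None:
    t = np.linspace(0.0, CLIP_SECONDS, int(SAMPLE_RATE * CLIP_SECONDS), endpoint=False)
    for p in column_audio_params(depth_5x7):           # columns left to right
        tone = p["loudness"] * np.sin(2.0 * np.pi * p["pitch_hz"] * t)
        theta, _phi = pixel_to_angles(p["row"] + 1, p["col"] + 1)
        stereo = pan_stereo(tone.astype(np.float32), theta)
        sd.play(stereo, SAMPLE_RATE)
        sd.wait()                                      # play the 7 clips sequentially
```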
As shown in Fig. 4, the gaze-mode assisted spatial perception method comprises the following steps:
Step 201, receiving the RGB image transmitted by the data acquisition module in real time, and generating an English descriptive text of the image by calling the Computer Vision API service provided by Microsoft.
Step 202, converting the English text generated in step 201 into Chinese text through the Baidu translation API service.
Step 203, converting the Chinese text generated in step 202 into speech based on the pyttsx module in Python.
Step 204, outputting the speech in real time to an earphone worn by the user; the speech is a descriptive sentence about the scene currently in view.
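A hedged sketch of the gaze-mode pipeline (steps 201 to 204) is given below. The endpoint path, request fields and response shapes assumed for the Microsoft Computer Vision "describe" operation and the Baidu translation API are taken from their public documentation and may differ from the versions used in this embodiment; the keys and URLs are placeholders, and pyttsx3 is used as the Python 3 successor of the pyttsx module.

```python
# Sketch of steps 201-204. API endpoints, fields and response shapes are
# assumptions based on public documentation; keys and URLs are placeholders.
import hashlib
import random
import requests
import pyttsx3

AZURE_ENDPOINT = "https://<your-region>.api.cognitive.microsoft.com"  # placeholder
AZURE_KEY = "<azure-key>"
BAIDU_APPID, BAIDU_KEY = "<baidu-appid>", "<baidu-key>"

def describe_image(image_bytes: bytes) -> str:
    # Step 201: English caption from the Computer Vision "describe" operation.
    r = requests.post(
        f"{AZURE_ENDPOINT}/vision/v3.2/describe",
        headers={"Ocp-Apim-Subscription-Key": AZURE_KEY,
                 "Content-Type": "application/octet-stream"},
        data=image_bytes)
    r.raise_for_status()
    return r.json()["description"]["captions"][0]["text"]

def translate_to_chinese(text: str) -> str:
    # Step 202: English -> Chinese via the Baidu general translation API.
    salt = str(random.randint(1, 10 ** 9))
    sign = hashlib.md5((BAIDU_APPID + text + salt + BAIDU_KEY).encode()).hexdigest()
    r = requests.get("https://fanyi-api.baidu.com/api/trans/vip/translate",
                     params={"q": text, "from": "en", "to": "zh",
                             "appid": BAIDU_APPID, "salt": salt, "sign": sign})
    r.raise_for_status()
    return r.json()["trans_result"][0]["dst"]

def speak(text: str) -> None:
    # Steps 203-204: synthesize the Chinese sentence and play it.
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()
```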
The specific embodiments described herein are merely illustrative of the spirit of the invention. Those skilled in the art may make various modifications or additions to the described embodiments, or substitute alternatives, without departing from the spirit or scope of the invention as defined in the appended claims.

Claims (7)

1. An auxiliary space perception system based on hearing, characterized by comprising a data acquisition module, a human-computer interaction module, a control module, a walking mode module and a gaze mode module, wherein the human-computer interaction module is connected with the control module;
the data acquisition module is used for acquiring spatial scene information, including a depth image data stream and an RGB image data stream;
the human-computer interaction module is used for transmitting user instructions, sending walking mode instructions and gaze mode instructions to the control module;
the control module processes the depth image data acquired by the data acquisition module according to a walking mode instruction from the human-computer interaction module, maps the depth image information to audio parameters and outputs spatial audio through an earphone; the control module constructs semantic information from the RGB image data acquired by the data acquisition module according to a gaze mode instruction from the human-computer interaction module, and synthesizes and outputs the corresponding speech through the earphone;
the walking mode module is used for detecting and outputting object azimuth and distance information in a space scene, and comprises: the depth image preprocessing submodule is used for processing a depth image data stream; the spatial audio generation submodule is used for mapping the depth image information to spatial audio and carrying out spatialization processing on the audio; a spatial audio output sub-module for outputting spatial audio;
the gaze mode module is used for identifying and outputting the attribute and state information of objects in a spatial scene, and comprises: an RGB image semantic construction submodule for converting the RGB image data into sentences; a speech synthesis submodule for converting the sentences into speech; and a speech output submodule for outputting the speech signal.
2. A walking mode aided spatial perception method implemented using the hearing based aided spatial perception system of claim 1, comprising the steps of:
step 101, receiving the depth image data stream transmitted by the data acquisition module in real time, and filling the null values in each frame of the depth image, namely assigning to each null pixel the average of its non-null eight-neighborhood pixel values, traversing the image and repeating the operation until no null values remain in the image;
step 102, performing Gaussian low-pass filtering on the depth image processed in step 101 to suppress noise and blur fine details in the image;
step 103, down-sampling the depth image obtained in step 102, which has an original size of 225 × 315, to a 5 × 7 image;
step 104, mapping the pixel values of the depth image down-sampled in step 103 to audio parameters;
step 105, spatializing the audio information generated in step 104 using head-related transfer function techniques, namely mapping the coordinates (x, y) of each pixel in the image to a sound source position (θ, φ);
step 106, outputting the spatial audio in real time to an earphone worn by the user, the audio conveying spatial structure information as non-speech sound.
3. A walking mode aided spatial perception method as claimed in claim 2, characterized in that: the down-sampling rule in step 103 is: the minimum pixel value in the eight-neighborhood of the pixel to be computed is assigned to that pixel.
4. A walking mode aided spatial perception method as claimed in claim 2, characterized in that: the rule for converting image information into audio in step 104 is as follows: the minimum pixel of each column in the image is extracted, a threshold D is set, and the minimum pixel is compared with D; when the minimum pixel value is less than or equal to D, the object at that position is close to the user, and the pixel value is mapped to the loudness and pitch of a beep, where a smaller pixel value (a closer object) yields a greater loudness and a higher pitch, and vice versa, thereby prompting the user to avoid the obstacle; when the minimum pixel value is greater than D, the object at that position is farther from the user and poses no immediate collision threat, and the pixel information is represented by a water-drop sound of fixed loudness and pitch, which can be regarded as a "safety sound" indicating that objects are far away.
5. A walking mode aided spatial perception method as claimed in claim 2, characterized in that: after the spatialization in step 105, the beep or water-drop sound is perceived as coming from a particular direction, and the mapping rule is as follows:
θ = -120° + 30° × y    (1)
[Formula (2), which maps the row number x to the elevation angle φ, is shown as an image in the original document.]
where x is the row number of the pixel in the image and y is the column number of the pixel in the image; a three-dimensional coordinate system is constructed with the center of the skull as the origin O, the x-axis passing through the ears, the y-axis passing through the nose, and the z-axis perpendicular to the xOy plane; θ is the horizontal angle between the line connecting the sound source position with the origin and the yOz plane, and φ is the elevation angle between that line and the xOy plane.
6. A walking mode aided spatial perception method as claimed in claim 2, characterized in that: in step 106, for each frame of the depth image, 7 audio clips with different horizontal angles are generated and played sequentially from left to right; when an audio clip is a water-drop sound, the object at that position is far from the user; when it is a beep, the object at that position is less than D meters from the user, and the higher the pitch and the greater the loudness of the beep, the closer the object; the user can judge the distance and direction of obstacles from the timbre, pitch, loudness and sound-source position of the audio, and thereby avoid obstacles while walking.
7. A gaze-mode aided spatial perception method implemented using the hearing-based aided spatial perception system of claim 1, comprising the steps of:
step 201, receiving an RGB image transmitted by a data acquisition module in real time, and generating an English descriptive text of the image by calling a Computer Vision API service provided by Microsoft;
step 202, converting the English text generated in the step 201 into a Chinese text through Baidu translation API service;
step 203, based on a pyttsx module in python software, converting the Chinese text generated in the step 202 into voice;
and step 204, outputting voice to an earphone worn by the user in real time, wherein the voice is a descriptive sentence of the current view scene information.
CN202111373446.5A 2021-11-19 2021-11-19 Auxiliary space sensing system and method based on hearing Active CN114120960B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111373446.5A CN114120960B (en) 2021-11-19 2021-11-19 Auxiliary space sensing system and method based on hearing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111373446.5A CN114120960B (en) 2021-11-19 2021-11-19 Auxiliary space sensing system and method based on hearing

Publications (2)

Publication Number Publication Date
CN114120960A true CN114120960A (en) 2022-03-01
CN114120960B CN114120960B (en) 2024-05-03

Family

ID=80396465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111373446.5A Active CN114120960B (en) 2021-11-19 2021-11-19 Auxiliary space sensing system and method based on hearing

Country Status (1)

Country Link
CN (1) CN114120960B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20060133140A (en) * 2005-06-20 2006-12-26 경북대학교 산학협력단 Reproduction of visual image using auditory stimulation and method controlling of the same
US20180078444A1 (en) * 2016-09-17 2018-03-22 Noah Eitan Gamerman Non-visual precision spatial awareness device.
CN109085926A (en) * 2018-08-21 2018-12-25 华东师范大学 A kind of the augmented reality system and its application of multi-modality imaging and more perception blendings
CN113038322A (en) * 2021-03-04 2021-06-25 聆感智能科技(深圳)有限公司 Method and device for enhancing environmental perception by hearing
CN113196390A (en) * 2021-03-09 2021-07-30 曹庆恒 Perception system based on hearing and use method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
徐洁; 方志刚; 鲍福良; 张丽红: "AudioMan: Design and Implementation of an Electronic Travel Aid System" (AudioMan:电子行走辅助系统的设计与实现), Journal of Image and Graphics (中国图象图形学报), no. 07, 15 July 2007 (2007-07-15) *

Also Published As

Publication number Publication date
CN114120960B (en) 2024-05-03

Similar Documents

Publication Publication Date Title
WO2019196133A1 (en) Head-mounted visual aid device
CN206214373U (en) Object detection from visual information to blind person, analysis and prompt system for providing
CN111597828B (en) Translation display method, device, head-mounted display equipment and storage medium
CN106327584B (en) Image processing method and device for virtual reality equipment
JP6771548B2 (en) A portable system that allows the blind or visually impaired to interpret the surrounding environment by voice or touch.
CN108245385A (en) A kind of device for helping visually impaired people's trip
KR102441171B1 (en) Apparatus and Method for Monitoring User based on Multi-View Face Image
CN108245384A (en) Binocular vision apparatus for guiding blind based on enhancing study
Liu et al. Electronic travel aids for the blind based on sensory substitution
CN1969781A (en) Guide for blind person
Blessenohl et al. Improving indoor mobility of the visually impaired with depth-based spatial sound
CN105976675A (en) Intelligent information exchange device and method for deaf-mute and average person
CN114973412A (en) Lip language identification method and system
JP2016194612A (en) Visual recognition support device and visual recognition support program
CN116572260A (en) Emotion communication accompanying and nursing robot system based on artificial intelligence generated content
CN110717344A (en) Auxiliary communication system based on intelligent wearable equipment
KR20120091625A (en) Speech recognition device and speech recognition method using 3d real-time lip feature point based on stereo camera
Kaur et al. A scene perception system for visually impaired based on object detection and classification using multi-modal DCNN
EP3058926A1 (en) Method of transforming visual data into acoustic signals and aid device for visually impaired or blind persons
WO2022048455A1 (en) Barrier free information access system and method employing augmented reality technology
Nazim et al. Smart glasses: A visual assistant for the blind
CN114120960A (en) Auxiliary space perception system and method based on hearing
US20210097888A1 (en) Transmodal translation of feature vectors to audio for assistive devices
Scalvini et al. Visual-auditory substitution device for indoor navigation based on fast visual marker detection
CN111273598A (en) Underwater information interaction system, diver safety guarantee method and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant