CN115218903A - Object searching method and system for visually impaired people - Google Patents
Object searching method and system for visually impaired people
- Publication number
- CN115218903A (application number CN202210516416.3A)
- Authority
- CN
- China
- Prior art keywords
- target
- real
- spatial relationship
- point cloud
- environment image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/20—Instruments for performing navigational calculations
- G01C21/206—Instruments for performing navigational calculations specially adapted for indoor navigation
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/005—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 with correlation of navigation data from several sources, e.g. map or contour matching
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/38—Electronic maps specially adapted for navigation; Updating thereof
- G01C21/3804—Creation or updating of map data
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/38—Electronic maps specially adapted for navigation; Updating thereof
- G01C21/3804—Creation or updating of map data
- G01C21/3833—Creation or updating of map data characterised by the source of data
- G01C21/3841—Data obtained from two or more sources, e.g. probe vehicles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Abstract
The invention discloses an object searching method for the visually impaired, which comprises: forming an image data set and acquiring object spatial relationship prior knowledge encoded in the form of graph embedding vectors; constructing a dense point cloud to obtain a two-dimensional occupancy grid map; detecting the real-time environment image through a target detection model according to a voice instruction; when the detection result is that no target object exists in the real-time environment image, determining the auxiliary target with the highest degree of association with the target object through the object spatial relationship prior knowledge encoded as graph embedding vectors; performing path planning through the two-dimensional occupancy grid map according to the auxiliary target and determining a navigation path; and calculating an action instruction for navigation walking according to the camera pose and the navigation path, and feeding the action instruction back to the user. The invention can search for corresponding articles according to a voice instruction and provide corresponding navigation and actions to meet the object searching needs of the visually impaired, thereby improving their quality of life.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to an object searching method and system for visually impaired people.
Background
Vision is an important way for humans to obtain information about their surroundings. For people with visual impairment, the lack of visual information brings inconvenience to daily life. Traditionally, visually impaired people have relied on aids such as guide canes and guide dogs to acquire environmental information. With the development of machine learning and computer vision technology, it is becoming possible to use computer vision to provide visual information to the blind and assist them in their daily lives. For example, image caption generation can help the blind perceive the environment, target recognition and target detection can help them find surrounding objects, face recognition can inform them of the arrival of acquaintances, and optical character recognition can help them read books and newspapers.
However, these technologies are generally oriented toward a single specific function; they solve an isolated problem at a particular moment rather than addressing the real needs of the blind in daily life. As a result, such techniques are difficult to apply in the daily life of the blind. For example, when a blind person wants a drinking vessel, he needs to actively run object detection on the current field of view several times in succession, and even if a suitable object is detected, it is difficult to determine its specific location. Although artificial intelligence algorithms are maturing day by day, the practical needs of the blind are still difficult to meet.
An effective solution to the problems in the related art has not been proposed yet.
Disclosure of Invention
Aiming at the problems in the related art, the invention provides an object searching method and system for the visually impaired, which can search for corresponding objects according to a user's instruction and provide corresponding navigation, so as to meet the object searching needs of the visually impaired.
The technical scheme of the invention is realized as follows:
according to one aspect of the invention, an object finding method for visually impaired people is provided.
The object searching method for the visually impaired comprises the following steps:
acquiring environment images of the current environment where the user is located through a pre-configured visual sensor to form an image data set; aggregating an object spatial relationship knowledge graph according to the image data set; acquiring object spatial relationship prior knowledge encoded in the form of graph embedding vectors according to the object spatial relationship knowledge graph;

constructing a dense point cloud according to the real-time environment image acquired by the visual sensor to obtain a two-dimensional occupancy grid map; performing pose estimation on the real-time environment image to obtain the camera pose output at each time step;

acquiring a voice instruction of the user; detecting the real-time environment image through a pre-configured target detection model according to the voice instruction; detecting whether a target object corresponding to the voice instruction exists in the real-time environment image;

when the detection result is that a target object corresponding to the voice instruction exists in the real-time environment image, performing path planning through the two-dimensional occupancy grid map and determining a navigation path; when the detection result is that no target object corresponding to the voice instruction exists in the real-time environment image, determining the auxiliary target with the highest degree of association with the target object through the object spatial relationship prior knowledge encoded as graph embedding vectors, and performing path planning through the two-dimensional occupancy grid map according to the auxiliary target to determine a navigation path;

and calculating the current pose deviation of the user according to the camera pose and the navigation path, calculating, according to the pose deviation, an action instruction for the user to walk along the navigation path, and feeding the action instruction back to the user to prompt the user to follow the navigation path according to the action instruction and search for the object.
Aggregating the object spatial relationship knowledge graph according to the image data set and obtaining the object spatial relationship prior knowledge encoded in the form of graph embedding vectors according to the object spatial relationship knowledge graph comprises:

generating a scene graph set from the image data set through a pre-configured target detection model and a scene graph generation model; aggregating the scene graph set to obtain a preliminary object spatial relationship knowledge graph; performing statistical processing on the preliminary object spatial relationship knowledge graph to obtain the object spatial relationship knowledge graph; acquiring the object spatial relationship prior knowledge encoded in the form of graph embedding vectors through a variational graph auto-encoder method based on unsupervised learning according to the object spatial relationship knowledge graph; and, when aggregating the scene graph set to obtain the preliminary object spatial relationship knowledge graph, deleting all labels that have been manually marked in advance as unsuitable to serve as targets.

Constructing a dense point cloud according to the real-time environment image acquired by the visual sensor to obtain the two-dimensional occupancy grid map comprises the following steps:

acquiring a depth map at each time step from the real-time environment image acquired by the visual sensor; mapping the depth map of each time step into a local dense point cloud; marking the points in the local dense point cloud whose depth values are smaller than a preset threshold; transforming the marked local dense point cloud into the global space and splicing it with the global point cloud; and converting the global point cloud after each splicing into an octree map, projecting the octree map onto a top-view plane, and obtaining the two-dimensional occupancy grid map from the projected top-view plane.

Detecting the real-time environment image through the target detection model according to the voice instruction and detecting whether a target object corresponding to the voice instruction exists in the real-time environment image comprises:

analyzing the voice instruction and determining the target object corresponding to the voice instruction; detecting the image of each time step of the real-time environment image through the target detection model to obtain a target detection result; when the target detection result is that a target object corresponding to the voice instruction exists in the real-time environment image, taking the corresponding target object in the target detection result as the target object to be searched; when the target detection result is that no target object corresponding to the voice instruction exists in the real-time environment image, analyzing all targets in the target detection result and determining the embedding vector corresponding to each target; calculating the Euclidean distance between the embedding vector of each target and that of the target object, and deleting from the target detection result any target whose Euclidean distance is greater than a preset threshold; determining whether the area related to each remaining target contains points marked as explored in the corresponding point cloud, and if so, deleting the target from the target detection result; and sorting all targets whose Euclidean distances are smaller than the preset value from near to far, and taking the first-ranked target as the auxiliary target with the highest degree of association with the target object.
Optionally, the feedback mode of the action instruction is at least one of the following: voice feedback, frequency buzzing feedback; the vision sensor is a wearable helmet type RGB-D camera.
According to another aspect of the present invention, there is provided an object finding system for visually impaired people.
The object searching system for the visually impaired comprises:
the prior knowledge module is used for acquiring environment images of the current environment where the user is located through a pre-configured visual sensor to form an image data set; aggregating an object spatial relationship knowledge graph according to the image data set; and acquiring object spatial relationship prior knowledge encoded in the form of graph embedding vectors according to the object spatial relationship knowledge graph;

the visual positioning and mapping module is used for constructing a dense point cloud according to the real-time environment image acquired by the visual sensor to obtain a two-dimensional occupancy grid map, and performing pose estimation on the real-time environment image to obtain the camera pose output at each time step;
the target determination module is used for acquiring a voice instruction of a user; detecting the real-time environment image through a preset target detection model according to the voice instruction; detecting whether a target object corresponding to the voice instruction exists in the real-time environment image;
the path planning module is used for performing path planning through the two-dimensional occupancy grid map and determining a navigation path when the detection result is that a target object corresponding to the voice instruction exists in the real-time environment image; and, when the detection result is that no target object corresponding to the voice instruction exists in the real-time environment image, determining the auxiliary target with the highest degree of association with the target object through the object spatial relationship prior knowledge encoded as graph embedding vectors, and performing path planning through the two-dimensional occupancy grid map according to the auxiliary target to determine a navigation path;

and the action generation and feedback module is used for calculating the current pose deviation of the user according to the camera pose and the navigation path, calculating, according to the pose deviation, an action instruction for the user to walk along the navigation path, and feeding the action instruction back to the user to prompt the user to follow the navigation path according to the action instruction and search for the object.
The prior knowledge module comprises a scene graph generation submodule, a knowledge aggregation submodule and a knowledge encoding submodule. The scene graph generation submodule is used for generating a scene graph set from the image data set through a pre-configured target detection model and a scene graph generation model; the knowledge aggregation submodule is used for aggregating the scene graph set to obtain a preliminary object spatial relationship knowledge graph, performing statistical processing on the preliminary object spatial relationship knowledge graph to obtain the object spatial relationship knowledge graph, and, when aggregating the scene graph set to obtain the preliminary object spatial relationship knowledge graph, deleting all labels that have been manually marked in advance as unsuitable to serve as targets; and the knowledge encoding submodule is used for acquiring the object spatial relationship prior knowledge encoded in the form of graph embedding vectors through a variational graph auto-encoder method based on unsupervised learning according to the object spatial relationship knowledge graph.

The visual positioning and mapping module comprises a point cloud mapping submodule, a point cloud splicing submodule and a map projection submodule. The point cloud mapping submodule is used for acquiring a depth map at each time step from the real-time environment image acquired by the visual sensor and mapping the depth map of each time step into a local dense point cloud; the point cloud splicing submodule is used for marking the points in the local dense point cloud whose depth values are smaller than a preset threshold, transforming the marked local dense point cloud into the global space and splicing it with the global point cloud; and the map projection submodule is used for converting the global point cloud after each splicing into an octree map, projecting the octree map onto a top-view plane, and obtaining the two-dimensional occupancy grid map from the projected top-view plane.

The target determination module comprises a target detection submodule, a target acquisition submodule and a target reasoning submodule. The target detection submodule is used for analyzing the voice instruction, determining the target object corresponding to the voice instruction, and detecting the image of each time step of the real-time environment image through the target detection model to obtain a target detection result; the target acquisition submodule is used for taking the corresponding target object in the target detection result as the target object to be searched when the target detection result is that a target object corresponding to the voice instruction exists in the real-time environment image; and the target reasoning submodule is used for, when the target detection result is that no target object corresponding to the voice instruction exists in the real-time environment image, analyzing all targets in the target detection result, determining the embedding vector corresponding to each target, calculating the Euclidean distance between the embedding vector of each target and that of the target object, deleting from the target detection result any target whose Euclidean distance is greater than a preset threshold, further deleting any target whose related area contains points marked as explored in the corresponding point cloud, sorting all targets whose Euclidean distances are smaller than the preset value from near to far, and taking the first-ranked target as the auxiliary target with the highest degree of association with the target object.
Optionally, the feedback mode of the action instruction is at least one of the following: voice feedback, frequency buzzing feedback; the vision sensor is a wearable helmet type RGB-D camera.
Advantageous effects: according to the invention, environment images are acquired through the visual sensor, and computer vision, scene graph, knowledge graph and visual simultaneous localization and mapping technologies are used to assist visually impaired people in searching for corresponding articles according to a voice instruction in a complex indoor environment, and to provide corresponding navigation and actions to meet their object searching needs, thereby improving the quality of life of the visually impaired.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required in the embodiments will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of an object finding method for the visually impaired according to an embodiment of the present invention;
FIG. 2 is a block diagram of an object-finding system for visually impaired persons according to an embodiment of the present invention;
FIG. 3 is an architecture diagram of an application of the object-finding system for visually impaired according to an embodiment of the present invention;
fig. 4 is a flowchart of an application of the method for finding an object for the visually impaired according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived from the embodiments of the present invention by a person skilled in the art, are within the scope of the present invention.
According to the embodiment of the invention, the invention provides an object searching method and system for the visually impaired.
As shown in fig. 1, an object searching method for the visually impaired according to an embodiment of the present invention includes:
step S101, acquiring environment images of the current environment where the user is located through a pre-configured visual sensor to form an image data set; aggregating an object spatial relationship knowledge graph according to the image data set; acquiring object spatial relationship prior knowledge encoded in the form of graph embedding vectors according to the object spatial relationship knowledge graph;

step S103, constructing a dense point cloud according to the real-time environment image acquired by the visual sensor to obtain a two-dimensional occupancy grid map; performing pose estimation on the real-time environment image to obtain the camera pose output at each time step;

step S105, acquiring a voice instruction of the user; detecting the real-time environment image through a pre-configured target detection model according to the voice instruction; detecting whether a target object corresponding to the voice instruction exists in the real-time environment image;

step S107, when the detection result is that a target object corresponding to the voice instruction exists in the real-time environment image, performing path planning through the two-dimensional occupancy grid map and determining a navigation path; when the detection result is that no target object corresponding to the voice instruction exists in the real-time environment image, determining the auxiliary target with the highest degree of association with the target object through the object spatial relationship prior knowledge encoded as graph embedding vectors, and performing path planning through the two-dimensional occupancy grid map according to the auxiliary target to determine a navigation path;

step S109, calculating the current pose deviation of the user according to the camera pose and the navigation path, calculating, according to the pose deviation, an action instruction for the user to walk along the navigation path, and feeding the action instruction back to the user to prompt the user to follow the navigation path according to the action instruction and search for the object.
In one embodiment, when aggregating the object spatial relationship knowledge graph according to the image data set and acquiring the object spatial relationship prior knowledge encoded in the form of graph embedding vectors according to the object spatial relationship knowledge graph, a scene graph set may be generated from the image data set through a pre-configured target detection model and a scene graph generation model; the scene graph set is aggregated to obtain a preliminary object spatial relationship knowledge graph; statistical processing is performed on the preliminary object spatial relationship knowledge graph to obtain the object spatial relationship knowledge graph; the object spatial relationship prior knowledge encoded in the form of graph embedding vectors is acquired through a variational graph auto-encoder method based on unsupervised learning according to the object spatial relationship knowledge graph; and, when aggregating the scene graph set to obtain the preliminary object spatial relationship knowledge graph, all labels that have been manually marked in advance as unsuitable to serve as targets are deleted.

In one embodiment, when constructing a dense point cloud to obtain a two-dimensional occupancy grid map according to the real-time environment image acquired by the visual sensor, a depth map at each time step can be obtained from the real-time environment image acquired by the visual sensor; the depth map of each time step is mapped into a local dense point cloud; the points in the local dense point cloud whose depth values are smaller than a preset threshold are marked; the marked local dense point cloud is transformed into the global space and spliced with the global point cloud; and the global point cloud after each splicing is converted into an octree map, the octree map is projected onto a top-view plane, and the two-dimensional occupancy grid map is obtained from the projected top-view plane.

In one embodiment, when the real-time environment image is detected through the target detection model according to the voice instruction and whether a target object corresponding to the voice instruction exists in the real-time environment image is detected, the voice instruction is analyzed to determine the target object corresponding to the voice instruction; the image of each time step of the real-time environment image is detected through the target detection model to obtain a target detection result; when the target detection result is that a target object corresponding to the voice instruction exists in the real-time environment image, the corresponding target object in the target detection result is taken as the target object to be searched; when the target detection result is that no target object corresponding to the voice instruction exists in the real-time environment image, all targets in the target detection result are analyzed and the embedding vector corresponding to each target is determined; the Euclidean distance between the embedding vector of each target and that of the target object is calculated, and any target whose Euclidean distance is greater than a preset threshold is deleted from the target detection result; any target whose related area contains points marked as explored in the corresponding point cloud is also deleted from the target detection result; and all targets whose Euclidean distances are smaller than the preset value are sorted from near to far, and the first-ranked target is taken as the auxiliary target with the highest degree of association with the target object.
In one embodiment, the feedback mode of the action instruction is at least one of the following: voice feedback and frequency buzzing feedback; the visual sensor is a wearable helmet-type RGB-D camera.
As shown in fig. 2, an object searching system for the visually impaired according to an embodiment of the present invention includes:
the prior knowledge module 201 is used for acquiring environment images of the current environment where the user is located through a pre-configured visual sensor to form an image data set; aggregating an object spatial relationship knowledge graph according to the image data set; and acquiring object spatial relationship prior knowledge encoded in the form of graph embedding vectors according to the object spatial relationship knowledge graph;

the visual positioning and mapping module 203 is used for constructing a dense point cloud according to the real-time environment image acquired by the visual sensor to obtain a two-dimensional occupancy grid map, and performing pose estimation on the real-time environment image to obtain the camera pose output at each time step;

the target confirmation module 205 is configured to obtain a voice instruction of the user, detect the real-time environment image through a pre-configured target detection model according to the voice instruction, and detect whether a target object corresponding to the voice instruction exists in the real-time environment image;

the path planning module 207 is used for performing path planning through the two-dimensional occupancy grid map and determining a navigation path when the detection result is that a target object corresponding to the voice instruction exists in the real-time environment image; and, when the detection result is that no target object corresponding to the voice instruction exists in the real-time environment image, determining the auxiliary target with the highest degree of association with the target object through the object spatial relationship prior knowledge encoded as graph embedding vectors, and performing path planning through the two-dimensional occupancy grid map according to the auxiliary target to determine a navigation path;

and the action generation and feedback module 209 is configured to calculate the current pose deviation of the user according to the camera pose and the navigation path, calculate, according to the pose deviation, an action instruction for the user to walk along the navigation path, and feed the action instruction back to the user, prompting the user to follow the navigation path according to the action instruction and search for the object.
In one embodiment, the prior knowledge module 201 includes a scene graph generation submodule (not shown in the figure), a knowledge aggregation submodule (not shown in the figure) and a knowledge encoding submodule (not shown in the figure). The scene graph generation submodule is configured to generate a scene graph set from the image data set through a pre-configured target detection model and a scene graph generation model; the knowledge aggregation submodule is configured to aggregate the scene graph set to obtain a preliminary object spatial relationship knowledge graph, perform statistical processing on the preliminary object spatial relationship knowledge graph to obtain the object spatial relationship knowledge graph, and, when aggregating the scene graph set to obtain the preliminary object spatial relationship knowledge graph, delete all labels that have been manually marked in advance as unsuitable to serve as targets; and the knowledge encoding submodule is configured to acquire the object spatial relationship prior knowledge encoded in the form of graph embedding vectors through a variational graph auto-encoder method based on unsupervised learning according to the object spatial relationship knowledge graph.

In one embodiment, the visual positioning and mapping module 203 includes a point cloud mapping submodule (not shown), a point cloud splicing submodule (not shown) and a map projection submodule (not shown). The point cloud mapping submodule is configured to obtain a depth map at each time step from the real-time environment image acquired by the visual sensor and map the depth map of each time step into a local dense point cloud; the point cloud splicing submodule is configured to mark the points in the local dense point cloud whose depth values are smaller than a preset threshold, transform the marked local dense point cloud into the global space and splice it with the global point cloud; and the map projection submodule is configured to convert the global point cloud after each splicing into an octree map, project the octree map onto a top-view plane, and obtain the two-dimensional occupancy grid map from the projected top-view plane.

In one embodiment, the target confirmation module 205 includes a target detection submodule (not shown in the figure), a target acquisition submodule (not shown in the figure) and a target reasoning submodule (not shown in the figure). The target detection submodule is configured to analyze the voice instruction, determine the target object corresponding to the voice instruction, and detect the image of each time step of the real-time environment image through the target detection model to obtain a target detection result; the target acquisition submodule is configured to take the corresponding target object in the target detection result as the target object to be searched when the target detection result is that a target object corresponding to the voice instruction exists in the real-time environment image; and the target reasoning submodule is configured to, when the target detection result is that no target object corresponding to the voice instruction exists in the real-time environment image, analyze all targets in the target detection result, determine the embedding vector corresponding to each target, calculate the Euclidean distance between the embedding vector of each target and that of the target object, delete from the target detection result any target whose Euclidean distance is greater than a preset threshold, further delete any target whose related area contains points marked as explored in the corresponding point cloud, sort all targets whose Euclidean distances are smaller than the preset value from near to far, and take the first-ranked target as the auxiliary target with the highest degree of association with the target object.
In one embodiment, the feedback mode of the action instruction is at least one of the following: voice feedback and frequency buzzing feedback; the visual sensor is a wearable helmet-type RGB-D camera.
To facilitate understanding of the above-described embodiments of the present invention, a detailed description is given below in terms of a practical application.

According to an embodiment of the invention, the object searching system for the visually impaired comprises a helmet, with an RGB-D camera and a microphone fixed on the helmet. As shown in fig. 3, the software system comprises a prior knowledge module, a visual SLAM positioning and mapping module, a sub-goal reasoning module, a path planning module, an action generation module and a feedback module.

The prior knowledge module consists of three submodules: scene graph generation, knowledge aggregation and knowledge encoding. Starting from an unlabeled RGB image data set, the data passes through these three submodules in turn, and the object spatial relationship prior knowledge is learned and encoded in the form of graph embedding vectors.
The visual SLAM positioning and mapping module takes the RGB image and the depth image as input, estimates the 6-DOF user pose in real time and constructs a two-dimensional occupancy grid map in real time, providing data support for the subsequent sub-goal reasoning and path planning modules.

The sub-goal reasoning module analyzes the image acquired by the visual sensor in real time and decides the global navigation target. Specifically, when the semantic target to be searched is detected in the real-time image, the global navigation target is set to the detected semantic target; when it is not detected, sub-goal reasoning is performed using the object spatial relationship prior knowledge obtained from the prior knowledge module, a target in the current field of view near which the task target is most likely to be found is obtained as a sub-goal, and the global navigation target is updated with this sub-goal.

The path planning module plans a feasible path capable of reaching the global navigation target, using a path planning algorithm, according to the user pose and the two-dimensional occupancy grid map acquired by the visual SLAM positioning and mapping module and the global navigation target acquired by the sub-goal reasoning module.

The action generation module calculates action instructions according to the user pose acquired from the visual SLAM positioning and mapping module and the feasible path acquired from the path planning module, and these instructions are used to guide the user to gradually reach the global navigation target inferred by the sub-goal reasoning module.

The feedback module transmits the action instruction calculated by the action generation module to the user.
In specific application, the work flow is shown in fig. 4, and includes the following steps:
(1) Aggregate the object spatial relationship knowledge graph from the unlabeled RGB image data set. This process can be described by the following formula:

SG = SGG(D), G = F_aggr(SG), G′ = F_filt(G)

where D represents the unlabeled RGB image data set, SGG represents the scene graph generation model, SG represents the scene graph set, G represents the preliminary knowledge graph, G′ represents the final aggregated object spatial relationship knowledge graph, F_aggr represents the process function from SG to G, and F_filt represents the process function from G to G′. In this embodiment, the RGB image data set is the RGB image portion of the SUN RGB-D data set, which contains 10335 indoor scene RGB images. The specific aggregation steps are as follows:

(1-1) Generate the scene graph set from the image data set through the target detection model and the scene graph generation model. A single scene graph sg is defined as:
sg={V,E,L,P,φ,ψ}
where V denotes the node set of sg; E denotes the edge set of sg, and any e_i,j ∈ E denotes a directed edge connecting nodes v_i and v_j, with v_i, v_j ∈ V; L denotes the set of labels detected in the image by the target detection model; P denotes the set of relation types detected by the scene graph model; and the mappings φ: V → L and ψ: E → P are surjections. A scene graph set can thus be derived from the image data set. In an arbitrary scene graph sg, for any edge e_i,j ∈ E, if φ(v_i) ≠ φ(v_j), nodes v_i and v_j are considered spatially close; that is, only the "presence or absence" of a spatial relationship is considered, and the specific relation category is disregarded. The scene graph set can be further represented as:
SG={V,E,L,φ}
where V denotes the union of the node sets of all scene graphs in the scene graph set; E denotes the union of their edge sets; L denotes the union of their label sets; φ: V → L is the corresponding label mapping; and the number of scene graphs equals the number of images in the RGB image data set.
(1-2) Aggregate the preliminary knowledge graph G from SG. The knowledge graph G obtained from SG through the aggregation F_aggr is defined as:

G = {V, E, L, C, φ, η}
where V denotes the node set of G; E denotes the edge set of G; L denotes the label set of G; φ: V → L is a bijection; C denotes the set of frequencies, one for each edge e_i,j ∈ E; and η: E → C is a bijection. The process function F_aggr operates as follows:
(1-2-1) In many cases L contains labels (e.g., "human", "dog") that should not be recommended as sub-goals. All labels unsuitable as sub-goals are manually marked and collected into a set, and the label set of G is obtained by removing this set from the label set of SG.
(1-2-2) The mapping from the nodes of SG to the nodes of G sends each node to the node of G corresponding to its label, i.e. v_i ↦ φ(v_i) = l_t, where v_i belongs to the node set V and l_t belongs to the label set L; all nodes sharing the same label are thereby merged into a single node of G.
(1-2-3) For the processing of relationships between objects, the invention focuses on relationships "between labels" (e.g., <cup1, on, desk1> and <cup2, on, desk2> are both equivalent to <cup, on, desk>) rather than relationships "between individuals". The mapping from the edges of SG to the edges of G accordingly sends each edge to the edge connecting the labels of its endpoints, i.e. e_i,j ↦ (φ(v_i), φ(v_j)), where e_i,j belongs to the edge set E, v_i, v_j belong to the node set V, and their labels belong to the label set L. The frequency of each edge of G counts how many edges of SG are merged into it, which completes the aggregation of the knowledge graph G.
(1-3) Obtain the object spatial relationship knowledge graph G′ from the preliminary knowledge graph G through further processing. To ensure that a more generalized prior knowledge ORP can be learned, the relationships in G are statistically processed by F_filt. The object relationship knowledge graph obtained after F_filt is denoted G′ and consists of a node set, an edge set and a target label set. A threshold T_filt is set, and the relationships in G whose frequency is lower than T_filt are discarded; G′ is the graph that remains after this filtering.
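By way of illustration only, the following Python sketch outlines how the aggregation F_aggr and the statistical filtering F_filt described above could be implemented. The dict-based scene-graph format, the excluded-label set and the function name are assumptions of this sketch, not part of the embodiment.

```python
from collections import Counter

def aggregate_knowledge_graph(scene_graphs, excluded_labels, t_filt):
    """Sketch of F_aggr followed by F_filt.

    scene_graphs:    list of dicts {"nodes": {node_id: label}, "edges": [(id_i, id_j), ...]}
    excluded_labels: labels manually marked as unsuitable sub-goals (e.g. "human", "dog")
    t_filt:          frequency threshold; rarer label-pair relations are discarded
    """
    freq = Counter()  # frequency set C, keyed by unordered label pair
    for sg in scene_graphs:
        labels = sg["nodes"]
        for i, j in sg["edges"]:
            li, lj = labels[i], labels[j]
            # keep only "presence or absence" of a relation between two different labels
            if li == lj or li in excluded_labels or lj in excluded_labels:
                continue
            freq[tuple(sorted((li, lj)))] += 1
    # F_filt: keep only label pairs observed at least t_filt times
    edges = {pair: count for pair, count in freq.items() if count >= t_filt}
    nodes = sorted({label for pair in edges for label in pair})
    return nodes, edges  # node set and weighted edge set of the filtered knowledge graph
```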
(2) Acquire the object spatial relationship prior knowledge encoded in the form of graph embedding vectors from the object spatial relationship knowledge graph. To fully utilize the information of neighboring nodes and edges in the graph structure, graph data is typically processed with a graph convolutional network. Ordinary graph convolutional network training requires labels, but in the sub-goal recommendation task an exact "ground truth" cannot be defined. For example, when the task goal is an apple and the sub-goal is a refrigerator, this only means that apples are "more likely" to be found near the refrigerator, not that they are "certain" to be found there. Therefore, the invention adopts an unsupervised variational graph auto-encoder method to obtain the graph embedding vectors. Based on an encoder-decoder model, the variational graph auto-encoder encodes the known graph through a graph convolutional network to learn the distribution of the node vector representations, samples node vector representations from this distribution, and then decodes them (link prediction) to reconstruct the graph. The specific steps are as follows:
(2-1) Represent the object spatial relationship knowledge graph G′ as (X, A), where X denotes the feature matrix obtained by one-hot encoding all nodes of the graph and A denotes the adjacency matrix of the graph.
(2-2) Train the variational graph auto-encoder on the graph G′, and use the trained encoder to encode (X, A) to obtain the embedding vectors Z, which are expressed by the following formula:
Z = encoder_VGAE(X, A)
Since φ and η are bijections, a bijective mapping σ from labels to embedding vectors can be obtained. The object spatial relationship prior knowledge ORP, encoded in the form of graph embedding vectors, is thus obtained.
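A minimal sketch of this variational graph auto-encoder step is given below, using PyTorch Geometric as an assumed implementation (the embodiment does not name a specific library); x is the one-hot node feature matrix and edge_index the adjacency information in COO form.

```python
import torch
from torch_geometric.nn import GCNConv, VGAE

class GCNEncoder(torch.nn.Module):
    """Two-layer GCN encoder producing the mean and log-std of the latent node embeddings."""
    def __init__(self, in_channels, hidden_channels, latent_channels):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv_mu = GCNConv(hidden_channels, latent_channels)
        self.conv_logstd = GCNConv(hidden_channels, latent_channels)

    def forward(self, x, edge_index):
        h = self.conv1(x, edge_index).relu()
        return self.conv_mu(h, edge_index), self.conv_logstd(h, edge_index)

def train_vgae(x, edge_index, epochs=200, lr=0.01):
    """Unsupervised training: reconstruct the graph (link prediction) from node embeddings."""
    model = VGAE(GCNEncoder(x.size(1), 32, 16))
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        z = model.encode(x, edge_index)
        loss = model.recon_loss(z, edge_index) + model.kl_loss() / x.size(0)
        loss.backward()
        optimizer.step()
    return model.encode(x, edge_index).detach()  # Z: one embedding vector per label node
```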
(3) Visual SLAM real-time localization. This embodiment uses the RGB-D version of ORB-SLAM2 for pose estimation, with RGB images and depth images as inputs, and outputs a 6-DOF camera pose estimate pos_t at each time step.
(4) Visual SLAM real-time mapping. In order to obtain a more accurate map for assisting navigation and obstacle avoidance, the method constructs a dense point cloud from the depth map to obtain the two-dimensional occupancy grid map. The specific steps are as follows:

(4-1) Map the depth map depth_t of each time step to a local dense point cloud pc_t. To avoid recommending duplicate sub-goals in the decision module, a threshold T_exp is set, and the points in pc_t whose depth values are less than T_exp are marked as "explored"; targets located in the explored area will be removed in the sub-goal reasoning stage. Then, using the camera pose of the current time step, pc_t is transformed into the global space and spliced with the global point cloud PC.

(4-2) Since the data volume of a dense point cloud is considerable, and considering the performance of the hardware devices and the real-time requirement of the system, the local dense point cloud pc_t of each time step is down-sampled. To eliminate mapping errors caused by motion as much as possible, an octree map is used, i.e. a map that is updated in real time and expresses in probabilistic form whether a node is occupied; the global point cloud PC after each splicing is converted into an octree map, and the octree map is projected onto a top-view plane to obtain the two-dimensional occupancy grid map map_t.
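For illustration, a simplified numpy sketch of the mapping from a depth image to the two-dimensional occupancy grid map is given below (pinhole back-projection, pose transform and top-view projection). The camera intrinsics, grid resolution and variable names are assumptions, the octree and down-sampling stages are omitted, and a world frame whose x-y plane is the ground is assumed.

```python
import numpy as np

def depth_to_local_cloud(depth, fx, fy, cx, cy, t_exp):
    """Back-project a depth image (in meters) to camera-frame points.
    Returns an (N, 3) point array and a boolean "explored" mark for points closer than t_exp."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    x = (u.reshape(-1) - cx) * z / fx
    y = (v.reshape(-1) - cy) * z / fy
    points = np.stack([x, y, z], axis=1)[z > 0]
    explored = points[:, 2] < t_exp
    return points, explored

def to_global(points, pose):
    """pose: 4x4 camera-to-world transform estimated by the SLAM module."""
    homogeneous = np.hstack([points, np.ones((points.shape[0], 1))])
    return (pose @ homogeneous.T).T[:, :3]

def occupancy_grid(global_points, resolution=0.05, size=200):
    """Project global points onto the assumed ground (x-y) plane into a size x size grid."""
    grid = np.zeros((size, size), dtype=np.uint8)
    cells = np.floor(global_points[:, :2] / resolution).astype(int) + size // 2
    inside = (cells >= 0).all(axis=1) & (cells < size).all(axis=1)
    grid[cells[inside, 0], cells[inside, 1]] = 1  # 1 = occupied
    return grid
```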
(5) Carry out sub-goal reasoning and calculate the global navigation target. When the system is initialized, the microphone receives the voice instruction issued by the user and the target g to be searched is set; an empty list list_sub-goal is initialized as a candidate list for recording sub-goals. The specific steps are as follows:
(5-1) The RGB image rgb_t of each time step is sent to the target detection model to obtain the target detection result; Fast-RCNN is adopted in this embodiment. The target detection result corresponding to one image is denoted OBJ_t = {B_t, L_t}, where B_t and L_t respectively represent the set of detected target boxes and the set of detected labels. The following operations are performed on the detection result OBJ_t of rgb_t:
(5-1-1) Select an arbitrary target from OBJ_t and query the ORP to obtain its corresponding embedding vector. A threshold T_infe is set, and the Euclidean distance between this embedding vector and the embedding vector corresponding to g is calculated; if the distance is greater than T_infe, the target is removed from OBJ_t and this step is repeated.
(5-1-2) If the point cloud corresponding to the area related to the target contains points marked as "explored", the target is removed from OBJ_t and step (5-1-1) is restarted.

(5-1-3) Otherwise, the target is added to list_sub-goal as a candidate sub-goal and removed from OBJ_t.

(5-1-4) Steps (5-1-1) to (5-1-3) are repeated until OBJ_t is empty.
(5-2) list_sub-goal is reordered from near to far according to the distance to g, and list_sub-goal is then output as the result of sub-goal reasoning.
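An illustrative sketch of the per-frame sub-goal reasoning loop of step (5-1) is given below; the detector output format, the ORP lookup orp[label] and the is_explored helper are assumed stand-ins for the components described above.

```python
import numpy as np

def infer_sub_goals(detections, goal_label, orp, t_infe, is_explored):
    """Filter detected targets and rank them as candidate sub-goals.

    detections:  list of (label, box) pairs from the detector
    goal_label:  the object named in the user's voice instruction (g)
    orp:         dict mapping label -> graph embedding vector (prior knowledge)
    t_infe:      maximum allowed embedding distance to the goal
    is_explored: callable(box) -> True if the box's point cloud is marked "explored"
    """
    goal_vec = orp[goal_label]
    candidates = []
    for label, box in detections:
        if label not in orp:
            continue
        distance = float(np.linalg.norm(orp[label] - goal_vec))
        if distance > t_infe:
            continue  # too weakly associated with the goal
        if is_explored(box):
            continue  # this region has already been searched
        candidates.append((distance, label, box))
    candidates.sort(key=lambda c: c[0])  # smallest embedding distance first
    return candidates  # first entry = sub-goal most strongly associated with the goal
```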
(6) Path planning. The observation range of the system changes constantly as the user moves, and the map is updated in real time, so a feasible path needs to be re-planned after every map update. On the one hand, in the task of the invention it is not critical whether the path is optimal; on the other hand, considering the real-time requirement of the system and in order to reduce the hardware computation load, this embodiment adopts a sampling-based interpolated RRT* algorithm for path planning and outputs a planned path Path_t.
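The embodiment uses a sampling-based interpolated RRT* planner; purely as an illustrative stand-in, the sketch below plans on the two-dimensional occupancy grid map with a 4-connected breadth-first search, which is not the embodiment's algorithm but shows how the grid, the user cell and the goal cell fit together.

```python
from collections import deque

def plan_on_grid(grid, start, goal):
    """Breadth-first search over free cells (grid value 0). Returns a list of cells or None."""
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]  # from the user's cell to the goal cell
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None  # no feasible path in the current map
```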
(7) Generate the action instruction. Humans are not sensitive to numerical angles and step sizes. Meanwhile, when the steering angle is small, first turning and then moving forward is also an awkward action for a person. Considering these factors, the invention defines an action set whose elements a_i denote, in order, "front left", "left turn", "forward", "front right" and "right turn", where "front left" and "front right" are used when the steering angle is small. The specific steps of generating the action instruction are as follows:
(7-1) For the planned path Path_t, the deviation angle θ_t is calculated according to the current user pose pos_t. A threshold T_act is set. The specific calculation steps are as follows:
(7-1-1) The position of the user is recorded as a point o, and the current orientation of the user is recorded. The minimum distance record d_min is initialized to a rather large number (10000 meters in this example), and the initial navigation point is recorded.

(7-1-2) A point is selected from the path Path_t planned in step (6), and the Euclidean distance d_i between this point and o is computed. If d_i > T_act and d_i < d_min, then d_i is assigned to d_min, the point is taken as the navigation point, and the point is removed from Path_t.
(7-2) Angle thresholds Θ_1 and Θ_2 (Θ_1 < Θ_2) used for determining the type of action are set. The action instruction a_t is then determined by comparing the deviation angle θ_t with these thresholds.
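A sketch of the action-selection rule of step (7-2) is given below. The exact formula is not reproduced in the text above, so the thresholds are applied here in the way the surrounding description suggests (small deviation: slight correction; large deviation: turn), which should be read as an assumption.

```python
def select_action(theta_t, theta_1, theta_2):
    """Map the deviation angle theta_t to one of the five actions in the action set.
    Assumption: theta_t is in radians and positive when the navigation point lies to the left."""
    magnitude = abs(theta_t)
    if magnitude < theta_1:
        return "forward"
    if magnitude < theta_2:
        return "front left" if theta_t > 0 else "front right"
    return "left turn" if theta_t > 0 else "right turn"
```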
(8) Feed the action instruction back to the user. Two acoustic feedback approaches are used: voice broadcast and beeps. Voice broadcast is suitable for quiet environments; beeps are suitable for noisy environments in which a voice broadcast cannot be heard clearly. The content of the voice broadcast includes the action instruction set as well as "task success" and "task failure", which respectively indicate that the task target has been found or has not been found. Correspondingly, the beeps use different channels and different frequencies to prompt the user for the corresponding action. Specifically, a two-channel very-low-frequency beep corresponds to "forward", with the frequency set to 0.5 Hz in this embodiment; the low- and high-frequency beeps of the left channel correspond to "front left" and "left turn" respectively, and the right channel is analogous, with the low and high frequencies set to 2 Hz and 20 Hz respectively in this embodiment.
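Finally, a sketch of the feedback mapping described above; the (channel, frequency) pairs follow the example values given in this embodiment, while the speak and beep callables are placeholders for an actual audio backend.

```python
# Assumed mapping of actions to (channel, beep frequency in Hz), following the example values above.
BEEP_MAP = {
    "forward":     ("both",  0.5),
    "front left":  ("left",  2.0),
    "left turn":   ("left",  20.0),
    "front right": ("right", 2.0),
    "right turn":  ("right", 20.0),
}

def feedback(action, quiet_environment, speak, beep):
    """Deliver an action instruction: voice broadcast in quiet environments, beeps in noisy ones.
    speak(text) and beep(channel, hz) are placeholders for the actual audio backend."""
    if quiet_environment:
        speak(action)
    else:
        channel, hz = BEEP_MAP[action]
        beep(channel, hz)
```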
In summary, according to the technical scheme of the invention, environment images are acquired through the visual sensor, and computer vision, scene graph, knowledge graph and visual simultaneous localization and mapping technologies are used to assist visually impaired people in searching for corresponding articles according to a voice instruction in a complex indoor environment, and to provide corresponding navigation and actions to meet their object searching needs, thereby improving the quality of life of the visually impaired.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (10)
1. An object searching method for visually impaired people, comprising:
acquiring an environment image of a current environment where a user is located through a pre-configured visual sensor to form an image data set; aggregating an object spatial relationship knowledge graph according to the image data set; acquiring object spatial relationship prior knowledge encoded in the form of graph embedding vectors according to the object spatial relationship knowledge graph;
according to the real-time environment image acquired by the visual sensor, dense point cloud is constructed to acquire a two-dimensional occupancy grid map; carrying out attitude estimation on the real-time environment image to acquire a camera pose output at each time step;
acquiring a voice instruction of a user; detecting the real-time environment image through a pre-configured target detection model according to the voice instruction; detecting whether a target object corresponding to the voice instruction exists in the real-time environment image;
when the detection result is that a target object corresponding to the voice instruction exists in the real-time environment image, performing path planning through a two-dimensional occupancy grid map and determining a navigation path; when the detection result is that no target object corresponding to the voice instruction exists in the real-time environment image, determining an auxiliary target with the highest degree of association with the target object through the object spatial relationship prior knowledge encoded as graph embedding vectors, and performing path planning through the two-dimensional occupancy grid map according to the auxiliary target to determine a navigation path;
and calculating the current attitude deviation of the user according to the camera pose and the navigation path, calculating an action instruction of the user for navigation and walking according to the navigation path according to the attitude deviation, and feeding the action instruction back to the user to prompt the user to execute the navigation path according to the action instruction to search for an object.
2. The object searching method for visually impaired people according to claim 1, wherein aggregating the object spatial relationship knowledge graph from the image dataset and obtaining the object spatial relationship prior knowledge encoded in the form of graph embedding vectors from the object spatial relationship knowledge graph comprises:
generating a scene graph set from the image dataset with a pre-configured target detection model and a scene graph generation model;
aggregating the scene graph set to obtain a preliminary object spatial relationship knowledge graph, and performing statistical processing on the preliminary object spatial relationship knowledge graph to obtain the object spatial relationship knowledge graph;
obtaining, from the object spatial relationship knowledge graph, the object spatial relationship prior knowledge encoded in the form of graph embedding vectors with a variational graph auto-encoder, an unsupervised learning method;
wherein, when the scene graph set is aggregated to obtain the preliminary object spatial relationship knowledge graph, all labels that have been manually marked in advance as unsuitable targets are deleted.
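The aggregation and statistical processing recited in claim 2 could, for example, be approximated as in the sketch below: scene-graph triples from many images are counted, labels pre-marked as unsuitable targets are discarded, and rarely observed relations are pruned; the resulting graph would then be passed to a variational graph auto-encoder (not shown). The label set, threshold, and all names are illustrative assumptions, not part of the disclosure.

```python
# Illustrative sketch of aggregating scene-graph triples into a preliminary
# object spatial relationship knowledge graph with simple statistics.
from collections import Counter

UNSUITABLE_LABELS = {"wall", "floor", "ceiling"}  # example of pre-marked labels

def aggregate_knowledge_graph(scene_graphs, min_count=3):
    """scene_graphs: iterable of triple lists, e.g. [("cup", "on", "table"), ...]."""
    counts = Counter()
    for triples in scene_graphs:
        for subj, rel, obj in triples:
            if subj in UNSUITABLE_LABELS or obj in UNSUITABLE_LABELS:
                continue                       # drop labels unsuitable as targets
            counts[(subj, rel, obj)] += 1
    # keep only statistically supported spatial relations
    return {edge: n for edge, n in counts.items() if n >= min_count}
```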
3. The object searching method for visually impaired people according to claim 2, wherein constructing a dense point cloud from the real-time environment image acquired by the visual sensor to obtain the two-dimensional occupancy grid map comprises:
obtaining a depth map at each time step from the real-time environment image acquired by the visual sensor, and mapping the depth map of each time step into a local dense point cloud;
deleting points below a preset threshold from the local dense point cloud, transforming the remaining local dense point cloud into the global space, and splicing it with the global point cloud; and
converting the global point cloud after each splicing into an octree map, projecting the octree map onto a top-view plane, and obtaining the two-dimensional occupancy grid map from the projected top-view plane.
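A rough, non-limiting sketch of the pipeline in claim 3 follows: a depth map is back-projected with pinhole intrinsics, points below a distance threshold are deleted, and the remainder is transformed into the global frame with the camera pose and projected onto a top-down occupancy grid. The octree step is omitted, and all parameters and names are assumptions of the sketch.

```python
# Hypothetical sketch: depth map -> local point cloud -> global frame ->
# top-down occupancy grid. Cell size, grid size, and thresholds are assumed.
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy, min_depth=0.3):
    """Back-project a (H, W) depth map into an (N, 3) camera-frame point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    valid = z > min_depth                      # delete points under the threshold
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x[valid], y[valid], z[valid]], axis=1)

def to_occupancy_grid(points_cam, T_world_cam, cell=0.05, size=200):
    """Transform camera-frame points with a 4x4 pose and rasterize a 2D grid."""
    pts_h = np.c_[points_cam, np.ones(len(points_cam))]        # homogeneous coords
    pts_w = (T_world_cam @ pts_h.T).T[:, :3]                   # splice into global frame
    grid = np.zeros((size, size), dtype=np.uint8)
    ij = np.floor(pts_w[:, :2] / cell).astype(int) + size // 2  # top-down projection
    ok = (ij >= 0).all(axis=1) & (ij < size).all(axis=1)
    grid[ij[ok, 1], ij[ok, 0]] = 1                              # mark occupied cells
    return grid
```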
4. The object searching method for visually impaired people according to claim 3, wherein detecting the real-time environment image with the target detection model according to the voice instruction and detecting whether a target object corresponding to the voice instruction exists in the real-time environment image comprises:
parsing the voice instruction and determining the target object corresponding to the voice instruction; detecting the image at each time step of the real-time environment image with the target detection model to obtain a target detection result;
when the target detection result is that a target object corresponding to the voice instruction exists in the real-time environment image, taking the corresponding target object in the target detection result as the target object to be searched for;
when the target detection result is that no target object corresponding to the voice instruction exists in the real-time environment image, analyzing all targets in the target detection result and determining the embedding vector corresponding to each target; calculating the Euclidean distance between the embedding vector of each target and that of the target object; when the Euclidean distance is larger than a preset threshold, deleting that target from the target detection result; further analyzing all targets in the target detection result, determining whether the region associated with each target contains a point marked as explored in the corresponding point cloud, and if so, deleting that target from the target detection result; and
sorting all targets whose Euclidean distance is smaller than the preset threshold from nearest to farthest, and taking the first target in the sorted order as the auxiliary target most closely associated with the target object.
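For illustration of the auxiliary-target selection in claim 4, the sketch below ranks detected objects by the Euclidean distance between their graph embedding vectors and that of the requested object, drops candidates beyond a threshold or whose region is already marked explored, and returns the nearest remaining one. The data structures, threshold, and names are assumptions of the sketch.

```python
# Illustrative sketch: choose the auxiliary target most closely associated
# with the requested object in graph-embedding space.
import numpy as np

def pick_auxiliary_target(target_name, detections, embeddings,
                          explored, max_dist=1.0):
    """detections: list of detected label names; embeddings: name -> np.ndarray;
    explored: set of labels whose point-cloud region is already marked explored."""
    target_vec = embeddings[target_name]
    candidates = []
    for label in detections:
        if label in explored or label not in embeddings:
            continue                      # skip explored or unknown labels
        d = float(np.linalg.norm(embeddings[label] - target_vec))
        if d <= max_dist:                 # keep only sufficiently related objects
            candidates.append((d, label))
    if not candidates:
        return None                       # no usable auxiliary target
    return min(candidates)[1]             # closest in embedding space
```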
5. The object searching method for visually impaired people according to claim 1, wherein the action instruction is fed back in at least one of the following ways: voice feedback and frequency-coded buzzing feedback; and the visual sensor is a wearable, helmet-mounted RGB-D camera.
6. An object searching system for visually impaired people, comprising:
a prior knowledge module, configured to acquire an environment image of the current environment where a user is located by a pre-configured visual sensor to form an image dataset, aggregate an object spatial relationship knowledge graph from the image dataset, and obtain object spatial relationship prior knowledge encoded in the form of graph embedding vectors from the object spatial relationship knowledge graph;
a visual localization and mapping module, configured to construct a dense point cloud from the real-time environment image acquired by the visual sensor to obtain a two-dimensional occupancy grid map, and perform pose estimation on the real-time environment image to obtain a camera pose output at each time step;
a target determination module, configured to acquire a voice instruction of the user, detect the real-time environment image with a pre-configured target detection model according to the voice instruction, and detect whether a target object corresponding to the voice instruction exists in the real-time environment image;
a path planning module, configured to, when the detection result is that a target object corresponding to the voice instruction exists in the real-time environment image, perform path planning on the two-dimensional occupancy grid map and determine a navigation path; and, when the detection result is that no target object corresponding to the voice instruction exists in the real-time environment image, determine an auxiliary target most closely associated with the target object from the object spatial relationship prior knowledge encoded as graph embedding vectors, and perform path planning on the two-dimensional occupancy grid map according to the auxiliary target to determine a navigation path; and
an action generation and feedback module, configured to calculate the current pose deviation of the user from the camera pose and the navigation path, calculate, according to the pose deviation, an action instruction for the user to walk along the navigation path, and feed the action instruction back to the user to prompt the user to follow the navigation path according to the action instruction and find the object.
7. The object searching system for visually impaired people according to claim 6, wherein the prior knowledge module comprises a scene graph generation submodule, a knowledge aggregation submodule, and a knowledge encoding submodule, wherein:
the scene graph generation submodule is configured to generate a scene graph set from the image dataset with a pre-configured target detection model and a scene graph generation model;
the knowledge aggregation submodule is configured to aggregate the scene graph set to obtain a preliminary object spatial relationship knowledge graph, and perform statistical processing on the preliminary object spatial relationship knowledge graph to obtain the object spatial relationship knowledge graph, wherein, when the scene graph set is aggregated to obtain the preliminary object spatial relationship knowledge graph, all labels that have been manually marked in advance as unsuitable targets are deleted; and
the knowledge encoding submodule is configured to obtain, from the object spatial relationship knowledge graph, the object spatial relationship prior knowledge encoded in the form of graph embedding vectors with a variational graph auto-encoder, an unsupervised learning method.
8. The object searching system for visually impaired people according to claim 7, wherein the visual localization and mapping module comprises a point cloud mapping submodule, a point cloud splicing submodule, and a map projection submodule, wherein:
the point cloud mapping submodule is configured to obtain a depth map at each time step from the real-time environment image acquired by the visual sensor, and map the depth map of each time step into a local dense point cloud;
the point cloud splicing submodule is configured to delete points below a preset threshold from the local dense point cloud, transform the remaining local dense point cloud into the global space, and splice it with the global point cloud; and
the map projection submodule is configured to convert the global point cloud after each splicing into an octree map, project the octree map onto a top-view plane, and obtain the two-dimensional occupancy grid map from the projected top-view plane.
9. The object searching system for visually impaired people according to claim 8, wherein the target determination module comprises a target detection submodule, a target acquisition submodule, and a target reasoning submodule, wherein:
the target detection submodule is configured to parse the voice instruction, determine the target object corresponding to the voice instruction, and detect the image at each time step of the real-time environment image with the target detection model to obtain a target detection result;
the target acquisition submodule is configured to, when the target detection result is that a target object corresponding to the voice instruction exists in the real-time environment image, take the corresponding target object in the target detection result as the target object to be searched for; and
the target reasoning submodule is configured to, when the target detection result is that no target object corresponding to the voice instruction exists in the real-time environment image, analyze all targets in the target detection result and determine the embedding vector corresponding to each target; calculate the Euclidean distance between the embedding vector of each target and that of the target object; when the Euclidean distance is larger than a preset threshold, delete that target from the target detection result; further analyze all targets in the target detection result, determine whether the region associated with each target contains a point marked as explored in the corresponding point cloud, and if so, delete that target from the target detection result; and sort all targets whose Euclidean distance is smaller than the preset threshold from nearest to farthest, and take the first target in the sorted order as the auxiliary target most closely associated with the target object.
10. The object searching system for visually impaired people according to claim 6, wherein the action instruction is fed back in at least one of the following ways: voice feedback and frequency-coded buzzing feedback; and the visual sensor is a wearable, helmet-mounted RGB-D camera.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210516416.3A CN115218903A (en) | 2022-05-12 | 2022-05-12 | Object searching method and system for visually impaired people |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115218903A true CN115218903A (en) | 2022-10-21 |
Family
ID=83607635
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210516416.3A Pending CN115218903A (en) | 2022-05-12 | 2022-05-12 | Object searching method and system for visually impaired people |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115218903A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117001715A (en) * | 2023-08-30 | 2023-11-07 | 哈尔滨工业大学 | Intelligent auxiliary system and method for visually impaired people |
WO2024099238A1 (en) * | 2022-11-11 | 2024-05-16 | 北京字跳网络技术有限公司 | Assistive voice navigation method and apparatus, electronic device, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||