WO2023245615A1

WO2023245615A1 - Blind guiding method and apparatus, and readable storage medium

Info

Publication number: WO2023245615A1
Application number: PCT/CN2022/101093
Authority: WO
Inventors: 宋呈群; 程俊; 吴福祥; 郭海光; 高向阳
Original assignee: 中国科学院深圳先进技术研究院
Priority date: 2022-06-24
Filing date: 2022-06-24
Publication date: 2023-12-28

Abstract

Provided are a blind guiding method and apparatus, and a readable storage medium, being suitable for the technical field of blind guiding. The method comprises: acquiring a first image sequence by means of a camera, the first image sequence comprising a first image at a first moment (201); according to the first image sequence, acquiring an environment map at the first moment and pose information of the camera in the environment map at the first moment (202); identifying a first object in the first image, and acquiring semantic information of the first object and a mask of the first object (203); projecting the mask of the first object into the environment map at the first moment, and acquiring three-dimensional position information of the first object (204); according to the pose information of the camera in the environment map at the first moment and the three-dimensional position information of the first object, acquiring relative pose information between the camera and the first object (205); and broadcasting the relative pose information and the semantic information of the first object (206). The method and apparatus are beneficial for helping visually impaired people to perceive the surrounding environment more accurately, and the reliability of the blind guiding method is improved.

Description

Blind guide method, device and readable storage medium

Technical field

The present application belongs to the technical field of guiding the blind, and in particular relates to a guiding method, device and readable storage medium.

Background technique

With the gradual development of road traffic, complex road conditions pose great difficulties for visually impaired people when traveling, and they need the assistance of guide devices. However, the guidance methods adopted by existing blind guide devices cannot provide sufficient convenience for the visually impaired. For example, a guide device based on the Global Positioning System (GPS) is easily affected by the environment, has low accuracy in some areas, and cannot provide effective environmental information. When a visually impaired person uses a guide device to travel, if the environmental information provided by the guide device is not accurate enough, the person cannot accurately judge their location or find their destination based on that information.

Therefore, existing guide devices cannot help visually impaired people accurately perceive the surrounding environment during actual travel, and their reliability is low.

Technical problem

One of the purposes of the embodiments of this application is to provide a method, device and readable storage medium for blind guidance, which can help visually impaired people accurately perceive the surrounding environment and improve the reliability of the blind guidance method.

Technical solutions

In a first aspect, embodiments of the present application provide a method for guiding the blind, including:

Acquire a first image sequence through a camera, where the first image sequence includes a first image at a first moment;

According to the first image sequence, obtain the environment map at the first moment and the pose information of the camera in the environment map at the first moment;

Identify a first object in the first image, and obtain semantic information of the first object and a mask of the first object;

Project the mask of the first object onto the environment map at the first moment to obtain the three-dimensional position information of the first object;

Obtain relative pose information between the camera and the first object based on the pose information of the camera in the environment map at the first moment and the three-dimensional position information of the first object;

The relative pose information and the semantic information of the first object are broadcast.

In a possible implementation, the environment map at the first moment includes a plurality of feature points; projecting the mask of the first object onto the environment map at the first moment to obtain the three-dimensional position information of the first object includes:

Project the mask of the first object onto the environment map at the first moment, and obtain the target feature points corresponding to the mask from the plurality of feature points;

According to the three-dimensional position information of the target feature point, the three-dimensional position information of the center of the mask in the environment map at the first moment is obtained and determined as the three-dimensional position information of the first object.

In a possible implementation, obtaining the mask of the first object includes:

According to the semantic information of the first object, a mask of the first object is generated on the first object; the area size and shape of the mask of the first object are related to the type of the first object.

In a possible implementation, the relative pose information includes the relative distance and relative angle between the camera and the first object;

The broadcasting of the relative pose information and the semantic information of the first object includes:

If the relative distance is within the preset distance range, or the relative angle is within the preset angle range, the relative pose information and the semantic information of the first object are broadcast.

In a possible implementation, the method further includes:

Acquire a second image sequence through the camera, the second image sequence including a second image at a second time, the second time being located after the first time;

According to the second image sequence and the environment map at the first moment, obtain an intermediate environment map at the second moment and the pose information of the camera in the intermediate environment map;

The portion of the preset range around the camera in the intermediate environment map is determined as the environment map at the second moment.

By determining only the part within the preset range around the camera as the environment map at the second moment, this embodiment of the present application reduces its computational load.
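As an illustration only, the cropping of the intermediate environment map could be sketched as follows. The sketch assumes that the map is a list of three-dimensional feature points and that the preset range is a sphere of a given radius around the camera; the names `crop_map` and `radius` and all numeric values are hypothetical and are not specified in the application.

```python
import math


def crop_map(points, camera_position, radius):
    """Keep only the map points that lie within `radius` of the camera."""
    return [p for p in points if math.dist(p, camera_position) <= radius]


# Hypothetical intermediate map with three feature points (x, y, z), in meters.
intermediate_map = [(0.0, 0.0, 1.0), (3.0, 0.0, 4.0), (30.0, 0.0, 40.0)]
local_map = crop_map(intermediate_map, camera_position=(0.0, 0.0, 0.0), radius=10.0)
print(local_map)
```

Points far from the camera are dropped, so later steps only process the part of the map near the user.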

In a possible implementation, obtaining the environment map at the first moment and the pose information of the camera in the environment map at the first moment according to the first image sequence includes:

Obtain the first data through IMU;

According to the first data and the first image sequence, the environment map at the first moment and the pose information of the camera in the environment map at the first moment are obtained through the SLAM method.

In a possible implementation, the method further includes:

Receive obstacle information sent by an ultrasonic detection device; the ultrasonic detection device is used to detect whether there is an obstacle in front of the camera, and the obstacle information includes the distance between the camera and the obstacle;

Broadcast information about the obstacle.

In the second aspect, this application provides a guide device for the blind, including: an acquisition module, a map construction module, an object recognition module, a projection module, a determination module and a broadcast module;

The acquisition module is configured to acquire a first image sequence through a camera, where the first image sequence includes a first image at a first moment;

The map construction module is used to obtain the environment map of the first moment and the pose information of the camera in the environment map of the first moment according to the first image sequence;

The object recognition module is used to identify the first object in the first image, and obtain the semantic information of the first object and the mask of the first object;

The projection module is used to project the mask of the first object into the environment map at the first moment to obtain the three-dimensional position information of the first object;

The determination module is configured to obtain the relative pose information between the camera and the first object based on the pose information of the camera in the environment map at the first moment and the three-dimensional position information of the first object;

The broadcast module is used to broadcast the relative pose information and the semantic information of the first object.

In a third aspect, embodiments of the present application provide a blind guide device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to perform the method in any possible implementation of the first aspect.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium. The readable storage medium is used to store a computer program. When the computer program is executed by a processor, the method in any possible implementation of the first aspect is implemented.

In a fifth aspect, embodiments of the present application provide a computer program product which, when run on a blind guide device, causes the blind guide device to execute the method in any possible implementation of the first aspect.

Beneficial effects

The beneficial effects of the blind guiding method provided by the embodiments of the present application are as follows: images of the surrounding environment are collected in real time through a camera, an environment map is constructed based on the images, and objects in the surrounding environment are identified based on the images. By generating a mask of the object and projecting the mask into the environment map, the three-dimensional position information of the object in the environment map is obtained, and then the relative pose between the camera and the object can be obtained. By broadcasting the relative pose between the camera and the object and the semantic information of the object, the user can accurately perceive the surrounding environment, and the reliability of the blind guidance method is improved.

For the beneficial effects of the blind guiding device and the readable storage medium provided by the embodiments of the present application, please refer to the relevant descriptions in the above blind guiding method.

Description of the drawings

Figure 1 is a schematic diagram of an application scenario according to an embodiment of the present application;

Figure 2 is a schematic flowchart of a blind guiding method provided by an embodiment of the present application;

Figure 3 is a schematic diagram of a first image provided by an embodiment of the present application;

Figure 4 is a schematic diagram of the camera pose provided by the embodiment of the present application;

Figure 5 is a schematic block diagram of a blind guide device provided by an embodiment of the present application;

Figure 6 is a schematic structural diagram of a blind guide device provided by an embodiment of the present application.

Embodiments of the invention

In the following description, for the purpose of explanation rather than limitation, specific details such as specific system structures and technologies are provided to provide a thorough understanding of the embodiments of the present application. However, it will be apparent to those skilled in the art that the present application may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

It will be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or collections thereof.

It will also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrase "if determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, to mean "once determined", "in response to a determination", "once [the described condition or event] is detected" or "in response to detection of [the described condition or event]".

In addition, in the description of this application and the appended claims, the terms "first", "second", "third", etc. are only used to distinguish the description, and cannot be understood as indicating or implying relative importance.

Reference in this specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Therefore, the phrases "in one embodiment", "in some embodiments", "in other embodiments", etc. appearing in different places in this specification do not necessarily refer to the same embodiment, but rather mean "one or more but not all embodiments" unless specifically stated otherwise. The terms "includes," "comprises," "having," and variations thereof all mean "including but not limited to," unless otherwise specifically emphasized.

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, rather than all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative efforts fall within the scope of protection of this application.

This application provides a blind guiding method, specifically a blind guiding method that uses a camera to sense objects in the surrounding environment.

In a possible implementation, the camera disclosed in the embodiment of the present application is a camera on a device that the visually impaired can carry when traveling, such as a camera on a smartphone or a camera on a handheld guide device.

In one possible application scenario, the blind guiding method disclosed in the embodiment of the present application is applied to a blind guiding device with a camera. Figure 1 is a schematic diagram of an application scenario according to an embodiment of the present application, which includes a user 10, a guide device 20 and an object 30.

Among them, the user 10 may be a visually impaired person who needs to use a guide to travel.

The blind guiding device 20 is used to acquire images of the surrounding environment through a camera, and acquire an environmental map based on the image to implement the blind guiding function. In a possible implementation, the blind guide device 20 is a smartphone.

Object 30 refers to objects in the environment, such as a tree or a car.

The camera of the smartphone is used to obtain images of the surrounding environment. The smartphone obtains the environment map, the semantic information of the objects in the map, and the relative pose information of the camera and the object through the images obtained by the camera.

When a visually impaired person walks with a smartphone in hand, the smartphone broadcasts to the visually impaired person what objects are in the surrounding environment and the corresponding location of the object through voice, so as to realize the purpose of perceiving the surrounding environment and finding the destination or target object.

Optionally, the smartphone can also be connected to an ultrasonic detection device. The ultrasonic detection device is used to detect whether there are obstacles in front of the camera. The obstacle information includes the distance between the camera and the obstacles. The smartphone receives the obstacle information sent by the ultrasonic detection device and broadcasts the obstacle information.

The ultrasonic detection device is connected to the smartphone in a wired or wireless manner. In a possible implementation, the ultrasonic detection device is connected to the smartphone through a type-C interface. In another possible implementation, the ultrasonic detection device is connected to the smartphone via Bluetooth.

It is worth noting that the distance between the ultrasonic detection device and the smartphone needs to be kept within the maximum connection distance. The maximum connection distance refers to the maximum distance at which data transmission can be achieved between the ultrasonic detection device and the smartphone, and the position of the ultrasonic detection device can be approximated to the position of the smartphone.

In another possible application scenario, the blind guide device 20 provided in the embodiment of the present application may also be a device with an embedded system. In a possible implementation, the embedded system is an embedded development board based on Advanced RISC Machines (ARM).

It should be understood that the scenario in Figure 1 is only an example of a possible scenario of the embodiment of the present application, and the embodiment of the present application is not limited thereto.

The method for guiding the blind proposed in the embodiment of the present application will be described in detail below with reference to Figure 2 . Figure 2 is a schematic flowchart of a blind guiding method provided by an embodiment of the present application. The method includes the following steps:

Step 201: Acquire a first image sequence through a camera, where the first image sequence includes a first image at a first moment.

The embodiments of this application do not limit the structure of the camera. In a possible implementation, the camera is a monocular camera. In another possible implementation, the camera used in the embodiment of the present application is a binocular camera.

Optionally, when the blind guiding method provided by the embodiment of the present application is applied to a smartphone, the image is acquired through the camera of the smartphone.

It is worth noting that what is obtained through the camera in the embodiment of the present application is an image sequence composed of multiple images, and the multiple images have a temporal sequence. To put it another way, the image acquisition disclosed in the embodiments of the present application is a dynamic process, and images are continuously acquired as time develops. In order to distinguish, the image sequence here can be called the first image sequence.

The first image sequence includes the first image captured by the camera at the first moment. It is worth noting that the "first" in the first moment and the first image disclosed in the embodiments of this application is only used for distinction. The first moment can refer to any moment in the process of acquiring images by the camera, and the first image is the image acquired at that moment.

It should be noted that the embodiment of the present application does not limit the posture of the camera. When a visually impaired person walks using a guide device, the posture of the camera may change. As a result, when shooting the same environment, the postures of the objects in the images captured by the camera differ because the camera is in different postures. Illustratively, Figure 3 is a schematic diagram of the first image provided by the embodiment of the present application. The camera obtains the first image 310 of the object at the first moment. The posture of the camera changes while the visually impaired person walks, and at a moment after the first moment in the time sequence, the camera obtains image 320 of the same object.

Step 202: According to the first image sequence, obtain the environment map at the first moment and the pose information of the camera in the environment map at the first moment.

Among them, pose information is position and attitude, including position information and angle information. Specifically, the pose information in this embodiment of the present application includes the coordinates of the object in the spatial coordinate system, and the angle information between the object and the coordinate axis.

According to the foregoing, the first image sequence obtained by the embodiment of the present application includes images at different times. It is worth noting that the pose information of the camera in the environment map at the first moment is the camera pose information at the first moment. For convenience of description, it is expressed as C.

In the embodiment of the present application, a method for obtaining an environment map at the first moment and the pose information of the camera in the environment map at the first moment based on the first image sequence includes:

In one possible implementation, the simultaneous localization and mapping (SLAM) method is used to obtain the environment map at the first moment and the pose information of the camera in the environment map at the first moment.

SLAM is a method for solving positioning navigation and map construction problems. The SLAM method does not require the use of other external environmental information and can complete map construction based on images. It has the advantages of wide applicability and high computational efficiency. Optionally, in this embodiment, the environment map obtained through the SLAM method is a 3D sparse structure map. The 3D sparse structure map includes feature points, which help users perceive environmental information through the coordinates of the feature points in space. It has high accuracy, does not need to store and process too much information, and has a faster calculation speed.

Among them, the feature point in the field of image processing refers to the point where the gray value of the image changes drastically or the point with large curvature on the edge of the image (that is, the intersection of two edges).

It is worth noting that semantic information is not included in the 3D sparse structure map.

In another possible implementation, first data is obtained through an inertial measurement unit (IMU); according to the first data and the first image sequence, the environment map at the first moment and the pose information of the camera in the environment map at the first moment are obtained through the SLAM method. Here, the first data refers to the data collected by the IMU.

Specifically, the IMU can collect angular velocity and acceleration. In the actual application scenario of the embodiment of this application, the angular velocity and acceleration collected by the IMU can be approximated as the angular velocity and acceleration of the camera.

In one possible implementation, the angular velocity and acceleration collected by the IMU are fused with the image sequence information obtained by the SLAM method to obtain the environment map at the first moment and the pose information of the camera in the environment map at the first moment. In another possible implementation, the pose information of the camera is first obtained from the angular velocity and acceleration collected by the IMU, and is then fused with the camera pose information obtained by the SLAM method to obtain the environment map at the first moment and the pose information of the camera in the environment map at the first moment.

By fusing the data collected by the IMU with the SLAM method, the accuracy of the environment map obtained by the SLAM method and the position of the camera in the map can be improved.
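The application does not specify how the fusion is performed. Purely as an illustrative sketch of the second implementation above, a heading angle could be propagated from the IMU angular velocity and then blended with the heading estimated by SLAM using a fixed weight. The weighted blend, the single-axis simplification, and all names and values below are assumptions and are not the method of the application.

```python
def integrate_yaw(previous_yaw_deg, angular_velocity_dps, dt_s):
    """Propagate the heading by integrating the gyroscope rate over dt_s seconds."""
    return previous_yaw_deg + angular_velocity_dps * dt_s


def fuse_yaw(slam_yaw_deg, imu_yaw_deg, slam_weight=0.8):
    """Blend two heading estimates with a fixed, hypothetical weight.

    Assumes both angles are far from the +/-180 degree wrap-around.
    """
    return slam_weight * slam_yaw_deg + (1.0 - slam_weight) * imu_yaw_deg


# Hypothetical readings: 20 degrees per second for half a second.
imu_yaw = integrate_yaw(previous_yaw_deg=10.0, angular_velocity_dps=20.0, dt_s=0.5)
fused_yaw = fuse_yaw(slam_yaw_deg=18.0, imu_yaw_deg=imu_yaw)
print(imu_yaw, fused_yaw)
```

A practical system would typically fuse the full six-degree-of-freedom pose rather than a single angle; the sketch only shows the idea of combining the two estimates.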

Specifically, the method used in this application is the visual SLAM method, which uses cameras to realize the perception of the surrounding environment.

This application does not limit the specific method adopted by visual SLAM. Any visual SLAM method that can obtain the environment map and camera pose through images can be applied to the embodiments of this application. In a possible implementation, this application uses the monocular visual SLAM method, that is, to construct an environment map through monocular sequence images.

When constructing an environment map through monocular sequence images, only one camera is used to obtain surrounding image information, which means that only a single image of the surrounding environment can be obtained at the same time. In the application scenario where the blind guidance method provided by the embodiment of the present application is applied to smartphones, the monocular visual SLAM method has the advantages of portability and low cost.

Optionally, in one implementation, when constructing an environment map through monocular sequence images, the coordinate transformation between the camera coordinate systems (each with the camera as the origin) at two different moments is calculated by matching feature points between the two images captured at those moments, and the environment map is constructed and the camera pose is obtained in an iterative manner. Image feature points play a very important role in feature-based image matching: they reflect the essential characteristics of the image and can identify the target objects in it, so image matching can be completed through feature point matching.

Figure 4 is a schematic diagram of the camera pose provided by the embodiment of the present application. Similar to the scene shown in Figure 3, the camera has different poses at different times. The SLAM method can match the correspondence between feature points at the same position in space in images at different times. For example, the feature point P is matched in two images, and the camera coordinate systems at the two different times are (x, y, z) and (x', y', z') respectively. After obtaining the matching results of the feature point P in the images at different times, the coordinate transformation from the coordinate system (x, y, z) to (x', y', z') can be obtained through geometric analysis, and thus the camera pose is obtained.
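To make the notion of a coordinate transformation between the two camera coordinate systems concrete, the following sketch applies a known rigid transformation (a rotation and a translation) to the coordinates of a point P. It does not show how the transformation is estimated from matched feature points, which the application leaves to the SLAM method; the function names, the choice of a rotation about the y axis, and the numeric values are assumptions for illustration.

```python
import math


def rotation_about_y(angle_deg):
    """Return a 3x3 rotation matrix for a rotation about the y axis."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def transform_point(rotation, translation, point):
    """Map a point from (x, y, z) to (x', y', z') as P' = R * P + t."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# Hypothetical point P, 2 meters in front of the camera at the first moment.
p_first = (0.0, 0.0, 2.0)
p_second = transform_point(rotation_about_y(90.0), (0.0, 0.0, 1.0), p_first)
print(p_second)
```

With a 90 degree rotation about the y axis and a translation of 1 along z, the point moves to approximately (2, 0, 1) in the second coordinate system.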

Step 203: Identify the first object in the first image, and obtain the semantic information of the first object and the mask of the first object.

In the technical field to which the embodiments of this application are applied, semantic information refers to semantic information at the conceptual level, which is used to indicate what the objects in the image are. For example, there is a tree in the image, and "tree" is the semantic information in it. Regarding the mask involved in the embodiment of the present application, in the field of image processing, some operations require using a selected image to partially or completely block the image being processed, and the image that plays a blocking role is called a mask.

In the embodiment of the present application, the mask of the object is a two-dimensional plane image, and the area size and shape of the mask of the first object are related to the type of the first object. Because the mask of the first object has a shape similar to the first object, when the mask is projected into the environment map at the first moment in subsequent steps and the position of the mask is used as the three-dimensional position of the first object, the determined three-dimensional position of the first object has higher accuracy.

In the embodiment of the present application, the method for identifying the first object in the first image and obtaining the semantic information and mask of the first object includes:

In a possible implementation, the semantic information and mask of the first object are obtained through image processing.

In another possible implementation, a neural network model is used to obtain the semantic information and mask of the first object. Optionally, the neural network model adopted is a deep learning neural network. When adopting the neural network model as a method to obtain semantic information and masks, it has the advantages of high accuracy and wide coverage.

It is worth noting that the mask in the embodiment of the present application can express the semantic information of the object. Specifically, in a possible implementation of this application, the neural network needs to obtain a semantic segmentation result when obtaining the object mask. The semantic segmentation result distinguishes the different objects in the image and indicates their categories; for example, an image may include cars, people and street lights. Furthermore, the neural network generates a corresponding mask based on the recognized object, and this mask can express the category information and shape information of the object.
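As a minimal sketch of the relationship between a semantic segmentation result and a mask, the following example assumes that the segmentation result is available as a per-pixel grid of category names and derives a binary mask for one category. The grid representation, the function name `mask_for_class` and the sample labels are hypothetical; the application does not prescribe a data format or a particular network.

```python
def mask_for_class(label_map, class_name):
    """Return a binary mask: 1 where the pixel belongs to class_name, else 0."""
    return [[1 if label == class_name else 0 for label in row] for row in label_map]


# Hypothetical 2 x 3 segmentation result (rows of per-pixel category names).
label_map = [
    ["road", "tree", "tree"],
    ["road", "tree", "car"],
]
tree_mask = mask_for_class(label_map, "tree")
print(tree_mask)
```

The mask carries the shape of the object through its non-zero pixels, and the category name used to build it carries the semantic information.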

Step 204: Project the mask of the first object onto the environment map at the first moment to obtain the three-dimensional position information of the first object.

According to the previous description, when this application uses visual SLAM to construct an environment map, the resulting sparse 3D structure map does not have semantic information. By projecting the mask of the object onto the map obtained in step 202, the feature points in the sparse 3D structure map can be combined with the semantic information of the object, so that the three-dimensional position information of the first object obtained subsequently is more accurate.

Specifically, as mentioned above, the environment map includes feature points. Optionally, when projecting the mask of the first object onto the environment map at the first moment to obtain the three-dimensional position information of the first object, the method adopted includes:

Project the mask of the first object onto the environment map at the first moment, and obtain the target feature points corresponding to the mask among multiple feature points;

According to the three-dimensional position information of the target feature point, the three-dimensional position information of the center of the mask in the environment map at the first moment is obtained, and is determined as the three-dimensional position information of the first object.

It can be seen that the mask is projected to the spatial coordinates of the object corresponding to the mask, so it can be determined which feature points belong to which object. Once the correspondence between the mask of the first object and the feature points is determined, the center position of the mask is approximated as the center position of the first object, thereby determining the three-dimensional position information of the first object. The three-dimensional position information obtained in this way has higher accuracy.

For ease of expression, the three-dimensional position of the object center is expressed as Pi.
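The two sub-steps of step 204 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the pinhole intrinsics `fx, fy, cx, cy`, the pose convention (`R_wc`, `t_wc`), the function names, and the use of the mean of the target feature points as the estimate of the mask center are all hypothetical choices made for the example.

```python
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def project_to_pixel(p_world: Vec3, R_wc: Mat3, t_wc: Vec3,
                     fx: float, fy: float, cx: float, cy: float
                     ) -> Optional[Tuple[int, int]]:
    """Project a map point into the image; return None if it is behind the camera.

    Assumed convention: R_wc rotates camera axes into the map frame and
    t_wc is the camera position in the map frame.
    """
    d = [p_world[i] - t_wc[i] for i in range(3)]
    # Camera-frame coordinates: p_cam = R_wc^T * (p_world - t_wc)
    x, y, z = (sum(R_wc[r][c] * d[r] for r in range(3)) for c in range(3))
    if z <= 0.0:
        return None
    return round(fx * x / z + cx), round(fy * y / z + cy)


def object_center(points: List[Vec3], mask: Sequence[Sequence[bool]],
                  R_wc: Mat3, t_wc: Vec3,
                  fx: float, fy: float, cx: float, cy: float) -> Optional[Vec3]:
    """Return an estimate of Pi: the mean of the feature points that fall in the mask."""
    targets = []
    for p in points:
        uv = project_to_pixel(p, R_wc, t_wc, fx, fy, cx, cy)
        if uv is None:
            continue
        u, v = uv
        # mask[v][u] is True where the pixel belongs to the first object
        if 0 <= v < len(mask) and 0 <= u < len(mask[0]) and mask[v][u]:
            targets.append(p)
    if not targets:
        return None  # no target feature points for this mask
    n = len(targets)
    return tuple(sum(p[i] for p in targets) / n for i in range(3))
```

Returning `None` when no feature point falls inside the mask leaves it to the caller to decide whether the object is skipped or handled in another way.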

Step 205: Obtain relative pose information between the camera and the first object based on the camera's pose information in the environment map at the first moment and the three-dimensional position information of the first object.

Specifically, the relative pose information includes the relative distance and relative angle between the camera and the first object. In this step, based on the pose information C of the camera and the three-dimensional position Pi of the object center, the distance between the camera pose C and the object position Pi is determined, as well as the angle between the three-dimensional position Pi of the object center and the camera pose C.

In a possible application scenario of the embodiment of the present application, the camera is set on a smartphone or a blind guide device. Therefore, the position of the camera can also be understood as the position of the person. In this case, once the relative pose information between the camera and the object is obtained, the visually impaired person can determine the location of the object based on this information, which realizes the function of guiding the blind. For example, the relative pose information of a tree and the camera is "the distance between the tree and the camera is 2 meters, and the relative angle is 30 degrees." Based on this relative pose information, visually impaired people can perceive that there is a tree in the environment.
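The distance and angle of step 205 can be sketched as below. The text does not fix a coordinate convention, so this example assumes a map frame whose x-y plane is the ground, a camera heading given as a yaw angle measured counterclockwise from the x axis, and a positive relative angle meaning "to the left"; the function name `relative_pose` is hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def relative_pose(cam_xyz: Vec3, cam_yaw_deg: float, obj_xyz: Vec3) -> Tuple[float, float]:
    """Return (distance in map units, signed horizontal angle in degrees)."""
    dx = obj_xyz[0] - cam_xyz[0]
    dy = obj_xyz[1] - cam_xyz[1]
    dz = obj_xyz[2] - cam_xyz[2]
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    # Bearing of the object in the map frame, then relative to the camera heading
    bearing_deg = math.degrees(math.atan2(dy, dx))
    # Wrap the difference into [-180, 180)
    angle_deg = (bearing_deg - cam_yaw_deg + 180.0) % 360.0 - 180.0
    return distance, angle_deg
```

With the camera at the origin facing along the x axis, an object 2 meters away and 30 degrees to the right yields a distance of 2 and an angle of about -30 under these conventions.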

Step 206: Broadcast the relative pose information and the semantic information of the first object.

For example, the voice broadcast content may be "There is a person 20 degrees to the left and 5 meters ahead", or "There is a tree 30 degrees to the right and 2 meters ahead".
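A broadcast sentence of this kind can be assembled from the semantic label and the relative pose, as in the following sketch. The function name `broadcast_text`, the sign convention (positive angle means left), the rounding to whole degrees and meters, and the "straight ahead" wording are assumptions made for the example; the resulting string would then be handed to whatever text-to-speech facility the device provides.

```python
def broadcast_text(label: str, distance_m: float, angle_deg: float) -> str:
    """Build the sentence to be spoken for one recognized object."""
    degrees = round(abs(angle_deg))
    meters = round(distance_m)
    if degrees == 0:
        return f"There is a {label} {meters} meters straight ahead"
    # Assumed convention: positive angles are to the left of the camera heading
    side = "left" if angle_deg > 0 else "right"
    return f"There is a {label} {degrees} degrees to the {side} and {meters} meters ahead"
```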

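Optionally, broadcasting the relative pose information and the semantic information of the first object includes:

If the relative distance is within the preset distance range, or the relative angle is within the preset angle range, broadcasting the relative pose information and the semantic information of the first object.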

Specifically, in the actual application scenario of the embodiment of the present application, the first image may include multiple objects, but some of them are far away from the user, or lie at a large angle to the user and are not on the user's route, so announcing them does not help to guide the visually impaired person. Therefore, the purpose of guiding the blind can be achieved by voice broadcasting only the objects within a certain range in the visually impaired person's forward direction, which further provides accurate surrounding environment information for the visually impaired person's travel and improves the user's travel experience.

Here, only the preset distance range may be set, or only the preset angle range may be set, or the preset distance range and the preset angle range may be set at the same time.

Optionally, the preset distance range and/or the preset angle range are set by default or set manually.

Through multiple setting methods, the range of surrounding environment that users need to perceive when traveling can be changed, improving setting flexibility.
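The range check can be sketched as a small predicate. The claims word the condition as "or" when both ranges are configured; the `require_both` flag below is a hypothetical addition for users who prefer both conditions to hold, and the function name and the (min, max) tuple representation are likewise assumptions.

```python
from typing import Optional, Tuple

Range = Tuple[float, float]  # (minimum, maximum), inclusive


def should_broadcast(distance_m: float, angle_deg: float,
                     distance_range: Optional[Range] = None,
                     angle_range: Optional[Range] = None,
                     require_both: bool = False) -> bool:
    """Decide whether an object falls inside the configured preset range(s)."""
    checks = []
    if distance_range is not None:
        checks.append(distance_range[0] <= distance_m <= distance_range[1])
    if angle_range is not None:
        checks.append(angle_range[0] <= angle_deg <= angle_range[1])
    if not checks:
        return True  # nothing configured: every recognized object is broadcast
    return all(checks) if require_both else any(checks)
```

Leaving a range as `None` corresponds to setting only the other one, which covers the three setting modes described above.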

It can be seen that the blind guiding method provided by the embodiment of the present application collects images of the surrounding environment in real time through the camera, builds an environment map based on the images, and identifies objects in the surrounding environment based on the images. By generating a mask of the object and projecting the mask into the environment map, accurate three-dimensional position information of the object in the environment map is obtained, from which an accurate relative pose between the camera and the object can be obtained. By broadcasting the relative pose between the camera and the object and the semantic information of the object, the method enables the user to perceive the surrounding environment accurately and reminds the user to avoid obstacles, which improves the reliability of the blind guide method. Moreover, in the embodiment of the present application, the relative pose between the camera and the object and the semantic information of the object are obtained by processing the images collected by the camera in real time. No additional hardware support is required, so the equipment requirements are simple and the method is easy to implement.

Optionally, in order to reduce the computational load, the blind guiding method disclosed in the embodiment of this application also includes:

Acquire a second image sequence through the camera, the second image sequence includes a second image at a second time, and the second time is located after the first time;

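According to the second image sequence and the environment map at the first moment, obtain the intermediate environment map at the second moment and the pose information of the camera in the intermediate environment map;

Determine the portion of the intermediate environment map within a preset range around the camera as the environment map at the second moment.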

The second image sequence refers to any image sequence that is later than the first image sequence in time sequence, and the intermediate environment map refers to the environment map determined based on all image sequences obtained by the camera. The embodiment of the present application does not limit the selection of the preset range. Optionally, the preset range is derived from default or manual settings.

In this implementation, the blind guidance method provided by the embodiment of the present application will continue to construct the environment map. Therefore, as time progresses and the location of the camera changes, the environment map constructed by the embodiment of the present application continues to expand, and the corresponding pose information also continues to increase. More and more data need to be processed when determining the environment map and pose information at the second moment, which will occupy storage space and reduce computing efficiency. By retaining only the map information and pose information within the preset range, that is, continuously discarding the previous map and pose information as the walking distance increases, the calculation load of the blind guiding method provided by the embodiment of the present application can be reduced.
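Discarding map and pose information outside the preset range can be sketched as below. The embodiment does not limit how the preset range is chosen, so this example assumes it is a radius around the current camera position; the function name and the representation of the map as a list of 3D points and of the poses as a list of 3D positions are hypothetical.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def prune_to_range(map_points: List[Vec3], past_positions: List[Vec3],
                   camera_xyz: Vec3, radius_m: float
                   ) -> Tuple[List[Vec3], List[Vec3]]:
    """Keep only map points and past camera positions within radius_m of the camera."""
    kept_points = [p for p in map_points if math.dist(p, camera_xyz) <= radius_m]
    kept_positions = [q for q in past_positions if math.dist(q, camera_xyz) <= radius_m]
    return kept_points, kept_positions
```

Calling this after each map update keeps the amount of stored data bounded by the density of the map inside the radius rather than by the total distance walked.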

Optionally, as a further improvement to the embodiments of the present application, the embodiments of the present application also include a method of detecting obstacles using an ultrasonic detection device, which specifically includes the following steps:

Step A: Receive obstacle information sent by an ultrasonic detection device; the ultrasonic detection device is used to detect whether there is an obstacle in front of the camera, and the obstacle information includes the distance between the camera and the obstacle.

In one possible application scenario of the embodiment of the present application, when a smartphone is used to acquire an image sequence and construct an environment map, an ultrasonic detection device is externally connected to the smartphone.

Optionally, for the convenience of guiding the blind, the direction of ultrasonic detection is consistent with the direction of the camera, thereby making it easier for the blind to determine the specific direction of the obstacle.

Step B: Broadcast obstacle information.

It is worth noting that broadcasting obstacle information and broadcasting relative pose information and semantic information in step 206 do not affect each other. The blind guiding method disclosed in the embodiment of the present application can broadcast these two types of information. For example, the blind guiding method disclosed in the embodiment of the present application can announce through voice "Please note that there is a car 30 degrees and 3 meters to the right in front", or "Please note that there is an obstacle in front of you".
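Since the two kinds of broadcast are independent, they can simply be merged into one list of sentences, as in this sketch. The function name, the use of `None` to mean "no obstacle detected", the inclusion of the measured distance in the sentence, and the choice to announce the obstacle first are assumptions for illustration only.

```python
from typing import List, Optional


def announcements(object_messages: List[str],
                  obstacle_distance_m: Optional[float]) -> List[str]:
    """Merge the ultrasonic obstacle warning with the step 206 object messages."""
    out: List[str] = []
    if obstacle_distance_m is not None:
        # Obstacle information from the ultrasonic detection device (step A)
        out.append(
            f"Please note that there is an obstacle {obstacle_distance_m:.1f} meters in front of you"
        )
    # Object messages are passed through unchanged
    out.extend(object_messages)
    return out
```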

To sum up, the blind guiding method provided by the embodiment of the present application collects images of the surrounding environment in real time through a camera, builds an environment map based on the images, and identifies objects in the surrounding environment based on the images. By generating a mask of the object and projecting the mask into the environment map, accurate three-dimensional position information of the object in the environment map is obtained, from which an accurate relative pose between the camera and the object can be obtained. By broadcasting the relative pose between the camera and the object and the semantic information of the object, the method enables the user to perceive the surrounding environment accurately, which improves the reliability of the blind guide method.

Embodiments of this application can also broadcast only the relative pose information and corresponding object semantic information within a preset distance range or a preset angle range. The purpose of guiding the blind can be achieved by voice broadcasting objects within a certain range in the forward direction of the visually impaired person, which further provides accurate surrounding environment information for visually impaired people when traveling and improves the user's travel experience.

Embodiments of the present application can also use only the environment map within the preset range as the environment map at the second moment. That is, as the walking distance of the visually impaired person increases, only the environment map and pose information within a certain range of the surroundings are retained, which reduces the computational load.

Embodiments of the present application can also receive obstacle information detected by the ultrasonic detection device, helping the visually impaired to avoid obstacles in front of them in time, and improving the reliability of the blind guide method.

Moreover, in the embodiment of the present application, by processing the images collected by the camera in real time, the relative pose between the camera and the object and the semantic information of the object can be obtained. No additional hardware support is required, and the equipment requirements are simple and easy to implement.

Figure 5 shows a schematic block diagram of a blind guide device according to an embodiment of the present application. As shown in Figure 5, the device 500 includes: an acquisition module 510, a map construction module 520, an object recognition module 530, a projection module 540, a determination module 550, and a broadcast module 560.

The acquisition module 510 is configured to acquire a first image sequence through a camera, where the first image sequence includes a first image at a first moment.

The map construction module 520 is used to obtain the environment map at the first moment and the position and orientation information of the camera in the environment map at the first moment according to the first image sequence.

The object recognition module 530 is used to identify the first object in the first image, and obtain the semantic information of the first object and the mask of the first object.

The projection module 540 is configured to project the mask of the first object obtained by the object recognition module 530 onto the environment map at the first moment obtained by the map construction module 520 to obtain the three-dimensional position information of the first object.

The determination module 550 is used to determine the relative pose information between the camera and the first object based on the pose information of the camera in the environment map at the first moment and the three-dimensional position information of the first object.

The broadcast module 560 is used to broadcast the relative pose information obtained by the determination module 550 and the semantic information of the first object.

Optionally, the blind guide device provided by the embodiment of the present application may also include a receiving module 570, which is used to receive data sent by other devices.
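The data flow between modules 510 to 560 can be sketched as a small class that wires six callables together. This is only an illustration of the order in which the modules hand data to one another: the class name, the field names, the use of plain callables for the modules, and the choice of the last image of the sequence as the first image are all assumptions, and the optional receiving module 570 is left out.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class BlindGuideDevice:
    acquire: Callable[[], List[Any]]                   # acquisition module 510
    build_map: Callable[[List[Any]], Tuple[Any, Any]]  # map construction module 520
    recognize: Callable[[Any], Tuple[str, Any]]        # object recognition module 530
    project: Callable[[Any, Any], Any]                 # projection module 540
    determine: Callable[[Any, Any], Any]               # determination module 550
    broadcast: Callable[[Any, str], None]              # broadcast module 560

    def step(self) -> None:
        """Run one pass of the method for the current image sequence."""
        images = self.acquire()
        env_map, camera_pose = self.build_map(images)
        # Assumption: the most recent image is treated as the first image
        label, mask = self.recognize(images[-1])
        object_position = self.project(mask, env_map)
        relative = self.determine(camera_pose, object_position)
        self.broadcast(relative, label)
```

Keeping each module behind a callable makes it possible to exercise the wiring with simple stand-ins before real SLAM or recognition components are attached.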

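In a possible implementation, the environment map at the first moment includes multiple feature points, and the projection module 540 is used to:

Project the mask of the first object onto the environment map at the first moment, and obtain the target feature points corresponding to the mask among the multiple feature points;

According to the three-dimensional position information of the target feature points, obtain the three-dimensional position information of the center of the mask in the environment map at the first moment, and determine it as the three-dimensional position information of the first object.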

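In a possible implementation, the object recognition module 530 is used to:

Generate a mask of the first object on the first object according to the semantic information of the first object; the area size and shape of the mask of the first object are related to the type of the first object.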

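In a possible implementation, the relative pose information includes the relative distance and relative angle between the camera and the first object, and the broadcast module 560 is used to:

Broadcast the relative pose information and the semantic information of the first object if the relative distance is within the preset distance range, or the relative angle is within the preset angle range.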

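In a possible implementation, the acquisition module 510 is also used to:

Acquire a second image sequence through the camera, the second image sequence including a second image at a second moment, the second moment being after the first moment;

The map construction module 520 is also used to:

Obtain, according to the second image sequence and the environment map at the first moment, the intermediate environment map at the second moment and the pose information of the camera in the intermediate environment map;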

Determine the portion of the preset range around the camera in the intermediate environment map as the environment map at the second moment.

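In a possible implementation, the map construction module 520 is used to:

Obtain the first data through an IMU;

Obtain, according to the first data and the first image sequence, the environment map at the first moment and the pose information of the camera in the environment map at the first moment through the SLAM method.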

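In a possible implementation, the receiving module 570 is used to:

Receive obstacle information sent by an ultrasonic detection device; the ultrasonic detection device is used to detect whether there is an obstacle in front of the camera, and the obstacle information includes the distance between the camera and the obstacle;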

The broadcast module 560 is used to broadcast obstacle information.

Figure 6 is a schematic structural diagram of a blind guide device provided by an embodiment of the present application. As shown in Figure 6, the blind guide device 600 includes: at least one processor 60 (only one is shown in Figure 6), a memory 61, and a computer program 62 stored in the memory 61 and executable on the at least one processor 60. When the processor 60 executes the computer program 62, the steps in any of the above blind guide method embodiments (such as the method in Figure 2) are implemented.

The processor 60 may be a central processing unit (Central Processing Unit, CPU). The processor 60 may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.

The memory 61 may be an internal storage unit of the device 600 in some embodiments, such as a hard disk or memory of the blind guide device 600. In other embodiments, the memory 61 may also be an external storage device of the device 600, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) equipped on the device 600. Further, the memory 61 may also include both an internal storage unit of the device 600 and an external storage device. The memory 61 is used to store the operating system, application programs, the boot loader, data and other programs, such as the program code of the computer program. The memory 61 can also be used to temporarily store data that has been output or is to be output.

Those skilled in the art can clearly understand that, for convenience and simplicity of description, only the above division of functional units and modules is used as an example. In actual applications, the above functions can be allocated to different functional units and modules as needed; that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist physically alone, or two or more units can be integrated into one unit. The integrated unit can be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing them from each other and are not used to limit the scope of protection of the present application. For the specific working processes of the units and modules in the above system, refer to the corresponding processes in the foregoing method embodiments; they are not described again here.

Embodiments of the present application also provide a computer-readable storage medium. The computer-readable storage medium stores a computer program. When the computer program is executed by a processor, the steps in each of the above method embodiments can be implemented.

Embodiments of the present application provide a computer program product. When the computer program product runs on a blind guide device, the blind guide device implements the steps in each of the above method embodiments.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, this application can implement all or part of the processes in the methods of the above embodiments by instructing relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of each of the above method embodiments can be implemented. The computer program includes computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may at least include: any entity or device capable of carrying computer program code to the camera device/terminal device, recording media, computer memory, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), electrical carrier signals, telecommunications signals, and software distribution media, for example, a USB flash drive, a removable hard disk, a magnetic disk or an optical disc. In some jurisdictions, subject to legislation and patent practice, computer-readable media may not be electrical carrier signals and telecommunications signals.

Claims

A method for guiding the blind, which is characterized by including:

Acquire a first image sequence through a camera, where the first image sequence includes a first image at a first moment;

According to the first image sequence, obtain the environment map at the first moment and the pose information of the camera in the environment map at the first moment;

Identify a first object in the first image, and obtain semantic information of the first object and a mask of the first object;

Project the mask of the first object onto the environment map at the first moment to obtain the three-dimensional position information of the first object;

Obtain relative pose information between the camera and the first object based on the pose information of the camera in the environment map at the first moment and the three-dimensional position information of the first object;

The relative pose information and the semantic information of the first object are broadcast.
The method of claim 1, wherein the environment map at the first moment includes a plurality of feature points; and projecting the mask of the first object into the environment map at the first moment to obtain the three-dimensional position information of the first object includes:

Project the mask of the first object onto the environment map at the first moment, and obtain the target feature points corresponding to the mask from the plurality of feature points;

According to the three-dimensional position information of the target feature point, the three-dimensional position information of the center of the mask in the environment map at the first moment is obtained and determined as the three-dimensional position information of the first object.
The method of claim 2, wherein obtaining the mask of the first object includes:

According to the semantic information of the first object, a mask of the first object is generated on the first object; the area size and shape of the mask of the first object are related to the type of the first object.
The method according to any one of claims 1-3, wherein the relative pose information includes the relative distance and relative angle between the camera and the first object;

The broadcasting of the relative pose information and the semantic information of the first object includes:

If the relative distance is within the preset distance range, or the relative angle is within the preset angle range, the relative pose information and the semantic information of the first object are broadcast.
The method according to any one of claims 1-3, characterized in that the method further includes:

Acquire a second image sequence through the camera, the second image sequence including a second image at a second time, the second time being located after the first time;

According to the second image sequence and the environment map at the first moment, obtain an intermediate environment map at the second moment and the pose information of the camera in the intermediate environment map;

The portion of the preset range around the camera in the intermediate environment map is determined as the environment map at the second moment.
The method according to any one of claims 1 to 3, characterized in that obtaining, according to the first image sequence, the environment map at the first moment and the pose information of the camera in the environment map at the first moment includes:

Obtain the first data through IMU;

According to the first data and the first image sequence, the environment map at the first moment and the pose information of the camera in the environment map at the first moment are obtained through the SLAM method.
The method according to any one of claims 1-3, further comprising:

Receive obstacle information sent by an ultrasonic detection device; the ultrasonic detection device is used to detect whether there is an obstacle in front of the camera, and the obstacle information includes the distance between the camera and the obstacle;

Broadcast information about the obstacle.
A guide device for the blind, characterized in that it includes: an acquisition module, a map construction module, an object recognition module, a projection module, a determination module and a broadcast module;

The acquisition module is configured to acquire a first image sequence through a camera, where the first image sequence includes a first image at a first moment;

The map construction module is used to obtain the environment map of the first moment and the pose information of the camera in the environment map of the first moment according to the first image sequence;

The object recognition module is used to identify the first object in the first image, and obtain the semantic information of the first object and the mask of the first object;

The projection module is used to project the mask of the first object into the environment map at the first moment to obtain the three-dimensional position information of the first object;

The determination module is configured to obtain the relative pose information between the camera and the first object based on the pose information of the camera in the environment map at the first moment and the three-dimensional position information of the first object;

The broadcast module is used to broadcast the relative pose information and the semantic information of the first object.
A guide device for the blind, including a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that when the processor executes the computer program, it implements the method according to any one of claims 1 to 7.
A computer-readable storage medium stores a computer program, characterized in that when the computer program is executed by a processor, the method according to any one of claims 1 to 7 is implemented.