WO2023245615A1 - Blind guiding method and apparatus, and readable storage medium - Google Patents

Blind guiding method and apparatus, and readable storage medium

Info

Publication number
WO2023245615A1
Authority
WO
WIPO (PCT)
Prior art keywords
camera
moment
information
environment map
mask
Prior art date
Application number
PCT/CN2022/101093
Other languages
French (fr)
Chinese (zh)
Inventor
宋呈群
程俊
吴福祥
郭海光
高向阳
Original Assignee
中国科学院深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院
Priority to PCT/CN2022/101093 priority Critical patent/WO2023245615A1/en
Publication of WO2023245615A1 publication Critical patent/WO2023245615A1/en

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61H PHYSICAL THERAPY APPARATUS, e.g. DEVICES FOR LOCATING OR STIMULATING REFLEX POINTS IN THE BODY; ARTIFICIAL RESPIRATION; MASSAGE; BATHING DEVICES FOR SPECIAL THERAPEUTIC OR HYGIENIC PURPOSES OR SPECIFIC PARTS OF THE BODY
    • A61H3/00 Appliances for aiding patients or disabled persons to walk about
    • A61H3/06 Walking aids for blind persons
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements

Definitions

  • the present application belongs to the technical field of guiding the blind, and in particular relates to a guiding method, device and readable storage medium.
  • the GPS guide device is easily affected by the environment and has low accuracy in some areas, and cannot provide effective environmental information.
  • when the visually impaired use a guide device to travel, if the environmental information provided by the guide device is not accurate enough, they cannot accurately judge their location or accurately find their destination based on the information provided by the guide device.
  • the existing guide devices cannot help the visually impaired people accurately perceive the surrounding environment during actual travel, and the reliability is low.
  • One of the purposes of the embodiments of this application is to provide a method, device and readable storage medium for blind guidance, which can help visually impaired people accurately perceive the surrounding environment and improve the reliability of the blind guidance method.
  • embodiments of the present application provide a method for guiding the blind, including:
  • the relative pose information and the semantic information of the first object are broadcast.
  • obtaining the mask of the first object includes:
  • a mask of the first object is generated on the first object; the area size and shape of the mask of the first object are related to the type of the first object.
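  • the relative pose information includes the relative distance and relative angle between the camera and the first object.
  • if the relative distance is within the preset distance range, or the relative angle is within the preset angle range, the relative pose information and the semantic information of the first object are broadcast.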
  • the portion of the preset range around the camera in the intermediate environment map is determined as the environment map at the second moment.
  • By determining the part within the preset range around the camera as the environment map at the second moment, the embodiment of the present application reduces its computational load.
  • obtaining the environment map at the first moment and the pose information of the camera in the environment map at the first moment according to the first image sequence includes:
  • Receive obstacle information sent by an ultrasonic detection device; the ultrasonic detection device is used to detect whether there is an obstacle in front of the camera, and the obstacle information includes the distance between the camera and the obstacle;
  • this application provides a guide device for the blind, including: an acquisition module, a map construction module, an object recognition module, a projection module, a determination module and a broadcast module;
  • the acquisition module is configured to acquire a first image sequence through a camera, where the first image sequence includes a first image at a first moment;
  • the map construction module is used to obtain the environment map of the first moment and the pose information of the camera in the environment map of the first moment according to the first image sequence;
  • the projection module is used to project the mask of the first object into the environment map at the first moment to obtain the three-dimensional position information of the first object;
  • the determination module is configured to obtain the relative pose information between the camera and the first object based on the pose information of the camera in the environment map at the first moment and the three-dimensional position information of the first object;
  • the broadcast module is used to broadcast the relative pose information and the semantic information of the first object.
  • embodiments of the present application provide a computer-readable storage medium.
  • the readable storage medium is used to store a computer program.
  • the method of the first aspect, or of any possible implementation of the first aspect, can be implemented.
  • the beneficial effects of the blind guiding method are as follows: images of the surrounding environment are collected in real time through a camera, an environment map is constructed from the images, and objects in the surrounding environment are identified from the images. By generating a mask of the object and projecting the mask into the environment map, the three-dimensional position information of the object in the environment map is obtained, from which the relative pose between the camera and the object can be obtained. Broadcasting the relative pose between the camera and the object together with the semantic information of the object lets the user accurately perceive the surrounding environment and improves the reliability of the blind guiding method.
  • Figure 1 is a schematic diagram of an application scenario according to an embodiment of the present application.
  • Figure 2 is a schematic flowchart of a blind guiding method provided by an embodiment of the present application.
  • Figure 3 is a schematic diagram of a first image provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of the camera pose provided by the embodiment of the present application.
  • Figure 5 is a schematic block diagram of a blind guide device provided by an embodiment of the present application.
  • the camera disclosed in the embodiment of the present application is a camera on a device that the visually impaired can carry when traveling, such as a camera on a smartphone or a camera on a handheld guide device.
  • the ultrasonic detection device is connected to the smartphone in a wired or wireless manner.
  • the ultrasonic detection device is connected to the smartphone through a type-C interface.
  • the ultrasonic detection device is connected to the smartphone via Bluetooth.
  • the blind guide device 20 provided in the embodiment of the present application may also be a device with an embedded system.
  • the embedded system is an ARM (Advanced RISC Machines) embedded development board.
  • Figure 2 is a schematic flowchart of a blind guiding method provided by an embodiment of the present application. The method includes the following steps:
  • Step 201 Acquire a first image sequence through a camera, where the first image sequence includes a first image at a first moment.
  • the image is acquired through the camera of the smartphone.
  • a method for obtaining an environment map at the first moment and the pose information of the camera in the environment map at the first moment based on the first image sequence includes:
  • SLAM (simultaneous localization and mapping) is a method for solving positioning, navigation, and map construction problems.
  • the SLAM method does not require the use of other external environmental information and can complete map construction based on images. It has the advantages of wide applicability and high computational efficiency.
  • the environment map obtained through the SLAM method is a 3D sparse structure map.
  • the 3D sparse structure map includes feature points, which help users perceive environmental information through the coordinates of the feature points in space. It has high accuracy, does not need to store and process too much information, and has a faster calculation speed.
  • IMU: Inertial Measurement Unit.
  • obtain first data through the IMU; according to the first data and the first image sequence, obtain the environment map at the first moment and the pose information of the camera in the environment map at the first moment through the SLAM method.
  • the first data refers to the data collected by the IMU.
  • the angular velocity and acceleration collected by the IMU are integrated with the information of the image sequence obtained by the SLAM method to obtain the environment map at the first moment and the pose information of the camera in the environment map at the first moment.
  • the pose information of the camera is obtained from the angular velocity and acceleration collected by the IMU, and then fused with the camera pose information obtained by the SLAM method to obtain the environment map at the first moment and the pose information of the camera in the environment map at the first moment.
  • the accuracy of the environment map obtained by the SLAM method and the position of the camera in the map can be improved.
  • the method used in this application is the visual SLAM method, which uses cameras to realize the perception of the surrounding environment.
  • This application does not limit the specific method adopted by visual SLAM. Any visual SLAM method that can obtain the environment map and camera pose through images can be applied to the embodiments of this application. In a possible implementation, this application uses the monocular visual SLAM method, that is, to construct an environment map through monocular sequence images.
  • when the monocular visual SLAM method constructs an environment map from monocular sequence images, only one camera is used to obtain surrounding image information, so only a single image of the surrounding environment is available at any given time.
  • the monocular visual SLAM method has the advantages of portability and low cost.
  • by matching feature points between two pictures captured at two different times, the coordinate transformation between the camera coordinate systems (each with the camera as its origin) at those two times is calculated.
  • the coordinate transformation between these two moments is used to construct the environment map and obtain the camera pose in an iterative manner.
  • Image feature points play a very important role in image matching based on feature points.
  • Image feature points can reflect the essential characteristics of the image and identify the target objects in the image.
  • Image matching can be completed through feature point matching.
  • Figure 4 is a schematic diagram of the camera pose provided by the embodiment of the present application. Similar to the scene shown in Figure 3, the camera has different poses at different times.
  • the SLAM method can match feature points that correspond to the same position in space across images taken at different times. For example, the feature point P is matched in two images, and the camera coordinate systems at the two different times are (x, y, z) and (x', y', z') respectively.
  • the coordinate transformation from the coordinate system (x, y, z) to (x', y', z') can be obtained through geometric analysis, and the camera pose is thus obtained.
  • Step 203 Identify the first object in the first image, and obtain the semantic information of the first object and the mask of the first object.
  • semantic information refers to semantic information at the conceptual level, which is used to indicate what the objects in the image are. For example, there is a tree in the image, and "tree" is the semantic information in it.
  • regarding the mask involved in the embodiment of the present application: in the field of image processing, some operations require using a selected image to partially or completely block the image being processed, and the image that plays the blocking role is called a mask.
  • the mask of the object is a two-dimensional plane image.
  • the area size and shape of the mask of the first object are related to the type of the first object.
  • the mask is projected into the environment map at the first moment to determine the position of the mask, and the position of the mask is used as the position of the first object.
  • the determined three-dimensional position of the first object has higher accuracy.
  • the method for identifying the first object in the first image and obtaining the semantic information and mask of the first object includes:
  • the semantic information and mask of the first object are obtained through image processing.
  • a neural network model is used to obtain the semantic information and mask of the first object.
  • the neural network model adopted is a deep learning neural network.
  • Step 204 Project the mask of the first object onto the environment map at the first moment to obtain the three-dimensional position information of the first object.
  • the resulting sparse 3D structure map does not have semantic information.
  • the feature points in the sparse 3D structure map can be combined with the semantic information of the object.
  • when the three-dimensional position information of the first object is subsequently obtained, that information is more accurate.
  • the environment map includes feature points.
  • the method adopted when projecting the mask of the first object onto the environment map at the first moment to obtain the three-dimensional position information of the first object includes:
  • the three-dimensional position information of the center of the mask in the environment map at the first moment is obtained, and is determined as the three-dimensional position information of the first object.
  • the mask is projected to the spatial coordinates of the object corresponding to the mask. Therefore, the information about which feature points belong to which object can be obtained.
  • the center position of the mask is approximated as the center position of the first object, thereby determining the three-dimensional position information of the first object.
  • the obtained three-dimensional position information has higher accuracy.
  • the three-dimensional position of the object center is expressed as Pi.
  • Step 205 Obtain relative pose information between the camera and the first object based on the camera's pose information in the environment map at the first moment and the three-dimensional position information of the first object.
  • the relative pose information includes the relative distance and relative angle between the camera and the first object.
  • the distance between the camera pose C and the object position Pi is determined, as well as the angle between the three-dimensional position Pi of the object center and the camera pose C.
  • the camera is set on a smartphone or a blind guide device. Therefore, the position of the camera can also be understood as the position of the person.
  • the visually impaired can determine the location of the object based on this information and realize the function of guiding the blind.
  • for example, the relative pose information of a tree and the camera is "the distance between the tree and the camera is 2 meters, and the relative angle is 30 degrees." Based on this relative pose information, visually impaired people can perceive that there is a tree in the environment.
  • Step 206 Broadcast the relative pose information and the semantic information of the first object.
  • broadcasting the relative pose information and the semantic information of the first object includes:
  • the preset range and/or the preset angle are set by default or manually.
  • the blind guiding method collects images of the surrounding environment in real time through the camera, builds an environment map based on the images, and identifies objects in the surrounding environment based on the images.
  • the accurate three-dimensional position information of the object in the environment map is obtained.
  • the accurate relative pose between the camera and the object can be obtained.
  • the user can accurately perceive the surrounding environment and is reminded to avoid obstacles accurately, which improves the reliability of the blind guiding method.
  • the relative pose between the camera and the object and the semantic information of the object can be obtained. No additional hardware support is required, and the equipment requirements are simple and easy to implement.
  • the second image sequence refers to any image sequence that is later than the first image sequence in time sequence.
  • the intermediate environment map refers to the environment map determined based on all image sequences obtained by the camera.
  • the embodiment of the present application does not limit the selection of the preset range.
  • the preset range is derived from default or manual settings.
  • the blind guidance method provided by the embodiment of the present application will continue to construct the environment map. Therefore, as time progresses and the location of the camera changes, the environment map constructed by the embodiment of the present application continues to expand, and the corresponding pose information also continues to increase. More and more data need to be processed when determining the environment map and pose information at the second moment, which will occupy storage space and reduce computing efficiency.
  • the calculation load of the blind guiding method provided by the embodiment of the present application can be reduced.
  • Step A Receive obstacle information sent by an ultrasonic detection device; the ultrasonic detection device is used to detect whether there is an obstacle in front of the camera, and the obstacle information includes the distance between the camera and the obstacle.
  • an ultrasonic detection device is externally connected to the smartphone.
  • the direction of ultrasonic detection is consistent with the direction of the camera, thereby making it easier for the blind to determine the specific direction of the obstacle.
  • Step B Broadcast obstacle information.
  • the blind guiding method disclosed in the embodiment of the present application can broadcast these two types of information.
  • the blind guiding method disclosed in the embodiment of the present application can announce through voice "Please note that there is a car 30 degrees and 3 meters to the right in front", or "Please note that there is an obstacle in front of you".
  • the blind guiding method collects images of the surrounding environment in real time through a camera, builds an environment map based on the images, and identifies objects in the surrounding environment based on the images. By generating a mask of the object and projecting the mask into the environment map, the accurate three-dimensional position information of the object in the environment map is obtained. Furthermore, the accurate relative pose between the camera and the object can be obtained. Broadcasting the relative pose between the camera and the object together with the semantic information of the object lets the user accurately perceive the surrounding environment and improves the reliability of the blind guiding method.
  • Embodiments of this application can also broadcast only the relative pose information and corresponding object semantic information that fall within a preset distance or a preset angle.
  • the purpose of guiding the blind can be achieved by voice broadcasting objects within a certain range in the forward direction of the visually impaired, further providing accurate surrounding environment information for visually impaired people when traveling and improving users' travel experience.
  • Embodiments of the present application can also use only the environment map within the preset range as the environment map at the second moment. That is, as the walking distance of the visually impaired increases, only the environment map and pose information within a certain range of the surroundings are retained, reducing the computational load.
  • Figure 5 shows a schematic block diagram of a blind guide device according to an embodiment of the present application.
  • the device 500 includes: an acquisition module 510, a map construction module 520, an object recognition module 530, a projection module 540, a determination module 550, and a broadcast module 560.
  • the acquisition module 510 is configured to acquire a first image sequence through a camera, where the first image sequence includes a first image at a first moment.
  • the map construction module 520 is used to obtain the environment map at the first moment and the position and orientation information of the camera in the environment map at the first moment according to the first image sequence.
  • the object recognition module 530 is used to identify the first object in the first image, and obtain the semantic information of the first object and the mask of the first object.
  • the projection module 540 is configured to project the mask of the first object obtained by the object recognition module 530 onto the environment map at the first moment obtained by the map construction module 520 to obtain the three-dimensional position information of the first object.
  • the broadcast module 560 is used to broadcast the relative pose information obtained by the determination module 550 and the semantic information of the first object.
  • the blind guide device provided by the embodiment of the present application may also include a receiving module 570, which is used to receive data sent by other devices.
  • the environment map at the first moment includes multiple feature points
  • the projection module 540 is used to:
  • the object recognition module 530 is used to:
  • a mask of the first object is generated on the first object; the area size and shape of the mask of the first object are related to the type of the first object.
  • the receiving module 570 is used to:
  • the memory 61 may be an internal storage unit of the device 600 in some embodiments, such as a hard disk or memory of the blind guide device 600. In other embodiments, the memory 61 may also be an external storage device of the device 600, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the device 600. Further, the memory 61 may also include both an internal storage unit of the device 600 and an external storage device. The memory 61 is used to store the operating system, application programs, boot loader, data and other programs, such as the program code of the computer program. The memory 61 can also be used to temporarily store data that has been output or is to be output.
  • Embodiments of the present application also provide a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program.
  • the steps in each of the above method embodiments can be implemented.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Automation & Control Theory (AREA)
  • Pain & Pain Management (AREA)
  • Physical Education & Sports Medicine (AREA)
  • Rehabilitation Therapy (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Navigation (AREA)

Abstract

Provided are a blind guiding method and apparatus, and a readable storage medium, being suitable for the technical field of blind guiding. The method comprises: acquiring a first image sequence by means of a camera, the first image sequence comprising a first image at a first moment (201); according to the first image sequence, acquiring an environment map at the first moment and pose information of the camera in the environment map at the first moment (202); identifying a first object in the first image, and acquiring semantic information of the first object and a mask of the first object (203); projecting the mask of the first object into the environment map at the first moment, and acquiring three-dimensional position information of the first object (204); according to the pose information of the camera in the environment map at the first moment and the three-dimensional position information of the first object, acquiring relative pose information between the camera and the first object (205); and broadcasting the relative pose information and the semantic information of the first object (206). The method and apparatus are beneficial for helping visually impaired people to perceive the surrounding environment more accurately, and the reliability of the blind guiding method is improved.

Description

Blind guiding method, device and readable storage medium
Technical field
The present application belongs to the technical field of guiding the blind, and in particular relates to a blind guiding method, device and readable storage medium.
Background
With the gradual development of road traffic, complex road conditions make travel very difficult for visually impaired people, who need assistance from guide devices. However, the guidance methods adopted by existing guide devices cannot provide sufficient convenience for the visually impaired. Taking a guide device that uses the Global Positioning System (GPS) as an example, GPS is easily affected by the environment and has low accuracy in some areas, so it cannot provide effective environmental information. When the visually impaired use a guide device to travel, if the environmental information provided by the guide device is not accurate enough, they cannot accurately judge their location or accurately find their destination based on the information provided by the guide device.
Therefore, during actual travel, existing guide devices cannot help visually impaired people accurately perceive the surrounding environment, and their reliability is low.
Technical problem
One of the purposes of the embodiments of this application is to provide a blind guiding method, device and readable storage medium that can help visually impaired people accurately perceive the surrounding environment and improve the reliability of the blind guiding method.
Technical solution
In a first aspect, embodiments of the present application provide a blind guiding method, including:
acquiring a first image sequence through a camera, where the first image sequence includes a first image at a first moment;
obtaining, according to the first image sequence, the environment map at the first moment and the pose information of the camera in the environment map at the first moment;
identifying a first object in the first image, and obtaining semantic information of the first object and a mask of the first object;
projecting the mask of the first object onto the environment map at the first moment to obtain the three-dimensional position information of the first object;
obtaining relative pose information between the camera and the first object based on the pose information of the camera in the environment map at the first moment and the three-dimensional position information of the first object;
broadcasting the relative pose information and the semantic information of the first object.
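The relative pose step listed above can be illustrated with a minimal sketch. It is not the applicant's implementation: it assumes a map frame with z pointing up, a camera heading given as a yaw angle measured counterclockwise from the +x axis, and a sign convention in which a negative relative angle means the object lies to the right of the heading. The function name `relative_pose` and all parameters are hypothetical.

```python
import math

def relative_pose(cam_pos, cam_yaw_deg, obj_pos):
    """Return (distance, angle_deg) from the camera to an object.

    Assumptions (not from the source): positions are (x, y, z) tuples in
    the map frame, cam_yaw_deg is the heading measured counterclockwise
    from the +x axis, and the returned angle is in [-180, 180) with
    negative values meaning "to the right of the heading".
    """
    dx = obj_pos[0] - cam_pos[0]
    dy = obj_pos[1] - cam_pos[1]
    dz = obj_pos[2] - cam_pos[2]
    # Relative distance: straight-line distance in the map.
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    # Relative angle: bearing of the object in the ground plane,
    # measured against the camera heading and wrapped to [-180, 180).
    bearing = math.degrees(math.atan2(dy, dx))
    angle = (bearing - cam_yaw_deg + 180.0) % 360.0 - 180.0
    return distance, angle
```

With the camera at the origin facing along +y (yaw of 90 degrees) and an object at (1, √3, 0), this sketch yields a distance of about 2 and a relative angle of about -30 degrees, that is, 30 degrees to the right under the stated convention.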
In a possible implementation, the environment map at the first moment includes a plurality of feature points; projecting the mask of the first object onto the environment map at the first moment to obtain the three-dimensional position information of the first object includes:
projecting the mask of the first object onto the environment map at the first moment, and obtaining, from the plurality of feature points, the target feature points corresponding to the mask;
obtaining, according to the three-dimensional position information of the target feature points, the three-dimensional position information of the center of the mask in the environment map at the first moment, and determining it as the three-dimensional position information of the first object.
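A small sketch may make the two projection sub-steps concrete. It rests on assumptions the source does not state: each map feature point is taken to carry both its 3D map position and the pixel where it was observed in the first image, the mask is a 2D grid of booleans, and the mask center is approximated by the arithmetic mean of the target feature points. `FeaturePoint` and `object_position` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

@dataclass
class FeaturePoint:
    xyz: Tuple[float, float, float]  # position in the environment map
    uv: Tuple[int, int]              # pixel (column, row) in the first image

def object_position(
    mask: Sequence[Sequence[bool]],
    points: List[FeaturePoint],
) -> Optional[Tuple[float, float, float]]:
    """Approximate the object's 3D position from feature points under its mask."""
    if not mask or not mask[0]:
        return None
    rows, cols = len(mask), len(mask[0])
    # Target feature points: those observed at a pixel covered by the mask.
    targets = [
        p for p in points
        if 0 <= p.uv[1] < rows and 0 <= p.uv[0] < cols and mask[p.uv[1]][p.uv[0]]
    ]
    if not targets:
        return None  # no map point falls under the mask
    n = len(targets)
    # Assumed center estimate: mean of the target points' 3D positions.
    return tuple(sum(p.xyz[i] for p in targets) / n for i in range(3))
```

Averaging is only one possible way to estimate the center; a median or a bounding-box midpoint would be less sensitive to outlier feature points.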
In a possible implementation, obtaining the mask of the first object includes:
generating, according to the semantic information of the first object, a mask of the first object on the first object; the area size and shape of the mask of the first object are related to the type of the first object.
In a possible implementation, the relative pose information includes a relative distance and a relative angle between the camera and the first object;
broadcasting the relative pose information and the semantic information of the first object includes:
if the relative distance is within a preset distance range, or the relative angle is within a preset angle range, broadcasting the relative pose information and the semantic information of the first object.
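The broadcast condition above can be sketched in Python as follows. This is only an illustration: the function names, the default ranges (5 meters, plus or minus 30 degrees), and the wording of the spoken message are assumptions and do not come from the application.

```python
def should_broadcast(distance_m, angle_deg,
                     distance_range=(0.0, 5.0),
                     angle_range=(-30.0, 30.0)):
    """Return True if the distance OR the angle lies in its preset range.

    The default ranges are hypothetical values chosen for illustration.
    """
    in_distance = distance_range[0] <= distance_m <= distance_range[1]
    in_angle = angle_range[0] <= angle_deg <= angle_range[1]
    return in_distance or in_angle


def build_message(label, distance_m, angle_deg):
    """Compose the text to be spoken; returns None when nothing is broadcast."""
    if not should_broadcast(distance_m, angle_deg):
        return None
    return f"{label}, {distance_m:.1f} meters, bearing {angle_deg:.0f} degrees"
```

Note that the condition is a logical OR, as stated in the text: an object that is far away but directly ahead is still announced.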
In a possible implementation, the method further includes:
acquiring a second image sequence through the camera, where the second image sequence includes a second image at a second moment, and the second moment is after the first moment;
obtaining, according to the second image sequence and the environment map at the first moment, an intermediate environment map at the second moment and the pose information of the camera in the intermediate environment map;
determining the part of the intermediate environment map within a preset range around the camera as the environment map at the second moment.
By determining only the part of the intermediate environment map within the preset range around the camera as the environment map at the second moment, the embodiments of the present application reduce the computational load.
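A minimal Python sketch of this cropping step is given below. It assumes that the map is a list of 3D feature points and that the "preset range" is a sphere of a given radius around the camera position; both the data layout and the spherical range are assumptions made for illustration.

```python
import math


def crop_map(feature_points, camera_position, radius):
    """Keep only the map points within `radius` of the camera position.

    feature_points: list of (x, y, z) tuples in the map frame.
    camera_position: (x, y, z) of the camera in the same frame.
    """
    return [p for p in feature_points
            if math.dist(p, camera_position) <= radius]


# Usage: points farther than 5 units from the camera are dropped.
intermediate_map = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
local_map = crop_map(intermediate_map, (0.0, 0.0, 0.0), 5.0)
```

Bounding the map this way keeps the number of feature points to be processed roughly constant as the user walks.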
In a possible implementation, obtaining the environment map at the first moment and the pose information of the camera in the environment map at the first moment according to the first image sequence includes:
acquiring first data through an IMU;
obtaining, according to the first data and the first image sequence, the environment map at the first moment and the pose information of the camera in the environment map at the first moment through a SLAM method.
In a possible implementation, the method further includes:
receiving obstacle information sent by an ultrasonic detection device, where the ultrasonic detection device is used to detect whether there is an obstacle in front of the camera, and the obstacle information includes the distance between the camera and the obstacle;
broadcasting the obstacle information.
In a second aspect, the present application provides a blind guiding apparatus, including an acquisition module, a map construction module, an object recognition module, a projection module, a determination module, and a broadcast module;
the acquisition module is configured to acquire a first image sequence through a camera, where the first image sequence includes a first image at a first moment;
the map construction module is configured to obtain, according to the first image sequence, the environment map at the first moment and the pose information of the camera in the environment map at the first moment;
the object recognition module is configured to identify a first object in the first image, and obtain the semantic information of the first object and the mask of the first object;
the projection module is configured to project the mask of the first object into the environment map at the first moment to obtain the three-dimensional position information of the first object;
the determination module is configured to obtain the relative pose information between the camera and the first object according to the pose information of the camera in the environment map at the first moment and the three-dimensional position information of the first object;
the broadcast module is configured to broadcast the relative pose information and the semantic information of the first object.
In a third aspect, an embodiment of the present application provides a blind guiding apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being used to perform the method in any possible implementation of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium. The readable storage medium is used to store a computer program, and when the computer program is executed by a processor, the method in any possible implementation of the first aspect is implemented.
In a fifth aspect, an embodiment of the present application provides a computer program product. When the computer program runs on a blind guiding apparatus, the blind guiding apparatus is caused to perform the method in any possible implementation of the first aspect.
Beneficial Effects
The beneficial effects of the blind guiding method provided by the embodiments of the present application are as follows. Images of the surrounding environment are collected in real time through the camera, an environment map is constructed from the images, and objects in the surrounding environment are identified from the images. By generating a mask of an object and projecting the mask into the environment map, the three-dimensional position information of the object in the environment map is obtained, from which the relative pose between the camera and the object can be derived. By broadcasting the relative pose between the camera and the object together with the semantic information of the object, the user can accurately perceive the surrounding environment, which improves the reliability of the blind guiding method.
For the beneficial effects of the blind guiding apparatus and the readable storage medium provided by the embodiments of the present application, refer to the related description of the blind guiding method above.
Description of the Drawings
Figure 1 is a schematic diagram of an application scenario according to an embodiment of the present application;
Figure 2 is a schematic flowchart of the blind guiding method provided by an embodiment of the present application;
Figure 3 is a schematic diagram of the first image provided by an embodiment of the present application;
Figure 4 is a schematic diagram of the camera pose provided by an embodiment of the present application;
Figure 5 is a schematic block diagram of the blind guiding apparatus provided by an embodiment of the present application;
Figure 6 is a schematic structural diagram of the blind guiding apparatus provided by an embodiment of the present application.
Embodiments of the Invention
In the following description, specific details such as particular system structures and technologies are set forth for the purpose of explanation rather than limitation, so as to provide a thorough understanding of the embodiments of the present application. However, it will be apparent to those skilled in the art that the present application may also be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present application.
It should be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
In addition, in the description of this specification and the appended claims, the terms "first", "second", "third", and the like are used only to distinguish between descriptions and shall not be understood as indicating or implying relative importance.
Reference in this specification to "one embodiment", "some embodiments", or the like means that a particular feature, structure, or characteristic described in connection with that embodiment is included in one or more embodiments of the present application. Therefore, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments", and the like appearing in different places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless specifically emphasized otherwise. The terms "comprising", "including", "having", and their variants all mean "including but not limited to", unless specifically emphasized otherwise.
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some rather than all of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
The present application provides a blind guiding method, specifically a blind guiding method that perceives objects in the surrounding environment through a camera.
In a possible implementation, the camera disclosed in the embodiments of the present application is a camera on a device that a visually impaired person can carry when traveling, such as the camera of a smartphone or the camera of a handheld blind guiding apparatus.
In one possible application scenario, the blind guiding method disclosed in the embodiments of the present application is applied to a blind guiding apparatus equipped with a camera. Figure 1 is a schematic diagram of an application scenario according to an embodiment of the present application, which includes a user 10, a blind guiding apparatus 20, and an object 30.
The user 10 may be a visually impaired person who needs the blind guiding method in order to travel.
The blind guiding apparatus 20 is used to acquire images of the surrounding environment through the camera and to obtain an environment map from the images so as to implement the blind guiding function. In a possible implementation, the blind guiding apparatus 20 is a smartphone.
The object 30 refers to an object in the environment, such as a tree or a car.
The camera of the smartphone is used to acquire images of the surrounding environment. From the images acquired by the camera, the smartphone obtains the environment map, the semantic information of the objects in the map, and the relative pose information between the camera and the objects.
While the visually impaired person walks holding the smartphone, the smartphone announces by voice which objects are in the surrounding environment and where they are located, so that the person can perceive the surroundings and find a destination or a target object.
Optionally, the smartphone may also be connected to an external ultrasonic detection device. The ultrasonic detection device is used to detect whether there is an obstacle in front of the camera, and the obstacle information includes the distance between the camera and the obstacle. The smartphone receives the obstacle information sent by the ultrasonic detection device and broadcasts it.
The ultrasonic detection device is connected to the smartphone in a wired or wireless manner. In one possible implementation, the ultrasonic detection device is connected to the smartphone through a Type-C interface. In another possible implementation, the ultrasonic detection device is connected to the smartphone via Bluetooth.
It is worth noting that the distance between the ultrasonic detection device and the smartphone needs to be kept within the maximum connection distance. The maximum connection distance is the largest distance at which data can be transmitted between the ultrasonic detection device and the smartphone and at which the position of the ultrasonic detection device can still be approximated as the position of the smartphone.
In another possible application scenario, the blind guiding apparatus 20 provided in the embodiments of the present application may also be an apparatus with an embedded system. In a possible implementation, the embedded system is an ARM (Advanced RISC Machines) embedded development board.
It should be understood that the scenario in Figure 1 only illustrates one application scenario of the embodiments of the present application by way of example, and the embodiments of the present application are not limited thereto.
The blind guiding method proposed in the embodiments of the present application is described in detail below with reference to Figure 2. Figure 2 is a schematic flowchart of the blind guiding method provided by an embodiment of the present application. The method includes the following steps:
Step 201: Acquire a first image sequence through the camera, where the first image sequence includes a first image at a first moment.
The embodiments of the present application do not limit the structure of the camera. In one possible implementation, the camera is a monocular camera. In another possible implementation, the camera is a binocular camera.
Optionally, when the blind guiding method provided by the embodiments of the present application is applied to a smartphone, the images are acquired through the camera of the smartphone.
It is worth noting that what the camera obtains in the embodiments of the present application is an image sequence composed of multiple images ordered in time. In other words, image acquisition in the embodiments of the present application is a dynamic process in which images are acquired continuously as time passes. For the sake of distinction, the image sequence here is called the first image sequence.
The first image sequence includes the first image captured by the camera at the first moment. It is worth noting that "first" in "first moment" and "first image" is used only for distinction: the first moment may be any moment during image acquisition by the camera, and the first image is the image acquired at that moment.
It should be noted that the embodiments of the present application do not limit the attitude of the camera. While a visually impaired person walks using the blind guiding apparatus, the attitude of the camera may change. As a result, when the same environment is photographed with the camera in different attitudes, the objects appear in different attitudes in the captured images. For example, Figure 3 is a schematic diagram of the first image provided by an embodiment of the present application. The camera obtains a first image 310 of an object at the first moment. The attitude of the camera changes as the visually impaired person walks, and at a later moment after the first moment the camera obtains an image 320 of the same object.
Step 202: Obtain, according to the first image sequence, the environment map at the first moment and the pose information of the camera in the environment map at the first moment.
Pose information means position and attitude, and includes position information and angle information. Specifically, the pose information in the embodiments of the present application includes the coordinates of an object in a spatial coordinate system and the angles between the object and the coordinate axes.
As described above, the first image sequence obtained in the embodiments of the present application includes images at different moments. It is worth noting that the pose information of the camera in the environment map at the first moment, as referred to in the embodiments of the present application, is the pose information of the camera at the first moment. For ease of description, the pose information of the camera in the environment map at the first moment is denoted as C.
In the embodiments of the present application, the methods for obtaining the environment map at the first moment and the pose information of the camera in the environment map at the first moment according to the first image sequence include the following.
In one possible implementation, a simultaneous localization and mapping (SLAM) method is used to obtain the environment map at the first moment and the pose of the camera in the environment map at the first moment.
SLAM is a method for solving the problems of localization, navigation, and map construction. A SLAM method does not rely on other external environment information and can construct the map from images alone, which gives it wide applicability and high computational efficiency. Optionally, in this embodiment, the environment map obtained through the SLAM method is a 3D sparse structure map. The 3D sparse structure map includes feature points, and the spatial coordinates of the feature points help the user perceive environmental information. This approach is accurate, does not need to store and process large amounts of information, and is fast to compute.
In the field of image processing, a feature point is a point where the gray value of the image changes sharply, or a point of large curvature on an image edge (that is, the intersection of two edges).
It is worth noting that the 3D sparse structure map does not include semantic information.
In another possible implementation, first data is acquired through an inertial measurement unit (IMU), and the environment map at the first moment and the pose information of the camera in the environment map at the first moment are obtained through the SLAM method according to the first data and the first image sequence. The first data refers to the data collected by the IMU.
Specifically, the IMU can collect angular velocity and acceleration. In the practical application scenarios of the embodiments of the present application, the angular velocity and acceleration collected by the IMU can be approximated as the angular velocity and acceleration of the camera.
In one possible implementation, the angular velocity and acceleration collected by the IMU are fused with the image sequence information used by the SLAM method to obtain the environment map at the first moment and the pose information of the camera in the environment map at the first moment. In another possible implementation, the pose information of the camera is first derived from the angular velocity and acceleration collected by the IMU and then fused with the camera pose information obtained by the SLAM method, to obtain the environment map at the first moment and the pose information of the camera in the environment map at the first moment.
Fusing the data collected by the IMU with the SLAM method improves the accuracy of the environment map obtained by the SLAM method and of the camera pose in the map.
Specifically, the method adopted in the present application is a visual SLAM method, that is, the surrounding environment is perceived through a camera.
The present application does not limit the specific visual SLAM method. Any visual SLAM method that can obtain the environment map and the camera pose from images can be applied to the embodiments of the present application. In one possible implementation, the present application adopts a monocular visual SLAM method, that is, the environment map is constructed from a monocular image sequence.
When the environment map is constructed from a monocular image sequence, only one camera is used to acquire image information of the surroundings, which means that only a single image of the surroundings can be obtained at any one moment. When the blind guiding method provided by the embodiments of the present application is applied to a smartphone, the monocular visual SLAM method has the advantages of portability and low cost.
Optionally, in one implementation, when the environment map is constructed from a monocular image sequence, feature points are matched between the two images corresponding to two different moments, and the coordinate transformation of the camera coordinate system (whose origin is the camera) between these two moments is calculated. The environment map is thereby constructed, and the camera pose obtained, in an iterative manner. Image feature points play a very important role in feature-based image matching: they reflect the essential characteristics of an image and identify the target objects in it, so that images can be matched by matching their feature points.
Figure 4 is a schematic diagram of the camera pose provided by an embodiment of the present application. Similar to the scene shown in Figure 3, the camera has different poses at different moments. The SLAM method can match the feature points that correspond to the same position in space across images taken at different moments. For example, a feature point P is matched in two images, and the camera coordinate systems at the two moments are (x, y, z) and (x', y', z') respectively. Once the matching result of the feature point P in the images at the different moments is available, the coordinate transformation from the coordinate system (x, y, z) to (x', y', z') can be obtained through geometric analysis, which yields the camera pose.
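One common way to write this geometric relation is shown below. It is a sketch under the standard pinhole-camera convention, not a formula taken from the application: the rotation R, the translation t, the essential matrix E, and the normalized image coordinates p and p' of the feature point P in the two images are notation introduced here for illustration.

```latex
% Rigid transformation of the point P between the two camera frames
P' = R\,P + t

% Epipolar constraint satisfied by every matched pair (p, p')
p'^{\top} E\, p = 0, \qquad E = [t]_{\times} R
```

Given enough matched feature points, E can be estimated from the constraint and then decomposed into R and t. With a single camera, t is recovered only up to an unknown scale factor.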
Step 203: Identify a first object in the first image, and obtain the semantic information of the first object and the mask of the first object.
In the technical field of the embodiments of the present application, semantic information refers to concept-level semantic information, which indicates what an object in the image is. For example, if there is a tree in the image, "tree" is the semantic information. As for the mask involved in the embodiments of the present application: in the field of image processing, some operations use a selected image to occlude part or all of the image being processed, and the image that does the occluding is called a mask.
In the embodiments of the present application, the mask of an object is a two-dimensional planar image, and the area size and shape of the mask of the first object are related to the type of the first object. Because the mask of the first object has a shape similar to that of the first object, when the mask is projected into the environment map at the first moment in the subsequent steps to determine the position of the mask, and that position is taken as the three-dimensional position of the first object, the three-dimensional position determined for the first object is more accurate.
In the embodiments of the present application, the methods for identifying the first object in the first image and obtaining its semantic information and mask include the following.
In one possible implementation, the semantic information and mask of the first object are obtained through image processing methods.
In another possible implementation, a neural network model is used to obtain the semantic information and mask of the first object. Optionally, the neural network model is a deep learning neural network. Using a neural network model to obtain the semantic information and mask has the advantages of high accuracy and wide coverage.
It is worth noting that the mask in the embodiments of the present application can express the semantic information of the object. Specifically, in one possible implementation of the present application, the neural network needs to obtain a semantic segmentation result when obtaining the object mask. The semantic segmentation result distinguishes the different objects in the image and indicates their categories, for example, that an image contains a car, a person, and a street lamp. The neural network then generates a corresponding mask for each identified object, and this mask expresses the category information and shape information of the object.
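The relation between a semantic segmentation result and an object mask can be illustrated with the short Python sketch below. It assumes that the segmentation result is a per-pixel grid of class IDs; the grid layout, the class table, and the function name are hypothetical, and the neural network that would produce the label map is not shown.

```python
# Hypothetical class table; the application does not define class IDs.
CLASS_NAMES = {0: "background", 1: "person", 2: "car", 3: "street lamp"}


def mask_for_class(label_map, class_id):
    """Build a binary mask (1 = object pixel) for one class of a label map.

    label_map: 2D list of per-pixel class IDs, indexed as [row][column].
    Returns the semantic label of the class and the mask.
    """
    mask = [[1 if value == class_id else 0 for value in row]
            for row in label_map]
    return CLASS_NAMES[class_id], mask


# Usage with a tiny 3x3 label map containing a car in the lower right.
label_map = [[0, 0, 0],
             [0, 2, 2],
             [0, 2, 2]]
label, car_mask = mask_for_class(label_map, 2)
```

The mask carries the shape of the object, and the label returned with it carries the category, which matches the two kinds of information the text attributes to the mask.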
步骤204:将第一物体的掩膜投射到第一时刻的环境地图中,获取第一物体的三维位置信息。Step 204: Project the mask of the first object onto the environment map at the first moment to obtain the three-dimensional position information of the first object.
根据前文中的叙述,在本申请采用视觉SLAM构建环境地图的情况下,所得到的稀疏3D结构地图并不具有语义信息。通过将物体的掩膜投射到步骤202中所获得的地图上,能够使稀疏3D结构地图中的特征点与物体的语义信息相结合,在后续获取第一物体的三维位置信息时,能够使得到的三维位置信息更准确。According to the previous description, when this application uses visual SLAM to construct an environment map, the resulting sparse 3D structure map does not have semantic information. By projecting the mask of the object onto the map obtained in step 202, the feature points in the sparse 3D structure map can be combined with the semantic information of the object. When the three-dimensional position information of the first object is subsequently obtained, the The three-dimensional position information is more accurate.
Specifically, as described above, the environment map includes feature points. Optionally, projecting the mask of the first object onto the environment map at the first moment to obtain the three-dimensional position information of the first object includes:
projecting the mask of the first object onto the environment map at the first moment, and obtaining, from the multiple feature points, the target feature points corresponding to the mask;
obtaining, from the three-dimensional position information of the target feature points, the three-dimensional position information of the center of the mask in the environment map at the first moment, and determining it as the three-dimensional position information of the first object.
As can be seen, the mask is projected to the spatial coordinates of the object it corresponds to, which reveals which feature points belong to which object. Once the correspondence between the mask of the first object and the feature points is established, the center of the mask is taken as an approximation of the center of the first object, and the three-dimensional position information of the first object determined in this way has higher accuracy.
For ease of description, the three-dimensional position of the object center is denoted Pi.
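The association between the mask and the feature points can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the method of the application: it assumes a pinhole camera with hypothetical intrinsics fx, fy, cx, cy, a world-to-camera pose (R_cw, t_cw), and a boolean image mask, and it tests each map feature point against the mask by projecting the point into the image, which is one common way to realize the correspondence. The centroid of the selected points stands in for the mask center Pi. All function and parameter names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]


def world_to_camera(p: Point3,
                    R_cw: Sequence[Sequence[float]],
                    t_cw: Sequence[float]) -> Point3:
    """Transform a map (world) point into the camera frame: p_c = R_cw * p_w + t_cw."""
    return tuple(sum(R_cw[r][c] * p[c] for c in range(3)) + t_cw[r]
                 for r in range(3))


def locate_object(points_w: List[Point3],
                  mask: Sequence[Sequence[bool]],
                  R_cw: Sequence[Sequence[float]],
                  t_cw: Sequence[float],
                  fx: float, fy: float, cx: float, cy: float) -> Optional[Point3]:
    """Return the centroid of the map feature points that project inside the mask.

    Returns None when no feature point falls inside the mask.
    """
    height, width = len(mask), len(mask[0])
    hits = []
    for p in points_w:
        x, y, z = world_to_camera(p, R_cw, t_cw)
        if z <= 0:  # point is behind the camera, it cannot be seen
            continue
        # Pinhole projection to pixel coordinates (assumed camera model).
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height and mask[v][u]:
            hits.append(p)
    if not hits:
        return None
    n = len(hits)
    # Centroid of the target feature points approximates the object center Pi.
    return tuple(sum(p[i] for p in hits) / n for i in range(3))
```

In a sparse map an object may be covered by only a few feature points, so a real system would probably want a fallback for the None case; the sketch leaves that choice open.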
Step 205: Obtain the relative pose information between the camera and the first object from the pose information of the camera in the environment map at the first moment and the three-dimensional position information of the first object.
Specifically, the relative pose information includes the relative distance and the relative angle between the camera and the first object. In this step, the distance between the camera pose C and the object position Pi, and the angle of the object center position Pi relative to the camera pose C, are determined from the camera pose information C and the three-dimensional position Pi of the object center.
In one possible application scenario of the embodiments of the present application, the camera is mounted on a smartphone or a blind guiding device, so the position of the camera can also be understood as the position of the person. In this case, once the relative pose information between the camera and the object is obtained, a visually impaired person can use it to determine where the object is, which realizes the guiding function. For example, the relative pose information of a tree and the camera may be "the distance between the tree and the camera is 2 meters, and the relative angle is 30 degrees", from which the visually impaired person can perceive that there is a tree in the environment.
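The distance and angle computation of step 205 can be illustrated with a short sketch. The application does not fix a coordinate convention, so the following are assumptions made only for the example: the camera pose C is given as a world-to-camera rotation R_cw and translation t_cw, the camera frame has x pointing right and z pointing forward, and the relative angle is the horizontal bearing (positive to the right). The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def relative_pose(obj_pos_w: Sequence[float],
                  R_cw: Sequence[Sequence[float]],
                  t_cw: Sequence[float]) -> Tuple[float, float]:
    """Return (distance in map units, horizontal bearing in degrees) of Pi seen from C.

    Assumed camera frame: x to the right, z forward; positive bearing = to the right.
    """
    # Express the object center Pi in the camera frame: p_c = R_cw * Pi + t_cw.
    p_c = [sum(R_cw[r][c] * obj_pos_w[c] for c in range(3)) + t_cw[r]
           for r in range(3)]
    distance = math.sqrt(sum(v * v for v in p_c))
    # Bearing in the horizontal plane, measured from the forward (z) axis.
    bearing_deg = math.degrees(math.atan2(p_c[0], p_c[2]))
    return distance, bearing_deg
```

Note that a monocular visual SLAM map has no metric scale on its own; reporting distances in meters presupposes that scale is available, for example from the IMU data mentioned elsewhere in the application.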
Step 206: Broadcast the relative pose information and the semantic information of the first object.
For example, the voice broadcast may be "There is a person 5 meters ahead, 20 degrees to the left", or "There is a tree 2 meters ahead, 30 degrees to the right".
Optionally, broadcasting the relative pose information and the semantic information of the first object includes:
if the relative distance is within a preset distance range, or the relative angle is within a preset angle range, broadcasting the relative pose information and the semantic information of the first object.
Specifically, in a practical application scenario, the first image may contain multiple objects, but some of them are far from the user, or lie at a large angle to the user and off the user's route, and therefore do not help with guidance. Broadcasting by voice only the objects within a certain range of the visually impaired person's direction of travel is sufficient for guidance, provides accurate information about the surroundings, and improves the user's travel experience.
Only the preset distance range may be set, or only the preset angle range, or both may be set at the same time.
Optionally, the preset distance range and/or the preset angle range are set by default or manually.
Offering several ways of setting these ranges allows the extent of the surroundings that the user needs to perceive while traveling to be changed, which makes the configuration more flexible.
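The broadcast condition and the spoken message can be sketched as below. The sketch follows the "or" wording of the text literally: an object is announced when it satisfies any range that has been configured, and everything is announced when no range is configured. The parameter names, the symmetric angle limit, and the message wording are assumptions made for illustration only.

```python
from typing import Optional


def should_broadcast(distance_m: float,
                     bearing_deg: float,
                     max_distance_m: Optional[float] = None,
                     max_abs_bearing_deg: Optional[float] = None) -> bool:
    """Decide whether an object is announced, given optional preset ranges."""
    checks = []
    if max_distance_m is not None:
        checks.append(distance_m <= max_distance_m)
    if max_abs_bearing_deg is not None:
        checks.append(abs(bearing_deg) <= max_abs_bearing_deg)
    # No preset range configured: announce every recognized object.
    if not checks:
        return True
    # "Or" semantics as stated in the text: any satisfied range is enough.
    return any(checks)


def format_message(label: str, distance_m: float, bearing_deg: float) -> str:
    """Build the spoken sentence from the semantic label and the relative pose."""
    side = "right" if bearing_deg >= 0 else "left"
    return (f"There is a {label} {distance_m:.0f} meters ahead, "
            f"{abs(bearing_deg):.0f} degrees to the {side}")
```

When both ranges are set, a stricter deployment might require both conditions to hold; replacing `any` with `all` would give that behavior, but the text as written specifies "or".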
As can be seen, the blind guiding method provided by the embodiments of the present application captures images of the surroundings in real time through a camera, builds an environment map from the images, and recognizes objects in the surroundings from the images. By generating a mask of an object and projecting the mask onto the environment map, accurate three-dimensional position information of the object in the environment map is obtained, from which an accurate relative pose between the camera and the object follows. Broadcasting this relative pose together with the semantic information of the object lets the user perceive the surroundings accurately and reminds the user to avoid obstacles, which improves the reliability of the blind guiding method. Moreover, in the embodiments of the present application the relative pose between the camera and the object and the semantic information of the object are obtained by processing the images captured by the camera in real time, so no additional hardware support is needed, the equipment requirements are simple, and the method is easy to implement.
Optionally, to reduce the computational load, the blind guiding method disclosed in the embodiments of the present application further includes:
acquiring a second image sequence through the camera, the second image sequence including a second image at a second moment, the second moment being after the first moment;
obtaining, from the second image sequence and the environment map at the first moment, an intermediate environment map at the second moment and the pose information of the camera in the intermediate environment map;
determining the portion of the intermediate environment map within a preset range around the camera as the environment map at the second moment.
Here, the second image sequence is any image sequence that comes later in time than the first image sequence, and the intermediate environment map is the environment map determined from all image sequences obtained by the camera. The embodiments of the present application do not limit how the preset range is chosen; optionally, the preset range is set by default or manually.
In this implementation, the blind guiding method provided by the embodiments of the present application constructs the environment map continuously. As time passes and the camera moves, the environment map is continuously extended and the corresponding pose information keeps growing. The amount of data that has to be processed when determining the environment map and pose information at the second moment therefore also keeps growing, which occupies storage space and lowers computational efficiency. Retaining only the map information and pose information within the preset range, that is, continuously discarding earlier map and pose information as the walking distance increases, reduces the computational load of the blind guiding method.
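The pruning step can be sketched in a few lines. The application only says that the part of the intermediate map within a preset range around the camera is kept; treating that range as a sphere of radius radius_m around the camera position, and representing the map as a list of 3D feature points, are assumptions made for this example. The names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def prune_map(points_w: List[Point3],
              camera_position_w: Sequence[float],
              radius_m: float) -> List[Point3]:
    """Keep only the map feature points within radius_m of the camera position.

    The result plays the role of the environment map at the second moment.
    """
    return [p for p in points_w
            if math.dist(p, camera_position_w) <= radius_m]
```

A linear scan like this costs time proportional to the map size on every update; a spatial index would scale better, but whether that matters depends on how large the map is allowed to grow.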
Optionally, as a further improvement, the embodiments of the present application also include a method of detecting obstacles with an ultrasonic detection device, which includes the following steps:
Step A: Receive obstacle information sent by the ultrasonic detection device. The ultrasonic detection device is used to detect whether there is an obstacle in front of the camera, and the obstacle information includes the distance between the camera and the obstacle.
In one possible application scenario of the embodiments of the present application, when a smartphone is used to acquire the image sequence and construct the environment map, the ultrasonic detection device is connected externally to the smartphone.
Optionally, for convenient guidance, the direction of ultrasonic detection is the same as the direction of the camera, which makes it easier for a blind person to judge the direction of the obstacle.
Step B: Broadcast the obstacle information.
It is worth noting that broadcasting the obstacle information and broadcasting the relative pose information and semantic information in step 206 do not interfere with each other; the blind guiding method disclosed in the embodiments of the present application can broadcast both kinds of information. For example, the method may announce by voice "Attention, there is a car 3 meters ahead, 30 degrees to the right", or "Attention, there is an obstacle in front of you".
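How the two independent information sources might feed one announcement queue can be sketched as follows. The application does not say how the two kinds of messages are ordered or whether the ultrasonic distance is spoken, so placing the ultrasonic warning first and including the distance in it are assumptions, and the function name is hypothetical.

```python
from typing import List, Optional


def collect_announcements(object_messages: List[str],
                          obstacle_distance_m: Optional[float] = None) -> List[str]:
    """Merge vision-based messages with an optional ultrasonic obstacle warning."""
    announcements = []
    if obstacle_distance_m is not None:
        # Ultrasonic reading present: warn first (ordering is an assumption).
        announcements.append(
            f"Attention, there is an obstacle {obstacle_distance_m:.1f} meters in front of you")
    # Vision-based messages are passed through unchanged.
    announcements.extend(f"Attention, {m}" for m in object_messages)
    return announcements
```

Neither source suppresses the other here, which mirrors the statement that the two broadcasts do not affect each other.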
In summary, the blind guiding method provided by the embodiments of the present application captures images of the surroundings in real time through a camera, builds an environment map from the images, and recognizes objects in the surroundings from the images. By generating a mask of an object and projecting the mask onto the environment map, accurate three-dimensional position information of the object in the environment map is obtained, from which an accurate relative pose between the camera and the object follows. Broadcasting this relative pose together with the semantic information of the object lets the user perceive the surroundings accurately and improves the reliability of the blind guiding method.
The embodiments of the present application may also broadcast only the relative pose information, and the corresponding object semantic information, that falls within a preset distance or a preset angle. Broadcasting by voice only the objects within a certain range of the visually impaired person's direction of travel is sufficient for guidance, provides accurate information about the surroundings, and improves the user's travel experience.
The embodiments of the present application may also take only the environment map within a preset range as the environment map at the second moment. That is, as the visually impaired person walks farther, only the environment map and pose information within a certain range of the surroundings are retained, which reduces the computational load.
The embodiments of the present application may also receive obstacle information detected by an ultrasonic detection device, which helps the visually impaired person avoid obstacles ahead in time and improves the reliability of the blind guiding method.
Moreover, in the embodiments of the present application the relative pose between the camera and the object and the semantic information of the object are obtained by processing the images captured by the camera in real time, so no additional hardware support is needed, the equipment requirements are simple, and the method is easy to implement.
Figure 5 shows a schematic block diagram of a blind guiding apparatus according to an embodiment of the present application. As shown in Figure 5, the apparatus 500 includes an acquisition module 510, a map construction module 520, an object recognition module 530, a projection module 540, a determination module 550, and a broadcast module 560.
The acquisition module 510 is used to acquire a first image sequence through a camera, the first image sequence including a first image at a first moment.
The map construction module 520 is used to obtain, from the first image sequence, the environment map at the first moment and the pose information of the camera in the environment map at the first moment.
The object recognition module 530 is used to recognize a first object in the first image and to obtain the semantic information of the first object and the mask of the first object.
The projection module 540 is used to project the mask of the first object obtained by the object recognition module 530 onto the environment map at the first moment obtained by the map construction module 520, to obtain the three-dimensional position information of the first object.
The determination module 550 is used to determine the relative pose information between the camera and the first object from the pose information of the camera in the environment map at the first moment and the three-dimensional position information of the first object.
The broadcast module 560 is used to broadcast the relative pose information obtained by the determination module 550 and the semantic information of the first object.
Optionally, the blind guiding apparatus provided by the embodiments of the present application may further include a receiving module 570, which is used to receive data sent by other devices.
In one possible implementation, the environment map at the first moment includes multiple feature points, and the projection module 540 is used to:
project the mask of the first object onto the environment map at the first moment, and obtain, from the multiple feature points, the target feature points corresponding to the mask;
obtain, from the three-dimensional position information of the target feature points, the three-dimensional position information of the center of the mask in the environment map at the first moment, and determine it as the three-dimensional position information of the first object.
In one possible implementation, the object recognition module 530 is used to:
generate the mask of the first object on the first object according to the semantic information of the first object, where the area size and shape of the mask of the first object are related to the type of the first object.
In one possible implementation, the relative pose information includes the relative distance and the relative angle between the camera and the first object, and the broadcast module 560 is used to:
broadcast the relative pose information and the semantic information of the first object if the relative distance is within a preset distance range, or the relative angle is within a preset angle range.
In one possible implementation, the acquisition module 510 is further used to:
acquire a second image sequence through the camera, the second image sequence including a second image at a second moment, the second moment being after the first moment;
and the map construction module 520 is further used to:
obtain, from the second image sequence and the environment map at the first moment, an intermediate environment map at the second moment and the pose information of the camera in the intermediate environment map;
determine the portion of the intermediate environment map within a preset range around the camera as the environment map at the second moment.
In one possible implementation, the map construction module 520 is used to:
obtain first data through an IMU;
obtain, from the first data and the first image sequence, the environment map at the first moment and the pose information of the camera in the environment map at the first moment by a SLAM method.
In one possible implementation, the receiving module 570 is used to:
receive obstacle information sent by an ultrasonic detection device, where the ultrasonic detection device is used to detect whether there is an obstacle in front of the camera, and the obstacle information includes the distance between the camera and the obstacle;
and the broadcast module 560 is used to broadcast the obstacle information.
Figure 6 is a schematic structural diagram of a blind guiding apparatus provided by an embodiment of the present application. As shown in Figure 6, the blind guiding apparatus 600 includes at least one processor 60 (only one is shown in Figure 6), a memory 61, and a computer program 62 stored in the memory 61 and executable on the at least one processor 60. When the processor 60 executes the computer program 62, the steps of any of the blind guiding method embodiments described above (such as the method in Figure 2) are implemented.
The processor 60 may be a central processing unit (CPU), or it may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.
In some embodiments the memory 61 may be an internal storage unit of the apparatus 600, such as a hard disk or internal memory of the blind guiding apparatus 600. In other embodiments the memory 61 may be an external storage device of the apparatus 600, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the apparatus 600. Further, the memory 61 may include both an internal storage unit and an external storage device of the apparatus 600. The memory 61 is used to store the operating system, application programs, the boot loader, data, and other programs, such as the program code of the computer program. The memory 61 may also be used to temporarily store data that has been output or is about to be output.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional units and modules above is only an example. In practice, the functions above may be assigned to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated in one processing unit, may each exist physically on their own, or two or more units may be integrated in one unit. The integrated unit may be implemented in hardware or as a software functional unit. In addition, the specific names of the functional units and modules serve only to distinguish them from one another and do not limit the scope of protection of the present application. For the specific working process of the units and modules in the system above, refer to the corresponding process in the foregoing method embodiments; it is not repeated here.
The embodiments of the present application also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of each of the method embodiments above.
The embodiments of the present application provide a computer program product which, when run on a blind guiding apparatus, causes the blind guiding apparatus to implement the steps of each of the method embodiments above.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, all or part of the processes of the method embodiments above may be carried out by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the method embodiments above. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include at least: any entity or apparatus capable of carrying the computer program code to a camera apparatus or terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc. In some jurisdictions, according to legislation and patent practice, the computer-readable medium may not be an electrical carrier signal or a telecommunications signal.

Claims (10)

  1. A blind guiding method, characterized by comprising:
    acquiring a first image sequence through a camera, the first image sequence including a first image at a first moment;
    obtaining, from the first image sequence, an environment map at the first moment and pose information of the camera in the environment map at the first moment;
    recognizing a first object in the first image, and obtaining semantic information of the first object and a mask of the first object;
    projecting the mask of the first object onto the environment map at the first moment to obtain three-dimensional position information of the first object;
    obtaining relative pose information between the camera and the first object from the pose information of the camera in the environment map at the first moment and the three-dimensional position information of the first object;
    broadcasting the relative pose information and the semantic information of the first object.
  2. The method according to claim 1, characterized in that the environment map at the first moment includes a plurality of feature points, and projecting the mask of the first object onto the environment map at the first moment to obtain the three-dimensional position information of the first object comprises:
    projecting the mask of the first object onto the environment map at the first moment, and obtaining, from the plurality of feature points, target feature points corresponding to the mask;
    obtaining, from the three-dimensional position information of the target feature points, the three-dimensional position information of the center of the mask in the environment map at the first moment, and determining it as the three-dimensional position information of the first object.
  3. The method according to claim 2, characterized in that obtaining the mask of the first object comprises:
    generating the mask of the first object on the first object according to the semantic information of the first object, wherein the area size and shape of the mask of the first object are related to the type of the first object.
  4. The method according to any one of claims 1-3, characterized in that the relative pose information includes a relative distance and a relative angle between the camera and the first object;
    broadcasting the relative pose information and the semantic information of the first object comprises:
    if the relative distance is within a preset distance range, or the relative angle is within a preset angle range, broadcasting the relative pose information and the semantic information of the first object.
  5. The method according to any one of claims 1-3, characterized in that the method further comprises:
    acquiring a second image sequence through the camera, the second image sequence including a second image at a second moment, the second moment being after the first moment;
    obtaining, from the second image sequence and the environment map at the first moment, an intermediate environment map at the second moment and pose information of the camera in the intermediate environment map;
    determining the portion of the intermediate environment map within a preset range around the camera as the environment map at the second moment.
  6. The method according to any one of claims 1-3, characterized in that obtaining, from the first image sequence, the environment map at the first moment and the pose information of the camera in the environment map at the first moment comprises:
    obtaining first data through an IMU;
    obtaining, from the first data and the first image sequence, the environment map at the first moment and the pose information of the camera in the environment map at the first moment by a SLAM method.
  7. The method according to any one of claims 1-3, characterized by further comprising:
    receiving obstacle information sent by an ultrasonic detection device, wherein the ultrasonic detection device is used to detect whether there is an obstacle in front of the camera, and the obstacle information includes the distance between the camera and the obstacle;
    broadcasting the obstacle information.
  8. A blind guiding apparatus, characterized by comprising: an acquisition module, a map construction module, an object recognition module, a projection module, a determination module, and a broadcast module;
    the acquisition module is configured to acquire a first image sequence through a camera, the first image sequence including a first image at a first moment;
    the map construction module is configured to obtain, according to the first image sequence, the environment map at the first moment and the pose information of the camera in the environment map at the first moment;
    the object recognition module is configured to identify a first object in the first image and obtain semantic information of the first object and a mask of the first object;
    the projection module is configured to project the mask of the first object into the environment map at the first moment to obtain three-dimensional position information of the first object;
    the determination module is configured to obtain relative pose information between the camera and the first object according to the pose information of the camera in the environment map at the first moment and the three-dimensional position information of the first object;
    the broadcast module is configured to broadcast the relative pose information and the semantic information of the first object.
  9. A blind guiding apparatus, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the method according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1 to 7.
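
The determination and broadcast steps recited in claim 8, and the map-cropping step that closes claim 5, can be illustrated with a minimal Python sketch. This is not the applicant's implementation. It assumes a simplified camera pose (position plus yaw only, rather than a full six-degree-of-freedom pose), an environment map represented as a plain list of 3D points, and an arbitrary 15-degree threshold for "ahead". All names (CameraPose, mask_centroid, relative_pose, announcement, crop_map) are hypothetical and do not appear in the application.

```python
import math
from dataclasses import dataclass


@dataclass
class CameraPose:
    """Hypothetical simplified pose: position in metres, yaw in radians.

    Yaw 0 points along the map's +x axis; positive yaw is counter-clockwise.
    """
    x: float
    y: float
    z: float
    yaw: float


def mask_centroid(points):
    """Average the 3D map points covered by the projected object mask.

    Stands in for the 'three-dimensional position information' of the object.
    """
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def relative_pose(cam, obj):
    """Return (distance in metres, bearing in degrees) of obj relative to cam.

    Bearing is measured in the ground plane from the camera heading:
    positive means the object is to the left, negative to the right.
    """
    dx, dy = obj[0] - cam.x, obj[1] - cam.y
    distance = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx) - cam.yaw
    # Wrap the angle into the range [-pi, pi].
    bearing = math.atan2(math.sin(bearing), math.cos(bearing))
    return distance, math.degrees(bearing)


def announcement(label, distance, bearing_deg, ahead_threshold_deg=15.0):
    """Build the text a broadcast module might speak (threshold is assumed)."""
    if abs(bearing_deg) < ahead_threshold_deg:
        direction = "ahead"
    elif bearing_deg > 0:
        direction = "to the left"
    else:
        direction = "to the right"
    return f"{label}, {distance:.1f} metres {direction}"


def crop_map(points, cam, radius):
    """Keep only map points within `radius` metres (ground plane) of the camera."""
    return [p for p in points
            if math.hypot(p[0] - cam.x, p[1] - cam.y) <= radius]


if __name__ == "__main__":
    cam = CameraPose(0.0, 0.0, 0.0, yaw=0.0)
    chair = mask_centroid([(3.0, 4.0, 0.0), (3.0, 4.0, 1.0)])
    dist, bearing = relative_pose(cam, chair)
    print(announcement("chair", dist, bearing))  # chair, 5.0 metres to the left
```

A real system would replace the centroid with whatever position estimate the projection step yields and would use the full pose from SLAM; the sketch only shows how a camera pose and an object position combine into a spoken distance and direction.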
PCT/CN2022/101093 2022-06-24 2022-06-24 Blind guiding method and apparatus, and readable storage medium WO2023245615A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/101093 WO2023245615A1 (en) 2022-06-24 2022-06-24 Blind guiding method and apparatus, and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/101093 WO2023245615A1 (en) 2022-06-24 2022-06-24 Blind guiding method and apparatus, and readable storage medium

Publications (1)

Publication Number Publication Date
WO2023245615A1 (en) 2023-12-28

Family

ID=89379021

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/101093 WO2023245615A1 (en) 2022-06-24 2022-06-24 Blind guiding method and apparatus, and readable storage medium

Country Status (1)

Country Link
WO (1) WO2023245615A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108168539A (en) * 2017-12-21 2018-06-15 儒安科技有限公司 A kind of blind man navigation method based on computer vision, apparatus and system
CN110522617A (en) * 2019-09-05 2019-12-03 张超 Blind person's wisdom glasses
CN111743740A (en) * 2020-06-30 2020-10-09 平安国际智慧城市科技股份有限公司 Blind guiding method and device, blind guiding equipment and storage medium
CN113893142A (en) * 2021-10-08 2022-01-07 四川康佳智能终端科技有限公司 Blind person obstacle avoidance method, system, equipment and readable storage medium
WO2022077264A1 (en) * 2020-10-14 2022-04-21 深圳市锐明技术股份有限公司 Object recognition method, object recognition apparatus, and electronic device
CN114488244A (en) * 2022-02-16 2022-05-13 东南大学 Wearable blind-person aided navigation device and method based on semantic VISLAM and GNSS positioning
US20220198677A1 (en) * 2020-12-18 2022-06-23 Qualcomm Incorporated Object segmentation and feature tracking

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHOU FANGBO; ZHAO HUAILIN; LIU HUAPING: "Indoor Mobile Robot Target Search Based on The Scene Graphs", CAAI TRANSACTIONS ON INTELLIGENT SYSTEMS, vol. 17, no. 05, 22 June 2022 (2022-06-22), pages 1032 - 1038, XP009551459, ISSN: 1673-4785 *

Similar Documents

Publication Publication Date Title
US20210058608A1 (en) Method and apparatus for generating three-dimensional (3d) road model
JP6763448B2 (en) Visually enhanced navigation
JP7240367B2 (en) Methods, apparatus, electronic devices and storage media used for vehicle localization
US6690451B1 (en) Locating object using stereo vision
US20200042803A1 (en) Information processing method, information processing apparatus, and recording medium
US10872246B2 (en) Vehicle lane detection system
CN111310708B (en) Traffic signal lamp state identification method, device, equipment and storage medium
KR102167835B1 (en) Apparatus and method of processing image
CN111461981A (en) Error estimation method and device for point cloud splicing algorithm
WO2022041869A1 (en) Road condition prompt method and apparatus, and electronic device, storage medium and program product
JP2024032933A (en) Method and device for positioning image and map data base
WO2020156923A2 (en) Map and method for creating a map
CN111353453B (en) Obstacle detection method and device for vehicle
WO2023123837A1 (en) Map generation method and apparatus, electronic device, and storage medium
CN109696173A (en) A kind of car body air navigation aid and device
JP2019121876A (en) Image processing device, display device, navigation system, image processing method, and program
JP2018077162A (en) Vehicle position detection device, vehicle position detection method and computer program for vehicle position detection
CN112686951A (en) Method, device, terminal and storage medium for determining robot position
CN112422653A (en) Scene information pushing method, system, storage medium and equipment based on location service
WO2022188333A1 (en) Walking method and apparatus, and computer storage medium
US11956693B2 (en) Apparatus and method for providing location
CN115205384A (en) Blind guiding method and device and readable storage medium
WO2021189420A1 (en) Data processing method and device
WO2023245615A1 (en) Blind guiding method and apparatus, and readable storage medium
CN111191596A (en) Closed area drawing method and device and storage medium

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22947385

Country of ref document: EP

Kind code of ref document: A1