WO2022151507A1 - Plateforme mobile et procédé et appareil de commande de celle-ci, et support de stockage lisible par machine - Google Patents

Plateforme mobile et procédé et appareil de commande de celle-ci, et support de stockage lisible par machine Download PDF

Info

Publication number
WO2022151507A1
WO2022151507A1 (PCT/CN2021/072581)
Authority
WO
WIPO (PCT)
Prior art keywords
target
area
target area
depth map
movable platform
Prior art date
Application number
PCT/CN2021/072581
Other languages
English (en)
Chinese (zh)
Inventor
施泽浩
封旭阳
聂谷洪
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 filed Critical 深圳市大疆创新科技有限公司
Priority to PCT/CN2021/072581 priority Critical patent/WO2022151507A1/fr
Publication of WO2022151507A1 publication Critical patent/WO2022151507A1/fr

Links

Images

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis

Definitions

  • the present application relates to the field of intelligent perception, and in particular, to a control method of a movable platform, a movable platform, a control device and a machine-readable storage medium.
  • Using movable platforms such as drones, smart cars and intelligent robots to intelligently perceive a target, obtain the distance information between the target and the movable platform, and control the movable platform accordingly to follow, avoid obstacles, interact and so on is currently a research hotspot in the field of intelligent perception.
  • the distance information between the target object and the movable platform is usually obtained by acquiring a depth map including the target object, projecting the target frame on the depth map, and averaging the depth values in the target frame.
  • However, the distance information between the target object and the movable platform obtained by the above method is often inaccurate, which makes operations such as following, obstacle avoidance and information interaction performed by controlling the movable platform inaccurate, and may even lead to safety hazards.
  • the present application provides a movable platform and its control method, control device and machine-readable storage medium.
  • A method for controlling a movable platform includes: acquiring a depth map, collected by the movable platform, of the scene where a target object is located; obtaining, on the depth map, a first target area including the target object; adjusting the first target area to increase the proportion of the area corresponding to the target object in the first target area to obtain a tracking area; and controlling the movable platform to move relative to the target object according to the depth information of the tracking area.
  • a movable platform includes: an image acquisition device, a memory, and a processor; the image acquisition device is configured to acquire a depth map of a scene where a target object is located;
  • the memory is used for storing program codes; the processor calls the program codes and, when the program codes are executed, is used for performing the following operations: obtaining, on the depth map, a first target area including the target object; adjusting the first target area to increase the proportion of the area corresponding to the target object in the first target area to obtain a tracking area; and controlling the movable platform to move relative to the target object according to the depth information of the tracking area.
  • a control device includes: a memory and a processor; the memory is used for storing program codes; the processor calls the program codes, and when the program codes are executed When executed, it is used to perform the following operations: obtain the depth map of the scene where the target object is located and collected by the movable platform; obtain the first target area including the target object on the depth map; adjust the first target area to Increasing the proportion of the area corresponding to the target in the first target area to obtain a tracking area; and controlling the movable platform to move relative to the target according to the depth information of the tracking area.
  • a movable platform includes: an image acquisition device, a memory, and a processor; the image acquisition device is configured to acquire a depth map of a scene where a target object is located;
  • the memory is used for storing program codes; the processor calls the program codes and, when the program codes are executed, is used for performing the following operations: obtaining, on the depth map, a first target area including the target object; deleting all or part of other areas in the first target area that do not correspond to the target object to obtain a tracking area; and controlling the movable platform to move relative to the target object according to the depth information of the tracking area.
  • a control device includes: a memory and a processor; the memory is used for storing program codes; the processor calls the program codes, and when the program codes are executed When executed, it is used to perform the following operations: obtain the depth map of the scene where the target object is located and collected by the movable platform; obtain the first target area including the target object on the depth map; delete the first target area A tracking area is obtained from all or part of other areas in the area not corresponding to the target object; and the movable platform is controlled to move relative to the target object according to the depth information of the tracking area.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed, the method described in any of the above embodiments is implemented.
  • In the present application, the target area including the target object is obtained on the depth map of the scene where the target object is located, the target area is adjusted so that the proportion of the area corresponding to the target object in the target area is increased to obtain a tracking area, and the movable platform is then controlled to move relative to the target object according to the depth information of the tracking area. Because the target area is adjusted so that the proportion of the area corresponding to the target object increases, more accurate distance information between the movable platform and the target object can be obtained from the depth information of the tracking area; the movement of the movable platform relative to the target object can then be controlled more accurately, potential safety hazards are avoided, and the movement performance of the movable platform is improved.
  • FIG. 1 is a schematic diagram of a depth map that can be used to obtain target object distance information according to an exemplary embodiment of the present application.
  • Fig. 2 is a flowchart of a method for controlling a movable platform according to an exemplary embodiment of the present application.
  • FIG. 3 is a flowchart of acquiring a first target area including a target on a depth map according to an exemplary embodiment of the present application.
  • FIG. 4 shows the alignment relationship, in an ideal situation, between the cached image frames captured by the main camera and by the binocular camera of the movable platform according to an exemplary embodiment of the present application.
  • FIG. 5A is a color image including a target obtained by a movable platform according to an exemplary embodiment of the present application.
  • FIG. 5B is a depth map including a target obtained by a movable platform according to an exemplary embodiment of the present application.
  • FIG. 6 shows the alignment relationship, in an actual situation, between the cached image frames captured by the main camera and by the binocular camera of the movable platform according to an exemplary embodiment of the present application.
  • FIG. 7 is a flow chart of correcting the first target area of the depth map by a method based on feature matching according to an exemplary embodiment of the present application.
  • FIG. 8 is an effect diagram of adjusting the first target area by a method based on image semantic segmentation according to an exemplary embodiment of the present application.
  • FIG. 9 is an effect diagram of performing semantic segmentation on an image by an image semantic segmentation method according to an exemplary embodiment of the present application.
  • FIG. 10 is a flowchart of adjusting the first target area based on a semantic segmentation result based on a deep learning model according to an exemplary embodiment of the present application.
  • Fig. 11A is a schematic diagram of a deep learning model for image semantic segmentation according to an exemplary embodiment of the present application.
  • FIG. 11B is a feature response diagram of a deep learning model to a target according to an exemplary embodiment of the present application.
  • FIG. 12 is an effect diagram of the erosion processing of a morphological operation according to an exemplary embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of a movable platform according to an exemplary embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of a control device according to an exemplary embodiment of the present application.
  • Fig. 15 is a flowchart of another method for controlling a movable platform according to an exemplary embodiment of the present application.
  • Although the terms first, second, third, etc. may be used in this application to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another.
  • For example, the first information may also be referred to as the second information, and similarly the second information may also be referred to as the first information, without departing from the scope of the present application.
  • The word "if" as used herein can be interpreted as "at the time of", "when" or "in response to determining".
  • Using mobile platforms such as unmanned aerial vehicles, smart cars and intelligent robots, or the acquisition devices mounted on them, to obtain the distance information between a target and the mobile platform, and controlling the movable platform accordingly to follow, avoid obstacles, interact and so on, is a research hotspot.
  • However, the obtained distance information between the target object and the movable platform is often inaccurate, which makes operations such as following, obstacle avoidance and information interaction performed by controlling the movable platform inaccurate, and may even bring security risks.
  • the drone is usually equipped with a main camera and a binocular camera.
  • the main camera can obtain a color image including the scene where the target object is located
  • the binocular camera can obtain a binocular depth map including the scene where the target object is located.
  • the target may be occluded by other objects.
  • In that case, the depth information obtained on the binocular depth map according to the following target frame area contains background noise, which causes the drone to measure the distance inaccurately and to follow unstably.
  • For example, the target object in the target frame (shown by the gray rectangle) is occluded by grass. Since the grass covering the target is closer to the UAV than the target itself, the distance between the UAV and the target calculated from the average value in the target frame is shorter than the real distance. This inaccurate depth estimation causes the drone to be unable to keep up quickly once the target moves away.
  • the above application scenario is only an exemplary illustration.
  • the obtained distance information between the target and the movable platform may be inaccurate due to other reasons such as the target being a moving object and the inaccurate projection of the projection frame, which is not limited in this application.
  • the target frame may be a rectangular frame or a target frame of other shapes, which is not limited in this application.
  • the present application provides a control method for a movable platform.
  • the control method may include the following steps:
  • Step S201, acquiring the depth map, collected by the movable platform, of the scene where the target object is located;
  • Step S202, obtaining a first target area including the target object on the depth map;
  • Step S203 adjusting the first target area so that the proportion of the area corresponding to the target object in the first target area is increased to obtain a tracking area;
  • Step S204 controlling the movable platform to move relative to the target according to the depth information of the tracking area.
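  • As a rough illustration of steps S201 to S204, the following Python sketch strings the four steps together. The helper routines (capture, detection, adjustment and motion control) are hypothetical placeholders, not functions defined by this disclosure:

```python
# Hypothetical sketch of the S201-S204 control loop; the platform, detector and
# segmenter objects and their methods are assumed for illustration only.
def control_step(platform, detector, segmenter):
    depth_map = platform.capture_depth_map()           # S201: depth map of the scene containing the target
    first_area = detector.locate_target(depth_map)     # S202: first target area (e.g. a bounding box) on the depth map
    tracking_area = segmenter.shrink_to_target(depth_map, first_area)  # S203: raise the target's share of the area
    distance = depth_map[tracking_area].mean()         # depth information of the tracking area (boolean mask assumed)
    platform.move_relative_to_target(distance)         # S204: follow / avoid / interact based on the distance
```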
  • the movable platform may be an unmanned aerial vehicle, a smart car, an intelligent robot, an unmanned ship, etc., and the specific type of the movable platform is not limited in this application.
  • The depth map, collected by the movable platform, of the scene where the target object is located can be obtained by a method based on binocular stereo vision: based on the principle of parallax, an imaging device obtains two images of the object to be measured from different positions, the three-dimensional position information of the object is obtained by calculating the position deviation of corresponding points between the images, and the depth map of the scene where the target object is located is thereby obtained.
  • a binocular camera may be mounted on the movable platform, and the binocular camera may be used to obtain an image of the scene where the target object is located. Since the two cameras in the binocular camera are located at different positions, the position of the corresponding point in the image collected by each camera of the binocular camera has a deviation. Based on the deviation information, the three-dimensional image of the object in the image can be extracted. Location information to obtain the depth map of the scene where the target is located.
  • The depth map of the scene where the target is located is obtained based on two images with parallax; the three-dimensional information can be obtained by the principle of triangulation, that is, a triangle is formed between the positions of the binocular cameras and the target. Knowing the positional relationship between the two cameras of the binocular camera, the three-dimensional size of objects in the common field of view of the two cameras and the three-dimensional coordinates of the feature points of the spatial object can be obtained.
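  • For the triangulation principle described above, the depth Z of a pixel in a rectified binocular pair follows from the disparity d, the focal length f and the baseline B as Z = f·B/d. A minimal sketch, assuming rectified 8-bit grayscale inputs and OpenCV's block matcher (one possible implementation, not the one prescribed by this disclosure):

```python
import cv2
import numpy as np

def depth_from_stereo(left_gray, right_gray, focal_px, baseline_m):
    """Estimate a depth map (in metres) from a rectified binocular pair using Z = f * B / d."""
    matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    # StereoBM returns fixed-point disparities scaled by 16
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
    depth = np.zeros_like(disparity)
    valid = disparity > 0                      # pixels with a usable disparity
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth
```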
  • acquiring the depth map of the scene where the target object is based on two images with parallax can also be implemented based on methods such as deep learning, which is not limited in this application.
  • In order to obtain the depth map of the scene where the target object is located as collected by the movable platform, in addition to the method based on binocular stereo vision, the three-dimensional coordinates of the target can also be obtained based on laser radar, ultrasonic ranging and the like, from which the depth map is derived; the present application does not limit the specific way in which the movable platform obtains the depth map of the scene where the target object is located.
  • Step S202, obtaining the first target area including the target object on the depth map, can be achieved in the manner shown in FIG. 3:
  • Step S301, acquiring a second target area containing the target object, the second target area being located on the color image collected by the movable platform;
  • Step S302 projecting the second target area onto the depth map to obtain a first target area including the target on the depth map.
  • the movable platform may be equipped with a main camera capable of acquiring a color image of the space surrounding the location of the movable platform.
  • The main camera may be a camera mounted under the drone body or on a front-side gimbal, and is used to collect images of the scene within the camera's field of view so as to observe the environment around the unmanned aerial vehicle.
  • After the movable platform uses its main camera to capture a color image containing the target object, a second target area containing the target object can be acquired on the color image.
  • The second target area including the target object on the color image may be a target area manually circled by the user, may be circled automatically based on conditions input by the user in advance that meet the user's needs, or may be a target area including the target object that is selected automatically based on various deep learning models.
  • the present application does not limit the acquisition method of the second target area.
  • Projecting the second target area onto the depth map yields, on the depth map, a first target area including the target object.
  • Projecting the second target area of the acquired color image including the target object onto the depth map may be achieved by projecting the second target area on the color image generated at time T onto the depth map generated at time T.
  • The color image is collected and generated by the main camera, and the depth map is collected and generated by the binocular camera. The main camera and the binocular camera perform image acquisition simultaneously, acquiring a color image 401 and a depth map 402 of the scene where the target object is located, where T0 to T5 represent color image frames and depth image frames captured at six different moments. Ideally, the color image captured by the main camera and the depth map captured by the binocular camera at the same moment contain exactly the same image content.
  • In that ideal case, when the target area on the color image generated at a certain moment is projected onto the depth map generated at the same moment, the image content contained in the two target areas is exactly the same, so the projection result of the second target area of the color image on the depth map can be used directly as the first target area.
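  • In this ideal, time-aligned case, transferring the box from the color image to the depth map mainly has to account for the difference in image resolution. The following sketch illustrates that simplification only (it deliberately ignores the extrinsic offset between the main camera and the binocular camera, which a full projection would have to handle):

```python
def project_box_to_depth(box, color_shape, depth_shape):
    """Scale an (x, y, w, h) box from color-image coordinates to depth-map coordinates."""
    x, y, w, h = box
    sx = depth_shape[1] / color_shape[1]   # width ratio between depth map and color image
    sy = depth_shape[0] / color_shape[0]   # height ratio
    return (int(x * sx), int(y * sy), int(w * sx), int(h * sy))
```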
  • In this case, the "first target area" that is adjusted in step S203 refers to the first target area obtained by direct projection.
  • FIG. 5A and FIG. 5B are respectively the color image containing the target object collected by the main camera and the depth map containing the target object collected by the binocular camera, wherein the area selected by the black rectangle is the target area containing the target object. Since the color image contains more information, the second target area containing the target object can be determined more accurately based on the color image. Since the depth map contains depth information, projecting the second target area of the color image onto the depth map yields the first target area of the depth map containing the target object, and the distance information between the movable platform and the target can then be obtained from the depth information extracted from the depth map.
  • The reason the first target area containing the target object on the depth map is obtained by projection from the second target area on the color image collected by the movable platform is that the color image contains more, and more intuitive, useful information, so the target area containing the target object can be determined faster and more accurately.
  • the depth map is a grayscale map, and directly determining the target area including the target object based on the grayscale map is cumbersome and slow to process, and the accuracy will be affected to a certain extent.
  • Alternatively, the first target area including the target object may be obtained directly on the depth map acquired by the movable platform, based on methods such as feature recognition, contour recognition and deep learning, which is not limited in this application. In that case, the "first target area" that is adjusted in step S203 refers to the first target area obtained directly on the depth map acquired by the movable platform.
  • Ideally, the image content of the color image captured by the main camera at a certain moment and the image content of the depth map captured by the binocular camera at the corresponding moment are exactly the same.
  • In practice, however, the color image captured by the main camera at a certain moment and the depth map captured by the binocular camera at the corresponding moment may contain different content at the same pixel position.
  • the color image 601 and the depth map 602 acquired in one acquisition period T0 to T5 are respectively buffered in their respective buffers.
  • Because the processing time experienced by color images and depth maps differs and the size of the cache is limited, at a certain moment the image frame collected by the main camera at time T0 may already have been discarded. Then, for the depth map collected at time T0, only the color image at time T1, whose timestamp is closest, can be used for target-frame projection.
  • When the depth map collected at time T0 is projected based on the color image collected at time T1, the content of the second target region containing the target object in the color image and the content of the first target region containing the target object in the depth map may be different, causing the distance information obtained based on the first target region in the depth map to be inaccurate.
  • In addition, the target may be a moving object, which causes the target to be located at different pixel positions in the color image and in the depth map.
  • the content contained in the first target area and the second target area may be different.
  • the distance information between the movable platform and the target object obtained based on the first target area on the depth map is inaccurate.
  • the method includes:
  • Step S701 respectively extracting the first feature point on the color image and the second feature point on the depth map
  • Step S702 performing feature matching on the first feature point and the second feature point
  • Step S703 modifying the obtained first target area of the depth map based on the result of the feature matching.
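  • One possible realisation of steps S701 to S703, assuming ORB features and a simple median-shift correction of the projected box (the feature type and the correction rule are illustrative choices; this disclosure does not prescribe them):

```python
import cv2
import numpy as np

def correct_box_by_matching(color_gray, depth_gray, box):
    """Shift the first target area on the depth map by the median displacement of matched features."""
    orb = cv2.ORB_create(500)
    kp1, des1 = orb.detectAndCompute(color_gray, None)   # first feature points (color image)
    kp2, des2 = orb.detectAndCompute(depth_gray, None)   # second feature points (depth map)
    if des1 is None or des2 is None:
        return box
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
    if not matches:
        return box
    shifts = np.array([np.array(kp2[m.trainIdx].pt) - np.array(kp1[m.queryIdx].pt) for m in matches])
    dx, dy = np.median(shifts, axis=0)                    # robust estimate of the color-to-depth offset
    x, y, w, h = box
    return (int(x + dx), int(y + dy), w, h)
```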
  • Correcting the first target area containing the target object on the depth map in this way can avoid the problem that, due to hardware processing or target movement, the first target area determined on the depth map is inaccurate, which in turn makes the distance information between the movable platform and the target object inaccurate.
  • The feature points may be extracted based on conventional image processing methods such as image texture information and pixel intensity information, or based on deep learning networks, or in other ways, which is not limited in this application.
  • Correcting the obtained first target area of the depth map based on the result of the feature matching may consist of aligning the first feature points on the color image completely with the second feature points on the depth map. For example, if the target is a person, the extracted first and second feature points may be feature parts such as the nose and mouth; the extracted first and second feature points are then aligned one by one, that is, the nose, mouth, etc. in the color image are aligned with the nose, mouth, etc. in the depth map, respectively, to obtain the corrected first target area of the depth map.
  • other correction methods are also possible, for example, aligning the centers of the plurality of first feature points with the centers of the plurality of second feature points, etc.
  • the present application does not limit the specific correction methods.
  • Step S203, adjusting the first target area so that the proportion of the area corresponding to the target object in the first target area increases, may include: adjusting the first target area so that the boundary of the first target area shrinks in a direction approaching the target object.
  • FIG. 5B is the depth map, collected by the movable platform, of the scene where the target object (here a person) is located.
  • On the depth map, the first target area including the target object can be obtained (taking the area selected by the black rectangle as the first target area as an example). The first target area in FIG. 5B is adjusted so that the proportion of the area corresponding to the target object in the first target area increases, yielding the tracking area, which may be as shown in FIG. 8, where the tracking area is the area surrounded by the black closed curve of the humanoid outline.
  • The first target area containing the target object in FIG. 5B includes not only the target object (the person) but also the ground. If the depth values of the first target area shown in FIG. 5B are averaged and the result is taken as the distance between the movable platform and the target, this is not the real distance but the average distance between the movable platform and both the target (the person) and the background (the ground) in the first target area. The tracking area in FIG. 8, by contrast, includes almost only the target object (the person), so the distance information calculated based on the tracking area is closer to the real distance.
  • Adjusting the first target area so that the proportion of the area corresponding to the target object increases may be achieved by adjusting the first target area so that its boundary shrinks towards the center of the first target area; when the target object is not located in the center of the first target area, it may instead be achieved by adjusting the first target area so that its boundary shrinks towards the center of the target object.
  • those skilled in the art can determine according to other actual situations, which is not limited in this application.
  • Alternatively, other areas in the first target area that do not correspond to the target object can be deleted.
  • In this case the boundary of the first target area need not be adjusted; instead, part or all of the areas other than the one corresponding to the target object are deleted directly from the first target area, which likewise increases the proportion of the area corresponding to the target object in the first target area.
  • Either way, the proportion of the area corresponding to the target object in the first target area is increased. Therefore, based on the depth information of the tracking area obtained by the adjustment, more accurate distance information between the movable platform and the target can be obtained, the movement of the movable platform relative to the target can be controlled more accurately, potential safety hazards are avoided, and the movement performance of the movable platform is improved.
  • the first thing to do is to distinguish the area corresponding to the target in the first target area from other areas. For this, semantic segmentation can be used.
  • Once the area corresponding to the target object in the first target area has been determined, the first target area can be further adjusted so that its boundary shrinks towards the target object; or all or part of the other areas in the first target area that do not correspond to the target object can be deleted, to increase the proportion of the area corresponding to the target object in the first target area.
  • the possibility of using the above two means together cannot be ruled out. The following will introduce how to adjust the first target area in combination with semantic segmentation, so that the boundary of the first target area shrinks in a direction close to the target.
  • Adjusting the first target area so that its boundary shrinks towards the target object may include: performing semantic segmentation on the first target area of the depth map including the target object (that is, using image semantic segmentation technology), and adjusting the first target area based on the semantic segmentation result.
  • Image semantic segmentation technology refers to the use of computer vision, image processing technology, etc. to mark different types of objects in an image, and mark the pixel positions of different types of objects on the image. As shown in Figure 9, given a photo of a person riding a motorcycle (left image), based on image semantic segmentation technology, it can be segmented according to the different semantic meanings expressed in the image (right image), and the image shown on the right can be obtained. Pixel locations where objects of different classes (people and motorcycles) are located.
  • Image semantic segmentation technology is the cornerstone of image understanding and an important part of computer vision.
  • Conventional image semantic segmentation techniques include pixel-level thresholding methods, pixel-clustering-based segmentation methods, graph-partitioning segmentation methods and so on.
  • Deep-learning-based techniques include image semantic segmentation based on fully convolutional neural networks, image semantic segmentation based on dilated convolutions, and so on.
  • the image semantic segmentation for the first target area described in this application may adopt any technology capable of realizing image semantic segmentation, which is not limited in this application.
  • In this embodiment, image semantic segmentation is performed on the first target area including the target object on the depth map, and the first target area is adjusted based on the segmentation result so that the proportion of the area corresponding to the target object in the target area is increased, which is accurate and effective. Therefore, based on the depth information of the tracking area obtained after the adjustment, more accurate distance information between the movable platform and the target can be obtained, the movement of the movable platform relative to the target can be controlled more accurately, safety hazards are avoided, and the movement performance of the movable platform is improved.
  • Image semantic segmentation of the first target area may use a deep-learning-based technique. In that case, performing semantic segmentation on the first target area and adjusting the first target area based on the segmentation result, as shown in FIG. 10, includes:
  • Step S1001 performing image semantic segmentation on the depth map based on a pre-trained deep learning model
  • Step S1002 adjusting the first target area according to the semantic segmentation result.
  • The depth map to be semantically segmented can be input into the pre-trained deep learning model, which outputs the depth map after image semantic segmentation, that is, with the pixel positions of objects of different categories marked, similar to the result shown in FIG. 9.
  • According to the semantic segmentation result, the first target area including the target object on the depth map is adjusted so that the proportion of the area corresponding to the target object in the first target area increases; that is, the first target area containing the target object shown in FIG. 5B can similarly be adjusted to the area shown in FIG. 8.
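  • A minimal sketch of this adjustment, assuming a pre-trained segmentation model that returns a per-pixel class map (the model object, its predict method and the target class id are placeholders, not part of this disclosure):

```python
import numpy as np

def refine_with_segmentation(depth_map, box, seg_model, target_class):
    """Keep only target-class pixels inside the first target area; return the tracking mask and its mean depth."""
    class_map = seg_model.predict(depth_map)          # hypothetical API: per-pixel class labels for the depth map
    x, y, w, h = box                                  # first target area as a bounding box
    mask = np.zeros(depth_map.shape, dtype=bool)
    mask[y:y + h, x:x + w] = class_map[y:y + h, x:x + w] == target_class
    depths = depth_map[mask]                          # depth information of the tracking area
    distance = float(depths.mean()) if depths.size else None
    return mask, distance
```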
  • the above describes how to adjust the first target area in combination with semantic segmentation, so that the boundary of the first target area shrinks in the direction close to the target object, so as to increase the area ratio of the corresponding target object in the first target area. It can be understood that, it can also be combined with semantic segmentation to delete all or part of the other areas in the area that does not correspond to the target in the first target area, so as to increase the proportion of the area corresponding to the target in the first target area.
  • The implementation process is similar to that described above; the only difference is that, after distinguishing the target object from non-target objects in the first target area using the semantic segmentation result, the areas corresponding to non-target objects in the first target area are deleted. Since the principle is similar to that of the previous embodiment, it is not repeated here.
  • The training process of the deep learning model includes: acquiring the semantic labels, output by the deep learning model, corresponding to the pixels in a training image; calculating a first loss function based on the semantic labels; and training the deep learning model based on the first loss function.
  • The deep learning model can be designed as required, and can include at least one of a convolution layer, a batch normalization layer, a nonlinear activation layer, etc., or an existing deep learning model for image semantic segmentation may be used; this application imposes no restrictions.
  • the initial parameters of the deep learning model may be determined through pre-training, or may be determined according to empirical values, which are not limited in this application.
  • Training samples with semantic labels can be input into the deep learning model to be trained, and the deep learning model can be trained based on the predefined first loss function until the first loss function converges or falls below a specified threshold.
  • the training image may be a depth map represented by a grayscale image.
  • the first loss function may be designed according to training needs, and may also use the first loss function commonly used in image semantic segmentation technology based on a deep learning model, which is not limited in this application.
  • the first loss function may be a cross-entropy loss function.
  • Here y is the semantic segmentation label at each pixel, and y' is the predicted value at that pixel, that is, the output of the deep learning network for that pixel.
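  • A standard per-pixel cross-entropy consistent with these symbols (a common form given for illustration; it is not an expression quoted from the original disclosure) is:

```latex
L_1 = -\frac{1}{|\Omega|} \sum_{p \in \Omega} \Big[ y_p \log y'_p + (1 - y_p)\,\log\big(1 - y'_p\big) \Big]
```

  • where Ω denotes the set of pixels in the training image.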
  • In this embodiment, image semantic segmentation based on a deep learning model is performed on the first target area including the target object on the depth map, and the first target area is adjusted based on the segmentation result so that the proportion of the area corresponding to the target object in the target area increases. Because the semantic segmentation is based on a deep learning model, the model can learn more information. Therefore, based on the depth information of the tracking area obtained after the adjustment, more accurate distance information between the movable platform and the target can be obtained, the movement of the movable platform relative to the target can be controlled more accurately, safety hazards are avoided, and the movement performance of the movable platform is improved.
  • The deep learning model may be trained in two stages; that is, before the deep learning model is trained based on the first loss function, the training process further includes: acquiring the class label, output by the deep learning model, corresponding to the training image; calculating a second loss function based on the class label; training the deep learning model based on the second loss function; and thereby determining the initial parameters of the deep learning model before it is trained based on the first loss function.
  • FIG. 11A shows the supervised classification learning of the deep learning model on the grayscale image classification task, so that the deep learning model has high sensitivity to semantic objects.
  • the specific implementation of an exemplary embodiment is as follows:
  • the deep learning model is trained by using a single-channel grayscale image and the category label corresponding to each image.
  • The categories are the target categories that the platform is expected to follow, for example people, cars, boats, etc.
  • In FIG. 11A, the three categories of people, vehicles and boats are given as illustrative examples.
  • The training data are input to the deep learning model, which outputs a vector P of length N, where N is the number of classes; the i-th element of P is the predicted probability p_i of the i-th target category.
  • Back-propagation learning is then performed on the deep learning model, and the model is trained until the loss function converges or falls below a threshold.
  • the training data is a grayscale image.
  • The grayscale images may also include grayscale images converted from color images, used as fake depth maps to increase the amount of training data.
  • the second loss function is a cross-entropy loss function.
  • The second loss function may take the form given after this list, where:
  • N is the batch size of the training data;
  • M is the number of categories;
  • y_ic is the target category label corresponding to the i-th input image;
  • p_ic is the corresponding predicted probability for the i-th input image.
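  • A batch cross-entropy consistent with the symbols listed above (a standard form given for illustration; the exact expression is not reproduced in this text) would be:

```latex
L_2 = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{M} y_{ic} \,\log p_{ic}
```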
  • Based on the second loss function, the network performs supervised classification learning on the grayscale image classification task.
  • the trained network shows a higher response on the feature map for a specific input target, as shown in Figure 11B.
  • the deep learning model is trained on the grayscale image semantic segmentation task to obtain the final image semantic segmentation model.
  • the specific implementation of an exemplary embodiment is:
  • The network parameters obtained by training the deep learning model on the grayscale image classification task described above are used as pre-training parameters to initialize the deep learning model; the training data of the deep learning model are single-channel grayscale images.
  • back-propagation learning is performed on the deep learning network based on the first loss function.
  • In this way, the resulting deep learning network is better adapted to tasks on grayscale images, overcoming the defect that, because grayscale images contain limited information, the performance of the deep learning network would otherwise not be good enough.
  • Inputting the depth map including the target object into the deep learning model obtained after the two training stages allows the target to be segmented semantically more accurately, and thus the first target area to be determined more accurately.
  • the real distance information between the target object and the movable platform can be obtained more accurately.
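  • The two-stage scheme described above (classification pre-training on grayscale images with the second loss, then segmentation fine-tuning with the first loss, initialised from the pre-trained parameters) could be organised roughly as follows in PyTorch; the network modules and data loaders are assumed, and the classification head is assumed to handle pooling and flattening of the backbone features:

```python
import torch
import torch.nn as nn

def two_stage_training(backbone, cls_head, seg_head, cls_loader, seg_loader, epochs=10):
    """Stage 1: classify grayscale images (second loss); stage 2: reuse the backbone for segmentation (first loss)."""
    ce = nn.CrossEntropyLoss()
    # Stage 1: supervised classification on single-channel grayscale images
    opt = torch.optim.Adam(list(backbone.parameters()) + list(cls_head.parameters()), lr=1e-3)
    for _ in range(epochs):
        for images, labels in cls_loader:              # images: (B, 1, H, W), labels: class indices
            loss = ce(cls_head(backbone(images)), labels)
            opt.zero_grad(); loss.backward(); opt.step()
    # Stage 2: semantic segmentation, initialised with the pre-trained backbone
    opt = torch.optim.Adam(list(backbone.parameters()) + list(seg_head.parameters()), lr=1e-4)
    for _ in range(epochs):
        for images, masks in seg_loader:               # masks: per-pixel class indices, shape (B, H, W)
            logits = seg_head(backbone(images))        # per-pixel class scores, shape (B, M, H, W)
            loss = ce(logits, masks)
            opt.zero_grad(); loss.backward(); opt.step()
    return backbone, seg_head
```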
  • In a depth map, the boundary between the target and the background may be blurred.
  • As a result, the first target area obtained by semantically segmenting the depth map may be larger than the area where the target is actually located.
  • The left picture of FIG. 12 is a depth map processed with the deep learning model, in which the white human-shaped outline is the semantic segmentation result determined by the model; the white outline marks the first target area where the actual target (the person) is located.
  • To address this, erosion processing from morphological operations may be performed on the semantic segmentation result output by the deep learning model.
  • Specifically, the semantic segmentation result output by the deep learning model may be convolved to reduce the segmented area representing the target in the semantic segmentation result, and the first target area is then adjusted based on the convolved segmentation result.
  • the specific size of the convolution operation can be determined according to the actual situation, for example, a 5*5 size convolution kernel can be used to convolve the semantic segmentation result output by the deep learning model.
  • Performing convolution on the semantic segmentation result output by the deep learning model includes: during each convolution, taking the minimum pixel value within the area covered by the convolution kernel as the pixel value of the center point of that area.
  • Alternatively, a weighted combination of the pixel values within the area covered by the convolution kernel may be taken as the pixel value of the center point of that area; the pixel value of the center point may also be determined in other ways, which is not limited in this application.
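  • The "minimum pixel value under the kernel" rule described above corresponds to grayscale or binary erosion. A sketch using a 5×5 kernel (the kernel size is only the example mentioned above, and OpenCV's erode is one way to realise the operation):

```python
import cv2
import numpy as np

def erode_segmentation(seg_mask, kernel_size=5):
    """Shrink the segmented target region: each output pixel takes the minimum value under the kernel."""
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    return cv2.erode(seg_mask.astype(np.uint8), kernel, iterations=1)
```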
  • To adjust the first target area so that the proportion of the area corresponding to the target object in the first target area increases, a moving-object segmentation technology based on frame-difference statistics can also be used.
  • the moving object segmentation technology based on frame difference statistics is a method for obtaining the contour of a moving object by performing a difference operation on two consecutive frames of a video image sequence.
  • When the target object in the monitored scene is a moving object, the two frames are subtracted, the absolute value of the pixel-value difference at each corresponding position is compared with a threshold, and the motion characteristics of objects in the video or image sequence are analysed accordingly.
  • In this way the contour of the target object can be obtained, and the first target area including the target object in the depth map can then be determined.
  • The frame-difference-based moving-object segmentation can be implemented by performing a difference operation between the depth map and its adjacent frame depth maps, determining the contour of the target object on the depth map based on the difference result, and regarding the area enclosed by the contour as the adjusted area.
  • FIG. 5B is the depth map, acquired by the movable platform, of the scene where the target object is located.
  • Assume the target object is a moving person.
  • A difference operation is performed between the depth map acquired at a certain moment and the adjacent depth map acquired at a previous (or subsequent) moment, and the absolute value of the pixel-value difference at each corresponding position of the two frames is computed.
  • Based on these absolute differences, the contour of the target (the person) can be obtained, and based on the contour a first target area that contains only the target can be re-determined. Calculating the distance between the target object and the movable platform from the distance information of this first target area is then more accurate and true.
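  • A minimal frame-difference sketch of the procedure above, using the absolute difference between consecutive depth maps and a threshold to extract the moving target's contour (the threshold value and the "largest contour" rule are illustrative assumptions):

```python
import cv2
import numpy as np

def moving_target_mask(depth_prev, depth_curr, threshold=0.3):
    """Mark pixels whose depth changed between consecutive frames; the largest contour is taken as the target."""
    diff = np.abs(depth_curr.astype(np.float32) - depth_prev.astype(np.float32))
    motion = (diff > threshold).astype(np.uint8)          # 0/1 motion map
    contours, _ = cv2.findContours(motion, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    target = max(contours, key=cv2.contourArea)           # assume the largest moving region is the target
    mask = np.zeros_like(motion)
    cv2.drawContours(mask, [target], -1, 1, thickness=-1)  # fill the contour
    return mask.astype(bool)
```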
  • Controlling the movable platform to move relative to the target object according to the depth information of the tracking area includes: controlling, according to the depth information of the tracking area, the movable platform to follow the movement of the target object.
  • The first target area determined on the depth map includes the target object, and the area where the target object is located accounts for a large proportion of it. Based on the first target area, the distance information between the target object and the movable platform is obtained using the average depth within the area, and based on that distance information the movable platform can be controlled more accurately to follow the movement of the target object.
  • The control method of the movable platform of the present application can also be used in many other scenarios, such as obstacle avoidance for smart cars and ranging for intelligent robots.
  • Controlling the movable platform to move relative to the target may also include controlling the running speed and direction of a smart car, controlling the forward direction and speed of an intelligent robot, and so on, which is not limited in this application.
  • FIG. 15 shows another method for controlling a movable platform according to an exemplary embodiment of the present application, including the following steps: acquiring the depth map, collected by the movable platform, of the scene where the target object is located; acquiring a first target area including the target object on the depth map; deleting all or part of other areas in the first target area that do not correspond to the target object to obtain a tracking area; and controlling the movable platform to move relative to the target object according to the depth information of the tracking area.
  • acquiring a first target area including the target on the depth map includes: acquiring a second target area including the target, where the second target area is located in the area collected by the movable platform. on the color image; project the second target area onto the depth map to obtain the first target area.
  • projecting the second target area onto the depth map includes: projecting the second target area on the color image generated at time T onto the depth map generated at time T.
  • projecting the second target area onto the depth map further comprising: extracting a first feature point on the color image and a second feature point on the depth map respectively; Feature matching is performed between the first feature point and the second feature point; based on the result of the feature matching, the obtained first target area of the depth map is corrected.
  • the deleting all or part of other regions in the first target region that does not correspond to the region of the target includes: performing semantic segmentation on the first target region, and deleting all the regions based on the results of the semantic segmentation. All or part of other areas in the first target area that do not correspond to the area of the target.
  • Performing semantic segmentation on the first target area and deleting, based on the result of the semantic segmentation, all or part of the other areas in the first target area that do not correspond to the target object includes: performing semantic segmentation on the depth map based on a pre-trained deep learning model; and deleting, according to the semantic segmentation result, all or part of the other areas in the first target area that do not correspond to the target object.
  • the training process of the deep learning model includes: acquiring semantic labels corresponding to pixels in a training image output by the deep learning model, calculating a first loss function based on the semantic segmentation labels, and calculating a first loss function based on the first loss. function to train the deep learning model.
  • Before the deep learning model is trained based on the semantic labels, the training process further includes: acquiring the class label, output by the deep learning model, corresponding to the training image; calculating a second loss function based on the class label; training the deep learning model based on the second loss function; and determining the initial parameters of the deep learning model before it is trained based on the first loss function.
  • the first loss function and/or the second loss function is a cross-entropy loss function.
  • the training images include grayscale images.
  • Deleting all or part of other areas in the first target area that do not correspond to the target object further includes: performing convolution on the semantic segmentation result output by the deep learning model using a preset convolution kernel to reduce the segmented area representing the target in the semantic segmentation result, and adjusting the first target area based on the convolved semantic segmentation result.
  • performing convolution on the semantic segmentation result output by the deep learning model includes: during each convolution, taking the minimum pixel value of the area where the convolution kernel is located as the pixel value of the center point of the area where the convolution kernel is located. .
  • the deleting all or part of other regions in the first target region that does not correspond to the region of the target includes: performing a difference operation on the depth map and its adjacent frame depth maps, and Based on the difference result, the contour of the target object on the depth map is determined, and other regions other than the region included in the contour are deleted.
  • controlling the movable platform to move relative to the target object according to the depth information of the tracking area includes: controlling the movable platform to follow the movement of the target object according to the depth information of the tracking area.
  • The present application also provides a movable platform, as shown in FIG. 13, which includes an image acquisition device, a memory 1302 and a processor 1303.
  • the image acquisition device is used to acquire the depth map of the scene where the target object is located;
  • the memory 1302 is used to store program codes
  • the processor 1303 calls the program code and, when the program code is executed, is used to perform the following operations: acquiring, on the depth map, a first target area including the target object; adjusting the first target area to increase the proportion of the area corresponding to the target object in the first target area to obtain a tracking area; and controlling the movable platform to move relative to the target object according to the depth information of the tracking area.
  • the processor 1303 is further configured to perform the following operations:
  • the movable platform may be a drone, a smart car, a smart robot, etc., which is not limited in this application.
  • In the present application, the target area including the target object is obtained on the depth map of the scene where the target object is located, the target area is adjusted so that the proportion of the area corresponding to the target object in the target area is increased to obtain a tracking area, and the movable platform is then controlled to move relative to the target object according to the depth information of the tracking area. Because the target area is adjusted so that the proportion of the area corresponding to the target object increases, more accurate distance information between the movable platform and the target object can be obtained from the depth information of the tracking area; the movement of the movable platform relative to the target object can then be controlled more accurately, potential safety hazards are avoided, and the movement performance of the movable platform is improved.
  • The present application also provides a control device, as shown in FIG. 14, which includes a memory 1401 and a processor 1402.
  • the memory 1401 is used to store program codes
  • the processor 1402 calls the program code and, when the program code is executed, is used to perform the following operations: acquiring the depth map, collected by the movable platform, of the scene where the target object is located; acquiring the first target area including the target object on the depth map; adjusting the first target area to increase the proportion of the area corresponding to the target object in the first target area to obtain a tracking area; and controlling the movable platform to move relative to the target object according to the depth information of the tracking area.
  • the processor 1402 is further configured to perform the following operations:
  • Alternatively, the operations include: acquiring the depth map, collected by the movable platform, of the scene where the target object is located; acquiring the first target area including the target object on the depth map; deleting all or part of other areas in the first target area that do not correspond to the target object to obtain a tracking area; and controlling the movable platform to move relative to the target object according to the depth information of the tracking area.
  • In the present application, the target area including the target object is obtained on the depth map of the scene where the target object is located, the target area is adjusted so that the proportion of the area corresponding to the target object in the target area is increased to obtain a tracking area, and the movable platform is then controlled to move relative to the target object according to the depth information of the tracking area. Because the target area is adjusted so that the proportion of the area corresponding to the target object increases, more accurate distance information between the movable platform and the target object can be obtained from the depth information of the tracking area; the movement of the movable platform relative to the target object can then be controlled more accurately, potential safety hazards are avoided, and the movement performance of the movable platform is improved.
  • A machine-readable storage medium stores a computer program which, when executed on a computer, implements the method of any of the above embodiments of the present application; details are not repeated here.
  • the machine-readable storage medium may be an internal storage unit of the device described in any of the foregoing embodiments, such as a hard disk or a memory of the device.
  • the machine-readable storage medium may also be an external storage device of the device, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card equipped on the device , Flash Card (Flash Card) and so on.
  • the machine-readable storage medium may also include both an internal storage unit of the device and an external storage device.
  • the machine-readable storage medium is used to store the computer program and other programs and data required by the apparatus.
  • the computer-readable storage medium can also be used to temporarily store data that has been or will be output.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM) or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method for controlling a movable platform, comprising: acquiring a depth map of a scene where a target object is located, collected by a movable platform (S201); acquiring a first target area covering the target object on the depth map (S202); adjusting the first target area to increase the proportion of the area corresponding to the target object in the first target area, so as to obtain a tracking area (S203); and controlling, according to depth information of the tracking area, the movable platform to move relative to the target object (S204).
PCT/CN2021/072581 2021-01-18 2021-01-18 Plateforme mobile et procédé et appareil de commande de celle-ci, et support de stockage lisible par machine WO2022151507A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/072581 WO2022151507A1 (fr) 2021-01-18 2021-01-18 Plateforme mobile et procédé et appareil de commande de celle-ci, et support de stockage lisible par machine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/072581 WO2022151507A1 (fr) 2021-01-18 2021-01-18 Plateforme mobile et procédé et appareil de commande de celle-ci, et support de stockage lisible par machine

Publications (1)

Publication Number Publication Date
WO2022151507A1 true WO2022151507A1 (fr) 2022-07-21

Family

ID=82446822

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/072581 WO2022151507A1 (fr) 2021-01-18 2021-01-18 Plateforme mobile et procédé et appareil de commande de celle-ci, et support de stockage lisible par machine

Country Status (1)

Country Link
WO (1) WO2022151507A1 (fr)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103971380A (zh) * 2014-05-05 2014-08-06 中国民航大学 基于rgb-d的行人尾随检测方法
CN104751491A (zh) * 2015-04-10 2015-07-01 中国科学院宁波材料技术与工程研究所 一种人群跟踪及人流量统计方法及装置
US20160110610A1 (en) * 2014-10-15 2016-04-21 Sony Computer Entertainment Inc. Image processor, image processing method, and computer program
CN110400338A (zh) * 2019-07-11 2019-11-01 Oppo广东移动通信有限公司 深度图处理方法、装置和电子设备
CN111582155A (zh) * 2020-05-07 2020-08-25 腾讯科技(深圳)有限公司 活体检测方法、装置、计算机设备和存储介质
CN112223278A (zh) * 2020-09-09 2021-01-15 山东省科学院自动化研究所 一种基于深度视觉信息的探测机器人跟随方法及系统

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103971380A (zh) * 2014-05-05 2014-08-06 中国民航大学 基于rgb-d的行人尾随检测方法
US20160110610A1 (en) * 2014-10-15 2016-04-21 Sony Computer Entertainment Inc. Image processor, image processing method, and computer program
CN104751491A (zh) * 2015-04-10 2015-07-01 中国科学院宁波材料技术与工程研究所 一种人群跟踪及人流量统计方法及装置
CN110400338A (zh) * 2019-07-11 2019-11-01 Oppo广东移动通信有限公司 深度图处理方法、装置和电子设备
CN111582155A (zh) * 2020-05-07 2020-08-25 腾讯科技(深圳)有限公司 活体检测方法、装置、计算机设备和存储介质
CN112223278A (zh) * 2020-09-09 2021-01-15 山东省科学院自动化研究所 一种基于深度视觉信息的探测机器人跟随方法及系统

Similar Documents

Publication Publication Date Title
CN110163904B (zh) 对象标注方法、移动控制方法、装置、设备及存储介质
US11645765B2 (en) Real-time visual object tracking for unmanned aerial vehicles (UAVs)
US9990736B2 (en) Robust anytime tracking combining 3D shape, color, and motion with annealed dynamic histograms
WO2019179464A1 (fr) Procédé de prédiction de direction de déplacement d'un objet cible, procédé de commande de véhicule, et dispositif
Huang et al. Robust inter-vehicle distance estimation method based on monocular vision
US11010622B2 (en) Infrastructure-free NLoS obstacle detection for autonomous cars
US20210326601A1 (en) Keypoint matching using graph convolutions
JP7135665B2 (ja) 車両制御システム、車両の制御方法及びコンピュータプログラム
US20210237774A1 (en) Self-supervised 3d keypoint learning for monocular visual odometry
US20210103299A1 (en) Obstacle avoidance method and device and movable platform
CN110969064B (zh) 一种基于单目视觉的图像检测方法、装置及存储设备
CN111738033B (zh) 基于平面分割的车辆行驶信息确定方法及装置、车载终端
CN112654998B (zh) 一种车道线检测方法和装置
WO2020010620A1 (fr) Procédé et appareil d'identification d'onde, support d'informations lisible par ordinateur et véhicule aérien sans pilote
CN113128430B (zh) 人群聚集检测方法、装置、电子设备和存储介质
WO2024001617A1 (fr) Procédé et appareil pour identifier un comportement de lecture avec un téléphone mobile
WO2022151507A1 (fr) Plateforme mobile et procédé et appareil de commande de celle-ci, et support de stockage lisible par machine
CN110175523B (zh) 一种自移动机器人动物识别与躲避方法及其存储介质
CN112733678A (zh) 测距方法、装置、计算机设备和存储介质
Pinard et al. End-to-end depth from motion with stabilized monocular videos
CN112016394A (zh) 障碍物信息获取方法、避障方法、移动装置及计算机可读存储介质
CN111126170A (zh) 一种基于目标检测与追踪的视频动态物检测方法
CN113723432B (zh) 一种基于深度学习的智能识别、定位追踪的方法及系统
Onkarappa et al. On-board monocular vision system pose estimation through a dense optical flow
US20230419522A1 (en) Method for obtaining depth images, electronic device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21918707

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21918707

Country of ref document: EP

Kind code of ref document: A1