WO2022151507A1 - Movable platform and method and apparatus for controlling same, and machine-readable storage medium - Google Patents


Info

Publication number
WO2022151507A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
area
target area
depth map
movable platform
Prior art date
Application number
PCT/CN2021/072581
Other languages
French (fr)
Chinese (zh)
Inventor
施泽浩
封旭阳
聂谷洪
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to PCT/CN2021/072581
Publication of WO2022151507A1



Classifications

    • G — PHYSICS
    • G05 — CONTROLLING; REGULATING
    • G05D — SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 — Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 — Image analysis

Abstract

A method for controlling a movable platform, comprising: acquiring a depth map, collected by the movable platform, of the scene where a target object is located (S201); acquiring a first target area on the depth map that covers the target object (S202); adjusting the first target area so that the proportion of the area corresponding to the target object within the first target area increases, thereby obtaining a tracking area (S203); and controlling, according to depth information of the tracking area, the movable platform to move relative to the target object (S204).

Description

Movable Platform, Control Method and Control Apparatus Therefor, and Machine-Readable Storage Medium
Technical Field
The present application relates to the field of intelligent perception, and in particular to a control method for a movable platform, a movable platform, a control apparatus, and a machine-readable storage medium.
Background Art
Using a movable platform, such as an unmanned aerial vehicle (UAV), a smart car, or an intelligent robot, to intelligently perceive a target object and obtain the distance between the target object and the movable platform, so as to control the movable platform accordingly to perform actions such as following, obstacle avoidance, and information interaction, is currently a research hotspot in the field of intelligent perception.
In the related art, the distance between the target object and the movable platform is usually obtained by acquiring a depth map containing the target object, projecting a target frame onto the depth map, and averaging the depth values within the target frame. However, the distance information obtained in this way is often inaccurate, which in turn makes the following, obstacle-avoidance, and information-interaction operations performed under the movable platform's control inaccurate, and may even create safety hazards.
Summary of the Invention
To overcome the problem in the related art that inaccurate distance information between the target object and the movable platform makes the following, obstacle-avoidance, and information-interaction operations performed under the movable platform's control inaccurate, and may even create safety hazards, the present application provides a movable platform, a control method therefor, a control apparatus, and a machine-readable storage medium.
According to a first aspect of the embodiments of the present application, a control method for a movable platform is provided. The method includes: acquiring a depth map, collected by the movable platform, of the scene where a target object is located; acquiring a first target area on the depth map that includes the target object; adjusting the first target area so that the proportion of the area corresponding to the target object within the first target area increases, to obtain a tracking area; and controlling, according to depth information of the tracking area, the movable platform to move relative to the target object.
According to a second aspect of the embodiments of the present application, a movable platform is provided. The movable platform includes an image acquisition device, a memory, and a processor. The image acquisition device is configured to acquire a depth map of the scene where a target object is located; the memory is configured to store program code; and the processor invokes the program code and, when the program code is executed, performs the following operations: acquiring the depth map and a first target area on the depth map that includes the target object; adjusting the first target area so that the proportion of the area corresponding to the target object within the first target area increases, to obtain a tracking area; and controlling, according to depth information of the tracking area, the movable platform to move relative to the target object.
According to a third aspect of the embodiments of the present application, a control apparatus is provided. The control apparatus includes a memory and a processor; the memory is configured to store program code; and the processor invokes the program code and, when the program code is executed, performs the following operations: acquiring a depth map, collected by a movable platform, of the scene where a target object is located; acquiring a first target area on the depth map that includes the target object; adjusting the first target area so that the proportion of the area corresponding to the target object within the first target area increases, to obtain a tracking area; and controlling, according to depth information of the tracking area, the movable platform to move relative to the target object.
According to a fourth aspect of the embodiments of the present application, a movable platform is provided. The movable platform includes an image acquisition device, a memory, and a processor. The image acquisition device is configured to acquire a depth map of the scene where a target object is located; the memory is configured to store program code; and the processor invokes the program code and, when the program code is executed, performs the following operations: acquiring the depth map and a first target area on the depth map that includes the target object; deleting all or part of the other areas in the first target area that do not correspond to the target object, to obtain a tracking area; and controlling, according to depth information of the tracking area, the movable platform to move relative to the target object.
According to a fifth aspect of the embodiments of the present application, a control apparatus is provided. The control apparatus includes a memory and a processor; the memory is configured to store program code; and the processor invokes the program code and, when the program code is executed, performs the following operations: acquiring a depth map, collected by the movable platform, of the scene where a target object is located; acquiring a first target area on the depth map that includes the target object; deleting all or part of the other areas in the first target area that do not correspond to the target object, to obtain a tracking area; and controlling, according to depth information of the tracking area, the movable platform to move relative to the target object.
According to a sixth aspect of the embodiments of the present application, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed, the method described in any of the above embodiments is implemented.
The technical solutions provided by the embodiments of the present application may include the following beneficial effects:
In the embodiments of the present application, a target area that includes the target object is acquired on the depth map of the scene where the target object is located, and the target area is adjusted so that the proportion of the area corresponding to the target object within it increases, yielding a tracking area; the movable platform is then controlled, according to the depth information of the tracking area, to move relative to the target object. Because the adjustment increases the proportion of the area corresponding to the target object within the target area, more accurate distance information between the movable platform and the target object can be obtained from the depth information of the tracking area, so the movement of the movable platform relative to the target object can be controlled more accurately, avoiding safety hazards and improving the motion performance of the movable platform.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present application.
Brief Description of the Drawings
In order to describe the technical solutions in the embodiments of the present application more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a depth map that can be used to obtain distance information of a target object, according to an exemplary embodiment of the present application.
Fig. 2 is a flowchart of a control method for a movable platform, according to an exemplary embodiment of the present application.
Fig. 3 is a flowchart of acquiring a first target area, including a target object, on a depth map, according to an exemplary embodiment of the present application.
Fig. 4 shows the alignment, in the buffers, of the image frames captured by the main camera and the binocular camera of a movable platform in an ideal case, according to an exemplary embodiment of the present application.
Fig. 5A is a color image containing a target object acquired by a movable platform, according to an exemplary embodiment of the present application.
Fig. 5B is a depth map containing a target object acquired by a movable platform, according to an exemplary embodiment of the present application.
Fig. 6 shows the alignment, in the buffers, of the image frames captured by the main camera and the binocular camera of a movable platform in a practical case, according to an exemplary embodiment of the present application.
Fig. 7 is a flowchart of correcting the first target area of a depth map by a feature-matching-based method, according to an exemplary embodiment of the present application.
Fig. 8 is an effect diagram of adjusting the first target area by a method based on image semantic segmentation, according to an exemplary embodiment of the present application.
Fig. 9 is an effect diagram of semantic segmentation performed on an image by an image semantic segmentation method, according to an exemplary embodiment of the present application.
Fig. 10 is a flowchart of adjusting the first target area based on the semantic segmentation result of a deep learning model, according to an exemplary embodiment of the present application.
Fig. 11A is a schematic diagram of a deep learning model for image semantic segmentation, according to an exemplary embodiment of the present application.
Fig. 11B is a feature response map of a deep learning model for a target object, according to an exemplary embodiment of the present application.
Fig. 12 is an effect diagram of the erosion processing of a morphological operation, according to an exemplary embodiment of the present application.
Fig. 13 is a schematic structural diagram of a movable platform, according to an exemplary embodiment of the present application.
Fig. 14 is a schematic structural diagram of a control apparatus, according to an exemplary embodiment of the present application.
Fig. 15 is a flowchart of another control method for a movable platform, according to an exemplary embodiment of the present application.
Detailed Description
Exemplary embodiments will be described in detail here, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application, as detailed in the appended claims.
The terms used in the present application are for the purpose of describing particular embodiments only and are not intended to limit the application. The singular forms "a", "said", and "the" used in the specification and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the present application to describe various pieces of information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present application, first information may also be referred to as second information and, similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
At present, in the field of intelligent perception, it is a research hotspot to use an acquisition device belonging to, or mounted on, a movable platform such as a UAV, a smart car, or an intelligent robot to obtain the distance between a target object and the movable platform, and to control the movable platform accordingly to perform actions such as following, obstacle avoidance, and information interaction. However, in the related art, the obtained distance between the target object and the movable platform is often inaccurate, which in turn makes those following, obstacle-avoidance, and information-interaction operations inaccurate and may even create safety hazards.
The following takes the case where the movable platform is a UAV and the application scenario is a UAV intelligently following a target object as an illustrative example.
In existing UAV intelligent-following technology, an important part of the UAV's observation of the followed target is monitoring the target distance, so as to control the distance between the UAV and the followed target. It is therefore particularly important that the UAV can accurately obtain the distance to the followed target.
In the related art, the distance between the UAV and the followed target is obtained as follows. The UAV typically carries a main camera and a binocular camera: the main camera captures a color image of the scene containing the followed target, and the binocular camera captures a binocular depth map of the same scene. A target frame containing the target object is extracted from the color image captured by the main camera and projected onto the binocular depth map, and the average of the depth values within the projected target frame is taken as the distance between the target object and the UAV.
However, while the UAV follows the target, the target object may be occluded by other objects. In that case, the depth information obtained from the followed target frame's area on the binocular depth map contains background noise, causing inaccurate ranging and unstable following.
As shown in Fig. 1, the target object inside the target frame (the gray rectangle) is occluded by grass. Because the occluding grass is relatively close to the UAV, the distance computed from the average depth within the target frame is shorter than the real distance. Such inaccurate depth estimation causes the UAV to fail to catch up quickly once the target moves away.
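The occlusion bias described above is easy to reproduce numerically. Below is a minimal NumPy sketch; the box size and depth values are illustrative, not taken from the patent:

```python
import numpy as np

# Synthetic 8x8 depth patch (metres): the target stands at ~10 m,
# but the lower half of the target frame is covered by grass at ~2 m.
depth_box = np.full((8, 8), 10.0)
depth_box[4:, :] = 2.0  # foreground occluder

# Related-art estimate: plain mean over the whole target frame.
naive_distance = float(depth_box.mean())
print(naive_distance)  # 6.0 -- far closer than the true 10 m target
```

Half of the box averages at the occluder's depth, so the estimate is pulled well below the real target distance, exactly the failure mode the patent sets out to fix.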
Of course, those skilled in the art should understand that the above application scenario is only an illustrative example. Besides the target object being occluded by other objects, the obtained distance between the target object and the movable platform may also be inaccurate for other reasons, such as the target object being a moving object or the projected frame being inaccurate, which the present application does not limit. The target frame may be a rectangular frame or a target frame of another shape, which the present application likewise does not limit.
In order to overcome the defects in the related art that arise when the movable platform is controlled to perform distance-related movement using inaccurate distance information between the target object and the movable platform, the present application provides a control method for a movable platform. Referring to Fig. 2, which is a flowchart of the control method, the control method may include the following steps:
Step S201, acquiring a depth map, collected by the movable platform, of the scene where the target object is located;
Step S202, acquiring a first target area on the depth map that includes the target object;
Step S203, adjusting the first target area so that the proportion of the area corresponding to the target object within the first target area increases, to obtain a tracking area;
Step S204, controlling the movable platform to move relative to the target object according to depth information of the tracking area.
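The four steps can be outlined as a small pipeline. This is only a sketch: the embodiments below adjust the region with richer methods such as semantic segmentation, while here step S203 is approximated by keeping only pixels near the median depth of the region, a hypothetical simplification:

```python
import numpy as np

def track_distance(depth_map, box, tol=1.5):
    """Sketch of steps S201-S204.

    depth_map : HxW array of scene depths (S201)
    box       : (top, bottom, left, right) first target area (S202)
    tol       : keep pixels within `tol` m of the median depth (S203,
                one hypothetical way to grow the target's share)
    """
    t, b, l, r = box
    region = depth_map[t:b, l:r]                        # first target area
    median = np.median(region)
    tracking = region[np.abs(region - median) <= tol]   # tracking area
    return float(tracking.mean())                       # depth info (S204)

depth = np.full((8, 8), 10.0)
depth[6:, :] = 2.0                   # occluder in the bottom rows
d = track_distance(depth, (0, 8, 0, 8))
print(d)  # 10.0 -- occluder pixels rejected by the adjustment
```

The controller would then feed the returned distance into whatever motion law the platform uses (following, obstacle avoidance, etc.).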
In some embodiments, the movable platform may be an unmanned aerial vehicle, a smart car, an intelligent robot, an unmanned ship, etc.; the present application does not limit the specific type of the movable platform.
In some embodiments, in step S201, the depth map of the scene where the target object is located, collected by the movable platform, may be acquired by a method based on binocular stereo vision: based on the parallax principle, imaging devices at different positions capture two images of the measured object, and the three-dimensional position information of the object is obtained by computing the positional deviation of corresponding points between the images, yielding the depth map of the scene where the target object is located.
In some embodiments, a binocular camera may be mounted on the movable platform and used to capture images of the scene where the target object is located. Because the two cameras of the binocular camera are at different positions, the positions of corresponding points differ between the images each camera collects; based on this deviation, the three-dimensional position information of objects in the images can be extracted to obtain the depth map of the scene where the target object is located.
In some embodiments, obtaining the depth map of the scene from two images with parallax can be done by the triangulation principle: the positions of the binocular cameras and the target object form a triangle, and once the positional relationship between the two cameras of the binocular camera is known, the three-dimensional size of objects within the common field of view of the two cameras and the three-dimensional coordinates of spatial feature points can be obtained. Of course, obtaining the depth map from two images with parallax can also be implemented by methods such as deep learning, which the present application does not limit.
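For a rectified binocular pair, the triangulation described above reduces to the standard relation Z = f·B/d between depth, focal length (in pixels), baseline, and disparity. A sketch with illustrative numbers (not from the patent):

```python
def stereo_depth(disparity_px, focal_px, baseline_m):
    """Depth of a matched point under rectified stereo: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Illustrative numbers: 700 px focal length, 12 cm baseline, 8.4 px disparity.
z = stereo_depth(8.4, 700.0, 0.12)
print(round(z, 3))  # 10.0 (metres)
```

Repeating this for every matched pixel pair yields the dense depth map used in the rest of the method.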
Of course, those skilled in the art should understand that the depth map of the scene where the target object is located can be acquired not only by a method based on binocular stereo vision, but also by means capable of obtaining the three-dimensional coordinates of the target object, such as lidar or ultrasonic ranging; the present application does not limit the specific way in which the movable platform acquires the depth map of the scene where the target object is located.
In some embodiments, in the above control method of a movable platform of the present application, step S202, acquiring a first target area on the depth map that includes the target object, can be implemented as shown in Fig. 3:
Step S301, acquiring a second target area containing the target object, the second target area being located on a color image collected by the movable platform;
Step S302, projecting the second target area onto the depth map to obtain the first target area on the depth map that includes the target object.
In some embodiments, the movable platform may carry a main camera capable of capturing color images of the space around the platform's position. Taking a consumer UAV as an example, the main camera may be a camera mounted on a gimbal under or at the front of the UAV body, used to collect images of the scene within its field of view so as to observe the environment around the UAV.
After the movable platform captures, with its main camera, a color image containing the target object, the second target area containing the target object can be acquired on the color image.
In some embodiments, the second target area containing the target object on the color image may be a target area manually circled by the user, a target area automatically circled to satisfy conditions entered by the user in advance, or a target area containing the target object automatically circled based on various deep learning models; the present application does not limit how the second target area is acquired.
After the second target area on the color image collected by the movable platform is acquired, the second target area on the color image is projected onto the depth map to obtain the first target area on the depth map that includes the target object.
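As an illustration of the projection in step S302, suppose the mapping between the two calibrated views can be approximated by a 3×3 homography (a simplifying assumption; the patent does not specify the projection model, and real systems reproject through both cameras' intrinsics and extrinsics). The corners of the second target area can then be mapped as follows:

```python
import numpy as np

def project_box(box, H):
    """Map a (top, bottom, left, right) region from the colour view to the
    depth view using a 3x3 homography H (illustrative simplification)."""
    t, b, l, r = box
    corners = np.array([[l, t, 1.0], [r, t, 1.0],
                        [l, b, 1.0], [r, b, 1.0]]).T
    p = H @ corners
    p = p[:2] / p[2]            # perspective divide
    x, y = p
    return (int(round(y.min())), int(round(y.max())),
            int(round(x.min())), int(round(x.max())))

# Example: depth map at half the colour image's resolution.
H = np.diag([0.5, 0.5, 1.0])
print(project_box((100, 300, 40, 200), H))  # (50, 150, 20, 100)
```

The axis-aligned bounds of the mapped corners give the first target area on the depth map.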
In some embodiments, projecting the acquired second target area on the color image including the target object onto the depth map may be implemented as follows: the second target area on the color image generated at time T is projected onto the depth map generated at time T.
Referring to Fig. 4, still taking the UAV following scenario as an example, the color image is collected and generated by the main camera, and the depth map is collected and generated by the binocular camera; the main camera and the binocular camera capture images of the target object simultaneously, yielding a color image 401 and a depth map 402 of the scene where the target object is located, where T0 to T5 denote the color image frames and depth image frames captured at six different moments. Ideally, the color image captured by the main camera and the depth map acquired by the binocular camera at the same moment contain exactly the same image content. Therefore, when the target area on a color image generated at a given moment is projected onto the depth map generated at that same moment, the image content contained in the target area is identical, so the projection of the color image's second target area onto the depth map can be used directly as the first target area. In this case, "the first target area" adjusted in step S203 refers to the first target area obtained by direct projection.
Figs. 5A and 5B show, respectively, the color image containing the target object collected by the main camera and the depth map containing the target object collected by the binocular camera, where the area framed by the black rectangle is the target area containing the target object. Because the color image contains more information, the second target area containing the target object can be determined more accurately based on the color image. Because the depth map contains depth information, projecting the color image's second target area containing the target object onto the depth map yields a third target area containing the target object on the depth map, and the distance between the movable platform and the target object can then be obtained from the distance information extracted from the depth map.
The reason the third target area containing the target object on the depth map is obtained by projection, based on the second target area containing the target object on the color image collected by the movable platform, is that the color image contains more useful information and is more intuitive, so the target area containing the target object can be determined faster and more accurately. The depth map, by contrast, is a grayscale image; directly determining the target area containing the target object from the grayscale image is cumbersome and slow to process, and accuracy suffers to some extent.
Of course, those skilled in the art should understand that, based on the control method of the movable platform described in this application, the first target area containing the target object on the depth map may also be obtained directly from the depth map acquired by the movable platform, using methods such as feature recognition, contour recognition, or deep learning; this application places no limitation on this. In that case, the "first target area" that is adjusted in step S203 refers to the first target area obtained directly from the depth map acquired by the movable platform.
As described above, projecting the second target area containing the target object in the color image onto the depth map to obtain the first target area assumes the ideal case in which the color image captured by the main camera at a certain moment and the depth map captured by the binocular camera at the corresponding moment contain exactly the same image content. In practice, however, several factors can cause the image content of the color image captured by the main camera at a certain moment and that of the depth map captured by the binocular camera at the corresponding moment to differ at the same pixel positions.
For example, because the images captured by the main camera and the binocular camera undergo different amounts of image processing, a color image and a depth map obtained at the same time may actually correspond to different acquisition times. Referring to FIG. 6, the color images 601 and depth maps 602 acquired within one acquisition period T0 to T5 are buffered in their respective buffers. The color images and the depth maps undergo different processing times, and the buffers are of limited size. As a result, the image frame captured by the main camera at time T0 may be discarded first; then, for the depth map acquired at time T0, only the color image at time T1, whose timestamp is closest, can be used for the target-box projection. Since the color image at time T1 and the depth map at time T0 contain different image content at the same pixel positions, projecting the color image acquired at time T1 onto the depth map acquired at time T0 means that the second target area containing the target object in the color image and the first target area containing the target object in the depth map will differ in content, making the distance information obtained from the first target area in the depth map inaccurate.
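The timestamp fallback just described can be sketched as follows. This is only an illustrative Python fragment with hypothetical names, not the patent's implementation: given the timestamps of the color frames still in the buffer, the frame closest in time to a depth frame is selected for the box projection.

```python
# Illustrative sketch: pick the buffered color frame whose timestamp is
# closest to the depth frame's timestamp (names are hypothetical).
def nearest_color_frame(depth_ts, color_ts_list):
    """Return the index of the buffered color frame closest in time."""
    return min(range(len(color_ts_list)),
               key=lambda i: abs(color_ts_list[i] - depth_ts))

# Example: the T0 color frame was already evicted from the buffer, so the
# T0 depth frame falls back to the T1 color frame.
color_buffer_ts = [1.0, 2.0, 3.0, 4.0, 5.0]   # T1..T5 (T0 dropped)
depth_ts = 0.0                                 # T0 depth frame
idx = nearest_color_frame(depth_ts, color_buffer_ts)
```

As the example shows, the nearest available frame (T1) is not the true acquisition moment (T0), which is exactly the mismatch the correction below addresses.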
In another case, the target object may be moving. Even if the images captured by the main camera and the binocular camera undergo the same processing time, the motion of the captured target object can cause it to appear at different pixel positions in the color image and the depth map. When the first target area containing the target object on the depth map is determined by projecting the second target area of the color image onto the depth map, the first target area and the second target area may then contain different content. Correspondingly, the distance between the movable platform and the target object obtained from the first target area on the depth map is inaccurate.
Of course, other situations may also cause the image content of the second target area in the color image and of the first target area on the depth map to differ, and this application places no limitation on this. In short, because of hardware limitations on image processing and image acquisition, it is difficult to guarantee that the projection takes place between a color image and a depth map that were truly captured at the same moment.
In some embodiments, in either case, the first target area obtained by projecting the second target area containing the target object on the color image onto the depth map can be corrected by the method shown in FIG. 7, which includes:
Step S701: extract first feature points from the color image and second feature points from the depth map;
Step S702: perform feature matching between the first feature points and the second feature points;
Step S703: correct the obtained first target area of the depth map based on the result of the feature matching.
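The steps above can be sketched in Python. This is only an illustrative fragment under the simplifying assumption that the feature points have already been extracted and paired (steps S701 and S702); the projected box is then corrected by the mean displacement between the matched points, which is one possible correction strategy rather than the one the patent mandates.

```python
import numpy as np

def correct_box(box, pts_color, pts_depth):
    """Shift the projected target box by the mean displacement between
    matched feature points in the color image and the depth map (S703)."""
    offset = (pts_depth - pts_color).mean(axis=0)
    x, y, w, h = box
    return (x + offset[0], y + offset[1], w, h)

# Hypothetical matched pairs: depth-map points sit 3 px right and
# 2 px down from their color-image counterparts.
pts_color = np.array([[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]])
pts_depth = pts_color + np.array([3.0, 2.0])
box = correct_box((100.0, 100.0, 50.0, 80.0), pts_color, pts_depth)
```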
By extracting feature points from the color image and the depth map, matching the corresponding feature points, and adjusting the first target area containing the target object on the depth map accordingly, it is possible to avoid the situation where hardware processing, target motion, or other factors make the determined first target area on the depth map inaccurate, which would in turn make the obtained distance information between the movable platform and the target object inaccurate.
For the specific implementation of feature point extraction, reference may be made to the related art. Feature points may be extracted using conventional image processing methods based on, for example, the texture information or pixel intensity information of the image, or using deep learning networks; of course, other methods may also be used, and this application places no limitation on this.
Correcting the obtained first target area of the depth map based on the feature matching result may mean aligning the first feature points on the color image exactly with the second feature points on the depth map. For example, if the target object is a person, the extracted first and second feature points may correspond to several characteristic parts such as the nose and the mouth; the extracted first feature points and second feature points are then aligned one to one, that is, the nose, mouth, and so on in the color image are aligned with the nose, mouth, and so on in the depth map, respectively, to obtain the corrected first target area of the depth map. Of course, other correction methods are also possible, for example aligning the center of the first feature points with the center of the second feature points, and so on; this application places no limitation on the specific correction method.
In some embodiments, in the control method of the movable platform described in this application, step S203 of adjusting the first target area so that the proportion of the area corresponding to the target object in the first target area increases includes: adjusting the first target area so that the boundary of the first target area shrinks toward the target object.
This is explained with reference to FIG. 5B and FIG. 8. FIG. 5B is the depth map of the scene containing the target object (here, a person) acquired by the movable platform. Based on the method described above, the first target area containing the target object on the depth map can be obtained (taking the area framed by the black rectangle as an example of the first target area). The first target area in FIG. 5B is adjusted so that the proportion of the area corresponding to the target object in the first target area increases, yielding the tracking area shown in FIG. 8, where the tracking area is the region enclosed by the black closed curve of the human silhouette.
Comparing the areas containing the target object in FIG. 5B and FIG. 8, it is clear that the first target area in FIG. 5B contains not only the target object (the person) but also the ground. If the depth values within the first target area shown in FIG. 5B are averaged and the average is taken as the distance between the movable platform and the target object, the result is not the true distance but the average distance from the movable platform to both the target object (the person) and the background (the ground) within the first target area. The tracking area in FIG. 8, by contrast, contains almost only the target object (the person), so the distance information computed from the tracking area is closer to the true distance.
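The effect described above can be illustrated numerically. In this hypothetical NumPy sketch (the values are invented for illustration), averaging the whole first target area mixes person and ground depths, while averaging only the person pixels recovers the true distance:

```python
import numpy as np

# Toy 4x4 depth patch: the person occupies the two centre columns at 5 m,
# the surrounding ground/background is at 9 m.
depth = np.full((4, 4), 9.0)
depth[:, 1:3] = 5.0
person_mask = np.zeros((4, 4), dtype=bool)
person_mask[:, 1:3] = True

box_mean = depth.mean()                 # averages person AND background
mask_mean = depth[person_mask].mean()   # averages person pixels only
```

Here `box_mean` overestimates the person's distance because the ground pixels are mixed in, while `mask_mean` equals the person's true depth.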
Of course, those skilled in the art should understand that, in general, when the target object is located at the center of the first target area on the depth map, adjusting the first target area so that the proportion of the area corresponding to the target object increases can be achieved by shrinking the boundary of the first target area toward the center of the first target area. When the target object is not located at the center of the first target area, the same adjustment can be achieved by shrinking the boundary of the first target area toward the center of the target object. Of course, those skilled in the art may decide according to other practical situations, and this application places no limitation on this.
To increase the proportion of the area corresponding to the target object in the first target area, besides the method described in the above embodiments, in one embodiment all or part of the regions in the first target area that do not correspond to the target object may be deleted. Unlike the previous embodiments, in this case the boundary of the first target area need not be adjusted; instead, part or all of the regions other than the target object within the first target area are deleted directly, which also increases the proportion of the area corresponding to the target object in the first target area.
It can be seen from the above embodiments that, because this application adjusts the first target area so that the proportion of the area corresponding to the target object increases, the depth information of the resulting tracking area yields more accurate distance information between the movable platform and the target object. The motion of the movable platform relative to the target object can therefore be controlled more accurately, avoiding safety hazards and improving the motion performance of the movable platform.
Whether the proportion of the area corresponding to the target object is increased by shrinking the boundary of the first target area toward the target object, or by deleting all or part of the regions in the first target area that do not correspond to the target object, the first step is to distinguish the region corresponding to the target object in the first target area from the other regions. This can be done by semantic segmentation, which separates target from non-target within the first target area. Based on the semantic segmentation result, the region corresponding to the target object in the first target area can be determined; the first target area can then be further adjusted so that its boundary shrinks toward the target object, or all or part of the regions not corresponding to the target object can be deleted to increase the proportion of the area corresponding to the target object. Of course, the possibility of using both means together is not excluded. The following describes how to adjust the first target area in combination with semantic segmentation so that the boundary of the first target area shrinks toward the target object.
In some embodiments, adjusting the first target area so that its boundary shrinks toward the target object may include: performing semantic segmentation on the first target area containing the target object in the depth map (that is, applying image semantic segmentation technology), and adjusting the first target area based on the semantic segmentation result.
Image semantic segmentation refers to using computer vision and image processing techniques to label objects of different categories in an image, marking the pixel positions occupied by each category. As shown in FIG. 9, given a photo of a person riding a motorcycle (left), image semantic segmentation can partition the image according to the different semantic meanings it expresses (right), yielding the pixel positions of the different object categories (person and motorcycle) shown on the right.
Image semantic segmentation is a cornerstone of image understanding and an important part of computer vision. Many image semantic segmentation techniques exist. Traditional techniques include pixel-level thresholding methods, clustering-based segmentation methods, and graph partitioning segmentation methods. With the rapid development of deep learning, many deep-learning-based techniques have also emerged, for example semantic segmentation based on fully convolutional networks or on dilated convolutions. The image semantic segmentation of the first target area described in this application may use any technique capable of image semantic segmentation, and this application places no limitation on this.
It can be seen from the above embodiments that performing image semantic segmentation on the first target area containing the target object on the depth map, and adjusting the first target area based on the segmentation result, increases the proportion of the area corresponding to the target object in the target area accurately and effectively. Therefore, from the depth information of the adjusted tracking area, more accurate distance information between the movable platform and the target object can be obtained; the motion of the movable platform relative to the target object can then be controlled more accurately, avoiding safety hazards and improving the motion performance of the movable platform.
In some embodiments, the image semantic segmentation of the first target area may use a deep-learning-based technique. In that case, performing semantic segmentation on the first target area and adjusting the first target area based on the segmentation result may, as shown in FIG. 10, include:
Step S1001: perform image semantic segmentation on the depth map based on a pre-trained deep learning model;
Step S1002: adjust the first target area according to the semantic segmentation result.
When the pre-trained deep learning model is used for image semantic segmentation, the depth map to be segmented is input into the model, and the model outputs the semantically segmented depth map, that is, a depth map in which the pixel positions of objects of different categories are marked, similar to the result shown in FIG. 9.
After image semantic segmentation has been performed on the depth map, the first target area containing the target object on the depth map is adjusted so that the proportion of the area corresponding to the target object increases; for example, the first target area containing the target object shown in FIG. 5B can be adjusted to the area containing the target object shown in FIG. 8.
The above describes how to adjust the first target area in combination with semantic segmentation so that its boundary shrinks toward the target object, thereby increasing the proportion of the area corresponding to the target object. It will be appreciated that semantic segmentation can likewise be combined with deleting all or part of the regions in the first target area that do not correspond to the target object. The implementation is similar to the process described above; the only difference is that, after the semantic segmentation result has separated target from non-target within the first target area, the regions corresponding to non-target objects in the first target area are deleted. Since the principle is similar to the previous embodiments, it is not elaborated here.
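One simple way to realize the boundary-shrinking adjustment, sketched here with hypothetical NumPy code (the patent does not prescribe this exact computation), is to shrink the first target area to the tight bounding box of the pixels the segmentation labels as the target:

```python
import numpy as np

def shrink_to_mask(mask):
    """Tight bounding box (x0, y0, x1, y1) around the segmented target,
    i.e. the first target area shrunk toward the target object."""
    ys, xs = np.nonzero(mask)
    return xs.min(), ys.min(), xs.max(), ys.max()

mask = np.zeros((10, 10), dtype=bool)
mask[2:7, 3:6] = True      # pixels the segmentation labels as the person
bbox = shrink_to_mask(mask)
```

The deletion variant would instead keep the original box but set the depth pixels outside `mask` aside before averaging.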
In some embodiments, the training process of the deep learning model includes: obtaining semantic labels corresponding to the pixels of a training image output by the deep learning model, computing a first loss function based on the semantic labels, and training the deep learning model based on the first loss function.
The deep learning model may be designed as needed and may include at least one of a convolutional layer, a batch normalization layer, a nonlinear activation layer, and the like; an existing deep learning model for image semantic segmentation may also be used, and this application places no limitation on this. The initial parameters of the deep learning model may be determined by prior training or from empirical values, and this application places no limitation on this either.
To train the deep learning model, training samples with semantic labels are input into the model to be trained, and the model is trained based on the predefined first loss function until the first loss function converges or falls below a specified threshold.
The training image may be a depth map represented as a grayscale image.
The first loss function may be designed according to the training needs, or a loss function commonly used in deep-learning-based image semantic segmentation may be used; this application places no limitation on this.
In some embodiments, the first loss function may be a cross-entropy loss function.
Optionally, the first loss function may be: L = ∑ [-y·log(y') - (1-y)·log(1-y')];
where y is the semantic segmentation label at each pixel, and y' is the predicted value at that pixel, that is, the output of the deep learning network at that pixel.
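A minimal NumPy sketch of this per-pixel cross-entropy follows; the clamping of predictions away from 0 and 1 is a standard numerical safeguard added here, not something stated in the text, and all names are illustrative.

```python
import numpy as np

def pixelwise_bce(y, y_pred, eps=1e-7):
    """L = sum over pixels of -y*log(y') - (1-y)*log(1-y')."""
    y_pred = np.clip(y_pred, eps, 1 - eps)   # avoid log(0)
    return float(np.sum(-y * np.log(y_pred) - (1 - y) * np.log(1 - y_pred)))

# Toy 2x2 label map and predicted target probabilities.
labels = np.array([[1.0, 0.0], [0.0, 1.0]])
preds = np.array([[0.9, 0.1], [0.2, 0.8]])
loss = pixelwise_bce(labels, preds)
```

The loss shrinks as the per-pixel predictions approach the labels, which is what drives the back-propagation training described above.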
It can be seen from the above embodiments that performing image semantic segmentation on the first target area containing the target object on the depth map using a deep-learning-based technique, and adjusting the first target area based on the segmentation result, increases the proportion of the area corresponding to the target object in the target area. Because the segmentation is based on a deep learning model, the model can learn more information; therefore, from the depth information of the adjusted tracking area, more accurate distance information between the movable platform and the target object can be obtained, the motion of the movable platform relative to the target object can be controlled more accurately, safety hazards are avoided, and the motion performance of the movable platform is improved.
Because the depth map is usually represented as a grayscale image, it contains much less information than a color image. To make the deep learning model better suited to tasks on grayscale images, in some embodiments the model may be trained twice. That is, before training the deep learning model based on the first loss function, the training process further includes: obtaining category labels corresponding to training images output by the deep learning model, computing a second loss function based on the category labels, training the deep learning model based on the second loss function, and thereby determining the initial parameters of the deep learning model before it is trained based on the first loss function.
Below, with reference to FIG. 11A and FIG. 11B, the two-stage training of the deep learning model is described by way of example.
FIG. 11A shows supervised classification learning of the deep learning model on a grayscale image classification task, which gives the model high sensitivity to semantic objects. A specific implementation of an exemplary embodiment is as follows:
The deep learning model is trained with single-channel grayscale images and the category label corresponding to each image, the categories being selected as the target categories of the target object. In the drone-following application scenario, the categories are the target categories expected to be followed, for example people, cars, and boats. In FIG. 11A, the three categories of people, cars, and boats are given as illustrative examples.
The training data is input into the deep learning model, which outputs a vector P of length N, where N is the number of categories. The N elements of the vector P correspond to the predicted probabilities p_i of the targets of each class i. Using the second loss function, back-propagation is performed and the deep learning model is trained until the second loss function converges or falls below a threshold.
In some embodiments, the training data are grayscale images. Besides the depth maps collected by the movable platform, the grayscale images may also include grayscale images converted from the color images, which serve as pseudo depth maps to increase the amount of training data.
In some embodiments, the second loss function is a cross-entropy loss function.
Optionally, the second loss function may be:

L = -(1/N) · ∑_{i=1}^{N} ∑_{c=1}^{M} y_ic · log(p_ic)

where N is the batch size of the training data, M is the number of categories, y_ic is the target category label of the i-th input image for class c, and p_ic is the predicted probability that the i-th input image belongs to class c.
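A minimal NumPy sketch of this batch classification cross-entropy follows (batch size N, M classes, one-hot labels y_ic, predicted probabilities p_ic); the class names and numbers are illustrative only.

```python
import numpy as np

def classification_ce(y_onehot, p):
    """Loss = -(1/N) * sum_i sum_c y_ic * log(p_ic)."""
    n = y_onehot.shape[0]
    return float(-np.sum(y_onehot * np.log(p)) / n)

# Batch of N=2 images over M=3 classes (e.g. person, car, boat).
y = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)
p = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
loss = classification_ce(y, p)
```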
Based on the second loss function, the network undergoes supervised classification learning on the grayscale image classification task. The trained network shows a high response on the feature map for a given input target, as shown in FIG. 11B.
After the supervised classification training on the grayscale image classification task is complete, the deep learning model is further trained on a grayscale image semantic segmentation task to obtain the final image semantic segmentation model. A specific implementation of an exemplary embodiment is:
Using transfer learning, the network parameters obtained by training the deep learning model on the grayscale image classification task described above are used as pre-trained parameters to initialize the deep learning model. The training data for this stage consist of single-channel grayscale images and the semantic segmentation labels corresponding to the specific target categories of each image, such as people, cars, and boats. For each pixel position of the training data, back-propagation is performed on the deep learning network based on the first loss function.
The deep learning network determined by the above two training stages, one based on the classification task and one based on the semantic segmentation task, is better adapted to tasks on grayscale images and overcomes the drawback that a network trained only on the limited information contained in grayscale images does not perform well enough.
Inputting the depth map containing the target object into the deep learning model determined by the above two training stages allows the target object to be segmented more accurately, and therefore the first target area to be determined more accurately. Averaging the depth values within the first target area determined from the semantic segmentation result yields a more accurate estimate of the true distance between the target object and the movable platform.
In some cases, for example under low illumination, the boundary between the target and the background is blurred in the depth map collected by the movable platform. The first target area obtained by semantically segmenting the depth map with the deep learning model may then be larger than the area actually occupied by the target object. As shown in FIG. 12, the left image is the depth map processed by the deep learning model, where the white human silhouette is the semantic segmentation result determined by the model; the white outline in the right image is the first target area where the actual target object (the person) is located.
基于上述情况,可以对所述深度学习模型所输出的语义分割结果进行形态学操作中的腐蚀处理。在一些实施例中,可以对所述深度学习模型输出的语义分割结果进行卷积,以缩小所述语义分割结果中表征素数目标物的分割区域,并基于卷积后的分割结果,调整所述第一目标区域。Based on the above situation, corrosion processing in the morphological operation may be performed on the semantic segmentation result output by the deep learning model. In some embodiments, the semantic segmentation result output by the deep learning model may be convolved to reduce the segmentation area representing the prime number target in the semantic segmentation result, and based on the convolved segmentation result, the first target area.
其中,所述卷积操作的具体尺寸,可以根据实际情况确定,例如可以采用一个5*5大小的卷积核对所述深度学习模型所输出的语义分割结果进行卷积。The specific size of the convolution operation can be determined according to the actual situation, for example, a 5*5 size convolution kernel can be used to convolve the semantic segmentation result output by the deep learning model.
在一些实施例中,对所述深度学习模型输出的语义分割结果进行卷积,包括:每次卷积时,将卷积核所在区域的最小像素值作为卷积核所在区域的中心点位置的像素值。In some embodiments, performing convolution on the semantic segmentation result output by the deep learning model includes: during each convolution, taking the minimum pixel value of the area where the convolution kernel is located as the difference between the center point of the area where the convolution kernel is located Pixel values.
当然,本领域技术人员应当理解,每次卷积时,也可以将卷积核所在区域的像素值进行加权,作为卷积核所在区域的中心点位置的像素值;也可以是其他方式确定卷积核所在区域的中心点位置的像素值,本申请对此不作限制。Of course, those skilled in the art should understand that during each convolution, the pixel value of the area where the convolution kernel is located can also be weighted as the pixel value of the center point of the area where the convolution kernel is located; it is also possible to determine the volume in other ways. The pixel value of the center point of the region where the product kernel is located, which is not limited in this application.
通过上述实施例可以看到,对于图像语义分割结果,继续采取形态操作中的腐蚀处理,调整所述第一目标区域,能够进一步增加所述目标区域中对应所述目标物的区域的占比。因此,根据调整后所得跟踪区域的深度信息,能够获得更加准确的可移动平台与目标物的距离信息,进而能够更加准确地控制所述可移动平台相对于所述目标物的运动,避免安全隐患,提供可移动平台的运动性能。It can be seen from the above embodiment that, for the image semantic segmentation result, continuing to take the corrosion process in the morphological operation and adjusting the first target area can further increase the proportion of the area corresponding to the target object in the target area. Therefore, according to the depth information of the tracking area obtained after adjustment, more accurate distance information between the movable platform and the target can be obtained, and then the movement of the movable platform relative to the target can be more accurately controlled to avoid potential safety hazards. , which provides the motion performance of a movable platform.
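The minimum-valued convolution described above (equivalent to grayscale erosion) can be sketched in plain NumPy; this is an illustrative version of the 5*5 operation, and the edge-padding border handling is an assumption:

```python
import numpy as np

def min_filter(seg, k=5):
    """Erode a segmentation map: each output pixel becomes the minimum
    of the k*k window centered on it, shrinking foreground regions."""
    pad = k // 2
    padded = np.pad(seg, pad, mode="edge")  # replicate border values
    out = np.empty_like(seg)
    h, w = seg.shape
    for i in range(h):
        for j in range(w):
            # Window of the padded image centered on original pixel (i, j).
            out[i, j] = padded[i:i + k, j:j + k].min()
    return out
```

Applied to a binary segmentation mask, this removes a border of foreground pixels on every pass, which is exactly the shrinking effect used to tighten the first target area.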
In some embodiments of the control method for a movable platform described in this application, in step S302, adjusting the first target area so that the proportion of the area corresponding to the target within the first target area increases can be achieved not only with the semantic segmentation technique based on a deep learning model, but also with a moving-object segmentation technique based on frame-difference statistics.
The moving-object segmentation technique based on frame-difference statistics is a method that obtains the contour of a moving target by performing a difference operation on two consecutive frames of a video image sequence. When the target in the monitored scene is a moving object, there are differences between two adjacent frames: the two frames are subtracted, the absolute value of the pixel-value difference at each corresponding position is obtained, and whether it exceeds a certain threshold is judged, so as to analyze the motion characteristics of the target in the video or image sequence. Based on the obtained absolute values of the pixel-value differences, the contour of the target can be obtained, and the first target area containing the target in the depth map can then be determined.
In some embodiments, the moving-object segmentation technique based on frame-difference statistics can be implemented as follows: a difference operation is performed on the depth map and its adjacent-frame depth map, the contour of the target on the depth map is determined based on the difference result, and the area enclosed by the contour is taken as the adjusted area.
An illustrative description is given below with reference to FIG. 5B. FIG. 5B is a depth map, acquired by the movable platform, of the scene where the target is located. When the target is a moving person, the target (the person) moves between the depth maps collected at different moments, so a difference operation is performed on the depth map acquired at a certain moment and the adjacent depth map acquired at the previous (or next) moment, and the absolute values of the pixel-value differences at corresponding positions of the two image frames are obtained. Since the target (the person) is a moving object, its contour can be obtained from these absolute values, and based on the contour a first target area containing only the target can be re-determined. Calculating the distance between the target and the movable platform based on the distance information of this first target area is therefore more accurate and closer to reality.
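The frame-difference step above can be sketched as follows (a minimal illustration; the threshold value is an assumption chosen for the example):

```python
import numpy as np

def moving_object_mask(frame_prev, frame_curr, thresh=0.5):
    """Mark pixels whose value changed between two adjacent frames.

    Subtracts the two frames, takes the absolute difference at each
    corresponding position, and keeps the pixels where it exceeds the
    threshold; the resulting mask outlines the moving target.
    """
    diff = np.abs(frame_curr.astype(float) - frame_prev.astype(float))
    return diff > thresh
```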
In some embodiments of the control method for a movable platform described in this application, in step S304, controlling the movable platform to move relative to the target according to the depth information of the tracking area includes: controlling the movable platform to follow the movement of the target according to the depth information of the tracking area.
When the application scenario is an unmanned aerial vehicle following a target, the first target area containing the target in the depth map, as determined by the aforementioned semantic segmentation technique based on a deep learning model and/or the erosion processing from morphological operations and/or the moving-object segmentation technique based on frame-difference statistics, is largely occupied by the area where the target is located. Based on this first target area, the distance information between the target and the movable platform is obtained by averaging the distances within the area, and based on this distance information the following movement of the movable platform with respect to the target can be controlled more accurately.
Of course, those skilled in the art should understand that, in addition to the above application scenario, the control method for a movable platform of this application can also be used in many other scenarios, such as obstacle avoidance for smart cars and distance measurement for intelligent robots. Accordingly, controlling the movable platform to move relative to the target may also mean controlling the running speed and running direction of the smart car, or the forward direction and forward speed of the intelligent robot, and so on, which is not limited in this application.
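As a rough sketch of how the estimated distance could drive the follow motion (purely illustrative; the proportional control law, the gain, and the desired following distance are assumptions and are not specified by the application):

```python
def follow_speed(measured_distance, desired_distance=5.0, gain=0.8,
                 max_speed=3.0):
    """Command a forward speed proportional to the distance error.

    A positive output moves the platform toward the target (it is too
    far away); a negative output backs it off (it is too close). The
    output is clamped to the platform's speed limit.
    """
    error = measured_distance - desired_distance
    speed = gain * error
    return max(-max_speed, min(max_speed, speed))
```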
An embodiment of this application further provides a control method for a movable platform. Referring to FIG. 15, FIG. 15 shows another control method for a movable platform according to an exemplary embodiment of this application, including the following steps:
S1501: acquiring a depth map, collected by the movable platform, of the scene where the target is located;
S1502: acquiring a first target area on the depth map that contains the target;
S1503: deleting all or part of the other areas in the first target area that do not correspond to the target, to obtain a tracking area;
S1504: controlling the movable platform to move relative to the target according to the depth information of the tracking area.
Exemplarily, acquiring a first target area on the depth map that contains the target includes: acquiring a second target area containing the target, the second target area being located on a color image collected by the movable platform; and projecting the second target area onto the depth map to obtain the first target area.
Exemplarily, projecting the second target area onto the depth map includes: projecting the second target area on the color image generated at time T onto the depth map generated at time T.
Exemplarily, after projecting the second target area onto the depth map, the method further includes: extracting first feature points on the color image and second feature points on the depth map respectively; performing feature matching on the first feature points and the second feature points; and correcting the obtained first target area of the depth map based on the result of the feature matching.
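When the color camera and the depth sensor produce aligned images of different resolutions, the projection of the second target area can be sketched as a simple coordinate rescaling. This is an illustrative simplification; a real device also needs the extrinsic calibration between the two sensors, which is omitted here:

```python
def project_box(box, color_size, depth_size):
    """Rescale a bounding box from color-image coordinates to
    depth-map coordinates, assuming the two images are aligned.

    box: (x0, y0, x1, y1) pixel coordinates on the color image
    color_size, depth_size: (width, height) of each image
    """
    sx = depth_size[0] / color_size[0]
    sy = depth_size[1] / color_size[1]
    x0, y0, x1, y1 = box
    return (round(x0 * sx), round(y0 * sy),
            round(x1 * sx), round(y1 * sy))
```

The feature-matching correction described above would then refine this projected box against point correspondences between the two images.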
Exemplarily, deleting all or part of the other areas in the first target area that do not correspond to the target includes: performing semantic segmentation on the first target area, and deleting, based on the semantic segmentation result, all or part of the other areas in the first target area that do not correspond to the target.
Exemplarily, performing semantic segmentation on the first target area and deleting, based on the semantic segmentation result, all or part of the other areas in the first target area that do not correspond to the target includes: performing semantic segmentation on the depth map based on a pre-trained deep learning model; and deleting, according to the semantic segmentation result, all or part of the other areas in the first target area that do not correspond to the target.
Exemplarily, the training process of the deep learning model includes: acquiring semantic labels corresponding to the pixels in a training image output by the deep learning model, calculating a first loss function based on the semantic segmentation labels, and training the deep learning model based on the first loss function.
Exemplarily, before the deep learning model is trained based on the semantic labels, the training process further includes: acquiring a category label corresponding to the training image output by the deep learning model, calculating a second loss function based on the category label, training the deep learning model based on the second loss function, and thereby determining the initial parameters of the deep learning model before it is trained based on the first loss function.
Exemplarily, the first loss function and/or the second loss function is a cross-entropy loss function.
Exemplarily, the training image includes a grayscale image.
Exemplarily, deleting, according to the semantic segmentation result, all or part of the other areas in the first target area that do not correspond to the target further includes: convolving the semantic segmentation result output by the deep learning model with a preset convolution kernel to shrink the segmented area representing the target in the semantic segmentation result, and adjusting the first target area based on the convolved semantic segmentation result.
Exemplarily, convolving the semantic segmentation result output by the deep learning model includes: in each convolution step, taking the minimum pixel value within the area covered by the convolution kernel as the pixel value at the center point of that area.
Exemplarily, deleting all or part of the other areas in the first target area that do not correspond to the target includes: performing a difference operation on the depth map and its adjacent-frame depth map, determining the contour of the target on the depth map based on the difference result, and deleting the areas other than the area enclosed by the contour.
Exemplarily, controlling the movable platform to move relative to the target according to the depth information of the tracking area includes: controlling the movable platform to follow the movement of the target according to the depth information of the tracking area.
For detailed descriptions of the examples given above, reference may be made to the foregoing description, which is not repeated here.
Correspondingly, in order to overcome the defects existing in the related art, this application further provides a movable platform. As shown in FIG. 13, the movable platform includes: an image collection device 1301, a memory 1302, and a processor 1303.
The image collection device is configured to acquire a depth map of the scene where the target is located.
The memory 1302 is configured to store program code.
The processor 1303 invokes the program code, and when the program code is executed, performs the following operations:
acquiring the depth map and a first target area on the depth map that contains the target;
adjusting the first target area so that the proportion of the area corresponding to the target within the first target area increases, to obtain a tracking area;
controlling the movable platform to move relative to the target according to the depth information of the tracking area.
The processor 1303 is further configured to perform the following operations:
acquiring the depth map and a first target area on the depth map that contains the target; deleting all or part of the other areas in the first target area that do not correspond to the target, to obtain a tracking area; and controlling the movable platform to move relative to the target according to the depth information of the tracking area.
In some embodiments, the movable platform may be an unmanned aerial vehicle, a smart car, an intelligent robot, and so on, which is not limited in this application.
With the movable platform provided by this application, a target area containing the target is acquired on the depth map of the scene where the target is located, and the target area is adjusted so that the proportion of the area corresponding to the target within the target area increases, to obtain a tracking area; the movable platform is then controlled to move relative to the target according to the depth information of the tracking area. Because this application adjusts the target area so that the proportion of the area corresponding to the target within it increases, more accurate distance information between the movable platform and the target can be obtained from the depth information of the tracking area, so that the movement of the movable platform relative to the target can be controlled more accurately, avoiding potential safety hazards and improving the motion performance of the movable platform.
In addition, in view of the technical problems existing in the related art, this application further provides a control apparatus. As shown in FIG. 14, the control apparatus includes: a memory 1401 and a processor 1402.
The memory 1401 is configured to store program code.
The processor 1402 invokes the program code, and when the program code is executed, performs the following operations:
acquiring a depth map, collected by the movable platform, of the scene where the target is located;
acquiring a target area on the depth map that contains the target;
adjusting the target area so that the proportion of the area corresponding to the target within the target area increases, to obtain a tracking area;
controlling the movable platform to move relative to the target according to the depth information of the tracking area.
The processor 1402 is further configured to perform the following operations:
acquiring the depth map, collected by the movable platform, of the scene where the target is located; acquiring a first target area on the depth map that contains the target; deleting all or part of the other areas in the first target area that do not correspond to the target, to obtain a tracking area; and controlling the movable platform to move relative to the target according to the depth information of the tracking area.
With the control apparatus provided by this application, a target area containing the target is acquired on the depth map of the scene where the target is located, and the target area is adjusted so that the proportion of the area corresponding to the target within the target area increases, to obtain a tracking area; the movable platform is then controlled to move relative to the target according to the depth information of the tracking area. Because this application adjusts the target area so that the proportion of the area corresponding to the target within it increases, more accurate distance information between the movable platform and the target can be obtained from the depth information of the tracking area, so that the movement of the movable platform relative to the target can be controlled more accurately, avoiding potential safety hazards and improving the motion performance of the movable platform.
The embodiments of this application further provide a machine-readable storage medium storing a computer program which, when run on a computer, implements any embodiment of the above methods of this application; details are not repeated here.
The machine-readable storage medium may be an internal storage unit of the device described in any of the foregoing embodiments, such as a hard disk or a memory of the device. The machine-readable storage medium may also be an external storage device of the device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the device. Further, the machine-readable storage medium may include both an internal storage unit of the device and an external storage device. The machine-readable storage medium is used to store the computer program and other programs and data required by the device, and may also be used to temporarily store data that has been or will be output.
As for the apparatus embodiments, since they basically correspond to the method embodiments, reference may be made to the relevant descriptions of the method embodiments. The apparatus embodiments described above are merely illustrative: the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solutions of the embodiments, which those of ordinary skill in the art can understand and implement without creative effort.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a machine-readable storage medium, and when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The foregoing describes specific embodiments of this application. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
Other embodiments of this application will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This specification is intended to cover any variations, uses, or adaptations of this application that follow its general principles and include common general knowledge or customary technical means in the technical field not disclosed in this specification. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of this application being indicated by the following claims.
It should be understood that this application is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of this application is limited only by the appended claims.
The above are only preferred embodiments of this application and are not intended to limit it; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this application shall fall within its scope of protection.

Claims (88)

  1. A control method for a movable platform, characterized in that the method comprises:
    acquiring a depth map, collected by the movable platform, of the scene where a target is located;
    acquiring a first target area on the depth map that contains the target;
    adjusting the first target area so that the proportion of the area corresponding to the target within the first target area increases, to obtain a tracking area;
    controlling the movable platform to move relative to the target according to depth information of the tracking area.
  2. The method according to claim 1, characterized in that acquiring a first target area on the depth map that contains the target comprises:
    acquiring a second target area containing the target, the second target area being located on a color image collected by the movable platform;
    projecting the second target area onto the depth map to obtain the first target area.
  3. The method according to claim 2, characterized in that projecting the second target area onto the depth map comprises:
    projecting the second target area on the color image generated at time T onto the depth map generated at time T.
  4. The method according to claim 2, characterized in that, after projecting the second target area onto the depth map, the method further comprises:
    extracting first feature points on the color image and second feature points on the depth map respectively;
    performing feature matching on the first feature points and the second feature points;
    correcting the obtained first target area of the depth map based on a result of the feature matching.
  5. The method according to claim 1, characterized in that adjusting the first target area so that the proportion of the area corresponding to the target within the first target area increases comprises:
    adjusting the first target area so that a boundary of the first target area shrinks toward the target;
    and/or,
    deleting all or part of the other areas in the first target area that do not correspond to the target.
  6. The method according to claim 5, characterized in that adjusting the first target area so that the boundary of the first target area shrinks toward the target comprises:
    performing semantic segmentation on the first target area, and adjusting the first target area based on a semantic segmentation result.
  7. The method according to claim 6, characterized in that performing semantic segmentation on the first target area and adjusting the first target area based on the semantic segmentation result comprises:
    performing semantic segmentation on the depth map based on a pre-trained deep learning model;
    adjusting the first target area according to the semantic segmentation result.
  8. 根据权利要求7所述的方法,其特征在于,所述深度学习模型的训练过程包括:The method according to claim 7, wherein the training process of the deep learning model comprises:
    获取所述深度学习模型输出的训练图像中像素点对应的语义标签,基于所述语义分割标签计算第一损失函数,基于所述第一损失函数训练所述深度学习模型。The semantic labels corresponding to the pixels in the training image output by the deep learning model are acquired, a first loss function is calculated based on the semantic segmentation labels, and the deep learning model is trained based on the first loss function.
  9. 根据权利要求8所述的方法,其特征在于,在基于所述语义标签进行训练所述深度学习模型之前,所述训练过程还包括:The method according to claim 8, wherein before training the deep learning model based on the semantic labels, the training process further comprises:
    获取所述深度学习模型输出的训练图像对应的类别标签,基于所述类别标签计算第二损失函数,基于所述第二损失函数训练所述深度学习模型,确定在基于所述第一损失函数训练所述深度学习模型之前,所述深度学习模型的初始参数。Obtain the class label corresponding to the training image output by the deep learning model, calculate a second loss function based on the class label, train the deep learning model based on the second loss function, and determine whether to train based on the first loss function Before the deep learning model, the initial parameters of the deep learning model.
  10. 根据权利要求9所述的方法,其特征在于,所述第一损失函数和/或所述第二损失函数为交叉熵损失函数。The method according to claim 9, wherein the first loss function and/or the second loss function is a cross-entropy loss function.
  11. 根据权利要求9所述的方法,其特征在于,所述训练图像包括灰度图像。The method of claim 9, wherein the training image comprises a grayscale image.
  12. 根据权利要求7所述的方法,其特征在于,所述根据语义分割结 果,调整所述第一目标区域,还包括:The method according to claim 7, wherein, adjusting the first target area according to the semantic segmentation result, further comprising:
    对所述深度学习模型输出的语义分割结果使用预设的卷积核进行卷积,以缩小所述语义分割结果中表征所述目标物的分割区域,并基于卷积后的语义分割结果,调整所述第一目标区域。The semantic segmentation result output by the deep learning model is convolved with a preset convolution kernel to reduce the segmentation area representing the target in the semantic segmentation result, and based on the semantic segmentation result after the convolution, adjust the first target area.
  13. 根据权利要求12所述的方法,其特征在于,对所述深度学习模型输出的语义分割结果进行卷积,包括:The method according to claim 12, wherein performing convolution on the semantic segmentation result output by the deep learning model comprises:
    每次卷积时,将卷积核所在区域的最小像素值作为卷积核所在区域的中心点位置的像素值。In each convolution, the minimum pixel value of the area where the convolution kernel is located is taken as the pixel value of the center point of the area where the convolution kernel is located.
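The convolution of claims 12-13 is in effect a minimum filter (morphological erosion): writing each window's minimum to the window centre shrinks the segmented region, trimming border pixels that may belong to the background. A minimal sketch on a hypothetical 5x5 binary mask:

```python
import numpy as np

def min_filter(mask, k=3):
    """Slide a k x k window over the mask and write the window minimum to
    the centre pixel (claim 13).  On a binary segmentation mask this
    erodes the region labelled as target."""
    pad = k // 2
    padded = np.pad(mask, pad, mode="edge")   # replicate border values
    out = np.empty_like(mask)
    h, w = mask.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].min()
    return out

seg = np.zeros((5, 5), dtype=np.uint8)
seg[1:4, 1:4] = 1          # 3x3 block labelled "target"
eroded = min_filter(seg)   # only the centre pixel keeps an all-ones window
```

With a 3x3 kernel, only pixels whose entire neighbourhood lies inside the segmented region survive, so the target area shrinks by one pixel per pass.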
  14. 根据权利要求1所述的方法,其特征在于,调整所述第一目标区域,以使所述第一目标区域中对应所述目标物的区域的占比增加,包括:The method according to claim 1, wherein adjusting the first target area so that the proportion of the area corresponding to the target object in the first target area increases comprises:
    对所述深度图及其相邻帧深度图进行差分运算,并基于差分结果确定所述深度图上所述目标物的轮廓,将所述轮廓所包含的区域作为调整后的区域。The difference operation is performed on the depth map and the depth maps of adjacent frames, and the contour of the target object on the depth map is determined based on the difference result, and the area included in the contour is used as the adjusted area.
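Claim 14's frame differencing can be sketched as follows; the depth values and threshold are illustrative. Pixels whose depth changes between adjacent frames outline the moving target, and the region they enclose becomes the adjusted area:

```python
import numpy as np

# Two consecutive depth frames (hypothetical values, metres).  The target
# moves against a static background, so the per-pixel difference is large
# only where the target now is.
prev = np.full((4, 4), 10.0)
curr = prev.copy()
curr[1:3, 1:3] = 2.0                    # target occupies this block in frame T

diff = np.abs(curr - prev)              # difference operation (claim 14)
moving = diff > 0.5                     # illustrative threshold
ys, xs = np.where(moving)
# The bounding contour of the changed pixels gives the adjusted area.
adjusted = (ys.min(), xs.min(), ys.max(), xs.max())
```
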
  15. 根据权利要求1所述的方法,其特征在于,根据所述跟踪区域的深度信息,控制所述可移动平台相对于所述目标物运动,包括:The method according to claim 1, wherein, according to the depth information of the tracking area, controlling the movable platform to move relative to the target comprises:
    根据所述跟踪区域的深度信息,控制所述可移动平台跟随所述目标物运动。According to the depth information of the tracking area, the movable platform is controlled to follow the movement of the target.
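One simple realization of the depth-based following in claim 15, offered as a sketch rather than the disclosed implementation, is a proportional controller that derives the platform's forward speed from the error between the tracking-area depth and a desired following distance. The gain, setpoint, and speed limit below are illustrative:

```python
def follow_speed(tracking_depth, desired_depth=5.0, kp=0.8, v_max=2.0):
    """Proportional follow control: command a forward speed proportional
    to the error between the measured tracking-area depth and the desired
    following distance, clamped to the platform's speed limit."""
    error = tracking_depth - desired_depth   # >0: target too far ahead
    v = kp * error
    return max(-v_max, min(v_max, v))

# Target has drifted to 6.5 m; the platform speeds up to close the gap.
cmd = follow_speed(6.5)
```

Because the tracking area excludes most background pixels, its depth statistic tracks the target itself, which keeps such a controller from chasing background depth.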
  16. 一种可移动平台的控制方法,其特征在于,所述方法包括:A control method for a movable platform, characterized in that the method comprises:
    获取所述可移动平台采集得到的目标物所在场景的深度图;acquiring a depth map, collected by the movable platform, of the scene where the target object is located;
    获取所述深度图上包括所述目标物的第一目标区域;acquiring a first target area including the target on the depth map;
    删除所述第一目标区域中未对应所述目标物的区域的其他区域的全部或部分区域,得到跟踪区域;Deleting all or part of other areas in the first target area that do not correspond to the area of the target to obtain a tracking area;
    根据所述跟踪区域的深度信息,控制所述可移动平台相对于所述目标物运动。According to the depth information of the tracking area, the movable platform is controlled to move relative to the target.
  17. 根据权利要求16所述的方法,其特征在于,获取所述深度图上包括所述目标物的第一目标区域,包括:The method according to claim 16, wherein acquiring the first target area including the target on the depth map comprises:
    获取包含所述目标物的第二目标区域,所述第二目标区域位于所述可移动平台采集到的彩色图像上;acquiring a second target area containing the target, the second target area being located on the color image collected by the movable platform;
    将所述第二目标区域投影到所述深度图上,得到所述第一目标区域。Projecting the second target area onto the depth map to obtain the first target area.
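Claim 17's projection can be illustrated under a strong simplifying assumption, namely that the color image and the depth map are already aligned and differ only in resolution, so a per-axis scale maps the second target area onto the first. A real system would instead apply the calibrated intrinsics and extrinsics of the two sensors:

```python
def project_box(box, color_size, depth_size):
    """Map a bounding box (x0, y0, x1, y1) from color-image coordinates
    to depth-map coordinates.  Simplifying assumption: aligned views,
    so only the resolutions differ."""
    cw, ch = color_size
    dw, dh = depth_size
    sx, sy = dw / cw, dh / ch
    x0, y0, x1, y1 = box
    return (x0 * sx, y0 * sy, x1 * sx, y1 * sy)

# Second target area on a 1920x1080 color image -> first target area
# on a 640x360 depth map (hypothetical resolutions).
first_area = project_box((960, 540, 1440, 810), (1920, 1080), (640, 360))
```

Claim 18 additionally requires that the color image and depth map carry the same timestamp T, so the box is projected between frames captured at the same instant.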
  18. 根据权利要求17所述的方法,其特征在于,将所述第二目标区域投影到所述深度图上,包括:The method according to claim 17, wherein projecting the second target region onto the depth map comprises:
    将T时刻生成的所述彩色图像上的第二目标区域投影到T时刻生成的所述深度图上。Projecting the second target area on the color image generated at time T onto the depth map generated at time T.
  19. 根据权利要求17所述的方法,其特征在于,将所述第二目标区域投影到所述深度图上,之后还包括:The method according to claim 17, wherein after projecting the second target area onto the depth map, the method further comprises:
    分别提取所述彩色图像上的第一特征点以及所述深度图上的第二特征点;respectively extracting the first feature point on the color image and the second feature point on the depth map;
    对所述第一特征点和所述第二特征点进行特征匹配;performing feature matching on the first feature point and the second feature point;
    基于特征匹配的结果,修正所得到的所述深度图的第一目标区域。Based on the result of the feature matching, the obtained first target area of the depth map is modified.
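One plausible reading of claim 19 is that matched feature points reveal a residual misalignment between the color image and the depth map, which is then used to correct the projected box. The sketch below estimates a mean offset from hypothetical matched points; actual matching would use descriptors (e.g. ORB) with outlier rejection:

```python
import numpy as np

def correct_box(box, pts_color, pts_depth):
    """Refine the projected first target area: estimate the mean offset
    between matched feature points on the color image and on the depth
    map, then shift the box by that offset."""
    offset = np.mean(np.asarray(pts_depth, dtype=float)
                     - np.asarray(pts_color, dtype=float), axis=0)
    dx, dy = offset
    x0, y0, x1, y1 = box
    return (x0 + dx, y0 + dy, x1 + dx, y1 + dy)

# Hypothetical matches, consistently shifted by (+2, -1) pixels.
pts_color = [(10, 10), (20, 15), (30, 25)]
pts_depth = [(12, 9), (22, 14), (32, 24)]
corrected = correct_box((5, 5, 35, 30), pts_color, pts_depth)
```
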
  20. 根据权利要求16所述的方法,其特征在于,所述删除所述第一目标区域中未对应所述目标物的区域的其他区域的全部或部分区域,包括:The method according to claim 16, wherein the deleting all or part of other regions in the first target region that does not correspond to the region of the target comprises:
    对所述第一目标区域进行语义分割,基于语义分割结果删除所述第一目标区域中未对应所述目标物的区域的其他区域的全部或部分区域。Semantic segmentation is performed on the first target area, and all or part of other areas in the first target area that do not correspond to the area of the target object are deleted based on the semantic segmentation result.
  21. 根据权利要求20所述的方法,其特征在于,所述对所述第一目标区域进行语义分割,基于语义分割结果删除所述第一目标区域中未对应所述目标物的区域的其他区域的全部或部分区域,包括:The method according to claim 20, wherein performing semantic segmentation on the first target area and deleting, based on the semantic segmentation result, all or part of the other areas in the first target area that do not correspond to the target object comprises:
    基于预先训练的深度学习模型,对所述深度图进行语义分割;performing semantic segmentation on the depth map based on a pre-trained deep learning model;
    根据语义分割结果,删除所述第一目标区域中未对应所述目标物的区域的其他区域的全部或部分区域。According to the semantic segmentation result, delete all or part of other regions in the first target region that do not correspond to the region of the target object.
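The effect of claims 20-21 can be shown on a small numeric example: pixels of the first target area that the segmentation marks as background are deleted, so the depth statistic of the remaining tracking area reflects only the target. All values are hypothetical:

```python
import numpy as np

# Depth values inside the first target area (metres); the segmentation
# mask marks which pixels belong to the target (hypothetical output of
# the pre-trained model).
depth = np.array([[5.0, 5.1, 9.0],
                  [5.2, 5.0, 9.2],
                  [9.1, 9.3, 9.0]])
mask = np.array([[1, 1, 0],
                 [1, 1, 0],
                 [0, 0, 0]], dtype=bool)

# Delete the non-target pixels and take the depth statistic over what
# remains: the ~9 m background no longer biases the tracking distance.
tracking_depth = depth[mask].mean()
```

Without the deletion, the mean over the whole box would be pulled toward the background depth and the platform would misjudge its distance to the target.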
  22. 根据权利要求21所述的方法,其特征在于,所述深度学习模型的训练过程包括:The method according to claim 21, wherein the training process of the deep learning model comprises:
    获取所述深度学习模型输出的训练图像中像素点对应的语义标签,基于所述语义分割标签计算第一损失函数,基于所述第一损失函数训练所述深度学习模型。The semantic labels corresponding to the pixels in the training image output by the deep learning model are acquired, a first loss function is calculated based on the semantic segmentation labels, and the deep learning model is trained based on the first loss function.
  23. 根据权利要求22所述的方法,其特征在于,在基于所述语义标签进行训练所述深度学习模型之前,所述训练过程还包括:The method according to claim 22, wherein before training the deep learning model based on the semantic labels, the training process further comprises:
    获取所述深度学习模型输出的训练图像对应的类别标签,基于所述类别标签计算第二损失函数,基于所述第二损失函数训练所述深度学习模型,确定在基于所述第一损失函数训练所述深度学习模型之前,所述深度学习模型的初始参数。Obtain the class label corresponding to the training image output by the deep learning model, calculate a second loss function based on the class label, and train the deep learning model based on the second loss function to determine the initial parameters of the deep learning model before the deep learning model is trained based on the first loss function.
  24. 根据权利要求23所述的方法,其特征在于,所述第一损失函数和/或所述第二损失函数为交叉熵损失函数。The method according to claim 23, wherein the first loss function and/or the second loss function is a cross-entropy loss function.
  25. 根据权利要求23所述的方法,其特征在于,所述训练图像包括灰度图像。The method of claim 23, wherein the training image comprises a grayscale image.
  26. 根据权利要求21所述的方法,其特征在于,所述根据语义分割结果,删除所述第一目标区域中未对应所述目标物的区域的其他区域的全部或部分区域,还包括:The method according to claim 21, wherein, according to the semantic segmentation result, deleting all or part of other regions in the first target region that does not correspond to the region of the target further comprises:
    对所述深度学习模型输出的语义分割结果使用预设的卷积核进行卷积,以缩小所述语义分割结果中表征所述目标物的分割区域,并基于卷积后的语义分割结果,调整所述第一目标区域。The semantic segmentation result output by the deep learning model is convolved with a preset convolution kernel to reduce the segmentation area representing the target in the semantic segmentation result, and based on the semantic segmentation result after the convolution, adjust the first target area.
  27. 根据权利要求26所述的方法,其特征在于,对所述深度学习模型输出的语义分割结果进行卷积,包括:The method according to claim 26, wherein performing convolution on the semantic segmentation result output by the deep learning model comprises:
    每次卷积时,将卷积核所在区域的最小像素值作为卷积核所在区域的中心点位置的像素值。In each convolution, the minimum pixel value of the area where the convolution kernel is located is taken as the pixel value of the center point of the area where the convolution kernel is located.
  28. 根据权利要求16所述的方法,其特征在于,所述删除所述第一目标区域中未对应所述目标物的区域的其他区域的全部或部分区域,包括:The method according to claim 16, wherein the deleting all or part of other regions in the first target region that does not correspond to the region of the target comprises:
    对所述深度图及其相邻帧深度图进行差分运算,并基于差分结果确定所述深度图上所述目标物的轮廓,删除所述轮廓所包含的区域以外的其他区域。A difference operation is performed on the depth map and the depth maps of adjacent frames, and the contour of the target object on the depth map is determined based on the difference result, and other areas except the area included in the contour are deleted.
  29. 根据权利要求16所述的方法,其特征在于,根据所述跟踪区域的深度信息,控制所述可移动平台相对于所述目标物运动,包括:The method according to claim 16, wherein, according to the depth information of the tracking area, controlling the movable platform to move relative to the target comprises:
    根据所述跟踪区域的深度信息,控制所述可移动平台跟随所述目标物运动。According to the depth information of the tracking area, the movable platform is controlled to follow the movement of the target.
  30. 一种可移动平台,其特征在于,所述可移动平台包括:图像采集装置、存储器和处理器;A movable platform, characterized in that, the movable platform comprises: an image acquisition device, a memory and a processor;
    所述图像采集装置,用于获取目标物所在场景的深度图;The image acquisition device is used to acquire a depth map of the scene where the target object is located;
    所述存储器用于存储程序代码;the memory is used to store program codes;
    所述处理器调用所述程序代码,当程序代码被执行时,用于执行以下操作:the processor invokes the program code, and when the program code is executed, the processor performs the following operations:
    获取所述深度图以及所述深度图上包括所述目标物的第一目标区域;acquiring the depth map and the first target area on the depth map including the target;
    调整所述第一目标区域,以使所述第一目标区域中对应所述目标物的区域的占比增加,得到跟踪区域;Adjusting the first target area to increase the proportion of the area corresponding to the target in the first target area to obtain a tracking area;
    根据所述跟踪区域的深度信息,控制所述可移动平台相对于所述目标物运动。According to the depth information of the tracking area, the movable platform is controlled to move relative to the target.
  31. 根据权利要求30所述的可移动平台,其特征在于,获取所述深度图上包括所述目标物的第一目标区域,包括:The movable platform according to claim 30, wherein acquiring the first target area including the target on the depth map comprises:
    获取包含所述目标物的第二目标区域,所述第二目标区域位于所述可移动平台采集到的彩色图像上;acquiring a second target area containing the target, the second target area being located on the color image collected by the movable platform;
    将所述第二目标区域投影到所述深度图上,得到所述第一目标区域。Projecting the second target area onto the depth map to obtain the first target area.
  32. 根据权利要求31所述的可移动平台,其特征在于,将所述第二目标区域投影到所述深度图上,包括:The movable platform of claim 31, wherein projecting the second target area onto the depth map comprises:
    将T时刻生成的所述彩色图像上的第二目标区域投影到T时刻生成的所述深度图上。Projecting the second target area on the color image generated at time T onto the depth map generated at time T.
  33. 根据权利要求31所述的可移动平台,其特征在于,将所述第二目标区域投影到所述深度图上,之后还包括:The movable platform of claim 31, wherein projecting the second target area onto the depth map, further comprising:
    分别提取所述彩色图像上的第一特征点以及所述深度图上的第二特征点;respectively extracting the first feature point on the color image and the second feature point on the depth map;
    对所述第一特征点和所述第二特征点进行特征匹配;performing feature matching on the first feature point and the second feature point;
    基于特征匹配的结果,修正所得到的所述深度图的第一目标区域。Based on the result of the feature matching, the obtained first target area of the depth map is modified.
  34. 根据权利要求30所述的可移动平台,其特征在于,调整所述第一目标区域,以使所述第一目标区域中对应所述目标物的区域的占比增加,包括:The movable platform according to claim 30, wherein adjusting the first target area so that the proportion of the area corresponding to the target object in the first target area increases, comprising:
    调整所述第一目标区域,以使所述第一目标区域的边界向靠近所述目标物的方向收缩;和/或Adjusting the first target area so that the boundary of the first target area shrinks toward the target; and/or
    删除所述第一目标区域中未对应所述目标物的区域的其他区域中的全部或者部分区域。Deleting all or part of other areas in the first target area that do not correspond to the area of the target.
  35. 根据权利要求34所述的可移动平台,其特征在于,所述调整所述第一目标区域,以使所述第一目标区域的边界向靠近所述目标物的方向收缩,包括:The movable platform according to claim 34, wherein the adjusting the first target area so that the boundary of the first target area shrinks toward the target object comprises:
    对所述第一目标区域进行语义分割,基于语义分割结果调整所述第一目标区域。Semantic segmentation is performed on the first target area, and the first target area is adjusted based on the result of the semantic segmentation.
  36. 根据权利要求35所述的可移动平台,其特征在于,所述对所述第一目标区域进行语义分割,基于语义分割结果调整所述第一目标区域,包括:The movable platform according to claim 35, wherein the performing semantic segmentation on the first target area and adjusting the first target area based on a result of the semantic segmentation includes:
    基于预先训练的深度学习模型,对所述深度图进行语义分割;performing semantic segmentation on the depth map based on a pre-trained deep learning model;
    根据语义分割结果,调整所述第一目标区域。According to the semantic segmentation result, the first target area is adjusted.
  37. 根据权利要求36所述的可移动平台,其特征在于,所述深度学习模型的训练过程包括:The movable platform according to claim 36, wherein the training process of the deep learning model comprises:
    获取所述深度学习模型输出的训练图像中像素点对应的语义标签,基于所述语义分割标签计算第一损失函数,基于所述第一损失函数训练所述深度学习模型。A semantic label corresponding to a pixel in a training image output by the deep learning model is acquired, a first loss function is calculated based on the semantic segmentation label, and the deep learning model is trained based on the first loss function.
  38. 根据权利要求37所述的可移动平台,其特征在于,在基于所述语义标签进行训练所述深度学习模型之前,所述训练过程还包括:The movable platform according to claim 37, wherein before training the deep learning model based on the semantic labels, the training process further comprises:
    获取所述深度学习模型输出的训练图像对应的类别标签,基于所述类别标签计算第二损失函数,基于所述第二损失函数训练所述深度学习模型,确定在基于所述第一损失函数训练所述深度学习模型之前,所述深度学习模型的初始参数。Obtain the class label corresponding to the training image output by the deep learning model, calculate a second loss function based on the class label, and train the deep learning model based on the second loss function to determine the initial parameters of the deep learning model before the deep learning model is trained based on the first loss function.
  39. 根据权利要求38所述的可移动平台,其特征在于,所述第一损失函数和/或所述第二损失函数为交叉熵损失函数。The movable platform according to claim 38, wherein the first loss function and/or the second loss function is a cross-entropy loss function.
  40. 根据权利要求38所述的可移动平台,其特征在于,所述训练图像包括灰度图像。The movable platform of claim 38, wherein the training image comprises a grayscale image.
  41. 根据权利要求36所述的可移动平台,其特征在于,所述根据语义分割结果,调整所述第一目标区域,还包括:The movable platform according to claim 36, wherein the adjusting the first target area according to the semantic segmentation result further comprises:
    对所述深度学习模型输出的语义分割结果使用预设的卷积核进行卷积,以缩小所述语义分割结果中表征所述目标物的分割区域,并基于卷积后的语义分割结果,调整所述第一目标区域。The semantic segmentation result output by the deep learning model is convolved with a preset convolution kernel to reduce the segmentation area representing the target in the semantic segmentation result, and based on the semantic segmentation result after the convolution, adjust the first target area.
  42. 根据权利要求41所述的可移动平台,其特征在于,对所述深度学习模型输出的语义分割结果进行卷积,包括:The movable platform according to claim 41, wherein the convolution of the semantic segmentation result output by the deep learning model comprises:
    每次卷积时,将卷积核所在区域的最小像素值作为卷积核所在区域的中心点位置的像素值。In each convolution, the minimum pixel value of the area where the convolution kernel is located is taken as the pixel value of the center point of the area where the convolution kernel is located.
  43. 根据权利要求30所述的可移动平台,其特征在于,调整所述第一目标区域,以使所述第一目标区域中对应所述目标物的区域的占比增加,包括:The movable platform according to claim 30, wherein adjusting the first target area so that the proportion of the area corresponding to the target object in the first target area increases, comprising:
    对所述深度图及其相邻帧深度图进行差分运算,并基于差分结果确定所述深度图上所述目标物的轮廓,将所述轮廓所包含的区域作为调整后的区域。The difference operation is performed on the depth map and the depth maps of adjacent frames, and the contour of the target object on the depth map is determined based on the difference result, and the area included in the contour is used as the adjusted area.
  44. 根据权利要求30所述的可移动平台,其特征在于,根据所述跟踪区域的深度信息,控制所述可移动平台相对于所述目标物运动,包括:The movable platform according to claim 30, wherein, according to the depth information of the tracking area, controlling the movable platform to move relative to the target comprises:
    根据所述跟踪区域的深度信息,控制所述可移动平台跟随所述目标物运动。According to the depth information of the tracking area, the movable platform is controlled to follow the movement of the target.
  45. 一种控制装置,其特征在于,所述控制装置包括:存储器和处理器;A control device, characterized in that the control device comprises: a memory and a processor;
    所述存储器用于存储程序代码;the memory is used to store program codes;
    所述处理器调用所述程序代码,当程序代码被执行时,用于执行以下操作:the processor invokes the program code, and when the program code is executed, the processor performs the following operations:
    获取可移动平台采集得到的目标物所在场景的深度图;acquiring a depth map, collected by a movable platform, of the scene where the target object is located;
    获取所述深度图上包括所述目标物的目标区域;acquiring a target area including the target on the depth map;
    调整所述目标区域,以使所述目标区域中对应所述目标物的区域的占比增加,得到跟踪区域;Adjusting the target area so that the proportion of the area corresponding to the target object in the target area increases to obtain a tracking area;
    根据所述跟踪区域的深度信息,控制所述可移动平台相对于所述目标物运动。According to the depth information of the tracking area, the movable platform is controlled to move relative to the target.
  46. 根据权利要求45所述的控制装置,其特征在于,获取所述深度图上包括所述目标物的第一目标区域,包括:The control device according to claim 45, wherein acquiring the first target area including the target on the depth map comprises:
    获取包含所述目标物的第二目标区域,所述第二目标区域位于所述可移动平台采集到的彩色图像上;acquiring a second target area containing the target, the second target area being located on the color image collected by the movable platform;
    将所述第二目标区域投影到所述深度图上,得到所述第一目标区域。Projecting the second target area onto the depth map to obtain the first target area.
  47. 根据权利要求46所述的控制装置,其特征在于,将所述第二目标区域投影到所述深度图上,包括:The control device according to claim 46, wherein projecting the second target area onto the depth map comprises:
    将T时刻生成的所述彩色图像上的第二目标区域投影到T时刻生成的所述深度图上。Projecting the second target area on the color image generated at time T onto the depth map generated at time T.
  48. 根据权利要求46所述的控制装置,其特征在于,将所述第二目标区域投影到所述深度图上,之后还包括:The control device according to claim 46, wherein the second target area is projected onto the depth map, further comprising:
    分别提取所述彩色图像上的第一特征点以及所述深度图上的第二特征点;respectively extracting the first feature point on the color image and the second feature point on the depth map;
    对所述第一特征点和所述第二特征点进行特征匹配;performing feature matching on the first feature point and the second feature point;
    基于特征匹配的结果,修正所得到的所述深度图的第一目标区域。Based on the result of the feature matching, the obtained first target area of the depth map is modified.
  49. 根据权利要求45所述的控制装置,其特征在于,调整所述第一目标区域,以使所述第一目标区域中对应所述目标物的区域的占比增加,包括:The control device according to claim 45, wherein adjusting the first target area so that the proportion of the area corresponding to the target object in the first target area increases, comprising:
    调整所述第一目标区域,以使所述第一目标区域的边界向靠近所述目标物的方向收缩;和/或,Adjusting the first target area so that the boundary of the first target area shrinks toward the target object; and/or,
    删除所述第一目标区域中未对应所述目标物的区域的其他区域中的全部或者部分区域。Deleting all or part of other areas in the first target area that do not correspond to the area of the target.
  50. 根据权利要求49所述的控制装置,其特征在于,所述调整所述第一目标区域,以使所述第一目标区域的边界向靠近所述目标物的方向收缩,包括:The control device according to claim 49, wherein the adjusting the first target area so that the boundary of the first target area shrinks toward the target object comprises:
    对所述第一目标区域进行语义分割,基于语义分割结果调整所述第一目标区域。Semantic segmentation is performed on the first target area, and the first target area is adjusted based on the result of the semantic segmentation.
  51. 根据权利要求50所述的控制装置,其特征在于,对所述第一目标区域进行语义分割,基于语义分割结果调整所述第一目标区域,包括:The control device according to claim 50, wherein, performing semantic segmentation on the first target area, and adjusting the first target area based on the semantic segmentation result, comprising:
    基于预先训练的深度学习模型,对所述深度图进行语义分割;performing semantic segmentation on the depth map based on a pre-trained deep learning model;
    根据语义分割结果,调整所述第一目标区域。According to the semantic segmentation result, the first target area is adjusted.
  52. 根据权利要求51所述的控制装置,其特征在于,所述深度学习模型的训练过程包括:The control device according to claim 51, wherein the training process of the deep learning model comprises:
    获取所述深度学习模型输出的训练图像中像素点对应的语义标签,基于所述语义分割标签计算第一损失函数,基于所述第一损失函数训练所述深度学习模型。A semantic label corresponding to a pixel in a training image output by the deep learning model is acquired, a first loss function is calculated based on the semantic segmentation label, and the deep learning model is trained based on the first loss function.
  53. 根据权利要求52所述的控制装置,其特征在于,在基于所述语义标签进行训练所述深度学习模型之前,所述训练过程还包括:The control device according to claim 52, wherein before training the deep learning model based on the semantic label, the training process further comprises:
    获取所述深度学习模型输出的训练图像对应的类别标签,基于所述类别标签计算第二损失函数,基于所述第二损失函数训练所述深度学习模型,确定在基于所述第一损失函数训练所述深度学习模型之前,所述深度学习模型的初始参数。Obtain the class label corresponding to the training image output by the deep learning model, calculate a second loss function based on the class label, and train the deep learning model based on the second loss function to determine the initial parameters of the deep learning model before the deep learning model is trained based on the first loss function.
  54. 根据权利要求53所述的控制装置,其特征在于,所述第一损失函数和/或所述第二损失函数为交叉熵损失函数。The control device according to claim 53, wherein the first loss function and/or the second loss function is a cross-entropy loss function.
  55. 根据权利要求53所述的控制装置,其特征在于,所述训练图像包括灰度图像。The control device of claim 53, wherein the training image comprises a grayscale image.
  56. 根据权利要求51所述的控制装置,其特征在于,所述根据语义分割结果,调整所述第一目标区域,还包括:The control device according to claim 51, wherein the adjusting the first target area according to the semantic segmentation result further comprises:
    对所述深度学习模型输出的语义分割结果使用预设的卷积核进行卷积,以缩小所述语义分割结果中表征所述目标物的分割区域,并基于卷积后的语义分割结果,调整所述第一目标区域。The semantic segmentation result output by the deep learning model is convolved with a preset convolution kernel to reduce the segmentation area representing the target in the semantic segmentation result, and based on the semantic segmentation result after the convolution, adjust the first target area.
  57. 根据权利要求56所述的控制装置,其特征在于,对所述深度学习模型输出的语义分割结果进行卷积,包括:The control device according to claim 56, wherein the convolution of the semantic segmentation result output by the deep learning model comprises:
    每次卷积时,将卷积核所在区域的最小像素值作为卷积核所在区域的中心点位置的像素值。In each convolution, the minimum pixel value of the area where the convolution kernel is located is taken as the pixel value of the center point of the area where the convolution kernel is located.
  58. 根据权利要求45所述的控制装置,其特征在于,调整所述第一目标区域,以使所述第一目标区域中对应所述目标物的区域的占比增加,包括:The control device according to claim 45, wherein adjusting the first target area so that the proportion of the area corresponding to the target object in the first target area increases, comprising:
    对所述深度图及其相邻帧深度图进行差分运算,并基于差分结果确定所述深度图上所述目标物的轮廓,将所述轮廓所包含的区域作为调整后的区域。The difference operation is performed on the depth map and the depth maps of adjacent frames, and the contour of the target object on the depth map is determined based on the difference result, and the area included in the contour is used as the adjusted area.
  59. 根据权利要求45所述的控制装置,其特征在于,根据所述跟踪区域的深度信息,控制所述可移动平台相对于所述目标物运动,包括:The control device according to claim 45, wherein, according to the depth information of the tracking area, controlling the movable platform to move relative to the target comprises:
    根据所述跟踪区域的深度信息,控制所述可移动平台跟随所述目标物运动。According to the depth information of the tracking area, the movable platform is controlled to follow the movement of the target.
  60. 一种可移动平台,其特征在于,所述可移动平台包括:图像采集装置、存储器和处理器;A movable platform, characterized in that, the movable platform comprises: an image acquisition device, a memory and a processor;
    所述图像采集装置,用于获取目标物所在场景的深度图;The image acquisition device is used for acquiring the depth map of the scene where the target object is located;
    所述存储器用于存储程序代码;the memory is used to store program codes;
    所述处理器调用所述程序代码,当程序代码被执行时,用于执行以下操作:the processor invokes the program code, and when the program code is executed, the processor performs the following operations:
    获取所述深度图以及所述深度图上包括所述目标物的第一目标区域;acquiring the depth map and the first target area on the depth map including the target;
    删除所述第一目标区域中未对应所述目标物的区域的其他区域的全部或部分区域,得到跟踪区域;Deleting all or part of other areas in the first target area that do not correspond to the area of the target to obtain a tracking area;
    根据所述跟踪区域的深度信息,控制所述可移动平台相对于所述目标物运动。According to the depth information of the tracking area, the movable platform is controlled to move relative to the target.
  61. 根据权利要求60所述的可移动平台,其特征在于,获取所述深度图上包括所述目标物的第一目标区域,包括:The movable platform according to claim 60, wherein acquiring the first target area including the target on the depth map comprises:
    获取包含所述目标物的第二目标区域,所述第二目标区域位于所述可移动平台采集到的彩色图像上;acquiring a second target area containing the target, the second target area being located on the color image collected by the movable platform;
    将所述第二目标区域投影到所述深度图上,得到所述第一目标区域。Projecting the second target area onto the depth map to obtain the first target area.
  62. 根据权利要求61所述的可移动平台,其特征在于,将所述第二目标区域投影到所述深度图上,包括:The movable platform of claim 61, wherein projecting the second target area onto the depth map comprises:
    将T时刻生成的所述彩色图像上的第二目标区域投影到T时刻生成的所述深度图上。Projecting the second target area on the color image generated at time T onto the depth map generated at time T.
  63. 根据权利要求61所述的可移动平台,其特征在于,将所述第二目标区域投影到所述深度图上,之后还包括:The movable platform of claim 61, wherein projecting the second target area onto the depth map, further comprising:
    分别提取所述彩色图像上的第一特征点以及所述深度图上的第二特征点;respectively extracting the first feature point on the color image and the second feature point on the depth map;
    对所述第一特征点和所述第二特征点进行特征匹配;performing feature matching on the first feature point and the second feature point;
    基于特征匹配的结果,修正所得到的所述深度图的第一目标区域。Based on the result of the feature matching, the obtained first target area of the depth map is modified.
  64. 根据权利要求60所述的可移动平台,其特征在于,所述删除所述第一目标区域中未对应所述目标物的区域的其他区域的全部或部分区域,包括:The movable platform according to claim 60, wherein the deleting all or part of other areas in the first target area that does not correspond to the area of the target includes:
    对所述第一目标区域进行语义分割,基于语义分割结果删除所述第一目标区域中未对应所述目标物的区域的其他区域的全部或部分区域。Semantic segmentation is performed on the first target area, and based on the result of the semantic segmentation, all or part of other areas in the first target area that are not corresponding to the area of the target object are deleted.
  65. 根据权利要求64所述的可移动平台,其特征在于,所述对所述第一目标区域进行语义分割,基于语义分割结果删除所述第一目标区域中未对应所述目标物的区域的其他区域的全部或部分区域,包括:The movable platform according to claim 64, wherein performing semantic segmentation on the first target area and deleting, based on the semantic segmentation result, all or part of the other areas in the first target area that do not correspond to the target object comprises:
    基于预先训练的深度学习模型,对所述深度图进行语义分割;performing semantic segmentation on the depth map based on a pre-trained deep learning model;
    根据语义分割结果,删除所述第一目标区域中未对应所述目标物的区域的其他区域的全部或部分区域。According to the semantic segmentation result, delete all or part of other regions in the first target region that do not correspond to the region of the target object.
  66. 根据权利要求65所述的可移动平台,其特征在于,所述深度学习模型的训练过程包括:The movable platform according to claim 65, wherein the training process of the deep learning model comprises:
    获取所述深度学习模型输出的训练图像中像素点对应的语义标签,基于所述语义分割标签计算第一损失函数,基于所述第一损失函数训练所述深度学习模型。A semantic label corresponding to a pixel in a training image output by the deep learning model is acquired, a first loss function is calculated based on the semantic segmentation label, and the deep learning model is trained based on the first loss function.
  67. 根据权利要求66所述的可移动平台,其特征在于,在基于所述语义标签进行训练所述深度学习模型之前,所述训练过程还包括:The mobile platform according to claim 66, wherein before training the deep learning model based on the semantic labels, the training process further comprises:
    获取所述深度学习模型输出的训练图像对应的类别标签,基于所述类别标签计算第二损失函数,基于所述第二损失函数训练所述深度学习模型,确定在基于所述第一损失函数训练所述深度学习模型之前,所述深度学习模型的初始参数。Obtain the class label corresponding to the training image output by the deep learning model, calculate a second loss function based on the class label, and train the deep learning model based on the second loss function to determine the initial parameters of the deep learning model before the deep learning model is trained based on the first loss function.
  68. 根据权利要求67所述的可移动平台,其特征在于,所述第一损失函数和/或所述第二损失函数为交叉熵损失函数。The movable platform according to claim 67, wherein the first loss function and/or the second loss function is a cross-entropy loss function.
  69. 根据权利要求67所述的可移动平台,其特征在于,所述训练图像包括灰度图像。The movable platform of claim 67, wherein the training image comprises a grayscale image.
  70. 根据权利要求65所述的可移动平台,其特征在于,所述根据语义分割结果,删除所述第一目标区域中未对应所述目标物的区域的其他区域的全部或部分区域,还包括:The movable platform according to claim 65, wherein, according to the semantic segmentation result, deleting all or part of the other regions in the first target region that does not correspond to the region of the target further comprises:
    对所述深度学习模型输出的语义分割结果使用预设的卷积核进行卷积,以缩小所述语义分割结果中表征所述目标物的分割区域,并基于卷积后的语义分割结果,调整所述第一目标区域。The semantic segmentation result output by the deep learning model is convolved with a preset convolution kernel to reduce the segmentation area representing the target in the semantic segmentation result, and based on the semantic segmentation result after the convolution, adjust the first target area.
  71. 根据权利要求70所述的可移动平台,其特征在于,对所述深度学习模型输出的语义分割结果进行卷积,包括:The movable platform according to claim 70, wherein the convolution of the semantic segmentation result output by the deep learning model comprises:
    每次卷积时,将卷积核所在区域的最小像素值作为卷积核所在区域的中心点位置的像素值。In each convolution, the minimum pixel value of the area where the convolution kernel is located is taken as the pixel value of the center point of the area where the convolution kernel is located.
  72. 根据权利要求60所述的可移动平台,其特征在于,所述删除所述第一目标区域中未对应所述目标物的区域的其他区域的全部或部分区域,包括:The movable platform according to claim 60, wherein the deleting all or part of other areas in the first target area that does not correspond to the area of the target includes:
    对所述深度图及其相邻帧深度图进行差分运算,并基于差分结果确定所述深度图上所述目标物的轮廓,删除所述轮廓所包含的区域以外的其他区域。A difference operation is performed on the depth map and the depth maps of adjacent frames, and the contour of the target object on the depth map is determined based on the difference result, and other areas other than the area included in the contour are deleted.
  73. 根据权利要求60所述的可移动平台,其特征在于,根据所述跟踪区域的深度信息,控制所述可移动平台相对于所述目标物运动,包括:The movable platform according to claim 60, wherein, according to the depth information of the tracking area, controlling the movable platform to move relative to the target comprises:
    根据所述跟踪区域的深度信息,控制所述可移动平台跟随所述目标物运动。According to the depth information of the tracking area, the movable platform is controlled to follow the movement of the target.
  74. 一种控制装置,其特征在于,所述控制装置包括:存储器和处理器;A control device, characterized in that the control device comprises: a memory and a processor;
    所述存储器用于存储程序代码;the memory is used to store program codes;
    所述处理器调用所述程序代码,当程序代码被执行时,用于执行以下操作:the processor invokes the program code, and when the program code is executed, the processor performs the following operations:
    获取所述可移动平台采集得到的目标物所在场景的深度图;acquiring a depth map, collected by the movable platform, of the scene where the target object is located;
    获取所述深度图上包括所述目标物的第一目标区域;acquiring a first target area including the target on the depth map;
    删除所述第一目标区域中未对应所述目标物的区域的其他区域的全部或部分区域,得到跟踪区域;Deleting all or part of other areas in the first target area that do not correspond to the area of the target to obtain a tracking area;
    根据所述跟踪区域的深度信息,控制所述可移动平台相对于所述目标物运动。The movable platform is controlled to move relative to the target according to the depth information of the tracking area.
  75. 根据权利要求74所述的装置,其特征在于,获取所述深度图上包括所述目标物的第一目标区域,包括:The apparatus according to claim 74, wherein acquiring the first target area including the target on the depth map comprises:
    获取包含所述目标物的第二目标区域,所述第二目标区域位于所述可移动平台采集到的彩色图像上;acquiring a second target area containing the target, the second target area being located on the color image collected by the movable platform;
    将所述第二目标区域投影到所述深度图上,得到所述第一目标区域。Projecting the second target area onto the depth map to obtain the first target area.
  76. The apparatus according to claim 75, wherein projecting the second target area onto the depth map comprises:
    projecting the second target area on the color image generated at time T onto the depth map generated at time T.
  77. The apparatus according to claim 75, wherein after projecting the second target area onto the depth map, the operations further comprise:
    extracting first feature points on the color image and second feature points on the depth map, respectively;
    performing feature matching between the first feature points and the second feature points;
    correcting the obtained first target area on the depth map based on the feature matching result.
  78. The apparatus according to claim 74, wherein deleting all or part of the areas in the first target area other than the area corresponding to the target object comprises:
    performing semantic segmentation on the first target area, and deleting, based on the semantic segmentation result, all or part of the areas in the first target area other than the area corresponding to the target object.
  79. The apparatus according to claim 78, wherein performing semantic segmentation on the first target area and deleting, based on the semantic segmentation result, all or part of the areas in the first target area other than the area corresponding to the target object comprises:
    performing semantic segmentation on the depth map based on a pre-trained deep learning model;
    deleting, according to the semantic segmentation result, all or part of the areas in the first target area other than the area corresponding to the target object.
  80. The apparatus according to claim 79, wherein the training process of the deep learning model comprises:
    acquiring semantic labels corresponding to pixels in a training image output by the deep learning model, calculating a first loss function based on the semantic labels, and training the deep learning model based on the first loss function.
  81. The apparatus according to claim 80, wherein before the deep learning model is trained based on the semantic labels, the training process further comprises:
    acquiring a category label corresponding to the training image output by the deep learning model, calculating a second loss function based on the category label, and training the deep learning model based on the second loss function to determine the initial parameters of the deep learning model before it is trained based on the first loss function.
  82. The apparatus according to claim 81, wherein the first loss function and/or the second loss function is a cross-entropy loss function.
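For reference, the cross-entropy loss named in claim 82 can be written out directly. This is a minimal numpy sketch of the standard formula, not the disclosed training code; real training would use a framework's built-in implementation.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean cross-entropy between predicted class probabilities
    (N x C array, rows summing to 1) and integer class labels (N,)."""
    eps = 1e-12                                     # guard against log(0)
    picked = probs[np.arange(len(labels)), labels]  # probability of true class
    return float(-np.mean(np.log(picked + eps)))

# Confident, correct predictions give (near-)zero loss:
loss = cross_entropy(np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([0, 1]))
```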
  83. The apparatus according to claim 81, wherein the training image comprises a grayscale image.
  84. The apparatus according to claim 79, wherein deleting, according to the semantic segmentation result, all or part of the areas in the first target area other than the area corresponding to the target object further comprises:
    convolving the semantic segmentation result output by the deep learning model with a preset convolution kernel to shrink the segmented area representing the target object in the semantic segmentation result, and adjusting the first target area based on the convolved semantic segmentation result.
  85. The apparatus according to claim 84, wherein convolving the semantic segmentation result output by the deep learning model comprises:
    at each convolution step, taking the minimum pixel value in the area covered by the convolution kernel as the pixel value at the center point of that area.
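The kernel operation of claim 85 — writing the minimum value under the kernel to the center pixel — is a minimum filter (morphological erosion): on a binary segmentation mask it peels boundary pixels off the segmented region so that only confidently interior pixels remain. A plain numpy sketch for a 3x3 kernel (borders are left unchanged for brevity):

```python
import numpy as np

def min_filter3(mask):
    """Replace each interior pixel with the minimum of its 3x3
    neighborhood. On a binary mask this erodes the segmented
    region by one pixel on every side."""
    out = mask.copy()
    h, w = mask.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y, x] = mask[y - 1:y + 2, x - 1:x + 2].min()
    return out
```

A 3x3 block of ones in a 5x5 mask shrinks to its single center pixel after one pass.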
  86. The apparatus according to claim 74, wherein deleting all or part of the areas in the first target area other than the area corresponding to the target object comprises:
    performing a difference operation on the depth map and depth maps of its adjacent frames, determining the contour of the target object on the depth map based on the difference result, and deleting areas other than the area enclosed by the contour.
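The frame-difference alternative of claim 86 can be sketched as thresholding the absolute difference between consecutive depth maps: pixels whose depth changed are candidate target pixels, and everything outside that changed region is discarded. The threshold value and function names below are illustrative assumptions, not from the disclosure.

```python
import numpy as np

def moving_target_mask(depth_t, depth_prev, thresh=0.5):
    """Binary mask of pixels whose depth changed by more than `thresh`
    between consecutive frames (candidate moving-target pixels)."""
    return (np.abs(depth_t - depth_prev) > thresh).astype(np.uint8)

def keep_target_only(depth_t, mask, fill=0.0):
    """Delete (zero out) every area outside the detected region."""
    return np.where(mask.astype(bool), depth_t, fill)
```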
  87. The apparatus according to claim 74, wherein controlling the movable platform to move relative to the target object according to the depth information of the tracking area comprises:
    controlling the movable platform to follow the movement of the target object according to the depth information of the tracking area.
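One way the following behavior of claims 73 and 87 could be realized is a proportional controller: estimate the target distance as the median depth over the tracking area and command a forward speed proportional to the error from a desired following distance. The gain and distance values are made-up illustration values, not taken from the disclosure.

```python
import numpy as np

def follow_command(tracking_depths, desired=5.0, gain=0.8):
    """Forward-velocity command from the tracking area's depth values.
    Positive output moves the platform toward the target; the median
    makes the distance estimate robust to stray background pixels."""
    distance = float(np.median(tracking_depths))
    return gain * (distance - desired)
```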
  88. A machine-readable storage medium, characterized in that a computer program is stored thereon, and when the computer program is executed, the method according to any one of claims 1 to 29 is implemented.
PCT/CN2021/072581 2021-01-18 2021-01-18 Movable platform and method and apparatus for controlling same, and machine-readable storage medium WO2022151507A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/072581 WO2022151507A1 (en) 2021-01-18 2021-01-18 Movable platform and method and apparatus for controlling same, and machine-readable storage medium

Publications (1)

Publication Number Publication Date
WO2022151507A1 true WO2022151507A1 (en) 2022-07-21

Family

ID=82446822

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/072581 WO2022151507A1 (en) 2021-01-18 2021-01-18 Movable platform and method and apparatus for controlling same, and machine-readable storage medium

Country Status (1)

Country Link
WO (1) WO2022151507A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103971380A (en) * 2014-05-05 2014-08-06 中国民航大学 Pedestrian trailing detection method based on RGB-D
CN104751491A (en) * 2015-04-10 2015-07-01 中国科学院宁波材料技术与工程研究所 Method and device for tracking crowds and counting pedestrian flow
US20160110610A1 (en) * 2014-10-15 2016-04-21 Sony Computer Entertainment Inc. Image processor, image processing method, and computer program
CN110400338A (en) * 2019-07-11 2019-11-01 Oppo广东移动通信有限公司 Depth map processing method, device and electronic equipment
CN111582155A (en) * 2020-05-07 2020-08-25 腾讯科技(深圳)有限公司 Living body detection method, living body detection device, computer equipment and storage medium
CN112223278A (en) * 2020-09-09 2021-01-15 山东省科学院自动化研究所 Detection robot following method and system based on depth visual information

Similar Documents

Publication Publication Date Title
CN110163904B (en) Object labeling method, movement control method, device, equipment and storage medium
US11645765B2 (en) Real-time visual object tracking for unmanned aerial vehicles (UAVs)
US11710243B2 (en) Method for predicting direction of movement of target object, vehicle control method, and device
US9990736B2 (en) Robust anytime tracking combining 3D shape, color, and motion with annealed dynamic histograms
Huang et al. Robust inter-vehicle distance estimation method based on monocular vision
US11669972B2 (en) Geometry-aware instance segmentation in stereo image capture processes
US20210103299A1 (en) Obstacle avoidance method and device and movable platform
US20210237774A1 (en) Self-supervised 3d keypoint learning for monocular visual odometry
JP7135665B2 (en) VEHICLE CONTROL SYSTEM, VEHICLE CONTROL METHOD AND COMPUTER PROGRAM
CN110969064B (en) Image detection method and device based on monocular vision and storage equipment
CN111738033B (en) Vehicle driving information determination method and device based on plane segmentation and vehicle-mounted terminal
CN112654998B (en) Lane line detection method and device
WO2020010620A1 (en) Wave identification method and apparatus, computer-readable storage medium, and unmanned aerial vehicle
Gupta et al. 3D Bounding Boxes for Road Vehicles: A One-Stage, Localization Prioritized Approach using Single Monocular Images.
WO2024001617A1 (en) Method and apparatus for identifying behavior of playing with mobile phone
WO2022151507A1 (en) Movable platform and method and apparatus for controlling same, and machine-readable storage medium
CN112733678A (en) Ranging method, ranging device, computer equipment and storage medium
CN112016394A (en) Obstacle information acquisition method, obstacle avoidance method, mobile device, and computer-readable storage medium
Pinard et al. End-to-end depth from motion with stabilized monocular videos
CN111126170A (en) Video dynamic object detection method based on target detection and tracking
CN116259043A (en) Automatic driving 3D target detection method and related device
Onkarappa et al. On-board monocular vision system pose estimation through a dense optical flow
US20230419522A1 (en) Method for obtaining depth images, electronic device, and storage medium
CN112654997B (en) Lane line detection method and device
US20240029283A1 (en) Image depth prediction method, electronic device, and non-transitory storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21918707

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21918707

Country of ref document: EP

Kind code of ref document: A1