WO2021184289A1 - Object solving and flying-around-a-point method and device - Google Patents

Object solving and flying-around-a-point method and device

Info

Publication number
WO2021184289A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
target object
coordinate system
target
coordinates
Prior art date
Application number
PCT/CN2020/080162
Other languages
English (en)
French (fr)
Inventor
聂谷洪
张李亮
施泽浩
杨龙超
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to PCT/CN2020/080162 priority Critical patent/WO2021184289A1/zh
Priority to CN202080004354.7A priority patent/CN113168716A/zh
Publication of WO2021184289A1 publication Critical patent/WO2021184289A1/zh

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/70 - Determining position or orientation of objects or cameras
    • G06T 7/73 - Determining position or orientation of objects or cameras using feature-based methods
    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05D - SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D 1/00 - Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 - Geometric image transformations in the plane of the image
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 - Geometric image transformations in the plane of the image
    • G06T 3/06 - Topological mapping of higher dimensional structures onto lower dimensional surfaces
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/10 - Segmentation; Edge detection
    • G06T 7/13 - Edge detection
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/60 - Analysis of geometric attributes
    • G06T 7/62 - Analysis of geometric attributes of area, perimeter, diameter or volume
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/80 - Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]

Definitions

  • This application relates to the field of intelligent control technology, and in particular to a method and equipment for object calculation and flying around a point.
  • Intelligent flight technologies, such as intelligent following flight or flying around a point with respect to target objects such as buildings, vehicles, ships, or aircraft, are increasingly popular with users.
  • To realize such flight, the electronic device needs to know the position of the target object and uses that position to realize intelligent flight with respect to the target object.
  • In related technologies, the monocular camera installed in the electronic device usually collects two-dimensional image information of the target object, and the four vertices, in the world coordinate system, of the two-dimensional rectangular frame formed by the target object in the image information are extracted; the center point for intelligent flight with respect to the target object is then determined according to the coordinate point of the center of the two-dimensional rectangular frame.
  • Because the two-dimensional rectangular frame describes little of the target object's spatial information, the following center point determined from the center of the two-dimensional rectangular frame has a large error.
  • As a result, the positioning of the target object is not accurate enough.
  • the embodiments of the present application provide a method and device for object calculation and flying around a point.
  • the center point of the target object is calculated through a three-dimensional coordinate frame to improve the accuracy of object positioning and realize high-quality intelligent flight.
  • an embodiment of the present application provides an object calculation method, the method includes:
  • acquire image information of a target object; based on the image information, determine first coordinates, in a world coordinate system, of a three-dimensional frame corresponding to the target object; determine, according to the first coordinates of the three-dimensional frame in the world coordinate system, the center point of the target object in the world coordinate system; and map the center point of the target object in the world coordinate system to an image coordinate system to obtain an image coordinate point.
  • an embodiment of the present application provides a method for flying around a point, and the method includes:
  • acquire image information of a target object; based on the image information, determine first coordinates of a three-dimensional frame corresponding to the target object in the world coordinate system; determine, according to the first coordinates of the three-dimensional frame in the world coordinate system, the center point of the target object in the world coordinate system; map the center point of the target object in the world coordinate system to the image coordinate system to obtain an image coordinate point; and control the electronic device to fly around the target object with the image coordinate point as the center point.
  • an embodiment of the present application provides an object focusing method, including:
  • acquire image information of a target object; based on the image information, determine first coordinates of a three-dimensional frame corresponding to the target object in the world coordinate system; determine, according to the first coordinates of the three-dimensional frame in the world coordinate system, the center point of the target object in the world coordinate system; map the center point of the target object in the world coordinate system to the image coordinate system to obtain an image coordinate point; and focus on the target object according to the image coordinate point.
  • an embodiment of the present application provides an object solving device, the device including: a storage component and a processing component; the storage component is used to store one or more computer instructions, and the one or more computer instructions are to be called by the processing component to execute any object calculation method provided in the embodiments of the present application.
  • an embodiment of the present application provides an electronic device.
  • the device includes a storage component and a processing component; the storage component is used to store one or more computer instructions, and the one or more computer instructions are to be called by the processing component to execute any method of flying around a point provided in the embodiments of the present application.
  • an embodiment of the present application provides an electronic device, the device including: a storage component and a processing component; the storage component is used to store one or more computer instructions, and the one or more computer instructions are to be called by the processing component; the processing component is used to:
  • acquire image information of a target object; based on the image information, determine the first coordinates of the three-dimensional frame corresponding to the target object in the world coordinate system; determine, according to the first coordinates of the three-dimensional frame in the world coordinate system, the center point of the target object in the world coordinate system; map the center point of the target object in the world coordinate system to the image coordinate system to obtain image coordinate points; and focus on the target object according to the image coordinate points.
  • the first coordinate in the world coordinate system of the three-dimensional frame corresponding to the target object can be determined based on the image information.
  • the three-dimensional box can express more three-dimensional information of the target object.
  • the three-dimensional box can also express the depth information of the target object.
  • the world coordinate system is a spatial coordinate system relative to the real world.
  • the center point in the world coordinate system can be mapped to the image coordinate system to realize the three-dimensional reconstruction of the coordinate system on the electronic device.
  • the image coordinate point may be the target point when the electronic device intelligently follows the target object.
  • the three-dimensional frame is closer to the actual shape of the target object, and the positioning of the center point of the target object through the three-dimensional space coordinate system is more accurate, so that more accurate tracking of the target point can be obtained, and the positioning of the intelligent flight is more accurate.
  • FIG. 1 is a flowchart of an embodiment of an object solving method provided by an embodiment of the application
  • FIG. 2 is an example diagram of a three-dimensional frame of a target object provided by an embodiment of this application.
  • FIG. 3 is an example diagram of a center point of a three-dimensional frame provided by an embodiment of this application;
  • FIG. 4 is a flowchart of another embodiment of an object solving method provided by an embodiment of this application;
  • FIG. 5 is a flowchart of an embodiment of a method for training a network prediction model provided by an embodiment of this application;
  • FIG. 6 is an example diagram of angle error prediction provided by an embodiment of the application.
  • FIGS. 7a-7b are diagrams of another example of angular error prediction provided by an embodiment of the application.
  • FIG. 8 is a flowchart of an embodiment of a method for flying around a point provided by an embodiment of the application;
  • FIG. 9 is a flowchart of another embodiment of a method for flying around a point according to an embodiment of the application.
  • FIG. 10 is a flowchart of an embodiment of an object focusing method provided by an embodiment of this application.
  • FIG. 11 is a schematic structural diagram of an embodiment of an object solving device provided by an embodiment of this application.
  • FIG. 12 is a schematic structural diagram of an embodiment of an electronic device provided by an embodiment of this application.
  • FIG. 13 is a schematic structural diagram of an embodiment of an electronic device provided by an embodiment of this application.
  • The word "if" as used herein can be interpreted as "when", "upon", "in response to determining", or "in response to recognizing".
  • Similarly, the phrase "if it is determined" or "if (a stated condition or event) is recognized" can be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is recognized", or "in response to recognizing (the stated condition or event)".
  • the embodiments of the present application can be applied to an intelligent flight control scene of an unmanned aerial vehicle.
  • the three-dimensional frame of the target object is obtained by analyzing the three-dimensional space of the target object, so as to obtain a more accurate center point of the target object in the three-dimensional space.
  • In these scenarios, the location of the target object needs to be predicted, that is, the center point of the target object needs to be obtained as the monitoring center point.
  • the 2-dimensional rectangular frame of the target object can be predicted, and the center of the 2-dimensional rectangular frame can be used as the center point of the target object.
  • Because the 2-dimensional rectangular frame can only express the length and height information of the target object, it lacks the depth information of the target object in space.
  • the center point determined by the 2-dimensional rectangular frame is not accurate enough, and errors are likely to occur when performing intelligent flight or fixed-point tracking.
  • In the embodiments of the present application, after the image information of the target object is obtained, the first coordinates of the three-dimensional frame corresponding to the target object in the world coordinate system can be determined based on the image information.
  • Compared with a two-dimensional frame, the three-dimensional frame can also express the depth information of the target object and describes more of its spatial information.
  • Therefore, the center point of the target object in the world coordinate system can be determined from the first coordinates of the three-dimensional frame in the world coordinate system, and that center point can then be mapped to the image coordinate system; the resulting image coordinate point is the center point used when the electronic device intelligently follows the target object.
  • the three-dimensional frame is closer to the actual shape of the target object, and the positioning of the center point of the target object through the three-dimensional space coordinate system is more accurate, so that more accurate tracking of the target point can be obtained, and the positioning of the intelligent flight is more accurate.
  • FIG. 1 it is a flowchart of an embodiment of an object solving method provided by an embodiment of this application.
  • the method may include the following steps:
  • the object calculation method provided in the embodiments of this application can be applied to electronic devices such as drones, mobile phones, handheld platforms, unmanned vehicles, etc.
  • the embodiments of this application do not impose any restrictions on the specific types of electronic devices.
  • the image information may be acquired by the camera of the electronic device for the target object.
  • The electronic device can be equipped with a photographing device, that is, a camera.
  • the camera in the electronic device is usually a monocular camera.
  • the electronic device can use the camera to take a two-dimensional picture of the target object.
  • the image information can include the taken two-dimensional picture.
  • the image information may also include shooting information such as a timestamp.
  • the electronic device may include a processor or a processing component to execute the object calculation method provided in the embodiments of the present application. It can be understood that the structure provided in the implementation of this application does not constitute a specific limitation on the electronic device. In some embodiments, the electronic device may include other components or a combination of components, and the components may be implemented in hardware, software, or a combination of software and hardware.
  • Target objects can include movable objects such as vehicles, ships, and airplanes, and can also include buildings, structures with a specific composition, large mechanical equipment, electronic equipment, and so on. In intelligent following scenes such as intelligent tracking flight or flying around a point, if the ratio of the distance between the electronic device and the target object to the volume of the target object exceeds a first threshold, any point in the space occupied by the target object can be regarded as the center point for intelligent flight; if that ratio is less than the first threshold, the center point of the target object should be used so that more accurate intelligent flight can be achieved.
  • the embodiments of the present application may be applicable to intelligent flight scenarios where the ratio of the distance between the electronic device and the target object to the volume of the target object is less than the first threshold.
  • In some embodiments, the embodiments of the present application may also be applicable to target objects whose volume is larger than a second threshold.
  • The first threshold and the second threshold can be set according to actual usage requirements. For example, when distance is measured in meters and volume in cubic meters, the first threshold may be set to 1 and the second threshold may be set to 10 cubic meters.
  • the target object may continuously move, and the electronic device may collect image information of the target object in a moving state to obtain the image information of the target object in real time.
  • In order to accurately locate the target object, the target object needs to be fully presented in the image information, and the center and size of the target object in the image information should be described with accurate data. Further, optionally, the target object may be located in the middle of the image, so that the target object in the image information can be accurately identified and analyzed.
  • Optionally, the method may further include: performing contour detection on the image information to identify the contour of the target object in the image information; based on the position of the contour in the image information, judging whether the image information meets the conditions of use; if the conditions are satisfied, determining, based on the image information, the first coordinates of the three-dimensional frame corresponding to the target object in the world coordinate system; and if they are not satisfied, outputting a prompt message indicating that the image information does not meet the conditions of use, as illustrated in the sketch below.
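  • As a minimal illustration of such a check (not the patented implementation), the sketch below uses OpenCV to find the largest contour and verify that it lies fully inside the frame and roughly in the middle of the image; the margin and centering tolerances are assumed values.

```python
import cv2
import numpy as np

def image_meets_use_conditions(image_bgr, margin=10, center_tol=0.25):
    """Rough usability check: the target's contour must not touch the image
    border (target fully presented) and its center must lie near the image center."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return False
    target = max(contours, key=cv2.contourArea)   # assume the largest contour is the target
    x, y, w, h = cv2.boundingRect(target)
    img_h, img_w = gray.shape
    fully_inside = (x > margin and y > margin and
                    x + w < img_w - margin and y + h < img_h - margin)
    cx, cy = x + w / 2.0, y + h / 2.0             # contour center in pixels
    centered = (abs(cx - img_w / 2) < center_tol * img_w and
                abs(cy - img_h / 2) < center_tol * img_h)
    return fully_inside and centered
```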
  • the three-dimensional frame of the target object is a three-dimensional rectangular frame.
  • The first coordinates of the three-dimensional frame corresponding to the target object in the world coordinate system may include the coordinate points, in the world coordinate system, of the eight vertices of the three-dimensional rectangular frame corresponding to the target object; that is, the first coordinates are composed of eight coordinate vertices in the world coordinate system.
  • the rectangular frame formed by connecting the 8 coordinate vertices can exactly surround the target object.
  • the 8 coordinate vertices of the target object 201 in FIG. 2 corresponding to the three-dimensional frame 202 are the first coordinates of the target object 201 in the world coordinate system OXYZ.
  • the determining, based on the image information, the first coordinates of the three-dimensional frame corresponding to the target object in the world coordinate system may include: based on a three-dimensional contour detection technology, identifying the three-dimensional frame of the target object in the image information in the world coordinates The first coordinate in the system.
  • the three-dimensional contour detection technology is a detection technology commonly used by those skilled in the art, and will not be repeated here.
  • the first coordinate may include 8 coordinate vertices located in the world coordinate system. These 8 coordinate vertices are connected to each other to form the three-dimensional frame of the target object.
  • The coordinate point of the center point of the three-dimensional frame in the world coordinate system can be taken as the center point of the target object in the world coordinate system.
  • In other words, the center point of the target object is the coordinate point of the center point of the three-dimensional frame in the world coordinate system.
  • the three-dimensional frame formed by connecting the 8 coordinate vertices of the first coordinate to each other is actually a cuboid, and the intersection point when the two diagonals of the cuboid intersect is the center point of the three-dimensional frame.
  • the diagonal of the body refers to the line connecting the two vertices that are not on the same side of the top and bottom of the rectangular parallelepiped.
  • As shown in FIG. 3, the body diagonal 301 and the body diagonal 302 are formed by connecting vertices of the upper and lower bottom surfaces of the three-dimensional box 300 that are not on the same side, and the intersection of these body diagonals is the center point of the three-dimensional box 300.
  • the coordinate point of the center point in the world coordinate system is the center point of the target object in the world coordinate system.
  • Compared with a two-dimensional box, which includes only the length and height information of the target object, the three-dimensional box also includes the depth information of the target object; it contains more spatial information of the target object, so the positioning of the center point of the target object is more accurate.
  • the method may further include: determining a target point when the electronic device intelligently follows the target object according to the image coordinate point.
  • The world coordinate system is the coordinate system of the target object in the real world, whereas in intelligent following scenes such as flying around a point, target tracking, tracking shooting, or automatic obstacle avoidance, the electronic device needs to use its own imaging system as the benchmark.
  • That is, the image coordinate system corresponding to the camera of the electronic device is used as the reference to achieve precise intelligent following of the target object. Therefore, it is necessary to map the center point of the target object in the world coordinate system to the image coordinate system to obtain the image coordinate point.
  • In this way, the target point used when the electronic device flies around the point or performs intelligent following flight with respect to the target object can be determined.
  • the image coordinate point of the target object can be used as the target point in the following process in the intelligent follow process.
  • the smart following may include smart tracking, tracking shooting, automatic obstacle avoidance, or flying around a point.
  • the smart tracking may specifically be that the electronic device locks the target object and tracks the target object as the target object moves, and the target object can be photographed during the tracking. Flying around a point can also be referred to as a point-of-interest circle. Specifically, the electronic device locks the target object and uses the center point of the target object as the center of the circle to fly around the point. The target object can be photographed during the flying around the point.
  • Optionally, mapping the center point of the target object in the world coordinate system to the image coordinate system to obtain the image coordinate point used as the center point when the electronic device intelligently flies with respect to the target object may include: determining the coordinate conversion relationship between the world coordinate system corresponding to the target object and the image coordinate system; mapping the center point of the target object in the world coordinate system to the image coordinate system based on the coordinate conversion relationship to obtain the image coordinate point; and using the image coordinate point as the center point when the electronic device intelligently flies with respect to the target object.
  • the coordinate conversion relationship may be formed based on the camera internal parameter matrix corresponding to the camera parameter of the electronic device, and the camera external parameter matrix corresponding to the camera coordinate system and the world coordinate system.
  • the distance conversion matrix T1 is determined in combination with the angle rotation matrix R in the embodiment shown in FIG. 4 and the camera view angle parameters of the electronic device.
  • The camera external parameter matrix is constructed based on the angle rotation matrix and the distance conversion matrix.
  • the distance between the target object and the electronic device may be set by the user when using the electronic device, or may be a default value set based on empirical data.
  • the first coordinate in the world coordinate system of the three-dimensional frame corresponding to the target object can be determined based on the image information.
  • the three-dimensional box can express more three-dimensional information of the target object.
  • the three-dimensional box can also express the depth information of the target object.
  • the world coordinate system is a spatial coordinate system relative to the real world.
  • the center point in the world coordinate system can be mapped to the image coordinate system to realize the three-dimensional reconstruction of the coordinate system on the electronic device.
  • the image coordinate point may be the target point when the electronic device intelligently follows the target object.
  • the three-dimensional frame is closer to the actual shape of the target object, and the positioning of the center point of the target object through the three-dimensional space coordinate system is more accurate, so that you can follow the target point more accurately, making the positioning of the intelligent flight more accurate.
  • The coordinate point of the center point of the three-dimensional frame in the world coordinate system can be obtained by solving with the first coordinates.
  • the first coordinates include eight coordinate vertices in the world coordinate system corresponding to the three-dimensional frame of the target object in the image information;
  • the determining the center point of the target object in the world coordinate system according to the first coordinates of the three-dimensional frame in the world coordinate system may include:
  • the eight coordinate points of the three-dimensional frame in the world coordinate system are used to perform center point calculation to obtain the center point of the target object in the world coordinate system.
  • Using the eight coordinate vertices of the three-dimensional frame in the world coordinate system to solve for the center point of the target object in the world coordinate system may include: performing the center point solution on the eight coordinate vertices in the world coordinate system to obtain the coordinate point of the center point of the three-dimensional frame in the world coordinate system, and determining that coordinate point as the center point of the target object in the world coordinate system.
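  • A minimal sketch of this center-point solution, assuming the eight vertices are given as an 8x3 array of world coordinates: for a cuboid, averaging the vertices is equivalent to taking the midpoint of a body diagonal, which is the center point.

```python
import numpy as np

def box_center_world(vertices_world):
    """vertices_world: (8, 3) array of the 3D frame's vertices in the world
    coordinate system. The mean of the 8 vertices of a cuboid equals the
    intersection of its body diagonals, i.e. the center point."""
    v = np.asarray(vertices_world, dtype=float)
    assert v.shape == (8, 3)
    return v.mean(axis=0)
```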
  • Neural network algorithm is a theoretical model that simulates human thinking mode, with strong nonlinear mapping ability and simulation ability.
  • a neural network algorithm may be used to predict the first coordinate of the target object in the world coordinate system.
  • The image information can be input into a trained network prediction model to quickly and accurately obtain the first coordinates of the target object in the world coordinate system.
  • the determining, based on the image information, the first coordinates of the three-dimensional frame corresponding to the target object in the world coordinate system may include:
  • the image information is input into a network prediction model, and the first coordinates of the three-dimensional frame corresponding to the target object in the world coordinate system are obtained by calculation.
  • the method may further include:
  • the determining the target point when the electronic device intelligently follows the target object according to the image coordinate point may include:
  • a target point when the electronic device intelligently follows the target object is determined.
  • the size data may include the length, width and/or height of the target object.
  • As shown in FIG. 4, which is a flowchart of another embodiment of an object solving method provided in an embodiment of this application, the method may include the following steps:
  • the target angle is the angle of the target object relative to the electronic device.
  • Optionally, determining the target angle of the target object relative to the electronic device may include:
  • recognizing the first coordinates, in the world coordinate system, of the three-dimensional frame of the target object in the image information;
  • recognizing, based on contour detection technology, the second coordinates of the two-dimensional frame of the target object in the image coordinate system;
  • determining the target angle between the target object and the electronic device according to the two-dimensional frame.
  • determining the target angle between the target object and the electronic device according to the two-dimensional frame of the target object in the image information may include: calculating the length and height of the two-dimensional frame of the target object in the image information; querying the length of the two-dimensional frame And the data direction table corresponding to the height and the object angle to obtain the target angle corresponding to the length and the height.
  • the aspect ratio of the two-dimensional frame is stored in the data direction table in association with the object angle, and different aspect ratios correspond to different object angles.
  • The data direction table may be constructed in advance based on the actual length and actual height of the target object: the lengths and heights corresponding to multiple two-dimensional frames are combined to determine the object angle corresponding to each two-dimensional frame. The length and height of the two-dimensional frame relative to the actual length and height of the target object can be counted under different object angles, and the length and height corresponding to each two-dimensional frame are stored in the data direction table in association with the corresponding object angle for easy query.
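  • The sketch below illustrates one possible form of such a data direction table, assuming it is keyed by the aspect ratio (length/height) of the two-dimensional frame; the ratios and angles listed are placeholder values, not data from the application.

```python
# Hypothetical data direction table: 2D-frame aspect ratio -> object angle (degrees).
# Entries are illustrative placeholders only.
DATA_DIRECTION_TABLE = [
    (0.5, 0.0),    # tall, narrow frame: target seen head-on
    (1.0, 30.0),
    (1.5, 60.0),
    (2.0, 90.0),   # long, low frame: target seen side-on
]

def lookup_target_angle(box_length, box_height):
    """Return the object angle whose stored aspect ratio is closest to the
    measured aspect ratio of the two-dimensional frame."""
    ratio = box_length / float(box_height)
    return min(DATA_DIRECTION_TABLE, key=lambda entry: abs(entry[0] - ratio))[1]
```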
  • the target angle is the angle of rotation between the target object and the electronic device, that is, the angle at which the target object needs to be rotated when the target object needs to be mapped from the world coordinate system to the image coordinate system of the electronic device.
  • Specifically, the target angle may include the horizontal rotation angle of the target object relative to the electronic device and the vertical rotation angle of the target object relative to the electronic device.
  • the image coordinates are the coordinate points where the first coordinates are mapped in the image coordinate system, and the coordinate points obtained by mapping the first coordinates to the image coordinate system are the image coordinate points.
  • the image coordinates corresponding to the first coordinates in the image coordinate system may be determined based on the coordinate operation.
  • the target distance is used to determine the following distance when the electronic device intelligently follows the target object.
  • the flight trajectory of the electronic device can be adjusted according to the distance between the camera of the electronic device and the target object, that is, the target distance.
  • the center point of the target object in the world coordinate system mapped to the coordinate point of the image coordinate system can be used as the center point of the flight.
  • the target distance is the distance between the electronic device and the target object, which can be used for intelligent flight control.
  • the target distance may be the flight distance when the electronic device performs an intelligent follow-up flight to the target object.
  • the target distance can be the circle radius when the electronic device flies around the point with respect to the target object.
  • The image coordinate point of the target object in the image coordinate system is the point of interest when flying around the point, that is, the center of the circle when flying around the target object.
  • 405: Determine the center point of the target object in the world coordinate system according to the first coordinates of the three-dimensional frame in the world coordinate system.
  • the method may further include: determining a target point when the electronic device intelligently follows the target object according to the image coordinate point.
  • Optionally, mapping the center point of the target object in the world coordinate system to the image coordinate system to obtain the image coordinate point used as the center point when the electronic device flies intelligently with respect to the target object may include: determining, according to the coordinate conversion relationship between the first coordinates in the world coordinate system and the image coordinates in the image coordinate system, the image coordinate point obtained by mapping the center point of the target object in the world coordinate system to the image coordinate system; and determining the image coordinate point as the target point when the electronic device intelligently follows the target object.
  • Optionally, the image information can be input into the network prediction model obtained by training, and the network prediction model predicts the first coordinates of the three-dimensional frame corresponding to the target object in the world coordinate system, the second coordinates of the corresponding two-dimensional frame in the image coordinate system, and the target angle of the target object relative to the electronic device.
  • the three-dimensional frame of the target object can describe posture information such as the position or angle of the target object in a more three-dimensional manner, and can contain more information of the target object in space than the two-dimensional frame.
  • The coordinate conversion relationship between the first coordinates in the world coordinate system and the image coordinates in the image coordinate system may be combined with the second coordinates and the target angle to calculate the distance between the target object and the electronic device, obtaining the target distance.
  • The distance between the target object and the camera of the electronic device is calculated from the various information of the target object in the image information, and this distance is used as the following distance, which can reduce the camera's shooting error for the target object and improve the shooting effect of the camera on the target object.
  • Because the neural network algorithm can contain different neural networks, it has strong nonlinear mapping and simulation capabilities.
  • the first coordinate, the second coordinate and the target angle can be directly predicted by the network prediction model .
  • Optionally, determining the first coordinates of the three-dimensional frame corresponding to the target object in the world coordinate system, the second coordinates of the corresponding two-dimensional frame in the image coordinate system, and the target angle of the target object relative to the electronic device may include:
  • using the network prediction model to predict the first coordinates of the three-dimensional frame corresponding to the target object in the world coordinate system, the second coordinates of the two-dimensional frame corresponding to the target object in the image coordinate system, and the target angle of the target object relative to the electronic device.
  • the prediction result of the network prediction model can be used alone, for example, only the first coordinate of the three-dimensional frame corresponding to the target object in the world coordinate system is used.
  • the network prediction model used in the implementation of this application is a trained network prediction model.
  • For the training process of the network prediction model, refer to the embodiment shown in FIG. 5; the specific training process and steps are described in detail there and will not be repeated here.
  • the distance between the target object and the camera of the electronic device in the conversion matrix is calculated to obtain the target distance.
  • There is a conversion matrix between the world coordinate system and the image coordinate system, and the coordinate points in the world coordinate system can be mapped to the image coordinate system through the conversion matrix.
  • the first coordinate in the world coordinate system is converted to the image coordinate in the image coordinate system through the conversion matrix.
  • the image coordinate system is a two-dimensional coordinate system when the electronic device corresponds to the camera to display an image, and can be used to display image pixels.
  • the center point of the target object in the world coordinate system is actually a point in the three-dimensional coordinate system, and the image coordinate point of the target object in the image coordinate system is actually a two-dimensional coordinate point.
  • the target object's coordinate point in the image coordinate system can be obtained through a series of coordinate mapping at the center point of the target object in the world coordinate system.
  • the center point of the world coordinate system and the image coordinate point of the image coordinate system are three-dimensional and two-dimensional matching points.
  • Optionally, determining the conversion matrix used when the first coordinates in the world coordinate system are converted to the image coordinates in the image coordinate system may include:
  • a conversion matrix when the first coordinate in the world coordinate system is converted to the image coordinate in the image coordinate system is constructed.
  • The coordinate points of the world coordinate system can be converted to the camera coordinate system through a rigid body transformation, that is, rotation and translation, and then converted to the image coordinate system.
  • R can be used to represent the angle rotation, that is, the angle rotation matrix
  • T can be used to represent the translation of the object, that is, the distance conversion matrix.
  • R and T can be called external camera parameters.
  • The camera external parameter matrix can be constructed according to the angle rotation matrix and the distance conversion matrix.
  • the camera external parameter matrix can be specifically:
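  • In the standard pinhole-camera convention (assumed here), the external parameter matrix takes the block form built from the rotation matrix R and the translation (distance conversion) vector T:

$$
[\,R \mid T\,] \;=\;
\begin{bmatrix}
r_{11} & r_{12} & r_{13} & t_x \\
r_{21} & r_{22} & r_{23} & t_y \\
r_{31} & r_{32} & r_{33} & t_z
\end{bmatrix}
$$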
  • The matrix corresponding to the rotation angle is the angle rotation matrix R, which can be obtained from the target angle, that is, from the rotation angle of the target object relative to the electronic device in the horizontal direction and the rotation angle of the target object relative to the electronic device in the vertical direction.
  • When the target object is translated from the world coordinate system to the camera coordinate system, the translation is related to the camera's own camera view angle parameter v and also to the rotation matrix R.
  • The camera internal parameters used are mainly the focal length of the camera.
  • the focal length of the camera may include the focal length fx of the camera in the horizontal direction and the focal length fy of the camera in the vertical direction.
  • the camera internal parameter matrix can be expressed as:
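  • In standard pinhole-camera form, using the quantities defined below, the intrinsic matrix can be written as:

$$
K \;=\;
\begin{bmatrix}
f_x & 0 & c_x \\
0 & f_y & c_y \\
0 & 0 & 1
\end{bmatrix}
$$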
  • fx is the focal length of the camera in the horizontal direction and fy is the focal length of the camera in the vertical direction; (cx, cy) is the coordinate point of the center of the two-dimensional frame formed by the second coordinates of the target object in the image coordinate system.
  • the second coordinate of the two-dimensional frame in the image coordinate system of the target object corresponding to the two-dimensional frame may include 4 coordinate points, and the center of the two-dimensional frame can be determined by the 4 coordinate points corresponding to the second coordinates.
  • the two-dimensional box is actually a rectangular box, and the intersection of two diagonal lines of the rectangular box is the center of the two-dimensional box.
  • the construction of the conversion matrix corresponding to the first coordinate in the world coordinate system to the image coordinate in the image coordinate system based on the camera internal parameter matrix, the angle rotation matrix, and the distance conversion matrix may include: determining the The camera external parameter matrix corresponding to the angle rotation matrix and the distance conversion matrix, based on the camera internal parameter matrix and the camera external parameter matrix, determine the corresponding conversion when the first coordinate in the world coordinate system is converted to the image coordinate in the image coordinate system matrix.
  • the corresponding conversion matrix can be expressed by the following formula:
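  • Under the convention assumed above, with K the camera internal parameter matrix and [R | T] the camera external parameter matrix, the conversion matrix is their product:

$$
M \;=\; K\,[\,R \mid T\,]
$$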
  • the calculating the distance between the target object in the conversion matrix and the camera of the electronic device based on the first coordinates and the image coordinates, and obtaining the target distance may include :
  • the distance between the target object and the camera of the electronic device in the conversion equation is solved to obtain the target distance.
  • Optionally, the conversion equation constructed by combining the first coordinates and the image coordinates may specifically be:
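  • Using the symbols defined below, a standard form of this conversion equation (assuming the pinhole projection model) is:

$$
s
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
\;=\;
K\,[\,R \mid T\,]
\begin{bmatrix} x_{\mathrm{obj}} \\ y_{\mathrm{obj}} \\ z_{\mathrm{obj}} \\ 1 \end{bmatrix}
$$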
  • s is the camera projection coefficient.
  • (u, v) are the image coordinates.
  • (x obj , y obj , z obj ) is the first coordinate.
  • The conversion matrix includes known quantities corresponding to the second coordinates and/or the target angle, as well as an unknown quantity corresponding to the distance between the target object and the electronic device. Since the first coordinates and the image coordinates are known, a conversion equation can be constructed in which the product of the first coordinates and the conversion matrix equals the image coordinates, and the unknown quantity in the conversion equation is solved to obtain the target distance. The unknown quantity in the camera external parameter matrix can be solved based on the first coordinates, the image coordinates, and the camera internal parameter matrix.
  • the solving the distance between the target object and the camera of the electronic device in the conversion equation, and obtaining the target distance may include:
  • An N-point perspective algorithm (Perspective-n-Point, PNP) is used to calculate the distance between the target object and the camera of the electronic device in the conversion equation to obtain the target distance.
  • the PNP algorithm can be used to solve the distance between the target object and the electronic device in the conversion equation to obtain the target distance.
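  • As an illustration only (not the patented solver), OpenCV's solvePnP implements a Perspective-n-Point solution; given the eight 3D-frame vertices (first coordinates), their eight projected image points, and the camera internal parameter matrix, it returns a rotation and translation whose translation norm can serve as the object-to-camera distance, assuming the world origin is placed at or near the target object.

```python
import cv2
import numpy as np

def estimate_target_distance(first_coords_world, image_points, camera_matrix):
    """first_coords_world: (8, 3) 3D-frame vertices in the world coordinate system.
    image_points: (8, 2) corresponding pixel coordinates in the image coordinate system.
    camera_matrix: (3, 3) intrinsic matrix K. Lens distortion is assumed to be zero."""
    object_pts = np.asarray(first_coords_world, dtype=np.float64)
    image_pts = np.asarray(image_points, dtype=np.float64)
    dist_coeffs = np.zeros(5)  # assumption: no lens distortion
    ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, camera_matrix, dist_coeffs)
    if not ok:
        raise RuntimeError("PnP solution failed")
    # Norm of the translation: camera-to-world-origin distance,
    # taken here as the target distance under the assumption above.
    return float(np.linalg.norm(tvec))
```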
  • In some embodiments, the target angle of the target object relative to the electronic device includes: a horizontal angle of the target object relative to the electronic device in the horizontal direction, and a vertical angle of the target object relative to the electronic device in the vertical direction.
  • the generating an angle rotation matrix corresponding to the coordinate axis of the world coordinate system and the coordinate axis of the camera coordinate system according to the target angle includes:
  • the horizontal angle and the vertical angle are input into a three-dimensional rotation matrix formula, and the angle rotation matrix corresponding to the coordinate axis of the world coordinate system and the coordinate axis of the camera coordinate system is calculated and obtained.
  • The three-dimensional rotation matrix formula may include a horizontal rotation matrix formula and a vertical rotation matrix formula.
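  • As one common convention (assumed here; the application's exact axis assignment may differ), with horizontal angle α and vertical angle β, the angle rotation matrix can be composed from a rotation about the vertical axis and a rotation about the horizontal axis:

$$
R_y(\alpha)=\begin{bmatrix}\cos\alpha & 0 & \sin\alpha\\ 0 & 1 & 0\\ -\sin\alpha & 0 & \cos\alpha\end{bmatrix},\quad
R_x(\beta)=\begin{bmatrix}1 & 0 & 0\\ 0 & \cos\beta & -\sin\beta\\ 0 & \sin\beta & \cos\beta\end{bmatrix},\quad
R = R_y(\alpha)\,R_x(\beta)
$$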
  • Optionally, inputting the image information into the network prediction model obtained by training, and obtaining the first coordinates of the three-dimensional frame corresponding to the target object in the world coordinate system, the second coordinates of the two-dimensional frame corresponding to the target object in the image coordinate system, and the target angle of the target object relative to the electronic device, may include:
  • At least one characteristic image of the image information is extracted.
  • At least one candidate frame of each feature image is extracted.
  • each candidate frame is represented by four coordinate points.
  • the number of the at least one target feature image is less than the number of the at least one feature image.
  • region feature extraction is performed on each target feature image to obtain at least one region feature corresponding to each target feature image.
  • the image information may correspond to at least one characteristic image, and each characteristic image may be used to describe characteristic information of the target object in the image information.
  • the extracting at least one characteristic image of the image information may include: inputting the image information into a basic characteristic extraction model to obtain at least one characteristic image.
  • The basic feature model may include models such as VGGNet (Visual Geometry Group network), ResNet (Residual Network), STN (Spatial Transformer Network), and FCN (Fully Convolutional Network).
  • the basic feature extraction model is mainly a neural network composed of different convolution kernels.
  • Said inputting the image information into the basic feature extraction model to obtain at least one feature image may specifically be: determining at least one convolution kernel of the basic feature extraction model, and performing convolution calculation on the image information with each convolution kernel, A characteristic image corresponding to each convolution kernel is obtained, and at least one characteristic image is obtained.
  • Different convolution kernels can be used to describe the characteristics of image information in different types, different scales and/or different directions. For example, the edge feature of the image information in a certain direction can be extracted.
  • the extracting at least one candidate frame of each characteristic image may include extracting at least one candidate frame of each characteristic image respectively.
  • the extracting at least one candidate frame of each feature image may include: inputting the at least one feature image into an RPN (Region Proposal Network) model, and extracting coordinate points of the candidate frame in the at least one feature image, At least one candidate frame corresponding to each feature image is obtained.
  • Each candidate frame is specifically expressed in the form of coordinate points.
  • Optionally, performing key image extraction on the at least one feature image to obtain the at least one target feature image may include: inputting the at least one feature image into a Light-Head R-CNN (lightweight-head R-CNN network) model to obtain at least one target feature image.
  • the light-head model can be used to reduce the complexity of at least one feature image. For example, when the feature image is an image corresponding to 3900 channels, after reducing the complexity through the light-head, the at least one target feature image can be reduced to 490 channels.
  • Each target feature image can correspond to at least one candidate frame, and different regional features in the target feature image can be extracted through the candidate frame, so that the candidate frame where the target object is located can be determined from the different regional features.
  • the performing region feature extraction on each target feature image based on at least one candidate frame corresponding to each target feature image to obtain at least one region feature corresponding to each target feature image may include: combining at least one target feature image and each At least one candidate frame input area detection algorithm corresponding to the target feature image obtains at least one area sensitive to the position of the target object.
  • The region detection algorithm may include Position-Sensitive ROI Pooling (position-sensitive candidate region pooling) and the like, which can be used to extract features corresponding to sensitive regions in the image.
  • The at least one regional feature corresponding to the at least one target feature image can be feature-fitted to obtain the prediction results corresponding to the target object.
  • Performing fully connected processing on the at least one regional feature corresponding to the at least one target feature image means inputting the at least one regional feature respectively corresponding to the at least one target feature image into a fully connected layer (FC, fully connected layers) for the fully connected processing.
  • the fully connected layer can perform non-linear combinations of at least one regional feature corresponding to at least one target feature, for example, weighting and/or linear transformation of multiple input target features, etc., to form a feature with a higher level of expression structure. That is, the first coordinates of the three-dimensional frame corresponding to the target object in the world coordinate system, the second coordinates of the two-dimensional frame corresponding to the target object in the image coordinate system, and the target angle of the target object relative to the electronic device are obtained.
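  • The sketch below is a highly simplified PyTorch-style skeleton of such a prediction head arrangement, not the patented network; the backbone, candidate-frame extraction, and region pooling stages are assumed to be available elsewhere. It only shows how a pooled region feature could be fed through fully connected layers to regress the 24 values of the 3D frame (8 vertices x 3), the 8 values of the 2D frame (4 vertices x 2), and the 2 target angles; the 490-channel feature size follows the channel count mentioned above.

```python
import torch
import torch.nn as nn

class BoxPredictionHead(nn.Module):
    """Toy prediction head: maps a pooled region feature vector to
    (a) 8 3D-frame vertices, (b) 4 2D-frame vertices, (c) 2 target angles."""
    def __init__(self, feature_dim=490):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(feature_dim, 256),
            nn.ReLU(inplace=True),
        )
        self.box3d = nn.Linear(256, 8 * 3)   # first coordinates (world coordinate system)
        self.box2d = nn.Linear(256, 4 * 2)   # second coordinates (image coordinate system)
        self.angle = nn.Linear(256, 2)       # horizontal and vertical target angles

    def forward(self, region_feature):
        h = self.fc(region_feature)
        return self.box3d(h), self.box2d(h), self.angle(h)

# Usage with a dummy pooled region feature (batch of 1, 490 channels):
head = BoxPredictionHead()
coords3d, coords2d, angles = head(torch.randn(1, 490))
```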
  • the network prediction model may be obtained through pre-training. That is, the model parameters of the network prediction model can be obtained through training sample training. Said inputting the image information into the network prediction model obtained by training, obtaining the first coordinates of the three-dimensional frame corresponding to the target object in the world coordinate system and the second coordinates of the two-dimensional frame corresponding to the target object in the image coordinate system And the target angle of the target object relative to the electronic device may specifically include: inputting image information into a pre-trained network prediction model with known parameters, and obtaining the first coordinates and the first coordinates of the three-dimensional frame corresponding to the target object in the world coordinate system. The target object corresponds to the second coordinates of the two-dimensional frame in the image coordinate system and the target angle of the target object relative to the electronic device.
  • the network prediction model may be a neural network model.
  • each calculation module in the embodiments of the present application By executing the calculation steps of each calculation module in the embodiments of the present application through each neuron calculation model of the network prediction model, it is possible to predict and obtain the first three-dimensional frame corresponding to the target object in the world coordinate system.
  • the parameter training process of the network prediction model is specifically described in detail in the embodiment shown in FIG. 5, and will not be repeated here.
  • the determining the image coordinates of the target object in the image coordinate system includes:
  • the first coordinates of the three-dimensional frame corresponding to the target object in the world coordinate system are mapped to the image coordinate system, and the image coordinates of the three-dimensional frame corresponding to the target object in the image coordinate system are obtained.
  • the mapping the first coordinates of the three-dimensional frame corresponding to the target object in the world coordinate system to the image coordinate system, and obtaining the image coordinates of the three-dimensional frame corresponding to the target object in the image coordinate system may include :
  • the first coordinates of the three-dimensional frame corresponding to the target object in the world coordinate system are mapped to the image coordinate system to obtain the image coordinates.
  • Optionally, determining the camera external parameter matrix corresponding to the coordinate conversion between the world coordinate system and the camera coordinate system may include: generating, according to the target angle, the angle rotation matrix R corresponding to the coordinate axes of the world coordinate system and the coordinate axes of the camera coordinate system;
  • determining the distance conversion matrix T1 according to the preset distance between the target object and the electronic device;
  • determining the camera external parameter matrix according to the angle rotation matrix R and the distance conversion matrix T1.
  • The camera external parameter matrix can then be expressed in terms of these quantities, where:
  • v is the camera view angle parameter;
  • d1 is the preset distance between the electronic device and the target object.
  • the manner of obtaining the angle rotation matrix R may be the same as the manner of obtaining R in the foregoing embodiment, and will not be repeated here.
  • the generating a camera internal parameter matrix according to the second coordinates and the focal length of the camera corresponding to the electronic device includes:
  • determining the center of the two-dimensional frame according to the second coordinates, and generating the camera internal parameter matrix according to the center of the two-dimensional frame and the focal length of the camera.
  • As shown in FIG. 5, which is a flowchart of an embodiment of a method for training a network prediction model provided by an embodiment of this application, the method may include the following steps:
  • each training image is marked with the first real coordinates of the target object in the training image corresponding to the three-dimensional frame in the world coordinate system, the second real coordinates of the target object corresponding to the two-dimensional frame in the image coordinate system, and The true angle of the target object relative to its electronic device.
  • The first real coordinates and the second real coordinates corresponding to each training image can be obtained by labeling: the at least one training image can be imported into CAD (Computer-Aided Design) software, the key points of the 3D frame and the key points of the 2D frame of the target object are labeled in CAD for each training image, and the CAD model is used to fit the first real coordinates, in the world coordinate system, of the key points corresponding to the 3D frame of the object and the second real coordinates of the key points corresponding to the 2D frame. The true angle of the target object relative to the electronic device can be obtained by measurement.
  • the key points of the 3D box can refer to the 8 vertices of the 3D box.
  • the key points of the 2D frame can refer to the 4 vertices of the 2D frame.
  • That is, the first real coordinates of the 8 vertices of the 3D frame in the world coordinate system, and the second real coordinates of the 4 vertices of the 2D frame, can be obtained by CAD fitting.
  • The trained network prediction model can then be used to predict, for the input image information, the first coordinates corresponding to the three-dimensional frame, the second coordinates corresponding to the two-dimensional frame, and the target angle of the target object relative to the electronic device.
  • 502: Construct a network prediction model.
  • 503: Take the first real coordinates, the second real coordinates, and the real angles marked on each training image as training targets, and use the at least one training image to train to obtain the model parameters of the network prediction model.
  • the embodiment of the present application provides a method for training a network prediction model.
  • the network prediction model can be obtained by pre-training, so that the trained network prediction model can be directly used when needed, which can improve calculation efficiency.
  • the network prediction model can be trained in real time to improve the timeliness of prediction of the network prediction model.
  • taking the first real coordinates, second real coordinates, and real angles of each training image as training targets, and using the at least one training image to train to obtain the model parameters of the network prediction model, may include: determining reference parameters of the network prediction model; inputting the at least one training image into the network prediction model corresponding to the reference parameters to obtain a prediction result for each training image; and calculating the training error of the network prediction model corresponding to the reference parameters based on the prediction result of each training image and its labeled first real coordinates, second real coordinates, and real angle;
  • if the training error satisfies the training constraint condition, determining that the reference parameters are the model parameters of the network prediction model;
  • if the training error does not satisfy the training constraint condition, adjusting the model parameters of the network prediction model based on the prediction result corresponding to each training image to obtain new reference parameters, and returning to the step of determining the reference parameters of the network prediction model to continue.
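A compact sketch of that loop: evaluate the model under the current reference parameters, measure the training error, stop when the constraint is met, otherwise adjust the parameters and repeat. The error function, the update rule, the toy model, and the stopping threshold are placeholders, not anything prescribed by this application.

```python
import numpy as np

def train(model_fn, params, images, labels, error_fn, update_fn,
          max_rounds=100, tolerance=1e-3):
    """Iterate: predict with reference parameters, measure the error, keep or adjust the parameters."""
    err = float("inf")
    for _ in range(max_rounds):
        predictions = [model_fn(params, img) for img in images]
        err = error_fn(predictions, labels)
        if err <= tolerance:                                  # training constraint satisfied
            return params, err
        params = update_fn(params, predictions, labels)       # new reference parameters
    return params, err

# Toy usage: fit a scalar p so that model(p, x) = p * x matches the labels.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
model_fn = lambda p, x: p * x
error_fn = lambda preds, labels: float(np.mean([(p - y) ** 2 for p, y in zip(preds, labels)]))
update_fn = lambda p, preds, labels: p - 0.05 * np.mean([(pr - y) * x for pr, y, x in zip(preds, labels, xs)])
print(train(model_fn, 0.0, xs, ys, error_fn, update_fn))
```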
  • the prediction result corresponding to each training image may include: in each training image, the target object corresponds to the first predicted coordinates of the three-dimensional frame in the world coordinate system, and the target object corresponds to the two-dimensional The second predicted coordinates of the frame in the image coordinate system and the predicted angle of the target object relative to its electronic device;
  • the calculation of the training error of the network prediction model corresponding to the reference parameters, based on the prediction result of each training image and its labeled first real coordinates, second real coordinates, and real angle, may include: determining the first coordinate error of each training image according to its first predicted coordinates and first real coordinates; determining the second coordinate error of each training image according to its second predicted coordinates and second real coordinates; determining the angle error of each training image according to its predicted angle and corresponding real angle; and determining the training error of the network prediction model for the at least one training image based on the first coordinate error, the second coordinate error, and the angle error of each training image.
  • the determining the training error of the network prediction model for the at least one training image based on the first coordinate error, the second coordinate error, and the angle error corresponding to each training image may include: determining a first error constituted by the first coordinate errors respectively corresponding to the at least one training image, a second error constituted by the second coordinate errors respectively corresponding to the at least one training image, and a third error constituted by the angle errors respectively corresponding to the at least one training image;
  • the first error, the second error, and the third error are weighted and summed to obtain the training error of the network prediction model for the at least one training image.
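A small sketch of that weighted sum, with arbitrary weights (the actual weighting is not specified here):

```python
import numpy as np

def training_error(first_coord_errors, second_coord_errors, angle_errors,
                   w1=1.0, w2=1.0, w3=1.0):
    """Weighted sum of the three per-batch error terms; the weights are illustrative."""
    e1 = float(np.sum(first_coord_errors))   # first error: 3D-frame coordinate errors over all images
    e2 = float(np.sum(second_coord_errors))  # second error: 2D-frame coordinate errors
    e3 = float(np.sum(angle_errors))         # third error: angle errors
    return w1 * e1 + w2 * e2 + w3 * e3

print(training_error([0.4, 0.2], [0.1, 0.3], [0.05, 0.0]))
```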
  • When the angle error of each training image is determined from the difference between the predicted angle and the real angle, the third error formed by the angle errors of the at least one training image may be obtained by inputting the angle error corresponding to each training image into an error calculation function to obtain the error loss corresponding to each training image, and calculating the sum of the error losses corresponding to the at least one training image to obtain the third error.
  • When the angle error of each training image is determined by the angular-region approach, the third error formed by the angle errors of the at least one training image may be obtained by counting the number of images marked as having an angle error in the at least one training image, and determining the third error based on the total number of the at least one training image and that count. For example, the ratio of the number of images with an angle error to the total number of images can be used to determine the third error. In some embodiments, the third error is exactly this ratio.
  • the determining the angle error of each training image according to the predicted angle corresponding to each training image and its corresponding real angle may include:
  • based on the angle difference between the predicted angle corresponding to each training image and its corresponding real angle, the angle error corresponding to each training image is determined.
  • the predicted angle of each training image is obtained based on the prediction of the network prediction model. There is a certain angle error between the predicted angle and the real angle.
  • the angle error can include the angle offset between the predicted angle and the real angle, that is, the angle difference between the two angles.
  • As shown in FIG. 6, the angle error between the predicted angle 601 and the real angle 602 is the included angle 603.
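A small helper for this included-angle form of the error, wrapping the difference into [0°, 180°]; a sketch rather than the exact loss used during training:

```python
def angle_error_deg(predicted_deg, real_deg):
    """Smallest included angle between the predicted and the real direction, in degrees."""
    diff = abs(predicted_deg - real_deg) % 360.0
    return min(diff, 360.0 - diff)

print(angle_error_deg(350.0, 10.0))   # 20.0
```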
  • As another possible implementation, the determining the angle error of each training image according to the predicted angle corresponding to each training image and its corresponding real angle may include: evenly dividing a circle into multiple angular regions; and determining, from the multiple angular regions, the target angular region in which the real angle corresponding to each training image is located;
  • if the predicted angle corresponding to any training image is located in the target angular region in which its real angle is located, determining that the training image does not have an angle error;
  • if the predicted angle corresponding to any training image is not located in the target angular region in which its real angle is located, determining that the training image has an angle error.
  • Taking 16 evenly divided angular regions as an example, in FIG. 7a the predicted angle 701 is located in the angular region 702 and the real angle 703 is located in the angular region 704; since the angular region 702 and the angular region 704 are not the same angular region, there is an angle error between the predicted angle 701 and the real angle 703, and it can be determined that the training image corresponding to the predicted angle 701 and the real angle 703 has an angle error.
  • In FIG. 7b, the predicted angle 705 and the real angle 707 are both located in the same angular region 706; in this case, it can be determined that the training image corresponding to the predicted angle 705 and the real angle 707 does not have an angle error.
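A sketch of the region-based check of FIGS. 7a-7b, assuming the circle is split into 16 equal regions as in the example, together with the ratio-style third error described earlier; the region count and the helper names are illustrative:

```python
def region_index(angle_deg, num_regions=16):
    """Index of the angular region (equal slices of the circle) that an angle falls into."""
    return int((angle_deg % 360.0) // (360.0 / num_regions))

def has_angle_error(predicted_deg, real_deg, num_regions=16):
    """True when the predicted angle is not in the target region of the real angle."""
    return region_index(predicted_deg, num_regions) != region_index(real_deg, num_regions)

def third_error(predicted_angles, real_angles, num_regions=16):
    """Ratio of images flagged with an angle error to the total number of images."""
    flags = [has_angle_error(p, r, num_regions) for p, r in zip(predicted_angles, real_angles)]
    return sum(flags) / len(flags)

print(third_error([10.0, 100.0, 200.0], [12.0, 130.0, 201.0]))  # one mismatch out of three
```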
  • the network prediction model predicts the prediction result of any input training image in the following manner:
  • at least one feature image of the training image is extracted;
  • at least one candidate frame of each feature image is extracted, where each candidate frame is represented by four coordinate points;
  • key image extraction is performed on the at least one feature image to obtain at least one target feature image, where the number of the at least one target feature image is less than the number of the at least one feature image;
  • based on the at least one candidate frame corresponding to each target feature image, region feature extraction is performed on each target feature image to obtain at least one region feature corresponding to each target feature image;
  • the at least one region feature respectively corresponding to the at least one target feature image is subjected to fully connected processing to obtain the prediction result corresponding to the training image.
  • the training image may correspond to at least one characteristic image, and each characteristic image may be used to describe characteristic information of the target object in the training image.
  • the extracting at least one feature image of the training image may include: inputting the training image into a basic feature extraction model to obtain at least one feature image.
  • the basic feature extraction model may include models such as VGGNet (Visual Geometry Group network), ResNet (Residual Network), STN (Spatial Transformer Network), and FCN (Fully Convolutional Networks).
  • the basic feature extraction model is mainly a neural network composed of different convolution kernels.
  • Said inputting the training image into the basic feature extraction model to obtain at least one feature image may specifically be: determining at least one convolution kernel of the basic feature extraction model, and performing convolution calculation on the training image with each convolution kernel to obtain a feature image corresponding to each convolution kernel, thereby obtaining at least one feature image.
  • Different convolution kernels can be used to describe the characteristics of the training images in different types, different scales and/or different directions. For example, the edge feature of the training image in a certain direction can be extracted.
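To make the directional-feature remark concrete, the snippet below convolves a toy image with a Sobel-like kernel that responds to horizontal edges; the kernel is only an example and is not one of the model's actual filters.

```python
import numpy as np
from scipy.signal import convolve2d

# Toy 6x6 image: dark upper half, bright lower half.
image = np.zeros((6, 6))
image[3:, :] = 1.0

# A Sobel-like kernel that responds to horizontal edges (intensity changes along the vertical direction).
kernel = np.array([[-1, -2, -1],
                   [ 0,  0,  0],
                   [ 1,  2,  1]], dtype=float)

feature_map = convolve2d(image, kernel, mode="same")
print(feature_map)   # strong responses along the row where dark meets bright
```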
  • the extracting at least one candidate frame of each characteristic image may include extracting at least one candidate frame of each characteristic image respectively.
  • the extracting at least one candidate frame of each feature image may include: inputting the at least one feature image into an RPN (Region Proposal Network) model, and extracting coordinate points of the candidate frame in the at least one feature image, At least one candidate frame corresponding to each feature image is obtained.
  • Each candidate frame is specifically expressed in the form of coordinate points.
  • the performing key image extraction on the at least one feature image to obtain the at least one target feature image may include: inputting the at least one feature image into a light-head (Light-Head R-CNN, a lightweight-head R-CNN network) model to obtain the at least one target feature image.
  • the light-head model can be used to reduce the complexity of at least one feature image. For example, when the feature image is an image corresponding to 3900 channels, after reducing the complexity through the light-head, the at least one target feature image can be reduced to 490 channels.
  • Each target feature image can correspond to at least one candidate frame, and different regional features in the target feature image can be extracted through the candidate frame, so that the candidate frame where the target object is located can be determined from the different regional features.
  • the performing region feature extraction on each target feature image based on the at least one candidate frame corresponding to each target feature image, to obtain the at least one region feature corresponding to each target feature image, may include: inputting the at least one target feature image and the at least one candidate frame corresponding to each target feature image into a region detection algorithm to obtain at least one region sensitive to the position of the target object.
  • the region detection algorithm may include Position-Sensitive ROI Pooling (position-sensitive candidate region pooling) and the like, which may be used to extract the features corresponding to position-sensitive regions in the image.
  • After the at least one region feature of each target feature image is obtained, feature fitting can be performed on the at least one feature region corresponding to the at least one target feature image based on those region features, so as to obtain the prediction result corresponding to the training image.
  • the subjecting the at least one region feature respectively corresponding to the at least one target feature image to fully connected processing means inputting the at least one region feature respectively corresponding to the at least one target feature image into a fully connected layer (FC, fully connected layers) for the fully connected processing.
  • the fully connected layer can perform a nonlinear combination of the at least one region feature respectively corresponding to the at least one target feature image, for example, weighting and/or linear transformation of the multiple input features, to form features with a higher-level expression structure, thereby obtaining the prediction result of the training image.
  • the prediction result of the training image may include the first predicted coordinates, in the world coordinate system, of the three-dimensional frame corresponding to the target object in the training image, the second predicted coordinates, in the image coordinate system, of the two-dimensional frame corresponding to the target object, and the predicted angle of the target object relative to the electronic device.
  • FIG. 8 is a flowchart of an embodiment of a method for flying around a point provided by an embodiment of this application.
  • the method may include the following steps:
  • 801: Acquire image information of the target object. 802: Based on the image information, determine the first coordinates, in the world coordinate system, of the three-dimensional frame corresponding to the target object. 803: Determine the center point of the target object in the world coordinate system according to the first coordinates of the three-dimensional frame in the world coordinate system. 804: Map the center point of the target object in the world coordinate system to the image coordinate system to obtain an image coordinate point. 805: Control the electronic device to perform flight processing around the target object with the image coordinate point as the center point.
  • the first coordinate in the world coordinate system of the three-dimensional frame corresponding to the target object can be determined based on the image information.
  • the three-dimensional box can express more three-dimensional information of the target object.
  • the three-dimensional box can also express the depth information of the target object.
  • the world coordinate system is a spatial coordinate system relative to the real world.
  • the center point in the world coordinate system can be mapped to the image coordinate system to realize the three-dimensional reconstruction of the coordinate system on the electronic device.
  • the image coordinate point may be the center point used when the electronic device performs intelligent flight with respect to the target object.
  • the three-dimensional frame is closer to the actual shape of the target object, and positioning the center point of the target object through the three-dimensional spatial coordinate system is more accurate, so the electronic device can be controlled to perform flight processing around the target object with the image coordinate point as the center point, making the positioning of intelligent flight more precise.
  • FIG. 9 is a flowchart of another embodiment of a method for flying around a point according to an embodiment of this application.
  • the method may include the following steps:
  • 905 Determine the center point of the target object in the world coordinate system according to the first coordinates of the three-dimensional frame in the world coordinate system.
  • 906 Map the center point of the target object in the world coordinate system to the image coordinate system to obtain image coordinate points.
  • the image information can be input into the network prediction model obtained by training, and the network prediction model predicts the first coordinates, in the world coordinate system, of the three-dimensional frame corresponding to the target object, the second coordinates, in the image coordinate system, of the two-dimensional frame corresponding to the target object, and the target angle of the target object relative to the electronic device.
  • the three-dimensional frame of the target object can describe posture information such as the position or angle of the target object in a more three-dimensional manner, and can contain more information of the target object in space than the two-dimensional frame.
  • after the image coordinates of the target object in the image coordinate system are determined, the coordinate conversion relationship between the first coordinates in the world coordinate system and the image coordinates in the image coordinate system may be combined with the second coordinates and the target angle to solve for the distance between the target object and the electronic device, obtaining the target distance.
  • the distance between the target object and the electronic device is solved from the various information about the target object in the image information, and this more comprehensive solution basis provides a more accurate solution result, so the electronic device can be controlled to perform flight processing around the target object with the image coordinate point as the center point and the target distance as the flight radius, achieving more accurate flight around the point.
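As a rough illustration of using the solved centre point and target distance for flight around the point, the sketch below lays out evenly spaced waypoints on a circle whose radius is the target distance; the waypoint count and the assumption of a level (fixed-height) orbit are arbitrary choices for the example.

```python
import numpy as np

def orbit_waypoints(center_xyz, radius, num_points=36):
    """Waypoints on a level circle of the given radius around the centre point (z kept constant)."""
    cx, cy, cz = center_xyz
    angles = np.linspace(0.0, 2.0 * np.pi, num_points, endpoint=False)
    return [(cx + radius * np.cos(a), cy + radius * np.sin(a), cz) for a in angles]

for wp in orbit_waypoints((12.0, -3.0, 8.0), radius=25.0, num_points=8):
    print(tuple(round(v, 2) for v in wp))
```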
  • FIG. 10 is a flowchart of an embodiment of an object focusing method provided by an embodiment of this application.
  • the method may include the following steps:
  • 1002 Based on the image information, determine the first coordinate in the world coordinate system of the three-dimensional frame corresponding to the target object.
  • 1003 Determine the center point of the target object in the world coordinate system according to the first coordinates of the three-dimensional frame in the world coordinate system.
  • 1004 Map the center point of the target object in the world coordinate system to the image coordinate system to obtain image coordinate points.
  • When focusing on the target object according to the image coordinate point, the electronic device may analyze the raw data through a data calculation method, take the image coordinate point as the center point of the target object, and calculate the number of moving steps of the camera's lens motor or the coil adjustment data, thereby completing focusing. By determining the coordinates of the center point of the target object in the image coordinate system and using them for the camera's autofocus, accurate focusing can be achieved and focusing precision improved.
  • FIG. 11 is a schematic structural diagram of an embodiment of an object solving device provided by an embodiment of this application.
  • the device may include: a storage component 1101 and a processing component 1102; the storage component 1101 is used to store one Or multiple computer instructions, the one or more computer instructions are used to be called by the processing component 1102;
  • the processing component 1102 can be used to:
  • acquire the image information of the target object; based on the image information, determine the first coordinates, in the world coordinate system, of the three-dimensional frame corresponding to the target object; determine the center point of the target object in the world coordinate system according to the first coordinates of the three-dimensional frame in the world coordinate system; and map the center point of the target object in the world coordinate system to the image coordinate system, where the obtained image coordinate point is the target point used when the electronic device intelligently follows the target object.
  • the processing component that determines the first coordinates of the three-dimensional frame corresponding to the target object in the world coordinate system based on the image information may specifically be:
  • the image information is input into a network prediction model, and the first coordinates of the three-dimensional frame corresponding to the target object in the world coordinate system are obtained by calculation.
  • the first coordinates include eight coordinate vertices in the world coordinate system corresponding to the three-dimensional frame of the target object in the image information;
  • the processing component that determines the center point of the target object in the world coordinate system according to the first coordinates of the three-dimensional frame in the world coordinate system may specifically be:
  • the eight coordinate points of the three-dimensional frame in the world coordinate system are used to perform center point calculation to obtain the center point of the target object in the world coordinate system.
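Since the eight vertices form a rectangular box, the intersection of the body diagonals coincides with the mean of the vertices, so the centre-point solution reduces to an average; a short sketch:

```python
import numpy as np

def box_center(vertices_world):
    """Centre of the 3D frame: the intersection of its body diagonals equals the mean of the 8 vertices."""
    v = np.asarray(vertices_world, dtype=float)   # shape (8, 3)
    return v.mean(axis=0)

vertices = [(-2, -1, 0), (2, -1, 0), (2, 1, 0), (-2, 1, 0),
            (-2, -1, 1.5), (2, -1, 1.5), (2, 1, 1.5), (-2, 1, 1.5)]
print(box_center(vertices))   # [0.   0.   0.75]
```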
  • the processing component determining, based on the image information, the first coordinates of the three-dimensional frame corresponding to the target object in the world coordinate system may specifically be:
  • based on the image information, determining the first coordinates, in the world coordinate system, of the three-dimensional frame corresponding to the target object, the second coordinates, in the image coordinate system, of the two-dimensional frame corresponding to the target object, and the target angle of the target object relative to the electronic device;
  • the processing component can also be used for:
  • determining the image coordinates of the three-dimensional frame corresponding to the target object in the image coordinate system; and, based on the coordinate conversion relationship between the first coordinates in the world coordinate system and the image coordinates in the image coordinate system, combining the second coordinates and the target angle to solve for the distance between the target object and the camera of the electronic device, obtaining the target distance.
  • the target distance is used to determine a flying radius of the electronic device when flying around a point with respect to the target object.
  • the processing component may also be used for: determining the size data of the target object according to the first coordinates of the three-dimensional frame in the world coordinate system;
  • the processing component determining the target point used when the electronic device intelligently follows the target object may specifically be:
  • determining the target point used when the electronic device intelligently follows the target object according to the image coordinate point and the size data of the target object.
  • the processing component determining, based on the image information, the first coordinates, in the world coordinate system, of the three-dimensional frame corresponding to the target object, the second coordinates, in the image coordinate system, of the two-dimensional frame corresponding to the target object, and the target angle of the target object relative to the electronic device may specifically be:
  • inputting the target image into the network prediction model to obtain the first coordinates, in the world coordinate system, of the three-dimensional frame corresponding to the target object, the second coordinates, in the image coordinate system, of the two-dimensional frame corresponding to the target object, and the target angle of the target object relative to the electronic device.
  • the processing component, based on the coordinate conversion relationship between the first coordinates in the world coordinate system and the image coordinates in the image coordinate system, combining the second coordinates and the target angle to solve for the distance between the target object and the camera of the electronic device and obtain the target distance, may specifically be:
  • using the second coordinates and the target angle, with the distance between the target object and the camera of the electronic device as an unknown, constructing the conversion matrix for converting the first coordinates in the world coordinate system into the image coordinates in the image coordinate system; and, based on the first coordinates and the image coordinates, solving for the distance between the target object and the camera of the electronic device in the conversion matrix to obtain the target distance.
  • the processing component using the second coordinates and the target angle, with the distance between the target object and the camera of the electronic device as an unknown, to construct the conversion matrix for converting the first coordinates in the world coordinate system into the image coordinates in the image coordinate system may specifically be:
  • generating, according to the target angle, the target rotation matrix corresponding to the coordinate axes of the world coordinate system and the coordinate axes of the camera coordinate system; with the distance between the target object and the camera of the electronic device as an unknown, combining the target rotation matrix and the camera viewing-angle parameter of the electronic device to construct the distance conversion matrix of the target object relative to the electronic device; and thereby constructing the conversion matrix for converting the first coordinates in the world coordinate system into the image coordinates in the image coordinate system.
  • the processing component, based on the first coordinates and the image coordinates, solving for the distance between the target object and the camera of the electronic device in the conversion matrix to obtain the target distance may specifically be:
  • taking the conversion matrix as the coordinate conversion relationship and combining the first coordinates and the image coordinates to construct a conversion equation; and solving for the distance between the target object and the camera of the electronic device in the conversion equation to obtain the target distance.
  • the processing component calculates the distance between the target object in the conversion equation and the camera of the electronic device, and obtaining the target distance may specifically be:
  • An N-point perspective algorithm (Perspective-n-Point, PNP) is used to calculate the distance between the target object and the camera of the electronic device in the conversion equation to obtain the target distance.
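A sketch of this step using OpenCV's solvePnP. The eight 3D vertices, their pixel locations, and the intrinsic matrix are toy values, and reading the distance off as the norm of the recovered translation vector is one reasonable interpretation of "the distance between the target object and the camera", not necessarily the exact quantity solved for here.

```python
import numpy as np
import cv2

# Toy correspondences: 8 vertices of the 3D frame (world) and their pixel locations (image).
object_points = np.array([
    [-2, -1, 0], [2, -1, 0], [2, 1, 0], [-2, 1, 0],
    [-2, -1, 1.5], [2, -1, 1.5], [2, 1, 1.5], [-2, 1, 1.5],
], dtype=np.float64)
image_points = np.array([
    [310, 260], [430, 262], [428, 210], [312, 208],
    [308, 180], [432, 182], [430, 140], [310, 138],
], dtype=np.float64)

K = np.array([[1200.0, 0.0, 360.0],
              [0.0, 1200.0, 240.0],
              [0.0, 0.0, 1.0]])

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None)
if ok:
    target_distance = float(np.linalg.norm(tvec))   # distance from the camera to the object-frame origin
    print("estimated target distance:", round(target_distance, 2))
```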
  • the target angle of the target object relative to the electronic device may include: a horizontal angle of the target object relative to the electronic device in the horizontal direction, and a vertical angle of the target object relative to the electronic device in the vertical direction.
  • the processing component generating, according to the target angle, the target rotation matrix corresponding to the coordinate axes of the world coordinate system and the coordinate axes of the camera coordinate system may specifically be:
  • the horizontal angle and the vertical angle are input into a three-dimensional rotation matrix formula, and a target rotation matrix corresponding to the coordinate axis of the world coordinate system and the coordinate axis of the camera coordinate system is obtained by calculation.
  • the processing component inputting the target image into the network prediction model obtained by training, and obtaining the first coordinates, in the world coordinate system, of the three-dimensional frame corresponding to the target object, the second coordinates, in the image coordinate system, of the two-dimensional frame corresponding to the target object, and the target angle of the target object relative to the electronic device, may specifically be:
  • extracting at least one feature image of the target image; extracting at least one candidate frame of each feature image, where each candidate frame is represented by four coordinate points; performing key image extraction on the at least one feature image to obtain at least one target feature image; performing region feature extraction on each target feature image based on the at least one candidate frame corresponding to each target feature image, to obtain at least one region feature corresponding to each target feature image; and subjecting the at least one region feature respectively corresponding to the at least one target feature image to fully connected processing, to obtain the first coordinates, the second coordinates, and the target angle.
  • the processing component may be trained to obtain a network prediction model in the following manner:
  • determining at least one training image, where each training image is marked with the first real coordinates, in the world coordinate system, of the three-dimensional frame corresponding to the target object in the training image, the second real coordinates, in the image coordinate system, of the two-dimensional frame corresponding to the target object, and the real angle of the target object relative to the electronic device;
  • taking the first real coordinates, the second real coordinates, and the real angles marked on each training image as training targets, the at least one training image is used to train to obtain the model parameters of the network prediction model.
  • the processing component taking the first real coordinates, the second real coordinates, and the real angles of each training image as training targets, and using the at least one training image to train to obtain the model parameters of the network prediction model, may specifically be: determining reference parameters of the network prediction model; inputting the at least one training image into the network prediction model corresponding to the reference parameters to obtain a prediction result for each training image; and calculating the training error of the network prediction model corresponding to the reference parameters based on the prediction result of each training image and its labeled first real coordinates, second real coordinates, and real angle;
  • if the training error satisfies the training constraint condition, determining that the reference parameters are the model parameters of the network prediction model;
  • if the training error does not satisfy the training constraint condition, adjusting the model parameters of the network prediction model based on the prediction result corresponding to each training image to obtain new reference parameters, and returning to the step of determining the reference parameters of the network prediction model to continue.
  • the prediction result corresponding to each training image includes: the first predicted coordinates, in the world coordinate system, of the three-dimensional frame corresponding to the target object in each training image, the second predicted coordinates, in the image coordinate system, of the two-dimensional frame corresponding to the target object, and the predicted angle of the target object relative to the electronic device.
  • the processing component calculating the training error of the network prediction model corresponding to the reference parameters, based on the prediction result of each training image and its labeled first real coordinates, second real coordinates, and real angle, may include: determining the first coordinate error of each training image according to its first predicted coordinates and first real coordinates; determining the second coordinate error of each training image according to its second predicted coordinates and second real coordinates; determining the angle error of each training image according to its predicted angle and corresponding real angle; and determining the training error of the network prediction model for the at least one training image based on the first coordinate error, the second coordinate error, and the angle error of each training image.
  • the processing component determining the angle error of each training image according to the predicted angle corresponding to each training image and its corresponding real angle may specifically be:
  • determining the angle error corresponding to each training image based on the angle difference between the predicted angle corresponding to each training image and its corresponding real angle.
  • the processing component determining the angle error of each training image according to the predicted angle corresponding to each training image and its corresponding real angle may also specifically be: evenly dividing a circle into multiple angular regions, and determining, from the multiple angular regions, the target angular region in which the real angle of each training image is located;
  • if the predicted angle corresponding to any training image is located in the target angular region in which its real angle is located, determining that the training image does not have an angle error;
  • if the predicted angle corresponding to any training image is not located in the target angular region in which its real angle is located, determining that the training image has an angle error.
  • the processing component predicts the prediction result corresponding to any training image input to the network prediction model in the following manner:
  • extracting at least one feature image of the training image; extracting at least one candidate frame of each feature image, where each candidate frame is represented by four coordinate points; performing key image extraction on the at least one feature image to obtain at least one target feature image; performing region feature extraction on each target feature image based on the at least one candidate frame corresponding to each target feature image, to obtain at least one region feature corresponding to each target feature image; and subjecting the at least one region feature respectively corresponding to the at least one target feature image to fully connected processing, to obtain the prediction result corresponding to the training image.
  • the determining the image coordinates of the target object in the image coordinate system includes:
  • the first coordinates of the three-dimensional frame corresponding to the target object in the world coordinate system are mapped to the image coordinate system, and the image coordinates of the three-dimensional frame corresponding to the target object in the image coordinate system are obtained.
  • the processing component mapping the first coordinates, in the world coordinate system, of the three-dimensional frame corresponding to the target object to the image coordinate system, and obtaining the image coordinates of the three-dimensional frame corresponding to the target object in the image coordinate system, may specifically be:
  • generating a camera internal parameter matrix according to the second coordinates and the focal length of the camera corresponding to the electronic device; determining the camera external parameter matrix used for coordinate conversion between the world coordinate system and the camera coordinate system; and mapping the first coordinates of the three-dimensional frame corresponding to the target object in the world coordinate system to the image coordinate system according to the camera internal parameter matrix and the camera external parameter matrix, to obtain the image coordinates.
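A sketch of the mapping itself: the 3D-frame vertices in the world coordinate system are pushed through an extrinsic transform and an intrinsic matrix, then divided by depth to obtain pixel coordinates; all matrix values below are toy numbers.

```python
import numpy as np

def project_points(points_world, K, R, T):
    """Map world points to image pixels: x_cam = R . X + T, then perspective divide through K."""
    pts = np.asarray(points_world, dtype=float)        # (N, 3)
    cam = (R @ pts.T).T + T                            # camera-frame coordinates
    uvw = (K @ cam.T).T                                # homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]                    # divide by depth

K = np.array([[1200.0, 0.0, 360.0], [0.0, 1200.0, 240.0], [0.0, 0.0, 1.0]])
R = np.eye(3)
T = np.array([0.0, 0.0, 30.0])                         # object placed 30 m in front of the camera
box = np.array([[-2, -1, 0], [2, -1, 0], [2, 1, 0], [-2, 1, 0],
                [-2, -1, 1.5], [2, -1, 1.5], [2, 1, 1.5], [-2, 1, 1.5]], dtype=float)
print(project_points(box, K, R, T).round(1))
```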
  • the processing component generating the camera internal parameter matrix according to the second coordinates and the focal length of the camera corresponding to the electronic device may specifically be:
  • determining the image center point of the second coordinates in the image coordinate system, and generating the camera internal parameter matrix according to the image center point and the focal length of the camera corresponding to the electronic device.
  • the object solving device described in FIG. 11 can execute the object solving method described in any of the foregoing embodiments, and its implementation principles and technical effects will not be described in detail.
  • the specific manner of operations performed by the processing component of the object solving device in the foregoing embodiment has been described in detail in the embodiment related to the method, and will not be elaborated here.
  • FIG. 12 is a schematic structural diagram of an embodiment of an electronic device provided by an embodiment of this application; the device may include: a storage component 1201 and a processing component 1202; the storage component 1201 is used to store one or more computer instructions, and the one or more computer instructions are used to be called by the processing component 1202;
  • the processing component 1202 can be used for:
  • acquire the image information of the target object; based on the image information, determine the first coordinates, in the world coordinate system, of the three-dimensional frame corresponding to the target object; determine the center point of the target object in the world coordinate system according to the first coordinates of the three-dimensional frame in the world coordinate system; map the center point of the target object in the world coordinate system to the image coordinate system to obtain the image coordinate point; and control the electronic device to perform flight processing around the target object with the image coordinate point as the center point.
  • the processing component determining, based on the image information, the first coordinates of the three-dimensional frame corresponding to the target object in the world coordinate system may specifically be: based on the image information, determining the first coordinates, in the world coordinate system, of the three-dimensional frame corresponding to the target object, the second coordinates, in the image coordinate system, of the two-dimensional frame corresponding to the target object, and the target angle of the target object relative to the electronic device;
  • the processing component can also be used for: determining the image coordinates of the first coordinates in the image coordinate system; and, based on the coordinate conversion relationship between the first coordinates in the world coordinate system and the image coordinates in the image coordinate system, combining the second coordinates and the target angle to solve for the distance between the target object and the camera of the electronic device, obtaining the target distance;
  • the processing component to control the electronic device to perform flight processing around the target object with the image coordinate point as the center point may specifically be:
  • the electronic device is controlled to perform flight processing around the target object with the image coordinate point as the center point and the target distance as the flying radius.
  • the electronic device described in FIG. 12 can execute the method of flying around a point described in any of the foregoing embodiments, and its implementation principles and technical effects will not be described in detail.
  • the specific manner of operations performed by the processing component of the electronic device in the foregoing embodiment has been described in detail in the embodiment of the method, and detailed description will not be given here.
  • FIG. 13 is a schematic structural diagram of an embodiment of an electronic device provided by an embodiment of this application.
  • the device may include: a storage component 1301 and a processing component 1302; the storage component 1301 is used to store one or Multiple computer instructions, the one or more computer instructions are used to be called by the processing component 1302;
  • the processing component 1302 can be used for:
  • acquire the image information of the target object; based on the image information, determine the first coordinates, in the world coordinate system, of the three-dimensional frame corresponding to the target object; determine the center point of the target object in the world coordinate system according to the first coordinates of the three-dimensional frame in the world coordinate system; map the center point of the target object in the world coordinate system to the image coordinate system to obtain the image coordinate point; and focus on the target object according to the image coordinate point.
  • When focusing on the target object according to the image coordinate point, the electronic device may analyze the raw data through a data calculation method, take the image coordinate point as the center point of the target object, and calculate the number of moving steps of the camera's lens motor or the coil adjustment data, thereby completing focusing.
  • the device embodiments described above are merely illustrative.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in One place, or it can be distributed to multiple network units.
  • Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments. Those of ordinary skill in the art can understand and implement it without creative work.
  • each implementation manner can be implemented by adding a necessary general hardware platform, and of course, it can also be implemented by a combination of hardware and software.
  • the above technical solution, in essence or the part that contributes to the prior art, can be embodied in the form of a computer product; this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including the instruction device.
  • the device implements the functions specified in one process or multiple processes in the flowchart and/or one block or multiple blocks in the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing, and the instructions executed on the computer or other programmable equipment thereby provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • the electronic device includes one or more processors (CPU), input/output interfaces, network interfaces, and memory.
  • the memory may include non-permanent memory in a computer readable medium, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Geometry (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)

Abstract

An object solving method, a point-of-interest orbit flight method, and devices. The object solving method includes: acquiring image information of a target object (step 101); based on the image information, determining first coordinates, in a world coordinate system, of a three-dimensional frame corresponding to the target object (step 102); determining a center point of the target object in the world coordinate system according to the first coordinates of the three-dimensional frame in the world coordinate system (step 103); mapping the center point of the target object in the world coordinate system to an image coordinate system to obtain an image coordinate point (step 104); and determining, according to the image coordinate point, a target point used when an electronic device intelligently follows the target object. The object solving method improves the accuracy of object position prediction.

Description

对象解算、绕点飞行方法及设备 技术领域
本申请涉及智能控制技术领域,尤其涉及一种对象解算、绕点飞行方法及设备。
背景技术
随着无人机、手持云台、手机、车载设备等电子设备的兴起,针对建筑物、车辆、船舶或飞行器等目标对象的智能跟随飞行或者绕点飞行等智能飞行技术逐渐被广大用户所喜爱。通常,电子设备需要获知目标对象的位置,并通过该位置实现针对目标对象的智能飞行。
目前,为了获得目标对象的位置,电子设备中装配的单目摄像头通常可以采集目标对象2维的图像信息,并提取图像信息中目标对象所形成的2维矩形框四个顶点在世界坐标系中的坐标点,并根据2维矩形框中心所在的坐标点,确定对目标对象的进行智能飞行时的中心点。
但是,由于二维框对目标对象的空间信息描述较少,因此利用目标对象对应2维矩形框的中心点确定的跟随时的中心点存在较大误差,进行智能跟随飞行或者绕点飞行时对目标对象的定位不够准确。
发明内容
有鉴于此,本申请实施例提供一种对象解算、绕点飞行方法及设备,通过三维坐标框对目标对象进行中心点解算以提高对象定位准确度,实现高质量的智能飞行。
第一方面,本申请实施例提供一种对象解算方法,所述方法包括:
获取目标对象的图像信息;基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标;根据所述三维框在所述世界坐标系的第一坐标,确定所述目标对象在所述世界坐标系的中心点;将所述目标对象在世界坐标系的中心点映射到图像坐标系,获得图像坐标点。
第二方面,本申请实施例提供一种绕点飞行方法,所述方法包括:
获取目标对象的图像信息;基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标;根据所述三维框在所述世界坐标系的第一坐标,确定所述目标对象在所述世界坐标系的中心点;将所述目标对象在世界坐标系的中心点映射到图像坐标系,获得图像坐标点;控制电子设备以所述图像坐标点为中心点绕所述目标对象执行飞行处理。
第三方面,本申请实施例提供一种对象对焦方法,包括:
获取目标对象的图像信息;基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标;根据所述三维框在所述世界坐标系的第一坐标,确定所述目标对象 在所述世界坐标系的中心点;将所述目标对象在世界坐标系的中心点映射到图像坐标系,获得图像坐标点;根据所述图像坐标点,对所述目标对象进行对焦。
第四方面,本申请实施例提供一种对象解算设备,所述设备包括:存储组件以及处理组件;所述存储组件用于存储一条或多条计算机指令,所述一条或多条计算机指令用于被所述处理组件调用,以执行本申请实施例所提供的任一对象解算方法。
第五方面,本申请实施例提供一种电子设备,所述设备包括:存储组件以及处理组件;所述存储组件用于存储一条或多条计算机指令,所述一条或多条计算机指令用于被所述处理组件调用,以执行本申请实施例所提供的任一绕点飞行方法。
第六方面,本申请实施例提供一种电子设备,所述设备包括:存储组件以及处理组件;所述存储组件用于存储一条或多条计算机指令,所述一条或多条计算机指令用于被所述处理组件调用;所述处理组件用于:
获取目标对象的图像信息;基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标;根据所述三维框在所述世界坐标系的第一坐标,确定所述目标对象在所述世界坐标系的中心点;将所述目标对象在世界坐标系的中心点映射到图像坐标系,获得图像坐标点;根据所述图像坐标点,对所述目标对象进行对焦。
本申请实施例中,获取目标对象的图像信息之后,可以基于该图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标。与二维框相比,三维框能够表达目标对象更多的立体信息,除二维框已表达的长度以及高度信息之外,三维框还能表达目标对象的深度信息。通过三维框在世界坐标系中的第一坐标,可以确定目标对象在世界坐标系的中心点。世界坐标系是相对真实世界的空间坐标系,为了对目标对象执行智能飞行,可将位于世界坐标系中的中心点映射到图像坐标系,以实现坐标系在电子设备上的三维重建,获得的图像坐标点即可以是电子设备针对目标对象进行智能跟随时的目标点。三维框与目标对象的实际形状更接近,通过三维空间坐标系对目标对象的中心点的定位更精确,从而可以获得更准确的跟随目标点,使得智能飞行的定位更精准。
附图说明
为了更清楚地说明本申请实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作一简单地介绍,显而易见地,下面描述中的附图是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为本申请实施例提供的一种对象解算方法的一个实施例的流程图;
图2为本申请实施例提供的一种目标对象的三维框的示例图;
图3为本申请实施例提供的一种三维框中心点示例图;
图4为本申请实施例提供的一种对象解算方法的又一个实施例的流程图;
图5为本申请实施例提供的一种网络预测模型的训练方法的一个实施例的流程图;
图6为本申请实施例提供的一个角度误差预测示例图;
图7a~7b为本申请实施例提供的又一个角度误差预测示例图;
图8为本申请实施例提供的一种绕点飞行方法的一个实施例的流程图;
图9为本申请实施例提供的一种绕点飞行方法的又一个实施例的流程图;
图10为本申请实施例提供的一种对象对焦方法的一个实施例的流程图;
图11为本申请实施例提供的一种对象解算设备的一个实施例的结构示意图;
图12为本申请实施例提供的一种电子设备的一个实施例的结构示意图;
图13为本申请实施例提供的一种电子设备的一个实施例的结构示意图。
具体实施方式
为使本申请实施例的目的、技术方案和优点更加清楚,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
在本申请实施例中使用的术语是仅仅出于描述特定实施例的目的,而非旨在限制本申请。在本申请实施例和所附权利要求书中所使用的单数形式的“一种”、“所述”和“该”也旨在包括多数形式,除非上下文清楚地表示其他含义,“多种”一般包含至少两种,但是不排除包含至少一种的情况。
应当理解,本文中使用的术语“和/或”仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系。
取决于语境,如在此所使用的词语“如果”、“若”可以被解释成为“在……时”或“当……时”或“响应于确定”或“响应于识别”。类似地,取决于语境,短语“如果确定”或“如果识别(陈述的条件或事件)”可以被解释成为“当确定时”或“响应于确定”或“当识别(陈述的条件或事件)时”或“响应于识别(陈述的条件或事件)”。
还需要说明的是,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的商品或者系统不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种商品或者系统所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的商品或者系统中还存在另外的相同要素。
本申请实施例可以应用于无人机的智能飞行控制场景中,通过对目标对象进行三维空间分析,获得目标对象的三维框,以获得目标对象在三维空间更精准的中心点。
现有技术中,对汽车、轮船等体积较大的目标对象进行绕点飞行、定点跟随飞行或者监控飞行时,需要预测目标对象所在位置,也就是需要获得目标对象的中心点,作为监控的中心点。通常,可以预测目标对象的2维矩形框,再将2维矩形框的中心作为目标对象的中心点。但是,这种预测方式由于2维矩形框仅能表达目标对象的长度以及高度信息,缺乏对目标对象空间的深度信息。通过2维矩形框确定的中心点不够精确,在执行智能飞行或者定点追踪时容易出现误差。
为了解决目标对象中心点定位不准而导致的绕点飞行、智能追踪等智能跟随时出现的误差,本申请实施例中,获取目标对象的图像信息之后,可以基于所述图像信息,确定该目标对象三维框在世界坐标系中的第一坐标。三维框除能够表达目标对象的长度以及高度信息之外,还可以表达目标对象的深度信息,对目标对象的空间信息描述更多,因此,通过目标对象对应三维框在世界坐标系中的第一坐标可以确定目标对象在世界坐标系的中心点,进而可以将目标对象在世界坐标系的中心点映射到图像坐标系中,获得的图像坐标点为所述电子设备针对目标对象智能跟随时的中心点。三维框与目标对象的实际形状更接近,通过三维空间坐标系对目标对象的中心点的定位更精确,从而可以获得更准确的跟随目标点,使得智能飞行的定位更精准。
下面将结合附图对本申请实施例进行详细描述。
如图1所示,为本申请实施例提供的一种对象解算方法的一个实施例的流程图,所述方法可以包括以下几个步骤:
101:获取目标对象的图像信息。
本申请实施例所提供的对象解算方法可以应用于无人机、手机、手持云台、无人驾驶车等电子设备上,本申请实施例对电子设备的具体类型不作任何限制。
图像信息可以是电子设备的相机针对目标对象采集获得。电子设备中可以配置有相机,也即摄像头,电子设备中的相机通常为单目摄像头,电子设备可以利用相机拍摄针对目标对象的二维图片,图像信息可以包括拍摄的二维图片,在一些实施例中,图像信息还可以包括时间戳等拍摄信息。
电子设备中可以包括处理器或者处理组件以执行本申请实施例提供的对象解算方法。可以理解的是,本申请实施提供的结构并不构成对电子设备的具体限定。在一些实施例中,电子设备可以包括其他部件或者部件的组合,部件可以以硬件、软件或者软件和硬件组合实现。
目标对象可以包括车辆、船舶、飞机等可以移动的对象,也可以包括建筑物、具有特定组合的结构装置、大型机械设备或者电子设备等。由于在智能追踪飞行或者绕点飞行等智能跟随场景中,如果电子设备与目标对象的距离与目标对象的体积的比值超过第一阈值时,可以将目标对象在空间中的任意一点作为目标对象的智能飞行时的中心点;而如果电子设备与目标对象的距离与目标对象的体积的比值小于第一阈值时,可以将目标对象的中心点作为目标对象,以获得实现更精确的智能飞行,因此,本申请实施例可以适用于电子设备与目标对象的距离与目标对象的体积的比值小于第一阈值的智能飞行场景。
此外,在电子设备与目标对象的距离与目标对象的体积的比值计算过程中,目标对象的体积越大,该比值越小,因此,本申请实施例还可以适用于目标对象体积大于第二阈值的对象。
其中,第一阈值以及第二阈值可以根据实际的使用需要设置。例如,在距离单位为米,体积单位为立方米时,第一阈值可以设置为1。第二阈值可以设置为10立方米。
在一些实施例中,目标对象可以不断移动,电子设备可以采集处于移动状态的目标对象的图像信息,以实时获取目标对象的图像信息。
102:基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标。
为了对目标对象进行准确进行定位,需要目标对象在图像信息中完全呈现,以更准确的数据描述图像信息中的目标对象的中心以及大小。进一步,可选地,目标对象可以位于图像中间,以便于能够准确识别图像信息中的目标对象,从而对图像信息中的目标对象进行分析。因此,所述方法还可以包括:对所述图像信息进行轮廓检测,识别目标对象在图像信息中的轮廓;基于所述轮廓在图像信息中的位置,以判断所述图像信息是否满足使用条件;如果满足,基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标;如果不满足,可以输出图像信息不满足使用条件的提示信息,以提示对图像信息进行准确提示。
可选地,目标对象的三维框是三维矩形框。目标对象对应三维框在世界坐标系的第一坐标可以包括目标对象对应三维矩形框的8个顶点在世界坐标系中的坐标点,也即第一坐标是由位于世界坐标系中的8个坐标顶点构成,在世界坐标系中,所述8个坐标顶点相互连接形成的矩形框恰好能够完全包围所述目标对象。为了便于理解,图2中的目标对象201对应三维框202的8个坐标顶点即为目标对象201在世界坐标系OXYZ的第一坐标。
所述基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标可以包括:基于三维轮廓检测技术,识别所述目标对象在所述图像信息中的三维框在世界坐标系中的第一坐标。所述三维轮廓检测技术为本领域技术人员常用的检测技术,在此不再赘述。
103:根据所述三维框在所述世界坐标系的第一坐标,确定所述目标对象在所述世界坐标系的中心点。
第一坐标可以包括8个位于世界坐标系的坐标顶点,这8个坐标顶点相互连接即构成目标对象的三维框,三维框的中心点在世界坐标系的坐标点可以为目标对象在世界坐标系的中心点。
在实际应用场景中,第一坐标的8个坐标顶点相互连接形成的三维框实际为一个长方体,该长方体的两条体对角线相交时的交点为所述三维框的中心点。体对角线是指连接长方体上下底面的不在同一侧面的两顶点的连线。
为了便于理解,如图3所示的三维框300,该三维框300的上下两个底面不在同一侧面的两个顶点的连线形成的体对角线301以及体对角线302,这两个体对角线的交点即为三维框300的中心点,该中心点在世界坐标系中的坐标点即为目标对象在世界坐标系的中心点。
相比于二维框包括目标对象的长度以及高度信息,三维框还包括了目标对象的深度信息,包含目标对象更多的空间信息,对目标对象的中心点的定位更加准确,可以获得更精准的中心点。
104:将所述目标对象在世界坐标系的中心点映射到图像坐标系,获得的图像坐标点。
所述方法还可以包括:根据所述图像坐标点,确定所述电子设备针对所述目标对象进行智能跟随时的目标点。
世界坐标系为目标对象在真实世界的坐标系,而电子设备在绕点飞行、目标追踪、跟踪拍摄、或者自动避障等智能跟随场景中,需要以电子设备的成像体系为基准,也即以电子设备的相机对应图像坐标系为基准,实现对目标对象的精准智能跟随。因此,需要目标对象的在世界坐标系的中心点映射到图像坐标系,获得图像坐标点。可以根据图像坐标点,确定电子设备针对目标对象进行绕点飞行或者智能跟随飞行等智能跟随的目标点。智能跟随过程中可以将目标对象的图像坐标点作为跟随过程中的目标点。
可选地,所述智能跟随可以包括智能追踪、跟踪拍摄、自动避障或者绕点飞行。所述智能追踪具体可以是电子设备锁定目标对象,并随着目标对象的移动而进行追踪,追踪期间可以对目标对象进行拍摄。绕点飞行,也可称为兴趣点环绕,具体可以是电子设备锁定目标对象,并将目标对象的中心点作为环绕中心而进行绕点飞行,绕点飞行期间可以对目标对象进行拍摄。
在某些实施例中,所述将所述目标对象在世界坐标系的中心点映射到图像坐标系,获得的图像坐标点为所述电子设备针对所述目标对象智能飞行时的中心点可以包括:确定目标对象对应世界坐标系与图像坐标系之间的坐标转换关系;基于所述坐标转换关系将所述目标对象在世界坐标系的中心点映射到图像坐标系,获得图像坐标点;将所述图像坐标点作为所述电子设备针对所述目标对象智能飞行时的中心点。
其中,坐标转换关系可以基于电子设备的相机参数对应的相机内参矩阵,以及相机坐标系与世界坐标系之间对应的相机外参矩阵构成。
相机内参矩阵的获取方式具体可以参考其他实施例的相机内参矩阵的获取方式。基于设置的目标对象与电子设备之间的距离,结合图4所示实施例中的所述角度旋转矩阵R以及所述电子设备的相机视角参数,确定的距离转换矩阵T1。相机外参矩阵
$$\begin{bmatrix} R & T_1 \\ 0 & 1 \end{bmatrix}$$
是基于角度旋转矩阵以及距离转换矩阵构建获得。
目标对象与电子设备之间的距离可以是用户在使用电子设备时设置,也可以是根据经验数据设置的默认值。
本申请实施例中,获取目标对象的图像信息之后,可以基于该图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标。与二维框相比,三维框能够表达目标对象更多的立体信息,除二维框已表达的长度以及高度信息之外,三维框还能表达目标对象的深度信息。通过三维框在世界坐标系中的第一坐标,可以确定目标对象在世界坐标系的中心点。世界坐标系是相对真实世界的空间坐标系,为了对目标对象执行智能飞行,可将位于 世界坐标系中的中心点映射到图像坐标系,以实现坐标系在电子设备上的三维重建,获得的图像坐标点即可以是电子设备针对目标对象进行智能跟随时的目标点。三维框与目标对象的实际形状更接近,通过三维空间坐标系对目标对象的中心点的定位更精确,从而可以获得更准确的跟随目标点点,使得智能飞行的定位更精准。
其中,三维框的中心点在世界坐标系中的坐标点可以利用第一坐标解算获得。作为一个实施例,所述第一坐标包括所述目标对象在图像信息中对应三维框在所述世界坐标系的八个坐标顶点;
所述根据所述三维框在所述世界坐标系的第一坐标,确定所述目标对象在所述世界坐标系的中心点可以包括:
利用所述三维框在世界坐标系的八个坐标点进行中心点解算,获得所述目标对象在所述世界坐标系的中心点。
所述利用三维框在世界坐标系的八个坐标点进行中心点解算,计算获得目标对象在世界坐标系的中心点可以包括:利用三维框在世界坐标系的八个坐标顶点进行中心点解算,获得所述三维框的中心点在世界坐标系的坐标点,确定所述三维框的中心点在世界坐标系的坐标点为所述目标对象在世界坐标系的中心点。
神经网络算法是一种模拟人类思维模式的理论模型,具有强大的非线性映射能力以及模拟能力。为了获得更精准的三维框,在一些实施例中,可以采用神经网络算法来预测目标对象在世界坐标系的第一坐标。在实际应用过程中,可以使用已训练好的网络预测模型,将图像信息输入网络预测模型,即可快速而准确地获得目标对象在世界坐标系的第一坐标。
因此,在一些实施例中,所述基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标可以包括:
将所述图像信息输入网络预测模型,计算获得所述目标对象对应三维框在世界坐标系的第一坐标。
在实际应用中,还可以根据目标对象的大小实现对目标对象的智能跟随。作为一个实施例,所述方法还可以包括:
根据所述三维框在所述世界坐标系的第一坐标,确定所述目标对象的尺寸数据;
所述根据所述图像坐标点,确定所述电子设备对所述目标对象进行智能跟随时的目标点可以包括:
根据所述图像坐标点以及所述目标对象的尺寸数据,确定所述电子设备对所述目标对象进行智能跟随时的目标点。
可选地,所述尺寸数据可以包括所述目标对象的长度、宽度和/或高度。
在电子设备对目标对象进行绕点飞行或者智能追踪飞行等智能跟随过程中,目标对象与电子设备之间的距离也属于较为重要的参考参数。因此,如图4所示,为本申请实施例提供的一种对象解算方法又一个实施例的流程图,所述方法可以包括以下几个步骤:
401:获取目标对象的图像信息。
本申请实施例的部分步骤与图1所示实施例的步骤相同,其执行内容与技术效果不再赘述。
402:基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标、对应二维框在图像坐标系二维框在图像坐标系的第二坐标以及所述目标对象相对电子设备的目标角度。
所述目标角度为目标对象相对电子设备的角度。
可选地,基于所述图像信息,确定目标对象对应三维框在世界坐标系的第一坐标、对应二维框在图像坐标系二维框在图像坐标系的第二坐标以及所述目标对象相对所述电子设备的目标角度可以包括:
基于三维轮廓检测技术,识别所述目标对象在所述图像信息中的三维框在世界坐标系中的第一坐标;基于轮廓检测技术,识别所述目标对象在图像信息中的二维框在图像坐标系中的第二坐标;
根据目标对象在图像信息中的二维框,确定目标对象与电子设备的目标角度。
可选地,根据目标对象在图像信息中的二维框,确定目标对象与电子设备的目标角度可以包括:计算目标对象在图像信息中的二维框的长度以及高度;查询二维框的长度以及高度与对象角度之间对应的数据方向表,获得该长度与高度对应的目标角度。数据方向表中存储有二维框的长宽比与对象角度关联存储,不同的长宽比对应不同的对象角度。
数据方向表可以预先基于目标对象的实际长度与实际高度,结合多个二维框分别对应的长度及高度,确定多个二维框分别对应的对象角度。可以统计在不同对象角度下,相对于该目标对象的实际长度与实际高度,目标对象的二维框对应的长度及高度,并将不同二维框对应的长度以及高度与对应的对象角度关联存储于所述数据方向表中,以便于查询。
目标角度为目标对象相对电子设备之间的旋转角度,也即,当需要将目标对象从世界坐标系映射到电子设备的图像坐标系时,目标对象需要旋转的角度。本申请实施例中,目标角度可以包括:目标对象相对电子设备在水平方向的旋转角度α、目标对象相对电子设备在垂直方向上的旋转角度β。
403:确定所述第一坐标在图像坐标系中的图像坐标。
其中,图像坐标为第一坐标在图像坐标系映射的坐标点,将第一坐标映射到图像坐标系获得的坐标点为图像坐标点。
可选地,可以基于坐标运算,确定第一坐标在图像坐标系对应的图像坐标。
404:基于所述世界坐标系中的第一坐标与所述图像坐标系中的图像坐标之间的坐标转换关系,结合所述第二坐标以及所述目标角度,解算所述目标对象与所述电子设备的相机之间的距离,获得目标距离。
其中,所述目标距离用于确定所述电子设备针对所述目标对象进行智能跟随时的跟随距离。
在电子设备针对目标对象进行智能飞行过程中,为了提高飞行精度可以根据电子设备的相机与目标对象之间的距离,也即目标距离,调整电子设备的飞行轨迹。目标对象在世界坐标系的中心点映射到图像坐标系的坐标点可以作为飞行的中心点。
所述目标距离为电子设备与目标对象的距离,可以用于智能飞行控制。在实际应用中,当电子设备针对目标对象进行智能跟随飞行时,目标距离可以是电子设备对目标对象进行智能跟随飞行时的飞行间距。当电子设备针对目标对象进行绕点飞行时,目标距离可以是电子设备针对目标对象进行绕点飞行时的环绕半径,此时,目标对象在图像坐标系的图像坐标点为绕点飞行时的环绕兴趣点,也即环绕目标对象进行环绕飞行时的圆心。
405:根据所述三维框在所述世界坐标系的第一坐标,确定所述目标对象在所述世界坐标系的中心点。
406:将所述目标对象在世界坐标系的中心点映射到图像坐标系,获得图像坐标点。
所述方法还可以包括:根据所述图像坐标点,确定所述电子设备针对所述目标对象进行智能跟随时的目标点。
可选地,所述将所述目标对象在世界坐标系的中心点映射到图像坐标系,获得的图像坐标点为所述电子设备针对所述目标对象智能飞行时的中心点可以包括:基于所述世界坐标系中的第一坐标与所述图像坐标系中的图像坐标之间的坐标转换关系,将所述目标对象在世界坐标系的中心点映射到图像坐标系,获得的图像坐标点;确定所述图像坐标点为所述电子设备针对所述目标对象进行智能跟随时的目标点。
本申请实施例中,获取电子设备针对目标对象采集的图像信息之后,可以将图像信息输入训练获得的网络预测模型,通过所述网络预测模型预测获得所述目标对象对应三维框在世界坐标系下的第一坐标,所述目标对象对应二维框在图像坐标系下的第二坐标以及所述目标对象相对所述电子设备的目标角度。目标对象的三维框可以以更立体的方式来描述目标对象的位置或角度等姿态信息,相较于二维框而言在空间上能够包含目标对象更多的信息。因此,在确定所述目标对象在图像坐标系中的图像坐标之后,可以基于所述世界坐标系中的第一坐标与所述图像坐标系中的图像坐标之间的坐标转换关系,结合所述第二坐标以及所述目标角度,解算所述目标对象与电子设备之间的距离,获得目标距离。通过目标对象在图像信息中的多种信息完成对目标对象与电子设备的相机之间的距离解算,实现以相机与目标对象的距离作为跟随距离,可以减少相机对目标对象的拍摄误差,提高相机对目标对象的拍摄效果。
由于神经网络算法可以包含不同的神经元网络,具有强大的非线性映射能力以及模拟能力,为了提高计算效率以及准确性,第一坐标、第二坐标以及目标角度均可以通过网络预测模型直接预测获得。
作为一个实施例,所述基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标、对应二维框在图像坐标系的第二坐标以及所述目标对象相对电子设备的目标角度可以包括:
将所述图像信息输入网络预测模型,获得所述目标对象对应三维框在世界坐标系下的第一坐标、所述目标对象对应二维框在图像坐标系下的第二坐标以及所述目标对象相对所述电子设备的目标角度。
可选地,所述网络预测模型可以用于预测目标对象对应三维框在世界坐标系下的第一坐标、所述目标对象对应二维框在图像坐标系下的第二坐标以及所述目标对象相对所述电子设备的目标角度。但是网络预测模型的预测结果可以单独使用,如仅使用目标对象对应三维框在世界坐标系下的第一坐标。
本申请实施中使用的网络预测模型为已训练好的网络预测模型。网络预测模型的训练过程可以参考图5所示的实施例,其具体的训练过程以及步骤已在图5所示的实施例中详细描述,在此不再赘述。
作为一个实施例,所述基于所述世界坐标系中的第一坐标与所述图像坐标系中的图像坐标之间的坐标转换关系,结合所述第二坐标以及所述目标角度,解算所述目标对象与所述电子设备的相机之间的距离,获得目标距离可以包括:
利用所述第二坐标以及所述目标角度,在所述目标对象与所述电子设备的相机之间的距离为未知量的基础上,构建所述世界坐标系中的第一坐标转换至所述图像坐标系中的图像坐标时的转换矩阵。
基于所述第一坐标与所述图像坐标,解算所述转换矩阵中所述目标对象与所述电子设备的相机之间的距离,获得目标距离。
世界坐标系与图像坐标系之间存在转换矩阵,通过该转换矩阵可以将世界坐标系中的坐标点映射到图像坐标系中。世界坐标系中的第一坐标通过转换矩阵转换到图像坐标系中的图像坐标。
图像坐标系为电子设备对应相机显示图像时的二维坐标系,可以用于显示图像像素。目标对象在世界坐标系的中心点实际为三维坐标系中的点,而目标对象在图像坐标系中的图像坐标点实际为2维坐标点。其中,目标对象在世界坐标系的中心点经过一系列的坐标映射可以获得目标对象在图像坐标系中的坐标点。世界坐标系的中心点与图像坐标系的图像坐标点是三维与二维相匹配的点。
作为一种可能的实现方式,所述利用所述第二坐标以及所述目标角度,在所述目标对象与所述电子设备的相机之间的距离为未知量的基础上,构建所述世界坐标系中的第一坐标转换至所述图像坐标系中的图像坐标时的转换矩阵可以包括:
根据所述目标角度,生成所述世界坐标系的坐标轴与相机坐标系的坐标轴对应的角度旋转矩阵;
以所述目标对象与所述电子设备的相机之间的距离为未知量,结合所述角度旋转矩阵以及所述电子设备对应相机的相机视角参数,构建所述目标对象相对所述电子设备的距离转换矩阵;
根据所述第二坐标以及所述电子设备对应相机的焦距,生成相机内参矩阵;
基于所述相机内参矩阵、所述角度旋转矩阵以及所述距离转换矩阵,构建所述世界坐标系中的第一坐标转换至所述图像坐标系中的图像坐标时的转换矩阵。
在将世界坐标系的坐标点,例如第一坐标,转换至图像坐标系时,由于对象不会发生形变,可以通过刚体变换,也即旋转与平移的方式,将世界坐标系的坐标点通过相机坐标系转换到图像坐标系中。在转换过程中,可以通过R代表角度旋转,也即角度旋转矩阵,T代表对象的平移,也即距离转换矩阵。在旋转平移时,R、T与摄像机本身无关,因此,可以将R、T称为相机外参数,可以根据角度旋转矩阵以及距离旋转矩形构建相机的外参矩阵,相机外参矩阵具体可以是:
$$\begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix}$$
本申请实施例中,旋转角度对应的矩阵为角度旋转矩阵R,可以通过目标角度,也即目标对象相对电子设备在水平方向的旋转角度α、目标对象相对电子设备在垂直方向上的旋转角度β计算获得。在空间坐标系中,可以分别计算目标角度在水平方向对应的旋转矩阵$R_\alpha$以及垂直方向对应的旋转矩阵$R_\beta$,以计算获得$R = R_\alpha \cdot R_\beta$。
本申请实施例中,目标对象从世界坐标系平移至相机坐标系时,与相机本身的相机视角参数v相关,也与旋转矩阵R相关,目标对象相对电子设备的距离转换矩阵可以使用公式T=-R·C(v,d)表示,其中,d为目标对象与电子设备的相机之间的距离,该距离为未知量。
目标对象从世界坐标系转换到相机坐标系时,与电子设备的相机本身相关联,需要使用到相机本身的参数,使用的相机内的参数主要是相机的焦距。相机的焦距可以包括相机在水平方向上的焦距fx以及相机在垂直方向上的焦距fy。相机内参矩阵可以表示为:
$$\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$
其中,fx为相机在水平方向上的焦距以及fy为相机在垂直方向上的焦距;(cx,cy)为目标对象的第二坐标构成的二维框的中心在世界坐标系中的坐标点。
可选地,目标对象对应二维框在图像坐标系二维框在图像坐标系的第二坐标可以包括4个坐标点,通过第二坐标对应的4个坐标点可以确定二维框的中心在世界坐标系的坐标点。二维框实际为一个矩形框,该矩形框的两条对角线的交点为二维框的中心。
进一步,可选地,所述基于相机内参矩阵、角度旋转矩阵以及距离转换矩阵,构建世界坐标系中的第一坐标转换至图像坐标系中的图像坐标时对应的转换矩阵可以包括:确定所述角度旋转矩阵以及距离转换矩阵对应的相机外参矩阵,基于所述相机内参矩阵以及所述相机外参矩阵,确定世界坐标系中的第一坐标转换至图像坐标系中的图像坐标时对应的转换矩阵。
世界坐标系中的第一坐标转换至图像坐标系中的图像坐标时对应的转换矩阵具体可以通过以下公式表示:
$$\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix}$$
在某些实施例中,所述基于所述第一坐标与所述图像坐标,解算所述转换矩阵中的所述目标对象与所述电子设备的相机之间的距离,获得目标距离可以包括:
以所述转换矩阵为坐标转换关系,结合所述第一坐标与所述图像坐标,构建转换方程;
解算所述转换方程中的所述目标对象与所述电子设备的相机之间的距离,获得目标距离。
可选地,以转换矩阵为坐标转换关系,结合第一坐标与图像坐标构建的转换方式具体可以是:
$$s\begin{bmatrix} u \\ k \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x_{obj} \\ y_{obj} \\ z_{obj} \\ 1 \end{bmatrix}$$
其中,s为相机投影系数。(u,k)为图像坐标。(x obj,y obj,z obj)为第一坐标。
转换矩阵中包括第二坐标和/或目标角度对应已知量,也包括目标对象与电子设备之间的距离对应未知量,由于第一坐标与图像坐标已知,可以构建第一坐标与转换矩阵的乘积等于图像坐标的转换方程,解算转换方程中的未知量,获得目标距离。其中,可以基于第一坐标、图像坐标以及相机内参矩阵,解算相机外参矩阵中未知量。
在一些实施例中,所述解算所述转换方程中的所述目标对象与所述电子设备的相机之间的距离,获得目标距离可以包括:
利用N点透视算法(Perspective-n-Point,PNP)解算所述转换方程中的所述目标对象与所述电子设备的相机之间的距离,获得目标距离。
具体可以通过PNP算法求解转换方程中的目标对象与电子设备之间的距离,获得目标距离。
在一种可能的设计中,所述目标对象相对所述电子设备的目标角度包括:所述目标对象在水平方向上相对所述电子设备的水平角度以及所述目标对象在垂直方向上相对所述电子设备的垂直角度;
所述根据所述目标角度,生成所述世界坐标系的坐标轴与相机坐标系的坐标轴对应的角度旋转矩阵包括:
将所述水平角度以及所述垂直角度输入三维旋转矩阵公式,计算获得所述世界坐标系的坐标轴与所述相机坐标系的坐标轴对应的角度旋转矩阵。
可选地,所述三维旋转矩形公式,可以包括水平旋转矩阵公式以及垂直旋转矩阵公式。
作为一个实施例,所述将所述图像信息输入训练获得的网络预测模型,获得所述目标对象对应三维框在世界坐标系下的第一坐标、所述目标对象对应二维框在图像坐标系下的第二坐标以及所述目标对象相对所述电子设备的目标角度可以包括:
提取所述图像信息的至少一个特征图像。
提取每个特征图像的至少一个候选框。
其中,每个候选框以四个坐标点表示。
对所述至少一个特征图像进行关键图像提取,获得至少一个目标特征图像。
其中,所述至少一个目标特征图像的数量小于所述至少一个特征图像的数量。
基于每个目标特征图像对应的至少一个候选框,对每个目标特征图像进行区域特征提取,获得每个目标特征图像对应的至少一个区域特征。
将所述至少一个目标特征图像分别对应的至少一个区域特征进行全连接处理,以获得所述目标对象对应三维框在世界坐标系下的第一坐标、所述目标对象对应二维框在图像坐标系下的第二坐标以及所述目标对象相对所述电子设备的目标角度。
可选地,图像信息可以对应至少一个特征图像,每个特征图像可以用于描述图像信息中目标对象的特征信息。作为一种可能的实现方式,所述提取所述图像信息的至少一个特征图像可以包括:将所述图像信息输入基础特征提取模型,获得至少一个特征图像。其中,所述基础特征模型可以包括VGGNet(Visual Geometry Group network,计算机视觉几何组),ResNet(Residual Network,残差网络),STN(Spatial Transformer Network,空间变换网络),FCN(Fully Convolutional Networks,全卷积神经网络)等模型。
所述基础特征提取模型主要是由不同卷积核构成的神经网络。所述将所述图像信息输入基础特征提取模型,获得至少一个特征图像具体可以是:确定基础特征提取模型的至少一个卷积核,将所述图像信息与每个卷积核进行卷积计算,获得每个卷积核对应的特征图 像,获得至少一个特征图像。不同卷积核可以用于描述图像信息在不同类型、不同尺度和/或不同方向上的特征。例如,可以提取图像信息在某个方向上的边缘特征。
作为一种可能的实现方式,所述提取每个特征图像的至少一个候选框可以包括分别提取每个特征图像的至少一个候选框。所述提取每个特征图像的至少一个候选框可以包括:将所述至少一个特征图像分别输入RPN(Region Proposal Network,区域生成网络)模型,提取所述至少一个特征图像中候选框的坐标点,获得每个特征图像对应的至少一个候选框。每个候选框具体以坐标点的形式表示。
由于获得的至少一个特征图像数量较大,在基于至少一个特征图像直接进行图像处理时,计算量比较大,增加计算复杂度,因此,可以对至少一个特征图像进行关键图像提取,以减少特征图像的数量,减少计算复杂度,提高计算效率。
可选地,所述对所述至少一个特征图像进行关键图像提取,获得至少一个目标特征图像可以包括:将所述至少一个特征图像输入light-head(Light-Head R-CNN,构造了轻量头部R-CNN网络)模型,获得至少一个目标特征图像。可以利用light-head模型降低至少一个特征图像的复杂度,例如,当特征图像是3900个通道对应的图像时,经由light-head降低复杂度之后,可以至少一个目标特征图像降低到490个通道。
每个目标特征图像可以对应有至少一个候选框,通过候选框可以提取目标特征图像中不同的区域特征,从而可以从不同的区域特征中确定出目标对象所在的候选框。所述基于每个目标特征图像对应的至少一个候选框,对每个目标特征图像进行区域特征提取,获得每个目标特征图像对应的至少一个区域特征可以包括:将至少一个目标特征图像以及每个目标特征图像对应的至少一个候选框输入区域检测算法,获得对目标对象的位置敏感的至少一个区域。所述区域检测算法可以包括Pooling Position Sensitive ROI Pooling(位置敏感的候选区域池化)等可以用于提取图像中敏感区域所对应的特征。
在获得每个目标特征图像的至少一个区域特征之后,可以基于至少一个目标特征图像分别对应的至少一个区域特征,将至少一个目标特征图像对应的至少一个特征区域进行特征拟合,获得目标对象对应三维框在世界坐标系下的第一坐标、所述目标对象对应二维框在图像坐标系下的第二坐标以及所述目标对象相对所述电子设备的目标角度。所述将所述至少一个目标特征图像分别对应的至少一个区域特征进行全连接处理是将至少一个目标特征分别对应的至少一个区域特征输入全连接层(FC,fully connected layers),进行全连接处理。全连接层可以将至少一个目标特征分别对应的至少一个区域特征进行非线性组合,例如对输入的多个目标特征进行加权和/或线性变换等全连接处理,形成表达结构更高级的特征,也即获得目标对象对应三维框在世界坐标系下的第一坐标、所述目标对象对应二维框在图像坐标系下的第二坐标以及所述目标对象相对所述电子设备的目标角度。
可选地,所述网络预测模型可以是预先训练获得。也即可以通过训练样本训练获得网络预测模型的模型参数。所述将所述图像信息输入训练获得的网络预测模型,获得所述目标对象对应三维框在世界坐标系下的第一坐标、所述目标对象对应二维框在图像坐标系下的第二坐标以及所述目标对象相对所述电子设备的目标角度具体可以包括:将图像信息输入预先训练获得的参数已知的网络预测模型,获得目标对象对应三维框在世界坐标系下的第一坐标、所述目标对象对应二维框在图像坐标系下的第二坐标以及所述目标对象相对所述电子设备的目标角度。
网络预测模型可以是神经网络模型,通过网络预测模型的各个神经元计算模型通过执行本申请实施例中的各个计算模块的计算步骤,可以预测获得目标对象对应三维框在世界坐标系下的第一坐标、所述目标对象对应二维框在图像坐标系下的第二坐标以及所述目标对象相对所述电子设备的目标角度。关于网络预测模型的参数训练过程具体在图5所示的实施例中进行详细描述,在此不再赘述。
作为一个实施例,所述确定所述目标对象在图像坐标系中的图像坐标包括:
将所述目标对象对应三维框在世界坐标系的第一坐标映射到图像坐标系,获得所述目 标对象对应三维框在图像坐标系中的图像坐标。
作为一种可能的实现方式,所述将所述目标对象对应三维框在世界坐标系的第一坐标映射到图像坐标系,获得所述目标对象对应三维框在图像坐标系中的图像坐标可以包括:
根据所述第二坐标以及所述电子设备对应相机的焦距,生成相机内参矩阵;
确定所述世界坐标系与所述相机坐标系之间的进行坐标转换时对应的相机外参矩阵;
根据所述相机内参矩阵以及所述相机外参矩阵,将所述目标对象对应三维框在世界坐标系的第一坐标映射到图像坐标系的,获得所述图像坐标。
可选地,所述确定所述世界坐标系与所述相机坐标系之间的进行坐标转换时对应的相机外参矩阵可以包括:根据所述目标角度,生成所述世界坐标系的坐标轴与相机坐标系的坐标轴对应的角度旋转矩阵R;根据预先设置的目标对象与电子设备之间的距离,确定距离转换矩阵T1;根据所述角度旋转矩阵R与所述距离转换矩阵T1,确定相机外参矩阵。其中,相机外参矩阵可以使用:
$$\begin{bmatrix} R & T_1 \\ 0 & 1 \end{bmatrix}$$
表示。R的获取方式可以参考上述实施例中的描述。距离转换矩阵可以使用公式T1=-R·C(v,d1)表示。其中,v为相机视觉参数,d1为预设置的电子设备与目标对象之间的距离。关于角度旋转矩阵R的获取方式可以与上述实施例中R的获取方式相同,在此不再赘述。
在某些实施例中,所述根据所述第二坐标以及所述电子设备对应相机的焦距,生成相机内参矩阵包括:
确定所述第二坐标在图像坐标系的图像中心点;
根据所述图像中心点以及所述电子设备对应相机的焦距,生成相机内参矩阵。
如图5所示,为本申请实施例提供的一种网络预测模型的训练方法的一个实施例的流程图,所述方法可以包括以下几个步骤:
501:确定至少一个训练图像。
其中,每个训练图像被标注有所述训练图像中的目标对象对应三维框在世界坐标系下的第一真实坐标、所述目标对象对应二维框在图像坐标系下的第二真实坐标以及所述目标对象相对其电子设备的真实角度。
可选地,每个训练图像对应的第一真实坐标、第二真实坐标均可以通过标注获得,可以将至少一个训练图像分别导入CAD(Computer-aided design,计算机辅助设计),并在CAD中标注每个训练图像中目标对象的3维框的关键点以及2维框的关键点,并利用CAD模型拟合对象3维框所对应关键点在世界坐标系下的第一真实坐标、二维框所对应关键点在世界坐标系下的第二真实坐标。目标对象相对电子设备的真实角度可以通过测量获得。
作为一种可能的实现方式,3维框的关键点可以指三维框的8个顶点。2维框的关键点可以指二维框的4个顶点,通过CAD可以拟合获得三维框的8个顶点在世界坐标系下的第一真实坐标,以及二维框的4个顶点在世界坐标系下的第二真实坐标。
利用至少一个训练图像预测网络预测模型的模型参数之后,即可以使用已训练出的网络预测模型来预测输入的图像信息的三维框对应的第一坐标以及2维框对应的第二坐标以及目标对象相对电子设备的目标角度。
502:构建网络预测模型。
503:以每个训练图像被标注的第一真实坐标、第二真实坐标以及真实角度为训练目标,利用所述至少一个训练图像训练获得所述网络预测模型的模型参数。
本申请实施例提供一种网络预测模型的训练方式,网络预测模型可以预先训练获得,以在需要时直接使用已训练的网络预测模型,可以提高计算效率。在一些实施例中,网络预测模型可以实时训练获得,以提高网络预测模型的预测时效性。
作为一个实施例,所述以每个训练图像被标注的第一真实坐标、第二真实坐标以及真实角度为训练目标,利用所述至少一个训练图像训练获得所述网络预测模型的模型参数可以包括:
确定所述网络预测模型的参考参数;
将所述至少一个训练图像分别输入所述参考参数对应的网络预测模型,获得每个训练图像对应的预测结果;
基于每个训练图像预测结果与其被标注的第一真实坐标、第二真实坐标以及真实角度,计算所述参考参数对应网络预测模型的训练误差;
如果所述训练误差满足训练约束条件,确定所述参考参数为所述网络预测模型的模型参数;
如果所述训练误差不满足所述训练约束条件,基于每个训练图像对应的预测结果,调整所述网络预测模型的模型参数,获得新的参考参数;返回至所述确定所述网络预测模型的参考参数的步骤继续执行。
在某些实施例中,所述每个训练图像对应的预测结果可以包括:每个训练图像中所述目标对象对应三维框在世界坐标系下的第一预测坐标、所述目标对象对应二维框在图像坐标系下的第二预测坐标以及所述目标对象相对其电子设备的预测角度;
所述基于每个训练图像预测结果与其被标注的第一真实坐标、第二真实坐标以及真实角度,计算所述参考参数对应网络预测模型的训练误差包括:
根据每个训练图像对应所述第一预测坐标与所述第一真实坐标,确定每个训练图像的第一坐标误差;
根据每个训练图像对应所述第二预测坐标与所述第二真实坐标,确定每个训练图像的第二坐标误差;
根据每个训练图像对应所述预测角度与其对应真实角度,确定每个训练图像的角度误差;
基于每个训练图像对应所述第一坐标误差、所述第二坐标误差以及所述角度误差,确定所述网络预测模型对所述至少一个训练图像的训练误差。
可选地,所述基于每个训练图像对应第一坐标误差、第二坐标误差以及角度误差,确定网络预测模型对至少一个训练图像的训练误差可以包括:确定至少一个训练图像分别对应第一坐标误差所构成的第一误差,确定至少一个训练图像分别对应第二坐标误差所构成的第二误差以及确定所述至少一个训练图像分别对应角度误差所构成的第三误差。将所述第一误差、第二误差以及第三误差进行加权求和,获得网络预测模型对至少一个训练图像的训练误差。
在基于预测角度与真实角度的差值,确定每个训练图像的角度误差时,至少一个训练图像分别对角度误差所构成的第三误差可以是将每个训练图像对应角度误差输入误差计算函数,获得每个训练图像对应的误差损失,计算至少一个训练图像分别对应的误差损失之和,获得第三误差。
在基于角区域分布的方式确定每个训练图像的角度误差时,所述至少一个训练图像分别对应角度误差所构成的第三误差可以是统计至少一个训练图像中被标记为存在角度误差的图像数量,基于至少一个训练图像的图像总数量以及至少一个训练图像中被标记为存在角度误差的图像数量,确定第三误差。例如,可以利用图像数量与图像总数量的比值,确定第三误差。在一些实施例例中,所述第三误差即为图像数量与图像总数量的比值。
作为一种可能的实现方式,所述根据每个训练图像对应所述预测角度与其对应真实角度,确定每个训练图像的角度误差可以包括:
基于每个训练图像对应的预测角度与其对应真实角度的角度差值,确定每个训练图像对应的角度误差。
每个训练图像的预测角度是基于网络预测模型预测获得,预测角度与真实角度存在一定的角度误差,该角度误差可以包括预测角度与真实角度之间的角度偏移,具体可以指预测角度与真实角度的角度差值。如图6所示,预测角度601与真实角度602指教的角度误差为夹角603。
作为又一种可能的实现方式,所述根据每个训练图像对应所述预测角度与其对应真实 角度,确定每个训练图像的角度误差可以包括:
将一个圆周平均划分为多个角区域;
从所述多个角区域中确定每个训练图像对应真实角度所在目标角区域;
如果任一个训练图像对应所述预测角度位于其真实角度所在所述目标角区域时,确定所述训练图像不存在角度误差;
如果任一个训练图像对应所述预测角度不位于其真实角度所在所述目标角区域时,确定所述训练图像存在角误差。
为了便于理解,以所述平均划分的多个角区域为16个角区域为例,图7a中,预测角度701位于角区域702中,真实角度703位于角区域704中,而角区域702与角区域704不是同一个角区域,预测角度701与真实角度703之间存在角度误差,此时可以确定该预测角度701以及真实角度703对应的训练图像存在角度误差。图7b中,预测角度705位于角区域706中,真实角度707位于角区域706中,预测角度705与真实角度707位于同一个角区域706中,此时,可以确定该预测角度705以及真实角度707对应的训练图像不存在角度误差。
作为一种可能的实现方式,所述网络预测模型通过以下方式预测任一个输入的训练图像的预测结果:
提取所述训练图像的至少一个特征图像;
提取每个特征图像的至少一个候选框。
其中,每个候选框以四个坐标点表示。
对所述至少一个特征图像进行关键图像提取,获得至少一个目标特征图像。
其中,所述至少一个目标特征图像的数量小于所述至少一个特征图像的数量。
基于每个目标特征图像对应的至少一个候选框,对每个目标特征图像进行区域特征提取,获得每个目标特征图像对应的至少一个区域特征。
将所述至少一个目标特征图像分别对应的至少一个区域特征进行全连接处理,以获得所述训练图像对应预测结果。
可选地,训练图像可以对应至少一个特征图像,每个特征图像可以用于描述训练图像中目标对象的特征信息。作为一种可能的实现方式,所述提取所述训练图像的至少一个特征图像可以包括:将所述训练图像输入基础特征提取模型,获得至少一个特征图像。其中,所述基础特征模型可以包括VGGNet(Visual Geometry Group network,计算机视觉几何组),ResNet(Residual Network,残差网络),STN(Spatial Transformer Network,空间变换网络),FCN(Fully Convolutional Networks,全卷积神经网络)等模型。
所述基础特征提取模型主要是由不同卷积核构成的神经网络。所述将所述训练图像输入基础特征提取模型,获得至少一个特征图像具体可以是:确定基础特征提取模型的至少一个卷积核,将所述训练图像与每个卷积核进行卷积计算,获得每个卷积核对应的特征图像,获得至少一个特征图像。不同卷积核可以用于描述训练图像在不同类型、不同尺度和/或不同方向上的特征。例如,可以提取训练图像在某个方向上的边缘特征。
作为一种可能的实现方式,所述提取每个特征图像的至少一个候选框可以包括分别提取每个特征图像的至少一个候选框。所述提取每个特征图像的至少一个候选框可以包括:将所述至少一个特征图像分别输入RPN(Region Proposal Network,区域生成网络)模型,提取所述至少一个特征图像中候选框的坐标点,获得每个特征图像对应的至少一个候选框。每个候选框具体以坐标点的形式表示。
由于获得的至少一个特征图像数量较大,在基于至少一个特征图像直接进行图像处理时,计算量比较大,增加计算复杂度,因此,可以对至少一个特征图像进行关键图像提取,以减少特征图像的数量,减少计算复杂度,提高计算效率。
可选地,所述对所述至少一个特征图像进行关键图像提取,获得至少一个目标特征图像可以包括:将所述至少一个特征图像输入light-head(Light-Head R-CNN,构造了轻量头 部R-CNN网络)模型,获得至少一个目标特征图像。可以利用light-head模型降低至少一个特征图像的复杂度,例如,当特征图像是3900个通道对应的图像时,经由light-head降低复杂度之后,可以至少一个目标特征图像降低到490个通道。
每个目标特征图像可以对应有至少一个候选框,通过候选框可以提取目标特征图像中不同的区域特征,从而可以从不同的区域特征中确定出目标对象所在的候选框。所述基于每个目标特征图像对应的至少一个候选框,对每个目标特征图像进行区域特征提取,获得每个目标特征图像对应的至少一个区域特征可以包括:将至少一个目标特征图像以及每个目标特征图像对应的至少一个候选框输入区域检测算法,获得对目标对象的位置敏感的至少一个区域。所述区域检测算法可以包括Pooling Position Sensitive ROI Pooling(位置敏感的候选区域池化)等可以用于提取图像中敏感区域所对应的特征。
在获得每个目标特征图像的至少一个区域特征之后,可以基于至少一个目标特征图像分别对应的至少一个区域特征,将至少一个目标特征图像对应的至少一个特征区域进行特征拟合,获得目标对象对应三维框在世界坐标系下的第一坐标、所述目标对象对应二维框在图像坐标系下的第二坐标以及所述目标对象相对所述电子设备的目标角度。所述将所述至少一个目标特征图像分别对应的至少一个区域特征进行全连接处理是将至少一个目标特征分别对应的至少一个区域特征输入全连接层(FC,fully connected layers),进行全连接处理。全连接层可以将至少一个目标特征分别对应的至少一个区域特征进行非线性组合,例如对输入的多个目标特征进行加权和/或线性变换等全连接处理,形成表达结构更高级的特征,获得训练图像的预测结果。所述预测结果可以包括训练图像中目标对象对应三维框在世界坐标系下的第一预测坐标、所述目标对象对应二维框在图像坐标系下的第二预测坐标以及所述目标对象相对其电子设备的预测角度。
如图8所示,为本申请实施例提供的一种绕点飞行方法的一个实施例的流程图,所述方法可以包括以下几个步骤:
801:获取目标对象的图像信息。
802:基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标。
803:根据所述三维框在所述世界坐标系的第一坐标,确定所述目标对象在所述世界坐标系的中心点。
804:将所述目标对象在世界坐标系的中心点映射到图像坐标系,获得图像坐标点。
805:控制电子设备以所述图像坐标点为中心点绕所述目标对象执行飞行处理。
本申请实施例部分步骤与图1、图4或图5所示实施例部分步骤相同,对于各个步骤的具体实施方式可以参考图1、图4或图5所示的实施例,在此不再赘述。
本申请实施例中,获取目标对象的图像信息之后,可以基于该图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标。与二维框相比,三维框能够表达目标对象更多的立体信息,除二维框已表达的长度以及高度信息之外,三维框还能表达目标对象的深度信息。通过三维框在世界坐标系中的第一坐标,可以确定目标对象在世界坐标系的中心点。世界坐标系是相对真实世界的空间坐标系,为了对目标对象执行智能飞行,可将位于世界坐标系中的中心点映射到图像坐标系,以实现坐标系在电子设备上的三维重建,获得的图像坐标点即可以是电子设备针对目标对象进行智能飞行时的中心点。三维框与目标对象的实际形状更接近,通过三维空间坐标系对目标对象的中心点的定位更精确,从而可以控制电子设备以所述图像坐标点为中心点绕所述目标对象执行飞行处理,使得智能飞行的定位更精准。
如图9所示,为本申请实施例提供的一种绕点飞行方法的又一个实施例的流程图,所述方法可以包括以下几个步骤:
901:获取目标对象的图像信息。
902:基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标、对应二维框在图像坐标系的第二坐标以及所述目标对象相对电子设备的目标角度。
903:确定所述第一坐标在图像坐标系中的图像坐标。
904:基于所述世界坐标系中的第一坐标与所述图像坐标系中的图像坐标之间的坐标转换关系,结合所述第二坐标以及所述目标角度,解算所述目标对象与所述电子设备的相机之间的距离,获得目标距离。
905:根据所述三维框在所述世界坐标系的第一坐标,确定所述目标对象在所述世界坐标系的中心点。
906:将所述目标对象在世界坐标系的中心点映射到图像坐标系,获得图像坐标点。
907:控制所述电子设备以所述图像坐标点为中心点以及以所述目标距离为飞行半径绕所述目标对象执行飞行处理。
本申请实施例部分步骤与图1、图4或图5所示实施例部分步骤相同,对于各个步骤的具体实施方式可以参考图1、图4或图5所示的实施例,在此不再赘述。
本申请实施例中,获取电子设备针对目标对象采集的图像信息之后,可以将图像信息输入训练获得的网络预测模型,通过所述网络预测模型预测获得所述目标对象对应三维框在世界坐标系下的第一坐标,所述目标对象对应二维框在图像坐标系下的第二坐标以及所述目标对象相对所述电子设备的目标角度。目标对象的三维框可以以更立体的方式来描述目标对象的位置或角度等姿态信息,相较于二维框而言在空间上能够包含目标对象更多的信息。因此,在确定所述目标对象在图像坐标系中的图像坐标之后,可以基于所述世界坐标系中的第一坐标与所述图像坐标系中的图像坐标之间的坐标转换关系,结合所述第二坐标以及所述目标角度,解算所述目标对象与电子设备之间的距离,获得目标距离。根据目标对象在图像信息中的多种信息完成对目标对象与电子设备之间的距离解算,以更全面的解算基础信息提供更准确的解算结果,从而可以控制所述电子设备以所述图像坐标点为中心点以及以所述目标距离为飞行半径绕所述目标对象执行飞行处理,实现更精准的绕点飞行。
如图10所示,为本申请实施例提供的一种对象对焦方法的一个实施例的流程图,所述方法可以包括以下几个步骤:
1001:获取目标对象的图像信息。
本申请实施例的部分步骤与图1所示实施例的步骤相同,在此不再赘述。
1002:基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标。
1003:根据所述三维框在所述世界坐标系的第一坐标,确定所述目标对象在所述世界坐标系的中心点。
1004:将所述目标对象在世界坐标系的中心点映射到图像坐标系,获得图像坐标点。
1005:根据所述图像坐标点,对所述目标对象进行对焦。
根据图像坐标点对目标对象进行对焦时可以是电子设备通过数据计算的方法对原始数据进行分析,以所述图像坐标点为目标对象的中心点,计算相机的镜头马达的移动步数或者线圈调整数据,从而完成对焦。通过确定目标对象中心点在图像坐标系的坐标,以将该图像坐标系的坐标用于相机的自动对焦,可以实现准确对焦,提高对焦精度。
如图11所示,为本申请实施例提供的一种对象解算设备的一个实施例的结构示意图,所述设备可以包括:存储组件1101以及处理组件1102;所述存储组件1101用于存储一条或多条计算机指令,所述一条或多条计算机指令用于被所述处理组件1102调用;
所述处理组件1102可以用于:
获取目标对象的图像信息;基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标;根据所述三维框在所述世界坐标系的第一坐标,确定所述目标对象在所述世界坐标系的中心点;将所述目标对象在世界坐标系的中心点映射到图像坐标系,获得的图像坐标点为所述电子设备针对所述目标对象进行智能跟随时的目标点。
作为一个实施例,所述处理组件基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标具体可以是:
将所述图像信息输入网络预测模型,计算获得所述目标对象对应三维框在世界坐标系的第一坐标。
作为又一个实施例,所述第一坐标包括所述目标对象在图像信息中对应三维框在所述世界坐标系的八个坐标顶点;
所述处理组件根据所述三维框在所述世界坐标系的第一坐标,确定所述目标对象在所述世界坐标系的中心点具体可以是:
利用所述三维框在世界坐标系的八个坐标点进行中心点解算,获得所述目标对象在所述世界坐标系的中心点。
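中心点解算的一个示意如下（此处假设中心点取三维框八个坐标点的坐标均值，函数名为说明用的假设）：

```python
import numpy as np

def box3d_center(vertices_world):
    """由三维框在世界坐标系的八个坐标点解算目标对象的中心点(取顶点坐标均值)。"""
    vertices = np.asarray(vertices_world, dtype=float)   # 形状: (8, 3)
    assert vertices.shape == (8, 3)
    return vertices.mean(axis=0)
```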
作为一个实施例,所述处理组件基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标具体可以是:
基于所述图像信息，确定所述目标对象对应三维框在世界坐标系的第一坐标、对应二维框在图像坐标系的第二坐标以及所述目标对象相对电子设备的目标角度；
所述处理组件还可以用于:
确定所述目标对象对应三维框在图像坐标系中的图像坐标;
基于所述世界坐标系中的第一坐标与所述图像坐标系中的图像坐标之间的坐标转换关系,结合所述第二坐标以及所述目标角度,解算所述目标对象与所述电子设备的相机之间的距离,获得目标距离。
其中,所述目标距离用于确定所述电子设备针对所述目标对象绕点飞行时的飞行半径。
作为一个实施例,所述处理组件还可以用于:
根据所述三维框在所述世界坐标系的第一坐标,确定所述目标对象的尺寸数据;
所述处理组件根据所述图像坐标点,确定所述电子设备对所述目标对象进行智能跟随时的目标点具体可以是:
根据所述图像坐标点以及所述目标对象的尺寸数据,确定所述电子设备对所述目标对象进行智能跟随时的目标点。
作为一种可能的实现方式，所述处理组件基于所述图像信息，确定所述目标对象对应三维框在世界坐标系的第一坐标、对应二维框在图像坐标系的第二坐标以及所述目标对象相对电子设备的目标角度具体可以是：
将所述目标图像输入网络预测模型,获得所述目标对象对应三维框在世界坐标系下的第一坐标、所述目标对象对应二维框在图像坐标系下的第二坐标以及所述目标对象相对所述电子设备的目标角度。
可选地,所述处理组件基于所述世界坐标系中的第一坐标与所述图像坐标系中的图像坐标之间的坐标转换关系,结合所述第二坐标以及所述目标角度,解算所述目标对象与所述电子设备的相机之间的距离,获得目标距离具体可以是:
利用所述第二坐标以及所述目标角度,在所述目标对象与所述电子设备的相机之间的距离为未知量的基础上,构建所述世界坐标系中的第一坐标转换至所述图像坐标系中的图像坐标时的转换矩阵;
基于所述第一坐标与所述图像坐标,解算所述转换矩阵中所述目标对象与所述电子设备的相机之间的距离,获得目标距离。
在某些实施例中,所述处理组件利用所述第二坐标以及所述目标角度,在所述目标对象与所述电子设备的相机之间的距离为未知量的基础上,构建所述世界坐标系中的第一坐标转换至所述图像坐标系中的图像坐标时的转换矩阵具体可以是:
根据所述目标角度,生成所述世界坐标系的坐标轴与相机坐标系的坐标轴对应的目标旋转矩阵;
以所述目标对象与所述电子设备的相机之间的距离为未知量，结合所述目标旋转矩阵以及所述电子设备的相机视角参数，构建所述目标对象相对所述电子设备的距离转换矩阵；
根据所述第二坐标以及所述电子设备对应相机的焦距,生成所述电子设备的相机内参矩阵;
基于所述相机内参矩阵、所述目标旋转矩阵以及所述距离转换矩阵,构建所述世界坐标系中的第一坐标转换至所述图像坐标系中的图像坐标时的转换矩阵。
进一步,可选地,所述处理组件基于所述第一坐标与所述图像坐标,解算所述转换矩阵中的所述目标对象与所述电子设备的相机之间的距离,获得目标距离具体可以是:
以所述转换矩阵为坐标转换关系,结合所述第一坐标与所述图像坐标,构建转换方程;
解算所述转换方程中的所述目标对象与所述电子设备的相机之间的距离,获得目标距离。
在某些实施例中,所述处理组件解算所述转换方程中的所述目标对象与所述电子设备的相机之间的距离,获得目标距离具体可以是:
利用N点透视算法(Perspective-n-Point,PNP)解算所述转换方程中的所述目标对象与所述电子设备的相机之间的距离,获得目标距离。
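下面给出利用OpenCV的solvePnP对转换方程求解的一个示意性用法（代码中的三维顶点、图像坐标与相机内参均为随机或假设数值，仅用于说明调用方式，并非本申请实施例限定的解算过程）：

```python
import cv2
import numpy as np

object_points = np.random.rand(8, 3).astype(np.float32)   # 假设: 三维框在世界坐标系的8个坐标点
image_points = np.random.rand(8, 2).astype(np.float32)    # 假设: 对应的图像坐标
camera_matrix = np.array([[800., 0., 320.],
                          [0., 800., 240.],
                          [0., 0., 1.]])                   # 假设的相机内参矩阵
dist_coeffs = np.zeros(4)                                  # 假设无畸变

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, camera_matrix, dist_coeffs)
if ok:
    distance = float(np.linalg.norm(tvec))   # 平移向量的模可作为目标对象与相机之间的目标距离
```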
在某些实施例中,所述目标对象相对所述电子设备的目标角度可以包括:所述目标对象在水平方向上相对所述电子设备的水平角度以及所述目标对象在垂直方向上相对所述电子设备的垂直角度;
所述处理组件根据所述目标角度,生成所述世界坐标系的坐标轴与相机坐标系的坐标轴对应的目标旋转矩阵具体可以是:
将所述水平角度以及所述垂直角度输入三维旋转矩阵公式,计算获得所述世界坐标系的坐标轴与所述相机坐标系的坐标轴对应的目标旋转矩阵。
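三维旋转矩阵公式的一个示意实现如下（此处假设水平角度对应绕Z轴旋转、垂直角度对应绕X轴旋转，实际的旋转顺序与坐标轴约定以具体实施为准）：

```python
import numpy as np

def target_rotation_matrix(horizontal_angle, vertical_angle):
    """由水平角度与垂直角度(弧度)生成目标旋转矩阵的示意实现。"""
    ch, sh = np.cos(horizontal_angle), np.sin(horizontal_angle)
    cv_, sv = np.cos(vertical_angle), np.sin(vertical_angle)
    rot_z = np.array([[ch, -sh, 0.],
                      [sh,  ch, 0.],
                      [0.,  0., 1.]])       # 绕Z轴旋转(水平角度)
    rot_x = np.array([[1., 0.,   0.],
                      [0., cv_, -sv],
                      [0., sv,   cv_]])     # 绕X轴旋转(垂直角度)
    return rot_z @ rot_x
```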
在某些实施例中,所述处理组件将所述目标图像输入训练获得的网络预测模型,获得所述目标对象对应三维框在世界坐标系下的第一坐标、所述目标对象对应二维框在图像坐标系下的第二坐标以及所述目标对象相对所述电子设备的目标角度具体可以是:
提取所述目标图像的至少一个特征图像;
提取每个特征图像的至少一个候选框;其中,每个候选框以四个坐标点表示;
对所述至少一个特征图像进行关键图像提取,获得至少一个目标特征图像;其中,所述至少一个目标特征图像的数量小于所述至少一个特征图像的数量;
基于每个目标特征图像对应的至少一个候选框,对每个目标特征图像进行区域特征提取,获得每个目标特征图像对应的至少一个区域特征;
将所述至少一个目标特征图像分别对应的至少一个区域特征进行全连接处理，以获得所述目标对象对应三维框在世界坐标系下的第一坐标、所述目标对象对应二维框在图像坐标系下的第二坐标以及所述目标对象相对所述电子设备的目标角度。
作为一种可能的实现方式,所述处理组件可以通过以下方式训练获得网络预测模型:
确定至少一个训练图像;其中,每个训练图像被标注有所述训练图像中的目标对象对应三维框在世界坐标系下的第一真实坐标、所述目标对象对应二维框在图像坐标系下的第二真实坐标以及所述目标对象相对其电子设备的真实角度;
构建网络预测模型;
以每个训练图像被标注的第一真实坐标、第二真实坐标以及真实角度为训练目标,利用所述至少一个训练图像训练获得所述网络预测模型的模型参数。
作为一个实施例,所述处理组件以每个训练图像被标注的第一真实坐标、第二真实坐标以及真实角度为训练目标,利用所述至少一个训练图像训练获得所述网络预测模型的模型参数具体可以是:
确定所述网络预测模型的参考参数;
将所述至少一个训练图像分别输入所述参考参数对应的网络预测模型，获得每个训练图像对应的预测结果；
基于每个训练图像预测结果与其被标注的第一真实坐标、第二真实坐标以及真实角度,计算所述参考参数对应网络预测模型的训练误差;
如果所述训练误差满足训练约束条件,确定所述参考参数为所述网络预测模型的模型参数;
如果所述训练误差不满足所述训练约束条件,基于每个训练图像对应的预测结果,调整所述网络预测模型的模型参数,获得新的参考参数;返回至所述确定所述网络预测模型的参考参数的步骤继续执行。
在某些实施例中,所述每个训练图像对应的预测结果包括:每个训练图像中所述目标对象对应三维框在世界坐标系下的第一预测坐标、所述目标对象对应二维框在图像坐标系下的第二预测坐标以及所述目标对象相对其电子设备的预测角度;
所述处理组件基于每个训练图像预测结果与其被标注的第一真实坐标、第二真实坐标以及真实角度,计算所述参考参数对应网络预测模型的训练误差可以包括:
根据每个训练图像对应所述第一预测坐标与所述第一真实坐标,确定每个训练图像的第一坐标误差;
根据每个训练图像对应所述第二预测坐标与所述第二真实坐标,确定每个训练图像的第二坐标误差;
根据每个训练图像对应所述预测角度与其对应真实角度,确定每个训练图像的角度误差;
基于每个训练图像对应所述第一坐标误差、所述第二坐标误差以及所述角度误差,确定所述网络预测模型对所述至少一个训练图像的训练误差。
进一步,可选地,所述处理组件根据每个训练图像对应所述预测角度与其对应真实角度,确定每个训练图像的角度误差具体可以是:
基于每个训练图像对应的预测角度与其对应真实角度的角度差值,确定每个训练图像对应的角度误差。
作为一个实施例,所述处理组件根据每个训练图像对应所述预测角度与其对应真实角度,确定每个训练图像的角度误差具体可以是:
将一个圆周平均划分为多个角区域;
从所述多个角区域中确定每个训练图像对应真实角度所在目标角区域;
如果任一个训练图像对应所述预测角度位于其真实角度所在所述目标角区域时,确定所述训练图像不存在角度误差;
如果任一个训练图像对应所述预测角度不位于其真实角度所在所述目标角区域时，确定所述训练图像存在角度误差。
作为一种可能的实现方式,所述处理组件通过以下方式预测任一个输入网络预测模型的训练图像对应的预测结果:
提取所述训练图像的至少一个特征图像;
提取每个特征图像的至少一个候选框;其中,每个候选框以四个坐标点表示;
对所述至少一个特征图像进行关键图像提取,获得至少一个目标特征图像;其中,所述至少一个目标特征图像的数量小于所述至少一个特征图像的数量;
基于每个目标特征图像对应的至少一个候选框,对每个目标特征图像进行区域特征提取,获得每个目标特征图像对应的至少一个区域特征;
将所述至少一个目标特征图像分别对应的至少一个区域特征进行全连接处理，以获得所述训练图像对应三维框在世界坐标系下的第一预测坐标、所述目标对象对应二维框在图像坐标系下的第二预测坐标以及所述目标对象相对其电子设备的预测角度。
在某些实施例中,所述确定所述目标对象在图像坐标系中的图像坐标包括:
将所述目标对象对应三维框在世界坐标系的第一坐标映射到图像坐标系，获得所述目标对象对应三维框在图像坐标系中的图像坐标。
作为一个实施例,所述处理组件将所述目标对象对应三维框在世界坐标系的第一坐标映射到图像坐标系,获得所述目标对象对应三维框在图像坐标系中的图像坐标具体可以是:
根据所述第二坐标以及所述电子设备对应相机的焦距,生成相机内参矩阵;
确定所述世界坐标系与所述相机坐标系之间进行坐标转换时对应的相机外参矩阵；
根据所述相机内参矩阵以及所述相机外参矩阵，将所述目标对象对应三维框在世界坐标系的第一坐标映射到图像坐标系，获得所述图像坐标。
在某些实施例中,所述处理组件根据所述第二坐标以及所述电子设备对应相机的焦距,生成相机内参矩阵具体可以是:
确定所述第二坐标在图像坐标系的图像中心点;
根据所述图像中心点以及所述电子设备对应相机的焦距,生成相机内参矩阵。
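为说明由图像中心点与焦距生成相机内参矩阵、并结合相机外参把第一坐标映射到图像坐标系的过程，下面给出一个示意性的代码草图（假设像素为正方形且无畸变，函数名与参数形式均为说明用的假设）：

```python
import numpy as np

def camera_intrinsic_matrix(center_x, center_y, focal_length):
    """由图像中心点与相机焦距生成相机内参矩阵(示意)。"""
    return np.array([[focal_length, 0.,           center_x],
                     [0.,           focal_length, center_y],
                     [0.,           0.,           1.]])

def project_to_image(points_world, K, R, t):
    """用相机内参矩阵K与外参(旋转R、平移t)把世界坐标系的第一坐标映射到图像坐标系。"""
    points = np.asarray(points_world, dtype=float)   # 形状: (N, 3)
    cam = R @ points.T + t.reshape(3, 1)             # 世界坐标系 -> 相机坐标系
    pix = K @ cam                                    # 相机坐标系 -> 图像齐次坐标
    return (pix[:2] / pix[2]).T                      # 除以深度, 得到图像坐标, 形状: (N, 2)
```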
图11所述的对象解算设备可以执行上述任一实施例所述的对象解算方法,其实现原理和技术效果不再赘述。对于上述实施例中对象解算设备的处理组件所执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。
如图12所示,为本申请实施例提供的一种电子设备的一个实施例的结构示意图,所述设备可以包括:存储组件1201以及处理组件1202;所述存储组件1201用于存储一条或多条计算机指令,所述一条或多条计算机指令用于被所述处理组件1202调用;
所述处理组件1202可以用于:
获取目标对象的图像信息;基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标;根据所述三维框在所述世界坐标系的第一坐标,确定所述目标对象在所述世界坐标系的中心点;将所述目标对象在世界坐标系的中心点映射到图像坐标系,获得图像坐标点;控制电子设备以所述图像坐标点为中心点绕所述目标对象执行飞行处理。
作为一个实施例,所述处理组件基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标具体可以是:
基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标、对应二维框在图像坐标系的第二坐标以及所述目标对象相对电子设备的目标角度;
所述处理组件还可以用于:
确定所述第一坐标在图像坐标系中的图像坐标;
基于所述世界坐标系中的第一坐标与所述图像坐标系中的图像坐标之间的坐标转换关系,结合所述第二坐标以及所述目标角度,解算所述目标对象与所述电子设备的相机之间的距离,获得目标距离;
所述处理组件控制电子设备以所述图像坐标点为中心点绕所述目标对象执行飞行处理具体可以是:
控制所述电子设备以所述图像坐标点为中心点以及以所述目标距离为飞行半径绕所述目标对象执行飞行处理。
图12所述的电子设备可以执行上述任一实施例所述的绕点飞行方法,其实现原理和技术效果不再赘述。对于上述实施例中电子设备的处理组件所执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。
如图13所示，为本申请实施例提供的一种电子设备的一个实施例的结构示意图，所述设备可以包括：存储组件1301以及处理组件1302；所述存储组件1301用于存储一条或多条计算机指令，所述一条或多条计算机指令用于被所述处理组件1302调用；
所述处理组件1302可以用于:
获取目标对象的图像信息；基于所述图像信息，确定所述目标对象对应三维框在世界坐标系的第一坐标；根据所述三维框在所述世界坐标系的第一坐标，确定所述目标对象在所述世界坐标系的中心点；将所述目标对象在世界坐标系的中心点映射到图像坐标系，获得图像坐标点；根据所述图像坐标点，对所述目标对象进行对焦。
根据图像坐标点对目标对象进行对焦时可以是电子设备通过数据计算的方法对原始数据进行分析,以所述图像坐标点为目标对象的中心点,计算相机的镜头马达的移动步数或者线圈调整数据,从而完成对焦。通过确定目标对象中心点在图像坐标系的坐标,以将该图像坐标系的坐标用于相机的自动对焦,可以实现准确对焦,提高对焦精度。
以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。本领域普通技术人员在不付出创造性的劳动的情况下,即可以理解并实施。
通过以上的实施方式的描述，本领域的技术人员可以清楚地了解到各实施方式可借助软件加必需的通用硬件平台的方式来实现，当然也可以通过硬件和软件结合的方式来实现。基于这样的理解，上述技术方案本质上或者说对现有技术做出贡献的部分可以以计算机产品的形式体现出来，本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
在一个典型的配置中,电子设备包括一个或多个处理器(CPU)、输入/输出接口、网络接口和内存。内存可能包括计算机可读介质中的非永久性存储器,随机存取存储器(RAM)和/或非易失性内存等形式,如只读存储器(ROM)或闪存(flash RAM)。内存是计算机可读介质的示例。
最后应说明的是:以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围。

Claims (26)

  1. 一种对象解算方法,其特征在于,包括:
    获取目标对象的图像信息;
    基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标;
    根据所述三维框在所述世界坐标系的第一坐标,确定所述目标对象在所述世界坐标系的中心点;
    将所述目标对象在世界坐标系的中心点映射到图像坐标系,获得图像坐标点。
  2. 根据权利要求1所述的方法,其特征在于,还包括:
    根据所述三维框在所述世界坐标系的第一坐标,确定所述目标对象的尺寸数据;
    所述根据所述图像坐标点,确定所述电子设备对所述目标对象进行智能跟随时的目标点包括:
    根据所述图像坐标点以及所述目标对象的尺寸数据,确定所述电子设备对所述目标对象进行智能跟随时的目标点。
  3. 根据权利要求1所述的方法,其特征在于,所述第一坐标包括所述目标对象在图像信息中对应三维框在所述世界坐标系的八个坐标顶点;
    所述根据所述三维框在所述世界坐标系的第一坐标,确定所述目标对象在所述世界坐标系的中心点包括:
    利用所述三维框在世界坐标系的八个坐标点进行中心点解算,获得所述目标对象在所述世界坐标系的中心点。
  4. 根据权利要求1所述的方法,其特征在于,所述基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标包括:
    基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标、对应二维框在图像坐标系的第二坐标以及所述目标对象相对电子设备的目标角度;
    所述方法还包括:
    确定所述第一坐标在图像坐标系对应的图像坐标;
    基于所述世界坐标系中的第一坐标与所述图像坐标系中的图像坐标之间的坐标转换关系,结合所述第二坐标以及所述目标角度,解算所述目标对象与所述电子设备的相机之间的距离,获得目标距离;其中,所述目标距离用于确定所述电子设备针对所述目标对象进行智能跟随时的跟随距离。
  5. 根据权利要求4所述的方法,其特征在于,所述基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标、对应二维框在图像坐标系的第二坐标以及所述目标对象相对电子设备的目标角度包括:
    将所述目标图像输入网络预测模型,获得所述目标对象对应三维框在世界坐标系的第一坐标、所述目标对象对应二维框在图像坐标系的第二坐标以及所述目标对象相对所述电子设备的目标角度。
  6. 根据权利要求4所述的方法,其特征在于,所述基于所述世界坐标系中的第一坐标与所述图像坐标系中的图像坐标之间的坐标转换关系,结合所述第二坐标以及所述目标角度,解算所述目标对象与所述电子设备的相机之间的距离,获得目标距离包括:
    利用所述第二坐标以及所述目标角度,在所述目标对象与所述电子设备的相机之间的距离为未知量的基础上,构建所述世界坐标系中的第一坐标转换至所述图像坐标系中的图像坐标时的转换矩阵;
    基于所述第一坐标与所述图像坐标,解算所述转换矩阵中所述目标对象与所述电子设备的相机之间的距离,获得目标距离。
  7. 根据权利要求6所述的方法，其特征在于，所述利用所述第二坐标以及所述目标角度，在所述目标对象与所述电子设备的相机之间的距离为未知量的基础上，构建所述世界坐标系中的第一坐标转换至所述图像坐标系中的图像坐标时的转换矩阵包括：
    根据所述目标角度,生成所述世界坐标系的坐标轴与相机坐标系的坐标轴对应的角度旋转矩阵;
    以所述目标对象与所述电子设备的相机之间的距离为未知量,结合所述角度旋转矩阵以及所述电子设备的相机视角参数,构建所述目标对象相对所述电子设备的距离转换矩阵;
    根据所述第二坐标以及所述电子设备对应相机的焦距,生成相机内参矩阵;
    基于所述相机内参矩阵、所述角度旋转矩阵以及所述距离转换矩阵,构建所述世界坐标系中的第一坐标转换至所述图像坐标系中的图像坐标时的转换矩阵。
  8. 根据权利要求7所述的方法,其特征在于,所述基于所述第一坐标与所述图像坐标,解算所述转换矩阵中的所述目标对象与所述电子设备的相机之间的距离,获得目标距离包括:
    以所述转换矩阵为坐标转换关系,结合所述第一坐标与所述图像坐标,构建转换方程;
    解算所述转换方程中的所述目标对象与所述电子设备的相机之间的距离,获得目标距离。
  9. 根据权利要求8所述的方法,其特征在于,所述解算所述转换方程中的所述目标对象与所述电子设备的相机之间的距离,获得目标距离包括:
    利用N点透视算法(Perspective-n-Point,PNP)解算所述转换方程中的所述目标对象与所述电子设备的相机之间的距离,获得目标距离。
  10. 根据权利要求7所述的方法,其特征在于,所述目标对象相对所述电子设备的目标角度包括:所述目标对象在水平方向上相对所述电子设备的水平角度以及所述目标对象在垂直方向上相对所述电子设备的垂直角度;
    所述根据所述目标角度,生成所述世界坐标系的坐标轴与相机坐标系的坐标轴对应的角度旋转矩阵包括:
    将所述水平角度以及所述垂直角度输入三维旋转矩阵公式,计算获得所述世界坐标系的坐标轴与所述相机坐标系的坐标轴对应的角度旋转矩阵。
  11. 根据权利要求4所述的方法,其特征在于,所述将所述目标图像输入训练获得的网络预测模型,获得所述目标对象对应三维框在世界坐标系下的第一坐标、所述目标对象对应二维框在图像坐标系下的第二坐标以及所述目标对象相对所述电子设备的目标角度包括:
    提取所述目标图像的至少一个特征图像;
    提取每个特征图像的至少一个候选框;其中,每个候选框以四个坐标点表示;
    对所述至少一个特征图像进行关键图像提取,获得至少一个目标特征图像;其中,所述至少一个目标特征图像的数量小于所述至少一个特征图像的数量;
    基于每个目标特征图像对应的至少一个候选框,对每个目标特征图像进行区域特征提取,获得每个目标特征图像对应的至少一个区域特征;
    将所述至少一个目标特征图像分别对应的至少一个区域特征进行全连接处理,以获得所述目标对象对应三维框在世界坐标系下的第一坐标、所述目标对象对应二维框在图像坐标系下的第二坐标以及所述目标对象相对所述电子设备的目标角度。
  12. 根据权利要求4所述的方法,其特征在于,所述网络预测模型通过以下方式训练获得:
    确定至少一个训练图像;其中,每个训练图像被标注有所述训练图像中的目标对象对应三维框在世界坐标系下的第一真实坐标、所述目标对象对应二维框在图像坐标系下的第二真实坐标以及所述目标对象相对其电子设备的真实角度;
    构建网络预测模型;
    以每个训练图像被标注的第一真实坐标、第二真实坐标以及真实角度为训练目标，利用所述至少一个训练图像训练获得所述网络预测模型的模型参数。
  13. 根据权利要求12所述的方法,其特征在于,所述以每个训练图像被标注的第一真实坐标、第二真实坐标以及真实角度为训练目标,利用所述至少一个训练图像训练获得所述网络预测模型的模型参数包括:
    确定所述网络预测模型的参考参数;
    将所述至少一个训练图像分别输入所述参考参数对应的网络预测模型,获得每个训练图像对应的预测结果;
    基于每个训练图像预测结果与其被标注的第一真实坐标、第二真实坐标以及真实角度,计算所述参考参数对应网络预测模型的训练误差;
    如果所述训练误差满足训练约束条件,确定所述参考参数为所述网络预测模型的模型参数;
    如果所述训练误差不满足所述训练约束条件,基于每个训练图像对应的预测结果,调整所述网络预测模型的模型参数,获得新的参考参数;返回至所述确定所述网络预测模型的参考参数的步骤继续执行。
  14. 根据权利要求13所述的方法,其特征在于,所述每个训练图像对应的预测结果包括:每个训练图像中所述目标对象对应三维框在世界坐标系下的第一预测坐标、所述目标对象对应二维框在图像坐标系下的第二预测坐标以及所述目标对象相对其电子设备的预测角度;
    基于每个训练图像预测结果与其被标注的第一真实坐标、第二真实坐标以及真实角度,计算所述参考参数对应网络预测模型的训练误差包括:
    根据每个训练图像对应所述第一预测坐标与所述第一真实坐标,确定每个训练图像的第一坐标误差;
    根据每个训练图像对应所述第二预测坐标与所述第二真实坐标,确定每个训练图像的第二坐标误差;
    根据每个训练图像对应所述预测角度与其对应真实角度,确定每个训练图像的角度误差;
    基于每个训练图像对应所述第一坐标误差、所述第二坐标误差以及所述角度误差,确定所述网络预测模型对所述至少一个训练图像的训练误差。
  15. 根据权利要求14所述的方法,其特征在于,所述根据每个训练图像对应所述预测角度与其对应真实角度,确定每个训练图像的角度误差包括:
    基于每个训练图像对应的预测角度与其对应真实角度的角度差值,确定每个训练图像对应的角度误差。
  16. 根据权利要求14所述的方法,其特征在于,所述根据每个训练图像对应所述预测角度与其对应真实角度,确定每个训练图像的角度误差包括:
    将一个圆周平均划分为多个角区域;
    从所述多个角区域中确定每个训练图像对应真实角度所在目标角区域;
    如果任一个训练图像对应所述预测角度位于其真实角度所在所述目标角区域时,确定所述训练图像不存在角度误差;
    如果任一个训练图像对应所述预测角度不位于其真实角度所在所述目标角区域时，确定所述训练图像存在角度误差。
  17. 根据权利要求13所述的方法,其特征在于,所述网络预测模型通过以下方式预测任一个输入的训练图像的预测结果:
    提取所述训练图像的至少一个特征图像;
    提取每个特征图像的至少一个候选框;其中,每个候选框以四个坐标点表示;
    对所述至少一个特征图像进行关键图像提取,获得至少一个目标特征图像;其中,所述至少一个目标特征图像的数量小于所述至少一个特征图像的数量;
    基于每个目标特征图像对应的至少一个候选框,对每个目标特征图像进行区域特征提取,获得每个目标特征图像对应的至少一个区域特征;
    将所述至少一个目标特征图像分别对应的至少一个区域特征进行全连接处理,以获得所述训练图像对应三维框在世界坐标系下的第一预测坐标、所述目标对象对应二维框在图像坐标系下的第二预测坐标以及所述目标对象相对其电子设备的预测角度。
  18. 根据权利要求4所述的方法,其特征在于,所述确定所述目标对象在图像坐标系中的图像坐标包括:
    将所述目标对象对应三维框在世界坐标系的第一坐标映射到图像坐标系,获得所述目标对象对应三维框在图像坐标系中的图像坐标。
  19. 根据权利要求18所述的方法,其特征在于,所述将所述目标对象对应三维框在世界坐标系的第一坐标映射到图像坐标系,获得所述目标对象对应三维框在图像坐标系中的图像坐标包括:
    根据所述第二坐标以及所述电子设备对应相机的焦距,生成相机内参矩阵;
    确定所述世界坐标系与所述相机坐标系之间进行坐标转换时对应的相机外参矩阵；
    根据所述相机内参矩阵以及所述相机外参矩阵，将所述目标对象对应三维框在世界坐标系的第一坐标映射到图像坐标系，获得所述图像坐标。
  20. 根据权利要求7或18任一项所述的方法,其特征在于,所述根据所述第二坐标以及所述电子设备对应相机的焦距,生成相机内参矩阵包括:
    确定所述第二坐标在图像坐标系的图像中心点;
    根据所述图像中心点以及所述电子设备对应相机的焦距,生成相机内参矩阵。
  21. 一种绕点飞行方法,其特征在于,包括:
    获取目标对象的图像信息;
    基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标;
    根据所述三维框在所述世界坐标系的第一坐标,确定所述目标对象在所述世界坐标系的中心点;
    将所述目标对象在世界坐标系的中心点映射到图像坐标系,获得图像坐标点;
    控制电子设备以所述图像坐标点为中心点绕所述目标对象执行飞行处理。
  22. 根据权利要求21所述的方法，其特征在于，所述基于所述图像信息，确定所述目标对象对应三维框在世界坐标系的第一坐标包括：
    基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标、对应二维框在图像坐标系的第二坐标以及所述目标对象相对电子设备的目标角度;
    确定所述目标对象对应三维框在图像坐标系中的图像坐标;
    基于所述世界坐标系中的第一坐标与所述图像坐标系中的图像坐标之间的坐标转换关系,结合所述第二坐标以及所述目标角度,解算所述目标对象与所述电子设备的相机之间的距离,获得目标距离;
    所述控制电子设备以所述图像坐标点为中心点绕所述目标对象执行飞行处理包括:
    控制所述电子设备以所述图像坐标点为中心点以及以所述目标距离为飞行半径绕所述目标对象执行飞行处理。
  23. 一种对象对焦方法,其特征在于,包括:
    获取目标对象的图像信息;
    基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标;
    根据所述三维框在所述世界坐标系的第一坐标,确定所述目标对象在所述世界坐标系的中心点;
    将所述目标对象在世界坐标系的中心点映射到图像坐标系,获得图像坐标点;
    根据所述图像坐标点,对所述目标对象进行对焦。
  24. 一种对象解算设备，其特征在于，包括：存储组件以及处理组件；所述存储组件用于存储一条或多条计算机指令，所述一条或多条计算机指令用于被所述处理组件调用，以执行如权利要求1-20任一项所述的对象解算方法。
  25. 一种电子设备,其特征在于,包括:存储组件以及处理组件;所述存储组件用于存储一条或多条计算机指令,所述一条或多条计算机指令用于被所述处理组件调用,以执行如权利要求21~22任一项所述的绕点飞行方法。
  26. 一种电子设备,其特征在于,包括:存储组件以及处理组件;所述存储组件用于存储一条或多条计算机指令,所述一条或多条计算机指令用于被所述处理组件调用;
    所述处理组件用于:
    获取目标对象的图像信息;基于所述图像信息,确定所述目标对象对应三维框在世界坐标系的第一坐标;根据所述三维框在所述世界坐标系的第一坐标,确定所述目标对象在所述世界坐标系的中心点;将所述目标对象在世界坐标系的中心点映射到图像坐标系,获得图像坐标点;根据所述图像坐标点,对所述目标对象进行对焦。
PCT/CN2020/080162 2020-03-19 2020-03-19 对象解算、绕点飞行方法及设备 WO2021184289A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/080162 WO2021184289A1 (zh) 2020-03-19 2020-03-19 对象解算、绕点飞行方法及设备
CN202080004354.7A CN113168716A (zh) 2020-03-19 2020-03-19 对象解算、绕点飞行方法及设备

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/080162 WO2021184289A1 (zh) 2020-03-19 2020-03-19 对象解算、绕点飞行方法及设备

Publications (1)

Publication Number Publication Date
WO2021184289A1 true WO2021184289A1 (zh) 2021-09-23

Family

ID=76879199

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/080162 WO2021184289A1 (zh) 2020-03-19 2020-03-19 对象解算、绕点飞行方法及设备

Country Status (2)

Country Link
CN (1) CN113168716A (zh)
WO (1) WO2021184289A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024113455A1 (zh) * 2022-11-29 2024-06-06 北京天玛智控科技股份有限公司 实景监测方法及装置

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610967B (zh) * 2021-08-13 2024-03-26 北京市商汤科技开发有限公司 三维点检测的方法、装置、电子设备及存储介质
CN114092822B (zh) * 2022-01-24 2022-07-26 广东皓行科技有限公司 图像处理方法、移动控制方法以及移动控制系统
CN116741019A (zh) * 2023-08-11 2023-09-12 成都飞航智云科技有限公司 一种基于ai的飞行模型训练方法、训练系统

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105205815A (zh) * 2015-09-15 2015-12-30 西安理工大学 基于云台可控制摄像机的实时视频跟踪系统及跟踪方法
CN106803270A (zh) * 2017-01-13 2017-06-06 西北工业大学深圳研究院 无人机平台基于单目slam的多关键帧协同地面目标定位方法
CN107025659A (zh) * 2017-04-11 2017-08-08 西安理工大学 基于单位球面坐标映射的全景目标跟踪方法
CN109116869A (zh) * 2017-06-23 2019-01-01 北京臻迪科技股份有限公司 一种绕点飞行控制方法及装置
JP2019079091A (ja) * 2017-10-20 2019-05-23 株式会社富士通ビー・エス・シー 移動体制御プログラム、移動体制御方法および情報処理装置
CN110148169A (zh) * 2019-03-19 2019-08-20 长安大学 一种基于ptz云台相机的车辆目标三维信息获取方法
WO2020014909A1 (zh) * 2018-07-18 2020-01-23 深圳市大疆创新科技有限公司 拍摄方法、装置和无人机


Also Published As

Publication number Publication date
CN113168716A (zh) 2021-07-23

Similar Documents

Publication Publication Date Title
WO2021184289A1 (zh) 对象解算、绕点飞行方法及设备
CN112132972B (zh) 一种激光与图像数据融合的三维重建方法及系统
JP6830139B2 (ja) 3次元データの生成方法、3次元データの生成装置、コンピュータ機器及びコンピュータ読み取り可能な記憶媒体
KR102126724B1 (ko) 포인트 클라우드 데이터를 복구하기 위한 방법 및 장치
CN110568447B (zh) 视觉定位的方法、装置及计算机可读介质
CN109682381B (zh) 基于全向视觉的大视场场景感知方法、系统、介质及设备
CN109544677B (zh) 基于深度图像关键帧的室内场景主结构重建方法及系统
US10086955B2 (en) Pattern-based camera pose estimation system
US20210097717A1 (en) Method for detecting three-dimensional human pose information detection, electronic device and storage medium
US10451403B2 (en) Structure-based camera pose estimation system
CN107808407A (zh) 基于双目相机的无人机视觉slam方法、无人机及存储介质
CN112444242A (zh) 一种位姿优化方法及装置
JP2018022360A (ja) 画像解析装置、画像解析方法およびプログラム
US9858669B2 (en) Optimized camera pose estimation system
CN112258565B (zh) 图像处理方法以及装置
CN108764080B (zh) 一种基于点云空间二值化的无人机视觉避障方法
CN110260866A (zh) 一种基于视觉传感器的机器人定位与避障方法
CN110969648A (zh) 一种基于点云序列数据的3d目标跟踪方法及系统
CN114219855A (zh) 点云法向量的估计方法、装置、计算机设备和存储介质
CN115410167A (zh) 目标检测与语义分割方法、装置、设备及存储介质
CN117036300A (zh) 基于点云-rgb异源图像多级配准映射的路面裂缝识别方法
US20210156710A1 (en) Map processing method, device, and computer-readable storage medium
CN111709269B (zh) 一种深度图像中基于二维关节信息的人手分割方法和装置
CN113670316A (zh) 基于双雷达的路径规划方法、系统、存储介质及电子设备
Hu et al. R-CNN based 3D object detection for autonomous driving

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20925379

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20925379

Country of ref document: EP

Kind code of ref document: A1