WO2020029758A1 - Method, apparatus, medium and device for three-dimensional object detection and intelligent driving control - Google Patents

Method, apparatus, medium and device for three-dimensional object detection and intelligent driving control

Info

Publication number
WO2020029758A1 (PCT application PCT/CN2019/096232)
Authority
WO
WIPO (PCT)
Prior art keywords: dimensional, target object, key point, detection, pseudo
Application number
PCT/CN2019/096232
Other languages
English (en)
French (fr)
Inventor
蔡颖婕
曾星宇
闫俊杰
王晓刚
Original Assignee
北京市商汤科技开发有限公司
Application filed by 北京市商汤科技开发有限公司 (Beijing SenseTime Technology Development Co., Ltd.)
Priority to JP2021501280A priority Critical patent/JP6949266B2/ja
Priority to US17/259,678 priority patent/US11100310B2/en
Priority to SG11202100378UA priority patent/SG11202100378UA/en
Publication of WO2020029758A1 publication Critical patent/WO2020029758A1/zh

Classifications

    • G06T7/50 Image analysis: depth or shape recovery
    • G06T7/73 Image analysis: determining position or orientation of objects or cameras using feature-based methods
    • G06V10/764 Image or video recognition using pattern recognition or machine learning: classification, e.g. of video objects
    • G06V10/82 Image or video recognition using pattern recognition or machine learning: neural networks
    • G06V20/56 Scene-specific elements: context or environment of the image exterior to a vehicle, using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/582 Recognition of traffic signs
    • G06V20/584 Recognition of vehicle lights or traffic lights
    • G06V20/647 Three-dimensional objects by matching two-dimensional images to three-dimensional objects
    • G06V2201/07 Indexing scheme: target detection
    • G06V2201/08 Indexing scheme: detecting or categorising vehicles

Definitions

  • Object three-dimensional (3D) detection is usually used to predict three-dimensional spatial parameters of an object, such as its spatial position, movement direction, and 3D size.
  • In automatic driving scenarios, for example, it is necessary to perform three-dimensional detection on other vehicles on the road to obtain their three-dimensional cuboids, their driving directions, and their positional relationships with the camera device.
  • Accurately obtaining the three-dimensional detection results of the object is conducive to improving the safety of autonomous driving.
  • the embodiments of the present disclosure provide a technical solution for three-dimensional object detection and intelligent driving control.
  • According to an aspect of the embodiments of the present disclosure, a method for three-dimensional object detection includes: obtaining two-dimensional coordinates of key points of a target object in an image to be processed; constructing a pseudo three-dimensional detection body of the target object according to the two-dimensional coordinates of the key points; obtaining depth information of the key points; and determining the three-dimensional detection body of the target object according to the depth information of the key points and the pseudo three-dimensional detection body.
  • According to another aspect, a method for intelligent driving control includes: using a video frame included in a video captured by a camera device installed on a vehicle as the image to be processed, and determining a three-dimensional detection body of a target object by the method of any of the foregoing embodiments; generating a vehicle control instruction according to information of the three-dimensional detection body; and sending the vehicle control instruction to the vehicle.
  • According to another aspect, a three-dimensional object detection device includes: a two-dimensional coordinate acquisition module for acquiring two-dimensional coordinates of key points of a target object in an image to be processed; a constructing three-dimensional detection body module for constructing a pseudo three-dimensional detection body of the target object according to the two-dimensional coordinates of the key points; an acquiring depth information module for acquiring depth information of the key points; and a determining three-dimensional detection body module for determining the three-dimensional detection body of the target object according to the depth information of the key points and the pseudo three-dimensional detection body.
  • According to another aspect, an intelligent driving control device includes: the three-dimensional object detection device according to any one of the above embodiments of the present disclosure, for using a video frame included in a video captured by a camera device installed on a vehicle as the image to be processed and determining a three-dimensional detection body of a target object; an instruction generation module for generating a vehicle control instruction according to information of the three-dimensional detection body; and an instruction sending module for sending the vehicle control instruction to the vehicle.
  • According to another aspect, an electronic device includes: a memory for storing a computer program; and a processor for executing the computer program stored in the memory, where execution of the computer program implements the method of any one of the above embodiments of the present disclosure.
  • According to another aspect, a computer-readable storage medium stores a computer program that, when executed by a processor, implements the method of any one of the foregoing embodiments of the present disclosure.
  • According to another aspect, a computer program includes computer instructions that, when run in a processor of a device, implement the method of any one of the foregoing embodiments of the present disclosure.
  • Based on the above, the embodiments of the present disclosure use the depth information of the key points together with the pseudo three-dimensional detection body to determine the three-dimensional detection body of the target object, which is conducive to improving the accuracy of three-dimensional object detection.
  • FIG. 1 is a flowchart of an embodiment of a three-dimensional object detection method according to the present disclosure
  • FIG. 2 is a schematic diagram of an embodiment of key points of a target object in an image to be processed of the present disclosure
  • FIG. 3 is a schematic diagram of an embodiment of a pseudo three-dimensional detection body of the present disclosure
  • FIG. 4 is a flowchart of an embodiment of an intelligent driving control method according to the present disclosure.
  • FIG. 5 is a schematic structural diagram of an embodiment of a three-dimensional object detection device according to the present disclosure.
  • FIG. 6 is a schematic structural diagram of an embodiment of an intelligent driving control device according to the present disclosure.
  • FIG. 7 is a block diagram of an exemplary device implementing an embodiment of the present disclosure.
  • the term "and / or” in the disclosure is only an association relationship describing the associated object, which means that there can be three kinds of relationships, for example, A and / or B can mean: A exists alone, and A and B exist simultaneously, There are three cases of B alone.
  • the character "/" in the present disclosure generally indicates that the related objects before and after are an "or" relationship.
  • Embodiments of the present disclosure can be applied to electronic devices such as terminal devices, computer systems, and servers, which can operate with many other general or special-purpose computing system environments or configurations.
  • Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with such electronic devices include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, small computer systems, mainframe computer systems, and distributed cloud computing environments including any of these systems.
  • Electronic devices such as terminal equipment, computer systems, and servers can be described in the general context of computer system executable instructions (such as program modules) executed by a computer system.
  • program modules may include routines, programs, target programs, components, logic, and data structures, etc., which perform specific tasks or implement specific abstract data types.
  • The computer system/server can be implemented in a distributed cloud computing environment, where tasks are performed by remote processing devices linked through a communication network, and program modules may be located on local or remote computing system storage media including storage devices.
  • FIG. 1 is a flowchart of an embodiment of a three-dimensional object detection method of the present disclosure. As shown in FIG. 1, the method in this embodiment includes:
  • The image to be processed in the embodiments of the present disclosure may be a still picture or photo, or a video frame in a dynamic video, for example, a video frame in a video captured by a camera device provided on a moving object, or a video frame in a video captured by a camera device disposed at a fixed position.
  • The moving object may be a vehicle, a robot, a robotic arm, or the like; the fixed position may be a desktop, a wall, or the like. The embodiments of the present disclosure do not limit the form of the moving object or the fixed position.
  • The image to be processed may be an image obtained with an ordinary high-definition camera, which avoids the high implementation cost caused by having to use a radar ranging device or a depth camera.
  • The key points in the embodiments of the present disclosure are key points with semantics, usually outer-contour key points of the target object.
  • Taking a vehicle as the target object, the semantic key points may include: a key point at the bottom of the front left corner of the vehicle (shown as 1 in FIG. 2, referred to below as front-left-bottom), a key point at the front left corner of the roof (shown as 2 in FIG. 2, referred to below as front-left-top), a key point at the rear left corner of the roof (shown as 3 in FIG. 2, referred to below as rear-left-top), a key point at the bottom of the rear left corner of the vehicle (shown as 4 in FIG. 2, referred to below as rear-left-bottom), and so on for the right side and the wheels.
  • the semantics of the key points can indicate the position of the key points on the vehicle.
  • The vehicle in the embodiments of the present disclosure may also include a larger number of key points; the embodiments of the present disclosure do not limit the number of key points of the target object or the semantics they express.
  • Any key point in the embodiments of the present disclosure generally corresponds to one, two, or three surfaces of the pseudo three-dimensional detection body (such as a three-dimensional cuboid); likewise, a key point usually corresponds to one or more surfaces of the three-dimensional detection body. That is, there is a correspondence between the key points and the surfaces of the pseudo three-dimensional detection body and of the three-dimensional detection body.
  • For example, front-left-bottom, front-left-top, front-right-bottom, and front-right-top correspond to the front face of the pseudo three-dimensional detection body and of the three-dimensional detection body; that is, these four key points can be observed from a position in front of the vehicle. Rear-left-bottom, rear-left-top, rear-right-bottom, and rear-right-top correspond to the rear face. Front-right-bottom, front-right-top, rear-right-bottom, rear-right-top, and the front-right and rear-right wheels correspond to the right face; that is, they can be observed from a position to the right of the vehicle, and the left face is analogous. The eight corner key points (front-left-bottom, front-left-top, front-right-bottom, front-right-top, rear-left-bottom, rear-left-top, rear-right-bottom, and rear-right-top) correspond to the top face; that is, eight key points can be observed from a position above the vehicle. The four bottom corner key points together with the front-left, front-right, rear-left, and rear-right wheels correspond to the bottom face.
  • Optionally, the embodiments of the present disclosure may not set correspondences between the key points and the top and bottom faces of the pseudo three-dimensional detection body and of the three-dimensional detection body.
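  • As a minimal illustrative sketch of this correspondence (the key-point names and the exact face membership below are assumptions chosen for illustration, not the patent's normative definition), the face-to-key-point relation can be held in a simple lookup table:

        # Hypothetical face -> key-point correspondence table (Python).
        # Key-point names follow the naming used in the text above.
        FACE_KEYPOINTS = {
            "front": ["front-left-bottom", "front-left-top",
                      "front-right-bottom", "front-right-top"],
            "rear":  ["rear-left-bottom", "rear-left-top",
                      "rear-right-bottom", "rear-right-top"],
            "right": ["front-right-bottom", "front-right-top",
                      "rear-right-bottom", "rear-right-top",
                      "front-right-wheel", "rear-right-wheel"],
            "left":  ["front-left-bottom", "front-left-top",
                      "rear-left-bottom", "rear-left-top",
                      "front-left-wheel", "rear-left-wheel"],
        }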
  • In an optional example, target object detection may first be performed on the image to be processed to obtain a two-dimensional target detection frame containing the target object. Accordingly, in S100, the two-dimensional coordinates of the key points of the target object may be acquired based on the image portion of the image to be processed corresponding to the two-dimensional target detection frame.
  • For example, the image portion corresponding to the detection frame may be segmented out to obtain a target object image block (an image block containing the target object, such as a vehicle image block). The target object image block is input into a neural network, which performs key point detection (such as vehicle key point detection) on it; the two-dimensional coordinates of each key point of the target object (such as a vehicle) in the target object image block can then be obtained from the information output by the neural network, and these coordinates are converted into the two-dimensional coordinates of the key points in the image to be processed, as sketched below.
  • The embodiments of the present disclosure do not limit the implementation manner of obtaining the two-dimensional coordinates of key points via a neural network.
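  • A minimal sketch of the block-to-image coordinate conversion just described (the parameter names, and the assumption that the block was cropped at (box_x1, box_y1) and resized by a uniform factor, are illustrative):

        # Convert key points predicted inside a cropped detection-frame image
        # block back into full-image coordinates.
        from typing import List, Tuple

        def block_to_image_coords(
            keypoints_in_block: List[Tuple[float, float]],
            box_x1: float, box_y1: float,  # top-left corner of the detection frame
            scale: float = 1.0,            # resize factor applied when cropping
        ) -> List[Tuple[float, float]]:
            return [(box_x1 + u / scale, box_y1 + v / scale)
                    for (u, v) in keypoints_in_block]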
  • If a target object is detected, the embodiments of the present disclosure continue to perform the other steps of the three-dimensional object detection method; otherwise, the other steps may no longer be executed for this image, which is conducive to saving computing resources.
  • The neural network in the embodiments of the present disclosure may include, but is not limited to, convolutional layers, nonlinear ReLU layers, pooling layers, and fully connected layers; the more layers the neural network includes, the deeper the network.
  • The neural network may use a stacked hourglass framework structure, or a framework structure based on an Active Shape Model (ASM), an Active Appearance Model (AAM), or a cascaded shape regression algorithm; the embodiments of the present disclosure do not limit the structure of the neural network.
  • the operation S100 may be performed by a processor calling a corresponding instruction stored in a memory, or may be performed by a two-dimensional coordinate acquisition module 500 executed by the processor.
  • In an optional example, the embodiments of the present disclosure may first filter all currently obtained key points of the target object to select those that meet the prediction accuracy requirements (for example, key points whose confidence is higher than a predetermined confidence threshold), and then use the two-dimensional coordinates of the selected key points to construct the pseudo three-dimensional detection body of the target object in the two-dimensional plane. Since key points with low prediction accuracy are excluded from the construction process, this is beneficial to improving the accuracy of the constructed pseudo three-dimensional cuboid.
  • In an optional example, a predetermined surface quality judgment rule may be used to determine an optimal surface and a sub-optimal surface from among at least one constructed possible surface, and the pseudo three-dimensional detection body of the target object is then constructed based on the optimal and sub-optimal surfaces.
  • For example, the embodiments of the present disclosure may first determine the optimal surface of the target object in the image to be processed, construct that optimal surface in the two-dimensional plane, then determine the normal vector of the optimal surface, and form the pseudo three-dimensional detection body by expanding the key points of the optimal surface along the direction of the normal vector. This is conducive to constructing the pseudo three-dimensional detection body quickly and accurately.
  • the key points of the target object may include multiple points.
  • The manner of determining the optimal surface of the target object in the image to be processed may be: first, for each surface, evaluate its quality based on the key points that meet the prediction accuracy requirements and correspond to that surface; then take the surface with the highest quality evaluation as the optimal surface of the target object. A pseudo three-dimensional detection body of the target object can then be constructed according to the two-dimensional coordinates of the selected key points.
  • One surface quality evaluation method may be: count the number of key points corresponding to each surface that meet the prediction accuracy requirements, and use that count as the quality evaluation score of the surface; the more qualifying key points a surface corresponds to, the higher its quality evaluation score.
  • Another surface quality evaluation method may be: for each surface, count both the number of qualifying key points and the sum of their prediction accuracies, so that each surface corresponds to a key-point count and a prediction accuracy score; the embodiments of the present disclosure can then compute the quotient of the prediction accuracy score and the key-point count, that is, the average prediction accuracy score of the surface, and use that average as the quality evaluation score. The higher the average prediction accuracy score of a surface, the higher its quality evaluation score.
  • The embodiments of the present disclosure may also use other methods to determine surface quality, and do not limit the implementation of the surface quality evaluation method.
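  • A minimal sketch of the two scoring schemes just described (the confidence threshold, the data layout, and the reuse of the hypothetical FACE_KEYPOINTS table above are assumptions):

        from typing import Dict, List, Tuple

        def face_scores(
            keypoint_conf: Dict[str, float],       # key-point name -> confidence
            face_keypoints: Dict[str, List[str]],  # face -> its key-point names
            conf_threshold: float = 0.5,           # assumed accuracy threshold
        ) -> Dict[str, Tuple[int, float]]:
            """Return (count score, average-accuracy score) per candidate face."""
            scores = {}
            for face, names in face_keypoints.items():
                qualified = [keypoint_conf[n] for n in names
                             if keypoint_conf.get(n, 0.0) > conf_threshold]
                count_score = len(qualified)      # first scheme: count
                avg_score = (sum(qualified) / len(qualified)
                             if qualified else 0.0)  # second scheme: mean accuracy
                scores[face] = (count_score, avg_score)
            return scores

    The optimal surface is then the face maximizing whichever score is chosen, for example max(scores, key=lambda f: scores[f][0]).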
  • The embodiments of the present disclosure may construct the optimal surface in the two-dimensional plane in various ways. For example, a vertical line may be drawn through a key point of the optimal surface in the two-dimensional plane (that is, a line through the key point in the vertical direction); one edge of the optimal surface lies on this vertical line, and the intersection of the vertical line with an edge of another surface is a vertex of the optimal surface.
  • For another example, a connecting line may be drawn between two key points of the optimal surface in the two-dimensional plane; the connecting line, or the connecting line together with its extension, forms one edge of the optimal surface. That is, the two key points may themselves be two vertices of the optimal surface, or the intersection of the extension of their connecting line with an edge of another surface is a vertex of the optimal surface.
  • For still another example, a parallel line may be drawn through a key point of the optimal surface, that is, a line parallel to another edge of the optimal surface; one edge of the optimal surface lies on this parallel line, and the intersection of the parallel line with the above-mentioned vertical line, or with an edge of another surface, is a vertex of the optimal surface.
  • The embodiments of the present disclosure do not limit the implementation manner of constructing the optimal surface in the two-dimensional plane.
  • The embodiments of the present disclosure can determine the normal vector of the optimal surface in multiple ways. In a first example, the sub-optimal surface of the pseudo three-dimensional detection body is determined first, and a vertical line is drawn from a key point of the sub-optimal surface to the optimal surface; this vertical line serves as the normal vector of the optimal surface.
  • In a second example, the key points corresponding to the optimal surface are removed from all key points that meet the prediction accuracy requirements, the key point with the highest prediction accuracy is selected from the remaining key points, and a vertical line is drawn through that key point to the optimal surface; this vertical line serves as the normal vector of the optimal surface.
  • In a third example, the coordinate difference between two key points in the two-dimensional plane may be used as the normal vector of the optimal surface. For example, in FIG. 2, suppose the left side of the vehicle is the optimal surface and the front of the vehicle is the sub-optimal surface, the coordinates of key point 7 in the two-dimensional plane are (u7, v7), and the coordinates of key point 1 in the two-dimensional plane are (u1, v1).
  • Suppose that in addition to key points 1 through 8, key point 10 is also obtained, that key points 1, 2, 3, 4, 5, 6, 7, 8, and 10 all meet the prediction accuracy requirements, and that the prediction accuracy of key point 10 is high; in this situation, key point 10 is clearly a key point detection error. By determining the sub-optimal surface using the method above, the embodiments of the present disclosure can avoid drawing the vertical line from key point 10 to the optimal surface, and thereby obtain a correct normal vector of the optimal surface.
  • In this example, the normal vector of the optimal surface is (u7 - u1, v7 - v1); this normal vector also lies along the bottom edge of the sub-optimal surface. The embodiments of the present disclosure can form a third vertical line through key point 7, and a third line through key point 7 parallel to the first or second line; the top-left vertex of the optimal surface is expanded along the direction of the normal vector until it intersects the third vertical line, thereby forming the sub-optimal surface.
  • An example of a pseudo three-dimensional detection body formed for a target object in an image to be processed according to an embodiment of the present disclosure is shown in FIG. 3. After the optimal surface and its normal vector are determined, the pseudo three-dimensional detection body can be formed in various ways; the embodiments of the present disclosure do not limit the implementation process of forming the pseudo three-dimensional detection body.
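  • A minimal sketch of one such way, assuming the optimal face is already a quadrilateral in the image plane (the function name, the (4, 2) vertex layout, and the simple translation-extrusion are illustrative assumptions):

        # Extrude the optimal face along the 2D normal vector to obtain the
        # eight projected corners of a pseudo three-dimensional cuboid.
        import numpy as np

        def build_pseudo_3d_box(best_face_quad: np.ndarray,  # (4, 2) vertices
                                normal_2d: np.ndarray        # e.g. (u7-u1, v7-v1)
                                ) -> np.ndarray:
            """Return (8, 2): the optimal face plus a copy shifted by the normal."""
            far_face = best_face_quad + normal_2d  # expand along the normal vector
            return np.vstack([best_face_quad, far_face])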
  • In an optional example, the operation S110 may be performed by a processor calling a corresponding instruction stored in a memory, or may be performed by a constructing three-dimensional detection body module 510 run by the processor.
  • The embodiments of the present disclosure may first obtain a depth map of the image to be processed using a monocular or binocular method, and then use the two-dimensional coordinates of the key points to read the depth values of the key points from the depth map. The depth value of a key point can also be obtained directly using an H-matrix method, that is, the two-dimensional coordinates of the key point are multiplied by the H matrix, and the depth value of the key point (in meters, for example) is obtained from the result of the multiplication. If the camera device is a depth-based camera device, the depth values of the key points can be obtained directly. The embodiments of the present disclosure do not limit the implementation process of obtaining the depth values of the key points.
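  • A minimal sketch of the first two depth-retrieval routes (the depth-map layout, the H-matrix shape, and which component of the product holds the depth are all assumptions here):

        import numpy as np

        def depth_from_map(depth_map: np.ndarray, u: float, v: float) -> float:
            """Read a key point's depth from an (H, W) depth map at pixel (u, v)."""
            return float(depth_map[int(round(v)), int(round(u))])

        def depth_from_h_matrix(H: np.ndarray, u: float, v: float) -> float:
            """Multiply the homogeneous pixel coordinate by a 3x3 H matrix; the
            text reads the depth (in meters) from the product. Taking the last
            component is an assumption made for illustration."""
            prod = H @ np.array([u, v, 1.0])
            return float(prod[-1])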
  • the operation S120 may be performed by a processor calling a corresponding instruction stored in a memory, or may be performed by an acquiring depth information module 520 operated by the processor.
  • S130: Determine the three-dimensional detection body of the target object according to the depth information of the key points and the pseudo three-dimensional detection body.
  • The operation S130 may be performed by a processor calling a corresponding instruction stored in the memory, or may be performed by a determining three-dimensional detection body module 530 run by the processor.
  • In an optional example, an embodiment of the present disclosure may first construct an initial three-dimensional detection body (such as an initial three-dimensional cuboid) of the target object in three-dimensional space according to the two-dimensional coordinates and depth information of the key points, and then, using at least the pseudo three-dimensional detection body as a constraint condition, correct the initial three-dimensional detection body to obtain the three-dimensional detection body (such as a three-dimensional cuboid) of the target object.
  • the three-dimensional space in the embodiment of the present disclosure is generally a three-dimensional space in the real world, for example, a three-dimensional space based on a three-dimensional coordinate system of a camera device.
  • The embodiments of the present disclosure may convert the two-dimensional coordinates of a key point into three-dimensional coordinates in three-dimensional space in various ways. For example, the depth value of the key point obtained above is converted into a distance in three-dimensional space, which can be regarded as the distance between the key point and the camera device; then the three-dimensional coordinates of each key point are calculated using the following formula (1):

        w · [u, v, 1]^T = P · [X, Y, Z, 1]^T        (1)

    where P represents the parameters of the camera device; X, Y, and Z represent the three-dimensional coordinates of the key point in the real-world three-dimensional space, where Z can be substituted with the depth value obtained above; u and v represent the two-dimensional coordinates of the key point in the coordinate system of the image to be processed; and w represents a scaling factor. Substituting the known quantities, the variables X, Y, and w can be solved, thereby obtaining the three-dimensional coordinates (X, Y, Z) of the key point.
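  • A minimal sketch of solving formula (1), assuming P is the 3x4 camera projection matrix (the function name and the linear-system rearrangement are illustrative):

        # Recover (X, Y) from pixel (u, v) and known depth Z by solving the
        # three linear equations w*[u, v, 1]^T = P @ [X, Y, Z, 1]^T for X, Y, w.
        import numpy as np

        def backproject(P: np.ndarray, u: float, v: float, Z: float) -> np.ndarray:
            # Rearranged as A @ [X, Y, w]^T = b, moving the known terms
            # b = -(P[:, 2]*Z + P[:, 3]) to the right-hand side.
            A = np.column_stack([P[:, 0], P[:, 1], -np.array([u, v, 1.0])])
            b = -(P[:, 2] * Z + P[:, 3])
            X, Y, w = np.linalg.solve(A, b)
            return np.array([X, Y, Z])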
  • the manner in which the embodiment of the present disclosure determines the best surface of the target object in the three-dimensional space may be: first, for each surface corresponding to a key point that meets the requirements of prediction accuracy, determine the quality of each surface, That is, based on the key points that meet the requirements of prediction accuracy, quality evaluation is performed for each surface; then, the surface with the highest quality evaluation is used as the best surface of the target object.
  • The surface quality evaluation methods can be those exemplified in step S110 above; the description is not repeated here.
  • The embodiments of the present disclosure can construct the optimal surface in three-dimensional space in various ways. For example, a vertical line may be drawn through a key point of the optimal surface in three-dimensional space (that is, a line through the key point in the vertical direction, the y direction); one edge of the optimal surface lies on this vertical line, and the intersection of the vertical line with an edge of another surface is a vertex of the optimal surface.
  • For another example, a connecting line may be drawn between two key points of the optimal surface in three-dimensional space; the connecting line, or the connecting line together with its extension, forms one edge of the optimal surface. That is, the two key points may themselves be two vertices of the optimal surface, or the intersection of the extension of their connecting line with an edge of another surface is a vertex of the optimal surface.
  • For still another example, a parallel line may be drawn through a key point of the optimal surface, that is, a line parallel to another edge of the optimal surface; one edge of the optimal surface lies on this parallel line, and the intersection of the parallel line with the above-mentioned vertical line, or with an edge of another surface, is a vertex of the optimal surface.
  • The embodiments of the present disclosure do not limit the implementation manner of constructing the optimal surface in three-dimensional space.
  • The embodiments of the present disclosure can determine the normal vector of the optimal surface in three-dimensional space in multiple ways. In a first example, the sub-optimal surface of the three-dimensional detection body is determined first, and a vertical line is drawn from a key point of the sub-optimal surface to the optimal surface; this vertical line serves as the normal vector of the optimal surface.
  • In a second example, the key points corresponding to the optimal surface are removed from all key points that meet the prediction accuracy requirements, the key point with the highest prediction accuracy is selected from the remaining key points, and a vertical line is drawn through that key point to the optimal surface; this vertical line serves as the normal vector of the optimal surface.
  • After the normal vector is determined, the vertices of the optimal surface may be extended in the direction of the normal vector until they intersect the edges of the other surfaces, finally forming the initial three-dimensional detection body. For example, in FIG. 2, a first vertical line is formed through key point 1 and a second vertical line through key point 4, and a line passing through key point 6 and key point 5 is drawn perpendicular to the first and second vertical lines.
  • In this example, the normal vector of the optimal surface is (X7 - X1, Y7 - Y1, Z7 - Z1); this normal vector also lies along the bottom edge of the sub-optimal surface. The embodiments of the present disclosure can form a third vertical line through key point 7, and a third line through key point 7 parallel to the first or second line; the top-left vertex of the optimal surface is expanded along the direction of the normal vector until it intersects the third vertical line.
  • In this way, an initial three-dimensional detection body can be constructed quickly for the target object, with low consumption of computing resources and low implementation cost.
  • Since the embodiments of the present disclosure construct the initial three-dimensional detection body based on the key points of the target object, the construction process is independent of factors such as whether the target object is located on the ground. The embodiments of the present disclosure can therefore effectively avoid the phenomenon that three-dimensional detection cannot be achieved in scenes where the target object is not on the ground, which is conducive to broadening the applicable range of three-dimensional object detection.
  • Next, the initial three-dimensional detection body is corrected according to the pseudo three-dimensional detection body to form the three-dimensional detection body of the target object. For example, the initial three-dimensional detection body in three-dimensional space may be adjusted based on the pseudo three-dimensional detection body in the two-dimensional plane, so as to increase the area overlap between the region into which the adjusted three-dimensional detection body maps in the two-dimensional plane and the pseudo three-dimensional detection body.
  • For example, each vertex of the initial three-dimensional detection body may be mapped into the two-dimensional plane to obtain its graphic in the two-dimensional plane, and the area overlap between this mapped graphic region and the region of the pseudo three-dimensional detection body in the two-dimensional plane can then be changed, making the overlap area of the two as large as possible, for example, making their intersection-over-union as large as possible.
  • One manner of changing the area overlap may include: adjusting the position of the initial three-dimensional detection body in three-dimensional space so that the overlap area between its mapped region in the two-dimensional plane and the pseudo three-dimensional detection body is the largest, for example, so that the mapped graphic region completely covers the pseudo three-dimensional detection body, or the pseudo three-dimensional detection body completely covers the mapped graphic region.
  • Another manner of changing the area overlap may include: adjusting the size of the initial three-dimensional detection body in three-dimensional space so that its mapped graphic region in the two-dimensional plane is as consistent as possible with the graphic region of the pseudo three-dimensional detection body. For example, the length, width, and height of the initial three-dimensional detection body can be adjusted so that the ratio of the adjusted body's length, width, and height to those of the pseudo three-dimensional detection body satisfies a predetermined ratio or is the same.
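  • A minimal sketch of scoring one candidate adjustment by intersection-over-union (using axis-aligned bounding rectangles of the two point sets is a simplifying assumption; the text speaks of region overlap in general):

        import numpy as np

        def aabb(points_2d: np.ndarray):
            """Axis-aligned box (x1, y1, x2, y2) of an (N, 2) point set."""
            return (*points_2d.min(axis=0), *points_2d.max(axis=0))

        def iou(a, b) -> float:
            ax1, ay1, ax2, ay2 = a
            bx1, by1, bx2, by2 = b
            iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
            ih = max(0.0, min(ay2, by2) - max(ay1, by1))
            inter = iw * ih
            union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
            return inter / union if union > 0 else 0.0

        def overlap_score(projected_corners, pseudo_corners) -> float:
            """Higher is better; maximize over candidate position/size changes."""
            return iou(aabb(projected_corners), aabb(pseudo_corners))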
  • The embodiments of the present disclosure correct the initial three-dimensional detection body in three-dimensional space using the pseudo three-dimensional detection body, which is beneficial to improving the accuracy of the three-dimensional detection body constructed for the target object in three-dimensional space.
  • In an optional example, the embodiments of the present disclosure may further use a preset length-width-height ratio of the target object as a constraint condition on the initial three-dimensional detection body, so that the initial three-dimensional detection body can be corrected in three-dimensional space according to this constraint.
  • For example, the ratio of the length, width, and height of a vehicle may be preset to 2:1:1; if the length-width-height ratio of the initial three-dimensional detection body deviates from 2:1:1 beyond a certain range, its length, width, and height can be adjusted so that the adjusted ratio no longer exceeds that range.
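  • A minimal sketch of such a ratio constraint (the tolerance value and the snap-to-ratio strategy are assumptions; the 2:1:1 ratio comes from the text above):

        def clamp_to_ratio(length: float, width: float, height: float,
                           ratio=(2.0, 1.0, 1.0), tolerance: float = 0.2):
            """Rescale (length, width, height) toward the preset ratio when any
            dimension strays from it by more than the tolerance."""
            scale = (length / ratio[0] + width / ratio[1] + height / ratio[2]) / 3.0
            target = tuple(r * scale for r in ratio)
            dims = (length, width, height)
            if any(abs(d - t) / t > tolerance for d, t in zip(dims, target)):
                return target   # snap to the nearest ratio-shaped box
            return dims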
  • In an optional example, the embodiments of the present disclosure may also use the detection frame of the target object in the image to be processed as a constraint condition on the initial three-dimensional detection body, so that the initial three-dimensional detection body can be corrected in three-dimensional space according to this constraint.
  • For example, a vehicle detection frame (also referred to as the vehicle bounding box) may be used as the constraint, and the overall position and/or the length, width, and height of the initial three-dimensional detection body are adjusted so that the adjusted three-dimensional detection body, when mapped into the two-dimensional plane, falls completely inside the detection frame. Because the detection frame of the target object is usually quite accurate, using it as a constraint to correct the initial three-dimensional detection body is beneficial to improving the accuracy of the three-dimensional detection result.
  • In an optional example, for the same target object in multiple images to be processed that have a temporal relationship, the embodiments of the present disclosure may smooth its three-dimensional detection body.
  • The smoothing processing may include at least one of: smoothing the length, width, and height of the three-dimensional detection body, smoothing the movement direction of the three-dimensional detection body, and smoothing the center point of the bird's-eye view of the three-dimensional detection body; the eight vertices of the three-dimensional detection body may also be smoothed.
  • By smoothing the three-dimensional detection body, the embodiments of the present disclosure facilitate improving the accuracy of three-dimensional object detection and can avoid large-scale shaking of the target object between adjacent video frames, which is conducive to improving the safety of automatic driving.
  • During smoothing, the embodiments of the present disclosure may use multiple historical images (such as 5, 6, or 7 historical video frames) preceding the current image to be processed, and use corresponding fitting functions to predict parameters of the three-dimensional detection body of the target object in the current image, such as its length, width, height, movement direction, or bird's-eye-view center point; the eight vertices of the three-dimensional detection body of the target object may also be smoothed in this way.
  • The fitting function may be a quadratic function, a cubic exponential function, a logarithmic function, or the like; the embodiments of the present disclosure do not limit the form of the fitting function used in the smoothing process.
  • Taking a quadratic fitting function over five historical video frames as an example, the fitting function can be expressed as formula (3):

        x = a·t² + b·t + c        (3)

    where x = (x1, x2, x3, x4, x5) represents the values of the historical video frames used for the optimal fitting, t = (t1, t2, t3, t4, t5) represents the times corresponding to the historical video frames, and a, b, and c represent the coefficients of the quadratic function. The coefficients a, b, and c of formula (3) may be obtained by fitting over the historical video frames, and the prediction result pred of the current video frame may then be obtained using formula (3). The smoothed result of the current video frame is the weighted combination

        x6 = α·pred + β·x6'

    where α represents the weight corresponding to the prediction result pred, β represents the weight corresponding to the three-dimensional object detection result x6' of the current video frame, and x6 represents the smoothed three-dimensional object detection result of the current video frame. The embodiments of the present disclosure do not limit the values of the weights.
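  • A minimal sketch of this quadratic-fit smoothing (the weight values alpha/beta are assumptions; np.polyfit returns the coefficients a, b, c of formula (3), highest degree first):

        import numpy as np

        def smooth_parameter(t_hist: np.ndarray,   # times of 5 historical frames
                             x_hist: np.ndarray,   # parameter values in those frames
                             t_now: float,         # time of the current frame
                             x_now: float,         # current detection result x6'
                             alpha: float = 0.5, beta: float = 0.5) -> float:
            a, b, c = np.polyfit(t_hist, x_hist, deg=2)  # fit formula (3)
            pred = a * t_now**2 + b * t_now + c          # prediction for this frame
            return alpha * pred + beta * x_now           # smoothed result x6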
  • An embodiment of the present disclosure also provides a smart driving control method.
  • The intelligent driving control method of this embodiment includes: using a video frame included in a video captured by a camera device installed on a vehicle as the image to be processed, and determining a three-dimensional detection body of a target object using the three-dimensional object detection method of any of the above embodiments of the present disclosure; generating a vehicle control instruction according to the information of the three-dimensional detection body; and sending the vehicle control instruction to the vehicle.
  • FIG. 4 is a flowchart of an embodiment of an intelligent driving control method according to an embodiment of the present disclosure.
  • The intelligent driving control method of the embodiments of the present disclosure can be applied to automatic driving environments (such as fully automatic driving without human assistance) as well as assisted driving environments.
  • the embodiments of the present disclosure do not limit the application environment of the intelligent driving control method.
  • the operation S400 may be performed by a processor calling a corresponding instruction stored in a memory, or may be performed by a two-dimensional coordinate acquisition module 500 executed by the processor.
  • The operation S410 may be performed by a processor calling a corresponding instruction stored in the memory, or may be performed by a constructing three-dimensional detection body module 510 run by the processor.
  • The operation S420 may be performed by a processor calling a corresponding instruction stored in the memory, or may be performed by an acquiring depth information module 520 run by the processor.
  • S430: Determine the three-dimensional detection body of the target object according to the depth information of the key points and the pseudo three-dimensional detection body.
  • For this operation, reference may be made to the description of operation S130 in FIG. 1 in the foregoing method embodiment; details are not repeated here.
  • The operation S430 may be performed by the processor calling a corresponding instruction stored in the memory, or may be performed by a determining three-dimensional detection body module 530 run by the processor.
  • The information of the three-dimensional detection body in the embodiments of the present disclosure includes any one or more of the following: the movement direction of the three-dimensional detection body, the positional relationship between the three-dimensional detection body and the camera device, and the size of the three-dimensional detection body. The embodiments of the present disclosure do not limit the content included in the information of the three-dimensional detection body.
  • In an optional example, the generated vehicle control instruction may include any one or more of the following: a braking instruction, a deceleration instruction, a left-steering instruction, a right-steering instruction, an instruction to maintain the current speed, a whistle instruction, and an acceleration instruction. The embodiments of the present disclosure do not limit the expression form of the vehicle control instruction.
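  • Purely as an illustrative sketch of how such information might feed an instruction (the thresholds and the rule itself are assumptions, not the patent's control logic):

        def pick_instruction(distance_m: float, closing_speed_mps: float) -> str:
            """Map a detected body's distance and closing speed to one of the
            instruction types listed above."""
            if distance_m < 5.0:
                return "BRAKE"
            if closing_speed_mps > 0 and distance_m / closing_speed_mps < 2.0:
                return "DECELERATE"   # time-to-contact under 2 seconds
            return "KEEP_CURRENT_SPEED"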
  • The operation S440 may be performed by the processor calling a corresponding instruction stored in the memory, or may be performed by an instruction generation module 610 run by the processor.
  • The operation S450 may be performed by the processor calling a corresponding instruction stored in the memory, or may be performed by an instruction sending module 620 run by the processor.
  • In addition to intelligent driving control, the three-dimensional object detection technology of the embodiments of the present disclosure can also be applied to other fields, for example, object detection in industrial manufacturing, object detection in indoor areas such as supermarkets, and object detection in the security field.
  • the embodiments of the present disclosure do not limit the applicable scenarios of the three-dimensional object detection technology.
  • any of the methods for three-dimensional object detection and intelligent driving control provided by the embodiments of the present disclosure may be executed by any appropriate device having data processing capabilities, including, but not limited to, a terminal device and a server.
  • Any method of three-dimensional object detection and intelligent driving control provided by the embodiments of the present disclosure may be executed by a processor; for example, the processor executes any of these methods by calling corresponding instructions stored in a memory. This is not repeated below.
  • A person of ordinary skill in the art may understand that all or part of the operations of the foregoing method embodiments may be performed by a program instructing related hardware. The foregoing program may be stored in a computer-readable storage medium; when executed, the program performs the operations of the foregoing method embodiments. The foregoing storage media include media that can store program code, such as ROM, RAM, magnetic disks, or optical discs.
  • the two-dimensional coordinate acquisition module 500 is configured to acquire two-dimensional coordinates of a key point of a target object in an image to be processed.
  • the image to be processed in the embodiment of the present disclosure may be a video frame in a video captured by an imaging device disposed on a moving object.
  • the image to be processed in the embodiment of the present disclosure may also be a video frame in a video captured by an imaging device disposed at a fixed position.
  • the target objects in the embodiments of the present disclosure may include any one or more of the following: motor vehicles, non-motor vehicles, pedestrians, animals, buildings, plants, obstacles, dangerous objects, traffic signs, and articles.
  • the obtaining target detection frame module 550 may perform target object detection on the image to be processed to obtain a two-dimensional target detection frame including the target object.
  • the two-dimensional coordinate acquisition module 500 can acquire the two-dimensional coordinates of the key points of the target object based on the image portion of the two-dimensional target detection frame corresponding to the image to be processed.
  • If no target object is detected, the three-dimensional object detection device in the embodiments of the present disclosure may no longer perform the three-dimensional object detection processing; for example, the two-dimensional coordinate acquisition module 500 no longer performs the operation of acquiring the two-dimensional coordinates of key points based on the two-dimensional target detection frame.
  • The constructing three-dimensional detection body module 510 is configured to construct a pseudo three-dimensional detection body of the target object according to the two-dimensional coordinates of the key points.
  • the second sub-module may include a first unit and a second unit.
  • the first unit is used to determine an optimal surface among at least one possible surface constructed according to predetermined surface quality judgment rules.
  • the second unit is used to construct a pseudo three-dimensional detection object of the target object according to the best surface.
  • In an optional example, the first unit may determine an optimal surface and a sub-optimal surface among the at least one possible surface according to the predetermined surface quality judgment rules, and the second unit may construct the pseudo three-dimensional detection body of the target object according to the optimal and sub-optimal surfaces.
  • In an optional example, the second unit may first determine the normal vector of the optimal surface, and then form the pseudo three-dimensional detection body by extending the vertices of the optimal surface in the direction of the normal vector.
  • The manner in which the second unit determines the normal vector of the optimal surface may be: a vertical line drawn from a key point of the sub-optimal surface to the optimal surface serves as the normal vector of the optimal surface.
  • The manner in which the second unit determines the normal vector may also be: a vertical line drawn to the optimal surface from the key point with the highest prediction accuracy among the key points not corresponding to the optimal surface serves as the normal vector of the optimal surface.
  • The manner in which the second unit determines the normal vector may also be: the coordinate difference between two key points on an edge perpendicular to the optimal surface, in a surface adjacent to the optimal surface, is used as the normal vector of the optimal surface.
  • In an optional example, when the key points of the target object include multiple points, the selecting key point module 540 may be used to select, from the multiple key points, the key points that meet the prediction accuracy requirements, so that the constructing three-dimensional detection body module 510 can construct the pseudo three-dimensional detection body of the target object according to the two-dimensional coordinates of the key points selected by the selecting key point module 540.
  • The determining three-dimensional detection body module 530 is configured to determine the three-dimensional detection body of the target object according to the depth information of the key points and the pseudo three-dimensional detection body. The determining three-dimensional detection body module 530 may include a third sub-module, a fourth sub-module, and a fifth sub-module.
  • the third sub-module is used to convert the two-dimensional coordinates of the key points into three-dimensional coordinates in the three-dimensional space according to the depth information of the key points.
  • the third sub-module can convert the two-dimensional coordinates of key points that meet the requirements of prediction accuracy into three-dimensional coordinates in three-dimensional space.
  • the fourth sub-module is used to construct an initial three-dimensional detection volume of the target object according to the three-dimensional coordinates of the key points.
  • the fourth sub-module may include a third unit, a fourth unit, and a fifth unit.
  • The third unit is used to determine the optimal surface of the target object according to the surfaces corresponding to the three-dimensional coordinates of the key points, and to construct the optimal surface of the target object in three-dimensional space.
  • the fourth unit is used to determine the normal vector of the best surface.
  • the smoothing processing module 560 is configured to perform smoothing processing on a three-dimensional detection body of the same target object in a plurality of images to be processed with a temporal relationship.
  • the smoothing processing in the embodiment of the present disclosure may include any one or more of the following: smoothing processing on the length, width, and height of the three-dimensional detection object, smoothing processing on the movement direction of the three-dimensional detection object, and a bird's eye view center point of the three-dimensional detection object Smoothing processing and smoothing processing on the vertices of the three-dimensional object.
  • the obtaining movement direction module 570 is configured to obtain the movement direction of the three-dimensional detection body according to the three-dimensional coordinates of the key points of the target object.
  • the acquiring position relationship module 580 is configured to acquire a position relationship between the target object and an imaging device that captures the image to be processed according to the three-dimensional coordinates of the key points of the target object.
  • For modules such as the obtaining movement direction module 570 and the acquiring position relationship module 580, reference may be made to the related descriptions in the foregoing method embodiments; the descriptions are not repeated here.
  • FIG. 6 is a schematic structural diagram of an embodiment of an intelligent driving control device according to an embodiment of the present disclosure.
  • The device in FIG. 6 mainly includes a three-dimensional object detection device 600, an instruction generation module 610, and an instruction sending module 620.
  • The instruction generation module 610 is configured to generate a vehicle control instruction according to the information of the three-dimensional detection body obtained by the three-dimensional object detection device 600.
  • The instruction sending module 620 is configured to send the vehicle control instruction to the vehicle.
  • FIG. 7 illustrates an exemplary device 700 suitable for implementing the embodiments of the present disclosure.
  • The device 700 may be a control system/electronic system configured in a vehicle, a mobile terminal (for example, a smart phone), a personal computer (PC, for example, a desktop or notebook computer), a tablet computer, a server, or another electronic device.
  • The device 700 includes one or more processors, a communication unit, and the like. The one or more processors may be, for example, one or more central processing units (CPUs) 701 and/or one or more graphics processing units (GPUs) 713 that perform visual tracking by using a neural network.
  • The processor may perform various appropriate actions and processing according to executable instructions stored in a read-only memory (ROM) 702 or executable instructions loaded from a storage portion 708 into a random access memory (RAM) 703. The communication unit 712 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (Infiniband) network card.
  • The processor may communicate with the ROM 702 and/or the RAM 703 to execute the executable instructions, is connected to the communication unit 712 through a bus 704, and communicates with other target devices via the communication unit 712, thereby completing the operations corresponding to any object three-dimensional detection method provided by the embodiments of the present disclosure, for example: obtaining two-dimensional coordinates of key points of a target object in a to-be-processed image; constructing a pseudo three-dimensional detection body of the target object according to the two-dimensional coordinates of the key points; obtaining depth information of the key points; and determining the three-dimensional detection body of the target object according to the depth information of the key points and the pseudo three-dimensional detection body.
  • Likewise, the processor may complete the operations corresponding to any intelligent driving control method provided by the embodiments of the present disclosure, for example: taking video frames included in a video captured by a camera installed on a vehicle as to-be-processed images, and determining the three-dimensional detection body of a target object by using the object three-dimensional detection method according to any embodiment of the present disclosure; generating a vehicle control instruction according to the information of the three-dimensional detection body; and sending the vehicle control instruction to the vehicle.
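The four detection operations above compose into a short pipeline. The sketch below shows only that composition; every function body is a hypothetical placeholder for the corresponding module (keypoint detection, pseudo-3D construction, depth acquisition, and 3D determination), not the disclosed implementation.

```python
import numpy as np

def get_keypoints_2d(image: np.ndarray) -> np.ndarray:
    """Placeholder for the keypoint-detection neural network (module 500)."""
    return np.zeros((12, 2))  # e.g., 12 vehicle keypoints, one (u, v) pair each

def build_pseudo_3d(kpts_2d: np.ndarray) -> dict:
    """Placeholder for pseudo-3D construction in the 2D plane (module 510)."""
    return {"faces": [], "keypoints_2d": kpts_2d}

def get_depth(kpts_2d: np.ndarray) -> np.ndarray:
    """Placeholder for depth lookup, e.g. from a depth map (module 520)."""
    return np.ones(len(kpts_2d))

def determine_3d_body(kpts_2d, depth, pseudo_3d) -> dict:
    """Placeholder for initial 3D construction plus correction (module 530)."""
    return {"vertices_3d": np.zeros((8, 3))}

image = np.zeros((720, 1280, 3))
kpts = get_keypoints_2d(image)
body = determine_3d_body(kpts, get_depth(kpts), build_pseudo_3d(kpts))
```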
  • The RAM 703 may further store various programs and data required for operations of the device.
  • The CPU 701, the ROM 702, and the RAM 703 are connected to one another through the bus 704.
  • When the RAM 703 is available, the ROM 702 is an optional module.
  • The RAM 703 stores executable instructions, or writes executable instructions into the ROM 702 during running, where the executable instructions cause the CPU 701 to perform the operations corresponding to the foregoing object three-dimensional detection method or intelligent driving control method.
  • An input/output (I/O) interface 705 is also connected to the bus 704.
  • The communication unit 712 may be integrated, or may be configured to have a plurality of sub-modules (for example, a plurality of IB network cards) that are separately connected to the bus.
  • The following components are connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, and the like; an output portion 707 including a cathode-ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 708 including a hard disk and the like; and a communication portion 709 including a network interface card such as a LAN card or a modem.
  • The communication portion 709 performs communication processing via a network such as the Internet.
  • A drive 710 is also connected to the I/O interface 705 as needed.
  • A removable medium 711, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is installed on the drive 710 as needed, so that a computer program read from the removable medium is installed in the storage portion 708 as needed.
  • It should be particularly noted that the architecture shown in FIG. 7 is only an optional implementation; in practice, the number and types of the components in FIG. 7 may be selected, deleted, added, or replaced according to actual needs.
  • Different functional components may also be configured separately or in an integrated manner. For example, the GPU 713 and the CPU 701 may be configured separately, or the GPU 713 may be integrated on the CPU 701; the communication unit may be configured separately, or may be integrated on the CPU 701 or the GPU 713. These alternative implementations all fall within the protection scope of the embodiments of the present disclosure.
  • Particularly, according to the embodiments of the present disclosure, the process described below with reference to the flowchart may be implemented as a computer software program. For example, the embodiments of the present disclosure include a computer program product, which includes a computer program tangibly embodied on a machine-readable medium; the computer program includes program code for executing the steps shown in the flowchart, and the program code may include instructions corresponding to the steps of the object three-dimensional detection method or the intelligent driving control method provided by the embodiments of the present disclosure, for example: an instruction for obtaining two-dimensional coordinates of key points of a target object in a to-be-processed image; an instruction for constructing a pseudo three-dimensional detection body of the target object according to the two-dimensional coordinates of the key points; an instruction for obtaining depth information of the key points; and an instruction for determining the three-dimensional detection body of the target object according to the depth information of the key points and the pseudo three-dimensional detection body.
  • Alternatively, the program code may include: an instruction for taking video frames included in a video captured by a camera installed on a vehicle as to-be-processed images and determining the three-dimensional detection body of a target object by using the object three-dimensional detection method according to any embodiment of the present disclosure; an instruction for generating a vehicle control instruction according to the information of the three-dimensional detection body; and an instruction for sending the vehicle control instruction to the vehicle.
  • In such embodiments, the computer program may be downloaded and installed from a network through the communication portion 709 and/or installed from the removable medium 711.
  • When the computer program is executed by the central processing unit (CPU) 701, the instructions described in the embodiments of the present disclosure for implementing the foregoing corresponding steps are executed.
  • In one or more optional embodiments, the embodiments of the present disclosure further provide a computer program product for storing computer-readable instructions that, when executed, cause a computer to perform the object three-dimensional detection method or the intelligent driving control method described in any of the foregoing embodiments.
  • The computer program product may be implemented by hardware, software, or a combination thereof. In one optional example, the computer program product is embodied as a computer storage medium; in another optional example, the computer program product is embodied as a software product, such as a Software Development Kit (SDK).
  • In one or more optional embodiments, the embodiments of the present disclosure further provide another object three-dimensional detection method or intelligent driving control method, a corresponding apparatus, an electronic device, a computer storage medium, a computer program, and a computer program product.
  • The method includes: a first device sends an object three-dimensional detection instruction or an intelligent driving control instruction to a second device, where the instruction causes the second device to execute the object three-dimensional detection method or the intelligent driving control method in any one of the foregoing possible embodiments; and the first device receives an object three-dimensional detection result or an intelligent driving control result sent by the second device.
  • In some embodiments, the object three-dimensional detection instruction or the intelligent driving control instruction may be a call instruction; the first device may instruct, by means of calling, the second device to perform the object three-dimensional detection operation or the intelligent driving control operation. Accordingly, in response to receiving the call instruction, the second device may execute the steps and/or processes in any embodiment of the foregoing object three-dimensional detection method or intelligent driving control method.
  • The methods and apparatuses, electronic devices, and computer-readable storage media of the present disclosure may be implemented in many ways, for example, by software, hardware, firmware, or any combination of software, hardware, and firmware.
  • The above order of the steps of the methods is for illustration only, and the steps of the methods of the present disclosure are not limited to the order described above, unless otherwise specifically stated.
  • In addition, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, and these programs include machine-readable instructions for implementing the methods according to the present disclosure; thus, the present disclosure also covers a recording medium storing a program for executing the methods according to the present disclosure.


Abstract

The embodiments of the present disclosure disclose a method, apparatus, medium, and device for object three-dimensional detection and intelligent driving control. The object three-dimensional detection method includes: obtaining two-dimensional coordinates of key points of a target object in a to-be-processed image; constructing a pseudo three-dimensional detection body of the target object according to the two-dimensional coordinates of the key points; obtaining depth information of the key points; and determining the three-dimensional detection body of the target object according to the depth information of the key points and the pseudo three-dimensional detection body.

Description

对象三维检测及智能驾驶控制的方法、装置、介质及设备
本公开要求在2018年08月07日提交中国专利局、申请号为CN201810891535.0、发明名称为“对象三维检测及智能驾驶控制的方法、装置、介质及设备”的中国专利申请的优先权,其全部内容通过引用结合在本公开中。
技术领域
本公开涉及计算机视觉技术,尤其是涉及一种对象三维检测方法、对象三维检测装置、智能驾驶控制方法、智能驾驶控制装置、电子设备、计算机可读存储介质以及计算机程序。
背景技术
对象三维(3D)检测,通常用于预测出物体的空间位置、运动方向以及3D尺寸等三维空间参数。例如,在自动驾驶技术中,需要针对路面上的其他车辆进行三维检测,以获得其他车辆的三维长方体、车辆行驶方向以及与拍摄装置的位置关系等。准确的获得物体的三维检测结果,有利于提高自动驾驶的安全性。
发明内容
本公开实施例提供一种对象三维检测和智能驾驶控制技术方案。
根据本公开实施例的一方面,提供一种对象三维检测方法,所述方法包括:获取待处理图像中的目标对象的关键点的二维坐标;根据所述关键点的二维坐标,构建所述目标对象的伪三维检测体;获取所述关键点的深度信息;根据所述关键点的深度信息和所述伪三维检测体,确定所述目标对象的三维检测体。
根据本公开实施例的另一方面,提供一种智能驾驶控制方法,所述方法包括:以车辆上设置的摄像装置采集的视频包括的视频帧为待处理图像,采用本公开实施例的上述任一方法确定目标对象的三维检测体;根据所述三维检测体的信息生成车辆控制指令;向所述车辆发送所述车辆控制指令。
根据本公开实施例的又一方面,提供一种对象三维检测装置,所述装置包括:获取二维坐标模块,用于获取待处理图像中的目标对象的关键点的二维坐标;构建三维检测体模块,用于根据所述关键点的二维坐标,构建所述目标对象的伪三维检测体;获取深度信息模块,用于获取所述关键点的深度信息;确定三维检测体模块,用于根据所述关键点的深度信息和所述伪三维检测体,确定所述目标对象的三维检测体。
根据本公开实施例的再一方面,提供一种智能驾驶控制装置,所述装置包括:本公开上述任一实施例所述的对象三维检测装置,用于以车辆上设置的摄像装置采集的视频包括的视频帧为待处理图像,确定目标对象的三维检测体;生成指令模块,用于根据所述三维检测体的信息生成车辆控制指令;发送指令模块,用于向所述车辆发送所述车辆控制指令。
根据本公开实施例的再一方面,提供一种电子设备,包括:存储器,用于存储计算机程序;处理器,用于执行所述存储器中存储的计算机程序,且所述计算机程序被执行时,实现本公开上述任一实施例所述方法。
根据本公开实施例再一个方面,提供一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时,实现本公开上述任一实施例所述方法。
根据本公开实施例的再一个方面,提供一种计算机程序,包括计算机指令,当所述计算机指令在设备的处理器中运行时,实现本公开上述任一实施例所述方法。
基于本公开提供的对象三维检测方法、对象三维检测装置、智能驾驶控制方法、智能驾驶控制装置、电子设备、计算机可读存储介质及计算机程序,通过利用目标对象的关键点的二维坐标,在二维平面中,构建该目标对象的伪三维检测体,由于目标对象的关键点检测结果的准确度,目前可以得到相应的保障,因此,本公开实施例通过关键点的深度信 息和伪三维检测体,可以使目标对象的三维检测体的大小尽可能的接近目标对象的实际大小,有利于在使计算资源消耗较小的情况下,提高对象三维检测的准确性,从而有利于在保证低实现成本的情况下,提高自动驾驶的安全性。
下面通过附图和实施例,对本公开的技术方案做的详细描述。
附图说明
构成说明书的一部分的附图描述了本公开的实施例,并且连同描述一起用于解释本公开的原理。
参照附图,根据下面的详细描述,可以更加清楚地理解本公开,其中:
图1为本公开的对象三维检测方法一个实施例的流程图;
图2为本公开的待处理图像中目标对象的关键点一个实施例的示意图;
图3为本公开的伪三维检测体一个实施例的示意图;
图4为本公开的智能驾驶控制方法一个实施例的流程图;
图5为本公开的对象三维检测装置一个实施例的结构示意图;
图6为本公开的智能驾驶控制装置一个实施例的结构示意图;
图7为实现本公开实施例的一示例性设备的框图。
具体实施方式
现在将参照附图来详细描述本公开的各种示例性实施例。应注意到:除非另外说明,否则在这些实施例中阐述的部件和步骤的相对布置、数字表达式和数值不限制本公开的范围。
还应理解,在本公开实施例中,“多个”可以指两个或两个以上,“至少一个”可以指一个、两个或两个以上。
本领域技术人员可以理解,本公开实施例中的“第一”、“第二”等术语仅用于区别不同步骤、设备或模块等,既不代表任何特定技术含义,也不表示它们之间的必然逻辑顺序。
还应理解,对于本公开实施例中提及的任一部件、数据或结构,在没有明确限定或者在前后文给出相反启示的情况下,一般可以理解为一个或多个。
还应理解,本公开对各个实施例的描述着重强调各个实施例之间的不同之处,其相同或相似之处可以相互参考,为了简洁,不再一一赘述。
同时,应当明白,为了便于描述,附图中所示出的各个部分的尺寸并不是按照实际的比例关系绘制的。
以下对至少一个示例性实施例的描述实际上仅仅是说明性的,决不作为对本公开及其应用或使用的任何限制。
对于相关领域普通技术人员已知的技术、方法以及设备可能不作详细讨论,但在适当情况下,所述技术、方法和设备应当被视为说明书的一部分。
应当注意到:相似的标号和字母在下面的附图中表示类似项,因此,一旦某一项在一个附图中被定义,则在随后的附图中不需要对其进行讨论。
另外,公开中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本公开中字符“/”,一般表示前后关联对象是一种“或”的关系。
本公开实施例可以应用于终端设备、计算机系统及服务器等电子设备,其可与众多其它通用或者专用的计算系统环境或者配置一起操作。适于与终端设备、计算机系统以及服务器等电子设备一起使用的众所周知的终端设备、计算系统、环境和/或配置的例子,包括但不限于:个人计算机系统、服务器计算机系统、瘦客户机、厚客户机、手持或膝上设备、基于微处理器的系统、机顶盒、可编程消费电子产品、网络个人电脑、小型计算机系统﹑大型计算机系统和包括上述任何系统的分布式云计算技术环境,等等。
终端设备、计算机系统以及服务器等电子设备可以在由计算机系统执行的计算机系统 可执行指令(诸如程序模块)的一般语境下描述。通常,程序模块可以包括例程、程序、目标程序、组件、逻辑以及数据结构等等,它们执行特定的任务或者实现特定的抽象数据类型。计算机系统/服务器可以在分布式云计算环境中实施,分布式云计算环境中,任务是由通过通信网络链接的远程处理设备执行的。在分布式云计算环境中,程序模块可以位于包括存储设备的本地或远程计算系统存储介质上。
图1为本公开对象三维检测方法一个实施例的流程图。如图1所示,该实施例方法包括:
S100、获取待处理图像中的目标对象的关键点的二维坐标。
在一个可选示例中,本公开实施例中的待处理图像可以为呈现静态的图片或照片等图像,也可以为呈现动态的视频中的视频帧,例如,设置在移动物体上的摄像装置所摄取的视频中的视频帧,再例如,设置在固定位置的摄像装置所摄取的视频中的视频帧。上述移动物体可以为车辆、机器人或者机械臂等。上述固定位置可以为桌面或者墙壁等。本公开实施例不限制移动物体和固定位置的表现形式。
在一个可选示例中,本公开实施例中的待处理图像可以是利用普通的高清摄像装置所获得的图像,可以避免必须使用雷达测距装置以及深度摄像装置等而导致的实现成本较高等现象。
在一个可选示例中,本公开实施例中的目标对象至少包括:前后左右四个面。例如,本公开实施例中的目标对象可以是机动车辆(即机动车,尤指汽车,如燃油汽车、电动汽车或者无人驾驶汽车等)、非机动车辆(如自行车、人力三轮车等)行人、动物、建筑物、植物、障碍物、危险物、交通标识物或者物品等。本公开实施例不限制目标对象的表现形式。由于目标对象可以为多种形式,因此,本公开实施例的对象三维检测方法具有通用性强的特点。
在一个可选示例中,本公开实施例中的关键点是具有语义的关键点,且该关键点通常是目标对象的外轮廓关键点。在目标对象为车辆的情况下,本公开实施例中的具有语义的关键点可以包括:车辆左前角关键点(如图2中的1,下述简称左前下)、车顶左前角关键点(如图2中的2,下述简称左前上)、车顶左后角关键点(如图2中的3,下述简称左后上)、车辆左后角关键点(如图2中的4,下述简称左后下)、左后轮底部关键点(如图2中的5,下述简称左后轮)、左前轮底部关键点(如图2中的6,下述简称左前轮)、车辆右前角关键点(如图2中的7,下述简称右前下)、车顶右前角关键点(如图2中的8,下述简称右前上)、车顶右后角关键点(与图2中的3左右对称,下述简称右后上)、车辆右后角关键点(与图2中的4左右对称,下述简称右后下)、右后轮底部关键点(与图2中的5左右对称,下述简称右后轮)以及右前轮底部关键点(与图2中的6左右对称,下述简称右前轮)。也就是说,关键点的语义可以表示出关键点在车辆上的位置。另外,本公开实施例中的车辆也可以包括更多数量的关键点。本公开实施例不对目标对象的关键点的数量以及关键点所表达的语义进行限制。
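For reference, the twelve vehicle keypoints enumerated above can be written as a lookup table. The numbering for keypoints 1 through 8 follows FIG. 2; the indices assigned to the four right-side mirror keypoints and the English identifiers are this sketch's own conventions, not part of the disclosure.

```python
# The 12 semantic vehicle keypoints listed above; indices 1-8 follow FIG. 2,
# indices 9-12 (the left-right mirror points) are assigned here for convenience.
VEHICLE_KEYPOINTS = {
    1:  "front_left_bottom",   # 左前下
    2:  "front_left_top",      # 左前上
    3:  "rear_left_top",       # 左后上
    4:  "rear_left_bottom",    # 左后下
    5:  "rear_left_wheel",     # 左后轮
    6:  "front_left_wheel",    # 左前轮
    7:  "front_right_bottom",  # 右前下
    8:  "front_right_top",     # 右前上
    9:  "rear_right_top",      # 右后上（与图2中的3左右对称）
    10: "rear_right_bottom",   # 右后下（与图2中的4左右对称）
    11: "rear_right_wheel",    # 右后轮（与图2中的5左右对称）
    12: "front_right_wheel",   # 右前轮（与图2中的6左右对称）
}
```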
在一个可选示例中，本公开实施例中的任一关键点通常会对应伪三维检测体（如三维长方体）的一个面或者两个面或者三个面，同样的，本公开实施例中的关键点通常会对应三维检测体的一个面或者两个面或者更多面。也就是说，关键点与伪三维检测体的面以及三维检测体的面之间存在对应关系。如图2中，左前下、左前上、右前下和右前上对应伪三维检测体以及三维检测体的前面，即从车辆的前方位置可以观测到左前下、左前上、右前下和右前上这四个关键点；左前下、左前上、左后下、左后上、左前轮和左后轮对应伪三维检测体以及三维检测体的左面，即从车辆的左方位置可以观测到左前下、左前上、左后下、左后上、左前轮和左后轮这六个关键点；左后下、左后上、右后下和右后上对应伪三维检测体以及三维检测体的后面，即从车辆的后方位置可以观测到左后下、左后上、右后下和右后上这四个关键点；右前下、右前上、右后下、右后上、右前轮和右后轮对应伪三维检测体以及三维检测体的右面，即从车辆的右方位置可以观测到右前下、右前上、右后下、右后上、右前轮和右后轮这六个关键点；左前下、左前上、右前下、右前上、左后下、左后上、右后下和右后上对应伪三维检测体以及三维检测体的上面，即从车辆的上方位置可以观测到左前下、左前上、右前下、右前上、左后下、左后上、右后下和右后上这八个关键点；左前下、右前下、左后下、右后下、左前轮、右前轮、左后轮和右后轮对应伪三维检测体以及三维检测体的下面，即从车辆的下方位置可以观测到左前下、右前下、左后下、右后下、左前轮、右前轮、左后轮和右后轮这八个关键点。另外，需要特别说明的是，本公开实施例可以不设置关键点与伪三维检测体以及三维检测体的上面和下面的对应关系。
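The keypoint-to-face correspondence described above can likewise be captured as a table. Keypoint names follow the previous snippet; the optional top and bottom faces are omitted here, as the embodiments allow.

```python
# First membership relation between faces and keypoints, as enumerated above.
FACE_TO_KEYPOINTS = {
    "front": ["front_left_bottom", "front_left_top",
              "front_right_bottom", "front_right_top"],
    "left":  ["front_left_bottom", "front_left_top",
              "rear_left_bottom", "rear_left_top",
              "front_left_wheel", "rear_left_wheel"],
    "rear":  ["rear_left_bottom", "rear_left_top",
              "rear_right_bottom", "rear_right_top"],
    "right": ["front_right_bottom", "front_right_top",
              "rear_right_bottom", "rear_right_top",
              "front_right_wheel", "rear_right_wheel"],
}
```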
在一个可选示例中,在上述操作S100之前,还可以对待处理图像进行目标对象检测,获得包括有目标对象的二维目标检测框。相应地,在S100中,可以基于待处理图像对应二维目标检测框的图像部分,获取目标对象的关键点的二维坐标。
在一个可选示例中,本公开实施例可以利用现有的神经网络来获取待处理图像中的目标对象的关键点的二维坐标。例如,将包含有目标对象(如车辆)的待处理图像输入神经网络中,经由该神经网络对待处理图像进行关键点检测(如车辆关键点检测)处理,从而根据神经网络输出的信息可以获得目标对象的各关键点在待处理图像中的二维坐标。再例如,先对待处理图像进行目标对象检测处理,获得包含有目标对象的二维目标检测框的位置,之后,可以根据二维目标检测框的位置对待处理图像进行切分处理,从而获得目标对象图像块(即包含有目标对象的图像块,例如,车辆图像块,即包含有车辆的图像块),再将目标对象图像块输入神经网络中,经由该神经网络对目标对象图像块进行关键点检测(如车辆关键点检测)处理,从而根据神经网络输出的信息可以获得目标对象(如车辆)的各关键点在目标对象图像块(如车辆图像块)中的二维坐标,进而可以将目标对象的各关键点在目标对象图像块中的二维坐标转换为目标对象的各关键点在待处理图像中的二维坐标。本公开实施例不限制利用神经网络获得目标对象的关键点的二维坐标的实现方式。另外,在成功获得二维目标检测框(即包含有目标对象的检测框)的情况下,本公开实施例继续执行对象三维检测方法中的其他步骤,否则,本公开实施例可以不再执行对象三维检测方法中的其他步骤,从而有利于节约计算资源。
在一个可选示例中,本公开实施例中的神经网络可以包括但不限于卷积层、非线性Relu层、池化层以及全连接层等,该神经网络所包含的层数越多,则网络越深。本公开实施例的神经网络可以使用Stack hourglass神经网络框架结构,也可以采用基于ASM(Active Shape Model,主动形状模型)、AAM(Active Appearnce Model,主动表观模型)或者基于级联形状回归算法的神经网络框架结构。本公开实施例对神经网络的结构不作限制。
在一个可选示例中,该操作S100可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的获取二维坐标模块500执行。
S110、根据关键点的二维坐标,在二维平面中,构建目标对象的伪三维检测体。
在一个可选示例中,本公开实施例中的目标对象的伪三维检测体通常是指:在二维平面中可以将目标对象框于其中的长方体。也就是说,通过在二维平面中作图,可以形成一长方体,由于该长方体并不是三维空间中的真实的长方体,而是从平面上,看上去的一个长方体,因此,本公开实施例将其称为伪三维检测体。然而,虽然伪三维检测体并不是三维空间中的真实的长方体,伪三维检测体的长宽高仍然可以反映出目标对象的长宽高。在通常情况下,可以将伪三维检测体的长宽高认为是伪三维检测体内的目标对象的长宽高。也就是说,伪三维检测体可以认为是二维平面中的目标对象的外接长方体。另外,本公开实施例中的伪三维检测体包括伪三维正方体。
在一个可选示例中,本公开实施例可以先针对当前获得的目标对象的所有关键点进行筛选,以挑选出符合预测准确度要求的关键点(如挑选出置信度高于预定置信度阈值的关键点),然后,再利用挑选出的符合预测准确度要求的关键点的二维坐标,在二维平面,构建目标对象的伪三维检测体。由于在构建目标对象的伪三维检测体的过程中,避免了预测准确度低的关键点被使用,因此,本公开实施例有利于提高构建出的伪三维长方体的准确性。
在一个可选示例中,本公开实施例可以先根据关键点和目标对象包括的面之间预定的第一所属关系以及关键点的二维坐标,构建出目标对象的至少一个可能面;然后,再根据构建出的可能面,构建目标对象的伪三维检测体。本公开实施例中的可能面可以为最佳面,本公开实施例中的可能面也可以为最佳面和次佳面。
本公开实施例中的可能面为最佳面时,在一个可选示例中,可以根据面质量预定判断规则,在构建的至少一个可能面中确定一最佳面,然后根据最佳面构建目标对象的伪三维检测体。
本公开实施例中的可能面为最佳面和次佳面时,在一个可选示例中,可以根据面质量预定判断规则,在构建的至少一个可能面中确定一最佳面和一次佳面,然后根据最佳面和次佳面构建目标对象的伪三维检测体。
在一个可选示例中,本公开实施例可以先确定出待处理图像中的目标对象的最佳面,并在二维平面中构建出该最佳面,然后,确定出该最佳面的法向量,并基于该最佳面中的关键点沿该法向量方向的拓展,形成伪三维检测体。有利于快捷准确的构建出伪三维检测体。
在一个可选示例中,目标对象的关键点可以包括多个,相应地,本公开实施例确定出待处理图像中的目标对象的最佳面的方式可以为:首先,针对符合预测准确度要求的关键点各自对应的面,确定各个面的质量,即基于符合预测准确度要求的关键点,为各个面进行质量评定。然后,将质量评定最高的面作为目标对象的最佳面。相应地,在该实施例中,可以根据选取出的关键点的二维坐标,构建目标对象的伪三维检测体。
在一个可选示例中,面的质量评定方式可以为:统计各个面所对应的符合预测准确度要求的关键点的数量,将统计出的数量作为面的质量评定分数,从而一个面所对应的符合预测准确度要求的关键点的数量越多,则该面的质量评定分数越高。一个例子:图2中,假定关键点1、关键点2、关键点3、关键点4、关键点5、关键点6、关键点7和关键点8均为符合预测准确度要求的关键点;由于关键点1、关键点2、关键点3、关键点4、关键点5和关键点6对应车辆的左面,而关键点1、关键点2、关键点7和关键点8对应车辆的前面,因此,车辆的左面的指令评定分数最高,车辆的左面为最佳面。
在一个可选示例中,面的质量评定方式也可以为:统计各个面所对应的符合预测准确度要求的关键点的预测准确度之和,这样,至少一个面会对应一个预测准确度分值,本公开实施例可以将面所对应的预测准确度分值,作为面的质量评定分数,从而一个面所对应的预测准确度分值越高,则该面的质量评定分数越高。
在一个可选示例中,面的质量评定方式还可以为:统计各个面所对应的符合预测准确度要求的关键点的数量以及预测准确度之和,这样,每一个面会对应一个关键点数量以及预测准确度分值,本公开实施例可以计算每个面所对应的预测准确度分值与关键点数量的商,即计算每一个面的预测准确度平均分值,将面所对应的预测准确度平均分值作为面的质量评定分数,从而一个面所对应的预测准确度平均分值越高,则该面的质量评定分数越高。
上述仅例举了三种面的质量评定方式,本公开实施例还可以采用其他方式来确定面的质量,本公开实施例不限制面的质量评定方式的实现方式。
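A minimal sketch of the three face-quality rules above, assuming each keypoint carries a prediction-confidence score and a fixed threshold defines "meeting the prediction-accuracy requirement" (the threshold value is illustrative):

```python
def face_scores(face_to_kpts: dict, kpt_conf: dict,
                threshold: float = 0.5, rule: str = "count") -> dict:
    """Score each candidate face by its qualifying keypoints.
    rule: 'count' = number of qualifying keypoints,
          'sum'   = sum of their prediction confidences,
          'mean'  = average confidence per qualifying keypoint."""
    scores = {}
    for face, kpts in face_to_kpts.items():
        confs = [kpt_conf[k] for k in kpts if kpt_conf.get(k, 0.0) >= threshold]
        if rule == "count":
            scores[face] = len(confs)
        elif rule == "sum":
            scores[face] = sum(confs)
        else:  # "mean"
            scores[face] = sum(confs) / len(confs) if confs else 0.0
    return scores

def best_face(face_to_kpts: dict, kpt_conf: dict, rule: str = "count") -> str:
    """Return the face with the highest quality score."""
    scores = face_scores(face_to_kpts, kpt_conf, rule=rule)
    return max(scores, key=scores.get)
```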
在一个可选示例中,本公开实施例可以通过多种方式,在二维平面中,构建最佳面,例如,可以利用最佳面上的一个关键点在二维平面中做垂线(即通过关键点的竖直方向的线),最佳面上的一个边即位于该垂线上,该垂线与其他面上的边的交点,即为最佳面的一个顶点。再例如,可以利用最佳面上的两个关键点在二维平面中做连接线,该连接线可以为最佳面上的一个边,或者该连接线及其延长线可以为最佳面上的一个边,即这两个关键点可以是最佳面上的两个顶点,或者,这两个关键点的连接线的延伸线与其他面的边的交点为该最佳面的顶点。再例如,利用最佳面上的一个关键点做平行线,该平行线是与最佳面上的另一条边相互平行的线,即通过最佳面上的一个关键点做最佳面上的另一条边的平行线,最佳面上的一个边即位于该平行线上,该平行线与上述垂线的交点或者与其他面 的边的交点即为该最佳面的顶点。本公开实施例不限制在二维平面中构建最佳面的实现方式。
在一个可选示例中，本公开实施例可以通过多种方式确定出最佳面的法向量，第一个例子：先确定出伪三维检测体的次佳面，然后，利用次佳面中的关键点向最佳面做垂线，从而可以将该垂线作为最佳面的法向量。第二个例子：从所有符合预测准确度要求的关键点中去除对应最佳面的关键点，从剩余的关键点中挑选一个预测准确度最高的关键点，经由该关键点向最佳面做垂线，并将该垂线作为最佳面的法向量。第三个例子，如果存在两个关键点属于最佳面的相邻面，且这两个关键点的连接线位于相邻面的与最佳面相垂直的边上，则可以将这两个关键点在二维平面的坐标差，作为最佳面的法向量。例如，图2中，假定车辆的左面为最佳面，车辆的前面为次佳面，关键点7在二维平面的坐标为 $(u_7, v_7)$，关键点1在二维平面的坐标为 $(u_1, v_1)$，本公开实施例可以将 $(u_7-u_1,\ v_7-v_1)$ 作为最佳面的法向量。上述仅例举了三个例子，本公开实施例还可以采用其他方式来获得最佳面的法向量，本公开实施例不限制获得最佳面的法向量的实现方式。
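A sketch of the third example above, where the 2D coordinate difference of two keypoints on a perpendicular edge of an adjacent face serves as the best face's normal vector; the pixel coordinates below are illustrative, not from the disclosure:

```python
import numpy as np

def normal_from_edge_keypoints(p_a: np.ndarray, p_b: np.ndarray) -> np.ndarray:
    """Third example above: when two keypoints lie on an edge of an adjacent
    face that is perpendicular to the best face, their 2D coordinate difference
    is taken as the best face's normal vector, e.g. (u7 - u1, v7 - v1)."""
    return p_a - p_b

# FIG. 2 example (left face is best, front face is second-best); the pixel
# values are dummies for demonstration only:
kp7 = np.array([640.0, 360.0])
kp1 = np.array([520.0, 368.0])
normal = normal_from_edge_keypoints(kp7, kp1)
```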
在一个可选示例中,本公开实施例确定次佳面的过程可以为:先确定与最佳面相邻接的面,并针对除了属于最佳面的关键点之外的其他关键点,统计与最佳面相邻接的各面各自包含的关键点数量,本公开实施例可以将包含关键点数据最多的面作为次佳面,从而可以避免次佳面选取不当的现象。例如,图2中,由于关键点检测过程出现错误,本公开实施例不但获取到关键点1、关键点2、关键点3、关键点4、关键点5、关键点6、关键点7以及关键点8,而且,还获取到关键点10;假定关键点1、关键点2、关键点3、关键点4、关键点5、关键点6、关键点7、关键点8以及关键点10均符合预测准确度要求,而且,关键点10的预测准确度较高;在上述情况下,很显然关键点10出现了关键点检测错误的情况,本公开实施例通过利用上述方式来确定次佳面,可以避免从关键点10向最佳面做垂线,从而获得最佳面的法向量的现象。
在一个可选示例中，本公开实施例在确定了最佳面及其法向量之后，可以将最佳面中的顶点沿该最佳面的法向量方向拓展，从而会与其他面的边相交，最终形成伪三维检测体。例如，图2中，先形成通过关键点1的第一垂线以及通过关键点4的第二垂线，再形成同时经过关键点6和关键点5，而且与第一垂线和第二垂线分别相交的第一线，之后，形成经过关键点2或者关键点3与上述第一线相平行，而且与两条垂线分别相交的第二线，从而形成了最佳面的四条线以及四个顶点；该最佳面的法向量为 $(u_7-u_1,\ v_7-v_1)$；该法向量也为次佳面的底边，本公开实施例可以形成通过关键点7的第三垂线，并通过关键点7做与第一线或者第二线相平行的第三线，最佳面中的左上角的顶点沿该法向量方向拓展，会与第三垂线相交，形成次佳面的顶边，而且该交点与关键点8的连线，会与最佳面中的右上角的顶点沿该法向量方向拓展的线相交，经该相交点做第四垂线，第四垂线会与最佳面右下角顶点沿法向量方向拓展的线相交，从而在二维空间中，形成伪三维检测体。本公开实施例为待处理图像中的目标对象所形成的伪三维检测体的一个例子如图3所示。本公开实施例在确定了最佳面及其法向量之后，可以通过多种方式形成伪三维检测体，本公开实施例不限制形成伪三维检测体的实现过程。
在一个可选示例中,该操作S110可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的构建三维检测体模块510执行。
S120、获取关键点的深度信息。
在一个可选示例中,本公开实施例可以先利用单目方式或者双目方式等,获得待处理图像的深度图;然后,利用关键点的二维坐标从该深度图中读取出关键点的深度值。本公开实施例也可以采用H矩阵的方式直接获得关键点的深度值,即利用关键点的二维坐标与H矩阵相乘,从相乘的结果中获得关键点的深度值(单位可以为米);还有,在摄像装置为基于深度的摄像装置时,可以直接获得关键点的深度值;本公开实施例不限制获得关键点的深度值的实现过程。
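A sketch of the two depth-acquisition routes described above: reading a keypoint's depth from a depth map, and the H-matrix multiplication. The exact shape of the H matrix is not specified in the text, so the 1x3 form below is an assumption of this sketch:

```python
import numpy as np

def depth_from_depth_map(depth_map: np.ndarray, u: int, v: int) -> float:
    """Read the depth value of the keypoint at pixel (u, v) from a depth map
    obtained by, e.g., monocular or binocular estimation."""
    return float(depth_map[v, u])

def depth_from_h_matrix(h: np.ndarray, u: float, v: float) -> float:
    """H-matrix route described above: multiply the 2D coordinates by H and
    take the depth value (in meters) from the product. The 1x3 shape of h is
    this sketch's assumption."""
    return float(h @ np.array([u, v, 1.0]))

depth_map = np.full((720, 1280), 10.0)  # dummy map: 10 m everywhere
z1 = depth_from_depth_map(depth_map, 640, 360)
z2 = depth_from_h_matrix(np.array([0.0, 0.05, 1.0]), 640.0, 360.0)
```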
在一个可选示例中,该操作S120可以由处理器调用存储器存储的相应指令执行,也 可以由被处理器运行的获取深度信息模块520执行。
S130、根据关键点的深度信息和伪三维检测体,确定目标对象的三维检测体。
在一个可选示例中,该操作S130可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的确定三维检测体模块530执行。
在一个可选示例中,本公开实施例可以先根据关键点的二维坐标和深度信息,在三维空间中构建目标对象的初始三维检测体(如初始三维长方体),然后,至少将伪三维检测体作为三维检测体的约束条件,对初始三维检测体进行校正处理,从而获得目标对象的三维检测体(如三维长方体)。
在一个可选示例中,本公开实施例可以先根据关键点的深度信息,将关键点的二维坐标转换为三维空间中的三维坐标,然后,再根据转换获得的关键点的三维坐标,构建目标对象的初始三维检测体。本公开实施例在将关键点的二维坐标转换为三维空间中的三维坐标的过程中,可以有选择性的进行转换,例如,仅将符合预测准确度要求的关键点的二维坐标转换为三维空间中的三维坐标。当然,本公开实施例也可以将所有关键点的二维坐标均转换为三维空间中的三维坐标,而在构建目标对象的初始三维检测体的过程中,仅根据符合预测准确度要求的关键点的三维坐标,来构建目标对象的初始三维检测体。本公开实施例中的三维空间通常为真实世界的三维空间,例如,基于摄像装置的三维坐标系的三维空间。
在一个可选示例中,本公开实施例可以通过多种方式将关键点的二维坐标转换为三维空间中的三维坐标。
例如,将上述获得的关键点的深度值转换为三维空间中的距离,该距离可以认为是关键点与摄像装置之间的距离;之后,利用下述公式(1)计算出各关键点的三维坐标:
$P \times [X, Y, Z]^T = w \times [u, v, 1]^T$　　　　公式(1)
在上述公式(1)中,P表示摄像装置的参数;X、Y、Z表示关键点的三维坐标,即关键点在真实世界的三维空间中的三维坐标,其中的Z可以代入上述获得的关键点的深度值;u和v表示关键点的二维坐标,即关键点在待处理图像的坐标系中的二维坐标;w表示缩放因子。
如果将P表示为下述3×3的矩阵:
$$P = \begin{bmatrix} p_{11} & p_{12} & p_{13} \\ p_{21} & p_{22} & p_{23} \\ p_{31} & p_{32} & p_{33} \end{bmatrix}$$
则,上述公式(1)可以表示为下述公式(2)的形式:
$$\begin{bmatrix} p_{11} & p_{12} & p_{13} \\ p_{21} & p_{22} & p_{23} \\ p_{31} & p_{32} & p_{33} \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = w \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \qquad 公式(2)$$
通过将多个关键点的二维坐标代入到上述公式(2)中,可以求解出变量X、Y和w,从而获得关键点的三维坐标,即(X,Y,Z)。
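When P takes the usual pinhole intrinsic form, the third row of 公式(2) gives w = Z, and X and Y then follow in closed form. A minimal sketch under that assumption (the intrinsic values below are illustrative, not from the disclosure):

```python
import numpy as np

def backproject(u: float, v: float, z: float,
                fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Solve P @ [X, Y, Z]^T = w * [u, v, 1]^T for (X, Y) given the depth Z,
    assuming a pinhole intrinsic matrix P = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]],
    whose third row yields w = Z."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# Illustrative intrinsics and keypoint:
point_3d = backproject(u=640.0, v=360.0, z=12.0,
                       fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)
```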
在一个可选示例中,本公开实施例可以先确定出目标对象在三维空间中的最佳面,并在三维平面中构建出该最佳面,然后,确定出该最佳面的法向量,并基于该最佳面中的关键点沿该法向量方向的拓展,形成初始三维检测体(如三维长方体)。
在一个可选示例中,本公开实施例确定出目标对象在三维空间中的最佳面的方式可以为:首先,针对符合预测准确度要求的关键点各自对应的面,确定各个面的质量,即基于符合预测准确度要求的关键点,为各个面进行质量评定;然后,将质量评定最高的面作为目标对象的最佳面。面的质量评定方式可以如上述步骤S110中例举的几种方式。在此不再重复说明。
在一个可选示例中,本公开实施例可以通过多种方式,在三维平面中,构建最佳面,例如,可以利用最佳面上的一个关键点在三维空间中做垂线(即作通过关键点的竖直方向(y方向)的线),最佳面上的一个边即位于该垂线上,该垂线与其他面上的边的交点,即为最佳面的一个顶点。再例如,可以利用最佳面上的两个关键点在三维空间中做连接线,该连接线可以为最佳面上的一个边,或者该连接线及其延长线可以为最佳面上的一个边,即这两个关键点可以是最佳面上的两个顶点,或者,这两个关键点的连接线的延伸线与其他面的边的交点为该最佳面的顶点。再例如,利用最佳面上的一个关键点做平行线,该平行线是与最佳面上的另一条边相互平行的线,即通过最佳面上的一个关键点做最佳面上的另一条边的平行线,最佳面上的一个边即位于该平行线上,该平行线与上述垂线的交点或者与其他面的边的交点即为该最佳面的顶点。本公开实施例不限制在三维空间中构建最佳面的实现方式。
在一个可选示例中，本公开实施例可以通过多种方式确定出最佳面的法向量，第一个例子：先确定出三维检测体的次佳面，然后，利用次佳面中的关键点向最佳面做垂线，从而可以将该垂线作为最佳面的法向量。第二个例子：从所有符合预测准确度要求的关键点中去除对应最佳面的关键点，从剩余的关键点中挑选一个预测准确度最高的关键点，经由该关键点向最佳面做垂线，并将该垂线作为最佳面的法向量。第三个例子，如果存在两个关键点属于最佳面的相邻面，且这两个关键点的连接线位于相邻面的与最佳面相垂直的边上，则可以将这两个关键点在三维空间的坐标差，作为最佳面的法向量。例如，图2中，假定车辆的左面为最佳面，车辆的前面为次佳面，关键点7在三维空间的坐标为 $(X_7, Y_7, Z_7)$，关键点1在三维空间的坐标为 $(X_1, Y_1, Z_1)$，本公开实施例可以将 $(X_7-X_1,\ Y_7-Y_1,\ Z_7-Z_1)$ 作为最佳面的法向量。上述仅例举了三个例子，本公开实施例还可以采用其他方式来获得最佳面的法向量，本公开实施例不限制获得最佳面的法向量的实现方式。
在一个可选示例中，本公开实施例在确定了最佳面及其法向量之后，可以将最佳面中的顶点沿该最佳面的法向量方向拓展，从而会与其他面的边相交，最终形成初始三维检测体。例如，图2中，先形成通过关键点1的第一垂线以及通过关键点4的第二垂线，再形成同时经过关键点6和关键点5，而且与第一垂线和第二垂线分别相交的第一线，之后，形成经过关键点2或者关键点3与上述第一线相平行，而且与两条垂线分别相交的第二线，从而形成了最佳面的四条线以及四个顶点；该最佳面的法向量为 $(X_7-X_1,\ Y_7-Y_1,\ Z_7-Z_1)$；该法向量也为次佳面的底边，本公开实施例可以形成通过关键点7的第三垂线，并通过关键点7做与第一线或者第二线相平行的第三线，最佳面中的左上角的顶点沿该法向量方向拓展，会与第三垂线相交，形成次佳面的顶边，而且该交点与关键点8的连线，会与最佳面中的右上角的顶点沿该法向量方向拓展的线相交，经该相交点做第四垂线，第四垂线会与最佳面右下角顶点沿法向量方向拓展的线相交，从而在三维空间中，形成初始三维检测体。本公开实施例在确定了最佳面及其法向量之后，可以通过多种方式形成初始三维检测体，本公开实施例不限制形成初始三维检测体的实现过程。
通过上述方式,由于不需要道路分割以及语义分割等计算机视觉基础任务,因此,可以快速的为目标对象构建出初始三维检测体,且构建该初始三维检测体的计算资源的消耗较小,实现成本较低。另外,由于本公开实施例是以目标对象的关键点为基础,来构建初始三维检测体的,构建初始三维检测体的过程与目标对象是否位于地面等因素无关,因此,本公开实施例可以有效避免在目标对象位于非地面上等场景下,无法实现对象三维检测等现象,从而有利于提高对象三维检测的适用范围。
在一个可选示例中,根据伪三维检测体对初始三维检测体进行校正,形成目标对象的三维检测体,可以是根据二维平面中的伪三维检测体,对三维空间中的初始三维检测体进行调整,以提高调整后的三维检测体映射在二维平面中的区域与伪三维检测体间的面积重叠情况。
在一个可选示例中,本公开实施例可以将初始三维检测体中的各顶点映射在二维平面中,从而得到初始三维检测体在二维平面中的图形。本公开实施例通过对三维空间中的初始三维检测体进行调整,可以使映射在二维平面中的图形区域与二维平面中的伪三维检测体的面积重叠情况发生变化,例如,使两者的重叠面积尽可能最大,再例如,使两者的交并比尽可能最大。
在一个可选示例中,本公开实施例使两者的面积重叠情况发生变化的方式可以包括:调整初始三维检测体在三维空间中的位置,使初始三维检测体映射在二维平面中的图形区域与伪三维检测体的重叠面积最大,例如,使初始三维检测体映射在二维平面中的图形区域完全覆盖伪三维检测体;再例如,使伪三维检测体完全覆盖初始三维检测体映射在二维平面中的图形区域。
在一个可选示例中,本公开实施例使两者的面积重叠情况发生变化的方式也可以包括:调整初始三维检测体在三维空间中的尺寸,使初始三维检测体映射在二维平面中的图形区域与伪三维检测体的图形区域尽可能一致。例如,初始三维检测体在映射在二维空间中时,如果其长/宽/高的长度与伪三维检测体的长/宽/高的长度的比值不满足预定比值(如0.9-1.1等),则本公开实施例可以对初始三维检测体在三维空间中的长/宽/高的长度进行调整,以使调整后的三维检测体映射在二维空间中的长/宽/高的长度与伪三维检测体的长/宽/高的长度的比值满足预定比值或者相同。
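A sketch of the overlap criterion above, simplified to axis-aligned rectangles: the 3D body's eight vertices are projected to the 2D plane, their bounding rectangle is compared with the pseudo-3D body's footprint by intersection-over-union, and the adjustment seeks to raise that value. The pinhole projection and the rectangle simplification are assumptions of this sketch:

```python
import numpy as np

def rect_iou(a: tuple, b: tuple) -> float:
    """Intersection-over-union of two axis-aligned rectangles (x1, y1, x2, y2),
    standing in for the area-overlap measure between the projected 3D body and
    the pseudo-3D body."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def project_to_rect(vertices_3d: np.ndarray,
                    fx: float, fy: float, cx: float, cy: float) -> tuple:
    """Project the 8 vertices of the 3D body with a pinhole model and take
    their 2D bounding rectangle."""
    us = fx * vertices_3d[:, 0] / vertices_3d[:, 2] + cx
    vs = fy * vertices_3d[:, 1] / vertices_3d[:, 2] + cy
    return (us.min(), vs.min(), us.max(), vs.max())
```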
由于检测待处理图像中的目标对象的关键点的准确度已经相对较高,且检测速度相对较快,因此,利用目标对象的关键点可以快速的在二维平面中构建出准确度较高的伪三维检测体。本公开实施例通过利用伪三维检测体来对三维空间中的初始三维检测体进行校正,有利于提高针对目标对象在三维空间中所构建出的三维检测体的准确度。
在一个可选示例中,本公开实施例还可以将针对目标对象预设的长宽高比例作为初始三维检测体的约束条件,从而在三维空间中,可以根据该约束条件对初始三维检测体进行校正。例如,在目标对象为车辆的情况下,本公开实施例可以预设车辆的长宽高比例为2:1:1,从而在初始三维检测体的长宽高比例超出2:1:1一定范围时,可以对初始三维检测体的长宽高进行调整,使调整后的三维检测体的长宽高比例不会超出2:1:1一定范围。
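A sketch of the preset aspect-ratio constraint above; the rescaling strategy and the tolerance are illustrative choices of this sketch, since the text only requires that the ratio stay within some range of the preset 2:1:1:

```python
def clamp_lwh_ratio(l: float, w: float, h: float,
                    target=(2.0, 1.0, 1.0), tol: float = 0.25):
    """Preset aspect-ratio constraint: if l:w:h deviates from the preset ratio
    (e.g., 2:1:1 for a vehicle) by more than an illustrative tolerance, rescale
    the length toward the ratio implied by the width/height average."""
    base = (w / target[1] + h / target[2]) / 2.0
    if abs(l / base - target[0]) > tol * target[0]:
        l = target[0] * base
    return l, w, h
```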
在一个可选示例中,本公开实施例还可将目标对象在待处理图像中的检测框作为初始三维检测体的约束条件,从而在三维空间中,可以根据该约束条件对初始三维检测体进行校正。例如,在目标对象为车辆的情况下,本公开实施例可以将车辆检测框(也可以称为车辆外接框)作为初始三维检测体的约束条件,对初始三维检测体的整体位置和/或长宽高进行调整,从而在调整后的三维检测体映射在二维空间中时,完全落入在检测框内。由于目标对象的检测框通常较为准确,因此,利用检测框作为约束条件对初始三维检测体进行校正,有利于提高对象三维检测结果的准确性。
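A sketch of the detection-frame constraint above: after adjustment, the projected region of the 3D body must lie inside the target object's 2D detection frame. `rect_inside` is a hypothetical helper, with rectangles given as (x1, y1, x2, y2):

```python
def rect_inside(inner: tuple, outer: tuple) -> bool:
    """Check that the projected 3D body's 2D bounding rectangle lies entirely
    inside the 2D detection frame of the target object."""
    return (inner[0] >= outer[0] and inner[1] >= outer[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])
```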
在一个可选示例中,在待处理图像为具有时序关系的多个待处理图像中的一个待处理图像(如视频中的一视频帧)的情况下,本公开实施例可以对调整后的三维检测体进行平滑处理。该平滑处理可以包括:对三维检测体的长宽高的平滑处理、对三维检测体的运动方向的平滑处理以及对三维检测体的鸟瞰图中心点的平滑处理中的至少一个。另外,本公 开实施例也可以对三维检测体的八个顶点进行平滑处理。本公开实施例通过对三维检测体进行平滑处理,有利于提高对象三维检测的准确性,且可以避免目标对象在前后相邻的两视频帧中的大幅度晃动,从而有利于提高自动驾驶的安全性。
在实现平滑处理的过程中,本公开实施例可以利用当前待处理图像之前的多个历史待处理图像(如5个或者6个或者7个等历史视频帧),采用相应的拟合函数来预测当前待处理图像的目标对象的三维检测体的长宽高、运动方向或者鸟瞰图中心点等参数。当然,本公开实施例也可以为目标对象的三维检测体的8个顶点进行平滑处理。本公开实施例中的拟合函数可以采用二次函数、三次指数函数或者对数函数等,本公开实施例不限制在平衡处理过程中所采用的拟合函数的表现形式。
作为拟合函数的二次函数的一个例子如下述公式(3)所示:
$x = f(t) = at^2 + bt + c$　　　　公式(3)
在上述公式(3)中，x表示进行优化拟合的历史视频帧的值，如在采用5个历史视频帧进行拟合的情况下，$x=(x_1, x_2, x_3, x_4, x_5)$；t表示历史视频帧所对应的时刻，如在采用5个历史视频帧进行拟合的情况下，$t=(t_1, t_2, t_3, t_4, t_5)$；a、b和c表示二次函数的系数。
本公开实施例可以利用历史视频帧先获得公式(3)中的a、b和c,然后,再利用公式(3)获得当前视频帧的预测结果pred。
本公开实施例可以通过加权的方式对当前视频帧中的目标对象的三维检测体的长宽高、运动方向或者鸟瞰图中心点等参数进行相应的调整,从而实现相应的平滑处理,例如,本公开实施例可以利用下述公式(4)来对当前视频帧中的目标对象的三维检测体的长宽高、运动方向或者鸟瞰图中心点等参数进行相应的调整:
$x_6 = \alpha \cdot pred + \beta \cdot x_6'$　　　　公式(4)
在上述公式(4)中，α表示预测结果对应的权重，pred表示预测结果，β表示当前视频帧的对象三维检测结果对应的权重，$x_6'$ 表示当前视频帧的对象三维检测结果，$x_6$ 表示平滑处理后的当前视频帧的对象三维检测结果。
权重值的设置可以根据实际需求确定,例如,在预测结果与当前视频帧的对象三维检测结果相差不大(如相差不超过预定值等)的情况下,可以设置α=0.5和β=0.5;再例如,在预测结果与当前视频帧的对象三维检测结果相差较大(如相差达到预定值)的情况下,可以设置α=0.8和β=0.2,也可以设置α=0.7和β=0.3等。本公开实施例对权重的取值不作限制。
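公式(3) and 公式(4) combine into a single smoothing step: fit a quadratic to the historical frames, predict the current value, then blend prediction and detection. A minimal sketch (the history values below are illustrative):

```python
import numpy as np

def smooth_value(history_t: np.ndarray, history_x: np.ndarray,
                 t_now: float, x_now: float,
                 alpha: float = 0.5, beta: float = 0.5) -> float:
    """公式(3)/公式(4) sketch: fit x = a*t^2 + b*t + c to the historical
    frames, predict the current frame, and blend prediction with detection:
    x6 = alpha * pred + beta * x6'."""
    a, b, c = np.polyfit(history_t, history_x, deg=2)
    pred = a * t_now**2 + b * t_now + c
    return alpha * pred + beta * x_now

# e.g., smoothing the length of the 3D detection body over 5 historical frames:
t_hist = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x_hist = np.array([4.48, 4.52, 4.50, 4.49, 4.51])
smoothed = smooth_value(t_hist, x_hist, t_now=6.0, x_now=4.60)
```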
在一个可选示例中,本公开实施例还可以根据目标对象的关键点的三维坐标,获取最终获得的三维检测体的三维空间参数,例如,三维检测体的运动方向、三维检测体与摄取待处理图像的摄像装置之间的位置关系、以及三维检测体的尺寸中的任意一项或多项。获取的三维空间参数可以用于对目标对象进行控制,例如,根据获取的三维空间参数产生相应的控制指令等。
本公开实施例还提供了一种智能驾驶控制方法,该实施例的智能驾驶控制方法包括:以车辆上设置的摄像装置采集的视频包括的视频帧为待处理图像,采用本公开上述任一实施例对象三维检测方法确定目标对象的三维检测体;根据该三维检测体的信息生成车辆控 制指令;向车辆发送所述车辆控制指令。
图4为本公开实施例的智能驾驶控制方法的一个实施例的流程图。本公开实施例的智能驾驶控制方法可以适用于自动驾驶(如完全无人辅助的自动驾驶)环境中,也可以适用于辅助驾驶环境中。本公开实施例不限制智能驾驶控制方法的应用环境。
如图4所示,该实施例的智能驾驶控制方法包括:
S400、获取车辆上设置的摄像装置采集的待处理图像中的目标对象的关键点的二维坐标。本操作的实现方式可以参见上述方法实施例中针对图1中操作S100的描述,在此不再详细说明。
在一个可选示例中,该操作S400可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的获取二维坐标模块500执行。
S410、根据关键点的二维坐标,构建目标对象的伪三维检测体。本操作的实现方式可以参见上述方法实施例中针对图1中操作S110的描述,在此不再详细说明。
在一个可选示例中,该操作S410可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的构建三维检测体模块510执行。
S420、获取关键点的深度信息。本操作的实现方式可以参见上述方法实施例中针对图1中操作S120的描述,在此不再详细说明。
在一个可选示例中,该操作S420可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的获取深度信息模块520执行。
S430、根据关键点的深度信息和伪三维检测体,确定目标对象的三维检测体。本操作的实现方式可以参见上述方法实施例中针对图1中操作S130的描述,在此不再详细说明。
在一个可选示例中,该操作S430可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的确定三维检测体模块530执行。
S440、根据三维检测体的信息生成车辆控制指令。
在一个可选示例中,本公开实施例中的三维检测体的信息包括以下任意一项或多项:三维检测体的运动方向、三维检测体与所述摄像装置之间的位置关系、三维检测体的尺寸。本公开实施例不限制三维检测体的信息所包含的内容。
在一个可选示例中,本公开实施例根据三维检测体的信息,所生成的车辆控制指令可以包括以下任意一项或多项:刹车指令、减速行驶指令、左转向指令、右转向指令、保持当前速度行驶指令、鸣笛指令、加速行驶指令,本公开实施例不限制车辆控制指令的表现形式。
在一个可选示例中,该操作S440可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的生成指令模块610执行。
S450、向车辆发送车辆控制指令。
在一个可选示例中,该操作S450可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的发送指令模块620执行。
需要特别说明的是,本公开实施例的对象三维检测技术除了可以适用于智能驾驶控制领域之外,还可以应用在其他领域中;例如,可以实现工业制造中的对象检测、超市等室内领域的对象检测、安防领域中的对象检测等等,本公开实施例不限制对象三维检测技术的适用场景。
本公开实施例提供的任一种对象三维检测及智能驾驶控制的方法可以由任意适当的具有数据处理能力的设备执行,包括但不限于:终端设备和服务器等。或者,本公开实施例提供的任一种对象三维检测及智能驾驶控制的方法可以由处理器执行,如处理器通过调用存储器存储的相应指令来执行本公开实施例提及的任一种对象三维检测及智能驾驶控制的方法。下文不再赘述。
本领域普通技术人员可以理解:实现上述方法实施例的全部或部分操作可以通过程序指令相关的硬件来完成,前述的程序可以存储于一计算机可读取存储介质中,该程序在执行时,执行包括上述方法实施例的操作;而前述的存储介质包括:ROM、RAM、磁碟或 者光盘等各种可以存储程序代码的介质。
图5为本公开实施例的对象三维检测装置一个实施例的结构示意图。如图5所示,该实施例的装置包括:获取二维坐标模块500、构建三维检测体模块510、获取深度信息模块520以及确定三维检测体模块530。可选的,该装置还可以包括:选取关键点模块540、获取目标检测框模块550、平滑处理模块560、获取运动方向模块570以及获取位置关系模块580。
获取二维坐标模块500用于获取待处理图像中的目标对象的关键点的二维坐标。本公开实施例中的待处理图像可以为:设置在移动物体上的摄像装置所摄取的视频中的视频帧。本公开实施例中的待处理图像也可以为:设置在固定位置的摄像装置所摄取的视频中的视频帧。本公开实施例中的目标对象可以包括以下任意一项或多项:机动车辆、非机动车辆、行人、动物、建筑物、植物、障碍物、危险物、交通标识物、物品。
在一个可选示例中,在获取二维坐标模块500执行操作之前,获取目标检测框模块550可以对待处理图像进行目标对象检测,以获得包括有目标对象的二维目标检测框。这样获取二维坐标模块500可以基于待处理图像对应的二维目标检测框的图像部分,获取目标对象的关键点的二维坐标。另外,在目标对象检测过程中,如果获取目标检测框模块550未成功获取到包括有目标对象的二维目标检测框,则本公开实施例的对象三维检测装置可以不再进行对象的三维检测处理,例如,获取二维坐标模块500不再执行获取二维目标检测框的操作。
构建三维检测体模块510用于根据关键点的二维坐标,构建目标对象的伪三维检测体。
在一个可选示例中,构建三维检测体模块510可以包括:第一子模块和第二子模块。其中的第一子模块用于根据关键点和目标对象包括的面之间预定的第一所属关系以及关键点的二维坐标,构建至少一个目标对象的可能面。其中的第二子模块用于根据可能面构建目标对象的伪三维检测体。
在一个可选示例中,上述第二子模块可以包括:第一单元和第二单元。其中的第一单元用于根据面质量预定判断规则,在构建的至少一个可能面中确定一最佳面。其中的第二单元用于根据最佳面构建目标对象的伪三维检测体。另外,第一单元可以根据面质量预定判断规则,在构建的至少一个可能面中确定一最佳面和一次佳面;而第二单元可以根据最佳面和次佳面构建目标对象的伪三维检测体。
在一个可选示例中,第二单元可以先确定最佳面的法向量,然后,第二单元根据最佳面中的顶点沿法向量方向的拓展,形成伪三维检测体。其中,第二单元确定最佳面的法向量的方式可以为:将次佳面中的关键点向最佳面所做的垂线,作为最佳面的法向量。本公开实施例的第二单元确定最佳面的法向量的方式也可以为:将除了对应最佳面之外的其他关键点中预测准确度最高的关键点向最佳面所做的垂线,作为最佳面的法向量。本公开实施例的第二单元确定最佳面的法向量的方式还可以为:将最佳面的相邻面中的、与最佳面垂直的边上的两个关键点的坐标差,作为最佳面的法向量。
在一个可选示例中,在目标对象的关键点包括多个的情况下,在构建三维检测体模块510执行构建伪三维检测体操作之前,选取关键点模块540可以用于从多个关键点中选取符合预测准确度要求的关键点,从而构建三维检测体模块510可以根据选取关键点模块540所选取出的关键点的二维坐标,构建目标对象的伪三维检测体。
获取深度信息模块520用于获取关键点的深度信息。
确定三维检测体模块530用于根据关键点的深度信息和伪三维检测体,确定目标对象的三维检测体。
在一个可选示例中,确定三维检测体模块530可以包括:第三子模块、第四子模块以及第五子模块。其中的第三子模块用于根据关键点的深度信息,将关键点的二维坐标转换为三维空间中的三维坐标。例如,第三子模块可以将符合预测准确度要求的关键点的二维坐标转换为三维空间中的三维坐标。第四子模块用于根据关键点的三维坐标,构建目标对象的初始三维检测体。第五子模块用于根据伪三维检测体对初始三维检测体进行校正,形 成目标对象的三维检测体。例如,第五子模块根据二维平面中的伪三维检测体,对三维空间中的初始三维检测体进行调整,以提高调整后的三维检测体映射在二维平面中的区域与伪三维检测体间的面积重叠情况。另外,第五子模块还可以根据目标对象的预设长宽高比例,对初始三维检测体进行校正。第五子模块也可以根据目标对象在待处理图像中的检测框对初始三维检测体进行校正,以使调整后的三维检测体映射在二维平面中的区域属于所述检测框。
在一个可选示例中,第四子模块可以包括:第三单元、第四单元和第五单元。其中的第三单元用于根据关键点的三维坐标各自对应的面,确定目标对象的最佳面,并在三维空间,构建目标对象的最佳面。第四单元用于确定最佳面的法向量。例如,第四单元将次佳面中的关键点向最佳面所做的垂线,作为最佳面的法向量;再例如,第四单元将除了对应最佳面之外的其他关键点中预测准确度最高的关键点向最佳面所做的垂线,作为最佳面的法向量;再例如,第四单元将最佳面的相邻面中的、与最佳面垂直的边上的两个关键点的坐标差,作为最佳面的法向量。第五单元用于根据最佳面中的顶点沿法向量方向的拓展,形成初始三维检测体。
平滑处理模块560用于针对具有时序关系的多个待处理图像中的同一目标对象的三维检测体进行平滑处理。本公开实施例中的平滑处理可以包括以下任意一项或多项:对三维检测体的长宽高的平滑处理、对三维检测体的运动方向的平滑处理、对三维检测体的鸟瞰图中心点的平滑处理以及对三维检测体的顶点的平滑处理。
获取运动方向模块570用于根据目标对象的关键点的三维坐标,获取三维检测体的运动方向。
获取位置关系模块580用于根据目标对象的关键点的三维坐标,获取目标对象与摄取所述待处理图像的摄像装置之间的位置关系。
本公开实施例中的获取二维坐标模块500、构建三维检测体模块510、获取深度信息模块520、确定三维检测体模块530、选取关键点模块540、获取目标检测框模块550、平滑处理模块560、获取运动方向模块570以及获取位置关系模块580等模块所执行的操作,可以参见上述方法实施例中的相关描述。在此不再重复说明。
图6为本公开实施例的智能驾驶控制装置的一个实施例的结构示意图。图6中的装置主要包括:对象三维检测装置600、生成指令模块610以及发送指令模块620。
生成指令模块610用于根据对象三维检测装置600所获得的三维检测体的信息生成车辆控制指令。
发送指令模块620用于向车辆发送车辆控制指令。
对象三维检测装置600的结构可以参见本公开上述任一对象三维检测装置实施例中的描述,生成指令模块610以及发送指令模块620所执行的操作,可以参见上述方法实施例中的相关描述。在此不再重复说明。
图7示出了适于实现本公开实施例的示例性设备700,设备700可以是汽车中配置的控制系统/电子系统、移动终端(例如,智能移动电话等)、个人计算机(PC,例如,台式计算机或者笔记型计算机等)、平板电脑以及服务器等电子设备。图7中,设备700包括一个或者多个处理器、通信部等,所述一个或者多个处理器可以为:一个或者多个中央处理单元(CPU)701,和/或,一个或者多个利用神经网络进行视觉跟踪的图像处理器(GPU)713等,处理器可以根据存储在只读存储器(ROM)702中的可执行指令或者从存储部分708加载到随机访问存储器(RAM)703中的可执行指令而执行各种适当的动作和处理。通信部712可以包括但不限于网卡,所述网卡可以包括但不限于IB(Infiniband)网卡。处理器可与只读存储器702和/或随机访问存储器703中通信以执行可执行指令,通过总线704与通信部712相连、并经通信部712与其他目标设备通信,从而完成本公开实施例提供的任一对象三维检测方法对应的操作,例如,获取待处理图像中目标对象的关键点的二维坐标;根据所述关键点的二维坐标,构建所述目标对象的伪三维检测体;获取所述关键点的深度信息;根据所述关键点的深度信息和所述伪三维检测体,确定所述目标对象的三 维检测体。另外,处理器可与只读存储器702和/或随机访问存储器703中通信以执行可执行指令,通过总线704与通信部712相连、并经通信部712与其他目标设备通信,从而完成本公开实施例提供的任一智能驾驶控制方法对应的操作,例如,以车辆上设置的摄像装置采集的视频包括的视频帧为待处理图像,采用本公开任一实施例所述的对象三维检测方法确定目标对象的三维检测体;根据所述三维检测体的信息生成车辆控制指令;向所述车辆发送所述车辆控制指令。
上述各指令所执行的操作可以参见本公开上述对象三维检测方法或者智能驾驶控制方法实施例中的相关描述,在此不再详细说明。
此外,在RAM 703中,还可以存储有装置操作所需的各种程序以及数据。CPU701、ROM702以及RAM703通过总线704彼此相连。在有RAM703的情况下,ROM702为可选模块。RAM703存储可执行指令,或在运行时向ROM702中写入可执行指令,可执行指令使中央处理单元701执行上述对象三维检测方法或者智能驾驶控制方法所对应的操作。输入/输出(I/O)接口705也连接至总线704。通信部712可以集成设置,也可以设置为具有多个子模块(例如,多个IB网卡),并分别与总线连接。
以下部件连接至I/O接口705:包括键盘、鼠标等的输入部分706;包括诸如阴极射线管(CRT)、液晶显示器(LCD)等以及扬声器等的输出部分707;包括硬盘等的存储部分708;以及包括诸如LAN卡、调制解调器等的网络接口卡的通信部分709。通信部分709经由诸如因特网的网络执行通信处理。驱动器710也根据需要连接至I/O接口705。可拆卸介质711,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器710上,以便于从其上读出的计算机程序根据需要被安装在存储部分708中。
需要特别说明的是,如图7所示的架构仅为一种可选实现方式,在实践过程中,可根据实际需要对上述图7的部件数量和类型进行选择、删减、增加或替换;在不同功能部件设置上,也可采用分离设置或集成设置等实现方式,例如,GPU713和CPU701可分离设置,再如理,可将GPU713集成在CPU701上,通信部可分离设置,也可集成设置在CPU701或GPU713上等。这些可替换的实施例均落入本公开实施例的保护范围。
特别地,根据本公开实施例的实施例,下文参考流程图描述的过程可以被实现为计算机软件程序,例如,本公开实施例包括一种计算机程序产品,其包含有形地包含在机器可读介质上的计算机程序,计算机程序包含用于执行流程图所示的步骤的程序代码,程序代码可包括对应执行本公开实施例提供的对象三维检测方法或者智能驾驶控制方法中的步骤对应的指令,例如,例如,获取待处理图像中目标对象的关键点的二维坐标的指令;根据所述关键点的二维坐标,构建所述目标对象的伪三维检测体的指令;获取所述关键点的深度信息的指令;根据所述关键点的深度信息和所述伪三维检测体,确定所述目标对象的三维检测体的指令。或者,以车辆上设置的摄像装置采集的视频包括的视频帧为待处理图像,采用本公开任一实施例所述的对象三维检测方法确定目标对象的三维检测体的指令;根据所述三维检测体的信息生成车辆控制指令的指令;向所述车辆发送所述车辆控制指令的指令。
在这样的实施例中,该计算机程序可以通过通信部分709从网络上被下载及安装,和/或从可拆卸介质711被安装。在该计算机程序被中央处理单元(CPU)701执行时,执行本公开实施例中记载的实现上述相应步骤的指令。
在一个或多个可选实施例中,本公开实施例还提供了一种计算机程序程序产品,用于存储计算机可读指令,所述指令被执行时使得计算机执行上述任意实施例中所述的对象三维检测方法或者智能驾驶控制方法。
该计算机程序产品可以通过硬件、软件或其结合的方式实现。在一个可选例子中,所述计算机程序产品体现为计算机存储介质,在另一个可选例子中,所述计算机程序产品体现为软件产品,例如软件开发包(Software Development Kit,SDK)等等。
在一个或多个可选实施例中,本公开实施例还提供了另一种对象三维检测方法或者智能驾驶控制方法及其对应的装置和电子设备、计算机存储介质、计算机程序以及计算机程 序产品,其中的方法包括:第一装置向第二装置发送对象三维检测指示或者智能驾驶控制指示,该指示使得第二装置执行上述任一可能的实施例中的对象三维检测方法或者智能驾驶控制方法;第一装置接收第二装置发送的对象三维检测结果或者智能驾驶控制结果。
在一些实施例中,该视对象三维检测指示或者智能驾驶控制指示可以为调用指令,第一装置可以通过调用的方式指示第二装置执行对象三维检测操作或者智能驾驶控制操作,相应地,响应于接收到调用指令,第二装置可以执行上述对象三维检测方法或者智能驾驶控制方法中的任意实施例中的步骤和/或流程。
本说明书中各个实施例均采用递进的方式描述,每个实施例重点说明的都是与其它实施例的不同之处,各个实施例之间相同或相似的部分相互参见即可。对于系统实施例而言,由于其与方法实施例基本对应,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
可能以许多方式来实现本公开的方法和装置、电子设备以及计算机可读存储介质。例如,可通过软件、硬件、固件或者软件、硬件、固件的任何组合来实现本公开的方法和装置、电子设备以及计算机可读存储介质。用于方法的步骤的上述顺序仅是为了进行说明,本公开的方法的步骤不限于以上描述的顺序,除非以其它方式特别说明。此外,在一些实施例中,还可将本公开实施为记录在记录介质中的程序,这些程序包括用于实现根据本公开的方法的机器可读指令。因而,本公开还覆盖存储用于执行根据本公开的方法的程序的记录介质。
本公开的描述,是为了示例和描述起见而给出的,而并不是无遗漏的或者将本公开限于所公开的形式。很多修改和变化对于本领域的普通技术人员而言,是显然的。选择和描述实施例是为了更好说明本公开的原理以及实际应用,并且使本领域的普通技术人员能够理解本公开实施例可以从而设计适于特定用途的带有各种修改的各种实施例。

Claims (47)

  1. 一种对象三维检测方法,其特征在于,包括:
    获取待处理图像中目标对象的关键点的二维坐标;
    根据所述关键点的二维坐标,构建所述目标对象的伪三维检测体;
    获取所述关键点的深度信息;
    根据所述关键点的深度信息和所述伪三维检测体,确定所述目标对象的三维检测体。
  2. 根据权利要求1所述的方法,其特征在于,所述待处理图像包括:设置在移动物体上的摄像装置所摄取的视频中的视频帧,或者,设置在固定位置的摄像装置所摄取的视频中的视频帧。
  3. 根据权利要求1至2中任一项所述的方法,其特征在于,所述目标对象包括以下任意一项或多项:机动车辆、非机动车辆、行人、动物、建筑物、植物、障碍物、危险物、交通标识物、物品。
  4. 根据权利要求1至3中任一项所述的方法,其特征在于,所述根据所述关键点的二维坐标,构建所述目标对象的伪三维检测体,包括:
    根据关键点和目标对象包括的面之间预定的第一所属关系以及所述关键点的二维坐标,构建至少一个目标对象的可能面;
    根据所述可能面构建所述目标对象的伪三维检测体。
  5. 根据权利要求4所述的方法,其特征在于,所述根据所述可能面构建所述目标对象的伪三维检测体,包括:
    根据面质量预定判断规则,在构建的至少一个可能面中确定一最佳面;
    根据所述最佳面构建所述目标对象的伪三维检测体。
  6. 根据权利要求5所述的方法,其特征在于,所述根据面质量预定判断规则,在构建的至少一个可能面中确定一最佳面,包括:
    根据面质量预定判断规则,在构建的至少一个可能面中确定一最佳面和一次佳面;
    所述根据所述最佳面构建所述目标对象的伪三维检测体,包括:
    根据所述最佳面和所述次佳面构建所述目标对象的伪三维检测体。
  7. 根据权利要求1至6中任一项所述的方法,其特征在于,所述目标对象的关键点包括多个;
    所述根据所述关键点的二维坐标,构建所述目标对象的伪三维检测体之前,所述方法还包括:
    从多个所述关键点中选取符合预测准确度要求的关键点;
    所述根据所述关键点的二维坐标,构建所述目标对象的伪三维检测体,包括:
    根据所述选取出的关键点的二维坐标,构建所述目标对象的伪三维检测体。
  8. 根据权利要求1至7中任一项所述的方法,其特征在于,所述获取待处理图像中的目标对象的关键点的二维坐标之前,所述方法还包括:
    对所述待处理图像进行目标对象检测,获得包括有目标对象的二维目标检测框;
    所述获取待处理图像中的目标对象的关键点的二维坐标,包括:
    基于所述待处理图像对应所述二维目标检测框的图像部分,获取所述目标对象的关键点的二维坐标。
  9. 根据权利要求8所述的方法,其特征在于,所述方法还包括:
    在所述目标对象检测过程中,在未成功获取到包括有目标对象的二维目标检测框的情况下,不进行对象的三维检测处理。
  10. 根据权利要求6所述的方法,其特征在于,所述根据所述最佳面和所述次佳面构建所述目标对象的伪三维检测体,包括:
    确定所述最佳面的法向量;
    根据所述最佳面中的顶点沿所述法向量方向的拓展,形成所述伪三维检测体。
  11. 根据权利要求10所述的方法,其特征在于,所述确定最佳面的法向量,包括:
    将所述次佳面中的关键点向最佳面所做的垂线,作为最佳面的法向量;或者,
    将除了对应所述最佳面之外的其他关键点中预测准确度最高的关键点向所述最佳面所做的垂线,作为最佳面的法向量;或者,
    将所述最佳面的相邻面中的、与所述最佳面垂直的边上的两个关键点的坐标差,作为所述最佳面的法向量。
  12. 根据权利要求1至11中任一项所述的方法,其特征在于,所述根据所述关键点的深度信息和所述伪三维检测体,确定所述目标对象的三维检测体,包括:
    根据所述关键点的深度信息,将所述关键点的二维坐标转换为三维空间中的三维坐标;
    根据所述关键点的三维坐标,构建所述目标对象的初始三维检测体;
    根据所述伪三维检测体对所述初始三维检测体进行校正,形成所述目标对象的三维检测体。
  13. 根据权利要求12所述的方法,其特征在于,所述将所述关键点的二维坐标转换为三维空间中的三维坐标,包括:
    将符合预测准确度要求的关键点的二维坐标转换为三维空间中的三维坐标。
  14. 根据权利要求13所述的方法,其特征在于,所述根据所述关键点的三维坐标,构建所述目标对象的初始三维检测体,包括:
    根据所述关键点的三维坐标各自对应的面,确定所述目标对象的最佳面,并在三维空间,构建所述目标对象的最佳面;
    确定所述最佳面的法向量;
    根据所述最佳面中的顶点沿所述法向量方向的拓展,形成所述初始三维检测体。
  15. 根据权利要求14所述的方法,其特征在于,所述确定最佳面的法向量,包括:
    将所述次佳面中的关键点向所述最佳面所做的垂线,作为所述最佳面的法向量;或者,
    将除了对应所述最佳面之外的其他关键点中预测准确度最高的关键点向所述最佳面所做的垂线,作为所述最佳面的法向量;或者,
    将所述最佳面的相邻面中的、与所述最佳面垂直的边上的两个关键点的坐标差,作为所述最佳面的法向量。
  16. 根据权利要求12至15中任一项所述的方法,其特征在于,所述根据所述伪三维检测体对所述初始三维检测体进行校正,形成所述目标对象的三维检测体,包括:
    根据所述二维平面中的伪三维检测体,对所述三维空间中的初始三维检测体进行调整,以提高调整后的三维检测体映射在二维平面中的区域与所述伪三维检测体间的面积重叠情况。
  17. 根据权利要求16所述的方法,其特征在于,所述根据所述伪三维检测体对所述初始三维检测体进行校正,形成所述目标对象的三维检测体,还包括下述任意一项或多项:
    根据所述目标对象的预设长宽高比例,对所述初始三维检测体进行校正;
    根据所述目标对象在待处理图像中的检测框,对所述初始三维检测体进行校正,以使调整后的三维检测体映射在二维平面中的区域属于所述检测框。
  18. 根据权利要求1至17中任一项所述的方法,其特征在于,所述方法还包括:
    针对具有时序关系的多个待处理图像中的同一目标对象的三维检测体进行平滑处理。
  19. 根据权利要求18所述的方法,其特征在于,所述平滑处理,包括以下任意一项或多项:对三维检测体的长宽高的平滑处理、对三维检测体的运动方向的平滑处理、对三维检测体的鸟瞰图中心点的平滑处理以及对三维检测体的顶点的平滑处理。
  20. 根据权利要求1至19中任一项所述的方法,其特征在于,所述方法还包括以下任意一项或多项:
    根据所述目标对象的关键点的三维坐标,获取所述三维检测体的运动方向;
    根据所述目标对象的关键点的三维坐标,获取所述目标对象与摄取所述待处理图像的摄像装置之间的位置关系。
  21. 一种智能驾驶控制方法,其特征在于,以车辆上设置的摄像装置采集的视频包括的视频帧为待处理图像,采用如权利要求1-20中任一项所述的方法确定目标对象的三维检测体;
    根据所述三维检测体的信息生成车辆控制指令;
    向所述车辆发送所述车辆控制指令。
  22. 根据权利要求21所述的方法,其特征在于,所述三维检测体的信息包括以下任意一项或多项:三维检测体的运动方向、三维检测体与所述摄像装置之间的位置关系、三维检测体的尺寸。
  23. 一种对象三维检测装置,其特征在于,包括:
    获取二维坐标模块,用于获取待处理图像中的目标对象的关键点的二维坐标;
    构建三维检测体模块,用于根据所述关键点的二维坐标,构建所述目标对象的伪三维检测体;
    获取深度信息模块,用于获取所述关键点的深度信息;
    确定三维检测体模块,用于根据所述关键点的深度信息和所述伪三维检测体,确定所述目标对象的三维检测体。
  24. 根据权利要求23所述的装置,其特征在于,所述待处理图像包括:设置在移动物体上的摄像装置所摄取的视频中的视频帧,或者,设置在固定位置的摄像装置所摄取的视频中的视频帧。
  25. 根据权利要求23或24所述的装置,其特征在于,所述目标对象包括以下任意一项或多项:机动车辆、非机动车辆、行人、动物、建筑物、植物、障碍物、危险物、交通标识物、物品。
  26. 根据权利要求23至25中任一项所述的装置,其特征在于,所述构建三维检测体模块,包括:
    第一子模块,用于根据关键点和目标对象包括的面之间预定的第一所属关系以及所述关键点的二维坐标,构建至少一个目标对象的可能面;
    第二子模块,用于根据所述可能面构建所述目标对象的伪三维检测体。
  27. 根据权利要求26所述的装置,其特征在于,所述第二子模块包括:
    第一单元,用于根据面质量预定判断规则,在构建的至少一个可能面中确定一最佳面;
    第二单元,用于根据所述最佳面构建所述目标对象的伪三维检测体。
  28. 根据权利要求27所述的装置,其特征在于,所述第一单元用于:
    根据面质量预定判断规则,在构建的至少一个可能面中确定一最佳面和一次佳面;
    所述第二单元用于:
    根据所述最佳面和所述次佳面构建所述目标对象的伪三维检测体。
  29. 根据权利要求23至28中任一项所述的装置,其特征在于,所述目标对象的关键点包括多个;
    所述装置还包括:选取关键点模块,用于从多个所述关键点中选取符合预测准确度要求的关键点;
    所述构建三维检测体模块,用于根据所述选取出的关键点的二维坐标,构建所述目标对象的伪三维检测体。
  30. 根据权利要求23至29中任一项所述的装置,其特征在于,所述装置还包括:
    获取目标检测框模块,用于对所述待处理图像进行目标对象检测,获得包括有目标对象的二维目标检测框;
    所述获取二维坐标模块,用于基于所述待处理图像对应所述二维目标检测框的图像部分,获取所述目标对象的关键点的二维坐标。
  31. 根据权利要求30所述的装置,其特征在于,
    在所述目标对象检测过程中,在所述获取目标检测框模块未成功获取到包括有目标对象的二维目标检测框的情况下,对象三维检测装置不进行对象的三维检测处理。
  32. 根据权利要求28所述的装置,其特征在于,所述第二单元用于:
    确定所述最佳面的法向量;
    根据所述最佳面中的顶点沿所述法向量方向的拓展,形成所述伪三维检测体。
  33. 根据权利要求32所述的装置,其特征在于,所述第二单元确定所述最佳面的法向量时,用于:
    将所述次佳面中的关键点向最佳面所做的垂线,作为最佳面的法向量;或者,
    将除了对应所述最佳面之外的其他关键点中预测准确度最高的关键点向所述最佳面所做的垂线,作为最佳面的法向量;或者,
    将所述最佳面的相邻面中的、与所述最佳面垂直的边上的两个关键点的坐标差,作为所述最佳面的法向量。
  34. 根据权利要求23至33中任一项所述的装置,其特征在于,所述确定三维检测体模块包括:
    第三子模块,用于根据所述关键点的深度信息,将所述关键点的二维坐标转换为三维空间中的三维坐标;
    第四子模块,用于根据所述关键点的三维坐标,构建所述目标对象的初始三维检测体;
    第五子模块,用于根据所述伪三维检测体对所述初始三维检测体进行校正,形成所述目标对象的三维检测体。
  35. 根据权利要求34所述的装置,其特征在于,所述第三子模块用于:
    将符合预测准确度要求的关键点的二维坐标转换为三维空间中的三维坐标。
  36. 根据权利要求35所述的装置,其特征在于,所述第四子模块包括:
    第三单元,用于根据所述关键点的三维坐标各自对应的面,确定所述目标对象的最佳面,并在三维空间,构建所述目标对象的最佳面;
    第四单元,用于确定所述最佳面的法向量;
    第五单元,用于根据所述最佳面中的顶点沿所述法向量方向的拓展,形成所述初始三维检测体。
  37. 根据权利要求36所述的装置,其特征在于,所述第四单元用于:
    将所述次佳面中的关键点向所述最佳面所做的垂线,作为所述最佳面的法向量;或者,
    将除了对应所述最佳面之外的其他关键点中预测准确度最高的关键点向所述最佳面所做的垂线,作为所述最佳面的法向量;或者,
    将所述最佳面的相邻面中的、与所述最佳面垂直的边上的两个关键点的坐标差,作为所述最佳面的法向量。
  38. 根据权利要求34至37中任一项所述的装置,其特征在于,所述第五子模块用于:
    根据所述二维平面中的伪三维检测体,对所述三维空间中的初始三维检测体进行调整,以提高调整后的三维检测体映射在二维平面中的区域与所述伪三维检测体间的面积重叠情况。
  39. 根据权利要求38所述的装置,其特征在于,所述第五子模块还用于:根据所述目标对象的预设长宽高比例,对所述初始三维检测体进行校正;和/或,根据所述目标对象在待处理图像中的检测框对所述初始三维检测体进行校正,以使调整后的三维检测体映射在二维平面中的区域属于所述检测框。
  40. 根据权利要求23至39中任一项所述的装置,其特征在于,所述装置还包括:
    平滑处理模块,用于针对具有时序关系的多个待处理图像中的同一目标对象的三维检测体进行平滑处理。
  41. 根据权利要求40所述的装置,其特征在于,所述平滑处理包括以下任意一项或多项:对三维检测体的长宽高的平滑处理、对三维检测体的运动方向的平滑处理、对三维检测体的鸟瞰图中心点的平滑处理以及对三维检测体的顶点的平滑处理中的至少一个。
  42. 根据权利要求23至41中任一项所述的装置,其特征在于,所述装置还包括:
    获取运动方向模块,用于根据所述目标对象的关键点的三维坐标,获取所述三维检测 体的运动方向;和/或,
    获取位置关系模块,用于根据所述目标对象的关键点的三维坐标,获取所述目标对象与摄取所述待处理图像的摄像装置之间的位置关系。
  43. 一种智能驾驶控制装置,其特征在于,包括:
    如权利要求22-40任一所述的对象三维检测装置,用于以车辆上设置的摄像装置采集的视频包括的视频帧为待处理图像,确定目标对象的三维检测体;
    生成指令模块,用于根据所述三维检测体的信息生成车辆控制指令;
    发送指令模块,用于向所述车辆发送所述车辆控制指令。
  44. 根据权利要求43所述的装置,其特征在于,所述三维检测体的信息包括以下任意一项或多项:三维检测体的运动方向、三维检测体与所述摄像装置之间的位置关系、三维检测体的尺寸。
  45. 一种电子设备,包括:
    存储器,用于存储计算机程序;
    处理器,用于执行所述存储器中存储的计算机程序,且所述计算机程序被执行时,实现上述权利要求1-22中任一项所述的方法。
  46. 一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时,实现上述权利要求1-22中任一项所述的方法。
  47. 一种计算机程序,包括计算机指令,当所述计算机指令在设备的处理器中运行时,实现上述权利要求1-22中任一项所述的方法。
PCT/CN2019/096232 2018-08-07 2019-07-16 对象三维检测及智能驾驶控制的方法、装置、介质及设备 WO2020029758A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2021501280A JP6949266B2 (ja) 2018-08-07 2019-07-16 対象三次元検出及びスマート運転制御方法、装置、媒体並びに機器
US17/259,678 US11100310B2 (en) 2018-08-07 2019-07-16 Object three-dimensional detection method and apparatus, intelligent driving control method and apparatus, medium and device
SG11202100378UA SG11202100378UA (en) 2018-08-07 2019-07-16 Object three-dimensional detection method and apparatus, intelligent driving control method and apparatus, medium and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810891535.0A CN110826357B (zh) 2018-08-07 2018-08-07 对象三维检测及智能驾驶控制的方法、装置、介质及设备
CN201810891535.0 2018-08-07

Publications (1)

Publication Number Publication Date
WO2020029758A1 true WO2020029758A1 (zh) 2020-02-13

Family

ID=69414504

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/096232 WO2020029758A1 (zh) 2018-08-07 2019-07-16 对象三维检测及智能驾驶控制的方法、装置、介质及设备

Country Status (5)

Country Link
US (1) US11100310B2 (zh)
JP (1) JP6949266B2 (zh)
CN (1) CN110826357B (zh)
SG (1) SG11202100378UA (zh)
WO (1) WO2020029758A1 (zh)


Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11159811B2 (en) * 2019-03-15 2021-10-26 Tencent America LLC Partitioning of coded point cloud data
CN112767300A (zh) * 2019-10-18 2021-05-07 宏达国际电子股份有限公司 自动生成手部的标注数据的方法和计算骨骼长度的方法
CN111340886B (zh) * 2020-02-25 2023-08-15 深圳市商汤科技有限公司 检测物体的拾取点的方法及装置、设备、介质和机器人
CN111723716B (zh) * 2020-06-11 2024-03-08 深圳地平线机器人科技有限公司 确定目标对象朝向的方法、装置、系统、介质及电子设备
CN111931643A (zh) * 2020-08-08 2020-11-13 商汤集团有限公司 一种目标检测方法、装置、电子设备及存储介质
WO2022110877A1 (zh) * 2020-11-24 2022-06-02 深圳市商汤科技有限公司 深度检测方法、装置、电子设备、存储介质及程序
US11475628B2 (en) * 2021-01-12 2022-10-18 Toyota Research Institute, Inc. Monocular 3D vehicle modeling and auto-labeling using semantic keypoints
US11922640B2 (en) * 2021-03-08 2024-03-05 Toyota Research Institute, Inc. Semi-supervised 3D object tracking in videos via 2D semantic keypoints
US20220383543A1 (en) * 2021-05-26 2022-12-01 Abb Schweiz Ag Multi-Stage Autonomous Localization Architecture for Charging Electric Vehicles
CN113469115A (zh) * 2021-07-20 2021-10-01 阿波罗智联(北京)科技有限公司 用于输出信息的方法和装置
CN113449373B (zh) * 2021-07-21 2024-04-30 深圳须弥云图空间科技有限公司 重叠检测方法、装置及电子设备
CN115345919B (zh) * 2022-08-25 2024-04-12 北京精英路通科技有限公司 一种深度确定方法、装置、电子设备以及存储介质

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090041337A1 (en) * 2007-08-07 2009-02-12 Kabushiki Kaisha Toshiba Image processing apparatus and method
CN101915573A (zh) * 2010-08-04 2010-12-15 中国科学院自动化研究所 一种基于标记物的关键点检测的定位测量方法
CN102262724A (zh) * 2010-05-31 2011-11-30 汉王科技股份有限公司 目标图像特征点定位方法和目标图像特征点定位系统
CN102901446A (zh) * 2012-09-27 2013-01-30 无锡天授信息科技有限公司 一种运动目标三维立体定位系统及方法
CN104021368A (zh) * 2013-02-28 2014-09-03 株式会社理光 估计路面高度形状的方法和系统
CN106251395A (zh) * 2016-07-27 2016-12-21 中测高科(北京)测绘工程技术有限责任公司 一种三维模型快速重建方法及系统
CN107203962A (zh) * 2016-03-17 2017-09-26 掌赢信息科技(上海)有限公司 一种利用2d图片制作伪3d图像的方法及电子设备
CN108038902A (zh) * 2017-12-07 2018-05-15 合肥工业大学 一种面向深度相机的高精度三维重建方法和系统

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3052681B2 (ja) * 1993-08-06 2000-06-19 松下電器産業株式会社 3次元動画像生成装置
US5910817A (en) * 1995-05-18 1999-06-08 Omron Corporation Object observing method and device
GB2383915B (en) * 2001-11-23 2005-09-28 Canon Kk Method and apparatus for generating models of individuals
US7262767B2 (en) * 2004-09-21 2007-08-28 Victor Company Of Japan, Limited Pseudo 3D image creation device, pseudo 3D image creation method, and pseudo 3D image display system
JP4600760B2 (ja) * 2005-06-27 2010-12-15 アイシン精機株式会社 障害物検出装置
JP2010256252A (ja) * 2009-04-27 2010-11-11 Topcon Corp 三次元計測用画像撮影装置及びその方法
JP5299173B2 (ja) * 2009-08-26 2013-09-25 ソニー株式会社 画像処理装置および画像処理方法、並びにプログラム
US8260539B2 (en) * 2010-05-12 2012-09-04 GM Global Technology Operations LLC Object and vehicle detection and tracking using 3-D laser rangefinder
BR112015031284B1 (pt) * 2013-06-13 2020-06-23 Autonomic Materials, Inc. Material polimérico de autocura e método de criação do referido material
CN105313782B (zh) * 2014-07-28 2018-01-23 现代摩比斯株式会社 车辆行驶辅助系统及其方法
CN107093171B (zh) * 2016-02-18 2021-04-30 腾讯科技(深圳)有限公司 一种图像处理方法及装置、系统
US10372970B2 (en) * 2016-09-15 2019-08-06 Qualcomm Incorporated Automatic scene calibration method for video analytics
US10235771B2 (en) * 2016-11-11 2019-03-19 Qualcomm Incorporated Methods and systems of performing object pose estimation
US10031526B1 (en) * 2017-07-03 2018-07-24 Baidu Usa Llc Vision-based driving scenario generator for autonomous driving simulation
CN108229305B (zh) * 2017-11-21 2021-06-04 北京市商汤科技开发有限公司 用于确定目标对象的外接框的方法、装置和电子设备


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7393270B2 (ja) 2020-03-25 2023-12-06 株式会社コア 情報処理装置、情報処理方法及び情報処理プログラム
US11321862B2 (en) 2020-09-15 2022-05-03 Toyota Research Institute, Inc. Systems and methods for multi-camera modeling with neural camera networks
US11494927B2 (en) 2020-09-15 2022-11-08 Toyota Research Institute, Inc. Systems and methods for self-supervised depth estimation
US11508080B2 (en) 2020-09-15 2022-11-22 Toyota Research Institute, Inc. Systems and methods for generic visual odometry using learned features via neural camera models
US11615544B2 (en) 2020-09-15 2023-03-28 Toyota Research Institute, Inc. Systems and methods for end-to-end map building from a video sequence using neural camera models
CN113221751A (zh) * 2021-05-13 2021-08-06 北京百度网讯科技有限公司 关键点检测的方法、装置、设备以及存储介质
CN113221751B (zh) * 2021-05-13 2024-01-12 北京百度网讯科技有限公司 关键点检测的方法、装置、设备以及存储介质

Also Published As

Publication number Publication date
JP6949266B2 (ja) 2021-10-13
CN110826357A (zh) 2020-02-21
SG11202100378UA (en) 2021-02-25
CN110826357B (zh) 2022-07-26
JP2021524115A (ja) 2021-09-09
US11100310B2 (en) 2021-08-24
US20210165997A1 (en) 2021-06-03

Similar Documents

Publication Publication Date Title
WO2020029758A1 (zh) 对象三维检测及智能驾驶控制的方法、装置、介质及设备
JP7002589B2 (ja) 対象の三次元検出およびインテリジェント運転制御方法、装置、媒体および機器
US11216971B2 (en) Three-dimensional bounding box from two-dimensional image and point cloud data
US10733482B1 (en) Object height estimation from monocular images
WO2020108311A1 (zh) 目标对象3d检测方法、装置、介质及设备
US20210117704A1 (en) Obstacle detection method, intelligent driving control method, electronic device, and non-transitory computer-readable storage medium
JP7091485B2 (ja) 運動物体検出およびスマート運転制御方法、装置、媒体、並びに機器
US20210078597A1 (en) Method and apparatus for determining an orientation of a target object, method and apparatus for controlling intelligent driving control, and device
CN114170826B (zh) 自动驾驶控制方法和装置、电子设备和存储介质
WO2023036083A1 (zh) 传感器数据处理方法、系统及可读存储介质
CN112509126A (zh) 三维物体检测的方法、装置、设备及存储介质
CN114648639B (zh) 一种目标车辆的检测方法、系统及装置
US11417063B2 (en) Determining a three-dimensional representation of a scene
CN113312979B (zh) 图像处理方法、装置、电子设备、路侧设备及云控平台
CN116152345B (zh) 一种嵌入式系统实时物体6d位姿和距离估计方法
CN115345919A (zh) 一种深度确定方法、装置、电子设备以及存储介质
CN114266900A (zh) 一种基于动态卷积的单目3d目标检测方法
CN116342948A (zh) 纯视觉自动驾驶场景空间占据识别方法、系统及存储介质

Legal Events

Date Code Title Description
121: Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19848039; Country of ref document: EP; Kind code of ref document: A1)
ENP: Entry into the national phase (Ref document number: 2021501280; Country of ref document: JP; Kind code of ref document: A)
NENP: Non-entry into the national phase (Ref country code: DE)
122: Ep: pct application non-entry in european phase (Ref document number: 19848039; Country of ref document: EP; Kind code of ref document: A1)