CN111460852A - Vehicle-mounted 3D target detection method, system and device
- Publication number: CN111460852A
- Application number: CN201910047275.3A
- Authority: CN (China)
- Prior art keywords
- target object
- vehicle
- key points
- video image
- target
- Legal status: Withdrawn (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
Abstract
In the vehicle-mounted 3D target detection method of the present application, video images around a vehicle are collected, a target object to be detected is identified from the video images, key points are extracted from the target object, and the position information of the key points is acquired to determine the position of the target object relative to the vehicle. The target object is then tracked based on the position information of the key points, improving the accuracy of target detection. The application also relates to a target detection system and a target detection device for executing the target detection method.
Description
Technical Field
The present disclosure relates to the field of vehicles, and more particularly to a driving-assistance vehicle-mounted 3D (three-dimensional) target detection method, and to a target detection system and a target detection apparatus for implementing the method.
Background
The vehicle-mounted target detection system is a common driving assistance system: a video acquisition device installed on the vehicle captures video of targets, and an on-board processor performs image-based target detection on the captured video, providing early warnings for vehicle driving and improving driving safety. Target detection is mainly aimed at motor vehicles, non-motor vehicles, pedestrians, animals, roadblocks, and the like in the road environment. For different application scenarios, target detection systems generally adopt different detection methods, such as a detection system for a single target class (e.g., pedestrians) or a two-dimensional detection system for multiple target classes.
Because of the large amount of data computation and analysis involved, it is difficult for an on-board processor running a three-dimensional (3D) target detection system to balance detection speed, position precision, and detection accuracy at the same time. Among existing 3D target detection systems, some sacrifice position precision to avoid reducing the response speed of the detection system, while others have accuracy or response speed too low to satisfy the position precision requirements of real scenes.
Disclosure of Invention
The present application provides a 3D target detection method that improves detection speed and accuracy while ensuring a degree of position precision. The application also provides a target detection system and a target detection device based on a vehicle-mounted surround-view system. The technical scheme is as follows:
a vehicle-mounted 3D target detection method comprises the following steps:
collecting video images around a vehicle;
identifying a target object to be detected based on the video image;
extracting key points from the target object and acquiring position information of the key points;
and tracking the target object based on the position information of the key points.
In the vehicle-mounted 3D target detection method of the present application, the on-board processor obtains a target object model through deep learning or similar means, and then, based on that model, simplifies the spatial model of the target object by extracting key points from the target object identified in the video image. The target is kept under tracking based on the key points to improve the accuracy of detection, and the detection result is finally output. By simplifying the spatial model of the target object and substituting key points for its solid model, the method reduces the data-processing load to a large extent, which speeds up result output; the tracking of the target object then optimizes detection accuracy, so the method meets the demands of real scenes.
Wherein the extracting key points from the target object and acquiring the position information of the key points comprises:
establishing a space block diagram for the identified target object;
extracting the contour intersection points of the space block diagram as the key points;
and acquiring the position information of the key point.
The 3D target detection method can also use the spatial block diagram to establish a spatial model of the target object; while the spatial model is being simplified, the number and positions of the key points are determined at the same time. The position information of the key points can therefore be extracted from the target object as quickly as possible.
The position information of the key points is acquired based on a global coordinate system whose origin is the geometric center of the vehicle's outline structure. The geometric center of the vehicle provides a stable, objective reference for the global coordinate system.
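For illustration, the following is a minimal Python sketch of this coordinate unification, assuming each camera's extrinsic pose (rotation and translation relative to the vehicle's geometric center) is known from calibration; all names and values are illustrative, not part of the original disclosure:

```python
import numpy as np

def camera_to_vehicle(point_cam, R_cam_to_veh, t_cam_to_veh):
    """Map a 3D point from a camera frame into the global vehicle frame.

    The global frame has its origin at the geometric center of the
    vehicle's outline structure, as described above. R and t come from
    the camera's extrinsic calibration (assumed known here).
    """
    point_cam = np.asarray(point_cam, dtype=float)
    return R_cam_to_veh @ point_cam + t_cam_to_veh

# Example: a keypoint 5 m ahead of a front camera mounted 2 m forward
# of the vehicle center (identity rotation for simplicity).
R = np.eye(3)
t = np.array([2.0, 0.0, 0.5])            # camera offset from vehicle center (m)
kp_vehicle = camera_to_vehicle([5.0, 0.0, -0.5], R, t)
print(kp_vehicle)                        # -> [7. 0. 0.] in the global frame
```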
Wherein the extracting key points from the target object and obtaining the position information of the key points further comprises:
calculating a distance value between the target object and a vehicle based on the position information of the key points.
The vehicle-mounted 3D target detection method can also calculate the distance value between the identified target object and the vehicle, and send a warning signal to the driver when the target object comes close to the vehicle, prompting the driver to take avoiding action.
Wherein the calculating a distance value between the target object and a vehicle based on the position information of the key points comprises:
and calculating the distance value between the key point and the vehicle based on the prestored vehicle outline dimension.
Using the vehicle's external dimensions improves the accuracy of the distance value between the key point and the vehicle.
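The following hedged sketch shows one way such a computation could look, assuming the pre-stored outline is approximated by an axis-aligned rectangle on the ground plane; the function name and dimensions are illustrative:

```python
import numpy as np

def distance_to_vehicle(keypoint_xy, veh_length, veh_width):
    """Distance from a keypoint (ground-plane coordinates in the
    vehicle-centered frame) to the vehicle outline, modeled here as an
    axis-aligned rectangle.

    Using the pre-stored outline dimensions instead of the coordinate
    origin avoids overstating the gap by half a vehicle length/width.
    """
    half = np.array([veh_length / 2.0, veh_width / 2.0])
    # Component-wise overshoot beyond the rectangle; zero if inside it.
    d = np.maximum(np.abs(np.asarray(keypoint_xy, dtype=float)) - half, 0.0)
    return float(np.linalg.norm(d))

# A keypoint 3 m ahead of the front bumper of a 4.8 m x 1.9 m vehicle:
print(distance_to_vehicle([5.4, 0.0], 4.8, 1.9))   # -> 3.0
```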
Wherein the establishing a spatial frame diagram for the identified target object comprises:
establishing a two-dimensional block diagram for a target object in the front-view orientation;
and establishing a three-dimensional block diagram for a target object in the side-view orientation.
According to different visual angle orientations, different ways of establishing the spatial block diagram are adopted, so that the spatial block diagram can describe the target object more accurately, and the accuracy of the method is improved.
Wherein the calculating a distance value between the target object and a vehicle based on the position information of the key points comprises:
if the spatial block diagram is a two-dimensional block diagram, calculating the distance value of the target object from the coordinates of any one key point;
and if the spatial block diagram is a three-dimensional block diagram, selecting, from among the key points, the coordinates of the key point closest to the vehicle to calculate the distance value of the target object.
Because the viewing angle to the target object in the video image differs, the spatial block diagram may be either a two-dimensional block diagram or a three-dimensional block diagram. Different calculation methods are adopted for block diagrams of different dimensionality: on top of the target object already being simplified into several key points, the key points can be screened further, and only those matching the calculation method are used to compute the distance value between the target object and the vehicle, further increasing the response speed of the vehicle-mounted 3D target detection method.
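A minimal sketch of this screening step, assuming keypoints are already expressed as ground-plane coordinates in the global frame and that an arbitrary distance function is supplied by the caller (both assumptions of this illustration):

```python
import math

def target_distance(keypoints, is_three_dimensional, distance_fn):
    """Select which keypoints feed the distance computation.

    For a two-dimensional block diagram, any single keypoint suffices,
    since all keypoints lie in one plane facing the camera; for a
    three-dimensional block diagram, only the keypoint nearest the
    vehicle is evaluated.
    """
    if not is_three_dimensional:
        return distance_fn(keypoints[0])             # any keypoint works
    return min(distance_fn(kp) for kp in keypoints)  # nearest keypoint

# Illustrative distance function: Euclidean distance to the origin.
dist = lambda kp: math.hypot(kp[0], kp[1])
print(target_distance([(6.0, 1.0), (8.0, 1.0)], True, dist))  # nearest corner
```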
Wherein, if the spatial block diagram is a two-dimensional block diagram, the calculating of the distance value of the target object from the coordinates of any one key point includes:
when the two-dimensional block diagram is a circle, an ellipse, or a continuous curve, determining equally dividing points or poles as the key points. This overcomes the problem that a circular, elliptical, or continuous-curve two-dimensional block diagram has no sharp contour intersections.
Wherein, if the spatial block diagram is a three-dimensional block diagram, the selecting of the coordinates of the key point closest to the vehicle to calculate the distance value of the target object includes:
the key point closest to the vehicle is located at a common-edge position of at least two picture frames, so it can be found quickly and accurately.
Wherein the tracking the target object based on the location information of the keypoints comprises:
judging the motion trend of the target object according to the coordinate variation of the key point;
identifying the target object in subsequent video images based on the motion trend.
In the vehicle-mounted 3D target detection method, the movement trend of the target object relative to the vehicle is obtained by calculating and analyzing the variation of the key-point coordinates over consecutive video frames. In subsequent video images, the target object is then identified purposefully within the image region corresponding to that movement trend, which improves the efficiency of identifying and tracking the target object.
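As an illustrative sketch, the movement trend could be estimated from the mean keypoint displacement between two frames, assuming the same keypoints are matched across frames (the matching itself is outside this fragment):

```python
import numpy as np

def motion_trend(prev_keypoints, curr_keypoints, dt):
    """Judge the target's motion trend from keypoint coordinate variation.

    prev_keypoints / curr_keypoints: (N, 3) arrays holding the same
    keypoints in two consecutive frames, expressed in the vehicle-centered
    global frame; dt is the frame interval in seconds. Returns the target's
    speed relative to the vehicle and its unit direction of movement.
    """
    delta = np.mean(np.asarray(curr_keypoints, dtype=float)
                    - np.asarray(prev_keypoints, dtype=float), axis=0)
    speed = np.linalg.norm(delta) / dt
    direction = delta / (np.linalg.norm(delta) + 1e-9)
    return speed, direction

# A target whose keypoints all shifted 0.5 m along -x in 0.1 s:
speed, direction = motion_trend([[10, 2, 0], [11, 2, 0]],
                                [[9.5, 2, 0], [10.5, 2, 0]], dt=0.1)
print(speed, direction)   # -> 5.0 m/s, moving along -x (toward the vehicle)
```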
Wherein, the determining the movement trend of the target according to the coordinate variation of the key point includes:
obtaining a current vehicle running state;
and judging the motion trend of the target object by combining the coordinate variation of the key point and the current vehicle running state.
The vehicle-mounted 3D target detection method can also take into account the real-time driving state of the vehicle, including information such as speed, gear, steering direction, and braking, and combine it with the coordinate variation of the key points to judge the movement trend of the target object, further improving the efficiency of identifying and tracking the target object.
Wherein the current vehicle driving state is expressed as a change of the coordinate system whose origin is the geometric center of the vehicle, converting the driving state of the vehicle into data that the on-board processor can handle.
Wherein the tracking the target object based on the location information of the keypoints further comprises:
and if the target object identified in the subsequent video image is deviated from the motion trend, fusing the target object in the subsequent video image with the motion trend.
While the vehicle-mounted 3D target detection method is tracking the target object, the target object may be lost in a certain frame, or in the consecutive video images within a certain period of time. The causes of such loss are complex and uncontrollable. The method therefore fuses the lost target object back into the video images by judging its movement trend from the preceding period of video, continuing the target object in the images for a period of time. This prevents the target object from being dropped because of an identification deviation or an environmental change, and thus guards against unexpected situations.
Wherein the fusing the target object in the subsequent video image with the motion trend comprises:
the possible coordinate positions of the key points are calculated in subsequent video images. And completing the description of the target object through the coordinate positions of the key points.
Wherein the collected video images around the vehicle are panoramic surround video images.
Applied to panoramic surround video images, the vehicle-mounted 3D target detection method can identify targets in the vehicle's surroundings comprehensively, so that all possible target objects around the vehicle are recognized and vehicle safety is improved.
The panoramic surround video images can also be selected and combined to reduce the amount of video-image computation.
Wherein the collected video image around the vehicle may be a night-vision video image, extending the environmental compatibility of the method of the present application.
The application further relates to a vehicle-mounted 3D target detection system, comprising:
the video acquisition module is used for acquiring video images around the vehicle;
the target detection module is used for identifying a target object to be detected based on the video image;
the 3D regression module is used for extracting key points of the identified target object and acquiring position information of the key points;
and the target tracking module is used for tracking the target object based on the position information of the key point.
Through the cooperation of these modules, the vehicle-mounted 3D target detection system can implement the 3D target detection method described above, detecting the target object quickly while maintaining high detection accuracy, achieving a good recognition effect, and meeting the demands of a vehicle in real scenes.
The application also relates to a vehicle-mounted 3D object detection device, which comprises a processor, an input device, an output device and a storage device, wherein the processor, the input device, the output device and the storage device are connected with each other, the storage device is used for storing a computer program, the computer program comprises program instructions, and the processor is configured to call the program instructions and execute the 3D object detection method.
Similarly, by executing the 3D target detection method, the vehicle-mounted 3D target detection device detects the target object quickly while maintaining high detection accuracy, obtaining a good recognition effect and meeting the demands of a vehicle in real scenes.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required to be used in the embodiments of the present application will be briefly described below.
Fig. 1 is a flowchart of a 3D target detection method provided in an embodiment of the present application;
FIG. 2 is a logic diagram of a 3D object detection method described herein;
FIG. 3 is a flowchart of the sub-steps of step S30 in the 3D object detection method shown in FIG. 1;
fig. 4 is a schematic diagram of a road surface scene as a video image according to an embodiment of the present application;
FIG. 5 is a flowchart of the sub-steps of step S34 in the 3D object detection method shown in FIG. 1;
fig. 6 is a flowchart of the sub-steps of step S40 in the 3D object detection method shown in fig. 1;
fig. 7 is a flowchart of the sub-steps of step S41 in the 3D object detection method shown in fig. 6;
FIG. 8 is a schematic diagram of a 3D object detection system provided by an embodiment of the present application;
fig. 9 is a schematic diagram of a 3D object detection apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1 and fig. 2, fig. 1 is a flowchart of a 3D object detection method according to an embodiment of the present application, and fig. 2 is a logic diagram of the 3D object detection method according to the present application. In an embodiment of the present application, the 3D target detection method at least includes the following steps:
s10, acquiring video images around the vehicle;
specifically, the vehicle is provided with a video acquisition device, and the video acquisition device can be a camera or a vehicle-mounted camera, such as a fisheye camera. The video acquisition device acquires video images around the vehicle in real time in the driving process of the vehicle, and transmits the acquired video images to the vehicle-mounted processor for analysis. The video image is a sequence of consecutive still images.
S20, identifying a target object to be detected based on the video image;
specifically, a target object model obtained through deep learning or similar means is prestored in the on-board processor. The target object to be detected may be a motor vehicle, a non-motor vehicle, a pedestrian, an animal, a roadblock, and so on. Based on the target object model, the on-board processor examines the video images acquired by the video acquisition device to determine whether a target object is present. It should be understood that before detection, the video images require certain preprocessing, such as removing image distortion, or stitching multiple video images together when several video acquisition devices are provided. Detection is then performed frame by frame: video images are extracted from the video acquisition device one frame at a time and examined until a target object is identified. If no target object exists in the current frame, the next frame is examined; if a target object exists in the current frame, the subsequent operations are performed on it.
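The frame-by-frame loop described above might be organized as in the following sketch, where the detection and preprocessing functions stand in for the deep-learned model and the de-distortion/stitching steps and are assumptions of this illustration:

```python
def detect_targets(frames, detect_fn, preprocess_fn):
    """Frame-by-frame detection loop as described above.

    frames: iterable of video frames; preprocess_fn handles distortion
    removal / stitching; detect_fn returns a (possibly empty) list of
    target objects for one frame. Both are supplied by the surrounding
    system and are assumptions of this sketch.
    """
    for frame in frames:
        targets = detect_fn(preprocess_fn(frame))
        if not targets:
            continue          # no target in this frame: move to the next
        yield frame, targets  # hand the targets to the subsequent steps
```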
S30, extracting key points from the target object and acquiring the position information of the key points;
specifically, when a target object is detected in the video image, its type, relative orientation, and other information can be recognized from the target object model obtained by deep learning. Different algorithms are used to simplify target objects of different types and orientations into a spatial model, and the key points of the target object are extracted from that simplified model. The position information of each key point is then obtained through the calibrated internal and external parameters of the video acquisition device; the position information is a three-dimensional coordinate value. In the target detection method of the present application, a target object in the detected video image is thus represented by the three-dimensional coordinate values of its key points, and its relative position in the video image is described by determining those coordinate values.
And S40, tracking the target object based on the position information of the key points.
Specifically, the position information of the key points describes the target object with comparatively low fidelity, and a single frame may contain an identification error for the target object, which would reduce the accuracy of the 3D target detection method. In addition, while the vehicle is moving, the position of the target object relative to the vehicle is usually changing, so the target object must be tracked as it is identified. Continuously tracking the target object over consecutive frames, on the one hand, verifies whether the identification of the target object is accurate; on the other hand, analyzing the motion track of the extracted key points makes it possible to predict the target object's movement trend, judge whether the target object will obstruct the vehicle's route, and remind and warn the driver in time.
It is understood that a single frame may contain several target objects differing in type and orientation, and each target object usually has a fairly complex outline. If an accurate spatial model were established for every target object and the three-dimensional coordinate values analyzed point by point from that model, the amount of computation would be too large: an on-board processor generally cannot complete such heavy data analysis quickly and output results in real time, the output lags, and the response speed of 3D target detection fails the demands of real scenes. The 3D target detection method of the present application instead acquires the video images around the vehicle in real time while the vehicle is driving and, with a target object model obtained by deep-learning training, identifies target objects such as motor vehicles and pedestrians in the video images. From that model, a corresponding spatial model is created for each identified target object. The spatial model carries several key points, and the method defines the position of the target object by analyzing the position information of the key points extracted from the spatial model. A target object with a complex shape is thereby simplified, and only the corresponding key points need to be extracted for position analysis, greatly reducing the computational load of the on-board processor; the method can therefore output detection results quickly and meet the need to detect targets while the vehicle is driving. Meanwhile, the method tracks the target object over consecutive frames, which improves identification accuracy.
It should be noted that in the 3D target detection method of the present application, the target object model is obtained by machine deep learning. While the on-board processor executes the method, the result of each identification and tracking of a target object can be stored back into the target object model, continuously training the deep-learned model and steadily improving its accuracy.
Referring to fig. 3, fig. 3 is a flowchart illustrating a sub-step of step S30 in the 3D object detection method shown in fig. 1. In this embodiment of the present application, the step S30 extracts a key point from the target object, and acquires position information of the key point, where the step includes:
s31, establishing a space block diagram for the identified target object;
specifically, referring to the road-surface scene shown in fig. 4 as a video image: according to the recognized shape of the target object in the video image, a spatial block diagram matching that shape is retrieved from the target object model and used to establish a spatial model for the target object. The spatial block diagram must enclose at least the greater part of the area occupied by the target object in the video image. In other embodiments, the spatial block diagram completely encloses the target object, so that no part of the target object protruding beyond the block diagram can interfere with the vehicle's route.
S32, extracting the outline intersection points of the space block diagram as the key points;
specifically, the spatial block diagram is formed by several contour lines in the video image; the contour lines surround the outer edge of the target object to enclose it, and where they intersect they form several contour intersection points. The 3D target detection method extracts these contour intersection points as key points, so the target object is positioned by calculating the coordinate information of the key points. The three-dimensional coordinates of each key point, i.e., its position information, are computed from the calibrated internal and external parameters of the video acquisition device, combined with the ground and other reference objects in the video image. The origin of the three-dimensional coordinates is preferably placed at the geometric center of the vehicle's outline structure, so that when several video acquisition devices are mounted at different positions on the vehicle, the coordinate systems of the video images they jointly form are unified. The three-dimensional coordinate values of the key points then all belong to one coordinate system, which makes it convenient for the on-board processor to process and judge the position information uniformly.
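One common way to realize this computation, shown here as a hedged sketch, is to cast a ray through the image keypoint using the calibrated intrinsic matrix and intersect it with the ground plane (z = 0 in the vehicle-centered frame); whether the patented method uses exactly this construction is not specified, so the fragment is illustrative only:

```python
import numpy as np

def keypoint_ground_position(pixel, K, R, t):
    """Recover the 3D position of an image keypoint assumed to lie on
    the ground plane (z = 0 in the global vehicle frame).

    K is the camera intrinsic matrix; R, t map global coordinates to
    camera coordinates (extrinsic calibration). Casting a ray through
    the pixel and intersecting it with the ground plane yields the
    keypoint's global coordinates.
    """
    uv1 = np.array([pixel[0], pixel[1], 1.0])
    ray_cam = np.linalg.inv(K) @ uv1          # ray direction, camera frame
    ray_world = R.T @ ray_cam                 # rotate into the global frame
    cam_center = -R.T @ t                     # camera center, global frame
    s = -cam_center[2] / ray_world[2]         # scale so the ray hits z = 0
    return cam_center + s * ray_world
```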
S33, acquiring the position information of the key points;
specifically, the position information of each key point is obtained by calibrating the internal and external parameters of the video acquisition device. The position information is a three-dimensional coordinate value, preferably a three-dimensional coordinate value of a global coordinate system having an origin of coordinates of a geometric center of the vehicle exterior structure. Thus, in the object detection method of the present application, the object in the detected video image is represented by the three-dimensional coordinate values of the key points, and the relative position of the object in the video image is described by the determination of the three-dimensional coordinate values of the key points.
In an embodiment, after extracting key points from the target object and acquiring location information of the key points in step S30, the 3D target detection method of the present application may further include:
and S34, calculating a distance value between the target object and the vehicle based on the position information of the key points.
Specifically, the vehicle's outline information may be prestored in the on-board processor, either as the vehicle's overall outline dimensions or as a set of coordinate points describing the vehicle's contour. After the position information of the key points is extracted, it can be compared against the vehicle outline information to compute, for each target object, the distance between the vehicle and the key point nearest to it; that distance is taken as the distance value between the target object and the vehicle. A first threshold for the distance value can be preset in the on-board processor: when the distance between a target object and the vehicle falls below this threshold, the on-board processor instructs the on-board reminding device to emit a warning signal, alerting the driver that a target object is close and should be avoided.
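A minimal sketch of the threshold check, with the warning hook and data layout as illustrative assumptions:

```python
def check_proximity(targets, first_threshold, warn_fn):
    """Compare each target's distance value against a preset threshold.

    targets: iterable of (target_id, distance_m) pairs; warn_fn is the
    hook to the on-board reminding device (an assumption of this sketch).
    """
    for target_id, distance in targets:
        if distance < first_threshold:
            warn_fn(f"Target {target_id} is {distance:.1f} m away - please avoid")

# Example with a 5 m threshold, printing instead of driving real hardware:
check_proximity([("pedestrian-1", 3.2), ("car-2", 12.0)], 5.0, print)
```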
Referring to fig. 5, fig. 5 is a flowchart illustrating a sub-step of step S34 in the 3D object detection method shown in fig. 1. In the embodiment of the present application, the step S34 of calculating the distance value between the target object and the vehicle based on the position information of the key point may include the sub-steps as shown in fig. 5:
s341, if the spatial block diagram is a two-dimensional block diagram, calculating the distance value of the target object from the coordinates of any one key point;
and S342, if the spatial block diagram is a three-dimensional block diagram, selecting the coordinates of the key point closest to the vehicle to calculate the distance value of the target object.
Specifically, referring to fig. 4, the spatial block diagram supplied by the target object model differs according to the shape of the target object in the video image. For a target object in a front-view orientation relative to the video acquisition device, the block diagram matched from the target object model is two-dimensional: from the perspective of the front-view orientation, the image obtained by the video acquisition device is close to a planar image, so a target object that appears planar in the video image can be fully described with a two-dimensional block diagram. Moreover, since the spatial model of such a target object is a plane in the spatial coordinate system, the perpendicular distance from any point of that plane to the origin of the coordinate system is the same. For a target object whose spatial block diagram is two-dimensional, the coordinates of any one of its key points can therefore be taken to calculate its distance, and the result can be used as the distance value of the target object relative to the vehicle.
For a target object in a side-view orientation relative to the video acquisition device, the spatial block diagram matched from the target object model is three-dimensional. By the principle of perspective, the image of a target object in a side-view orientation needs at least two picture frames to be fully enclosed, each of which is a two-dimensional frame; a spatial block diagram matched from the target object model that requires two or more frames is therefore defined as three-dimensional. In a target object described by a three-dimensional block diagram, the perpendicular distances of the key points from the origin of the coordinate system differ. In that case, the key point closest to the vehicle is found from the computed position information of the key points, and the distance value of the target object from the vehicle is determined from that key point's three-dimensional coordinates. Typically, the key point closest to the vehicle occurs at a common-edge position of at least two picture frames.
It should be noted that in the embodiment of fig. 4, the two-dimensional block diagrams or two-dimensional picture frames matched to the target objects take shapes such as rectangles, trapezoids, and parallelograms. These shapes typically have four contour intersections, i.e., four key points can be extracted from such a block diagram or picture frame. The 3D target detection method of the present application, however, is not limited to any particular shape of the spatial block diagram: any frame shape that can enclose the target object can serve in the spatial block diagram to delimit it accurately, including triangles, other polygons, and circles. When the spatial block diagram takes the shape of a circle, an ellipse, a continuous curve, or the like, contour intersection points can be defined by means of equally dividing points, poles relative to the coordinate axes, and so on; key points are then extracted from those defined intersection points, preserving the method's simplified description of the target object.
Because the 3D target detection method simplifies the spatial model of the target object to a certain extent, it can output detection results rapidly. The simplification based on the video image, however, may introduce identification errors. To improve identification accuracy, the method therefore also tracks the detected target object in the subsequent video images after it is first recognized.
Referring to fig. 6, fig. 6 is a flowchart illustrating a sub-step of step S40 in the 3D object detection method of fig. 1. In this embodiment, the step S40 of tracking the target object based on the position information of the key points includes:
s41, judging the movement trend of the target object according to the coordinate variation of the key points;
specifically, during frame-by-frame detection, a target object that keeps appearing in successive frames must be identified and its key points extracted in each frame. Because the vehicle is moving, the target object is usually in motion relative to the vehicle: its position in the frame-by-frame video images changes, the position of its spatial block diagram changes accordingly, and since the key points are extracted from that block diagram, their position information (i.e., their three-dimensional coordinate values) changes as well. From the coordinate variation of the key points, the movement trend of the target object relative to the vehicle can be judged; the movement trend includes the target object's speed relative to the vehicle and its direction of motion.
Referring to fig. 7, fig. 7 is a flowchart illustrating a sub-step of step S41 in the 3D object detection method shown in fig. 6. In this embodiment of the application, the step S41 of determining the movement trend of the target according to the coordinate variation of the key point includes:
s411, obtaining the current vehicle running state;
and S412, combining the coordinate variation of the key point and the current vehicle running state to judge the motion trend of the target object.
Specifically, the vehicle also carries various sensors that detect vehicle speed, transmission gear, and turning direction. From these sensor readings, parameters of the vehicle's driving state, such as its current speed and direction of travel, are obtained in real time. The vehicle's own motion determines how the coordinate system with the vehicle's geometric center as origin changes, so combining the driving state with the coordinate variation of the key points allows the movement trend of the target object to be judged more accurately: the position of the target object relative to the vehicle depends on the one hand on the target object's own motion and on the other hand on the driving state of the vehicle. Judging the movement trend from both together therefore improves the tracking precision of the 3D target detection method.
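As an illustrative sketch, the ego-motion compensation implied here could subtract the frame shift caused by the vehicle's own speed and yaw rate before judging the target's motion; the planar-motion model and parameter names are assumptions of this fragment:

```python
import numpy as np

def relative_motion(prev_kp, curr_kp, ego_velocity, ego_yaw_rate, dt):
    """Separate a keypoint's own motion from the vehicle's motion.

    prev_kp / curr_kp: keypoint (x, y) in the vehicle-centered frame in
    two consecutive frames. ego_velocity (m/s, along +x) and ego_yaw_rate
    (rad/s) come from the on-board speed / steering sensors. The vehicle's
    motion shifts and rotates its coordinate frame, so that change is
    removed before judging the target's own motion trend.
    """
    dtheta = ego_yaw_rate * dt
    c, s = np.cos(dtheta), np.sin(dtheta)
    rot = np.array([[c, -s], [s, c]])
    # Where a stationary point would appear after the vehicle moves:
    expected = rot.T @ (np.asarray(prev_kp, dtype=float)
                        - np.array([ego_velocity * dt, 0.0]))
    return (np.asarray(curr_kp, dtype=float) - expected) / dt  # target's own velocity
```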
S42, identifying the target object in the subsequent video image based on the motion trend.
Specifically, from the movement trend of the target object, the position where it will appear in subsequent video images can be predicted: its speed relative to the vehicle predicts the translation distance, and its direction of motion predicts the translation direction. When the on-board processor has identified the target object correctly, this prediction locates the target object in subsequent frames more precisely and speeds up identification. Conversely, when the target object cannot be located in the subsequent frames where the movement trend predicts it, the actual position of the target object in those frames can be used to check whether the current identification is accurate. When a deviation in the currently identified target object is detected, the on-board processor can correct the identification in time and update the target object model, improving the identification accuracy of the 3D target detection method.
S43, if the target object identified in the subsequent video image is deviated from the motion trend, fusing the target object in the subsequent video image and the motion trend.
Specifically, when the target object is being sought in subsequent frames according to its movement trend, failure to identify it, or a large deviation between its position and the position predicted by the movement trend, indicates that the target object may be lost or distorted. The causes are many: inaccurate identification of the target object, a change in the target object's speed, occlusion of the target object, a dropped frame in the video acquisition device, poor signal transmission, and so on. When the disturbance passes, a lost target object returns to the acquired video images and a distorted one again matches the movement trend. If the target object is lost or distorted because it has genuinely moved far from the vehicle, normal driving is unaffected. But if it is lost or distorted through an error in the detection system while it objectively remains within the vehicle's visible range, the missed or misjudged target object prevents the on-board processor from responding to it correctly and in time, and ignoring it may have adverse consequences.
Therefore, in the 3D target detection method, when the target object is not identified in subsequent video images, or its tracking deviates, the target object is defined in the subsequent video images by fusion according to its judged movement trend. When the target object is lost, the key points inferred from the movement trend are labeled directly in the video image: from the position information of the key points, combined with their speed and direction of motion, the coordinate positions where they may appear are calculated in the subsequent frames, and the target object is described in those frames by its several key points. The fusion can be applied over several subsequent frames to check whether the target object is truly lost. When the target object shows a large deviation in the video image, the key-point positions judged from the movement trend are fused with the key-point positions found in the image, correcting the position information of the target object's key points.
It is understood that when the target object reappears in the video image after one or more frames, or its deviation from the predicted movement trend returns to normal after one or more frames, the fusion can be judged finished and subsequent tracking continues with the reappeared target object as reference. When the target object does not reappear over many frames, or its actual position keeps deviating widely from the position judged by the movement trend, it can be concluded that the target object has left the vehicle's visible range or was misjudged (including errors in judging its type or its movement trend), and the fusion likewise ends. The duration of the fusion may be set manually, or set automatically for different types of target object determined by machine deep learning. The purpose of defining the target object by fusion according to its movement trend is to let the system or device executing the method correct itself, avoiding the adverse consequences of its own errors.
In one embodiment, in step S10, the video image of the surrounding of the vehicle is captured as a panoramic surround video image.
Specifically, the video acquisition devices can be arranged for panoramic surround capture, for instance the common arrangement of cameras at four positions: front, rear, left, and right of the vehicle. Panoramic surround video images allow the road conditions around the vehicle to be judged and detected more comprehensively, so that all possible target objects around the vehicle are identified and vehicle safety improves. The on-board processor can also select combinations of the surround video acquisition devices according to the driving state reported by the on-board sensors. For example, when the vehicle is reversing, only the rear, left, and right devices need to be activated: since the vehicle is not travelling forward and the driver's own field of view covers the front, dropping the front video image reduces the computation for de-distortion and stitching and raises the detection speed of the target object. The 3D target detection method imposes no strict requirements on the number or type of video acquisition devices; fisheye cameras distributed around the vehicle are one possible implementation. To improve environmental compatibility, cameras with a night-vision function may also be used, in which case the video images are night-vision video images.
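A hedged sketch of the camera combination selection for the reversing example; the camera names and gear encoding are illustrative assumptions:

```python
def select_cameras(gear, cameras=("front", "rear", "left", "right")):
    """Choose which surround-view cameras to activate for the current
    driving state.

    When reversing, the front camera is dropped: the vehicle is not
    travelling forward and the driver already looks ahead, so skipping
    the front image reduces de-distortion and stitching work.
    """
    if gear == "reverse":
        return tuple(c for c in cameras if c != "front")
    return cameras

print(select_cameras("reverse"))   # -> ('rear', 'left', 'right')
```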
Referring to fig. 8, the present application relates to a vehicle-mounted 3D object detection system 100, which includes:
the video capture module 101: the system is used for acquiring video images around the vehicle;
the target detection module 102: the target object to be detected is identified based on the video image;
3D regression module 103: the system is used for extracting key points of the identified target object and acquiring position information of the key points;
the target tracking module 104: for tracking the target object based on the location information of the keypoints.
In the vehicle-mounted 3D target detection system 100 of the present application, the video capture module 101 first acquires video images around the vehicle and passes them to the target detection module 102. The target detection module 102 stores a target object model obtained by deep learning and identifies target objects in the video images based on that model. The 3D regression module 103 establishes a spatial model for each target object identified by the target detection module 102, extracts key points from the spatial model, and acquires their position information; from the position information it can calculate the distance between the target object and the vehicle, assisting the driver in judging the surrounding driving environment and, when necessary, triggering a warning signal to the driver. The target tracking module 104 then tracks the target object in subsequent video images based on the key-point position information extracted by the 3D regression module 103 and uses the tracking result to improve identification accuracy; it can also feed each tracking result back to the target detection module 102 to further train the deep-learned target object model. In short, the target detection system 100 recognizes target objects in the video images from the video capture module 101 using the target object model of the target detection module 102, simplifies the recognized targets in a three-dimensional coordinate system through the 3D regression module 103, extracts key-point position information from the simplified spatial model to obtain the distance between target object and vehicle, and finally tracks the target object through the target tracking module 104 to raise the accuracy of the system. The target detection system 100 thus detects target objects rapidly while maintaining high detection accuracy, achieving a good recognition effect and meeting the demands of a vehicle in real scenes.
In an optional embodiment, the 3D regression module 103 is further configured to establish a spatial block diagram for the identified target object, extract a contour intersection point of the spatial block diagram as a key point, and obtain position information of the key point.
In an alternative embodiment, the 3D regression module 103 is further configured to obtain the position information of the key points based on a global coordinate system with the geometric center of the vehicle outline structure as an origin.
In an alternative embodiment, the 3D regression module 103 is further configured to calculate a distance value of the target object to the vehicle based on the position information of the key points.
In an alternative embodiment, the 3D regression module 103 is further configured to calculate a distance value between the key point and the vehicle based on a pre-stored vehicle profile.
In an alternative embodiment, the 3D regression module 103 is configured to build a two-dimensional block diagram for a target object in the front-view orientation, and a three-dimensional block diagram for a target object in the side-view orientation.
In an alternative embodiment, when the spatial block diagram is a two-dimensional block diagram, the 3D regression module 103 is further configured to calculate the distance value of the target object from the coordinates of any one key point;
when the spatial block diagram is a three-dimensional block diagram, the 3D regression module 103 is further configured to select the coordinates of the key point closest to the vehicle to calculate the distance value of the target object.
In an alternative embodiment, the 3D regression module 103 is configured to determine equally dividing points or poles as the key points when the two-dimensional block diagram is a circle, an ellipse, or a continuous curve.
In an alternative embodiment, the 3D regression module 103 is configured to locate the key point nearest the vehicle at a common-edge position of at least two picture frames.
In an optional embodiment, the target tracking module 104 is further configured to determine a motion trend of the target object according to the coordinate variation of the key point, and identify the target object in a subsequent video image based on the motion trend.
In an alternative embodiment, the target tracking module 104 is further configured to obtain a current vehicle driving state from an on-vehicle sensor, and combine the coordinate variation of the key point and the current vehicle driving state to determine the movement trend of the target object.
In an alternative embodiment, the target tracking module 104 describes the current vehicle driving state by using the change of the coordinate system with the geometric center of the vehicle as the origin.
In an optional embodiment, if the target object identified in the subsequent video image deviates from the motion trend, the target tracking module 104 is further configured to fuse the target object in the subsequent video image with the motion trend.
In an alternative embodiment, the target tracking module 104 is configured to calculate the possible coordinate locations of the key points in subsequent video images.
In an alternative embodiment, the video image captured by the video capture module 101 is a panoramic surround video image.
In an alternative embodiment, the video capture module 101 is configured to perform combination selection on the panoramic surround video image.
In an alternative embodiment, the video capture module 101 is used to capture night vision video images of the surroundings of the vehicle.
It should be noted that the implementation of each operation in fig. 8 may also correspond to the corresponding description of the method embodiment described above.
The present application further relates to an on-vehicle 3D object detection apparatus 200, please refer to fig. 9, which includes a processor 201, an input device 202, an output device 203, and a storage device 204, where the processor 201, the input device 202, the output device 203, and the storage device 204 are connected to each other, where the storage device 204 is configured to store a computer program, the computer program includes program instructions, and the processor 201 is configured to call the program instructions to execute the above 3D object detection method.
Specifically, the processor 201 calls the program instructions stored in the storage device 204 to perform the following operations:
collecting video images around a vehicle;
identifying a target object to be detected based on the video image;
extracting key points from the target object and acquiring position information of the key points;
and tracking the target object based on the position information of the key points.
The storage device 204 may include a volatile memory, such as a random-access memory (RAM); it may also include a non-volatile memory, such as a flash memory or a solid-state drive (SSD); the storage device 204 may also include a combination of the above types of storage.
The processor 201 may be a Central Processing Unit (CPU). The processor 201 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or any conventional processor.
In one embodiment, the processor 201 calls the program instructions stored in the storage device 204, and when extracting the key points from the target object and acquiring the position information of the key points, performs the following operations:
establishing a space block diagram for the identified target object;
extracting the contour intersection points of the space block diagram as the key points;
and acquiring the position information of the key point.
In one embodiment, the processor 201 invokes program instructions stored in the storage device 204 to obtain the location information of the keypoints based on a global coordinate system with the geometric center of the vehicle outline structure as the origin.
In one embodiment, the processor 201 invokes program instructions stored in the storage device 204 to perform the following operations after extracting key points from the target object and acquiring the position information of the key points:
and calculating the distance value of the target object to the vehicle based on the position information of the key points.
In one embodiment, the processor 201 invokes program instructions stored in the storage device 204 to calculate a distance value between the key point and the vehicle based on the pre-stored vehicle outline dimensions when calculating the distance value between the target object and the vehicle based on the position information of the key point.
In one embodiment, the processor 201 invokes program instructions stored in the storage device 204 to, when establishing a spatial block diagram for the identified target object, establish a two-dimensional block diagram for a target object in the front-view orientation; and establish a three-dimensional block diagram for a target object in the side-view orientation.
In one embodiment, the processor 201 invokes program instructions stored in the storage device 204 to perform the following operations when calculating the distance value of the target object to the vehicle based on the position information of the key point:
if the spatial block diagram is a two-dimensional block diagram, calculating the distance value of the target object from the coordinates of any one key point;
and if the spatial block diagram is a three-dimensional block diagram, selecting the coordinates of the key point closest to the vehicle to calculate the distance value of the target object.
In one embodiment, the processor 201 invokes program instructions stored in the storage device 204 to determine equally dividing points or poles as the key points when the two-dimensional block diagram is a circle, an ellipse, or a continuous curve.
In one embodiment, the processor 201 calls the program instructions stored in the storage device 204, and if the spatial diagram is a three-dimensional diagram, determines that the keypoint closest to the vehicle is located at a common edge position of at least two frames.
In one embodiment, the processor 201 calls the program instructions stored in the storage device 204, and when tracking the target based on the position information of the key point, the following operations are performed:
judging the motion trend of the target object according to the coordinate variation of the key point;
identifying the target object in subsequent video images based on the motion trend.
In one embodiment, when judging the motion trend of the target object according to the coordinate variation of the key points, the processor 201 calls the program instructions stored in the storage device 204 and performs the following operations:
obtaining the current vehicle running state;
and judging the motion trend of the target object by combining the coordinate variation of the key points with the current vehicle running state.
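A hedged sketch of this judgment, assuming a constant-velocity model in which the ego vehicle's displacement (standing in for the running state) is added back to the key point's apparent coordinate change; all names are illustrative:

```python
def motion_trend(kp_prev, kp_curr, ego_displacement, dt):
    """Estimate the target's velocity (m/s) in the vehicle-centered frame.
    The raw coordinate change mixes target motion with ego motion, so the
    ego displacement over dt is added back to isolate the target's movement."""
    vx = (kp_curr[0] - kp_prev[0] + ego_displacement[0]) / dt
    vy = (kp_curr[1] - kp_prev[1] + ego_displacement[1]) / dt
    return (vx, vy)

# The target appears 0.5 m closer, but the ego car itself advanced 0.8 m:
print(motion_trend((10.0, 0.0), (9.5, 0.1), (0.8, 0.0), dt=0.04))
# -> (7.5, 2.5): the target is in fact pulling away and drifting sideways.
```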
In one embodiment, the processor 201 invokes the program instructions stored in the storage device 204 to describe the current vehicle running state as a change of the coordinate system whose origin is the geometric center of the vehicle.
In one embodiment, when identifying the target object in subsequent video images based on the motion trend, the processor 201 invokes the program instructions stored in the storage device 204 and performs the following operation:
if the target object identified in a subsequent video image deviates from the motion trend, fusing the target object in the subsequent video image with the motion trend.
In one embodiment, when fusing the target object in the subsequent video image with the motion trend, the processor 201 invokes the program instructions stored in the storage device 204 to calculate the possible coordinate positions of the key points in the subsequent video image.
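One possible fusion, sketched under the assumption of a constant-velocity prediction and a fixed blending weight; the description only states that the detected target is fused with the motion trend, so the alpha-blend below is an illustrative choice rather than the prescribed method:

```python
def fuse_with_trend(detected_xy, prev_xy, velocity, dt, alpha=0.5):
    """Blend a deviating detection with the key-point position predicted
    from the motion trend; alpha weights the raw detection."""
    predicted = (prev_xy[0] + velocity[0] * dt,
                 prev_xy[1] + velocity[1] * dt)
    return tuple(alpha * d + (1.0 - alpha) * p
                 for d, p in zip(detected_xy, predicted))

# Detection drifted to (9.1, 0.4); the trend predicts (9.5, 0.1).
print(fuse_with_trend((9.1, 0.4), (10.0, 0.0), (-12.5, 2.5), dt=0.04))
# -> (9.3, 0.25)
```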
In one embodiment, when collecting video images of the surroundings of the vehicle, the processor 201 invokes the program instructions stored in the storage device 204 and performs the following operation:
acquiring a panoramic surround-view video image of the vehicle's surroundings as the video image.
In one embodiment, the processor 201 invokes the program instructions stored in the storage device 204 to select the panoramic surround-view video images in combination, thereby reducing the amount of computation on the video images.
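A sketch of this combined selection: only the surround views relevant to the current driving situation are processed each frame, which cuts computation. The camera names and the selection rule are assumptions made purely for illustration:

```python
def select_views(gear, speed_kmh):
    """Pick the subset of surround-view cameras worth processing."""
    if gear == "reverse":
        return ["rear", "rear_left", "rear_right"]
    if speed_kmh > 60.0:                        # highway: attention ahead
        return ["front", "front_left", "front_right"]
    return ["front", "rear", "left", "right"]   # low speed: full surround

print(select_views("drive", speed_kmh=80.0))    # -> forward-facing views only
```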
In one embodiment, the processor 201 invokes the program instructions stored in the storage device 204 to call a night-vision lens to capture video images of the surroundings of the vehicle.
Those skilled in the art will understand that all or part of the processes of the methods in the embodiments described above can be implemented by a computer program stored in a computer-readable storage medium; when executed, the program can carry out the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The embodiments described above do not limit the scope of protection of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the above embodiments shall fall within the protection scope of this technical solution.
Claims (19)
1. A vehicle-mounted 3D target detection method is characterized by comprising the following steps:
collecting video images around a vehicle;
identifying a target object to be detected based on the video image;
extracting key points from the target object and acquiring position information of the key points;
and tracking the target object based on the position information of the key points.
2. The method according to claim 1, wherein the extracting key points from the target object and obtaining the position information of the key points comprises:
establishing a spatial frame diagram for the identified target object;
extracting the contour intersection points of the spatial frame diagram as the key points;
and acquiring the position information of the key points.
3. The method of claim 2, wherein the acquiring the position information of the key points comprises:
and acquiring the position information of the key points based on a global coordinate system with the geometric center of the vehicle's outline structure as the origin.
4. The method according to claim 2, wherein the extracting key points from the target object and obtaining position information of the key points further comprises:
calculating a distance value between the target object and the vehicle based on the position information of the key points.
5. The method of claim 4, wherein the calculating a distance value between the target object and the vehicle based on the position information of the key points comprises:
and calculating the distance value between the key points and the vehicle based on the pre-stored vehicle outline dimensions.
6. The method of claim 4, wherein the establishing a spatial frame diagram for the identified target object comprises:
establishing a two-dimensional frame diagram for a target object in the front-view orientation;
and establishing a three-dimensional frame diagram for a target object in the side-view orientation.
7. The method of claim 6, wherein the calculating a distance value between the target object and the vehicle based on the position information of the key points comprises:
if the spatial frame diagram is a two-dimensional frame diagram, using the coordinates of any one key point to calculate the distance value of the target object;
and if the spatial frame diagram is a three-dimensional frame diagram, selecting the coordinates of the key point closest to the vehicle to calculate the distance value of the target object.
8. The method according to claim 7, wherein, if the spatial frame diagram is a two-dimensional frame diagram, using the coordinates of any one key point to calculate the distance value of the target object comprises:
when the two-dimensional frame diagram is a circle, an ellipse or a continuous curve, determining the bisecting points or pole points of the curve as the key points.
9. The method according to claim 7, wherein, if the spatial frame diagram is a three-dimensional frame diagram, selecting the coordinates of the key point closest to the vehicle to calculate the distance value of the target object comprises:
determining that the key point closest to the vehicle is located at an edge shared by at least two faces of the frame diagram.
10. The method of claim 1, wherein the tracking the target object based on the position information of the key points comprises:
judging the motion trend of the target object according to the coordinate variation of the key points;
and identifying the target object in subsequent video images based on the motion trend.
11. The method according to claim 10, wherein the judging the motion trend of the target object according to the coordinate variation of the key points comprises:
obtaining the current vehicle running state;
and judging the motion trend of the target object by combining the coordinate variation of the key points with the current vehicle running state.
12. The method of claim 11, wherein the current vehicle running state comprises a change of the coordinate system with the geometric center of the vehicle as the origin.
13. The method of claim 10, wherein the tracking the target object based on the position information of the key points further comprises:
and if the target object identified in a subsequent video image deviates from the motion trend, fusing the target object in the subsequent video image with the motion trend.
14. The method of claim 13, wherein the fusing the target object in the subsequent video image with the motion trend comprises:
calculating the possible coordinate positions of the key points in the subsequent video image.
15. The method of claim 1, wherein the captured video image of the surroundings of the vehicle is a panoramic surround-view video image.
16. The method of claim 15, wherein the panoramic surround-view video images are selected in combination to reduce the amount of computation on the video images.
17. The method of claim 1, wherein the captured video image of the surroundings of the vehicle is a night-vision video image.
18. A vehicle-mounted 3D target detection system, comprising:
the video acquisition module is used for acquiring video images around the vehicle;
the target detection module is used for identifying a target object to be detected based on the video image;
the 3D regression module is used for extracting key points from the identified target object and acquiring position information of the key points;
and the target tracking module is used for tracking the target object based on the position information of the key points.
19. A vehicle-mounted 3D target detection apparatus, comprising a processor, an input device, an output device, and a storage device that are interconnected, wherein the storage device is configured to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions to perform the vehicle-mounted 3D target detection method according to any one of claims 1-17.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201910047275.3A | 2019-01-18 | 2019-01-18 | Vehicle-mounted 3D target detection method, system and device
Publications (1)

Publication Number | Publication Date
---|---
CN111460852A | 2020-07-28
Family

ID=71684037

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN201910047275.3A (status: Withdrawn) | Vehicle-mounted 3D target detection method, system and device | 2019-01-18 | 2019-01-18

Country Status (1)

Country | Link
---|---
CN | CN111460852A
Cited By (4)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN111950504A | 2020-08-21 | 2020-11-17 | 东软睿驰汽车技术(沈阳)有限公司 | Vehicle detection method and device and electronic equipment
CN111950504B | 2020-08-21 | 2024-04-16 | 东软睿驰汽车技术(沈阳)有限公司 | Vehicle detection method and device and electronic equipment
CN112073641A | 2020-09-18 | 2020-12-11 | 深圳市众志联城科技有限公司 | Image shooting method and device, mobile terminal and storage medium
CN112073641B | 2020-09-18 | 2022-04-22 | 深圳市众志联城科技有限公司 | Image shooting method and device, mobile terminal and storage medium
Similar Documents

Publication | Title
---|---
US9916509B2 | Systems and methods for curb detection and pedestrian hazard assessment
CN107179767B | Driving control device, driving control method, and non-transitory recording medium
JP5421072B2 | Approaching object detection system
JP2020064046A | Vehicle position determining method and vehicle position determining device
EP3007099B1 | Image recognition system for a vehicle and corresponding method
US20170297488A1 | Surround view camera system for object detection and tracking
US9418556B2 | Apparatus and method for displaying a blind spot
Dueholm et al. | Trajectories and maneuvers of surrounding vehicles with panoramic camera arrays
KR20190039648A | Method for monitoring blind spot of vehicle and blind spot monitor using the same
US11200432B2 | Method and apparatus for determining driving information
CN102792314A | Cross traffic collision alert system
KR20170127036A | Method and apparatus for detecting and assessing road reflections
CN111213153A | Target object motion state detection method, device and storage medium
JP2020067698A | Partition line detector and partition line detection method
CN113297881A | Target detection method and related device
CN114419098A | Moving target trajectory prediction method and device based on visual transformation
JP5539250B2 | Approaching object detection device and approaching object detection method
JP2018048949A | Object recognition device
JP7095559B2 | Bound line detection device and lane marking method
JP6795379B2 | Operation control device, operation control method and operation control program
JP6375633B2 | Vehicle periphery image display device and vehicle periphery image display method
US9824449B2 | Object recognition and pedestrian alert apparatus for a vehicle
CN111460852A | Vehicle-mounted 3D target detection method, system and device
KR102003387B1 | Method for detecting and locating traffic participants using bird's-eye view image, computer-readable recording medium storing traffic participants detecting and locating program
JP2018073275A | Image recognition device
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20200728