CN116664971A - Model training method, obstacle detection method and related equipment

Info

Publication number
CN116664971A
Authority
CN (China)
Prior art keywords
target, frame, blocked, annotation, labeling
Legal status
Pending
Application number
CN202310492486.4A
Other languages
Chinese (zh)
Inventor
贺克赛
Current and original assignee
Inceptio Star Intelligent Technology Shanghai Co Ltd
Application filed by Inceptio Star Intelligent Technology Shanghai Co Ltd
Priority to CN202310492486.4A
Publication of CN116664971A

Classifications

    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/778 Active pattern-learning, e.g. online learning of image or video features
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • Y02T10/40 Engine management systems


Abstract

The invention provides a model training method, an obstacle detection method and related equipment. The method comprises: calculating a first loss value according to the prediction frame corresponding to the non-occluded target, the annotation frame corresponding to the non-occluded target, the prediction category corresponding to the non-occluded target, and the annotation category in the annotation data; calculating a second loss value according to the prediction frame corresponding to the occluded target, the updated annotation frame of the occluded target, the annotation frame of the non-occluded target, the prediction category corresponding to the occluded target, and the annotation category in the annotation data; and training and updating a pre-constructed target detection model according to the first loss value and the second loss value. The trained target detection model achieves higher detection accuracy when targets are occluded, requires no additional annotation cost, and adds no extra computation.

Description

Model training method, obstacle detection method and related equipment
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a model training method, an obstacle detection method, and related devices.
Background
Detecting obstacles on the road during vehicle travel is an important component of advanced driver assistance systems (Advanced Driver Assistance Systems, ADAS) and is critical to the safety of the entire ADAS. Obstacles encountered while driving include passenger cars, vans, buses, pedestrians, animals, traffic cones, and the like. In real detection scenes, obstacles frequently occlude one another, which causes the obstacle detection model to fail and in turn leads to false alarms and missed detections in the ADAS, posing serious safety problems.
In view of the low accuracy of obstacle detection under occlusion, an obstacle detection method that improves detection accuracy is needed.
Disclosure of Invention
The invention provides a model training method, an obstacle detection method and related equipment to solve the problem of low obstacle detection accuracy under occlusion.
The invention provides a model training method, which comprises the following steps:
acquiring a training image and annotation data corresponding to the training image;
determining an annotation frame of an occluded target and an annotation frame of a non-occluded target according to the positional relationship between annotation frames in the annotation data;
updating the annotation frame of the occluded target according to the overlapping portion between the annotation frame of the occluded target and the annotation frame of the non-occluded target, to obtain an updated annotation frame of the occluded target;
inputting the training image into a pre-constructed target detection model to obtain prediction data, the prediction data comprising prediction frames and prediction categories corresponding to the occluded target and the non-occluded target;
calculating a first loss value according to the prediction frame corresponding to the non-occluded target, the annotation frame corresponding to the non-occluded target, the prediction category corresponding to the non-occluded target, and the annotation category in the annotation data;
calculating a second loss value according to the prediction frame corresponding to the occluded target, the updated annotation frame of the occluded target, the annotation frame of the non-occluded target, the prediction category corresponding to the occluded target, and the annotation category in the annotation data;
and updating parameters of the pre-constructed target detection model according to the first loss value and the second loss value, and stopping the updating when the pre-constructed target detection model converges, to obtain a trained target detection model.
According to the model training method provided by the invention, calculating the second loss value according to the prediction frame corresponding to the occluded target, the updated annotation frame of the occluded target, the annotation frame of the non-occluded target, the prediction category corresponding to the occluded target, and the annotation category in the annotation data comprises:
calculating a center loss of the occluded target according to the prediction frame corresponding to the occluded target, the updated annotation frame of the occluded target, the prediction category corresponding to the occluded target, and the annotation category in the annotation data;
calculating an occlusion loss of the occluded target according to the center point corresponding to the prediction frame of the occluded target, the center point corresponding to the updated annotation frame of the occluded target, and the center point corresponding to the annotation frame of the non-occluded target;
and obtaining the second loss value according to the center loss of the occluded target and the occlusion loss of the occluded target.
According to the model training method provided by the invention, calculating the occlusion loss of the occluded target according to the center point corresponding to the prediction frame of the occluded target, the center point corresponding to the updated annotation frame of the occluded target, and the center point corresponding to the annotation frame of the non-occluded target comprises:
performing a norm calculation on the center point corresponding to the prediction frame of the occluded target and the center point corresponding to the updated annotation frame of the occluded target to obtain a first norm value;
performing a norm calculation on the center point corresponding to the prediction frame of the occluded target and the center point corresponding to the annotation frame of the non-occluded target to obtain a second norm value;
and taking the difference between the first norm value and the second norm value as the occlusion loss of the occluded target.
According to the model training method provided by the invention, the center point corresponding to the updated annotation frame of the occluded target is obtained as follows:
in the case that there are a plurality of overlapping portions, updating the annotation frame of the occluded target according to the largest of the plurality of overlapping portions, to obtain the updated annotation frame of the occluded target;
in the case that the updated annotation frame of the occluded target is rectangular, taking the center of the updated annotation frame as the center point corresponding to the updated annotation frame of the occluded target;
and in the case that the updated annotation frame of the occluded target is not rectangular, obtaining a triangle from the updated annotation frame via a diagonal, obtaining the inscribed circle of the triangle, and taking the center of the inscribed circle as the center point corresponding to the updated annotation frame of the occluded target.
According to the model training method provided by the invention, determining the annotation frame of the occluded target and the annotation frame of the non-occluded target according to the positional relationship between annotation frames in the annotation data comprises:
acquiring a plurality of annotation frames corresponding to the training image, each annotation frame corresponding to one target;
in the case that annotation frames intersect with one another, determining that an occluded target and a non-occluded target exist in the training image;
and determining the annotation frame of the occluded target and the annotation frame of the non-occluded target from among the intersecting annotation frames according to the distances between the annotation frames and the origin corresponding to the training image.
According to the model training method provided by the invention, the pre-constructed target detection model extracts features from the training image, predicts a heatmap, target width and height, and center-point offset based on the extracted feature map, and obtains the prediction data from the heatmap, the target width and height, and the center-point offset.
The invention also provides an obstacle detection method, which comprises the following steps:
acquiring an image to be detected around a vehicle;
inputting the image to be detected into a pre-trained target detection model to obtain a detection result of the obstacle; the obstacle comprises an occluded obstacle and a non-occluded obstacle, and the pre-trained target detection model is obtained by training according to the model training method.
The invention also provides a model training device, which comprises:
the training data acquisition module is used for acquiring a training image and annotation data corresponding to the training image;
the target determining module is used for determining the annotation frame of an occluded target and the annotation frame of a non-occluded target according to the positional relationship between annotation frames in the annotation data;
the annotation frame updating module is used for updating the annotation frame of the occluded target according to the overlapping portion between the annotation frame of the occluded target and the annotation frame of the non-occluded target, to obtain an updated annotation frame of the occluded target;
the model prediction module is used for inputting the training image into a pre-constructed target detection model to obtain prediction data, the prediction data comprising prediction frames and prediction categories corresponding to the occluded target and the non-occluded target;
the loss calculation module of the non-occluded target is used for calculating a first loss value according to the prediction frame corresponding to the non-occluded target, the annotation frame corresponding to the non-occluded target, the prediction category corresponding to the non-occluded target, and the annotation category in the annotation data;
the loss calculation module of the occluded target is used for calculating a second loss value according to the prediction frame corresponding to the occluded target, the updated annotation frame of the occluded target, the annotation frame of the non-occluded target, the prediction category corresponding to the occluded target, and the annotation category in the annotation data;
and the model updating module is used for updating parameters of the pre-constructed target detection model according to the first loss value and the second loss value, and stopping the updating when the pre-constructed target detection model converges, to obtain a trained target detection model.
The present invention also provides an obstacle detection device including:
the image acquisition module to be detected is used for acquiring images to be detected around the vehicle;
the obstacle detection module is used for inputting the image to be detected into a pre-trained target detection model to obtain a detection result of the obstacle; the obstacle comprises an occluded obstacle and a non-occluded obstacle, and the pre-trained target detection model is obtained by training according to the model training method.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the model training method as described in any one of the above or the obstacle detection method as described above when executing the program.
The invention further provides a vehicle comprising the electronic equipment.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a model training method as described in any one of the above or an obstacle detection method as described above.
The invention provides a model training method, an obstacle detection method and related equipment. The model training method calculates loss values separately for occluded and non-occluded targets, and takes the influence of the non-occluded target on the occluded target into account when calculating the occluded target's loss value, so that the prediction frame of the occluded target is constrained to stay away from the annotation frames of other targets and close to the occluded target's own original annotation frame; the finally trained target detection model therefore achieves higher detection accuracy when targets occlude or overlap one another. In addition, the invention distinguishes occluded from non-occluded targets on the original annotation data, adding no annotation cost. And since no network layers are adjusted in the target detection model, only the occlusion loss calculation being added, no extra computation is introduced when detecting targets with the trained model.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a model training method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of intersecting annotation frames according to an embodiment of the invention;
Fig. 3 is a first schematic diagram of an updated annotation frame of an occluded target according to an embodiment of the present invention;
Fig. 4 is a second schematic diagram of an updated annotation frame of an occluded target according to an embodiment of the present invention;
Fig. 5 is a schematic flow chart of an obstacle detection method according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a model training apparatus according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an obstacle detection apparatus according to an embodiment of the present invention;
Fig. 8 illustrates a physical structure diagram of an electronic device.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
FIG. 1 is a schematic flow chart of a model training method according to an embodiment of the present invention; as shown in fig. 1, the model training method includes the following steps:
s101, acquiring a training image and annotation data corresponding to the training image.
In this embodiment, the obtained training image is a training image under the target detection scene, and the corresponding annotation data includes the annotation frame and the category information of each target in the training image.
S102, determining the marked frame of the blocked target and the marked frame of the non-blocked target according to the position relation between the marked frames in the marked data.
In this step, whether each target in the training image is occluded or non-occluded is determined according to the positional relationship between the annotation frames corresponding to the training image.
Specifically, if annotation frames in the training image intersect (i.e., the image regions they enclose have a coincident portion), occlusion exists between the corresponding targets; if no annotation frames intersect, all targets in the training image are non-occluded. Where annotation frames intersect, the annotation frame of the occluded target and that of the non-occluded target are further determined from the distance relationship between the target corresponding to each annotation frame and the camera.
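As a concrete illustration of this rule, a minimal Python sketch follows. It assumes axis-aligned annotation boxes in (x1, y1, x2, y2) pixel format with the origin at the image's top-left corner, and uses each box's distance to the image origin as the camera-proximity proxy described later in this document; the function names and the top-left-corner distance measure are illustrative assumptions, not the patent's code.

    import math

    def boxes_intersect(a, b):
        """True if the image regions enclosed by two boxes have a coincident part."""
        return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

    def split_targets(boxes):
        """Partition annotation boxes into (non_occluded, occluded) lists.

        Among intersecting boxes, the box closest to the image origin is
        treated as the front (non-occluded) target; the rest are occluded.
        """
        def dist_to_origin(box):
            return math.hypot(box[0], box[1])

        non_occluded, occluded = [], []
        for i, box in enumerate(boxes):
            partners = [b for j, b in enumerate(boxes)
                        if j != i and boxes_intersect(box, b)]
            if not partners or dist_to_origin(box) <= min(map(dist_to_origin, partners)):
                non_occluded.append(box)
            else:
                occluded.append(box)
        return non_occluded, occluded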
S103, updating the marked frame of the blocked target according to the overlapping part between the marked frame of the blocked target and the marked frame of the non-blocked target so as to obtain the updated marked frame of the blocked target.
In this step, the overlapping region between the annotation frame of the occluded target and the annotation frame of the non-occluded target is obtained first, and this overlapping region is then removed from the region enclosed by the occluded target's original annotation frame, forming the non-occluded image region of the occluded target; that is, the annotation frame corresponding to the non-occluded image region (the updated annotation frame of the occluded target) is obtained from the occluded target's original annotation frame and the non-occluded target's annotation frame. When one target is occluded by several targets, i.e., there are several overlapping regions, only the two annotation frames with the largest overlapping area are used for the subsequent update. Since annotation frames in the annotation data are generally rectangular, the updated annotation frame is rectangular or hexagonal, and its center point is determined anew; the center-point determination is described in detail below. When there is only one overlapping region, one occluded target and one non-occluded target are obtained directly.
Taking the two intersecting annotation frames shown in Fig. 2 as an example, an overlapping portion exists between annotation frame 1 and annotation frame 2, and the occluded target's annotation frame is updated using this overlapping portion to form its updated annotation frame. Note that annotation frames in a target detection scene are generally rectangular, each with an initial center point at the center of the rectangle; after the occluded target's annotation frame is updated using the overlapping portion, the updated annotation frame may be hexagonal or still rectangular, and its center point changes.
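Under the same box conventions as the sketch above, the overlapping portion itself is simply the intersection rectangle of the two boxes; a minimal sketch follows, where the area function is what would be used to pick the largest overlap when several exist.

    def overlap_rect(a, b):
        """Overlapping rectangle of two boxes, or None if they are disjoint."""
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None

    def overlap_area(a, b):
        """Area of the overlap; 0.0 when the boxes do not intersect."""
        r = overlap_rect(a, b)
        return 0.0 if r is None else (r[2] - r[0]) * (r[3] - r[1])

The visible remainder of the occluded box (the original box minus this overlap) serves as the updated annotation frame; as noted above, it is either a rectangle or a hexagon.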
S104, inputting the training image into a pre-constructed target detection model to obtain prediction data.
The prediction data comprise prediction frames and prediction categories corresponding to the occluded and non-occluded targets.
The pre-constructed target detection model extracts features from the training image, predicts a heatmap, target width and height, and center-point offset based on the extracted feature map, and obtains the prediction data from these outputs. More specifically, the target detection model is built on CenterNet. Note that in this embodiment the target detection model is CenterNet-based; in other embodiments of the invention it may be a model modified from CenterNet, as long as the modified model retains the center-point-related terms in its loss calculation.
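For illustration only, CenterNet-style prediction heads for the three outputs might be sketched as follows; the backbone, channel width, and class count here are assumptions, not the patent's actual network.

    import torch
    import torch.nn as nn

    class CenterNetHeads(nn.Module):
        """Three CenterNet-style branches over a shared backbone feature map."""
        def __init__(self, in_ch=64, num_classes=6):
            super().__init__()
            def branch(out_ch):
                return nn.Sequential(
                    nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(inplace=True),
                    nn.Conv2d(in_ch, out_ch, 1))
            self.heatmap = branch(num_classes)  # per-class center-point heatmap
            self.wh = branch(2)                 # target width and height
            self.offset = branch(2)             # sub-pixel center-point offset

        def forward(self, feat):
            return torch.sigmoid(self.heatmap(feat)), self.wh(feat), self.offset(feat)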
In this step, the updated annotation frame of the occluded object and the annotation frame of the non-occluded object form new annotation frame data, and the new annotation frame data and the original category information in the annotation data form new annotation data. And inputting the training image into a pre-constructed target detection model, and outputting corresponding prediction data by the pre-constructed target detection model, wherein the prediction data comprises a prediction frame and a prediction category of each target in the training image.
And based on the predicted data and the new labeling data, carrying out loss calculation on the blocked target and the non-blocked target simultaneously by utilizing different loss functions.
S105, calculating to obtain a first loss value according to the prediction frame corresponding to the non-occluded target, the labeling frame corresponding to the non-occluded target, and the prediction category corresponding to the non-occluded target and the labeling category in the labeling data.
In this step, for a non-occluded target, the first loss value is calculated using the conventional CenterNet loss functions for corner points and center points, based on the prediction frame corresponding to the non-occluded target, the annotation frame corresponding to the non-occluded target, the prediction category corresponding to the non-occluded target, and the annotation category in the annotation data.
S106, calculating to obtain a second loss value according to the prediction frame corresponding to the blocked target, the updated marking frame of the blocked target, the marking frame of the non-blocked target, the prediction category corresponding to the blocked target and the marking category in the marking data.
In this step, for the occluded target, the conventional CenterNet loss functions for corner points and center points are used to compute the loss over the prediction frame corresponding to the occluded target, the updated annotation frame of the occluded target, the prediction category corresponding to the occluded target, and the annotation category in the annotation data; the occlusion loss is computed from the prediction frame corresponding to the occluded target, the updated annotation frame of the occluded target, and the annotation frame of the non-occluded target; and the second loss value is obtained by combining the two loss values.
The purpose of the occlusion loss is to constrain the prediction frame of the occluded target to stay away from the annotation frames of other targets and close to the occluded target's own original annotation frame.
And S107, updating parameters of the pre-constructed target detection model according to the first loss value and the second loss value, and stopping updating under the condition that the pre-constructed target detection model is converged so as to obtain a trained target detection model.
In this step, the network parameters of the target detection model are updated by back-propagation according to the loss value of the non-occluded target (i.e., the first loss value) and the loss value of the occluded target (i.e., the second loss value), and the updating stops once the target detection model converges, yielding the trained target detection model.
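A minimal sketch of this update step, assuming a PyTorch model and optimizer and assuming the two loss values are simply summed (the patent states only that both losses drive the parameter update):

    def train_step(model, optimizer, images, targets, compute_losses):
        """One parameter update from the first and second loss values."""
        optimizer.zero_grad()
        predictions = model(images)
        loss_1, loss_2 = compute_losses(predictions, targets)  # non-occluded / occluded
        total = loss_1 + loss_2
        total.backward()   # back-propagation through the detection network
        optimizer.step()
        return total.item()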
After the trained target detection model is obtained, the image in the target detection scene can be detected, and the trained target detection model has a better detection effect aiming at the condition that the targets are shielded or overlapped, namely, the shielded targets are provided with a more accurate prediction frame, and a more accurate prediction category is obtained based on the accurate prediction frame.
It should be noted that, the prediction frames of the blocked target and the unblocked target are rectangular frames, and the shape of the prediction frame is not changed due to the update of the labeling frame.
In addition, besides the above target detection scene, the model training method provided for the occlusion problem can be applied to other scenes where occlusion occurs, such as pedestrian detection.
According to the model training method provided by this embodiment of the invention, loss values are calculated separately for occluded and non-occluded targets, and the influence of the non-occluded target on the occluded target is taken into account when calculating the occluded target's loss value, so that the prediction frame of the occluded target is constrained to stay away from the annotation frames of other targets and close to its own original annotation frame; the finally trained target detection model therefore achieves higher detection accuracy when targets occlude or overlap one another. In addition, occluded and non-occluded targets are distinguished on the original annotation data, adding no annotation cost. And since no network layers are adjusted in the target detection model, only the occlusion loss calculation being added, no extra computation is introduced when detecting targets with the trained model.
Further, on the basis of the foregoing embodiment, calculating the second loss value according to the prediction frame corresponding to the occluded target, the updated annotation frame of the occluded target, the annotation frame of the non-occluded target, the prediction category corresponding to the occluded target, and the annotation category in the annotation data includes:
calculating the center loss of the occluded target according to the prediction frame corresponding to the occluded target, the updated annotation frame of the occluded target, the prediction category corresponding to the occluded target, and the annotation category in the annotation data;
calculating the occlusion loss of the occluded target according to the center point corresponding to the prediction frame of the occluded target, the center point corresponding to the updated annotation frame of the occluded target, and the center point corresponding to the annotation frame of the non-occluded target;
and obtaining the second loss value according to the center loss of the occluded target and the occlusion loss of the occluded target.
In this embodiment, the center loss L_center_2 is calculated using the existing loss functions in CenterNet:
L_center_2 = a*L_center_cls_2 + b*L_reg_2 + c*L_offset_2
where a, b, and c are weights. L_center_cls_2 is the Focal Loss, calculated from the heatmap generated from the occluded target's updated annotation frame, annotation category, and training image, and the heatmap predicted for the occluded target. L_reg_2 is an L1 loss, calculated from the center point corresponding to the occluded target's updated annotation frame, the annotated width and height (obtained from the updated annotation frame), and the predicted width and height (obtained from the occluded target's prediction frame). L_offset_2 is the loss calculated from the center-point coordinate offset corresponding to the occluded target's updated annotation frame and the center-point coordinate offset corresponding to the occluded target's prediction frame.
Correspondingly, the first loss value for the non-occluded target is calculated with the same CenterNet center loss; only the data used differ. The first loss value L_center_1 is calculated as:
L_center_1 = a*L_center_cls_1 + b*L_reg_1 + c*L_offset_1
where L_center_cls_1 is the Focal Loss, calculated from the heatmap generated from the non-occluded target's annotation frame, annotation category, and training image, and the heatmap predicted for the non-occluded target. L_reg_1 is an L1 loss, calculated from the center point corresponding to the non-occluded target's annotation frame, the annotated width and height (obtained from the annotation frame), and the predicted width and height (obtained from the non-occluded target's prediction frame). L_offset_1 is the loss calculated from the center-point coordinate offset corresponding to the non-occluded target's annotation frame and the center-point coordinate offset corresponding to the non-occluded target's prediction frame.
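A sketch of these center-loss terms in PyTorch, following the standard CenterNet focal-loss form; the exponents alpha and beta and the weights a, b, c are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def focal_loss(pred_hm, gt_hm, alpha=2.0, beta=4.0, eps=1e-6):
        """CenterNet-style focal loss; pred_hm is post-sigmoid, in (0, 1)."""
        pos = gt_hm.eq(1.0).float()
        pos_term = ((1 - pred_hm) ** alpha) * torch.log(pred_hm + eps) * pos
        neg_term = ((1 - gt_hm) ** beta) * (pred_hm ** alpha) \
            * torch.log(1 - pred_hm + eps) * (1 - pos)
        return -(pos_term + neg_term).sum() / pos.sum().clamp(min=1.0)

    def center_loss(pred_hm, gt_hm, pred_wh, gt_wh, pred_off, gt_off,
                    a=1.0, b=0.1, c=1.0):
        """L_center = a*L_center_cls + b*L_reg + c*L_offset."""
        return (a * focal_loss(pred_hm, gt_hm)
                + b * F.l1_loss(pred_wh, gt_wh)      # width/height regression
                + c * F.l1_loss(pred_off, gt_off))   # center-point offset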
In addition, in calculating the occluded target's loss, the occlusion loss designed in the invention is also computed. Specifically, a norm loss function (for example, a conventional L1-norm or L2-norm loss function) may be applied to the center point corresponding to the occluded target's prediction frame and the center point corresponding to its updated annotation frame, and to the center point corresponding to the occluded target's prediction frame and the center point corresponding to the non-occluded target's annotation frame, to obtain the occlusion loss L_occ.
More specifically, the occlusion loss is calculated as:
performing a norm calculation on the center point corresponding to the prediction frame of the occluded target and the center point corresponding to the updated annotation frame of the occluded target to obtain a first norm value;
performing a norm calculation on the center point corresponding to the prediction frame of the occluded target and the center point corresponding to the annotation frame of the non-occluded target to obtain a second norm value;
and taking the difference between the first norm value and the second norm value as the occlusion loss of the occluded target.
Further, the occlusion loss is calculated with the L2-norm loss function, namely:
L_occ = L2(center_pred - center_gt) - L2(center_pred - center_other)
where L_occ is the occlusion loss, center_pred is the center point corresponding to the occluded target's prediction frame, center_gt is the center point corresponding to the occluded target's updated annotation frame, and center_other is the center point corresponding to the non-occluded target's annotation frame.
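In code, the occlusion loss amounts to the following sketch, where the center_* arguments are 2-D (x, y) tensors taken from the three boxes named above:

    import torch

    def occlusion_loss(center_pred, center_gt, center_other):
        """L_occ = ||c_pred - c_gt||_2 - ||c_pred - c_other||_2.

        Minimizing this pulls the occluded target's predicted center toward
        its own (updated) annotation frame and pushes it away from the
        non-occluded target's annotation frame.
        """
        pull = torch.linalg.vector_norm(center_pred - center_gt)
        push = torch.linalg.vector_norm(center_pred - center_other)
        return pull - push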
When one target is occluded by several targets (i.e., one occluded target and several non-occluded targets), only the non-occluded target with the largest occluding portion is paired with the occluded target for the above L_occ calculation. When several targets are occluded by one target (i.e., one non-occluded target and several occluded targets), the above L_occ calculation is performed for each occluded target.
After the center loss L_center_2 and the occlusion loss L_occ are obtained, the occluded target's loss value, i.e., the second loss value, is obtained from them.
According to the model training method provided by this embodiment of the invention, an occlusion loss is designed for the occluded target to constrain its prediction frame to stay away from the annotation frames of other targets and close to the occluded target's own original annotation frame, thereby improving the target detection model's detection accuracy on occluded targets.
Further, on the basis of the above embodiment, the center point corresponding to the updated annotation frame of the occluded object is obtained by the following method:
and under the condition that a plurality of overlapping parts exist, updating the annotation frame of the blocked target according to the largest overlapping part in the plurality of overlapping parts so as to obtain the updated annotation frame of the blocked target.
And under the condition that the updated marking frame of the shielding target is rectangular, taking the center of the updated marking frame of the shielding target as the center point corresponding to the updated marking frame of the shielded target.
And under the condition that the updated marking frame of the shielding target is not rectangular, acquiring a triangle from the updated marking frame of the shielding target through a diagonal line, acquiring an inscribed circle in the triangle, and taking the circle center of the inscribed circle as a center point corresponding to the updated marking frame of the shielding target.
In this embodiment, the updated annotation frame of the occluded target is obtained exactly from the overlapping portion. Since annotation frames in the annotation data are generally rectangular, the updated annotation frame may be rectangular as shown in Fig. 3 or hexagonal as shown in Fig. 4.
When the updated annotation frame is rectangular as shown in Fig. 3 (i.e., the shaded region left for the occluded target), its center is taken as the center point corresponding to the occluded target's updated annotation frame.
When the updated annotation frame is hexagonal as shown in Fig. 4, a diagonal is taken in the hexagon, the inscribed circle of the triangle formed by the diagonal and two sides of the hexagon is obtained, and the center of this inscribed circle is taken as the center point corresponding to the occluded target's updated annotation frame.
As can be seen from Fig. 4, the three vertices of the enclosed triangle are (x2, y2), (x3, y3), and (x4, y4), and the center coordinates of the inscribed circle are then obtained as x = (x2 + x3 + x4)/3, y = (y2 + y3 + y4)/3.
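A minimal sketch of this center-point rule; hexagon_center implements the patent's stated formula exactly, i.e., the mean of the triangle's three vertices:

    def rect_center(box):
        """Center point of a rectangular (updated) annotation frame."""
        x1, y1, x2, y2 = box
        return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

    def hexagon_center(p2, p3, p4):
        """Center per the patent's formula: the mean of the three vertices
        (x2, y2), (x3, y3), (x4, y4) of the triangle cut off by a diagonal."""
        (x2, y2), (x3, y3), (x4, y4) = p2, p3, p4
        return ((x2 + x3 + x4) / 3.0, (y2 + y3 + y4) / 3.0)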
It should be noted that a triangle can normally be obtained from the hexagon via a diagonal: during data annotation, a target whose occluded portion exceeds a preset threshold (say, 1/2 of the original target) cannot be annotated accurately, so only cases in which the occluded portion of an annotation frame is small are retained, and in this context a triangle can always be obtained from the hexagon.
Further, on the basis of the foregoing embodiment, determining the annotation frame of the occluded target and the annotation frame of the non-occluded target according to the positional relationship between annotation frames in the annotation data includes the following.
A plurality of annotation frames corresponding to the training image are acquired, each annotation frame corresponding to one target.
In the case that annotation frames intersect with one another, it is determined that an occluded target and a non-occluded target exist in the training image.
The annotation frame of the occluded target and the annotation frame of the non-occluded target are then determined from among the intersecting annotation frames according to the distances between the annotation frames and the origin corresponding to the training image.
In this embodiment, if annotation frames intersect (an annotation frame generally just encloses its target), the targets corresponding to the intersecting annotation frames occlude one another. On this basis, which annotation frame is closer to the camera is further determined from the distance between each annotation frame and the origin corresponding to the training image: the target closer to the camera is the non-occluded target, and the target farther from the camera is the occluded target. That is, among the intersecting annotation frames, the one closest to the origin corresponding to the training image is taken as the non-occluded target's annotation frame, and the remaining ones are taken as the occluded targets' annotation frames. If no annotation frames corresponding to the training image intersect, all of them are annotation frames of non-occluded targets.
It should be noted that Fig. 2 shows two intersecting annotation frames; in other embodiments of the invention, three, four, or more annotation frames may intersect, in which case the annotation frame of the non-occluded target and those of the several occluded targets can still be determined from the distances between the annotation frames and the origin.
The origin corresponding to the training image mentioned above is the origin of the image-processing coordinate system in the target detection scene; the upper-left corner of the training image is generally taken as the origin, though other positions may be chosen.
Fig. 5 is a schematic flow chart of an obstacle detection method according to an embodiment of the present invention; as shown in fig. 5, the obstacle detection method includes the steps of:
s501, acquiring an image to be detected around a vehicle.
S502, inputting the image to be detected into a pre-trained target detection model to obtain a detection result of the obstacle; the obstacle comprises an occluded obstacle and a non-occluded obstacle, and the pre-trained target detection model is obtained by training according to the model training method.
In this embodiment, the trained target detection model is applied to the scene of obstacle detection in automatic driving, specifically, an image around the vehicle is acquired through a vehicle-mounted camera, and then the acquired image is input to the pre-trained target detection model for forward reasoning, so that a detection result is obtained.
More specifically, the training image is an image of the vehicle's surroundings, the corresponding annotation data are the annotation frames of the obstacles in the image and the obstacle categories (such as passenger car, van, bus, pedestrian, animal, traffic cone, and the like), and the target detection model is obtained by training with the above model training method on these training images and annotation data.
The target detection model extracts features from the image to be detected to obtain a feature map, which is fed into the model's heatmap prediction branch, target width-and-height prediction branch, and center-point offset prediction branch, yielding the heatmap, the target width and height, and the center-point offset. The prediction frame and prediction category of each obstacle are then obtained from the heatmap, the target width and height, and the center-point offset. The obstacles include occluded obstacles and non-occluded obstacles.
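For illustration, decoding in the usual CenterNet style might look as follows (an assumption, since the patent does not spell out the decoding step): local maxima of the heatmap give candidate centers, and the width-height and offset branches turn each peak into a box.

    import torch
    import torch.nn.functional as F

    def decode_detections(heatmap, wh, offset, k=100, stride=4):
        """Turn branch outputs (batch size 1) into (x1, y1, x2, y2, score, cls) tuples."""
        k = min(k, heatmap[0].numel())
        # keep only local maxima of the heatmap (3x3 max-pooling trick)
        keep = (heatmap == F.max_pool2d(heatmap, 3, stride=1, padding=1)).float()
        scores, idx = (heatmap * keep).flatten(1).topk(k)
        _, c, h, w = heatmap.shape
        cls_ids = torch.div(idx, h * w, rounding_mode="floor")
        ys = torch.div(idx % (h * w), w, rounding_mode="floor")
        xs = idx % w
        boxes = []
        for i in range(k):
            y, x = ys[0, i], xs[0, i]
            cx = (x + offset[0, 0, y, x]) * stride   # sub-pixel center x
            cy = (y + offset[0, 1, y, x]) * stride   # sub-pixel center y
            bw = wh[0, 0, y, x] * stride             # predicted width
            bh = wh[0, 1, y, x] * stride             # predicted height
            boxes.append((float(cx - bw / 2), float(cy - bh / 2),
                          float(cx + bw / 2), float(cy + bh / 2),
                          float(scores[0, i]), int(cls_ids[0, i])))
        return boxes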
According to the obstacle detection method provided by the embodiment of the invention, the object detection model is utilized to detect the blocked obstacle and the non-blocked obstacle, so that a more accurate detection result is obtained, and the safety of vehicle control is further improved.
The model training device provided by the invention is described below, and the model training device described below and the model training method described above can be referred to correspondingly.
FIG. 6 is a schematic structural diagram of a model training apparatus according to an embodiment of the present invention; as shown in fig. 6, the model training apparatus includes a training data acquisition module 601, a target determination module 602, a labeling frame update module 603, a model prediction module 604, a loss calculation module 605 of an unoccluded target, a loss calculation module 606 of an occluded target, and a model update module 607.
The training data obtaining module 601 is configured to obtain a training image and label data corresponding to the training image.
In this embodiment, the obtained training image is a training image under the target detection scene, and the corresponding annotation data includes the annotation frame and the category information of each target in the training image.
The target determining module 602 is configured to determine a labeling frame of an occluded target and a labeling frame of a non-occluded target according to a positional relationship between labeling frames in the labeling data.
In this module, whether each target in the training image is occluded or non-occluded is determined according to the positional relationship between the annotation frames corresponding to the training image.
Specifically, if annotation frames in the training image intersect (i.e., the image regions they enclose have a coincident portion), occlusion exists between the corresponding targets; if no annotation frames intersect, all targets in the training image are non-occluded. Where annotation frames intersect, the annotation frame of the occluded target and that of the non-occluded target are further determined from the distance relationship between the target corresponding to each annotation frame and the camera.
The annotation frame updating module 603 is configured to update the annotation frame of the occluded target according to the overlapping portion between the annotation frame of the occluded target and the annotation frame of the non-occluded target, so as to obtain an updated annotation frame of the occluded target.
In this module, the overlapping region between the annotation frame of the occluded target and the annotation frame of the non-occluded target is obtained first, and this overlapping region is then removed from the region enclosed by the occluded target's original annotation frame, forming the non-occluded image region of the occluded target; that is, the annotation frame corresponding to the non-occluded image region (the updated annotation frame of the occluded target) is obtained from the occluded target's original annotation frame and the non-occluded target's annotation frame.
Taking the two intersecting annotation frames shown in Fig. 2 as an example, an overlapping portion exists between annotation frame 1 and annotation frame 2, and the occluded target's annotation frame is updated using this overlapping portion to form its updated annotation frame. Annotation frames in a target detection scene are generally rectangular; after the update using the overlapping portion, the updated annotation frame of the occluded target may be hexagonal or still rectangular.
The model prediction module 604 is configured to input the training image into a pre-constructed target detection model, and obtain prediction data.
The prediction data comprise prediction frames and prediction categories corresponding to the occluded and non-occluded targets.
The pre-constructed target detection model extracts features from the training image, predicts a heatmap, target width and height, and center-point offset based on the extracted feature map, and obtains the prediction data from these outputs. More specifically, the target detection model is built on CenterNet. Note that in this embodiment the target detection model is CenterNet-based; in other embodiments of the invention it may be a model modified from CenterNet, as long as the modified model retains the center-point-related terms in its loss calculation.
In the module, the updated annotation frame of the occluded target and the annotation frame of the non-occluded target form new annotation frame data, and the new annotation frame data and the original category information in the annotation data form new annotation data. And inputting the training image into a pre-constructed target detection model, and outputting corresponding prediction data by the pre-constructed target detection model, wherein the prediction data comprises a prediction frame and a prediction category of each target in the training image.
And based on the predicted data and the new labeling data, carrying out loss calculation on the blocked target and the non-blocked target simultaneously by utilizing different loss functions.
The loss calculation module 605 of the non-occluded target is configured to calculate and obtain a first loss value according to the prediction frame corresponding to the non-occluded target, the labeling frame corresponding to the non-occluded target, and the prediction category corresponding to the non-occluded target and the labeling category in the labeling data.
In this module, for a non-occluded target, the first loss value is calculated using the conventional CenterNet loss functions for corner points and center points, based on the prediction frame corresponding to the non-occluded target, the annotation frame corresponding to the non-occluded target, the prediction category corresponding to the non-occluded target, and the annotation category in the annotation data.
The loss calculation module 606 of the blocked target is configured to calculate a second loss value according to the prediction frame corresponding to the blocked target, the updated annotation frame of the blocked target, the annotation frame of the non-blocked target, and the prediction category corresponding to the blocked target and the annotation category in the annotation data.
In this module, for the occluded target, the conventional CenterNet loss functions for corner points and center points are used to compute the loss over the prediction frame corresponding to the occluded target, the updated annotation frame of the occluded target, the prediction category corresponding to the occluded target, and the annotation category in the annotation data; the occlusion loss is computed from the prediction frame corresponding to the occluded target, the updated annotation frame of the occluded target, and the annotation frame of the non-occluded target; and the second loss value is obtained by combining the two loss values.
The purpose of the occlusion loss is to constrain the prediction frame of the occluded target to stay away from the annotation frames of other targets and close to the occluded target's own original annotation frame.
The model updating module 607 is configured to update parameters of the pre-constructed target detection model according to the first loss value and the second loss value, and stop updating when the pre-constructed target detection model converges, so as to obtain a trained target detection model;
in this module, the network parameters of the target detection model are updated by back-propagation according to the loss value of the non-occluded target (i.e., the first loss value) and the loss value of the occluded target (i.e., the second loss value), and the updating stops once the target detection model converges, yielding the trained target detection model.
After the trained target detection model is obtained, the image in the target detection scene can be detected, and the trained target detection model has a better detection effect aiming at the condition that the targets are shielded or overlapped, namely, the shielded targets are provided with a more accurate prediction frame, and a more accurate prediction category is obtained based on the accurate prediction frame.
It should be noted that, the prediction frames of the blocked target and the unblocked target are rectangular frames, and the shape of the prediction frame is not changed due to the update of the labeling frame.
In addition, besides the above target detection scene, the model training method provided for the occlusion problem can be applied to other scenes where occlusion occurs, such as pedestrian detection.
According to the model training device provided by this embodiment of the invention, loss values are calculated separately for occluded and non-occluded targets, and the influence of the non-occluded target on the occluded target is taken into account when calculating the occluded target's loss value, so that the prediction frame of the occluded target is constrained to stay away from the annotation frames of other targets and close to its own original annotation frame; the finally trained target detection model therefore achieves higher detection accuracy when targets occlude or overlap one another. In addition, occluded and non-occluded targets are distinguished on the original annotation data, adding no annotation cost. And since no network layers are adjusted in the target detection model, only the occlusion loss calculation being added, no extra computation is introduced when detecting targets with the trained model.
Fig. 7 is a schematic structural diagram of an obstacle detecting apparatus according to an embodiment of the present invention; as shown in fig. 7, the obstacle detection device includes an image acquisition module 701 to be detected and an obstacle detection module 702.
The image to be detected acquisition module 701 is configured to acquire an image to be detected around the vehicle.
The obstacle detection module 702 is configured to input the image to be detected into a pre-trained target detection model, and obtain a detection result of an obstacle; the obstacle comprises an occluded obstacle and a non-occluded obstacle, and the pre-trained target detection model is obtained by training according to the model training method.
In this embodiment, the trained target detection model is applied to the scene of obstacle detection in automatic driving, specifically, an image around the vehicle is acquired through a vehicle-mounted camera, and then the acquired image is input to the pre-trained target detection model for forward reasoning, so that a detection result is obtained.
More specifically, the training image is an image of the vehicle's surroundings, the corresponding annotation data are the annotation frames of the obstacles in the image and the obstacle categories (such as passenger car, van, bus, pedestrian, animal, traffic cone, and the like), and the target detection model is obtained by training with the above model training method on these training images and annotation data.
The target detection model extracts features from the image to be detected to obtain a feature map, which is fed into the model's heatmap prediction branch, target width-and-height prediction branch, and center-point offset prediction branch, yielding the heatmap, the target width and height, and the center-point offset. The prediction frame and prediction category of each obstacle are then obtained from the heatmap, the target width and height, and the center-point offset. The obstacles include occluded obstacles and non-occluded obstacles.
According to the obstacle detection device provided by the embodiment of the invention, the target detection model is used to detect both occluded and non-occluded obstacles, yielding more accurate detection results and thereby improving the safety of vehicle control.
Fig. 8 illustrates a schematic diagram of the physical structure of an electronic device. As shown in Fig. 8, the electronic device may include: a processor 810, a communication interface 820, a memory 830 and a communication bus 840, wherein the processor 810, the communication interface 820 and the memory 830 communicate with one another through the communication bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform the model training method provided above, the method comprising: acquiring a training image and annotation data corresponding to the training image; determining the annotation frame of the occluded target and the annotation frame of the non-occluded target according to the positional relationship between the annotation frames in the annotation data; updating the annotation frame of the occluded target according to the overlapping portion between the annotation frame of the occluded target and the annotation frame of the non-occluded target, to obtain an updated annotation frame of the occluded target; inputting the training image into a pre-constructed target detection model to obtain prediction data, the prediction data comprising the prediction frames and prediction categories corresponding to the occluded target and the non-occluded target; calculating a first loss value according to the prediction frame corresponding to the non-occluded target, the annotation frame corresponding to the non-occluded target, the prediction category corresponding to the non-occluded target and the annotation category in the annotation data; calculating a second loss value according to the prediction frame corresponding to the occluded target, the updated annotation frame of the occluded target, the annotation frame of the non-occluded target, the prediction category corresponding to the occluded target and the annotation category in the annotation data; and updating the parameters of the pre-constructed target detection model according to the first loss value and the second loss value, stopping the updating once the pre-constructed target detection model has converged, so as to obtain a trained target detection model.
Alternatively, the processor 810 may perform an obstacle detection method comprising: acquiring an image to be detected of the surroundings of a vehicle; and inputting the image to be detected into a pre-trained target detection model to obtain a detection result for obstacles, the obstacles comprising occluded obstacles and non-occluded obstacles, and the pre-trained target detection model being obtained by training according to the model training method described above.
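For orientation, here is a minimal sketch of how such a training step could be organized in PyTorch. The helpers `split_targets`, `center_loss` and `occlusion_loss` are hypothetical stand-ins for the steps recapped above (splitting annotations by box intersection, the ordinary detection loss, and the occlusion term); none of these names come from the patent, and they are passed in as parameters so the loop itself is self-contained.

```python
def train(model, loader, optimizer,
          split_targets, center_loss, occlusion_loss, max_epochs=50):
    """Training-loop sketch combining the first and second loss values."""
    for epoch in range(max_epochs):
        for images, annotations in loader:
            # Split annotations into occluded / non-occluded targets,
            # updating occluded annotation frames by their overlaps.
            occluded, non_occluded = split_targets(annotations)
            preds = model(images)
            # First loss value: ordinary loss over non-occluded targets.
            loss1 = center_loss(preds, non_occluded)
            # Second loss value: ordinary loss over occluded targets plus
            # the occlusion term repelling other targets' annotations.
            loss2 = center_loss(preds, occluded) + occlusion_loss(
                preds, occluded, non_occluded)
            optimizer.zero_grad()
            (loss1 + loss2).backward()
            optimizer.step()
        # A convergence check (e.g. a plateauing validation loss) would
        # decide when to stop updating; a fixed epoch budget stands in here.
```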
Furthermore, the logic instructions in the memory 830 described above may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention, or the part thereof contributing to the prior art, may be embodied essentially in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the model training method provided above, the method comprising: acquiring a training image and annotation data corresponding to the training image; determining the annotation frame of the occluded target and the annotation frame of the non-occluded target according to the positional relationship between the annotation frames in the annotation data; updating the annotation frame of the occluded target according to the overlapping portion between the annotation frame of the occluded target and the annotation frame of the non-occluded target, to obtain an updated annotation frame of the occluded target; inputting the training image into a pre-constructed target detection model to obtain prediction data, the prediction data comprising the prediction frames and prediction categories corresponding to the occluded target and the non-occluded target; calculating a first loss value according to the prediction frame corresponding to the non-occluded target, the annotation frame corresponding to the non-occluded target, the prediction category corresponding to the non-occluded target and the annotation category in the annotation data; calculating a second loss value according to the prediction frame corresponding to the occluded target, the updated annotation frame of the occluded target, the annotation frame of the non-occluded target, the prediction category corresponding to the occluded target and the annotation category in the annotation data; and updating the parameters of the pre-constructed target detection model according to the first loss value and the second loss value, stopping the updating once the pre-constructed target detection model has converged, so as to obtain a trained target detection model.
Alternatively, the computer program, when executed by a processor, implements an obstacle detection method comprising: acquiring an image to be detected of the surroundings of a vehicle; and inputting the image to be detected into a pre-trained target detection model to obtain a detection result for obstacles, the obstacles comprising occluded obstacles and non-occluded obstacles, and the pre-trained target detection model being obtained by training according to the model training method described above.
In another aspect, an embodiment of the present invention further provides a vehicle comprising the electronic device provided in the foregoing embodiment. The implementation principle and technical effects of the vehicle provided by the embodiment of the invention are the same as those of the foregoing method embodiments and are not described again here.
In another aspect, the present invention also provides a computer program product comprising a computer program storable on a non-transitory computer-readable storage medium, the computer program, when executed by a processor, being capable of performing the model training method provided above or the obstacle detection method.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without undue effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or, of course, by means of hardware. Based on this understanding, the foregoing technical solution, or the part thereof contributing to the prior art, may be embodied essentially in the form of a software product stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disk, comprising several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the various embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A model training method, comprising:
acquiring a training image and annotation data corresponding to the training image;
determining an annotation frame of an occluded target and an annotation frame of a non-occluded target according to the positional relationship between the annotation frames in the annotation data;
updating the annotation frame of the occluded target according to the overlapping portion between the annotation frame of the occluded target and the annotation frame of the non-occluded target, to obtain an updated annotation frame of the occluded target;
inputting the training image into a pre-constructed target detection model to obtain prediction data, the prediction data comprising prediction frames and prediction categories corresponding to the occluded target and the non-occluded target;
calculating a first loss value according to the prediction frame corresponding to the non-occluded target, the annotation frame corresponding to the non-occluded target, the prediction category corresponding to the non-occluded target and the annotation category in the annotation data;
calculating a second loss value according to the prediction frame corresponding to the occluded target, the updated annotation frame of the occluded target, the annotation frame of the non-occluded target, the prediction category corresponding to the occluded target and the annotation category in the annotation data; and
updating parameters of the pre-constructed target detection model according to the first loss value and the second loss value, and stopping the updating when the pre-constructed target detection model converges, so as to obtain a trained target detection model.
2. The model training method according to claim 1, wherein calculating the second loss value according to the prediction frame corresponding to the occluded target, the updated annotation frame of the occluded target, the annotation frame of the non-occluded target, the prediction category corresponding to the occluded target and the annotation category in the annotation data comprises:
calculating a center loss of the occluded target according to the prediction frame corresponding to the occluded target, the updated annotation frame of the occluded target, the prediction category corresponding to the occluded target and the annotation category in the annotation data;
calculating an occlusion loss of the occluded target according to the center point corresponding to the prediction frame of the occluded target, the center point corresponding to the updated annotation frame of the occluded target and the center point corresponding to the annotation frame of the non-occluded target; and
obtaining the second loss value according to the center loss of the occluded target and the occlusion loss of the occluded target.
3. The model training method according to claim 2, wherein calculating the occlusion loss of the occluded target according to the center point corresponding to the prediction frame of the occluded target, the center point corresponding to the updated annotation frame of the occluded target and the center point corresponding to the annotation frame of the non-occluded target comprises:
performing a norm calculation on the center point corresponding to the prediction frame of the occluded target and the center point corresponding to the updated annotation frame of the occluded target to obtain a first norm value;
performing a norm calculation on the center point corresponding to the prediction frame of the occluded target and the center point corresponding to the annotation frame of the non-occluded target to obtain a second norm value; and
taking the difference between the first norm value and the second norm value as the occlusion loss of the occluded target.
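For readability (this is an editorial gloss, not part of the claim): writing $c_p$ for the center point of the occluded target's prediction frame, $c_u$ for the center point of its updated annotation frame and $c_n$ for the center point of the non-occluded target's annotation frame, claim 3 amounts to

$$L_{occ} = \lVert c_p - c_u \rVert - \lVert c_p - c_n \rVert,$$

where the claim does not fix which norm is used; the $L_2$ norm would be a natural assumption.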
4. The model training method according to claim 2, wherein the center point corresponding to the updated annotation frame of the occluded target is obtained by:
in the case where a plurality of overlapping portions exist, updating the annotation frame of the occluded target according to the largest of the plurality of overlapping portions, to obtain the updated annotation frame of the occluded target;
in the case where the updated annotation frame of the occluded target is rectangular, taking the center of the updated annotation frame of the occluded target as the center point corresponding to the updated annotation frame of the occluded target; and
in the case where the updated annotation frame of the occluded target is not rectangular, obtaining a triangle from the updated annotation frame of the occluded target via a diagonal, obtaining the inscribed circle of the triangle, and taking the center of the inscribed circle as the center point corresponding to the updated annotation frame of the occluded target.
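As a non-limiting illustration of the non-rectangular case: the center of a triangle's inscribed circle is its incenter, which can be computed directly from the three vertices. The helper below is a sketch; which diagonal cuts the triangle out of the updated annotation frame is left open by the claim, so the choice of vertices is an assumption.

```python
import math

def incenter(a, b, c):
    """Center of the inscribed circle of triangle abc.

    Each vertex is an (x, y) pair. The incenter is the average of the
    vertices weighted by the lengths of the opposite sides.
    """
    la = math.dist(b, c)  # side opposite vertex a
    lb = math.dist(a, c)  # side opposite vertex b
    lc = math.dist(a, b)  # side opposite vertex c
    p = la + lb + lc
    return ((la * a[0] + lb * b[0] + lc * c[0]) / p,
            (la * a[1] + lb * b[1] + lc * c[1]) / p)

# Example: right triangle with legs 3 and 4 -> incenter at (1.0, 1.0)
print(incenter((0, 0), (3, 0), (0, 4)))
```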
5. The method according to claim 1, wherein determining the annotation frame of the occluded target and the annotation frame of the non-occluded target according to the positional relationship between the annotation frames in the annotation data comprises:
acquiring a plurality of annotation frames corresponding to the training image, wherein each annotation frame corresponds to one target;
determining, in the case where one annotation frame intersects another, that an occluded target and a non-occluded target exist in the training image; and
determining the annotation frame of the occluded target and the annotation frame of the non-occluded target from among the intersecting annotation frames according to the distances between the annotation frames and the origin corresponding to the training image.
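A minimal sketch of this split, assuming axis-aligned boxes in (x1, y1, x2, y2) form and taking the image's top-left corner as the origin; treating the box whose center lies nearer the origin as the non-occluded one is an interpretation of the claim, not something it states explicitly.

```python
import math

def intersects(a, b):
    """Axis-aligned overlap test for boxes (x1, y1, x2, y2)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def split_occluded(boxes, origin=(0.0, 0.0)):
    """Split annotation boxes into (occluded, non_occluded) lists.

    For each intersecting pair, the box whose center is nearer the
    origin is treated as non-occluded and the other as occluded.
    """
    occluded, non_occluded = [], []
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            a, b = boxes[i], boxes[j]
            if not intersects(a, b):
                continue
            ca = ((a[0] + a[2]) / 2, (a[1] + a[3]) / 2)
            cb = ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)
            if math.dist(ca, origin) <= math.dist(cb, origin):
                non_occluded.append(a); occluded.append(b)
            else:
                non_occluded.append(b); occluded.append(a)
    return occluded, non_occluded
```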
6. The model training method according to any one of claims 1 to 5, wherein the pre-constructed target detection model performs feature extraction on the training image, predicts a heatmap, target widths and heights, and center point offsets based on the extracted feature map, and then obtains the prediction data from the heatmap, the target widths and heights, and the center point offsets.
7. An obstacle detection method, comprising:
acquiring an image to be detected around a vehicle;
inputting the image to be detected into a pre-trained target detection model to obtain a detection result for obstacles; the obstacles comprise occluded obstacles and non-occluded obstacles, and the pre-trained target detection model is obtained by training with the model training method according to any one of claims 1-6.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the model training method according to any one of claims 1-6 or the obstacle detection method according to claim 7 when executing the program.
9. A vehicle comprising the electronic device of claim 8.
10. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the model training method of any of claims 1-6 or the obstacle detection method of claim 7.
CN202310492486.4A 2023-04-28 2023-04-28 Model training method, obstacle detection method and related equipment Pending CN116664971A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310492486.4A CN116664971A (en) 2023-04-28 2023-04-28 Model training method, obstacle detection method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310492486.4A CN116664971A (en) 2023-04-28 2023-04-28 Model training method, obstacle detection method and related equipment

Publications (1)

Publication Number Publication Date
CN116664971A true CN116664971A (en) 2023-08-29

Family

ID=87723246

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310492486.4A Pending CN116664971A (en) 2023-04-28 2023-04-28 Model training method, obstacle detection method and related equipment

Country Status (1)

Country Link
CN (1) CN116664971A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117876977A (en) * 2024-01-11 2024-04-12 江苏昌兴阳智能家居有限公司 Target identification method based on monitoring video


Similar Documents

Publication Publication Date Title
US11273848B2 (en) Method, device and apparatus for generating a defensive driving strategy, and storage medium
EP3696790A1 (en) Cellular network-based auxiliary driving method and traffic control unit
US8175797B2 (en) Vehicle drive assist system
US8626398B2 (en) Collision avoidance system and method for a road vehicle and respective computer program product
CN109448439B (en) Vehicle safe driving method and device
JP6850324B2 (en) Obstacle distribution simulation method, device, terminal and program based on multi-model
EP4089659A1 (en) Map updating method, apparatus and device
CN112595337B (en) Obstacle avoidance path planning method and device, electronic device, vehicle and storage medium
EP3709231A1 (en) Vehicle track planning method, device, computer device and computer-readable storage medium
JP6988698B2 (en) Object recognition device
CN111213153A (en) Target object motion state detection method, device and storage medium
CN110341621B (en) Obstacle detection method and device
CN116664971A (en) Model training method, obstacle detection method and related equipment
US20210046940A1 (en) Identifying the Contour of a Vehicle on the Basis of Measurement Data from an Environment Sensor System
CN111723608A (en) Alarming method and device of driving assistance system and electronic equipment
CN112249007A (en) Vehicle danger alarm method and related equipment
CN111507126B (en) Alarm method and device of driving assistance system and electronic equipment
CN109887321B (en) Unmanned vehicle lane change safety judgment method and device and storage medium
CN110789520B (en) Driving control method and device and electronic equipment
CN115848412A (en) Perception fusion method, device, equipment and storage medium
CN116070903A (en) Risk determination method and device for passing through obstacle region and electronic equipment
JP4155252B2 (en) Vehicle detection device
CN117719500B (en) Vehicle collision detection method, device, electronic equipment and storage medium
CN110874549A (en) Target visual field determining method, system, computer device and storage medium
US20220262122A1 (en) Image collection apparatus and image collection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination