CN113469025A - Target detection method and device applied to vehicle-road cooperation, road side equipment and vehicle

Info

Publication number: CN113469025A
Application number: CN202110731377.4A
Granted publication: CN113469025B
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: loss function, value, training, determining, detection model
Inventor: 夏春龙
Current and original assignee: Apollo Intelligent Connectivity Beijing Technology Co Ltd
Legal status: Granted; Active
Application filed by Apollo Intelligent Connectivity Beijing Technology Co Ltd, with priority to CN202110731377.4A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a training method of a target detection model, a method, a device, equipment and a storage medium for detecting a target object, and relates to the field of artificial intelligence, in particular to the technical field of computer vision and the technical field of intelligent transportation. The specific implementation scheme of the training method of the target detection model is as follows: inputting sample data into a target detection model to obtain first detection data for the sample data, wherein the first detection data indicates a prediction box for a first target object in the sample data, and the sample data has a label indicating a real box for the first target object; determining the intersection ratio between the prediction box and the real box; determining a weight of a regression loss function in a predetermined loss function based on the intersection ratio; and training the target detection model based on the weight, the prediction box, the real box and the predetermined loss function.

Description

Target detection method and device applied to vehicle-road cooperation, road side equipment and vehicle
Technical Field
The present disclosure relates to the field of artificial intelligence, specifically to the field of computer vision technology and the field of intelligent traffic technology, and more specifically to a training method for a target detection model, and a method, an apparatus, a device, and a storage medium for detecting a target object.
Background
Computer vision technology is increasingly being used in industrial scenarios, such as the vehicle-road cooperation (V2X) application scenario. As the basis of computer vision technology, target detection can replace the traditional manual approach, which is time-consuming and labor-intensive, and therefore has a very broad application prospect.
In an anchor-based (anchor) target detection method, different loss coefficients are generally set for a foreground candidate frame and a background candidate frame, and a uniform loss coefficient is set for the foreground candidate frame.
Disclosure of Invention
Provided are a training method of a target detection model capable of improving model accuracy, and a method, an apparatus, a device, and a storage medium for detecting a target object.
According to a first aspect, there is provided a training method of a target detection model, comprising: inputting sample data into a target detection model to obtain first detection data for the sample data, wherein the first detection data indicates a prediction box for a first target object in the sample data, the sample data having a label indicating a real box for the first target object; determining the intersection ratio between the prediction box and the real box; determining a weight of a regression loss function in a predetermined loss function based on the intersection ratio; and training the target detection model based on the weight, the prediction box, the real box and the predetermined loss function.
According to a second aspect, there is provided a method of detecting a target object, comprising: taking an image to be detected as the input of a target detection model to obtain second detection data for the image to be detected; and obtaining the position of a second target object in the image to be detected based on the second detection data, wherein the target detection model is trained by adopting the aforementioned training method of the target detection model.
According to a third aspect, there is provided a training apparatus for a target detection model, comprising: a first data obtaining module for inputting sample data into the target detection model and obtaining first detection data for the sample data, wherein the first detection data indicates a prediction box for a first target object in the sample data, the sample data having a label indicating a real box for the first target object; an intersection ratio determining module for determining the intersection ratio between the prediction box and the real box; a weight determining module for determining a weight of a regression loss function in a predetermined loss function based on the intersection ratio; and a model training module for training the target detection model based on the weight, the prediction box, the real box and the predetermined loss function.
According to a fourth aspect, there is provided an apparatus for detecting a target object, comprising: a second data obtaining module for taking an image to be detected as the input of a target detection model and obtaining second detection data for the image to be detected; and a position obtaining module for obtaining a position of a second target object in the image to be detected based on the second detection data. The target detection model is trained by adopting the aforementioned training apparatus for the target detection model.
According to a fifth aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of training a target detection model and/or a method of detecting a target object provided by the present disclosure.
According to a sixth aspect, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of training a target detection model and/or the method of detecting a target object provided by the present disclosure.
According to a seventh aspect, a computer program product is provided, comprising a computer program which, when being executed by a processor, carries out the method of training a target detection model and/or the method of detecting a target object provided by the present disclosure.
According to an eighth aspect, there is provided a roadside apparatus including the foregoing electronic apparatus.
According to a ninth aspect, a cloud control platform is provided, which comprises the electronic device.
According to a tenth aspect, there is provided an autonomous vehicle comprising the aforementioned electronic device.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of an application scenario of a method and apparatus for training a target detection model and detecting a target object according to an embodiment of the present disclosure;
FIG. 2 is a flow diagram of a method of training a target detection model according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of training a target detection model according to an embodiment of the present disclosure;
FIG. 4 is a schematic flow chart diagram of a method of detecting a target object according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of a training apparatus for an object detection model according to an embodiment of the present disclosure;
fig. 6 is a block diagram of a structure of an apparatus for detecting a target object according to an embodiment of the present disclosure; and
fig. 7 is a block diagram of an electronic device for implementing a method of training a target detection model and/or a method of detecting a target object of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The present disclosure provides a training method of a target detection model, which includes a data obtaining stage, an intersection ratio determining stage, a weight determining stage and a model training stage. In the data obtaining stage, sample data is input into the target detection model, and first detection data for the sample data is obtained. In the intersection ratio determining stage, the intersection ratio between the prediction box and the real box is determined. In the weight determining stage, the weight of the regression loss function in the predetermined loss function is determined based on the intersection ratio. In the model training stage, the target detection model is trained based on the weight, the prediction box, the real box and the predetermined loss function.
An application scenario of the method and apparatus provided by the present disclosure will be described below with reference to fig. 1.
Fig. 1 is an application scenario diagram of a method and an apparatus for training a target detection model and detecting a target object according to an embodiment of the present disclosure.
As shown in fig. 1, the application scenario 100 of this embodiment may include, for example, an image capture device 110, a server 120, and a terminal device 130. The image capturing device 110 may upload the captured image to the server 120 through a network, and the terminal device 130 may be communicatively connected to the server 120 through the network. The network may include wired or wireless communication links.
According to an embodiment of the present disclosure, the image capturing device 110 may be, for example, a camera device, which may be disposed at the road side of the road 140 to capture an image reflecting the road condition. Alternatively, the image pickup apparatus may be provided in a vehicle to pick up an image of the surroundings of the vehicle while the vehicle is traveling on a road. The image capture device 110 may, for example, upload the captured image to the server 120 in real time, where the image is stored or processed by the server 120.
According to an embodiment of the present disclosure, the server 120 may store a large number of images, or may, for example, periodically store the images uploaded by the image capturing device 110 into a database. The server 120 may train a pre-constructed target detection model according to the labeled images, and may, for example, send the trained target detection model 150 to the terminal device in response to a request from the terminal device 130.
In an embodiment, the server 120 may be a background management server that provides support for the running of client applications. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
According to the embodiment of the disclosure, the terminal device 130 may interact with the server 120 via the network in response to a user operation, so as to obtain the trained target detection model and image from the server 120, detect a target object in the image by using the target detection model, and display a detection result to the user. The terminal device 130 may be an electronic device having a display screen and a processing function, including but not limited to a smart phone, a tablet computer, a laptop portable computer, a desktop computer, and the like.
In one embodiment, the terminal device 130 may be integrated into a vehicle control system of a vehicle, for example.
In an embodiment, the terminal device 130 may detect the target object by using a trained target detection model, for example, so as to obtain the position of the target object. The location of the target object may be applied, for example, in downstream applications. The downstream application may be, for example, an application of a vehicle-road cooperative task such as missing detection and travel track prediction.
In an embodiment, the server 120 may also perform target detection on the image uploaded by the image acquisition device 110 in real time by using a trained target detection model, and feed back the detection result to the terminal device 130, so that the terminal device 130 displays the detection result.
According to an embodiment of the present disclosure, the target detection model may detect the target object by using a keypoint-based detection method, or by using an anchor-based detection method. The anchor-based detection method adds a prior distribution: for example, a group of fixed reference boxes with different scales and different positions is preset to cover almost all positions and scales, and each reference box is responsible for detecting targets whose intersection ratio with it is larger than a threshold value. Models based on this method are relatively stable to train, and when the anchor distribution is relatively dense, the detection of small targets can be effectively improved. The keypoint-based detection method represents each target through a pair of corner points; it requires little computation and provides a larger and more flexible solution space. In industry, the anchor-based target detection method is generally used to construct the target detection model, so as to improve the detection rate of targets.
It should be noted that the training method of the target detection model provided in the present disclosure may be executed by the server 120, and the method of detecting the target object may be executed by the server 120 or the terminal device 130. Accordingly, the training apparatus of the target detection model provided by the present disclosure may be disposed in the server 120, and the apparatus for detecting the target object may be disposed in the server 120 or in the terminal device 130.
It should be understood that the types and numbers of image capturing devices, servers, and terminal devices in fig. 1 are merely illustrative. There may be any type and number of image capture devices, servers, and terminal devices, as desired for implementation.
The following describes in detail the training method of the target detection model provided by the present disclosure with reference to fig. 2 to 3 in conjunction with the application scenario described in fig. 1.
Fig. 2 is a flow chart diagram of a training method of an object detection model according to an embodiment of the present disclosure.
As shown in fig. 2, the training method 200 of the target detection model of this embodiment may include operations S210 to S240.
In operation S210, sample data is input into the target detection model, and first detection data for the sample data is obtained.
According to the embodiment of the disclosure, the target detection model can be constructed based on the anchor-based target detection method, for example. The target detection model may be a model based on a first-order (one-stage) processing method represented by You Only Look Once (YOLO), or a model based on a second-order (two-stage) processing method represented by the Region-based Convolutional Neural Network (R-CNN). The second-order processing method differs from the first-order processing method in that it first adopts a Region Proposal Network to distinguish foreground from background, so as to balance the distribution of positive and negative samples, and then performs fine target prediction on the effective foreground regions of interest (ROI). For example, a plurality of prior boxes (anchor boxes) may be preset, and the plurality of prior boxes may be divided into foreground boxes and background boxes via the region proposal network. If a first-order processing method is used, information indicating whether each of the plurality of prior boxes is a foreground box or a background box needs to be provided. A background box is a prior box whose intersection ratio with the real box is smaller than a preset threshold, and a foreground box is a prior box whose intersection ratio with the real box is greater than or equal to the preset threshold, as sketched below.
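As an illustration only, the following Python sketch labels each prior box as foreground or background by thresholding its best intersection ratio with the real boxes; the function names, the iou_fn helper and the 0.5 default threshold are assumptions for the sketch, not values taken from this disclosure.

```python
def split_priors(prior_boxes, gt_boxes, iou_fn, threshold=0.5):
    """Label each prior box as foreground (True) or background (False).

    prior_boxes and gt_boxes are sequences of (cx, cy, w, h) tuples;
    iou_fn(box_a, box_b) returns their intersection ratio. The 0.5
    threshold is an illustrative choice, not taken from the patent.
    """
    flags = []
    for prior in prior_boxes:
        # A prior box is foreground when its best IoU with any real
        # box reaches the threshold; otherwise it is a background box.
        best_iou = max((iou_fn(prior, gt) for gt in gt_boxes), default=0.0)
        flags.append(best_iou >= threshold)
    return flags
```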
In an embodiment, the target detection model may be used only to predict the position of a target object in the sample data. Alternatively, the target detection model may be used not only to predict the position of the target object in the sample data, but also to predict the class of the target object. For example, the classes of the target object may include: automobiles, bicycles, motorcycles, pedestrians, and the like. The classes of the target object may be set according to the application scenario, which is not limited in this disclosure.
According to an embodiment of the present disclosure, the sample data has a label indicating a real box for the first target object in the sample image. Specifically, the label has the coordinate values of the center point of the real box, and the height and width of the real box. It is to be understood that this label is provided by way of example only to facilitate an understanding of the present disclosure, and is not intended to be limiting. In an embodiment, the label may also indicate the real box by an offset with respect to an object.
The embodiment may input a sample image in the sample data into the target detection model, and output the first detection data by the target detection model. The first detection data includes a prediction box for the first target object. Specifically, the position information of the prediction frame, for example, the prediction coordinate value of the center point of the prediction frame, the prediction height and the prediction width of the prediction frame may be included.
It should be understood that each of the coordinate values may be a coordinate value in a two-dimensional coordinate system established with the center point of the sample image as an origin, or may be a coordinate value in a two-dimensional coordinate system established with the point at the upper left corner of the sample image as an origin.
In operation S220, an intersection ratio between the prediction box and the real box is determined.
For example, an overlapping portion where the prediction frame and the real frame overlap with each other may be determined based on the position information of the prediction frame and the position information of the real frame, and an area of the overlapping portion may be calculated as an intersection area between the prediction frame and the real frame. The sum of the area of the prediction box and the area of the real box is then determined as the area sum, and the difference between the area sum and the area of the overlap portion is determined as the union area. Finally, the ratio of the Intersection area to the Union area is used as the Intersection-over-Union ratio between the predicted box and the real box (IoU). For example, if the prediction frame and the real frame are completely overlapped, the intersection ratio takes a value of 1. For example, the intersection ratio can be expressed by the following formula:
IoU_i = area(b_i ∩ b_i*) / area(b_i ∪ b_i*)

wherein IoU_i is the intersection ratio; when the sample data are multiple, b_i is the prediction box of the ith sample datum among the plurality of sample data, and b_i* is the real box of the ith sample datum.
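As a hedged illustration, the intersection ratio defined above can be computed as follows; boxes are assumed to be given as center coordinates plus width and height, matching the label format described earlier.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (cx, cy, w, h)."""
    # Convert center/size form to corner coordinates.
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    # Overlap rectangle; its area is zero when the boxes do not intersect.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0
```

If the prediction box and the real box coincide completely, inter equals union and the function returns 1, matching the property noted above.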
In operation S230, a weight of a regression loss function among the predetermined loss functions is determined based on the intersection ratio.
According to an embodiment of the present disclosure, the regression loss function may be, for example, a mean square error function, a mean absolute error function, or a smoothed mean absolute error function (Huber loss function), etc. The weight of the regression loss function in this embodiment may be represented by a weight function with an intersection ratio as an argument.
In an embodiment, the regression loss function may be, for example, a Smooth L1 function, whose value is 0 in the case where the prior box corresponding to the prediction box is a background box, and may be represented by the following formula in the case where the prior box corresponding to the prediction box is a foreground box:

L_reg = smooth_L1(t_i − t_i*), where smooth_L1(x) = 0.5x² if |x| < 1 and |x| − 0.5 otherwise, applied elementwise and summed,

wherein L_reg is the value of the regression loss function, t_i is a vector representing the position of the prediction box in the ith sample datum, and t_i* is a vector representing the position of the real box corresponding to the prediction box in the ith sample datum. For example, t_i and t_i* may respectively adopt the following forms:

t_i = ((x_i − x_pi)/w_pi, (y_i − y_pi)/h_pi, log(w_i/w_pi), log(h_i/h_pi))
t_i* = ((x_i* − x_pi)/w_pi, (y_i* − y_pi)/h_pi, log(w_i*/w_pi), log(h_i*/h_pi))

wherein (x_i, y_i) are the coordinate values of the center point of the prediction box in the ith sample datum, and w_i, h_i are respectively the width and height of that prediction box; (x_i*, y_i*) are the coordinate values of the center point of the real box corresponding to the prediction box, and w_i*, h_i* are respectively its width and height; (x_pi, y_pi) are the coordinate values of the center point of the prior box corresponding to the prediction box, and w_pi, h_pi are respectively its width and height. It is understood that this regression loss function follows the same principle as the box-regression loss in the Faster R-CNN model, and is not described again here. The prior box corresponding to a prediction box is the prior box whose adjustment yields that prediction box.
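The Smooth L1 loss and box parameterization above can be sketched as follows; this is a minimal illustration assuming the Faster R-CNN-style encoding referenced in the text, with illustrative function names.

```python
import math

def smooth_l1(x):
    """Smooth L1 applied to one component of the regression residual."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5

def encode(box, prior):
    """Encode a (cx, cy, w, h) box relative to its prior box, following
    the Faster R-CNN parameterization referenced above."""
    (x, y, w, h), (xp, yp, wp, hp) = box, prior
    return [(x - xp) / wp, (y - yp) / hp,
            math.log(w / wp), math.log(h / hp)]

def regression_loss(pred_box, gt_box, prior):
    """Smooth L1 distance between the encoded prediction t_i and the
    encoded real box t_i*, summed over the four components."""
    t = encode(pred_box, prior)
    t_star = encode(gt_box, prior)
    return sum(smooth_l1(a - b) for a, b in zip(t, t_star))
```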
For example, the weight of the regression loss function may be positively correlated with the intersection ratio, allowing the model to focus its learning on samples it already predicts well. Alternatively, the weight may be negatively correlated with the intersection ratio, so that the model imposes a larger loss penalty on samples with low prediction accuracy, which improves the detection accuracy of the trained model for various types of targets.
According to embodiments of the present disclosure, when the weight is negatively correlated with the intersection ratio, the weight may, for example, be inversely proportional to the intersection ratio. Alternatively, the weight may be a power function of the intersection ratio, with the intersection ratio as the base and a negative number as the exponent. For example, the negative correlation between the intersection ratio and the weight can be represented by the following formula:

w(x) = x^γ

wherein w(x) is the weight, x is the intersection ratio, and γ is a constant less than zero. The value of the constant can be set according to actual requirements, which is not limited by the disclosure. For example, the constant may be −1.
According to an embodiment of the present disclosure, in the case where the weight and the intersection ratio are negatively correlated, when determining the weight of the regression loss function among the predetermined loss functions based on the intersection ratio, it may first be determined whether the intersection ratio is zero. If it is zero, the weight of the regression loss function is determined to be zero; otherwise, the weight is determined based on the negative correlation between the intersection ratio and the weight. In this way, the situation where the denominator of the weight function takes a value of zero (when the weight is inversely proportional to the intersection ratio) because the intersection ratio is zero can be avoided. This is reasonable because, if the intersection ratio between the prediction box and the real box is zero, the prior box on which the prediction box is based is usually a background box.
According to an embodiment of the present disclosure, in the case where the weight and the intersection ratio are negatively correlated, when determining the weight of the regression loss function among the predetermined loss functions based on the intersection ratio, it may alternatively be determined whether the intersection ratio between the prior box on which the prediction box is based and the real box is greater than or equal to a predetermined threshold. If it is greater than or equal to the predetermined threshold, the weight is determined based on the negative correlation between the intersection ratio and the weight; if it is smaller than the predetermined threshold, the weight of the regression loss function is determined to be zero.
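Both strategies for a negatively correlated weight — returning zero for a zero intersection ratio, or zeroing the weight below a predetermined threshold — can be sketched in one helper; gamma = −1 and the default threshold here are illustrative assumptions.

```python
def regression_weight(iou_value, gamma=-1.0, min_iou=0.0):
    """Weight of the regression loss as a function of the intersection
    ratio: zero for (near-)background boxes, iou_value ** gamma
    otherwise, so a lower ratio yields a larger loss penalty.
    gamma < 0; both defaults are illustrative, not patent values."""
    if iou_value <= min_iou:
        return 0.0  # background box: excluded from the regression loss
    return iou_value ** gamma
```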
In operation S240, the target detection model is trained based on the weights, the prediction box, the real box, and the predetermined loss function.
According to the embodiment of the present disclosure, the value of the predetermined loss function may be determined according to the weight, the position information of the prediction box, and the position information of the real box. Parameters in the target detection model are then adjusted by adopting a gradient descent algorithm or a back propagation algorithm, so as to train the target detection model.
For example, when the target detection model is only used for predicting the position information of the target, the embodiment may first determine the weighted value of the regression loss function based on the weight, the prediction box and the real box, and then adjust the parameters in the model based on the weighted value by adopting the gradient descent algorithm or the back propagation algorithm. When determining the weighted value, the value of the regression loss function may be determined according to the prediction box and the real box, and the product of this value and the weight may then be used as the weighted value. In the case where the regression loss function is the Smooth L1 loss function, the background boxes and foreground boxes may be determined according to the intersection ratio between the prior box and the real box, and the weighted value of the regression loss function is determined according to the prediction box and the real box only for prediction boxes obtained from foreground boxes.
According to the embodiment of the disclosure, when N sample data are used in a single training process, the weighting value can be expressed by the following formula:
L_reg = (1/N) Σ_{i=1..N} w(IoU_i) · smooth_L1(t_i − t_i*)

wherein w(IoU_i) is the weight determined based on the intersection ratio between the prediction box and the real box, and smooth_L1 is the aforementioned Smooth L1 function.
It will be appreciated that multiple prediction boxes may be obtained for each sample datum, and this embodiment may derive a weight, and hence a weighted value, for each prediction box. The sum of the weighted values obtained for the plurality of prediction boxes may then be used as the weighted value for the sample datum. A sketch of this aggregation is given below.
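Putting the pieces together, here is a hedged sketch of the weighted regression value over N sample data, reusing the iou, regression_loss and regression_weight helpers sketched earlier; the 1/N averaging is an assumption of this sketch.

```python
def weighted_regression_value(samples, gamma=-1.0):
    """Weighted regression loss over N samples.

    samples: iterable of (pred_box, gt_box, prior_box) triples, each
    box as (cx, cy, w, h). Relies on the iou, regression_loss and
    regression_weight sketches above; 1/N averaging is an assumption.
    """
    total, n = 0.0, 0
    for pred_box, gt_box, prior_box in samples:
        weight = regression_weight(iou(pred_box, gt_box), gamma)
        total += weight * regression_loss(pred_box, gt_box, prior_box)
        n += 1
    return total / n if n else 0.0
```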
In summary, in the embodiment of the present disclosure, the weight of the regression loss function is determined according to the intersection ratio between the prediction box and the real box, so that the target detection model obtained through training can meet actual requirements more accurately.
Furthermore, in the embodiment of the present disclosure, by making the weight of the regression loss function negatively correlated with the intersection ratio, the regression loss imposes a larger penalty on prediction boxes that differ more from the real box, so that the training of the target detection model can take into account the different regression difficulties of different prior boxes and effectively balance the regression difficulties of different targets, thereby improving the performance of the trained target detection model and the training efficiency.
FIG. 3 is a schematic diagram of training a target detection model according to an embodiment of the present disclosure.
According to an embodiment of the present disclosure, the target detection model may also be used for class prediction of the object in a prediction box, for example. Accordingly, the label of the aforementioned sample data may also indicate the true category of the first target object in the sample image, and the first detection data output by the aforementioned target detection model may also indicate a prediction category of the first target object. For example, the first detection data may include probabilities of the first target object with respect to a plurality of predetermined categories, and the category corresponding to the maximum probability value may be used as the prediction category of the first target object.
In this embodiment, the predetermined loss function may further include a classification loss function, whereby both classification loss and regression loss are considered when training the object detection model.
As shown in fig. 3, in this embodiment 300, when training the target detection model, the sample data 310 may be used as an input of the target detection model 320, and the target detection model 320 outputs the first detection data 330. The first detection data may be a one-dimensional vector including position information of a prediction box and a prediction probability that an object in the prediction box is in each of a plurality of predetermined classes. The prediction box 331 is obtained from the position information of the prediction box, and the prediction type 332 of the object in the prediction box is obtained from the probability. After the prediction box 331 is obtained, the intersection ratio between the real box 311 indicated by the label in the sample data 310 and the prediction box 331 may be determined by a similar method as described above, and the weighted value 340 of the regression loss function is obtained based on the weight determined according to the intersection ratio. Meanwhile, the value of the classification loss function may be determined as the first value 350 based on the prediction class 332 and the true class 312.
After the first value 350 and the weighted value 340 are obtained, a value of the predetermined loss function may be determined based on the weighted value 340 and the first value 350, and taken as a second value 360. The target detection model 320 is then trained based on the second value 360.
The classification loss function may be, for example, a cross entropy loss function or a hinge loss function. Taking the cross entropy loss function as an example, the value of the classification loss function can be represented by the following formula:

L_cls = −(1/N) Σ_{i=1..N} Σ_c p_i*(c) · log(p_i(c))

wherein L_cls is the value of the classification loss function, and c is any one of the plurality of predetermined categories. p_i*(c) is the true probability of the ith sample datum among the N sample data for the predetermined category c: if the category indicated by the label of the ith sample datum is category c, p_i*(c) is 1; otherwise p_i*(c) is 0. p_i(c) is the predicted probability that the category of the object in the prediction box is category c.
According to an embodiment of the present disclosure, the sum of the weighted value 340 and the first value 350 may be taken as the value of the predetermined loss function. Alternatively, weights may be assigned to the regression loss function and the classification loss function in advance, and the embodiment may determine a weighted sum of the weighted value and the first value based on a first predetermined weight for the regression loss function and a second predetermined weight for the classification loss function, and use the weighted sum as the value of the predetermined loss function. For example, the predetermined loss function may be expressed using the following formula:
L = αL_cls + βL_reg

wherein L is the predetermined loss function, L_cls is the classification loss function, and L_reg is the weighted regression loss function, i.e. the product of the weight determined based on the intersection ratio and the regression loss function (the value of L_reg is the aforementioned weighted value). α is the second predetermined weight and β is the first predetermined weight.
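As a minimal sketch of the combined objective L = αL_cls + βL_reg, assuming one-hot true labels as in the cross-entropy formula above; the default α and β are placeholders, since the disclosure leaves them as predetermined weights.

```python
import math

def classification_value(pred_probs, true_classes):
    """Cross-entropy L_cls averaged over N samples. pred_probs[i][c] is
    the predicted probability of class c for sample i; true_classes[i]
    is the labeled class index, i.e. a one-hot p_i*(c)."""
    n = len(true_classes)
    return -sum(math.log(pred_probs[i][true_classes[i]])
                for i in range(n)) / n

def total_loss(cls_value, weighted_reg_value, alpha=1.0, beta=1.0):
    """L = alpha * L_cls + beta * L_reg, with L_reg already weighted by
    the IoU-based weight; alpha and beta defaults are illustrative."""
    return alpha * cls_value + beta * weighted_reg_value
```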
According to the embodiment, the target detection model is trained by comprehensively considering regression loss and classification loss, so that the target detection model can learn the position characteristics and the class characteristics at the same time, the accuracy of predicting the position of the target object can be improved, and the accuracy of predicting the class of the target object can be improved.
Based on the training method of the target detection model described above, the present disclosure also provides a method of detecting a target object. The method of detecting the target object will be described below with reference to fig. 4.
Fig. 4 is a flowchart illustrating a method of detecting a target object according to an embodiment of the present disclosure.
As shown in fig. 4, the method 400 of detecting a target object of this embodiment may include operations S410 to S420.
In operation S410, second detection data for the image to be detected is obtained with the image to be detected as an input of the target detection model. The method of obtaining the second detection data is similar to the aforementioned method of obtaining the first detection data, except that the image to be detected input to the object detection model does not have a label. The target detection model is obtained by training by adopting the training method of the target detection model.
In operation S420, a position of a second target object in the image to be detected is obtained based on the second detection data.
According to an embodiment of the present disclosure, the second detection data may include position information of the prediction box, similar to the first detection data, and the position of the second target object may be obtained based on the position information. Specifically, the region surrounded by the prediction frame may be the region where the second target object is located.
According to an embodiment of the present disclosure, the target detection model may also be used to predict a class of the second target object. The method of detecting a target object of this embodiment may further obtain a category of the second target object based on the second detection data. Specifically, a maximum probability among the probabilities for the plurality of predetermined classes in the second detection data may be determined, and a class corresponding to the maximum probability may be set as the class of the second target object.
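A hedged sketch of this decoding step follows; the flat output layout (four box values followed by per-class probabilities) matches the one-dimensional vector described for the first detection data, but is an assumption here.

```python
def decode_detection(detection_data, class_names):
    """Recover the box position and the most probable class from one
    detection vector laid out as (cx, cy, w, h, p_1, ..., p_K)."""
    cx, cy, w, h = detection_data[:4]
    probs = detection_data[4:]
    # The class with the maximum probability is taken as the result.
    best = max(range(len(probs)), key=probs.__getitem__)
    return (cx, cy, w, h), class_names[best], probs[best]
```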
Based on the training method of the target detection model described above, the present disclosure also provides a training apparatus of the target detection model. The apparatus will be described in detail below with reference to fig. 5.
Fig. 5 is a block diagram of a structure of a training apparatus for an object detection model according to an embodiment of the present disclosure.
As shown in fig. 5, the training apparatus 500 of the object detection model of this embodiment may include a first data obtaining module 510, an intersection ratio determining module 520, a weight determining module 530, and a model training module 540.
The first data obtaining module 510 is configured to input sample data into the target detection model, and obtain first detection data for the sample data. Wherein the first detection data indicates a predicted box for the first target object in sample data, the sample data having a tag indicating a real box for the first target object. In an embodiment, the first data obtaining module 510 may be configured to perform the operation S210 described above, for example, and is not described herein again.
The intersection ratio determination module 520 is used for determining the intersection ratio between the prediction box and the real box. In an embodiment, the intersection ratio determining module 520 may be configured to perform the operation S220 described above, for example, and is not described herein again.
The weight determination module 530 is configured to determine a weight of a regression loss function in the predetermined loss functions based on the intersection ratio. In an embodiment, the weight determining module 530 may be configured to perform the operation S230 described above, for example, and is not described herein again.
The model training module 540 is configured to train the target detection model based on the weights, the prediction box, the real box, and the predetermined loss function. In an embodiment, the model training module 540 may be configured to perform the operation S240 described above, which is not described herein again.
According to an embodiment of the present disclosure, the weight determination module may include a first determination submodule and a second determination submodule. The first determination submodule is configured to determine the weight based on a negative correlation between the intersection ratio and the weight if the intersection ratio is greater than zero. The second determination submodule is configured to determine that the weight of the regression loss function is zero if the intersection ratio is equal to zero.
According to an embodiment of the present disclosure, the weight and the intersection ratio are negatively correlated with each other.
According to an embodiment of the present disclosure, the negative correlation between the weight and the intersection ratio is represented by the following formula:

w(x) = x^γ

wherein w(x) is the weight, x is the intersection ratio, and γ is a constant less than zero.
According to an embodiment of the present disclosure, the model training module includes a weight value determination sub-module and a training sub-module. The weighted value determining submodule is used for determining the weighted value of the regression loss function based on the weight, the prediction frame and the real frame. And the training submodule is used for training the target detection model based on the weighted value.
According to an embodiment of the present disclosure, the predetermined loss function further comprises a classification loss function; the label further indicates a true class of the target object; the first detection data further indicates a prediction category of the first target object; and the model training module further comprises a value determination submodule for determining a value of the classification loss function as a first value based on the prediction category and the real category. The training submodule may include a loss determining unit and a training unit, wherein the loss determining unit is configured to determine, based on the weighted value and the first value, a value of the predetermined loss function as a second value, and the training unit is configured to train the target detection model based on the second value.
According to an embodiment of the present disclosure, the loss determining unit is configured to determine, as the value of the predetermined loss function, a weighted sum of the weighted value and the first value based on a first predetermined weight for the regression loss function and a second predetermined weight for the classification loss function.
Based on the method for detecting the target object, the disclosure also provides a device for detecting the target object. The apparatus for detecting a target object will be described in detail below with reference to fig. 6.
Fig. 6 is a block diagram of a structure of an apparatus for detecting a target object according to an embodiment of the present disclosure.
As shown in fig. 6, the apparatus 600 for detecting a target object of this embodiment may include a second data obtaining module 610 and a position determining module 620.
The second data obtaining module 610 is configured to obtain second detection data for the image to be detected by using the image to be detected as an input of the target detection model. In an embodiment, the second data obtaining module 610 is configured to perform the operation S410 described above, which is not described herein again.
The position determining module 620 is configured to obtain a position of a second target object in the image to be detected based on the second detection data. In an embodiment, the position determining module 620 is configured to perform the operation S420 described above, which is not described herein again.
According to an embodiment of the present disclosure, the apparatus 600 for detecting a target object may further include a category determining module, configured to obtain a category of the second target object based on the second detection data.
In the technical scheme of the disclosure, the acquisition, storage, application and the like of the personal information of the related user all accord with the regulations of related laws and regulations, and do not violate the good customs of the public order.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
Fig. 7 shows a block diagram of an electronic device for implementing the training method of the target detection model and/or the method of detecting the target object of an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the device 700 comprises a computing unit 701, which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM)702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 can also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in the device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be any of various general purpose and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 701 performs the respective methods and processes described above, such as the training method of the target detection model and/or the method of detecting the target object. For example, in some embodiments, the training method of the target detection model and/or the method of detecting the target object may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the training method of the target detection model and/or the method of detecting the target object described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured by any other suitable means (e.g. by means of firmware) to perform the training method of the target detection model and/or the method of detecting the target object.
Based on the foregoing electronic device, the present disclosure also provides a roadside device including the foregoing electronic device. For example, the roadside apparatus may include a communication section and the like in addition to the electronic apparatus. The electronic device may be integrated with the communication unit, or may be provided separately from the communication unit. In one embodiment, the electronic device may acquire data, such as pictures and videos, from a sensing device (e.g., a roadside camera) for image video processing and data computation.
According to the embodiment of the disclosure, the electronic device itself may also have a perception data acquisition function and a communication function, for example, the electronic device may be an AI camera, and then the electronic device may directly perform image video processing and data calculation based on the acquired perception data.
Based on the electronic equipment, the cloud control platform is further provided by the disclosure and comprises the electronic equipment. For example, the cloud control platform may perform processing at a cloud end, and an electronic device included in the cloud control platform may acquire data, such as pictures and videos, of a sensing device (such as a roadside camera), so as to perform image video processing and data calculation. The cloud control platform can also be called a vehicle road cooperative management platform, an edge computing platform, a cloud computing platform, a central system, a cloud server and the like.
Based on the electronic equipment, the disclosure also provides an automatic driving vehicle, and the electronic equipment can be integrated in a vehicle control system of the automatic driving vehicle.
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (24)

1. A method of training an object detection model, comprising:
inputting sample data into a target detection model to obtain first detection data for the sample data; wherein the first detection data indicates a prediction box for a first target object in the sample data, the sample data having a label indicating a real box for the first target object;
determining an intersection ratio between the prediction box and the real box;
determining a weight of a regression loss function in predetermined loss functions based on the intersection ratio; and
training the target detection model based on the weights, the prediction box, the real box, and the predetermined loss function.
2. The method of claim 1, wherein the weight and the intersection ratio are negatively correlated with each other.
3. The method of claim 2, wherein determining the weight of the regression loss function in the predetermined loss function based on the intersection ratio comprises:
determining the weight based on a negative correlation between the intersection ratio and the weight in the case that the intersection ratio is greater than zero; and
determining the weight of the regression loss function to be zero in the case that the intersection ratio is equal to zero.
4. The method of claim 3, wherein the negative correlation between the weight and the intersection ratio is represented by the following formula:

ω = x^γ

wherein ω is the weight, x is the intersection ratio, and γ is a constant less than zero.
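A sketch of this weighting, assuming the power-law form ω = x^γ reconstructed above (the original equation images were lost in extraction; γ < 0 and the zero case follow claims 3-4, while the example value of γ is purely illustrative):

```python
# Weight of the regression loss as a function of the intersection ratio x:
# w = x ** gamma for x > 0 (gamma < 0 yields the claimed negative correlation),
# and w = 0 when x == 0 (claim 3). gamma = -0.5 is an illustrative choice.
def regression_loss_weight(x, gamma=-0.5):
    if gamma >= 0:
        raise ValueError("gamma must be less than zero")
    return x ** gamma if x > 0 else 0.0
```

Because γ < 0, a poorly overlapping prediction (small positive intersection ratio) receives a larger weight and thus contributes more to the regression loss, while a fully disjoint prediction (intersection ratio of zero) is excluded from regression altogether.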
5. The method of any of claims 1-4, wherein training the target detection model comprises:
determining a weighted value of the regression loss function based on the weight, the prediction box, and the real box; and
training the target detection model based on the weighted value.
6. The method of claim 5, wherein the predetermined loss function further comprises a classification loss function; the tag further indicates a real category of the first target object; and the first detection data further indicates a prediction category of the first target object;
training the target detection model further comprises: determining a value of the classification loss function as a first value based on the prediction category and the real category; and
training the target detection model based on the weighted value comprises:
determining a value of the predetermined loss function as a second value based on the weighted value and the first value; and
training the target detection model based on the second value.
7. The method of claim 6, wherein determining the value of the predetermined loss function comprises:
determining a weighted sum of the weighted value and the first value as a value of the predetermined loss function based on a first predetermined weight for the regression loss function and a second predetermined weight for the classification loss function.
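Claims 5-7 combine the IoU-weighted regression term with the classification term. A sketch in PyTorch, where smooth-L1 and cross-entropy are assumed concrete choices for the regression and classification losses (the claims do not fix them), and alpha and beta stand in for the first and second predetermined weights:

```python
import torch
import torch.nn.functional as F

def predetermined_loss(pred_box, real_box, pred_logits, real_class, w,
                       alpha=1.0, beta=1.0):
    reg = F.smooth_l1_loss(pred_box, real_box)      # regression loss over boxes
    weighted_reg = w * reg                          # claim 5: the weighted value
    cls = F.cross_entropy(pred_logits, real_class)  # claim 6: the first value
    return alpha * weighted_reg + beta * cls        # claim 7: the second value
```

Training then backpropagates the returned second value; a prediction with zero overlap contributes only through the classification term, since its regression weight w is zero.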
8. A method of detecting a target object, comprising:
taking an image to be detected as an input of a target detection model to obtain second detection data for the image to be detected; and
obtaining a position of a second target object in the image to be detected based on the second detection data,
wherein the target detection model is trained using the method of any one of claims 1-7.
9. The method of claim 8, further comprising:
obtaining a category of the second target object based on the second detection data.
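A sketch of the detection method of claims 8-9, assuming a model that maps an image tensor to a dict with "boxes" and "labels" entries (this torchvision-style output format is an assumption; the claims only require that positions and categories be recoverable from the second detection data):

```python
import torch

def detect(model, image):
    model.eval()
    with torch.no_grad():
        detections = model(image)        # second detection data
    positions = detections["boxes"]      # claim 8: position of each second target object
    categories = detections["labels"]    # claim 9: category of each second target object
    return positions, categories
```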
10. A training apparatus for a target detection model, comprising:
a first data obtaining module for inputting sample data into the target detection model to obtain first detection data for the sample data, wherein the first detection data indicates a prediction box for a first target object in the sample data, and the sample data has a tag indicating a real box for the first target object;
an intersection ratio determination module for determining an intersection ratio between the prediction box and the real box;
a weight determination module for determining a weight of a regression loss function in a predetermined loss function based on the intersection ratio; and
a model training module for training the target detection model based on the weight, the prediction box, the real box, and the predetermined loss function.
11. The apparatus of claim 10, wherein the weight and the intersection ratio are negatively correlated with each other.
12. The apparatus of claim 11, wherein the weight determination module comprises:
a first determination submodule for determining the weight based on a negative correlation between the intersection ratio and the weight in the case that the intersection ratio is greater than zero; and
a second determination submodule for determining the weight of the regression loss function to be zero in the case that the intersection ratio is equal to zero.
13. The apparatus of claim 12, wherein the negative correlation between the weight and the intersection ratio is represented by the following formula:

ω = x^γ

wherein ω is the weight, x is the intersection ratio, and γ is a constant less than zero.
14. The apparatus of any of claims 10-13, wherein the model training module comprises:
a weighted value determination submodule for determining a weighted value of the regression loss function based on the weight, the prediction box, and the real box; and
a training submodule for training the target detection model based on the weighted value.
15. The apparatus of claim 14, wherein the predetermined loss function further comprises a classification loss function; the tag further indicates a real category of the first target object; and the first detection data further indicates a prediction category of the first target object;
the model training module further comprises a value determination submodule for determining a value of the classification loss function as a first value based on the prediction category and the real category; and
the training submodule includes:
a loss determination unit for determining a value of the predetermined loss function as a second value based on the weighted value and the first value; and
a training unit for training the target detection model based on the second value.
16. The apparatus of claim 15, wherein the loss determination unit is to:
determining a weighted sum of the weighted value and the first value as a value of the predetermined loss function based on a first predetermined weight for the regression loss function and a second predetermined weight for the classification loss function.
17. An apparatus for detecting a target object, comprising:
a second data obtaining module for taking an image to be detected as an input of a target detection model to obtain second detection data for the image to be detected; and
a position determination module for obtaining a position of a second target object in the image to be detected based on the second detection data,
wherein the target detection model is trained using the apparatus of any one of claims 10-16.
18. The apparatus of claim 17, further comprising:
a category determination module for obtaining a category of the second target object based on the second detection data.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-9.
21. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1-9.
22. A roadside apparatus comprising the electronic device of claim 19.
23. A cloud control platform comprising the electronic device of claim 19.
24. An autonomous vehicle comprising the electronic device of claim 19.
CN202110731377.4A 2021-06-29 2021-06-29 Target detection method and device applied to vehicle-road cooperation, road side equipment and vehicle Active CN113469025B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110731377.4A CN113469025B (en) 2021-06-29 2021-06-29 Target detection method and device applied to vehicle-road cooperation, road side equipment and vehicle

Publications (2)

Publication Number Publication Date
CN113469025A 2021-10-01
CN113469025B 2024-05-31

Family

ID=77874173

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110731377.4A Active CN113469025B (en) 2021-06-29 2021-06-29 Target detection method and device applied to vehicle-road cooperation, road side equipment and vehicle

Country Status (1)

Country Link
CN (1) CN113469025B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114399730A (en) * 2021-12-24 2022-04-26 阿波罗智联(北京)科技有限公司 Traffic target detection model training and target detection method and edge computing equipment
CN114565916A (en) * 2022-02-07 2022-05-31 苏州浪潮智能科技有限公司 Target detection model training method, target detection method and electronic equipment
CN114882321A (en) * 2022-05-30 2022-08-09 北京百度网讯科技有限公司 Deep learning model training method, target object detection method and device
CN114997153A (en) * 2022-04-25 2022-09-02 有米科技股份有限公司 Text recognition method and device based on cross-over-parallel ratio loss optimization model
CN116883677A (en) * 2023-06-12 2023-10-13 北京百度网讯科技有限公司 Target detection method, target detection model training method, device and vehicle

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200081095A1 (en) * 2018-09-07 2020-03-12 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for generating object detection box, device, storage medium, and vehicle
US20210042580A1 (en) * 2018-10-10 2021-02-11 Tencent Technology (Shenzhen) Company Limited Model training method and apparatus for image recognition, network device, and storage medium
US20210082181A1 (en) * 2019-06-17 2021-03-18 Sensetime Group Limited Method and apparatus for object detection, intelligent driving method and device, and storage medium
CN111062413A (en) * 2019-11-08 2020-04-24 深兰科技(上海)有限公司 Road target detection method and device, electronic equipment and storage medium
CN111091105A (en) * 2019-12-23 2020-05-01 郑州轻工业大学 Remote sensing image target detection method based on new frame regression loss function
CN111242984A (en) * 2020-02-13 2020-06-05 珠海安联锐视科技股份有限公司 Target tracking method based on moving head camera
CN111881764A (en) * 2020-07-01 2020-11-03 深圳力维智联技术有限公司 Target detection method and device, electronic equipment and storage medium
CN112329873A (en) * 2020-11-12 2021-02-05 苏州挚途科技有限公司 Training method of target detection model, target detection method and device
CN112528995A (en) * 2020-12-22 2021-03-19 北京百度网讯科技有限公司 Method for training target detection model, target detection method and device
CN112613462A (en) * 2020-12-29 2021-04-06 安徽大学 Weighted intersection ratio method
CN112801164A (en) * 2021-01-22 2021-05-14 北京百度网讯科技有限公司 Training method, device and equipment of target detection model and storage medium
CN112906502A (en) * 2021-01-29 2021-06-04 北京百度网讯科技有限公司 Training method, device and equipment of target detection model and storage medium
CN112966587A (en) * 2021-03-02 2021-06-15 北京百度网讯科技有限公司 Training method of target detection model, target detection method and related equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xu Yiliu et al., "Timber transport vehicle detection based on improved Faster RCNN", Journal of Computer Applications, pages 209-214 *

Similar Documents

Publication Publication Date Title
CN108304775B (en) Remote sensing image recognition method and device, storage medium and electronic equipment
CN113469025B (en) Target detection method and device applied to vehicle-road cooperation, road side equipment and vehicle
CN109635685B (en) Target object 3D detection method, device, medium and equipment
WO2020098708A1 (en) Lane line detection method and apparatus, driving control method and apparatus, and electronic device
US20190073524A1 (en) Method and apparatus for predicting walking behaviors, data processing apparatus, and electronic device
CN113920307A (en) Model training method, device, equipment, storage medium and image detection method
CN113420682A (en) Target detection method and device in vehicle-road cooperation and road side equipment
CN112863187B (en) Detection method of perception model, electronic equipment, road side equipment and cloud control platform
CN112560684A (en) Lane line detection method, lane line detection device, electronic apparatus, storage medium, and vehicle
CN115294332B (en) Image processing method, device, equipment and storage medium
CN114882321A (en) Deep learning model training method, target object detection method and device
CN115719436A (en) Model training method, target detection method, device, equipment and storage medium
CN113989721A (en) Target detection method and training method and device of target detection model
CN113688920A (en) Model training and target detection method and device, electronic equipment and road side equipment
CN113706705B (en) Image processing method, device, equipment and storage medium for high-precision map
CN113643260A (en) Method, apparatus, device, medium and product for detecting image quality
CN117372928A (en) Video target detection method and device and related equipment
CN113627298A (en) Training method of target detection model and method and device for detecting target object
CN116468967B (en) Sample image screening method and device, electronic equipment and storage medium
CN115273148B (en) Pedestrian re-recognition model training method and device, electronic equipment and storage medium
CN115761698A (en) Target detection method, device, equipment and storage medium
CN115861809A (en) Rod detection and training method and device for model thereof, electronic equipment and medium
CN112749707A (en) Method, apparatus, and medium for object segmentation using neural networks
CN113469249B (en) Image classification model training method, classification method, road side equipment and cloud control platform
CN113963322B (en) Detection model training method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant