CN113469025B - Target detection method and device applied to vehicle-road cooperation, road side equipment and vehicle

Info

Publication number: CN113469025B
Authority: CN (China)
Prior art keywords: loss function, value, determining, training, weight
Legal status: Active
Application number: CN202110731377.4A
Other languages: Chinese (zh)
Other versions: CN113469025A
Inventor: 夏春龙 (Xia Chunlong)
Assignee: Apollo Zhilian Beijing Technology Co Ltd
Events: application filed by Apollo Zhilian Beijing Technology Co Ltd; publication of CN113469025A; application granted; publication of CN113469025B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a training method of a target detection model, and a method, apparatus, device and storage medium for detecting a target object, relating to the field of artificial intelligence, in particular to the technical fields of computer vision and intelligent traffic. The training method of the target detection model is implemented as follows: inputting sample data into a target detection model to obtain first detection data for the sample data, wherein the first detection data indicates a prediction frame for a first target object in the sample data, and the sample data has a label indicating a real frame for the first target object; determining an intersection ratio between the prediction frame and the real frame; determining a weight of a regression loss function in a predetermined loss function based on the intersection ratio; and training the target detection model based on the weight, the prediction frame, the real frame and the predetermined loss function.

Description

Target detection method and device applied to vehicle-road cooperation, road side equipment and vehicle
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to the field of computer vision technology and the field of intelligent traffic technology, and more particularly to a training method of a target detection model, and a method, apparatus, device and storage medium for detecting a target object.
Background
Applications of computer vision technology in industrial scenarios (such as vehicle-road cooperative V2X scenarios) are becoming increasingly rich. As the basis of computer vision technology, target detection can replace the time-consuming and labor-intensive traditional manual approach, and therefore has a very broad application prospect.
In an anchor-based target detection method, different loss coefficients are generally set for foreground candidate frames and background candidate frames, while a uniform loss coefficient is set for all foreground candidate frames.
Disclosure of Invention
Provided are a training method of a target detection model capable of improving model accuracy, and a method, apparatus, device, and storage medium for detecting a target object.
According to a first aspect, there is provided a training method of a target detection model, comprising: inputting sample data into a target detection model to obtain first detection data for the sample data, wherein the first detection data indicates a prediction frame for a first target object in the sample data, and the sample data has a label indicating a real frame for the first target object; determining an intersection ratio between the prediction frame and the real frame; determining a weight of a regression loss function in a predetermined loss function based on the intersection ratio; and training the target detection model based on the weight, the prediction frame, the real frame and the predetermined loss function.
According to a second aspect, there is provided a method of detecting a target object, comprising: taking an image to be detected as input of a target detection model to obtain second detection data for the image to be detected; and obtaining the position of a second target object in the image to be detected based on the second detection data, wherein the target detection model is trained using the training method of the target detection model described above.
According to a third aspect, there is provided a training apparatus of a target detection model, comprising: a first data obtaining module for inputting sample data into the target detection model to obtain first detection data for the sample data, wherein the first detection data indicates a prediction frame for a first target object in the sample data, and the sample data has a label indicating a real frame for the first target object; an intersection ratio determination module for determining an intersection ratio between the prediction frame and the real frame; a weight determination module for determining a weight of a regression loss function in a predetermined loss function based on the intersection ratio; and a model training module for training the target detection model based on the weight, the prediction frame, the real frame and the predetermined loss function.
According to a fourth aspect, there is provided an apparatus for detecting a target object, comprising: a second data obtaining module for obtaining second detection data for an image to be detected by taking the image to be detected as input of a target detection model; and a position obtaining module for obtaining the position of a second target object in the image to be detected based on the second detection data, wherein the target detection model is trained by the aforementioned training apparatus of the target detection model.
According to a fifth aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the training method of the target detection model and/or the method of detecting a target object provided by the present disclosure.
According to a sixth aspect, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the training method of the object detection model and/or the method of detecting an object provided by the present disclosure.
According to a seventh aspect, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the training method of the object detection model and/or the method of detecting an object provided by the present disclosure.
According to an eighth aspect, there is provided a roadside device comprising the aforementioned electronic device.
According to a ninth aspect, a cloud control platform is provided, which comprises the electronic device.
According to a tenth aspect, there is provided an autonomous vehicle comprising the aforementioned electronic device.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic illustration of an application scenario of a method and apparatus for training a target detection model and detecting a target object according to an embodiment of the present disclosure;
FIG. 2 is a flow diagram of a training method of a target detection model according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of training a target detection model according to an embodiment of the present disclosure;
FIG. 4 is a flow chart of a method of detecting a target object according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of a training apparatus of an object detection model according to an embodiment of the present disclosure;
FIG. 6 is a block diagram of an apparatus for detecting a target object according to an embodiment of the present disclosure; and
FIG. 7 is a block diagram of an electronic device for implementing a training method of a target detection model and/or a method of detecting a target object in accordance with an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The disclosure provides a training method of a target detection model, which comprises a data obtaining stage, an intersection ratio determination stage, a weight determination stage and a model training stage. In the data obtaining stage, sample data is input into a target detection model, and first detection data for the sample data is obtained. In the intersection ratio determination stage, the intersection ratio between the prediction frame and the real frame is determined. In the weight determination stage, the weight of the regression loss function in the predetermined loss function is determined based on the intersection ratio. In the model training stage, the target detection model is trained based on the weight, the prediction frame, the real frame and the predetermined loss function.
An application scenario of the method and apparatus provided by the present disclosure will be described below with reference to fig. 1.
FIG. 1 is an application scenario diagram of a method and apparatus for training a target detection model and detecting a target object according to an embodiment of the present disclosure.
As shown in fig. 1, the application scenario 100 of this embodiment may include, for example, an image acquisition device 110, a server 120, and a terminal device 130. The image capturing device 110 may upload the captured image to the server 120 through a network, and the terminal device 130 may be communicatively connected to the server 120 through the network. The network may include wired or wireless communication links.
According to an embodiment of the present disclosure, the image acquisition device 110 may, for example, be disposed at the side of the road 140 to capture images reflecting road conditions. Alternatively, the image acquisition device may be provided in a vehicle to capture images of the vehicle's surroundings while it travels on the road. The image acquisition device 110 may, for example, upload the captured images to the server 120 in real time, with the images being stored or processed by the server 120.
According to embodiments of the present disclosure, the server 120 may, for example, store a huge number of images, or may periodically store the images uploaded by the image acquisition device 110 into a database. The server 120 may train a pre-constructed target detection model on annotated images, and the trained target detection model 150 may, for example, be sent to the terminal device 130 in response to a request from the terminal device 130.
In an embodiment, the server 120 may be a background management server that provides support for the running of client applications. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
According to an embodiment of the present disclosure, the terminal device 130 may interact with the server 120 via a network in response to a user operation to obtain a trained object detection model and an image from the server 120, detect a target object in the image using the object detection model, and display the detection result to the user. The terminal device 130 may be an electronic device having a display screen and having processing capabilities including, but not limited to, a smart phone, a tablet computer, a laptop portable computer, a desktop computer, and the like.
In an embodiment, the terminal device 130 may also be integrated in a vehicle control system of a vehicle, for example.
In an embodiment, the terminal device 130 may detect the target object using the trained target detection model, for example, to obtain the position of the target object. The position of the target object may then be used in downstream applications, for example vehicle-road cooperative tasks such as spilled-object detection and travel trajectory prediction.
In an embodiment, the server 120 may also use a trained target detection model to perform target detection on the image uploaded by the image acquisition device 110 in real time, and feed back the detection result to the terminal device 130, where the terminal device 130 displays the detection result.
According to an embodiment of the present disclosure, the target detection model may detect the target object using a keypoint-based detection method, or using an anchor-based detection method. The anchor-based detection method introduces a prior distribution: a group of fixed reference frames with different scales at different positions is preset to cover almost all positions and scales, and each reference frame is responsible for detecting targets whose intersection ratio with it is larger than a threshold value. Models based on this method are relatively stable to train, and when the anchor distribution is relatively dense they can effectively improve the detection of small targets. The keypoint-based detection method represents each target by a pair of corner points, so the amount of computation is small and the solution space is larger and more flexible. In industry, an anchor-based target detection method is generally used to construct the target detection model, thereby improving the detection rate of targets.
It should be noted that, the training method of the target detection model provided in the present disclosure may be performed by the server 120, and the method of detecting the target object may be performed by the server 120 or the terminal device 130. Accordingly, the training apparatus of the target detection model provided in the present disclosure may be disposed in the server 120, and the apparatus for detecting the target object may be disposed in the server 120 or in the terminal device 130.
It should be understood that the types and numbers of image capturing devices, servers and terminal devices in fig. 1 are merely illustrative. There may be any type and number of image capturing devices, servers and terminal devices as required by the implementation.
The training method of the object detection model provided by the present disclosure will be described in detail below with reference to the application scenario described in fig. 1 through fig. 2 to 3.
Fig. 2 is a flow diagram of a training method of a target detection model according to an embodiment of the present disclosure.
As shown in fig. 2, the training method 200 of the object detection model of this embodiment may include operations S210 to S240.
In operation S210, sample data is input into a target detection model, and first detection data for the sample data is obtained.
According to embodiments of the present disclosure, the target detection model may be constructed based on an anchor-based target detection method. The target detection model may be a model based on a first-order (one-stage) processing method, represented by YOLO (You Only Look Once), or a model based on a second-order (two-stage) processing method, represented by the region-based convolutional neural network (Region-CNN, R-CNN). The second-order processing method differs from the first-order processing method in that it uses a region proposal network (Region Proposal Network) to distinguish foreground from background, so as to alleviate the imbalance between positive and negative samples; fine target prediction is then performed on the valid foreground regions (Regions of Interest, ROI). For example, a plurality of prior frames (anchor boxes) may be preset, and the plurality of prior frames may be divided into foreground frames and background frames via the region proposal network. If a first-order processing method is used, the plurality of prior frames are required to carry information indicating whether each is a foreground frame or a background frame. A background frame is a prior frame whose intersection ratio with the real frame is smaller than a preset threshold value, and a foreground frame is a prior frame whose intersection ratio with the real frame is larger than or equal to the preset threshold value.
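By way of illustration, the foreground/background split described above might be implemented as follows. This is a minimal sketch: the function name and the threshold value of 0.5 are assumptions introduced here for illustration, not taken from the disclosure.

```python
import numpy as np

def split_prior_boxes(ious_with_gt, iou_threshold=0.5):
    """Given the intersection ratio of each prior frame with its best-matching
    real frame, mark priors as foreground (IoU >= threshold) or background."""
    ious_with_gt = np.asarray(ious_with_gt, dtype=float)
    foreground = ious_with_gt >= iou_threshold  # take part in the regression loss
    background = ~foreground                    # regression loss is zero here
    return foreground, background
```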
In an embodiment, the target detection model may be used only to predict the position of the target object in the sample data. Alternatively, the target detection model may be used not only to predict the position of the target object in the sample data but also to predict its category. For example, the categories of target objects may include: automobiles, bicycles, motorcycles, pedestrians, and the like. The category of the target object may be set according to the application scenario, which is not limited in the present disclosure.
According to an embodiment of the present disclosure, the sample data has a label indicating the real frame for the first target object in the sample image. Specifically, the label contains the coordinate values of the center point of the real frame, and the height and width of the real frame. It is to be understood that this label is merely an example to facilitate understanding of the present disclosure, which is not limited thereto. In an embodiment, the label may also indicate the real frame by an offset relative to a reference object.
This embodiment may input the sample image in the sample data into the target detection model, and the first detection data is output by the target detection model. The first detection data includes the prediction frame for the first target object; specifically, it may include position information of the prediction frame, for example the predicted coordinate values of the center point of the prediction frame, and the predicted height and predicted width of the prediction frame.
It is to be understood that each coordinate value may be a coordinate value in a two-dimensional coordinate system established with the center point of the sample image as the origin, or a coordinate value in a two-dimensional coordinate system established with the point at the upper left corner of the sample image as the origin, and the coordinate system for each coordinate value is not limited in the present disclosure, so long as each coordinate value is for the same coordinate system.
In operation S220, an intersection ratio between the prediction frame and the real frame is determined.
For example, an overlapping portion where the prediction frame and the real frame overlap each other may be determined based on the position information of the prediction frame and the position information of the real frame, and the area of the overlapping portion may be calculated as the intersection area between the prediction frame and the real frame. The sum of the area of the prediction frame and the area of the real frame is then determined as the area sum, and the difference between the area sum and the area of the overlapping portion is taken as the union area. Finally, the ratio of the intersection area to the union area is taken as the intersection ratio (Intersection-over-Union, IoU) between the prediction frame and the real frame. For example, if the prediction frame and the real frame overlap completely, the value of the intersection ratio is 1. The intersection ratio may be expressed using the following formula:

$$\mathrm{IoU}_i = \frac{\left|b_i \cap b_i^{gt}\right|}{\left|b_i \cup b_i^{gt}\right|}$$

where $\mathrm{IoU}_i$ is the intersection ratio. In the case where there are a plurality of sample data, $b_i$ is the prediction frame of the i-th sample data among the plurality of sample data, and $b_i^{gt}$ is the real frame of the i-th sample data.
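A minimal sketch of this computation, assuming axis-aligned boxes given as (x1, y1, x2, y2) tuples; the function name is illustrative, not part of the disclosure.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Overlapping portion of the two boxes
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # Union area = sum of the two areas minus the overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```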
In operation S230, the weight of the regression loss function in the predetermined loss function is determined based on the intersection ratio.
According to embodiments of the present disclosure, the regression loss function may be, for example, a mean square error function, a mean absolute error function, a smoothed mean absolute error function (Huber loss function), or the like. The weight of the regression loss function in this embodiment may be represented by a weight function taking the intersection ratio as its argument.
In an embodiment, the regression loss function may be, for example, a smooth L1 function. Its value is 0 when the prior frame corresponding to the prediction frame is a background frame; when the prior frame corresponding to the prediction frame is a foreground frame, its value is expressed by the following formula:

$$L_{reg} = \mathrm{smooth}_{L1}\!\left(t_i - t_i^{*}\right), \qquad \mathrm{smooth}_{L1}(x) = \begin{cases} 0.5\,x^{2}, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

where $L_{reg}$ is the value of the regression loss function (the smooth L1 term is applied element-wise to the position vector and summed), $t_i$ is a vector representing the position of the prediction frame in the i-th sample data, and $t_i^{*}$ is a vector representing the position of the real frame corresponding to that prediction frame. For example, $t_i$ and $t_i^{*}$ may be expressed as:

$$t_i = \left(\frac{x_i - x_{pi}}{w_{pi}},\ \frac{y_i - y_{pi}}{h_{pi}},\ \log\frac{w_i}{w_{pi}},\ \log\frac{h_i}{h_{pi}}\right)$$

$$t_i^{*} = \left(\frac{x_i^{*} - x_{pi}}{w_{pi}},\ \frac{y_i^{*} - y_{pi}}{h_{pi}},\ \log\frac{w_i^{*}}{w_{pi}},\ \log\frac{h_i^{*}}{h_{pi}}\right)$$

where $(x_i, y_i)$ are the coordinate values of the center point of the prediction frame in the i-th sample data, and $w_i$, $h_i$ are respectively the width and height of that prediction frame; $(x_i^{*}, y_i^{*})$ are the coordinate values of the center point of the real frame corresponding to the prediction frame, and $w_i^{*}$, $h_i^{*}$ are respectively its width and height; $(x_{pi}, y_{pi})$ are the coordinate values of the center point of the prior frame corresponding to the prediction frame, and $w_{pi}$, $h_{pi}$ are respectively its width and height. It will be appreciated that the regression loss function of this embodiment follows the same principle as the distance loss in the Faster R-CNN model, and is not described in detail here. The prior frame corresponding to a prediction frame is the prior frame from which that prediction frame is obtained by adjustment.
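The smooth L1 term and the box parameterization above might look as follows in code; a sketch under the stated (center x, center y, width, height) conventions, with illustrative function names.

```python
import numpy as np

def smooth_l1(x):
    """Element-wise smooth L1: 0.5*x^2 where |x| < 1, |x| - 0.5 otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < 1.0, 0.5 * x ** 2, np.abs(x) - 0.5)

def encode_box(box, prior):
    """Encode a (cx, cy, w, h) box relative to its prior frame, giving
    t = ((x - xp)/wp, (y - yp)/hp, log(w/wp), log(h/hp))."""
    x, y, w, h = box
    xp, yp, wp, hp = prior
    return np.array([(x - xp) / wp, (y - yp) / hp,
                     np.log(w / wp), np.log(h / hp)])

def reg_loss(pred_box, gt_box, prior):
    """Regression loss for one foreground prediction: smooth L1 of t_i - t_i*,
    summed over the four coordinates."""
    return smooth_l1(encode_box(pred_box, prior) - encode_box(gt_box, prior)).sum()
```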
For example, the weight of the regression loss function may be positively correlated with the intersection ratio, whereby the model focuses its capacity on examples that are already localized relatively well. Alternatively, the weight of the regression loss function may be negatively correlated with the intersection ratio, so that the model imposes a larger loss penalty on samples with low prediction accuracy, improving the detection accuracy of the trained model on various types of targets.
In accordance with embodiments of the present disclosure, when the weight is negatively correlated with the intersection ratio, the weight may, for example, be inversely proportional to the intersection ratio. Alternatively, the weight may be an exponential function of the intersection ratio, with the intersection ratio as the base and a negative number as the exponent. For example, the negative correlation between the intersection ratio and the weight can be expressed by the following formula:

$$w = x^{\gamma}$$

where $w$ is the weight, $x$ is the intersection ratio, and $\gamma$ is a constant less than zero. The constant may be set according to actual requirements, which is not limited in this disclosure; for example, the constant may be -1.
According to an embodiment of the present disclosure, in the case where the weight and the intersection ratio are negatively correlated, determining the weight of the regression loss function in the predetermined loss function based on the intersection ratio may first determine whether the intersection ratio is zero. If it is zero, the weight of the regression loss function is determined to be zero; otherwise, the weight is determined based on the negative correlation between the intersection ratio and the weight. In this way, the situation where the denominator of the weight function takes a zero value (when the weight is inversely proportional to the intersection ratio) because the intersection ratio is zero can be avoided. This is reasonable because, if the intersection ratio between the prediction frame and the real frame is zero, the prior frame from which the prediction frame is derived is typically a background frame.
According to an embodiment of the present disclosure, in the case where the weight and the intersection ratio are negatively correlated, determining the weight of the regression loss function based on the intersection ratio may instead first determine whether the intersection ratio between the prior frame on which the prediction frame is based and the real frame is greater than or equal to a predetermined threshold. If it is greater than or equal to the predetermined threshold, the weight is determined based on the negative correlation between the intersection ratio and the weight; if it is smaller than the predetermined threshold, the weight of the regression loss function is determined to be zero.
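The two zero-weight rules above can be combined with the power-law weight in a single function. A sketch: gamma=-1.0 follows the example constant mentioned earlier, and the default threshold of 0.0 reproduces the first embodiment (a positive threshold gives the second); both defaults are assumptions for illustration.

```python
def regression_weight(iou_value, gamma=-1.0, iou_threshold=0.0):
    """Weight of the regression loss as a negative-exponent power of the IoU.

    Returns 0 when the IoU is zero (avoiding a zero denominator; a zero IoU
    normally means the underlying prior frame is a background frame) or when
    the IoU falls below the foreground threshold."""
    if iou_value <= 0.0 or iou_value < iou_threshold:
        return 0.0
    return iou_value ** gamma  # negatively correlated: small IoU -> large weight
```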
In operation S240, the object detection model is trained based on the weights, the prediction box, the real box, and the predetermined loss function.
According to the embodiment of the disclosure, the value of the predetermined loss function may first be determined according to the weight, the position information of the prediction frame, and the position information of the real frame. A gradient descent algorithm or a back propagation algorithm is then adopted to adjust the parameters of the target detection model, thereby realizing the training of the target detection model.
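In a framework with automatic differentiation, the parameter adjustment described here reduces to a standard optimizer step. A PyTorch-flavored sketch, where `model`, `loss_fn` and the batch variables are assumed placeholders, not names from the disclosure:

```python
import torch

def train_step(model, optimizer, loss_fn, images, labels):
    """One training iteration: forward pass, predetermined loss, parameter update."""
    detections = model(images)          # first detection data for the batch
    loss = loss_fn(detections, labels)  # value of the predetermined loss function
    optimizer.zero_grad()
    loss.backward()                     # back propagation of the loss
    optimizer.step()                    # gradient-descent update of the parameters
    return float(loss.detach())
```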
For example, when the target detection model is used only to predict the position information of the target object, this embodiment may first determine the weighted value of the regression loss function based on the weight, the prediction frame and the real frame, and then adjust the parameters of the model based on the weighted value using the gradient descent algorithm or the back propagation algorithm. When determining the weighted value, the value of the regression loss function may be determined according to the prediction frame and the real frame, and the product of that value and the weight is taken as the weighted value. In the case where the regression loss function is the smooth L1 loss function, background frames and foreground frames may be determined according to the intersection ratio between the prior frames and the real frame, and the weighted value of the regression loss function is determined from the prediction frame and the real frame only for prediction frames obtained based on foreground frames.
According to embodiments of the present disclosure, when N sample data are employed in a single training process, the weighted value may be expressed by the following formula:

$$L_{reg} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{IoU}_i^{\gamma}\,\mathrm{smooth}_{L1}\!\left(t_i - t_i^{*}\right)$$

where $\mathrm{IoU}_i^{\gamma}$ is the weight determined based on the intersection ratio between the prediction frame and the real frame, and $\mathrm{smooth}_{L1}(\cdot)$ is the aforementioned smooth L1 function.
It will be appreciated that a plurality of prediction frames may be obtained for each sample data; this embodiment then derives a weight for each prediction frame and obtains a weighted value for each prediction frame. The embodiment may take the sum of the weighted values obtained for the plurality of prediction frames as the weighted value for that sample data.
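Putting the pieces together, the weighted regression value for a batch of N samples, each with several matched prediction frames, could be accumulated as below. This sketch reuses the illustrative `iou`, `reg_loss` and `regression_weight` helpers defined above, and the per-batch averaging and data layout are assumptions.

```python
def to_corners(box):
    """(cx, cy, w, h) -> (x1, y1, x2, y2), matching the iou() helper above."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

def weighted_regression_value(samples, gamma=-1.0):
    """samples: list of (pred_boxes, gt_boxes, priors) triples, one per sample;
    boxes are (cx, cy, w, h), and each prediction frame is paired with its
    corresponding real frame and prior frame."""
    total = 0.0
    for pred_boxes, gt_boxes, priors in samples:
        sample_value = 0.0
        for pred, gt, prior in zip(pred_boxes, gt_boxes, priors):
            w = regression_weight(iou(to_corners(pred), to_corners(gt)), gamma)
            sample_value += w * reg_loss(pred, gt, prior)  # weighted value per frame
        total += sample_value  # sum over the sample's prediction frames
    return total / len(samples)  # mean over the N samples
```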
From the above analysis it can be seen that, according to the embodiment of the disclosure, the weight of the regression loss function is determined according to the intersection ratio between the prediction frame and the real frame, so that the target detection model is trained more accurately, and the trained target detection model can better meet actual requirements.
Furthermore, according to the embodiment of the disclosure, setting the weight of the regression loss function to be negatively correlated with the intersection ratio makes the regression loss impose a larger penalty on prediction frames that deviate more from the real frame. Training of the target detection model can therefore take into account the different regression difficulties of different prior frames, effectively balancing the regression difficulties of different targets and improving both the performance of the trained target detection model and the training efficiency.
Fig. 3 is a schematic diagram of training a target detection model according to an embodiment of the present disclosure.
According to embodiments of the present disclosure, the target detection model may also be used to predict the category of the target object in the prediction frame. Accordingly, the label of the sample data may also indicate the true category of the first target object in the sample image, and the first detection data output by the target detection model may further indicate the predicted category of the first target object. For example, the first detection data may include probabilities of the first target object with respect to a plurality of predetermined categories, and the category corresponding to the maximum probability value is taken as the predicted category of the first target object.
In this embodiment, the predetermined loss function may further comprise a classification loss function, whereby classification loss and regression loss are taken into account simultaneously when training the target detection model.
As shown in fig. 3, in this embodiment 300, when training the target detection model, the sample data 310 may be used as an input to the target detection model 320, and the first detection data 330 may be output from the target detection model 320. The first detection data may be a one-dimensional vector including position information of a prediction frame and a prediction probability of an object in the prediction frame for each of a plurality of predetermined categories. The prediction frame 331 can be obtained from the position information of the prediction frame, and the prediction category 332 of the object in the prediction frame can be obtained according to the probability. After obtaining the prediction frame 331, a similar method as described above may be used to determine the intersection ratio between the real frame 311 indicated by the label in the sample data 310 and the prediction frame 331, and obtain the weighted value 340 of the regression loss function based on the weight determined according to the intersection ratio. Meanwhile, the value of the classification loss function may be determined as the first value 350 based on the predicted class 332 and the real class 312.
After the first value 350 and the weighted value 340 are obtained, the value of the predetermined loss function may be determined as the second value 360 based on the weighted value 340 and the first value 350. The object detection model 320 is then trained based on the second value 360.
The classification loss function may be, for example, a cross entropy loss function or a hinge loss function. Taking the cross entropy loss function as an example, the value of the classification loss function can be expressed by the following formula:

$$L_{cls} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c} p_i^{*}(c)\,\log p_i(c)$$

where $L_{cls}$ is the classification loss function and $c$ is any one of the plurality of predetermined categories. $p_i^{*}(c)$ is the true probability of the i-th sample data among the N sample data for the predetermined category $c$: if the category indicated by the label of the i-th sample data is category $c$, the value of $p_i^{*}(c)$ is 1; otherwise its value is 0. $p_i(c)$ is the predicted probability that the category of the target object in the prediction frame is category $c$.
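A direct transcription of this cross entropy term, assuming one-hot true labels; a sketch with illustrative names, not the disclosure's implementation.

```python
import numpy as np

def classification_loss(pred_probs, true_labels, eps=1e-12):
    """Cross entropy over N samples.

    pred_probs:  (N, C) array of predicted probabilities p_i(c)
    true_labels: (N,) integer category indices (one-hot true probabilities)"""
    pred_probs = np.asarray(pred_probs, dtype=float)
    n = pred_probs.shape[0]
    # Only the true category contributes: p_i*(c) is 1 there and 0 elsewhere
    picked = pred_probs[np.arange(n), np.asarray(true_labels)]
    return -np.log(picked + eps).mean()
```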
According to embodiments of the present disclosure, the sum of the weighted value 340 and the first value 350 may be taken as the value of the predetermined loss function. Alternatively, the regression loss function and the classification loss function may be assigned weights in advance: the embodiment may determine a weighted sum of the weighted value and the first value based on a first predetermined weight for the regression loss function and a second predetermined weight for the classification loss function, and take the weighted sum as the value of the predetermined loss function. For example, the predetermined loss function may be expressed using the following formula:

$$L = \alpha L_{cls} + \beta L_{reg}$$

where $L$ is the predetermined loss function, $L_{cls}$ is the classification loss function, and $L_{reg}$ is the weighted regression loss function, i.e. the product of the weight determined based on the intersection ratio and the regression loss function (the value of $L_{reg}$ is the aforementioned weighted value). $\alpha$ is the second predetermined weight and $\beta$ is the first predetermined weight.
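The second value of the predetermined loss function is then a one-line weighted sum; the default alpha and beta values here are illustrative assumptions.

```python
def predetermined_loss_value(cls_value, weighted_reg_value, alpha=1.0, beta=1.0):
    """L = alpha * L_cls + beta * L_reg, combining the first value (classification
    loss) with the weighted value of the regression loss."""
    return alpha * cls_value + beta * weighted_reg_value
```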
According to the embodiment, the regression loss and the classification loss are comprehensively considered to train the target detection model, so that the target detection model can learn the position characteristics and the category characteristics at the same time, the accuracy of predicting the position of the target object can be improved, and the accuracy of predicting the category of the target object can also be improved.
The present disclosure also provides a method for detecting a target object based on the training method of the target detection model described above. The method of detecting the target object will be described below with reference to fig. 4.
Fig. 4 is a flow chart of a method of detecting a target object according to an embodiment of the present disclosure.
As shown in fig. 4, the method 400 of detecting a target object of this embodiment may include operations S410 to S420.
In operation S410, an image to be detected is taken as input of the target detection model to obtain second detection data for the image to be detected. The method of obtaining the second detection data is similar to the method of obtaining the first detection data, except that the image to be detected input into the target detection model has no label. The target detection model is trained using the training method described above.
In operation S420, a position of a second target object in the image to be detected is obtained based on the second detection data.
According to an embodiment of the present disclosure, the second detection data may include position information of the prediction frame, similar to the first detection data, based on which the position of the second target object may be obtained. Specifically, the area surrounded by the prediction frame may be used as the area where the second target object is located.
According to embodiments of the present disclosure, the target detection model may also be used to predict the category of the second target object. The method of detecting a target object of this embodiment may further obtain the category of the second target object based on the second detection data. Specifically, the maximum probability among the probabilities for the plurality of predetermined categories in the second detection data may first be determined, and the category corresponding to the maximum probability is taken as the category of the second target object.
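Decoding the second detection data into a position and a category, as operations S410-S420 describe, might look as follows; the data layout (per-box coordinates followed by class probabilities) is an assumption for illustration.

```python
import numpy as np

def decode_detection(detection_vector, num_classes):
    """Split one detection vector into a box position and a predicted category.

    Assumes the layout [cx, cy, w, h, p(c1), ..., p(cK)]; the region enclosed
    by the box is taken as the area where the second target object is located."""
    box = detection_vector[:4]
    probs = np.asarray(detection_vector[4:4 + num_classes], dtype=float)
    category = int(np.argmax(probs))  # category with the maximum probability
    return box, category
```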
Based on the training method of the target detection model described above, the present disclosure also provides a training device of the target detection model. The device will be described in detail below in connection with fig. 5.
Fig. 5 is a block diagram of a training apparatus of an object detection model according to an embodiment of the present disclosure.
As shown in fig. 5, the training apparatus 500 for the object detection model of this embodiment may include a first data obtaining module 510, an intersection ratio determining module 520, a weight determining module 530, and a model training module 540.
The first data obtaining module 510 is configured to input sample data into the target detection model and obtain first detection data for the sample data. The first detection data indicates a prediction frame for the first target object in the sample data, and the sample data has a label indicating a real frame for the first target object. In an embodiment, the first data obtaining module 510 may be used to perform operation S210 described above, which is not repeated here.
The intersection ratio determination module 520 is configured to determine the intersection ratio between the prediction frame and the real frame. In an embodiment, the intersection ratio determination module 520 may be used to perform operation S220 described above, which is not repeated here.
The weight determination module 530 is configured to determine the weight of the regression loss function in the predetermined loss function based on the intersection ratio. In an embodiment, the weight determination module 530 may be used to perform operation S230 described above, which is not repeated here.
The model training module 540 is configured to train the target detection model based on the weights, the prediction box, the real box, and the predetermined loss function. In an embodiment, the model training module 540 may be used to perform the operation S240 described above, which is not described herein.
According to an embodiment of the present disclosure, the weight determination module may include a first determination submodule and a second determination submodule. The first determination submodule is configured to determine the weight based on the negative correlation between the intersection ratio and the weight when the intersection ratio is greater than zero. The second determination submodule is configured to determine that the weight of the regression loss function is zero when the intersection ratio is equal to zero.
According to embodiments of the present disclosure, the weight and the intersection ratio are negatively correlated with each other.
According to an embodiment of the present disclosure, the negative correlation between the weight and the intersection ratio is expressed by the formula $w = x^{\gamma}$, where $w$ is the weight, $x$ is the intersection ratio, and $\gamma$ is a constant less than zero.
According to an embodiment of the present disclosure, the model training module includes a weighted value determination submodule and a training submodule. The weighted value determination submodule is configured to determine the weighted value of the regression loss function based on the weight, the prediction frame and the real frame. The training submodule is configured to train the target detection model based on the weighted value.
According to an embodiment of the present disclosure, the predetermined loss function further comprises a classification loss function, the label further indicates the true category of the target object, and the first detection data further indicates the predicted category of the first target object. The model training module further comprises a value determination submodule configured to determine the value of the classification loss function, as a first value, based on the predicted category and the true category. The training submodule may include a loss determination unit and a training unit, where the loss determination unit is configured to determine the value of the predetermined loss function as a second value based on the weighted value and the first value, and the training unit is configured to train the target detection model based on the second value.
According to an embodiment of the present disclosure, the above-mentioned loss determination unit is configured to determine, as the value of the predetermined loss function, a weighted sum of the weighted value and the first value based on a first predetermined weight for the regression loss function and a second predetermined weight for the classification loss function.
Based on the method for detecting the target object, the disclosure also provides a device for detecting the target object. The apparatus for detecting a target object will be described in detail with reference to fig. 6.
Fig. 6 is a block diagram of a structure of an apparatus for detecting a target object according to an embodiment of the present disclosure.
As shown in fig. 6, the apparatus 600 for detecting a target object of this embodiment may include a second data obtaining module 610 and a position determining module 620.
The second data obtaining module 610 is configured to take an image to be detected as input of the target detection model and obtain second detection data for the image to be detected. In an embodiment, the second data obtaining module 610 is configured to perform operation S410 described above, which is not repeated here.
The position determining module 620 is configured to obtain a position of the second target object in the image to be detected based on the second detection data. In an embodiment, the location determining module 620 is configured to perform the operation S420 described above, which is not described herein.
According to an embodiment of the present disclosure, the foregoing apparatus 600 for detecting a target object may further include a category determining module for obtaining a category of the second target object based on the second detection data.
In the technical solution of the present disclosure, the acquisition, storage and application of the personal information of the users involved all comply with the provisions of relevant laws and regulations, and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
FIG. 7 illustrates a block diagram of an electronic device for implementing a training method of a target detection model and/or a method of detecting a target object in accordance with an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 7, the apparatus 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Various components in device 700 are connected to I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the respective methods and processes described above, for example, a training method of a target detection model and/or a method of detecting a target object. For example, in some embodiments, the training method of the target detection model and/or the method of detecting the target object may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 700 via ROM 702 and/or communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the training method of the target detection model and/or the method of detecting a target object described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the training method of the target detection model and/or the method of detecting the target object by any other suitable means (e.g. by means of firmware).
Based on the above electronic device, the present disclosure further provides a roadside device including the electronic device. For example, in addition to the electronic device, the roadside device may include a communication component and the like. The electronic device may be integrated with the communication component or be provided separately from it. In an embodiment, the electronic device may acquire data, such as pictures and videos, from a sensing device (such as a roadside camera), so as to perform image and video processing and data calculation.
According to the embodiment of the disclosure, the electronic device may also have a sensing data acquisition function and a communication function, for example, the electronic device may be an AI camera, and then the electronic device may directly perform image video processing and data calculation based on the acquired sensing data.
Based on the electronic equipment, the disclosure further provides a cloud control platform, which comprises the electronic equipment. For example, the cloud control platform may perform processing at the cloud, and the electronic device included in the cloud control platform may acquire data of the sensing device (such as a roadside camera), such as a picture and a video, so as to perform image video processing and data calculation. The cloud control platform can also be called a vehicle-road collaborative management platform, an edge computing platform, a cloud computing platform, a central system, a cloud server and the like.
Based on the foregoing electronic device, the disclosure further provides an autonomous vehicle, where the foregoing electronic device may be integrated into a vehicle control system of the autonomous vehicle.
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs, which may be executed and/or interpreted on a programmable system including at least one programmable processor; the programmable processor may be a special purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device and at least one output device, and transmit data and instructions to the storage system, the at least one input device and the at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (18)

1. A training method of a target detection model, comprising:
Inputting sample data into a target detection model to obtain first detection data for the sample data; wherein the first detection data indicates a prediction box for a first target object in the sample data, the sample data having a tag indicating a real box for the first target object;
Determining an intersection ratio between the prediction frame and the real frame;
Determining weights for regression loss functions in a predetermined loss function based on the intersection ratio; and
Training the target detection model based on the weights, the prediction box, the real box and the predetermined loss function;
wherein the weight is an exponential function of the intersection ratio, the exponential function taking the intersection ratio as its base and a negative number as its exponent;
wherein the predetermined loss function further comprises a classification loss function, the tag further indicates a true class of the target, the first detection data further indicates a predicted class of the first target object; the training the object detection model based on the weights, the prediction box, the real box, and the predetermined loss function includes:
determining a value of the classification loss function as a first value based on the predicted class and the real class;
determining a weighted value of the regression loss function based on the weight, the prediction box and the real box;
determining the value of the predetermined loss function as a second value based on the weighted value and the first value; and
And training the target detection model based on the second value.
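For concreteness, the following sketch evaluates the training loss of claim 1 once, in PyTorch. It is illustrative only: the claim fixes the weight as the intersection-over-union ratio raised to a negative power, but the exponent value (gamma = -0.5), the smooth-L1 regression loss, and the cross-entropy classification loss are assumptions, since the claim leaves the concrete component losses open.

import torch
import torch.nn.functional as F

def iou(pred_box: torch.Tensor, true_box: torch.Tensor) -> torch.Tensor:
    # Intersection-over-union ratio of two (x1, y1, x2, y2) boxes.
    x1 = torch.maximum(pred_box[0], true_box[0])
    y1 = torch.maximum(pred_box[1], true_box[1])
    x2 = torch.minimum(pred_box[2], true_box[2])
    y2 = torch.minimum(pred_box[3], true_box[3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_pred = (pred_box[2] - pred_box[0]) * (pred_box[3] - pred_box[1])
    area_true = (true_box[2] - true_box[0]) * (true_box[3] - true_box[1])
    return inter / (area_pred + area_true - inter)

def predetermined_loss(pred_box, true_box, pred_logits, true_class, gamma=-0.5):
    # Second value of the predetermined loss function: the weighted value of
    # the regression loss plus the first value from the classification loss.
    v = iou(pred_box, true_box).detach()                  # intersection-over-union ratio
    weight = v ** gamma if v > 0 else torch.zeros(())     # w = v^gamma (gamma < 0), 0 if no overlap
    reg_loss = F.smooth_l1_loss(pred_box, true_box)       # assumed regression loss
    cls_loss = F.cross_entropy(pred_logits.unsqueeze(0),  # assumed classification loss
                               true_class.view(1))
    return weight * reg_loss + cls_loss

A full training loop would average this quantity over all matched prediction/real box pairs in a batch and backpropagate it through the model, which is the final training step of the claim.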
2. The method of claim 1, wherein the determining a weight of a regression loss function in a predetermined loss function based on the intersection-over-union ratio comprises:
determining the weight based on a negative correlation between the intersection-over-union ratio and the weight when the intersection-over-union ratio is greater than zero; and
determining that the weight of the regression loss function is zero when the intersection-over-union ratio is equal to zero.
3. The method of claim 2, wherein the negative correlation between the weight and the intersection-over-union ratio is expressed by the following formula:

$w = v^{\gamma}$

wherein $w$ is the weight, $v$ is the intersection-over-union ratio, and $\gamma$ is a constant less than zero.
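A quick numeric check of the formula, under the assumed (editorial) value $\gamma = -0.5$: a poorly localized prediction with $v = 0.25$ receives weight $w = 0.25^{-0.5} = 2$, while a perfectly localized one with $v = 1$ receives $w = 1^{-0.5} = 1$. The weight thus decreases monotonically as the intersection-over-union ratio grows on $(0, 1]$, which is exactly the claimed negative correlation, and the separate zero-overlap case in claim 2 keeps the weight from diverging as $v \to 0^{+}$.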
4. The method of claim 1, wherein the determining a value of the predetermined loss function as a second value comprises:
determining, based on a first predetermined weight for the regression loss function and a second predetermined weight for the classification loss function, a weighted sum of the weighted value and the first value as the value of the predetermined loss function.
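In symbols (the symbol names are editorial, not from the claims): with $\alpha$ the first predetermined weight, $\beta$ the second predetermined weight, $w L_{reg}$ the weighted value of the regression loss, and $L_{cls}$ the first value from the classification loss, the value of the predetermined loss function in claim 4 is

$L = \alpha \cdot w \cdot L_{reg} + \beta \cdot L_{cls}$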
5. A method of detecting a target object, comprising:
taking an image to be detected as input to a target detection model to obtain second detection data for the image to be detected; and
obtaining a position of a second target object in the image to be detected based on the second detection data,
wherein the target detection model is obtained by a training method comprising:
inputting sample data into the target detection model to obtain first detection data for the sample data, wherein the first detection data indicates a prediction box for a first target object in the sample data, and the sample data has a tag indicating a real box for the first target object;
determining an intersection-over-union ratio between the prediction box and the real box;
determining a weight of a regression loss function in a predetermined loss function based on the intersection-over-union ratio; and
training the target detection model based on the weight, the prediction box, the real box, and the predetermined loss function;
wherein the weight is an exponential function of the intersection-over-union ratio, the exponential function taking the intersection-over-union ratio as its base and a negative number as its exponent;
wherein the predetermined loss function further comprises a classification loss function, the tag further indicates a true class of the first target object, and the first detection data further indicates a predicted class of the first target object; and the training the target detection model based on the weight, the prediction box, the real box, and the predetermined loss function comprises:
determining a value of the classification loss function as a first value based on the predicted class and the true class;
determining a weighted value of the regression loss function based on the weight, the prediction box, and the real box;
determining a value of the predetermined loss function as a second value based on the weighted value and the first value; and
training the target detection model based on the second value.
6. The method of claim 5, further comprising:
obtaining a category of the second target object based on the second detection data.
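Claims 5 and 6 describe plain inference with the trained model. A minimal sketch, assuming the model returns boxes and class ids as its second detection data (the claims do not fix the output format):

import torch

def detect(model, image: torch.Tensor):
    # Run the trained target detection model on an image to be detected and
    # return the position (claim 5) and category (claim 6) of the target object.
    model.eval()
    with torch.no_grad():
        boxes, class_ids = model(image.unsqueeze(0))  # add a batch dimension
    return boxes[0], class_ids[0]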
7. A training device for a target detection model, comprising:
a first data obtaining module, configured to input sample data into a target detection model to obtain first detection data for the sample data, wherein the first detection data indicates a prediction box for a first target object in the sample data, and the sample data has a tag indicating a real box for the first target object;
an intersection-over-union ratio determining module, configured to determine an intersection-over-union ratio between the prediction box and the real box;
a weight determining module, configured to determine a weight of a regression loss function in a predetermined loss function based on the intersection-over-union ratio; and
a model training module, configured to train the target detection model based on the weight, the prediction box, the real box, and the predetermined loss function;
wherein the weight is an exponential function of the intersection-over-union ratio, the exponential function taking the intersection-over-union ratio as its base and a negative number as its exponent;
wherein the predetermined loss function further comprises a classification loss function, the tag further indicates a true class of the first target object, and the first detection data further indicates a predicted class of the first target object;
wherein the model training module further comprises: a value determining sub-module, configured to determine a value of the classification loss function as a first value based on the predicted class and the true class; a weighted value determining sub-module, configured to determine a weighted value of the regression loss function based on the weight, the prediction box, and the real box; and a training sub-module comprising:
a loss determining unit, configured to determine a value of the predetermined loss function as a second value based on the weighted value and the first value; and
a training unit, configured to train the target detection model based on the second value.
8. The apparatus of claim 7, wherein the weight determining module comprises:
a first determining sub-module, configured to determine the weight based on a negative correlation between the intersection-over-union ratio and the weight when the intersection-over-union ratio is greater than zero; and
a second determining sub-module, configured to determine that the weight of the regression loss function is zero when the intersection-over-union ratio is equal to zero.
9. The apparatus of claim 8, wherein the negative correlation between the weight and the intersection-over-union ratio is expressed by the following formula:

$w = v^{\gamma}$

wherein $w$ is the weight, $v$ is the intersection-over-union ratio, and $\gamma$ is a constant less than zero.
10. The apparatus of claim 7, wherein the loss determining unit is configured to:
determine, based on a first predetermined weight for the regression loss function and a second predetermined weight for the classification loss function, a weighted sum of the weighted value and the first value as the value of the predetermined loss function.
11. An apparatus for detecting a target object, comprising:
a second data obtaining module, configured to take an image to be detected as input to a target detection model to obtain second detection data for the image to be detected; and
a position determining module, configured to obtain a position of a second target object in the image to be detected based on the second detection data,
wherein the target detection model is trained by a training device comprising:
a first data obtaining module, configured to input sample data into the target detection model to obtain first detection data for the sample data, wherein the first detection data indicates a prediction box for a first target object in the sample data, and the sample data has a tag indicating a real box for the first target object;
an intersection-over-union ratio determining module, configured to determine an intersection-over-union ratio between the prediction box and the real box;
a weight determining module, configured to determine a weight of a regression loss function in a predetermined loss function based on the intersection-over-union ratio; and
a model training module, configured to train the target detection model based on the weight, the prediction box, the real box, and the predetermined loss function;
wherein the weight is an exponential function of the intersection-over-union ratio, the exponential function taking the intersection-over-union ratio as its base and a negative number as its exponent;
wherein the predetermined loss function further comprises a classification loss function, the tag further indicates a true class of the first target object, and the first detection data further indicates a predicted class of the first target object;
wherein the model training module further comprises: a value determining sub-module, configured to determine a value of the classification loss function as a first value based on the predicted class and the true class; a weighted value determining sub-module, configured to determine a weighted value of the regression loss function based on the weight, the prediction box, and the real box; and a training sub-module comprising:
a loss determining unit, configured to determine a value of the predetermined loss function as a second value based on the weighted value and the first value; and
a training unit, configured to train the target detection model based on the second value.
12. The apparatus of claim 11, further comprising:
a category determining module, configured to obtain a category of the second target object based on the second detection data.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor,
wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-6.
15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-6.
16. A roadside device comprising the electronic device of claim 13.
17. A cloud control platform comprising the electronic device of claim 13.
18. An autonomous vehicle comprising the electronic device of claim 13.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110731377.4A CN113469025B (en) 2021-06-29 2021-06-29 Target detection method and device applied to vehicle-road cooperation, road side equipment and vehicle

Publications (2)

Publication Number Publication Date
CN113469025A CN113469025A (en) 2021-10-01
CN113469025B (en) 2024-05-31

Family

ID=77874173

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110731377.4A Active CN113469025B (en) 2021-06-29 2021-06-29 Target detection method and device applied to vehicle-road cooperation, road side equipment and vehicle

Country Status (1)

Country Link
CN (1) CN113469025B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114997153A (en) * 2022-04-25 2022-09-02 有米科技股份有限公司 Text recognition method and device based on cross-over-parallel ratio loss optimization model
CN116883677A (en) * 2023-06-12 2023-10-13 北京百度网讯科技有限公司 Target detection method, target detection model training method, device and vehicle

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111062413A (en) * 2019-11-08 2020-04-24 深兰科技(上海)有限公司 Road target detection method and device, electronic equipment and storage medium
CN111091105A (en) * 2019-12-23 2020-05-01 郑州轻工业大学 Remote sensing image target detection method based on new frame regression loss function
CN111242984A (en) * 2020-02-13 2020-06-05 珠海安联锐视科技股份有限公司 Target tracking method based on moving head camera
CN111881764A (en) * 2020-07-01 2020-11-03 深圳力维智联技术有限公司 Target detection method and device, electronic equipment and storage medium
CN112329873A (en) * 2020-11-12 2021-02-05 苏州挚途科技有限公司 Training method of target detection model, target detection method and device
CN112528995A (en) * 2020-12-22 2021-03-19 北京百度网讯科技有限公司 Method for training target detection model, target detection method and device
CN112613462A (en) * 2020-12-29 2021-04-06 安徽大学 Weighted intersection ratio method
CN112801164A (en) * 2021-01-22 2021-05-14 北京百度网讯科技有限公司 Training method, device and equipment of target detection model and storage medium
CN112906502A (en) * 2021-01-29 2021-06-04 北京百度网讯科技有限公司 Training method, device and equipment of target detection model and storage medium
CN112966587A (en) * 2021-03-02 2021-06-15 北京百度网讯科技有限公司 Training method of target detection model, target detection method and related equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109188457B (en) * 2018-09-07 2021-06-11 百度在线网络技术(北京)有限公司 Object detection frame generation method, device, equipment, storage medium and vehicle
CN110163234B (en) * 2018-10-10 2023-04-18 腾讯科技(深圳)有限公司 Model training method and device and storage medium
CN112101066B (en) * 2019-06-17 2024-03-08 商汤集团有限公司 Target detection method and device, intelligent driving method and device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wood transport vehicle detection based on improved Faster RCNN; Xu Yiliu et al.; Journal of Computer Applications; pp. 209-214 *

Similar Documents

Publication Publication Date Title
CN108304775B (en) Remote sensing image recognition method and device, storage medium and electronic equipment
WO2020098708A1 (en) Lane line detection method and apparatus, driving control method and apparatus, and electronic device
US9858483B2 (en) Background understanding in video data
WO2020232909A1 (en) Pedestrian visual tracking method, model training method and device, apparatus and storage medium
US11900676B2 (en) Method and apparatus for detecting target in video, computing device, and storage medium
CN113469025B (en) Target detection method and device applied to vehicle-road cooperation, road side equipment and vehicle
CN115294332B (en) Image processing method, device, equipment and storage medium
CN115797736B (en) Training method, device, equipment and medium for target detection model and target detection method, device, equipment and medium
CN113344055A (en) Image recognition method, image recognition device, electronic equipment and medium
CN115861400B (en) Target object detection method, training device and electronic equipment
CN113420682A (en) Target detection method and device in vehicle-road cooperation and road side equipment
CN114882321A (en) Deep learning model training method, target object detection method and device
CN115719436A (en) Model training method, target detection method, device, equipment and storage medium
CN113989721A (en) Target detection method and training method and device of target detection model
CN114663980B (en) Behavior recognition method, and deep learning model training method and device
CN113762027B (en) Abnormal behavior identification method, device, equipment and storage medium
CN114399497A (en) Text image quality detection method and device, computer equipment and storage medium
CN113808151A (en) Method, device and equipment for detecting weak semantic contour of live image and storage medium
CN113688920A (en) Model training and target detection method and device, electronic equipment and road side equipment
CN113469249B (en) Image classification model training method, classification method, road side equipment and cloud control platform
CN104700396B (en) The method and system of the parameter for estimating the volume of traffic is determined from image
CN113312979B (en) Image processing method and device, electronic equipment, road side equipment and cloud control platform
CN113963322B (en) Detection model training method and device and electronic equipment
CN115082884A (en) Lane line detection method and device, unmanned equipment and storage medium
CN114092739A (en) Image processing method, apparatus, device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant