CN112241675A - Object detection model training method and device - Google Patents

Object detection model training method and device

Info

Publication number
CN112241675A
Authority
CN
China
Prior art keywords
bounding box
object detection
loss function
value
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910659672.6A
Other languages
Chinese (zh)
Inventor
周定富
方进
宋希彬
官晨晔
杨睿刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910659672.6A priority Critical patent/CN112241675A/en
Publication of CN112241675A publication Critical patent/CN112241675A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides an object detection model training method and device. The method comprises the following steps: determining an object truth bounding box according to the labeling data of a detection target; inputting the detection target into an object detection model, and obtaining an object detection bounding box according to detection data output by the object detection model; determining the intersection-over-union ratio of the object truth bounding box and the object detection bounding box according to the two boxes; determining a loss function value according to the intersection-over-union ratio; and performing back propagation according to the loss function value to optimize the object detection model. The object detection model training method and device provided by the embodiment of the invention can improve the detection accuracy of the object detection model.

Description

Object detection model training method and device
Technical Field
The invention relates to the technical field of computers, in particular to a training method and a training device for an object detection model.
Background
Object detection is an important subject in the field of computer vision and has wide application in fields such as virtual reality, cultural relic protection, machining, and computer simulation. Unmanned driving is an emerging technology in the transportation field with broad prospects, and object detection plays a very important role in it. To realize object detection in intelligent driving, a deep learning model is usually adopted and trained on existing data, so that the model learns to recognize objects during training. For safety reasons, object detection in driving applications must be extremely accurate.
Disclosure of Invention
The embodiment of the invention provides an object detection model training method, which aims to solve one or more technical problems in the prior art.
In a first aspect, an embodiment of the present invention provides an object detection model training method, including:
determining an object truth bounding box according to the labeling data of a detection target;
inputting the detection target into an object detection model, and obtaining an object detection bounding box according to detection data output by the object detection model;
determining the intersection-over-union ratio of the object truth bounding box and the object detection bounding box according to the object truth bounding box and the object detection bounding box;
determining a loss function value according to the intersection-over-union ratio;
and optimizing the object detection model according to the loss function value.
In one embodiment, in a case where there is an overlap between the object truth bounding box and the object detection bounding box, determining the loss function value according to the intersection-over-union ratio includes:
determining a loss function value according to the following formula:
L=1-IoU;
where L is the loss function value and IoU is the intersection-over-union ratio.
In one embodiment, in a case where there is no overlap between the object truth bounding box and the object detection bounding box, determining the loss function value according to the intersection-over-union ratio comprises:
determining a loss function value according to the following formula:
L=1-GIoU;
wherein:
GIoU = IoU - (Area_C - Area_U) / Area_C;
where Area_C is the area of the smallest enclosing box containing both the object truth bounding box and the object detection bounding box, Area_U is the area of the union of the object truth bounding box and the object detection bounding box, and IoU is the intersection-over-union ratio.
In one embodiment, optimizing the object detection model according to the loss function value comprises:
performing back propagation calculation according to the loss function value to obtain back-propagation gradient values;
and optimizing parameters of the object detection model according to the back-propagation gradient values.
In one embodiment, determining the intersection-over-union ratio of the object truth bounding box and the object detection bounding box according to the object truth bounding box and the object detection bounding box includes:
determining the intersection and union of the object detection bounding box and the object truth bounding box;
and determining the intersection-over-union ratio according to the intersection and the union.
In one embodiment, any edge of the object truth bounding box is non-parallel to any edge of the object detection bounding box.
In a second aspect, an embodiment of the present invention provides an object detection model training apparatus, including:
a truth value module, configured to determine an object truth bounding box according to the labeling data of a detection target;
a detection module, configured to input the detection target into an object detection model and obtain an object detection bounding box according to detection data output by the object detection model;
an intersection-over-union calculation module, configured to determine the intersection-over-union ratio of the object truth bounding box and the object detection bounding box according to the object truth bounding box and the object detection bounding box;
a loss function calculation module, configured to determine a loss function value according to the intersection-over-union ratio;
and an optimization module, configured to optimize the object detection model according to the loss function value.
In one embodiment, in the case that there is an overlap between the object truth bounding box and the object detection bounding box, the loss function calculation module is configured to:
determining a loss function value according to the following formula:
L=1-IoU;
where L is the loss function value and IoU is the intersection-over-union ratio.
In one embodiment, in the case that there is no overlap between the object truth bounding box and the object detection bounding box, the loss function calculation module is configured to:
determining a loss function value according to the following formula:
L=1-GIoU;
wherein:
GIoU = IoU - (Area_C - Area_U) / Area_C;
where Area_C is the area of the smallest enclosing box containing both the object truth bounding box and the object detection bounding box, Area_U is the area of the union of the object truth bounding box and the object detection bounding box, and IoU is the intersection-over-union ratio.
In one embodiment, the optimization module comprises:
a back propagation calculation unit, configured to perform back propagation calculation according to the loss function value to obtain back-propagation gradient values;
and a reverse processing unit, configured to optimize parameters of the object detection model according to the back-propagation gradient values.
In one embodiment, the intersection-over-union calculation module comprises:
an intersection and union calculation unit, configured to determine the intersection and union of the object detection bounding box and the object truth bounding box;
and an intersection and union processing unit, configured to determine the intersection-over-union ratio according to the intersection and the union.
In one embodiment, any edge of the object truth bounding box is non-parallel to any edge of the object detection bounding box.
In a third aspect, an embodiment of the present invention provides an object detection model training apparatus, where functions of the apparatus may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above-described functions.
In one possible design, the apparatus includes a processor and a memory, where the memory stores a program supporting the apparatus in executing the above object detection model training method, and the processor is configured to execute the program stored in the memory. The apparatus may further include a communication interface for communicating with other devices or a communication network.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium for storing computer software instructions for an object detection model training apparatus, which includes a program for executing the object detection model training method.
One of the above technical solutions has the following advantages or beneficial effects:
in the embodiment of the invention, the loss function is determined from the intersection-over-union ratio of the object truth bounding box and the object detection bounding box, so that the accuracy of the object detection model in detecting objects can be improved.
The foregoing summary is provided for the purpose of description only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of the present invention will be readily apparent by reference to the drawings and following detailed description.
Drawings
In the drawings, like reference numerals refer to the same or similar parts or elements throughout the several views unless otherwise specified. The figures are not necessarily to scale. It is appreciated that these drawings depict only some embodiments in accordance with the disclosure and are therefore not to be considered limiting of its scope.
FIG. 1 shows a flow diagram of an object detection model training method according to an embodiment of the invention.
FIG. 2 shows a flow diagram of an object detection model training method according to an embodiment of the invention.
FIG. 3 shows a flow diagram of an object detection model training method according to an embodiment of the invention.
FIG. 4 shows a flow diagram of an object detection model training method according to an embodiment of the invention.
Fig. 5 is a block diagram showing a structure of an object detection model training apparatus according to an embodiment of the present invention.
Fig. 6 is a block diagram showing a structure of an object detection model training apparatus according to an embodiment of the present invention.
Fig. 7 shows a block diagram of the structure of an object detection model training apparatus according to an embodiment of the present invention.
Detailed Description
In the following, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments may be modified in various different ways, all without departing from the spirit or scope of the present invention. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
Fig. 1 shows a flow chart of a training method of an object detection model according to an embodiment of the invention. As shown in fig. 1, the training method of the object detection model includes:
step S11: and determining an object true value bounding box according to the labeling data of the detection target.
Step S12: and inputting the detection target into an object detection model, and obtaining an object detection surrounding frame according to detection data output by the object detection model.
Step S13: and determining the intersection and combination ratio of the object true value surrounding frame and the object detection surrounding frame according to the object true value surrounding frame and the object detection surrounding frame.
Step S14: and determining a loss function value according to the intersection ratio.
Step S15: and optimizing the object detection model according to the loss function value.
In object detection technology, an object is usually represented by a 2D (two-dimensional) or 3D (three-dimensional) bounding box with parameters, specifically the center, dimensions, and orientation of the bounding box. The task of the object detection problem is therefore to narrow the difference between the annotation data and the detection data. Compared with the detection data, the annotation data accurately represents information such as the position and angle of the object in the detection target, and can serve as reference data for identifying the object in the detection target. In the embodiment of the present invention, the execution order of step S11 and step S12 is not limited; the two steps may be executed simultaneously or in either order.
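As a concrete illustration of such a parameterized representation, a minimal sketch follows; the class and field names (center coordinates, dimensions, yaw) are assumptions for illustration, not the patent's notation:

```python
from dataclasses import dataclass

@dataclass
class Box3D:
    """Illustrative parameterized 3D bounding box: center, dimensions, orientation."""
    cx: float      # center x-coordinate
    cy: float      # center y-coordinate
    cz: float      # center z-coordinate
    length: float  # dimension along the heading direction
    width: float   # dimension across the heading direction
    height: float  # vertical dimension
    yaw: float     # orientation: heading angle in radians
```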
Typically, object detection frameworks use a squared loss or an absolute loss to optimize the object detection model. However, these general loss functions simply perform direct regression, using object attributes such as the center position, length, width, height, and deflection angle as regression parameters. In the embodiment of the invention, the loss function is instead determined from the intersection-over-union ratio of the object truth bounding box and the object detection bounding box, so that the accuracy of the object detection model in detecting objects can be improved.
In the embodiment of the invention, the detection target can be an image or a three-dimensional point cloud.
In one embodiment, in a case where there is an overlap between the object truth bounding box and the object detection bounding box, determining the loss function value according to the intersection-over-union ratio includes:
determining a loss function value according to the following formula:
L = 1 - IoU (Formula 1);
where L is the loss function value and IoU is the intersection-over-union ratio.
In an embodiment of the present invention, IoU (Intersection over Union) may represent the ratio of the intersection to the union of the object detection bounding box and the truth bounding box in an image. Compared with the squared loss and the absolute loss, determining the loss function value with IoU takes every parameter of the box into account when optimizing the object detection model, so a model optimized with this loss value detects objects more accurately, reducing the difference between the detection data and the true values. Second, the calculation of IoU explicitly involves the relationships between parameters, so the embodiment of the present invention can exploit those relationships when optimizing the model, as opposed to using independent per-parameter loss values. Third, IoU is itself a parameter relevant to object detection, making it well suited as a loss function for an object detection model.
In one embodiment, in a case where there is no overlap between the object truth bounding box and the object detection bounding box, determining the loss function value according to the intersection-over-union ratio comprises:
determining a loss function value according to the following formula:
L = 1 - GIoU (Formula 2);
wherein:
GIoU = IoU - (Area_C - Area_U) / Area_C;
where GIoU is the generalized intersection-over-union ratio, Area_C is the area of the smallest enclosing box containing both the object truth bounding box and the object detection bounding box, Area_U is the area of the union of the object truth bounding box and the object detection bounding box, and IoU is the intersection-over-union ratio.
The optimization method of the object detection model provided by the embodiment of the invention is suitable for 2D object detection and 3D object detection in a detection target.
In the embodiment of the present invention, when there is an intersection between the detection bounding box and the truth bounding box, the IoU value is calculated by the following formula:
IoU = (A∩B) / (A∪B) (Formula 3);
where A denotes the truth bounding box and B denotes the detection bounding box.
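A minimal sketch of Formulas 1-3 and the GIoU variant for axis-aligned 2D boxes follows, assuming boxes are given as (x_min, y_min, x_max, y_max) tuples; the function names are illustrative:

```python
def iou_giou_2d(a, b):
    """IoU and GIoU of two axis-aligned 2D boxes (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy                                   # A ∩ B (Formula 3 numerator)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter                   # A ∪ B (Formula 3 denominator)
    iou = inter / union
    # Smallest enclosing box C containing both A and B.
    area_c = ((max(a[2], b[2]) - min(a[0], b[0])) *
              (max(a[3], b[3]) - min(a[1], b[1])))
    giou = iou - (area_c - union) / area_c
    return iou, giou

def bbox_loss(a, b):
    """L = 1 - IoU when the boxes overlap; L = 1 - GIoU otherwise (Formulas 1 and 2)."""
    iou, giou = iou_giou_2d(a, b)
    return 1.0 - iou if iou > 0.0 else 1.0 - giou
```

For example, bbox_loss((0, 0, 2, 2), (1, 1, 3, 3)) takes the overlapping-box branch (IoU = 1/7), while bbox_loss((0, 0, 1, 1), (2, 2, 3, 3)) falls back to the GIoU branch because the intersection is empty.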
Fig. 2 shows a flow chart of an object detection model training method according to an embodiment of the present invention. In this embodiment, for steps S11-S15, reference may be made to the related descriptions in the above embodiments, which are not repeated here.
The difference from the above embodiment is that, as shown in fig. 2, optimizing the object detection model according to the loss function value includes:
step S21: and carrying out back propagation calculation according to the loss function value to obtain a back propagation calculation gradient value.
Step S22: and calculating gradient values according to the back propagation, and optimizing parameters of the object detection model.
In the back propagation calculation process, the loss function value calculated by the loss function calculation method provided by the embodiment of the invention can be used in any known back propagation procedure.
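As a sketch of how such a loss value can drive one step of a standard back propagation loop (PyTorch is assumed here as the framework; `model`, `optimizer`, and `differentiable_iou` are hypothetical placeholders, not names from the patent):

```python
import torch  # framework assumed for autograd-based back propagation

def train_step(model, optimizer, sample, gt_box):
    """One optimization step: forward pass, IoU loss, back propagation, update."""
    pred_box = model(sample)                    # object detection bounding box (step S12)
    iou = differentiable_iou(pred_box, gt_box)  # hypothetical IoU in torch ops (step S13)
    loss = 1.0 - iou                            # loss function value, Formula 1 (step S14)
    optimizer.zero_grad()
    loss.backward()                             # back propagation computes gradient values (step S21)
    optimizer.step()                            # optimize model parameters with the gradients (step S22)
    return loss.item()
```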
In one embodiment, determining the intersection-over-union ratio of the object truth bounding box and the object detection bounding box according to the object truth bounding box and the object detection bounding box includes:
determining the intersection and union of the object detection bounding box and the object truth bounding box;
and determining the intersection-over-union ratio according to the intersection and the union.
For example, when both the detection bounding box and the truth bounding box are two-dimensional boxes, in Formula 3 above, A∩B denotes the area of the intersection of the detection bounding box and the truth bounding box, and A∪B denotes the area of their union. When the detection bounding box and the truth bounding box are both three-dimensional boxes, in Formula 3 above, A∩B denotes the volume of the intersection of the detection bounding box and the truth bounding box, and A∪B denotes the volume of their union.
In a specific embodiment, when the detection bounding box and the truth bounding box are both 2D boxes, the IoU value may be calculated from the areas of the overlapping and merged portions of the two bounding boxes, that is:
A∩B = Area_overlap, where Area_overlap denotes the area of the overlapping portion of A and B;
A∪B = Area_A + Area_B - Area_overlap, i.e. the area of the union of A and B.
Because the value range of IoU is between 0 and 1, the value range of the loss value is also between 0 and 1.
In a specific embodiment, when both the detection bounding box and the truth bounding box are 3D boxes, the IoU value is calculated from the volumes of the overlapping and merged portions of the two bounding boxes, that is:
A∩B = Area_overlap × h_overlap, where Area_overlap denotes the area of the overlapping portion of A and B, and h_overlap denotes the height of the overlapping portion of A and B;
A∪B = (Area_A + Area_B - Area_overlap) × h_union, i.e. the volume of the union of A and B, where h_union denotes the height of the union of A and B.
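A sketch of this 3D computation for axis-aligned boxes, assuming a (x_min, y_min, z_min, x_max, y_max, z_max) format; the union here is computed as Vol_A + Vol_B - Vol_overlap, which agrees with the factorized formula above when both boxes span the same vertical extent:

```python
def iou_3d(a, b):
    """IoU of two axis-aligned 3D boxes (x_min, y_min, z_min, x_max, y_max, z_max)."""
    ix = max(0.0, min(a[3], b[3]) - max(a[0], b[0]))  # overlap extent along x
    iy = max(0.0, min(a[4], b[4]) - max(a[1], b[1]))  # overlap extent along y
    iz = max(0.0, min(a[5], b[5]) - max(a[2], b[2]))  # h_overlap: overlap along z
    inter = ix * iy * iz                              # Area_overlap x h_overlap
    vol_a = (a[3] - a[0]) * (a[4] - a[1]) * (a[5] - a[2])
    vol_b = (b[3] - b[0]) * (b[4] - b[1]) * (b[5] - b[2])
    union = vol_a + vol_b - inter                     # volume of the union of A and B
    return inter / union
```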
In one embodiment, the annotation data comprises an object angle annotation value and the detection data comprises an object angle detection value.
In a specific embodiment, when the long and wide sides of the detection bounding box and the truth bounding box are parallel to the coordinate axes, as shown in FIG. 3, neither the labeling data nor the detection data contains an angle value for the object, and the shaded part represents the overlapping portion. The labeling data includes the coordinate values of the four corners of the object truth bounding box: C1(x1, y1), D1(x2, y1), E1(x2, y2), F1(x1, y2). The detection data includes the coordinate values of the four corners of the object detection bounding box: C2(x'1, y'1), D2(x'2, y'1), E2(x'2, y'2), F2(x'1, y'2), where x1 ≤ x2, y2 ≤ y1, x'1 ≤ x'2, and y'2 ≤ y'1.
The intersection of A and B is calculated by the following formula:
A∩B = Area_overlap = (min(x2, x'2) - max(x1, x'1)) × (min(y1, y'1) - max(y2, y'2)).
The union of A and B is calculated by the following formula: A∪B = Area_A + Area_B - Area_overlap;
where Area_A = (x2 - x1) × (y1 - y2) and Area_B = (x'2 - x'1) × (y'1 - y'2).
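For illustration, take hypothetical boxes with A having corners (0, 3), (4, 3), (4, 0), (0, 0) and B having corners (2, 5), (6, 5), (6, 1), (2, 1). Then Area_overlap = (min(4, 6) - max(0, 2)) × (min(3, 5) - max(0, 1)) = 2 × 2 = 4, Area_A = 4 × 3 = 12, Area_B = 4 × 4 = 16, A∪B = 12 + 16 - 4 = 24, and IoU = 4/24 ≈ 0.167, giving a loss value of about 0.833.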
In a specific embodiment, when the long and wide sides of the detection bounding box and the truth bounding box are not parallel to the coordinate axes, as shown in FIG. 4, the labeling data includes the coordinate values of the four corners of the object truth bounding box: C1(x1, y1), D1(x2, y1), E1(x2, y2), F1(x1, y2), and the detection data includes the coordinate values of the four corners of the object detection bounding box: C3(x3, y3), D3(x4, y3), E3(x4, y4), F3(x3, y4).
[The intersection and union formulas for this non-axis-aligned case appear only as images in the original publication.]
The object detection model training method provided by the embodiment of the present invention was tested with data from a public data set and shows a marked improvement over prior-art object detection model training methods. Public data from the PointPillars data set were selected for the test and divided into easy, moderate, and hard categories according to detection difficulty. AP (Average Precision) and mAP (mean Average Precision) were used to evaluate the test quality. The test results are shown in Table 1 below:
[Table 1: test results, reproduced only as images in the original publication.]
As can be seen from Table 1, the object detection model training method provided by the embodiment of the present invention, which determines the loss function value from the intersection-over-union ratio and then optimizes the object detection model according to that loss function value, noticeably improves object detection accuracy on the public data set compared with prior-art approaches that optimize the object detection model with other loss function values.
An embodiment of the present invention further provides an object detection model training apparatus, which has a structure shown in fig. 5, and includes:
a truth value module 51, configured to determine an object truth bounding box according to the labeling data of a detection target;
a detection module 52, configured to input the detection target into an object detection model and obtain an object detection bounding box according to detection data output by the object detection model;
an intersection-over-union calculation module 53, configured to determine the intersection-over-union ratio of the object truth bounding box and the object detection bounding box according to the object truth bounding box and the object detection bounding box;
a loss function calculation module 54, configured to determine a loss function value according to the intersection-over-union ratio;
and an optimization module 55, configured to optimize the object detection model according to the loss function value.
In one embodiment, in the case that there is an overlap between the object truth bounding box and the object detection bounding box, the loss function calculation module is configured to:
determining a loss function value according to the following formula:
L=1-IoU;
where L is the loss function value and IoU is the intersection-over-union ratio.
In one embodiment, in the case that there is no overlap between the object truth bounding box and the object detection bounding box, the loss function calculation module is configured to:
determining a loss function value according to the following formula:
L=1-GIoU;
wherein:
GIoU = IoU - (Area_C - Area_U) / Area_C;
where Area_C is the area of the smallest enclosing box containing both the object truth bounding box and the object detection bounding box, Area_U is the area of the union of the object truth bounding box and the object detection bounding box, and IoU is the intersection-over-union ratio.
In one embodiment, as shown in fig. 6, the optimization module comprises:
a back propagation calculation unit 61, configured to perform back propagation calculation according to the loss function value to obtain back-propagation gradient values;
and a reverse processing unit 62, configured to optimize parameters of the object detection model according to the back-propagation gradient values.
In one embodiment, the intersection-over-union calculation module comprises:
an intersection and union calculation unit, configured to determine the intersection and union of the object detection bounding box and the object truth bounding box;
and an intersection and union processing unit, configured to determine the intersection-over-union ratio according to the intersection and the union.
In one embodiment, any edge of the object truth bounding box is non-parallel to any edge of the object detection bounding box.
The functions of each module in each apparatus in the embodiments of the present invention may refer to the corresponding description in the above method, and are not described herein again.
Fig. 7 shows a block diagram of the structure of an apparatus according to an embodiment of the present invention. As shown in fig. 7, the apparatus includes a memory 910 and a processor 920, where the memory 910 stores a computer program operable on the processor 920. When executing the computer program, the processor 920 implements the object detection model training method in the above embodiments. There may be one or more memories 910 and one or more processors 920.
The apparatus/device/terminal/server further comprises:
and a communication interface 930 for communicating with an external device to perform interactive data transmission.
Memory 910 may include high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
If the memory 910, the processor 920, and the communication interface 930 are implemented independently, the memory 910, the processor 920, and the communication interface 930 may be connected to each other through a bus and communicate with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 7, but this does not mean there is only one bus or one type of bus.
Optionally, in an implementation, if the memory 910, the processor 920 and the communication interface 930 are integrated on a chip, the memory 910, the processor 920 and the communication interface 930 may complete communication with each other through an internal interface.
An embodiment of the present invention provides a computer-readable storage medium, which stores a computer program, and the computer program is used for implementing the method of any one of the above embodiments when being executed by a processor.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer readable storage medium. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive various changes or substitutions within the technical scope of the present invention, and these should be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (14)

1. An object detection model training method, comprising:
determining an object truth bounding box according to the labeling data of a detection target;
inputting the detection target into an object detection model, and obtaining an object detection bounding box according to detection data output by the object detection model;
determining the intersection-over-union ratio of the object truth bounding box and the object detection bounding box according to the object truth bounding box and the object detection bounding box;
determining a loss function value according to the intersection-over-union ratio;
and optimizing the object detection model according to the loss function value.
2. The method of claim 1, wherein, in a case where there is an overlap between the object truth bounding box and the object detection bounding box, determining the loss function value according to the intersection-over-union ratio comprises:
determining a loss function value according to the following formula:
L=1-IoU;
where L is the loss function value and IoU is the intersection-over-union ratio.
3. The method of claim 1, wherein, in a case where there is no overlap between the object truth bounding box and the object detection bounding box, determining the loss function value according to the intersection-over-union ratio comprises:
determining a loss function value according to the following formula:
L=1-GIoU;
wherein:
GIoU = IoU - (Area_C - Area_U) / Area_C;
where Area_C is the area of the smallest enclosing box containing both the object truth bounding box and the object detection bounding box, Area_U is the area of the union of the object truth bounding box and the object detection bounding box, and IoU is the intersection-over-union ratio.
4. The method of claim 1, wherein optimizing the object detection model according to the loss function value comprises:
performing back propagation calculation according to the loss function value to obtain back-propagation gradient values;
and optimizing parameters of the object detection model according to the back-propagation gradient values.
5. The method of claim 1, wherein determining the intersection-over-union ratio of the object truth bounding box and the object detection bounding box according to the object truth bounding box and the object detection bounding box comprises:
determining the intersection and union of the object detection bounding box and the object truth bounding box;
and determining the intersection-over-union ratio according to the intersection and the union.
6. The method according to any one of claims 1 to 5, wherein any edge of the object truth bounding box is non-parallel to any edge of the object detection bounding box.
7. An object detection model training device, comprising:
a truth value module, configured to determine an object truth bounding box according to the labeling data of a detection target;
a detection module, configured to input the detection target into an object detection model and obtain an object detection bounding box according to detection data output by the object detection model;
an intersection-over-union calculation module, configured to determine the intersection-over-union ratio of the object truth bounding box and the object detection bounding box according to the object truth bounding box and the object detection bounding box;
a loss function calculation module, configured to determine a loss function value according to the intersection-over-union ratio;
and an optimization module, configured to optimize the object detection model according to the loss function value.
8. The apparatus of claim 7, wherein, in a case where there is an overlap between the object truth bounding box and the object detection bounding box, the loss function calculation module is configured to:
determining a loss function value according to the following formula:
L=1-IoU;
where L is the loss function value and IoU is the intersection-over-union ratio.
9. The apparatus of claim 7, wherein, in a case where there is no overlap between the object truth bounding box and the object detection bounding box, the loss function calculation module is configured to:
determining a loss function value according to the following formula:
L=1-GIoU;
wherein:
GIoU = IoU - (Area_C - Area_U) / Area_C;
where Area_C is the area of the smallest enclosing box containing both the object truth bounding box and the object detection bounding box, Area_U is the area of the union of the object truth bounding box and the object detection bounding box, and IoU is the intersection-over-union ratio.
10. The apparatus of claim 7, wherein the optimization module comprises:
a back propagation calculation unit, configured to perform back propagation calculation according to the loss function value to obtain back-propagation gradient values;
and a reverse processing unit, configured to optimize parameters of the object detection model according to the back-propagation gradient values.
11. The apparatus of claim 7, wherein the intersection-over-union calculation module comprises:
an intersection and union calculation unit, configured to determine the intersection and union of the object detection bounding box and the object truth bounding box;
and an intersection and union processing unit, configured to determine the intersection-over-union ratio according to the intersection and the union.
12. The apparatus according to any one of claims 7-11, wherein any edge of the object truth bounding box is non-parallel to any edge of the object detection bounding box.
13. An object detection model optimization apparatus, comprising:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-6.
14. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 6.
CN201910659672.6A 2019-07-19 2019-07-19 Object detection model training method and device Pending CN112241675A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910659672.6A CN112241675A (en) 2019-07-19 2019-07-19 Object detection model training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910659672.6A CN112241675A (en) 2019-07-19 2019-07-19 Object detection model training method and device

Publications (1)

Publication Number Publication Date
CN112241675A true CN112241675A (en) 2021-01-19

Family

ID=74168009

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910659672.6A Pending CN112241675A (en) 2019-07-19 2019-07-19 Object detection model training method and device

Country Status (1)

Country Link
CN (1) CN112241675A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023037156A1 (en) * 2021-09-13 2023-03-16 Sensetime International Pte. Ltd. Data processing methods, apparatuses and systems, media and computer devices

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117831A (en) * 2018-09-30 2019-01-01 北京字节跳动网络技术有限公司 The training method and device of object detection network
CN109145931A (en) * 2018-09-03 2019-01-04 百度在线网络技术(北京)有限公司 object detecting method, device and storage medium
CN109271984A (en) * 2018-07-24 2019-01-25 广东工业大学 A kind of multi-faceted license plate locating method based on deep learning
CN109285180A (en) * 2018-08-31 2019-01-29 电子科技大学 A kind of road vehicle tracking of 3D
CN109472264A (en) * 2018-11-09 2019-03-15 北京字节跳动网络技术有限公司 Method and apparatus for generating object detection model
CN109872366A (en) * 2019-02-25 2019-06-11 清华大学 Object dimensional method for detecting position and device based on depth fitting degree assessment network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271984A (en) * 2018-07-24 2019-01-25 广东工业大学 A kind of multi-faceted license plate locating method based on deep learning
CN109285180A (en) * 2018-08-31 2019-01-29 电子科技大学 A kind of road vehicle tracking of 3D
CN109145931A (en) * 2018-09-03 2019-01-04 百度在线网络技术(北京)有限公司 object detecting method, device and storage medium
CN109117831A (en) * 2018-09-30 2019-01-01 北京字节跳动网络技术有限公司 The training method and device of object detection network
CN109472264A (en) * 2018-11-09 2019-03-15 北京字节跳动网络技术有限公司 Method and apparatus for generating object detection model
CN109872366A (en) * 2019-02-25 2019-06-11 清华大学 Object dimensional method for detecting position and device based on depth fitting degree assessment network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HAMID REZATOFIGHI et al.: "Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression", ARXIV:1902.09630V1, pages 1 - 9 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023037156A1 (en) * 2021-09-13 2023-03-16 Sensetime International Pte. Ltd. Data processing methods, apparatuses and systems, media and computer devices

Similar Documents

Publication Publication Date Title
US10872439B2 (en) Method and device for verification
EP3620981B1 (en) Object detection method, device, apparatus and computer-readable storage medium
CN109214980B (en) Three-dimensional attitude estimation method, three-dimensional attitude estimation device, three-dimensional attitude estimation equipment and computer storage medium
CN108629231B (en) Obstacle detection method, apparatus, device and storage medium
US9135710B2 (en) Depth map stereo correspondence techniques
CN109946680B (en) External parameter calibration method and device of detection system, storage medium and calibration system
CN104025180B (en) There are five dimension rasterisations of conserved boundary
US20140152776A1 (en) Stereo Correspondence and Depth Sensors
WO2021051344A1 (en) Method and apparatus for determining lane lines in high-precision map
KR20170068462A (en) 3-Dimensional Model Generation Using Edges
CN110782517B (en) Point cloud labeling method and device, storage medium and electronic equipment
CN111145139A (en) Method, device and computer program for detecting 3D objects from 2D images
CN111179351B (en) Parameter calibration method and device and processing equipment thereof
CN114187589A (en) Target detection method, device, equipment and storage medium
CN112241675A (en) Object detection model training method and device
CN113759348A (en) Radar calibration method, device, equipment and storage medium
CN103837135A (en) Workpiece detecting method and system
CN112446374B (en) Method and device for determining target detection model
EP3605463B1 (en) Crossing point detector, camera calibration system, crossing point detection method, camera calibration method, and recording medium
Wiemann et al. An evaluation of open source surface reconstruction software for robotic applications
CN109583511B (en) Speed fusion method and device
CN114266879A (en) Three-dimensional data enhancement method, model training detection method, three-dimensional data enhancement equipment and automatic driving vehicle
CN113191279A (en) Data annotation method, device, equipment, storage medium and computer program product
EP3467764A1 (en) Image processing method and image processing apparatus
CN113129437B (en) Method and device for determining space coordinates of markers

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210119