WO2022183484A1

WO2022183484A1 - Method and apparatus for determining object detection model

Info

Publication number: WO2022183484A1
Application number: PCT/CN2021/079304
Authority: WO
Inventors: 高鲁涛; 罗达新
Original assignee: 华为技术有限公司
Priority date: 2021-03-05
Filing date: 2021-03-05
Publication date: 2022-09-09
Also published as: CN116508073A

Abstract

The present application provides a method and apparatus for determining an object detection model, applicable in the field of intelligent driving, autonomous driving or unmanned driving of intelligent vehicles. The method comprises: obtaining, on the basis of a first object detection model, a first object detection box of an image to be detected, determining parameter information of the first object detection box, and determining parameter information of a target ground truth box, wherein the target ground truth box is an actual boundary contour of an object in said image, and the parameter information of the first object detection box and the parameter information of the target ground truth box both comprise parameters related to object ranging of the object; determining, according to the parameter information of the first object detection box and the parameter information of the target ground truth box, a deviation between the first object detection box and the target ground truth box; and determining a second object detection model according to the deviation. The parameters related to object ranging are used to modify the first object detection model, thereby improving the accuracy of object ranging.

Description

Method and device for determining target detection model

technical field

The present application relates to the field of target detection, and more particularly, to a method and apparatus for determining a target detection model.

Background technique

With the development of society, more and more machines in modern life are developing towards automation and intelligence, and vehicles for mobile travel are no exception. Intelligent vehicles are gradually entering people's daily life. For example, in vehicles that include unmanned systems, object detection can be achieved. For example, target detection may include obstacle recognition, traffic light recognition, and sign recognition.

The task of object detection is an important task in the field of machine vision. The purpose of object detection is to find out the object of interest in an image or video and detect the location of the object. The target detection is implemented based on the target detection model, and the target detection model can be constructed based on a series of training samples. In the training phase of the target detection model, the loss function can measure the gap between the target position detected by the target detection model and the real position of the target. According to the difference between the two, the parameters of the target detection model are modified in the direction of decreasing the loss function, so that the target detection model detects that the target position is getting closer and closer to the real position.

The task after target detection may be target ranging, that is, determining the distance between the target and itself according to the position of the target obtained by target detection. However, since target detection and target ranging are two separate tasks, often in the training phase of the target detection model, the design of the loss function only considers the needs of target detection and does not consider the needs of subsequent target ranging, which is likely to cause subsequent The error of target ranging is large.

SUMMARY OF THE INVENTION

The present application provides a method and device for determining a target detection model, which can improve the accuracy of target ranging.

A first aspect provides a method for determining a target detection model, where the target detection model is used for target detection, the method comprising: acquiring a first target detection frame according to a first target detection model, the first target detection frame is the boundary contour obtained by performing target detection on the image to be detected; determine the parameter information of the first target detection frame, and the parameter information of the first target detection frame includes parameters related to target ranging; The parameter information of the value box, the target ground-truth box is the actual boundary contour of the target in the image to be detected, and the parameter information of the target ground-truth box includes the target range related to the target object. parameter; according to the parameter information of the first target detection frame and the parameter information of the target ground-truth frame, determine a loss item, and the loss item is used to indicate the difference between the first target detection frame and the target ground-truth frame A second target detection model is determined according to the loss term, and the second target detection model is used to determine a second target detection frame.

The first target detection frame can be understood as the position of the target in the image to be detected obtained through target detection.

In an achievable manner, obtaining the first target detection frame according to the first target detection model may include: inputting the image to be detected into the first target detection model, and the first target detection model detects the target object based on a corresponding algorithm The coordinate value of the area occupied in the image to be detected, and according to the coordinate value of the area occupied by the target in the image to be detected, the boundary contour of the target object, that is, the first target detection frame is obtained.

Exemplarily, the coordinate value of the area occupied by the target object in the image to be detected may include the coordinate value of the diagonal corner of the area occupied by the target object in the image to be detected.

Exemplarily, the coordinate value of the area occupied by the target object in the image to be detected may include the width, height and coordinate value of the center point of the area occupied by the target object in the image to be detected.

Among them, the target ground-truth frame can be understood as the actual position of the target in the image to be detected.

In an achievable manner, the actual boundary contour of the target is manually marked.

Specifically, the coordinate value of the area actually occupied by the target object in the image to be detected can be manually marked, so as to obtain the actual boundary contour of the target object according to the coordinate value of the area actually occupied by the target object in the image to be detected, that is, Target ground truth box.

Exemplarily, the human can label the coordinate values of the diagonal corners of the area occupied by the target in the image to be detected.

Exemplarily, the human can annotate the width, height and coordinate value of the center point of the area occupied by the target in the image to be detected.

In an achievable manner, if the determined loss item is smaller than the preset value, it is considered that the position of the target detected by the target detection model corresponding to the current loss item is very close to the real position of the target. At this time, the second target detection model determined according to the loss term is the target detection model. Wherein, the second target detection model is the first target detection model, and if target detection is performed again on the image to be detected based on the second target detection model, the obtained second target detection frame of the target object is the first target detection frame.

In another achievable manner, if the determined loss term is greater than or equal to the preset value, the first target detection model needs to be corrected N (N≥1, N is a positive integer) times until the Nth time The loss term obtained by the revised first target detection model is smaller than the preset value, that is, it is considered that the position of the target detected by the Nth revised first target detection model (for example, the second target detection model) is very close to the real position of the target . At this time, the second target detection model is the target detection model. Wherein, the second target detection model is not the first target detection model. And if target detection is performed again on the image to be detected based on the second target detection model, the obtained second target detection frame of the target object is different from the first target detection frame. And compared with the first target detection frame, the second target detection frame is closer to the target ground-truth frame.

First, perform target detection on the image to be detected based on the first target detection model, obtain a first target detection frame, determine the parameter information of the first target detection frame, and obtain the parameter information of the actual boundary contour of the target in the image to be detected, that is, The parameter information of the target ground-truth frame, wherein the parameter information of the first target detection frame and the parameter information of the target ground-truth frame both include parameters related to the target ranging for the target; secondly, according to the parameters of the first target detection frame The information and the parameter information of the target detection frame are used to determine the loss item between the first target detection frame and the target ground-truth frame; finally, the second target detection model is determined according to the loss item. Therefore, in the process of target detection, the parameters related to the target ranging are used to determine the deviation between the first target detection frame and the target true value frame, and according to the deviation, the first target detection model is modified to obtain The second target detection model, which can improve the accuracy of target ranging.

With reference to the first aspect, in some implementations of the first aspect, the parameter information of the first target detection frame includes the coordinate value of the center point of the bottom edge of the first target detection frame, the parameter of the target ground-truth frame The information includes the coordinate value of the center point of the bottom edge of the target ground truth box.

With reference to the first aspect, in some implementations of the first aspect, the target ranging is ranging by using a landing point ranging method.

With reference to the first aspect, in some implementations of the first aspect, the parameter information of the first target detection frame includes the area of the first target detection frame, and the parameter information of the target ground-truth frame includes the target The area of the ground truth box.

With reference to the first aspect, in some implementations of the first aspect, the target ranging is ranging by using a proportional ranging method.

With reference to the first aspect, in some implementations of the first aspect, determining the loss term according to the parameter information of the first target detection frame and the parameter information of the target ground-truth frame includes: according to the first target The parameter information of the detection frame, the parameter information of the target ground-truth frame, and the parameters of the minimum enclosing rectangular frame between the first target detection frame and the target ground-truth frame are used to determine the loss term.

With reference to the first aspect, in some implementations of the first aspect, the parameter of the minimum enclosing rectangle includes the length of the diagonal of the minimum enclosing rectangle.

With reference to the first aspect, in some implementations of the first aspect, the parameter information of the first target detection frame, the parameter information of the target ground-truth frame, and the first target detection frame and the The parameters of the minimum enclosing rectangular box of the target ground-truth box include: determining the loss term according to the following formula:

Wherein, point c is the center point of the bottom edge of the target detection frame, point c ^gt is the center point of the bottom edge of the target ground-truth frame, and the ρ(c,c ^gt ) is the point c and the point c ^gt is the distance between points, and the s is the length of the diagonal of the minimum enclosing rectangle.

With reference to the first aspect, in some implementations of the first aspect, the parameter of the minimum enclosing rectangle includes an area of the minimum enclosing rectangle.

Wherein, a1 is the area of the target detection frame, a2 is the area of the target ground-truth frame, and a3 is the area of the minimum circumscribed rectangular frame.

In a second aspect, an apparatus for determining a target detection model is provided, where the target detection model is used for target detection, and the apparatus includes: an obtaining unit configured to obtain a first target detection frame according to a first target detection model, the The first target detection frame is the boundary contour obtained by performing target detection on the image to be detected; the processing unit is used to determine the parameter information of the first target detection frame, and the parameter information of the first target detection frame includes the same as the target detection frame. parameters related to target ranging; the processing unit is further configured to determine the parameter information of the target ground truth frame, the target ground truth frame is the actual boundary contour of the target object in the image to be detected, and the target ground truth frame is The parameter information of the value box includes parameters related to target ranging for the target; the processing unit is further configured to, according to the parameter information of the first target detection frame and the parameter information of the target true value frame, determining a loss term, where the loss term is used to indicate a deviation between the first target detection frame and the target ground-truth frame; the processing unit is further configured to determine a second target detection model according to the loss term, The second target detection model is used to determine a second target detection frame.

First, the acquisition unit performs target detection on the image to be detected based on the first target detection model to obtain a first target detection frame; the processing unit determines the parameter information of the first target detection frame and the parameter information of the actual boundary contour of the target in the to-be-detected image , that is, the parameter information of the target ground-truth frame, wherein the parameter information of the first target detection frame and the parameter information of the target ground-truth frame both include parameters related to target ranging; The parameter information of the detection frame and the parameter information of the target detection frame determine the loss term between the first target detection frame and the target ground truth frame; finally, the processing unit also determines the second target detection model according to the loss term. Therefore, in the process of target detection, the parameters related to the target ranging are used to determine the deviation between the first target detection frame and the target true value frame, and according to the deviation, the first target detection model is modified to obtain The second target detection model, which can improve the accuracy of target ranging.

With reference to the second aspect, in some implementations of the second aspect, the parameter information of the first target detection frame includes the coordinate value of the center point of the bottom edge of the first target detection frame, the parameter of the target ground-truth frame The information includes the coordinate value of the center point of the bottom edge of the target ground truth box.

With reference to the second aspect, in some implementations of the second aspect, the target ranging is ranging by using a landing point ranging method.

With reference to the second aspect, in some implementations of the second aspect, the parameter information of the first target detection frame includes the area of the first target detection frame, and the parameter information of the target ground-truth frame includes the target The area of the ground truth box.

With reference to the second aspect, in some implementations of the second aspect, the target ranging is ranging by using a proportional ranging method.

With reference to the second aspect, in some implementations of the second aspect, the processing unit is further specifically configured to: according to the parameter information of the first target detection frame, the parameter information of the target ground-truth frame, and the The parameters of the minimum enclosing rectangle between the first target detection frame and the target ground-truth frame determine the loss term.

With reference to the second aspect, in some implementations of the second aspect, the parameter of the minimum enclosing rectangle includes the length of the diagonal of the minimum enclosing rectangle.

With reference to the second aspect, in some implementations of the second aspect, the processing unit is further specifically configured to: determine the loss term according to the following formula:

With reference to the second aspect, in some implementations of the second aspect, the parameter of the minimum enclosing rectangle includes an area of the minimum enclosing rectangle.

With reference to the second aspect, in some implementations of the second aspect, the actual boundary contour of the target object is manually marked.

A third aspect provides an apparatus for determining a target detection model, comprising at least one memory and at least one processor, wherein the at least one memory is used for storing a program, and the at least one processor is used for running the program to realize the first aspect the method described.

It should be understood that a program may also be referred to as program code, computer instructions, computer programs, program instructions, or the like.

In a fourth aspect, a chip is provided, comprising at least one processor and an interface circuit, the interface circuit is configured to provide program instructions or data for the at least one processor, and the at least one processor is configured to execute the program instructions, to implement the method described in the first aspect.

Optionally, as an implementation manner, the chip system may further include a memory, in which a program is stored, the processor is configured to execute the program stored in the memory, and when the program is executed, the The processor is configured to perform the method in the first aspect.

In a fifth aspect, a computer-readable storage medium is provided, where the computer-readable medium stores a program code for execution by a device, and when the program code is executed by the device, the method described in the first aspect is implemented.

In a sixth aspect, a computer program product is provided, the computer program product comprising a computer program, when the computer program product is executed by a computer, the computer performs the method in the aforementioned first aspect.

It should be understood that, in this application, the method of the first aspect may specifically refer to the method in the first aspect and any one of the various implementation manners of the first aspect.

In a seventh aspect, a terminal is provided, including the apparatus for determining a target detection model according to the second aspect or the third aspect.

Further, the terminal may be an intelligent transportation device (vehicle or drone), a smart home device, an intelligent manufacturing device, or a robot, and the like. The intelligent transportation device may be, for example, an automated guided vehicle (AGV), or an unmanned transportation vehicle.

Description of drawings

FIG. 1 is a schematic diagram of an example of an application scenario of the technical solution provided by the embodiment of the present application.

Figure 2 is a schematic diagram of a target detection frame and a target ground-truth frame.

FIG. 3 is a schematic diagram of another target detection frame and target ground-truth frame.

Figure 4 is a schematic diagram of a set of object detection boxes and object ground-truth boxes.

FIG. 5 is a schematic diagram of another target detection frame and target ground-truth frame.

Figure 6 is a schematic diagram of another set of target detection boxes and target ground-truth boxes.

FIG. 7 is a schematic diagram of distance measurement based on a similar triangle method of landing points.

Figure 8 is a schematic diagram of another set of target detection boxes and target ground-truth boxes.

FIG. 9 is a schematic flowchart of a method for determining a target detection model provided by the present application.

FIG. 10 is a schematic diagram of a set of first target detection frames and target ground-truth frames provided by the present application.

FIG. 11 is a schematic diagram of another set of first target detection frames and target ground-truth frames provided by this application.

FIG. 12 is a schematic diagram of a set of target detection frames and target ground-truth frames provided by this application.

FIG. 13 is a schematic diagram of another set of target detection frames and target ground-truth frames provided by this application.

FIG. 14 is a schematic diagram of comparison of target ranging based on different target detection methods.

FIG. 15 is a schematic structural diagram of an apparatus for determining a target detection model provided by an embodiment of the present application.

FIG. 16 is a schematic structural diagram of another apparatus for determining a target detection model provided by an embodiment of the present application.

Detailed ways

The technical solutions in the present application will be described below with reference to the accompanying drawings.

The terms used in the following embodiments are for the purpose of describing particular embodiments only, and are not intended to be limitations of the present application. As used in the specification of this application and the appended claims, the singular expressions "a," "an," "the," "above," "the," and "the" are intended to also include. Such expressions as "one or more" unless the context clearly dictates otherwise. It should also be understood that, in the following embodiments of the present application, "at least one" and "one or more" refer to one, two or more than two.

References to "one embodiment" or "some embodiments" or the like described in the embodiments of the present application mean that a particular feature, structure or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," "in other embodiments," etc. in various places in this specification are not necessarily All refer to the same embodiment, but mean "one or more but not all embodiments" unless specifically emphasized otherwise. The terms "including", "including", "having" and their variants mean "including but not limited to" unless specifically emphasized otherwise.

Hereinafter, scenarios to which the technical solutions of the embodiments of the present application may be applied will be introduced with reference to FIG. 1 .

FIG. 1 shows a schematic diagram of an example application scenario of the target detection model provided by the embodiment of the present application. As shown in FIG. 1 , the application scenario 100 may include a vehicle 111 and a vehicle 112 driving in front of the vehicle 111 . Wherein, the vehicle 111 is provided with the target detection model determined by the method 200 , then the vehicle 111 can perform target detection on an object in front of the vehicle 111 (eg, the vehicle 112 ) based on the target detection model. After the target detection, the vehicle 111 may also perform target ranging on the vehicle 112, so as to allow the driver of the vehicle 111 to perceive the condition of the road ahead and make a driving strategy (eg, path planning) in advance.

It should be understood that the vehicle 111 and the vehicle 112 are schematically shown in FIG. 1 only for ease of understanding, but this should not constitute any limitation to the present application. For example, the scene shown in FIG. 1 may further include more objects (including devices), which is not limited in this application.

With the development of society, more and more machines in modern life are developing towards automation and intelligence, and vehicles for mobile travel are no exception. Intelligent vehicles are gradually entering people's daily life. In recent years, the advanced driving assistant system (ADAS) has played a very important role in intelligent vehicles. It uses various sensors installed on the vehicle to sense the surroundings at any time during the driving process of the vehicle. environment, collect data, identify, detect and track stationary and moving objects, and combine with the map data of the navigator to carry out systematic calculation and analysis, so as to make drivers aware of possible dangers in advance and effectively increase the driving experience of the car. Comfort and safety.

For example, in a vehicle including an unmanned system, vision-based sensors (eg, on-board cameras), radar-type sensors (eg, on-board millimeter-wave radar, on-board lidar, on-board ultrasonic radar), and the like may be installed. Due to the low cost and relatively mature technology of the camera, it is the first to become the main sensor of the unmanned system. For example, ADAS implements functions such as adaptive cruise control (ACC), automatic emergency braking (AEB), lane change assist (LCA), blind spot monitoring (BSD), etc. Can't do without the camera.

According to the images collected by the camera, many ADAS functions can be realized, such as lane line detection, freespace detection, obstacle recognition, traffic light recognition, sign recognition and other functions.

Among them, functions such as lane line detection and freespace detection generally use the semantic segmentation model in the machine learning algorithm to give classification information to pixels belonging to lane lines or pixels belonging to freespace in the image.

Among them, the functions of obstacle recognition, traffic light recognition and sign recognition generally use the target detection model in the machine learning algorithm to achieve target detection.

The task of object detection is an important task in the field of machine vision. The purpose of object detection is to find out the object of interest in the image or video (the object can be called the object), and simultaneously realize the position of the output object and the object classification of the object. For example, the method of outputting the object category of the target and the minimum bounding box of the target on the image (the image involved in this application refers to the image of the target object detected by the terminal device) is called 2D target detection; The target detection of the object category and the length, width, height and rotation angle of the target in the three-dimensional space is called 3D target detection.

Target detection, also known as target extraction, is a kind of target positioning based on target geometric and statistical characteristics, which combines target positioning and recognition into one, and its accuracy and real-time performance are the most important for the entire system that needs to achieve target positioning. an important capability. Especially in complex scenes, when multiple targets need to be processed in real time, the automatic extraction and recognition of targets is particularly important.

Exemplarily, the object detection model corresponding to the obstacle recognition will detect the position of the obstacle in the image, and give the category of the obstacle (for example, vehicle, pedestrian, etc.). For example, the target detection model corresponding to obstacle recognition can be based on the YOLO algorithm to obtain the position of the obstacle from the image collected by the camera, and the obtained position of the obstacle is generally represented by a rectangular frame. At the same time, the category information of the obstacle and the confidence information corresponding to the category can also be obtained from the image collected by the camera. For example, if the obstacle category is a vehicle, the confidence level corresponding to the obstacle category is a vehicle is 90%.

In the 2D object detection process, the bounding box is usually used to describe the position of the detected object. The detection frame can be a rectangular frame, which can be determined by the coordinates of the rectangular frame. For example, it can be determined by the coordinates of the opposite corners of the rectangular box.

For example, FIG. 2 is a schematic diagram of a 2D detection frame provided by an embodiment of the present application. The dashed box (which may be referred to as a target detection frame) as shown in FIG. 2 is the position of the target obtained through target detection. The solid line box shown in Figure 2 (which can be called the target ground truth box) is the actual position of the target.

The embodiments of the present application do not limit how to implement target detection, and target detection can be performed by using an existing method or a method after technical development.

It should be understood that in the process of 2D target detection, the bounding box is predicted by the target detection model. The target detection model will predict the position and size of a detection frame relative to the reference point, the type of objects in the detection frame and the confidence of whether there is an object. , and the object class confidence.

The target detection is implemented based on the target detection model, and the target detection model can be constructed based on a series of training samples. Exemplarily, the target detection model can be obtained by training through multiple images and the position of the target in each of the multiple images. When any image is input into the target detection model, the target detection model can detect the position of the target in the image through a corresponding algorithm to obtain the position of the detected target.

In the early stage of building the target detection model, there may be insufficient training samples, and the constructed target detection model may not be able to detect the position of the target well, resulting in the target position detected by the target detection model being far away from the real position of the target, that is, the target The detection accuracy of the detection model is not high. Therefore, it is necessary to revise the target detection model to obtain a target detection model with higher detection accuracy. Among them, the loss function (Loss Function) can be used to measure the gap between the position of the target detected by the target detection model and the real position of the target. If the value output by the loss function is larger, the gap between the position of the target detected by the target detection model and the real position of the target is larger, and if the value output by the loss function is smaller, the position of the target detected by the target detection model is the same as the target. closer to the true location. Therefore, the process of revising the target detection model can be understood as a process of reducing the loss function. When the value output by the loss function is smaller than the preset value, it is considered that the position of the target detected by the target detection model is very close to the real position of the target.

Hereinafter, taking Mode 1 and Mode 2 as examples, we will introduce how to calculate the loss term in the loss function.

Mode 1, the calculation method of the loss function considers the distance between the center point of the target detection frame and the center point of the target ground-truth frame.

For example, as shown in FIG. 3 , the loss term output by the loss function of the target detection frame 101 and the target ground-truth frame 102 can be:

Among them, ρ(A1, A2) is the distance between point A1 and point A2, point A1 is the center point of the target detection frame 101, point A2 is the target ground truth frame 102, and s1 is the target detection frame 101 and the target ground truth frame. The length of the diagonal of the smallest enclosing rectangle 103 of 102 (eg, the distance between point A3 and point A4).

In the construction of the target detection model corresponding to FIG. 3 , the loss function of the target detection frame 101 and the target ground-truth frame 102 can be gradually reduced, that is, the center point of the target detection frame 101 is continuously approached to the center point of the target ground-truth frame 102, to Improve the detection accuracy of the target detection model corresponding to Figure 3. For example, the loss term output by the loss function of the target detection frame 101 and the target ground-truth frame 102 shown in (a) of FIG. 4 , the target detection frame 101 and the target ground-truth frame 102 shown in (b) of FIG. 4 The loss term output by the loss function, the loss term output by the loss function of the target detection frame 101 and the target ground-truth frame 102 shown in (c) in FIG. 4 decrease sequentially.

In method 2, the calculation method of the loss function considers the distance between the aspect ratio of the target detection frame and the aspect ratio of the target ground-truth frame.

For example, as shown in FIG. 5 , the loss term output by the loss function of the target detection frame 103 and the target ground-truth frame 104 can be:

α is a weight parameter, w2 is the width of the target ground truth frame 104 , h2 is the height of the target ground truth frame 104 , w1 is the width of the target detection frame 103 , and h1 is the height of the target detection frame 103 .

In the construction of the target detection model corresponding to FIG. 6 , the loss function of the target detection frame 103 and the target ground-truth frame 104 can be gradually reduced, that is, the aspect ratio of the target detection frame 101 is constantly approaching the aspect ratio of the target ground-truth frame 102 , to improve the detection accuracy of the target detection model corresponding to Figure 5. For example, the loss term output by the loss function of the target detection frame 103 and the target ground-truth frame 104 shown in (a) of FIG. 6 , the target detection frame 103 and the target ground-truth frame 104 shown in (b) of FIG. 6 The loss term output by the loss function of , the loss term output by the loss function of the target detection frame 103 and the target ground-truth frame 104 shown in (c) in FIG. 6 decrease sequentially.

After the target detection model is revised, target detection can be performed based on the revised target detection model to obtain the position or size of the target. Then, the distance to the target, ie, target ranging, can be measured based on the target's location or size, in order to make a driving strategy (eg, path planning) in advance.

The following, taking the landing point ranging method and the proportional ranging method as an example, introduces how to determine the distance between the target and itself, that is, the target ranging, according to the position or size of the target detection frame obtained by the target detection in the image.

Among them, the landing point ranging method takes the landing point similar triangle ranging method and the landing point coordinate transformation ranging method as examples, and introduces how to determine the distance between the target and itself according to the position or size of the target detection frame obtained by target detection in the image. .

(1) Landing point similar triangle ranging method

The landing point similarity triangle ranging method uses the triangle similarity relationship to determine the distance between the target and itself.

Specifically, for example, as shown in FIG. 7 , the camera on the vehicle 111 is located at point p, the direction of the optical axis of the camera is parallel to the ground, and I is the imaging plane of the camera. According to the triangle similarity relationship, we can get:

Among them, y is the distance between the projection point of the target landing point in the image and the optical center of the image, the unit is pixel, f is the focal length, the unit is pixel, H is the height of the camera from the ground, the unit is m, and Z is the target landing point distance The horizontal distance of the camera, in m.

Then, the horizontal distance Z from the target landing point to the camera is:

Then Z can be considered as the distance between the vehicle 112 and the vehicle 111 .

In general, the projection point of the target landing point in the image can be equivalent to the midpoint of the bottom edge of the target detection frame obtained based on the target detection. For example, the midpoint of the bottom edge of the target detection frame as shown in FIG. 2 is O point. That is, the above y is the distance from the midpoint of the bottom edge of the target detection frame obtained by target detection to the optical center of the image. In this way, the vehicle 111 can determine the distance of the vehicle 112 from itself based on the target detection and formula (2).

(2) Landing point coordinate transformation ranging method

The landing point coordinate transformation ranging method is based on the image coordinates of the projection point of the target landing point in the image, combined with the camera internal parameter matrix and the camera external parameter matrix, the distance between the target and itself can be obtained.

Like the landing point similar triangle ranging method, in general, in the landing point coordinate transformation ranging method, the projection point of the target landing point in the image can also be equivalent to the bottom edge of the target detection frame obtained based on target detection. point.

(3) Proportional ranging method

The proportional ranging method uses the proportional relationship between the real object size and the image target size to measure the distance.

In general, the distance between the target and itself is determined according to the proportional relationship between the area of the real object under a certain viewing angle and the imaging area of the target in the image. For example, the size of the target detection frame obtained by target detection is 2m × 2m. As the target distance itself goes from near to far, the size of the target detection frame in the image will be scaled down. If the real size of the target is known and The proportional relationship can be used to obtain the distance between the target and itself.

Based on the methods described in (1) to (3) above, the target ranging task after target detection, that is, detecting the distance of the target, can be completed.

Since target detection and target ranging are two separate processes, often target detection only considers the needs of target detection and does not consider the needs of subsequent target ranging, which is likely to cause large errors in subsequent target ranging.

For example, in the target detection task, since the distance between the center point of the target detection frame 106 and the center point of the target ground-truth frame 105 as shown in (a) of FIG. 8 is equal to that shown in (b) of FIG. 8 The distance between the target detection frame 107 and the target ground-truth frame 105 shown in FIG. The distance is equal to the distance between the aspect ratio of the target detection frame 107 and the aspect ratio of the target ground-truth frame 105 as shown in (b) of FIG. , the loss items of the target detection frame 106 and the target ground-truth frame 105 shown in (a) of FIG. 8 and the loss items of the target detection frame 106 and the target ground-truth frame 107 shown in (b) of FIG. 8 all the same.

However, in the target ranging task, if based on the methods described in (1) and (2) above, the projection point of the target landing point in the image is equivalent to the midpoint of the bottom edge of the target detection frame obtained based on target detection. , to complete the task of target ranging. Since the midpoint of the bottom edge of the target detection frame 106 is closer to the midpoint of the bottom edge of the target ground truth frame 105 than the midpoint of the bottom edge of the target detection frame 107 , the target obtained based on the target detection shown in (b) in FIG. In the detection frame 107, the error of the target distance obtained by the measurement is larger than the target distance obtained by the measurement in the target detection frame 106 based on the target detection shown in (a) in FIG. 8 .

Therefore, the embodiment of the present application provides a method for determining a target detection model, and the target detection model is used for target detection, so that the target detection model determined by this method can improve the accuracy of target detection, and can also improve the target detection after target detection. distance accuracy.

In the embodiments of the present application, the upper left corner of the image to be detected is taken as the coordinate origin, the horizontal direction of the image to be detected from left to right is the positive direction of the x-axis, and the vertical direction of the image to be detected from top to bottom is the y-axis In the positive direction, the size of the image to be detected in the x-axis direction is the width of the image to be detected, and the size of the image to be detected in the y-axis direction is the height of the image to be detected.

In the embodiments of the present application, the target detection frame (for example, the first target detection frame and the second target detection frame) and the target ground-truth frame are taken as an example for description.

Hereinafter, the method for determining a target detection model provided by the embodiments of the present application will be introduced with reference to specific drawings.

For example, FIG. 9 is a schematic flowchart of a method 200 for determining a target detection model provided by an embodiment of the present application. As shown in Figure 9, the method 200 includes:

S210: Acquire a first target detection frame according to the first target detection model. The first target detection frame is a boundary contour obtained by performing target detection on the image to be detected.

In some embodiments, the image to be detected may be input into a first target detection model, and the first target detection model detects the coordinate value of the area occupied by the target in the image to be detected based on a corresponding algorithm, so that according to the The coordinate value of the area occupied by the target in the image to be detected is used to obtain the boundary contour of the target, that is, the first target detection frame. That is, the first target detection frame can be understood as the position of the target object in the image to be detected obtained through target detection.

In the embodiments of the present application, the targets all refer to the same target in the image to be detected.

Exemplarily, the coordinate value of the area occupied by the target object in the image to be detected may include the coordinate value of the diagonal corner of the area occupied by the target object in the image to be detected. For example, the coordinate values of the diagonal corners of the area occupied by the target in the image to be detected include the coordinate values of the upper left corner (x1, y1) and the coordinate value of the lower right corner (x2, y2) of the area occupied by the target in the image to be detected ).

Exemplarily, the coordinate values of the area occupied by the target in the image to be detected may include the width W ₁ , the height H ₁ of the area occupied by the target in the image to be detected, and the coordinate values of the center point (x _o1 , y _o1 ) . Among them, the center point can be understood as the center symmetry point of the area occupied by the target in the image to be detected.

In some embodiments, the first device (the device performing S220 ) includes a camera, and the image to be detected is captured by the camera. In other embodiments, the first device (the device performing S220 ) does not include a camera, and the first device may be obtained from other devices capable of obtaining images to be detected.

In some embodiments, a plurality of first target detection frames may be acquired according to the first target detection model, and S220 to S250 are performed for each first target detection frame.

S220: Determine parameter information of the first target detection frame. The parameter information of the first target detection frame includes parameters related to target ranging for the target.

Exemplarily, in the embodiment where the target ranging is based on the landing point similar triangle ranging method or the landing point coordinate transformation ranging method, that is, the target ranging needs to obtain the implementation of the bottom edge midpoint of the first target detection frame. For example, the parameter information of the first target detection frame may include the coordinate value of the center point of the bottom edge of the first target detection frame. Thus, the accuracy of target ranging can be improved.

For example, if the coordinate value of the area occupied by the target in the image to be detected includes the coordinate value (x1, y1) of the upper left corner and the coordinate value (x2, y2) of the lower right corner of the area occupied by the target in the image to be detected, Then the coordinate value of the center point of the bottom edge of the target detection frame is (

y2).

For another example, if the coordinate values of the area occupied by the target in the image to be detected include the width W ₁ and height H ₁ of the area occupied by the target in the image to be detected and the coordinate values of the center point (x _o1 , y _o1 ), Then the coordinate value of the center point of the bottom edge of the target detection frame is (x _o1 ,

).

Exemplarily, in the embodiment in which the target ranging is completed based on the proportional ranging method, that is, in the embodiment in which the target detection needs to obtain the area of the target detection frame (which can also be understood as the size), the parameter information of the first target detection frame is Including the area of the first target detection frame. Thus, the accuracy of target ranging can be improved.

For example, if the coordinate value of the area occupied by the target in the image to be detected includes the coordinate value (x1, y1) of the upper left corner and the coordinate value (x2, y2) of the lower right corner of the area occupied by the target in the image to be detected, Then the area of the target detection frame is |x2-x1|×|y2-y1|.

For another example, if the coordinate values of the area occupied by the target in the image to be detected include the width W ₁ and height H ₁ of the area occupied by the target in the image to be detected and the coordinate values of the center point (x _o1 , y _o1 ), Then the area of the target detection frame is W ₁ ×H ₁ .

S230: Determine parameter information of the target ground truth box. The target ground-truth frame is the actual boundary contour of the target in the image to be detected, and the parameter information of the target ground-truth frame includes parameters related to target ranging for the target.

In some embodiments, the coordinate value of the area actually occupied by the target object in the image to be detected may be manually marked, so as to obtain the actual boundary of the target object according to the coordinate value of the area actually occupied by the target object in the image to be detected Contour, that is, the target ground-truth box. That is, the target ground-truth frame can be understood as the actual position of the target in the image to be detected.

Exemplarily, the coordinate values of the diagonal corners of the area actually occupied by the target in the image to be detected may be manually marked. For example, the coordinates of the upper left corner (x3, y3) and the coordinates of the lower right corner (x4, y4) of the area actually occupied by the target in the image to be detected.

Exemplarily, the width W ₂ , the height H ₂ and the coordinate values (x _o2 , y _o2 ) of the area actually occupied by the target in the image to be detected may be manually marked.

In the embodiment in which the target ranging is completed based on the landing point similar triangle ranging method or the landing point coordinate transformation ranging method, that is, the embodiment in which the target ranging needs to obtain the midpoint of the bottom edge of the target detection frame, the target true value frame The parameter information includes the coordinate value of the center point of the bottom edge of the target ground truth box. Thus, the accuracy of target ranging can be improved.

For example, if the coordinate value of the area actually occupied by the target in the image to be detected includes the coordinate value of the upper left corner (x3, y3) and the coordinate value of the lower right corner (x4, y4) of the area actually occupied by the target in the image to be detected ), then the coordinate value of the center point of the bottom edge of the target ground-truth box is (

y3).

For another example, if the coordinate values of the diagonal corners of the area actually occupied by the target in the image to be detected include the width L ₂ and height w ₂ of the area actually occupied by the target in the image to be detected, and the coordinate value of the center point (x _o2 ) , y _o2 ), then the coordinate value of the center point of the bottom edge of the target truth box is (x _o2 ,

).

In the embodiment in which the target ranging is completed based on the proportional ranging method, that is, the target detection needs to obtain the area of the target detection frame (which can also be understood as the size), and the parameter information of the target ground-truth frame includes the target ground-truth frame. area. Thus, the accuracy of target ranging can be improved.

For example, if the coordinate value of the area actually occupied by the target in the image to be detected includes the coordinate value of the upper left corner (x3, y3) and the coordinate value of the lower right corner (x4, y4) of the area actually occupied by the target in the image to be detected ), then the area of the target ground-truth box is |x3-x4|×|y3-y4|.

For another example, if the coordinate values of the area actually occupied by the target in the image to be detected include the width L ₂ and height W ₂ of the area actually occupied by the target in the image to be detected, and the coordinates of the center point (x _o2 , y _o2 ), then the coordinate value of the center point of the bottom edge of the target ground-truth box is W ₂ ×H ₂ .

Optionally, the order of S220 and S230 is not limited in this embodiment of the present application.

S240: Determine a loss item according to the parameter information of the first target detection frame and the parameter information of the target ground-truth frame. Among them, the loss term is used to indicate the deviation between the first target detection frame and the target ground-truth frame.

Hereinafter, S240 will be described in detail by taking Mode A, Mode B, Mode C, and Mode D as examples.

Mode A, the loss term is determined according to the coordinate value of the center point of the bottom edge of the first target detection frame and the coordinate value of the center point of the bottom edge of the target true value frame.

The loss term is determined according to the following formula:

Among them, point c is the center point of the bottom edge of the first target detection frame, point c ^gt is the center point of the bottom edge of the target ground-truth frame, ρ(c, c ^gt ) is the distance between point c and point c ^gt , H ₂ is the height of the target ground truth box.

Mode B, the loss term is determined according to the coordinate value of the bottom center point of the first target detection frame, the coordinate value of the bottom center point of the target true value frame, and the length of the diagonal line of the minimum circumscribed rectangular frame.

The mode B specifically includes S11 and S12.

S11: Determine the length of the diagonal of the smallest circumscribed rectangular frame between the first target detection frame and the target ground truth frame.

Specifically, according to the coordinate value of the first target detection frame and the coordinate value of the target truth value frame, determine the coordinate value of the minimum bounding rectangle frame of the first target detection frame and the target truth value frame, and according to the minimum bounding rectangle frame Coordinate value to determine the length of the diagonal of the smallest bounding rectangle.

Exemplarily, if the coordinate value of the first target detection frame includes the coordinate value of the upper left corner (x1, y1), and the coordinate value of the lower right corner (x2, y2). If the coordinate value of the target ground-truth box includes the coordinate value of the upper left corner (x3, y3) and the coordinate value of the lower right corner (x4, y4). Then the coordinate value of the upper left corner of the minimum bounding rectangle of the first target detection frame and the target ground truth frame is (min[x1, x3], min[y1, y3]), and the coordinate value of the lower right corner is (max[x2, x4], max[y2, y4]). Thus the length of the diagonal of the smallest bounding rectangle is:

Exemplarily, if the coordinate value of the first target detection frame includes the width, height and coordinate value of the center point of the first target detection frame, the first target detection frame can be determined according to the width, height and coordinate value of the center point. The coordinate value of the upper left corner and the coordinate value of the lower right corner of a target detection frame. If the coordinate value of the target truth value box includes the width and height of the target truth value box and the coordinate value of the center point, the upper left corner of the target truth value box can be determined according to the width, height and coordinate value of the center point of the target truth value box. The coordinate value and the coordinate value of the lower right corner. Therefore, according to the above example, the length of the diagonal of the minimum circumscribed rectangular frame of the first target detection frame and the target ground-truth frame can be determined.

S12, the loss term is determined according to the following formula:

Among them, point c is the center point of the bottom edge of the first target detection frame, point c ^gt is the center point of the bottom edge of the target ground-truth frame, ρ(c, c ^gt ) is the distance between point c and point c ^gt , s is the length of the diagonal of the smallest bounding rectangle.

For example, FIG. 10 is a schematic diagram of a set of first target detection frames and target ground-truth frames provided by this application. As shown in FIG. 10 , where the coordinate value of the upper left corner B1 of the target truth value box 107 is (x _b1 , y _b1 ) and the coordinate value of the lower right corner B2 point is (x _b2 , y _b2 ), then the target truth value is The coordinate value of the center point c ^gt of the bottom edge of the frame 107 is (

y _b2 ). The coordinate value of the upper left corner B3 of the first target detection frame 108 is (x _b3 , y _b3 ) and the coordinate value of the lower right corner B4 point is (x _b4 , y _b4 ), then the center of the bottom edge of the first target detection frame 108 The coordinate value of point c is (

y _b4 ). Thus the distance between point c and point c ^gt

In addition, the minimum bounding rectangle of the target ground-truth frame 107 and the first target detection frame 108 is the target ground-truth frame 107, then the length of the diagonal of the minimum bounding rectangle of the target ground-truth frame 107 and the first target detection frame 108

Then according to method B, it can be obtained: the loss term between the target ground-truth frame 107 and the first target detection frame 108 is

Mode C, the loss term is determined according to the area of the first target detection frame, the area of the target ground-truth frame, and the area of the minimum circumscribed rectangular frame.

S21: Determine the area of the smallest circumscribed rectangular frame between the first target detection frame and the target ground truth frame.

Specifically, according to the coordinate value of the first target detection frame and the coordinate value of the target truth value frame, determine the coordinate value of the minimum enclosing rectangular frame of the first target detection frame and the target truth value frame, and according to the minimum enclosing rectangle frame Coordinate value to determine the area of the smallest bounding rectangle.

Exemplarily, if the coordinate value of the first target detection frame includes the coordinate value of the upper left corner (x1, y1), and the coordinate value of the lower right corner (x2, y2). If the coordinate value of the target ground-truth box includes the coordinate value of the upper left corner (x3, y3) and the coordinate value of the lower right corner (x4, y4). Then the coordinate value of the upper left corner of the minimum bounding rectangle of the first target detection frame and the target ground truth frame is (min[x1, x3], min[y1, y3]), and the coordinate value of the lower right corner is (max[x2, x4], max[y2, y4]). Thus the area of the smallest bounding rectangle is:

|min[x1,x3]-max[x2,x4]|×|min[y1,y3]-max[y2,y4]|

Exemplarily, if the coordinate value of the first target detection frame includes the width, height and coordinate value of the center point of the first target detection frame, the first target detection frame can be determined according to the width, height and coordinate value of the center point. The coordinate value of the upper left corner and the coordinate value of the lower right corner of a target detection frame. If the coordinate value of the target truth value box includes the width and height of the target truth value box and the coordinate value of the center point, the upper left corner of the target truth value box can be determined according to the width, height and coordinate value of the center point of the target truth value box. The coordinate value and the coordinate value of the lower right corner. Therefore, according to the above example, the area of the minimum enclosing rectangular frame of the first target detection frame and the target ground-truth frame can be determined.

S22, the loss term is determined according to the following formula:

Among them, a1 is the area of the first target detection frame, a2 is the area of the target ground-truth frame, and a3 is the area of the minimum bounding rectangle.

For example, FIG. 11 is a schematic diagram of a set of first target detection frames and target ground-truth frames provided by this application. As shown in FIG. 11 , the coordinate values of the upper left corner C1 of the target ground truth frame 109 are (x _c1 , y _c1 ) and the coordinate values of the lower right corner C2 point are (x _c2 , y _c2 ). Then the area a2 of the target ground truth box 109 is |x _c1 -x _c2 |×|y _c1 -y _c2 |. The coordinate value of point C3 in the upper left corner of the first target detection frame 110 is (x _c3 , y _c3 ) and the coordinate value of point C4 in the lower right corner is (x _c4 , y _c4 ), then the area a1 of the first target detection frame 110 is |x _c3 -x _c4 |×|y _c3 -y _c4 |. In addition, the minimum circumscribed rectangle frame of the target ground truth frame 109 and the first target detection frame 110 is frame 111, and the area a3 of the minimum circumscribed rectangular frame 111 is |x _c1 -x _c4 |×|y _c1 -y _c2 |. Then according to method c, we can obtain: the loss term between the target ground-truth frame 109 and the first target detection frame 110 is:

Mode D, the loss term is determined according to the area of the first target detection frame and the area of the target ground-truth frame.

The loss term is determined according to the following formula:

Among them, a1 is the area of the first target detection frame, and a2 is the area of the target ground-truth frame.

In some embodiments, at least two ways of determining the loss item in the prior art (for example, the above-mentioned way 1 or way 2), way A, way B, way C, and way D may also be used, Determine the loss term.

S250: Determine a second target detection model according to the loss term, where the second target detection model is used to determine a second target detection frame.

In some embodiments, if the determined loss item is smaller than the preset value, it is considered that the position of the target detected by the target detection model corresponding to the current loss item is very close to the real position of the target. At this time, according to the loss term obtained in S240, the determined second target detection model is the target detection model. The second target detection model is the first target detection model. If target detection is performed again on the image to be detected in S210 based on the second target detection model, the obtained second target detection frame of the target is the first target detection frame.

In other embodiments, if the determined loss term is greater than or equal to the preset value, S210 to S240 need to be repeatedly performed to perform N (N≥1, N is a positive integer) corrections on the first target detection model until the first target detection model is corrected based on The loss term obtained by the first target detection model after the Nth revision is smaller than the preset value, that is, it is considered that the position of the target detected by the first target detection model (for example, the second target detection model) after the Nth revision is very close to the target real location. At this time, the second target detection model is the target detection model. Wherein, the second target detection model is not the first target detection model. And if target detection is performed again on the image to be detected in S210 based on the second target detection model, the obtained second target detection frame of the target object is different from the first target detection frame. And compared with the first target detection frame, the second target detection frame is closer to the target ground-truth frame.

For example, if the determined loss item is greater than or equal to the preset value, the first target detection model needs to be modified for the first time according to the loss item to obtain the first modified first target detection model (for example, the third target detection model). detection model), and replace the first target detection model in S210 to S240 with the third target detection model, and repeat S210 to S240 to obtain the loss term again. If the loss term determined at this time is still greater than or equal to the preset value, the first target detection model needs to be revised for the second time to obtain the second revised first target detection model (for example, the second target detection model) , and replace the first target detection model in S210 to S240 with the second target detection model, and repeat S210 to S240 to obtain the loss item again, if the loss item determined at this time is less than the preset value. At this time, according to the loss term obtained in S240, the determined second target detection model is the target detection model. Wherein, the second target detection model is not the first target detection model. And if target detection is performed again on the image to be detected in S210 based on the second target detection model, the obtained second target detection frame of the target object is different from the first target detection frame. And compared with the first target detection frame, the second target detection frame is closer to the target ground-truth frame.

For example, FIG. 12 is a schematic diagram of a set of target detection boxes and target ground-truth boxes provided by this application. Among them, the target detection frame and the target ground-truth frame shown in (a) of FIG. 12 are obtained based on the target detection of the prior art. The target detection frame shown in (b) of FIG. 12 is obtained when the loss term determined based on the above-mentioned method B is smaller than the preset value.

For another example, FIG. 13 is a schematic diagram of another set of target detection frames and target ground-truth frames provided by this application. Among them, the target detection frame and the target ground-truth frame shown in (a) of FIG. 13 are obtained based on the target detection of the prior art. The target detection frame shown in (b) of FIG. 13 is obtained when the loss term determined based on the above-mentioned method C is smaller than the preset value.

The process of repeatedly performing the above S210 to S240 until the loss term is smaller than the preset value may be referred to as the training process of the first target detection model. Through the trained first target detection model (for example, the second target detection model), target detection can be performed on any graphics to be detected.

Exemplarily, the trained first target detection model may be applied to the terminal.

Further, the terminal may be an intelligent transportation device (vehicle or drone), a smart home device, an intelligent manufacturing device, or a robot, and the like. The intelligent transportation device can be, for example, an AGV or an unmanned transportation vehicle.

Exemplarily, after the trained first target detection model is applied to the terminal, the target ranging task can be performed.

Exemplarily, in order to improve the accuracy of target ranging, in the target detection model (eg, the second target detection model) determined based on the above method A or method B, the target ranging task can be performed based on the landing point ranging related algorithm. For example, the landing point ranging may be the landing point similarity triangle ranging method or the landing point coordinate transformation ranging method.

Exemplarily, in order to improve the accuracy of target ranging, in the target detection model (eg, the second target detection model) determined based on the above method C or method D, the target ranging task can be performed based on a proportional ranging method related algorithm.

Because in the method 200 for determining a target detection model provided by this embodiment of the present application, the loss term is determined based on the parameters related to target ranging (the coordinates of the bottom center point of the target detection frame or the area of the target detection frame), and the correction is made. The target detection model completes the training of the target detection model. Therefore, the method 200 can improve the accuracy of target detection, and can also improve the accuracy of target ranging.

For example, FIG. 14 is a schematic diagram of a comparison of target ranging based on different target detection models. Wherein, if the target ranging is performed based on the target detection model of the prior art, the determined distance between the vehicle 112 and the vehicle 111 is Z1. If target ranging is performed based on the target detection model determined in the above manner B, the determined distance between the vehicle 112 and the vehicle 111 is Z. Because in the above-mentioned method B provided by the embodiment of the present application, the target detection model is a loss item determined based on parameters related to target ranging, that is, the coordinate value of the center point of the bottom edge of the target detection frame. And in the subsequent target ranging, it is also based on the coordinate value of the center point of the bottom edge of the target detection frame, so that the position of the target can be detected more accurately, so that the target can be measured.

The method for determining the target detection model of the embodiment of the present application is described above with reference to FIGS. 1 to 14 , and the apparatus for determining the target detection model of the embodiment of the present application is described below with reference to FIGS. 15 and 16 . It should be understood that the description of the apparatus for determining the target detection model corresponds to the description of the method for determining the target detection model. Therefore, for parts not described in detail, reference may be made to the foregoing description of the method for determining the target detection model.

FIG. 15 is a schematic structural diagram of an apparatus for determining a target detection model provided by an embodiment of the present application. As shown in FIG. 15, the apparatus 300 for determining a target detection model includes an acquisition module 310 and a processing module 320, wherein,

Obtaining unit 310 is used to obtain the first target detection frame according to the first target detection model, and the first target detection frame is the boundary contour that the target detection is carried out to the image to be detected;

a processing unit 320, configured to determine parameter information of the first target detection frame, where the parameter information of the first target detection frame includes parameters related to target ranging;

The processing unit 320 is further configured to determine parameter information of the target ground-truth frame, where the target ground-truth frame is the actual boundary contour of the target in the to-be-detected image, and the parameter information of the target ground-truth frame includes: parameters related to target ranging for the target;

The processing unit 320 is further configured to determine a loss item according to the parameter information of the first target detection frame and the parameter information of the target ground-truth frame, where the loss item is used to indicate the first target detection frame and the deviation between the target ground-truth boxes;

The processing unit 320 is further configured to determine a second target detection model according to the loss term, where the second target detection model is used to determine a second target detection frame.

Optionally, the parameter information of the first target detection frame includes the coordinate value of the bottom center point of the first target detection frame, and the parameter information of the target truth frame includes the bottom center of the target truth frame. Point coordinate value.

Optionally, the target ranging is a ranging method using a landing point ranging method.

Optionally, the parameter information of the first target detection frame includes the area of the first target detection frame, and the parameter information of the target ground truth frame includes the area of the target ground truth frame.

Optionally, the target ranging is ranging by using a proportional ranging method.

Optionally, the processing unit 320 is further specifically configured to: according to the parameter information of the first target detection frame, the parameter information of the target ground-truth frame, and the first target detection frame and the target ground-truth The parameter of the minimum enclosing rectangular box of the box, which determines the loss term.

Optionally, the parameter of the minimum enclosing rectangle includes the length of the diagonal of the minimum enclosing rectangle.

Optionally, the processing unit 320 is further specifically configured to: determine the loss item according to the following formula:

Optionally, the parameter of the minimum enclosing rectangle includes the area of the minimum enclosing rectangle.

Optionally, the actual boundary contour of the target is manually marked.

The apparatus 400 for determining a target detection model includes at least one memory 410 and at least one processor 420, the at least one memory 410 is used for storing a program, and the at least one processor 420 is used for running the program to realize the aforementioned method 200.

It should be understood that the processor in the embodiments of the present application may be a central processing unit (central processing unit, CPU), and the processor may also be other general-purpose processors, digital signal processors (digital signal processors, DSP), application-specific integrated circuits (application specific integrated circuit, ASIC), off-the-shelf programmable gate array (field programmable gate array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

It should also be understood that the memory in the embodiments of the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically programmable Erase programmable read-only memory (electrically EPROM, EEPROM) or flash memory. Volatile memory may be random access memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of random access memory (RAM) are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (DRAM) Access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchronous connection dynamic random access memory Fetch memory (synchlink DRAM, SLDRAM) and direct memory bus random access memory (direct rambus RAM, DR RAM).

The descriptions of the processes corresponding to the above figures have their own emphasis, and for parts that are not described in detail in a certain process, please refer to the relevant descriptions of other processes.

Embodiments of the present application further provide a computer-readable storage medium, where the computer-readable storage medium has program instructions, and when the program instructions are directly or indirectly executed, the foregoing method can be implemented.

Embodiments of the present application also provide a computer program product containing instructions, which, when running on a computing device, enables the computing device to execute the foregoing method, or enables the computing device to implement the foregoing apparatus for determining a target detection model function.

An embodiment of the present application further provides a chip, including at least one processor and an interface circuit, where the interface circuit is configured to provide program instructions or data for the at least one processor, and the at least one processor is configured to execute the program instructions , so that the above method can be realized.

An embodiment of the present application further provides a terminal, including the aforementioned apparatus for determining a target detection model.

The above embodiments may be implemented in whole or in part by software, hardware, firmware or any other combination. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, all or part of the processes or functions described in the embodiments of the present application are generated. The computer may be a general purpose computer, special purpose computer, computer network, or other programmable device. The computer instructions may be stored in or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer instructions may be downloaded from a website site, computer, server or data center Transmission to another website site, computer, server or data center by wire (eg, infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, a data center, or the like that contains a set of one or more available media. The usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVDs), or semiconductor media. The semiconductor medium may be a solid state drive.

Those skilled in the art can clearly understand that, for the convenience and brevity of description, the specific working process of the above-described systems, devices and units may refer to the corresponding processes in the foregoing method embodiments, which will not be repeated here.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.

The above are only specific embodiments of the present application, but the protection scope of the present application is not limited to this. should be covered within the scope of protection of this application. Therefore, the protection scope of the present application should be subject to the protection scope of the claims.

Claims

A method for determining a target detection model, wherein the target detection model is used for target detection, wherein the method comprises:

Obtain a first target detection frame according to the first target detection model, where the first target detection frame is a boundary contour obtained by performing target detection on the image to be detected;

determining parameter information of the first target detection frame, where the parameter information of the first target detection frame includes parameters related to target ranging;

Determine the parameter information of the target ground-truth frame, the target ground-truth frame is the actual boundary contour of the target in the image to be detected, and the parameter information of the target ground-truth frame includes and is related to the target detection of the target. distance-related parameters;

According to the parameter information of the first target detection frame and the parameter information of the target ground-truth frame, a loss item is determined, and the loss item is used to indicate the difference between the first target detection frame and the target ground-truth frame. deviation;

A second object detection model is determined according to the loss term, and the second object detection model is used to determine a second object detection frame.
The method according to claim 1, wherein the parameter information of the first target detection frame includes a coordinate value of a bottom center point of the first target detection frame, and the parameter information of the target ground-truth frame includes all the The coordinate value of the center point of the bottom edge of the target ground truth box.
The method according to claim 2, wherein the target ranging is a ranging method using a landing point ranging method.
The method according to claim 1, wherein the parameter information of the first target detection frame includes an area of the first target detection frame, and the parameter information of the target ground-truth frame includes the target ground-truth frame area.
The method according to claim 4, characterized in that, the target ranging is ranging by using a proportional ranging method.
The method according to any one of claims 2 to 5, wherein,

According to the parameter information of the first target detection frame and the parameter information of the target ground-truth frame, the loss term is determined, including:

The loss term is determined according to the parameter information of the first target detection frame, the parameter information of the target ground-truth frame, and the parameters of the minimum enclosing rectangular frame between the first target detection frame and the target ground-truth frame.
The method according to claim 6, wherein the parameter of the minimum enclosing rectangle includes the length of the diagonal of the minimum enclosing rectangle.
The method according to claim 7, wherein, according to the parameter information of the first target detection frame, the parameter information of the target ground-truth frame, and the first target detection frame and the target ground-truth The parameters of the minimum bounding rectangle of the box, including:

The loss term is determined according to the following formula:

Wherein, point c is the center point of the bottom edge of the target detection frame, point c gt is the center point of the bottom edge of the target ground-truth frame, and the ρ(c,c gt ) is the point c and the point c gt is the distance between points, and the s is the length of the diagonal of the minimum enclosing rectangle.
The method according to claim 6, wherein the parameter of the minimum enclosing rectangle includes the area of the minimum enclosing rectangle.
The method according to claim 9, wherein, according to the parameter information of the first target detection frame, the parameter information of the target ground-truth frame, and the first target detection frame and the target ground-truth The parameters of the minimum bounding rectangle of the box, including:

The loss term is determined according to the following formula:

Wherein, a1 is the area of the target detection frame, a2 is the area of the target ground-truth frame, and a3 is the area of the minimum circumscribed rectangular frame.
The method according to any one of claims 1 to 10, wherein the actual boundary contour of the target is manually marked.
A device for determining a target detection model, wherein the target detection model is used for target detection, wherein the device comprises:

an obtaining unit, configured to obtain a first target detection frame according to a first target detection model, where the first target detection frame is a boundary contour obtained by performing target detection on the image to be detected;

a processing unit, configured to determine parameter information of the first target detection frame, where the parameter information of the first target detection frame includes parameters related to target ranging;

The processing unit is further configured to determine the parameter information of the target ground truth frame, the target ground truth frame is the actual boundary contour of the target in the image to be detected, and the parameter information of the target ground truth frame includes and Performing target ranging related parameters on the target;

The processing unit is further configured to determine a loss item according to the parameter information of the first target detection frame and the parameter information of the target ground-truth frame, where the loss item is used to indicate the first target detection frame and the all target detection frame. describe the deviation between the target ground-truth boxes;

The processing unit is further configured to determine a second target detection model according to the loss term, where the second target detection model is used to determine a second target detection frame.
The device according to claim 12, wherein the parameter information of the first target detection frame includes a coordinate value of a bottom center point of the first target detection frame, and the parameter information of the target ground-truth frame includes all the The coordinate value of the center point of the bottom edge of the target ground truth box.
The device according to claim 13, wherein the target ranging is a ranging method using a landing point ranging method.
The device according to claim 12, wherein the parameter information of the first target detection frame comprises an area of the first target detection frame, and the parameter information of the target ground truth frame comprises the target ground truth frame area.
The device according to claim 15, wherein the target ranging is a ranging method using a proportional ranging method.
The device according to any one of claims 13 to 16, wherein the processing unit is further specifically configured to:

The loss term is determined according to the parameter information of the first target detection frame, the parameter information of the target ground-truth frame, and the parameters of the minimum enclosing rectangular frame between the first target detection frame and the target ground-truth frame.
The apparatus according to claim 17, wherein the parameter of the minimum circumscribed rectangle comprises the length of the diagonal of the minimum circumscribed rectangle.
The device according to claim 18, wherein the processing unit is further specifically configured to:

The loss term is determined according to the following formula:

Wherein, point c is the center point of the bottom edge of the target detection frame, point c gt is the center point of the bottom edge of the target ground-truth frame, and the ρ(c,c gt ) is the point c and the point c gt is the distance between points, and the s is the length of the diagonal of the minimum enclosing rectangle.
The apparatus according to claim 17, wherein the parameter of the minimum enclosing rectangle includes an area of the minimum enclosing rectangle.
The device according to claim 20, wherein the processing unit is further specifically configured to:

The loss term is determined according to the following formula:

Wherein, a1 is the area of the target detection frame, a2 is the area of the target ground-truth frame, and a3 is the area of the minimum circumscribed rectangular frame.
The device according to any one of claims 12 to 21, wherein the actual boundary contour of the target is manually marked.
A device for determining a target detection model, characterized in that it comprises at least one memory and at least one processor, the at least one memory is used for storing a program, and the at least one processor is used for running the program, so as to realize the method as claimed in the claims The method of any one of 1 to 11.
A computer-readable storage medium, characterized in that, a program or an instruction is stored on the computer-readable storage medium, and when the program or instruction is executed, a computer executes the method according to any one of claims 1 to 11. method.
A chip, characterized by comprising at least one processor and an interface circuit, wherein the interface circuit is configured to provide program instructions or data for the at least one processor, and the at least one processor is configured to execute the program instructions to A method as claimed in any one of claims 1 to 11 is implemented.