CN116508073A - Method and device for determining target detection model - Google Patents

Method and device for determining target detection model

Info

Publication number
CN116508073A
Authority
CN
China
Prior art keywords
target
target detection
frame
truth
parameter information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202180079541.6A
Other languages
Chinese (zh)
Inventor
高鲁涛
罗达新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN116508073A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Radar Systems Or Details Thereof (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a method and a device for determining a target detection model, which can be applied to the fields of intelligent driving, automatic driving or unmanned driving of intelligent vehicles. The method comprises the following steps: acquiring a first target detection frame of an image to be detected based on a first target detection model, determining parameter information of the first target detection frame, and determining parameter information of a target truth box, wherein the target truth box is the actual boundary contour of a target object in the image to be detected, and the parameter information of the first target detection frame and the parameter information of the target truth box both comprise parameters related to target ranging of the target object; determining the deviation between the first target detection frame and the target truth box according to the parameter information of the first target detection frame and the parameter information of the target truth box; and determining a second target detection model based on the deviation. Therefore, the first target detection model is corrected using the parameters related to target ranging, and the accuracy of target ranging can be improved.

Description

Method and device for determining target detection model

Technical Field
The present application relates to the field of object detection, and more particularly, to a method for determining an object detection model and an apparatus thereof.
Background
With the development of society, more and more machines in modern life are becoming automated and intelligent, and vehicles are no exception: intelligent vehicles are gradually entering people's daily lives. For example, a vehicle equipped with an unmanned driving system can implement target detection. For example, target detection may include obstacle recognition, traffic light recognition, sign recognition, and the like.
The task of object detection is an important task in the field of machine vision. Object detection aims at finding objects of interest in an image or video and detecting their positions. Target detection is implemented based on a target detection model, which may be constructed from a series of training samples. In the training stage of the target detection model, a loss function (Loss Function) measures the difference between the target position detected by the target detection model and the real target position. According to this difference, the parameters of the target detection model are corrected in the direction that decreases the loss function, so that the target position detected by the model becomes closer to the real position.
A task following target detection may be target ranging, i.e., determining the distance of the target from the observer according to the detected target position. However, since target detection and target ranging are two separate tasks, the design of the loss function in the training stage of the target detection model only considers the requirement of target detection and does not consider the requirement of subsequent target ranging, which easily causes larger errors in the subsequent target ranging.
Disclosure of Invention
The application provides a method and a device for determining a target detection model, which can improve the accuracy of target ranging.
In a first aspect, there is provided a method of determining an object detection model for object detection, the method comprising: acquiring a first target detection frame according to a first target detection model, wherein the first target detection frame is a boundary contour obtained by carrying out target detection on an image to be detected; determining parameter information of the first target detection frame, wherein the parameter information of the first target detection frame comprises parameters related to target ranging of a target object; determining parameter information of a target truth box, wherein the target truth box is an actual boundary outline of the target object in the image to be detected, and the parameter information of the target truth box comprises parameters related to target ranging of the target object; determining a loss term according to the parameter information of the first target detection frame and the parameter information of the target truth frame, wherein the loss term is used for indicating the deviation between the first target detection frame and the target truth frame; and determining a second target detection model according to the loss item, wherein the second target detection model is used for determining a second target detection frame.
The first target detection frame may be understood as a position of a target object obtained by target detection in an image to be detected.
In one implementation, acquiring the first target detection frame according to the first target detection model may include: inputting the image to be detected into a first target detection model, detecting the coordinate value of the area occupied by the object in the image to be detected based on a corresponding algorithm by the first target detection model, and obtaining the boundary contour of the object, namely a first target detection frame, according to the coordinate value of the area occupied by the object in the image to be detected.
For example, the coordinate values of the region occupied by the object in the image to be detected may include the coordinate values of two diagonal corners of the region occupied by the object in the image to be detected.
For example, the coordinate values of the area occupied by the object in the image to be detected may include the width, the height and the coordinate values of the center point of the area occupied by the object in the image to be detected.
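As an illustration only, the two coordinate representations mentioned above can be converted into each other. The following sketch assumes a coordinate system with the origin at the upper-left corner of the image and the y-axis pointing downward (as defined later in the detailed description); the function names are illustrative rather than part of this application.

```python
# Illustrative sketch: converting between the corner representation
# (x1, y1, x2, y2) and the width/height/center representation (cx, cy, w, h).

def corners_to_center(x1, y1, x2, y2):
    """Upper-left and lower-right corners -> center point, width and height."""
    w = abs(x2 - x1)
    h = abs(y2 - y1)
    return (x1 + x2) / 2.0, (y1 + y2) / 2.0, w, h

def center_to_corners(cx, cy, w, h):
    """Center point, width and height -> upper-left and lower-right corners."""
    return cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0

# Example: a box with upper-left corner (100, 150) and lower-right corner
# (180, 230) has center (140, 190), width 80 and height 80.
print(corners_to_center(100, 150, 180, 230))
```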
The target truth box can be understood as the actual position of the target object in the image to be detected.
In one possible implementation, the actual boundary contour of the object is manually marked.
Specifically, the coordinate value of the area actually occupied by the target object in the image to be detected can be marked manually, so that the actual boundary outline of the target object, namely the target truth box, is obtained according to the coordinate value of the area actually occupied by the target object in the image to be detected.
For example, the coordinate values of the diagonal corners of the region occupied by the target object in the image to be detected may be manually marked.
For example, the width, height and coordinate values of the center point of the region occupied by the target object in the image to be detected may be manually marked.
In one possible implementation manner, if the determined loss term is smaller than a preset value, the position of the target detected by the target detection model corresponding to the current loss term is considered to be very close to the real position of the target. In this case, the second target detection model determined according to the loss term is the first target detection model itself, and if target detection is performed again on the image to be detected based on the second target detection model, the obtained second target detection frame of the target object is the same as the first target detection frame.
In another implementation manner, if the determined loss term is greater than or equal to the preset value, the first target detection model needs to be corrected N times (N ≥ 1, N being a positive integer) until the loss term obtained based on the first target detection model after the N-th correction is smaller than the preset value; that is, the position of the target detected by the first target detection model after the N-th correction (i.e., the second target detection model) is considered to be very close to the real position of the target. In this case, the second target detection model is different from the first target detection model, and if target detection is performed again on the image to be detected based on the second target detection model, the obtained second target detection frame of the target object differs from the first target detection frame and is closer to the target truth box than the first target detection frame.
Firstly, target detection is performed on an image to be detected based on a first target detection model to obtain a first target detection frame, parameter information of the first target detection frame is determined, and parameter information of the actual boundary contour of the target object in the image to be detected, namely the parameter information of the target truth box, is obtained, wherein the parameter information of the first target detection frame and the parameter information of the target truth box both comprise parameters related to target ranging of the target object; secondly, a loss term between the first target detection frame and the target truth box is determined according to the parameter information of the first target detection frame and the parameter information of the target truth box; finally, a second target detection model is determined from the loss term. In the process of target detection, the deviation between the first target detection frame and the target truth box is determined using parameters related to target ranging, and the first target detection model is corrected according to the deviation to obtain the second target detection model, so that the accuracy of target ranging can be improved.
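Conceptually, the correction procedure described above can be sketched as a simple training loop. The sketch below is illustrative only: the model, optimizer and loss-term function are placeholders rather than components defined in this application, and the stopping condition follows the preset-value comparison described above.

```python
# Illustrative sketch of the correction loop: the first target detection model
# is corrected until the ranging-related loss term falls below a preset value;
# the resulting model is the second target detection model.

def determine_second_model(model, image, truth_box, loss_term_fn, optimizer,
                           preset_value, max_corrections=1000):
    for _ in range(max_corrections):
        detection_box = model.detect(image)            # current target detection frame
        loss = loss_term_fn(detection_box, truth_box)  # deviation from the target truth box
        if loss < preset_value:
            break                                      # detection is close enough to the truth box
        optimizer.step(model, loss)                    # correct model parameters to reduce the loss
    return model                                       # second target detection model
```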
With reference to the first aspect, in certain implementation manners of the first aspect, the parameter information of the first target detection frame includes a bottom center point coordinate value of the first target detection frame, and the parameter information of the target truth frame includes a bottom center point coordinate value of the target truth frame.
With reference to the first aspect, in certain implementations of the first aspect, the target ranging is ranging performed using a landed point ranging method.
With reference to the first aspect, in certain implementation manners of the first aspect, the parameter information of the first target detection box includes an area of the first target detection box, and the parameter information of the target truth box includes an area of the target truth box.
With reference to the first aspect, in certain implementations of the first aspect, the target ranging is ranging performed using a proportional ranging method.
With reference to the first aspect, in certain implementation manners of the first aspect, determining a loss term according to the parameter information of the first target detection box and the parameter information of the target truth box includes: and determining the loss item according to the parameter information of the first target detection frame, the parameter information of the target truth frame and the parameters of the minimum circumscribed rectangular frame of the first target detection frame and the target truth frame.
With reference to the first aspect, in certain implementations of the first aspect, the parameter of the minimum bounding rectangle frame includes a length of a diagonal of the minimum bounding rectangle frame.
With reference to the first aspect, in certain implementation manners of the first aspect, determining the loss term according to the parameter information of the first target detection frame, the parameter information of the target truth box and the parameters of the minimum circumscribed rectangular box of the first target detection frame and the target truth box includes: determining the loss term according to the following formula:

Loss = ρ²(c, c^gt) / s²

where point c is the bottom-edge center point of the first target detection frame, point c^gt is the bottom-edge center point of the target truth box, ρ(c, c^gt) is the distance between point c and point c^gt, and s is the length of the diagonal of the minimum circumscribed rectangular box.
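For illustration, assuming the loss term takes the normalized squared-distance form given above (boxes represented as (x1, y1, x2, y2) with the y-axis pointing downward), it could be computed as in the following sketch; this is an assumption-laden example, not the application's reference implementation.

```python
# Sketch (under stated assumptions): loss term based on the bottom-edge center
# points of the detection box and the truth box, normalized by the squared
# diagonal of their minimum circumscribed rectangle.

def bottom_center(box):
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2.0, y2            # y2 is the bottom edge (y grows downward)

def bottom_center_loss(det_box, gt_box):
    cx, cy = bottom_center(det_box)
    gx, gy = bottom_center(gt_box)
    rho2 = (cx - gx) ** 2 + (cy - gy) ** 2                 # squared distance rho^2(c, c_gt)
    ex1 = min(det_box[0], gt_box[0]); ey1 = min(det_box[1], gt_box[1])
    ex2 = max(det_box[2], gt_box[2]); ey2 = max(det_box[3], gt_box[3])
    s2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2               # squared diagonal s^2 of the enclosing box
    return rho2 / s2
```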
With reference to the first aspect, in certain implementations of the first aspect, the parameter of the minimum bounding rectangular box includes an area of the minimum bounding rectangular box.
With reference to the first aspect, in certain implementation manners of the first aspect, determining the loss term according to the parameter information of the first target detection frame, the parameter information of the target truth box and the parameters of the minimum circumscribed rectangular box of the first target detection frame and the target truth box includes: determining the loss term according to the following formula:
wherein a1 is the area of the target detection frame, a2 is the area of the target truth frame, and a3 is the area of the minimum circumscribed rectangular frame.
In a second aspect, there is provided an apparatus for determining an object detection model for object detection, the apparatus comprising: the acquisition unit is used for acquiring a first target detection frame according to the first target detection model, wherein the first target detection frame is a boundary contour obtained by carrying out target detection on an image to be detected; the processing unit is used for determining parameter information of the first target detection frame, wherein the parameter information of the first target detection frame comprises parameters related to target ranging of a target object; the processing unit is further configured to determine parameter information of a target truth box, where the target truth box is an actual boundary contour of the target object in the image to be detected, and the parameter information of the target truth box includes parameters related to performing target ranging on the target object; the processing unit is further configured to determine a loss term according to the parameter information of the first target detection frame and the parameter information of the target truth frame, where the loss term is used to indicate a deviation between the first target detection frame and the target truth frame; the processing unit is further configured to determine a second target detection model according to the loss term, where the second target detection model is used to determine a second target detection frame.
Firstly, the acquisition unit performs target detection on an image to be detected based on a first target detection model to obtain a first target detection frame; the processing unit determines parameter information of the first target detection frame and parameter information of the actual boundary contour of the target object in the image to be detected, namely the parameter information of the target truth box, wherein the parameter information of the first target detection frame and the parameter information of the target truth box both comprise parameters related to target ranging of the target object; secondly, the processing unit determines a loss term between the first target detection frame and the target truth box according to the parameter information of the first target detection frame and the parameter information of the target truth box; finally, the processing unit determines a second target detection model based on the loss term. In the process of target detection, the deviation between the first target detection frame and the target truth box is determined using parameters related to target ranging, and the first target detection model is corrected according to the deviation to obtain the second target detection model, so that the accuracy of target ranging can be improved.
With reference to the second aspect, in certain implementation manners of the second aspect, the parameter information of the first target detection frame includes a bottom center point coordinate value of the first target detection frame, and the parameter information of the target truth frame includes a bottom center point coordinate value of the target truth frame.
With reference to the second aspect, in certain implementations of the second aspect, the target ranging is ranging performed using a landed point ranging method.
With reference to the second aspect, in certain implementations of the second aspect, the parameter information of the first target detection box includes an area of the first target detection box, and the parameter information of the target truth box includes an area of the target truth box.
With reference to the second aspect, in some implementations of the second aspect, the target ranging is ranging performed using a proportional ranging method.
With reference to the second aspect, in certain implementation manners of the second aspect, the processing unit is further specifically configured to: and determining the loss item according to the parameter information of the first target detection frame, the parameter information of the target truth frame and the parameters of the minimum circumscribed rectangular frame of the first target detection frame and the target truth frame.
With reference to the second aspect, in certain implementations of the second aspect, the parameter of the minimum bounding rectangle frame includes a length of a diagonal of the minimum bounding rectangle frame.
With reference to the second aspect, in certain implementation manners of the second aspect, the processing unit is further specifically configured to determine the loss term according to the following formula:

Loss = ρ²(c, c^gt) / s²

where point c is the bottom-edge center point of the first target detection frame, point c^gt is the bottom-edge center point of the target truth box, ρ(c, c^gt) is the distance between point c and point c^gt, and s is the length of the diagonal of the minimum circumscribed rectangular box.
With reference to the second aspect, in certain implementations of the second aspect, the parameter of the minimum bounding rectangular box includes an area of the minimum bounding rectangular box.
With reference to the second aspect, in certain implementation manners of the second aspect, the processing unit is further specifically configured to determine the loss term according to the following formula:
wherein a1 is the area of the target detection frame, a2 is the area of the target truth frame, and a3 is the area of the minimum circumscribed rectangular frame.
With reference to the second aspect, in certain implementations of the second aspect, the actual boundary contour of the target is manually labeled.
In a third aspect, an apparatus for determining an object detection model is provided, comprising at least one memory for storing a program and at least one processor for running the program to implement the method of the first aspect.
It should be understood that the program may also be referred to as program code, computer instructions, computer programs, program instructions, etc.
In a fourth aspect, a chip is provided, comprising at least one processor and interface circuitry for providing program instructions or data to the at least one processor, the at least one processor being configured to execute the program instructions to implement the method of the first aspect.
Optionally, as an implementation manner, the chip may further include a memory storing a program, and the processor is configured to execute the program stored in the memory; when the program is executed, the processor is configured to perform the method of the first aspect.
In a fifth aspect, a computer readable storage medium is provided, the computer readable storage medium storing program code for execution by a device, which program code, when executed by the device, implements the method of the first aspect.
In a sixth aspect, there is provided a computer program product comprising a computer program which, when executed by a computer, performs the method of the first aspect.
It should be understood that, in this application, the method of the first aspect may specifically refer to the method of the first aspect and any implementation manner of the various implementation manners of the first aspect.
A seventh aspect provides a terminal, including the apparatus for determining an object detection model according to the second aspect or the third aspect.
Further, the terminal may be an intelligent transportation device (such as a vehicle or an unmanned aerial vehicle), an intelligent home device, an intelligent manufacturing device, a robot, or the like. The intelligent transportation device may be, for example, an automated guided vehicle (automated guided vehicle, AGV) or an unmanned transport vehicle.
Drawings
Fig. 1 is a schematic diagram of an application scenario of the technical solution provided in the embodiment of the present application.
Fig. 2 is a schematic diagram of a target detection box and a target truth box.
FIG. 3 is a schematic diagram of another target detection box and target truth box.
FIG. 4 is a schematic diagram of a set of target detection boxes and target truth boxes.
Fig. 5 is a schematic diagram of another target detection box and target truth box.
FIG. 6 is a schematic diagram of another set of target detection boxes and target truth boxes.
Fig. 7 is a schematic diagram of ranging based on the landing-point similar triangle method.
FIG. 8 is a schematic diagram of another set of target detection boxes and target truth boxes.
Fig. 9 is a schematic flow chart diagram of a method of determining a target detection model provided herein.
Fig. 10 is a schematic diagram of a set of first target detection boxes and target truth boxes provided herein.
FIG. 11 is a schematic diagram of another set of first target detection boxes and target truth boxes provided herein.
FIG. 12 is a schematic diagram of a set of target detection boxes and target truth boxes provided herein.
FIG. 13 is a schematic diagram of another set of target detection boxes and target truth boxes provided herein.
Fig. 14 is a comparative schematic diagram of target ranging based on different target detection methods.
Fig. 15 is a schematic structural diagram of an apparatus for determining an object detection model according to an embodiment of the present application.
Fig. 16 is a schematic structural diagram of another apparatus for determining an object detection model according to an embodiment of the present application.
Detailed Description
The technical solutions in the present application will be described below with reference to the accompanying drawings.
The terminology used in the following embodiments is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in the specification and the appended claims, the singular forms "a", "an" and "the" are intended to include plural expressions such as "one or more", unless the context clearly indicates otherwise. It should also be understood that in the following embodiments, "at least one" and "one or more" mean one, two or more than two.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
The following describes, with reference to fig. 1, a scenario in which the technical solution of the embodiment of the present application may be applied.
Fig. 1 is a schematic diagram illustrating an example of an application scenario of a target detection model provided in an embodiment of the present application. As shown in fig. 1, the application scenario 100 may include a vehicle 111 and a vehicle 112 traveling in front of the vehicle 111. Wherein an object detection model determined by the method 200 is provided in the vehicle 111, the vehicle 111 may perform object detection on an object (e.g., the vehicle 112) in front of the vehicle 111 based on the object detection model. After the target detection, the vehicle 111 may also perform target ranging on the vehicle 112 so as to make the driver of the vehicle 111 aware of the condition of the road ahead in advance, and make a driving strategy (e.g., path planning) in advance.
It should be appreciated that vehicle 111 and vehicle 112 are schematically illustrated in fig. 1 for ease of understanding only, and this should not constitute any limitation to the present application. For example, the scenario shown in fig. 1 may also include more objects (including devices), which is not limited in this application.
With the development of society, more and more machines in modern life are becoming automated and intelligent, and vehicles are no exception: intelligent vehicles are gradually entering people's daily lives. In recent years, advanced driving assistance systems (advanced driving assistant system, ADAS) have played a very important role in intelligent vehicles. During driving, an ADAS uses various sensors mounted on the vehicle to sense the surrounding environment at any time, collect data, and identify, detect and track stationary and moving objects, and performs systematic computation and analysis in combination with navigation map data, so that the driver can perceive possible dangers in advance, effectively improving the comfort and safety of driving.
For example, a vehicle equipped with an unmanned driving system may be fitted with vision-based sensors (e.g., an in-vehicle camera), radar-based sensors (e.g., in-vehicle millimeter-wave radar, in-vehicle lidar, in-vehicle ultrasonic radar), and the like. Because the camera is low-cost and the technology is mature, the camera has become a main sensor of the unmanned driving system. For example, functions implemented by the ADAS such as adaptive cruise control (adaptive cruise control, ACC), automatic emergency braking (autonomous emergency braking, AEB), lane change assist (lane change assist, LCA) and blind spot monitoring (blind spot monitoring, BSD) all rely on the camera.
According to the images collected by the camera, many ADAS functions can be realized, such as lane line detection, drivable area (freespace) detection, obstacle recognition, traffic light recognition, sign recognition and the like.
The functions of lane line detection, freespace detection and the like generally use a semantic segmentation model in a machine learning algorithm to give classification information to pixels belonging to lane lines or pixels belonging to freespace in an image.
The functions of obstacle recognition, traffic light recognition, sign recognition and the like generally utilize a target detection model in a machine learning algorithm to realize target detection.
The task of object detection is an important task in the field of machine vision. Object detection aims at finding objects of interest (which may be referred to as targets) in an image or video and outputting both the position of the target and the classification of the target. For example, object detection that outputs the object type of the target and the minimum bounding box of the target on the image (in this application, the image in which the terminal device detects the target) is referred to as 2D object detection; and, for example, object detection that outputs the object type of the target and information such as the length, width, height and rotation angle of the target in three-dimensional space is referred to as 3D object detection.
Target detection, which can also be called target extraction, is target localization based on the geometric and statistical characteristics of the target; it combines target localization and target recognition into one task, and its accuracy and real-time performance are an important capability of the whole system. Especially in complex scenes where multiple targets need to be processed in real time, automatic extraction and recognition of targets are particularly important.
Illustratively, a target detection model corresponding to obstacle recognition will detect the position of an obstacle in an image and give the category of the obstacle (e.g., vehicle, pedestrian, etc.). For example, the target detection model corresponding to obstacle recognition may be based on the YOLO algorithm and obtain the position of the obstacle from the image acquired by the camera, where the obtained position of the obstacle is generally represented by a rectangular box. Meanwhile, the category information of the obstacle and the confidence corresponding to the category can be acquired from the image acquired by the camera; for example, the category of the obstacle is "vehicle" and the confidence corresponding to that category is 90%.
In 2D object detection, a detection box (bounding box) is typically used to describe the location of the detected object. The detection frame may be a rectangular frame and may be determined by coordinates of the rectangular frame. For example, it may be determined by coordinates of the diagonal corners of the rectangular frame.
For example, fig. 2 is a schematic diagram of a 2D detection frame provided in an embodiment of the present application. The dashed box shown in fig. 2 (which may be referred to as the target detection box) is the detected position of the target. The solid box shown in fig. 2 (which may be referred to as the target truth box) is the actual position of the target.
In the embodiment of the present application, how to implement the target detection is not limited, and the target detection may be performed by an existing method or a method after the development of a technology.
It should be appreciated that in the 2D object detection process, the bounding box is predicted by the target detection model: the model predicts the position and size of a detection frame relative to a reference point, whether there is an object in the detection frame, and the confidence of the object class.
The target detection is implemented based on a target detection model, which may be constructed based on a series of training samples. For example, the object detection model may be trained from a plurality of images and the position of the object in each of the plurality of images. When any one image is input into the target detection model, the target detection model can detect the position of the target in the image through a corresponding algorithm, and the detected position of the target is obtained.
In the early stage of constructing the target detection model, there may be a situation that the constructed target detection model cannot well detect the position of the target due to insufficient training samples, so that the target position detected by the target detection model is far away from the real position of the target, i.e. the detection accuracy of the target detection model is not high. Therefore, the target detection model needs to be corrected, and a target detection model with high detection precision is obtained. Wherein the Loss Function (Loss Function) can be used to measure the gap between the position of the target detected by the target detection model and the true position of the target. The larger the value output by the loss function is, the larger the difference between the position of the target detected by the target detection model and the real position of the target is, and the smaller the value output by the loss function is, the closer the position of the target detected by the target detection model and the real position of the target are. Thus, the process of modifying the object detection model can be understood as a process of reducing the loss function. And when the value output by the loss function is smaller than a preset value, the position of the target detected by the target detection model is considered to be very close to the real position of the target.
In the following, taking mode 1 and mode 2 as examples, how to calculate the loss term in the loss function is described.
In the mode 1, the loss function is calculated by taking into consideration the distance between the center point of the target detection frame and the center point of the target truth frame.
For example, as shown in fig. 3, the loss term output by the loss function for target detection box 101 and target truth box 102 may be:

Loss = ρ²(A1, A2) / s1²

where ρ(A1, A2) is the distance between point A1 and point A2, point A1 is the center point of target detection box 101, point A2 is the center point of target truth box 102, and s1 is the length of the diagonal of the minimum circumscribed rectangular box 103 of target detection box 101 and target truth box 102 (e.g., the distance between point A3 and point A4).
In the construction of the target detection model corresponding to fig. 3, the loss term for target detection box 101 and target truth box 102 can be gradually reduced, that is, the center point of target detection box 101 is brought continuously closer to the center point of target truth box 102, so as to improve the detection accuracy of the target detection model corresponding to fig. 3. For example, the loss terms output by the loss function for target detection box 101 and target truth box 102 shown in (a), (b) and (c) of fig. 4 decrease in that order.
Mode 2, the loss function is calculated by taking into account the distance between the aspect ratio of the target detection box and the aspect ratio of the target truth box.
For example, as shown in fig. 5, the loss term output by the loss function for target detection box 103 and target truth box 104 may be:

Loss = α · (4/π²) · (arctan(w2/h2) − arctan(w1/h1))²

where α is a weight parameter, w2 is the width of target truth box 104, h2 is the height of target truth box 104, w1 is the width of target detection box 103, and h1 is the height of target detection box 103.
In the construction of the target detection model corresponding to fig. 5, the loss term for target detection box 103 and target truth box 104 can be gradually reduced, that is, the aspect ratio of target detection box 103 is brought continuously closer to the aspect ratio of target truth box 104, so as to improve the detection accuracy of the target detection model corresponding to fig. 5. For example, the loss terms output by the loss function for target detection box 103 and target truth box 104 shown in (a), (b) and (c) of fig. 6 decrease in that order.
After the target detection model is corrected, target detection can be performed based on the corrected target detection model, and the position or the size of the target can be obtained. The distance of the target, i.e. the target ranging, may then be measured based on the position or size of the target in order to make a driving strategy (e.g. path planning) in advance.
In the following, taking the landing-point similar triangle ranging method, the landing-point coordinate transformation ranging method and the proportional ranging method as examples, it is described how to determine the distance between the target and the observer, i.e., target ranging, according to the position or size of the target detection frame in the image obtained by target detection.
(1) Landing-point similar triangle ranging method
The landing-point similar triangle ranging method uses the similarity relationship between triangles to determine the distance between the target and the observer.
Specifically, for example, as shown in fig. 7, the camera is located at point p on the vehicle 111, the optical axis of the camera is parallel to the ground, and I is the imaging plane of the camera. From the triangle similarity relationship, the following can be obtained:

y / f = H / Z    (1)

where y is the distance (in pixels) from the projection point of the target landing point in the image to the optical center of the image, f is the focal length (in pixels), H is the height (in m) of the camera above the ground, and Z is the horizontal distance (in m) from the target landing point to the camera.
Further, the horizontal distance Z between the target landing point and the camera is:

Z = f × H / y    (2)
Z may be considered the distance of vehicle 112 from vehicle 111.
In general, the projection point of the target landing point in the image can be taken as the midpoint of the bottom edge of the target detection frame obtained by target detection, for example point O shown in fig. 2. That is, y is the distance from the midpoint of the bottom edge of the detected target detection frame to the optical center of the image. Thus, the vehicle 111 can determine the distance of the vehicle 112 from itself based on target detection and formula (2).
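As a sketch of the computation in formula (2): given the bottom-edge midpoint of the detection frame, the distance follows directly. The camera parameters and numbers below are hypothetical, used only to illustrate the formula.

```python
# Illustrative sketch of landing-point similar-triangle ranging, formula (2):
# Z = f * H / y, with y the vertical pixel offset of the detection frame's
# bottom-edge midpoint below the image optical center.

def landing_point_distance(bottom_mid_v, optical_center_v, focal_px, cam_height_m):
    y = bottom_mid_v - optical_center_v      # offset below the optical center, in pixels
    if y <= 0:
        raise ValueError("landing point must project below the optical center")
    return focal_px * cam_height_m / y       # horizontal distance Z, in meters

# Hypothetical example: f = 1000 px, camera 1.5 m above the ground, bottom-edge
# midpoint 60 px below the optical center -> Z = 25 m.
print(landing_point_distance(460, 400, 1000, 1.5))
```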
(2) Landing-point coordinate transformation ranging method
The landing-point coordinate transformation ranging method combines the camera intrinsic parameter matrix and the camera extrinsic parameter matrix with the image coordinates of the projection point of the target landing point in the image, so as to obtain the distance between the target and the observer.
As in the landing-point similar triangle ranging method, in the landing-point coordinate transformation ranging method the projection point of the target landing point in the image can generally be taken as the midpoint of the bottom edge of the target detection frame obtained by target detection.
(3) Proportional ranging method
The proportional ranging method is to range by utilizing the proportional relation between the size of a real object and the size of an image target.
In general, the distance between the target and the observer is determined according to the proportional relationship between the area of the real object at a certain viewing angle and the imaging area of the target in the image. For example, if the real size of the target is 2 m × 2 m, the size of the target detection frame in the image scales down proportionally as the target moves from near to far; if the real size of the target and the scaling relationship are known, the distance of the target can be obtained.
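A minimal sketch of the proportional relation described above, under a pinhole-camera assumption in which the imaged width shrinks in proportion to distance; the 2 m real width and the other numbers are hypothetical.

```python
# Illustrative sketch of proportional ranging: Z ≈ f * real_width / pixel_width
# under a pinhole-camera assumption.

def proportional_distance(real_width_m, box_width_px, focal_px):
    return focal_px * real_width_m / box_width_px

# Hypothetical example: a target with a real width of 2 m imaged as an
# 80-pixel-wide detection frame with f = 1000 px is roughly 25 m away.
print(proportional_distance(2.0, 80, 1000))
```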
Based on the methods described in (1) to (3), the target ranging task after target detection, that is, measuring the distance of the detected target, can be completed.
Because target detection and target ranging are two separate processes, target detection only considers its own requirement and does not consider the requirement of subsequent target ranging, so the error of the subsequent target ranging tends to be larger.
For example, in the target detection task, since the distance between the center point of target detection box 106 and the center point of target truth box 105 shown in (a) of fig. 8 is equal to the distance between the center point of target detection box 107 and the center point of target truth box 105 shown in (b) of fig. 8, and the difference between the aspect ratio of target detection box 106 and the aspect ratio of target truth box 105 shown in (a) of fig. 8 is equal to the difference between the aspect ratio of target detection box 107 and the aspect ratio of target truth box 105 shown in (b) of fig. 8, the loss term of target detection box 106 and target truth box 105 shown in (a) of fig. 8 and the loss term of target detection box 107 and target truth box 105 shown in (b) of fig. 8 are the same, regardless of whether the loss term is calculated in manner 1 or manner 2 described above.
However, in the target ranging task, if the methods described in (1) and (2) are adopted, the projection point of the target landing point in the image is taken as the midpoint of the bottom edge of the target detection frame obtained by target detection. Since the midpoint of the bottom edge of target detection box 106 is closer to the midpoint of the bottom edge of target truth box 105 than the midpoint of the bottom edge of target detection box 107 is, the error of the target distance measured based on target detection box 107 shown in (b) of fig. 8 is larger than that measured based on target detection box 106 shown in (a) of fig. 8.
Therefore, the embodiment of the application provides a method for determining the target detection model, and the target detection model is used for target detection, so that the target detection model determined by the method can improve the accuracy of target detection and the accuracy of target ranging after target detection.
In this embodiment of the present application, the upper left corner of the image to be detected is taken as the origin of coordinates, the horizontal direction of the image to be detected from left to right is the positive direction of the x-axis, the vertical direction of the image to be detected from top to bottom is the positive direction of the y-axis, the size of the image to be detected in the x-axis direction is the width of the image to be detected, and the size of the image to be detected in the y-axis direction is the height of the image to be detected.
In the embodiment of the present application, the target detection frame (for example, the first target detection frame and the second target detection frame) and the target truth frame are taken as examples to describe the rectangular frame.
Hereinafter, a method for determining an object detection model according to an embodiment of the present application will be described with reference to specific drawings.
For example, fig. 9 is a schematic flow chart of a method 200 for determining a target detection model provided in an embodiment of the present application. As shown in fig. 9, the method 200 includes:
S210, acquiring a first target detection frame according to the first target detection model. The first target detection frame is a boundary contour obtained by performing target detection on the image to be detected.
In some embodiments, the image to be detected may be input into a first target detection model, where the first target detection model detects, based on a corresponding algorithm, a coordinate value of an area occupied by the object in the image to be detected, so as to obtain, according to the coordinate value of the area occupied by the object in the image to be detected, a boundary contour of the object, that is, a first target detection frame. I.e. the first object detection frame may be understood as the position of the object in the image to be detected, which is obtained by object detection.
In the embodiment of the present application, the target objects refer to the same target object in the image to be detected.
For example, the coordinate values of the region occupied by the object in the image to be detected may include the coordinate values of two diagonal corners of the region, for example, the coordinate value (x1, y1) of the upper left corner and the coordinate value (x2, y2) of the lower right corner of the region occupied by the object in the image to be detected.
For example, the coordinate values of the region occupied by the object in the image to be detected may include the width W1, the height H1 and the center point coordinate value (x_o1, y_o1) of the region occupied by the object in the image to be detected. The center point can be understood as the center of symmetry of the region occupied by the target in the image to be detected.
In some embodiments, the first device (the device performing S220) includes a camera, and the image to be detected is acquired by the camera. In other embodiments, the first device (the device performing S220) does not include a camera, and the first device may acquire the image to be detected from another device capable of acquiring it.
In some embodiments, a plurality of first target detection boxes may be acquired according to the first target detection model, and S220 to S250 are performed for each first target detection box.
S220, determining parameter information of the first target detection frame. The parameter information of the first target detection frame comprises parameters related to target ranging of the target object.
For example, in an embodiment in which target ranging is accomplished based on the landing-point similar triangle ranging method or the landing-point coordinate transformation ranging method, that is, an embodiment in which target ranging needs to acquire the midpoint of the bottom edge of the first target detection frame, the parameter information of the first target detection frame may include the bottom-edge center point coordinate value of the first target detection frame. Thereby, the accuracy of target ranging can be improved.
For example, if the coordinate values of the region occupied by the object in the image to be detected include the coordinate value (x1, y1) of the upper left corner and the coordinate value (x2, y2) of the lower right corner, the coordinate value of the bottom-edge center point of the first target detection frame is ((x1 + x2)/2, y2).
For another example, if the coordinate values of the region occupied by the object in the image to be detected include the width W1, the height H1 and the center point coordinate value (x_o1, y_o1), the coordinate value of the bottom-edge center point of the first target detection frame is (x_o1, y_o1 + H1/2).
Illustratively, in embodiments where target ranging is accomplished based on the proportional ranging method, i.e., embodiments where target ranging requires the area (which can also be understood as the size) of the target detection frame, the parameter information of the first target detection frame includes the area of the first target detection frame. Thereby, the accuracy of target ranging can be improved.
For example, if the coordinate values of the region occupied by the object in the image to be detected include the coordinate value (x 1, y 1) of the upper left corner and the coordinate value (x 2, y 2) of the lower right corner of the region occupied by the object in the image to be detected, the area of the object detection frame is |x2-x1|×|y2-y1|.
For another example, if the coordinate values of the region occupied by the object in the image to be detected include the width W1, the height H1 and the center point coordinate value (x_o1, y_o1), the area of the first target detection frame is W1 × H1.
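Putting the two representations together, the ranging-related parameter information of S220 (bottom-edge center point for landing-point ranging, area for proportional ranging) could be computed as in the following sketch; coordinate conventions are as above and the helper names are illustrative.

```python
# Illustrative sketch of S220: ranging-related parameters of a detection frame
# from either coordinate representation (y-axis pointing downward).

def params_from_corners(x1, y1, x2, y2):
    bottom_center = ((x1 + x2) / 2.0, y2)       # for landing-point ranging
    area = abs(x2 - x1) * abs(y2 - y1)          # for proportional ranging
    return bottom_center, area

def params_from_center(xo, yo, w, h):
    bottom_center = (xo, yo + h / 2.0)          # bottom edge is at yo + h/2 since y grows downward
    area = w * h
    return bottom_center, area
```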
S230, determining parameter information of the target truth box. The target truth box is an actual boundary outline of the target object in the image to be detected, and the parameter information of the target truth box comprises parameters related to target ranging of the target object.
In some embodiments, the coordinate value of the area actually occupied by the target object in the image to be detected may be manually marked, so as to obtain the actual boundary contour of the target object, that is, the target truth box, according to the coordinate value of the area actually occupied by the target object in the image to be detected. I.e. the target truth box can be understood as the actual position of the target object in the image to be detected.
For example, the coordinate values of the diagonal corners of the region actually occupied by the target object in the image to be detected may be manually marked. For example, the coordinate value (x 3, y 3) of the upper left corner and the coordinate value (x 4, y 4) of the lower right corner of the area actually occupied by the target object in the image to be detected.
For example, the width W2, the height H2 and the center point coordinate value (x_o2, y_o2) of the region actually occupied by the target object in the image to be detected may be manually marked.
In embodiments where target ranging is accomplished based on the landing-point similar triangle ranging method or the landing-point coordinate transformation ranging method, i.e., embodiments where target ranging requires the midpoint of the bottom edge of the target detection frame, the parameter information of the target truth box includes the bottom-edge center point coordinate value of the target truth box. Thereby, the accuracy of target ranging can be improved.
For example, if the coordinate values of the region actually occupied by the target object in the image to be detected include the coordinate value (x3, y3) of the upper left corner and the coordinate value (x4, y4) of the lower right corner, the coordinate value of the bottom-edge center point of the target truth box is ((x3 + x4)/2, y4).
For another example, if the coordinate values of the region actually occupied by the target object in the image to be detected include the width W2, the height H2 and the center point coordinate value (x_o2, y_o2), the coordinate value of the bottom-edge center point of the target truth box is (x_o2, y_o2 + H2/2).
In embodiments where target ranging is accomplished based on the proportional ranging method, i.e., embodiments where target ranging requires the area (which can also be understood as the size) of the target detection frame, the parameter information of the target truth box includes the area of the target truth box. Thereby, the accuracy of target ranging can be improved.
For example, if the coordinate values of the area actually occupied by the object in the image to be detected include the coordinate value (x 3, y 3) of the upper left corner and the coordinate value (x 4, y 4) of the lower right corner of the area actually occupied by the object in the image to be detected, the area of the object truth box is |x3-x4|×|y3-y4|.
For another example, if the coordinate values of the region actually occupied by the target object in the image to be detected include the width W2, the height H2 and the center point coordinate value (x_o2, y_o2), the area of the target truth box is W2 × H2.
Alternatively, the embodiment of the present application does not limit the sequence of S220 and S230.
S240, determining a loss item according to the parameter information of the first target detection frame and the parameter information of the target truth frame. Wherein the penalty term is used to indicate the deviation between the first target detection box and the target truth box.
Hereinafter, S240 will be specifically described taking mode a, mode B, mode C, and mode D as examples.
And A, determining a loss item according to the coordinate value of the bottom center point of the first target detection frame and the coordinate value of the bottom center point of the target truth frame.
The loss term is determined according to the following formula:
where point c is the bottom-edge center point of the first target detection frame, point c^gt is the bottom-edge center point of the target truth box, ρ(c, c^gt) is the distance between point c and point c^gt, and H2 is the height of the target truth box.
And B, determining a loss term according to the coordinate value of the bottom edge central point of the first target detection frame, the coordinate value of the bottom edge central point of the target truth frame and the length of the diagonal line of the minimum circumscribed rectangular frame.
The mode B specifically includes S11 and S12.
S11, determining the length of the diagonal line of the minimum circumscribed rectangular frame of the first target detection frame and the target truth frame.
Specifically, according to the coordinate values of the first target detection frame and the coordinate values of the target truth frame, the coordinate values of the minimum circumscribed rectangular frame of the first target detection frame and the target truth frame are determined, and according to the coordinate values of the minimum circumscribed rectangular frame, the length of the diagonal line of the minimum circumscribed rectangular frame is determined.
For example, suppose the coordinate values of the first target detection frame include the coordinate value (x1, y1) of the upper left corner and the coordinate value (x2, y2) of the lower right corner, and the coordinate values of the target truth box include the coordinate value (x3, y3) of the upper left corner and the coordinate value (x4, y4) of the lower right corner. Then the coordinate value of the upper left corner of the minimum circumscribed rectangular box of the first target detection frame and the target truth box is (min[x1, x3], min[y1, y3]), and the coordinate value of the lower right corner is (max[x2, x4], max[y2, y4]). The length of the diagonal of the minimum circumscribed rectangular box is thus:

s = sqrt((max[x2, x4] − min[x1, x3])² + (max[y2, y4] − min[y1, y3])²)
for example, if the coordinate values of the first target detection frame include the coordinate values of the width, the height, and the center point of the first target detection frame, the coordinate values of the upper left corner and the coordinate values of the lower right corner of the first target detection frame may be determined according to the coordinate values of the width, the height, and the center point of the first target detection frame. If the coordinate values of the target truth frame include the width, the height and the coordinate values of the center point of the target truth frame, the coordinate values of the upper left corner and the coordinate values of the lower right corner of the target truth frame can be determined according to the width, the height and the coordinate values of the center point of the target truth frame. Thus, the length of the diagonal of the smallest circumscribed rectangular box of the first target detection box and the target truth box can be determined according to the above example.
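A minimal sketch of S11, assuming boxes given as (x_left, y_top, x_right, y_bottom) in image coordinates; the helper name is illustrative only.

```python
import math

def enclosing_diagonal(pred_box, truth_box):
    """Length of the diagonal of the smallest rectangle enclosing both boxes."""
    x1, y1, x2, y2 = pred_box
    x3, y3, x4, y4 = truth_box
    left, top = min(x1, x3), min(y1, y3)        # upper-left corner of the enclosure
    right, bottom = max(x2, x4), max(y2, y4)    # lower-right corner of the enclosure
    return math.hypot(right - left, bottom - top)
```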
S12, determining a loss term according to the following formula:
where point c is the bottom-edge center point of the first target detection frame, point c^gt is the bottom-edge center point of the target truth box, ρ(c, c^gt) is the distance between point c and point c^gt, and s is the length of the diagonal of the minimum circumscribed rectangular box.
For example, FIG. 10 is a schematic diagram of a set of first target detection boxes and target truth boxes provided herein. As shown in FIG. 10, the coordinate value of the upper left corner point B1 of the target truth box 107 is (x_b1, y_b1) and the coordinate value of the lower right corner point B2 is (x_b2, y_b2), so the coordinate value of the bottom-edge center point c^gt of the target truth box 107 is ((x_b1 + x_b2)/2, y_b2). The coordinate value of the upper left corner point B3 of the first target detection frame 108 is (x_b3, y_b3) and the coordinate value of the lower right corner point B4 is (x_b4, y_b4), so the coordinate value of the bottom-edge center point c of the first target detection frame 108 is ((x_b3 + x_b4)/2, y_b4). Thus the distance between point c and point c^gt is ρ(c, c^gt) = √(((x_b1 + x_b2)/2 - (x_b3 + x_b4)/2)² + (y_b2 - y_b4)²). Further, since the minimum circumscribed rectangular box of the target truth box 107 and the first target detection box 108 is the target truth box 107 itself, the length of the diagonal of the minimum circumscribed rectangular box is s = √((x_b1 - x_b2)² + (y_b1 - y_b2)²). The loss term between the target truth box 107 and the first target detection box 108 is then obtained by substituting ρ(c, c^gt) and s into the formula of S12 in Mode B.
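The Mode B computation for a configuration like that of FIG. 10 can be sketched as follows. Since the S12 formula is not reproduced in this text, the simple ratio ρ(c, c^gt)/s used below is an assumption and not necessarily the formula of the present application.

```python
import math

def mode_b_like_loss(pred_box, truth_box):
    """Bottom-center distance divided by the enclosure diagonal; boxes are
    (x_left, y_top, x_right, y_bottom) in image coordinates (y grows downward)."""
    px1, py1, px2, py2 = pred_box
    tx1, ty1, tx2, ty2 = truth_box
    # bottom-edge center points c and c_gt
    c = ((px1 + px2) / 2.0, py2)
    c_gt = ((tx1 + tx2) / 2.0, ty2)
    rho = math.hypot(c[0] - c_gt[0], c[1] - c_gt[1])
    # diagonal length s of the smallest rectangle enclosing both boxes
    s = math.hypot(max(px2, tx2) - min(px1, tx1), max(py2, ty2) - min(py1, ty1))
    return rho / s
```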
Mode C: determining the loss term according to the area of the first target detection frame, the area of the target truth box, and the area of the minimum circumscribed rectangular box. Mode C specifically includes S21 and S22.
S21, determining the area of the minimum circumscribed rectangular frame of the first target detection frame and the target truth frame.
Specifically, according to the coordinate value of the first target detection frame and the coordinate value of the target truth frame, the coordinate value of the minimum circumscribed rectangular frame of the first target detection frame and the target truth frame is determined, and according to the coordinate value of the minimum circumscribed rectangular frame, the area of the minimum circumscribed rectangular frame is determined.
For example, suppose the coordinate values of the first target detection frame include the coordinate value (x1, y1) of the upper left corner and the coordinate value (x2, y2) of the lower right corner, and the coordinate values of the target truth box include the coordinate value (x3, y3) of the upper left corner and the coordinate value (x4, y4) of the lower right corner. The coordinate value of the upper left corner of the minimum circumscribed rectangular box of the first target detection frame and the target truth box is then (min[x1, x3], min[y1, y3]), and the coordinate value of its lower right corner is (max[x2, x4], max[y2, y4]). The area of the minimum circumscribed rectangular box is thus:
|min[x1,x3]-max[x2,x4]|×|min[y1,y3]-max[y2,y4]|
for example, if the coordinate values of the first target detection frame include the coordinate values of the width, the height, and the center point of the first target detection frame, the coordinate values of the upper left corner and the coordinate values of the lower right corner of the first target detection frame may be determined according to the coordinate values of the width, the height, and the center point of the first target detection frame. If the coordinate values of the target truth frame include the width, the height and the coordinate values of the center point of the target truth frame, the coordinate values of the upper left corner and the coordinate values of the lower right corner of the target truth frame can be determined according to the width, the height and the coordinate values of the center point of the target truth frame. Thus, the area of the smallest circumscribed rectangular box of the first target detection box and the target truth box can be determined according to the above example.
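A minimal sketch of S21, directly following the coordinate formula above; the helper name is illustrative only.

```python
def enclosing_area(pred_box, truth_box):
    """Area of the smallest rectangle enclosing both boxes, given as
    (x_left, y_top, x_right, y_bottom)."""
    x1, y1, x2, y2 = pred_box
    x3, y3, x4, y4 = truth_box
    return abs(min(x1, x3) - max(x2, x4)) * abs(min(y1, y3) - max(y2, y4))
```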
S22, determining a loss term according to the following formula:
wherein a1 is the area of the first target detection frame, a2 is the area of the target truth frame, and a3 is the area of the minimum circumscribed rectangular frame.
For example, FIG. 11 is a schematic diagram of a set of first target detection boxes and target truth boxes provided herein. As shown in FIG. 11, the coordinate value of the upper left corner point C1 of the target truth box 109 is (x_c1, y_c1) and the coordinate value of the lower right corner point C2 is (x_c2, y_c2), so the area a2 of the target truth box 109 is |x_c1 - x_c2| × |y_c1 - y_c2|. The coordinate value of the upper left corner point C3 of the first target detection frame 110 is (x_c3, y_c3) and the coordinate value of the lower right corner point C4 is (x_c4, y_c4), so the area a1 of the first target detection frame 110 is |x_c3 - x_c4| × |y_c3 - y_c4|. In addition, since the minimum circumscribed rectangle of the target truth box 109 and the first target detection box 110 is the box 111, the area a3 of the minimum circumscribed rectangular box 111 is |x_c1 - x_c4| × |y_c1 - y_c2|. The loss term between the target truth box 109 and the first target detection box 110 is then obtained by substituting a1, a2, and a3 into the formula of S22 in Mode C.
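For illustration, the FIG. 11 quantities a1, a2 and a3 can be computed from corner coordinates as follows, using assumed coordinate values; the S22 formula combining them into the loss term is not reproduced in this text, so that final step is left symbolic.

```python
truth_box = (10, 10, 110, 60)   # (x_c1, y_c1, x_c2, y_c2) of target truth box 109 (illustrative)
pred_box = (40, 20, 150, 55)    # (x_c3, y_c3, x_c4, y_c4) of first target detection box 110 (illustrative)

a2 = abs(truth_box[0] - truth_box[2]) * abs(truth_box[1] - truth_box[3])  # area of the truth box
a1 = abs(pred_box[0] - pred_box[2]) * abs(pred_box[1] - pred_box[3])      # area of the detection box
a3 = (abs(min(truth_box[0], pred_box[0]) - max(truth_box[2], pred_box[2]))
      * abs(min(truth_box[1], pred_box[1]) - max(truth_box[3], pred_box[3])))  # enclosure area
# loss term = f(a1, a2, a3) according to the S22 formula (not reproduced here)
print(a1, a2, a3)
```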
Mode D: determining the loss term according to the area of the first target detection frame and the area of the target truth box.
The loss term is determined according to the following formula:
wherein a1 is the area of the first target detection frame, and a2 is the area of the target truth frame.
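A hedged sketch of a Mode-D-style loss depending only on the two areas a1 and a2: the relative area error used below is an assumption and not necessarily the formula of the present application.

```python
def mode_d_like_loss(a1, a2):
    """Penalty from the detection-box area a1 and the truth-box area a2
    (assumed form: relative area error)."""
    return abs(a1 - a2) / a2
```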
In some embodiments, the loss term may also be determined by at least two of the following in combination: any manner of determining a loss term in the prior art (e.g., manner 1 or manner 2 described above), Mode A, Mode B, Mode C, and Mode D.
S250, determining a second target detection model according to the loss term, wherein the second target detection model is used for determining a second target detection frame.
In some embodiments, if the determined loss term is smaller than a preset value, the position of the target detected by the target detection model corresponding to the current loss term is considered to be very close to the real position of the target. In this case, the second target detection model determined according to the loss term obtained in S240 is the first target detection model itself, and if target detection is performed again on the image to be detected in S210 based on the second target detection model, the second target detection frame of the target object obtained is the same as the first target detection frame.
In other embodiments, if the determined loss term is greater than or equal to the preset value, S210 to S240 are repeated and the first target detection model is corrected N times (N ≥ 1, N being a positive integer) until the loss term obtained based on the first target detection model after the Nth correction is smaller than the preset value, that is, until the position of the target detected by the first target detection model after the Nth correction (i.e., the second target detection model) is considered to be very close to the real position of the target. In this case, the second target detection model is the first target detection model after the Nth correction and is no longer the first target detection model. If target detection is performed again on the image to be detected in S210 based on the second target detection model, the obtained second target detection frame differs from the first target detection frame, and the second target detection frame is closer to the target truth box than the first target detection frame.
For example, if the determined loss term is greater than or equal to the preset value, the first target detection model is corrected for the first time according to the loss term to obtain the first target detection model after the 1st correction (e.g., a third target detection model); the first target detection model in S210 to S240 is then replaced by the third target detection model, and S210 to S240 are repeated to obtain the loss term again. If the newly determined loss term is still greater than or equal to the preset value, the model is corrected for the second time to obtain the first target detection model after the 2nd correction (e.g., the second target detection model); the model in S210 to S240 is replaced by the second target detection model, and S210 to S240 are repeated to obtain the loss term again. If the loss term determined this time is smaller than the preset value, the second target detection model determined according to the loss term obtained in S240 is this twice-corrected model, which is no longer the first target detection model. If target detection is performed again on the image to be detected in S210 based on the second target detection model, the obtained second target detection frame differs from the first target detection frame, and the second target detection frame is closer to the target truth box than the first target detection frame.
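The correction loop described above can be sketched as follows; the callables detect, compute_loss and correct_model are illustrative placeholders standing in for S210, S220 to S240 and the model correction, and are not interfaces defined by the present application.

```python
def train_detection_model(detect, compute_loss, correct_model,
                          model, image, truth_box, preset_value, max_rounds=100):
    """Repeat S210 to S240 and correct the model until the loss term is
    smaller than the preset value; the returned model plays the role of the
    second target detection model."""
    for _ in range(max_rounds):
        pred_box = detect(model, image)            # S210: obtain the first target detection frame
        loss = compute_loss(pred_box, truth_box)   # S220 to S240: parameter information and loss term
        if loss < preset_value:
            break                                  # detected position is close enough to the real position
        model = correct_model(model, loss)         # Nth correction of the model
    return model
```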
For example, FIG. 12 is a schematic diagram of a set of target detection boxes and target truth boxes provided herein. The target detection box and the target truth box shown in (a) of FIG. 12 are obtained based on prior-art target detection. The target detection frame shown in (b) of FIG. 12 is obtained when the loss term determined based on the above-described Mode B is smaller than the preset value.
For another example, FIG. 13 is a schematic diagram of another set of target detection boxes and target truth boxes provided herein. The target detection box and the target truth box shown in (a) of FIG. 13 are obtained based on prior-art target detection. The target detection frame shown in (b) of FIG. 13 is obtained when the loss term determined based on the above-described Mode C is smaller than the preset value.
The process of repeatedly performing the above S210 to S240 until the loss term is smaller than the preset value may be referred to as the training process of the first target detection model. By means of the trained first target detection model (e.g., the second target detection model), target detection can be performed on any image to be detected.
For example, a trained first object detection model may be applied to the terminal.
Further, the terminal may be an intelligent transportation device (vehicle or unmanned aerial vehicle), an intelligent home device, an intelligent manufacturing device, a robot, or the like. The intelligent transport device may be, for example, an AGV or an unmanned transport vehicle.
Illustratively, after applying the trained first object detection model to the terminal, an object ranging task may be performed.
For example, in order to improve the accuracy of target ranging, for a target detection model (e.g., the second target detection model) determined based on the above-described Mode A or Mode B, the target ranging task may be performed based on a falling-point ranging algorithm. For example, the falling-point ranging may be a falling-point similar-triangle ranging method or a falling-point coordinate-transformation ranging method.
For example, in order to improve the accuracy of target ranging, for a target detection model (e.g., the second target detection model) determined based on the above-described Mode C or Mode D, the target ranging task may be performed based on a proportional ranging algorithm.
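The two ranging families mentioned above can be sketched for a simple pinhole camera as follows. These are textbook approximations introduced for illustration, not the specific algorithms of the present application: the falling-point formula assumes a flat road, zero camera pitch and a known camera height, and the proportional formula assumes a known real width of the target.

```python
def falling_point_distance(focal_px, camera_height_m, v_bottom, v_principal):
    """Distance to the ground-contact (falling) point of the target, from the image
    row of the bottom-edge center of the detection box: Z = f * H / (v_bottom - v0)."""
    return focal_px * camera_height_m / (v_bottom - v_principal)

def proportional_distance(focal_px, real_width_m, box_width_px):
    """Distance from the apparent size of the detection box: Z = f * W_real / w_pixels."""
    return focal_px * real_width_m / box_width_px
```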
In the method 200 for determining the target detection model provided in the embodiments of the present application, the loss term is determined based on parameters related to target ranging (the bottom-edge center point coordinates of the target detection frame or the area of the target detection frame), and the target detection model is corrected accordingly to complete its training. Thus, the method 200 may improve the accuracy of target detection and may also improve the accuracy of target ranging.
For example, FIG. 14 is a comparative schematic diagram of target ranging based on different target detection models. If target ranging is performed based on a prior-art target detection model, the determined distance from the vehicle 112 to the vehicle 111 is Z1. If target ranging is performed based on the target detection model determined in the above-described Mode B, the determined distance from the vehicle 112 to the vehicle 111 is Z. In Mode B provided in the embodiments of the present application, the target detection model is corrected with a loss term determined based on a parameter related to target ranging, namely the coordinate value of the bottom-edge center point of the target detection frame; subsequent target ranging is also based on the coordinate value of the bottom-edge center point of the target detection frame, so the position of the target can be detected more accurately and the target can be ranged more accurately.
The method for determining the object detection model according to the embodiment of the present application is described above with reference to fig. 1 to 14, and the apparatus for determining the object detection model according to the embodiment of the present application is described below with reference to fig. 15 and 16. It is to be understood that the description of the means for determining the object detection model and the description of the method for determining the object detection model correspond to each other, and therefore, parts not described in detail can be referred to the previous description of the method for determining the object detection model.
Fig. 15 is a schematic structural diagram of an apparatus for determining an object detection model according to an embodiment of the present application. As shown in fig. 15, the apparatus 300 for determining an object detection model includes an obtaining unit 310 and a processing unit 320, wherein:
an obtaining unit 310, configured to obtain a first target detection frame according to a first target detection model, where the first target detection frame is a boundary contour obtained by performing target detection on an image to be detected;
a processing unit 320, configured to determine parameter information of the first target detection frame, where the parameter information of the first target detection frame includes parameters related to performing target ranging on a target object;
the processing unit 320 is further configured to determine parameter information of a target truth box, where the target truth box is an actual boundary contour of the target object in the image to be detected, and the parameter information of the target truth box includes parameters related to performing target ranging on the target object;
The processing unit 320 is further configured to determine a loss term according to the parameter information of the first target detection frame and the parameter information of the target truth frame, where the loss term is used to indicate a deviation between the first target detection frame and the target truth frame;
the processing unit 320 is further configured to determine a second target detection model according to the loss term, where the second target detection model is used to determine a second target detection frame.
Optionally, the parameter information of the first target detection frame includes a bottom center point coordinate value of the first target detection frame, and the parameter information of the target truth frame includes a bottom center point coordinate value of the target truth frame.
Optionally, the target ranging is ranging by a falling point ranging method.
Optionally, the parameter information of the first target detection frame includes an area of the first target detection frame, and the parameter information of the target truth frame includes an area of the target truth frame.
Optionally, the target ranging is ranging performed by a proportional ranging method.
Optionally, the processing unit 320 is further specifically configured to: and determining the loss item according to the parameter information of the first target detection frame, the parameter information of the target truth frame and the parameters of the minimum circumscribed rectangular frame of the first target detection frame and the target truth frame.
Optionally, the parameter of the minimum bounding rectangle frame includes a length of a diagonal of the minimum bounding rectangle frame.
Optionally, the processing unit 320 is further specifically configured to: determining the loss term according to the following formula:
wherein the point c is the bottom-edge center point of the target detection frame, the point c^gt is the bottom-edge center point of the target truth box, ρ(c, c^gt) is the distance between the point c and the point c^gt, and s is the length of the diagonal of the minimum circumscribed rectangular frame.
Optionally, the parameter of the minimum bounding rectangular box includes an area of the minimum bounding rectangular box.
Optionally, the processing unit 320 is further specifically configured to: determining the loss term according to the following formula:
wherein a1 is the area of the target detection frame, a2 is the area of the target truth frame, and a3 is the area of the minimum circumscribed rectangular frame.
Optionally, the actual boundary contour of the object is manually noted.
Fig. 16 is a schematic structural diagram of another apparatus for determining an object detection model according to an embodiment of the present application.
The apparatus 400 for determining an object detection model comprises at least one memory 410 and at least one processor 420, the at least one memory 410 being configured to store a program, the at least one processor 420 being configured to run the program to implement the method 200 described above.
It should be appreciated that the processor in embodiments of the present application may be a central processing unit (central processing unit, CPU), but may also be other general purpose processors, digital signal processors (digital signal processor, DSP), application specific integrated circuits (application specific integrated circuit, ASIC), off-the-shelf programmable gate arrays (field programmable gate array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It should also be appreciated that the memory in embodiments of the present application may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an electrically Erasable EPROM (EEPROM), or a flash memory. The volatile memory may be random access memory (random access memory, RAM) which acts as an external cache. By way of example but not limitation, many forms of random access memory (random access memory, RAM) are available, such as Static RAM (SRAM), dynamic Random Access Memory (DRAM), synchronous Dynamic Random Access Memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced Synchronous Dynamic Random Access Memory (ESDRAM), synchronous Link DRAM (SLDRAM), and direct memory bus RAM (DR RAM).
The descriptions of the processes corresponding to the respective drawings each have their own emphasis; for parts of a certain process that are not described in detail, reference may be made to the related descriptions of the other processes.
Embodiments of the present application also provide a computer readable storage medium having program instructions that, when executed directly or indirectly, cause the methods in the foregoing to be implemented.
Embodiments of the present application also provide a computer program product containing instructions that, when run on a computing device, cause the computing device to perform the method of the preceding description, or cause the computing device to implement the functions of the apparatus of determining the object detection model of the preceding description.
The embodiments of the present application also provide a chip, including at least one processor and an interface circuit, where the interface circuit is configured to provide program instructions or data for the at least one processor, and the at least one processor is configured to execute the program instructions to implement the foregoing method.
The embodiment of the application also provides a terminal, which comprises the device for determining the target detection model.
Further, the terminal may be an intelligent transportation device (vehicle or unmanned aerial vehicle), an intelligent home device, an intelligent manufacturing device, a robot, or the like. The intelligent transport device may be, for example, an AGV or an unmanned transport vehicle.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless (e.g., infrared, radio, microwave) manner. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or a data center that contains one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (25)

  1. A method of determining a target detection model for target detection, the method comprising:
    acquiring a first target detection frame according to a first target detection model, wherein the first target detection frame is a boundary contour obtained by carrying out target detection on an image to be detected;
    Determining parameter information of the first target detection frame, wherein the parameter information of the first target detection frame comprises parameters related to target ranging of a target object;
    determining parameter information of a target truth box, wherein the target truth box is an actual boundary outline of the target object in the image to be detected, and the parameter information of the target truth box comprises parameters related to target ranging of the target object;
    determining a loss term according to the parameter information of the first target detection frame and the parameter information of the target truth frame, wherein the loss term is used for indicating the deviation between the first target detection frame and the target truth frame;
    and determining a second target detection model according to the loss item, wherein the second target detection model is used for determining a second target detection frame.
  2. The method of claim 1, wherein the parameter information of the first target detection frame comprises a bottom center point coordinate value of the first target detection frame, and the parameter information of the target truth frame comprises a bottom center point coordinate value of the target truth frame.
  3. The method of claim 2, wherein the target ranging is ranging using a landline ranging method.
  4. The method of claim 1, wherein the parameter information of the first target detection box comprises an area of the first target detection box and the parameter information of the target truth box comprises an area of the target truth box.
  5. The method of claim 4, wherein the target ranging is ranging using a proportional ranging method.
  6. The method according to any one of claims 2 to 5, wherein,
    determining a loss term according to the parameter information of the first target detection frame and the parameter information of the target truth frame, including:
    and determining the loss item according to the parameter information of the first target detection frame, the parameter information of the target truth frame and the parameters of the minimum circumscribed rectangular frame of the first target detection frame and the target truth frame.
  7. The method of claim 6, wherein the parameter of the minimum bounding rectangle box comprises a length of a diagonal of the minimum bounding rectangle box.
  8. The method according to claim 7, wherein the determining the loss term according to the parameter information of the first target detection frame, the parameter information of the target truth frame, and the parameters of the minimum circumscribed rectangular frame of the first target detection frame and the target truth frame comprises:
    Determining the loss term according to the following formula:
    wherein the point c is the bottom-edge center point of the target detection frame, the point c^gt is the bottom-edge center point of the target truth box, ρ(c, c^gt) is the distance between the point c and the point c^gt, and s is the length of the diagonal of the minimum circumscribed rectangular frame.
  9. The method of claim 6, wherein the parameter of the minimum bounding rectangle box comprises an area of the minimum bounding rectangle box.
  10. The method according to claim 9, wherein the determining the loss term according to the parameter information of the first target detection frame, the parameter information of the target truth frame, and the parameters of the minimum circumscribed rectangular frame of the first target detection frame and the target truth frame comprises:
    determining the loss term according to the following formula:
    wherein a1 is the area of the target detection frame, a2 is the area of the target truth frame, and a3 is the area of the minimum circumscribed rectangular frame.
  11. The method according to any one of claims 1 to 10, wherein the actual boundary profile of the object is manually noted.
  12. An apparatus for determining an object detection model for object detection, the apparatus comprising:
    The acquisition unit is used for acquiring a first target detection frame according to the first target detection model, wherein the first target detection frame is a boundary contour obtained by carrying out target detection on an image to be detected;
    the processing unit is used for determining parameter information of the first target detection frame, wherein the parameter information of the first target detection frame comprises parameters related to target ranging of a target object;
    the processing unit is further configured to determine parameter information of a target truth box, where the target truth box is an actual boundary contour of the target object in the image to be detected, and the parameter information of the target truth box includes parameters related to performing target ranging on the target object;
    the processing unit is further configured to determine a loss term according to the parameter information of the first target detection frame and the parameter information of the target truth frame, where the loss term is used to indicate a deviation between the first target detection frame and the target truth frame;
    the processing unit is further configured to determine a second target detection model according to the loss term, where the second target detection model is used to determine a second target detection frame.
  13. The apparatus of claim 12, wherein the parameter information of the first target detection frame comprises a bottom center point coordinate value of the first target detection frame, and the parameter information of the target truth frame comprises a bottom center point coordinate value of the target truth frame.
  14. The apparatus of claim 13, wherein the target ranging is ranging using a landline ranging method.
  15. The apparatus of claim 12, wherein the parameter information of the first target detection box comprises an area of the first target detection box and the parameter information of the target truth box comprises an area of the target truth box.
  16. The apparatus of claim 15, wherein the target ranging is ranging using a proportional ranging method.
  17. The apparatus according to any one of claims 13 to 16, wherein the processing unit is further specifically configured to:
    and determining the loss item according to the parameter information of the first target detection frame, the parameter information of the target truth frame and the parameters of the minimum circumscribed rectangular frame of the first target detection frame and the target truth frame.
  18. The apparatus of claim 17, wherein the parameter of the minimum bounding rectangle box comprises a length of a diagonal of the minimum bounding rectangle box.
  19. The apparatus of claim 18, wherein the processing unit is further specifically configured to:
    determining the loss term according to the following formula:
    wherein the point c is the bottom-edge center point of the target detection frame, the point c^gt is the bottom-edge center point of the target truth box, ρ(c, c^gt) is the distance between the point c and the point c^gt, and s is the length of the diagonal of the minimum circumscribed rectangular frame.
  20. The apparatus of claim 17, wherein the parameter of the minimum bounding rectangular box comprises an area of the minimum bounding rectangular box.
  21. The apparatus according to claim 20, wherein the processing unit is further specifically configured to:
    determining the loss term according to the following formula:
    wherein a1 is the area of the target detection frame, a2 is the area of the target truth frame, and a3 is the area of the minimum circumscribed rectangular frame.
  22. The apparatus according to any one of claims 12 to 21, wherein the actual boundary profile of the object is manually noted.
  23. An apparatus for determining an object detection model, comprising at least one memory for storing a program and at least one processor for running the program to implement the method of any one of claims 1 to 11.
  24. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a program or instructions, which when executed, cause a computer to perform the method of any of claims 1 to 11.
  25. A chip comprising at least one processor and interface circuitry for providing program instructions or data to the at least one processor, the at least one processor for executing the program instructions to implement the method of any one of claims 1 to 11.
CN202180079541.6A 2021-03-05 2021-03-05 Method and device for determining target detection model Pending CN116508073A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/079304 WO2022183484A1 (en) 2021-03-05 2021-03-05 Method and apparatus for determining object detection model

Publications (1)

Publication Number Publication Date
CN116508073A true CN116508073A (en) 2023-07-28

Family

ID=83154872

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180079541.6A Pending CN116508073A (en) 2021-03-05 2021-03-05 Method and device for determining target detection model

Country Status (2)

Country Link
CN (1) CN116508073A (en)
WO (1) WO2022183484A1 (en)


Also Published As

Publication number Publication date
WO2022183484A1 (en) 2022-09-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination