CN110598764A - Training method and device of target detection model and electronic equipment - Google Patents


Publication number
CN110598764A
Authority
CN
China
Prior art keywords
information, prediction, detection model, target, training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910801587.9A
Other languages
Chinese (zh)
Inventor
刘子立
郑途
徐国栋
钱炜
杨政
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Feibao Technology Co Ltd
Hangzhou Fabu Technology Co Ltd
Original Assignee
Hangzhou Feibao Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Feibao Technology Co Ltd filed Critical Hangzhou Feibao Technology Co Ltd
Priority to CN201910801587.9A priority Critical patent/CN110598764A/en
Publication of CN110598764A publication Critical patent/CN110598764A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/584 Recognition of vehicle lights or traffic lights


Abstract

The invention provides a training method and device of a target detection model, and electronic equipment. The method comprises the following steps: acquiring reference information of a reference frame in a training image; acquiring prediction information obtained through analysis by the target detection model, wherein the prediction information comprises predicted center point information and predicted size information corresponding to all sample points in a central region, the central region is a sub-region of the reference frame whose center point is the object center point, and the predicted size information comprises sample point information and corresponding distance information; evaluating the predicted center point information according to the object center point information to obtain a first evaluation result, and evaluating the predicted size information according to the object center point information and the actual size information to obtain a second evaluation result; and repeatedly training the target detection model according to the first evaluation result and the second evaluation result until the model meets the expected requirement. Because the second evaluation result covers the size data corresponding to all sample points, the amount of data it evaluates is large and can be obtained quickly, which improves the convergence speed of the target detection model, reduces the number of training iterations, and increases the training speed.

Description

Training method and device of target detection model and electronic equipment
Technical Field
The invention relates to the technical field of the internet, and in particular to a training method and device of a target detection model and electronic equipment.
Background
Target detection and deep learning are research fields that promote each other's development. Using image data acquired by camera sensors, target detection can be applied to real-world scenarios such as vehicle detection, pedestrian detection and traffic-light detection, and to fields such as unmanned driving and security systems.
Existing target detection methods use a target detection model to locate and identify a target object. Most target detection models are designed based on anchors, while the anchor-free design approach has emerged recently. The training process of an anchor-free target detection model is as follows: an original picture and its labeling information (including the reference frame and category corresponding to the target object) are input into the model; the target detection model identifies the target object in the original picture and predicts the position of its center point and its size information; the predicted center point is evaluated against the actual center point of the reference frame, and the predicted width and height are evaluated against the width and height of the reference frame; the model is then trained repeatedly according to these evaluations until training is complete.
However, with this scheme the model must be trained many times, which results in a long training time for the target detection model.
Disclosure of Invention
The invention provides a training method and a training device for a target detection model and electronic equipment, which are used to solve the problem that existing target detection models take a long time to train.
In a first aspect, a method for training a target detection model is provided, including:
acquiring reference information of a reference frame corresponding to a target object in a training image, wherein the reference information comprises object center point information and actual size information of the target object, and the actual size information comprises distances from four sides of the reference frame to points corresponding to the object center point information;
acquiring prediction information of a prediction frame corresponding to the target object, which is obtained through analysis of a target detection model; the prediction information comprises prediction central point information and prediction size information corresponding to all sample points in a central area, the central area is a sub-area of the reference frame, the central point of the central area is the object central point, and the prediction size information comprises sample point information and distance information from four edges of the prediction frame to points corresponding to the sample point information;
evaluating the predicted central point information according to the object central point information to obtain a first evaluation result, and evaluating the predicted size information according to the object central point information and the actual size information to obtain a second evaluation result;
and repeatedly training the target detection model according to the first evaluation result and the second evaluation result until the target detection model meets the expected requirement.
In a second aspect, there is provided a training apparatus for a target detection model, including:
the first information acquisition module is used for acquiring reference information of a reference frame corresponding to a target object in a training image, wherein the reference information comprises object center point information and actual size information of the target object, and the actual size information comprises distances from four sides of the reference frame to points corresponding to the object center point information;
the second information acquisition module is used for acquiring the prediction information of the prediction frame corresponding to the target object, which is obtained by analyzing the target detection model; the prediction information comprises prediction central point information and prediction size information corresponding to all sample points in a central area, the central area is a sub-area of the reference frame, the central point of the central area is the object central point, and the prediction size information comprises sample point information and distance information from four edges of the prediction frame to points corresponding to the sample point information;
the evaluation module is used for evaluating the predicted central point information according to the object central point information to obtain a first evaluation result, and evaluating the predicted size information according to the object central point information and the actual size information to obtain a second evaluation result;
and the repeated training module is used for repeatedly training the target detection model according to the first evaluation result and the second evaluation result until the target detection model meets the expected requirement.
In a third aspect, an electronic device is provided, including:
a processor; and
a memory arranged to store computer-executable instructions which, when executed, cause the processor to perform the method of training a target detection model according to the first aspect.
In a fourth aspect, there is provided a computer readable medium storing one or more programs which, when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform the method of training an object detection model according to the first aspect.
At least one technical scheme adopted in the embodiments of the application can achieve the following beneficial effects. The object center point and actual size referenced by the target object are obtained, together with the predicted center point and predicted size produced by analysis of the target detection model; the predicted center point is evaluated against the object center point, the predicted size is evaluated against the object center point and the actual size, the target detection model is optimized accordingly, and training is completed after repeated iterations. In evaluating the predicted size, a central region is determined and the sample points within it are obtained together with the distance information between each sample point and the prediction frame, yielding predicted size information that contains multiple groups of four-side distances of the prediction frame, from which the second evaluation result is calculated. Compared with the prior-art scheme of evaluating a single group of width-and-height data determined from the object center point, the data evaluated by the second evaluation result here comprises the distances from all sample points in the central region to the four sides of the prediction frame: the evaluated data volume is large and quickly acquired, so the convergence speed of the target detection model is increased, the number of training iterations is reduced, and the training speed is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a schematic flow chart diagram of a method of training a target detection model according to one embodiment of the present application;
FIG. 2 is a schematic flow chart diagram of a method of training a target detection model according to another embodiment of the present application;
FIG. 3 is a flow diagram of a method of training a target detection model according to one embodiment of the present application;
FIG. 4 is a schematic view of a central region of an embodiment of the present application;
FIG. 5 is a flow chart of a method of training a target detection model according to another embodiment of the present application;
FIG. 6 is a schematic diagram of a method of training a target detection model according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a training apparatus for an object detection model according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a training apparatus for a target detection model according to another embodiment of the present application.
With the foregoing drawings in mind, certain embodiments of the disclosure have been shown and described in more detail below. These drawings and written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the concepts of the disclosure to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Fig. 1 is a schematic flow chart of a training method of a target detection model according to an embodiment of the present application. As shown in fig. 1, the training method of the target detection model includes:
s102, acquiring reference information of a reference frame corresponding to a target object in the training image, wherein the reference information comprises object center point information and actual size information of the target object, and the actual size information comprises distances from four sides of the reference frame to points corresponding to the object center point information.
Specifically, the training image is an image used to train the target detection model, and the reference information is ground-truth information corresponding to the target object, used to evaluate the predictions of the target detection model. The reference information includes the object center point information and the actual size information of the target object, and the reference frame can be determined from the object center point and the actual size information. The object center point information and actual size information may be obtained by converting manually entered input. For example: the coordinates of the object center point and the four-side distance information corresponding to the actual size are entered manually, and the reference frame and its corresponding reference information are determined from them.
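The conversion between a center point with four-side distances and a corner-style frame can be sketched as follows; this is an illustrative sketch, and the function names are not from the patent:

```python
def box_from_center_distances(cx, cy, left, top, right, bottom):
    """Recover the corner coordinates (x1, y1, x2, y2) of a frame from its
    center point (cx, cy) and the distances from that point to the four
    sides, as in the reference information described above."""
    return (cx - left, cy - top, cx + right, cy + bottom)

def center_distances_from_box(x1, y1, x2, y2):
    """Inverse conversion: corner-style annotation to center point plus
    four-side distances (the geometric center is used here)."""
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    return cx, cy, cx - x1, cy - y1, x2 - cx, y2 - cy
```

Round-tripping an annotation through both functions returns the original values, which makes the encoding convenient to verify.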
Optionally, as an embodiment, in S102, acquiring reference information of a reference frame corresponding to a target object in a training image includes:
and S1022, obtaining labeling information corresponding to the target object, wherein the labeling information comprises marking information corresponding to the object center point information and the actual size information.
And S1024, converting the labeling information to obtain the reference information.
Specifically, the labeling information includes mark information corresponding to the object center point information and the actual size information, and the process of acquiring the reference information may include: and acquiring the marking information, converting the marking information corresponding to the object center point information in the marking information into object center point information, and converting the marking information corresponding to the actual size information in the marking information into actual size information to obtain reference information.
In this embodiment, the reference information is obtained by acquiring the labeling information and converting its encoding, which makes the reference information convenient to obtain. The labeling information may be produced by analysis with another detection model or annotated manually, and is likewise convenient to obtain.
The method shown in fig. 1 further comprises:
s104, obtaining the prediction information of a prediction frame corresponding to the target object, which is obtained through the analysis of the target detection model; the prediction information comprises prediction central point information and prediction size information corresponding to all sample points in the central area, the central area is a sub-area of the reference frame, the central point of the central area is an object central point, and the prediction size information comprises the sample point information and distance information from four edges of the prediction frame to the points corresponding to the sample point information.
Specifically, a training image is acquired and analyzed with the target detection model to obtain the prediction information of the prediction frame corresponding to the target object in the training image; the prediction information comprises the predicted center point information and predicted size information of the target object. The predicted size information includes the sample point information of each sample point and the distance information from the four sides of the prediction frame to the point corresponding to that sample point information. The predicted size information is obtained as follows: a central region is determined from the reference frame, with the object center point as its center point (an elliptical region is preferred); the sample point information is then obtained, the distance information for every sample point is calculated with respect to the four sides of the prediction frame, and all sample points in the central region are combined with their corresponding distance information.
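An elliptical central region as described above might be constructed as follows. The shrink ratio (how much smaller the ellipse is than the reference frame) is an assumed hyperparameter, not specified by the patent text:

```python
import numpy as np

def central_region_mask(x1, y1, x2, y2, shrink=0.5, h=16, w=16):
    """Boolean mask over an h x w feature grid marking the points that lie
    inside an elliptical central region of the reference frame
    (x1, y1, x2, y2). The ellipse is centered on the object center point,
    with semi-axes equal to `shrink` times the half-width and half-height
    of the frame; `shrink` is an assumption for illustration."""
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    a = shrink * (x2 - x1) / 2.0
    b = shrink * (y2 - y1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    return ((xs - cx) / a) ** 2 + ((ys - cy) / b) ** 2 <= 1.0
```

Every grid point where the mask is true is a sample point of the central region; the remaining points inside the reference frame do not contribute to the second evaluation.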
Optionally, as an embodiment, in S104, obtaining prediction information of a prediction frame corresponding to the target object, which is obtained through analysis by the target detection model, includes:
and S1042, acquiring a training image.
And S1044, analyzing the training image through the target detection model to obtain the prediction information.
Specifically, the process of obtaining the prediction information may include: and acquiring a training image, and analyzing the training image by using a target detection model to acquire prediction information corresponding to the prediction frame, wherein the prediction information comprises prediction central point information and prediction size information.
In this embodiment, the target detection model is used to obtain the prediction information of the prediction frame, the prediction information is evaluated against the reference information, and the accuracy of the target detection model can be determined by comparing the two.
Optionally, as an embodiment, the method shown in fig. 1 further includes: and determining the target sample point as a sample point in a target reference frame, wherein the target sample point exists in at least two reference frames corresponding to different target objects at the same time, and the target reference frame is a small-size reference frame in the at least two reference frames corresponding to different target objects.
Specifically, when identifying target objects in an image, two target objects may be adjacent, in which case their sample points may overlap. An overlapping sample point is defined as a sample point of the smaller-sized reference frame; that is, when a single sample point falls inside two reference frames corresponding to different target objects at the same time, it is assigned to the smaller of the two.
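The assignment rule above can be sketched as a small helper; the function name is illustrative:

```python
def assign_ambiguous_point(boxes):
    """Given the reference frames (x1, y1, x2, y2) that all contain a single
    ambiguous sample point, return the index of the frame the point should
    be assigned to: per the rule above, the smallest-area frame."""
    areas = [(x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes]
    return areas.index(min(areas))
```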
The method shown in fig. 1 further comprises:
and S106, evaluating and predicting the central point information according to the object central point information to obtain a first evaluation result, and evaluating and predicting the size information according to the object central point information and the actual size information to obtain a second evaluation result.
Specifically, the predicted center point is evaluated according to the object center point to obtain a first evaluation result, and the predicted size is evaluated according to the object center point and the actual size to obtain a second evaluation result.
It should be understood that, for a prediction frame of one target object, the information corresponding to that prediction frame is divided into predicted center point information and predicted size information, and the predicted center point and predicted size are evaluated separately.
In the evaluation process of the predicted central point, one prediction frame may correspond to one predicted central point, and the first evaluation result may be obtained according to the distance between the predicted central point and the object central point, that is, the closer the distance between the predicted central point and the object central point is, the better the first evaluation result is.
In evaluating the predicted size, a central region is determined and the sample points within it are obtained together with their associated distance information, yielding predicted size information that comprises multiple groups of four-side distances of the prediction frame, where each group corresponds to a different sample point. The predicted size information comprising these multiple groups of data is then evaluated according to the object center point and the actual size. Given the position of a sample point, the distances between that point and the four sides of the prediction frame are easy to obtain, so the predicted size information can be acquired conveniently and quickly.
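The per-sample-point four-side distances can be computed in one vectorized step, sketched below; the names are illustrative, not from the patent:

```python
import numpy as np

def size_targets(sample_points, box):
    """For each sample point (x, y), compute the four distances
    (left, top, right, bottom) to the sides of a frame (x1, y1, x2, y2).
    Applied to the reference frame this yields the regression targets;
    applied to a prediction frame it yields the predicted size
    information described above. Returns an (N, 4) array."""
    x1, y1, x2, y2 = box
    pts = np.asarray(sample_points, dtype=float)
    xs, ys = pts[:, 0], pts[:, 1]
    return np.stack([xs - x1, ys - y1, x2 - xs, y2 - ys], axis=1)
```

Because the computation is four subtractions per point, the size data for all sample points in the central region is obtained essentially for free once the frame is known.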
And S108, repeatedly training the target detection model according to the first evaluation result and the second evaluation result until the target detection model meets the expected requirement.
Specifically, according to the first evaluation result and the second evaluation result, the loss of the target detection model is determined, and the training is repeated until the target detection model meets the expected requirement.
In this embodiment, the object center point and actual size referenced by the target object are obtained, together with the predicted center point and predicted size produced by analysis of the target detection model; the predicted center point is evaluated against the object center point, the predicted size is evaluated against the object center point and the actual size, the target detection model is optimized accordingly, and training is completed after repeated iterations. In evaluating the predicted size, a central region is determined and the sample points within it are obtained together with the distance information between each sample point and the prediction frame, yielding predicted size information that contains multiple groups of four-side distances of the prediction frame, from which the second evaluation result is calculated. Compared with the prior-art scheme of evaluating a single group of width-and-height data determined from the object center point, the data evaluated by the second evaluation result here comprises the distances from all sample points in the central region to the four sides of the prediction frame; the evaluated data volume is large and quickly acquired, so the convergence speed of the target detection model is increased, the number of training iterations is reduced, and the training speed is improved.
Optionally, as an embodiment, the second evaluation result is related to a distance between the sample point and a point corresponding to the object center point information.
Specifically, the second evaluation result evaluates the size of the prediction frame predicted by the target detection model against the actual size of the target object, and it is related to the distance between each sample point and the object center point: the closer a sample point is to the object center point, the greater its weight in the second evaluation result, i.e. the greater the influence of that sample point's size data on the second evaluation result. This improves the accuracy of the evaluation result, so training of the target detection model can be completed more quickly.
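The patent states only that sample points closer to the object center point carry more weight; one way this weighting might look is a Gaussian falloff combined with a per-point L1 size error, sketched below. The Gaussian form and the sigma value are assumptions for illustration:

```python
import numpy as np

def weighted_size_loss(pred, target, points, center, sigma=2.0):
    """Second-evaluation sketch: per-sample-point L1 error over the four
    side distances, weighted so that sample points near the object center
    contribute more. `pred` and `target` are (N, 4) arrays of four-side
    distances, `points` the (N, 2) sample-point coordinates, `center` the
    object center point. The Gaussian weight is an assumed choice; the
    patent only states that weight grows near the object center point."""
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    pts, c = np.asarray(points, float), np.asarray(center, float)
    d2 = ((pts - c) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))      # closer point -> larger weight
    err = np.abs(pred - target).mean(axis=1)  # per-point L1 over the 4 sides
    return float((w * err).sum() / w.sum())
```

With this weighting, the same prediction error at a point near the object center raises the loss more than it would at a point near the edge of the central region.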
Optionally, as an embodiment, the first evaluation result is related to a distance between the predicted central point and a point corresponding to the object central point information.
Specifically, the first evaluation result evaluates the predicted center point of the prediction frame against the actual center point of the target object, and it is related to the distance between the predicted center point and the object center point: the smaller that distance, the better the first evaluation result, so that training of the target detection model can be completed more quickly.
Optionally, as an embodiment, fig. 2 is a schematic flowchart of a training method of a target detection model according to another embodiment of the present application. As shown in fig. 2, the method shown in fig. 1 further includes:
and S110, acquiring an image to be analyzed.
And S112, inputting the image to be analyzed into the trained target detection model to obtain a positioning central point and a plurality of analysis sizes, wherein the plurality of analysis sizes correspond to the plurality of analysis central points respectively.
Specifically, when applying the target detection model, an image to be analyzed is acquired and input into the trained model. On one hand, the positioning center point information of the target object is obtained; on the other hand, the analysis size information corresponding to multiple target objects is obtained; these two steps can be executed separately. Obtaining the positioning center point information may include: analyzing the center-point feature map of the image to be analyzed to obtain multiple pieces of center point information, then de-duplicating them to determine the positioning center point information, which corresponds to the positioning center point. Obtaining the analysis size information may include: analyzing the size feature map of the image to be analyzed to obtain multiple pieces of analysis size information, each corresponding to an analysis size.
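The de-duplication of candidate center points is not spelled out in the text; one common realization is local-maximum suppression over the center-point feature map, sketched below under assumed window-size and threshold hyperparameters:

```python
import numpy as np

def locate_center_points(heatmap, k=3, thresh=0.5):
    """De-duplicate candidate center points on a center-point feature map:
    keep only points that are the maximum within a k x k window and whose
    score is at least `thresh`. Returns a list of (x, y) positions.
    Window size and threshold are assumptions for illustration."""
    h, w = heatmap.shape
    pad = k // 2
    padded = np.pad(heatmap, pad, mode="constant", constant_values=-np.inf)
    # k x k maximum filter built from shifted views of the padded map
    local_max = np.max(
        [padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)],
        axis=0,
    )
    ys, xs = np.where((heatmap == local_max) & (heatmap >= thresh))
    return list(zip(xs.tolist(), ys.tolist()))
```

Neighboring responses belonging to the same object are suppressed because only the strongest point in each window survives, leaving one positioning center point per target object.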
S114, screening a plurality of analysis sizes to obtain a target size based on the positioning center point, wherein the analysis center point corresponding to the target size is related to the positioning center point.
Specifically, after the positioning center point is determined, a target size is screened out of the plurality of analysis sizes according to the positioning center point. The analysis center point corresponding to the target size is the positioning center point.
And S116, combining the positioning central point and the target size to obtain a positioning result.
Optionally, as an embodiment, the method shown in fig. 1 further includes:
s118, converting the positioning result into an image result, wherein the image result comprises a positioning frame corresponding to the target object in the image to be analyzed; or
And converting the positioning result into an information result of the same type as the labeling information.
Specifically, the positioning center point and the target size are combined to obtain the positioning result. The positioning result can then be converted into an image result, which is convenient for a user to observe and recognize; or it can be converted into an information result, such as vector information, which is convenient for application processing in other respects. The information result may also be mark information comprising the center point and size of the corresponding target object, through which the data can be processed further.
The following further explains the training process of the target detection model with reference to fig. 3, and as shown in fig. 3, the training process of the target detection model mainly includes the following steps:
s301, a training image and annotation information are obtained, wherein the annotation information comprises a central point and size information of a reference frame corresponding to a target object in the training image.
And S302, acquiring reference information. And converting the labeling information into reference information, wherein the reference information comprises object center point information and actual size information of the target object.
And S303, analyzing the training image through the target detection model to obtain the prediction information. The prediction information includes a predicted center point and predicted size information of the target object. The predicted size information includes all sample point information within the central region determined from the reference information, and the size information corresponding to the sample point information.
The determination process of the central region is shown in fig. 4: first, a reference frame is determined according to the labeling information; then, the object center point corresponding to the determined reference frame is selected, and a central region centered on that object center point is determined. All points within the central region are sample points, and each sample point has a set of size information from the prediction frame.
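A minimal sketch of this central-region determination, under the assumption (not fixed by the text) that the central region is the reference frame shrunk by a constant factor about the object center point:

```python
import math

def central_region_samples(ref_box, shrink=0.5):
    """Return all integer sample points inside the central region of a
    reference frame (x1, y1, x2, y2), taken here to be the frame shrunk
    by `shrink` about its center point; shrink is an assumed choice."""
    x1, y1, x2, y2 = ref_box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * shrink / 2.0
    half_h = (y2 - y1) * shrink / 2.0
    xs = range(math.ceil(cx - half_w), math.floor(cx + half_w) + 1)
    ys = range(math.ceil(cy - half_h), math.floor(cy + half_h) + 1)
    return [(x, y) for y in ys for x in xs]

samples = central_region_samples((0, 0, 8, 8))
```

Each such sample point would carry one set of predicted size information from the prediction frame.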
S304, evaluating the predicted central point information in the predicted information according to the object central point information in the reference information to obtain a first evaluation result; and evaluating the predicted size information in the predicted information according to the object center point information and the actual size information in the reference information to obtain a second evaluation result.
Wherein the first evaluation result can be obtained by formula A. L_{loc} is defined to characterize the degree of closeness between the predicted center point and the actual center point of the target object; the larger the value of L_{loc}, the lower the proximity of the predicted center point to the actual center point.

Formula A:

L_{loc} = -\frac{1}{M} \sum_{i,j,c} \begin{cases} (1 - H_{ijc})^{\alpha} \log(H_{ijc}), & \hat{H}_{ijc} = 1 \\ (1 - \hat{H}_{ijc})^{\beta} (H_{ijc})^{\alpha} \log(1 - H_{ijc}), & \text{otherwise} \end{cases}

In formula A, H is the predicted feature and \hat{H} is the actual center point target; i and j represent the spatial dimensions of the features, c represents the channel dimension, and different channel dimensions represent different categories of target objects; M represents the total number of positive samples. The input image is an image of H × W × N, where N is the number of channels of the input image; H_{ijc} and \hat{H}_{ijc} are entries of h × w × c matrices, with H and W indicating the resolution of the input image and h and w indicating the resolution of the feature image; both α and β are constants.
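A numerical sketch consistent with the description of formula A (a hedged illustration, not the patent's exact implementation; the eps guard is an added numerical safeguard, and the example heatmaps are hypothetical):

```python
import numpy as np

def formula_a(H, H_hat, alpha=2.0, beta=4.0):
    """First evaluation result L_loc: positive positions (H_hat == 1) are
    penalised for a low predicted response; all other positions for a high
    response, down-weighted by (1 - H_hat)**beta near true center points."""
    pos = H_hat == 1.0
    M = max(int(pos.sum()), 1)          # total number of positive samples
    eps = 1e-12                         # numerical guard against log(0)
    pos_term = ((1.0 - H[pos]) ** alpha * np.log(H[pos] + eps)).sum()
    neg_term = ((1.0 - H_hat[~pos]) ** beta * H[~pos] ** alpha
                * np.log(1.0 - H[~pos] + eps)).sum()
    return -(pos_term + neg_term) / M
```

As the prose states, the worse the predicted center point matches the actual center point, the larger L_loc becomes.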
The second evaluation result can be obtained by formula B. L_{reg} is defined to characterize the proximity between the predicted size and the actual size of the target object.

Formula B:

L_{reg} = \frac{1}{N_{reg}} \sum_{(i,j) \in A} W_{ij} \, G(B_{ij}, \hat{B}_{m})

In formula B, N_{reg} is the number of all samples; m represents the number of the labeled frame corresponding to the sub-region in which position (i, j) is located; W represents the weight corresponding to the sample; G is a distance function for calculating the degree of coincidence between the predicted frame and the actual frame; B_{ij} is the sample point position and the corresponding predicted size, the predicted size being the set of distances from the four borders of the predicted frame to the sample point; and \hat{B}_{m} is the object center point position of the target object and the actual size, the actual size being the set of distances from the four borders of the reference frame to the object center point.
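The patent leaves the distance function G abstract; one plausible concrete choice for "the degree of coincidence between the predicted frame and the actual frame" is 1 - IoU (an assumption, not the patent's stated choice). The sketch below also shows decoding a (sample point, four-border distances) pair into a box:

```python
def decode_box(point, dists):
    """Decode a (sample point, distances to the four borders) pair
    into an axis-aligned box (x1, y1, x2, y2)."""
    y, x = point
    left, top, right, bottom = dists
    return (x - left, y - top, x + right, y + bottom)

def g_distance(pred_box, ref_box):
    """A possible G: 1 - IoU, so perfectly coincident frames give 0
    and disjoint frames give 1."""
    ix1 = max(pred_box[0], ref_box[0])
    iy1 = max(pred_box[1], ref_box[1])
    ix2 = min(pred_box[2], ref_box[2])
    iy2 = min(pred_box[3], ref_box[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(pred_box) + area(ref_box) - inter
    return 1.0 - inter / union
```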
The weight W of a sample is given by formula C:

W_{ij} = \begin{cases} F(n_{m}) \cdot K_{m}(i, j), & (i, j) \in A_{m} \\ 0, & (i, j) \notin A \end{cases}

where n_{m} is a measure reflecting the size of the labeled frame numbered m; the function F may be a linear function or a decaying function, such as a logarithm or a square root; A_{m} is the sub-region generated by the labeled frame numbered m; A represents the union of the sub-regions generated by all the labeled frames in the image; and the function K is a measure reflecting the distance from the sample (i, j) to the center of the labeled frame numbered m: the larger the distance, the smaller the result of the function K. Because the samples in the sub-regions generated by large objects far outnumber those in the sub-regions generated by small objects, the loss produced by small objects is easily drowned out by that of large objects during training. The sample weight W is therefore introduced as an adjustment, which reduces the probability that the loss produced by small objects is drowned out and improves training speed and accuracy.
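A sketch of formula C with assumed choices: F = log, n_m = the area of the labeled frame, K = an inverse-distance measure, and, as an added assumption beyond the prose, K normalised within each sub-region so that every labeled frame contributes the same total weight regardless of how many samples it covers:

```python
import numpy as np

def sample_weights(boxes, shape):
    """Formula C sketch: weight each sample inside a labeled frame's
    sub-region A_m by a center-distance measure K (larger distance,
    smaller K), scaled by F(n_m) = log(area_m)."""
    W = np.zeros(shape)
    for x1, y1, x2, y2 in boxes:      # here A_m is taken as the whole box
        cy, cx = (y1 + y2 - 1) / 2.0, (x1 + x2 - 1) / 2.0
        ys, xs = np.mgrid[y1:y2, x1:x2]
        K = 1.0 / (1.0 + np.hypot(ys - cy, xs - cx))
        W[ys, xs] = np.log((x2 - x1) * (y2 - y1)) * K / K.sum()
    return W

# One small (2x2) and one large (4x4) labeled frame on a 10x10 grid
W = sample_weights([(0, 0, 2, 2), (4, 4, 8, 8)], (10, 10))
```

Under this normalisation the few samples of a small frame each carry a larger individual weight than the many samples of a large frame, matching the stated motivation.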
S305, determining whether the first evaluation result and the second evaluation result meet the expected requirements. If so, the training is finished; if not, the target detection model is optimized and the process returns to S301 to repeat the training with the optimized target detection model.
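The S301 to S305 loop can be sketched as follows; ToyModel is a hypothetical stand-in for the target detection model, and the threshold tol stands in for "the expected requirements":

```python
class ToyModel:
    """Hypothetical stand-in for the target detection model: its two
    evaluation results simply halve on every optimization step."""
    def __init__(self):
        self.l_loc, self.l_reg = 1.0, 1.0

    def evaluate(self, images, labels):
        return self.l_loc, self.l_reg

    def optimize(self):
        self.l_loc *= 0.5
        self.l_reg *= 0.5

def train(model, images, labels, tol=0.01, max_iters=100):
    """S301-S305: evaluate; if both evaluation results meet the expected
    requirement, finish; otherwise optimize and return to S301."""
    for _ in range(max_iters):
        l_loc, l_reg = model.evaluate(images, labels)  # S303-S304
        if l_loc < tol and l_reg < tol:                # S305: requirements met
            break
        model.optimize()                               # optimize, then repeat
    return model

model = train(ToyModel(), images=None, labels=None)
```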
The following further explains the analysis process of the trained object detection model with reference to fig. 5 and 6, and as shown in fig. 5 and 6, the analysis process of the trained object detection model mainly includes the following steps:
S501, obtaining an image to be analyzed.
And S502, acquiring a positioning center point. The image to be analyzed is input into the target detection model, which analyzes the center point features of the image to be analyzed and determines all center points possibly corresponding to the target object. One target object can generate responses at a plurality of positions, and the closer a position is to the center point of the target object, the larger its response; the responses are therefore deduplicated to obtain the center position of the target object. That is, after deduplication over all the center points, the positioning center point corresponding to the target object is obtained.
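One plausible realisation of this deduplication is to keep only the locations that are local maxima of the center-response heatmap; the 3x3 neighbourhood and the response threshold are assumed choices, not fixed by the text:

```python
import numpy as np

def dedup_centers(heatmap, threshold=0.1):
    """Deduplicate the multiple responses a single target object generates:
    keep only the locations whose response is the maximum of their 3x3
    neighbourhood and exceeds an assumed threshold."""
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, constant_values=-np.inf)
    centers = []
    for y in range(h):
        for x in range(w):
            v = heatmap[y, x]
            if v >= threshold and v == padded[y:y + 3, x:x + 3].max():
                centers.append((y, x))
    return centers

# Hypothetical heatmap: one object responding at several nearby positions
hm = np.zeros((5, 5))
hm[2, 2], hm[2, 3], hm[1, 1] = 0.9, 0.5, 0.3
```

Only the strongest response nearest the object center survives as the positioning center point.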
S503, acquiring a plurality of analysis sizes. The target detection model analyzes the size characteristics of the image to be analyzed and obtains a plurality of analysis sizes.
And S504, determining a target size corresponding to the positioning center point from the plurality of analysis sizes. And screening a target size from the plurality of analysis sizes according to the positioning central point, wherein the central point corresponding to the target size is the positioning central point.
And S505, combining the positioning central point and the target size to obtain a positioning result. The positioning result includes a positioning frame corresponding to the target object.
An electronic device according to an embodiment of the present application will be described in detail below with reference to fig. 7. Referring to fig. 7, at the hardware level, the electronic device includes a processor and, optionally, an internal bus, a network interface, and a memory. The memory may include volatile memory, such as random-access memory (RAM), and may further include non-volatile memory, such as at least one disk storage. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be interconnected by an internal bus, which may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one double-headed arrow is shown in fig. 7, but this does not indicate only one bus or one type of bus.
And the memory is used for storing a program. Specifically, the program may include program code, the program code including computer operating instructions. The memory may include both volatile memory and non-volatile storage, and provides instructions and data to the processor.
The processor reads a corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to form the training device of the target detection model on a logic level. The processor is used for executing the program stored in the memory and is specifically used for executing the following operations:
acquiring reference information of a reference frame corresponding to a target object in a training image, wherein the reference information comprises object center point information and actual size information of the target object, and the actual size information comprises distances from four edges of the reference frame to points corresponding to the object center point information;
acquiring prediction information of a prediction frame corresponding to a target object, which is obtained through analysis of a target detection model; the prediction information comprises prediction central point information and prediction size information corresponding to all sample points in a central area, the central area is a sub-area of a reference frame, the central point of the central area is an object central point, and the prediction size information comprises the sample point information and distance information from four edges of the prediction frame to points corresponding to the sample point information;
evaluating the predicted center point information according to the object center point information to obtain a first evaluation result, and evaluating the predicted size information according to the object center point information and the actual size information to obtain a second evaluation result;
and repeatedly training the target detection model according to the first evaluation result and the second evaluation result until the target detection model meets the expected requirement.
The method for training the target detection model disclosed in the embodiment of fig. 1 of the present application may be applied to or implemented by a processor. The processor may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may thus be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly in a hardware decoding processor, or implemented by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as random-access memory, flash memory, read-only memory, programmable or electrically erasable programmable memory, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
Of course, besides the software implementation, the electronic device of the present application does not exclude other implementations, such as a logic device or a combination of software and hardware, and the like, that is, the execution subject of the following processing flow is not limited to each logic unit, and may also be hardware or a logic device.
Embodiments of the present application also provide a computer-readable storage medium storing one or more programs, where the one or more programs include instructions, which, when executed by an electronic device including a plurality of application programs, enable the electronic device to perform the method of the embodiment shown in fig. 1, and are specifically configured to perform the following method:
acquiring reference information of a reference frame corresponding to a target object in a training image, wherein the reference information comprises object center point information and actual size information of the target object, and the actual size information comprises distances from four edges of the reference frame to points corresponding to the object center point information;
acquiring prediction information of a prediction frame corresponding to a target object, which is obtained through analysis of a target detection model; the prediction information comprises prediction central point information and prediction size information corresponding to all sample points in a central area, the central area is a sub-area of a reference frame, the central point of the central area is an object central point, and the prediction size information comprises the sample point information and distance information from four edges of the prediction frame to points corresponding to the sample point information;
evaluating the predicted center point information according to the object center point information to obtain a first evaluation result, and evaluating the predicted size information according to the object center point information and the actual size information to obtain a second evaluation result;
and repeatedly training the target detection model according to the first evaluation result and the second evaluation result until the target detection model meets the expected requirement.
Fig. 8 is a schematic structural diagram of a training apparatus for an object detection model according to an embodiment of the present application. Referring to fig. 8, in a software implementation, the training apparatus 800 for the object detection model may include: a first information acquisition module 802, a second information acquisition module 804, an evaluation module 806, and a repetitive training module 808, wherein,
a first information obtaining module 802, configured to obtain reference information of a reference frame corresponding to a target object in a training image, where the reference information includes object center point information and actual size information of the target object, and the actual size information includes distances from four sides of the reference frame to points corresponding to the object center point information;
a second information obtaining module 804, configured to obtain prediction information of a prediction frame corresponding to the target object, obtained through analysis by the target detection model; the prediction information comprises prediction central point information and prediction size information corresponding to all sample points in a central area, the central area is a sub-area of a reference frame, the central point of the central area is an object central point, and the prediction size information comprises the sample point information and distance information from four edges of the prediction frame to points corresponding to the sample point information;
the evaluation module 806 is configured to evaluate the predicted center point information according to the object center point information to obtain a first evaluation result, and to evaluate the predicted size information according to the object center point information and the actual size information to obtain a second evaluation result;
and the repeated training module 808 is configured to repeatedly train the target detection model according to the first evaluation result and the second evaluation result until the target detection model meets an expected requirement.
Optionally, as an embodiment, the second evaluation result is related to a distance between the sample point and a point corresponding to the object center point information.
Optionally, as an embodiment, the first evaluation result is related to a distance between a point corresponding to the predicted central point information and a point corresponding to the object central point information.
Optionally, as an embodiment, as shown in fig. 9, the training apparatus 800 of the target detection model further includes:
an input module 810, configured to obtain an image to be analyzed;
an analysis module 812, configured to input the image to be analyzed into the trained target detection model to obtain a positioning center point and a plurality of analysis sizes, where the plurality of analysis sizes correspond to the plurality of analysis center points, respectively;
a screening module 814, configured to screen a plurality of analysis sizes based on the positioning center point to obtain a target size, where the analysis center point corresponding to the target size is related to the positioning center point;
and an output module 816, configured to combine the positioning center point and the target size to obtain a positioning result.
Optionally, as an embodiment, the first information obtaining module 802 includes:
the first acquisition unit is used for acquiring marking information corresponding to the target object, and the marking information comprises marking information corresponding to the object center point information and the actual size information;
and the conversion unit is used for converting the marking information to obtain the reference information.
Optionally, as an embodiment, the method further includes:
the output unit is used for converting the positioning result into an image result, and the image result comprises a positioning frame corresponding to the target object in the image to be analyzed; or
And converting the positioning result into an information result of the same type as the labeling information.
Optionally, as an embodiment, the second information obtaining module 804 includes:
a second acquisition unit for acquiring a training image;
and the analysis unit is used for analyzing the training image through the target detection model to obtain the prediction information.
It should be understood that the above and other operations and/or functions of each unit in the training apparatus for the target detection model according to the embodiment of the present application are respectively for implementing the corresponding flow of the method in fig. 1, and are not described herein again for brevity.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. A method for training a target detection model, comprising:
acquiring reference information of a reference frame corresponding to a target object in a training image, wherein the reference information comprises object center point information and actual size information of the target object, and the actual size information comprises distances from four sides of the reference frame to points corresponding to the object center point information;
acquiring prediction information of a prediction frame corresponding to the target object, which is obtained through analysis of a target detection model; the prediction information comprises prediction central point information and prediction size information corresponding to all sample points in a central area, the central area is a sub-area of the reference frame, the central point of the central area is the object central point, and the prediction size information comprises sample point information and distance information from four edges of the prediction frame to points corresponding to the sample point information;
evaluating the predicted central point information according to the object central point information to obtain a first evaluation result, and evaluating the predicted size information according to the object central point information and the actual size information to obtain a second evaluation result;
and repeatedly training the target detection model according to the first evaluation result and the second evaluation result until the target detection model meets the expected requirement.
2. The method according to claim 1, wherein the second evaluation result is related to a distance of the sample point from a point corresponding to the object center point information.
3. The method according to claim 1, wherein the first evaluation result is related to a distance of a point corresponding to the predicted center point information from a point corresponding to the object center point information.
4. The method according to any one of claims 1-3, further comprising:
acquiring an image to be analyzed;
inputting the image to be analyzed into a trained target detection model to obtain a positioning central point and a plurality of analysis sizes, wherein the plurality of analysis sizes correspond to the plurality of analysis central points respectively;
screening the plurality of analysis sizes to obtain a target size based on the positioning center point, wherein the analysis center point corresponding to the target size is related to the positioning center point;
and combining the positioning central point and the target size to obtain a positioning result.
5. The method according to claim 4, wherein the obtaining of the reference information of the reference frame corresponding to the target object in the training image comprises:
acquiring marking information corresponding to the target object, wherein the marking information comprises marking information corresponding to the object center point information and the actual size information;
and converting the labeling information to obtain the reference information.
6. The method of claim 5, further comprising:
converting the positioning result into an image result, wherein the image result comprises a positioning frame corresponding to a target object in the image to be analyzed; or
And converting the positioning result into an information result of the same type as the labeling information.
7. The method of claim 5, wherein the obtaining of the prediction information of the prediction box corresponding to the target object obtained through the analysis of the target detection model comprises:
acquiring the training image;
and analyzing the training image through the target detection model to obtain the prediction information.
8. An apparatus for training an object detection model, comprising:
the first information acquisition module is used for acquiring reference information of a reference frame corresponding to a target object in a training image, wherein the reference information comprises object center point information and actual size information of the target object, and the actual size information comprises distances from four sides of the reference frame to points corresponding to the object center point information;
the second information acquisition module is used for acquiring the prediction information of the prediction frame corresponding to the target object, which is obtained by analyzing the target detection model; the prediction information comprises prediction central point information and prediction size information corresponding to all sample points in a central area, the central area is a sub-area of the reference frame, the central point of the central area is the object central point, and the prediction size information comprises sample point information and distance information from four edges of the prediction frame to points corresponding to the sample point information;
the evaluation module is used for evaluating the predicted central point information according to the object central point information to obtain a first evaluation result, and evaluating the predicted size information according to the object central point information and the actual size information to obtain a second evaluation result;
and the repeated training module is used for repeatedly training the target detection model according to the first evaluation result and the second evaluation result until the target detection model meets the expected requirement.
9. An electronic device, comprising:
a processor; and
a memory arranged to store computer executable instructions that when executed perform a method of training an object detection model as claimed in any one of claims 1 to 7 using the processor.
10. A computer-readable medium, characterized in that the computer-readable medium stores one or more programs which, when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform the method of training an object detection model according to any one of claims 1-7.
CN201910801587.9A 2019-08-28 2019-08-28 Training method and device of target detection model and electronic equipment Pending CN110598764A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910801587.9A CN110598764A (en) 2019-08-28 2019-08-28 Training method and device of target detection model and electronic equipment

Publications (1)

Publication Number Publication Date
CN110598764A true CN110598764A (en) 2019-12-20

Family

ID=68855959

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910801587.9A Pending CN110598764A (en) 2019-08-28 2019-08-28 Training method and device of target detection model and electronic equipment

Country Status (1)

Country Link
CN (1) CN110598764A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111175973A (en) * 2019-12-31 2020-05-19 Oppo广东移动通信有限公司 Head band adjusting method and adjusting device, computer storage medium and head-mounted equipment
CN111402335A (en) * 2020-03-18 2020-07-10 东软睿驰汽车技术(沈阳)有限公司 Evaluation method and device for deep learning model, electronic equipment and storage medium
CN111508019A (en) * 2020-03-11 2020-08-07 上海商汤智能科技有限公司 Target detection method, training method of model thereof, and related device and equipment
CN113128308A (en) * 2020-01-10 2021-07-16 中南大学 Pedestrian detection method, device, equipment and medium in port scene
WO2021143231A1 (en) * 2020-01-17 2021-07-22 初速度(苏州)科技有限公司 Target detection model training method, and data labeling method and apparatus
CN113283307A (en) * 2021-04-30 2021-08-20 北京雷石天地电子技术有限公司 Method and system for identifying object in video and computer storage medium
CN113658273A (en) * 2021-08-19 2021-11-16 上海新氦类脑智能科技有限公司 Scene self-adaptive target positioning method and system based on spatial perception
CN114429678A (en) * 2022-01-28 2022-05-03 北京百度网讯科技有限公司 Model training method and device, electronic device and medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XINGYI ZHOU et al.: "Objects as Points", arXiv:1904.07850v2 [cs.CV] *
ZJULEARNING: "ttfnet", https://github.com/zjulearning/ttfnet/blob/b56530f6d4e18c4d036faaebaa9c9fca2c906726 *
ZJULEARNING: "ttfnet", https://github.com/zjulearning/ttfnet/tree/b56530f6d4e18c4d036faaebaa9c9fca2c906726/tools/test.py *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111175973A (en) * 2019-12-31 2020-05-19 Oppo广东移动通信有限公司 Head band adjusting method and adjusting device, computer storage medium and head-mounted equipment
CN113128308B (en) * 2020-01-10 2022-05-20 中南大学 Pedestrian detection method, device, equipment and medium in port scene
CN113128308A (en) * 2020-01-10 2021-07-16 中南大学 Pedestrian detection method, device, equipment and medium in port scene
WO2021143231A1 (en) * 2020-01-17 2021-07-22 初速度(苏州)科技有限公司 Target detection model training method, and data labeling method and apparatus
CN111508019A (en) * 2020-03-11 2020-08-07 上海商汤智能科技有限公司 Target detection method, training method of model thereof, and related device and equipment
WO2021179498A1 (en) * 2020-03-11 2021-09-16 上海商汤智能科技有限公司 Target detection method, method for training model therefor, apparatus, and electronic device
JP2022529838A (en) * 2020-03-11 2022-06-24 シャンハイ センスタイム インテリジェント テクノロジー カンパニー リミテッド Target detection methods and model training methods, equipment and electronic devices
CN111402335A (en) * 2020-03-18 2020-07-10 东软睿驰汽车技术(沈阳)有限公司 Evaluation method and device for deep learning model, electronic equipment and storage medium
CN111402335B (en) * 2020-03-18 2023-07-28 东软睿驰汽车技术(沈阳)有限公司 Evaluation method and device of deep learning model, electronic equipment and storage medium
CN113283307A (en) * 2021-04-30 2021-08-20 北京雷石天地电子技术有限公司 Method and system for identifying object in video and computer storage medium
CN113658273A (en) * 2021-08-19 2021-11-16 上海新氦类脑智能科技有限公司 Scene self-adaptive target positioning method and system based on spatial perception
CN113658273B (en) * 2021-08-19 2024-04-26 上海新氦类脑智能科技有限公司 Scene self-adaptive target positioning method and system based on space perception
CN114429678A (en) * 2022-01-28 2022-05-03 北京百度网讯科技有限公司 Model training method and device, electronic device and medium

Similar Documents

Publication Publication Date Title
CN110598764A (en) Training method and device of target detection model and electronic equipment
CN109117831B (en) Training method and device of object detection network
CN107545262B (en) Method and device for detecting text in natural scene image
US11409992B2 (en) Data slicing for machine learning performance testing and improvement
US9349189B2 (en) Occlusion resistant image template matching using distance transform
CN111523459B (en) Remote sensing image bare area identification method and device, electronic equipment and storage medium
CN112634209A (en) Product defect detection method and device
JP2021508135A (en) Analysis of captured images to determine inspection conclusions
CN111737522A (en) Video matching method, and block chain-based infringement evidence-saving method and device
CN110175609B (en) Interface element detection method, device and equipment
CN112884055A (en) Target labeling method and target labeling device
CN112634201B (en) Target detection method and device and electronic equipment
CN116681957B (en) Image recognition method based on artificial intelligence and computer equipment
CN112884054B (en) Target labeling method and target labeling device
WO2020047316A1 (en) System and method for training a damage identification model
US20190349577A1 (en) Method and device for detecting digital camera
CN112199268A (en) Software compatibility testing method and electronic equipment
CN111523558A (en) Ship shielding detection method and device based on electronic purse net and electronic equipment
Lee et al. Neural geometric parser for single image camera calibration
CN113255671B (en) Target detection method, system, device and medium for object with large length-width ratio
WO2017164239A1 (en) Object counting device, object counting method, and computer readable recording medium
CN111553184A (en) Small target detection method and device based on electronic purse net and electronic equipment
CN111507135A (en) Face detection method and device, computer equipment and storage medium
CN110298302A (en) A kind of human body target detection method and relevant device
CN113743618A (en) Time series data processing method and device, readable medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2019-12-20