CN116310656A - Training sample determining method and device and computer equipment


Publication number
CN116310656A
Authority
CN
China
Prior art keywords
image
training
sample set
negative
positive
Prior art date
Legal status
Granted
Application number
CN202310528868.8A
Other languages
Chinese (zh)
Other versions
CN116310656B (en)
Inventor
夏烨烽
王健
王靖博
安阳
钱鑫明
卢仁建
Current Assignee
Freetech Intelligent Systems Co Ltd
Original Assignee
Freetech Intelligent Systems Co Ltd
Priority date
Filing date
Publication date
Application filed by Freetech Intelligent Systems Co Ltd filed Critical Freetech Intelligent Systems Co Ltd
Priority to CN202310528868.8A priority Critical patent/CN116310656B/en
Publication of CN116310656A publication Critical patent/CN116310656A/en
Application granted granted Critical
Publication of CN116310656B publication Critical patent/CN116310656B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 - Proximity, similarity or dissimilarity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a training sample determination method, a training sample determination device and computer equipment. The method comprises the following steps: acquiring an image positive sample set and an image negative sample set, and determining the number ratio of the image positive samples in the image positive sample set to the image negative samples in the image negative sample set; if the number ratio is not a preset number ratio, identifying each image negative sample in the image negative sample set to obtain the similarity between each image negative sample and preset target objects of different types; according to the similarity between each image negative sample and the preset target objects of different types, determining a preset number of image negative samples corresponding to each preset target object type from the image negative sample set to obtain a training image negative sample set; and training a target detection model according to the training image negative sample set and the image positive sample set. The method can improve the detection capability of the target detection model for target object types whose target objects are rare, and thus improve the performance of the target detection model.

Description

Training sample determining method and device and computer equipment
Technical Field
The present disclosure relates to the field of computer vision, and in particular, to a training sample determining method, apparatus, and computer device.
Background
Object detection is one of the most important components of computer vision. It determines whether a preset target object exists in a picture or video and annotates any preset target object that is present. Because object detection distinguishes preset target objects from other, irrelevant objects, it is widely applied in important fields such as intelligent driving, video surveillance and aerospace.
However, the target object types encountered in real-life detection scenarios typically follow a long-tail distribution; that is, the numbers of target objects of the different types are far from uniform. For example, in highway object detection the probability of pedestrians and two-wheelers appearing is far lower than that of motor vehicles, so far fewer pictures of pedestrians and two-wheelers are collected than of motor vehicles.
When detecting target objects of different types, the unbalanced numbers of target objects of the different types in the model training stage leave the target detection model with poor detection capability for the types whose target objects are rare, which degrades the performance of the target detection model.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a training sample determination method, apparatus, and computer device that can improve the performance of a target detection model.
In a first aspect, the present application provides a training sample determining method, the method comprising:
acquiring an image positive sample set and an image negative sample set, and determining the number ratio of the image positive samples in the image positive sample set to the image negative samples in the image negative sample set;
if the number ratio is not the preset number ratio, identifying each image negative sample in the image negative sample set to obtain the similarity of each image negative sample and different types of preset target objects;
according to the similarity between each image negative sample and different types of preset target objects, respectively determining a preset number of image negative samples corresponding to the types of the preset target objects from the image negative sample set to obtain a training image negative sample set;
and training a target detection model according to the training image negative sample set and the image positive sample set.
In one embodiment, the training the target detection model according to the training image negative sample set and the image positive sample set includes:
obtaining a training image positive sample set according to the image positive sample set and the true value labeling frame set;
and training a target detection model according to the training image negative sample set and the training image positive sample set.
In one embodiment, the obtaining the training image positive sample set according to the image positive sample set and the true value labeling frame set includes:
determining a truth value labeling frame corresponding to each image positive sample in the image positive sample set from the truth value labeling frame set;
performing fusion processing according to each image positive sample and the corresponding true value labeling frame to obtain a virtual image positive sample;
obtaining a training image positive sample subset corresponding to each type according to the virtual image positive sample and the image positive sample set;
and respectively taking the same number of training image positive samples from the training image positive sample subsets to obtain a training image positive sample set.
In one embodiment, the fusion processing performed according to each image positive sample and the corresponding true value labeling frame to obtain a virtual image positive sample includes:
obtaining a virtual image positive sample according to the first vertex coordinates of each image positive sample and the second vertex coordinates of the corresponding true value labeling frame; the virtual image positive sample comprises at least one of an average frame, an intersection frame, a union frame and a random disturbance frame of the image positive sample and the corresponding true value labeling frame.
In one embodiment, the acquiring the positive and negative image sample sets includes:
acquiring a training picture set with the true value annotation frame;
according to the preset step length and the preset size, generating anchor frames of each training picture in the training picture set;
dividing the anchor frames into image positive samples and image negative samples according to the intersection ratio of the anchor frames and the true value labeling frames;
the positive image sample set is obtained from the positive image sample, and the negative image sample set is obtained from the negative image sample.
In one embodiment, the determining, from the image negative sample set, a preset number of image negative samples corresponding to the type of each preset target object according to the similarity between each image negative sample and the preset target object of different types includes:
and for each type of the preset target object, sorting the image negative samples based on the sequence of the similarity from large to small to obtain a sorted image negative sample set, and determining a preset number of image negative samples from the sorted image negative sample set.
In one embodiment, the method further comprises:
acquiring a picture to be detected and a trained target detection model;
generating an anchor frame of the picture to be detected according to a preset step length and a preset size;
determining at least one candidate frame corresponding to a target object to be detected in the picture to be detected from the anchor frames according to the trained target detection model; and determining a target frame of the target object to be detected from the at least one candidate frame; the target frame is used for determining the type of a preset target object and the position of the preset target object in the picture to be detected.
In a second aspect, the present application further provides a training sample determining apparatus. The device comprises:
the sample acquisition module is used for acquiring an image positive sample set and an image negative sample set and determining the number ratio of the image positive samples in the image positive sample set to the image negative samples in the image negative sample set;
the negative sample identification module is used for identifying each image negative sample in the image negative sample set if the number ratio is not a preset number ratio, and respectively obtaining the similarity of each image negative sample and different types of preset target objects;
the training negative sample determining module is used for respectively determining a preset number of image negative samples corresponding to the types of the preset target objects from the image negative sample set according to the similarity between the image negative samples and the preset target objects of different types to obtain a training image negative sample set;
and the model training module is used for training a target detection model according to the training image negative sample set and the image positive sample set.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
acquiring an image positive sample set and an image negative sample set, and determining the number ratio of the image positive samples in the image positive sample set to the image negative samples in the image negative sample set;
if the number ratio is not the preset number ratio, identifying each image negative sample in the image negative sample set to obtain the similarity of each image negative sample and different types of preset target objects;
according to the similarity between each image negative sample and different types of preset target objects, respectively determining a preset number of image negative samples corresponding to the types of the preset target objects from the image negative sample set to obtain a training image negative sample set;
and training a target detection model according to the training image negative sample set and the image positive sample set.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring an image positive sample set and an image negative sample set, and determining the number ratio of the image positive samples in the image positive sample set to the image negative samples in the image negative sample set;
if the number ratio is not the preset number ratio, identifying each image negative sample in the image negative sample set to obtain the similarity of each image negative sample and different types of preset target objects;
according to the similarity between each image negative sample and different types of preset target objects, respectively determining a preset number of image negative samples corresponding to the types of the preset target objects from the image negative sample set to obtain a training image negative sample set;
and training a target detection model according to the training image negative sample set and the image positive sample set.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:
acquiring an image positive sample set and an image negative sample set, and determining the number ratio of the image positive samples in the image positive sample set to the image negative samples in the image negative sample set;
if the number ratio is not the preset number ratio, identifying each image negative sample in the image negative sample set to obtain the similarity of each image negative sample and different types of preset target objects;
according to the similarity between each image negative sample and different types of preset target objects, respectively determining a preset number of image negative samples corresponding to the types of the preset target objects from the image negative sample set to obtain a training image negative sample set;
and training a target detection model according to the training image negative sample set and the image positive sample set.
According to the training sample determining method, apparatus, computer device, storage medium and computer program product, the determination of the image samples participating in training the target detection model is optimized. After the image positive sample set and the image negative sample set are acquired and the number ratio between them is determined, if the number ratio is not the preset number ratio, a preset number of image negative samples corresponding to each preset target object type are determined from the image negative sample set according to the similarity between each image negative sample and the preset target objects of different types. In this way the image negative samples corresponding to each preset target object type participate in training in balanced numbers, which avoids the problem that the target detection model under-fits the preset target object types whose target objects are rare because of insufficient training samples. The adverse effect of the unbalanced numbers of preset target objects of different types on the target detection model is thereby effectively weakened, the detection capability of the target detection model for preset target object types whose target objects are rare is improved, and the performance of the target detection model is improved.
Drawings
FIG. 1 is a diagram of an application environment for a training sample determination method in one embodiment;
FIG. 2 is a flow chart of a training sample determination method in one embodiment;
FIG. 3 is a flow diagram of training a target detection model based on a negative set of training images and a positive set of images in one embodiment;
FIG. 4 is a flow diagram of determining a positive sample of a virtual image in one embodiment;
FIG. 5 is a schematic diagram of a fusion process according to positive image samples and truth-value labeling boxes in one embodiment;
FIG. 6 is a flow chart of a training sample determination method according to another embodiment;
FIG. 7 is a diagram showing a distribution of negative examples of images obtained by mining training pictures in one embodiment;
FIG. 8 is a schematic flow chart of performing object detection on a picture to be detected in one embodiment;
FIG. 9 is a block diagram of a training sample determination apparatus in one embodiment;
FIG. 10 is a block diagram showing the construction of a training sample determining apparatus according to another embodiment;
FIG. 11 is an internal block diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The training sample determining method provided by the embodiment of the application can be applied to an application environment shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104 or may be located on a cloud or other network server. The terminal 102 acquires an image positive sample set and an image negative sample set from the server 104, and determines the number ratio of the image positive samples in the image positive sample set to the image negative samples in the image negative sample set; if the number ratio is not the preset number ratio, identifying each image negative sample in the image negative sample set to respectively obtain the similarity between each image negative sample and different types of preset target objects; according to the similarity between each image negative sample and different types of preset target objects, respectively determining a preset number of negative samples corresponding to the types of each preset target object from the image negative sample set to obtain a training image negative sample set; and training the target detection model according to the training image negative sample set and the image positive sample set. The terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers. The server 104 may be implemented as a stand-alone server or as a server cluster of multiple servers.
In one embodiment, as shown in fig. 2, a training sample determining method is provided, and this embodiment is applied to a terminal for illustration by using the method. In this embodiment, the method includes the steps of:
step 202, acquiring a positive image sample set and a negative image sample set, and determining the number ratio of the positive image samples in the positive image sample set to the negative image samples in the negative image sample set.
In this embodiment, when a target detection model for detecting the type and position of a preset target object needs to be trained, image positive samples and image negative samples for training the model are first acquired from each training picture. The image positive samples and image negative samples are detection frames in the training pictures. An image positive sample may be a detection frame whose intersection ratio with a true value labeling frame of a target object to be detected is greater than a preset threshold, and an image negative sample is a detection frame whose intersection ratio with the true value labeling frames of the target objects to be detected is less than or equal to the preset threshold. The true value labeling frames, which label the positions and types of the different preset target objects in the training pictures, can be annotated manually in advance.
As an example, suppose the preset target object types that the target detection model needs to detect in the picture to be detected are three types: automobiles, two-wheelers and pedestrians. An image positive sample is a detection frame whose intersection ratio with a true value labeling frame of an automobile, two-wheeler or pedestrian in the training picture is greater than the preset threshold, and an image negative sample may be a detection frame whose intersection ratio with such a true value labeling frame is less than or equal to the preset threshold. For example, a positive sample may be a detection frame whose content is an automobile or part of an automobile, and a negative sample may be a detection frame whose content is a road or a green belt.
It can be appreciated that, in training pictures acquired from actual scenes, the area occupied by the preset target objects is far smaller than the area occupied by the background. The number of image positive samples in a training picture is therefore far smaller than the number of image negative samples, and having an excessive number of image negative samples that are irrelevant to the preset target objects participate in training does not improve the performance of the target detection model. For this reason, different target detection models define, according to empirical values, a number ratio for the image positive and negative samples participating in training; for example, when the target detection model is a Faster RCNN model, the number ratio of image positive samples to image negative samples participating in model training is 1:3.
And 204, if the number ratio is not the preset number ratio, identifying each image negative sample in the image negative sample set to respectively obtain the similarity between each image negative sample and different types of preset target objects.
Specifically, take the preset target object types to be automobiles, two-wheelers and pedestrians. When the number ratio of the image positive samples to the image negative samples is not the preset number ratio, each image negative sample in the image negative sample set is input into the target detection model, and the similarity of each image negative sample to the preset target objects of the automobile, two-wheeler and pedestrian types is obtained from the probability values, output by the activation function in the target detection model, that the image negative sample matches each preset target object type.
For example, an image negative sample is input into the target detection model, and the activation function outputs probability values (0.5, 0.3, 0.2) that the image negative sample matches the three types automobile, two-wheeler and pedestrian. The similarity of this image negative sample to the preset target object of the automobile type is then 0.5, its similarity to the preset target object of the two-wheeler type is 0.3, and its similarity to the preset target object of the pedestrian type is 0.2.
Step 206, according to the similarity between each image negative sample and different types of preset target objects, respectively determining a preset number of negative samples corresponding to each type of preset target object from the image negative sample set, so as to obtain a training image negative sample set.
Specifically, each image negative sample may have a different similarity to each of the different types of preset target objects; for example, a given image negative sample may have three different similarities corresponding to the preset target objects of the automobile, two-wheeler and pedestrian types, and the type with the largest similarity is taken as the type of that image negative sample. A preset number of image negative samples corresponding to each type are then determined from the image negative sample set to obtain the training image negative samples. The preset number is chosen so that the image positive samples in the image positive sample set and the training image negative samples in the training image negative sample set are in the preset number ratio.
For example, whether an image negative sample is of the automobile type, the two-wheeler type or the pedestrian type is determined from the maximum of its similarities to the three preset target objects automobile, two-wheeler and pedestrian. For each of the three types, a preset number of image negative samples of that type are taken from the image negative sample set to obtain the training image negative samples, as sketched below.
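As a rough illustration of how this per-type selection might be implemented, a minimal sketch follows. The NumPy implementation, the array layout and the names such as similarities and num_per_class are assumptions made for illustration, not part of the patent.

```python
import numpy as np

def select_training_negatives(similarities: np.ndarray, num_per_class: int) -> np.ndarray:
    """similarities: (N, C) array; row i holds the similarity of image negative
    sample i to each of the C preset target object types (e.g. softmax scores).
    Returns the indices of the image negative samples kept for training."""
    assigned_type = similarities.argmax(axis=1)           # type of each negative sample
    selected = []
    for c in range(similarities.shape[1]):
        candidates = np.flatnonzero(assigned_type == c)   # negatives assigned to type c
        # sort by similarity to type c, largest first, and keep the top K
        ranked = candidates[np.argsort(-similarities[candidates, c])]
        selected.extend(ranked[:num_per_class].tolist())
    return np.asarray(selected)

# toy example: 6 negatives, 3 types (automobile, two-wheeler, pedestrian), 2 kept per type
sims = np.array([[0.5, 0.3, 0.2], [0.1, 0.7, 0.2], [0.6, 0.2, 0.2],
                 [0.2, 0.2, 0.6], [0.4, 0.5, 0.1], [0.3, 0.3, 0.4]])
print(select_training_negatives(sims, num_per_class=2))
```

Selecting the same number per type in this way is what keeps the training image negative sample set balanced across the preset target object types.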
Step 208, training the target detection model according to the training image negative sample set and the image positive sample set.
According to the above training sample determining method, the determination of the training samples for the target detection model is optimized for the image samples participating in training. After the image positive sample set and the image negative sample set are acquired and the number ratio between them is determined, if the number ratio is not the preset number ratio, a preset number of image negative samples corresponding to each preset target object type are determined from the image negative sample set according to the similarity between each image negative sample and the preset target objects of different types, so that a training image negative sample set is obtained, and the target detection model is trained according to the training image negative sample set and the image positive sample set. With this method the image negative samples corresponding to each preset target object type participate in training in balanced numbers, which avoids the problem that the target detection model under-fits the target object types whose preset target objects are rare because of insufficient training samples, effectively weakens the adverse effect of the unbalanced numbers of preset target objects of different types on the target detection model, improves the detection capability of the target detection model for the preset target object types whose target objects are rare, and improves the performance of the target detection model.
In one embodiment, the training sample determining method in the application includes resampling the positive image samples to obtain the positive training image samples, in addition to determining the balanced number of negative training image samples according to the similarity with the type of the preset target object. As shown in fig. 3, there is provided a method of training a target detection model, including:
step 302, obtaining a training image positive sample set according to the image positive sample set and the true value labeling frame set.
Specifically, there may be various implementations of obtaining the training image positive sample set according to the image positive sample set and the truth labeling frame set. For example, the image positive samples whose intersection ratio with the corresponding truth labeling frame lies within a preset threshold range may be screened out from the image positive sample set and used as the training image positive sample set. Alternatively, new detection frames can be generated based on the image positive sample set and the truth labeling frame set, and the training image positive sample set obtained from the generated detection frames together with the existing image positive sample set.
Optionally, in one embodiment, as shown in fig. 4, there is provided a step of determining a positive sample of a virtual image, including:
step 402, determining a truth-labeling frame corresponding to each image positive sample in the image positive sample set from the truth-labeling frame set.
The true value labeling frame corresponding to the positive image sample refers to a true value labeling frame with the intersection ratio of the true value labeling frame and the positive image sample being larger than a preset threshold value.
And step 404, performing fusion processing according to each image positive sample and the corresponding truth labeling frame to obtain a virtual image positive sample.
The fusion processing calculates, from the first vertex coordinates of each image positive sample and the second vertex coordinates of the corresponding true value labeling frame, the vertex coordinates of the average frame, intersection frame, union frame and random disturbance frame of that image positive sample and its corresponding true value labeling frame. The first vertex coordinates are determined by the coordinates of the upper left corner and the lower right corner of the image positive sample, and the second vertex coordinates are determined by the abscissas and ordinates of the upper left corner and the lower right corner of the true value labeling frame. The virtual image positive sample comprises at least one of the average frame, intersection frame, union frame and random disturbance frame of the image positive sample and the corresponding true value labeling frame. For example, the virtual image positive sample may be the average frame of the image positive sample and the corresponding true value labeling frame, or a combination of the average frame with at least one other detection frame.
Specifically, averaging the coordinates of the upper left corner and the lower right corner in the first vertex coordinates of each image positive sample with the coordinates of the corresponding vertices in the second vertex coordinates of the corresponding true value labeling frame gives the average frame of the image positive sample and its corresponding true value labeling frame. Taking the minimum of the upper left corner coordinates in the first and second vertex coordinates and the maximum of the lower right corner coordinates gives the union frame of the image positive sample and its corresponding true value labeling frame. Taking the maximum of the upper left corner coordinates in the first and second vertex coordinates and the minimum of the lower right corner coordinates gives the intersection frame of the image positive sample and its corresponding true value labeling frame. Randomly floating the vertex coordinates of any one of the average frame, intersection frame and union frame within a preset floating range gives a random disturbance frame of the image positive sample and its corresponding true value labeling frame.
At least one detection frame, or a combination of several detection frames, selected from the average frame, intersection frame, union frame and random disturbance frame of each image positive sample and its corresponding true value labeling frame serves as a virtual image positive sample.
As shown in fig. 5, fig. 5 is a schematic diagram of the fusion processing performed on an image positive sample and its corresponding true value labeling frame. The first vertex coordinates of the image positive sample Anchor0 are (x1a, y1a, x2a, y2a), and the second vertex coordinates of the true value labeling frame GT are (x1g, y1g, x2g, y2g).
Averaging the upper left corner and lower right corner coordinates of the first vertex coordinates with the corresponding vertices of the second vertex coordinates gives the coordinates ((x1a + x1g)/2, (y1a + y1g)/2, (x2a + x2g)/2, (y2a + y2g)/2), i.e. the detection frame Anchor1, which is the average frame of Anchor0 and GT. Taking the minimum of the upper left corner coordinates and the maximum of the lower right corner coordinates of the first and second vertex coordinates gives the coordinates (min(x1g, x1a), min(y1g, y1a), max(x2g, x2a), max(y2g, y2a)), i.e. the detection frame Anchor2, which is the union frame of Anchor0 and GT. Taking the maximum of the upper left corner coordinates and the minimum of the lower right corner coordinates of the first and second vertex coordinates gives the coordinates (max(x1g, x1a), max(y1g, y1a), min(x2g, x2a), min(y2g, y2a)), i.e. the detection frame Anchor3, which is the intersection frame of Anchor0 and GT. Randomly floating the vertex coordinates of any one of Anchor1, Anchor2 and Anchor3 within a preset floating range gives a random disturbance frame of the image positive sample and its corresponding true value labeling frame.
At least one detection frame, or a combination of several of them, selected from Anchor1, Anchor2, Anchor3 and the random disturbance frames is the virtual image positive sample corresponding to the image positive sample Anchor0.
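The fusion processing described above can be illustrated with a short sketch. The function name, the jitter parameter and the choice of perturbing the average frame are assumptions made for illustration; the patent only specifies the four kinds of frames and a preset floating range.

```python
import random

def fuse_boxes(anchor, gt, jitter=0.05):
    """anchor, gt: boxes in (x1, y1, x2, y2) form (upper left and lower right corners).
    Returns the average, union, intersection and randomly perturbed frames."""
    x1a, y1a, x2a, y2a = anchor
    x1g, y1g, x2g, y2g = gt
    average_frame = ((x1a + x1g) / 2, (y1a + y1g) / 2,
                     (x2a + x2g) / 2, (y2a + y2g) / 2)
    union_frame = (min(x1a, x1g), min(y1a, y1g), max(x2a, x2g), max(y2a, y2g))
    intersection_frame = (max(x1a, x1g), max(y1a, y1g), min(x2a, x2g), min(y2a, y2g))
    # random disturbance frame: float each vertex of one of the frames within a preset range
    w, h = x2a - x1a, y2a - y1a
    disturbance_frame = tuple(v + random.uniform(-jitter, jitter) * s
                              for v, s in zip(average_frame, (w, h, w, h)))
    return average_frame, union_frame, intersection_frame, disturbance_frame
```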
Step 406, obtaining a training image positive sample subset corresponding to each type according to the virtual image positive sample and the image positive sample set.
In step 408, the same number of training image positive samples are taken from the training image positive sample subsets, respectively, to obtain a training image positive sample set.
In the above embodiment, the virtual image positive samples are obtained by fusing the image positive samples with their true value labeling frames, training image positive sample subsets corresponding to the different preset target object types are obtained from the virtual image positive samples and the image positive sample set, and the same number of image positive samples are taken from each training image positive sample subset to participate in training the target detection model. Compared with simply copying image positive samples, the method of this embodiment effectively avoids the overfitting to which ordinary positive-sample resampling is prone, and improves the performance of the target detection model more noticeably. At the same time, taking the same number of training image positive samples from the image positive samples of the different types to participate in model training further improves the detection capability of the target detection model for the preset target object types whose preset target objects are rare.
Step 304, training the target detection model according to the training image negative sample set and the training image positive sample set.
In the above embodiment, the training image positive sample set is obtained according to the image positive sample set and the true value labeling frame set, and the target detection model is trained according to the training image negative sample set and the training image positive sample set. This enriches the image positive samples participating in training and avoids the problem that an insufficient number of image positive samples leaves the target detection model insufficiently trained and degrades its performance.
In another embodiment, as shown in fig. 6, a training sample determining method is provided, and this embodiment is applied to a terminal for illustration by the method. In this embodiment, the method includes the steps of:
step 602, a training picture set with a true value labeling frame is obtained.
Step 604, generating anchor frames of each training picture in the training picture set according to the preset step length and the preset size.
In this embodiment, the preset step length and the preset size are set in advance by an algorithm, all anchor points are determined from the training picture according to the preset step length, and the detection frame generated in the training picture according to the anchor points and the preset size is the anchor frame.
In step 606, the anchor boxes are divided into positive image samples and negative image samples according to the intersection ratio of each anchor box and each true value labeling box.
The intersection ratio of an anchor frame and a true value labeling frame is obtained by dividing the area of the intersection of the anchor frame and the true value frame by the area of their union. The value of the intersection ratio is at most 1.
Specifically, dividing the anchor frames into image positive samples and image negative samples includes: taking the anchor frames whose intersection ratio with a true value labeling frame is greater than a preset threshold as image positive samples, and taking the anchor frames whose intersection ratios with the true value labeling frames are less than or equal to the preset threshold as image negative samples. A sketch of this anchor generation and sample division appears below.
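The following is a minimal sketch of steps 604 and 606, assuming a single stride, a list of (width, height) sizes and boxes in (x1, y1, x2, y2) form; the function names and the 0.5 default threshold are illustrative assumptions, not values taken from the patent.

```python
def generate_anchors(img_w, img_h, stride, sizes):
    """Generate anchor frames on a regular grid with a preset step length (stride)
    and preset sizes; each box is (x1, y1, x2, y2)."""
    anchors = []
    for cy in range(stride // 2, img_h, stride):
        for cx in range(stride // 2, img_w, stride):
            for w, h in sizes:
                anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

def iou(a, b):
    """Intersection ratio of two boxes: intersection area divided by union area."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def split_samples(anchors, gt_boxes, threshold=0.5):
    """Divide anchor frames into image positive and negative samples by their
    maximum intersection ratio with the true value labeling frames."""
    positives, negatives = [], []
    for anchor in anchors:
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        (positives if best > threshold else negatives).append(anchor)
    return positives, negatives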
Step 608, obtaining an image positive sample set according to the image positive sample, and obtaining an image negative sample set according to the image negative sample.
Step 610, determining a ratio of the number of positive samples in the positive sample set to the number of negative samples in the negative sample set.
And step 612, if the number ratio is not the preset number ratio, identifying each image negative sample in the image negative sample set, and respectively obtaining the similarity between each image negative sample and different types of preset target objects.
Step 614, for each type of preset target object, sorting the negative images according to the order of the similarity from large to small, to obtain a sorted negative image sample set, and determining a preset number of negative image samples from the sorted negative image sample set, to obtain a negative training image sample set.
It can be understood that, in order to prevent image positive samples from being divided into image negative samples and affecting the accuracy of the model when manual labeling errors in the training picture preparation stage leave true value labeling frames missing, OHEM (online hard example mining) needs to be performed on the image negative samples; that is, the portion of the image negative samples with the greatest similarity to the preset target objects is screened out and used as the training image negative samples. However, conventional hard negative mining does not consider the unbalanced numbers of the different preset target object types, so its improvement of the accuracy of the target detection model is limited. To solve this problem, this embodiment proposes a hard negative mining scheme that screens the image negative samples participating in training class by class according to the preset target object type.
Specifically, in this embodiment, for a given preset target object type, the image negative samples are sorted from large to small by their similarity to that type. The similarity of an image negative sample to each preset target object type can be determined from the probability values output by the normalized exponential function softmax. A preset number of image negative samples are then determined from the sorted image negative sample set, giving the training image negative samples corresponding to that type. Repeating these steps for each type yields the training image negative sample set. The preset number of image negative samples is determined as follows: from the image negative sample set sorted by similarity, the image negative sample with the greatest similarity to the preset target object type is selected, and the image negative sample with the greatest similarity is then repeatedly selected from the remaining image negative samples until the preset number of image negative samples has been obtained.
For example, in a four-class target detection model that distinguishes background, automobiles, two-wheelers and pedestrians, the probability values output by the softmax function for a detection frame are (p0, p1, p2, p3), where p0 is the probability that the content of the detection frame is background, p1 the probability that it is an automobile, p2 the probability that it is a two-wheeler, and p3 the probability that it is a pedestrian. After the image positive and negative samples are divided and the types of the image positive samples are determined, (p1, p2, p3) can be calculated for each image negative sample. The image negative samples of the image negative sample set are sorted from large to small by their p1 values, and the first K image negative samples are taken from the sorted set; these are the training image negative samples corresponding to the automobile type. Repeating these steps for the other two types yields the training image negative sample set.
Fig. 7 shows the distributions of training image negative samples obtained by mining the same training picture for hard negatives with different methods. As shown in fig. 7, the first row is the distribution of training image negative samples obtained by conventional hard negative mining; the second row is the distribution obtained by the method of this embodiment when mining hard negatives for the three preset target object types motor vehicle, pedestrian and two-wheeler; the third row is the distribution obtained by the method of this embodiment when mining hard negatives for the motor vehicle type; the fourth row is the distribution obtained for the pedestrian type; and the fifth row is the distribution obtained for the two-wheeler type.
As can be seen from fig. 7, because the numbers of the preset target object types are unbalanced, the result of conventional hard negative mining is closest to the distribution of image negative samples obtained by mining hard negatives for the motor vehicle type. By contrast, using the method of this embodiment to mine hard negatives separately for the three preset target object types motor vehicle, pedestrian and two-wheeler balances the numbers of training image negative samples corresponding to the three preset target object types while still screening out the image negative samples that are relatively similar to the preset target object types.
And 616, training a target detection model according to the training image negative sample set and the image positive sample set, and carrying out target detection on the picture to be detected through the trained target detection model.
Specifically, training the target detection model according to the training image negative sample set and the image positive sample set includes: calculating the loss value of each training picture according to the training image negative sample set, the image positive sample set and a loss function, calculating the average loss value over all the training pictures, and adjusting the network parameters through back propagation to obtain the trained target detection model. A rough sketch of such a training step follows.
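The sketch below assumes a PyTorch-style model and a hypothetical loss_fn that scores the model output against the selected positive and negative samples; the loop structure and all names are illustrative assumptions, not the patent's implementation.

```python
import torch

def train_step(model, optimizer, training_pictures, loss_fn):
    """training_pictures yields (image, positive_samples, training_negatives) per picture.
    Computes the loss of each training picture, averages over all pictures, and
    adjusts the network parameters through back propagation."""
    model.train()
    optimizer.zero_grad()
    losses = []
    for image, positives, negatives in training_pictures:
        predictions = model(image)
        losses.append(loss_fn(predictions, positives, negatives))
    mean_loss = torch.stack(losses).mean()   # average loss over all training pictures
    mean_loss.backward()                     # back propagation
    optimizer.step()                         # adjust the network parameters
    return mean_loss.item()
```

In practice the averaging would usually be done per mini-batch rather than over the whole training set at once; the sketch follows the wording above for clarity.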
The image negative samples participating in training are screened in the manner of this embodiment: for each preset target object type, the corresponding image negative samples are ranked by their similarity to that type from large to small, and the same preset number is then selected. Compared with the conventional hard negative mining method, this weakens the influence of the unbalanced numbers of preset target objects of different types on the performance of the target detection model and further improves the accuracy of the target detection model.
In one embodiment, as shown in fig. 8, the target detection is performed on the picture to be detected through the trained target detection model, which specifically includes steps 802 to 808, where:
step 802, obtaining a picture to be detected and a trained target detection model.
The method for obtaining the target detection model refers to the above embodiment, and is not described herein.
Step 804, generating an anchor frame of the picture to be detected according to the preset step length and the preset size.
And step 806, determining at least one candidate frame corresponding to the target object to be detected in the picture to be detected from the anchor frames according to the trained target detection model.
It can be appreciated that, when the target detection model is applied in practice, the number of preset target objects cannot be predicted in advance, so the target detection model often outputs far more candidate frames than there are preset target objects. For each preset target object in the picture to be detected there are therefore often several overlapping candidate frames.
Step 808, determining a target frame of the target object to be detected from the at least one candidate frame.
The method for determining the target frame of the target object to be detected from the at least one candidate frame comprises: selecting, from the candidate frames, the candidate frame with the highest probability of matching the preset target object type as a target frame; eliminating the candidate frames that overlap the target frame; and continuing to select the candidate frame with the highest matching probability from the remaining candidate frames as the next target frame, until the candidate frame set is empty.
Specifically, if the target detection model determines only one candidate frame of the target object to be detected from anchor frames of the picture to be detected, determining the candidate frame as a target frame of the target object to be detected; if a plurality of candidate frames of the target object to be detected are determined, in order to determine the most accurate target frame from the plurality of coincident candidate frames, a non-maximum suppression algorithm is required to be adopted to reject the candidate frames in the candidate frame set.
The candidate frame set output by the current target detection model includes N candidate frames matched with a preset target object of the automobile type. A candidate frame anchor' with the highest probability of matching the automobile type is selected from the N candidate frames as one target frame output by the model, the intersection ratio of each of the remaining N-1 candidate frames with anchor' is calculated, and each intersection ratio is compared with an intersection ratio threshold in turn. If the intersection ratio is greater than the threshold, the current candidate frame may coincide with anchor', and the current candidate frame is removed from the candidate frame set. After one round of comparison is finished, the candidate frame set after the first round of elimination is obtained.
Selecting a candidate frame anchor' with the highest matching probability with the automobile type from the candidate frame set after the first round of elimination as another target frame of the model output, and repeating the steps of cross comparison to obtain the candidate frame set after the second round of elimination. Repeating the steps until the candidate frames of the automobile type in the candidate frame set are empty, and obtaining all target frames corresponding to the preset target objects of the automobile type. In order to obtain target frames corresponding to the pedestrian type and the two-wheel vehicle type, corresponding steps are respectively executed on the original candidate frame sets, and then target frames of all types of preset target objects in the picture to be detected can be obtained.
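The per-type non-maximum suppression described above can be sketched as follows; the function name, the box representation and the 0.5 default threshold are assumptions for illustration, not values from the patent.

```python
def box_iou(a, b):
    """Intersection ratio of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def nms_per_type(candidates, iou_threshold=0.5):
    """candidates: list of (box, score) pairs for one preset target object type,
    where score is the probability of matching that type.
    Returns the target frames kept for that type."""
    remaining = sorted(candidates, key=lambda c: c[1], reverse=True)
    targets = []
    while remaining:
        best = remaining.pop(0)   # candidate with the highest matching probability
        targets.append(best)
        # eliminate the candidates whose intersection ratio with the chosen target
        # frame exceeds the threshold, then continue with the next round
        remaining = [c for c in remaining if box_iou(c[0], best[0]) <= iou_threshold]
    return targets
```

Running this routine once for each preset target object type on the original candidate frame set yields the target frames of all types in the picture to be detected.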
In the above embodiment, at least one candidate frame is determined from the anchor frames of the picture to be detected through the trained target detection model, and then the target frame of the target object to be detected is determined from the candidate frames. According to the method, the target object to be detected in the picture to be detected is detected, and as the accuracy of the trained target detection model is improved, the target frame obtained by carrying out target detection on the trained target detection model can more accurately determine the preset target object of each type in the picture to be detected.
It should be understood that, although the steps in the flowcharts related to the embodiments described above are sequentially shown as indicated by arrows, these steps are not necessarily sequentially performed in the order indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of steps or a plurality of stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of the steps or stages is not necessarily performed sequentially, but may be performed alternately or alternately with at least some of the other steps or stages.
Based on the same inventive concept, the embodiment of the application also provides a training sample determining device for realizing the above-mentioned training sample determining method. The implementation of the solution provided by the apparatus is similar to the implementation described in the above method, so the specific limitations in the embodiments of the one or more training sample determining apparatuses provided below may be referred to the limitations of the training sample determining method hereinabove, and will not be repeated here.
In one embodiment, as shown in fig. 9, there is provided a training sample determining apparatus 900, comprising: a sample acquisition module 902, a negative sample identification module 904, a training negative sample determination module 906, and a model training module 908, wherein:
a sample acquisition module 902 for acquiring a positive and a negative set of images and determining a ratio of the number of positive and negative samples in the positive and negative set of images.
The negative sample identification module 904 is configured to identify each image negative sample in the image negative sample set if the number ratio is not the preset number ratio, so as to obtain the similarity between each image negative sample and the preset target objects of different types.
The training negative-sample determining module 906 is configured to determine, from the image negative-sample set, a preset number of image negative-samples corresponding to the type of each preset target object according to the similarity between each image negative-sample and the preset target objects of different types, so as to obtain a training image negative-sample set.
The model training module 908 is configured to train the target detection model according to the training image negative sample set and the image positive sample set.
With this apparatus, the image negative samples corresponding to each preset target object type participate in training in balanced numbers, which avoids the problem that the target detection model under-fits the target object types whose preset target objects are rare because of insufficient training samples, effectively weakens the adverse effect of the unbalanced numbers of preset target objects of different types on the target detection model, improves the detection capability of the target detection model for the preset target object types whose target objects are rare, and improves the performance of the target detection model.
In one embodiment, the model training module 908 is further configured to obtain a training image positive sample set according to the image positive sample set and the true value labeling frame set; and training the target detection model according to the training image negative sample set and the training image positive sample set.
In one embodiment, the model training module 908 is further configured to determine, from the set of truth labeling frames, a truth labeling frame corresponding to each image positive sample in the set of image positive samples; carrying out fusion processing according to each image positive sample and each corresponding true value labeling frame to obtain a virtual image positive sample; obtaining a training image positive sample subset corresponding to each type according to the virtual image positive sample and the image positive sample set; and respectively taking the same number of training image positive samples from the training image positive sample subsets to obtain a training image positive sample set.
In one embodiment, the model training module 908 is further configured to obtain the virtual image positive samples according to the first vertex coordinates of each image positive sample and the second vertex coordinates of the corresponding truth labeling frame; the virtual image positive sample comprises at least one of an average frame, an intersection frame, a union frame and a random disturbance frame of the image positive sample and its corresponding truth labeling frame.
In one embodiment, the sample acquisition module 902 is further configured to acquire a training picture set with true value labeling frames; generate anchor frames for each training picture in the training picture set according to a preset step length and a preset size; divide the anchor frames into image positive samples and image negative samples according to the intersection ratio of each anchor frame and each true value labeling frame; and obtain the image positive sample set from the image positive samples and the image negative sample set from the image negative samples.
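A minimal sketch of anchor frame generation and positive/negative division, interpreting the intersection ratio as intersection-over-union (IoU); the square anchor of a single preset size and the 0.5/0.3 thresholds are assumptions for illustration only.

```python
def generate_anchors(image_width, image_height, step, size):
    """Slide a square anchor of the preset size across the picture with the
    preset step length; anchors are (x1, y1, x2, y2) tuples."""
    anchors = []
    for y in range(0, image_height - size + 1, step):
        for x in range(0, image_width - size + 1, step):
            anchors.append((x, y, x + size, y + size))
    return anchors

def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def split_anchors(anchors, gt_boxes, pos_thresh=0.5, neg_thresh=0.3):
    """Assign each anchor by its best IoU with any true value labeling frame;
    anchors between the two (assumed) thresholds are simply ignored."""
    positives, negatives = [], []
    for anchor in anchors:
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= pos_thresh:
            positives.append(anchor)
        elif best < neg_thresh:
            negatives.append(anchor)
    return positives, negatives
```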
In one embodiment, the training negative sample determination module 906 is further configured to sort, for each type of preset target object, the image negative samples in descending order of similarity to obtain a sorted image negative sample set, and to determine a preset number of image negative samples from the sorted image negative sample set.
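The per-type ranking and truncation might look like the sketch below, where `similarity[i]` holds the per-type similarity scores of `negative_samples[i]`; this data layout and the single shared `preset_count` are assumptions made only for the example.

```python
def select_training_negatives(negative_samples, similarity, preset_count):
    """For each preset target object type, sort the image negative samples in
    descending order of similarity to that type and keep the top preset_count
    as that type's contribution to the training image negative sample set."""
    types = {obj_type for scores in similarity for obj_type in scores}
    training_negatives = {}
    for obj_type in types:
        ranked = sorted(range(len(negative_samples)),
                        key=lambda i: similarity[i].get(obj_type, 0.0),
                        reverse=True)
        training_negatives[obj_type] = [negative_samples[i]
                                        for i in ranked[:preset_count]]
    return training_negatives
```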
In another embodiment, as shown in fig. 10, another training sample determining apparatus 900 is provided, which includes, in addition to the sample acquisition module 902, the negative sample identification module 904, the training negative sample determination module 906 and the model training module 908, a target detection module 910 for performing target detection on a picture to be detected through the trained target detection model.
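A minimal inference sketch for the target detection module 910 is given below; the `model.score(picture, anchor)` interface returning per-type scores, the NumPy-style `picture.shape`, the score threshold and the choice of the single highest-scoring candidate per type as the target frame are all assumptions for illustration.

```python
def detect(picture, model, step, size, score_thresh=0.5):
    """Generate anchor frames for the picture to be detected, score each anchor
    per preset target object type with the trained model, keep anchors above
    the threshold as candidate frames, and return the best candidate per type
    as that type's target frame."""
    height, width = picture.shape[:2]
    anchors = generate_anchors(width, height, step, size)  # see the earlier sketch
    best = {}
    for anchor in anchors:
        for obj_type, score in model.score(picture, anchor).items():
            if score >= score_thresh and (obj_type not in best or score > best[obj_type][1]):
                best[obj_type] = (anchor, score)
    return {obj_type: box for obj_type, (box, _) in best.items()}
```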
Each of the modules in the training sample determination apparatus described above may be implemented in whole or in part by software, hardware or a combination thereof. The above modules may be embedded in hardware form in, or independent of, a processor in the computer device, or may be stored in software form in a memory of the computer device, so that the processor can call and execute the operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a terminal, and the internal structure of which may be as shown in fig. 11. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit and an input device. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless communication may be implemented through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program, when executed by the processor, implements a training sample determination method. The display unit of the computer device is used to form a visible picture and may be a display screen, a projection device or a virtual reality imaging device. The display screen may be a liquid crystal display or an electronic ink display. The input device of the computer device may be a touch layer covering the display screen, a key, a trackball or a touchpad arranged on the housing of the computer device, or an external keyboard, touchpad, mouse or the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 11 is merely a block diagram of a portion of the structure associated with the present application and is not limiting of the computer device to which the present application applies, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
acquiring an image positive sample set and an image negative sample set, and determining the number ratio of the image positive samples in the image positive sample set to the image negative samples in the image negative sample set;
if the number ratio is not the preset number ratio, identifying each image negative sample in the image negative sample set to respectively obtain the similarity between each image negative sample and different types of preset target objects;
according to the similarity between each image negative sample and different types of preset target objects, respectively determining a preset number of image negative samples corresponding to the types of each preset target object from an image negative sample set to obtain a training image negative sample set;
and training the target detection model according to the training image negative sample set and the image positive sample set.
In one embodiment, the processor when executing the computer program further performs the steps of:
obtaining a training image positive sample set according to the image positive sample set and the true value labeling frame set;
and training the target detection model according to the training image negative sample set and the training image positive sample set.
In one embodiment, the processor when executing the computer program further performs the steps of:
determining a true value labeling frame corresponding to each image positive sample in the image positive sample set from the true value labeling frame set;
carrying out fusion processing according to each image positive sample and each corresponding true value labeling frame to obtain a virtual image positive sample;
obtaining a training image positive sample subset corresponding to each type according to the virtual image positive sample and the image positive sample set;
and respectively taking the same number of training image positive samples from the training image positive sample subsets to obtain a training image positive sample set.
In one embodiment, the processor when executing the computer program further performs the steps of:
obtaining a virtual image positive sample according to the first vertex coordinates of each image positive sample and the second vertex coordinates of the corresponding true value labeling frame; the virtual image positive sample comprises at least one of a bitwise frame, an intersection frame, a union frame and a random disturbance frame of the image positive sample and a corresponding true value labeling frame.
In one embodiment, the processor when executing the computer program further performs the steps of:
acquiring a training picture set with a true value labeling frame;
according to the preset step length and the preset size, generating anchor frames of all training pictures in the training picture set;
dividing the anchor frames into an image positive sample and an image negative sample according to the intersection ratio of each anchor frame and each true value labeling frame;
an image positive sample set is obtained from the image positive sample, and an image negative sample set is obtained from the image negative sample.
In one embodiment, the processor when executing the computer program further performs the steps of:
and for each type of the preset target object, sequencing the image negative samples based on the sequence of the similarity from large to small to obtain a sequenced image negative sample set, and determining a preset number of image negative samples from the sequenced image negative sample set.
In one embodiment, the processor when executing the computer program further performs the steps of:
and training a target detection model according to the training image negative sample set and the image positive sample set, and carrying out target detection on the picture to be detected through the trained target detection model.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring an image positive sample set and an image negative sample set, and determining the number ratio of the image positive samples in the image positive sample set to the image negative samples in the image negative sample set;
if the number ratio is not the preset number ratio, identifying each image negative sample in the image negative sample set to respectively obtain the similarity between each image negative sample and different types of preset target objects;
according to the similarity between each image negative sample and different types of preset target objects, respectively determining a preset number of image negative samples corresponding to the types of each preset target object from an image negative sample set to obtain a training image negative sample set;
and training the target detection model according to the training image negative sample set and the image positive sample set.
In one embodiment, the computer program when executed by the processor further performs the steps of:
obtaining a training image positive sample set according to the image positive sample set and the true value labeling frame set;
and training the target detection model according to the training image negative sample set and the training image positive sample set.
In one embodiment, the computer program when executed by the processor further performs the steps of:
determining a true value labeling frame corresponding to each image positive sample in the image positive sample set from the true value labeling frame set;
carrying out fusion processing according to each image positive sample and each corresponding true value labeling frame to obtain a virtual image positive sample;
obtaining a training image positive sample subset corresponding to each type according to the virtual image positive sample and the image positive sample set;
and respectively taking the same number of training image positive samples from the training image positive sample subsets to obtain a training image positive sample set.
In one embodiment, the computer program when executed by the processor further performs the steps of:
obtaining a virtual image positive sample according to the first vertex coordinates of each image positive sample and the second vertex coordinates of the corresponding true value labeling frame; the virtual image positive sample comprises at least one of a bitwise frame, an intersection frame, a union frame and a random disturbance frame of the image positive sample and a corresponding true value labeling frame.
In one embodiment, the computer program when executed by the processor further performs the steps of:
acquiring a training picture set with a true value labeling frame;
according to the preset step length and the preset size, generating anchor frames of all training pictures in the training picture set;
dividing the anchor frames into an image positive sample and an image negative sample according to the intersection ratio of each anchor frame and each true value labeling frame;
an image positive sample set is obtained from the image positive sample, and an image negative sample set is obtained from the image negative sample.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and for each type of the preset target object, sequencing the image negative samples based on the sequence of the similarity from large to small to obtain a sequenced image negative sample set, and determining a preset number of image negative samples from the sequenced image negative sample set.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and training a target detection model according to the training image negative sample set and the image positive sample set, and carrying out target detection on the picture to be detected through the trained target detection model.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, when target detection is performed on the picture to be detected, the user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data used for analysis, stored data, displayed data, etc.) involved in this application are information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of the relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions.
Those skilled in the art will appreciate that all or part of the processes of the methods described above may be implemented by a computer program stored on a non-transitory computer-readable storage medium; when executed, the program may include the processes of the embodiments of the methods described above. Any reference to memory, database or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM is available in a variety of forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, etc., without being limited thereto.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction between the combinations of these technical features, they should be regarded as falling within the scope of this specification.
The above embodiments represent only a few implementations of the present application and are described in relative detail, but they are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art may make various modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Accordingly, the protection scope of the present application shall be subject to the appended claims.

Claims (11)

1. A method of training sample determination, the method comprising:
acquiring an image positive sample set and an image negative sample set, and determining the number ratio of the image positive samples in the image positive sample set to the image negative samples in the image negative sample set;
if the number ratio is not the preset number ratio, identifying each image negative sample in the image negative sample set to obtain the similarity of each image negative sample and different types of preset target objects;
according to the similarity between each image negative sample and different types of preset target objects, respectively determining a preset number of image negative samples corresponding to the types of the preset target objects from the image negative sample set to obtain a training image negative sample set;
and training a target detection model according to the training image negative sample set and the image positive sample set.
2. The method of claim 1, wherein the training a target detection model from the training image negative sample set and the image positive sample set comprises:
obtaining a training image positive sample set according to the image positive sample set and the true value labeling frame set;
and training a target detection model according to the training image negative sample set and the training image positive sample set.
3. The method according to claim 2, wherein obtaining a training image positive sample set from the image positive sample set and the true value labeling frame set comprises:
determining a truth value labeling frame corresponding to each image positive sample in the image positive sample set from the truth value labeling frame set;
performing fusion processing according to each image positive sample and the corresponding true value labeling frame to obtain a virtual image positive sample;
obtaining a training image positive sample subset corresponding to each type according to the virtual image positive sample and the image positive sample set;
and respectively taking the same number of training image positive samples from the training image positive sample subsets to obtain a training image positive sample set.
4. A method according to claim 3, wherein the carrying out fusion processing according to each image positive sample and the corresponding true value labeling frame to obtain a virtual image positive sample comprises:
obtaining a virtual image positive sample according to the first vertex coordinates of each image positive sample and the second vertex coordinates of the corresponding true value labeling frame; the virtual image positive sample comprises at least one of a bitwise frame, an intersection frame, a union frame and a random disturbance frame of the image positive sample and the corresponding true value labeling frame.
5. The method of claim 1, wherein the acquiring the positive and negative image sample sets comprises:
acquiring a training picture set with a true value labeling frame;
according to the preset step length and the preset size, generating anchor frames of each training picture in the training picture set;
dividing the anchor frames into image positive samples and image negative samples according to the intersection ratio of the anchor frames and the true value labeling frames;
the image positive sample set is obtained from the image positive samples, and the image negative sample set is obtained from the image negative samples.
6. The method according to claim 1, wherein the determining, from the image negative sample set, a preset number of image negative samples corresponding to the type of each preset target object according to the similarity between each image negative sample and the preset target object of different types, respectively, includes:
and for each type of the preset target object, sorting the image negative samples based on the sequence of the similarity from large to small to obtain a sorted image negative sample set, and determining a preset number of image negative samples from the sorted image negative sample set.
7. The method according to claim 1, wherein the method further comprises:
acquiring a picture to be detected and a trained target detection model;
generating an anchor frame of the picture to be detected according to a preset step length and a preset size;
determining at least one candidate frame corresponding to a target object to be detected in the picture to be detected from the anchor frame according to the trained target detection model;
determining a target frame of the target object to be detected from the at least one candidate frame; the target frame is used for determining the type of a preset target object and the position of the preset target object in the picture to be detected.
8. A training sample determination apparatus, the apparatus comprising:
the sample acquisition module is used for acquiring an image positive sample set and an image negative sample set and determining the number ratio of the image positive samples in the image positive sample set to the image negative samples in the image negative sample set;
the negative sample identification module is used for identifying each image negative sample in the image negative sample set if the number ratio is not a preset number ratio, and respectively obtaining the similarity of each image negative sample and different types of preset target objects;
the training negative sample determining module is used for respectively determining a preset number of image negative samples corresponding to the types of the preset target objects from the image negative sample set according to the similarity between the image negative samples and the preset target objects of different types to obtain a training image negative sample set;
and the model training module is used for training a target detection model according to the training image negative sample set and the image positive sample set.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 7.
11. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 7.
CN202310528868.8A 2023-05-11 2023-05-11 Training sample determining method and device and computer equipment Active CN116310656B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310528868.8A CN116310656B (en) 2023-05-11 2023-05-11 Training sample determining method and device and computer equipment

Publications (2)

Publication Number Publication Date
CN116310656A true CN116310656A (en) 2023-06-23
CN116310656B CN116310656B (en) 2023-08-15

Family

ID=86781790

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310528868.8A Active CN116310656B (en) 2023-05-11 2023-05-11 Training sample determining method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN116310656B (en)


Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111199175A (en) * 2018-11-20 2020-05-26 株式会社日立制作所 Training method and device for target detection network model
CN111723608A (en) * 2019-03-20 2020-09-29 杭州海康威视数字技术股份有限公司 Alarming method and device of driving assistance system and electronic equipment
CN109948727A (en) * 2019-03-28 2019-06-28 北京周同科技有限公司 The training and classification method of image classification model, computer equipment and storage medium
WO2020253505A1 (en) * 2019-06-20 2020-12-24 平安科技(深圳)有限公司 Palm image detection method and apparatus
WO2021017261A1 (en) * 2019-08-01 2021-02-04 平安科技(深圳)有限公司 Recognition model training method and apparatus, image recognition method and apparatus, and device and medium
CN110458227A (en) * 2019-08-08 2019-11-15 杭州电子科技大学 A kind of ADAS pedestrian detection method based on hybrid classifer
CN111368878A (en) * 2020-02-14 2020-07-03 北京电子工程总体研究所 Optimization method based on SSD target detection, computer equipment and medium
CN112733724A (en) * 2021-01-12 2021-04-30 清华大学 Relativity relationship verification method and device based on discrimination sample meta-digger
WO2022151755A1 (en) * 2021-01-15 2022-07-21 上海商汤智能科技有限公司 Target detection method and apparatus, and electronic device, storage medium, computer program product and computer program
CN113705596A (en) * 2021-03-04 2021-11-26 腾讯科技(北京)有限公司 Image recognition method and device, computer equipment and storage medium
CN113239982A (en) * 2021-04-23 2021-08-10 北京旷视科技有限公司 Training method of detection model, target detection method, device and electronic system
US20230153387A1 (en) * 2021-04-27 2023-05-18 Beijing Baidu Netcom Science Technology Co., Ltd. Training method for human body attribute detection model, electronic device and medium
CN114283319A (en) * 2021-12-24 2022-04-05 中车大连电力牵引研发中心有限公司 Locomotive wheel set tread stripping identification method
CN115018783A (en) * 2022-05-31 2022-09-06 北京奇艺世纪科技有限公司 Video watermark detection method and device, electronic equipment and storage medium
CN115223057A (en) * 2022-08-02 2022-10-21 大连理工大学 Target detection unified model for multimodal remote sensing image joint learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
SHIFENG ZHANG: "Bridging the Gap Between Anchor-Based and Anchor-Free Detection via Adaptive Training Sample Selection", IEEE Xplore *
ZHANG Xingming; LI Heheng: "Face verification algorithm based on an independent negative sample set and SVM", Journal of Computer Research and Development, no. 12 *
QUAN Haibo: "Research on visual question answering methods that mitigate language priors based on negative samples", Wanfang Database *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116776887A (en) * 2023-08-18 2023-09-19 昆明理工大学 Negative sampling remote supervision entity identification method based on sample similarity calculation
CN116776887B (en) * 2023-08-18 2023-10-31 昆明理工大学 Negative sampling remote supervision entity identification method based on sample similarity calculation
CN116758429A (en) * 2023-08-22 2023-09-15 浙江华是科技股份有限公司 Ship detection method and system based on positive and negative sample candidate frames for dynamic selection
CN116758429B (en) * 2023-08-22 2023-11-07 浙江华是科技股份有限公司 Ship detection method and system based on positive and negative sample candidate frames for dynamic selection

Also Published As

Publication number Publication date
CN116310656B (en) 2023-08-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant