CN117830744A - Training method and device for target recognition model, vehicle and storage medium

Info

Publication number: CN117830744A
Application number: CN202410027102.6A
Authority: CN
Inventors: 罗远庆; 罗林; 邢晨
Original assignee: Great Wall Motor Co Ltd
Current assignee: Great Wall Motor Co Ltd
Priority date: 2024-01-08
Filing date: 2024-01-08
Publication date: 2024-04-05

Abstract

The application provides a training method and device for a target recognition model, a vehicle, and a storage medium. The method comprises: acquiring an unlabeled first image set, and performing target recognition on the first image set based on a large model to obtain a second image set containing first target detection frames whose confidence is not smaller than a first threshold; training on the second image set to obtain a first classification model and a first target recognition model; and expanding the training samples multiple times, by reducing the first threshold, according to the large model, the first image set and the first classification model, and training the first classification model and the first target recognition model on the training samples after each expansion, so as to improve the classification accuracy of the first classification model and the target recognition accuracy of the first target recognition model until a preset condition is met, thereby obtaining the finally trained target recognition model. The method does not require manual labeling of a large amount of image data, which saves time and labor and reduces labeling cost.

Description

Training method and device for target recognition model, vehicle and storage medium

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to a training method and apparatus for a target recognition model, a vehicle, and a storage medium.

Background

With the rapid development of artificial intelligence technologies such as deep learning, these technologies can be used to perceive the surrounding environment. For example, while a vehicle is being driven, a deep-learning-based target recognition model may perform target recognition on images captured by the vehicle's cameras. However, the target recognition model must be trained before it can be used for target recognition.

At present, a target recognition model is usually trained in a supervised manner. However, supervised training requires a large amount of labeled data, and the data is labeled manually, which is time-consuming and labor-intensive.

Disclosure of Invention

The embodiment of the application provides a training method, device, vehicle and storage medium of a target recognition model, so as to solve the problem that the existing method needs a large amount of manual annotation data and is time-consuming and labor-consuming.

In a first aspect, an embodiment of the present application provides a training method for a target recognition model, including:

acquiring a first image set without labels, and carrying out first target identification on the first image set based on a large model to obtain a second image set containing a first target detection frame with confidence coefficient not smaller than a first threshold value;

Taking the second image set as a training sample, and respectively carrying out primary training on the initial auxiliary training classification model and the initial target recognition model to obtain a first classification model with classification accuracy lower than first requirement accuracy and a first target recognition model with target recognition accuracy lower than second requirement accuracy;

and expanding the training sample for multiple times according to the large model, the first image set and the first classification model in a mode of reducing the first threshold value, and respectively training the first classification model and the first target recognition model based on the training sample after each expansion so as to improve the classification accuracy of the first classification model and the target recognition accuracy of the first target recognition model until a preset condition is met, thereby obtaining the target recognition model after final training.

In one possible implementation, expanding the training samples multiple times according to the large model, the first image set, and the first classification model in a manner of reducing the first threshold, and training the first classification model and the first target recognition model based on each expanded training sample, respectively, includes:

reducing the first threshold value to obtain a second threshold value;

marking: performing second target recognition on the first image set based on the large model to obtain a third image set containing a second target detection frame with confidence coefficient not smaller than a second threshold value; the number of the second target detection frames is greater than the number of the first target detection frames;

Inputting the image in the second target detection frame into a first classification model to obtain a classification result;

based on the classification result and the third image set, respectively retraining the first classification model and the first target recognition model to obtain a second classification model with classification accuracy higher than that of the first classification model and a second target recognition model with target recognition accuracy higher than that of the first target recognition model;

reducing the second threshold to obtain a new second threshold, taking the second classification model as the new first classification model, taking the second target recognition model as the new first target recognition model, and returning to the marking step for iterative training.
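The five steps above can be sketched as a loop. This is only an illustrative outline, not the applicant's implementation: the large model, the first classification model, and the retraining routine are passed in as hypothetical callables (`detect`, `is_target`, `retrain`), and the stopping rule is simplified to a threshold floor plus a round limit, both of which are assumptions.

```python
from typing import Callable, List, Tuple

# A detection is (image_id, box, confidence); the layout is an assumption.
Detection = Tuple[str, Tuple[int, int, int, int], float]


def expand_and_retrain(
    detect: Callable[[float], List[Detection]],   # stand-in for the large model
    is_target: Callable[[Detection], bool],       # stand-in for the first classification model
    retrain: Callable[[List[Detection]], None],   # stand-in for retraining both models
    first_threshold: float,
    step: float,
    min_threshold: float,
    max_rounds: int,
) -> List[float]:
    """Run the lower/mark/classify/retrain loop and return the thresholds used."""
    used: List[float] = []
    threshold = first_threshold
    for _ in range(max_rounds):
        # Lower the threshold; rounding avoids floating-point drift.
        threshold = round(threshold - step, 6)
        if threshold < min_threshold:
            break
        # Marking step: keep detections whose confidence is not smaller than the threshold.
        candidates = detect(threshold)
        # Classify each detection frame and keep those judged to be targets.
        accepted = [d for d in candidates if is_target(d)]
        # Retrain the classification model and the target recognition model.
        retrain(accepted)
        used.append(threshold)
    return used
```

In this outline the models produced by `retrain` implicitly become the "new first" models for the next round; a real implementation would also have to refresh `is_target` after each retraining.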

In one possible implementation manner, the first training is performed on the initial auxiliary training classification model and the initial target recognition model by using the second image set as a training sample, so as to obtain a first classification model with classification accuracy lower than the first requirement accuracy and a first target recognition model with target recognition accuracy lower than the second requirement accuracy, which include:

labeling the images in the first target detection frame in the second image set as targets, and labeling the background images around the first target detection frame as backgrounds;

Taking the image marked as a target and the image marked as a background in the second image set as training samples, and performing primary training on the initial auxiliary training classification model to obtain a first classification model with classification accuracy lower than first requirement accuracy;

and performing background filling treatment or masking treatment on non-target and non-background areas in each image of the second image set to obtain a treated second image set, taking the treated second image set as a training sample, and performing primary training on the initial target recognition model to obtain a first target recognition model with target recognition accuracy lower than second requirement accuracy.
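The masking treatment described above might look like the following minimal sketch. It assumes, purely for illustration, that the "background around a detection frame" is a fixed-width margin around each box and that an image is a 2D list of pixel values; the text does not specify either, and `mask_unlabeled`, `margin`, and `fill` are hypothetical names.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open pixel ranges (assumed layout)


def mask_unlabeled(image: List[List[int]], boxes: List[Box],
                   margin: int, fill: int = 0) -> List[List[int]]:
    """Keep pixels inside any box expanded by `margin`; overwrite all others with `fill`."""
    height, width = len(image), len(image[0])
    # Start from a fully masked image, then copy back the labeled regions.
    out = [[fill] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for y in range(max(0, y0 - margin), min(height, y1 + margin)):
            for x in range(max(0, x0 - margin), min(width, x1 + margin)):
                out[y][x] = image[y][x]
    return out
```

Pixels outside every expanded box are neither labeled target nor labeled background, so they are overwritten and cannot be learned as false background.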

In one possible implementation, based on the classification result and the third image set, retraining the first classification model and the first target recognition model respectively to obtain a second classification model with classification accuracy higher than that of the first classification model and a second target recognition model with target recognition accuracy higher than that of the first target recognition model, including:

labeling an image in a second target detection frame with a classification result as a target, and labeling a background image around the second target detection frame with the classification result as the target as a background;

Taking the image marked as a target and the image marked as a background in the third image set as training samples, and retraining the first classification model to obtain a second classification model with classification accuracy higher than that of the first classification model;

and performing background filling processing or masking processing on non-target and non-background areas in each image of the third image set to obtain a processed third image set, and re-training the first target recognition model by taking the processed third image set as a training sample to obtain a second target recognition model with target recognition accuracy higher than that of the first target recognition model.

In one possible implementation manner, during each iterative training, if the second threshold is smaller than the preset threshold, after the marking step, the method further includes:

inputting the image in the second target detection frame into the first target recognition model to obtain a target recognition result;

correspondingly, based on the classification result and the third image set, respectively retraining the first classification model and the first target recognition model to obtain a second classification model with classification accuracy higher than that of the first classification model and a second target recognition model with target recognition accuracy higher than that of the first target recognition model, wherein the retraining comprises the following steps:

respectively retraining the first classification model and the first target recognition model based on the classification result, the target recognition result and the third image set to obtain a second classification model with classification accuracy higher than that of the first classification model and a second target recognition model with target recognition accuracy higher than that of the first target recognition model.

In one possible implementation manner, based on the classification result, the target recognition result and the third image set, respectively retraining the first classification model and the first target recognition model to obtain a second classification model with higher classification accuracy than the first classification model and a second target recognition model with higher target recognition accuracy than the first target recognition model, including:

labeling an image in a second target detection frame with the classification result as a target and the target identification result as a target, and labeling a background image around the second target detection frame with the classification result as a target and the target identification result as a target as a background;

and respectively retraining the first classification model and the first target recognition model based on the image marked as the target and the image marked as the background in the third image set to obtain a second classification model with classification accuracy higher than that of the first classification model and a second target recognition model with target recognition accuracy higher than that of the first target recognition model.

In one possible implementation manner, the preset condition includes at least one of a proportion of newly added positive samples in the training samples after the current expansion being lower than a preset proportion, performance of the first target recognition model meeting a preset requirement, a first threshold value after the current reduction being smaller than a third threshold value, and a training frequency being greater than a first preset frequency.
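As a hedged illustration, the preset condition can be expressed as a single check. The sketch assumes all four listed conditions are in use and that any one of them ends training; the text only requires at least one of them to be included. `RoundStats` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RoundStats:
    new_positive_ratio: float      # proportion of newly added positive samples after this expansion
    model_meets_requirement: bool  # whether the first target recognition model meets the preset requirement
    threshold: float               # first threshold after the current reduction
    rounds_done: int               # number of training rounds so far


def should_stop(stats: RoundStats, preset_ratio: float,
                third_threshold: float, first_preset_count: int) -> bool:
    """Return True as soon as any one of the four conditions holds."""
    return (
        stats.new_positive_ratio < preset_ratio
        or stats.model_meets_requirement
        or stats.threshold < third_threshold
        or stats.rounds_done > first_preset_count
    )
```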

In one possible implementation manner, in the iterative training process, when the iteration number reaches the second preset number, if it is determined that the first classification model cannot correctly classify the target and the background, adding a plurality of manually marked images, and performing iterative training.

In a second aspect, an embodiment of the present application provides a training apparatus for a target recognition model, including:

the first large model detection module is used for acquiring a first image set without labels, and carrying out first target identification on the first image set based on a large model to obtain a second image set containing a first target detection frame with confidence coefficient not smaller than a first threshold value;

the first training module is used for taking the second image set as a training sample, and respectively carrying out primary training on the initial auxiliary training classification model and the initial target recognition model to obtain a first classification model with classification accuracy lower than first requirement accuracy and a first target recognition model with target recognition accuracy lower than second requirement accuracy;

The iterative training module is used for expanding training samples for multiple times according to the large model, the first image set and the first classification model in a mode of reducing the first threshold value, and training the first classification model and the first target recognition model based on the training samples after expansion each time so as to improve the classification accuracy of the first classification model and the target recognition accuracy of the first target recognition model until a preset condition is met, so that the target recognition model after final training is completed is obtained.

In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory is configured to store a computer program, and the processor is configured to invoke and run the computer program stored in the memory, to perform a training method of the object recognition model according to the first aspect or any possible implementation manner of the first aspect.

In a fourth aspect, embodiments of the present application provide a vehicle comprising an electronic device as in the third aspect.

In a fifth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program, which when executed by a processor implements the steps of the training method of the object recognition model as described above in the first aspect or any one of the possible implementations of the first aspect.

The embodiments of the application provide a training method and device for a target recognition model, a vehicle, and a storage medium. In the method, target recognition is performed on an unlabeled first image set through a large model to obtain a second image set containing first target detection frames whose confidence is not smaller than a first threshold, and the second image set is used for primary training of an initial auxiliary training classification model and an initial target recognition model, yielding a first classification model and a first target recognition model that do not yet meet the use requirements. The training samples are then continuously expanded, by reducing the first threshold, according to the large model, the first image set and the first classification model, and the first classification model and the first target recognition model are each trained on the training samples after each expansion. The two models thereby keep learning new features of the targets, so that the classification accuracy of the first classification model and the target recognition accuracy of the first target recognition model improve until a preset condition is met, at which point the finally trained target recognition model is obtained. The target recognition model can therefore be trained on unlabeled image data without manually labeling a large amount of image data, which saves time and labor and reduces labeling cost.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required for the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of an implementation of a training method of a target recognition model according to an embodiment of the present application;

FIG. 2 is a schematic structural diagram of a training device for a target recognition model according to an embodiment of the present application;

FIG. 3 is a schematic diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the following description will be made with reference to the accompanying drawings by way of specific embodiments.

Referring to fig. 1, a flowchart of an implementation of a training method of a target recognition model according to an embodiment of the present application is shown, where an execution subject of the training method of the target recognition model may be an electronic device, and the electronic device may be a vehicle-mounted controller or other electronic devices, which is not limited herein specifically.

The training method of the target recognition model is detailed as follows:

In S101, an unlabeled first image set is acquired, and first target recognition is performed on the first image set based on a large model to obtain a second image set containing first target detection frames whose confidence is not smaller than a first threshold.

The images in the unlabeled first image set carry no annotations; that is, neither targets nor background are labeled in them. A target is an object that the target recognition model is to recognize, and the background can be understood as everything that is not a target.

The first image set may include images acquired by a camera located outside or inside the vehicle. For example, if images of the vehicle's exterior are collected by a camera outside the vehicle, targets such as pedestrians or vehicles (bicycles, automobiles, and the like) can be recognized by the target recognition model so as to alert the driver. If in-vehicle images are collected by a camera inside the vehicle, the target recognition model can recognize occupants to determine whether each seat is occupied, recognize an occupant's hands to further determine whether the occupant is smoking or making a phone call, or recognize the driver's face to further determine whether the driver is fatigued, and so on.

The large model may also be referred to as a large language model (LLM), a foundation model, or a base model. It is a highly versatile deep learning model trained on massive data with large computing resources.

A large model may be regarded as a machine learning model with a very large number of parameters that requires substantial computing resources. Large models typically require large amounts of data and computational power during training and have millions to billions of parameters. They are designed to improve the representational capacity and performance of the model, and to better capture patterns and regularities in the data when handling complex tasks. Having been trained on hundreds of millions of data items, the large model has learned a wide range of knowledge and can perform target recognition according to the user's requirements.

In some possible implementations, the large model may be a Grounding DINO model.

In this embodiment, first target recognition may be performed on each image in the unlabeled first image set through the large model. A first threshold is set to a relatively high value, for example 0.9 or 0.95. The large model identifies the targets in the first image set whose confidence is greater than or equal to the first threshold and marks each of them with a first target detection frame, so that the target recognition model can be trained on these high-confidence targets. The targets may be set according to actual use requirements, for example the aforementioned pedestrians and vehicles, or the aforementioned in-vehicle occupants, hands, faces, and so on.

In this embodiment, the image set obtained after the large model performs the first target recognition on the unlabeled first image set, keeping only detections whose confidence is greater than or equal to the first threshold, is referred to as the second image set. The second image set contains first target detection frames whose confidence is greater than or equal to the first threshold; that is, the confidence that the image within each first target detection frame is a target is greater than or equal to the first threshold.
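A minimal sketch of how the second image set could be assembled from the large model's raw detections is given below. The `Detection` record and `build_image_set` function are hypothetical; the sketch also assumes that images with no detection at or above the threshold are simply left out, which the text does not state explicitly.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Detection(NamedTuple):
    image_id: str
    box: Tuple[int, int, int, int]  # detection frame; the coordinate layout is an assumption
    label: str                      # target type reported by the large model
    confidence: float


def build_image_set(detections: Iterable[Detection],
                    threshold: float) -> Dict[str, List[Detection]]:
    """Group detections by image, keeping those with confidence not smaller than `threshold`."""
    kept: Dict[str, List[Detection]] = {}
    for det in detections:
        if det.confidence >= threshold:
            kept.setdefault(det.image_id, []).append(det)
    return kept
```

Calling the same function with a lower threshold on the same detections yields a superset of detection frames, which is what the later expansion rounds rely on.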

In S102, the second image set is used as a training sample, and initial training is performed on the initial auxiliary training classification model and the initial target recognition model respectively, so as to obtain a first classification model with classification accuracy lower than the first requirement accuracy and a first target recognition model with target recognition accuracy lower than the second requirement accuracy.

The initial auxiliary training classification model serves as an auxiliary model for training the initial target recognition model. It classifies an input image, and its number of output branches can be set according to the number of target types that the initial target recognition model is to recognize. Specifically, the number of output branches equals the number of target types plus one: each target type corresponds to one output branch, and the one additional output branch corresponds to the background, that is, non-targets. For example, if the initial target recognition model is used to recognize pedestrians and vehicles, the output of the initial auxiliary training classification model may be divided into three categories: pedestrian, vehicle, and other (non-target).
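The branch count can be illustrated with a small helper that builds the class index for the classification model. The function name and the choice to place the background branch last are assumptions made for the example.

```python
from typing import Dict, Sequence


def build_class_index(target_types: Sequence[str]) -> Dict[str, int]:
    """Map each target type to one output branch and add one extra branch for the background."""
    names = list(target_types) + ["background"]
    return {name: index for index, name in enumerate(names)}


class_index = build_class_index(["pedestrian", "vehicle"])
num_output_branches = len(class_index)  # number of target types plus one
```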

In the subsequent iterative training process, the confidence threshold applied to the large model is continuously reduced, so the target images identified by the large model are no longer reliably accurate. The first classification model obtained after each round of training is therefore needed to classify the target images identified by the large model, and only images whose classification result is a target are used as target training samples, which improves the labeling accuracy of the training samples.

The initial target recognition model is a model to be trained in the application, and the target recognition model which is finally trained can be obtained through training the model for multiple times.

The input of the initial auxiliary training classification model is an image to be classified, and the output is a classification result, such as whether the image is a pedestrian, a vehicle or a non-target. The input of the initial target recognition model is an image on which target recognition is to be performed, and the output is the image with the targets marked.

Because the confidence of the first target detection frames contained in the second image set is not smaller than the first threshold, that is, the confidence is high, the second image set can be used directly for primary training of the initial auxiliary training classification model to obtain the first classification model, and directly for primary training of the initial target recognition model to obtain the first target recognition model, without further classification confirmation of the second image set.

Specifically, the image marked as the target and the image marked as the background in the second image set can be segmented from the original image, and the segmented image marked as the target and the segmented image marked as the background are respectively input into an initial auxiliary training classification model for initial training, so that a first classification model is obtained. The original images marked with the targets and the background in the second image set can be respectively input into the initial target recognition model for primary training, and the first target recognition model is obtained.

Through primary training, the first classification model and the first target recognition model learn some features of the targets. However, because the confidence required of the current training samples is high (not smaller than the first threshold), relatively few training samples serve as target images, so only part of the features of the targets can be learned. As a result, the classification accuracy of the first classification model is lower than the first required accuracy, the target recognition accuracy of the first target recognition model is lower than the second required accuracy, and training must continue. The first required accuracy is the classification accuracy that the first classification model is ultimately expected to reach, and the second required accuracy is the target recognition accuracy that the first target recognition model is ultimately expected to reach.

In S103, in a manner of reducing the first threshold, according to the large model, the first image set and the first classification model, the training sample is extended for multiple times, and the first classification model and the first target recognition model are respectively trained based on the training sample after each extension, so as to improve the classification accuracy of the first classification model and the target recognition accuracy of the first target recognition model until a preset condition is met, and a target recognition model after final training is obtained.

In this embodiment, each time the first threshold is reduced and the large model performs target recognition on the first image set again, the number of target detection frames whose confidence is not smaller than the reduced threshold increases. Because the confidence is lower, however, the image within a target detection frame may well not be a real target. The images within the target detection frames can therefore be classified again by the first classification model obtained from the previous training to determine whether each is a real target, which expands the training samples while preserving their accuracy. Each reduction of the first threshold yields an expanded set of training samples.

The first classification model and the first target recognition model from the previous round are each trained on the training samples after each expansion, so that both models learn new features of the targets, and the classification accuracy of the first classification model and the target recognition accuracy of the first target recognition model continue to improve until a preset condition is met. The first target recognition model at that point is the finally trained target recognition model.

The first threshold may be lowered by a certain step. The step of each reduction can be determined according to actual requirements, for example, 0.1, 0.05 or 0.02, etc.

In this embodiment, target recognition is performed on an unlabeled first image set through a large model to obtain a second image set containing first target detection frames whose confidence is not smaller than a first threshold, and the second image set is used for primary training of the initial auxiliary training classification model and the initial target recognition model, yielding a first classification model and a first target recognition model that do not yet meet the use requirements. The training samples are then continuously expanded, by reducing the first threshold, according to the large model, the first image set and the first classification model, and the first classification model and the first target recognition model are each trained on the training samples after each expansion. The two models thereby keep learning new features of the targets, so that the classification accuracy of the first classification model and the target recognition accuracy of the first target recognition model improve until a preset condition is met, at which point the finally trained target recognition model is obtained. The target recognition model can therefore be trained on unlabeled image data without manually labeling a large amount of image data, which saves time and labor and reduces labeling cost.

In some embodiments, in the step S103, the expanding the training samples multiple times according to the large model, the first image set and the first classification model in a manner of reducing the first threshold, and training the first classification model and the first target recognition model based on each expanded training sample respectively may include the following steps one to five.

In step one, the first threshold is lowered to obtain a second threshold.

As previously described, the first threshold may be lowered in steps to obtain the second threshold; the second threshold is less than the first threshold.

In step two, namely the marking step, second target recognition is performed on the first image set based on the large model to obtain a third image set containing second target detection frames whose confidence is not smaller than the second threshold; the number of second target detection frames is greater than the number of first target detection frames.

In this embodiment, second target recognition may be performed on each image in the unlabeled first image set through the large model. To increase the number of training samples that serve as target images, so that the target recognition model learns more features of the targets, the confidence required of the large model's detections is lowered: a second threshold smaller than the first threshold is set, for example 0.8 or 0.85. The large model identifies the targets in the first image set whose confidence is not smaller than the second threshold and marks each with a second target detection frame, yielding the third image set.

In this embodiment, an image set obtained by performing object recognition on a first image set without labels by using a large model, where the confidence coefficient is greater than or equal to a second threshold value, is referred to as a third image set. The third image set includes a second target detection box having a confidence level greater than or equal to a second threshold. The confidence level of the image in the second target detection frame for the target is greater than or equal to a second threshold.

Because the confidence of the large model is reduced, the number of second target detection frames obtained by the second target recognition is greater than the number of first target detection frames obtained by the first target recognition, namely the number of targets obtained by the large model for the second target recognition is greater than the number of targets obtained by the large model for the first target recognition.
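The relationship between the two thresholds can be illustrated with a minimal sketch. The `Detection` type, the `keep_confident` helper and the sample scores are hypothetical and only stand in for the output of the large model; the value 0.9 for the first threshold is an assumption.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    """Hypothetical container for one detection frame from the large model."""
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2)
    score: float                    # confidence reported by the large model


def keep_confident(dets: List[Detection], threshold: float) -> List[Detection]:
    """Keep detection frames whose confidence is not less than the threshold."""
    return [d for d in dets if d.score >= threshold]


# Made-up detections for a single unlabeled image.
dets = [
    Detection((0, 0, 10, 10), 0.95),
    Detection((20, 0, 30, 10), 0.85),
    Detection((40, 0, 50, 10), 0.60),
]
first = keep_confident(dets, 0.9)    # assumed first threshold
second = keep_confident(dets, 0.8)   # second threshold, lower than the first
```

Every frame kept under the first threshold is also kept under the second, so lowering the threshold can only add frames.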

In the third step, the image in the second target detection frame is input into the first classification model, and a classification result is obtained.

Since the second threshold is lower than the first threshold, there may be some images within the second object detection frame that are not objects. In order to further determine whether the images in the second target detection frames are real targets, the images in the second target detection frames are input into the first classification model, and respective corresponding classification results are obtained.

The first classification model has a certain classification capability because it has been trained to learn the characteristics of some objects. By the dual determination of the large model and the first classification model, the accuracy of the training sample as the target image can be improved.

In step four, the first classification model and the first target recognition model are each trained again based on the classification result and the third image set, to obtain a second classification model whose classification accuracy is higher than that of the first classification model and a second target recognition model whose target recognition accuracy is higher than that of the first target recognition model.

The images serving as targets and the images serving as background are determined from the classification result of the first classification model and the third image set. The first classification model is retrained on these target and background images to obtain the corresponding second classification model, and the first target recognition model is retrained on the same images to obtain the corresponding second target recognition model.

Specifically, for the image in the second target detection frame in the third image set, it may be determined whether the image in the second target detection frame is a target image by further classification of the first classification model. By reducing the confidence coefficient of the large model (namely setting a second threshold value), and classifying the recognition result of the large model through the first classification model, the images marked as targets in the current training and the previous training can be different, namely the diversity of training samples is increased, so that the first classification model and the first target recognition model learn more characteristics of the targets, the classification accuracy of the second classification model obtained after the current training is higher than that of the first classification model, and the target recognition accuracy of the second target recognition model obtained after the current training is higher than that of the first target recognition model.

In step five, the second threshold is lowered to obtain a new second threshold, the second classification model is taken as the new first classification model, the second target recognition model is taken as the new first target recognition model, and the process jumps back to the marking step for iterative training.

When the preset conditions are met, the iterative training is ended, and a target recognition model with the final training completed is obtained.

To train the target recognition model further and improve its target recognition accuracy, the second threshold is lowered continuously so that the training samples of each round stay diverse and grow in number, and the target recognition model learns more features of the target. The process jumps back to the marking step for iterative training (steps two to five are executed in a loop) until the preset condition is met, and the finally trained target recognition model is obtained.

The second threshold may be lowered by a fixed step each time. The step size can be chosen according to actual requirements, for example 0.1, 0.05 or 0.02.
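As an illustration of lowering the threshold by a fixed step, the following sketch generates the successive thresholds. The function name `threshold_schedule`, the `floor` parameter and the starting value 0.9 are assumptions made for this example; rounding is used only to keep floating-point error out of the comparison.

```python
from typing import Iterator


def threshold_schedule(first_threshold: float, step: float, floor: float) -> Iterator[float]:
    """Yield successively lowered thresholds, stopping before going below floor."""
    k = 1
    while True:
        # Compute from the starting value each time to avoid accumulated error.
        t = round(first_threshold - k * step, 10)
        if t < floor:
            return
        yield t
        k += 1


schedule = list(threshold_schedule(0.9, 0.1, 0.6))
```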

In this method, the large model produces a third image set containing second target detection frames whose confidence is greater than or equal to the second threshold, and the images in the second target detection frames are classified by the already trained first classification model, giving target detection results with high reliability. The first classification model and the first target recognition model continue to be trained based on the classification result and the third image set, so that both models learn new features of the target and their classification accuracy and target recognition accuracy improve. The second threshold is lowered continuously and the training is iterated, so that new features of the target keep being learned until the preset condition is met and the finally trained target recognition model is obtained.
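Steps one to five can be sketched as a single loop. This is a simplified illustration under stated assumptions, not the implementation of the application: detections are reduced to hypothetical `(crop_id, confidence)` pairs, and `is_target` and `retrain` are placeholder callables standing in for the first classification model and for the retraining of both models. Only two of the possible preset conditions (a lower bound on the threshold and a maximum number of rounds) are modeled.

```python
from typing import Callable, List, Tuple

Detection = Tuple[str, float]  # (hypothetical crop id, large-model confidence)


def iterate_training(
    detections: List[Detection],
    first_threshold: float,
    step: float,
    third_threshold: float,
    max_rounds: int,
    is_target: Callable[[str], bool],
    retrain: Callable[[List[str]], None],
) -> List[float]:
    """Run steps one to five in a loop and return the thresholds actually used."""
    used = []
    threshold = first_threshold
    for _ in range(max_rounds):
        # Steps one and five: lower the threshold by a fixed step.
        threshold = round(threshold - step, 10)
        if threshold < third_threshold:
            break  # threshold too low to trust the large model any further
        # Step two (marking step): keep frames at or above the threshold.
        candidates = [c for c, s in detections if s >= threshold]
        # Step three: let the classification model confirm each candidate.
        positives = [c for c in candidates if is_target(c)]
        # Step four: retrain both models on the confirmed samples.
        retrain(positives)
        used.append(threshold)
    return used


history: List[List[str]] = []
used = iterate_training(
    detections=[("a", 0.95), ("b", 0.85), ("c", 0.75), ("d", 0.4)],
    first_threshold=0.9,
    step=0.1,
    third_threshold=0.7,
    max_rounds=10,
    is_target=lambda crop: crop != "c",  # toy classifier that rejects crop "c"
    retrain=history.append,              # toy trainer that records its samples
)
```

In the toy run, crop "c" passes the lowered threshold in the second round but is rejected by the classifier, which is the dual check the text describes.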

In some embodiments, the step S102 may include:

performing background filling processing or mask processing on non-target and non-background areas in each image of the second image set to obtain a processed second image set, taking the processed second image set as a training sample, and performing primary training on an initial target recognition model to obtain a first target recognition model with target recognition accuracy lower than second requirement accuracy.

The background image around a first target detection frame that is marked as background does not intersect the image in any first target detection frame.

The confidence of a background image marked as background is smaller than a fourth threshold, which is a small value, for example 0.1 or 0.2.

The confidence of the non-target and non-background areas in each image of the second image set lies between the fourth threshold and the first threshold. The non-target and non-background areas of an image are the regions of the image other than those labeled as target and those labeled as background.
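The three-way split by confidence can be written as a small helper. The function name `region_label`, the label strings and the default first threshold of 0.9 are assumptions for illustration; the fourth threshold of 0.1 is one of the example values given above.

```python
def region_label(confidence: float,
                 first_threshold: float = 0.9,
                 fourth_threshold: float = 0.1) -> str:
    """Classify a region by the large model's confidence for the target."""
    if confidence >= first_threshold:
        return "target"
    if confidence < fourth_threshold:
        return "background"
    # Between the two thresholds: neither target nor background.
    return "ignore"
```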

In this embodiment, the images in the first target detection frames in the images of the second image set are all marked as targets, and the background images of a certain area around each first target detection frame are marked as backgrounds. The size of the certain area can be set according to actual requirements, and is not particularly limited herein.
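One possible way to pick background patches around a detection frame that intersect no target frame is sketched below. The four-neighbour layout, the `margin` parameter and the function names are assumptions made for this example; the application leaves the size of the surrounding area open.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), with x2 and y2 exclusive


def intersects(a: Box, b: Box) -> bool:
    """Return True when the two boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def background_patches(box: Box, targets: List[Box], margin: int,
                       width: int, height: int) -> List[Box]:
    """Propose patches left, right, above and below box that lie inside the
    image and intersect none of the target boxes."""
    x1, y1, x2, y2 = box
    candidates = [
        (x1 - margin, y1, x1, y2),  # left
        (x2, y1, x2 + margin, y2),  # right
        (x1, y1 - margin, x2, y1),  # above
        (x1, y2, x2, y2 + margin),  # below
    ]
    return [
        p for p in candidates
        if p[0] >= 0 and p[1] >= 0 and p[2] <= width and p[3] <= height
        and not any(intersects(p, t) for t in targets)
    ]


targets = [(10, 10, 20, 20), (20, 10, 30, 20)]
patches = background_patches(targets[0], targets, margin=5, width=100, height=100)
```

Here the patch to the right of the first frame is dropped because it overlaps the second target frame.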

The images marked as target and the images marked as background in the second image set are cropped from the original images and input into the initial auxiliary training classification model for initial training, so that the model learns some of the features of the target, yielding the first classification model.

As described in step three, the first classification model can further classify the results of the second target recognition by the large model and determine which images can finally be used as targets, improving the accuracy of the target images in the training samples.

Because the non-target and non-background areas in the images of the second image set have not been determined to be either target or background, they cannot be used to train the initial target recognition model. Background filling processing or mask processing can therefore be applied to these areas, that is, the areas are filled with image content marked as background, or the areas are masked (occluded), to obtain the processed second image set. The images in the processed second image set are then input into the initial target recognition model for initial training, yielding the first target recognition model.
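The mask processing variant can be sketched with plain nested lists standing in for an image. The function name `mask_unlabeled`, the fill value 0 and the tiny 4x4 image are assumptions for illustration; a real pipeline would more likely operate on image arrays.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), with x2 and y2 exclusive


def mask_unlabeled(image: List[List[int]], target_boxes: List[Box],
                   background_boxes: List[Box], fill: int = 0) -> List[List[int]]:
    """Return a copy of image in which every pixel outside the target and
    background boxes is replaced by fill (mask processing)."""
    keep = list(target_boxes) + list(background_boxes)

    def inside(x: int, y: int) -> bool:
        return any(x1 <= x < x2 and y1 <= y < y2 for x1, y1, x2, y2 in keep)

    return [
        [px if inside(x, y) else fill for x, px in enumerate(row)]
        for y, row in enumerate(image)
    ]


image = [[9] * 4 for _ in range(4)]
masked = mask_unlabeled(image, target_boxes=[(0, 0, 2, 2)],
                        background_boxes=[(3, 0, 4, 2)])
```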

In some possible implementations, the target recognition may be performed on the first image set through a large model, and the image in the detection frame with the recognized confidence coefficient smaller than the fourth threshold value is used as a background image and is marked as a background.

In some embodiments, the step four may include:

The background image around the second target detection frame with the classification result as the target and the images in the second target detection frame with the classification result as the target are not intersected.

In this embodiment, the confidence required of the large model is lowered in order to increase the number and diversity of the training samples. The large model regards the image in each second target detection frame as a target, but because the confidence has been lowered, that image may not actually be the target. To improve the accuracy of the target images, the images in the second target detection frames are therefore further classified by the first classification model, which has already learned some of the features of the target. An image in a second target detection frame that the first classification model classifies as target is marked as target. An image that the first classification model classifies as non-target is discarded and is no longer regarded as a target, so that non-target images are removed and do not degrade the training effect.

The present embodiment also labels as background the background image of a certain area around the second target detection frame labeled as target in the third image set.

The images marked as target and the images marked as background in the third image set are input into the first classification model for training, to obtain a second classification model whose classification accuracy is higher than that of the first classification model.

Because the non-target and non-background areas in the images of the third image set have not been determined to be either target or background, they cannot be used to train the first target recognition model. Background filling processing or mask processing can therefore be applied to these areas, that is, the areas are filled with image content marked as background, or the areas are masked (occluded), to obtain the processed third image set. The images in the processed third image set are then input into the first target recognition model for training, to obtain a second target recognition model whose target recognition accuracy is higher than that of the first target recognition model.

In some embodiments, during each iterative training, if the second threshold is less than the preset threshold, after the marking step, the method further includes:

In some embodiments, retraining the first classification model and the first target recognition model based on the classification result, the target recognition result, and the third image set to obtain a second classification model with classification accuracy higher than the first classification model and a second target recognition model with target recognition accuracy higher than the first target recognition model, respectively, including:

The preset threshold may be the threshold that separates reliable from unreliable results of target recognition by the large model. For example, if a target recognition result with a confidence less than 0.5 is considered unreliable and a target recognition result with a confidence greater than or equal to 0.5 is considered reliable, the preset threshold is 0.5.

When the second threshold is smaller than the preset threshold, the result of target recognition on the first image set by the large model is quite unreliable. The first classification model is also not yet fully trained, so its classification result carries some uncertainty as well. Determining targets from the target recognition result of the large model and the classification result of the first classification model alone is therefore highly uncertain. In this case, the first target recognition model is introduced to perform further target recognition on the detection results of the large model, and only the images that the large model, the first classification model and the first target recognition model all recognize as targets are marked as targets. Because the first target recognition model has acquired some target recognition capability after several rounds of training, combining the three models further improves the accuracy of the targets, so that the first classification model and the first target recognition model can be trained further on accurate target images, improving the classification capability of the classification model and the target recognition capability of the target recognition model.
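The decision rule described here can be summarized in a short sketch. The function name `accept_as_target` and its boolean inputs are hypothetical; the frame is assumed to have already passed the large model's second threshold, and the default preset threshold of 0.5 follows the example above.

```python
def accept_as_target(second_threshold: float,
                     classifier_says_target: bool,
                     recognizer_says_target: bool,
                     preset_threshold: float = 0.5) -> bool:
    """Decide whether a frame proposed by the large model is marked as target.

    At or above the preset threshold the classification model alone confirms
    the frame; below it, the target recognition model must agree as well.
    """
    if second_threshold < preset_threshold:
        return classifier_says_target and recognizer_says_target
    return classifier_says_target
```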

The background image around a second target detection frame whose classification result and target recognition result are both target does not intersect the image in any second target detection frame whose classification result and target recognition result are both target.

The retraining the first classification model and the first target recognition model based on the image marked as the target and the image marked as the background in the third image set to obtain a second classification model with classification accuracy higher than the first classification model and a second target recognition model with target recognition accuracy higher than the first target recognition model, may include:

The specific process may refer to the foregoing embodiments, and will not be described in detail.

In some embodiments, the preset condition includes at least one of a proportion of newly added positive samples in the training samples after the current expansion being lower than a preset proportion, performance of the first target recognition model meeting a preset requirement, a first threshold value after the current reduction being smaller than a third threshold value, and a training number being greater than a first preset number.

The preset proportion is a low proportion, for example 10% or 5%. The proportion of newly added positive samples in the training samples after the current expansion is the ratio of the number of newly added images determined to be targets to the total number of images determined to be targets in those training samples. A newly added image determined to be a target is one that is determined to be a target in the training samples after the current expansion but was not in the training samples after the previous expansion. When this proportion is lower than the preset proportion, further iterative training would yield few additional positive samples and the target recognition model could learn little more, so the iterative training can be ended.

The performance of the first target recognition model after the current training can be tested on a test set. The preset requirement may include that the target recognition accuracy measured on the test set is higher than a certain proportion, for example higher than 95% or 98%; when the requirement is met, the iterative training can be ended.

The third threshold is a smaller threshold, such as 0.1 or 0.2, etc. When the first threshold value after the current reduction is too low, the accuracy of the target identified by the large model is too low, the training effect can not be improved any more, and at this time, the training can be ended.

The number of training rounds is incremented by one each time training of the first classification model and the first target recognition model is completed. That is, each completion of steps two to five counts as one round of training. When the number of training rounds is greater than the first preset number of times, the iterative training may be ended. The first preset number of times may be set according to actual needs and is not particularly limited herein.
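The four stopping conditions can be combined into one check, sketched below. The function name `should_stop` and all default values are assumptions chosen from the example ranges in the text; since the preset condition includes at least one of the four, an implementation may equally check only a subset.

```python
def should_stop(new_positives: int, total_positives: int, accuracy: float,
                threshold: float, rounds: int,
                preset_ratio: float = 0.05, required_accuracy: float = 0.95,
                third_threshold: float = 0.1, max_rounds: int = 20) -> bool:
    """Return True when any one of the four preset conditions holds."""
    # Share of positives that are new compared with the previous expansion;
    # with no positives at all, nothing new can be learned.
    ratio = new_positives / total_positives if total_positives else 0.0
    return (
        ratio < preset_ratio              # too few newly added positive samples
        or accuracy > required_accuracy   # model already meets the requirement
        or threshold < third_threshold    # lowered threshold is too small
        or rounds > max_rounds            # training-round budget is used up
    )
```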

In some embodiments, in the iterative training process, when the iteration number reaches the second preset number, if it is determined that the first classification model cannot correctly classify the target and the background, a plurality of images with artificial labels are added to perform iterative training.

The second preset number of times is smaller than the first preset number of times, and the second preset number of times is a smaller number of times, such as 3, 4, 5, or the like.

When the number of iterations reaches the second preset number of times, if the first classification model is found to be unable to classify the target and the background correctly, a number of manually labeled images can be added, so that the first classification model and the first target recognition model learn the features of the correct target, improving the classification accuracy of the first classification model and the target recognition accuracy of the first target recognition model. If the classification accuracy of the first classification model on the target is lower than a certain threshold, the first classification model is considered unable to classify the target and the background accurately. This threshold may be lower than the first required accuracy and may be set according to actual requirements, for example 70% or 60%. The number of iterations may be equal to the number of training rounds.
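A minimal sketch of this fallback check follows. The function name `needs_manual_labels` and the default values are hypothetical, and treating "reaches" as "greater than or equal to" is an interpretation rather than something the text states.

```python
def needs_manual_labels(iteration: int, classifier_accuracy: float,
                        second_preset_count: int = 3,
                        min_accuracy: float = 0.7) -> bool:
    """Return True when manually labeled images should be added: enough
    iterations have run and the classification model is still too inaccurate."""
    return iteration >= second_preset_count and classifier_accuracy < min_accuracy
```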

In this embodiment, the addition of a part of manually-marked images is to enable the first classification model to distinguish the target from the background, so that the classification accuracy of the first classification model is not too low, and therefore, compared with the existing training method, the number of manually-marked images required can be reduced.

The training method of the target recognition model can be applied to some scenes without or with less annotation data, such as infrared data scenes without or with less annotation data, and the like, so that the cost of manual annotation can be reduced.

It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the sequence numbers should not limit the implementation of the embodiments of the present application in any way.

The following are device embodiments of the present application, for details not described in detail therein, reference may be made to the corresponding method embodiments described above.

Fig. 2 is a schematic structural diagram of a training device for an object recognition model according to an embodiment of the present application, and for convenience of explanation, only a portion related to the embodiment of the present application is shown, which is described in detail below:

as shown in fig. 2, the training apparatus 20 for an object recognition model includes: a first large model detection module 21, a first training module 22 and an iterative training module 23.

The first large model detection module 21 is configured to obtain a first image set without labels, and perform first target recognition on the first image set based on a large model to obtain a second image set including a first target detection frame with a confidence level not less than a first threshold;

The first training module 22 is configured to perform primary training on the initial auxiliary training classification model and the initial target recognition model respectively by using the second image set as a training sample, so as to obtain a first classification model with classification accuracy lower than the first requirement accuracy and a first target recognition model with target recognition accuracy lower than the second requirement accuracy;

the iterative training module 23 is configured to expand the training samples multiple times according to the large model, the first image set and the first classification model in a manner of reducing the first threshold, and train the first classification model and the first target recognition model based on the training samples after expansion each time, so as to improve the classification accuracy of the first classification model and the target recognition accuracy of the first target recognition model until a preset condition is satisfied, thereby obtaining a target recognition model after final training.

In one possible implementation manner, in the iterative training module 23, the above-mentioned manner of reducing the first threshold value, expanding the training samples multiple times according to the large model, the first image set and the first classification model, and training the first classification model and the first target recognition model based on each expanded training sample respectively includes:

Reducing the first threshold value to obtain a second threshold value;

In one possible implementation, the first training module 22 is specifically configured to:

In a possible implementation manner, in the iterative training module 23, based on the classification result and the third image set, the first classification model and the first target recognition model are respectively retrained to obtain a second classification model with a classification accuracy higher than that of the first classification model and a second target recognition model with a target recognition accuracy higher than that of the first target recognition model, which includes:

In a possible implementation manner, in the iterative training module 23, during each iterative training, if the second threshold is smaller than the preset threshold, after the marking step, the method further includes:

In a possible implementation manner, in the iterative training module 23, based on the classification result, the target recognition result and the third image set, the first classification model and the first target recognition model are respectively retrained to obtain a second classification model with higher classification accuracy than the first classification model and a second target recognition model with higher target recognition accuracy than the first target recognition model, which includes:

The present application also provides a computer program product having program code which, when run in a corresponding processor, controller, computing device or electronic apparatus, performs the steps in any one of the above training method embodiments of the object recognition model, for example S101 to S103 shown in fig. 1. Those skilled in the art will appreciate that the methods and apparatus presented in the embodiments of the present application may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof. The special purpose processor may include an Application Specific Integrated Circuit (ASIC), a Reduced Instruction Set Computer (RISC), and/or a Field Programmable Gate Array (FPGA). The proposed method and device are preferably implemented as a combination of hardware and software. The software is preferably installed as an application program on a program storage device. The device is typically a machine based on a computer platform having hardware such as one or more Central Processing Units (CPUs), Random Access Memory (RAM), and one or more input/output (I/O) interfaces. An operating system is also typically installed on the computer platform. The various processes and functions described herein may be part of the application program, which is executed through the operating system.

Fig. 3 is a schematic diagram of an electronic device according to an embodiment of the present application. As shown in fig. 3, the electronic apparatus 3 of this embodiment includes: a processor 30 and a memory 31. The memory 31 is used for storing a computer program 32, and the processor 30 is used for calling and running the computer program 32 stored in the memory 31 to execute the steps in the training method embodiment of each object recognition model described above, such as S101 to S103 shown in fig. 1. Alternatively, the processor 30 is configured to invoke and run the computer program 32 stored in the memory 31 to implement the functions of the modules/units in the above-described device embodiments, such as the functions of the modules/units 21 to 23 shown in fig. 2.

Illustratively, the computer program 32 may be partitioned into one or more modules/units that are stored in the memory 31 and executed by the processor 30 to complete/implement the schemes provided herein. The one or more modules/units may be a series of computer program instruction segments capable of performing the specified functions for describing the execution of the computer program 32 in the electronic device 3. For example, the computer program 32 may be split into modules/units 21 to 23 shown in fig. 2.

The electronic device 3 may include, but is not limited to, a processor 30, a memory 31. It will be appreciated by those skilled in the art that fig. 3 is merely an example of the electronic device 3 and does not constitute a limitation of the electronic device 3, and may include more or fewer components than shown, or may combine certain components, or different components, e.g., the electronic device may further include an input-output device, a network access device, a bus, etc.

The processor 30 may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The memory 31 may be an internal storage unit of the electronic device 3, such as a hard disk or a memory of the electronic device 3. The memory 31 may be an external storage device of the electronic device 3, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 3. Further, the memory 31 may also include both an internal storage unit and an external storage device of the electronic device 3. The memory 31 is used for storing the computer program and other programs and data required by the electronic device. The memory 31 may also be used for temporarily storing data that has been output or is to be output.

Corresponding to the electronic equipment, the embodiment of the application also provides a vehicle, which comprises the electronic equipment.

The description of the vehicle may refer to the description of the method, and will not be repeated.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.

In the foregoing embodiments, each embodiment is described with its own emphasis. For parts not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus/electronic device and method may be implemented in other manners. For example, the apparatus/electronic device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on this understanding, all or part of the flow of the methods of the above embodiments may be implemented by a computer program instructing related hardware. The computer program may be stored in a computer readable storage medium and, when executed by a processor, may implement the steps of each of the above embodiments of the training method of the target recognition model. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content included in the computer readable medium may be appropriately increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunications signals.

Furthermore, the features of the embodiments shown in the drawings or mentioned in the description of the present application are not necessarily to be construed as separate embodiments from each other. Rather, each feature described in one example of one embodiment may be combined with one or more other desired features from other embodiments, resulting in other embodiments not described in text or with reference to the drawings.

The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims

1. A method of training a target recognition model, comprising:

Taking the second image set as a training sample, and respectively carrying out primary training on an initial auxiliary training classification model and an initial target recognition model to obtain a first classification model with classification accuracy lower than first requirement accuracy and a first target recognition model with target recognition accuracy lower than second requirement accuracy;

and expanding training samples for multiple times according to the large model, the first image set and the first classification model in a mode of reducing the first threshold value, and respectively training the first classification model and the first target recognition model based on the training samples after each expansion so as to improve the classification accuracy of the first classification model and the target recognition accuracy of the first target recognition model until preset conditions are met, thereby obtaining the target recognition model after final training.
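The threshold-lowering loop recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the names `expand_samples`, `iterative_training`, `is_target`, `train`, `step`, and `min_threshold` are hypothetical, the large model's output is reduced to (image id, confidence) pairs, and the rule that lower-confidence detections are kept only when the classification model confirms them is an assumption. For brevity the classifier is passed in as a fixed function, whereas in the claimed method it is retrained in every round.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for one large-model detection: (image id, confidence).
Detection = Tuple[str, float]


def expand_samples(
    detections: List[Detection],
    threshold: float,
    first_threshold: float,
    is_target: Callable[[Detection], bool],
) -> List[Detection]:
    """Build the training samples for one round.

    Assumption: detections at or above the original first threshold are
    trusted directly; detections between the lowered threshold and the
    first threshold are kept only if the classifier confirms them.
    """
    kept = []
    for det in detections:
        _, confidence = det
        if confidence >= first_threshold:
            kept.append(det)
        elif confidence >= threshold and is_target(det):
            kept.append(det)
    return kept


def iterative_training(
    detections: List[Detection],
    first_threshold: float,
    step: float,
    min_threshold: float,
    max_rounds: int,
    is_target: Callable[[Detection], bool],
    train: Callable[[List[Detection]], None],
) -> List[float]:
    """Lower the threshold round by round and retrain on each expanded set.

    Returns the thresholds that were actually used, for inspection.
    """
    used = []
    threshold = first_threshold
    for _ in range(max_rounds):
        if threshold < min_threshold:  # one possible preset stop condition
            break
        train(expand_samples(detections, threshold, first_threshold, is_target))
        used.append(threshold)
        # Round to avoid floating-point drift in the threshold schedule.
        threshold = round(threshold - step, 10)
    return used
```

In this sketch `train` is a placeholder for training both the classification model and the target recognition model; a fixed subtraction step is only one possible way of reducing the threshold.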

2. The method according to claim 1, wherein expanding the training samples a plurality of times based on the large model, the first image set, and the first classification model in such a manner as to lower the first threshold value, and training the first classification model and the first target recognition model based on each expanded training sample, respectively, comprises:

Reducing the first threshold value to obtain a second threshold value;

based on the classification result and the third image set, respectively training the first classification model and the first target recognition model again to obtain a second classification model with classification accuracy higher than that of the first classification model and a second target recognition model with target recognition accuracy higher than that of the first target recognition model;

and reducing the second threshold value to obtain a new second threshold value, taking the second classification model as a new first classification model, taking the second target recognition model as a new first target recognition model, and jumping to the marking step to perform iterative training.

3. The method for training the target recognition model according to claim 1, wherein performing primary training on the initial auxiliary training classification model and the initial target recognition model respectively, by taking the second image set as a training sample, to obtain a first classification model with classification accuracy lower than first requirement accuracy and a first target recognition model with target recognition accuracy lower than second requirement accuracy, comprises:

Labeling an image in a first target detection frame in the second image set as a target, and labeling a background image around the first target detection frame as a background;

and performing background filling processing or masking processing on non-target and non-background areas in each image of the second image set to obtain a processed second image set, and performing primary training on the initial target recognition model by taking the processed second image set as a training sample to obtain a first target recognition model with target recognition accuracy lower than second requirement accuracy.
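The masking step in claim 3 can be illustrated with a short sketch under stated assumptions: the image is a grayscale grid stored as nested lists, there is a single detection frame, the "background around the frame" is taken to be a band of `margin` pixels, and masked pixels are set to a constant `fill` value. None of these specifics come from the claim, and `mask_outside_context` is a hypothetical name.

```python
from typing import List, Tuple

# (x1, y1, x2, y2) with x2 and y2 exclusive; a hypothetical box format.
Box = Tuple[int, int, int, int]


def mask_outside_context(
    image: List[List[int]], box: Box, margin: int, fill: int = 0
) -> List[List[int]]:
    """Keep the target frame plus a surrounding background band; mask the rest.

    Pixels inside the frame (target) and within `margin` pixels of it
    (background) are copied; all other pixels are replaced by `fill`.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    x1, y1, x2, y2 = box
    # Expand the frame by the margin and clamp it to the image bounds.
    cx1, cy1 = max(0, x1 - margin), max(0, y1 - margin)
    cx2, cy2 = min(width, x2 + margin), min(height, y2 + margin)
    return [
        [
            image[y][x] if (cx1 <= x < cx2 and cy1 <= y < cy2) else fill
            for x in range(width)
        ]
        for y in range(height)
    ]
```

Masking the regions that are labeled neither target nor background keeps unlabeled objects elsewhere in the image from being treated as negatives during training, which appears to be the purpose of this step.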

4. The method according to claim 2, wherein retraining the first classification model and the first target recognition model respectively, based on the classification result and the third image set, to obtain a second classification model having a classification accuracy higher than that of the first classification model and a second target recognition model having a target recognition accuracy higher than that of the first target recognition model, comprises:

and performing background filling processing or masking processing on non-target and non-background areas in each image of the third image set to obtain a processed third image set, and retraining the first target recognition model by taking the processed third image set as a training sample to obtain a second target recognition model with target recognition accuracy higher than that of the first target recognition model.

5. The method according to claim 2, wherein, during each iteration of training, if the second threshold is smaller than a preset threshold, after the marking step, further comprising:

Correspondingly, retraining the first classification model and the first target recognition model respectively, based on the classification result and the third image set, to obtain a second classification model with classification accuracy higher than that of the first classification model and a second target recognition model with target recognition accuracy higher than that of the first target recognition model, comprises:

and respectively training the first classification model and the first target recognition model based on the classification result, the target recognition result and the third image set to obtain a second classification model with classification accuracy higher than that of the first classification model and a second target recognition model with target recognition accuracy higher than that of the first target recognition model.

6. The method according to claim 5, wherein retraining the first classification model and the first target recognition model respectively, based on the classification result, the target recognition result, and the third image set, to obtain a second classification model having a classification accuracy higher than that of the first classification model and a second target recognition model having a target recognition accuracy higher than that of the first target recognition model, comprises:

7. The training method of the target recognition model according to any one of claims 1 to 6, wherein the preset condition includes at least one of: a proportion of newly added positive samples in the training samples after the current expansion being lower than a preset proportion; performance of the first target recognition model meeting a preset requirement; the first threshold value after the current reduction being smaller than a third threshold value; and a number of training rounds being greater than a first preset number.
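The preset conditions listed in claim 7 can be expressed as a single predicate. The sketch below is illustrative only: `StopConfig`, `should_stop`, and all field names are hypothetical, performance is assumed to be a single scalar where higher is better, and all four conditions are checked even though the claim requires only at least one of them to be included.

```python
from dataclasses import dataclass


@dataclass
class StopConfig:
    """Hypothetical container for the preset values named in the claim."""
    preset_proportion: float      # minimum share of newly added positive samples
    required_performance: float   # assumed scalar performance requirement
    third_threshold: float        # lower bound for the reduced first threshold
    first_preset_number: int      # maximum number of training rounds


def should_stop(
    new_positive_ratio: float,
    performance: float,
    threshold: float,
    rounds_done: int,
    cfg: StopConfig,
) -> bool:
    """Return True if any of the four preset conditions is met."""
    return (
        new_positive_ratio < cfg.preset_proportion
        or performance >= cfg.required_performance
        or threshold < cfg.third_threshold
        or rounds_done > cfg.first_preset_number
    )
```

A training loop would evaluate this predicate after each expansion and retraining round and return the current target recognition model as the final model once it holds.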

8. A training device for a target recognition model, comprising:

and the iterative training module is used for expanding training samples for multiple times according to the large model, the first image set and the first classification model in a mode of reducing the first threshold value, and respectively training the first classification model and the first target recognition model based on the training samples after each expansion so as to improve the classification accuracy of the first classification model and the target recognition accuracy of the first target recognition model until a preset condition is met, so that a target recognition model with final training completion is obtained.

9. A vehicle comprising an electronic device, the electronic device comprising a memory for storing a computer program and a processor for invoking and running the computer program stored in the memory to perform the training method of the target recognition model according to any one of claims 1 to 7.

10. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the training method of the target recognition model according to any one of claims 1 to 7.