CN114596484A - Target detection capability training method and device, storage medium and electronic equipment - Google Patents

Target detection capability training method and device, storage medium and electronic equipment

Info

Publication number
CN114596484A
CN114596484A CN202210164319.2A
Authority
CN
China
Prior art keywords
detection
image
model
detection model
capability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210164319.2A
Other languages
Chinese (zh)
Inventor
关称心
徐青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Ofilm Intelligent Vehicle Co ltd
Original Assignee
Shanghai Ofilm Intelligent Vehicle Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Ofilm Intelligent Vehicle Co ltd filed Critical Shanghai Ofilm Intelligent Vehicle Co ltd
Priority to CN202210164319.2A priority Critical patent/CN114596484A/en
Publication of CN114596484A publication Critical patent/CN114596484A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2431Multiple classes

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image target detection capability training method and apparatus, a storage medium and an electronic device. The method includes: training on a first image carrying a first identifier, so that a first detection model obtains a first detection capability and the model parameters forming that capability; analyzing a second image carrying a second identifier with the first detection model to obtain first information, the first information including at least a plurality of targets and the category of each target; determining, by a second detection model and from the first information, positive and negative examples of the targets of each category; and having the second detection model acquire the model parameters of the first detection model, so that it inherits the first detection capability, and training on the positive and negative examples of the targets of each category, so that the second detection model obtains a second detection capability. The method achieves a strong target detection capability.

Description

Target detection capability training method and device, storage medium and electronic equipment
Technical Field
The present application relates to the field of electronics, and in particular, to an image target detection capability training method, an image target detection apparatus, a computer-readable storage medium, and an electronic device.
Background
Existing strongly supervised target detection algorithms rely on a large amount of manually labeled ground-truth sample data (GT data) to train a target detection/segmentation model; producing GT data is time-consuming and expensive, and its quality is difficult to guarantee.
Disclosure of Invention
In view of the above problems, embodiments of the present application provide an image target detection capability training method that can achieve a strong target detection capability with only a small number of ground-truth samples plus input samples carrying nothing more than simple image-level labels. As the input samples keep accumulating on their own, their number grows, the covered scenes become richer, and the model's self-labeling capability strengthens, so the model's target detection capability keeps reinforcing itself, thereby realizing "online learning".
An embodiment of a first aspect of the present application provides an image target detection capability training method, which includes:
training a first image with a first identifier so that a first detection model obtains first detection capability and model parameters forming the first detection capability;
analyzing a second image with a second identifier by using the first detection model to acquire first information, wherein the first information at least comprises a plurality of targets and the categories of the targets;
the second detection model determines a positive example and a negative example of the target of each category in the plurality of targets according to the first information; and
the second detection model obtains the model parameters of the first detection model to cause the second detection model to inherit the first detection capability of the first detection model, and trains the positive examples and the negative examples of the targets of each category of the plurality of targets to cause the second detection model to obtain a second detection capability.
With the image target detection capability training method of the embodiments of the present application, the second detection model may inherit the model parameters of the first detection model and thereby obtain its first detection capability. Meanwhile, the second detection model may determine, from the first information obtained by the first detection model, positive and negative examples of the targets of each category, and train on those examples to obtain a second detection capability. This improves the detection capability of the second detection model, so a strong target detection capability can be obtained using only a small number of first images carrying the first identifier (i.e., a small number of ground-truth samples).
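As a rough illustration of the positive/negative split above, the sketch below groups the first model's pseudo-detections per category by confidence; the dict layout and the two thresholds are assumptions, since the text does not fix how positive and negative examples are determined:

```python
def split_pos_neg(detections, pos_thresh=0.7, neg_thresh=0.3):
    """Group pseudo-detections into positive and negative examples per class.

    `detections` is a list of dicts with at least a class label and a
    confidence score (the "first information" produced by the first model).
    High-confidence detections become positives for their class; very
    low-confidence ones become negatives; mid-range detections are dropped
    as ambiguous. Thresholds here are illustrative, not from the patent.
    """
    pos, neg = {}, {}
    for det in detections:
        cls, conf = det["class"], det["confidence"]
        if conf >= pos_thresh:
            pos.setdefault(cls, []).append(det)
        elif conf <= neg_thresh:
            neg.setdefault(cls, []).append(det)
    return pos, neg
```

An implementation of the patent's method might also use spatial cues rather than confidence alone; this only shows the per-category grouping.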
Further, after training the positive examples and the negative examples of each class of target in the targets to make the second detection model obtain a second detection capability, the method further comprises:
analyzing a third image, a fourth image and a test image by using the second detection model to respectively obtain a first detection result, a second detection result and a third detection result, wherein the first detection result comprises the third image and a fifth image, the second detection result comprises the fourth image and a sixth image, the third image is an unidentified first image, the fourth image is an unidentified second image, the fifth image is a first image with a third identification, and the sixth image is a second image with a fourth identification; and
when the accuracy of the third detection result meets a preset condition, or the number of training rounds of the second detection model is smaller than a first preset threshold, feeding the first detection result and the second detection result into the first detection model for cyclic training, so as to improve the first detection capability of the first detection model and, in turn, the second detection capability of the second detection model.
Because the second detection model inherits the detection capability of the first detection model, any improvement to the first detection capability improves the second detection capability accordingly. As the second detection model keeps improving, the accuracy of each round's first and second detection results also rises; using these more accurate results as a new training set for the first detection model again raises the first detection capability, which in turn raises the second detection capability. Therefore, only a small number of first and second images are needed: by cycling through the first images, second images, first detection results and second detection results, the detection capability and accuracy of the second detection model can be improved continuously without a large number of ground-truth samples, greatly reducing the cost of obtaining them (i.e., the cost of obtaining first images with the first identifier), while the second detection model gains a self-learning capability and keeps improving its second detection capability. The second image data set itself is unchanged throughout this process, but each time the algorithm is retrained it acts as a new data set: during use the algorithm can keep taking in new second images and, at a certain stage, automatically annotate their image-level labels from its own detection results, so the model keeps feeding back into itself and self-reinforcing, acquiring the ability to self-learn.
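The cycle described above can be sketched as a loop. All function names (`train`, `detect`, `accuracy`) and the dict-based models below are placeholders for the patent's first and second detection models, and the stopping rule is a simplified reading of the preset conditions:

```python
def self_training_cycle(first_model, second_model, third_images, fourth_images,
                        test_images, train, detect, accuracy,
                        max_rounds=10, gain_thresh=0.01):
    """Repeatedly relabel unlabeled images with the second model and feed
    the results back into the first model while accuracy keeps improving.
    `train`/`detect`/`accuracy` are caller-supplied stand-ins."""
    prev_acc = None
    for _ in range(max_rounds):
        # second model pseudo-labels the unidentified first/second images
        first_result = [detect(second_model, img) for img in third_images]
        second_result = [detect(second_model, img) for img in fourth_images]
        acc = accuracy(second_model, test_images)
        if prev_acc is not None and acc - prev_acc <= gain_thresh:
            break  # gain fell below the second preset threshold
        prev_acc = acc
        # pseudo-labeled results become the new training set for model one
        train(first_model, first_result + second_result)
        # second model inherits the improved parameters
        second_model.update(first_model)
    return second_model
```

Here `second_model.update(first_model)` stands in for the parameter inheritance; in a real system this would copy network weights.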
Further, after the first detection model and the second detection model complete a specified round of training, a test is performed on a third image, a fourth image and a test image; the number of tests is n, where n is a positive integer. When n is 1, the method further includes:
detecting the test image by adopting the first detection model to obtain a fourth detection result;
the accuracy of the third detection result meeting the preset condition includes:
the difference between the accuracy of the third detection result and the accuracy of the fourth detection result being greater than a second preset threshold.
By constraining the cycling condition in this way, the detection capability and accuracy of the second detection model can be raised as actually needed, and can keep improving while using fewer ground-truth samples, which effectively reduces the cost of ground-truth samples and realizes continuous self-learning.
Further, after the first detection model and the second detection model complete a specified round of training, a test is performed on a third image, a fourth image and a test image; the number of tests is n, where n is a positive integer. When n ≥ 2, the method further includes:
training the first detection model on the first detection result and the second detection result to improve its first detection capability, thereby improving the second detection capability of the second detection model;
detecting the test image by using the second detection model to obtain an nth third detection result;
the accuracy of the third detection result meeting the preset condition includes:
the difference between the accuracy of the n-th third detection result and the accuracy of the (n-1)-th third detection result being greater than a second preset threshold.
By constraining the cycling condition in this way, the detection capability and accuracy of the second detection model can be raised as actually needed, and can keep improving while using fewer ground-truth samples, which effectively reduces the cost of ground-truth samples and realizes continuous self-learning.
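The two accuracy conditions (n = 1 compared against the first model's fourth detection result, n ≥ 2 compared against the previous round's third detection result) can be captured in one small check; the threshold values used in testing it are illustrative only:

```python
def gain_over_baseline(acc_history, baseline_acc, gain_thresh):
    """Check the accuracy condition for the n-th test round.

    For the first test (n == 1) the second model's accuracy is compared
    against the first model's accuracy on the same test images (the
    "fourth detection result"); from n >= 2 onward, against the previous
    round's accuracy. Returns True when the gain still exceeds the
    second preset threshold, i.e. another cycle is worthwhile.
    """
    if len(acc_history) == 1:                       # n == 1
        return acc_history[0] - baseline_acc > gain_thresh
    return acc_history[-1] - acc_history[-2] > gain_thresh  # n >= 2
```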
Further, before the training of the first image with the first identifier, the method further includes:
and performing illumination distortion, geometric distortion and image shielding on the first image by adopting a data enhancement algorithm in a target detection algorithm. This may improve scene generalization in the first image.
Further, the first information further includes orientation information of each target in the second image and a confidence for each target, the orientation information of a target including the target's coordinates and a vector. In this way, not only the position of each target but also its size can be obtained, so that targets can be better identified and the obstacle-avoidance function better realized.
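One hypothetical encoding of such a first-information entry, with the coordinate read as an anchor corner and the vector as a width/height offset (the text does not specify the exact convention):

```python
from dataclasses import dataclass

@dataclass
class TargetInfo:
    """One entry of the "first information": a detected target's category,
    confidence, and orientation information given as an anchor coordinate
    plus a size vector. The exact encoding is an assumption; the text only
    says the orientation information includes coordinates and a vector."""
    cls: str
    confidence: float
    x: float   # anchor coordinate (taken here as the top-left corner)
    y: float
    dx: float  # size vector: width and height measured from the anchor
    dy: float

    def box(self):
        """Corner form (x1, y1, x2, y2) derived from coordinate + vector."""
        return (self.x, self.y, self.x + self.dx, self.y + self.dy)

    def area(self):
        """Target size implied by the vector."""
        return abs(self.dx * self.dy)
```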
An embodiment of a second aspect of the present application provides an image target detection capability training apparatus, which includes:
the first detection unit is used for training a first image with a first identifier so as to enable a first detection model to obtain first detection capability and form model parameters of the first detection capability; and
the second detection unit is used for analyzing a second image with a second identifier by adopting a first detection model so as to obtain first information, wherein the first information at least comprises a plurality of targets and the categories of the targets;
the second detection unit is further used for determining a positive example and a negative example of the target of each category in the plurality of targets according to the first information by the second detection model;
the second detection unit is further configured to obtain the model parameters of the first detection model by the second detection model, so that the second detection model inherits the first detection capability of the first detection model, and train the positive examples and the negative examples of the targets of each category of the multiple targets, so that the second detection model obtains a second detection capability.
With the image target detection capability training apparatus of the embodiments of the present application, the second detection model may inherit the model parameters of the first detection model and thereby obtain its first detection capability. Meanwhile, the second detection model may determine, from the first information obtained by the first detection model, positive and negative examples of the targets of each category, and train on those examples to obtain a second detection capability. This improves the detection capability of the second detection model, so a strong target detection capability can be obtained using only a small number of first images carrying the first identifier (i.e., a small number of ground-truth samples).
The second detection unit is further configured to analyze a third image, a fourth image and the test image by using the second detection model to obtain a first detection result, a second detection result and a third detection result, respectively, where the first detection result includes the third image and a fifth image, the second detection result includes the fourth image and a sixth image, the third image is an unidentified first image, the fourth image is an unidentified second image, the fifth image is a first image with a third identifier, and the sixth image is a second image with a fourth identifier;
the image target detection capability training device further comprises an analysis and judgment unit, wherein the analysis and judgment unit is used for sending the first detection result and the second detection result into the first detection model for cyclic training when the accuracy of the third detection result meets a preset condition or the training frequency of the second detection model is smaller than a first preset threshold value, so that the first detection capability of the first detection model is improved, and the second detection capability of the second detection model is improved.
Because the second detection unit inherits the detection capability of the first detection model, any improvement to the first detection capability improves the second detection capability of the second detection unit accordingly. As the detection capability of the second detection unit keeps improving, the accuracy of each round's first and second detection results also rises; using these more accurate results as a new training set for the first detection model again raises its first detection capability, and in turn the second detection capability of the second detection unit. Therefore, only a small number of first and second images are needed: by cycling through the first images, second images, first detection results and second detection results, the detection capability and accuracy of the second detection unit can be improved continuously without a large number of ground-truth samples, greatly reducing the cost of obtaining them (i.e., the cost of obtaining first images with the first identifier), while the second detection model gains a self-learning capability and keeps improving the second detection capability of the second detection unit.
Further, after the first detection model and the second detection model complete a specified round of training, a test is performed once on a first image without any identifier, a second image without any identifier and the test image; the number of tests is n, where n is a positive integer. When n is 1,
the first detection unit is further configured to detect the test image by using the first detection model to obtain a fourth detection result;
the analysis and judgment unit is further configured to judge whether the difference between the accuracy of the third detection result and the accuracy of the fourth detection result is greater than a second preset threshold, or, when the number of training rounds of the second detection model is smaller than a first preset threshold, to feed the first detection result and the second detection result into the first detection model for further training, so as to improve the first detection capability of the first detection model and thus the second detection capability of the second detection model.
By constraining the cycling condition in this way, the detection capability and accuracy of the second detection unit can be raised as actually needed, and can keep improving while using fewer ground-truth samples, which effectively reduces the cost of ground-truth samples and realizes continuous self-learning.
Further, after the first detection model and the second detection model complete the specified number of training rounds, a test is performed once on a first image without any identifier, a second image without any identifier and the test image; the number of tests is n, where n is a positive integer. When n ≥ 2,
the first detection unit is further configured to train the first detection model on the first detection result and the second detection result to improve its first detection capability, thereby improving the second detection capability of the second detection model;
the second detection unit is further configured to detect the test image using the second detection model to obtain the n-th third detection result;
the analysis and judgment unit is further configured to determine whether the difference between the accuracy of the third detection result at the nth time and the accuracy of the third detection result at the (n-1) th time is greater than a second preset threshold.
By constraining the cycling condition in this way, the detection capability and accuracy of the second detection unit can be raised as actually needed, and can keep improving while using fewer ground-truth samples, which effectively reduces the cost of ground-truth samples and realizes continuous self-learning.
Further, the first detection unit is further configured to apply illumination distortion, geometric distortion and image occlusion to the first image using a data enhancement algorithm from a target detection algorithm. This can improve scene generalization of the first image.
Further, the first information further includes orientation information of each target in the second image and a confidence for each target, the orientation information of a target including the target's coordinates and a vector. In this way, not only the position of each target but also its size can be obtained, so that targets can be better identified and the obstacle-avoidance function better realized.
In a third aspect, the present application provides a computer-readable storage medium, which stores computer-executable program code for causing a computer to execute the method of the embodiments of the present application.
In a fourth aspect of the present application, an electronic device is provided, including a processor and a memory, the memory storing program code executable by the processor; when the program code is called and executed by the processor, the electronic device performs the method of the embodiments of the present application.
The image target detection capability training method of the present application allows the second detection model to inherit the model parameters of the first detection model and thereby obtain its first detection capability. Meanwhile, the second detection model may determine, from the first information obtained by the first detection model, positive and negative examples of the targets of each category, and train on those examples to obtain a second detection capability. This improves the detection capability of the second detection model, so a strong target detection capability can be obtained using only a small number of first images carrying the first identifier (i.e., a small number of ground-truth samples).
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of an image target detection capability training method according to an embodiment of the present application.
Fig. 2 is a schematic flowchart of an image target detection capability training method according to another embodiment of the present application.
Fig. 3 is a flowchart illustrating an image target detection capability training method according to another embodiment of the present application.
Fig. 4 is a flowchart illustrating an image target detection capability training method according to another embodiment of the present application.
Fig. 5 is a flowchart illustrating an image target detection capability training method according to yet another embodiment of the present application.
Fig. 6 is a block diagram illustrating a configuration of an image target detection capability training apparatus according to an embodiment of the present application.
Fig. 7 is a block diagram of an image target detection capability training apparatus according to still another embodiment of the present application.
Fig. 8 is a circuit block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings.
It should be noted that, for convenience of description, like reference numerals denote like parts in the embodiments of the present application, and a detailed description of the like parts is omitted in different embodiments for the sake of brevity.
Current target detection analysis models applied to scene images, such as strongly supervised target detection models, need to rely on a large amount of manually labeled ground-truth sample data (GT data) to train a target detection/segmentation model; labeling such data takes a long time, costs a lot, and is of hard-to-guarantee quality. In addition, strongly supervised learning algorithms generalize poorly to scenes they have not learned, making universality and applicability hard to achieve. As a result, a traditional strongly supervised learning algorithm can only be optimized up front: once the algorithm is ported and deployed, the strongly supervised target detection model cannot improve itself, and online learning cannot be realized.
The detection model obtained by the image target detection capability training method of the present application can be applied to detecting and segmenting targets in a scene image, such as vehicles, people and obstacles, and to identifying the targets' orientation information and categories. The image target detection capability training method can also be applied to fields such as face recognition and medical detection.
Referring to fig. 1, an embodiment of the present application provides a method for training image target detection capability, which includes:
s101, training a first image with a first identifier to enable a first detection model to obtain first detection capability and form model parameters of the first detection capability;
in some embodiments, prior to training the first image with the first identification, the method further comprises: the method comprises the steps of obtaining a first image and carrying out first identification on the first image.
Optionally, scenes for the raw data are selected according to the application scene, and first images of those scenes are acquired. The raw-data scenes may be selected according to the target categories likely to appear where the method of the present application is applied: for example, scenes containing all the target categories, or scenes containing the main ones, such as the targets that must be accurately identified during automatic driving, including people, vehicles, road signs and trees. In some embodiments, the first image may be acquired by an image capture device such as a camera, video camera or mobile phone. In other embodiments, the first image may be obtained from a public image database, a network database or the like.
Further, the first image is manually annotated so that it carries the first identifier. In other words, the image sample (i.e., the first image) is manually annotated to obtain ground-truth sample data (i.e., the first image with the first identifier). The first identifier includes all targets in the first image, their category information, their orientation information, and so on. Specifically, the position information of a target may be labeled in the form of a target candidate box: each target in the first image is framed by a candidate box, and the position of the box is the target's position information.
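A ground-truth sample assembled this way might be represented as below; the dict layout and field names are illustrative, not a format defined by the patent:

```python
def make_gt_sample(image_path, boxes):
    """Assemble one ground-truth (GT) sample in the shape described above:
    every target framed by a candidate box together with its category.
    `boxes` is a list of {"class": ..., "bbox": (x1, y1, x2, y2)} dicts.
    """
    for b in boxes:
        x1, y1, x2, y2 = b["bbox"]
        if not (x2 > x1 and y2 > y1):
            raise ValueError(f"degenerate candidate box in {image_path}: {b}")
    return {
        "image": image_path,
        "targets": [{"class": b["class"], "bbox": tuple(b["bbox"])}
                    for b in boxes],
    }
```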
In some embodiments, before training the first image with the first identifier, the method further includes: applying effective data enhancement operations such as illumination distortion, geometric distortion and image occlusion to the first image using a data enhancement algorithm from a target detection algorithm (OD algorithm), so as to improve scene generalization of the first image.
Optionally, training the first image with the first identifier so that the first detection model obtains the first detection capability and forms the model parameters of that capability may proceed as follows:
the manually annotated first image (carrying the first identifier) is fed into the first detection model for training and learning, so that the model obtains the first detection capability for targets and forms the model parameters embodying that capability.
Specifically, a strongly supervised target detection model (SSLOD, e.g., one based on the Mask R-CNN algorithm) is selected, and the first images carrying the first identifier (the original ground-truth samples) are fed into the first detection model (SSLOD model) for training. During training, an attention mechanism (e.g., SENet, a squeeze-and-excitation attention network) adds high weight to key channels or regions of the feature map (i.e., of the first image) and low weight to irrelevant or non-key ones, making the training of the first detection model more targeted, with clear benefits for detecting small and medium targets (targets of smaller volume or size). A spatial pyramid pooling structure (SPP-Net) can also be adopted for multi-scale feature fusion, so that input targets need no normalization scaling: targets of different sizes can enter the first detection model and still yield fixed-size outputs, further improving the OD training result. Moreover, since the first detection model has ground-truth sample data (the first image with the first identifier), overlapping boxes in target detection can be filtered with non-maximum suppression (NMS) against that data. Through training and learning on the first images with the first identifier, the first detection model can thus obtain a good target detection capability (the first detection capability), detect the defined categories of targets to be detected in an image (e.g., K categories, K being a positive integer), and output each target's orientation information, category information and confidence.
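The NMS filtering step mentioned above is standard; a minimal greedy version over (x1, y1, x2, y2) boxes looks like this (the 0.5 overlap threshold is a common default, not a value from the patent):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(detections, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-confidence box,
    drop every remaining box overlapping it by more than iou_thresh,
    then repeat on what is left."""
    rest = sorted(detections, key=lambda d: d["confidence"], reverse=True)
    kept = []
    while rest:
        best = rest.pop(0)
        kept.append(best)
        rest = [d for d in rest if iou(best["bbox"], d["bbox"]) <= iou_thresh]
    return kept
```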
Alternatively, the first detection model may be, but is not limited to, a strongly supervised object detection model (SSLOD model). There are a plurality of first images; the number may be, but is not limited to, 30,000, 50,000, 100,000, 200,000, 300,000, 500,000, or the like. More first images give a better result, but also a higher cost; the specific number may be determined according to the application scene and the first detection capability the initial first detection model is required to reach.
Before an existing strongly supervised object detection model can be put into practical application, it must meet preset target detection capability and accuracy requirements, which demands extensive training and learning on a large number of manually labeled truth samples. In this embodiment, the first detection model only needs to obtain a certain first detection capability from the first images, so the number of first images can be far smaller than the number of truth samples required to build a conventional strongly supervised object detection model — for example, 1%, 3%, 5%, 8% or 10% of the number of samples required by the existing model.
S102, analyzing a second image with a second identifier by using the first detection model to acquire first information, wherein the first information at least comprises a plurality of targets and the categories of the targets;
in some embodiments, before the second image having the second identifier is analyzed using the first detection model, the method further comprises: acquiring a second image and applying a second identifier to the second image.
Specifically, according to a typical application scene likely to occur in practice, a batch of new raw data (i.e., second images) without any manual annotation is randomly sampled, and a simple image-level annotation is applied to each second image; in other words, only the categories of the targets contained in the second image are recorded, without labeling the position of any instance. For example, if a second image contains several cars (car) and people (person), its labels are simply car and person; there is no need to record how many cars or people there are, or in which region of the image they appear. Optionally, the second identifier comprises the categories of all targets in the second image; in other words, the second identifier is a simple image-level tag.
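Such an image-level annotation can be sketched as nothing more than a set of category names per image; the dictionary and function below are a hypothetical illustration (the file names and field layout are assumptions, not from this application):

```python
# Hypothetical image-level (second) identifiers: only which categories
# appear in each image is recorded — no boxes, positions, or counts.
weak_labels = {
    "frame_0001.jpg": {"car", "person"},
    "frame_0002.jpg": {"car"},
}

def contains(image_name, category):
    """True when the image-level tag lists the given category."""
    return category in weak_labels.get(image_name, set())
```

This is why the second identifier is far cheaper to produce than the fully boxed first identifier.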
In some embodiments, the first image is different from the second image; in other words, the first image and the second image are images of different scenes. In other embodiments, the second images are a subset of the first images; in other words, a second image may be selected from among the first images.
Optionally, analyzing the second image with the second identifier using the first detection model to obtain the first information may comprise: detecting and analyzing each second image carrying the second identifier with the first detection model, which has the first detection capability, so as to obtain the first information of each second image.
Optionally, the first information further includes orientation information of each target in the second image and a confidence of each target, where the orientation information of a target includes the coordinates or vectors of the target.
Specifically, the orientation information of a target may be labeled in the form of a target candidate box; in other words, a candidate box is used to frame each target in the second image. Optionally, the candidate box may be a rectangular box, and the coordinates or vectors of its four corners constitute the orientation information of the target. The exact position of the target can be obtained from the coordinates or vectors of the four corners of the candidate box, and the size of the target can be calculated from them as well, so that the target can be better identified and an obstacle-avoidance function better realized.
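The size calculation from the four corners is straightforward for an axis-aligned rectangular box; a minimal sketch (corner ordering assumed, not specified by the application):

```python
def box_size(corners):
    """Width, height and area of an axis-aligned candidate box
    given its four corner coordinates as [(x, y), ...]."""
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    return w, h, w * h
```

The area in turn gives a rough scale cue: small-area boxes correspond to the "small and medium targets" the attention mechanism is meant to help with.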
S103, determining a positive example and a negative example of each category of targets in the plurality of targets according to the first information by using a second detection model; and
alternatively, the second detection model may be, but is not limited to, a multiple instance learning module (MIL module) that includes a multiple instance classifier (MIL classifier).
Optionally, the MIL module of the second detection model receives the first information obtained by the first detection model from the second image and determines a positive example and a negative example of each category among the plurality of targets according to the first information. For example, for the category "person", every region detected as "person" is a positive bag (positive box) of that category, and each pixel in a positive bag is a positive example of "person"; regions of categories other than "person" form the negative bag (negative box) of "person", and each pixel in the negative bag is a negative example of "person". In this way, positive and negative examples of each category of targets are obtained; when the second image contains targets of K categories, all targets of each image can be converted into binary data sets, yielding K binary data sets.
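The conversion into K binary data sets can be sketched as follows; the detection records here are hypothetical stand-ins for the first information, and the field name `category` is an assumption:

```python
def to_binary_datasets(detections, categories):
    """Split multi-class detections into one binary (positive/negative)
    dataset per category, as in multiple-instance learning: for each
    category, its own detections are positives and all others negatives."""
    datasets = {}
    for cat in categories:
        pos = [d for d in detections if d["category"] == cat]
        neg = [d for d in detections if d["category"] != cat]
        datasets[cat] = {"positive": pos, "negative": neg}
    return datasets
```

With K categories this yields K independent two-class problems, one per multiple-instance classifier head.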
S104, the second detection model obtains the model parameters of the first detection model, so that the second detection model inherits the first detection capability of the first detection model, and trains the positive examples and the negative examples of the targets of each category in the multiple targets, so that the second detection model obtains a second detection capability.
Optionally, transfer learning is used to migrate the model parameters of the first detection model into the MIL module of the second detection model, so that the second detection model obtains the model parameters of the first detection model and thereby inherits its first detection capability. Meanwhile, the multiple instance classifier of the second detection model is trained on the positive examples and negative examples of each category of targets, so that the second detection model acquires the ability to distinguish the positive and negative examples of each category, and hence a target detection capability of its own, namely the second detection capability.
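The parameter inheritance amounts to copying the shared (e.g., backbone) weights while leaving the MIL module's new heads untouched. A framework-agnostic sketch over plain name-to-value dictionaries (the parameter names are illustrative assumptions):

```python
def inherit_parameters(source, target):
    """Transfer-style inheritance: copy every parameter whose name both
    models share, so the MIL module starts from the SSLOD weights while
    keeping its own newly initialised classifier parameters."""
    shared = set(source) & set(target)
    for name in shared:
        target[name] = source[name]
    return shared
```

In a real framework this corresponds to loading a state dict non-strictly, so that missing or extra keys in the new model are tolerated.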
To obtain a better training effect, the MIL_classifier model of the second detection model still adopts the attention mechanism and multi-scale feature fusion operations of the SSLOD module (the first detection model), together with a box-overlap suppression algorithm similar to non-maximum suppression. For example, in the definition of the loss function (loss_function), a penalty term is added for any detection box that has a large IoU (intersection over union) with the highest-scoring box but a smaller confidence C, so as to eliminate the influence of overlapping boxes. The MIL_classifier model thereby finally obtains the ability to perform K-category pixel-wise binary classification on an image, i.e., to decide whether each pixel in the image belongs to one of the K categories of targets. In other words, it obtains the ability to classify positive and negative examples among K categories of targets.
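One way such an overlap penalty could be written down, as a rough sketch only (the penalty shape and weighting are assumptions, not the application's exact loss): for every pair of boxes whose IoU exceeds a threshold, the lower-confidence box contributes a penalty proportional to its confidence deficit.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def overlap_penalty(boxes, confidences, iou_thresh=0.5, weight=1.0):
    """Loss term penalising each box that heavily overlaps (high IoU) a
    higher-confidence box, so duplicate detections of one target are
    discouraged during training rather than only filtered afterwards."""
    penalty = 0.0
    for i, (box_i, c_i) in enumerate(zip(boxes, confidences)):
        for j, (box_j, c_j) in enumerate(zip(boxes, confidences)):
            if i != j and c_i < c_j and box_iou(box_i, box_j) > iou_thresh:
                penalty += weight * (1.0 - c_i)
    return penalty
```

Unlike post-hoc NMS, this differentiable-style term acts inside the loss, which is the design choice the paragraph above describes.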
With the image target detection capability training method of this embodiment of the application, the second detection model can inherit the model parameters of the first detection model and thus obtain the first detection capability of the first detection model. Meanwhile, the second detection model can determine, from the first information produced by the first detection model, a positive example and a negative example of each category of targets and train on them, so that the second detection model obtains the second detection capability. The detection capability of the second detection model is thereby improved, and a stronger target detection capability can be obtained using only a small number of first images with the first identifier (i.e., a small number of truth samples).
Referring to fig. 2, an embodiment of the present application further provides a method for training image target detection capability, which includes:
S201, training a first image with a first identifier so that a first detection model obtains a first detection capability and forms model parameters of the first detection capability;
S202, analyzing a second image with a second identifier by using the first detection model to acquire first information, wherein the first information at least comprises a plurality of targets and the categories of the targets;
S203, determining a positive example and a negative example of the target of each category in the plurality of targets according to the first information by using a second detection model;
S204, the second detection model obtains the model parameters of the first detection model, so that the second detection model inherits the first detection capability of the first detection model, and trains the positive examples and the negative examples of the targets of each category in the multiple targets, so that the second detection model obtains a second detection capability;
for detailed descriptions of steps S201 to S204, please refer to the descriptions of the corresponding parts of the above embodiments, which are not repeated herein.
S205, analyzing a third image, a fourth image and a test image by using the second detection model to respectively obtain a first detection result, a second detection result and a third detection result, wherein the first detection result comprises the third image and a fifth image, the second detection result comprises the fourth image and a sixth image, the third image is an unidentified first image, the fourth image is an unidentified second image, the fifth image is a first image with a third identifier, and the sixth image is a second image with a fourth identifier; and
optionally, the first images without any manual annotation, the second images without any manual annotation and the test images without any manual annotation (test data) are sent together to the MIL module of the second detection model for detection and analysis, so as to obtain a first detection result for the first images, a second detection result for the second images and a third detection result for the test images. The first detection result comprises the unidentified first image and the first image with the third identifier, and the second detection result comprises the unidentified second image and the second image with the fourth identifier. Specifically, after the MIL module of the second detection model performs pixel-wise binary classification (i.e., positive/negative example classification) over the K categories on the first, second and test images, instance segmentation of each category of target in these images can be achieved through some logical post-processing: a given target is separated in saliency from the background and from other targets, and the minimum enclosing rectangle is then drawn around each segmented target; this rectangle is the OD detection result of the target. Optionally, the second detection model further includes a pseudo-label module (pseudo_GT module), which converts the OD detection results of the first, second and test images into a data format suitable for training and testing (e.g., json or xml format), thereby producing the first, second and third detection results respectively.
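A pseudo-ground-truth record in JSON form could look like the sketch below; the field names (`image`, `objects`, `bbox`, `confidence`) are illustrative assumptions rather than the application's actual schema:

```python
import json

def to_pseudo_label(image_name, detections):
    """Serialise one image's OD results into a training-ready JSON
    record, as the pseudo_GT module would before feeding the results
    back to the first detection model."""
    record = {
        "image": image_name,
        "objects": [
            {"category": d["category"],
             "bbox": list(d["bbox"]),
             "confidence": round(d["confidence"], 4)}
            for d in detections
        ],
    }
    return json.dumps(record)
```

The same record shape serves both the training set (first and second detection results) and the held-out test set (third detection result).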
Optionally, the third identifier includes the orientation information, category and confidence of each target in the first image as detected by the second detection model. The fourth identifier includes the orientation information, category and confidence of each target in the second image as detected by the second detection model. The third detection result includes the orientation information, categories and confidences of all targets in the test image.
It should be noted that the test image is only used for testing and is not used for model training in the whole learning and training process.
S206, when the accuracy of the third detection result meets a preset condition or the number of times of training of the second detection model is smaller than a first preset threshold, the first detection result and the second detection result are sent to the first detection model for cyclic training so as to improve the first detection capability of the first detection model and further improve the second detection capability of the second detection model.
Optionally, the first preset threshold may be, but not limited to, 2 times, 3 times, 4 times, 5 times, 6 times, and the like, and may be specifically determined according to the accuracy and economic cost of the second detection capability required to be obtained by the second detection model, which is not specifically limited in this application.
Optionally, the pseudo-label module of the second detection model determines the accuracy of the third detection result on the test image. When the accuracy of the third detection result meets the preset condition of being smaller than a preset value — which may be, but is not limited to, 90% or 95% — or when the number of training rounds of the second detection model is smaller than the first preset threshold, it is considered that the detection accuracy of the model can still be improved. The first detection result and the second detection result are then sent to the first detection model as a training set, and training and learning continue, improving the first detection capability of the first detection model — in other words, the accuracy of its target detection — and thereby improving the second detection capability of the second detection model, i.e., the accuracy of its target detection. Because the second detection model inherits the detection capability of the first detection model, any improvement of the first detection capability improves the second detection capability accordingly. As the detection capability of the second detection model keeps improving, the accuracy of each new first and second detection result also keeps improving; using these more accurate results as a new training set for the first detection model improves its first detection capability again, which in turn improves the second detection capability of the second detection model.
Therefore, with only a small number of first images and second images, cyclically executing steps S201 to S206 on the first images, the second images, the first detection results and the second detection results continuously improves the detection capability and accuracy of the second detection model. No large set of truth samples is needed, which greatly reduces the cost of obtaining truth samples (i.e., the cost of obtaining first images with the first identifier), while the second detection model acquires a self-learning ability and its second detection capability keeps improving.
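The S201–S206 cycle can be sketched as a short loop over assumed callables (the function names, the accuracy-based stopping condition and the round limit are illustrative assumptions standing in for the preset condition and the first preset threshold):

```python
def self_training_loop(train_fn, detect_fn, test_accuracy_fn,
                       truth_set, weak_set, target_acc=0.95, max_rounds=3):
    """Cyclic self-training: train on the current training set, generate
    pseudo-labelled detection results for both image sets, and feed them
    back as the next training set until the test accuracy meets the
    preset condition or the round limit is reached."""
    train_set = list(truth_set)
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        train_fn(train_set)                                  # S201/S204: train
        pseudo = detect_fn(truth_set) + detect_fn(weak_set)  # S205: pseudo labels
        if test_accuracy_fn() >= target_acc:                 # S206: stop check
            break
        train_set = pseudo                                   # cycle back
    return rounds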
In addition, when the user continuously collects images of new scenes and needs to upgrade the version of the second detection model, the new scene images can be used as second images and steps S202 to S206 executed cyclically once again.
For the features of this embodiment that are the same as those of the above embodiment, please refer to the description of the corresponding portions of the above embodiment, which is not repeated herein.
Referring to fig. 3, an embodiment of the present application further provides an image target detection capability training method, which includes:
S301, training a first image with a first identifier so that a first detection model obtains a first detection capability and forms model parameters of the first detection capability;
S302, analyzing a second image with a second identifier by using the first detection model to acquire first information, wherein the first information at least comprises a plurality of targets and the categories of the targets;
S303, determining a positive example and a negative example of each category of targets in the plurality of targets according to the first information by using a second detection model;
S304, the second detection model obtains the model parameters of the first detection model, so that the second detection model inherits the first detection capability of the first detection model, and trains the positive examples and the negative examples of each category of targets in the multiple targets, so that the second detection model obtains a second detection capability;
S305, analyzing the unidentified first image, the unidentified second image and the unidentified test image by using the second detection model to respectively obtain a first detection result, a second detection result and a third detection result, wherein the first detection result comprises the unidentified first image and the first image with the third identifier, and the second detection result comprises the unidentified second image and the second image with the fourth identifier;
for detailed descriptions of step S301 to step S305, please refer to the description of the corresponding parts of the above embodiments, which are not repeated herein.
S306, detecting the test image by adopting the first detection model to obtain a fourth detection result; and
optionally, the test image is sent to the first detection model and detection analysis is performed on it to obtain a fourth detection result of the test image. The fourth detection result comprises the category, orientation information and confidence of every target in the test image.
After the first detection model and the second detection model complete a designated round of training, the third image, the fourth image and the test image are tested once; the number of tests is n, where n is a positive integer. When n is 1, step S307 is executed.
S307, when the difference between the accuracy of the third detection result and the accuracy of the fourth detection result is larger than a second preset threshold or the number of times of training of the second detection model is smaller than a first preset threshold, sending the first detection result and the second detection result into the first detection model to continue training, so as to improve the first detection capability of the first detection model and improve the second detection capability of the second detection model.
Optionally, after the first detection model and the second detection model complete a training round and the first, second and third detection results are obtained for the unidentified first image, the unidentified second image and the unidentified test image respectively, the difference between the accuracy of the third detection result and the accuracy of the fourth detection result is compared with the second preset threshold. When that difference is greater than the second preset threshold, or the number of training rounds of the second detection model is smaller than the first preset threshold, the first detection result and the second detection result are sent to the first detection model for continued training, so as to improve the first detection capability of the first detection model and thereby the second detection capability of the second detection model.
Optionally, the accuracy of the third detection result may be obtained from the confidence of each target obtained by detecting the test image with the second detection model, and the accuracy of the fourth detection result from the confidence of each target obtained by detecting the test image with the first detection model.
Alternatively, the second preset threshold may be, but is not limited to, 3%, 5%, 8%, 10%, 15%, etc.
In a specific embodiment, the first preset threshold is 3 times, the second preset threshold is 5%, and when the difference between the accuracy of the third detection result and the accuracy of the fourth detection result is greater than 5% or n is less than 3, the first detection result and the second detection result are sent to the first detection model for continuous training, so as to improve the first detection capability of the first detection model, and thus improve the second detection capability of the second detection model.
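The stopping logic of this specific embodiment can be written as a one-line predicate; treating the accuracy gap as an absolute difference is an assumption (the application only says "difference"):

```python
def should_continue(acc_third, acc_fourth, n,
                    first_threshold=3, second_threshold=0.05):
    """Continue cyclic training while the gap between the second model's
    test accuracy (third result) and the first model's (fourth result)
    exceeds the second preset threshold, or while fewer than the first
    preset threshold rounds (n) have run."""
    return abs(acc_third - acc_fourth) > second_threshold or n < first_threshold
```

With the values above (threshold 3 rounds, gap 5%), training stops only once the two models agree closely and at least three rounds are done.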
For the features of this embodiment that are the same as those of the above embodiment, please refer to the description of the corresponding portions of the above embodiment, which is not repeated herein.
Referring to fig. 4, in some embodiments, after step S307, the method further includes: s308, training the first detection result and the second detection result to improve the first detection capability of the first detection model, so as to improve the second detection capability of the second detection model.
Specifically, a first detection result and a second detection result are sent to a first detection model, and the first detection result and the second detection result are trained to improve the first detection capability of the first detection model, so that the second detection capability of the second detection model is improved.
For the features of this embodiment that are the same as those of the above embodiment, please refer to the description of the corresponding portions of the above embodiment, which is not repeated herein.
Referring to fig. 5, an embodiment of the present application further provides a method for training image target detection capability, which includes:
S401, training a first image with a first identifier so that a first detection model obtains a first detection capability and forms model parameters of the first detection capability;
S402, analyzing a second image with a second identifier by using the first detection model to acquire first information, wherein the first information at least comprises a plurality of targets and the categories of the targets;
S403, determining a positive example and a negative example of the target of each category in the plurality of targets according to the first information by using a second detection model;
S404, the second detection model obtains the model parameters of the first detection model, so that the second detection model inherits the first detection capability of the first detection model, and trains the positive examples and the negative examples of the targets of each category in the multiple targets, so that the second detection model obtains a second detection capability;
S405, analyzing a third image, a fourth image and a test image by using the second detection model to obtain a first detection result, a second detection result and a third detection result respectively, wherein the first detection result comprises the unidentified first image and the first image with the third identifier, and the second detection result comprises the unidentified second image and the second image with the fourth identifier;
after the first detection model and the second detection model complete a designated round of training, the third image, the fourth image and the test image are tested once; the number of tests is n, where n is a positive integer. When n equals 1, steps S406, S407 and S409 are executed; when n is greater than or equal to 2, steps S408 and S409 are executed.
S406, detecting the test image by adopting the first detection model to obtain a fourth detection result;
optionally, the test image is sent to the first detection model and detection analysis is performed on it to obtain a fourth detection result of the test image. The fourth detection result comprises the category, orientation information and confidence of every target in the test image.
S407, when the difference between the accuracy of the third detection result and the accuracy of the fourth detection result is greater than a second preset threshold or the training frequency of the second detection model is less than a first preset threshold, sending the first detection result and the second detection result into the first detection model for continuous training so as to improve the first detection capability of the first detection model and thus improve the second detection capability of the second detection model; and
for detailed descriptions of steps S401 to S407, refer to the descriptions of corresponding parts of the above embodiments, which are not repeated herein. After step S407, step S409 is executed.
S408, detecting the test image by using the second detection model to obtain the n-th third detection result; when the difference between the accuracy of the n-th third detection result and that of the (n-1)-th third detection result is greater than the second preset threshold, or the number of training rounds of the second detection model is smaller than the first preset threshold, sending the first detection result and the second detection result into the first detection model for continued training, so as to improve the first detection capability of the first detection model and thereby the second detection capability of the second detection model;
optionally, after each training round ends, the test image is detected with the second detection model to obtain the n-th third detection result. When the difference between the accuracy of the n-th third detection result and that of the (n-1)-th is greater than the second preset threshold — for example, greater than 5% or 10% — or when the number of training rounds of the second detection model is smaller than the first preset threshold, the first detection result and the second detection result are sent to the first detection model for continued training, so as to improve the first detection capability of the first detection model and thereby the second detection capability of the second detection model. After step S408, step S409 is performed.
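For n ≥ 2, the convergence check compares successive test accuracies; a minimal sketch (the accuracy history list and the absolute-difference reading of "difference" are assumptions):

```python
def keep_training(history, n, first_threshold=3, second_threshold=0.05):
    """For n >= 2: continue when the n-th third detection result's
    accuracy differs from the (n-1)-th by more than the second preset
    threshold, or when fewer than the first preset threshold rounds
    have run; `history` holds the accuracies in test order."""
    if n >= 2 and abs(history[n - 1] - history[n - 2]) > second_threshold:
        return True
    return n < first_threshold
```

Training thus stops once the test accuracy plateaus between rounds and the minimum round count has been reached.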
S409, training the first detection result and the second detection result to improve the first detection capability of the first detection model, so as to improve the second detection capability of the second detection model.
For a detailed description of step S409, refer to the description of the corresponding parts of the above embodiments, which are not repeated herein.
For the features of this embodiment that are the same as those of the above embodiment, please refer to the description of the corresponding portions of the above embodiment, which is not repeated herein.
Referring to fig. 6, an embodiment of the present application further provides an apparatus 500 for training image target detection capability, which includes:
a first detection unit 510, configured to train a first image with a first identifier, so that a first detection model obtains a first detection capability and forms a model parameter of the first detection capability; and
a second detecting unit 530, configured to analyze the second image with the second identifier by using the first detection model to obtain first information, where the first information at least includes a plurality of targets and categories of the plurality of targets;
the second detecting unit 530 is further configured to determine, according to the first information, a positive example and a negative example of the target in each category of the plurality of targets by the second detection model;
the second detecting unit 530 is further configured to obtain the model parameters of the first detecting model by the second detecting model, so that the second detecting model inherits the first detecting capability of the first detecting model, and train the positive examples and the negative examples of the targets of each category of the multiple targets, so that the second detecting model obtains a second detecting capability.
Optionally, the first information further includes orientation information of each target in the second image and a confidence of each target, where the orientation information of a target includes the coordinates or vectors of the target.
For the features of this embodiment that are the same as those of the above embodiment, please refer to the description of the corresponding portions of the above embodiment, which is not repeated herein.
The image target detection capability training apparatus 500 of this embodiment of the application allows the second detection model to inherit the model parameters of the first detection model and thus obtain its first detection capability. Meanwhile, the second detection model can determine the positive and negative examples of each category of targets from the first information obtained by the first detection model and train on them, so that the second detection model obtains the second detection capability. The detection capability of the second detection model is thereby improved, and a stronger target detection capability can be obtained using only a small number of first images with the first identifier (i.e., a small number of truth samples).
In some embodiments, the first detection unit 510 is further configured to: apply illumination distortion, geometric distortion and image occlusion to the first image using a data augmentation algorithm from the target detection algorithm.
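Those three augmentations can be sketched in a few lines over a 2-D grayscale image represented as nested lists; these toy implementations are illustrative only and stand in for whatever augmentation pipeline the detection algorithm actually uses:

```python
def flip_horizontal(img):
    """Geometric distortion: mirror each row of a 2-D grayscale image."""
    return [row[::-1] for row in img]

def scale_brightness(img, factor):
    """Illumination distortion: scale pixel values, clipped to [0, 255]."""
    return [[min(255, int(p * factor)) for p in row] for row in img]

def occlude(img, x, y, w, h, fill=0):
    """Image occlusion: overwrite a w x h patch at (x, y) with `fill`."""
    out = [list(row) for row in img]
    for r in range(y, min(y + h, len(out))):
        for c in range(x, min(x + w, len(out[0]))):
            out[r][c] = fill
    return out
```

Each transform produces an extra labeled sample from an existing truth sample, which is how augmentation stretches a small set of first images.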
Referring to fig. 7, in some embodiments, the second detecting unit 530 is further configured to analyze a third image, a fourth image and a test image by using the second detection model to obtain a first detection result, a second detection result and a third detection result, respectively, where the first detection result includes the third image and a fifth image, the second detection result includes the fourth image and a sixth image, the third image is an unidentified first image, the fourth image is an unidentified second image, the fifth image is a first image with a third identifier, and the sixth image is a second image with a fourth identifier. The image target detection capability training device 500 further includes an analysis and judgment unit 550, configured to send the first detection result and the second detection result into the first detection model for cyclic training when the accuracy of the third detection result meets the preset condition or the number of training rounds of the second detection model is smaller than the first preset threshold, so as to improve the first detection capability of the first detection model and thereby the second detection capability of the second detection model.
For the features of this embodiment that are the same as those of the above embodiment, please refer to the description of the corresponding portions of the above embodiment, which is not repeated herein.
In some embodiments, after the first detection model and the second detection model complete a specified round of training, a test is performed on the third image, the fourth image and the test image; the number of tests is n, where n is a positive integer. When n is 1,
the first detection unit 510 is further configured to detect the test image with the first detection model to obtain a fourth detection result;
the analysis and judgment unit 550 is further configured to judge whether the difference between the accuracy of the third detection result and the accuracy of the fourth detection result is greater than a second preset threshold; or, when the number of training iterations of the second detection model is smaller than the first preset threshold, to feed the first detection result and the second detection result into the first detection model for continued training, so as to improve the first detection capability of the first detection model and thereby the second detection capability of the second detection model.
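For the n = 1 case, the preset condition can be sketched as a comparison of the two test accuracies. The threshold value and the use of a signed difference are assumptions; the embodiment only speaks of "the difference" exceeding the second preset threshold:

```python
SECOND_PRESET_THRESHOLD = 0.02  # hypothetical value; none is given in the text


def preset_condition_first_test(acc_third, acc_fourth):
    """n == 1: the condition holds when the second model's accuracy on the
    test image (third detection result) exceeds the first model's accuracy
    (fourth detection result) by more than the second preset threshold."""
    return acc_third - acc_fourth > SECOND_PRESET_THRESHOLD
```

Intuitively, the loop continues while the second model still measurably outperforms the first, since that gap is what the cycle training transfers back.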
For the features of this embodiment that are the same as those of the above embodiment, please refer to the description of the corresponding portions of the above embodiment, which is not repeated herein.
In some embodiments, after the first detection model and the second detection model complete a specified round of training, a test is performed on the third image, the fourth image and the test image; the number of tests is n, where n is a positive integer. When n is greater than or equal to 2,
the first detection unit 510 is further configured to train on the first detection result and the second detection result to improve the first detection capability of the first detection model, thereby improving the second detection capability of the second detection model;
and to detect the test image with the second detection model to obtain the n-th third detection result;
the analysis and judgment unit 550 is further configured to judge whether the difference between the accuracy of the n-th third detection result and the accuracy of the (n-1)-th third detection result is greater than the second preset threshold.
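For n greater than or equal to 2, the comparison shifts to consecutive test accuracies of the second detection model itself. The list-based bookkeeping below is illustrative; the text only prescribes the (n)-versus-(n-1) comparison:

```python
def preset_condition_later_tests(third_result_accuracies, n, threshold):
    """n >= 2: the condition holds when the accuracy of the n-th third
    detection result exceeds that of the (n-1)-th by more than the second
    preset threshold. Accuracies are listed in test order; n is 1-based,
    matching the text."""
    return third_result_accuracies[n - 1] - third_result_accuracies[n - 2] > threshold


history = [0.80, 0.84, 0.845]
# Between the 1st and 2nd test the model improved by 0.04.
improved = preset_condition_later_tests(history, 2, 0.02)
# Between the 2nd and 3rd test the gain (0.005) is below the threshold.
plateaued = preset_condition_later_tests(history, 3, 0.02)
```

Under this reading, the cycle training stops once the per-round accuracy gain falls below the second preset threshold, i.e. once the second model's accuracy plateaus.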
For the features of this embodiment that are the same as those of the above embodiments, reference is made to the description of the corresponding portions of the above embodiments, which are not repeated herein.
The embodiment of the present application further provides a computer-readable storage medium storing executable program code, where the program code is used for causing a computer to execute the image target detection capability training method according to the embodiments of the present application.
Referring to fig. 8, an embodiment of the present application further provides an electronic device 600, which includes a processor 610 and a memory 630. The memory 630 stores program code executable by the processor 610; when the program code is called and executed by the processor 610, the image target detection capability training method according to the embodiments of the present application is performed.
The memory 630, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the image target detection capability training method in the embodiments of the present application. By running the non-volatile software programs, instructions and modules stored in the memory 630, the processor 610 executes the various functional applications and data processing of the electronic device, thereby implementing the image target detection capability training method of the above method embodiments.
The memory 630 may include Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Any connection is also properly termed a computer-readable medium. For example, if software is transmitted from a website, server or other remote source using a coaxial cable, a fiber-optic cable, a twisted pair, a Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio and microwave, then the coaxial cable, fiber-optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio and microwave are included in the definition of the medium. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The electronic device 600 of the present application includes, but is not limited to, computers, notebook computers, tablet computers, mobile phones, cameras, smart bands, smart watches, smart glasses and other electronic devices with display screens.
Reference in this specification to "an embodiment" or "an implementation" means that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of such phrases in various places in the specification do not necessarily all refer to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is understood, explicitly and implicitly, by those skilled in the art that the embodiments described herein can be combined with other embodiments. Furthermore, the features, structures or characteristics described in the embodiments of the present application may be combined arbitrarily, provided there is no contradiction between them, to form further embodiments without departing from the spirit and scope of the present application.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the above preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present application without departing from the spirit and scope of those technical solutions.

Claims (14)

1. An image target detection capability training method is characterized by comprising the following steps:
training a first image with a first identifier so that a first detection model obtains a first detection capability and model parameters forming the first detection capability;
analyzing a second image with a second identifier by using the first detection model to acquire first information, wherein the first information at least comprises a plurality of targets and the categories of the targets;
the second detection model determines positive examples and negative examples of the targets of each category in the plurality of targets according to the first information; and
the second detection model obtains the model parameters of the first detection model to cause the second detection model to inherit the first detection capability of the first detection model, and trains the positive examples and the negative examples of the targets of each category of the plurality of targets to cause the second detection model to obtain a second detection capability.
2. The image target detection capability training method according to claim 1, wherein after training the positive examples and the negative examples of the targets of each category so that the second detection model obtains a second detection capability, the method further comprises:
analyzing a third image, a fourth image and a test image by using the second detection model to respectively obtain a first detection result, a second detection result and a third detection result, wherein the first detection result comprises the third image and a fifth image, the second detection result comprises the fourth image and a sixth image, the third image is an unidentified first image, the fourth image is an unidentified second image, the fifth image is a first image with a third identification, and the sixth image is a second image with a fourth identification; and
when the accuracy of the third detection result meets a preset condition or the number of training iterations of the second detection model is smaller than a first preset threshold, sending the first detection result and the second detection result into the first detection model for cyclic training, so as to improve the first detection capability of the first detection model and thereby the second detection capability of the second detection model.
3. The image target detection capability training method according to claim 2, wherein after the first detection model and the second detection model complete a specified round of training, a test is performed on a third image, a fourth image and a test image; the number of tests is n, where n is a positive integer; when n is 1, the method further comprises:
detecting the test image by adopting the first detection model to obtain a fourth detection result;
wherein the accuracy of the third detection result meeting the preset condition comprises:
the difference between the accuracy of the third detection result and the accuracy of the fourth detection result being greater than a second preset threshold.
4. The image target detection capability training method according to claim 2, wherein after the first detection model and the second detection model complete a specified round of training, a test is performed on a third image, a fourth image and a test image; the number of tests is n, where n is a positive integer; when n is greater than or equal to 2, the method further comprises:
training on the first detection result and the second detection result to improve the first detection capability of the first detection model, thereby improving the second detection capability of the second detection model;
detecting the test image by using the second detection model to obtain an nth third detection result;
wherein the accuracy of the third detection result meeting the preset condition comprises:
the difference between the accuracy of the n-th third detection result and the accuracy of the (n-1)-th third detection result being greater than a second preset threshold.
5. The method for training the image target detection capability of any one of claims 1-4, wherein before the training the first image with the first identifier, the method further comprises:
performing illumination distortion, geometric distortion and image occlusion on the first image by using a data enhancement algorithm of a target detection algorithm.
6. The method according to any one of claims 1-4, wherein the first information further includes orientation information of each of the objects in the second image and a confidence of each of the objects, and the orientation information of each of the objects includes coordinates and a vector of the object.
7. An image object detectability training apparatus, comprising:
the first detection unit is used for training a first image with a first identifier so as to enable a first detection model to obtain first detection capability and form model parameters of the first detection capability; and
the second detection unit is used for analyzing a second image with a second identifier by adopting a first detection model so as to obtain first information, wherein the first information at least comprises a plurality of targets and the categories of the targets;
the second detection unit is further used for determining a positive example and a negative example of the target of each category in the plurality of targets according to the first information by the second detection model;
the second detection unit is further configured to obtain the model parameters of the first detection model by the second detection model, so that the second detection model inherits the first detection capability of the first detection model, and train the positive examples and the negative examples of the targets of each category in the multiple targets, so that the second detection model obtains a second detection capability.
8. The apparatus according to claim 7, wherein the second detection unit is further configured to analyze a third image, a fourth image, and a test image by using the second detection model to obtain a first detection result, a second detection result, and a third detection result, respectively, where the first detection result includes the third image and a fifth image, the second detection result includes the fourth image and a sixth image, the third image is an unidentified first image, the fourth image is an unidentified second image, the fifth image is a first image with a third identifier, and the sixth image is a second image with a fourth identifier;
the image target detection capability training device further comprises an analysis and judgment unit, wherein the analysis and judgment unit is used for sending the first detection result and the second detection result into the first detection model for cyclic training when the accuracy of the third detection result meets a preset condition or the training frequency of the second detection model is smaller than a first preset threshold value, so that the first detection capability of the first detection model is improved, and the second detection capability of the second detection model is improved.
9. The image target detection capability training apparatus according to claim 8, wherein after the first detection model and the second detection model complete a specified round of training, a test is performed on a third image, a fourth image and a test image; the number of tests is n, where n is a positive integer; when n is 1,
the first detection unit is further configured to detect the test image by using the first detection model to obtain a fourth detection result;
the analysis and judgment unit is further configured to judge whether the difference between the accuracy of the third detection result and the accuracy of the fourth detection result is greater than a second preset threshold; or, when the number of training iterations of the second detection model is smaller than the first preset threshold, to send the first detection result and the second detection result into the first detection model for continued training, so as to improve the first detection capability of the first detection model and thereby the second detection capability of the second detection model.
10. The image target detection capability training apparatus according to claim 8, wherein after the first detection model and the second detection model complete a specified round of training, a test is performed on the third image, the fourth image and the test image; the number of tests is n, where n is a positive integer; when n is greater than or equal to 2,
the first detection unit is further configured to train the first detection result and the second detection result to improve a first detection capability of the first detection model, so as to improve the second detection capability of the second detection model;
detecting the test image with the second detection model to obtain the n-th third detection result;
the analysis and judgment unit is further configured to determine whether the difference between the accuracy of the third detection result at the nth time and the accuracy of the third detection result at the (n-1) th time is greater than a second preset threshold.
11. The image target detection capability training apparatus according to any one of claims 7-10, wherein the first detection unit is further configured to: perform illumination distortion, geometric distortion and image occlusion on the first image by using a data enhancement algorithm of a target detection algorithm.
12. The apparatus according to any one of claims 7 to 10, wherein the first information further includes orientation information of each of the objects in the second image and a confidence of each of the objects, and the orientation information of each of the objects includes coordinates and a vector of the object.
13. A computer-readable storage medium having computer-executable program code stored thereon for causing a computer to perform the method of any one of claims 1-6.
14. An electronic device, comprising a processor and a memory, the memory storing program code executable by the processor, the program code when invoked and executed by the processor performing the method of any of claims 1-6.
CN202210164319.2A 2022-02-22 2022-02-22 Target detection capability training method and device, storage medium and electronic equipment Pending CN114596484A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210164319.2A CN114596484A (en) 2022-02-22 2022-02-22 Target detection capability training method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN114596484A true CN114596484A (en) 2022-06-07

Family

ID=81805208



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination