CN111814835A - Training method and device of computer vision model, electronic equipment and storage medium

Info

Publication number: CN111814835A (application CN202010533723.3A; granted as CN111814835B)
Authority: CN (China)
Prior art keywords: sample, computer vision, training, image, sample image
Legal status: Granted
Application number: CN202010533723.3A
Other languages: Chinese (zh)
Other versions: CN111814835B
Inventors: 苑帅, 付万豪, 刘殿超, 张观良, 王刚
Current and original assignee: Ricoh Software Research Center Beijing Co Ltd
Events: application CN202010533723.3A filed by Ricoh Software Research Center Beijing Co Ltd; publication of CN111814835A; application granted; publication of CN111814835B
Current legal status: Active

Classifications

    • G06F18/24 Pattern recognition; classification techniques
    • G06F18/214 Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N3/02 Computing arrangements based on biological models; neural networks
    • G06N3/08 Neural networks; learning methods


Abstract

The application discloses a training method and device for a computer vision model, an electronic device, and a storage medium. The training method of the computer vision model comprises the following steps: recognizing the sample images in a first sample image set by using the computer vision model to obtain image recognition results; classifying the sample images based on the image recognition results, selecting at least some of the sample images under each classification based on a statistical distribution function, and generating a second sample image set from the selected sample images; and iteratively training the computer vision model on the second sample image set, updating the model parameters or ending training according to the training loss value obtained in each iterative training stage. By selecting typical sample images for the iterative training, the method reduces the time spent on manually labeling samples; and by comparing the manual labeling results with the model labeling results, hard sample images can be identified, so that a model with better recognition performance is trained.

Description

Training method and device of computer vision model, electronic equipment and storage medium
Technical Field
The application relates to the technical field of computer vision, in particular to a training method and device of a computer vision model, electronic equipment and a storage medium.
Background
With the arrival of the artificial intelligence era, deep learning has been widely applied in many fields; in the field of computer vision in particular, deep learning models play an important role. Training a deep learning model to handle samples captured under different conditions requires a large number of training samples and a long time before the model reaches the expected performance.
However, in the existing deep learning training process, hard samples for training the model are difficult to obtain, so many initial models perform poorly on hard samples captured under different conditions; the model therefore needs to be continuously improved and updated to achieve better performance.
Disclosure of Invention
In view of the above, the present application provides a training method and apparatus for a computer vision model, an electronic device and a storage medium that overcome, or at least partially solve, the above problems.
According to a first aspect of the present application, there is provided a method of training a computer vision model, comprising:
identifying the sample images in the first sample image set by using a computer vision model to obtain an image identification result;
classifying the sample images in the first sample image set based on the image identification result, respectively selecting at least part of the sample images under each classification based on a statistical distribution function, and generating a second sample image set according to the selected sample images;
and performing iterative training on the computer vision model according to the second sample image set, and performing parameter updating or finishing training on the computer vision model according to the training loss value obtained in each iterative training stage.
Optionally, the image recognition result includes an image prediction probability value, classifying the sample images in the first sample image set based on the image recognition result, and respectively selecting at least part of the sample images under each classification based on a statistical distribution function includes:
comparing the image prediction probability value with a preset probability value, and determining a fuzzy sample image and a clear sample image according to a comparison result;
and respectively carrying out sample extraction on the fuzzy sample image and the clear sample image based on a normal distribution probability model.
Optionally, the generating a second sample image set according to the selected sample image includes:
and determining the sample attribute of each sample image in the second sample image set according to the image recognition result so as to determine the training sample weight of the computer vision model according to the sample attribute.
Optionally, the determining, according to the image recognition result, the sample attribute of each sample image in the second sample image set includes:
comparing the image identification result of each sample image in the second sample image set with the image artificial labeling result;
and determining the sample difficulty and easiness attribute of each sample image according to the comparison result.
Optionally, the determining the sample difficulty attribute of each sample image according to the comparison result includes:
if the difference degree represented by the comparison result is greater than a preset threshold value, determining that the sample difficulty attribute of the corresponding sample image is difficult, otherwise, determining that the sample difficulty attribute is conventional;
the determining the sample attributes of the sample images in the second sample image set according to the image recognition result further includes:
and taking the comparison result of each sample image with the sample difficult and easy attribute as the labeling difference attribute of the corresponding sample image.
Optionally, the annotation difference attribute comprises at least one of a candidate box position difference, a candidate box size difference, and a candidate box category difference, and the determining the training sample weight of the computer vision model according to the sample attribute comprises:
and determining the training sample weight of the corresponding sample image according to the labeling difference attribute of each sample image which is difficult to be marked by the sample difficulty attribute.
Optionally, the iteratively training the computer vision model according to the second sample image set comprises:
and visually displaying the image identification result and the labeling information of each sample image in the second sample image set through a front end page.
According to a second aspect of the present application, there is provided a method for implementing a computer vision task, including:
acquiring an image to be processed;
and executing the computer vision task based on the image to be processed by utilizing a computer vision model to obtain a task execution result, wherein the computer vision model is obtained by training based on the training method of the computer vision model.
According to a third aspect of the present application, there is provided a training apparatus for a computer vision model, comprising:
the identification unit is used for identifying the sample images in the first sample image set by using a computer vision model to obtain an image identification result;
the selecting unit is used for classifying the sample images in the first sample image set based on the image recognition result, respectively selecting at least part of the sample images under each classification based on a statistical distribution function, and generating a second sample image set according to the selected sample images;
and the training unit is used for carrying out iterative training on the computer vision model according to the second sample image set and carrying out parameter updating or finishing training on the computer vision model according to the training loss value obtained in each iterative training stage.
Optionally, the image recognition result includes an image prediction probability value, and the selecting unit is further configured to:
comparing the image prediction probability value with a preset probability value, and determining a fuzzy sample image and a clear sample image according to a comparison result;
and respectively carrying out sample extraction on the fuzzy sample image and the clear sample image based on a normal distribution probability model.
Optionally, the selecting unit is further configured to:
and determining the sample attribute of each sample image in the second sample image set according to the image recognition result so as to determine the training sample weight of the computer vision model according to the sample attribute.
Optionally, the selecting unit is further configured to:
comparing the image identification result of each sample image in the second sample image set with the image artificial labeling result;
and determining the sample difficulty and easiness attribute of each sample image according to the comparison result.
Optionally, the selecting unit is further configured to:
if the difference degree represented by the comparison result is greater than a preset threshold value, determining that the sample difficulty attribute of the corresponding sample image is difficult, otherwise, determining that the sample difficulty attribute is conventional;
and taking the comparison result of each sample image with the sample difficult and easy attribute as the labeling difference attribute of the corresponding sample image.
Optionally, the labeling difference attribute includes at least one of a candidate box position difference, a candidate box size difference, and a candidate box category difference, and the selecting unit is further configured to:
and determining the training sample weight of the corresponding sample image according to the labeling difference attribute of each sample image which is difficult to be marked by the sample difficulty attribute.
Optionally, the training unit is further configured to:
and visually displaying the image identification result and the labeling information of each sample image in the second sample image set through a front end page.
According to a fourth aspect of the present application, there is provided an apparatus for implementing a computer vision task, comprising:
the acquisition unit is used for acquiring an image to be processed;
and the execution unit is used for executing the computer vision task based on the image to be processed by utilizing a computer vision model to obtain a task execution result, wherein the computer vision model is obtained by training based on the training device of the computer vision model.
According to a fifth aspect of the present application, there is provided an electronic device comprising: a processor; and a memory arranged to store computer executable instructions which, when executed, cause the processor to perform a method of training a computer vision model as described in any one of the above, or a method of implementing a computer vision task as described above.
According to a sixth aspect of the application, a computer-readable storage medium is provided, wherein the computer-readable storage medium stores one or more programs which, when executed by a processor, implement a method of training a computer vision model as described in any one of the above, or a method of implementing a computer vision task as described above.
According to the technical scheme, the sample images in the first sample image set are recognized by using the computer vision model to obtain image recognition results; the sample images in the first sample image set are classified based on the image recognition results, at least some of the sample images under each classification are selected based on a statistical distribution function, and a second sample image set is generated from the selected sample images; the computer vision model is then iteratively trained on the second sample image set, and its parameters are updated, or training ends, according to the training loss value obtained in each iterative training stage. Because the iterative training is performed on selected, typical sample images, the time and effort of manually labeling sample images are greatly saved and the training efficiency of the model is greatly improved. In addition, by comparing the manual labeling results with the model labeling results, hard sample images can be identified, so that a model with better recognition performance is trained. Through the visual display of the training effect, indexes such as the intersection-over-union ratio can be seen more intuitively, making it easy to find sample recognition results that meet the training requirements but deviate from the actual results, so that the model becomes more accurate.
The foregoing description is only an overview of the technical solutions of the present application, and the present application can be implemented according to the content of the description in order to make the technical means of the present application more clearly understood, and the following detailed description of the present application is given in order to make the above and other objects, features, and advantages of the present application more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 shows a flow diagram of a method of training a computer vision model according to one embodiment of the present application;
FIG. 2 illustrates a sample image selection strategy according to one embodiment of the present application;
FIG. 3 illustrates a schematic flow chart of a training process of a computer vision model according to an embodiment of the present application;
FIG. 4 illustrates a diagram of recognition results of a computer vision model according to one embodiment of the present application;
FIG. 5 illustrates a recognition effect diagram of a computer vision model according to one embodiment of the present application;
FIG. 6 shows a flow diagram of a method of implementation of a computer vision task, according to an embodiment of the present application;
FIG. 7 shows a schematic structural diagram of a training apparatus for computer vision models, according to an embodiment of the present application;
FIG. 8 shows a schematic structural diagram of an apparatus for performing computer vision tasks according to an embodiment of the present application;
FIG. 9 shows a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 10 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In scenarios such as the inspection of solar power stations by unmanned aerial vehicles, the collected image samples differ greatly from station to station because of differences in station location, weather and shooting conditions. To ensure that the model handles image samples captured under all of these conditions well, the model needs continuous improvement and updating.
Based on this, an embodiment of the present application provides a training method of a computer vision model, as shown in fig. 1, the training method of the computer vision model includes the following steps S110 to S130:
and step S110, identifying the sample images in the first sample image set by using a computer vision model to obtain an image identification result.
The computer vision model of the embodiments of the application can be understood as an image recognition model; image recognition essentially means using a computer to process, analyze and understand images in order to recognize targets and objects of various different patterns. In the prior art, model frameworks for image recognition mainly include the ResNet residual network, the Inception network and the like; those skilled in the art can flexibly select a framework according to actual needs, and no specific limitation is imposed here.
Besides determining a basic computer vision model framework, a set of sample images for training, namely the first sample image set, also needs to be determined; the sample images in this set are then recognized by the computer vision model to obtain image recognition results. A sample image here may refer to an image whose category has already been manually labeled.
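As a minimal illustrative sketch of this recognition step (not the patent's own implementation), the following assumes a PyTorch classification model; the names `model` and `first_sample_loader` and the result layout are hypothetical:

```python
import torch
import torch.nn.functional as F

def recognize(model, first_sample_loader, device="cpu"):
    """Run the computer vision model over the first sample image set and
    collect the per-image prediction probability values."""
    model.eval()
    results = []
    with torch.no_grad():
        for images, image_ids in first_sample_loader:
            logits = model(images.to(device))   # raw class scores
            probs = F.softmax(logits, dim=1)    # image prediction probability values
            for img_id, p in zip(image_ids, probs):
                results.append({"id": img_id, "probs": p.cpu().tolist()})
    return results
```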
And step S120, classifying the sample images in the first sample image set based on the image recognition result, respectively selecting at least part of the sample images under each classification based on a statistical distribution function, and generating a second sample image set according to the selected sample images.
Images usually contain rich feature information. To extract the main features from images as a basis for classification, an image recognition model has to be built by analyzing and learning the similarities and differences among a large number of image samples, so obtaining an image recognition model with good performance often consumes a great deal of labor and time. Therefore, to improve the training efficiency of the model and save the time and effort of manual labeling, the embodiments of the application reduce the number of samples to be labeled and trained on by selecting typical samples. Selecting training samples is a key step in the model training process; it influences model classification precision more than the choice of classification model does, so the quality of the selected training samples determines, to a certain extent, how high the model classification precision can be.
In a specific implementation, after the recognition result of each sample image in the first sample image set is obtained, the sample images in the first sample image set are classified based on the image recognition results. The classification may cover two dimensions: one is the category of the target object in a sample (e.g., a person or an animal in the sample image), and the other is the attribute category of the sample (e.g., fuzzy sample vs. clear sample, hard sample vs. regular sample). After the samples are classified, samples of each category in each classification dimension are selected based on a statistical distribution function, and a second sample image set is generated from the selected sample images to serve as the basis for subsequent model training.
And step S130, performing iterative training on the computer vision model according to the second sample image set, and performing parameter updating or finishing training on the computer vision model according to the training loss value obtained in each iterative training stage.
After the sample images are selected to obtain the second sample image set, the computer vision model is iteratively trained on that set; its parameters are updated according to the training loss value obtained in each iterative training stage, and training ends once the training result reaches a convergence condition. Iterative training is a process of continuously updating the model parameters; the aim is to keep reducing, in each iterative update, the difference between the category output by the model and the real category, i.e., the training loss value, so as to improve model performance.
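The following is a minimal sketch of such a loss-driven training loop, again assuming PyTorch; `second_sample_loader`, the loss threshold and the epoch budget are illustrative assumptions, not values from the patent:

```python
def train_iteratively(model, second_sample_loader, loss_fn, optimizer,
                      max_epochs=100, loss_threshold=1e-3):
    """Iteratively train on the second sample image set, updating the
    parameters from the training loss and stopping on convergence."""
    model.train()
    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for images, labels in second_sample_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)  # training loss value
            loss.backward()                        # gradients for the parameter update
            optimizer.step()
            epoch_loss += loss.item()
        if epoch_loss / len(second_sample_loader) < loss_threshold:
            break                                  # convergence condition reached
    return model
```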
Through the above process, typical image samples are selected for the iterative training of the computer vision model, so a better recognition effect can be obtained, the time and effort of manually labeling image samples are greatly saved, and the training efficiency of the model is greatly improved.
In an embodiment of the present application, the image recognition result includes an image prediction probability value, classifying the sample images in the first sample image set based on the image recognition result, and respectively selecting at least some sample images under each classification based on a statistical distribution function includes: comparing the image prediction probability value with a preset probability value, and determining a fuzzy sample image and a clear sample image according to a comparison result; and respectively carrying out sample extraction on the fuzzy sample image and the clear sample image based on a normal distribution probability model.
In specific implementation, when the sample image is selected based on the category of the sample image, the following steps may be adopted in the embodiment of the present application:
1) When the computer vision model makes predictions on the first sample image set, a category prediction result (e.g., an image prediction probability value) is obtained for each sample image. The maximum value among the predicted probabilities of each sample image is compared with a preset probability value (which can be determined according to the number of sample images of each category; the default is 0.5). If the maximum probability value is greater than the preset probability value, the model has produced a definite classification result for the sample image, which is accordingly a clear sample image; otherwise it is a fuzzy sample image.
2) Assume the category X of a sample image is a random variable taking values (x1, x2, ..., xn), where p(xi) denotes the probability that event xi occurs (i.e., the probability that the sample image belongs to category xi) and Σ p(xi) = 1. Let H(X) = -Σ p(xi) log p(xi), xi ∈ X. When distinguishing clear samples from fuzzy samples, a boundary on H(X) can be defined according to the actual situation: if H(X) computed from the predicted probability values that the model outputs for a sample image exceeds this boundary, the corresponding sample image is a Fuzzy Sample (FS for short); otherwise it is a Clear Sample (CS for short).
3) The fuzzy samples and clear samples can be regarded as samples obtained along the attribute-category dimension. As shown in fig. 2, when samples are extracted from the fuzzy samples and the clear samples, the extraction can be based on a normal distribution probability model: curve 1 in fig. 2 is the curve of all sample images, i.e., the distribution curve of the first sample image set, and curve 2 is the selection strategy curve for the sample images, with the preset probability value as the demarcation point. For samples obtained along the target-object category dimension, the samples under each target object category are extracted in as even a proportion as possible (e.g., 1:1), so that the extracted samples of each object category are as balanced as possible, improving the training effect of the model.
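A minimal sketch of steps 2) and 3) follows. The entropy boundary and a normal selection curve centered on the preset probability value are one plausible reading of fig. 2; the names and the σ value are assumptions:

```python
import math
import random

def entropy(probs):
    """H(X) = -sum p(xi) * log p(xi) over the predicted class probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def split_clear_fuzzy(samples, h_boundary):
    """Fuzzy Sample (FS) if prediction entropy exceeds the boundary, else Clear Sample (CS)."""
    fs = [s for s in samples if entropy(s["probs"]) > h_boundary]
    cs = [s for s in samples if entropy(s["probs"]) <= h_boundary]
    return cs, fs

def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def select_by_normal(samples, mu=0.5, sigma=0.15):
    """Keep each sample with probability proportional to a normal curve
    centered on the preset probability value (the demarcation point)."""
    weights = [normal_pdf(max(s["probs"]), mu, sigma) for s in samples]
    top = max(weights) or 1.0
    return [s for s, w in zip(samples, weights) if random.random() < w / top]
```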
In an embodiment of the present application, the generating the second sample image set according to the selected sample image includes: and determining the sample attribute of each sample image in the second sample image set according to the image recognition result so as to determine the training sample weight of the computer vision model according to the sample attribute.
As mentioned above, in the scenario of UAV-based solar power station inspection, image samples come from different sources and are collected under different conditions. Besides a label value and feature information, a sample can therefore also carry a different weight representing its importance, and incorporating this information (label value, feature information, weight, etc.) into model training can further improve the classification accuracy of the model.
In specific implementation, after obtaining the image recognition result of each sample image in the first sample image set and the second sample image set by the model, the embodiment of the application may determine the sample attribute of each sample image in the second sample image set according to the image recognition result, and further determine the training sample weight of the computer vision model according to the sample attribute.
In an embodiment of the application, the determining, according to the image recognition result, a sample attribute of each sample image in the second sample image set includes: comparing the image identification result of each sample image in the second sample image set with the image artificial labeling result; and determining the sample difficulty and easiness attribute of each sample image according to the comparison result.
In a specific implementation, the sample attributes in the embodiments of the application may include a sample difficulty attribute, i.e., an attribute characterizing whether the sample image can be accurately recognized by the model. After the image recognition results of the sample images in the second sample image set are obtained, the recognition results may include image pre-labeling results; these pre-labeling results are compared with the manual labeling results, and the sample difficulty attribute of each sample image is then determined from the comparison results.
In an embodiment of the application, the determining a sample difficulty attribute of each sample image according to the comparison result includes: if the difference degree represented by the comparison result is greater than a preset threshold value, determining that the sample difficulty attribute of the corresponding sample image is difficult, otherwise, determining that the sample difficulty attribute is conventional; the determining the sample attributes of the sample images in the second sample image set according to the image recognition result further includes: and taking the comparison result of each sample image with the sample difficult and easy attribute as the labeling difference attribute of the corresponding sample image.
The sample difficulty attribute can be either difficult or conventional. Accordingly, when the image pre-labeling result of the model is compared with the manual labeling result, the degree of difference between them can be determined. If this degree of difference exceeds a preset threshold (which can be set flexibly according to the actual situation), the sample image is considered a hard sample image; otherwise it is considered a regular sample image. A hard sample may refer to a sample whose features are difficult for the model to learn. To improve the model's ability to learn from and generalize to hard sample images, the difference between the image pre-labeling result and the manual labeling result can further serve as the labeling difference attribute of the sample image, so that the importance of each sample image in the training process can be quantified from this labeling difference attribute, improving model performance.
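A sketch of this hard/regular decision, under the assumption that a labeled box carries center coordinates, size and class in a plain dict (the keys `cx`, `cy`, `w`, `h`, `cls` are hypothetical):

```python
def difference_degree(pre_box, manual_box):
    """A simple degree-of-difference score between the model's pre-labeled
    box and the manually labeled box (position, size and category terms)."""
    pos = abs(pre_box["cx"] - manual_box["cx"]) + abs(pre_box["cy"] - manual_box["cy"])
    size = abs(pre_box["w"] - manual_box["w"]) + abs(pre_box["h"] - manual_box["h"])
    cat = 0.0 if pre_box["cls"] == manual_box["cls"] else 1.0
    return pos + size + cat

def difficulty_attribute(pre_box, manual_box, threshold):
    """'hard' if the degree of difference exceeds the preset threshold, else 'regular'."""
    return "hard" if difference_degree(pre_box, manual_box) > threshold else "regular"
```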
In one embodiment of the present application, the annotation difference attribute comprises at least one of a candidate box position difference, a candidate box size difference, and a candidate box category difference, and the determining the training sample weight of the computer vision model according to the sample attribute comprises: and determining the training sample weight of the corresponding sample image according to the labeling difference attribute of each sample image which is difficult to be marked by the sample difficulty attribute.
The labeling difference attribute in the embodiments of the application can be regarded as an attribute measuring the importance of a sample image. The factors affecting this importance may include the position difference of the image's candidate box (e.g., the offset of the candidate box's center of gravity), the size difference of the candidate box (e.g., its height and width), the category difference of the candidate box, and the like, as well as the difference in the prediction probability value output by the model. In particular, because the difference between the image recognition result and the manual labeling result is larger for hard sample images than for regular ones, and sufficient attention must be paid to them to improve the model's ability to recognize hard sample images, the embodiments of the application may determine the training sample weight of each sample image whose difficulty attribute is difficult by the following formula:
the weight of the hard sample is a, Δ (width), b, Δ (height), c, center of gravity offset, d, Δ (prediction probability value), e, Δ (candidate box type), wherein the Δ symbols all refer to absolute values of differences between the output result of the model and the result of the artificial labeling, and a, b, c, d and e are empirical constants which can be flexibly set and adjusted according to the actual training effect, and are not specifically limited herein.
In one embodiment of the present application, the iteratively training the computer vision model from the second sample image set comprises: and visually displaying the image identification result and the labeling information of each sample image in the second sample image set through a front end page.
To make the training effect of the model visible, the embodiments of the application further display, through a front-end page, the image recognition result of each sample image in the second sample image set (e.g., the image prediction probability value output by the model) and its labeling information (e.g., the position, size and category of the candidate box and of the manually labeled box). In addition, to make the difference between the model output and the real result more intuitive, an Intersection over Union (IoU) index can be displayed on the front-end page. IoU is the overlap rate between a candidate box generated by the model and the original, manually labeled box, i.e., the ratio of their intersection to their union; the larger the IoU value, the closer the candidate box is to the manually labeled box and the better the recognition effect of the model. Accordingly, sample recognition results that meet the model training requirements but deviate from the actual results can be found from the IoU value, so the model can be further improved and made more accurate.
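IoU for axis-aligned boxes can be computed as in the sketch below; the (x1, y1, x2, y2) corner layout is an assumption:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2).
    A larger IoU means the candidate box is closer to the manually labeled box."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```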
In one embodiment of the present application, the training method of the computer vision model further includes: judging whether the training result of the computer vision model meets a preset convergence condition or not according to the training loss value obtained in each iterative training stage; and determining whether to continue iterative training of the computer vision model according to the judgment result.
In a specific implementation, the training effect of the current model can be evaluated. If the model has not reached the expected effect, i.e., the preset convergence condition, iterative training continues to update the model, and the labeling and model training steps above are repeated. If the evaluation shows that the model has achieved the expected effect, the training process ends.
As shown in fig. 3, a schematic diagram of the training process of the computer vision model is provided. First, the first sample image set is obtained, and the computer vision model is run on it to obtain the image recognition results, i.e., the pre-labeling results; the sample images of the first sample image set are then selected based on the image recognition results to obtain the second sample image set. Based on the earlier image recognition results, the pre-labeling result and the manual labeling result of each sample image in the second sample image set are compared to obtain the difference between them: when the difference does not exceed a preset threshold, the corresponding sample image is considered a regular sample image, and when it exceeds the threshold, a hard sample image. The sample attributes are marked accordingly and the computer vision model is iteratively trained. After each round of iterative training, the training effect of the model is evaluated; when the model reaches the expected effect, training ends and the final computer vision model is output.
As shown in fig. 4, a schematic diagram of the recognition results of the computer vision model is provided. The rectangular boxes in fig. 4 are, from left to right: a wrong candidate box, i.e., one below the preset probability value; a candidate box that meets the preset probability value but needs correction; and the correct box, i.e., the manually labeled box. The difference between the candidate boxes output by the model and the manually labeled box can be seen intuitively, which facilitates subsequent improvement of the model.
As shown in fig. 5, a schematic diagram of the recognition effect of the computer vision model in a specific task scenario is provided: fig. 5(a) shows the model's recognition effect on a solar power station image segmentation task, and fig. 5(b) on a solar power station fault detection task. The computer vision model obtained with the training method of the application achieves a good recognition effect on both the image segmentation task and the fault detection task of the solar power station.
An embodiment of the present application further provides an implementation method of a computer vision task, as shown in fig. 6, the implementation method of the computer vision task includes the following steps S610 to S620:
step S610, acquiring an image to be processed.
Step S620, executing a computer vision task based on the image to be processed by using a computer vision model to obtain a task execution result, wherein the computer vision model is obtained by training based on the computer vision model training method as described in any one of the previous items.
When executing a task such as image segmentation or fault detection for a solar power station, the task image to be processed may be acquired first, and the computer vision model obtained by the above training method is then used to perform image segmentation or fault detection on the task image to obtain a task processing result. The computer vision model can specifically be trained by the following method:
identifying the sample images in the first sample image set by using a computer vision model to obtain an image identification result; classifying the sample images in the first sample image set based on the image identification result, respectively selecting at least part of the sample images under each classification based on a statistical distribution function, and generating a second sample image set according to the selected sample images; and performing iterative training on the computer vision model according to the second sample image set, and performing parameter updating or finishing training on the computer vision model according to the training loss value obtained in each iterative training stage.
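As a usage sketch of the task execution step (PyTorch again; the preprocessing and output handling are assumptions, since the patent does not fix a model architecture):

```python
import torch
from PIL import Image
import torchvision.transforms.functional as TF

def run_vision_task(trained_model, image_path):
    """Acquire an image to be processed and execute the computer vision
    task (e.g., segmentation or fault detection) with the trained model."""
    image = Image.open(image_path).convert("RGB")  # acquire the image to be processed
    tensor = TF.to_tensor(image).unsqueeze(0)      # batch of one
    trained_model.eval()
    with torch.no_grad():
        return trained_model(tensor)               # the task execution result
```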
An embodiment of the present application provides a training apparatus 700 for a computer vision model, as shown in fig. 7, the training apparatus 700 for a computer vision model includes: an identification unit 710, a selection unit 720 and a training unit 730.
The identifying unit 710 in this embodiment of the application is configured to identify the sample images in the first sample image set by using a computer vision model, so as to obtain an image identification result.
The computer vision model of the embodiments of the application can be understood as an image recognition model; image recognition essentially means using a computer to process, analyze and understand images in order to recognize targets and objects of various different patterns. In the prior art, model frameworks for image recognition mainly include the ResNet residual network, the Inception network and the like; those skilled in the art can flexibly select a framework according to actual needs, and no specific limitation is imposed here.
Besides determining a basic computer vision model framework, a set of sample images for training, namely the first sample image set, also needs to be determined; the sample images in this set are then recognized by the computer vision model to obtain image recognition results. A sample image here may refer to an image whose category has already been manually labeled.
The selecting unit 720 in this embodiment is configured to classify the sample images in the first sample image set based on the image recognition result, and respectively select at least part of the sample images under each classification based on a statistical distribution function, and generate a second sample image set according to the selected sample images.
Images usually contain rich feature information. To extract the main features from images as a basis for classification, an image recognition model has to be built by analyzing and learning the similarities and differences among a large number of image samples, so obtaining an image recognition model with good performance often consumes a great deal of labor and time. Therefore, to improve the training efficiency of the model and save the time and effort of manual labeling, the embodiments of the application reduce the number of samples to be labeled and trained on by selecting typical samples. Selecting training samples is a key step in the model training process; it influences model classification precision more than the choice of classification model does, so the quality of the selected training samples determines, to a certain extent, how high the model classification precision can be.
In a specific implementation, after the recognition result of each sample image in the first sample image set is obtained, the sample images in the first sample image set are classified based on the image recognition results. The classification may cover two dimensions: one is the category of the target object in a sample (e.g., a person or an animal in the sample image), and the other is the attribute category of the sample (e.g., fuzzy sample vs. clear sample, hard sample vs. regular sample). After the samples are classified, samples of each category in each classification dimension are selected based on a statistical distribution function, and a second sample image set is generated from the selected sample images to serve as the basis for subsequent model training.
The training unit 730 of the embodiment of the application is configured to perform iterative training on the computer vision model according to the second sample image set, and perform parameter updating or end training on the computer vision model according to a training loss value obtained in each iterative training stage.
After the sample images are selected to obtain the second sample image set, the computer vision model is iteratively trained on that set; its parameters are updated according to the training loss value obtained in each iterative training stage, and training ends once the training result reaches a convergence condition. Iterative training is a process of continuously updating the model parameters; the aim is to keep reducing, in each iterative update, the difference between the category output by the model and the real category, i.e., the training loss value, so as to improve model performance.
Through the above process, typical image samples are selected for the iterative training of the computer vision model, so a better recognition effect can be obtained, the time and effort of manually labeling image samples are greatly saved, and the training efficiency of the model is greatly improved.
In an embodiment of the present application, the image recognition result includes an image prediction probability value, and the extracting unit 720 is further configured to: comparing the image prediction probability value with a preset probability value, and determining a fuzzy sample image and a clear sample image according to a comparison result; and respectively carrying out sample extraction on the fuzzy sample image and the clear sample image based on a normal distribution probability model.
In an embodiment of the present application, the selecting unit 720 is further configured to: and determining the sample attribute of each sample image in the second sample image set according to the image recognition result so as to determine the training sample weight of the computer vision model according to the sample attribute.
In an embodiment of the present application, the selecting unit 720 is further configured to: comparing the image identification result of each sample image in the second sample image set with the image artificial labeling result; and determining the sample difficulty and easiness attribute of each sample image according to the comparison result.
In an embodiment of the present application, the selecting unit 720 is further configured to: if the difference degree represented by the comparison result is greater than a preset threshold value, determining that the sample difficulty attribute of the corresponding sample image is difficult, otherwise, determining that the sample difficulty attribute is conventional; and taking the comparison result of each sample image with the sample difficult and easy attribute as the labeling difference attribute of the corresponding sample image.
In an embodiment of the present application, the labeling difference attribute includes at least one of a difference in position of the candidate box, a difference in size of the candidate box, and a difference in category of the candidate box, and the selecting unit 720 is further configured to: determine, for each sample image whose sample difficulty attribute is difficult, the training sample weight of the corresponding sample image according to its labeling difference attribute.
In an embodiment of the present application, the training unit 730 is further configured to: and visually displaying the image identification result and the labeling information of each sample image in the second sample image set through a front end page.
An embodiment of the present application further provides an apparatus 800 for implementing a computer vision task, as shown in fig. 8, the apparatus 800 for implementing a computer vision task includes: an acquisition unit 810 and an execution unit 820.
The obtaining unit 810 of the embodiment of the application is configured to obtain an image to be processed.
The execution unit 820 of the embodiment of the present application is configured to execute a computer vision task based on the image to be processed by using a computer vision model, so as to obtain a task execution result, where the computer vision model is obtained by training based on the training apparatus of the computer vision model as described above.
When executing a task such as solar power station image segmentation or fault detection, the task image to be processed can be acquired first, and the computer vision model obtained by the above training method is then used to perform image segmentation or fault detection on the task image to obtain a task processing result. The computer vision model can specifically be trained by the following training device:
the identification unit is used for identifying the sample images in the first sample image set by using a computer vision model to obtain an image identification result; the selecting unit is used for classifying the sample images in the first sample image set based on the image recognition result, respectively selecting at least part of the sample images under each classification based on a statistical distribution function, and generating a second sample image set according to the selected sample images; and the training unit is used for carrying out iterative training on the computer vision model according to the second sample image set and carrying out parameter updating or finishing training on the computer vision model according to the training loss value obtained in each iterative training stage.
It should be noted that, for the specific implementation of each apparatus embodiment, reference may be made to the specific implementation of the corresponding method embodiment, which is not described herein again.
In summary, according to the technical scheme of the application, the sample images in the first sample image set are recognized by using the computer vision model to obtain image recognition results; the sample images in the first sample image set are classified based on the image recognition results, at least some of the sample images under each classification are selected based on a statistical distribution function, and a second sample image set is generated from the selected sample images; the computer vision model is then iteratively trained on the second sample image set, and its parameters are updated, or training ends, according to the training loss value obtained in each iterative training stage. Because the iterative training is performed on selected, typical sample images, the time and effort of manually labeling sample images are greatly saved and the training efficiency of the model is greatly improved. In addition, by comparing the manual labeling results with the model labeling results, hard sample images can be identified, so that a model with better recognition performance is trained. Through the visual display of the training effect, indexes such as the intersection-over-union ratio can be seen more intuitively, making it easy to find sample recognition results that meet the training requirements but deviate from the actual results, so that the model becomes more accurate.
It should be noted that:
the algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may be used with the teachings herein. The required structure for constructing such a device will be apparent from the description above. In addition, this application is not directed to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein, and any descriptions of specific languages are provided above to disclose the best modes of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the application and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in a training apparatus for computer vision models or an implementation apparatus for computer vision tasks according to embodiments of the present application. The present application may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
For example, fig. 9 shows a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 900 comprises a processor 910 and a memory 920 arranged to store computer executable instructions (computer readable program code). The memory 920 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. The memory 920 has a storage space 930 storing computer readable program code 931 for performing any of the method steps described above. For example, the storage space 930 may comprise respective pieces of computer readable program code 931 for implementing the various steps of the above methods. The computer readable program code 931 may be read from or written into one or more computer program products, each comprising a program code carrier such as a hard disk, a compact disc (CD), a memory card, or a floppy disk. Such a computer program product is typically a computer readable storage medium such as that shown in fig. 10, which shows a schematic diagram of a computer readable storage medium according to an embodiment of the present application. The computer readable storage medium 1000 stores the computer readable program code 931 for performing the steps of the method according to the present application, which is readable by the processor 910 of the electronic device 900. When executed by the electronic device 900, the computer readable program code 931 causes the electronic device 900 to perform each step of the methods described above; in particular, the computer readable program code 931 stored on the computer readable storage medium may perform the method shown in any of the embodiments described above. The computer readable program code 931 may be compressed in a suitable form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.

Claims (12)

1. A method of training a computer vision model, comprising:
identifying the sample images in a first sample image set by using a computer vision model to obtain an image recognition result;
classifying the sample images in the first sample image set based on the image recognition result, respectively selecting at least part of the sample images under each classification based on a statistical distribution function, and generating a second sample image set according to the selected sample images;
and performing iterative training on the computer vision model according to the second sample image set, and updating parameters of the computer vision model or finishing training according to the training loss value obtained at each iterative training stage.
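For illustration only, a minimal Python sketch of the select-and-retrain loop of claim 1. The model object with predict and fit_one_epoch methods, the 0.5 selection ratio, and the loss target are hypothetical stand-ins, and the uniform per-class draw is merely a placeholder for whichever statistical distribution function an implementation chooses (claim 2 refines it with a normal distribution):

    import random

    def build_second_sample_set(first_set, model, ratio=0.5):
        # Step 1: recognize every sample image in the first sample image set.
        results = [(img, model.predict(img)) for img in first_set]
        # Step 2: classify the samples by their recognition result.
        groups = {}
        for img, label in results:
            groups.setdefault(label, []).append(img)
        # Step 3: select at least part of the samples under each classification.
        second_set = []
        for label, imgs in groups.items():
            second_set.extend(random.sample(imgs, max(1, int(len(imgs) * ratio))))
        return second_set

    def train(model, first_set, max_epochs=100, loss_target=0.01):
        for _ in range(max_epochs):
            second_set = build_second_sample_set(first_set, model)
            loss = model.fit_one_epoch(second_set)  # one iterative training stage
            if loss <= loss_target:
                break   # finish training; otherwise keep updating parameters
        return model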
2. The method of claim 1, wherein the image recognition result comprises an image prediction probability value, and the classifying the sample images in the first sample image set based on the image recognition result and respectively selecting at least part of the sample images under each classification based on a statistical distribution function comprise:
comparing the image prediction probability value with a preset probability value, and determining fuzzy sample images and clear sample images according to the comparison result;
and respectively extracting samples from the fuzzy sample images and the clear sample images based on a normal distribution probability model.
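For illustration only, a minimal Python sketch of the fuzzy/clear split and the normal-distribution draw of claim 2. The 0.5 threshold, the Gaussian parameters mu and sigma, and the with-replacement draw via random.choices are all illustrative assumptions rather than the patent's prescribed values:

    import math
    import random

    def normal_weight(p, mu=0.5, sigma=0.15):
        # Unnormalised Gaussian density used as the sampling weight, so samples
        # whose predicted probability lies near mu are drawn more often.
        return math.exp(-((p - mu) ** 2) / (2.0 * sigma ** 2))

    def split_and_sample(items, p_threshold=0.5, k=100):
        # items: list of (sample_image, image_prediction_probability) pairs.
        fuzzy = [it for it in items if it[1] < p_threshold]   # low confidence
        clear = [it for it in items if it[1] >= p_threshold]  # high confidence

        def draw(pool):
            if not pool:
                return []
            weights = [normal_weight(p) for _, p in pool]
            # random.choices draws with replacement; an implementation could
            # instead deduplicate or draw without replacement.
            return random.choices(pool, weights=weights, k=min(k, len(pool)))

        return draw(fuzzy) + draw(clear)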
3. The method of claim 1, wherein the generating a second sample image set according to the selected sample images comprises:
and determining the sample attribute of each sample image in the second sample image set according to the image recognition result, so as to determine the training sample weight of the computer vision model according to the sample attribute.
4. The method of claim 3, wherein the determining the sample attribute of each sample image in the second sample image set according to the image recognition result comprises:
comparing the image recognition result of each sample image in the second sample image set with the manual image labeling result;
and determining the sample difficulty attribute of each sample image according to the comparison result.
5. The method of claim 4, wherein the determining the sample difficulty attribute of each sample image according to the comparison result comprises:
if the degree of difference represented by the comparison result is greater than a preset threshold, determining the sample difficulty attribute of the corresponding sample image as difficult; otherwise, determining it as conventional;
and the determining the sample attribute of each sample image in the second sample image set according to the image recognition result further comprises:
taking the comparison result of each sample image whose sample difficulty attribute is difficult as the labeling difference attribute of the corresponding sample image.
6. The method of claim 5, wherein the labeling difference attribute comprises at least one of a candidate box position difference, a candidate box size difference, and a candidate box category difference, and the determining the training sample weight of the computer vision model according to the sample attribute comprises:
and determining the training sample weight of the corresponding sample image according to the labeling difference attribute of each sample image whose sample difficulty attribute is difficult.
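For illustration only, a minimal Python sketch of how claims 4 to 6 could be realized. The (x1, y1, x2, y2) box format, the dictionary shape of the recognition and labeling results, the IoU-based difference measure, and the linear weighting rule are all illustrative assumptions, not the patent's prescribed formulas:

    def iou(a, b):
        # Intersection over union of two (x1, y1, x2, y2) boxes; it captures
        # the candidate box position and size differences in a single score.
        ax1, ay1, ax2, ay2 = a
        bx1, by1, bx2, by2 = b
        ix1, iy1 = max(ax1, bx1), max(ay1, by1)
        ix2, iy2 = min(ax2, bx2), min(ay2, by2)
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
        return inter / union if union > 0 else 0.0

    def training_sample_weight(pred, gt, diff_threshold=0.5):
        # pred / gt: {"box": (x1, y1, x2, y2), "label": str} — an illustrative
        # shape for the recognition result and the manual labeling result.
        box_diff = 1.0 - iou(pred["box"], gt["box"])               # position + size
        label_diff = 0.0 if pred["label"] == gt["label"] else 1.0  # category
        difference = box_diff + label_diff        # labeling difference attribute
        if difference > diff_threshold:           # "difficult" sample
            return 1.0 + difference  # weight grows with the labeling difference
        return 1.0                   # "conventional" sample keeps the base weight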
7. The method of training a computer vision model according to any one of claims 1 to 6, wherein the performing iterative training on the computer vision model according to the second sample image set comprises:
and visually displaying the image recognition result and the labeling information of each sample image in the second sample image set through a front-end page.
8. A method for implementing a computer vision task, comprising:
acquiring an image to be processed;
and executing a computer vision task based on the image to be processed by using a computer vision model to obtain a task execution result, wherein the computer vision model is obtained by training based on the method of training a computer vision model according to any one of claims 1 to 6.
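For illustration only, claim 8 reduced to its two steps, assuming the Pillow library for image acquisition and the hypothetical trained model object from the sketch after claim 1:

    from PIL import Image  # assumes the Pillow library is available

    def run_vision_task(model, image_path):
        # Step 1: acquire the image to be processed.
        image = Image.open(image_path).convert("RGB")
        # Step 2: execute the computer vision task with the trained model to
        # obtain the task execution result.
        return model.predict(image)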
9. An apparatus for training a computer vision model, comprising:
a recognition unit, configured to recognize the sample images in a first sample image set by using a computer vision model to obtain an image recognition result;
a selecting unit, configured to classify the sample images in the first sample image set based on the image recognition result, respectively select at least part of the sample images under each classification based on a statistical distribution function, and generate a second sample image set according to the selected sample images;
and a training unit, configured to perform iterative training on the computer vision model according to the second sample image set, and to update parameters of the computer vision model or finish training according to the training loss value obtained at each iterative training stage.
10. An apparatus for performing a computer vision task, comprising:
an acquisition unit, configured to acquire an image to be processed;
and an execution unit, configured to execute a computer vision task based on the image to be processed by using a computer vision model to obtain a task execution result, wherein the computer vision model is obtained by training with the training apparatus of a computer vision model according to claim 9.
11. An electronic device, wherein the electronic device comprises: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the method of training a computer vision model according to any one of claims 1 to 6 or claim 7, or the method for implementing a computer vision task according to claim 8.
12. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method of training a computer vision model according to any one of claims 1 to 6 or claim 7, or the method for implementing a computer vision task according to claim 8.
CN202010533723.3A 2020-06-12 2020-06-12 Training method and device for computer vision model, electronic equipment and storage medium Active CN111814835B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010533723.3A CN111814835B (en) 2020-06-12 2020-06-12 Training method and device for computer vision model, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111814835A true CN111814835A (en) 2020-10-23
CN111814835B CN111814835B (en) 2024-07-05

Family

ID=72844879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010533723.3A Active CN111814835B (en) 2020-06-12 2020-06-12 Training method and device for computer vision model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111814835B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018119684A1 (en) * 2016-12-27 2018-07-05 深圳前海达闼云端智能科技有限公司 Image recognition system and image recognition method
CN109102076A (en) * 2018-08-06 2018-12-28 百度在线网络技术(北京)有限公司 model training method, device, equipment and storage medium
CN110084271A (en) * 2019-03-22 2019-08-02 同盾控股有限公司 A kind of other recognition methods of picture category and device
CN110472676A (en) * 2019-08-05 2019-11-19 首都医科大学附属北京朝阳医院 Stomach morning cancerous tissue image classification system based on deep neural network
CN110533106A (en) * 2019-08-30 2019-12-03 腾讯科技(深圳)有限公司 Image classification processing method, device and storage medium
CN110751175A (en) * 2019-09-12 2020-02-04 上海联影智能医疗科技有限公司 Method and device for optimizing loss function, computer equipment and storage medium
CN111160406A (en) * 2019-12-10 2020-05-15 北京达佳互联信息技术有限公司 Training method of image classification model, and image classification method and device

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114519114A (en) * 2020-11-20 2022-05-20 北京达佳互联信息技术有限公司 Multimedia resource classification model construction method and device, server and storage medium
WO2022174805A1 (en) * 2021-02-22 2022-08-25 上海商汤智能科技有限公司 Model training method and apparatus, image processing method and apparatus, electronic device and storage medium
CN112906805A (en) * 2021-03-03 2021-06-04 杭州海康威视数字技术股份有限公司 Image training sample screening and task model training method and device and electronic equipment
CN113420174A (en) * 2021-05-25 2021-09-21 北京百度网讯科技有限公司 Difficult sample mining method, device, equipment and storage medium
CN113420174B (en) * 2021-05-25 2024-01-09 北京百度网讯科技有限公司 Difficult sample mining method, device, equipment and storage medium
CN113516179A (en) * 2021-06-24 2021-10-19 北京航空航天大学 Method and system for identifying water leakage performance of underground infrastructure
CN113643431A (en) * 2021-08-06 2021-11-12 舵敏智能科技(苏州)有限公司 System and method for iterative optimization of visual algorithm
CN113505859A (en) * 2021-09-06 2021-10-15 浙江太美医疗科技股份有限公司 Model training method and device, and image recognition method and device
CN113505859B (en) * 2021-09-06 2021-12-28 浙江太美医疗科技股份有限公司 Model training method and device, and image recognition method and device
CN114120070A (en) * 2022-01-29 2022-03-01 浙江啄云智能科技有限公司 Image detection method, device, equipment and storage medium
CN114120070B (en) * 2022-01-29 2022-05-10 浙江啄云智能科技有限公司 Image detection method, device, equipment and storage medium
CN114972930A (en) * 2022-08-02 2022-08-30 四川大学 Facial image skin damage labeling method, system, computer device and storage medium

Also Published As

Publication number Publication date
CN111814835B (en) 2024-07-05

Similar Documents

Publication Publication Date Title
CN111814835B (en) Training method and device for computer vision model, electronic equipment and storage medium
CN110569837B (en) Method and device for optimizing damage detection result
US10354392B2 (en) Image guided video semantic object segmentation method and apparatus
CN111340126B (en) Article identification method, apparatus, computer device, and storage medium
CN111428723B (en) Character recognition method and device, electronic equipment and storage medium
CN112070135B (en) Power equipment image detection method and device, power equipment and storage medium
CN111680698A (en) Image recognition method and device and training method and device of image recognition model
CN112766110A (en) Training method of object defect recognition model, object defect recognition method and device
CN109214280A (en) Shop recognition methods, device, electronic equipment and storage medium based on streetscape
EP3929800A1 (en) Skill word evaluation method and device, electronic device, and computer readable medium
CN111310826A (en) Method and device for detecting labeling abnormity of sample set and electronic equipment
CN111461121A (en) Electric meter number identification method based on YO L OV3 network
CN115752683A (en) Weight estimation method, system and terminal based on depth camera
CN115661694B (en) Intelligent detection method and system for light-weight main transformer with focusing key characteristics, storage medium and electronic equipment
CN113282781B (en) Image retrieval method and device
CN114708445A (en) Trademark similarity recognition method and device, electronic equipment and storage medium
CN113468977A (en) Text line language identification method and device and electronic equipment
CN114120097A (en) Distribution network engineering on-site transformer detection method and system based on machine vision
CN113505784A (en) Automatic nail annotation analysis method and device, electronic equipment and storage medium
CN113269052A (en) Price tag identification method, terminal and storage device
CN112131418A (en) Target labeling method, target labeling device and computer-readable storage medium
CN117593890B (en) Detection method and device for road spilled objects, electronic equipment and storage medium
CN116403074B (en) Semi-automatic image labeling method and device based on active labeling
CN115359346B (en) Small micro-space identification method and device based on street view picture and electronic equipment
CN118012921B (en) Man-machine interaction data processing system for intellectual property virtual experiment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant