CN116681997A - Classification method, system, medium and equipment for bad scene images

Classification method, system, medium and equipment for bad scene images

Info

Publication number
CN116681997A
Authority
CN
China
Prior art keywords
image
classified
images
training
detection
Prior art date
Legal status
Granted
Application number
CN202310696005.1A
Other languages
Chinese (zh)
Other versions
CN116681997B (en)
Inventor
常雨喆
Current Assignee
Shumei Tianxia Beijing Technology Co ltd
Beijing Nextdata Times Technology Co ltd
Original Assignee
Shumei Tianxia Beijing Technology Co ltd
Beijing Nextdata Times Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shumei Tianxia Beijing Technology Co ltd, Beijing Nextdata Times Technology Co ltd filed Critical Shumei Tianxia Beijing Technology Co ltd
Priority to CN202310696005.1A
Priority claimed from CN202310696005.1A
Publication of CN116681997A
Application granted
Publication of CN116681997B
Active (current)
Anticipated expiration

Links

Classifications

    • G06V20/35 Categorising the entire scene, e.g. birthday party or wedding scene
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/772 Determining representative reference patterns, e.g. averaging or distorting patterns; Generating dictionaries
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V20/70 Labelling scene content, e.g. deriving syntactic or semantic representations
    • G06V20/36 Indoor scenes
    • G06V2201/07 Target detection
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The application belongs to the field of image recognition, and particularly relates to a classification method, system, medium and device for bad scene images. A dedicated detection model is determined and used as an auxiliary model, and during classification-model training it guides the Cutmix enhancement to generate a semantically richer composite image set that assists the training. The classification model finally obtained is more robust than a model trained with the original Cutmix, the generalization performance and accuracy of the overall classification model are improved, and the content of risk-control images is reviewed more accurately.

Description

Classification method, system, medium and equipment for bad scene images
Technical Field
The application belongs to the field of image recognition, and particularly relates to a classification method, a classification system, a classification medium and classification equipment for bad scene images.
Background
In traditional Cutmix enhancement, the patch cut from the source data is selected at random, so it is quite likely that the selected portion comes entirely from a background area that is unrelated to the label; its contribution to the label of the newly generated image is then negligible. For example, if the source data is an indoor smoking scene and the randomly selected cut region contains only an indoor sofa or other furniture, with no person and no cigarette, pasting that region onto the target data introduces an error into the new label and thereby disturbs the learning and training of the model. As a result, the accuracy of the model obtained from such training is too low.
Disclosure of Invention
The application aims to provide a classification method, a classification system, a classification medium and classification equipment for bad scene images.
The technical scheme for solving the technical problems is as follows: a method of classifying an image of a poor scene, comprising:
acquiring image data corresponding to each piece of label information based on the label information in the label dictionary, and labeling all pieces of image data to generate a labeled image set, wherein the labeled image set comprises labeled images of each piece of image data;
training a detection model according to the labeling image set and the pre-training weight;
acquiring at least two images to be classified whose categories fall within the detection categories of the detection model, and processing each image to be classified through the detection model to obtain a detection result corresponding to each image to be classified;
and combining detection results of any two images to be classified, carrying out Cutmix processing on the two images to be classified to obtain a synthetic image, forming a synthetic image set by all the synthetic images, training a classification model by taking the synthetic image set as a training set to obtain a trained classification model, and carrying out classification processing on the bad scene images through the trained classification model.
The beneficial effects of the application are as follows: a dedicated detection model is determined and used as an auxiliary model, and during classification-model training it guides the Cutmix enhancement to generate a semantically richer composite image set that assists the training. The classification model finally obtained is more robust than a model trained with the original Cutmix, the generalization performance and accuracy of the overall classification model are improved, and the content of risk-control images is reviewed more accurately.
On the basis of the technical scheme, the application can be improved as follows.
Further, the detection result includes the target frame of each identified target, and the process of carrying out Cutmix processing on any two images to be classified in combination with their detection results to obtain a composite image comprises the following steps:
randomly cutting out the local image in a first area corresponding to any target frame in a first image to be classified, selecting the local image of a second area from a preset target frame in a second image to be classified, filling it into the first area, and determining the filled first image to be classified as a composite image,
wherein the relative position area, in the second image to be classified, of the preset target frame does not overlap with the relative position area, in the first image to be classified, of the target frame selected from the first image to be classified.
Further, when no detection result is obtained for the two images to be classified, the process of performing Cutmix processing on the two images to obtain the composite image is as follows:
randomly selecting an arbitrary region in each image to be classified, exchanging local images in the two selected regions, and taking any exchanged image to be classified as a composite image.
Further, the pre-training weights are the weights of a historical detection model obtained by training on the COCO dataset.
The other technical scheme for solving the technical problems is as follows: a classification system for poor scene images, comprising:
the labeling module is used for: acquiring image data corresponding to each piece of label information based on the label information in the label dictionary, and labeling all pieces of image data to generate a labeled image set, wherein the labeled image set comprises labeled images of each piece of image data;
the training module is used for: training a detection model according to the labeling image set and the pre-training weight;
the detection module is used for: acquiring at least two images to be classified whose categories fall within the detection categories of the detection model, and processing each image to be classified through the detection model to obtain a detection result corresponding to each image to be classified;
the classification module is used for: and combining detection results of any two images to be classified, carrying out Cutmix processing on the two images to be classified to obtain a synthetic image, forming a synthetic image set by all the synthetic images, training a classification model by taking the synthetic image set as a training set to obtain a trained classification model, and carrying out classification processing on the bad scene images through the trained classification model.
The beneficial effects of the application are as follows: a dedicated detection model is determined and used as an auxiliary model, and during classification-model training it guides the Cutmix enhancement to generate a semantically richer composite image set that assists the training. The classification model finally obtained is more robust than a model trained with the original Cutmix, the generalization performance and accuracy of the overall classification model are improved, and the content of risk-control images is reviewed more accurately.
Further, the detection result includes the target frame of each identified target, and the process of carrying out Cutmix processing on any two images to be classified in combination with their detection results to obtain a composite image comprises the following steps:
randomly cutting out the local image in a first area corresponding to any target frame in a first image to be classified, selecting the local image of a second area from a preset target frame in a second image to be classified, filling it into the first area, and determining the filled first image to be classified as a composite image,
wherein the relative position area, in the second image to be classified, of the preset target frame does not overlap with the relative position area, in the first image to be classified, of the target frame selected from the first image to be classified.
Further, when no detection result is obtained for the two images to be classified, the process of performing Cutmix processing on the two images to obtain the composite image is as follows:
randomly selecting an arbitrary region in each image to be classified, exchanging local images in the two selected regions, and taking any exchanged image to be classified as a composite image.
Further, the pre-training weights are the weights of a historical detection model obtained by training on the COCO dataset.
The other technical scheme for solving the technical problems is as follows: a storage medium having instructions stored therein which, when read by a computer, cause the computer to perform the method according to any of the above technical schemes.
The beneficial effects of the application are as follows: a dedicated detection model is determined and used as an auxiliary model, and during classification-model training it guides the Cutmix enhancement to generate a semantically richer composite image set that assists the training. The classification model finally obtained is more robust than a model trained with the original Cutmix, the generalization performance and accuracy of the overall classification model are improved, and the content of risk-control images is reviewed more accurately.
The other technical scheme for solving the technical problems is as follows: an electronic device includes the storage medium and a processor executing instructions within the storage medium.
The beneficial effects of the application are as follows: a dedicated detection model is determined and used as an auxiliary model, and during classification-model training it guides the Cutmix enhancement to generate a semantically richer composite image set that assists the training. The classification model finally obtained is more robust than a model trained with the original Cutmix, the generalization performance and accuracy of the overall classification model are improved, and the content of risk-control images is reviewed more accurately.
Drawings
FIG. 1 is a schematic flow chart of an embodiment of a method for classifying images of a poor scene;
FIG. 2 is a block diagram of a classification system for poor scene images according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a classification model framework provided by an embodiment of a classification method for poor scene images according to the present application;
FIG. 4 is a schematic diagram illustrating image data filling provided by an embodiment of a classification method for poor scene images according to the present application;
FIG. 5 is a schematic diagram of target data provided by an embodiment of a classification method for poor scene images according to the present application;
FIG. 6 is a schematic view of source data provided by an embodiment of a classification method for poor scene images according to the present application;
FIG. 7 is a schematic diagram of Cutmix provided by an embodiment of a classification method for poor scene images according to the present application;
FIG. 8 is a schematic diagram of DetCutmix provided by an embodiment of a classification method for poor scene images according to the present application.
Detailed Description
The principles and features of the present application are described below with examples given for the purpose of illustration only and are not intended to limit the scope of the application.
As shown in fig. 1, a method for classifying an image of a poor scene includes:
acquiring image data corresponding to each piece of label information based on the label information in the label dictionary, and labeling all pieces of image data to generate a labeled image set, wherein the labeled image set comprises labeled images of each piece of image data;
training a detection model according to the labeling image set and the pre-training weight;
acquiring at least two images to be classified whose categories fall within the detection categories of the detection model, and processing each image to be classified through the detection model to obtain a detection result corresponding to each image to be classified;
and combining detection results of any two images to be classified, carrying out Cutmix processing on the two images to be classified to obtain a synthetic image, forming a synthetic image set by all the synthetic images, training a classification model by taking the synthetic image set as a training set to obtain a trained classification model, and carrying out classification processing on the bad scene images through the trained classification model.
In some possible embodiments, a dedicated detection model is determined and used as an auxiliary model, and during classification-model training it guides the Cutmix enhancement to generate a semantically richer composite image set that assists the training. The classification model finally obtained is more robust than a model trained with the original Cutmix, the generalization performance and accuracy of the overall classification model are improved, and the content of risk-control images is reviewed more accurately.
The process of acquiring the image data corresponding to each piece of label information based on the label information in the label dictionary, labeling all of the image data, and generating a labeled image set (the labeled image set containing a labeled image for each piece of image data) may be as follows:
and designing a mapping dictionary according to the labels of the bad scene classification models, wherein one classification label corresponds to the labels of the plurality of detection models. Classification Label set is denoted S cls ={l 0 ,l 1 ,...,l n E.g.: the classified label set comprises gambling, health hazard, potential safety hazard and the like, and the detection model label set is marked as S det ={l' 00 ,l' 01 ,...,l' nm E.g.: the detection model tag set includes: chip, table, slot machine, dice, casino, mah-jong, playing card, drinking, smoking, driving and playing mobile phone, etc., classifying a certain label l i Corresponds to k detection labels { l' i0 ,l' i1 ,...,l' ik }. For example, classification labels are: gambling, the detection label corresponding to the classification label comprises: chips, tables, slot machines, dice, casinos, mahjong, and poker. The gambling and the corresponding detection label are used as a subset of the mapping dictionaryThe set of the plurality of subsets is determined as a mapping dictionary.
After the label dictionary is confirmed, the training set for the detection model is obtained by crawling or collecting image data corresponding to the detection labels and annotating it; a contraband detection model capable of identifying these objects is then trained and used to generate the augmented data required for classification-model training. The annotation is done manually. The neural network models designed in this scheme are trained in a supervised manner (a large amount of manually annotated data must be collected for training); the classification model and the detection model are each trained on their own manually annotated data. For the classification model, the data are manually labeled according to the categories to be classified; for example, for a cat-and-dog classification model, each image is manually marked as containing a cat or a dog. For the detection model, the training data are annotated not only with the category but also with a rectangular frame marking the position of each target.
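To make the two annotation formats concrete, two hypothetical records are sketched below; the field names and the [x1, y1, x2, y2] box convention are assumptions made for illustration, not part of the patent.

```python
# Hypothetical annotation records; field names and box format are assumptions.
classification_sample = {
    "image_path": "images/000001.jpg",
    "label": "gambling",              # classification label from S_cls
}

detection_sample = {
    "image_path": "images/000001.jpg",
    "objects": [
        {"label": "playing_card", "bbox": [120, 80, 260, 190]},  # [x1, y1, x2, y2]
        {"label": "table",        "bbox": [0, 150, 640, 480]},
    ],
}
```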
The specific process of training the detection model according to the labeling image set and the pre-training weight is as follows:
the detection model uses an L model of Yolov7, and the pre-training weight used by the detection model is obtained by training weights of a historical model obtained by training the prepared contraband data finetune in the coco data set. The Coco dataset is a published large-scale dataset that can be used for image detection (imaging). It has more than 330K images (220K of which are marked images) containing 150 tens of thousands of objects, 80 object categories (pedestrians, cars, elephants, etc.); deep neural networks often perform training on pre-training weights trained on a public dataset in a customized training task, rather than starting training on randomly initialized weights. The method has the advantages that the model is easy to converge, the training speed is high, and the result is better. The "pretraining weight obtained by coco data set training" is a pretraining weight of a parameter of common neural network training.
At least two images to be classified, whose categories fall within the detection categories of the detection model, are obtained; each image to be classified is processed by the detection model, and the specific process of obtaining the detection result corresponding to each image to be classified is as follows:
because the model for cloud service needs to process massive instant data, the model needs to meet accuracy and even performance at the same time, and after balancing, an acceptance v3 model is selected as a classification model. As shown in fig. 3, when the classification model trains forward propagation, the detection result (each identified target frame and probability) is obtained by the detection model first for the data of each batch (batch processing), and it should be noted that the image data input into the detection model by default is uniform image data with the same size and format, i.e. the sizes, formats, etc. of any two images to be classified are the same.
Combining detection results of any two images to be classified, carrying out Cutmix processing on the two images to be classified to obtain a synthetic image, forming a synthetic image set by all the synthetic images, training a classification model by taking the synthetic image set as a training set to obtain a trained classification model, and carrying out classification processing on bad scene images through the trained classification model, wherein the specific process comprises the following steps of:
An image to be classified is randomly selected from the batch data as the target data x_A with label y_A, and the paste region M_A is selected according to the detection result produced by the detection model. The detection model performs target detection on the input picture according to its detection task. Taking a simple cat-and-dog detection model as an example, for a picture containing a dog the trained model outputs a detected rectangular frame whose top, bottom, left and right edges are the boundaries of the dog in the picture, together with the class score of the dog, i.e. the probability that the identified target is a dog. In this scheme, if a detected image contains objects recognizable by the detection model, such as playing cards or mahjong, the detection model outputs the position of each such object in the image and the probability of the detection label class it belongs to.
That is, after detection by the detection model, any piece of data in the batch, i.e. any image to be classified, is selected as shown in FIG. 5; the detection-model results (whether there are identified targets, their locations and probabilities) are known, and the data itself carries the labeling result of its classification label. Based on these two pieces of information, the region M_A to be pasted over can be selected: a region is randomly selected on the pasted data, and if any target identified by the detection model lies inside that region, the region is re-selected so that all target regions are avoided. (If the data is a black sample of some label, the positions of all detection frames are avoided and the cut region is selected at random; for a white sample the region can be selected at random directly. A black sample is an image to be classified that has a detection result, and a white sample is an image to be classified without a detection result.) One image to be classified is then randomly selected from the remaining images as the source data x_B with label y_B, as shown in FIG. 6, and the paste region M_B is selected according to the detection result produced by the detection model. (For a black sample, the detection-label targets corresponding to the classification label of that data are selected from the detected set, and one target with probability greater than 0.5 is taken at random. To obtain richer semantic information, the cut is made with a margin, i.e. the target frame is randomly expanded outwards, because the detection frame identified by the detection model lies exactly at the edge of the target.) The region M_B is then resized to the size of M_A and pasted into M_A of the target data, and the label of the composite data is y = (1 - γ)·y_A + γ·y_B, where γ is a hyper-parameter between 0 and 1 used to balance the label proportions of the two pictures: values near 0 give more weight to the target label y_A, values near 1 give more weight to the source label y_B, and a value near 0.5 balances the two. An example is shown in FIG. 4. It can be seen that the random selection of Cutmix may pick a background area, as shown in FIG. 7, which interferes with model learning, whereas DetCutmix accurately selects regions containing key targets and avoids pasting them over important regions of the target data.
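The DetCutmix step just described can be summarized in the following sketch. The probability threshold of 0.5, the margin expansion and the label-mixing formula y = (1 - gamma) * y_A + gamma * y_B come from the paragraph above; the helper names, the NumPy/OpenCV representation, the paste-region size range and the retry count are assumptions made for illustration.

```python
import random
import numpy as np
import cv2  # only used to resize the source patch; an implementation assumption

def boxes_overlap(a, b):
    """True if two [x1, y1, x2, y2] boxes share interior area (touching edges do not count)."""
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

def pick_paste_region(h, w, avoid_boxes, tries=50):
    """Randomly pick a region M_A that avoids every detected target box of the target image."""
    for _ in range(tries):
        rh, rw = random.randint(h // 8, h // 2), random.randint(w // 8, w // 2)  # assumed size range
        y, x = random.randint(0, h - rh), random.randint(0, w - rw)
        cand = [x, y, x + rw, y + rh]
        if not any(boxes_overlap(cand, b) for b in avoid_boxes):
            return cand
    return None

def det_cutmix(target_img, y_a, target_boxes, source_img, y_b, source_dets, gamma=0.5, max_margin=0.2):
    """target_boxes: detected [x1, y1, x2, y2] boxes of the target image.
    source_dets: list of (box, probability) for the source image. y_a, y_b: one-hot labels."""
    h, w = target_img.shape[:2]
    # 1. pick the paste region M_A on the target image, avoiding its detected targets
    region = pick_paste_region(h, w, target_boxes) or pick_paste_region(h, w, [])
    x1, y1, x2, y2 = region
    # 2. pick the source region M_B: a detected target with probability > 0.5,
    #    randomly expanded outwards by a margin to capture richer context
    box = random.choice([b for b, p in source_dets if p > 0.5])
    sh, sw = source_img.shape[:2]
    mh = int((box[3] - box[1]) * random.uniform(0, max_margin))
    mw = int((box[2] - box[0]) * random.uniform(0, max_margin))
    sx1, sy1 = max(0, box[0] - mw), max(0, box[1] - mh)
    sx2, sy2 = min(sw, box[2] + mw), min(sh, box[3] + mh)
    patch = source_img[sy1:sy2, sx1:sx2]
    # 3. resize M_B to the size of M_A and paste it into the target image
    mixed = target_img.copy()
    mixed[y1:y2, x1:x2] = cv2.resize(patch, (x2 - x1, y2 - y1))
    # 4. mix the labels: y = (1 - gamma) * y_A + gamma * y_B
    mixed_label = (1.0 - gamma) * np.asarray(y_a, dtype=float) + gamma * np.asarray(y_b, dtype=float)
    return mixed, mixed_label
```

In the actual training flow the batch is first run through the detection model (FIG. 3), and a routine like this, or the random fallback described later, would be applied per image pair before the batch is fed to the classifier.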
As shown in FIG. 8, compared with the Cutmix enhancement, the improved DetCutmix enhancement effectively improves the accuracy of the trained model, improves its robustness and generalization, and also markedly improves the recall of bad-scene data.
Preferably, in any of the foregoing embodiments, the detection result includes the target frame of each identified target, and the process of carrying out Cutmix processing on any two images to be classified in combination with their detection results to obtain a composite image comprises the following steps:
randomly cutting out the local image in a first area corresponding to any target frame in a first image to be classified, selecting the local image of a second area from a preset target frame in a second image to be classified, filling it into the first area, and determining the filled first image to be classified as a composite image,
wherein the relative position area, in the second image to be classified, of the preset target frame does not overlap with the relative position area, in the first image to be classified, of the target frame selected from the first image to be classified.
It should be noted that the detection result further includes the position information of each target frame and the detection probability of each target. The relative position area may be determined from the position information of the target frame in the detection result, and whether two relative position areas overlap may be judged as follows: a blank image with the same size as the image to be classified is acquired, the two relative position areas are drawn in the blank image according to their respective position information, and it is then judged whether the two relative position areas in the blank image have an overlapping part, where overlap of boundary lines alone does not count as overlapping.
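A minimal sketch of this mask-based overlap check follows, under the assumption that each relative position area is given as an [x1, y1, x2, y2] rectangle; filling the rectangles as half-open intervals makes regions that merely share a boundary line count as non-overlapping, as required above.

```python
import numpy as np

def regions_overlap(image_shape, box_a, box_b):
    """Draw both relative position areas on blank masks the size of the image and
    test whether they intersect; regions are filled as half-open [x1, x2) x [y1, y2),
    so areas that only share a boundary line are not treated as overlapping."""
    h, w = image_shape
    mask_a = np.zeros((h, w), dtype=bool)
    mask_b = np.zeros((h, w), dtype=bool)
    mask_a[box_a[1]:box_a[3], box_a[0]:box_a[2]] = True
    mask_b[box_b[1]:box_b[3], box_b[0]:box_b[2]] = True
    return bool(np.any(mask_a & mask_b))

# The same result without allocating masks, via interval arithmetic:
def regions_overlap_fast(box_a, box_b):
    return not (box_a[2] <= box_b[0] or box_b[2] <= box_a[0]
                or box_a[3] <= box_b[1] or box_b[3] <= box_a[1])
```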
Preferably, in any of the above embodiments, when no detection result is obtained for the two images to be classified, the process of performing Cutmix processing on the two images to obtain the composite image is as follows:
randomly selecting an arbitrary region in each image to be classified, exchanging local images in the two selected regions, and taking any exchanged image to be classified as a composite image.
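A minimal sketch of this fallback, which degenerates to an ordinary Cutmix-style region exchange when no detection result is available; constraining the two regions to the same size, so that the exchange is a direct array copy, is an assumption made for simplicity.

```python
import random
import numpy as np

def random_region_swap(img_a, img_b):
    """Pick one random region in each image (same size here, for a direct swap),
    exchange the two local patches, and return both exchanged images; either one
    can be used as the composite image."""
    h = min(img_a.shape[0], img_b.shape[0])
    w = min(img_a.shape[1], img_b.shape[1])
    rh, rw = random.randint(h // 8, h // 2), random.randint(w // 8, w // 2)  # assumed size range
    ya, xa = random.randint(0, img_a.shape[0] - rh), random.randint(0, img_a.shape[1] - rw)
    yb, xb = random.randint(0, img_b.shape[0] - rh), random.randint(0, img_b.shape[1] - rw)
    out_a, out_b = img_a.copy(), img_b.copy()
    out_a[ya:ya + rh, xa:xa + rw] = img_b[yb:yb + rh, xb:xb + rw]
    out_b[yb:yb + rh, xb:xb + rw] = img_a[ya:ya + rh, xa:xa + rw]
    return out_a, out_b
```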
Preferably, in any of the above embodiments, the pre-training weights are the weights of a historical detection model obtained by training on the COCO dataset.
As shown in fig. 2, a classification system for bad scene images includes:
the labeling module 100 is configured to: acquiring image data corresponding to each piece of label information based on the label information in the label dictionary, and labeling all pieces of image data to generate a labeled image set, wherein the labeled image set comprises labeled images of each piece of image data;
the training module 200 is used for: training a detection model according to the labeling image set and the pre-training weight;
the detection module 300 is used for: acquiring at least two images to be classified whose categories fall within the detection categories of the detection model, and processing each image to be classified through the detection model to obtain a detection result corresponding to each image to be classified;
the classification module 400 is configured to: and combining detection results of any two images to be classified, carrying out Cutmix processing on the two images to be classified to obtain a synthetic image, forming a synthetic image set by all the synthetic images, training a classification model by taking the synthetic image set as a training set to obtain a trained classification model, and carrying out classification processing on the bad scene images through the trained classification model.
In some possible embodiments, a dedicated detection model is determined and used as an auxiliary model, and during classification-model training it guides the Cutmix enhancement to generate a semantically richer composite image set that assists the training. The classification model finally obtained is more robust than a model trained with the original Cutmix, the generalization performance and accuracy of the overall classification model are improved, and the content of risk-control images is reviewed more accurately.
Preferably, in any of the foregoing embodiments, the detection result includes the target frame of each identified target, and the process of carrying out Cutmix processing on any two images to be classified in combination with their detection results to obtain a composite image comprises the following steps:
randomly cutting out the local image in a first area corresponding to any target frame in a first image to be classified, selecting the local image of a second area from a preset target frame in a second image to be classified, filling it into the first area, and determining the filled first image to be classified as a composite image,
wherein the relative position area, in the second image to be classified, of the preset target frame does not overlap with the relative position area, in the first image to be classified, of the target frame selected from the first image to be classified.
Preferably, in any of the above embodiments, when no detection result is obtained for the two images to be classified, the process of performing Cutmix processing on the two images to obtain the composite image is as follows:
randomly selecting an arbitrary region in each image to be classified, exchanging local images in the two selected regions, and taking any exchanged image to be classified as a composite image.
Preferably, in any of the above embodiments, the pre-training weights are the weights of a historical detection model obtained by training on the COCO dataset.
The other technical scheme for solving the technical problems is as follows: a storage medium having instructions stored therein which, when read by a computer, cause the computer to perform the classification method for bad scene images according to any of the above embodiments.
In some possible embodiments, a dedicated detection model is determined and used as an auxiliary model, and during classification-model training it guides the Cutmix enhancement to generate a semantically richer composite image set that assists the training. The classification model finally obtained is more robust than a model trained with the original Cutmix, the generalization performance and accuracy of the overall classification model are improved, and the content of risk-control images is reviewed more accurately.
The other technical scheme for solving the technical problems is as follows: an electronic device includes the storage medium and a processor executing instructions within the storage medium.
In some possible embodiments, a dedicated detection model is determined and used as an auxiliary model, and during classification-model training it guides the Cutmix enhancement to generate a semantically richer composite image set that assists the training. The classification model finally obtained is more robust than a model trained with the original Cutmix, the generalization performance and accuracy of the overall classification model are improved, and the content of risk-control images is reviewed more accurately.
The reader will appreciate that in the description of this specification, a description of terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the method embodiments described above are merely illustrative, e.g., the division of steps is merely a logical function division, and there may be additional divisions of actual implementation, e.g., multiple steps may be combined or integrated into another step, or some features may be omitted or not performed.
The above-described method, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The present application is not limited to the above embodiments, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the present application, and these modifications and substitutions are intended to be included in the scope of the present application. Therefore, the protection scope of the application is subject to the protection scope of the claims.

Claims (10)

1. A method of classifying an image of a poor scene, comprising:
acquiring image data corresponding to each piece of label information based on the label information in the label dictionary, and labeling all pieces of image data to generate a labeled image set, wherein the labeled image set comprises labeled images of each piece of image data;
training a detection model according to the labeling image set and the pre-training weight;
acquiring at least two images to be classified whose categories fall within the detection categories of the detection model, and processing each image to be classified through the detection model to obtain a detection result corresponding to each image to be classified;
and combining detection results of any two images to be classified, carrying out Cutmix processing on the two images to be classified to obtain a synthetic image, forming a synthetic image set by all the synthetic images, training a classification model by taking the synthetic image set as a training set to obtain a trained classification model, and carrying out classification processing on the bad scene images through the trained classification model.
2. The method of claim 1, wherein the detection result comprises the target frame of each identified target, and the process of carrying out Cutmix processing on any two images to be classified in combination with their detection results to obtain a composite image comprises the following steps:
randomly cutting out the local image in a first area corresponding to any target frame in a first image to be classified, selecting the local image of a second area from a preset target frame in a second image to be classified, filling it into the first area, and determining the filled first image to be classified as a composite image,
wherein the relative position area, in the second image to be classified, of the preset target frame does not overlap with the relative position area, in the first image to be classified, of the target frame selected from the first image to be classified.
3. The method for classifying an image of a poor scene according to claim 1, wherein when no detection result is obtained for the two images to be classified, the process of performing Cutmix processing on the two images to obtain the composite image is as follows:
randomly selecting an arbitrary region in each image to be classified, exchanging local images in the two selected regions, and taking any exchanged image to be classified as a composite image.
4. The method of claim 1, wherein the pre-training weights are the weights of a historical detection model obtained by training on the COCO dataset.
5. A classification system for poor scene images, comprising:
the labeling module is used for: acquiring image data corresponding to each piece of label information based on the label information in the label dictionary, and labeling all pieces of image data to generate a labeled image set, wherein the labeled image set comprises labeled images of each piece of image data;
the training module is used for: training a detection model according to the labeling image set and the pre-training weight;
the detection module is used for: acquiring at least two images to be classified whose categories fall within the detection categories of the detection model, and processing each image to be classified through the detection model to obtain a detection result corresponding to each image to be classified;
the classification module is used for: and combining detection results of any two images to be classified, carrying out Cutmix processing on the two images to be classified to obtain a synthetic image, forming a synthetic image set by all the synthetic images, training a classification model by taking the synthetic image set as a training set to obtain a trained classification model, and carrying out classification processing on the bad scene images through the trained classification model.
6. The system for classifying an image of a poor scene of claim 5, wherein the detection result comprises the target frame of each identified target, and the process of carrying out Cutmix processing on any two images to be classified in combination with their detection results to obtain a composite image comprises the following steps:
randomly cutting out the local image in a first area corresponding to any target frame in a first image to be classified, selecting the local image of a second area from a preset target frame in a second image to be classified, filling it into the first area, and determining the filled first image to be classified as a composite image,
wherein the relative position area, in the second image to be classified, of the preset target frame does not overlap with the relative position area, in the first image to be classified, of the target frame selected from the first image to be classified.
7. The classification system of claim 5, wherein when no detection result is obtained for the two images to be classified, the process of performing Cutmix processing on the two images to obtain the composite image is as follows:
randomly selecting an arbitrary region in each image to be classified, exchanging local images in the two selected regions, and taking any exchanged image to be classified as a composite image.
8. The system for classifying an image of a poor scene of claim 5, wherein the pre-training weights are the weights of a historical detection model obtained by training on the COCO dataset.
9. A storage medium having stored therein instructions which, when read by a computer, cause the computer to perform the method of any of claims 1 to 4.
10. An electronic device comprising the storage medium of claim 9, a processor executing instructions within the storage medium.
CN202310696005.1A 2023-06-13 Classification method, system, medium and equipment for bad scene images Active CN116681997B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310696005.1A CN116681997B (en) 2023-06-13 Classification method, system, medium and equipment for bad scene images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310696005.1A CN116681997B (en) 2023-06-13 Classification method, system, medium and equipment for bad scene images

Publications (2)

Publication Number Publication Date
CN116681997A 2023-09-01
CN116681997B 2024-05-17


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117095240A (en) * 2023-10-16 2023-11-21 之江实验室 Blade classification method and system based on fine granularity characteristics

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200019816A1 (en) * 2017-03-27 2020-01-16 Shenzhen Institutes Of Advanced Technology Classification method and classification device of indoor scene
CN114387486A (en) * 2022-01-19 2022-04-22 中山大学 Image classification method and device based on continuous learning
CN114494777A (en) * 2022-01-24 2022-05-13 西安电子科技大学 Hyperspectral image classification method and system based on 3D CutMix-transform
CN114708274A (en) * 2021-11-11 2022-07-05 北京工商大学 Image segmentation method and system of T-CutMix data enhancement and three-dimensional convolution neural network based on real-time selection mechanism


Similar Documents

Publication Publication Date Title
CN107169049B (en) Application tag information generation method and device
JP2022541199A (en) A system and method for inserting data into a structured database based on image representations of data tables.
Wilkinson et al. Neural Ctrl-F: segmentation-free query-by-string word spotting in handwritten manuscript collections
Kadam et al. Detection and localization of multiple image splicing using MobileNet V1
CN112800848A (en) Structured extraction method, device and equipment of information after bill identification
CN109902285B (en) Corpus classification method, corpus classification device, computer equipment and storage medium
CN103810274A (en) Multi-feature image tag sorting method based on WordNet semantic similarity
US11380033B2 (en) Text placement within images using neural networks
CN101398846A (en) Image, semantic and concept detection method based on partial color space characteristic
CN112015721A (en) E-commerce platform storage database optimization method based on big data
CN113378815B (en) Scene text positioning and identifying system and training and identifying method thereof
CN116152840A (en) File classification method, apparatus, device and computer storage medium
CN112347997A (en) Test question detection and identification method and device, electronic equipment and medium
CN116824274B (en) Small sample fine granularity image classification method and system
Karatzas et al. An on-line platform for ground truthing and performance evaluation of text extraction systems
CN112613367A (en) Bill information text box acquisition method, system, equipment and storage medium
CN112464957A (en) Method and device for acquiring structured data based on unstructured bid document content
CN116681997B (en) Classification method, system, medium and equipment for bad scene images
CN111881900A (en) Corpus generation, translation model training and translation method, apparatus, device and medium
CN116681997A (en) Classification method, system, medium and equipment for bad scene images
CN114579796B (en) Machine reading understanding method and device
CN113779482B (en) Method and device for generating front-end code
CN114067343A (en) Data set construction method, model training method and corresponding device
CN114818639A (en) Presentation generation method, device, equipment and storage medium
CN114331932A (en) Target image generation method and device, computing equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant