CN116681997A - Classification method, system, medium and equipment for bad scene images

Classification method, system, medium and equipment for bad scene images

Info

Publication number
CN116681997A
Authority
CN
China
Prior art keywords
image
classified
images
training
detection
Prior art date
Legal status
Granted
Application number
CN202310696005.1A
Other languages
Chinese (zh)
Other versions
CN116681997B (en)
Inventor
常雨喆
Current Assignee
Shumei Tianxia Beijing Technology Co ltd
Beijing Nextdata Times Technology Co ltd
Original Assignee
Shumei Tianxia Beijing Technology Co ltd
Beijing Nextdata Times Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shumei Tianxia Beijing Technology Co ltd, Beijing Nextdata Times Technology Co ltd filed Critical Shumei Tianxia Beijing Technology Co ltd
Priority to CN202310696005.1A
Priority claimed from CN202310696005.1A
Publication of CN116681997A
Application granted
Publication of CN116681997B
Active (current)
Anticipated expiration

Links

Classifications

    • G06V20/35 Categorising the entire scene, e.g. birthday party or wedding scene
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/772 Determining representative reference patterns, e.g. averaging or distorting patterns; Generating dictionaries
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V20/70 Labelling scene content, e.g. deriving syntactic or semantic representations
    • G06V20/36 Indoor scenes
    • G06V2201/07 Target detection
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The application belongs to the field of image recognition, and particularly relates to a classification method, system, medium and device for bad scene images. A dedicated detection model is determined and used as an auxiliary model, and during classification-model training it guides the Cutmix enhancement to generate a semantically richer composite image set that assists the training. The classification model finally obtained is more robust than a model trained with the original Cutmix, the generalization performance and accuracy of the overall classification model are improved, and the content of risk-control images is reviewed more accurately.

Description

Classification method, system, medium and equipment for bad scene images
Technical Field
The application belongs to the field of image recognition, and particularly relates to a classification method, a classification system, a classification medium and classification equipment for bad scene images.
Background
In traditional Cutmix enhancement, the patch cut from the source data is selected at random, so it is quite likely that the selected portion comes entirely from a background area that is unrelated to the label; its contribution to the label of the newly generated image is then negligible. For example, if the source data is an indoor smoking scene and the randomly selected cut region contains only an indoor sofa or other furniture, with no person and no cigarette, pasting that region onto the target data introduces an error into the new label and thereby disturbs the learning and training of the model. As a result, the accuracy of the model obtained from such training is too low.
Disclosure of Invention
The application aims to provide a classification method, a classification system, a classification medium and classification equipment for bad scene images.
The technical scheme for solving the technical problems is as follows: a method of classifying an image of a poor scene, comprising:
acquiring image data corresponding to each piece of label information based on the label information in the label dictionary, and labeling all pieces of image data to generate a labeled image set, wherein the labeled image set comprises labeled images of each piece of image data;
training a detection model according to the labeling image set and the pre-training weight;
acquiring at least two images to be classified whose categories fall within the detection categories of the detection model, and processing each image to be classified through the detection model to obtain a detection result corresponding to each image to be classified;
and combining detection results of any two images to be classified, carrying out Cutmix processing on the two images to be classified to obtain a synthetic image, forming a synthetic image set by all the synthetic images, training a classification model by taking the synthetic image set as a training set to obtain a trained classification model, and carrying out classification processing on the bad scene images through the trained classification model.
The beneficial effects of the application are as follows: a dedicated detection model is determined and used as an auxiliary model, and during classification-model training it guides the Cutmix enhancement to generate a semantically richer composite image set that assists the training. The classification model finally obtained is more robust than a model trained with the original Cutmix, the generalization performance and accuracy of the overall classification model are improved, and the content of risk-control images is reviewed more accurately.
On the basis of the technical scheme, the application can be improved as follows.
Further, the detection result includes the target frame of each identified target, and the process of carrying out Cutmix processing on any two images to be classified in combination with their detection results to obtain a composite image comprises the following steps:
randomly cutting out the local image in a first area corresponding to any target frame in a first image to be classified, selecting the local image of a second area from a preset target frame in a second image to be classified, filling it into the first area, and determining the filled first image to be classified as a composite image,
wherein the relative position area, in the second image to be classified, of the preset target frame does not overlap with the relative position area, in the first image to be classified, of the target frame selected from the first image to be classified.
Further, when no detection result is obtained for the two images to be classified, the process of performing Cutmix processing on the two images to obtain the composite image is as follows:
randomly selecting an arbitrary region in each image to be classified, exchanging local images in the two selected regions, and taking any exchanged image to be classified as a composite image.
Further, the pre-training weights are the weights of a historical detection model obtained by training on the COCO dataset.
The other technical scheme for solving the technical problems is as follows: a classification system for poor scene images, comprising:
the labeling module is used for: acquiring image data corresponding to each piece of label information based on the label information in the label dictionary, and labeling all pieces of image data to generate a labeled image set, wherein the labeled image set comprises labeled images of each piece of image data;
the training module is used for: training a detection model according to the labeling image set and the pre-training weight;
the detection module is used for: acquiring at least two images to be classified whose categories fall within the detection categories of the detection model, and processing each image to be classified through the detection model to obtain a detection result corresponding to each image to be classified;
the classification module is used for: and combining detection results of any two images to be classified, carrying out Cutmix processing on the two images to be classified to obtain a synthetic image, forming a synthetic image set by all the synthetic images, training a classification model by taking the synthetic image set as a training set to obtain a trained classification model, and carrying out classification processing on the bad scene images through the trained classification model.
The beneficial effects of the application are as follows: a dedicated detection model is determined and used as an auxiliary model, and during classification-model training it guides the Cutmix enhancement to generate a semantically richer composite image set that assists the training. The classification model finally obtained is more robust than a model trained with the original Cutmix, the generalization performance and accuracy of the overall classification model are improved, and the content of risk-control images is reviewed more accurately.
Further, the detection result includes the target frame of each identified target, and the process of carrying out Cutmix processing on any two images to be classified in combination with their detection results to obtain a composite image comprises the following steps:
randomly cutting out the local image in a first area corresponding to any target frame in a first image to be classified, selecting the local image of a second area from a preset target frame in a second image to be classified, filling it into the first area, and determining the filled first image to be classified as a composite image,
wherein the relative position area, in the second image to be classified, of the preset target frame does not overlap with the relative position area, in the first image to be classified, of the target frame selected from the first image to be classified.
Further, when no detection result is obtained for the two images to be classified, the process of performing Cutmix processing on the two images to obtain the composite image is as follows:
randomly selecting an arbitrary region in each image to be classified, exchanging local images in the two selected regions, and taking any exchanged image to be classified as a composite image.
Further, the pre-training weights are the weights of a historical detection model obtained by training on the COCO dataset.
The other technical scheme for solving the technical problems is as follows: a storage medium having instructions stored therein which, when read by a computer, cause the computer to perform the method according to any of the above technical schemes.
The beneficial effects of the application are as follows: a dedicated detection model is determined and used as an auxiliary model, and during classification-model training it guides the Cutmix enhancement to generate a semantically richer composite image set that assists the training. The classification model finally obtained is more robust than a model trained with the original Cutmix, the generalization performance and accuracy of the overall classification model are improved, and the content of risk-control images is reviewed more accurately.
The other technical scheme for solving the technical problems is as follows: an electronic device includes the storage medium and a processor executing instructions within the storage medium.
The beneficial effects of the application are as follows: a dedicated detection model is determined and used as an auxiliary model, and during classification-model training it guides the Cutmix enhancement to generate a semantically richer composite image set that assists the training. The classification model finally obtained is more robust than a model trained with the original Cutmix, the generalization performance and accuracy of the overall classification model are improved, and the content of risk-control images is reviewed more accurately.
Drawings
FIG. 1 is a schematic flow chart of an embodiment of a method for classifying images of a poor scene;
FIG. 2 is a block diagram of a classification system for poor scene images according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a classification model framework provided by an embodiment of a classification method for poor scene images according to the present application;
FIG. 4 is a schematic diagram illustrating image data filling provided by an embodiment of a classification method for poor scene images according to the present application;
FIG. 5 is a schematic diagram of target data provided by an embodiment of a classification method for poor scene images according to the present application;
FIG. 6 is a schematic view of source data provided by an embodiment of a classification method for poor scene images according to the present application;
FIG. 7 is a schematic diagram of Cutmix provided by an embodiment of a classification method for poor scene images according to the present application;
FIG. 8 is a schematic diagram of DetCutmix provided by an embodiment of a classification method for poor scene images according to the present application.
Detailed Description
The principles and features of the present application are described below with examples given for the purpose of illustration only and are not intended to limit the scope of the application.
As shown in fig. 1, a method for classifying an image of a poor scene includes:
acquiring image data corresponding to each piece of label information based on the label information in the label dictionary, and labeling all pieces of image data to generate a labeled image set, wherein the labeled image set comprises labeled images of each piece of image data;
training a detection model according to the labeling image set and the pre-training weight;
acquiring at least two images to be classified whose categories fall within the detection categories of the detection model, and processing each image to be classified through the detection model to obtain a detection result corresponding to each image to be classified;
and combining detection results of any two images to be classified, carrying out Cutmix processing on the two images to be classified to obtain a synthetic image, forming a synthetic image set by all the synthetic images, training a classification model by taking the synthetic image set as a training set to obtain a trained classification model, and carrying out classification processing on the bad scene images through the trained classification model.
In some possible embodiments, a dedicated detection model is determined and used as an auxiliary model, and during classification-model training it guides the Cutmix enhancement to generate a semantically richer composite image set that assists the training. The classification model finally obtained is more robust than a model trained with the original Cutmix, the generalization performance and accuracy of the overall classification model are improved, and the content of risk-control images is reviewed more accurately.
The process of acquiring the image data corresponding to each piece of label information based on the label information in the label dictionary, labeling all of the image data, and generating a labeled image set (the labeled image set containing a labeled image for each piece of image data) may be as follows:
and designing a mapping dictionary according to the labels of the bad scene classification models, wherein one classification label corresponds to the labels of the plurality of detection models. Classification Label set is denoted S cls ={l 0 ,l 1 ,...,l n E.g.: the classified label set comprises gambling, health hazard, potential safety hazard and the like, and the detection model label set is marked as S det ={l' 00 ,l' 01 ,...,l' nm E.g.: the detection model tag set includes: chip, table, slot machine, dice, casino, mah-jong, playing card, drinking, smoking, driving and playing mobile phone, etc., classifying a certain label l i Corresponds to k detection labels { l' i0 ,l' i1 ,...,l' ik }. For example, classification labels are: gambling, the detection label corresponding to the classification label comprises: chips, tables, slot machines, dice, casinos, mahjong, and poker. The gambling and the corresponding detection label are used as a subset of the mapping dictionaryThe set of the plurality of subsets is determined as a mapping dictionary.
After the label dictionary is confirmed, the training set for the detection model is obtained by crawling or collecting image data corresponding to the detection labels and annotating it; a contraband detection model capable of identifying these objects is then trained and used to generate the augmented data required for classification-model training. The annotation is done manually. The neural network models designed in this scheme are trained in a supervised manner (a large amount of manually annotated data must be collected for training); the classification model and the detection model are each trained on their own manually annotated data. For the classification model, the data are manually labeled according to the categories to be classified; for example, for a cat-and-dog classification model, each image is manually marked as containing a cat or a dog. For the detection model, the training data are annotated not only with the category but also with a rectangular frame marking the position of each target.
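To make the two annotation formats concrete, two hypothetical records are sketched below; the field names and the [x1, y1, x2, y2] box convention are assumptions made for illustration, not part of the patent.

```python
# Hypothetical annotation records; field names and box format are assumptions.
classification_sample = {
    "image_path": "images/000001.jpg",
    "label": "gambling",              # classification label from S_cls
}

detection_sample = {
    "image_path": "images/000001.jpg",
    "objects": [
        {"label": "playing_card", "bbox": [120, 80, 260, 190]},  # [x1, y1, x2, y2]
        {"label": "table",        "bbox": [0, 150, 640, 480]},
    ],
}
```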
The specific process of training the detection model according to the labeling image set and the pre-training weight is as follows:
the detection model uses an L model of Yolov7, and the pre-training weight used by the detection model is obtained by training weights of a historical model obtained by training the prepared contraband data finetune in the coco data set. The Coco dataset is a published large-scale dataset that can be used for image detection (imaging). It has more than 330K images (220K of which are marked images) containing 150 tens of thousands of objects, 80 object categories (pedestrians, cars, elephants, etc.); deep neural networks often perform training on pre-training weights trained on a public dataset in a customized training task, rather than starting training on randomly initialized weights. The method has the advantages that the model is easy to converge, the training speed is high, and the result is better. The "pretraining weight obtained by coco data set training" is a pretraining weight of a parameter of common neural network training.
At least two images to be classified, whose categories fall within the detection categories of the detection model, are obtained; each image to be classified is processed by the detection model, and the specific process of obtaining the detection result corresponding to each image to be classified is as follows:
because the model for cloud service needs to process massive instant data, the model needs to meet accuracy and even performance at the same time, and after balancing, an acceptance v3 model is selected as a classification model. As shown in fig. 3, when the classification model trains forward propagation, the detection result (each identified target frame and probability) is obtained by the detection model first for the data of each batch (batch processing), and it should be noted that the image data input into the detection model by default is uniform image data with the same size and format, i.e. the sizes, formats, etc. of any two images to be classified are the same.
Combining detection results of any two images to be classified, carrying out Cutmix processing on the two images to be classified to obtain a synthetic image, forming a synthetic image set by all the synthetic images, training a classification model by taking the synthetic image set as a training set to obtain a trained classification model, and carrying out classification processing on bad scene images through the trained classification model, wherein the specific process comprises the following steps of:
An image to be classified is randomly selected from the batch data as the target data x_A with label y_A, and the paste region M_A is selected according to the detection result produced by the detection model. The detection model performs target detection on the input picture according to its detection task. Taking a simple cat-and-dog detection model as an example, for a picture containing a dog the trained model outputs a detected rectangular frame whose top, bottom, left and right edges are the boundaries of the dog in the picture, together with the class score of the dog, i.e. the probability that the identified target is a dog. In this scheme, if a detected image contains objects recognizable by the detection model, such as playing cards or mahjong, the detection model outputs the position of each such object in the image and the probability of the detection label class it belongs to.
That is, after detection by the detection model, any piece of data in the batch, i.e. any image to be classified, is selected as shown in FIG. 5; the detection-model results (whether there are identified targets, their locations and probabilities) are known, and the data itself carries the labeling result of its classification label. Based on these two pieces of information, the region M_A to be pasted over can be selected: a region is randomly selected on the pasted data, and if any target identified by the detection model lies inside that region, the region is re-selected so that all target regions are avoided. (If the data is a black sample of some label, the positions of all detection frames are avoided and the cut region is selected at random; for a white sample the region can be selected at random directly. A black sample is an image to be classified that has a detection result, and a white sample is an image to be classified without a detection result.) One image to be classified is then randomly selected from the remaining images as the source data x_B with label y_B, as shown in FIG. 6, and the paste region M_B is selected according to the detection result produced by the detection model. (For a black sample, the detection-label targets corresponding to the classification label of that data are selected from the detected set, and one target with probability greater than 0.5 is taken at random. To obtain richer semantic information, the cut is made with a margin, i.e. the target frame is randomly expanded outwards, because the detection frame identified by the detection model lies exactly at the edge of the target.) The region M_B is then resized to the size of M_A and pasted into M_A of the target data, and the label of the composite data is y = (1 - γ)·y_A + γ·y_B, where γ is a hyper-parameter between 0 and 1 used to balance the label proportions of the two pictures: values near 0 give more weight to the target label y_A, values near 1 give more weight to the source label y_B, and a value near 0.5 balances the two. An example is shown in FIG. 4. It can be seen that the random selection of Cutmix may pick a background area, as shown in FIG. 7, which interferes with model learning, whereas DetCutmix accurately selects regions containing key targets and avoids pasting them over important regions of the target data.
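The DetCutmix step just described can be summarized in the following sketch. The probability threshold of 0.5, the margin expansion and the label-mixing formula y = (1 - gamma) * y_A + gamma * y_B come from the paragraph above; the helper names, the NumPy/OpenCV representation, the paste-region size range and the retry count are assumptions made for illustration.

```python
import random
import numpy as np
import cv2  # only used to resize the source patch; an implementation assumption

def boxes_overlap(a, b):
    """True if two [x1, y1, x2, y2] boxes share interior area (touching edges do not count)."""
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

def pick_paste_region(h, w, avoid_boxes, tries=50):
    """Randomly pick a region M_A that avoids every detected target box of the target image."""
    for _ in range(tries):
        rh, rw = random.randint(h // 8, h // 2), random.randint(w // 8, w // 2)  # assumed size range
        y, x = random.randint(0, h - rh), random.randint(0, w - rw)
        cand = [x, y, x + rw, y + rh]
        if not any(boxes_overlap(cand, b) for b in avoid_boxes):
            return cand
    return None

def det_cutmix(target_img, y_a, target_boxes, source_img, y_b, source_dets, gamma=0.5, max_margin=0.2):
    """target_boxes: detected [x1, y1, x2, y2] boxes of the target image.
    source_dets: list of (box, probability) for the source image. y_a, y_b: one-hot labels."""
    h, w = target_img.shape[:2]
    # 1. pick the paste region M_A on the target image, avoiding its detected targets
    region = pick_paste_region(h, w, target_boxes) or pick_paste_region(h, w, [])
    x1, y1, x2, y2 = region
    # 2. pick the source region M_B: a detected target with probability > 0.5,
    #    randomly expanded outwards by a margin to capture richer context
    box = random.choice([b for b, p in source_dets if p > 0.5])
    sh, sw = source_img.shape[:2]
    mh = int((box[3] - box[1]) * random.uniform(0, max_margin))
    mw = int((box[2] - box[0]) * random.uniform(0, max_margin))
    sx1, sy1 = max(0, box[0] - mw), max(0, box[1] - mh)
    sx2, sy2 = min(sw, box[2] + mw), min(sh, box[3] + mh)
    patch = source_img[sy1:sy2, sx1:sx2]
    # 3. resize M_B to the size of M_A and paste it into the target image
    mixed = target_img.copy()
    mixed[y1:y2, x1:x2] = cv2.resize(patch, (x2 - x1, y2 - y1))
    # 4. mix the labels: y = (1 - gamma) * y_A + gamma * y_B
    mixed_label = (1.0 - gamma) * np.asarray(y_a, dtype=float) + gamma * np.asarray(y_b, dtype=float)
    return mixed, mixed_label
```

In the actual training flow the batch is first run through the detection model (FIG. 3), and a routine like this, or the random fallback described later, would be applied per image pair before the batch is fed to the classifier.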
As shown in FIG. 8, compared with the Cutmix enhancement, the improved DetCutmix enhancement effectively improves the accuracy of the trained model, improves its robustness and generalization, and also markedly improves the recall of bad-scene data.
Preferably, in any of the foregoing embodiments, the detection result includes the target frame of each identified target, and the process of carrying out Cutmix processing on any two images to be classified in combination with their detection results to obtain a composite image comprises the following steps:
randomly cutting out the local image in a first area corresponding to any target frame in a first image to be classified, selecting the local image of a second area from a preset target frame in a second image to be classified, filling it into the first area, and determining the filled first image to be classified as a composite image,
wherein the relative position area, in the second image to be classified, of the preset target frame does not overlap with the relative position area, in the first image to be classified, of the target frame selected from the first image to be classified.
It should be noted that the detection result further includes the position information of each target frame and the detection probability of each target. The relative position area may be determined from the position information of the target frame in the detection result, and whether two relative position areas overlap may be judged as follows: a blank image with the same size as the image to be classified is acquired, the two relative position areas are drawn in the blank image according to their respective position information, and it is then judged whether the two relative position areas in the blank image have an overlapping part, where overlap of boundary lines alone does not count as overlapping.
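A minimal sketch of this mask-based overlap check follows, under the assumption that each relative position area is given as an [x1, y1, x2, y2] rectangle; filling the rectangles as half-open intervals makes regions that merely share a boundary line count as non-overlapping, as required above.

```python
import numpy as np

def regions_overlap(image_shape, box_a, box_b):
    """Draw both relative position areas on blank masks the size of the image and
    test whether they intersect; regions are filled as half-open [x1, x2) x [y1, y2),
    so areas that only share a boundary line are not treated as overlapping."""
    h, w = image_shape
    mask_a = np.zeros((h, w), dtype=bool)
    mask_b = np.zeros((h, w), dtype=bool)
    mask_a[box_a[1]:box_a[3], box_a[0]:box_a[2]] = True
    mask_b[box_b[1]:box_b[3], box_b[0]:box_b[2]] = True
    return bool(np.any(mask_a & mask_b))

# The same result without allocating masks, via interval arithmetic:
def regions_overlap_fast(box_a, box_b):
    return not (box_a[2] <= box_b[0] or box_b[2] <= box_a[0]
                or box_a[3] <= box_b[1] or box_b[3] <= box_a[1])
```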
Preferably, in any of the above embodiments, when no detection result is obtained for the two images to be classified, the process of performing Cutmix processing on the two images to obtain the composite image is as follows:
randomly selecting an arbitrary region in each image to be classified, exchanging local images in the two selected regions, and taking any exchanged image to be classified as a composite image.
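A minimal sketch of this fallback, which degenerates to an ordinary Cutmix-style region exchange when no detection result is available; constraining the two regions to the same size, so that the exchange is a direct array copy, is an assumption made for simplicity.

```python
import random
import numpy as np

def random_region_swap(img_a, img_b):
    """Pick one random region in each image (same size here, for a direct swap),
    exchange the two local patches, and return both exchanged images; either one
    can be used as the composite image."""
    h = min(img_a.shape[0], img_b.shape[0])
    w = min(img_a.shape[1], img_b.shape[1])
    rh, rw = random.randint(h // 8, h // 2), random.randint(w // 8, w // 2)  # assumed size range
    ya, xa = random.randint(0, img_a.shape[0] - rh), random.randint(0, img_a.shape[1] - rw)
    yb, xb = random.randint(0, img_b.shape[0] - rh), random.randint(0, img_b.shape[1] - rw)
    out_a, out_b = img_a.copy(), img_b.copy()
    out_a[ya:ya + rh, xa:xa + rw] = img_b[yb:yb + rh, xb:xb + rw]
    out_b[yb:yb + rh, xb:xb + rw] = img_a[ya:ya + rh, xa:xa + rw]
    return out_a, out_b
```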
Preferably, in any of the above embodiments, the pre-training weights are the weights of a historical detection model obtained by training on the COCO dataset.
As shown in fig. 2, a classification system for bad scene images includes:
the labeling module 100 is configured to: acquiring image data corresponding to each piece of label information based on the label information in the label dictionary, and labeling all pieces of image data to generate a labeled image set, wherein the labeled image set comprises labeled images of each piece of image data;
the training module 200 is used for: training a detection model according to the labeling image set and the pre-training weight;
the detection module 300 is used for: acquiring at least two images to be classified whose categories fall within the detection categories of the detection model, and processing each image to be classified through the detection model to obtain a detection result corresponding to each image to be classified;
the classification module 400 is configured to: and combining detection results of any two images to be classified, carrying out Cutmix processing on the two images to be classified to obtain a synthetic image, forming a synthetic image set by all the synthetic images, training a classification model by taking the synthetic image set as a training set to obtain a trained classification model, and carrying out classification processing on the bad scene images through the trained classification model.
In some possible embodiments, a dedicated detection model is determined and used as an auxiliary model, and during classification-model training it guides the Cutmix enhancement to generate a semantically richer composite image set that assists the training. The classification model finally obtained is more robust than a model trained with the original Cutmix, the generalization performance and accuracy of the overall classification model are improved, and the content of risk-control images is reviewed more accurately.
Preferably, in any of the foregoing embodiments, the detection result includes the target frame of each identified target, and the process of carrying out Cutmix processing on any two images to be classified in combination with their detection results to obtain a composite image comprises the following steps:
randomly cutting out the local image in a first area corresponding to any target frame in a first image to be classified, selecting the local image of a second area from a preset target frame in a second image to be classified, filling it into the first area, and determining the filled first image to be classified as a composite image,
wherein the relative position area, in the second image to be classified, of the preset target frame does not overlap with the relative position area, in the first image to be classified, of the target frame selected from the first image to be classified.
Preferably, in any of the above embodiments, when no detection result is obtained for the two images to be classified, the process of performing Cutmix processing on the two images to obtain the composite image is as follows:
randomly selecting an arbitrary region in each image to be classified, exchanging local images in the two selected regions, and taking any exchanged image to be classified as a composite image.
Preferably, in any of the above embodiments, the pre-training weights are the weights of a historical detection model obtained by training on the COCO dataset.
The other technical scheme for solving the technical problems is as follows: a storage medium having instructions stored therein which, when read by a computer, cause the computer to perform the classification method for bad scene images according to any of the above embodiments.
In some possible embodiments, a dedicated detection model is determined and used as an auxiliary model, and during classification-model training it guides the Cutmix enhancement to generate a semantically richer composite image set that assists the training. The classification model finally obtained is more robust than a model trained with the original Cutmix, the generalization performance and accuracy of the overall classification model are improved, and the content of risk-control images is reviewed more accurately.
The other technical scheme for solving the technical problems is as follows: an electronic device includes the storage medium and a processor executing instructions within the storage medium.
In some possible embodiments, a dedicated detection model is determined and used as an auxiliary model, and during classification-model training it guides the Cutmix enhancement to generate a semantically richer composite image set that assists the training. The classification model finally obtained is more robust than a model trained with the original Cutmix, the generalization performance and accuracy of the overall classification model are improved, and the content of risk-control images is reviewed more accurately.
The reader will appreciate that in the description of this specification, a description of terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the method embodiments described above are merely illustrative, e.g., the division of steps is merely a logical function division, and there may be additional divisions of actual implementation, e.g., multiple steps may be combined or integrated into another step, or some features may be omitted or not performed.
The above-described method, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The present application is not limited to the above embodiments, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the present application, and these modifications and substitutions are intended to be included in the scope of the present application. Therefore, the protection scope of the application is subject to the protection scope of the claims.

Claims (10)

1. A method of classifying an image of a poor scene, comprising:
acquiring image data corresponding to each piece of label information based on the label information in the label dictionary, and labeling all pieces of image data to generate a labeled image set, wherein the labeled image set comprises labeled images of each piece of image data;
training a detection model according to the labeling image set and the pre-training weight;
acquiring at least two images to be classified whose categories fall within the detection categories of the detection model, and processing each image to be classified through the detection model to obtain a detection result corresponding to each image to be classified;
and combining detection results of any two images to be classified, carrying out Cutmix processing on the two images to be classified to obtain a synthetic image, forming a synthetic image set by all the synthetic images, training a classification model by taking the synthetic image set as a training set to obtain a trained classification model, and carrying out classification processing on the bad scene images through the trained classification model.
2. The method of claim 1, wherein the detection result comprises the target frame of each identified target, and the process of carrying out Cutmix processing on any two images to be classified in combination with their detection results to obtain a composite image comprises the following steps:
randomly cutting out the local image in a first area corresponding to any target frame in a first image to be classified, selecting the local image of a second area from a preset target frame in a second image to be classified, filling it into the first area, and determining the filled first image to be classified as a composite image,
wherein the relative position area, in the second image to be classified, of the preset target frame does not overlap with the relative position area, in the first image to be classified, of the target frame selected from the first image to be classified.
3. The method for classifying an image of a poor scene according to claim 1, wherein when no detection result is obtained for the two images to be classified, the process of performing Cutmix processing on the two images to obtain the composite image is as follows:
randomly selecting an arbitrary region in each image to be classified, exchanging local images in the two selected regions, and taking any exchanged image to be classified as a composite image.
4. The method of claim 1, wherein the pre-training weights are the weights of a historical detection model obtained by training on the COCO dataset.
5. A classification system for poor scene images, comprising:
the labeling module is used for: acquiring image data corresponding to each piece of label information based on the label information in the label dictionary, and labeling all pieces of image data to generate a labeled image set, wherein the labeled image set comprises labeled images of each piece of image data;
the training module is used for: training a detection model according to the labeling image set and the pre-training weight;
the detection module is used for: acquiring at least two images to be classified whose categories fall within the detection categories of the detection model, and processing each image to be classified through the detection model to obtain a detection result corresponding to each image to be classified;
the classification module is used for: and combining detection results of any two images to be classified, carrying out Cutmix processing on the two images to be classified to obtain a synthetic image, forming a synthetic image set by all the synthetic images, training a classification model by taking the synthetic image set as a training set to obtain a trained classification model, and carrying out classification processing on the bad scene images through the trained classification model.
6. The system for classifying an image of a poor scene of claim 5, wherein the detection result comprises the target frame of each identified target, and the process of carrying out Cutmix processing on any two images to be classified in combination with their detection results to obtain a composite image comprises the following steps:
randomly cutting out the local image in a first area corresponding to any target frame in a first image to be classified, selecting the local image of a second area from a preset target frame in a second image to be classified, filling it into the first area, and determining the filled first image to be classified as a composite image,
wherein the relative position area, in the second image to be classified, of the preset target frame does not overlap with the relative position area, in the first image to be classified, of the target frame selected from the first image to be classified.
7. The classification system of claim 5, wherein when no detection result is obtained for the two images to be classified, the process of performing Cutmix processing on the two images to obtain the composite image is as follows:
randomly selecting an arbitrary region in each image to be classified, exchanging local images in the two selected regions, and taking any exchanged image to be classified as a composite image.
8. The system for classifying an image of a poor scene of claim 5, wherein the pre-training weights are the weights of a historical detection model obtained by training on the COCO dataset.
9. A storage medium having stored therein instructions which, when read by a computer, cause the computer to perform the method of any of claims 1 to 4.
10. An electronic device comprising the storage medium of claim 9, a processor executing instructions within the storage medium.
CN202310696005.1A 2023-06-13 Classification method, system, medium and equipment for bad scene images Active CN116681997B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310696005.1A CN116681997B (en) 2023-06-13 Classification method, system, medium and equipment for bad scene images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310696005.1A CN116681997B (en) 2023-06-13 Classification method, system, medium and equipment for bad scene images

Publications (2)

Publication Number Publication Date
CN116681997A 2023-09-01
CN116681997B 2024-05-17


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117095240A (en) * 2023-10-16 2023-11-21 之江实验室 Blade classification method and system based on fine granularity characteristics

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200019816A1 (en) * 2017-03-27 2020-01-16 Shenzhen Institutes Of Advanced Technology Classification method and classification device of indoor scene
CN114387486A (en) * 2022-01-19 2022-04-22 中山大学 Image classification method and device based on continuous learning
CN114494777A (en) * 2022-01-24 2022-05-13 西安电子科技大学 Hyperspectral image classification method and system based on 3D CutMix-transform
CN114708274A (en) * 2021-11-11 2022-07-05 北京工商大学 Image segmentation method and system of T-CutMix data enhancement and three-dimensional convolution neural network based on real-time selection mechanism


Similar Documents

Publication Publication Date Title
CN107169049B (en) Application tag information generation method and device
JP2022541199A (en) A system and method for inserting data into a structured database based on image representations of data tables.
Wilkinson et al. Neural Ctrl-F: segmentation-free query-by-string word spotting in handwritten manuscript collections
Kadam et al. Detection and localization of multiple image splicing using MobileNet V1
CN112800848A (en) Structured extraction method, device and equipment of information after bill identification
CN109902285B (en) Corpus classification method, corpus classification device, computer equipment and storage medium
CN103810274A (en) Multi-feature image tag sorting method based on WordNet semantic similarity
US11380033B2 (en) Text placement within images using neural networks
CN101398846A (en) Image, semantic and concept detection method based on partial color space characteristic
CN112015721A (en) E-commerce platform storage database optimization method based on big data
CN113378815B (en) Scene text positioning and identifying system and training and identifying method thereof
CN116152840A (en) File classification method, apparatus, device and computer storage medium
CN112347997A (en) Test question detection and identification method and device, electronic equipment and medium
CN116824274B (en) Small sample fine granularity image classification method and system
Karatzas et al. An on-line platform for ground truthing and performance evaluation of text extraction systems
CN112613367A (en) Bill information text box acquisition method, system, equipment and storage medium
CN112464957A (en) Method and device for acquiring structured data based on unstructured bid document content
CN116681997B (en) Classification method, system, medium and equipment for bad scene images
CN111881900A (en) Corpus generation, translation model training and translation method, apparatus, device and medium
CN116681997A (en) Classification method, system, medium and equipment for bad scene images
CN114579796B (en) Machine reading understanding method and device
CN113779482B (en) Method and device for generating front-end code
CN114067343A (en) Data set construction method, model training method and corresponding device
CN114818639A (en) Presentation generation method, device, equipment and storage medium
CN114331932A (en) Target image generation method and device, computing equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant