CN117953350A - Diaper image detection method based on improved YOLOv7 network model - Google Patents
Diaper image detection method based on improved YOLOv7 network model
- Publication number
- CN117953350A CN117953350A CN202410350445.6A CN202410350445A CN117953350A CN 117953350 A CN117953350 A CN 117953350A CN 202410350445 A CN202410350445 A CN 202410350445A CN 117953350 A CN117953350 A CN 117953350A
- Authority
- CN
- China
- Prior art keywords
- diaper
- yolov
- feature
- prediction
- network model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000001514 detection method Methods 0.000 title claims abstract description 56
- 238000012549 training Methods 0.000 claims abstract description 54
- 238000012360 testing method Methods 0.000 claims abstract description 24
- 238000013507 mapping Methods 0.000 claims abstract description 7
- 238000007781 pre-processing Methods 0.000 claims abstract description 6
- 238000000034 method Methods 0.000 claims description 38
- 230000004927 fusion Effects 0.000 claims description 29
- 238000012795 verification Methods 0.000 claims description 22
- 238000004422 calculation algorithm Methods 0.000 claims description 13
- 230000008569 process Effects 0.000 claims description 13
- 238000001914 filtration Methods 0.000 claims description 11
- 238000002372 labelling Methods 0.000 claims description 10
- 230000007246 mechanism Effects 0.000 claims description 10
- 238000004364 calculation method Methods 0.000 claims description 9
- 230000002776 aggregation Effects 0.000 claims description 6
- 238000004220 aggregation Methods 0.000 claims description 6
- 230000002708 enhancing effect Effects 0.000 claims description 6
- 238000005070 sampling Methods 0.000 claims description 5
- 230000006870 function Effects 0.000 claims description 4
- 230000004913 activation Effects 0.000 claims description 3
- 238000000605 extraction Methods 0.000 claims description 3
- 238000005457 optimization Methods 0.000 claims description 3
- 238000012163 sequencing technique Methods 0.000 claims description 3
- 230000001629 suppression Effects 0.000 claims description 3
- 238000012545 processing Methods 0.000 abstract description 3
- 241000894007 species Species 0.000 description 5
- 238000010586 diagram Methods 0.000 description 4
- 230000006872 improvement Effects 0.000 description 4
- 238000013135 deep learning Methods 0.000 description 3
- 238000013527 convolutional neural network Methods 0.000 description 2
- 238000013461 design Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 238000005516 engineering process Methods 0.000 description 2
- 238000004088 simulation Methods 0.000 description 2
- 241001465754 Metazoa Species 0.000 description 1
- 238000010521 absorption reaction Methods 0.000 description 1
- 230000001133 acceleration Effects 0.000 description 1
- 238000000137 annealing Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000002457 bidirectional effect Effects 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000012217 deletion Methods 0.000 description 1
- 230000037430 deletion Effects 0.000 description 1
- 238000013210 evaluation model Methods 0.000 description 1
- 230000036541 health Effects 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 238000012567 pattern recognition method Methods 0.000 description 1
- 238000000746 purification Methods 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0004—Industrial image inspection
- G06T7/0008—Industrial image inspection checking presence/absence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/42—Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/52—Scale-space analysis, e.g. wavelet analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/762—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
- G06V10/763—Non-hierarchical techniques, e.g. based on statistics of modelling distributions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30108—Industrial image inspection
- G06T2207/30124—Fabrics; Textile; Paper
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biodiversity & Conservation Biology (AREA)
- Probability & Statistics with Applications (AREA)
- Quality & Reliability (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a diaper image detection method based on an improved YOLOv7 network model, belonging to the field of image detection and comprising the following steps: S1, collecting a diaper image data set and preprocessing it; S2, training the YOLOv7 network model; S3, testing: inputting the diaper images in the test set into the trained YOLOv7 network model, predicting all diaper images in the test set with the YOLOv7 network model to obtain prediction frames on the prediction feature maps, mapping the prediction frames onto the original images, and locating the patterns on the diapers to obtain the final diaper image detection result. In the diaper image detection method based on the improved YOLOv7 network model, preprocessing the acquired images reduces the subsequent data processing load and increases the detection speed, while the improved YOLOv7 network model improves the detection accuracy and robustness.
Description
Technical Field
The invention relates to the technical field of image detection, and in particular to a diaper image detection method based on an improved YOLOv7 network model.
Background
Paper diapers are important products in the field of modern infant and adult care, the comfort and absorption properties of which are directly related to the health and quality of life of the user. Diapers generally have a variety of designs, such as animals, cartoon characters, etc., which not only increase the attractiveness of the product, but also attract the attention of the baby, making them more receptive to the use of the diaper.
After a diaper is produced, the image printed on it needs to be inspected to determine whether any pattern is missing. Current diaper image detection methods rely mainly on deep-learning-based image processing techniques, among which the YOLO (You Only Look Once) family of algorithms has attracted attention for being fast and accurate. YOLOv7 is an improved version of the YOLO algorithm; it uses an end-to-end convolutional neural network architecture that can detect and locate objects in images in real time.
The prior art discloses the following techniques for image detection and recognition using YOLOv7:
CN202211564267.4 discloses a bird identification method, system and medium based on an improved YOLOv7. The method steps include: acquiring a first image set, where the first image set consists of historical images of birds in flight or historical images of partially occluded birds; performing motion-blur data enhancement on the first image set to obtain a second image set; constructing a YOLOv7 model and adding a parameter-free attention mechanism to it to obtain an improved YOLOv7 model; training the improved YOLOv7 model on the second image set to obtain an optimal improved YOLOv7 model; and identifying newly acquired bird images with the optimal improved YOLOv7 model to obtain the bird species.
CN202211397515.0 discloses a natural tree species identification method based on an improved YOLOv7, comprising the following steps: acquiring natural tree species images, which include training images and test images; performing data enhancement on the training images using the Mosaic data enhancement method to obtain enhanced training images; constructing a YOLOv7 network and improving its structure to obtain an improved YOLOv7 model, which comprises a backbone network, a detection-head layer network, an attention mechanism module, Rep and Conv, and outputs four feature maps of different sizes through the detection-head layer network; training the improved YOLOv7 model on the training images; and inputting the test-set images into the trained improved YOLOv7 model to obtain the identification result of the natural tree species.
As can be seen, the conventional YOLOv7-based pattern recognition and detection methods have the following defects:
1. The whole acquired image must be preprocessed, which increases the amount of computation and reduces the detection speed;
2. Accuracy and robustness still need to be further improved.
Disclosure of Invention
In order to solve the above problems, the invention provides a diaper image detection method based on an improved YOLOv7 network model, which reduces the subsequent data processing load by preprocessing the acquired images, increases the detection speed, and improves the detection accuracy and robustness through the improved YOLOv7 network model.
In order to achieve the above object, the present invention provides a diaper image detection method based on an improved YOLOv7 network model, comprising the following steps:
s1, collecting paper diaper images, forming a paper diaper image data set, and preprocessing:
S11, adding a labeling file for labeling the type and the position of each paper diaper image as a real frame of the paper diaper image;
S12, carrying out data enhancement on the diaper image data set to obtain an enhanced diaper image data set;
S13, clustering the real frames of all diaper images in the enhanced diaper image dataset by using the K-means clustering algorithm to obtain K prior frames;
s14, dividing the enhanced diaper image data set into a training set, a verification set and a test set according to a preset proportion;
S2, training the YOLOv7 network model:
S3, testing:
inputting the diaper images in the test set into the trained YOLOv7 network model, predicting all diaper images in the test set with the YOLOv7 network model to obtain prediction frames on the prediction feature maps, mapping the prediction frames onto the original image through the proportional relation between the prediction feature maps and the original image, and locating the patterns on the diaper to obtain the final diaper image detection result.
Preferably, in step S11, the diaper dataset comprises a plurality of diaper images; the corresponding labeling file is a txt-format file recording the position information and category information of the targets in the diaper image, and the size of each diaper image is 1024 × 1024 pixels;
in step S14, the training set, verification set and test set are divided in proportions of 80%, 10% and 10%, and train.txt, val.txt and test.txt files are generated respectively and saved as the corresponding image lists.
Preferably, in step S12, the diaper image dataset is data enhanced using Mosaic data enhancement in combination with Mixup data enhancement with 20% probability;
the method for enhancing the Mosaic data comprises the following steps:
randomly selecting 4 paper diaper images, enhancing and combining the paper diaper images by utilizing Mosaic data to form a new image, and taking the new image as new training data;
another image was randomly selected and blended with the original image with 20% probability using Mixup data enhancement to generate new training data.
Preferably, the step S13 specifically includes the following steps:
The widths and heights of all real frames in the training set are clustered with the K-means clustering algorithm, and the obtained K cluster-centre coordinates are taken as the widths and heights of the K prior frames; each real frame is marked as (class, xmin, ymin, xmax, ymax), where class represents the category of the diaper image target contained in the real frame, xmin and ymin represent the x and y coordinates of the top-left vertex of the real frame, and xmax and ymax represent the x and y coordinates of the bottom-right vertex of the real frame.
Preferably, the step S2 specifically includes the following steps:
S21, randomly selecting X diaper images from the training set and inputting them into the YOLOv7 network model, downsampling the images by 1/8, 1/16 and 1/32 through the backbone network of the YOLOv7 network model to extract three effective feature maps of different scales, inputting the effective feature maps into the improved feature aggregation module, and further fusing them with the improved feature aggregation module to capture global semantic information while generating three prediction feature maps of different scales;
S22, uniformly distributing K prior frames onto three prediction feature graphs in advance according to the scale, and adjusting the corresponding prior frames according to anchor point information on the prediction feature graphs to obtain prediction frames;
S23, calculating the loss value of the YOLOv7 network model using the prediction frames and the real frames corresponding to the diaper images, and evaluating the difference between the prediction frames and the real frames;
S24, updating the parameters of the YOLOv7 network model according to the loss value and iterating over the training set until all diaper images in the training set have been input into the YOLOv7 network model once;
S25, inputting the diaper images in the verification set into the YOLOv7 network model after the current round of training, and predicting each diaper image in the verification set with the YOLOv7 network model to obtain the prediction frames of the verification set;
S26, counting the average precision value of each class of paper diaper images according to the prediction frames of the verification set and the corresponding real frames;
S27, repeating the steps S24-S26 until convergence is achieved, and finishing YOLOv network model training;
The YOLOv7 network model in step S2 includes a backbone network CSPDarknet-53, a feature filtering and purification module FFPM, an improved feature fusion module and a lightweight detection head, which are arranged in sequence; in the improved feature fusion module, the PANet is replaced with a BiFPN; the backbone network CSPDarknet-53 includes an ELAN module, an MP module, an SPPCSPC module, a C2f module, an InceptionNext module and a multi-scale feature enhancement module MSFE, and a channel attention mechanism is introduced into the SPPCSPC module of the backbone network CSPDarknet-53; in the lightweight detection head, a 7×7 depthwise convolution replaces the 3×3 convolution kernel and a Selective Kernel (SK) attention mechanism is introduced; the activation function of the YOLOv7 network model is changed from Mish to Hard-Swish;
The step S21 specifically includes the following steps:
S211, randomly selecting X diaper images from the training set and inputting them into the backbone network CSPDarknet-53 for stage-by-stage feature extraction, taking the three feature layers of different scales and channel numbers obtained by downsampling, denoted M5, M4 and M3 from the smallest scale to the largest, inputting these feature layers into the feature filtering and purification module FFPM to filter cross-layer conflicts, and outputting three effective feature maps L5, L4 and L3 of different scales;
S212, inputting the three effective feature graphs L5, L4 and L3 into an improved feature fusion module for further fusion, and gradually fusing the features of deep layers and shallow layers through non-adjacent feature layers to output feature graphs P5, P4 and P3 with output scales consistent with those of the input effective feature graphs L5, L4 and L3;
S213, using the lightweight detection head to adjust the channel numbers of the three output feature maps P5, P4 and P3 to 3 × (5 + num_class), obtaining N prediction feature maps, where num_class represents the number of categories.
Preferably, the step S212 specifically includes the following steps:
S2121, inputting an effective feature map M5 into a SPPCSPC module to obtain a feature map K5, upsampling the feature map K5, inputting a multi-scale feature enhancement module MSFE to be fused with the effective feature map M4, and inputting a fusion result into a InceptionNext module to obtain a feature map K4;
S2122, up-sampling the feature map K4, then inputting a multi-scale feature enhancement module MSFE and a feature map M3 for fusion, and inputting a fusion result into a InceptionNext module to obtain a shallowest layer output feature map P3;
S2123, inputting the output feature map P3 into the C2f module, downsampling, inputting the result into the multi-scale feature enhancement module MSFE to be fused with the effective feature map M4 and the feature map K4, and inputting the fusion result into the InceptionNext module to obtain the intermediate-layer output feature map P4;
S2124, inputting the output feature map P4 into a C2f module, downsampling, fusing with a feature map K5, and inputting a fusion result into a InceptionNext module to obtain the deepest output feature map P5.
Preferably, the step S22 specifically includes the following steps:
S221, sequencing the K prior frames generated in the step S13 according to the size of the scale, uniformly distributing the K prior frames to the generated N prediction feature maps, dividing each prediction feature map into H multiplied by W grids, and setting an anchor point in the center of each grid unit;
S222, covering K/N priori frames of the corresponding prediction feature map on each anchor point;
S223, each anchor point on the prediction feature map corresponds to a vector with the length of 3 x (5+num_class), a one-dimensional adjustment vector with the length of 5+num_class for each prior frame is obtained by carrying out dimension splitting on the vector, width and height adjustment information, center point coordinate adjustment information and frame confidence adjustment information of the corresponding prior frame are obtained, and the position and the size of the prior frame are adjusted through the adjustment information, so that the corresponding prediction frame is obtained.
Preferably, the step S23 specifically includes the following steps:
S231, comparing each prediction frame with the corresponding real frame and calculating the intersection-over-union (IoU) loss, the specific calculation formula being:
L_IoU = 1 - IoU
where L_IoU is the IoU loss value and IoU is the intersection over union between the prediction frame and the real frame;
S232, in the output feature map, calculating the classification confidence and the frame confidence of each prediction frame, and from them the classification confidence loss and the frame confidence loss; here x, y, w and h denote the upper-left corner coordinates, width and height of the real frame, x', y', w' and h' denote the upper-left corner coordinates, width and height of the prediction frame, the class true value takes the value 0 or 1, p denotes the predicted probability of the diaper image category, C denotes the classification confidence, L_conf denotes the frame confidence loss and L_cls denotes the classification confidence loss;
S233, multiplying the IoU loss, the classification confidence loss and the frame confidence loss by preset corresponding proportions and adding them to obtain the total loss value, the specific calculation formula being:
L = λ1·L_IoU + λ2·L_conf + λ3·L_cls
where L is the total loss value and λ1, λ2 and λ3 are the preset proportions;
S234, adjusting YOLOv network model parameters by using a back propagation algorithm, and minimizing the total loss value.
Preferably, in step S24, the diaper images of the whole training set are input into the YOLOv7 network model within one epoch for forward propagation and reverse optimization of the network parameters;
in step S25, after each epoch is completed, each diaper image in the verification set is predicted using the updated YOLOv7 network model parameters.
Preferably, the step S3 specifically includes the following steps:
S31, YOLOv network models output N prediction feature graphs corresponding to each paper diaper image;
S32, on each prediction feature map, adjusting the prior frames according to the adjustment vector corresponding to each anchor point to obtain all prediction frames of each paper diaper image;
S33, removing redundant prediction frames by using a non-maximum suppression method to obtain prediction frames on a prediction feature map;
And S34, mapping the prediction frame on the prediction feature map onto the scale of the original image according to the proportional relation to obtain a final prediction frame.
The invention has the following beneficial effects:
on the basis of effectively detecting and identifying various pattern designs on the paper diaper, the accuracy and the robustness of paper diaper pattern detection can be improved by introducing a deeper convolutional neural network structure, an improved loss function and a data enhancement technology, and the method comprises the following steps:
(1) By improving the YOLOv7 algorithm, a feature filtering and purification module (FFPM) is added between the backbone network and the feature fusion module, and the FFPM performs a new cascade fusion of the multi-scale features fed into the neck, which effectively filters cross-layer conflicts, enhances feature learning and improves the detection accuracy of the network;
(2) Inspired by the multi-scale convolutional attention module MSCA, a multi-scale feature enhancement module (MSFE) is designed to further optimize the cascade-fused features and provide rich guiding information for the shallow features;
(3) A Coordinate Attention (CA) module is introduced and fused with the dimension-reduced shallow features, so that the spatial position information of the diaper image target is retained and the features are further enhanced;
(4) Considering the redundant connections in the original feature fusion module PANet, a BiFPN is introduced to improve the feature fusion efficiency of the network model; like PANet, the BiFPN is a bidirectional pyramid structure containing both top-down and bottom-up feature flows, thereby improving the accuracy of the network model.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
FIG. 1 is a logic flow diagram of a pant diaper image detection method based on the improved YOLOv network model of the present invention;
fig. 2 is a schematic diagram of a InceptionNext module structure of a diaper image detection method based on an improved YOLOv network model according to the present invention;
FIG. 3 is a schematic structural diagram of a feature filter and purification module FFPM of the present invention based on a diaper image detection method of the improved YOLOv network model;
FIG. 4 is a schematic diagram of a channel attention mechanism (CA) of a diaper image detection method based on an improved YOLOv network model of the present invention;
FIG. 5 is a graph showing the results of the simulation experiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application more apparent, the embodiments of the present application will be further described in detail below with reference to the accompanying drawings and examples. It should be understood that the detailed description and specific examples, while indicating the embodiment of the application, are intended for purposes of illustration only and are not intended to limit the scope of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application. Examples of the embodiments are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements throughout or elements having like or similar functionality.
It should be noted that the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
As shown in figs. 1-4, the diaper image detection method based on the improved YOLOv7 network model comprises the following steps:
s1, collecting paper diaper images, forming a paper diaper image data set, and preprocessing:
S11, adding a labeling file for labeling the type and the position of each paper diaper image as a real frame of the paper diaper image, and using the labeling file as a reference standard in the training process, and calculating the loss value of a network and the performance of an evaluation model;
In step S11, the diaper dataset comprises a plurality of diaper images; the corresponding labeling file is a txt-format file recording the position information and category information of the targets in the diaper image, and the size of each diaper image is 1024 × 1024 pixels;
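As a purely illustrative example, the following Python sketch shows how such a labeling file could be read into real frames; the exact field order ("class xmin ymin xmax ymax", one target per line) and the helper name load_labels are assumptions, since the patent does not fix the file layout.

```python
from pathlib import Path

def load_labels(label_path):
    # One real frame per line; the assumed field order is
    # "class xmin ymin xmax ymax" in pixel coordinates of the 1024x1024 image.
    frames = []
    for line in Path(label_path).read_text().splitlines():
        if not line.strip():
            continue
        cls, xmin, ymin, xmax, ymax = line.split()
        frames.append((int(cls), float(xmin), float(ymin), float(xmax), float(ymax)))
    return frames

# Example: frames = load_labels("labels/diaper_0001.txt")
```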
s12, data enhancement is carried out on the diaper image data set, so that the enhanced diaper image data set is obtained, the diversity and the richness of the data are increased, and the generalization capability of the model is improved;
In step S12, data enhancement is performed on the diaper image dataset using Mosaic data enhancement in combination with Mixup data enhancement with 20% probability;
the method for enhancing the Mosaic data comprises the following steps:
randomly selecting 4 paper diaper images, enhancing and combining the paper diaper images by utilizing Mosaic data to form a new image, and taking the new image as new training data;
another image was randomly selected and blended with the original image with 20% probability using Mixup data enhancement to generate new training data.
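A minimal sketch of the combined Mosaic and Mixup enhancement described above is given below, assuming the images are NumPy arrays of size 1024 × 1024; the quadrant cropping, the fixed blending factor and the function names are illustrative simplifications (bounding-box remapping is omitted).

```python
import random
import numpy as np

def mosaic4(images, out_size=1024):
    # Paste the top-left quadrant of 4 diaper images into a 2x2 canvas.
    # Simplified sketch: the bounding boxes are not remapped here.
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    half = out_size // 2
    for k, img in enumerate(images[:4]):
        r, c = divmod(k, 2)
        patch = img[:half, :half]
        canvas[r * half:r * half + patch.shape[0], c * half:c * half + patch.shape[1]] = patch
    return canvas

def maybe_mixup(image, other, p=0.2, lam=0.5):
    # With probability p, blend a second randomly chosen image into `image` (Mixup).
    if random.random() < p:
        blended = lam * image.astype(np.float32) + (1 - lam) * other.astype(np.float32)
        return blended.astype(np.uint8)
    return image
```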
S13, clustering the real frames of all diaper images in the enhanced diaper image dataset by using the K-means clustering algorithm to obtain K prior frames;
preferably, the step S13 specifically includes the following steps:
The widths and heights of all real frames in the training set are clustered with the K-means clustering algorithm, and the obtained K cluster-centre coordinates are taken as the widths and heights of the K prior frames (K is generally taken as 9); each real frame is marked as (class, xmin, ymin, xmax, ymax), where class represents the category of the diaper image target contained in the real frame, xmin and ymin represent the x and y coordinates of the top-left vertex of the real frame, and xmax and ymax represent the x and y coordinates of the bottom-right vertex of the real frame.
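For illustration, a compact K-means clustering of the real-frame widths and heights might look as follows; plain Euclidean distance on (width, height) is assumed here, whereas YOLO-style pipelines often use a 1 - IoU distance instead, which the patent does not specify.

```python
import numpy as np

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    # wh: array of shape (num_real_frames, 2) built from (xmax - xmin, ymax - ymin).
    # The k cluster centres are used as the widths and heights of the prior frames.
    rng = np.random.default_rng(seed)
    centres = wh[rng.choice(len(wh), k, replace=False)].astype(float)
    for _ in range(iters):
        # assign every box to the nearest centre (Euclidean distance on w, h)
        d = np.linalg.norm(wh[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([wh[labels == i].mean(axis=0) if np.any(labels == i) else centres[i]
                        for i in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return centres[np.argsort(centres.prod(axis=1))]   # sort by area, small to large
```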
S14, dividing the enhanced diaper image data set into a training set, a verification set and a test set according to a preset proportion;
In step S14, the training set, verification set and test set are divided in proportions of 80%, 10% and 10%, and train.txt, val.txt and test.txt files are generated respectively and saved as the corresponding image lists.
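A simple sketch of the 80%/10%/10% split and the generation of the train.txt, val.txt and test.txt image lists, assuming the enhanced images are stored as .jpg files in a single directory:

```python
import random
from pathlib import Path

def split_dataset(image_dir, out_dir, ratios=(0.8, 0.1, 0.1), seed=0):
    # Shuffle the enhanced diaper image list and write the three list files.
    paths = sorted(str(p) for p in Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(paths)
    n = len(paths)
    n_train, n_val = int(n * ratios[0]), int(n * ratios[1])
    splits = {"train.txt": paths[:n_train],
              "val.txt":   paths[n_train:n_train + n_val],
              "test.txt":  paths[n_train + n_val:]}
    for name, items in splits.items():
        Path(out_dir, name).write_text("\n".join(items) + "\n")
```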
S2, training YOLOv a network model:
In this embodiment, the training environment of the network is Python 3.8 with the PyTorch 1.8.8 deep learning framework, and CUDA is used for acceleration; the learning-rate adjustment strategy is cosine annealing decay with an initial learning rate of 0.0001; the number of training epochs is set to 300; and the momentum parameter of the network is set to 0.937;
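The corresponding PyTorch training configuration could be set up roughly as follows; the choice of SGD as the optimizer is an assumption, since the patent only fixes the learning rate, the cosine annealing schedule, the epoch count and the momentum value.

```python
import torch

def build_optimizer(model, epochs=300, lr0=1e-4, momentum=0.937):
    # model is assumed to be the improved YOLOv7 network; SGD with momentum 0.937,
    # initial learning rate 1e-4 and cosine annealing over 300 epochs, as stated above.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr0, momentum=momentum)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    return optimizer, scheduler

# optimizer.step() is called inside the batch loop; scheduler.step() once per epoch.
```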
The YOLOv7 network model in step S2 includes a backbone network CSPDarknet-53, a feature filtering and purification module FFPM, an improved feature fusion module and a lightweight detection head, which are arranged in sequence; in the improved feature fusion module, the PANet is replaced with a BiFPN; the backbone network CSPDarknet-53 includes an ELAN module, an MP module, an SPPCSPC module, a C2f module, an InceptionNext module and a multi-scale feature enhancement module MSFE, and a channel attention mechanism is introduced into the SPPCSPC module of the backbone network CSPDarknet-53; in the lightweight detection head, a 7×7 depthwise convolution replaces the 3×3 convolution kernel and a Selective Kernel (SK) attention mechanism is introduced; the activation function of the YOLOv7 network model is changed from Mish to Hard-Swish;
the step S2 specifically comprises the following steps:
S21, randomly selecting X diaper images from the training set and inputting them into the YOLOv7 network model, extracting effective feature maps of different scales through the backbone network of the YOLOv7 network model, inputting the effective feature maps into the improved feature aggregation module, and further fusing them with the improved feature aggregation module to capture global semantic information while generating three prediction feature maps of different scales;
The step S21 specifically includes the following steps:
S211, randomly selecting X diaper images from the training set and inputting them into the backbone network CSPDarknet-53 for stage-by-stage feature extraction, taking the three feature layers of different scales and channel numbers obtained by downsampling at rates of 1/8, 1/16 and 1/32, denoted M5, M4 and M3 from the smallest scale to the largest, inputting these feature layers into the feature filtering and purification module FFPM to filter cross-layer conflicts, and outputting three effective feature maps L5, L4 and L3 of different scales;
S212, inputting the three effective feature graphs L5, L4 and L3 into an improved feature fusion module for further fusion, and gradually fusing the features of deep layers and shallow layers through non-adjacent feature layers to output feature graphs P5, P4 and P3 with output scales consistent with those of the input effective feature graphs L5, L4 and L3;
step S212 specifically includes the following steps:
S2121, inputting an effective feature map M5 into a SPPCSPC module to obtain a feature map K5, upsampling the feature map K5, inputting a multi-scale feature enhancement module MSFE to be fused with the effective feature map M4, and inputting a fusion result into a InceptionNext module to obtain a feature map K4;
S2122, up-sampling the feature map K4, then inputting a multi-scale feature enhancement module MSFE and a feature map M3 for fusion, and inputting a fusion result into a InceptionNext module to obtain a shallowest layer output feature map P3;
S2123, inputting the output feature map P3 into the C2f module, downsampling, inputting the result into the multi-scale feature enhancement module MSFE to be fused with the effective feature map M4 and the feature map K4, and inputting the fusion result into the InceptionNext module to obtain the intermediate-layer output feature map P4;
S2124, inputting the output feature map P4 into a C2f module, downsampling, fusing with a feature map K5, and inputting a fusion result into a InceptionNext module to obtain the deepest output feature map P5.
S213, using the lightweight detection head to adjust the channel numbers of the three output feature maps P5, P4 and P3 to 3 × (5 + num_class), obtaining N prediction feature maps, where num_class represents the number of categories.
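The data flow of steps S2121-S2124 and S213 can be sketched as below; the SPPCSPC, InceptionNext, C2f, MSFE and FFPM blocks are replaced by 1×1 convolutions purely to show the routing and channel handling, and the channel widths are illustrative assumptions rather than the actual configuration of the improved network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionNeckSketch(nn.Module):
    # Routing-only sketch of S2121-S2124 and S213; c3/c4/c5 are assumed channel widths.
    def __init__(self, c3=256, c4=512, c5=1024, num_class=5):
        super().__init__()
        out_c = 3 * (5 + num_class)           # channels of every prediction feature map (S213)
        self.sppcspc = nn.Conv2d(c5, c5, 1)
        self.k4 = nn.Conv2d(c5 + c4, c4, 1)
        self.p3 = nn.Conv2d(c4 + c3, c3, 1)
        self.p4 = nn.Conv2d(c3 + 2 * c4, c4, 1)
        self.p5 = nn.Conv2d(c4 + c5, c5, 1)
        self.heads = nn.ModuleList(nn.Conv2d(c, out_c, 1) for c in (c3, c4, c5))

    def forward(self, m3, m4, m5):
        k5 = self.sppcspc(m5)                                                  # S2121
        k4 = self.k4(torch.cat([F.interpolate(k5, scale_factor=2), m4], 1))
        p3 = self.p3(torch.cat([F.interpolate(k4, scale_factor=2), m3], 1))    # S2122
        p4 = self.p4(torch.cat([F.max_pool2d(p3, 2), m4, k4], 1))              # S2123
        p5 = self.p5(torch.cat([F.max_pool2d(p4, 2), k5], 1))                  # S2124
        return [head(x) for head, x in zip(self.heads, (p3, p4, p5))]          # S213
```

For a 1024 × 1024 input, m3, m4 and m5 would correspond to the 1/8, 1/16 and 1/32 feature maps, e.g. shapes (B, 256, 128, 128), (B, 512, 64, 64) and (B, 1024, 32, 32) under the assumed channel widths.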
S22, uniformly distributing K prior frames onto three prediction feature graphs in advance according to the scale, and adjusting the corresponding prior frames according to anchor point information on the prediction feature graphs to obtain prediction frames;
the step S22 specifically includes the following steps:
S221, sequencing the K prior frames generated in the step S13 according to the size of the scale, uniformly distributing the K prior frames to the generated N prediction feature maps, dividing each prediction feature map into H multiplied by W grids, and setting an anchor point in the center of each grid unit;
S222, covering each anchor point with the K/N prior frames assigned to the corresponding prediction feature map, where "/" denotes division;
S223, each anchor point on the prediction feature map corresponds to a vector with the length of 3 x (5+num_class), a one-dimensional adjustment vector with the length of 5+num_class for each prior frame is obtained by carrying out dimension splitting on the vector, width and height adjustment information, center point coordinate adjustment information and frame confidence adjustment information of the corresponding prior frame are obtained, and the position and the size of the prior frame are adjusted through the adjustment information, so that the corresponding prediction frame is obtained.
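A sketch of how the adjustment vectors at each anchor point could be split and applied to the prior frames is shown below; the sigmoid/exponential decoding follows the usual YOLO convention and is an assumption, since the patent only states that the prior frames are adjusted by the width/height, centre-point and confidence adjustment information.

```python
import torch

def decode_predictions(pred_map, priors, num_class, stride):
    # pred_map: (B, 3*(5+num_class), H, W) prediction feature map (S213);
    # priors: (3, 2) tensor of prior-frame widths/heights (pixels) assigned to this scale;
    # stride: downsampling rate of this feature map relative to the input image.
    b, _, h, w = pred_map.shape
    p = pred_map.view(b, 3, 5 + num_class, h, w)
    tx, ty, tw, th = p[:, :, 0], p[:, :, 1], p[:, :, 2], p[:, :, 3]
    obj, cls = p[:, :, 4], p[:, :, 5:]
    gy, gx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    cx = (gx + torch.sigmoid(tx)) * stride        # centre-point adjustment, mapped to input scale
    cy = (gy + torch.sigmoid(ty)) * stride
    pw = priors[:, 0].view(1, 3, 1, 1) * torch.exp(tw)   # width adjustment of the prior frame
    ph = priors[:, 1].view(1, 3, 1, 1) * torch.exp(th)   # height adjustment of the prior frame
    return cx, cy, pw, ph, torch.sigmoid(obj), torch.sigmoid(cls)
```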
S23, calculating YOLOv a loss value of the network model by using the prediction frame and a real frame corresponding to the diaper image, and evaluating the difference between the prediction frame and the real frame;
the step S23 specifically includes the following steps:
S231, comparing each prediction frame with the corresponding real frame and calculating the intersection-over-union (IoU) loss, the specific calculation formula being:
L_IoU = 1 - IoU
where L_IoU is the IoU loss value and IoU is the intersection over union between the prediction frame and the real frame;
S232, in the output feature map, calculating the classification confidence and the frame confidence of each prediction frame, and from them the classification confidence loss and the frame confidence loss; here x, y, w and h denote the upper-left corner coordinates, width and height of the real frame, x', y', w' and h' denote the upper-left corner coordinates, width and height of the prediction frame, the class true value takes the value 0 or 1, p denotes the predicted probability of the diaper image category, C denotes the classification confidence, L_conf denotes the frame confidence loss and L_cls denotes the classification confidence loss;
S233, multiplying the IoU loss, the classification confidence loss and the frame confidence loss by preset corresponding proportions and adding them to obtain the total loss value, the specific calculation formula being:
L = λ1·L_IoU + λ2·L_conf + λ3·L_cls
where L is the total loss value and λ1, λ2 and λ3 are the preset proportions;
S234, adjusting YOLOv network model parameters by using a back propagation algorithm, and minimizing the total loss value.
In S231-S234, the intersection-over-union (IoU) loss is calculated from the prediction frames and the corresponding ground-truth (GT) frames, the classification confidence loss and the frame confidence loss are calculated from the classification confidence and the frame confidence of each prediction frame contained in the network output feature map, the IoU loss, the classification confidence loss and the frame confidence loss are weighted and summed according to preset proportions to obtain the overall network loss, and the network parameters are optimized by back propagation.
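A minimal sketch of this weighted loss combination is given below; the binary cross-entropy form of the two confidence losses and the equal default weights are assumptions where the text leaves the exact formulas implicit.

```python
import torch
import torch.nn.functional as F

def total_loss(pred_boxes, gt_boxes, pred_obj, target_obj, pred_cls, target_cls,
               w_iou=1.0, w_conf=1.0, w_cls=1.0):
    # Weighted sum of the IoU loss, frame-confidence loss and classification loss (S231-S233);
    # pred_obj / pred_cls are assumed to be already passed through a sigmoid.
    iou = box_iou(pred_boxes, gt_boxes)                       # (N,) IoU of matched pairs
    l_iou = (1.0 - iou).mean()
    l_conf = F.binary_cross_entropy(pred_obj, target_obj)     # frame confidence loss
    l_cls = F.binary_cross_entropy(pred_cls, target_cls)      # classification confidence loss
    return w_iou * l_iou + w_conf * l_conf + w_cls * l_cls

def box_iou(a, b):
    # IoU of matched (xmin, ymin, xmax, ymax) boxes, row by row.
    lt = torch.max(a[:, :2], b[:, :2])
    rb = torch.min(a[:, 2:], b[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_a = (a[:, 2:] - a[:, :2]).prod(dim=1)
    area_b = (b[:, 2:] - b[:, :2]).prod(dim=1)
    return inter / (area_a + area_b - inter + 1e-7)
```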
S24, updating YOLOv parameters of the network model according to the loss value, and performing training set iteration until all paper diaper images in the training set are input into the YOLOv network model once;
In step S24, the diaper images of the whole training set are input into the YOLOv7 network model within one epoch for forward propagation and reverse optimization of the network parameters;
S25, inputting the diaper images in the verification set into a YOLOv network model after training, and predicting each diaper image in the verification set by using the YOLOv network model to obtain a prediction frame of the verification set;
in step S25, after each epoch is completed, each diaper image in the verification set is predicted using the updated YOLOv7 network model parameters.
S26, counting the average precision value of each class of paper diaper images according to the prediction frames of the verification set and the corresponding real frames;
S27, repeating steps S24-S26 until convergence, at which point the training of the YOLOv7 network model is finished; in this embodiment, network convergence is judged by the AP value remaining unchanged or showing a downward trend over several consecutive rounds, which indicates that the performance of the YOLOv7 network model on the verification set has reached a stable level, so the training is determined to be finished;
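The epoch loop of steps S24-S27, including the per-epoch validation and the AP-plateau convergence check, can be sketched as follows; compute_loss and evaluate_map stand in for the loss and mAP routines described above and are illustrative names only.

```python
def train_until_converged(model, train_loader, val_loader, optimizer, scheduler,
                          compute_loss, evaluate_map, max_epochs=300, patience=10):
    # One epoch = one full pass over the training set (S24); after every epoch the
    # verification set is predicted with the updated weights (S25-S26) and training
    # stops once the mAP stops improving for `patience` epochs (convergence rule of S27).
    best_map, stale = 0.0, 0
    for epoch in range(max_epochs):
        model.train()
        for images, targets in train_loader:
            loss = compute_loss(model(images), targets)
            optimizer.zero_grad()
            loss.backward()                     # back propagation
            optimizer.step()                    # update the network parameters
        scheduler.step()
        model.eval()
        current_map = evaluate_map(model, val_loader)   # mean of the per-class AP values
        if current_map > best_map:
            best_map, stale = current_map, 0
        else:
            stale += 1
        if stale >= patience:                   # AP unchanged or falling for several rounds
            break
    return model
```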
S3, testing:
inputting the diaper images in the test set into the trained YOLOv7 network model, predicting all diaper images in the test set with the YOLOv7 network model to obtain prediction frames on the prediction feature maps, mapping the prediction frames onto the original image through the proportional relation between the prediction feature maps and the original image, and locating the patterns on the diaper to obtain the final diaper image detection result.
The step S3 specifically comprises the following steps:
S31, YOLOv network models output N prediction feature graphs corresponding to each paper diaper image;
S32, on each prediction feature map, adjusting the prior frames according to the adjustment vector corresponding to each anchor point to obtain all prediction frames of each paper diaper image;
S33, removing redundant prediction frames by using a non-maximum suppression method to obtain prediction frames on a prediction feature map;
And S34, mapping the prediction frame on the prediction feature map onto the scale of the original image according to the proportional relation to obtain a final prediction frame.
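Steps S33 and S34 could be implemented along the following lines; the torchvision NMS operator and a simple resize-only mapping back to the original image are assumptions (no letterbox padding is handled).

```python
import torch
from torchvision.ops import nms

def postprocess(boxes, scores, input_size=1024, orig_size=(1024, 1024), iou_thr=0.5):
    # Remove redundant prediction frames with non-maximum suppression (S33) and map
    # the survivors back to the original image scale (S34).
    # boxes: (N, 4) tensor of (xmin, ymin, xmax, ymax) on the network input; scores: (N,).
    keep = nms(boxes, scores, iou_thr)
    boxes, scores = boxes[keep], scores[keep]
    scale_x = orig_size[1] / input_size
    scale_y = orig_size[0] / input_size
    boxes = boxes * torch.tensor([scale_x, scale_y, scale_x, scale_y])
    return boxes, scores
```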
Simulation experiment
To verify the performance of the method proposed in this embodiment, the images in the test set are predicted with the improved YOLOv7 network, and the mean average precision (mAP) as well as the precision and recall of each category are calculated from the prediction results and the GT. As shown in fig. 5, the method can detect the various patterns on diapers with high accuracy.
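For reference, the per-class average precision underlying the mAP figure can be computed roughly as follows; this is a generic sketch of the standard precision-recall integration, not the evaluation code used in the experiment.

```python
import numpy as np

def average_precision(scores, matched, num_gt):
    # AP for one diaper pattern category: sort the predictions by confidence, sweep the
    # precision-recall curve and integrate it (no precision-envelope smoothing applied);
    # `matched` marks which predictions hit a ground-truth frame. mAP is the mean over classes.
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(matched, dtype=float)[order]
    fp = 1.0 - tp
    tp_cum, fp_cum = np.cumsum(tp), np.cumsum(fp)
    recall = tp_cum / max(num_gt, 1)
    precision = tp_cum / np.maximum(tp_cum + fp_cum, 1e-9)
    ap = 0.0
    for r0, r1, p in zip(np.concatenate(([0.0], recall[:-1])), recall, precision):
        ap += (r1 - r0) * p
    return ap
```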
Therefore, the diaper image detection method based on the improved YOLOv7 network model solves the problems of poor detection performance on small targets and low detection speed in the diaper pattern detection task by introducing a fast attention mechanism at specific positions and adopting structures such as an efficient detection head based on depthwise convolution; compared with existing advanced detection models it has clear advantages in detection accuracy and efficiency indexes, and it satisfies the real-time requirements of practical scenarios well.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention and not for limiting it, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that: the technical scheme of the invention can be modified or replaced by the same, and the modified technical scheme cannot deviate from the spirit and scope of the technical scheme of the invention.
Claims (10)
1. A diaper image detection method based on an improved YOLOv7 network model, characterized by comprising the following steps:
s1, collecting paper diaper images, forming a paper diaper image data set, and preprocessing:
S11, adding a labeling file for labeling the type and the position of each paper diaper image as a real frame of the paper diaper image;
S12, carrying out data enhancement on the diaper image data set to obtain an enhanced diaper image data set;
S13, clustering the real frames of all diaper images in the enhanced diaper image dataset by using the K-means clustering algorithm to obtain K prior frames;
s14, dividing the enhanced diaper image data set into a training set, a verification set and a test set according to a preset proportion;
s2, training YOLOv a network model:
S3, testing:
inputting the diaper images in the test set into a YOLOv network model after training, predicting all the diaper images in the test set by utilizing the YOLOv network model to obtain a prediction frame on a prediction feature map, mapping the prediction frame onto an original image through the proportional relation between the prediction feature map and the original image, positioning the patterns on the diaper, and obtaining a final diaper image detection result.
2. The diaper image detection method based on the improved YOLOv7 network model according to claim 1, wherein: in step S11, the diaper dataset comprises a plurality of diaper images; the corresponding labeling file is a txt-format file recording the position information and category information of the targets in the diaper image, and the size of each diaper image is 1024 × 1024 pixels;
in step S14, the training set, verification set and test set are divided in proportions of 80%, 10% and 10%, and train.txt, val.txt and test.txt files are generated respectively and saved as the corresponding image lists.
3. The diaper image detection method based on the improved YOLOv7 network model according to claim 1, wherein: in step S12, data enhancement is performed on the diaper image dataset using Mosaic data enhancement in combination with Mixup data enhancement with 20% probability;
the method for enhancing the Mosaic data comprises the following steps:
randomly selecting 4 paper diaper images, enhancing and combining the paper diaper images by utilizing Mosaic data to form a new image, and taking the new image as new training data;
another image was randomly selected and blended with the original image with 20% probability using Mixup data enhancement to generate new training data.
4. The diaper image detection method based on the improved YOLOv network model according to claim 1, wherein: the step S13 specifically includes the following steps:
The widths and heights of all real frames in the training set are clustered with the K-means clustering algorithm, and the obtained K cluster-centre coordinates are taken as the widths and heights of the K prior frames; each real frame is marked as (class, xmin, ymin, xmax, ymax), where class represents the category of the diaper image target contained in the real frame, xmin and ymin represent the x and y coordinates of the top-left vertex of the real frame, and xmax and ymax represent the x and y coordinates of the bottom-right vertex of the real frame.
5. The diaper image detection method based on the improved YOLOv network model according to claim 1, wherein: the step S2 specifically comprises the following steps:
S21, randomly selecting X diaper images from the training set and inputting them into the YOLOv7 network model, downsampling the images by 1/8, 1/16 and 1/32 through the backbone network of the YOLOv7 network model to extract three effective feature maps of different scales, inputting the effective feature maps into the improved feature aggregation module, and further fusing them with the improved feature aggregation module to capture global semantic information while generating three prediction feature maps of different scales;
S22, uniformly distributing K prior frames onto three prediction feature graphs in advance according to the scale, and adjusting the corresponding prior frames according to anchor point information on the prediction feature graphs to obtain prediction frames;
S23, calculating the loss value of the YOLOv7 network model using the prediction frames and the real frames corresponding to the diaper images, and evaluating the difference between the prediction frames and the real frames;
S24, updating the parameters of the YOLOv7 network model according to the loss value and iterating over the training set until all diaper images in the training set have been input into the YOLOv7 network model once;
S25, inputting the diaper images in the verification set into a YOLOv network model after training, and predicting each diaper image in the verification set by using the YOLOv network model to obtain a prediction frame of the verification set;
S26, counting the average precision value of each class of paper diaper images according to the prediction frames of the verification set and the corresponding real frames;
S27, repeating the steps S24-S26 until convergence is achieved, and finishing YOLOv network model training;
The YOLOv7 network model in step S2 includes a backbone network CSPDarknet-53, a feature filtering and purification module FFPM, an improved feature fusion module and a lightweight detection head, which are arranged in sequence; in the improved feature fusion module, the PANet is replaced with a BiFPN; the backbone network CSPDarknet-53 includes an ELAN module, an MP module, an SPPCSPC module, a C2f module, an InceptionNext module and a multi-scale feature enhancement module MSFE, and a channel attention mechanism is introduced into the SPPCSPC module of the backbone network CSPDarknet-53; in the lightweight detection head, a 7×7 depthwise convolution replaces the 3×3 convolution kernel and a Selective Kernel (SK) attention mechanism is introduced; the activation function of the YOLOv7 network model is changed from Mish to Hard-Swish;
The step S21 specifically includes the following steps:
S211, randomly selecting X diaper images from the training set and inputting them into the backbone network CSPDarknet-53 for stage-by-stage feature extraction, taking the three feature layers of different scales and channel numbers obtained by downsampling, denoted M5, M4 and M3 from the smallest scale to the largest, inputting these feature layers into the feature filtering and purification module FFPM to filter cross-layer conflicts, and outputting three effective feature maps L5, L4 and L3 of different scales;
S212, inputting the three effective feature graphs L5, L4 and L3 into an improved feature fusion module for further fusion, and gradually fusing the features of deep layers and shallow layers through non-adjacent feature layers to output feature graphs P5, P4 and P3 with output scales consistent with those of the input effective feature graphs L5, L4 and L3;
S213, the light-weight detection head is utilized to adjust the channel number of the three output feature graphs P5, P4 and P3 to be 3 (5+num_class), and N prediction feature graphs are obtained, wherein num_class represents the category number.
6. The diaper image detection method based on the improved YOLOv network model of claim 5, wherein: step S212 specifically includes the following steps:
S2121, inputting the effective feature map M5 into the SPPCSPC module to obtain a feature map K5, upsampling the feature map K5, inputting it into the multi-scale feature enhancement module MSFE to be fused with the effective feature map M4, and inputting the fusion result into an InceptionNext module to obtain a feature map K4;
S2122, upsampling the feature map K4, inputting it into the multi-scale feature enhancement module MSFE to be fused with the feature map M3, and inputting the fusion result into an InceptionNext module to obtain the shallowest output feature map P3;
S2123, inputting the output feature map P3 into a C2f module, downsampling, inputting the result into the multi-scale feature enhancement module MSFE to be fused with the effective feature map M4 and the feature map K4, and inputting the fusion result into an InceptionNext module to obtain the intermediate-layer output feature map P4;
S2124, inputting the output feature map P4 into a C2f module, downsampling, fusing the result with the feature map K5, and inputting the fusion result into an InceptionNext module to obtain the deepest output feature map P5.
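The fusion order of steps S2121-S2124 can be summarised as a data-flow sketch. Here `sppcspc`, `msfe`, `incnext`, `c2f` and `downsample` are placeholder callables for the corresponding modules, and the channel concatenation used for S2124 is an assumption about how the feature map K5 is fused in.

```python
import torch
import torch.nn.functional as F

# Data-flow sketch of steps S2121-S2124; only the fusion order is shown.
def improved_fusion(m3, m4, m5, sppcspc, msfe, incnext, c2f, downsample):
    k5 = sppcspc(m5)                                            # S2121: M5 -> SPPCSPC -> K5
    k4 = incnext(msfe(F.interpolate(k5, scale_factor=2), m4))   # S2121: upsample, fuse with M4
    p3 = incnext(msfe(F.interpolate(k4, scale_factor=2), m3))   # S2122: shallowest output P3
    p4 = incnext(msfe(downsample(c2f(p3)), m4, k4))             # S2123: fuse with M4 and K4
    p5 = incnext(torch.cat([downsample(c2f(p4)), k5], dim=1))   # S2124: deepest output P5
    return p3, p4, p5
```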
7. The diaper image detection method based on the improved YOLOv network model according to claim 1, wherein: the step S22 specifically includes the following steps:
S221, sorting the K prior frames generated in step S13 by scale, distributing them uniformly over the generated N prediction feature maps, dividing each prediction feature map into H × W grids, and setting an anchor point at the center of each grid cell;
S222, covering K/N priori frames of the corresponding prediction feature map on each anchor point;
S223, each anchor point on the prediction feature map corresponds to a vector of length 3 × (5 + num_class); splitting this vector by dimension yields, for each prior frame, a one-dimensional adjustment vector of length 5 + num_class containing the width and height adjustment information, the center-point coordinate adjustment information and the frame confidence adjustment information of the corresponding prior frame; the position and size of the prior frame are adjusted using this adjustment information, so that the corresponding prediction frame is obtained.
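A minimal sketch of the decoding in steps S221-S223 follows. The sigmoid/exponential adjustment of the prior frames is the common YOLO convention and is used here as an assumption, since the claim only states that the adjustment vector carries width/height, centre-point and confidence information.

```python
import torch

# Sketch of steps S221-S223: split the 3*(5+num_class) channels per anchor point
# into per-prior adjustment vectors and turn prior frames into prediction frames.
def decode(pred_map, priors, stride, num_class):
    # pred_map: (B, 3*(5+num_class), H, W); priors: (3, 2) prior width/height in pixels
    b, _, h, w = pred_map.shape
    pred = pred_map.view(b, 3, 5 + num_class, h, w)
    tx, ty = pred[:, :, 0], pred[:, :, 1]            # centre-point adjustment
    tw, th = pred[:, :, 2], pred[:, :, 3]            # width/height adjustment
    conf = pred[:, :, 4].sigmoid()                   # frame confidence
    cls = pred[:, :, 5:].sigmoid()                   # classification confidence

    gy, gx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    cx = (gx + tx.sigmoid()) * stride                # adjusted centre x on the original scale
    cy = (gy + ty.sigmoid()) * stride                # adjusted centre y on the original scale
    pw = priors[:, 0].view(1, 3, 1, 1) * tw.exp()    # adjusted width
    ph = priors[:, 1].view(1, 3, 1, 1) * th.exp()    # adjusted height
    return cx, cy, pw, ph, conf, cls
```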
8. The diaper image detection method based on the improved YOLOv network model according to claim 1, wherein: the step S23 specifically includes the following steps:
S231, comparing each prediction frame with the corresponding real frame and calculating the intersection-over-union (IoU) loss, where the specific calculation formula is as follows:

$$L_{IoU} = 1 - IoU$$

where $L_{IoU}$ is the IoU loss value and $IoU$ is the intersection over union between the prediction frame and the real frame;
S232, in the output feature map, calculating the classification confidence and the frame confidence of each prediction frame, and further obtaining the classification confidence loss and the frame confidence loss, where the specific calculation formulas are as follows:

$$L_{conf} = (x - \hat{x})^2 + (y - \hat{y})^2 + (w - \hat{w})^2 + (h - \hat{h})^2$$

$$L_{cls} = -\left[ p \log \hat{p} + (1 - p) \log (1 - \hat{p}) \right]$$

where $x$ and $y$ are the upper-left corner coordinates of the real frame, $w$ is its width and $h$ is its height; $\hat{x}$ and $\hat{y}$ are the upper-left corner coordinates of the prediction frame, $\hat{h}$ is its height and $\hat{w}$ is its width; $p$ is the class true value, taking the value 0 or 1; $\hat{p}$ is the predicted probability (classification confidence) of the diaper image category; $L_{conf}$ is the frame confidence loss; $L_{cls}$ is the classification confidence loss;
S233, multiplying the classification confidence coefficient loss and the frame confidence coefficient loss by preset corresponding proportions respectively, and then adding to obtain a total loss value, wherein the specific calculation formula is as follows:
$$Loss = \lambda_{1} L_{cls} + \lambda_{2} L_{conf}$$

where $Loss$ is the total loss value, and $\lambda_{1}$ and $\lambda_{2}$ are the preset proportions applied to the classification confidence loss and the frame confidence loss respectively;
S234, adjusting YOLOv network model parameters by using a back propagation algorithm, and minimizing the total loss value.
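A sketch of the loss computation in steps S231-S234, using the loss forms written out above; the box format, the epsilon stabiliser and the weights lambda_1/lambda_2 are assumptions for illustration, and keeping the IoU term as an extra regression signal is a common practice rather than something stated in the claim.

```python
import torch

# Sketch of steps S231-S234 for matched prediction/real frame pairs.
# Boxes are (x, y, w, h) with (x, y) the upper-left corner.
def box_iou(pred, true):
    px2, py2 = pred[..., 0] + pred[..., 2], pred[..., 1] + pred[..., 3]
    tx2, ty2 = true[..., 0] + true[..., 2], true[..., 1] + true[..., 3]
    inter_w = (torch.min(px2, tx2) - torch.max(pred[..., 0], true[..., 0])).clamp(min=0)
    inter_h = (torch.min(py2, ty2) - torch.max(pred[..., 1], true[..., 1])).clamp(min=0)
    inter = inter_w * inter_h
    union = pred[..., 2] * pred[..., 3] + true[..., 2] * true[..., 3] - inter
    return inter / (union + 1e-7)

def total_loss(pred_box, true_box, pred_prob, true_label, lambda_1=0.5, lambda_2=1.0):
    iou_loss = (1.0 - box_iou(pred_box, true_box)).mean()         # S231: IoU loss
    conf_loss = ((pred_box - true_box) ** 2).sum(dim=-1).mean()   # S232: frame confidence loss
    eps = 1e-7
    cls_loss = -(true_label * torch.log(pred_prob + eps)          # S232: classification loss
                 + (1 - true_label) * torch.log(1 - pred_prob + eps)).mean()
    # S233: weight and add the two confidence losses; the extra IoU term is assumed.
    return lambda_1 * cls_loss + lambda_2 * conf_loss + iou_loss  # S234: backward() then step()
```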
9. The diaper image detection method based on the improved YOLOv network model according to claim 1, wherein: in step S24, all diaper images of the whole training set are input into the YOLOv network model within one epoch for forward propagation and backward optimization of the network parameters;
in step S25, after each epoch is completed, each diaper image in the verification set is predicted using the updated YOLOv network model parameters.
10. The diaper image detection method based on the improved YOLOv network model according to claim 1, wherein: the step S3 specifically comprises the following steps:
S31, the YOLOv network model outputs N prediction feature maps corresponding to each diaper image;
S32, on each prediction feature map, adjusting the prior frames according to the adjustment vector corresponding to each anchor point to obtain all prediction frames of each paper diaper image;
S33, removing redundant prediction frames by using a non-maximum suppression method to obtain prediction frames on a prediction feature map;
And S34, mapping the prediction frame on the prediction feature map onto the scale of the original image according to the proportional relation to obtain a final prediction frame.
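A brief sketch of the post-processing in steps S33-S34; the score and IoU thresholds are assumed values, and `torchvision.ops.nms` stands in for the non-maximum suppression step.

```python
from torchvision.ops import nms

# Sketch of steps S33-S34: remove redundant prediction frames with non-maximum
# suppression, then map the surviving frames back to the original image scale.
def postprocess(boxes, scores, feature_scale, iou_thresh=0.5, score_thresh=0.25):
    # boxes: (N, 4) as (x1, y1, x2, y2) on the prediction feature map;
    # feature_scale: ratio between the original image size and the feature map size.
    keep = scores > score_thresh
    boxes, scores = boxes[keep], scores[keep]
    keep = nms(boxes, scores, iou_thresh)     # S33: non-maximum suppression
    return boxes[keep] * feature_scale        # S34: map back to the original image scale
```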
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410350445.6A CN117953350B (en) | 2024-03-26 | 2024-03-26 | Diaper image detection method based on improved YOLOv network model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117953350A true CN117953350A (en) | 2024-04-30 |
CN117953350B CN117953350B (en) | 2024-06-11 |
Family
ID=90796545
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410350445.6A Active CN117953350B (en) | 2024-03-26 | 2024-03-26 | Diaper image detection method based on improved YOLOv network model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117953350B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118447023A (en) * | 2024-07-08 | 2024-08-06 | 合肥工业大学 | Method, device and system for detecting embedded part and storage medium |
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150090599A1 (en) * | 2013-10-02 | 2015-04-02 | Tel Nexx, Inc. | Insoluble Anode With a Plurality of Switchable Conductive Elements Used to Control Current Density in a Plating Bath |
WO2021198243A1 (en) * | 2020-03-30 | 2021-10-07 | Carl Zeiss Ag | Method for virtually staining a tissue sample and a device for tissue analysis |
WO2022213307A1 (en) * | 2021-04-07 | 2022-10-13 | Nokia Shanghai Bell Co., Ltd. | Adaptive convolutional neural network for object detection |
US20240062533A1 (en) * | 2022-03-29 | 2024-02-22 | Zhejiang Lab | Visual enhancement method and system based on fusion of spatially aligned features of multiple networked vehicles |
CN116612427A (en) * | 2023-05-08 | 2023-08-18 | 福州大学 | Intensive pedestrian detection system based on improved lightweight YOLOv7 |
CN117079163A (en) * | 2023-08-25 | 2023-11-17 | 杭州智元研究院有限公司 | Aerial image small target detection method based on improved YOLOX-S |
CN117557493A (en) * | 2023-08-30 | 2024-02-13 | 四川轻化工大学 | Transformer oil leakage detection method, system, electronic equipment and storage medium |
CN117095391A (en) * | 2023-09-05 | 2023-11-21 | 新疆农业大学 | Lightweight apple target detection method |
CN117274774A (en) * | 2023-09-20 | 2023-12-22 | 哈尔滨理工大学 | Yolov 7-based X-ray security inspection image dangerous goods detection algorithm |
CN117408970A (en) * | 2023-10-27 | 2024-01-16 | 太原科技大学 | Semantic segmentation-based method for polishing surface defects of medium plate by robot |
CN117523394A (en) * | 2023-11-09 | 2024-02-06 | 无锡学院 | SAR vessel detection method based on aggregation characteristic enhancement network |
CN117372684A (en) * | 2023-11-13 | 2024-01-09 | 南京邮电大学 | Target detection method based on improved YOLOv5s network model |
CN117542082A (en) * | 2023-11-28 | 2024-02-09 | 浙江理工大学 | Pedestrian detection method based on YOLOv7 |
Non-Patent Citations (2)
Title |
---|
RONG JIA: "Underwater Object Detection in Marine Ranching Based on Improved YOLOv8", MDPI, 25 December 2023 (2023-12-25) *
LI Dong; ZHANG Xueying; DUAN Shufei; YAN Mimi: "Dysarthria Recognition Combining Speech Fusion Features and Random Forest", Journal of Xidian University, no. 03, 4 December 2017 (2017-12-04) *
Also Published As
Publication number | Publication date |
---|---|
CN117953350B (en) | 2024-06-11 |
Similar Documents
Publication | Title | Publication Date
---|---|---|
CN117953350B (en) | Diaper image detection method based on improved YOLOv network model | |
CN108564097B (en) | Multi-scale target detection method based on deep convolutional neural network | |
CN108416266B (en) | Method for rapidly identifying video behaviors by extracting moving object through optical flow | |
CN109492529A (en) | A kind of Multi resolution feature extraction and the facial expression recognizing method of global characteristics fusion | |
CN106228185B (en) | A kind of general image classifying and identifying system neural network based and method | |
CN109461157A (en) | Image, semantic dividing method based on multi-stage characteristics fusion and Gauss conditions random field | |
CN110532859A (en) | Remote Sensing Target detection method based on depth evolution beta pruning convolution net | |
CN110287960A (en) | The detection recognition method of curve text in natural scene image | |
CN110263833A (en) | Based on coding-decoding structure image, semantic dividing method | |
CN108509978A (en) | The multi-class targets detection method and model of multi-stage characteristics fusion based on CNN | |
CN108009509A (en) | Vehicle target detection method | |
CN110059698A (en) | The semantic segmentation method and system based on the dense reconstruction in edge understood for streetscape | |
CN112784736B (en) | Character interaction behavior recognition method based on multi-modal feature fusion | |
CN110097090A (en) | A kind of image fine granularity recognition methods based on multi-scale feature fusion | |
CN111488827A (en) | Crowd counting method and system based on multi-scale feature information | |
CN111145145B (en) | Image surface defect detection method based on MobileNet | |
CN104537684A (en) | Real-time moving object extraction method in static scene | |
CN112465199A (en) | Airspace situation evaluation system | |
CN110110663A (en) | A kind of age recognition methods and system based on face character | |
CN110032952A (en) | A kind of road boundary point detecting method based on deep learning | |
CN106845456A (en) | A kind of method of falling over of human body monitoring in video monitoring system | |
CN111652273A (en) | Deep learning-based RGB-D image classification method | |
CN113705655A (en) | Full-automatic classification method for three-dimensional point cloud and deep neural network model | |
CN110310298A (en) | A kind of road target real-time three-dimensional point cloud segmentation method based on cycling condition random field | |
CN112149664A (en) | Target detection method for optimizing classification and positioning tasks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||