CN117953350B - Diaper image detection method based on improved YOLOv7 network model


Publication number: CN117953350B
Authority: CN (China)
Prior art keywords: diaper, YOLOv7, feature, prediction, network model
Legal status: Active
Application number: CN202410350445.6A
Other languages: Chinese (zh)
Other versions: CN117953350A
Inventors: 曹凤姣, 李志彪, 顾柏军, 汪志龙
Current and original assignee: Hangzhou Haoyue Care Products Co ltd
Application filed by Hangzhou Haoyue Care Products Co ltd; priority to CN202410350445.6A
Publication of application CN117953350A; application granted; publication of grant CN117953350B

Landscapes: Image Analysis (AREA)

Abstract

The invention discloses a diaper image detection method based on an improved YOLOv7 network model, belonging to the field of image detection and comprising the following steps: S1, collecting a diaper image data set and preprocessing it; S2, training the YOLOv7 network model; S3, testing: inputting the diaper images in the test set into the trained YOLOv7 network model, predicting all diaper images in the test set with the YOLOv7 network model to obtain prediction frames on the prediction feature maps, mapping the prediction frames onto the original images, and locating the patterns on the diaper to obtain the final diaper image detection result. In this diaper image detection method based on the improved YOLOv7 network model, preprocessing the acquired images reduces the subsequent data processing load and increases the detection speed, while the improved YOLOv7 network model improves detection accuracy and robustness.

Description

Diaper image detection method based on improved YOLOv7 network model
Technical Field
The invention relates to the technical field of image detection, and in particular to a diaper image detection method based on an improved YOLOv7 network model.
Background
Paper diapers are important products in the field of modern infant and adult care, the comfort and absorption properties of which are directly related to the health and quality of life of the user. Diapers generally have a variety of designs, such as animals, cartoon characters, etc., which not only increase the attractiveness of the product, but also attract the attention of the baby, making them more receptive to the use of the diaper.
The image printed on a diaper must be inspected after production to determine whether any part of the pattern is missing. Current diaper image detection methods mainly rely on deep-learning-based image processing, among which the YOLO (You Only Look Once) family of algorithms has attracted attention for its speed and accuracy. YOLOv7 is an improved version of the YOLO algorithm; it uses an end-to-end convolutional neural network architecture that can detect and locate objects in images in real time.
The prior art discloses the following techniques for image detection and recognition using YOLOv7:
CN202211564267.4 discloses a bird identification method, system and medium based on improved YOLOv7. The method comprises: acquiring a first image set, the first image set being historical images of birds in flight or historical images in which part of the bird is occluded; performing motion-blur data enhancement on the first image set to obtain a second image set; constructing a YOLOv7 model and adding a parameter-free attention mechanism to it to obtain an improved YOLOv7 model; training the improved YOLOv7 model on the second image set to obtain an optimal improved YOLOv7 model; and identifying newly acquired bird images with the optimal improved YOLOv7 model to obtain the bird species.
CN202211397515.0 discloses a natural tree species identification method based on improved YOLOv7, comprising: acquiring natural tree species images, including training images and test images; performing Mosaic data enhancement on the training images to obtain enhanced training images; constructing a YOLOv7 network and improving its structure to obtain an improved YOLOv7 model, which comprises a backbone network, a detection head layer network, an attention mechanism module, and Rep and Conv modules, and which outputs four feature maps of different sizes through the detection head layer network; training the improved YOLOv7 model on the training images; and inputting the test set images into the trained improved YOLOv7 model to obtain the identification result of the natural tree species.
As can be seen, conventional pattern recognition and detection methods based on YOLOv7 have the following defects:
1. The entire acquired image must be preprocessed, which increases the amount of computation and reduces the detection speed;
2. Accuracy and robustness need to be further improved.
Disclosure of Invention
To solve these problems, the invention provides a diaper image detection method based on an improved YOLOv7 network model, which reduces the subsequent data processing load by preprocessing the acquired images, increases the detection speed, and improves detection accuracy and robustness through the improved YOLOv7 network model.
To achieve the above object, the invention provides a diaper image detection method based on an improved YOLOv7 network model, comprising the following steps:
S1, collecting paper diaper images, forming a paper diaper image data set, and preprocessing:
S11, adding a labeling file for labeling the type and the position of each paper diaper image as a real frame of the paper diaper image;
S12, carrying out data enhancement on the diaper image data set to obtain an enhanced diaper image data set;
S13, clustering the real frames of all diaper images in the enhanced diaper image data set with the Kmeans clustering algorithm to obtain K prior frames;
S14, dividing the enhanced diaper image data set into a training set, a verification set and a test set according to a preset proportion;
S2, training the YOLOv7 network model:
S3, testing:
Inputting the diaper images in the test set into the trained YOLOv7 network model, predicting all diaper images in the test set with the YOLOv7 network model to obtain prediction frames on the prediction feature maps, mapping the prediction frames onto the original images through the proportional relation between the prediction feature map and the original image, and locating the patterns on the diaper to obtain the final diaper image detection result.
Preferably, in step S11, the diaper image data set includes a plurality of diaper images; the corresponding labeling file is a file in txt format that records the position information and category information of the targets in the diaper image, and the size of each diaper image is 1024 × 1024 pixels;
In step S14, the training set, verification set and test set are divided in proportions of 80%, 10% and 10%, and train.txt, val.txt and test.txt files are generated respectively to save the corresponding image lists.
Preferably, in step S12, the diaper image data set is data-enhanced using Mosaic data enhancement combined, with 20% probability, with Mixup data enhancement;
the Mosaic data enhancement method is as follows:
randomly selecting 4 diaper images and combining them by Mosaic data enhancement to form a new image, which is used as new training data;
with 20% probability, another image is randomly selected and blended with it using Mixup data enhancement to generate new training data.
Preferably, step S13 specifically includes the following step:
clustering the widths and heights of all real frames in the training set with the Kmeans clustering algorithm, the K cluster center coordinates obtained being used as the widths and heights of the K prior frames; each real frame is recorded as (class, xmin, ymin, xmax, ymax), where class denotes the category of the diaper image contained in the real frame, xmin and ymin denote the x and y coordinates of the top-left vertex of the real frame, and xmax and ymax denote the x and y coordinates of the bottom-right vertex of the real frame.
Preferably, step S2 specifically includes the following steps:
S21, randomly selecting X diaper images from the training set and inputting them into the YOLOv7 network model; downsampling the images by factors of 1/8, 1/16 and 1/32 through the backbone of the YOLOv7 network model to extract three effective feature maps of different scales; inputting the effective feature maps into the improved feature aggregation module, which further fuses them to capture global semantic information while generating three prediction feature maps of different scales;
S22, distributing the K prior frames evenly over the three prediction feature maps in advance according to scale, and adjusting the corresponding prior frames according to the anchor point information on the prediction feature maps to obtain prediction frames;
S23, calculating the loss value of the YOLOv7 network model from the prediction frames and the real frames corresponding to the diaper images, to evaluate the difference between the prediction frames and the real frames;
S24, updating the parameters of the YOLOv7 network model according to the loss value, and iterating over the training set until all diaper images in the training set have been input into the YOLOv7 network model once;
S25, inputting the diaper images in the verification set into the trained YOLOv7 network model and predicting each diaper image in the verification set with the YOLOv7 network model to obtain the prediction frames of the verification set;
S26, counting the average precision value of each class of diaper images from the prediction frames of the verification set and the corresponding real frames;
S27, repeating steps S24-S26 until convergence, completing the training of the YOLOv7 network model;
The YOLOv7 network model in step S2 includes a backbone network CSPDARKNET-53, a feature filtering and purifying module FFPM, an improved feature fusion module, and a lightweight detection head, arranged in sequence; in the improved feature fusion module, PANet is replaced with BiFPN; the backbone network CSPDARKNET-53 includes an ELAN module, an MP module, a SPPCSPC module, a C2f module, an InceptionNext module and a multi-scale feature enhancement module MSFE, and a channel attention mechanism is introduced into the SPPCSPC module of the backbone network CSPDARKNET-53; in the lightweight detection head, a 7×7 depthwise convolution replaces the 3×3 convolution kernel and a Selective Kernel attention mechanism is introduced; the activation function of the YOLOv7 network model is changed from Mish to Hard-Swish;
Step S21 specifically includes the following steps:
S211, randomly selecting X diaper images from the training set, inputting them into the backbone network CSPDARKNET-53 and performing stage-by-stage feature extraction; taking out three effective feature maps of different scales and channel numbers while downsampling, outputting three feature layers of different scales, denoted M5, M4 and M3 from small to large according to scale; inputting these feature layers into the feature filtering and purifying module FFPM to filter cross-layer conflicts, and outputting three effective feature maps of different scales L5, L4 and L3;
S212, inputting the three effective feature maps L5, L4 and L3 into the improved feature fusion module for further fusion, progressively fusing deep and shallow features across non-adjacent feature layers, and outputting feature maps P5, P4 and P3 whose scales are consistent with those of the input effective feature maps L5, L4 and L3;
S213, using the lightweight detection head to adjust the channel number of the three output feature maps P5, P4 and P3 to 3 × (5 + num_class), obtaining N prediction feature maps, where num_class denotes the number of categories.
Preferably, step S212 specifically includes the following steps:
S2121, inputting the effective feature map M5 into the SPPCSPC module to obtain a feature map K5, upsampling the feature map K5, inputting it into the multi-scale feature enhancement module MSFE to be fused with the effective feature map M4, and inputting the fusion result into an InceptionNext module to obtain a feature map K4;
S2122, upsampling the feature map K4, inputting it into the multi-scale feature enhancement module MSFE to be fused with the feature map M3, and inputting the fusion result into an InceptionNext module to obtain the shallowest output feature map P3;
S2123, inputting the output feature map P3 into the C2f module, downsampling it, inputting it into the multi-scale feature enhancement module MSFE to be fused with the effective feature map M4 and the feature map K4, and inputting the fusion result into an InceptionNext module of the YOLOv7 network model to obtain the intermediate output feature map P4;
S2124, inputting the output feature map P4 into the C2f module, downsampling it, fusing it with the feature map K5, and inputting the fusion result into an InceptionNext module to obtain the deepest output feature map P5.
Preferably, step S22 specifically includes the following steps:
S221, sorting the K prior frames generated in step S13 by scale and distributing them evenly over the N generated prediction feature maps; dividing each prediction feature map into H × W grids and setting an anchor point at the center of each grid cell;
S222, overlaying K/N prior frames of the corresponding prediction feature map on each anchor point;
S223, each anchor point on the prediction feature map corresponds to a vector of length 3 × (5 + num_class); splitting this vector by dimension yields, for each prior frame, a one-dimensional adjustment vector of length 5 + num_class containing width and height adjustment information, center point coordinate adjustment information and frame confidence adjustment information for the corresponding prior frame; the position and size of the prior frame are adjusted with this adjustment information to obtain the corresponding prediction frame.
Preferably, the step S23 specifically includes the following steps:
S231, comparing each prediction frame with the corresponding real frame and calculating the intersection-over-union (IoU) loss:
L_IoU = 1 - IoU
where L_IoU is the IoU loss value and IoU is the intersection over union between the prediction frame and the real frame;
S232, in the output feature maps, calculating the classification confidence and the frame confidence of each prediction frame, and from them the classification confidence loss and the frame confidence loss; the quantities appearing in the corresponding formulas are: the upper-left corner coordinates, width and height of the real frame; the class truth value, which takes the value 0 or 1; the predicted probability of the diaper image category; the frame confidence loss; the upper-left corner coordinates of the prediction frame; the height of the prediction frame; the width of the prediction frame; the classification confidence; and the classification confidence loss;
S233, multiplying the classification confidence loss and the frame confidence loss by their preset proportions and adding them to obtain the total loss value;
S234, adjusting the YOLOv7 network model parameters with the back-propagation algorithm to minimize the total loss value.
Preferably, in step S24, the diaper images of the whole training set are input into the YOLOv7 network model in one epoch for forward propagation and back-propagation optimization of the network parameters;
In step S25, after each epoch is completed, each diaper image in the verification set is predicted using the updated YOLOv7 network model parameters.
Preferably, step S3 specifically includes the following steps:
S31, the YOLOv7 network model outputs N prediction feature maps for each diaper image;
S32, on each prediction feature map, adjusting the prior frames according to the adjustment vector corresponding to each anchor point to obtain all prediction frames of each diaper image;
S33, removing redundant prediction frames with non-maximum suppression to obtain the prediction frames on the prediction feature map;
S34, mapping the prediction frames on the prediction feature maps onto the scale of the original image according to the proportional relation to obtain the final prediction frames.
The invention has the following beneficial effects:
On the basis of effectively detecting and identifying the various pattern designs on the diaper, the accuracy and robustness of diaper pattern detection are improved by introducing a deeper convolutional neural network structure, an improved loss function and data enhancement techniques, specifically:
(1) By improving the YOLOv7 algorithm, a feature filtering and purifying module (FFPM) is added between the backbone network and the feature fusion module; the FFPM re-fuses, in cascade, the multi-scale features fed to the neck, effectively filtering cross-layer conflicts, strengthening feature learning and improving detection accuracy;
(2) Inspired by the multi-scale convolutional attention module MSCA, a multi-scale feature enhancement module (MSFE) is designed to further optimize the cascaded and fused features and provide rich guiding information for the shallow features;
(3) A Coordinate Attention (CA) module is introduced and fused with the dimension-reduced shallow features, preserving the spatial position information of the diaper image targets and further strengthening the features;
(4) Considering the redundant connections in the original feature fusion module PANet, BiFPN is introduced to improve the feature fusion efficiency of the network model; like PANet, BiFPN is a bidirectional pyramid structure containing both top-down and bottom-up feature flows, so the accuracy of the network model is improved, as illustrated by the sketch below.
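As an illustration of point (4), the following is a minimal sketch of the general BiFPN-style weighted fusion idea: learnable non-negative weights combine the feature maps flowing top-down and bottom-up. It is a generic example under assumed names, not the patent's exact module.

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """BiFPN-style fusion: learnable non-negative weights over same-shaped feature maps."""
    def __init__(self, n_inputs, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))  # one weight per incoming feature map
        self.eps = eps

    def forward(self, feats):
        # feats: list of feature maps with identical shapes to be fused
        w = torch.relu(self.w)                       # keep the weights non-negative
        w = w / (w.sum() + self.eps)                 # normalise so the weights sum to ~1
        return sum(wi * f for wi, f in zip(w, feats))
```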
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
FIG. 1 is a logic flow diagram of the diaper image detection method based on the improved YOLOv7 network model of the present invention;
FIG. 2 is a schematic diagram of the InceptionNext module structure in the diaper image detection method based on the improved YOLOv7 network model of the present invention;
FIG. 3 is a schematic structural diagram of the feature filtering and purifying module FFPM in the diaper image detection method based on the improved YOLOv7 network model of the present invention;
FIG. 4 is a schematic diagram of the channel attention mechanism (CA) in the diaper image detection method based on the improved YOLOv7 network model of the present invention;
FIG. 5 is a graph showing the results of the simulation experiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application more apparent, the embodiments of the present application will be further described in detail below with reference to the accompanying drawings and examples. It should be understood that the detailed description and specific examples, while indicating the embodiment of the application, are intended for purposes of illustration only and are not intended to limit the scope of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application. Examples of the embodiments are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements throughout or elements having like or similar functionality.
It should be noted that the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
As shown in FIGS. 1-4, the diaper image detection method based on the improved YOLOv7 network model comprises the following steps:
S1, collecting diaper images, forming a diaper image data set, and preprocessing:
S11, adding a labeling file for labeling the type and position of each diaper image as the real frame of the diaper image; the labeling file serves as the reference standard during training for calculating the network loss value and evaluating model performance;
In step S11, the diaper image data set includes a plurality of diaper images; the corresponding labeling file is a file in txt format that records the position information and category information of the targets in the diaper image, and the size of each diaper image is 1024 × 1024 pixels;
S12, carrying out data enhancement on the diaper image data set to obtain an enhanced diaper image data set, which increases the diversity and richness of the data and improves the generalization ability of the model;
In step S12, the diaper image data set is data-enhanced using Mosaic data enhancement combined, with 20% probability, with Mixup data enhancement;
the Mosaic data enhancement method is as follows:
randomly selecting 4 diaper images and combining them by Mosaic data enhancement to form a new image, which is used as new training data;
with 20% probability, another image is randomly selected and blended with it using Mixup data enhancement to generate new training data.
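A minimal sketch of the Mosaic-plus-Mixup combination described above (bounding-box labels are omitted for brevity; the dataset, mosaic and augment names, the Beta(8, 8) mixing coefficient and the nearest-neighbour resize are assumptions for illustration, not the patent's exact implementation):

```python
import random
import numpy as np

def mosaic(images, size=1024):
    """Tile 4 images into one size x size canvas, one per quadrant, after halving each."""
    canvas = np.zeros((size, size, 3), dtype=np.float32)
    half = size // 2
    for idx, img in enumerate(images[:4]):
        ys = np.linspace(0, img.shape[0] - 1, half).astype(int)  # nearest-neighbour row indices
        xs = np.linspace(0, img.shape[1] - 1, half).astype(int)  # nearest-neighbour column indices
        patch = img[ys][:, xs]
        r, c = divmod(idx, 2)
        canvas[r * half:(r + 1) * half, c * half:(c + 1) * half] = patch
    return canvas

def augment(dataset, mixup_prob=0.2):
    """One augmented sample: Mosaic of 4 random diaper images, then Mixup with 20% probability."""
    imgs = [dataset[random.randrange(len(dataset))] for _ in range(4)]
    new_img = mosaic(imgs)
    if random.random() < mixup_prob:
        # blend with one additional randomly selected 1024 x 1024 image
        other = dataset[random.randrange(len(dataset))].astype(np.float32)
        lam = np.random.beta(8.0, 8.0)  # assumed Mixup mixing coefficient
        new_img = lam * new_img + (1.0 - lam) * other
    return new_img
```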
S13, clustering the real frames of all diaper images in the enhanced diaper image data set with the Kmeans clustering algorithm to obtain K prior frames;
Preferably, step S13 specifically includes the following step:
clustering the widths and heights of all real frames in the training set with the Kmeans clustering algorithm, the K cluster center coordinates obtained being used as the widths and heights of the K prior frames (K is generally taken as 9); each real frame is recorded as (class, xmin, ymin, xmax, ymax), where class denotes the category of the diaper image contained in the real frame, xmin and ymin denote the x and y coordinates of the top-left vertex of the real frame, and xmax and ymax denote the x and y coordinates of the bottom-right vertex of the real frame.
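A minimal sketch of the clustering in step S13, assuming a plain Euclidean-distance Kmeans over box widths and heights (YOLO implementations often use an IoU-based distance instead); function and variable names are illustrative:

```python
import numpy as np

def kmeans_priors(boxes, k=9, iters=100, seed=0):
    """boxes: (N, 4) array of (xmin, ymin, xmax, ymax); returns (k, 2) prior widths/heights."""
    wh = np.stack([boxes[:, 2] - boxes[:, 0], boxes[:, 3] - boxes[:, 1]], axis=1).astype(float)
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), size=k, replace=False)]   # initial cluster centres
    for _ in range(iters):
        # assign each real-frame width/height pair to the nearest cluster centre
        d = ((wh[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        new_centers = np.array([wh[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # sort by area so the priors can be split evenly across the three prediction scales
    return centers[np.argsort(centers.prod(axis=1))]
```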
S14, dividing the enhanced diaper image data set into a training set, a verification set and a test set according to a preset proportion;
In step S14, the training set, verification set and test set are divided in proportions of 80%, 10% and 10%, and train.txt, val.txt and test.txt files are generated respectively to save the corresponding image lists.
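A minimal sketch of the 80% / 10% / 10% split and the train.txt / val.txt / test.txt image lists of step S14; the directory layout and the .jpg extension are assumptions:

```python
import random
from pathlib import Path

def split_dataset(image_dir, out_dir, seed=0):
    """Shuffle image paths, split 80/10/10 and write one list file per split."""
    paths = sorted(str(p) for p in Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(paths)
    n = len(paths)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    splits = {
        "train.txt": paths[:n_train],
        "val.txt": paths[n_train:n_train + n_val],
        "test.txt": paths[n_train + n_val:],
    }
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, items in splits.items():
        (out / name).write_text("\n".join(items) + "\n")   # save the image list for this split
    return splits
```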
S2, training the YOLOv7 network model:
In this embodiment, the training environment of the network is Python 3.8 with the deep learning framework PyTorch 1.8, accelerated with CUDA; the learning rate adjustment strategy is cosine annealing decay with an initial learning rate of 0.0001; the number of training epochs is set to 300; and the momentum parameter of the network is set to 0.937;
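A minimal sketch of this training configuration in PyTorch (cosine annealing from an initial learning rate of 0.0001 over 300 epochs, momentum 0.937, CUDA when available); the choice of SGD is an assumption, since the embodiment does not name the optimizer:

```python
import torch

def build_training(model, epochs=300, lr=1e-4, momentum=0.937):
    """Return the model on the training device plus optimizer and LR scheduler."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)
    # cosine annealing decay of the learning rate over the full training run
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    return model, optimizer, scheduler, device
```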
The YOLOv7 network model in step S2 includes a backbone network CSPDARKNET-53, a feature filtering and purifying module FFPM, an improved feature fusion module, and a lightweight detection head, arranged in sequence; in the improved feature fusion module, PANet is replaced with BiFPN; the backbone network CSPDARKNET-53 includes an ELAN module, an MP module, a SPPCSPC module, a C2f module, an InceptionNext module and a multi-scale feature enhancement module MSFE, and a channel attention mechanism is introduced into the SPPCSPC module of the backbone network CSPDARKNET-53; in the lightweight detection head, a 7×7 depthwise convolution replaces the 3×3 convolution kernel and a Selective Kernel attention mechanism is introduced; the activation function of the YOLOv7 network model is changed from Mish to Hard-Swish;
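A minimal sketch of the lightweight detection head idea described above: a 7×7 depthwise convolution in place of a 3×3 kernel, a Hard-Swish activation, and an output of 3 × (5 + num_class) channels per prediction scale. The exact layer arrangement is an assumption, and the Selective Kernel attention is omitted for brevity:

```python
import torch.nn as nn

class LightweightHead(nn.Module):
    """Depthwise 7x7 conv + Hard-Swish, then a pointwise projection to 3*(5+num_class) channels."""
    def __init__(self, in_channels, num_class):
        super().__init__()
        self.dw = nn.Conv2d(in_channels, in_channels, kernel_size=7, padding=3,
                            groups=in_channels, bias=False)               # 7x7 depthwise convolution
        self.bn = nn.BatchNorm2d(in_channels)
        self.act = nn.Hardswish()                                          # Hard-Swish activation
        self.pw = nn.Conv2d(in_channels, 3 * (5 + num_class), kernel_size=1)  # prediction channels

    def forward(self, x):
        return self.pw(self.act(self.bn(self.dw(x))))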
Step S2 specifically includes the following steps:
S21, randomly selecting X diaper images from the training set and inputting them into the YOLOv7 network model; extracting effective feature maps of different scales through the backbone of the YOLOv7 network model; inputting the effective feature maps into the improved feature aggregation module, which further fuses them to capture global semantic information while generating three prediction feature maps of different scales;
Step S21 specifically includes the following steps:
S211, randomly selecting X diaper images from the training set, inputting them into the backbone network CSPDARKNET-53 and performing stage-by-stage feature extraction; taking out three effective feature maps of different scales and channel numbers while downsampling by factors of 1/8, 1/16 and 1/32, outputting three feature layers of different scales, denoted M5, M4 and M3 from small to large according to scale; inputting these feature layers into the feature filtering and purifying module FFPM to filter cross-layer conflicts, and outputting three effective feature maps of different scales L5, L4 and L3;
S212, inputting the three effective feature maps L5, L4 and L3 into the improved feature fusion module for further fusion, progressively fusing deep and shallow features across non-adjacent feature layers, and outputting feature maps P5, P4 and P3 whose scales are consistent with those of the input effective feature maps L5, L4 and L3;
Step S212 specifically includes the following steps:
S2121, inputting the effective feature map M5 into the SPPCSPC module to obtain a feature map K5, upsampling the feature map K5, inputting it into the multi-scale feature enhancement module MSFE to be fused with the effective feature map M4, and inputting the fusion result into an InceptionNext module to obtain a feature map K4;
S2122, upsampling the feature map K4, inputting it into the multi-scale feature enhancement module MSFE to be fused with the feature map M3, and inputting the fusion result into an InceptionNext module to obtain the shallowest output feature map P3;
S2123, inputting the output feature map P3 into the C2f module, downsampling it, inputting it into the multi-scale feature enhancement module MSFE to be fused with the effective feature map M4 and the feature map K4, and inputting the fusion result into an InceptionNext module of the YOLOv7 network model to obtain the intermediate output feature map P4;
S2124, inputting the output feature map P4 into the C2f module, downsampling it, fusing it with the feature map K5, and inputting the fusion result into an InceptionNext module to obtain the deepest output feature map P5.
S213, using the lightweight detection head to adjust the channel number of the three output feature maps P5, P4 and P3 to 3 × (5 + num_class), obtaining N prediction feature maps, where num_class denotes the number of categories.
S22, distributing the K prior frames evenly over the three prediction feature maps in advance according to scale, and adjusting the corresponding prior frames according to the anchor point information on the prediction feature maps to obtain prediction frames;
Step S22 specifically includes the following steps:
S221, sorting the K prior frames generated in step S13 by scale and distributing them evenly over the N generated prediction feature maps; dividing each prediction feature map into H × W grids and setting an anchor point at the center of each grid cell;
S222, overlaying K/N prior frames of the corresponding prediction feature map on each anchor point, where "/" denotes division;
S223, each anchor point on the prediction feature map corresponds to a vector of length 3 × (5 + num_class); splitting this vector by dimension yields, for each prior frame, a one-dimensional adjustment vector of length 5 + num_class containing width and height adjustment information, center point coordinate adjustment information and frame confidence adjustment information for the corresponding prior frame; the position and size of the prior frame are adjusted with this adjustment information to obtain the corresponding prediction frame.
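A minimal sketch of step S223, splitting each anchor point's 3 × (5 + num_class) vector into three per-prior adjustment vectors and turning prior frames into prediction frames; the sigmoid/exp decoding form is the standard YOLO transform and is an assumption here, since the text only states that the priors are adjusted:

```python
import torch

def decode_predictions(feat, priors, num_class):
    """feat: (B, 3*(5+num_class), H, W) prediction feature map;
    priors: (3, 2) tensor of prior widths/heights in feature-map units."""
    b, _, h, w = feat.shape
    feat = feat.view(b, 3, 5 + num_class, h, w).permute(0, 3, 4, 1, 2)  # (B, H, W, 3, 5+num_class)
    ys = torch.arange(h).view(h, 1).expand(h, w)
    xs = torch.arange(w).view(1, w).expand(h, w)
    grid = torch.stack([xs, ys], dim=-1).view(1, h, w, 1, 2).float()    # anchor (grid-cell) positions
    xy = torch.sigmoid(feat[..., 0:2]) + grid                           # centre-point adjustment
    wh = priors.view(1, 1, 1, 3, 2) * torch.exp(feat[..., 2:4])         # width/height adjustment of the prior
    obj = torch.sigmoid(feat[..., 4:5])                                 # frame confidence
    cls = torch.sigmoid(feat[..., 5:])                                  # per-class confidence
    return torch.cat([xy, wh], dim=-1), obj, cls                        # boxes as (cx, cy, w, h)
```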
S23, calculating the loss value of the YOLOv7 network model from the prediction frames and the real frames corresponding to the diaper images, to evaluate the difference between the prediction frames and the real frames;
Step S23 specifically includes the following steps:
S231, comparing each prediction frame with the corresponding real frame and calculating the intersection-over-union (IoU) loss:
L_IoU = 1 - IoU
where L_IoU is the IoU loss value and IoU is the intersection over union between the prediction frame and the real frame;
S232, in the output feature maps, calculating the classification confidence and the frame confidence of each prediction frame, and from them the classification confidence loss and the frame confidence loss; the quantities appearing in the corresponding formulas are: the upper-left corner coordinates, width and height of the real frame; the class truth value, which takes the value 0 or 1; the predicted probability of the diaper image category; the frame confidence loss; the upper-left corner coordinates of the prediction frame; the height of the prediction frame; the width of the prediction frame; the classification confidence; and the classification confidence loss;
S233, multiplying the classification confidence loss and the frame confidence loss by their preset proportions and adding them to obtain the total loss value;
S234, adjusting the YOLOv7 network model parameters with the back-propagation algorithm to minimize the total loss value.
In S231-S234, the intersection-over-union (IoU) loss is calculated from the prediction frames and the corresponding ground-truth (GT) frames; the classification confidence loss and the frame confidence loss are calculated from the classification confidence and frame confidence of each prediction frame contained in the network output feature maps; the IoU loss, classification confidence loss and frame confidence loss are then weighted and summed according to preset proportions to obtain the overall network loss, and the network parameters are optimized by back propagation.
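A minimal sketch of the loss combination in S231-S234: an IoU loss per matched prediction frame plus frame-confidence and classification-confidence terms, summed with preset weights. The 1 - IoU form, the binary cross-entropy terms and the weight values are assumptions standing in for the patent's exact formulas:

```python
import torch
import torch.nn.functional as F

def iou_loss(pred, target, eps=1e-7):
    """pred, target: (N, 4) boxes as (xmin, ymin, xmax, ymax); returns mean 1 - IoU."""
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)          # overlap area
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    return (1.0 - iou).mean()

def total_loss(pred_boxes, gt_boxes, obj_logits, obj_targets, cls_logits, cls_targets,
               w_iou=0.05, w_obj=1.0, w_cls=0.5):
    # weighted sum of IoU, frame-confidence and classification-confidence losses
    l_iou = iou_loss(pred_boxes, gt_boxes)
    l_obj = F.binary_cross_entropy_with_logits(obj_logits, obj_targets)
    l_cls = F.binary_cross_entropy_with_logits(cls_logits, cls_targets)
    return w_iou * l_iou + w_obj * l_obj + w_cls * l_cls
```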
S24, updating the parameters of the YOLOv7 network model according to the loss value, and iterating over the training set until all diaper images in the training set have been input into the YOLOv7 network model once;
In step S24, the diaper images of the whole training set are input into the YOLOv7 network model in one epoch for forward propagation and back-propagation optimization of the network parameters;
S25, inputting the diaper images in the verification set into the trained YOLOv7 network model and predicting each diaper image in the verification set with the YOLOv7 network model to obtain the prediction frames of the verification set;
In step S25, after each epoch is completed, each diaper image in the verification set is predicted using the updated YOLOv7 network model parameters.
S26, counting the average precision value of each class of diaper images from the prediction frames of the verification set and the corresponding real frames;
S27, repeating steps S24-S26 until convergence, completing the training of the YOLOv7 network model; in this embodiment, convergence is judged by the AP value remaining unchanged, or showing a downward trend, over several consecutive rounds, which indicates that the performance of the YOLOv7 network model on the verification set has reached a stable level, so training is deemed complete;
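A minimal sketch of the convergence check in S27, stopping when the validation AP has not improved for several consecutive epochs; the patience value is an assumption:

```python
class EarlyStopping:
    """Track the best validation AP and flag convergence after `patience` epochs without improvement."""
    def __init__(self, patience=10):
        self.best = float("-inf")
        self.bad_epochs = 0
        self.patience = patience

    def step(self, ap):
        if ap > self.best:
            self.best, self.bad_epochs = ap, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience   # True means training is considered converged
```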
S3, testing:
Inputting the diaper images in the test set into the trained YOLOv7 network model, predicting all diaper images in the test set with the YOLOv7 network model to obtain prediction frames on the prediction feature maps, mapping the prediction frames onto the original images through the proportional relation between the prediction feature map and the original image, and locating the patterns on the diaper to obtain the final diaper image detection result.
Step S3 specifically includes the following steps:
S31, the YOLOv7 network model outputs N prediction feature maps for each diaper image;
S32, on each prediction feature map, adjusting the prior frames according to the adjustment vector corresponding to each anchor point to obtain all prediction frames of each diaper image;
S33, removing redundant prediction frames with non-maximum suppression to obtain the prediction frames on the prediction feature map;
S34, mapping the prediction frames on the prediction feature maps onto the scale of the original image according to the proportional relation to obtain the final prediction frames.
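A minimal sketch of steps S33-S34: non-maximum suppression over the prediction frames followed by rescaling from feature-map coordinates to the original image; torchvision's nms and the threshold values are assumed implementation details:

```python
import torch
from torchvision.ops import nms

def postprocess(boxes, scores, stride, iou_thresh=0.45, score_thresh=0.25):
    """boxes: (N, 4) as (xmin, ymin, xmax, ymax) on the prediction feature map;
    stride: ratio between the original image and the feature map (e.g. 8, 16 or 32)."""
    keep = scores > score_thresh                 # drop low-confidence prediction frames
    boxes, scores = boxes[keep], scores[keep]
    keep = nms(boxes, scores, iou_thresh)        # remove redundant, overlapping frames
    return boxes[keep] * stride, scores[keep]    # map back onto the original image scale
```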
Simulation experiment
To verify the performance of the method proposed in this embodiment, the images in the test set are predicted with the improved YOLOv7 network, and the mean average precision (mAP) and the precision and recall for each category are calculated from the prediction results and the GT. As shown in FIG. 5, the invention can detect the various patterns on diapers with high accuracy.
Therefore, the diaper image detection method based on the improved YOLOv7 network model, by introducing a fast attention mechanism at specific positions and adopting structures such as an efficient detection head based on depthwise convolution, solves problems such as the poor detection effect and low detection speed for small targets in the diaper pattern detection task; compared with existing advanced detection models it has clear advantages in detection accuracy and efficiency, and it meets the real-time requirements of practical scenarios well.
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solution of the invention. Although the invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that the technical solution of the invention may be modified or equivalently replaced without departing from the spirit and scope of the technical solution of the invention.

Claims (8)

1. A diaper image detection method based on an improved YOLOv7 network model, characterized by comprising the following steps:
S1, collecting paper diaper images, forming a paper diaper image data set, and preprocessing:
S11, adding a labeling file for labeling the type and the position of each paper diaper image as a real frame of the paper diaper image;
s12, carrying out data enhancement on the diaper image data set to obtain an enhanced diaper image data set;
S13, clustering the real frames of all diaper images in the enhanced diaper image data set with the Kmeans clustering algorithm to obtain K prior frames;
S14, dividing the enhanced diaper image data set into a training set, a verification set and a test set according to a preset proportion;
S2, training the YOLOv7 network model:
the step S2 specifically comprises the following steps:
S21, randomly selecting X diaper images from the training set and inputting them into the YOLOv7 network model; downsampling the images by factors of 1/8, 1/16 and 1/32 through the backbone of the YOLOv7 network model to extract three effective feature maps of different scales; inputting the effective feature maps into the improved feature aggregation module, which further fuses them to capture global semantic information while generating three prediction feature maps of different scales;
S22, distributing the K prior frames evenly over the three prediction feature maps in advance according to scale, and adjusting the corresponding prior frames according to the anchor point information on the prediction feature maps to obtain prediction frames;
S23, calculating the loss value of the YOLOv7 network model from the prediction frames and the real frames corresponding to the diaper images, to evaluate the difference between the prediction frames and the real frames;
S24, updating the parameters of the YOLOv7 network model according to the loss value, and iterating over the training set until all diaper images in the training set have been input into the YOLOv7 network model once;
S25, inputting the diaper images in the verification set into the trained YOLOv7 network model and predicting each diaper image in the verification set with the YOLOv7 network model to obtain the prediction frames of the verification set;
S26, counting the average precision value of each class of diaper images from the prediction frames of the verification set and the corresponding real frames;
S27, repeating steps S24-S26 until convergence, completing the training of the YOLOv7 network model;
The YOLOv7 network model in step S2 includes a backbone network CSPDARKNET-53, a feature filtering and purifying module FFPM, an improved feature fusion module, and a lightweight detection head, arranged in sequence; in the improved feature fusion module, PANet is replaced with BiFPN; the backbone network CSPDARKNET-53 includes an ELAN module, an MP module, a SPPCSPC module, a C2f module, an InceptionNext module and a multi-scale feature enhancement module MSFE, and a channel attention mechanism is introduced into the SPPCSPC module of the backbone network CSPDARKNET-53; in the lightweight detection head, a 7×7 depthwise convolution replaces the 3×3 convolution kernel and a Selective Kernel attention mechanism is introduced; the activation function of the YOLOv7 network model is changed from Mish to Hard-Swish;
Step S21 specifically includes the following steps:
S211, randomly selecting X diaper images from the training set, inputting them into the backbone network CSPDARKNET-53 and performing stage-by-stage feature extraction; taking out three effective feature maps of different scales and channel numbers while downsampling, outputting three feature layers of different scales, denoted M5, M4 and M3 from small to large according to scale; inputting these feature layers into the feature filtering and purifying module FFPM to filter cross-layer conflicts, and outputting three effective feature maps of different scales L5, L4 and L3;
S212, inputting the three effective feature maps L5, L4 and L3 into the improved feature fusion module for further fusion, progressively fusing deep and shallow features across non-adjacent feature layers, and outputting feature maps P5, P4 and P3 whose scales are consistent with those of the input effective feature maps L5, L4 and L3;
Step S212 specifically includes the following steps:
S2121, inputting the effective feature map M5 into the SPPCSPC module to obtain a feature map K5, upsampling the feature map K5, inputting it into the multi-scale feature enhancement module MSFE to be fused with the effective feature map M4, and inputting the fusion result into an InceptionNext module to obtain a feature map K4;
S2122, upsampling the feature map K4, inputting it into the multi-scale feature enhancement module MSFE to be fused with the feature map M3, and inputting the fusion result into an InceptionNext module to obtain the shallowest output feature map P3;
S2123, inputting the output feature map P3 into the C2f module, downsampling it, inputting it into the multi-scale feature enhancement module MSFE to be fused with the effective feature map M4 and the feature map K4, and inputting the fusion result into an InceptionNext module of the YOLOv7 network model to obtain the intermediate output feature map P4;
S2124, inputting the output feature map P4 into the C2f module, downsampling it, fusing it with the feature map K5, and inputting the fusion result into an InceptionNext module to obtain the deepest output feature map P5;
S213, using the lightweight detection head to adjust the channel number of the three output feature maps P5, P4 and P3 to 3 × (5 + num_class), obtaining N prediction feature maps, where num_class denotes the number of categories;
S3, testing:
Inputting the diaper images in the test set into the trained YOLOv7 network model, predicting all diaper images in the test set with the YOLOv7 network model to obtain prediction frames on the prediction feature maps, mapping the prediction frames onto the original images through the proportional relation between the prediction feature map and the original image, and locating the patterns on the diaper to obtain the final diaper image detection result.
2. The diaper image detection method based on the improved YOLOv7 network model according to claim 1, characterized in that: in step S11, the diaper image data set includes a plurality of diaper images; the corresponding labeling file is a file in txt format that records the position information and category information of the targets in the diaper image, and the size of each diaper image is 1024 × 1024 pixels;
In step S14, the training set, verification set and test set are divided in proportions of 80%, 10% and 10%, and train.txt, val.txt and test.txt files are generated respectively to save the corresponding image lists.
3. The diaper image detection method based on the improved YOLOv7 network model according to claim 1, characterized in that: in step S12, the diaper image data set is data-enhanced using Mosaic data enhancement combined, with 20% probability, with Mixup data enhancement;
the Mosaic data enhancement method is as follows:
randomly selecting 4 diaper images and combining them by Mosaic data enhancement to form a new image, which is used as new training data;
with 20% probability, another image is randomly selected and blended with it using Mixup data enhancement to generate new training data.
4. The diaper image detection method based on the improved YOLOv7 network model according to claim 1, characterized in that: step S13 specifically includes the following step:
clustering the widths and heights of all real frames in the training set with the Kmeans clustering algorithm, the K cluster center coordinates obtained being used as the widths and heights of the K prior frames; each real frame is recorded as (class, xmin, ymin, xmax, ymax), where class denotes the category of the diaper image contained in the real frame, xmin and ymin denote the x and y coordinates of the top-left vertex of the real frame, and xmax and ymax denote the x and y coordinates of the bottom-right vertex of the real frame.
5. The diaper image detection method based on the improved YOLOv7 network model according to claim 1, characterized in that: step S22 specifically includes the following steps:
S221, sorting the K prior frames generated in step S13 by scale and distributing them evenly over the N generated prediction feature maps; dividing each prediction feature map into H × W grids and setting an anchor point at the center of each grid cell;
S222, overlaying K/N prior frames of the corresponding prediction feature map on each anchor point;
S223, each anchor point on the prediction feature map corresponds to a vector of length 3 × (5 + num_class); splitting this vector by dimension yields, for each prior frame, a one-dimensional adjustment vector of length 5 + num_class containing width and height adjustment information, center point coordinate adjustment information and frame confidence adjustment information for the corresponding prior frame; the position and size of the prior frame are adjusted with this adjustment information to obtain the corresponding prediction frame.
6. The diaper image detection method based on the improved YOLOv7 network model according to claim 1, characterized in that: step S23 specifically includes the following steps:
S231, comparing each prediction frame with the corresponding real frame and calculating the intersection-over-union (IoU) loss:
L_IoU = 1 - IoU
where L_IoU is the IoU loss value and IoU is the intersection over union between the prediction frame and the real frame;
S232, in the output feature maps, calculating the classification confidence and the frame confidence of each prediction frame, and from them the classification confidence loss and the frame confidence loss; the quantities appearing in the corresponding formulas are: the upper-left corner coordinates, width and height of the real frame; the class truth value, which takes the value 0 or 1; the predicted probability of the diaper image category; the frame confidence loss; the upper-left corner coordinates of the prediction frame; the height of the prediction frame; the width of the prediction frame; the classification confidence; and the classification confidence loss;
S233, multiplying the classification confidence loss and the frame confidence loss by their preset proportions and adding them to obtain the total loss value;
S234, adjusting the YOLOv7 network model parameters with the back-propagation algorithm to minimize the total loss value.
7. The diaper image detection method based on the improved YOLOv7 network model according to claim 1, characterized in that: in step S24, the diaper images of the whole training set are input into the YOLOv7 network model in one epoch for forward propagation and back-propagation optimization of the network parameters;
In step S25, after each epoch is completed, each diaper image in the verification set is predicted using the updated YOLOv7 network model parameters.
8. The diaper image detection method based on the improved YOLOv7 network model according to claim 1, characterized in that: step S3 specifically includes the following steps:
S31, the YOLOv7 network model outputs N prediction feature maps for each diaper image;
S32, on each prediction feature map, adjusting the prior frames according to the adjustment vector corresponding to each anchor point to obtain all prediction frames of each diaper image;
S33, removing redundant prediction frames with non-maximum suppression to obtain the prediction frames on the prediction feature map;
S34, mapping the prediction frames on the prediction feature maps onto the scale of the original image according to the proportional relation to obtain the final prediction frames.
CN202410350445.6A, filed 2024-03-26 (priority date 2024-03-26): Diaper image detection method based on improved YOLOv7 network model; granted as CN117953350B; status Active


Publications (2)

CN117953350A, published 2024-04-30
CN117953350B (grant), published 2024-06-11

Family ID: 90796545



Family Cites Families (2)

US20150090599A1 (en) * 2013-10-02 2015-04-02 Tel Nexx, Inc. Insoluble Anode With a Plurality of Switchable Conductive Elements Used to Control Current Density in a Plating Bath
CN114419605B (en) * 2022-03-29 2022-07-19 之江实验室 Visual enhancement method and system based on multi-network vehicle-connected space alignment feature fusion

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021198243A1 (en) * 2020-03-30 2021-10-07 Carl Zeiss Ag Method for virtually staining a tissue sample and a device for tissue analysis
WO2022213307A1 (en) * 2021-04-07 2022-10-13 Nokia Shanghai Bell Co., Ltd. Adaptive convolutional neural network for object detection
CN116612427A (en) * 2023-05-08 2023-08-18 福州大学 Intensive pedestrian detection system based on improved lightweight YOLOv7
CN117079163A (en) * 2023-08-25 2023-11-17 杭州智元研究院有限公司 Aerial image small target detection method based on improved YOLOX-S
CN117557493A (en) * 2023-08-30 2024-02-13 四川轻化工大学 Transformer oil leakage detection method, system, electronic equipment and storage medium
CN117095391A (en) * 2023-09-05 2023-11-21 新疆农业大学 Lightweight apple target detection method
CN117274774A (en) * 2023-09-20 2023-12-22 哈尔滨理工大学 YOLOv7-based X-ray security inspection image dangerous goods detection algorithm
CN117408970A (en) * 2023-10-27 2024-01-16 太原科技大学 Semantic segmentation-based method for polishing surface defects of medium plate by robot
CN117523394A (en) * 2023-11-09 2024-02-06 无锡学院 SAR vessel detection method based on aggregation characteristic enhancement network
CN117372684A (en) * 2023-11-13 2024-01-09 南京邮电大学 Target detection method based on improved YOLOv5s network model
CN117542082A (en) * 2023-11-28 2024-02-09 浙江理工大学 Pedestrian detection method based on YOLOv7

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Underwater Object Detection in Marine Ranching Based on Improved YOLOv8; Rong Jia; MDPI; 2023-12-25; full text *
Dysarthria recognition combining speech fusion features and random forests; 李东; 张雪英; 段淑斐; 闫密密; Journal of Xidian University; 2017-12-04 (No. 03); full text *



Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant