CN117953350B - Diaper image detection method based on improved YOLOv7 network model


Publication number: CN117953350B
Authority: CN (China)
Prior art keywords: diaper, YOLOv7, feature, prediction, network model
Legal status: Active
Application number: CN202410350445.6A
Other languages: Chinese (zh)
Other versions: CN117953350A
Inventors: 曹凤姣, 李志彪, 顾柏军, 汪志龙
Current and original assignee: Hangzhou Haoyue Care Products Co ltd
Application filed by Hangzhou Haoyue Care Products Co ltd; priority to CN202410350445.6A
Publication of application CN117953350A; application granted; publication of grant CN117953350B

Landscapes: Image Analysis (AREA)

Abstract

The invention discloses a diaper image detection method based on an improved YOLOv7 network model, belonging to the field of image detection and comprising the following steps: S1, collecting a diaper image data set and preprocessing it; S2, training the YOLOv7 network model; S3, testing: inputting the diaper images in the test set into the trained YOLOv7 network model, predicting all diaper images in the test set with the YOLOv7 network model to obtain prediction frames on the prediction feature maps, mapping the prediction frames onto the original images, and locating the patterns on the diaper to obtain the final diaper image detection result. In this diaper image detection method based on the improved YOLOv7 network model, preprocessing the acquired images reduces the subsequent data processing load and increases the detection speed, while the improved YOLOv7 network model improves detection accuracy and robustness.

Description

Diaper image detection method based on improved YOLOv7 network model
Technical Field
The invention relates to the technical field of image detection, and in particular to a diaper image detection method based on an improved YOLOv7 network model.
Background
Paper diapers are important products in the field of modern infant and adult care, the comfort and absorption properties of which are directly related to the health and quality of life of the user. Diapers generally have a variety of designs, such as animals, cartoon characters, etc., which not only increase the attractiveness of the product, but also attract the attention of the baby, making them more receptive to the use of the diaper.
The image printed on a diaper must be inspected after production to determine whether any part of the pattern is missing. Current diaper image detection methods mainly rely on deep-learning-based image processing, among which the YOLO (You Only Look Once) family of algorithms has attracted attention for its speed and accuracy. YOLOv7 is an improved version of the YOLO algorithm; it uses an end-to-end convolutional neural network architecture that can detect and locate objects in images in real time.
The prior art discloses the following techniques for image detection and recognition using YOLOv7:
CN202211564267.4 discloses a bird identification method, system and medium based on improved YOLOv7. The method comprises: acquiring a first image set, the first image set being historical images of birds in flight or historical images in which part of the bird is occluded; performing motion-blur data enhancement on the first image set to obtain a second image set; constructing a YOLOv7 model and adding a parameter-free attention mechanism to it to obtain an improved YOLOv7 model; training the improved YOLOv7 model on the second image set to obtain an optimal improved YOLOv7 model; and identifying newly acquired bird images with the optimal improved YOLOv7 model to obtain the bird species.
CN202211397515.0 discloses a natural tree species identification method based on improved YOLOv7, comprising: acquiring natural tree species images, including training images and test images; performing Mosaic data enhancement on the training images to obtain enhanced training images; constructing a YOLOv7 network and improving its structure to obtain an improved YOLOv7 model, which comprises a backbone network, a detection head layer network, an attention mechanism module, and Rep and Conv modules, and which outputs four feature maps of different sizes through the detection head layer network; training the improved YOLOv7 model on the training images; and inputting the test set images into the trained improved YOLOv7 model to obtain the identification result of the natural tree species.
As can be seen, conventional pattern recognition and detection methods based on YOLOv7 have the following defects:
1. The entire acquired image must be preprocessed, which increases the amount of computation and reduces the detection speed;
2. Accuracy and robustness need to be further improved.
Disclosure of Invention
To solve these problems, the invention provides a diaper image detection method based on an improved YOLOv7 network model, which reduces the subsequent data processing load by preprocessing the acquired images, increases the detection speed, and improves detection accuracy and robustness through the improved YOLOv7 network model.
To achieve the above object, the invention provides a diaper image detection method based on an improved YOLOv7 network model, comprising the following steps:
S1, collecting paper diaper images, forming a paper diaper image data set, and preprocessing:
S11, adding a labeling file for labeling the type and the position of each paper diaper image as a real frame of the paper diaper image;
S12, carrying out data enhancement on the diaper image data set to obtain an enhanced diaper image data set;
S13, clustering the real frames of all diaper images in the enhanced diaper image data set with the Kmeans clustering algorithm to obtain K prior frames;
S14, dividing the enhanced diaper image data set into a training set, a verification set and a test set according to a preset proportion;
S2, training the YOLOv7 network model:
S3, testing:
Inputting the diaper images in the test set into the trained YOLOv7 network model, predicting all diaper images in the test set with the YOLOv7 network model to obtain prediction frames on the prediction feature maps, mapping the prediction frames onto the original images through the proportional relation between the prediction feature map and the original image, and locating the patterns on the diaper to obtain the final diaper image detection result.
Preferably, in step S11, the diaper image data set includes a plurality of diaper images; the corresponding labeling file is a file in txt format that records the position information and category information of the targets in the diaper image, and the size of each diaper image is 1024 × 1024 pixels;
In step S14, the training set, verification set and test set are divided in proportions of 80%, 10% and 10%, and train.txt, val.txt and test.txt files are generated respectively to save the corresponding image lists.
Preferably, in step S12, the diaper image data set is data-enhanced using Mosaic data enhancement combined, with 20% probability, with Mixup data enhancement;
the Mosaic data enhancement method is as follows:
randomly selecting 4 diaper images and combining them by Mosaic data enhancement to form a new image, which is used as new training data;
with 20% probability, another image is randomly selected and blended with it using Mixup data enhancement to generate new training data.
Preferably, step S13 specifically includes the following step:
clustering the widths and heights of all real frames in the training set with the Kmeans clustering algorithm, the K cluster center coordinates obtained being used as the widths and heights of the K prior frames; each real frame is recorded as (class, xmin, ymin, xmax, ymax), where class denotes the category of the diaper image contained in the real frame, xmin and ymin denote the x and y coordinates of the top-left vertex of the real frame, and xmax and ymax denote the x and y coordinates of the bottom-right vertex of the real frame.
Preferably, step S2 specifically includes the following steps:
S21, randomly selecting X diaper images from the training set and inputting them into the YOLOv7 network model; downsampling the images by factors of 1/8, 1/16 and 1/32 through the backbone of the YOLOv7 network model to extract three effective feature maps of different scales; inputting the effective feature maps into the improved feature aggregation module, which further fuses them to capture global semantic information while generating three prediction feature maps of different scales;
S22, distributing the K prior frames evenly over the three prediction feature maps in advance according to scale, and adjusting the corresponding prior frames according to the anchor point information on the prediction feature maps to obtain prediction frames;
S23, calculating the loss value of the YOLOv7 network model from the prediction frames and the real frames corresponding to the diaper images, to evaluate the difference between the prediction frames and the real frames;
S24, updating the parameters of the YOLOv7 network model according to the loss value, and iterating over the training set until all diaper images in the training set have been input into the YOLOv7 network model once;
S25, inputting the diaper images in the verification set into the trained YOLOv7 network model and predicting each diaper image in the verification set with the YOLOv7 network model to obtain the prediction frames of the verification set;
S26, counting the average precision value of each class of diaper images from the prediction frames of the verification set and the corresponding real frames;
S27, repeating steps S24-S26 until convergence, completing the training of the YOLOv7 network model;
The YOLOv7 network model in step S2 includes a backbone network CSPDARKNET-53, a feature filtering and purifying module FFPM, an improved feature fusion module, and a lightweight detection head, arranged in sequence; in the improved feature fusion module, PANet is replaced with BiFPN; the backbone network CSPDARKNET-53 includes an ELAN module, an MP module, a SPPCSPC module, a C2f module, an InceptionNext module and a multi-scale feature enhancement module MSFE, and a channel attention mechanism is introduced into the SPPCSPC module of the backbone network CSPDARKNET-53; in the lightweight detection head, a 7×7 depthwise convolution replaces the 3×3 convolution kernel and a Selective Kernel attention mechanism is introduced; the activation function of the YOLOv7 network model is changed from Mish to Hard-Swish;
Step S21 specifically includes the following steps:
S211, randomly selecting X diaper images from the training set, inputting them into the backbone network CSPDARKNET-53 and performing stage-by-stage feature extraction; taking out three effective feature maps of different scales and channel numbers while downsampling, outputting three feature layers of different scales, denoted M5, M4 and M3 from small to large according to scale; inputting these feature layers into the feature filtering and purifying module FFPM to filter cross-layer conflicts, and outputting three effective feature maps of different scales L5, L4 and L3;
S212, inputting the three effective feature maps L5, L4 and L3 into the improved feature fusion module for further fusion, progressively fusing deep and shallow features across non-adjacent feature layers, and outputting feature maps P5, P4 and P3 whose scales are consistent with those of the input effective feature maps L5, L4 and L3;
S213, using the lightweight detection head to adjust the channel number of the three output feature maps P5, P4 and P3 to 3 × (5 + num_class), obtaining N prediction feature maps, where num_class denotes the number of categories.
Preferably, step S212 specifically includes the following steps:
S2121, inputting the effective feature map M5 into the SPPCSPC module to obtain a feature map K5, upsampling the feature map K5, inputting it into the multi-scale feature enhancement module MSFE to be fused with the effective feature map M4, and inputting the fusion result into an InceptionNext module to obtain a feature map K4;
S2122, upsampling the feature map K4, inputting it into the multi-scale feature enhancement module MSFE to be fused with the feature map M3, and inputting the fusion result into an InceptionNext module to obtain the shallowest output feature map P3;
S2123, inputting the output feature map P3 into the C2f module, downsampling it, inputting it into the multi-scale feature enhancement module MSFE to be fused with the effective feature map M4 and the feature map K4, and inputting the fusion result into an InceptionNext module of the YOLOv7 network model to obtain the intermediate output feature map P4;
S2124, inputting the output feature map P4 into the C2f module, downsampling it, fusing it with the feature map K5, and inputting the fusion result into an InceptionNext module to obtain the deepest output feature map P5.
Preferably, step S22 specifically includes the following steps:
S221, sorting the K prior frames generated in step S13 by scale and distributing them evenly over the N generated prediction feature maps; dividing each prediction feature map into H × W grids and setting an anchor point at the center of each grid cell;
S222, overlaying K/N prior frames of the corresponding prediction feature map on each anchor point;
S223, each anchor point on the prediction feature map corresponds to a vector of length 3 × (5 + num_class); splitting this vector by dimension yields, for each prior frame, a one-dimensional adjustment vector of length 5 + num_class containing width and height adjustment information, center point coordinate adjustment information and frame confidence adjustment information for the corresponding prior frame; the position and size of the prior frame are adjusted with this adjustment information to obtain the corresponding prediction frame.
Preferably, the step S23 specifically includes the following steps:
S231, comparing each prediction frame with the corresponding real frame and calculating the intersection-over-union (IoU) loss:
L_IoU = 1 - IoU
where L_IoU is the IoU loss value and IoU is the intersection over union between the prediction frame and the real frame;
S232, in the output feature maps, calculating the classification confidence and the frame confidence of each prediction frame, and from them the classification confidence loss and the frame confidence loss; the quantities appearing in the corresponding formulas are: the upper-left corner coordinates, width and height of the real frame; the class truth value, which takes the value 0 or 1; the predicted probability of the diaper image category; the frame confidence loss; the upper-left corner coordinates of the prediction frame; the height of the prediction frame; the width of the prediction frame; the classification confidence; and the classification confidence loss;
S233, multiplying the classification confidence loss and the frame confidence loss by their preset proportions and adding them to obtain the total loss value;
S234, adjusting the YOLOv7 network model parameters with the back-propagation algorithm to minimize the total loss value.
Preferably, in step S24, the diaper images of the whole training set are input into the YOLOv7 network model in one epoch for forward propagation and back-propagation optimization of the network parameters;
In step S25, after each epoch is completed, each diaper image in the verification set is predicted using the updated YOLOv7 network model parameters.
Preferably, step S3 specifically includes the following steps:
S31, the YOLOv7 network model outputs N prediction feature maps for each diaper image;
S32, on each prediction feature map, adjusting the prior frames according to the adjustment vector corresponding to each anchor point to obtain all prediction frames of each diaper image;
S33, removing redundant prediction frames with non-maximum suppression to obtain the prediction frames on the prediction feature map;
S34, mapping the prediction frames on the prediction feature maps onto the scale of the original image according to the proportional relation to obtain the final prediction frames.
The invention has the following beneficial effects:
On the basis of effectively detecting and identifying the various pattern designs on the diaper, the accuracy and robustness of diaper pattern detection are improved by introducing a deeper convolutional neural network structure, an improved loss function and data enhancement techniques, specifically:
(1) By improving the YOLOv7 algorithm, a feature filtering and purifying module (FFPM) is added between the backbone network and the feature fusion module; the FFPM re-fuses, in cascade, the multi-scale features fed to the neck, effectively filtering cross-layer conflicts, strengthening feature learning and improving detection accuracy;
(2) Inspired by the multi-scale convolutional attention module MSCA, a multi-scale feature enhancement module (MSFE) is designed to further optimize the cascaded and fused features and provide rich guiding information for the shallow features;
(3) A Coordinate Attention (CA) module is introduced and fused with the dimension-reduced shallow features, preserving the spatial position information of the diaper image targets and further strengthening the features;
(4) Considering the redundant connections in the original feature fusion module PANet, BiFPN is introduced to improve the feature fusion efficiency of the network model; like PANet, BiFPN is a bidirectional pyramid structure containing both top-down and bottom-up feature flows, so the accuracy of the network model is improved, as illustrated by the sketch below.
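As an illustration of point (4), the following is a minimal sketch of the general BiFPN-style weighted fusion idea: learnable non-negative weights combine the feature maps flowing top-down and bottom-up. It is a generic example under assumed names, not the patent's exact module.

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """BiFPN-style fusion: learnable non-negative weights over same-shaped feature maps."""
    def __init__(self, n_inputs, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))  # one weight per incoming feature map
        self.eps = eps

    def forward(self, feats):
        # feats: list of feature maps with identical shapes to be fused
        w = torch.relu(self.w)                       # keep the weights non-negative
        w = w / (w.sum() + self.eps)                 # normalise so the weights sum to ~1
        return sum(wi * f for wi, f in zip(w, feats))
```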
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
FIG. 1 is a logic flow diagram of the diaper image detection method based on the improved YOLOv7 network model of the present invention;
FIG. 2 is a schematic diagram of the InceptionNext module structure in the diaper image detection method based on the improved YOLOv7 network model of the present invention;
FIG. 3 is a schematic structural diagram of the feature filtering and purifying module FFPM in the diaper image detection method based on the improved YOLOv7 network model of the present invention;
FIG. 4 is a schematic diagram of the channel attention mechanism (CA) in the diaper image detection method based on the improved YOLOv7 network model of the present invention;
FIG. 5 is a graph showing the results of the simulation experiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application more apparent, the embodiments of the present application will be further described in detail below with reference to the accompanying drawings and examples. It should be understood that the detailed description and specific examples, while indicating the embodiment of the application, are intended for purposes of illustration only and are not intended to limit the scope of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application. Examples of the embodiments are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements throughout or elements having like or similar functionality.
It should be noted that the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
As shown in FIGS. 1-4, the diaper image detection method based on the improved YOLOv7 network model comprises the following steps:
S1, collecting diaper images, forming a diaper image data set, and preprocessing:
S11, adding a labeling file for labeling the type and position of each diaper image as the real frame of the diaper image; the labeling file serves as the reference standard during training for calculating the network loss value and evaluating model performance;
In step S11, the diaper image data set includes a plurality of diaper images; the corresponding labeling file is a file in txt format that records the position information and category information of the targets in the diaper image, and the size of each diaper image is 1024 × 1024 pixels;
S12, carrying out data enhancement on the diaper image data set to obtain an enhanced diaper image data set, which increases the diversity and richness of the data and improves the generalization ability of the model;
In step S12, the diaper image data set is data-enhanced using Mosaic data enhancement combined, with 20% probability, with Mixup data enhancement;
the Mosaic data enhancement method is as follows:
randomly selecting 4 diaper images and combining them by Mosaic data enhancement to form a new image, which is used as new training data;
with 20% probability, another image is randomly selected and blended with it using Mixup data enhancement to generate new training data.
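A minimal sketch of the Mosaic-plus-Mixup combination described above (bounding-box labels are omitted for brevity; the dataset, mosaic and augment names, the Beta(8, 8) mixing coefficient and the nearest-neighbour resize are assumptions for illustration, not the patent's exact implementation):

```python
import random
import numpy as np

def mosaic(images, size=1024):
    """Tile 4 images into one size x size canvas, one per quadrant, after halving each."""
    canvas = np.zeros((size, size, 3), dtype=np.float32)
    half = size // 2
    for idx, img in enumerate(images[:4]):
        ys = np.linspace(0, img.shape[0] - 1, half).astype(int)  # nearest-neighbour row indices
        xs = np.linspace(0, img.shape[1] - 1, half).astype(int)  # nearest-neighbour column indices
        patch = img[ys][:, xs]
        r, c = divmod(idx, 2)
        canvas[r * half:(r + 1) * half, c * half:(c + 1) * half] = patch
    return canvas

def augment(dataset, mixup_prob=0.2):
    """One augmented sample: Mosaic of 4 random diaper images, then Mixup with 20% probability."""
    imgs = [dataset[random.randrange(len(dataset))] for _ in range(4)]
    new_img = mosaic(imgs)
    if random.random() < mixup_prob:
        # blend with one additional randomly selected 1024 x 1024 image
        other = dataset[random.randrange(len(dataset))].astype(np.float32)
        lam = np.random.beta(8.0, 8.0)  # assumed Mixup mixing coefficient
        new_img = lam * new_img + (1.0 - lam) * other
    return new_img
```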
S13, clustering the real frames of all diaper images in the enhanced diaper image data set with the Kmeans clustering algorithm to obtain K prior frames;
Preferably, step S13 specifically includes the following step:
clustering the widths and heights of all real frames in the training set with the Kmeans clustering algorithm, the K cluster center coordinates obtained being used as the widths and heights of the K prior frames (K is generally taken as 9); each real frame is recorded as (class, xmin, ymin, xmax, ymax), where class denotes the category of the diaper image contained in the real frame, xmin and ymin denote the x and y coordinates of the top-left vertex of the real frame, and xmax and ymax denote the x and y coordinates of the bottom-right vertex of the real frame.
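A minimal sketch of the clustering in step S13, assuming a plain Euclidean-distance Kmeans over box widths and heights (YOLO implementations often use an IoU-based distance instead); function and variable names are illustrative:

```python
import numpy as np

def kmeans_priors(boxes, k=9, iters=100, seed=0):
    """boxes: (N, 4) array of (xmin, ymin, xmax, ymax); returns (k, 2) prior widths/heights."""
    wh = np.stack([boxes[:, 2] - boxes[:, 0], boxes[:, 3] - boxes[:, 1]], axis=1).astype(float)
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), size=k, replace=False)]   # initial cluster centres
    for _ in range(iters):
        # assign each real-frame width/height pair to the nearest cluster centre
        d = ((wh[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        new_centers = np.array([wh[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # sort by area so the priors can be split evenly across the three prediction scales
    return centers[np.argsort(centers.prod(axis=1))]
```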
S14, dividing the enhanced diaper image data set into a training set, a verification set and a test set according to a preset proportion;
In step S14, the training set, verification set and test set are divided in proportions of 80%, 10% and 10%, and train.txt, val.txt and test.txt files are generated respectively to save the corresponding image lists.
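A minimal sketch of the 80% / 10% / 10% split and the train.txt / val.txt / test.txt image lists of step S14; the directory layout and the .jpg extension are assumptions:

```python
import random
from pathlib import Path

def split_dataset(image_dir, out_dir, seed=0):
    """Shuffle image paths, split 80/10/10 and write one list file per split."""
    paths = sorted(str(p) for p in Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(paths)
    n = len(paths)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    splits = {
        "train.txt": paths[:n_train],
        "val.txt": paths[n_train:n_train + n_val],
        "test.txt": paths[n_train + n_val:],
    }
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, items in splits.items():
        (out / name).write_text("\n".join(items) + "\n")   # save the image list for this split
    return splits
```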
S2, training the YOLOv7 network model:
In this embodiment, the training environment of the network is Python 3.8 with the deep learning framework PyTorch 1.8, accelerated with CUDA; the learning rate adjustment strategy is cosine annealing decay with an initial learning rate of 0.0001; the number of training epochs is set to 300; and the momentum parameter of the network is set to 0.937;
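A minimal sketch of this training configuration in PyTorch (cosine annealing from an initial learning rate of 0.0001 over 300 epochs, momentum 0.937, CUDA when available); the choice of SGD is an assumption, since the embodiment does not name the optimizer:

```python
import torch

def build_training(model, epochs=300, lr=1e-4, momentum=0.937):
    """Return the model on the training device plus optimizer and LR scheduler."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)
    # cosine annealing decay of the learning rate over the full training run
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    return model, optimizer, scheduler, device
```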
The YOLOv7 network model in step S2 includes a backbone network CSPDARKNET-53, a feature filtering and purifying module FFPM, an improved feature fusion module, and a lightweight detection head, arranged in sequence; in the improved feature fusion module, PANet is replaced with BiFPN; the backbone network CSPDARKNET-53 includes an ELAN module, an MP module, a SPPCSPC module, a C2f module, an InceptionNext module and a multi-scale feature enhancement module MSFE, and a channel attention mechanism is introduced into the SPPCSPC module of the backbone network CSPDARKNET-53; in the lightweight detection head, a 7×7 depthwise convolution replaces the 3×3 convolution kernel and a Selective Kernel attention mechanism is introduced; the activation function of the YOLOv7 network model is changed from Mish to Hard-Swish;
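A minimal sketch of the lightweight detection head idea described above: a 7×7 depthwise convolution in place of a 3×3 kernel, a Hard-Swish activation, and an output of 3 × (5 + num_class) channels per prediction scale. The exact layer arrangement is an assumption, and the Selective Kernel attention is omitted for brevity:

```python
import torch.nn as nn

class LightweightHead(nn.Module):
    """Depthwise 7x7 conv + Hard-Swish, then a pointwise projection to 3*(5+num_class) channels."""
    def __init__(self, in_channels, num_class):
        super().__init__()
        self.dw = nn.Conv2d(in_channels, in_channels, kernel_size=7, padding=3,
                            groups=in_channels, bias=False)               # 7x7 depthwise convolution
        self.bn = nn.BatchNorm2d(in_channels)
        self.act = nn.Hardswish()                                          # Hard-Swish activation
        self.pw = nn.Conv2d(in_channels, 3 * (5 + num_class), kernel_size=1)  # prediction channels

    def forward(self, x):
        return self.pw(self.act(self.bn(self.dw(x))))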
Step S2 specifically includes the following steps:
S21, randomly selecting X diaper images from the training set and inputting them into the YOLOv7 network model; extracting effective feature maps of different scales through the backbone of the YOLOv7 network model; inputting the effective feature maps into the improved feature aggregation module, which further fuses them to capture global semantic information while generating three prediction feature maps of different scales;
Step S21 specifically includes the following steps:
S211, randomly selecting X diaper images from the training set, inputting them into the backbone network CSPDARKNET-53 and performing stage-by-stage feature extraction; taking out three effective feature maps of different scales and channel numbers while downsampling by factors of 1/8, 1/16 and 1/32, outputting three feature layers of different scales, denoted M5, M4 and M3 from small to large according to scale; inputting these feature layers into the feature filtering and purifying module FFPM to filter cross-layer conflicts, and outputting three effective feature maps of different scales L5, L4 and L3;
S212, inputting the three effective feature maps L5, L4 and L3 into the improved feature fusion module for further fusion, progressively fusing deep and shallow features across non-adjacent feature layers, and outputting feature maps P5, P4 and P3 whose scales are consistent with those of the input effective feature maps L5, L4 and L3;
Step S212 specifically includes the following steps:
S2121, inputting the effective feature map M5 into the SPPCSPC module to obtain a feature map K5, upsampling the feature map K5, inputting it into the multi-scale feature enhancement module MSFE to be fused with the effective feature map M4, and inputting the fusion result into an InceptionNext module to obtain a feature map K4;
S2122, upsampling the feature map K4, inputting it into the multi-scale feature enhancement module MSFE to be fused with the feature map M3, and inputting the fusion result into an InceptionNext module to obtain the shallowest output feature map P3;
S2123, inputting the output feature map P3 into the C2f module, downsampling it, inputting it into the multi-scale feature enhancement module MSFE to be fused with the effective feature map M4 and the feature map K4, and inputting the fusion result into an InceptionNext module of the YOLOv7 network model to obtain the intermediate output feature map P4;
S2124, inputting the output feature map P4 into the C2f module, downsampling it, fusing it with the feature map K5, and inputting the fusion result into an InceptionNext module to obtain the deepest output feature map P5.
S213, using the lightweight detection head to adjust the channel number of the three output feature maps P5, P4 and P3 to 3 × (5 + num_class), obtaining N prediction feature maps, where num_class denotes the number of categories.
S22, distributing the K prior frames evenly over the three prediction feature maps in advance according to scale, and adjusting the corresponding prior frames according to the anchor point information on the prediction feature maps to obtain prediction frames;
Step S22 specifically includes the following steps:
S221, sorting the K prior frames generated in step S13 by scale and distributing them evenly over the N generated prediction feature maps; dividing each prediction feature map into H × W grids and setting an anchor point at the center of each grid cell;
S222, overlaying K/N prior frames of the corresponding prediction feature map on each anchor point, where "/" denotes division;
S223, each anchor point on the prediction feature map corresponds to a vector of length 3 × (5 + num_class); splitting this vector by dimension yields, for each prior frame, a one-dimensional adjustment vector of length 5 + num_class containing width and height adjustment information, center point coordinate adjustment information and frame confidence adjustment information for the corresponding prior frame; the position and size of the prior frame are adjusted with this adjustment information to obtain the corresponding prediction frame.
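A minimal sketch of step S223, splitting each anchor point's 3 × (5 + num_class) vector into three per-prior adjustment vectors and turning prior frames into prediction frames; the sigmoid/exp decoding form is the standard YOLO transform and is an assumption here, since the text only states that the priors are adjusted:

```python
import torch

def decode_predictions(feat, priors, num_class):
    """feat: (B, 3*(5+num_class), H, W) prediction feature map;
    priors: (3, 2) tensor of prior widths/heights in feature-map units."""
    b, _, h, w = feat.shape
    feat = feat.view(b, 3, 5 + num_class, h, w).permute(0, 3, 4, 1, 2)  # (B, H, W, 3, 5+num_class)
    ys = torch.arange(h).view(h, 1).expand(h, w)
    xs = torch.arange(w).view(1, w).expand(h, w)
    grid = torch.stack([xs, ys], dim=-1).view(1, h, w, 1, 2).float()    # anchor (grid-cell) positions
    xy = torch.sigmoid(feat[..., 0:2]) + grid                           # centre-point adjustment
    wh = priors.view(1, 1, 1, 3, 2) * torch.exp(feat[..., 2:4])         # width/height adjustment of the prior
    obj = torch.sigmoid(feat[..., 4:5])                                 # frame confidence
    cls = torch.sigmoid(feat[..., 5:])                                  # per-class confidence
    return torch.cat([xy, wh], dim=-1), obj, cls                        # boxes as (cx, cy, w, h)
```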
S23, calculating the loss value of the YOLOv7 network model from the prediction frames and the real frames corresponding to the diaper images, to evaluate the difference between the prediction frames and the real frames;
Step S23 specifically includes the following steps:
S231, comparing each prediction frame with the corresponding real frame and calculating the intersection-over-union (IoU) loss:
L_IoU = 1 - IoU
where L_IoU is the IoU loss value and IoU is the intersection over union between the prediction frame and the real frame;
S232, in the output feature maps, calculating the classification confidence and the frame confidence of each prediction frame, and from them the classification confidence loss and the frame confidence loss; the quantities appearing in the corresponding formulas are: the upper-left corner coordinates, width and height of the real frame; the class truth value, which takes the value 0 or 1; the predicted probability of the diaper image category; the frame confidence loss; the upper-left corner coordinates of the prediction frame; the height of the prediction frame; the width of the prediction frame; the classification confidence; and the classification confidence loss;
S233, multiplying the classification confidence loss and the frame confidence loss by their preset proportions and adding them to obtain the total loss value;
S234, adjusting the YOLOv7 network model parameters with the back-propagation algorithm to minimize the total loss value.
In S231-S234, the intersection-over-union (IoU) loss is calculated from the prediction frames and the corresponding ground-truth (GT) frames; the classification confidence loss and the frame confidence loss are calculated from the classification confidence and frame confidence of each prediction frame contained in the network output feature maps; the IoU loss, classification confidence loss and frame confidence loss are then weighted and summed according to preset proportions to obtain the overall network loss, and the network parameters are optimized by back propagation.
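A minimal sketch of the loss combination in S231-S234: an IoU loss per matched prediction frame plus frame-confidence and classification-confidence terms, summed with preset weights. The 1 - IoU form, the binary cross-entropy terms and the weight values are assumptions standing in for the patent's exact formulas:

```python
import torch
import torch.nn.functional as F

def iou_loss(pred, target, eps=1e-7):
    """pred, target: (N, 4) boxes as (xmin, ymin, xmax, ymax); returns mean 1 - IoU."""
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)          # overlap area
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    return (1.0 - iou).mean()

def total_loss(pred_boxes, gt_boxes, obj_logits, obj_targets, cls_logits, cls_targets,
               w_iou=0.05, w_obj=1.0, w_cls=0.5):
    # weighted sum of IoU, frame-confidence and classification-confidence losses
    l_iou = iou_loss(pred_boxes, gt_boxes)
    l_obj = F.binary_cross_entropy_with_logits(obj_logits, obj_targets)
    l_cls = F.binary_cross_entropy_with_logits(cls_logits, cls_targets)
    return w_iou * l_iou + w_obj * l_obj + w_cls * l_cls
```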
S24, updating the parameters of the YOLOv7 network model according to the loss value, and iterating over the training set until all diaper images in the training set have been input into the YOLOv7 network model once;
In step S24, the diaper images of the whole training set are input into the YOLOv7 network model in one epoch for forward propagation and back-propagation optimization of the network parameters;
S25, inputting the diaper images in the verification set into the trained YOLOv7 network model and predicting each diaper image in the verification set with the YOLOv7 network model to obtain the prediction frames of the verification set;
In step S25, after each epoch is completed, each diaper image in the verification set is predicted using the updated YOLOv7 network model parameters.
S26, counting the average precision value of each class of diaper images from the prediction frames of the verification set and the corresponding real frames;
S27, repeating steps S24-S26 until convergence, completing the training of the YOLOv7 network model; in this embodiment, convergence is judged by the AP value remaining unchanged, or showing a downward trend, over several consecutive rounds, which indicates that the performance of the YOLOv7 network model on the verification set has reached a stable level, so training is deemed complete;
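A minimal sketch of the convergence check in S27, stopping when the validation AP has not improved for several consecutive epochs; the patience value is an assumption:

```python
class EarlyStopping:
    """Track the best validation AP and flag convergence after `patience` epochs without improvement."""
    def __init__(self, patience=10):
        self.best = float("-inf")
        self.bad_epochs = 0
        self.patience = patience

    def step(self, ap):
        if ap > self.best:
            self.best, self.bad_epochs = ap, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience   # True means training is considered converged
```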
S3, testing:
Inputting the diaper images in the test set into the trained YOLOv7 network model, predicting all diaper images in the test set with the YOLOv7 network model to obtain prediction frames on the prediction feature maps, mapping the prediction frames onto the original images through the proportional relation between the prediction feature map and the original image, and locating the patterns on the diaper to obtain the final diaper image detection result.
Step S3 specifically includes the following steps:
S31, the YOLOv7 network model outputs N prediction feature maps for each diaper image;
S32, on each prediction feature map, adjusting the prior frames according to the adjustment vector corresponding to each anchor point to obtain all prediction frames of each diaper image;
S33, removing redundant prediction frames with non-maximum suppression to obtain the prediction frames on the prediction feature map;
S34, mapping the prediction frames on the prediction feature maps onto the scale of the original image according to the proportional relation to obtain the final prediction frames.
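A minimal sketch of steps S33-S34: non-maximum suppression over the prediction frames followed by rescaling from feature-map coordinates to the original image; torchvision's nms and the threshold values are assumed implementation details:

```python
import torch
from torchvision.ops import nms

def postprocess(boxes, scores, stride, iou_thresh=0.45, score_thresh=0.25):
    """boxes: (N, 4) as (xmin, ymin, xmax, ymax) on the prediction feature map;
    stride: ratio between the original image and the feature map (e.g. 8, 16 or 32)."""
    keep = scores > score_thresh                 # drop low-confidence prediction frames
    boxes, scores = boxes[keep], scores[keep]
    keep = nms(boxes, scores, iou_thresh)        # remove redundant, overlapping frames
    return boxes[keep] * stride, scores[keep]    # map back onto the original image scale
```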
Simulation experiment
To verify the performance of the method proposed in this embodiment, the images in the test set are predicted with the improved YOLOv7 network, and the mean average precision (mAP) and the precision and recall for each category are calculated from the prediction results and the GT. As shown in FIG. 5, the invention can detect the various patterns on diapers with high accuracy.
Therefore, the diaper image detection method based on the improved YOLOv7 network model, by introducing a fast attention mechanism at specific positions and adopting structures such as an efficient detection head based on depthwise convolution, solves problems such as the poor detection effect and low detection speed for small targets in the diaper pattern detection task; compared with existing advanced detection models it has clear advantages in detection accuracy and efficiency, and it meets the real-time requirements of practical scenarios well.
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solution of the invention. Although the invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that the technical solution of the invention may be modified or equivalently replaced without departing from the spirit and scope of the technical solution of the invention.

Claims (8)

1. A diaper image detection method based on an improved YOLOv7 network model, characterized by comprising the following steps:
S1, collecting paper diaper images, forming a paper diaper image data set, and preprocessing:
S11, adding a labeling file for labeling the type and the position of each paper diaper image as a real frame of the paper diaper image;
s12, carrying out data enhancement on the diaper image data set to obtain an enhanced diaper image data set;
S13, clustering the real frames of all diaper images in the enhanced diaper image data set with the Kmeans clustering algorithm to obtain K prior frames;
S14, dividing the enhanced diaper image data set into a training set, a verification set and a test set according to a preset proportion;
S2, training the YOLOv7 network model:
the step S2 specifically comprises the following steps:
S21, randomly selecting X diaper images from the training set and inputting them into the YOLOv7 network model; downsampling the images by factors of 1/8, 1/16 and 1/32 through the backbone of the YOLOv7 network model to extract three effective feature maps of different scales; inputting the effective feature maps into the improved feature aggregation module, which further fuses them to capture global semantic information while generating three prediction feature maps of different scales;
S22, distributing the K prior frames evenly over the three prediction feature maps in advance according to scale, and adjusting the corresponding prior frames according to the anchor point information on the prediction feature maps to obtain prediction frames;
S23, calculating the loss value of the YOLOv7 network model from the prediction frames and the real frames corresponding to the diaper images, to evaluate the difference between the prediction frames and the real frames;
S24, updating the parameters of the YOLOv7 network model according to the loss value, and iterating over the training set until all diaper images in the training set have been input into the YOLOv7 network model once;
S25, inputting the diaper images in the verification set into the trained YOLOv7 network model and predicting each diaper image in the verification set with the YOLOv7 network model to obtain the prediction frames of the verification set;
S26, counting the average precision value of each class of diaper images from the prediction frames of the verification set and the corresponding real frames;
S27, repeating steps S24-S26 until convergence, completing the training of the YOLOv7 network model;
The YOLOv7 network model in step S2 includes a backbone network CSPDARKNET-53, a feature filtering and purifying module FFPM, an improved feature fusion module, and a lightweight detection head, arranged in sequence; in the improved feature fusion module, PANet is replaced with BiFPN; the backbone network CSPDARKNET-53 includes an ELAN module, an MP module, a SPPCSPC module, a C2f module, an InceptionNext module and a multi-scale feature enhancement module MSFE, and a channel attention mechanism is introduced into the SPPCSPC module of the backbone network CSPDARKNET-53; in the lightweight detection head, a 7×7 depthwise convolution replaces the 3×3 convolution kernel and a Selective Kernel attention mechanism is introduced; the activation function of the YOLOv7 network model is changed from Mish to Hard-Swish;
Step S21 specifically includes the following steps:
S211, randomly selecting X diaper images from the training set, inputting them into the backbone network CSPDARKNET-53 and performing stage-by-stage feature extraction; taking out three effective feature maps of different scales and channel numbers while downsampling, outputting three feature layers of different scales, denoted M5, M4 and M3 from small to large according to scale; inputting these feature layers into the feature filtering and purifying module FFPM to filter cross-layer conflicts, and outputting three effective feature maps of different scales L5, L4 and L3;
S212, inputting the three effective feature maps L5, L4 and L3 into the improved feature fusion module for further fusion, progressively fusing deep and shallow features across non-adjacent feature layers, and outputting feature maps P5, P4 and P3 whose scales are consistent with those of the input effective feature maps L5, L4 and L3;
Step S212 specifically includes the following steps:
S2121, inputting the effective feature map M5 into the SPPCSPC module to obtain a feature map K5, upsampling the feature map K5, inputting it into the multi-scale feature enhancement module MSFE to be fused with the effective feature map M4, and inputting the fusion result into an InceptionNext module to obtain a feature map K4;
S2122, upsampling the feature map K4, inputting it into the multi-scale feature enhancement module MSFE to be fused with the feature map M3, and inputting the fusion result into an InceptionNext module to obtain the shallowest output feature map P3;
S2123, inputting the output feature map P3 into the C2f module, downsampling it, inputting it into the multi-scale feature enhancement module MSFE to be fused with the effective feature map M4 and the feature map K4, and inputting the fusion result into an InceptionNext module of the YOLOv7 network model to obtain the intermediate output feature map P4;
S2124, inputting the output feature map P4 into the C2f module, downsampling it, fusing it with the feature map K5, and inputting the fusion result into an InceptionNext module to obtain the deepest output feature map P5;
S213, using the lightweight detection head to adjust the channel number of the three output feature maps P5, P4 and P3 to 3 × (5 + num_class), obtaining N prediction feature maps, where num_class denotes the number of categories;
S3, testing:
Inputting the diaper images in the test set into the trained YOLOv7 network model, predicting all diaper images in the test set with the YOLOv7 network model to obtain prediction frames on the prediction feature maps, mapping the prediction frames onto the original images through the proportional relation between the prediction feature map and the original image, and locating the patterns on the diaper to obtain the final diaper image detection result.
2. The diaper image detection method based on the improved YOLOv7 network model according to claim 1, characterized in that: in step S11, the diaper image data set includes a plurality of diaper images; the corresponding labeling file is a file in txt format that records the position information and category information of the targets in the diaper image, and the size of each diaper image is 1024 × 1024 pixels;
In step S14, the training set, verification set and test set are divided in proportions of 80%, 10% and 10%, and train.txt, val.txt and test.txt files are generated respectively to save the corresponding image lists.
3. The diaper image detection method based on the improved YOLOv7 network model according to claim 1, characterized in that: in step S12, the diaper image data set is data-enhanced using Mosaic data enhancement combined, with 20% probability, with Mixup data enhancement;
the Mosaic data enhancement method is as follows:
randomly selecting 4 diaper images and combining them by Mosaic data enhancement to form a new image, which is used as new training data;
with 20% probability, another image is randomly selected and blended with it using Mixup data enhancement to generate new training data.
4. The diaper image detection method based on the improved YOLOv7 network model according to claim 1, characterized in that: step S13 specifically includes the following step:
clustering the widths and heights of all real frames in the training set with the Kmeans clustering algorithm, the K cluster center coordinates obtained being used as the widths and heights of the K prior frames; each real frame is recorded as (class, xmin, ymin, xmax, ymax), where class denotes the category of the diaper image contained in the real frame, xmin and ymin denote the x and y coordinates of the top-left vertex of the real frame, and xmax and ymax denote the x and y coordinates of the bottom-right vertex of the real frame.
5. The diaper image detection method based on the improved YOLOv7 network model according to claim 1, characterized in that: step S22 specifically includes the following steps:
S221, sorting the K prior frames generated in step S13 by scale and distributing them evenly over the N generated prediction feature maps; dividing each prediction feature map into H × W grids and setting an anchor point at the center of each grid cell;
S222, overlaying K/N prior frames of the corresponding prediction feature map on each anchor point;
S223, each anchor point on the prediction feature map corresponds to a vector of length 3 × (5 + num_class); splitting this vector by dimension yields, for each prior frame, a one-dimensional adjustment vector of length 5 + num_class containing width and height adjustment information, center point coordinate adjustment information and frame confidence adjustment information for the corresponding prior frame; the position and size of the prior frame are adjusted with this adjustment information to obtain the corresponding prediction frame.
6. The diaper image detection method based on the improved YOLOv7 network model according to claim 1, characterized in that: step S23 specifically includes the following steps:
S231, comparing each prediction frame with the corresponding real frame and calculating the intersection-over-union (IoU) loss:
L_IoU = 1 - IoU
where L_IoU is the IoU loss value and IoU is the intersection over union between the prediction frame and the real frame;
S232, in the output feature maps, calculating the classification confidence and the frame confidence of each prediction frame, and from them the classification confidence loss and the frame confidence loss; the quantities appearing in the corresponding formulas are: the upper-left corner coordinates, width and height of the real frame; the class truth value, which takes the value 0 or 1; the predicted probability of the diaper image category; the frame confidence loss; the upper-left corner coordinates of the prediction frame; the height of the prediction frame; the width of the prediction frame; the classification confidence; and the classification confidence loss;
S233, multiplying the classification confidence loss and the frame confidence loss by their preset proportions and adding them to obtain the total loss value;
S234, adjusting the YOLOv7 network model parameters with the back-propagation algorithm to minimize the total loss value.
7. The diaper image detection method based on the improved YOLOv7 network model according to claim 1, characterized in that: in step S24, the diaper images of the whole training set are input into the YOLOv7 network model in one epoch for forward propagation and back-propagation optimization of the network parameters;
In step S25, after each epoch is completed, each diaper image in the verification set is predicted using the updated YOLOv7 network model parameters.
8. The diaper image detection method based on the improved YOLOv7 network model according to claim 1, characterized in that: step S3 specifically includes the following steps:
S31, the YOLOv7 network model outputs N prediction feature maps for each diaper image;
S32, on each prediction feature map, adjusting the prior frames according to the adjustment vector corresponding to each anchor point to obtain all prediction frames of each diaper image;
S33, removing redundant prediction frames with non-maximum suppression to obtain the prediction frames on the prediction feature map;
S34, mapping the prediction frames on the prediction feature maps onto the scale of the original image according to the proportional relation to obtain the final prediction frames.
CN202410350445.6A, filed 2024-03-26 (priority date 2024-03-26): Diaper image detection method based on improved YOLOv7 network model; granted as CN117953350B; status Active


Publications (2)

CN117953350A, published 2024-04-30
CN117953350B (grant), published 2024-06-11

Family ID: 90796545



Family Cites Families (2)

US20150090599A1 (en) * 2013-10-02 2015-04-02 Tel Nexx, Inc. Insoluble Anode With a Plurality of Switchable Conductive Elements Used to Control Current Density in a Plating Bath
CN114419605B (en) * 2022-03-29 2022-07-19 之江实验室 Visual enhancement method and system based on multi-network vehicle-connected space alignment feature fusion

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021198243A1 (en) * 2020-03-30 2021-10-07 Carl Zeiss Ag Method for virtually staining a tissue sample and a device for tissue analysis
WO2022213307A1 (en) * 2021-04-07 2022-10-13 Nokia Shanghai Bell Co., Ltd. Adaptive convolutional neural network for object detection
CN116612427A (en) * 2023-05-08 2023-08-18 福州大学 Intensive pedestrian detection system based on improved lightweight YOLOv7
CN117079163A (en) * 2023-08-25 2023-11-17 杭州智元研究院有限公司 Aerial image small target detection method based on improved YOLOX-S
CN117557493A (en) * 2023-08-30 2024-02-13 四川轻化工大学 Transformer oil leakage detection method, system, electronic equipment and storage medium
CN117095391A (en) * 2023-09-05 2023-11-21 新疆农业大学 Lightweight apple target detection method
CN117274774A (en) * 2023-09-20 2023-12-22 哈尔滨理工大学 YOLOv7-based X-ray security inspection image dangerous goods detection algorithm
CN117408970A (en) * 2023-10-27 2024-01-16 太原科技大学 Semantic segmentation-based method for polishing surface defects of medium plate by robot
CN117523394A (en) * 2023-11-09 2024-02-06 无锡学院 SAR vessel detection method based on aggregation characteristic enhancement network
CN117372684A (en) * 2023-11-13 2024-01-09 南京邮电大学 Target detection method based on improved YOLOv5s network model
CN117542082A (en) * 2023-11-28 2024-02-09 浙江理工大学 Pedestrian detection method based on YOLOv7

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Underwater Object Detection in Marine Ranching Based on Improved YOLOv8; Rong Jia; MDPI; 2023-12-25; full text *
Dysarthria recognition combining speech fusion features and random forests; 李东; 张雪英; 段淑斐; 闫密密; Journal of Xidian University; 2017-12-04 (No. 03); full text *



Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant