CN117475434A - Construction method of improved YOLOv5s model, small target detection method and system - Google Patents


Info

Publication number
CN117475434A
CN117475434A (application CN202311651968.6A)
Authority
CN
China
Prior art keywords
yolov5s
module
attention
model
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311651968.6A
Other languages
Chinese (zh)
Inventor
姜佩贺
李益
关姝睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yantai University
Original Assignee
Yantai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yantai University filed Critical Yantai University
Priority to CN202311651968.6A priority Critical patent/CN117475434A/en
Publication of CN117475434A publication Critical patent/CN117475434A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/69Microscopic objects, e.g. biological cells or cellular parts
    • G06V20/695Preprocessing, e.g. image segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of image processing, and specifically discloses a construction method for an improved YOLOv5s model, together with a small target detection method and system. The small target detection method comprises the following steps: acquiring an image of the region to be detected and preprocessing it; acquiring a friction force diagram of the region to be detected and preprocessing it; improving the YOLOv5s model to obtain an optimized YOLOv5s model; inputting the preprocessed image into the optimized YOLOv5s model, detecting the small targets in the image, and counting the number of small targets in each single region; and comparing the small-target count of each region with a set threshold value, screening and marking any region whose count exceeds the threshold. By adopting this technical scheme, the improved YOLOv5s model raises the accuracy of the small target detection task and is used to assist the pathological diagnosis of labial gland biopsy for Sjogren's syndrome.

Description

Construction method of improved YOLOv5s model, small target detection method and system
Technical Field
The invention belongs to the technical field of image processing, and relates to a construction method, a small target detection method and a system for improving a YOLOv5s model.
Background
Sjogren's Syndrome (SS) is a chronic inflammatory autoimmune disease characterized by lymphocyte proliferation and progressive damage to the exocrine glands. Although it mainly affects the salivary and lacrimal glands, it can also involve multiple organ systems including the lungs, kidneys, skin and blood, and is often accompanied by other systemic immune diseases such as Rheumatoid Arthritis (RA) and Systemic Lupus Erythematosus (SLE).
At present, the etiology of Sjogren's syndrome is not clear and may involve heredity, viral infection, sex hormone levels and other factors. Notably, Sjogren's syndrome is not uncommon, yet many patients have limited knowledge of it, which leads to delayed medical visits and a delayed course of disease; in most cases the condition can be well controlled and the prognosis improved if it is diagnosed early and treated systematically.
In the diagnosis of Sjogren's syndrome, a pathologist must review each pathological section one by one at different magnifications; the process is tedious and time-consuming, and because of the subjective variability among pathologists of different experience levels, missed diagnoses and misdiagnoses occur frequently, so accurate and efficient pathological diagnosis remains a great challenge in the pathologist's work. In the big-data era, artificial intelligence has been widely applied to medical-image-aided diagnosis, and with the rapid development of digital pathology, AI-assisted pathology is gradually emerging. In various tumors such as lung cancer and breast cancer, AI-assisted pathological diagnosis is efficient, stable and highly repeatable, reaching a level comparable to that of professional doctors; however, its application to the pathological diagnosis of labial gland biopsy for Sjogren's syndrome has not yet been reported.
Disclosure of Invention
The invention aims to provide a construction method, a small target detection method and a system for improving a YOLOv5s model, which are used for improving the accuracy of a small target detection task and assisting in the pathological diagnosis of the lip gland biopsy of Sjogren syndrome.
In order to achieve the above purpose, the basic scheme of the invention is as follows: the construction method for improving the YOLOv5s model comprises the following steps:
replacing the CIOU loss function of the YOLOv5s model by using the Focal-SIOU loss function;
introducing a multi-head self-attention module into the backbone network part of the YOLOv5s model;
in the neck portion of the YOLOv5s model, a Shuffle Attention module is introduced;
a cross-modal image segmentation module is introduced after the Shuffle Attention module, the cross-modal image segmentation module comprising an image feature extraction module, a friction force feature extraction module and a feature fusion module;
and removing the large target detection head of the YOLOv5s model to obtain an optimized YOLOv5s model.
The working principle and beneficial effects of the basic scheme are as follows: because lymphocytes are small and difficult to distinguish, this technical scheme improves YOLOv5s to raise the accuracy of the lymphocyte detection task, so as to detect lymphocytes accurately and assist pathological diagnosis.
Further, the method for replacing the CIOU loss function of the YOLOv5s model with the Focal-SIOU loss function is as follows:
Let α be the included angle (at most 45°) between the line connecting the coordinate centers of the prediction frame and the real frame and the horizontal direction, and let C_h and C_w respectively denote the vertical and horizontal distances between the two coordinate centers. The straight-line distance σ between the two centers and the angle α satisfy:

σ = \sqrt{(b_{cx}^{gt} - b_{cx})^2 + (b_{cy}^{gt} - b_{cy})^2}, \quad \sin α = C_h / σ

The angle loss Λ is defined from the included angle α as:

Λ = 1 - 2\sin^2\left(\arcsin(\sin α) - \frac{π}{4}\right)

The distance loss Δ of SIOU is:

Δ = \sum_{t=x,y}\left(1 - e^{-γ ρ_t}\right), \quad ρ_x = \left(\frac{b_{cx}^{gt} - b_{cx}}{C_w}\right)^2, \quad ρ_y = \left(\frac{b_{cy}^{gt} - b_{cy}}{C_h}\right)^2, \quad γ = 2 - Λ

where (b_{cx}^{gt}, b_{cy}^{gt}) and (b_{cx}, b_{cy}) respectively denote the center-point coordinates of the real frame and the prediction frame; ρ_x and ρ_y are the distance loss factors in the width and height directions; γ is the angle loss factor.
The distance loss function integrates the angle loss: the closer the included angle α is to 45°, the larger the contribution of the angle loss; the closer α is to 0°, the smaller the contribution, and the loss degenerates into a pure distance loss. In the distance loss, C_w and C_h denote the maximum horizontal and vertical extents covering the prediction frame and the real frame, not the distances between their center points;
the shape loss Ω of SIOU is:

Ω = \sum_{t=w,h}\left(1 - e^{-ω_t}\right)^θ, \quad ω_w = \frac{|w - w^{gt}|}{\max(w, w^{gt})}, \quad ω_h = \frac{|h - h^{gt}|}{\max(h, h^{gt})}

where θ expresses the degree of attention paid to the shape loss, and its value needs to be adjusted to the specific data set; w, h and w^{gt}, h^{gt} respectively denote the width and height of the prediction frame and the real frame; ω_w and ω_h are the shape loss factors in the width and height directions.
After the three index losses are fused, the regression frame loss function L_{SIOU} of SIOU is:

L_{SIOU} = 1 - IOU + \frac{Δ + Ω}{2}

where IOU is the intersection ratio (intersection over union) between the real frame and the prediction frame. By taking the angle loss into account, SIOU optimizes model performance.
When predicting the bounding-box regression of objects, the process is affected by the imbalance of training samples: in an image, high-quality anchor boxes with small regression errors are far fewer than low-quality anchor boxes with large errors, and the poor-quality anchor boxes produce excessive gradients that negatively affect training. To address this problem, Focal loss is integrated with SIOU to distinguish high-quality from low-quality anchor boxes, which helps to improve the accuracy of the regression. The Focal-SIOU loss function is:
L_{Focal-SIOU} = IOU^{γ} · L_{SIOU}
where γ denotes the degree of attention paid to the IOU and takes values greater than 0. The larger γ is, the more the loss function attends to the IOU; the closer γ is to 0, the less it attends to the IOU, and the loss gradually degenerates to L_{SIOU}. IOU denotes the intersection ratio between the real frame and the prediction frame, and L_{SIOU} denotes the loss function of SIOU.
Further, the method of introducing the multi-head self-attention module into the backbone part of the YOLOv5s model is as follows:
the C3 module of the original YOLOv5s network is structurally adjusted, and a multi-head self-attention layer is integrated into it;
the multi-head self-attention layer adds position codes to make the multi-head self-attention layer sensitive to the positions;
defining the number of heads in the multi-head self-attention layer, and inputtingFirst, a query vector q, a key vector k, and a value vector v are generated by point convolution, and R h 、R w Each representing a position code extracted from the height and width;
after the position coding performs corresponding element addition operation, a position vector r is generated, matrix multiplication is performed on r and q, and a vector qr corresponding to the content-position is generated T And q and k are subjected to matrix multiplication to generate a corresponding content-content vector qk T
qr T And qk T And performing corresponding element addition operation, performing matrix multiplication with v after passing through the softmax layer, and finally obtaining an output characteristic z.
The multi-head self-attention module is simple to build, the parameter quantity is slightly reduced, and the detection precision is obviously improved.
Further, the cross-modal image segmentation module acquires a color image of the region to be detected and an image map formed from the friction force of the region to be detected, in which gray values represent the friction force at different positions of the region. The two images are adjusted to the same resolution, each image carries xerosis segmentation label information, and the image-data pairs are divided into a training set, a verification set and a test set;
the image feature extraction module performs feature extraction on the color image of the region to be detected to obtain single-mode color image features;
the friction force characteristic extraction module performs characteristic extraction on the force diagram of the region to be detected to obtain a single-mode friction force characteristic;
the feature fusion module comprises a first gating module (a ReLU function), a second gating module and a fusion network. The first gating module takes the color-image features and passes on only the portions whose output exceeds the image threshold; the second gating module takes the friction features and passes on only the portions whose output exceeds the friction threshold; the fusion network fuses the features output by the two gating modules and takes the friction features as one channel of the output image.
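As a concrete illustration, the two-gate fusion just described can be sketched in plain Python. The patent does not specify the gate internals or the fusion rule, so the following assumptions are illustrative: each gate is modelled as a thresholded pass-through (a shifted ReLU that zeroes values at or below the threshold), the fusion is element-wise addition, and the threshold values are placeholders.

```python
# Hypothetical sketch of the two-gate feature fusion: the first gate
# keeps image-feature activations above the image threshold, the second
# keeps friction-feature activations above the friction threshold, and
# the fusion combines them, carrying the gated friction features along
# as an extra output channel. Thresholds and the additive fusion rule
# are illustrative assumptions, not taken from the patent text.

def gate(features, threshold):
    return [v if v > threshold else 0.0 for v in features]

def fuse(image_feats, friction_feats, image_thr=0.5, friction_thr=0.3):
    gated_img = gate(image_feats, image_thr)         # first gating module
    gated_fric = gate(friction_feats, friction_thr)  # second gating module
    fused = [a + b for a, b in zip(gated_img, gated_fric)]
    return fused, gated_fric                         # friction kept as a channel

fused, fric_channel = fuse([0.9, 0.2, 0.7], [0.1, 0.8, 0.4])
```

With the placeholder thresholds above, the weak image activation 0.2 and the weak friction activation 0.1 are suppressed, while the remaining activations pass through and combine.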
Further, in the neck portion of the YOLOv5s model, the method of introducing the Shuffle Attention module is as follows:
the dimension of the input feature map is c×h×w; it is divided into g groups along the channel dimension c, so that the dimension of each group becomes c/g×h×w;
each group is split again into two branches along the channel dimension, and the dimension of each branch becomes c/2g×h×w;
the two branches respectively generate respective feature graphs through a spatial attention module and a channel attention module to help the model to focus on the detection target and the position information of the detection target;
after the information is extracted, the two feature maps are spliced so that the dimension returns to c/g×h×w; after features have been extracted from all g groups, the outputs are spliced again, and the output dimension is still c×h×w, the same as the input dimension;
the output is reordered across groups by the channel-shuffle function, ensuring information flow among the different groups;
the channel attention mechanism in the Shuffle Attention module first applies average pooling to the input to obtain a group of channel-related statistics; after a linear transformation and a sigmoid activation function, these statistics are multiplied element-wise with the original input to produce an output that captures the feature information of the target;
the spatial attention mechanism adopted in the Shuffle Attention module first applies group normalization to the input to obtain spatially related statistics; after a linear transformation and a sigmoid activation function, these statistics are multiplied element-wise with the original input to produce an output that captures the position information of the target.
The Shuffle Attention mechanism reduces the parameter quantity and computation cost while integrating feature information from both the channel and spatial dimensions, thereby improving the detection precision of the detector.
Further, the method of removing the large-target detection head of the YOLOv5s model is as follows:
since the sample targets are all small and medium-sized targets, the large-target detection head with 32× downsampling is removed; the network then retains the detection heads with 8× and 16× downsampling, corresponding respectively to small-target and medium-target detection.
And a large target detection head is removed, so that the accuracy of the detector is improved, and the parameter quantity and the calculated quantity are reduced.
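For concreteness, in the public YOLOv5 design the three detection heads sit at 8×, 16× and 32× downsampling, so removing the large-target head leaves the grid sizes sketched below (the 640×640 input size is an illustrative choice, not stated in the patent):

```python
# Grid sizes of the detection heads that remain once the 32x-downsampling
# (large-target) head is removed. The 8/16/32 strides follow the public
# YOLOv5 design; the 640x640 input size is illustrative.

def remaining_grids(img_size=640, strides=(8, 16)):
    return {s: (img_size // s, img_size // s) for s in strides}

grids = remaining_grids()
# stride 8 detects small targets, stride 16 detects medium targets
```

The dense stride-8 grid is the one that matters for the small lymphocyte targets, while the stride-32 grid, tuned for large objects, contributes little here and only adds parameters.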
The invention also provides a small target detection method based on the improved YOLOv5s algorithm, which comprises the following steps:
acquiring an image of a region to be detected, and preprocessing;
acquiring a friction force diagram of a region to be detected, and preprocessing;
based on the construction method, the YOLOv5s model is improved, and an optimized YOLOv5s model is obtained;
inputting the preprocessed image into an optimized YOLOv5s model, detecting small targets in the image, and counting the number of the small targets in a single area;
the small target number of the single region is compared with a set threshold value, and when the threshold value is exceeded, the region is screened and marked.
The method uses an improved YOLOv5s model to carry out small target detection so as to judge whether a patient suffers from Sjogren syndrome or not and realize auxiliary diagnosis.
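The screening step of the detection method above can be sketched as follows. The region naming, the per-region counts and the threshold of 50 are illustrative placeholders; the patent only requires comparing each region's small-target count against a set threshold.

```python
# Minimal sketch of the post-detection screening step: count the small
# targets (e.g. lymphocytes) detected in each region and flag any region
# whose count exceeds the set threshold. Region ids, counts and the
# threshold value are illustrative assumptions.

def screen_regions(counts, threshold):
    """counts: mapping of region id -> number of detected small targets."""
    return [region for region, n in counts.items() if n > threshold]

flagged = screen_regions({"region_1": 12, "region_2": 73, "region_3": 51},
                         threshold=50)
```

Flagged regions would then be marked in the output for the pathologist's review.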
The invention also provides a small target detection system based on the improved YOLOv5s algorithm, comprising an image acquisition module, a friction force diagram acquisition module and a processing module. The image acquisition module acquires the image to be detected, the friction force diagram acquisition module acquires the friction force diagram of the region to be detected, and the output ends of both acquisition modules are connected to the input end of the processing module. The processing module executes the small target detection method above, detects the small targets in the image, determines whether the number of small targets in a single region exceeds the threshold, and thereby assists in diagnosing whether the patient suffers from Sjogren's syndrome.
With the system, whether the patient suffers from Sjogren syndrome is diagnosed by acquiring and analyzing images and acquiring small target detection results of the images.
Further, the friction force diagram acquisition module is a friction tester.
The device is easy to obtain and convenient to use.
Drawings
FIG. 1 is a schematic diagram of the construction method of the improved YOLOv5s model of the present invention;
FIG. 2 is a schematic diagram of SIOU angle loss calculation for an improved construction method of the YOLOv5s model of the present invention;
FIG. 3 is a detailed schematic diagram of the self-attention layer of the multi-head self-attention module of the present invention for improving the construction method of the YOLOv5s model;
fig. 4 is a schematic structural diagram of a Shuffle attention module of the improved YOLOv5s model construction method of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
In the description of the present invention, it should be understood that the terms "longitudinal," "transverse," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on the orientation or positional relationships shown in the drawings, merely to facilitate describing the present invention and simplify the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and therefore should not be construed as limiting the present invention.
In the description of the present invention, unless otherwise specified and defined, it should be noted that the terms "mounted," "connected," and "coupled" are to be construed broadly, and may be, for example, mechanical or electrical, or may be in communication with each other between two elements, directly or indirectly through intermediaries, as would be understood by those skilled in the art, in view of the specific meaning of the terms described above.
The invention discloses a construction method for an improved YOLOv5s model (YOLOv5 is a single-stage target detection algorithm), which is used to detect lymphocyte-infiltration lesions in a pathology image and assist pathological diagnosis. Because lymphocytes are small and difficult to distinguish, the invention improves YOLOv5s to raise the accuracy of the lymphocyte detection task. As shown in fig. 1, the construction method includes the following steps:
replacing the CIOU loss function of the YOLOv5s model by using the Focal-SIOU loss function, accelerating network convergence, and improving model precision;
a multi-head self-attention module (MHSA) is introduced into the backbone part of the YOLOv5s model, helping the network capture more long-range dependencies and cope with the challenge of complex backgrounds;
in the neck portion of the YOLOv5s model, a Shuffle Attention (SA) module is introduced, enhancing the ability of the model to fuse spatial- and channel-dimension features;
a cross-modal image segmentation module is introduced after the Shuffle Attention module, comprising an image feature extraction module, a friction force feature extraction module and a feature fusion module;
and removing the large target detection head of the YOLOv5s model to obtain an optimized YOLOv5s model. And the corresponding detection heads are removed, so that the precision is improved, and meanwhile, the number of parameters and the complexity of a model are reduced.
In a preferred embodiment of the present invention, the method for replacing the CIOU loss function of the YOLOv5s model with the Focal-SIOU loss function is as follows:
in computer vision tasks, the efficiency of object detection is highly dependent on the definition of the loss function. Conventional object detection loss functions focus on several indicators of bounding box regression, including distance, overlap region, and aspect ratio, whereas conventional iou strategies do not take into account orientation information of real and predicted boxes. The prediction frame swings around in the training process, so that model training is slow, fitting is poor, and finally detection performance of the model is affected. The SIOU takes into account the angle loss, solving the above-mentioned problem. The loss function of the SIOU mainly consists of four parts, angle loss, distance loss, shape loss and IOU loss.
As shown in FIG. 2, let α be the included angle (at most 45°) between the line connecting the coordinate centers of the prediction frame and the real frame and the horizontal direction, and let C_h and C_w respectively denote the vertical and horizontal distances between the two coordinate centers. The straight-line distance σ between the two centers and the angle α satisfy:

σ = \sqrt{(b_{cx}^{gt} - b_{cx})^2 + (b_{cy}^{gt} - b_{cy})^2}, \quad \sin α = C_h / σ

The angle loss Λ is defined from the included angle α as:

Λ = 1 - 2\sin^2\left(\arcsin(\sin α) - \frac{π}{4}\right)

The distance loss Δ of SIOU is:

Δ = \sum_{t=x,y}\left(1 - e^{-γ ρ_t}\right), \quad ρ_x = \left(\frac{b_{cx}^{gt} - b_{cx}}{C_w}\right)^2, \quad ρ_y = \left(\frac{b_{cy}^{gt} - b_{cy}}{C_h}\right)^2, \quad γ = 2 - Λ

where (b_{cx}^{gt}, b_{cy}^{gt}) and (b_{cx}, b_{cy}) respectively denote the center-point coordinates of the real frame and the prediction frame; ρ_x and ρ_y are the distance loss factors in the width and height directions; γ is the angle loss factor.
The distance loss function integrates the angle loss: the closer the included angle α is to 45°, the larger the contribution of the angle loss; the closer α is to 0°, the smaller the contribution, and the loss degenerates into a pure distance loss. In the distance loss, C_w and C_h denote the maximum horizontal and vertical extents covering the prediction frame and the real frame, not the distances between their center points;
the shape loss Ω of SIOU is:

Ω = \sum_{t=w,h}\left(1 - e^{-ω_t}\right)^θ, \quad ω_w = \frac{|w - w^{gt}|}{\max(w, w^{gt})}, \quad ω_h = \frac{|h - h^{gt}|}{\max(h, h^{gt})}

where θ expresses the degree of attention paid to the shape loss; to achieve more balanced training its value is adjusted to the specific data set within the range [2, 6], and this embodiment sets it to 4. The smaller θ is, the more attention the shape loss receives, so that during training the model is more inclined to adjust the shape of the prediction frame, suppressing the feedback of the other losses; w, h and w^{gt}, h^{gt} respectively denote the width and height of the prediction frame and the real frame; ω_w and ω_h are the shape loss factors in the width and height directions.
After the three index losses are fused, the regression frame loss function L_{SIOU} of SIOU is:

L_{SIOU} = 1 - IOU + \frac{Δ + Ω}{2}

where IOU is the intersection ratio (intersection over union) between the real frame and the prediction frame.
When predicting the bounding-box regression of objects, the process is affected by the imbalance of training samples: in an image, high-quality anchor boxes with small regression errors are far fewer than low-quality anchor boxes with large errors, and the poor-quality anchor boxes produce excessive gradients that negatively affect training. To address this problem, Focal loss is integrated with SIOU to distinguish high-quality from low-quality anchor boxes, which helps to improve the accuracy of the regression. The Focal-SIOU loss function is shown below:
L_{Focal-SIOU} = IOU^{γ} · L_{SIOU}
where γ denotes the degree of attention paid to the IOU and takes values greater than 0. The larger γ is, the more the loss function attends to the IOU; the closer γ is to 0, the less it attends to the IOU, and the loss gradually degenerates to L_{SIOU}. IOU denotes the intersection ratio between the real frame and the prediction frame, and L_{SIOU} denotes the loss function of SIOU.
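A plain-Python sketch of the loss computation above, following the published SIoU formulation that this scheme builds on. Boxes are (cx, cy, w, h); θ = 4 as in this embodiment, while the focusing exponent gamma_f is an illustrative choice, since the text only requires a value greater than 0.

```python
import math

# Illustrative sketch of the SIOU and Focal-SIOU losses described above.
# In the distance loss, C_w and C_h are taken as the extents of the
# smallest box enclosing both frames, as the text specifies.

def iou(b1, b2):
    ax1, ay1 = b1[0] - b1[2] / 2, b1[1] - b1[3] / 2
    ax2, ay2 = b1[0] + b1[2] / 2, b1[1] + b1[3] / 2
    bx1, by1 = b2[0] - b2[2] / 2, b2[1] - b2[3] / 2
    bx2, by2 = b2[0] + b2[2] / 2, b2[1] + b2[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = b1[2] * b1[3] + b2[2] * b2[3] - inter
    return inter / union if union > 0 else 0.0

def siou_loss(pred, gt, theta=4):
    cw = abs(gt[0] - pred[0])            # horizontal center distance
    ch = abs(gt[1] - pred[1])            # vertical center distance
    sigma = math.hypot(cw, ch) + 1e-9    # straight-line center distance
    # angle loss Lambda = 1 - 2 sin^2(arcsin(sin a) - pi/4)
    lam = 1 - 2 * math.sin(math.asin(min(1.0, ch / sigma)) - math.pi / 4) ** 2
    # smallest enclosing-box extents, used to normalise the distance loss
    enc_w = (max(pred[0] + pred[2] / 2, gt[0] + gt[2] / 2)
             - min(pred[0] - pred[2] / 2, gt[0] - gt[2] / 2))
    enc_h = (max(pred[1] + pred[3] / 2, gt[1] + gt[3] / 2)
             - min(pred[1] - pred[3] / 2, gt[1] - gt[3] / 2))
    gamma = 2 - lam                      # angle loss factor
    rho_x = ((gt[0] - pred[0]) / enc_w) ** 2
    rho_y = ((gt[1] - pred[1]) / enc_h) ** 2
    delta = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))
    # shape loss Omega
    om_w = abs(pred[2] - gt[2]) / max(pred[2], gt[2])
    om_h = abs(pred[3] - gt[3]) / max(pred[3], gt[3])
    omega = (1 - math.exp(-om_w)) ** theta + (1 - math.exp(-om_h)) ** theta
    return 1 - iou(pred, gt) + (delta + omega) / 2

def focal_siou_loss(pred, gt, gamma_f=0.5):
    # IOU^gamma weighting down-weights low-overlap (low-quality) boxes
    return iou(pred, gt) ** gamma_f * siou_loss(pred, gt)
```

For a perfectly aligned prediction the loss is zero, and because IOU^γ < 1 for imperfect overlap, the Focal weighting always scales the SIOU loss down, most strongly for low-quality anchor boxes.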
In a preferred embodiment of the present invention, the method for introducing the multi-head self-attention module into the backbone part of the YOLOv5s model is as follows:
as shown in FIG. 3, the multi-headed self-attention module is a simple powerful self-attention module suitable for a variety of machine vision tasks including image classification, object detection, and instance segmentation.
Position codes are added to the multi-head self-attention layer to make it position-sensitive: the network attends to the relative positions among features while focusing on the feature information, efficiently combining the two kinds of information.
The number of heads in the multi-head self-attention layer is defined (e.g. 4 heads, adjusted according to the specific application scenario); for the input, a query vector q, a key vector k and a value vector v are first generated by pointwise convolution, and R_h and R_w respectively denote the position codes extracted along the height and the width;
after the two position codes are added element-wise, a position vector r is generated; matrix multiplication of r with q produces the content-position vector qr^T, and matrix multiplication of q with k produces the content-content vector qk^T;
qr^T and qk^T are added element-wise, passed through the softmax layer and then matrix-multiplied with v, finally yielding the output feature z (the output feature extracted by the multi-head self-attention mechanism).
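The score computation just described can be illustrated as a toy, single-head sketch over a flattened sequence (the real module operates on 2-D feature maps with multiple heads; all sizes and values below are illustrative):

```python
import math

# Toy sketch of the attention computation above: content-content scores
# (q k^T) plus content-position scores (q r^T), softmax over each row,
# then multiplication with v to produce the output feature z.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in transpose(b)]
            for row in a]

def softmax(row):
    mx = max(row)
    exps = [math.exp(v - mx) for v in row]
    s = sum(exps)
    return [v / s for v in exps]

def attention(q, k, v, r):
    qkT = matmul(q, transpose(k))   # content-content vector
    qrT = matmul(q, transpose(r))   # content-position vector
    scores = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(qkT, qrT)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)       # output feature z

# with all-zero position codes this reduces to plain self-attention
z = attention([[1.0, 0.0], [0.0, 1.0]],
              [[1.0, 0.0], [0.0, 1.0]],
              [[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 0.0], [0.0, 0.0]])
```

With a non-zero r, the position term shifts the scores before the softmax, which is how the module becomes sensitive to where a feature sits in the map.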
In a preferred embodiment of the present invention, the method of introducing the Shuffle Attention module into the neck portion of the YOLOv5s model is as follows:
attention mechanisms have become key components for improving model detection performance, and two types of attention mechanisms are widely applied to machine vision research, namely a spatial attention mechanism and a channel attention mechanism, and focus on information in spatial and channel dimensions.
The channel attention mechanism is helpful for the model to confirm the characteristic information of the detection target, and the spatial attention mechanism is helpful for the model to acquire the position information of the detection target. While fusing channel attention with spatial attention improves performance, it also increases the number of parameters and computational consumption.
As shown in fig. 4, shuffle attention integrates the characteristic information of two dimensions of the channel and the space while reducing the parameter amount and the calculation consumption required by the attention mechanism, so as to improve the detection precision of the detector.
The dimension of the input feature map is c×h×w. The input feature map is divided into g groups along the channel dimension c, so that the dimension of each group is c/g×h×w;
each group is further split into two branches along the channel dimension, and the dimension of each branch becomes c/2g×h×w;
the two branches pass through a spatial attention module and a channel attention module respectively to generate their own feature maps, helping the model focus on the detection target and on its position information;
after information extraction, the two feature maps are concatenated, so that the dimension returns to c/g×h×w; after features have been extracted from all g groups, they are concatenated again to obtain the output, whose dimension is still c×h×w, the same as the input;
the output reorders the groups through a channel shuffle function to ensure information flow among the different groups.
The spatial attention mechanism and the channel attention mechanism adopted in Shuffle Attention are simple to construct; compared with the SE and CBAM attention mechanisms, accuracy is improved while fewer parameters and a lower computational cost are required. The channel attention branch in the Shuffle Attention module first applies average pooling to the input to obtain a set of channel-wise statistics; after a linear transformation and a sigmoid activation function, the statistics are multiplied element-wise with the original input to obtain an output carrying the target's feature information;
the spatial attention branch in the Shuffle Attention module first applies group normalization to the input to obtain spatially related statistics; after a linear transformation and a sigmoid activation function, the statistics are multiplied element-wise with the original input to obtain an output carrying the target's position information.
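The grouping, dual-branch attention, and channel shuffle described above can be sketched in numpy as follows. This is a minimal sketch: the learnable scale/shift parameters of both branches are omitted, the group normalization is simplified to a per-branch normalization, and the function name is illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def shuffle_attention(x, g=4, eps=1e-5):
    """Shuffle Attention sketch. x: (C, H, W); C must be divisible by 2*g."""
    C, H, W = x.shape
    out_groups = []
    for grp in x.reshape(g, C // g, H, W):       # g groups of shape (C/g, H, W)
        half = grp.shape[0] // 2
        xc, xs = grp[:half], grp[half:]          # two branches: (C/2g, H, W)

        # channel branch: average pool -> sigmoid -> reweight original input
        s = xc.mean(axis=(1, 2), keepdims=True)  # channel-wise statistics
        xc = xc * sigmoid(s)                     # (linear scale/shift omitted)

        # spatial branch: normalize -> sigmoid -> reweight original input
        mu, var = xs.mean(), xs.var()
        gn = (xs - mu) / np.sqrt(var + eps)      # simplified group norm
        xs = xs * sigmoid(gn)

        out_groups.append(np.concatenate([xc, xs], axis=0))  # back to (C/g, H, W)

    out = np.concatenate(out_groups, axis=0)     # (C, H, W), same as the input
    # channel shuffle: interleave the g groups so information flows across groups
    return out.reshape(g, C // g, H, W).transpose(1, 0, 2, 3).reshape(C, H, W)

x = np.random.default_rng(1).normal(size=(16, 8, 8))
y = shuffle_attention(x, g=4)
print(y.shape)  # (16, 8, 8)
```

The final reshape-transpose-reshape is the standard channel shuffle trick: it permutes channels across groups without any learned parameters.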
In a preferred scheme of the invention, a cross-mode image segmentation module acquires color images of a region to be detected and an image map formed by friction force of the region to be detected, the friction force of different position points of the region to be detected is represented by gray values, the two images are adjusted to be of the same resolution, each image is provided with xerosis segmentation label information, and image data pairs are divided into a training set, a verification set and a test set;
the image feature extraction module performs feature extraction on the color image of the region to be detected to obtain single-mode color image features;
the friction force characteristic extraction module performs characteristic extraction on the force diagram of the region to be detected to obtain a single-mode friction force characteristic;
the feature fusion module comprises a first gating module (a ReLU-type function), a second gating module, and a fusion network; the first gating module receives the color image features and passes the portions whose output exceeds the image threshold, the second gating module receives the friction force features and passes the portions whose output exceeds the friction force threshold, and the fusion network fuses the features output by the first and second gating modules, taking the friction force features as one channel of the output image.
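The two-gate fusion can be illustrated with a minimal sketch. The thresholded-ReLU gates and the single-channel stacking below are one plausible reading of the scheme; the function names and threshold values are assumptions, not taken from the patent:

```python
import numpy as np

def gate(features, threshold):
    """Thresholded ReLU-style gate: pass only responses above the threshold."""
    return np.where(features > threshold, features, 0.0)

def fuse(color_feat, friction_feat, img_thr=0.2, fric_thr=0.3):
    """Fuse gated color-image features (C, H, W) with a gated friction map (H, W).

    The friction features are appended as one extra channel of the output.
    """
    g_color = gate(color_feat, img_thr)      # first gating module
    g_fric = gate(friction_feat, fric_thr)   # second gating module
    return np.concatenate([g_color, g_fric[None]], axis=0)

color = np.random.default_rng(2).random((8, 16, 16))
fric = np.random.default_rng(3).random((16, 16))
fused = fuse(color, fric)
print(fused.shape)  # (9, 16, 16)
```

Both modalities must be resampled to the same resolution before fusion, matching the preprocessing step described for the cross-modal image segmentation module.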
In a preferred scheme of the invention, the method for removing the large target detection head of the YOLOv5s model comprises the following steps:
the classical YOLOv5 model comprises three detection heads with 8×, 16×, and 32× down-sampling rates, corresponding to small target detection, medium target detection, and large target detection respectively. Since the sample targets are all small and medium-sized targets, after the 32×-rate large target detection head is removed, the network comprises detection heads with the 8× and 16× sampling rates, corresponding to small target detection and medium target detection respectively.
Removing the large target detection head improves the accuracy of the detector while reducing the parameter count and the amount of computation.
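The head-removal step can be illustrated with a small sketch. The stride values 8/16/32 follow the standard YOLOv5 configuration; the function names are illustrative:

```python
def prune_heads(head_strides, remove=(32,)):
    """Keep only the detection heads whose down-sampling rate is not removed."""
    return [s for s in head_strides if s not in remove]

def grid_cells(img_size, strides):
    """Number of prediction cells per head for a square input image."""
    return {s: (img_size // s) ** 2 for s in strides}

kept = prune_heads([8, 16, 32])  # -> [8, 16]: small + medium target heads
print(kept)
print(grid_cells(640, kept))     # {8: 6400, 16: 1600}
```

Dropping the 32× head removes its entire prediction branch (and the neck layers that feed only it), which is where the large parameter and GFLOPs savings reported below come from.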
The invention also provides a small target detection method based on the improved YOLOv5s algorithm, which comprises the following steps:
acquiring an image of a region to be detected, and preprocessing; the image to be detected is segmented at the highest resolution into pathological image blocks of size 640×640, and 300 images are screened as the experimental data set. The data set is divided into a training set, a validation set, and a test set in a ratio of 8:1:1.
Acquiring a friction force diagram of a region to be detected, and preprocessing;
based on the construction method, the YOLOv5s model is improved, and an optimized YOLOv5s model is obtained;
inputting the preprocessed image into an optimized YOLOv5s model, detecting small targets in the image, and counting the number of the small targets in a single area;
the small target number of each single region is compared with a set threshold; when the threshold is exceeded, the region is screened and labeled for use in diagnosing whether the patient has Sjogren's syndrome.
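The counting-and-screening step can be sketched in plain Python. The detector call itself is left out; the function name, the dictionary layout, and the threshold value below are placeholders for whatever model output and clinical criterion are actually used:

```python
def screen_regions(region_detections, count_threshold):
    """Flag regions whose small-target (lymphocyte) count exceeds the threshold.

    region_detections: dict mapping region id -> list of detected boxes.
    Returns the ids of regions to mark as suspicious foci.
    """
    flagged = []
    for region_id, boxes in region_detections.items():
        if len(boxes) > count_threshold:
            flagged.append(region_id)
    return flagged

# toy example: three 640x640 blocks with detected-box counts 2, 55, and 48
detections = {"block_0": [None] * 2, "block_1": [None] * 55, "block_2": [None] * 48}
print(screen_regions(detections, count_threshold=50))  # ['block_1']
```

The flagged region ids can then be mapped back to their coordinates in the whole-slide image and color-marked for the reviewing doctor.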
For example, the experiment sets the number of training rounds to 100 epochs; the batch size is 5; the input image size is 640×640; Adam is used as the optimization algorithm, with the initial learning rate set to 0.01, the decay factor 0.005, and the momentum parameter 0.937. Pathological images of lip gland biopsies are collected; the WSI (whole-slide image) of each lip gland biopsy is cut at the highest resolution into pathological image blocks of size 640×640, 300 of which are screened as the experimental data set, and lymphocytes in the image blocks are manually labeled with labelimg based on lymphocyte discrimination criteria. The data set is divided into a training set, a validation set, and a test set in a ratio of 8:1:1.
The present invention employs target detection metrics to evaluate the performance of the improved YOLOv5s in the lymphocyte detection task. The primary metric of interest is mAP@0.5. Since there is only one class of target to be detected (lymphocytes), mAP@0.5 reduces to the area under the precision-recall curve at an IOU threshold of 0.5:

mAP@0.5 = ∫₀¹ P(R) dR

wherein P and R respectively represent the precision and the recall rate, satisfying:

P = TP / (TP + FP),  R = TP / (TP + FN)

wherein TP represents positive instances correctly predicted as positive, FP represents negative instances incorrectly predicted as positive, FN represents positive instances incorrectly predicted as negative, and TN represents negative instances correctly predicted as negative.
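Under these definitions the metrics reduce to a few lines of Python (the counts in the usage example are illustrative, not experimental results):

```python
def precision(tp, fp):
    """Fraction of predicted positives that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of actual positives that are found: TP / (TP + FN)."""
    return tp / (tp + fn)

# e.g. 90 correct detections, 10 false alarms, 30 missed lymphocytes
p = precision(tp=90, fp=10)  # 0.9
r = recall(tp=90, fn=30)     # 0.75
print(p, r)
```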
The improved YOLOv5s target detection model improves on the original model in terms of the loss function, feature extraction, and attention mechanism. To evaluate the improvement of the individual modules and the impact of module combinations on detection performance, ablation experiments were designed on the data set herein, using mAP@0.5 as the evaluation index; the experimental results are shown in Table 1.
Table 1 ablation experimental results
The detection accuracy of the original YOLOv5s model on the data set is 87.7%. After the multi-head self-attention module is mounted on the backbone, the model accuracy improves by 1.4%, with the parameter count and GFLOPs slightly reduced. After the Shuffle Attention module is mounted on the model neck, the accuracy improves by 1.5%, with the parameter count and GFLOPs slightly increased. After the large target detection head is removed, the accuracy improves by 1.2%, the parameter count drops sharply, and the GFLOPs decrease noticeably. After replacing the original CIOU with Focal-SIOU, the accuracy improves by 0.5% without changing the parameter count or GFLOPs. With all the improvement strategies fused, the final detection accuracy of the model reaches 91.1%; compared with the original network, the accuracy is improved by 3.4%, the parameter count is reduced by 29.6%, and the GFLOPs are reduced by 10.8%, showing that the improvement strategies adopted herein have a significant effect on lymphocyte detection.
Comparing the inventive network with other networks of similar size, it can be seen that the inventive network has certain advantages in both accuracy and parameter count, as shown in Table 2.
Table 2 model comparison results
Network model    mAP@0.5 /%    Parameter count    GFLOPs
YOLOv7-tiny      85.2          6.01×10^6          13.0
YOLOv7           85.5          9.32×10^6          26.7
RetinaNet        69.6          19.8×10^6          61.5
YOLOv3-SPP       89.9          4.12×10^6          12.0
YOLOv6n          88.3          4.23×10^6          11.8
RT-DETR          88.3          20×10^6            60
Network herein   91.1          4.94×10^6          14.1
The improved YOLOv5s model can fully extract background information and effectively distinguish interfering cells of similar color and shape, such as epithelial cells, ensuring detection accuracy. Based on the improved YOLOv5s model, the WSI of the lip gland biopsy is detected block by block, the lymphocyte count of each single area is tallied, and blocks whose lymphocyte count exceeds a set threshold are color-marked as suspicious foci, providing a reference for doctors and assisting the diagnosis of Sjogren's syndrome.
The invention also provides a small target detection system based on the improved YOLOv5s algorithm, which comprises an image acquisition module, a friction force diagram acquisition module and a processing module, wherein the image acquisition module is used for acquiring an image to be detected, the friction force diagram acquisition module is used for acquiring friction force diagrams of an area to be detected, the output ends of the image acquisition module and the friction force diagram acquisition module are respectively connected with the input end of the processing module, the processing module executes the small target detection method, detects small targets of the image, determines whether the number of the small targets of a single area exceeds a threshold value, and diagnoses whether a patient suffers from Sjogren syndrome. Preferably, the friction force diagram acquisition module is a friction tester.
With this system, images are acquired and analyzed, small target detection results of the images are obtained, and whether the patient suffers from Sjogren's syndrome is diagnosed.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.

Claims (9)

1. The construction method for improving the YOLOv5s model is characterized by comprising the following steps of:
replacing the CIOU loss function of the YOLOv5s model by using the Focal-SIOU loss function; introducing a multi-head self-attention module into a skeleton network part of a YOLOv5s model;
in the neck portion of the YOLOv5s model, introducing a Shuffle Attention module;
introducing a cross-modal image segmentation module after the Shuffle Attention module, wherein the cross-modal image segmentation module comprises an image feature extraction module, a friction force feature extraction module and a feature fusion module;
and removing the large target detection head of the YOLOv5s model to obtain an optimized YOLOv5s model.
2. The method for constructing an improved YOLOv5s model according to claim 1, wherein the method for replacing the CIOU loss function of the YOLOv5s model with the Focal-SIOU loss function is as follows:
assume that C_h and C_w respectively represent the vertical and horizontal distances between the coordinate centers of the predicted frame and the real frame; the straight-line distance σ between the two centers and the included angle α satisfy:

σ = sqrt(C_w² + C_h²),  sin α = C_h / σ

The angle loss Λ is defined from the included angle α as:

Λ = 1 − 2·sin²(α − π/4)

The distance loss Δ of SIOU is:

Δ = (1 − e^(−γ·ρ_x)) + (1 − e^(−γ·ρ_y)),  γ = 2 − Λ

ρ_x = ((b_cx^gt − b_cx) / C_w)²,  ρ_y = ((b_cy^gt − b_cy) / C_h)²

wherein (b_cx, b_cy) and (b_cx^gt, b_cy^gt) respectively represent the center-point coordinates of the predicted frame and the real frame; ρ_x and ρ_y represent the distance loss factors of the real frame and the predicted frame in the width and height directions; γ is the angle loss factor;

the distance loss function is fused with the angle loss: the closer the included angle α is to 45°, the larger the contribution of the angle loss; the closer α is to 0°, the smaller the contribution, and the loss degenerates into a pure distance loss. Here C_w and C_h represent the maximum horizontal and vertical extents spanned by the predicted frame and the real frame together, not the distances between their center points;

the shape loss Ω of SIOU is:

Ω = (1 − e^(−ω_w))^θ + (1 − e^(−ω_h))^θ

ω_w = |w − w^gt| / max(w, w^gt),  ω_h = |h − h^gt| / max(h, h^gt)

wherein θ represents the degree of attention paid to the shape loss, whose value requires corresponding adjustment according to the particular data set; w, h and w^gt, h^gt respectively represent the width and height of the predicted frame and the real frame; ω_w and ω_h represent the shape loss factors in the width and height directions;

after the three losses are fused, the loss function L_SIOU of SIOU is:

L_SIOU = 1 − IOU + (Δ + Ω) / 2

wherein IOU is the intersection-over-union between the real frame and the predicted frame;

Focal loss is fused with SIOU to distinguish high-quality from low-quality anchor boxes, which helps improve regression accuracy; the Focal-SIOU loss function is:

L_Focal-SIOU = IOU^γ · L_SIOU

wherein γ represents the degree of attention paid to the IOU and takes values greater than 0: the larger the value of γ, the more the loss function focuses on the IOU; the closer the value is to 0, the less attention the loss function pays to the IOU, and the loss gradually degenerates to L_SIOU; IOU represents the intersection-over-union between the real frame and the predicted frame; L_SIOU represents the loss function of SIOU.
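The loss above can be sketched numerically as follows. This is a numpy sketch following the SIOU formulation: the (cx, cy, w, h) box format, the θ and γ defaults, and the use of the enclosing-box extents for C_w and C_h are assumptions spelled out in the comments:

```python
import numpy as np

def siou_loss(pred, gt, theta=4.0):
    """SIOU loss for one box pair; boxes are (cx, cy, w, h) tuples (assumption)."""
    px, py, pw, ph = pred
    gx, gy, gw, gh = gt
    # IOU from corner coordinates
    ix = max(0.0, min(px + pw / 2, gx + gw / 2) - max(px - pw / 2, gx - gw / 2))
    iy = max(0.0, min(py + ph / 2, gy + gh / 2) - max(py - ph / 2, gy - gh / 2))
    inter = ix * iy
    iou = inter / (pw * ph + gw * gh - inter)
    # angle loss: sin(alpha) = C_h / sigma
    ch, cw = abs(gy - py), abs(gx - px)
    sigma = np.hypot(cw, ch) + 1e-9
    alpha = np.arcsin(min(ch / sigma, 1.0))
    lam = 1 - 2 * np.sin(alpha - np.pi / 4) ** 2
    # distance loss, normalized by the enclosing-box extents (assumption)
    Cw = max(px + pw / 2, gx + gw / 2) - min(px - pw / 2, gx - gw / 2)
    Ch = max(py + ph / 2, gy + gh / 2) - min(py - ph / 2, gy - gh / 2)
    gamma = 2 - lam
    delta = sum(1 - np.exp(-gamma * r) for r in ((cw / Cw) ** 2, (ch / Ch) ** 2))
    # shape loss
    ww = abs(pw - gw) / max(pw, gw)
    wh = abs(ph - gh) / max(ph, gh)
    omega = (1 - np.exp(-ww)) ** theta + (1 - np.exp(-wh)) ** theta
    return 1 - iou + (delta + omega) / 2

def focal_siou_loss(pred, gt, gamma=0.5):
    """Focal-SIOU: reweight SIOU by IOU**gamma to emphasise high-quality anchors."""
    px, py, pw, ph = pred
    gx, gy, gw, gh = gt
    ix = max(0.0, min(px + pw / 2, gx + gw / 2) - max(px - pw / 2, gx - gw / 2))
    iy = max(0.0, min(py + ph / 2, gy + gh / 2) - max(py - ph / 2, gy - gh / 2))
    inter = ix * iy
    iou = inter / (pw * ph + gw * gh - inter)
    return iou ** gamma * siou_loss(pred, gt)

box = (10.0, 10.0, 4.0, 4.0)
print(round(siou_loss(box, box), 6))  # 0.0 for a perfect match
```

A perfectly matched pair yields zero loss, while any center offset, angle, or shape mismatch increases it, which is the behavior the claim describes.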
3. The method for constructing an improved YOLOv5s model according to claim 1, wherein the method for introducing a multi-headed self-attention module into a skeleton network part of the YOLOv5s model comprises the following steps:
performing structural adjustment on the C3 module of the original YOLOv5s network and integrating a multi-head self-attention layer into it;
adding position encodings to the multi-head self-attention layer to make it position-sensitive;
defining the number of heads in the multi-head self-attention layer to balance accuracy and computation; the input first generates a query vector q, a key vector k, and a value vector v through pointwise convolution, while R_h and R_w respectively denote position encodings extracted along the height and the width;
the position encodings are added element-wise to generate a position vector r; r is matrix-multiplied with q to generate the content-position term qr^T, and q is matrix-multiplied with k to generate the content-content term qk^T;
qr^T and qk^T are added element-wise, passed through a softmax layer, and matrix-multiplied with v to finally obtain the output feature z of the multi-head self-attention layer.
4. The method of constructing an improved YOLOv5s model according to claim 1, wherein the method of introducing the Shuffle Attention module in the neck portion of the YOLOv5s model is as follows:
the dimension of the input feature map is c×h×w; the input feature map is divided into g groups along the channel dimension c, so that the dimension of each group is c/g×h×w;
each group is further split into two branches along the channel dimension, and the dimension of each branch becomes c/2g×h×w;
the two branches pass through a spatial attention module and a channel attention module respectively to generate their own feature maps, helping the model focus on the detection target and on its position information;
after information extraction, the two feature maps are concatenated, so that the dimension returns to c/g×h×w; after features have been extracted from all g groups, they are concatenated again to obtain the output, whose dimension is still c×h×w, the same as the input;
the output reorders the groups through a channel shuffle function to ensure information flow among the different groups;
the channel attention mechanism in the Shuffle Attention module first applies average pooling to the input to obtain a set of channel-wise statistics; after a linear transformation and a sigmoid activation function, the statistics are multiplied element-wise with the original input to obtain an output carrying the target's feature information;
the spatial attention mechanism in the Shuffle Attention module first applies group normalization to the input to obtain spatially related statistics; after a linear transformation and a sigmoid activation function, the statistics are multiplied element-wise with the original input to obtain an output carrying the target's position information.
5. The method for constructing the improved YOLOv5s model according to claim 1, wherein the cross-mode image segmentation module acquires color images of a region to be detected and force patterns formed by friction force of the region to be detected, the force patterns represent friction force of different position points of the region to be detected by gray values, the two images are adjusted to be of the same resolution, each image is provided with xerosis segmentation label information, and image data pairs are divided into a training set, a verification set and a test set;
the image feature extraction module performs feature extraction on the color image of the region to be detected to obtain single-mode color image features;
the friction force characteristic extraction module performs characteristic extraction on the force diagram of the region to be detected to obtain a single-mode friction force characteristic;
the feature fusion module comprises a first gating module, a second gating module, and a fusion network; the first gating module receives the color image features and passes the portions whose output exceeds the image threshold, the second gating module receives the friction force features and passes the portions whose output exceeds the friction force threshold, and the fusion network fuses the features output by the first and second gating modules, taking the friction force features as one channel of the output image.
6. The method for constructing an improved YOLOv5s model according to claim 1, wherein the method for removing the large target detection head of the YOLOv5s model is as follows:
the classical YOLOv5 model comprises three detection heads with 8×, 16×, and 32× down-sampling rates; since the sample targets are all small and medium-sized targets, the 32×-rate large target detection head is removed, after which the network comprises detection heads with the 8× and 16× sampling rates, corresponding to small target detection and medium target detection respectively.
7. The small target detection method based on the improved YOLOv5s algorithm is characterized by comprising the following steps of:
acquiring an image of a region to be detected, and preprocessing;
acquiring a friction force diagram of a region to be detected, and preprocessing; based on the construction method of one of claims 1 to 6, improving the YOLOv5s model to obtain an optimized YOLOv5s model;
inputting the preprocessed image into an optimized YOLOv5s model, detecting small targets in the image, and counting the number of the small targets in a single area;
the small target number of the single region is compared with a set threshold value, and when the threshold value is exceeded, the region is screened and marked.
8. A small target detection system based on the improved YOLOv5s algorithm, characterized by comprising an image acquisition module, a friction force diagram acquisition module and a processing module, wherein the image acquisition module is used for acquiring an image to be detected, the friction force diagram acquisition module is used for acquiring a friction force diagram of the region to be detected, the output ends of the image acquisition module and the friction force diagram acquisition module are respectively connected with the input end of the processing module, and the processing module executes the method of claim 7, detects small targets in the image, determines whether the number of small targets in a single region exceeds a threshold, and diagnoses whether a patient has Sjogren's syndrome.
9. The small target detection system based on the modified YOLOv5s algorithm of claim 8, wherein the friction force map acquisition module is a friction tester.
CN202311651968.6A 2023-12-05 2023-12-05 Construction method of improved YOLOv5s model, small target detection method and system Pending CN117475434A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311651968.6A CN117475434A (en) 2023-12-05 2023-12-05 Construction method of improved YOLOv5s model, small target detection method and system

Publications (1)

Publication Number Publication Date
CN117475434A true CN117475434A (en) 2024-01-30

Family

ID=89633087



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination