CN117475434A - Construction method of improved YOLOv5s model, small target detection method and system - Google Patents


Info

Publication number
CN117475434A
CN117475434A (application CN202311651968.6A)
Authority
CN
China
Prior art keywords
yolov5s
module
attention
model
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311651968.6A
Other languages
Chinese (zh)
Inventor
姜佩贺
李益
关姝睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yantai University
Original Assignee
Yantai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yantai University filed Critical Yantai University
Priority to CN202311651968.6A priority Critical patent/CN117475434A/en
Publication of CN117475434A publication Critical patent/CN117475434A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/69Microscopic objects, e.g. biological cells or cellular parts
    • G06V20/695Preprocessing, e.g. image segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of image processing, and specifically discloses a construction method for an improved YOLOv5s model, together with a small target detection method and system. The small target detection method comprises the following steps: acquiring an image of the region to be detected and preprocessing it; acquiring a friction force diagram of the region to be detected and preprocessing it; improving the YOLOv5s model to obtain an optimized YOLOv5s model; inputting the preprocessed image into the optimized YOLOv5s model, detecting the small targets in the image, and counting the number of small targets in each single region; and comparing the small-target count of each region with a set threshold value, screening and marking any region whose count exceeds the threshold. By adopting this technical scheme, the improved YOLOv5s model raises the accuracy of the small target detection task and is used to assist the pathological diagnosis of labial gland biopsy for Sjogren's syndrome.

Description

Construction method of improved YOLOv5s model, small target detection method and system
Technical Field
The invention belongs to the technical field of image processing, and relates to a construction method, a small target detection method and a system for improving a YOLOv5s model.
Background
Sjogren's Syndrome (SS) is a chronic inflammatory autoimmune disease characterized by lymphocyte proliferation and progressive damage to the exocrine glands. Although it mainly affects the salivary and lacrimal glands, it can also involve multiple organ systems including the lungs, kidneys, skin and blood, and is often accompanied by other systemic immune diseases such as Rheumatoid Arthritis (RA) and Systemic Lupus Erythematosus (SLE).
At present, the etiology of Sjogren's syndrome is not clear and may involve heredity, viral infection, sex hormone levels and other factors. Notably, Sjogren's syndrome is not uncommon, yet many patients have limited knowledge of it, which leads to delayed medical visits and a delayed course of disease; in most cases the condition can be well controlled and the prognosis improved if it is diagnosed early and treated systematically.
In the diagnosis of Sjogren's syndrome, a pathologist must review each pathological section one by one at different magnifications; the process is tedious and time-consuming, and because of the subjective variability among pathologists of different experience levels, missed diagnoses and misdiagnoses occur frequently, so accurate and efficient pathological diagnosis remains a great challenge in the pathologist's work. In the big-data era, artificial intelligence has been widely applied to medical-image-aided diagnosis, and with the rapid development of digital pathology, AI-assisted pathology is gradually emerging. In various tumors such as lung cancer and breast cancer, AI-assisted pathological diagnosis is efficient, stable and highly repeatable, reaching a level comparable to that of professional doctors; however, its application to the pathological diagnosis of labial gland biopsy for Sjogren's syndrome has not yet been reported.
Disclosure of Invention
The invention aims to provide a construction method, a small target detection method and a system for improving a YOLOv5s model, which are used for improving the accuracy of a small target detection task and assisting in the pathological diagnosis of the lip gland biopsy of Sjogren syndrome.
In order to achieve the above purpose, the basic scheme of the invention is as follows: the construction method for improving the YOLOv5s model comprises the following steps:
replacing the CIOU loss function of the YOLOv5s model by using the Focal-SIOU loss function;
introducing a multi-head self-attention module into the backbone network part of the YOLOv5s model;
in the neck portion of the YOLOv5s model, a Shuffle Attention module is introduced;
a cross-modal image segmentation module is introduced after the Shuffle Attention module, the cross-modal image segmentation module comprising an image feature extraction module, a friction force feature extraction module and a feature fusion module;
and removing the large target detection head of the YOLOv5s model to obtain an optimized YOLOv5s model.
The working principle and beneficial effects of the basic scheme are as follows: because lymphocytes are small and difficult to distinguish, this technical scheme improves YOLOv5s to raise the accuracy of the lymphocyte detection task, so as to detect lymphocytes accurately and assist pathological diagnosis.
Further, the method for replacing the CIOU loss function of the YOLOv5s model with the Focal-SIOU loss function is as follows:
Let α be the included angle (at most 45°) between the line connecting the coordinate centers of the prediction frame and the real frame and the horizontal direction, and let C_h and C_w respectively denote the vertical and horizontal distances between the two coordinate centers. The straight-line distance σ between the two centers and the angle α satisfy:

σ = \sqrt{(b_{cx}^{gt} - b_{cx})^2 + (b_{cy}^{gt} - b_{cy})^2}, \quad \sin α = C_h / σ

The angle loss Λ is defined from the included angle α as:

Λ = 1 - 2\sin^2\left(\arcsin(\sin α) - \frac{π}{4}\right)

The distance loss Δ of SIOU is:

Δ = \sum_{t=x,y}\left(1 - e^{-γ ρ_t}\right), \quad ρ_x = \left(\frac{b_{cx}^{gt} - b_{cx}}{C_w}\right)^2, \quad ρ_y = \left(\frac{b_{cy}^{gt} - b_{cy}}{C_h}\right)^2, \quad γ = 2 - Λ

where (b_{cx}^{gt}, b_{cy}^{gt}) and (b_{cx}, b_{cy}) respectively denote the center-point coordinates of the real frame and the prediction frame; ρ_x and ρ_y are the distance loss factors in the width and height directions; γ is the angle loss factor.
The distance loss function integrates the angle loss: the closer the included angle α is to 45°, the larger the contribution of the angle loss; the closer α is to 0°, the smaller the contribution, and the loss degenerates into a pure distance loss. In the distance loss, C_w and C_h denote the maximum horizontal and vertical extents covering the prediction frame and the real frame, not the distances between their center points;
the shape loss Ω of SIOU is:

Ω = \sum_{t=w,h}\left(1 - e^{-ω_t}\right)^θ, \quad ω_w = \frac{|w - w^{gt}|}{\max(w, w^{gt})}, \quad ω_h = \frac{|h - h^{gt}|}{\max(h, h^{gt})}

where θ expresses the degree of attention paid to the shape loss, and its value needs to be adjusted to the specific data set; w, h and w^{gt}, h^{gt} respectively denote the width and height of the prediction frame and the real frame; ω_w and ω_h are the shape loss factors in the width and height directions.
After the three index losses are fused, the regression frame loss function L_{SIOU} of SIOU is:

L_{SIOU} = 1 - IOU + \frac{Δ + Ω}{2}

where IOU is the intersection ratio (intersection over union) between the real frame and the prediction frame. By taking the angle loss into account, SIOU optimizes model performance.
When predicting the bounding-box regression of objects, the process is affected by the imbalance of training samples: in an image, high-quality anchor boxes with small regression errors are far fewer than low-quality anchor boxes with large errors, and the poor-quality anchor boxes produce excessive gradients that negatively affect training. To address this problem, Focal loss is integrated with SIOU to distinguish high-quality from low-quality anchor boxes, which helps to improve the accuracy of the regression. The Focal-SIOU loss function is:
L_{Focal-SIOU} = IOU^{γ} · L_{SIOU}
where γ denotes the degree of attention paid to the IOU and takes values greater than 0. The larger γ is, the more the loss function attends to the IOU; the closer γ is to 0, the less it attends to the IOU, and the loss gradually degenerates to L_{SIOU}. IOU denotes the intersection ratio between the real frame and the prediction frame, and L_{SIOU} denotes the loss function of SIOU.
Further, the method of introducing the multi-head self-attention module into the backbone part of the YOLOv5s model is as follows:
the C3 module of the original YOLOv5s network is structurally adjusted, and a multi-head self-attention layer is integrated into it;
the multi-head self-attention layer adds position codes to make the multi-head self-attention layer sensitive to the positions;
defining the number of heads in the multi-head self-attention layer, and inputtingFirst, a query vector q, a key vector k, and a value vector v are generated by point convolution, and R h 、R w Each representing a position code extracted from the height and width;
after the position coding performs corresponding element addition operation, a position vector r is generated, matrix multiplication is performed on r and q, and a vector qr corresponding to the content-position is generated T And q and k are subjected to matrix multiplication to generate a corresponding content-content vector qk T
qr T And qk T And performing corresponding element addition operation, performing matrix multiplication with v after passing through the softmax layer, and finally obtaining an output characteristic z.
The multi-head self-attention module is simple to build, the parameter quantity is slightly reduced, and the detection precision is obviously improved.
Further, the cross-modal image segmentation module acquires a color image of the region to be detected and an image map formed from the friction force of the region to be detected, in which gray values represent the friction force at different positions of the region. The two images are adjusted to the same resolution, each image carries xerosis segmentation label information, and the image-data pairs are divided into a training set, a verification set and a test set;
the image feature extraction module performs feature extraction on the color image of the region to be detected to obtain single-mode color image features;
the friction force characteristic extraction module performs characteristic extraction on the force diagram of the region to be detected to obtain a single-mode friction force characteristic;
the feature fusion module comprises a first gating module (a ReLU function), a second gating module and a fusion network. The first gating module takes the color-image features and passes on only the portions whose output exceeds the image threshold; the second gating module takes the friction features and passes on only the portions whose output exceeds the friction threshold; the fusion network fuses the features output by the two gating modules and takes the friction features as one channel of the output image.
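As a concrete illustration, the two-gate fusion just described can be sketched in plain Python. The patent does not specify the gate internals or the fusion rule, so the following assumptions are illustrative: each gate is modelled as a thresholded pass-through (a shifted ReLU that zeroes values at or below the threshold), the fusion is element-wise addition, and the threshold values are placeholders.

```python
# Hypothetical sketch of the two-gate feature fusion: the first gate
# keeps image-feature activations above the image threshold, the second
# keeps friction-feature activations above the friction threshold, and
# the fusion combines them, carrying the gated friction features along
# as an extra output channel. Thresholds and the additive fusion rule
# are illustrative assumptions, not taken from the patent text.

def gate(features, threshold):
    return [v if v > threshold else 0.0 for v in features]

def fuse(image_feats, friction_feats, image_thr=0.5, friction_thr=0.3):
    gated_img = gate(image_feats, image_thr)         # first gating module
    gated_fric = gate(friction_feats, friction_thr)  # second gating module
    fused = [a + b for a, b in zip(gated_img, gated_fric)]
    return fused, gated_fric                         # friction kept as a channel

fused, fric_channel = fuse([0.9, 0.2, 0.7], [0.1, 0.8, 0.4])
```

With the placeholder thresholds above, the weak image activation 0.2 and the weak friction activation 0.1 are suppressed, while the remaining activations pass through and combine.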
Further, in the neck portion of the YOLOv5s model, the method of introducing the Shuffle Attention module is as follows:
the dimension of the input feature map is c×h×w; it is divided into g groups along the channel dimension c, so that the dimension of each group becomes c/g×h×w;
each group is split again into two branches along the channel dimension, and the dimension of each branch becomes c/2g×h×w;
the two branches respectively generate respective feature graphs through a spatial attention module and a channel attention module to help the model to focus on the detection target and the position information of the detection target;
after the information is extracted, the two feature maps are spliced so that the dimension returns to c/g×h×w; after features have been extracted from all g groups, the outputs are spliced again, and the output dimension is still c×h×w, the same as the input dimension;
the output is reordered across groups by the channel-shuffle function, ensuring information flow among the different groups;
the channel attention mechanism in the Shuffle Attention module first applies average pooling to the input to obtain a group of channel-related statistics; after a linear transformation and a sigmoid activation function, these statistics are multiplied element-wise with the original input to produce an output that captures the feature information of the target;
the spatial attention mechanism adopted in the Shuffle Attention module first applies group normalization to the input to obtain spatially related statistics; after a linear transformation and a sigmoid activation function, these statistics are multiplied element-wise with the original input to produce an output that captures the position information of the target.
The Shuffle Attention mechanism reduces the parameter quantity and computation cost while integrating feature information from both the channel and spatial dimensions, thereby improving the detection precision of the detector.
Further, the method of removing the large-target detection head of the YOLOv5s model is as follows:
since the sample targets are all small and medium-sized targets, the large-target detection head with 32× downsampling is removed; the network then retains the detection heads with 8× and 16× downsampling, corresponding respectively to small-target and medium-target detection.
And a large target detection head is removed, so that the accuracy of the detector is improved, and the parameter quantity and the calculated quantity are reduced.
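For concreteness, in the public YOLOv5 design the three detection heads sit at 8×, 16× and 32× downsampling, so removing the large-target head leaves the grid sizes sketched below (the 640×640 input size is an illustrative choice, not stated in the patent):

```python
# Grid sizes of the detection heads that remain once the 32x-downsampling
# (large-target) head is removed. The 8/16/32 strides follow the public
# YOLOv5 design; the 640x640 input size is illustrative.

def remaining_grids(img_size=640, strides=(8, 16)):
    return {s: (img_size // s, img_size // s) for s in strides}

grids = remaining_grids()
# stride 8 detects small targets, stride 16 detects medium targets
```

The dense stride-8 grid is the one that matters for the small lymphocyte targets, while the stride-32 grid, tuned for large objects, contributes little here and only adds parameters.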
The invention also provides a small target detection method based on the improved YOLOv5s algorithm, which comprises the following steps:
acquiring an image of a region to be detected, and preprocessing;
acquiring a friction force diagram of a region to be detected, and preprocessing;
based on the construction method, the YOLOv5s model is improved, and an optimized YOLOv5s model is obtained;
inputting the preprocessed image into an optimized YOLOv5s model, detecting small targets in the image, and counting the number of the small targets in a single area;
the small target number of the single region is compared with a set threshold value, and when the threshold value is exceeded, the region is screened and marked.
The method uses an improved YOLOv5s model to carry out small target detection so as to judge whether a patient suffers from Sjogren syndrome or not and realize auxiliary diagnosis.
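The screening step of the detection method above can be sketched as follows. The region naming, the per-region counts and the threshold of 50 are illustrative placeholders; the patent only requires comparing each region's small-target count against a set threshold.

```python
# Minimal sketch of the post-detection screening step: count the small
# targets (e.g. lymphocytes) detected in each region and flag any region
# whose count exceeds the set threshold. Region ids, counts and the
# threshold value are illustrative assumptions.

def screen_regions(counts, threshold):
    """counts: mapping of region id -> number of detected small targets."""
    return [region for region, n in counts.items() if n > threshold]

flagged = screen_regions({"region_1": 12, "region_2": 73, "region_3": 51},
                         threshold=50)
```

Flagged regions would then be marked in the output for the pathologist's review.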
The invention also provides a small target detection system based on the improved YOLOv5s algorithm, comprising an image acquisition module, a friction force diagram acquisition module and a processing module. The image acquisition module acquires the image to be detected, the friction force diagram acquisition module acquires the friction force diagram of the region to be detected, and the output ends of both acquisition modules are connected to the input end of the processing module. The processing module executes the small target detection method above, detects the small targets in the image, determines whether the number of small targets in a single region exceeds the threshold, and thereby assists in diagnosing whether the patient suffers from Sjogren's syndrome.
With the system, whether the patient suffers from Sjogren syndrome is diagnosed by acquiring and analyzing images and acquiring small target detection results of the images.
Further, the friction force diagram acquisition module is a friction tester.
The device is easy to obtain and convenient to use.
Drawings
FIG. 1 is a schematic diagram of the construction method of the improved YOLOv5s model of the present invention;
FIG. 2 is a schematic diagram of SIOU angle loss calculation for an improved construction method of the YOLOv5s model of the present invention;
FIG. 3 is a detailed schematic diagram of the self-attention layer of the multi-head self-attention module of the present invention for improving the construction method of the YOLOv5s model;
fig. 4 is a schematic structural diagram of a Shuffle attention module of the improved YOLOv5s model construction method of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
In the description of the present invention, it should be understood that the terms "longitudinal," "transverse," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on the orientation or positional relationships shown in the drawings, merely to facilitate describing the present invention and simplify the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and therefore should not be construed as limiting the present invention.
In the description of the present invention, unless otherwise specified and defined, it should be noted that the terms "mounted," "connected," and "coupled" are to be construed broadly, and may be, for example, mechanical or electrical, or may be in communication with each other between two elements, directly or indirectly through intermediaries, as would be understood by those skilled in the art, in view of the specific meaning of the terms described above.
The invention discloses a construction method for an improved YOLOv5s model (YOLOv5 is a single-stage target detection algorithm), which is used to detect lymphocyte-infiltration lesions in a pathology image and assist pathological diagnosis. Because lymphocytes are small and difficult to distinguish, the invention improves YOLOv5s to raise the accuracy of the lymphocyte detection task. As shown in fig. 1, the construction method includes the following steps:
replacing the CIOU loss function of the YOLOv5s model by using the Focal-SIOU loss function, accelerating network convergence, and improving model precision;
a multi-head self-attention module (MHSA) is introduced into the backbone part of the YOLOv5s model, helping the network capture more long-range dependencies and cope with the challenge of complex backgrounds;
in the neck portion of the YOLOv5s model, a Shuffle Attention (SA) module is introduced, enhancing the ability of the model to fuse spatial- and channel-dimension features;
a cross-modal image segmentation module is introduced after the Shuffle Attention module, comprising an image feature extraction module, a friction force feature extraction module and a feature fusion module;
and removing the large target detection head of the YOLOv5s model to obtain an optimized YOLOv5s model. And the corresponding detection heads are removed, so that the precision is improved, and meanwhile, the number of parameters and the complexity of a model are reduced.
In a preferred embodiment of the present invention, the method for replacing the CIOU loss function of the YOLOv5s model with the Focal-SIOU loss function is as follows:
in computer vision tasks, the efficiency of object detection is highly dependent on the definition of the loss function. Conventional object detection loss functions focus on several indicators of bounding box regression, including distance, overlap region, and aspect ratio, whereas conventional iou strategies do not take into account orientation information of real and predicted boxes. The prediction frame swings around in the training process, so that model training is slow, fitting is poor, and finally detection performance of the model is affected. The SIOU takes into account the angle loss, solving the above-mentioned problem. The loss function of the SIOU mainly consists of four parts, angle loss, distance loss, shape loss and IOU loss.
As shown in FIG. 2, let α be the included angle (at most 45°) between the line connecting the coordinate centers of the prediction frame and the real frame and the horizontal direction, and let C_h and C_w respectively denote the vertical and horizontal distances between the two coordinate centers. The straight-line distance σ between the two centers and the angle α satisfy:

σ = \sqrt{(b_{cx}^{gt} - b_{cx})^2 + (b_{cy}^{gt} - b_{cy})^2}, \quad \sin α = C_h / σ

The angle loss Λ is defined from the included angle α as:

Λ = 1 - 2\sin^2\left(\arcsin(\sin α) - \frac{π}{4}\right)

The distance loss Δ of SIOU is:

Δ = \sum_{t=x,y}\left(1 - e^{-γ ρ_t}\right), \quad ρ_x = \left(\frac{b_{cx}^{gt} - b_{cx}}{C_w}\right)^2, \quad ρ_y = \left(\frac{b_{cy}^{gt} - b_{cy}}{C_h}\right)^2, \quad γ = 2 - Λ

where (b_{cx}^{gt}, b_{cy}^{gt}) and (b_{cx}, b_{cy}) respectively denote the center-point coordinates of the real frame and the prediction frame; ρ_x and ρ_y are the distance loss factors in the width and height directions; γ is the angle loss factor.
The distance loss function integrates the angle loss: the closer the included angle α is to 45°, the larger the contribution of the angle loss; the closer α is to 0°, the smaller the contribution, and the loss degenerates into a pure distance loss. In the distance loss, C_w and C_h denote the maximum horizontal and vertical extents covering the prediction frame and the real frame, not the distances between their center points;
the shape loss Ω of SIOU is:

Ω = \sum_{t=w,h}\left(1 - e^{-ω_t}\right)^θ, \quad ω_w = \frac{|w - w^{gt}|}{\max(w, w^{gt})}, \quad ω_h = \frac{|h - h^{gt}|}{\max(h, h^{gt})}

where θ expresses the degree of attention paid to the shape loss; to achieve more balanced training its value is adjusted to the specific data set within the range [2, 6], and this embodiment sets it to 4. The smaller θ is, the more attention the shape loss receives, so that during training the model is more inclined to adjust the shape of the prediction frame, suppressing the feedback of the other losses; w, h and w^{gt}, h^{gt} respectively denote the width and height of the prediction frame and the real frame; ω_w and ω_h are the shape loss factors in the width and height directions.
After the three index losses are fused, the regression frame loss function L_{SIOU} of SIOU is:

L_{SIOU} = 1 - IOU + \frac{Δ + Ω}{2}

where IOU is the intersection ratio (intersection over union) between the real frame and the prediction frame.
When predicting the bounding-box regression of objects, the process is affected by the imbalance of training samples: in an image, high-quality anchor boxes with small regression errors are far fewer than low-quality anchor boxes with large errors, and the poor-quality anchor boxes produce excessive gradients that negatively affect training. To address this problem, Focal loss is integrated with SIOU to distinguish high-quality from low-quality anchor boxes, which helps to improve the accuracy of the regression. The Focal-SIOU loss function is shown below:
L_{Focal-SIOU} = IOU^{γ} · L_{SIOU}
where γ denotes the degree of attention paid to the IOU and takes values greater than 0. The larger γ is, the more the loss function attends to the IOU; the closer γ is to 0, the less it attends to the IOU, and the loss gradually degenerates to L_{SIOU}. IOU denotes the intersection ratio between the real frame and the prediction frame, and L_{SIOU} denotes the loss function of SIOU.
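A plain-Python sketch of the loss computation above, following the published SIoU formulation that this scheme builds on. Boxes are (cx, cy, w, h); θ = 4 as in this embodiment, while the focusing exponent gamma_f is an illustrative choice, since the text only requires a value greater than 0.

```python
import math

# Illustrative sketch of the SIOU and Focal-SIOU losses described above.
# In the distance loss, C_w and C_h are taken as the extents of the
# smallest box enclosing both frames, as the text specifies.

def iou(b1, b2):
    ax1, ay1 = b1[0] - b1[2] / 2, b1[1] - b1[3] / 2
    ax2, ay2 = b1[0] + b1[2] / 2, b1[1] + b1[3] / 2
    bx1, by1 = b2[0] - b2[2] / 2, b2[1] - b2[3] / 2
    bx2, by2 = b2[0] + b2[2] / 2, b2[1] + b2[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = b1[2] * b1[3] + b2[2] * b2[3] - inter
    return inter / union if union > 0 else 0.0

def siou_loss(pred, gt, theta=4):
    cw = abs(gt[0] - pred[0])            # horizontal center distance
    ch = abs(gt[1] - pred[1])            # vertical center distance
    sigma = math.hypot(cw, ch) + 1e-9    # straight-line center distance
    # angle loss Lambda = 1 - 2 sin^2(arcsin(sin a) - pi/4)
    lam = 1 - 2 * math.sin(math.asin(min(1.0, ch / sigma)) - math.pi / 4) ** 2
    # smallest enclosing-box extents, used to normalise the distance loss
    enc_w = (max(pred[0] + pred[2] / 2, gt[0] + gt[2] / 2)
             - min(pred[0] - pred[2] / 2, gt[0] - gt[2] / 2))
    enc_h = (max(pred[1] + pred[3] / 2, gt[1] + gt[3] / 2)
             - min(pred[1] - pred[3] / 2, gt[1] - gt[3] / 2))
    gamma = 2 - lam                      # angle loss factor
    rho_x = ((gt[0] - pred[0]) / enc_w) ** 2
    rho_y = ((gt[1] - pred[1]) / enc_h) ** 2
    delta = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))
    # shape loss Omega
    om_w = abs(pred[2] - gt[2]) / max(pred[2], gt[2])
    om_h = abs(pred[3] - gt[3]) / max(pred[3], gt[3])
    omega = (1 - math.exp(-om_w)) ** theta + (1 - math.exp(-om_h)) ** theta
    return 1 - iou(pred, gt) + (delta + omega) / 2

def focal_siou_loss(pred, gt, gamma_f=0.5):
    # IOU^gamma weighting down-weights low-overlap (low-quality) boxes
    return iou(pred, gt) ** gamma_f * siou_loss(pred, gt)
```

For a perfectly aligned prediction the loss is zero, and because IOU^γ < 1 for imperfect overlap, the Focal weighting always scales the SIOU loss down, most strongly for low-quality anchor boxes.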
In a preferred embodiment of the present invention, the method for introducing the multi-head self-attention module into the backbone part of the YOLOv5s model is as follows:
as shown in FIG. 3, the multi-headed self-attention module is a simple powerful self-attention module suitable for a variety of machine vision tasks including image classification, object detection, and instance segmentation.
Position codes are added to the multi-head self-attention layer to make it position-sensitive: the network attends to the relative positions among features while focusing on the feature information, efficiently combining the two kinds of information.
The number of heads in the multi-head self-attention layer is defined (e.g. 4 heads, adjusted according to the specific application scenario); for the input, a query vector q, a key vector k and a value vector v are first generated by pointwise convolution, and R_h and R_w respectively denote the position codes extracted along the height and the width;
after the two position codes are added element-wise, a position vector r is generated; matrix multiplication of r with q produces the content-position vector qr^T, and matrix multiplication of q with k produces the content-content vector qk^T;
qr^T and qk^T are added element-wise, passed through the softmax layer and then matrix-multiplied with v, finally yielding the output feature z (the output feature extracted by the multi-head self-attention mechanism).
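The score computation just described can be illustrated as a toy, single-head sketch over a flattened sequence (the real module operates on 2-D feature maps with multiple heads; all sizes and values below are illustrative):

```python
import math

# Toy sketch of the attention computation above: content-content scores
# (q k^T) plus content-position scores (q r^T), softmax over each row,
# then multiplication with v to produce the output feature z.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in transpose(b)]
            for row in a]

def softmax(row):
    mx = max(row)
    exps = [math.exp(v - mx) for v in row]
    s = sum(exps)
    return [v / s for v in exps]

def attention(q, k, v, r):
    qkT = matmul(q, transpose(k))   # content-content vector
    qrT = matmul(q, transpose(r))   # content-position vector
    scores = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(qkT, qrT)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)       # output feature z

# with all-zero position codes this reduces to plain self-attention
z = attention([[1.0, 0.0], [0.0, 1.0]],
              [[1.0, 0.0], [0.0, 1.0]],
              [[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 0.0], [0.0, 0.0]])
```

With a non-zero r, the position term shifts the scores before the softmax, which is how the module becomes sensitive to where a feature sits in the map.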
In a preferred embodiment of the present invention, the method of introducing the Shuffle Attention module into the neck portion of the YOLOv5s model is as follows:
attention mechanisms have become key components for improving model detection performance, and two types of attention mechanisms are widely applied to machine vision research, namely a spatial attention mechanism and a channel attention mechanism, and focus on information in spatial and channel dimensions.
The channel attention mechanism is helpful for the model to confirm the characteristic information of the detection target, and the spatial attention mechanism is helpful for the model to acquire the position information of the detection target. While fusing channel attention with spatial attention improves performance, it also increases the number of parameters and computational consumption.
As shown in fig. 4, shuffle attention integrates the characteristic information of two dimensions of the channel and the space while reducing the parameter amount and the calculation consumption required by the attention mechanism, so as to improve the detection precision of the detector.
The dimension of the input feature map is c×h×w. The input feature map is divided into g groups along the channel dimension c, so that the dimension of each group is c/g×h×w;
each group is further split into two branches along the channel dimension, and the dimension of each branch becomes c/2g×h×w;
the two branches pass through a spatial attention module and a channel attention module respectively to generate their own feature maps, helping the model focus on the detection target and on its position information;
after information extraction, the two feature maps are concatenated, so that the dimension returns to c/g×h×w; after features have been extracted from all g groups, they are concatenated again to obtain the output, whose dimension is still c×h×w, the same as the input;
the output reorders the groups through a channel shuffle function to ensure information flow among the different groups.
The spatial attention mechanism and the channel attention mechanism adopted in Shuffle Attention are simple to construct; compared with the SE and CBAM attention mechanisms, accuracy is improved while fewer parameters and a lower computational cost are required. The channel attention branch in the Shuffle Attention module first applies average pooling to the input to obtain a set of channel-wise statistics; after a linear transformation and a sigmoid activation function, the statistics are multiplied element-wise with the original input to obtain an output carrying the target's feature information;
the spatial attention branch in the Shuffle Attention module first applies group normalization to the input to obtain spatially related statistics; after a linear transformation and a sigmoid activation function, the statistics are multiplied element-wise with the original input to obtain an output carrying the target's position information.
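The grouping, dual-branch attention, and channel shuffle described above can be sketched in numpy as follows. This is a minimal sketch: the learnable scale/shift parameters of both branches are omitted, the group normalization is simplified to a per-branch normalization, and the function name is illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def shuffle_attention(x, g=4, eps=1e-5):
    """Shuffle Attention sketch. x: (C, H, W); C must be divisible by 2*g."""
    C, H, W = x.shape
    out_groups = []
    for grp in x.reshape(g, C // g, H, W):       # g groups of shape (C/g, H, W)
        half = grp.shape[0] // 2
        xc, xs = grp[:half], grp[half:]          # two branches: (C/2g, H, W)

        # channel branch: average pool -> sigmoid -> reweight original input
        s = xc.mean(axis=(1, 2), keepdims=True)  # channel-wise statistics
        xc = xc * sigmoid(s)                     # (linear scale/shift omitted)

        # spatial branch: normalize -> sigmoid -> reweight original input
        mu, var = xs.mean(), xs.var()
        gn = (xs - mu) / np.sqrt(var + eps)      # simplified group norm
        xs = xs * sigmoid(gn)

        out_groups.append(np.concatenate([xc, xs], axis=0))  # back to (C/g, H, W)

    out = np.concatenate(out_groups, axis=0)     # (C, H, W), same as the input
    # channel shuffle: interleave the g groups so information flows across groups
    return out.reshape(g, C // g, H, W).transpose(1, 0, 2, 3).reshape(C, H, W)

x = np.random.default_rng(1).normal(size=(16, 8, 8))
y = shuffle_attention(x, g=4)
print(y.shape)  # (16, 8, 8)
```

The final reshape-transpose-reshape is the standard channel shuffle trick: it permutes channels across groups without any learned parameters.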
In a preferred scheme of the invention, a cross-mode image segmentation module acquires color images of a region to be detected and an image map formed by friction force of the region to be detected, the friction force of different position points of the region to be detected is represented by gray values, the two images are adjusted to be of the same resolution, each image is provided with xerosis segmentation label information, and image data pairs are divided into a training set, a verification set and a test set;
the image feature extraction module performs feature extraction on the color image of the region to be detected to obtain single-mode color image features;
the friction force characteristic extraction module performs characteristic extraction on the force diagram of the region to be detected to obtain a single-mode friction force characteristic;
the feature fusion module comprises a first gating module (a ReLU-type function), a second gating module, and a fusion network; the first gating module receives the color image features and passes the portions whose output exceeds the image threshold, the second gating module receives the friction force features and passes the portions whose output exceeds the friction force threshold, and the fusion network fuses the features output by the first and second gating modules, taking the friction force features as one channel of the output image.
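The two-gate fusion can be illustrated with a minimal sketch. The thresholded-ReLU gates and the single-channel stacking below are one plausible reading of the scheme; the function names and threshold values are assumptions, not taken from the patent:

```python
import numpy as np

def gate(features, threshold):
    """Thresholded ReLU-style gate: pass only responses above the threshold."""
    return np.where(features > threshold, features, 0.0)

def fuse(color_feat, friction_feat, img_thr=0.2, fric_thr=0.3):
    """Fuse gated color-image features (C, H, W) with a gated friction map (H, W).

    The friction features are appended as one extra channel of the output.
    """
    g_color = gate(color_feat, img_thr)      # first gating module
    g_fric = gate(friction_feat, fric_thr)   # second gating module
    return np.concatenate([g_color, g_fric[None]], axis=0)

color = np.random.default_rng(2).random((8, 16, 16))
fric = np.random.default_rng(3).random((16, 16))
fused = fuse(color, fric)
print(fused.shape)  # (9, 16, 16)
```

Both modalities must be resampled to the same resolution before fusion, matching the preprocessing step described for the cross-modal image segmentation module.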
In a preferred scheme of the invention, the method for removing the large target detection head of the YOLOv5s model comprises the following steps:
the classical YOLOv5 model comprises three detection heads with 8×, 16×, and 32× down-sampling rates, corresponding to small target detection, medium target detection, and large target detection respectively. Since the sample targets are all small and medium-sized targets, after the 32×-rate large target detection head is removed, the network comprises detection heads with the 8× and 16× sampling rates, corresponding to small target detection and medium target detection respectively.
Removing the large target detection head improves the accuracy of the detector while reducing the parameter count and the amount of computation.
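The head-removal step can be illustrated with a small sketch. The stride values 8/16/32 follow the standard YOLOv5 configuration; the function names are illustrative:

```python
def prune_heads(head_strides, remove=(32,)):
    """Keep only the detection heads whose down-sampling rate is not removed."""
    return [s for s in head_strides if s not in remove]

def grid_cells(img_size, strides):
    """Number of prediction cells per head for a square input image."""
    return {s: (img_size // s) ** 2 for s in strides}

kept = prune_heads([8, 16, 32])  # -> [8, 16]: small + medium target heads
print(kept)
print(grid_cells(640, kept))     # {8: 6400, 16: 1600}
```

Dropping the 32× head removes its entire prediction branch (and the neck layers that feed only it), which is where the large parameter and GFLOPs savings reported below come from.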
The invention also provides a small target detection method based on the improved YOLOv5s algorithm, which comprises the following steps:
acquiring an image of a region to be detected, and preprocessing; the image to be detected is segmented at the highest resolution into pathological image blocks of size 640×640, and 300 images are screened as the experimental data set. The data set is divided into a training set, a validation set, and a test set in a ratio of 8:1:1.
Acquiring a friction force diagram of a region to be detected, and preprocessing;
based on the construction method, the YOLOv5s model is improved, and an optimized YOLOv5s model is obtained;
inputting the preprocessed image into an optimized YOLOv5s model, detecting small targets in the image, and counting the number of the small targets in a single area;
the small target number of each single region is compared with a set threshold; when the threshold is exceeded, the region is screened and labeled for use in diagnosing whether the patient has Sjogren's syndrome.
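The counting-and-screening step can be sketched in plain Python. The detector call itself is left out; the function name, the dictionary layout, and the threshold value below are placeholders for whatever model output and clinical criterion are actually used:

```python
def screen_regions(region_detections, count_threshold):
    """Flag regions whose small-target (lymphocyte) count exceeds the threshold.

    region_detections: dict mapping region id -> list of detected boxes.
    Returns the ids of regions to mark as suspicious foci.
    """
    flagged = []
    for region_id, boxes in region_detections.items():
        if len(boxes) > count_threshold:
            flagged.append(region_id)
    return flagged

# toy example: three 640x640 blocks with detected-box counts 2, 55, and 48
detections = {"block_0": [None] * 2, "block_1": [None] * 55, "block_2": [None] * 48}
print(screen_regions(detections, count_threshold=50))  # ['block_1']
```

The flagged region ids can then be mapped back to their coordinates in the whole-slide image and color-marked for the reviewing doctor.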
For example, the experiment sets the number of training rounds to 100 epochs; the batch size is 5; the input image size is 640×640; Adam is used as the optimization algorithm, with the initial learning rate set to 0.01, the decay factor 0.005, and the momentum parameter 0.937. Pathological images of lip gland biopsies are collected; the WSI (whole-slide image) of each lip gland biopsy is cut at the highest resolution into pathological image blocks of size 640×640, 300 of which are screened as the experimental data set, and lymphocytes in the image blocks are manually labeled with labelimg based on lymphocyte discrimination criteria. The data set is divided into a training set, a validation set, and a test set in a ratio of 8:1:1.
The present invention employs target detection metrics to evaluate the performance of the improved YOLOv5s in the lymphocyte detection task. The primary metric of interest is mAP@0.5. Since there is only one class of target to be detected (lymphocytes), mAP@0.5 reduces to the area under the precision-recall curve at an IOU threshold of 0.5:

mAP@0.5 = ∫₀¹ P(R) dR

wherein P and R respectively represent the precision and the recall rate, satisfying:

P = TP / (TP + FP),  R = TP / (TP + FN)

wherein TP represents positive instances correctly predicted as positive, FP represents negative instances incorrectly predicted as positive, FN represents positive instances incorrectly predicted as negative, and TN represents negative instances correctly predicted as negative.
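Under these definitions the metrics reduce to a few lines of Python (the counts in the usage example are illustrative, not experimental results):

```python
def precision(tp, fp):
    """Fraction of predicted positives that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of actual positives that are found: TP / (TP + FN)."""
    return tp / (tp + fn)

# e.g. 90 correct detections, 10 false alarms, 30 missed lymphocytes
p = precision(tp=90, fp=10)  # 0.9
r = recall(tp=90, fn=30)     # 0.75
print(p, r)
```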
The improved YOLOv5s target detection model improves on the original model in terms of the loss function, feature extraction, and attention mechanism. To evaluate the improvement of the individual modules and the impact of module combinations on detection performance, ablation experiments were designed on the data set herein, using mAP@0.5 as the evaluation index; the experimental results are shown in Table 1.
Table 1 ablation experimental results
The detection accuracy of the original YOLOv5s model on the data set is 87.7%. After the multi-head self-attention module is mounted on the backbone, the model accuracy improves by 1.4%, with the parameter count and GFLOPs slightly reduced. After the Shuffle Attention module is mounted on the model neck, the accuracy improves by 1.5%, with the parameter count and GFLOPs slightly increased. After the large target detection head is removed, the accuracy improves by 1.2%, the parameter count drops sharply, and the GFLOPs decrease noticeably. After replacing the original CIOU with Focal-SIOU, the accuracy improves by 0.5% without changing the parameter count or GFLOPs. With all the improvement strategies fused, the final detection accuracy of the model reaches 91.1%; compared with the original network, the accuracy is improved by 3.4%, the parameter count is reduced by 29.6%, and the GFLOPs are reduced by 10.8%, showing that the improvement strategies adopted herein have a significant effect on lymphocyte detection.
Comparing the inventive network with other networks of similar size, it can be seen that the inventive network has certain advantages in both accuracy and parameter count, as shown in Table 2.
Table 2 model comparison results
Network model    mAP@0.5 /%    Parameter count    GFLOPs
YOLOv7-tiny      85.2          6.01×10^6          13.0
YOLOv7           85.5          9.32×10^6          26.7
RetinaNet        69.6          19.8×10^6          61.5
YOLOv3-SPP       89.9          4.12×10^6          12.0
YOLOv6n          88.3          4.23×10^6          11.8
RT-DETR          88.3          20×10^6            60
Network herein   91.1          4.94×10^6          14.1
The improved YOLOv5s model can fully extract background information and effectively distinguish interfering cells of similar color and shape, such as epithelial cells, ensuring detection accuracy. Based on the improved YOLOv5s model, the WSI of the lip gland biopsy is detected block by block, the lymphocyte count of each single area is tallied, and blocks whose lymphocyte count exceeds a set threshold are color-marked as suspicious foci, providing a reference for doctors and assisting the diagnosis of Sjogren's syndrome.
The invention also provides a small target detection system based on the improved YOLOv5s algorithm, which comprises an image acquisition module, a friction force diagram acquisition module and a processing module, wherein the image acquisition module is used for acquiring an image to be detected, the friction force diagram acquisition module is used for acquiring friction force diagrams of an area to be detected, the output ends of the image acquisition module and the friction force diagram acquisition module are respectively connected with the input end of the processing module, the processing module executes the small target detection method, detects small targets of the image, determines whether the number of the small targets of a single area exceeds a threshold value, and diagnoses whether a patient suffers from Sjogren syndrome. Preferably, the friction force diagram acquisition module is a friction tester.
With this system, images are acquired and analyzed, small target detection results of the images are obtained, and whether the patient suffers from Sjogren's syndrome is diagnosed.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.

Claims (9)

1. The construction method for improving the YOLOv5s model is characterized by comprising the following steps of:
replacing the CIOU loss function of the YOLOv5s model by using the Focal-SIOU loss function; introducing a multi-head self-attention module into a skeleton network part of a YOLOv5s model;
in the neck portion of the YOLOv5s model, introducing a Shuffle Attention module;
introducing a cross-modal image segmentation module after the Shuffle Attention module, wherein the cross-modal image segmentation module comprises an image feature extraction module, a friction force feature extraction module and a feature fusion module;
and removing the large target detection head of the YOLOv5s model to obtain an optimized YOLOv5s model.
2. The method for constructing an improved YOLOv5s model according to claim 1, wherein the method for replacing the CIOU loss function of the YOLOv5s model with the Focal-SIOU loss function is as follows:
assume that C_h and C_w respectively represent the vertical and horizontal distances between the coordinate centers of the predicted frame and the real frame; the straight-line distance σ between the two centers and the included angle α satisfy:

σ = sqrt(C_w² + C_h²),  sin α = C_h / σ

The angle loss Λ is defined from the included angle α as:

Λ = 1 − 2·sin²(α − π/4)

The distance loss Δ of SIOU is:

Δ = (1 − e^(−γ·ρ_x)) + (1 − e^(−γ·ρ_y)),  γ = 2 − Λ

ρ_x = ((b_cx^gt − b_cx) / C_w)²,  ρ_y = ((b_cy^gt − b_cy) / C_h)²

wherein (b_cx, b_cy) and (b_cx^gt, b_cy^gt) respectively represent the center-point coordinates of the predicted frame and the real frame; ρ_x and ρ_y represent the distance loss factors of the real frame and the predicted frame in the width and height directions; γ is the angle loss factor;

the distance loss function is fused with the angle loss: the closer the included angle α is to 45°, the larger the contribution of the angle loss; the closer α is to 0°, the smaller the contribution, and the loss degenerates into a pure distance loss. Here C_w and C_h represent the maximum horizontal and vertical extents spanned by the predicted frame and the real frame together, not the distances between their center points;

the shape loss Ω of SIOU is:

Ω = (1 − e^(−ω_w))^θ + (1 − e^(−ω_h))^θ

ω_w = |w − w^gt| / max(w, w^gt),  ω_h = |h − h^gt| / max(h, h^gt)

wherein θ represents the degree of attention paid to the shape loss, whose value requires corresponding adjustment according to the particular data set; w, h and w^gt, h^gt respectively represent the width and height of the predicted frame and the real frame; ω_w and ω_h represent the shape loss factors in the width and height directions;

after the three losses are fused, the loss function L_SIOU of SIOU is:

L_SIOU = 1 − IOU + (Δ + Ω) / 2

wherein IOU is the intersection-over-union between the real frame and the predicted frame;

Focal loss is fused with SIOU to distinguish high-quality from low-quality anchor boxes, which helps improve regression accuracy; the Focal-SIOU loss function is:

L_Focal-SIOU = IOU^γ · L_SIOU

wherein γ represents the degree of attention paid to the IOU and takes values greater than 0: the larger the value of γ, the more the loss function focuses on the IOU; the closer the value is to 0, the less attention the loss function pays to the IOU, and the loss gradually degenerates to L_SIOU; IOU represents the intersection-over-union between the real frame and the predicted frame; L_SIOU represents the loss function of SIOU.
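The loss above can be sketched numerically as follows. This is a numpy sketch following the SIOU formulation: the (cx, cy, w, h) box format, the θ and γ defaults, and the use of the enclosing-box extents for C_w and C_h are assumptions spelled out in the comments:

```python
import numpy as np

def siou_loss(pred, gt, theta=4.0):
    """SIOU loss for one box pair; boxes are (cx, cy, w, h) tuples (assumption)."""
    px, py, pw, ph = pred
    gx, gy, gw, gh = gt
    # IOU from corner coordinates
    ix = max(0.0, min(px + pw / 2, gx + gw / 2) - max(px - pw / 2, gx - gw / 2))
    iy = max(0.0, min(py + ph / 2, gy + gh / 2) - max(py - ph / 2, gy - gh / 2))
    inter = ix * iy
    iou = inter / (pw * ph + gw * gh - inter)
    # angle loss: sin(alpha) = C_h / sigma
    ch, cw = abs(gy - py), abs(gx - px)
    sigma = np.hypot(cw, ch) + 1e-9
    alpha = np.arcsin(min(ch / sigma, 1.0))
    lam = 1 - 2 * np.sin(alpha - np.pi / 4) ** 2
    # distance loss, normalized by the enclosing-box extents (assumption)
    Cw = max(px + pw / 2, gx + gw / 2) - min(px - pw / 2, gx - gw / 2)
    Ch = max(py + ph / 2, gy + gh / 2) - min(py - ph / 2, gy - gh / 2)
    gamma = 2 - lam
    delta = sum(1 - np.exp(-gamma * r) for r in ((cw / Cw) ** 2, (ch / Ch) ** 2))
    # shape loss
    ww = abs(pw - gw) / max(pw, gw)
    wh = abs(ph - gh) / max(ph, gh)
    omega = (1 - np.exp(-ww)) ** theta + (1 - np.exp(-wh)) ** theta
    return 1 - iou + (delta + omega) / 2

def focal_siou_loss(pred, gt, gamma=0.5):
    """Focal-SIOU: reweight SIOU by IOU**gamma to emphasise high-quality anchors."""
    px, py, pw, ph = pred
    gx, gy, gw, gh = gt
    ix = max(0.0, min(px + pw / 2, gx + gw / 2) - max(px - pw / 2, gx - gw / 2))
    iy = max(0.0, min(py + ph / 2, gy + gh / 2) - max(py - ph / 2, gy - gh / 2))
    inter = ix * iy
    iou = inter / (pw * ph + gw * gh - inter)
    return iou ** gamma * siou_loss(pred, gt)

box = (10.0, 10.0, 4.0, 4.0)
print(round(siou_loss(box, box), 6))  # 0.0 for a perfect match
```

A perfectly matched pair yields zero loss, while any center offset, angle, or shape mismatch increases it, which is the behavior the claim describes.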
3. The method for constructing an improved YOLOv5s model according to claim 1, wherein the method for introducing a multi-headed self-attention module into a skeleton network part of the YOLOv5s model comprises the following steps:
performing structural adjustment on the C3 module of the original YOLOv5s network and integrating a multi-head self-attention layer into it;
adding position encodings to the multi-head self-attention layer to make it position-sensitive;
defining the number of heads in the multi-head self-attention layer to balance accuracy and computation; the input first generates a query vector q, a key vector k, and a value vector v through pointwise convolution, while R_h and R_w respectively denote position encodings extracted along the height and the width;
the position encodings are added element-wise to generate a position vector r; r is matrix-multiplied with q to generate the content-position term qr^T, and q is matrix-multiplied with k to generate the content-content term qk^T;
qr^T and qk^T are added element-wise, passed through a softmax layer, and matrix-multiplied with v to finally obtain the output feature z of the multi-head self-attention layer.
4. The method of constructing an improved YOLOv5s model according to claim 1, wherein the method of introducing the Shuffle Attention module in the neck portion of the YOLOv5s model is as follows:
the dimension of the input feature map is c×h×w; the input feature map is divided into g groups along the channel dimension c, so that the dimension of each group is c/g×h×w;
each group is further split into two branches along the channel dimension, and the dimension of each branch becomes c/2g×h×w;
the two branches pass through a spatial attention module and a channel attention module respectively to generate their own feature maps, helping the model focus on the detection target and on its position information;
after information extraction, the two feature maps are concatenated, so that the dimension returns to c/g×h×w; after features have been extracted from all g groups, they are concatenated again to obtain the output, whose dimension is still c×h×w, the same as the input;
the output reorders the groups through a channel shuffle function to ensure information flow among the different groups;
the channel attention mechanism in the Shuffle Attention module first applies average pooling to the input to obtain a set of channel-wise statistics; after a linear transformation and a sigmoid activation function, the statistics are multiplied element-wise with the original input to obtain an output carrying the target's feature information;
the spatial attention mechanism in the Shuffle Attention module first applies group normalization to the input to obtain spatially related statistics; after a linear transformation and a sigmoid activation function, the statistics are multiplied element-wise with the original input to obtain an output carrying the target's position information.
5. The method for constructing the improved YOLOv5s model according to claim 1, wherein the cross-mode image segmentation module acquires color images of a region to be detected and force patterns formed by friction force of the region to be detected, the force patterns represent friction force of different position points of the region to be detected by gray values, the two images are adjusted to be of the same resolution, each image is provided with xerosis segmentation label information, and image data pairs are divided into a training set, a verification set and a test set;
the image feature extraction module performs feature extraction on the color image of the region to be detected to obtain single-mode color image features;
the friction force characteristic extraction module performs characteristic extraction on the force diagram of the region to be detected to obtain a single-mode friction force characteristic;
the feature fusion module comprises a first gating module, a second gating module, and a fusion network; the first gating module receives the color image features and passes the portions whose output exceeds the image threshold, the second gating module receives the friction force features and passes the portions whose output exceeds the friction force threshold, and the fusion network fuses the features output by the first and second gating modules, taking the friction force features as one channel of the output image.
6. The method for constructing an improved YOLOv5s model according to claim 1, wherein the method for removing the large target detection head of the YOLOv5s model is as follows:
the classical YOLOv5 model comprises three detection heads with 8×, 16×, and 32× down-sampling rates; since the sample targets are all small and medium-sized targets, the 32×-rate large target detection head is removed, after which the network comprises detection heads with the 8× and 16× sampling rates, corresponding to small target detection and medium target detection respectively.
7. The small target detection method based on the improved YOLOv5s algorithm is characterized by comprising the following steps of:
acquiring an image of a region to be detected, and preprocessing;
acquiring a friction force diagram of a region to be detected, and preprocessing; based on the construction method of one of claims 1 to 6, improving the YOLOv5s model to obtain an optimized YOLOv5s model;
inputting the preprocessed image into an optimized YOLOv5s model, detecting small targets in the image, and counting the number of the small targets in a single area;
the small target number of the single region is compared with a set threshold value, and when the threshold value is exceeded, the region is screened and marked.
8. A small target detection system based on the improved YOLOv5s algorithm, characterized by comprising an image acquisition module, a friction force diagram acquisition module and a processing module, wherein the image acquisition module is used for acquiring an image to be detected, the friction force diagram acquisition module is used for acquiring a friction force diagram of the region to be detected, the output ends of the image acquisition module and the friction force diagram acquisition module are respectively connected with the input end of the processing module, and the processing module executes the method of claim 7, detects small targets in the image, determines whether the number of small targets in a single region exceeds a threshold, and diagnoses whether a patient has Sjogren's syndrome.
9. The small target detection system based on the modified YOLOv5s algorithm of claim 8, wherein the friction force map acquisition module is a friction tester.
CN202311651968.6A 2023-12-05 2023-12-05 Construction method of improved YOLOv5s model, small target detection method and system Pending CN117475434A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311651968.6A CN117475434A (en) 2023-12-05 2023-12-05 Construction method of improved YOLOv5s model, small target detection method and system

Publications (1)

Publication Number Publication Date
CN117475434A true CN117475434A (en) 2024-01-30

Family

ID=89633087



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination