CN117475434A - Construction method of improved YOLOv5s model, small target detection method and system - Google Patents
Construction method of improved YOLOv5s model, small target detection method and system
- Publication number: CN117475434A (application CN202311651968.6A)
- Authority: CN (China)
- Prior art keywords: yolov5s, module, attention, model, image
- Legal status: Pending (an assumption, not a legal conclusion)
Classifications
- G06V20/695 — Microscopic objects, e.g. biological cells or cellular parts: preprocessing, e.g. image segmentation
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06V10/10 — Image acquisition
- G06V10/26 — Segmentation of patterns in the image field
- G06V10/454 — Biologically inspired filters integrated into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/806 — Fusion of extracted features
- G06V10/82 — Image or video recognition using neural networks
- G06V2201/07 — Target detection
Abstract
The invention belongs to the technical field of image processing and discloses a construction method for an improved YOLOv5s model, together with a small target detection method and system. The small target detection method comprises the following steps: acquiring an image of the region to be detected and preprocessing it; acquiring a friction force map of the region to be detected and preprocessing it; improving the YOLOv5s model to obtain an optimized YOLOv5s model; inputting the preprocessed image into the optimized YOLOv5s model, detecting small targets in the image, and counting the number of small targets in each single region; comparing the small target count of each region with a set threshold value and, when the threshold is exceeded, screening and marking that region. With this technical scheme, the accuracy of the small target detection task is improved by improving the YOLOv5s model, and the method is used to assist the pathological diagnosis of labial gland biopsy for Sjogren's syndrome.
Description
Technical Field
The invention belongs to the technical field of image processing, and relates to a construction method for an improved YOLOv5s model, a small target detection method, and a system.
Background
Sjogren's syndrome (SS) is a chronic inflammatory autoimmune disease characterized by lymphocyte proliferation and progressive damage to the exocrine glands. While it mainly affects the salivary and lacrimal glands, it can also involve multiple organ systems including the lung, kidney, skin and blood, and is often accompanied by other systemic immune diseases such as rheumatoid arthritis (RA) and systemic lupus erythematosus (SLE).
At present, the etiology of Sjogren's syndrome is not clear and may involve heredity, viral infection, sex hormone levels and other factors. Notably, Sjogren's syndrome is not uncommon; however, many patients have limited knowledge of it, resulting in delayed medical visits and worsened conditions, whereas most cases can be well controlled, with improved prognosis, if diagnosed early and treated systematically.
In the diagnosis of Sjogren's syndrome, a pathologist must review each pathological section one by one under different magnifications, a tedious and time-consuming process; and because of the subjective heterogeneity among pathologists of different levels, missed diagnoses and misdiagnoses occur frequently, making accurate and efficient pathological diagnosis a great challenge in the pathologist's work. In the big-data age, artificial intelligence has been widely applied to medical image aided diagnosis, and with the rapid development of digital pathology technology, AI-assisted pathology is gradually coming to the fore. In various tumors such as lung cancer and breast cancer, AI-assisted pathological diagnosis is already efficient, stable and highly repeatable, with a level comparable to that of a professional doctor, yet no research on the pathological diagnosis of labial gland biopsy for Sjogren's syndrome has been reported.
Disclosure of Invention
The invention aims to provide a construction method for an improved YOLOv5s model, a small target detection method, and a system, which improve the accuracy of the small target detection task and assist the pathological diagnosis of labial gland biopsy for Sjogren's syndrome.
In order to achieve the above purpose, the basic scheme of the invention is as follows: a construction method for the improved YOLOv5s model, comprising the following steps:
replacing the CIOU loss function of the YOLOv5s model by using the Focal-SIOU loss function;
introducing a multi-head self-attention module into the backbone network part of the YOLOv5s model;
in the neck portion of the YOLOv5s model, introducing a Shuffle Attention module;
introducing a cross-modal image segmentation module after the Shuffle Attention module, wherein the cross-modal image segmentation module comprises an image feature extraction module, a friction force feature extraction module and a feature fusion module;
and removing the large target detection head of the YOLOv5s model to obtain an optimized YOLOv5s model.
The working principle and beneficial effects of this basic scheme are as follows: because lymphocytes are small in volume and difficult to distinguish, this technical scheme improves YOLOv5s to raise the accuracy of the lymphocyte detection task, so as to accurately detect lymphocytes and assist pathological diagnosis.
Further, the method for replacing the CIOU loss function of the YOLOv5s model with the Focal-SIOU loss function is as follows:
Let α (α ≤ 45°) be the angle between the line joining the coordinate centers of the prediction frame and the real frame and the horizontal direction, and let C_h and C_w respectively denote the vertical and horizontal distances between the two coordinate centers. The straight-line distance σ between the centers and the angle α are:
σ = sqrt(C_w² + C_h²),  α = arcsin(C_h / σ)
The angle loss Λ is defined from the angle α as:
Λ = 1 − 2·sin²(arcsin(C_h / σ) − π/4)
The distance loss Δ of SIOU is:
Δ = Σ_{t∈{x,y}} (1 − e^(−γ·ρ_t)),  ρ_x = ((b_cx^gt − b_cx) / c_w)²,  ρ_y = ((b_cy^gt − b_cy) / c_h)²,  γ = 2 − Λ
where (b_cx, b_cy) and (b_cx^gt, b_cy^gt) respectively represent the center-point coordinates of the prediction frame and the real frame; ρ_x and ρ_y represent the distance loss factors of the real frame and the prediction frame in the width and height directions; γ is the angle loss factor.
The distance loss function thus integrates the angle loss: the closer the angle α is to 45°, the larger the contribution of the angle loss; the closer α is to 0°, the smaller its contribution, and the loss degenerates into a pure distance loss. Here c_w and c_h represent the width and height of the smallest box enclosing the prediction frame and the real frame, not the distances between their center points.
The shape loss Ω of SIOU is:
Ω = Σ_{t∈{w,h}} (1 − e^(−ω_t))^θ,  ω_w = |w − w^gt| / max(w, w^gt),  ω_h = |h − h^gt| / max(h, h^gt)
where θ represents the degree of attention paid to the shape loss, whose value must be adjusted for the specific data set; w, h and w^gt, h^gt represent the width and height of the prediction frame and the real frame, respectively; ω_w and ω_h are the shape loss factors in the width and height directions.
After the three losses are fused, the regression-frame loss function L_SIOU of SIOU is:
L_SIOU = 1 − IOU + (Δ + Ω) / 2
where IOU is the intersection-over-union between the real frame and the prediction frame.
By taking the angle loss into account in this way, SIOU improves model performance.
When predicting the bounding-box regression of objects, the process is affected by training-sample imbalance: in an image, high-quality anchor boxes with small regression errors are far fewer than low-quality anchor boxes with large errors, and the poor-quality anchor boxes produce excessive gradients that negatively impact training. To address this problem, Focal loss is integrated with SIOU to distinguish high-quality from low-quality anchor boxes, which helps improve regression accuracy. The Focal-SIOU loss function is:
L_Focal-SIOU = IOU^γ · L_SIOU
where γ represents the degree of attention paid to the IOU, with a value range greater than 0: the larger γ is, the more the loss function focuses on the IOU; the closer γ is to 0, the less it focuses on the IOU, and the loss gradually degenerates to L_SIOU. IOU represents the intersection-over-union between the real frame and the prediction frame, and L_SIOU is the SIOU loss function defined above.
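Under the definitions above, the SIOU and Focal-SIOU losses can be sketched in plain Python for a single box pair. This is an illustrative sketch only; function and variable names are not taken from the patent's implementation.

```python
import math

def siou_loss(pred, gt, theta=4.0):
    """SIoU regression loss for one (cx, cy, w, h) box pair, as described above."""
    px, py, pw, ph = pred
    gx, gy, gw, gh = gt

    # IoU from corner coordinates
    p_x1, p_y1, p_x2, p_y2 = px - pw/2, py - ph/2, px + pw/2, py + ph/2
    g_x1, g_y1, g_x2, g_y2 = gx - gw/2, gy - gh/2, gx + gw/2, gy + gh/2
    iw = max(0.0, min(p_x2, g_x2) - max(p_x1, g_x1))
    ih = max(0.0, min(p_y2, g_y2) - max(p_y1, g_y1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter
    iou = inter / union if union > 0 else 0.0

    # angle loss: Lambda = 1 - 2*sin^2(arcsin(C_h/sigma) - pi/4)
    c_w_c, c_h_c = abs(gx - px), abs(gy - py)   # center offsets
    sigma = math.hypot(c_w_c, c_h_c) + 1e-9     # center distance
    Lam = 1 - 2 * math.sin(math.asin(min(c_h_c / sigma, 1.0)) - math.pi / 4) ** 2

    # distance loss over the smallest enclosing box, with gamma = 2 - Lambda
    cw = max(p_x2, g_x2) - min(p_x1, g_x1)      # enclosing-box width
    ch = max(p_y2, g_y2) - min(p_y1, g_y1)      # enclosing-box height
    rho_x, rho_y = ((gx - px) / cw) ** 2, ((gy - py) / ch) ** 2
    gamma = 2 - Lam
    delta = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))

    # shape loss with attention parameter theta (patent embodiment uses 4)
    omega_w = abs(pw - gw) / max(pw, gw)
    omega_h = abs(ph - gh) / max(ph, gh)
    Omega = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    return 1 - iou + (delta + Omega) / 2, iou

def focal_siou_loss(pred, gt, gamma_f=0.5, theta=4.0):
    """Focal weighting of the SIoU loss: L = IoU^gamma * L_SIoU."""
    l_siou, iou = siou_loss(pred, gt, theta)
    return (iou ** gamma_f) * l_siou
```

For identical boxes the loss is zero, and it grows as the prediction drifts in position, angle, or shape.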
Further, the method for introducing the multi-head self-attention module into the backbone module of the YOLOv5s model is as follows:
the C3 module of the original YOLOv5s network is structurally adjusted, and a multi-head self-attention layer is integrated into it;
the multi-head self-attention layer adds position encodings so that it is sensitive to position;
the number of heads in the multi-head self-attention layer is defined; for the input feature map, a query vector q, a key vector k, and a value vector v are first generated by pointwise convolution, while R_h and R_w represent the position codes extracted along the height and the width, respectively;
the two position codes are added element-wise to generate a position vector r; matrix multiplication of q with r yields the content-position vector qr^T, and matrix multiplication of q with k yields the content-content vector qk^T;
qr^T and qk^T are added element-wise, passed through a softmax layer, and then matrix-multiplied with v to finally obtain the output feature z.
The multi-head self-attention module is simple to build, slightly reduces the parameter count, and significantly improves detection accuracy.
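The data flow described above can be sketched in NumPy. The random projection matrices and position codes here stand in for learned 1×1-convolution weights, so this illustrates only the shape handling and the qk^T + qr^T attention arithmetic, not a trained layer.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mhsa_2d(x, heads=4, seed=0):
    """One forward pass of a position-aware multi-head self-attention layer.

    x: (c, h, w) feature map. Names (Wq, R_h, R_w, ...) follow the text above
    but the weights are random placeholders -- an illustrative sketch only."""
    c, h, w = x.shape
    d = c // heads
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv = (rng.standard_normal((c, c)) * 0.1 for _ in range(3))
    R_h = rng.standard_normal((h, 1, d)) * 0.1   # height position code
    R_w = rng.standard_normal((1, w, d)) * 0.1   # width position code

    tokens = x.reshape(c, h * w).T               # (hw, c) token matrix
    q = (tokens @ Wq).reshape(h * w, heads, d)
    k = (tokens @ Wk).reshape(h * w, heads, d)
    v = (tokens @ Wv).reshape(h * w, heads, d)
    r = (R_h + R_w).reshape(h * w, d)            # element-wise add -> position vector r

    out = np.empty_like(q)
    for hd in range(heads):
        content = q[:, hd] @ k[:, hd].T          # qk^T : content-content
        position = q[:, hd] @ r.T                # qr^T : content-position
        attn = softmax((content + position) / np.sqrt(d))
        out[:, hd] = attn @ v[:, hd]             # weighted sum of values
    return out.reshape(h * w, c).T.reshape(c, h, w)
```

The output z keeps the input's (c, h, w) shape, so the layer drops into a C3 block without changing the surrounding tensor dimensions.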
Further, the cross-modal image segmentation module acquires a color image of the region to be detected and an image map formed from the friction force of the same region, in which gray values represent the friction force at different position points. The two images are adjusted to the same resolution, each image carries Sjogren's syndrome (dryness) segmentation label information, and the image data pairs are divided into a training set, a verification set and a test set;
the image feature extraction module performs feature extraction on the color image of the region to be detected to obtain single-mode color image features;
the friction force feature extraction module performs feature extraction on the friction force map of the region to be detected to obtain single-mode friction force features;
the feature fusion module comprises a first gating module (Relu function), a second gating module and a fusion network, wherein the first gating module acquires color image features and processes the portions with output larger than an image threshold, the second gating module acquires friction features and processes the portions with output larger than the friction threshold, and the fusion network fuses the features output by the first gating module and the second gating module, and takes the friction features as a channel for outputting images.
Further, in the neck portion of the YOLOv5s model, the method for introducing the Shuffle Attention module is as follows:
the input feature map, of dimension c×h×w, is divided into g groups along the channel dimension c, so that each group has dimension (c/g)×h×w;
each group is split again into two branches along the channel dimension, each branch having dimension (c/2g)×h×w;
the two branches pass through a spatial attention module and a channel attention module respectively, generating feature maps that help the model focus on the detection target and its position information;
after this information is extracted, the two feature maps are spliced, restoring the dimension to (c/g)×h×w; once features have been extracted from all g groups, they are spliced again, so the output dimension is still c×h×w, the same as the input;
the output then reorders the groups through a channel shuffle function, ensuring information flow between different groups;
the channel attention mechanism in the Shuffle Attention module first applies average pooling to the input to obtain a set of channel-related statistics; these statistics undergo a linear transformation and a sigmoid activation function, and are then multiplied element-wise with the original input to produce an output that captures position information;
the spatial attention mechanism in the Shuffle Attention module first applies group normalization to the input to obtain spatially related statistics; these statistics undergo a linear transformation and a sigmoid activation function, and are then multiplied element-wise with the original input to produce an output that captures target information.
The Shuffle Attention mechanism reduces the parameter count and computational cost while integrating feature information from both the channel and spatial dimensions, thereby improving the detection accuracy of the detector.
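The grouping, two-branch attention, and channel shuffle described above can be traced in NumPy. The learned scale/shift parameters are fixed to identity here, so this shows the data flow only, not a trained module.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def shuffle_attention(x, g=2):
    """Structural sketch of one Shuffle Attention forward pass.

    x: (c, h, w) feature map with c divisible by 2*g. Linear transforms are
    identity placeholders -- an illustrative sketch of the block's shape logic."""
    c, h, w = x.shape
    groups = x.reshape(g, c // g, h, w)          # split into g channel groups
    out = []
    for grp in groups:
        half = grp.shape[0] // 2
        b_ch, b_sp = grp[:half], grp[half:]      # two branches, c/(2g) channels each

        # channel attention: global average pool -> sigmoid gate
        s = b_ch.mean(axis=(1, 2), keepdims=True)
        b_ch = b_ch * sigmoid(s)

        # spatial attention: group-norm the branch -> sigmoid gate
        mu, var = b_sp.mean(), b_sp.var()
        b_sp = b_sp * sigmoid((b_sp - mu) / np.sqrt(var + 1e-5))

        out.append(np.concatenate([b_ch, b_sp], axis=0))  # back to c/g channels
    y = np.concatenate(out, axis=0)              # back to (c, h, w)

    # channel shuffle: interleave the g groups so information flows between them
    return y.reshape(g, c // g, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)
```

The input and output dimensions match, so the block can be dropped into the neck between existing layers.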
Further, the method for removing the large target detection head of the YOLOv5s model is as follows:
since the sample targets are all small and medium-sized targets, the large target detection head (the head with the largest downsampling rate, 32× in YOLOv5s) is removed; the network then retains the detection heads at the two remaining sampling rates (8× and 16×), corresponding to small target detection and medium target detection respectively.
Removing the large target detection head improves the accuracy of the detector while reducing the parameter count and the amount of computation.
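Assuming the detection heads are identified by their downsampling strides (8/16/32 in standard YOLOv5s, with the 32× head serving large targets), the head-removal step can be sketched as:

```python
def prune_detection_heads(strides, max_stride=16):
    """Keep only the detection heads whose downsampling stride does not exceed
    max_stride; dropping the 32x head leaves the small- and medium-target heads.
    Stride values and the cutoff are illustrative assumptions."""
    return [s for s in strides if s <= max_stride]
```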
The invention also provides a small target detection method based on the improved YOLOv5s algorithm, which comprises the following steps:
acquiring an image of a region to be detected, and preprocessing;
acquiring a friction force diagram of a region to be detected, and preprocessing;
based on the construction method, the YOLOv5s model is improved, and an optimized YOLOv5s model is obtained;
inputting the preprocessed image into an optimized YOLOv5s model, detecting small targets in the image, and counting the number of the small targets in a single area;
the number of small targets in each single region is compared with a set threshold value; when the threshold is exceeded, the region is screened and marked.
This method uses the improved YOLOv5s model for small target detection, so as to judge whether a patient suffers from Sjogren's syndrome and realize auxiliary diagnosis.
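The counting-and-screening step above can be sketched as follows; the detection record format, region identifiers, and threshold value are illustrative assumptions, not specified by the patent.

```python
from collections import Counter

def screen_regions(detections, threshold):
    """Count detected small targets (e.g. lymphocytes) per region and flag
    regions whose count exceeds the threshold.

    detections: iterable of (region_id, confidence) pairs, one per detected box.
    Returns (per-region counts, sorted list of flagged region ids)."""
    counts = Counter(region for region, _ in detections)
    flagged = sorted(r for r, n in counts.items() if n > threshold)
    return counts, flagged
```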
The invention also provides a small target detection system based on the improved YOLOv5s algorithm, comprising an image acquisition module, a friction force map acquisition module, and a processing module. The image acquisition module acquires the image to be detected, and the friction force map acquisition module acquires the friction force map of the region to be detected; the output ends of both are connected to the input end of the processing module. The processing module executes the small target detection method described above, detects the small targets in the image, determines whether the number of small targets in a single region exceeds the threshold, and diagnoses whether the patient suffers from Sjogren's syndrome.
With this system, images are acquired and analyzed, small target detection results are obtained, and whether a patient suffers from Sjogren's syndrome is diagnosed accordingly.
Further, the friction force diagram acquisition module is a friction tester.
The device is easy to obtain and convenient to use.
Drawings
FIG. 1 is a schematic diagram of the construction method of the improved YOLOv5s model of the present invention;
FIG. 2 is a schematic diagram of SIOU angle loss calculation for an improved construction method of the YOLOv5s model of the present invention;
FIG. 3 is a detailed schematic diagram of the self-attention layer of the multi-head self-attention module of the present invention for improving the construction method of the YOLOv5s model;
FIG. 4 is a schematic structural diagram of the Shuffle Attention module of the construction method of the improved YOLOv5s model of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
In the description of the present invention, it should be understood that the terms "longitudinal," "transverse," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on the orientation or positional relationships shown in the drawings, merely to facilitate describing the present invention and simplify the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and therefore should not be construed as limiting the present invention.
In the description of the present invention, unless otherwise expressly specified and limited, it should be noted that the terms "mounted," "connected," and "coupled" are to be construed broadly: a connection may, for example, be mechanical or electrical, direct or indirect through an intermediary, or an internal communication between two elements; the specific meanings of these terms in the present invention can be understood by those skilled in the art according to the specific circumstances.
The invention discloses a construction method for an improved YOLOv5s model (YOLOv5 is a single-stage target detection algorithm), which is used for detecting lymphocyte infiltration lesions in pathological images and assisting pathological diagnosis. Because lymphocytes are small in volume and difficult to distinguish, the invention improves YOLOv5s to raise the accuracy of the lymphocyte detection task. As shown in FIG. 1, the construction method includes the following steps:
replacing the CIOU loss function of the YOLOv5s model by using the Focal-SIOU loss function, accelerating network convergence, and improving model precision;
a multi-head self-attention module (MHSA) is introduced into the backbone part of the YOLOv5s model, helping the network capture more long-range dependencies and cope with the challenge of complex backgrounds;
in the neck portion of the YOLOv5s model, a Shuffle Attention (SA) module is introduced, enhancing the model's ability to fuse spatial- and channel-dimension features;
a cross-modal image segmentation module is introduced after the Shuffle Attention module, comprising an image feature extraction module, a friction force feature extraction module, and a feature fusion module;
and removing the large target detection head of the YOLOv5s model to obtain an optimized YOLOv5s model. And the corresponding detection heads are removed, so that the precision is improved, and meanwhile, the number of parameters and the complexity of a model are reduced.
In a preferred embodiment of the present invention, the method for replacing the CIOU loss function of the YOLOv5s model with the Focal-SIOU loss function is as follows:
in computer vision tasks, the efficiency of object detection is highly dependent on the definition of the loss function. Conventional object detection loss functions focus on several indicators of bounding box regression, including distance, overlap region, and aspect ratio, whereas conventional iou strategies do not take into account orientation information of real and predicted boxes. The prediction frame swings around in the training process, so that model training is slow, fitting is poor, and finally detection performance of the model is affected. The SIOU takes into account the angle loss, solving the above-mentioned problem. The loss function of the SIOU mainly consists of four parts, angle loss, distance loss, shape loss and IOU loss.
As shown in FIG. 2, let C be the angle (less than or equal to 45 degrees) between the predicted frame and the center of the real frame coordinates h 、C w Respectively representing the horizontal distance and the vertical distance between the coordinate centers of the prediction frame and the real frame, and the linear distance sigma and the included angle alpha between the coordinate centers of the prediction frame and the real frame are as follows:
the angle loss Λ is defined from the included angle α as:

Λ = 1 − 2·sin²(α − π/4)
the distance loss Δ of SIOU is:

Δ = Σ_{t∈{x,y}} (1 − e^(−γ·ρ_t)), γ = 2 − Λ

wherein (b_cx, b_cy) and (b_cx^gt, b_cy^gt) respectively denote the center-point coordinates of the predicted frame and the real frame; ρ_x = ((b_cx^gt − b_cx)/C_w)² and ρ_y = ((b_cy^gt − b_cy)/C_h)² are the distance loss factors in the width and height directions; γ is the angle loss factor.
The distance loss function integrates the angle loss: the closer the included angle α is to 45 degrees, the larger the contribution of the angle loss; the closer α is to 0 degrees, the smaller its contribution, and the term degenerates into a pure distance loss. Here C_w and C_h denote the maximum horizontal and vertical distances between the predicted frame and the real frame (the width and height of their smallest enclosing box), not the distances between their center points;
the shape loss Ω of SIOU is:

Ω = Σ_{t∈{w,h}} (1 − e^(−ω_t))^θ, ω_w = |w − w^gt| / max(w, w^gt), ω_h = |h − h^gt| / max(h, h^gt)

wherein θ expresses the degree of attention paid to the shape loss; to achieve more balanced training, its value should be tuned to the specific dataset, within the range [2, 6] (this embodiment sets it to 4). The smaller the value, the more weight the shape loss receives, biasing the model toward adjusting the shape of the predicted frame during training and suppressing the feedback of the other losses; w, h and w^gt, h^gt denote the width and height of the predicted frame and the real frame respectively; ω_w and ω_h are the shape loss variables in the width and height directions.
After the three losses are fused, the regression loss function L_SIOU of SIOU is:

L_SIOU = 1 − IOU + (Δ + Ω) / 2

where IOU is the intersection over union between the real frame and the predicted frame.
Bounding box regression is affected by the imbalance of training samples: in an image, high quality anchor boxes with small regression errors are far fewer than low quality anchor boxes with large errors, and the low quality boxes produce excessively large gradients that harm the training process. To address this, Focal loss is integrated with SIOU to distinguish high quality from low quality anchor boxes, which helps improve regression accuracy. The Focal-SIOU loss function is shown below:
L_Focal-SIOU = IOU^γ · L_SIOU
wherein γ expresses the degree of attention paid to the IOU; its value is greater than 0. The larger γ, the more the loss function attends to the IOU; the closer γ is to 0, the less it attends to the IOU, and the loss gradually degenerates to L_SIOU. IOU denotes the intersection over union between the real frame and the predicted frame; L_SIOU denotes the SIOU loss function.
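To make the composition above concrete, the SIOU terms and the focal weighting can be sketched for a single pair of axis-aligned boxes in (cx, cy, w, h) form. This is an illustrative NumPy sketch, not the implementation of the embodiment; the function names are ours, and the focal exponent is written gamma_f to keep it distinct from the angle loss factor γ = 2 − Λ inside the distance loss.

```python
import numpy as np

def siou_components(pred, gt, theta=4.0, eps=1e-7):
    """Return (IOU, distance loss, shape loss) for one (cx, cy, w, h) pair."""
    (px, py, pw, ph), (gx, gy, gw, gh) = pred, gt
    # Angle loss: Lambda = 1 - 2*sin^2(alpha - pi/4), with sin(alpha) = C_h / sigma
    c_h, c_w = abs(gy - py), abs(gx - px)
    sigma = np.hypot(c_w, c_h) + eps
    alpha = np.arcsin(np.clip(c_h / sigma, 0.0, 1.0))
    angle = 1.0 - 2.0 * np.sin(alpha - np.pi / 4) ** 2
    # Distance loss over the smallest enclosing box (width cw, height ch)
    cw = max(px + pw / 2, gx + gw / 2) - min(px - pw / 2, gx - gw / 2)
    ch = max(py + ph / 2, gy + gh / 2) - min(py - ph / 2, gy - gh / 2)
    gamma = 2.0 - angle                     # angle loss factor
    rho_x = ((gx - px) / (cw + eps)) ** 2
    rho_y = ((gy - py) / (ch + eps)) ** 2
    dist = (1 - np.exp(-gamma * rho_x)) + (1 - np.exp(-gamma * rho_y))
    # Shape loss with attention degree theta
    om_w = abs(pw - gw) / max(pw, gw)
    om_h = abs(ph - gh) / max(ph, gh)
    shape = (1 - np.exp(-om_w)) ** theta + (1 - np.exp(-om_h)) ** theta
    # Plain IOU of the two boxes
    ix = max(0.0, min(px + pw / 2, gx + gw / 2) - max(px - pw / 2, gx - gw / 2))
    iy = max(0.0, min(py + ph / 2, gy + gh / 2) - max(py - ph / 2, gy - gh / 2))
    inter = ix * iy
    iou = inter / (pw * ph + gw * gh - inter + eps)
    return iou, dist, shape

def focal_siou_loss(pred, gt, gamma_f=0.5, theta=4.0):
    """L_Focal-SIOU = IOU^gamma_f * L_SIOU, with L_SIOU = 1 - IOU + (delta + omega)/2."""
    iou, dist, shape = siou_components(pred, gt, theta)
    return iou ** gamma_f * (1 - iou + (dist + shape) / 2)
```

With the IOU^γ weight, low-overlap (low quality) anchors contribute less, concentrating the regression on higher quality boxes.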
In a preferred embodiment of the present invention, the method for introducing the multi-headed self-attention module into the skeleton portion of the YOLOv5s model comprises:
as shown in FIG. 3, the multi-head self-attention module is a simple yet powerful self-attention module suited to a variety of machine vision tasks, including image classification, object detection and instance segmentation.
The multi-head self-attention layer adds position codes to make it position-sensitive; the network thus attends to feature content while remaining sensitive to the relative positions among features, combining both kinds of information efficiently.
The number of heads in the multi-head self-attention layer is defined first (e.g. 4 heads, adjusted to the specific application scenario). The input generates a query vector q, a key vector k and a value vector v by pointwise convolution; R_h and R_w denote the position codes extracted along the height and the width;
after element-wise addition of the two position codes, a position vector r is produced; r is matrix-multiplied with q to give the content-position term qr^T, and q is matrix-multiplied with k to give the content-content term qk^T;
qr^T and qk^T are added element-wise, passed through a softmax layer, and matrix-multiplied with v, finally yielding the output feature z (the feature extracted by the multi-head self-attention mechanism).
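The flow just described (pointwise convolutions producing q, k and v, additive height/width position codes, and the qr^T + qk^T attention logits) can be sketched as a PyTorch module. This is a minimal BoTNet-style sketch under our own naming, with learnable R_h and R_w assumed and the head count left as a parameter; it is not the exact module of the embodiment.

```python
import torch
import torch.nn as nn

class MHSA2d(nn.Module):
    """Multi-head self-attention over a feature map, with 2D position codes."""
    def __init__(self, dim, h, w, heads=4):
        super().__init__()
        self.heads, self.dh = heads, dim // heads
        # Pointwise (1x1) convolutions generate q, k, v
        self.q = nn.Conv2d(dim, dim, 1)
        self.k = nn.Conv2d(dim, dim, 1)
        self.v = nn.Conv2d(dim, dim, 1)
        # Position codes R_h, R_w along height and width (assumed learnable)
        self.r_h = nn.Parameter(torch.randn(1, heads, self.dh, h, 1))
        self.r_w = nn.Parameter(torch.randn(1, heads, self.dh, 1, w))

    def forward(self, x):
        b, c, h, w = x.shape
        def split(t):                       # (b, c, h, w) -> (b, heads, dh, h*w)
            return t.view(b, self.heads, self.dh, h * w)
        q, k, v = split(self.q(x)), split(self.k(x)), split(self.v(x))
        # Element-wise addition of the two position codes gives r
        r = (self.r_h + self.r_w).view(1, self.heads, self.dh, h * w)
        content_content = q.transpose(-2, -1) @ k                          # qk^T
        content_position = q.transpose(-2, -1) @ r.expand(b, -1, -1, -1)   # qr^T
        attn = torch.softmax(content_content + content_position, dim=-1)
        z = v @ attn.transpose(-2, -1)      # weighted sum of values
        return z.view(b, c, h, w)
```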
In a preferred embodiment of the present invention, the method of introducing the Shuffle Attention module in the neck portion of the YOLOv5s model is as follows:
attention mechanisms have become key components for improving model detection performance, and two types of attention mechanisms are widely applied to machine vision research, namely a spatial attention mechanism and a channel attention mechanism, and focus on information in spatial and channel dimensions.
The channel attention mechanism is helpful for the model to confirm the characteristic information of the detection target, and the spatial attention mechanism is helpful for the model to acquire the position information of the detection target. While fusing channel attention with spatial attention improves performance, it also increases the number of parameters and computational consumption.
As shown in fig. 4, shuffle attention integrates the characteristic information of two dimensions of the channel and the space while reducing the parameter amount and the calculation consumption required by the attention mechanism, so as to improve the detection precision of the detector.
The dimension of the input feature map is c/g h w, and the input feature map is divided into g groups along the channel dimension c, so that the dimension of each group is c/g h w;
each group is split into two branches again along the channel dimension, and the dimension of each branch is changed into c/2g h w;
the two branches respectively generate respective feature graphs through a spatial attention module and a channel attention module to help the model to focus on the detection target and the position information of the detection target;
after information extraction, the two feature maps are concatenated, restoring the dimension to c/g×h×w; after all g groups have been processed, they are concatenated again to give the output, whose dimension remains c×h×w, the same as the input;
the output is then passed through the channel shuffle operation, which reorders the groups to ensure information flow between different groups;
the spatial and channel attention mechanisms adopted in Shuffle Attention are simple to build; compared with the SE and CBAM attention mechanisms, they improve accuracy with fewer parameters and lower computational cost. The channel attention mechanism in the Shuffle Attention module first average-pools the input to obtain a set of channel-related statistics; after a linear transformation and a sigmoid activation function, the statistics are multiplied element-wise with the original input, yielding an output that captures the target's feature information;
the spatial attention mechanism adopted in the Shuffle Attention module first applies group normalization to the input to obtain spatially related statistics; after a linear transformation and a sigmoid activation function, the statistics are multiplied element-wise with the original input, yielding an output that captures the target's position information.
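The grouping, two-branch attention, and channel shuffle steps above can be sketched as a PyTorch module. Only the data flow mirrors the description; the parameter shapes and the GroupNorm configuration are our assumptions for illustration.

```python
import torch
import torch.nn as nn

class ShuffleAttention(nn.Module):
    """Sketch of Shuffle Attention: group split, channel/spatial branches, shuffle."""
    def __init__(self, channels, groups=8):
        super().__init__()
        self.g = groups
        c = channels // (2 * groups)        # channels per branch
        # Learnable scale/shift for the linear transformation of each branch
        self.cw = nn.Parameter(torch.ones(1, c, 1, 1))
        self.cb = nn.Parameter(torch.zeros(1, c, 1, 1))
        self.sw = nn.Parameter(torch.ones(1, c, 1, 1))
        self.sb = nn.Parameter(torch.zeros(1, c, 1, 1))
        self.gn = nn.GroupNorm(c, c)        # group normalization for the spatial branch

    def forward(self, x):
        b, c, h, w = x.shape
        x = x.view(b * self.g, c // self.g, h, w)   # g groups of c/g channels
        x0, x1 = x.chunk(2, dim=1)                  # two branches of c/(2g) channels
        # Channel branch: average pool -> linear -> sigmoid gate
        s = x0.mean((2, 3), keepdim=True)
        x0 = x0 * torch.sigmoid(self.cw * s + self.cb)
        # Spatial branch: group norm -> linear -> sigmoid gate
        x1 = x1 * torch.sigmoid(self.sw * self.gn(x1) + self.sb)
        out = torch.cat([x0, x1], dim=1).view(b, c, h, w)
        # Channel shuffle: reorder groups so information flows between them
        out = out.view(b, 2, c // 2, h, w).transpose(1, 2).reshape(b, c, h, w)
        return out
```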
In a preferred scheme of the invention, the cross-modal image segmentation module acquires color images of the region to be detected and force maps formed from the friction force of the region, in which the friction force at different position points is represented by gray values; the two images are adjusted to the same resolution, each image carries dryness (xerosis) segmentation label information, and the image data pairs are divided into a training set, a validation set and a test set;
the image feature extraction module performs feature extraction on the color image of the region to be detected to obtain single-mode color image features;
the friction force characteristic extraction module performs characteristic extraction on the force diagram of the region to be detected to obtain a single-mode friction force characteristic;
the feature fusion module comprises a first gating module (a ReLU-type function), a second gating module and a fusion network: the first gating module takes the color image features and passes the portions whose output exceeds the image threshold, the second gating module takes the friction features and passes the portions whose output exceeds the friction threshold, and the fusion network fuses the features output by the two gating modules, taking the friction feature as an additional channel of the output image.
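A minimal sketch of the two-gate fusion, assuming the gates act as threshold-shifted ReLUs on feature maps of shape (channels, h, w) for the image and (h, w) for the friction map; the function and argument names are hypothetical.

```python
import numpy as np

def gated_fuse(img_feat, fric_feat, img_thresh=0.0, fric_thresh=0.0):
    """Pass only responses above each threshold, then stack friction
    as an extra channel of the fused output (hypothetical names)."""
    g_img = np.maximum(img_feat - img_thresh, 0.0)     # first gate (ReLU-style)
    g_fric = np.maximum(fric_feat - fric_thresh, 0.0)  # second gate
    return np.concatenate([g_img, g_fric[None]], axis=0)
```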
In a preferred scheme of the invention, the method for removing the large target detection head of the YOLOv5s model comprises the following steps:
the classical YOLOv5 model comprises three detection heads at the 8×, 16× and 32× sampling rates, corresponding to small, medium and large target detection respectively. Since the sample targets are all small and medium-sized targets, the 32× large target detection head is removed; the network then comprises detection heads at the 8× and 16× sampling rates, corresponding to small target detection and medium target detection respectively.
And a large target detection head is removed, so that the accuracy of the detector is improved, and the parameter quantity and the calculated quantity are reduced.
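As a small illustration of the head-removal logic, the surviving heads can be selected by their downsampling rate (YOLOv5's standard strides are 8, 16 and 32; the helper names below are ours):

```python
def keep_heads(strides, max_stride=16):
    """Keep only detection heads whose downsampling rate does not exceed
    max_stride, i.e. drop the 32x large-target head."""
    return [s for s in strides if s <= max_stride]

def grid_sizes(img_size, strides):
    """Feature-map grid size produced by each remaining head."""
    return [img_size // s for s in strides]
```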
The invention also provides a small target detection method based on the improved YOLOv5s algorithm, which comprises the following steps:
acquiring an image of the region to be detected, and preprocessing; the image to be detected is segmented at the highest resolution into 640×640 pathological patches, and 300 images are screened as the experimental dataset, which is divided into a training set, a validation set and a test set at an 8:1:1 ratio.
Acquiring a friction force diagram of a region to be detected, and preprocessing;
based on the construction method, the YOLOv5s model is improved, and an optimized YOLOv5s model is obtained;
inputting the preprocessed image into an optimized YOLOv5s model, detecting small targets in the image, and counting the number of the small targets in a single area;
the small target count of each single region is compared with a set threshold; when the threshold is exceeded, the region is screened and labeled for use in diagnosing whether the patient has Sjogren's syndrome.
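The block-wise detection and screening steps can be sketched as two small helpers: one tiles an image into fixed-size blocks, the other flags regions whose small-target count exceeds the set threshold. The names and the dict-based count format are assumptions for illustration.

```python
def tile_origins(height, width, tile=640):
    """Yield (top, left) origins of the tile x tile blocks covering the image."""
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            yield top, left

def flag_suspicious(counts, threshold):
    """Regions whose small-target count exceeds the threshold are flagged
    as suspicious lesions for physician review."""
    return [region for region, n in counts.items() if n > threshold]
```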
For example, the experiment sets the training schedule to 100 epochs, the batch size to 5 and the input image size to 640×640; the initial learning rate is 0.01, the decay factor 0.005 and the momentum parameter 0.937, with Adam as the optimization algorithm. Pathological images of lip gland biopsies are collected; each WSI is cut at the highest resolution into 640×640 pathological patches, of which 300 are screened as the experimental dataset, and lymphocytes are manually labeled with labelimg according to lymphocyte discrimination criteria. The dataset is divided into a training set, a validation set and a test set at an 8:1:1 ratio.
The present invention employs target detection metrics to evaluate the performance of the improved YOLOv5s in the lymphocyte detection task. The primary metric is mAP_0.5; since only one target class (lymphocyte) is to be detected, mAP_0.5 reduces to the average precision, i.e. the area under the precision-recall curve:

mAP_0.5 = ∫₀¹ P(R) dR
wherein P and R respectively denote precision and recall, and satisfy:

P = TP / (TP + FP), R = TP / (TP + FN)
wherein TP denotes a positive instance correctly predicted as positive, TN a negative instance correctly predicted as negative, FP a negative instance incorrectly predicted as positive, and FN a positive instance incorrectly predicted as negative.
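With those definitions, precision and recall follow directly; a tiny helper (ours, for illustration) makes the formulas P = TP/(TP+FP) and R = TP/(TP+FN) executable:

```python
def precision_recall(tp, fp, fn):
    """P = TP/(TP+FP), R = TP/(TP+FN); zero denominators map to 0.0."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r
```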
The improved YOLOv5s target detection model improves on the loss function, feature extraction and attention mechanism of the original model. To evaluate each improvement and the effect of module combinations on detection performance, ablation experiments were designed on the dataset herein, with mAP_0.5 as the evaluation index; the experimental results are shown in Table 1.
Table 1 ablation experimental results
The detection accuracy of the original YOLOv5s model on the dataset is 87.7%. After the multi-head self-attention module is mounted on the backbone, model accuracy improves by 1.4% while the parameter count and GFLOPs decrease slightly. After the Shuffle Attention module is mounted on the model neck, accuracy improves by 1.5% with a slight increase in parameters and GFLOPs. After the large target detection head is removed, accuracy improves by 1.2%, the parameter count drops greatly, and GFLOPs fall markedly. After replacing the original CIOU with Focal-SIOU, accuracy improves by 0.5% without changing the parameters or GFLOPs. With all improvement strategies fused, the final detection accuracy of the model reaches 91.1%: compared with the original network, accuracy rises by 3.4%, the parameter count falls by 29.6%, and GFLOPs fall by 10.8%, showing that the improvement strategies adopted herein markedly benefit lymphocyte detection.
Comparing the inventive network with other networks of similar body mass, it can be seen that the inventive network has certain advantages in terms of both accuracy and parameter quantity, as shown in table 2.
Table 2 model comparison results
Network model | mAP_0.5 /% | Parameters | GFLOPs |
---|---|---|---|
YOLOv7-tiny | 85.2 | 6.01×10⁶ | 13.0 |
YOLOv7 | 85.5 | 9.32×10⁶ | 26.7 |
RetinaNet | 69.6 | 19.8×10⁶ | 61.5 |
YOLOv3-SPP | 89.9 | 4.12×10⁶ | 12.0 |
YOLOv6n | 88.3 | 4.23×10⁶ | 11.8 |
RT-DETR | 88.3 | 20×10⁶ | 60 |
Network herein | 91.1 | 4.94×10⁶ | 14.1 |
The improved YOLOv5s model can fully extract background information and effectively distinguish interfering cells of similar color and shape, such as epithelial cells, ensuring detection accuracy. Based on the improved YOLOv5s model, the WSI of the lip gland biopsy is detected block by block, the lymphocyte count of each single region is tallied, and blocks whose lymphocyte count exceeds the set threshold are color-marked as suspicious lesions, providing a reference to assist physicians in diagnosing Sjogren's syndrome.
The invention also provides a small target detection system based on the improved YOLOv5s algorithm, which comprises an image acquisition module, a friction force diagram acquisition module and a processing module, wherein the image acquisition module is used for acquiring an image to be detected, the friction force diagram acquisition module is used for acquiring friction force diagrams of an area to be detected, the output ends of the image acquisition module and the friction force diagram acquisition module are respectively connected with the input end of the processing module, the processing module executes the small target detection method, detects small targets of the image, determines whether the number of the small targets of a single area exceeds a threshold value, and diagnoses whether a patient suffers from Sjogren syndrome. Preferably, the friction force diagram acquisition module is a friction tester.
With the system, whether the patient suffers from Sjogren syndrome is diagnosed by acquiring and analyzing images and acquiring small target detection results of the images.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.
Claims (9)
1. The construction method for improving the YOLOv5s model is characterized by comprising the following steps of:
replacing the CIOU loss function of the YOLOv5s model by using the Focal-SIOU loss function; introducing a multi-head self-attention module into a skeleton network part of a YOLOv5s model;
in the neck portion of the YOLOv5s model, introducing a Shuffle Attention module;
introducing a cross-modal image segmentation module after the Shuffle Attention module, wherein the cross-modal image segmentation module comprises an image feature extraction module, a friction force feature extraction module and a feature fusion module;
and removing the large target detection head of the YOLOv5s model to obtain an optimized YOLOv5s model.
2. The method for constructing an improved YOLOv5s model according to claim 1, wherein the method for replacing the CIOU loss function of the YOLOv5s model with the Focal-SIOU loss function is as follows:
let α be the included angle between the line connecting the center points of the predicted frame and the real frame and the horizontal direction; C_h and C_w respectively denote the vertical and horizontal distances between the two center points, and σ the straight-line distance between them:

σ = √(C_w² + C_h²), sin α = C_h / σ
the angle loss Λ is defined from the included angle α as:

Λ = 1 − 2·sin²(α − π/4)
the distance loss Δ of SIOU is:

Δ = Σ_{t∈{x,y}} (1 − e^(−γ·ρ_t)), γ = 2 − Λ

wherein (b_cx, b_cy) and (b_cx^gt, b_cy^gt) respectively denote the center-point coordinates of the predicted frame and the real frame; ρ_x = ((b_cx^gt − b_cx)/C_w)² and ρ_y = ((b_cy^gt − b_cy)/C_h)² are the distance loss factors in the width and height directions; γ is the angle loss factor;
the distance loss function integrates the angle loss: the closer the included angle α is to 45 degrees, the larger the contribution of the angle loss; the closer α is to 0 degrees, the smaller its contribution, and the term degenerates into a pure distance loss. Here C_w and C_h denote the maximum horizontal and vertical distances between the predicted frame and the real frame, not the distances between their center points;
the shape loss Ω of SIOU is:

Ω = Σ_{t∈{w,h}} (1 − e^(−ω_t))^θ, ω_w = |w − w^gt| / max(w, w^gt), ω_h = |h − h^gt| / max(h, h^gt)

wherein θ expresses the degree of attention paid to the shape loss, whose value should be tuned to the specific dataset; w, h and w^gt, h^gt denote the width and height of the predicted frame and the real frame respectively; ω_w and ω_h are the shape loss factors in the width and height directions;
after the three losses are fused, the loss function L_SIOU of SIOU is:

L_SIOU = 1 − IOU + (Δ + Ω) / 2

where IOU is the intersection over union between the real frame and the predicted frame;
Focal loss is integrated with SIOU to distinguish high quality from low quality anchor boxes, which helps improve regression accuracy; the Focal-SIOU loss function is:

L_Focal-SIOU = IOU^γ · L_SIOU

wherein γ expresses the degree of attention paid to the IOU and is greater than 0; the larger γ, the more the loss function attends to the IOU; the closer γ is to 0, the less it attends to the IOU, and the loss gradually degenerates to L_SIOU; IOU denotes the intersection over union between the real frame and the predicted frame; L_SIOU denotes the SIOU loss function.
3. The method for constructing an improved YOLOv5s model according to claim 1, wherein the method for introducing a multi-headed self-attention module into a skeleton network part of the YOLOv5s model comprises the following steps:
structurally adjusting the C3 module of the original YOLOv5s network to integrate the multi-head self-attention layer into it;
adding position codes to the multi-head self-attention layer to make it position-sensitive;
defining the number of heads in the multi-head self-attention layer to balance accuracy and computation; the input first generates a query vector q, a key vector k and a value vector v by pointwise convolution, with R_h and R_w denoting the position codes extracted along the height and the width;
after element-wise addition of the position codes, a position vector r is produced; r is matrix-multiplied with q to give the content-position term qr^T, and q is matrix-multiplied with k to give the content-content term qk^T;
qr^T and qk^T are added element-wise, passed through a softmax layer, and matrix-multiplied with v, finally yielding the output feature z of the multi-head self-attention layer.
4. The method of constructing an improved YOLOv5s model according to claim 1, wherein the method of introducing the Shuffle Attention module in the neck portion of the YOLOv5s model is as follows:
the input feature map has dimension c×h×w and is divided into g groups along the channel dimension c, so that each group has dimension c/g×h×w;
each group is further split into two branches along the channel dimension, each branch having dimension c/(2g)×h×w;
the two branches respectively generate their feature maps through a spatial attention module and a channel attention module, helping the model focus on the detection target and its position information;
after information extraction, the two feature maps are concatenated, restoring the dimension to c/g×h×w; after all g groups have been processed, they are concatenated again to give the output, whose dimension remains c×h×w, the same as the input;
the output is then passed through the channel shuffle operation, which reorders the groups to ensure information flow between different groups;
the channel attention mechanism in the Shuffle Attention module first average-pools the input to obtain a set of channel-related statistics; after a linear transformation and a sigmoid activation function, the statistics are multiplied element-wise with the original input, yielding an output that captures the target's feature information;
the spatial attention mechanism adopted in the Shuffle Attention module first applies group normalization to the input to obtain spatially related statistics; after a linear transformation and a sigmoid activation function, the statistics are multiplied element-wise with the original input, yielding an output that captures the target's position information.
5. The method for constructing the improved YOLOv5s model according to claim 1, wherein the cross-mode image segmentation module acquires color images of a region to be detected and force patterns formed by friction force of the region to be detected, the force patterns represent friction force of different position points of the region to be detected by gray values, the two images are adjusted to be of the same resolution, each image is provided with xerosis segmentation label information, and image data pairs are divided into a training set, a verification set and a test set;
the image feature extraction module performs feature extraction on the color image of the region to be detected to obtain single-mode color image features;
the friction force characteristic extraction module performs characteristic extraction on the force diagram of the region to be detected to obtain a single-mode friction force characteristic;
the feature fusion module comprises a first gating module, a second gating module and a fusion network, wherein the first gating module acquires color image features and processes the portions with output larger than an image threshold, the second gating module acquires friction force features and processes the portions with output larger than the friction force threshold, and the fusion network fuses the features output by the first gating module and the second gating module and takes the friction force features as a channel for outputting the image.
6. The method for constructing an improved YOLOv5s model according to claim 1, wherein the method for removing the large target detection head of the YOLOv5s model is as follows:
since the sample targets are all small and medium-sized targets, the large target detection head with the 32× sampling rate is removed; the network then comprises detection heads with the 8× and 16× sampling rates, corresponding to small target detection and medium target detection respectively.
7. The small target detection method based on the improved YOLOv5s algorithm is characterized by comprising the following steps of:
acquiring an image of a region to be detected, and preprocessing;
acquiring a friction force diagram of a region to be detected, and preprocessing; based on the construction method of one of claims 1 to 6, improving the YOLOv5s model to obtain an optimized YOLOv5s model;
inputting the preprocessed image into an optimized YOLOv5s model, detecting small targets in the image, and counting the number of the small targets in a single area;
the small target number of the single region is compared with a set threshold value, and when the threshold value is exceeded, the region is screened and marked.
8. A small target detection system based on an improved YOLOv5s algorithm, which is characterized by comprising an image acquisition module, a friction force diagram acquisition module and a processing module, wherein the image acquisition module is used for acquiring an image to be detected, the friction force diagram acquisition module is used for acquiring friction force diagrams of an area to be detected, the output ends of the image acquisition module and the friction force diagram acquisition module are respectively connected with the input end of the processing module, the processing module executes the method of claim 7, detects small targets of the image, determines whether the number of the small targets of a single area exceeds a threshold value, and diagnoses whether a patient has the Sjogren syndrome.
9. The small target detection system based on the modified YOLOv5s algorithm of claim 8, wherein the friction force map acquisition module is a friction tester.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311651968.6A CN117475434A (en) | 2023-12-05 | 2023-12-05 | Construction method of improved YOLOv5s model, small target detection method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117475434A true CN117475434A (en) | 2024-01-30 |
Family
ID=89633087
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311651968.6A Pending CN117475434A (en) | 2023-12-05 | 2023-12-05 | Construction method of improved YOLOv5s model, small target detection method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117475434A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||