CN115937089A - Training detection method based on improved YOLOV5 focus detection model - Google Patents


Publication number
CN115937089A
CN115937089A
Authority
CN
China
Prior art keywords
model
focus
image
training
detection
Prior art date
Legal status
Pending
Application number
CN202211274984.3A
Other languages
Chinese (zh)
Inventor
王月
谢海琼
周忠娇
刘永旭
银兴行
Current Assignee
Chongqing Biological Intelligent Manufacturing Research Institute
Original Assignee
Chongqing Biological Intelligent Manufacturing Research Institute
Priority date
Filing date
Publication date
Application filed by Chongqing Biological Intelligent Manufacturing Research Institute filed Critical Chongqing Biological Intelligent Manufacturing Research Institute
Priority to CN202211274984.3A
Publication of CN115937089A
Legal status: Pending


Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a training and detection method based on an improved YOLOV5 lesion detection model. The method comprises: constructing the original training samples required for model training; inputting the original training samples into an image detection model to obtain multi-layer feature layer data; inputting the obtained multi-layer feature layer data into the predictor sub-model of a YOLO-V5+SSFPN image detection model, which outputs a predicted bounding box for each lesion in the image and a classification result for the lesion image; comparing the model output with the ground-truth annotations of the training pictures and updating the network parameters; and saving the trained model weight file, whose weights are loaded into the model for lesion detection. A lesion image detection model is trained from pictures and annotation data so that lesions can be located and classified quickly. Because most lesions are small, the structure of the network model is modified: an SSFPN that fuses multi-resolution feature layers is added, so that the network model finally locates and identifies lesion positions quickly and accurately.

Description

Training detection method based on improved YOLOV5 focus detection model
Technical Field
The invention belongs to the field of image recognition, and particularly relates to a method for training and detecting a human lesion model based on the YOLOV5+SSFPN improved network.
Background
Medical technology has matured, and medical images (such as CT images) are widely used by medical staff. Lesions appear in these images, but the lesion region is usually small; when it is not detected automatically, medical staff must spend time on manual inspection, which may lead to missed or incorrect detections and is time- and labor-consuming.
In the field of image recognition, artificial-intelligence-based detection methods largely meet the needs of daily applications and show promise for reducing this time and labor cost. Medical artificial intelligence has demonstrated great potential in medical image analysis and computer-assisted diagnosis, and image-based lesion detection is one of the important tools of computer-assisted medicine. However, the complexity of diseased tissue and the diversity of lesions make lesion detection a challenging task.
Disclosure of Invention
Aiming at the key problems of the existing manual inspection of lesion images, namely that it is time- and labor-consuming and prone to missed or false detections, a method for training and detecting a human lesion model based on the YOLOV5+SSFPN improved network is provided, used to quickly and accurately classify and locate images of small human lesions.
In view of this, the technical scheme adopted by the invention is a training and detection method based on an improved YOLOV5 lesion detection model, comprising the following steps:
Step 1: construct the original training samples required for model training.
Step 2: input the original training samples into the image detection model to obtain multi-layer feature layer data.
Step 3: input the obtained multi-layer feature layer data into the predictor sub-model of the YOLO-V5+SSFPN image detection model, and output the predicted lesion bounding boxes and the classification result for the lesion image.
Step 4: compare the model output with the ground-truth annotations of the training pictures, and update the network parameters.
Step 5: save the trained model weight file; the weights are then loaded into the model for lesion detection.
The lesion image detection model is trained from pictures and annotation data so that lesions can be located and classified quickly. Because most lesions are small, the structure of the network model is modified: an SSFPN that fuses multi-resolution feature layers is added, so that the network model finally locates and identifies lesion positions quickly and accurately, providing technical support for reducing lesion detection errors.
The invention uses image amplification to address the scarcity of lesion image samples, increasing the number of original samples and improving the detection accuracy of the trained model.
Image amplification is applied to the small-size, small-sample lesion images, and the deep neural network YOLO-V5+SSFPN is trained on the enlarged dataset; the network can identify and locate small lesions quickly and accurately, with a smaller computational cost than two-stage networks. By continuously collecting lesion images, the detection capability of the network model can be improved continuously, alleviating the high missed-diagnosis rate of manual identification.
Drawings
FIG. 1 is a diagram of a training sample;
FIG. 2 is a diagram of the network structure of YOLO-V5;
FIG. 3 is a block diagram of SSFPN;
FIG. 4 is a flow chart of network model training detection.
Detailed Description
1. A method for training and detecting a human body lesion model based on the YOLOV5+SSFPN improved network, as shown in fig. 4, comprises the following steps:
Step 1: images are acquired and segmented at positions where lesions may exist, and the processed images are annotated to obtain the original training samples required for model training; each sample consists of a processed picture and its annotation information, as shown in fig. 1. Because images containing lesions are scarce and most lesions are small, the acquired images are suitably cropped and then amplified by image enhancement methods to increase the number of original training samples.
The image enhancement methods are, respectively: adding salt-and-pepper noise of varying intensity to the image, applying Gaussian blur and bilateral blur of varying strength to the image, and moderately changing the image brightness; these enhancement modes are also used in combination, greatly enlarging the original image dataset.
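Two of the enhancement modes above, salt-and-pepper noise and brightness change, can be sketched in NumPy as follows. The function names are illustrative, not from the patent, and the blur modes would be added analogously (e.g. with OpenCV's `GaussianBlur` and `bilateralFilter`):

```python
import numpy as np

def salt_pepper(img, amount=0.02, rng=None):
    """Corrupt roughly `amount` of the pixels with salt (255) or pepper (0)."""
    rng = rng or np.random.default_rng(0)
    out = img.copy()
    mask = rng.random(img.shape[:2])
    out[mask < amount / 2] = 0          # pepper
    out[mask > 1 - amount / 2] = 255    # salt
    return out

def adjust_brightness(img, factor=1.2):
    """Scale pixel intensities, clipping to the valid 8-bit range."""
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)

def augment(img, rng=None):
    """Combine enhancement modes of different degrees to turn one sample into several."""
    return [
        salt_pepper(img, 0.01, rng),
        salt_pepper(img, 0.05, rng),
        adjust_brightness(img, 0.8),
        adjust_brightness(img, 1.3),
    ]
```

Each source picture thus yields several corrupted variants, which is how a small lesion dataset can be amplified before training.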
Step 2: input the original training samples into the YOLO-V5+SSFPN image detection model to obtain multi-layer feature layer data.
Step 3: input the obtained multi-layer feature layer data into the predictor sub-model of the YOLO-V5+SSFPN image detection model (YOLO-V5 is an end-to-end detection model that can be divided into a feature-extraction sub-model and a target-bounding-box predictor sub-model), and output the predicted lesion bounding boxes and the classification result for the lesion image. The YOLO-V5 network applies a single convolutional neural network (CNN) to the whole image, divides the image into grids, and directly predicts the class probabilities and bounding boxes for each grid, greatly reducing the time needed to classify a lesion image. It also enhances the input images with Mosaic data augmentation, stitching images together by random scaling, random cropping, and random arrangement, which increases the sample data and improves the detection of small targets. The network model of YOLO-V5 is shown in FIG. 2.
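The Mosaic stitching described above can be sketched as follows. This is a simplified, hypothetical version that only tiles four images into the quadrants of one canvas; the real YOLO-V5 implementation additionally applies random scaling and cropping, jitters the mosaic center, and remaps the bounding-box labels:

```python
import numpy as np

def mosaic4(imgs, size=64):
    """Tile four grayscale images into the quadrants of one size x size canvas,
    a minimal stand-in for Mosaic data augmentation."""
    assert len(imgs) == 4
    half = size // 2
    canvas = np.zeros((size, size), dtype=np.uint8)
    for k, img in enumerate(imgs):
        # crop each source to at most half x half (stand-in for random resize/crop)
        tile = np.zeros((half, half), dtype=np.uint8)
        h, w = min(half, img.shape[0]), min(half, img.shape[1])
        tile[:h, :w] = img[:h, :w]
        r, c = divmod(k, 2)
        canvas[r*half:(r+1)*half, c*half:(c+1)*half] = tile
    return canvas
```

Because one Mosaic sample contains four (rescaled) source images, small objects appear more often per batch, which is why this augmentation helps small-target detection.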
The structure of YOLO-V5 is divided into an input stage, a Backbone, a Neck, and a prediction head. The input stage performs Mosaic data enhancement, adaptive anchor-box calculation, and adaptive picture scaling, transforming input pictures to a common size. The Backbone contains a Focus module and CSP modules. The Focus module uses a slicing operation, sampling alternate rows and columns and concatenating the results, to split a high-resolution picture (feature map) into several low-resolution feature maps, reducing the information loss caused by downsampling. The CSP structure splits the input into two branches and convolves each to halve the channel count; one branch then passes through a BottleneckN operation (N residual modules), after which the two branches are concatenated, so that the model can learn richer feature information. The Neck comprises FPN and PAN structures, which preserve features of different resolutions as the image features propagate through the model, so that feature information is extracted more effectively. The prediction head uses CIOU_Loss as the bounding-box loss function and applies non-maximum suppression (NMS), which improves the recognition of overlapping objects.
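The Focus module's slicing operation can be illustrated with a minimal NumPy sketch (the helper name is illustrative; YOLO-V5 implements this as a network layer followed by a convolution):

```python
import numpy as np

def focus_slice(x):
    """Focus-style slicing: split an (H, W, C) feature map into four
    alternately sampled sub-maps and concatenate them along channels.
    The result is (H/2, W/2, 4C) and, unlike strided downsampling,
    discards no pixels."""
    return np.concatenate(
        [x[0::2, 0::2], x[1::2, 0::2], x[0::2, 1::2], x[1::2, 1::2]],
        axis=-1,
    )
```

Spatial resolution is halved while every input value is retained in the channel dimension, which is the sense in which the slicing "reduces the information loss caused by downsampling".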
However, lesion images are complex and some lesions are very small, and the deeper features that YOLO-V5 obtains through convolution incur information loss, which makes lesion detection difficult. The SSFPN feature-extraction method treats the FPN structure as a scale space and performs a three-dimensional (3D) convolution over it, fusing the FPN image features across resolutions and further improving the network model's detection of small targets. The structure of SSFPN is shown in fig. 3.
To improve the YOLO-V5 detection model's ability to extract small-target features, the SSFPN is placed between the FPN and PAN modules of the YOLO-V5 Backbone: the feature maps of different resolutions extracted by the FPN module are resized to the same size, concatenated into a 4-dimensional tensor, passed through a 3D convolution operation, and then input to the PAN module.
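As a rough illustration of this fusion step, the NumPy sketch below resizes multi-resolution maps to a common size, stacks them into a 4-D scale-space tensor, and mixes across the scale axis. It deliberately simplifies the patent's 3D convolution to a 1x1x1 convolution over the scale dimension (i.e. a weighted sum); all names are illustrative, and a real SSFPN would use a learned 3D kernel over the full scale-space volume:

```python
import numpy as np

def upsample_nn(x, factor):
    """Nearest-neighbour upsampling of an (H, W, C) feature map."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

def ssfpn_fuse(features, weights=None):
    """Resize FPN outputs to a common size, stack them along a new 'scale'
    axis into an (S, H, W, C) tensor, then mix across scales with a
    1x1x1-convolution-style weighted sum (sketch of the SSFPN fusion)."""
    target_h = max(f.shape[0] for f in features)
    resized = [upsample_nn(f, target_h // f.shape[0]) for f in features]
    vol = np.stack(resized, axis=0)                  # (S, H, W, C) scale space
    if weights is None:
        weights = np.full(vol.shape[0], 1.0 / vol.shape[0])
    return np.tensordot(weights, vol, axes=(0, 0))   # fused (H, W, C) map
```

The key design point the sketch captures is that the scale dimension becomes an explicit tensor axis, so information from coarse and fine resolutions can be combined by a single operation rather than layer-by-layer.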
Step 4: compare the result generated by the model with the ground-truth annotations of the training pictures, and update the network parameters. YOLO-V5 uses CIOU_Loss as the loss function of the bounding box, given by the following formulas:
CIOU = IOU - ρ²(b, b^gt) / c² - αv

v = (4/π²) · (arctan(w^gt/h^gt) - arctan(w/h))²

α = v / ((1 - IOU) + v)

CIOU_Loss = 1 - CIOU

CIOU is an improved form of the IOU metric that additionally considers the aspect ratio among the three elements of bounding-box regression; IOU is the intersection-over-union of the areas of the ground-truth box and the predicted box; b and b^gt denote the center points of the predicted box and the ground-truth box, and ρ(b, b^gt) is the Euclidean distance between them; c is the diagonal length of the smallest enclosing region that contains both the predicted box and the ground-truth box; w^gt, h^gt, w, and h denote the width of the ground-truth box, the height of the ground-truth box, the width of the predicted box, and the height of the predicted box, respectively.
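The CIOU_Loss above can be transcribed directly for a single pair of axis-aligned boxes (an illustrative scalar helper; YOLO-V5's own implementation is batched and vectorised):

```python
import math

def ciou_loss(box_p, box_g):
    """CIOU_Loss = 1 - CIOU for boxes given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box_p
    X1, Y1, X2, Y2 = box_g
    # intersection-over-union of the two areas
    iw = max(0.0, min(x2, X2) - max(x1, X1))
    ih = max(0.0, min(y2, Y2) - max(y1, Y1))
    inter = iw * ih
    union = (x2 - x1) * (y2 - y1) + (X2 - X1) * (Y2 - Y1) - inter
    iou = inter / union
    # squared centre distance over squared enclosing-box diagonal
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    Cx, Cy = (X1 + X2) / 2, (Y1 + Y2) / 2
    rho2 = (cx - Cx) ** 2 + (cy - Cy) ** 2
    c2 = (max(x2, X2) - min(x1, X1)) ** 2 + (max(y2, Y2) - min(y1, Y1)) ** 2
    # aspect-ratio consistency term v and its trade-off weight alpha
    v = (4 / math.pi ** 2) * (
        math.atan((X2 - X1) / (Y2 - Y1)) - math.atan((x2 - x1) / (y2 - y1))
    ) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    return 1 - (iou - rho2 / c2 - alpha * v)
```

For identical boxes every penalty term vanishes and the loss is 0; as overlap shrinks or centers drift apart, the loss grows, which is what makes CIOU a better regression target than plain IOU for small, offset lesions.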
Step 5: save the trained model weight file; the weights are loaded into the model for lesion detection.
Step 6: input a cropped lesion image into the detection model; the model draws the predicted lesion bounding box on the original picture, displays the classification result for the lesion image, and saves the image with its bounding box to a designated folder. The YOLO-V5 network is an end-to-end structure that requires no separate anchor-box prediction stage, so it can classify a lesion picture rapidly, reaching 50 FPS, and with the added SSFPN structure it detects small-target lesions quickly and accurately.

Claims (7)

1. A training detection method based on an improved YOLOV5 focus detection model is characterized by comprising the following steps:
step 1: constructing the original training samples required for model training;
step 2: inputting the original training samples into an image detection model to obtain multi-layer feature layer data;
step 3: inputting the obtained multi-layer feature layer data into a predictor sub-model of a YOLO-V5+SSFPN image detection model, and outputting a predicted bounding box for a lesion in the image and a classification result for the lesion image;
step 4: comparing the result generated by the model with the ground-truth annotation corresponding to the training picture, and updating the parameters of the network model;
step 5: storing the trained model weight file, and loading the weights into the model for lesion detection.
2. The method of claim 1, wherein step 1 further comprises cropping the acquired images and then amplifying them by an image enhancement method, thereby increasing the number of original training samples.
3. The method of claim 2, wherein the image enhancement comprises adding salt-and-pepper noise of varying intensity to the image, applying Gaussian blur and bilateral blur of varying strength to the image, changing the brightness of the image, or combining these enhancement modes.
4. The method of claim 1, wherein the YOLO-V5 network model comprises an input stage, a Backbone, a Neck, and a prediction head; the input stage performs the operations of Mosaic data enhancement, adaptive anchor-box calculation, and adaptive picture scaling; the Backbone comprises a Focus module and a CSP module, wherein the Focus module uses a slicing operation to split a high-resolution picture/feature map into a plurality of low-resolution pictures/feature maps, and the CSP structure splits the input into two branches that are each convolved, one branch then passing through a BottleneckN operation before the two branches are concatenated; the Neck comprises FPN and PAN structures; and the prediction head uses CIOU_Loss as the bounding-box loss function together with non-maximum suppression.
5. The method of claim 4, wherein the SSFPN is arranged between the FPN and PAN modules of the YOLOV5 Backbone; the feature information of different resolutions extracted by the FPN module is resized to the same size, concatenated into a 4-dimensional tensor, subjected to a 3D convolution operation, and then input to the PAN module.
6. The method of claim 1, wherein YOLO-V5 uses CIOU_Loss as the loss function of the bounding box, given by the following formulas:
CIOU = IOU - ρ²(b, b^gt) / c² - αv

v = (4/π²) · (arctan(w^gt/h^gt) - arctan(w/h))²

α = v / ((1 - IOU) + v)

CIOU_Loss = 1 - CIOU

wherein CIOU is an improved form of the IOU metric that additionally considers the aspect ratio among the three elements of bounding-box regression; IOU is the intersection-over-union of the areas of the ground-truth box and the predicted box; b and b^gt denote the center points of the predicted box and the ground-truth box, and ρ(b, b^gt) is the Euclidean distance between them; c is the diagonal length of the smallest enclosing region that contains both the predicted box and the ground-truth box; and w^gt, h^gt, w, and h denote the width of the ground-truth box, the height of the ground-truth box, the width of the predicted box, and the height of the predicted box, respectively.
7. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for training a human lesion model according to any one of claims 1 to 5.
CN202211274984.3A 2022-10-18 2022-10-18 Training detection method based on improved YOLOV5 focus detection model Pending CN115937089A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211274984.3A CN115937089A (en) 2022-10-18 2022-10-18 Training detection method based on improved YOLOV5 focus detection model


Publications (1)

Publication Number Publication Date
CN115937089A true CN115937089A (en) 2023-04-07

Family

ID=86651585

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211274984.3A Pending CN115937089A (en) 2022-10-18 2022-10-18 Training detection method based on improved YOLOV5 focus detection model

Country Status (1)

Country Link
CN (1) CN115937089A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116777893A * 2023-07-05 2023-09-19 脉得智能科技(无锡)有限公司 Segmentation and identification method based on characteristic nodules of breast ultrasound transverse and longitudinal sections
CN116777893B * 2023-07-05 2024-05-07 脉得智能科技(无锡)有限公司 Segmentation and identification method based on characteristic nodules of breast ultrasound transverse and longitudinal sections
CN117911418A * 2024-03-20 2024-04-19 常熟理工学院 Focus detection method, system and storage medium based on improved YOLO algorithm

Similar Documents

Publication Publication Date Title
CN111445478B (en) Automatic intracranial aneurysm region detection system and detection method for CTA image
CN108062525B (en) Deep learning hand detection method based on hand region prediction
CN111383214B (en) Real-time endoscope enteroscope polyp detection system
CN115937089A (en) Training detection method based on improved YOLOV5 focus detection model
WO2020133636A1 (en) Method and system for intelligent envelope detection and warning in prostate surgery
CN110956126B (en) Small target detection method combined with super-resolution reconstruction
Yin et al. FD-SSD: An improved SSD object detection algorithm based on feature fusion and dilated convolution
CN111260688A (en) Twin double-path target tracking method
CN109584156A (en) Micro- sequence image splicing method and device
CN110648331B (en) Detection method for medical image segmentation, medical image segmentation method and device
CN114220015A (en) Improved YOLOv 5-based satellite image small target detection method
CN112862824A (en) Novel coronavirus pneumonia focus detection method, system, device and storage medium
CN111916206B (en) CT image auxiliary diagnosis system based on cascade connection
Buayai et al. End-to-end automatic berry counting for table grape thinning
CN113610087B (en) Priori super-resolution-based image small target detection method and storage medium
CN113988271A (en) Method, device and equipment for detecting high-resolution remote sensing image change
CN115482523A (en) Small object target detection method and system of lightweight multi-scale attention mechanism
Zhang et al. TPMv2: An end-to-end tomato pose method based on 3D key points detection
US9392146B2 (en) Apparatus and method for extracting object
CN114332582A (en) Multi-scale target detection method based on infrared and visible light
CN113971763A (en) Small target segmentation method and device based on target detection and super-resolution reconstruction
JP2021064120A (en) Information processing device, information processing method, and program
CN116894959B (en) Infrared small target detection method and device based on mixed scale and focusing network
CN114596580B (en) Multi-human-body target identification method, system, equipment and medium
CN113469172B (en) Target positioning method, model training method, interface interaction method and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination