CN113555087A - Artificial intelligence film reading method based on convolutional neural network algorithm - Google Patents

Artificial intelligence film reading method based on convolutional neural network algorithm

Info

Publication number
CN113555087A
Authority
CN
China
Prior art keywords
target
network
neural network
model
artificial intelligence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110813909.9A
Other languages
Chinese (zh)
Inventor
薛帅
张丽
陈向
左万利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
First Hospital of Jilin University
Original Assignee
First Hospital of Jilin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by First Hospital of Jilin University
Priority to CN202110813909.9A
Publication of CN113555087A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00 ICT specially adapted for the handling or processing of medical images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10132 Ultrasound image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30096 Tumor; Lesion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Radiology & Medical Imaging (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Quality & Reliability (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of artificial intelligence, and particularly relates to an artificial intelligence film reading method based on a convolutional neural network algorithm, which comprises the following steps: step one: acquiring a target image to be detected; step two: selecting an image area; step three: extracting target features; step four: classifying the target according to the features; step five: conducting regression on the target bounding box; step six: optimizing the structure; step seven: completing target detection. The method has a reasonable structure, improves the accuracy of thyroid color Doppler ultrasound diagnosis, and prevents misdiagnosis on preoperative thyroid color Doppler ultrasound. Since input and output are handled entirely by program operations, the influence of human factors on image diagnosis is eliminated, and image features can be identified better than with conventional methods. Moreover, the algorithm's ability to learn and improve step by step and layer by layer lends itself well to further refinement of the program, and it is currently the AI method with the highest diagnostic accuracy for medical images.

Description

Artificial intelligence film reading method based on convolutional neural network algorithm
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an artificial intelligence film reading method based on a convolutional neural network algorithm.
Background
The incidence of thyroid cancer (TC) is rising rapidly worldwide, and in China surgery remains physicians' first treatment option for TC. However, most TCs grow slowly and have a good prognosis; their fatality rate is not reduced by aggressive surgical treatment, whereas surgery greatly reduces the quality of life of TC patients, so overdiagnosis and overtreatment of TC have become key points of clinical attention. The preoperative diagnosis of TC relies mostly on thyroid color Doppler ultrasound; however, the experience and skill of ultrasound physicians vary, which directly affects the accuracy of thyroid color Doppler ultrasound diagnosis. Studies have shown that misdiagnosis on preoperative thyroid color Doppler ultrasound is the primary cause of erroneous punctures and surgical overtreatment.
With the development of computer technology, artificial intelligence (AI) plays an increasingly important role in medical image recognition and disease diagnosis. Deep learning based on the convolutional neural network (CNN) algorithm shows remarkable application prospects, especially in visual structure and language recognition tasks. A number of studies have shown that, in the learning and diagnostic tasks of medical images, CNNs can provide more accurate diagnostic information than conventional methods. Some advanced clinical medicine centers have already used deep-learning AI algorithms to diagnose breast, lung, brain and liver diseases. However, to date, no AI software for thyroid color Doppler ultrasound has been developed.
Therefore, an artificial intelligence film reading method based on a convolutional neural network algorithm is provided to solve the problems.
Disclosure of Invention
This section is intended to summarize some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. Simplifications or omissions may be made in this section, as well as in the abstract and the title of this application, to avoid obscuring their purpose; such simplifications or omissions are not intended to limit the scope of the invention.
Therefore, the invention aims to provide an artificial intelligence film reading method based on a convolutional neural network algorithm, so as to improve the accuracy of thyroid color Doppler ultrasound diagnosis and prevent misdiagnosis on preoperative thyroid color Doppler ultrasound.
To solve the above technical problem, according to an aspect of the present invention, the present invention provides the following technical solutions:
an artificial intelligence film reading method based on a convolutional neural network algorithm comprises the following steps:
step one: acquiring a target image to be detected;
step two: selecting an image area;
step three: extracting target features;
step four: classifying the target according to the characteristics;
step five: conducting regression on the target bounding box;
step six: optimizing the structure;
step seven: and finishing target detection.
As a preferred scheme of the artificial intelligence film reading method based on the convolutional neural network algorithm of the invention, wherein: the algorithm is implemented mainly by means of the NanoDet model, and NanoDet comprises a feature extraction backbone network, a feature fusion network and a detection head.
As a preferred scheme of the artificial intelligence film reading method based on the convolutional neural network algorithm of the invention, wherein: in a neural network, particularly in the field of computer vision (CV), features of an image are generally extracted first; this part is the foundation of the whole CV task, so this part of the network structure is called the backbone. NanoDet selects ShuffleNetV2 1.0x as the backbone. ShuffleNetV2 1.0x is a modified version of ShuffleNetV1, and the modification follows the following 4 guidelines:
(1) keeping the input and output channel numbers equal minimizes the memory access cost (MAC), at which point the model is fastest;
(2) excessive use of group convolution increases the MAC and slows the model down;
(3) the fewer the model branches, the simpler the model and the faster it runs;
(4) element-wise operations also negatively impact model speed.
As a preferred scheme of the artificial intelligence film reading method based on the convolutional neural network algorithm of the invention, wherein: the feature fusion layer selects PAN (Path Aggregation Network), an improved version of the FPN (Feature Pyramid Network), and makes lightweight modifications on that basis; the Feature Pyramid Network (FPN) is an efficient CNN feature extraction method. A conventional convolutional neural network proceeds from bottom to top, with scale and semantic information changing continuously; the FPN enhances this by adding feature supplementation along top-down paths, so that the finally output features better represent the multi-dimensional information of the input picture.
As a preferred scheme of the artificial intelligence film reading method based on the convolutional neural network algorithm of the invention, wherein: NanoDet selects the detection head of the FCOS (Fully Convolutional One-Stage Object Detection) model; the idea of the FCOS model is to predict, for each point in the input image, the target class and target box to which it belongs; the overall architecture of the FCOS model is similar to the FPN (Feature Pyramid Network) structure, and prediction is performed on 5 fused feature layers.
Compared with the prior art, the invention has the beneficial effects that: the accuracy of thyroid color Doppler ultrasound diagnosis is improved and misdiagnosis on preoperative thyroid color Doppler ultrasound is prevented; since input and output are handled entirely by program operations, the influence of human factors on image diagnosis is eliminated, and image features can be identified better than with conventional methods. Moreover, the algorithm's ability to learn and improve step by step and layer by layer lends itself well to further refinement of the program, and it is currently the AI method with the highest diagnostic accuracy for medical images.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the present invention will be described in detail below with reference to the accompanying drawings and specific embodiments. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that other drawings can be obtained from them by those of ordinary skill in the art without inventive effort. Wherein:
FIG. 1 is a schematic structural view of the present invention;
FIG. 2 is a view of the construction of the nanoDet model of the present invention;
FIG. 3 is a diagram of a backbone network model architecture according to the present invention: (a) the basic ShuffleNet unit of ShuffleNetV1; (b) the ShuffleNetV1 unit with spatial down-sampling; (c) the basic ShuffleNetV2 unit; (d) the ShuffleNetV2 unit with spatial down-sampling;
FIG. 4 is a diagram of a characteristic pyramid network of the present invention;
FIG. 5 is a graph of FPN calculations according to the present invention;
FIG. 6 is a PAN calculation graph in accordance with the invention;
FIG. 7 is a diagram of an ultra lightweight PAN configuration of the present invention;
FIG. 8 is a diagram of the FCOS model architecture of the present invention;
FIG. 9 is a schematic view of the nanoDet detection head of the present invention;
FIG. 10 is a flowchart illustrating android app target detection according to the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, the present invention may be practiced in ways other than those specifically described herein, as will be apparent to those of ordinary skill in the art, without departing from its spirit and scope; therefore the present invention is not limited to the specific embodiments disclosed below.
Next, the present invention will be described in detail with reference to the drawings. For convenience of illustration, the schematic views of the structure are not partially enlarged to a uniform scale, and the drawings are only examples, which should not limit the scope of the present invention. In addition, the three dimensions of length, width and depth should be taken into account in actual implementation.
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Example 1
An artificial intelligence film reading method based on a convolutional neural network algorithm comprises the following steps:
step one: acquiring a target image to be detected;
step two: selecting an image area;
step three: extracting target features;
step four: classifying the target according to the characteristics;
step five: conducting regression on the target bounding box;
step six: optimizing the structure;
step seven: and finishing target detection.
Specifically, the algorithm of the invention is implemented mainly by means of the NanoDet model, and NanoDet comprises a feature extraction backbone network, a feature fusion network and a detection head.
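For illustration, a minimal structural sketch of such a three-part detector, assuming PyTorch, is given below; the class and argument names are placeholders introduced for this description rather than the actual NanoDet source code.

```python
import torch
import torch.nn as nn

class NanoDetStyleDetector(nn.Module):
    """Skeleton of a NanoDet-style detector: backbone -> feature fusion -> detection head.

    The three sub-modules are passed in, so this class only expresses the composition.
    """
    def __init__(self, backbone: nn.Module, fusion: nn.Module, head: nn.Module):
        super().__init__()
        self.backbone = backbone  # e.g. ShuffleNetV2 1.0x returning 8x/16x/32x feature maps
        self.fusion = fusion      # e.g. a lightweight PAN
        self.head = head          # e.g. an FCOS-style detection head

    def forward(self, image: torch.Tensor):
        multi_scale_feats = self.backbone(image)   # list of feature maps at several strides
        fused_feats = self.fusion(multi_scale_feats)
        return self.head(fused_feats)              # per-level class scores and box regressions
```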
Specifically, in a neural network, especially in the field of computer vision (CV), features of an image are generally extracted first; this part is the foundation of the whole CV task, so this part of the network structure is called the backbone. NanoDet selects ShuffleNetV2 1.0x as the backbone. ShuffleNetV2 1.0x is a modified version of ShuffleNetV1, and the modification follows the following 4 guidelines:
(1) keeping the input and output channel numbers equal minimizes the memory access cost (MAC), at which point the model is fastest;
(2) excessive use of group convolution increases the MAC and slows the model down;
(3) the fewer the model branches, the simpler the model and the faster it runs;
(4) element-wise operations also negatively impact model speed.
Based on the 4 guidelines obtained from experimental verification and theoretical analysis, it was found that the V1 module makes extensive use of 1x1 convolutions with group operations, which violates guideline (2). In addition, V1 adopts a bottleneck layer similar to that of ResNet, in which the numbers of input and output channels differ, violating guideline (1); and there are many element-wise operations in the shortcut connections, violating guideline (4). Based on these findings, the authors of ShuffleNetV2 1.0x introduced a channel split operation: the input feature map is divided along the channel dimension into two branches of c' and c - c' channels, with c' = c/2 in the concrete implementation. The left branch is an identity mapping; the right branch contains 3 consecutive convolutions whose input and output channel numbers are equal, following guideline (1). The group operation in the two 1x1 convolutional layers is eliminated, following guideline (2). In addition, the outputs of the two branches are combined by a concat operation rather than an element-wise operation, following guideline (4). The structure of ShuffleNetV2 is shown in Table 1; it is basically similar to ResNet and is divided into several stages, with each stage replacing the residual block by the ShuffleNet unit. In Table 1, the number of output channels is varied by changing the number of group operations under a given complexity budget; in general, more output channels allow more features to be extracted.
TABLE 1 Structure Table of ShuffleNet V2
[Table 1 is provided as an image in the original publication and is not reproduced here.]
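To make the channel split concrete, the following is a minimal PyTorch sketch of a stride-1 ShuffleNetV2-style basic unit that follows the four guidelines above; the layer arrangement and names are illustrative assumptions, not code from the patent or from the official ShuffleNetV2 release.

```python
import torch
import torch.nn as nn

def channel_shuffle(x: torch.Tensor, groups: int = 2) -> torch.Tensor:
    # Standard channel shuffle: reshape to (groups, c // groups), transpose, flatten back.
    n, c, h, w = x.size()
    x = x.view(n, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

class ShuffleV2BasicUnit(nn.Module):
    """Stride-1 unit: channel split -> identity / (1x1, 3x3 depthwise, 1x1) -> concat -> shuffle."""
    def __init__(self, channels: int):
        super().__init__()
        assert channels % 2 == 0
        c = channels // 2  # c' = c/2 split; equal in/out channels per guideline (1)
        self.branch = nn.Sequential(
            nn.Conv2d(c, c, 1, bias=False), nn.BatchNorm2d(c), nn.ReLU(inplace=True),   # 1x1, no groups (guideline 2)
            nn.Conv2d(c, c, 3, padding=1, groups=c, bias=False), nn.BatchNorm2d(c),     # 3x3 depthwise
            nn.Conv2d(c, c, 1, bias=False), nn.BatchNorm2d(c), nn.ReLU(inplace=True),   # 1x1, no groups
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        left, right = x.chunk(2, dim=1)                       # channel split into c' and c - c'
        out = torch.cat([left, self.branch(right)], dim=1)    # concat instead of element-wise add (guideline 4)
        return channel_shuffle(out, groups=2)
```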
In NanoDet, ShuffleNetV2 1.0x is further lightened: the last layer of convolution is removed, and the 8x, 16x and 32x down-sampled features are extracted and fed into the PAN for multi-scale feature fusion.
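A sketch of this multi-scale extraction, assuming the ShuffleNetV2 1.0x implementation shipped with torchvision as a stand-in backbone (NanoDet uses its own implementation), could look as follows; the wrapper simply drops the final convolution stage and returns the stride-8/16/32 feature maps.

```python
import torch
import torch.nn as nn
from torchvision.models import shufflenet_v2_x1_0

class ShuffleV2Backbone(nn.Module):
    """Wraps ShuffleNetV2 1.0x, discards conv5/fc, and returns the 8x/16x/32x features."""
    def __init__(self):
        super().__init__()
        net = shufflenet_v2_x1_0(weights=None)
        self.stem = nn.Sequential(net.conv1, net.maxpool)  # 4x down-sampling
        self.stage2 = net.stage2   # -> stride 8
        self.stage3 = net.stage3   # -> stride 16
        self.stage4 = net.stage4   # -> stride 32
        # net.conv5 and net.fc are intentionally dropped (the "last layer" removed in the text)

    def forward(self, x: torch.Tensor):
        x = self.stem(x)
        c3 = self.stage2(x)
        c4 = self.stage3(c3)
        c5 = self.stage4(c4)
        return [c3, c4, c5]

feats = ShuffleV2Backbone()(torch.randn(1, 3, 320, 320))
print([f.shape for f in feats])  # spatial sizes 40x40, 20x20, 10x10 for a 320x320 input
```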
Specifically, the feature fusion layer selects PAN (Path Aggregation Network), an improved version of the FPN (Feature Pyramid Network), and makes lightweight modifications on that basis; the Feature Pyramid Network (FPN) is an efficient CNN feature extraction method. A conventional convolutional neural network proceeds from bottom to top, with scale and semantic information changing continuously; the FPN enhances this by adding feature supplementation along top-down paths, so that the finally output features better represent the multi-dimensional information of the input picture.
The left side of the FPN computation graph is an ordinary bottom-up ResNet network used to extract semantic information; it is a process of condensing and expressing features layer by layer. Lower layers reflect shallow image information, while higher layers reflect deeper contour or category information of the objects in the image.
The top-down path on the right side of the FPN computation graph takes into account the key role of higher-layer information in the subsequent target detection task: the output of an upper layer is up-sampled (linear interpolation is used here) and serves as input to the adjacent lower layer. First, a 1x1 convolution is applied to the topmost feature map to reduce the channel dimension and obtain P5; P5 is then up-sampled successively on the way to obtaining P4 and P3. The lateral connections in the middle of the graph fuse the up-sampled high-level semantic information with the localization details of the corresponding size from before down-sampling: C3 and C4 are adjusted by 1x1 convolutions to match the channel numbers of P3 and P4, element-wise addition is performed to obtain P3 and P4, and finally these are output together with P5 for the subsequent tasks.
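The top-down computation just described can be sketched, assuming three input levels, bilinear interpolation, and illustrative channel counts, roughly as follows.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Minimal FPN top-down pathway: 1x1 lateral convolutions, up-sampling, element-wise addition."""
    def __init__(self, in_channels=(116, 232, 464), out_channels=96):
        super().__init__()
        self.laterals = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])

    def forward(self, feats):                       # feats = [C3, C4, C5] at strides 8/16/32
        c3, c4, c5 = [lat(f) for lat, f in zip(self.laterals, feats)]
        p5 = c5
        p4 = c4 + F.interpolate(p5, size=c4.shape[-2:], mode="bilinear", align_corners=False)
        p3 = c3 + F.interpolate(p4, size=c3.shape[-2:], mode="bilinear", align_corners=False)
        return [p3, p4, p5]
```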
The FPN layer in the NanoDet model selects the PANet network. PAN is an improved version of the FPN: on the basis of the FPN, it adds bottom-up path augmentation, which improves the utilization of low-level information.
In addition, in order to lighten the model, all convolution operations inside the PAN are removed; only the 1x1 convolutions applied after backbone feature extraction are retained to align the feature channel dimensions, and up-sampling and down-sampling are done by interpolation. Furthermore, the multi-scale feature maps are not combined by concatenation; direct addition is chosen instead, so the computation of the whole feature fusion module is greatly reduced.
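A minimal sketch in that spirit is given below: only 1x1 convolutions for channel alignment, interpolation for resizing, and element-wise addition for fusion; the exact fusion order and channel counts are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LightPAN(nn.Module):
    """Convolution-free PAN sketch: 1x1 channel alignment, interpolation, element-wise addition."""
    def __init__(self, in_channels=(116, 232, 464), out_channels=96):
        super().__init__()
        self.align = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])

    @staticmethod
    def _resize_add(src, dst):
        # Interpolate src to dst's spatial size (up or down) and add element-wise.
        return dst + F.interpolate(src, size=dst.shape[-2:], mode="bilinear", align_corners=False)

    def forward(self, feats):                       # [C3, C4, C5] from the backbone
        c3, c4, c5 = [a(f) for a, f in zip(self.align, feats)]
        # top-down (FPN) pass
        p4 = self._resize_add(c5, c4)
        p3 = self._resize_add(p4, c3)
        # bottom-up (PAN) pass
        n3 = p3
        n4 = self._resize_add(n3, p4)
        n5 = self._resize_add(n4, c5)
        return [n3, n4, n5]
```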
Specifically, NanoDet selects the detection head of the FCOS (Fully Convolutional One-Stage Object Detection) model. The idea of the FCOS model is to predict, for each point in the input image, the target class and target box to which it belongs; the overall architecture of the model is similar to the FPN (Feature Pyramid Network) structure, and prediction is performed on 5 fused feature layers.
In the figure, the 3 output layers are the classification branch, the Center-ness branch and the regression branch (a brief sketch of these three branches follows the list below).
1. H × W in the classification branch indicates the size of the feature map, and C indicates the number of classes.
2. The Center-ness branch is used to calculate the distance between each point and the target center point, excluding prediction points that are far from the target center.
3. The regression branch outputs 4 values (l, t, r, b), representing the distances from the point to the 4 edges of the target box, respectively.
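A simplified sketch of such a per-level head with the three branches, assuming PyTorch and illustrative channel counts (this is not the exact FCOS or NanoDet implementation), is as follows.

```python
import torch
import torch.nn as nn

class FCOSStyleHead(nn.Module):
    """Per-level head with classification, center-ness and box-regression outputs."""
    def __init__(self, in_channels=96, num_classes=2, tower_depth=2):
        super().__init__()
        layers = []
        for _ in range(tower_depth):
            layers += [nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.ReLU(inplace=True)]
        self.tower = nn.Sequential(*layers)
        self.cls_out = nn.Conv2d(in_channels, num_classes, 3, padding=1)   # H x W x C class scores
        self.ctr_out = nn.Conv2d(in_channels, 1, 3, padding=1)             # H x W x 1 center-ness
        self.reg_out = nn.Conv2d(in_channels, 4, 3, padding=1)             # H x W x 4: (l, t, r, b)

    def forward(self, feat):
        t = self.tower(feat)
        # distances to the box edges are non-negative, hence the ReLU on the regression output
        return self.cls_out(t), self.ctr_out(t), torch.relu(self.reg_out(t))
```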
The labeled box and category information of a target in an image are defined as
B = (x0, y0, x1, y1, c)
where the first 4 values are the coordinates of the upper-left and lower-right corner points respectively and the last value is the integer category; each pixel point on the input image can then determine its regression target according to whether it falls inside the labeled box, calculated as in formula (1):
l* = x - x0, t* = y - y0, r* = x1 - x, b* = y1 - y    (1)
In the above formula, (x, y) are the coordinates of a pixel point. Points falling outside the labeled box are negative samples and their category is set to 0; a pixel point inside the labeled box is a positive sample, and its category is the target category of that box (a non-zero integer). In addition, in order to solve the problem of recognizing overlapping image regions in FIG. 8, the FPN structure is introduced into the FCOS model so that target boxes of different scales are predicted on different feature layers, which separates most of the overlapping target boxes. Whether the maximum of the 4 values (l*, t*, r*, b*) of a pixel point lies within a preset range is used to decide which feature layer the point belongs to, so that each feature layer has a preset scale range; for example, the maximum-value range corresponding to the P3 layer is [0, 64] and that corresponding to the P4 layer is [64, 128]. For target boxes that still cannot be separated, the training target is computed from the box with the smallest area among the overlapping target boxes.
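A compact sketch of this per-point assignment for a single ground-truth box, assuming PyTorch, is shown below; the function name and signature are illustrative, and the smallest-area tie-break for overlapping boxes is only noted in a comment.

```python
import torch

def fcos_style_targets(points, gt_box, gt_label, regress_range):
    """Per-point targets for one ground-truth box, following formula (1) and the scale ranges above.

    points: (N, 2) tensor of pixel coordinates (x, y) on one feature level
    gt_box: (4,) tensor (x0, y0, x1, y1); gt_label: positive integer class id
    regress_range: (lo, hi) bounds on max(l*, t*, r*, b*) handled by this level, e.g. (0, 64) for P3
    """
    x, y = points[:, 0], points[:, 1]
    ltrb = torch.stack([x - gt_box[0], y - gt_box[1], gt_box[2] - x, gt_box[3] - y], dim=1)
    inside_box = ltrb.min(dim=1).values > 0                      # inside the labeled box -> candidate positive
    max_dist = ltrb.max(dim=1).values
    in_range = (max_dist >= regress_range[0]) & (max_dist <= regress_range[1])
    positive = inside_box & in_range
    cls_target = torch.zeros(points.shape[0], dtype=torch.long)  # 0 = negative sample
    cls_target[positive] = gt_label
    # With several overlapping boxes, the box with the smallest area would be kept for each point.
    return cls_target, ltrb
```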
In addition, FCOS introduces the concept of "center-ness" to address the problem that some false detection boxes lie far from the center point of the real box. The idea is to multiply the center-ness by the corresponding classification score to compute a final score (used for ranking the detected bounding boxes). Concretely, a Center-ness branch is set in parallel with the classification branch; the center-ness value lies between 0 and 1, and the closer a point inside the target box is to the center point, the larger its weight. Finally, low-quality bounding boxes are filtered out by Non-Maximum Suppression (NMS). The center-ness is calculated as in formula (2):
centerness* = sqrt( min(l*, r*)/max(l*, r*) × min(t*, b*)/max(t*, b*) )    (2)
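A small sketch of formula (2), and of how the ranking score would be formed from it, assuming PyTorch:

```python
import torch

def centerness(ltrb: torch.Tensor) -> torch.Tensor:
    """Formula (2) for (N, 4) regression targets (l*, t*, r*, b*); the result lies in [0, 1]."""
    l, t, r, b = ltrb.unbind(dim=1)
    lr = torch.minimum(l, r) / torch.maximum(l, r).clamp(min=1e-6)
    tb = torch.minimum(t, b) / torch.maximum(t, b).clamp(min=1e-6)
    return torch.sqrt(lr * tb)

# Ranking score used before NMS: classification score weighted by center-ness, e.g.
# final_score = cls_score * centerness(ltrb)
```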
In NanoDet, the FCOS detection head is transformed for light weight. First, the Center-ness branch, which is difficult to converge during training, is removed. In addition, whereas the FCOS detection head uses 4 convolutions with 256 channels per branch, NanoDet compresses this to 2 convolutions with 96 channels, replaces the ordinary convolutions with depthwise separable convolutions, computes box regression and classification with the same group of convolutions, and finally splits the output into the two parts. Furthermore, since sharing weights between detection heads in FCOS reduces the number of parameters but weakens the detection capability of the model, each detection head in NanoDet independently uses its own set of convolutions. NanoDet also changes the GN (group normalization) used in FCOS to BN (batch normalization), because the normalization parameters of BN can be fused directly into the convolution at inference time, reducing the amount of computation.
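Two of these ingredients can be sketched concretely, assuming PyTorch and illustrative channel counts: a depthwise separable convolution block with BN, and the folding of BN statistics into the preceding convolution at inference time, which is the reason BN is preferred over GN here.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 + pointwise 1x1 with BatchNorm, standing in for an ordinary 3x3 convolution."""
    def __init__(self, channels: int = 96):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels, bias=False)
        self.pw = nn.Conv2d(channels, channels, 1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pw(self.dw(x))))

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold trained BN statistics into the preceding convolution for inference."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, conv.dilation, conv.groups, bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)          # per-output-channel scale
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    bias = conv.bias.data if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.data = (bias - bn.running_mean) * scale + bn.bias.data
    return fused
```

For example, fuse_conv_bn(block.pw, block.bn) would replace the pointwise convolution and its BN in the block above with a single convolution for inference.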
While the invention has been described above with reference to an embodiment, various modifications may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In particular, the various features of the disclosed embodiments may be used in any combination provided that no structural conflict exists; such combinations are not described exhaustively in this specification merely for the sake of brevity and economy. Therefore, the invention is not limited to the particular embodiments disclosed, but includes all embodiments falling within the scope of the appended claims.

Claims (5)

1. An artificial intelligence film reading method based on a convolutional neural network algorithm is characterized by comprising the following steps:
step one: acquiring a target image to be detected;
step two: selecting an image area;
step three: extracting target features;
step four: classifying the target according to the characteristics;
step five: conducting regression on the target bounding box;
step six: optimizing the structure;
step seven: and finishing target detection.
2. The artificial intelligence film reading method based on the convolutional neural network algorithm as claimed in claim 1, wherein: the algorithm is implemented mainly by means of the NanoDet model, and NanoDet comprises a feature extraction backbone network, a feature fusion network and a detection head.
3. The artificial intelligence film reading method based on the convolutional neural network algorithm as claimed in claim 2, wherein: in a neural network, especially in the field of computer vision (CV), features of an image are generally extracted first; this part is the foundation of the whole CV task, so this part of the network structure is called the backbone; NanoDet selects ShuffleNetV2 1.0x as the backbone, ShuffleNetV2 1.0x being a modified version of ShuffleNetV1, and the modification follows the following 4 guidelines:
(1) keeping the input and output channel numbers equal minimizes the memory access cost (MAC), at which point the model is fastest;
(2) excessive use of group convolution increases the MAC and slows the model down;
(3) the fewer the model branches, the simpler the model and the faster it runs;
(4) element-wise operations also negatively impact model speed.
4. The artificial intelligence film reading method based on the convolutional neural network algorithm as claimed in claim 2, wherein: the feature fusion layer selects PAN (Path Aggregation Network), an improved version of the FPN (Feature Pyramid Network), and makes lightweight modifications on that basis; the Feature Pyramid Network (FPN) is an efficient CNN feature extraction method; a conventional convolutional neural network proceeds from bottom to top, with scale and semantic information changing continuously, and the FPN enhances this by adding feature supplementation along top-down paths, so that the finally output features better represent the multi-dimensional information of the input picture.
5. The artificial intelligence film reading method based on the convolutional neural network algorithm as claimed in claim 2, wherein: NanoDet selects the detection head of the FCOS (Fully Convolutional One-Stage Object Detection) model; the idea of the FCOS model is to predict, for each point in the input image, the target class and target box to which it belongs; the overall architecture of the FCOS model is similar to the FPN (Feature Pyramid Network) structure, and prediction is performed on 5 fused feature layers.
CN202110813909.9A 2021-07-19 2021-07-19 Artificial intelligence film reading method based on convolutional neural network algorithm Pending CN113555087A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110813909.9A CN113555087A (en) 2021-07-19 2021-07-19 Artificial intelligence film reading method based on convolutional neural network algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110813909.9A CN113555087A (en) 2021-07-19 2021-07-19 Artificial intelligence film reading method based on convolutional neural network algorithm

Publications (1)

Publication Number Publication Date
CN113555087A true CN113555087A (en) 2021-10-26

Family

ID=78132096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110813909.9A Pending CN113555087A (en) 2021-07-19 2021-07-19 Artificial intelligence film reading method based on convolutional neural network algorithm

Country Status (1)

Country Link
CN (1) CN113555087A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113689472A (en) * 2021-10-26 2021-11-23 城云科技(中国)有限公司 Moving target detection method, device and application
CN114463759A (en) * 2022-04-14 2022-05-10 浙江霖研精密科技有限公司 Lightweight character detection method and device based on anchor-frame-free algorithm
CN114495109A (en) * 2022-01-24 2022-05-13 山东大学 Grabbing robot based on matching of target and scene characters and grabbing method and system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490892A (en) * 2019-07-03 2019-11-22 中山大学 A kind of Thyroid ultrasound image tubercle automatic positioning recognition methods based on USFaster R-CNN
CN112613508A (en) * 2020-12-24 2021-04-06 深圳市杉川机器人有限公司 Object identification method, device and equipment

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
NINGNING MA et al.: "ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design", arXiv:1807.11164v1 [cs.CV] *
RANGILYU: "An alternative beyond YOLO: NanoDet, an anchor-free object detection model running at 97 FPS on mobile, is now open source", Zhihu, https://zhuanlan.zhihu.com/p/306530300 *
TSUNG-YI LIN et al.: "Feature Pyramid Networks for Object Detection", 2017 IEEE Conference on Computer Vision and Pattern Recognition *
ZHI TIAN et al.: "FCOS: Fully Convolutional One-Stage Object Detection", 2019 IEEE/CVF International Conference on Computer Vision *


Similar Documents

Publication Publication Date Title
Dong et al. Classification of cataract fundus image based on deep learning
CN113555087A (en) Artificial intelligence film reading method based on convolutional neural network algorithm
CN110211087B (en) Sharable semiautomatic marking method for diabetic fundus lesions
CN109635846A (en) A kind of multiclass medical image judgment method and system
CN111767952B (en) Interpretable lung nodule benign and malignant classification method
CN109544507A (en) A kind of pathological image processing method and system, equipment, storage medium
CN109858429A (en) A kind of identification of eye fundus image lesion degree and visualization system based on convolutional neural networks
Cao et al. Gastric cancer diagnosis with mask R-CNN
CN113724206B (en) Fundus image blood vessel segmentation method and system based on self-supervision learning
CN114004811A (en) Image segmentation method and system based on multi-scale residual error coding and decoding network
CN110570419A (en) Method and device for acquiring characteristic information and storage medium
CN113160120A (en) Liver blood vessel segmentation method and system based on multi-mode fusion and deep learning
CN117036288A (en) Tumor subtype diagnosis method for full-slice pathological image
CN112233085A (en) Cervical cell image segmentation method based on pixel prediction enhancement
CN111784713A (en) Attention mechanism-introduced U-shaped heart segmentation method
CN115147640A (en) Brain tumor image classification method based on improved capsule network
Magpantay et al. A transfer learning-based deep CNN approach for classification and diagnosis of acute lymphocytic leukemia cells
CN112508827A (en) Deep learning-based multi-scene fusion endangered organ segmentation method
CN113393445B (en) Breast cancer image determination method and system
CN114359308A (en) Aortic dissection method based on edge response and nonlinear loss
CN112967269A (en) Pulmonary nodule identification method based on CT image
Zheng et al. WPNet: Wide Pyramid Network for Recognition of HER2 Expression Levels in Breast Cancer Evaluation
Wu et al. Mscan: Multi-scale channel attention for fundus retinal vessel segmentation
CN112668668B (en) Postoperative medical image evaluation method and device, computer equipment and storage medium
Essaf et al. Review on deep learning methods used for computer-aided lung cancer detection and diagnosis

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 2021-10-26