CN111724345A - Pneumonia picture verification device and method capable of adaptively adjusting size of receptive field - Google Patents

Pneumonia picture verification device and method capable of adaptively adjusting size of receptive field

Info

Publication number
CN111724345A
CN111724345A (application CN202010422064.6A)
Authority
CN
China
Prior art keywords
network
feature
detection
module
pneumonia
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010422064.6A
Other languages
Chinese (zh)
Inventor
武昱忻
李锵
关欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202010422064.6A priority Critical patent/CN111724345A/en
Publication of CN111724345A publication Critical patent/CN111724345A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10116X-ray image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30061Lung

Abstract

The invention relates to the fields of medical equipment, deep-learning convolutional neural networks, and target detection and positioning. To improve the efficiency of diagnosing chest X-ray films, the invention discloses a pneumonia picture verification device and method capable of adaptively adjusting the size of the receptive field. The device comprises an X-ray machine and a computer; pictures taken by the X-ray machine are input into the computer, which comprises a feature extraction network module, a feature pyramid module, a classification sub-branch module and a regression sub-branch module. ResNet50 and ResNet101 are each combined with selective kernel convolution to form SK-ResNet50 and SK-ResNet101, which serve as feature extraction networks. The extracted features are input into the feature pyramid module for processing, and the feature pyramid outputs a set of feature maps. The classification sub-branch module outputs the detection score of each prediction box, and the regression sub-branch module outputs the position of each prediction box, where the prediction box is the predicted pneumonia lesion area. The invention is mainly applied in design and manufacturing settings.

Description

Pneumonia picture verification device and method capable of adaptively adjusting size of receptive field
Technical Field
The invention relates to the fields of medical instruments, deep-learning convolutional neural networks, and target detection and positioning. It improves the combination of a dynamic selection unit with a target detection network so that each neuron can adaptively adjust the size of its receptive field according to multi-scale input information, i.e., the size of the target, and thereby detect and locate pneumonia in chest X-ray images more accurately. In particular, the invention relates to a pneumonia detection device and positioning method capable of adaptively adjusting the size of the receptive field.
Background
Pneumonia is a serious pulmonary disease: an inflammation of the alveoli caused by bacteria, viruses, fungi and other pathogens, which can worsen rapidly if not treated in time and lead to complications such as heart failure, empyema, lung abscess, myocarditis or toxic encephalitis. Every year roughly 450 million people worldwide contract pneumonia and about 4 million die from it. The gap between the infection and mortality figures shows how important early diagnosis is. Pneumonia appears as an area of increased opacity on chest X-rays, and it is currently diagnosed mainly by radiologists reading those X-rays. However, manual reading is time-consuming and labor-intensive, so radiologists are overwhelmed by the ever-growing volume of data, and subjective factors easily lead to misdiagnosis and missed diagnosis.
In recent years, with the development of deep learning (DL), radiology has attracted great attention because DL can be applied to many clinical imaging problems, and researchers at home and abroad have actively studied chest X-ray images. Wu et al. designed an X-ray pneumonia prediction device based on a convolutional neural network, adopting ResNet50 as the classification network and the Faster Region-based Convolutional Neural Network (Faster R-CNN) as the detection model; however, Faster R-CNN is a two-stage detector, so the model is complex and detection is slow. Amit et al. first fed the X-ray image into a candidate-region (ROI Align) classifier, then segmented it with the Fast R-CNN model to predict bounding boxes, adjusting the threshold during training to improve the results, but the accuracy remained low. Therefore, accurately detecting pneumonia with a low-complexity deep learning framework, as a complement to diagnosis by radiologists, is important both for reducing radiologist workload and for making early diagnoses.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a chest X-ray diagnosis method and device that improve the efficiency of chest X-ray diagnosis. To this end, the pneumonia picture verification device capable of adaptively adjusting the size of the receptive field comprises an X-ray machine and a computer. Pictures taken by the X-ray machine are input into the computer, which comprises a feature extraction network module, a feature pyramid module, a classification sub-branch module and a regression sub-branch module. ResNet50 and ResNet101 are each combined with selective kernel convolution to form SK-ResNet50 and SK-ResNet101, which serve as feature extraction networks; ResNet50 is a 50-layer residual network and ResNet101 is a 101-layer residual network. The extracted features are input into the feature pyramid module for processing, and the feature pyramid outputs a set of feature maps. The classification sub-branch module is a fully convolutional network (FCN) module connected to each feature map output by the feature pyramid network; it outputs the detection score of each prediction box. The regression sub-branch module is likewise an FCN module connected to each feature map output by the feature pyramid network; it outputs the position of each prediction box, where the prediction box is the predicted pneumonia lesion area.
A pneumonia picture verification method capable of adaptively adjusting the size of the receptive field establishes a selective kernel convolution retina network, SK-RetinaNet, which comprises three parts: SK-ResNet, a feature pyramid network, and the sub-branches. SK-ResNet adds an SK unit, which adaptively adjusts the size of the receptive field, to each residual block of the residual network ResNet and serves as the feature extraction network. The feature pyramid network builds a top-down multi-scale set of feature maps from the feature maps output by each stage of SK-ResNet. The sub-branches consist of a classification sub-branch and a regression sub-branch; the classification sub-branch is a fully convolutional network (FCN) connected to each feature map output by the feature pyramid network, the regression sub-branch is likewise connected to each feature map output by the feature pyramid network, and together they carry out detection and localization at multiple scales.
The result obtained with SK-ResNet50 as the feature extraction network is fused with the result obtained with SK-ResNet101 as the feature extraction network. Each detection result comprises the coordinates of the upper-left and lower-right corners of the prediction box and its detection score. The fusion method is: after the two detections are completed, the detection scores are used as weights to adjust the coordinates of the upper-left and lower-right corner points of the boxes detected by the two models.
The method comprises the following specific steps:
The detection results of the networks using SK-ResNet50 and SK-ResNet101 as feature extractors are fused. Each detection result comprises the coordinates of the upper-left and lower-right corners of the prediction box and its detection score. After the two detections are completed, the detection scores are used as weights to adjust the coordinates of the upper-left and lower-right corner points of the result boxes of the two models. For the abscissa of the upper-left corner point, the fusion is given by formula (7):
$$T_{lx} = \frac{s_1 \cdot t_{lx1} + s_2 \cdot t_{lx2}}{s_1 + s_2} \qquad (7)$$
where t_lx1 and s_1 are the abscissa of the upper-left corner point of the prediction box and the detection score when SK-ResNet50 is the feature extraction network, t_lx2 and s_2 are the corresponding abscissa and detection score when SK-ResNet101 is the feature extraction network, and T_lx is the abscissa of the fused upper-left corner point;
similarly, for the abscissa of the lower-right corner point, the fusion is given by formula (8):
$$T_{rx} = \frac{s_3 \cdot t_{rx1} + s_4 \cdot t_{rx2}}{s_3 + s_4} \qquad (8)$$
where t_rx1 and s_3 are the abscissa of the lower-right corner point of the prediction box and the detection score when SK-ResNet50 is the feature extraction network, t_rx2 and s_4 are the corresponding abscissa and detection score when SK-ResNet101 is the feature extraction network, and T_rx is the abscissa of the fused lower-right corner point.
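To make the score-weighted fusion in formulas (7) and (8) concrete, the following Python sketch fuses two prediction boxes given in (x1, y1, x2, y2) form; the function name, the tuple representation and the sample numbers are illustrative assumptions, not part of the patent.

```python
def fuse_boxes(box50, score50, box101, score101):
    """Score-weighted fusion of two prediction boxes (illustrative sketch).

    box50 / box101: (x1, y1, x2, y2) corner coordinates predicted with
    SK-ResNet50 and SK-ResNet101 as feature extraction networks.
    score50 / score101: the corresponding detection scores, used as weights
    in the spirit of formulas (7) and (8).
    """
    total = score50 + score101
    return tuple(
        (score50 * c50 + score101 * c101) / total
        for c50, c101 in zip(box50, box101)
    )

# Usage example (hypothetical numbers):
# fuse_boxes((100, 80, 220, 200), 0.82, (110, 90, 230, 210), 0.74)
# -> each coordinate is pulled toward the higher-scoring model's prediction.
```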
The SK unit comprises three operations: splitting, fusing and selecting:
the first operation is splitting: the input feature map X of dimension (H′ × W′ × C′) is expanded into two branches by convolutions with different kernel sizes, the two transformations being
$$\tilde{\mathcal{F}}: \mathbf{X} \rightarrow \tilde{\mathbf{U}} \in \mathbb{R}^{H \times W \times C}, \qquad \hat{\mathcal{F}}: \mathbf{X} \rightarrow \hat{\mathbf{U}} \in \mathbb{R}^{H \times W \times C}$$
each convolution is followed by batch normalization (BN) and a ReLU activation function;
the second operation is fusing: the two resulting feature maps are added element-wise:
$$\mathbf{U} = \tilde{\mathbf{U}} + \hat{\mathbf{U}}$$
then global average pooling is performed:
$$s_c = \mathcal{F}_{gp}(\mathbf{U}_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \mathbf{U}_c(i,j)$$
yielding a one-dimensional feature vector s ∈ R^C with a global receptive field, which is then passed through a fully connected layer for a non-linear transformation that reduces the dimensionality:
$$\mathbf{z} = \mathcal{F}_{fc}(\mathbf{s}) = \delta(\beta(\mathbf{W}\mathbf{s}))$$
where δ denotes the ReLU activation function, β denotes batch normalization (BN), and W ∈ R^{d×C}; to study the effect of d on model efficiency, a reduction ratio r is used to control its value:
$$d = \max(C/r,\; L)$$
wherein L represents the minimum value of d;
finally, the third operation is selecting: first a softmax operation is applied to z:
$$a_c = \frac{e^{\mathbf{A}_c \mathbf{z}}}{e^{\mathbf{A}_c \mathbf{z}} + e^{\mathbf{B}_c \mathbf{z}}}, \qquad b_c = \frac{e^{\mathbf{B}_c \mathbf{z}}}{e^{\mathbf{A}_c \mathbf{z}} + e^{\mathbf{B}_c \mathbf{z}}}$$
where A, B ∈ R^{C×d}, A_c ∈ R^{1×d} is the row of A corresponding to the c-th channel, and a_c is the c-th element of a; with only two branches, B is a redundant matrix because a_c + b_c = 1. The final feature map V is obtained from the following formula:
$$\mathbf{V}_c = a_c \cdot \tilde{\mathbf{U}}_c + b_c \cdot \hat{\mathbf{U}}_c, \qquad a_c + b_c = 1$$
in SK-RetinaNet, the neuron reception fields obtained by convolution of convolution kernels with different sizes in slicing operation are different, and the size of the reception field is self-adaptively adjusted according to the different size of the input target; on the other hand, the fusion operation in the SK can realize that the weight of each characteristic channel is automatically trained according to the importance degree of different channels under the condition that the training parameters are not excessively increased, so that a more accurate result is achieved.
The invention has the characteristics and beneficial effects that:
the invention realizes the tasks of detecting and positioning the pneumonia focus by using a deep learning method, and greatly improves the efficiency compared with the judgment of the traditional doctor. The convolutional neural network can input images into the models in batches for detection, so that the burden of doctors is reduced, and the detection speed is increased.
By combining the SK unit with the RetinaNet detection network, the invention removes the limitation that, in an ordinary convolutional neural network, the receptive field of every artificial neuron in a layer is designed with the same size; each neuron can adaptively adjust its receptive field according to multi-scale input information, improving detection accuracy while keeping network complexity low.
Before the training pictures are input into the neural network, the algorithm applies random flipping, random scaling, coordinate-space translation, and brightness and contrast adjustment, so the model generalizes well.
Description of the drawings:
FIG. 1 is a schematic diagram of the SK unit.
FIG. 2 is the algorithm framework of the pneumonia detection network (RetinaNet).
FIG. 3 is a schematic chest X-ray film.
FIG. 4 shows the feature pyramid construction process.
Detailed Description
As the number of pneumonia patients grows, the problem that radiologists lack the time and energy to read every X-ray film becomes increasingly prominent. A neural network can save labor and quickly locate pneumonia lesions, but pneumonia appears only as increased opacity in an X-ray film and cannot be judged clearly from appearance, so the discriminative power of the features a neural network extracts is limited and the accuracy of the results is not high enough. Simply increasing the number of network layers raises the time complexity and reduces efficiency. How to improve detection accuracy with a neural network without excessively increasing its complexity is therefore an urgent problem.
The general technical scheme of the invention is as follows. The system is roughly divided into three parts: the feature extraction network, the feature pyramid, and the classification and regression sub-branches. ResNet50 (a 50-layer residual network) and ResNet101 (a 101-layer residual network) are each combined with Selective Kernel convolution (SK) to form SK-ResNet50 and SK-ResNet101, which serve as feature extraction networks. A feature pyramid network is then built with top-down lateral connections, so that subsequent detection on feature maps of different levels uses both high-level and low-level features. Next, a classification sub-branch and a regression sub-branch are built on each of the five feature maps P3-P7 output by the feature pyramid (see the detailed description below); they output the detection scores of the prediction boxes and locate their positions. Finally, the results obtained with SK-ResNet50 and SK-ResNet101 as feature extraction networks are fused, and the coordinates of the prediction boxes are adjusted using the detection scores as weights.
The concept related to the present invention is:
the prior frames are on five feature maps P3-P7 generated by the feature pyramid, and each feature map to be detected has nine different prior frames which comprise three different aspect ratios (0.5, 1, 2) and three different area changes (2)0,21/3,22/3). This is the first step to roughly select the blocks, and subsequent passage through the regression sub-branch will adjust its position to become the prediction blocks.
The target box is the true location of the pneumonia lesion, given by the labels in the training set.
The prediction box is the position of the pneumonia lesion predicted once the network has finished processing.
The invention proposes a framework that combines the classical detection network RetinaNet with the SK unit, which adaptively adjusts the size of the receptive field. The network framework is improved in two respects. First, SK units that adaptively adjust the receptive field size are added to ResNet (a residual network), imitating the principle that the receptive field size of visual cortical neurons adjusts to the stimulus; this yields a detection network with an adaptive receptive field and improves detection accuracy without excessively increasing the training parameters. Second, a prediction-box fusion algorithm is added: SK-ResNet50 and SK-ResNet101 are each used as the backbone network, and the scores predicted by the classification sub-branch are used as weights to adjust the positions of the prediction boxes, producing more accurate prediction boxes.
The hardware of the invention consists of: an Intel Core i7-6800K 3.5 GHz CPU, two Nvidia GTX 1080 Ti (11 GB) GPUs, and the Ubuntu 16.04 operating system, using the open-source deep learning framework Keras.
The invention comprises three parts: SK-ResNet50 or SK-ResNet101, the feature pyramid network, and the sub-branches. SK-ResNet50 or SK-ResNet101 is ResNet50 or ResNet101 with an SK unit added to each residual block, and serves as the feature extraction network; the feature pyramid network builds a top-down multi-scale set of feature maps from the feature maps output by each stage of SK-ResNet50 or SK-ResNet101; the sub-branches carry out detection and localization at multiple scales.
ResNet50 is a residual network of 50 layers, ResNet101 is a residual network of 101 layers, and the specific structure is shown in Table 1.
TABLE 1 ResNet50 and ResNet101 network architectures
SK is added into each residual block of ResNet50 or ResNet101 to obtain SK-ResNet50 or SK-ResNet101, and the specific structure is shown in Table 2.
TABLE 2 SK-ResNet50 and SK-ResNet101 network architectures
The respective network structures of the SK-ResNet, the feature pyramid network and the subbranches are as follows:
1. SK-ResNet50 or SK-ResNet101 adds an SK unit to each convolution block of ResNet50 or ResNet101.
2. The construction process of the feature pyramid network is as follows:
a) C5 is reduced to 256 channels by a 1×1 convolution; the result is upsampled to the same size as C4 to give P5_upsampled, and passes through a 3×3 convolution to become feature map P5.
b) C4 is reduced to 256 channels by a 1×1 convolution and added element-wise to P5_upsampled; the sum is upsampled to the same size as C3 to give P4_upsampled, and also passes through a 3×3 convolution to become feature map P4.
c) C3 is reduced to 256 channels by a 1×1 convolution and added element-wise to P4_upsampled; the result passes through a 3×3 convolution to become feature map P3.
d) C5 passes through a 3×3 convolution with stride 2 to become feature map P6.
e) P6 passes through a ReLU activation followed by a 3×3 convolution with stride 2 to become feature map P7, so that the feature pyramid network outputs the five feature maps P3-P7.
Here C3, C4 and C5 are the feature maps output by the 2nd, 3rd and 4th convolution blocks (block3, block4 and block5) of ResNet50, respectively.
The feature pyramid is built with a top-down pathway and lateral connections; the goal is to let subsequent detection on feature maps of different levels use both high-level and low-level features.
FIG. 4 shows the feature pyramid construction process: the chest X-ray film is convolved layer by layer and the feature maps become progressively smaller. During construction, the element-wise addition and upsampling operations form the top-down pathway with lateral connections and produce five feature maps of different sizes. A classification sub-branch and a regression sub-branch are then built on each feature map, so that detection on the different levels uses both high-level and low-level features.
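Steps a)-e) can be sketched in Keras roughly as follows, assuming C3, C4 and C5 are already available as tensors and that each backbone stage halves the spatial size; the function and variable names are illustrative, not the patent's exact implementation.

```python
from tensorflow.keras import layers

def build_feature_pyramid(C3, C4, C5, feature_size=256):
    """Rough sketch of steps a)-e): build P3-P7 from backbone outputs C3-C5."""
    # a) C5 -> 1x1 conv (256 ch); upsample to C4 size; 3x3 conv -> P5
    P5 = layers.Conv2D(feature_size, 1, padding="same")(C5)
    P5_upsampled = layers.UpSampling2D(2)(P5)
    P5 = layers.Conv2D(feature_size, 3, padding="same")(P5)

    # b) C4 -> 1x1 conv, add P5_upsampled; upsample to C3 size; 3x3 conv -> P4
    P4 = layers.Conv2D(feature_size, 1, padding="same")(C4)
    P4 = layers.Add()([P5_upsampled, P4])
    P4_upsampled = layers.UpSampling2D(2)(P4)
    P4 = layers.Conv2D(feature_size, 3, padding="same")(P4)

    # c) C3 -> 1x1 conv, add P4_upsampled; 3x3 conv -> P3
    P3 = layers.Conv2D(feature_size, 1, padding="same")(C3)
    P3 = layers.Add()([P4_upsampled, P3])
    P3 = layers.Conv2D(feature_size, 3, padding="same")(P3)

    # d) P6 comes from a stride-2 3x3 convolution on C5
    P6 = layers.Conv2D(feature_size, 3, strides=2, padding="same")(C5)

    # e) P7 comes from ReLU followed by a stride-2 3x3 convolution on P6
    P7 = layers.Activation("relu")(P6)
    P7 = layers.Conv2D(feature_size, 3, strides=2, padding="same")(P7)

    return [P3, P4, P5, P6, P7]
```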
3. The subbranches are composed of classification subbranches and regression subbranches.
The structure of the classification sub-branch is simple: four 3×3 convolutions, each with 256 channels and a ReLU activation, followed by a 3×3 convolution with 9 channels (one per prior box) and finally a sigmoid activation. It is a small fully convolutional network (FCN) connected to every feature map output by the feature pyramid network, its parameters are shared across all pyramid levels, and at each position it predicts the probability that an object is present for each of the 9 prior boxes. The regression sub-branch has a similar structure, except that the four 3×3 convolutions are followed by a 3×3 convolution with 36 channels (4 coordinates × 9 prior boxes); it is likewise a small FCN connected to every feature map output by the feature pyramid network and runs in parallel with the classification sub-branch. Its purpose is to regress the offsets between each prior box and the nearest target box, continuously updating the four coordinate values of the upper-left and lower-right corner points, so each pyramid level produces a 4 × 9 dimensional linear output per position. The classification sub-branch outputs the detection score of the prediction box and the regression sub-branch regresses its position.
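A rough Keras sketch of these two sub-branches, with num_anchors = 9 following the prior-box setup above; the layer names, the single foreground class and other details are assumptions made for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_heads(feature_size=256, num_anchors=9):
    """Classification and regression sub-branches (shared across P3-P7)."""
    # Classification sub-branch: four 3x3/256 convs + ReLU, then 9 channels + sigmoid.
    inp_c = layers.Input(shape=(None, None, feature_size))
    x = inp_c
    for _ in range(4):
        x = layers.Conv2D(feature_size, 3, padding="same", activation="relu")(x)
    cls_out = layers.Conv2D(num_anchors, 3, padding="same", activation="sigmoid")(x)
    cls_head = keras.Model(inp_c, cls_out, name="classification_subbranch")

    # Regression sub-branch: same trunk, then 4 * 9 = 36 channels (box offsets).
    inp_r = layers.Input(shape=(None, None, feature_size))
    y = inp_r
    for _ in range(4):
        y = layers.Conv2D(feature_size, 3, padding="same", activation="relu")(y)
    reg_out = layers.Conv2D(4 * num_anchors, 3, padding="same")(y)
    reg_head = keras.Model(inp_r, reg_out, name="regression_subbranch")
    return cls_head, reg_head

# Applying the same head model to every pyramid level is what shares its
# parameters across P3-P7, while the two heads do not share weights with
# each other:
# cls_head, reg_head = build_heads()
# scores = [cls_head(P) for P in pyramid_features]
# offsets = [reg_head(P) for P in pyramid_features]
```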
FIG. 1 is a schematic diagram of the SK unit, which comprises three operations: splitting, fusing and selecting.
The first operation is splitting: X is a feature map of dimension (H′ × W′ × C′) that can be expanded into multiple branches by convolutions with different kernel sizes; here two branches are used by default,
$$\tilde{\mathcal{F}}: \mathbf{X} \rightarrow \tilde{\mathbf{U}} \in \mathbb{R}^{H \times W \times C}, \qquad \hat{\mathcal{F}}: \mathbf{X} \rightarrow \hat{\mathbf{U}} \in \mathbb{R}^{H \times W \times C},$$
with convolution kernel sizes of 3 and 5, respectively; each convolution is followed by batch normalization (BN) and a ReLU activation function.
The second operation is fusing: the two resulting feature maps are added element-wise:
$$\mathbf{U} = \tilde{\mathbf{U}} + \hat{\mathbf{U}}$$
then global average pooling is performed:
$$s_c = \mathcal{F}_{gp}(\mathbf{U}_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \mathbf{U}_c(i,j)$$
giving a one-dimensional feature vector s ∈ R^C with a global receptive field. A non-linear transformation through a fully connected layer then reduces the dimensionality:
$$\mathbf{z} = \mathcal{F}_{fc}(\mathbf{s}) = \delta(\beta(\mathbf{W}\mathbf{s}))$$
where δ denotes the ReLU activation function, β denotes batch normalization (BN), and W ∈ R^{d×C}. To study the effect of d on model efficiency, the reduction ratio r is used to control its value:
$$d = \max(C/r,\; L)$$
wherein L represents the minimum value of d.
Finally, the third operation is selecting, whose role is to adaptively select information at different spatial scales. First a softmax operation is applied to z:
$$a_c = \frac{e^{\mathbf{A}_c \mathbf{z}}}{e^{\mathbf{A}_c \mathbf{z}} + e^{\mathbf{B}_c \mathbf{z}}}, \qquad b_c = \frac{e^{\mathbf{B}_c \mathbf{z}}}{e^{\mathbf{A}_c \mathbf{z}} + e^{\mathbf{B}_c \mathbf{z}}}$$
where A, B ∈ R^{C×d}, A_c ∈ R^{1×d} is the row of A corresponding to the c-th channel, and a_c is the c-th element of a. With only two branches, B is a redundant matrix because a_c + b_c = 1. The final feature map V is obtained from the following formula:
$$\mathbf{V}_c = a_c \cdot \tilde{\mathbf{U}}_c + b_c \cdot \hat{\mathbf{U}}_c, \qquad a_c + b_c = 1$$
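The three SK operations for the two-branch case above (3×3 and 5×5 kernels, reduction ratio r, minimum dimension L) can be sketched in Keras as follows; the function name, default values and exact layer arrangement are illustrative assumptions rather than the patent's implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

def sk_unit(x, channels, r=16, L=32):
    """Selective kernel unit: split, fuse and select over two branches."""
    # Split: 3x3 and 5x5 convolutions, each followed by BN and ReLU.
    u3 = layers.Conv2D(channels, 3, padding="same")(x)
    u3 = layers.Activation("relu")(layers.BatchNormalization()(u3))
    u5 = layers.Conv2D(channels, 5, padding="same")(x)
    u5 = layers.Activation("relu")(layers.BatchNormalization()(u5))

    # Fuse: element-wise sum, global average pooling, FC reduction to d = max(C/r, L).
    u = layers.Add()([u3, u5])
    s = layers.GlobalAveragePooling2D()(u)          # s in R^C
    d = max(channels // r, L)
    z = layers.Dense(d)(s)
    z = layers.Activation("relu")(layers.BatchNormalization()(z))

    # Select: per-channel softmax over the two branches, then weighted sum.
    a = layers.Dense(channels)(z)                   # corresponds to A z
    b = layers.Dense(channels)(z)                   # corresponds to B z
    ab = tf.nn.softmax(tf.stack([a, b], axis=1), axis=1)   # enforces a_c + b_c = 1
    a_w = tf.reshape(ab[:, 0], (-1, 1, 1, channels))
    b_w = tf.reshape(ab[:, 1], (-1, 1, 1, channels))
    return a_w * u3 + b_w * u5
```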
and fusing the result obtained by taking the SK-ResNet50 as the characteristic extraction network detection and the result obtained by taking the SK-ResNet101 as the characteristic extraction network detection. The detection result comprises the coordinates of the upper left corner and the lower right corner of the prediction box and the detection score of the prediction box. And the fusion method is that after the detection is finished respectively, the detection scores are used as weights, and the coordinates of the upper left corner point and the lower right corner point of the result frames of the two model detections are adjusted. The method comprises the following specific steps:
and fusing the detection results of the networks with SK-ResNet50 and SK-ResNet101 as features respectively. The detection result comprises the coordinates of the upper left corner and the lower right corner of the prediction box and the detection score of the prediction box. And the fusion method is that after the detection is finished respectively, the detection scores are used as weights, and the coordinates of the upper left corner point and the lower right corner point of the result frames of the two model detections are adjusted. Taking the abscissa of the upper left corner point as an example, the fusion method is shown as formula (7):
$$T_{lx} = \frac{s_1 \cdot t_{lx1} + s_2 \cdot t_{lx2}}{s_1 + s_2} \qquad (7)$$
where t_lx1 and s_1 are the abscissa of the upper-left corner point of the prediction box and the detection score when SK-ResNet50 is the feature extraction network, t_lx2 and s_2 are the corresponding abscissa and detection score when SK-ResNet101 is the feature extraction network, and T_lx is the abscissa of the fused upper-left corner point.
The fusion method of the horizontal coordinates of the lower right corner points is shown as the formula (8):
$$T_{rx} = \frac{s_3 \cdot t_{rx1} + s_4 \cdot t_{rx2}}{s_3 + s_4} \qquad (8)$$
where t_rx1 and s_3 are the abscissa of the lower-right corner point of the prediction box and the detection score when SK-ResNet50 is the feature extraction network, t_rx2 and s_4 are the corresponding abscissa and detection score when SK-ResNet101 is the feature extraction network, and T_rx is the abscissa of the fused lower-right corner point.
This fusion is not carried out inside the feature extraction network, feature pyramid or classification/regression sub-branches described above; rather, the detection result of the network using SK-ResNet50 as feature extractor is fused with that of the network using SK-ResNet101 as feature extractor. Each detection result comprises the coordinates of the upper-left and lower-right corners of the prediction box and its detection score, and after the respective detections are completed, the detection scores are used as weights to adjust the corner coordinates of the result boxes of the two models.
FIG. 2 shows the main framework of the pneumonia detection network: the selective-kernel residual network and the feature pyramid network are combined into the backbone, and P3-P7 are feature maps of different scales, each connected to the classification sub-network and the regression sub-network. Compared with other neural networks, ResNet establishes a direct connection between input and output, allowing the original input information to be passed directly to later layers, so the network concentrates on learning the residual between input and output. Shallow layers focus more on detail, higher layers focus more on semantics, and different targets need different features; a feature pyramid network is therefore built on top of ResNet, with feature maps P3-P7 of different sizes, so that low-level and high-level features can be used simultaneously and predictions are made on several levels at once. Prediction boxes are then extracted on every level: the prior boxes are fed into the classification sub-network and the regression sub-network simultaneously, obtaining the score of each prior box as a lesion area and adjusting its position; the two sub-networks do not share weights. Compared with RetinaNet, SK-RetinaNet on the one hand gives neurons different receptive fields through convolutions with different kernel sizes in the split operation, so the receptive field size adapts to the size of the input target; on the other hand, the fuse operation in SK allows the weight of each feature channel to be trained automatically according to channel importance without excessively increasing the training parameters, achieving more accurate results.
Because the training set contains a limited number of pictures and is prone to overfitting, the data are augmented by random flipping, random scaling, coordinate-space translation, and increasing/decreasing brightness and contrast, to prevent overfitting and improve the generalization ability of the model. FIG. 3 is a schematic chest X-ray film.
If the data set is too small, overfitting occurs during training and the detection results suffer. The images are therefore augmented by random flipping and similar operations to expand the data and prevent overfitting; this augmentation is done before training, i.e., before the images are input into the feature extraction network. The data set provided by the Radiological Society of North America (RSNA) for its pneumonia detection challenge is used; it is expanded through random flipping, random scaling, coordinate-space translation, and increasing/decreasing brightness and contrast, and the expanded data set is then input into the feature extraction network for subsequent detection.
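A minimal NumPy sketch of this kind of augmentation, covering random horizontal flipping (with box adjustment) and brightness/contrast jitter; the probability and jitter ranges are illustrative assumptions, and random scaling and coordinate translation are omitted for brevity.

```python
import numpy as np

def augment(image, boxes, rng=np.random.default_rng()):
    """Randomly flip the image horizontally and jitter brightness/contrast.

    image: H x W array of pixel values; boxes: array of [x1, y1, x2, y2] rows.
    """
    h, w = image.shape[:2]
    image = image.astype(np.float32)

    # Random horizontal flip, mirroring the box x-coordinates as well.
    if rng.random() < 0.5:
        image = image[:, ::-1]
        boxes = boxes.copy()
        boxes[:, [0, 2]] = w - boxes[:, [2, 0]]

    # Random brightness (additive) and contrast (multiplicative) jitter.
    brightness = rng.uniform(-20, 20)    # illustrative range
    contrast = rng.uniform(0.9, 1.1)     # illustrative range
    image = np.clip(image * contrast + brightness, 0, 255)

    return image, boxes
```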

Claims (5)

1. A pneumonia picture verification device capable of adaptively adjusting the size of the receptive field, characterized by comprising an X-ray machine and a computer, wherein pictures taken by the X-ray machine are input into the computer, and the computer comprises a feature extraction network module, a feature pyramid module, a classification sub-branch module and a regression sub-branch module; ResNet50 and ResNet101 are each combined with selective kernel convolution to form SK-ResNet50 and SK-ResNet101 as feature extraction networks, ResNet50 being a 50-layer residual network and ResNet101 a 101-layer residual network; the extracted features are input into the feature pyramid module for processing, and the feature pyramid outputs feature maps; the classification sub-branch module is a fully convolutional network (FCN) module connected to each feature map output by the feature pyramid network and outputs the detection score of the prediction box; the regression sub-branch module is also a fully convolutional network (FCN) module connected to each feature map output by the feature pyramid network and outputs the position of the prediction box, the prediction box being the predicted pneumonia lesion area.
2. A pneumonia picture verification method capable of adaptively adjusting the size of the receptive field, characterized in that a selective kernel convolution retina network, SK-RetinaNet, is established, the network comprising three parts: SK-ResNet, a feature pyramid network and sub-branches; SK-ResNet adds an SK unit that adaptively adjusts the size of the receptive field to each residual block of the residual network ResNet and serves as the feature extraction network; the feature pyramid network builds a top-down multi-scale set of feature maps from the feature maps output by each stage of SK-ResNet; the sub-branches consist of a classification sub-branch and a regression sub-branch, the classification sub-branch being a fully convolutional network (FCN) connected to each feature map output by the feature pyramid network, the regression sub-branch likewise being connected to each feature map output by the feature pyramid network, and the classification and regression sub-branches carrying out detection and localization tasks at multiple scales.
3. The pneumonia picture verification method capable of adaptively adjusting the size of the receptive field according to claim 2, wherein the result obtained with SK-ResNet50 as the feature extraction network is fused with the result obtained with SK-ResNet101 as the feature extraction network, each detection result comprising the coordinates of the upper-left and lower-right corners of the prediction box and the detection score of the prediction box, and the fusion method being: after the respective detections are completed, the detection scores are used as weights to adjust the coordinates of the upper-left and lower-right corner points of the result boxes detected by the two models.
4. The pneumonia image verification method capable of adaptively adjusting the size of receptive field according to claim 2, characterized by comprising the following steps:
The detection results of the networks using SK-ResNet50 and SK-ResNet101 as feature extractors are fused. Each detection result comprises the coordinates of the upper-left and lower-right corners of the prediction box and the detection score of the prediction box. After the respective detections are completed, the detection scores are used as weights to adjust the coordinates of the upper-left and lower-right corner points of the result boxes of the two models; for the abscissa of the upper-left corner point, the fusion is given by formula (7):
$$T_{lx} = \frac{s_1 \cdot t_{lx1} + s_2 \cdot t_{lx2}}{s_1 + s_2} \qquad (7)$$
where t_lx1 and s_1 are the abscissa of the upper-left corner point of the prediction box and the detection score when SK-ResNet50 is the feature extraction network, t_lx2 and s_2 are the corresponding abscissa and detection score when SK-ResNet101 is the feature extraction network, and T_lx is the abscissa of the fused upper-left corner point;
similarly, for the abscissa of the lower-right corner point, the fusion is given by formula (8):
$$T_{rx} = \frac{s_3 \cdot t_{rx1} + s_4 \cdot t_{rx2}}{s_3 + s_4} \qquad (8)$$
where t_rx1 and s_3 are the abscissa of the lower-right corner point of the prediction box and the detection score when SK-ResNet50 is the feature extraction network, t_rx2 and s_4 are the corresponding abscissa and detection score when SK-ResNet101 is the feature extraction network, and T_rx is the abscissa of the fused lower-right corner point.
5. The pneumonia picture verification method capable of adaptively adjusting the size of the receptive field according to claim 4, wherein the SK unit comprises three operations: splitting, fusing and selecting:
the first operation is splitting: the input feature map X of dimension (H′ × W′ × C′) is expanded into two branches by convolutions with different kernel sizes, the two transformations being
$$\tilde{\mathcal{F}}: \mathbf{X} \rightarrow \tilde{\mathbf{U}} \in \mathbb{R}^{H \times W \times C}, \qquad \hat{\mathcal{F}}: \mathbf{X} \rightarrow \hat{\mathbf{U}} \in \mathbb{R}^{H \times W \times C}$$
each convolution being followed by batch normalization (BN) and a ReLU activation function;
the second operation is fusing: the two resulting feature maps are added element-wise:
$$\mathbf{U} = \tilde{\mathbf{U}} + \hat{\mathbf{U}}$$
then global average pooling is performed:
$$s_c = \mathcal{F}_{gp}(\mathbf{U}_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \mathbf{U}_c(i,j)$$
yielding a one-dimensional feature vector s ∈ R^C with a global receptive field, which is then passed through a fully connected layer for a non-linear transformation that reduces the dimensionality:
$$\mathbf{z} = \mathcal{F}_{fc}(\mathbf{s}) = \delta(\beta(\mathbf{W}\mathbf{s}))$$
where δ denotes the ReLU activation function, β denotes batch normalization (BN), and W ∈ R^{d×C}; to study the effect of d on model efficiency, a reduction ratio r is used to control its value:
$$d = \max(C/r,\; L)$$
wherein L represents the minimum value of d;
finally, the third operation is selecting: first a softmax operation is applied to z:
$$a_c = \frac{e^{\mathbf{A}_c \mathbf{z}}}{e^{\mathbf{A}_c \mathbf{z}} + e^{\mathbf{B}_c \mathbf{z}}}, \qquad b_c = \frac{e^{\mathbf{B}_c \mathbf{z}}}{e^{\mathbf{A}_c \mathbf{z}} + e^{\mathbf{B}_c \mathbf{z}}}$$
where A, B ∈ R^{C×d}, A_c ∈ R^{1×d} is the row of A corresponding to the c-th channel, and a_c is the c-th element of a; with only two branches, B is a redundant matrix because a_c + b_c = 1; the final feature map V is obtained from the following formula:
$$\mathbf{V}_c = a_c \cdot \tilde{\mathbf{U}}_c + b_c \cdot \hat{\mathbf{U}}_c, \qquad a_c + b_c = 1$$
in SK-RetinaNet, the split operation convolves with kernels of different sizes so that neurons obtain different receptive fields, and the receptive field size is adaptively adjusted to the size of the input target; on the other hand, the fuse operation in SK allows the weight of each feature channel to be trained automatically according to the importance of the channel without excessively increasing the training parameters, achieving more accurate results.
CN202010422064.6A 2020-05-18 2020-05-18 Pneumonia picture verification device and method capable of adaptively adjusting size of receptive field Pending CN111724345A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010422064.6A CN111724345A (en) 2020-05-18 2020-05-18 Pneumonia picture verification device and method capable of adaptively adjusting size of receptive field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010422064.6A CN111724345A (en) 2020-05-18 2020-05-18 Pneumonia picture verification device and method capable of adaptively adjusting size of receptive field

Publications (1)

Publication Number Publication Date
CN111724345A true CN111724345A (en) 2020-09-29

Family

ID=72564596

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010422064.6A Pending CN111724345A (en) 2020-05-18 2020-05-18 Pneumonia picture verification device and method capable of adaptively adjusting size of receptive field

Country Status (1)

Country Link
CN (1) CN111724345A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112669282A (en) * 2020-12-29 2021-04-16 燕山大学 Spine positioning method based on deep neural network
CN113592809A (en) * 2021-07-28 2021-11-02 中国海洋大学 Pneumonia image detection system and method based on channel attention residual error network
CN114693939A (en) * 2022-03-16 2022-07-01 中南大学 Transparency detection depth feature extraction method under complex environment
CN113592809B (en) * 2021-07-28 2024-05-14 中国海洋大学 Pneumonia image detection system and method based on channel attention residual error network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109559300A (en) * 2018-11-19 2019-04-02 上海商汤智能科技有限公司 Image processing method, electronic equipment and computer readable storage medium
CN109919928A (en) * 2019-03-06 2019-06-21 腾讯科技(深圳)有限公司 Detection method, device and the storage medium of medical image
CN110674866A (en) * 2019-09-23 2020-01-10 兰州理工大学 Method for detecting X-ray breast lesion images by using transfer learning characteristic pyramid network
CN110796037A (en) * 2019-10-15 2020-02-14 武汉大学 Satellite-borne optical remote sensing image ship target detection method based on lightweight receptive field pyramid

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109559300A (en) * 2018-11-19 2019-04-02 上海商汤智能科技有限公司 Image processing method, electronic equipment and computer readable storage medium
CN109919928A (en) * 2019-03-06 2019-06-21 腾讯科技(深圳)有限公司 Detection method, device and the storage medium of medical image
CN110674866A (en) * 2019-09-23 2020-01-10 兰州理工大学 Method for detecting X-ray breast lesion images by using transfer learning characteristic pyramid network
CN110796037A (en) * 2019-10-15 2020-02-14 武汉大学 Satellite-borne optical remote sensing image ship target detection method based on lightweight receptive field pyramid

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TEO ASPLUND et al.: "A Faster, Unbiased Path Opening by Upper Skeletonization and Weighted Adjacency Graphs", IEEE, vol. 25, no. 12, XP011624980, DOI: 10.1109/TIP.2016.2609805 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112669282A (en) * 2020-12-29 2021-04-16 燕山大学 Spine positioning method based on deep neural network
CN112669282B (en) * 2020-12-29 2023-02-14 燕山大学 Spine positioning method based on deep neural network
CN113592809A (en) * 2021-07-28 2021-11-02 中国海洋大学 Pneumonia image detection system and method based on channel attention residual error network
CN113592809B (en) * 2021-07-28 2024-05-14 中国海洋大学 Pneumonia image detection system and method based on channel attention residual error network
CN114693939A (en) * 2022-03-16 2022-07-01 中南大学 Transparency detection depth feature extraction method under complex environment
CN114693939B (en) * 2022-03-16 2024-04-30 中南大学 Method for extracting depth features of transparent object detection under complex environment

Similar Documents

Publication Publication Date Title
CN110599448B (en) Migratory learning lung lesion tissue detection system based on MaskScoring R-CNN network
CN110378381B (en) Object detection method, device and computer storage medium
Roth et al. A new 2.5 D representation for lymph node detection using random sets of deep convolutional neural network observations
Tian et al. Multi-path convolutional neural network in fundus segmentation of blood vessels
CN111259982A (en) Premature infant retina image classification method and device based on attention mechanism
CN107492071A (en) Medical image processing method and equipment
CN106940816A (en) Connect the CT image Lung neoplasm detecting systems of convolutional neural networks entirely based on 3D
CN111429407B (en) Chest X-ray disease detection device and method based on double-channel separation network
Li et al. Segmentation of retinal fluid based on deep learning: application of three-dimensional fully convolutional neural networks in optical coherence tomography images
CN107169974A (en) It is a kind of based on the image partition method for supervising full convolutional neural networks more
US20220198230A1 (en) Auxiliary detection method and image recognition method for rib fractures based on deep learning
CN108765387A (en) Based on Faster RCNN mammary gland DBT image lump automatic testing methods
CN108765392B (en) Digestive tract endoscope lesion detection and identification method based on sliding window
CN108537282A (en) A kind of diabetic retinopathy stage division using extra lightweight SqueezeNet networks
Yao et al. Pneumonia detection using an improved algorithm based on faster r-cnn
Zhao et al. D2a u-net: Automatic segmentation of covid-19 lesions from ct slices with dilated convolution and dual attention mechanism
Lei et al. Automated detection of retinopathy of prematurity by deep attention network
CN111724345A (en) Pneumonia picture verification device and method capable of adaptively adjusting size of receptive field
CN113782184A (en) Cerebral apoplexy auxiliary evaluation system based on facial key point and feature pre-learning
Vij et al. A systematic review on diabetic retinopathy detection using deep learning techniques
Pradhan et al. Lung cancer detection using 3D convolutional neural networks
CN107146211A (en) Retinal vascular images noise-reduction method based on line spread function and bilateral filtering
Miao et al. Classification of Diabetic Retinopathy Based on Multiscale Hybrid Attention Mechanism and Residual Algorithm
CN113989206A (en) Lightweight model-based bone age prediction method and device
Yang et al. Learning feature-rich integrated comprehensive context networks for automated fundus retinal vessel analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20240227