CN111724345A - Pneumonia picture verification device and method capable of adaptively adjusting size of receptive field - Google Patents

Pneumonia picture verification device and method capable of adaptively adjusting size of receptive field

Info

Publication number
CN111724345A
CN111724345A (application CN202010422064.6A)
Authority
CN
China
Prior art keywords
network
feature
detection
module
pneumonia
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010422064.6A
Other languages
Chinese (zh)
Inventor
武昱忻
李锵
关欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202010422064.6A priority Critical patent/CN111724345A/en
Publication of CN111724345A publication Critical patent/CN111724345A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10116X-ray image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30061Lung

Abstract

The invention relates to the fields of medical equipment, deep-learning convolutional neural networks, and target detection and positioning. To improve the efficiency of diagnosing chest X-ray films, the invention discloses a pneumonia picture verification device and method capable of adaptively adjusting the size of the receptive field. The device comprises an X-ray machine and a computer; pictures taken by the X-ray machine are input into the computer, which comprises a feature extraction network module, a feature pyramid module, a classification sub-branch module and a regression sub-branch module. ResNet50 and ResNet101 are each combined with selective kernel convolution to form SK-ResNet50 and SK-ResNet101, which serve as feature extraction networks. The extracted features are input into the feature pyramid module for processing, and the feature pyramid outputs a set of feature maps. The classification sub-branch module outputs the detection score of each prediction box, and the regression sub-branch module outputs the position of each prediction box, where the prediction box is the predicted pneumonia lesion area. The invention is mainly applied in design and manufacturing settings.

Description

Pneumonia picture verification device and method capable of adaptively adjusting size of receptive field
Technical Field
The invention relates to the fields of medical instruments, deep-learning convolutional neural networks, and target detection and positioning. It improves the combination of a dynamic selection unit with a target detection network so that each neuron can adaptively adjust the size of its receptive field according to multi-scale input information, i.e., the size of the target, and thereby detect and locate pneumonia in chest X-ray images more accurately. In particular, the invention relates to a pneumonia detection device and positioning method capable of adaptively adjusting the size of the receptive field.
Background
Pneumonia is a serious pulmonary disease: an inflammation of the alveoli caused by bacteria, viruses, fungi and other pathogens, which can worsen rapidly if not treated in time and lead to complications such as heart failure, empyema, lung abscess, myocarditis or toxic encephalitis. Every year roughly 450 million people worldwide contract pneumonia and about 4 million die from it. The gap between the infection and mortality figures shows how important early diagnosis is. Pneumonia appears as an area of increased opacity on chest X-rays, and it is currently diagnosed mainly by radiologists reading those X-rays. However, manual reading is time-consuming and labor-intensive, so radiologists are overwhelmed by the ever-growing volume of data, and subjective factors easily lead to misdiagnosis and missed diagnosis.
In recent years, with the development of deep learning (DL), radiology has attracted great attention because DL can be applied to many clinical imaging problems, and researchers at home and abroad have actively studied chest X-ray images. Wu et al. designed an X-ray pneumonia prediction device based on a convolutional neural network, adopting ResNet50 as the classification network and the Faster Region-based Convolutional Neural Network (Faster R-CNN) as the detection model; however, Faster R-CNN is a two-stage detector, so the model is complex and detection is slow. Amit et al. first fed the X-ray image into a candidate-region (ROI Align) classifier, then segmented it with the Fast R-CNN model to predict bounding boxes, adjusting the threshold during training to improve the results, but the accuracy remained low. Therefore, accurately detecting pneumonia with a low-complexity deep learning framework, as a complement to diagnosis by radiologists, is important both for reducing radiologist workload and for making early diagnoses.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a chest X-ray diagnosis method and device that improve the efficiency of chest X-ray diagnosis. To this end, the pneumonia picture verification device capable of adaptively adjusting the size of the receptive field comprises an X-ray machine and a computer. Pictures taken by the X-ray machine are input into the computer, which comprises a feature extraction network module, a feature pyramid module, a classification sub-branch module and a regression sub-branch module. ResNet50 and ResNet101 are each combined with selective kernel convolution to form SK-ResNet50 and SK-ResNet101, which serve as feature extraction networks; ResNet50 is a 50-layer residual network and ResNet101 is a 101-layer residual network. The extracted features are input into the feature pyramid module for processing, and the feature pyramid outputs a set of feature maps. The classification sub-branch module is a fully convolutional network (FCN) module connected to each feature map output by the feature pyramid network; it outputs the detection score of each prediction box. The regression sub-branch module is likewise an FCN module connected to each feature map output by the feature pyramid network; it outputs the position of each prediction box, where the prediction box is the predicted pneumonia lesion area.
A pneumonia picture verification method capable of adaptively adjusting the size of the receptive field establishes a selective kernel convolution retina network, SK-RetinaNet, which comprises three parts: SK-ResNet, a feature pyramid network, and the sub-branches. SK-ResNet adds an SK unit, which adaptively adjusts the size of the receptive field, to each residual block of the residual network ResNet and serves as the feature extraction network. The feature pyramid network builds a top-down multi-scale set of feature maps from the feature maps output by each stage of SK-ResNet. The sub-branches consist of a classification sub-branch and a regression sub-branch; the classification sub-branch is a fully convolutional network (FCN) connected to each feature map output by the feature pyramid network, the regression sub-branch is likewise connected to each feature map output by the feature pyramid network, and together they carry out detection and localization at multiple scales.
The result obtained with SK-ResNet50 as the feature extraction network is fused with the result obtained with SK-ResNet101 as the feature extraction network. Each detection result comprises the coordinates of the upper-left and lower-right corners of the prediction box and its detection score. The fusion method is: after the two detections are completed, the detection scores are used as weights to adjust the coordinates of the upper-left and lower-right corner points of the boxes detected by the two models.
The method comprises the following specific steps:
The detection results of the networks using SK-ResNet50 and SK-ResNet101 as feature extractors are fused. Each detection result comprises the coordinates of the upper-left and lower-right corners of the prediction box and its detection score. After the two detections are completed, the detection scores are used as weights to adjust the coordinates of the upper-left and lower-right corner points of the result boxes of the two models. For the abscissa of the upper-left corner point, the fusion is given by formula (7):
$$T_{lx} = \frac{s_1 \cdot t_{lx1} + s_2 \cdot t_{lx2}}{s_1 + s_2} \qquad (7)$$
where t_lx1 and s_1 are the abscissa of the upper-left corner point of the prediction box and the detection score when SK-ResNet50 is the feature extraction network, t_lx2 and s_2 are the corresponding abscissa and detection score when SK-ResNet101 is the feature extraction network, and T_lx is the abscissa of the fused upper-left corner point;
similarly, for the abscissa of the lower-right corner point, the fusion is given by formula (8):
$$T_{rx} = \frac{s_3 \cdot t_{rx1} + s_4 \cdot t_{rx2}}{s_3 + s_4} \qquad (8)$$
where t_rx1 and s_3 are the abscissa of the lower-right corner point of the prediction box and the detection score when SK-ResNet50 is the feature extraction network, t_rx2 and s_4 are the corresponding abscissa and detection score when SK-ResNet101 is the feature extraction network, and T_rx is the abscissa of the fused lower-right corner point.
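To make the score-weighted fusion in formulas (7) and (8) concrete, the following Python sketch fuses two prediction boxes given in (x1, y1, x2, y2) form; the function name, the tuple representation and the sample numbers are illustrative assumptions, not part of the patent.

```python
def fuse_boxes(box50, score50, box101, score101):
    """Score-weighted fusion of two prediction boxes (illustrative sketch).

    box50 / box101: (x1, y1, x2, y2) corner coordinates predicted with
    SK-ResNet50 and SK-ResNet101 as feature extraction networks.
    score50 / score101: the corresponding detection scores, used as weights
    in the spirit of formulas (7) and (8).
    """
    total = score50 + score101
    return tuple(
        (score50 * c50 + score101 * c101) / total
        for c50, c101 in zip(box50, box101)
    )

# Usage example (hypothetical numbers):
# fuse_boxes((100, 80, 220, 200), 0.82, (110, 90, 230, 210), 0.74)
# -> each coordinate is pulled toward the higher-scoring model's prediction.
```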
The SK unit comprises three operations: splitting, fusing and selecting:
the first operation is splitting: the input feature map X of dimension (H′ × W′ × C′) is expanded into two branches by convolutions with different kernel sizes, the two transformations being
$$\tilde{\mathcal{F}}: \mathbf{X} \rightarrow \tilde{\mathbf{U}} \in \mathbb{R}^{H \times W \times C}, \qquad \hat{\mathcal{F}}: \mathbf{X} \rightarrow \hat{\mathbf{U}} \in \mathbb{R}^{H \times W \times C}$$
each convolution is followed by batch normalization (BN) and a ReLU activation function;
the second operation is fusing: the two resulting feature maps are added element-wise:
$$\mathbf{U} = \tilde{\mathbf{U}} + \hat{\mathbf{U}}$$
then global average pooling is performed:
$$s_c = \mathcal{F}_{gp}(\mathbf{U}_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \mathbf{U}_c(i,j)$$
yielding a one-dimensional feature vector s ∈ R^C with a global receptive field, which is then passed through a fully connected layer for a non-linear transformation that reduces the dimensionality:
$$\mathbf{z} = \mathcal{F}_{fc}(\mathbf{s}) = \delta(\beta(\mathbf{W}\mathbf{s}))$$
where δ denotes the ReLU activation function, β denotes batch normalization (BN), and W ∈ R^{d×C}; to study the effect of d on model efficiency, a reduction ratio r is used to control its value:
$$d = \max(C/r,\; L)$$
wherein L represents the minimum value of d;
finally, the third operation is selecting: first a softmax operation is applied to z:
$$a_c = \frac{e^{\mathbf{A}_c \mathbf{z}}}{e^{\mathbf{A}_c \mathbf{z}} + e^{\mathbf{B}_c \mathbf{z}}}, \qquad b_c = \frac{e^{\mathbf{B}_c \mathbf{z}}}{e^{\mathbf{A}_c \mathbf{z}} + e^{\mathbf{B}_c \mathbf{z}}}$$
where A, B ∈ R^{C×d}, A_c ∈ R^{1×d} is the row of A corresponding to the c-th channel, and a_c is the c-th element of a; with only two branches, B is a redundant matrix because a_c + b_c = 1. The final feature map V is obtained from the following formula:
$$\mathbf{V}_c = a_c \cdot \tilde{\mathbf{U}}_c + b_c \cdot \hat{\mathbf{U}}_c, \qquad a_c + b_c = 1$$
in SK-RetinaNet, the neuron reception fields obtained by convolution of convolution kernels with different sizes in slicing operation are different, and the size of the reception field is self-adaptively adjusted according to the different size of the input target; on the other hand, the fusion operation in the SK can realize that the weight of each characteristic channel is automatically trained according to the importance degree of different channels under the condition that the training parameters are not excessively increased, so that a more accurate result is achieved.
The invention has the characteristics and beneficial effects that:
the invention realizes the tasks of detecting and positioning the pneumonia focus by using a deep learning method, and greatly improves the efficiency compared with the judgment of the traditional doctor. The convolutional neural network can input images into the models in batches for detection, so that the burden of doctors is reduced, and the detection speed is increased.
By combining the SK unit with the RetinaNet detection network, the invention removes the limitation that, in an ordinary convolutional neural network, the receptive field of every artificial neuron in a layer is designed with the same size; each neuron can adaptively adjust its receptive field according to multi-scale input information, improving detection accuracy while keeping network complexity low.
Before the training pictures are input into the neural network, the algorithm applies random flipping, random scaling, coordinate-space translation, and brightness and contrast adjustment, so the model generalizes well.
Description of the drawings:
FIG. 1 is a schematic diagram of the SK unit.
FIG. 2 is the algorithm framework of the pneumonia detection network (RetinaNet).
FIG. 3 is a schematic chest X-ray film.
FIG. 4 shows the feature pyramid construction process.
Detailed Description
As the number of pneumonia patients grows, the problem that radiologists lack the time and energy to read every X-ray film becomes increasingly prominent. A neural network can save labor and quickly locate pneumonia lesions, but pneumonia appears only as increased opacity in an X-ray film and cannot be judged clearly from appearance, so the discriminative power of the features a neural network extracts is limited and the accuracy of the results is not high enough. Simply increasing the number of network layers raises the time complexity and reduces efficiency. How to improve detection accuracy with a neural network without excessively increasing its complexity is therefore an urgent problem.
The general technical scheme of the invention is as follows. The system is roughly divided into three parts: the feature extraction network, the feature pyramid, and the classification and regression sub-branches. ResNet50 (a 50-layer residual network) and ResNet101 (a 101-layer residual network) are each combined with Selective Kernel convolution (SK) to form SK-ResNet50 and SK-ResNet101, which serve as feature extraction networks. A feature pyramid network is then built with top-down lateral connections, so that subsequent detection on feature maps of different levels uses both high-level and low-level features. Next, a classification sub-branch and a regression sub-branch are built on each of the five feature maps P3-P7 output by the feature pyramid (see the detailed description below); they output the detection scores of the prediction boxes and locate their positions. Finally, the results obtained with SK-ResNet50 and SK-ResNet101 as feature extraction networks are fused, and the coordinates of the prediction boxes are adjusted using the detection scores as weights.
The concept related to the present invention is:
the prior frames are on five feature maps P3-P7 generated by the feature pyramid, and each feature map to be detected has nine different prior frames which comprise three different aspect ratios (0.5, 1, 2) and three different area changes (2)0,21/3,22/3). This is the first step to roughly select the blocks, and subsequent passage through the regression sub-branch will adjust its position to become the prediction blocks.
The target box is the true location of the pneumonia lesion, given by the labels in the training set.
The prediction box is the position of the pneumonia lesion predicted once the network has finished processing.
The invention proposes a framework that combines the classical detection network RetinaNet with the SK unit, which adaptively adjusts the size of the receptive field. The network framework is improved in two respects. First, SK units that adaptively adjust the receptive field size are added to ResNet (a residual network), imitating the principle that the receptive field size of visual cortical neurons adjusts to the stimulus; this yields a detection network with an adaptive receptive field and improves detection accuracy without excessively increasing the training parameters. Second, a prediction-box fusion algorithm is added: SK-ResNet50 and SK-ResNet101 are each used as the backbone network, and the scores predicted by the classification sub-branch are used as weights to adjust the positions of the prediction boxes, producing more accurate prediction boxes.
The hardware of the invention consists of: an Intel Core i7-6800K 3.5 GHz CPU, two Nvidia GTX 1080 Ti (11 GB) GPUs, and the Ubuntu 16.04 operating system, using the open-source deep learning framework Keras.
The invention comprises three parts: SK-ResNet50 or SK-ResNet101, the feature pyramid network, and the sub-branches. SK-ResNet50 or SK-ResNet101 is ResNet50 or ResNet101 with an SK unit added to each residual block, and serves as the feature extraction network; the feature pyramid network builds a top-down multi-scale set of feature maps from the feature maps output by each stage of SK-ResNet50 or SK-ResNet101; the sub-branches carry out detection and localization at multiple scales.
ResNet50 is a residual network of 50 layers, ResNet101 is a residual network of 101 layers, and the specific structure is shown in Table 1.
TABLE 1 ResNet50 and ResNet101 network architectures
SK is added into each residual block of ResNet50 or ResNet101 to obtain SK-ResNet50 or SK-ResNet101, and the specific structure is shown in Table 2.
TABLE 2 SK-ResNet50 and SK-ResNet101 network architectures
The respective network structures of the SK-ResNet, the feature pyramid network and the subbranches are as follows:
1. SK-ResNet50 or SK-ResNet101 adds an SK unit to each convolution block of ResNet50 or ResNet101.
2. The construction process of the feature pyramid network is as follows:
a) C5 is reduced to 256 channels by a 1×1 convolution; the result is upsampled to the same size as C4 to give P5_upsampled, and passes through a 3×3 convolution to become feature map P5.
b) C4 is reduced to 256 channels by a 1×1 convolution and added element-wise to P5_upsampled; the sum is upsampled to the same size as C3 to give P4_upsampled, and also passes through a 3×3 convolution to become feature map P4.
c) C3 is reduced to 256 channels by a 1×1 convolution and added element-wise to P4_upsampled; the result passes through a 3×3 convolution to become feature map P3.
d) C5 passes through a 3×3 convolution with stride 2 to become feature map P6.
e) P6 passes through a ReLU activation followed by a 3×3 convolution with stride 2 to become feature map P7, so that the feature pyramid network outputs the five feature maps P3-P7.
Here C3, C4 and C5 are the feature maps output by the 2nd, 3rd and 4th convolution blocks (block3, block4 and block5) of ResNet50, respectively.
The feature pyramid is built with a top-down pathway and lateral connections; the goal is to let subsequent detection on feature maps of different levels use both high-level and low-level features.
FIG. 4 shows the feature pyramid construction process: the chest X-ray film is convolved layer by layer and the feature maps become progressively smaller. During construction, the element-wise addition and upsampling operations form the top-down pathway with lateral connections and produce five feature maps of different sizes. A classification sub-branch and a regression sub-branch are then built on each feature map, so that detection on the different levels uses both high-level and low-level features.
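Steps a)-e) can be sketched in Keras roughly as follows, assuming C3, C4 and C5 are already available as tensors and that each backbone stage halves the spatial size; the function and variable names are illustrative, not the patent's exact implementation.

```python
from tensorflow.keras import layers

def build_feature_pyramid(C3, C4, C5, feature_size=256):
    """Rough sketch of steps a)-e): build P3-P7 from backbone outputs C3-C5."""
    # a) C5 -> 1x1 conv (256 ch); upsample to C4 size; 3x3 conv -> P5
    P5 = layers.Conv2D(feature_size, 1, padding="same")(C5)
    P5_upsampled = layers.UpSampling2D(2)(P5)
    P5 = layers.Conv2D(feature_size, 3, padding="same")(P5)

    # b) C4 -> 1x1 conv, add P5_upsampled; upsample to C3 size; 3x3 conv -> P4
    P4 = layers.Conv2D(feature_size, 1, padding="same")(C4)
    P4 = layers.Add()([P5_upsampled, P4])
    P4_upsampled = layers.UpSampling2D(2)(P4)
    P4 = layers.Conv2D(feature_size, 3, padding="same")(P4)

    # c) C3 -> 1x1 conv, add P4_upsampled; 3x3 conv -> P3
    P3 = layers.Conv2D(feature_size, 1, padding="same")(C3)
    P3 = layers.Add()([P4_upsampled, P3])
    P3 = layers.Conv2D(feature_size, 3, padding="same")(P3)

    # d) P6 comes from a stride-2 3x3 convolution on C5
    P6 = layers.Conv2D(feature_size, 3, strides=2, padding="same")(C5)

    # e) P7 comes from ReLU followed by a stride-2 3x3 convolution on P6
    P7 = layers.Activation("relu")(P6)
    P7 = layers.Conv2D(feature_size, 3, strides=2, padding="same")(P7)

    return [P3, P4, P5, P6, P7]
```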
3. The subbranches are composed of classification subbranches and regression subbranches.
The structure of the classification sub-branch is simple: four 3×3 convolutions, each with 256 channels and a ReLU activation, followed by a 3×3 convolution with 9 channels (one per prior box) and finally a sigmoid activation. It is a small fully convolutional network (FCN) connected to every feature map output by the feature pyramid network, its parameters are shared across all pyramid levels, and at each position it predicts the probability that an object is present for each of the 9 prior boxes. The regression sub-branch has a similar structure, except that the four 3×3 convolutions are followed by a 3×3 convolution with 36 channels (4 coordinates × 9 prior boxes); it is likewise a small FCN connected to every feature map output by the feature pyramid network and runs in parallel with the classification sub-branch. Its purpose is to regress the offsets between each prior box and the nearest target box, continuously updating the four coordinate values of the upper-left and lower-right corner points, so each pyramid level produces a 4 × 9 dimensional linear output per position. The classification sub-branch outputs the detection score of the prediction box and the regression sub-branch regresses its position.
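A rough Keras sketch of these two sub-branches, with num_anchors = 9 following the prior-box setup above; the layer names, the single foreground class and other details are assumptions made for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_heads(feature_size=256, num_anchors=9):
    """Classification and regression sub-branches (shared across P3-P7)."""
    # Classification sub-branch: four 3x3/256 convs + ReLU, then 9 channels + sigmoid.
    inp_c = layers.Input(shape=(None, None, feature_size))
    x = inp_c
    for _ in range(4):
        x = layers.Conv2D(feature_size, 3, padding="same", activation="relu")(x)
    cls_out = layers.Conv2D(num_anchors, 3, padding="same", activation="sigmoid")(x)
    cls_head = keras.Model(inp_c, cls_out, name="classification_subbranch")

    # Regression sub-branch: same trunk, then 4 * 9 = 36 channels (box offsets).
    inp_r = layers.Input(shape=(None, None, feature_size))
    y = inp_r
    for _ in range(4):
        y = layers.Conv2D(feature_size, 3, padding="same", activation="relu")(y)
    reg_out = layers.Conv2D(4 * num_anchors, 3, padding="same")(y)
    reg_head = keras.Model(inp_r, reg_out, name="regression_subbranch")
    return cls_head, reg_head

# Applying the same head model to every pyramid level is what shares its
# parameters across P3-P7, while the two heads do not share weights with
# each other:
# cls_head, reg_head = build_heads()
# scores = [cls_head(P) for P in pyramid_features]
# offsets = [reg_head(P) for P in pyramid_features]
```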
FIG. 1 is a schematic diagram of the SK unit, which comprises three operations: splitting, fusing and selecting.
The first operation is splitting: X is a feature map of dimension (H′ × W′ × C′) that can be expanded into multiple branches by convolutions with different kernel sizes; here two branches are used by default,
$$\tilde{\mathcal{F}}: \mathbf{X} \rightarrow \tilde{\mathbf{U}} \in \mathbb{R}^{H \times W \times C}, \qquad \hat{\mathcal{F}}: \mathbf{X} \rightarrow \hat{\mathbf{U}} \in \mathbb{R}^{H \times W \times C},$$
with convolution kernel sizes of 3 and 5, respectively; each convolution is followed by batch normalization (BN) and a ReLU activation function.
The second operation is fusing: the two resulting feature maps are added element-wise:
$$\mathbf{U} = \tilde{\mathbf{U}} + \hat{\mathbf{U}}$$
then global average pooling is performed:
$$s_c = \mathcal{F}_{gp}(\mathbf{U}_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \mathbf{U}_c(i,j)$$
giving a one-dimensional feature vector s ∈ R^C with a global receptive field. A non-linear transformation through a fully connected layer then reduces the dimensionality:
$$\mathbf{z} = \mathcal{F}_{fc}(\mathbf{s}) = \delta(\beta(\mathbf{W}\mathbf{s}))$$
where δ denotes the ReLU activation function, β denotes batch normalization (BN), and W ∈ R^{d×C}. To study the effect of d on model efficiency, the reduction ratio r is used to control its value:
$$d = \max(C/r,\; L)$$
wherein L represents the minimum value of d.
Finally, the third operation is selecting, whose role is to adaptively select information at different spatial scales. First a softmax operation is applied to z:
$$a_c = \frac{e^{\mathbf{A}_c \mathbf{z}}}{e^{\mathbf{A}_c \mathbf{z}} + e^{\mathbf{B}_c \mathbf{z}}}, \qquad b_c = \frac{e^{\mathbf{B}_c \mathbf{z}}}{e^{\mathbf{A}_c \mathbf{z}} + e^{\mathbf{B}_c \mathbf{z}}}$$
where A, B ∈ R^{C×d}, A_c ∈ R^{1×d} is the row of A corresponding to the c-th channel, and a_c is the c-th element of a. With only two branches, B is a redundant matrix because a_c + b_c = 1. The final feature map V is obtained from the following formula:
$$\mathbf{V}_c = a_c \cdot \tilde{\mathbf{U}}_c + b_c \cdot \hat{\mathbf{U}}_c, \qquad a_c + b_c = 1$$
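The three SK operations for the two-branch case above (3×3 and 5×5 kernels, reduction ratio r, minimum dimension L) can be sketched in Keras as follows; the function name, default values and exact layer arrangement are illustrative assumptions rather than the patent's implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

def sk_unit(x, channels, r=16, L=32):
    """Selective kernel unit: split, fuse and select over two branches."""
    # Split: 3x3 and 5x5 convolutions, each followed by BN and ReLU.
    u3 = layers.Conv2D(channels, 3, padding="same")(x)
    u3 = layers.Activation("relu")(layers.BatchNormalization()(u3))
    u5 = layers.Conv2D(channels, 5, padding="same")(x)
    u5 = layers.Activation("relu")(layers.BatchNormalization()(u5))

    # Fuse: element-wise sum, global average pooling, FC reduction to d = max(C/r, L).
    u = layers.Add()([u3, u5])
    s = layers.GlobalAveragePooling2D()(u)          # s in R^C
    d = max(channels // r, L)
    z = layers.Dense(d)(s)
    z = layers.Activation("relu")(layers.BatchNormalization()(z))

    # Select: per-channel softmax over the two branches, then weighted sum.
    a = layers.Dense(channels)(z)                   # corresponds to A z
    b = layers.Dense(channels)(z)                   # corresponds to B z
    ab = tf.nn.softmax(tf.stack([a, b], axis=1), axis=1)   # enforces a_c + b_c = 1
    a_w = tf.reshape(ab[:, 0], (-1, 1, 1, channels))
    b_w = tf.reshape(ab[:, 1], (-1, 1, 1, channels))
    return a_w * u3 + b_w * u5
```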
and fusing the result obtained by taking the SK-ResNet50 as the characteristic extraction network detection and the result obtained by taking the SK-ResNet101 as the characteristic extraction network detection. The detection result comprises the coordinates of the upper left corner and the lower right corner of the prediction box and the detection score of the prediction box. And the fusion method is that after the detection is finished respectively, the detection scores are used as weights, and the coordinates of the upper left corner point and the lower right corner point of the result frames of the two model detections are adjusted. The method comprises the following specific steps:
and fusing the detection results of the networks with SK-ResNet50 and SK-ResNet101 as features respectively. The detection result comprises the coordinates of the upper left corner and the lower right corner of the prediction box and the detection score of the prediction box. And the fusion method is that after the detection is finished respectively, the detection scores are used as weights, and the coordinates of the upper left corner point and the lower right corner point of the result frames of the two model detections are adjusted. Taking the abscissa of the upper left corner point as an example, the fusion method is shown as formula (7):
$$T_{lx} = \frac{s_1 \cdot t_{lx1} + s_2 \cdot t_{lx2}}{s_1 + s_2} \qquad (7)$$
where t_lx1 and s_1 are the abscissa of the upper-left corner point of the prediction box and the detection score when SK-ResNet50 is the feature extraction network, t_lx2 and s_2 are the corresponding abscissa and detection score when SK-ResNet101 is the feature extraction network, and T_lx is the abscissa of the fused upper-left corner point.
The fusion method of the horizontal coordinates of the lower right corner points is shown as the formula (8):
$$T_{rx} = \frac{s_3 \cdot t_{rx1} + s_4 \cdot t_{rx2}}{s_3 + s_4} \qquad (8)$$
where t_rx1 and s_3 are the abscissa of the lower-right corner point of the prediction box and the detection score when SK-ResNet50 is the feature extraction network, t_rx2 and s_4 are the corresponding abscissa and detection score when SK-ResNet101 is the feature extraction network, and T_rx is the abscissa of the fused lower-right corner point.
This fusion is not carried out inside the feature extraction network, feature pyramid or classification/regression sub-branches described above; rather, the detection result of the network using SK-ResNet50 as feature extractor is fused with that of the network using SK-ResNet101 as feature extractor. Each detection result comprises the coordinates of the upper-left and lower-right corners of the prediction box and its detection score, and after the respective detections are completed, the detection scores are used as weights to adjust the corner coordinates of the result boxes of the two models.
FIG. 2 shows the main framework of the pneumonia detection network: the selective-kernel residual network and the feature pyramid network are combined into the backbone, and P3-P7 are feature maps of different scales, each connected to the classification sub-network and the regression sub-network. Compared with other neural networks, ResNet establishes a direct connection between input and output, allowing the original input information to be passed directly to later layers, so the network concentrates on learning the residual between input and output. Shallow layers focus more on detail, higher layers focus more on semantics, and different targets need different features; a feature pyramid network is therefore built on top of ResNet, with feature maps P3-P7 of different sizes, so that low-level and high-level features can be used simultaneously and predictions are made on several levels at once. Prediction boxes are then extracted on every level: the prior boxes are fed into the classification sub-network and the regression sub-network simultaneously, obtaining the score of each prior box as a lesion area and adjusting its position; the two sub-networks do not share weights. Compared with RetinaNet, SK-RetinaNet on the one hand gives neurons different receptive fields through convolutions with different kernel sizes in the split operation, so the receptive field size adapts to the size of the input target; on the other hand, the fuse operation in SK allows the weight of each feature channel to be trained automatically according to channel importance without excessively increasing the training parameters, achieving more accurate results.
Because the training set contains a limited number of pictures and is prone to overfitting, the data are augmented by random flipping, random scaling, coordinate-space translation, and increasing/decreasing brightness and contrast, to prevent overfitting and improve the generalization ability of the model. FIG. 3 is a schematic chest X-ray film.
If the data set is too small, overfitting occurs during training and the detection results suffer. The images are therefore augmented by random flipping and similar operations to expand the data and prevent overfitting; this augmentation is done before training, i.e., before the images are input into the feature extraction network. The data set provided by the Radiological Society of North America (RSNA) for its pneumonia detection challenge is used; it is expanded through random flipping, random scaling, coordinate-space translation, and increasing/decreasing brightness and contrast, and the expanded data set is then input into the feature extraction network for subsequent detection.
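A minimal NumPy sketch of this kind of augmentation, covering random horizontal flipping (with box adjustment) and brightness/contrast jitter; the probability and jitter ranges are illustrative assumptions, and random scaling and coordinate translation are omitted for brevity.

```python
import numpy as np

def augment(image, boxes, rng=np.random.default_rng()):
    """Randomly flip the image horizontally and jitter brightness/contrast.

    image: H x W array of pixel values; boxes: array of [x1, y1, x2, y2] rows.
    """
    h, w = image.shape[:2]
    image = image.astype(np.float32)

    # Random horizontal flip, mirroring the box x-coordinates as well.
    if rng.random() < 0.5:
        image = image[:, ::-1]
        boxes = boxes.copy()
        boxes[:, [0, 2]] = w - boxes[:, [2, 0]]

    # Random brightness (additive) and contrast (multiplicative) jitter.
    brightness = rng.uniform(-20, 20)    # illustrative range
    contrast = rng.uniform(0.9, 1.1)     # illustrative range
    image = np.clip(image * contrast + brightness, 0, 255)

    return image, boxes
```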

Claims (5)

1. A pneumonia picture verification device capable of adaptively adjusting the size of the receptive field, characterized by comprising an X-ray machine and a computer, wherein pictures taken by the X-ray machine are input into the computer, and the computer comprises a feature extraction network module, a feature pyramid module, a classification sub-branch module and a regression sub-branch module; ResNet50 and ResNet101 are each combined with selective kernel convolution to form SK-ResNet50 and SK-ResNet101 as feature extraction networks, ResNet50 being a 50-layer residual network and ResNet101 a 101-layer residual network; the extracted features are input into the feature pyramid module for processing, and the feature pyramid outputs feature maps; the classification sub-branch module is a fully convolutional network (FCN) module connected to each feature map output by the feature pyramid network and outputs the detection score of the prediction box; the regression sub-branch module is also a fully convolutional network (FCN) module connected to each feature map output by the feature pyramid network and outputs the position of the prediction box, the prediction box being the predicted pneumonia lesion area.
2. A pneumonia picture verification method capable of adaptively adjusting the size of the receptive field, characterized in that a selective kernel convolution retina network, SK-RetinaNet, is established, the network comprising three parts: SK-ResNet, a feature pyramid network and sub-branches; SK-ResNet adds an SK unit that adaptively adjusts the size of the receptive field to each residual block of the residual network ResNet and serves as the feature extraction network; the feature pyramid network builds a top-down multi-scale set of feature maps from the feature maps output by each stage of SK-ResNet; the sub-branches consist of a classification sub-branch and a regression sub-branch, the classification sub-branch being a fully convolutional network (FCN) connected to each feature map output by the feature pyramid network, the regression sub-branch likewise being connected to each feature map output by the feature pyramid network, and the classification and regression sub-branches carrying out detection and localization tasks at multiple scales.
3. The pneumonia picture verification method capable of adaptively adjusting the size of the receptive field according to claim 2, wherein the result obtained with SK-ResNet50 as the feature extraction network is fused with the result obtained with SK-ResNet101 as the feature extraction network, each detection result comprising the coordinates of the upper-left and lower-right corners of the prediction box and the detection score of the prediction box, and the fusion method being: after the respective detections are completed, the detection scores are used as weights to adjust the coordinates of the upper-left and lower-right corner points of the result boxes detected by the two models.
4. The pneumonia image verification method capable of adaptively adjusting the size of receptive field according to claim 2, characterized by comprising the following steps:
The detection results of the networks using SK-ResNet50 and SK-ResNet101 as feature extractors are fused. Each detection result comprises the coordinates of the upper-left and lower-right corners of the prediction box and the detection score of the prediction box. After the respective detections are completed, the detection scores are used as weights to adjust the coordinates of the upper-left and lower-right corner points of the result boxes of the two models; for the abscissa of the upper-left corner point, the fusion is given by formula (7):
$$T_{lx} = \frac{s_1 \cdot t_{lx1} + s_2 \cdot t_{lx2}}{s_1 + s_2} \qquad (7)$$
where t_lx1 and s_1 are the abscissa of the upper-left corner point of the prediction box and the detection score when SK-ResNet50 is the feature extraction network, t_lx2 and s_2 are the corresponding abscissa and detection score when SK-ResNet101 is the feature extraction network, and T_lx is the abscissa of the fused upper-left corner point;
similarly, for the abscissa of the lower-right corner point, the fusion is given by formula (8):
$$T_{rx} = \frac{s_3 \cdot t_{rx1} + s_4 \cdot t_{rx2}}{s_3 + s_4} \qquad (8)$$
where t_rx1 and s_3 are the abscissa of the lower-right corner point of the prediction box and the detection score when SK-ResNet50 is the feature extraction network, t_rx2 and s_4 are the corresponding abscissa and detection score when SK-ResNet101 is the feature extraction network, and T_rx is the abscissa of the fused lower-right corner point.
5. The pneumonia picture verification method capable of adaptively adjusting the size of the receptive field according to claim 4, wherein the SK unit comprises three operations: splitting, fusing and selecting:
the first operation is splitting: the input feature map X of dimension (H′ × W′ × C′) is expanded into two branches by convolutions with different kernel sizes, the two transformations being
$$\tilde{\mathcal{F}}: \mathbf{X} \rightarrow \tilde{\mathbf{U}} \in \mathbb{R}^{H \times W \times C}, \qquad \hat{\mathcal{F}}: \mathbf{X} \rightarrow \hat{\mathbf{U}} \in \mathbb{R}^{H \times W \times C}$$
each convolution being followed by batch normalization (BN) and a ReLU activation function;
the second operation is fusing: the two resulting feature maps are added element-wise:
$$\mathbf{U} = \tilde{\mathbf{U}} + \hat{\mathbf{U}}$$
then global average pooling is performed:
$$s_c = \mathcal{F}_{gp}(\mathbf{U}_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \mathbf{U}_c(i,j)$$
yielding a one-dimensional feature vector s ∈ R^C with a global receptive field, which is then passed through a fully connected layer for a non-linear transformation that reduces the dimensionality:
$$\mathbf{z} = \mathcal{F}_{fc}(\mathbf{s}) = \delta(\beta(\mathbf{W}\mathbf{s}))$$
where δ denotes the ReLU activation function, β denotes batch normalization (BN), and W ∈ R^{d×C}; to study the effect of d on model efficiency, a reduction ratio r is used to control its value:
$$d = \max(C/r,\; L)$$
wherein L represents the minimum value of d;
finally, the third operation is selecting: first a softmax operation is applied to z:
$$a_c = \frac{e^{\mathbf{A}_c \mathbf{z}}}{e^{\mathbf{A}_c \mathbf{z}} + e^{\mathbf{B}_c \mathbf{z}}}, \qquad b_c = \frac{e^{\mathbf{B}_c \mathbf{z}}}{e^{\mathbf{A}_c \mathbf{z}} + e^{\mathbf{B}_c \mathbf{z}}}$$
where A, B ∈ R^{C×d}, A_c ∈ R^{1×d} is the row of A corresponding to the c-th channel, and a_c is the c-th element of a; with only two branches, B is a redundant matrix because a_c + b_c = 1; the final feature map V is obtained from the following formula:
$$\mathbf{V}_c = a_c \cdot \tilde{\mathbf{U}}_c + b_c \cdot \hat{\mathbf{U}}_c, \qquad a_c + b_c = 1$$
in SK-RetinaNet, the split operation convolves with kernels of different sizes so that neurons obtain different receptive fields, and the receptive field size is adaptively adjusted to the size of the input target; on the other hand, the fuse operation in SK allows the weight of each feature channel to be trained automatically according to the importance of the channel without excessively increasing the training parameters, achieving more accurate results.
CN202010422064.6A 2020-05-18 2020-05-18 Pneumonia picture verification device and method capable of adaptively adjusting size of receptive field Pending CN111724345A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010422064.6A CN111724345A (en) 2020-05-18 2020-05-18 Pneumonia picture verification device and method capable of adaptively adjusting size of receptive field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010422064.6A CN111724345A (en) 2020-05-18 2020-05-18 Pneumonia picture verification device and method capable of adaptively adjusting size of receptive field

Publications (1)

Publication Number Publication Date
CN111724345A true CN111724345A (en) 2020-09-29

Family

ID=72564596

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010422064.6A Pending CN111724345A (en) 2020-05-18 2020-05-18 Pneumonia picture verification device and method capable of adaptively adjusting size of receptive field

Country Status (1)

Country Link
CN (1) CN111724345A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112669282A (en) * 2020-12-29 2021-04-16 燕山大学 Spine positioning method based on deep neural network
CN113592809A (en) * 2021-07-28 2021-11-02 中国海洋大学 Pneumonia image detection system and method based on channel attention residual error network
CN114693939A (en) * 2022-03-16 2022-07-01 中南大学 Transparency detection depth feature extraction method under complex environment
CN113592809B (en) * 2021-07-28 2024-05-14 中国海洋大学 Pneumonia image detection system and method based on channel attention residual error network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109559300A (en) * 2018-11-19 2019-04-02 上海商汤智能科技有限公司 Image processing method, electronic equipment and computer readable storage medium
CN109919928A (en) * 2019-03-06 2019-06-21 腾讯科技(深圳)有限公司 Detection method, device and the storage medium of medical image
CN110674866A (en) * 2019-09-23 2020-01-10 兰州理工大学 Method for detecting X-ray breast lesion images by using transfer learning characteristic pyramid network
CN110796037A (en) * 2019-10-15 2020-02-14 武汉大学 Satellite-borne optical remote sensing image ship target detection method based on lightweight receptive field pyramid

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109559300A (en) * 2018-11-19 2019-04-02 上海商汤智能科技有限公司 Image processing method, electronic equipment and computer readable storage medium
CN109919928A (en) * 2019-03-06 2019-06-21 腾讯科技(深圳)有限公司 Detection method, device and the storage medium of medical image
CN110674866A (en) * 2019-09-23 2020-01-10 兰州理工大学 Method for detecting X-ray breast lesion images by using transfer learning characteristic pyramid network
CN110796037A (en) * 2019-10-15 2020-02-14 武汉大学 Satellite-borne optical remote sensing image ship target detection method based on lightweight receptive field pyramid

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TEO ASPLUND et al.: "A Faster, Unbiased Path Opening by Upper Skeletonization and Weighted Adjacency Graphs", IEEE, vol. 25, no. 12, XP011624980, DOI: 10.1109/TIP.2016.2609805 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112669282A (en) * 2020-12-29 2021-04-16 燕山大学 Spine positioning method based on deep neural network
CN112669282B (en) * 2020-12-29 2023-02-14 燕山大学 Spine positioning method based on deep neural network
CN113592809A (en) * 2021-07-28 2021-11-02 中国海洋大学 Pneumonia image detection system and method based on channel attention residual error network
CN113592809B (en) * 2021-07-28 2024-05-14 中国海洋大学 Pneumonia image detection system and method based on channel attention residual error network
CN114693939A (en) * 2022-03-16 2022-07-01 中南大学 Transparency detection depth feature extraction method under complex environment
CN114693939B (en) * 2022-03-16 2024-04-30 中南大学 Method for extracting depth features of transparent object detection under complex environment

Similar Documents

Publication Publication Date Title
CN110599448B (en) Migratory learning lung lesion tissue detection system based on MaskScoring R-CNN network
CN110378381B (en) Object detection method, device and computer storage medium
Roth et al. A new 2.5 D representation for lymph node detection using random sets of deep convolutional neural network observations
Tian et al. Multi-path convolutional neural network in fundus segmentation of blood vessels
CN111259982A (en) Premature infant retina image classification method and device based on attention mechanism
CN107492071A (en) Medical image processing method and equipment
CN106940816A (en) Connect the CT image Lung neoplasm detecting systems of convolutional neural networks entirely based on 3D
CN111429407B (en) Chest X-ray disease detection device and method based on double-channel separation network
Li et al. Segmentation of retinal fluid based on deep learning: application of three-dimensional fully convolutional neural networks in optical coherence tomography images
CN107169974A (en) It is a kind of based on the image partition method for supervising full convolutional neural networks more
US20220198230A1 (en) Auxiliary detection method and image recognition method for rib fractures based on deep learning
CN108765387A (en) Based on Faster RCNN mammary gland DBT image lump automatic testing methods
CN108765392B (en) Digestive tract endoscope lesion detection and identification method based on sliding window
CN108537282A (en) A kind of diabetic retinopathy stage division using extra lightweight SqueezeNet networks
Yao et al. Pneumonia detection using an improved algorithm based on faster r-cnn
Zhao et al. D2a u-net: Automatic segmentation of covid-19 lesions from ct slices with dilated convolution and dual attention mechanism
Lei et al. Automated detection of retinopathy of prematurity by deep attention network
CN111724345A (en) Pneumonia picture verification device and method capable of adaptively adjusting size of receptive field
CN113782184A (en) Cerebral apoplexy auxiliary evaluation system based on facial key point and feature pre-learning
Vij et al. A systematic review on diabetic retinopathy detection using deep learning techniques
Pradhan et al. Lung cancer detection using 3D convolutional neural networks
CN107146211A (en) Retinal vascular images noise-reduction method based on line spread function and bilateral filtering
Miao et al. Classification of Diabetic Retinopathy Based on Multiscale Hybrid Attention Mechanism and Residual Algorithm
CN113989206A (en) Lightweight model-based bone age prediction method and device
Yang et al. Learning feature-rich integrated comprehensive context networks for automated fundus retinal vessel analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20240227