CN115359140A - Automatic multi-feature region-of-interest delineation system and method based on neural network - Google Patents

Automatic multi-feature region-of-interest delineation system and method based on neural network Download PDF

Info

Publication number
CN115359140A
Authority
CN
China
Prior art keywords
image
module
convolution
output
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202211013370.XA
Other languages
Chinese (zh)
Inventor
陈明
张婕
杨一威
徐裕金
季永领
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Cancer Hospital
Original Assignee
Zhejiang Cancer Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Cancer Hospital filed Critical Zhejiang Cancer Hospital
Priority to CN202211013370.XA priority Critical patent/CN115359140A/en
Publication of CN115359140A publication Critical patent/CN115359140A/en
Withdrawn legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/003Reconstruction from projections, e.g. tomography
    • G06T11/005Specific pre-processing for tomographic reconstruction, e.g. calibration, source positioning, rebinning, scatter correction, retrospective gating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/40Filling a planar surface by adding surface attributes, e.g. colour or texture
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10081Computed x-ray tomography [CT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Abstract

The invention provides a system and a method for automatically delineating a multi-feature region of interest based on a neural network. The system comprises an image module, a preprocessing module, an automatic delineation module and an output module. The image module is used for acquiring images; the preprocessing module processes the image acquired by the image module so that it meets the image standard required by the automatic delineation module; the automatic delineation module performs ROI segmentation on the preprocessed image; and the output module processes the segmentation result output by the automatic delineation module so that it is in a format readable by other equipment. The invention can identify objects that have different characteristics but share the same label. The convolution parameters of the dynamic region-aware convolution adopted by the invention are matched automatically to the image characteristics of the input image blocks, so the invention can make correct judgments even when different blocks of the ROI in the input image have different characteristics.

Description

Automatic multi-feature region-of-interest delineation system and method based on neural network
Technical Field
The invention belongs to the field of image processing, relates to a system and a method for automatically delineating a multi-feature region of interest based on a neural network, and particularly relates to a method for identifying a region of interest with varied image features by applying neural network technology.
Background
Delineation or segmentation of a region of interest (ROI) on a medical image is an important step in image analysis and therapy planning. For example, before radiomics analysis, an ROI needs to be delineated on the image, and the image features of the ROI are then extracted and analyzed; before radiation therapy, the contours of organs or tumors need to be delineated on the medical image so that plan optimization and evaluation can be performed in a radiation therapy planning system.
Traditional manual delineation is time-consuming and labor-intensive, and its accuracy depends heavily on the effort and experience of the operator. Although automatic or semi-automatic delineation tools are available, they have drawbacks. Automatic delineation methods based on gray-level information are easily affected by factors such as image artifacts and low contrast, and when the image features inside the ROI are not uniformly expressed, the resulting gray-value differences also degrade delineation precision. The accuracy of automatic delineation based on template registration is affected by the template selection strategy, by organ volume changes (e.g. of the bladder), and by the robustness of the registration algorithm employed. Current automatic delineation models based on convolutional neural networks compute in the spatial domain with shared weights, which means that the model can correctly identify only image blocks with the same image characteristics. When the image features of different pixels or voxels within the ROI differ (for example, on a CT image, pixels of normal lung and of atelectatic lung both carry the label "lung", but their gray levels, and hence their image features, differ markedly), the accuracy of automatic delineation drops greatly.
Disclosure of Invention
The technical problem to be solved by the present invention is to overcome the defects of the prior art and to provide an automatic delineation system for a multi-feature region of interest based on a neural network, comprising an image module, a preprocessing module, an automatic delineation module and an output module.
The image module is used for acquiring an image; the preprocessing module processes the image acquired by the image module so that it meets the image standard required by the automatic delineation module; the automatic delineation module performs ROI segmentation on the preprocessed image; and the output module processes the segmentation result output by the automatic delineation module so that it is in a format readable by other equipment.
The invention also aims to provide a method for constructing the multi-feature region of interest automatic delineation system based on the neural network, which is realized by the following steps:
S1: The image module uses medical imaging equipment to acquire an image of the region of interest, denoted I_0.
In at least one embodiment of the present invention, the "medical imaging equipment" includes: Computed Tomography (CT), Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET) and PET-CT.
S2: The preprocessing module preprocesses I_0 so that it meets the image standard handled by the automatic delineation module; the result is denoted I_1.
In at least one embodiment of the present invention, I_0 is a CT image, and the preprocessing method comprises linearly converting the pixels of I_0 whose CT values lie in the range [u, v] to [0, 255]; CT values less than u are all converted to 0, and CT values greater than v are all converted to 255.
In at least one embodiment of the present invention, I_0 is a CT image, and the preprocessing method comprises linearly converting the pixels of I_0 whose CT values lie in the range [u, v] to [0, 1]; CT values less than u are all converted to 0, and CT values greater than v are all converted to 1.
In at least one embodiment of the invention, the preprocessing method comprises cropping and discarding the regions of I_0 that are not related to the ROI. For example, if I_0 is 512 pixels in both length and width, the pixels in rows 1 to 128, rows 385 to 512, columns 1 to 128 and columns 385 to 512 of I_0 are all discarded, so that I_1 is 256 pixels in both length and width.
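Purely as an illustration of the preprocessing described above, the following sketch applies the CT-value windowing and the cropping example to an image held in a numpy array; the function name, the window limits u and v, and the crop indices are illustrative assumptions and are not fixed by the invention.

```python
import numpy as np

def preprocess_ct(img0: np.ndarray, u: float = -1000.0, v: float = 400.0) -> np.ndarray:
    """Window CT values in [u, v] linearly to [0, 255] and crop away non-ROI regions."""
    img = np.clip(img0, u, v)                 # CT values < u become u, > v become v
    img = (img - u) / (v - u) * 255.0         # linear mapping of [u, v] onto [0, 255]
    # Example crop from the text: for a 512 x 512 image, discard rows/columns
    # 1-128 and 385-512 (1-based), keeping a 256 x 256 central block.
    return img[128:384, 128:384]
```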
S3: outputting the preprocessed image I from S2 1 Inputting an automatic drawing module, wherein an automatic drawing model M contained in the module is used for drawing an image I 1 ROI identification is performed to generate a divided mask image I output
In at least one embodiment of the present invention, the "split mask pattern I output "is the size and I 1 The same image, in which the pixels identified by M as objects are labeled 1 and the pixels identified by M as background (i.e., non-objects) are labeled 0. The numbers 1 and 0 are only labeled to distinguish whether the pixel is a target or not, and those skilled in the art can conceive of using different number labels to distinguish the pixel as a target or a background without creative efforts, and thus, the method of modifying the number labels does not go beyond the technical scope of the present invention.
S4: mask I output by output module pair S3 output And processing the data to generate a format which can be read by other equipment.
In at least one embodiment of the invention, the "mask pattern I for S3 output output By treating "is meant treating I 1 And I output The corresponding element multiplication is performed so that the image resulting from the multiplication contains only the gradation value of the target region, and the gradation value of the non-target region becomes 0.
In at least one embodiment of the invention, the "mask pattern I for S3 output output The "treatment" means: i is output According to I 1 The pixel regions cut out by S2 are filled with zero values, so that I after filling output And I 0 Are of uniform size, and then output And I 0 The corresponding element multiplication is performed so that the image resulting from the multiplication contains only the gradation value of the target region, and the gradation value of the non-target region becomes 0.
In at least one embodiment of the present invention, the "mask map I for S3 output output Processing means reading the mask image I output The coordinates of the edges of the medium target area, which may be two-dimensional or three-dimensional, are written into the file in accordance with the DICOM standard so that the file can be read by the DICOM processing software.
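As an illustration of the first two post-processing variants described above, the following sketch assumes numpy arrays holding the preprocessed image, the 0/1 mask and the crop offsets used in S2; all names and the 512 × 512 size are illustrative assumptions.

```python
import numpy as np

def masked_gray_values(i1: np.ndarray, i_output: np.ndarray) -> np.ndarray:
    """Element-wise product: keeps gray values of the target region, zeros elsewhere."""
    return i1 * i_output

def pad_mask_to_original(i_output: np.ndarray, original_shape=(512, 512),
                         row_off: int = 128, col_off: int = 128) -> np.ndarray:
    """Zero-fill the regions cropped in S2 so the mask matches the size of I_0."""
    full = np.zeros(original_shape, dtype=i_output.dtype)
    full[row_off:row_off + i_output.shape[0],
         col_off:col_off + i_output.shape[1]] = i_output
    return full
```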
In at least one embodiment of the present invention, the "automatic delineation model M" in step S3 is a neural network model established based on dynamic region-aware convolution.
The structure of the neural network model established based on dynamic region-aware convolution comprises: an input layer, dynamic region-aware convolutional layers, excitation function layers, batch normalization layers, transposed convolutional layers based on dynamic region-aware convolution, and an output layer, with the layers arranged according to a certain rule.
In a preferred embodiment of the present invention, the arrangement of the neural network model established based on dynamic region-aware convolution is, in order from input to output: an input layer, 5 combinations of a dynamic region-aware convolutional layer, an excitation function layer and a batch normalization layer, 5 combinations of a transposed convolutional layer based on dynamic region-aware convolution, an excitation function layer and a batch normalization layer, and an output layer.
The dynamic region-aware convolutional layer comprises: a feature extraction module, a feature encoding module, a convolution kernel generation module and a convolution calculation module. The feature extraction module extracts features from the different blocks of the layer's input image I'; each pixel or voxel of the image has a corresponding feature value, and these feature values together form a feature map F. The feature encoding module encodes the feature values in F with codes 1 to N, forming a code map I_N. The convolution kernel generation module generates N convolution kernels W = {W_1, W_2, …, W_N}. Based on the code map I_N, the convolution calculation module selects, at each pixel or voxel of I', the convolution kernel W_i corresponding to the code i ∈ [1, N] of that pixel or voxel, performs element-wise multiplication and summation, and outputs the corresponding value; these output values together form the output of the dynamic region-aware convolutional layer.
In at least one embodiment of the present invention, the feature extraction module of the dynamic region-aware convolutional layer convolves the input image I' with a single convolution kernel of size k_x × k_y × C or k_x × k_y × k_z × C (where k_x, k_y, k_z are the convolution kernel sizes along the x, y and z directions and C is the number of channels of the input image I') to obtain the feature map F; the feature encoding module evenly divides the feature values in F into N intervals ordered from small to large, and the feature values in the i-th interval are all encoded as i (i ∈ [1, N]).
In at least one embodiment of the present invention, the feature extraction module of the dynamic region-aware convolutional layer convolves the input image I' with N convolution kernels of size k_x × k_y × C or k_x × k_y × k_z × C (where k_x, k_y, k_z are the convolution kernel sizes along the x, y and z directions and C is the number of channels of the input image I') to obtain a feature map F with N channels, i.e., each pixel or voxel in F corresponds to N feature values; the feature encoding module encodes each pixel or voxel in F with the channel index of its maximum feature value, i.e., if the pixel with coordinates (t, p), or the voxel with coordinates (t, p, q), in the feature map F has feature values on N channels and the m-th channel (m ∈ [1, N]) has the largest feature value, then the code of that pixel or voxel is m.
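The following is a simplified, single-channel 2-D sketch of the dynamic region-aware convolution in the first embodiment above (one feature-extraction kernel, feature values split into N intervals); it interprets "evenly divides the feature values into N intervals" as equal-count intervals, uses plain numpy loops rather than a differentiable layer, and all names are illustrative.

```python
import numpy as np

def dr_conv2d(img: np.ndarray, feat_kernel: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """img: (H, W); feat_kernel: (k, k); kernels: (N, k, k) -> output (H-k+1, W-k+1)."""
    N, k, _ = kernels.shape
    H, W = img.shape
    out_h, out_w = H - k + 1, W - k + 1

    # 1) Feature extraction: slide one kernel over the image to build the feature map F.
    F = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            F[y, x] = np.sum(img[y:y + k, x:x + k] * feat_kernel)

    # 2) Feature encoding: split the feature values into N equal-count intervals
    #    (ascending order); positions in the i-th interval get code i -> code map I_N.
    cuts = np.quantile(F, np.linspace(0.0, 1.0, N + 1)[1:-1])
    codes = np.searchsorted(cuts, F)          # integer codes 0 .. N-1

    # 3) Convolution calculation: at every position, apply the kernel selected
    #    by that position's code.
    out = np.zeros_like(F)
    for y in range(out_h):
        for x in range(out_w):
            out[y, x] = np.sum(img[y:y + k, x:x + k] * kernels[codes[y, x]])
    return out
```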
In at least one embodiment of the present invention, initialization parameters of the N convolution kernels generated by the convolution kernel generation module of the dynamic region-aware convolutional layer are randomly generated.
In at least one embodiment of the present invention, the computation of the transposed convolutional layer based on dynamic region-aware convolution is as follows: (1) according to the layer's input parameters stride, padding and convolution kernel size (k_x × k_y or k_x × k_y × k_z), first insert stride − 1 zero values between every two adjacent pixels or voxels of the layer's input image I_t, and then pad the border of the image with k_x − padding − 1 rows and k_y − padding − 1 columns of zeros, or with k_x − padding − 1 rows, k_y − padding − 1 columns and k_z − padding − 1 layers of zeros, forming a new image I_w; (2) flip the convolution kernel parameters along the x and y directions, or along the x, y and z directions; (3) perform a dynamic region-aware convolution between the flipped convolution kernel parameters and I_w; the result of this convolution is the layer's output.
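The following sketch illustrates only steps (1) and (2) above for the 2-D case (zero insertion and border padding, then kernel flipping); step (3) is then an ordinary dynamic region-aware convolution on the expanded image. It assumes padding ≤ kernel size − 1, and the names are illustrative.

```python
import numpy as np

def expand_for_transposed_conv(img: np.ndarray, stride: int, padding: int, k: int) -> np.ndarray:
    """Step (1): insert stride-1 zeros between pixels, then pad k-padding-1 zeros around."""
    H, W = img.shape
    up = np.zeros((H + (H - 1) * (stride - 1), W + (W - 1) * (stride - 1)), dtype=img.dtype)
    up[::stride, ::stride] = img              # original pixels, zeros in between
    p = k - padding - 1                       # assumed non-negative
    return np.pad(up, ((p, p), (p, p)))

def flip_kernel(kernel: np.ndarray) -> np.ndarray:
    """Step (2): flip the kernel parameters along the x and y directions."""
    return kernel[::-1, ::-1]
```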
The training of the automatic delineation model M in step S3 comprises the following steps:
(1) Collecting a plurality of images I and ROI contours in the images;
(2) Convert the ROI contours into masks I_mask; each I_mask has the same size as its image I, the values of the pixels or voxels on the ROI contour and inside it are set to 1, and the values of the remaining pixels or voxels are set to 0;
(3) Preprocessing an image I, linearly converting the numerical value of the image I in the range of [ u, v ] into [0,255] or [0,1], converting the numerical value smaller than u into 0, and converting the numerical value larger than v into 255 or 1;
(4) Use the preprocessed images I and their corresponding mask images I_mask as the input and expected output of the automatic delineation model M, respectively, and apply a back-propagation algorithm to optimize the parameters of M until the overlap between the model output M(I) and I_mask reaches its maximum.
In at least one embodiment of the present invention, the measure used to evaluate "the overlap between M(I) and I_mask" is the Dice Similarity Coefficient (DSC), calculated as follows:
DSC = 2 × |M(I) ∩ I_mask| / (|M(I)| + |I_mask|)
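For reference, a minimal sketch of the DSC computation, assuming M(I) has been binarized and both inputs are 0/1 numpy arrays of the same shape; the small eps term is an illustrative guard against division by zero and is not part of the definition.

```python
import numpy as np

def dice_similarity_coefficient(pred: np.ndarray, mask: np.ndarray, eps: float = 1e-8) -> float:
    """DSC = 2 * |pred ∩ mask| / (|pred| + |mask|) for binary arrays."""
    intersection = np.sum(pred * mask)
    return float(2.0 * intersection / (np.sum(pred) + np.sum(mask) + eps))
```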
The invention provides a system and a method for automatically delineating a multi-feature region of interest based on a neural network, with the following advantage: the model provided by the invention can identify objects that have different characteristics but share the same label. At present, the most common neural-network-based automatic delineation model is the fully convolutional neural network, which operates with spatially shared standard convolutions, i.e., every block of the input image is judged with the same parameters; this means that when the image characteristics of the input blocks are inconsistent, the judgment accuracy of the model also suffers. The convolution parameters of the dynamic region-aware convolution adopted by the invention are matched automatically to the image characteristics of the input image blocks, so the invention can still be expected to make correct judgments when different blocks of the ROI in the input image have different characteristics.
Drawings
FIG. 1 is a schematic diagram of a neural network-based automatic delineation system for multi-feature regions of interest.
FIG. 2 is a flow chart of a method for automatically delineating a multi-feature region of interest based on a neural network.
FIG. 3 is a schematic diagram of the structure of a dynamic region-aware convolutional layer.
FIG. 4 is a schematic diagram of the calculation process of a dynamic region-aware convolutional layer (N = 5).
Detailed Description
The invention is further explained by the accompanying drawings and examples.
Example 1
A system for automatic delineation of a multi-feature region of interest based on a neural network, as shown in fig. 1, comprising: an image module, a preprocessing module, an automatic delineation module and an output module. The image module is used for acquiring an image; the preprocessing module processes the image acquired by the image module so that it meets the image standard required by the automatic delineation module; the automatic delineation module performs ROI segmentation on the preprocessed image; and the output module processes the segmentation result output by the automatic delineation module so that it is in a format readable by other equipment.
Example 2
A multi-feature region-of-interest automatic delineation method based on a neural network is disclosed, and is realized by the following steps as shown in FIG. 2:
S1: The image module uses medical imaging equipment to acquire an image of the region of interest, denoted I_0.
In at least one embodiment of the present invention, the "medical imaging equipment" includes: Computed Tomography (CT), Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET) and PET-CT.
S2: The preprocessing module preprocesses I_0 so that it meets the image standard handled by the automatic delineation module; the result is denoted I_1.
In at least one embodiment of the present invention, I_0 is a CT image, and the preprocessing method comprises linearly converting the pixels of I_0 whose CT values lie in the range [u, v] to [0, 255]; CT values less than u are all converted to 0, and CT values greater than v are all converted to 255.
In at least one embodiment of the present invention, I_0 is a CT image, and the preprocessing method comprises linearly converting the pixels of I_0 whose CT values lie in the range [u, v] to [0, 1]; CT values less than u are all converted to 0, and CT values greater than v are all converted to 1.
In at least one embodiment of the invention, the preprocessing method comprises cropping and discarding the regions of I_0 that are not related to the ROI. For example, if I_0 is 512 pixels in both length and width, the pixels in rows 1 to 128, rows 385 to 512, columns 1 to 128 and columns 385 to 512 of I_0 are all discarded, so that I_1 is 256 pixels in both length and width.
S3: outputting the preprocessed image I from S2 1 Inputting an automatic drawing module, wherein an automatic drawing model M contained in the module is used for drawing an image I 1 ROI identification is performed to generate a divided mask image I output
In at least one embodiment of the present invention, the "split mask map I" is output "is the size and I 1 The same image, in which the pixel identified by M as the target is marked 1, is identified by M as the background (i.e., non-target)The pixel is labeled 0. The numbers 1 and 0 are only labeled to distinguish whether the pixel is a target or not, and those skilled in the art can conceive of using different number labels to distinguish the pixel as a target or a background without creative efforts, and thus, the method of modifying the number labels does not go beyond the technical scope of the present invention.
S4: mask I output by output module pair S3 output And processing the data to generate a format which can be read by other equipment.
In at least one embodiment of the present invention, the "mask map I for S3 output output By treating "is meant treating I 1 And I output The corresponding element multiplication is performed so that the image resulting from the multiplication contains only the gradation value of the target region, and the gradation value of the non-target region becomes 0.
In at least one embodiment of the invention, the "mask pattern I for S3 output output The "treatment" means: I.C. A output According to I 1 The regions of the pixels clipped by S2 are filled with zeros, so that the filled I output And I 0 Are of uniform size, and output and I 0 The corresponding element multiplication is performed so that the image resulting from the multiplication contains only the gradation value of the target region, and the gradation value of the non-target region becomes 0.
In at least one embodiment of the present invention, the "mask map I for S3 output output Processing means reading the mask image I output The coordinates of the edges of the medium target area, which may be two-dimensional or three-dimensional, are written into the file in accordance with the DICOM standard so that the file can be read by the DICOM processing software.
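One possible way to obtain the edge coordinates of the target region from a two-dimensional mask I_output before writing them out in DICOM form is sketched below; the invention does not prescribe a particular library, so the use of scikit-image here is only an illustrative assumption.

```python
import numpy as np
from skimage import measure

def mask_edge_coordinates(i_output: np.ndarray, level: float = 0.5):
    """Return a list of (row, col) coordinate arrays, one per closed contour of the mask."""
    return measure.find_contours(i_output.astype(float), level)
```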
Example 3
In at least one embodiment of the present invention, the "automatic delineation model M" in step S3 is a neural network model established based on dynamic region-aware convolution.
The structure of the neural network model established based on dynamic region-aware convolution comprises: an input layer, dynamic region-aware convolutional layers, excitation function layers, batch normalization layers, transposed convolutional layers based on dynamic region-aware convolution, and an output layer, with the layers arranged according to a certain rule.
In a preferred embodiment of the present invention, the arrangement of the neural network model established based on dynamic region-aware convolution is, in order from input to output: an input layer, 5 combinations of a dynamic region-aware convolutional layer, an excitation function layer and a batch normalization layer, 5 combinations of a transposed convolutional layer based on dynamic region-aware convolution, an excitation function layer and a batch normalization layer, and an output layer.
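A schematic, runnable stand-in for this arrangement is sketched below. Plain Conv2d/ConvTranspose2d layers are used only as placeholders for the dynamic region-aware convolutional layer and its transposed variant; the channel widths, kernel size and ReLU activation are illustrative assumptions, not values specified by the invention.

```python
import torch.nn as nn

def build_model_m_skeleton(in_ch: int = 1, n_classes: int = 1) -> nn.Sequential:
    """Input layer, 5 encoder combinations, 5 decoder combinations, output layer."""
    widths = [16, 32, 64, 128, 256]
    layers, ch = [], in_ch
    for w in widths:      # 5 x (convolution + excitation function + batch normalization)
        layers += [nn.Conv2d(ch, w, 3, stride=2, padding=1), nn.ReLU(), nn.BatchNorm2d(w)]
        ch = w
    for w in reversed(widths):   # 5 x (transposed convolution + excitation + batch norm)
        layers += [nn.ConvTranspose2d(ch, w, 3, stride=2, padding=1, output_padding=1),
                   nn.ReLU(), nn.BatchNorm2d(w)]
        ch = w
    layers.append(nn.Conv2d(ch, n_classes, kernel_size=1))   # output layer
    return nn.Sequential(*layers)
```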
As shown in fig. 3, the dynamic region-aware convolutional layer comprises: a feature extraction module, a feature encoding module, a convolution kernel generation module and a convolution calculation module. The operation of the layer is shown in fig. 4: the feature extraction module extracts features from the different blocks of the layer's input image I'; each pixel or voxel of the image has a corresponding feature value, and these feature values together form a feature map F. The feature encoding module encodes the feature values in F with codes 1 to N, forming a code map I_N. The convolution kernel generation module generates N convolution kernels W = {W_1, W_2, …, W_N}. Based on the code map I_N, the convolution calculation module selects, at each pixel or voxel of I', the convolution kernel W_i corresponding to the code i ∈ [1, N] of that pixel or voxel, performs element-wise multiplication and summation, and outputs the corresponding value; these output values together form the output of the dynamic region-aware convolutional layer.
In at least one embodiment of the present invention, the feature extraction module of the dynamic region-aware convolutional layer convolves the input image I' with a single convolution kernel of size k_x × k_y × C or k_x × k_y × k_z × C (where k_x, k_y, k_z are the convolution kernel sizes along the x, y and z directions and C is the number of channels of the input image I') to obtain the feature map F; the feature encoding module evenly divides the feature values in F into N intervals ordered from small to large, and the feature values in the i-th interval are all encoded as i (i ∈ [1, N]).
In at least one embodiment of the present invention, the feature extraction module of the dynamic region-aware convolutional layer convolves the input image I' with N convolution kernels of size k_x × k_y × C or k_x × k_y × k_z × C (where k_x, k_y, k_z are the convolution kernel sizes along the x, y and z directions and C is the number of channels of the input image I') to obtain a feature map F with N channels, i.e., each pixel or voxel in F corresponds to N feature values; the feature encoding module encodes each pixel or voxel in F with the channel index of its maximum feature value, i.e., if the pixel with coordinates (t, p), or the voxel with coordinates (t, p, q), in the feature map F has feature values on N channels and the m-th channel (m ∈ [1, N]) has the largest feature value, then the code of that pixel or voxel is m.
In at least one embodiment of the present invention, initialization parameters of the N convolution kernels generated by the convolution kernel generation module of the dynamic region-aware convolutional layer are randomly generated.
In at least one embodiment of the present invention, the computation of the transposed convolutional layer based on dynamic region-aware convolution is as follows: (1) according to the layer's input parameters stride, padding and convolution kernel size (k_x × k_y or k_x × k_y × k_z), first insert stride − 1 zero values between every two adjacent pixels or voxels of the layer's input image I_t, and then pad the border of the image with k_x − padding − 1 rows and k_y − padding − 1 columns of zeros, or with k_x − padding − 1 rows, k_y − padding − 1 columns and k_z − padding − 1 layers of zeros, forming a new image I_w; (2) flip the convolution kernel parameters along the x and y directions, or along the x, y and z directions; (3) perform a dynamic region-aware convolution between the flipped convolution kernel parameters and I_w; the result of this convolution is the layer's output.
The training of the automatic delineation model M in step S3 comprises the following steps:
(1) Collecting a plurality of images I and ROI contours in the images;
(2) Convert the ROI contours into masks I_mask; each I_mask has the same size as its image I, the values of the pixels or voxels on the ROI contour and inside it are set to 1, and the values of the remaining pixels or voxels are set to 0;
(3) Preprocessing an image I, linearly converting the numerical value of the image I in the range of [ u, v ] into [0,255] or [0,1], converting the numerical value smaller than u into 0, and converting the numerical value larger than v into 255 or 1;
(4) Use the preprocessed images I and their corresponding mask images I_mask as the input and expected output of the automatic delineation model M, respectively, and apply a back-propagation algorithm to optimize the parameters of M until the overlap between the model output M(I) and I_mask reaches its maximum.
In at least one embodiment of the present invention, the measure used to evaluate "the overlap between M(I) and I_mask" is the Dice Similarity Coefficient (DSC), calculated as follows:
DSC = 2 × |M(I) ∩ I_mask| / (|M(I)| + |I_mask|)
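As an illustration of training step (4), the sketch below assumes the model M is a PyTorch module and that maximizing the overlap is implemented by minimizing a soft Dice loss (1 − DSC); the optimizer, learning rate and data loader are illustrative assumptions.

```python
import torch

def soft_dice_loss(pred: torch.Tensor, mask: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """1 - DSC, computed on sigmoid probabilities so it is differentiable."""
    pred = torch.sigmoid(pred)
    inter = (pred * mask).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + mask.sum() + eps)

def train(model: torch.nn.Module, loader, epochs: int = 50, lr: float = 1e-3) -> None:
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for img, mask in loader:                     # pairs of preprocessed I and I_mask
            opt.zero_grad()
            loss = soft_dice_loss(model(img), mask)  # back-propagation optimizes M
            loss.backward()
            opt.step()
```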

Claims (10)

1. an automatic multi-feature region-of-interest delineation system based on a neural network is characterized by comprising an image module, a preprocessing module, an automatic delineation module and an output module; the image module is used for acquiring images; the preprocessing module is used for processing the image acquired by the image module so as to enable the image to meet the image standard required by the automatic drawing module; the automatic delineation module carries out ROI segmentation on the preprocessed image; and the output module processes the segmentation result output by the automatic delineation module to enable the segmentation result to be in a format which can be read by other equipment.
2. The method for constructing an automatic delineation system of claim 1, wherein the method is implemented by the steps of:
s1: the image module adopts medical imaging equipment to shoot images of the region of interest, and the images are marked as I 0
The medical imaging apparatus includes: computed tomography images, magnetic resonance imaging, positron emission tomography, PET-CT;
s2: pretreatment module pair I 0 Preprocessing to make it meet the image standard processed by the automatic drawing module and marking as I 1
S3: outputting the preprocessed image I from S2 1 Inputting an automatic drawing module, wherein an automatic drawing model M contained in the module is used for drawing an image I 1 ROI identification is performed to generate a divided mask image I output
The split mask pattern I output Is the size and I 1 The same image, in which the pixel identified as the target by M is marked as 1, and the pixel identified as the non-target by M is marked as 0;
s4: mask I output by output module pair S3 output And processing the data to generate a format which can be read by other equipment.
3. The construction method of claim 2, wherein, when I_0 in step S2 is a CT image, the preprocessing includes one or more of the following methods: (1) linearly converting the pixels of I_0 whose CT values lie in the range [u, v] to [0, 255] or [0, 1], with CT values less than u all converted to 0 and CT values greater than v all converted to 255 or 1; (2) cropping and discarding the regions of I_0 that are not related to the ROI.
4. The construction method according to claim 2, wherein the automatic delineation model M in step S3 is a neural network model established based on dynamic region-aware convolution, and the structure of the model includes: an input layer, dynamic region-aware convolutional layers, excitation function layers, batch normalization layers, transposed convolutional layers based on dynamic region-aware convolution, and an output layer, with the layers arranged according to a certain rule.
5. The building method according to claim 2, wherein the training step of the automatic delineation model M comprises:
(1) Collecting a plurality of images I and ROI contours in the images;
(2) Convert the ROI contours into masks I_mask; each I_mask has the same size as its image I, the values of the pixels or voxels on the ROI contour and inside it are set to 1, and the values of the remaining pixels or voxels are set to 0;
(3) Preprocess the image I, linearly converting its values in the range [u, v] to [0, 255] or [0, 1], converting values smaller than u to 0 and values larger than v to 255 or 1;
(4) Use the preprocessed images I and their corresponding mask images I_mask as the input and expected output of the automatic delineation model M, respectively, and apply a back-propagation algorithm to optimize the parameters of M until the overlap between the model output M(I) and I_mask reaches its maximum;
wherein the measure used to evaluate the overlap between M(I) and I_mask is the Dice Similarity Coefficient (DSC), calculated as follows:
DSC = 2 × |M(I) ∩ I_mask| / (|M(I)| + |I_mask|)
6. The construction method according to claim 2, wherein processing the mask I_output of S3 in step S4 includes one or more of the following methods:
(1) multiplying I_1 and I_output element by element, so that the resulting image contains only the gray values of the target region and the gray values of the non-target region become 0;
(2) filling the pixel regions cropped from I_1 in S2 back into I_output with zero values, so that the filled I_output and I_0 have the same size, and then multiplying the filled I_output and I_0 element by element, so that the resulting image contains only the gray values of the target region and the gray values of the non-target region become 0;
(3) reading the edge coordinates of the target region in the mask image I_output, which are two-dimensional or three-dimensional coordinates, and writing them into a file in accordance with the DICOM standard so that the file can be read by DICOM processing software.
7. The construction method according to claim 4, wherein the arrangement of the neural network model established based on dynamic region-aware convolution is, in order from input to output: an input layer, 5 combinations of a dynamic region-aware convolutional layer, an excitation function layer and a batch normalization layer, 5 combinations of a transposed convolutional layer based on dynamic region-aware convolution, an excitation function layer and a batch normalization layer, and an output layer;
the dynamic region-aware convolutional layer comprises: a feature extraction module, a feature encoding module, a convolution kernel generation module and a convolution calculation module; the feature extraction module extracts features from the different blocks of the layer's input image I'; each pixel or voxel of the image has a corresponding feature value, and these feature values together form a feature map F; the feature encoding module encodes the feature values in F with codes 1 to N, forming a code map I_N; the convolution kernel generation module generates N convolution kernels W = {W_1, W_2, …, W_N}; based on the code map I_N, the convolution calculation module selects, at each pixel or voxel of I', the convolution kernel W_i corresponding to the code i ∈ [1, N] of that pixel or voxel, performs element-wise multiplication and summation, and outputs the corresponding value; these output values together form the output of the dynamic region-aware convolutional layer.
8. The construction method of claim 7, wherein the computation of the feature extraction module of the dynamic region-aware convolutional layer includes one or more of the following methods:
(1) convolving the input image I' with 1 convolution kernel of size k_x × k_y × C or k_x × k_y × k_z × C (where k_x, k_y, k_z are the convolution kernel sizes along the x, y and z directions and C is the number of channels of the input image I') to obtain the feature map F; the feature encoding module evenly divides the feature values in F into N intervals ordered from small to large, and the feature values in the i-th interval are all encoded as i (i ∈ [1, N]);
(2) convolving the input image I' with N convolution kernels of size k_x × k_y × C or k_x × k_y × k_z × C (where k_x, k_y, k_z are the convolution kernel sizes along the x, y and z directions and C is the number of channels of the input image I') to obtain a feature map F with N channels, i.e., each pixel or voxel in F corresponds to N feature values; the feature encoding module encodes each pixel or voxel in F with the channel index of its maximum feature value, i.e., if the pixel with coordinates (t, p), or the voxel with coordinates (t, p, q), in the feature map F has feature values on N channels and the m-th channel (m ∈ [1, N]) has the largest feature value, then the code of that pixel or voxel is m.
9. The building method according to claim 7, wherein initialization parameters of the N convolution kernels generated by the convolution kernel generation module of the dynamic region-aware convolution layer are randomly generated.
10. The construction method according to claim 7, wherein the computation of the transposed convolutional layer based on dynamic region-aware convolution is: (1) according to the layer's input parameters stride, padding and convolution kernel size (k_x × k_y or k_x × k_y × k_z), first insert stride − 1 zero values between every two adjacent pixels or voxels of the layer's input image I_t, and then pad the border of the image with k_x − padding − 1 rows and k_y − padding − 1 columns of zeros, or with k_x − padding − 1 rows, k_y − padding − 1 columns and k_z − padding − 1 layers of zeros, forming a new image I_w; (2) flip the convolution kernel parameters along the x and y directions, or along the x, y and z directions; (3) perform a dynamic region-aware convolution between the flipped convolution kernel parameters and I_w; the result of this convolution is the layer's output.
CN202211013370.XA 2022-08-23 2022-08-23 Automatic multi-feature region-of-interest delineation system and method based on neural network Withdrawn CN115359140A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211013370.XA CN115359140A (en) 2022-08-23 2022-08-23 Automatic multi-feature region-of-interest delineation system and method based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211013370.XA CN115359140A (en) 2022-08-23 2022-08-23 Automatic multi-feature region-of-interest delineation system and method based on neural network

Publications (1)

Publication Number Publication Date
CN115359140A true CN115359140A (en) 2022-11-18

Family

ID=84002177

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211013370.XA Withdrawn CN115359140A (en) 2022-08-23 2022-08-23 Automatic multi-feature region-of-interest delineation system and method based on neural network

Country Status (1)

Country Link
CN (1) CN115359140A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117038019A (en) * 2023-10-09 2023-11-10 四川省肿瘤医院 Focal contour drawing method and focal contour drawing system thereof
CN117038019B (en) * 2023-10-09 2023-12-19 四川省肿瘤医院 Focal contour drawing method and focal contour drawing system thereof

Similar Documents

Publication Publication Date Title
Jiang et al. Multiple resolution residually connected feature streams for automatic lung tumor segmentation from CT images
EP3480786A1 (en) Medical image object detection with dense feature pyramid network architecture in machine learning
CN110176012B (en) Object segmentation method in image, pooling method, device and storage medium
Birenbaum et al. Longitudinal multiple sclerosis lesion segmentation using multi-view convolutional neural networks
CN111145181B (en) Skeleton CT image three-dimensional segmentation method based on multi-view separation convolutional neural network
CN113313234A (en) Neural network system and method for image segmentation
JP2019531783A5 (en)
CN107146228A (en) A kind of super voxel generation method of brain magnetic resonance image based on priori
CN104881680A (en) Alzheimer's disease and mild cognitive impairment identification method based on two-dimension features and three-dimension features
CN110930378B (en) Emphysema image processing method and system based on low data demand
US20230043026A1 (en) Learning-based active surface model for medical image segmentation
Ortiz et al. Unsupervised neural techniques applied to MR brain image segmentation
CN110648331A (en) Detection method for medical image segmentation, medical image segmentation method and device
CN111080658A (en) Cervical MRI image segmentation method based on deformable registration and DCNN
CN112598613A (en) Determination method based on depth image segmentation and recognition for intelligent lung cancer diagnosis
CN110334566A (en) Fingerprint extraction method inside and outside a kind of OCT based on three-dimensional full convolutional neural networks
Sokooti et al. Hierarchical prediction of registration misalignment using a convolutional LSTM: Application to chest CT scans
CN115359140A (en) Automatic multi-feature region-of-interest delineation system and method based on neural network
CN110992310A (en) Method and device for determining partition where mediastinal lymph node is located
CN108597589B (en) Model generation method, target detection method and medical imaging system
CN111783796A (en) PET/CT image recognition system based on depth feature fusion
Korez et al. Segmentation of pathological spines in CT images using a two-way CNN and a collision-based model
CN115761230A (en) Spine segmentation method based on three-dimensional image
CN112581513B (en) Cone beam computed tomography image feature extraction and corresponding method
Francis et al. SABOS‐Net: Self‐supervised attention based network for automatic organ segmentation of head and neck CT images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20221118