CN110473195B

CN110473195B - Medical focus detection framework and method capable of being customized automatically

Info

Publication number: CN110473195B
Application number: CN201910743751.5A
Authority: CN
Inventors: 梁小丹; 王绍菊; 林冰倩; 林倞
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2019-08-13
Filing date: 2019-08-13
Publication date: 2023-04-18
Anticipated expiration: 2039-08-13
Also published as: CN110473195A

Abstract

The invention discloses an automatically customizable medical lesion detection framework and a corresponding method. The framework comprises: a candidate feature extraction module, which extracts candidate features from a medical image; a lesion detection network head automatic customization module, which defines a search space that incorporates the perceptual relations among candidate regions and uses a differentiable NAS algorithm to obtain the optimal lesion detection network head; an optimal lesion detection network head module, which obtains new candidate features by passing the candidate features through a convolution layer, one standard cell and two contracted cells, performs binary classification and bounding-box regression on them through two fully connected layers, and outputs the weights M of the binary classifier to the knowledge migration module as high-level semantic information; and a knowledge migration module, which obtains enhanced candidate features by combining semantic relations and propagating related context information across regions, combines the enhanced candidate features with the original candidate features, and finally performs multi-class classification and regression through fully connected layers.

Description

Medical focus detection framework and method capable of being customized automatically

Technical Field

The invention relates to the technical fields of image recognition, object detection and deep learning, and in particular to an automatically customizable medical lesion detection framework and method.

Background

The object detection task is to find all objects of interest in an image and determine their positions and sizes. The medical lesion detection task is to find all lesions in a medical image and determine their positions and sizes; this is an important prerequisite for computer-aided detection/diagnosis (CADe/CADx). The rapid development of deep learning algorithms, particularly convolutional neural networks (CNNs), has brought significant progress to medical lesion detection. However, most existing methods directly apply CNN object detection models pre-trained on natural images, such as RetinaNet and the region-based fully convolutional network (R-FCN), to lesion detection in medical images. There is a large domain gap between medical images and natural images, and medical lesion detection poses its own challenges: lesions closely resemble the background, lesion types are heterogeneous, and most lesions are small. As a result, directly reusing conventional natural-image detection models yields limited performance. It is therefore necessary to customize a network architecture specifically for medical lesion detection.

Recently, neural architecture search (NAS) has achieved very competitive performance in tasks such as image classification, semantic segmentation and natural image processing. The purpose of NAS is to automatically search for an optimal neural network architecture for a given task objective, thereby going beyond the limits of manual design and achieving better performance. Existing NAS work on object detection, however, only transfers architectures searched on image classification tasks to the detection backbone, and it consumes a large amount of GPU memory and time.

A traditional object detection framework mainly comprises three parts: a feature extractor, a region proposal network (RPN) and a region-based CNN head. It is well known that pre-training the feature extractor and the RPN on natural-image datasets such as ImageNet benefits lesion detection, and both are important components of medical lesion detection networks. Region-based CNN heads have been studied extensively and, as shown in FIG. 1, fall mainly into three types: 1) the Receptive Field Head (RFH), which takes receptive fields of multiple sizes and shapes into account to emphasize the region near the center and to increase insensitivity to small spatial shifts; 2) the Fully Connected Head (FCH), whose parameters are redundant and which ignores spatial information; 3) the Residual Bottleneck Head (RBH), which uses residual bottleneck modules to enhance candidate information, merging features of different levels and avoiding vanishing gradients through skip connections, but whose performance is limited by its single receptive field.

Disclosure of Invention

In order to overcome the above deficiencies of the prior art, the present invention provides an automatically customizable medical lesion detection framework and method, so as to realize, in a seamless manner, a unified multi-type lesion detection network that shares related information across multiple lesion types.

To achieve the above object, the present invention provides an automatically customizable medical lesion detection framework, comprising:

a candidate feature extraction module, configured to perform feature extraction on the input medical image and extract candidate features of the image;

a lesion detection network head automatic customization module, configured to define a new search space according to the characteristics of medical images, the characteristics of lesions and relevant knowledge of object detection, wherein the search space comprises a number of advanced sub-network operations, such as operations with flexible receptive fields and skip connections, and additionally includes a non-local operation that incorporates the perceptual relations among candidate regions, and to use a differentiable NAS algorithm to search the designed search space, according to the candidate features, for suitable operations and connection patterns that form an optimal lesion detection network head suited to medical images;

an optimal lesion detection network head module, which is the optimal lesion detection network head customized by the lesion detection network head automatic customization module, and which passes the candidate features from the candidate feature extraction module through a convolution layer with a 3 × 3 kernel, one standard cell and two contracted cells to obtain new candidate features, performs binary classification and bounding-box regression on the new candidate features through two fully connected layers, and outputs the weights M of the binary classifier to the knowledge migration module as high-level semantic information;

a knowledge migration module, which combines semantic relations on the basis of the region relation graph learned by the optimal lesion detection network head, propagates related context information across regions to obtain enhanced candidate features, combines the enhanced candidate features with the original candidate features so as to share related information across multiple lesion types, and finally performs multi-class classification and regression through fully connected layers.

Preferably, the search space comprises the following 9 operations: 1) no connection; 2) skip connection; 3) 3 × 3 average pooling; 4) non-local; 5) 1 × 3 and 3 × 1 convolution; 6) 3 × 3 depthwise separable convolution; 7) 5 × 5 depthwise separable convolution; 8) 3 × 3 dilated convolution with a dilation rate of 3; 9) 3 × 3 dilated convolution with a dilation rate of 5.
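To make the "flexible receptive field" of these operations concrete, the short sketch below computes the span that a single dilated kernel covers along one axis, using the standard relation k + (k - 1)(d - 1) for kernel size k and dilation rate d. The function name and the printed labels are illustrative assumptions and are not taken from the patent.

```python
def effective_kernel(kernel: int, dilation: int) -> int:
    """Span (in pixels) covered by one dilated kernel along a single axis."""
    return kernel + (kernel - 1) * (dilation - 1)


# Candidate convolutions from the search space (labels are illustrative).
candidates = [
    ("3x3 convolution, dilation 1", 3, 1),
    ("3x3 convolution, dilation 3", 3, 3),
    ("3x3 convolution, dilation 5", 3, 5),
    ("5x5 convolution, dilation 1", 5, 1),
]

for name, kernel, dilation in candidates:
    print(f"{name}: covers {effective_kernel(kernel, dilation)} pixels per axis")
```

With the same nine weights per channel, a 3 × 3 kernel therefore spans 7 pixels at dilation rate 3 and 11 pixels at dilation rate 5, which is why the dilated operations widen the receptive field without adding parameters.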

Preferably, in the differentiable NAS algorithm, a suitable search space is first designed according to the task, and the modules to be searched are then defined, namely a standard cell and a contracted cell. The standard cell has a stride of 1, so that the output resolution equals the input resolution and the number of channels is unchanged; the contracted cell has a stride of 2, so that the resolution is halved and the number of channels is doubled. Each module (cell) is regarded as a directed acyclic graph with a defined number of branches, in which each branch represents a feature map and each connection between branches represents an operation. After the search is defined, the cells are initialized, the discrete structure of the branches is made continuous through a softmax function, and the weights of the branches are updated by back-propagating gradients with a gradient descent algorithm. Finally, after searching for a certain time, only the operation with the largest weight among the 9 candidate operations on each connection is retained, turning the dense connections into sparse ones; the two connections with the largest weights are then selected as the inputs of each branch, and the results of these two connections are combined as its output.

Preferably, the 9 candidate operations are specifically defined as follows:

1) No connection: there is no connection between the branches;

2) Skip connection: the branches are directly connected without any operation;

3) 3 × 3 average pooling: average pooling with a 3 × 3 pooling kernel;

4) 1 × 3 and 3 × 1 convolution: a ReLU activation layer, a convolution layer with a 1 × 3 kernel, a batch normalization layer, a ReLU activation layer, a convolution layer with a 3 × 1 kernel, and a batch normalization layer;

5) 3 × 3 depthwise separable convolution: a ReLU activation layer, a convolution layer with a 3 × 3 kernel, a convolution layer with a 1 × 1 kernel, a batch normalization layer, a ReLU activation layer, a convolution layer with a 3 × 3 kernel, a convolution layer with a 1 × 1 kernel, and a batch normalization layer;

6) 5 × 5 depthwise separable convolution: a ReLU activation layer, a convolution layer with a 5 × 5 kernel, a convolution layer with a 1 × 1 kernel, a batch normalization layer, a ReLU activation layer, a convolution layer with a 5 × 5 kernel, a convolution layer with a 1 × 1 kernel, and a batch normalization layer;

7) 3 × 3 dilated convolution with a dilation rate of 3: a ReLU activation layer, a convolution layer with a 3 × 3 kernel and a dilation rate of 3, a convolution layer with a 1 × 1 kernel, and a batch normalization layer;

8) 3 × 3 dilated convolution with a dilation rate of 5: a ReLU activation layer, a convolution layer with a 3 × 3 kernel and a dilation rate of 5, a convolution layer with a 1 × 1 kernel, and a batch normalization layer;

9) Non-local operation: its purpose is to encode the semantic information between region candidates that is relevant to object detection.

Preferably, in the non-local operation, the relationship between regions is represented as a region-to-region undirected graph $G = \langle N, E \rangle$, in which each node in $N$ corresponds to a region candidate and each edge $e_{i,j} \in E$ encodes the relationship between two nodes. The input of the non-local operation is denoted $X$.

The adjacency matrix of the undirected graph $G$ can be computed by matrix multiplication as $E = \mathrm{softmax}(\phi(X)\,\phi(X)^{T})$, where $\phi(\cdot)$ is a nonlinear transformation with a ReLU activation function. The information of each node is then propagated over $E$ by a graph convolution layer $Y = \sigma(E\,f(X)\,W)$, where $f(\cdot)$ is a nonlinear transformation, $W$ is a weight matrix and $\sigma$ is an activation function. Finally, a fully connected layer is attached to keep the input and output sizes consistent.
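The non-local operation can be illustrated with a small pure-Python sketch. It assumes that the adjacency is the row-wise softmax of φ(X)φ(X)ᵀ, that φ and f are each a linear map followed by ReLU, and that σ is also ReLU; the weight matrices and the toy input are hypothetical, and the final fully connected layer is omitted.

```python
import math


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def relu(a):
    return [[max(0.0, v) for v in row] for row in a]


def softmax_rows(a):
    """Row-wise softmax with the usual max-subtraction for stability."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def non_local(x, w_phi, w_f, w):
    """x: one feature row per region candidate. Returns (E, Y)."""
    phi = relu(matmul(x, w_phi))                     # phi(X), assumed linear + ReLU
    e = softmax_rows(matmul(phi, transpose(phi)))    # E = softmax(phi(X) phi(X)^T)
    f_x = relu(matmul(x, w_f))                       # f(X), assumed linear + ReLU
    y = relu(matmul(matmul(e, f_x), w))              # Y = sigma(E f(X) W), sigma assumed ReLU
    return e, y


# Hypothetical toy input: 3 region candidates with 2 features each.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
e, y = non_local(x, identity, identity, identity)
print(len(e), len(e[0]), len(y), len(y[0]))
```

Each row of E is a probability distribution over regions, so every output row of Y is a relation-weighted average of the transformed features of all region candidates.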

Preferably, the search strategy of the lesion detection network head automatic customization module is to optimize a continuous relaxation of the discrete architecture with a stochastic gradient descent algorithm, learning a set of continuous architecture weights $a = \{a_{ij}^{k}\}$. The output tensor $Y_j$ of branch $j$ is a weighted mixture of the candidate operations, expressed as

$$Y_j = \sum_{i<j} \sum_{k=1}^{9} \frac{\exp(a_{ij}^{k})}{\sum_{k'=1}^{9} \exp(a_{ij}^{k'})}\, \mathrm{OP}_k(Y_i),$$

where the weights $a_{ij}^{k}$ are the architecture parameters.

Preferably, the training data is divided into two disjoint subsets, a training set and a validation set, and the optimization is performed by iterating the following two steps: 1) update the network weights $w$ by descending $\nabla_w L_{train}(w, a)$; 2) update the architecture weights $a$ by descending $\nabla_a L_{val}(w, a)$. Here $L_{train}(w, a)$ and $L_{val}(w, a)$ are the losses on the training set and the validation set, respectively.
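The alternation between the two updates can be sketched on a toy problem. The example below is only an illustration under stated assumptions: the "network" is a single scalar weight w, the "architecture" is a single scalar a, the prediction is a·w·x, both losses are squared errors on one hypothetical sample each, and a first-order update (no second-order term) is used for a.

```python
def alternating_search(steps=200, lr=0.05):
    """Alternate one gradient step on w (training loss) and one on a (validation loss)."""
    w, a = 0.5, 0.5                  # hypothetical initial values
    x_train, y_train = 1.0, 2.0      # hypothetical training sample
    x_val, y_val = 2.0, 4.0          # hypothetical validation sample
    for _ in range(steps):
        # Step 1: update the network weight w with the gradient of L_train(w, a).
        grad_w = 2.0 * (a * w * x_train - y_train) * a * x_train
        w -= lr * grad_w
        # Step 2: update the architecture weight a with the gradient of L_val(w, a).
        grad_a = 2.0 * (a * w * x_val - y_val) * w * x_val
        a -= lr * grad_a
    return w, a


w, a = alternating_search()
print(f"w={w:.4f}, a={a:.4f}, a*w={a * w:.4f}")
```

Both samples are consistent with a·w = 2, so the two alternating updates drive the product towards that value; in the actual search each step would use mini-batches from the respective subset rather than a single sample.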

Preferably, the knowledge migration module implements the following process:

a. The weights and biases of the original binary classifier, $M \in \mathbb{R}^{2 \times (P+1)}$, are collected as high-level semantic information, where $P$ is the dimension of the AutoRCNN Head output tensor. A soft mapping $\Gamma^{s}$ with entries $\Gamma^{s}_{ij} = \exp(s_{ij}) / \sum_{j} \exp(s_{ij})$ is used to map $M$ onto the regions, where $s_{ij}$ is the score with which the original classification layer classifies region $i$ as class $j$.

b. Graph reasoning is performed by matrix multiplication to obtain the enhanced features $f_o$, i.e. $f_o = E\,\Gamma^{s} M W_o$, where $W_o \in \mathbb{R}^{(P+1) \times o}$ is a weight transformation matrix and $o$ is the output dimension of the knowledge migration module. The enhanced features $f_o$ are combined with the original features $f$ to improve the localization and classification performance of multi-type lesion detection.
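A minimal sketch of steps a and b follows. It assumes that the soft mapping is a row-wise softmax over the binary classification scores, that the classifier matrix M has shape 2 × (P+1), that W_o has shape (P+1) × o, and that "combining" means concatenating the enhanced and original features per region; all numeric values are hypothetical.

```python
import math


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(a):
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def knowledge_migration(e, scores, m, w_o, f):
    """Return the original features concatenated with f_o = E * Gamma * M * W_o."""
    gamma = softmax_rows(scores)                       # soft mapping, N x 2
    f_o = matmul(matmul(matmul(e, gamma), m), w_o)     # enhanced features, N x o
    return [orig + enh for orig, enh in zip(f, f_o)]   # concatenation (assumption)


# Hypothetical toy values: N = 3 regions, P = 2, o = 2.
e = [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]   # region-to-region graph E
scores = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]              # binary scores s_ij
m = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]                     # classifier weights and bias
w_o = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]                 # weight transformation matrix
f = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]                   # original features
combined = knowledge_migration(e, scores, m, w_o, f)
print(combined[2])
```

Because the mapping is soft, a region with an uncertain binary score (the third one here) receives an even blend of the two classifier rows instead of committing to a possibly wrong hard label.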

To achieve the above object, the present invention further provides an automatically customizable medical lesion detection method, comprising the following steps:

step S1, performing feature extraction on the input medical image with a candidate feature extraction module to extract candidate features of the image;

step S2, using the optimal lesion detection network head module to pass the candidate features output by the candidate feature extraction module first through a convolution layer with a 3 × 3 kernel and then through one standard cell and two contracted cells to obtain new candidate features, performing binary classification and bounding-box regression on the new candidate features through two fully connected layers, and outputting the weights M of the binary classifier to a knowledge migration module as high-level semantic information;

step S3, using the knowledge migration module to combine semantic relations on the basis of the region relation graph learned by the optimal lesion detection network head, propagating related context information across regions to obtain enhanced candidate features, combining the enhanced candidate features with the original candidate features so as to share related information across multiple lesion types, and finally performing multi-class classification and regression through fully connected layers.

Preferably, the method further comprises:

step S0, defining a new search space according to the characteristics of medical images, the characteristics of lesions and relevant knowledge of object detection, wherein the search space comprises a number of advanced sub-network operations, such as operations with flexible receptive fields and skip connections, and additionally includes a non-local operation that incorporates the perceptual relations among candidate regions, and using a differentiable NAS algorithm to search the designed search space, according to the candidate features, for suitable operations and connection patterns that form an optimal lesion detection network head suited to medical images.

Compared with the prior art, the automatically customizable medical lesion detection framework and method of the present invention automatically customize, for the medical lesion detection task, a detection network head suited to medical images, and realize a unified multi-type lesion detection network in a seamless manner through the knowledge migration module, thereby improving lesion detection accuracy and adding multi-type lesion detection capability. The medical lesion detection network head and the knowledge migration module can be built on any general-purpose natural-image detection framework, improve its performance to varying degrees, and bring out the best performance of that framework.

Drawings

FIG. 1 is a prior art RCNN Head architecture diagram;

FIG. 2 is a schematic diagram of an automatically customizable medical lesion detection architecture according to the present invention;

FIG. 3 is a diagram of the standard cell architecture in AutoRCNN Head in an embodiment of the present invention;

FIG. 4 is a diagram of the contracted cell architecture in AutoRCNN Head in an embodiment of the present invention;

fig. 5 is a flow chart illustrating steps of a method for automatically customizing a medical lesion detection in accordance with the present invention.

Detailed Description

Other advantages and capabilities of the present invention will be readily apparent to those skilled in the art from the present disclosure by describing the embodiments of the present invention with specific embodiments thereof in conjunction with the accompanying drawings. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention.

Fig. 2 is a schematic diagram of an architecture of an automatically customizable medical lesion detection architecture according to the present invention. As shown in fig. 2, an automatic customized medical lesion detection architecture of the present invention comprises:

The candidate feature extraction module 201 is configured to perform feature extraction on the input medical image to extract candidate features of the image. In this embodiment, the candidate feature extraction module 201 uses an existing general-purpose detection network, comprising a feature extractor and an RPN, to extract the candidate features. Because a medical image is a single-channel grayscale image, the module combines the key frame of the medical image with the frames immediately before and after it into three channels as the input image I. The feature extractor and a feature pyramid extract semantic features of the image at different levels, the RPN then extracts the candidate features of the image, and after RoI alignment the candidate features enter the optimal lesion detection network head (AutoRCNN Head) module 203.
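The three-channel input can be sketched as follows. The function name is hypothetical, and repeating the key frame at the first or last position of the volume is an assumption, since the text does not say how the borders are handled.

```python
def make_three_channel(volume, key_index):
    """Stack [previous frame, key frame, next frame] as the three input channels.

    volume is a list of 2-D grayscale frames (lists of rows); at either end of
    the volume the key frame itself is reused (border handling is an assumption).
    """
    last = len(volume) - 1
    prev_index = max(key_index - 1, 0)
    next_index = min(key_index + 1, last)
    return [volume[prev_index], volume[key_index], volume[next_index]]


# Hypothetical volume of four 1x1 frames whose pixel value equals the frame index.
volume = [[[0]], [[1]], [[2]], [[3]]]
print(make_three_channel(volume, 2))
print(make_three_channel(volume, 0))
```

Feeding neighbouring frames as channels lets a detector pre-trained on three-channel natural images consume the grayscale data while also seeing some through-plane context.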

The lesion detection network head (AutoRCNN Head) automatic customization module 202 is configured to define a new search space according to the characteristics of medical images, the characteristics of lesions and relevant knowledge of object detection. The search space comprises a number of advanced sub-network operations, such as operations with flexible receptive fields and skip connections, and additionally includes a non-local operation that incorporates the perceptual relations among candidate regions. A differentiable NAS algorithm then searches the designed search space, according to the candidate features, for suitable operations and connection patterns that form the head of the detection network; that is, the optimal lesion detection network head AutoRCNN Head suited to medical images is found automatically by the differentiable NAS algorithm.

In object detection, exploiting the relationship between the size and the centrality of the receptive field enhances the discriminability and robustness of the feature representation. The invention therefore designs a new search space for the differentiable NAS algorithm, comprising the following 9 operations: 1) no connection; 2) skip connection; 3) 3 × 3 average pooling; 4) non-local; 5) 1 × 3 and 3 × 1 convolution; 6) 3 × 3 depthwise separable convolution; 7) 5 × 5 depthwise separable convolution; 8) 3 × 3 dilated convolution with a dilation rate of 3; 9) 3 × 3 dilated convolution with a dilation rate of 5.

In the differentiable NAS algorithm, a suitable search space, such as the 9 operations above, is first designed according to the task, and the modules to be searched are then defined, namely a standard cell and a contracted cell. The standard cell has a stride of 1, so that the output resolution equals the input resolution and the number of channels is unchanged; the contracted cell has a stride of 2, so that the resolution is halved and the number of channels is doubled. Each module (cell) can be regarded as a directed acyclic graph whose number of branches must be defined; each branch represents a feature map, and each connection between branches represents an operation, such as a skip connection or 3 × 3 average pooling. After the search is defined, the cells are initialized: each branch receives inputs from all preceding branches, and every pair of connected branches is joined by all candidate operations (OPs). With nine candidate operations, 9 connections are initialized between each pair of connected branches, and their weights are normalized by a softmax function so that they sum to 1, which makes the discrete structure continuous; the weights are then updated by back-propagating gradients with a gradient descent algorithm. Finally, after searching for a certain time, only the operation with the largest weight among the 9 candidates on each connection is kept, turning the dense connections into sparse ones; the two connections with the largest weights are then selected as the inputs of each branch, and the results of these two connections are combined as its output.
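The final pruning step can be sketched in a few lines. The sketch assumes one vector of 9 architecture logits per incoming connection, applies a softmax to each vector, keeps the strongest operation per connection, and then keeps the two strongest connections of the branch. Skipping the "no connection" entry when picking the strongest operation is a convention borrowed from DARTS-style implementations and is an assumption here; the operation names and logits are hypothetical.

```python
import math

OPS = [
    "none", "skip", "avg_pool_3x3", "non_local", "conv_1x3_3x1",
    "sep_conv_3x3", "sep_conv_5x5", "dil_conv_3x3_r3", "dil_conv_3x3_r5",
]


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def discretize(alpha):
    """alpha maps each input name to its 9 logits; returns the two kept (input, op) pairs."""
    best = []
    for source, logits in alpha.items():
        probs = softmax(logits)
        # Strongest operation on this connection, ignoring "none" (assumption).
        k = max(range(1, len(OPS)), key=lambda i: probs[i])
        best.append((probs[k], source, OPS[k]))
    best.sort(reverse=True)                      # strongest connections first
    return [(source, op) for _, source, op in best[:2]]


# Hypothetical logits for a branch with three candidate inputs.
alpha = {
    "c_{k-2}": [0, 0, 0, 2.0, 0, 0, 0, 0, 0],    # favours non_local
    "c_{k-1}": [0, 3.0, 0, 0, 0, 0, 0, 0, 0],    # favours skip
    "branch_0": [0, 0, 0, 0, 0, 0.5, 0, 0, 0],   # weakly favours sep_conv_3x3
}
print(discretize(alpha))
```

During the search itself all nine operations stay active on every connection, weighted by the softmax probabilities; only this last step turns the dense cell into a sparse one.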

Assume that each module (cell) is a directed acyclic graph containing $B$ branches, where each branch has two inputs from preceding branches and one output; the branches represent feature tensors and the edges represent operations between feature tensors. Branch $b$ in cell $c$ can be described by a 5-tuple $(X_1, X_2, \mathrm{OP}_1, \mathrm{OP}_2, Y_b^c)$, where $X_1, X_2$ are the input tensors, $\mathrm{OP}_1, \mathrm{OP}_2 \in \mathrm{OP}$ are the operations applied to $X_1$ and $X_2$ respectively, $\mathrm{OP}$ is the set of 9 candidate operations proposed by the present invention, and $Y_b^c$ is the output tensor of the branch. The final output $Y_c$ of cell $c$ is the concatenation of the branch outputs $\{Y_1^c, \ldots, Y_B^c\}$.

Specifically, the invention designs a new search space for the differentiable NAS algorithm so that the relationship between the size and the centrality of the receptive field is exploited to enhance the discriminability and robustness of the feature representation. The 9 candidate operations, i.e. the set OP, are defined as follows:

1) No connection: there is no connection between the branches;

2) Skip connection: the branches are directly connected without any operation;

3) 3 × 3 average pooling: average pooling with a 3 × 3 pooling kernel;

4) 1 × 3 and 3 × 1 convolution: a ReLU activation layer, a convolution layer with a 1 × 3 kernel, a batch normalization layer, a ReLU activation layer, a convolution layer with a 3 × 1 kernel, and a batch normalization layer;

5) 3 × 3 depthwise separable convolution: a ReLU activation layer, a convolution layer with a 3 × 3 kernel, a convolution layer with a 1 × 1 kernel, a batch normalization layer, a ReLU activation layer, a convolution layer with a 3 × 3 kernel, a convolution layer with a 1 × 1 kernel, and a batch normalization layer;

6) 5 × 5 depthwise separable convolution: a ReLU activation layer, a convolution layer with a 5 × 5 kernel, a convolution layer with a 1 × 1 kernel, a batch normalization layer, a ReLU activation layer, a convolution layer with a 5 × 5 kernel, a convolution layer with a 1 × 1 kernel, and a batch normalization layer;

7) 3 × 3 dilated convolution with a dilation rate of 3: a ReLU activation layer, a convolution layer with a 3 × 3 kernel and a dilation rate of 3, a convolution layer with a 1 × 1 kernel, and a batch normalization layer;

8) 3 × 3 dilated convolution with a dilation rate of 5: a ReLU activation layer, a convolution layer with a 3 × 3 kernel and a dilation rate of 5, a convolution layer with a 1 × 1 kernel, and a batch normalization layer;

9) Non-local operation: its purpose is to encode the semantic information between region candidates that is relevant to object detection. Specifically, the relationship between regions is represented as a region-to-region undirected graph $G = \langle N, E \rangle$, in which each node in $N$ corresponds to a region candidate and each edge $e_{i,j} \in E$ encodes the relationship between two nodes. In one embodiment of the present invention, the input of the non-local operation is denoted $X$.

The adjacency matrix of the undirected graph $G$ can be computed by matrix multiplication as $E = \mathrm{softmax}(\phi(X)\,\phi(X)^{T})$, where $\phi(\cdot)$ is a nonlinear transformation with a ReLU activation function. The information of each node is then propagated over $E$ by a graph convolution layer $Y = \sigma(E\,f(X)\,W)$, where $f(\cdot)$ is a nonlinear transformation, $W$ is a weight matrix and $\sigma$ is an activation function. Finally, a fully connected layer is attached to keep the input and output sizes consistent.

The search strategy of the lesion detection network head (AutoRCNN Head) automatic customization module 202 is to optimize a continuous relaxation of the discrete architecture with a stochastic gradient descent algorithm, learning a set of continuous architecture weights $a = \{a_{ij}^{k}\}$. The output tensor $Y_j$ of branch $j$ is a weighted mixture of the candidate operations, which can be expressed as

$$Y_j = \sum_{i<j} \sum_{k=1}^{9} \frac{\exp(a_{ij}^{k})}{\sum_{k'=1}^{9} \exp(a_{ij}^{k'})}\, \mathrm{OP}_k(Y_i),$$

where the weights $a_{ij}^{k}$ are the architecture parameters. In the lesion detection network head AutoRCNN Head, the contracted cells are located at 1/3 and 2/3 of the depth, and the other positions are standard cells. The corresponding architecture parameters are $(a^{\text{contracted}}, a^{\text{standard}})$, where all standard cells in the AutoRCNN Head share the weights $a^{\text{standard}}$ and all contracted cells share the weights $a^{\text{contracted}}$.
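The placement rule can be written down in a few lines. Reading the rule as "contracted cells at cell indices depth // 3 and 2 * depth // 3" is an assumption that follows the common DARTS convention; the function name is hypothetical.

```python
def cell_layout(depth):
    """Return the cell type at each position of a head that stacks `depth` cells."""
    contracted_at = {depth // 3, 2 * depth // 3}   # index rule is an assumption
    return ["contracted" if i in contracted_at else "standard" for i in range(depth)]


print(cell_layout(3))
print(cell_layout(8))
```

For a depth of 3 this gives one standard cell followed by two contracted cells, which matches the head described in this document; because cells of the same type share their architecture weights, only two cell structures need to be searched regardless of depth.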

After the discrete architecture has been made continuous, the architecture search task reduces to learning the set of continuous variables $a$, which can be optimized with an efficient stochastic gradient descent algorithm. The loss function comprises a cross-entropy loss for lesion classification and an absolute-value loss for lesion detection, and optimization proceeds by iteratively updating the network weights and the architecture weights. The invention divides the training data into two disjoint subsets, a training set and a validation set, whose losses are $L_{train}$ and $L_{val}$ respectively. The optimization is performed by iterating the following two steps:

1) update the network weights $w$ by descending $\nabla_w L_{train}(w, a)$;

2) update the architecture weights $a$ by descending $\nabla_a L_{val}(w, a)$.

After the search is finished, only the operation with the largest weight is kept on each connection, i.e. $\mathrm{OP}_{ij} = \mathrm{OP}_{k^*}$ with $k^* = \arg\max_k a_{ij}^{k}$. Finally, the contracted cells are placed at 1/3 and 2/3 of the depth of the whole head and standard cells are stacked at the remaining positions, forming the optimal lesion detection network head AutoRCNN Head customized for medical images. The best standard cell and contracted cell architectures found are shown in FIG. 3 and FIG. 4. In the cell architecture diagrams, c_{k} denotes the output of the current cell, and c_{k-1} and c_{k-2} denote the outputs of the two preceding cells. The labels 0/1/2/3 denote the first/second/third/fourth branches in the cell, and each connection between branches is the operation with the largest weight retained from the candidate operation set after the search. That is, each cell takes the outputs of the two preceding cells as input and produces a single output. At initialization, all 9 candidate operations exist on every connection between branches; after the search has updated the weights by gradient descent, only the operation with the largest weight is first kept on each connection, and then the two largest-weight operations coming from different inputs are selected as the inputs of each branch. Finally, the concatenation of all branch outputs serves as the output of the whole cell.

For example, in the standard cell architecture of FIG. 3, the inputs to branch 0 are a skip connection from c_{k-1} and a 3 × 3 dilated convolution with a dilation rate of 5 from c_{k-2}; the inputs to branch 1 are a skip connection from c_{k-1} and a non-local operation from c_{k-2}; the inputs to branch 2 are a 3 × 3 depthwise separable convolution from c_{k-1} and a 3 × 3 dilated convolution with a dilation rate of 3 from branch 1; the inputs to branch 3 are a non-local operation from branch 0 and a 3 × 3 dilated convolution with a dilation rate of 3 from branch 1; finally, the output of the standard cell is the concatenation of the outputs of branches 0/1/2/3.

In the contracted cell architecture of FIG. 4, the inputs to branch 0 are a 1 × 3 and 3 × 1 convolution from c_{k-1} and a 5 × 5 depthwise separable convolution from c_{k-2}; the inputs to branch 1 are a 1 × 3 and 3 × 1 convolution from c_{k-1} and a 5 × 5 depthwise separable convolution from branch 0; the inputs to branch 2 are a 3 × 3 average pooling from branch 0 and a 1 × 3 and 3 × 1 convolution from branch 1; the inputs to branch 3 are a 5 × 5 depthwise separable convolution from branch 1 and a 1 × 3 and 3 × 1 convolution from branch 2; finally, the output of the contracted cell is the concatenation of the outputs of branches 0/1/2/3.
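The two searched cells can be written down as plain data, which makes their structure easy to check. The encoding below follows the textual description of FIG. 3 and FIG. 4, reading the kernel sizes as 3 × 3, 1 × 3, 3 × 1 and 5 × 5; the dictionary layout, the operation names and the `is_valid` helper are illustrative assumptions, not a format defined by the patent.

```python
# Each branch lists its two (input, operation) pairs.
STANDARD_CELL = {
    0: [("c_{k-1}", "skip"), ("c_{k-2}", "dil_conv_3x3_r5")],
    1: [("c_{k-1}", "skip"), ("c_{k-2}", "non_local")],
    2: [("c_{k-1}", "sep_conv_3x3"), ("branch_1", "dil_conv_3x3_r3")],
    3: [("branch_0", "non_local"), ("branch_1", "dil_conv_3x3_r3")],
}

CONTRACTED_CELL = {
    0: [("c_{k-1}", "conv_1x3_3x1"), ("c_{k-2}", "sep_conv_5x5")],
    1: [("c_{k-1}", "conv_1x3_3x1"), ("branch_0", "sep_conv_5x5")],
    2: [("branch_0", "avg_pool_3x3"), ("branch_1", "conv_1x3_3x1")],
    3: [("branch_1", "sep_conv_5x5"), ("branch_2", "conv_1x3_3x1")],
}


def is_valid(cell):
    """Check that every branch has two inputs drawn from the cell inputs or earlier branches."""
    for branch, inputs in cell.items():
        allowed = {"c_{k-1}", "c_{k-2}"} | {f"branch_{i}" for i in range(branch)}
        if len(inputs) != 2 or any(source not in allowed for source, _ in inputs):
            return False
    return True


print(is_valid(STANDARD_CELL), is_valid(CONTRACTED_CELL))
```

The check confirms that both cells are directed acyclic graphs in the sense used above: a branch only ever reads from the two preceding cells or from branches with a smaller index.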

The optimal lesion detection network head (AutoRCNN Head) module 203 is the optimal lesion detection network head customized by the lesion detection network head (AutoRCNN Head) automatic customization module 202. It first passes the candidate features output by the candidate feature extraction module 201 through a convolution layer with a 3 × 3 kernel and then through one standard cell and two contracted cells to obtain new candidate features, and performs binary classification and bounding-box regression on the new candidate features through two fully connected layers. The weights M of the binary classifier are output to the knowledge migration module 204 as high-level semantic information; the knowledge migration module 204 maps and transfers them to the multi-class classification stage to obtain enhanced candidate features, combines the enhanced features with the candidate features, and performs multi-type lesion detection, i.e. multi-class classification and bounding-box regression, on the combined candidate features through the final two fully connected layers. In this embodiment, the standard cell of the AutoRCNN Head module has a stride of 1, so that the output resolution equals the input resolution and the number of channels is unchanged; the contracted cell has a stride of 2, so that the resolution is halved and the number of channels is doubled.
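The effect of the stride and channel rules on the feature shape can be traced with a short sketch. The 14 × 14 × 256 input size, the assumption that the 3 × 3 convolution keeps resolution and channel count, and rounding up when halving an odd resolution are all hypothetical choices made only for illustration.

```python
import math


def trace_head(height, width, channels):
    """Return (stage, height, width, channels) after each stage of the head."""
    shapes = [("input", height, width, channels)]
    # 3x3 convolution: assumed stride 1, same padding, unchanged channel count.
    shapes.append(("conv_3x3", height, width, channels))
    # Standard cell: stride 1, resolution and channels unchanged.
    shapes.append(("standard_cell", height, width, channels))
    # Two contracted cells: stride 2 halves the resolution and doubles the channels.
    for index in (1, 2):
        height, width = math.ceil(height / 2), math.ceil(width / 2)
        channels *= 2
        shapes.append((f"contracted_cell_{index}", height, width, channels))
    return shapes


for stage in trace_head(14, 14, 256):
    print(stage)
```

Under these assumptions a 14 × 14 × 256 RoI feature leaves the head as a 4 × 4 × 1024 tensor before the two fully connected layers.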

The knowledge migration module 204 combines semantic relations on the basis of the region relation graph learned by the optimal lesion detection network head AutoRCNN Head and propagates related context information across regions to obtain enhanced candidate features. It then combines the enhanced candidate features with the original candidate features so as to share related information across multiple lesion types, and finally performs multi-class classification and regression through fully connected layers, thereby realizing a unified multi-type lesion detection network in a seamless manner.

In the present invention, the knowledge migration module 204 does not simply fine-tune a multi-type classifier, but further explores the capability of the optimal lesion detection network Head (AutoRCNN Head). Since the Non-local operation appears in the finally selected network architecture in many search experiments, the graph structure learned by $E = \mathrm{softmax}(\Phi(X)^{T})$ is used to convert binary classification into multi-type classification through reasoning.

In an embodiment of the present invention, the knowledge migration module 204 is implemented by using a graph inference algorithm, and specifically, the implementation process is as follows:

a. Collect the weights and biases M of the original binary classification as high-level semantic information, where P is the dimension of the output tensor of the AutoRCNN Head. Since the graph G is a region-to-region graph extracted from the Non-local operation in the AutoRCNN Head, the most suitable mapping from the category-level high-level semantic information to the input nodes $f_i \in f$ of the knowledge migration module needs to be found. To avoid errors inherited from the original binary classification, the present invention uses a soft mapping $\Gamma^{s}$ to map M.

b. Perform graph reasoning by matrix multiplication to obtain the enhanced feature $f_o$, i.e. $f_o = E\,\Gamma^{s} M W_o$, wherein $W_o$ is a weight transformation matrix and o is the output dimension of the knowledge migration module. The enhanced feature $f_o$ and the original feature f are combined to improve the localization and classification performance of multi-type lesion detection.
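The graph reasoning step can be illustrated with a minimal sketch in plain Python. The matrix shapes (E as N × N, the soft mapping as N × 2, M as 2 × (P + 1), W_o as (P + 1) × o), the use of a row-wise softmax over the classification scores as the soft mapping, and concatenation as the way the features are combined are all assumptions made for illustration; the helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def row_softmax(scores):
    """Normalise each row of scores into probabilities."""
    result = []
    for row in scores:
        peak = max(row)
        exps = [math.exp(value - peak) for value in row]
        total = sum(exps)
        result.append([value / total for value in exps])
    return result


def enhanced_features(E, scores, M, W_o):
    """Compute f_o = E * Gamma^s * M * W_o by chained matrix products."""
    gamma = row_softmax(scores)        # assumption: soft mapping = softmax of the scores s_ij
    semantics = matmul(gamma, M)       # N x (P + 1): class semantics assigned to each region
    propagated = matmul(E, semantics)  # pass context along the region-to-region graph
    return matmul(propagated, W_o)     # N x o


def combine(f, f_o):
    """Concatenate original and enhanced features per region (assumed combination)."""
    return [original + enhanced for original, enhanced in zip(f, f_o)]


# Toy example: 2 regions, 2 classes, P + 1 = 3, o = 2.
E = [[1.0, 0.0], [0.0, 1.0]]
scores = [[0.0, 0.0], [0.0, 0.0]]
M = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
W_o = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
f_o = enhanced_features(E, scores, M, W_o)
print(combine([[1.0], [2.0]], f_o))
```

Using a soft assignment rather than the arg-max class means a region that was misclassified by the binary classifier still receives a blend of both class semantics.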

Fig. 5 is a flow chart of the steps of the automatically customizable medical lesion detection method of the present invention. As shown in fig. 5, the method comprises the following steps:

Step S1, perform feature extraction on the input medical image with the candidate feature extraction module to extract candidate features of the image. In the specific embodiment of the invention, the candidate feature extraction module extracts the candidate features using an existing general detection network comprising a feature extractor and an RPN. Because a medical image is a single-channel grayscale image, the candidate feature extraction module forms a three-channel input image I from a key frame of the medical image and the frames before and after it. Semantic features at different levels of the image are extracted by the feature extractor and a feature pyramid, the candidate features are then extracted by the RPN, and after RoI Align the candidate features enter the lesion detection network Head (AutoRCNN Head) module.
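The construction of the three-channel input from a key frame and its neighbouring frames can be sketched as follows. The function name `stack_adjacent_slices` is hypothetical, and repeating the key frame at the first or last slice of a volume is an assumption, since the text does not say how boundaries are handled.

```python
def stack_adjacent_slices(volume, key_index):
    """Build a 3-channel input from the key slice and its two neighbours.

    volume is a list of 2D slices (each a list of rows of grey values).
    """
    last = len(volume) - 1
    prev_index = max(key_index - 1, 0)     # assumption: clamp at the first slice
    next_index = min(key_index + 1, last)  # assumption: clamp at the last slice
    return [volume[prev_index], volume[key_index], volume[next_index]]


# Four tiny 1x2 "slices" standing in for a scan.
scan = [[[0, 0]], [[1, 1]], [[2, 2]], [[3, 3]]]
print(stack_adjacent_slices(scan, 2))
print(stack_adjacent_slices(scan, 0))
```

The channel axis then carries a small amount of through-plane context, which lets a detector designed for three-channel natural images be reused unchanged.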

Step S2, using the optimal lesion detection network Head (AutoRCNN Head) module, first pass the candidate features output by the candidate feature extraction module through a convolution layer with a 3 × 3 convolution kernel, and then through one standard cell and two shrinkage cells to obtain new candidate features; perform binary classification and prediction box regression on the new candidate features through two fully connected layers; and output the weight M of the candidate feature classification in the binary classification to the knowledge migration module as high-level semantic information.

Step S3, using the knowledge migration module, on the basis of the region relation graph learned by the optimal lesion detection network Head (AutoRCNN Head), incorporate the semantic relations and propagate the related context information between different regions to obtain enhanced candidate features; then combine the enhanced candidate features with the original candidate features so that related information of multiple lesion types is shared; and finally perform multi-class classification and regression through a fully connected layer, thereby realizing a unified multi-type lesion detection network in a seamless manner.

Preferably, the automatically customizable medical lesion detection method of the present invention further comprises:

Step S0, define a new search space according to the characteristics of medical images, the characteristics of lesions and related knowledge of target detection. The search space contains a large number of advanced operations with sub-network architectures such as flexible receptive fields and skip connections, and a Non-local operation is added to combine the perception relations among the candidate regions. A micro NAS algorithm then searches the designed search space, according to the candidate features, for suitable operations and connection modes to form the Head of the detection network; that is, the micro NAS algorithm automatically searches for the optimal lesion detection network Head (AutoRCNN Head) suitable for medical images.

In the target detection task, the relationship between the size and the centrality of the receptive field can enhance the discriminability and robustness of the feature representation. The invention therefore designs a new search space for the micro NAS algorithm, which comprises the following 9 operations: 1) no connection; 2) skip connection; 3) 3 × 3 average pooling; 4) Non-local; 5) 1 × 3 and 3 × 1 convolution; 6) 3 × 3 depthwise separable convolution; 7) 5 × 5 depthwise separable convolution; 8) 3 × 3 dilated (hole) convolution with a dilation rate of 3; 9) 3 × 3 dilated (hole) convolution with a dilation rate of 5. The 9 candidate operations, i.e. the set OPs, are specifically defined as follows:

1) Connectionless operation: there is no connection between the branches;

5) 3 × 3 depthwise separable convolution operation: a ReLU activation layer, a convolution layer with a 3 × 3 kernel, a convolution layer with a 1 × 1 kernel, a batch normalization layer, a ReLU activation layer, a convolution layer with a 3 × 3 kernel, a convolution layer with a 1 × 1 kernel, and a batch normalization layer;

6) 5 × 5 depthwise separable convolution operation: a ReLU activation layer, a convolution layer with a 5 × 5 kernel, a convolution layer with a 1 × 1 kernel, a batch normalization layer, a ReLU activation layer, a convolution layer with a 5 × 5 kernel, a convolution layer with a 1 × 1 kernel, and a batch normalization layer;

7) 3 × 3 dilated (hole) convolution operation with a dilation rate of 3: a ReLU activation layer, a convolution layer with a 3 × 3 kernel and a dilation rate of 3, a convolution layer with a 1 × 1 kernel, and a batch normalization layer;

9) Non-local operation: the objective is to encode semantic information between region candidates relevant to object detection. Specifically, the relationship between the regions is represented as a region-to-region undirected graph $G = \langle N, E \rangle$, where each node in N corresponds to a region candidate and each edge $e_{i,j} \in E$ encodes the relationship between two nodes. In one embodiment of the present invention, the input of the Non-local operation is
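To make the "flexible receptive field" of the candidate operations concrete, the sketch below computes the spatial extent covered by a single kernel along one axis, using the standard relation k + (k - 1)(d - 1) for a kernel of size k with dilation rate d. It covers only the 3 × 3 hole (dilated) convolutions with rates 3 and 5 and the plain 3 × 3 and 5 × 5 kernels from the search space; the function name is hypothetical.

```python
def effective_kernel_size(kernel_size, dilation):
    """Extent covered along one axis by a kernel with the given dilation rate."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


for name, k, d in [
    ("3x3 conv", 3, 1),
    ("5x5 conv", 5, 1),
    ("3x3 dilated, rate 3", 3, 3),
    ("3x3 dilated, rate 5", 3, 5),
]:
    size = effective_kernel_size(k, d)
    print(f"{name}: covers {size} x {size}")
```

The two dilated operations cover 7 × 7 and 11 × 11 windows with the parameter count of a 3 × 3 kernel, which is why they are useful candidates when lesion sizes vary widely.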

In step S0, the search strategy is to optimize a continuous parameterization of the discrete structure with a stochastic gradient descent algorithm, so as to learn a set of continuous architecture weights. The output tensor of a branch is a weighted mixture of the candidate operations, in which each weight is an architecture parameter. In the lesion detection network Head (AutoRCNN Head) of the present invention, the shrinkage cells are located at 1/3 to 2/3 of the depth, and the other positions are standard cells. Their corresponding architecture parameters are $(a^{\text{shrinkage}}, a^{\text{standard}})$, respectively, wherein all standard cells in the AutoRCNN Head share the weight $a^{\text{standard}}$ and all shrinkage cells share the weight $a^{\text{shrinkage}}$.
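The weighted mixture of candidate operations can be sketched with scalar stand-ins for the operations. Normalising the architecture parameters with a softmax follows the usual differentiable-NAS formulation and is an assumption here, as are the toy operations and the function names.

```python
import math


def softmax(values):
    """Turn raw architecture parameters into mixture weights that sum to 1."""
    peak = max(values)
    exps = [math.exp(value - peak) for value in values]
    total = sum(exps)
    return [value / total for value in exps]


def mixed_output(x, operations, arch_params):
    """Weighted mix of all candidate operations applied to the same input."""
    weights = softmax(arch_params)
    return sum(weight * op(x) for weight, op in zip(weights, operations))


# Scalar stand-ins for three candidate operations (hypothetical).
candidate_ops = [
    lambda x: 0.0,      # "no connection"
    lambda x: x,        # "skip connection"
    lambda x: 2.0 * x,  # placeholder for a learned convolution
]

print(mixed_output(3.0, candidate_ops, [0.0, 0.0, 0.0]))   # uniform mixture
print(mixed_output(3.0, candidate_ops, [0.0, 10.0, 0.0]))  # dominated by the skip connection
```

Because the mixture is differentiable in the architecture parameters, the choice of operation can be trained with the same gradient machinery as the network weights.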

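After the discrete structure is made continuous, the task of structure search is reduced to learning a set of continuous variables. The optimization is carried out iteratively in the following two steps:

1) updating the network weight w by gradient descent on the training loss $L_{train}(w, a)$;

2) updating the architecture weight a by gradient descent on the validation loss $L_{val}(w, a)$;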
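The alternation between the two updates can be illustrated on a toy problem with one scalar network weight and one scalar architecture weight. The sketch assumes that the network weight is trained on a training loss and the architecture weight on a separate validation loss, uses a first-order update for each, and picks two simple quadratic losses purely for illustration; none of these specifics come from the text.

```python
def alternate_updates(w, a, steps=200, lr=0.1):
    """Alternate gradient steps on w (training loss) and a (validation loss).

    Toy losses (assumptions for illustration only):
        L_train(w, a) = (w - a) ** 2
        L_val(w, a)   = (w + a - 2) ** 2
    """
    for _ in range(steps):
        # Step 1: update the network weight w with a fixed.
        w -= lr * 2.0 * (w - a)        # gradient of L_train with respect to w
        # Step 2: update the architecture weight a with w fixed.
        a -= lr * 2.0 * (w + a - 2.0)  # gradient of L_val with respect to a
    return w, a


w_final, a_final = alternate_updates(0.0, 0.0)
print(round(w_final, 4), round(a_final, 4))
```

Keeping the two data subsets disjoint means the architecture weights are selected on data the network weights were not fitted to, which discourages architectures that merely overfit the training set.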

After the search is finished, the operation $OP_{ij}$ with the maximum weight is kept on each connection. Finally, the shrinkage cells are placed at 1/3 to 2/3 of the depth of the whole Head, and standard cells are stacked at the remaining positions, forming the optimal lesion detection network Head (AutoRCNN Head) customized for medical images. The best standard cell and shrinkage cell architectures found are shown in fig. 3 and fig. 4. In the cell structure diagrams, $c^{k}$ denotes the output of the current cell, and $c^{k-1}$ and $c^{k-2}$ denote the outputs of the previous two cells, respectively. The labels 0/1/2/3 denote the first/second/third/fourth branches in the cell, and the connection between branches is the operation with the maximum weight retained from the candidate operation set after the search is finished. That is, each cell takes the outputs of the previous two cells as its inputs and produces one output. At initialization, there are 9 kinds of operation connections between every pair of branches; after the gradient-based search, only the operation with the largest weight is retained on each connection path, and finally the two maximum-weight connections coming from different inputs are selected as the inputs of the branch.
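The discretization described above (keep the strongest operation on each connection, then keep the two strongest incoming connections per branch) can be sketched as follows. The data layout, the function name `discretize_cell` and the example weights are hypothetical, and only three operation names are used to keep the example short.

```python
def discretize_cell(arch_weights, op_names, inputs_per_branch=2):
    """Turn learned mixture weights into a sparse cell.

    arch_weights maps branch -> {source: [weight of each candidate operation]}.
    Returns branch -> list of (source, operation) pairs that are kept.
    """
    cell = {}
    for branch, incoming in arch_weights.items():
        strongest = []
        for source, weights in incoming.items():
            # Keep only the operation with the largest weight on this connection.
            best = max(range(len(weights)), key=weights.__getitem__)
            strongest.append((weights[best], source, op_names[best]))
        # Keep the connections whose retained operation has the largest weight.
        strongest.sort(key=lambda item: item[0], reverse=True)
        cell[branch] = [(source, op) for _, source, op in strongest[:inputs_per_branch]]
    return cell


ops = ["none", "skip", "sep_conv_3x3"]
learned = {
    "0": {"c_k-2": [0.1, 0.7, 0.2], "c_k-1": [0.2, 0.2, 0.6]},
    "1": {"c_k-2": [0.5, 0.3, 0.2], "c_k-1": [0.1, 0.1, 0.8], "0": [0.2, 0.6, 0.2]},
}
print(discretize_cell(learned, ops))
```

In this toy example branch 1 drops its connection from `c_k-2`, because the strongest operation on that connection has a lower weight than those on the other two.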

Preferably, in step S3, the knowledge migration module is implemented by using a graph inference algorithm, and specifically, the implementation process is as follows:

a. Collect the weights and biases M of the original binary classification as high-level semantic information, where P is the dimension of the output tensor of the AutoRCNN Head. Since the graph G is a region-to-region graph extracted from the Non-local operation in the AutoRCNN Head, the most suitable mapping from the category-level high-level semantic information to the input nodes $f_i \in f$ of the knowledge migration module needs to be found. To avoid errors inherited from the original binary classification, the present invention uses a soft mapping $\Gamma^{s}$ to map M, where $s_{ij}$ is the score with which region i is classified as class j by the original classification layer.

In summary, the automatically customizable medical lesion detection architecture and method of the present invention automatically customize a detection network head suitable for medical images for the medical lesion detection task, and realize a unified multi-type lesion detection network in a seamless manner through the knowledge migration module, thereby not only improving the precision of lesion detection but also adding the capability of multi-type lesion detection. The medical lesion detection network head and the knowledge migration module can be built on any general detection network framework for natural images, improve its performance to varying degrees, and bring out the best performance of that framework.

The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Modifications and variations can be made to the above-described embodiments by those skilled in the art without departing from the spirit and scope of the present invention. Therefore, the scope of the invention should be determined from the following claims.

Claims

1. An automatically customizable medical lesion detection architecture, comprising:

a lesion detection network head automatic customization module, which is used for defining a new search space according to medical image characteristics, lesion characteristics and related knowledge of target detection, wherein the search space comprises advanced operations with sub-network architectures of flexible receptive fields and skip connections, and a non-local operation is added to combine the perception relations among candidate regions, and for searching the designed search space with a micro NAS algorithm, according to the candidate features, for suitable operations and connection modes so as to form an optimal lesion detection network head suitable for medical images;

an optimal lesion detection network head module, which is the optimal lesion detection network head customized by the lesion detection network head automatic customization module, and which first passes the candidate features through a convolution layer with a 3 × 3 convolution kernel, then through one standard cell and two shrinkage cells to obtain new candidate features, performs binary classification and prediction box regression on the new candidate features through two fully connected layers, and outputs the weight M of the candidate feature classification in the binary classification to the knowledge migration module as high-level semantic information;

a knowledge migration module, which is used for incorporating the semantic relations on the basis of the region relation graph learned by the optimal lesion detection network head, propagating the related context information between different regions to obtain enhanced candidate features, combining the enhanced candidate features with the original candidate features so that related information of multiple lesion types is shared, and finally performing multi-class classification and regression through a fully connected layer;

the search space includes the following 9 operations: 1) no connection; 2) skip connection; 3) 3 × 3 average pooling; 4) non-local; 5) 1 × 3 and 3 × 1 convolution; 6) 3 × 3 depthwise separable convolution; 7) 5 × 5 depthwise separable convolution; 8) 3 × 3 dilated (hole) convolution with a dilation rate of 3; 9) 3 × 3 dilated (hole) convolution with a dilation rate of 5;

in the micro NAS algorithm, a suitable search space is first designed according to the task, and search modules are then defined, wherein the search modules comprise a standard cell module and a shrinkage cell module; the stride of the standard cell module is 1, so that the output resolution equals the input resolution and the number of channels is unchanged; the stride of the shrinkage cell module is 2, so that the resolution is halved and the number of channels is doubled; each module, namely a cell, is regarded as a directed acyclic graph, the number of branches of the directed acyclic graph is defined, each branch represents a feature map, and the connection between branches represents an operation; after the definition is completed, initialization is carried out, the discrete structure between the branches is made continuous through a softmax function, and gradients are back-propagated by a gradient descent algorithm to update the weights of the discrete structure; finally, after searching for a certain time, only the operation with the largest weight among the 9 operation connections is retained, that is, a dense connection becomes a sparse connection, the two connections with the largest weights are then selected as the inputs of the branch, and the results of the two connections are combined as the output.

2. An automatically customizable medical lesion detection architecture according to claim 1, wherein: the specific definition of 9 operations of the search space is as follows:

1) Connectionless operation: there is no connection between the branches;

3. An automatically customizable medical lesion detection architecture according to claim 2, wherein: in the non-local operation, the relation between the regions is represented as a region-to-region undirected graph $G = \langle N, E \rangle$, each node in N corresponds to a region candidate, each edge $e_{i,j} \in E$ encodes the relationship between two nodes, and the input of the non-local operation is

4. An automatically customizable medical lesion detection architecture according to claim 3, wherein: the search strategy of the lesion detection network head automatic customization module is to optimize a continuous parameterization of the discrete structure with a stochastic gradient descent algorithm and to learn a set of continuous architecture weights; the output tensor of a branch is a weighted mixture of the candidate operations, in which each weight is an architecture parameter.

5. An automatically customizable medical lesion detection architecture according to claim 4, wherein the training data is divided into two non-intersecting subsets, namely a training set and a validation set, and the optimization process is iteratively performed by the following two steps: 1) updating the network weight w by gradient descent on $L_{train}(w, a)$; 2) updating the architecture weight a by gradient descent on $L_{val}(w, a)$; wherein $L_{train}(w, a)$ and $L_{val}(w, a)$ are the losses on the training set and the validation set, respectively.

6. The automatically customizable medical lesion detection architecture of claim 5, wherein the knowledge migration module implements the process of:

a. collecting the weights and biases M of the original binary classification as high-level semantic information, where P is the dimension of the AutoRCNN Head output tensor, and using a soft mapping $\Gamma^{s}$ to map M, where $s_{ij}$ is the score with which region i is classified as class j by the original classification layer;

b. performing graph reasoning by matrix multiplication to obtain the enhanced feature $f_o$, i.e. $f_o = E\,\Gamma^{s} M W_o$, wherein $W_o$ is a weight transformation matrix and o is the output dimension of the knowledge migration module, and combining the enhanced feature $f_o$ with the original feature f to improve the localization and classification performance of multi-type lesion detection.