CN112464579A - Identification modeling method for searching esophageal cancer lesion area based on evolutionary neural network structure - Google Patents
- Publication number
- CN112464579A CN112464579A CN202110141443.2A CN202110141443A CN112464579A CN 112464579 A CN112464579 A CN 112464579A CN 202110141443 A CN202110141443 A CN 202110141443A CN 112464579 A CN112464579 A CN 112464579A
- Authority
- CN
- China
- Prior art keywords
- neural network
- esophageal cancer
- network structure
- lesion area
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformation in the plane of the image
- G06T3/40—Scaling the whole image or part thereof
- G06T3/4007—Interpolation-based scaling, e.g. bilinear interpolation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2111/00—Details relating to CAD techniques
- G06F2111/04—Constraint-based CAD
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2111/00—Details relating to CAD techniques
- G06F2111/06—Multi-objective optimisation, e.g. Pareto optimisation using simulated annealing [SA], ant colony algorithms or genetic algorithms [GA]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2111/00—Details relating to CAD techniques
- G06F2111/08—Probabilistic or stochastic CAD
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30004—Biomedical image processing
- G06T2207/30096—Tumor; Lesion
Abstract
The invention discloses an esophageal cancer lesion area identification modeling method based on evolutionary neural network structure search, relating to the technical fields of image pattern recognition and medical imaging, and comprising the following steps: S1: collecting and labeling an esophageal image dataset for training a neural network model; S2: constructing a neural network structure search space for esophageal cancer lesion area identification; S3: training a super-network model oriented to esophageal cancer lesion area identification; S4: searching for an optimal neural network structure on the constructed super-network model using an evolutionary algorithm; S5: fine-tuning the searched neural network structure and predicting the lesion area on newly input esophageal images. The method eliminates the dependence of neural network structure design on expert experience in the intelligent esophageal cancer recognition task, making deep neural network methods easier to apply to esophageal cancer lesion area identification.
Description
Technical Field
The invention relates to the technical field of image pattern recognition and medical images, in particular to a method for recognizing and modeling esophageal cancer lesion areas based on evolutionary neural network structure search.
Background
Esophageal cancer (EC) is a highly lethal malignant disease [A. K. Rustgi and H. B. El-Serag, "Esophageal Cancer," New England Journal of Medicine, vol. 371, no. 26, pp. 2499-2509, 2014]. Gastroscopy is currently widely used for diagnosing early esophageal cancer and can guide early intervention and treatment. In clinical examinations, narrow-band imaging (NBI) is commonly used to identify esophageal lesion regions [Y. Horiuchi, K. Aoyama, Y. Tokai, T. Hirasawa, S. Yoshimizu, A. Ishiyama, T. Yoshio, T. Tsuchida, J. Fujisaki, and T. Tada, 2019], and related studies have shown that NBI achieves higher diagnostic accuracy for esophageal cancer than ordinary white-light endoscopy [no. 6, pp. 5481-5486, 2019]. Although endoscopic equipment and imaging techniques have made significant progress, physicians with extensive clinical experience and skilled operating technique are still in short supply, a problem that is even more severe in less developed areas. Moreover, the esophageal cancer lesion area in an NBI image typically appears irregular and deformed, with random position and complex background content, making accurate identification very difficult; different doctors easily produce inconsistent observations, which affects the accuracy of early esophageal cancer diagnosis.
Existing technical approaches mainly use computers to assist in identifying lesion areas and fall into two main categories. The first designs hand-crafted features, including image, texture, and shape features [F. van der Sommen, S. Zinger, E. J. Schoon, and P. de With, "Supportive automatic annotation of early esophageal cancer using local gabor and color features," Neurocomputing, vol. 144, pp. 92-106, 2014; F. van der Sommen, S. Zinger, E. Schoon et al., "Evaluation and comparison of textural feature representation for the detection of early stage cancer in endoscopy," in Proceedings of the 8th International Conference on Computer Vision Theory and Applications (VISAPP), 2013], then combines machine learning methods such as decision trees and support vector machines to classify the pixels of the NBI image, generating a prediction output that automatically indicates the position of the lesion area in the image. The second is based on deep neural networks [A. Ebigbo, R. Mendel, A. Probst, J. Manzeneder, F. Prinz, L. A. de Souza Jr, J. Papa, C. Palm, and H. Messmann, "Real-time use of artificial intelligence in the evaluation of cancer in Barrett's oesophagus," Gut, vol. 69, no. 4, pp. 615-616, 2020; Z. Wu, R. Ge, M. Wen, G. Liu, Y. Chen, P. Zhang, X. He, J. Hua, L. Luo, and S. Li, "ELNet: Automatic classification and segmentation for esophageal lesions using a convolutional neural network," Medical Image Analysis, vol. 67, p. 101838, 2020]: by constructing a deep neural network model, highly discriminative features can be learned directly from the raw NBI image and a dense prediction output can be generated in an image-to-image manner, so that the position and extent of the lesion region in the current image can be indicated visually.
Methods based on manually designed NBI image features rely heavily on prior knowledge of digestive endoscopy, which is difficult to acquire and describe and struggles to capture the essential representation of NBI images. The practical applicability of computer-aided esophageal cancer lesion region identification using hand-crafted features is therefore limited. Deep neural network methods, represented by convolutional neural networks, can effectively avoid manual feature design through their ability to learn image features automatically. However, existing convolutional neural network model structures are designed mainly for general visual task scenarios, such as image recognition [K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778]. Network structures designed for general visual tasks are difficult to adapt directly to the esophageal cancer lesion area identification task. Specifically, NBI esophageal images differ substantially from natural images, in the complexity of image content and in the degree of morphological and scale variation of lesion areas, among other aspects. Designing the model structure therefore requires the joint participation of experienced medical and neural network experts, consuming substantial time and manpower. Furthermore, because expertise is limited, the constructed network model does not necessarily achieve the best performance.
An evolutionary algorithm is a black-box optimization method: a candidate-solution representation can be designed for a specific optimization problem, a population and genetic operators can then be constructed, and the quality of the solutions in the population is gradually improved through environment selection until convergence. Since the search space of network structures is usually discrete and the relation between structure and network performance is non-differentiable, constructing an evolutionary algorithm can effectively solve the network structure search problem. In addition, studies have shown that the resolution of the input images and the multi-level feature fusion and up-sampling parts of the network structure have an important influence on esophageal cancer lesion region identification [F. van der Sommen, S. Zinger, E. Schoon et al., "Evaluation and comparison of textural feature representation for the detection of early stage cancer in endoscopy," in Proceedings of the 8th International Conference on Computer Vision Theory and Applications (VISAPP), 2013; L. Guo, X. Xiao, C. Wu, X. Zeng, Y. Zhang, J. Du, S. Bai, J. Xie, Z. Zhang, Y. Li et al., "Real-time automated diagnosis of precancerous lesions and early esophageal squamous cell carcinoma using a deep learning model (with video)," Gastrointestinal Endoscopy, vol. 91, no. 1, pp. 41-51, 2020]. Therefore, esophageal cancer lesion area identification modeling based on evolutionary neural network structure search can reduce the dependence on experience, save labor and time, and realize automatic neural network modeling for esophageal cancer lesion area identification.
In general terms: convolutional neural network methods have been widely used for esophageal cancer lesion area identification, but constructing the network model depends strongly on the experience of doctors and neural network experts. In view of this, the invention aims to provide a method for esophageal cancer lesion area identification modeling based on evolutionary neural network structure search, which automatically establishes an optimal convolutional neural network model from the collected esophageal images, thereby reducing the dependence on experience, greatly saving manpower and time, and ultimately helping doctors improve the accuracy of early esophageal cancer diagnosis.
Disclosure of Invention
The invention aims to address the above problems by providing an identification modeling method for esophageal cancer lesion areas based on evolutionary neural network structure search, which can efficiently and automatically search for a neural network structure suited to analyzing the collected esophageal NBI image data, solving the strong dependence of neural network model design on expert experience in esophageal cancer lesion area identification.
The technical scheme adopted by the invention is as follows:
an identification modeling method for searching esophageal cancer lesion areas based on an evolutionary neural network structure comprises the following steps:
s1: collecting and labeling an esophagus image data set used for training a neural network model;
s2: constructing a neural network structure search space for identifying the lesion area of the esophageal cancer;
s3: training a super-network model facing the identification of the lesion area of the esophageal cancer;
s4: searching an optimal neural network structure on the constructed hyper-network model by using an evolutionary algorithm;
s5: and finely adjusting the searched neural network structure, and predicting a lesion area on the newly input esophagus image.
Preferably, the step S1 of acquiring and labeling the esophageal image data set for training the neural network model includes the following steps:
s1-1: recording and collecting an esophageal endoscopy video stream, screening and cutting out a video segment in an NBI imaging mode;
s1-2: extracting video frames of the video segments containing the esophageal lesion areas, and randomly extracting video frames of normal esophagus;
s1-3: labeling the lesion region with a polygon, and applying image-level labels to lesion and normal video frames, respectively;
s1-4: and dividing the labeled data set into a training set, a verification set and a test set.
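The split in S1-4 can be sketched as a simple random partition; the 70/15/15 ratio and the file names below are illustrative assumptions, not values specified by the invention:

```python
import random

def split_dataset(samples, ratios=(0.7, 0.15, 0.15), seed=0):
    """Randomly partition labeled samples into train/validation/test sets."""
    assert abs(sum(ratios) - 1.0) < 1e-9
    shuffled = samples[:]                       # copy so the input list is untouched
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

frames = [f"frame_{i:04d}.png" for i in range(100)]   # hypothetical frame names
train, val, test = split_dataset(frames)
```

Splitting at the frame level is the simplest convention; splitting by patient or by video segment would avoid near-duplicate frames leaking across sets.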
Preferably, the step S2 of constructing the neural network structure search space for esophageal cancer lesion area identification specifically includes the following steps:
s2-1: constructing an input image size search space based on the sizes of the collected data images, comprising five resolutions: 192×192, 256×256, 320×320, 384×384, and 448×448; the resolution of the input image strongly influences the accuracy of the neural network model's lesion region identification, so an input image size search space is constructed;
s2-2: constructing an up-sampling operation search space comprising two operations, deconvolution and bilinear interpolation; because different up-sampling methods have a certain influence on the esophageal cancer lesion area identification result, an up-sampling operation search space is constructed;
s2-3: constructing a convolution operation search space comprising convolution operations with four kernel sizes: 1×1, 3×3, 5×5, and 7×7; each convolution operation is followed by Batch Normalization (BN) and uses the ReLU activation function.
Preferably, the step size of the deconvolution described in step S2-2 is 2, and the multiple of the bilinear interpolation is 2.
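As a sketch of the second candidate operation, bilinear interpolation with a factor of 2 can be written in plain Python as follows; half-pixel source sampling with clamped edges is one common convention, assumed here rather than fixed by the invention (the deconvolution alternative is a learned operation and is not reproduced):

```python
def upsample_bilinear_2x(img):
    """Bilinear up-sampling of a 2-D grid by a factor of 2
    (half-pixel coordinate mapping, edges clamped)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for oy in range(2 * h):
        # map output row back to a (possibly fractional) source row
        sy = max(0.0, min(h - 1.0, (oy + 0.5) / 2 - 0.5))
        y0 = int(sy); y1 = min(y0 + 1, h - 1); fy = sy - y0
        for ox in range(2 * w):
            sx = max(0.0, min(w - 1.0, (ox + 0.5) / 2 - 0.5))
            x0 = int(sx); x1 = min(x0 + 1, w - 1); fx = sx - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            out[oy][ox] = top * (1 - fy) + bot * fy
    return out

hi_res = upsample_bilinear_2x([[0.0, 2.0]])   # 1x2 feature map -> 2x4
```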
Preferably, the training of the hyper-network model for identifying the lesion region of esophageal cancer in step S3 specifically includes the following steps:
s3-1: constructing a training super-network model facing the identification of the lesion area of the esophageal cancer, wherein the whole super-network model is divided into two parts, namely a down-sampling part and an up-sampling part; the down-sampling part comprises a convolution layer, a pooling layer and four down-sampling blocks, wherein the four down-sampling blocks are a down-sampling block 1, a down-sampling block 2, a down-sampling block 3 and a down-sampling block 4 respectively, each down-sampling block comprises three convolution layers, and a residual connection mode is used; the up-sampling part comprises 6 up-sampling blocks, wherein the 6 up-sampling blocks are an up-sampling block 1, an up-sampling block 2, an up-sampling block 3, an up-sampling block 4, an up-sampling block 5 and an up-sampling block 6 respectively, each up-sampling block comprises an up-sampling layer and two convolution layers, each up-sampling layer and each convolution layer have a plurality of candidate operations, and only one candidate operation can be activated in the training and testing stage of the network model; the intermediate feature map of the down-sampling part is fused with the feature map in the up-sampling block in an adding mode, and is finally used for identifying the lesion area of the esophageal cancer;
s3-2: for each batch of training data, the resolution in the hyper-network model and each up-sampling operation and convolution operation in the up-sampling part have only one active option, randomly activated according to a uniform distribution, with each activation independent; the random activation forms a path in the hyper-network model, from which a corresponding sub-network is constructed for training. The training process of the hyper-network model is formally described as:

W* = argmin_W E_{a~Γ(A)} [ L(N(x; a, W(a)), y) ]

where a denotes the network structure, Γ(A) denotes the distribution from which network structures are randomly sampled, W denotes the network parameters (W(a) being the parameters activated by structure a), y denotes the esophageal lesion area label, N(x; a, W(a)) denotes the predicted output of the network model for input x, and L denotes the network training error function.
Preferably, in step S4, an evolutionary algorithm is used to search for an optimal neural network structure on the constructed super network model, and the method specifically includes the following steps:
s4-1: constructing a coding scheme: using integers to represent input image resolution, an up-sampling operation type and a convolution operation type;
s4-2: designing genetic operations: based on the coding scheme constructed in S4-1, crossover is implemented directly as single-point split-and-recombine, and mutation is implemented by randomly re-initializing a bit of the candidate solution's code with a certain probability; preferably, a bit of the candidate solution is randomly re-initialized with probability 0.1 to realize mutation;
s4-3: determining the environment selection method: during environment selection, the segmentation Dice value and the floating-point operations (FLOPs) of the current model are optimized as two objectives, and non-dominated sorting and crowding distance are used to determine the elimination priority of solutions in the population, yielding a uniformly distributed Pareto-optimal solution set. These two objectives are chosen because esophageal cancer lesion region identification must weigh accuracy against efficiency.
It is further explained that the Pareto-optimal solution set of the present invention is obtained as follows: create and initialize a population and compute fitness; if the termination requirement is met, output the Pareto-optimal solution set; otherwise perform genetic operations, perform multi-objective selection, generate the offspring population, and compute fitness again, as shown in FIG. 4.
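The environment selection of S4-3 can be sketched in plain Python. Both objectives are written as minimization targets (1 − Dice and FLOPs), and the numeric values below are illustrative only:

```python
def dominates(p, q):
    """p dominates q if p is no worse in every objective and better in at
    least one (minimization convention)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def nondominated_sort(objs):
    """Return the Pareto fronts as lists of indices, best front first."""
    fronts, remaining = [], set(range(len(objs)))
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts

def crowding_distance(objs, front):
    """Crowding distance of each solution in one front (boundary points -> inf)."""
    dist = {i: 0.0 for i in front}
    for m in range(len(objs[0])):
        order = sorted(front, key=lambda i: objs[i][m])
        dist[order[0]] = dist[order[-1]] = float("inf")
        span = objs[order[-1]][m] - objs[order[0]][m] or 1.0
        for k in range(1, len(order) - 1):
            dist[order[k]] += (objs[order[k + 1]][m] - objs[order[k - 1]][m]) / span
    return dist

# Each candidate: (1 - Dice, FLOPs), both to be minimized (illustrative values).
objs = [(0.10, 9.0), (0.20, 4.0), (0.15, 6.0), (0.30, 5.0)]
fronts = nondominated_sort(objs)
```

Solutions in later fronts are eliminated first; within a front, smaller crowding distance means earlier elimination, which spreads the surviving solutions along the Pareto front.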
Preferably, in step S4-1 the five image resolutions are encoded with the five integers 1-5; the up-sampling operations are encoded with 1 and 2, and the convolution operations with 1-4.
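Combining the coding scheme of S4-1 with the genetic operations of S4-2 gives a sketch like the following. The 19-gene layout (1 resolution gene, 6 up-sampling genes, 12 convolution genes, matching the six up-sampling blocks of S3-1) is an illustrative assumption:

```python
import random

# Gene value ranges: gene 0 encodes resolution (1-5); one gene per up-sampling
# layer (1-2); one gene per searched convolution layer (1-4).
GENE_RANGES = [5] + [2] * 6 + [4] * 12

def random_genotype(rng):
    return [rng.randint(1, hi) for hi in GENE_RANGES]

def crossover(a, b, rng):
    """Single-point split-and-recombine crossover."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(geno, rng, p=0.1):
    """Re-initialize each gene independently with probability p
    (0.1 is the patent's preferred value)."""
    return [rng.randint(1, hi) if rng.random() < p else g
            for g, hi in zip(geno, GENE_RANGES)]

rng = random.Random(0)
parent_a, parent_b = random_genotype(rng), random_genotype(rng)
child_a, child_b = crossover(parent_a, parent_b, rng)
mutant = mutate(child_a, rng)
```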
Preferably, the fine-tuning of the neural network structure searched in step S5 and the prediction of the lesion area on the newly input esophageal image specifically include the following steps:
s5-1: according to the actual constraint condition of the operation efficiency, selecting a solution which meets the constraint and consumes the least FLOPs from the pareto solution set obtained by the evolutionary algorithm in the step S4-3; the actual operation efficiency constraint conditions comprise the resolution and the frame rate of the esophageal endoscopy video stream and the calculation force provided by the current calculation platform;
s5-2: converting the decoded codes into actually searched neural network structures, inheriting corresponding weight parameters from the super-network, and then continuing to perform fine adjustment on an esophageal cancer lesion data training set;
s5-3: the refined model is used in the test set to evaluate the effect and further used to predict the lesion area of the newly entered esophageal image.
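Step S5-1 reduces to a filter-then-minimize over the Pareto set; the architecture names, Dice values, and FLOPs budget below are illustrative:

```python
def select_model(pareto_set, flops_budget):
    """Pick the solution that satisfies the FLOPs budget with the fewest FLOPs.
    Each entry is (architecture_id, dice, flops)."""
    feasible = [s for s in pareto_set if s[2] <= flops_budget]
    if not feasible:
        raise ValueError("no architecture satisfies the efficiency constraint")
    return min(feasible, key=lambda s: s[2])

# Illustrative Pareto set: higher Dice costs more FLOPs (units assumed GFLOPs).
pareto = [("arch_a", 0.90, 9.0), ("arch_b", 0.85, 6.0), ("arch_c", 0.80, 4.0)]
chosen = select_model(pareto, 7.0)
```

The budget itself would be derived from the video-stream resolution, frame rate, and platform compute described in S5-1.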
Compared with the prior art, the invention has the beneficial effects that:
1) according to the invention, the optimal neural network structure can be automatically searched according to the collected esophagus image data, so that the dependence of the neural network structure design on expert experience in an esophagus cancer intelligent recognition task is eliminated, and the deep neural network method is easier to use in the aspect of esophagus cancer lesion area recognition;
2) the method can automatically find the resolution ratio of the esophagus image suitable for the neural network model, thereby reducing the influence of the image resolution ratio on the identification performance of the lesion area;
3) the invention can automatically search a group of network model structures with different calculated quantities, so that a proper neural network model can be flexibly selected according to different calculation cost constraint conditions, and the deployment of an esophageal cancer lesion area intelligent identification system under different scenes is facilitated.
Drawings
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 is a search space of the present invention;
FIG. 3 is a hyper-network model of the present invention;
FIG. 4 is a flow chart of a multi-objective evolutionary algorithm of the present invention;
fig. 5 is a diagram showing the identification result of the lesion region of esophageal cancer.
Detailed Description
The present invention will be described in further detail in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
An identification modeling method for searching esophageal cancer lesion areas based on an evolutionary neural network structure comprises the following steps:
s1: collecting and labeling an esophagus image data set used for training a neural network model:
wherein, the step S1 of collecting and labeling the esophagus image data set used for training the neural network model comprises the following steps:
s1-1: recording and collecting an esophageal endoscopy video stream, screening and cutting out a video segment in an NBI imaging mode;
s1-2: extracting video frames of the video segments containing the esophageal lesion areas, and randomly extracting video frames of normal esophagus;
s1-3: labeling the lesion region with a polygon, and applying image-level labels to lesion and normal video frames, respectively;
s1-4: and dividing the labeled data set into a training set, a verification set and a test set.
S2: constructing a neural network structure search space for identifying the lesion region of the esophageal cancer:
the step S2 of constructing the neural network structure search space for identifying the lesion region of esophageal cancer specifically includes the following steps:
s2-1: constructing an input image size search space based on the sizes of the collected data images, comprising five resolutions: 192×192, 256×256, 320×320, 384×384, and 448×448; the resolution of the input image strongly influences the accuracy of the neural network model's lesion region identification, so an input image size search space is constructed;
s2-2: constructing an up-sampling operation search space comprising two operations, deconvolution and bilinear interpolation; because different up-sampling methods have a certain influence on the esophageal cancer lesion area identification result, an up-sampling operation search space is constructed;
s2-3: constructing a convolution operation search space comprising convolution operations with four kernel sizes: 1×1, 3×3, 5×5, and 7×7; each convolution operation is followed by Batch Normalization (BN) and uses the ReLU activation function.
Further, the step size of the deconvolution described in step S2-2 is 2, and the multiple of the bilinear interpolation is 2.
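The FLOPs objective used later in the search can be estimated per convolution layer. The count below uses the common multiply-accumulate convention with bias and activation ignored, an assumption rather than a convention fixed by the invention:

```python
def conv_flops(kernel, c_in, c_out, h_out, w_out):
    """Multiply-accumulate count of a kernel x kernel convolution producing an
    h_out x w_out x c_out feature map (bias and activation ignored)."""
    return kernel * kernel * c_in * c_out * h_out * w_out

# A searched 3x3 kernel costs 9x the FLOPs of a 1x1 kernel at equal channel
# counts, so kernel-size choices trade accuracy against the FLOPs objective.
cost_1x1 = conv_flops(1, 64, 64, 56, 56)
cost_3x3 = conv_flops(3, 64, 64, 56, 56)
```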
S3: training a super-network model facing the identification of the lesion area of the esophageal cancer:
the training of the esophageal cancer lesion region identification oriented hyper-network model in the step S3 specifically comprises the following steps:
s3-1: constructing a training super-network model facing the identification of the lesion area of the esophageal cancer, wherein the whole super-network model is divided into two parts, namely a down-sampling part and an up-sampling part; the down-sampling part comprises a convolution layer, a pooling layer and four down-sampling blocks, wherein the four down-sampling blocks are a down-sampling block 1, a down-sampling block 2, a down-sampling block 3 and a down-sampling block 4 respectively, each down-sampling block comprises three convolution layers, and a residual connection mode is used; the up-sampling part comprises 6 up-sampling blocks, wherein the 6 up-sampling blocks are an up-sampling block 1, an up-sampling block 2, an up-sampling block 3, an up-sampling block 4, an up-sampling block 5 and an up-sampling block 6 respectively, each up-sampling block comprises an up-sampling layer and two convolution layers, each up-sampling layer and each convolution layer have a plurality of candidate operations, and only one candidate operation can be activated in the training and testing stage of the network model; the intermediate feature map of the down-sampling part is fused with the feature map in the up-sampling block in an adding mode, and is finally used for identifying the lesion area of the esophageal cancer;
s3-2: for each batch of training data, the resolution in the hyper-network model and each up-sampling operation and convolution operation in the up-sampling part have only one active option, randomly activated according to a uniform distribution, with each activation independent; the random activation forms a path in the hyper-network model, from which a corresponding sub-network is constructed for training. The training process of the hyper-network model is formally described as:

W* = argmin_W E_{a~Γ(A)} [ L(N(x; a, W(a)), y) ]

where a denotes the network structure, Γ(A) denotes the distribution from which network structures are randomly sampled, W denotes the network parameters (W(a) being the parameters activated by structure a), y denotes the esophageal lesion area label, N(x; a, W(a)) denotes the predicted output of the network model for input x, and L denotes the network training error function.
S4: searching an optimal neural network structure on the constructed hyper-network model by using an evolutionary algorithm:
in step S4, an evolutionary algorithm is used to search for an optimal neural network structure on the constructed super network model, which specifically includes the following steps:
s4-1: constructing a coding scheme: using integers to represent input image resolution, an up-sampling operation type and a convolution operation type;
s4-2: designing genetic operations: based on the coding scheme constructed in S4-1, crossover is implemented directly as single-point split-and-recombine, and mutation is implemented by randomly re-initializing a bit of the candidate solution's code with a probability of 0.1.
S4-3: determining an environment selection method: in the environment selection process, the division Dice value and the floating point Operations Per Second (FLOPs) of the current model are used as two targets to carry out optimization, and elimination priorities of solutions in the population are determined by adopting non-dominant sorting and crowding-distance (growing-distance), so that a uniformly distributed Pareto-optimal (Pareto-optimal) solution set is obtained. Since the identification of the lesion region of esophageal cancer needs to be checked in consideration of efficiency, it is selected here.
It is further explained that the Pareto-optimal solution set of the present invention is obtained as follows: create and initialize a population and compute fitness; if the termination requirement is met, output the Pareto-optimal solution set; otherwise perform genetic operations, perform multi-objective selection, generate the offspring population, and compute fitness again, as shown in FIG. 4.
Further, in step S4-1 the five image resolutions are encoded with the five integers 1-5; the up-sampling operations are encoded with 1 and 2, and the convolution operations with 1-4.
S5: fine-tuning the searched neural network structure and predicting the lesion area on the newly input esophagus image:
in step S5, the method includes the following steps:
s5-1: according to the actual operating-efficiency constraints, selecting from the Pareto solution set obtained by the evolutionary algorithm in step S4-3 the solution that satisfies the constraints and consumes the fewest FLOPs; the actual operating-efficiency constraints comprise the resolution and frame rate of the esophageal endoscopy video stream and the computing power that the current computing platform can provide;
s5-2: converting the decoded codes into actually searched neural network structures, inheriting corresponding weight parameters from the super-network, and then continuing to perform fine adjustment on an esophageal cancer lesion data training set;
s5-3: the refined model is used in the test set to evaluate the effect and further used to predict the lesion area of the newly entered esophageal image.
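The segmentation Dice value used for evaluation in S5-3 (and as a search objective in S4-3) can be computed on binary masks as follows; treating two empty masks as a perfect match is an assumed convention, and the masks are illustrative:

```python
def dice_coefficient(pred, target):
    """Dice similarity of two flat binary masks (lists of 0/1)."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * inter / total

pred_mask   = [1, 1, 0, 0, 1, 0]   # illustrative predicted lesion mask
target_mask = [1, 0, 0, 0, 1, 1]   # illustrative ground-truth polygon mask
score = dice_coefficient(pred_mask, target_mask)
```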
The esophageal cancer lesion area identification model constructed by this method through evolutionary neural network structure search can automatically find the esophageal image resolution suited to the neural network model, reducing the influence of image resolution on lesion area identification performance; it can automatically search for the optimal neural network structure from the collected esophageal image data, eliminating the dependence of neural network structure design on expert experience in the intelligent esophageal cancer recognition task and making deep neural network methods easier to use for esophageal cancer lesion area identification; identification results are shown in FIG. 5.
The above embodiments only express specific implementations of the present application; although their description is relatively specific and detailed, it should not be construed as limiting the scope of the present application. It should be noted that those skilled in the art can make several changes and modifications without departing from the technical idea of the present application, all of which fall within the protection scope of the present application.
Claims (8)
1. An identification modeling method for searching esophageal cancer lesion areas based on an evolutionary neural network structure is characterized by comprising the following steps:
S1: collecting and labeling an esophageal image dataset used for training the neural network model;
S2: constructing a neural network structure search space for esophageal cancer lesion area identification;
S3: training a hyper-network model for esophageal cancer lesion area identification;
S4: searching for an optimal neural network structure on the constructed hyper-network model by using an evolutionary algorithm;
S5: fine-tuning the searched neural network structure, and predicting the lesion area on the newly input esophageal image.
2. The identification modeling method for searching esophageal cancer lesion areas based on an evolutionary neural network structure as claimed in claim 1, wherein collecting and labeling the esophageal image dataset used for training the neural network model in step S1 specifically comprises the following steps:
S1-1: recording and collecting esophageal endoscopy video streams, and screening and cutting out video segments captured in NBI imaging mode;
S1-2: extracting video frames from the video segments containing esophageal lesion areas, and randomly extracting video frames of normal esophagus;
S1-3: labeling the lesion area in each video frame containing an esophageal lesion area with a polygon, and labeling lesion-containing and normal video frames at the image level, respectively;
S1-4: dividing the labeled dataset into a training set, a validation set and a test set.
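Step S1-4 above can be sketched as a random three-way split. The 70/15/15 ratios and the list-of-samples representation are assumptions for illustration; the claim does not fix the split proportions.

```python
import random

def split_dataset(samples, ratios=(0.7, 0.15, 0.15), seed=0):
    """Randomly split labeled frames into training, validation and test sets.
    The 70/15/15 ratios are assumed, not specified by the claim."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

Fixing the shuffle seed keeps the split reproducible across runs, which matters when comparing searched structures on the same validation set.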
3. The identification modeling method for searching esophageal cancer lesion areas based on an evolutionary neural network structure as claimed in claim 1, wherein constructing the neural network structure search space for esophageal cancer lesion area identification in step S2 specifically comprises the following steps:
S2-1: constructing an input image size search space according to the size of the acquired esophageal images, comprising five resolutions: 192×192, 256×256, 320×320, 384×384 and 448×448;
S2-2: constructing an up-sampling operation search space comprising two operations, deconvolution and bilinear interpolation;
4. The identification modeling method for searching esophageal cancer lesion areas based on an evolutionary neural network structure as claimed in claim 3, wherein the stride of the deconvolution in step S2-2 is 2, and the scaling factor of the bilinear interpolation is 2.
5. The identification modeling method for searching esophageal cancer lesion areas based on an evolutionary neural network structure as claimed in claim 1, wherein training the hyper-network model for esophageal cancer lesion area identification in step S3 specifically comprises the following steps:
S3-1: constructing a hyper-network model for esophageal cancer lesion area identification, wherein the whole hyper-network model is divided into a down-sampling part and an up-sampling part; the down-sampling part comprises a convolution layer, a pooling layer and four down-sampling blocks, each down-sampling block comprising three convolution layers connected in a residual manner; the up-sampling part comprises 6 up-sampling blocks, each comprising an up-sampling layer and two convolution layers, wherein each up-sampling layer and each convolution layer has a plurality of candidate operations and only one candidate operation can be activated in the training and testing stages of the network model; the intermediate feature maps of the down-sampling part are fused with the feature maps in the up-sampling blocks by addition, and are finally used for identifying the esophageal cancer lesion area;
S3-2: for each batch of training data, the input resolution and each up-sampling operation and convolution operation in the up-sampling part of the hyper-network model each take exactly one option, randomly activated according to a uniform distribution, with each activation independent of the others; the random activations form a path through the hyper-network model, from which the corresponding sub-network is constructed for training; the training process of the hyper-network model is formally described as follows:
in the above formulaIt is shown that the network structure is,representing the distribution of random samples of the network structure,which is indicative of a parameter of the network,a label indicating a lesion area of the esophagus,representing the predicted output of the network model,representing a network training error function.
6. The identification modeling method for searching esophageal cancer lesion areas based on an evolutionary neural network structure as claimed in claim 1, wherein an evolutionary algorithm is used in step S4 to search for the optimal neural network structure on the constructed hyper-network model, specifically comprising the following steps:
S4-1: constructing a coding scheme: using integers to express the resolution of the input esophageal image, the up-sampling operation type and the convolution operation type;
S4-2: designing genetic operations: according to the coding scheme constructed in S4-1, crossover is realized directly by a single-point split-and-recombine operation, and mutation is realized by randomly reinitializing each bit of the code in a candidate solution with a probability of 0.1;
S4-3: determining an environment selection method: in the environment selection process, the segmentation Dice value and the floating-point operation count (FLOPs) of the current model are used as the two optimization objectives; non-dominated sorting and crowding distance are adopted to determine the elimination priority of solutions in the population, thereby obtaining a uniformly distributed pareto-optimal solution set.
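The genetic operations of step S4-2 can be sketched as below. The gene-range vector follows the integer encoding of S4-1 (one resolution gene, six up-sampling genes, twelve convolution genes); the mutation probability 0.1 is from the claim, everything else is an illustrative assumption.

```python
import random

# Hypothetical sketch of the step S4-2 operators: single-point
# split-and-recombine crossover, and per-gene reinitializing mutation.
GENE_RANGES = [5] + [2] * 6 + [4] * 12  # resolution, up-sampling, convolution genes

def crossover(parent_a, parent_b, rng=random):
    """Split both parents at one random point and swap the tails."""
    point = rng.randrange(1, len(parent_a))
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

def mutate(code, p=0.1, rng=random):
    """Reinitialize each gene uniformly within its range with probability p."""
    return [rng.randint(1, hi) if rng.random() < p else g
            for g, hi in zip(code, GENE_RANGES)]
```

Since all candidate solutions share the same fixed-length integer layout, both operators always produce valid encodings, so no repair step is needed before environment selection.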
7. The identification modeling method for searching esophageal cancer lesion areas based on an evolutionary neural network structure as claimed in claim 6, wherein in step S4-1 the five image resolutions are encoded with the five integers 1 to 5; the upsampling operations are encoded with 1 and 2, and the convolution operations with 1 to 4.
8. The identification modeling method for searching esophageal cancer lesion areas based on an evolutionary neural network structure as claimed in claim 6, wherein fine-tuning the searched neural network structure and predicting the lesion area on the newly input esophageal image in step S5 specifically comprises the following steps:
S5-1: according to the actual operation-efficiency constraints, selecting the solution that satisfies the constraints and consumes the fewest FLOPs from the pareto solution set obtained by the evolutionary algorithm in step S4-3;
S5-2: decoding the selected code into the actually searched neural network structure, inheriting the corresponding weight parameters from the hyper-network, and then continuing to fine-tune on the esophageal cancer lesion training set;
S5-3: evaluating the fine-tuned model on the test set, and further using it to predict the lesion area on newly input esophageal images.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110141443.2A CN112464579B (en) | 2021-02-02 | 2021-02-02 | Identification modeling method for searching esophageal cancer lesion area based on evolutionary neural network structure |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110141443.2A CN112464579B (en) | 2021-02-02 | 2021-02-02 | Identification modeling method for searching esophageal cancer lesion area based on evolutionary neural network structure |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112464579A true CN112464579A (en) | 2021-03-09 |
CN112464579B CN112464579B (en) | 2021-06-01 |
Family
ID=74802562
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110141443.2A Active CN112464579B (en) | 2021-02-02 | 2021-02-02 | Identification modeling method for searching esophageal cancer lesion area based on evolutionary neural network structure |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112464579B (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108695001A (en) * | 2018-07-16 | 2018-10-23 | 武汉大学人民医院(湖北省人民医院) | A kind of cancer lesion horizon prediction auxiliary system and method based on deep learning |
CN109145838A (en) * | 2018-08-29 | 2019-01-04 | 常州市第二人民医院 | Clear cell carcinoma of kidney diagnostic method based on random Gaussian field neural network aiding |
CN109191476A (en) * | 2018-09-10 | 2019-01-11 | 重庆邮电大学 | The automatic segmentation of Biomedical Image based on U-net network structure |
CN109299142A (en) * | 2018-11-14 | 2019-02-01 | 中山大学 | A kind of convolutional neural networks search structure method and system based on evolution algorithm |
US20200340063A1 (en) * | 2019-04-03 | 2020-10-29 | Grail, Inc. | Methylation-based false positive duplicate marking reduction |
CN111488971A (en) * | 2020-04-09 | 2020-08-04 | 北京百度网讯科技有限公司 | Neural network model searching method and device, and image processing method and device |
CN111553464A (en) * | 2020-04-26 | 2020-08-18 | 北京小米松果电子有限公司 | Image processing method and device based on hyper network and intelligent equipment |
CN112116090A (en) * | 2020-09-28 | 2020-12-22 | 腾讯科技(深圳)有限公司 | Neural network structure searching method and device, computer equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
LINJIE GUO 等: "Real-time automated diagnosis of precancerous lesions and early esophageal squamous cell carcinoma using a deep learning model (with videos)", 《GASTROINTESTINAL ENDOSCOPY》 * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112580639A (en) * | 2021-03-01 | 2021-03-30 | 四川大学 | Early gastric cancer image identification method based on evolutionary neural network model compression |
CN112580639B (en) * | 2021-03-01 | 2021-08-13 | 四川大学 | Early gastric cancer image identification method based on evolutionary neural network model compression |
CN114511581A (en) * | 2022-04-20 | 2022-05-17 | 四川大学华西医院 | Multi-task multi-resolution collaborative esophageal cancer lesion segmentation method and device |
CN114898289A (en) * | 2022-05-11 | 2022-08-12 | 北京花兰德科技咨询服务有限公司 | Park combustible identification method and system based on neural network |
CN115099393A (en) * | 2022-08-22 | 2022-09-23 | 荣耀终端有限公司 | Neural network structure searching method and related device |
CN115203585B (en) * | 2022-09-15 | 2022-12-27 | 江苏鸿程大数据技术与应用研究院有限公司 | Automatic architecture searching method of collaborative filtering model |
CN115203585A (en) * | 2022-09-15 | 2022-10-18 | 江苏鸿程大数据技术与应用研究院有限公司 | Automatic architecture searching method of collaborative filtering model |
CN115760777A (en) * | 2022-11-21 | 2023-03-07 | 脉得智能科技(无锡)有限公司 | Hashimoto's thyroiditis diagnostic system based on neural network structure search |
CN115760777B (en) * | 2022-11-21 | 2024-04-30 | 脉得智能科技(无锡)有限公司 | Hashimoto thyroiditis diagnosis system based on neural network structure search |
CN116705289A (en) * | 2023-05-23 | 2023-09-05 | 北京透彻未来科技有限公司 | Cervical pathology diagnosis device based on semantic segmentation network |
CN116705289B (en) * | 2023-05-23 | 2023-12-19 | 北京透彻未来科技有限公司 | Cervical pathology diagnosis device based on semantic segmentation network |
CN117218129A (en) * | 2023-11-09 | 2023-12-12 | 四川大学 | Esophageal cancer image identification and classification method, system, equipment and medium |
CN117218129B (en) * | 2023-11-09 | 2024-01-26 | 四川大学 | Esophageal cancer image identification and classification method, system, equipment and medium |
Also Published As
Publication number | Publication date |
---|---|
CN112464579B (en) | 2021-06-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112464579B (en) | Identification modeling method for searching esophageal cancer lesion area based on evolutionary neural network structure | |
CN111489324B (en) | Cervical image classification method fusing multi-mode prior pathological depth features | |
Luo et al. | Retinal image classification by self-supervised fuzzy clustering network | |
CN112215847B (en) | Method for automatically segmenting overlapped chromosomes based on counterstudy multi-scale features | |
Fan et al. | Evolutionary neural architecture search for retinal vessel segmentation | |
CN114299324B (en) | Pathological image classification method and system based on multiscale domain countermeasure network | |
Hao et al. | Growing period classification of Gynura bicolor DC using GL-CNN | |
Chen et al. | Exchange means change: An unsupervised single-temporal change detection framework based on intra-and inter-image patch exchange | |
Peng et al. | An adaptive coarse-fine semantic segmentation method for the attachment recognition on marine current turbines | |
Khalid et al. | Deepcens: An end-to-end pipeline for cell and nucleus segmentation in microscopic images | |
Zhang et al. | Merging nucleus datasets by correlation-based cross-training | |
Yao et al. | ModeRNN: Harnessing spatiotemporal mode collapse in unsupervised predictive learning | |
KR102407248B1 (en) | Deep Learning based Gastric Classification System using Data Augmentation and Image Segmentation | |
Deng et al. | Omni-seg: A scale-aware dynamic network for renal pathological image segmentation | |
CN116993699A (en) | Medical image segmentation method and system under eye movement auxiliary training | |
CN115587979B (en) | Three-stage attention network-based diabetic retinopathy grading method | |
CN115330759B (en) | Method and device for calculating distance loss based on Hausdorff distance | |
Luo et al. | Negative instance guided self-distillation framework for whole slide image analysis | |
CN115880266A (en) | Intestinal polyp detection system and method based on deep learning | |
Plass et al. | Understanding and explaining diagnostic paths: toward augmented decision making | |
Xin et al. | Learn from each other: Comparison and fusion for medical segmentation loss | |
CN114332107A (en) | Improved tunnel lining water leakage image segmentation method | |
Gammoudi et al. | Hybrid learning method for image segmentation | |
CN112614073A (en) | Image rain removing method based on visual quality evaluation feedback and electronic device | |
CN116563524B (en) | Glance path prediction method based on multi-vision memory unit |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |