CN112419246A - Depth detection network for quantifying esophageal mucosa IPCLs blood vessel morphological distribution - Google Patents

Info

Publication number
CN112419246A (application CN202011263459.2A)
Authority
CN
China
Prior art keywords
network
channels
region
cancer
detection
Prior art date
Legal status
Granted
Application number
CN202011263459.2A
Other languages
Chinese (zh)
Other versions
CN112419246B (en)
Inventor
钟芸诗
颜波
蔡世伦
谭伟敏
王沛晟
李吉春
阿依木克地斯·亚力孔
Current Assignee
Fudan University
Original Assignee
Fudan University
Priority date
Filing date
Publication date
Application filed by Fudan University
Priority to CN202011263459.2A
Publication of CN112419246A
Application granted
Publication of CN112419246B
Legal status: Active
Anticipated expiration

Classifications

    • G06T 7/0012 Biomedical image inspection (under G06T 7/00 Image analysis)
    • G06N 3/045 Combinations of networks (under G06N 3/04 Neural network architectures)
    • G06N 3/08 Learning methods
    • G06T 3/40 Scaling the whole image or part thereof
    • G06T 2207/10068 Endoscopic image (image acquisition modality)
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20104 Interactive definition of region of interest [ROI]
    • G06T 2207/30096 Tumor; Lesion

Abstract

The invention belongs to the technical field of medical image processing, and specifically relates to a depth detection network for quantifying the morphological distribution of esophageal mucosa IPCLs (intrapapillary capillary loops). The network comprises a feature extraction network, a feature pyramid, a region proposal network, a cancer focus classification network with region-of-interest pooling and a self-embedded clustered-distribution prior, and a system for visualization on narrow-band imaging endoscope images. The feature extraction network extracts feature maps from the input image; the feature pyramid fuses features of different scales; the region proposal network proposes possible lesion regions; region-of-interest pooling pools the features of suspicious lesion regions; the classification network with the self-embedded clustered-distribution prior classifies the cancer foci; finally, the results are visualized on the narrow-band imaging endoscope image, with cancer foci framed and marked in different colors. The invention can detect and diagnose cancer foci of early esophageal squamous carcinoma in the image, effectively improve diagnostic efficiency, and help doctors reach higher diagnostic accuracy.

Description

Depth detection network for quantifying esophageal mucosa IPCLs blood vessel morphological distribution
Technical Field
The invention belongs to the technical field of medical image processing, and specifically relates to a depth detection network for quantifying the morphological distribution of esophageal mucosa IPCLs blood vessels.
Background
Esophageal cancer and gastric cancer are common upper gastrointestinal malignant tumors in developing countries such as China; new cases in China account for more than 40% of new cases worldwide, and both morbidity and mortality are markedly higher than the world average[10]. According to the latest statistics of the Chinese tumor registration center, new cases of esophageal cancer and gastric cancer rank sixth and second, respectively, among malignant tumors. The prognosis of both cancers is poor, with 5-year relative survival rates of 20.9% and 27.4% respectively, placing a heavy burden on health care[11,13-14]. Standardized upper gastrointestinal cancer screening, treatment and follow-up are effective means of reducing cancer incidence and mortality, and narrow-band imaging (NBI) endoscopic screening is the first-line means of finding upper gastrointestinal cancer. The pathological type and infiltration depth of esophageal mucosal lesions under a narrow-band imaging endoscope are judged mainly from the characteristic morphology of the intrapapillary capillary loops (IPCLs) in the epithelial papillae.
According to the typing standard proposed by Inoue and Arima[15], these vessels can generally be classified into types A, B1, B2 and B3. Type A means that no abnormal blood vessels are observed. Type B1 means that looped abnormal vessels are observed, showing dilation, a serpentine shape, varying caliber and non-uniform form, with a diameter of 20-30 μm and an infiltration depth of the M1-M2 layers. Type B2 means that non-looped abnormal vessels of irregular tree-like or multiple form are observed, with an infiltration depth of the M3-SM1 layers. Type B3 means that large green vessels are observed, highly dilated, with an infiltration depth of the SM2 layer.
The type, number and distribution of IPCL vessels play an important guiding role in clinical treatment decisions. For example, a large aggregation of IPCLs with a deep infiltration depth suggests that the esophageal lesion has entered the middle or late stage and is unsuitable for minimally invasive or even surgical treatment; conversely, if the IPCLs with a deeper infiltration depth are scattered, the patient may still have an opportunity for surgery.
Clinically, observation of IPCLs is strongly affected by subjective human factors because, unlike conventional gastrointestinal endoscopic imaging, observing IPCLs requires magnifying the lesion surface 10-50 times with a magnifying gastroscope in NBI mode. As with a microscope, the doctor obtains images containing close to 200 fine structures per field of view in this magnified mode. Under such conditions a clinician who must inspect every structure easily develops visual fatigue and, with limited clinical experience, after observing 5-10 fields of view tends to remember only the most striking parts (in line with Murphy's law), lacks an objective and quantifiable picture, and can easily misjudge the condition and make erroneous medical decisions.
This work frees clinicians from such subjective factors (fatigue, carelessness and insufficient experience caused by large amounts of fine observation): they only need to magnify the lesion, and computer analysis then yields IPCL predictions for all fields of view, including the number, proportion and aggregation of each vessel type, helping clinicians judge the lesion more accurately.
The deep convolutional neural network is a machine learning technique that can effectively avoid human factors by automatically learning how to extract rich, representative visual features from large amounts of annotated data. Using the back-propagation optimization algorithm, the machine updates its internal parameters and learns the mapping from an input image to its label. In recent years, deep convolutional neural networks have greatly improved performance on computer vision tasks.
In 2012, Krizhevsky et al.[1] were the first to apply a deep convolutional neural network in the ImageNet[2] image classification competition, winning with a Top-5 error rate of 15.3% and setting off the current wave of deep learning. In 2015, Simonyan et al.[3] proposed the 16-layer and 19-layer networks VGG-16 and VGG-19, increasing the number of network parameters and further improving the ImageNet classification results. In 2016, He et al.[4] used the 152-layer residual network ResNet to achieve classification performance exceeding that of the human eye.
Deep convolutional neural networks not only perform excellently in image classification, but also achieve equally strong results on structured-output tasks such as object detection[5-7] and semantic segmentation[8,9]. Applying deep convolutional neural networks to computer-aided diagnosis can therefore help doctors make better medical diagnoses, enabling early discovery and early treatment and improving therapeutic outcomes.
The invention provides a detection network with a self-embedded clustered-distribution prior, which fully exploits the latent clustered distribution of cancer foci, extracts rich features, and simultaneously performs cancer focus detection and diagnosis for early esophageal squamous carcinoma.
Disclosure of Invention
The invention aims to provide a depth detection network with a self-embedded clustered-distribution prior for quantifying the morphological distribution of esophageal mucosa IPCLs blood vessels, which removes the influence of human factors and realizes automatic diagnosis of narrow-band imaging endoscopic images.
The detection network with a self-embedded clustered-distribution prior provided by the invention is based on an object detection neural network and specifically comprises: a feature extraction backbone network, a feature pyramid network, a region proposal network, a cancer focus classification network with region-of-interest pooling and a self-embedded clustered-distribution prior, and an auxiliary diagnosis system for visualization on narrow-band imaging endoscope images; wherein:
(1) The feature extraction backbone network is built on ResNet-50[4] and contains 50 convolutional layers; it extracts the feature maps of the input image (i.e. it serves as the feature extractor of the feature pyramid). Specifically, feature maps are taken at the end of layers 1, 2, 3 and 4 of the ResNet-50 model; they have 256, 512, 1024 and 2048 channels respectively, and their sizes are 1/4, 1/8, 1/16 and 1/32 of the original image. These feature maps are fed into the feature pyramid network[12];
(2) The feature pyramid network fuses features of different scales. All feature maps are first unified to 256 channels with 1 × 1 convolutions; then, from top to bottom, the upper-level features are up-sampled to twice their size layer by layer, added to the lower-level features, and passed through a 3 × 3 convolution. This yields a multi-scale feature map whose levels are 1/4, 1/8, 1/16 and 1/32 of the original image size, each with 256 channels (together with the proposal head of paragraph (3), this step is sketched in code after paragraph (5));
(3) The region proposal network extracts possible lesion regions. An anchor generator[5] first produces dense rectangular candidate boxes; these come in 5 × 3 different sizes, combining five widths (such as 32, 64, 128, 256 and 512) with three aspect ratios (such as 1:1, 1:2 and 2:1). The features of each pyramid level pass through a 3 × 3 convolution followed by 1 × 1 convolutions, and Softmax judges whether each candidate box is a positive or a negative sample; finally, bounding-box regression for the three shapes is performed through a 1 × 1 convolution with 12 output channels (each box has 4 coordinates, so 3 × 4 = 12 channels), correcting inaccurate candidate boxes;
(4) The cancer focus classification network with region-of-interest pooling and a self-embedded clustered-distribution prior: region-of-interest pooling pools the features of suspicious lesion regions, and the classification network with the self-embedded clustered-distribution prior classifies the cancer foci. Specifically, each region of interest is framed with a rectangular bounding box parallel to the coordinate axes, and the cancer focus classification result of that region is given, i.e. a normal (type A) region or a lesion region (types B1, B2, B3). The network first extracts regions of interest from the different levels of the feature pyramid, aligns them, and pools each to at most 7 × 7, so that each region of interest corresponds to a feature of size 256 × 7 × 7. The feature of each region of interest is then concatenated along the channel dimension with the features of its K nearest neighbours, giving a feature map of shape (256 × K) × 7 × 7, so that the classification network exploits the latent clustered distribution prior of cancer foci (a sketch of this head follows paragraph (5)). Two output branches are then produced through fully connected layers: the first branch outputs the position offset of each feature region, further correcting the position of the detection box; the second branch computes the classification probabilities through a Softmax function, giving the cancer focus class of the region. The fully connected layer flattens the (256 × K) × 7 × 7 feature map into a (12544 × K) × 1 × 1 feature and outputs 1024 channels; the first branch outputs 20 channels, i.e. 5 × 4, four bounding-box coordinates for each class, and the second branch outputs 5 channels, i.e. 5 classes including the negative sample;
(5) The auxiliary diagnosis system for visualization on the narrow-band imaging endoscope image displays the final result on the narrow-band imaging endoscope image and marks the cancer foci with boxes of different colors. Specifically, the input is a narrow-band imaging endoscope image; the network detects and diagnoses the cancer foci, and detection boxes of different colors represent the different cancer focus types, i.e. green, red, purple and black represent types A, B1, B2 and B3 respectively, each box being annotated with its classification confidence. The confidences of all detection boxes are then screened: boxes with confidence below a threshold T1 are removed, and non-maximum suppression eliminates redundant overlapping boxes whose intersection-over-union exceeds a threshold T2. T1 and T2 take all values in [0, 1] with a step of 0.05, and the optimal thresholds are determined by comparing F1 scores (this search is sketched below).
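To make paragraphs (2) and (3) above concrete, the following is a minimal PyTorch sketch of the pyramid fusion and of the proposal head. It is not the patented implementation itself; the module layout, variable names and the nearest-neighbour upsampling mode are illustrative assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

# Backbone levels C2..C5 with 256/512/1024/2048 channels are fused into a
# 256-channel pyramid as in paragraph (2): 1x1 lateral convolutions, top-down
# 2x upsampling with addition, then a 3x3 smoothing convolution per level.
# Assumes each level is exactly twice the spatial size of the next.
class SimpleFPN(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])
        self.smooth = nn.ModuleList([nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                     for _ in in_channels])

    def forward(self, feats):                  # feats: [C2, C3, C4, C5], strides 4/8/16/32
        laterals = [lat(f) for lat, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 1, 0, -1):              # top-down pathway
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], scale_factor=2, mode="nearest")
        return [s(p) for s, p in zip(self.smooth, laterals)]   # P2..P5, 256 channels each

# Proposal head as in paragraph (3): a shared 3x3 convolution, a 1x1 convolution
# giving a 2-way (positive/negative) score per anchor, and a 1x1 convolution with
# 3 * 4 = 12 channels for the bounding-box regression of the three anchor shapes
# (ratios 1:1, 1:2, 2:1; the five widths 32..512 are assigned across pyramid levels).
class SimpleRPNHead(nn.Module):
    def __init__(self, channels=256, num_anchors=3):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.cls = nn.Conv2d(channels, num_anchors * 2, 1)     # Softmax over pos/neg
        self.reg = nn.Conv2d(channels, num_anchors * 4, 1)     # 12 regression channels

    def forward(self, pyramid):
        outputs = []
        for p in pyramid:
            h = F.relu(self.conv(p))
            outputs.append((self.cls(h), self.reg(h)))
        return outputs
```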
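The cluster-distribution-prior head of paragraph (4) can be sketched in the same way. How the K nearest neighbours of a region of interest are chosen is not fixed by the text above; the sketch assumes they are the K candidate boxes with the smallest centre-to-centre distance (the region itself included), which reproduces the (256 × K) × 7 × 7 shape.

```python
import torch
import torch.nn as nn

# Sketch under assumptions: neighbours = the K boxes with the closest centres,
# the box itself included, so K features of 256x7x7 are concatenated per RoI.
class ClusterPriorHead(nn.Module):
    def __init__(self, k=4, channels=256, pool=7, num_classes=5):
        super().__init__()
        self.k = k
        flat = channels * k * pool * pool                     # (256*K)*7*7 = 12544*K
        self.fc = nn.Sequential(nn.Flatten(), nn.Linear(flat, 1024), nn.ReLU())
        self.bbox_branch = nn.Linear(1024, num_classes * 4)   # 20 channels: per-class box offsets
        self.cls_branch = nn.Linear(1024, num_classes)        # 5 channels incl. negative sample

    def forward(self, roi_feats, boxes):
        # roi_feats: [N, 256, 7, 7] RoI-aligned features; boxes: [N, 4] as (x1, y1, x2, y2); N >= K
        centres = torch.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                               (boxes[:, 1] + boxes[:, 3]) / 2], dim=1)
        dist = torch.cdist(centres, centres)                   # [N, N] pairwise centre distances
        idx = dist.topk(self.k, largest=False).indices         # each box plus its K-1 neighbours
        grouped = roi_feats[idx]                               # [N, K, 256, 7, 7]
        fused = grouped.flatten(1, 2)                          # [N, K*256, 7, 7]
        hidden = self.fc(fused)
        return self.bbox_branch(hidden), self.cls_branch(hidden)   # offsets, class logits
```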
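The threshold search at the end of paragraph (5) amounts to a simple grid search; a sketch follows, where evaluate_f1 stands for the user's own routine that matches detections to annotations and returns the F1 score. That routine is an assumed placeholder, not something defined in the patent.

```python
import numpy as np

# Grid search over the confidence threshold T1 and the NMS IoU threshold T2:
# every value in [0, 1] with step 0.05 is tried and the pair with the best
# validation F1 score is kept. `evaluate_f1` is an assumed placeholder.
def search_thresholds(predictions, ground_truth, evaluate_f1):
    best_t1, best_t2, best_f1 = 0.0, 0.0, -1.0
    for t1 in np.arange(0.0, 1.0 + 1e-9, 0.05):          # candidate confidence cut-offs
        for t2 in np.arange(0.0, 1.0 + 1e-9, 0.05):      # candidate IoU thresholds
            f1 = evaluate_f1(predictions, ground_truth, conf_thresh=t1, iou_thresh=t2)
            if f1 > best_f1:
                best_t1, best_t2, best_f1 = float(t1), float(t2), f1
    return best_t1, best_t2, best_f1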
The training method of the network model comprises the following steps:
before training, network parameters of the ResNet-50 model are initialized randomly, images in a training set are scaled, the resolution of the images is not more than 800 x 1333, and corresponding bounding boxes are scaled in the same proportion.
During training, the three channels (R, G, B) of each image are first normalized with mean [0.485, 0.456, 0.406] and standard deviation [0.229, 0.224, 0.225]. The Adam optimization algorithm[16] is used with an initial learning rate of 10⁻⁴; the two exponential decay rates for the moment estimates are set to β1 = 0.9 and β2 = 0.999, the weight decay is 0, and a mini-batch stochastic gradient descent strategy with a batch size of 8 is used to minimize the loss function. Training runs for N rounds. Because the vessel types are unevenly distributed in the training set, types B2 and B3 would otherwise not be sufficiently trained, so Focal loss is used as the loss function of the cancer focus classification network; the weights of the negative sample and of types A, B1, B2 and B3 are C1, C2, C3, C4 and C5 respectively, determined after several experiments according to the distribution of each vessel type in the training set.
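A minimal sketch of this training set-up follows. The model and dataset are passed in by the caller, and the focal-loss focusing parameter gamma is not stated above, so the common default of 2 is assumed.

```python
import torch
import torch.nn.functional as F
import torchvision.transforms as T

# Per-channel (R, G, B) normalization with the mean and standard deviation above.
normalize = T.Normalize(mean=[0.485, 0.456, 0.406],
                        std=[0.229, 0.224, 0.225])

def configure_training(model, train_set):
    """Adam with lr 1e-4, betas (0.9, 0.999), weight decay 0, mini-batches of 8."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,
                                 betas=(0.9, 0.999), weight_decay=0.0)
    loader = torch.utils.data.DataLoader(train_set, batch_size=8, shuffle=True)
    return optimizer, loader

def focal_loss(logits, targets, alpha, gamma=2.0):
    """Class-weighted focal loss; `alpha` holds the weights C1..C5 of the
    negative, A, B1, B2 and B3 classes; gamma = 2 is an assumed default."""
    ce = F.cross_entropy(logits, targets, reduction="none")   # -log p of the true class
    pt = torch.exp(-ce)                                       # probability of the true class
    return (alpha[targets] * (1.0 - pt) ** gamma * ce).mean()
```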
In the invention, after a narrow-band imaging endoscope image is input, the cancer focus detection and diagnosis result is obtained with only a single forward pass.
The invention has the beneficial effects that:
the invention designs a cluster distribution prior self-embedded detection network, which takes a narrow-band imaging endoscope image as input and simultaneously realizes the cancer focus detection and diagnosis of early esophageal squamous cell carcinoma. The image to be tested can obtain detection and diagnosis results only through one-time forward propagation, partial network parameters are shared by detection and classification tasks, the calculated amount is effectively reduced, and the diagnosis efficiency is improved. Experimental results show that the invention can accurately detect the cancer focus area of early esophageal squamous carcinoma, provides accurate diagnosis results based on the detection frame, reduces the influence of human factors and improves the efficiency and accuracy of clinical diagnosis.
Drawings
FIG. 1 is a network framework diagram of the present invention.
FIG. 2 is a schematic diagram of the detection and diagnosis results after a narrow-band imaging endoscope image is input into the network model: (a) the narrow-band imaging endoscope image; (b) the result of detecting and classifying the cancer foci in the image with the method; (c) the result of detecting and classifying the cancer foci in the image by experienced doctors.
Fig. 3 compares the visualized detection and diagnosis results of the invention with those of a doctor on a narrow-band imaging endoscopic image.
Fig. 4 compares the per-class recall of the invention with that of a doctor's detection and diagnosis on narrow-band imaging endoscopic images.
Fig. 5 shows the feature maps obtained after feature extraction by the feature extraction network of the invention.
Detailed Description
The embodiments of the present invention are described in detail below, but the scope of the present invention is not limited to the examples.
The invention adopts the network framework shown in Fig. 1 and is trained on 144 narrow-band imaging endoscope images jointly annotated by several senior doctors, yielding a model that automatically detects and diagnoses esophageal squamous cell carcinoma foci on narrow-band imaging endoscope images. The specific process is as follows:
(1) Before training, the network parameters of the ResNet-50 model are randomly initialized, and the images in the training set are scaled so that their resolution does not exceed 800 × 1333, with the corresponding bounding boxes scaled by the same ratio.
(2) During training, the three channels (R, G, B) of each image are first normalized with mean [0.485, 0.456, 0.406] and standard deviation [0.229, 0.224, 0.225]; the Adam optimization algorithm[16] is used with an initial learning rate of 10⁻⁴, the two exponential decay rates β1 = 0.9 and β2 = 0.999, and a weight decay of 0, together with a mini-batch stochastic gradient descent strategy with a batch size of 8 to minimize the loss function; training runs for N rounds; because the vessel types are unevenly distributed in the training set and types B2 and B3 would otherwise not be sufficiently trained, Focal loss is used as the loss function of the cancer focus classification network, with the weights of the negative sample and of types A, B1, B2 and B3 set to C1, C2, C3, C4 and C5 respectively, determined after several experiments according to the distribution of each vessel type in the training set.
(3) During testing, the narrow-band imaging endoscope image is scaled so that its resolution does not exceed 800 × 1333 and is input into the trained model, which outputs the outer bounding boxes of all detected vessels, their cancer focus classes (the normal type A and the abnormal types B1, B2 and B3, four classes in total) and the corresponding confidence p. Because a narrow-band imaging endoscope image contains many vessels, the upper limit on the number of detection boxes per image is set to 250. A threshold T1 = 0.3 is set: when p > 0.3 the outer bounding box is kept, otherwise it is removed. A threshold T2 = 0.3 is set: the remaining outer bounding boxes are processed with non-maximum suppression, keeping only the box with the highest confidence p within each neighbourhood (boxes whose intersection-over-union exceeds T2). A sketch of this post-processing follows.
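The filtering in step (3) can be sketched as below. It is an illustrative reconstruction rather than the patented implementation, and relies on torchvision's standard non-maximum suppression; the tensor names are assumptions.

```python
from torchvision.ops import nms

# Test-time filtering as in step (3): keep at most 250 boxes per image,
# drop detections with confidence p <= 0.3 (T1), then suppress overlapping
# boxes with intersection-over-union above 0.3 (T2), keeping the highest p.
def postprocess(boxes, scores, labels, conf_thresh=0.3, iou_thresh=0.3, max_det=250):
    order = scores.argsort(descending=True)[:max_det]       # cap at 250 detections
    boxes, scores, labels = boxes[order], scores[order], labels[order]
    keep = scores > conf_thresh                              # threshold T1
    boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
    keep = nms(boxes, scores, iou_thresh)                    # threshold T2
    return boxes[keep], scores[keep], labels[keep]
```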
FIG. 2 illustrates the detection and diagnosis results after a narrow-band imaging endoscope image is input into the network model: (a) the original narrow-band imaging endoscope image; (b) the outer bounding boxes obtained by detecting the cancer foci in the image, with their classes and confidences, where the different colors represent the different cancer focus types, i.e. green, red, purple and white represent A, B1, B2 and B3 respectively; (c) the result of cancer focus detection and classification produced jointly, after discussion, by several doctors with many years of clinical practice and rich experience. The figure shows that the detections and classifications of the system are essentially consistent with the joint judgment of several experienced doctors, demonstrating the strong practical value of the invention.
Fig. 3 compares the visual results of the invention with the detection and diagnosis of a single doctor on a narrow-band imaging endoscope image, where the reference standard is jointly annotated by several senior doctors. A single doctor inevitably makes mistakes and omissions and cannot reach high sensitivity, whereas the system of the invention is not only faster (less than 1 second per image) but also more accurate than a single doctor.
FIG. 4 compares the per-class recall of the invention with that of a single doctor on narrow-band imaging endoscope images, where the reference standard is jointly annotated by several senior doctors. The recall of the invention is much higher than that of a single doctor; since recall here means the proportion of real cancer foci that are detected and correctly classified, the invention misses or misclassifies far fewer foci than a single doctor.
Fig. 5 shows the feature maps produced by the feature extraction network of the invention. After feature extraction, the feature values of vessels and non-vessels differ greatly, indicating that the feature extraction network can effectively extract the key features needed for detection and diagnosis from the narrow-band imaging endoscope image.
Tables 1 and 2 give the sensitivity, precision and recall of the invention and of a single doctor on narrow-band imaging endoscope images. Table 1 shows the performance of the network of the invention with K = 4 (i.e. the classification uses feature fusion over 4 neighbours); Table 2 shows the detection and diagnosis results of a single doctor. The reference standard is jointly annotated by several senior doctors. The invention surpasses the detection and diagnosis level of a single doctor in recall, demonstrating its clinical value.
TABLE 1
Type      TP     FP    FN    Sensitivity   Precision   Recall
A         169    267   53    0.761         0.388       0.669
B1        3248   489   249   0.929         0.869       0.916
B2        98     40    70    0.583         0.710       0.466
B3        20     22    5     0.800         0.476       0.500
Overall   3535   818   377   0.904         0.812       0.884
TABLE 2 (single doctor; only recall was reported)
Lesion type   Recall
A             0.50
B1            0.70
B2            0.93
B3            1.00
Overall       0.67
References
[1] Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 1097-1105 (2012).
[2] Russakovsky, O., Deng, J., Su, H. et al. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision 115, 211-252 (2015).
[3] Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. International Conference on Learning Representations (2014).
[4] He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition, 770-778 (2016).
[5] Girshick, R., Donahue, J., Darrell, T. & Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. IEEE Conference on Computer Vision and Pattern Recognition, 580-587 (2014).
[6] Girshick, R. Fast R-CNN. IEEE International Conference on Computer Vision, 1440-1448 (2015).
[7] Ren, S., He, K., Girshick, R. & Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Neural Information Processing Systems (2015).
[8] Long, J., Shelhamer, E. & Darrell, T. Fully convolutional networks for semantic segmentation. IEEE Conference on Computer Vision and Pattern Recognition, 3431-3440 (2015).
[9] Chen, L., Papandreou, G., Kokkinos, I., Murphy, K. & Yuille, A. L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence 40, 834-848 (2018).
[10] Ervik, M., L, F., Ferlay, J. et al. Cancer Today. Lyon, France: International Agency for Research on Cancer [EB/OL]. [2017-02-26].
[11] Chen Wanqing, Zheng Rongshou, Zhang Wei, et al. Analysis of malignant tumor incidence and mortality in China, 2013. 2017, 26(1): 1-7.
[12] Lin, T.-Y., Dollár, P., Girshick, R. B., He, K., Hariharan, B. & Belongie, S. J. Feature pyramid networks for object detection. IEEE Conference on Computer Vision and Pattern Recognition, 936-944 (2017).
[13] Zeng, H., Zheng, R., Guo, Y. et al. Cancer survival in China, 2003-2005: a population-based study. Int J Cancer, 2015, 136(8).
[14] Chen, W. Q., Zheng, R. S., Baade, P. D. et al. Cancer statistics in China, 2015. CA Cancer J Clin, 2016, 66(2): 115-132.
[15] Inoue, H., Kaga, M., Ikeda, H. et al. Magnification endoscopy in esophageal squamous cell carcinoma: a review of the intrapapillary capillary loop classification. Ann Gastroenterol, 2015, 28(1): 41-48.
[16] Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. ICLR (Poster) 2015.

Claims (3)

1. A depth detection network with a self-embedded clustered-distribution prior for quantifying the morphological distribution of esophageal mucosa IPCLs blood vessels, characterized by specifically comprising: a feature extraction backbone network, a feature pyramid network, a region proposal network, a cancer focus classification network with region-of-interest pooling and a self-embedded clustered-distribution prior, and an auxiliary diagnosis system for visualization on narrow-band imaging endoscope images; wherein:
(1) the feature extraction backbone network is built on ResNet-50 and comprises 50 convolutional layers for extracting the feature maps of the input image; specifically, feature maps are taken at the end of layers 1, 2, 3 and 4 of the ResNet-50 model; they have 256, 512, 1024 and 2048 channels respectively, and their sizes are 1/4, 1/8, 1/16 and 1/32 of the original image; these feature maps are fed into the feature pyramid network;
(2) the feature pyramid network fuses features of different scales: all feature maps are first unified to 256 channels with 1 × 1 convolutions; then, from top to bottom, the upper-level features are up-sampled to twice their size layer by layer, added to the lower-level features, and passed through a 3 × 3 convolution, yielding a multi-scale feature map whose levels are 1/4, 1/8, 1/16 and 1/32 of the original image size, each with 256 channels;
(3) the region proposal network extracts possible lesion regions: an anchor generator first produces dense rectangular candidate boxes; these come in 5 × 3 different sizes, combining five widths with three aspect ratios; the features of each pyramid level pass through a 3 × 3 convolution followed by 1 × 1 convolutions, and Softmax judges whether each candidate box is a positive or a negative sample; finally, bounding-box regression for the three shapes is performed through a 1 × 1 convolution with 12 output channels, correcting inaccurate candidate boxes;
(4) in the cancer focus classification network with region-of-interest pooling and a self-embedded clustered-distribution prior, region-of-interest pooling pools the features of suspicious lesion regions, and the classification network with the self-embedded clustered-distribution prior classifies the cancer foci; specifically, each region of interest is framed with a rectangular bounding box parallel to the coordinate axes, and the cancer focus classification result of that region is given, i.e. a normal (type A) region or a lesion region (types B1, B2, B3); the network first extracts regions of interest from the different levels of the feature pyramid, aligns them, and pools each to at most 7 × 7, so that each region of interest corresponds to a feature of size 256 × 7 × 7; the feature of each region of interest is then concatenated with the features of its K nearest neighbours, giving a feature map of shape (256 × K) × 7 × 7, so that the classification network exploits the latent clustered distribution prior of cancer foci; two output branches are then produced through fully connected layers: the first branch outputs the position offset of each feature region, further correcting the position of the detection box; the second branch computes the classification probabilities through a Softmax function, giving the cancer focus class of the region; the fully connected layer flattens the (256 × K) × 7 × 7 feature map into a (12544 × K) × 1 × 1 feature and outputs 1024 channels; the first branch outputs 20 channels, i.e. 5 × 4, four bounding-box coordinates for each class, and the second branch outputs 5 channels, i.e. 5 classes including the negative sample;
(5) the auxiliary diagnosis system for visualization on the narrow-band imaging endoscope image displays the result on the narrow-band imaging endoscope image and marks the cancer foci with boxes of different colors; specifically, the input is a narrow-band imaging endoscope image; the network detects and diagnoses the cancer foci, and detection boxes of different colors represent the different cancer focus types, i.e. green, red, purple and black represent types A, B1, B2 and B3 respectively, each box being annotated with its classification confidence; the confidences of all detection boxes are then screened: boxes with confidence below a threshold T1 are removed, and non-maximum suppression eliminates redundant overlapping boxes whose intersection-over-union exceeds a threshold T2; T1 and T2 take all values in [0, 1] with a step of 0.05, and the optimal thresholds T1 and T2 are determined by comparing F1 scores.
2. The depth detection network of claim 1, wherein the network model is trained as follows:
before training, the network parameters of the ResNet-50 model are randomly initialized, and the images in the training set are scaled so that their resolution does not exceed 800 × 1333, with the corresponding bounding boxes scaled by the same ratio;
during training, the three channels (R, G, B) of each image are first normalized with mean [0.485, 0.456, 0.406] and standard deviation [0.229, 0.224, 0.225]; the Adam optimization algorithm is used with an initial learning rate of 10⁻⁴, two exponential decay rates β1 = 0.9 and β2 = 0.999, and a weight decay of 0, together with a mini-batch stochastic gradient descent strategy with a batch size of 8 to minimize the loss function; training runs for N rounds; because the vessel types are unevenly distributed in the training set and types B2 and B3 would otherwise not be sufficiently trained, Focal loss is used as the loss function of the cancer focus classification network, with the weights of the negative sample and of types A, B1, B2 and B3 set to C1, C2, C3, C4 and C5 respectively, determined after several experiments according to the distribution of each vessel type in the training set.
3. The depth detection network of claim 2, wherein a narrow-band imaging endoscope image input into the trained network yields the cancer focus detection and diagnosis result with a single forward pass.
CN202011263459.2A 2020-11-12 2020-11-12 Depth detection network for quantifying esophageal mucosa IPCLs blood vessel morphological distribution Active CN112419246B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011263459.2A CN112419246B (en) 2020-11-12 2020-11-12 Depth detection network for quantifying esophageal mucosa IPCLs blood vessel morphological distribution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011263459.2A CN112419246B (en) 2020-11-12 2020-11-12 Depth detection network for quantifying esophageal mucosa IPCLs blood vessel morphological distribution

Publications (2)

Publication Number Publication Date
CN112419246A 2021-02-26
CN112419246B CN112419246B (en) 2022-07-22

Family

ID=74831021

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011263459.2A Active CN112419246B (en) 2020-11-12 2020-11-12 Depth detection network for quantifying esophageal mucosa IPCLs blood vessel morphological distribution

Country Status (1)

Country Link
CN (1) CN112419246B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113643291A (en) * 2021-10-14 2021-11-12 武汉大学 Method and device for determining esophagus marker infiltration depth grade and readable storage medium
CN113706533A (en) * 2021-10-28 2021-11-26 武汉大学 Image processing method, image processing device, computer equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109118485A (en) * 2018-08-13 2019-01-01 复旦大学 Digestive endoscope image classification based on multitask neural network cancer detection system early
CN111784671A (en) * 2020-06-30 2020-10-16 天津大学 Pathological image focus region detection method based on multi-scale deep learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109118485A (en) * 2018-08-13 2019-01-01 复旦大学 Digestive endoscope image classification based on multitask neural network cancer detection system early
CN111784671A (en) * 2020-06-30 2020-10-16 天津大学 Pathological image focus region detection method based on multi-scale deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MOHAMED HUSSEIN ET AL.: "Role of artificial intelligence in the diagnosis of oesophageal neoplasia: 2020 an endoscopic odyssey", 《WORLD JOURNAL OF GASTROENTEROLOGY》 *
ZHAO Yuanyuan: "Application of narrow-band imaging and magnifying endoscopy images in the diagnosis of early esophageal squamous cell carcinoma, with an exploratory study of computer-aided diagnosis methods", China Master's and Doctoral Dissertations Full-text Database (Doctoral), Medicine & Health Sciences *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113643291A (en) * 2021-10-14 2021-11-12 武汉大学 Method and device for determining esophagus marker infiltration depth grade and readable storage medium
CN113643291B (en) * 2021-10-14 2021-12-24 武汉大学 Method and device for determining esophagus marker infiltration depth grade and readable storage medium
CN113706533A (en) * 2021-10-28 2021-11-26 武汉大学 Image processing method, image processing device, computer equipment and storage medium
CN113706533B (en) * 2021-10-28 2022-02-08 武汉大学 Image processing method, image processing device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN112419246B (en) 2022-07-22

Similar Documents

Publication Publication Date Title
CN112102256B (en) Narrow-band endoscopic image-oriented cancer focus detection and diagnosis system for early esophageal squamous carcinoma
Li et al. A large-scale database and a CNN model for attention-based glaucoma detection
EP2685881B1 (en) Medical instrument for examining the cervix
Roth et al. A new 2.5 D representation for lymph node detection using random sets of deep convolutional neural network observations
CN111985536B (en) Based on weak supervised learning gastroscopic pathology image Classification method
CN109858540B (en) Medical image recognition system and method based on multi-mode fusion
CN110276356A (en) Eye fundus image aneurysms recognition methods based on R-CNN
CN108765392B (en) Digestive tract endoscope lesion detection and identification method based on sliding window
Lv et al. A cascade network for detecting covid-19 using chest x-rays
Pal et al. Deep metric learning for cervical image classification
CN112419246B (en) Depth detection network for quantifying esophageal mucosa IPCLs blood vessel morphological distribution
Sun et al. A novel gastric ulcer differentiation system using convolutional neural networks
Maghsoudi et al. A computer aided method to detect bleeding, tumor, and disease regions in Wireless Capsule Endoscopy
CN112102332A (en) Cancer WSI segmentation method based on local classification neural network
Lei et al. Automated detection of retinopathy of prematurity by deep attention network
CN114398979A (en) Ultrasonic image thyroid nodule classification method based on feature decoupling
Xing et al. A saliency-aware hybrid dense network for bleeding detection in wireless capsule endoscopy images
WO2021183765A1 (en) Automated detection of tumors based on image processing
Xiong et al. Automatic cataract classification based on multi-feature fusion and SVM
CN112419248A (en) Ear sclerosis focus detection and diagnosis system based on small target detection neural network
Krak et al. Detection of early pneumonia on individual CT scans with dilated convolutions
Cao et al. Deep learning based lesion detection for mammograms
Oloumi et al. Digital image processing for ophthalmology: Detection and modeling of retinal vascular architecture
Li et al. Tongue image segmentation via thresholding and clustering
Wang et al. A ROI extraction method for wrist imaging applied in smart bone-age assessment system

Legal Events

Code: Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant