CN117854705A - Multi-lesion multi-task intelligent diagnosis method, device, equipment and medium for upper digestive tract


Info

Publication number
CN117854705A
Authority
CN
China
Prior art keywords: image, lesion, features, module, inputting
Prior art date
Legal status
Pending
Application number
CN202311840684.1A
Other languages
Chinese (zh)
Inventor
王百键
余汉濠
晏涛
陈彦宁
Current Assignee
Macau Jinghu Hospital
University of Macau
Original Assignee
Macau Jinghu Hospital
University of Macau
Priority date
Filing date
Publication date
Application filed by Macau Jinghu Hospital, University of Macau filed Critical Macau Jinghu Hospital
Priority to CN202311840684.1A
Publication of CN117854705A
Legal status: Pending

Landscapes

  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The invention discloses a multi-lesion, multi-task intelligent diagnosis method, device, equipment and medium for the upper digestive tract, comprising the following steps: inputting an upper gastrointestinal endoscopic image into an upper gastrointestinal multi-task classification model to generate fusion features corresponding to the image; processing the fusion features through the multi-task classification model to obtain the image anatomical part and a lesion image corresponding to the endoscopic image; inputting the lesion image into an upper gastrointestinal multi-lesion segmentation model to generate a lesion-region target segmentation result; and making a diagnosis according to the image anatomical part corresponding to the endoscopic image and the lesion-region target segmentation result. By generating and processing fusion features, producing the lesion-region target segmentation result from the lesion image, and diagnosing from the image anatomical part together with the segmentation result, the method can improve diagnostic efficiency, accuracy and operation quality, and reduce missed diagnosis and misdiagnosis.

Description

Multi-lesion multi-task intelligent diagnosis method, device, equipment and medium for upper digestive tract
Technical Field
The invention relates to the technical field of medical image processing, and in particular to a multi-lesion, multi-task intelligent diagnosis method, device, equipment and medium for the upper digestive tract.
Background
Upper gastrointestinal cancer is a relatively common type of malignancy, mainly comprising esophageal cancer and gastric cancer. Numerous clinical observations and basic studies have shown that upper gastrointestinal cancer progresses through multiple stages, from normal tissue through precancerous lesions. Early detection and early treatment of upper gastrointestinal cancers and potentially precancerous related lesions can effectively improve patients' quality of life and reduce the socioeconomic burden of cancer treatment. In clinical diagnosis, a doctor must judge, within a limited time and from the large number of images produced by endoscopy, whether a lesion exists and what type it is; further, the precise area of the lesion must be determined in order to assess the condition and formulate subsequent treatment plans.
However, the manual diagnosis process based on endoscopic images described above is susceptible to a variety of subjective and objective factors, and manual diagnosis of upper gastrointestinal lesions under endoscopy is challenging. Observing all parts of the upper digestive tract during endoscopy is a prerequisite for a complete examination; because endoscopists differ greatly in diagnostic skill, anatomical parts of the upper digestive tract are sometimes insufficiently observed, resulting in missed lesions, and the time and economic costs are high. Different lesion types show only subtle differences, are hard to distinguish, and are easily misdiagnosed. Lesions of the same type also vary widely: lesion areas span a large range with multiple scales coexisting, and small lesions have inconspicuous features and are easily missed. Therefore, intelligent diagnostic methods with higher diagnostic efficiency, accuracy and operational quality are needed.
The foregoing is provided merely for the purpose of facilitating understanding of the technical solutions of the present invention and is not intended to represent an admission that the foregoing is prior art.
Disclosure of Invention
The invention mainly aims to provide a multi-lesion, multi-task intelligent diagnosis method, device, equipment and medium for the upper digestive tract, so as to solve the technical problem that the precise areas of lesions are difficult to determine in the prior art.
In order to achieve the above purpose, the invention provides a multi-lesion, multi-task intelligent diagnosis method for the upper digestive tract, comprising the following steps:
inputting an endoscopic image into an upper gastrointestinal tract multitasking classification model, and generating fusion characteristics corresponding to the endoscopic image;
processing the fusion characteristics through the upper gastrointestinal tract multi-task classification model to obtain an image anatomical part and a lesion image corresponding to the endoscopic image;
inputting the lesion image into an upper gastrointestinal tract multi-lesion segmentation model to generate a lesion region target segmentation result;
and diagnosing according to the image anatomical part corresponding to the endoscopic image and the target segmentation result of the lesion area.
Optionally, the upper gastrointestinal tract multitask classification model comprises a CNN backbone network, a global feature extraction module, a local feature extraction module and an orthogonal feature fusion module, wherein the CNN backbone network is respectively connected with the inputs of the global feature extraction module and the local feature extraction module, and the orthogonal feature fusion module is respectively connected with the outputs of the global feature extraction module and the local feature extraction module;
the step of inputting the endoscopic image into the upper gastrointestinal tract multitasking classification model and generating the fusion characteristic corresponding to the endoscopic image comprises the following steps:
extracting shallow general features from an endoscopic image through the CNN backbone network;
inputting the shallow generic features to the global feature extraction module to generate global features;
inputting the shallow general features to the local feature extraction module to generate local features;
and inputting the global features and the local features to the orthogonal feature fusion module to generate fusion features corresponding to the endoscopic images.
Optionally, the local feature extraction module comprises an atrous spatial pyramid pooling (ASPP) module and a hybrid attention module, wherein the hybrid attention module comprises a channel attention unit and a spatial attention unit;
the step of inputting the shallow generic feature to the local feature extraction module to generate a local feature includes:
inputting the shallow general features into the ASPP module to generate intermediate general features, wherein the intermediate general features are shallow general features that retain multi-scale context information;
inputting the intermediate general features into the channel attention unit to acquire the dependency relationships among channels, acquiring the spatial pixel-level relationships through the spatial attention unit, and generating local features based on the dependency relationships and the pixel-level relationships.
Optionally, the step of inputting the shallow generic feature to the global feature extraction module to generate a global feature includes:
carrying out blocking (patch partitioning) on the shallow general features through the global feature extraction module, flattening the resulting blocks, and generating one-dimensional vectors;
performing linear projection transformation on the one-dimensional vector, and simultaneously introducing position coding and classification marker bits into the transformed one-dimensional vector to generate a target one-dimensional vector;
and generating global features corresponding to the endoscopic image according to the target one-dimensional vector.
Optionally, the upper gastrointestinal tract multi-lesion segmentation model comprises: a U-Net framework, a global feature extraction module and an operation module, wherein the U-Net framework comprises three CNN modules, and the operation module comprises a reshaping unit, an up-sampling unit and a segmentation unit which are sequentially connected from left to right;
the step of inputting the lesion image into an upper gastrointestinal tract multi-lesion segmentation model to generate a lesion region target segmentation result comprises the following steps:
inputting the lesion image into the U-Net framework, and sequentially extracting shallow features through three CNN modules;
after the shallow layer features are compressed, inputting the shallow layer features into the global feature extraction module to generate first intermediate features;
inputting the first intermediate features into the reshaping unit and the segmentation unit to generate second intermediate features;
performing up-sampling on the second intermediate feature and performing fusion and convolution operation on the second intermediate feature and a feature map obtained by down-sampling in a CNN module to obtain a fusion feature map;
and repeatedly performing the operation of obtaining the fusion feature map to obtain multi-scale fusion features, and generating a lesion area target segmentation result according to the multi-scale fusion features.
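The decoder step in the claim above (up-sampling the second intermediate feature and fusing it with the feature map obtained by down-sampling in a CNN module) can be illustrated with a toy NumPy example. Nearest-neighbour up-sampling and channel concatenation are assumptions here, since the claim does not name the exact operators:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x up-sampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fuse_with_skip(deep, skip):
    """One decoder step: up-sample the deeper feature to the skip
    feature's resolution, then fuse. Channel concatenation stands in
    for the claim's fusion-and-convolution operation."""
    up = upsample2x(deep)
    assert up.shape[1:] == skip.shape[1:]  # resolutions must match
    return np.concatenate([up, skip], axis=0)

deep = np.zeros((8, 4, 4))   # second intermediate feature (after reshaping)
skip = np.zeros((4, 8, 8))   # encoder feature from the down-sampling path
fused = fuse_with_skip(deep, skip)
# repeating this per decoder stage yields the multi-scale fusion features
```

Repeating the step across the three CNN modules, as the claim describes, would accumulate the multi-scale fusion features used for the final segmentation.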
Optionally, after the step of diagnosing according to the image anatomical region corresponding to the endoscopic image and the target segmentation result of the lesion region, the method further includes:
and displaying the part under examination according to the image anatomical part, recording the examined part and simultaneously displaying the part which is not yet examined.
Optionally, the intelligent diagnosis method further comprises:
performing frame-skipping acquisition at a preset time interval on continuous endoscopic images to generate continuous serialized image frames;
and processing the serialized image frames through the upper gastrointestinal tract multitasking classification model to obtain image anatomical parts and lesion images corresponding to the serialized image frames.
In addition, in order to achieve the above object, the present invention also proposes an intelligent diagnosis apparatus, the apparatus comprising:
the feature generation module is used for inputting an endoscopic image into the upper gastrointestinal tract multitasking classification model and generating fusion features corresponding to the endoscopic image;
the feature processing module is used for processing the fusion features through the upper gastrointestinal tract multi-task classification model to obtain an image anatomical part and a lesion image corresponding to the endoscopic image;
the result generation module is used for inputting the lesion image into an upper gastrointestinal tract multi-lesion segmentation model to generate a lesion region target segmentation result;
and the result diagnosis module is used for diagnosing according to the image anatomical part corresponding to the endoscopic image and the lesion area target segmentation result.
In addition, in order to achieve the above object, the present invention also proposes an intelligent diagnosis apparatus, the apparatus comprising: a memory, a processor, and a smart diagnostic program stored on the memory and executable on the processor, the smart diagnostic program configured to implement the steps of the smart diagnostic method as described above.
In addition, in order to achieve the above object, the present invention also proposes a storage medium having stored thereon a smart diagnostic program which, when executed by a processor, implements the steps of the smart diagnostic method as described above.
The invention discloses a multi-lesion, multi-task intelligent diagnosis method, device, equipment and medium for the upper digestive tract, comprising the following steps: inputting an upper gastrointestinal endoscopic image into an upper gastrointestinal multi-task classification model to generate fusion features corresponding to the image; processing the fusion features through the multi-task classification model to obtain the image anatomical part and a lesion image corresponding to the endoscopic image; inputting the lesion image into an upper gastrointestinal multi-lesion segmentation model to generate a lesion-region target segmentation result; and making a diagnosis according to the image anatomical part corresponding to the endoscopic image and the lesion-region target segmentation result. By combining the upper gastrointestinal multi-task classification model with the upper gastrointestinal multi-lesion segmentation model to generate and process fusion features, generating the lesion-region target segmentation result from the lesion image, and diagnosing from the image anatomical part together with the segmentation result, the method can improve diagnostic efficiency, accuracy and operation quality, and reduce missed diagnosis and misdiagnosis.
Drawings
FIG. 1 is a schematic diagram of a hardware running environment intelligent diagnosis device according to an embodiment of the present invention;
FIG. 2 is a flow chart of a first embodiment of the intelligent diagnostic method of the present invention;
FIG. 3 is a schematic diagram of an upper gastrointestinal tract multi-task classification network according to a first embodiment of the intelligent diagnosis method of the present invention;
FIG. 4 is a schematic diagram of a diagnostic process according to a first embodiment of the intelligent diagnostic method of the present invention;
FIG. 5 is a flow chart of a second embodiment of the intelligent diagnostic method of the present invention;
FIG. 6 is a schematic diagram of a global feature extraction module according to a second embodiment of the intelligent diagnosis method of the present invention;
FIG. 7 is a schematic structural diagram of a local feature extraction module according to a second embodiment of the intelligent diagnosis method of the present invention;
FIG. 8 is a schematic diagram showing the fusion of orthogonal fusion modules according to a second embodiment of the intelligent diagnosis method of the present invention;
FIG. 9 is a flow chart of a third embodiment of the intelligent diagnostic method of the present invention;
FIG. 10 is a schematic diagram of an upper gastrointestinal multi-lesion segmentation model according to a third embodiment of the intelligent diagnosis method of the present invention;
fig. 11 is a block diagram showing the construction of a first embodiment of the intelligent diagnosis apparatus according to the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of an intelligent diagnosis device of a hardware running environment according to an embodiment of the present invention.
As shown in fig. 1, the smart diagnostic device may include: a processor 1001, such as a central processing unit (Central Processing Unit, CPU), a communication bus 1002, a user interface 1003, a network interface 1004, a memory 1005. Wherein the communication bus 1002 is used to enable connected communication between these components. The user interface 1003 may include a Display, an input unit such as a Keyboard (Keyboard), and the optional user interface 1003 may further include a standard wired interface, a wireless interface. The network interface 1004 may optionally include a standard wired interface, a Wireless interface (e.g., a Wireless-Fidelity (WI-FI) interface). The Memory 1005 may be a high-speed random access Memory (Random Access Memory, RAM) or a stable nonvolatile Memory (NVM), such as a disk Memory. The memory 1005 may also optionally be a storage device separate from the processor 1001 described above.
It will be appreciated by those skilled in the art that the structure shown in fig. 1 does not constitute a limitation of the intelligent diagnostic apparatus, and may include more or fewer components than illustrated, or may combine certain components, or a different arrangement of components.
As shown in fig. 1, an operating system, a data storage module, a network communication module, a user interface module, and a smart diagnostic program may be included in the memory 1005 as one type of storage medium.
In the intelligent diagnostic apparatus shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server; the user interface 1003 is mainly used for data interaction with a user; the processor 1001 and the memory 1005 in the intelligent diagnosis device of the present invention may be provided in the intelligent diagnosis device, and the intelligent diagnosis device calls the intelligent diagnosis program stored in the memory 1005 through the processor 1001 and executes the intelligent diagnosis method provided by the embodiment of the present invention.
The embodiment of the invention provides an intelligent diagnosis method, referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of the intelligent diagnosis method of the invention.
In this embodiment, the intelligent diagnosis method includes the following steps:
step S10: and inputting the endoscopic image into an upper gastrointestinal tract multitasking classification model, and generating fusion characteristics corresponding to the endoscopic image.
It should be noted that the execution subject of the method of this embodiment may be a diagnostic device having data processing, network communication and program running functions, for example an intelligent diagnosis device; it may also be another electronic device with the same or similar functions, or an intelligent diagnosis system incorporating such a device. This embodiment and the following embodiments are described by taking the intelligent diagnosis device as the execution subject.
It will be appreciated that the endoscopic image may be an electronic endoscopic image, including a conventional white-light endoscopic image, and the detected sites include: the throat, esophagus, esophagogastric junction, gastric fundus, middle-upper gastric body on forward view, middle-upper gastric body on retroflexed view, gastric angle, lower gastric body, gastric antrum (pyloric antrum), duodenal bulb, and descending duodenum. The endoscopic image is input into the upper gastrointestinal tract multi-task classification model to generate the fusion features corresponding to the endoscopic image.
Referring to fig. 3, fig. 3 is a schematic structural diagram of the upper gastrointestinal tract multi-task classification network according to the first embodiment of the intelligent diagnosis method of the present invention. The upper gastrointestinal tract multi-task classification network can simultaneously classify anatomical sites and lesions, while solving the problems of extracting and fusing coarse-grained global features and fine-grained local features in endoscopic images; the upper gastrointestinal tract multi-lesion segmentation network can segment multi-scale lesions; and the quality control module can display the examined and unexamined regions based on the anatomical-site classification result.
Step S20: and processing the fusion characteristic through the upper gastrointestinal tract multi-task classification model to obtain an image anatomical part and a lesion image corresponding to the endoscopic image.
The upper gastrointestinal tract multi-task classification model may be a model for distinguishing the anatomical sites observed during upper gastrointestinal examination and the lesion types. The two tasks of classifying the anatomical part and classifying lesions from endoscopic images carry rich associated information, and integrating anatomical-part information helps identify lesions accurately: for example, Barrett's esophagus and reflux esophagitis usually occur at the esophagogastric junction, and intestinal metaplasia usually occurs in the gastric antrum and at the gastric angle. Based on the correlation between these two tasks and the advantages of multi-task learning, the upper gastrointestinal tract multi-task classification model can output the anatomical part of an image while detecting and distinguishing lesions.
It can be understood that identifying and judging upper gastrointestinal lesion types requires analyzing not only the global features of the endoscopic image but also its local information, diagnosing the lesion type by combining global and local features. For these diagnostic characteristics, the upper gastrointestinal tract multi-task classification model combines a Transformer-based Vision Transformer (ViT) with a convolutional neural network (CNN), fully exploiting their respective advantages in a dual-branch network to distinguish multiple anatomical parts and multiple lesions. The model shares partial parameters and comprises a CNN backbone network for extracting shallow general features; these features branch into two paths. The first path applies a Flatten operation to the features to adapt them to the ViT input and extract global features. The other path feeds the features into a local feature extraction module with attention units to extract local features. An orthogonal feature fusion module then fuses the two feature paths to obtain a more discriminative feature representation. The task layer consists of two fully connected layers trained jointly with the whole network, so that the intrinsic associated information is passed between the anatomical-part classification and lesion classification tasks.
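The exact operator inside the orthogonal feature fusion module is not spelled out in the text. One common reading of "orthogonal fusion" is to remove from the local feature its component parallel to the global feature before combining, so that the two branches contribute complementary information. A minimal NumPy sketch under that assumption (the function name, the concatenation, and the toy vectors are illustrative, not the patent's design):

```python
import numpy as np

def orthogonal_fuse(f_global, f_local, eps=1e-8):
    """Fuse two feature vectors by subtracting from the local feature
    its projection onto the global feature, then concatenating.
    This is one plausible reading of 'orthogonal feature fusion'."""
    proj = (f_local @ f_global) / (f_global @ f_global + eps) * f_global
    f_orth = f_local - proj           # component orthogonal to f_global
    return np.concatenate([f_global, f_orth])

f_g = np.array([1.0, 0.0])            # toy global feature
f_l = np.array([3.0, 4.0])            # toy local feature
fused = orthogonal_fuse(f_g, f_l)     # the retained local part is
                                      # perpendicular to the global part
```

In a trained network, `fused` would then feed the two fully connected task heads described above.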
It should be appreciated that the intelligent diagnostic apparatus processes the fusion features through the upper gastrointestinal tract multi-task classification model to obtain an image anatomical region and a lesion image corresponding to the endoscopic image.
Step S30: and inputting the lesion image into an upper gastrointestinal tract multi-lesion segmentation model to generate a lesion region target segmentation result.
It is understood that the upper gastrointestinal multi-lesion segmentation model may be a model for segmenting a lesion region in an image. The intelligent diagnosis device inputs the lesion image into the upper gastrointestinal tract multi-lesion segmentation model to generate a lesion region target segmentation result.
Step S40: and diagnosing according to the image anatomical part corresponding to the endoscopic image and the target segmentation result of the lesion area.
Referring to fig. 4, fig. 4 is a schematic diagram of a diagnostic process according to a first embodiment of the intelligent diagnostic method of the present invention. First, the upper digestive tract is observed in real time by an endoscope system. Then, the image acquisition module performs frame-skipping acquisition on the continuous endoscopic images at fixed time intervals to form continuous serialized image frames. These image frames are sequentially input into a multi-task classification model deployed on a computing server, which predicts the anatomical part each frame belongs to and the lesion class in the frame. After the part prediction is obtained, a quality control module on the computing server displays the observed parts in an upper-digestive-tract model diagram and prompts the parts not yet observed, reminding the doctor to observe all sites. After a lesion-type prediction is obtained, the lesion segmentation model performs lesion-region segmentation. Finally, all diagnostic results are displayed on a display in real time.
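The frame-skipping acquisition step can be sketched as follows. The sampling interval would be derived from the preset time gap and the video frame rate, neither of which the text specifies, so the numbers here are placeholders:

```python
def skip_frames(frames, interval):
    """Sample every `interval`-th frame from a continuous endoscopic
    stream, producing the serialized frame sequence described above."""
    return frames[::interval]

# Stand-in for decoded video frames; real input would be image arrays.
stream = list(range(10))
sampled = skip_frames(stream, 3)   # keeps frames 0, 3, 6, 9
```

Each sampled frame would then be pushed through the classification model in sequence, as the diagnostic pipeline describes.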
It can be understood that the intelligent diagnosis device inputs the lesion image into the upper gastrointestinal tract multi-lesion segmentation model to generate the lesion-region target segmentation result, and makes a diagnosis according to the image anatomical part corresponding to the endoscopic image and that segmentation result.
In a specific implementation, when training the upper gastrointestinal tract multi-task classification model, the training set may be white-light endoscopy data from upper gastrointestinal lesion patients of different ages and sexes. The images are divided into 11 labeled parts, namely the throat, esophagus, esophagogastric junction, gastric fundus, middle-upper gastric body on forward view, middle-upper gastric body on retroflexed view, gastric angle, lower gastric body, gastric antrum (pyloric antrum), duodenal bulb, and descending duodenum, to construct an upper gastrointestinal anatomical-part classification data set. Then, several experienced endoscopists further divide the images into 21 categories according to the pathology or diagnosis reports: normal throat, abnormal throat, normal esophagus, esophageal cancer, esophageal squamous intraepithelial neoplasia, Barrett's esophagus, benign esophageal tumor, reflux esophagitis, esophageal varices, other esophageal lesions, normal gastric mucosa, gastric cancer, gastric mucosal dysplasia, gastric intestinal metaplasia, gastritis, gastric polyps, gastric ulcers, other gastric lesions, normal duodenum, abnormal duodenum, and undefined, to construct an upper gastrointestinal multi-lesion classification data set. Finally, lesion regions are delineated according to the pathology or diagnosis reports to establish a multi-lesion segmentation data set.
In this embodiment, an upper gastrointestinal endoscopic image is input into the upper gastrointestinal multi-task classification model to generate fusion features corresponding to the image; the fusion features are processed through the multi-task classification model to obtain the image anatomical part and a lesion image corresponding to the endoscopic image; the lesion image is input into the upper gastrointestinal multi-lesion segmentation model to generate a lesion-region target segmentation result; and a diagnosis is made according to the image anatomical part corresponding to the endoscopic image and the lesion-region target segmentation result. By combining the multi-task classification model with the multi-lesion segmentation model to generate and process fusion features, generating the lesion-region segmentation result from the lesion image, and diagnosing from the image anatomical part together with the segmentation result, diagnostic efficiency, accuracy and operation quality can be improved, and missed diagnosis and misdiagnosis reduced.
Referring to fig. 5, fig. 5 is a schematic flow chart of a second embodiment of the intelligent diagnosis method of the present invention.
Further, based on the first embodiment, in this embodiment, the step S10 further includes:
step S101: shallow generic features are extracted from the endoscopic images through the CNN backbone network.
It should be noted that the upper gastrointestinal tract multi-task classification model includes a convolutional neural network (CNN) backbone, a global feature extraction module, a local feature extraction module and an orthogonal feature fusion module, where the CNN backbone is connected to the inputs of the global feature extraction module and the local feature extraction module, and the orthogonal feature fusion module is connected to the outputs of the global feature extraction module and the local feature extraction module, respectively.
Step S102: and inputting the shallow general features to the global feature extraction module to generate global features.
It should be noted that the general features undergo a Flatten operation to adapt to the input of the global feature extraction module, and the ViT-based global feature extraction module extracts the global features in the image. The intelligent diagnosis equipment inputs the shallow general features to the global feature extraction module to generate global features.
Further, the step S102 further includes: the global feature extraction module is used for carrying out blocking treatment on the shallow general features, flattening the blocked shallow general features and generating one-dimensional vectors; performing linear projection transformation on the one-dimensional vector, and simultaneously introducing position coding and classification marker bits into the transformed one-dimensional vector to generate a target one-dimensional vector; and generating global features corresponding to the endoscopic image according to the target one-dimensional vector.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a global feature extraction module according to a second embodiment of the intelligent diagnosis method of the present invention. Firstly, the two-dimensional image is partitioned into blocks and each image block is flattened into a one-dimensional vector; then a linear projection transformation is performed on each vector, position encoding is introduced to add the position information of the sequence, and a classification flag bit (class token) is added before the input sequence so as to better represent global information.
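The blocking, flattening, linear projection, class token and position encoding steps above can be sketched as follows (a minimal numpy illustration; all weights, the patch size and the embedding dimension are hypothetical placeholders, not the patent's trained parameters):

```python
import numpy as np

def patch_embed(feature_map, patch_size, embed_dim, rng=None):
    """Split a (C, H, W) feature map into patches, flatten each patch into a
    one-dimensional vector, apply a linear projection, then prepend a class
    token and add position encodings (hypothetical random weights)."""
    rng = np.random.default_rng(0) if rng is None else rng
    c, h, w = feature_map.shape
    p = patch_size
    n_h, n_w = h // p, w // p
    # Blocking: cut the map into n_h * n_w patches, flatten each to c*p*p.
    patches = (feature_map[:, :n_h * p, :n_w * p]
               .reshape(c, n_h, p, n_w, p)
               .transpose(1, 3, 0, 2, 4)
               .reshape(n_h * n_w, c * p * p))
    # Linear projection transformation into the embedding dimension.
    w_proj = rng.standard_normal((c * p * p, embed_dim)) * 0.02
    tokens = patches @ w_proj
    # Classification flag bit (class token) prepended to the sequence.
    cls_token = np.zeros((1, embed_dim))
    tokens = np.concatenate([cls_token, tokens], axis=0)
    # Position encoding adds the position information of the sequence.
    pos_enc = rng.standard_normal(tokens.shape) * 0.02
    return tokens + pos_enc

feats = np.random.default_rng(1).standard_normal((64, 16, 16))
seq = patch_embed(feats, patch_size=4, embed_dim=128)
print(seq.shape)  # (17, 128): 16 patches plus one class token
```

The resulting token sequence is what the VIT encoder of the global feature extraction module would consume.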
Step S103: and inputting the shallow general features to the local feature extraction module to generate local features.
Further, the step S103 further includes: inputting the shallow general features into the atrous spatial pyramid pooling module to generate intermediate general features, where the intermediate general features are shallow general features that retain multi-scale context information; inputting the intermediate general features into the channel attention unit to acquire the dependency relationships among channels, acquiring the spatial pixel-level relationships through the spatial attention unit, and generating local features based on the dependency relationships and the pixel-level relationships.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a local feature extraction module according to a second embodiment of the intelligent diagnosis method of the present invention. The local feature extraction module includes an atrous spatial pyramid pooling module and a hybrid attention module, and the hybrid attention module includes a channel attention unit and a spatial attention unit.
It can be appreciated that the core of CNN is the convolution kernel, which has inductive biases such as translation invariance and local sensitivity and can capture local spatio-temporal information. In addition, attention information can be fused into the image to capture finer feature expressions; therefore, an attention CNN is taken as the basic mechanism to construct the local feature extraction module. Meanwhile, the various upper gastrointestinal lesions exhibit fine-grained characteristics, with small differences between different categories and large differences within the same category, and lesion scales span a wide range with multiple scales coexisting. For this reason, an atrous spatial pyramid pooling (ASPP) module and a hybrid attention module (HAM) are designed in the local feature extraction module: ASPP is used to retain the multi-scale context information, and HAM, a combination of a channel attention unit and a spatial attention unit, is used to locate and analyze highly discriminative fine-grained local information. The channel attention unit and the spatial attention unit capture the dependency relationships between channels and the pixel-level relationships in space, respectively; using the two attention mechanisms together achieves a better effect.
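The channel and spatial attention units can be illustrated with a minimal sketch (a simplified squeeze-and-excitation-style gating in numpy; the patent's actual attention units and the ASPP branch are more elaborate and are not reproduced here):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x):
    # Dependency between channels: one gate per channel from its global mean.
    gate = sigmoid(x.mean(axis=(1, 2)))
    return x * gate[:, None, None]

def spatial_attention(x):
    # Pixel-level relationship in space: one gate per pixel position.
    gate = sigmoid(x.mean(axis=0))
    return x * gate[None, :, :]

def hybrid_attention(x):
    # Channel attention followed by spatial attention, echoing the HAM idea.
    return spatial_attention(channel_attention(x))

x = np.random.default_rng(0).standard_normal((8, 5, 5))
out = hybrid_attention(x)
print(out.shape)  # (8, 5, 5): same shape, feature values re-weighted
```

Both gates leave the feature map's shape unchanged and only re-weight its values, which is what lets the unit be dropped into the middle of the extraction pipeline.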
Step S104: and inputting the global features and the local features to the orthogonal feature fusion module to generate fusion features corresponding to the endoscopic images.
Referring to fig. 8, fig. 8 is a fusion schematic diagram of an orthogonal feature fusion module according to a second embodiment of the intelligent diagnosis method of the present invention. When an endoscopist identifies and determines the lesion type of the upper digestive tract, the endoscopist needs to analyze not only the global features of the endoscopic image, such as the anatomical part of the upper digestive tract represented by the image and information such as the lesion occurrence part and the shape and color of the lesion, but also the local information of the image, such as the morphology of capillary loops in the esophageal papillae in the lesion area for identifying early esophageal squamous cell carcinoma; finally, the endoscopist synthesizes the global features and the local features to diagnose the lesion category. The orthogonal feature fusion module fuses the global features and the local features to form a feature expression that is more beneficial to improving classification performance. The workflow is shown in fig. 8 (a): the global feature g and the local features are taken as input, and the projection of each local feature point l(i, j) onto the global feature is calculated, where (i, j) represents the pixel coordinates of the feature point. Mathematically, the projection can be expressed as: proj(l(i, j)) = ((l(i, j) · g) / ‖g‖²) g.
Here ‖g‖ denotes the L2 norm used in the dot-product operation, and the calculation formula is as follows: ‖g‖ = (g · g)^(1/2).
as shown in fig. 8 (b), the orthogonal component is the difference between the local feature and its projection vector, and thus the component orthogonal to it can be obtained by:
it should be appreciated that a tensor may be extracted in which each point is orthogonal. Then, a vector is added to each point of this tensor, and then this new tensor is aggregated into a vector. Finally, a full connection layer is used for generating a new feature vector of dimensions, and the vector simultaneously aggregates global features and local features, accords with the diagnosis thought of an endoscopist and is beneficial to improving the classification accuracy of lesions.
In this embodiment, shallow general features are extracted from the endoscopic image through the CNN backbone network; the shallow general features are input to the global feature extraction module to generate global features; the shallow general features are input to the local feature extraction module to generate local features; and the global features and the local features are input to the orthogonal feature fusion module to generate fusion features corresponding to the endoscopic image. In this manner, the shallow general features are extracted, global features and local features are generated from them, and the global features and the local features are fused to generate the fusion features, so that the accuracy of lesion classification can be improved.
Referring to fig. 9, fig. 9 is a schematic flow chart of a third embodiment of the intelligent diagnosis method according to the present invention.
Further, based on the first embodiment, in this embodiment, the step S30 further includes:
step S301: and inputting the lesion image into the U-Net framework, and sequentially extracting shallow features through three CNN modules.
The upper gastrointestinal tract multi-lesion segmentation model includes: a U-Net framework, a global feature extraction module and an operation module, where the U-Net framework includes three CNN modules, and the operation module includes a remodelling unit, an up-sampling unit and a segmentation unit which are sequentially connected from left to right.
Step S302: and after the shallow features are compressed, inputting the shallow features into the global feature extraction module to generate first intermediate features.
Step S303: and inputting the first intermediate features into a remodelling unit and a segmentation unit to generate second intermediate features.
It can be appreciated that the intelligent diagnosis device performs a compression operation on the shallow features and inputs them to the global feature extraction module to generate first intermediate features, and then inputs the first intermediate features into the remodelling unit and the segmentation unit to generate second intermediate features.
Step S304: and carrying out up-sampling on the second intermediate feature and carrying out fusion and convolution operation on the second intermediate feature and the feature map obtained by down-sampling in the CNN module to obtain a fusion feature map.
Step S305: and repeatedly performing the operation of obtaining the fusion feature map to obtain multi-scale fusion features, and generating a lesion area target segmentation result according to the multi-scale fusion features.
Referring to fig. 10, fig. 10 is a schematic structural diagram of an upper gastrointestinal multi-lesion segmentation model according to a third embodiment of the intelligent diagnosis method of the present invention. In order to construct a complete real-time intelligent diagnosis system for upper gastrointestinal tract lesions, on the basis of upper gastrointestinal tract lesion classification, the invention provides an effective segmentation network for all upper gastrointestinal tract lesions based on the visual differences between lesion areas and normal gastrointestinal mucosa, thereby forming a multi-task intelligent diagnosis system. Compared with CNN, the self-attention mechanism of VIT is not limited to local interactions, and offers strong global feature extraction capability and parallel computation. Furthermore, in various medical image segmentation tasks, the U-shaped architecture built with CNN (also referred to as U-Net) has become a standard in segmentation system design and has achieved great success. In view of this, the invention exploits the VIT and U-Net architectures to develop a multi-scale feature fusion network to improve the segmentation effect of the model on multi-scale lesions, especially small lesions.
It can be understood that the network first adopts three CNN modules in turn to extract shallow features common to images, such as edges and colors, and then performs a flattening operation on the features to adapt them to the input of the VIT global feature extraction module, whose network structure is consistent with that of the global feature extraction module in the multi-classification network. Reshape and convolution operations are then performed on the features output by the VIT module. Next, the feature map from the previous step is upsampled (Upsampling) and subjected to fusion (Feature Concatenation) and convolution operations with the feature map obtained by downsampling (Downsampling) in the CNN module, so as to obtain the fused feature map. The previous step is repeated to obtain feature maps at multiple scales, which are further upsampled to the same size (scale) for multi-scale feature fusion. Finally, the multi-scale fused features are input to a segmentation head (Segmentation Head) for pixel-level prediction to obtain the final segmentation result.
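The upsampling-and-concatenation fusion step of the decoder can be sketched as follows (nearest-neighbour upsampling in numpy; the channel counts and the omitted follow-up convolution are assumptions for illustration):

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x upsampling of a (C, H, W) feature map.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fuse_with_skip(decoder_feat, skip_feat):
    """Upsample the deeper decoder feature map and concatenate it
    channel-wise with the same-resolution encoder (skip) feature map;
    the follow-up convolution is omitted in this sketch."""
    up = upsample2x(decoder_feat)
    assert up.shape[1:] == skip_feat.shape[1:], "spatial sizes must match"
    return np.concatenate([up, skip_feat], axis=0)

rng = np.random.default_rng(0)
deep = rng.standard_normal((32, 8, 8))    # deeper, lower-resolution features
skip = rng.standard_normal((16, 16, 16))  # encoder features at 2x resolution
fused = fuse_with_skip(deep, skip)
print(fused.shape)  # (48, 16, 16): channels concatenated after upsampling
```

Repeating this step at each decoder level is what yields the multiple feature-map scales that are later upsampled to a common size for the final multi-scale fusion.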
In this embodiment, the lesion image is input to the U-Net architecture, and shallow features are extracted sequentially through three CNN modules; after the shallow layer features are compressed, inputting the shallow layer features into the global feature extraction module to generate first intermediate features; inputting the first intermediate features into a remodelling unit and a segmentation unit to generate second intermediate features; performing up-sampling on the second intermediate feature and performing fusion and convolution operation on the second intermediate feature and a feature map obtained by down-sampling in a CNN module to obtain a fusion feature map; and repeatedly performing the operation of obtaining the fusion feature map to obtain multi-scale fusion features, and generating a lesion area target segmentation result according to the multi-scale fusion features. By the mode, the obtained target segmentation result of the lesion area has better fineness.
In addition, an embodiment of the present invention further provides a storage medium, where a smart diagnosis program is stored on the storage medium, and the smart diagnosis program, when executed by a processor, implements the steps of the smart diagnosis method described above.
Referring to fig. 11, fig. 11 is a block diagram showing the construction of a first embodiment of the intelligent diagnosis apparatus according to the present invention.
As shown in fig. 11, an intelligent diagnosis apparatus according to an embodiment of the present invention includes: a feature generation module 501, a feature processing module 502, a result generation module 503, and a result diagnosis module 504.
The feature generation module 501 is configured to input an endoscopic image into an upper gastrointestinal tract multitasking classification model, and generate a fusion feature corresponding to the endoscopic image;
the feature processing module 502 is configured to process the fusion feature through the upper gastrointestinal tract multi-task classification model, and obtain an image anatomical part and a lesion image corresponding to the endoscopic image;
the result generation module 503 is configured to input the lesion image into an upper gastrointestinal tract multi-lesion segmentation model, and generate a lesion region target segmentation result;
the result diagnosis module 504 is configured to perform diagnosis according to the image anatomical region corresponding to the endoscopic image and the target segmentation result of the lesion region.
In this embodiment, an upper gastrointestinal endoscopy image is input into an upper gastrointestinal multitasking classification model to generate fusion features corresponding to the upper gastrointestinal endoscopy image; the fusion features are processed through the upper gastrointestinal tract multitask classification model to obtain the image anatomical part and the lesion image corresponding to the upper gastrointestinal endoscopy image; the lesion image is input into an upper gastrointestinal tract multi-lesion segmentation model to generate a lesion region target segmentation result; and diagnosis is performed according to the image anatomical part corresponding to the upper gastrointestinal endoscopy image and the lesion region target segmentation result. In this manner, the upper gastrointestinal tract multitasking classification model and the upper gastrointestinal tract multi-lesion segmentation model are combined: the fusion features are generated and processed, the lesion region target segmentation result is generated from the lesion image, and diagnosis is performed according to the image anatomical part and the lesion region target segmentation result, which can improve diagnosis efficiency, accuracy and operation quality, and reduce missed diagnosis and misdiagnosis.
Other embodiments or specific implementation manners of the intelligent diagnosis device of the present invention may refer to the above method embodiments, and are not described herein.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. read-only memory/random-access memory, magnetic disk, optical disk), comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (10)

1. An intelligent diagnosis method for multiple lesions of an upper digestive tract is characterized by comprising the following steps of:
inputting an endoscopic image into an upper gastrointestinal tract multitasking classification model, and generating fusion characteristics corresponding to the endoscopic image;
processing the fusion characteristics through the upper gastrointestinal tract multi-task classification model to obtain an image anatomical part and a lesion image corresponding to the endoscopic image;
inputting the lesion image into an upper gastrointestinal tract multi-lesion segmentation model to generate a lesion region target segmentation result;
and diagnosing according to the image anatomical part corresponding to the endoscopic image and the target segmentation result of the lesion area.
2. The intelligent diagnostic method of claim 1, wherein the upper gastrointestinal tract multi-task classification model comprises a CNN backbone network, a global feature extraction module, a local feature extraction module, and an orthogonal feature fusion module, the CNN backbone network being connected to inputs of the global feature extraction module and the local feature extraction module, respectively, the orthogonal feature fusion module being connected to outputs of the global feature extraction module and the local feature extraction module, respectively;
the step of inputting the endoscopic image into the upper gastrointestinal tract multitasking classification model and generating the fusion characteristic corresponding to the endoscopic image comprises the following steps:
extracting shallow general features from an endoscopic image through the CNN backbone network;
inputting the shallow general features to the global feature extraction module to generate global features;
inputting the shallow general features to the local feature extraction module to generate local features;
and inputting the global features and the local features to the orthogonal feature fusion module to generate fusion features corresponding to the endoscopic images.
3. The intelligent diagnostic method of claim 2, wherein the local feature extraction module comprises an atrous spatial pyramid pooling module and a hybrid attention module, the hybrid attention module comprising a channel attention unit and a spatial attention unit;
the step of inputting the shallow generic feature to the local feature extraction module to generate a local feature includes:
inputting the shallow general features into the atrous spatial pyramid pooling module to generate intermediate general features, wherein the intermediate general features are shallow general features which retain multi-scale context information;
inputting the intermediate general features into the channel attention unit to acquire the dependency relationships among channels, acquiring the spatial pixel-level relationships through the spatial attention unit, and generating local features based on the dependency relationships and the pixel-level relationships.
4. The intelligent diagnostic method of claim 2, wherein the step of inputting the shallow general features to the global feature extraction module to generate global features comprises:
the global feature extraction module is used for performing block partitioning on the shallow general features, flattening the partitioned shallow general features and generating one-dimensional vectors;
performing linear projection transformation on the one-dimensional vector, and introducing position encoding and a classification flag bit into the transformed one-dimensional vector to generate a target one-dimensional vector;
and generating global features corresponding to the endoscopic image according to the target one-dimensional vector.
5. The intelligent diagnostic method of any one of claims 1-4, wherein the upper gastrointestinal tract multi-lesion segmentation model comprises: a U-Net framework, a global feature extraction module and an operation module, wherein the U-Net framework comprises three CNN modules, and the operation module comprises a remodelling unit, an up-sampling unit and a segmentation unit which are sequentially connected from left to right;
the step of inputting the lesion image into an upper gastrointestinal tract multi-lesion segmentation model to generate a lesion region target segmentation result comprises the following steps:
inputting the lesion image into the U-Net framework, and sequentially extracting shallow features through three CNN modules;
after the shallow layer features are compressed, inputting the shallow layer features into the global feature extraction module to generate first intermediate features;
inputting the first intermediate features into a remodelling unit and a segmentation unit to generate second intermediate features;
performing up-sampling on the second intermediate feature and performing fusion and convolution operation on the second intermediate feature and a feature map obtained by down-sampling in a CNN module to obtain a fusion feature map;
and repeatedly performing the operation of obtaining the fusion feature map to obtain multi-scale fusion features, and generating a lesion area target segmentation result according to the multi-scale fusion features.
6. The intelligent diagnosis method according to any one of claims 1 to 4, wherein after the step of diagnosing based on the image anatomy corresponding to the endoscopic image and the lesion region target segmentation result, further comprising:
and displaying the part under examination according to the image anatomical part, recording the examined part and simultaneously displaying the part which is not yet examined.
7. The intelligent diagnostic method of any one of claims 1-4, wherein the intelligent diagnostic method further comprises:
performing frame skipping acquisition of a preset time interval on continuous endoscopic images to generate continuous serialized image frames;
and processing the serialized image frames through the upper gastrointestinal tract multitasking classification model to obtain image anatomical parts and lesion images corresponding to the serialized image frames.
8. An intelligent diagnostic apparatus, characterized in that the apparatus comprises:
the feature generation module is used for inputting an endoscopic image into the upper gastrointestinal tract multitasking classification model and generating fusion features corresponding to the endoscopic image;
the feature processing module is used for processing the fusion features through the upper gastrointestinal tract multi-task classification model to obtain an image anatomical part and a lesion image corresponding to the endoscopic image;
the result generation module is used for inputting the lesion image into an upper gastrointestinal tract multi-lesion segmentation model to generate a lesion region target segmentation result;
and the result diagnosis module is used for diagnosing according to the image anatomical part corresponding to the endoscopic image and the lesion area target segmentation result.
9. An intelligent diagnostic apparatus, characterized in that the apparatus comprises: a memory, a processor and a smart diagnostic program stored on the memory and executable on the processor, the smart diagnostic program being configured to implement the steps of the smart diagnostic method of any one of claims 1 to 7.
10. A storage medium having stored thereon a smart diagnostic program which, when executed by a processor, implements the steps of the smart diagnostic method of any one of claims 1 to 7.
CN202311840684.1A 2023-12-28 2023-12-28 Multi-lesion multi-task intelligent diagnosis method, device, equipment and medium for upper digestive tract Pending CN117854705A (en)

Publication of CN117854705A: 2024-04-09 (legal status: pending; request for substantive examination entered into force).