CN113096131A - Gastroscope picture multi-label classification system based on VIT network


Info

Publication number
CN113096131A
CN113096131A (application CN202110640686.0A)
Authority
CN
China
Prior art keywords
classification
layer
vit
network
classifier
Prior art date
Legal status
Pending
Application number
CN202110640686.0A
Other languages
Chinese (zh)
Inventor
戴捷
李亮
Current Assignee
Zidong Information Technology Suzhou Co ltd
Original Assignee
Zidong Information Technology Suzhou Co ltd
Priority date
Filing date
Publication date
Application filed by Zidong Information Technology Suzhou Co ltd filed Critical Zidong Information Technology Suzhou Co ltd
Priority to CN202110640686.0A priority Critical patent/CN113096131A/en
Publication of CN113096131A publication Critical patent/CN113096131A/en
Pending legal-status Critical Current


Classifications

    • G06T 7/0012 Biomedical image inspection (G06T 7/00 Image analysis; G06T 7/0002 Inspection of images, e.g. flaw detection)
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N 3/045 Combinations of networks (G06N 3/04 Architecture, e.g. interconnection topology)
    • G06N 3/048 Activation functions
    • G06N 3/08 Learning methods
    • G06T 2207/10068 Endoscopic image (G06T 2207/10 Image acquisition modality)
    • G06T 2207/20076 Probabilistic image processing (G06T 2207/20 Special algorithmic details)
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30092 Stomach; Gastric (G06T 2207/30004 Biomedical image processing)


Abstract

The application relates to a gastroscope picture multi-label classification system based on a VIT network, belonging to the technical field of intelligent medical image processing. The system comprises a sample processing module for preprocessing sample pictures; a model building module for building a preset network model based on the VIT network; a model training module for inputting the processed sample pictures into the preset network model and training it with a preset error function; and a classification module for setting an output threshold of the classification model and performing multi-label classification on input gastroscope pictures. The VIT network and the output threshold allow the multi-label classification result of a gastroscope picture to be output early, improving the multi-label classification speed of gastroscope pictures.

Description

Gastroscope picture multi-label classification system based on VIT network
Technical Field
The invention relates to intelligent medical image processing, and in particular to a gastroscope picture multi-label classification system based on a VIT network.
Background
Gastric diseases are among the most common diseases in China and, if improperly diagnosed and treated, can develop into gastric cancer. At present, the age of onset of gastric diseases is trending younger, owing to changes in dietary structure and other factors. Because gastroscopy is effective in diagnosing gastric diseases, it has become the main diagnostic method: the pathological tissue area in the stomach can be examined directly to make the corresponding diagnosis, and tissue biopsies can be taken under gastroscopy, which plays an important role in diagnosing early precancerous diseases or precancerous lesions and in distinguishing benign from malignant ulcers. Gastroscopy therefore has clear advantages, but human factors such as uneven physician experience, or special circumstances such as negligence, can directly affect the final gastric cancer diagnosis, and examining gastroscope pictures by eye consumes a great deal of time.
Although artificial-intelligence image recognition has already been applied to the analysis of gastroscope pictures, the various gastric diseases that appear in such pictures are highly similar, so recognition must reach high accuracy to prevent misdiagnosis and missed diagnosis, and the same gastroscope picture usually has to be given multi-label classification so that several different types of gastric disease can be labelled at once. To achieve this, existing image recognition techniques generally improve accuracy by stacking many layers of neural networks, which greatly increases the computation of the recognition system, noticeably slows gastroscope picture recognition, and makes them unsuitable for lesion recognition on large numbers of gastroscope pictures.
Therefore, a technical solution is needed that can analyze large numbers of gastroscope pictures using artificial-intelligence image recognition and further improve recognition speed while still meeting the accuracy and multi-label classification requirements of gastroscope lesion recognition.
Disclosure of Invention
The object of the present application is to solve the above technical problems. The application provides a VIT network-based gastroscope picture multi-label classification system which, by building a neural network model based on the VIT network together with a branch classification-result output rule, can output the multi-label classification results of gastroscope pictures early and thereby improve multi-label classification speed. The application provides the following technical scheme:
A VIT network-based gastroscope picture multi-label classification system, comprising:
the sample processing module is used for preprocessing the sample picture to obtain a processed sample picture;
the model building module is used for building a preset network model based on the VIT network, wherein the preset network model comprises a trunk part and a branch part, the trunk part comprises a plurality of layers of VIT networks and a trunk classifier, the branch part comprises a branch classifier added at the output position of each VIT layer of the trunk part except the last layer, and both the trunk classifier and the branch classifiers can perform multi-label classification;
the model training module is used for inputting the processed sample picture into the preset network model and training it with a preset error function to obtain a classification model;
and the classification module is used for setting an output threshold for the classification model to obtain the set classification model, wherein the output threshold controls early output of the classification result, and the set classification model is used for performing multi-label classification on the input gastroscope picture.
Optionally, the trunk classifier and the branch classifiers each consist of one or more of a convolutional layer, a pooling layer, a VIT layer and a fully-connected layer, and the number of fully-connected operations is consistent with the total number of classification categories.
Optionally, the output threshold controlling early output of the classification result comprises: when data passes through a branch classifier, calculating the classification results of all classification labels and the uncertainty values of the classification results; when the uncertainty values of the classification results of all classification labels are lower than the output threshold, outputting the classification results and stopping execution; otherwise, continuing to execute the lower-layer VIT network and its branch classifier.
Optionally, the gastroscope sample picture preprocessing method comprises scaling and cropping, random horizontal flipping, and normalization, or any combination thereof.
Optionally, the method by which the trunk classifier and the branch classifiers each perform multi-label classification comprises:
extracting gastroscope sample picture features using the convolutional layer and the pooling layer;
copying the sample picture features into C copies, where C is the total number of categories;
capturing the dependency among disease categories in the gastroscope sample picture using the VIT layer and outputting C category features;
and decoding the category features separately with the C fully-connected operations of the fully-connected layer, and outputting C category predictions for the gastroscope sample picture.
Optionally, the preset error function includes an error function of a training trunk portion and an error function of a training branch portion.
Optionally, the error function of the training trunk portion is a cross entropy function, and the error function of the training branch portion is:
$$\mathcal{L}_{branch} = \frac{1}{L-1}\sum_{i=1}^{L-1} D_{KL}\!\left(p_i^{\,j}\,\big\|\,q^{\,j}\right)$$
where $p_i^{\,j}$ is the classification result of the i-th branch classifier, $q^{\,j}$ is the classification result of the trunk classifier, j is the training iteration, $D_{KL}$ is the divergence function, and L is the number of all classifiers.
Optionally, in the VIT network, each layer comprises a multi-head attention layer for obtaining sequence features, a feed-forward propagation layer for nonlinear transformation, and two addition-normalization layers for normalization; an addition-normalization layer is constructed in front of each of the multi-head attention layer and the feed-forward propagation layer of each layer, the feed-forward propagation layer is located after the multi-head attention layer, and the output of each layer of the VIT network is used directly as the input of the next layer of the VIT network.
Optionally, the classification-result uncertainty value is derived by the following equation:
$$U = -\frac{1}{N}\sum_{i=1}^{N} p_i \log p_i$$
where N is the total number of classifiers and $p_i$ is the classification result of the i-th branch classifier.
The beneficial effects of the present application include at least the following. The VIT network-based gastroscope picture multi-label classification system provides a new neural network framework for gastroscope picture multi-label classification; it introduces the VIT network, which is well suited to global feature extraction and therefore to the gastroscope picture multi-label classification task, and it builds a branch classification-result output rule by setting an output threshold, so that a classification result can be output early from a branch structure during classification, recognition and calculation on a gastroscope picture, improving the speed of multi-label picture classification. In actual tests, compared with existing neural-network gastroscope picture classification models, the system performs multi-label classification of gastroscope pictures faster while maintaining accuracy.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.
Drawings
The present application may be better understood by describing exemplary embodiments thereof in conjunction with the following drawings, wherein:
fig. 1 is a block diagram of a VIT network-based gastroscopic image multi-label classification system according to the present application.
Fig. 2 is a block diagram of a VIT network-based gastroscopic image multi-label classification network model according to the present application.
Fig. 3 is a schematic diagram illustrating model training of a VIT network-based gastroscope image multi-label classification network model according to the present application.
Detailed Description
The present invention will be described in further detail with reference to the following examples and the accompanying drawings so that those skilled in the art can practice the invention with reference to the description.
It is noted that, in the detailed description of these embodiments, not all features of an actual implementation are described, in order to keep the description concise. It should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
VIT (Vision Transformer) is a neural network that applies the Transformer to the field of image classification. Transformers were previously applied in natural language processing. Through VIT, the advantages of the Transformer can be brought to image recognition: for example, the self-attention mechanism of the Transformer overcomes the fixed, limited receptive field of conventional convolutional networks during feature extraction and can obtain feature information over a wider range. The present application builds the gastroscope picture multi-label classification model by introducing VIT, which better extracts the global features of a gastroscope picture and is more favourable for analysing the characteristics of different gastric diseases and performing multi-label classification.
A general VIT network layer consists of four main parts: a multi-head attention layer for acquiring sequence features, a feed-forward propagation layer for nonlinear transformation, and two addition-normalization layers for normalization. An addition-normalization layer is constructed at the output of the multi-head attention layer and of the feed-forward propagation layer of each layer, with the feed-forward propagation layer located after the multi-head attention layer.
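As a concrete illustration of the layer structure just described, the following PyTorch sketch shows one possible VIT encoder layer with a multi-head attention layer, a feed-forward propagation layer and two addition-normalization (residual plus LayerNorm) steps. The embedding dimension, head count, hidden width and dropout rate are illustrative assumptions and are not specified by the application.

```python
import torch
import torch.nn as nn

class VITEncoderLayer(nn.Module):
    """One VIT layer: multi-head attention for sequence features, a feed-forward
    layer for nonlinear transformation, and two addition-normalization layers."""

    def __init__(self, dim=768, num_heads=12, mlp_dim=3072, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, dropout=dropout, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, mlp_dim), nn.GELU(), nn.Dropout(dropout), nn.Linear(mlp_dim, dim))
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        x = self.norm1(x + attn_out)      # addition-normalization after attention
        x = self.norm2(x + self.ffn(x))   # addition-normalization after feed-forward
        return x

# The output of each layer feeds the next layer directly, e.g.:
# tokens = torch.randn(1, 577, 768)      # 24 x 24 patch tokens plus a class token
# for layer in [VITEncoderLayer() for _ in range(12)]:
#     tokens = layer(tokens)
```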
Fig. 1 is a block diagram of a gastroscopic image classification system based on a multi-branch neural network model provided in an embodiment of the present application. The system at least comprises the following modules:
The sample processing module 110 is configured to preprocess the sample picture to obtain a processed sample picture.
First, to better extract gastroscope picture features, the gastroscope sample pictures are preprocessed. Preprocessing comprises scaling and cropping, random horizontal flipping, and normalization, or any combination thereof. Scaling and cropping processes the input picture to a fixed size. Normalization subtracts the statistical average of the corresponding dimension of the data from the RGB channels of the images, eliminating the common part and highlighting the features and differences between individual samples. Random horizontal flipping is used for data augmentation to improve the generalization ability of the model. This embodiment does not limit the scaling and cropping values of the picture.
Illustratively, the different input pictures are scaled to a size of 640 × 640 × 3, then cropped to 384 × 384 × 3, removing the redundant black parts at the four corners of the picture, and finally data normalization is performed to obtain the features of the final input image.
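For example, the preprocessing above could be sketched with torchvision transforms as below. The application only specifies scaling, cropping away the black corners, random horizontal flipping and subtracting per-channel averages; the center crop and the ImageNet-style mean and standard deviation are assumptions of this sketch.

```python
import torchvision.transforms as T

preprocess = T.Compose([
    T.Resize((640, 640)),              # scale the input picture to a fixed size
    T.CenterCrop(384),                 # crop away the redundant black corners
    T.RandomHorizontalFlip(p=0.5),     # data augmentation for better generalization
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],   # assumed per-channel statistics
                std=[0.229, 0.224, 0.225]),
])

# usage: x = preprocess(pil_image)     # -> 3 x 384 x 384 tensor
```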
The model building module 120 is used to build a preset network model based on the VIT network. The preset network model comprises a trunk part and a branch part; the trunk part comprises a plurality of layers of VIT networks and a trunk classifier; the branch part comprises a branch classifier added at the output position of each VIT layer of the trunk part except the last layer; and both the trunk classifier and the branch classifiers can perform multi-label classification.
Referring to fig. 2, the VIT network-based neural network model comprises a trunk part and branch parts. The trunk part is mainly composed of multiple layers of VIT networks; the output of each VIT layer is used directly as the input of the next VIT layer, and the output of the last VIT layer is connected to the trunk classifier. The number of VIT layers is not limited in this embodiment. The branch part is formed by adding a branch classifier at the output position of each VIT layer of the trunk except the last layer. Illustratively, in a multi-branch neural network model whose trunk is built from a 12-layer VIT network, there are 11 branch classifiers and 1 trunk classifier.
The trunk classifier and the branch classifiers are each composed of one or more of a convolutional layer, a pooling layer, a VIT layer and a fully-connected layer, and the number of fully-connected operations is consistent with the total number of classification categories. Illustratively, each branch classifier may consist of 7 layers: the first 4 layers are convolutional layers, the 5th is a pooling layer, the 6th is a VIT layer, and the 7th is a fully-connected layer; when 20 categories are to be output, the fully-connected layer has 20 fully-connected operations and outputs 20 classification results. This embodiment does not limit the specific structure of the classifiers.
The classifier extracts gastroscope picture features with the convolutional layers and the pooling layer, and before the VIT layer it copies the picture features once per category (for example, 20 copies when there are 20 categories). The VIT layer then captures the dependency among gastric disease categories and outputs 20 category features, and the 20 fully-connected operations decode the category features separately and output 20 category predictions for the gastroscope picture. Specifically, each fully-connected operation in the last, fully-connected layer of each classifier contains 2 neurons, and the 2 results are the distribution probabilities, i.e. the probability of a normal stomach picture and the probability of the classified gastric disease, P = [p_1, p_2], where p_1 is the probability that the picture does not show the classified gastric disease, p_2 is the probability that it does, and p_1 + p_2 = 1. If p_1 ≥ p_2, the model judges the picture as not showing the classified gastric disease; if p_1 < p_2, the model judges it as showing the classified gastric disease. This embodiment does not limit the specific number of categories.
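A possible structure for a single classifier following the 7-layer example above (4 convolutional layers, a pooling layer, a VIT layer, and a fully-connected layer with one 2-neuron operation per category) is sketched below. Folding the patch tokens back into a square feature map, the channel widths, and the reuse of the `VITEncoderLayer` sketch above are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class ClassifierHead(nn.Module):
    """Per-category classifier: convolution + pooling extract picture features,
    the features are copied C times, a VIT layer captures the dependency among
    the C disease categories, and C independent 2-neuron fully-connected
    operations output [p1, p2] per category with p1 + p2 = 1."""

    def __init__(self, dim=768, num_classes=20, grid=24):
        super().__init__()
        self.grid = grid
        self.convs = nn.Sequential(                 # 4 convolutional layers
            nn.Conv2d(dim, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU())
        self.pool = nn.AdaptiveAvgPool2d(1)         # pooling layer
        self.vit_layer = VITEncoderLayer(dim=256, num_heads=4, mlp_dim=512)
        self.fcs = nn.ModuleList([nn.Linear(256, 2) for _ in range(num_classes)])

    def forward(self, tokens):
        # drop the class token and fold the patch tokens back into a 2-D map
        patch = tokens[:, 1:, :].transpose(1, 2)
        fmap = patch.reshape(patch.size(0), -1, self.grid, self.grid)
        feat = self.pool(self.convs(fmap)).flatten(1)               # B x 256
        per_class = feat.unsqueeze(1).repeat(1, len(self.fcs), 1)   # copy once per category
        per_class = self.vit_layer(per_class)                       # inter-category dependency
        probs = [torch.softmax(fc(per_class[:, i]), dim=-1) for i, fc in enumerate(self.fcs)]
        return torch.stack(probs, dim=1)                            # B x C x 2
```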
The model training module 130 is configured to input the processed sample picture into the preset network model and train it with a preset error function to obtain a classification model.
First, the cross-entropy error function used for training the trunk part is constructed. Training uses labelled task data: the classification result is obtained through the trunk classifier, the loss between the classification result and the real labels of the samples is computed with the constructed cross-entropy error function, and the whole network is fitted to the real distribution of the pictures. After a certain number of iterations, the model weight parameters giving the best results are saved.
Then, a relative entropy error function used for training the branch part is constructed, and the specific formula is as follows:
$$\mathcal{L}_{branch} = \frac{1}{L-1}\sum_{i=1}^{L-1} D_{KL}\!\left(p_i^{\,j}\,\big\|\,q^{\,j}\right)$$
where $p_i^{\,j}$ is the classification result of the i-th branch classifier, $q^{\,j}$ is the classification result of the trunk classifier, j is the training iteration, $D_{KL}$ is the divergence function, and L is the number of all classifiers.
Training then uses unlabelled task data. Referring to fig. 3, the loss between the classification result of the trunk classifier and the classification result of each branch classifier is computed with the constructed relative-entropy error function, so that the branch part fits the result of the trunk part, and the branch weight parameters giving the best results are saved.
Optionally, in the training process the batch size is set to 4, the initial learning rate for training the trunk part is 0.00001, and the initial learning rate for training the branch part is 0.0001. In other embodiments the hyper-parameters used during model training may differ, and the batch size and initial learning rates may take other values; this embodiment does not limit the value of any parameter used in training.
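The two training stages described above, cross-entropy for the trunk on labelled data and relative entropy between branch and trunk outputs on unlabelled data, could look roughly as follows. The exact loss formulas, the clamping constant and the use of Adam are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def trunk_loss(trunk_out, labels):
    """Cross-entropy between the trunk classifier output and the real labels.
    trunk_out: B x C x 2 per-label probabilities; labels: B x C with 0/1 entries."""
    log_probs = torch.log(trunk_out.clamp_min(1e-8))
    return F.nll_loss(log_probs.flatten(0, 1), labels.flatten().long())

def branch_loss(branch_outs, trunk_out):
    """Relative-entropy (KL) loss that fits each branch classifier's output to
    the trunk classifier's output, averaged over the branch classifiers."""
    target = trunk_out.detach()                    # the branches fit the trunk result
    losses = [F.kl_div(torch.log(b.clamp_min(1e-8)), target, reduction='batchmean')
              for b in branch_outs]
    return sum(losses) / len(losses)

# Two-stage training as described above:
#   stage 1: train the trunk on labelled data, batch size 4, initial lr 0.00001
#   stage 2: train the branches on unlabelled data, batch size 4, initial lr 0.0001
# model = MultiBranchVIT(lambda d, c: ClassifierHead(dim=d, num_classes=c))
# opt_trunk = torch.optim.Adam(
#     list(model.layers.parameters()) + list(model.trunk_head.parameters()), lr=1e-5)
# opt_branch = torch.optim.Adam(model.branch_heads.parameters(), lr=1e-4)
```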
The classification module 140 is configured to set an output threshold for the classification model to obtain the set classification model; the output threshold controls early output of the classification result, and the set classification model is used to perform multi-label classification on an input gastroscope picture.
An output threshold is set for the branch part of the multi-branch neural network model. When data passes through a branch classifier, the results of all classification labels are calculated, and an uncertainty value is computed from the classification results:
$$U = -\frac{1}{N}\sum_{i=1}^{N} p_i \log p_i$$
where N is the total number of classifiers and $p_i$ is the classification result of the i-th branch classifier.
If the uncertainty value of at least one label is higher than the output threshold, the current confidence is judged insufficient and the multi-label classification is not accepted, so the next VIT layer and its branch classifier in the trunk are executed.
The value of the output threshold is determined by the characteristics of the classification task and the required classification precision; this embodiment does not limit the specific value. For example, in a gastroscope picture classification task with 20 labels, a suitable value of the output threshold is 0.3.
A gastroscope picture requiring multi-label classification is input into the trained model for processing. The trained model has good recognition precision and can perform multi-label classification of the input gastroscope picture, and when the uncertainty values of a classification result are lower than the output threshold, the classification result can be output early, accelerating gastroscope picture classification.
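Put together, the early-exit behaviour controlled by the output threshold could be implemented along the following lines, assuming the `MultiBranchVIT` sketch above; the per-label binary entropy used as the uncertainty value is only one possible reading of the measure described earlier.

```python
import torch

@torch.no_grad()
def classify_with_early_exit(model, tokens, threshold=0.3):
    """After each branch classifier, compute a per-label uncertainty value and
    output the classification result in advance as soon as every label is below
    the output threshold; otherwise continue with the next VIT layer."""
    x = tokens
    for i, layer in enumerate(model.layers):
        x = layer(x)
        if i < len(model.layers) - 1:
            probs = model.branch_heads[i](x)                                  # B x C x 2
            uncertainty = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1)  # B x C
            if (uncertainty < threshold).all():
                return probs, i                          # classification result output in advance
    return model.trunk_head(x), len(model.layers) - 1    # fall back to the trunk classifier
```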
Optionally, the model may be tested with test data. Table 1 below compares the trained VIT network-based gastroscope picture multi-label classification model with an existing gastroscope picture classification model in terms of speed and accuracy. According to Table 1, with essentially the same accuracy as the existing model, the trained model saves 10 milliseconds of recognition time, a substantial reduction. The application therefore achieves both high recognition precision and high recognition speed.
Table 1 (provided as an image in the original publication): comparison of the speed and accuracy of the trained model and the existing gastroscope picture classification model.
The basic principles of the present application have been described in connection with specific embodiments. It should be noted that those skilled in the art will understand that all or any of the steps or components of the method and apparatus of the present application can be implemented in hardware, firmware, software or a combination thereof in any computing device (including processors, storage media, etc.) or network of computing devices, and that this can be achieved with basic programming skills after reading the description of the present application.
The object of the present application can thus also be achieved by running a program or a set of programs on any computing device. The computing device may be a general purpose device as is well known. The object of the application can thus also be achieved merely by providing a program product comprising program code for implementing the method or the apparatus. That is, such a program product also constitutes the present application, and a storage medium storing such a program product also constitutes the present application. It is to be understood that the storage medium may be any known storage medium or any storage medium developed in the future.
It is further noted that in the apparatus and method of the present application, it is apparent that the components or steps may be disassembled and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application. Also, the steps of executing the series of processes described above may naturally be executed chronologically in the order described, but need not necessarily be executed chronologically. Some steps may be performed in parallel or independently of each other.
Unless otherwise defined, technical or scientific terms used in the claims and the specification should have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. The use of "first," "second," and similar terms in the description and claims of this patent application do not denote any order, quantity, or importance, but rather the terms are used to distinguish one element from another. The terms "a" or "an," and the like, do not denote a limitation of quantity, but rather denote the presence of at least one. The word "comprise" or "comprises", and the like, means that the element or item listed before "comprises" or "comprising" covers the element or item listed after "comprising" or "comprises" and its equivalent, and does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, nor are they restricted to direct or indirect connections.
The above-described embodiments should not be construed as limiting the scope of the present application. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (9)

1. A VIT network-based gastroscope picture multi-label classification system comprising:
the sample processing module is used for preprocessing the sample picture to obtain a processed sample picture;
the model building module is used for building a preset network model based on the VIT network, wherein the preset network model comprises a trunk part and a branch part, the trunk part comprises a plurality of layers of VIT networks and a trunk classifier, the branch part comprises a branch classifier added at the output position of each VIT layer of the trunk part except the last layer, and both the trunk classifier and the branch classifiers can perform multi-label classification;
the model training module is used for inputting the processed sample picture into the preset network model and training it with a preset error function to obtain a classification model;
and the classification module is used for setting an output threshold for the classification model to obtain the set classification model, wherein the output threshold controls early output of the classification result, and the set classification model is used for performing multi-label classification on the input gastroscope picture.
2. The system of claim 1, wherein the trunk classifier and the branch classifiers each consist of one or more of a convolutional layer, a pooling layer, a VIT layer and a fully-connected layer, and the number of fully-connected operations is consistent with the total number of classification categories.
3. The system of claim 1, wherein the output threshold controlling early output of the classification result comprises: when data passes through a branch classifier, calculating the classification results of all classification labels and the uncertainty values of the classification results; when the uncertainty values of the classification results of all classification labels are lower than the output threshold, outputting the classification results and stopping execution; otherwise, continuing to execute the lower-layer VIT network and its branch classifier.
4. The system of claim 1, wherein the gastroscope sample picture preprocessing method comprises scaling and cropping, random horizontal flipping, and normalization, or any combination thereof.
5. The system of claim 2, wherein the method by which the trunk classifier and the branch classifiers each perform multi-label classification comprises:
extracting gastroscope sample picture features using the convolutional layer and the pooling layer;
copying the sample picture features into C copies, where C is the total number of categories;
capturing the dependency among disease categories in the gastroscope sample picture using the VIT layer and outputting C category features;
and decoding the category features separately with the C fully-connected operations of the fully-connected layer, and outputting C category predictions for the gastroscope sample picture.
6. The system of claim 1, wherein the preset error function comprises an error function of a training trunk portion and an error function of a training branch portion.
7. The system of claim 6, wherein the error function of the training trunk portion is a cross entropy function and the error function of the training branch portion is:
$$\mathcal{L}_{branch} = \frac{1}{L-1}\sum_{i=1}^{L-1} D_{KL}\!\left(p_i^{\,j}\,\big\|\,q^{\,j}\right)$$
where $p_i^{\,j}$ is the classification result of the i-th branch classifier, $q^{\,j}$ is the classification result of the trunk classifier, j is the training iteration, $D_{KL}$ is the divergence function, and L is the number of all classifiers.
8. The system of claim 1, wherein each layer of the VIT network comprises a multi-head attention layer for obtaining sequence features, a feed-forward propagation layer for nonlinear transformation, and two addition-normalization layers for normalization; an addition-normalization layer is constructed in front of each of the multi-head attention layer and the feed-forward propagation layer of each layer, the feed-forward propagation layer is located after the multi-head attention layer, and the output of each layer of the VIT network is used directly as the input of the next layer of the VIT network.
9. The system of claim 3, wherein the classification result uncertainty measure value is derived by the following equation:
$$U = -\frac{1}{N}\sum_{i=1}^{N} p_i \log p_i$$
where N is the total number of classifiers and $p_i$ is the classification result of the i-th branch classifier.
CN202110640686.0A 2021-06-09 2021-06-09 Gastroscope picture multi-label classification system based on VIT network Pending CN113096131A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110640686.0A CN113096131A (en) 2021-06-09 2021-06-09 Gastroscope picture multi-label classification system based on VIT network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110640686.0A CN113096131A (en) 2021-06-09 2021-06-09 Gastroscope picture multi-label classification system based on VIT network

Publications (1)

Publication Number Publication Date
CN113096131A 2021-07-09

Family

ID=76664507

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110640686.0A Pending CN113096131A (en) 2021-06-09 2021-06-09 Gastroscope picture multi-label classification system based on VIT network

Country Status (1)

Country Link
CN (1) CN113096131A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210104321A1 (en) * 2018-11-15 2021-04-08 Ampel Biosolutions, Llc Machine learning disease prediction and treatment prioritization
CN110472676A (en) * 2019-08-05 2019-11-19 首都医科大学附属北京朝阳医院 Stomach morning cancerous tissue image classification system based on deep neural network
CN112364926A (en) * 2020-11-17 2021-02-12 苏州大学 Gastroscope picture classification method and device based on ResNet-50 time compression and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023284182A1 (en) * 2021-07-15 2023-01-19 Zhejiang Dahua Technology Co., Ltd. Training method for recognizing moving target, method and device for recognizing moving target
CN113627597A (en) * 2021-08-12 2021-11-09 上海大学 Countermeasure sample generation method and system based on general disturbance
CN113627597B (en) * 2021-08-12 2023-10-13 上海大学 Method and system for generating countermeasure sample based on general disturbance
CN113743384A (en) * 2021-11-05 2021-12-03 广州思德医疗科技有限公司 Stomach picture identification method and device


Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210709)