CN115048537A - Disease recognition system based on image-text multi-mode collaborative representation - Google Patents
- Publication number
- CN115048537A (application CN202210809344.1A)
- Authority
- CN
- China
- Prior art keywords
- text
- image
- disease
- model
- unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F16/367—Information retrieval; semantic tools: ontology
- G06F40/194—Handling natural language data; calculation of difference between files
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
- G06F40/30—Semantic analysis
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/08—Neural networks; learning methods
- G06V10/774—Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/82—Image or video recognition or understanding using neural networks
- G06V20/62—Scene text, e.g. of license plates, overlay texts or captions on TV images
- G06V20/68—Food, e.g. fruit or vegetables
Abstract
The invention discloses a disease identification system based on image-text multi-modal collaborative representation, comprising: an image recognition module for identifying image data; a text recognition module, connected with the image recognition module, for extracting features from text data; a knowledge graph module, connected with the text recognition module, for providing knowledge guidance to the disease diagnosis process; and a model training module, connected with the knowledge graph module, for obtaining the disease category recognition result. The system helps improve the accuracy of the disease identification result and the robustness of the disease identification model; it raises recognition accuracy and supports model interpretation; the recognition accuracy, sensitivity, and specificity of the optimal model are markedly improved; and because the knowledge intervenes while the model extracts disease features, the reliability of the model is improved.
Description
Technical Field
The invention belongs to the field of vegetable leaf disease identification models, and particularly relates to a disease identification system based on image-text multi-modal collaborative representation.
Background
Vegetable diseases are among the long-standing challenges threatening the safety of agricultural production, agricultural product quality, and the ecological environment; they not only seriously reduce the yield and quality of vegetables but are also a major factor affecting the overall returns of the vegetable industry. According to estimates of the Food and Agriculture Organization of the United Nations, the average loss caused by crop diseases is 10% to 30% of total yield. Rapid and accurate identification of vegetable diseases is therefore the first step in taking control measures that stop damage in time. With the development of information science, technologies such as image processing and machine learning have been applied to vegetable disease diagnosis, providing powerful techniques and means for rapid, accurate, and non-destructive diagnosis of vegetable diseases.
The main difficulty in identifying cucumber leaf diseases against a complex background is that disease images often contain other plants, soil, mulching film, water pipes, and similar backgrounds; these backgrounds contain elements resembling disease symptoms, and complex background information can even submerge the disease features. Directly applying an existing classical classification model to disease images with complex backgrounds therefore yields unsatisfactory results.
Disease representation learning that relies purely on image-modality data has two main shortcomings: on the one hand, image-modality data does not contain all characteristics of a disease, and the missing characteristics must be supplemented with data from other modalities; on the other hand, the model learns only low-level image features, and the features on which its recognition decisions depend are difficult for people to understand. Because the disease identification result directly determines the subsequent control strategy and pesticide spraying, disease identification is a sensitive task, and model reliability has become a key obstacle to the development and application of deep learning in this field.
Disclosure of Invention
The invention aims to provide a disease identification system based on image-text multi-modal collaborative representation that solves the problems in the prior art.
To achieve this aim, the invention provides the following scheme: a disease identification system based on image-text multi-modal collaborative representation, comprising:
the image identification module is used for identifying the image data;
the text recognition module is connected with the image recognition module and used for extracting text data features;
the knowledge graph module is connected with the text recognition module and used for providing knowledge guidance for the disease diagnosis process;
and the model training module is connected with the knowledge graph module and is used for acquiring a disease category identification result.
Preferably, the image recognition module comprises an image modality unit for reading the visual characteristics of lesions, such as their color, shape, and location of occurrence.
Preferably, the text recognition module comprises a text modality unit and a text feature extraction unit;
the text modality unit is used for reading the textual expression of the image's visual characteristics;
the text feature extraction unit is used for extracting the features of a single input vector together with the contextual features around it.
Preferably, the knowledge graph module comprises an entity identification unit, an attribute relationship establishing unit and an attribute value extracting unit;
the entity recognition unit is used for extracting real words from the original text;
the attribute relationship establishing unit is used for establishing attribute relationships among occurrence characteristics of diseases of different crops;
the attribute value extraction unit is used for extracting disease attribute relations defined in advance, eliminating disease category entities from all entities of all original texts, and enabling the rest entities and the attribute relations to correspondingly form a complete disease knowledge triple.
Preferably, the model training module comprises a preprocessing unit. The preprocessing unit reads the image modality unit data and resizes the images to obtain images of uniform size; reads the text modality unit data and limits the length of each sentence's description-text representation vector to obtain text vectors of uniform length; and reads the knowledge graph module data and sets the dimension of each attribute-value representation vector to obtain knowledge graph text vectors of uniform length.
Preferably, the model training module includes a unified training unit, and the unified training unit is configured to jointly train the image recognition module and the text recognition module to obtain an image recognition model and a text recognition model.
Preferably, the model training module comprises a knowledge-graph modification unit that links the knowledge graph module using text modality unit information and then performs knowledge-graph-assisted classification.
Preferably, the model training module comprises a model testing unit configured to run all image-text pairs in the test set through the test procedure to obtain the recognition accuracy.
The invention discloses the following technical effects. Information from the two modalities of disease text and disease image is fused at the feature level, so the modalities supplement each other, improving the accuracy of the disease identification result and the robustness of the disease identification model. The knowledge vector redistributes the initial probabilities of the disease categories mainly through text matching, which helps the model improve recognition accuracy and supports model interpretation. The recognition accuracy, sensitivity, and specificity of the optimal model are markedly improved. Applying the knowledge graph to the disease decision process lets the knowledge intervene while the model extracts disease features, improving the reliability of the model.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:
FIG. 1 is a schematic diagram of a system architecture in an embodiment of the present invention;
fig. 2 is a flow chart of model construction in the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Although satisfactory recognition results have been obtained on specific data sets in the prior art, disease representation learning that relies purely on image-modality data has two main shortcomings: on the one hand, image-modality data does not contain all characteristics of a disease, and the missing characteristics must be supplemented with data from other modalities; on the other hand, the model learns only low-level image features, and the features on which its recognition decisions depend are difficult for people to understand. Because the disease identification result directly determines the subsequent control strategy and pesticide spraying, disease identification is a sensitive task, and model reliability has become a key obstacle to the development and application of deep learning in this field.
The invention provides a complex-background disease identification system based on image-text multi-modal collaborative representation and knowledge assistance. Three kinds of data (the disease image modality, the disease description text modality, and the disease-domain knowledge graph) jointly carry out the disease identification task. In the image modality, the recognition model locates and learns the visual features of the disease as accurately as possible. In the text modality, the disease description text restates the visual features of the disease, but without the interference of the complex background present in the image modality, which amounts to an enhancement of the disease's visual features. The disease-domain knowledge graph contains domain knowledge, so when judging the disease category the model does not recognize from zero but decides on the basis of acquired domain knowledge, making the model more reliable.
As shown in fig. 1-2, the disease identification system based on image-text multi-modal collaborative representation provided by this embodiment performs vegetable disease identification jointly with three modalities as input.
The input of the complex-background cucumber leaf disease identification model is an RGB image of the vegetable leaf disease collected in the field, a disease description text, and a previously established domain knowledge graph; the output is the disease category.
Image recognition module
The image modality input data contains rich visual features (the color, shape, location of occurrence of lesions, and the like) and is denoted T_i^img. The feature extractor uses the ResNet18 network structure, whose residual connection design avoids, as far as possible, the vanishing-gradient or exploding-gradient problems caused by overly deep networks.
ResNet18 has 18 layers in total. The network input is a 224 x 224 image with 3 channels. After the first convolution layer the spatial size is reduced to 112 x 112 and the channel count rises to 64; after the max pooling layer the size is further reduced to 56 x 56 with the channel count unchanged. The feature map then enters the residual stages: each downsampling stage halves the spatial size (via a stride-2 convolution) and doubles the channel count, so after the four residual stages the feature map is 7 x 7 with 512 channels. An average pooling layer and a fully connected layer follow.
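The shape progression described above can be traced with a short bookkeeping sketch (plain Python, not an actual network; the stage names are illustrative labels):

```python
# Trace (channels, height, width) through the ResNet18 stages described in the text:
# 3x224x224 input -> stride-2 conv -> max pool -> four residual stages -> 512x7x7.

def resnet18_feature_shapes(size=224, channels=3):
    """Return (name, channels, height, width) after each stage."""
    shapes = [("input", channels, size, size)]
    size, channels = size // 2, 64          # 7x7 stride-2 conv: 224 -> 112, 3 -> 64
    shapes.append(("conv1", channels, size, size))
    size = size // 2                        # stride-2 max pool: 112 -> 56
    shapes.append(("maxpool", channels, size, size))
    # four residual stages; the downsampling stages halve the size and double channels
    for stage, (halve, double) in enumerate(
            [(False, False), (True, True), (True, True), (True, True)], start=1):
        if halve:
            size //= 2
        if double:
            channels *= 2
        shapes.append((f"layer{stage}", channels, size, size))
    return shapes

for name, c, h, w in resnet18_feature_shapes():
    print(f"{name:8s} {c:4d} x {h} x {w}")
# final feature map is 512 x 7 x 7; average pooling and a fully connected layer follow
```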
Text recognition module
The text modality input data contains another representation of the image's visual features in textual form, denoted T_i^text. Compared with image features, text features have far lower dimensionality and each individual feature carries rich semantics, so a recurrent neural network is used to extract the features of each input vector together with its surrounding context. The feature extractor adopts TextRCNN, whose inputs are the preprocessed text C(T_i) and the text label L. The text C(T_i) first passes through a bidirectional recurrent network (LSTM) to obtain the context features of the text, C_l(T_i) and C_r(T_i), computed as in formulas (1) and (2).
C_l(T_i) = f(W^(l) C_l(T_{i-1}) + W^(sl) e(T_{i-1}))  (1)
C_r(T_i) = f(W^(r) C_r(T_{i+1}) + W^(sr) e(T_{i+1}))  (2)
where f(·) is the tanh activation function; T_i is the current vectorized text, and T_{i-1} and T_{i+1} are the preceding and following vectorized texts; W^(l) and W^(r) are the transition matrices between successive hidden layers of the left and right recurrent networks; W^(sl) and W^(sr) are the matrices that combine the current text's semantics with the semantics of its left and right neighbors; C_l and C_r are the left and right context features of the current vectorized text; and e(T_{i-1}) and e(T_{i+1}) are the word-embedding vectors of the left and right neighbors.
The obtained context features are concatenated with the current text features and fed to the specific feature extractor, which uses a max pooling layer (MaxPool) to extract the most salient text features from the vectorized text; finally the image features and text features are combined into joint features as shown in formula (3).
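The max-pooling step can be illustrated with a minimal sketch: assuming each text position yields one concatenated feature vector, MaxPool over positions keeps the per-dimension maximum:

```python
# Max pooling over time: given one feature vector per text position,
# keep, for each dimension, the most salient (largest) value across positions.

def max_pool_over_time(features):
    """features: list of equal-length vectors (one per text position)."""
    return [max(col) for col in zip(*features)]

# three positions, three feature dimensions
feats = [[0.1, 0.9, -0.2], [0.5, 0.3, 0.7], [-0.4, 0.8, 0.2]]
pooled = max_pool_over_time(feats)
print(pooled)  # one vector: the per-dimension maximum across positions
```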
Knowledge graph module
A knowledge graph of the vegetable disease domain provides knowledge guidance for the disease diagnosis process, so building a knowledge graph with complete coverage and accurate descriptions is essential. In this method, a disease knowledge graph with triples as its basic unit is created by crawling vegetable disease descriptions from Baidu Baike and various agricultural disease-control websites, followed by entity recognition, attribute relation establishment, and attribute value extraction. The entity recognition part first segments the original text into words (using the jieba segmentation tool), determines each word's part of speech, discards words that are not real (content) words, and keeps the real words as knowledge graph entities. The attribute relation establishment part uses manual annotation, building attribute relations from the occurrence characteristics of different crops' diseases under the guidance of plant protection experts. The attribute value extraction part, following the predefined disease attribute relations, excludes disease category entities (such as tomato powdery mildew and cucumber downy mildew) from the entities of all original texts; the remaining entities and the attribute relations then form complete disease knowledge triples.
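The triple-building step can be sketched as follows; the disease names, relation name, and entity list are hypothetical examples, not the patent's actual data:

```python
# Given entities extracted from a description text and a predefined attribute relation,
# exclude disease-category entities; the remaining entities become attribute values
# of (disease, relation, value) triples.

DISEASE_CATEGORIES = {"tomato powdery mildew", "cucumber downy mildew"}

def build_triples(disease, relation, entities):
    """Form (disease, relation, value) triples from non-category entities."""
    return [(disease, relation, e) for e in entities if e not in DISEASE_CATEGORIES]

entities = ["cucumber downy mildew", "yellow spot", "leaf underside"]
triples = build_triples("cucumber downy mildew", "symptom", entities)
print(triples)
# the disease-category entity itself is excluded; the rest become attribute values
```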
Training module
The learning rate of both the image recognition module and the text recognition module is 0.0002, and the classification network uses the Adam optimizer.
Step one: data preprocessing. In the image modality, original images are uniformly resized to 224 x 224 pixels to obtain images of uniform size. In the text modality, a bag-of-words model represents the text; the representation vector of each descriptive sentence is limited to a maximum length of 20, with longer vectors truncated and shorter ones zero-padded, yielding text vectors of uniform length. The knowledge graph uses a word2vec model for text representation, so that similar attribute values lie closer together and attribute values with large semantic differences lie farther apart; the dimension of each attribute value's representation vector is set to 100, yielding knowledge graph text vectors of uniform length.
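The text-side length normalization in step one (truncate to 20 entries, zero-pad shorter vectors) can be sketched as:

```python
# Make every sentence's text representation vector exactly MAX_LEN long:
# entries beyond the limit are dropped, shorter vectors are padded with zeros.

MAX_LEN = 20

def fix_length(vec, max_len=MAX_LEN):
    """Truncate beyond max_len; pad with zeros below it."""
    return vec[:max_len] + [0] * max(0, max_len - len(vec))

short = fix_length([3, 1, 4])            # padded up to length 20
long = fix_length(list(range(25)))       # truncated down to length 20
assert len(short) == len(long) == MAX_LEN
print(short)
print(long)
```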
Step two: joint training of the image recognition module and the text recognition module. The whole image-text network is trained for 50 rounds. The parameters of the ResNet18 feature extractor are trained on the image features and image labels; when the accuracy stops changing during training, the ResNet18 parameters are frozen and the image recognition model is saved. The parameters of the TextRCNN feature extractor are trained on the text features and text labels; when the accuracy stops changing, the TextRCNN parameters are frozen and the text recognition model is saved.
Step three: adding the knowledge graph module. The knowledge graph is linked using text modality information. Before linking, word importance is evaluated with Baidu's open-source tool LAC, keeping the real-word (content-word) parts and discarding non-content parts such as conjunctions.
After the important real words are selected, they are re-embedded (word vectors trained with word2vec) to obtain their word vectors, and knowledge graph linking is then performed; the linking process is shown in formula (4).
where d denotes the distance between a real-word vector and a knowledge vector, measured by cosine similarity; the two sets over which d is computed are the set of real-word vectors and the set of knowledge vectors. A successful match links the real word to the knowledge graph.
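Assuming, as stated, that d is a cosine-based measure, the linking step of formula (4) might look like the following sketch; the knowledge-graph keys and vectors are toy examples, not trained embeddings:

```python
# Link each important real-word vector to its most similar knowledge vector
# using cosine similarity as the distance measure.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def link_to_knowledge(word_vec, knowledge_vecs):
    """Return the key of the knowledge vector most similar to the real-word vector."""
    return max(knowledge_vecs, key=lambda k: cosine_similarity(word_vec, knowledge_vecs[k]))

# toy 2-d knowledge vectors for two disease concepts
kg = {"powdery_mildew": [1.0, 0.1], "downy_mildew": [0.1, 1.0]}
print(link_to_knowledge([0.9, 0.2], kg))
```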
After the text modality and the knowledge graph are linked, knowledge-graph-assisted classification is performed. This process requires the image-text multi-modal recognition model obtained beforehand; the inference process of the disease classification model based on image-text multi-modal collaborative representation and knowledge assistance is then given by formula (5).
where P_{i&t&k} denotes the knowledge-fused image-text multi-modal joint output probability, M(·) is the joint image-text output, and W(·) is the initial probability after knowledge matching: if matching succeeds, the initial probability is Softmax(n), where n is the number of successfully matched real words; otherwise it is Softmax(0).
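A sketch of the knowledge-assisted reallocation in formula (5): the knowledge term yields a softmax over match counts for matched classes and Softmax(0) otherwise. How exactly W(·) combines with M(·) is not spelled out here, so the additive combination below is an assumption for illustration:

```python
# Combine the image-text joint output with a knowledge prior built from the
# number of real words matched per disease class. The additive fusion is an
# illustrative assumption, not the patent's exact formula (5).
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def knowledge_fused_probs(joint_probs, match_counts):
    """Reallocate class probabilities using knowledge-match counts."""
    prior = softmax([float(n) for n in match_counts])     # Softmax(n) per class, n=0 if no match
    fused = [m + w for m, w in zip(joint_probs, prior)]   # assumed additive fusion
    total = sum(fused)
    return [f / total for f in fused]

# two classes: image-text output slightly favors class 0, but class 1 matched 3 real words
probs = knowledge_fused_probs([0.55, 0.45], [0, 3])
print(probs)
```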
Step four: model testing on the test set. A disease image to be recognized and its corresponding description text are input to the model. After preprocessing, a test image of uniform size and a vectorized disease description text are obtained and fed to the image feature extraction model and the text feature extraction model respectively to obtain their recognition probabilities. The real words of the disease description text are then linked to the knowledge graph, and the final recognition result is obtained from the number of successful links. Finally, all image-text pairs in the test set are tested according to this procedure to obtain the recognition accuracy.
The invention provides a complex-background disease recognition system based on image-text multi-modal collaborative representation and knowledge assistance; the model takes the three modalities as input and performs vegetable disease recognition jointly. Applying the knowledge graph to the disease decision process lets the knowledge intervene while the model extracts disease features, improving the reliability of the model.
Compared with existing disease identification models: a single disease-image modality usually cannot contain all the effective information needed to identify a disease category accurately, and the invention fuses information from the two modalities of disease text and disease image at the feature level, so the modalities supplement each other, improving the accuracy of the disease identification result and the robustness of the disease identification model. The knowledge vector redistributes the initial probabilities of the disease categories mainly through text matching, which helps the model improve recognition accuracy and supports model interpretation. Finally, the recognition accuracy, model sensitivity, and model specificity of the optimal model of the disease recognition model based on image-text multi-modal collaborative representation and knowledge assistance are 99.63%, 99%, 99.07%, and 99.78% respectively.
Claims (8)
1. An image-text multi-modal collaborative representation-based disease identification system is characterized by comprising:
the image identification module is used for identifying the image data;
the text recognition module is connected with the image recognition module and used for extracting text data features;
the knowledge graph module is connected with the text recognition module and used for providing knowledge guidance for the disease diagnosis process;
and the model training module is connected with the knowledge graph module and is used for acquiring a disease category identification result.
2. The system for disease identification based on image-text multi-modal collaborative representation according to claim 1, characterized in that
the image identification module comprises an image modality unit, and the image modality unit is used for reading the visual characteristics of the lesion: color, shape, and occurrence position.
3. The system for disease recognition based on image-text multi-modal collaborative representation according to claim 1,
the text recognition module comprises a text modal unit and a text feature extraction unit;
the text modality unit is used for reading the textual expression of the image visual characteristics;
the text feature extraction unit is used for extracting the features of each single input vector and the contextual features around it.
4. The system for disease recognition based on image-text multi-modal collaborative representation according to claim 1,
the knowledge graph module comprises an entity identification unit, an attribute relation establishment unit and an attribute value extraction unit;
the entity recognition unit is used for extracting real words from the original text;
the attribute relationship establishing unit is used for establishing attribute relationships among occurrence characteristics of diseases of different crops;
the attribute value extraction unit is used for extracting the pre-defined disease attribute relations, removing the disease-category entities from all entities of all original texts, and pairing the remaining entities with the attribute relations to form complete disease knowledge triples.
5. The system for disease recognition based on image-text multi-modal collaborative representation according to claim 1,
the model training module comprises a preprocessing unit, wherein the preprocessing unit is used for reading the image modality unit data and resizing the images to obtain images of uniform size; for reading the text modality unit data and limiting the maximum length of the representation vector of each description sentence to obtain text vectors of uniform length; and for reading the knowledge graph module data and setting the vector dimension of each attribute-value representation to obtain knowledge-graph text vectors of uniform length.
6. The system for disease recognition based on image-text multi-modal collaborative representation according to claim 5,
the model training module comprises a unified training unit, and the unified training unit is used for training the image recognition module and the text recognition module together to obtain an image recognition model and a text recognition model.
7. The system for disease recognition based on image-text multi-modal collaborative representation according to claim 5,
the model training module comprises a knowledge graph modification unit, which uses the text modality unit information to link to the knowledge graph module and then performs knowledge-graph-assisted classification.
8. The system for disease recognition based on image-text multi-modal collaborative representation according to claim 5,
the model training module comprises a model testing unit, and the model testing unit is used for testing all image-text pairs in the test set according to the testing process to obtain the recognition accuracy.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210809344.1A CN115048537A (en) | 2022-07-11 | 2022-07-11 | Disease recognition system based on image-text multi-mode collaborative representation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115048537A true CN115048537A (en) | 2022-09-13 |
Family
ID=83165948
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210809344.1A Pending CN115048537A (en) | 2022-07-11 | 2022-07-11 | Disease recognition system based on image-text multi-mode collaborative representation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115048537A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114565826A (en) * | 2022-04-28 | 2022-05-31 | 南京绿色科技研究院有限公司 | Agricultural pest and disease identification and diagnosis method, system and device |
Non-Patent Citations (2)
Title |
---|
CHUNSHAN WANG等: "Few-shot vegetable disease recognition model based on image text collaborative representation learning" * |
JI ZHOU等: "Crop disease identification and interpretation method based on multimodal deep learning" * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116051132A (en) * | 2023-04-03 | 2023-05-02 | 之江实验室 | Illegal commodity identification method and device, computer equipment and storage medium |
CN116246176A (en) * | 2023-05-12 | 2023-06-09 | 山东建筑大学 | Crop disease detection method and device, electronic equipment and storage medium |
CN116246176B (en) * | 2023-05-12 | 2023-09-19 | 山东建筑大学 | Crop disease detection method and device, electronic equipment and storage medium |
CN117611924A (en) * | 2024-01-17 | 2024-02-27 | 贵州大学 | Plant leaf phenotype disease classification method based on graphic subspace joint learning |
CN117611924B (en) * | 2024-01-17 | 2024-04-09 | 贵州大学 | Plant leaf phenotype disease classification method based on graphic subspace joint learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115048537A (en) | Disease recognition system based on image-text multi-mode collaborative representation | |
CN110349676B (en) | Time-series physiological data classification method and device, storage medium and processor | |
CN111582225B (en) | Remote sensing image scene classification method and device | |
CN112036276B (en) | Artificial intelligent video question-answering method | |
CN114782694B (en) | Unsupervised anomaly detection method, system, device and storage medium | |
CN113378676A (en) | Method for detecting figure interaction in image based on multi-feature fusion | |
CN116994069B (en) | Image analysis method and system based on multi-mode information | |
CN110866542A (en) | Depth representation learning method based on feature controllable fusion | |
CN112732921B (en) | False user comment detection method and system | |
KR102563550B1 (en) | Method and apparatus for read-only prompt learning | |
CN115050014A (en) | Small sample tomato disease identification system and method based on image text learning | |
CN114841151B (en) | Medical text entity relation joint extraction method based on decomposition-recombination strategy | |
CN116564355A (en) | Multi-mode emotion recognition method, system, equipment and medium based on self-attention mechanism fusion | |
CN111242059B (en) | Method for generating unsupervised image description model based on recursive memory network | |
CN116402066A (en) | Attribute-level text emotion joint extraction method and system for multi-network feature fusion | |
CN116304984A (en) | Multi-modal intention recognition method and system based on contrast learning | |
CN115309927A (en) | Multi-label guiding and multi-view measuring ocean remote sensing image retrieval method and system | |
CN115545021A (en) | Clinical term identification method and device based on deep learning | |
Xu et al. | Zero-shot compound fault diagnosis method based on semantic learning and discriminative features | |
CN117012373B (en) | Training method, application method and system of grape embryo auxiliary inspection model | |
CN114399661A (en) | Instance awareness backbone network training method | |
CN114022687A (en) | Image description countermeasure generation method based on reinforcement learning | |
CN117056451A (en) | New energy automobile complaint text aspect-viewpoint pair extraction method based on context enhancement | |
CN112215285A (en) | Cross-media-characteristic-based automatic fundus image labeling method | |
CN113159071B (en) | Cross-modal image-text association anomaly detection method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20220913 |