CN115048537A - Disease recognition system based on image-text multi-mode collaborative representation - Google Patents
- Publication number
- CN115048537A (application CN202210809344.1A)
- Authority
- CN
- China
- Prior art keywords
- text
- image
- disease
- model
- unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F16/367—Information retrieval; semantic tools: ontology
- G06F40/194—Handling natural language data; calculation of difference between files
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
- G06F40/30—Semantic analysis
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/08—Neural networks; learning methods
- G06V10/774—Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/82—Image or video recognition or understanding using neural networks
- G06V20/62—Scene text, e.g. of license plates, overlay texts or captions on TV images
- G06V20/68—Food, e.g. fruit or vegetables
Abstract
The invention discloses a disease identification system based on image-text multi-modal collaborative representation, comprising: an image recognition module for identifying image data; a text recognition module, connected with the image recognition module, for extracting features from text data; a knowledge graph module, connected with the text recognition module, for providing knowledge guidance to the disease diagnosis process; and a model training module, connected with the knowledge graph module, for obtaining the disease category recognition result. The system helps improve the accuracy of the disease identification result and the robustness of the disease identification model; it raises recognition accuracy and supports model interpretation; the recognition accuracy, sensitivity, and specificity of the optimal model are markedly improved; and because the knowledge intervenes while the model extracts disease features, the reliability of the model is improved.
Description
Technical Field
The invention belongs to the field of vegetable leaf disease identification models, and particularly relates to a disease identification system based on image-text multi-modal collaborative representation.
Background
Vegetable diseases are among the long-standing challenges threatening the safety of agricultural production, agricultural product quality, and the ecological environment; they not only seriously reduce the yield and quality of vegetables but are also a major factor affecting the overall returns of the vegetable industry. According to estimates of the Food and Agriculture Organization of the United Nations, the average loss caused by crop diseases is 10% to 30% of total yield. Rapid and accurate identification of vegetable diseases is therefore the first step in taking control measures that stop damage in time. With the development of information science, technologies such as image processing and machine learning have been applied to vegetable disease diagnosis, providing powerful techniques and means for rapid, accurate, and non-destructive diagnosis of vegetable diseases.
The main difficulty in identifying cucumber leaf diseases against a complex background is that disease images often contain other plants, soil, mulching film, water pipes, and similar backgrounds; these backgrounds contain elements resembling disease symptoms, and complex background information can even submerge the disease features. Directly applying an existing classical classification model to disease images with complex backgrounds therefore yields unsatisfactory results.
Disease representation learning that relies purely on image-modality data has two main shortcomings: on the one hand, image-modality data does not contain all characteristics of a disease, and the missing characteristics must be supplemented with data from other modalities; on the other hand, the model learns only low-level image features, and the features on which its recognition decisions depend are difficult for people to understand. Because the disease identification result directly determines the subsequent control strategy and pesticide spraying, disease identification is a sensitive task, and model reliability has become a key obstacle to the development and application of deep learning in this field.
Disclosure of Invention
The invention aims to provide a disease identification system based on image-text multi-modal collaborative representation that solves the problems in the prior art.
To achieve this aim, the invention provides the following scheme: a disease identification system based on image-text multi-modal collaborative representation, comprising:
the image identification module is used for identifying the image data;
the text recognition module is connected with the image recognition module and used for extracting text data features;
the knowledge graph module is connected with the text recognition module and used for providing knowledge guidance for the disease diagnosis process;
and the model training module is connected with the knowledge graph module and is used for acquiring a disease category identification result.
Preferably, the image recognition module comprises an image modality unit for reading the visual characteristics of lesions, such as their color, shape, and location of occurrence.
Preferably, the text recognition module comprises a text modality unit and a text feature extraction unit;
the text modality unit is used for reading the textual expression of the image's visual characteristics;
the text feature extraction unit is used for extracting the features of a single input vector together with the contextual features around it.
Preferably, the knowledge graph module comprises an entity identification unit, an attribute relationship establishing unit and an attribute value extracting unit;
the entity recognition unit is used for extracting real words from the original text;
the attribute relationship establishing unit is used for establishing attribute relationships among occurrence characteristics of diseases of different crops;
the attribute value extraction unit is used for extracting disease attribute relations defined in advance, eliminating disease category entities from all entities of all original texts, and enabling the rest entities and the attribute relations to correspondingly form a complete disease knowledge triple.
Preferably, the model training module comprises a preprocessing unit. The preprocessing unit reads the image modality unit data and resizes the images to obtain images of uniform size; reads the text modality unit data and limits the length of each sentence's description-text representation vector to obtain text vectors of uniform length; and reads the knowledge graph module data and sets the dimension of each attribute-value representation vector to obtain knowledge graph text vectors of uniform length.
Preferably, the model training module includes a unified training unit, and the unified training unit is configured to jointly train the image recognition module and the text recognition module to obtain an image recognition model and a text recognition model.
Preferably, the model training module comprises a knowledge-graph modification unit that links the knowledge graph module using text modality unit information and then performs knowledge-graph-assisted classification.
Preferably, the model training module comprises a model testing unit configured to run all image-text pairs in the test set through the test procedure to obtain the recognition accuracy.
The invention discloses the following technical effects. Information from the two modalities of disease text and disease image is fused at the feature level, so the modalities supplement each other, improving the accuracy of the disease identification result and the robustness of the disease identification model. The knowledge vector redistributes the initial probabilities of the disease categories mainly through text matching, which helps the model improve recognition accuracy and supports model interpretation. The recognition accuracy, sensitivity, and specificity of the optimal model are markedly improved. Applying the knowledge graph to the disease decision process lets the knowledge intervene while the model extracts disease features, improving the reliability of the model.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:
FIG. 1 is a schematic diagram of a system architecture in an embodiment of the present invention;
fig. 2 is a flow chart of model construction in the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Although satisfactory recognition results have been obtained on specific data sets in the prior art, disease representation learning that relies purely on image-modality data has two main shortcomings: on the one hand, image-modality data does not contain all characteristics of a disease, and the missing characteristics must be supplemented with data from other modalities; on the other hand, the model learns only low-level image features, and the features on which its recognition decisions depend are difficult for people to understand. Because the disease identification result directly determines the subsequent control strategy and pesticide spraying, disease identification is a sensitive task, and model reliability has become a key obstacle to the development and application of deep learning in this field.
The invention provides a complex-background disease identification system based on image-text multi-modal collaborative representation and knowledge assistance. Three kinds of data (the disease image modality, the disease description text modality, and the disease-domain knowledge graph) jointly carry out the disease identification task. In the image modality, the recognition model locates and learns the visual features of the disease as accurately as possible. In the text modality, the disease description text restates the visual features of the disease, but without the interference of the complex background present in the image modality, which amounts to an enhancement of the disease's visual features. The disease-domain knowledge graph contains domain knowledge, so when judging the disease category the model does not recognize from zero but decides on the basis of acquired domain knowledge, making the model more reliable.
As shown in fig. 1-2, the disease identification system based on image-text multi-modal collaborative representation provided by this embodiment performs vegetable disease identification jointly with three modalities as input.
The input of the complex-background cucumber leaf disease identification model is an RGB image of the vegetable leaf disease collected in the field, a disease description text, and a previously established domain knowledge graph; the output is the disease category.
Image recognition module
The image modality input data contains rich visual features (the color, shape, location of occurrence of lesions, and the like) and is denoted T_i^img. The feature extractor uses the ResNet18 network structure, whose residual connection design avoids, as far as possible, the vanishing-gradient or exploding-gradient problems caused by overly deep networks.
ResNet18 has 18 layers in total. The network input is a 224 x 224 image with 3 channels. After the first convolution layer the spatial size is reduced to 112 x 112 and the channel count rises to 64; after the max pooling layer the size is further reduced to 56 x 56 with the channel count unchanged. The feature map then enters the residual stages: each downsampling stage halves the spatial size (via a stride-2 convolution) and doubles the channel count, so after the four residual stages the feature map is 7 x 7 with 512 channels. An average pooling layer and a fully connected layer follow.
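The shape progression described above can be traced with a short bookkeeping sketch (plain Python, not an actual network; the stage names are illustrative labels):

```python
# Trace (channels, height, width) through the ResNet18 stages described in the text:
# 3x224x224 input -> stride-2 conv -> max pool -> four residual stages -> 512x7x7.

def resnet18_feature_shapes(size=224, channels=3):
    """Return (name, channels, height, width) after each stage."""
    shapes = [("input", channels, size, size)]
    size, channels = size // 2, 64          # 7x7 stride-2 conv: 224 -> 112, 3 -> 64
    shapes.append(("conv1", channels, size, size))
    size = size // 2                        # stride-2 max pool: 112 -> 56
    shapes.append(("maxpool", channels, size, size))
    # four residual stages; the downsampling stages halve the size and double channels
    for stage, (halve, double) in enumerate(
            [(False, False), (True, True), (True, True), (True, True)], start=1):
        if halve:
            size //= 2
        if double:
            channels *= 2
        shapes.append((f"layer{stage}", channels, size, size))
    return shapes

for name, c, h, w in resnet18_feature_shapes():
    print(f"{name:8s} {c:4d} x {h} x {w}")
# final feature map is 512 x 7 x 7; average pooling and a fully connected layer follow
```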
Text recognition module
The text modality input data contains another representation of the image's visual features in textual form, denoted T_i^text. Compared with image features, text features have far lower dimensionality and each individual feature carries rich semantics, so a recurrent neural network is used to extract the features of each input vector together with its surrounding context. The feature extractor adopts TextRCNN, whose inputs are the preprocessed text C(T_i) and the text label L. The text C(T_i) first passes through a bidirectional recurrent network (LSTM) to obtain the context features of the text, C_l(T_i) and C_r(T_i), computed as in formulas (1) and (2).
C_l(T_i) = f(W^(l) C_l(T_{i-1}) + W^(sl) e(T_{i-1}))  (1)
C_r(T_i) = f(W^(r) C_r(T_{i+1}) + W^(sr) e(T_{i+1}))  (2)
where f(·) is the tanh activation function; T_i is the current vectorized text, and T_{i-1} and T_{i+1} are the preceding and following vectorized texts; W^(l) and W^(r) are the transition matrices between successive hidden layers of the left and right recurrent networks; W^(sl) and W^(sr) are the matrices that combine the current text's semantics with the semantics of its left and right neighbors; C_l and C_r are the left and right context features of the current vectorized text; and e(T_{i-1}) and e(T_{i+1}) are the word-embedding vectors of the left and right neighbors.
The obtained context features are concatenated with the current text features and fed to the specific feature extractor, which uses a max pooling layer (MaxPool) to extract the most salient text features from the vectorized text; finally the image features and text features are combined into joint features as shown in formula (3).
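The max-pooling step can be illustrated with a minimal sketch: assuming each text position yields one concatenated feature vector, MaxPool over positions keeps the per-dimension maximum:

```python
# Max pooling over time: given one feature vector per text position,
# keep, for each dimension, the most salient (largest) value across positions.

def max_pool_over_time(features):
    """features: list of equal-length vectors (one per text position)."""
    return [max(col) for col in zip(*features)]

# three positions, three feature dimensions
feats = [[0.1, 0.9, -0.2], [0.5, 0.3, 0.7], [-0.4, 0.8, 0.2]]
pooled = max_pool_over_time(feats)
print(pooled)  # one vector: the per-dimension maximum across positions
```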
Knowledge graph module
A knowledge graph of the vegetable disease domain provides knowledge guidance for the disease diagnosis process, so building a knowledge graph with complete coverage and accurate descriptions is essential. In this method, a disease knowledge graph with triples as its basic unit is created by crawling vegetable disease descriptions from Baidu Baike and various agricultural disease-control websites, followed by entity recognition, attribute relation establishment, and attribute value extraction. The entity recognition part first segments the original text into words (using the jieba segmentation tool), determines each word's part of speech, discards words that are not real (content) words, and keeps the real words as knowledge graph entities. The attribute relation establishment part uses manual annotation, building attribute relations from the occurrence characteristics of different crops' diseases under the guidance of plant protection experts. The attribute value extraction part, following the predefined disease attribute relations, excludes disease category entities (such as tomato powdery mildew and cucumber downy mildew) from the entities of all original texts; the remaining entities and the attribute relations then form complete disease knowledge triples.
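The triple-building step can be sketched as follows; the disease names, relation name, and entity list are hypothetical examples, not the patent's actual data:

```python
# Given entities extracted from a description text and a predefined attribute relation,
# exclude disease-category entities; the remaining entities become attribute values
# of (disease, relation, value) triples.

DISEASE_CATEGORIES = {"tomato powdery mildew", "cucumber downy mildew"}

def build_triples(disease, relation, entities):
    """Form (disease, relation, value) triples from non-category entities."""
    return [(disease, relation, e) for e in entities if e not in DISEASE_CATEGORIES]

entities = ["cucumber downy mildew", "yellow spot", "leaf underside"]
triples = build_triples("cucumber downy mildew", "symptom", entities)
print(triples)
# the disease-category entity itself is excluded; the rest become attribute values
```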
Training module
The learning rate of both the image recognition module and the text recognition module is 0.0002, and the classification network uses the Adam optimizer.
Step one: data preprocessing. In the image modality, original images are uniformly resized to 224 x 224 pixels to obtain images of uniform size. In the text modality, a bag-of-words model represents the text; the representation vector of each descriptive sentence is limited to a maximum length of 20, with longer vectors truncated and shorter ones zero-padded, yielding text vectors of uniform length. The knowledge graph uses a word2vec model for text representation, so that similar attribute values lie closer together and attribute values with large semantic differences lie farther apart; the dimension of each attribute value's representation vector is set to 100, yielding knowledge graph text vectors of uniform length.
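The text-side length normalization in step one (truncate to 20 entries, zero-pad shorter vectors) can be sketched as:

```python
# Make every sentence's text representation vector exactly MAX_LEN long:
# entries beyond the limit are dropped, shorter vectors are padded with zeros.

MAX_LEN = 20

def fix_length(vec, max_len=MAX_LEN):
    """Truncate beyond max_len; pad with zeros below it."""
    return vec[:max_len] + [0] * max(0, max_len - len(vec))

short = fix_length([3, 1, 4])            # padded up to length 20
long = fix_length(list(range(25)))       # truncated down to length 20
assert len(short) == len(long) == MAX_LEN
print(short)
print(long)
```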
Step two: joint training of the image recognition module and the text recognition module. The whole image-text network is trained for 50 rounds. The parameters of the ResNet18 feature extractor are trained on the image features and image labels; when the accuracy stops changing during training, the ResNet18 parameters are frozen and the image recognition model is saved. The parameters of the TextRCNN feature extractor are trained on the text features and text labels; when the accuracy stops changing, the TextRCNN parameters are frozen and the text recognition model is saved.
Step three: adding the knowledge graph module. The knowledge graph is linked using text modality information. Before linking, word importance is evaluated with Baidu's open-source tool LAC, keeping the real-word (content-word) parts and discarding non-content parts such as conjunctions.
After the important real words are selected, they are re-embedded (word vectors trained with word2vec) to obtain their word vectors, and knowledge graph linking is then performed; the linking process is shown in formula (4).
where d denotes the distance between a real-word vector and a knowledge vector, measured by cosine similarity; the two sets over which d is computed are the set of real-word vectors and the set of knowledge vectors. A successful match links the real word to the knowledge graph.
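Assuming, as stated, that d is a cosine-based measure, the linking step of formula (4) might look like the following sketch; the knowledge-graph keys and vectors are toy examples, not trained embeddings:

```python
# Link each important real-word vector to its most similar knowledge vector
# using cosine similarity as the distance measure.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def link_to_knowledge(word_vec, knowledge_vecs):
    """Return the key of the knowledge vector most similar to the real-word vector."""
    return max(knowledge_vecs, key=lambda k: cosine_similarity(word_vec, knowledge_vecs[k]))

# toy 2-d knowledge vectors for two disease concepts
kg = {"powdery_mildew": [1.0, 0.1], "downy_mildew": [0.1, 1.0]}
print(link_to_knowledge([0.9, 0.2], kg))
```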
After the text modality and the knowledge graph are linked, knowledge-graph-assisted classification is performed. This process requires the image-text multi-modal recognition model obtained beforehand; the inference process of the disease classification model based on image-text multi-modal collaborative representation and knowledge assistance is then given by formula (5).
where P_{i&t&k} denotes the knowledge-fused image-text multi-modal joint output probability, M(·) is the joint image-text output, and W(·) is the initial probability after knowledge matching: if matching succeeds, the initial probability is Softmax(n), where n is the number of successfully matched real words; otherwise it is Softmax(0).
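A sketch of the knowledge-assisted reallocation in formula (5): the knowledge term yields a softmax over match counts for matched classes and Softmax(0) otherwise. How exactly W(·) combines with M(·) is not spelled out here, so the additive combination below is an assumption for illustration:

```python
# Combine the image-text joint output with a knowledge prior built from the
# number of real words matched per disease class. The additive fusion is an
# illustrative assumption, not the patent's exact formula (5).
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def knowledge_fused_probs(joint_probs, match_counts):
    """Reallocate class probabilities using knowledge-match counts."""
    prior = softmax([float(n) for n in match_counts])     # Softmax(n) per class, n=0 if no match
    fused = [m + w for m, w in zip(joint_probs, prior)]   # assumed additive fusion
    total = sum(fused)
    return [f / total for f in fused]

# two classes: image-text output slightly favors class 0, but class 1 matched 3 real words
probs = knowledge_fused_probs([0.55, 0.45], [0, 3])
print(probs)
```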
Step four: model testing on the test set. A disease image to be recognized and its corresponding description text are input to the model. After preprocessing, a test image of uniform size and a vectorized disease description text are obtained and fed to the image feature extraction model and the text feature extraction model respectively to obtain their recognition probabilities. The real words of the disease description text are then linked to the knowledge graph, and the final recognition result is obtained from the number of successful links. Finally, all image-text pairs in the test set are tested according to this procedure to obtain the recognition accuracy.
The invention provides a complex-background disease recognition system based on image-text multi-modal collaborative representation and knowledge assistance; the model takes the three modalities as input and performs vegetable disease recognition jointly. Applying the knowledge graph to the disease decision process lets the knowledge intervene while the model extracts disease features, improving the reliability of the model.
Compared with existing disease identification models: a single disease-image modality usually cannot contain all the effective information needed to identify a disease category accurately, and the invention fuses information from the two modalities of disease text and disease image at the feature level, so the modalities supplement each other, improving the accuracy of the disease identification result and the robustness of the disease identification model. The knowledge vector redistributes the initial probabilities of the disease categories mainly through text matching, which helps the model improve recognition accuracy and supports model interpretation. Finally, the recognition accuracy, model sensitivity, and model specificity of the optimal model of the disease recognition model based on image-text multi-modal collaborative representation and knowledge assistance are 99.63%, 99%, 99.07%, and 99.78% respectively.
Claims (8)
1. An image-text multi-modal collaborative representation-based disease identification system is characterized by comprising:
the image identification module is used for identifying the image data;
the text recognition module is connected with the image recognition module and used for extracting text data features;
the knowledge graph module is connected with the text recognition module and used for providing knowledge guidance for the disease diagnosis process;
and the model training module is connected with the knowledge graph module and is used for acquiring a disease category identification result.
2. The system for disease identification based on image-text multi-modal collaborative representation according to claim 1, characterized in that
the image identification module comprises an image modality unit, and the image modality unit is used for reading the visual characteristics of the lesion: color, shape, and occurrence position.
3. The system for disease recognition based on image-text multi-modal collaborative representation according to claim 1,
the text recognition module comprises a text modal unit and a text feature extraction unit;
the text modality unit is used for reading the textual expression of the image visual characteristics;
the text feature extraction unit is used for extracting the features of each single input vector and the contextual features around it.
4. The system for disease recognition based on image-text multi-modal collaborative representation according to claim 1,
the knowledge graph module comprises an entity identification unit, an attribute relation establishment unit and an attribute value extraction unit;
the entity recognition unit is used for extracting real words from the original text;
the attribute relationship establishing unit is used for establishing attribute relationships among occurrence characteristics of diseases of different crops;
the attribute value extraction unit is used for extracting the pre-defined disease attribute relations, removing the disease-category entities from all entities of all original texts, and pairing the remaining entities with the attribute relations to form complete disease knowledge triples.
5. The system for disease recognition based on image-text multi-modal collaborative representation according to claim 1,
the model training module comprises a preprocessing unit, wherein the preprocessing unit is used for reading the image modality unit data and resizing the images to obtain images of uniform size; for reading the text modality unit data and limiting the maximum length of the representation vector of each description sentence to obtain text vectors of uniform length; and for reading the knowledge graph module data and setting the vector dimension of each attribute-value representation to obtain knowledge-graph text vectors of uniform length.
6. The system for disease recognition based on image-text multi-modal collaborative representation according to claim 5,
the model training module comprises a unified training unit, and the unified training unit is used for training the image recognition module and the text recognition module together to obtain an image recognition model and a text recognition model.
7. The system for disease recognition based on image-text multi-modal collaborative representation according to claim 5,
the model training module comprises a knowledge graph modification unit, which uses the text modality unit information to link to the knowledge graph module and then performs knowledge-graph-assisted classification.
8. The system for disease recognition based on image-text multi-modal collaborative representation according to claim 5,
the model training module comprises a model testing unit, and the model testing unit is used for testing all image-text pairs in the test set according to the testing process to obtain the recognition accuracy.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210809344.1A CN115048537A (en) | 2022-07-11 | 2022-07-11 | Disease recognition system based on image-text multi-mode collaborative representation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115048537A true CN115048537A (en) | 2022-09-13 |
Family
ID=83165948
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210809344.1A Pending CN115048537A (en) | 2022-07-11 | 2022-07-11 | Disease recognition system based on image-text multi-mode collaborative representation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115048537A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114565826A (en) * | 2022-04-28 | 2022-05-31 | 南京绿色科技研究院有限公司 | Agricultural pest and disease identification and diagnosis method, system and device |
Non-Patent Citations (2)
Title |
---|
CHUNSHAN WANG等: "Few-shot vegetable disease recognition model based on image text collaborative representation learning" * |
JI ZHOU等: "Crop disease identification and interpretation method based on multimodal deep learning" * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116051132A (en) * | 2023-04-03 | 2023-05-02 | 之江实验室 | Illegal commodity identification method and device, computer equipment and storage medium |
CN116246176A (en) * | 2023-05-12 | 2023-06-09 | 山东建筑大学 | Crop disease detection method and device, electronic equipment and storage medium |
CN116246176B (en) * | 2023-05-12 | 2023-09-19 | 山东建筑大学 | Crop disease detection method and device, electronic equipment and storage medium |
CN117611924A (en) * | 2024-01-17 | 2024-02-27 | 贵州大学 | Plant leaf phenotype disease classification method based on graphic subspace joint learning |
CN117611924B (en) * | 2024-01-17 | 2024-04-09 | 贵州大学 | Plant leaf phenotype disease classification method based on graphic subspace joint learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115048537A (en) | Disease recognition system based on image-text multi-mode collaborative representation | |
CN110349676B (en) | Time-series physiological data classification method and device, storage medium and processor | |
CN111582225B (en) | Remote sensing image scene classification method and device | |
CN112036276B (en) | Artificial intelligent video question-answering method | |
CN114782694B (en) | Unsupervised anomaly detection method, system, device and storage medium | |
CN113378676A (en) | Method for detecting figure interaction in image based on multi-feature fusion | |
CN116994069B (en) | Image analysis method and system based on multi-mode information | |
CN110866542A (en) | Depth representation learning method based on feature controllable fusion | |
CN112732921B (en) | False user comment detection method and system | |
KR102563550B1 (en) | Method and apparatus for read-only prompt learning | |
CN115050014A (en) | Small sample tomato disease identification system and method based on image text learning | |
CN114841151B (en) | Medical text entity relation joint extraction method based on decomposition-recombination strategy | |
CN116564355A (en) | Multi-mode emotion recognition method, system, equipment and medium based on self-attention mechanism fusion | |
CN111242059B (en) | Method for generating unsupervised image description model based on recursive memory network | |
CN116402066A (en) | Attribute-level text emotion joint extraction method and system for multi-network feature fusion | |
CN116304984A (en) | Multi-modal intention recognition method and system based on contrast learning | |
CN115309927A (en) | Multi-label guiding and multi-view measuring ocean remote sensing image retrieval method and system | |
CN115545021A (en) | Clinical term identification method and device based on deep learning | |
Xu et al. | Zero-shot compound fault diagnosis method based on semantic learning and discriminative features | |
CN117012373B (en) | Training method, application method and system of grape embryo auxiliary inspection model | |
CN114399661A (en) | Instance awareness backbone network training method | |
CN114022687A (en) | Image description countermeasure generation method based on reinforcement learning | |
CN117056451A (en) | New energy automobile complaint text aspect-viewpoint pair extraction method based on context enhancement | |
CN112215285A (en) | Cross-media-characteristic-based automatic fundus image labeling method | |
CN113159071B (en) | Cross-modal image-text association anomaly detection method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20220913 |