CN113989563A

CN113989563A - Multi-scale multi-label fusion Chinese medicine tongue picture classification method

Info

Publication number: CN113989563A
Application number: CN202111273511.7A
Authority: CN
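Inventors: 张明川 (Zhang Mingchuan); 赵凌昊 (Zhao Linghao); 徐文萱 (Xu Wenxuan); 王琳 (Wang Lin); 郑瑞娟 (Zheng Ruijuan); 冀治航 (Ji Zhihang); 宋建强 (Song Jianqiang); 朱军龙 (Zhu Junlong)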
Original assignee: Henan University of Science and Technology
Current assignee: Henan University of Science and Technology
Priority date: 2021-10-29
Filing date: 2021-10-29
Publication date: 2022-01-28

Abstract

A multi-scale, multi-label fused traditional Chinese medicine (TCM) tongue image classification method, relating to the technical field of computer vision. The method applies deep learning to tongue feature classification: a feature pyramid network fuses high-level semantic information with low-level detail features to form a higher-resolution feature map, the processed tongue features are labeled, and the correlations among the labels are extracted to obtain the classification result. Beneficial effects: by extracting and fusing multi-scale features in the feature pyramid network, the method increases the resolution and diversity of the features, and by performing multi-label classification it improves the accuracy and robustness of tongue image classification.

Description

Multi-scale multi-label fusion Chinese medicine tongue picture classification method

Technical Field

The invention belongs to the technical field of computer vision and relates in particular to a multi-scale, multi-label fused traditional Chinese medicine tongue image classification method.

Background

Automatic classification of TCM tongue features is the core of objective tongue diagnosis: the accuracy of the classification results determines the reliability of subsequent processing and how readily TCM practitioners accept the results. Under the TCM diagnostic principle of inferring the interior from the exterior, changes in tongue features reflect functional and pathological changes of the internal organs and show, on the tongue, the abundance or insufficiency of qi and blood. Automatic tongue classification has therefore become a research hotspot in the objectification of tongue diagnosis.

When a computer analyzes tongue features, features related to the body's physiological functions and pathological changes are obtained from tongue images. Because tongue colors differ only slightly and resemble one another, the demands on tongue image classification precision keep rising. However, most research sets tongue classification up as a single-label multi-class (or binary) classification problem, and the few studies that use multi-label learning achieve only limited results because they use a small number of labels and do not use deep learning. From a practical standpoint, classification problems in medicine should be treated as multi-output classification, of which multi-label classification is one form.

Most earlier work on tongue classification classifies each label independently and ignores the latent dependencies among labels, even though exploiting the latent dependencies among targets can improve multi-label image classification to some extent. The few studies that do use multiple labels either do not use deep learning or do not fully mine the dependencies among labels, which limits the accuracy of tongue classification.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a multi-scale, multi-label fused TCM tongue image classification method that overcomes the low tongue classification accuracy and related problems of the prior art.

The technical solution adopted by the invention is a multi-scale, multi-label fused TCM tongue image classification method comprising the following steps:

step 1, acquiring tongue images under standard lighting with fixed professional photographic equipment, preprocessing the acquired images, and constructing an original tongue image data set;

step 2, constructing a multi-scale feature fusion network: extracting features from the input tongue image with a feature pyramid network structure, fusing corresponding layer features in the pyramid model through superposition to construct the final output features, and integrating the fused features;

step 3, labeling the tongue image features extracted in step 2 with a semi-automatic labeling method;

step 4, dividing the features labeled in step 3 into different sub-label sets according to tongue region, and then integrating all the sub-label sets into a multi-label data set. For example, analysis shows that the feature sub-label set of the tongue tip region concerns the heart and lungs, while the cracks of a cracked tongue are mainly distributed at the tongue root and the middle of the tongue, so a cracked tongue only needs to be analyzed in two regions: the tongue root and the middle of the tongue;

step 5, training a classification model with a multi-label classification method, automatically mining the correlations among labels during training and applying them in the classification model so that tongue classification is more comprehensive;

step 6, inputting a tongue photograph to be tested and judging its validity; if it meets the requirement, proceeding to the next step;

step 7, feeding the tongue image to be tested into the trained model for prediction and outputting the classification result.

The semi-automatic labeling method used in step 3 employs the open-source Python image annotation software Labelme. The approximate structure of the tongue image is first divided into regions and labeled manually in the software; the labeling method and range are then determined manually; the software labels the remaining images automatically; and finally the labels are checked manually.
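The final manual check can be partly supported by a script that flags suspicious auto-labeled files. The sketch below is a minimal illustration, assuming the labels are stored in Labelme's JSON annotation files (a top-level `shapes` list whose entries carry a `label` field); the region names and the rule "flag any image missing an expected region" are hypothetical and not taken from the patent.

```python
import json
from typing import Set

def labels_from_labelme(annotation_json: str) -> Set[str]:
    """Return the distinct shape labels in one Labelme JSON annotation."""
    data = json.loads(annotation_json)
    # Labelme stores each drawn polygon/rectangle as an entry in "shapes".
    return {shape["label"] for shape in data.get("shapes", [])}

def needs_manual_review(labels: Set[str], required_regions: Set[str]) -> bool:
    """Flag an auto-labeled image if any expected region label is missing.

    The review rule and the region names are hypothetical, for illustration only.
    """
    return not required_regions.issubset(labels)

# Hypothetical region names; the patent does not define a label vocabulary.
REQUIRED_REGIONS = {"tip", "middle", "root", "left_side", "right_side"}

sample = json.dumps({
    "shapes": [{"label": "tip", "points": [[10, 10], [50, 10], [30, 40]],
                "shape_type": "polygon"}],
    "imagePath": "tongue_001.jpg",
})
print(needs_manual_review(labels_from_labelme(sample), REQUIRED_REGIONS))  # True: only "tip" was labeled
```

Files that are flagged would then go to the medical reviewers first; files that pass still receive the final manual check described above.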

The specific process of training the classification model with the multi-label classification method in step 5 is as follows: after the tongue image enters the convolutional neural network, the various sub-label sets are analyzed automatically and the correlations among small targets are mined, so that the various characteristics of the tongue image can be analyzed; the parameters are then adjusted continuously during training to optimize the whole network model, and the optimal weights are saved.
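The patent does not specify the loss function. A common choice for multi-label training, assumed here, is an independent sigmoid per label with binary cross-entropy. The NumPy sketch below trains only a linear head on precomputed feature vectors (a stand-in for the CNN) and keeps the weights with the lowest training loss, mirroring "saving the optimal weights"; in practice the best weights would normally be selected on a validation set. `train_multilabel_head` is a hypothetical name.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(probs: np.ndarray, targets: np.ndarray, eps: float = 1e-7) -> float:
    """Binary cross-entropy averaged over all samples and labels."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return float(-np.mean(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs)))

def train_multilabel_head(features, targets, epochs=200, lr=0.5):
    """Gradient descent on a linear multi-label head; returns (best_loss, W, b).

    features: (n, d) array standing in for CNN features (assumption).
    targets:  (n, k) multi-hot label matrix.
    """
    n, d = features.shape
    k = targets.shape[1]
    W, b = np.zeros((d, k)), np.zeros(k)
    best_loss, best_W, best_b = np.inf, W.copy(), b.copy()
    for _ in range(epochs):
        probs = sigmoid(features @ W + b)
        loss = bce_loss(probs, targets)
        if loss < best_loss:  # keep the best weights seen so far
            best_loss, best_W, best_b = loss, W.copy(), b.copy()
        # Gradient of the per-sample BCE summed over labels, averaged over samples
        # (a constant multiple of the gradient of bce_loss).
        grad = (probs - targets) / n
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return best_loss, best_W, best_b

# Toy data: two labels that can be present at the same time.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
Y = np.stack([X[:, 0] > 0, X[:, 1] + X[:, 2] > 0], axis=1).astype(float)
loss, W, b = train_multilabel_head(X, Y)
print(f"best training loss: {loss:.3f}")
```

Because each label has its own sigmoid, the labels do not compete with each other as they would under a softmax, which is what allows several tongue characteristics to be predicted at once.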

The validity of the tongue photograph to be tested in step 6 is judged by checking whether the proportion of the whole photograph occupied by the tongue body meets the requirement.

The threshold for this proportion is 80%: the requirement is met when the proportion is greater than or equal to 80%, and not met when it is less than 80%.
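A minimal sketch of the step-6 check, assuming a binary tongue mask for the photograph is already available (how the mask is obtained, for example by a segmentation model, is not specified in the patent). `tongue_area_ratio` and `is_valid_tongue_photo` are hypothetical helper names.

```python
import numpy as np

def tongue_area_ratio(mask: np.ndarray) -> float:
    """Fraction of photo pixels that belong to the tongue (mask is nonzero on tongue pixels)."""
    return np.count_nonzero(mask) / mask.size

def is_valid_tongue_photo(mask: np.ndarray, threshold: float = 0.80) -> bool:
    """Step-6 check: valid when the tongue covers at least `threshold` of the photo."""
    return tongue_area_ratio(mask) >= threshold

mask = np.zeros((10, 10), dtype=bool)
mask[:8, :] = True                  # tongue covers 80 of 100 pixels
print(is_valid_tongue_photo(mask))  # True: 0.80 >= 0.80
```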

Beneficial effects of the invention: the method applies deep learning to tongue feature classification, increases the resolution and diversity of the features by extracting and fusing multi-scale features in the feature pyramid network, and performs multi-label classification, thereby improving the accuracy and robustness of tongue image classification.

Drawings

FIG. 1 is a schematic diagram of tongue image classification structure according to the present invention;

FIG. 2 is a schematic diagram of a feature extraction network according to an embodiment of the present invention;

FIG. 3 is a schematic illustration of tongue region positioning and characterization according to an embodiment of the present invention;

FIG. 4 is a schematic overall flowchart of the tongue image classification method according to the present invention.

Detailed Description

The invention provides a multi-scale, multi-label fused TCM tongue image classification method. The resolution of the acquired tongue image is first increased to make small targets easier to recognize; features are then extracted from the processed image, fused, and labeled; finally, correlation analysis among the labels is performed, yielding more comprehensive multi-label tongue classification.

The training part of the invention consists of two parts (as shown in FIG. 1): construction of the feature pyramid network and construction of the deep model, which together carry out the complete multi-scale, multi-label fused tongue classification process. The feature pyramid network serves two main purposes: building the tongue image data set and extracting features (mainly extracting, fusing, and re-extracting tongue image features). Construction of the deep model comprises two main processes: tongue feature labeling and tongue classification.

1. Feature pyramid network construction stage

First, a deep network is constructed on the deep convolutional feature paradigm, and a pyramid strategy is used to fuse multi-scale features into a deep abstract representation of the input tongue image.

The feature pyramid network improves the quality of the tongue image's high-level features and forms a higher-resolution feature map, which prevents small targets from being lost through the repeated downsampling of a deep convolutional network. (Low-level features are generic, easy to express, and carry more positional and detail information; high-level features are complex, global semantic features that are hard to interpret, have low resolution, and perceive details poorly.)

The feature pyramid network is constructed as follows:

(1) Construct the tongue image data set. Tongue images are collected under standard lighting so that the photographs are sharp; the subject keeps the tongue as relaxed as possible, with the tongue surface held flat and the tongue body fully exposed. The acquired images are then preprocessed: the deflection of the tongue body is corrected, redundant content at the tongue root is removed, and the tongue body structure is extracted according to its division into the tongue tip and the two sides of the tongue, forming the tongue image data set.
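The region-based preprocessing can be sketched as cropping the image to the tongue's bounding box and cutting the crop into regions. This is only an illustration under assumptions not stated in the patent: a binary tongue mask is given, deflection correction has already been applied, the root is at the top of the image and the tip at the bottom, and the regions are fixed fractions of the crop (thirds along the length, quarter-width strips for the sides). The function and region names are hypothetical.

```python
import numpy as np

def crop_to_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Crop `image` to the bounding box of the nonzero pixels in `mask`."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        raise ValueError("tongue mask is empty")
    return image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

def split_regions(tongue: np.ndarray, side_frac: float = 0.25) -> dict:
    """Split a cropped tongue (root at top, tip at bottom) into hypothetical regions."""
    h, w = tongue.shape[:2]
    r1, r2 = h // 3, 2 * h // 3  # root / middle / tip bands (assumed thirds)
    s = int(w * side_frac)       # width of each side strip (assumed)
    return {
        "root": tongue[:r1],
        "middle": tongue[r1:r2, s:w - s],
        "tip": tongue[r2:],
        "left_side": tongue[r1:r2, :s],
        "right_side": tongue[r1:r2, w - s:],
    }

img = np.arange(100).reshape(10, 10)
mask = np.zeros((10, 10), dtype=bool)
mask[2:8, 1:9] = True
regions = split_regions(crop_to_mask(img, mask))
print({name: part.shape for name, part in regions.items()})
```

Real tongues are not rectangular, so a production pipeline would more likely intersect these bands with the mask or use annotated region polygons; fixed fractions are used here only to keep the sketch short.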

(2) Extract tongue image features with the feature pyramid network structure. As an image passes through successive convolutions, the feature maps shrink and their semantic information becomes richer, but their resolution falls, which makes small targets hard to detect. Because a tongue image contains many small targets that must be detected and analyzed, a feature pyramid network is adopted: a series of convolutions produces feature maps running from large to small, from low level to high level, and from high resolution to low resolution, and the high-level feature maps are then restored step by step through upsampling. This enlarges the feature maps while largely preserving high-level semantic information, so that small targets can then be detected; in effect, the resolution of the high-level feature maps is increased.
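The bottom-up and top-down pathways described above can be sketched in NumPy. This is a simplified illustration, not the patent's network: 2x2 average pooling stands in for each stride-2 convolution stage, nearest-neighbour repetition is used for upsampling, and the 1x1 lateral and 3x3 smoothing convolutions of a standard feature pyramid network are omitted, so each merge reduces to element-wise addition. Labeling the outputs p2 to p5 follows common FPN naming and is an assumption.

```python
import numpy as np

def avg_pool2(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling on a (C, H, W) map; stands in for a stride-2 conv stage."""
    c, h, w = x.shape
    x = x[:, : h // 2 * 2, : w // 2 * 2]
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample2(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def feature_pyramid(x: np.ndarray, levels: int = 4) -> list:
    """Return merged maps, finest first; H and W should be divisible by 2**(levels - 1)."""
    # Bottom-up: large, high-resolution maps to small, low-resolution maps.
    bottom_up = [x]
    for _ in range(levels - 1):
        bottom_up.append(avg_pool2(bottom_up[-1]))
    # Top-down: upsample the coarser merged map and add the same-size bottom-up map.
    top_down = [bottom_up[-1]]
    for c in reversed(bottom_up[:-1]):
        top_down.append(c + upsample2(top_down[-1]))
    return list(reversed(top_down))

x = np.random.default_rng(0).normal(size=(8, 64, 64))  # (channels, H, W)
for i, p in enumerate(feature_pyramid(x), start=2):
    print(f"p{i}", p.shape)
```

Each finer level ends up carrying information propagated down from all coarser levels, which is the "restore step by step through upsampling" idea in the text.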

(3) Build the basic feature fusion network. A multi-scale feature fusion network is constructed by stacking multiple convolution layers and sampling layers; the feature extraction network within the feature pyramid network is shown in FIG. 2. A large-scale feature layer (e.g., p2) is low-level and high-resolution, with rich color features; it is downsampled to the size of p3. A small-scale feature layer (e.g., p4) is high-level and low-resolution; it is upsampled to the size of p3. The feature points at corresponding positions are then added to produce a feature map of the same size as p3. In this way the tongue image feature maps are fused by superposition: corresponding layer features in the pyramid model are combined to construct the final output features, and sampling allows feature maps of different sizes to be superimposed on one another to form a higher-resolution feature map.
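The p3-centred fusion can be sketched as follows, assuming all three levels already have the same number of channels (a real network would typically align channels with 1x1 convolutions) and that 2x2 average pooling is the downsampling operation; the patent does not name the specific operators. `fuse_at_p3` is a hypothetical name.

```python
import numpy as np

def downsample2(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling of a (C, H, W) map; H and W must be even (assumed operator)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample2(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def fuse_at_p3(p2: np.ndarray, p3: np.ndarray, p4: np.ndarray) -> np.ndarray:
    """Resize p2 (down) and p4 (up) to p3's size and add corresponding feature points."""
    down, up = downsample2(p2), upsample2(p4)
    if not (down.shape == p3.shape == up.shape):
        raise ValueError(f"shape mismatch: {down.shape}, {p3.shape}, {up.shape}")
    return down + p3 + up

p2 = np.full((4, 16, 16), 1.0)  # large scale: high resolution, low level
p3 = np.full((4, 8, 8), 2.0)
p4 = np.full((4, 4, 4), 3.0)    # small scale: low resolution, high level
fused = fuse_at_p3(p2, p3, p4)
print(fused.shape)
```

The fused map keeps p3's spatial size, which matches the choice of the fused p3 layer as the final feature layer.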

To address the tendency of small targets to be lost in multi-label image classification, the basic feature fusion network uses the feature pyramid network as its base feature extractor and takes the fused p3 layer as the final feature layer.

2. Deep model construction stage

The deep model consists mainly of a tongue feature labeling architecture and a multi-label tongue feature classification model. A semi-automatic labeling method is applied to the fused tongue feature maps to obtain a multi-label data set, and a multi-label classification network is then trained on it to solve the multi-label tongue classification problem.

(1) Tongue image labeling architecture

For the fused tongue feature images, medical personnel first label each label of several groups of sample tongue images by hand, drawing on professional medical knowledge to capture the specific pathological information represented by each fixed region of the tongue. Each label corresponds to the physiological and pathological information of one tongue region, the amount of each kind of information under each label is kept as balanced as possible, and the annotation is image-level weakly supervised labeling. A simple automatic annotation tool (the open-source Python image annotation software Labelme) then labels the remaining images automatically, following the method and range established by the medical personnel. The feature labels belonging to a tongue region of interest are merged into one large label (for example, in FIG. 3 all labels in the heart and lung region are merged into one large label) to obtain the multi-label data set, which medical professionals finally review. The positioning of the tongue regions and their features is shown in FIG. 3.
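Merging region sub-labels into large labels and producing the multi-label data set can be sketched as a multi-hot encoding. The sub-label names and the grouping table below are hypothetical placeholders; only the idea of merging region labels (the heart and lung region in FIG. 3, or cracks at the root and the middle of the tongue) comes from the text.

```python
import numpy as np

# Hypothetical fine-grained region labels mapped to merged "large" labels.
LABEL_GROUPS = {
    "tip/red": "heart_lung",
    "tip/swollen": "heart_lung",
    "middle/crack": "cracked",
    "root/crack": "cracked",
    "coating/white": "white_coating",
}

def build_multilabel_dataset(samples, groups=LABEL_GROUPS):
    """samples: {image_name: [sub_label, ...]} -> (image_names, vocabulary, multi-hot matrix)."""
    # Sub-labels without an entry in `groups` are kept unchanged.
    merged = {name: {groups.get(lab, lab) for lab in labs} for name, labs in samples.items()}
    vocab = sorted(set().union(*merged.values()))
    index = {lab: i for i, lab in enumerate(vocab)}
    names = sorted(merged)
    Y = np.zeros((len(names), len(vocab)), dtype=np.int8)
    for row, name in enumerate(names):
        for lab in merged[name]:
            Y[row, index[lab]] = 1
    return names, vocab, Y

samples = {"img1": ["tip/red", "middle/crack"], "img2": ["root/crack", "coating/white"]}
names, vocab, Y = build_multilabel_dataset(samples)
print(vocab)
print(Y)
```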

(2) Multi-label tongue feature classification model

Unlike traditional classification, multi-label classification is a more complex task in which each sample may belong to one or more classes at the same time. Treating tongue classification as a multi-label learning problem therefore allows tongue images to be classified more accurately and more comprehensively. Traditional tongue classification usually outputs a single attribute with a single label, whereas the proposed method detects and outputs several characteristics simultaneously. For example, for an input tongue image the prediction might be: red tongue with white coating, moist body fluid, and no cracks, with the tongue regions indicating hyperactive heart fire, spleen and stomach deficiency, and other characteristics.
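The difference from single-label output can be shown by decoding each label with its own sigmoid and threshold. The label names, the example logits, and the 0.5 threshold are assumptions for illustration; the patent does not state how the network's outputs are decoded.

```python
import numpy as np

# Hypothetical output vocabulary; the patent gives these only as prose examples.
LABELS = ["red_tongue", "white_coating", "moist_fluid", "cracked",
          "heart_fire", "spleen_stomach_deficiency"]

def predict_labels(logits, labels=LABELS, threshold=0.5):
    """Decode one image's logits into every label whose sigmoid probability >= threshold."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return [lab for lab, p in zip(labels, probs) if p >= threshold]

# Hypothetical logits for one tongue image
print(predict_labels([2.0, 1.0, 0.5, -3.0, 0.2, -0.1]))
# ['red_tongue', 'white_coating', 'moist_fluid', 'heart_fire']
```

A softmax-plus-argmax decoder would return only one of these labels; thresholding each sigmoid independently lets several characteristics be reported for the same image.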

A multi-label convolutional neural network is trained on the feature information contained in the labels of each tongue region, and labels for the remaining samples are inferred, yielding the multi-label classification model. During training, the correlations among labels are mined automatically and applied within the classification model, making tongue classification more accurate.
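The patent does not say how label correlations are mined. One widely used option, swapped in here purely as an illustration, is to estimate the conditional co-occurrence probabilities P(L_j | L_i) from the multi-hot training labels; such a matrix can, for example, serve as the adjacency matrix of a graph-based multi-label classifier or be used to re-weight predictions. `label_cooccurrence` is a hypothetical name.

```python
import numpy as np

def label_cooccurrence(Y) -> np.ndarray:
    """Return P with P[i, j] = P(label j present | label i present), from a multi-hot matrix."""
    Y = np.asarray(Y, dtype=float)
    counts = Y.T @ Y                  # counts[i, j] = number of samples with both i and j
    occur = np.diag(counts).copy()    # occur[i] = number of samples with label i
    with np.errstate(divide="ignore", invalid="ignore"):
        # Rows for labels that never occur are set to 0 instead of NaN.
        P = np.where(occur[:, None] > 0, counts / occur[:, None], 0.0)
    return P

Y = [[1, 1, 0],
     [1, 0, 1],
     [1, 1, 0],
     [0, 0, 0]]
print(np.round(label_cooccurrence(Y), 3))
```

The matrix is asymmetric by design: in this toy data every sample with label 1 also has label 0, but only two thirds of the samples with label 0 have label 1.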

Claims

1. A multi-scale, multi-label fused traditional Chinese medicine tongue image classification method, characterized by comprising the following steps:

step 4, dividing the features labeled in step 3 into different sub-label sets according to tongue region, and then integrating all the sub-label sets into a multi-label data set;

2. The multi-scale, multi-label fused traditional Chinese medicine tongue image classification method according to claim 1, wherein the semi-automatic labeling method used in step 3 employs the open-source Python image annotation software Labelme: the general structure of the tongue images is first manually partitioned into regions and labeled in the software, the labeling method and range are then determined manually, the remaining images are labeled automatically by the software, and finally the labels are checked manually.

3. The multi-scale, multi-label fused traditional Chinese medicine tongue image classification method according to claim 1, wherein the specific process of training the classification model with the multi-label classification method in step 5 is: after the tongue image enters the convolutional neural network, the various sub-label sets are analyzed automatically and the correlations among small targets are mined so as to analyze the various characteristics of the tongue image; the parameters are then adjusted continuously during training to optimize the whole network model, and the optimal weights are saved.

4. The multi-scale, multi-label fused traditional Chinese medicine tongue image classification method according to claim 1, wherein the validity of the tongue photograph to be tested in step 6 is judged by checking whether the proportion of the whole photograph occupied by the tongue body meets the requirement.

5. The multi-scale, multi-label fused traditional Chinese medicine tongue image classification method according to claim 4, wherein the threshold for the proportion of the whole tongue image occupied by the tongue body is 80%: the requirement is met if the proportion is greater than or equal to 80%, and not met if it is less than 80%.