CN107610087B - Tongue coating automatic segmentation method based on deep learning - Google Patents


Info

Publication number
CN107610087B
Authority
CN
China
Prior art keywords
tongue
tongue coating
image
deep learning
area
Prior art date
Legal status
Active
Application number
CN201710338958.5A
Other languages
Chinese (zh)
Other versions
CN107610087A (en)
Inventor
Wen Guihua (文贵华)
Zeng Haibin (曾海彬)
Ma Jiajiong (马佳炯)
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201710338958.5A priority Critical patent/CN107610087B/en
Publication of CN107610087A publication Critical patent/CN107610087A/en
Application granted granted Critical
Publication of CN107610087B publication Critical patent/CN107610087B/en

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an automatic tongue coating segmentation method based on deep learning, comprising the following steps: S1, collecting and inputting an image containing the tongue coating; S2, detecting the tongue coating in the image using the Faster R-CNN deep learning method to automatically obtain a preliminary tongue coating area image; S3, calibrating the preliminary tongue coating area image using a VGG deep learning method to obtain a more accurate tongue coating area image; and S4, automatically segmenting the tongue coating image according to the calibrated tongue coating area image. Built on a big-data deep learning approach, the method achieves more accurate tongue coating segmentation and addresses the low segmentation accuracy of existing methods.

Description

Tongue coating automatic segmentation method based on deep learning
Technical Field
The invention relates to the application of computer vision, image processing and artificial intelligence in the field of traditional Chinese medicine health, and in particular to an automatic tongue coating segmentation method based on deep learning.
Background
The tongue coating carries a large amount of information about human body constitution, which means that constitution types can be judged objectively by observing the tongue coating. However, doing so requires the rich professional experience of traditional Chinese medicine experts and is difficult for general practitioners and ordinary people, so realizing automatic tongue coating analysis with artificial intelligence technology is very important. The premise of such automation is automatically segmenting the tongue coating from a facial image containing it, and the segmentation accuracy of current methods is low.
Tongue coating segmentation is the detection of the tongue body region and the segmentation of the tongue body, and is an important prerequisite for the extraction and analysis of tongue image characteristics. It is generally implemented with common image segmentation algorithms, including threshold segmentation, spatial clustering, region growing, edge detection, contour tracking and the like. Early studies were mainly based on low-level image information; for example, Zhao Zaixu et al. proposed a tongue image segmentation algorithm based on mathematical morphology and the HSI color model, and Liu Guansong et al. proposed an automatic tongue image segmentation method based on luminance information and morphological features. More recent work mainly uses the Snake model and its deformations: for example, Wu Xing proposed an algorithm based on double-layer polar-coordinate edge detection to obtain a rough tongue edge, then used a Snake model to correct that rough edge according to local details and obtain an accurate tongue contour, and Sun Dalin used an improved Snake model to extract the tongue body region. In addition, Zhang Shing proposed an improved wavelet transform method to detect the tongue edge, and later an improved non-parametric active contour model to extract the tongue. However, deep learning methods with good results have not yet been applied to tongue coating detection and segmentation, so existing results remain poor. The invention adopts deep learning, specifically Faster R-CNN (Ren S, He K, Girshick R, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015) and VGG (Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition. Computer Science, 2014), to realize automatic detection and segmentation of the tongue coating.
Disclosure of Invention
In order to overcome the defects and shortcomings of the prior art, the invention provides an automatic tongue coating segmentation method based on deep learning, which realizes more accurate tongue coating segmentation through a big-data deep learning method and solves the low tongue coating segmentation accuracy of the prior art.
In order to solve the technical problems, the invention provides the following technical scheme: a tongue coating automatic segmentation method based on deep learning comprises the following steps:
s1, collecting and inputting an image containing the tongue coating;
s2, detecting the tongue coating of the image containing the tongue coating by adopting a Faster R-CNN deep learning method, and automatically obtaining a preliminary tongue coating area image;
S3, calibrating the preliminary tongue coating area image by adopting a VGG deep learning method to obtain a more accurate tongue coating area image;
and S4, automatically segmenting the tongue coating image according to the calibrated tongue coating area image.
Further, in step S1, the image containing the tongue coating is acquired by a smart mobile device, a camera or a computer.
Further, in step S2, the Faster R-CNN deep learning method requires constructing a Faster R-CNN model structure, which comprises a convolutional neural network, an RPN, RoI pooling and CNET.
Further, step S3 specifically comprises:
S31, adjusting the size of the preliminary tongue coating area image, performing mean-removal processing, and passing the image into the calibration network to obtain a deviation category;
S32, obtaining the corresponding x_n deviation, y_n deviation and scaling ratio s_n according to the deviation category;
S33, adjusting the tongue coating area by the inverse operation according to the x_n deviation, y_n deviation and scaling ratio s_n, the adjusted area being the final tongue coating detection area, wherein the inverse operation formula is as follows:
[Formula image in original: the inverse of the region-generation formula (2) in section 3.2, recovering the real area (x, y, w, h) from the deviated area and (s_n, x_n, y_n)]
wherein (x, y, w, h) are the top-left x coordinate, top-left y coordinate, width and height of the real tongue coating area; and:
s_n ∈ {0.83, 0.91, 1.0, 1.10, 1.21}
x_n ∈ {-0.17, 0, 0.17}
y_n ∈ {-0.17, 0, 0.17}.
Further, step S4 specifically comprises: calculating the coordinates of the four vertices of the calibrated tongue coating area image, connecting the four vertices, and segmenting the area within them to obtain the final tongue coating image.
After adopting the above technical scheme, the invention has at least the following beneficial effects:
1. The invention segments the tongue coating with high accuracy;
2. The invention adopts a big-data deep learning method, places low requirements on the quality of the input image, allows photos taken with a smartphone and the like, and therefore has a wide range of application.
Drawings
FIG. 1 is a flowchart of the steps of the deep-learning-based automatic tongue coating segmentation method of the present invention;
FIG. 2 is a schematic structural diagram of the Faster R-CNN model used by the Faster R-CNN deep learning method in the method of the present invention;
FIG. 3 is a schematic diagram of the structure of the front 13 layers of the VGG-16 model used by the Faster R-CNN deep learning method in the method of the present invention;
FIG. 4 is a schematic diagram of the RPN structure of the Faster R-CNN deep learning method in the method of the present invention;
FIG. 5 is a schematic diagram of the calibration network structure of the VGG deep learning method in the method of the present invention.
Detailed Description
It should be noted that, in the present application, the embodiments and the features of the embodiments may be combined with each other without conflict. The present application is described in further detail below with reference to the drawings and specific embodiments.
As shown in FIG. 1, the present invention provides an automatic tongue coating segmentation method based on deep learning, which comprises four steps:
Step (one), acquiring a tongue coating image through a camera or the like as the input tongue coating image;
Step (two), detecting the tongue coating with the deep learning method Faster R-CNN and automatically obtaining a preliminary tongue coating area image;
Step (three), automatically calibrating the tongue coating area image with the deep learning method VGG-16 to obtain a more accurate tongue coating area image;
Step (four), automatically segmenting the tongue coating image according to the calibrated tongue coating area image.
Each step is described in detail below.
Step (one), acquiring a tongue coating image through a camera or the like as the input tongue coating image: the image can be collected with a smartphone, a tablet computer, a camera and the like, and is passed directly to step (two) as the input image without any preprocessing.
Step (two), tongue coating detection: tongue coating detection is achieved with the deep learning method Faster R-CNN.
2.1, Faster R-CNN model structure:
The structure of the Faster R-CNN model is shown in FIG. 2; it mainly comprises a convolutional neural network (CNN), an RPN (Region Proposal Network), RoI (Region of Interest) pooling and CNET.
2.1.1, convolutional neural network:
The convolutional neural network is the most common deep neural network in the field of image recognition and is mainly used to extract image features. This embodiment adopts one such model: the front 13 layers of the VGG-16 model (i.e., the part with the fully connected layers removed), whose structure is shown in FIG. 3 and which mainly involves convolution, ReLU and max pooling operations.
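For illustration, such a truncated backbone can be sketched with PyTorch/torchvision rather than the Lua Torch framework used in this embodiment; the variable names are illustrative only:

```python
import torch
import torchvision

# The "front 13 layers" of VGG-16: all conv/ReLU/max-pool blocks,
# i.e. vgg16.features, with the fully connected classifier removed.
vgg16 = torchvision.models.vgg16(weights=None)  # weights would be loaded separately
backbone = vgg16.features                       # 13 conv layers + ReLU + 5 max-pools

x = torch.randn(1, 3, 224, 224)                 # dummy tongue-image batch
feat = backbone(x)
print(feat.shape)                               # -> torch.Size([1, 512, 7, 7])
```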
2.1.2, RPN:
The RPN generates candidate regions on top of the shared convolutional neural network, i.e., it learns where to attend. It aims to replace the usual candidate-region algorithms in object detection and improve speed. The structure of the RPN is shown in FIG. 4. The RPN slides a window over the convolutional feature map (conv feature map) extracted by the convolutional neural network to produce a 256-dimensional intermediate feature, which then feeds two fully connected branches: the cls layer and the reg layer. The cls layer performs a binary classification task to judge whether the current region is foreground or background. The reg layer performs a regression task to predict the coordinates (x, y) and the width and height (w, h) of the candidate region corresponding to the anchor point (Anchor) at the center of the current region. Within the RPN structure, the concept of anchor points and the computation of the loss function need to be understood with particular care.
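A minimal sketch of this RPN head, assuming a PyTorch-style implementation in which a 3 × 3 convolution plays the role of the sliding window and 1 × 1 convolutions realize the cls and reg branches (all names illustrative):

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    """Sliding-window RPN head: one 3x3 conv, then cls and reg branches."""
    def __init__(self, in_channels=512, mid_channels=256, num_anchors=12):
        super().__init__()
        # 3x3 conv == the sliding window producing a 256-d feature per location
        self.sliding = nn.Conv2d(in_channels, mid_channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        # cls layer: foreground/background score for each of k anchors
        self.cls = nn.Conv2d(mid_channels, num_anchors * 2, 1)
        # reg layer: (x, y, w, h) offsets for each of k anchors
        self.reg = nn.Conv2d(mid_channels, num_anchors * 4, 1)

    def forward(self, feature_map):
        h = self.relu(self.sliding(feature_map))
        return self.cls(h), self.reg(h)

scores, deltas = RPNHead()(torch.randn(1, 512, 30, 30))
print(scores.shape, deltas.shape)  # (1, 24, 30, 30) (1, 48, 30, 30)
```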
Anchor points (Anchors) are located at the center of the sliding window. An anchor point can generate k candidate regions with different scales (Scale) and different aspect ratios, so k candidate regions can be predicted from the same sliding window. In the specific implementation of this embodiment, 4 scales and 3 aspect ratios are set, as shown in Table 1, giving 12 candidate regions in total. For a convolutional feature map of size W × H, a 3 × 3 sliding window (stride 1) yields W × H × k anchor points.
The loss function in the RPN includes two parts corresponding to the cls layer and the reg layer: the loss function of the classification task and that of the regression task. The classification task uses a cross-entropy loss to classify a candidate region as foreground or background; see formula (1). The regression task uses the SmoothL1 loss to predict the coordinates and the width and height of the target region; see formula (2). The loss function of the RPN part combines the two with a certain weight; see formula (3). Here {p_i} and {t_i} are the outputs of the cls layer and the reg layer respectively, p_i* = 1 when the input sample is foreground and p_i* = 0 when it is background, t_i* denotes the coordinates of the correct target area, and λ is set to 10.
TABLE 1 Anchor scale and aspect ratio settings
Anchor region sizes (scales): 48², 96², 192², 384²
Aspect ratios: 1:1, 2:1, 1:2
L_cls(x, class) = -x[class] + ln(Σ_j e^{x[j]})    (1)
where x is a one-dimensional vector of size n containing the scores of the n classes, and class is the target class.
SmoothL1(x, y) = Σ_i smooth_L1(x_i − y_i), where smooth_L1(z) = 0.5·z² if |z| < 1 and |z| − 0.5 otherwise    (2)
where x is the neural network output value and y is the target value.
L({p_i}, {t_i}) = (1/N_cls)·Σ_i L_cls(p_i, p_i*) + λ·(1/N_reg)·Σ_i p_i*·SmoothL1(t_i, t_i*)    (3)
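These three formulas can be sketched as follows, assuming a PyTorch-style implementation; the foreground mask applies the regression term only where p_i* = 1 (names illustrative):

```python
import torch
import torch.nn.functional as F

def rpn_loss(cls_scores, labels, reg_out, reg_targets, lam=10.0):
    """Combined RPN loss per formula (3).

    cls_scores:  (N, 2) foreground/background scores {p_i}
    labels:      (N,)   p_i* in {0, 1}
    reg_out:     (N, 4) predicted (x, y, w, h) {t_i}
    reg_targets: (N, 4) ground-truth (x, y, w, h) {t_i*}
    """
    # Formula (1): cross entropy, -x[class] + ln(sum_j e^{x[j]})
    l_cls = F.cross_entropy(cls_scores, labels)
    # Formula (2): SmoothL1, applied only where p_i* = 1 (foreground)
    fg = labels == 1
    l_reg = F.smooth_l1_loss(reg_out[fg], reg_targets[fg]) if fg.any() else 0.0
    # Formula (3): weighted combination
    return l_cls + lam * l_reg

scores = torch.randn(6, 2)
labels = torch.tensor([1, 0, 1, 0, 0, 1])
print(rpn_loss(scores, labels, torch.randn(6, 4), torch.randn(6, 4)))
```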
2.1.3, RoI pooling:
RoI pooling is a simplified version of SPP (Spatial Pyramid Pooling); it samples inputs of different sizes so that the resulting outputs have the same size. SPP fixes the size of the final feature vector first, and derives the pooling kernel size and stride from it. For a convolutional feature map of size W × H × D with SPP scales of 4 × 4, 2 × 2 and 1 × 1, the 4 × 4 scale, for example, determines a pooling kernel of size ceil(W/4) × ceil(H/4) with strides floor(W/4) and floor(H/4). RoI pooling is a one-level SPP using only one scale, set here to a 6 × 6 output.
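The kernel/stride arithmetic and the one-level 6 × 6 pooling can be sketched like this, using adaptive max pooling as a convenient stand-in for RoI pooling (illustrative, not the embodiment's exact implementation):

```python
import math
import torch
import torch.nn.functional as F

def spp_kernel_and_stride(W, H, scale):
    """SPP derives the pooling kernel size and stride from the output scale."""
    kernel = (math.ceil(H / scale), math.ceil(W / scale))
    stride = (math.floor(H / scale), math.floor(W / scale))
    return kernel, stride

def roi_pool_6x6(feature_map):
    """One-level SPP: pool any H x W x D input to a fixed 6 x 6 x D output."""
    return F.adaptive_max_pool2d(feature_map, output_size=(6, 6))

print(spp_kernel_and_stride(W=30, H=20, scale=4))        # ((5, 8), (5, 7))
print(roi_pool_6x6(torch.randn(1, 512, 20, 30)).shape)   # (1, 512, 6, 6)
```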
2.1.4, CNET:
The CNET part mainly consists of Box regression and Classifier. Box regression is composed of a fully connected layer, a batch normalization layer and a Dropout layer, and is used to predict the position of the target region. The Classifier is a neural network classifier composed of a fully connected layer, a batch normalization layer, a Dropout layer and a Softmax layer, used to identify the target class. Box regression is a regression task using the SmoothL1 loss function; the Classifier is a classification task using the cross-entropy loss function. The loss function of the whole CNET is a weighted combination of the Box regression loss and the Classifier loss.
2.2, Training process of Faster R-CNN:
Deep neural network training mainly consists of forward propagation and backward propagation: the network produces an output by forward propagation, the error between the output and the label is computed with a loss function, and the network parameters are continually updated by propagating the error backwards. Faster R-CNN is a complex deep neural network, and the Faster R-CNN paper indicates that directly training the whole network does not work well enough; it must be split into parts and trained step by step. This embodiment pre-trains the convolutional neural network part of Faster R-CNN by training a tongue coating classifier. The parameters of the pre-trained convolutional neural network part are then fixed while the RPN is trained. Finally, the parameters of the whole Faster R-CNN are fine-tuned through the loss functions of Box regression and Classifier.
2.2.1, pre-training of convolutional neural networks:
The convolutional neural network part of Faster R-CNN uses the first thirteen layers of the VGG-16 model, with the main purpose of extracting tongue coating image features. A binary classifier is trained on the VGG-16 model structure to distinguish tongue coating from non-tongue coating. The VGG-16 parameter configuration is shown in Table 2; since ReLU requires no additional parameters, ReLU layers are omitted from the table. The convolutional neural network part of Faster R-CNN then uses the parameters of the first thirteen layers of the trained VGG-16 model, so that it extracts tongue coating image features better. In other words, the convolutional neural network part of Faster R-CNN is pre-trained with a large number of tongue coating and non-tongue coating images.
TABLE 2 VGG-16 model parameter configuration Table
[Table 2 image in original: layer-by-layer VGG-16 parameter configuration]
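One illustrative way to set up such a binary tongue-coating/non-tongue-coating classifier is to replace the 1000-way VGG-16 head with a 2-way output; the sketch below uses PyTorch/torchvision and is an assumption, not the embodiment's exact code:

```python
import torch
import torch.nn as nn
import torchvision

clf = torchvision.models.vgg16(weights=None)
# Replace the 1000-way ImageNet head with a 2-way tongue / non-tongue head.
clf.classifier[6] = nn.Linear(4096, 2)

logits = clf(torch.randn(8, 3, 224, 224))  # 224x224 crops from the database
print(logits.shape)                        # -> torch.Size([8, 2])
```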
The tongue coating detection database used in this embodiment contains 5683 images, each annotated with the real tongue coating area. Pre-training the convolutional neural network part of Faster R-CNN to improve its ability to extract tongue coating image features requires a large number of tongue coating and non-tongue coating images. In this embodiment, 5 positive-example tongue coating images and 5 negative-example images (i.e., non-tongue coating images) are randomly extracted from each annotated tongue coating image, giving 56830 images in total as the pre-training data set. Positive and negative examples are selected according to the IoU (Intersection over Union) value with the labeled area; the IoU value is the ratio of the overlapping area of two regions to the area of their union (an illustrative sketch of this sampling follows Table 3). In this embodiment, regions are generated randomly in each tongue coating image: regions with IoU greater than 0.6 are positive examples, and regions with IoU less than 0.2 whose area is not less than half of the correct area are negative examples. Since the VGG-16 model requires 224 × 224 input images, all positive and negative example images are resized to 224 × 224. Of the 56830 images in the pre-training data set, 80% form the training set and 20% the test set. The mean and variance of the training set are computed, and mean removal is applied at the input layer of the neural network, centering every dimension of the input data at 0 and accelerating convergence. Furthermore, to increase the number of training samples, training samples are flipped horizontally with 50% probability. This embodiment is implemented with the Torch deep learning framework; the training parameters are shown in Table 3.
TABLE 3 Pre-training related parameters Table
[Table 3 image in original: pre-training hyperparameters]
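The IoU computation and the positive/negative sampling rule described above might look like the following plain-Python sketch (thresholds taken from the text; helper names are illustrative):

```python
def iou(a, b):
    """Intersection over union of boxes (x, y, w, h) with top-left origin."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def label_example(region, gt):
    """Positive if IoU > 0.6; negative if IoU < 0.2 and area >= half of gt's."""
    v = iou(region, gt)
    if v > 0.6:
        return "positive"
    if v < 0.2 and region[2] * region[3] >= 0.5 * gt[2] * gt[3]:
        return "negative"
    return None  # region is discarded

print(label_example((10, 10, 100, 100), (15, 15, 100, 100)))  # positive
```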
2.2.2 training of RPN:
The RPN is intended to generate candidate regions. As introduced above, its network structure slides a window over the convolutional feature map extracted by the convolutional neural network to produce a 256-dimensional fully connected feature. In the specific implementation, due to programming considerations, the RPN structure differs slightly from the description above, but its overall function and concept are the same. In this embodiment the shortest side of the image is fixed at 480 pixels, and the four anchor region sizes are set to 48², 96², 192² and 384². One RPN is designed for each anchor region size, and each RPN recommends 3 regions per anchor according to the three aspect ratios (1:1, 2:1, 1:2), realizing the function of recommending 12 regions per anchor. Considering the differences in anchor region size, the sliding window sizes used by the four RPNs also differ: 3 × 3, 5 × 5 and 7 × 7 are used. Owing to the characteristics of convolutional neural networks, the deeper the extracted features, the more abstract the represented information, but some detail information is lost; and the smaller the anchor region, the more detail information it needs. Therefore, in this experiment, the RPN corresponding to the 48² anchor region size uses the feature map of the 10th layer of the convolutional neural network, while the rest use the feature map of the last layer.
The goal of RPN training is to classify the fully connected feature corresponding to each anchor and to further regress the target coordinates corresponding to each anchor. The training targets are the anchor regions, and the required labels are the region category (whether the region feature belongs to foreground or background) and the target coordinates. In the experiment, the shortest side of the tongue coating image is fixed at 480 pixels, and anchor regions are generated according to the four anchor region sizes and three aspect ratios as training samples (an illustrative anchor-generation sketch follows). If the IoU value between a region and the correct area is greater than 0.6, the region is labeled as foreground; if it is less than 0.2, the region is labeled as background. The ratio of foreground regions to background regions is chosen as 1:1.
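Anchor generation for the four sizes and three aspect ratios might be sketched as below (plain Python, illustrative); each anchor keeps the nominal area s² while the aspect ratio reshapes its width and height:

```python
def make_anchors(center_x, center_y):
    """All 12 anchors (x, y, w, h) centered at one sliding-window position."""
    anchors = []
    for s in (48, 96, 192, 384):           # the four anchor region sizes s^2
        for ratio in (1.0, 2.0, 0.5):      # aspect ratios 1:1, 2:1, 1:2 (w:h)
            w = s * ratio ** 0.5           # keep area ~= s^2 while reshaping
            h = s / ratio ** 0.5
            anchors.append((center_x - w / 2, center_y - h / 2, w, h))
    return anchors

print(len(make_anchors(240, 240)))  # 12
```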
TABLE 4 RPN training-related parameters
[Table 4 image in original: RPN training hyperparameters]
The convolutional neural network part of this embodiment uses the model parameters obtained by pre-training; these parameters are fixed and unchanged, and only the RPN parameters are adjusted. The experimental environment is the same as in the pre-training experiment, and the training parameters are shown in Table 4. The tongue coating image passes through the convolutional neural network to obtain a feature map, the RPN scans the feature map with a sliding window to obtain the fully connected feature corresponding to each anchor region, and the features corresponding to the anchor regions in the training sample are then selected and passed to the cls layer and the reg layer. For foreground-region features, the cls layer classifies them as 1 and the reg layer further regresses the target area to correct the position of the recommended region; for background-region features, the cls layer classifies them as 0 and the regression task is not learned.
2.2.3, Training fine-tuning of Faster R-CNN:
The parameters of the convolutional neural network part and the RPN part are obtained through pre-training and RPN training; the Classifier and Box regression are then added, and the parameters of the whole Faster R-CNN are readjusted. The training process of Faster R-CNN mainly comprises the following steps:
(1) Data preprocessing and anchor region generation. The Faster R-CNN training data set is the tongue coating detection database. The mean and variance of the training set are computed for mean removal. Tongue coating images are resized so that the shortest side is 480 pixels. Horizontal flipping is applied with probability 0.25 to augment the training samples. Anchor regions are generated according to the four anchor region sizes and three aspect ratios, and foreground and background regions are selected at a 1:1 ratio according to their IoU with the labeled target area.
(2) The convolutional neural network extracts tongue coating image features. The preprocessed tongue coating image is passed into the convolutional neural network, which extracts its image features and generates the corresponding feature map.
(3) The RPN generates candidate regions. Scanning the feature map generated in step (2) with the RPN sliding window produces the anchor region features. The foreground and background region features selected in step (1) are passed to the cls layer and the reg layer, and the softmax loss of the cls layer (denoted L_pcls) and the SmoothL1 loss of the reg layer (denoted L_preg) are computed. For foreground regions, the reg layer generates the candidate regions; for background regions, the anchor regions themselves are used as candidate regions.
(4) RoI pooling generates fixed-size fully connected features. From the feature map generated in step (2), the features corresponding to the candidate regions generated in step (3) are selected, and RoI pooling produces fully connected features of a fixed size, solving the problem of inconsistent feature sizes caused by inconsistent picture sizes.
(5) Identification of the tongue coating and prediction of tongue coating area coordinates. The fully connected features are passed to the Box regression and Classifier parts, and the SmoothL1 loss of Box regression (denoted L_creg) and the negative log-likelihood loss of the Classifier (denoted L_ccls) are computed.
(6) Back propagation and parameter update (see the sketch after this list). Take 10·L_creg + L_ccls as the total loss of the CNET part, denoted L_cnet, and update the CNET parameters. Take 10·L_preg + L_pcls as the total loss of the RPN part, denoted L_rpn, and update the RPN parameters. Finally, take L_cnet + L_rpn as the loss of the convolutional neural network part and update its parameters.
(7) Repeat steps (1) to (6) until convergence or the maximum number of iterations is reached.
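The loss weighting of step (6) can be sketched as follows (PyTorch, illustrative names; the four inputs are the scalar loss tensors computed in steps (3) and (5)):

```python
import torch

def total_losses(l_pcls, l_preg, l_ccls, l_creg):
    """Weighted loss combination from step (6)."""
    l_rpn = 10 * l_preg + l_pcls    # total loss of the RPN part
    l_cnet = 10 * l_creg + l_ccls   # total loss of the CNET part
    l_conv = l_cnet + l_rpn         # loss driving the shared convolutional layers
    return l_rpn, l_cnet, l_conv

print(total_losses(torch.tensor(0.2), torch.tensor(0.05),
                   torch.tensor(0.3), torch.tensor(0.04)))
```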
TABLE 5 Faster R-CNN training fine-tuning parameters
[Table 5 image in original: Faster R-CNN fine-tuning hyperparameters]
This embodiment is implemented with the Torch deep learning framework; the training parameters are shown in Table 5.
2.3, Detection process of Faster R-CNN:
The computation of Faster R-CNN in the detection stage differs slightly from the training stage and mainly comprises the following steps:
(1) Data preprocessing. The same preprocessing as in the training phase is performed, and the mean and variance of the training set are likewise used for mean removal.
(2) The convolutional neural network extracts tongue coating image features. The preprocessed tongue coating image is passed into the convolutional neural network to generate the feature map.
(3) The RPN generates candidate regions. Scanning the feature map generated in step (2) with the RPN sliding window produces the anchor region features; for anchor features that the cls layer judges as foreground with confidence greater than 0.95, the reg layer predicts the corresponding tongue coating area, which is listed as a candidate region.
(4) Candidate region screening. Redundant candidate regions are eliminated with the non-maximum suppression (NMS) method to find the best candidate regions (see the sketch after this list). Non-maximum suppression filters out non-maxima: always select the candidate region with the highest foreground confidence, filter out the other candidate regions whose IoU with it exceeds 0.25, and repeat this procedure on the remaining candidate regions until all high-confidence candidate regions have been screened out.
(5) RoI pooling generates fixed-size fully connected features. From the feature map generated in step (2), the features corresponding to the screened candidate regions are selected, and RoI pooling produces fully connected features of a fixed size.
(6) The fully connected features are passed to the Classifier and Box regression; the tongue coating is identified and the tongue coating areas are predicted.
(7) The predicted tongue coating areas are screened with the non-maximum suppression method.
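The screening of step (4) can be sketched as a plain greedy NMS with the thresholds stated above, a 0.95 foreground-confidence cutoff and a 0.25 IoU cutoff (plain Python, illustrative; the iou helper repeats the one sketched in section 2.2.1):

```python
def iou(a, b):
    """IoU of boxes (x, y, w, h)."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, conf_thresh=0.95, iou_thresh=0.25):
    """Greedy NMS: keep the highest-confidence box, drop overlaps > 0.25 IoU."""
    cand = sorted(zip(scores, boxes), reverse=True)
    keep = []
    while cand:
        score, best = cand.pop(0)
        if score <= conf_thresh:
            break  # only foreground candidates above 0.95 are considered
        keep.append(best)
        cand = [(s, b) for s, b in cand if iou(b, best) <= iou_thresh]
    return keep

boxes = [(10, 10, 100, 100), (12, 12, 100, 100), (300, 300, 80, 80)]
print(nms(boxes, [0.99, 0.97, 0.98]))  # keeps the 0.99 box and the distant 0.98 box
```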
Step (three), tongue coating calibration:
The Faster R-CNN model realizes the preliminary tongue coating detection function. A deep neural network, called the calibration network, is then designed to further calibrate the tongue coating position and improve detection accuracy.
3.1, structure of calibration network:
To solve a practical problem with deep neural networks and deep learning, the first step is to determine the learning task of the network and its model structure. Deep learning mainly offers two kinds of learning task: classification and regression. The problem to be solved in this section is calibration of the tongue coating position: if the deviation between the predicted tongue coating position and the actual position were known, the predicted position could be adjusted accordingly. The tongue coating position calibration problem is therefore treated as a classification problem: the deviation from the real tongue coating position is divided into several categories, the deviation category to which a predicted position belongs is determined, and the position is then adjusted according to that category, realizing the calibration.
Since this is a classification problem, the classes must be specified. In this embodiment, the detected tongue coating positions are classified into 45 categories according to the x_n offset, y_n offset and scaling ratio s_n relative to the actual tongue coating position, as shown in formula (5-1).
(s_n, x_n, y_n) with s_n ∈ {0.83, 0.91, 1.0, 1.10, 1.21}, x_n ∈ {-0.17, 0, 0.17}, y_n ∈ {-0.17, 0, 0.17}    (5-1)
The model structure of VGG-16 is used to implement the calibration network, as shown in FIG. 5.
3.2, tongue coating calibration database:
the tongue coating calibration database is constructed for tongue coating position calibration on the basis of the tongue coating detection database. The tongue coating area can be obtained from the tongue coating detection database, and the tongue coating calibration database generates a new area from the tongue coating area according to the deviation category, intercepts a new area image and marks the category of the image as the deviation category. The tongue coating calibration database is constructed for the purpose of enabling the deep neural network to learn tongue coating deviation images in the tongue coating calibration database, judging deviation categories of the tongue coating areas, and adjusting the tongue coating areas according to the deviation categories, so that the aim of tongue coating calibration is fulfilled. The tongue coating calibration method is inspired by the thesis of human face area calibration. The deviation categories comprise scaling proportion, deviation in the x direction and deviation in the y direction, and are specifically set in formula (1), so that 45 deviation categories can be formed. The generation mode of the deviation type area is shown in formula (2), wherein (x, y, w, h) is the upper left x coordinate, the upper left y coordinate, the width and the height of the real tongue coating area. By using the inverse operation of the formula (2), the tongue coating area can be adjusted through the deviation type, so as to achieve the purpose of tongue coating calibration. And (3) generating 45 areas in each picture of the tongue coating detection database according to the steps (1) and (2), intercepting and storing the areas, so that 45 new tongue coating images are generated in each tongue coating image in the tongue coating detection database, the labeling types of the new tongue coating images are deviation types, and the new images form a tongue coating calibration database.
[Formula (1) image in original: the deviation category settings s_n ∈ {0.83, 0.91, 1.0, 1.10, 1.21}, x_n ∈ {-0.17, 0, 0.17}, y_n ∈ {-0.17, 0, 0.17}]
[Formula (2) image in original: generation of the deviated region from the real tongue coating area (x, y, w, h) and (s_n, x_n, y_n)]
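Since formulas (1) and (2) survive only as image references, the following sketch is an assumption modeled on the face-calibration work that inspired this method: the deviated region is taken as (x + x_n·w, y + y_n·h, s_n·w, s_n·h), and the inverse operation undoes it. Treat both the generation and the inverse as hypothetical reconstructions:

```python
from itertools import product

S = (0.83, 0.91, 1.0, 1.10, 1.21)
X = (-0.17, 0.0, 0.17)
Y = (-0.17, 0.0, 0.17)
CLASSES = list(product(S, X, Y))  # the 45 (s_n, x_n, y_n) deviation categories

def make_deviated(x, y, w, h, s_n, x_n, y_n):
    """Assumed formula (2): perturb the real area (x, y, w, h)."""
    return (x + x_n * w, y + y_n * h, s_n * w, s_n * h)

def calibrate(x, y, w, h, s_n, x_n, y_n):
    """Assumed inverse operation: recover the real area from a deviated one."""
    w0, h0 = w / s_n, h / s_n
    return (x - x_n * w0, y - y_n * h0, w0, h0)

# Round trip: deviating then calibrating returns the original area.
dev = make_deviated(100, 50, 200, 120, *CLASSES[7])
print(calibrate(*dev, *CLASSES[7]))  # -> (100.0, 50.0, 200.0, 120.0) up to rounding
```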
3.3, training of a calibration network:
This embodiment trains a classifier of tongue coating deviation categories. The training process is as follows:
(1) Data preprocessing. The data preprocessing stage mainly performs mean removal and a data augmentation operation that flips images horizontally with probability 0.5.
(2) Forward propagation. The preprocessed training-set images are passed into the calibration network; after convolution, ReLU, pooling, fully connected and other operations, the softmax over the 45 classes is computed.
(3) Backward propagation. The softmax loss is propagated backwards and the network parameters are updated.
(4) Iteration. Operations (1) to (3) are repeated until convergence or the maximum number of iterations is reached.
(5) Every few iterations, the test set is used to verify the effectiveness of the network model by checking the error rate on the test set.
The training parameters of this embodiment are shown in Table 6. Using transfer learning, the non-fully-connected part of the calibration network is initialized with a VGG-16 model trained on ImageNet images.
TABLE 6 calibration of network training parameters
[Table 6 image in original: calibration network training hyperparameters]
3.4, Calibration detection with the calibration network:
The detection process of the calibration network comprises the following steps:
(1) Resize the tongue coating area image initially detected in step (two) to 224 × 224.
(2) After mean removal, pass it into the calibration network to obtain the deviation category.
(3) Derive the corresponding x_n deviation, y_n deviation and scaling ratio s_n from the deviation category.
(4) Adjust the tongue coating area with the inverse operation of formula (5-1); the adjusted area is the final tongue coating detection area.
Step (four), segmenting the tongue coating area image according to the calibrated coordinates:
This step is very simple: the coordinates of the four vertices of the tongue coating area are computed directly from the calibrated coordinates, and the enclosed region is cropped out.
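A minimal sketch of this step with NumPy (illustrative): the four vertices follow directly from the calibrated (x, y, w, h), and the segmentation is an array crop:

```python
import numpy as np

def crop_tongue(image, x, y, w, h):
    """Compute the four vertices of the calibrated area and crop it."""
    x, y, w, h = int(round(x)), int(round(y)), int(round(w)), int(round(h))
    vertices = [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
    return vertices, image[y:y + h, x:x + w]

img = np.zeros((480, 640, 3), dtype=np.uint8)
verts, tongue = crop_tongue(img, 120.4, 80.2, 200, 150)
print(verts, tongue.shape)  # four corner points and a (150, 200, 3) crop
```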
Effect of this embodiment:
the Faster R-CNN training data set is a tongue coating detection database, 5683 sheets in total, and is divided into a training set and a test set in a ratio of 4: 1. The tongue coating detection model after 50000 times of iterative training on the training set needs to evaluate the detection effect on the test set. The test set consisted of 1138 tongue coating images, each containing only one tongue coating. For further analysis of the detection effect, the detection results on the test set are classified into the following categories: correct means that the classification is accurate, and IoU between a prediction box and a true value is more than 0.5; localization means accurate classification, IoU is between 0.1 and 0.5; background means that the classification is accurate, but IoU is less than 0.1; other refers to misclassification.
Correct and Localization results account for the vast majority of the preliminary detection results on the test set, so the model can basically locate the tongue coating; the Correct detection accuracy is 78%, while the Localization category still accounts for 19%, which can be further improved through tongue coating calibration.
The tongue coating area detected in step (two) is then automatically calibrated by the tongue coating deviation category classifier realized by the calibration network. The calibrated tongue coating detection results on the test set show that the Correct rate rises from 78% to 91% and the Localization rate falls from 19% to 7%, demonstrating that the calibration network effectively improves tongue coating detection and verifying the effectiveness of the method.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that various equivalent changes, modifications, substitutions and alterations can be made herein without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims (4)

1. A tongue coating automatic segmentation method based on deep learning is characterized by comprising the following steps:
s1, collecting and inputting an image containing the tongue coating;
s2, detecting the tongue coating of the image containing the tongue coating by adopting a Faster R-CNN deep learning method, and automatically obtaining a preliminary tongue coating area image;
S3, calibrating the preliminary tongue coating area image by adopting a VGG deep learning method to obtain a more accurate tongue coating area image;
S4, automatically segmenting the tongue coating image according to the calibrated tongue coating area image;
the step S3 specifically includes:
S31, adjusting the size of the preliminary tongue coating area image, performing mean-removal processing, and passing the image into the calibration network to obtain a deviation category;
S32, obtaining the corresponding x_n deviation, y_n deviation and scaling ratio s_n according to the deviation category;
S33, adjusting the tongue coating area by the inverse operation according to the x_n deviation, y_n deviation and scaling ratio s_n, the adjusted area being the final tongue coating detection area, wherein the inverse operation formula is as follows:
[Formula image in original: the inverse of the region-generation formula, recovering the real area (x, y, w, h) from the deviated area and (s_n, x_n, y_n)]
wherein (x, y, w, h) are the top-left x coordinate, top-left y coordinate, width and height of the real tongue coating area; and:
s_n ∈ {0.83, 0.91, 1.0, 1.10, 1.21}
x_n ∈ {-0.17, 0, 0.17}
y_n ∈ {-0.17, 0, 0.17}.
2. the method for automatically segmenting the tongue coating based on the deep learning of claim 1, wherein the image containing the tongue coating is collected in the step S1 and is collected by a smart mobile device or a camera or a computer.
3. The method for automatically segmenting tongue coating based on deep learning of claim 1, wherein the Faster R-CNN deep learning method in step S2 requires constructing a Faster R-CNN model structure, and the Faster R-CNN model structure includes a convolutional neural network, an RPN, RoI pooling and CNET.
4. The method for automatically segmenting the tongue coating based on deep learning as claimed in claim 1, wherein step S4 specifically comprises: calculating the coordinates of the four vertices of the calibrated tongue coating area image, connecting the four vertices, and segmenting the area within them to obtain the final tongue coating image.
CN201710338958.5A 2017-05-15 2017-05-15 Tongue coating automatic segmentation method based on deep learning Active CN107610087B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710338958.5A CN107610087B (en) 2017-05-15 2017-05-15 Tongue coating automatic segmentation method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710338958.5A CN107610087B (en) 2017-05-15 2017-05-15 Tongue coating automatic segmentation method based on deep learning

Publications (2)

Publication Number Publication Date
CN107610087A CN107610087A (en) 2018-01-19
CN107610087B true CN107610087B (en) 2020-04-28

Family

ID=61059668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710338958.5A Active CN107610087B (en) 2017-05-15 2017-05-15 Tongue coating automatic segmentation method based on deep learning

Country Status (1)

Country Link
CN (1) CN107610087B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108564113A (en) * 2018-03-27 2018-09-21 华南理工大学 A kind of tongue fur constitution recognition methods perceived based on deep neural network and complexity
CN108596087B (en) * 2018-04-23 2020-09-15 合肥湛达智能科技有限公司 Driving fatigue degree detection regression model based on double-network result
CN108734108B (en) * 2018-04-24 2021-08-03 浙江工业大学 Crack tongue identification method based on SSD network
CN108665471A (en) * 2018-05-30 2018-10-16 高鹏 A kind of human body back curve acquisition methods and system based on camera
CN108960285B (en) * 2018-05-31 2021-05-07 东软集团股份有限公司 Classification model generation method, tongue image classification method and tongue image classification device
CN109199334B (en) * 2018-09-28 2021-06-22 小伍健康科技(上海)有限责任公司 Tongue picture constitution identification method and device based on deep neural network
CN109815802A (en) * 2018-12-18 2019-05-28 中国海洋大学 A kind of monitor video vehicle detection and recognition method based on convolutional neural networks
CN109700433A (en) * 2018-12-28 2019-05-03 深圳铁盒子文化科技发展有限公司 A kind of tongue picture diagnostic system and lingual diagnosis mobile terminal
CN109977952B (en) * 2019-03-27 2021-10-22 深动科技(北京)有限公司 Candidate target detection method based on local maximum
CN110176944A (en) * 2019-04-25 2019-08-27 中国科学院上海微系统与信息技术研究所 A kind of intelligent means for anti-jamming and method based on deep learning
CN110163855B (en) * 2019-05-17 2021-01-01 武汉大学 Color image quality evaluation method based on multi-path deep convolutional neural network
CN110503088B (en) * 2019-07-03 2024-05-07 平安科技(深圳)有限公司 Target detection method based on deep learning and electronic device
CN110796186A (en) * 2019-10-22 2020-02-14 华中科技大学无锡研究院 Dry and wet garbage identification and classification method based on improved YOLOv3 network
CN111079822A (en) * 2019-12-12 2020-04-28 哈尔滨市科佳通用机电股份有限公司 Method for identifying dislocation fault image of middle rubber and upper and lower plates of axle box rubber pad
CN112686340B (en) * 2021-03-12 2021-07-13 成都点泽智能科技有限公司 Dense small target detection method based on deep neural network
CN113419868B (en) * 2021-08-23 2021-11-16 南方科技大学 Temperature prediction method, device, equipment and storage medium based on crowdsourcing
CN115994947B (en) * 2023-03-22 2023-06-02 万联易达物流科技有限公司 Positioning-based intelligent card punching estimation method
CN116777930B (en) * 2023-05-24 2024-01-09 深圳汇医必达医疗科技有限公司 Image segmentation method, device, equipment and medium applied to tongue image extraction

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2188779B1 (en) * 2007-09-21 2014-04-23 Korea Institute of Oriental Medicine Extraction method of tongue region using graph-based approach and geometric properties
WO2014183246A1 (en) * 2013-05-13 2014-11-20 Huang Bo Medical image processing method and system
CN105160346A (en) * 2015-07-06 2015-12-16 上海大学 Tongue coating greasyness identification method based on texture and distribution characteristics
CN106250812A (en) * 2016-07-15 2016-12-21 A vehicle model recognition method based on the fast R-CNN deep neural network
CN106295139A (en) * 2016-07-29 2017-01-04 Tang Ping (汤平) A tongue body self-diagnosis health cloud service system based on deep convolutional neural networks
CN106651887A (en) * 2017-01-13 2017-05-10 深圳市唯特视科技有限公司 Image pixel classifying method based convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks; Shaoqing Ren et al.; arXiv:1506.01497v3; 2016-01-06; Figure 2 of main text *

Also Published As

Publication number Publication date
CN107610087A (en) 2018-01-19

Similar Documents

Publication Publication Date Title
CN107610087B (en) Tongue coating automatic segmentation method based on deep learning
CN110599448B (en) Migratory learning lung lesion tissue detection system based on MaskScoring R-CNN network
CN110348319B (en) Face anti-counterfeiting method based on face depth information and edge image fusion
WO2020253629A1 (en) Detection model training method and apparatus, computer device, and storage medium
CN107506761B (en) Brain image segmentation method and system based on significance learning convolutional neural network
CN108648191B (en) Pest image recognition method based on Bayesian width residual error neural network
CN109345527B (en) Bladder tumor detection method based on MaskRcnn
CN113592845A (en) Defect detection method and device for battery coating and storage medium
CN107862694A (en) A kind of hand-foot-and-mouth disease detecting system based on deep learning
Kudva et al. Automation of detection of cervical cancer using convolutional neural networks
CN110930416A (en) MRI image prostate segmentation method based on U-shaped network
CN114241548A (en) Small target detection algorithm based on improved YOLOv5
CN109033978B (en) Error correction strategy-based CNN-SVM hybrid model gesture recognition method
WO2021136368A1 (en) Method and apparatus for automatically detecting pectoralis major region in molybdenum target image
CN109902584A (en) A kind of recognition methods, device, equipment and the storage medium of mask defect
CN111652317A (en) Hyper-parameter image segmentation method based on Bayesian deep learning
CN110543906B (en) Automatic skin recognition method based on Mask R-CNN model
Jin et al. Construction of retinal vessel segmentation models based on convolutional neural network
Zhang et al. High-quality face image generation based on generative adversarial networks
Lan et al. Run: Residual u-net for computer-aided detection of pulmonary nodules without candidate selection
Li et al. Natural tongue physique identification using hybrid deep learning methods
WO2024032010A1 (en) Transfer learning strategy-based real-time few-shot object detection method
CN111127400A (en) Method and device for detecting breast lesions
CN111241957A (en) Finger vein in-vivo detection method based on multi-feature fusion and DE-ELM
CN113066054B (en) Cervical OCT image feature visualization method for computer-aided diagnosis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant