CN107610087B - Tongue coating automatic segmentation method based on deep learning - Google Patents


Info

Publication number
CN107610087B
Authority
CN
China
Prior art keywords
tongue
tongue coating
image
deep learning
area
Prior art date
Legal status
Active
Application number
CN201710338958.5A
Other languages
Chinese (zh)
Other versions
CN107610087A (en)
Inventor
Wen Guihua (文贵华)
Zeng Haibin (曾海彬)
Ma Jiajiong (马佳炯)
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201710338958.5A priority Critical patent/CN107610087B/en
Publication of CN107610087A publication Critical patent/CN107610087A/en
Application granted granted Critical
Publication of CN107610087B publication Critical patent/CN107610087B/en

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an automatic tongue coating segmentation method based on deep learning, comprising the following steps: S1, collecting and inputting an image containing the tongue coating; S2, detecting the tongue coating in the image using the Faster R-CNN deep learning method to automatically obtain a preliminary tongue coating area image; S3, calibrating the preliminary tongue coating area image using a VGG deep learning method to obtain a more accurate tongue coating area image; and S4, automatically segmenting the tongue coating image according to the calibrated tongue coating area image. Built on a big-data deep learning approach, the method achieves more accurate tongue coating segmentation and addresses the low segmentation accuracy of existing methods.

Description

Tongue coating automatic segmentation method based on deep learning
Technical Field
The invention relates to the application of computer vision, image processing and artificial intelligence in the field of traditional Chinese medicine health, and in particular to an automatic tongue coating segmentation method based on deep learning.
Background
The tongue coating carries a large amount of information about human body constitution, which means that constitution types can be judged objectively by observing the tongue coating. However, doing so requires the rich professional experience of traditional Chinese medicine experts and is difficult for general practitioners and ordinary people, so realizing automatic tongue coating analysis with artificial intelligence technology is very important. The premise of such automation is automatically segmenting the tongue coating from a facial image containing it, and the segmentation accuracy of current methods is low.
Tongue coating segmentation is the detection of the tongue body region and the segmentation of the tongue body, and is an important prerequisite for the extraction and analysis of tongue image characteristics. It is generally implemented with common image segmentation algorithms, including threshold segmentation, spatial clustering, region growing, edge detection, contour tracking and the like. Early studies were mainly based on low-level image information; for example, Zhao Zaixu et al. proposed a tongue image segmentation algorithm based on mathematical morphology and the HSI color model, and Liu Guansong et al. proposed an automatic tongue image segmentation method based on luminance information and morphological features. More recent work mainly uses the Snake model and its deformations: for example, Wu Xing proposed an algorithm based on double-layer polar-coordinate edge detection to obtain a rough tongue edge, then used a Snake model to correct that rough edge according to local details and obtain an accurate tongue contour, and Sun Dalin used an improved Snake model to extract the tongue body region. In addition, Zhang Shing proposed an improved wavelet transform method to detect the tongue edge, and later an improved non-parametric active contour model to extract the tongue. However, deep learning methods with good results have not yet been applied to tongue coating detection and segmentation, so existing results remain poor. The invention adopts deep learning, specifically Faster R-CNN (Ren S, He K, Girshick R, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015) and VGG (Simonyan K, Zisserman A. Very Deep Convolutional Networks for Large-Scale Image Recognition. Computer Science, 2014), to realize automatic detection and segmentation of the tongue coating.
Disclosure of Invention
In order to overcome the defects and shortcomings of the prior art, the invention provides an automatic tongue coating segmentation method based on deep learning, which realizes more accurate tongue coating segmentation through a big-data deep learning method and solves the low tongue coating segmentation accuracy of the prior art.
In order to solve the technical problems, the invention provides the following technical scheme: a tongue coating automatic segmentation method based on deep learning comprises the following steps:
s1, collecting and inputting an image containing the tongue coating;
s2, detecting the tongue coating of the image containing the tongue coating by adopting a Faster R-CNN deep learning method, and automatically obtaining a preliminary tongue coating area image;
S3, calibrating the preliminary tongue coating area image by adopting a VGG deep learning method to obtain a more accurate tongue coating area image;
and S4, automatically segmenting the tongue coating image according to the calibrated tongue coating area image.
Further, in step S1, the image containing the tongue coating is acquired by a smart mobile device, a camera or a computer.
Further, in step S2, the Faster R-CNN deep learning method requires constructing a Faster R-CNN model structure, which comprises a convolutional neural network, an RPN, RoI pooling and CNET.
Further, step S3 specifically comprises:
S31, adjusting the size of the preliminary tongue coating area image, performing mean-removal processing, and passing the image into the calibration network to obtain a deviation category;
S32, obtaining the corresponding x_n deviation, y_n deviation and scaling ratio s_n according to the deviation category;
S33, adjusting the tongue coating area by the inverse operation according to the x_n deviation, y_n deviation and scaling ratio s_n, the adjusted area being the final tongue coating detection area, wherein the inverse operation formula is as follows:
[Formula image in original: the inverse of the region-generation formula (2) in section 3.2, recovering the real area (x, y, w, h) from the deviated area and (s_n, x_n, y_n)]
wherein (x, y, w, h) are the top-left x coordinate, top-left y coordinate, width and height of the real tongue coating area; and:
s_n ∈ {0.83, 0.91, 1.0, 1.10, 1.21}
x_n ∈ {-0.17, 0, 0.17}
y_n ∈ {-0.17, 0, 0.17}.
Further, step S4 specifically comprises: calculating the coordinates of the four vertices of the calibrated tongue coating area image, connecting the four vertices, and segmenting the area within them to obtain the final tongue coating image.
After adopting the above technical scheme, the invention has at least the following beneficial effects:
1. The invention segments the tongue coating with high accuracy;
2. The invention adopts a big-data deep learning method, places low requirements on the quality of the input image, allows photos taken with a smartphone and the like, and therefore has a wide range of application.
Drawings
FIG. 1 is a flowchart of the steps of the deep-learning-based automatic tongue coating segmentation method of the present invention;
FIG. 2 is a schematic structural diagram of the Faster R-CNN model used by the Faster R-CNN deep learning method in the method of the present invention;
FIG. 3 is a schematic diagram of the structure of the front 13 layers of the VGG-16 model used by the Faster R-CNN deep learning method in the method of the present invention;
FIG. 4 is a schematic diagram of the RPN structure of the Faster R-CNN deep learning method in the method of the present invention;
FIG. 5 is a schematic diagram of the calibration network structure of the VGG deep learning method in the method of the present invention.
Detailed Description
It should be noted that, in the present application, the embodiments and the features of the embodiments may be combined with each other without conflict. The present application is described in further detail below with reference to the drawings and specific embodiments.
As shown in FIG. 1, the present invention provides an automatic tongue coating segmentation method based on deep learning, which comprises four steps:
Step (one), acquiring a tongue coating image through a camera or the like as the input tongue coating image;
Step (two), detecting the tongue coating with the deep learning method Faster R-CNN and automatically obtaining a preliminary tongue coating area image;
Step (three), automatically calibrating the tongue coating area image with the deep learning method VGG-16 to obtain a more accurate tongue coating area image;
Step (four), automatically segmenting the tongue coating image according to the calibrated tongue coating area image.
Each step is described in detail below.
Step (one), acquiring a tongue coating image through a camera or the like as the input tongue coating image: the image can be collected with a smartphone, a tablet computer, a camera and the like, and is passed directly to step (two) as the input image without any preprocessing.
Step (two), tongue coating detection: tongue coating detection is achieved with the deep learning method Faster R-CNN.
2.1, Faster R-CNN model structure:
The structure of the Faster R-CNN model is shown in FIG. 2; it mainly comprises a convolutional neural network (CNN), an RPN (Region Proposal Network), RoI (Region of Interest) pooling and CNET.
2.1.1, convolutional neural network:
The convolutional neural network is the most common deep neural network in the field of image recognition and is mainly used to extract image features. This embodiment adopts one such model: the front 13 layers of the VGG-16 model (i.e., the part with the fully connected layers removed), whose structure is shown in FIG. 3 and which mainly involves convolution, ReLU and max pooling operations.
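For illustration, such a truncated backbone can be sketched with PyTorch/torchvision rather than the Lua Torch framework used in this embodiment; the variable names are illustrative only:

```python
import torch
import torchvision

# The "front 13 layers" of VGG-16: all conv/ReLU/max-pool blocks,
# i.e. vgg16.features, with the fully connected classifier removed.
vgg16 = torchvision.models.vgg16(weights=None)  # weights would be loaded separately
backbone = vgg16.features                       # 13 conv layers + ReLU + 5 max-pools

x = torch.randn(1, 3, 224, 224)                 # dummy tongue-image batch
feat = backbone(x)
print(feat.shape)                               # -> torch.Size([1, 512, 7, 7])
```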
2.1.2, RPN:
The RPN generates candidate regions on top of the shared convolutional neural network, i.e., it learns where to attend. It aims to replace the usual candidate-region algorithms in object detection and improve speed. The structure of the RPN is shown in FIG. 4. The RPN slides a window over the convolutional feature map (conv feature map) extracted by the convolutional neural network to produce a 256-dimensional intermediate feature, which then feeds two fully connected branches: the cls layer and the reg layer. The cls layer performs a binary classification task to judge whether the current region is foreground or background. The reg layer performs a regression task to predict the coordinates (x, y) and the width and height (w, h) of the candidate region corresponding to the anchor point (Anchor) at the center of the current region. Within the RPN structure, the concept of anchor points and the computation of the loss function need to be understood with particular care.
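A minimal sketch of this RPN head, assuming a PyTorch-style implementation in which a 3 × 3 convolution plays the role of the sliding window and 1 × 1 convolutions realize the cls and reg branches (all names illustrative):

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    """Sliding-window RPN head: one 3x3 conv, then cls and reg branches."""
    def __init__(self, in_channels=512, mid_channels=256, num_anchors=12):
        super().__init__()
        # 3x3 conv == the sliding window producing a 256-d feature per location
        self.sliding = nn.Conv2d(in_channels, mid_channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        # cls layer: foreground/background score for each of k anchors
        self.cls = nn.Conv2d(mid_channels, num_anchors * 2, 1)
        # reg layer: (x, y, w, h) offsets for each of k anchors
        self.reg = nn.Conv2d(mid_channels, num_anchors * 4, 1)

    def forward(self, feature_map):
        h = self.relu(self.sliding(feature_map))
        return self.cls(h), self.reg(h)

scores, deltas = RPNHead()(torch.randn(1, 512, 30, 30))
print(scores.shape, deltas.shape)  # (1, 24, 30, 30) (1, 48, 30, 30)
```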
Anchor points (Anchors) are located at the center of the sliding window. An anchor point can generate k candidate regions with different scales (Scale) and different aspect ratios, so k candidate regions can be predicted from the same sliding window. In the specific implementation of this embodiment, 4 scales and 3 aspect ratios are set, as shown in Table 1, giving 12 candidate regions in total. For a convolutional feature map of size W × H, a 3 × 3 sliding window (stride 1) yields W × H × k anchor points.
The loss function in the RPN includes two parts corresponding to the cls layer and the reg layer: the loss function of the classification task and that of the regression task. The classification task uses a cross-entropy loss to classify a candidate region as foreground or background; see formula (1). The regression task uses the SmoothL1 loss to predict the coordinates and the width and height of the target region; see formula (2). The loss function of the RPN part combines the two with a certain weight; see formula (3). Here {p_i} and {t_i} are the outputs of the cls layer and the reg layer respectively, p_i* = 1 when the input sample is foreground and p_i* = 0 when it is background, t_i* denotes the coordinates of the correct target area, and λ is set to 10.
TABLE 1 Anchor scale and aspect ratio settings
Anchor region sizes (scales): 48², 96², 192², 384²
Aspect ratios: 1:1, 2:1, 1:2
L_cls(x, class) = -x[class] + ln(Σ_j e^{x[j]})    (1)
where x is a one-dimensional vector of size n containing the scores of the n classes, and class is the target class.
SmoothL1(x, y) = Σ_i smooth_L1(x_i − y_i), where smooth_L1(z) = 0.5·z² if |z| < 1 and |z| − 0.5 otherwise    (2)
where x is the neural network output value and y is the target value.
L({p_i}, {t_i}) = (1/N_cls)·Σ_i L_cls(p_i, p_i*) + λ·(1/N_reg)·Σ_i p_i*·SmoothL1(t_i, t_i*)    (3)
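These three formulas can be sketched as follows, assuming a PyTorch-style implementation; the foreground mask applies the regression term only where p_i* = 1 (names illustrative):

```python
import torch
import torch.nn.functional as F

def rpn_loss(cls_scores, labels, reg_out, reg_targets, lam=10.0):
    """Combined RPN loss per formula (3).

    cls_scores:  (N, 2) foreground/background scores {p_i}
    labels:      (N,)   p_i* in {0, 1}
    reg_out:     (N, 4) predicted (x, y, w, h) {t_i}
    reg_targets: (N, 4) ground-truth (x, y, w, h) {t_i*}
    """
    # Formula (1): cross entropy, -x[class] + ln(sum_j e^{x[j]})
    l_cls = F.cross_entropy(cls_scores, labels)
    # Formula (2): SmoothL1, applied only where p_i* = 1 (foreground)
    fg = labels == 1
    l_reg = F.smooth_l1_loss(reg_out[fg], reg_targets[fg]) if fg.any() else 0.0
    # Formula (3): weighted combination
    return l_cls + lam * l_reg

scores = torch.randn(6, 2)
labels = torch.tensor([1, 0, 1, 0, 0, 1])
print(rpn_loss(scores, labels, torch.randn(6, 4), torch.randn(6, 4)))
```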
2.1.3, RoI pooling:
RoI pooling is a simplified version of SPP (Spatial Pyramid Pooling); it samples inputs of different sizes so that the resulting outputs have the same size. SPP fixes the size of the final feature vector first, and derives the pooling kernel size and stride from it. For a convolutional feature map of size W × H × D with SPP scales of 4 × 4, 2 × 2 and 1 × 1, the 4 × 4 scale, for example, determines a pooling kernel of size ceil(W/4) × ceil(H/4) with strides floor(W/4) and floor(H/4). RoI pooling is a one-level SPP using only one scale, set here to a 6 × 6 output.
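The kernel/stride arithmetic and the one-level 6 × 6 pooling can be sketched like this, using adaptive max pooling as a convenient stand-in for RoI pooling (illustrative, not the embodiment's exact implementation):

```python
import math
import torch
import torch.nn.functional as F

def spp_kernel_and_stride(W, H, scale):
    """SPP derives the pooling kernel size and stride from the output scale."""
    kernel = (math.ceil(H / scale), math.ceil(W / scale))
    stride = (math.floor(H / scale), math.floor(W / scale))
    return kernel, stride

def roi_pool_6x6(feature_map):
    """One-level SPP: pool any H x W x D input to a fixed 6 x 6 x D output."""
    return F.adaptive_max_pool2d(feature_map, output_size=(6, 6))

print(spp_kernel_and_stride(W=30, H=20, scale=4))        # ((5, 8), (5, 7))
print(roi_pool_6x6(torch.randn(1, 512, 20, 30)).shape)   # (1, 512, 6, 6)
```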
2.1.4, CNET:
The CNET part mainly consists of Box regression and Classifier. Box regression is composed of a fully connected layer, a batch normalization layer and a Dropout layer, and is used to predict the position of the target region. The Classifier is a neural network classifier composed of a fully connected layer, a batch normalization layer, a Dropout layer and a Softmax layer, used to identify the target class. Box regression is a regression task using the SmoothL1 loss function; the Classifier is a classification task using the cross-entropy loss function. The loss function of the whole CNET is a weighted combination of the Box regression loss and the Classifier loss.
2.2, Training process of Faster R-CNN:
Deep neural network training mainly consists of forward propagation and backward propagation: the network produces an output by forward propagation, the error between the output and the label is computed with a loss function, and the network parameters are continually updated by propagating the error backwards. Faster R-CNN is a complex deep neural network, and the Faster R-CNN paper indicates that directly training the whole network does not work well enough; it must be split into parts and trained step by step. This embodiment pre-trains the convolutional neural network part of Faster R-CNN by training a tongue coating classifier. The parameters of the pre-trained convolutional neural network part are then fixed while the RPN is trained. Finally, the parameters of the whole Faster R-CNN are fine-tuned through the loss functions of Box regression and Classifier.
2.2.1, pre-training of convolutional neural networks:
The convolutional neural network part of Faster R-CNN uses the first thirteen layers of the VGG-16 model, with the main purpose of extracting tongue coating image features. A binary classifier is trained on the VGG-16 model structure to distinguish tongue coating from non-tongue coating. The VGG-16 parameter configuration is shown in Table 2; since ReLU requires no additional parameters, ReLU layers are omitted from the table. The convolutional neural network part of Faster R-CNN then uses the parameters of the first thirteen layers of the trained VGG-16 model, so that it extracts tongue coating image features better. In other words, the convolutional neural network part of Faster R-CNN is pre-trained with a large number of tongue coating and non-tongue coating images.
TABLE 2 VGG-16 model parameter configuration Table
[Table 2 image in original: layer-by-layer VGG-16 parameter configuration]
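One illustrative way to set up such a binary tongue-coating/non-tongue-coating classifier is to replace the 1000-way VGG-16 head with a 2-way output; the sketch below uses PyTorch/torchvision and is an assumption, not the embodiment's exact code:

```python
import torch
import torch.nn as nn
import torchvision

clf = torchvision.models.vgg16(weights=None)
# Replace the 1000-way ImageNet head with a 2-way tongue / non-tongue head.
clf.classifier[6] = nn.Linear(4096, 2)

logits = clf(torch.randn(8, 3, 224, 224))  # 224x224 crops from the database
print(logits.shape)                        # -> torch.Size([8, 2])
```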
The tongue coating detection database used in this embodiment contains 5683 images, each annotated with the real tongue coating area. Pre-training the convolutional neural network part of Faster R-CNN to improve its ability to extract tongue coating image features requires a large number of tongue coating and non-tongue coating images. In this embodiment, 5 positive-example tongue coating images and 5 negative-example images (i.e., non-tongue coating images) are randomly extracted from each annotated tongue coating image, giving 56830 images in total as the pre-training data set. Positive and negative examples are selected according to the IoU (Intersection over Union) value with the labeled area; the IoU value is the ratio of the overlapping area of two regions to the area of their union (an illustrative sketch of this sampling follows Table 3). In this embodiment, regions are generated randomly in each tongue coating image: regions with IoU greater than 0.6 are positive examples, and regions with IoU less than 0.2 whose area is not less than half of the correct area are negative examples. Since the VGG-16 model requires 224 × 224 input images, all positive and negative example images are resized to 224 × 224. Of the 56830 images in the pre-training data set, 80% form the training set and 20% the test set. The mean and variance of the training set are computed, and mean removal is applied at the input layer of the neural network, centering every dimension of the input data at 0 and accelerating convergence. Furthermore, to increase the number of training samples, training samples are flipped horizontally with 50% probability. This embodiment is implemented with the Torch deep learning framework; the training parameters are shown in Table 3.
TABLE 3 Pre-training related parameters Table
[Table 3 image in original: pre-training hyperparameters]
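The IoU computation and the positive/negative sampling rule described above might look like the following plain-Python sketch (thresholds taken from the text; helper names are illustrative):

```python
def iou(a, b):
    """Intersection over union of boxes (x, y, w, h) with top-left origin."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def label_example(region, gt):
    """Positive if IoU > 0.6; negative if IoU < 0.2 and area >= half of gt's."""
    v = iou(region, gt)
    if v > 0.6:
        return "positive"
    if v < 0.2 and region[2] * region[3] >= 0.5 * gt[2] * gt[3]:
        return "negative"
    return None  # region is discarded

print(label_example((10, 10, 100, 100), (15, 15, 100, 100)))  # positive
```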
2.2.2 training of RPN:
The RPN is intended to generate candidate regions. As introduced above, its network structure slides a window over the convolutional feature map extracted by the convolutional neural network to produce a 256-dimensional fully connected feature. In the specific implementation, due to programming considerations, the RPN structure differs slightly from the description above, but its overall function and concept are the same. In this embodiment the shortest side of the image is fixed at 480 pixels, and the four anchor region sizes are set to 48², 96², 192² and 384². One RPN is designed for each anchor region size, and each RPN recommends 3 regions per anchor according to the three aspect ratios (1:1, 2:1, 1:2), realizing the function of recommending 12 regions per anchor. Considering the differences in anchor region size, the sliding window sizes used by the four RPNs also differ: 3 × 3, 5 × 5 and 7 × 7 are used. Owing to the characteristics of convolutional neural networks, the deeper the extracted features, the more abstract the represented information, but some detail information is lost; and the smaller the anchor region, the more detail information it needs. Therefore, in this experiment, the RPN corresponding to the 48² anchor region size uses the feature map of the 10th layer of the convolutional neural network, while the rest use the feature map of the last layer.
The goal of RPN training is to classify the fully connected feature corresponding to each anchor and to further regress the target coordinates corresponding to each anchor. The training targets are the anchor regions, and the required labels are the region category (whether the region feature belongs to foreground or background) and the target coordinates. In the experiment, the shortest side of the tongue coating image is fixed at 480 pixels, and anchor regions are generated according to the four anchor region sizes and three aspect ratios as training samples (an illustrative anchor-generation sketch follows). If the IoU value between a region and the correct area is greater than 0.6, the region is labeled as foreground; if it is less than 0.2, the region is labeled as background. The ratio of foreground regions to background regions is chosen as 1:1.
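Anchor generation for the four sizes and three aspect ratios might be sketched as below (plain Python, illustrative); each anchor keeps the nominal area s² while the aspect ratio reshapes its width and height:

```python
def make_anchors(center_x, center_y):
    """All 12 anchors (x, y, w, h) centered at one sliding-window position."""
    anchors = []
    for s in (48, 96, 192, 384):           # the four anchor region sizes s^2
        for ratio in (1.0, 2.0, 0.5):      # aspect ratios 1:1, 2:1, 1:2 (w:h)
            w = s * ratio ** 0.5           # keep area ~= s^2 while reshaping
            h = s / ratio ** 0.5
            anchors.append((center_x - w / 2, center_y - h / 2, w, h))
    return anchors

print(len(make_anchors(240, 240)))  # 12
```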
TABLE 4 RPN training-related parameters
[Table 4 image in original: RPN training hyperparameters]
The convolutional neural network part of this embodiment uses the model parameters obtained by pre-training; these parameters are fixed and unchanged, and only the RPN parameters are adjusted. The experimental environment is the same as in the pre-training experiment, and the training parameters are shown in Table 4. The tongue coating image passes through the convolutional neural network to obtain a feature map, the RPN scans the feature map with a sliding window to obtain the fully connected feature corresponding to each anchor region, and the features corresponding to the anchor regions in the training sample are then selected and passed to the cls layer and the reg layer. For foreground-region features, the cls layer classifies them as 1 and the reg layer further regresses the target area to correct the position of the recommended region; for background-region features, the cls layer classifies them as 0 and the regression task is not learned.
2.2.3, Training fine-tuning of Faster R-CNN:
The parameters of the convolutional neural network part and the RPN part are obtained through pre-training and RPN training; the Classifier and Box regression are then added, and the parameters of the whole Faster R-CNN are readjusted. The training process of Faster R-CNN mainly comprises the following steps:
(1) Data preprocessing and anchor region generation. The Faster R-CNN training data set is the tongue coating detection database. The mean and variance of the training set are computed for mean removal. Tongue coating images are resized so that the shortest side is 480 pixels. Horizontal flipping is applied with probability 0.25 to augment the training samples. Anchor regions are generated according to the four anchor region sizes and three aspect ratios, and foreground and background regions are selected at a 1:1 ratio according to their IoU with the labeled target area.
(2) The convolutional neural network extracts tongue coating image features. The preprocessed tongue coating image is passed into the convolutional neural network, which extracts its image features and generates the corresponding feature map.
(3) The RPN generates candidate regions. Scanning the feature map generated in step (2) with the RPN sliding window produces the anchor region features. The foreground and background region features selected in step (1) are passed to the cls layer and the reg layer, and the softmax loss of the cls layer (denoted L_pcls) and the SmoothL1 loss of the reg layer (denoted L_preg) are computed. For foreground regions, the reg layer generates the candidate regions; for background regions, the anchor regions themselves are used as candidate regions.
(4) RoI pooling generates fixed-size fully connected features. From the feature map generated in step (2), the features corresponding to the candidate regions generated in step (3) are selected, and RoI pooling produces fully connected features of a fixed size, solving the problem of inconsistent feature sizes caused by inconsistent picture sizes.
(5) Identification of the tongue coating and prediction of tongue coating area coordinates. The fully connected features are passed to the Box regression and Classifier parts, and the SmoothL1 loss of Box regression (denoted L_creg) and the negative log-likelihood loss of the Classifier (denoted L_ccls) are computed.
(6) Back propagation and parameter update (see the sketch after this list). Take 10·L_creg + L_ccls as the total loss of the CNET part, denoted L_cnet, and update the CNET parameters. Take 10·L_preg + L_pcls as the total loss of the RPN part, denoted L_rpn, and update the RPN parameters. Finally, take L_cnet + L_rpn as the loss of the convolutional neural network part and update its parameters.
(7) Repeat steps (1) to (6) until convergence or the maximum number of iterations is reached.
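The loss weighting of step (6) can be sketched as follows (PyTorch, illustrative names; the four inputs are the scalar loss tensors computed in steps (3) and (5)):

```python
import torch

def total_losses(l_pcls, l_preg, l_ccls, l_creg):
    """Weighted loss combination from step (6)."""
    l_rpn = 10 * l_preg + l_pcls    # total loss of the RPN part
    l_cnet = 10 * l_creg + l_ccls   # total loss of the CNET part
    l_conv = l_cnet + l_rpn         # loss driving the shared convolutional layers
    return l_rpn, l_cnet, l_conv

print(total_losses(torch.tensor(0.2), torch.tensor(0.05),
                   torch.tensor(0.3), torch.tensor(0.04)))
```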
TABLE 5 Faster R-CNN training fine-tuning parameters
[Table 5 image in original: Faster R-CNN fine-tuning hyperparameters]
This embodiment is implemented with the Torch deep learning framework; the training parameters are shown in Table 5.
2.3, Detection process of Faster R-CNN:
The computation of Faster R-CNN in the detection stage differs slightly from the training stage and mainly comprises the following steps:
(1) Data preprocessing. The same preprocessing as in the training phase is performed, and the mean and variance of the training set are likewise used for mean removal.
(2) The convolutional neural network extracts tongue coating image features. The preprocessed tongue coating image is passed into the convolutional neural network to generate the feature map.
(3) The RPN generates candidate regions. Scanning the feature map generated in step (2) with the RPN sliding window produces the anchor region features; for anchor features that the cls layer judges as foreground with confidence greater than 0.95, the reg layer predicts the corresponding tongue coating area, which is listed as a candidate region.
(4) Candidate region screening. Redundant candidate regions are eliminated with the non-maximum suppression (NMS) method to find the best candidate regions (see the sketch after this list). Non-maximum suppression filters out non-maxima: always select the candidate region with the highest foreground confidence, filter out the other candidate regions whose IoU with it exceeds 0.25, and repeat this procedure on the remaining candidate regions until all high-confidence candidate regions have been screened out.
(5) RoI pooling generates fixed-size fully connected features. From the feature map generated in step (2), the features corresponding to the screened candidate regions are selected, and RoI pooling produces fully connected features of a fixed size.
(6) The fully connected features are passed to the Classifier and Box regression; the tongue coating is identified and the tongue coating areas are predicted.
(7) The predicted tongue coating areas are screened with the non-maximum suppression method.
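The screening of step (4) can be sketched as a plain greedy NMS with the thresholds stated above, a 0.95 foreground-confidence cutoff and a 0.25 IoU cutoff (plain Python, illustrative; the iou helper repeats the one sketched in section 2.2.1):

```python
def iou(a, b):
    """IoU of boxes (x, y, w, h)."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, conf_thresh=0.95, iou_thresh=0.25):
    """Greedy NMS: keep the highest-confidence box, drop overlaps > 0.25 IoU."""
    cand = sorted(zip(scores, boxes), reverse=True)
    keep = []
    while cand:
        score, best = cand.pop(0)
        if score <= conf_thresh:
            break  # only foreground candidates above 0.95 are considered
        keep.append(best)
        cand = [(s, b) for s, b in cand if iou(b, best) <= iou_thresh]
    return keep

boxes = [(10, 10, 100, 100), (12, 12, 100, 100), (300, 300, 80, 80)]
print(nms(boxes, [0.99, 0.97, 0.98]))  # keeps the 0.99 box and the distant 0.98 box
```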
Step (three), tongue coating calibration:
The Faster R-CNN model realizes the preliminary tongue coating detection function. A deep neural network, called the calibration network, is then designed to further calibrate the tongue coating position and improve detection accuracy.
3.1, structure of calibration network:
To solve a practical problem with deep neural networks and deep learning, the first step is to determine the learning task of the network and its model structure. Deep learning mainly offers two kinds of learning task: classification and regression. The problem to be solved in this section is calibration of the tongue coating position: if the deviation between the predicted tongue coating position and the actual position were known, the predicted position could be adjusted accordingly. The tongue coating position calibration problem is therefore treated as a classification problem: the deviation from the real tongue coating position is divided into several categories, the deviation category to which a predicted position belongs is determined, and the position is then adjusted according to that category, realizing the calibration.
Since this is a classification problem, the classes must be specified. In this embodiment, the detected tongue coating positions are classified into 45 categories according to the x_n offset, y_n offset and scaling ratio s_n relative to the actual tongue coating position, as shown in formula (5-1).
(s_n, x_n, y_n) with s_n ∈ {0.83, 0.91, 1.0, 1.10, 1.21}, x_n ∈ {-0.17, 0, 0.17}, y_n ∈ {-0.17, 0, 0.17}    (5-1)
The model structure of VGG-16 is used to implement the calibration network, as shown in FIG. 5.
3.2, tongue coating calibration database:
the tongue coating calibration database is constructed for tongue coating position calibration on the basis of the tongue coating detection database. The tongue coating area can be obtained from the tongue coating detection database, and the tongue coating calibration database generates a new area from the tongue coating area according to the deviation category, intercepts a new area image and marks the category of the image as the deviation category. The tongue coating calibration database is constructed for the purpose of enabling the deep neural network to learn tongue coating deviation images in the tongue coating calibration database, judging deviation categories of the tongue coating areas, and adjusting the tongue coating areas according to the deviation categories, so that the aim of tongue coating calibration is fulfilled. The tongue coating calibration method is inspired by the thesis of human face area calibration. The deviation categories comprise scaling proportion, deviation in the x direction and deviation in the y direction, and are specifically set in formula (1), so that 45 deviation categories can be formed. The generation mode of the deviation type area is shown in formula (2), wherein (x, y, w, h) is the upper left x coordinate, the upper left y coordinate, the width and the height of the real tongue coating area. By using the inverse operation of the formula (2), the tongue coating area can be adjusted through the deviation type, so as to achieve the purpose of tongue coating calibration. And (3) generating 45 areas in each picture of the tongue coating detection database according to the steps (1) and (2), intercepting and storing the areas, so that 45 new tongue coating images are generated in each tongue coating image in the tongue coating detection database, the labeling types of the new tongue coating images are deviation types, and the new images form a tongue coating calibration database.
[Formula (1) image in original: the deviation category settings s_n ∈ {0.83, 0.91, 1.0, 1.10, 1.21}, x_n ∈ {-0.17, 0, 0.17}, y_n ∈ {-0.17, 0, 0.17}]
[Formula (2) image in original: generation of the deviated region from the real tongue coating area (x, y, w, h) and (s_n, x_n, y_n)]
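Since formulas (1) and (2) survive only as image references, the following sketch is an assumption modeled on the face-calibration work that inspired this method: the deviated region is taken as (x + x_n·w, y + y_n·h, s_n·w, s_n·h), and the inverse operation undoes it. Treat both the generation and the inverse as hypothetical reconstructions:

```python
from itertools import product

S = (0.83, 0.91, 1.0, 1.10, 1.21)
X = (-0.17, 0.0, 0.17)
Y = (-0.17, 0.0, 0.17)
CLASSES = list(product(S, X, Y))  # the 45 (s_n, x_n, y_n) deviation categories

def make_deviated(x, y, w, h, s_n, x_n, y_n):
    """Assumed formula (2): perturb the real area (x, y, w, h)."""
    return (x + x_n * w, y + y_n * h, s_n * w, s_n * h)

def calibrate(x, y, w, h, s_n, x_n, y_n):
    """Assumed inverse operation: recover the real area from a deviated one."""
    w0, h0 = w / s_n, h / s_n
    return (x - x_n * w0, y - y_n * h0, w0, h0)

# Round trip: deviating then calibrating returns the original area.
dev = make_deviated(100, 50, 200, 120, *CLASSES[7])
print(calibrate(*dev, *CLASSES[7]))  # -> (100.0, 50.0, 200.0, 120.0) up to rounding
```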
3.3, training of a calibration network:
This embodiment trains a classifier of tongue coating deviation categories. The training process is as follows:
(1) Data preprocessing. The data preprocessing stage mainly performs mean removal and a data augmentation operation that flips images horizontally with probability 0.5.
(2) Forward propagation. The preprocessed training-set images are passed into the calibration network; after convolution, ReLU, pooling, fully connected and other operations, the softmax over the 45 classes is computed.
(3) Backward propagation. The softmax loss is propagated backwards and the network parameters are updated.
(4) Iteration. Operations (1) to (3) are repeated until convergence or the maximum number of iterations is reached.
(5) Every few iterations, the test set is used to verify the effectiveness of the network model by checking the error rate on the test set.
The training parameters of this embodiment are shown in Table 6. Using transfer learning, the non-fully-connected part of the calibration network is initialized with a VGG-16 model trained on ImageNet images.
TABLE 6 calibration of network training parameters
[Table 6 image in original: calibration network training hyperparameters]
3.4, Calibration detection with the calibration network:
The detection process of the calibration network comprises the following steps:
(1) Resize the tongue coating area image initially detected in step (two) to 224 × 224.
(2) After mean removal, pass it into the calibration network to obtain the deviation category.
(3) Derive the corresponding x_n deviation, y_n deviation and scaling ratio s_n from the deviation category.
(4) Adjust the tongue coating area with the inverse operation of formula (5-1); the adjusted area is the final tongue coating detection area.
Step (four), segmenting the tongue coating area image according to the calibrated coordinates:
This step is very simple: the coordinates of the four vertices of the tongue coating area are computed directly from the calibrated coordinates, and the enclosed region is cropped out.
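A minimal sketch of this step with NumPy (illustrative): the four vertices follow directly from the calibrated (x, y, w, h), and the segmentation is an array crop:

```python
import numpy as np

def crop_tongue(image, x, y, w, h):
    """Compute the four vertices of the calibrated area and crop it."""
    x, y, w, h = int(round(x)), int(round(y)), int(round(w)), int(round(h))
    vertices = [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
    return vertices, image[y:y + h, x:x + w]

img = np.zeros((480, 640, 3), dtype=np.uint8)
verts, tongue = crop_tongue(img, 120.4, 80.2, 200, 150)
print(verts, tongue.shape)  # four corner points and a (150, 200, 3) crop
```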
Effect of this embodiment:
the Faster R-CNN training data set is a tongue coating detection database, 5683 sheets in total, and is divided into a training set and a test set in a ratio of 4: 1. The tongue coating detection model after 50000 times of iterative training on the training set needs to evaluate the detection effect on the test set. The test set consisted of 1138 tongue coating images, each containing only one tongue coating. For further analysis of the detection effect, the detection results on the test set are classified into the following categories: correct means that the classification is accurate, and IoU between a prediction box and a true value is more than 0.5; localization means accurate classification, IoU is between 0.1 and 0.5; background means that the classification is accurate, but IoU is less than 0.1; other refers to misclassification.
Correct and Localization results account for the vast majority of the preliminary detection results on the test set, so the model can basically locate the tongue coating; the Correct detection accuracy is 78%, while the Localization category still accounts for 19%, which can be further improved through tongue coating calibration.
The tongue coating area detected in step (two) is then automatically calibrated by the tongue coating deviation category classifier realized by the calibration network. The calibrated tongue coating detection results on the test set show that the Correct rate rises from 78% to 91% and the Localization rate falls from 19% to 7%, demonstrating that the calibration network effectively improves tongue coating detection and verifying the effectiveness of the method.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that various equivalent changes, modifications, substitutions and alterations can be made herein without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims (4)

1. A tongue coating automatic segmentation method based on deep learning is characterized by comprising the following steps:
s1, collecting and inputting an image containing the tongue coating;
s2, detecting the tongue coating of the image containing the tongue coating by adopting a Faster R-CNN deep learning method, and automatically obtaining a preliminary tongue coating area image;
S3, calibrating the preliminary tongue coating area image by adopting a VGG deep learning method to obtain a more accurate tongue coating area image;
S4, automatically segmenting the tongue coating image according to the calibrated tongue coating area image;
the step S3 specifically includes:
S31, adjusting the size of the preliminary tongue coating area image, performing mean-removal processing, and passing the image into the calibration network to obtain a deviation category;
S32, obtaining the corresponding x_n deviation, y_n deviation and scaling ratio s_n according to the deviation category;
S33, adjusting the tongue coating area by the inverse operation according to the x_n deviation, y_n deviation and scaling ratio s_n, the adjusted area being the final tongue coating detection area, wherein the inverse operation formula is as follows:
[Formula image in original: the inverse of the region-generation formula, recovering the real area (x, y, w, h) from the deviated area and (s_n, x_n, y_n)]
wherein (x, y, w, h) are the top-left x coordinate, top-left y coordinate, width and height of the real tongue coating area; and:
s_n ∈ {0.83, 0.91, 1.0, 1.10, 1.21}
x_n ∈ {-0.17, 0, 0.17}
y_n ∈ {-0.17, 0, 0.17}.
2. the method for automatically segmenting the tongue coating based on the deep learning of claim 1, wherein the image containing the tongue coating is collected in the step S1 and is collected by a smart mobile device or a camera or a computer.
3. The method for automatically segmenting tongue coating based on deep learning of claim 1, wherein the Faster R-CNN deep learning method in step S2 requires constructing a Faster R-CNN model structure, and the Faster R-CNN model structure includes a convolutional neural network, an RPN, RoI pooling and CNET.
4. The method for automatically segmenting the tongue coating based on deep learning as claimed in claim 1, wherein step S4 specifically comprises: calculating the coordinates of the four vertices of the calibrated tongue coating area image, connecting the four vertices, and segmenting the area within them to obtain the final tongue coating image.
CN201710338958.5A 2017-05-15 2017-05-15 Tongue coating automatic segmentation method based on deep learning Active CN107610087B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710338958.5A CN107610087B (en) 2017-05-15 2017-05-15 Tongue coating automatic segmentation method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710338958.5A CN107610087B (en) 2017-05-15 2017-05-15 Tongue coating automatic segmentation method based on deep learning

Publications (2)

Publication Number Publication Date
CN107610087A CN107610087A (en) 2018-01-19
CN107610087B true CN107610087B (en) 2020-04-28

Family

ID=61059668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710338958.5A Active CN107610087B (en) 2017-05-15 2017-05-15 Tongue coating automatic segmentation method based on deep learning

Country Status (1)

Country Link
CN (1) CN107610087B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108564113A (en) * 2018-03-27 2018-09-21 华南理工大学 A kind of tongue fur constitution recognition methods perceived based on deep neural network and complexity
CN108596087B (en) * 2018-04-23 2020-09-15 合肥湛达智能科技有限公司 Driving fatigue degree detection regression model based on double-network result
CN108734108B (en) * 2018-04-24 2021-08-03 浙江工业大学 Crack tongue identification method based on SSD network
CN108665471A (en) * 2018-05-30 2018-10-16 高鹏 A kind of human body back curve acquisition methods and system based on camera
CN108960285B (en) * 2018-05-31 2021-05-07 东软集团股份有限公司 Classification model generation method, tongue image classification method and tongue image classification device
CN109199334B (en) * 2018-09-28 2021-06-22 小伍健康科技(上海)有限责任公司 Tongue picture constitution identification method and device based on deep neural network
CN109815802A (en) * 2018-12-18 2019-05-28 中国海洋大学 A kind of monitor video vehicle detection and recognition method based on convolutional neural networks
CN109700433A (en) * 2018-12-28 2019-05-03 深圳铁盒子文化科技发展有限公司 A kind of tongue picture diagnostic system and lingual diagnosis mobile terminal
CN109977952B (en) * 2019-03-27 2021-10-22 深动科技(北京)有限公司 Candidate target detection method based on local maximum
CN110176944A (en) * 2019-04-25 2019-08-27 中国科学院上海微系统与信息技术研究所 A kind of intelligent means for anti-jamming and method based on deep learning
CN110163855B (en) * 2019-05-17 2021-01-01 武汉大学 Color image quality evaluation method based on multi-path deep convolutional neural network
CN110503088B (en) * 2019-07-03 2024-05-07 平安科技(深圳)有限公司 Target detection method based on deep learning and electronic device
CN110796186A (en) * 2019-10-22 2020-02-14 华中科技大学无锡研究院 Dry and wet garbage identification and classification method based on improved YOLOv3 network
CN111079822A (en) * 2019-12-12 2020-04-28 哈尔滨市科佳通用机电股份有限公司 Method for identifying dislocation fault image of middle rubber and upper and lower plates of axle box rubber pad
CN112686340B (en) * 2021-03-12 2021-07-13 成都点泽智能科技有限公司 Dense small target detection method based on deep neural network
CN113419868B (en) * 2021-08-23 2021-11-16 南方科技大学 Temperature prediction method, device, equipment and storage medium based on crowdsourcing
CN115994947B (en) * 2023-03-22 2023-06-02 万联易达物流科技有限公司 Positioning-based intelligent card punching estimation method
CN116777930B (en) * 2023-05-24 2024-01-09 深圳汇医必达医疗科技有限公司 Image segmentation method, device, equipment and medium applied to tongue image extraction

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2188779B1 (en) * 2007-09-21 2014-04-23 Korea Institute of Oriental Medicine Extraction method of tongue region using graph-based approach and geometric properties
WO2014183246A1 (en) * 2013-05-13 2014-11-20 Huang Bo Medical image processing method and system
CN105160346A (en) * 2015-07-06 2015-12-16 上海大学 Tongue coating greasyness identification method based on texture and distribution characteristics
CN106250812A (en) * 2016-07-15 2016-12-21 A vehicle model recognition method based on the fast R-CNN deep neural network
CN106295139A (en) * 2016-07-29 2017-01-04 Tang Ping (汤平) A tongue body self-diagnosis health cloud service system based on deep convolutional neural networks
CN106651887A (en) * 2017-01-13 2017-05-10 深圳市唯特视科技有限公司 Image pixel classifying method based convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks; Shaoqing Ren et al.; arXiv:1506.01497v3; 2016-01-06; Figure 2 of main text *

Also Published As

Publication number Publication date
CN107610087A (en) 2018-01-19

Similar Documents

Publication Publication Date Title
CN107610087B (en) Tongue coating automatic segmentation method based on deep learning
CN110599448B (en) Migratory learning lung lesion tissue detection system based on MaskScoring R-CNN network
CN110348319B (en) Face anti-counterfeiting method based on face depth information and edge image fusion
WO2020253629A1 (en) Detection model training method and apparatus, computer device, and storage medium
CN107506761B (en) Brain image segmentation method and system based on significance learning convolutional neural network
CN108648191B (en) Pest image recognition method based on Bayesian width residual error neural network
CN109345527B (en) Bladder tumor detection method based on MaskRcnn
CN113592845A (en) Defect detection method and device for battery coating and storage medium
CN107862694A (en) A kind of hand-foot-and-mouth disease detecting system based on deep learning
Kudva et al. Automation of detection of cervical cancer using convolutional neural networks
CN110930416A (en) MRI image prostate segmentation method based on U-shaped network
CN114241548A (en) Small target detection algorithm based on improved YOLOv5
CN109033978B (en) Error correction strategy-based CNN-SVM hybrid model gesture recognition method
WO2021136368A1 (en) Method and apparatus for automatically detecting pectoralis major region in molybdenum target image
CN109902584A (en) A kind of recognition methods, device, equipment and the storage medium of mask defect
CN111652317A (en) Hyper-parameter image segmentation method based on Bayesian deep learning
CN110543906B (en) Automatic skin recognition method based on Mask R-CNN model
Jin et al. Construction of retinal vessel segmentation models based on convolutional neural network
Zhang et al. High-quality face image generation based on generative adversarial networks
Lan et al. Run: Residual u-net for computer-aided detection of pulmonary nodules without candidate selection
Li et al. Natural tongue physique identification using hybrid deep learning methods
WO2024032010A1 (en) Transfer learning strategy-based real-time few-shot object detection method
CN111127400A (en) Method and device for detecting breast lesions
CN111241957A (en) Finger vein in-vivo detection method based on multi-feature fusion and DE-ELM
CN113066054B (en) Cervical OCT image feature visualization method for computer-aided diagnosis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant