CN108109160A - An interaction-free GrabCut tongue segmentation method based on deep learning - Google Patents
An interaction-free GrabCut tongue segmentation method based on deep learning
- Publication number
- CN108109160A (application CN201711133796.8A)
- Authority
- CN
- China
- Prior art keywords
- tongue
- layer
- candidate region
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis; G06T7/10—Segmentation; Edge detection
- G06T7/162—involving graph-based methods
- G06T7/194—involving foreground-background segmentation
- G06T2207/00—Indexing scheme for image analysis or image enhancement; G06T2207/20—Special algorithmic details
- G06T2207/20072—Graph-based image processing
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/20112—Image segmentation details
- G06T2207/20132—Image cropping
Abstract
An interaction-free GrabCut tongue segmentation method based on deep learning, comprising a deep convolutional neural network for whole-tongue feature extraction, a region-of-interest localization network for preliminary detection of the tongue region, a deep convolutional neural network for deep abstract feature extraction from the regions of interest, and a GrabCut algorithm for segmenting the tongue image. The invention effectively solves the problem that the existing GrabCut algorithm depends too heavily on human interaction when segmenting the tongue body, and improves the degree of automation of the GrabCut algorithm in tongue segmentation.
Description
Technical field
The present invention relates to a segmentation method, and in particular to the application of TCM tongue diagnosis, computer vision, digital image processing, pattern recognition, deep learning, deep convolutional neural networks and related techniques to the field of automatic tongue image segmentation.
Background technology
Tongue diagnosis is an important component of inspection in traditional Chinese medicine: by observing related attributes of the tongue coating and tongue body, including color and form, the physician judges where the disease lies and then guides diagnosis and treatment. Today, the standardization, quantification and objectification of TCM tongue diagnosis have become the main research directions in the modernization of TCM diagnostics, and are of profound significance for the development of traditional Chinese medicine as a whole.
Research on standardizing, quantifying and objectifying tongue diagnosis has grown out of techniques such as photography, digital image processing, pattern recognition and computer vision. It mainly covers tongue image acquisition, color correction, tongue segmentation, region division (coating-body separation), and the analysis of tongue color, tongue shape, tooth marks, tongue body and sublingual vessels. These studies form the basis of modernized tongue diagnosis applications and are essential to making tongue diagnosis quantitative and objective.
Precisely separating the tongue body from the tongue image is a prerequisite for tongue diagnosis. In recent years researchers have applied the GrabCut algorithm to tongue segmentation and achieved a certain effect. In use, however, the GrabCut algorithm requires the foreground-background bounding box to be given through human interaction, a step that greatly reduces the automation of the algorithm.
This invention proposes an interaction-free GrabCut algorithm based on deep learning: a deep convolutional neural network locates the tongue body and derives the foreground-background bounding box automatically. The two key techniques the invention relies on are introduced as follows:
(1) Convolutional neural networks
Deep learning has been widely applied in computer vision in recent years, thanks to the rapid development of deep learning technology. Convolutional neural networks can make full use of large numbers of training samples to extract abstract information layer by layer, learning the deep features of images more directly and more comprehensively. In a large number of tasks these features have proven to have stronger representational power than traditional hand-crafted features and can describe the overall structure of an image in more detail. Convolutional network technology has developed from R-CNN and Fast R-CNN to Faster R-CNN, and from CNN to FCN, covering almost all the key areas of computer vision such as object detection, classification and segmentation.
Convolutional neural networks are built in imitation of the human perceptual system. The brain processes information layer by layer, from the concrete to the abstract: the low-level features of the input information are processed and extracted to obtain the essential information of the data, forming higher-level abstractions the brain can understand. This hierarchical structure preserves the essential information of an object while reducing the amount of data the brain must process. By simulating this pyramid-like transfer of information, a deep convolutional neural network gains an important advantage: it extracts information layer by layer, from pixel-level raw data up to abstract semantic concepts, giving it a prominent strength in extracting the deep features and semantic information of images.
(2) The GrabCut algorithm
GrabCut is a very effective segmentation method. It first requires the user to mark foreground and background information by hand, i.e. to specify a rectangle containing the foreground, and then models the foreground and background with Gaussian mixture models (GMM). Based on the user's input, the GMMs learn and create new pixel distributions; pixels of unknown class are classified according to their relationships with pixels of known class. A graph can then be created from this pixel distribution, with the pixels as its nodes, and the graph is segmented with a min-cut algorithm.
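The GMM-based classification of unknown pixels described above can be sketched as follows. This is an illustrative simplification rather than the patent's implementation: GrabCut proper fits multi-component GMMs over RGB values, while this sketch uses a single Gaussian per class over grayscale intensities.

```python
import numpy as np

def fit_gaussian(pixels):
    # Fit a single Gaussian (mean, variance) to 1-D pixel samples.
    # GrabCut itself uses a multi-component GMM per class over RGB;
    # this single-Gaussian grayscale version only illustrates the idea.
    return pixels.mean(), pixels.var() + 1e-6

def log_likelihood(x, mean, var):
    # Log-density of x under a Gaussian with the given mean and variance.
    return -0.5 * np.log(2 * np.pi * var) - (x - mean) ** 2 / (2 * var)

def classify(unknown, fg_pixels, bg_pixels):
    # Label each unknown pixel foreground (1) or background (0) by
    # comparing its likelihood under the two class models.
    fg = fit_gaussian(fg_pixels)
    bg = fit_gaussian(bg_pixels)
    return (log_likelihood(unknown, *fg) > log_likelihood(unknown, *bg)).astype(int)

fg = np.array([200.0, 210.0, 205.0, 195.0])  # bright, tongue-like samples
bg = np.array([30.0, 25.0, 40.0, 35.0])      # dark background samples
print(classify(np.array([198.0, 33.0]), fg, bg))  # → [1 0]
```

In the full algorithm this likelihood comparison only sets the data term of the graph; the final labels come from the min-cut step.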
Summary of the invention
To overcome the problem that the existing GrabCut algorithm, when used for tongue segmentation, requires human interaction to supply the foreground-background bounding box and is therefore poorly automated, the present invention proposes an interaction-free GrabCut tongue segmentation method based on deep learning: a deep convolutional neural network is built to locate the tongue body automatically, so the foreground-background bounding box is obtained without manual input, improving the degree of automation of the segmentation algorithm.
The technical solution adopted by the present invention to solve this technical problem is as follows:
An interaction-free GrabCut tongue segmentation method based on deep learning comprises a deep convolutional neural network for whole-tongue feature extraction, a region-of-interest localization network for preliminary detection of the tongue region, a deep convolutional neural network for deep abstract feature extraction from the regions of interest, and a GrabCut algorithm for segmenting the tongue image;
The deep convolutional neural network for whole-tongue feature extraction serves as the base network of the whole network model. It is divided into five layers, a deep structure of alternating convolutional, activation and pooling layers, and implicitly performs unsupervised learning from the given tongue image data, avoiding explicit manual feature extraction;
The region-of-interest localization network for preliminary detection of the tongue region, i.e. the RPN network, detects and partitions the regions corresponding to the different attributes of the tongue surface and obtains preliminary tongue proposals;
The deep convolutional neural network for deep abstract feature extraction from the regions of interest consists of fully connected layers. It performs deep feature extraction on the tongue proposal regions obtained in the previous stage: the input region is mapped layer by layer through the network, yielding different representations and extracting its abstract features, thereby achieving a deep representation of the tongue image and obtaining the tongue localization result.
The GrabCut algorithm for segmenting the tongue image takes the tongue bounding box obtained above as input, thereby distinguishing the foreground and background of the tongue image and completing automatic tongue segmentation without any human interaction.
Further, the deep convolutional neural network for whole-tongue feature extraction is divided into five layers; the convolutional network is a deep structure of alternating convolutional, activation and pooling layers. The convolution operations enhance the original signal and reduce noise; the pooling operations exploit the principle of local image correlation to sub-sample the image, reducing the amount of data to process while retaining the useful information of the image;
The network accepts a tongue image of arbitrary size as input; its structure is as follows. The first convolutional layer Conv1 has 96 kernels of size 7 × 7 × 3, stride 2, padding 3. The first pooling layer Pool1 has a 7 × 7 × 3 pooling kernel, stride 2, padding 1, followed by a ReLU activation. The second convolutional layer Conv2 has 256 kernels of size 5 × 5 × 96, stride 2, padding 2. The second pooling layer Pool2 has a 7 × 7 × 96 pooling kernel, stride 2, padding 1, followed by a ReLU activation. The third convolutional layer Conv3 has 384 kernels of size 3 × 3 × 256, padding 1, followed by a ReLU activation. The fourth convolutional layer Conv4 has 384 kernels of size 3 × 3 × 384, padding 1, followed by a ReLU activation. The fifth convolutional layer Conv5 has 256 kernels of size 3 × 3 × 384, padding 1, followed by a ReLU activation.
Through these five layers of feature extraction, each tongue image yields 256 feature maps, which serve as the input to the RPN network.
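As a sanity check on the layer specification above, the spatial size of the feature maps can be traced with the standard output-size formula. The strides of Conv3 to Conv5 are not stated in the text, so stride 1 is assumed here, and the 224-pixel input is only an example, since the network accepts arbitrary sizes.

```python
def out_size(n, k, s, p):
    # Standard conv/pool output size: floor((n + 2p - k) / s) + 1.
    return (n + 2 * p - k) // s + 1

# (kernel, stride, padding) per layer as listed in the text; the strides
# of Conv3-Conv5 are not stated there, so stride 1 is assumed.
layers = [("Conv1", 7, 2, 3), ("Pool1", 7, 2, 1),
          ("Conv2", 5, 2, 2), ("Pool2", 7, 2, 1),
          ("Conv3", 3, 1, 1), ("Conv4", 3, 1, 1), ("Conv5", 3, 1, 1)]

n = 224  # example input width/height
for name, k, s, p in layers:
    n = out_size(n, k, s, p)
    print(name, n)  # spatial size after each layer
```

Under these assumptions a 224 × 224 input shrinks to a 12 × 12 × 256 feature volume, i.e. the 256 feature maps fed to the RPN.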
Further, in the region-of-interest localization network for preliminary detection of the tongue region, the RPN network receives the 256 feature maps generated by the base network as input, post-processes the feature maps with three convolutional layers and an algorithm layer, and outputs a set of rectangular target candidate boxes, each containing 4 position-coordinate variables and a score;
The first convolutional layer Conv1/rpn of the RPN network has 256 kernels of size 3 × 3 × 256; the second convolutional layer Conv2/rpn has 18 kernels of size 1 × 1 × 256; the third convolutional layer Conv3/rpn has 36 kernels of size 1 × 1 × 256;
The RPN network additionally adds an algorithm layer to generate the region candidate boxes, performing a multi-scale convolution operation on the feature map. It is implemented as follows: at each sliding-window position, 3 scales and 3 aspect ratios are used; with the center of the current sliding window as center, each combination of scale and aspect ratio is mapped back to the original image, giving candidate regions of 9 different sizes. For a shared convolutional feature map of size w × h there are thus w × h × 9 candidate regions in total. Finally, the classification layer outputs the scores of the w × h × 9 candidate regions as w × h × 9 × 2 values, i.e. the estimated target/non-target probability of each region, and the regression layer outputs w × h × 9 × 4 parameters, i.e. the coordinate parameters of the candidate regions;
The RPN training process is as follows. First, a 3 × 3 sliding window traverses every point of the feature map; the position in the original image to which the window center maps is found, and, with that point as center, candidate regions of 3 scales (128², 256², 512²) and 3 aspect ratios (1:1, 2:1, 1:2) are generated in the original image. Each point of the feature map thus corresponds to 9 candidate regions in the original image, so for a feature map of size w × h the number of generated candidate regions is w × h × 9. Next, all candidate regions are screened and labeled in two passes. First, candidate regions extending beyond the original image are deleted, completing the first screening; for each remaining candidate region, its intersection over union (overlap ratio) with every ground-truth region is then computed, and according to this ratio each candidate region is assigned a binary label that decides whether the region is tongue. The criteria are: 1) the candidate region with the maximum ratio is regarded as a positive sample, i.e. tongue; 2) among the other candidate regions, those with a ratio above 0.7 are regarded as positive samples and those below 0.3 as negative samples, i.e. background; candidate regions with a ratio in between are discarded.
The overlap ratio between a candidate region A and the ground-truth box GT is computed by formula (1):
IoU(A, GT) = area(A ∩ GT) / area(A ∪ GT)  (1)
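The intersection-over-union computation can be sketched as a short Python function; the (x1, y1, x2, y2) corner format for boxes is an assumption made for illustration.

```python
def iou(box_a, box_b):
    # Intersection over union of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # → 0.3333... (50 / 150)
```

A value above 0.7 would mark the candidate as tongue, below 0.3 as background, per the labeling criteria above.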
After the second screening of the candidate regions is complete, a second labeling pass is performed: each positive candidate region takes as its label the label of the ground-truth region with which it has the maximum intersection over union, i.e. the foreground label, and all negative samples are given the background label. The positive and negative samples are then randomly sampled, with the sample count set to 128 and the positive-to-negative ratio set to 1:1. Positive samples are usually the fewer; if there are fewer than 64 of them, the shortfall is made up with negative samples. The 128 positive and negative samples are merged and trained together in the subsequent network to enhance the discrimination between labeled and unlabeled samples.
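The candidate-region generation underlying this training procedure (3 scales × 3 aspect ratios per sliding-window position) can be sketched as below. Keeping the area fixed at scale² while varying the aspect ratio follows the usual RPN anchor convention, which is assumed here rather than spelled out in the text.

```python
import math

SCALES = [128, 256, 512]           # candidate areas are scale**2 (128², 256², 512²)
RATIOS = [(1, 1), (2, 1), (1, 2)]  # width : height

def anchors_at(cx, cy):
    # The 9 candidate boxes (x1, y1, x2, y2) centred on the point in the
    # original image to which one sliding-window position maps.
    boxes = []
    for s in SCALES:
        for rw, rh in RATIOS:
            w = s * math.sqrt(rw / rh)  # keep area s*s under the ratio
            h = s * math.sqrt(rh / rw)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

boxes = anchors_at(300, 300)
print(len(boxes))  # → 9, so a w × h feature map yields w * h * 9 regions
```

Boxes extending beyond the image border would be deleted in the first screening pass described above.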
Further, the deep convolutional neural network for deep abstract feature extraction from the regions of interest consists of fully connected layers, preceded by a pyramid pooling layer that normalizes the input size;
This sub-network extracts features from the sampled candidate regions with fully connected layers. The candidate regions come in 9 sizes, while fully connected layers require inputs of a consistent size, so a pyramid pooling layer first performs size normalization. The result is then fed to three fully connected layers for deep feature extraction; the output neuron count of these layers is set to 1024, yielding a 1024-dimensional feature vector. This feature vector is then fed into two fully connected layers for feature compression, with output neuron counts of 2 and 8 respectively. Finally, the outputs are compared with the ground-truth label values and constrained by regression of the loss functions;
The loss function is given by formula (2):
L({p_i}, {t_i}) = (1/N_cls) Σ L_cls(p_i, p_i*) + λ (1/N_reg) Σ p_i* L_reg(t_i, t_i*)  (2)
In the formula, the classification loss is defined by formula (3):
L_cls(p_i, p_i*) = −log[p_i p_i* + (1 − p_i)(1 − p_i*)]  (3)
The position regression loss is defined by formula (4):
L_reg(t_i, t_i*) = R(t_i − t_i*)  (4)
R is the robust smooth-L1 loss function, given by formula (5):
R(x) = 0.5 x², if |x| < 1; |x| − 0.5, otherwise  (5)
In the formulas, N_cls and N_reg are regularization terms that guard against over-fitting, λ is a weight coefficient, i is the class index of the candidate region, t_i is the predicted coordinate offset of the candidate region, t_i* is its ground-truth coordinate offset, p_i is the probability that the predicted candidate region belongs to the i-th class, and p_i* denotes its true class: p_i* = 0 denotes the background class and p_i* = 1 the tongue class;
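The robust smooth-L1 term named above is quadratic near zero and linear elsewhere; the threshold of 1 follows the common Fast R-CNN definition, which is assumed here.

```python
def smooth_l1(x):
    # Robust smooth-L1: 0.5 * x**2 for |x| < 1, |x| - 0.5 otherwise.
    # The threshold of 1 follows the common Fast R-CNN definition.
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

print(smooth_l1(0.5))  # → 0.125  (quadratic region)
print(smooth_l1(3.0))  # → 2.5    (linear region)
```

The linear tail makes the regression loss less sensitive to outlier coordinate errors than a plain squared loss.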
These two loss functions compute the errors between the predicted values and the given ground truth; the back-propagation algorithm passes the errors back layer by layer, and stochastic gradient descent adjusts and updates the parameters of every layer, with the update rule shown as formula (6):
w' = w − η ∂E/∂w  (6)
so that the network's predictions come ever closer to the true values, i.e. the outputs of the last two fully connected layers approach the class and location information given in the labels;
In the formula, w and w' are the parameter values before and after the update respectively, E is the error value computed by the loss function layer, and η is the learning rate.
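The parameter update described here can be written as a one-line rule; plain stochastic gradient descent without momentum or weight decay is assumed, since the text mentions neither.

```python
import numpy as np

def sgd_step(w, grad, lr):
    # One stochastic-gradient-descent update: w' = w - eta * dE/dw.
    return w - lr * grad

w = np.array([0.5, -0.2])      # current layer parameters
g = np.array([0.1, -0.4])      # gradient of the loss E w.r.t. w
print(sgd_step(w, g, lr=0.1))  # → [ 0.49 -0.16]
```

Applied layer by layer with back-propagated gradients, this drives the outputs of the last two fully connected layers toward the labeled class and location values.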
The GrabCut algorithm for segmenting the tongue image comprises the following steps:
Step 1: Given the foreground-background bounding box, which marks the foreground and background information, GrabCut performs statistical modeling of the foreground data and the background data with Gaussian mixture models (GMM). The GMMs learn and create new pixel distributions; pixels of unknown class are classified according to their pixel relationships with pixels of known class;
Step 2: From the result of Step 1, a graph is created from the pixel distribution, with the pixels as its nodes. Besides the pixel nodes there are two further nodes, Source_node and Sink_node: all foreground pixels are connected to Source_node and all background pixels to Sink_node. The weight of the edge connecting a pixel to Source_node or Sink_node is determined by the probability that they belong to the same class (both foreground or both background); the weight between two pixels is determined by the edge information or by the similarity of the two pixels. If the colors of two pixels are very different, the weight of the edge between them will be very small;
Step 3: The graph obtained above is segmented with a min-cut algorithm, which divides the graph into a Source_node part and a Sink_node part according to a minimum-cost function; the cost is the sum of the weights of all edges that are cut. After the cut, all pixels connected to Source_node are regarded as foreground and all pixels connected to Sink_node as background;
Step 4: This process is repeated until the classification converges, at which point the segmentation is complete.
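The min-cut of Step 3 can be illustrated with a tiny max-flow computation: by the max-flow/min-cut theorem, the maximum flow from Source_node to Sink_node equals the cost of the cheapest cut. The Edmonds-Karp algorithm below is a didactic choice for a toy two-pixel graph; practical GrabCut implementations use faster specialized min-cut solvers.

```python
from collections import deque

def max_flow(cap, s, t):
    # Edmonds-Karp max-flow on an adjacency-matrix capacity graph.
    # Its value equals the minimum s-t cut, as used in GrabCut's Step 3.
    n = len(cap)
    flow = [[0] * n for _ in range(n)]
    total = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            return total  # no augmenting path left: flow is maximal
        # Find the bottleneck along the path, then push that much flow.
        v, bottleneck = t, float("inf")
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v] - flow[u][v])
            v = u
        v = t
        while v != s:
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u
        total += bottleneck

# Toy graph: node 0 = Source_node, node 3 = Sink_node, nodes 1-2 = pixels.
cap = [[0, 9, 1, 0],
       [0, 0, 2, 1],
       [0, 0, 0, 9],
       [0, 0, 0, 0]]
print(max_flow(cap, 0, 3))  # → 4 (cut severs edges 0→2, 1→2, 1→3)
```

After the cut, pixels reachable from the source in the residual graph are labeled foreground and the rest background, matching Step 3.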
The beneficial effect of the present invention is that no foreground-background bounding box needs to be given manually, which improves the degree of automation of the segmentation algorithm.
Description of the drawings
Fig. 1 is the overall network framework for tongue localization;
Fig. 2 is the RPN network structure;
Fig. 3 shows partial results of tongue segmentation;
Fig. 4 is the flow chart of the interaction-free GrabCut tongue segmentation method based on deep learning.
Specific embodiment
The present invention will be further described below in conjunction with the accompanying drawings.
Referring to Figs. 1 to 4, an interaction-free GrabCut tongue segmentation method based on deep learning comprises a deep convolutional neural network for whole-tongue feature extraction, a region-of-interest localization network for preliminary detection of the tongue region, a deep convolutional neural network for deep abstract feature extraction from the regions of interest, and a GrabCut algorithm for segmenting the tongue image, configured as described in the summary above.
The depth convolutional neural networks for the extraction of tongue global feature, the facilities network as whole network model
Network is divided into five layers, the depth structure being alternately made of convolutional layer, active coating and pond layer, implicitly from given tongue picture number
According to middle carry out unsupervised learning, avoid and manually carry out explicit feature extraction;
It is described to be used to position network, i.e. RPN networks to the area-of-interest of tongue body region Preliminary detection, on lingual surface not
With attribute, corresponding region is detected and divides, and obtains the preliminary advice result of tongue body;
The depth convolutional neural networks for being used to carry out area-of-interest deep layer abstract characteristics extraction, by connecting entirely
Layer composition carries out further feature extraction to the tongue body suggestion areas obtained on last stage, and input area carries out layer by layer in a network
Mapping, obtains different representations, extracts its abstract characteristics, so as to fulfill the depth representing to tongue picture, obtains tongue body positioning
Result.
The GrabCut algorithms being split to tongue picture, using tongue body posting obtained above as input, thus
The foreground and background of tongue picture figure is distinguished, and then the automatic segmentation of tongue body is completed on the premise of without man-machine interactively.
Further, the depth convolutional neural networks for the extraction of tongue global feature, are divided into five layers, convolutional Neural
Network is the depth structure being alternately made of convolutional layer, active coating and pond layer;By convolution operation, prime information is made to enhance and subtract
Few noise;It is operated by pondization, using the principle of image local correlation, sub-sample is carried out to image, image is useful retaining
The treating capacity of data is reduced on the basis of information;
Network receives the tongue picture of arbitrary dimension as inputting, and network structure is as follows:The convolution kernel of first convolutional layer Conv1
Number is 96, and size is 7 × 7 × 3, and convolution step-length is 2, Filling power 3;The Chi Huahe of first pond layer (Pool1) for 7 ×
7 × 3, pond step-length is 2, Filling power 1;ReLU active coatings 1 are then carried out to handle;Second convolutional layer Conv2 has 256 volumes
Product core, size are 5 × 5 × 96, step-length 2, Filling power 2;The Chi Huahe of second pond layer Pool2 is 7 × 7 × 96, step-length
For 2, Filling power 1;ReLU active coatings 1 are then carried out to handle;3rd convolutional layer Conv3 has 384 convolution kernels, size 3
× 3 × 256, Filling power 1;ReLU active coatings 1 are then carried out to handle;4th convolutional layer Conv4 has 384 convolution kernels, greatly
Small is 3 × 3 × 384, Filling power 1;ReLU active coatings 1 are then carried out to handle;5th convolutional layer Conv5 has 256 convolution
Core, size are 3 × 3 × 384, Filling power 1;ReLU active coatings 1 are then carried out to handle;
By this five layers of feature extraction, every tongue picture obtains 256 characteristic patterns, the input as RPN networks.
Further, described to be used in the area-of-interest positioning network to tongue body region Preliminary detection, RPN networks receive
256 characteristic patterns of basic network generation carry out after-treatment as inputting, using three convolutional layers and algorithm layer to characteristic pattern,
The set of rectangular target candidate frame is exported, each frame includes 4 position coordinates variables and a score;
First convolutional layer Conv1/rpn of RPN networks has 256 convolution kernels, and size is 3 × 3 × 256;RPN networks
Second convolutional layer Conv2/rpn has 18 convolution kernels, and size is the 3rd convolutional layer Conv3/ of 1 × 1 × 256, RPN networks
Rpn has 36 convolution kernels, and size is 1 × 1 × 256;
RPN networks additionally add algorithm layer for formation zone candidate frame, and multiple dimensioned convolution behaviour is carried out on characteristic pattern
Make, be implemented as:In the position of each sliding window using 3 kinds of scales and 3 kinds of length-width ratios, with current sliding window mouth center
Centered on, and a kind of corresponding scale and length-width ratio, then mapping obtains the candidate region of 9 kinds of different scales in artwork, such as
Size is the shared convolution characteristic pattern of w × h, then a total of w × h × 9 candidate region;Finally, classify layer output w × h × 9 ×
The score of 2 candidate regions is the estimated probability of target/non-targeted to each region, return layer output w × h × 9 × 4
The coordinate parameters of parameter, i.e. candidate region;
Training process is as follows in RPN networks:First with each point on 3 × 3 sliding window traversal characteristic pattern, find
Sliding window central point is mapped in the position in artwork, and point centered on it at the point, and 3 kinds of scales are generated in artwork
(1282,2562,5122) and 3 kinds of length-width ratios (1:1,2:1,1:2) each point on candidate region, i.e. characteristic pattern is in artwork
9 candidate regions are all corresponded to, if characteristic pattern size is w × h, then the candidate region number generated is w × h × 9, next to institute
There is candidate region to be screened and judged twice twice;Leave out first and complete to sieve for the first time beyond the candidate region of artwork scope
Choosing then calculates remaining candidate region it and hands over the ratio between unions i.e. Duplication with all real label areas, and according to than
It is worth and distributes a binary label for each candidate region, judges that the region is tongue body with this, criterion is:1) will
The candidate region of ratio maximum is considered as positive sample, i.e. tongue body;2) in other candidate regions, if ratio is more than 0.7, then it is assumed that be
Positive sample, less than 0.3, then it is assumed that be negative sample, i.e., background, the candidate region that ratio is interposed between the two are given up;
Candidate region and the calculating of true callout box GT Duplication are represented by formula (1):
After completing to the postsearch screening of candidate region, second of marker for judgment is carried out to it, there will be maximum hand over simultaneously with it
Label of the label of the true tab area of the ratio between collection as the candidate region, i.e. prospect label, and added for all negative samples
Background label carries out stochastical sampling to positive negative sample, and number of samples is set to 128, and oversampling ratio is set to 1:1, under normal circumstances just
Sample number is less, if positive sample number is less than 64, differential section is supplied by negative sample, in subsequent network by 128 just
Negative sample is merged trains together, with the discrimination of enhancing mark sample and non-mark sample.
Further, the depth convolutional Neural net for being used to carry out area-of-interest deep layer abstract characteristics extraction
Network is made of full articulamentum, and is added pyramid pond layer before this and carried out dimension normalization;
Sub-network carries out feature extraction using full articulamentum to the candidate region after sampling, and candidate region shares 9 kinds of sizes,
And full articulamentum requires input size consistent, therefore dimension normalization is carried out first with pyramid pond layer herein, then be sent to
Three full articulamentums carry out further feature extraction, and full articulamentum output neuron number is set to 1024 in sub-network, obtains
The feature vector of 1024 dimensions;Then, this feature vector is respectively fed to two full articulamentums and carries out Feature Compression, full articulamentum
Output neuron number is set to 2 and 8;Finally, output valve with true tag value is compared respectively, carries out returning for loss function
Reduction beam;
Loss function is represented by formula (2):
In formula, classification loss function is defined as by formula (3):
Position returns loss function and is defined as by formula (4):
R is the loss function smooth of robustL1, it is expressed as by formula (5):
In formula, NclsAnd NregIt is to avoid the regular terms of over-fitting, λ is weight coefficient, and i is the classification rope of the candidate region
Draw value, tiIt is the prediction coordinate shift amount of the candidate region, t*i is the actual coordinate offset of the candidate region, piIt is pre- astronomical observation
Favored area belongs to the probability of the i-th class, and p*i represents its true classification, and p*i=0 represents background classes, and p*i=1 represents tongue body class;
The error between predicted value and given actual value is calculated respectively by the two loss functions, is calculated using backpropagation
Method returns error layer by layer, and every layer of parameter is adjusted and updated using stochastic gradient descent method, more new formula such as formula (6)
It is shown so that closer to actual value, i.e., the output of most latter two full articulamentum is closer gives in mark value the predicted value of network
Classification and location information;
In formula, w and w' are respectively to update front and rear parameter value, and E is the error amount being calculated by loss function layer, η
For learning rate.
The described GrabCut algorithms for being split to tongue picture comprise the following steps:
Step1: Given the initial foreground/background bounding box, the foreground and background information is marked. GrabCut builds statistical models of the foreground data and the background data separately with Gaussian Mixture Models (GMM); the GMMs learn and create new pixel distributions, so that pixels whose class is unknown can be classified according to their relationship to pixels of known class;
Step2: From Step1, a graph is created over the pixel distribution: the nodes of the graph are the pixels, plus two additional terminal nodes, Source_node and Sink_node. Every foreground pixel is connected to Source_node and every background pixel to Sink_node. The weight of the edge connecting a pixel to Source_node/Sink_node is determined by the probability that they belong to the same class (both foreground or both background); the weight between two pixels is determined by the edge information, i.e. the similarity of the two pixels: if the colors of two pixels differ greatly, the weight of the edge between them is very small;
Step3: The graph obtained above is partitioned with the mincut algorithm, which divides it into a Source_node part and a Sink_node part according to a minimum-cost equation, the cost being the sum of the weights of all cut edges. After the cut, all pixels connected to Source_node are regarded as foreground, and all pixels connected to Sink_node are regarded as background.
Step4: This process is repeated until the classification converges, completing the segmentation.
Claims (5)
1. An interaction-free GrabCut tongue-body segmentation method based on deep learning, characterized by comprising: a deep convolutional neural network for whole-tongue feature extraction; a region-of-interest positioning network for preliminary detection of the tongue-body region; a deep convolutional neural network for extracting deep abstract features from the region of interest; and a GrabCut algorithm for segmenting the tongue image;
The deep convolutional neural network for whole-tongue feature extraction serves as the basic network of the overall model. It is divided into five layers, a deep structure of alternating convolutional, activation, and pooling layers, which learns from the given tongue-image data implicitly and without supervision, avoiding explicit manual feature extraction;
The region-of-interest positioning network for preliminary detection of the tongue-body region, i.e. the RPN network, detects and separates regions of different attributes on the tongue surface and produces candidate regions for the tongue body;
The deep convolutional neural network for extracting deep abstract features from the region of interest consists of fully connected layers. It performs further feature extraction on the tongue-body candidate regions obtained in the previous stage, mapping the input region layer by layer within the network to obtain different representations and extract its abstract features, thereby achieving a deep representation of the tongue image and yielding the tongue-body localization result;
The GrabCut algorithm for segmenting the tongue image takes the tongue-body bounding box obtained above as input, distinguishes the foreground and background of the tongue image, and thus completes automatic tongue-body segmentation without any human interaction.
2. The interaction-free GrabCut tongue-body segmentation method based on deep learning as claimed in claim 1, characterized in that: the deep convolutional neural network for whole-tongue feature extraction is divided into five layers, a deep structure of alternating convolutional, activation, and pooling layers. Convolution enhances the original signal and reduces noise; pooling exploits the local correlation of the image to sub-sample it, reducing the amount of data to process while retaining the useful image information;
The network accepts tongue images of arbitrary size as input, with the following structure: the first convolutional layer Conv1 has 96 convolution kernels of size 7 × 7 × 3, convolution stride 2, padding 3; the first pooling layer (Pool1) has a 7 × 7 × 3 pooling kernel, pooling stride 2, padding 1, followed by a ReLU activation layer; the second convolutional layer Conv2 has 256 kernels of size 5 × 5 × 96, stride 2, padding 2; the second pooling layer Pool2 has a 7 × 7 × 96 pooling kernel, stride 2, padding 1, followed by a ReLU activation layer; the third convolutional layer Conv3 has 384 kernels of size 3 × 3 × 256, padding 1, followed by a ReLU activation layer; the fourth convolutional layer Conv4 has 384 kernels of size 3 × 3 × 384, padding 1, followed by a ReLU activation layer; the fifth convolutional layer Conv5 has 256 kernels of size 3 × 3 × 384, padding 1, followed by a ReLU activation layer;
Through these five layers of feature extraction, each tongue image yields 256 feature maps, which serve as the input of the RPN network.
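The per-layer spatial sizes implied by these kernel, stride, and padding settings can be checked with a short sketch (illustrative only; the 224 × 224 input, stride 1 for Conv3 to Conv5, and square feature maps are assumptions not fixed by the claim):

```python
def out_size(n, k, s, p):
    """Spatial size after a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

n = 224                     # assumed square input side
n = out_size(n, 7, 2, 3)    # Conv1: 96 kernels 7x7, stride 2, pad 3 -> 112
n = out_size(n, 7, 2, 1)    # Pool1: 7x7 kernel, stride 2, pad 1     -> 54
n = out_size(n, 5, 2, 2)    # Conv2: 256 kernels 5x5, stride 2, pad 2 -> 27
n = out_size(n, 7, 2, 1)    # Pool2: 7x7 kernel, stride 2, pad 1      -> 12
n = out_size(n, 3, 1, 1)    # Conv3: 384 kernels 3x3, pad 1 (stride 1) -> 12
n = out_size(n, 3, 1, 1)    # Conv4: 384 kernels 3x3, pad 1            -> 12
n = out_size(n, 3, 1, 1)    # Conv5: 256 kernels 3x3, pad 1            -> 12
# Conv5's 256 kernels yield the 256 feature maps fed to the RPN.
```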
3. The interaction-free GrabCut tongue-body segmentation method based on deep learning as claimed in claim 1 or 2, characterized in that: in the region-of-interest extraction network for preliminary detection of the tongue-body region, the RPN network receives the 256 feature maps generated by the basic network as input, post-processes them with three convolutional layers and an algorithm layer, and outputs a set of rectangular target candidate boxes, each box carrying 4 position-coordinate variables and one score;
The first convolutional layer Conv1/rpn of the RPN network has 256 convolution kernels of size 3 × 3 × 256; the second convolutional layer Conv2/rpn has 18 kernels of size 1 × 1 × 256; the third convolutional layer Conv3/rpn has 36 kernels of size 1 × 1 × 256;
The RPN network additionally adds an algorithm layer for generating region candidate boxes, performing multi-scale convolution on the feature map. Concretely: at each sliding-window position, 3 scales and 3 aspect ratios are used; taking the current sliding-window center as the center, each scale/aspect-ratio pair is mapped back into the original image, giving candidate regions at 9 different scales, so a shared convolutional feature map of size w × h yields w × h × 9 candidate regions in total. Finally, the classification layer outputs w × h × 9 × 2 scores for the candidate regions, i.e. each region's estimated target/non-target probability, and the regression layer outputs w × h × 9 × 4 parameters, i.e. the coordinate parameters of the candidate regions;
The training process of the RPN network is as follows: first, a 3 × 3 sliding window traverses each point of the feature map; the position in the original image onto which the window center maps is found and used as the center to generate candidate regions of 3 scales (128², 256², 512²) and 3 aspect ratios (1:1, 2:1, 1:2) in the original image, so each point of the feature map corresponds to 9 candidate regions; for a feature map of size w × h, the number of generated candidate regions is w × h × 9. All candidate regions are then screened and judged twice: first, candidate regions extending beyond the original image are deleted, completing the first screening; next, the overlap (intersection over union) of each remaining candidate region with all ground-truth label regions is computed, and each candidate region is assigned a binary label according to this ratio to judge whether the region is the tongue body. The criteria are: 1) the candidate region with the maximum ratio is regarded as a positive sample, i.e. tongue body; 2) among the other candidate regions, those with a ratio greater than 0.7 are regarded as positive samples, and those below 0.3 as negative samples, i.e. background; candidate regions with a ratio between the two thresholds are discarded;
The overlap between a candidate region and the ground-truth box GT is computed by formula (1):
$$IoU=\frac{\mathrm{Anchor}\cap \mathrm{GT}}{\mathrm{Anchor}\cup \mathrm{GT}}\tag{1}$$
After this second screening of the candidate regions, a second label judgment is performed: each candidate region receives the label of the ground-truth region with which it has the largest intersection over union, i.e. a foreground label, while all negative samples receive a background label. The positive and negative samples are randomly sampled, with the sample count set to 128 and the sampling ratio set to 1:1. Since positive samples are usually scarce, if their number is below 64 the shortfall is made up with negative samples. The 128 positive and negative samples are merged and trained together in the subsequent network to enhance the discrimination between labeled and unlabeled samples.
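The overlap computation of formula (1) and the labelling rule above can be sketched as follows (illustrative only; the corner-coordinate (x1, y1, x2, y2) box format and the function names are assumptions):

```python
def iou(box_a, box_b):
    """Formula (1): intersection over union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def label_anchor(overlap, is_max):
    """Claimed rule: the max-overlap anchor is positive; IoU > 0.7 is
    positive (tongue body); IoU < 0.3 is negative (background); anchors
    in between are discarded (None)."""
    if is_max or overlap > 0.7:
        return 1
    if overlap < 0.3:
        return 0
    return None
```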
4. The interaction-free GrabCut tongue-body segmentation method based on deep learning as claimed in claim 1 or 2, characterized in that: the deep convolutional neural network for extracting deep abstract features from the region of interest consists of fully connected layers, preceded by a pyramid pooling layer for size normalization;
The sub-network extracts features from the sampled candidate regions with fully connected layers. Since the candidate regions come in 9 sizes while fully connected layers require inputs of a uniform size, a pyramid pooling layer first normalizes the size; the result is then fed to three fully connected layers for further feature extraction, each with its output neuron number set to 1024, yielding a 1024-dimensional feature vector. This feature vector is then fed into two fully connected layers for feature compression, whose output neuron numbers are set to 2 and 8 respectively. Finally, the output values are compared with the true label values, and the regression is constrained through the loss function;
The loss function is given by formula (2):
$$L(\{p_i\},\{t_i\})=\frac{1}{N_{cls}}\sum_i L_{cls}(p_i,p_i^{*})+\lambda\frac{1}{N_{reg}}\sum_i p_i^{*}L_{reg}(t_i,t_i^{*})\tag{2}$$
In the formula, the classification loss function is defined by formula (3):
$$L_{cls}(p_i,p_i^{*})=-\log p_i\tag{3}$$
The position regression loss function is defined by formula (4):
$$L_{reg}(t_i,t_i^{*})=R(t_i-t_i^{*})\tag{4}$$
Here R is the robust loss function $\mathrm{smooth}_{L1}$, expressed by formula (5):
$$\mathrm{smooth}_{L1}(x)=\begin{cases}0.5x^{2}&\text{if }|x|<1\\|x|-0.5&\text{otherwise}\end{cases}\tag{5}$$
In the formula, $N_{cls}$ and $N_{reg}$ are regularization terms that help avoid over-fitting, λ is a weight coefficient, i is the class index of the candidate region, $t_i$ is the predicted coordinate offset of the candidate region, $t_i^{*}$ is its ground-truth coordinate offset, $p_i$ is the probability that the candidate region belongs to the i-th class, and $p_i^{*}$ is its true class label: $p_i^{*}=0$ denotes the background class and $p_i^{*}=1$ the tongue-body class;
These two loss functions compute the errors between the predicted values and the given ground-truth values; the back-propagation algorithm passes the error back layer by layer, and stochastic gradient descent adjusts and updates each layer's parameters according to the update rule shown in formula (6), so that the network's predictions move closer to the ground truth, i.e. the outputs of the last two fully connected layers approach the annotated class and location information;
$$w'=w-\eta\frac{\partial E}{\partial w}\tag{6}$$
In the formula, w and w' are the parameter values before and after the update respectively, E is the error value computed by the loss-function layer, and η is the learning rate.
5. The interaction-free GrabCut tongue-body segmentation method based on deep learning as claimed in claim 1 or 2, characterized in that: the GrabCut algorithm for segmenting the tongue image comprises the following steps:
Step1: Given the initial foreground/background bounding box, the foreground and background information is marked; GrabCut models the foreground data and the background data separately with a Gaussian Mixture Model (GMM); the GMMs learn and create new pixel distributions, so that pixels whose class is unknown can be classified according to their relationship to pixels of known class;
Step2: From Step1, a graph is created over the pixel distribution: the nodes of the graph are the pixels, plus two additional terminal nodes, Source_node and Sink_node; every foreground pixel is connected to Source_node and every background pixel to Sink_node; the weight of the edge connecting a pixel to Source_node/Sink_node is determined by the probability that they belong to the same class (both foreground or both background); the weight between two pixels is determined by the edge information, i.e. the similarity of the two pixels: if the colors of two pixels differ greatly, the weight of the edge between them is very small;
Step3: The graph obtained above is partitioned with the mincut algorithm, which divides it into a Source_node part and a Sink_node part according to a minimum-cost equation, the cost being the sum of the weights of all cut edges; after the cut, all pixels connected to Source_node are regarded as foreground and all pixels connected to Sink_node as background;
Step4: This process is repeated until the classification converges, completing the segmentation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711133796.8A CN108109160A (en) | 2017-11-16 | 2017-11-16 | It is a kind of that interactive GrabCut tongue bodies dividing method is exempted from based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108109160A true CN108109160A (en) | 2018-06-01 |
Family
ID=62207321
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711133796.8A Pending CN108109160A (en) | 2017-11-16 | 2017-11-16 | It is a kind of that interactive GrabCut tongue bodies dividing method is exempted from based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108109160A (en) |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109376756A (en) * | 2018-09-04 | 2019-02-22 | 青岛大学附属医院 | Upper abdomen metastatic lymph node section automatic recognition system, computer equipment, storage medium based on deep learning |
CN109410168A (en) * | 2018-08-31 | 2019-03-01 | 清华大学 | For determining the modeling method of the convolutional neural networks model of the classification of the subgraph block in image |
CN109584251A (en) * | 2018-12-06 | 2019-04-05 | 湘潭大学 | A kind of tongue body image partition method based on single goal region segmentation |
CN109766877A (en) * | 2019-03-12 | 2019-05-17 | 北京羽医甘蓝信息技术有限公司 | The method and apparatus of whole scenery piece artificial tooth body identification based on deep learning |
CN110210319A (en) * | 2019-05-07 | 2019-09-06 | 平安科技(深圳)有限公司 | Computer equipment, tongue body photo constitution identification device and storage medium |
CN110599497A (en) * | 2019-07-31 | 2019-12-20 | 中国地质大学(武汉) | Drivable region segmentation method based on deep neural network |
CN110674807A (en) * | 2019-08-06 | 2020-01-10 | 中国科学院信息工程研究所 | Curved scene character detection method based on semi-supervised and weakly supervised learning |
CN110729045A (en) * | 2019-10-12 | 2020-01-24 | 闽江学院 | Tongue image segmentation method based on context-aware residual error network |
CN110738223A (en) * | 2018-07-18 | 2020-01-31 | 郑州宇通客车股份有限公司 | Point cloud data clustering method and device for laser radars |
WO2020038462A1 (en) * | 2018-08-24 | 2020-02-27 | 深圳市前海安测信息技术有限公司 | Tongue segmentation device and method employing deep learning, and storage medium |
CN110956225A (en) * | 2020-02-25 | 2020-04-03 | 浙江啄云智能科技有限公司 | Contraband detection method and system, computing device and storage medium |
WO2020108436A1 (en) * | 2018-11-26 | 2020-06-04 | 深圳市前海安测信息技术有限公司 | Tongue surface image segmentation device and method, and computer storage medium |
CN111462132A (en) * | 2020-03-20 | 2020-07-28 | 西北大学 | Video object segmentation method and system based on deep learning |
CN111488871A (en) * | 2019-01-25 | 2020-08-04 | 斯特拉德视觉公司 | Method and apparatus for switchable mode R-CNN based monitoring |
CN111818449A (en) * | 2020-06-15 | 2020-10-23 | 华南师范大学 | Visible light indoor positioning method based on improved artificial neural network |
CN112508968A (en) * | 2020-12-10 | 2021-03-16 | 马鞍山市瀚海云星科技有限责任公司 | Image segmentation method, device, system and storage medium |
CN113569855A (en) * | 2021-07-07 | 2021-10-29 | 江汉大学 | Tongue picture segmentation method, equipment and storage medium |
CN114511567A (en) * | 2022-04-20 | 2022-05-17 | 天中依脉(天津)智能科技有限公司 | Tongue body and tongue coating image identification and separation method |
CN114627136A (en) * | 2022-01-28 | 2022-06-14 | 河南科技大学 | Tongue picture segmentation and alignment method based on feature pyramid network |
WO2022252565A1 (en) * | 2021-06-04 | 2022-12-08 | 浙江智慧视频安防创新中心有限公司 | Target detection system, method and apparatus, and device and medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110210915A1 (en) * | 2009-05-01 | 2011-09-01 | Microsoft Corporation | Human Body Pose Estimation |
CN104021566A (en) * | 2014-06-24 | 2014-09-03 | 天津大学 | GrabCut algorithm-based automatic segmentation method of tongue diagnosis image |
CN106295139A (en) * | 2016-07-29 | 2017-01-04 | 姹ゅ钩 | A kind of tongue body autodiagnosis health cloud service system based on degree of depth convolutional neural networks |
Non-Patent Citations (4)
Title |
---|
CARSTEN ROTHER 等: ""GrabCut" -Interactive Foreground Extraction using Iterated Graph Cuts", 《SIGGRAPH "04 ACM SIGGRAPH 2004 PAPERS》 * |
ROSS GIRSHICK: "Fast R-CNN", 《THE IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)》 * |
SHAOQING REN 等: "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》 * |
杜玉龙 et al.: "Saliency Detection Based on Deep Cross CNN and Interaction-Free GrabCut (基于深度交叉CNN和免交互GrabCut的显著性检测)" * |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110738223A (en) * | 2018-07-18 | 2020-01-31 | 郑州宇通客车股份有限公司 | Point cloud data clustering method and device for laser radars |
CN110738223B (en) * | 2018-07-18 | 2022-04-08 | 宇通客车股份有限公司 | Point cloud data clustering method and device of laser radar |
WO2020038462A1 (en) * | 2018-08-24 | 2020-02-27 | 深圳市前海安测信息技术有限公司 | Tongue segmentation device and method employing deep learning, and storage medium |
CN109410168A (en) * | 2018-08-31 | 2019-03-01 | 清华大学 | For determining the modeling method of the convolutional neural networks model of the classification of the subgraph block in image |
CN109410168B (en) * | 2018-08-31 | 2021-11-16 | 清华大学 | Modeling method of convolutional neural network for determining sub-tile classes in an image |
CN109376756A (en) * | 2018-09-04 | 2019-02-22 | 青岛大学附属医院 | Upper abdomen metastatic lymph node section automatic recognition system, computer equipment, storage medium based on deep learning |
WO2020108436A1 (en) * | 2018-11-26 | 2020-06-04 | 深圳市前海安测信息技术有限公司 | Tongue surface image segmentation device and method, and computer storage medium |
CN109584251A (en) * | 2018-12-06 | 2019-04-05 | 湘潭大学 | A kind of tongue body image partition method based on single goal region segmentation |
CN111488871B (en) * | 2019-01-25 | 2023-08-04 | 斯特拉德视觉公司 | Method and apparatus for R-CNN based monitoring of switchable modes |
CN111488871A (en) * | 2019-01-25 | 2020-08-04 | 斯特拉德视觉公司 | Method and apparatus for switchable mode R-CNN based monitoring |
CN109766877A (en) * | 2019-03-12 | 2019-05-17 | 北京羽医甘蓝信息技术有限公司 | The method and apparatus of whole scenery piece artificial tooth body identification based on deep learning |
CN110210319A (en) * | 2019-05-07 | 2019-09-06 | 平安科技(深圳)有限公司 | Computer equipment, tongue body photo constitution identification device and storage medium |
CN110599497A (en) * | 2019-07-31 | 2019-12-20 | 中国地质大学(武汉) | Drivable region segmentation method based on deep neural network |
CN110674807A (en) * | 2019-08-06 | 2020-01-10 | 中国科学院信息工程研究所 | Curved scene character detection method based on semi-supervised and weakly supervised learning |
CN110729045A (en) * | 2019-10-12 | 2020-01-24 | 闽江学院 | Tongue image segmentation method based on context-aware residual error network |
CN110956225A (en) * | 2020-02-25 | 2020-04-03 | 浙江啄云智能科技有限公司 | Contraband detection method and system, computing device and storage medium |
CN110956225B (en) * | 2020-02-25 | 2020-05-29 | 浙江啄云智能科技有限公司 | Contraband detection method and system, computing device and storage medium |
CN111462132A (en) * | 2020-03-20 | 2020-07-28 | 西北大学 | Video object segmentation method and system based on deep learning |
CN111818449A (en) * | 2020-06-15 | 2020-10-23 | 华南师范大学 | Visible light indoor positioning method based on improved artificial neural network |
CN111818449B (en) * | 2020-06-15 | 2022-04-15 | 华南师范大学 | Visible light indoor positioning method based on improved artificial neural network |
CN112508968A (en) * | 2020-12-10 | 2021-03-16 | 马鞍山市瀚海云星科技有限责任公司 | Image segmentation method, device, system and storage medium |
WO2022252565A1 (en) * | 2021-06-04 | 2022-12-08 | 浙江智慧视频安防创新中心有限公司 | Target detection system, method and apparatus, and device and medium |
CN113569855A (en) * | 2021-07-07 | 2021-10-29 | 江汉大学 | Tongue picture segmentation method, equipment and storage medium |
CN114627136A (en) * | 2022-01-28 | 2022-06-14 | 河南科技大学 | Tongue picture segmentation and alignment method based on feature pyramid network |
CN114627136B (en) * | 2022-01-28 | 2024-02-27 | 河南科技大学 | Tongue image segmentation and alignment method based on feature pyramid network |
CN114511567B (en) * | 2022-04-20 | 2022-08-05 | 天中依脉(天津)智能科技有限公司 | Tongue body and tongue coating image identification and separation method |
CN114511567A (en) * | 2022-04-20 | 2022-05-17 | 天中依脉(天津)智能科技有限公司 | Tongue body and tongue coating image identification and separation method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108109160A (en) | It is a kind of that interactive GrabCut tongue bodies dividing method is exempted from based on deep learning | |
CN107977671A (en) | A kind of tongue picture sorting technique based on multitask convolutional neural networks | |
CN107316307B (en) | Automatic segmentation method of traditional Chinese medicine tongue image based on deep convolutional neural network | |
EP3614308B1 (en) | Joint deep learning for land cover and land use classification | |
CN104281853B (en) | A kind of Activity recognition method based on 3D convolutional neural networks | |
CN109344736B (en) | Static image crowd counting method based on joint learning | |
CN104992223B (en) | Intensive population estimation method based on deep learning | |
CN107610087B (en) | Tongue coating automatic segmentation method based on deep learning | |
CN110766051A (en) | Lung nodule morphological classification method based on neural network | |
CN110532900A (en) | Facial expression recognizing method based on U-Net and LS-CNN | |
CN109166100A (en) | Multi-task learning method for cell count based on convolutional neural networks | |
CN107909566A (en) | A kind of image-recognizing method of the cutaneum carcinoma melanoma based on deep learning | |
CN107341506A (en) | A kind of Image emotional semantic classification method based on the expression of many-sided deep learning | |
CN107945153A (en) | A kind of road surface crack detection method based on deep learning | |
CN107316294A (en) | One kind is based on improved depth Boltzmann machine Lung neoplasm feature extraction and good pernicious sorting technique | |
CN106682569A (en) | Fast traffic signboard recognition method based on convolution neural network | |
CN109977955A (en) | A kind of precancerous lesions of uterine cervix knowledge method for distinguishing based on deep learning | |
CN107622233A (en) | A kind of Table recognition method, identifying system and computer installation | |
CN106408030A (en) | SAR image classification method based on middle lamella semantic attribute and convolution neural network | |
CN105740915B (en) | A kind of collaboration dividing method merging perception information | |
CN108717693A (en) | A kind of optic disk localization method based on RPN | |
CN107506793A (en) | Clothes recognition methods and system based on weak mark image | |
CN106778852A (en) | A kind of picture material recognition methods for correcting erroneous judgement | |
CN106340016A (en) | DNA quantitative analysis method based on cell microscope image | |
CN101556650A (en) | Distributed self-adapting pulmonary nodule computer detection method and system thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20180601 |