CN110827304B - Traditional Chinese medicine tongue image positioning method and system based on deep convolution network and level set method - Google Patents
- Publication number: CN110827304B (application CN201810911858.1A)
- Authority
- CN
- China
- Prior art keywords
- tongue
- sub
- layer
- image
- picture
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06T7/12 — Edge-based segmentation
- G06T7/13 — Edge detection
- G06T7/73 — Determining position or orientation of objects or cameras using feature-based methods
- G06T2207/20021 — Dividing image into blocks, subimages or windows
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/30004 — Biomedical image processing
Abstract
The invention preprocesses the tongue picture with a neural network, giving good adaptability and robustness. The tongue picture is divided into sub-blocks of 150 × 150 × 3 pixels, and each sub-block is labeled only as to whether it contains a tongue region. The neural network has a simple structure, comprising only 3 convolution layers (with 10, 10 and 1 convolution kernels respectively), 2 pooling layers, 2 fully connected layers (with 300 and 100 neurons) and an output layer. The invention can therefore be applied on various intelligent terminals, including smartphones, freeing it from the limits of hardware computing capability and greatly expanding the application range of tongue image positioning. An individual user can complete tongue-picture positioning on a personal smartphone, iPad, tablet or other portable intelligent terminal, without uploading the tongue photo to a processing center, which makes the design of a distributed tongue-image acquisition/processing/analysis system more flexible and markedly improves resource utilization.
Description
Technical Field
The invention relates to a traditional Chinese medicine tongue image positioning method and system based on a deep convolution network and a level set method.
Background
Tongue diagnosis is a distinctive diagnostic method of traditional Chinese medicine. It has a relatively complete theoretical basis, reveals objective phenomena of human physiology and pathology through outward appearance, and is widely recognized and accepted. It remains an effective TCM diagnostic method and still plays an important role in clinical practice today: it is clear in direction, easy to perform and demonstrably effective, and is important for understanding diseases, guiding drug administration and evaluating curative effect. With the development of computer technology, researchers have begun to combine deep learning, machine vision and other methods with the rich clinical experience of TCM specialists to promote the objectification and standardization of tongue diagnosis. The positioning and segmentation of tongue images is one of the important steps in this research: accurate segmentation supports subsequent feature analysis and classification, so the accuracy of positioning and segmentation has a significant impact on subsequent studies.
Disclosure of Invention
Aiming at the open acquisition environment, with its varied and complex factors such as image resolution, image quality, light-source color temperature, illumination intensity, shooting angle and background environment, the invention provides a tongue positioning and segmentation method that is fast, efficient, robust and adaptive. The method designs an effective convolutional neural network structure that rapidly locates the tongue body in the original image, and then accurately segments the tongue contour with a level set method.
In the prior art there are few schemes for recognizing and segmenting traditional Chinese medicine tongue images against an open background. Common image segmentation methods perform well on simply connected regions with distinct boundaries and uniform color, such as magnetic resonance images, but their results on TCM tongue images with complicated backgrounds are far from satisfactory: they can neither reliably recognize the tongue region nor accurately segment the tongue.
Tongue image segmentation schemes in the prior art can be mainly divided into two main categories: 1. performing early-stage processing on the image, and dividing the processed image by using an image dividing method; 2. the tongue picture is identified and segmented by artificial intelligence methods such as a neural network and the like. Specific examples are as follows.
One approach combines a large number of image processing operations, including maximum between-class variance, hue-threshold segmentation and RGB three-component variance segmentation, and applies random forest prediction using the spatial position, color, shape and contour information of connected-domain convex hulls. After this series of processing steps, the image is segmented with the SNAKE method to obtain the tongue position.
CN107316307A (applicant: Beijing University of Technology; Zhuo Li) designs a convolutional neural network structure, trains the network with collected sample data to obtain a network model, and uses the model to automatically segment TCM tongue images. Its training data are very laborious to construct, and the constructed network is computationally heavy: the image size is 512 × 512 pixels, every pixel of 5000 training pictures is manually labeled, 1.3 billion pixels in total. The encoding network comprises 15 convolution layers, with 32, 64, 128, 256 and 512 feature maps respectively, and 5 pooling layers; the decoding network adds further complex structures such as unpooling layers, convolution layers and Batch Normalization layers. The number of parameters the network must compute runs into the billions.
The main advantages of the present invention compared to these patents are:
1. A neural network is used to preprocess the image, giving stronger adaptability and robustness. The method is not constrained like traditional image processing: no threshold has to be selected manually, which also avoids the failure mode of traditional methods whose fixed thresholds break down on images acquired in extreme or differing environments.
2. A neural network structure with a simple architecture and a small computation load is designed. Instead of labeling every pixel as background or tongue, as existing training sets require, the image is divided into small sub-blocks of 150 × 150 × 3 pixels, and each sub-block is labeled only as to whether it contains a valid tongue region. The tongue contour segmentation problem is thus converted into a classification problem of whether the tongue is contained; the amount of manual labeling drops dramatically, and the training data of the invention contain only 5000 sub-blocks. The neural network of the invention has a simple structure: only 3 convolution layers whose numbers of convolution kernels (the "feature maps" of CN107316307A) are 10, 10 and 1, 2 pooling layers, 2 fully connected layers with 300 and 100 neurons, and an output layer. The trained model, with parameters counted in the millions, can be applied on various intelligent terminals including smartphones, that is: even a smartphone can carry out the tongue image positioning method, which removes the limitation of hardware computing capability and greatly expands where tongue image positioning/contour determination can be realized. An individual user can complete tongue-picture positioning on a personal smartphone, iPad, tablet or other portable intelligent terminal, without uploading the tongue picture to a data processing center such as a server, so the design of the whole distributed tongue-image acquisition/processing/analysis system becomes more flexible and resource utilization improves markedly.
According to one aspect of the present invention, there is provided a tongue image positioning method, which is characterized by comprising:
a) Positioning a tongue region in an input tongue picture, including:
for the tongue picture, classifying the sub-image blocks obtained by dividing it with a trained convolutional neural network into two categories, sub-blocks containing the tongue body and sub-blocks not containing the tongue body, thereby obtaining a sub-graph label for each sub-block,
performing logical judgment on the sub-graph labels to obtain a rectangular image containing the complete tongue body, i.e. realizing rapid positioning of the tongue,
b) Image segmentation is performed on the tongue image using a level set-based process.
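The rapid-positioning logic of step a) can be sketched in a few lines of NumPy. This is an illustrative reading of the "logical judgment" step, not the patent's exact procedure; the function and variable names are assumptions:

```python
import numpy as np

def tongue_bounding_rect(labels, block=150):
    """Given a 2-D grid of per-sub-block predictions (1 = contains tongue,
    0 = background), return the pixel rectangle (top, left, bottom, right)
    covering every tongue-positive sub-block."""
    rows, cols = np.nonzero(np.asarray(labels))
    if rows.size == 0:
        return None  # no tongue found: the whole picture is rejected
    top, bottom = rows.min(), rows.max() + 1
    left, right = cols.min(), cols.max() + 1
    return (top * block, left * block, bottom * block, right * block)

# a 4 x 4 grid of sub-block labels produced by the classifier
grid = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
print(tongue_bounding_rect(grid))  # (150, 150, 450, 450)
```

The returned rectangle is the "rectangular image containing a complete tongue body" that is handed to the level-set stage in step b).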
According to a further aspect of the present invention, the above-described processing of performing image segmentation on the target tongue picture using level-set-based processing includes:
supposing there is a surface φ that intersects the zero plane, thereby yielding a curve C, the tongue profile is obtained by passing the curve C through a level set,
letting the coordinate points (x, y) on the curve C belong to a curve evolving over time, let x(t) be the position of a coordinate point at time t; that is, at any time t, each point x(t) is a point on the curve with height 0:
φ(x(t), t) = 0 (3)
Further, differentiating (3) with respect to t gives formulas (4), (5) and (6), from which φ_t at any time is deduced:
φ_t + ∇φ(x(t), t) · x′(t) = 0 (4)
where the speed in the direction of the outward normal n = ∇φ/|∇φ| is
F = x′(t) · n (5)
so that
φ_t = −F·|∇φ| (6)
The surface φ is related to the tongue image information and is updated with the potential derived from the tongue image,
taking x(t) as the determined tongue profile and making the error between x(t) and the actual tongue profile decrease as t changes, which specifically comprises:
using the HSV-space and RGB-space information of the tongue picture to give a matrix I for calculating potential energy, where R, G, B and H respectively denote the three channels of the RGB space of the image and the H channel of the HSV space, x and y denote the horizontal and vertical coordinate values of matrix I, and (x_c, y_c) denotes the coordinates of the center point of matrix I:
I(x, y) = 1.3·R(x, y) − 6.7·G(x, y) + 6.4·B(x, y) − H(x_c, y_c) (7)
For the rectangular tongue picture containing the complete tongue, an initial surface φ of the tongue picture at time t = 0 is given, φ being expressed in matrix form by equation (8),
the coordinate points within the range including the tongue region are expressed as the set U of formula (9):
U = {(x, y) | φ(x, y) > 0} (9)
the contour of the outer edge of the set U is taken as the tongue edge coordinate value x(t) determined by the current cycle t,
with Num_1 the number of elements in the set U, and Num_2 the total number of pixels of the rectangular image containing the complete tongue minus Num_1, the surface φ and the set U are iterated until convergence according to the following formulas (10) to (16), so that the error between x(t) and the actual tongue profile becomes smaller and smaller during the loop:
Grade1 = (1/Num_1) · Σ_{(x,y)∈U} I(x, y) (10)
Grade2 = (1/Num_2) · Σ_{(x,y)∉U} I(x, y) (11)
f(x, y) = I(x, y) − (Grade1 + Grade2)/2 (12)
F(x, y) = f(x, y) / max|f(x, y)| (13)
φ(x, y) := φ(x, y) + F_n (14)
φ(x, y) := φ(x, y) * G (15)
U = {(x, y) | φ(x, y) > 0} (16)
wherein:
Grade1 and Grade2 are the average potential energies of the tongue picture inside and outside the rectangle's set U respectively,
f(x, y) is an intermediate variable,
G is a Gaussian operator, a 5 × 5 matrix given by formula (18); G is introduced to suppress noise to a certain extent and make the result more stable. In each cycle, the Gaussian operator G is used as a convolution kernel and the matrix φ is convolved with it:
G(i, j) = (1/(2πσ²)) · exp(−(i² + j²)/(2σ²)) (18)
where σ is the standard deviation,
F_n is the potential derived from the rectangular image, as represented by equation (17); it is used to update φ, and the updated φ updates the set U, yielding an x(t) with smaller error:
F_n = α·F(x, y) (17)
When the set U no longer changes, the iteration stops, and the outer edge of the set U obtained at that moment, i.e. x(t), is taken as the coordinates of the tongue edge.
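As an illustration, the region-based iteration described above can be sketched in NumPy. Several of the patent's formulas are given only as figures, so the form of the initial surface, the normalization of the force and the Gaussian σ below are illustrative assumptions rather than the patent's exact definitions:

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    # 5 x 5 Gaussian operator G (the value of sigma is assumed)
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return g / g.sum()

def convolve2d(a, k):
    # minimal zero-padded 'same' convolution, avoiding a SciPy dependency
    kh, kw = k.shape
    ap = np.pad(a, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(a, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * ap[i:i + a.shape[0], j:j + a.shape[1]]
    return out

def segment(rgb, h, alpha=0.5, max_iter=200):
    # potential matrix I of formula (7): RGB channels plus the H value
    # sampled at the image center
    R, G_, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    xc, yc = h.shape[0] // 2, h.shape[1] // 2
    I = 1.3 * R - 6.7 * G_ + 6.4 * B - h[xc, yc]
    phi = I - I.mean()              # initial surface at t = 0 (assumed form)
    U = phi > 0                     # set U = {(x, y) | phi(x, y) > 0}
    g = gaussian_kernel()
    for _ in range(max_iter):
        grade1 = I[U].mean() if U.any() else 0.0      # average potential inside U
        grade2 = I[~U].mean() if (~U).any() else 0.0  # average potential outside U
        f = I - (grade1 + grade2) / 2.0               # intermediate variable
        Fn = alpha * f / (np.abs(f).max() + 1e-9)     # normalized force (assumed)
        phi = convolve2d(phi + Fn, g)                 # update phi, then smooth with G
        new_U = phi > 0
        if np.array_equal(new_U, U):                  # set U no longer changes: stop
            break
        U = new_U
    return U                        # outer edge of U is the tongue contour x(t)
```

On a synthetic picture whose "tongue" is a bright blue square, the returned mask keeps the square and rejects the background, which is the qualitative behavior the iteration aims for.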
According to yet another aspect of the present invention, the trained convolutional neural network is built using a modeling method comprising:
a step of constructing a convolutional neural network, and
A step of training a convolutional neural network model,
wherein:
the convolutional neural network includes:
an input layer, which is a sub-block of size 150 × 150 × 3 obtained by dividing the image,
first, second and third convolution layers, comprising 10, 10 and 1 convolution kernels respectively, all of size 5 × 5,
a first pooling layer and a second pooling layer, located after the first and second convolution layers respectively, each being an average pooling layer with kernel size 2 × 2,
a fully connected part comprising two layers, of 300 and 100 neurons respectively,
an output layer,
the step of constructing a convolutional neural network includes:
neurons in a convolution layer are connected only to pixels within their small rectangular receptive field,
each neuron in the next of the three convolution layers is connected only to a small rectangular receptive field located in the previous layer, so that the convolutional neural network concentrates on the low-level features of one level and then assembles these low-level features into the high-level features of the next level,
each neuron in a pooling layer is connected to the outputs of a limited number of neurons in the previous layer, the connected neurons lying spatially within a small rectangle that is the kernel of the pooling layer; the average value in each kernel of size 2 × 2, stride 2, is input to the next layer,
the third convolution layer is connected to the fully connected layers through a flattening (extension) transformation,
the fully connected layers are connected to the output layer, obtaining the softmax cross entropy of the image for each category,
for the features obtained by the first to third convolution layers, forward propagation through the fully connected layers gives the predicted value of the sub-image block for each category, and softmax regression determines the probability that the sub-block belongs to each category,
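The layer sizes implied by this structure can be walked through in a few lines. Unpadded ("valid") stride-1 convolutions are assumed, since the patent does not state padding explicitly:

```python
def conv(n, k=5):      # 5 x 5 kernel, stride 1, no padding (assumed)
    return n - k + 1

def pool(n, k=2):      # 2 x 2 average pooling, stride 2
    return n // k

side = 150                      # input sub-block is 150 x 150 x 3
side = conv(side)               # conv1, 10 kernels  -> 146 x 146 x 10
side = pool(side)               # pool1              -> 73 x 73 x 10
side = conv(side)               # conv2, 10 kernels  -> 69 x 69 x 10
side = pool(side)               # pool2              -> 34 x 34 x 10
side = conv(side)               # conv3, 1 kernel    -> 30 x 30 x 1
flat = side * side * 1          # flattening (extension) transformation
print(flat)                     # 900 inputs feed the 300-neuron FC layer
```

Under these assumptions, the flattened feature vector entering the 300-neuron fully connected layer has 900 elements, which is consistent with the patent's claim of a parameter count in the millions rather than billions.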
the step of training the convolutional neural network model comprises:
cross entropy is used as the loss function, as shown in equation (1):
Loss = −(1/n) · Σ_i Σ_j p_ij · log(q_ij) (1)
where Loss is the value of the cross entropy, n is the number of input sample sub-blocks, p is the expected output probability, i.e. the true value for each class to which a sample sub-block belongs, and q is the actual output of the convolutional neural network obtained by forward-propagation calculation, i.e. the predicted value for each class,
the loss function determines the cross entropy between the predicted values and the true values of the classes of the given sample sub-blocks,
the parameters of the convolutional neural network are trained and updated with the back-propagation algorithm and stochastic gradient descent according to formula (2):
W := W − α · ∂Loss/∂W (2)
where W represents a parameter value in the convolutional neural network and α is the learning rate,
the error between the predicted and true class values of the sample sub-blocks decreases continuously, and after a number of cycles a fully trained convolutional neural network is obtained.
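The loss-and-update cycle can be illustrated with a toy stand-in for the network. The linear classifier, data, seed and learning rate below are all illustrative assumptions; only the cross-entropy loss and the gradient-descent update mirror the training step described above:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(p, q):
    # mean over the n sample sub-blocks of -sum_j p_j * log(q_j)
    return -np.mean(np.sum(p * np.log(q + 1e-12), axis=1))

# toy linear classifier standing in for the full network (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))                  # 8 sample sub-block feature vectors
p = np.eye(2)[rng.integers(0, 2, size=8)]    # one-hot true labels (tongue / background)
W = np.zeros((4, 2))
alpha = 0.5                                  # learning rate

loss0 = cross_entropy(p, softmax(X @ W))
for _ in range(100):
    q = softmax(X @ W)
    grad = X.T @ (q - p) / len(X)            # dLoss/dW for softmax + cross entropy
    W -= alpha * grad                        # W := W - alpha * dLoss/dW
loss1 = cross_entropy(p, softmax(X @ W))
print(loss1 < loss0)                         # True: the error keeps decreasing
```

The same loop structure applies to the full convolutional network, with back propagation supplying the gradient for every layer's parameters.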
Drawings
FIG. 1 is a block diagram of a convolutional neural network for image classification;
FIG. 2 is a schematic diagram of the complete process of the method of the present invention. Fig. 2 (a) is an input original image, fig. 2 (b) is a scaled and segmented sub-block set, fig. 2 (c) is a classification result of sub-blocks by using a convolutional neural network, and a rectangular area containing a complete tongue body is obtained through logic judgment, and fig. 2 (d) is a tongue body contour finally detected in the rectangular area by a level set method.
FIG. 3 is a flow chart illustrating convolutional neural network modeling for classifying sub-tiles according to one embodiment of the present invention.
FIG. 4 is a flow chart of a tongue image positioning and contour segmentation method according to one embodiment of the present invention.
Detailed Description
The present inventors found that although research on tongue image segmentation has made some progress, existing methods are usually designed for a single, closed acquisition environment. They generalize poorly and are easily affected by image quality, light-source color temperature, illumination intensity, shooting angle, background environment and so on, so their segmentation results in an open acquisition environment are far from satisfactory, and the computation they require is too large. To overcome these defects of the existing methods, such as low robustness and narrow applicability, the inventors propose a novel tongue image segmentation method for the open acquisition environment. The method has good robustness and adaptability, is little affected by the external acquisition environment or image quality, and requires markedly less computation than the prior art.
Convolutional neural networks (CNNs) are a neural network structure inspired by the working principle of the brain's visual cortex, and have been used for image recognition since the 1980s. In recent years, thanks to increases in computing power, in the amount of available training data and in the techniques for training deep networks, CNNs have come to support image search services, self-driving cars, automatic video classification systems and more, performing impressively in these complex visual tasks. CNNs are invariant to image scaling, translation and rotation and have proven very effective in image recognition and classification, which makes them well suited to rapidly positioning tongue images captured in different acquisition environments.
The level set method was first proposed by Osher et al. in 1988. Borrowing key ideas from fluid dynamics, it effectively handles the geometric and topological changes of a closed curve deforming over time, avoids explicitly tracking the curve's evolution, and converts the evolution into the solution of a partial differential equation (PDE), so the computation is stable and works in spaces of any dimension. Using a level set for image segmentation means combining it with an active contour model and solving the resulting PDE by the level set method; it belongs to the edge-detection family of segmentation methods.
According to one embodiment of the invention, to realize fast automatic positioning the image is first scaled to reduce redundant information and then divided; the convolutional neural network classifies the resulting set of sub-image blocks, and logical judgment of the classification results rapidly locates the tongue body; finally, contour segmentation is performed on the located rectangular tongue region with the level set method.
The tongue positioning and segmentation method according to one embodiment of the invention comprises the following steps:
1. Construct a convolutional neural network structure and train it to obtain a fully trained network.
2. Input an original picture taken in an open environment and perform scaling and division operations.
3. Classify the set of sub-image blocks produced in the previous step with the trained convolutional neural network.
4. Automatically screen the classification results to obtain the rectangular sub-image containing the complete tongue body, realizing fast positioning.
5. Segment the tongue contour of the obtained rectangular sub-image by level set processing.
6. The edge segmentation function is completed and the complete tongue body is obtained.
Compared with prior traditional Chinese medicine tongue image processing and segmentation methods, the advantages and/or beneficial effects of the invention are:
1. Fast positioning of tongue images. Image preprocessing reduces the data volume, hence greatly reducing the computation, and the convolutional neural network yields the rectangular sub-image containing the complete tongue body.
2. Automatic segmentation of the tongue contour, avoiding a tedious manual selection process. The method has clear advantages in accuracy, speed and convenience of positioning and segmentation.
3. Stronger adaptability and a wider application range: the method maintains high accuracy across different open acquisition environments, illumination intensities and image qualities.
And step S101, constructing a data set of the CNN model.
Different image acquisition devices, such as the front and rear cameras of different mobile phones and professional tongue image acquisition instruments, are used to photograph the tongues of different people in different environments; the pictures are divided to form the data set required to train the convolutional neural network, comprising 51,000 sub-image blocks of 150 × 150 pixels. These sub-blocks contain both tongue regions and background regions.
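The division into fixed-size sub-blocks can be sketched as follows; the function name is an assumption, and the picture is taken to have been scaled beforehand so its sides are multiples of the block size:

```python
import numpy as np

def split_into_blocks(img, size=150):
    """Cut an H x W x 3 picture into non-overlapping size x size sub-blocks
    (the picture is assumed to be scaled so H and W are multiples of size)."""
    h, w = img.shape[:2]
    return [img[r:r + size, c:c + size]
            for r in range(0, h - size + 1, size)
            for c in range(0, w - size + 1, size)]

img = np.zeros((450, 600, 3), dtype=np.uint8)   # a scaled tongue picture
blocks = split_into_blocks(img)
print(len(blocks), blocks[0].shape)             # 12 (150, 150, 3)
```

Each returned block is one sample of the data set, to be labeled in the next step.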
Step S102, manually labeling semantic tags
A semantic label is manually added to each sub-image block obtained in step S101, annotating whether the sub-block belongs to the tongue region or the background region: if more than half of a sub-block is tongue, it is marked as tongue region; otherwise it is marked as background region. If the shooting distance between the lens and the tongue was too large when the original picture was taken, the tongue region in the image is too small, which hampers further feature analysis and tongue image applications. An entire image is therefore judged invalid if all its sub-blocks are marked as background, i.e. if the tongue area in every sub-block does not exceed 1/2.
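The labeling rule and the validity check can be sketched directly; the function names are assumptions, and the per-pixel tongue mask stands in for the human annotator's judgment:

```python
import numpy as np

def label_sub_block(tongue_mask):
    # tongue_mask: boolean 150 x 150 array, True where the pixel is tongue;
    # mark the block as tongue (1) only if more than half its pixels are tongue
    return 1 if tongue_mask.mean() > 0.5 else 0

def picture_is_valid(labels):
    # a whole picture is invalid when every sub-block is labeled background
    return any(l == 1 for l in labels)
```

For example, a block that is 80/150 tongue rows is labeled 1, while a block exactly half tongue is labeled 0, matching the "more than half" rule.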
And step S103, constructing and training a convolutional neural network model.
The invention designs a deep convolutional neural network for classifying the sub-image blocks, trains the network with the data set created in step S101 and the labels of step S102, and finally obtains a network model that classifies tongue and background regions.
First, the important layers in the network structure are described. The most important component of a CNN is the convolution layer: its neurons are not connected to every pixel of the input image, but only to the pixels in their receptive field. Furthermore, each neuron in the next convolution layer is connected only to neurons lying within a small rectangle, its receptive field, in the layer above. This architecture lets the neural network concentrate on the low-level features of the first hidden layer and then assemble them into the high-level features of the next hidden layer. Such hierarchical structure is common in real-world images, which is one reason CNNs work so well for image recognition.
Just as in the convolution layer, each neuron in the pooling layer is connected to the outputs of a limited number of neurons in the previous layer, lying within a small rectangle. Pooling neurons have no weights, however; they aggregate their inputs with an aggregation function. The invention uses a kernel of size 2 × 2 with stride 2, and the average value within each kernel is input to the next layer.
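The average pooling described here is a simple reduction; a minimal NumPy sketch (the function name is an assumption) on a small feature map:

```python
import numpy as np

def average_pool(x, k=2):
    """2 x 2 average pooling with stride 2: each kernel's mean feeds the next layer."""
    h, w = x.shape
    x = x[:h - h % k, :w - w % k]   # drop any odd remainder row/column
    return x.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

fmap = np.array([[1., 2., 3., 4.],
                 [5., 6., 7., 8.],
                 [9., 8., 7., 6.],
                 [5., 4., 3., 2.]])
print(average_pool(fmap))
# [[3.5 5.5]
#  [6.5 4.5]]
```

Each output value is the mean of one non-overlapping 2 × 2 kernel, halving both spatial dimensions, which is how the 146 and 69 pixel feature maps shrink between convolution layers.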
The overall structure of the model is as follows. The network comprises an input layer, convolution layers, pooling layers, fully connected layers and an output layer. The input layer is a color image of size 150 × 150 × 3 obtained by division. The network contains 3 convolution layers in total, with 10, 10 and 1 convolution kernels respectively, all of size 5 × 5. A pooling layer follows each of convolution layers 1 and 2, giving two pooling layers in total, each an average pooling layer with kernel size 2 × 2. Convolution layer 3 is connected to the fully connected layers through a flattening transformation. The fully connected part has two layers, of 300 and 100 neurons. The fully connected layers connect to the output layer, finally giving the softmax cross entropy of the image for each category. The activation function adopted by the invention is the ReLU function. Figure 1 shows the overall network architecture of this design.
The features obtained from the three convolutional layers are propagated forward through the fully connected layers to obtain the magnitude of the image's predicted value for each class, and softmax regression converts these values into the probability of belonging to each class. We use cross entropy as the loss function, as shown in equation (1), and train the network structure parameters using back propagation.
where Loss is the cross-entropy value, n is the number of input samples, p is the expected output probability (1 if the sample belongs to the class, 0 otherwise), and q is the actual output of the convolutional neural network computed by forward propagation.
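A minimal numpy sketch of the softmax and the cross-entropy loss of equation (1), on hypothetical two-class data (tongue / background); the exact form of equation (1) is assumed here to be the standard averaged cross entropy.

```python
import numpy as np

def softmax(logits):
    """Convert forward-pass outputs into per-class probabilities q."""
    z = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(p, q, eps=1e-12):
    """Loss = -(1/n) sum_i sum_k p_ik * log(q_ik), with p the expected
    output (1 for the true class, 0 otherwise)."""
    n = p.shape[0]
    return float(-np.sum(p * np.log(q + eps)) / n)

logits = np.array([[2.0, 0.5],    # hypothetical outputs for two sub-blocks
                   [0.1, 1.9]])
p = np.array([[1.0, 0.0],         # labels: tongue / background
              [0.0, 1.0]])
q = softmax(logits)
loss = cross_entropy(p, q)
```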
The loss function computes the cross entropy between the predicted value and the given true value; its magnitude reflects the size of the error. The parameters of the convolutional neural network are trained and updated using the back-propagation algorithm with stochastic gradient descent (equation 2). The error between the network's predicted and true values is thus reduced continuously, finally yielding a well-trained convolutional neural network.
Where W represents a parameter value in the convolutional neural network and α is a learning rate.
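Equation (2) is assumed here to be the usual gradient descent update W := W − α·∂Loss/∂W; a toy illustration on a one-dimensional quadratic shows the rule converging.

```python
def sgd_step(W, grad, alpha):
    """Assumed form of equation (2): W := W - alpha * dLoss/dW."""
    return W - alpha * grad

# Toy loss: Loss(W) = (W - 3)^2, whose gradient is 2*(W - 3).
W = 0.0
for _ in range(200):
    W = sgd_step(W, 2.0 * (W - 3.0), alpha=0.1)
# W has converged close to the minimiser 3.0
```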
Finally, the classification accuracy of the convolutional neural network we constructed on the 2000 images of the test set (1000 positive and 1000 negative samples) reaches 94.9%.
According to one aspect of the present invention, there is provided a modeling method of a convolutional neural network model for determining the class of a sub-image in an image, comprising (as shown in fig. 3):
a step of constructing a convolutional neural network, and
a step of training a convolutional neural network model,
wherein the method comprises the steps of
The convolutional neural network includes:
an input layer, wherein the input layer is a color image of size 150×150×3 obtained by segmentation,
the first, second and third convolution layers, each comprising 10, 1 convolution kernels, all of size 5×5,
a first pooling layer and a second pooling layer, located after the first convolution layer and the second convolution layer, respectively, and being an average pooling layer with a kernel size of 2 x 2,
a fully connected layer comprising two layers, 300 and 100 neurons respectively,
an output layer,
the step of constructing a convolutional neural network includes:
the neurons in the convolutional layer are not connected to every pixel in the input image, but to pixels in the receptive field of the convolutional layer,
each neuron in the next of the three convolutional layers is connected only to neurons, i.e., receptive fields, that lie within a small rectangle in the previous convolutional layer; this architecture allows the neural network to concentrate on the low-level features of the first hidden layer and then assemble them into the high-level features of the next hidden layer,
each neuron in the pooling layer is connected to the outputs of a limited number of neurons in the previous layer, located within a small rectangular window; the average value in each kernel of size 2×2, stride 2, is input to the next layer,
the third convolution layer is connected with the full connection layer through extension transformation,
connecting the full connection layer with the output layer to obtain the softmax cross entropy of the image for each category, wherein the activation function adopted by the invention is a ReLU function,
for the features obtained by the first to third convolution layers, the magnitude of the prediction value of the sub-image block belonging to each category is obtained through the forward propagation of the full connection layer, the probability value of the sub-image block belonging to each category is given by softmax regression,
the step of training the convolutional neural network model comprises:
using cross entropy as the loss function, as shown in equation (1),
where Loss is the value of cross entropy, n is the number of input samples, p is the desired output probability, q is the actual output of the convolutional neural network calculated by forward propagation,
the cross entropy between the predicted value and the given real value is calculated using the loss function,
according to equation 2, the parameters of the convolutional neural network are trained and updated using a back-propagation algorithm and random gradient descent,
where W represents a parameter value in the convolutional neural network and α is the learning rate,
the error between the predicted value and the true value of the sub-image class given by the convolutional neural network is reduced continuously, and a well-trained convolutional neural network is finally obtained after a number of cycles. In general, the number of cycles is set in advance, and once enough cycles have run, the loop can be stopped.
Finally, the classification accuracy of the convolutional neural network we constructed on the 2000 images of the test set (1000 positive and 1000 negative samples) reaches 94.9%.
FIG. 4 is a flow chart of a tongue image positioning and contour segmentation method according to one embodiment of the present invention, the method comprising:
step S201: image preprocessing
Because image-capturing equipment and acquisition environments differ, images vary in characteristics such as pixel count and resolution. The input image is therefore preprocessed first: pictures of different resolutions are scaled to pictures with an essentially uniform number of pixels. This facilitates the standardization and normalization of subsequent processes such as sub-image segmentation and classification and tongue edge detection. Since the tongue position has no absolute relationship with the image resolution, the tongue position coordinates scale proportionally with the resolution. Therefore, positioning is performed on a low-resolution image obtained by the scaling transformation, and the resulting edge coordinates are restored by the inverse transformation to the coordinate positions of the tongue edges in the real image; at the same time, the data volume and amount of computation are reduced, so the calculation speed is significantly improved.
In one embodiment according to the present invention, a photo taken by the front camera of a typical mobile phone is used as the reference, with 1,080,000 pixels as the basic standard; pictures of higher quality than this standard (e.g., from a rear camera or a professional camera) are scaled down to the same scale. The scaling factor is recorded so that coordinate positions calculated later can be restored by the inverse transformation.
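A sketch of this preprocessing, under the assumption that the scale factor is chosen as the square root of the pixel-count ratio (the patent only states that larger pictures are scaled to the 1,080,000-pixel standard and that the scale is recorded for the inverse transformation):

```python
import math

BASE_PIXELS = 1_080_000   # reference: typical phone front-camera photo

def scale_factor(width, height):
    """Uniform scale applied to pictures above the pixel standard;
    smaller pictures are left unchanged. The factor is recorded so
    that coordinates found on the scaled picture can be restored by
    the inverse transformation."""
    pixels = width * height
    if pixels <= BASE_PIXELS:
        return 1.0
    return math.sqrt(BASE_PIXELS / pixels)

s = scale_factor(4000, 3000)               # 12-megapixel rear-camera photo
scaled_size = (round(4000 * s), round(3000 * s))
x_on_scaled = 100                          # an edge coordinate found later
x_original = x_on_scaled / s               # inverse transformation
```

With these numbers a 4000×3000 photo scales by 0.3 to 1200×900, the picture size used in the example of step S202.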
Step S202: tongue region positioning
The convolutional neural network for sub-image classification obtained via step S103 is used to classify the sub-images obtained by dividing the input picture in step S201. Taking a standard front-camera mobile phone image as an example, assuming the shooting distance is appropriate and the picture is valid, a traditional Chinese medicine tongue picture with a pixel size of 1200×900 (fig. 2a) is scaled and segmented into 48 sub-image blocks of size 150×150 (fig. 2b). The well-trained convolutional neural network judges the sub-image blocks obtained by segmentation, dividing them into two categories, containing the tongue body and not containing the tongue body, to obtain the corresponding sub-image labels. Through logical judgment on the sub-image labels, a suitable rectangular frame containing the complete tongue body is finally obtained (fig. 2c), rapidly locating the tongue position.
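A hypothetical sketch of the tiling and of the "logic judgment" that turns per-tile labels into a tongue-enclosing rectangle; the patent does not spell out the judgment rule, so a simple bounding box over tongue-labelled tiles is used here.

```python
import numpy as np

TILE = 150

def tile_grid(width, height):
    """Number of TILE x TILE sub-blocks along each axis;
    1200 x 900 gives an 8 x 6 grid, i.e. 48 tiles."""
    return width // TILE, height // TILE

def bounding_box(labels):
    """Tile-aligned rectangle (x0, y0, x1, y1), in pixels, enclosing
    every tile labelled 1 ('contains tongue body')."""
    rows, cols = np.nonzero(labels)
    return (int(cols.min()) * TILE, int(rows.min()) * TILE,
            (int(cols.max()) + 1) * TILE, (int(rows.max()) + 1) * TILE)

cols_n, rows_n = tile_grid(1200, 900)
labels = np.zeros((rows_n, cols_n), dtype=int)   # CNN output per tile
labels[2:5, 3:6] = 1                             # tongue found in a 3x3 patch
box = bounding_box(labels)                       # (450, 300, 900, 750)
```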
Step S203: level set tongue contour segmentation
To meet the requirements of further tongue image analysis, the method builds on the rapid positioning above: the invention performs image segmentation on the image containing the tongue body using level-set-based processing.
The core idea of the level set is: assume there is a surface φ whose intersection with a zero plane gives a curve C; then curve C is the profile we obtain through the level set.
Let the coordinate points (x, y) on the curve C belong to a curve evolving with time, with x(t) the position of a coordinate point at time t. At any instant t, each point x(t) lies on the curve where the surface φ has height 0, namely:
φ(x(t),t)=0 (3)
Furthermore, we can infer φ_t at any time according to formulas (4), (5) and (6):
For a particular embodiment, in tongue image segmentation, the surface φ is related to the tongue image information and is updated by the potential derived from the tongue image. x(t) is the tongue profile calculated by the method, and as t changes, the error between x(t) and the actual tongue profile decreases. The specific calculation method is as follows:
First, a matrix I for calculating potential energy is given by combining the HSV space of the tongue image with its RGB space information, where R, G, B denote the three channels of the image's RGB space and H denotes the H channel of the HSV space; x and y denote the horizontal and vertical coordinate values of the matrix, and xc, yc denote the coordinates of the matrix's center point.
I(x,y)=1.3R(x,y)-6.7G(x,y)+6.4B(x,y)-H(x c ,y c ) (7)
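Equation (7) can be evaluated directly; the hue scale of H(xc, yc) is not specified in the patent, so the 0–1 range of Python's colorsys is assumed here.

```python
import colorsys
import numpy as np

def potential_matrix(rgb):
    """Equation (7): I(x,y) = 1.3*R - 6.7*G + 6.4*B - H(xc, yc),
    where H(xc, yc) is the HSV hue at the centre of the matrix."""
    h_img, w_img, _ = rgb.shape
    r_c, g_c, b_c = rgb[h_img // 2, w_img // 2] / 255.0
    h_center = colorsys.rgb_to_hsv(r_c, g_c, b_c)[0]   # hue in [0, 1)
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 1.3 * R - 6.7 * G + 6.4 * B - h_center

img = np.zeros((5, 5, 3), dtype=float)
img[..., 0] = 200.0            # a uniform reddish patch (hue 0)
I = potential_matrix(img)      # 1.3 * 200 - 0 = 260 everywhere
```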
According to the rectangular image containing the complete tongue body obtained in step S202, an initial surface φ of the image at time t = 0 is given. The initial surface φ is expressed in matrix form (formula 8), and the set U (formula 9) of coordinate points contained in the tongue body region is used to record it; the computed contour of the outer edge of the set U is the tongue edge coordinate value x(t) calculated at time t of the current cycle, where:
and let the set U= { (x, y) |φ (x, y) > 0} (9)
where Num1 is the number of elements in the set U, and Num2 is the total number of pixels of the rectangular image containing the complete tongue minus Num1.
During the operation, the surface φ and the set U are iterated in a loop until convergence, so that the error between x(t) and the actual tongue profile becomes smaller and smaller during the loop. The loop iterates the following formulas:
φ(x,y):=φ(x,y)*G (15)
U={(x,y)|φ(x,y)>0} (16)
In the above formulas, Grade1 and Grade2 record the average potential energy of the image inside and outside the set U, respectively. F(x, y) is an intermediate variable used in the calculation and derivation. G (formula 18) is a Gaussian operator, a matrix of size 5×5; introducing G eliminates image noise to a certain extent and makes the calculation result more stable, the Gaussian operator being used as a convolution kernel to convolve the matrix φ in each cycle. Fn (formula 17) is the potential by which the tongue image drives the surface φ; φ is updated with this value, and the set U is updated with the updated φ, yielding an x(t) with smaller error.
Fn = α·F(x, y) (17)
When the set U no longer changes, the iteration stops, and the outer edge of the set U obtained at that point, i.e., x(t) at the final time t, is taken as the edge coordinates of the tongue body.
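The loop of formulas (10)–(16) can be sketched as follows. Since equations (10)–(14) are not reproduced in this text, the force term F below is an illustrative region-competition choice (comparing each pixel's potential with Grade1 and Grade2), not the patented formula; only the overall structure follows the text: update φ by Fn = α·F, smooth with the 5×5 Gaussian G, recompute U, and stop when U no longer changes.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """The Gaussian operator G of formula (18), normalised to sum 1."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def smooth(phi, G):
    """'Same'-size convolution of phi with G, using edge padding."""
    pad = G.shape[0] // 2
    padded = np.pad(phi, pad, mode="edge")
    out = np.zeros_like(phi)
    for i in range(G.shape[0]):
        for j in range(G.shape[1]):
            out += G[i, j] * padded[i:i + phi.shape[0], j:j + phi.shape[1]]
    return out

def level_set_step(phi, I, G, alpha=0.1):
    """One iteration: Grade1/Grade2 are the mean potentials inside and
    outside U = {phi > 0}; the illustrative force F favours the region
    whose mean each pixel's potential is closer to; Fn = alpha * F
    updates phi, which is then smoothed with G."""
    inside = phi > 0
    grade1 = I[inside].mean() if inside.any() else 0.0
    grade2 = I[~inside].mean() if (~inside).any() else 0.0
    F = np.abs(I - grade2) - np.abs(I - grade1)
    return smooth(phi + alpha * F, G)

# Toy potential: a bright 20x20 "tongue" on a dark background.
I = np.zeros((40, 40))
I[10:30, 10:30] = 10.0
phi = np.zeros((40, 40))
phi[18:22, 18:22] = 1.0            # small initial seed inside the region
G = gaussian_kernel()
U = phi > 0
for _ in range(60):
    phi = level_set_step(phi, I, G)
    new_U = phi > 0
    if np.array_equal(new_U, U):   # set U no longer changes: stop
        break
    U = new_U
U = phi > 0                        # final region; its outer edge is x(t)
```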
Through the above calculation, level set processing is finally used to segment the tongue contour from the rectangular region sub-image containing the tongue, completing the edge segmentation function and obtaining the complete tongue body (fig. 2d).
The invention provides a rapid and novel tongue image segmentation method. First, the input picture is scaled and segmented to obtain a set of sub-images with fewer pixels, and the sub-images are classified using a convolutional neural network; this effectively reduces the amount of computation, shortens the calculation time, and rapidly locates the tongue position. Then, combining the statistical information of the image's HSV and RGB channels, a level set method is adopted to accurately segment the tongue image within the rectangular area containing the tongue body located in the previous step. The method is suitable for a variety of acquisition environments, including open and closed environments, and is characterized by strong adaptability and a wide application range. The classification accuracy of the constructed convolutional neural network on the 2000 images of the test set (1000 positive and 1000 negative samples) reaches 94.9%. Compared with general tongue image positioning and segmentation methods, the method requires less computation and achieves markedly better accuracy, avoids the tedious manual selection of the tongue contour, and realizes automatic positioning and segmentation. It has obvious advantages in the accuracy and speed of positioning and segmentation.
Claims (13)
1. A tongue image positioning method is characterized by comprising the following steps:
a) Positioning a tongue region in an input tongue picture, including:
for tongue picture, distinguishing the sub-picture blocks obtained by dividing the tongue picture by using a trained convolutional neural network, dividing the sub-picture blocks into two categories of sub-picture blocks containing tongue bodies and sub-picture blocks not containing tongue bodies, namely obtaining sub-picture labels corresponding to the sub-picture blocks,
logic judgment is carried out on the sub-graph labels to obtain a rectangular image containing a complete tongue body, namely, the quick positioning of the tongue body position is realized,
b) Image segmentation is performed on the tongue picture using a level set-based process,
wherein step B) comprises:
assuming that there is a surface φ which intersects a zero plane, thereby obtaining a curve C, the curve C being the tongue profile obtained through the level set,
let the coordinate points (x, y) on the curve C belong to a curve evolving over time, let x (t) be the position of the coordinate point at the time t, i.e. at any time t, each point x (t) is the point on the curve with a height of 0, i.e.:
φ(x(t),t)=0 (3)
further, φ_t at any time is deduced from the following formulas (4), (5) and (6):
Wherein the method comprises the steps of
The surface phi is related to the tongue image information, updated as the potential derived from the tongue image,
Taking x (t) as the determined tongue profile and making the error of x (t) from the actual tongue profile decrease with the change of t, specifically comprises:
using the HSV space and RGB space information of the tongue picture to give a matrix I for calculating potential energy, wherein R, G, B denote the three channels of the image's RGB space and H denotes the H channel of the HSV space, x and y denote the horizontal and vertical coordinate values of the matrix I, and xc, yc denote the coordinates of the center point of the matrix I,
I(x,y)=1.3R(x,y)-6.7G(x,y)+6.4B(x,y)-H(x c ,y c ) (7)
For a rectangular tongue picture containing a complete tongue, given an initial surface phi of the tongue picture at time t=0, phi is expressed in matrix form by equation (8),
the coordinate points within the range including the tongue region are expressed as a set U of formula (9),
and let the set U= { (x, y) |φ (x, y) > 0} (9)
Taking the obtained outline of the outer edge of the set U as a tongue edge coordinate value x (t) determined by the current cycle t,
with Num1 being the number of elements in the set U and Num2 being the total number of pixels of the rectangular image containing the complete tongue minus Num1, iterating the surface φ and the set U through the loop of equations (10) to (16) until convergence, such that the error between x(t) and the actual tongue profile becomes smaller and smaller during the loop:
φ(x,y):=φ(x,y)*G (15)
U={(x,y)|φ(x,y)>0} (16)
Wherein:
Grade1 and Grade2 are the average potential energy magnitudes of the rectangular tongue picture inside and outside the set U, respectively,
f (x, y) is an intermediate variable,
G is a Gaussian operator, a matrix of size 5×5 as shown in formula (18), introduced to eliminate noise points to a certain extent so that the result is more stable, wherein the Gaussian operator G is used as a convolution kernel to perform a convolution operation on the matrix φ in each cycle,
wherein σ is the standard deviation,
Fn is the potential derived from the rectangular image for the surface φ, as expressed by equation (17), used for updating φ with the potential and updating the set U with the updated φ, resulting in an x(t) with smaller error,
F n =α·F(x,y) (17),
when the set U is not changed any more, iteration is stopped, and the outer edge of the set U obtained at the moment, namely x (t), is taken as the coordinates of the edge of the tongue.
2. The tongue positioning method according to claim 1, wherein the standard deviation σ has a value of 1.
3. The tongue positioning method according to claim 1, wherein:
the photographed original tongue picture is scaled into a picture with a preset size, and then the tongue picture is divided into sub-blocks with the preset size.
4. A tongue localization method as claimed in any one of claims 1-3, wherein said trained convolutional neural network is built and trained using a modeling method comprising:
A step of constructing a convolutional neural network, and
a step of training a convolutional neural network model,
wherein:
the convolutional neural network includes:
an input layer, wherein the input layer is a sub-block of size 150×150×3 obtained by dividing the image,
the first, second and third convolution layers, each comprising 10, 1 convolution kernels, all of size 5×5,
a first pooling layer and a second pooling layer, located after the first convolution layer and the second convolution layer, respectively, and being an average pooling layer with a kernel size of 2 x 2,
a fully connected layer comprising two layers, 300 and 100 neurons respectively,
the output layer is provided with a plurality of output layers,
the step of constructing a convolutional neural network includes:
neurons in the convolutional layer are connected with pixels in the small rectangular receptive field of the convolutional layer,
each neuron in the next of the three convolutional layers is connected with only small rectangular receptive fields located in the previous convolutional layer, such that the convolutional neural network concentrates on the low-level features of the previous level, and these low-level features are then assembled into high-level features of the next level,
each neuron in the pooling layer is connected to the outputs of a limited number of neurons in the previous layer, the connected previous-layer neurons being spatially located within a small rectangle that is the kernel of the pooling layer, the average value in each kernel of size 2×2, stride 2, being input to the next layer,
The third convolution layer is connected with the full connection layer through extension transformation,
connecting the full connection layer with the output layer, obtaining softmax cross entropy of the image for each category,
for the features obtained by the first to third convolution layers, the magnitude of the predicted value of the sub-image block belonging to each category is obtained through the forward propagation of the full connection layer, the probability value of the sub-image block belonging to each category is determined by softmax regression,
the step of training the convolutional neural network model comprises:
using cross entropy as the loss function, as shown in equation (1),
where Loss is the value of cross entropy, n is the number of input sample sub-blocks, p is the expected output probability, i.e. the actual value of each class to which the sample sub-block belongs, q is the actual output of the convolutional neural network calculated by forward propagation, i.e. the predicted value of each class to which the sub-block belongs,
determining cross entropy between the predicted value and the real value of each class of the given sample sub-graph block by using a loss function,
training and updating parameters of the convolutional neural network by using a back propagation algorithm and random gradient descent according to formula (2),
where W represents a parameter value in the convolutional neural network and α is the learning rate,
the error between the predicted value and the true value of the category of the sample sub-block given by the convolutional neural network is reduced continuously, and a well-trained convolutional neural network is obtained after a number of cycles.
5. The tongue positioning method according to claim 4, wherein:
the n sample sub-blocks are semantically labeled in such a way that the label annotates whether the sample sub-block belongs to the tongue area or to the background area,
wherein if more than half of the sample sub-blocks are tongue regions, the sample sub-blocks are marked as tongue regions, otherwise the sample sub-blocks are marked as background regions,
if all sample sub-blocks from an image are marked as background areas, then the image is determined to be invalid, i.e. all sample sub-blocks from the image are culled from the n sample sub-blocks.
6. The tongue positioning method according to claim 4, wherein n is equal to 5000.
7. A computer readable storage medium storing a computer program enabling a processor to perform the tongue positioning method according to one of claims 1-6.
8. A tongue positioning system, comprising:
A portion for locating a tongue region in an input tongue picture, the portion being configured to:
for tongue picture, distinguishing the sub-picture blocks obtained by dividing the tongue picture by using a trained convolutional neural network, dividing the sub-picture blocks into two categories of sub-picture blocks containing tongue bodies and sub-picture blocks not containing tongue bodies, namely obtaining sub-picture labels corresponding to the sub-picture blocks,
logic judgment is carried out on the sub-graph labels to obtain a rectangular image containing a complete tongue body, namely, the quick positioning of the tongue body position is realized,
a portion for image segmentation of the tongue picture using a level set-based processing is used to perform the following operations:
assuming that there is a surface φ which intersects a zero plane, thereby obtaining a curve C, the curve C being the tongue profile obtained through the level set,
let the coordinate points (x, y) on the curve C belong to a curve evolving over time, let x (t) be the position of the coordinate point at the time t, i.e. at any time t, each point x (t) is the point on the curve with a height of 0, i.e.:
φ(x(t),t)=0 (3)
further, φ_t at any time is deduced from the following formulas (4), (5) and (6):
Wherein the method comprises the steps of
The surface phi is related to the tongue image information, updated as the potential derived from the tongue image,
Taking x (t) as the determined tongue profile and making the error of x (t) from the actual tongue profile decrease with the change of t, specifically comprises:
using the HSV space and RGB space information of the tongue picture to give a matrix I for calculating potential energy, wherein R, G, B denote the three channels of the image's RGB space and H denotes the H channel of the HSV space, x and y denote the horizontal and vertical coordinate values of the matrix I, and xc, yc denote the coordinates of the center point of the matrix I,
I(x,y)=1.3R(x,y)-6.7G(x,y)+6.4B(x,y)-H(x c ,y c ) (7)
For a rectangular tongue picture containing a complete tongue, given an initial surface phi of the tongue picture at time t=0, phi is expressed in matrix form by equation (8),
the coordinate points within the range including the tongue region are expressed as a set U of formula (9),
and let the set U= { (x, y) |φ (x, y) > 0} (9)
Taking the obtained outline of the outer edge of the set U as a tongue edge coordinate value x (t) determined by the current cycle t,
with Num1 being the number of elements in the set U and Num2 being the total number of pixels of the rectangular image containing the complete tongue minus Num1, iterating the surface φ and the set U according to the following formulas (10) to (16) until convergence, so that the error between x(t) and the actual tongue profile becomes smaller and smaller during the loop:
φ(x,y):=φ(x,y)*G (15)
U={(x,y)|φ(x,y)>0} (16)
Wherein:
Grade1 and Grade2 are the average potential energy magnitudes of the rectangular tongue picture inside and outside the set U, respectively,
f (x, y) is an intermediate variable,
G is a Gaussian operator, a matrix of size 5×5 as shown in formula (18), introduced to eliminate noise points to a certain extent so that the result is more stable, wherein the Gaussian operator G is used as a convolution kernel to perform a convolution operation on the matrix φ in each cycle,
wherein σ is the standard deviation,
Fn is the potential derived from the rectangular image, as expressed by equation (17), used for updating φ with the potential and updating the set U with the updated φ, resulting in an x(t) with smaller error,
F n =α·F(x,y) (17),
when the set U is not changed any more, iteration is stopped, and the outer edge of the set U obtained at the moment, namely x (t), is taken as the coordinates of the edge of the tongue.
9. The tongue positioning system of claim 8, wherein the standard deviation σ has a value of 1.
10. A tongue positioning system according to claim 8, wherein:
the tongue picture is obtained by scaling an original tongue picture into a picture of a predetermined size, and the tongue picture is divided into sub-tiles of a predetermined size.
11. Tongue localization system according to one of the claims 8-10, characterized in that said trained convolutional neural network is built and trained using a modeling method comprising:
A step of constructing a convolutional neural network, and
a step of training a convolutional neural network model,
wherein:
the convolutional neural network includes:
an input layer, wherein the input layer is a sub-block of size 150×150×3 obtained by dividing the image,
the first, second and third convolution layers, each comprising 10, 1 convolution kernels, all of size 5×5,
a first pooling layer and a second pooling layer, located after the first convolution layer and the second convolution layer, respectively, and being an average pooling layer with a kernel size of 2 x 2,
a fully connected layer comprising two layers, 300 and 100 neurons respectively,
the output layer is provided with a plurality of output layers,
the step of constructing a convolutional neural network includes:
neurons in the convolutional layer are connected with pixels in the small rectangular receptive field of the convolutional layer,
each neuron in the next of the three convolutional layers is connected with only small rectangular receptive fields located in the previous convolutional layer, such that the convolutional neural network concentrates on the low-level features of the previous level, and these low-level features are then assembled into high-level features of the next level,
each neuron in the pooling layer is connected to the outputs of a limited number of neurons in the previous layer, the connected previous-layer neurons being spatially located within a small rectangle that is the kernel of the pooling layer, the average value in each kernel of size 2×2, stride 2, being input to the next layer,
The third convolution layer is connected with the full connection layer through extension transformation,
connecting the full connection layer with the output layer, obtaining softmax cross entropy of the image for each category,
for the features obtained by the first to third convolution layers, the magnitude of the predicted value of the sub-image block belonging to each category is obtained through the forward propagation of the full connection layer, the probability value of the sub-image block belonging to each category is determined by softmax regression,
the step of training the convolutional neural network model comprises:
using cross entropy as the loss function, as shown in equation (1),
where Loss is the value of cross entropy, n is the number of input sample sub-blocks, p is the desired output probability, i.e., the true value, q is the actual output of the convolutional neural network calculated by forward propagation, i.e., the predicted value,
determining cross entropy between the predicted value and the real value of each class of the given sample sub-graph block by using a loss function,
training and updating parameters of the convolutional neural network by using a back propagation algorithm and random gradient descent according to formula (2),
where W represents a parameter value in the convolutional neural network and α is the learning rate,
the error between the predicted value and the true value of the category of the sample sub-block given by the convolutional neural network is reduced continuously, and a well-trained convolutional neural network is obtained after a number of cycles.
12. A tongue positioning system according to claim 11, wherein:
the n sample sub-blocks are semantically labeled in such a way that the label annotates whether the sample sub-block belongs to the tongue area or to the background area,
wherein if more than half of the sample sub-blocks are tongue regions, the sample sub-blocks are marked as tongue regions, otherwise the sample sub-blocks are marked as background regions,
if all sample sub-blocks from an image are marked as background areas, then the image is determined to be invalid, i.e. all sample sub-blocks from the image are culled from the n sample sub-blocks.
13. The tongue positioning system of claim 11, wherein n is equal to 5000.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810911858.1A CN110827304B (en) | 2018-08-10 | 2018-08-10 | Traditional Chinese medicine tongue image positioning method and system based on deep convolution network and level set method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110827304A CN110827304A (en) | 2020-02-21 |
CN110827304B true CN110827304B (en) | 2023-06-09 |
Family
ID=69541414
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810911858.1A Active CN110827304B (en) | 2018-08-10 | 2018-08-10 | Traditional Chinese medicine tongue image positioning method and system based on deep convolution network and level set method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110827304B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111401452B (en) * | 2020-03-17 | 2022-04-26 | 北京大学 | Image classification method of equal-variation convolution network model based on partial differential operator |
CN111524093A (en) * | 2020-03-23 | 2020-08-11 | 中润普达(十堰)大数据中心有限公司 | Intelligent screening method and system for abnormal tongue picture |
CN112382391A (en) * | 2020-11-19 | 2021-02-19 | 上海中医药大学 | Traditional Chinese medicine health assessment method based on personalized normality information and application thereof |
CN113507546B (en) * | 2021-06-01 | 2023-07-18 | 暨南大学 | DCT-64-based image encryption method |
CN113781468B (en) * | 2021-09-23 | 2024-07-12 | 河南科技大学 | Tongue image segmentation method based on lightweight convolutional neural network |
CN114821052B (en) * | 2022-04-25 | 2024-08-23 | 西安电子科技大学 | Three-dimensional brain tumor nuclear magnetic resonance image segmentation method based on self-adjustment strategy |
CN115299428A (en) * | 2022-08-04 | 2022-11-08 | 国网江苏省电力有限公司南通供电分公司 | IoT-based intelligent bird-repelling system using deep learning |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107316307A (en) * | 2017-06-27 | 2017-11-03 | 北京工业大学 | Automatic segmentation method for traditional Chinese medicine tongue images based on deep convolutional neural networks |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080139966A1 (en) * | 2006-12-07 | 2008-06-12 | The Hong Kong Polytechnic University | Automatic tongue diagnosis based on chromatic and textural features classification using bayesian belief networks |
- 2018-08-10: application CN201810911858.1A filed in China; granted as CN110827304B (status: Active)
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107316307A (en) * | 2017-06-27 | 2017-11-03 | 北京工业大学 | Automatic segmentation method for traditional Chinese medicine tongue images based on deep convolutional neural networks |
Non-Patent Citations (3)
Title |
---|
Automated Tongue Feature Extraction for ZHENG Classification in Traditional Chinese Medicine; Ratchadaporn Kanawong et al.; Evidence-based Complementary and Alternative Medicine; full text *
A highly robust tongue image segmentation method for traditional Chinese medicine based on shape-prior level sets; Zhang Xinfeng; Wang Mingying; Cai Yiheng; Zhuo Li; Journal of Beijing University of Technology (No. 10); full text *
Lip image segmentation based on the level set algorithm; Wang Chengjun; Science & Technology Vision (No. 12); full text *
Also Published As
Publication number | Publication date |
---|---|
CN110827304A (en) | 2020-02-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110827304B (en) | Traditional Chinese medicine tongue image positioning method and system based on deep convolution network and level set method | |
CN109410168B (en) | Modeling method of convolutional neural network for determining sub-tile classes in an image | |
Zhuang et al. | Underwater image enhancement with hyper-laplacian reflectance priors | |
Zuffi et al. | Three-D Safari: Learning to Estimate Zebra Pose, Shape, and Texture from Images "In the Wild" | |
CN109886121B (en) | Human face key point positioning method for shielding robustness | |
CN109815826B (en) | Method and device for generating face attribute model | |
CN104182772B (en) | A gesture recognition method based on deep learning | |
CN106204779B (en) | Class attendance checking method based on a multi-face data collection strategy and deep learning | |
CN106951870B (en) | Intelligent detection and early warning method for active visual attention of significant events of surveillance video | |
CN105139004B (en) | Facial expression recognizing method based on video sequence | |
CN110619638A (en) | Multi-mode fusion significance detection method based on convolution block attention module | |
CN112784736B (en) | Character interaction behavior recognition method based on multi-modal feature fusion | |
CN103914699A (en) | Automatic lip gloss image enhancement method based on color space | |
CN107408211A (en) | Method for object re-identification | |
CN105335725A (en) | Gait identification identity authentication method based on feature fusion | |
CN113610046B (en) | Behavior recognition method based on depth video linkage characteristics | |
Yan et al. | Monocular depth estimation with guidance of surface normal map | |
CN112101262A (en) | Multi-feature fusion sign language recognition method and network model | |
CN106529441B (en) | Human action recognition method based on depth motion maps with fuzzy boundary segmentation | |
Yang et al. | A robust iris segmentation using fully convolutional network with dilated convolutions | |
CN117133032A (en) | Personnel identification and positioning method based on RGB-D image under face shielding condition | |
CN112329662B (en) | Multi-view saliency estimation method based on unsupervised learning | |
CN104463962B (en) | Three-dimensional scene reconstruction method based on GPS information video | |
CN113159158A (en) | License plate correction and reconstruction method and system based on generation countermeasure network | |
CN111080754A (en) | Character animation production method and device for connecting characteristic points of head and limbs |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||