CN110827304A - Traditional Chinese medicine tongue image positioning method and system based on deep convolutional network and level set method - Google Patents

Traditional Chinese medicine tongue image positioning method and system based on deep convolutional network and level set method

Info

Publication number
CN110827304A
CN110827304A
Authority
CN
China
Prior art keywords
tongue
image
sub
picture
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810911858.1A
Other languages
Chinese (zh)
Other versions
CN110827304B (en)
Inventor
李梢
侯思宇
肖帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201810911858.1A priority Critical patent/CN110827304B/en
Publication of CN110827304A publication Critical patent/CN110827304A/en
Application granted granted Critical
Publication of CN110827304B publication Critical patent/CN110827304B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/12Edge-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing

Abstract

The invention preprocesses the tongue picture with a neural network, giving good adaptability and robustness. The tongue picture is divided into sub-blocks of 150 × 150 × 3 pixels, and each sub-block is labeled only as to whether it contains a tongue region. The neural network has a simple structure: 3 convolutional layers with 10, 10 and 1 convolution kernels respectively, 2 pooling layers, 2 fully connected layers (with 300 and 100 neurons), and an output layer. The method can therefore run on various intelligent terminals, including smartphones, free of hardware computing-power constraints, which greatly expands the range of application of tongue image positioning. An individual user can complete tongue image positioning on a portable intelligent terminal such as a personal smartphone or tablet (e.g., an iPad) without uploading the tongue picture to a processing center, which makes the design of distributed tongue image acquisition/processing/analysis systems more flexible and markedly improves resource utilization.

Description

Traditional Chinese medicine tongue image positioning method and system based on deep convolutional network and level set method
Technical Field
The invention relates to a traditional Chinese medicine tongue image positioning method and system based on a deep convolutional network and a level set method.
Background
Tongue diagnosis, a characteristic diagnostic method of traditional Chinese medicine, has a relatively complete theoretical basis; it reveals objective phenomena of human physiology and pathology through the tongue's appearance and is widely accepted. It remains an effective, characteristic diagnostic method and still plays an important role in clinical practice today. To date, tongue diagnosis is one of the clearest, easiest and most effective methods for syndrome diagnosis, and it is important for understanding disease, guiding medication and evaluating curative effect. With the development of computer technology, researchers have begun to combine methods such as deep learning and machine vision with the rich clinical experience of traditional Chinese medicine experts to advance the objectification and standardization of tongue diagnosis. Positioning and segmentation of the tongue image is one of the important steps in this research: accurate segmentation supports subsequent feature analysis and classification, so the accuracy of positioning and segmentation strongly influences downstream research.
Disclosure of Invention
The invention provides a tongue positioning and segmentation method which is rapid, efficient, strong in robustness and strong in self-adaptability, aiming at an open type acquisition environment and facing complex factors such as different image resolution, image quality, light source color temperature, illumination intensity, shooting angle, background environment and the like. The method designs an effective convolution neural network structure, can realize the rapid positioning of the tongue body in the original image, and can accurately segment the tongue outline by using a level set method.
In the prior art there are few traditional Chinese medicine tongue recognition and segmentation schemes aimed at an open background. Common image segmentation methods work well on simply connected regions with distinct boundaries and a single color, such as magnetic resonance images, but perform very poorly on traditional Chinese medicine tongue images with complex backgrounds: the tongue region can be neither recognized nor segmented accurately.
Tongue segmentation schemes in the prior art fall mainly into two categories: 1. preprocessing the image and then segmenting the processed image with a classical image segmentation method; 2. recognizing and segmenting the tongue picture with artificial-intelligence methods such as neural networks. Specific examples follow.
CN107194937A (applicant: Xiamen university, inventor: Huangdaoyang) adopts a large number of image processing methods for combined judgment, including maximum between-class variance, hue threshold segmentation and RGB three-color component variance segmentation, and adopts a random forest prediction method by combining the spatial position, color, shape and outline information of a connected domain convex hull. After a series of processing, the image is segmented by adopting an SNAKE segmentation method to obtain the tongue position.
CN107316307A (applicant: Beijing University of Technology, inventor: Zhuoli) designs a convolutional neural network structure, trains the network with collected sample data to obtain a network model, and uses the model to segment traditional Chinese medicine tongue images automatically. Constructing the training data for this method is quite laborious, and the constructed neural network is computationally very heavy: the image size is 512 × 512 pixels, and every pixel of 5000 training pictures was manually labeled, about 1.3 billion pixels in total. The encoding network comprises 15 convolutional layers with 32, 64, 128, 256 and 512 feature maps respectively, plus 5 pooling layers; the decoding network stacks further complex structures such as deconvolution layers, convolutional layers and Batch Normalization layers. The number of parameters the network must compute runs into the billions.
The main advantages of the present invention compared to these patents are:
1. A neural network is used to preprocess the image, giving strong adaptability and robustness. The method is not bound by the limits of traditional image processing: no threshold needs to be selected manually, which avoids the failure of traditional methods when a fixed threshold meets images acquired under extreme or differing environments.
2. A neural network with a simple structure and a small computation load is designed. The requirement in existing training sets to label every pixel as background or tongue is neatly avoided: the image is divided into small 150 × 150 × 3 sub-blocks, and each sub-block is labeled only as to whether it contains a valid tongue region. Tongue contour segmentation is thereby converted into a binary classification problem of whether a sub-block contains the tongue. The amount of manual labeling drops sharply; the training data of the invention contains only 5000 sub-blocks. The neural network has a simple structure, with only 3 convolutional layers whose numbers of convolution kernels (i.e., feature maps in the terminology of CN107316307A) are 10, 10 and 1, 2 pooling layers, 2 fully connected layers with 300 and 100 neurons, and an output layer. The computed parameters are on the order of millions, so the trained model can run on various intelligent terminals, including smartphones: tongue image positioning can be realized even on a smartphone, removing the limitation of hardware computing capacity and greatly expanding the reach of tongue image positioning/contour determination. An individual user can complete tongue image positioning on a portable intelligent terminal such as a personal smartphone or tablet (e.g., an iPad) without uploading the tongue picture to a data processing center such as a server, which makes the whole distributed tongue image acquisition/processing/analysis system more flexible to design and markedly improves resource utilization.
According to an aspect of the present invention, there is provided a tongue image positioning method, comprising:
A) positioning the tongue body area in the input tongue picture, comprising:
for the tongue picture, a sub-picture block obtained by segmenting the tongue picture is judged by utilizing a trained convolutional neural network, the sub-picture block is divided into two categories of a sub-picture block containing a tongue body and a sub-picture block not containing the tongue body, and a sub-picture label corresponding to each sub-picture block is obtained,
the sub-image labels are logically judged to obtain a rectangular image containing a complete tongue body, namely the tongue body position is quickly positioned,
B) and performing image segmentation on the tongue picture by adopting level set-based processing.
According to a further aspect of the present invention, the processing for image segmentation of a tongue picture by level set-based processing includes:
a surface phi is provided which intersects a zero plane to obtain a curve C, the curve C is passed through a level set to obtain a tongue profile,
let coordinate points (x, y) on curve C belong to a curve that evolves over time, let x (t) be the positions of the coordinate points at time t, i.e. at any time t, each point x (t) is a point of surface phi on a curve with height 0, i.e.:
φ(x(t),t)=0 (3)
further, φt at any time is deduced from the following equations (4), (5) and (6):
φt + ∇φ(x(t), t) · x'(t) = 0 (4)
F = x'(t) · n, where n = ∇φ/|∇φ| is the unit normal of the curve (5)
φt = −F|∇φ| (6)
Wherein
The surface phi is related to tongue image information, updated with a potential derived from the tongue image,
taking x (t) as the determined tongue contour, and reducing the error between x (t) and the real tongue contour with the change of t, which specifically comprises:
providing a matrix I for calculating potential energy by combining the HSV space of the tongue picture with information from the RGB space, wherein R, G, B and H respectively represent the three channels of the image's RGB space and the H channel of its HSV space, x and y represent the horizontal and vertical coordinate values of the matrix I, and (xc, yc) represents the coordinates of the center point of the matrix I:
I(x,y)=1.3R(x,y)-6.7G(x,y)+6.4B(x,y)-H(xc,yc) (7)
For a rectangular picture including the complete tongue body, given an initial surface φ of the picture at time t = 0, φ is expressed in the matrix form of equation (8),
[equation (8), the initial matrix form of φ, appears only as an image in the original]
the coordinate points in the range including the tongue body region are represented as a set U of formula (9),
and let the set U = {(x, y) | φ(x, y) > 0} (9)
Taking the obtained outline of the outer edge of the set U as the coordinate value x (t) of the tongue edge determined by the current cycle t,
where Num1 is the number of elements in the set U, and Num2 is the total pixel count of the rectangular image containing the complete tongue body minus Num1. Equations (10) to (16) are iterated in a loop until convergence, so that the error between x(t) and the true tongue contour shrinks during the loop:
Grade1 = (1/Num1) · Σ(x,y)∈U I(x, y) (10)
Grade2 = (1/Num2) · Σ(x,y)∉U I(x, y) (11)
[equations (12) to (14), which define the intermediate variable F(x, y) from I(x, y), Grade1 and Grade2 and apply the resulting force to φ, appear only as images in the original]
φ(x,y):=φ(x,y)*G (15)
U={(x,y)|φ(x,y)>0} (16)
wherein:
grade1 and Grade2 are the average potential energy sizes of the rectangular tongue pictures inside and outside the set U respectively,
f (x, y) is an intermediate variable,
G is the 5 × 5 Gaussian operator of equation (18); G is introduced to eliminate noise to some extent and make the result more stable, and each cycle uses the Gaussian operator G as a convolution kernel to perform a convolution on the matrix φ,
where σ is the standard deviation of the Gaussian,
Fn is a potential derived from the rectangular image and the surface φ, as expressed by equation (17); it is used to update φ, and the updated φ updates the set U, yielding an x(t) with smaller error,
Fn=α·F(x,y) (17),
when the set U is not changed any more, the iteration is stopped, and the outer edge of the set U obtained at this time, namely x (t), is taken as the coordinate of the edge of the tongue.
According to another aspect of the present invention, the above-mentioned trained convolutional neural network is built by a modeling method comprising:
a step of constructing a convolutional neural network, and
a step of training a convolutional neural network model,
wherein:
the convolutional neural network includes:
an input layer, which is a sub-block of size 150 × 150 × 3 obtained by dividing the image,
first, second and third convolutional layers containing 10, 10 and 1 convolution kernels respectively, all convolution kernels being 5 × 5 in size,
a first pooling layer and a second pooling layer located after the first convolutional layer and the second convolutional layer, respectively, and being an average pooling layer having a core size of 2 x 2,
a fully-connected layer, comprising two layers, having 300 and 100 neurons respectively,
an output layer,
the step of constructing the convolutional neural network comprises:
connecting neurons in the convolutional layer to pixels in the small rectangular receptive field of the convolutional layer,
connecting each neuron in the second and third convolutional layers only to a small rectangular receptive field in the previous layer, so that the convolutional neural network focuses on the low-level features of one level and then assembles these low-level features into the higher-level features of the next level,
connecting each neuron in the pooling layer to the output of a limited number of neurons in a previous layer, the connected neurons in the previous layer being spatially structured within a small rectangle that is the kernel of the pooling layer, inputting the average value of each kernel of 2 x 2 size and span 2 to the next layer,
the third convolution layer is connected with the full connection layer through extension transformation,
connecting the full connection layer with the output layer to obtain the softmax cross entropy of the image for each category,
obtaining the size of a predicted value of each category of the sub-image block by the forward propagation of the full connection layer of the features obtained by the first to the third convolution layers, determining the probability value of each category of the sub-image block by utilizing softmax regression,
the step of training the convolutional neural network model comprises:
using the cross entropy as a loss function, as shown in equation (1),
Loss = −(1/n) Σ p · log(q) (1)
wherein Loss is a value of cross entropy, n is the number of input sample sub-image blocks, p is expected output probability, namely a true value of each class to which a sample sub-image block belongs, and q is actual output obtained by forward propagation calculation of the convolutional neural network, namely a predicted value of each class to which the sub-image block belongs.
Determining the cross entropy between the prediction value and the actual value of each class to which the predetermined sample sub-picture block belongs using a loss function,
training and updating the parameters of the convolutional neural network by using a back propagation algorithm and stochastic gradient descent according to equation (2),
W := W − α · ∂Loss/∂W (2)
where W represents the parameter values in the convolutional neural network, α is the learning rate,
and continuously reducing the error between the predicted value and the true value of the sample sub-image block's category given by the convolutional neural network, obtaining a well-trained convolutional neural network through multiple cycles.
Drawings
FIG. 1 is a diagram of a convolutional neural network architecture for image classification;
FIG. 2 is a schematic diagram of the whole process of the method of the present invention. Fig. 2(a) is an input original image, fig. 2(b) is a set of sub-image blocks obtained by scaling and dividing, fig. 2(c) is a rectangular area including a complete tongue obtained by using a classification result of a sub-image block by a convolutional neural network and through logical judgment, and fig. 2(d) is a tongue contour finally detected in the rectangular area by a level set method.
FIG. 3 is a flow diagram illustrating modeling of a convolutional neural network for classifying sub-tiles according to one embodiment of the present invention.
FIG. 4 is a flowchart of a tongue image location and contour segmentation method according to an embodiment of the present invention.
Detailed Description
The inventor finds that although research on tongue image segmentation has made some progress, existing methods usually segment only in a single, closed acquisition environment; they generalize poorly, are easily affected by image quality, light source color temperature, illumination intensity, shooting angle, background environment and the like, produce very unsatisfactory segmentation in an open acquisition environment, and require large amounts of computation. To overcome these defects of weak robustness and narrow applicability, the inventor proposes a new tongue image segmentation method for the open acquisition environment which has good robustness and adaptability, is little affected by the external acquisition environment or image quality, and whose computation load is markedly lower than that of existing technical schemes.
Convolutional Neural Networks (CNN) are a neural network structure inspired by the working principle of the cerebral visual cortex and have been used for image recognition since the 1980s. In recent years, thanks to growth in computing power, the amount of available training data and refinements in deep network training, CNNs have come to power image search services, autonomous driving, automatic video classification systems and the like, achieving striking performance on these complex visual tasks. CNNs are invariant to image scaling, translation and rotation and have proven very effective in image recognition and classification, which makes them well suited to quickly positioning tongue images across different acquisition environments.
The level set method was first proposed by Osher et al. in 1988. Borrowing important ideas from fluid mechanics, it effectively handles the geometric and topological changes of a closed curve deforming over time: instead of explicitly tracking the evolving curve, the curve evolution is converted into solving a partial differential equation (PDE), which makes the computation stable and applicable in spaces of any dimension. Using level sets for image segmentation means combining them with an active contour model; the PDE obtained by solving such models with the level set method belongs to the edge-detection family of segmentation methods.
According to one embodiment of the invention, firstly, in order to realize rapid automatic positioning, the image is zoomed, redundant information of the image is reduced, and the processed image is segmented; a convolutional neural network is adopted to classify the sub-image block set obtained by segmentation, and the rapid positioning of the tongue body is realized by logically judging the classification result; furthermore, the rectangular area of the tongue image obtained by positioning is subjected to contour segmentation by a level set method.
The tongue positioning and dividing method according to one embodiment of the invention comprises the following steps:
firstly, a convolutional neural network structure is constructed, and the network is trained to obtain a well-trained convolutional neural network.
An original picture taken in an open environment is input, and scaling transformation and segmentation operations are performed.
The set of sub-image blocks segmented in the previous step is classified using the trained convolutional neural network.
The classification results obtained above are automatically screened to obtain the rectangular sub-image region containing the complete tongue body, realizing the fast positioning function.
The obtained rectangular sub-image containing the tongue body is processed with the level set method to segment the tongue contour,
completing the edge segmentation function and obtaining the complete tongue body.
Compared with the traditional Chinese medicine tongue image processing and segmenting method, the method has the advantages and/or beneficial effects that:
1. the tongue image can be quickly positioned. Through image preprocessing, the data volume is reduced, and the calculated amount is greatly reduced. And obtaining a rectangular region subgraph containing the complete tongue body through a convolutional neural network.
2. The automatic segmentation of the tongue outline is realized, and the complicated manual selection process is avoided. The method has obvious advantages in the aspects of accuracy, speed, convenience and the like of positioning and dividing.
3. Strong adaptability and wide application range. The method provided by the invention has higher accuracy rate in response to different open acquisition environments, illumination intensity and image quality.
And S101, constructing a data set of the CNN model.
Different image acquisition devices, such as the front and rear cameras of different mobile phones and professional tongue image acquisition instruments, are used to collect and segment tongue pictures of different people in different environments, forming the data set required to train the convolutional neural network: 51,000 sub-image blocks of 150 × 150 pixels in total. These sub-blocks include both tongue regions and background regions.
Step S102, manually marking semantic labels
A semantic label is manually added to each sub-image block obtained in step S101, annotating whether the sub-block belongs to the tongue region or the background region. If more than half of a sub-image block is tongue area, it is labeled as tongue region; otherwise it is labeled as background region. If the distance between the lens and the tongue is too large when the original picture is shot, the tongue occupies only a narrow area of the image, which hampers further feature analysis and tongue image applications. An image is therefore judged invalid if all sub-blocks of the complete image are labeled as background, i.e., the tongue occupies no more than 1/2 of the area of any sub-block.
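The labeling rule above is simple enough to automate once a per-pixel tongue annotation exists for an image. A minimal sketch, assuming the annotation is available as a binary numpy mask (the function name and mask input are illustrative, not from the patent):

```python
import numpy as np

def label_subblocks(tongue_mask, block=150):
    """Label each block x block sub-block: 1 when more than half of its pixels
    are tongue (the majority rule of step S102), 0 otherwise. Returns None
    when every sub-block is background, i.e. the image is judged invalid."""
    h, w = tongue_mask.shape
    labels = {}
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            patch = tongue_mask[y:y + block, x:x + block]
            labels[(y, x)] = int(patch.mean() > 0.5)  # majority-tongue rule
    if not any(labels.values()):
        return None  # no sub-block is majority tongue: discard the image
    return labels
```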
And S103, constructing and training a convolutional neural network model.
The invention designs a deep convolutional neural network aiming at the classification of sub-image blocks, trains the network by using the data set created in the step S101 and the label of the step S102, and finally obtains a network model for classifying tongue bodies and background areas.
An important hierarchy in the network structure is first introduced. The most important component of the CNN structure is the convolutional layer, in which neurons are not connected to every pixel in the input image, but to pixels in its receptive field. Furthermore, each neuron in the next convolutional layer is connected only to neurons located within the small rectangle in the previous layer, i.e., the receptive field. This architecture allows the neural network to focus on the low-level features of the first hidden layer and then assemble them into the high-level features of the next hidden layer. This hierarchical structure is common in real-world images, which is one of the reasons why CNNs are effective in image recognition.
As in convolutional layers, each neuron in the pooling layer is connected to the output of a limited number of neurons in the previous layer, within a small rectangular receptive field. However, the pooled neurons have no weight; all it does is aggregate the inputs using an aggregation function. In the present invention, we use a kernel of size 2 × 2, with a span of 2. The average value in each core is input to the next layer.
The overall structure of the model is designed as follows. The network comprises an input layer, convolutional layers, pooling layers, fully connected layers and an output layer. The input layer is a segmented color image of size 150 × 150 × 3. The network contains 3 convolutional layers with 10, 10 and 1 convolution kernels respectively, all of size 5 × 5. Pooling layers follow the conv1 and conv2 layers, giving two pooling layers in total; average pooling with a kernel size of 2 × 2 is used. The conv3 layer is connected to the fully connected layers through a flattening transformation. There are two fully connected layers, with 300 and 100 neurons respectively. The fully connected layers connect to the output layer, finally yielding the softmax cross entropy of the image for each category. The activation function adopted by the invention is the ReLU function. Figure 1 shows the overall network structure of the design.
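As a concrete reading of this architecture, the following PyTorch sketch stacks the layers exactly as listed; the patent does not state the padding or stride of the convolutions, so "same" padding (2 for a 5 × 5 kernel) is an assumption here, which makes the flattened feature size 37 × 37:

```python
import torch.nn as nn

class TongueSubblockNet(nn.Module):
    """Sketch of the described classifier: 3 conv layers (10, 10 and 1 kernels
    of size 5x5), 2x2 average pooling after conv1 and conv2, fully connected
    layers of 300 and 100 neurons, and a 2-class output (tongue/background)."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 10, kernel_size=5, padding=2), nn.ReLU(),
            nn.AvgPool2d(kernel_size=2, stride=2),        # 150 -> 75
            nn.Conv2d(10, 10, kernel_size=5, padding=2), nn.ReLU(),
            nn.AvgPool2d(kernel_size=2, stride=2),        # 75 -> 37
            nn.Conv2d(10, 1, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                 # 1 x 37 x 37 = 1369
            nn.Linear(37 * 37, 300), nn.ReLU(),
            nn.Linear(300, 100), nn.ReLU(),
            nn.Linear(100, 2),                            # logits per class
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```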
The features produced by the three convolutional layers are propagated forward through the fully connected layers to obtain the predicted value for each class of the image, and softmax regression converts these values into a probability for each class. We use cross entropy as the loss function, as shown in equation (1), and train the network parameters by back propagation.
Loss = −(1/n) Σ p · log(q) (1)
Wherein Loss is the value of the cross entropy, n is the number of input samples, p is the expected output probability (1 or 0), and q is the actual output of the convolutional neural network calculated by forward propagation.
The loss function computes the cross entropy between the predicted value and the given true value; its magnitude reflects the size of the error. The parameters of the convolutional neural network are trained and updated using the back propagation algorithm and stochastic gradient descent (equation 2). The error between the network's predicted value and the true value keeps decreasing, finally yielding a well-trained convolutional neural network.
W := W − α · ∂Loss/∂W (2)
Where W represents the parameter value in the convolutional neural network and α is the learning rate.
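A minimal training loop matching equations (1) and (2) might look as follows; PyTorch's CrossEntropyLoss applies softmax internally, and plain SGD implements the update W := W − α·∂Loss/∂W. The batch source, learning rate and epoch count are illustrative assumptions:

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=10, lr=0.01):
    """Train with softmax cross entropy (equation 1) and stochastic gradient
    descent (equation 2) via back propagation."""
    criterion = nn.CrossEntropyLoss()                       # equation (1)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)  # alpha = lr
    for _ in range(epochs):
        for patches, labels in loader:   # 3x150x150 sub-blocks, 0/1 labels
            optimizer.zero_grad()
            loss = criterion(model(patches), labels)
            loss.backward()              # back propagation of the error
            optimizer.step()             # W := W - alpha * dLoss/dW
    return model
```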
The classification accuracy of the finally constructed convolutional neural network on the 2000 images of the test set (1000 positive and 1000 negative samples) reaches 94.9%.
According to an aspect of the present invention, there is provided a "modeling method of a convolutional neural network model for determining classes of subgraphs in an image", comprising (as shown in fig. 3):
a step of constructing a convolutional neural network, and
a step of training a convolutional neural network model,
wherein
The convolutional neural network includes:
an input layer, the input layer being a segmented color image of size 150 × 150 × 3,
first, second and third convolutional layers containing 10, 10 and 1 convolution kernels respectively, all convolution kernels being 5 × 5 in size,
a first pooling layer and a second pooling layer located after the first convolutional layer and the second convolutional layer, respectively, and being an average pooling layer having a core size of 2 x 2,
a fully-connected layer, comprising two layers, having 300 and 100 neurons respectively,
an output layer,
the step of constructing the convolutional neural network comprises:
the neurons in the convolutional layer are not connected to every pixel in the input image, but to pixels in the receptive field of the convolutional layer,
this architecture allows the neural network to focus on the low-level features of the first hidden layer and then assemble them into the high-level features of the next hidden layer,
each neuron in the pooling layer is connected to the output of a limited number of neurons in the previous layer, located in a small rectangular receptive field, the average value of each kernel of 2 x 2 size and span 2 is input to the next layer,
the third convolution layer is connected with the full connection layer through extension transformation,
connecting the full connection layer with the output layer, and finally obtaining the softmax cross entropy of the image for each category, wherein the activation function adopted by the invention is a ReLU function,
obtaining the size of each predicted value of the sub-image block belonging to each category by the forward propagation of the fully connected layer to the characteristics obtained by the first to the third convolution layers, and giving the probability value of each category of the sub-image block by utilizing softmax regression,
the step of training the convolutional neural network model comprises:
using the cross entropy as a loss function, as shown in equation (1),
Loss = −(1/n) Σ p · log(q) (1)
wherein Loss is the value of the cross entropy, n is the number of input samples, p is the expected output probability, q is the actual output of the convolutional neural network through forward propagation calculation,
the cross entropy between the predicted value and the given real value is calculated using a loss function,
according to equation (2), the parameters of the convolutional neural network are trained and updated by using a back propagation algorithm and stochastic gradient descent,
W := W − α · ∂Loss/∂W (2)
where W represents the parameter values in the convolutional neural network, α is the learning rate,
and continuously reducing the error between the predicted value and the true value of the sub-image category given by the convolutional neural network, finally obtaining a well-trained convolutional neural network through multiple cycles. In general, a number of cycles is set in advance, and the operation stops once enough cycles have run.
The classification accuracy of the finally constructed convolutional neural network on the 2000 images of the test set (1000 positive and 1000 negative samples) reaches 94.9%.
FIG. 4 is a flowchart of a tongue image localization and contour segmentation method according to an embodiment of the present invention, the method comprising:
step S201: image pre-processing
Because the shooting equipment and the acquisition environment differ, image characteristics such as pixel count and resolution differ as well. The input image is therefore preprocessed first: images of different resolutions are scaled to images with an essentially consistent number of pixels. This standardizes and normalizes the subsequent sub-image segmentation and classification, tongue edge detection and other processing. The tongue's position coordinates bear no absolute relationship to the image resolution but scale proportionally with it. A low-resolution image can therefore be used for positioning via a scaling transformation, and the resulting edge coordinates can be restored by the inverse transformation to obtain the tongue edge coordinates in the real image, while the reduced data volume and computation load markedly improve calculation speed.
In one embodiment according to the present invention, photos taken by a typical mobile phone front camera are taken as the reference, with 1,080,000 pixels as the basic standard, and pictures larger than that standard (such as photos taken by rear cameras and professional cameras) are scaled down to the same size. The scaling factor is recorded so that subsequently calculated coordinate positions can be restored by the inverse transformation.
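A sketch of this normalization step, assuming OpenCV for resizing (the 1,080,000-pixel baseline is from the embodiment; the function name is illustrative):

```python
import math
import cv2

TARGET_PIXELS = 1_080_000  # front-camera baseline of the embodiment

def normalize_resolution(image):
    """Scale an image down so its pixel count roughly matches the baseline,
    returning the scale factor so that tongue-edge coordinates found on the
    scaled image can be mapped back by the inverse transform (divide by scale)."""
    h, w = image.shape[:2]
    scale = min(1.0, math.sqrt(TARGET_PIXELS / (h * w)))  # never upscale
    if scale < 1.0:
        image = cv2.resize(image, (int(w * scale), int(h * scale)))
    return image, scale
```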
Step S202: tongue region positioning
In step S103, a convolutional neural network for sub-image classification is obtained, and sub-images obtained by segmenting the input image in step S201 are classified. Taking a standard mobile phone front-end shot image as an example,
under the condition that the shooting distance is appropriate and the shot picture is valid, a traditional Chinese medicine tongue picture of 1200 × 900 pixels (Fig. 2(a)) is scaled and divided into 48 sub-blocks of size 150 × 150 (Fig. 2(b)). The segmented sub-image blocks are judged by the well-trained convolutional neural network and divided into the two categories of containing the tongue body and not containing the tongue body, giving the corresponding sub-image labels. Through logical judgment of the sub-image labels, a suitable rectangular frame containing the complete tongue body is finally obtained (Fig. 2(c)), realizing fast positioning of the tongue body.
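The "logical judgment" that turns sub-image labels into a rectangle can be approximated by the tight bounding box of the tongue-positive tiles. A sketch, assuming the network of the earlier model sketch and an RGB numpy image whose sides are multiples of 150 (all names illustrative):

```python
import torch

def locate_tongue(image, model, block=150):
    """Classify each block x block tile and return the pixel bounding rectangle
    (top, left, bottom, right) of all tongue-positive tiles, or None."""
    h, w = image.shape[:2]
    positive = []
    model.eval()
    with torch.no_grad():
        for r in range(h // block):
            for c in range(w // block):
                tile = image[r*block:(r+1)*block, c*block:(c+1)*block]
                x = torch.from_numpy(tile).float().permute(2, 0, 1)[None] / 255.0
                if model(x).argmax(1).item() == 1:   # class 1 = contains tongue
                    positive.append((r, c))
    if not positive:
        return None                                  # invalid image: no tongue
    rows = [r for r, _ in positive]
    cols = [c for _, c in positive]
    return (min(rows)*block, min(cols)*block,
            (max(rows)+1)*block, (max(cols)+1)*block)
```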
Step S203: horizontal tongue contour segmentation
To meet the needs of further tongue image analysis, the method builds on the fast positioning above: the invention applies level set based processing to segment the image containing the tongue.
The core idea of the level set is: assuming there is a surface φ that intersects the zero plane to produce a curve C, then curve C is the contour we obtain from the level set.
Let the coordinate point (x, y) on the curve C belong to a curve that evolves with time, and let x(t) be the position of the coordinate point at time t. At any time t, each point x(t) lies on the height-0 curve of the surface φ, i.e.:
φ(x(t),t)=0 (3)
further, we can deduce φ at any time according to equations (4), (5), (6)t
Figure BDA0001762015650000112
Figure BDA0001762015650000113
For a particular embodiment, in tongue image segmentation, the surface φ is associated with tongue image information and is updated with a potential derived from the tongue image. x (t) is the tongue contour calculated by the method, and the error between x (t) and the real tongue contour is reduced along with the change of t. The specific calculation method is as follows:
firstly, a matrix I for calculating the potential energy is given by combining HSV space of a tongue body image with information of RGB space, wherein R, G, B and H respectively represent three channels of the RGB space of the image and an H channel of the HSV space. x, y represent the horizontal and vertical coordinate values of the matrix, xc,ycRepresenting the coordinates of the center point of the matrix.
I(x,y)=1.3R(x,y)-6.7G(x,y)+6.4B(x,y)-H(xc,yc) (7)
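Equation (7) is a fixed linear combination of the color channels minus the hue at the rectangle's center; a direct transcription, assuming OpenCV's BGR channel layout and 8-bit input (these conversion details are assumptions):

```python
import numpy as np
import cv2

def potential_matrix(rect_bgr):
    """Potential-energy matrix I of equation (7):
    I(x,y) = 1.3*R - 6.7*G + 6.4*B - H(xc, yc)."""
    b, g, r = cv2.split(rect_bgr.astype(np.float32))
    h_channel = cv2.cvtColor(rect_bgr, cv2.COLOR_BGR2HSV)[:, :, 0].astype(np.float32)
    yc, xc = rect_bgr.shape[0] // 2, rect_bgr.shape[1] // 2
    return 1.3 * r - 6.7 * g + 6.4 * b - h_channel[yc, xc]
```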
According to the rectangular image containing the complete tongue body obtained in step S202, the initial surface φ of the image at time t = 0 is given and expressed in matrix form (equation 8), and the set U of coordinate points covering the tongue body area is recorded (equation 9). The computed outer-edge profile of the set U is the tongue edge coordinate value x(t) of the current cycle t, where:
[equation (8), the initial matrix form of φ, appears only as an image in the original]
and let the set U = {(x, y) | φ(x, y) > 0} (9)
where Num1 is the number of elements in the set U, and Num2 is the total pixel count of the rectangular image containing the complete tongue body minus Num1.
During the computation, the surface φ and the set U are iterated in a loop until convergence, so that the error between x(t) and the true tongue contour shrinks with each cycle. The loop iterates the following equations:
Grade1 = (1/Num1) · Σ(x,y)∈U I(x, y) (10)
Grade2 = (1/Num2) · Σ(x,y)∉U I(x, y) (11)
[equations (12) to (14), which define the intermediate variable F(x, y) from I(x, y), Grade1 and Grade2 and apply the resulting force to φ, appear only as images in the original]
φ(x,y):=φ(x,y)*G (15)
U={(x,y)|φ(x,y)>0} (16)
In the above formulas, Grade1 and Grade2 record the average potential energy of the image inside and outside the set U, respectively. F(x, y) is an intermediate variable used in the calculation and derivation. G (equation 18) is a 5 × 5 Gaussian operator; it is introduced to eliminate image noise to some extent and make the calculation more stable, and each cycle uses the Gaussian operator as a convolution kernel to convolve the matrix φ. Fn (equation 17) is the potential derived from the tongue image for the surface φ; this value is used to update φ, and the updated φ updates the set U, i.e., an x(t) with smaller error is obtained.
Fn=α·F(x,y) (17)
G(x, y) = (1/(2πσ²)) · exp(−(x² + y²)/(2σ²)) (18)
And when the set U is not changed any more, stopping iteration, and taking the outer edge of the set U, namely x (t) corresponding to the final time t, as the edge coordinate of the tongue body.
Through the above calculation, the tongue contour is finally segmented from the obtained rectangular sub-image containing the tongue body by level set processing, completing the edge segmentation function and obtaining the complete tongue body (Fig. 2(d)).
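The iteration can be sketched as follows. Grade1/Grade2 follow their definitions (equations 10-11), the Gaussian smoothing and the recomputation of U follow equations (15)-(16), and the stop criterion is an unchanged U. Because equations (12)-(14) appear only as images in the original, the force term below is a Chan-Vese-style choice and is an assumption, as is the use of scipy's gaussian_filter in place of the exact 5 × 5 kernel of equation (18):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def level_set_segment(I, iters=500, alpha=0.01, sigma=1.0):
    """Evolve phi over the potential matrix I and return the tongue region U;
    the outer edge of U is the contour x(t)."""
    phi = np.ones_like(I, dtype=np.float32)
    phi[:2, :] = phi[-2:, :] = phi[:, :2] = phi[:, -2:] = -1.0  # seed surface
    U = phi > 0
    for _ in range(iters):
        grade1 = I[U].mean()                    # mean potential inside U  (10)
        grade2 = I[~U].mean()                   # mean potential outside U (11)
        force = (I - grade2) ** 2 - (I - grade1) ** 2  # assumed region force
        phi = phi + alpha * force               # Fn = alpha * F(x, y)     (17)
        phi = gaussian_filter(phi, sigma)       # smoothing, cf. (15)/(18)
        new_U = phi > 0                         # U = {phi > 0}            (16)
        if np.array_equal(new_U, U):
            break                                # U unchanged: converged
        U = new_U
    return U
```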
The invention provides a fast, novel tongue image segmentation method. First, the input picture is scaled and divided to obtain a set of small sub-images, and a convolutional neural network classifies the sub-images, effectively reducing the amount and time of computation and realizing fast positioning of the tongue body. Then, combining statistical information from the image's HSV and RGB channels, a level set method performs accurate tongue segmentation on the rectangular region containing the tongue located in the previous step. The method suits various acquisition environments, both open and closed, and is highly adaptable with a wide application range. The classification accuracy of the constructed convolutional neural network on the 2000 images of the test set (1000 positive and 1000 negative samples) reaches 94.9%. Compared with general tongue image positioning and segmentation methods, the computation load is smaller, the accuracy is markedly improved, the tedious manual selection of the tongue contour is avoided, and automatic positioning and segmentation are realized. The method has clear advantages in positioning and segmentation accuracy, speed and the like.
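Finally, the sketches above can be chained into one end-to-end routine mirroring steps S201-S203; every function name here comes from the earlier illustrative sketches, not from the patent:

```python
import cv2

def segment_tongue(path, model):
    image = cv2.imread(path)                     # original tongue picture
    scaled, scale = normalize_resolution(image)  # step S201: preprocessing
    rgb = cv2.cvtColor(scaled, cv2.COLOR_BGR2RGB)
    rect = locate_tongue(rgb, model)             # step S202: CNN positioning
    if rect is None:
        return None                              # invalid picture: no tongue tile
    top, left, bottom, right = rect
    I = potential_matrix(scaled[top:bottom, left:right])  # equation (7)
    U = level_set_segment(I)                     # step S203: level set contour
    return U, scale  # contour coordinates map back to the original via 1/scale
```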

Claims (14)

1. A tongue image positioning method, characterized by comprising:
A) positioning the tongue body area in the input tongue picture, comprising:
for the tongue picture, a sub-picture block obtained by segmenting the tongue picture is judged by utilizing a trained convolutional neural network, the sub-picture block is divided into two categories of a sub-picture block containing a tongue body and a sub-picture block not containing the tongue body, and a sub-picture label corresponding to each sub-picture block is obtained,
the sub-image labels are logically judged to obtain a rectangular image containing a complete tongue body, namely the tongue body position is quickly positioned,
B) and performing image segmentation on the tongue picture by adopting level set-based processing.
2. The tongue image positioning method according to claim 1, wherein step B) comprises:
a surface phi is provided which intersects a zero plane to obtain a curve C, the curve C is passed through a level set to obtain a tongue profile,
let coordinate points (x, y) on curve C belong to a curve that evolves over time, let x (t) be the positions of the coordinate points at time t, i.e. at any time t, each point x (t) is a point of surface phi on a curve with height 0, i.e.:
φ(x(t),t)=0 (3)
further, φt at any time is deduced from the following equations (4), (5) and (6):
φt + ∇φ(x(t), t) · x'(t) = 0 (4)
F = x'(t) · n, where n = ∇φ/|∇φ| is the unit normal of the curve (5)
φt = −F|∇φ| (6)
Wherein
The surface phi is related to tongue image information, updated with a potential derived from the tongue image,
taking x (t) as the determined tongue contour, and reducing the error between x (t) and the real tongue contour with the change of t, which specifically comprises:
providing a matrix I for calculating potential energy by combining the HSV space of the tongue picture with information from the RGB space, wherein R, G, B and H respectively represent the three channels of the image's RGB space and the H channel of its HSV space, x and y represent the horizontal and vertical coordinate values of the matrix I, and (xc, yc) represents the coordinates of the center point of the matrix I:
I(x,y)=1.3R(x,y)-6.7G(x,y)+6.4B(x,y)-H(xc,yc) (7)
For a rectangular picture including the complete tongue body, given an initial surface φ of the picture at time t = 0, φ is expressed in the matrix form of equation (8),
[equation (8), the initial matrix form of φ, appears only as an image in the original]
the coordinate points in the range including the tongue body region are represented as a set U of formula (9),
and let the set U = {(x, y) | φ(x, y) > 0} (9)
Taking the obtained outline of the outer edge of the set U as the coordinate value x (t) of the tongue edge determined by the current cycle t,
where Num1 is the number of elements in the set U, and Num2 is the total pixel count of the rectangular image containing the complete tongue body minus Num1. Equations (10) to (16) are iterated in a loop until convergence, so that the error between x(t) and the true tongue contour shrinks during the loop:
Grade1 = (1/Num1) · Σ(x,y)∈U I(x, y) (10)
Grade2 = (1/Num2) · Σ(x,y)∉U I(x, y) (11)
[equations (12) to (14), which define the intermediate variable F(x, y) from I(x, y), Grade1 and Grade2 and apply the resulting force to φ, appear only as images in the original]
φ(x,y):=φ(x,y)*G (15)
U={(x,y)|φ(x,y)>0} (16)
wherein:
grade1 and Grade2 are the average potential energy sizes of the rectangular tongue pictures inside and outside the set U respectively,
f (x, y) is an intermediate variable,
G is the 5 × 5 Gaussian operator of equation (18); G is introduced to eliminate noise to some extent and make the result more stable, and each cycle uses the Gaussian operator G as a convolution kernel to perform a convolution on the matrix φ,
where σ is the standard deviation of the Gaussian,
Fn is a potential derived from the rectangular image and the surface φ, as expressed by equation (17); it is used to update φ, and the updated φ updates the set U, yielding an x(t) with smaller error,
Fn=α·F(x,y) (17),
when the set U is not changed any more, the iteration is stopped, and the outer edge of the set U obtained at this time, namely x (t), is taken as the coordinate of the edge of the tongue.
3. The tongue image positioning method according to claim 1, wherein the standard deviation σ is 1.
4. The tongue image positioning method according to claim 1, wherein:
firstly, zooming the shot original tongue picture into a picture with a preset size as the tongue picture, and then dividing the tongue picture into sub-picture blocks with preset sizes.
5. The tongue image localization method according to any one of claims 1-4, wherein the trained convolutional neural network is built and trained using a modeling method comprising:
a step of constructing a convolutional neural network, and
a step of training a convolutional neural network model,
wherein:
the convolutional neural network includes:
an input layer, which is a sub-block of size 150 × 150 × 3 obtained by dividing the image,
first, second and third convolutional layers containing 10, 10 and 1 convolution kernels respectively, all convolution kernels being 5 × 5 in size,
a first pooling layer and a second pooling layer located after the first convolutional layer and the second convolutional layer, respectively, and being an average pooling layer having a core size of 2 x 2,
a fully-connected layer, comprising two layers, having 300 and 100 neurons respectively,
an output layer,
the step of constructing the convolutional neural network comprises:
connecting neurons in the convolutional layer to pixels in the small rectangular receptive field of the convolutional layer,
connecting each neuron in the second and third convolutional layers only to a small rectangular receptive field in the previous layer, so that the convolutional neural network focuses on the low-level features of one level and then assembles these low-level features into the higher-level features of the next level,
connecting each neuron in the pooling layer to the output of a limited number of neurons in a previous layer, the connected neurons in the previous layer being spatially structured within a small rectangle that is the kernel of the pooling layer, inputting the average value of each kernel of 2 x 2 size and span 2 to the next layer,
the third convolution layer is connected with the full connection layer through extension transformation,
connecting the full connection layer with the output layer to obtain the softmax cross entropy of the image for each category,
obtaining the size of a predicted value of each category of the sub-image block by the forward propagation of the full connection layer of the features obtained by the first to the third convolution layers, determining the probability value of each category of the sub-image block by utilizing softmax regression,
the step of training the convolutional neural network model comprises:
using the cross entropy as a loss function, as shown in equation (1),
Loss = −(1/n) Σ p · log(q) (1)
wherein Loss is a value of cross entropy, n is the number of input sample sub-image blocks, p is expected output probability, namely a true value of each class to which a sample sub-image block belongs, and q is actual output obtained by forward propagation calculation of the convolutional neural network, namely a predicted value of each class to which the sub-image block belongs.
Determining the cross entropy between the prediction value and the actual value of each class to which the predetermined sample sub-picture block belongs using a loss function,
training and updating the parameters of the convolutional neural network by using a back propagation algorithm and stochastic gradient descent according to equation (2),
W := W − α · ∂Loss/∂W (2)
where W represents the parameter values in the convolutional neural network, α is the learning rate,
and continuously reducing the error between the predicted value and the true value of the sample sub-image block's category given by the convolutional neural network, obtaining a well-trained convolutional neural network through multiple cycles.
6. The tongue image positioning method according to claim 5, wherein:
the n sample sub-tiles are semantically tagged in such a way that the tag annotates whether the sample sub-tile belongs to a tongue region or a background region,
wherein if more than half of the sample sub-image block is a tongue region, the sample sub-image block is marked as the tongue region, otherwise, the sample sub-image block is marked as the background region,
if all sample sub-blocks from an image are marked as background regions, the image is judged to be invalid, i.e. all sample sub-blocks from the image are removed from the n sample sub-blocks.
7. A method for positioning a tongue image in accordance with claim 5, wherein n is equal to 5000.
8. A tongue image localization system, comprising:
a section for locating a tongue body region in the input tongue picture, for:
for the tongue picture, a sub-picture block obtained by segmenting the tongue picture is judged by utilizing a trained convolutional neural network, the sub-picture block is divided into two categories of a sub-picture block containing a tongue body and a sub-picture block not containing the tongue body, and a sub-picture label corresponding to each sub-picture block is obtained,
the sub-image labels are logically judged to obtain a rectangular image containing a complete tongue body, namely the tongue body position is quickly positioned,
the part for image segmentation of the tongue picture by adopting the level set-based processing is used for executing the following operations:
a surface phi is provided which intersects a zero plane to obtain a curve C, the curve C is passed through a level set to obtain a tongue profile,
let coordinate points (x, y) on curve C belong to a curve that evolves over time, let x (t) be the positions of the coordinate points at time t, i.e. at any time t, each point x (t) is a point of surface phi on a curve with height 0, i.e.:
φ(x(t),t)=0 (3)
further, φt at any time is deduced from the following equations (4), (5) and (6):
φt + ∇φ(x(t), t) · x'(t) = 0 (4)
F = x'(t) · n, where n = ∇φ/|∇φ| is the unit normal of the curve (5)
φt = −F|∇φ| (6)
Wherein
The surface phi is related to tongue image information, updated with a potential derived from the tongue image,
taking x (t) as the determined tongue contour, and reducing the error between x (t) and the real tongue contour with the change of t, which specifically comprises:
providing a matrix I for calculating potential energy by combining the HSV space of the tongue picture with information from the RGB space, wherein R, G, B and H respectively represent the three channels of the image's RGB space and the H channel of its HSV space, x and y represent the horizontal and vertical coordinate values of the matrix I, and (xc, yc) represents the coordinates of the center point of the matrix I:
I(x,y)=1.3R(x,y)-6.7G(x,y)+6.4B(x,y)-H(xc,yc) (7)
For a rectangular picture including the complete tongue body, given an initial surface φ of the picture at time t = 0, φ is expressed in the matrix form of equation (8),
[equation (8), the initial matrix form of φ, appears only as an image in the original]
the coordinate points in the range including the tongue body region are represented as a set U of formula (9),
and let the set U = {(x, y) | φ(x, y) > 0} (9)
Taking the obtained outline of the outer edge of the set U as the coordinate value x (t) of the tongue edge determined by the current cycle t,
where Num1 is the number of elements in the set U, and Num2 is the total pixel count of the rectangular image containing the complete tongue body minus Num1. Equations (10) to (16) are iterated in a loop until convergence, so that the error between x(t) and the true tongue contour shrinks during the loop:
Grade1 = (1/Num1) · Σ(x,y)∈U I(x, y) (10)
Grade2 = (1/Num2) · Σ(x,y)∉U I(x, y) (11)
[equations (12) to (14), which define the intermediate variable F(x, y) from I(x, y), Grade1 and Grade2 and apply the resulting force to φ, appear only as images in the original]
φ(x,y):=φ(x,y)*G (15)
U={(x,y)|φ(x,y)>0} (16)
wherein:
grade1 and Grade2 are the average potential energy sizes of the rectangular tongue pictures inside and outside the set U respectively,
f (x, y) is an intermediate variable,
G is the 5 × 5 Gaussian operator of equation (18); G is introduced to eliminate noise to some extent and make the result more stable, and each cycle uses the Gaussian operator G as a convolution kernel to perform a convolution on the matrix φ,
where σ is the standard deviation of the Gaussian,
Fn is a potential derived from the rectangular image and the surface φ, as expressed by equation (17); it is used to update φ, and the updated φ updates the set U, yielding an x(t) with smaller error,
Fn=α·F(x,y) (17),
when the set U is not changed any more, the iteration is stopped, and the outer edge of the set U obtained at this time, namely x (t), is taken as the coordinate of the edge of the tongue.
9. A tongue image positioning system as claimed in claim 8 wherein the standard deviation σ is 1.
10. The tongue image positioning system of claim 8, wherein:
the tongue picture is obtained by scaling an original tongue picture into a picture of a predetermined size, and the tongue picture is divided into sub-picture blocks of a predetermined size.
11. A tongue image localization system according to any one of claims 8-10 wherein said trained convolutional neural network is constructed and trained using a modeling method comprising:
a step of constructing a convolutional neural network, and
a step of training a convolutional neural network model,
wherein:
the convolutional neural network includes:
an input layer, which receives a sub-picture block of size 150 × 150 × 3 obtained by dividing the image,
first, second and third convolutional layers, containing 10, 10 and 1 convolution kernels respectively, all convolution kernels being 5 × 5 in size,
a first pooling layer and a second pooling layer, located after the first and second convolutional layers respectively, each being an average pooling layer with a kernel size of 2 × 2,
a fully connected part comprising two layers, having 300 and 100 neurons respectively, and
an output layer,
the step of constructing the convolutional neural network comprises:
connecting the neurons in a convolutional layer to the pixels within their small rectangular receptive fields,
connecting each neuron in the second and third convolutional layers only to a small rectangular receptive field located in the previous convolutional layer, so that the convolutional neural network focuses on the low-level features of the previous level and then assembles these low-level features into the high-level features of the next level,
connecting each neuron in a pooling layer to the outputs of a limited number of neurons in the previous layer, the connected neurons of the previous layer lying within a small rectangle that constitutes the kernel of the pooling layer, and inputting the average value of each kernel of size 2 × 2 and stride 2 to the next layer,
connecting the third convolutional layer with the fully connected layers through an extension (flattening) transformation,
connecting the fully connected layers with the output layer to obtain the softmax cross entropy of the image for each category,
and obtaining the predicted value of each category for a sub-picture block by forward propagation through the fully connected layers of the features obtained by the first to third convolutional layers, the probability of each category being determined using softmax regression,
the step of training the convolutional neural network model comprises:
using the cross entropy as a loss function, as shown in equation (1),
Loss = −(1/n)·Σ p·log(q) (1)
wherein Loss is the value of the cross entropy, n is the number of input sample sub-picture blocks, p is the expected output probability, i.e. the true value, and q is the actual output, i.e. the predicted value, obtained by the convolutional neural network through forward-propagation calculation,
determining, with the loss function, the cross entropy between the predicted value and the true value of each class to which a predetermined sample sub-picture block belongs,
training and updating the parameters of the convolutional neural network by a back-propagation algorithm with stochastic gradient descent according to equation (2),
W := W − α·∂Loss/∂W (2)
wherein W represents the parameter values in the convolutional neural network and α is the learning rate,
and continuously reducing the error between the predicted value and the true value of the class of the sample sub-picture blocks, a fully trained convolutional neural network being obtained after multiple cycles.
12. The tongue image positioning system of claim 11, wherein:
the n sample sub-picture blocks are semantically tagged, the tag annotating whether a sample sub-picture block belongs to the tongue region or the background region,
wherein if more than half of a sample sub-picture block is tongue region, the sample sub-picture block is marked as tongue region, and otherwise it is marked as background region,
and if all the sample sub-picture blocks from an image are marked as background region, the image is judged invalid, i.e. all the sample sub-picture blocks from that image are removed from the n sample sub-picture blocks.
13. A tongue image positioning system as claimed in claim 11 wherein n is equal to 5000.
14. A storage medium having stored thereon a computer program enabling a processor to execute the tongue image positioning method according to any one of claims 1 to 7.
CN201810911858.1A 2018-08-10 2018-08-10 Traditional Chinese medicine tongue image positioning method and system based on deep convolution network and level set method Active CN110827304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810911858.1A CN110827304B (en) 2018-08-10 2018-08-10 Traditional Chinese medicine tongue image positioning method and system based on deep convolution network and level set method

Publications (2)

Publication Number Publication Date
CN110827304A true CN110827304A (en) 2020-02-21
CN110827304B CN110827304B (en) 2023-06-09

Family

ID=69541414

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810911858.1A Active CN110827304B (en) 2018-08-10 2018-08-10 Traditional Chinese medicine tongue image positioning method and system based on deep convolution network and level set method

Country Status (1)

Country Link
CN (1) CN110827304B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080139966A1 (en) * 2006-12-07 2008-06-12 The Hong Kong Polytechnic University Automatic tongue diagnosis based on chromatic and textural features classification using bayesian belief networks
CN107316307A (en) * 2017-06-27 2017-11-03 北京工业大学 A kind of Chinese medicine tongue image automatic segmentation method based on depth convolutional neural networks

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
RATCHADAPORN KANAWONG ET AL.: "Automated Tongue Feature Extraction for ZHENG Classification in Traditional Chinese Medicine", Evidence-Based Complementary and Alternative Medicine *
ZHANG Xinfeng; WANG Mingying; CAI Yiheng; ZHUO Li: "Highly robust tongue image segmentation method for traditional Chinese medicine based on shape-prior level sets", Journal of Beijing University of Technology, no. 10
WANG Chengjun: "Lip image segmentation based on the level set algorithm", Science & Technology Vision, no. 12

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401452A (en) * 2020-03-17 2020-07-10 北京大学 Image classification method of equal-variation convolution network model based on partial differential operator
CN111401452B (en) * 2020-03-17 2022-04-26 北京大学 Image classification method of equal-variation convolution network model based on partial differential operator
CN111524093A (en) * 2020-03-23 2020-08-11 中润普达(十堰)大数据中心有限公司 Intelligent screening method and system for abnormal tongue picture
CN112382391A (en) * 2020-11-19 2021-02-19 上海中医药大学 Traditional Chinese medicine health assessment method based on personalized normality information and application thereof
CN113507546A (en) * 2021-06-01 2021-10-15 暨南大学 Image encryption method based on DCT-64
CN113507546B (en) * 2021-06-01 2023-07-18 暨南大学 DCT-64-based image encryption method
CN115299428A (en) * 2022-08-04 2022-11-08 国网江苏省电力有限公司南通供电分公司 Intelligent bird system that drives of thing networking based on degree of depth study

Similar Documents

Publication Publication Date Title
CN109410168B (en) Modeling method of convolutional neural network for determining sub-tile classes in an image
CN108491880B (en) Object classification and pose estimation method based on neural network
CN109344701B (en) Kinect-based dynamic gesture recognition method
CN108549891B Multi-scale diffusion salient target detection method based on background and target priors
CN107316307B (en) Automatic segmentation method of traditional Chinese medicine tongue image based on deep convolutional neural network
CN110428432B (en) Deep neural network algorithm for automatically segmenting colon gland image
CN110827304B (en) Traditional Chinese medicine tongue image positioning method and system based on deep convolution network and level set method
Liang et al. Parsing the hand in depth images
CN108520226B (en) Pedestrian re-identification method based on body decomposition and significance detection
CN106951870B (en) Intelligent detection and early warning method for active visual attention of significant events of surveillance video
CN108280397B (en) Human body image hair detection method based on deep convolutional neural network
CN108171112A (en) Vehicle identification and tracking based on convolutional neural networks
US20210118144A1 (en) Image processing method, electronic device, and storage medium
CN107705322A Motion estimation tracking method and system
CN113139479B (en) Micro-expression recognition method and system based on optical flow and RGB modal contrast learning
CN112784736B (en) Character interaction behavior recognition method based on multi-modal feature fusion
CN110728179A (en) Pig face identification method adopting multi-path convolutional neural network
CN109685045A (en) A kind of Moving Targets Based on Video Streams tracking and system
CN112101262B (en) Multi-feature fusion sign language recognition method and network model
CN108734200B Human target visual detection method and device based on BING (binarized normed gradients) features
CN113011288A (en) Mask RCNN algorithm-based remote sensing building detection method
Yan et al. Monocular depth estimation with guidance of surface normal map
CN105825168A (en) Golden snub-nosed monkey face detection and tracking algorithm based on S-TLD
CN111428664A (en) Real-time multi-person posture estimation method based on artificial intelligence deep learning technology for computer vision
CN111026898A (en) Weak supervision image emotion classification and positioning method based on cross space pooling strategy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant