CN113239790A - Tongue crack feature identification and length measurement method and system

Tongue crack feature identification and length measurement method and system

Info

Publication number
CN113239790A
CN113239790A
Authority
CN
China
Prior art keywords
crack
tongue
image
local
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110512076.2A
Other languages
Chinese (zh)
Inventor
颜仕星
郭峰
何海洋
李晓霞
李春清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Daosheng Medical Tech Co Ltd
Original Assignee
Shanghai Daosheng Medical Tech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Daosheng Medical Tech Co Ltd filed Critical Shanghai Daosheng Medical Tech Co Ltd
Priority to CN202110512076.2A
Publication of CN113239790A
Legal status: Withdrawn


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G06T7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Public Health (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Pathology (AREA)
  • Biophysics (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a tongue crack feature identification and length measurement method based on computer vision, which comprises the following steps: step 1, detecting and marking tongue cracks; and step 2, calculating the length of the tongue crack. The method improves the accuracy of crack feature extraction and identification, quantifies the length of the crack, and overcomes the incomplete segmentation and over-segmentation of traditional segmentation methods; it detects tongue crack images correctly, has a degree of robustness to interference, and avoids information loss. The invention also discloses a tongue crack feature identification and length measurement system based on computer vision.

Description

Tongue crack feature identification and length measurement method and system
Technical Field
The invention relates to an image processing method, in particular to a tongue crack feature identification and length measurement method based on computer vision. The invention also relates to a tongue crack characteristic identification and length measurement system based on computer vision.
Background
Tongue diagnosis in traditional Chinese medicine is an examination method that infers the pathological and physiological changes of the human viscera by observing the physiological and pathological forms of the tongue body. By observing the tongue picture, doctors can make a corresponding diagnosis and evaluation of a patient's condition, which is of great value in traditional Chinese medicine. Long-term clinical practice has shown that tongue diagnosis can accurately distinguish the site and severity of a lesion. The tongue cracks in the tongue picture are an important component of tongue diagnosis: their presence and shape can reflect vitamin deficiencies and the health of the digestive system, and their types can be roughly divided into longitudinal, transverse, vertical-line, radial, brain-furrow, cobblestone and other shapes. Traditional Chinese medicine has established the relationship between tongue crack type and the health of the human body, so clearly distinguishing the tongue crack type is of great benefit to the traditional Chinese medicine consultation.
Currently, methods for tongue crack research fall into two broad categories. The first is traditional image processing, which itself divides into two approaches. One performs threshold segmentation of the tongue cracks based on the grey-scale and colour information of the tongue image; because it does not consider tongue crack characteristics, it is difficult to segment the cracks accurately and completely. The other segments tongue cracks with line detection methods, which can be further divided into 3 subclasses: contour-based, centre-line-based and region-based segmentation. Although line detection considers the texture characteristics of tongue cracks, it has disadvantages, such as: the contour-based segmentation method uses the first derivative, is sensitive to noise, and in practice often cannot obtain a closed contour; the centre-line-based segmentation method generally uses the second derivative, which is also sensitive to noise, and extracts the centre-line position with a large error; the region-based segmentation method often segments coarse textures and pseudo cracks on the tongue coating, so the cracks can only be segmented after manually removing the redundant textures.
The second category is based on deep learning, but the existing convolutional-neural-network tongue crack feature extraction techniques depend heavily on the design and training of the network, and often ignore the wide applicability of traditional image processing to tongue cracks. As a result, the extracted tongue crack features hardly provide an accurate reference for traditional Chinese medicine consultation.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a tongue crack feature identification and length measurement method based on computer vision, which can realize objective quantitative evaluation on the tongue crack.
In order to solve the technical problems, the technical solution of the tongue crack feature identification and length measurement method based on computer vision of the present invention is that the method comprises the following steps:
step 1, detecting and marking tongue cracks;
detecting tongue cracks, performing primary characteristic extraction on cracks in an input tongue picture, obtaining and marking a boundary frame of local cracks, and obtaining a non-marked local crack picture;
in another embodiment, the step 1 comprises the steps of:
step 1.1, collect no fewer than N thousand original tongue pictures through a tongue picture acquisition device, and mark the bounding-box information of the cracks (the original cracks) on the original tongue pictures by manual annotation; save the annotated tongue picture files containing the original crack bounding-box information; randomly divide the annotated tongue pictures into a crack detection training set and a crack detection verification set; additionally collect at least M thousand unmarked original tongue pictures; wherein M is greater than N;
step 1.2, building a crack detection training model based on deep learning;
step 1.3, performing model training on the crack detection training model to obtain a trained crack detection model;
the model training of step 1.3 comprises the following steps:
step 1.3.1, training the crack detection training model based on deep learning set up in the step 1.2 by using the crack detection training set obtained in the step 1.1, and verifying by using the crack detection verification set obtained in the step 1.1 to obtain a detection model;
step 1.3.2, predicting each of the M thousands of unmarked original tongue pictures by using the detection model obtained in the step 1.3.1, sequencing the predicted tongue crack boundary boxes according to probability values, taking the crack boundary boxes with the probability values larger than 0.9 +/-0.05, deleting the boundary boxes with the probability values smaller than 0.9 +/-0.05 to obtain a prediction file, and merging the prediction file with the crack detection training set obtained in the step 1.1 to obtain a new crack detection training set;
step 1.3.3, the new crack detection training set obtained in the step 1.3.2 is used for training the detection model obtained in the step 1.3.1 again, and the crack detection verification set obtained in the step 1.1 is used for verification to obtain a trained crack detection model;
step 1.4, detecting a local crack boundary frame by using the trained crack detection model;
and step 1.5, cutting M thousands of unmarked original tongue pictures according to the local crack boundary frame detected in the step 1.4, and cutting out the local crack pictures to obtain the unmarked local crack pictures.
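The pseudo-labelling loop of steps 1.3.1 to 1.3.3 can be sketched as follows. This is a minimal sketch, not the patent's implementation: the (probability, box) data layout and the helper names are assumptions, and a fixed 0.9 cut-off stands in for the 0.9 ± 0.05 band of the text.

```python
# Assumed layout: each prediction is a (probability, bounding_box) pair.
PROB_THRESHOLD = 0.9  # the text allows 0.9 +/- 0.05; fixed here for illustration

def select_pseudo_labels(predictions, threshold=PROB_THRESHOLD):
    """Sort predicted crack boxes by probability and keep only confident ones."""
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    return [box for prob, box in ranked if prob > threshold]

def merge_training_set(labelled_set, unlabelled_images, predict_fn):
    """Augment the labelled set with confident predictions on unlabelled images."""
    merged = list(labelled_set)
    for image in unlabelled_images:
        for box in select_pseudo_labels(predict_fn(image)):
            merged.append((image, box))
    return merged
```

The merged set then retrains the detector (step 1.3.3), a standard self-training round.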
In another embodiment, CIoU loss is used as the regression loss function of the crack bounding box in the training process of step 1.3.1 and/or step 1.3.3; the CIoU crack bounding box regression loss function is:

L_CIoU = 1 - IoU + ρ²(b, b^gt)/c² + α·v

where ρ(·) denotes the Euclidean distance,
b denotes the centre point of the predicted box B,
b^gt denotes the centre point of the ground-truth box B^gt,
c denotes the diagonal length of the smallest rectangle enclosing the predicted box B and the ground-truth box B^gt,
α is a parameter used to balance the loss function,
v is a parameter used to measure the consistency of the aspect ratios.
In another embodiment, the parameter α in step 1.3.1 is:

α = v / ((1 - IoU) + v)

where IoU indicates the degree of overlap of any two crack bounding boxes B1 and B2:

IoU = |B1 ∩ B2| / |B1 ∪ B2|

The parameter v in step 1.3.1 is:

v = (4/π²)·(arctan(w^gt/h^gt) - arctan(w/h))²

where w^gt denotes the width of the manually marked (ground-truth) crack bounding box,
h^gt denotes the height of the manually marked crack bounding box,
w denotes the width of the predicted crack bounding box,
h denotes the height of the predicted crack bounding box.
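The CIoU loss named above has a standard closed form, sketched here without any framework; the (x1, y1, x2, y2) corner-tuple box convention is an illustrative assumption, not the patent's.

```python
import math

def iou(b1, b2):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(b1[2], b2[2]) - max(b1[0], b2[0]))
    iy = max(0.0, min(b1[3], b2[3]) - max(b1[1], b2[1]))
    inter = ix * iy
    area1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    area2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (area1 + area2 - inter)

def ciou_loss(pred, gt):
    """CIoU = 1 - IoU + rho^2/c^2 + alpha*v."""
    u = iou(pred, gt)
    # rho^2: squared distance between box centres
    cx_p, cy_p = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cx_g, cy_g = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    rho2 = (cx_p - cx_g) ** 2 + (cy_p - cy_g) ** 2
    # c^2: squared diagonal of the smallest enclosing rectangle
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2
    # v: aspect-ratio consistency; alpha: its trade-off weight
    w_p, h_p = pred[2] - pred[0], pred[3] - pred[1]
    w_g, h_g = gt[2] - gt[0], gt[3] - gt[1]
    v = (4 / math.pi ** 2) * (math.atan(w_g / h_g) - math.atan(w_p / h_p)) ** 2
    alpha = v / (1 - u + v) if (1 - u + v) > 0 else 0.0
    return 1 - u + rho2 / c2 + alpha * v
```

For a perfect prediction (pred == gt) every term vanishes and the loss is 0; for disjoint boxes the loss exceeds 1 because the centre-distance penalty is added to 1 - IoU.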
In another embodiment, the method for detecting the local crack bounding box in step 1.4 is as follows: and (3) predicting each of the M thousands of original tongue pictures without labels by using the trained crack detection model obtained in the step 1.3, sequencing the predicted tongue crack boundary boxes according to the probability values, taking the crack boundary boxes with the probability values larger than 0.9 +/-0.05, and deleting the boundary boxes with the probability values smaller than 0.9 +/-0.05 to obtain local crack boundary boxes.
Step 2, calculating the length of the tongue crack;
using the unmarked local crack image, sequentially apply a threshold segmentation algorithm, a dilation-erosion (morphological) algorithm, a wavelet transform algorithm and a skeleton extraction algorithm to process the image, and then calculate the length of the crack.
In another embodiment, the step 2 specifically includes the following steps:
step 3.1, convert the unmarked local crack image to grayscale to obtain a grayscale tongue crack image;
step 3.2, apply median filtering to the grayscale tongue crack image to obtain a smoothed local crack image;
in another embodiment, further, the specific method of step 3.2 is:
slide a two-dimensional template over the grayscale tongue crack image, sort the pixels inside the template by pixel value into a monotonic sequence, and take the middle pixel value as the output value for the template region; the output of the median filter is:
g(x,y)=med{f(x-k,y-l),(k,l∈w)}
where g(x, y) is the filtered image,
f(x, y) is the original image,
(k, l) ranges over the neighbourhood offsets of pixel (x, y),
w is the two-dimensional template (window),
med denotes the median operation.
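A minimal sketch of this median filter on a nested-list grey image; the border policy (border pixels left unchanged) is an assumption, since the text does not specify one.

```python
def median_filter(img, w=3):
    """Median filter with an odd w x w template; borders kept as-is (assumption)."""
    h, wd = len(img), len(img[0])
    r = w // 2
    out = [row[:] for row in img]
    for y in range(r, h - r):
        for x in range(r, wd - r):
            # sort the template pixels and take the middle value: med{f(x-k, y-l)}
            window = sorted(
                img[y + l][x + k] for l in range(-r, r + 1) for k in range(-r, r + 1)
            )
            out[y][x] = window[len(window) // 2]
    return out
```

A single noise spike surrounded by uniform pixels is removed, which is the smoothing behaviour step 3.2 relies on.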
step 3.3, segment the smoothed local crack image by adaptive threshold segmentation: compute a local threshold from the brightness distribution of the different regions of the image, and segment each local region with its threshold to obtain a binarized local tongue picture;
in another embodiment, the calculation of the local threshold in said step 3.3 employs the following formula:
t=mean*(1+p*e(-q*mean)+k*((std/r)-1))
wherein, t is a threshold value,
mean is the local mean value of the mean,
std is the local variance of the signal,
p=2,
q=10,
k=0.25,
r=0.5。
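The local-threshold rule above can be written down directly. A minimal sketch: the function name is illustrative, and `mean`/`std` are the local statistics named in the text.

```python
import math

def local_threshold(mean, std, p=2.0, q=10.0, k=0.25, r=0.5):
    """Sauvola-style adaptive threshold with the constants given in the text."""
    return mean * (1 + p * math.exp(-q * mean) + k * ((std / r) - 1))
```

Intuitively, the e^(−q·mean) term raises the threshold in dark regions and the std/r term raises it where local contrast is high, so crack pixels (dark, high-contrast) binarize cleanly.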
step 3.4, perform the opening operation on the binarized local tongue picture N times, i.e. N erosions followed by N dilations;
step 3.5, perform the closing operation on the opened local crack image N times, i.e. N dilations followed by N erosions;
in another embodiment, further, the etching performed N times in step 3.4 and/or step 3.5 adopts the following formula:
dst(x,y)=erode(src(x,y))=min(x1,y1)src(x+x1,y+y1)
wherein, dst is an objective function,
(x, y) are points in the original image,
the enode refers to performing an etching operation on an original image,
src is the original image and the reference image,
(x1, y1) is a point in a structural element;
the expansion in step 3.4 is performed for N times by using the following formula:
dst(x,y)=dilate(src(x,y))=max(x1,y1)src(x+x1,y+y1)
wherein, dst is an objective function,
(x, y) are points in the original image,
the dilate refers to the dilation operation performed on the original image,
src is the original image and the reference image,
(x1, y1) is a point in a structural element.
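The erosion and dilation above, and the N-fold opening of step 3.4, can be sketched as follows. The structuring element is given as a list of offsets; ignoring offsets that fall outside the image is a border-handling assumption the patent does not state.

```python
def erode(src, selem):
    """dst(x,y) = min of src over the structuring-element offsets (in-bounds only)."""
    h, w = len(src), len(src[0])
    return [[min(src[y + dy][x + dx] for dx, dy in selem
                 if 0 <= x + dx < w and 0 <= y + dy < h)
             for x in range(w)] for y in range(h)]

def dilate(src, selem):
    """dst(x,y) = max of src over the structuring-element offsets (in-bounds only)."""
    h, w = len(src), len(src[0])
    return [[max(src[y + dy][x + dx] for dx, dy in selem
                 if 0 <= x + dx < w and 0 <= y + dy < h)
             for x in range(w)] for y in range(h)]

def opening(src, selem, n=1):
    """Open n times as in step 3.4: n erosions, then n dilations."""
    for _ in range(n):
        src = erode(src, selem)
    for _ in range(n):
        src = dilate(src, selem)
    return src
```

Opening removes isolated bright specks smaller than the structuring element, which is why step 3.4 applies it before measuring the crack.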
Step 3.6, apply wavelet decomposition to the local crack image after the closing operation to obtain the high- and low-frequency tongue image component maps;
in another embodiment, further, the wavelet decomposition method of step 3.6 is: performing i-layer wavelet decomposition on the crack image;
an image signal is a square image and is divided into a left area and a right area which are the same, wherein the left area is L, and the right area is H; l is a low frequency component, H is a high frequency component;
performing one-level wavelet decomposition on the image signal, and decomposing a left region and a right region into an upper region and a lower region respectively; wherein, the upper left area is LL1, and the lower left area is LH 1; the upper right region is HL1, and the lower right region is HH 1;
performing two-level wavelet decomposition on the image signal, and decomposing an upper left secondary region LL1 into 4 secondary regions, wherein the upper left secondary region is LL2, and the lower left secondary region is LH 2; the secondary region at the upper right is HL2, and the secondary region at the lower right is HH 2; the rest areas are the same as the first-level decomposition;
by analogy, performing i-level wavelet decomposition on the image signal to obtain a group of wavelet coefficients, wherein the size and the shape of the group of wavelet coefficients are the same as those of the original image; wherein, the upper left area is LLI, and the lower left area is LHi; the upper right region is HLi, and the lower right region is HHi; the remaining regions are the same as the previous level decomposition.
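One level of this decomposition can be sketched with the Haar basis on an even-sided image; the choice of Haar is an assumption made purely for illustration, since the patent does not name the wavelet family.

```python
def haar_decompose(img):
    """One Haar level: return the quarter-size LL, LH, HL, HH sub-bands."""
    h, w = len(img), len(img[0])
    ll = [[0.0] * (w // 2) for _ in range(h // 2)]
    lh = [[0.0] * (w // 2) for _ in range(h // 2)]
    hl = [[0.0] * (w // 2) for _ in range(h // 2)]
    hh = [[0.0] * (w // 2) for _ in range(h // 2)]
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            ll[y // 2][x // 2] = (a + b + c + d) / 4  # low/low: local average
            lh[y // 2][x // 2] = (a + b - c - d) / 4  # vertical detail
            hl[y // 2][x // 2] = (a - b + c - d) / 4  # horizontal detail
            hh[y // 2][x // 2] = (a - b - c + d) / 4  # diagonal detail
    return ll, lh, hl, hh
```

Repeating the transform on the LL band gives the LL2/LH2/HL2/HH2 sub-regions of the two-level decomposition, and so on to level i.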
Step 3.7, fuse the high- and low-frequency tongue image component maps by a wavelet fusion method, using the local (region) variance as the fusion criterion for the low-frequency components, to obtain the processed high- and low-frequency components;
In another embodiment, the wavelet fusion method of step 3.7 uses the following formula:

G(X, p) = Σ_{q∈Q} ω(q)·|c(X, q) − u(X, p)|²

where the function G(X, p) represents the region variance saliency of the low-frequency component coefficient matrix of the tongue crack image X,
Q represents a region of the image,
the point p is the centre of the region Q, and p(m, n) denotes the spatial location of the wavelet coefficients,
c(X) denotes the coefficient matrix of the wavelet low-frequency component of the tongue crack image X,
c(X, q) denotes the value of the element of the low-frequency coefficient matrix at position q,
u(X, p) denotes the mean of the low-frequency coefficient matrix of the tongue crack image X over the region Q,
ω(q) denotes the weight, which is larger the closer q is to p.
Take any two low-frequency sub-images A1 and A2, and denote the region variances of their low-frequency coefficient matrices by G(A1, p) and G(A2, p); the region variance match degree of the low-frequency coefficient matrices of images A1 and A2 at point p is defined by M2(p):

M2(p) = 2·Σ_{q∈Q} ω(q)·(c(A1, q) − u(A1, p))·(c(A2, q) − u(A2, p)) / (G(A1, p) + G(A2, p))

The value of M2(p) varies between 0 and 1; the smaller the value, the lower the matching degree of the low-frequency coefficient matrices of the two images.
Let T2 be the matching-degree threshold. When M2(p) < T2, the fusion strategy keeps the coefficient of the sub-image with the larger region variance:

c(F, p) = c(A1, p) if G(A1, p) ≥ G(A2, p); c(F, p) = c(A2, p) if G(A1, p) < G(A2, p)

When M2(p) ≥ T2, the fusion strategy is a weighted average:

c(F, p) = Wmax·c(A1, p) + Wmin·c(A2, p) if G(A1, p) ≥ G(A2, p); c(F, p) = Wmin·c(A1, p) + Wmax·c(A2, p) if G(A1, p) < G(A2, p)

where

Wmin = 1/2 − (1/2)·((1 − M2(p))/(1 − T2))
Wmax = 1 − Wmin

For the high-frequency part of the wavelet transform, the coefficient with the maximum absolute value is selected, to complement the detail information between the high and low frequencies. Because the noise and defects of the crack target are high-frequency information, a median filter is applied to the fused high-frequency coefficients of the tongue crack image to remove its noise and defects:

d(F, p) = med{ d'(F, q), q ∈ Q }, where d'(F, q) = d(A1, q) if |d(A1, q)| ≥ |d(A2, q)|, otherwise d(A2, q)

where d(X, p) denotes the coefficient matrix of the wavelet high-frequency component at point p.
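The low-frequency fusion rule of step 3.7 can be sketched as follows. Assumptions, since the patent's formula images are unavailable: a 3x3 region Q, uniform weights ω (the text weights coefficients nearer the centre more heavily), the standard region-variance match/weighting rules from the wavelet-fusion literature, and an illustrative threshold default.

```python
def region_stats(coef, y, x):
    """3x3 window values, mean u(X, p) and variance saliency G(X, p) (uniform weights)."""
    window = [coef[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    u = sum(window) / len(window)
    g = sum((c - u) ** 2 for c in window)
    return window, u, g

def fuse_low_freq(a1, a2, y, x, t2=0.75):
    """Fuse one low-frequency coefficient of sub-images a1, a2 at (y, x)."""
    w1, u1, g1 = region_stats(a1, y, x)
    w2, u2, g2 = region_stats(a2, y, x)
    # region variance match degree M2(p)
    m2 = (2 * sum((c1 - u1) * (c2 - u2) for c1, c2 in zip(w1, w2)) / (g1 + g2)
          if g1 + g2 else 1.0)
    if m2 < t2:                                   # low match: keep the coefficient
        return a1[y][x] if g1 >= g2 else a2[y][x]  # of the more salient source
    w_min = 0.5 - 0.5 * (1 - m2) / (1 - t2)       # high match: weighted average
    w_max = 1.0 - w_min
    if g1 >= g2:
        return w_max * a1[y][x] + w_min * a2[y][x]
    return w_min * a1[y][x] + w_max * a2[y][x]
```

When the two regions match perfectly (m2 = 1) the rule degenerates to an equal-weight average; when they disagree, the more textured (salient) source wins outright.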
Step 3.8, apply the inverse wavelet transform to the processed high- and low-frequency components to reconstruct the crack features;
Step 3.9, apply a skeleton extraction algorithm to the reconstructed crack features to obtain the tongue crack skeleton;
in another embodiment, further, the skeleton extraction algorithm of step 3.9 is to erode the target pixels meeting the characteristic conditions in each iteration step, so that the target becomes thinner and thinner; continuously iterating until no new pixel point is corroded in the current round of operation of the target after the last corrosion, and ending the algorithm; the four conditions of the algorithm are:
(a)2≤B(P1)≤6
wherein, B (P)1) Representing the central pixel P1The number of non-zero neighbourhoods of the point, i.e. the central pixel P1The number of surrounding target pixels is between 2 and 6;
(b)A(P1)=1
wherein, A (P)1) As a central pixel P1In 8 area pixels around the point, two adjacent pixels appear 0->1 times;
(c)P2*P4*P 60 or P2*P4*P8=0
(d)P4*P6*P 80 or P2*P6*P8=0。
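The paired alternatives in conditions (c) and (d) correspond to the two sub-iterations of the Zhang-Suen thinning algorithm, which is the reading assumed in this sketch (binary image as nested 0/1 lists; P2..P9 read clockwise from the pixel above P1).

```python
def neighbours(img, y, x):
    """P2..P9: the 8 neighbours of P1, clockwise from the pixel above."""
    return [img[y - 1][x], img[y - 1][x + 1], img[y][x + 1], img[y + 1][x + 1],
            img[y + 1][x], img[y + 1][x - 1], img[y][x - 1], img[y - 1][x - 1]]

def skeletonize(img):
    """Zhang-Suen-style thinning; iterate until no pixel is eroded in a round."""
    img = [row[:] for row in img]
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for y in range(1, len(img) - 1):
                for x in range(1, len(img[0]) - 1):
                    if img[y][x] != 1:
                        continue
                    p = neighbours(img, y, x)
                    b = sum(p)                                 # condition (a)
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1  # condition (b):
                            for i in range(8))                 # 0->1 transitions
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:                              # conditions (c)/(d),
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:                                      # split over the two
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((y, x))
            for y, x in to_delete:                             # delete after scanning
                img[y][x] = 0
                changed = True
    return img
```

A 2-pixel-thick bar thins to a 1-pixel centre line, while an isolated pixel (B(P1) = 0, failing condition (a)) is preserved.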
Step 3.10, calculate the length of the tongue crack.
In another embodiment, further, the specific method of step 3.10 is to calculate the length of the tongue crack by a progressive method; the formula for the tongue crack length is:

L_i = sqrt((x_i − x_{i−1})² + (y_i − y_{i−1})²) · sqrt(S_b/m)

L_n = Σ L_i

where L_i is the distance between two adjacent pixel points (x_i, y_i) and (x_{i−1}, y_{i−1}),
S_b is the area of the reference scale,
m is the number of pixel points occupied by the reference scale,
S_b/m is the actual area corresponding to one pixel point,
sqrt(S_b/m) is the actual length corresponding to one pixel point,
L_n is the length of the upper or lower boundary of the crack computed by the progressive method; the larger of the upper- and lower-boundary lengths is taken as the length of the tongue crack.
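The progressive length computation can be sketched as follows; the boundary is assumed to be given as an ordered list of pixel coordinates, and the function name is illustrative.

```python
import math

def crack_length(points, s_b, m):
    """Sum adjacent-pixel distances, scaled by the real length of one pixel side.

    s_b: area of the reference scale; m: pixels covered by the reference scale,
    so sqrt(s_b / m) is the actual length of one pixel (as defined in the text).
    """
    pixel_len = math.sqrt(s_b / m)
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        total += math.hypot(x1 - x0, y1 - y0) * pixel_len
    return total
```

Running this over the upper and lower crack boundaries and taking the maximum of the two totals gives the final tongue crack length of step 3.10.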
The invention also provides a tongue crack feature identification and length measurement system based on computer vision, whose technical solution is that the system comprises:
a tongue crack detection module; configured to detect the tongue cracks, perform preliminary feature extraction on the cracks in the input tongue picture, obtain and mark the bounding boxes of the local cracks, and obtain an unmarked local crack image; and
a tongue crack length calculation module; configured to use the unmarked local crack image, sequentially apply threshold segmentation, dilation-erosion, wavelet transform and skeleton extraction algorithms to process it, and then calculate the length of the crack.
In another embodiment, further, the tongue crack detection module includes:
a tongue picture acquisition module; configured to acquire no fewer than N thousand original tongue pictures through a tongue picture acquisition device, mark the bounding-box information of the cracks (the original cracks) on the original tongue pictures by manual annotation, save the annotated tongue picture files containing the original crack bounding-box information, randomly divide the annotated tongue pictures into a crack detection training set and a crack detection verification set, and additionally acquire at least M thousand unmarked original tongue pictures, where M is greater than N;
a crack detection training model building module; configured to build a deep-learning-based crack detection training model;
a crack detection model training module; configured to perform model training on the crack detection training model to obtain a trained crack detection model;
the crack detection model training module comprises:
a detection model training module; configured to train the deep-learning-based crack detection training model with the crack detection training set, and verify it with the crack detection verification set to obtain a detection model;
a crack detection training set correction module; configured to use the detection model to predict each of the M thousand unmarked original tongue pictures, sort the predicted tongue crack bounding boxes by probability value, keep the crack bounding boxes with probability values larger than 0.9 ± 0.05 and delete those with smaller probability values to obtain a prediction file, and merge the prediction file with the crack detection training set to obtain a new crack detection training set;
a detection model correction module; configured to train the detection model again with the new crack detection training set, and verify with the crack detection verification set to obtain a trained crack detection model;
a local crack bounding box detection module; configured to detect local crack bounding boxes with the trained crack detection model;
a tongue picture cropping module; configured to crop the M thousand unmarked original tongue pictures according to the local crack bounding boxes, and cut out the local crack images to obtain the unmarked local crack images.
In another embodiment, further, the tongue crack length calculation module includes:
a graying processing module; configured to convert the unmarked local crack image to grayscale to obtain a grayscale tongue crack image;
a median filtering module; configured to apply a median filter to the grayscale tongue crack image to obtain a smoothed local crack image;
a threshold segmentation module; configured to segment the smoothed local crack image by adaptive threshold segmentation, computing a local threshold from the brightness distribution of different regions of the image and segmenting each local region with its threshold to obtain a binarized local tongue picture;
a dilation-erosion module; configured to perform the opening operation on the binarized local tongue picture N times, i.e. N erosions followed by N dilations, and then perform the closing operation on the opened local crack image N times, i.e. N dilations followed by N erosions;
a wavelet decomposition module; configured to apply wavelet decomposition to the local crack image after the closing operation to obtain the high- and low-frequency tongue image component maps;
a wavelet fusion module; configured to fuse the high- and low-frequency tongue image component maps by a wavelet fusion method, using the local variance as the criterion for the low-frequency components, to obtain the processed high- and low-frequency components;
a crack shape structure reconstruction module; configured to apply the inverse wavelet transform to the processed high- and low-frequency components to reconstruct the crack features;
a skeleton extraction module; configured to apply a skeleton extraction algorithm to the reconstructed crack features to obtain the tongue crack skeleton; and
a tongue crack length calculation module; configured to calculate the length of the tongue crack by the progressive method.
The invention can achieve the following technical effects:
The method applies the traditional image processing technology and the deep learning technology together, weighing the strengths and weaknesses of each, and can improve the accuracy of tongue crack feature extraction and recognition.
The invention organically integrates deep learning with the traditional image processing method, improves the accuracy of tongue crack feature extraction and identification, realizes objective evaluation of tongue cracks, and fills the gap left by the prior art, which only extracts crack features without quantitative objective evaluation of tongue cracks.
The invention realizes objective evaluation of tongue cracks using traditional image processing and deep learning based on computer vision. Compared with a traditional single tongue crack identification algorithm (i.e. one using only traditional image processing or only deep learning), the method can identify whether a tongue picture contains cracks and identify the morphological type of the cracks.
The method improves the accuracy of crack feature extraction and identification, quantifies the length of the crack, overcomes the incomplete segmentation and over-segmentation of some defects in traditional segmentation methods, detects tongue crack images correctly, has a degree of robustness to interference, and avoids information loss.
Drawings
It is to be understood by those skilled in the art that the following description is only exemplary of the principles of the present invention, which may be applied in numerous ways to achieve many different alternative embodiments. These descriptions are made for the purpose of illustrating the general principles of the present teachings and are not meant to limit the inventive concepts disclosed herein.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the general description given above and the detailed description of the drawings given below, serve to explain the principles of the invention.
The invention is described in further detail below with reference to the following figures and detailed description:
FIG. 1 is a schematic flow chart of the tongue crack analysis method based on computer vision according to the present invention;
FIG. 2 is a schematic flow chart of the tongue crack length measurement method based on computer vision according to the present invention;
FIG. 3 is a schematic diagram of an efficientdet structure employed in the present invention;
FIG. 4 is a schematic diagram of an efficientnet architecture employed by the present invention;
FIGS. 5a to 5c are schematic diagrams of wavelet decompositions employed in the present invention;
FIG. 6 is a schematic diagram of a skeleton extraction algorithm employed by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the drawings of the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the invention without any inventive step, are within the scope of protection of the invention. Unless defined otherwise, technical or scientific terms used herein shall have the ordinary meaning as understood by one of ordinary skill in the art to which this invention belongs. As used herein, the terms "first," "second," and the like, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The word "comprising" and similar words are intended to mean that the elements or items listed before the word cover the elements or items listed after the word and their equivalents, without excluding other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
As shown in fig. 1, the tongue crack analysis method based on computer vision of the present invention comprises the following steps:
step 1, detecting and marking tongue cracks;
detecting tongue cracks, performing primary feature extraction on the cracks in an input tongue image, obtaining and marking the bounding boxes of local cracks, and providing a basis for the subsequent shape recognition and crack length measurement of the tongue cracks;
specifically, the method comprises the following steps:
step 1.1, collecting no fewer than 50,000 original tongue images with tongue image acquisition equipment, and manually annotating the bounding-box information of the cracks (the original cracks) on the original tongue images; storing the annotated tongue image files containing the original crack bounding-box information; randomly dividing the annotated tongue images into a crack detection training set and a crack detection validation set at a ratio of 8:2;
in addition, at least 100,000 unannotated original tongue images are collected; a further 10,000 original tongue images are collected as a test set and manually annotated;
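The random 8:2 split described above can be sketched with a hypothetical helper (the function name and seed are illustrative, not part of the disclosure):

```python
import random

def split_dataset(items, train_ratio=0.8, seed=42):
    """Randomly split annotated samples into training and validation
    sets at the given ratio (8:2 by default, as in step 1.1).

    `split_dataset` is an illustrative name; the seed merely makes the
    split reproducible.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]
```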
step 1.2, building a deep-learning-based crack detection training model; the model adopts the efficientdet structure shown in FIG. 3, and uses bifpn to build a multi-scale feature pyramid, improving detection accuracy for small and complex cracks;
the crack detection training model adopts the prior-art efficientdet structure; in the prior art, deep-learning-based object detection models generally fall into two categories, one-stage and two-stage detection models, and the efficientdet structure adopted by the invention belongs to the one-stage category; a one-stage detection model consists of three modules: a basic module (backbone), a multi-scale feature extraction module (bifpn) and a detection module (prediction net); the detection module comprises two prediction sub-modules, a target classification prediction module (class prediction net) and a target bounding-box prediction module (box prediction net), both implemented with convolution operations (Conv); of course, other deep-learning-based object detection models may also be employed with the present invention.
Step 1.3, performing model training on the crack detection training model alternately by adopting a full-supervision training mode and a semi-supervision training mode to obtain a trained crack detection model;
Fully supervised training:
training a crack detection training model based on deep learning by using the crack detection training set obtained in the step 1.1, and verifying by using the crack detection verification set obtained in the step 1.1 to obtain a detection model;
in order to complete the training of the crack detection model more accurately and more quickly, CIoU loss is adopted as the regression loss function of the crack bounding box during training, improving the accuracy of the bounding boxes produced by the detection model;
the CIoU crack bounding-box regression loss function is:
L_CIoU = 1 − IoU + ρ²(b, b^gt) / c² + α·ν
where ρ (.) represents the Euclidean distance,
b represents the center point of the prediction box B,
bgtrepresenting the real box BgtIs measured at a central point of the beam,
c represents a prediction box B and a real box BgtIs the smallest diagonal distance of the bounding rectangle,
α is a parameter balancing the loss terms, specifically:
α = ν / ((1 − IoU) + ν)
where IoU denotes the degree of overlap between any two crack bounding boxes B1 and B2:
IoU = |B1 ∩ B2| / |B1 ∪ B2|
ν is a parameter measuring the consistency of the aspect ratios, specifically:
ν = (4 / π²) · (arctan(w^gt / h^gt) − arctan(w / h))²
where w^gt denotes the width of the ground-truth crack bounding box,
h^gt denotes the height of the ground-truth crack bounding box,
w denotes the width of the predicted crack bounding box,
h denotes the height of the predicted crack bounding box;
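The CIoU loss above can be assembled from its terms in a minimal Python sketch (not part of the original disclosure; a small epsilon is added to the denominator of α to avoid division by zero when the boxes coincide):

```python
import math

def ciou_loss(box_p, box_g):
    """CIoU bounding-box regression loss for axis-aligned boxes given as
    (x1, y1, x2, y2): 1 - IoU + normalised centre distance + aspect-ratio
    consistency term, following the formulas in the text."""
    # IoU from intersection and union areas
    ix1, iy1 = max(box_p[0], box_g[0]), max(box_p[1], box_g[1])
    ix2, iy2 = min(box_p[2], box_g[2]), min(box_p[3], box_g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    iou = inter / (area_p + area_g - inter)

    # rho^2: squared Euclidean distance between box centres
    cxp, cyp = (box_p[0] + box_p[2]) / 2, (box_p[1] + box_p[3]) / 2
    cxg, cyg = (box_g[0] + box_g[2]) / 2, (box_g[1] + box_g[3]) / 2
    rho2 = (cxp - cxg) ** 2 + (cyp - cyg) ** 2

    # c^2: squared diagonal of the smallest enclosing rectangle
    ex1, ey1 = min(box_p[0], box_g[0]), min(box_p[1], box_g[1])
    ex2, ey2 = max(box_p[2], box_g[2]), max(box_p[3], box_g[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2

    # nu (aspect-ratio consistency) and alpha (trade-off parameter)
    wp, hp = box_p[2] - box_p[0], box_p[3] - box_p[1]
    wg, hg = box_g[2] - box_g[0], box_g[3] - box_g[1]
    nu = (4 / math.pi ** 2) * (math.atan(wg / hg) - math.atan(wp / hp)) ** 2
    alpha = nu / ((1 - iou) + nu + 1e-9)  # epsilon guards identical boxes

    return 1 - iou + rho2 / c2 + alpha * nu
```

Identical boxes give zero loss; disjoint boxes are penalised by the centre-distance term even though their IoU is zero, which is the property motivating CIoU over a plain IoU loss.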
an online strong data enhancement mode is used in the training process, so that data enhancement is carried out on training set data;
the online strong data enhancement may include random scaling and cropping, rotation, horizontal mirror flipping, Gaussian noise, median filtering, brightness change, contrast transformation, RGB channel swapping, mosaic, and the like, improving the generalization ability of the model and enhancing tongue crack identification;
semi-supervised training:
predicting each of the 100,000 unannotated original tongue images with the first-version detection model trained in the fully supervised mode to obtain the positions of predicted tongue crack bounding boxes; sorting the predicted bounding boxes by probability value, keeping the boxes with probability greater than 0.9 and deleting those with probability less than 0.9 to obtain a prediction file; and merging the prediction file with the crack detection training set obtained in step 1.1 to obtain a new crack detection training set;
repeating the fully supervised training step, training the first-version detection model again with the new crack detection training set until the fully supervised training process finishes, to obtain the trained crack detection model;
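The pseudo-label filtering of the semi-supervised step (keep only predicted boxes with probability above 0.9) can be sketched as follows; `filter_pseudo_labels` and its data layout are hypothetical names for illustration:

```python
def filter_pseudo_labels(predictions, threshold=0.9):
    """Keep only high-confidence predicted crack boxes as pseudo-labels.

    `predictions` maps an image id to a list of (probability, box) pairs;
    the 0.9 cut-off follows the text. Boxes are sorted by descending
    probability and low-confidence boxes are dropped.
    """
    pseudo = {}
    for image_id, boxes in predictions.items():
        kept = sorted((b for b in boxes if b[0] > threshold), reverse=True)
        if kept:  # images with no confident box contribute nothing
            pseudo[image_id] = kept
    return pseudo
```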
step 1.4, detecting a boundary frame of the local crack;
predicting each of the 100,000 unannotated original tongue images with the trained crack detection model, sorting the predicted tongue crack bounding boxes by probability value, keeping the boxes with probability greater than 0.9 and deleting those with probability less than 0.9, to obtain the bounding boxes of local cracks; the bounding box of a local crack is the position information of a crack detected by the trained crack detection model;
step 1.5, cropping the 100,000 unannotated original tongue images according to the local crack bounding boxes detected in step 1.4, cutting out the local crack regions to obtain unannotated local crack images (i.e., local crack images without crack shape category annotations);
step 2, identifying and marking the shape of the tongue crack;
step 2.1, manually annotating the unannotated local crack images with the shape category information of the cracks (one or more of longitudinal, transverse, yao-shaped, radial, gyrus-shaped and cobblestone-shaped), and randomly dividing the annotated local crack images into a shape recognition training set and a shape recognition validation set at a ratio of 8:2;
step 2.2, building a crack shape classification training model based on deep learning;
the crack shape classification training model adopts the efficientnet structure shown in FIG. 4; the efficientnet structure is prior art and is not described again here;
step 2.3, performing model training on the crack shape classification training model alternately by adopting a full-supervision training mode and a semi-supervision training mode to obtain a trained crack shape classification model;
Fully supervised training:
training a crack shape classification training model based on deep learning by using the shape recognition training set obtained in the step 2.1, and verifying by using the shape recognition verification set obtained in the step 2.1 to obtain a classification model; an online strong data enhancement mode is used in the training process, so that data enhancement is carried out on training set data;
the online strong data enhancement may include autoaugment, random scaling and cropping, rotation, horizontal mirror flipping, vertical mirror flipping, image attribute changes, cutmix, and the like, improving the generalization ability of the model and enhancing tongue crack identification;
because the crack shape data suffer from class imbalance, the crack shape classification loss function can be modified to use focal loss, increasing the loss weight of minority classes and decreasing that of majority classes, thereby improving crack shape classification accuracy; the crack shape classification loss function FL uses the following formula:
FL(p_t) = −α_t · (1 − p_t)^γ · log(p_t)
where FL is the crack shape classification loss function,
p_t is the predicted probability value,
α_t is a default parameter, taken as α_t = 0.25,
γ is a default parameter, taken as γ = 2,
according to the method, data enhancement is performed on training set data in the training process, and the generalization capability and crack shape recognition capability of the model can be improved.
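The focal loss formula above can be sketched per sample (a minimal illustration using the default parameters listed in the text):

```python
import math

def focal_loss(p_t, alpha_t=0.25, gamma=2.0):
    """Focal loss for one sample: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).

    p_t is the model's predicted probability for the true class; the
    (1 - p_t)^gamma factor down-weights well-classified samples so that
    hard (minority-class) samples dominate the loss.
    """
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)
```

With γ = 2, an example predicted at p_t = 0.1 contributes several orders of magnitude more loss than one predicted at p_t = 0.9, which is the re-weighting effect the text relies on for imbalanced crack shape classes.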
Semi-supervised training:
predicting the local crack images cropped in step 1.5 with the trained first-version classification model (i.e., the classification model obtained from the fully supervised training step), sorting all predicted local crack images by probability value and keeping those with probability greater than 0.9 to obtain a prediction file; merging the prediction file with the shape recognition training set obtained in step 2.1 to obtain a new shape recognition training set;
repeating the fully supervised training step, training the first-version classification model again with the new shape recognition training set until the fully supervised training process finishes, to obtain the trained crack shape classification model;
and 2.4, predicting the unmarked local crack image obtained in the step 1.5 by using the trained crack shape classification model to obtain the final crack shape category.
In order to verify the accuracy of the tongue crack shape analysis, the trained crack shape classification model is used to predict every image in the test set, and the resulting crack shape categories are compared with the manual annotations; the results show a crack shape classification accuracy of 97.15%, indicating that the method identifies small and complex cracks well.
According to the invention, the unmarked local crack image and the crack category information are utilized in the training process of the crack shape classification training model based on deep learning, and the unmarked local crack image contains the cut crack image information, so that the trained crack shape classification model can more accurately identify the shape of the tongue crack.
Step 3, measuring the length and the grading of the tongue cracks;
using the unannotated local crack images obtained in step 1.5, more refined crack features are extracted by threshold segmentation, dilation and erosion, wavelet transform, and a skeleton extraction algorithm, so that an accurate crack length can be obtained in the length measurement process; the crack length is then calculated with a progressive method;
the method comprises the following specific steps:
step 3.1, carrying out gray processing on the unmarked local crack image obtained in the step 1.5;
step 3.2, applying median filtering to the grayed tongue crack image to smooth noise interference in the image;
specifically, a two-dimensional sliding template is used on the grayed tongue crack image; the pixels within the template are sorted by pixel value into a monotonically increasing (or decreasing) data sequence, and the middle pixel value is taken as the output value for the template region; the median filter output is:
g(x,y)=med{f(x-k,y-l),(k,l∈w)}
where g(x, y) is the processed image,
f(x, y) is the original image,
(k, l) ranges over the neighbourhood w of pixel (x, y),
w is the two-dimensional template size, which may be 3 × 3,
med denotes the median operator;
the two-dimensional sliding template adopted in the embodiment is a two-dimensional matrix of 3x3, and slides from left to right and from top to bottom on the tongue picture;
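The sliding-template median filter g(x, y) = med{f(x−k, y−l)} can be sketched directly in NumPy (in practice OpenCV's medianBlur performs the same operation); the reflection padding at the edges is an implementation choice not specified in the text:

```python
import numpy as np

def median_filter(img, w=3):
    """Sliding-window median filter with a w x w template (3x3 default).

    For each pixel, the template region is sorted implicitly by
    np.median and the middle value becomes the output, smoothing
    isolated noise pixels while preserving edges.
    """
    pad = w // 2
    padded = np.pad(img, pad, mode="reflect")  # handle borders
    out = np.empty_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = np.median(padded[y:y + w, x:x + w])
    return out
```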
step 3.3, segmenting the smoothed local crack image with adaptive thresholding: a local threshold is computed from the brightness distribution of each image region, and the local region is segmented with that threshold;
the method for calculating the threshold value adopts a Phansalkar method, the method has better binarization effect on the image with low contrast, and the following formula is adopted;
t = mean · (1 + p · e^(−q·mean) + k · ((std / r) − 1))
where t is the threshold,
mean is the local mean,
std is the local standard deviation,
p = 2,
q = 10,
k = 0.25,
r = 0.5;
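The local threshold rule above can be sketched for a single window, using the listed parameter values and assuming intensities normalised to [0, 1] (the range for which the Phansalkar formula is usually stated):

```python
import numpy as np

def phansalkar_threshold(window, p=2.0, q=10.0, k=0.25, r=0.5):
    """Local Phansalkar threshold for one image window:
    t = mean * (1 + p*exp(-q*mean) + k*((std/r) - 1)).

    `window` is a small array of normalised intensities; in the full
    method this is evaluated per local region of the crack image.
    """
    mean = float(np.mean(window))
    std = float(np.std(window))
    return mean * (1 + p * np.exp(-q * mean) + k * ((std / r) - 1))
```

The exponential term raises the threshold in dark regions, which is why the method binarises low-contrast tongue images better than a global threshold.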
step 3.4, applying the opening operation N times to the binarized local tongue image, i.e., eroding N times and then dilating N times, which removes small isolated pixel regions, separates objects at fine connections, and smooths the boundaries of larger objects without noticeably changing their area; the erosion formula is as follows:
dst(x, y) = erode(src(x, y)) = min_{(x1, y1)} src(x + x1, y + y1)
where dst is the output image,
(x, y) is a point in the original image,
erode denotes the erosion operation on the original image,
src is the original image,
(x1, y1) is a point in the structuring element;
the erosion operation is equivalent to sliding a structuring element S2 over an image region S1: if the overlap of S2 with the image lies entirely within the region S1, the position point is kept; all points satisfying this condition form the result of eroding S1 by S2;
the expansion formula is as follows:
dst(x, y) = dilate(src(x, y)) = max_{(x1, y1)} src(x + x1, y + y1)
where dst is the output image,
(x, y) is a point in the original image,
dilate denotes the dilation operation on the original image,
src is the original image,
(x1, y1) is a point in the structuring element;
the dilation operation is equivalent to convolving a structuring element S2, as a kernel, over an image region S1; the set of all positions where S2 intersects S1 is the dilation result of S1 under S2;
here N = 3, and the structuring element is defined as elliptical;
step 3.5, applying the closing operation N times to the local crack image after opening, i.e., dilating N times and then eroding N times, to fill small black holes in local regions; likewise N = 3, and the structuring element is elliptical;
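The min/max formulas and the opening/closing compositions above can be sketched as follows. For simplicity this illustration uses a square 3×3 structuring element rather than the elliptical one specified in the text (OpenCV's cv2.morphologyEx with cv2.MORPH_ELLIPSE would be the usual practical route):

```python
import numpy as np

def erode(img, k=3):
    """Erosion: dst(x, y) = min of src over the structuring element."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = padded[y:y + k, x:x + k].min()
    return out

def dilate(img, k=3):
    """Dilation: dst(x, y) = max of src over the structuring element."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = padded[y:y + k, x:x + k].max()
    return out

def opening(img, n=3):
    """Opening = n erosions then n dilations (removes small specks)."""
    for _ in range(n):
        img = erode(img)
    for _ in range(n):
        img = dilate(img)
    return img

def closing(img, n=3):
    """Closing = n dilations then n erosions (fills small holes)."""
    for _ in range(n):
        img = dilate(img)
    for _ in range(n):
        img = erode(img)
    return img
```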
step 3.6, obtaining a high-low frequency tongue image component diagram by using wavelet decomposition on the local crack diagram after the closed operation;
performing i levels of wavelet decomposition on the crack image with Daubechies-4 wavelets;
as shown in fig. 5a, the image signal, a square image, is divided into two equal left and right regions, the left region L and the right region H; L is the low-frequency component and H is the high-frequency component;
as shown in fig. 5b, after one-level wavelet decomposition the left and right regions are each decomposed into upper and lower regions: the upper-left region is LL1, the lower-left region LH1, the upper-right region HL1, and the lower-right region HH1;
as shown in fig. 5c, after two-level wavelet decomposition the upper-left region LL1 is decomposed into 4 second-level regions: upper-left LL2, lower-left LH2, upper-right HL2, and lower-right HH2; the remaining regions are the same as in the first-level decomposition;
by analogy, performing i-level wavelet decomposition on the image signal yields a set of wavelet coefficients of the same size and shape as the original image, with upper-left region LLi, lower-left LHi, upper-right HLi, and lower-right HHi; the remaining regions are the same as in the previous level of decomposition;
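One decomposition level of the sub-band layout above can be illustrated as follows. The sketch uses the simple Haar basis rather than the Daubechies-4 wavelets named in the text (for Daubechies-4 a library such as PyWavelets would normally be used), and sub-band naming conventions (LH vs HL) vary between references:

```python
import numpy as np

def haar_decompose(img):
    """One level of 2-D wavelet decomposition with the Haar basis.

    Returns the four sub-bands LL (approximation), LH, HL, HH (details),
    each half the size of the input; normalisation is chosen so that LL
    is the mean of each 2x2 block.
    """
    a = img[0::2, :] + img[1::2, :]       # low-pass along rows
    d = img[0::2, :] - img[1::2, :]       # high-pass along rows
    ll = (a[:, 0::2] + a[:, 1::2]) / 4.0  # low/low: approximation
    lh = (d[:, 0::2] + d[:, 1::2]) / 4.0  # horizontal-edge detail
    hl = (a[:, 0::2] - a[:, 1::2]) / 4.0  # vertical-edge detail
    hh = (d[:, 0::2] - d[:, 1::2]) / 4.0  # diagonal detail
    return ll, lh, hl, hh
```

Applying the same routine recursively to LL yields the LL2/LH2/HL2/HH2 layout of the second level, and so on for i levels.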
step 3.7, fusing the high-low frequency tongue image component image by using a wavelet fusion method, and using local variance as a basis in low frequency;
let c(X) denote the coefficient matrix of the wavelet low-frequency component of tongue crack image X, p = (m, n) the spatial position of a wavelet coefficient, and c(X, p) the element of the low-frequency coefficient matrix with subscript (m, n); taking p as the center, the significance of the regional variance is expressed with a weighted variance over a region Q; u(X, p) denotes the mean of the low-frequency coefficient matrix of tongue crack image X over the region Q centered at point p; the regional variance significance of the low-frequency coefficient matrix of tongue crack image X over the region Q centered at p is expressed with a function G(X, p); then:
G(X,p)=∑P∈Qω(p)|c(X,p)-u(X,p)|2
wherein, Q is a region,
p is the spatial position of the wavelet coefficient with coordinates (m, n)
c (X, p) is the value of the element with subscript (m, n) in the coefficient matrix of the low-frequency component of the wavelet of the tongue crack image X,
u (X, p) is the mean value of the low-frequency coefficient matrix of the tongue crack image X,
ω (p) represents the weight, and the value is larger as it gets closer to p;
processing an image with the two-dimensional wavelet transform decomposes it into a series of low-frequency sub-images, the result depending on the wavelet basis, i.e., on the filter type; taking any two low-frequency sub-images A1 and A2, the regional variances of their low-frequency coefficient matrices are denoted G(A1, p) and G(A2, p); the regional variance match of the low-frequency coefficient matrices of A1 and A2 at point p is then defined by M2(p):
M2(p) = 2 · Σ_{q∈Q} ω(q) · |c(A1, q) − u(A1, p)| · |c(A2, q) − u(A2, p)| / (G(A1, p) + G(A2, p))
the value of M2(p) varies between 0 and 1; the smaller the value, the lower the match between the low-frequency coefficient matrices of the two images;
let T2 be the matching-degree threshold, usually taken between 0.5 and 1; when M2(p) < T2, the selection fusion strategy is:
c(F, p) = c(A1, p), if G(A1, p) ≥ G(A2, p); c(F, p) = c(A2, p), if G(A1, p) < G(A2, p)
when M2(p) ≥ T2, the weighted-average fusion strategy is:
c(F, p) = W_max · c(A1, p) + W_min · c(A2, p), if G(A1, p) ≥ G(A2, p); c(F, p) = W_min · c(A1, p) + W_max · c(A2, p), if G(A1, p) < G(A2, p)
where
W_min = 1/2 − (1/2) · ((1 − M2(p)) / (1 − T2))
W_max = 1 − W_min
in the high-frequency part of the wavelet transform, the wavelet coefficient with the maximum absolute value is selected to complement the detail information between the high and low frequencies; because noise and defects of the crack target are high-frequency information, a median filter is applied to the fused high-frequency coefficients of the tongue crack image to remove its noise and defects:
d(F, p) = med{d(X, q), q ∈ w(p)}
where d(X, p) denotes the coefficient matrix of the wavelet high-frequency components at point p, and w(p) is the median filter window centered at p;
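The variance-based low-frequency selection rule above (where the match measure is low, take the coefficient from whichever image has the larger regional variance) can be sketched as follows. This is an illustrative simplification, not the patent's implementation: uniform weights ω are used and the weighted-average branch for well-matched regions is omitted:

```python
import numpy as np

def region_variance(coeffs, y, x, win=3):
    """G(X, p): variance of the low-frequency coefficients over the
    window Q centered at p, with uniform weights for simplicity."""
    h = win // 2
    q = coeffs[max(0, y - h):y + h + 1, max(0, x - h):x + h + 1]
    return float(np.mean((q - q.mean()) ** 2))

def fuse_low_freq(c1, c2):
    """Point-wise selection fusion of two low-frequency coefficient
    matrices: keep the coefficient from the matrix whose region around
    that point has the larger variance (i.e., more local structure)."""
    fused = np.empty_like(c1)
    for y in range(c1.shape[0]):
        for x in range(c1.shape[1]):
            g1 = region_variance(c1, y, x)
            g2 = region_variance(c2, y, x)
            fused[y, x] = c1[y, x] if g1 >= g2 else c2[y, x]
    return fused
```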
step 3.8, applying the inverse wavelet transform to the processed high- and low-frequency components to reconstruct a clear crack shape structure;
step 3.9, applying the Zhang–Suen skeleton extraction algorithm to the reconstructed crack shape structure to obtain a clear tongue crack skeleton, as shown in FIG. 6;
specifically, each iteration erodes the target pixels meeting the characteristic conditions, making the target progressively thinner; iteration continues until a full pass erodes no new pixels, at which point the algorithm ends; the four conditions of the algorithm are:
(a) 2 ≤ B(P1) ≤ 6
where B(P1) is the number of non-zero neighbours of point P1, i.e., the number of target pixels (value 1 in the binary image) among the 8 pixels surrounding the center pixel P1 is between 2 and 6;
(b) A(P1) = 1
where A(P1) is the number of 0→1 transitions between adjacent pixels when traversing the 8 neighbourhood pixels around P1 in order;
(c) P2·P4·P6 = 0 or P2·P4·P8 = 0
(d) P4·P6·P8 = 0 or P2·P6·P8 = 0
where P2, P4, P6, P8 are neighbourhood pixels of P1;
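A compact sketch of the Zhang–Suen thinning procedure described by conditions (a)–(d). This is an illustrative implementation, not the patent's code; the two products in conditions (c) and (d) are split across the algorithm's two alternating sub-iterations, as in the standard Zhang–Suen formulation:

```python
def neighbours(img, y, x):
    """Clockwise 8-neighbourhood P2..P9 of pixel P1 = (y, x)."""
    return [img[y-1][x], img[y-1][x+1], img[y][x+1], img[y+1][x+1],
            img[y+1][x], img[y+1][x-1], img[y][x-1], img[y-1][x-1]]

def zhang_suen_thin(img):
    """Iteratively delete boundary pixels satisfying conditions (a)-(d),
    alternating the two sub-iterations, until a full pass deletes
    nothing. `img` is a list of lists of 0/1 values, modified in place."""
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for y in range(1, len(img) - 1):
                for x in range(1, len(img[0]) - 1):
                    if img[y][x] != 1:
                        continue
                    p = neighbours(img, y, x)
                    b = sum(p)  # B(P1): number of non-zero neighbours
                    # A(P1): 0 -> 1 transitions in circular order P2..P9
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1
                            for i in range(8))
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((y, x))
            for y, x in to_delete:
                img[y][x] = 0
                changed = True
    return img
```

Deletions are collected per sub-iteration and applied in a batch, so each pass evaluates the conditions on a consistent snapshot of the image, which is what preserves the skeleton's connectivity.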
step 3.10, calculating the length of the tongue crack;
the tongue crack length is calculated with a progressive method; taking the upper boundary of the tongue crack as an example, the boundary is traced once with pixel points from left to right, the length is represented by the distances between adjacent pixel points, and the distances are accumulated up to the rightmost pixel point of the crack; the formula for the tongue crack length is:
L_i = sqrt((x_i − x_{i−1})² + (y_i − y_{i−1})²)
L_n = Σ L_i
where L_i is the distance between two adjacent pixel points (x_i, y_i) and (x_{i−1}, y_{i−1}), with L_1 = 0;
L_n is the length of the upper crack boundary computed by the progressive method; the length of the lower crack boundary is obtained in the same way, and the maximum of the two is taken as the crack length.
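The progressive length computation above can be sketched as (an illustrative helper; the input is the ordered sequence of boundary pixel coordinates):

```python
import math

def crack_length(points):
    """Progressive boundary length: L_1 = 0 and
    L_i = dist(p_{i-1}, p_i) for i > 1, summed to L_n."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        total += math.hypot(x1 - x0, y1 - y0)  # Euclidean step length
    return total
```

Running this once on the upper-boundary pixels and once on the lower-boundary pixels, then taking the maximum, gives the crack length as described.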
In order to represent the characteristic parameters of the crack accurately, a reference scale of known area S_b is placed in view when the crack image is taken; the reference scale is segmented from the image using image processing, and the number m of pixels it occupies is counted; the actual area corresponding to one pixel is then S_b / m, and the actual side length of one pixel is
l = sqrt(S_b / m)
where the reference scale is a planar reference object of known area placed beside the crack before photographing.
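The calibration step can be sketched as (`pixel_scale` is an illustrative helper name, not from the disclosure):

```python
import math

def pixel_scale(reference_area, pixel_count):
    """Physical side length of one pixel, given a planar reference
    object of known area S_b that occupies m pixels in the image:
    l = sqrt(S_b / m)."""
    return math.sqrt(reference_area / pixel_count)
```

Multiplying the pixel-unit crack length L_n by this scale converts it to physical units.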
It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention cover the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.

Claims (10)

1. A tongue crack feature identification and length measurement method based on computer vision is characterized by comprising the following steps:
step 1, detecting and marking tongue cracks;
detecting tongue cracks, performing primary characteristic extraction on cracks in an input tongue picture, obtaining and marking a boundary frame of local cracks, and obtaining a non-marked local crack picture;
step 2, calculating the length of the tongue crack;
and performing image processing on the unannotated local crack image by sequentially applying a threshold segmentation algorithm, a dilation-erosion algorithm, a wavelet transform algorithm and a skeleton extraction algorithm, and then calculating the crack length.
2. The computer vision based tongue crack feature identification and length measurement method according to claim 1, wherein the step 2 specifically comprises the following steps:
step 3.1, performing graying treatment on the unmarked local crack image to obtain a grayed tongue crack image;
step 3.2, applying median filtering to the tongue crack image subjected to the graying treatment to obtain a smoothed local crack image;
3.3, segmenting the smoothed local crack image by utilizing self-adaptive threshold segmentation, calculating a local threshold of the image according to the brightness distribution of different regions of the image, and segmenting the local region by utilizing the threshold to obtain a binarized local tongue image;
step 3.4, performing opening operation on the binarized local tongue picture for N times, namely performing corrosion for N times and then performing expansion for N times;
step 3.5, performing N times of closing operation on the local crack diagram after opening operation, namely performing N times of expansion and then performing N times of corrosion;
step 3.6, obtaining a high-low frequency tongue image component diagram by using wavelet decomposition on the local crack diagram after the closed operation;
step 3.7, fusing the high-low frequency tongue image component image by using a wavelet fusion method, and obtaining processed high-frequency and low-frequency components by using local variance as a basis in low frequency;
step 3.8, reconstructing crack characteristics by the wavelet inverse transformation on the processed high-frequency and low-frequency components;
step 3.9, using a skeleton extraction algorithm for the reconstructed crack characteristics to obtain a tongue crack skeleton;
and 3.10, calculating the length of the tongue crack.
3. The computer vision based tongue crack feature identification and length measurement method according to claim 1 or 2, wherein the step 1 comprises the following steps:
step 1.1, collecting not less than N thousands of original tongue pictures through a tongue picture collecting device, and marking boundary frame information of cracks on the original tongue pictures in a manual marking mode, wherein the cracks are original cracks; storing the marked tongue picture file marked with the original crack boundary frame information; randomly dividing the marked tongue picture into a crack detection training set and a crack detection verification set; at least M pieces of original tongue pictures without labels are collected; wherein M is greater than N;
step 1.2, building a crack detection training model based on deep learning;
step 1.3, performing model training on the crack detection training model to obtain a trained crack detection model;
the model training of step 1.3 comprises the following steps:
step 1.3.1, training the crack detection training model based on deep learning set up in the step 1.2 by using the crack detection training set obtained in the step 1.1, and verifying by using the crack detection verification set obtained in the step 1.1 to obtain a detection model;
step 1.3.2, predicting each of the M thousands of unmarked original tongue pictures by using the detection model obtained in the step 1.3.1, sequencing the predicted tongue crack boundary boxes according to probability values, taking the crack boundary boxes with the probability values larger than 0.9 +/-0.05, deleting the boundary boxes with the probability values smaller than 0.9 +/-0.05 to obtain a prediction file, and merging the prediction file with the crack detection training set obtained in the step 1.1 to obtain a new crack detection training set;
step 1.3.3, the new crack detection training set obtained in the step 1.3.2 is used for training the detection model obtained in the step 1.3.1 again, and the crack detection verification set obtained in the step 1.1 is used for verification to obtain a trained crack detection model;
step 1.4, detecting a local crack boundary frame by using the trained crack detection model;
and step 1.5, cutting M thousands of unmarked original tongue pictures according to the local crack boundary frame detected in the step 1.4, and cutting out the local crack pictures to obtain the unmarked local crack pictures.
4. The computer vision based tongue crack feature identification and length measurement method according to claim 2, wherein the specific method of step 3.2 is as follows:
using a two-dimensional sliding template on the tongue crack image subjected to the graying treatment, sequencing pixels in the template according to the size of pixel values, generating a two-dimensional data sequence which monotonically rises or falls, taking an intermediate pixel value as an output value of the template region, and outputting a median filter as follows:
g(x,y)=med{f(x-k,y-l),(k,l∈w)}
wherein g (x, y) is the processed image,
f (x, y) is the original image,
(k, l) is a region pixel of the pixel (x, y),
w is the two-dimensional template size;
med represents the median filter function.
And/or said calculating a local threshold in step 3.3 uses the following equation:
t = mean · (1 + p · e^(−q·mean) + k · ((std / r) − 1))
wherein, t is a threshold value,
mean is the local mean,
std is the local standard deviation,
p=2,
q=10,
k=0.25,
r=0.5。
and/or the etching is performed N times in step 3.4 and/or step 3.5 using the following formula:
dst(x, y) = erode(src(x, y)) = min_{(x1, y1)} src(x + x1, y + y1)
wherein, dst is an objective function,
(x, y) are points in the original image,
erode denotes the erosion operation on the original image,
src is the original image and the reference image,
(x1, y1) is a point in a structural element;
the expansion in step 3.4 is performed for N times by using the following formula:
dst(x, y) = dilate(src(x, y)) = max_{(x1, y1)} src(x + x1, y + y1)
wherein dst is the output image,
(x, y) is a point in the original image,
dilate refers to performing the dilation operation on the original image,
src is the original image,
(x1, y1) is a point in the structuring element.
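Both operators above share the same sliding-minimum/maximum pattern. A compact NumPy sketch, using a square structuring element of side `se` (an assumption, since the claims do not fix the element's shape):

```python
import numpy as np

def _slide(src, op, se=3):
    # dst(x, y) = op over the se x se structuring element of
    # src(x + x1, y + y1); edge replication handles the borders.
    pad = se // 2
    padded = np.pad(src, pad, mode="edge")
    dst = np.empty_like(src)
    for y in range(src.shape[0]):
        for x in range(src.shape[1]):
            dst[y, x] = op(padded[y:y + se, x:x + se])
    return dst

def erode(src, se=3):
    """dst(x, y) = min_(x1, y1) src(x + x1, y + y1)"""
    return _slide(src, np.min, se)

def dilate(src, se=3):
    """dst(x, y) = max_(x1, y1) src(x + x1, y + y1)"""
    return _slide(src, np.max, se)

def open_then_close(img, n=1):
    """Step 3.4 as claimed: n erosions then n dilations (opening),
    followed by n dilations then n erosions (closing)."""
    for _ in range(n):
        img = erode(img)
    for _ in range(n):
        img = dilate(img)
    for _ in range(n):
        img = dilate(img)
    for _ in range(n):
        img = erode(img)
    return img
```

In practice `cv2.morphologyEx` with `MORPH_OPEN`/`MORPH_CLOSE` and an `iterations` argument is the equivalent fast path.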
5. The computer vision based tongue crack feature identification and length measurement method according to claim 2, wherein the wavelet decomposition method of step 3.6 is: performing i-layer wavelet decomposition on the crack image;
the image signal is a square image and is divided into two equal regions, left and right, wherein the left region is L and the right region is H; L is the low-frequency component and H is the high-frequency component;
performing one-level wavelet decomposition on the image signal decomposes the left and right regions each into an upper region and a lower region; wherein the upper-left region is LL1 and the lower-left region is LH1; the upper-right region is HL1 and the lower-right region is HH1;
performing two-level wavelet decomposition on the image signal decomposes the upper-left region LL1 into 4 secondary regions, wherein the upper-left secondary region is LL2 and the lower-left secondary region is LH2; the upper-right secondary region is HL2 and the lower-right secondary region is HH2; the remaining regions are the same as in the first-level decomposition;
by analogy, performing i-level wavelet decomposition on the image signal yields a group of wavelet coefficients whose overall size and shape are the same as those of the original image; wherein the upper-left region is LLi and the lower-left region is LHi; the upper-right region is HLi and the lower-right region is HHi; the remaining regions are the same as in the previous-level decomposition.
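One decomposition level of the scheme above can be illustrated with the Haar wavelet (the claim does not name the wavelet family, so Haar is an assumption; the orthonormal scaling factor is replaced by plain averaging for readability). Each further level would recurse on the LL sub-band:

```python
import numpy as np

def haar_dwt2(img):
    """One level of 2-D Haar wavelet decomposition: returns the LL,
    LH, HL, HH sub-bands, each half the side length of the
    (even-sized) square input."""
    a = img.astype(float)
    # Horizontal transform: low-pass (L) and high-pass (H) halves.
    L = (a[:, 0::2] + a[:, 1::2]) / 2.0
    H = (a[:, 0::2] - a[:, 1::2]) / 2.0
    # Vertical transform on each half gives the four sub-bands.
    LL = (L[0::2, :] + L[1::2, :]) / 2.0  # approximation
    LH = (L[0::2, :] - L[1::2, :]) / 2.0  # horizontal detail
    HL = (H[0::2, :] + H[1::2, :]) / 2.0  # vertical detail
    HH = (H[0::2, :] - H[1::2, :]) / 2.0  # diagonal detail
    return LL, LH, HL, HH
```

PyWavelets (`pywt.wavedec2`) provides the general i-level decomposition with proper normalisation and boundary handling.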
6. The computer vision based tongue crack feature identification and length measurement method according to claim 2, wherein the wavelet fusion method of step 3.7 adopts the following formula:
G(X, p) = Σ_(q∈Q) ω(q)*|c(X, q) - u(X, p)|^2
wherein the function G(X, p) represents the regional variance saliency of the low-frequency component coefficient matrix of the tongue crack image X,
Q represents a region of the image,
point p is the center of the region Q, and p(m, n) represents the spatial location of the wavelet coefficient,
c(X) is the coefficient matrix of the wavelet low-frequency component of the tongue crack image X,
c(X, q) represents the value of the element indexed by (m, n) in the low-frequency component coefficient matrix,
u(X, p) represents the mean value of the low-frequency component coefficient matrix of the tongue crack image X over the region Q,
ω(q) represents the weight, whose value is larger the closer q is to the center p;
taking any two low-frequency sub-images A1 and A2, and denoting the regional variances of the low-frequency coefficient matrices of images A1 and A2 as G(A1, p) and G(A2, p); the regional variance matching degree of the low-frequency coefficient matrices of images A1 and A2 at point p is defined by M2(p):
M2(p) = 2*G(A1, p)*G(A2, p) / (G(A1, p)^2 + G(A2, p)^2)
the value of M2(p) varies between 0 and 1, and the smaller the value, the lower the matching degree of the low-frequency coefficient matrices of the two images;
let T2 be the threshold of the matching degree; when M2(p) < T2, the fusion strategy selects the coefficient with the larger regional variance (F denoting the fused image):
c(F, p) = c(A1, p), if G(A1, p) ≥ G(A2, p); c(F, p) = c(A2, p), otherwise
when M2(p) ≥ T2, the fusion strategy takes a weighted average:
c(F, p) = Wmax*c(A1, p) + Wmin*c(A2, p), if G(A1, p) ≥ G(A2, p); c(F, p) = Wmin*c(A1, p) + Wmax*c(A2, p), otherwise
wherein,
Wmin = 1/2 - 1/2*((1 - M2(p))/(1 - T2))
Wmax = 1 - Wmin
for the high-frequency part of the wavelet transform, the wavelet coefficient with the maximum absolute value is selected so as to complement the detail information between the high and low frequencies; because the noise and defects of the crack target are high-frequency information, a median filter is applied to the high-frequency coefficients of the fused tongue crack image to remove its noise and defects:
d(F, p) = d(A1, p), if |d(A1, p)| ≥ |d(A2, p)|; d(F, p) = d(A2, p), otherwise
wherein d(X, p) represents the coefficient matrix of the wavelet high-frequency component of image X at point p.
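A sketch of the low-frequency fusion rule described above. Because the published formula images are not reproduced in the text, the M2(p) definition and the Wmin/Wmax weights below follow the standard region-variance matching scheme and should be read as a plausible reconstruction; the ω weights are taken as uniform for simplicity (the claim says they grow toward the centre p), and the threshold T2 = 0.75 is an assumption:

```python
import numpy as np

def region_saliency(c, p, radius=1):
    """G(X, p): variance of the low-frequency coefficients in the
    window Q centred at p (uniform weights for simplicity)."""
    m, n = p
    q = c[max(0, m - radius):m + radius + 1,
          max(0, n - radius):n + radius + 1]
    return float(np.mean((q - q.mean()) ** 2))

def fuse_low_freq(c1, c2, T2=0.75, radius=1):
    """Fuse two low-frequency coefficient matrices: select the more
    salient coefficient when the match M2(p) is low, otherwise take a
    Wmin/Wmax weighted average."""
    out = np.empty_like(c1, dtype=float)
    for m in range(c1.shape[0]):
        for n in range(c1.shape[1]):
            g1 = region_saliency(c1, (m, n), radius)
            g2 = region_saliency(c2, (m, n), radius)
            denom = g1 * g1 + g2 * g2
            # denom == 0 means both regions are flat: treat as a perfect match.
            m2 = 2.0 * g1 * g2 / denom if denom > 0 else 1.0
            if m2 < T2:  # low match: keep the more salient coefficient
                out[m, n] = c1[m, n] if g1 >= g2 else c2[m, n]
            else:        # high match: weighted average
                w_min = 0.5 - 0.5 * (1.0 - m2) / (1.0 - T2)
                w_max = 1.0 - w_min
                if g1 >= g2:
                    out[m, n] = w_max * c1[m, n] + w_min * c2[m, n]
                else:
                    out[m, n] = w_min * c1[m, n] + w_max * c2[m, n]
    return out
```

Fusing an image with itself should be the identity, which makes a convenient sanity check.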
7. The computer vision based tongue crack feature identification and length measurement method according to claim 2, wherein the specific method of step 3.10 is as follows: calculating the length of the tongue crack by using a progressive method; the formula for calculating the tongue crack length is:
Li = sqrt((xi - xi-1)^2 + (yi - yi-1)^2) * sqrt(Sb/m)
Ln = ΣLi
wherein Li is the actual distance between two adjacent pixel points (xi, yi) and (xi-1, yi-1),
Sb is the area of the reference scale,
m is the number of pixel points occupied by the reference scale,
Sb/m is the actual area corresponding to one pixel point,
sqrt(Sb/m) is the actual length corresponding to one pixel point;
Ln is the length of the upper boundary or the lower boundary of the crack calculated by the progressive method; the maximum of the upper-boundary and lower-boundary lengths is taken as the length of the tongue crack.
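The progressive length computation is a pixel-wise accumulation scaled by the reference ruler. A sketch, assuming the point list comes from tracing the upper or lower boundary of the extracted skeleton:

```python
import math

def crack_length(points, S_b, m):
    """Progressive length: sum the distances between adjacent boundary
    pixel points (xi, yi), (xi-1, yi-1), each scaled by sqrt(S_b / m),
    the actual side length covered by one pixel on the reference scale."""
    scale = math.sqrt(S_b / m)
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        total += math.hypot(x1 - x0, y1 - y0) * scale
    return total
```

Per the claim, the final tongue crack length would be `max(crack_length(upper, S_b, m), crack_length(lower, S_b, m))` over the two traced boundaries.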
8. A computer vision based tongue crack feature identification and length measurement system, comprising:
a tongue crack detection module, configured to detect tongue cracks: performing preliminary feature extraction on the cracks in the input tongue picture, acquiring and labeling the bounding boxes of the local cracks, and obtaining unlabeled local crack pictures; and
a tongue crack length calculation module, configured to process the unlabeled local crack pictures successively by threshold segmentation, dilation and erosion, wavelet transform and a skeleton extraction algorithm, and then calculate the length of the crack.
9. The computer vision based tongue crack feature identification and length measurement system of claim 8, wherein the tongue crack detection module comprises:
a tongue picture acquisition module, configured to acquire not less than N thousand original tongue pictures through a tongue picture acquisition device, and to label the bounding box information of the cracks, namely the original cracks, on the original tongue pictures by manual labeling; store the labeled tongue picture files carrying the original crack bounding box information; randomly divide the labeled tongue pictures into a crack detection training set and a crack detection verification set; and collect at least M thousand unlabeled original tongue pictures, wherein M is greater than N;
a crack detection training model building module; configured to build a deep learning based crack detection training model;
a crack detection model training module, configured to perform model training on the crack detection training model to obtain a trained crack detection model;
the crack detection model training module comprises:
a detection model training module, configured to train the deep-learning-based crack detection training model with the crack detection training set and verify it with the crack detection verification set to obtain a detection model;
a crack detection training set correction module, configured to use the detection model to predict each of the M thousand unlabeled original tongue pictures, sort the predicted tongue crack bounding boxes by probability value, keep the crack bounding boxes with probability values larger than 0.9 ± 0.05 and delete those with probability values smaller than 0.9 ± 0.05 to obtain a prediction file, and merge the prediction file with the crack detection training set to obtain a new crack detection training set;
a detection model correction module, configured to train the detection model again with the new crack detection training set and verify it with the crack detection verification set to obtain the trained crack detection model;
a local crack bounding box detection module; configured to detect a local flaw bounding box using the trained flaw detection model;
a tongue picture cutting module, configured to crop the M thousand unlabeled original tongue pictures according to the local crack bounding boxes, cutting out the local crack regions to obtain the unlabeled local crack pictures.
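The training-set correction loop in the modules above is a standard self-training filter: keep only high-confidence predicted boxes, then merge them back into the training set. A sketch, assuming predictions arrive as a dict mapping each unlabeled picture to a list of `{"score", "box"}` records (that shape is illustrative, not from the claims):

```python
def filter_pseudo_labels(predictions, threshold=0.9):
    """Keep only predicted crack bounding boxes whose probability value
    exceeds the threshold (0.9 ± 0.05 in the claims); the survivors
    become pseudo-labels merged into the crack detection training set."""
    kept = []
    for image_id, boxes in predictions.items():
        # Sort by probability value, highest first, as the claim describes.
        boxes = sorted(boxes, key=lambda b: b["score"], reverse=True)
        high = [b for b in boxes if b["score"] > threshold]
        if high:  # images with no confident box contribute nothing
            kept.append((image_id, high))
    return kept
```

The returned (image_id, boxes) pairs would be appended to the original labeled training set before retraining the detector.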
10. The computer vision based tongue crack feature identification and length measurement system according to claim 8 or 9, wherein the tongue crack length calculation module comprises:
a graying processing module, configured to perform graying processing on the unlabeled local crack picture to obtain a grayed tongue crack picture;
a median filtering module, configured to apply a median filter to the grayed tongue crack picture to obtain a smoothed local crack picture;
a threshold segmentation module, configured to segment the smoothed local crack picture by adaptive threshold segmentation: a local threshold is calculated according to the brightness distribution of different regions of the picture, and each local region is segmented with its threshold to obtain a binarized local tongue picture;
a dilation-erosion module, configured to perform the opening operation on the binarized local tongue picture N times, namely N erosions followed by N dilations, and then perform the closing operation on the opened local crack picture N times, namely N dilations followed by N erosions;
a wavelet decomposition module, configured to decompose the closed local crack picture by wavelets to obtain high-frequency and low-frequency tongue picture component maps;
a wavelet fusion module, configured to fuse the high-frequency and low-frequency tongue picture component maps by the wavelet fusion method, the fused low-frequency components being obtained on the basis of local variance;
a crack shape structure reconstruction module, configured to apply the inverse wavelet transform to the processed high-frequency and low-frequency components to reconstruct the crack features;
a skeleton extraction module, configured to apply a skeleton extraction algorithm to the reconstructed crack features to obtain the tongue crack skeleton; and
a tongue crack length calculation module, configured to calculate the length of the tongue crack by the progressive method.
CN202110512076.2A 2021-05-11 2021-05-11 Tongue crack feature identification and length measurement method and system Withdrawn CN113239790A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110512076.2A CN113239790A (en) 2021-05-11 2021-05-11 Tongue crack feature identification and length measurement method and system

Publications (1)

Publication Number Publication Date
CN113239790A true CN113239790A (en) 2021-08-10

Family

ID=77133424


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7539073B1 (en) 2024-06-07 2024-08-23 一般社団法人健康マイスター協会 Trained model generation device, information processing device, trained model generation method, information processing method, trained model generation program, and information processing program

Citations (6)

Publication number Priority date Publication date Assignee Title
CN1759804A (en) * 2005-11-02 2006-04-19 浙江大学 Intelligent analyzing and differentiating method of herbalist doctor through integrated references form four parts of diagnosis
JP2014228357A (en) * 2013-05-21 2014-12-08 大成建設株式会社 Crack detecting method
CN106725310A (en) * 2016-11-29 2017-05-31 深圳市易特科信息技术有限公司 Evolution of Tongue Inspection of TCM image processing system and method
CN109919929A (en) * 2019-03-06 2019-06-21 电子科技大学 A kind of fissuring of tongue feature extracting method based on wavelet transformation
CN111256594A (en) * 2020-01-18 2020-06-09 中国人民解放军国防科技大学 Method for measuring physical characteristics of surface state of aircraft skin
CN112581433A (en) * 2020-12-07 2021-03-30 上海大学 Geometric information extraction method for thermal barrier coating cracks


Non-Patent Citations (1)

Title
DU CHUNHUI: "Machine Learning Models of Tongue Quality Features in Traditional Chinese Medicine", China Master's Theses Full-text Database, Medicine and Health Sciences, no. 7 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210810