CN110729045A - Tongue image segmentation method based on context-aware residual error network - Google Patents

Tongue image segmentation method based on context-aware residual error network

Info

Publication number
CN110729045A
Authority
CN
China
Prior art keywords
tongue
network
candidate
aware
residual error
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910969290.3A
Other languages
Chinese (zh)
Inventor
Zuoyong Li (李佐勇)
Haoyi Fan (樊好义)
Changen Zhou (周常恩)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Minjiang University
Original Assignee
Minjiang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Minjiang University filed Critical Minjiang University
Priority to CN201910969290.3A priority Critical patent/CN110729045A/en
Publication of CN110729045A publication Critical patent/CN110729045A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, by performing operations on regions, e.g. growing, shrinking or watersheds

Abstract

The invention relates to a tongue image segmentation method based on a context-aware residual error network. The method uses a deep neural network to automatically extract image features; determines the candidate region containing the tongue body with a region candidate network, based on the extracted feature maps; and finally obtains the tongue segmentation result by segmenting the candidate region. The invention can effectively improve the accuracy of tongue image segmentation.

Description

Tongue image segmentation method based on context-aware residual error network
Technical Field
The invention relates to the technical field of image processing, in particular to a tongue image segmentation method based on a context-aware residual error network.
Background
Tongue diagnosis is one of the main components of inspection in traditional Chinese medicine (TCM) and one of its characteristic traditional diagnostic methods. The tongue appearance is one of the most sensitive indicators of the physiological state and pathological changes of the human body, and has important application value in TCM diagnosis and treatment. Applying image processing technology to establish an objective, quantitative method for identifying tongue inspection information enables the automation of TCM tongue diagnosis, which is of great practical significance for the modernization of TCM. In an automatic tongue diagnosis system, after a patient's tongue image is acquired by a digital acquisition instrument (an industrial camera, an ordinary camera, etc.), the target region (the tongue body) must be segmented automatically before tongue features can be extracted and diagnosed. Tongue image segmentation is therefore the key link connecting tongue image acquisition and tongue diagnosis, and its quality directly affects the accuracy of the subsequent diagnosis.
The difficulties of tongue image segmentation are: (1) the color of the tongue body is very close to that of the face, particularly the lips, so the two are easily confused; (2) the tongue is a soft body without a fixed shape, and its shape varies greatly between individuals; (3) the tongue surface is not smooth, the tongue coating and tongue texture vary from person to person, and pathological features differ widely; (4) cracks and color patches on the tongue coating may hinder accurate segmentation of the tongue.
In view of these difficulties and challenges, a single conventional image segmentation technique rarely yields satisfactory results, so fusions of multiple conventional segmentation techniques have been studied. Within this fusion framework, the mainstream approach is based on the Active Contour Model (ACM). The ACM, also called the Snake model, is a popular deformable shape model widely applied to contour extraction. Given an initial contour curve, the active contour model evolves it toward the true target contour under the combined action of internal and external forces. ACM-based segmentation methods mainly focus on initial contour acquisition and curve evolution. However, the segmentation quality of conventional tongue image segmentation methods still leaves room for improvement.
Recently, methods based on deep convolutional neural networks (CNNs) have achieved significant success in computer vision and image processing. In medical image segmentation, CNN-based methods are also widely used thanks to their powerful feature learning and representation capabilities. Among these methods, the fully convolutional network (FCN) shows good performance in biological cell and organ segmentation. U-Net, developed from the FCN, extends the symmetric auto-encoder design with skip connections between encoder and decoder, combining high-resolution features from the encoding path with the upsampled output to better localize targets in the image. U-Net has been used to identify and segment the cardiac regions of Drosophila at different developmental stages. Convolutional neural networks have also been used to build a focus-stack-based method for automatically detecting Plasmodium falciparum malaria in blood smears. Deep-learning-based tongue image segmentation has emerged only in the last two years.
Disclosure of Invention
In view of this, the present invention provides a tongue image segmentation method based on a context-aware residual error network, which can effectively improve the accuracy of tongue image segmentation.
The invention is realized by adopting the following scheme: a tongue image segmentation method based on a context-aware residual error network specifically comprises the following steps:
automatically extracting image features by using a deep neural network;
determining a candidate region where the tongue body is located by using a region candidate network based on the extracted feature map;
and finally, obtaining a tongue body segmentation result by segmenting the candidate region.
Further, the automatic extraction of image features by using the deep neural network specifically includes the following steps:
step S11: establishing a context-aware hole (dilated) residual module, whose mapping is:
x_{i+1} = λ_G^i · G_D(x_i; W_G^i) + λ_F^i · F_D(x_i; W_F^i)
where x_i and x_{i+1} respectively denote the input and output of the i-th residual module; D denotes the hole convolution operation; G_D(·) and F_D(·) denote two different nonlinear mapping groups, each consisting of a hole convolution operation, a batch normalization operation and a ReLU activation function; W_G^i and W_F^i respectively denote the parameter sets of the two mappings, i.e., the weights the neural network needs to learn; and λ_G^i and λ_F^i respectively denote the different weights assigned to the two mapping groups;
step S12: using the hole residual module established in step S11, building a feature pyramid network to achieve multi-scale feature extraction from the tongue image and obtain multi-scale feature maps;
the feature pyramid network comprises a bottom-up path module, lateral connection modules and a top-down path module, wherein the bottom-up path module is a feature extraction backbone constructed by connecting five context-aware hole residual modules in series, and the lateral connection modules connect the feature maps of the bottom-up path module to the top-down path module.
Further, the determining of the candidate region where the tongue body is located by using the region candidate network is specifically: the region candidate network extracts candidate targets on the multi-scale feature maps using a sliding window, obtains a 2048-dimensional vector through a standard convolutional layer with a 3×3 kernel, and then, through two branches (candidate box classification and candidate box regression) each formed by a standard convolutional layer with a 1×1 kernel, performs target classification and position localization of the candidate boxes, generating 2k class probabilities and 4k candidate box coordinates, respectively; the class probabilities comprise the tongue and non-tongue probabilities, and the candidate box coordinates comprise the x coordinate, the y coordinate, the box width and the box height.
Further, the obtaining of the tongue segmentation result by segmenting the candidate region is specifically: first, the feature map corresponding to each candidate region is converted into a candidate feature map of fixed size by an RoI alignment module using bilinear interpolation, so that the candidate feature maps are aligned; final tongue localization and segmentation are then achieved through a localization branch network and a segmentation branch network, respectively.
Furthermore, the localization branch network uses two fully connected layers as a regressor to perform position regression and achieve accurate localization; the segmentation branch network uses two standard convolutional layers as a pixel classifier to perform pixel-level classification, i.e., tongue segmentation.
Further, the loss function adopted in the training process of the localization branch network and the segmentation branch network is:
L = L_loc + L_mask
where
L_loc = ∑_{i∈{x,y,w,h}} smooth_L1(t_i − t̂_i)
smooth_L1(z) = 0.5·z² if |z| < 1, and |z| − 0.5 otherwise
where t_i is the manually annotated tongue position and t̂_i is the tongue position predicted by the tongue localization branch network; x, y, w and h respectively denote the abscissa of the upper-right corner of the tongue bounding box, the ordinate of the upper-right corner of the tongue bounding box, the length of the tongue body and the width of the tongue body;
and where
L_mask = ∑_c (1 − TI_c)
where TI_c is the Tversky similarity measure, defined as:
TI_c = (∑_i p_ic · g_ic + ε) / (∑_i p_ic · g_ic + α·∑_i p_ic · g_ic̄ + β·∑_i p_ic̄ · g_ic + ε)
where p_ic is the predicted probability that pixel i belongs to the tongue class, p_ic̄ is the predicted probability that pixel i does not belong to the tongue class, g_ic = 1 indicates that pixel i belongs to the tongue class, and g_ic̄ = 1 indicates that pixel i does not belong to the tongue class; ε is an infinitesimal constant that avoids division by zero; and α and β are two parameters controlling the balance between precision and recall, with α = 0.3 and β = 0.7.
Compared with the prior art, the invention has the following beneficial effects: the method first localizes the tongue region and then performs pixel-level classification within the localized region, achieving the final accurate segmentation while effectively avoiding interference from complex backgrounds. In the feature learning process, in order to extract more representative features, the invention proposes a novel context-aware hole residual module which, combined with a feature pyramid network, achieves effective extraction of multi-level, multi-scale features. The invention can effectively improve the accuracy and robustness of tongue image segmentation.
Drawings
FIG. 1 is a schematic diagram of the method of the embodiment of the present invention.
Fig. 2 shows the original residual block structure in ResNet.
FIG. 3 is a block diagram of a context-aware hole residual module according to an embodiment of the present invention.
FIG. 4 is a feature pyramid network according to an embodiment of the present invention.
FIG. 5 shows sample feature maps of an embodiment of the present invention: the output of the context-aware feature pyramid network.
Fig. 6 is a diagram illustrating a candidate area network according to an embodiment of the present invention.
FIG. 7 is a box plot of the performance of the various methods on the three data sets, where (a) is Precision, (b) is Dice, (c) is mIoU, (d) is FPR, (e) is FNR, and (f) is ME.
FIG. 8 is a quantitative comparison of the segmentation performance of the various algorithms on the three data sets.
Fig. 9 is a comparison of the segmentation results for three randomly selected tongue images on the three data sets, where (a) is data set TestSet1, (b) is TestSet2, and (c) is TestSet3.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
As shown in fig. 1, the present embodiment provides a tongue image segmentation method based on a context-aware residual error network. The overall network framework of the present embodiment is an end-to-end tongue localization and segmentation deep neural network, referred to as TongueNet for short, and the whole pipeline consists of three stages: a Feature Extraction Stage, a Region Proposal Stage and a Prediction Stage. First, in the feature extraction stage, in order to effectively extract the spatial information of the image and the prior information of the tongue body (such as color, shape, tongue coating texture, and the like), the present embodiment proposes a pyramid network module based on hole (dilated) convolution and residual learning, which can effectively realize multi-scale feature extraction from the tongue image. Then, in the region proposal stage, based on the feature maps extracted in the feature extraction stage, the present embodiment uses a Region Proposal Network to achieve an effective coarse localization of the tongue candidate region. Finally, in the prediction stage, based on the tongue candidate region and its feature maps located in the region proposal stage, the joint learning of two different tasks (segmentation and localization) is realized by optimizing the multi-task loss function designed by the invention.
The method specifically comprises the following steps:
automatically extracting image features by using a deep neural network;
determining a candidate region where the tongue body is located by using a region candidate network based on the extracted feature map;
and finally, obtaining a tongue body segmentation result by segmenting the candidate region.
Experiments show that the segmentation precision of the tongue image is remarkably improved by the method.
Preferably, an ideal feature extraction network should be a neural network deep enough to achieve effective extraction of multi-scale features. Inspired by the successful application of ResNet to feature extraction and image classification tasks, the present embodiment proposes a new context-aware hole residual module based on the residual blocks in ResNet, so as to extract more discriminative tongue features. The original residual block is composed of convolutional layers with different kernel sizes, and its mapping is given by:
x_{i+1} = G(x_i; W_G^i) + F(x_i; W_F^i)
where x_i and x_{i+1} respectively denote the input and output of the i-th residual block, and G(·) and F(·) respectively denote two different nonlinear mapping groups, each consisting of a standard convolution operation, a batch normalization operation and a ReLU activation function; W_G^i and W_F^i respectively denote the parameter sets of the two mappings, i.e., the weights the neural network needs to learn. As shown in FIG. 2, a residual block with 3 mapping groups is given, each mapping group consisting of a standard convolutional layer, a batch normalization layer and a ReLU activation layer. The resolution of the feature map output by such a residual block is half that of the original input, which causes a certain loss of spatial information.
In this embodiment, the automatically extracting image features by using the deep neural network specifically includes the following steps:
step S11: unlike the original residual block structure in ResNet, this embodiment proposes a new context-aware hole residual module, whose mapping is:
x_{i+1} = λ_G^i · G_D(x_i; W_G^i) + λ_F^i · F_D(x_i; W_F^i)
where x_i and x_{i+1} respectively denote the input and output of the i-th residual module; D denotes the hole convolution operation; G_D(·) and F_D(·) denote two different nonlinear mapping groups, each consisting of a hole convolution operation, a batch normalization operation and a ReLU activation function; W_G^i and W_F^i respectively denote the parameter sets of the two mappings, i.e., the weights the neural network needs to learn; and λ_G^i and λ_F^i respectively denote the different weights assigned to the two mapping groups. Here, the present embodiment employs a weighted skip connection to realize weighted residual learning. As shown in FIG. 3, a context-aware hole residual module consisting of 3 mapping groups is given, each mapping group consisting of a hole convolutional layer, a batch normalization layer and a ReLU activation layer. In the feature extraction process, the resolution of the feature map output by this residual module is consistent with the input resolution, which avoids the loss of spatial information caused by the original ResNet residual block halving the input resolution. In addition, the weighted residual learning enables a more flexible feature learning process;
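A minimal PyTorch sketch of such a context-aware hole (dilated) residual module is given below, following the mapping reconstructed above. The channel width, kernel size, dilation rate, and the modelling of λ_G and λ_F as learnable scalars are illustrative assumptions, not values fixed by the patent.

```python
import torch
import torch.nn as nn

class ContextAwareDilatedResBlock(nn.Module):
    """Weighted combination of two dilated mapping groups:
    x_{i+1} = lambda_G * G_D(x_i) + lambda_F * F_D(x_i)."""

    def __init__(self, channels: int, dilation: int = 2):
        super().__init__()

        def mapping_group() -> nn.Sequential:
            # one mapping group: hole (dilated) convolution -> batch norm -> ReLU
            return nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3,
                          padding=dilation, dilation=dilation, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )

        self.g_d = mapping_group()  # G_D(.; W_G)
        self.f_d = mapping_group()  # F_D(.; W_F)
        # lambda_G, lambda_F: weights of the weighted skip connection
        # (treating them as free learnable scalars is an assumption)
        self.lambda_g = nn.Parameter(torch.tensor(1.0))
        self.lambda_f = nn.Parameter(torch.tensor(1.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # padding == dilation keeps the output resolution equal to the
        # input resolution, so no spatial information is lost
        return self.lambda_g * self.g_d(x) + self.lambda_f * self.f_d(x)
```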
step S12: using the hole residual module established in step S11, building a feature pyramid network to achieve multi-scale feature extraction from the tongue image and obtain multi-scale feature maps;
As shown in fig. 4, the feature pyramid network includes a bottom-up path module, lateral connection modules and a top-down path module. The bottom-up path module is a feature extraction backbone constructed by connecting five context-aware hole residual modules (DConv1_x, DConv2_x, DConv3_x, DConv4_x and DConv5_x) in series, and the lateral connection modules connect the feature maps of the bottom-up path module to the top-down path module. Finally, a multi-scale feature pyramid is formed, which is used for the coarse localization of the tongue region in the region proposal stage and for the accurate localization and segmentation of the tongue region in the prediction stage. FIG. 5 illustrates the multi-scale features output by the context-aware feature pyramid network.
Preferably, in the region candidate stage, based on the multi-scale feature map extracted in the feature extraction stage, the present embodiment utilizes the region candidate network to achieve effective coarse positioning of the tongue candidate region, and the feature map corresponding to the positioned tongue candidate region is used for accurate positioning and segmentation of the tongue in the prediction stage.
In this embodiment, as shown in fig. 6, the determining of the candidate region where the tongue is located by using the region candidate network is specifically: the region candidate network extracts candidate targets on the multi-scale feature maps using a Sliding Window, obtains a 2048-dimensional vector through a standard convolutional layer with a 3×3 kernel, and then, through two branches (Box Classification and Box Regression) each formed by a standard convolutional layer with a 1×1 kernel, performs target classification and position localization of the candidate boxes, generating 2k class probabilities and 4k candidate box coordinates, respectively; the class probabilities comprise the tongue and non-tongue probabilities, and the candidate box coordinates comprise the x coordinate, the y coordinate, the box width and the box height.
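The following sketch shows one way to realize this proposal head in PyTorch. The 3×3 convolution to 2048 dimensions and the two 1×1 branches follow the description above, while the input width of 256 channels and k = 9 anchors per position are assumptions for illustration.

```python
import torch.nn as nn

class RegionProposalHead(nn.Module):
    def __init__(self, in_channels: int = 256, mid_channels: int = 2048, k: int = 9):
        super().__init__()
        # 3x3 standard convolution: one 2048-d vector per sliding-window position
        self.conv = nn.Conv2d(in_channels, mid_channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)
        # two 1x1 branches: 2k class probabilities and 4k box coordinates
        self.box_cls = nn.Conv2d(mid_channels, 2 * k, kernel_size=1)
        self.box_reg = nn.Conv2d(mid_channels, 4 * k, kernel_size=1)

    def forward(self, feature_map):
        h = self.relu(self.conv(feature_map))
        return self.box_cls(h), self.box_reg(h)
```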
In this embodiment, the obtaining of the tongue segmentation result by segmenting the candidate region is specifically: first, the feature map corresponding to each candidate region is converted into a candidate feature map of fixed size by an RoI Align module using bilinear interpolation, so that the candidate feature maps are aligned; final tongue localization and segmentation are then achieved through a Localization Branch network and a segmentation branch network (Mask Branch), respectively.
In this embodiment, the localization branch network uses two fully connected layers as a regressor to perform position regression and achieve accurate localization; the segmentation branch network uses two standard convolutional layers as a pixel classifier to perform pixel-level classification, i.e., tongue segmentation.
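A sketch of the two branches under stated assumptions: the RoI-aligned feature size (here 256 channels at 7×7 for localization), the hidden width of the regressor, and the kernel sizes of the two mask convolutions are illustrative choices not fixed by the text.

```python
import torch
import torch.nn as nn

class LocalizationBranch(nn.Module):
    """Two fully connected layers regressing the box (x, y, w, h)."""
    def __init__(self, in_features: int = 256 * 7 * 7, hidden: int = 1024):
        super().__init__()
        self.fc1 = nn.Linear(in_features, hidden)
        self.fc2 = nn.Linear(hidden, 4)

    def forward(self, roi_feat: torch.Tensor) -> torch.Tensor:
        return self.fc2(torch.relu(self.fc1(roi_feat.flatten(1))))

class MaskBranch(nn.Module):
    """Two standard convolutional layers acting as a per-pixel classifier."""
    def __init__(self, channels: int = 256):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, roi_feat: torch.Tensor) -> torch.Tensor:
        # sigmoid turns the logits into per-pixel tongue probabilities
        return torch.sigmoid(self.conv2(torch.relu(self.conv1(roi_feat))))
```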
In this embodiment, the loss function adopted in the training process of the localization branch network and the segmentation branch network is:
L = L_loc + L_mask
where
L_loc = ∑_{i∈{x,y,w,h}} smooth_L1(t_i − t̂_i)
smooth_L1(z) = 0.5·z² if |z| < 1, and |z| − 0.5 otherwise
where t_i is the manually annotated tongue position and t̂_i is the tongue position predicted by the tongue localization branch network; x, y, w and h respectively denote the abscissa of the upper-right corner of the tongue bounding box, the ordinate of the upper-right corner of the tongue bounding box, the length of the tongue body and the width of the tongue body;
and where
L_mask = ∑_c (1 − TI_c)
where TI_c is the Tversky similarity measure, defined as:
TI_c = (∑_i p_ic · g_ic + ε) / (∑_i p_ic · g_ic + α·∑_i p_ic · g_ic̄ + β·∑_i p_ic̄ · g_ic + ε)
where p_ic is the predicted probability that pixel i belongs to the tongue class, p_ic̄ is the predicted probability that pixel i does not belong to the tongue class, g_ic = 1 indicates that pixel i belongs to the tongue class, and g_ic̄ = 1 indicates that pixel i does not belong to the tongue class; ε is an infinitesimal constant that avoids division by zero, chosen as 10⁻⁸ in this embodiment; and α and β are two parameters controlling the balance between precision and recall, with α = 0.3 and β = 0.7.
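A runnable sketch of this multi-task loss is shown below. The Tversky term follows the reconstruction above with α = 0.3, β = 0.7 and ε = 10⁻⁸; the use of a smooth-L1 box regression term is an assumption consistent with the region-proposal literature the patent builds on, not a form confirmed by the original equation images.

```python
import torch
import torch.nn.functional as F

def tversky_mask_loss(pred: torch.Tensor, target: torch.Tensor,
                      alpha: float = 0.3, beta: float = 0.7,
                      eps: float = 1e-8) -> torch.Tensor:
    """L_mask = sum_c (1 - TI_c) for the single tongue class."""
    p = pred.flatten(1)              # p_ic: predicted tongue probabilities
    g = target.flatten(1).float()    # g_ic: 1 where pixel i is tongue
    tp = (p * g).sum(dim=1)          # sum_i p_ic * g_ic
    fp = (p * (1.0 - g)).sum(dim=1)  # sum_i p_ic * g_ic-bar (false positives)
    fn = ((1.0 - p) * g).sum(dim=1)  # sum_i p_ic-bar * g_ic (false negatives)
    ti = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return (1.0 - ti).sum()

def tonguenet_loss(box_pred, box_gt, mask_pred, mask_gt):
    # L = L_loc + L_mask, with smooth-L1 box regression (an assumption)
    l_loc = F.smooth_l1_loss(box_pred, box_gt, reduction="sum")
    return l_loc + tversky_mask_loss(mask_pred, mask_gt)
```

With α = 0.3 and β = 0.7, false negatives are penalized more than false positives, which tilts the trained segmenter toward higher recall on the tongue region.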
This embodiment improves the accuracy and robustness of tongue image segmentation through a new end-to-end multi-task deep learning framework. The tongue region is first localized, and pixel-level classification is then performed within the localized region, achieving the final accurate segmentation while effectively avoiding interference from complex backgrounds. In the feature learning process, in order to extract more representative features, this embodiment proposes a new context-aware hole residual module and, combined with a feature pyramid network, achieves effective extraction of multi-level, multi-scale features.
Specifically, in order to evaluate the performance of the tongue image segmentation algorithm, this embodiment performed ten-fold cross-validation experiments on three data sets: TestSet1 (300 tongue images with a resolution of 768×576), TestSet2 (331 tongue images with a resolution of 550×650) and TestSet3 (290 tongue images with a resolution of 600×576), with segmentation performance measured by 6 common segmentation measures. The first 3 measures, namely Precision, the Dice coefficient and mIoU (mean Intersection over Union), are commonly used to evaluate deep-learning-based segmentation models; the larger the value, the better the segmentation performance. The last 3 measures, namely the False Positive Rate (FPR, also called the false alarm rate), the False Negative Rate (FNR) and the Misclassification Error (ME), are commonly used to evaluate conventional segmentation models; the smaller the value, the better the segmentation performance. These measures are defined as:
Precision = |F_p ∩ F_g| / |F_p|
Dice = 2·|F_p ∩ F_g| / (|F_p| + |F_g|)
mIoU = (1/2)·( |F_p ∩ F_g| / |F_p ∪ F_g| + |B_p ∩ B_g| / |B_p ∪ B_g| )
FPR = |B_g ∩ F_p| / |B_g|
FNR = |F_g ∩ B_p| / |F_g|
ME = 1 − (|B_g ∩ B_p| + |F_g ∩ F_p|) / (|B_g| + |F_g|)
in the formula, BgAnd FgBackground and object representing results of manual standard segmentation, BpAnd FpRepresenting the background and the target in the segmentation result corresponding to the automatic segmentation algorithm, and | represents the number of elements in the set. The value ranges of the six measures are all 0-1. Lower values of ME, FPR and FNR represent better segmentation; conversely, higher Precision, Dice, and mlou values represent better segmentation results.
To verify the effectiveness of the method of this embodiment for tongue image segmentation, it was compared with recently proposed deep learning algorithms: FCN, U-Net, SegNet, DeepTongue and Mask R-CNN. As shown in the box plots of fig. 7 and the table of fig. 8, the metric results of the algorithm of the present invention (TongueNet) are almost the best on all six measures across the three data sets, with Precision, Dice and mIoU values significantly higher than those of the other methods, and FPR and ME values significantly lower. The only exception is that DeepTongue and U-Net are superior to the present algorithm in terms of the FNR measure on part of the data sets, but this is due to the more pronounced over-segmentation in the results of those two algorithms. The box plots of FIG. 7 further demonstrate that the algorithm of the present invention is more stable than the other methods, since its outliers are generally fewer or deviate less.
Fig. 9 shows the manual segmentation results and the algorithmic segmentation results for three randomly selected tongue images from each of the three data sets, where the dotted line represents the ideal manual segmentation and the solid line represents the algorithmic segmentation. As can be seen from FIG. 9, the segmentation result of the algorithm of the present invention is usually the closest to the ideal manual segmentation (the dotted and solid lines coincide most closely), giving the best segmentation effect; it achieves essentially the best segmentation on the three randomly selected tongue images of all three data sets, which shows that its segmentation performance is the most stable.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The foregoing is directed to preferred embodiments of the present invention; other and further embodiments may be devised without departing from the basic scope thereof, which is determined by the claims that follow. Any simple modification, equivalent change or adaptation of the above embodiments made according to the technical essence of the present invention remains within the protection scope of the technical solution of the present invention.

Claims (7)

1. A tongue image segmentation method based on a context-aware residual error network, characterized by comprising:
automatically extracting image features by using a deep neural network;
determining a candidate region where the tongue body is located by using a region candidate network based on the extracted feature map;
and finally, obtaining a tongue body segmentation result by segmenting the candidate region.
2. The tongue image segmentation method based on the context-aware residual error network according to claim 1, wherein the automatic image feature extraction by using the deep neural network specifically comprises the following steps:
step S11: establishing a context-aware hole residual module, whose mapping is:
x_{i+1} = λ_G^i · G_D(x_i; W_G^i) + λ_F^i · F_D(x_i; W_F^i)
where x_i and x_{i+1} respectively denote the input and output of the i-th residual module; D denotes the hole convolution operation; G_D(·) and F_D(·) denote two different nonlinear mapping groups, each consisting of a hole convolution operation, a batch normalization operation and a ReLU activation function; W_G^i and W_F^i respectively denote the parameter sets of the two mappings, i.e., the weights the neural network needs to learn; and λ_G^i and λ_F^i respectively denote the different weights assigned to the two mapping groups;
step S12: using the hole residual module established in step S11, building a feature pyramid network to achieve multi-scale feature extraction from the tongue image and obtain multi-scale feature maps;
wherein the feature pyramid network comprises a bottom-up path module, lateral connection modules and a top-down path module, the bottom-up path module being a feature extraction backbone constructed by connecting five context-aware hole residual modules in series, and the lateral connection modules connecting the feature maps of the bottom-up path module to the top-down path module.
3. The tongue image segmentation method based on the context-aware residual error network according to claim 1, wherein the determining of the candidate region where the tongue body is located by using the region candidate network is specifically: the region candidate network extracts candidate targets on the multi-scale feature maps using a sliding window, obtains a 2048-dimensional vector through a standard convolutional layer with a 3×3 kernel, and then, through two branches (candidate box classification and candidate box regression) each formed by a standard convolutional layer with a 1×1 kernel, performs target classification and position localization of the candidate boxes, generating 2k class probabilities and 4k candidate box coordinates, respectively; the class probabilities comprise the tongue and non-tongue probabilities, and the candidate box coordinates comprise the x coordinate, the y coordinate, the box width and the box height.
4. The tongue image segmentation method based on the context-aware residual error network according to claim 1, wherein the obtaining of the tongue segmentation result by segmenting the candidate region is specifically: first, the feature map corresponding to each candidate region is converted into a candidate feature map of fixed size by an RoI alignment module using bilinear interpolation, so that the candidate feature maps are aligned; final tongue localization and segmentation are then achieved through a localization branch network and a segmentation branch network, respectively.
5. The tongue image segmentation method based on the context-aware residual error network according to claim 4, wherein the localization branch network uses two fully connected layers as a regressor to perform position regression and achieve accurate localization; and the segmentation branch network uses two standard convolutional layers as a pixel classifier to perform pixel-level classification, i.e., tongue segmentation.
6. The tongue image segmentation method based on the context-aware residual error network according to claim 4, wherein the loss function adopted in the training process of the localization branch network and the segmentation branch network is:
L = L_loc + L_mask
where
L_loc = ∑_{i∈{x,y,w,h}} smooth_L1(t_i − t̂_i)
smooth_L1(z) = 0.5·z² if |z| < 1, and |z| − 0.5 otherwise
where t_i is the manually annotated tongue position and t̂_i is the tongue position predicted by the tongue localization branch network; x, y, w and h respectively denote the abscissa of the upper-right corner of the tongue bounding box, the ordinate of the upper-right corner of the tongue bounding box, the length of the tongue body and the width of the tongue body;
and where
L_mask = ∑_c (1 − TI_c)
where TI_c is the Tversky similarity measure, defined as:
TI_c = (∑_i p_ic · g_ic + ε) / (∑_i p_ic · g_ic + α·∑_i p_ic · g_ic̄ + β·∑_i p_ic̄ · g_ic + ε)
where p_ic is the predicted probability that pixel i belongs to the tongue class, p_ic̄ is the predicted probability that pixel i does not belong to the tongue class, g_ic = 1 indicates that pixel i belongs to the tongue class, and g_ic̄ = 1 indicates that pixel i does not belong to the tongue class; ε is an infinitesimal constant that avoids division by zero; and α and β are two parameters controlling the balance between precision and recall.
7. The tongue image segmentation method based on the context-aware residual error network according to claim 6, wherein α = 0.3 and β = 0.7.
CN201910969290.3A 2019-10-12 2019-10-12 Tongue image segmentation method based on context-aware residual error network Pending CN110729045A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910969290.3A CN110729045A (en) 2019-10-12 2019-10-12 Tongue image segmentation method based on context-aware residual error network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910969290.3A CN110729045A (en) 2019-10-12 2019-10-12 Tongue image segmentation method based on context-aware residual error network

Publications (1)

Publication Number Publication Date
CN110729045A true CN110729045A (en) 2020-01-24

Family

ID=69220043

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910969290.3A Pending CN110729045A (en) 2019-10-12 2019-10-12 Tongue image segmentation method based on context-aware residual error network

Country Status (1)

Country Link
CN (1) CN110729045A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325211A (en) * 2020-02-13 2020-06-23 上海眼控科技股份有限公司 Method for automatically recognizing color of vehicle, electronic device, computer apparatus, and medium
CN111368775A (en) * 2020-03-13 2020-07-03 西北工业大学 Complex scene dense target detection method based on local context sensing
CN111523403A (en) * 2020-04-03 2020-08-11 咪咕文化科技有限公司 Method and device for acquiring target area in picture and computer readable storage medium
CN111783792A (en) * 2020-05-31 2020-10-16 浙江大学 Method for extracting significant texture features of B-ultrasonic image and application thereof
CN111914843A (en) * 2020-08-20 2020-11-10 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Character detection method, system, equipment and storage medium
CN112507872A (en) * 2020-12-09 2021-03-16 中科视语(北京)科技有限公司 Positioning method and positioning device for head and shoulder area of human body and electronic equipment
CN112926531A (en) * 2021-04-01 2021-06-08 深圳市优必选科技股份有限公司 Feature information extraction method, model training method and device and electronic equipment
CN114359739A (en) * 2022-03-18 2022-04-15 深圳市海清视讯科技有限公司 Target identification method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080139966A1 (en) * 2006-12-07 2008-06-12 The Hong Kong Polytechnic University Automatic tongue diagnosis based on chromatic and textural features classification using bayesian belief networks
CN107977671A (en) * 2017-10-27 2018-05-01 Zhejiang University of Technology Tongue image classification method based on multi-task convolutional neural networks
CN108109160A (en) * 2017-11-16 2018-06-01 Zhejiang University of Technology Interaction-free GrabCut tongue body segmentation method based on deep learning
CN109711413A (en) * 2018-12-30 2019-05-03 Shaanxi Normal University Image semantic segmentation method based on deep learning
CN110136149A (en) * 2019-05-21 2019-08-16 Minjiang University Leukocyte localization and segmentation method based on deep neural networks

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080139966A1 (en) * 2006-12-07 2008-06-12 The Hong Kong Polytechnic University Automatic tongue diagnosis based on chromatic and textural features classification using bayesian belief networks
CN107977671A (en) * 2017-10-27 2018-05-01 Zhejiang University of Technology Tongue image classification method based on multi-task convolutional neural networks
CN108109160A (en) * 2017-11-16 2018-06-01 Zhejiang University of Technology Interaction-free GrabCut tongue body segmentation method based on deep learning
CN109711413A (en) * 2018-12-30 2019-05-03 Shaanxi Normal University Image semantic segmentation method based on deep learning
CN110136149A (en) * 2019-05-21 2019-08-16 Minjiang University Leukocyte localization and segmentation method based on deep neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHANGEN ZHOU 等: "Tonguenet: Accurate Localization and Segmentation for Tongue Images Using Deep Neural Networks", 《IEEE ACCESS》 *
SHAOQING REN 等: "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", 《COMPUTER SCIENCE》 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325211A (en) * 2020-02-13 2020-06-23 上海眼控科技股份有限公司 Method for automatically recognizing color of vehicle, electronic device, computer apparatus, and medium
CN111368775A (en) * 2020-03-13 2020-07-03 西北工业大学 Complex scene dense target detection method based on local context sensing
CN111523403B (en) * 2020-04-03 2023-10-20 咪咕文化科技有限公司 Method and device for acquiring target area in picture and computer readable storage medium
CN111523403A (en) * 2020-04-03 2020-08-11 咪咕文化科技有限公司 Method and device for acquiring target area in picture and computer readable storage medium
CN111783792A (en) * 2020-05-31 2020-10-16 浙江大学 Method for extracting significant texture features of B-ultrasonic image and application thereof
CN111783792B (en) * 2020-05-31 2023-11-28 浙江大学 Method for extracting significant texture features of B-ultrasonic image and application thereof
CN111914843A (en) * 2020-08-20 2020-11-10 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Character detection method, system, equipment and storage medium
CN112507872A (en) * 2020-12-09 2021-03-16 中科视语(北京)科技有限公司 Positioning method and positioning device for head and shoulder area of human body and electronic equipment
CN112507872B (en) * 2020-12-09 2021-12-28 中科视语(北京)科技有限公司 Positioning method and positioning device for head and shoulder area of human body and electronic equipment
CN112926531A (en) * 2021-04-01 2021-06-08 深圳市优必选科技股份有限公司 Feature information extraction method, model training method and device and electronic equipment
CN112926531B (en) * 2021-04-01 2023-09-26 深圳市优必选科技股份有限公司 Feature information extraction method, model training method, device and electronic equipment
CN114359739B (en) * 2022-03-18 2022-06-28 深圳市海清视讯科技有限公司 Target identification method and device
CN114359739A (en) * 2022-03-18 2022-04-15 深圳市海清视讯科技有限公司 Target identification method and device

Similar Documents

Publication Publication Date Title
CN110729045A (en) Tongue image segmentation method based on context-aware residual error network
US11813047B2 (en) Automatic quantification of cardiac MRI for hypertrophic cardiomyopathy
CN107506761B (en) Brain image segmentation method and system based on significance learning convolutional neural network
Shen et al. Domain-invariant interpretable fundus image quality assessment
CN109523535B (en) Pretreatment method of lesion image
CN109544518B (en) Method and system applied to bone maturity assessment
CN107993221B (en) Automatic identification method for vulnerable plaque of cardiovascular Optical Coherence Tomography (OCT) image
Zhang et al. Automated semantic segmentation of red blood cells for sickle cell disease
CN109614869A (en) A kind of pathological image classification method based on multi-scale compress rewards and punishments network
CN111612756B (en) Coronary artery specificity calcification detection method and device
CN112465905A (en) Characteristic brain region positioning method of magnetic resonance imaging data based on deep learning
CN112884788B (en) Cup optic disk segmentation method and imaging method based on rich context network
CN114202545A (en) UNet + + based low-grade glioma image segmentation method
Shamrat et al. Analysing most efficient deep learning model to detect COVID-19 from computer tomography images
CN111462082A (en) Focus picture recognition device, method and equipment and readable storage medium
CN112396605B (en) Network training method and device, image recognition method and electronic equipment
Huang et al. HEp-2 cell images classification based on textural and statistic features using self-organizing map
CN112686932B (en) Image registration method for medical image, image processing method and medium
CN113888520A (en) System and method for generating a bullseye chart
CN113096080A (en) Image analysis method and system
CN107832695A (en) The optic disk recognition methods based on textural characteristics and device in retinal images
CN117036288A (en) Tumor subtype diagnosis method for full-slice pathological image
CN109934298A (en) A kind of gradual figure matching process and device of the deformation map based on cluster
CN110428405A (en) Method, relevant device and the medium of lump in a kind of detection biological tissue images
CN113222985B (en) Image processing method, image processing device, computer equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200124