CN116310545A - Cross-domain tongue image classification method based on depth layering optimal transmission - Google Patents
- Publication number
- CN116310545A (application CN202310252527.2A)
- Authority
- CN
- China
- Prior art keywords
- domain
- tongue
- tongue image
- representing
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
- Radiology & Medical Imaging (AREA)
- Quality & Reliability (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a cross-domain tongue image classification method based on depth layering optimal transmission. The method comprises: collecting tongue images; constructing a machine learning classification model that aligns tongue image features from different fields by means of a depth layering optimal transmission model, the depth layering optimal transmission model comprising a two-layer structure in which the first layer realizes optimal transmission between different fields and the second layer realizes optimal transmission between different samples; and classifying the tongue images with the depth layering optimal transmission model and outputting the tongue image categories. The method aligns tongue images drawn from different distributions while enhancing the ability to classify them.
Description
Technical Field
The invention relates to the technical field of tongue image classification for assisting traditional Chinese medicine diagnosis and treatment, in particular to a cross-domain tongue image classification method based on depth layering optimal transmission.
Background
Most existing machine-learning-based tongue image classification methods rely on supervised learning. Supervised learning generally assumes that the training set and the test set follow the same distribution, so that a model trained on the training set also performs well on the test set. In real applications, however, this assumption rarely holds. The main idea for addressing the problem is to map the two data distributions, through a nonlinear mapping, into a hidden space shared between the domains, so that the drift between the distributions is reduced and the two distributions become more similar after the mapping. This nonlinear mapping process is known as domain adaptation.
Existing machine-learning-based tongue image classification faces exactly this problem. First, tongue images differ from person to person, including in edge texture and color. Second, tongue image acquisition devices may differ between hospitals, and the acquired data is also affected by the acquisition environment, such as angle and illumination. In addition, hospitals are in different geographical locations, so the individuals whose tongue images are acquired also show regional differences. These factors lead to relatively large differences in the distribution of tongue image data from hospital to hospital. If the tongue image data of each hospital is regarded as the data of one field, the data distributions of different fields differ, and a model trained on the data set acquired at one hospital can suffer serious performance degradation when deployed at other hospitals. Moreover, because labeling medical data is costly, the target domain often has no labeled data at all, which makes the problem harder: only the labels of the source domain are available.
To solve this problem, the distributions of the different domains need to be aligned. Mainstream methods for aligning the distributions of different fields comprise two steps: first, the two distributions are brought closer by a nonlinear transformation; then, using the label information of the source domain, a classifier for the target domain is trained on the transformed distribution, so that the model generalizes to the target domain. This is also the process of inter-domain knowledge migration. Finding this nonlinear transformation is therefore the key to solving the domain adaptation problem. In recent years, optimal transmission has shown great advantages for domain adaptation, because it can measure the distance between two distributions directly on their edge (marginal) distributions without label information. In the vision field, this distance based on optimal transmission is called the EMD (Earth Mover's Distance). On the one hand, it can be computed directly on discrete empirical distributions (domains). On the other hand, it still provides meaningful gradients when the support sets of the two domains do not significantly overlap, so training is less likely to fail. Furthermore, it is interpretable, since it explicitly models the coupling between domains.
By minimizing the transmission cost between the feature distributions of the two domains, the distributions of the source domain and the target domain can be transformed into a common hidden space at minimum cost, and features in this hidden space are domain-invariant. This process is called domain alignment. A classifier trained on such features can be migrated to the target domain. Domain alignment, however, is not the final goal; classification is. In optimal transmission, the cost matrix is generally obtained by computing the Euclidean distance (L2 distance) between every pair of samples. Such a metric cannot provide a meaningful distance when the support sets of two samples do not overlap. In vision problems this means that, when the backgrounds of two samples are cluttered or the appearance varies greatly within a class, images of the same class may lie far apart in the metric space. In other words, the L2 distance is strongly affected by background changes. Modeling with neural networks can alleviate this, but it requires sufficient training data, which is difficult to obtain in real scenes (especially medical scenes), so the local features of the target area need to be emphasized. At the same time, the L2 distance is a global representation: it destroys the spatial structure of the image features and loses local information, yet local information is discriminative and transferable, which is important for classification tasks. In particular, in image datasets of traditional Chinese medicine physical signs, the acquisition process is poorly standardized, and the background and environmental factors vary greatly.
For these reasons, existing domain alignment processes obtain domain-invariant features but also blur the class distinctions of the features; that is, over-alignment occurs.
Therefore, how to avoid over-alignment in tongue image classification and improve the accuracy of tongue image classification is a technical problem that those skilled in the art need to solve.
Disclosure of Invention
To solve the above problems, the invention discloses a cross-domain tongue image classification method based on depth layering optimal transmission, so that the machine learning model can learn invariant features that are more robust to environmental noise, can adapt to tongue image data with different distributions, and achieves higher classification accuracy.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
A cross-domain tongue image classification method based on depth layering optimal transmission comprises the following steps:
S1, collecting tongue image samples from a plurality of different fields as a training set;
S2, performing feature extraction on the source domain tongue image samples in the training set by using a deep neural network to obtain a source domain image feature map formed by the corresponding source domain tongue image sample features;
performing feature extraction on the target domain tongue image samples in the training set by using a deep neural network to obtain a target domain image feature map formed by the corresponding target domain tongue image sample features;
S3, partitioning the source domain tongue image sample features in the source domain image feature map into blocks to obtain a source domain image feature set corresponding to the source domain tongue image sample;
partitioning the target domain tongue image sample features in the target domain image feature map into blocks to obtain a target domain image feature set corresponding to the target domain tongue image sample;
S4, calculating an optimized transmission distance between the source domain image feature set corresponding to each source domain tongue image sample and the target domain image feature set corresponding to the target domain tongue image sample, and taking it as the sample optimized transmission distance between the source domain tongue image sample and the target domain tongue image sample;
S5, taking the sample optimized transmission distance as the cost measure between the source domain and the target domain, and calculating an inter-domain optimized transmission distance between the source domain and the target domain;
S6, calculating a softmax cross entropy loss from the source domain tongue image sample features extracted in step S2 and taking it as one part of the loss function; taking the inter-domain optimized transmission distance as the other part of the loss function to construct a classification loss function, and training a classifier with the classification loss function;
S7, classifying the tongue image sample to be verified by using the trained classifier.
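Step S3 can be illustrated with a short NumPy sketch. It is only a sketch under stated assumptions: the function names `partition_feature_map` and `global_average_pool`, the grid size, and the choice of averaging each block into one local feature vector are all hypothetical and are not specified by the method. With a grid equal to the spatial size of the feature map, every spatial position becomes its own region.

```python
import numpy as np

def partition_feature_map(fmap, grid=(2, 2)):
    """Split an (H, W, ch) feature map into grid[0] x grid[1] blocks and
    average each block, giving a (grid[0] * grid[1], ch) local feature set.
    Block averaging is an illustrative assumption."""
    feats = []
    for row_band in np.array_split(fmap, grid[0], axis=0):
        for block in np.array_split(row_band, grid[1], axis=1):
            feats.append(block.mean(axis=(0, 1)))  # one vector per region
    return np.stack(feats)

def global_average_pool(fmap):
    """Global average pooling along the spatial dimensions: (H, W, ch) -> (ch,)."""
    return fmap.mean(axis=(0, 1))

# Random stand-in for a feature map g(x) produced by a deep feature extractor.
rng = np.random.default_rng(0)
fmap = rng.normal(size=(8, 8, 16))

local_set = partition_feature_map(fmap, grid=(4, 4))  # local feature set
global_vec = global_average_pool(fmap)                # global descriptor
print(local_set.shape, global_vec.shape)
```

The local feature set feeds the sample-level optimal transmission in step S4, while the pooled vector is the kind of global descriptor on which an L2 distance is usually computed.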
In step S4, calculating the optimized transmission distance between the source domain image feature set corresponding to each source domain tongue image sample and the target domain image feature set corresponding to the target domain tongue image sample comprises the following step:
the EMD distance and the L2 distance are jointly used as the cost function between the source domain image feature set and the target domain image feature set, and the optimal transmission distance between the two feature sets is calculated, wherein, in the cost function:
$g$ represents the feature extractor of the deep neural network;
$g(x_i^s)$ represents the source domain image feature map extracted from the $i$-th source domain tongue image sample, $x_i^s$ represents the $i$-th source domain tongue image sample, $g(x_i^s) \in \mathbb{R}^{H_i \times W_i \times ch}$, and $H_i$ and $W_i$ represent the spatial dimensions (height and width) of the source domain image feature map extracted from the $i$-th source domain tongue image sample;
$g(x_j^t)$ represents the target domain image feature map extracted from the $j$-th target domain tongue image sample, $x_j^t$ represents the $j$-th target domain tongue image sample, $g(x_j^t) \in \mathbb{R}^{H_j \times W_j \times ch}$, and $H_j$ and $W_j$ represent the spatial dimensions (height and width) of the target domain image feature map extracted from the $j$-th target domain tongue image sample;
$\Pi\big(g(x_i^s), g(x_j^t)\big)$ represents the joint distribution of the source domain image feature map and the target domain image feature map;
$\gamma_{in}$ represents the optimal transmission scheme between any one source domain tongue image sample and any one target domain tongue image sample with respect to the corresponding image feature sets, and $C_{in}$ represents the cost matrix between any one source domain tongue image sample and any one target domain tongue image sample with respect to the corresponding image feature sets; $\langle \gamma_{in}, C_{in} \rangle_F$ represents the Frobenius dot product of $\gamma_{in}$ and $C_{in}$; $\bar{g}(x_i^s)$ represents the global average pooling result, along the spatial dimensions, of the source domain image feature map extracted from the $i$-th source domain tongue image sample; $\bar{g}(x_j^t)$ represents the global average pooling result, along the spatial dimensions, of the target domain image feature map extracted from the $j$-th target domain tongue image sample; $\bar{g}(x_i^s), \bar{g}(x_j^t) \in \mathbb{R}^{ch}$; and $ch$ represents the number of channels.
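One form of the cost function that is consistent with the symbol definitions above is sketched below. It is an assumption rather than a quotation: the notation $x_i^s$, $x_j^t$ and $\bar{g}(\cdot)$, the unweighted sum of the two terms, and the squared L2 norm are all illustrative choices.

```latex
c\big(g(x_i^s),\, g(x_j^t)\big)
  = \min_{\gamma_{in} \in \Pi\left(g(x_i^s),\, g(x_j^t)\right)}
      \big\langle \gamma_{in},\, C_{in} \big\rangle_F
  \;+\; \big\lVert \bar{g}(x_i^s) - \bar{g}(x_j^t) \big\rVert_2^2
```

Under this reading, the first term is the EMD distance between the two local feature sets, and the second term is the L2 distance between the globally pooled features.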
Preferably, in step S4, calculating the optimized transmission distance between the source domain image feature set corresponding to each source domain tongue image sample and the target domain image feature set corresponding to the target domain tongue image sample further comprises the following step:
the SWD distance and the L2 distance are jointly used as the cost function between the source domain image feature set and the target domain image feature set, and the optimal transmission distance between the two feature sets is calculated, wherein, in the cost function:
$P$ represents a permutation matrix, $\mathcal{P}$ represents the set of all permutation matrices, $U_i$ represents the features corresponding to the source domain tongue image sample $x_i^s$ after transformation into a common high-dimensional hidden-layer space, $U_j$ represents the features corresponding to the target domain tongue image sample $x_j^t$ after transformation into the common high-dimensional hidden-layer space, and $T$ is the matrix transpose symbol.
Preferably, in step S4, calculating the optimized transmission distance between the source domain image feature set corresponding to each source domain tongue image sample and the target domain image feature set corresponding to the target domain tongue image sample further comprises the following step:
the SWD distance, the L2 distance and the cross entropy of the class condition distribution difference are jointly used as the cost function between the two feature sets, and the optimal transmission distance between the two feature sets is calculated, wherein, in the cost function:
$\lambda_{swd}$ represents the balance coefficient of the SWD distance, $\lambda_{l2}$ represents the balance coefficient of the L2 distance, and $\lambda_{cond}$ represents the balance coefficient of the cross entropy of the class condition distribution difference; the class condition distribution difference is evaluated for the source domain tongue image sample $x_i^s$; $M$ represents the total number of projection matrices; $Z_i$ represents the feature matrix obtained by mapping the source domain tongue image sample $x_i^s$ to the hidden layer space $Z$; $Z_j$ represents the feature matrix obtained by mapping the target domain tongue image sample $x_j^t$ to the hidden layer space $Z$; and projecting the source domain tongue image sample $x_i^s$ or the target domain tongue image sample $x_j^t$ in the hidden layer space $Z$ forms the corresponding $m$-th projection matrix, $m = 1, \dots, M$.
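The weighting described above can be sketched in NumPy as follows. This is a hedged illustration, not the claimed formula: the function names `class_conditional_cost` and `combined_cost`, the coefficient values, and the use of softmax predictions on target samples as stand-in labels are assumptions, and the SWD and L2 matrices are random placeholders.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def class_conditional_cost(y_src, logits_tgt, eps=1e-12):
    """Cross entropy between every source one-hot label (n_s, K) and every
    target prediction (n_t, K); returns an (n_s, n_t) matrix."""
    p_tgt = softmax(logits_tgt)
    return -(y_src @ np.log(p_tgt + eps).T)

def combined_cost(swd, l2, ce, lam_swd=1.0, lam_l2=1.0, lam_cond=1.0):
    """Weighted sum of the three pairwise cost matrices (assumed form)."""
    return lam_swd * swd + lam_l2 * l2 + lam_cond * ce

rng = np.random.default_rng(0)
y_src = np.eye(3)[[0, 1, 2, 0]]          # 4 labelled source samples, 3 classes
logits_tgt = rng.normal(size=(5, 3))     # classifier outputs for 5 target samples
swd = rng.uniform(size=(4, 5))           # placeholder SWD distances
l2 = rng.uniform(size=(4, 5))            # placeholder L2 distances
cost = combined_cost(swd, l2, class_conditional_cost(y_src, logits_tgt),
                     lam_swd=1.0, lam_l2=0.5, lam_cond=0.1)
print(cost.shape)
```

The class condition term makes a source sample cheap to match with target samples that the classifier already assigns to the same class, which is how class information enters the cost.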
Preferably, in step S5, a mini-batch strategy is adopted: each time, a mini-batch of size $n$ is randomly drawn from the source domain tongue image samples and from the target domain tongue image samples, and the optimal transmission between the two mini-batches is calculated as the optimized transmission distance between the fields, wherein $OT_n$ represents the optimized transmission distance between the fields computed on the mini-batches; $\mu_s^n$ represents the matrix formed by the distribution of the source domain tongue image samples; $\mu_t^n$ represents the matrix formed by the distribution of the target domain tongue image samples; $\Pi(\mu_s^n, \mu_t^n)$ represents the joint distribution of $\mu_s^n$ and $\mu_t^n$; $\gamma_n$ represents the $n \times n$ matrix formed by the optimal transmission scheme between any one source domain tongue image sample and any one target domain tongue image sample with respect to the corresponding image feature sets; $C_n$ represents the $n \times n$ matrix formed by the sample optimized transmission distances between any one source domain tongue image sample and any one target domain tongue image sample; and $\langle \gamma_n, C_n \rangle_F$ represents the Frobenius dot product of $\gamma_n$ and $C_n$.
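Assuming the mini-batch distance takes the usual form $OT_n = \min_{\gamma_n \in \Pi} \langle \gamma_n, C_n \rangle_F$ with uniform weights on both mini-batches, the computation can be sketched with a plain NumPy Sinkhorn iteration. Sinkhorn solves an entropy-regularized approximation, not the exact problem; the function name `sinkhorn_plan`, the regularization strength, and the random stand-in for $C_n$ are assumptions made for illustration.

```python
import numpy as np

def sinkhorn_plan(C, reg=0.5, n_iter=200):
    """Entropy-regularized transport plan between two uniform distributions.
    C is the (n, m) cost matrix; reg is the regularization strength."""
    n, m = C.shape
    a = np.full(n, 1.0 / n)          # uniform weights on source mini-batch
    b = np.full(m, 1.0 / m)          # uniform weights on target mini-batch
    K = np.exp(-C / reg)             # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iter):
        v = b / (K.T @ u)            # scale columns towards marginal b
        u = a / (K @ v)              # scale rows towards marginal a
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
n = 8
# Stand-in for C_n: in the method, entry (i, j) would be the sample optimized
# transmission distance between source sample i and target sample j.
C_n = rng.uniform(size=(n, n))

gamma_n = sinkhorn_plan(C_n)
ot_n = float(np.sum(gamma_n * C_n))  # Frobenius dot product <gamma_n, C_n>_F
print(ot_n)
```

A smaller `reg` brings the plan closer to the exact optimal transmission scheme at the price of slower convergence and a higher risk of numerical underflow.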
Preferably, in step S5, calculating the inter-domain optimal transmission distance between the source domain and the target domain further includes using unbalanced optimal transmission.
Preferably, in step S5, calculating the inter-domain optimized transmission distance between the source domain and the target domain further includes adding the classification cross entropy loss function of the source domain to the unbalanced optimal transmission loss.
Compared with the prior art, the cross-domain tongue image classification method based on depth layering optimal transmission disclosed by the invention has the following beneficial effects:
tongue images with different distributions are aligned through depth layering optimal transmission while the classification capability is enhanced; unbalanced optimal transmission is adopted in the first-layer optimal transmission for field alignment, which relaxes the edge (marginal) constraints of optimal transmission and provides more robust optimization for mini-batch training; in the second-layer optimal transmission, the SWD distance is used instead of the EMD distance, which enhances the discriminative features of the samples, the SWD distance being an approximation of the EMD distance that is cheaper to compute. The accuracy of tongue image classification is thereby improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a depth layering image classification method provided by the invention;
FIG. 2 is a schematic structural diagram of the depth layering optimal transmission model provided by the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the invention discloses a cross-domain tongue image classification method based on depth layering optimal transmission, which comprises the following steps:
S1, collecting tongue image samples from a plurality of different fields as a training set;
S2, performing feature extraction separately on the source domain tongue image samples and the target domain tongue image samples in the training set by using a deep neural network, to obtain a source domain image feature map formed by the source domain tongue image sample features and a target domain image feature map formed by the target domain tongue image sample features;
S3, partitioning the source domain image feature map to obtain a source domain image feature set corresponding to the source domain tongue image sample; partitioning the target domain image feature map to obtain a target domain image feature set corresponding to the target domain tongue image sample;
S4, calculating an optimized transmission distance between the source domain image feature set corresponding to each source domain tongue image sample and the target domain image feature set corresponding to the target domain tongue image sample, and taking it as the sample optimized transmission distance between the source domain tongue image sample and the target domain tongue image sample;
in the invention, the two samples compared by the sample optimized transmission distance come from the source domain and the target domain respectively, so that local information is introduced while field alignment is realized, and the discriminability of the features is preserved;
S5, taking the sample optimized transmission distance as the cost measure between the source domain and the target domain, and calculating an inter-domain optimized transmission distance between the source domain and the target domain;
S6, calculating a softmax cross entropy loss from the source domain tongue image sample features extracted in step S2 and taking it as one part of the loss function; taking the inter-domain optimized transmission distance as the other part of the loss function to construct a classification loss function, and training a classifier with the classification loss function;
S7, classifying the tongue image sample to be verified by using the trained classifier.
Assume that $x_i^s$ and $x_j^t$ are samples drawn from the source domain distribution $\mu_s$ and the target domain distribution $\mu_t$ respectively, and that $\Pi(\mu_s, \mu_t)$ is the joint distribution of the source domain distribution $\mu_s$ and the target domain distribution $\mu_t$. Let the numbers of samples in the two domains be $N_s$ and $N_t$. $C \geq 0$, with $C \in \mathbb{R}^{N_s \times N_t}$, is the cost matrix between $\mu_s$ and $\mu_t$, in which each element $C_{ij} = c(x_i^s, x_j^t)$ is the cost between two samples and measures the difference between them; $c$ is a cost function measuring the distance between two samples, typically the L2 distance. Taking $C$ as the cost measure, the optimized transmission distance between the fields can be calculated.
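In standard discrete optimal transport notation, which is assumed here and may differ from the notation of the original formula, the distance between the two empirical distributions can be written as follows, with $\mathbf{1}$ denoting a vector of ones:

```latex
OT(\mu_s, \mu_t)
  = \min_{\gamma \in \Pi(\mu_s, \mu_t)} \langle \gamma, C \rangle_F
  = \min_{\gamma \in \Pi(\mu_s, \mu_t)} \sum_{i=1}^{N_s} \sum_{j=1}^{N_t} \gamma_{ij}\, C_{ij},
\qquad
\Pi(\mu_s, \mu_t)
  = \Big\{ \gamma \in \mathbb{R}_{+}^{N_s \times N_t} \;:\;
      \gamma \mathbf{1}_{N_t} = \mu_s,\;
      \gamma^{T} \mathbf{1}_{N_s} = \mu_t \Big\}
```

Each entry $\gamma_{ij}$ is the amount of mass moved from source sample $i$ to target sample $j$, and the two constraints require the plan to reproduce both distributions exactly.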
In one embodiment, the sample optimized transmission distance between the source domain tongue image sample and the target domain tongue image sample in step S4 may use the EMD distance and the L2 distance jointly as the cost function between the source domain image feature set and the target domain image feature set:
First, a feature extractor $g: x \rightarrow Z$ based on a deep neural network is designed, which maps the input to a hidden layer space $Z$. A classifier $f: z \rightarrow y$ is also designed, which maps the hidden layer space to the label space. Passing an image $x$ through the feature extractor $g$ yields its feature map $g(x)$. The cost function between the source domain image feature set and the target domain image feature set is then given by equation (1),
wherein $g$ represents the feature extractor of the deep neural network;
$g(x_i^s)$ represents the source domain image feature map, $x_i^s$ represents the source domain tongue image sample, $g(x_i^s) \in \mathbb{R}^{H_i \times W_i \times ch}$, and $H_i$ and $W_i$ represent the spatial dimensions (height and width) of the source domain image feature map;
$g(x_j^t)$ represents the target domain image feature map, $x_j^t$ represents the target domain tongue image sample, $g(x_j^t) \in \mathbb{R}^{H_j \times W_j \times ch}$, and $H_j$ and $W_j$ represent the spatial dimensions (height and width) of the target domain image feature map;
$\gamma_{in}$ represents the optimal transmission scheme between the two samples with respect to their image feature sets, and $C_{in}$ represents the cost matrix between the two samples with respect to their image feature sets, with $\gamma_{in} \in \mathbb{R}^{H_i W_i \times H_j W_j}$ and $C_{in} \in \mathbb{R}^{H_i W_i \times H_j W_j}$; $\langle \gamma_{in}, C_{in} \rangle_F$ represents the Frobenius dot product of $\gamma_{in}$ and $C_{in}$; $\bar{g}(x_i^s)$ represents the global average pooling result of the source domain image feature map along the spatial dimensions, $\bar{g}(x_j^t)$ represents the global average pooling result of the target domain image feature map along the spatial dimensions, $\bar{g}(x_i^s), \bar{g}(x_j^t) \in \mathbb{R}^{ch}$, and $ch$ represents the number of channels.
The feature extractor may be implemented using the convolutional layers of a convolutional neural network.
To further optimize the above technical solution, in another embodiment, step S4 calculates the sample optimized transmission distance between the source domain tongue image sample and the target domain tongue image sample as follows:
the SWD distance and the L2 distance are jointly used as the cost function between the source domain image feature set and the target domain image feature set, and the optimal transmission distance between the two feature sets is calculated, wherein, in the cost function:
$P$ represents a permutation matrix, $\mathcal{P}$ represents the set of all permutation matrices, $U_i$ represents the features corresponding to sample $x_i^s$ after transformation into a common high-dimensional hidden-layer space, $U_j$ represents the features corresponding to sample $x_j^t$ after transformation into the common high-dimensional hidden-layer space, and $T$ is the matrix transpose symbol.
In a mini-batch of size $n$, equation (1) has to be evaluated once for every pair consisting of a source domain tongue image sample and a target domain tongue image sample, which is too expensive. Equation (2) therefore uses the SWD distance (Sliced Wasserstein Distance), as an approximation of the EMD distance, to compute the cost function between the source domain image feature set and the target domain image feature set.
Equation (2) introduces a permutation matrix $P$ to match different regions of the two images; $P$ encodes the association between regions of the two images, and $\mathcal{P}$ represents the set of all permutation matrices. $U_i$ represents the features corresponding to sample $x_i^s$ after transformation into a $d$-dimensional common hidden space.
To further optimize the above technical solution, in another embodiment, step S4 calculates the optimal transmission distance between the source domain tongue image sample and the target domain image sample by another method, as follows:
the matching problem in equation (2) is NP-hard and its time complexity is too high; to solve it more efficiently, equation (2) is approximated as follows:
where $M$ projections are used. The permutation matrix $P$ need not be found explicitly; it suffices to sort the projected regions and calculate the distances between correspondingly ranked regions. Thus, the sample optimal transmission distance between the source domain tongue image sample and the target domain image sample is calculated approximately by equation (3), and the cost of each such calculation is reduced from $O(N^3)$ (if solved by linear programming) or $O(JN^2)$ (if solved by the Sinkhorn scaling algorithm, where $J$ is the number of Sinkhorn iterations) to $O(MN)$, where $N$ is the problem size. Because the feature map extracted by the deep feature extractor $g$ is small and the number of segmented regions is limited, $N$ is relatively small and the calculation cost does not increase much.
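The sorting-based approximation described above can be sketched as follows. This is a minimal illustration in plain Python rather than the patent's implementation: the function names, the use of random unit directions as the M projections, and the squared difference averaged over regions are all assumptions made for the example.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a random direction on the unit sphere in `dim` dimensions."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sliced_wasserstein(feats_a, feats_b, num_projections=64, seed=0):
    """Approximate squared SWD between two sets of N region features.

    feats_a, feats_b: lists of N feature vectors of equal dimension
    (hypothetical stand-ins for the spatial regions of two feature maps).
    """
    if len(feats_a) != len(feats_b):
        raise ValueError("this sketch assumes both sets contain N regions")
    rng = random.Random(seed)
    dim = len(feats_a[0])
    total = 0.0
    for _ in range(num_projections):
        theta = random_unit_vector(dim, rng)
        # Project every region onto the 1-D direction, then sort.
        proj_a = sorted(sum(x * t for x, t in zip(f, theta)) for f in feats_a)
        proj_b = sorted(sum(x * t for x, t in zip(f, theta)) for f in feats_b)
        # Sorting matches regions by rank, so no permutation matrix is needed.
        total += sum((p - q) ** 2 for p, q in zip(proj_a, proj_b)) / len(proj_a)
    return total / num_projections


regions = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
shifted = [[x + 5.0, y + 5.0] for x, y in regions]
print(sliced_wasserstein(regions, list(reversed(regions))))  # order of regions is irrelevant
print(sliced_wasserstein(regions, shifted))
```

Because each projection is sorted before comparison, reordering the regions of one image leaves the distance unchanged, which is the role the permutation matrix plays in equation (2).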
Here the difference in marginal distribution is measured by the inner SWD distance together with the L2 distance, and the difference in class-conditional distribution is measured by the cross entropy between sample labels. Because the target domain samples carry no label information, the label predicted by the classifier is used as a proxy. Aligning the marginal distribution and the class-conditional distribution jointly introduces more class information and improves the discriminability between classes. Equation (3) is therefore converted into:
where $\lambda_{swd}$ represents the balance coefficient of the SWD distance, $\lambda_{l2}$ represents the balance coefficient of the L2 distance, and $\lambda_{cond}$ represents the balance coefficient of the cross entropy that measures the class-conditional distribution difference; $M$ represents the total number of projection matrices; $Z_i$ represents the feature matrix obtained by mapping the source domain tongue image sample to the hidden space $Z$; $Z_j$ represents the feature matrix obtained by mapping the target domain tongue image sample to the hidden space $Z$; the $m$-th projection matrix is formed by projecting the source domain tongue image sample or the target domain tongue image sample into the hidden space $Z$; the label term is the label of the source domain tongue image sample, and the cross-entropy term represents the class-conditional distribution difference.
Equation (4) contains three terms. The first term is the SWD distance between the source domain tongue image sample and the target domain image sample. Specifically, the feature $Z_i$ of the source sample and the feature $Z_j$ of the target sample are projected into several spaces shared by $Z_i$ and $Z_j$; in each shared space, the feature subsets of $Z_i$ and $Z_j$ are sorted and the Euclidean distance between them is then calculated directly, giving the distance between $Z_i$ and $Z_j$ in that shared space. Finally, the average of the distances calculated in the several shared spaces is taken as the SWD distance between $Z_i$ and $Z_j$. The function in the second term adds a global average pooling operation to the function $g$, which is equivalent to applying global average pooling to $Z_i$ or $Z_j$, so the second term is the L2 distance between the global average pooling result of $Z_i$ and that of $Z_j$. The third term is the cross entropy between the label of the source domain sample and the label predicted by classification for the target domain sample, and represents the difference in class-conditional distribution. Finally, equation (4) uses three hyperparameters as the balance coefficients of the three terms and forms their weighted sum.
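The weighted three-term cost described for equation (4) can be illustrated with a short sketch. The exact form of equation (4) is not reproduced here, so several details are assumptions: the function and parameter names are hypothetical, the L2 term is taken as a squared distance, random unit directions serve as projections, and the cross entropy is computed between the source label and the classifier's predicted class probabilities for the target sample.

```python
import math
import random


def _swd(z_i, z_j, num_projections, rng):
    """Sliced Wasserstein term: project, sort, average squared differences."""
    dim = len(z_i[0])
    total = 0.0
    for _ in range(num_projections):
        theta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(t * t for t in theta))
        p_i = sorted(sum(x * t for x, t in zip(row, theta)) / norm for row in z_i)
        p_j = sorted(sum(x * t for x, t in zip(row, theta)) / norm for row in z_j)
        total += sum((a - b) ** 2 for a, b in zip(p_i, p_j)) / len(p_i)
    return total / num_projections


def _global_average_pool(z):
    """Average the region features over the spatial dimension."""
    n = len(z)
    return [sum(col) / n for col in zip(*z)]


def sample_cost(z_i, z_j, label_i, probs_j,
                lam_swd=0.001, lam_l2=0.001, lam_cond=1.0,
                num_projections=32, seed=0):
    """Weighted sum of the SWD, pooled L2 and cross-entropy terms.

    z_i, z_j: N x ch region features of a source and a target sample.
    label_i: integer class label of the source sample.
    probs_j: predicted class probabilities for the target sample.
    """
    rng = random.Random(seed)
    swd = _swd(z_i, z_j, num_projections, rng)
    g_i, g_j = _global_average_pool(z_i), _global_average_pool(z_j)
    l2 = sum((a - b) ** 2 for a, b in zip(g_i, g_j))
    cond = -math.log(max(probs_j[label_i], 1e-12))  # clamp avoids log(0)
    return lam_swd * swd + lam_l2 * l2 + lam_cond * cond


z = [[1.0, 2.0], [3.0, 4.0]]
print(sample_cost(z, z, 0, [1.0, 0.0]))  # identical features, matching prediction
print(sample_cost(z, z, 1, [0.9, 0.1]))  # identical features, mismatching prediction
```

The default coefficients mirror the values stated for this embodiment; with identical features only the class-conditional term contributes, which shows how that term separates samples that look alike but are predicted to belong to different classes.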
Notably, the three terms of equation (4) are complementary: the SWD distance and the L2 distance provide local and global information respectively; both measure the difference in marginal distribution, whereas the cross-entropy term measures the difference in class-conditional distribution. Because the three terms complement one another, domain alignment can be carried out while class discriminability is preserved, which improves classification performance.
In this embodiment, the network structure of the feature extractor is ResNet-50, and the feature map extracted by the feature extractor $g$ retains its spatial structure for calculating the SWD distance. The feature extractor $g$ is pre-trained on ImageNet and the classifier $f$ is trained from scratch, so the learning rate of the classifier is 10 times that of the feature extractor $g$. The balance coefficients in the loss function are set to $\lambda_{swd} = 0.001$, $\lambda_{l2} = 0.001$ and $\lambda_{cond} = 1.0$.
The optimizer of this embodiment is SGD with momentum set to 0.9. The learning rate follows a linear schedule. The mini-batch size is 65, and training runs for 10000 iterations.
Step S5: calculate the inter-domain optimized transmission distance between the source domain and the target domain, using the sample optimized transmission distance as the cost measure between the source domain and the target domain;
Optimized transmission (Optimal Transport, OT) is a method of measuring the distance between two probability distributions that can take advantage of the geometry of the distributions. In general, OT searches over the possible couplings $\gamma \in \Pi(\mu_s, \mu_t)$ of two distributions $\mu_s$ and $\mu_t$ for the coupling with the minimum transmission cost:
where the two samples are drawn from the source domain distribution $\mu_s$ and the target domain distribution $\mu_t$ respectively; $c(\cdot, \cdot)$ is the cost between two samples, which measures the difference between them; $\Pi(\mu_s, \mu_t)$ is the set of joint distributions whose marginal distributions are $\mu_s$ and $\mu_t$. The discrete form of OT over empirical distributions can be defined as:
where $\mu_s$ and $\mu_t$ are now positive vectors and $\langle \cdot, \cdot \rangle_F$ is the Frobenius inner product. Let the numbers of samples in the two domains be $N_s$ and $N_t$; $C \geq 0$ is the $N_s \times N_t$ cost matrix between $\mu_s$ and $\mu_t$, each element of which is calculated by $c$. $c$ is a cost function that measures the distance between two samples, typically the L2 distance. By optimizing equation (6), i.e. minimizing the transmission cost, the optimal transmission flow is obtained. Equation (6) can be solved by linear programming.
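As a small worked illustration of the discrete problem, the sketch below minimizes $\langle \gamma, C \rangle_F$ for the special case of two equally sized sample sets with uniform weights $1/n$. In that case an optimal plan can always be chosen as a permutation matrix scaled by $1/n$, so a brute-force search over permutations is enough for tiny inputs. The function name is hypothetical, and the brute-force search is a stand-in for the linear programming solver mentioned in the text, not a practical method.

```python
from itertools import permutations


def exact_ot_uniform(cost):
    """Exact discrete OT for an n x n cost matrix with uniform marginals 1/n.

    Returns (minimum transmission cost, optimal plan). Runs in O(n!) time,
    so it is only meant for very small n.
    """
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    # Each matched pair carries mass 1/n, so rows and columns sum to 1/n.
    plan = [[0.0] * n for _ in range(n)]
    for i, j in enumerate(best_perm):
        plan[i][j] = 1.0 / n
    return best_total / n, plan


value, plan = exact_ot_uniform([[4.0, 1.0],
                                [2.0, 3.0]])
print(value)  # 1.5
print(plan)   # [[0.0, 0.5], [0.5, 0.0]]
```

Here the cheapest coupling sends the first source sample to the second target sample and vice versa, for a total cost of (1 + 2) / 2 = 1.5.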
In one embodiment, calculating the inter-domain optimal transmission distance between the source domain and the target domain in step S5 adopts a mini-batch strategy: each time, a mini-batch of size n is randomly drawn from each domain, and the optimal transmission between the two mini-batches is calculated as a proxy for the optimal transmission between the domains:
where, considering the calculation cost of equation (6), the invention randomly draws a mini-batch of size n from each domain each time and calculates the optimal transmission between the two mini-batches as a proxy for the optimal transmission between the domains, that is, equation (6) is converted into equation (7). The cost matrix $C_n$ is computed from the sample optimized transmission distances of step S4, which forms a hierarchical optimal transmission model.
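The mini-batch strategy can be sketched as a sampling step followed by the construction of an n x n cost matrix. The helper below is hypothetical: `sample_cost` stands in for whatever per-sample distance step S4 supplies, and the toy usage replaces it with an absolute difference between scalars purely for illustration.

```python
import random


def minibatch_cost_matrix(source, target, n, sample_cost, rng):
    """Draw n samples per domain and return their n x n pairwise cost matrix."""
    src_batch = rng.sample(source, n)
    tgt_batch = rng.sample(target, n)
    # Entry (i, j) is the cost between source sample i and target sample j.
    cost = [[sample_cost(s, t) for t in tgt_batch] for s in src_batch]
    return src_batch, tgt_batch, cost


# Toy usage: scalar "samples" and an absolute-difference stand-in for the
# per-sample optimal transmission distance.
rng = random.Random(0)
source_domain = [0.0, 1.0, 2.0, 3.0, 4.0]
target_domain = [0.5, 1.5, 2.5, 3.5]
src, tgt, C_n = minibatch_cost_matrix(
    source_domain, target_domain, 3, lambda s, t: abs(s - t), rng)
for row in C_n:
    print(row)
```

The resulting matrix is what the outer, inter-domain transmission problem consumes; only n x n pairwise costs are computed per iteration instead of one for every pair of samples across the full domains.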
As an improved technical solution, in another embodiment, step S5 calculates the inter-domain optimized transmission distance between the source domain and the target domain by another method, which uses mini-batch based unbalanced optimal transmission instead of equation (7):
where $D_\varphi$ denotes Csiszár divergences, KL is the Kullback-Leibler divergence, and the two marginal terms are the marginal distributions of $\gamma_n$. Here $\tau$ is the marginal penalization coefficient (Marginal Penalization) and $\varepsilon$ is the regularization coefficient (Regularization Coefficient), with $\varepsilon \geq 0$; specifically, $\varepsilon = 0.01$ and $\tau = 0.5$ can be set.
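The text does not say how the unbalanced problem is solved. One common choice for entropy-regularized unbalanced optimal transport with KL marginal penalties is a generalized Sinkhorn scaling iteration, sketched below under that assumption. The function name, the iteration count and the uniform mini-batch weights are illustrative, and the kernel is taken as exp(-C/ε), which fixes the reference measure of the entropic term by assumption.

```python
import math


def unbalanced_sinkhorn(cost, a, b, eps=0.01, tau=0.5, num_iters=500):
    """Generalized Sinkhorn scaling for entropic unbalanced OT.

    cost: n x m cost matrix; a, b: positive weights of the two mini-batches.
    eps: regularization coefficient; tau: marginal penalization coefficient.
    Returns the transmission plan as an n x m list of lists.
    """
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    # The exponent tau / (tau + eps) softens the marginal constraints;
    # it tends to 1 (balanced Sinkhorn) as tau grows.
    power = tau / (tau + eps)
    for _ in range(num_iters):
        u = [(a[i] / sum(kernel[i][j] * v[j] for j in range(m))) ** power
             for i in range(n)]
        v = [(b[j] / sum(kernel[i][j] * u[i] for i in range(n))) ** power
             for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy cost scaled to [0, 1]; larger costs with eps = 0.01 would underflow
# exp(-C / eps) and call for a log-domain implementation instead.
plan = unbalanced_sinkhorn([[0.0, 1.0], [1.0, 0.0]], [0.5, 0.5], [0.5, 0.5])
for row in plan:
    print(row)
```

With the stated values ε = 0.01 and τ = 0.5, almost all mass stays on the zero-cost diagonal, and the row and column sums are only approximately equal to the prescribed weights because the marginal constraints are penalized rather than enforced.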
Thus, in each mini-batch, each domain is a collection of samples. Equation (8) serves as the first layer, the optimal transmission between the source domain and the target domain, while each entry of the cost matrix $C_n$ in equation (8) is calculated by equation (4) from a pair consisting of a source domain tongue image sample and a target domain image sample. The second layer is therefore the optimal transmission between the source domain tongue image sample and the target domain image sample, where each sample is a collection of spatial regions of its image feature map. The two layers of optimal transmission together form the deep hierarchical optimal transmission model (Deep Hierarchical Optimal Transport, DeepHOT), as shown in fig. 2. Thus, for a given mini-batch, the objective of DeepHOT is:
as an improved technical solution, in another embodiment, step S5 calculates the inter-domain optimized transmission distance between the source domain and the target domain by another method, which adds the classification cross-entropy loss function of the source domain to the unbalanced optimal transmission loss:
the purpose is to avoid "catastrophic forgetting" (Catastrophic Forgetting) on the source domain, so the final optimization objective includes the classification cross-entropy loss L of the source domain.
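How the two parts of the final objective combine can be shown with a minimal sketch. The relative weighting of the two parts is not stated, so an unweighted sum is assumed here; the function names are hypothetical, and `ot_loss` is simply a number standing in for the inter-domain transmission loss of the current mini-batch.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross entropy of softmax(logits) against an integer class label."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def total_loss(source_logits, source_labels, ot_loss):
    """Mean source-domain cross entropy plus the inter-domain OT loss."""
    ce = sum(softmax_cross_entropy(l, y)
             for l, y in zip(source_logits, source_labels))
    return ce / len(source_labels) + ot_loss


print(softmax_cross_entropy([0.0, 0.0], 0))         # log(2), about 0.6931
print(total_loss([[2.0, 0.0], [0.0, 2.0]], [0, 1], ot_loss=0.5))
```

Keeping the supervised term on labelled source samples in every update is what counters forgetting: the transmission loss alone would allow the features to drift toward the target domain at the expense of source accuracy.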
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (7)
1. A cross-domain tongue image classification method based on depth layering optimal transmission, characterized by comprising the following steps:
S1, collecting tongue image samples from a plurality of different domains as a training set;
S2, performing feature extraction on the source domain tongue image samples in the training set by using a deep neural network, and obtaining a source domain image feature map formed by the corresponding source domain tongue image sample features;
performing feature extraction on the target domain tongue image samples in the training set by using the deep neural network, and obtaining a target domain image feature map formed by the corresponding target domain tongue image sample features;
S3, partitioning the source domain tongue image sample features in the source domain image feature map into blocks to obtain a source domain image feature set corresponding to the source domain tongue image sample;
partitioning the target domain tongue image sample features in the target domain image feature map into blocks to obtain a target domain image feature set corresponding to the target domain tongue image sample;
S4, calculating an optimized transmission distance between the source domain image feature set corresponding to each source domain tongue image sample and the target domain image feature set corresponding to the target domain tongue image sample, and taking the optimized transmission distance as the sample optimized transmission distance between the source domain tongue image sample and the target domain image sample;
S5, taking the sample optimized transmission distance as the cost measure between the source domain and the target domain, and calculating an inter-domain optimized transmission distance between the source domain and the target domain;
S6, calculating a softmax cross-entropy loss from the source domain tongue image sample feature values extracted in step S2 and taking it as one part of a loss function; taking the inter-domain optimized transmission distance as another part of the loss function to construct a classification loss function, and training a classifier with the classification loss function;
S7, classifying the tongue image sample to be verified by using the trained classifier.
2. The method for classifying cross-domain tongue images based on depth layering optimal transmission according to claim 1, wherein the step S4 of calculating the optimal transmission distance between the source domain image feature set corresponding to each source domain tongue image sample and the target domain image feature set corresponding to the target domain tongue image sample comprises the following steps:
the EMD distance and the L2 distance are jointly used as the cost function between the source domain image feature set and the target domain image feature set, and the optimal transmission distance between the two feature sets is calculated, the cost function being:
wherein $g$ represents the feature extractor of the deep neural network;
the source domain image feature map is extracted by $g$ from the $i$-th source domain tongue image sample, and $H_i$ and $W_i$ respectively represent the width and the height of the source domain image feature map extracted from the $i$-th source domain tongue image sample;
the target domain image feature map is extracted by $g$ from the $j$-th target domain tongue image sample, and $H_j$ and $W_j$ respectively represent the width and the height of the target domain image feature map extracted from the $j$-th target domain tongue image sample;
the joint term represents the joint image feature map of the source domain image feature map and the target domain image feature map;
$\gamma_{in}$ represents the optimal transmission scheme between any one source domain tongue image sample and any one target domain tongue image sample with respect to the corresponding image feature sets, and $C_{in}$ represents the cost matrix between the image feature sets of any one source domain tongue image sample and any one target domain tongue image sample; $\langle \gamma_{in}, C_{in} \rangle_F$ represents the Frobenius inner product of $\gamma_{in}$ and $C_{in}$; the two pooled terms represent the global average pooling results, along the spatial dimension, of the source domain image feature map extracted from the $i$-th source domain tongue image sample and of the target domain image feature map extracted from the $j$-th target domain tongue image sample respectively; $ch$ represents the number of channels.
3. The method for classifying cross-domain tongue images based on depth layering optimal transmission according to claim 2, wherein calculating, in step S4, the optimal transmission distance between the source domain image feature set corresponding to each source domain tongue image sample and the target domain image feature set corresponding to the target domain tongue image sample further comprises the following steps:
the SWD distance and the L2 distance are jointly used as the cost function between the source domain image feature set and the target domain image feature set, and the optimal transmission distance between the two feature sets is calculated, the cost function being:
$P$ represents a permutation matrix taken from the set of all permutation matrices; $U_i$ represents the features corresponding to the source domain tongue image sample after transformation into a common high-dimensional hidden space; $U_j$ represents the features corresponding to the target domain tongue image sample after transformation into the same common high-dimensional hidden space; $T$ is the matrix transpose symbol.
4. The method for classifying cross-domain tongue images based on depth layering optimal transmission according to claim 3, wherein calculating, in step S4, the optimal transmission distance between the source domain image feature set corresponding to each source domain tongue image sample and the target domain image feature set corresponding to the target domain tongue image sample further comprises the following steps:
the SWD distance, the L2 distance and the cross entropy of the class-conditional distribution difference are jointly used as the cost function between the two feature sets, and the optimal transmission distance between the two feature sets is calculated, the cost function being:
$\lambda_{swd}$ represents the balance coefficient of the SWD distance, $\lambda_{l2}$ represents the balance coefficient of the L2 distance, and $\lambda_{cond}$ represents the balance coefficient of the cross entropy of the class-conditional distribution difference; the label term is the label of the source domain tongue image sample, and the cross-entropy term represents the class-conditional distribution difference; $M$ represents the total number of projection matrices; $Z_i$ represents the feature matrix obtained by mapping the source domain tongue image sample to the hidden space $Z$; $Z_j$ represents the feature matrix obtained by mapping the target domain tongue image sample to the hidden space $Z$; the $m$-th projection matrix is formed by projecting the source domain tongue image sample or the target domain tongue image sample into the hidden space $Z$.
5. The method for classifying cross-domain tongue images based on depth layering optimal transmission according to claim 4, wherein calculating, in step S5, the inter-domain optimal transmission distance between the source domain and the target domain includes using a mini-batch strategy, specifically:
each time, a mini-batch of size n is randomly drawn from the source domain tongue image samples and from the target domain tongue image samples, and the optimal transmission between the two mini-batches is calculated as the inter-domain optimal transmission distance:
wherein $OT_n$ represents the inter-domain optimized transmission distance; the two distribution terms represent the matrices formed by the distribution of the source domain tongue image samples and by the distribution of the target domain tongue image samples, and the joint term represents their joint distribution; $\gamma_n$ represents the $n \times n$ matrix formed by the optimal transmission scheme between any one source domain tongue image sample and any one target domain tongue image sample with respect to the corresponding image feature sets; $C_n$ represents the $n \times n$ matrix formed by the sample optimal transmission distances between any one source domain tongue image sample and any one target domain image sample; $\langle \gamma_n, C_n \rangle_F$ represents the Frobenius inner product of $\gamma_n$ and $C_n$.
6. A method for classifying a tongue image across domains based on depth layering optimal transmission according to claim 5, wherein in step S5, calculating an inter-domain optimal transmission distance between the source domain and the target domain further comprises using unbalanced optimal transmission.
7. The method for classifying cross-domain tongue images based on depth layering optimal transmission according to claim 1, wherein calculating, in step S5, the inter-domain optimal transmission distance between the source domain and the target domain further comprises adding a classification cross-entropy loss function of the source domain to the unbalanced optimal transmission loss.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310252527.2A CN116310545A (en) | 2023-03-16 | 2023-03-16 | Cross-domain tongue image classification method based on depth layering optimal transmission |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310252527.2A CN116310545A (en) | 2023-03-16 | 2023-03-16 | Cross-domain tongue image classification method based on depth layering optimal transmission |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116310545A (en) | 2023-06-23 |
Family
ID=86781093
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310252527.2A Pending CN116310545A (en) | 2023-03-16 | 2023-03-16 | Cross-domain tongue image classification method based on depth layering optimal transmission |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116310545A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116566743A (en) * | 2023-07-05 | 2023-08-08 | 北京理工大学 | Account alignment method, equipment and storage medium |
CN116566743B (en) * | 2023-07-05 | 2023-09-08 | 北京理工大学 | Account alignment method, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||