CN117557844A - Multi-model fusion tongue image intelligent classification method based on data enhancement - Google Patents

Multi-model fusion tongue image intelligent classification method based on data enhancement

Info

Publication number
CN117557844A
CN117557844A
Authority
CN
China
Prior art keywords
data set
image classification
tongue image
training
trainer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311513218.2A
Other languages
Chinese (zh)
Other versions
CN117557844B (en)
Inventor
刘锡铃
龙海侠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hainan Normal University
Original Assignee
Hainan Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hainan Normal University
Priority to CN202311513218.2A
Publication of CN117557844A
Application granted
Publication of CN117557844B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/098 Distributed learning, e.g. federated learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V 10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L 9/008 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/03 Recognition of patterns in medical or anatomical images
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of tongue image classification and discloses a multi-model fusion tongue image intelligent classification method based on data enhancement, comprising the following steps: constructing an original data set and expanding it into a training data set, then adding a prediction label and a real label to each training sample in the training data set; selecting difficult-to-train samples, placing them into a difficult-to-train sample queue, and performing reinforcement training on the queue; extracting and fusing the features of the training samples; and exchanging data between each local trainer and the global trainer, retraining on the exchanged data, classifying tongue images according to the training results, carrying out several training data set comparison experiments, and analyzing the comparison experiment data. The method effectively addresses the difficulty of guaranteeing classification accuracy when the available data volume is insufficient, broadens the usage scenarios of tongue image classification, avoids the scene restrictions of current tongue image classification approaches, and thereby improves the flexibility, applicability and reliability of tongue image classification.

Description

Multi-model fusion tongue image intelligent classification method based on data enhancement
Technical Field
The invention belongs to the technical field of tongue image classification, and relates to a multi-model fusion tongue image intelligent classification method based on data enhancement.
Background
In recent years, with the continuous development of artificial intelligence technology, intelligent tongue image classification has played an increasingly important role in the field of traditional Chinese medicine diagnosis. Tongue images are one of the important bases of traditional Chinese medicine diagnosis: the type, degree and trend of a disease can be judged through observation and analysis of the tongue image.
In the intelligent tongue image classification task, the training set can be expanded with data enhancement methods, which effectively increases the diversity of the training data and improves the robustness of the model. However, current models place high demands on sample size: when samples are insufficient, the parameters are difficult to train effectively, which often causes underfitting, so the model struggles to reach a good result. Current tongue image classification therefore has the following defects: 1. Classification accuracy is difficult to guarantee when the data volume is insufficient: cases held by medical organizations are confidential, and the structured data actually usable for machine learning varies significantly. Large medical organizations may hold many case resources, but for small medical organizations cases tend to be limited.
2. Federated learning is difficult to carry out under limited medical case resources, and federated model retraining is performed without adding pseudo labels to the difficult-to-train samples.
3. Feature extraction is incomplete: algorithms such as residual networks, VGG16 and LSTM extract static features of images and text well, but they struggle to effectively model the spatial relations among different parts of an image.
4. The current data transmission volume is huge, causing heavy communication overhead; communication links are easily blocked, so tongue classification efficiency cannot be guaranteed.
Disclosure of Invention
In view of this, to solve the problems set forth in the background art above, a multi-model fusion tongue image intelligent classification method based on data enhancement is proposed.
The aim of the invention is achieved by the following technical scheme. The invention provides a multi-model fusion tongue image intelligent classification method based on data enhancement, comprising the following steps. Step 1: take each tongue coating image as a training sample to form an original data set, expand the original data set to obtain a training data set, and add a prediction label and a real label to each training sample in the training data set.
Step 2: extract features of different dimensions from each training sample in the training data set with several network structures, and fuse them to obtain fused features.
Step 3: select the difficult-to-train samples of each trainer, build a difficult-to-train sample queue from the selection, and perform reinforcement training on the queue.
Step 4: extract the attribute label of each trainer, mark the trainers whose attribute labels are global and local as global trainers and local trainers respectively, exchange data between each local trainer and the global trainer, retrain on the exchanged data, and classify tongue images according to the training results.
Step 5: carry out several training data set comparison experiments to obtain comparison experiment data, analyze the comparison experiment data to obtain comparison experiment results, and output the comparison experiment results.
Preferably, the fusion in step 2 is implemented as follows: select training samples from the training data set by random voting, and mark each selected training sample as a target sample.
Extract static features of the target sample through a residual network.
Extract spatial features among the internal main structures of the target sample through a capsule network.
Reduce the dimensionality of the static features extracted by the residual network and the spatial features extracted by the capsule network through several convolution layers.
Fuse the dimension-reduced static features and spatial features through a fusion algorithm.
Preferably, the fusion algorithm is specifically expressed as X_merge = …, where x is the training sample data of trainer i, i is the trainer number (i = 1, 2, …), and X_merge represents the fused feature.
Preferably, the difficult-to-train samples are selected through a training sample selection model, specifically expressed as H ← x if m_j ≤ M, where j is the training sample number in the training data set (j = 1, 2, …); M is the set reference minimum cosine value between difficult-to-train sample features and the fused features; m_j is the cosine distance between the prediction label and the real label of the j-th training sample; when m_j ≤ M, the training sample is collected into the difficult-to-train sample queue H; L is the label number; and avg′_{j′} represents the mean of all real-sample features corresponding to the real labels in the training samples of trainer i.
Preferably, the data exchange between each local trainer and the global trainer includes: taking the residual network and the capsule network as the basic network architecture of each trainer under a federated learning model; based on the federated learning model, each local trainer encrypts its own fused features through a homomorphic encryption model to obtain its encryption parameter matrix P_g, where g is the local trainer number, g = 1, 2, ….
A restoration parameter matrix W_g of each local trainer is constructed based on the encryption parameter matrix, where the sig function characterizes the sign of an element in the encryption matrix, a is the dimension of the matrix to be encrypted, d is the length of the vector in that dimension, y is a set natural constant, and x′ is a matrix element with x′ = f_{a,d}.
A decryption parameter matrix is set, where the parameter Q is the decryption function of the homomorphic encryption matrix and ⊙ denotes the inner product of corresponding matrix elements.
According to the encryption parameter matrix and the decryption parameter matrix of each local trainer's fused features, each local trainer exchanges data with the global trainer in an asynchronous transmission mode.
Preferably, the homomorphic encryption model is specifically expressed as follows, where ⌈·⌉ denotes the round-up (ceiling) symbol.
Preferably, performing several training data set comparison experiments comprises the following steps. A1: take the current multi-model fusion tongue image classification as the target tongue image classification rule, and extract the currently accumulated and formulated tongue image classification rules from a tongue image classification database as the reference tongue image classification rules.
A2: set up the comparison data sets, take each comparison data set as an experimental data set, and formulate the evaluation indexes.
A3: randomly extract Y_0 local trainers as the experimental local trainers and Y_1 global trainers as the experimental global trainers, and formulate an experimental data set allocation strategy.
A4: carry out fusion comparison experiments on each experimental data set based on the target tongue image classification rule and each reference tongue image classification rule, and record the fusion comparison experiment data.
A5: update the test data set, take the updated test data set as the ablation experiment data set, carry out ablation comparison experiments on it based on the target tongue image classification rule and each reference tongue image classification rule, and record the ablation comparison experiment data.
Preferably, formulating the experimental data set allocation strategy includes: merge the experimental global trainers and the experimental local trainers into the set of experimental trainers, sort the experimental trainers with the experimental global trainers first, and take the sorted order as each experimental trainer's number.
Allocate a share of the experimental data set to each experimental trainer according to a random allocation rule to obtain each experimental trainer's allocation share, where k′_q is the allocation share of the experimental data set for the q-th experimental trainer, q is the experimental trainer number (q = 1, 2, …), and k′_{q−1} is the allocation share for the (q−1)-th experimental trainer.
Set up a training data set and a test data set, set their allocation ratios, denote the allocation ratio of the training data set as k_train and that of the test data set as k_test, and take k_train and k_test as each experimental trainer's allocation scales for the training data set and the test data set.
Take the allocation shares of the experimental data set together with the allocation scales of the training and test data sets as the experimental data set allocation strategy.
Preferably, analyzing the comparison experiment data includes: extract from the fusion comparison experiment data the improvement value of each evaluation index for the target tongue image classification rule and each reference tongue image classification rule on each experimental data set.
Take the corresponding difference between the target tongue image classification rule's and each reference rule's improvement values of each evaluation index on each experimental data set, and use the difference as the improvement deviation.
If the improvement deviation of the target tongue image classification rule relative to a certain reference rule for a certain evaluation index on a certain experimental data set is greater than 0, take that reference rule as a first optimized tongue image classification rule, the experimental data set as a first optimized experimental data set of that rule, and the evaluation index as an optimization evaluation index of that data set.
Extract the improvement deviation U_rvl of each optimization evaluation index for each first optimized experimental data set under each first optimized tongue image classification rule relative to the target rule, where r is the first optimized rule number and v is the first optimized experimental data set number (v = 1, 2, …), and statistically obtain the fusion precision-optimization tendency ψ_opt of the target tongue image classification rule.
From the ablation comparison experiment data, screen and extract the decline value of each evaluation index for the target tongue image classification rule and each reference rule on each experimental data set, and, following the same statistical method as for ψ_opt, obtain the ablation precision-optimization tendency ψ′_opt of the target tongue image classification rule.
Statistically obtain the comprehensive precision-optimization tendency ψ_comp of the target tongue image classification rule, where ψ_0 and ψ_1 are the set reference precision-optimization tendencies of the fusion experiment and the ablation experiment respectively, and take ψ_opt, ψ′_opt and ψ_comp as the comparison experiment analysis results.
Preferably, the specific statistical formula of the fusion precision-optimization tendency of the target tongue image classification rule is as follows: ψ_opt = …, where K_ref is the number of reference tongue image classification rules, U_0 is the set reference evaluation index improvement value, and …, σ and ε are respectively the number of first optimized tongue image classification rules, the number of first optimized experimental data sets and the number of optimization evaluation indexes.
Compared with the prior art, the invention has the following beneficial effects. (1) The invention extracts the static features of an image with a residual network and the spatial features among its internal main structures with a capsule network, reduces the dimensionality of the residual-network and capsule-network features through several convolution layers, and finally fuses the extracted features. The residual network and the capsule network serve as the basic network architecture of each trainer of the federated learning model, and federated learning is carried out on this architecture. This effectively solves the difficulty of guaranteeing classification accuracy when the data volume is insufficient, expands the usage scenarios of tongue image classification, avoids the scene restrictions of current tongue image classification approaches, and further improves the flexibility, applicability and reliability of tongue image classification.
(2) By adding a prediction label and a real label to each training sample, the invention makes federated learning possible and convenient under the background of limited case resources, realizes pseudo-labeling of the difficult-to-train samples so that they can subsequently be retrained repeatedly by the federated learning model, and further improves the accuracy and effectiveness of subsequent tongue image classification.
(3) The invention extracts static features of the image with the residual network and spatial features among the main internal structures with the capsule network, avoiding the incompleteness of current image feature extraction, realizing spatial feature extraction for different parts of the tongue coating image, and improving the convenience and feasibility of building the federated learning model.
(4) When transmitting data between the local trainers and the global trainer, the invention sets the encryption parameter matrix and the decryption parameter matrix based on the federated learning model and the homomorphic encryption model and transmits data asynchronously. This effectively reduces the current transmission volume and hence the communication overhead, avoids blocking of communication links, guarantees tongue classification efficiency, and at the same time reduces the probability of loss and transmission errors during data transfer.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the steps of the method of the present invention.
FIG. 2 is a schematic diagram of a federal learning model according to the present invention.
FIG. 3 is a schematic diagram of the data exchange transmission according to the present invention.
FIG. 4 is a diagram illustrating the extraction of network structural features of a trainer according to the present invention.
FIG. 5 is a comparison chart of the fusion experiments of the tongue image classification rules on the 2CLS data set according to the present invention.
FIG. 6 is a comparison chart of the fusion experiments of the tongue image classification rules on the ZXSFL data set according to the present invention.
FIG. 7 is a comparison chart of the ablation experiments of the tongue image classification rules on the 2CLS data set according to the present invention.
FIG. 8 is a comparison chart of the ablation experiments of the tongue image classification rules on the ZXSFL data set according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to FIG. 1, the invention provides a multi-model fusion tongue image intelligent classification method based on data enhancement, comprising the following steps. Step 1: take each tongue coating image as a training sample to form an original data set, expand the original data set to obtain a training data set, and add a prediction label and a real label to each training sample in the training data set.
In a specific embodiment, the original data set is expanded into the training data set as follows: each training sample in the original data set is augmented by operations including, but not limited to, flipping and mirroring, and the augmented samples are then combined with the original training samples to form the training data set.
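For illustration, a minimal sketch of this expansion step is given below. It assumes the common reading of "flipping and mirroring" as horizontal, vertical and combined flips, which matches the fourfold expansion reported in Table 1 (831 to 3324, 1778 to 7112); the function names are illustrative, not part of the patent.

```python
# Minimal sketch of the data-set expansion step: each original tongue
# coating image is kept and augmented with a horizontal flip, a vertical
# flip, and a combined flip (an assumed reading of "flipping and
# mirroring"), giving a 4x expansion consistent with Table 1.
from PIL import Image, ImageOps

def expand_sample(img: Image.Image) -> list[Image.Image]:
    return [
        img,                                  # original sample
        ImageOps.mirror(img),                 # left-right mirror
        ImageOps.flip(img),                   # top-bottom flip
        ImageOps.flip(ImageOps.mirror(img)),  # both axes
    ]

def build_training_set(originals: list[Image.Image]) -> list[Image.Image]:
    training_set: list[Image.Image] = []
    for img in originals:
        training_set.extend(expand_sample(img))
    return training_set
```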
By adding a prediction label and a real label to each training sample, the embodiment of the invention makes federated learning possible and convenient under the background of limited case resources, realizes pseudo-labeling of the difficult-to-train samples so that they can subsequently be retrained repeatedly by the federated learning model, and further improves the accuracy and effectiveness of subsequent tongue classification.
Step 2: extract features of different dimensions from each training sample in the training data set with several network structures, and fuse them to obtain fused features.
The fusion in step 2 is implemented as follows: select training samples from the training data set by random voting, and mark each selected training sample as a target sample.
Extract static features of the target sample through a residual network.
Extract spatial features among the internal main structures of the target sample through a capsule network.
Reduce the dimensionality of the static features extracted by the residual network and the spatial features extracted by the capsule network through several convolution layers.
Fuse the dimension-reduced static features and spatial features through a fusion algorithm.
In a specific embodiment, the specific fusion process is shown in FIG. 4. First, a batch of samples is obtained by random voting, and the static features and spatial features of the samples are extracted with the residual network and the capsule network respectively. Three basic convolutions are then applied, with kernel sizes of 3×3 and 2×2 and strides of 2×2, so that the features extracted by the residual network and the capsule network are dimension-reduced through three simple convolution layers; finally, the two are fused through the fusion algorithm.
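A sketch of this two-branch reduction-and-fusion stage is given below, assuming PyTorch-style modules. The channel widths, the uniform stride of 2 and the concatenation used for the final fusion are assumptions (the patent's fusion formula is not reproduced here); `resnet_feats` and `capsule_feats` stand in for the residual-network and capsule-network outputs.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Sketch of the dimension-reduction and fusion stage: three plain
    convolutions (kernel sizes 3x3, 2x2, 2x2 per the embodiment; the
    stride of 2 and the channel widths are assumptions) applied to each
    branch, followed by fusion, assumed here to be channel concatenation."""
    def __init__(self, in_ch: int = 256):
        super().__init__()
        def reduce() -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(in_ch, 128, kernel_size=3, stride=2),
                nn.Conv2d(128, 64, kernel_size=2, stride=2),
                nn.Conv2d(64, 32, kernel_size=2, stride=2),
            )
        self.reduce_static = reduce()   # residual-network branch
        self.reduce_spatial = reduce()  # capsule-network branch

    def forward(self, resnet_feats: torch.Tensor,
                capsule_feats: torch.Tensor) -> torch.Tensor:
        s = self.reduce_static(resnet_feats).flatten(1)
        p = self.reduce_spatial(capsule_feats).flatten(1)
        return torch.cat([s, p], dim=1)  # X_merge, the fused feature
```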
The embodiment of the invention extracts static features of the image with the residual network and spatial features among the main internal structures with the capsule network, avoiding the incompleteness of current image feature extraction, realizing spatial feature extraction for different parts of the tongue coating image, and further improving the convenience and feasibility of building the federated learning model.
Further, the fusion algorithm is specifically expressed as X_merge = …, where x is the training sample data of trainer i, i is the trainer number (i = 1, 2, …), and X_merge represents the fused feature.
Step 3: select the difficult-to-train samples of each trainer, build a difficult-to-train sample queue from the selection, and perform reinforcement training on the queue.
Specifically, the difficult-to-train samples are selected through a training sample selection model, specifically expressed as H ← x if m_j ≤ M, where j is the training sample number in the training data set (j = 1, 2, …); M is the set reference minimum cosine value between difficult-to-train sample features and the fused features; m_j is the cosine distance between the prediction label and the real label of the j-th training sample; when m_j ≤ M, the training sample is collected into the difficult-to-train sample queue H; L is the label number; and avg′_{j′} represents the mean of all real-sample features corresponding to the real labels in the training samples of trainer i.
It is to be noted that, assuming f_{ix} denotes the features extracted by trainer i on sample x, with f_{ix} = G_{i,α,β}, the calculation formula of avg′_{j′} is as follows: …, where dim is the feature fusion dimension of each trainer (the fusion dimensions of all trainers are the same), z is the dimension of the vector corresponding to the spatial features, and k is the dimension of the vector corresponding to the static features.
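The selection rule H ← x if m_j ≤ M can be sketched as follows. The patent calls m_j a cosine distance but compares it against a reference minimum cosine value, so this sketch reads m_j as a cosine similarity between the prediction-label vector and the real-label vector; the vector form of the labels and all names are assumptions.

```python
import torch
import torch.nn.functional as F

def select_hard_samples(pred_labels: torch.Tensor,
                        real_labels: torch.Tensor,
                        samples: list,
                        M: float) -> list:
    """Sketch of the hard-sample selection model H <- x if m_j <= M:
    m_j is read here as the cosine similarity between the prediction-label
    vector and the real-label vector of sample j (an assumption); samples
    whose similarity falls at or below the reference minimum cosine value
    M are pushed into the difficult-to-train sample queue."""
    hard_queue = []
    cos = F.cosine_similarity(pred_labels, real_labels, dim=1)
    for j, m_j in enumerate(cos.tolist()):
        if m_j <= M:  # at or below the reference minimum cosine value
            hard_queue.append(samples[j])
    return hard_queue
```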
Step 4: extract the attribute label of each trainer, mark the trainers whose attribute labels are global and local as global trainers and local trainers respectively, exchange data between each local trainer and the global trainer, retrain on the exchanged data, and classify tongue images according to the training results.
Specifically, the data exchange between each local trainer and the global trainer includes the following. B1: take the residual network and the capsule network as the basic network architecture of each trainer under the federated learning model; based on the federated learning model, each local trainer encrypts its own fused features through the homomorphic encryption model to obtain its encryption parameter matrix P_g, where g is the local trainer number, g = 1, 2, ….
In one embodiment, the federated learning model is illustrated in FIG. 2: each local trainer has its own private data set, and after training on that private data set it participates in data exchange with the global trainer.
Understandably, the homomorphic encryption model is specifically expressed as follows, where ⌈·⌉ denotes the round-up (ceiling) symbol, y is a set natural constant, f_{a,d} is a matrix element, a is the dimension of the matrix to be encrypted, and d is the length of the vector in that dimension.
In a specific embodiment, the encryption precision depends entirely on the amplification ratio: the smaller the ratio, the more precision is lost. To guarantee the encryption precision, y may take the specific value 7.
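A sketch of the amplify-and-round idea behind this encoding is given below. The exact homomorphic encryption formula is not reproduced here, so the amplification ratio 10^y is an assumption; the sketch only illustrates why a larger ratio (here y = 7) bounds the precision loss of each encoded matrix element f_{a,d}.

```python
import numpy as np

# Sketch of the scaling/rounding idea: matrix elements are amplified by
# a factor derived from the set natural constant y before rounding up,
# so the amplification ratio bounds the precision loss. Using 10**y as
# the ratio is an assumption, not the patent's exact encoding formula.
Y = 7  # the set natural constant from the embodiment

def encode(params: np.ndarray, y: int = Y) -> np.ndarray:
    return np.ceil(params * 10**y).astype(np.int64)  # round up, per the patent

def decode(encoded: np.ndarray, y: int = Y) -> np.ndarray:
    return encoded.astype(np.float64) / 10**y

w = np.array([0.1234567891, -0.5])
assert np.max(np.abs(decode(encode(w)) - w)) <= 10**-Y  # loss bounded by the ratio
```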
B2: construct the restoration parameter matrix W_g of each local trainer based on the encryption parameter matrix, where the sig function characterizes the sign of an element in the encryption matrix and x′ is a matrix element with x′ = f_{a,d}.
B3: set the decryption parameter matrix, where the parameter Q is the decryption function of the homomorphic encryption matrix and ⊙ denotes the inner product of corresponding matrix elements.
B4: according to the encryption parameter matrix and the decryption parameter matrix of each local trainer's fused features, each local trainer exchanges data with the global trainer in an asynchronous transmission mode. As shown in FIG. 3, a dotted line indicates that some local trainers and the global trainer do not communicate in the current communication round.
It should be noted that each local trainer uploads its encrypted gradient, and the global experience pool stores the gradients uploaded by the local trainers. After the global experience pool is full, the network parameters of the global trainer are securely aggregated and updated and all samples in the pool are cleared; the aggregation mode sums the gradients uploaded by the local trainers.
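A sketch of this experience-pool aggregation is given below, assuming PyTorch tensors for the gradients. The pool capacity and the learning rate are assumptions; the sum-then-update-then-clear cycle follows the description above, while the encryption step is omitted.

```python
import torch

class GlobalExperiencePool:
    """Sketch of the global trainer's experience pool: encrypted gradients
    uploaded by local trainers are buffered; once the pool is full, the
    buffered gradients are summed (the patent's aggregation mode), the
    global parameters are updated, and the pool is emptied. The capacity
    and learning rate are assumptions."""
    def __init__(self, capacity: int = 32, lr: float = 0.01):
        self.capacity, self.lr = capacity, lr
        self.pool: list[list[torch.Tensor]] = []

    def upload(self, grads: list[torch.Tensor],
               global_params: list[torch.Tensor]) -> None:
        self.pool.append(grads)
        if len(self.pool) >= self.capacity:        # pool is full
            summed = [torch.stack(g).sum(dim=0)    # sum uploaded gradients
                      for g in zip(*self.pool)]
            with torch.no_grad():
                for p, g in zip(global_params, summed):
                    p -= self.lr * g               # update global parameters
            self.pool.clear()                      # empty the pool
```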
In a specific embodiment, the local trainer can be understood as a client and the global trainer as a server. A missing communication between a local trainer and the global trainer may occur because the local trainer's computing capacity is insufficient to complete the gradient computation within the specified time, or because the global trainer randomly selects only part of the local trainers for gradient broadcast, so that some local trainers cannot update their parameters in that round.
When transmitting data between the local trainers and the global trainer, the embodiment of the invention sets the encryption parameter matrix and the decryption parameter matrix based on the federated learning model and the homomorphic encryption model and transmits data asynchronously, which effectively reduces the transmission volume and hence the communication overhead, avoids blocking of communication links, guarantees tongue classification efficiency, and at the same time reduces the probability of loss and transmission errors during data transfer.
Step 5: carry out several training data set comparison experiments to obtain comparison experiment data, analyze the comparison experiment data to obtain comparison experiment results, and output the comparison experiment results.
In the embodiment of the invention, performing several training data set comparison experiments comprises the following steps. A1: take the current multi-model fusion tongue image classification as the target tongue image classification rule, and extract the currently accumulated and formulated tongue image classification rules from a tongue image classification database as the reference tongue image classification rules.
In a specific embodiment, the target tongue image classification rule is abbreviated as the AFME algorithm, and the CapsNet+LSTM, ResNet+BiLSTM and ResNetBlock+CapsNet algorithms are selected as the reference tongue image classification rules.
A2: set up the comparison data sets, take each comparison data set as an experimental data set, and formulate the evaluation indexes.
In a specific embodiment, the medical data sets 2CLS and ZXSFL are selected as the experimental data sets, and each data set is expanded by flipping about the X-axis, about the Y-axis, and about both axes simultaneously; the basic information of each experimental data set is shown in Table 1.
Table 1 Basic information of the experimental data sets

Data set name | Data type      | Original data set size | Expanded data set size | Categories
2CLS          | Image data set | 831                    | 3324                   | 4
ZXSFL         | Image data set | 1778                   | 7112                   | 2
In another embodiment, four indexes, Accuracy, Precision, Recall-Score and F1-Score, are selected as the evaluation indexes, with Recall-Score abbreviated as Recall and F1-Score as F1. These indexes are computed with existing mature techniques and are not described here; the tongue image classification rules and evaluation indexes are shown in Table 2, after which a computation sketch of the four indexes follows.
Table 2 comparison algorithm and evaluation index
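As a reference for the four indexes, a sketch of their computation with scikit-learn is given below; macro averaging for the multi-class case is an assumption, since the patent does not state the averaging mode.

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

def evaluate(y_true, y_pred):
    """Computes the four evaluation indexes of Table 2. Macro averaging
    for the multi-class 2CLS data set (4 categories) is an assumption."""
    return {
        "Accuracy":  accuracy_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred, average="macro"),
        "Recall":    recall_score(y_true, y_pred, average="macro"),
        "F1":        f1_score(y_true, y_pred, average="macro"),
    }
```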
A3: randomly extract Y_0 local trainers as the experimental local trainers and Y_1 global trainers as the experimental global trainers, and formulate an experimental data set allocation strategy.
Understandably, formulating the experimental data set allocation strategy includes the following. A3-1: merge the experimental global trainers and the experimental local trainers into the set of experimental trainers, sort the experimental trainers with the experimental global trainers first, and take the sorted order as each experimental trainer's number.
A3-2: allocate a share of the experimental data set to each experimental trainer according to a random allocation rule to obtain each experimental trainer's allocation share, where k′_q is the allocation share of the experimental data set for the q-th experimental trainer, q is the experimental trainer number (q = 1, 2, …), and k′_{q−1} is the allocation share for the (q−1)-th experimental trainer.
A3-3: set up a training data set and a test data set, set their allocation ratios, denote the allocation ratio of the training data set as k_train and that of the test data set as k_test, and take k_train and k_test as each experimental trainer's allocation scales for the training data set and the test data set.
A3-4: take the allocation shares of the experimental data set together with the allocation scales of the training and test data sets as the experimental data set allocation strategy, as sketched after this list.
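A sketch of this allocation strategy is given below. Since the random allocation formula for k′_q is not reproduced here, the fixed shares of Table 3 (10%/20%/30%/40%) stand in for it, with k_train = 80% and k_test = 20%; the shuffling and the data layout are illustrative.

```python
import random

# Sketch of the experimental data-set allocation strategy: each trainer
# receives its data-set share (here the Table 3 shares, standing in for
# the random allocation rule for k'_q), and each share is then split
# with k_train = 80% and k_test = 20%.
def allocate(dataset: list, shares=(0.10, 0.20, 0.30, 0.40),
             k_train: float = 0.8) -> list[dict]:
    data = dataset[:]
    random.shuffle(data)
    splits, start = [], 0
    for share in shares:
        end = start + int(len(data) * share)
        part = data[start:end]
        cut = int(len(part) * k_train)
        splits.append({"train": part[:cut], "test": part[cut:]})
        start = end
    return splits  # index 0 = global trainer, 1..3 = local trainers
```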
In a specific embodiment, for ease of analysis, one global trainer and three local trainers are selected as the experimental trainers; their experimental data set allocation is shown in Table 3.
Table 3 Experimental data set allocation of the experimental trainers

Trainer                      | Data set share | Training set | Test set
Global experimental trainer  | 10%            | 80%          | 20%
Local experimental trainer 1 | 20%            | 80%          | 20%
Local experimental trainer 2 | 30%            | 80%          | 20%
Local experimental trainer 3 | 40%            | 80%          | 20%
A4: carry out fusion comparison experiments on each experimental data set based on the target tongue image classification rule and each reference tongue image classification rule, and record the fusion comparison experiment data.
In a specific embodiment, FIG. 5 shows the fusion comparison experiment data of the target tongue image classification rule and each reference tongue image classification rule on the 2CLS experimental data set: FIG. 5(a) gives the F1 values, FIG. 5(b) the Accuracy values, FIG. 5(c) the Precision values, and FIG. 5(d) the Recall values.
Likewise, FIG. 6 shows the fusion comparison experiment data of the target tongue image classification rule and each reference tongue image classification rule on the ZXSFL experimental data set: FIG. 6(a) gives the F1 values, FIG. 6(b) the Accuracy values, FIG. 6(c) the Precision values, and FIG. 6(d) the Recall values.
In another embodiment, for the convenience of subsequent analysis, the fusion comparison experiment data are shown in Tables 4, 5 and 6.
Table 4 Experiment summary on the 2CLS and ZXSFL experimental data sets
Table 5 Summary of the improvement effect of each evaluation index on the 2CLS experimental data set
Table 6 Summary of the improvement effect of each evaluation index on the ZXSFL experimental data set
As can be understood from Tables 5 and 6, compared with the reference tongue image classification rules, the average Accuracy of the AFME algorithm on the 2CLS experimental data set is improved by at most 5.111% and at least 3.795%, and the remaining evaluation indexes are improved by at most 8.010% and at least 3.031%, at most 2.953% and at least 2.265%, and at most 8.829% and at least 4.972% respectively, which shows that the AFME algorithm has a certain superiority and robustness.
A5: update the test data set and take the updated test data set as the ablation experiment data set; carry out ablation comparison experiments on it based on the target tongue image classification rule and each reference tongue image classification rule, and record the ablation comparison experiment data.
In a specific embodiment, FIG. 7 shows the ablation comparison experiment data of the target tongue image classification rule and each reference tongue image classification rule on the 2CLS experimental data set: FIG. 7(a) gives the F1 values, FIG. 7(b) the Accuracy values, FIG. 7(c) the Precision values, and FIG. 7(d) the Recall values.
Likewise, FIG. 8 shows the ablation comparison experiment data of the target tongue image classification rule and each reference tongue image classification rule on the ZXSFL experimental data set: FIG. 8(a) gives the F1 values, FIG. 8(b) the Accuracy values, FIG. 8(c) the Precision values, and FIG. 8(d) the Recall values.
For ease of subsequent analysis, the ablation comparison experiment data may be as shown in Table 7.
Table 7 Summary of the ablation comparison experiment data
Understandably, it can be seen from Table 7 that the AFME algorithm trained without the expanded samples in the training set loses some predictive power on the test set, but the proportion of the reduction is not high. In the comparative results, the prediction accuracy of the CapsNet+LSTM algorithm is reduced by 9.64% on the 2CLS experimental data set and by 4.75% on the ZXSFL experimental data set, an average reduction of 7.15% over the two data sets; the ResNetBlock+CapsNet algorithm is reduced by 8.42% on 2CLS and by 6.14% on ZXSFL, an average of 7.25%; the ResNet+BiLSTM algorithm is reduced by 6.93% on 2CLS and by 5.25% on ZXSFL, an average of 6.09%; whereas the target tongue image classification rule of the invention is reduced by only 1.04% on the 2CLS experimental data set and by 1.73% on average.
Through the fusion comparison experiments and ablation comparison experiments, the embodiment of the invention intuitively demonstrates the classification accuracy and robustness of the present tongue image classification approach. The comparison experiments over multiple experimental data sets, multiple evaluation indexes and multiple comparison groups ensure the scientific validity and reference value of the accuracy evaluation, provide data support for building subsequent tongue image classification models, facilitate their optimization, and help subsequent users make their choices.
Further, analyzing the comparison experiment data includes the following. X1: extract from the fusion comparison experiment data the improvement value of each evaluation index for the target tongue image classification rule and each reference tongue image classification rule on each experimental data set.
X2: take the corresponding difference between the target tongue image classification rule's and each reference rule's improvement values of each evaluation index on each experimental data set, and use the difference as the improvement deviation.
X3: if the improvement deviation of the target tongue image classification rule relative to a certain reference rule for a certain evaluation index on a certain experimental data set is greater than 0, take that reference rule as a first optimized tongue image classification rule, the experimental data set as a first optimized experimental data set of that rule, and the evaluation index as an optimization evaluation index of that data set.
X4: extract the improvement deviation U_rvl of each optimization evaluation index for each first optimized experimental data set under each first optimized tongue image classification rule relative to the target rule, where r is the first optimized rule number and v is the first optimized experimental data set number (v = 1, 2, …), and statistically obtain the fusion precision-optimization tendency ψ_opt, where K_ref is the number of reference tongue image classification rules, U_0 is the set reference evaluation index improvement value, and …, σ and ε are respectively the number of first optimized tongue image classification rules, the number of first optimized experimental data sets and the number of optimization evaluation indexes.
X5: screen and extract from the ablation comparison experiment data the decline value of each evaluation index for the target tongue image classification rule and each reference rule on each experimental data set, and take the corresponding difference between the target rule's and each reference rule's decline values as the decline deviation.
X6: if the decline deviation of the target tongue image classification rule relative to a certain reference rule for a certain evaluation index on a certain experimental data set is smaller than 0, take that reference rule as a second optimized tongue image classification rule, the experimental data set as a second optimized experimental data set of that rule, and the evaluation index as an optimization evaluation index of that data set.
X7: following the same statistical method as for ψ_opt, obtain the ablation precision-optimization tendency ψ′_opt of the target tongue image classification rule.
X8: statistically obtain the comprehensive precision-optimization tendency ψ_comp of the target tongue image classification rule, where ψ_0 and ψ_1 are the set reference precision-optimization tendencies of the fusion experiment and the ablation experiment respectively, and take ψ_opt, ψ′_opt and ψ_comp as the comparison experiment analysis results.
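One assumed reading of the ψ_opt statistic in steps X1 to X8 is sketched below: the positive improvement deviations U_rvl are normalized by the set reference improvement value U_0 and averaged over the qualifying (rule, data set, index) triples. The exact statistical formula is not reproduced here, so this normalized mean is only an illustration.

```python
# Sketch of the comparative-analysis statistic of steps X1-X8: for each
# (rule r, data set v, index l) triple where the target rule's improvement
# exceeds the reference rule's (step X3), the improvement deviation U_rvl
# is normalized by the set reference improvement value U_0 and averaged.
# This normalized mean is an assumed reading, not the patent's formula.
def fusion_optimization_tendency(deviations: dict, U_0: float) -> float:
    positive = {k: u for k, u in deviations.items() if u > 0}  # step X3
    if not positive:
        return 0.0
    return sum(u / U_0 for u in positive.values()) / len(positive)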
The foregoing is merely illustrative and explanatory of the principles of the invention; various modifications and additions made by those skilled in the art to the specific embodiments described, or similar substitutions, all fall within the protection scope defined by the claims, provided they do not depart from the principles of the invention.

Claims (10)

1. A multi-model fusion tongue image intelligent classification method based on data enhancement, characterized by comprising the following steps:
step 1: taking each tongue coating image as a training sample to form an original data set, expanding the original data set to obtain a training data set, and adding a prediction label and a real label to each training sample in the training data set;
step 2: extracting features of different dimensions from each training sample in the training data set with several network structures, and fusing them to obtain fused features;
step 3: selecting the difficult-to-train samples of each trainer, building a difficult-to-train sample queue from the selection, and performing reinforcement training on the queue;
step 4: extracting the attribute label of each trainer, marking the trainers whose attribute labels are global and local as global trainers and local trainers respectively, exchanging data between each local trainer and the global trainer, retraining on the exchanged data, and classifying tongue images according to the training results;
step 5: carrying out several training data set comparison experiments to obtain comparison experiment data, analyzing the comparison experiment data to obtain comparison experiment results, and outputting the comparison experiment results.
2. The multi-model fusion tongue image intelligent classification method based on data enhancement according to claim 1, characterized in that the fusion in step 2 is implemented as follows:
performing training sample selection in the training data set in a random voting mode, and marking each selected training sample as a target sample;
extracting static features of the target sample through a residual network;
extracting spatial features among the internal body structures of the target sample through a capsule network;
reducing the dimensions of the static features extracted by the residual network and the spatial features extracted by the capsule network through multiple convolution layers;
and fusing the dimension-reduced static features and spatial features through a fusion algorithm.
3. The multi-model fusion tongue image intelligent classification method based on data enhancement according to claim 2, characterized in that the fusion algorithm takes the training sample data x of trainer i, where i is the trainer number, i = 1, 2, ..., and outputs the fused feature f_merge.
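To make the claim 2-3 pipeline concrete, the following PyTorch sketch pairs a torchvision ResNet-18 (standing in for the residual branch) with a small convolutional stack (standing in for the capsule branch, which it does not actually implement), reduces both with 1x1 convolutions, and fuses by concatenation plus projection; the concatenation-projection fusion is an assumption, since the claim's exact fusion formula is given only symbolically.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class FusionNet(nn.Module):
    """Dual-branch feature extractor with convolutional dimension reduction."""
    def __init__(self, fused_dim: int = 128):
        super().__init__()
        backbone = resnet18(weights=None)
        # Residual branch: ResNet-18 without its classifier head (static features).
        self.residual = nn.Sequential(*list(backbone.children())[:-1])
        # Stand-in for the capsule branch (spatial features); NOT a real capsule net.
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # 1x1 convolutions as the claimed multi-layer dimension reduction.
        self.reduce_static = nn.Conv2d(512, 64, 1)
        self.reduce_spatial = nn.Conv2d(64, 64, 1)
        # Assumed fusion operator: concatenation followed by a linear projection.
        self.project = nn.Linear(128, fused_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.reduce_static(self.residual(x)).flatten(1)   # (B, 64) static
        p = self.reduce_spatial(self.spatial(x)).flatten(1)   # (B, 64) spatial
        return self.project(torch.cat([s, p], dim=1))         # f_merge

f_merge = FusionNet()(torch.randn(2, 3, 224, 224))
print(f_merge.shape)  # torch.Size([2, 128])
```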
4. The multi-model fusion tongue image intelligent classification method based on data enhancement according to claim 1, characterized in that the difficult-to-train samples are obtained through a training sample selection model expressed as: H ← x_j if m_j ≤ M, where j is the training sample number in the training data set, j = 1, 2, ...; M is a set reference minimum cosine value between the features of difficult-to-train samples and the fused features; m_j denotes the cosine distance between the predicted label and the real label of the j-th training sample in the training data set; when m_j ≤ M, the training sample x_j is collected into the difficult-to-train sample queue H; L is the number of labels; and avg_{j'} denotes the mean of all real sample features corresponding to the real labels in the training samples of trainer i.
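The following NumPy sketch applies the claimed rule H ← x_j if m_j ≤ M, reading m_j as the cosine between the predicted-label and real-label embeddings so that a small value flags disagreement; that reading, the threshold value, and the random embeddings are all assumptions.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two label embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def select_hard_samples(pred_emb, real_emb, M: float = 0.8):
    """Claim 4 rule, as read here: H <- x_j if m_j <= M."""
    H = []
    for j, (p, r) in enumerate(zip(pred_emb, real_emb)):
        m_j = cosine(p, r)
        if m_j <= M:          # low agreement between predicted and real label
            H.append(j)       # enqueue sample index j for reinforcement training
    return H

rng = np.random.default_rng(1)
pred = rng.normal(size=(10, 16))   # hypothetical predicted-label embeddings
real = rng.normal(size=(10, 16))   # hypothetical real-label embeddings
print(select_hard_samples(pred, real))
```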
5. The multi-model fusion tongue image intelligent classification method based on data enhancement according to claim 2, characterized in that the data exchange between each local trainer and the global trainer comprises the following steps:
taking a residual network and a capsule network as the basic network architecture of each trainer under a federated learning model; based on the federated learning model, each local trainer encrypting its fused features through a homomorphic encryption model to obtain an encryption parameter matrix P_g, where g denotes the local trainer number, g = 1, 2, ...;
constructing a restoring parameter matrix W_g for each local trainer based on the encryption parameter matrix;
wherein the sig function takes the sign of an element of the encryption matrix, a denotes the dimension of the matrix to be encrypted, d denotes the length of the vectors in that dimension, y is a set natural constant, and x' is a matrix element with x' = f_{a,d};
setting a decryption parameter matrix Q, where Q is the decryption function of the homomorphic encryption matrix and ⊙ denotes the inner product of corresponding matrix elements;
and, according to the encryption parameter matrix and the decryption parameter matrix of each local trainer's fused features, each local trainer exchanging data with the global trainer in an asynchronous transmission mode.
6. The multi-model fusion tongue image intelligent classification method based on data enhancement according to claim 5, characterized in that, in the homomorphic encryption model, ⌈·⌉ denotes the round-up (ceiling) operator.
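As a stand-in illustration of the claim 5-6 exchange, the toy sketch below uses zero-sum additive masks so that the global trainer only ever sees masked parameter matrices yet recovers the exact aggregate; this secure-aggregation-style masking is an assumed substitute, not the patent's homomorphic encryption model.

```python
import numpy as np

# Toy stand-in for the claimed exchange: each local trainer masks its fused-feature
# parameter matrix with zero-sum random masks, the global trainer aggregates the
# masked matrices, and the masks cancel in the sum. This is NOT the patent's
# homomorphic encryption model, only an illustrative substitute.
rng = np.random.default_rng(2)
G = 3                                                  # number of local trainers
params = [rng.normal(size=(2, 2)) for _ in range(G)]   # P_g before masking

masks = [rng.normal(size=(2, 2)) for _ in range(G - 1)]
masks.append(-sum(masks))                              # masks sum to zero

encrypted = [p + m for p, m in zip(params, masks)]     # "encryption parameter matrix"
aggregate = sum(encrypted)                             # global trainer sees only masks
assert np.allclose(aggregate, sum(params))             # masks cancel on aggregation
print(aggregate)
```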
7. The multi-model fusion tongue image intelligent classification method based on data enhancement according to claim 1, characterized in that performing the plurality of training data set comparison experiments comprises the following steps:
A1, taking the current multi-model fusion tongue image classification as the target tongue image classification rule, and extracting each previously formulated tongue image classification rule accumulated in a tongue image classification database as a reference tongue image classification rule;
A2, setting comparison data sets, taking each comparison data set as an experimental data set, and formulating evaluation indexes;
A3, randomly extracting Y_0 local trainers as the experimental local trainers, randomly extracting Y_1 global trainers as the experimental global trainers, and formulating an experimental data set allocation strategy;
A4, carrying out fusion comparison experiments on all experimental data sets based on the target tongue image classification rule and each reference tongue image classification rule, and recording the fusion comparison experiment data;
and A5, updating the test data sets, taking the updated test data sets as ablation experiment data sets, carrying out ablation comparison experiments on them based on the target tongue image classification rule and each reference tongue image classification rule, and recording the ablation comparison experiment data.
8. The multi-model fusion tongue image intelligent classification method based on data enhancement according to claim 7, characterized in that formulating the experimental data set allocation strategy comprises:
combining the experimental global trainers and the experimental local trainers into the set of experimental trainers, sorting the experimental trainers with the experimental global trainers first, and taking the sorted positions as the experimental trainer numbers;
allocating a data set scale to each experimental trainer according to a random distribution rule to obtain each experimental trainer's experimental data set allocation scale, where k'_q denotes the experimental data set allocation scale of the q-th experimental trainer, q is the experimental trainer number, q = 1, 2, ..., and k_{q-1} denotes the allocation scale of the (q-1)-th experimental trainer;
setting a training data set and a test data set together with their allocation ratios, denoting the training data set allocation ratio as k_train and the test data set allocation ratio as k_test, and applying k_train and k_test to each experimental trainer's allocation scale to obtain its training data set and test data set allocation scales;
and taking the experimental data set allocation scales together with the training data set and test data set allocation scales as the experimental data set allocation strategy.
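One possible realization of the claim 8 strategy is sketched below, assuming the random distribution rule draws each trainer's scale relative to the previous trainer's scale and that k_train + k_test = 1; both are assumptions, as the claim states its rule only symbolically.

```python
import random

def allocate(n_trainers: int, total: int, k_train: float = 0.8, seed: int = 3):
    """Assign each experimental trainer a data set scale, then split train/test."""
    rng = random.Random(seed)
    scales, prev = [], total // n_trainers
    for _ in range(n_trainers):
        # Assumed form of the random distribution rule: each scale k_q is drawn
        # around the previous scale k_{q-1}; the claim's exact rule may differ.
        prev = max(1, int(prev * rng.uniform(0.5, 1.5)))
        scales.append(prev)
    k_test = 1.0 - k_train   # assumed: the two allocation ratios sum to one
    return [(s, int(s * k_train), int(s * k_test)) for s in scales]

for q, (scale, n_tr, n_te) in enumerate(allocate(4, 1000), start=1):
    print(f"trainer {q}: scale={scale} train={n_tr} test={n_te}")
```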
9. The multi-model fusion tongue image intelligent classification method based on data enhancement according to claim 7, characterized in that analyzing the comparison experiment data comprises:
extracting, from the fusion comparison experiment data, the lift value of the target tongue image classification rule and of each reference tongue image classification rule for each evaluation index on each experimental data set;
taking the corresponding differences between the target rule's lift values and each reference rule's lift values for each evaluation index on each experimental data set as the lift-value deviations;
if the deviation between the lift value of the target tongue image classification rule for a given evaluation index on a given experimental data set and the lift value of a reference tongue image classification rule for that index on that data set is greater than 0, taking that reference rule as a first optimized tongue image classification rule, taking that experimental data set as a first optimized experimental data set of the rule, and taking that evaluation index as an optimization evaluation index of the data set;
extracting the lift-value deviation difference U_rvl of each optimization evaluation index in each first optimized experimental data set under each first optimized tongue image classification rule, where r denotes the first optimized tongue image classification rule number, v denotes the first optimized experimental data set number, v = 1, 2, ..., and l denotes the optimization evaluation index number, and computing the fusion precision-optimization tendency ψ_opt of the target tongue image classification rule from U_rvl (a sketch of one such aggregation follows this claim);
screening and extracting, from the ablation comparison experiment data, the decline value of the target tongue image classification rule and of each reference tongue image classification rule for each evaluation index on each experimental data set, and, following the same statistical procedure as for ψ_opt, obtaining the ablation precision-optimization tendency ψ'_opt of the target tongue image classification rule;
and computing the comprehensive precision-optimization tendency ψ_comp of the target tongue image classification rule, setting ψ_0 and ψ_1 as the reference fusion-experiment precision-optimization tendency threshold and the reference ablation-experiment precision-optimization tendency threshold respectively, and taking ψ_opt, ψ'_opt and ψ_comp together as the comparison experiment analysis result.
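The sketch below aggregates hypothetical lift-value deviation differences U_rvl into a single tendency score using a simple normalized mean; the claim's exact statistic is the one referenced in claim 10, so this aggregation is only an assumed stand-in.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical lift-value deviation differences U_rvl over delta first-optimized
# rules, sigma data sets, and epsilon evaluation indexes (all positive by
# construction of the first-optimized screening).
U = rng.uniform(0.0, 0.05, size=(4, 3, 5))   # (delta, sigma, epsilon)
U0 = 0.02                                     # reference evaluation-index lift value

# Assumed aggregation: mean relative lift deviation; claim 10's formula may differ.
psi_opt = float(U.mean() / U0)
psi_0 = 1.0                                   # reference fusion tendency threshold
print(f"psi_opt = {psi_opt:.3f}, exceeds threshold: {psi_opt > psi_0}")
```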
10. The multi-model fusion tongue image intelligent classification method based on data enhancement according to claim 9, characterized in that, in the specific statistical formula of the fusion precision-optimization tendency of the target tongue image classification rule, K_ref denotes the number of tongue image classification rules, U_0 is the set reference evaluation-index lift value, and the remaining parameters are the number of first optimized tongue image classification rules, the number of first optimized experimental data sets (σ), and the number of optimization evaluation indexes (ε), respectively.
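For readability, one plausible LaTeX rendering of a statistic consistent with the variables listed in claim 10 is given below; δ is a symbol introduced here for the count of first optimized rules, and the whole expression is a hedged reconstruction rather than the patent's verbatim formula.

```latex
% One plausible form consistent with claim 10's variables (an assumption):
% delta, sigma, epsilon index the first optimized rules, data sets, and
% evaluation indexes; K_ref and U_0 normalize the aggregate.
\[
\psi_{\mathrm{opt}}
  \;=\;
  \frac{1}{\delta\,\sigma\,\varepsilon}
  \sum_{r=1}^{\delta}\sum_{v=1}^{\sigma}\sum_{l=1}^{\varepsilon}
  \frac{U_{rvl}}{K_{\mathrm{ref}}\,U_{0}}
\]
```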
CN202311513218.2A 2023-11-14 2023-11-14 Multi-model fusion tongue image intelligent classification method based on data enhancement Active CN117557844B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311513218.2A CN117557844B (en) 2023-11-14 2023-11-14 Multi-model fusion tongue image intelligent classification method based on data enhancement


Publications (2)

Publication Number Publication Date
CN117557844A true CN117557844A (en) 2024-02-13
CN117557844B CN117557844B (en) 2024-04-26

Family

ID=89816023

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311513218.2A Active CN117557844B (en) 2023-11-14 2023-11-14 Multi-model fusion tongue image intelligent classification method based on data enhancement

Country Status (1)

Country Link
CN (1) CN117557844B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111191660A (en) * 2019-12-30 2020-05-22 浙江工业大学 Rectal cancer pathology image classification method based on multi-channel collaborative capsule network
CN111223553A (en) * 2020-01-03 2020-06-02 大连理工大学 Two-stage deep migration learning traditional Chinese medicine tongue diagnosis model
CN111783831A (en) * 2020-05-29 2020-10-16 河海大学 Complex image accurate classification method based on multi-source multi-label shared subspace learning
CN114581432A (en) * 2022-03-18 2022-06-03 河海大学 Tongue appearance tongue image segmentation method based on deep learning


Also Published As

Publication number Publication date
CN117557844B (en) 2024-04-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant