CN111325264A - Multi-label data classification method based on entropy - Google Patents

Multi-label data classification method based on entropy Download PDF

Info

Publication number
CN111325264A
Authority
CN
China
Prior art keywords
label
entropy
training
classifier
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010096523.6A
Other languages
Chinese (zh)
Inventor
杜博
陈玉坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202010096523.6A priority Critical patent/CN111325264A/en
Publication of CN111325264A publication Critical patent/CN111325264A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/10Machine learning using kernel methods, e.g. support vector machines [SVM]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an entropy-based multi-label data classification method comprising a training phase and a testing phase. The training phase consists of selecting data samples, constructing a training set, constructing a label set, performing parameter analysis, and constructing a classifier. Suitable data samples are selected and divided into a training set and a test set at a ratio of 4:1; the entropy value of each label in the training samples is calculated; a suitable label set is selected by sorting the label entropy values; parameter analysis yields the optimal number of label subsets and the voting threshold; and a Label Powerset classifier is trained. In the testing phase, the samples in the test set are used as input, the trained classifier makes predictions, and the predictions are evaluated to obtain the multi-label data classification result.

Description

Multi-label data classification method based on entropy
Technical Field
The invention belongs to the field of machine learning multi-label classification, and particularly relates to a multi-label data classification method based on entropy.
Background
In the field of machine learning, traditional supervised learning is the most studied and most widely applied learning framework. Under this framework, for each object in the real world, the learning system uses a learning algorithm to learn a mapping between the input space and the output space, on the basis of which the class labels of unseen examples can be predicted. The traditional supervised learning framework has had great success when the object to be learned has a definite, single semantics, i.e., when the class label of the object is unique.
However, real-world objects tend not to have a unique semantics and may be ambiguous. As science and technology advance, the forms in which data are expressed grow ever richer, and the assumption of a single class label per sample can hardly describe the semantic information of a real object accurately. Owing to the complexity and ambiguity of objective objects themselves, many objects in real life may be associated with multiple class labels simultaneously. To reflect the multiple semantics of an ambiguous object intuitively, a natural approach is to explicitly assign the object a set of appropriate class labels, i.e., a label subset. Based on these considerations, the multi-label learning framework arose. Under this framework, each object is described by an example carrying multiple, no longer unique, class labels, and the goal of learning is to assign all appropriate class labels to unknown examples.
Many methods have been proposed by scholars at home and abroad for the multi-label classification problem. Existing multi-label learning methods fall into two broad categories: "problem transformation" methods and "algorithm adaptation" methods. A problem transformation method converts the multi-label classification problem into a series of single-label classification problems, so that existing single-label learning algorithms can be applied conveniently. An algorithm adaptation method improves and extends a current single-label learning algorithm so that it can be applied to the multi-label classification task.
Problem transformation methods typically convert the multi-label classification problem into other known learning problems, such as single-label classification or label ranking. Since single-label classification is a special case of multi-label classification and many efficient, accurate algorithms exist for it, problem transformation methods naturally turn multi-label classification into single-label classification problems of different types, whereas algorithm adaptation methods adapt other known learning algorithms to process multi-label classification directly.
In addition to the binary classification problem, the multi-class problem is also an object that many researchers consider transforming to when designing multi-label classification algorithms. The LP (Label Powerset) method first maps all the distinct label subsets that appear for samples in the training set to a series of distinct class values. Each unique label subset corresponds to one class; an unknown sample is classified by training a multi-class classifier, and the label subset corresponding to the class output by that classifier is taken as the final prediction for the sample. However, for a data set containing q labels, the number of label subsets can reach at most 2^q − 1, so the number of samples corresponding to many label subsets in an actual data set is very small. This easily causes a class-imbalance problem that degrades the final generalization performance, and the method cannot predict label subsets that do not appear in the training set. To overcome these deficiencies, the RAKEL (Random K-labelsets) algorithm was subsequently proposed. Its main idea is as follows: a series of multi-class classifiers is built under an ensemble learning framework, where each classifier randomly selects a subset from all label subsets of the label set and is constructed with the LP method, and the relevant label subset of an unknown sample is finally predicted by voting. The RAKEL method thereby overcomes the disadvantages of the LP method, but it brings other drawbacks: the randomly selected label sets may cause unbalanced data distributions in the single-label multi-class learning problems, and the dependency relationships between different labels in the same label set may cause serious information redundancy and overlap. Both deficiencies affect the generalization ability of multi-label learning.
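The Label Powerset transformation described above can be sketched as follows. This is a minimal illustration in Python (the patent's implementation uses MATLAB/libsvm); the toy label matrix is made up for demonstration:

```python
# Sketch of the Label Powerset (LP) transformation: each distinct label
# subset seen in the training set becomes one class of a multi-class problem.
def lp_transform(Y):
    """Map each row's label subset (a 0/1 vector) to a single class id."""
    classes = {}   # label subset (as tuple) -> class id
    y_single = []
    for row in Y:
        key = tuple(row)
        if key not in classes:
            classes[key] = len(classes)
        y_single.append(classes[key])
    return y_single, classes

# Toy multi-label matrix: 4 samples, 3 labels.
Y = [[1, 0, 1], [0, 1, 0], [1, 0, 1], [0, 0, 1]]
y, classes = lp_transform(Y)
print(y)             # [0, 1, 0, 2]
print(len(classes))  # 3 distinct label subsets -> 3 classes
```

A multi-class classifier is then trained on `y`; at prediction time, the class id is mapped back through `classes` to recover the label subset, which is why unseen subsets cannot be predicted.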
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a high-accuracy multi-label data classification method based on entropy.
In order to solve the technical problems, the invention adopts the following technical scheme that the multi-label data classification method based on entropy comprises the following steps:
(1) selecting a multi-label data sample, and constructing a training set and a testing set based on sparse representation and five-fold cross validation;
(2) calculating entropy values of labels in the training set, obtaining entropy value sequencing, and selecting k labels with the minimum entropy values to construct a label set;
(3) performing parameter analysis to obtain the optimal label set parameters and voting threshold, and constructing a Label Powerset classifier according to the parameter analysis results;
(4) inputting the constructed training set into the Label Powerset classifier constructed in the step (3) for classification training;
(5) inputting the constructed test set sample into a trained classifier for classification test to obtain a corresponding predicted value;
(6) and taking the predicted value obtained by the classifier as an output value of the test set, and evaluating and analyzing the output value.
Further, the specific implementation manner of the step (2) is as follows,
using the training set obtained in step (1), computation is performed with the libsvm functions; the command parameters are set to ['-c', '100', '-g', num2str(gamma), '-b 1'], where the penalty coefficient c is set to 100, the parameter g (the gamma coefficient) is set to the reciprocal of the number of classes in the training set, and the parameter b is set to 1, indicating that the probability of a sample belonging to each class is estimated during training; based on these parameters, samples are trained with the svmtrain function, the probability p that each label is true is predicted with the svmpredict function, the entropy value of each label is obtained from the entropy formula H = −p·log(p), all entropy values are sorted in ascending order with a sorting function, and the k labels with the smallest entropy values are selected to construct the label set;
after the data set is processed with the five-fold cross-validation of step (1), different training sets are generated, and performing the above operation on these different training sets yields different label sets.
Further, in step (3), based on parameter analysis of the RAKEL method, the approximate ranges of the optimal values of two parameters, namely the number k of labels contained in each label set and the voting threshold t, are determined, and the most suitable parameters are then obtained by controlling one variable at a time.
Further, the value of k is 4 and the value of t is 0.75.
Further, in step (4), according to the ascending ordering of label joint entropy, the first q label sets with the smallest entropy values are selected as label subsets for ensemble learning, where q is the number of label types of the data samples; for the constructed label subsets, a classifier based on the Label Powerset method converts the multi-label classification problem into a multi-class single-label problem; the classifier is then invoked for each of the q label subsets, all class labels are counted to obtain the actual number of votes for each class label in the classification results, the actual votes are divided by the maximum possible votes to obtain their ratio, this ratio is compared with the threshold t from the parameter analysis, and a label is regarded as a relevant label of the test sample when the ratio is greater than t.
The invention has the beneficial effects that:
(1) The invention provides a label subset selection strategy based on entropy analysis, which selects the k labels ranked lowest in entropy to obtain the label subset with the smallest joint entropy.
(2) The invention provides a multi-label classifier which, through parameter analysis, processes the training set after entropy analysis, distinguishes the most informative label subsets, and efficiently predicts the unknown labels of a sample through a voting process.
Drawings
FIG. 1 is a schematic flow chart of an embodiment of the present invention.
Detailed Description
To help those skilled in the art understand and implement the technical solution of the present invention, the invention is described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the embodiments described herein only illustrate and explain the invention and are not intended to limit it.
The invention discloses an entropy-based multi-label data classification method comprising a training phase and a testing phase. The training phase consists of selecting data samples, constructing a training set, constructing label subsets, performing parameter analysis, and constructing a classifier. Suitable data samples are selected and divided into a training set and a test set at a ratio of 4:1; the entropy value of each label in the training samples is calculated; suitable label subsets are selected by sorting the label entropy values; parameter analysis yields the optimal number of label subsets and the voting threshold; and training is performed with a Label Powerset classifier. In the testing phase, the samples in the test set are used as input, the trained classifier makes predictions, and the predictions are evaluated to obtain the multi-label data classification result.
The method is implemented on the Matlab platform on top of an SVM library: the SVM serves as the base classifier, and the libsvm functions are used for entropy calculation. The input multi-label data samples are read into a matrix X of size M × N, where M is the number of samples and N is the number of features of the multi-label samples. SVM technology is well known in the field of machine learning and is not described again here.
In an embodiment, the following operations are performed on a multi-label sample:
(1): processing the data sample by using sparse representation and five-fold cross validation to obtain a training set and a test set;
the specific operation of the step (1) is as follows: and carrying out sparse representation on the matrix X by using a sparse function to obtain a processed data set. And then processing the data set by adopting a five-fold cross validation mode, dividing all the data sets into 5 parts, repeatedly taking one part as a test set, training a classifier by taking the other four parts as a training set, calculating output results of the classifier on the test set, and averaging all the output results to obtain a final output result. The function for performing sparsification processing on the matrix is a library function in the MATLAB, which is not described herein again.
(2): calculating entropy values of labels in the training set, obtaining entropy value sequencing, and selecting k labels with the minimum entropy values to construct a label set;
In the embodiment, the specific operation of step (2) is as follows: using the training set obtained in step (1), computation is performed with the libsvm functions. The command parameters are set to ['-c', '100', '-g', num2str(gamma), '-b 1'], where the penalty coefficient c is set to 100, the parameter g (the gamma coefficient) is set to the reciprocal of the number of classes in the training set, and the parameter b is set to 1, indicating that the probability of a sample belonging to each class is estimated during training. Based on these parameters, samples are trained with the svmtrain function, the probability p that each label is true is predicted with the svmpredict function, the entropy value of each label is obtained from the entropy formula H = −p·log(p), all entropy values are sorted in ascending order, and the k labels with the smallest entropy values are selected to construct a label subset, where k is the number of labels contained in the label set.
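The entropy ranking in step (2) can be sketched as follows. The patent computes the probabilities with libsvm's svmpredict ('-b 1' probability estimates); the probabilities below are made-up placeholders for that output:

```python
import math

# Sketch of the entropy-based label ranking: for each label, the SVM's
# estimated probability p that the label is positive gives H = -p*log(p);
# the k labels with the smallest entropy are kept.
def label_entropies(probs):
    return [-p * math.log(p) if p > 0 else 0.0 for p in probs]

def smallest_entropy_labels(probs, k):
    H = label_entropies(probs)
    order = sorted(range(len(H)), key=lambda j: H[j])  # ascending entropy
    return order[:k]

probs = [0.9, 0.5, 0.99, 0.3, 0.7]        # per-label P(label = true)
print(smallest_entropy_labels(probs, 2))  # [2, 0]
```

Labels whose probability is close to 1 have entropy near 0 and rank first; a label with p = 0.5 is maximally uncertain under this measure and ranks last.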
(3): performing parameter analysis to obtain optimal tag set parameters and voting threshold values, and constructing a Label Power set classifier according to parameter analysis results;
In the embodiment, the specific operation of step (3) is as follows: based on the five-fold cross-validation of step (1), different training sets are generated, and performing the operation of step (2) on the different training set combinations yields different label sets. Based on the parameter analysis of the RAKEL method, the approximate ranges of the optimal values of two parameters, namely the number k of labels contained in each label set and the voting threshold t, are determined; the most suitable parameters are then found by controlling one variable at a time, so that the method achieves the best overall performance under the evaluation indexes. The parameter analysis experiments are conducted on the CAL500 and Birds data sets; the optimal range of k is 3-6 and the optimal range of t is 0.4-0.8. First, the voting threshold is fixed at 0.5 and the optimal k is determined by comparing the output results for different k values; experiments show that k = 4 is most suitable. Then, with k fixed at 4, the output results for different thresholds t are compared, and tests confirm that performance is best when t = 0.75. The classifier is adjusted according to these two parameter values.
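The control-variable search described above can be sketched as a two-pass sweep. The `evaluate` function is a hypothetical stand-in for training the classifier and scoring it on a validation fold, and the toy score surface is illustrative only:

```python
# Sketch of the control-variable parameter search: fix t, sweep k over
# its candidate range and keep the best k; then fix that k and sweep t.
def tune(evaluate, k_range=(3, 4, 5, 6),
         t_range=(0.4, 0.5, 0.6, 0.7, 0.75, 0.8)):
    best_k = max(k_range, key=lambda k: evaluate(k, 0.5))    # t fixed at 0.5
    best_t = max(t_range, key=lambda t: evaluate(best_k, t)) # k fixed
    return best_k, best_t

# Made-up score surface peaking at k=4, t=0.75 (higher is better).
score = lambda k, t: -((k - 4) ** 2) - (t - 0.75) ** 2
print(tune(score))  # (4, 0.75)
```

This one-variable-at-a-time scheme evaluates |k_range| + |t_range| settings instead of the full grid, at the cost of possibly missing interactions between k and t.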
(4): inputting the constructed training set into the Label Powerset classifier constructed in the step (3) for classification training;
In the embodiment, the specific operation of step (4) is as follows: based on the ascending label ordering from step (2), the label joint entropy is calculated according to the joint entropy formula H(x, y) = −Σ_{x,y} P(x, y)·log2[P(x, y)], where P denotes the probability that the labels are true. According to the ascending ordering of label joint entropy, the first q label sets with the smallest entropy values are selected as label subsets for ensemble learning, where q is the number of label types of the data samples. For the constructed label subsets, a classifier based on the Label Powerset method converts the multi-label classification problem into multi-class (single-label) problems; the classifier is then invoked for each of the q label subsets, all class labels are counted to obtain the actual number of votes for each class label in the classification results, the actual votes are divided by the maximum possible votes to obtain their ratio, this ratio is compared with the threshold t from the parameter analysis, and a label is regarded as a relevant label of the test sample when the ratio is greater than t. The constructed training set is input into this classifier for classification training, and the classification results of the multiple label subsets are integrated to obtain the output value.
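The final voting step can be sketched as follows; the vote counts below are illustrative, not from the patent's experiments:

```python
# Sketch of the voting step: each LP classifier votes for the labels in
# its predicted subset; a label is kept when its actual-vote / max-vote
# ratio exceeds the threshold t.
def vote_labels(votes, max_votes, t=0.75):
    """votes[j]: actual votes for label j; max_votes[j]: possible votes
    (how many label subsets contain label j)."""
    return [j for j, (v, m) in enumerate(zip(votes, max_votes))
            if m > 0 and v / m > t]

votes     = [3, 1, 4, 0]  # votes each label received across classifiers
max_votes = [4, 4, 4, 2]  # times each label appeared in some label subset
print(vote_labels(votes, max_votes, t=0.75))  # [2]
```

Note the strict inequality: label 0 with ratio 3/4 = 0.75 is excluded at t = 0.75, while label 2 with ratio 4/4 = 1.0 is kept, matching the "greater than the threshold" condition in the text.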
(5): inputting the constructed test set sample into a trained classifier for classification test to obtain a corresponding predicted value;
in the embodiment, the specific operation of step (5) is as follows: and taking the constructed test set sample as the input of the trained classifier to obtain a corresponding predicted value.
(6): and taking the predicted value obtained by the classifier as an output value of the test set, and evaluating and analyzing the output value.
The specific operation of step (6) is as follows: after the test set is input into the classifier, the predicted label set is output and evaluated with multiple evaluation indexes. The specific indexes are the Example-based F-measure, Hamming loss, One Error, Macro F-measure, and Micro F-measure, all of which are common evaluation indexes in multi-label learning.
In specific implementation, the automatic operation of the process can be realized by adopting a software mode. The apparatus for operating the process should also be within the scope of the present invention.
The advantageous effects of the present invention are verified by comparative experiments as follows.
The test uses eight data sets covering five fields including creatures, text, images, music and video, as shown in table 1 below:
TABLE 1 dataset attributes
ID | Data set | Field   | Samples | Features | Labels | Avg. labels per sample
1  | Birds    | Audio   | 645     | 260      | 19     | 1.014
2  | CAL500   | Music   | 502     | 68       | 174    | 26.044
3  | Emotions | Music   | 593     | 72       | 6      | 1.869
4  | Flags    | Images  | 194     | 19       | 7      | 3.392
5  | Genbase  | Biology | 662     | 1186     | 27     | 1.252
6  | Medical  | Text    | 978     | 1449     | 45     | 1.245
7  | Scene    | Images  | 2407    | 294      | 6      | 1.074
8  | Yeast    | Biology | 2417    | 103      | 14     | 4.237
Multi-label classification evaluation indexes: the evaluation indexes can be calculated as in Zhang M. L., Zhou Z. H. A review on multi-label learning algorithms [J]. IEEE Transactions on Knowledge and Data Engineering, 2013, 26(8): 1819-1837.
The Example-based F measure is a combined version of per-sample precision and recall:

F_example = (1/m) · Σ_{i=1}^{m} 2·p_i·r_i / (p_i + r_i)

where p_i and r_i are the precision and recall of the i-th sample, and m is the number of samples;
Hamming Loss represents the proportion of all misclassified labels (i.e., a correct label is not predicted or a wrong label is predicted):

HammingLoss = (1/m) · Σ_{i=1}^{m} (1/q) · |h(x_i) Δ Y_i|

where Δ denotes the symmetric difference between the true label set Y_i and the predicted label set h(x_i), and q is the number of labels;
One Error represents the proportion of samples whose top-ranked predicted label is not in the true label set:

OneError = (1/m) · Σ_{i=1}^{m} [[ argmax_y f(x_i, y) ∉ Y_i ]]

where f(x_i, ·) is the ranking function over the labels;
The Macro F-measure is a combined version of per-label precision and recall, and the Micro F-measure extends the F1 index of single-label classification to multi-label classification:

Macro-F = (1/q) · Σ_{j=1}^{q} 2·p_j·r_j / (p_j + r_j)

Micro-F = 2·Σ_j TP_j / (2·Σ_j TP_j + Σ_j FP_j + Σ_j FN_j)

where p_j and r_j are the precision and recall of the j-th label, and TP_j, FP_j, FN_j are counted over all samples.
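Two of these metrics can be computed as follows. This is a hedged sketch using the standard definitions on a toy prediction, not code from the patent:

```python
# Hamming loss: fraction of label cells where prediction and truth differ.
def hamming_loss(Y_true, Y_pred):
    m, q = len(Y_true), len(Y_true[0])
    wrong = sum(yt != yp for rt, rp in zip(Y_true, Y_pred)
                for yt, yp in zip(rt, rp))
    return wrong / (m * q)

# Example-based F: per-sample F1 (2*|intersection| / (|true| + |pred|)),
# averaged over samples.
def example_f(Y_true, Y_pred):
    total = 0.0
    for rt, rp in zip(Y_true, Y_pred):
        inter = sum(a and b for a, b in zip(rt, rp))
        denom = sum(rt) + sum(rp)
        total += 2 * inter / denom if denom else 1.0
    return total / len(Y_true)

Y_true = [[1, 0, 1], [0, 1, 0]]
Y_pred = [[1, 0, 0], [0, 1, 1]]
print(hamming_loss(Y_true, Y_pred))  # 2 wrong cells out of 6 -> 0.333...
print(example_f(Y_true, Y_pred))     # (2/3 + 2/3) / 2 -> 0.666...
```

Lower is better for Hamming loss, higher is better for the F measures; the ranks in Table 2 below account for this direction per index.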
table 2 comparative experimental results
Average rank     | Proposed method | Method 1 | Method 2 | Method 3 | Method 4 | Method 5
Hamming loss     | 1.875           | 5.25     | 2.375    | 4        | 1.75     | 5.25
One error        | 2.375           | 5.125    | 2.5      | 3.875    | 2.625    | 4.5
Example-based F  | 1.75            | 4.75     | 4.25     | 2.5      | 3.125    | 4.25
Micro-F          | 2.25            | 4.125    | 3.625    | 3.75     | 2.25     | 4.875
Macro-F          | 1.625           | 4.875    | 4.75     | 3.25     | 2.25     | 4.125
As can be seen from Table 2, the method of the invention ranks near the top on all five evaluation indexes tested, which shows that it has better classification performance. Compared with basic classical algorithms such as methods 1 and 5, the Hamming loss rank of the proposed method leads by a large margin, so its overall classification effect exceeds those classical algorithms; compared with prior representative methods such as methods 1, 2, 4, and 5, the Example-based F rank of the proposed method is also better, indicating that the invention performs better in example-based label classification.
Therefore, compared with existing multi-label classification methods, the proposed method has better classification performance. The invention addresses the problems that randomly selected label sets cause unbalanced data distributions in single-label multi-class learning and that dependency relationships between different labels in the same label set cause serious information redundancy and overlap. By selecting label subsets through entropy analysis and sorting, the selected label subsets carry more information than randomly selected ones, and label correlation is better exploited for multi-label classification, thereby greatly improving classification accuracy.
It should be understood that parts of the specification not set forth in detail are well within the prior art.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (5)

1. An entropy-based multi-label data classification method is characterized by comprising the following steps:
(1) selecting a multi-label data sample, and constructing a training set and a testing set based on sparse representation and five-fold cross validation;
(2) calculating entropy values of labels in the training set, obtaining entropy value sequencing, and selecting k labels with the minimum entropy values to construct a label set;
(3) performing parameter analysis to obtain the optimal label set parameters and voting threshold, and constructing a Label Powerset classifier according to the parameter analysis results;
(4) inputting the constructed training set into the Label Powerset classifier constructed in the step (3) for classification training;
(5) inputting the constructed test set sample into a trained classifier for classification test to obtain a corresponding predicted value;
(6) and taking the predicted value obtained by the classifier as an output value of the test set, and evaluating and analyzing the output value.
2. An entropy-based multi-label data classification method as claimed in claim 1, characterized in that: the specific implementation manner of the step (2) is as follows,
using the training set obtained in step (1), computation is performed with the libsvm functions; the command parameters are set to ['-c', '100', '-g', num2str(gamma), '-b 1'], where the penalty coefficient c is set to 100, the parameter g (the gamma coefficient) is set to the reciprocal of the number of classes in the training set, and the parameter b is set to 1, indicating that the probability of a sample belonging to each class is estimated during training; based on these parameters, samples are trained with the svmtrain function, the probability p that each label is true is predicted with the svmpredict function, the entropy value of each label is obtained from the entropy formula H = −p·log(p), all entropy values are sorted in ascending order with a sorting function, and the k labels with the smallest entropy values are selected to construct the label set;
after the data set is processed with the five-fold cross-validation of step (1), different training sets are generated, and performing the above operation on these different training sets yields different label sets.
3. The entropy-based multi-label data classification method according to claim 1, characterized in that: in step (3), based on parameter analysis of the RAKEL method, the approximate ranges of the optimal values of two parameters, namely the number k of labels contained in each label set and the voting threshold t, are determined, and the most suitable parameters are then obtained by controlling one variable at a time.
4. The entropy-based multi-label data classification method according to claim 3, characterized in that: the value of k is 4 and the value of t is 0.75.
5. The entropy-based multi-label data classification method according to claim 3, characterized in that: in step (4), according to the ascending ordering of label joint entropy, the first q label sets with the smallest entropy values are selected as label subsets for ensemble learning, where q is the number of label types of the data samples; for the constructed label subsets, a classifier based on the Label Powerset method converts the multi-label classification problem into a multi-class single-label problem; the classifier is then invoked for each of the q label subsets, all class labels are counted to obtain the actual number of votes for each class label in the classification results, the actual votes are divided by the maximum possible votes to obtain their ratio, this ratio is compared with the threshold t from the parameter analysis, and a label is regarded as a relevant label of the test sample when the ratio is greater than t.
CN202010096523.6A 2020-02-17 2020-02-17 Multi-label data classification method based on entropy Pending CN111325264A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010096523.6A CN111325264A (en) 2020-02-17 2020-02-17 Multi-label data classification method based on entropy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010096523.6A CN111325264A (en) 2020-02-17 2020-02-17 Multi-label data classification method based on entropy

Publications (1)

Publication Number Publication Date
CN111325264A true CN111325264A (en) 2020-06-23

Family

ID=71167070

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010096523.6A Pending CN111325264A (en) 2020-02-17 2020-02-17 Multi-label data classification method based on entropy

Country Status (1)

Country Link
CN (1) CN111325264A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111753790A (en) * 2020-07-01 2020-10-09 武汉楚精灵医疗科技有限公司 Video classification method based on random forest algorithm
CN112201300A (en) * 2020-10-23 2021-01-08 天津大学 Protein subcellular localization method based on depth image features and threshold learning strategy
CN112529100A (en) * 2020-12-24 2021-03-19 深圳前海微众银行股份有限公司 Training method and device for multi-classification model, electronic equipment and storage medium
CN112906779A (en) * 2021-02-07 2021-06-04 中山大学 Data classification method based on sample boundary value and integrated diversity
CN113255772A (en) * 2021-05-27 2021-08-13 北京玻色量子科技有限公司 Data analysis method and device
CN115543855A (en) * 2022-12-01 2022-12-30 江苏邑文微电子科技有限公司 Semiconductor device parameter testing method, device, electronic device and storage medium
CN112529100B (en) * 2020-12-24 2024-05-28 深圳前海微众银行股份有限公司 Training method and device for multi-classification model, electronic equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106971201A (en) * 2017-03-23 2017-07-21 重庆邮电大学 Multi-tag sorting technique based on integrated study

Similar Documents

Publication Publication Date Title
CN111325264A (en) Multi-label data classification method based on entropy
Raghu et al. Evaluation of causal structure learning methods on mixed data types
CN112069310B (en) Text classification method and system based on active learning strategy
CN107292350A Anomaly detection method for large-scale data
CN110110858B (en) Automatic machine learning method based on reinforcement learning
CN113190699A (en) Remote sensing image retrieval method and device based on category-level semantic hash
Wang et al. Novel and efficient randomized algorithms for feature selection
CN101561805A (en) Document classifier generation method and system
CN106033426A (en) A latent semantic min-Hash-based image retrieval method
JP2022530447A (en) Chinese word division method based on deep learning, equipment, storage media and computer equipment
CN105320764A (en) 3D model retrieval method and 3D model retrieval apparatus based on slow increment features
CN113377981A (en) Large-scale logistics commodity image retrieval method based on multitask deep hash learning
CN114897085A (en) Clustering method based on closed subgraph link prediction and computer equipment
CN113516019B (en) Hyperspectral image unmixing method and device and electronic equipment
CN112489689B (en) Cross-database voice emotion recognition method and device based on multi-scale difference countermeasure
CN114153839A (en) Integration method, device, equipment and storage medium of multi-source heterogeneous data
CN111553442B (en) Optimization method and system for classifier chain tag sequence
CN111027636B (en) Unsupervised feature selection method and system based on multi-label learning
CN111832645A (en) Classification data feature selection method based on discrete crow difference collaborative search algorithm
Li et al. MT-MAG: Accurate and interpretable machine learning for complete or partial taxonomic assignments of metagenome-assembled genomes
Salman et al. Gene expression analysis via spatial clustering and evaluation indexing
CN116595125A (en) Open domain question-answering method based on knowledge graph retrieval
CN114565063A (en) Software defect prediction method based on multi-semantic extractor
Akbacak et al. MLMQ-IR: Multi-label multi-query image retrieval based on the variance of Hamming distance
Mumuni et al. Automated data processing and feature engineering for deep learning and big data applications: a survey

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200623