CN110442722A - Method and device for training classification model and method and device for data classification - Google Patents
- Publication number
- CN110442722A (application CN201910746175.XA)
- Authority
- CN
- China
- Prior art keywords
- feature data
- sample
- label
- class label
- classification model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application provides a method and device for training a classification model and a method and device for classifying data. The method for training a classification model comprises the following steps: acquiring a sample data set, where the sample data set includes at least three class labels and feature data corresponding to each class label, and counting the proportion of each class label in the sample data set; dividing the class labels in the sample data set into at least two sample groups according to those proportions; and inputting each sample group into a corresponding classification model for training until a training condition is reached. Even when the class label proportions in the sample data set are imbalanced, the quality of the processed sample data set is greatly improved, which in turn ensures the training effect of the classification models and greatly improves the classification accuracy of the trained models in actual classification prediction.
Description
Technical field
This application relates to the field of data classification, and in particular to a method and device for training a classification model, a method and device for data classification, a computing device, and a computer-readable storage medium.
Background technique
Data classification automatically labels data according to a given classification system or standard. For example, text classification automatically categorizes input text according to a reference taxonomy. Text classification has been widely applied in natural language processing fields such as text review, advertisement filtering, sentiment analysis, and anti-pornography identification.
In existing methods for training a classification model, sample data is usually selected directly from a sample data set and fed into the classification model for training. However, the number of samples of one class in the sample data set often far exceeds the number of samples of other classes, so the class distribution of the training set is imbalanced. A classification model trained on such an imbalanced set classifies well only on the classes as distributed in the training set itself; when it classifies a raw data set whose class proportions differ substantially from those of the training set, its prediction error rate is high, which makes such trained models difficult to apply in practice.

To build a training set in which every sample class is balanced, a great deal of manpower and material resources must be spent finding and processing material, which greatly increases the cost of training a classification model.
Summary of the invention
In view of this, embodiments of the present application provide a method and device for training a classification model, a method and device for data classification, a computing device, and a computer-readable storage medium, so as to overcome the technical deficiencies in the prior art.

An embodiment of the present application discloses a method for training a classification model, comprising:

obtaining a sample data set, where the sample data set includes at least three class labels and feature data corresponding to each class label, and counting the proportion of each class label in the sample data set;

dividing the class labels in the sample data set and their corresponding feature data into at least two sample groups according to the proportion of each class label in the sample data set;

inputting each sample group into a corresponding classification model for training until a training condition is reached.
An embodiment of the present application also discloses a method for data classification, comprising:

receiving feature data to be classified;

inputting the feature data to be classified into a first classification model;

when the first classification model outputs a first class label, determining that the feature data to be classified belongs to the class corresponding to the first class label;

when the first classification model outputs any remaining class label other than the first class label, inputting the feature data to be classified into a second classification model, and determining the class of the feature data to be classified according to the output of the second classification model.
An embodiment of the present application discloses a device for training a classification model, characterized by comprising:

a processing module, configured to obtain a sample data set, where the sample data set includes at least three class labels and feature data corresponding to each class label, and to count the proportion of each class label in the sample data set;

a division module, configured to divide the class labels in the sample data set and their corresponding feature data into at least two sample groups according to the proportion of each class label in the sample data set;

a training module, configured to input each sample group into a corresponding classification model for training until a training condition is reached.
An embodiment of the present application discloses a device for data classification, comprising:

a receiving module, configured to receive feature data to be classified;

an input module, configured to input the feature data to be classified into a first classification model;

a first determining module, configured to determine, when the first classification model outputs a first class label, that the feature data to be classified belongs to the class corresponding to the first class label;

a second determining module, configured to, when the first classification model outputs any remaining class label other than the first class label, input the feature data to be classified into a second classification model and determine the class of the feature data to be classified according to the output of the second classification model.
An embodiment of the present application discloses a computing device, including a memory, a processor, and computer instructions stored in the memory and executable on the processor, where the processor, when executing the instructions, implements the steps of the method for training a classification model or the method for data classification described above.

An embodiment of the present application discloses a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method for training a classification model or the method for data classification described above.
In the method and device for training a classification model and the method and device for data classification provided by the present application, the proportion of each class label in the sample data set is counted, and the class labels in the sample data set are divided into at least two sample groups according to those proportions. Even when the class label proportions in the sample data set are imbalanced, the quality of the processed sample data set is greatly improved. Each sample group is then input into a corresponding classification model for training until a training condition is reached, which ensures the training effect of the classification models and greatly improves the classification accuracy of the trained models in actual classification prediction.
Detailed description of the invention
Fig. 1 is a schematic structural diagram of a computing device according to an embodiment of the present application;
Fig. 2 is a schematic flow chart of a method for training a classification model according to an embodiment of the present application;
Fig. 3 is a schematic flow chart of a method for training a classification model according to another embodiment of the present application;
Fig. 4 is a schematic flow chart of a method for data classification according to an embodiment of the present application;
Fig. 5 is a schematic flow chart of determining the class of feature data to be classified in the present application;
Fig. 6 is a schematic structural diagram of a device for training a classification model according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a device for data classification according to an embodiment of the present application.
Specific embodiment
Many specific details are set forth in the following description to facilitate a full understanding of the application. However, the application can be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit of the application; the application is therefore not limited by the specific implementations disclosed below.

The terminology used in one or more embodiments of this specification is for the purpose of describing particular embodiments only and is not intended to limit the one or more embodiments of this specification. The singular forms "a", "the", and "said" used in one or more embodiments of this specification and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" used in one or more embodiments of this specification refers to and includes any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of one or more embodiments of this specification, "first" may also be referred to as "second", and similarly "second" may also be referred to as "first". Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
First, the terms involved in one or more embodiments of the invention are explained.

Binary classification model: a model that performs binary classification on data. A binary classification model can be a generalized linear classifier that classifies data into two classes by supervised learning; based on the training samples, such a linear classifier finds a hyperplane in the feature space (in the two-dimensional case, a line) that separates the two classes of samples.

Multi-class classification model: a model that can classify data of multiple classes. A multi-class classification model can be a boosted tree model, which integrates many tree models into one very strong classifier; in other words, many weak classifiers are combined into a strong classifier that is used for multi-class problems.
Fig. 1 shows a structural block diagram of a computing device 100 according to an embodiment of this specification. The components of the computing device 100 include, but are not limited to, a memory 110 and a processor 120. The processor 120 is connected to the memory 110 through a bus 130, and a database 150 is used for saving data.

The computing device 100 further includes an access device 140 that enables the computing device 100 to communicate via one or more networks 160. Examples of these networks include the public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 140 may include one or more of any type of wired or wireless network interface (for example, a network interface card (NIC)), such as an IEEE 802.11 wireless local area network (WLAN) interface, a Worldwide Interoperability for Microwave Access (Wi-MAX) interface, an Ethernet interface, a universal serial bus (USB) interface, a cellular network interface, a Bluetooth interface, a near-field communication (NFC) interface, and so on.

In an embodiment of this specification, the above components of the computing device 100 and other components not shown in Fig. 1 may also be connected to each other, for example through a bus. It should be understood that the structural block diagram of the computing device shown in Fig. 1 is for exemplary purposes only and is not a limitation on the scope of this specification. Those skilled in the art may add or replace other components as needed.

The computing device 100 can be any type of static or mobile computing device, including a mobile computer or mobile computing device (for example, a tablet computer, a personal digital assistant, a laptop computer, a notebook computer, a netbook, etc.), a mobile phone (for example, a smart phone), a wearable computing device (for example, a smart watch, smart glasses, etc.) or another type of mobile device, or a static computing device such as a desktop computer or a PC. The computing device 100 can also be a mobile or stationary server.
The processor 120 can execute the steps of the method shown in Fig. 2. Fig. 2 is a schematic flow chart of a method for training a classification model according to an embodiment of the application, including steps 202 to 206.

Step 202: obtain a sample data set, where the sample data set includes at least three class labels and feature data corresponding to each class label, and count the proportion of each class label in the sample data set.

The sample data set is the training set used for training the classification models; it can be a text sample set or a picture sample set. For example, the class labels of a text sample set can be the categories of legal documents that a company receives most often. The categories of legal documents can include litigation statements, indictments, non-prosecution decisions, public prosecution statements, protest documents, criminal appeal documents, civil counter-appeal documents, administrative protest documents, prosecutorial suggestion documents, and so on.

The feature data corresponding to a class label is the data input into the classification model for computation. The feature data can be a company's historical dispute data, financial statement data, company category data, project data, funds data, technical data, marketing data, and management environment data.

It should be noted that in the above example, the practical task of the trained classification models is to output, from the company's historical dispute data, financial statement data, company category data, and project data in the feature data, the category of legal document that the company will receive most often within a period of time, i.e., to predict which category of legal document the company will receive most often within that period.
The proportion of the number of each class label in the sample data set is counted.

For example, the following is a schematic illustration taking a sample data set that includes the four class labels A, B, C, and D; see Table 1.
Table 1
| Class label | A | B | C | D |
| --- | --- | --- | --- | --- |
| Category | Litigation statement | Indictment | Public prosecution statement | Criminal appeal document |
| Proportion | 70% | 15% | 10% | 5% |
Class label A is the litigation statement. A company's feature data corresponding to class label A can be understood as follows: the proportion of class label A is 70%, meaning that for 70% of the companies in the sample data set, the legal documents they receive most often within a certain time range are litigation statements.

Class label B is the indictment. A company's feature data corresponds to class label B, and the proportion of class label B is 15%, meaning that for 15% of the companies in the sample data set, the legal documents they receive most often within a certain time range are indictments.

Class label C is the public prosecution statement, with a proportion of 10%, and class label D is the criminal appeal document, with a proportion of 5%; refer to the description above, which will not be repeated here.

The above sample data set thus includes the four class labels A, B, C, and D with respective proportions of 70%, 15%, 10%, and 5%.
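Counting label proportions as in step 202 can be sketched in a few lines of Python. This is a minimal illustration under the assumption that the sample set is represented as a list of labels; the function name is ours, not the patent's.

```python
from collections import Counter

def label_proportions(labels):
    """Count the share of each class label in a sample set (step 202)."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

# Toy sample set mirroring Table 1: A 70%, B 15%, C 10%, D 5%.
labels = ["A"] * 70 + ["B"] * 15 + ["C"] * 10 + ["D"] * 5
print(label_proportions(labels))
```

The resulting proportion dictionary is what the division step (step 204) consumes to decide how to form the sample groups.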
Step 204: divide the class labels in the sample data set and their corresponding feature data into at least two sample groups according to the proportion of each class label in the sample data set.

For example, the proportion of class label A in the above example reaches 70%, so the proportions of the four class labels are imbalanced. The above is only a schematic illustration of class label proportions and feature data; in sample data sets used for actual training, there can be ten or more kinds of class label, and the imbalance of class label proportions is even more serious. The following steps address the problem that imbalanced class label proportions impair the training effect of the classification models.

The first class label with the highest proportion among the class labels and its corresponding feature data are divided into a first sample group, and the remaining class labels in the sample data set other than the first class label, together with their corresponding feature data, are divided into a second sample group.

In other words, the class label with the highest proportion and its corresponding feature data are divided into the first sample group. In the above example, class label A and its corresponding feature data form the first sample group, and the remaining class labels other than class label A and their corresponding feature data form the second sample group, i.e., class labels B, C, and D and their corresponding feature data are divided into the second sample group. The second sample group can also be understood as the non-first sample group: the first sample group can be assigned a positive class label and the second sample group a negative class label, and the positive and negative class labels can be set to 1 and 0 respectively.
Step 206: input each sample group into the corresponding classification model and train it until a training condition is reached.

In the above example, the balance of class label proportions within the second sample group is checked: the ratio of the proportion of class label B (15%) to the proportion of class label C (10%) is 1.5, which is below the balance threshold of 3.5, so the class label proportions in the second sample group are determined to be balanced. The first sample group and the second sample group are input into the first classification model and trained until the training condition is reached, and the feature data divided into the second sample group together with the corresponding class labels are input into the second classification model and trained until the training condition is reached.
In the above embodiment of the application, the class label with the largest proportion and its corresponding feature data are divided directly into the first sample group, and the remaining class labels other than that first class label, together with their corresponding feature data, are divided into the second sample group. In the above example, the overall proportions of the first sample group and the second sample group are 70% and 30% respectively, so the proportions of the two groups tend toward balance, which improves the training effect of the first classification model. The feature data divided into the second sample group and the corresponding class labels are then input into the second classification model, where the proportions of class labels B, C, and D within the second sample group are 15%, 10%, and 5% respectively; the class label proportions within the second sample group likewise tend toward balance, which improves the training effect of the second classification model. Thus, even if the class label proportions in the sample data set are imbalanced, the quality of the processed sample data set is greatly improved, which ensures the training effect of the classification models; the trained classification models achieve high classification accuracy in actual classification prediction, guaranteeing the practical effect of applications in natural language processing fields such as text review, advertisement filtering, sentiment analysis, and anti-pornography identification.
An embodiment of the application is described in detail below with reference to a specific example.

Assume the obtained sample data set includes the five class labels A, B, C, D, and E and their corresponding feature data. The proportion of each class label in the sample data set is counted; Table 2 shows the proportion of each of the five class labels in the sample data set.
Table 2
| Class label | A | B | C | D | E |
| --- | --- | --- | --- | --- | --- |
| Category | Litigation statement | Indictment | Public prosecution statement | Criminal appeal document | Administrative protest document |
| Proportion | 50% | 30% | 8% | 7% | 5% |
According to the proportion of each class label in the sample data set, the class labels in the sample data set are divided into at least two sample groups.

The first class label with the highest proportion and its corresponding feature data are divided into a first sample group, and the remaining class labels other than the first class label, together with their corresponding feature data, are divided into a second sample group: class label A and its corresponding feature data form the first sample group.

The remaining class labels other than class label A and their corresponding feature data are divided into the second sample group, i.e., class labels B, C, D, and E and their corresponding feature data form the second sample group.

Among the remaining class labels other than class label A, i.e., among class labels B, C, D, and E, the ratio of the proportion of the most frequent class label B to that of the second most frequent class label C exceeds the balance threshold: the ratio of the proportion of class label B (30%) to the proportion of class label C (8%) is 3.75, which exceeds the balance threshold of 3.5. It is therefore determined that the class label proportions in the second sample group are imbalanced, so the second class label with the highest proportion among the class labels of the second sample group and its corresponding feature data are divided into a third sample group, i.e., class label B and its corresponding feature data form the third sample group, and class labels C, D, and E and their corresponding feature data are divided into a fourth sample group.

Since the class labels in the fourth sample group are balanced, the fourth sample group is not divided further. Of course, if the fourth sample group were imbalanced, it would continue to be divided.

The proportion of class label A in the first sample group is 50%, and the total proportion of class labels B, C, D, and E in the second sample group is 50%; the first sample group and the second sample group are input into a binary classification model and trained until the training condition is reached.

The proportion of class label B in the third sample group is 30%, and the total proportion of class labels C, D, and E in the fourth sample group is 20%; the third sample group and the fourth sample group are input into another binary classification model and trained until the training condition is reached.

The proportions of class labels C, D, and E in the fourth sample group are 8%, 7%, and 5% respectively; the feature data divided into the fourth sample group and the corresponding class labels are input into a multi-class classification model and trained until the training condition is reached.
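The recursive grouping in this example can be sketched as follows. This is our illustrative reading of the procedure: repeatedly peel off the most frequent label into its own binary split while the remainder is imbalanced (ratio of the remainder's top two counts at or above the assumed threshold of 3.5), and leave the final balanced remainder to one multi-class model.

```python
def build_groups(counts, threshold=3.5):
    """Return the sequence of binary head-vs-rest splits and the label set
    left for the multi-class model, given per-label sample counts."""
    remaining = dict(counts)
    splits = []
    while True:
        ordered = sorted(remaining.items(), key=lambda kv: -kv[1])
        head, rest = ordered[0][0], ordered[1:]
        splits.append(head)  # one binary model: head vs. everything else
        if len(rest) < 2 or rest[0][1] / rest[1][1] < threshold:
            break            # remainder is balanced: stop splitting
        remaining.pop(head)
    multiclass = [label for label, _ in rest]
    return splits, multiclass

# Table 2 counts: A 50, B 30, C 8, D 7, E 5 (per 100 samples).
print(build_groups({"A": 50, "B": 30, "C": 8, "D": 7, "E": 5}))
```

For the Table 2 counts this yields the structure of the example: binary splits on A and then B, with C, D, and E handled by the multi-class model.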
In the above example, the class label ratio between the first sample group and the second sample group used to train one binary classification model is balanced, the class label ratio between the third sample group and the fourth sample group used to train the other binary classification model likewise tends toward balance, and the ratios among the class labels within the fourth sample group used to train the multi-class classification model also tend toward balance. The quality of the processed sample data set is greatly improved, which ensures the training effect of the classification models; the trained classification models achieve greatly improved classification accuracy in actual classification prediction.

It should be noted that, given the class label proportions in the sample data set of Table 2, the classification models to be trained comprise two binary classification models and one multi-class classification model; in actual classification model training, the specific type and number of classification models are determined according to the class label proportions in the sample data set.
Fig. 3 shows a schematic flow chart of a method for training a classification model according to another embodiment of the application, including steps 302 to 310.

Step 302: obtain a sample data set, where the sample data set includes at least three class labels and feature data corresponding to each class label, and count the proportion of each class label in the sample data set.

Step 304: set a first threshold, and delete from the sample data set the class labels whose proportion is below the first threshold, together with their corresponding feature data.

For example, if the first threshold is set to 1%, the class labels whose proportion is less than 1% and their corresponding feature data are deleted from the sample data set.

Since class labels whose proportion is below the first threshold and their corresponding feature data have very little influence on model training, deleting them from the sample data set ensures that the class label proportions in the sample data set tend toward balance, which improves the overall training effect of the classification models in the following steps.
Step 306: set a second threshold, the second threshold being greater than the first threshold, and merge the class labels whose proportion lies between the first threshold and the second threshold into a combined class label.

The second threshold can be set to 5%; class labels whose proportion lies between the first threshold of 1% and the second threshold of 5% are merged into a combined class label, further improving the balance of the class labels in the sample data set. By setting the first threshold and the second threshold and processing the sample data set in this way, the training effect of the classification models in the following steps is greatly improved.

It should be noted that the specific values of the first threshold and the second threshold can be determined according to the number of class labels and their proportions in the actual sample data set.
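Steps 304 and 306 can be sketched as a single preprocessing pass. This is an illustrative sketch using the example threshold values of 1% and 5%; the function name and the "COMBINED" label name are our assumptions.

```python
from collections import Counter

def preprocess_labels(samples, low=0.01, high=0.05, merged="COMBINED"):
    """Step 304: delete labels whose share is below the first threshold.
    Step 306: merge labels whose share lies between the first and second
    thresholds into one combined class label."""
    counts = Counter(y for _, y in samples)
    total = len(samples)
    out = []
    for x, y in samples:
        share = counts[y] / total
        if share < low:
            continue          # step 304: drop rare labels (< 1%)
        if share <= high:
            y = merged        # step 306: merge mid-rare labels (1%-5%)
        out.append((x, y))
    return out

# Toy set: A 90%, B 6%, C 3.5%, D 0.5% of 1000 samples.
samples = [(i, "A") for i in range(900)] + [(i, "B") for i in range(60)] \
        + [(i, "C") for i in range(35)] + [(i, "D") for i in range(5)]
cleaned = preprocess_labels(samples)
```

Here label D is deleted outright, label C is folded into the combined label, and labels A and B pass through unchanged.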
Step 308: the highest first category label of accounting in the class label and its corresponding characteristic are divided
For first sample group, the sample data is concentrated to remaining class label and its corresponding characteristic in addition to first category label
According to being divided into the second sample group.
The first sample group that the highest first category label of accounting and its corresponding characteristic are divided, by the sample
Remaining class label and combination sort label in data set in addition to first category label are divided into the second sample group.
Step 310: input the first sample group and the second sample group into a binary classification model and train until a training condition is met; input the feature data divided into the negative samples (the second sample group) and their corresponding class labels into a multi-class classification model and train until a training condition is met.
In the above embodiment of the application, on the one hand, the class label with the largest proportion and its corresponding feature data are divided directly into the first sample group, and the remaining class labels in the sample data set other than that first class label, together with their corresponding feature data, are divided into the second sample group. On the other hand, the class labels whose proportions in the sample data set fall below the first threshold are deleted, and the class labels whose proportions lie between the first threshold and the second threshold are merged into a combined class label, balancing the class-label proportions in the sample data set. Even when the class labels in the sample data set are imbalanced, the application can therefore still ensure the training effect of the classification model and greatly improve the prediction accuracy of the trained model, guaranteeing practical performance in natural language processing applications such as text review, advertisement filtering, sentiment analysis, and anti-pornography identification.
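The grouping of step 308 and the construction of the binary training set in step 310 can be sketched as follows. This is an illustrative sketch, not part of the original disclosure; the helper names are hypothetical and the actual classifier training is omitted.

```python
from collections import Counter

def split_sample_groups(samples):
    """Step 308: put the most frequent label's samples in the first group
    and everything else in the second group."""
    counts = Counter(label for _, label in samples)
    first_label, _ = counts.most_common(1)[0]
    group1 = [(x, label) for x, label in samples if label == first_label]
    group2 = [(x, label) for x, label in samples if label != first_label]
    return first_label, group1, group2

def make_binary_training_set(first_label, samples):
    """Step 310: the binary model is trained on 'first label' vs. 'other';
    the multi-class model would then be trained on group2 only."""
    return [(x, label == first_label) for x, label in samples]

samples = [("a", "A"), ("b", "A"), ("c", "B"), ("d", "C")]
first_label, group1, group2 = split_sample_groups(samples)
binary_set = make_binary_training_set(first_label, samples)
print(first_label)                 # A
print([lab for _, lab in group2])  # ['B', 'C']
```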
Fig. 4 is a schematic flowchart of a data classification method according to an embodiment of the application, including steps 402 to 408.
Step 402: receive feature data to be classified.
In the above example, suppose the task is to predict which category of legal document a company will receive most often in a coming period; the received feature data then include the company's historical dispute data, financial statement data, company category data, project data, funding data, technical data, marketing data, and management environment data.
Step 404: input the feature data to be classified into the first classification model.
Step 406: when the output of the first classification model is the first class label, determine that the feature data to be classified belong to the category corresponding to the first class label.
Step 408: when the output of the first classification model is one of the remaining class labels other than the first class label, input the feature data to be classified into the second classification model, and determine the category of the feature data to be classified according to the output of the second classification model.
As shown in Fig. 5, step 408 specifically includes steps 502 to 506.
Step 502: judge whether the category of the feature data to be classified is a combined category; if so, execute step 504; if not, execute step 506.
Step 504: determine the category of the feature data to be classified according to the categories contained in the combined category.
Step 504 includes steps 5042 to 5044.
Step 5042: obtain the proportions, in the sample data set, of the class labels corresponding to the at least two categories in the combined category.
According to the proportion of each corresponding class label in the sample data set, the probability that each category is the category of legal document the above company will receive most often in the coming period is determined.
Step 5044: determine one category in the combined category as the category of the feature data to be classified.
According to the probability of each category in the combined category, one category in the combined category is chosen at random as the category of legal document the company will receive most often in the coming period, which can further improve the accuracy of the classification for the company.
Step 506: take the output of the second classification model as the category of the feature data to be classified.
In the above embodiment of the application, the feature data to be classified are input into the first classification model. If the output of the first classification model is the first class label, the category of the feature data to be classified is determined directly; if the output is one of the remaining class labels other than the first class label, the feature data to be classified are input into the second classification model and their category is determined according to the output of the second classification model, which greatly improves the accuracy of data classification.
To facilitate understanding of the technical solution of the application, the following illustrates a specific implementation of the model training method and the data classification method of the application.
Assume that the sample data set obtained contains seven class labels A, B, C, D, E, F, and G and their corresponding feature data. The proportion of each class label in the sample data set is counted; Table 3 shows the proportion of each of the seven class labels in the sample data set.
Table 3
For example, the first threshold is set to 1%. The proportion of class label G is 0.5%, which is below 1%, so class label G and its corresponding feature data are deleted from the sample data set.
The second threshold is set to 5%. The proportion of class label E is 2% and that of class label F is 1%, both lying between the first threshold of 1% and the second threshold of 5%, so class labels E and F are merged into a combined class label.
Among the class labels, the label with the highest proportion is A, which serves as the first class label. Class label A and its corresponding feature data are divided into the first sample group, and the remaining class labels in the sample data set other than class label A, together with their corresponding feature data, are divided into the second sample group; that is, class labels B, C, and D, the combined class label, and their respective feature data form the second sample group.
The first sample group and the second sample group are input into a binary classification model and trained until the training condition is met.
The feature data divided into the second sample group and their corresponding class labels are input into a multi-class classification model and trained until the training condition is met; that is, class labels B, C, and D, the combined label, and their corresponding feature data in the second sample group are input into the multi-class classification model, completing the training of the classification models.
The data classification method is illustrated below, taking the classification models trained on the above sample data set as an example. The task is to predict which category of legal document a company will receive most often in a coming period.
Feature data to be classified are received; these can be the company's historical dispute data, financial statement data, company category data, project data, funding data, technical data, marketing data, and management environment data.
The feature data to be classified are input into the first classification model.
If the binary classification model outputs class label A, the category of the feature data to be classified is directly determined to be a lawsuit position paper; that is, the category of legal document the company will receive most often in the coming period is a lawsuit position paper.
If the output of the binary classification model is one of the remaining class labels other than class label A, the feature data to be classified are input into the multi-class classification model, and their category is determined according to the output of that model; for example, the category of legal document the company will receive most often in the coming period is an indictment, a public prosecution position paper, or a criminal protest.
If the output of the multi-class classification model is the combined category, then, because the class labels corresponding to the civil counter-appeal and the administrative protest in the combined category account for 2% and 1% of the sample data set respectively, the predicted category is a civil counter-appeal with probability 2/3 and an administrative protest with probability 1/3; the category of legal document the company will receive most often in the coming period is then predicted at random according to these two probabilities.
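The probability-weighted random prediction within a combined category can be sketched with the Python standard library; this is an illustration only, with the function name hypothetical and the category names and 2%/1% proportions taken from the example above.

```python
import random

def resolve_combined(category_ratios):
    """Pick one category from a combined label, weighted by each member
    label's proportion in the sample data set (steps 5042-5044)."""
    labels = list(category_ratios)
    weights = list(category_ratios.values())
    return random.choices(labels, weights=weights, k=1)[0]

ratios = {"civil counter-appeal": 0.02, "administrative protest": 0.01}
random.seed(0)
draws = [resolve_combined(ratios) for _ in range(3000)]
# Over many draws, "civil counter-appeal" is returned about 2/3 of the time.
print(draws.count("civil counter-appeal") / len(draws))  # ≈ 0.67
```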
Fig. 6 is a schematic structural diagram of a classification model training apparatus according to an embodiment of the application. The classification model training apparatus includes:
a processing module 602, configured to obtain a sample data set, the sample data set comprising at least three class labels and feature data corresponding to the class labels, and to count the proportion of each class label in the sample data set;
a division module 604, configured to divide the class labels in the sample data set and their corresponding feature data into at least two sample groups according to the proportion of each class label in the sample data set; and
a training module 606, configured to input the sample groups into the corresponding classification models and train until a training condition is met.
Preferably, the division module 604 is further configured to divide the first class label with the highest proportion among the class labels and its corresponding feature data into a first sample group, and to divide the remaining class labels in the sample data set other than the first class label and their corresponding feature data into a second sample group.
Preferably, the training module 606 is further configured to input the first sample group and the second sample group into a first classification model and train until a training condition is met, and to input the feature data divided into the second sample group and their corresponding class labels into a second classification model and train until a training condition is met.
In the classification model training apparatus of the application, the class label with the largest proportion and its corresponding feature data are divided directly into the first sample group, and the remaining class labels in the sample data set other than that first class label, together with their corresponding feature data, are divided into the second sample group, so that the proportions of the first class label in the first sample group and of all class labels in the second sample group tend to be balanced, improving the training effect of the first classification model. The feature data of the negative samples divided into the second sample group and their corresponding class labels are then input into the second classification model; since the class-label proportions among the negative samples likewise tend to be balanced, the training effect of the second classification model is also improved. This ensures the training effect of the classification models and improves the prediction accuracy of the trained models.
Preferably, the division module 604 is further configured to divide the first class label with the highest proportion among the class labels and its corresponding feature data into a first sample group, and to divide the remaining class labels in the sample data set other than the first class label and their corresponding feature data into a second sample group; and, when the class-label proportions in the second sample group are determined to be unbalanced according to the proportions in the sample data set of the remaining class labels other than the first class label, to divide the second class label with the highest proportion among the class labels of the second sample group and its corresponding feature data into a third sample group, and the remaining class labels of the second sample group other than the second class label and their corresponding feature data into a fourth sample group.
The training module 606 is further configured to input the first sample group and the second sample group into a binary classification model and train until a training condition is met; to input the third sample group and the fourth sample group into a binary classification model and train until a training condition is met; and to input the fourth sample group into a multi-class classification model and train until a training condition is met.
Preferably, the classification model training apparatus further includes:
a removing module, configured to set a first threshold and to delete from the sample data set the class labels whose proportions are below the first threshold and their corresponding feature data.
Preferably, the classification model training apparatus further includes:
a merging module, configured to set a second threshold greater than the first threshold and to merge the class labels whose proportions lie between the first threshold and the second threshold into a combined class label.
The training module 606 is further configured to input the first sample group and the second sample group into a binary classification model and train until a training condition is met, and to input the feature data divided into the second sample group and their corresponding class labels into a multi-class classification model and train until a training condition is met.
Fig. 7 is a schematic structural diagram of a data classification apparatus according to an embodiment of the application. The data classification apparatus includes:
a receiving module 702, configured to receive feature data to be classified;
an input module 704, configured to input the feature data to be classified into a first classification model;
a first determining module 706, configured to determine, when the first classification model outputs a first class label, that the feature data to be classified belong to the category corresponding to the first class label; and
a second determining module 708, configured to input the feature data to be classified into a second classification model when the first classification model outputs one of the remaining class labels other than the first class label, and to determine the category of the feature data to be classified according to the output of the second classification model.
Preferably, the second determining module 708 is further configured to judge whether the category of the feature data to be classified is a combined category; if so, to determine the category of the feature data to be classified according to the categories contained in the combined category; and if not, to take the output of the second classification model as the category of the feature data to be classified.
Preferably, the second determining module 708 is further configured to determine one category in the combined category as the category of the feature data to be classified, according to the proportions, in the sample data set, of the class labels corresponding to the at least two categories in the combined category.
In the above data classification apparatus of the application, the feature data to be classified are input into the first classification model. If the output of the first classification model is the first class label, the category of the feature data to be classified is determined directly; if the output is one of the remaining class labels other than the first class label, the feature data to be classified are input into the second classification model, and their category is determined according to its output, which greatly improves the accuracy of data classification.
An embodiment of the application further provides a computing device including a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor implements the steps of the classification model training method or the data classification method described above when executing the instructions.
An embodiment of the application further provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the steps of the classification model training method or the data classification method described above.
The above is an exemplary scheme of the computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solutions of the classification model training method and the data classification method described above; for details not described in the storage medium solution, refer to the descriptions of those methods.
An embodiment of the application further provides a chip storing computer instructions that, when executed by a processor, implement the steps of the classification model training method or the data classification method described above.
The computer instructions include computer program code, which may be in source code form, object code form, an executable file, certain intermediate forms, or the like. The computer-readable medium may include: any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be increased or decreased as appropriate according to legislative and patent-practice requirements in a given jurisdiction; for example, in some jurisdictions the computer-readable medium does not include electrical carrier signals or telecommunication signals.
It should be noted that, for simplicity of description, the foregoing method embodiments are expressed as a series of action combinations, but those skilled in the art should understand that the application is not limited by the described order of actions, because according to the application certain steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the application.
In the above embodiments, each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
The preferred embodiments of the application disclosed above are intended only to help illustrate the application. The alternative embodiments do not describe all details, nor do they limit the invention to the specific embodiments described. Obviously, many modifications and variations can be made in light of the content of this specification. These embodiments were chosen and specifically described in order to better explain the principles and practical applications of the application, so that those skilled in the art can better understand and use the application. The application is limited only by the claims and their full scope and equivalents.
Claims (14)
1. A method of classification model training, comprising:
obtaining a sample data set, the sample data set comprising at least three class labels and feature data corresponding to the class labels, and counting the proportion of each class label in the sample data set;
dividing the class labels in the sample data set and their corresponding feature data into at least two sample groups according to the proportion of each class label in the sample data set; and
inputting the sample groups into corresponding classification models and training until a training condition is met.
2. The method according to claim 1, wherein dividing the class labels in the sample data set and their corresponding feature data into at least two sample groups according to the proportion of each class label in the sample data set comprises:
dividing a first class label with the highest proportion among the class labels and its corresponding feature data into a first sample group, and dividing the remaining class labels in the sample data set other than the first class label and their corresponding feature data into a second sample group;
and wherein inputting the sample groups into corresponding classification models and training until a training condition is met comprises:
when the class-label proportions in the second sample group are determined to be balanced, inputting the first sample group and the second sample group into a first classification model and training until a training condition is met, and inputting the feature data divided into the second sample group and their corresponding class labels into a second classification model and training until a training condition is met.
3. The method according to claim 1, wherein dividing the class labels in the sample data set and their corresponding feature data into at least two sample groups according to the proportion of each class label in the sample data set comprises:
dividing a first class label with the highest proportion among the class labels and its corresponding feature data into a first sample group, and dividing the remaining class labels in the sample data set other than the first class label and their corresponding feature data into a second sample group; and
when the class-label proportions in the second sample group are determined to be unbalanced, dividing a second class label with the highest proportion among the class labels of the second sample group and its corresponding feature data into a third sample group, and dividing the remaining class labels of the second sample group other than the second class label and their corresponding feature data into a fourth sample group;
and wherein inputting the sample groups into corresponding classification models and training until a training condition is met comprises:
when the class-label proportions in the fourth sample group are determined to be balanced, inputting the first sample group and the second sample group into a binary classification model and training until a training condition is met;
inputting the third sample group and the fourth sample group into a binary classification model and training until a training condition is met; and
inputting the fourth sample group into a multi-class classification model and training until a training condition is met.
4. The method according to claim 1, further comprising, after counting the proportion of each class label in the sample data set:
setting a first threshold; and
deleting from the sample data set the class labels whose proportions are below the first threshold and their corresponding feature data.
5. The method according to claim 2, further comprising, before the remaining class labels in the sample data set other than the first class label and their corresponding feature data are divided into the second sample group:
setting a second threshold, the second threshold being greater than a first threshold; and
merging the class labels whose proportions lie between the first threshold and the second threshold into a combined class label.
6. The method according to claim 2, wherein inputting the first sample group and the second sample group into the first classification model and training until a training condition is met comprises:
inputting the first sample group and the second sample group into a binary classification model and training until a training condition is met.
7. The method according to claim 2 or 6, wherein inputting the feature data divided into the second sample group and their corresponding class labels into the second classification model and training until a training condition is met comprises:
inputting the feature data divided into the second sample group and their corresponding class labels into a multi-class classification model and training until a training condition is met.
8. A method of data classification, comprising:
receiving feature data to be classified;
inputting the feature data to be classified into a first classification model;
when the first classification model outputs a first class label, determining that the feature data to be classified belong to the category corresponding to the first class label; and
when the first classification model outputs one of the remaining class labels other than the first class label, inputting the feature data to be classified into a second classification model and determining the category of the feature data to be classified according to the output of the second classification model.
9. The method according to claim 8, wherein determining the category of the feature data to be classified according to the output of the second classification model comprises:
judging whether the category of the feature data to be classified is a combined category;
if so, determining the category of the feature data to be classified according to the categories contained in the combined category; and
if not, taking the output of the second classification model as the category of the feature data to be classified.
10. The method according to claim 9, wherein determining the category of the feature data to be classified according to the categories contained in the combined category comprises:
obtaining the proportions, in the sample data set, of the class labels corresponding to the at least two categories in the combined category; and
determining one category in the combined category as the category of the feature data to be classified.
11. An apparatus for classification model training, comprising:
a processing module, configured to obtain a sample data set, the sample data set comprising at least three class labels and feature data corresponding to the class labels, and to count the proportion of each class label in the sample data set;
a division module, configured to divide the class labels in the sample data set and their corresponding feature data into at least two sample groups according to the proportion of each class label in the sample data set; and
a training module, configured to input the sample groups into corresponding classification models and train until a training condition is met.
12. An apparatus for data classification, comprising:
a receiving module, configured to receive feature data to be classified;
an input module, configured to input the feature data to be classified into a first classification model;
a first determining module, configured to determine, when the first classification model outputs a first class label, that the feature data to be classified belong to the category corresponding to the first class label; and
a second determining module, configured to input the feature data to be classified into a second classification model when the first classification model outputs one of the remaining class labels other than the first class label, and to determine the category of the feature data to be classified according to the output of the second classification model.
13. A computing device, comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1-7 or 8-10 when executing the instructions.
14. A computer-readable storage medium storing computer instructions, wherein the instructions, when executed by a processor, implement the steps of the method of any one of claims 1-7 or 8-10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910746175.XA CN110442722B (en) | 2019-08-13 | 2019-08-13 | Method and device for training classification model and method and device for data classification |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110442722A true CN110442722A (en) | 2019-11-12 |
CN110442722B CN110442722B (en) | 2022-05-13 |
Family
ID=68435192
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910746175.XA Active CN110442722B (en) | 2019-08-13 | 2019-08-13 | Method and device for training classification model and method and device for data classification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110442722B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101980202A (en) * | 2010-11-04 | 2011-02-23 | 西安电子科技大学 | Semi-supervised classification method for unbalanced data |
CN105446988A (en) * | 2014-06-30 | 2016-03-30 | 华为技术有限公司 | Classification prediction method and device |
CN106204083A (en) * | 2015-04-30 | 2016-12-07 | 中国移动通信集团山东有限公司 | Target user classification method, apparatus and system |
CN107577785A (en) * | 2017-09-15 | 2018-01-12 | 南京大学 | Hierarchical multi-label classification method suitable for law identification |
CN108090503A (en) * | 2017-11-28 | 2018-05-29 | 东软集团股份有限公司 | Online tuning method and apparatus for multiple classifiers, storage medium and electronic device |
CN108470187A (en) * | 2018-02-26 | 2018-08-31 | 华南理工大学 | Class-imbalanced question classification method based on an expanded training dataset |
CN108628971A (en) * | 2018-04-24 | 2018-10-09 | 深圳前海微众银行股份有限公司 | Text classification method, text classifier and storage medium for imbalanced data sets |
CN109117862A (en) * | 2018-06-29 | 2019-01-01 | 北京达佳互联信息技术有限公司 | Image tag recognition method, device and server |
CN109344884A (en) * | 2018-09-14 | 2019-02-15 | 腾讯科技(深圳)有限公司 | Media information classification method, and method and device for training a picture classification model |
CN109376179A (en) * | 2018-08-24 | 2019-02-22 | 苏宁消费金融有限公司 | Sample balancing method in data mining |
CN110111344A (en) * | 2019-05-13 | 2019-08-09 | 广州锟元方青医疗科技有限公司 | Pathological section image grading method, apparatus, computer equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
JIANG Siyu et al.: "Multi-label learning model combining label correlation and imbalance", Journal of Harbin Institute of Technology *
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110929785B (en) * | 2019-11-21 | 2023-12-05 | 中国科学院深圳先进技术研究院 | Data classification method, device, terminal equipment and readable storage medium |
CN110929785A (en) * | 2019-11-21 | 2020-03-27 | 中国科学院深圳先进技术研究院 | Data classification method and device, terminal equipment and readable storage medium |
CN110889457B (en) * | 2019-12-03 | 2022-08-19 | 深圳奇迹智慧网络有限公司 | Sample image classification training method and device, computer equipment and storage medium |
CN110889457A (en) * | 2019-12-03 | 2020-03-17 | 深圳奇迹智慧网络有限公司 | Sample image classification training method and device, computer equipment and storage medium |
CN111190973A (en) * | 2019-12-31 | 2020-05-22 | 税友软件集团股份有限公司 | Method, device, equipment and storage medium for classifying statement forms |
CN111737520A (en) * | 2020-06-22 | 2020-10-02 | Oppo广东移动通信有限公司 | Video classification method, video classification device, electronic equipment and storage medium |
CN111737520B (en) * | 2020-06-22 | 2023-07-25 | Oppo广东移动通信有限公司 | Video classification method, video classification device, electronic equipment and storage medium |
CN112070138A (en) * | 2020-08-31 | 2020-12-11 | 新华智云科技有限公司 | Multi-label mixed classification model construction method, news classification method and system |
CN112070138B (en) * | 2020-08-31 | 2023-09-05 | 新华智云科技有限公司 | Construction method of multi-label mixed classification model, news classification method and system |
CN112434157B (en) * | 2020-11-05 | 2024-05-17 | 平安直通咨询有限公司上海分公司 | Method and device for classifying documents in multiple labels, electronic equipment and storage medium |
CN112434157A (en) * | 2020-11-05 | 2021-03-02 | 平安直通咨询有限公司上海分公司 | Document multi-label classification method and device, electronic equipment and storage medium |
CN113222043B (en) * | 2021-05-25 | 2024-02-02 | 北京有竹居网络技术有限公司 | Image classification method, device, equipment and storage medium |
CN113240032A (en) * | 2021-05-25 | 2021-08-10 | 北京有竹居网络技术有限公司 | Image classification method, device, equipment and storage medium |
CN113240032B (en) * | 2021-05-25 | 2024-01-30 | 北京有竹居网络技术有限公司 | Image classification method, device, equipment and storage medium |
CN113222043A (en) * | 2021-05-25 | 2021-08-06 | 北京有竹居网络技术有限公司 | Image classification method, device, equipment and storage medium |
CN113297382A (en) * | 2021-06-21 | 2021-08-24 | 西南大学 | Method for processing instrument and equipment function labeling |
CN113297382B (en) * | 2021-06-21 | 2023-04-25 | 西南大学 | Instrument and equipment function labeling processing method |
CN113673866A (en) * | 2021-08-20 | 2021-11-19 | 上海寻梦信息技术有限公司 | Crop decision method, model training method and related equipment |
CN113723507A (en) * | 2021-08-30 | 2021-11-30 | 联仁健康医疗大数据科技股份有限公司 | Data classification identification determination method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110442722B (en) | 2022-05-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110442722A (en) | Method and device for training classification model and method and device for data classification | |
CN108108902B (en) | Risk event warning method and device | |
CN107844559A (en) | Text classification method, device and electronic device | |
CN107657267B (en) | Product potential user mining method and device | |
CN106951925A (en) | Data processing method, device, server and system | |
CN105389480B (en) | Iterative ensemble feature selection method and system for multiclass imbalanced genomics data | |
CN109872162B (en) | Wind control classification and identification method and system for processing user complaint information | |
CN106651057A (en) | Mobile terminal user age prediction method based on installation package sequence table | |
CN105095755A (en) | File recognition method and apparatus | |
JP2019519042A (en) | Method and device for pushing information | |
CN103984703B (en) | Mail classification method and device | |
CN109960727B (en) | Personal privacy information automatic detection method and system for unstructured text | |
US11429810B2 (en) | Question answering method, terminal, and non-transitory computer readable storage medium | |
CN108280542A (en) | Optimization method, medium and device for a user profile model | |
CN102722713A (en) | Handwritten numeral recognition method and system based on Lie group structure data | |
CN105045913B (en) | Text classification method based on WordNet and latent semantic analysis | |
WO2021136315A1 (en) | Mail classification method and apparatus based on conjoint analysis of behavior structures and semantic content | |
CN106445908A (en) | Text identification method and apparatus | |
CN107194815B (en) | Client segmentation method and system | |
CN107958270A (en) | Classification recognition method, device, electronic device and computer-readable storage medium | |
CN110543898A (en) | Supervised learning method for noise label, data classification processing method and device | |
CN104809104A (en) | Method and system for identifying micro-blog textual emotion | |
CN114663002A (en) | Method and equipment for automatically matching performance assessment indexes | |
CN109166012A (en) | Method and apparatus for classifying trip-booking users and pushing information | |
CN104850540A (en) | Sentence recognizing method and sentence recognizing device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||