CN110442722A - Method and device for training classification model and method and device for data classification - Google Patents

Method and device for training classification model and method and device for data classification

Info

Publication number
CN110442722A
Authority
CN
China
Prior art keywords
characteristic
sample
label
class label
classification model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910746175.XA
Other languages
Chinese (zh)
Other versions
CN110442722B (en)
Inventor
王献
唐剑波
李长亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Digital Entertainment Co Ltd
Chengdu Kingsoft Digital Entertainment Co Ltd
Original Assignee
Beijing Kingsoft Digital Entertainment Co Ltd
Chengdu Kingsoft Digital Entertainment Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Digital Entertainment Co Ltd, Chengdu Kingsoft Digital Entertainment Co Ltd filed Critical Beijing Kingsoft Digital Entertainment Co Ltd
Priority to CN201910746175.XA priority Critical patent/CN110442722B/en
Publication of CN110442722A publication Critical patent/CN110442722A/en
Application granted granted Critical
Publication of CN110442722B publication Critical patent/CN110442722B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a method and a device for training a classification model and a method and a device for data classification, wherein the method for training the classification model comprises the following steps: acquiring a sample data set, wherein the sample data set comprises at least three class labels and the characteristic data corresponding to the class labels, and counting the proportion of the number of each class label in the sample data set; dividing the class labels in the sample data set into at least two sample groups according to the proportion of the number of each class label in the sample data set; and inputting each sample group into the corresponding classification model for training until a training condition is reached. Even when the class label proportions in the sample data set are unbalanced, the quality of the processed sample data set is greatly improved, which ensures the training effect of the classification model and greatly improves the classification accuracy of the trained classification model when it is used for actual classification prediction.

Description

Method and device for training a classification model, and method and device for data classification
Technical field
The present application relates to the technical field of data classification, and in particular to a method and device for training a classification model, a method and device for data classification, a computing device, and a computer-readable storage medium.
Background
Data classification is the automatic labeling of data according to a certain classification system or standard; for example, text classification automatically classifies input text according to a certain classification scheme. Text classification technology has been widely used in natural language processing fields such as text auditing, advertisement filtering, sentiment analysis, and pornographic content detection.
In existing methods for training a classification model, sample data is usually selected directly from the sample data set and input into the classification model for training. However, the sample data set often contains far more samples of one class than of the other classes, so that the classes in the training sample set are imbalanced, i.e., the numbers of samples of the classes in the sample data set are unbalanced. A classification model trained in this way only classifies the different classes in the training sample set well; when it is used to classify a raw data set, the class proportions of the raw data set differ greatly from those of the training sample set, the error rate of the trained classification model's predictions is high, and the existing trained classification model is difficult to apply in practice.
In training a classification model, building a training set in which every sample class is balanced requires spending a great deal of manpower and material resources to find and process material, which greatly increases the cost of training the classification model.
Summary of the invention
In view of this, embodiments of the present application provide a method and device for training a classification model, a method and device for data classification, a computing device, and a computer-readable storage medium, so as to solve the technical deficiencies in the prior art.
An embodiment of the present application discloses a method for training a classification model, comprising:
acquiring a sample data set, wherein the sample data set comprises at least three class labels and the characteristic data corresponding to each class label, and counting the proportion of the number of each class label in the sample data set;
dividing the class labels in the sample data set and their corresponding characteristic data into at least two sample groups according to the proportion of the number of each class label in the sample data set;
inputting each sample group into the corresponding classification model for training until a training condition is reached.
An embodiment of the present application also discloses a method for data classification, comprising:
receiving characteristic data to be classified;
inputting the characteristic data to be classified into a first classification model;
in the case that the first classification model outputs a first class label, determining that the characteristic data to be classified belongs to the category corresponding to the first class label;
in the case that the first classification model outputs one of the remaining class labels other than the first class label, inputting the characteristic data to be classified into a second classification model, and determining the category corresponding to the characteristic data to be classified according to the output result of the second classification model.
An embodiment of the present application discloses a device for training a classification model, comprising:
a processing module, configured to acquire a sample data set, wherein the sample data set comprises at least three class labels and the characteristic data corresponding to each class label, and to count the proportion of the number of each class label in the sample data set;
a division module, configured to divide the class labels in the sample data set and their corresponding characteristic data into at least two sample groups according to the proportion of the number of each class label in the sample data set;
a training module, configured to input each sample group into the corresponding classification model for training until a training condition is reached.
An embodiment of the present application discloses a device for data classification, comprising:
a receiving module, configured to receive characteristic data to be classified;
an input module, configured to input the characteristic data to be classified into a first classification model;
a first determining module, configured to, in the case that the first classification model outputs a first class label, determine that the characteristic data to be classified belongs to the category corresponding to the first class label;
a second determining module, configured to, in the case that the first classification model outputs one of the remaining class labels other than the first class label, input the characteristic data to be classified into a second classification model and determine the category corresponding to the characteristic data to be classified according to the output result of the second classification model.
An embodiment of the present application discloses a computing device, comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor, when executing the instructions, implements the steps of the method for training a classification model or the method for data classification as described above.
An embodiment of the present application discloses a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method for training a classification model or the method for data classification as described above.
In the method and device for training a classification model and the method and device for data classification provided by the present application, the proportion of the number of each class label in the sample data set is counted, and the class labels in the sample data set are divided into at least two sample groups according to those proportions. Even when the class label proportions in the sample data set are unbalanced, the quality of the processed sample data set is greatly improved; each sample group is then input into the corresponding classification model for training until a training condition is reached, which ensures the training effect of the classification model and greatly improves the classification accuracy of the trained classification model in actual classification prediction.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of a computing device according to an embodiment of the present application;
Fig. 2 is a schematic flow chart of a method for training a classification model according to an embodiment of the present application;
Fig. 3 is a schematic flow chart of a method for training a classification model according to another embodiment of the present application;
Fig. 4 is a schematic flow chart of a method for data classification according to an embodiment of the present application;
Fig. 5 is a schematic flow chart of determining the category corresponding to the characteristic data to be classified in the present application;
Fig. 6 is a schematic structural diagram of a device for training a classification model according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a device for data classification according to an embodiment of the present application.
Detailed description of the embodiments
Many details are set forth in the following description in order to facilitate a full understanding of the present application. However, the present application can be implemented in many ways other than those described herein, and those skilled in the art can make similar extensions without departing from the spirit of the present application; therefore, the present application is not limited by the specific implementations disclosed below.
The terminology used in one or more embodiments of this specification is for the purpose of describing particular embodiments only and is not intended to limit one or more embodiments of this specification. The singular forms "a", "the", and "said" used in one or more embodiments of this specification and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" used in one or more embodiments of this specification refers to and includes any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various pieces of information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of one or more embodiments of this specification, "first" may also be referred to as "second", and similarly, "second" may also be referred to as "first". Depending on the context, the word "if" as used herein may be interpreted as "when", "while", or "in response to determining".
First, the terms involved in one or more embodiments of the present invention are explained.
Binary classification model: a model that performs binary classification on data. A binary classification model may be a generalized linear classifier that performs binary classification on data by means of supervised learning; based on the training samples, the linear classifier finds a hyperplane in the sample space that separates the two classes of samples.
Multi-class classification model: a model that can classify data belonging to multiple categories. A multi-class classification model may be a boosted tree model, in which many tree models are combined to form a very strong classifier; in other words, many weak classifiers are combined to form a strong classifier for use in multi-class problems.
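For illustration only, the two model types described above could be instantiated as follows. This is a minimal sketch assuming scikit-learn estimators (a logistic regression as the generalized linear binary classifier and gradient-boosted trees as the multi-class classifier); the application itself does not name any particular library or algorithm.

```python
# A minimal sketch of the two model types, assuming scikit-learn; the choice of
# estimators is an assumption for illustration, not prescribed by the application.
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

def make_binary_model():
    # Generalized linear classifier trained by supervised learning; it learns a
    # hyperplane that separates the two classes of samples.
    return LogisticRegression(max_iter=1000)

def make_multiclass_model():
    # Boosted tree model: many weak tree learners combined into one strong
    # classifier for the multi-class problem.
    return GradientBoostingClassifier(n_estimators=100)
```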
Fig. 1 shows a structural block diagram of a computing device 100 according to an embodiment of this specification. Components of the computing device 100 include, but are not limited to, a memory 110 and a processor 120. The processor 120 is connected to the memory 110 through a bus 130, and a database 150 is used to store data.
The computing device 100 further includes an access device 140, which enables the computing device 100 to communicate via one or more networks 160. Examples of these networks include the public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 140 may include one or more of any type of wired or wireless network interface (for example, a network interface card (NIC)), such as an IEEE 802.11 wireless local area network (WLAN) interface, a worldwide interoperability for microwave access (Wi-MAX) interface, an Ethernet interface, a universal serial bus (USB) interface, a cellular network interface, a Bluetooth interface, a near-field communication (NFC) interface, and so on.
In an embodiment of this specification, the above components of the computing device 100 and other components not shown in Fig. 1 may also be connected to each other, for example, through the bus. It should be understood that the structural block diagram of the computing device shown in Fig. 1 is for exemplary purposes only and is not a limitation on the scope of this specification. Those skilled in the art may add or replace other components as needed.
The computing device 100 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (for example, a tablet computer, a personal digital assistant, a laptop, a notebook, a netbook, etc.), a mobile phone (for example, a smartphone), a wearable computing device (for example, a smart watch, smart glasses, etc.) or other type of mobile device, or a stationary computing device such as a desktop computer or a PC. The computing device 100 may also be a mobile or stationary server.
The processor 120 may execute the steps of the method shown in Fig. 2. Fig. 2 is a schematic flow chart of a method for training a classification model according to an embodiment of the present application, including steps 202 to 206.
Step 202: acquire a sample data set, wherein the sample data set comprises at least three class labels and the characteristic data corresponding to each class label, and count the proportion of the number of each class label in the sample data set.
The sample data set is the training set used for training the classification model. The sample data set may be a text sample set or a picture sample set. For example, a class label in a text sample set may indicate the category of legal documents a company receives most often; the categories of legal documents may be litigation opinions, indictments, decisions not to prosecute, public prosecution opinions, protest documents, criminal protest documents, civil protest documents, administrative protest documents, and procuratorial suggestion documents.
The characteristic data corresponding to a class label is the data input into the classification model for computation. The characteristic data may be a company's historical dispute data, financial statement data, company category data, project data, funding data, technical data, market data, and business environment data.
It should be noted that, in the above example, the task of practical application after the classification model is trained is: according to the historical dispute data, financial statement data, company category data, and project data of the company in the above characteristic data, the classification model outputs the category of legal documents the company will receive most often within a period of time, i.e., it predicts which category of legal documents the company will receive most often within a period of time.
The proportion of the number of each class label in the sample data set is counted.
For example, the following is a schematic illustration using a sample data set that contains four class labels A, B, C, and D, as shown in Table 1.
Table 1

  Class label   Category                      Proportion
  A             Litigation opinion            70%
  B             Indictment                    15%
  C             Public prosecution opinion    10%
  D             Criminal protest document     5%
Class label A is the litigation opinion, and the characteristic data corresponds to class label A; it can be understood that the characteristic data of a company corresponds to class label A. The proportion of class label A is 70%, which means that in the sample data set, for 70% of the companies the legal documents received most often within a certain period of time are litigation opinions.
Class label B is the indictment, and the characteristic data corresponds to class label B; it can be understood that the characteristic data of a company corresponds to class label B. The proportion of class label B is 15%, which means that in the sample data set, for 15% of the companies the legal documents received most often within a certain period of time are indictments.
Class label C is the public prosecution opinion, with a proportion of 10%; class label D is the criminal protest document, with a proportion of 5%. Refer to the description above for details, which are not repeated here.
The proportions of the four class labels A, B, C, and D in the above sample data set are 70%, 15%, 10%, and 5%, respectively.
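As a minimal sketch of how the counting in step 202 could be done, assuming the sample data set is held as a list of (characteristic data, class label) pairs; this representation is chosen here for illustration and is not prescribed by the application:

```python
# Count the proportion of each class label in the sample data set (step 202).
# The (features, label) pair representation is an assumption for illustration.
from collections import Counter

def label_proportions(samples):
    counts = Counter(label for _, label in samples)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# Example mirroring the proportions of Table 1: 70% A, 15% B, 10% C, 5% D.
samples = [(None, "A")] * 70 + [(None, "B")] * 15 + [(None, "C")] * 10 + [(None, "D")] * 5
print(label_proportions(samples))  # {'A': 0.7, 'B': 0.15, 'C': 0.1, 'D': 0.05}
```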
Step 204: divide the class labels in the sample data set and their corresponding characteristic data into at least two sample groups according to the proportion of the number of each class label in the sample data set.
For example, the proportion of class label A in the above example reaches 70%, so the proportions of the four class labels are unbalanced. The above is only a schematic illustration of class label proportions and characteristic data; in a sample data set actually used for training, the number of class label types may reach ten or more and the imbalance of class label proportions is even more severe. The problem that imbalanced class label proportions affect the training effect of the classification model is solved by the following steps.
The first class label with the highest proportion among the class labels and its corresponding characteristic data are divided into a first sample group, and the remaining class labels other than the first class label and their corresponding characteristic data in the sample data set are divided into a second sample group.
In other words, the class label with the highest proportion and its corresponding characteristic data are divided into the first sample group; in the above example, class label A and its corresponding characteristic data form the first sample group, and the remaining class labels other than class label A and their corresponding characteristic data in the sample data set are divided into the second sample group, i.e., class labels B, C, and D and their corresponding characteristic data form the second sample group. The second sample group can also be understood as the non-first sample group; the first sample group may be assigned a positive class label and the second sample group a negative class label, and the positive and negative class labels may be set to 1 and 0.
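The division in step 204 could be sketched as follows, reusing the representation assumed above; the helper name and the 1/0 encoding of the positive and negative class labels are illustrative:

```python
# Split the samples into the first sample group (the single most frequent label)
# and the second sample group (every other label), as in step 204.
def split_by_top_label(samples, proportions):
    first_label = max(proportions, key=proportions.get)          # e.g. class label A
    first_group = [s for s in samples if s[1] == first_label]    # positive class, label 1
    second_group = [s for s in samples if s[1] != first_label]   # negative class, label 0
    return first_label, first_group, second_group
```

The first classification model is then trained on the binary task of distinguishing the first class label (1) from everything else (0), while the second classification model is trained only on the samples of the second sample group.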
Step 206: input each sample group into the corresponding classification model for training until a training condition is reached.
For example, in the above example, it is determined that the class label proportions in the second sample group are balanced, because the ratio of the proportion of class label B (15%) to the proportion of class label C (10%) is 1.5, which is lower than the trimming threshold of 3.5. The first sample group and the second sample group are therefore input into the first classification model for training until a training condition is reached, and the characteristic data divided into the second sample group and the corresponding class labels are input into the second classification model for training until a training condition is reached.
In the above embodiment of the present application, the class label with the highest proportion and its corresponding characteristic data are divided directly into the first sample group, and the remaining class labels other than the first class label with the highest proportion and their corresponding characteristic data in the sample data set are divided into the second sample group. In the above example, the overall proportions of the first class label of the first sample group and of all class labels of the second sample group are 70% and 30%, respectively, so the proportions of the first sample group and the second sample group tend toward balance, which improves the training effect of the first classification model. The characteristic data divided into the second sample group and the corresponding class labels are then input into the second classification model, in which the proportions of class labels B, C, and D are 15%, 10%, and 5%, respectively; the class label proportions within the second sample group likewise tend toward balance, which improves the training effect of the second classification model. Thus, even when the class label proportions in the sample data set are unbalanced, the quality of the processed sample data set is greatly improved, which ensures the training effect of the classification model; when the trained classification model is used for actual classification prediction, its classification accuracy is high, which guarantees the practical effect of applications in natural language processing fields such as text auditing, advertisement filtering, sentiment analysis, and pornographic content detection.
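The balance check mentioned above, which compares the ratio of the two largest label proportions in a group against the trimming threshold, could be sketched as follows; the function name and the default threshold of 3.5 taken from the example are assumptions for illustration:

```python
# Decide whether a group's class label proportions are balanced by comparing the
# largest and second-largest proportions against the trimming threshold.
def is_balanced(proportions, trimming_threshold=3.5):
    top_two = sorted(proportions.values(), reverse=True)[:2]
    if len(top_two) < 2:
        return True
    return top_two[0] / top_two[1] <= trimming_threshold

# Second sample group of Table 1: 0.15 / 0.10 = 1.5 <= 3.5, so it is balanced.
print(is_balanced({"B": 0.15, "C": 0.10, "D": 0.05}))  # True
```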
The following describes an embodiment of the present application in detail with reference to a specific example.
Assume that the acquired sample data set contains five class labels A, B, C, D, and E and their corresponding characteristic data.
The proportion of the number of each class label in the sample data set is counted. Table 2 shows the proportion of the number of each of the five class labels in the sample data set.
Table 2

  Class label   Category                          Proportion
  A             Litigation opinion                50%
  B             Indictment                        30%
  C             Public prosecution opinion        8%
  D             Criminal protest document         7%
  E             Administrative protest document   5%
The class labels in the sample data set are divided into at least two sample groups according to the proportion of the number of each class label in the sample data set.
The first class label with the highest proportion among the class labels and its corresponding characteristic data are divided into a first sample group, and the remaining class labels other than the first class label and their corresponding characteristic data in the sample data set are divided into a second sample group: class label A and its corresponding characteristic data form the first sample group.
The remaining class labels other than class label A and their corresponding characteristic data in the sample data set are divided into the second sample group, i.e., class labels B, C, D, and E and their corresponding characteristic data form the second sample group.
Among the remaining class labels other than class label A, i.e., among class labels B, C, D, and E, the ratio of the proportion of the class label with the highest proportion, B, to the proportion of the class label with the second highest proportion, C, exceeds the trimming threshold: the ratio of the proportion of class label B (30%) to the proportion of class label C (8%) is 3.75, which exceeds the trimming threshold of 3.5. It is therefore determined that the class label proportions in the second sample group are unbalanced, so the second class label with the highest proportion among the class labels of the second sample group and its corresponding characteristic data are divided into a third sample group, i.e., class label B and its corresponding characteristic data form the third sample group, and class labels C, D, and E and their corresponding characteristic data are divided into a fourth sample group.
Since the class labels in the fourth sample group are balanced, the fourth sample group is not divided further; of course, if the fourth sample group were unbalanced, it would continue to be divided.
The proportion of class label A in the first sample group is 50%, and the total proportion of class labels B, C, D, and E in the second sample group is 50%; the first sample group and the second sample group are input into a binary classification model for training until a training condition is reached.
The proportion of class label B in the third sample group is 30%, and the total proportion of class labels C, D, and E in the fourth sample group is 20%; the third sample group and the fourth sample group are input into another binary classification model for training until a training condition is reached.
The proportions of class labels C, D, and E in the fourth sample group are 8%, 7%, and 5%, respectively; the characteristic data divided into the fourth sample group and the corresponding class labels are input into a multi-class classification model for training until a training condition is reached.
In the above example, the class label proportions of the first and second sample groups used to train the first binary classification model are balanced, the class label proportions of the third and fourth sample groups used to train the other binary classification model likewise tend toward balance, and the proportions of the class labels in the fourth sample group used to train the multi-class classification model also tend toward balance. Because the quality of the processed sample data set is greatly improved, the training effect of the classification models is ensured, and the classification accuracy of the trained classification models in actual classification prediction is greatly improved.
It should be noted that, based on the class label proportions in Table 2, the classification models to be trained comprise two binary classification models and one multi-class classification model; in actual classification model training, the types and number of classification models are determined according to the class label proportions in the sample data set.
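The grouping of the Table 2 example could be sketched as the following loop, which reuses the label_proportions, split_by_top_label, and is_balanced helpers sketched above and again assumes scikit-learn estimators; all names and the data layout are illustrative, and the application does not mandate this particular structure:

```python
# Repeatedly split off the most frequent label for a binary model and recurse on the
# remainder until the remainder is balanced, then train one multi-class model on it.
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier

def build_model_cascade(samples, trimming_threshold=3.5):
    models = []        # list of ("binary", top_label, model) plus a final ("multi", None, model)
    remaining = list(samples)
    while True:
        props = label_proportions(remaining)
        top_label, _, rest_group = split_by_top_label(remaining, props)
        # Binary model: top label (1) versus all remaining labels (0).
        X = [features for features, _ in remaining]
        y = [1 if label == top_label else 0 for _, label in remaining]
        models.append(("binary", top_label, LogisticRegression(max_iter=1000).fit(X, y)))
        remaining = rest_group
        if is_balanced(label_proportions(remaining), trimming_threshold):
            break
    # Multi-class model over the final, balanced group (labels C, D, E in the Table 2 example).
    X = [features for features, _ in remaining]
    y = [label for _, label in remaining]
    models.append(("multi", None, GradientBoostingClassifier().fit(X, y)))
    return models
```

On the Table 2 proportions this produces two binary models (A versus the rest, then B versus the rest) and one multi-class model over C, D, and E, matching the example.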
Fig. 3 shows a schematic flow chart of a method for training a classification model according to another embodiment of the present application, including steps 302 to 310.
Step 302: acquire a sample data set, wherein the sample data set comprises at least three class labels and the characteristic data corresponding to each class label, and count the proportion of the number of each class label in the sample data set.
Step 304: set a first threshold, and delete from the sample data set the class labels whose proportion is lower than the first threshold and their corresponding characteristic data.
For example, the first threshold is set to 1%, and the class labels whose proportion is less than 1%, i.e., lower than the first threshold, and their corresponding characteristic data are deleted from the sample data set.
Since class labels whose proportion is lower than the first threshold and their corresponding characteristic data have very little influence on model training, deleting from the sample data set the class labels whose proportion is lower than the first threshold ensures that the class label proportions in the sample data set tend toward balance, which improves the overall effect of classification model training in the following steps.
Step 306: set a second threshold, wherein the second threshold is greater than the first threshold, and merge the class labels whose proportion lies between the first threshold and the second threshold into a combined class label.
The second threshold may be set to 5%; class labels whose proportion lies between the first threshold of 1% and the second threshold of 5% are merged into a combined class label, which further improves the balance of class labels in the sample data set. By setting the first threshold and the second threshold and processing the sample data set accordingly, the training effect of the classification model in the following steps is greatly improved.
It should be noted that the specific values of the first threshold and the second threshold may be determined according to the number of class labels and the proportions of the class labels in the actual sample data set.
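Steps 304 and 306 could be sketched as a single preprocessing pass over the sample data set, reusing the label_proportions helper from above; the 1% and 5% values follow the example, and the name of the combined class label as well as the treatment of the threshold boundaries are illustrative assumptions:

```python
# Delete labels below the first threshold (step 304) and merge labels between the
# first and second thresholds into one combined class label (step 306).
COMBINED_LABEL = "combined"

def filter_and_merge(samples, first_threshold=0.01, second_threshold=0.05):
    props = label_proportions(samples)
    cleaned = []
    for features, label in samples:
        p = props[label]
        if p < first_threshold:
            continue                    # step 304: drop rare class labels entirely
        if p < second_threshold:
            label = COMBINED_LABEL      # step 306: merge into the combined class label
        cleaned.append((features, label))
    return cleaned
```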
Step 308: divide the first class label with the highest proportion among the class labels and its corresponding characteristic data into a first sample group, and divide the remaining class labels other than the first class label and their corresponding characteristic data in the sample data set into a second sample group.
The first class label with the highest proportion and its corresponding characteristic data form the first sample group; the remaining class labels other than the first class label in the sample data set, together with the combined class label, are divided into the second sample group.
Step 310: input the first sample group and the second sample group into a binary classification model for training until a training condition is reached, and input the characteristic data divided into the negative samples (the second sample group) and the corresponding class labels into a multi-class classification model for training until a training condition is reached.
In the above embodiment of the present application, on the one hand, the class label with the highest proportion and its corresponding characteristic data are divided directly into the first sample group, and the remaining class labels other than the first class label with the highest proportion and their corresponding characteristic data in the sample data set are divided into the second sample group; on the other hand, the class labels whose proportion is lower than the first threshold are deleted from the sample data set, and the class labels whose proportion lies between the first threshold and the second threshold are merged into a combined class label, which improves the balance of class labels in the sample data set. Thus, even when the class label proportions in the sample data set are unbalanced, the present application can still ensure the training effect of the classification model, greatly improve the accuracy of the predictions of the trained classification model, and guarantee the practical effect of applications in natural language processing fields such as text auditing, advertisement filtering, sentiment analysis, and pornographic content detection.
Fig. 4 is a schematic flow chart of a method for data classification according to an embodiment of the present application, including steps 402 to 408.
Step 402: receive characteristic data to be classified.
In the above example, for instance, to predict the category of legal documents a company will receive most often within the coming period of time, the company's historical dispute data, financial statement data, company category data, project data, funding data, technical data, marketing data, and business environment data are received.
Step 404: input the characteristic data to be classified into the first classification model.
Step 406: in the case that the first classification model outputs the first class label, determine that the characteristic data to be classified belongs to the category corresponding to the first class label.
Step 408: in the case that the first classification model outputs one of the remaining class labels other than the first class label, input the characteristic data to be classified into the second classification model, and determine the category corresponding to the characteristic data to be classified according to the output result of the second classification model.
As shown in Fig. 5, step 408 specifically includes steps 502 to 506.
Step 502: judge whether the characteristic data to be classified belongs to a combined class; if so, execute step 504; if not, execute step 506.
Step 504: determine the category corresponding to the characteristic data to be classified according to the categories corresponding to the characteristic data in the combined class.
Step 504 includes steps 5042 to 5044.
Step 5042: obtain the proportions, in the sample data set, of the class labels corresponding to the at least two categories in the combined class.
The proportion, in the sample data set, of the class label corresponding to each category in the combined class is used to determine the probability that the company will receive mostly that category of legal documents within the coming period of time.
Step 5044: determine one category in the combined class as the category of the characteristic data to be classified.
According to the probability corresponding to each category in the combined class, one category in the combined class is randomly selected as the category of legal documents the company will receive most often within the coming period of time, which can further improve the accuracy of classifying the company.
Step 506: take the output result of the second classification model as the category corresponding to the characteristic data to be classified.
In the above embodiment of the present application, the characteristic data to be classified is input into the first classification model; if the output result of the first classification model is the first class label, the category of the characteristic data to be classified is determined directly; if the output result of the first classification model is one of the remaining class labels other than the first class label, the characteristic data to be classified is input into the second classification model, and the category corresponding to the characteristic data to be classified is determined according to the output result of the second classification model, which greatly improves the accuracy of data classification.
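The two-stage prediction of Fig. 4 could be sketched as follows, assuming the first classification model is a binary classifier trained on the task of first class label versus the rest, as in the training sketches above, and the second classification model handles the remaining labels; all names are illustrative:

```python
# Two-stage prediction: the first (binary) model either settles the prediction or
# hands the sample over to the second (multi-class) model.
def classify(features, first_model, second_model, first_label):
    if first_model.predict([features])[0] == 1:   # first model outputs the first class label
        return first_label
    # Otherwise fall through to the second classification model.
    return second_model.predict([features])[0]
```

In the hierarchical case with more than two models, the same fall-through pattern is simply chained from one stage to the next.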
To facilitate understanding of the technical solution of the present application, the following illustrates the specific implementation process of the method for model training and the method for data classification of the present application with an example.
Assume that the acquired sample data set contains seven class labels A, B, C, D, E, F, and G and their corresponding characteristic data.
The proportion of the number of each class label in the sample data set is counted. Table 3 shows the proportion of the number of each of the seven class labels in the sample data set.
Table 3
For example, the first threshold is set to 1%. The proportion of class label G is 0.5%, which is less than 1%, so class label G, whose proportion is lower than the first threshold, and its corresponding characteristic data are deleted from the sample data set.
The second threshold is set to 5%. The proportion of class label E is 2% and the proportion of class label F is 1%, both lying between the first threshold of 1% and the second threshold of 5%, so class label E and class label F are merged into a combined class label.
Among the class labels, the first class label with the highest proportion is A. Class label A and its corresponding characteristic data form the first sample group, and the remaining class labels other than class label A and their corresponding characteristic data in the sample data set are divided into the second sample group, i.e., class labels B, C, and D, the combined class label, and their respective corresponding characteristic data form the second sample group.
The first sample group and the second sample group are input into a binary classification model for training until a training condition is reached.
The characteristic data divided into the second sample group and the corresponding class labels are input into a multi-class classification model for training until a training condition is reached, i.e., class labels B, C, and D, the combined class label, and their corresponding characteristic data in the second sample group are input into the multi-class classification model, completing the training of the classification models.
The following illustrates the method for data classification, taking the classification models trained on the above sample data set as an example.
Suppose it is now required to predict which category of legal documents a company will receive most often within the coming period of time.
Characteristic data to be classified is received; the characteristic data to be classified may be the company's historical dispute data, financial statement data, company category data, project data, funding data, technical data, marketing data, and business environment data.
The characteristic data to be classified is input into the first classification model.
If the binary classification model outputs class label A, the category corresponding to the characteristic data to be classified is directly determined to be the litigation opinion, i.e., the category of legal documents the company will receive most often within the coming period of time is the litigation opinion.
If the output result of the binary classification model is one of the remaining class labels other than class label A, the characteristic data to be classified is input into the multi-class classification model, and the category corresponding to the characteristic data to be classified is determined according to the output result of the multi-class classification model; for example, the category of legal documents the company will receive most often within the coming period of time is the indictment, the public prosecution opinion, or the criminal protest document.
If the output result of the multi-class classification model is the combined class, then, according to the proportions in the sample data set of the class labels corresponding to the civil protest document and the administrative protest document in the combined class, which are 2% and 1% respectively, the probability that the predicted category is the civil protest document is 2/3 and the probability that it is the administrative protest document is 1/3, and the category of legal documents the company will receive most often within the coming period of time is predicted at random according to the probabilities of the two categories.
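The probability-weighted random selection for a combined class, as in the 2/3 versus 1/3 example above, could be sketched as follows; the function and label names are illustrative assumptions:

```python
# Resolve a combined-class prediction by drawing one member category at random,
# weighted by each category's proportion in the sample data set (steps 5042-5044).
import random

def resolve_combined_class(member_proportions):
    labels = list(member_proportions)
    weights = list(member_proportions.values())
    return random.choices(labels, weights=weights, k=1)[0]

# Civil protest document (2%) versus administrative protest document (1%),
# i.e. selection probabilities of 2/3 and 1/3.
print(resolve_combined_class({"civil protest document": 0.02,
                              "administrative protest document": 0.01}))
```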
Fig. 6 is a schematic structural diagram of a device for training a classification model according to an embodiment of the present application. The device for training a classification model comprises:
a processing module 602, configured to acquire a sample data set, wherein the sample data set comprises at least three class labels and the characteristic data corresponding to each class label, and to count the proportion of the number of each class label in the sample data set;
a division module 604, configured to divide the class labels in the sample data set and their corresponding characteristic data into at least two sample groups according to the proportion of the number of each class label in the sample data set;
a training module 606, configured to input each sample group into the corresponding classification model for training until a training condition is reached.
Preferably, the division module 604 is further configured to divide the first class label with the highest proportion among the class labels and its corresponding characteristic data into a first sample group, and to divide the remaining class labels other than the first class label and their corresponding characteristic data in the sample data set into a second sample group.
Preferably, the training module 606 is further configured to input the first sample group and the second sample group into a first classification model for training until a training condition is reached, and to input the characteristic data divided into the second sample group and the corresponding class labels into a second classification model for training until a training condition is reached.
In the device for training a classification model of the present application, the class label with the highest proportion and its corresponding characteristic data are divided directly into the first sample group, and the remaining class labels other than the first class label with the highest proportion and their corresponding characteristic data in the sample data set are divided into the second sample group; the proportions of the first class label in the first sample group and of all class labels in the second sample group tend toward balance, which improves the training effect of the first classification model. The characteristic data of the negative samples divided into the second sample group and the corresponding class labels are then input into the second classification model; the class label proportions within the negative samples likewise tend toward balance, which improves the training effect of the second classification model, ensures the training effect of the classification models, and improves the accuracy of the predictions of the trained classification models.
Preferably, the division module 604 is further configured to divide the first class label with the highest proportion among the class labels and its corresponding characteristic data into a first sample group, and to divide the remaining class labels other than the first class label and their corresponding characteristic data in the sample data set into a second sample group;
and, in the case that the class label proportions in the second sample group are determined to be unbalanced according to the proportions in the sample data set of the remaining class labels other than the first class label, to divide the second class label with the highest proportion among the class labels of the second sample group and its corresponding characteristic data into a third sample group, and to divide the remaining class labels other than the second class label among the class labels of the second sample group and their corresponding characteristic data into a fourth sample group.
The training module 606 is further configured to input the first sample group and the second sample group into a binary classification model for training until a training condition is reached; to input the third sample group and the fourth sample group into a binary classification model for training until a training condition is reached; and to input the fourth sample group into a multi-class classification model for training until a training condition is reached.
Preferably, the device for training a classification model further comprises:
a deletion module, configured to set a first threshold, and to delete from the sample data set the class labels whose proportion is lower than the first threshold and their corresponding characteristic data.
Preferably, the device for training a classification model further comprises:
a merging module, configured to set a second threshold, wherein the second threshold is greater than the first threshold, and to merge the class labels whose proportion lies between the first threshold and the second threshold into a combined class label.
The training module 606 is further configured to input the first sample group and the second sample group into a binary classification model for training until a training condition is reached.
The training module 606 is further configured to input the characteristic data divided into the second sample group and the corresponding class labels into a multi-class classification model for training until a training condition is reached.
Fig. 7 is a schematic structural diagram of a device for data classification according to an embodiment of the present application. The device for data classification comprises:
a receiving module 702, configured to receive characteristic data to be classified;
an input module 704, configured to input the characteristic data to be classified into a first classification model;
a first determining module 706, configured to, in the case that the first classification model outputs a first class label, determine that the characteristic data to be classified belongs to the category corresponding to the first class label;
a second determining module 708, configured to, in the case that the first classification model outputs one of the remaining class labels other than the first class label, input the characteristic data to be classified into a second classification model and determine the category corresponding to the characteristic data to be classified according to the output result of the second classification model.
Preferably, the second determining module 708 is further configured to judge whether the characteristic data to be classified belongs to a combined class;
if so, to determine the category corresponding to the characteristic data to be classified according to the categories corresponding to the characteristic data in the combined class;
if not, to take the output result of the second classification model as the category corresponding to the characteristic data to be classified.
Preferably, the second determining module 708 is further configured to obtain the proportions, in the sample data set, of the class labels corresponding to the at least two categories in the combined class;
and to determine one category in the combined class as the category of the characteristic data to be classified.
In the above device for data classification of the present application, the characteristic data to be classified is input into the first classification model; if the output result of the first classification model is the first class label, the category of the characteristic data to be classified is determined directly; if the output result of the first classification model is one of the remaining class labels other than the first class label, the characteristic data to be classified is input into the second classification model, and the category corresponding to the characteristic data to be classified is determined according to the output result of the second classification model, which greatly improves the accuracy of data classification.
An embodiment of the present application also provides a computing device, comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor, when executing the instructions, implements the steps of the method for training a classification model or the method for data classification as described above.
An embodiment of the present application also provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method for training a classification model or the method for data classification as described above.
The above is an exemplary scheme of the computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium and the technical solution of the above method for training a classification model or method for data classification belong to the same concept; for details of the technical solution of the storage medium that are not described in detail, refer to the description of the technical solution of the above method for training a classification model or method for data classification.
An embodiment of the present application also provides a chip storing computer instructions which, when executed by a processor, implement the steps of the method for training a classification model or the method for data classification as described above.
The computer instructions comprise computer program code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electric carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electric carrier signals and telecommunication signals.
It should be noted that, for the sake of brevity, the foregoing method embodiments are described as a series of combinations of actions, but those skilled in the art should understand that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present application.
In the above embodiments, each embodiment has its own emphasis; for parts not described in detail in a certain embodiment, refer to the related descriptions of other embodiments.
The preferred embodiments of the present application disclosed above are only intended to help illustrate the present application. The alternative embodiments do not describe all the details exhaustively, nor do they limit the invention to the specific implementations described. Obviously, many modifications and variations can be made according to the content of this specification. These embodiments were selected and specifically described in order to better explain the principles and practical applications of the present application, so that those skilled in the art can better understand and use the present application. The present application is limited only by the claims and their full scope and equivalents.

Claims (14)

1. a kind of method of disaggregated model training characterized by comprising
Sample data set is obtained, the sample data set includes at least three kinds of class labels and the corresponding characteristic of class label According to counting the accounting that the quantity of each class label is concentrated in the sample data;
According to the accounting that the quantity of each class label is concentrated in the sample data, the classification mark that the sample data is concentrated Label and its corresponding characteristic are divided at least two sample groups;
The sample group is input in corresponding disaggregated model and is trained until reaching training condition.
2. the method according to claim 1, wherein according to the quantity of each class label in the sample data Class label that the sample data is concentrated and its corresponding characteristic are divided at least two samples by the accounting of concentration Group, comprising:
The highest first category label of accounting in the class label and its corresponding characteristic are divided into first sample group, Remaining class label and its corresponding characteristic in addition to first category label is concentrated to be divided into second the sample data Sample group;
The sample group is input in corresponding disaggregated model and is trained until reaching training condition, comprising:
In the case that class label accounting is balanced in determining second sample group, by the first sample group and the second sample Group is input to the first disaggregated model and is trained until reach training condition, will be divided into the characteristic of the second sample group and right The class label answered, which is input in the second disaggregated model, to be trained until reaching training condition.
3. The method according to claim 1, characterized in that dividing the class labels in the sample data set and their corresponding characteristic data into at least two sample groups according to the proportion that the quantity of each class label accounts for in the sample data set comprises:
dividing a first category label with the highest proportion among the class labels and its corresponding characteristic data into a first sample group, and dividing the remaining class labels in the sample data set other than the first category label and their corresponding characteristic data into a second sample group;
in the case where the class-label proportions in the second sample group are determined to be unbalanced, dividing a second category label with the highest proportion among the class labels corresponding to the second sample group and its corresponding characteristic data into a third sample group, and dividing the remaining class labels corresponding to the second sample group other than the second category label and their corresponding characteristic data into a fourth sample group;
and inputting the sample groups into corresponding classification models for training until the training condition is reached comprises:
in the case where the class-label proportions in the fourth sample group are determined to be balanced, inputting the first sample group and the second sample group into a two-class classification model for training until the training condition is reached;
inputting the third sample group and the fourth sample group into a two-class classification model for training until the training condition is reached;
inputting the fourth sample group into a multi-class classification model for training until the training condition is reached.
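Claim 3 extends the same idea one level deeper when the second sample group is still unbalanced. The sketch below, again an assumption-laden illustration rather than the patented implementation, trains two two-class routers plus one multi-class model; it requires the fourth group to contain at least two distinct labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_two_level_cascade(X, y, first_label, second_label):
    X, y = np.asarray(X), np.asarray(y)

    # Two-class model separating the first sample group from the second.
    router_1 = LogisticRegression().fit(X, (y == first_label).astype(int))

    # Two-class model separating the third sample group from the fourth.
    rest = y != first_label
    router_2 = LogisticRegression().fit(
        X[rest], (y[rest] == second_label).astype(int))

    # Multi-class model trained on the fourth sample group alone.
    tail = rest & (y != second_label)
    tail_model = LogisticRegression().fit(X[tail], y[tail])
    return router_1, router_2, tail_model
```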
4. The method according to claim 1, characterized in that, after counting the proportion that the quantity of each class label accounts for in the sample data set, the method further comprises:
setting a first threshold;
deleting the class labels whose proportion in the sample data set is lower than the first threshold, together with their corresponding characteristic data.
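As an illustration of claim 4, the helper below drops every class label (and its samples) whose share falls below a first threshold; the 1% default is a hypothetical value, since the claim leaves the threshold unspecified.

```python
from collections import Counter

def drop_rare_labels(samples, first_threshold=0.01):
    # Keep only the labels whose share of the sample set reaches the threshold.
    counts = Counter(label for _, label in samples)
    total = sum(counts.values())
    keep = {label for label, n in counts.items() if n / total >= first_threshold}
    return [(x, y) for x, y in samples if y in keep]
```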
5. The method according to claim 2, characterized in that, before dividing the remaining class labels in the sample data set other than the first category label and their corresponding characteristic data into the second sample group, the method further comprises:
setting a second threshold, the second threshold being greater than the first threshold;
merging the class labels whose proportions lie between the first threshold and the second threshold into a combined class label.
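Claim 5 can be read as relabelling the mid-frequency classes with a single combined class label before the split. The sketch below does exactly that; the threshold values and the "__combined__" label name are illustrative assumptions.

```python
from collections import Counter

def merge_mid_share_labels(samples, first_threshold=0.01,
                           second_threshold=0.05,
                           combined_label="__combined__"):
    counts = Counter(label for _, label in samples)
    total = sum(counts.values())
    share = {label: n / total for label, n in counts.items()}

    # Labels whose share lies between the two thresholds are merged.
    merged = {label for label, s in share.items()
              if first_threshold <= s < second_threshold}
    relabelled = [(x, combined_label if y in merged else y) for x, y in samples]
    return relabelled, merged
```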
6. The method according to claim 2, characterized in that inputting the first sample group and the second sample group into the first classification model for training until the training condition is reached comprises:
inputting the first sample group and the second sample group into a two-class classification model for training until the training condition is reached.
7. The method according to claim 2 or 6, characterized in that inputting the characteristic data divided into the second sample group and the corresponding class labels into the second classification model for training until the training condition is reached comprises:
inputting the characteristic data divided into the second sample group and the corresponding class labels into a multi-class classification model for training until the training condition is reached.
8. A method of data classification, characterized by comprising:
receiving characteristic data to be classified;
inputting the characteristic data to be classified into a first classification model;
in the case where the output of the first classification model is a first category label, determining that the characteristic data to be classified belongs to the category corresponding to the first category label;
in the case where the output of the first classification model is a remaining class label other than the first category label, inputting the characteristic data to be classified into a second classification model, and determining the category corresponding to the characteristic data to be classified according to the output result of the second classification model.
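At inference time, claim 8 routes each sample through the two models in turn. A minimal sketch, reusing the binary router trained in the earlier snippet (so a prediction of 1 is taken to mean the first category label), could look like this; a scikit-learn style predict() is assumed.

```python
def classify(x, first_model, second_model, first_label):
    # The first classification model decides whether the sample belongs to the
    # dominant first category; otherwise the second model decides among the rest.
    if first_model.predict([x])[0] == 1:
        return first_label
    return second_model.predict([x])[0]
```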
9. The method according to claim 8, characterized in that determining the category corresponding to the characteristic data to be classified according to the output result of the second classification model comprises:
judging whether the characteristic data to be classified belongs to a combined class;
if so, determining the category corresponding to the characteristic data to be classified according to the categories corresponding to the characteristic data in the combined class;
if not, taking the output result of the second classification model as the category corresponding to the characteristic data to be classified.
10. The method according to claim 9, characterized in that determining the category corresponding to the characteristic data to be classified according to the categories corresponding to the characteristic data in the combined class comprises:
determining, according to the proportions that the class labels corresponding to at least two categories in the combined class account for in the sample data set, a category in the combined class as the category of the characteristic data to be classified.
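Claims 9 and 10 leave open exactly how a prediction of the combined class is resolved; one plausible reading, sketched below under that assumption, is to fall back to the member class that held the largest share of the original sample data set.

```python
from collections import Counter

def resolve_combined(prediction, samples, merged_labels,
                     combined_label="__combined__"):
    if prediction != combined_label:
        return prediction
    # Pick the member of the combined class with the largest share of the
    # original sample data set (one reading of claim 10, not the only one).
    counts = Counter(y for _, y in samples if y in merged_labels)
    return counts.most_common(1)[0][0]
```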
11. An apparatus for training a classification model, characterized by comprising:
a processing module configured to obtain a sample data set, wherein the sample data set comprises at least three class labels and the characteristic data corresponding to each class label, and to count the proportion that the quantity of each class label accounts for in the sample data set;
a dividing module configured to divide the class labels in the sample data set and their corresponding characteristic data into at least two sample groups according to the proportion that the quantity of each class label accounts for in the sample data set;
a training module configured to input the sample groups into corresponding classification models for training until a training condition is reached.
12. An apparatus for data classification, characterized by comprising:
a receiving module configured to receive characteristic data to be classified;
an input module configured to input the characteristic data to be classified into a first classification model;
a first determining module configured to determine, in the case where the output of the first classification model is a first category label, that the characteristic data to be classified belongs to the category corresponding to the first category label;
a second determining module configured to, in the case where the output of the first classification model is a remaining class label other than the first category label, input the characteristic data to be classified into a second classification model and determine the category corresponding to the characteristic data to be classified according to the output result of the second classification model.
13. A computing device, comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, characterized in that the processor, when executing the instructions, implements the steps of the method according to any one of claims 1-7 or 8-10.
14. A computer-readable storage medium storing computer instructions, characterized in that the instructions, when executed by a processor, implement the steps of the method according to any one of claims 1-7 or 8-10.
CN201910746175.XA 2019-08-13 2019-08-13 Method and device for training classification model and method and device for data classification Active CN110442722B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910746175.XA CN110442722B (en) 2019-08-13 2019-08-13 Method and device for training classification model and method and device for data classification

Publications (2)

Publication Number Publication Date
CN110442722A true CN110442722A (en) 2019-11-12
CN110442722B CN110442722B (en) 2022-05-13

Family

ID=68435192

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910746175.XA Active CN110442722B (en) 2019-08-13 2019-08-13 Method and device for training classification model and method and device for data classification

Country Status (1)

Country Link
CN (1) CN110442722B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101980202A (en) * 2010-11-04 2011-02-23 西安电子科技大学 Semi-supervised classification method of unbalance data
CN105446988A (en) * 2014-06-30 2016-03-30 华为技术有限公司 Classification predicting method and device
CN106204083A (en) * 2015-04-30 2016-12-07 中国移动通信集团山东有限公司 A kind of targeted customer's sorting technique, Apparatus and system
CN107577785A (en) * 2017-09-15 2018-01-12 南京大学 A kind of level multi-tag sorting technique suitable for law identification
CN108090503A (en) * 2017-11-28 2018-05-29 东软集团股份有限公司 On-line tuning method, apparatus, storage medium and the electronic equipment of multi-categorizer
CN108470187A (en) * 2018-02-26 2018-08-31 华南理工大学 A kind of class imbalance question classification method based on expansion training dataset
CN108628971A (en) * 2018-04-24 2018-10-09 深圳前海微众银行股份有限公司 File classification method, text classifier and the storage medium of imbalanced data sets
CN109117862A (en) * 2018-06-29 2019-01-01 北京达佳互联信息技术有限公司 Image tag recognition methods, device and server
CN109376179A (en) * 2018-08-24 2019-02-22 苏宁消费金融有限公司 A kind of sample equilibrating method in data mining
CN109344884A (en) * 2018-09-14 2019-02-15 腾讯科技(深圳)有限公司 The method and device of media information classification method, training picture classification model
CN110111344A (en) * 2019-05-13 2019-08-09 广州锟元方青医疗科技有限公司 Pathological section image grading method, apparatus, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIANG Siyu et al., "A Multi-label Learning Model Combining Label Correlation and Imbalance" (结合标签相关性和不均衡性的多标签学习模型), Journal of Harbin Institute of Technology (哈尔滨工业大学学报) *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929785B (en) * 2019-11-21 2023-12-05 中国科学院深圳先进技术研究院 Data classification method, device, terminal equipment and readable storage medium
CN110929785A (en) * 2019-11-21 2020-03-27 中国科学院深圳先进技术研究院 Data classification method and device, terminal equipment and readable storage medium
CN110889457B (en) * 2019-12-03 2022-08-19 深圳奇迹智慧网络有限公司 Sample image classification training method and device, computer equipment and storage medium
CN110889457A (en) * 2019-12-03 2020-03-17 深圳奇迹智慧网络有限公司 Sample image classification training method and device, computer equipment and storage medium
CN111190973A (en) * 2019-12-31 2020-05-22 税友软件集团股份有限公司 Method, device, equipment and storage medium for classifying statement forms
CN111737520A (en) * 2020-06-22 2020-10-02 Oppo广东移动通信有限公司 Video classification method, video classification device, electronic equipment and storage medium
CN111737520B (en) * 2020-06-22 2023-07-25 Oppo广东移动通信有限公司 Video classification method, video classification device, electronic equipment and storage medium
CN112070138A (en) * 2020-08-31 2020-12-11 新华智云科技有限公司 Multi-label mixed classification model construction method, news classification method and system
CN112070138B (en) * 2020-08-31 2023-09-05 新华智云科技有限公司 Construction method of multi-label mixed classification model, news classification method and system
CN112434157B (en) * 2020-11-05 2024-05-17 平安直通咨询有限公司上海分公司 Method and device for classifying documents in multiple labels, electronic equipment and storage medium
CN112434157A (en) * 2020-11-05 2021-03-02 平安直通咨询有限公司上海分公司 Document multi-label classification method and device, electronic equipment and storage medium
CN113222043B (en) * 2021-05-25 2024-02-02 北京有竹居网络技术有限公司 Image classification method, device, equipment and storage medium
CN113240032A (en) * 2021-05-25 2021-08-10 北京有竹居网络技术有限公司 Image classification method, device, equipment and storage medium
CN113240032B (en) * 2021-05-25 2024-01-30 北京有竹居网络技术有限公司 Image classification method, device, equipment and storage medium
CN113222043A (en) * 2021-05-25 2021-08-06 北京有竹居网络技术有限公司 Image classification method, device, equipment and storage medium
CN113297382A (en) * 2021-06-21 2021-08-24 西南大学 Method for processing instrument and equipment function labeling
CN113297382B (en) * 2021-06-21 2023-04-25 西南大学 Instrument and equipment function labeling processing method
CN113673866A (en) * 2021-08-20 2021-11-19 上海寻梦信息技术有限公司 Crop decision method, model training method and related equipment
CN113723507A (en) * 2021-08-30 2021-11-30 联仁健康医疗大数据科技股份有限公司 Data classification identification determination method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN110442722B (en) 2022-05-13

Similar Documents

Publication Publication Date Title
CN110442722A (en) Method and device for training classification model and method and device for data classification
CN108108902B (en) Risk event warning method and device
CN107844559A (en) A kind of file classifying method, device and electronic equipment
CN107657267B (en) Product potential user mining method and device
CN106951925A (en) Data processing method, device, server and system
CN105389480B (en) Multiclass imbalance genomics data iteration Ensemble feature selection method and system
CN109872162B (en) Wind control classification and identification method and system for processing user complaint information
CN106651057A (en) Mobile terminal user age prediction method based on installation package sequence table
CN105095755A (en) File recognition method and apparatus
JP2019519042A (en) Method and device for pushing information
CN103984703B (en) Mail classification method and device
CN109960727B (en) Personal privacy information automatic detection method and system for unstructured text
US11429810B2 (en) Question answering method, terminal, and non-transitory computer readable storage medium
CN108280542A (en) A kind of optimization method, medium and the equipment of user's portrait model
CN102722713A (en) Handwritten numeral recognition method based on lie group structure data and system thereof
CN105045913B (en) File classification method based on WordNet and latent semantic analysis
WO2021136315A1 (en) Mail classification method and apparatus based on conjoint analysis of behavior structures and semantic content
CN106445908A (en) Text identification method and apparatus
CN107194815B (en) Client segmentation method and system
CN107958270A (en) Classification recognition methods, device, electronic equipment and computer-readable recording medium
CN110543898A (en) Supervised learning method for noise label, data classification processing method and device
CN104809104A (en) Method and system for identifying micro-blog textual emotion
CN114663002A (en) Method and equipment for automatically matching performance assessment indexes
CN109166012A (en) The method and apparatus of classification and information push for stroke predetermined class user
CN104850540A (en) Sentence recognizing method and sentence recognizing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant