CN106909654A - A multi-level classification system and method based on news text information - Google Patents

Publication number
CN106909654A
Authority
CN
China
Prior art keywords
classification
text information
news text
training
multi-level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710103541.0A
Other languages
Chinese (zh)
Other versions
CN106909654B (en)
Inventor
赵毅强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing time Ltd.
Original Assignee
Beijing Time Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Time Ltd By Share Ltd
Priority to CN201710103541.0A
Publication of CN106909654A
Application granted
Publication of CN106909654B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Abstract

The invention discloses a multi-level classification system and method based on news text information, and relates to the technical field of document classification. The system comprises: a training module, configured to train on a preset training sample set with multiple machine learning algorithms for each classification level of the news text information, and to determine, according to the training results, the number and type of classifiers corresponding to each classification level; a multi-level classification module, configured to configure a corresponding multi-level classification model according to the number and type of classifiers that the training module determined for each level; and a result determining module, configured to input the acquired news text information to be classified into the multi-level classification model and to take the output of the model as the final classification result of that news text information. The invention thereby specifically addresses the inaccurate classification results caused by imbalanced sample data, effectively improves classification accuracy, and improves classification efficiency.

Description

A multi-level classification system and method based on news text information
Technical field
The present invention relates to the technical field of document classification, and in particular to a multi-level classification system and method based on news text information.
Background
With the development of the Internet era, network resources have become increasingly abundant and increasingly varied. To retrieve and use the various resources on the network effectively, it is particularly important to classify these resources accurately and comprehensively. With the emergence and development of machine learning algorithms, more and more people have applied them to methods for classifying news text information.
However, in the course of implementing the present invention, the inventor found that the prior art has at least the following problem: in many concrete application scenarios, for various reasons, the sample data is distributed unevenly. When faced with imbalanced data, prior-art methods that classify news text information with a machine learning algorithm without distinguishing levels let the algorithm pay too much attention to majority-class samples, so that minority-class samples cannot be identified accurately, which lowers the overall accuracy of these classification methods.
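The imbalance problem described above can be illustrated with a small worked example. This is only a sketch with made-up numbers (950 majority-class and 50 minority-class samples, and the labels "sports" and "science" are assumptions, not data from the patent): a flat classifier that always predicts the majority class still looks accurate overall while never recognizing the minority class.

```python
# Hypothetical label counts: 950 majority-class and 50 minority-class samples.
labels = ["sports"] * 950 + ["science"] * 50

# A degenerate flat classifier that always predicts the majority class.
predictions = ["sports"] * len(labels)

# Overall accuracy: fraction of samples whose prediction matches the label.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def recall(cls):
    """Fraction of samples whose true label is `cls` that were predicted as `cls`."""
    relevant = [(p, y) for p, y in zip(predictions, labels) if y == cls]
    return sum(p == y for p, y in relevant) / len(relevant)


minority_recall = recall("science")
print(accuracy, minority_recall)
```

The overall accuracy hides the fact that no minority-class sample is recognized, which is the failure mode the multi-level scheme is meant to address.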
Summary of the invention
In view of the above problem, the present invention is proposed to provide a multi-level classification system based on news text information, and a corresponding method, that overcome the above problem or at least partially solve it.
According to one aspect of the invention, a multi-level classification system based on news text information is provided, comprising: a training module, configured to train on a preset training sample set with multiple machine learning algorithms for each classification level of the news text information, and to determine, according to the training results, the number and type of classifiers corresponding to each classification level; a multi-level classification module, configured to configure a corresponding multi-level classification model according to the number and type of classifiers that the training module determined for each level; and a result determining module, configured to input the acquired news text information to be classified into the multi-level classification model and to take the output of the model as the final classification result of that news text information.
According to another aspect of the invention, a multi-level classification method based on news text information is provided, comprising: for each classification level of the news text information, training on a preset training sample set with multiple machine learning algorithms, and determining, according to the training results, the number and type of classifiers corresponding to each classification level; configuring a corresponding multi-level classification model according to the number and type of classifiers corresponding to each level; and inputting the acquired news text information to be classified into the multi-level classification model and taking the output of the model as the final classification result of that news text information.
The invention thus provides a multi-level classification system and method based on news text information. By building a multi-level framework for classifying news text information, and by configuring different classifiers at each level according to the type of news text information, it specifically addresses the inaccurate classification results caused by imbalanced sample data, effectively improves the accuracy of news text classification, and improves its efficiency.
The above is only an overview of the technical solution of the invention. So that the technical means of the invention can be understood more clearly and practiced according to the specification, and so that the above and other objects, features and advantages of the invention become more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered a limitation of the invention. Throughout the drawings, identical parts are denoted by the same reference numerals. In the drawings:
Fig. 1 is a structural diagram of a multi-level classification system based on news text information provided by embodiment one of the invention;
Fig. 2 is a structural diagram of a multi-level classification system based on news text information provided by embodiment two of the invention;
Fig. 3 is a flow chart of a multi-level classification method based on news text information provided by embodiment three of the invention;
Fig. 4 is a flow chart of a multi-level classification method based on news text information provided by embodiment four of the invention;
Fig. 5 is a workflow diagram of the multi-level classification system based on news text information provided by embodiment two of the invention.
Detailed description
Exemplary embodiments of the disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and so that its scope can be conveyed completely to those skilled in the art.
The invention provides a multi-level classification system and method based on news text information that can at least solve the prior-art technical problem of inaccurate news text classification caused by data imbalance.
Embodiment one
Fig. 1 shows a multi-level classification system based on news text information provided by the invention. The system includes a training module 110, a multi-level classification module 120 and a result determining module 130.
The training module 110 is configured to train on a preset training sample set with multiple machine learning algorithms for each classification level of the news text information, and to determine, according to the training results, the number and type of classifiers corresponding to each classification level.
When news text information is classified, different items can be assigned to different categories according to their content. To make the classification accurate and fine-grained, a multi-level taxonomy can be used. The levels of such a taxonomy may be ordered by increasing abstraction of the categories or by decreasing abstraction. For convenience, and in keeping with common practice, this embodiment uses a three-level taxonomy whose level of abstraction decreases from level to level. For example, for the term "English Premier League", the first-level category is "sports", the second-level category is "international football", and the third-level category is "English Premier League". The invention does not specifically limit the number of levels of the taxonomy or the basis for classification; those skilled in the art can set them flexibly according to actual conditions.
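As an illustration of the three-level taxonomy, the following minimal sketch stores the categories as nested dictionaries and recovers the full path of a third-level category. Only the path sports / international football / English Premier League comes from the text; every other category name, and the names `TAXONOMY` and `category_path`, are hypothetical.

```python
# Three-level taxonomy with decreasing abstraction: level 1 -> level 2 -> level 3.
# Only "sports" / "international football" / "English Premier League" is taken
# from the description; the remaining entries are made-up placeholders.
TAXONOMY = {
    "sports": {
        "international football": ["English Premier League", "La Liga"],
        "basketball": ["NBA"],
    },
    "finance": {
        "stocks": ["A-shares"],
    },
}


def category_path(leaf):
    """Return (level1, level2, level3) for a third-level category, or None."""
    for level1, subtree in TAXONOMY.items():
        for level2, leaves in subtree.items():
            if leaf in leaves:
                return (level1, level2, leaf)
    return None


print(category_path("English Premier League"))
```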
Data imbalance is frequently encountered when classifying news text information. If a single classification algorithm is used to classify all of the data, the characteristics of that algorithm may cause it to pay too much attention to one part of the sample data, so that another part cannot be identified accurately, which lowers the overall classification accuracy of the system. To overcome this problem, this embodiment provides a multi-level system for classifying news text information, and a corresponding classifier is set on each node of each level of the system. These classifiers include, but are not limited to, a root node classifier, leaf node classifiers and intermediate node classifiers. In a specific application, the classifiers on the nodes may use the same classification algorithm or different ones; preferably, different algorithms are selected according to the characteristics of the data corresponding to the different nodes of the different levels.
Specifically, in the scheme provided by this embodiment, a corresponding training sample set must be preset for each node of each level, and the data in each training sample set should contain all, or at least most, of the features of the data of the corresponding node's categories. The training module 110 trains on the training sample set of each node with multiple classification algorithms and selects the best algorithm for each node, thereby determining the number and type of classifiers corresponding to each classification level.
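The per-node selection step can be sketched as follows. This is not the patented training procedure: the two "algorithms" are deliberately tiny stand-ins (a majority-class baseline and a keyword-voting learner), the held-out scoring is an assumption, and all names and sample data are hypothetical. The point is only that each node is trained with every candidate and keeps whichever scores best on its own data.

```python
from collections import Counter, defaultdict


def train_majority(samples):
    """Baseline learner: always predict the most frequent training label."""
    top = Counter(label for _, label in samples).most_common(1)[0][0]
    return lambda words: top


def train_keyword_vote(samples):
    """Toy learner: each word votes for the labels it co-occurred with."""
    votes = defaultdict(Counter)
    for words, label in samples:
        for w in words:
            votes[w][label] += 1
    fallback = train_majority(samples)

    def predict(words):
        tally = Counter()
        for w in words:
            tally.update(votes.get(w, {}))
        # Fall back to the majority label when no word was seen in training.
        return tally.most_common(1)[0][0] if tally else fallback(words)

    return predict


def accuracy(predict, samples):
    return sum(predict(words) == label for words, label in samples) / len(samples)


def select_per_node(trainers, node_data):
    """For each node, train every candidate and keep the best-scoring one."""
    chosen = {}
    for node, (train, holdout) in node_data.items():
        scored = [(accuracy(fit(train), holdout), name) for name, fit in trainers.items()]
        chosen[node] = max(scored)[1]
    return chosen


TRAINERS = {"majority": train_majority, "keyword_vote": train_keyword_vote}

# Hypothetical per-node (training, held-out) samples: (words, label) pairs.
NODE_DATA = {
    "root": (
        [(["goal", "match"], "sports"), (["stock", "market"], "finance"),
         (["match", "league"], "sports")],
        [(["goal"], "sports"), (["stock"], "finance")],
    ),
    "sports": (
        [(["team"], "football"), (["coach"], "football"), (["final"], "basketball")],
        [(["final"], "football"), (["team"], "football")],
    ),
}

chosen = select_per_node(TRAINERS, NODE_DATA)
print(chosen)
```

In this toy data the two nodes end up with different learners, which mirrors the idea that the best algorithm depends on the data characteristics of each node.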
To further improve the classification accuracy of the multi-level classification system, the classification algorithms in this embodiment are preferably machine learning algorithms, which include, but are not limited to, support vector machines, convolutional neural networks and recurrent neural networks. Each algorithm has its own strengths and weaknesses, so the invention does not specifically limit the machine learning algorithm used at a node; those skilled in the art can choose according to the results obtained in practice.
The multi-level classification module 120 is configured to configure a corresponding multi-level classification model according to the number and type of classifiers corresponding to each classification level.
The multi-level classification model is a hybrid model that combines several algorithms: it contains the different classification algorithms used by the classifiers on all nodes of the system, and the connections between the classifiers are recorded in a configuration file. In this embodiment, the multi-level classification module 120 configures the multi-level classification model according to the number and type of classifiers that the training module 110 determined for each level, and generates a configuration file that records the information of each node classifier. After news text information to be classified is input into the multi-level classification model, the multi-level classification module 120 queries the configuration file with the output of the current node classifier to determine that classifier's next-level node classifier. The multi-level classification model is preferably a tree-shaped classification model containing node classifiers at multiple levels.
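The lookup of the next-level node classifier can be sketched as a walk down the tree. The configuration layout, the node names and the stand-in classifiers below are all assumptions made for illustration; the patent does not specify a file format. A value of `None` marks an output that has no further node classifier.

```python
# Hypothetical configuration: for each node classifier, map each of its outputs
# to the name of the next-level node classifier (None means "stop here").
CONFIG = {
    "root": {"sports": "sports_node", "finance": None},
    "sports_node": {"international football": "football_node"},
    "football_node": {"English Premier League": None},
}

# Stand-in classifiers keyed by node name; real ones would be trained models.
CLASSIFIERS = {
    "root": lambda text: "sports" if "match" in text else "finance",
    "sports_node": lambda text: "international football",
    "football_node": lambda text: "English Premier League",
}


def classify(text, config=CONFIG, classifiers=CLASSIFIERS):
    """Run the node classifiers from the root down, following the configuration."""
    path, node = [], "root"
    while node is not None:
        label = classifiers[node](text)
        path.append(label)
        # Query the configuration with the current output to find the next node.
        node = config[node].get(label)
    return path


print(classify("Premier League match report"))
```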
The result determining module 130 is configured to input the acquired news text information to be classified into the multi-level classification model and to take the output of the model as the final classification result of that news text information.
Specifically, the result determining module 130 inputs the acquired news text information to be classified into the multi-level classification model in the multi-level classification module 120. The multi-level classification module 120 identifies and classifies the news text information with its built-in classifiers and passes the classification result to the result determining module 130, which determines the final classification result of the news text information from the result output by the multi-level classification module 120.
The multi-level classification system based on news text information provided by the invention thus builds a multi-level classification framework and configures different classifiers at each level. It thereby specifically addresses the inaccurate classification results caused by imbalanced sample data, effectively improves the accuracy of news text classification, and improves its efficiency.
Embodiment two
Fig. 2 shows a multi-level classification system based on news text information provided by the invention. The system includes a training module 210, an evaluation module 220, a multi-level classification module 230, a model update module 240 and a result determining module 250.
The training module 210 is configured to train on a preset training sample set with multiple machine learning algorithms for each classification level of the news text information, and to determine, according to the training results, the number and type of classifiers corresponding to each classification level.
Specifically, the training module 210 generates the training sample set from the labeled data it obtains, extracts the training feature words contained in the training sample set, and assigns a weight to each extracted feature word. The training module 210 then generates training feature vectors from the extracted feature words and their weights, and obtains the training results and the corresponding classifiers from these feature vectors. The feature words may be extracted according to a preset dictionary or according to other rules; the invention does not specifically limit this. Nor does the invention limit the method used to assign weights to the extracted feature words; those skilled in the art can choose it flexibly. For example, when the news text information to be classified is a plain-text file, the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm can be used to assign weights to the extracted feature words.
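The TF-IDF weighting mentioned above can be sketched in plain Python. This uses one common variant (term frequency normalized by document length, inverse document frequency as log(N/df) without smoothing); the patent does not say which variant is intended, and the function name, the vocabulary (standing in for the preset dictionary) and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs, vocabulary):
    """Turn tokenized documents into TF-IDF feature vectors over `vocabulary`.

    tf(w, d) = count of w in d / number of vocabulary words in d
    idf(w)   = log(N / df(w)), where df(w) is the number of documents containing w
    """
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc) if w in vocabulary)
    vectors = []
    for doc in docs:
        counts = Counter(w for w in doc if w in vocabulary)
        total = sum(counts.values()) or 1  # avoid division by zero for empty docs
        vectors.append([
            (counts[w] / total) * math.log(n / df[w]) if df[w] else 0.0
            for w in vocabulary
        ])
    return vectors


# Hypothetical feature words and two tiny tokenized documents.
vocabulary = ["goal", "match", "stock"]
docs = [["goal", "match", "goal"], ["stock", "match"]]
vectors = tfidf_vectors(docs, vocabulary)
print(vectors)
```

A word such as "match" that occurs in every document receives weight zero under this variant, while words concentrated in one document are weighted up.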
To further improve the classification accuracy of the multi-level classification system, the classification algorithms in this embodiment are preferably machine learning algorithms, which include, but are not limited to, support vector machines, convolutional neural networks and recurrent neural networks. Each algorithm has its own strengths and weaknesses, so the invention does not specifically limit the machine learning algorithm used at a node; those skilled in the art can choose according to the results obtained in practice.
The evaluation module 220 is configured to evaluate the training results of the training module 210 and to modify, according to the evaluation results, the number and type of classifiers corresponding to each classification level.
The evaluation module 220 can be added to further improve the accuracy of the number and type of classifiers determined by the training module 210. The evaluation module 220 evaluates the training results of the training module 210 against a preset validation set and, according to the evaluation results, modifies the number and type of classifiers that the training module 210 determined for each classification level, so that the chosen classifiers are better suited to the classification at their level. Such modification includes deleting, adding and/or replacing classifiers. The validation set is a small part of the labeled data that does not take part in model training and is used specifically to assess which of the trained models performs better.
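A minimal sketch of the two ideas in this paragraph, under stated assumptions: a held-out split that keeps a small fraction of the labeled data out of training, and a replacement rule that swaps a node's classifier only when a candidate scores strictly higher on the validation set. The 10% fraction, the fixed seed, the function names and the scores are all hypothetical.

```python
import random


def split_labeled(data, validation_fraction=0.1, seed=0):
    """Hold out a small part of the labeled data that never enters training."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[cut:], shuffled[:cut]  # (training set, validation set)


def revise(current_name, validation_scores):
    """Replace the current classifier only if a candidate scores strictly higher."""
    best = max(validation_scores, key=validation_scores.get)
    if validation_scores[best] > validation_scores.get(current_name, 0.0):
        return best
    return current_name


labeled = list(range(20))  # stand-in for 20 labeled samples
training, validation = split_labeled(labeled)
print(len(training), len(validation))

# Hypothetical validation accuracies for two candidate classifier types.
print(revise("svm", {"svm": 0.80, "cnn": 0.85}))
```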
The evaluation module 220 not only helps the training module 210 determine suitable classifiers; while the system is subsequently running, it can also keep trying newly added sample sets and newly adopted classification algorithms and evaluate the result of each trial, so as to determine better classifiers. The invention does not specifically limit the evaluation method used by the evaluation module 220; those skilled in the art can choose it flexibly according to actual conditions.
The multi-level classification module 230 is configured to configure a corresponding multi-level classification model according to the number and type of classifiers that the training module determined for each classification level.
In this embodiment, the multi-level classification module 230 configures the multi-level classification model according to the number and type of classifiers that the training module 210 determined for each level, and generates a configuration file corresponding to the model. Whenever the output of the current node classifier is obtained, the multi-level classification module 230 queries this configuration file to determine the current classifier's next-level node classifier. The configuration file stores multiple configuration items, one for each node classifier. Specifically, each configuration item contains the description of the corresponding node classifier, the classification type that the node classifier is suited to, and/or the correspondence between each output of the node classifier and its next-level node classifier. Through the configuration file, the multi-level classification module 230 can therefore automatically select, from the classifiers of the next level, the most suitable one for the next classification step.
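One possible shape for such a configuration file is sketched below as JSON. The patent does not specify a format, so the field names `description`, `adapted_type` and `next`, the node names, and the consistency check in `load_config` are all assumptions chosen for illustration.

```python
import json

# Hypothetical configuration file content: one configuration item per node
# classifier, holding its description, the classification type it is suited to,
# and the mapping from each output to the next-level node classifier.
CONFIG_TEXT = """
{
  "root": {
    "description": "first-level classifier",
    "adapted_type": "text",
    "next": {"sports": "sports_node", "finance": null}
  },
  "sports_node": {
    "description": "second-level classifier for sports news",
    "adapted_type": "text",
    "next": {"international football": null}
  }
}
"""


def load_config(text):
    """Parse the configuration and check that every referenced node exists."""
    config = json.loads(text)
    for name, item in config.items():
        for label, target in item["next"].items():
            if target is not None and target not in config:
                raise ValueError(f"{name!r} routes {label!r} to unknown node {target!r}")
    return config


def next_node(config, node, output):
    """Look up the next-level node classifier for the current node's output."""
    return config[node]["next"].get(output)


config = load_config(CONFIG_TEXT)
print(next_node(config, "root", "sports"))
```

Validating the references at load time means a broken link between levels is reported when the file is read rather than in the middle of a classification.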
In this embodiment, the multi-level classification model is a tree-shaped classification model containing node classifiers at multiple levels. The model includes several types of node classifier, for example a root node classifier, leaf node classifiers and intermediate node classifiers. There are usually multiple leaf node classifiers and intermediate node classifiers. The relationship between node classifiers and sub-categories may be one-to-one, where a node classifier corresponds to exactly one sub-category. It may be many-to-one, where several node classifiers correspond to the same sub-category; in that case a node classifier of a suitable type can further be selected, according to factors such as the type of the news text information, to identify that sub-category. It may also be one-to-many, where one node classifier corresponds to several sub-categories; the classification rules of these sub-categories are then usually similar, so the same node classifier can identify them. In addition, there is usually one root node classifier, but there may also be several root node classifiers of different types to suit different types of news text information. The above description of the structure of the multi-level classification model is only an example and does not limit the structure of the model; those skilled in the art can use other suitable structures according to actual conditions.
The model update module 240 is configured to update the configured multi-level classification model according to the modifications made by the evaluation module 220.
So that the system can achieve the best recognition of news text information, the evaluation module 220 can keep modifying the number and type of classifiers determined by the training module 210. The model update module 240 therefore updates the configured multi-level classification model according to the modifications made by the evaluation module 220. At the same time, the model update module 240 also updates the configuration file generated by the multi-level classification module 230 and, according to the updated configuration file, applies the matching update to the multi-level classification model.
To improve the overall operating efficiency of the system, the model update module 240 can update the multi-level classification model by hot swapping. That is, without shutting the system down, and according to the modifications made by the evaluation module 220, when a new model outperforms the model currently online, the multi-level classification model used by the system is quickly replaced through a hot-swap operation. To support the hot-swap operation of the model update module 240, the configuration file generated by the multi-level classification module 230 can contain multiple metadata entries, one for each classification model. Each metadata entry records the path and description (such as the version) of the corresponding classification model, and is updated in step when that classification model is updated. When carrying out a hot-swap update, the model update module 240 can complete the update automatically from the content recorded in the metadata.
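A hot-swap update of this kind might look like the following sketch, which is an assumption about one reasonable implementation rather than the patent's mechanism. The class name `ModelRegistry`, the loader callback, the metadata keys `path` and `version`, and the example paths are hypothetical. The new model is loaded outside the lock so that requests are blocked only for the reference swap itself.

```python
import threading


class ModelRegistry:
    """Holds the model currently serving requests and allows swapping it live."""

    def __init__(self, model, metadata):
        self._lock = threading.Lock()
        self._model = model
        self._metadata = metadata  # e.g. {"path": ..., "version": ...}

    def predict(self, text):
        # Take a reference under the lock, then run the model outside it.
        with self._lock:
            model = self._model
        return model(text)

    def hot_swap(self, loader, metadata):
        """Load the model named in `metadata` and switch to it without downtime."""
        new_model = loader(metadata["path"])  # slow work happens outside the lock
        with self._lock:
            self._model, self._metadata = new_model, metadata

    @property
    def version(self):
        with self._lock:
            return self._metadata["version"]


# Stand-in models and paths, purely for illustration.
registry = ModelRegistry(lambda text: "old", {"path": "models/v1", "version": "1"})
before = registry.predict("some news text")
registry.hot_swap(lambda path: (lambda text: "new:" + path),
                  {"path": "models/v2", "version": "2"})
after = registry.predict("some news text")
print(before, after, registry.version)
```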
The result determining module 250 is configured to input the acquired news text information to be classified into the multi-level classification model and to take the output of the model as the final classification result of that news text information.
News text information to be classified usually consists of complete paragraphs or articles, which cannot be input directly into the multi-level classification model. Before input, the result determining module 250 therefore applies a series of preprocessing operations that convert the news text information into a form the model can recognize. Common preprocessing operations include extracting the feature words contained in the news text information to be classified, assigning weights to the extracted feature words, and generating the corresponding document feature vector from the feature words and their weights. The rules for extracting feature words and assigning weights can be the same as those of the corresponding operations in the training module 210 and are not repeated here.
In addition, in practice the news text information to be classified comes from many different sources, so the result determining module 250 first applies a series of normalization steps to it to simplify the subsequent preprocessing. Common normalization steps include adjusting the fonts in the news text information according to preset font-setting rules, and/or filtering its vocabulary according to preset filtering rules.
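The normalization step can be sketched as follows. The text does not say what the font-setting rule is, so this example assumes one plausible reading: unifying character forms (for instance full-width to half-width letters) with Unicode NFKC normalization. The filtering rule is assumed to be a simple stop-word list. The names `STOP_WORDS` and `normalize` and the word list are hypothetical.

```python
import unicodedata

# Hypothetical filtering rule: words to drop before feature extraction.
STOP_WORDS = {"the", "a", "of"}


def normalize(text, stop_words=STOP_WORDS):
    """Unify character forms, lowercase, split on whitespace, drop filtered words."""
    # NFKC maps compatibility characters (e.g. full-width letters) to a common form.
    text = unicodedata.normalize("NFKC", text).lower()
    return [word for word in text.split() if word not in stop_words]


print(normalize("Ｔｈｅ Final of the Cup"))
```

Whitespace splitting is used only to keep the sketch short; Chinese news text would need a word segmenter in place of `split()`.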
As described above, in the multi-level classification system based on news text information provided by the invention, each node can be given classifiers of different kinds and numbers, so that classifiers can be set specifically according to the type and content of the news text information to be classified. For example, when the news text information to be classified is of text type, the corresponding classifier can use an algorithm suited to text classification, such as naive Bayes; when it is of picture type, the corresponding classifier can use an algorithm suited to picture classification, such as deep learning. Setting classifiers of different kinds and numbers at different nodes thus allows the various types of news text information to be identified in a targeted way, which makes the final classification result more accurate. For example, when the news text information to be classified contains pictures, the picture information contained in it can be obtained first; a preset picture classification model then determines the picture classification result corresponding to the picture information; finally, a document feature vector corresponding to the news text information is generated from the picture classification result, and a preset news text classification model determines the news text classification result corresponding to that document feature vector. Processing news text information that contains pictures in this way quantifies the pictures quickly and accurately, reducing pictures that are large in data volume and variable in format to the corresponding picture classification results. Because a picture classification result is small in data volume, fast to process and effective for classification, using it to determine the type of the news text information is likewise fast and yields accurate classification results.
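The picture-handling steps above can be sketched as follows. The patent only states that a document feature vector is generated from the picture classification result; encoding the results as per-label counts and appending them to a text feature vector is an assumption made here for illustration, as are the label set, the function names and the stand-in picture classifier.

```python
# Hypothetical label set of the preset picture classification model.
PICTURE_LABELS = ["football", "chart", "portrait"]


def picture_features(picture_label):
    """One-hot encoding of a single picture classification result."""
    return [1.0 if picture_label == name else 0.0 for name in PICTURE_LABELS]


def document_vector(text_vector, pictures, picture_classifier):
    """Append per-label picture counts to the text feature vector."""
    counts = [0.0] * len(PICTURE_LABELS)
    for picture in pictures:
        one_hot = picture_features(picture_classifier(picture))
        counts = [c + h for c, h in zip(counts, one_hot)]
    return list(text_vector) + counts


# Stand-in picture classifier that labels every picture "football".
vector = document_vector([0.5], [b"image-1", b"image-2"], lambda picture: "football")
print(vector)
```

Each picture, however large, contributes only a few numbers to the document vector, which is the data-volume advantage the paragraph describes.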
To make the workflow of the multi-level classification system based on news text information easier to understand, it is described in detail below with reference to Fig. 5. The system can be divided roughly into two parts, a "training part" and a "prediction part". The training part builds and corrects the models; the prediction part uses the models that have been built to identify and classify the news text information to be classified. In the training part, annotated documents prepared in advance are first input into the system. The training module obtains the labeled data from the annotated documents, generates the training sample set from the labeled data, and extracts training feature words from the training sample set and stores them in the corresponding dictionary. The training module then trains models using the training sample set and the dictionary, obtaining different classification models together with the metadata and dictionary corresponding to each model. The evaluation module then evaluates the models according to how they perform in actual use and selects the most suitable classification model for identifying and classifying the specific news text information to be classified. In the prediction part, the news text information to be classified is first input into the system. The result determining module preprocesses it and sends the preprocessed news text information to the multi-level classification module. The multi-level classification module identifies and classifies it with the classification algorithms contained in the multi-level classification model it has selected (such as the first-level, second-level and third-level classification algorithms shown in the figure) and sends the classification result to the result determining module. Meanwhile, the model update module can hot swap the multi-level classification model used by the multi-level classification module according to the modifications made by the evaluation module, so that the system stays in its best working state. Finally, the result determining module takes the output of the multi-level classification model sent by the multi-level classification module as the final classification result of the news text information to be classified.
Thus, the multiclass classification system based on newsletter archive information provided by the present invention builds a multi-level classification framework for newsletter archive information and configures different classifiers at each level. It thereby specifically addresses the inaccurate classification results caused by imbalanced sample data and effectively improves both the accuracy and the efficiency of newsletter archive information classification. In addition, the system classifies newsletter archive information using machine learning algorithms and corrects itself in real time through an evaluation mechanism and a model modification mechanism with a hot-swap function, so that it stays in its optimal working state. Meanwhile, the preprocessing and standardization operations allow the system to identify newsletter archive information to be classified of different kinds and from different sources, which further improves the adaptability of the system and broadens its scope of use.
Embodiment three
Fig. 3 shows a multiclass classification method based on newsletter archive information provided by the present invention. The method includes:
Step S310: For each level of classification of the newsletter archive information, train on preset training sample sets using multiple machine learning algorithms, and determine the number and type of classifiers corresponding to each level of classification according to the training results.
Specifically, in the scheme of this embodiment, a corresponding training sample set needs to be preset for each node of each level. The data in each training sample set should contain all, or at least most, of the features of the category data of the corresponding node. The training sample set of each node is then trained with multiple classification algorithms, and the optimal classification algorithm is selected for each node, thereby determining the number and type of classifiers corresponding to each level of classification.
To further improve the classification accuracy of the multiclass classification system, in this embodiment the classification algorithms are preferably machine learning algorithms, which include but are not limited to the support vector machine algorithm, the convolutional neural network algorithm, and the recurrent neural network algorithm. Different algorithms have their own strengths and weaknesses, so the present invention does not specifically limit the machine learning algorithm used at a node; those skilled in the art can choose according to the effect in practical application.
Step S320: Configure the corresponding multiclass classification model according to the number and type of classifiers corresponding to each level of classification.
The multiclass classification model is a hybrid model containing multiple algorithms: it includes the different classification algorithms used by the classifiers on all nodes in the system, and it records the connections between the classifiers in a configuration file. In this embodiment, the corresponding multiclass classification model is first configured according to the number and type of classifiers determined in step S310, and a configuration file recording the information of each node classifier is generated. After the newsletter archive information to be classified is input into the multiclass classification model, the configuration file is queried according to the obtained output result of the current node classifier to determine the next-level node classifier of the current node classifier. The multiclass classification model is preferably a tree-shaped classification model containing multiple levels of node classifiers.
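The configuration-file lookup described in step S320 can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the invention: the layout of `CONFIG`, the keyword-based stand-in classifiers, and every node and label name are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in node classifiers: each maps a text to a label.
# In the described system these would be trained models (SVM, CNN, RNN, ...).
def root_classifier(text: str) -> str:
    return "sports" if "match" in text else "finance"

def sports_classifier(text: str) -> str:
    return "football" if "goal" in text else "basketball"

def finance_classifier(text: str) -> str:
    return "stocks" if "shares" in text else "banking"

CLASSIFIERS: Dict[str, Callable[[str], str]] = {
    "root": root_classifier,
    "sports_node": sports_classifier,
    "finance_node": finance_classifier,
}

# Hypothetical configuration: for each node, every output label is mapped to
# the next-level node classifier (None marks a leaf, i.e. a final category).
CONFIG: Dict[str, Dict[str, Optional[str]]] = {
    "root": {"sports": "sports_node", "finance": "finance_node"},
    "sports_node": {"football": None, "basketball": None},
    "finance_node": {"stocks": None, "banking": None},
}

def classify(text: str, start: str = "root") -> List[str]:
    """Classify at the current node, then query the configuration for the
    next-level node classifier, until a leaf is reached."""
    path: List[str] = []
    node: Optional[str] = start
    while node is not None:
        label = CLASSIFIERS[node](text)
        path.append(label)
        node = CONFIG[node].get(label)  # unknown labels end the walk
    return path

print(classify("late goal decides the match"))
```

Keeping the routing in data rather than in code means that a node classifier can be added, removed, or replaced by editing the configuration alone, which is what the later update steps rely on.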
Step S330: Input the obtained newsletter archive information to be classified into the multiclass classification model for classification, and determine the output result of the multiclass classification model as the final classification result of the newsletter archive information to be classified.
Specifically, the obtained newsletter archive information to be classified is input into the multiclass classification model, which identifies and classifies it using its built-in classifiers and generates a classification result. The output classification result is then determined as the final classification result of the newsletter archive information to be classified.
Thus, the multiclass classification method based on newsletter archive information provided by the present invention builds a multi-level classification framework for newsletter archive information and configures different classifiers at each level. It thereby specifically addresses the inaccurate classification results caused by imbalanced sample data and effectively improves both the accuracy and the efficiency of newsletter archive information classification.
Embodiment four
Fig. 4 shows a multiclass classification method based on newsletter archive information provided by the present invention. The method includes:
Step S410: For each level of classification of the newsletter archive information, train on preset training sample sets using multiple machine learning algorithms, and determine the number and type of classifiers corresponding to each level of classification according to the training results.
Specifically, a training sample set is generated from the obtained labeled data, the training feature words contained in the training sample set are extracted, and a corresponding weight is assigned to each extracted training feature word. A corresponding training feature vector is then generated from the extracted training feature words and their weights, and the training result and the corresponding classifier are obtained from the training feature vector. The training feature words may be extracted according to a preset dictionary or according to other rules; the present invention does not specifically limit this. Nor does the present invention specifically limit the method of assigning weights to the extracted training feature words; those skilled in the art can set it flexibly. For example, when the newsletter archive information to be classified is a plain text file, the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm can be used to assign corresponding weights to the extracted training feature words.
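As an illustration of the TF-IDF weighting mentioned in step S410, the sketch below turns tokenized documents into feature vectors over a preset dictionary. It makes several assumptions that the text does not specify: the documents are already segmented into words, the dictionary is given, and a smoothed IDF variant is used (other variants exist). The function and variable names are hypothetical.

```python
import math
from collections import Counter
from typing import List

def tf_idf_vectors(docs: List[List[str]], dictionary: List[str]) -> List[List[float]]:
    """Return one TF-IDF vector per document, one component per dictionary word."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each dictionary word.
    df = {w: sum(1 for d in docs if w in d) for w in dictionary}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = []
        for w in dictionary:
            # Term frequency: share of the document's tokens equal to w.
            tf = counts[w] / len(doc) if doc else 0.0
            # Smoothed inverse document frequency (an assumed variant).
            idf = math.log((1 + n_docs) / (1 + df[w])) + 1.0
            vec.append(tf * idf)
        vectors.append(vec)
    return vectors

# Toy, already-tokenized documents and a toy dictionary of feature words.
docs = [["goal", "match", "goal"], ["shares", "bank", "match"]]
dictionary = ["goal", "match", "shares"]
for vec in tf_idf_vectors(docs, dictionary):
    print([round(x, 3) for x in vec])
```

Words that occur in every document ("match" here) receive a lower weight than words concentrated in one document ("goal"), which is the property that makes the resulting vectors useful as classifier input.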
To further improve the classification accuracy of the multiclass classification method, in this embodiment the classification algorithms are preferably machine learning algorithms, which include but are not limited to the support vector machine algorithm, the convolutional neural network algorithm, and the recurrent neural network algorithm. Different algorithms have their own strengths and weaknesses, so the present invention does not specifically limit the machine learning algorithm used at a node; those skilled in the art can choose according to the effect in practical application.
Step S420: Configure the corresponding multiclass classification model according to the number and type of classifiers corresponding to each level of classification.
In this embodiment, the corresponding multiclass classification model is configured according to the number and type of classifiers corresponding to each level of classification determined in step S410, and a configuration file corresponding to the multiclass classification model is generated. Whenever the output result of the current node classifier is obtained, the configuration file is queried to determine the next-level node classifier of the current node classifier. Multiple configuration items, each corresponding to one node classifier, are stored in the configuration file. Specifically, each configuration item contains the description information of the corresponding node classifier, the classification type that the node classifier is suited to, and/or the correspondence between each output result of the node classifier and its next-level node classifier. With this configuration file, the most suitable classifier can therefore be selected automatically from the multiple classifiers of the next level to carry out the further classification operation.
In this embodiment, the multiclass classification model is a tree-shaped classification model containing multiple levels of node classifiers. The model includes several different types of node classifiers, for example a root node classifier, leaf node classifiers, and intermediate node classifiers. There are usually multiple leaf node classifiers and intermediate node classifiers, and they can relate to sub-classifications in several ways. The relation may be one-to-one, where one node classifier corresponds to exactly one sub-classification. It may be many-to-one, where multiple node classifiers correspond to the same sub-classification; in that case, a node classifier of a suitable type can further be selected to identify the sub-classification according to factors such as the type of the newsletter archive information. It may also be one-to-many, where one node classifier corresponds to multiple sub-classifications; the classification rules of those sub-classifications are then generally similar, so the same node classifier can identify them. In addition, there is usually one root node classifier, but there may also be multiple root node classifiers of different types to suit different types of newsletter archive information. The above description of the structure of the multiclass classification model is only an example and does not limit that structure; those skilled in the art can use other suitable structures according to actual conditions.
Step S430: Evaluate the training results, modify the number and type of classifiers corresponding to each level of classification according to the evaluation result, and update the configured multiclass classification model according to the modification result.
To further improve the accuracy of the number and type of classifiers determined in step S410, an evaluation step, step S430, can be added. The training results of step S410 are evaluated against a preset validation set, and the number and type of classifiers corresponding to each level of classification determined in step S410 are modified according to the evaluation result, so that the determined classifiers better suit the classification of the level where they are located. The modification includes deleting, adding, and/or replacing classifiers. The validation set is a small portion of the labeled data that does not take part in model training and is used specifically to assess which of the trained models performs better.
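The role of the validation set in step S430 can be sketched as below: each candidate classifier for a node is scored on held-out labeled samples, and the best-scoring one is kept. Accuracy is used as the metric only as an assumption, since the text leaves the evaluation method open; the candidate models and sample data are hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, str]  # (text, gold label)

def accuracy(model: Callable[[str], str], validation: List[Sample]) -> float:
    """Fraction of validation samples the model labels correctly."""
    if not validation:
        raise ValueError("validation set must not be empty")
    correct = sum(1 for text, label in validation if model(text) == label)
    return correct / len(validation)

def select_best(candidates: Dict[str, Callable[[str], str]],
                validation: List[Sample]) -> str:
    """Return the name of the candidate with the highest validation accuracy."""
    scores = {name: accuracy(model, validation) for name, model in candidates.items()}
    return max(scores, key=scores.get)

# Hypothetical candidates for one node; real ones would be trained models.
candidates = {
    "always_sports": lambda text: "sports",
    "keyword": lambda text: "sports" if "match" in text else "finance",
}
# Held-out labeled samples that took no part in training.
validation = [("the match ended", "sports"), ("shares rose", "finance")]

print(select_best(candidates, validation))
```

On imbalanced data, a per-class metric such as macro-averaged F1 may be a better choice than plain accuracy; the selection logic stays the same.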
Step S430 can not only assist step S410 in determining suitable classifiers. While the subsequent steps are running, it can also keep trying newly added sample sets and newly adopted classification algorithms and evaluate the result of each attempt, thereby determining better classifiers. The present invention does not specifically limit the evaluation method used in step S430; those skilled in the art can set it flexibly according to actual conditions.
So that the method achieves the best recognition of newsletter archive information, step S430 can repeatedly modify the number and type of classifiers determined in step S410. At the same time, the configured multiclass classification model and the configuration file corresponding to the model are updated accordingly, and the multiclass classification model is updated to match the updated configuration file.
To improve the overall operating efficiency of the method, the update that step S430 applies to the multiclass classification model can be a hot-swap update: without shutting down the system, when a new model performs better than the model currently online, the version of the multiclass classification model used by the system is quickly updated through a hot-swap operation. To support the hot-swap operation, the configuration file generated in step S420 can contain multiple pieces of metadata, each corresponding to a different classification model. Each piece of metadata records the path and description information (such as the version) of the corresponding classification model. When a classification model is updated, the corresponding metadata is updated synchronously, so that the hot-swap update of the model can be completed automatically according to the content recorded in the metadata.
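One way to picture the hot-swap update is a holder object that serves predictions from the current model and replaces it in place when a better one is available. The sketch below is an assumption-laden illustration: the `ModelHolder` class, the metadata keys (`path`, `version`), the example paths, and the score comparison are all hypothetical and are not taken from the invention.

```python
import threading
from typing import Callable, Dict

class ModelHolder:
    """Serves the online model and allows replacing it without a restart."""

    def __init__(self, model: Callable[[str], str], metadata: Dict[str, str]):
        self._lock = threading.Lock()
        self._model = model
        self._metadata = dict(metadata)

    @property
    def metadata(self) -> Dict[str, str]:
        with self._lock:
            return dict(self._metadata)

    def predict(self, text: str) -> str:
        # Take a reference under the lock, then predict outside it so that
        # a slow prediction never blocks a swap.
        with self._lock:
            model = self._model
        return model(text)

    def hot_swap(self, new_model: Callable[[str], str],
                 new_metadata: Dict[str, str],
                 new_score: float, online_score: float) -> bool:
        """Swap only if the new model scores better than the online one."""
        if new_score <= online_score:
            return False
        with self._lock:
            self._model = new_model
            self._metadata = dict(new_metadata)  # keep metadata in sync
        return True

# Hypothetical models and metadata (path and version are illustrative only).
holder = ModelHolder(lambda text: "old", {"path": "models/v1", "version": "1"})
swapped = holder.hot_swap(lambda text: "new",
                          {"path": "models/v2", "version": "2"},
                          new_score=0.95, online_score=0.90)
print(swapped, holder.predict("some news text"), holder.metadata["version"])
```

Requests that already hold a reference to the old model finish with it, while later requests see the new one, so the service does not need to stop during the update.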
Step S440: Input the obtained newsletter archive information to be classified into the multiclass classification model for classification, and determine the output result of the multiclass classification model as the final classification result of the newsletter archive information to be classified.
The newsletter archive information to be classified is usually a complete paragraph or article and cannot be input directly into the multiclass classification model for identification. Before it is input into the model, a series of preprocessing operations is therefore needed to convert it into a form that the multiclass classification model can recognize. Common preprocessing operations include extracting the file feature words contained in the newsletter archive information to be classified, assigning a corresponding weight to each extracted file feature word, and generating a corresponding file feature vector from the extracted file feature words and their weights. The rules for extracting file feature words and assigning weights can be the same as those of the similar operations in step S410 and are not repeated here.
In addition, in practical applications the sources of the newsletter archive information to be classified are diverse, so a series of standardization operations first needs to be applied to it to facilitate the subsequent preprocessing. Common standardization operations include adjusting the fonts in the newsletter archive information to be classified according to a preset font setting rule, and/or filtering the vocabulary in the newsletter archive information to be classified according to a preset filtering rule.
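The standardization operations can be illustrated with a small sketch. The text does not say what the font setting rule or the filtering rule actually are, so both are assumptions here: Unicode NFKC normalization plus lowercasing stands in for the font rule (it maps full-width characters to their ordinary forms), and a tiny stop-word list stands in for the filtering rule. Whitespace tokenization is also a simplification; Chinese text would need a word segmenter.

```python
import unicodedata
from typing import Iterable, List, Set

# Hypothetical filtering rule: a small stop-word list.
STOP_WORDS: Set[str] = {"the", "a", "of"}

def normalize(text: str) -> str:
    """Assumed 'font' rule: unify character forms (NFKC) and letter case."""
    return unicodedata.normalize("NFKC", text).lower()

def filter_words(tokens: Iterable[str], stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Assumed filtering rule: drop tokens found in the stop-word list."""
    return [t for t in tokens if t not in stop_words]

def standardize(text: str) -> List[str]:
    """Normalize the text, split it on whitespace, and filter the tokens."""
    return filter_words(normalize(text).split())

# Full-width Latin letters mixed with ordinary ones, as may occur when
# texts come from different sources.
print(standardize("Ｔｈｅ Ｍａｔｃｈ of the year"))
```

Running standardization before feature extraction means that the same word written in different character forms is counted as one feature word.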
Thus, the multiclass classification method based on newsletter archive information provided by the present invention builds a multi-level classification framework for newsletter archive information and configures different classifiers at each level. It thereby specifically addresses the inaccurate classification results caused by imbalanced sample data and effectively improves both the accuracy and the efficiency of newsletter archive information classification. In addition, the method classifies newsletter archive information using machine learning algorithms and corrects the classification model in real time through an evaluation mechanism and a model modification mechanism with a hot-swap function, so that the method stays in its optimal state of implementation. Meanwhile, the preprocessing and standardization operations allow the method to identify newsletter archive information to be classified of different kinds and from different sources, which further improves the adaptability of the method and broadens its scope of use.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such a system is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in a variety of programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is to be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of the embodiments may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will appreciate that although some embodiments described herein include certain features included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the multiclass classification system based on newsletter archive information according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices can be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order; these words may be interpreted as names.
The invention discloses: A1. A multiclass classification system based on newsletter archive information, including:
A training module, configured to, for each level of classification of the newsletter archive information, train on preset training sample sets using multiple machine learning algorithms, and determine the number and type of classifiers corresponding to each level of classification according to the training results;
A multiclass classification module, configured to configure the corresponding multiclass classification model according to the number and type of classifiers corresponding to each level of classification determined by the training module;
A result determining module, configured to input the obtained newsletter archive information to be classified into the multiclass classification model for classification, and determine the output result of the multiclass classification model as the final classification result of the newsletter archive information to be classified.
A2. The system according to A1, wherein the system further includes:
An evaluation module, configured to evaluate the training results of the training module and modify the number and type of classifiers corresponding to each level of classification according to the evaluation result, the modification including deleting, adding, and/or replacing classifiers;
A model modification module, configured to update the configured multiclass classification model according to the modification made by the evaluation module.
A3. The system according to A2, wherein the multiclass classification module is further configured to generate a configuration file corresponding to the multiclass classification model, and the model modification module is further configured to update the configuration file and update the multiclass classification model according to the updated configuration file.
A4. The system according to any one of A1-A3, wherein the multiclass classification model is a tree-shaped classification model containing multiple levels of node classifiers, and the tree-shaped classification model includes multiple different types of node classifiers.
A5. The system according to A4, wherein the multiclass classification module is further configured to: whenever the output result of the current node classifier is obtained, determine the next-level node classifier of the current node classifier by querying the configuration file corresponding to the multiclass classification model;
Wherein the configuration file stores multiple configuration items, each corresponding to one node classifier, and each configuration item includes: the description information of the corresponding node classifier, the classification type that the node classifier is suited to, and/or the correspondence between each output result of the node classifier and its next-level node classifier.
A6. The system according to any one of A1-A5, wherein the training module is specifically configured to:
Generate the training sample set from the obtained labeled data, extract the training feature words contained in the training sample set, and assign a corresponding weight to each extracted training feature word;
Generate a corresponding training feature vector from the extracted training feature words and their weights, and obtain the training result and the corresponding classifier from the training feature vector.
A7. The system according to any one of A1-A6, wherein the result determining module is specifically configured to: preprocess the obtained newsletter archive information to be classified, and input the preprocessed newsletter archive information to be classified into the multiclass classification model for classification;
Wherein the preprocessing includes: extracting the file feature words contained in the newsletter archive information to be classified, and assigning a corresponding weight to each extracted file feature word; and generating a corresponding file feature vector from the extracted file feature words and their weights.
A8. The system according to A7, wherein the result determining module is further configured to, before the preprocessing: adjust the fonts in the newsletter archive information to be classified according to a preset font setting rule, and/or filter the vocabulary in the newsletter archive information to be classified according to a preset filtering rule.
A9. The system according to any one of A1-A8, wherein the multiple machine learning algorithms include at least one of the following: the support vector machine algorithm, the convolutional neural network algorithm, and the recurrent neural network algorithm.
The invention also discloses: B10. A multiclass classification method based on newsletter archive information, including:
For each level of classification of the newsletter archive information, training on preset training sample sets using multiple machine learning algorithms, and determining the number and type of classifiers corresponding to each level of classification according to the training results;
Configuring the corresponding multiclass classification model according to the number and type of classifiers corresponding to each level of classification;
Inputting the obtained newsletter archive information to be classified into the multiclass classification model for classification, and determining the output result of the multiclass classification model as the final classification result of the newsletter archive information to be classified.
B11. The method according to B10, wherein the method further includes:
Evaluating the training results, modifying the number and type of classifiers corresponding to each level of classification according to the evaluation result, and updating the configured multiclass classification model according to the modification result; wherein the modification includes deleting, adding, and/or replacing classifiers.
B12. The method according to B11, wherein the step of configuring the corresponding multiclass classification model according to the number and type of classifiers corresponding to each level of classification further includes: generating a configuration file corresponding to the multiclass classification model; and the step of updating the configured multiclass classification model according to the modification result further includes: updating the configuration file, and updating the multiclass classification model according to the updated configuration file.
B13. The method according to any one of B10-B12, wherein the multiclass classification model is a tree-shaped classification model containing multiple levels of node classifiers, and the tree-shaped classification model includes multiple different types of node classifiers.
B14. The method according to B13, wherein the step of configuring the corresponding multiclass classification model according to the number and type of classifiers corresponding to each level of classification further includes: whenever the output result of the current node classifier is obtained, determining the next-level node classifier of the current node classifier by querying the configuration file corresponding to the multiclass classification model;
Wherein the configuration file stores multiple configuration items, each corresponding to one node classifier, and each configuration item includes: the description information of the corresponding node classifier, the classification type that the node classifier is suited to, and/or the correspondence between each output result of the node classifier and its next-level node classifier.
B15. The method according to any one of B10-B14, wherein the step of, for each level of classification of the newsletter archive information, training on preset training sample sets using multiple machine learning algorithms and determining the number and type of classifiers corresponding to each level of classification according to the training results specifically includes:
Generating the training sample set from the obtained labeled data, extracting the training feature words contained in the training sample set, and assigning a corresponding weight to each extracted training feature word;
Generating a corresponding training feature vector from the extracted training feature words and their weights, and obtaining the training result and the corresponding classifier from the training feature vector.
B16. The method according to any one of B10-B15, wherein the step of inputting the obtained news text information to be classified into the multi-level classification model for classification, and determining the output result of the multi-level classification model as the final classification result of the news text information to be classified, specifically comprises:
pre-processing the obtained news text information to be classified, and inputting the pre-processed news text information into the multi-level classification model for classification;
wherein the pre-processing comprises: extracting the text feature words contained in the news text information to be classified, assigning a corresponding weight to each extracted text feature word, and generating a corresponding text feature vector from the extracted text feature words and their weights.
B17. The method according to B16, further comprising, before the pre-processing: adjusting the fonts in the news text information to be classified according to a preset font setting rule, and/or filtering the words in the news text information to be classified according to a preset filtering rule.
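The two clean-up steps in B17 might look like the sketch below. The text does not spell out the font setting rule or the filtering rule, so both are assumptions here: the font rule is read as folding full-width characters to half-width (Unicode NFKC) plus lowercasing, and the filtering rule as stop-word removal with a hypothetical word list.

```python
import unicodedata

# Hypothetical filtering rule: words to drop before feature extraction.
STOP_WORDS = {"the", "a", "of", "and"}

def normalize_font(text):
    """Assumed font rule: fold full-width forms to half-width (NFKC), then lowercase."""
    return unicodedata.normalize("NFKC", text).lower()

def filter_words(words, stop_words=STOP_WORDS):
    """Assumed filtering rule: remove stop words and empty tokens."""
    return [w for w in words if w and w not in stop_words]

def clean(text):
    """Apply the font rule, then the filtering rule, ahead of pre-processing."""
    return filter_words(normalize_font(text).split())

print(clean("Ｔｈｅ Ｐｒｉｃｅ of ＧＯＬＤ"))  # → ['price', 'gold']
```

Normalizing character forms first ensures the filtering rule and the later feature extraction see one spelling of each word.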
B18. The method according to any one of B10-B17, wherein the multiple machine learning algorithms include at least one of the following: a support vector machine algorithm, a convolutional neural network algorithm, and a recurrent neural network algorithm.

Claims (10)

1. A multi-level classification system based on news text information, comprising:
a training module, configured to train a preset training sample set by multiple machine learning algorithms for the categories at each level of news text information, and to determine the number and type of the classifiers corresponding to the categories at each level according to the training results;
a multi-level classification module, configured to configure a corresponding multi-level classification model according to the number and type of the classifiers corresponding to the categories at each level, as determined by the training module;
a result determination module, configured to input the obtained news text information to be classified into the multi-level classification model for classification, and to determine the output result of the multi-level classification model as the final classification result of the news text information to be classified.
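One way to read the training module of claim 1 is that several candidate algorithms are trained for a given level and the classifier type is chosen from the training results. The sketch below shows that selection for a single level. The two toy "algorithms" are stand-ins for real learners, and the selection criterion (validation accuracy), the function names, and the data are assumptions.

```python
from collections import Counter, defaultdict

def train_majority(samples):
    """Baseline 'algorithm': always predict the most frequent training label."""
    top = Counter(label for _, label in samples).most_common(1)[0][0]
    return lambda text: top

def train_word_vote(samples):
    """Toy 'algorithm': each word votes for the labels it appeared with."""
    votes = defaultdict(Counter)
    for text, label in samples:
        for word in text.split():
            votes[word][label] += 1
    fallback = train_majority(samples)
    def predict(text):
        tally = Counter()
        for word in text.split():
            tally.update(votes.get(word, {}))
        return tally.most_common(1)[0][0] if tally else fallback(text)
    return predict

def accuracy(predict, samples):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(t) == y for t, y in samples) / len(samples)

def select_classifier(algorithms, train_set, validation_set):
    """Train every candidate and keep the one with the best validation accuracy."""
    scored = []
    for name, train in algorithms.items():
        model = train(train_set)
        scored.append((accuracy(model, validation_set), name, model))
    best_score, best_name, best_model = max(scored, key=lambda s: s[0])
    return best_name, best_score, best_model

# Hypothetical sample sets for one category level.
train_set = [
    ("football match", "sports"), ("football league", "sports"),
    ("tennis match", "sports"), ("stock market", "finance"),
    ("bank stock", "finance"),
]
validation_set = [("football final", "sports"), ("stock index", "finance")]
algorithms = {"majority": train_majority, "word_vote": train_word_vote}
best_name, best_score, _ = select_classifier(algorithms, train_set, validation_set)
```

Repeating this selection for every level would yield the number and type of classifiers from which the multi-level classification model is configured.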
2. The system according to claim 1, further comprising:
an evaluation module, configured to evaluate the training results of the training module, and to modify the number and type of the classifiers corresponding to the categories at each level according to the evaluation result, the modification comprising deleting, adding and/or replacing classifiers;
a model update module, configured to update the configured multi-level classification model according to the modification made by the evaluation module.
3. The system according to claim 2, wherein the multi-level classification module is further configured to generate a configuration file corresponding to the multi-level classification model, and the model update module is further configured to update the configuration file and to update the multi-level classification model according to the updated configuration file.
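The delete, add and replace modifications of claims 2 and 3 can be pictured as edits to the configuration, from which the model is then rebuilt. The following sketch works under assumptions: the dictionary layout, the `algorithm` and `next` keys, the modification record format, and the node names are hypothetical.

```python
import copy

def apply_modifications(config, modifications):
    """Return an updated copy of the configuration after delete/add/replace edits."""
    updated = copy.deepcopy(config)
    for mod in modifications:
        action, node = mod["action"], mod["node"]
        if action == "delete":
            updated.pop(node, None)
            # Remove routes that pointed at the deleted node classifier.
            for item in updated.values():
                item["next"] = {k: v for k, v in item["next"].items() if v != node}
        elif action == "replace":
            if node not in updated:
                raise KeyError(f"cannot replace unknown node {node!r}")
            updated[node] = copy.deepcopy(mod["item"])
        elif action == "add":
            updated[node] = copy.deepcopy(mod["item"])
            # Route the given parent output to the newly added node classifier.
            updated[mod["parent"]]["next"][mod["output"]] = node
        else:
            raise ValueError(f"unknown action {action!r}")
    return updated

# Hypothetical configuration before evaluation.
config = {
    "root": {"algorithm": "svm",
             "next": {"sports": "sports_node", "finance": "finance_node"}},
    "sports_node": {"algorithm": "svm", "next": {}},
    "finance_node": {"algorithm": "svm", "next": {}},
}
# Hypothetical modifications produced by the evaluation step.
modifications = [
    {"action": "replace", "node": "sports_node",
     "item": {"algorithm": "cnn", "next": {}}},
    {"action": "delete", "node": "finance_node"},
    {"action": "add", "node": "tech_node", "parent": "root", "output": "tech",
     "item": {"algorithm": "rnn", "next": {}}},
]
new_config = apply_modifications(config, modifications)
```

Working on a copy leaves the previous configuration intact, so the old model can stay in service until the updated one has been rebuilt.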
4. The system according to any one of claims 1-3, wherein the multi-level classification model is a tree-shaped classification model comprising multiple levels of node classifiers, and the tree-shaped classification model comprises a plurality of node classifiers of different types.
5. The system according to claim 4, wherein the multi-level classification module is further configured to: whenever the output result of a current node classifier is obtained, determine the next-level node classifier of the current node classifier by querying the configuration file corresponding to the multi-level classification model;
wherein the configuration file stores a plurality of configuration items, each corresponding to one node classifier, and each configuration item comprises: description information of the corresponding node classifier, the category type to which the node classifier applies, and/or the correspondence between each output result of the node classifier and its next-level node classifier.
6. The system according to any one of claims 1-5, wherein the training module is specifically configured to:
generate the training sample set from the obtained labelled data, extract the training feature words contained in the training sample set, and assign a corresponding weight to each extracted training feature word;
generate corresponding training feature vectors from the extracted training feature words and their weights, and obtain the training results and the corresponding classifiers from the training feature vectors.
7. The system according to any one of claims 1-6, wherein the result determination module is specifically configured to: pre-process the obtained news text information to be classified, and input the pre-processed news text information into the multi-level classification model for classification;
wherein the pre-processing comprises: extracting the text feature words contained in the news text information to be classified, assigning a corresponding weight to each extracted text feature word, and generating a corresponding text feature vector from the extracted text feature words and their weights.
8. The system according to claim 7, wherein the result determination module is further configured to, before the pre-processing: adjust the fonts in the news text information to be classified according to a preset font setting rule, and/or filter the words in the news text information to be classified according to a preset filtering rule.
9. The system according to any one of claims 1-8, wherein the multiple machine learning algorithms include at least one of the following: a support vector machine algorithm, a convolutional neural network algorithm, and a recurrent neural network algorithm.
10. A multi-level classification method based on news text information, comprising:
for the categories at each level of news text information, training a preset training sample set by multiple machine learning algorithms, and determining the number and type of the classifiers corresponding to the categories at each level according to the training results;
configuring a corresponding multi-level classification model according to the number and type of the classifiers corresponding to the categories at each level;
inputting the obtained news text information to be classified into the multi-level classification model for classification, and determining the output result of the multi-level classification model as the final classification result of the news text information to be classified.
CN201710103541.0A 2017-02-24 2017-02-24 Multi-level classification system and method based on news text information Active CN106909654B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710103541.0A CN106909654B (en) 2017-02-24 2017-02-24 Multi-level classification system and method based on news text information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710103541.0A CN106909654B (en) 2017-02-24 2017-02-24 Multi-level classification system and method based on news text information

Publications (2)

Publication Number Publication Date
CN106909654A CN106909654A (en) 2017-06-30
CN106909654B CN106909654B (en) 2020-07-21

Family

ID=59208413

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710103541.0A Active CN106909654B (en) 2017-02-24 2017-02-24 Multi-level classification system and method based on news text information

Country Status (1)

Country Link
CN (1) CN106909654B (en)

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107402994A (en) * 2017-07-17 2017-11-28 广州特道信息科技有限公司 A kind of sorting technique and device of multi-component system distinguishing hierarchy
CN107562880A (en) * 2017-09-01 2018-01-09 北京神州泰岳软件股份有限公司 A kind of classification results screening technique and device based on multistage classifier
CN107943940A (en) * 2017-11-23 2018-04-20 网易(杭州)网络有限公司 Data processing method, medium, system and electronic equipment
CN108073677A (en) * 2017-11-02 2018-05-25 中国科学院信息工程研究所 A kind of multistage text multi-tag sorting technique and system based on artificial intelligence
CN109165380A (en) * 2018-07-26 2019-01-08 咪咕数字传媒有限公司 A kind of neural network model training method and device, text label determine method and device
CN109189950A (en) * 2018-09-03 2019-01-11 腾讯科技(深圳)有限公司 Multimedia resource classification method, device, computer equipment and storage medium
CN109471938A (en) * 2018-10-11 2019-03-15 平安科技(深圳)有限公司 A kind of file classification method and terminal
CN109960725A (en) * 2019-01-17 2019-07-02 平安科技(深圳)有限公司 Text classification processing method, device and computer equipment based on emotion
CN110019776A (en) * 2017-09-05 2019-07-16 腾讯科技(北京)有限公司 Article classification method and device, storage medium
CN110442725A (en) * 2019-08-14 2019-11-12 科大讯飞股份有限公司 Entity relation extraction method and device
WO2019214133A1 (en) * 2018-05-08 2019-11-14 华南理工大学 Method for automatically categorizing large-scale customer complaint data
CN110597985A (en) * 2019-08-15 2019-12-20 重庆金融资产交易所有限责任公司 Data classification method, device, terminal and medium based on data analysis
CN110633366A (en) * 2019-07-31 2019-12-31 国家计算机网络与信息安全管理中心 Short text classification method, device and storage medium
CN110781292A (en) * 2018-07-25 2020-02-11 百度在线网络技术(北京)有限公司 Text data multi-level classification method and device, electronic equipment and storage medium
CN111625644A (en) * 2020-04-14 2020-09-04 北京捷通华声科技股份有限公司 Text classification method and device
CN111753197A (en) * 2020-06-18 2020-10-09 达而观信息科技(上海)有限公司 News element extraction method and device, computer equipment and storage medium
CN112052331A (en) * 2019-06-06 2020-12-08 武汉Tcl集团工业研究院有限公司 Method and terminal for processing text information
CN113139558A (en) * 2020-01-16 2021-07-20 北京京东振世信息技术有限公司 Method and apparatus for determining a multi-level classification label for an article
CN113254645A (en) * 2021-06-08 2021-08-13 南京冰鉴信息科技有限公司 Text classification method and device, computer equipment and readable storage medium
WO2022116438A1 (en) * 2020-12-01 2022-06-09 平安科技(深圳)有限公司 Customer service violation quality inspection method and apparatus, computer device, and storage medium
CN116777400A (en) * 2023-08-21 2023-09-19 江苏海外集团国际工程咨询有限公司 Engineering consultation information whole-flow management system and method based on deep learning

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101281521A (en) * 2007-04-05 2008-10-08 中国科学院自动化研究所 Method and system for filtering sensitive web page based on multiple classifier amalgamation
CN102117411A (en) * 2009-12-30 2011-07-06 日电(中国)有限公司 Method and system for constructing multi-level classification model
CN102193928A (en) * 2010-03-08 2011-09-21 三星电子(中国)研发中心 Method for matching lightweight ontologies based on multilayer text categorizer
CN103324758A (en) * 2013-07-10 2013-09-25 苏州大学 News classifying method and system
CN103426007A (en) * 2013-08-29 2013-12-04 人民搜索网络股份公司 Machine learning classification method and device
CN103778569A (en) * 2014-02-13 2014-05-07 上海交通大学 Distributed generation island detection method based on meta learning
CN104978328A (en) * 2014-04-03 2015-10-14 北京奇虎科技有限公司 Hierarchical classifier obtaining method, text classification method, hierarchical classifier obtaining device and text classification device
CN106453033A (en) * 2016-08-31 2017-02-22 电子科技大学 Multilevel Email classification method based on Email content

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
EASON.WXD: "系统学习机器学习之组合多分类器", 《CSDN博客HTTPS://BLOG.CSDN.NET/APP_12062011/ARTICLE/DETAILS/50424776》 *
王爱华等: "基于Boost和信任函数的多文本分类器组合模型", 《计算机工程与应用》 *

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107402994A (en) * 2017-07-17 2017-11-28 广州特道信息科技有限公司 A kind of sorting technique and device of multi-component system distinguishing hierarchy
CN107562880A (en) * 2017-09-01 2018-01-09 北京神州泰岳软件股份有限公司 A kind of classification results screening technique and device based on multistage classifier
CN110019776B (en) * 2017-09-05 2023-04-28 腾讯科技(北京)有限公司 Article classification method and device and storage medium
CN110019776A (en) * 2017-09-05 2019-07-16 腾讯科技(北京)有限公司 Article classification method and device, storage medium
CN108073677A (en) * 2017-11-02 2018-05-25 中国科学院信息工程研究所 A kind of multistage text multi-tag sorting technique and system based on artificial intelligence
CN108073677B (en) * 2017-11-02 2021-12-28 中国科学院信息工程研究所 Multi-level text multi-label classification method and system based on artificial intelligence
CN107943940A (en) * 2017-11-23 2018-04-20 网易(杭州)网络有限公司 Data processing method, medium, system and electronic equipment
WO2019214133A1 (en) * 2018-05-08 2019-11-14 华南理工大学 Method for automatically categorizing large-scale customer complaint data
CN110781292A (en) * 2018-07-25 2020-02-11 百度在线网络技术(北京)有限公司 Text data multi-level classification method and device, electronic equipment and storage medium
CN109165380A (en) * 2018-07-26 2019-01-08 咪咕数字传媒有限公司 A kind of neural network model training method and device, text label determine method and device
CN109165380B (en) * 2018-07-26 2022-07-01 咪咕数字传媒有限公司 Neural network model training method and device and text label determining method and device
CN109189950B (en) * 2018-09-03 2023-04-07 腾讯科技(深圳)有限公司 Multimedia resource classification method and device, computer equipment and storage medium
CN109189950A (en) * 2018-09-03 2019-01-11 腾讯科技(深圳)有限公司 Multimedia resource classification method, device, computer equipment and storage medium
CN109471938B (en) * 2018-10-11 2023-06-16 平安科技(深圳)有限公司 Text classification method and terminal
CN109471938A (en) * 2018-10-11 2019-03-15 平安科技(深圳)有限公司 A kind of file classification method and terminal
CN109960725A (en) * 2019-01-17 2019-07-02 平安科技(深圳)有限公司 Text classification processing method, device and computer equipment based on emotion
CN112052331A (en) * 2019-06-06 2020-12-08 武汉Tcl集团工业研究院有限公司 Method and terminal for processing text information
CN110633366A (en) * 2019-07-31 2019-12-31 国家计算机网络与信息安全管理中心 Short text classification method, device and storage medium
CN110633366B (en) * 2019-07-31 2022-12-16 国家计算机网络与信息安全管理中心 Short text classification method, device and storage medium
CN110442725A (en) * 2019-08-14 2019-11-12 科大讯飞股份有限公司 Entity relation extraction method and device
CN110442725B (en) * 2019-08-14 2022-02-25 科大讯飞股份有限公司 Entity relationship extraction method and device
CN110597985A (en) * 2019-08-15 2019-12-20 重庆金融资产交易所有限责任公司 Data classification method, device, terminal and medium based on data analysis
CN113139558A (en) * 2020-01-16 2021-07-20 北京京东振世信息技术有限公司 Method and apparatus for determining a multi-level classification label for an article
CN113139558B (en) * 2020-01-16 2023-09-05 北京京东振世信息技术有限公司 Method and device for determining multi-stage classification labels of articles
CN111625644A (en) * 2020-04-14 2020-09-04 北京捷通华声科技股份有限公司 Text classification method and device
CN111625644B (en) * 2020-04-14 2023-09-12 北京捷通华声科技股份有限公司 Text classification method and device
CN111753197A (en) * 2020-06-18 2020-10-09 达而观信息科技(上海)有限公司 News element extraction method and device, computer equipment and storage medium
CN111753197B (en) * 2020-06-18 2024-04-05 达观数据有限公司 News element extraction method, device, computer equipment and storage medium
WO2022116438A1 (en) * 2020-12-01 2022-06-09 平安科技(深圳)有限公司 Customer service violation quality inspection method and apparatus, computer device, and storage medium
CN113254645A (en) * 2021-06-08 2021-08-13 南京冰鉴信息科技有限公司 Text classification method and device, computer equipment and readable storage medium
CN116777400A (en) * 2023-08-21 2023-09-19 江苏海外集团国际工程咨询有限公司 Engineering consultation information whole-flow management system and method based on deep learning
CN116777400B (en) * 2023-08-21 2023-10-31 江苏海外集团国际工程咨询有限公司 Engineering consultation information whole-flow management system and method based on deep learning

Also Published As

Publication number Publication date
CN106909654B (en) 2020-07-21

Similar Documents

Publication Publication Date Title
CN106909654A (en) Multi-level classification system and method based on news text information
Halibas et al. Application of text classification and clustering of Twitter data for business analytics
CN106407211B (en) The method and apparatus classified to the semantic relation of entity word
CN107577739B (en) Semi-supervised domain word mining and classifying method and equipment
Bazhenova et al. Discovering decision models from event logs
CN104978328A (en) Hierarchical classifier obtaining method, text classification method, hierarchical classifier obtaining device and text classification device
CN105975491A (en) Enterprise news analysis method and system
CN106934008B (en) Junk information identification method and device
CN106445908A (en) Text identification method and apparatus
WO2018134248A1 (en) Classifying data
CN107748898A (en) File classifying method, device, computing device and computer-readable storage medium
CN106997367A (en) Sorting technique, sorter and the categorizing system of program file
CN109598307A (en) Data screening method, apparatus, server and storage medium
CN105512195A (en) Auxiliary method for analyzing and making decisions of product FMECA report
CN108304509A (en) A kind of comment spam filter method for indicating mutually to learn based on the multidirectional amount of text
CN105045913A (en) Text classification method based on WordNet and latent semantic analysis
CN106096413A (en) A kind of malicious code detecting method based on multi-feature fusion and system
CN107861945A (en) Finance data analysis method, application server and computer-readable recording medium
CN102411592B (en) Text classification method and device
CN110807653A (en) Method and device for screening users and electronic equipment
CN104504334A (en) System and method used for evaluating selectivity of classification rules
CN103246686A (en) Method and device for text classification, and method and device for characteristic processing of text classification
Krenn et al. Predicting the Future of AI with AI: High-quality link prediction in an exponentially growing knowledge network
CN110019563B (en) Portrait modeling method and device based on multi-dimensional data
CN106611189A (en) Method for constructing integrated classifier of standardized multi-dimensional cost sensitive decision-making tree

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100089 710, 7 / F, building 1, zone 1, No.3, Xisanhuan North Road, Haidian District, Beijing

Patentee after: Beijing time Ltd.

Address before: 100089 710, 7 / F, building 1, zone 1, No.3, Xisanhuan North Road, Haidian District, Beijing

Patentee before: BEIJING TIME Co.,Ltd.

CP01 Change in the name or title of a patent holder