CN106909654A - A multi-level classification system and method based on news text information - Google Patents
- Publication number
- CN106909654A; CN201710103541.0A
- Authority
- CN
- China
- Prior art keywords
- classification
- archive information
- newsletter archive
- training
- multiclass
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a multi-level classification system and method based on news text information, relating to the technical field of document classification. The system includes: a training module, configured to train on preset training sample sets with multiple machine learning algorithms for each classification level of news text information, and to determine from the training results the number and type of classifiers corresponding to each level; a multi-level classification module, configured to configure a corresponding multi-level classification model according to the number and type of classifiers determined by the training module for each level; and a result determination module, configured to input acquired news text information to be classified into the multi-level classification model and to take the output of the model as the final classification result of that information. The invention thereby specifically addresses inaccurate classification results caused by imbalanced sample data, improving both the accuracy and the efficiency of classification.
Description
Technical field
The present invention relates to the technical field of document classification, and in particular to a multi-level classification system and method based on news text information.
Background
With the development of the Internet era, online resources have become increasingly abundant and varied. To retrieve and use the various resources on the network effectively, classifying them accurately and comprehensively is particularly important. With the emergence and development of machine learning algorithms, more and more practitioners have applied them to methods for classifying news text information.
However, in the course of making the present invention, the inventors found at least the following problem in the prior art: in many concrete application scenarios, sample data is unevenly distributed for various reasons. When faced with imbalanced data, prior-art methods that use machine learning algorithms to classify news text information without distinguishing levels tend to focus too heavily on majority-class samples, so that minority-class samples cannot be identified accurately, which lowers the overall accuracy of these classification methods.
Summary of the invention
In view of the above problem, the present invention is proposed to provide a multi-level classification system based on news text information, and a corresponding method, that overcome the above problem or at least partially solve it.
According to one aspect of the invention, a multi-level classification system based on news text information is provided, including: a training module, configured to train on preset training sample sets with multiple machine learning algorithms for each classification level of news text information, and to determine from the training results the number and type of classifiers corresponding to each level; a multi-level classification module, configured to configure a corresponding multi-level classification model according to the number and type of classifiers determined by the training module for each level; and a result determination module, configured to input acquired news text information to be classified into the multi-level classification model and to take the output of the model as the final classification result of that information.
According to another aspect of the invention, a multi-level classification method based on news text information is provided, including: for each classification level of news text information, training on preset training sample sets with multiple machine learning algorithms, and determining from the training results the number and type of classifiers corresponding to each level; configuring a corresponding multi-level classification model according to the number and type of classifiers corresponding to each level; and inputting acquired news text information to be classified into the multi-level classification model, and taking the output of the model as the final classification result of that information.
It can be seen that the invention provides a multi-level classification system and method based on news text information. By building a multi-layer news text classification framework and configuring different multi-level classifiers at each layer according to the type of news text information, the invention specifically addresses inaccurate classification results caused by imbalanced sample data, improves the accuracy of news text classification, and raises classification efficiency.
The above is only an overview of the technical solution of the invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features and advantages of the invention are more readily apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the invention. Throughout the drawings, identical parts are denoted by the same reference numerals. In the drawings:
Fig. 1 is a schematic structural diagram of a multi-level classification system based on news text information provided by embodiment one of the invention;
Fig. 2 is a schematic structural diagram of a multi-level classification system based on news text information provided by embodiment two of the invention;
Fig. 3 is a flowchart of a multi-level classification method based on news text information provided by embodiment three of the invention;
Fig. 4 is a flowchart of a multi-level classification method based on news text information provided by embodiment four of the invention;
Fig. 5 is a workflow diagram of the multi-level classification system based on news text information provided by embodiment two of the invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope conveyed fully to those skilled in the art.
The invention provides a multi-level classification system and method based on news text information that can at least solve the prior-art technical problem of inaccurate news text classification caused by data imbalance.
Embodiment one
Fig. 1 shows a multi-level classification system based on news text information provided by the invention. The system includes a training module 110, a multi-level classification module 120 and a result determination module 130.
The training module 110 is configured to train on preset training sample sets with multiple machine learning algorithms for each classification level of news text information, and to determine from the training results the number and type of classifiers corresponding to each level.
When classifying news text information, different items can be assigned to different categories according to their content. To make the classification accurate and fine-grained, a multi-layer classification hierarchy can be used. The hierarchy may be ordered so that the level of abstraction of the categories increases from layer to layer, or so that it decreases from layer to layer. For convenience and to match common usage, this embodiment uses a three-level hierarchy in which the level of abstraction decreases from layer to layer. For example, for the term "English Premier League", the first-level category is "sports", the second-level category is "international football", and the third-level category is "English Premier League". The invention places no specific limit on the number of layers or the basis of classification; those skilled in the art can set these flexibly according to actual conditions.
Data imbalance is frequently encountered when classifying news text information. If a single classification algorithm is used to classify all the data, the characteristics of that algorithm may cause it to focus too heavily on one part of the samples, so that the other part cannot be identified accurately, which lowers the overall classification accuracy of the system. To overcome this problem, this embodiment provides a multi-layer news text classification system in which a corresponding classifier is set at each node of each layer. These classifiers include, but are not limited to, root node classifiers, leaf node classifiers and intermediate node classifiers. In a specific application, the classifiers at the nodes may use the same classification algorithm or different ones; preferably, different algorithms are selected according to the data characteristics of the different nodes at the different layers.
Specifically, in the scheme provided by this embodiment, a corresponding training sample set must be prepared in advance for each node of each layer, and the data in each training sample set should contain all, or at least most, of the features of the category data for the corresponding node. The training module 110 trains on each node's training sample set with multiple classification algorithms and selects the best-performing algorithm for each node, thereby determining the number and type of classifiers corresponding to each classification level.
To further improve the classification accuracy of the multi-level classification system, the classification algorithms in this embodiment are preferably machine learning algorithms, including but not limited to support vector machines, convolutional neural networks and recurrent neural networks. Different algorithms have their own strengths and weaknesses, so the invention does not limit the specific machine learning algorithm used at a node; those skilled in the art can choose according to the results obtained in practice.
The multi-level classification module 120 is configured to configure a corresponding multi-level classification model according to the number and type of classifiers corresponding to each level.
The multi-level classification model is a hybrid model containing multiple algorithms: it includes the different classification algorithms used by the classifiers at all nodes in the system, and it records the connections between the classifiers in a configuration file. In this embodiment, the multi-level classification module 120 configures the corresponding multi-level classification model according to the number and type of classifiers determined by the training module 110 for each level, and generates a configuration file that records the information of each node classifier. After news text information to be classified is input into the model, the multi-level classification module 120 looks up the configuration file using the output of the current node classifier to determine that classifier's next-level node classifier. The multi-level classification model is preferably a tree-shaped classification model comprising multiple levels of node classifiers.
The result determination module 130 is configured to input acquired news text information to be classified into the multi-level classification model and to take the output of the model as the final classification result of that information.
Specifically, the result determination module 130 inputs the acquired news text information to be classified into the multi-level classification model in the multi-level classification module 120. The multi-level classification module 120 identifies and classifies the information using its built-in classifiers and passes the classification result to the result determination module 130, which determines the final classification result of the information from the result output by the multi-level classification module 120.
It can be seen that the multi-level classification system based on news text information provided by the invention, by building a multi-layer news text classification framework and configuring different classifiers at each layer, specifically addresses inaccurate classification results caused by imbalanced sample data, improves the accuracy of news text classification, and raises classification efficiency.
Embodiment two
Fig. 2 shows a multi-level classification system based on news text information provided by the invention. The system includes a training module 210, an evaluation module 220, a multi-level classification module 230, a model update module 240 and a result determination module 250.
The training module 210 is configured to train on preset training sample sets with multiple machine learning algorithms for each classification level of news text information, and to determine from the training results the number and type of classifiers corresponding to each level.
Specifically, the training module 210 generates training sample sets from the acquired labeled data, extracts the training feature words contained in each training sample set, and assigns a weight to each extracted feature word. It then generates training feature vectors from the extracted feature words and their weights, and obtains the training results and corresponding classifiers from those vectors. Training feature words may be extracted using a preset dictionary or according to other rules; the invention places no specific limit on this. Nor does the invention limit the specific method of assigning weights to the extracted feature words, which those skilled in the art can set flexibly. For example, when the news text information to be classified is a plain-text file, the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm can be used to assign weights to the extracted training feature words.
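The TF-IDF weighting step mentioned above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `tfidf_vectors`, the toy documents, the vocabulary and the smoothed IDF formula are all assumptions chosen for the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs, vocabulary):
    """Return one TF-IDF vector (list of floats) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each feature word.
    df = {w: sum(1 for d in docs if w in d) for w in vocabulary}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = []
        for w in vocabulary:
            # Term frequency: share of the document's tokens equal to w.
            tf = counts[w] / len(doc) if doc else 0.0
            # Smoothed IDF (an assumed variant) avoids division by zero.
            idf = math.log((1 + n_docs) / (1 + df[w])) + 1.0
            vec.append(tf * idf)
        vectors.append(vec)
    return vectors

# Hypothetical, already-tokenized training samples.
docs = [
    ["premier", "league", "match", "goal"],
    ["stock", "market", "index", "market"],
    ["league", "transfer", "market"],
]
vocabulary = ["league", "market", "goal"]  # stands in for the preset dictionary
vectors = tfidf_vectors(docs, vocabulary)
```

Each row of `vectors` is a feature vector aligned with `vocabulary`; a word that is rare across documents but frequent within one receives the largest weight.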
To further improve the classification accuracy of the multi-level classification system, the classification algorithms in this embodiment are preferably machine learning algorithms, including but not limited to support vector machines, convolutional neural networks and recurrent neural networks. Different algorithms have their own strengths and weaknesses, so the invention does not limit the specific machine learning algorithm used at a node; those skilled in the art can choose according to the results obtained in practice.
The evaluation module 220 is configured to evaluate the training results of the training module 210 and, according to the evaluation results, to modify the number and type of classifiers corresponding to each level.
To further improve the accuracy of the number and type of classifiers determined by the training module 210, an evaluation module 220 can be added. The evaluation module 220 evaluates the training results of the training module 210 against a preset validation set and, according to the evaluation results, modifies the number and type of classifiers determined by the training module 210 for each level, so that the chosen classifiers better suit the classification at their level. Such modification includes deleting, adding and/or replacing classifiers. The validation set is a small portion of the labeled data that takes no part in model training and is used specifically to assess which of the trained models performs better.
The evaluation module 220 not only helps the training module 210 determine suitable classifiers; during subsequent operation it can also keep trying newly added sample sets and newly adopted classification algorithms, and evaluate the result of each attempt to determine better classifiers. The invention places no specific limit on the evaluation method used by the evaluation module 220; those skilled in the art can set it flexibly according to actual conditions.
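One way to picture the train-then-evaluate loop for a single node is the sketch below. It is an assumption-laden illustration: the two toy trainers (`train_majority`, `train_word_votes`) merely stand in for real algorithms such as SVMs or neural networks, accuracy on a held-out validation set is only one possible evaluation method, and all names and data are hypothetical.

```python
from collections import Counter, defaultdict

def train_majority(samples):
    """Baseline: always predict the most frequent label in the training set."""
    top = Counter(label for _, label in samples).most_common(1)[0][0]
    return lambda words: top

def train_word_votes(samples):
    """Toy classifier: each word votes for the labels it was seen with."""
    votes = defaultdict(Counter)
    for words, label in samples:
        for w in words:
            votes[w][label] += 1
    fallback = train_majority(samples)

    def predict(words):
        total = Counter()
        for w in words:
            total.update(votes.get(w, {}))
        return total.most_common(1)[0][0] if total else fallback(words)

    return predict

def accuracy(model, validation):
    """Fraction of validation samples the model labels correctly."""
    return sum(model(x) == y for x, y in validation) / len(validation)

def select_classifier(candidates, train_set, validation_set):
    """Train every candidate and keep the one that scores best on validation."""
    scored = []
    for name, trainer in candidates.items():
        model = trainer(train_set)
        scored.append((accuracy(model, validation_set), name, model))
    best_score, best_name, best_model = max(scored, key=lambda t: t[0])
    return best_name, best_score, best_model

# Imbalanced toy data for one node: three "sports" samples, one "finance".
train_set = [
    (["goal", "league"], "sports"),
    (["match", "goal"], "sports"),
    (["league", "coach"], "sports"),
    (["stock", "index"], "finance"),
]
# Validation samples are held out and never used for training.
validation_set = [(["goal", "coach"], "sports"), (["stock", "bank"], "finance")]

candidates = {"majority": train_majority, "word_votes": train_word_votes}
best_name, best_score, best_model = select_classifier(
    candidates, train_set, validation_set
)
```

On this data the majority baseline ignores the minority class, so the validation set favors the word-vote classifier; repeating the selection per node yields the number and type of classifiers for each level.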
The multi-level classification module 230 is configured to configure a corresponding multi-level classification model according to the number and type of classifiers determined by the training module for each level.
In this embodiment, the multi-level classification module 230 configures the corresponding multi-level classification model according to the number and type of classifiers determined by the training module 210 for each level, and generates a configuration file corresponding to the model. Whenever the output of the current node classifier is obtained, the multi-level classification module 230 queries the configuration file to determine the next-level node classifier of the current node classifier. The configuration file stores multiple configuration items, one for each node classifier. Each configuration item contains the description of the corresponding node classifier, the classification type the node classifier is suited to, and/or the correspondence between each output of the node classifier and its next-level node classifier. Using the configuration file, the multi-level classification module 230 can therefore automatically select the most suitable classifier from among the next level's classifiers for the further classification step.
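The configuration-driven routing described above can be sketched as a small lookup loop. Everything here is hypothetical: the configuration layout, the node names and the keyword-rule classifiers are assumptions for illustration, with plain dictionaries standing in for the configuration file.

```python
# Hypothetical configuration: one item per node classifier, mapping each
# output label to the next-level node classifier (None marks a leaf result).
CONFIG = {
    "root": {
        "description": "first-level classifier",
        "next": {"sports": "sports_node", "finance": None},
    },
    "sports_node": {
        "description": "second-level classifier for sports",
        "next": {"international football": "intl_football_node"},
    },
    "intl_football_node": {
        "description": "third-level classifier for international football",
        "next": {"English Premier League": None},
    },
}

# Stand-in classifiers (simple keyword rules); real nodes would hold trained models.
CLASSIFIERS = {
    "root": lambda words: "sports" if "league" in words else "finance",
    "sports_node": lambda words: (
        "international football" if "league" in words else "basketball"
    ),
    "intl_football_node": lambda words: "English Premier League",
}

def classify(words, config=CONFIG, classifiers=CLASSIFIERS, start="root"):
    """Walk the classifier tree, using the config to pick each next node."""
    path = []
    node = start
    while node is not None:
        label = classifiers[node](words)
        path.append(label)
        # Look up which node classifier handles this output, if any.
        node = config[node]["next"].get(label)
    return path

sports_path = classify(["premier", "league", "match"])
finance_path = classify(["stock", "index"])
```

Because the routing lives in data rather than code, replacing a node's classifier or re-wiring the tree only requires editing the configuration entries.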
In this embodiment, the multi-level classification model is a tree-shaped classification model comprising multiple levels of node classifiers. The model includes several different kinds of node classifier, for example root node classifiers, leaf node classifiers and intermediate node classifiers. There are usually multiple leaf node classifiers and intermediate node classifiers, and they may relate to sub-categories in several ways. The relation may be one-to-one, where a node classifier corresponds to exactly one sub-category. It may be many-to-one, where several node classifiers correspond to the same sub-category; in that case, a node classifier of a suitable type can be selected to identify the sub-category according to factors such as the type of news text information. It may also be one-to-many, where one node classifier corresponds to several sub-categories; the classification rules of those sub-categories are then typically similar, so the same node classifier can identify them. There is usually one root node classifier, but there may be several root node classifiers of different types to suit different types of news text information. The above description of the model structure is only an example and does not limit the structure of the multi-level classification model; those skilled in the art may use other suitable structures according to actual conditions.
The model update module 240 is configured to update the configured multi-level classification model according to the modifications made by the evaluation module 220.
So that the system achieves the best recognition of news text information, the evaluation module 220 may keep modifying the number and type of classifiers determined by the training module 210. The model update module 240 therefore updates the configured multi-level classification model according to the modifications made by the evaluation module 220. At the same time, the model update module 240 must also update the configuration file generated by the multi-level classification module 230 and, according to the updated configuration file, update the multi-level classification model to match.
To improve the overall operating efficiency of the system, the model update module 240 may update the multi-level classification model by hot swapping: without shutting down the system, and based on the modification results of the evaluation module 220, when a new model performs better than the model currently online, a hot-swap operation quickly replaces the multi-level classification model the system uses. To support the hot-swap operations of the model update module 240, the configuration file generated by the multi-level classification module 230 may contain multiple metadata entries, each corresponding to a different classification model. Each entry records the path and description (for example, the version) of its classification model, and is updated in step whenever the model is updated. When performing a hot-swap update, the model update module 240 can then complete the operation automatically from the recorded metadata.
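A hot swap of this kind can be sketched with a lock-protected holder for the online model and its metadata. This is only an illustration under assumptions: the class name `ModelRegistry`, the score-based acceptance rule, and the example paths and versions are hypothetical and do not come from the patent.

```python
import threading

class ModelRegistry:
    """Holds the online model and its metadata; swaps both atomically."""

    def __init__(self, model, metadata, score):
        self._lock = threading.Lock()
        self._model = model
        self._metadata = dict(metadata)
        self._score = score

    def predict(self, text):
        # Take a reference under the lock, then run the model outside it,
        # so a swap never blocks on a long-running prediction.
        with self._lock:
            model = self._model
        return model(text)

    def metadata(self):
        with self._lock:
            return dict(self._metadata)

    def hot_swap(self, model, metadata, score):
        """Replace the online model only if the new one scored better."""
        with self._lock:
            if score <= self._score:
                return False
            self._model = model
            self._metadata = dict(metadata)  # metadata is updated in step
            self._score = score
            return True

# Hypothetical models and metadata (path and version are made-up values).
registry = ModelRegistry(
    lambda text: "sports", {"path": "models/level1_v1", "version": "1"}, 0.80
)
rejected = registry.hot_swap(
    lambda text: "finance", {"path": "models/level1_v0", "version": "0"}, 0.75
)
accepted = registry.hot_swap(
    lambda text: "finance", {"path": "models/level1_v2", "version": "2"}, 0.90
)
```

Callers keep using `registry.predict` throughout; the service never stops, and the metadata always describes the model that is actually online.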
The result determination module 250 is configured to input acquired news text information to be classified into the multi-level classification model and to take the output of the model as the final classification result of that information.
News text information to be classified is usually a complete paragraph or article and cannot be fed directly into the multi-level classification model for identification. Before input, the result determination module 250 therefore performs a series of preprocessing operations to convert the information into a form the model can recognize. Common preprocessing operations include extracting the document feature words contained in the news text information to be classified, assigning a weight to each extracted feature word, and generating the corresponding document feature vector from the extracted feature words and their weights. The rules for extracting feature words and assigning weights can be the same as those of the corresponding operations in the training module 210 and are not repeated here.
In addition, in practice news text information to be classified comes from diverse sources, so the result determination module 250 must first apply a series of normalization steps to it to ease subsequent preprocessing. Common normalization steps include adjusting the fonts in the news text information according to preset font-setting rules, and/or filtering the vocabulary in the news text information according to preset filtering rules.
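A normalization pass of this sort might look like the sketch below. The patent does not specify the rules, so both steps are assumptions: Unicode NFKC normalization (which folds full-width characters into their standard forms) stands in for the font adjustment, and a small hypothetical stop-word list stands in for the filtering rules.

```python
import unicodedata

# Hypothetical filter list; a real system would load its preset rules.
STOP_WORDS = {"the", "a", "of", "and"}

def normalize(text, stop_words=STOP_WORDS):
    """Fold character variants, lowercase, tokenize and filter vocabulary."""
    # NFKC maps compatibility characters (e.g. full-width letters) to
    # their standard equivalents; used here as an assumed glyph rule.
    text = unicodedata.normalize("NFKC", text).lower()
    tokens = text.split()
    return [t for t in tokens if t not in stop_words]

# "\uff34\uff48\uff45" is the word "The" written in full-width letters.
tokens = normalize("\uff34\uff48\uff45 League of Champions")
```

The resulting token list can then go through the same feature-word extraction and weighting used during training.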
As described above, in the multi-level classification system based on news text information provided by the invention, classifiers of different kinds and numbers can be set at each node, so classifiers can be chosen to suit the type and content of the news text information to be classified. For example, when the information is of text type, the corresponding classifier can use an algorithm suited to text classification, such as naive Bayes; when it is of picture type, the corresponding classifier can use an algorithm suited to picture classification, such as a deep learning algorithm. Setting classifiers of different kinds and numbers at different nodes thus allows targeted identification of the various types of news text information to be classified, making the final classification result more accurate.
For example, when the news text information to be classified contains pictures, the picture information it contains is obtained first; a preset picture classification model then determines the picture classification result corresponding to the picture information; finally, a document feature vector corresponding to the news text information is generated from the picture classification result, and a preset news text classification model determines the news text classification result corresponding to that vector. Processing news text information that contains pictures in this way quantifies the pictures quickly and accurately, reducing pictures that are large in data volume and variable in format to corresponding picture classification results. Because a picture classification result is small in data volume, quick to process and classifies well, using it to determine the type of news text information likewise brings fast processing and accurate results.
To make the workflow of the multi-level classification system based on news text information provided by the invention easier to understand, it is described in detail below with reference to Fig. 5. The system can be divided broadly into two parts, a "training part" and a "prediction part". The training part builds and revises the models; the prediction part uses the built classification models to identify and classify news text information to be classified.
In the training part, labeled documents prepared in advance are first input into the system. The training module obtains labeled data from the labeled documents, generates training sample sets from that data, and extracts training feature words from the sample sets into a corresponding dictionary. The training module then trains models using the training sample sets and the dictionary, obtaining different classification models together with the metadata and dictionary corresponding to each. Afterwards, the evaluation module evaluates the models according to how they perform in actual use and selects the most suitable classification model for identifying and classifying the specific news text information to be classified.
In the prediction part, news text information to be classified is first input into the system. The result determination module preprocesses it and sends the preprocessed information to the multi-level classification module. The multi-level classification module identifies and classifies the information using the multi-level classification algorithms contained in its selected multi-level classification model (such as the first-level, second-level and third-level classification algorithms shown in the figure) and sends the classification result to the result determination module. Meanwhile, the model update module can hot-swap the multi-level classification model used by the multi-level classification module according to the modifications made by the evaluation module, so that the system stays in its best working state. Finally, the result determination module takes the output of the multi-level classification model sent by the multi-level classification module as the final classification result of the news text information to be classified.
As can be seen, the multiclass classification system based on newsletter archive information provided by the present invention builds a multi-level classification framework for newsletter archive information and configures different classifiers at each level. It thereby specifically addresses the problem of inaccurate classification results caused by unbalanced sample data and effectively improves both the accuracy and the efficiency of newsletter archive information classification. In addition, the system classifies newsletter archive information using machine learning algorithms and corrects itself in real time through an evaluation mechanism and a model update mechanism with a hot-swap function, so that it stays in its optimal working state. Furthermore, through the preprocessing and standardization operations, the system can identify newsletter archive information to be classified of different kinds and from different sources, which further improves the adaptability of the system and widens its range of use.
Embodiment three
Fig. 3 shows a multiclass classification method based on newsletter archive information provided by the present invention. The method includes:
Step S310: For each level of classification of the newsletter archive information, train on a preset training sample set with multiple machine learning algorithms, and determine the number and type of classifiers corresponding to each level of classification according to the training result.
Specifically, in the scheme provided by this embodiment, a corresponding training sample set needs to be preset for each node of each level. The data in each training sample set should contain all, or at least most, of the features of the category data of the corresponding node. The training sample set of each node is then trained with multiple classification algorithms, and the optimal classification algorithm is selected for each node, thereby determining the number and type of classifiers corresponding to each level of classification.
To further improve the classification accuracy of the multiclass classification system, the classification algorithms in this embodiment are preferably machine learning algorithms, which include but are not limited to the support vector machine algorithm, the convolutional neural network algorithm, and the recurrent neural network algorithm. Different algorithms have their own strengths and weaknesses, so the present invention does not limit the specific machine learning algorithm used at a node; those skilled in the art can set it according to the practical application effect.
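The per-node selection in step S310 can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the invention: the two toy classifiers (`MajorityClassifier`, `KeywordClassifier`) stand in for real algorithms such as a support vector machine or a neural network, and the sample texts, the held-out set, and the use of accuracy as the selection criterion are all hypothetical.

```python
from collections import Counter


class MajorityClassifier:
    """Toy baseline: always predicts the most frequent training label."""

    def fit(self, texts, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, text):
        return self.label


class KeywordClassifier:
    """Toy model: predicts the label whose training vocabulary overlaps most."""

    def fit(self, texts, labels):
        self.vocab = {}
        for text, label in zip(texts, labels):
            self.vocab.setdefault(label, set()).update(text.split())
        return self

    def predict(self, text):
        words = set(text.split())
        return max(self.vocab, key=lambda lab: len(words & self.vocab[lab]))


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the true label."""
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


def select_classifier(candidates, train, held_out):
    """Train every candidate on one node's sample set and keep the best one."""
    scored = []
    for name, factory in candidates.items():
        model = factory().fit(*train)
        scored.append((accuracy(model, *held_out), name, model))
    best = max(scored, key=lambda item: item[0])
    return best[1], best[2]


# Hypothetical sample set for a single node of the classification tree.
train = (
    ["stock market rises", "bank cuts rates", "bank profit falls", "team wins match"],
    ["finance", "finance", "finance", "sports"],
)
held_out = (["market rates fall", "team loses match"], ["finance", "sports"])
candidates = {"majority": MajorityClassifier, "keyword": KeywordClassifier}
best_name, best_model = select_classifier(candidates, train, held_out)
```

Running the same selection once per node yields the number and type of classifiers for every level; with the unbalanced sample above, the majority baseline misses the rare class and the keyword model is kept.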
Step S320: Configure a corresponding multiclass classification model according to the number and type of classifiers corresponding to each level of classification.
The multiclass classification model is a hybrid model containing multiple algorithms: it includes the different classification algorithms used by the classifiers on all nodes of the system, and it records the connections between the classifiers in a configuration file. In this embodiment, the corresponding multiclass classification model is first configured according to the number and type of classifiers determined in step S310 for each level of classification, and a configuration file recording the information of each node classifier is generated. After newsletter archive information to be classified is input into the multiclass classification model, the configuration file is queried according to the obtained output of the current node classifier to determine the next-level node classifier of the current node classifier. The multiclass classification model is preferably a tree-shaped classification model comprising multi-level node classifiers.
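The traversal described above, in which the configuration is queried after each node classifier's output to find the next-level classifier, might look like the following sketch. The node names, the `CONFIG` layout, and the keyword rules used in place of trained classifiers are all assumptions made for illustration.

```python
# Hypothetical configuration: each node maps every output label to the
# next-level node (None marks a final category).
CONFIG = {
    "root":    {"next": {"sports": "sports", "finance": "finance"}},
    "sports":  {"next": {"football": None, "tennis": None}},
    "finance": {"next": {"stocks": None, "banking": None}},
}

# Stand-in node classifiers: simple keyword rules instead of trained models.
KEYWORDS = {
    "root":    {"sports": {"match", "goal", "serve"}, "finance": {"shares", "loan"}},
    "sports":  {"football": {"goal"}, "tennis": {"serve"}},
    "finance": {"stocks": {"shares"}, "banking": {"loan"}},
}


def classify_node(node, words):
    """Return the label at `node` whose keyword set overlaps `words` most."""
    rules = KEYWORDS[node]
    return max(rules, key=lambda label: len(rules[label] & words))


def classify(text, config=CONFIG):
    """Walk from the root, querying the configuration after each output."""
    words = set(text.lower().split())
    node, path = "root", []
    while node is not None:
        label = classify_node(node, words)
        path.append(label)
        node = config[node]["next"][label]
    return path
```

Because the links between classifiers live in the configuration rather than in code, a node classifier can be replaced or re-routed by editing the configuration alone.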
Step S330: Input the obtained newsletter archive information to be classified into the multiclass classification model for classification, and determine the output of the multiclass classification model as the final classification result of the newsletter archive information to be classified.
Specifically, the obtained newsletter archive information to be classified is input into the multiclass classification model. The model identifies and classifies the information with its built-in classifiers and generates a classification result, and the output classification result is finally determined as the final classification result of the newsletter archive information to be classified.
As can be seen, the multiclass classification method based on newsletter archive information provided by the present invention builds a multi-level classification framework for newsletter archive information and configures different classifiers at each level. It thereby specifically addresses the problem of inaccurate classification results caused by unbalanced sample data and effectively improves both the accuracy and the efficiency of newsletter archive information classification.
Embodiment four
Fig. 4 shows a multiclass classification method based on newsletter archive information provided by the present invention. The method includes:
Step S410: For each level of classification of the newsletter archive information, train on a preset training sample set with multiple machine learning algorithms, and determine the number and type of classifiers corresponding to each level of classification according to the training result.
Specifically, a training sample set is generated from the obtained labeled data, the training feature words contained in the training sample set are extracted, and corresponding weights are assigned to the extracted training feature words. Corresponding training feature vectors are then generated from the extracted training feature words and their weights, and the training result and corresponding classifiers are obtained from the training feature vectors. The training feature words may be extracted according to a preset dictionary or according to other rules; the present invention does not specifically limit this. Nor does the present invention limit the specific method of assigning weights to the extracted training feature words, which those skilled in the art can set flexibly. For example, when the newsletter archive information to be classified is a plain text file, the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm can be used to assign corresponding weights to the extracted training feature words.
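As a concrete illustration of the weighting step, the sketch below computes TF-IDF weights and feature vectors for a few tokenized documents. It uses one common unsmoothed variant (term frequency as count divided by document length, inverse document frequency as log(N / df)); the source does not specify which variant is intended, and the sample documents are invented. Documents are assumed to be non-empty.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return the sorted vocabulary and one TF-IDF vector per tokenized document."""
    vocab = sorted({word for doc in documents for word in doc})
    n_docs = len(documents)
    # Document frequency: the number of documents each word appears in.
    df = Counter(word for doc in documents for word in set(doc))
    idf = {word: math.log(n_docs / df[word]) for word in vocab}
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append([counts[word] / len(doc) * idf[word] for word in vocab])
    return vocab, vectors


# Hypothetical feature words extracted from three news items.
docs = [
    "markets rally as bank shares rise".split(),
    "bank cuts interest rates".split(),
    "striker scores twice in cup final".split(),
]
vocab, vectors = tfidf_vectors(docs)
```

A word that occurs in few documents (such as "markets") receives a higher weight than one that occurs in many (such as "bank"), which is what makes the resulting vectors useful for separating categories.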
To further improve the classification accuracy of the multiclass classification method, the classification algorithms in this embodiment are preferably machine learning algorithms, which include but are not limited to the support vector machine algorithm, the convolutional neural network algorithm, and the recurrent neural network algorithm. Different algorithms have their own strengths and weaknesses, so the present invention does not limit the specific machine learning algorithm used at a node; those skilled in the art can set it according to the practical application effect.
Step S420: Configure a corresponding multiclass classification model according to the number and type of classifiers corresponding to each level of classification.
In this embodiment, the corresponding multiclass classification model is configured according to the number and type of classifiers determined in step S410 for each level of classification, and a configuration file corresponding to the multiclass classification model is generated. Whenever the output of the current node classifier is obtained, the configuration file is queried to determine the next-level node classifier of the current node classifier. The configuration file stores multiple configuration items, each corresponding to one node classifier. Specifically, each configuration item includes the description information of the corresponding node classifier, the category type the node classifier is suited to, and/or the correspondence between each output of the node classifier and its next-level node classifier. With the configuration file, the most suitable classifier can therefore be selected automatically from the multiple classifiers of the next level to carry out further classification.
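One possible shape for such a configuration file is sketched below as JSON, with a loader that checks that every next-level reference points to an existing node. The file format, the field names (`description`, `category_type`, `next`), and the node names are assumptions; the source only states which kinds of information a configuration item holds.

```python
import json

# Hypothetical configuration file content; all field names are illustrative.
CONFIG_TEXT = """
{
  "root": {
    "description": "top-level news classifier",
    "category_type": "news",
    "next": {"sports": "sports_node", "finance": "finance_node"}
  },
  "sports_node": {
    "description": "sports sub-category classifier",
    "category_type": "sports",
    "next": {"football": null, "tennis": null}
  },
  "finance_node": {
    "description": "finance sub-category classifier",
    "category_type": "finance",
    "next": {"stocks": null, "banking": null}
  }
}
"""


def load_config(text):
    """Parse the configuration and check that every referenced node exists."""
    config = json.loads(text)
    for name, item in config.items():
        for label, target in item["next"].items():
            if target is not None and target not in config:
                raise ValueError(
                    f"{name}: output {label!r} points to unknown node {target!r}"
                )
    return config


def next_classifier(config, node, output):
    """Look up the next-level node classifier for one output of `node`."""
    return config[node]["next"][output]


config = load_config(CONFIG_TEXT)
```

Validating the references at load time means a broken link is reported when the configuration is read rather than in the middle of classifying a document.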
In this embodiment, the multiclass classification model is a tree-shaped classification model comprising multi-level node classifiers. The model includes multiple node classifiers of different types, for example a root node classifier, leaf node classifiers, and intermediate node classifiers. There are usually multiple leaf node classifiers and intermediate node classifiers, and their relationship to sub-categories can take several forms. It may be a one-to-one relationship, in which one node classifier corresponds to only one sub-category. It may be a many-to-one relationship, in which multiple node classifiers correspond to the same sub-category; in that case, a node classifier of a suitable type can further be selected to identify the sub-category according to factors such as the type of the newsletter archive information. It may also be a one-to-many relationship, in which one node classifier corresponds to multiple sub-categories; the classification rules of these sub-categories are then usually similar, so the same node classifier can identify them. In addition, there is usually one root node classifier, but there may also be multiple root node classifiers of different types, so as to adapt to different types of newsletter archive information. The above description of the structure of the multiclass classification model is only an example and does not limit that structure; those skilled in the art can use other suitable structures according to actual conditions.
Step S430: Evaluate the training result, modify the number and type of classifiers corresponding to each level of classification according to the evaluation result, and update the configured multiclass classification model according to the modification result.
To further improve the accuracy of the number and type of classifiers determined in step S410, an evaluation step, namely step S430, can be added. The training result of step S410 is evaluated against a preset validation set, and the number and type of classifiers determined in step S410 for each level of classification are modified according to the evaluation result, so that the determined classifiers better fit the classification of the level where they are located. The modification includes deleting, adding, and/or replacing classifiers. The validation set is a small portion of the labeled data that does not take part in model training and is used specifically to assess which of the trained models performs better.
Step S430 not only helps step S410 determine suitable classifiers; while the subsequent steps are running, it can also continually try newly added sample sets and newly adopted classification algorithms and evaluate the result of each attempt, thereby determining better classifiers. The present invention does not limit the specific evaluation method used in step S430; those skilled in the art can set it flexibly according to actual conditions.
To let the method achieve the best recognition effect on newsletter archive information, step S430 can continually modify the number and type of classifiers determined in step S410. At the same time, the configuration file corresponding to the configured multiclass classification model is updated accordingly, and the multiclass classification model is updated to match the updated configuration file.
To improve the overall operating efficiency of the method, the update that step S430 performs on the multiclass classification model can be a hot-swap update: without shutting down the system, when a new model performs better than the model currently online, the version of the multiclass classification model used by the system is quickly updated through a hot-swap operation. To support the hot-swap operation, the configuration file generated in step S420 can contain multiple pieces of metadata, each corresponding to a different classification model. Each piece of metadata records the path and description information (such as the version) of the corresponding classification model. When a classification model is updated, the corresponding metadata is updated synchronously, so a hot-swap update of the model can be completed automatically according to the content recorded in the metadata.
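A hot-swap of this kind could be sketched as follows. The `ModelRegistry` class, the metadata fields (`path`, `version`), the file paths, and the scores are hypothetical; the point is only that the live model reference and its metadata are replaced together while classification requests keep being served.

```python
import threading


class ModelRegistry:
    """Holds the live model and swaps it atomically while requests continue."""

    def __init__(self, model, metadata):
        self._lock = threading.Lock()
        self._model = model
        self.metadata = dict(metadata)

    def classify(self, text):
        with self._lock:
            model = self._model  # take a reference, then release the lock
        return model(text)

    def hot_swap(self, new_model, new_metadata, new_score, live_score):
        """Install `new_model` only if it outperforms the live model."""
        if new_score <= live_score:
            return False
        with self._lock:
            self._model = new_model
            self.metadata = dict(new_metadata)  # keep metadata in sync
        return True


# Models are plain callables here; paths, versions, and scores are made up.
registry = ModelRegistry(lambda text: "v1", {"path": "models/v1.bin", "version": "1"})
swapped = registry.hot_swap(
    lambda text: "v2",
    {"path": "models/v2.bin", "version": "2"},
    new_score=0.93,
    live_score=0.90,
)
```

The lock is held only while the reference is read or replaced, so a request that is already running finishes on the old model and the next request sees the new one.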
Step S440: Input the obtained newsletter archive information to be classified into the multiclass classification model for classification, and determine the output of the multiclass classification model as the final classification result of the newsletter archive information to be classified.
The newsletter archive information to be classified is usually a complete paragraph or article and cannot be input directly into the multiclass classification model for identification. Before it is input, a series of preprocessing operations therefore needs to be performed to convert it into a form that the multiclass classification model can recognize. Common preprocessing operations include extracting the file feature words contained in the newsletter archive information to be classified, assigning corresponding weights to the extracted file feature words, and generating corresponding file feature vectors from the extracted file feature words and their weights. The rules for extracting file feature words and assigning weights can be consistent with those of the similar operations in step S410 and are not repeated here.
In addition, in practical applications the newsletter archive information to be classified comes from diverse sources, so a series of standardization operations also needs to be performed on it first to facilitate the subsequent preprocessing. Common standardization operations include adjusting the fonts in the newsletter archive information to be classified according to a preset font setting rule, and/or filtering the vocabulary in the newsletter archive information to be classified according to a preset filtering rule.
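The standardization step might be sketched as below. The source does not say what the font setting rule or the filtering rule contains, so both are assumptions here: Unicode NFKC normalization (which folds full-width letters into their ordinary forms) plus lowercasing stands in for the font rule, and a small stop-word list stands in for the filtering rule.

```python
import unicodedata

STOP_WORDS = {"the", "a", "of", "in"}  # hypothetical filtering rule


def standardize(text):
    """Unify character forms (NFKC folds full-width letters) and lowercase."""
    return unicodedata.normalize("NFKC", text).lower()


def filter_words(text, stop_words=STOP_WORDS):
    """Drop the vocabulary matched by the filtering rule."""
    return [word for word in text.split() if word not in stop_words]


def preprocess(text):
    """Standardize first, then filter, yielding tokens for feature extraction."""
    return filter_words(standardize(text))


tokens = preprocess("Ｔｈｅ Ｂａｎｋ of England")
```

Text from different sources then reaches feature extraction in one consistent form, regardless of how it was originally encoded or typeset.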
As can be seen, the multiclass classification method based on newsletter archive information provided by the present invention builds a multi-level classification framework for newsletter archive information and configures different classifiers at each level. It thereby specifically addresses the problem of inaccurate classification results caused by unbalanced sample data and effectively improves both the accuracy and the efficiency of newsletter archive information classification. In addition, the method classifies newsletter archive information using machine learning algorithms and corrects the classification model in real time through an evaluation mechanism and a model update mechanism with a hot-swap function, so that the method stays in its optimal implementation state. Furthermore, through the preprocessing and standardization operations, the method can identify newsletter archive information to be classified of different kinds and from different sources, which further improves the adaptability of the method and widens its range of use.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. Furthermore, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein can be implemented in various programming languages, and that the description given above for a specific language is intended to disclose the best mode of the invention.
Numerous specific details are set forth in the specification provided herein. It should be understood, however, that embodiments of the invention can be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from the embodiment. The modules or units or components of the embodiments can be combined into one module or unit or component, and can furthermore be divided into multiple sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art will appreciate that although some embodiments described herein include some features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the multiclass classification system based on newsletter archive information according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any ordering; these words may be interpreted as names.
The invention discloses: A1. A multiclass classification system based on newsletter archive information, including:
a training module, configured to, for each level of classification of the newsletter archive information, train on a preset training sample set with multiple machine learning algorithms and determine the number and type of classifiers corresponding to each level of classification according to the training result;
a multiclass classification module, configured to configure a corresponding multiclass classification model according to the number and type of classifiers determined by the training module for each level of classification;
a result determining module, configured to input the obtained newsletter archive information to be classified into the multiclass classification model for classification and to determine the output of the multiclass classification model as the final classification result of the newsletter archive information to be classified.
A2. The system according to A1, wherein the system further includes:
an evaluation module, configured to evaluate the training result of the training module and to modify the number and type of classifiers corresponding to each level of classification according to the evaluation result, the modification including deleting, adding, and/or replacing classifiers;
a model modification module, configured to update the configured multiclass classification model according to the modification made by the evaluation module.
A3. The system according to A2, wherein the multiclass classification module is further configured to generate a configuration file corresponding to the multiclass classification model, and the model modification module is further configured to update the configuration file and to update the multiclass classification model according to the updated configuration file.
A4. The system according to any one of A1-A3, wherein the multiclass classification model is a tree-shaped classification model comprising multi-level node classifiers, and the tree-shaped classification model includes multiple node classifiers of different types.
A5. The system according to A4, wherein the multiclass classification module is further configured to: whenever the output of the current node classifier is obtained, determine the next-level node classifier of the current node classifier by querying the configuration file corresponding to the multiclass classification model;
wherein the configuration file stores multiple configuration items, each corresponding to one node classifier, and each configuration item includes: the description information of the corresponding node classifier, the category type the node classifier is suited to, and/or the correspondence between each output of the node classifier and its next-level node classifier.
A6. The system according to any one of A1-A5, wherein the training module is specifically configured to:
generate the training sample set from the obtained labeled data, extract the training feature words contained in the training sample set, and assign corresponding weights to the extracted training feature words;
generate corresponding training feature vectors from the extracted training feature words and their weights, and obtain the training result and corresponding classifiers from the training feature vectors.
A7. The system according to any one of A1-A6, wherein the result determining module is specifically configured to: preprocess the obtained newsletter archive information to be classified, and input the preprocessed newsletter archive information to be classified into the multiclass classification model for classification;
wherein the preprocessing includes: extracting the file feature words contained in the newsletter archive information to be classified, assigning corresponding weights to the extracted file feature words, and generating corresponding file feature vectors from the extracted file feature words and their weights.
A8. The system according to A7, wherein the result determining module is further configured to, before the preprocessing: adjust the fonts in the newsletter archive information to be classified according to a preset font setting rule, and/or filter the vocabulary in the newsletter archive information to be classified according to a preset filtering rule.
A9. The system according to any one of A1-A8, wherein the multiple machine learning algorithms include at least one of the following: the support vector machine algorithm, the convolutional neural network algorithm, and the recurrent neural network algorithm.
The invention also discloses: B10. A multiclass classification method based on newsletter archive information, including:
for each level of classification of the newsletter archive information, training on a preset training sample set with multiple machine learning algorithms, and determining the number and type of classifiers corresponding to each level of classification according to the training result;
configuring a corresponding multiclass classification model according to the number and type of classifiers corresponding to each level of classification;
inputting the obtained newsletter archive information to be classified into the multiclass classification model for classification, and determining the output of the multiclass classification model as the final classification result of the newsletter archive information to be classified.
B11. The method according to B10, wherein the method further includes:
evaluating the training result, modifying the number and type of classifiers corresponding to each level of classification according to the evaluation result, and updating the configured multiclass classification model according to the modification result; wherein the modification includes deleting, adding, and/or replacing classifiers.
B12. The method according to B11, wherein the step of configuring a corresponding multiclass classification model according to the number and type of classifiers corresponding to each level of classification further includes: generating a configuration file corresponding to the multiclass classification model; and the step of updating the configured multiclass classification model according to the modification result further includes: updating the configuration file, and updating the multiclass classification model according to the updated configuration file.
B13. The method according to any one of B10-B12, wherein the multiclass classification model is a tree-shaped classification model comprising multi-level node classifiers, and the tree-shaped classification model includes multiple node classifiers of different types.
B14. The method according to B13, wherein the step of configuring a corresponding multiclass classification model according to the number and type of classifiers corresponding to each level of classification further includes: whenever the output of the current node classifier is obtained, determining the next-level node classifier of the current node classifier by querying the configuration file corresponding to the multiclass classification model;
wherein the configuration file stores multiple configuration items, each corresponding to one node classifier, and each configuration item includes: the description information of the corresponding node classifier, the category type the node classifier is suited to, and/or the correspondence between each output of the node classifier and its next-level node classifier.
B15. The method according to any one of B10-B14, wherein the step of training a preset training sample set with multiple machine learning algorithms for the categories at each level of the news text information, and determining the number and type of classifiers corresponding to the categories at each level according to the training results, specifically comprises:
generating the training sample set from the obtained labeled data, extracting the training feature words contained in the training sample set, and assigning a corresponding weight to each extracted training feature word;
generating corresponding training feature vectors from the extracted training feature words and their weights, and obtaining the training results and the corresponding classifiers from the training feature vectors.
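The feature-word weighting and vector generation in B15 can be sketched as follows. The statement above does not name a weighting scheme, so the TF-IDF weighting used here is an assumption, as are the pre-tokenized sample documents and all function names.

```python
import math
from collections import Counter
from typing import Dict, List


def build_vocabulary(docs: List[List[str]]) -> List[str]:
    """Collect the feature words that occur in the training samples."""
    return sorted({word for doc in docs for word in doc})


def idf_weights(docs: List[List[str]], vocab: List[str]) -> Dict[str, float]:
    """Assign each feature word a smoothed inverse-document-frequency weight."""
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    return {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}


def to_vector(doc: List[str], vocab: List[str], idf: Dict[str, float]) -> List[float]:
    """Build the feature vector: term frequency times the word's weight."""
    tf = Counter(doc)
    total = len(doc) or 1
    return [tf[w] / total * idf[w] for w in vocab]


# Hypothetical, already tokenized training samples.
docs = [
    ["team", "wins", "match"],
    ["bank", "raises", "rates"],
    ["team", "signs", "striker"],
]
vocab = build_vocabulary(docs)
idf = idf_weights(docs, vocab)
vec = to_vector(docs[0], vocab, idf)
```

Each vector has one entry per feature word, and a word that appears in fewer samples receives a larger weight. Vectors of this kind could then be passed to any of the learning algorithms to obtain a classifier.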
B16. The method according to any one of B10-B15, wherein the step of inputting the obtained news text information to be classified into the multi-level classification model for classification, and determining the output result of the multi-level classification model as the final classification result of the news text information to be classified, specifically comprises:
pre-processing the obtained news text information to be classified, and inputting the pre-processed news text information to be classified into the multi-level classification model for classification;
wherein the pre-processing includes: extracting the text feature words contained in the news text information to be classified, assigning a corresponding weight to each extracted text feature word, and generating a corresponding text feature vector from the extracted text feature words and their weights.
B17. The method according to B16, further comprising, before the pre-processing: adjusting the font in the news text information to be classified according to a preset font setting rule, and/or filtering the words in the news text information to be classified according to a preset filtering rule.
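The two clean-up steps in B17 can be sketched as below. The source does not define the font setting rule or the filtering rule, so this example assumes the font rule is Unicode NFKC normalization plus lower-casing (which folds full-width characters into their half-width forms) and that the filtering rule is a stop-word list. `STOP_WORDS` and the function names are hypothetical.

```python
import unicodedata
from typing import Iterable, List, Set

# Hypothetical filtering rule: a small stop-word list.
STOP_WORDS: Set[str] = {"the", "a", "of"}


def normalize_text(text: str) -> str:
    """Assumed font rule: fold compatibility forms (e.g. full-width) and case."""
    return unicodedata.normalize("NFKC", text).lower()


def filter_words(words: Iterable[str], stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Assumed filtering rule: drop empty tokens and stop words."""
    return [w for w in words if w and w not in stop_words]


def clean(text: str) -> List[str]:
    """Normalize first, then split on whitespace and filter."""
    return filter_words(normalize_text(text).split())
```

Whitespace splitting is used only to keep the sketch short; Chinese news text would need a word segmenter at that step, and a rule such as traditional-to-simplified conversion would need a character mapping table that is not shown here.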
B18. The method according to any one of B10-B17, wherein the multiple machine learning algorithms include at least one of the following: a support vector machine algorithm, a convolutional neural network algorithm, and a recurrent neural network algorithm.
Claims (10)
1. A multi-level classification system based on news text information, comprising:
a training module, configured to train a preset training sample set with multiple machine learning algorithms for the categories at each level of the news text information, and to determine the number and type of classifiers corresponding to the categories at each level according to the training results;
a multi-level classification module, configured to configure a corresponding multi-level classification model according to the number and type of classifiers corresponding to the categories at each level as determined by the training module;
a result determining module, configured to input the obtained news text information to be classified into the multi-level classification model for classification, and to determine the output result of the multi-level classification model as the final classification result of the news text information to be classified.
2. The system according to claim 1, wherein the system further comprises:
an evaluation module, configured to evaluate the training results of the training module and to modify, according to the evaluation results, the number and type of classifiers corresponding to the categories at each level, the modification including: deleting, adding and/or replacing classifiers;
a model updating module, configured to update the configured multi-level classification model according to the modification made by the evaluation module.
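The evaluation-driven modification in claim 2 can be sketched as a function that revises a mapping from level to classifier type. The score table, the threshold `min_score`, the level names, and the rule "keep the best-scoring type, drop the level's classifier if nothing reaches the threshold" are assumptions chosen for illustration; the claim does not prescribe an evaluation metric or a decision rule.

```python
from typing import Dict


def revise_assignment(current: Dict[str, str],
                      scores: Dict[str, Dict[str, float]],
                      min_score: float = 0.8) -> Dict[str, str]:
    """Return a revised level -> classifier-type mapping.

    Levels without evaluation scores are left unchanged.
    """
    revised = dict(current)
    for level, candidates in scores.items():
        best_type = max(candidates, key=candidates.get)
        if candidates[best_type] < min_score:
            revised.pop(level, None)       # deletion: nothing is good enough
        else:
            revised[level] = best_type     # addition or replacement
    return revised


# Hypothetical current configuration and evaluation results.
current = {"level1": "svm", "level2": "svm", "level3": "rnn"}
scores = {
    "level1": {"svm": 0.91, "cnn": 0.88},
    "level2": {"svm": 0.70, "cnn": 0.86},
    "level3": {"rnn": 0.50},
}
revised = revise_assignment(current, scores)
```

In this example the first level keeps its classifier, the second level is switched to a different type, and the third level's classifier is removed. A model updating step could then write the revised mapping into the configuration and rebuild the model from it.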
3. The system according to claim 2, wherein the multi-level classification module is further configured to generate a configuration file corresponding to the multi-level classification model, and the model updating module is further configured to update the configuration file and to update the multi-level classification model according to the updated configuration file.
4. The system according to any one of claims 1-3, wherein the multi-level classification model is a tree-shaped classification model comprising multiple levels of node classifiers, and the tree-shaped classification model includes a plurality of node classifiers of different types.
5. The system according to claim 4, wherein the multi-level classification module is further configured to: whenever the output result of the current node classifier is obtained, determine the next-level node classifier of the current node classifier by querying the configuration file corresponding to the multi-level classification model;
wherein the configuration file stores a plurality of configuration items, each corresponding to one node classifier, and each configuration item includes: description information of the corresponding node classifier, the category type to which the node classifier applies, and/or the correspondence between each output result of the node classifier and its next-level node classifier.
6. The system according to any one of claims 1-5, wherein the training module is specifically configured to:
generate the training sample set from the obtained labeled data, extract the training feature words contained in the training sample set, and assign a corresponding weight to each extracted training feature word;
generate corresponding training feature vectors from the extracted training feature words and their weights, and obtain the training results and the corresponding classifiers from the training feature vectors.
7. The system according to any one of claims 1-6, wherein the result determining module is specifically configured to: pre-process the obtained news text information to be classified, and input the pre-processed news text information to be classified into the multi-level classification model for classification;
wherein the pre-processing includes: extracting the text feature words contained in the news text information to be classified, assigning a corresponding weight to each extracted text feature word, and generating a corresponding text feature vector from the extracted text feature words and their weights.
8. The system according to claim 7, wherein the result determining module is further configured, before the pre-processing, to: adjust the font in the news text information to be classified according to a preset font setting rule, and/or filter the words in the news text information to be classified according to a preset filtering rule.
9. The system according to any one of claims 1-8, wherein the multiple machine learning algorithms include at least one of the following: a support vector machine algorithm, a convolutional neural network algorithm, and a recurrent neural network algorithm.
10. A multi-level classification method based on news text information, comprising:
for the categories at each level of the news text information, training a preset training sample set with multiple machine learning algorithms, and determining the number and type of classifiers corresponding to the categories at each level according to the training results;
configuring a corresponding multi-level classification model according to the number and type of classifiers corresponding to the categories at each level;
inputting the obtained news text information to be classified into the multi-level classification model for classification, and determining the output result of the multi-level classification model as the final classification result of the news text information to be classified.
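The first step of the method, training several algorithms on the same sample set and letting the results decide which classifier type to use, can be sketched as follows. The two toy learners stand in for the algorithms named in the claims (support vector machine, convolutional and recurrent neural networks), which are too large to show here; the learners, the held-out accuracy criterion, and the sample data are all assumptions.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

Sample = Tuple[List[str], str]        # (tokens, label)
Model = Callable[[List[str]], str]


def train_majority(samples: List[Sample]) -> Model:
    """Baseline learner: always predict the most frequent training label."""
    top = Counter(label for _, label in samples).most_common(1)[0][0]
    return lambda tokens: top


def train_keyword(samples: List[Sample]) -> Model:
    """Toy learner: predict the label whose training words overlap most."""
    words: Dict[str, Counter] = defaultdict(Counter)
    for tokens, label in samples:
        words[label].update(tokens)

    def predict(tokens: List[str]) -> str:
        return max(words, key=lambda lab: sum(words[lab][t] for t in tokens))

    return predict


ALGORITHMS: Dict[str, Callable[[List[Sample]], Model]] = {
    "majority": train_majority,
    "keyword": train_keyword,
}


def accuracy(model: Model, samples: List[Sample]) -> float:
    return sum(model(tokens) == label for tokens, label in samples) / len(samples)


def select_classifier(train: List[Sample], held_out: List[Sample]) -> Tuple[str, Model]:
    """Train every candidate and keep the one with the best held-out accuracy."""
    trained = {name: fit(train) for name, fit in ALGORITHMS.items()}
    best = max(trained, key=lambda name: accuracy(trained[name], held_out))
    return best, trained[best]


# Hypothetical labeled samples for one category level.
train = [(["goal", "match"], "sports"), (["team", "match"], "sports"),
         (["bank", "rates"], "finance")]
held_out = [(["bank", "loan"], "finance"), (["match", "score"], "sports")]
best_name, best_model = select_classifier(train, held_out)
```

Running the same selection once per category level would yield the per-level classifier types from which the multi-level model is configured.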
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710103541.0A CN106909654B (en) | 2017-02-24 | 2017-02-24 | Multi-level classification system and method based on news text information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710103541.0A CN106909654B (en) | 2017-02-24 | 2017-02-24 | Multi-level classification system and method based on news text information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106909654A (en) | 2017-06-30 |
CN106909654B (en) | 2020-07-21 |
Family
ID=59208413
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710103541.0A, Active, CN106909654B (en) | Multi-level classification system and method based on news text information | 2017-02-24 | 2017-02-24 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106909654B (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101281521A (en) * | 2007-04-05 | 2008-10-08 | 中国科学院自动化研究所 | A sensitive web page filtering method and system based on multi-classifier fusion |
CN102117411A (en) * | 2009-12-30 | 2011-07-06 | 日电(中国)有限公司 | Method and system for constructing multi-level classification model |
CN102193928A (en) * | 2010-03-08 | 2011-09-21 | 三星电子(中国)研发中心 | Method for matching lightweight ontologies based on multilayer text categorizer |
CN103324758A (en) * | 2013-07-10 | 2013-09-25 | 苏州大学 | News classifying method and system |
CN103426007A (en) * | 2013-08-29 | 2013-12-04 | 人民搜索网络股份公司 | Machine learning classification method and device |
CN103778569A (en) * | 2014-02-13 | 2014-05-07 | 上海交通大学 | Distributed generation island detection method based on meta learning |
CN104978328A (en) * | 2014-04-03 | 2015-10-14 | 北京奇虎科技有限公司 | Hierarchical classifier obtaining method, text classification method, hierarchical classifier obtaining device and text classification device |
CN106453033A (en) * | 2016-08-31 | 2017-02-22 | 电子科技大学 | Multilevel Email classification method based on Email content |
Non-Patent Citations (2)
Title |
---|
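EASON.WXD: "Systematic Study of Machine Learning: Combining Multiple Classifiers", CSDN Blog, HTTPS://BLOG.CSDN.NET/APP_12062011/ARTICLE/DETAILS/50424776 * |
Wang Aihua et al.: "A combination model of multiple text classifiers based on Boost and belief functions", Computer Engineering and Applications * |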
Cited By (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107402994A (en) * | 2017-07-17 | 2017-11-28 | 广州特道信息科技有限公司 | A kind of sorting technique and device of multi-component system distinguishing hierarchy |
CN107562880A (en) * | 2017-09-01 | 2018-01-09 | 北京神州泰岳软件股份有限公司 | A kind of classification results screening technique and device based on multistage classifier |
CN110019776A (en) * | 2017-09-05 | 2019-07-16 | 腾讯科技(北京)有限公司 | Article classification method and device, storage medium |
CN110019776B (en) * | 2017-09-05 | 2023-04-28 | 腾讯科技(北京)有限公司 | Article classification method and device and storage medium |
CN108073677A (en) * | 2017-11-02 | 2018-05-25 | 中国科学院信息工程研究所 | A kind of multistage text multi-tag sorting technique and system based on artificial intelligence |
CN108073677B (en) * | 2017-11-02 | 2021-12-28 | 中国科学院信息工程研究所 | Multi-level text multi-label classification method and system based on artificial intelligence |
CN107943940A (en) * | 2017-11-23 | 2018-04-20 | 网易(杭州)网络有限公司 | Data processing method, medium, system and electronic equipment |
CN111480167A (en) * | 2017-12-20 | 2020-07-31 | 艾普维真股份有限公司 | Authenticated machine learning with multi-digit representation |
WO2019214133A1 (en) * | 2018-05-08 | 2019-11-14 | 华南理工大学 | Method for automatically categorizing large-scale customer complaint data |
CN110781292A (en) * | 2018-07-25 | 2020-02-11 | 百度在线网络技术(北京)有限公司 | Text data multi-level classification method and device, electronic equipment and storage medium |
CN109165380A (en) * | 2018-07-26 | 2019-01-08 | 咪咕数字传媒有限公司 | A kind of neural network model training method and device, text label determine method and device |
CN109165380B (en) * | 2018-07-26 | 2022-07-01 | 咪咕数字传媒有限公司 | Neural network model training method and device and text label determining method and device |
CN109189950B (en) * | 2018-09-03 | 2023-04-07 | 腾讯科技(深圳)有限公司 | Multimedia resource classification method and device, computer equipment and storage medium |
CN109189950A (en) * | 2018-09-03 | 2019-01-11 | 腾讯科技(深圳)有限公司 | Multimedia resource classification method, device, computer equipment and storage medium |
CN109471938B (en) * | 2018-10-11 | 2023-06-16 | 平安科技(深圳)有限公司 | Text classification method and terminal |
CN109471938A (en) * | 2018-10-11 | 2019-03-15 | 平安科技(深圳)有限公司 | A kind of file classification method and terminal |
CN109960725A (en) * | 2019-01-17 | 2019-07-02 | 平安科技(深圳)有限公司 | Text classification processing method, device and computer equipment based on emotion |
CN111832589A (en) * | 2019-04-22 | 2020-10-27 | 北京京东尚科信息技术有限公司 | Method and device for classifying multi-stage classified objects |
CN112052331A (en) * | 2019-06-06 | 2020-12-08 | 武汉Tcl集团工业研究院有限公司 | A method and terminal for processing text information |
CN110633366B (en) * | 2019-07-31 | 2022-12-16 | 国家计算机网络与信息安全管理中心 | Short text classification method, device and storage medium |
CN110633366A (en) * | 2019-07-31 | 2019-12-31 | 国家计算机网络与信息安全管理中心 | Short text classification method, device and storage medium |
CN110442725B (en) * | 2019-08-14 | 2022-02-25 | 科大讯飞股份有限公司 | Entity relationship extraction method and device |
CN110442725A (en) * | 2019-08-14 | 2019-11-12 | 科大讯飞股份有限公司 | Entity relation extraction method and device |
CN110597985A (en) * | 2019-08-15 | 2019-12-20 | 重庆金融资产交易所有限责任公司 | Data classification method, device, terminal and medium based on data analysis |
CN113139558A (en) * | 2020-01-16 | 2021-07-20 | 北京京东振世信息技术有限公司 | Method and apparatus for determining a multi-level classification label for an article |
CN113139558B (en) * | 2020-01-16 | 2023-09-05 | 北京京东振世信息技术有限公司 | Method and device for determining multi-stage classification labels of articles |
CN111625644A (en) * | 2020-04-14 | 2020-09-04 | 北京捷通华声科技股份有限公司 | Text classification method and device |
CN111625644B (en) * | 2020-04-14 | 2023-09-12 | 北京捷通华声科技股份有限公司 | Text classification method and device |
CN111753197A (en) * | 2020-06-18 | 2020-10-09 | 达而观信息科技(上海)有限公司 | News element extraction method and device, computer equipment and storage medium |
CN111753197B (en) * | 2020-06-18 | 2024-04-05 | 达观数据有限公司 | News element extraction method, device, computer equipment and storage medium |
WO2022116438A1 (en) * | 2020-12-01 | 2022-06-09 | 平安科技(深圳)有限公司 | Customer service violation quality inspection method and apparatus, computer device, and storage medium |
CN113254645A (en) * | 2021-06-08 | 2021-08-13 | 南京冰鉴信息科技有限公司 | Text classification method and device, computer equipment and readable storage medium |
CN114022813A (en) * | 2021-11-01 | 2022-02-08 | 北京达佳互联信息技术有限公司 | Video special effect type identification method and device, electronic equipment and storage medium |
WO2024139291A1 (en) * | 2022-12-30 | 2024-07-04 | 深圳云天励飞技术股份有限公司 | Multi-level classification model classification method, training method and apparatus, device, and medium |
CN116777400A (en) * | 2023-08-21 | 2023-09-19 | 江苏海外集团国际工程咨询有限公司 | Engineering consultation information whole-flow management system and method based on deep learning |
CN116777400B (en) * | 2023-08-21 | 2023-10-31 | 江苏海外集团国际工程咨询有限公司 | Engineering consultation information whole-flow management system and method based on deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN106909654B (en) | 2020-07-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106909654A (en) | A kind of multiclass classification system and method based on newsletter archive information | |
Halibas et al. | Application of text classification and clustering of Twitter data for business analytics | |
CN107577739B (en) | Semi-supervised domain word mining and classifying method and equipment | |
CN109872162B (en) | Wind control classification and identification method and system for processing user complaint information | |
KR20200007969A (en) | Information processing methods, terminals, and computer storage media | |
US20190340507A1 (en) | Classifying data | |
CN106934008B (en) | Junk information identification method and device | |
CN104361037B (en) | Microblogging sorting technique and device | |
CN110020176A (en) | A kind of resource recommendation method, electronic equipment and computer readable storage medium | |
CN102541958A (en) | Method, device and computer equipment for identifying short text category information | |
CN106997367A (en) | Sorting technique, sorter and the categorizing system of program file | |
Clavijo et al. | Adversarial domain adaptation to reduce sample bias of a high energy physics event classifier | |
CN107193915A (en) | A kind of company information sorting technique and device | |
CN109598307A (en) | Data screening method, apparatus, server and storage medium | |
WO2022244106A1 (en) | Data conversion device, data conversion method, and data conversion program | |
CN105512195A (en) | Auxiliary method for analyzing and making decisions of product FMECA report | |
CN102411592B (en) | Text classification method and device | |
CN106096413A (en) | A kind of malicious code detecting method based on multi-feature fusion and system | |
Krenn et al. | Predicting the Future of AI with AI: High-quality link prediction in an exponentially growing knowledge network | |
CN107861945A (en) | Finance data analysis method, application server and computer-readable recording medium | |
CN107679209B (en) | Classification expression generation method and device | |
CN109716660A (en) | Data compression device and method | |
CN106611189A (en) | Method for constructing integrated classifier of standardized multi-dimensional cost sensitive decision-making tree | |
CN110377741A (en) | File classification method, intelligent terminal and computer readable storage medium | |
CN101553815A (en) | Method and apparatus for classifying a content item |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP01 | Change in the name or title of a patent holder | Address after: 100089 710, 7 / F, building 1, zone 1, No.3, Xisanhuan North Road, Haidian District, Beijing Patentee after: Beijing time Ltd. Address before: 100089 710, 7 / F, building 1, zone 1, No.3, Xisanhuan North Road, Haidian District, Beijing Patentee before: BEIJING TIME Co.,Ltd. |