CN106446230A - Method for optimizing word classification in machine learning text - Google Patents
- Publication number
- CN106446230A (application CN201610881132.9A)
- Authority
- CN
- China
- Prior art keywords
- classification
- word
- text
- training
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/24155—Bayesian classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Probability & Statistics with Applications (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The invention relates to the fields of data processing and machine-learning classification, and in particular to a method for optimizing word classification in machine-learning text processing. Building on ordinary text classification, a feature selection rule engine based on regular expressions filters out user-defined, semantically related features; the user then defines classification categories in the feature-selected training data, and these features and categories are used to perform classification training with a naive Bayes model. After training, in the application stage, whenever the text whose words are to be classified contains a sentence matching the feature selection rule engine, classification is completed with the trained model. With this method, the model's capacity for word classification is not limited to the word data present in the training samples; the method can be applied to the optimization of machine-learning word classification in text and to functions derived from it.
Description
Technical field
The present invention relates to the fields of data processing and machine-learning classification, and in particular to a method for optimizing word classification in machine-learning text processing.
Background technology
With the rapid development of information technology, the volume of information in modern society is growing explosively. In the era of big data, making good use of massive data and mining the truly valuable information from it has become a focus of public attention. Machine learning plays an increasingly visible role in data mining: for natural-language processing problems such as text classification, machine learning replaces traditional hand-written rule methods with statistical methods, an approach that has proven both effective and more efficient. On the basis of text classification, it is often desirable to go further and classify the individual words and keywords within a text so as to extract the required keyword information; this places higher demands on machine-learning classification.
Summary of the invention
The technical problem solved by the present invention is to provide a method for optimizing word classification in machine-learning text processing, addressing the problem of classifying user-defined keywords that arises in current text classification.

The technical scheme by which the present invention solves the above problem is:

On the basis of text classification, a feature selection rule engine based on regular expressions filters out user-defined, semantically related features. After feature selection, the user defines the classification categories in the training data, and these features and categories are then used to perform classification training with a naive Bayes model. After training is complete, in the application stage, whenever the text whose words are to be classified contains a sentence matching the feature selection rule engine, classification is completed with the trained model.
The concrete steps of the method are:

S1, training-set creation: create training text data that meets the requirements of the actual task, building the training set from the real environment;

S2, data preprocessing: when the text to be classified involves Chinese, the text in the training set must be preprocessed with word segmentation, stop-word removal and similar steps;

S3, in the feature selection rule engine, input user-defined regular expressions as filter conditions; the rule engine filters the training set for matching text according to these regular-expression rules and places the words located at the regular-expression wildcard positions into the segmented-word queue;

S4, according to the feature-vector model generated by the feature selection rule engine, check whether each word in the segmented-word queue satisfies the regular expression associated with each feature, and compute the weights of each word's vector;

S5, from the feature vector of each word and the user-defined classification labels in the training set, use a naive Bayes classifier to compute the class-conditional probabilities and prior probabilities, completing the training of the classification model; after training, evaluate the model on a prepared test set, compare the test results with the ground truth to assess the classification model's performance, and propose possible modifications to optimize the model;

S6, classify the text data that actually requires word classification using the trained classifier.
A single wildcard in a user-defined expression of the feature selection rule engine represents one feature value. When the rule engine examines the input text, any sentence that matches an expression is extracted, and the word or word set at the wildcard position is entered into the classification queue as an object to be classified; the user can define the meaning of the feature value represented by each wildcard.

Before training the classification model, the user processes the training data and first defines the required classification categories; the words in the whole text that match the feature selection rule engine are divided into, for example, three classes A, B and C, and the final classification result of each individual word in each training text is labeled.

During model training, if a word satisfies the first rule, the feature value represented by that rule is recorded as 1, otherwise as 0.

After model training is complete, the test results are analyzed; at this stage, the feature weights are computed by combining factors such as word position and occurrence frequency as weighting indices.
The invention provides a method for optimizing word classification in machine-learning text processing that uses regular expressions to match semantics precisely; it can be applied to the optimization of word classification in text, its derived functions, and related applications within the machine-learning domain.
Brief description of the drawings
The present invention is further described below with reference to the accompanying drawing:

Fig. 1 is a schematic diagram of the classification process of the present invention.
Specific embodiment
As shown in Fig. 1, the present invention builds on the traditional machine-learning text classification method. A feature selection rule engine based on regular expressions filters out user-defined, semantically related features; after feature selection, the user defines the classification categories in the training data, and these features and categories are used to perform classification training with a naive Bayes model. After training, in the application stage, whenever the text whose words are to be classified contains a sentence matching the feature selection rule engine, the trained model completes the classification task.
The feature selector is based on regular expressions: each wildcard in a user-defined regular expression represents one feature value. For example, the "." in ".*[xyz]+" can represent a specific feature, with a user-assigned meaning such as "the word matched at this position is a country name" or "the word matched at this position is related to religion". A feature selection rule engine may contain one or more rules, and these rules form the basis of the feature-vector model. The rule engine selects the sentences in the text that match the regular expressions; the word sets located at the corresponding wildcard positions in those sentences are exactly the words to be classified. When Chinese word classification is involved, a Chinese word-segmentation tool and a stop-word removal pass are needed to normalize the words in the classification queue.
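As an illustration of how such a rule engine can work, here is a minimal Python sketch in which each rule is an ordinary regular expression and the "wildcard" is modeled as a capture group. The rule strings and feature names below are hypothetical examples for illustration only, not rules taken from the patent.

```python
import re

# Minimal sketch of a regex-based feature selector. Each rule pairs a
# hypothetical feature name with a pattern whose single capture group
# plays the role of the "wildcard": whatever it matches is a word to
# be classified.
RULES = [
    ("country_context", r"capital of (\w+)"),
    ("religion_context", r"(\w+) is a religion"),
]

def select_candidates(text):
    """Return (word, rule_name) pairs for every sentence that matches
    a rule -- the 'classification queue' of step S3."""
    queue = []
    for sentence in re.split(r"[.!?]\s*", text):
        for name, pattern in RULES:
            for m in re.finditer(pattern, sentence):
                queue.append((m.group(1), name))
    return queue

print(select_candidates("Paris is the capital of France. Buddhism is a religion."))
# → [('France', 'country_context'), ('Buddhism', 'religion_context')]
```

A production selector for Chinese text would run after word segmentation, but the queue-building logic would be the same.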
A feature-vector model is then established according to whether these words satisfy the rules specified in the feature selection rule engine. The dimensionality of the feature-vector model equals the number of features in the rules, expressed as {feature 1, feature 2, ..., feature n}; if a word satisfies the first rule, the feature value represented by that rule is recorded as 1, otherwise as 0. Any word to be classified can therefore be represented by a feature vector of the form {1, 0, 0, 1, ...}. Once the feature vector of every word in the training set has been obtained, classification training proceeds with the naive Bayes model.
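The binary {1, 0, 0, 1, ...} vectors can be sketched as follows; the rule names and the sample queue are hypothetical stand-ins for output of the feature selector.

```python
# Minimal sketch of the feature vectors: one dimension per rule, set to 1
# when the word was extracted by that rule. RULE_NAMES and the sample
# queue below are hypothetical.
RULE_NAMES = ["country_context", "religion_context", "person_context"]

def feature_vector(word, queue):
    """queue: (word, rule_name) pairs produced by the feature selector."""
    matched = {rule for w, rule in queue if w == word}
    return [1 if name in matched else 0 for name in RULE_NAMES]

queue = [("France", "country_context"), ("Buddhism", "religion_context")]
print(feature_vector("France", queue))
# → [1, 0, 0]
```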
Here it is first assumed, per the naive Bayes theorem, that the features are mutually independent. From the prepared training set, the feature vector of each word to be classified is established together with its manually defined class. That is, before training the classification model, the user processes the training data and defines the required categories; for example, the word set matching the feature selection rule engine across the whole text is divided into three classes A, B and C, and the class of every word to be classified in each training text is labeled by hand. Training the naive Bayes model then yields the prior probability of each class and the class-conditional probability of each feature, at which point the training of the model is complete.
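Training then amounts to estimating a class prior and a per-feature class-conditional probability from the labeled vectors. Below is a minimal Bernoulli naive Bayes sketch; the Laplace smoothing and the tiny training set are assumptions added for robustness and illustration, not details given in the text.

```python
import math
from collections import Counter, defaultdict

def train_nb(samples):
    """samples: (feature_vector, label) pairs.
    Returns class priors and P(feature_j = 1 | class), Laplace-smoothed."""
    counts = Counter(label for _, label in samples)
    priors = {c: n / len(samples) for c, n in counts.items()}
    dims = len(samples[0][0])
    cond = defaultdict(list)
    for c in counts:
        vecs = [v for v, label in samples if label == c]
        for j in range(dims):
            ones = sum(v[j] for v in vecs)
            cond[c].append((ones + 1) / (len(vecs) + 2))
    return priors, dict(cond)

def classify(vec, priors, cond):
    """Maximum a-posteriori class under the independence assumption."""
    def log_post(c):
        s = math.log(priors[c])
        for j, x in enumerate(vec):
            p = cond[c][j]
            s += math.log(p if x else 1 - p)
        return s
    return max(priors, key=log_post)

# Hypothetical labeled feature vectors for classes A and B:
train = [([1, 0], "A"), ([1, 0], "A"), ([0, 1], "B")]
priors, cond = train_nb(train)
print(classify([1, 0], priors, cond))
# → A
```

Log probabilities are summed instead of multiplying raw probabilities so that long feature vectors do not underflow.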
After model training completes, the model must be evaluated on the test set. The test results are then analyzed so as to assess the model's performance and, to the extent possible, optimize the model. For example, the feature-weight computation may no longer use 0/1 to indicate merely whether a regular expression is satisfied; instead, factors such as word position and occurrence frequency may be combined as weighting indices in the computation.
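The text only names word position and occurrence frequency as candidate factors; one possible way of blending them, entirely hypothetical since the patent specifies no formula, could look like this:

```python
# Hypothetical refinement of the 0/1 weights: blend occurrence frequency
# with a position score (earlier words weigh more). The 50/50 blend is an
# illustrative assumption, not a formula from the patent.
def weighted_feature(matched, count, position, total_words):
    if not matched:
        return 0.0
    frequency = count / total_words              # occurrence frequency
    position_score = 1 - position / total_words  # earlier => closer to 1
    return 0.5 * frequency + 0.5 * position_score

print(weighted_feature(True, count=3, position=10, total_words=100))
```

Any such real-valued weights would require swapping the Bernoulli likelihood for one that handles continuous features (e.g. a Gaussian naive Bayes), which the patent leaves open.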
In an actual classification task, the feature selection rule engine examines the input text; whenever a sentence matches an expression, that sentence is extracted and the word or word set at the wildcard position is entered into the classification queue as an object to be classified. When the words to be classified involve Chinese, a general-purpose Chinese word-segmentation tool serves as the solution. After segmentation, stop words can be removed as needed, the feature vector of each word is built in the manner described above, and the trained classifier model then completes the classification work.
The concrete steps of the scheme described above can be as follows:

S1, training-set creation: create training text data that meets the requirements of the actual task; the training set can be built from the real environment. In the training-set texts, the words to be classified already carry manually assigned classification results. Note that the training set must be created around the one or several regular-expression rules that will be referenced in the feature selection rule engine to generate the corresponding feature items.

S2, data preprocessing: when the text to be classified involves Chinese, the text in the training set must be preprocessed with word segmentation and stop-word removal. Chinese word segmentation can use a currently common tool such as SCWS or Jcseg. Stop-word handling is fairly simple: a standard stop-word list serves as the basis for removing the corresponding words from the text. Note, however, that when a rule in the feature selection rule engine uses a stop word, that stop word is not removed.
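The stop-word exception just noted, that a stop word used inside a rule must survive preprocessing, can be sketched as follows. The stop-word list, rule pattern, and pre-segmented token list are hypothetical; a real pipeline would first segment Chinese text with a tool such as SCWS or Jcseg.

```python
import re

STOPWORDS = {"the", "of", "is", "a"}        # hypothetical stop-word list
RULE_PATTERNS = [r"capital of (\w+)"]       # hypothetical rule using "of"

def remove_stopwords(tokens):
    """Drop stop words, except those that literally occur inside a rule
    pattern: removing them would make the text stop matching the rule."""
    rule_words = set()
    for pattern in RULE_PATTERNS:
        rule_words.update(re.findall(r"[^\W\d_]+", pattern))
    return [t for t in tokens if t not in STOPWORDS or t in rule_words]

print(remove_stopwords(["Paris", "is", "the", "capital", "of", "France"]))
# → ['Paris', 'capital', 'of', 'France']
```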
S3, in the feature selection rule engine, input user-defined regular expressions as filter conditions; the rule engine filters the training set for matching text according to the regular-expression rules and places the words at the regular-expression wildcard positions into the segmented-word queue.

S4, according to the feature-vector model generated by the feature selection rule engine, check whether each word in the segmented-word queue satisfies the regular expression associated with each feature, and compute the weights of each word's vector. Concretely: when the text containing a word satisfies regular expression 1 in the feature selection rule engine, the weight of the feature-vector component corresponding to that expression is set to 1, otherwise to 0. The feature vector of each word is thus expressed in a form like {1, 0, 0, 1, ...}.
S5, from the feature vector of each word and the user-defined classification labels in the training set, use a naive Bayes classifier to compute the class-conditional probabilities and prior probabilities, completing the training of the classification model. After training, evaluate the model on a prepared test set; comparing the test results with the ground truth yields a performance assessment of the classification model, from which possible modifications are proposed to optimize the model.

S6, classify the text data that actually requires word classification using the trained classifier. Note that the text to be classified must contain sentences matching the rules in the feature selection rule engine, and the classes to be assigned must be consistent with the classes defined for the rule engine; otherwise, the rules and classification categories must be redefined and the model retrained before the new classification task can be completed.
The embodiment described above is one example of the present invention, not all of them. Based on this example, any other embodiment obtained by a person of ordinary skill in the art without creative effort falls within the scope of protection of the present invention.
Claims (7)
1. A method for optimizing word classification in machine-learning text processing, characterized in that: on the basis of text classification, a feature selection rule engine based on regular expressions filters out user-defined, semantically related features; after feature selection, the user defines the classification categories in the training data, and these features and categories are used to perform classification training with a naive Bayes model; after training, in the application stage, whenever the text whose words are to be classified contains a sentence matching the feature selection rule engine, classification is completed with the trained model.
2. The method according to claim 1, characterized in that the concrete steps of the method are:

S1, training-set creation: create training text data that meets the requirements of the actual task, building the training set from the real environment;

S2, data preprocessing: when the text to be classified involves Chinese, the text in the training set must be preprocessed with word segmentation and stop-word removal;

S3, in the feature selection rule engine, input user-defined regular expressions as filter conditions; the rule engine filters the training set for matching text according to the regular-expression rules and places the words at the regular-expression wildcard positions into the segmented-word queue;

S4, according to the feature-vector model generated by the feature selection rule engine, check whether each word in the segmented-word queue satisfies the regular expression associated with each feature, and compute the weights of each word's vector;

S5, from the feature vector of each word and the user-defined classification labels in the training set, use a naive Bayes classifier to compute the class-conditional probabilities and prior probabilities, completing the training of the classification model; after training, evaluate the model on a prepared test set, compare the test results with the ground truth to assess the classification model's performance, and propose possible modifications to optimize the model;

S6, classify the text data that actually requires word classification using the trained classifier.
3. The method according to claim 1, characterized in that: a single wildcard in a user-defined expression of the feature selection rule engine represents one feature value; when the rule engine examines the input text, any sentence that matches an expression is extracted, and the word or word set at the wildcard position is entered into the classification queue as an object to be classified; the user can define the meaning of the feature value represented by each wildcard.
4. The method according to claim 2, characterized in that: a single wildcard in a user-defined expression of the feature selection rule engine represents one feature value; when the rule engine examines the input text, any sentence that matches an expression is extracted, and the word or word set at the wildcard position is entered into the classification queue as an object to be classified; the user can define the meaning of the feature value represented by each wildcard.
5. The method according to any one of claims 1 to 4, characterized in that: before training the classification model, the user processes the training data and first defines the required classification categories; the words in the whole text that match the feature selection rule engine are divided into, for example, three classes A, B and C, and the final classification result of each individual word in each training text is labeled.
6. The method according to any one of claims 1 to 4, characterized in that:

during model training, if a word satisfies the first rule, the feature value represented by that rule is recorded as 1, otherwise as 0;

after model training is complete, the test results are analyzed; at this stage, the feature weights are computed by combining factors such as word position and occurrence frequency as weighting indices.
7. The method according to claim 5, characterized in that:

during model training, if a word satisfies the first rule, the feature value represented by that rule is recorded as 1, otherwise as 0;

after model training is complete, the test results are analyzed; at this stage, the feature weights are computed by combining factors such as word position and occurrence frequency as weighting indices.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610881132.9A CN106446230A (en) | 2016-10-08 | 2016-10-08 | Method for optimizing word classification in machine learning text |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106446230A true CN106446230A (en) | 2017-02-22 |
Family
ID=58172086
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610881132.9A Pending CN106446230A (en) | 2016-10-08 | 2016-10-08 | Method for optimizing word classification in machine learning text |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106446230A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103559174A (en) * | 2013-09-30 | 2014-02-05 | 东软集团股份有限公司 | Semantic emotion classification characteristic value extraction method and system |
US20140279761A1 (en) * | 2013-03-15 | 2014-09-18 | Konstantinos (Constantin) F. Aliferis | Document Coding Computer System and Method With Integrated Quality Assurance |
CN105808524A (en) * | 2016-03-11 | 2016-07-27 | 江苏畅远信息科技有限公司 | Patent document abstract-based automatic patent classification method |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107368464B (en) * | 2017-07-28 | 2020-07-10 | 深圳数众科技有限公司 | Method and device for acquiring bidding product information |
CN107368464A (en) * | 2017-07-28 | 2017-11-21 | 深圳数众科技有限公司 | A kind of method and device for obtaining bid product information |
CN107679734A (en) * | 2017-09-27 | 2018-02-09 | 成都四方伟业软件股份有限公司 | It is a kind of to be used for the method and system without label data classification prediction |
CN108470116A (en) * | 2018-03-03 | 2018-08-31 | 淄博职业学院 | The personal identification method and device of a kind of computer system and its user |
CN108491390A (en) * | 2018-03-28 | 2018-09-04 | 江苏满运软件科技有限公司 | A kind of main line logistics goods title automatic recognition classification method |
CN108519978A (en) * | 2018-04-10 | 2018-09-11 | 成都信息工程大学 | A kind of Chinese document segmenting method based on Active Learning |
CN109144999A (en) * | 2018-08-02 | 2019-01-04 | 东软集团股份有限公司 | A kind of data positioning method, device and storage medium, program product |
CN109144999B (en) * | 2018-08-02 | 2021-06-08 | 东软集团股份有限公司 | Data positioning method, device, storage medium and program product |
CN109409533A (en) * | 2018-09-28 | 2019-03-01 | 深圳乐信软件技术有限公司 | A kind of generation method of machine learning model, device, equipment and storage medium |
CN109508370A (en) * | 2018-09-28 | 2019-03-22 | 北京百度网讯科技有限公司 | Opinions Extraction method, equipment and storage medium |
CN109409533B (en) * | 2018-09-28 | 2021-07-27 | 深圳乐信软件技术有限公司 | Method, device, equipment and storage medium for generating machine learning model |
CN110968687B (en) * | 2018-09-30 | 2023-06-16 | 北京国双科技有限公司 | Method and device for classifying text |
CN110968687A (en) * | 2018-09-30 | 2020-04-07 | 北京国双科技有限公司 | Method and device for classifying texts |
CN111428518A (en) * | 2019-01-09 | 2020-07-17 | 科大讯飞股份有限公司 | Low-frequency word translation method and device |
CN111428518B (en) * | 2019-01-09 | 2023-11-21 | 科大讯飞股份有限公司 | Low-frequency word translation method and device |
CN110457566A (en) * | 2019-08-15 | 2019-11-15 | 腾讯科技(武汉)有限公司 | Method, device, electronic equipment and storage medium |
WO2021063089A1 (en) * | 2019-09-30 | 2021-04-08 | 华为技术有限公司 | Rule matching method, rule matching apparatus, storage medium and electronic device |
CN112579733A (en) * | 2019-09-30 | 2021-03-30 | 华为技术有限公司 | Rule matching method, rule matching device, storage medium and electronic equipment |
CN112579733B (en) * | 2019-09-30 | 2023-10-20 | 华为技术有限公司 | Rule matching method, rule matching device, storage medium and electronic equipment |
CN113742479A (en) * | 2020-05-29 | 2021-12-03 | 北京沃东天骏信息技术有限公司 | Method and device for screening target text |
CN113536785A (en) * | 2021-06-15 | 2021-10-22 | 合肥讯飞数码科技有限公司 | Text recommendation method, intelligent terminal and computer readable storage medium |
CN117556049A (en) * | 2024-01-10 | 2024-02-13 | 杭州光云科技股份有限公司 | Text classification method of regular expression generated based on large language model |
CN117556049B (en) * | 2024-01-10 | 2024-05-17 | 杭州光云科技股份有限公司 | Text classification method of regular expression generated based on large language model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106446230A (en) | Method for optimizing word classification in machine learning text | |
CN108573047A (en) | A kind of training method and device of Module of Automatic Chinese Documents Classification | |
CN108199951A (en) | A kind of rubbish mail filtering method based on more algorithm fusion models | |
CN106776538A (en) | The information extracting method of enterprise's noncanonical format document | |
CN102622373B (en) | Statistic text classification system and statistic text classification method based on term frequency-inverse document frequency (TF*IDF) algorithm | |
CN107944480A (en) | A kind of enterprises ' industry sorting technique | |
CN107193801A (en) | A kind of short text characteristic optimization and sentiment analysis method based on depth belief network | |
CN106021410A (en) | Source code annotation quality evaluation method based on machine learning | |
CN106844424A (en) | A kind of file classification method based on LDA | |
CN107301171A (en) | A kind of text emotion analysis method and system learnt based on sentiment dictionary | |
CN107291723A (en) | The method and apparatus of web page text classification, the method and apparatus of web page text identification | |
CN104834940A (en) | Medical image inspection disease classification method based on support vector machine (SVM) | |
CN106202561A (en) | Digitized contingency management case library construction methods based on the big data of text and device | |
CN105373606A (en) | Unbalanced data sampling method in improved C4.5 decision tree algorithm | |
CN106294783A (en) | A kind of video recommendation method and device | |
CN103995876A (en) | Text classification method based on chi square statistics and SMO algorithm | |
CN101876987A (en) | Overlapped-between-clusters-oriented method for classifying two types of texts | |
KR20120109943A (en) | Emotion classification method for analysis of emotion immanent in sentence | |
CN109670039A (en) | Sentiment analysis method is commented on based on the semi-supervised electric business of tripartite graph and clustering | |
CN106897290B (en) | Method and device for establishing keyword model | |
CN102541838A (en) | Method and equipment for optimizing emotional classifier | |
CN110008309A (en) | A kind of short phrase picking method and device | |
CN103593431A (en) | Internet public opinion analyzing method and device | |
CN108280164A (en) | A kind of short text filtering and sorting technique based on classification related words | |
CN109145108A (en) | Classifier training method, classification method, device and computer equipment is laminated in text |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20170222 |