CN107292348A - Bagging_BSJ short text classification method - Google Patents

Bagging_BSJ short text classification method

Info

Publication number
CN107292348A
CN107292348A (application CN201710554325.8A)
Authority
CN
China
Prior art keywords
short text
term
feature
bagging
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710554325.8A
Other languages
Chinese (zh)
Inventor
赵德新
张德干
常智
杜娜娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University of Technology
Original Assignee
Tianjin University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University of Technology
Priority to CN201710554325.8A
Publication of CN107292348A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/285 Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

A Bagging_BSJ short text classification method. Short texts are highly sparse, produced in real time, and often non-standard in form; when traditional text classification algorithms are applied to short text classification they are strongly affected by singular data and rarely achieve good results. Targeting the high sparsity and real-time nature of short texts, the method of the invention proposes an ensemble-based short text classification method. The method applies the Bagging ensemble idea: it performs semantic feature expansion on the short texts and then combines the Bayesian algorithm, the support vector machine algorithm and the J48 algorithm to classify the semantically expanded short texts, obtaining a better classification result. The Bagging_BSJ method proposed here improves accuracy by 12%, recall by 28%, and the F value by 20%.

Description

Bagging_BSJ short text classification method
Technical field
The invention belongs to the technical field at the intersection of computer applications and natural language processing.
Background technology
Short text classification refers to techniques for classifying texts of about 160 words or less that have sparse characteristics. In general, short text messages arrive in real time, use concise language, and contain a great deal of noise. For such extremely sparse texts, traditional text classification techniques, which judge the similarity between documents from the frequency of terms within a document and the number of terms two documents share, achieve low accuracy. Improving the accuracy and recall of classification algorithms in the face of the real-time, concise and noisy character peculiar to short texts therefore has important applications.
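As a minimal illustration of the sparsity problem just described (a sketch added for clarity, assuming simple whitespace tokenisation; it is not part of the original filing): two short texts with the same meaning can share no terms at all, in which case any similarity measure built purely on term frequencies and shared-term counts scores them as unrelated.

```python
from collections import Counter
import math

def cosine_sim(a: str, b: str) -> float:
    """Cosine similarity over raw term-frequency vectors."""
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ta[w] * tb[w] for w in set(ta) & set(tb))
    na = math.sqrt(sum(v * v for v in ta.values()))
    nb = math.sqrt(sum(v * v for v in tb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Same topic, zero shared terms: the similarity collapses to 0.
print(cosine_sim("movie tonight?", "film this evening?"))  # 0.0
```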
At present, the classification algorithms commonly used for short texts fall into two broad classes: one improves part of the classification process through rule-based refinements; the other expands the content of the short text with external semantic information and thereby improves classification performance.
Rule-based improvement methods mainly process the short text data set itself, proposing refinements at several stages such as feature extraction, text representation and classifier construction. However, because the data are sparse, classifiers built on local features routinely run into the semantic-gap problem when representing short texts and cannot effectively distinguish the semantic information of different short texts.
Classification algorithms that expand short texts with semantic information mainly rely on textual context or an external semantic knowledge base, enriching the content of the short text according to certain rules. Such algorithms alleviate the impact of data sparsity to some extent, but as the amount of training data grows the benefit brought by the auxiliary data gradually weakens and classification performance declines. Targeting the sparsity of short text features, the present invention uses Wikipedia as the external semantic knowledge base to expand short text features.
Wikipedia contains a large and constantly growing set of concepts, which provides a very effective platform for expanding the content of short texts. Semantic similarity computation here is a quantitative model of semantic relatedness based on Wikipedia article text and link-structure information: the model computes the semantic similarity between candidate expansion features and topic features and selects the features with the highest similarity as expansion features. This process is referred to as semantic expansion.
The main process of Wikipedia-based short text feature expansion is as follows: (1) preprocess the given short text data to obtain the corresponding term vector; (2) map each feature term in the vector (called a topic feature term) to its corresponding Wikipedia topic page, take the text of the abstract section of that page, and apply word segmentation and denoising preprocessing to obtain the Wikipedia expansion feature vector of each topic feature term; (3) quantify the semantic relations with the WLA (Wikipedia Links and Abstract) algorithm, i.e. quantitatively describe the degree of semantic association between a given term and its candidate expansion terms; because the candidate expansion terms in the expansion vocabulary are related to the topic term to different degrees, their ability to supplement the semantic information of the topic feature also differs, so the degree of semantic association between the given feature term and the candidate expansion terms obtained in step (2) is described quantitatively; (4) merge and aggregate the expansion terms of all topic features of the short text; the resulting vector is the feature vector of the short text expanded with Wikipedia text information.
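The pipeline can be sketched as follows, under stated assumptions: fetch_abstract and wla_sim are hypothetical stand-ins for the Wikipedia page lookup (with disambiguation and redirect handling) and the WLA relatedness score defined below, and the tokenizer is a toy replacement for the word segmentation and stop-word preprocessing.

```python
import re

STOP_WORDS = {"the", "a", "of", "in", "and", "is", "to"}  # toy stop list

def preprocess(text):
    """Word segmentation plus denoising: lowercase, tokenise, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def expand_short_text(text, fetch_abstract, wla_sim, top_k=5):
    """Steps (1)-(4): preprocess, map each topic term to its Wikipedia
    abstract, score candidates with WLA, merge into one expanded vector."""
    expanded = {}
    for t in preprocess(text):                                # step (1)
        candidates = set(preprocess(fetch_abstract(t)))       # step (2)
        scored = {c: wla_sim(t, c) for c in candidates}       # step (3)
        best = sorted(scored.items(), key=lambda kv: -kv[1])[:top_k]
        for c, r in best:                                     # step (4): merge
            expanded[c] = max(r, expanded.get(c, 0.0))
    return sorted(expanded.items(), key=lambda kv: -kv[1])
```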
For processing short text data sets, the classical text classification models are naive Bayes (NB), the support vector machine (SVM) and the decision tree (J48) algorithm. The naive Bayes model originates in classical mathematical theory; it has a solid mathematical foundation and stable classification performance. At the same time, NB requires very few estimated parameters, is relatively insensitive to missing data, and the algorithm itself is simple. NB assumes that attributes are mutually independent; this assumption often does not hold in practice, which affects the correctness of NB classification. When the number of attributes is large or the correlation between attributes is strong, the classification performance of NB falls below that of decision-tree models, whereas when attribute correlation is weak NB performs best. The support vector machine is a supervised learning model commonly used for pattern recognition, classification and regression analysis. The decision tree (J48) algorithm is a method for approximating discrete-valued functions and a typical classification technique: it first processes the data, generates readable rules and a decision tree with an inductive algorithm, and then uses the tree to analyse new data; in essence a decision tree classifies data through a series of rules. The algorithms proposed in the studies above still have many shortcomings and handle short texts poorly; for example, the feature vectors obtained after Wikipedia expansion of short texts may cause a dimension-disaster problem during classification. A single classifier cannot obtain a good classification result: for instance, the term-independence assumption of the naive Bayes algorithm is weak, and the J48 classification algorithm is strongly affected by singular data. We use an ensemble learning algorithm to address these problems.
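For illustration, a minimal sketch of the three classical base models on bag-of-words features, assuming scikit-learn as a stand-in for Weka (DecisionTreeClassifier replaces J48, which implements the C4.5 algorithm, and LinearSVC replaces the SVM):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

texts = ["great movie tonight", "stock prices fell",
         "the film was superb", "the market crashed today"]
labels = ["entertainment", "finance", "entertainment", "finance"]

X = CountVectorizer().fit_transform(texts)   # sparse term-frequency vectors
for clf in (MultinomialNB(), LinearSVC(), DecisionTreeClassifier()):
    clf.fit(X, labels)
    print(type(clf).__name__, clf.predict(X))
```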
The basic idea of ensemble learning is: when classifying a new instance, combine several independently trained classifiers and merge the results of the individual classifiers with certain weights to obtain the final ensemble classification result. The literature shows that the performance of an ensemble classifier is better than the classification performance of a single classifier.
At present, ensemble learning classification algorithms divide broadly into two classes. One class, represented by the Bagging algorithm, generates component classifiers in parallel and requires the dependence between them to be relatively weak; the other, represented by the Boosting algorithm, generates them serially, with strong dependence between individuals. In practice the Boosting algorithm suffers from overfitting, which can make its classification performance weaker than that of a single classifier. The present invention adopts the idea of the Bagging algorithm: given a training set and a group of weak classifiers, draw M samples from the training set with replacement to form a training subset, and repeat the draw N times to obtain N training subsets. Training N classifiers on these N subsets yields N prediction functions; the sample set is then predicted and the final result obtained by a majority voting mechanism.
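One consequence of drawing M samples with replacement is that each training subset contains, on average, only about 1 - 1/e, roughly 63.2%, distinct samples, which is what keeps the component classifiers weakly dependent. A two-line check:

```python
import numpy as np

M = 1000
rng = np.random.default_rng(0)
subset = rng.integers(0, M, size=M)   # one bootstrap draw of size M
print(len(np.unique(subset)) / M)     # ~0.632 distinct samples on average
```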
Recall (Recall Rate, also called the recall ratio) is the ratio of the number of relevant documents retrieved to the total number of relevant documents in the document library; it measures the completeness of a retrieval system. Precision (accuracy) is the ratio of the number of relevant documents retrieved to the total number of documents retrieved; it measures the exactness of a retrieval system. In general terms: precision = (relevant items retrieved / total items retrieved) * 100%. Recall and precision are two metrics widely used in information retrieval and statistical classification to evaluate the quality of results. The F value combines the precision and recall indices into a single overall measure; for assessing classification quality it is more convincing than precision or recall used alone. We use the F1 value (i.e. F_beta with beta = 1) as the comprehensive evaluation index.
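Written out explicitly (standard definitions consistent with the description above):

```latex
P = \frac{\text{relevant retrieved}}{\text{total retrieved}}, \qquad
R = \frac{\text{relevant retrieved}}{\text{total relevant}}, \qquad
F_\beta = \frac{(1+\beta^2)\,P\,R}{\beta^2 P + R}, \qquad
F_1 = \frac{2PR}{P + R}
```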
The content of the invention
The present invention aims to solve the problem of low short text classification accuracy by providing a Bagging_BSJ short text classification method that improves the accuracy, recall and F value of short text classification.
Targeting the high sparsity, real-time nature and non-standard form of short texts, the present invention uses Wikipedia as a knowledge base and proposes performing WLA semantic expansion on short texts; on the basis of the Bayesian algorithm, the support vector machine algorithm and the J48 algorithm, combined with the Bagging ensemble idea, it proposes the Bagging_BSJ integrated short text classification algorithm. Applied to short text classification, the method performs semantic feature expansion on the short texts and classifies the semantically expanded texts with the Bagging_BSJ algorithm, obtaining better classification accuracy, recall and F values than conventional methods.
Technical scheme
A Bagging_BSJ short text classification method, mainly comprising the following key steps:
1. WLA short text semantic feature expansion based on Wikipedia features;
1.1. Feature extraction. For a given feature term, map the feature term through disambiguation and redirection techniques to its corresponding Wikipedia page, extract the page text, and apply word segmentation, stop-word removal and other denoising to that text, obtaining a feature vector composed of terms; the elements of this vector are the candidate expansion terms of the feature term;
1.2. Semantic relation quantification. Semantic relations are computed with our proposed WLA (Wikipedia Links and Abstract) algorithm, which quantitatively describes the degree of semantic association between the given feature term and the candidate expansion terms obtained in step 1.1;
1.3. Feature expansion set construction. After related-feature extraction and term semantic-relation quantification, build for each given feature term the corresponding feature expansion term vector C_t = {(c_1, r_1), (c_2, r_2), ..., (c_k, r_k)}, where c_i (i = 1, 2, ..., k) is a candidate expansion term related to the topic feature term t and r_i (i = 1, 2, ..., k) is the semantic similarity between c_i and t; these term vectors serve as samples in the subsequent short text classification.
2. The Bagging_BSJ short text classification algorithm based on ensemble learning;
2.1. The training set S = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_n)} contains m articles in n classes, where x_i is a training sample and y_j is the class label corresponding to x_i;
2.2. Using sampling with replacement, draw Z_1, Z_2 and Z_3 training sample subsets from the training set S, each subset containing g samples;
2.3. Train Bagging classifiers with naive Bayes as the base classifier on the first Z_1 subsets, and denote the trained models H_1^NB, ..., H_{Z_1}^NB. Similarly, train the middle Z_2 subsets and the last Z_3 subsets with the support vector machine and J48 as base classifiers respectively, and denote the resulting classification models H_1^SVM, ..., H_{Z_2}^SVM and H_1^J48, ..., H_{Z_3}^J48. Training in this way yields Z_1 + Z_2 + Z_3 classifiers;
2.4. The classification process applies the classification models H_i, i = 1, 2, ..., Z_1 + Z_2 + Z_3, obtained from the training in step 2.3 to the samples to be classified (i.e. the new sample data) and integrates the classification results with a voting algorithm, thereby deciding the class of a new sample x, i.e.:

H(x) = arg max_y Σ_{i=1}^{Z_1+Z_2+Z_3} I(H_i(x) = y)

where I(·) is the indicator function.
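A compact sketch of steps 2.1-2.4 under stated assumptions: scikit-learn stands in for Weka, DecisionTreeClassifier for J48, LinearSVC for the support vector machine, and the small Z_1 = Z_2 = Z_3 = 3 is only to keep a toy run fast (the experiments below use 15):

```python
import numpy as np
from collections import Counter
from sklearn.base import clone
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

def train_bagging_bsj(X, y, z=(3, 3, 3), g=None, seed=0):
    """Steps 2.1-2.3: draw z[0]+z[1]+z[2] bootstrap subsets of size g
    (with replacement) and train each group on its own base classifier."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    g = g or len(y)
    bases = (MultinomialNB(), LinearSVC(), DecisionTreeClassifier())
    models = []
    for base, z_i in zip(bases, z):
        for _ in range(z_i):
            idx = rng.integers(0, len(y), size=g)  # sampling with replacement
            models.append(clone(base).fit(X[idx], y[idx]))
    return models

def predict_bagging_bsj(models, X):
    """Step 2.4: plurality vote over all Z1+Z2+Z3 component classifiers."""
    votes = np.array([m.predict(X) for m in models])
    return [Counter(votes[:, j]).most_common(1)[0][0]
            for j in range(votes.shape[1])]
```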
The formulas of the WLA (semantic relatedness) algorithm used in the semantic-relation quantification of step 1.2 are derived as follows:
First, the semantic relatedness of two entries is computed from the abstract sections of the Wikipedia topic pages corresponding to the two terms; denote this abstract-based similarity Sim_a(a, b). Here a and b are the two candidate topics, N_1 and N_2 are the word counts of the word groups T_1 and T_2 taken from the two abstracts, q is the number of words the two groups share, and MAX(N_1, N_2)/MIN(N_1, N_2) acts as a moderating parameter. The weight w_i of the i-th shared word in word group T_1 is computed from tf_i, the frequency with which the i-th word occurs in the document, normalised by V, the sum of the frequencies of the words shared by T_1 and T_2.
Secondly, semantic relatedness is computed using the in-link and out-link information of the Wikipedia topic pages pointed to by the terms. The in-link measure proposed by David Milne is computed as

Sim_in(a, b) = 1 - (log(max(|A|, |B|)) - log(|A ∩ B|)) / (log(W) - log(min(|A|, |B|)))

where a and b are the two candidate topics, A and B are the sets of in-links of the corresponding topic pages, and W is the number of Wikipedia topic pages. Because a Wikipedia topic page also has an out-link structure, the out-links are taken into account as well: Sim_out(a, b), the semantic relatedness computed from the out-links of the topic pages, is computed in the same way as Sim_in(a, b). The final link-structure semantic relatedness is

Sim_l(a, b) = β Sim_out(a, b) + (1 - β) Sim_in(a, b)

Combining the abstract-based and link-based measures, the WLA formula is

WLA_sim(a, b) = α Sim_a(a, b) + (1 - α) Sim_l(a, b)
In the present invention, α and β are the weights given to the Wikipedia text information and the link-structure information of a term, respectively. Taking α = 0.7 and β = 0.3 gives

WLA_sim(a, b) = 0.7 Sim_a(a, b) + 0.3 Sim_l(a, b)

where Sim_l(a, b) = 0.7 Sim_in(a, b) + 0.3 Sim_out(a, b).
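A sketch of the WLA combination under stated assumptions: milne_sim implements the in-link measure as reconstructed above and is reused unchanged for the out-links, while the abstract-based Sim_a is passed in precomputed, since the description gives only its ingredients:

```python
import math

def milne_sim(links_a, links_b, W):
    """Milne's link measure from two link sets and the total page count W."""
    inter = len(links_a & links_b)
    if inter == 0:
        return 0.0
    big = max(len(links_a), len(links_b))
    small = min(len(links_a), len(links_b))
    denom = math.log(W) - math.log(small)
    if denom <= 0:
        return 0.0
    return max(0.0, 1.0 - (math.log(big) - math.log(inter)) / denom)

def wla_sim(sim_a, in_a, in_b, out_a, out_b, W, alpha=0.7, beta=0.3):
    """WLA_sim = alpha*Sim_a + (1-alpha)*Sim_l, with
    Sim_l = beta*Sim_out + (1-beta)*Sim_in and alpha=0.7, beta=0.3."""
    sim_l = beta * milne_sim(out_a, out_b, W) + (1 - beta) * milne_sim(in_a, in_b, W)
    return alpha * sim_a + (1 - alpha) * sim_l
```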
The feature expansion set construction described in step 1.3 proceeds as follows:
As shown in Fig. 1, after related features have been extracted and the semantic relations between terms quantified, the corresponding feature expansion term vector C_t = {(c_1, r_1), (c_2, r_2), ..., (c_k, r_k)} is built for each given topic feature term, where c_i (i = 1, 2, ..., k) is a candidate expansion term related to the topic feature term t and r_i (i = 1, 2, ..., k) is the semantic similarity between c_i and t. To account for how often candidate expansion terms occur, the present invention integrates the semantic similarity between each candidate expansion term and the topic feature term with that term's occurrence frequency: r_i is the semantic similarity between the candidate expansion term and the topic feature term t, k is the number of elements in the candidate expansion term vector of t, and N_i is the occurrence frequency of term c_i. The elements of C_t are arranged in decreasing order of r_i.
The Bagging_BSJ algorithm proceeds as shown in the flow chart of Fig. 2: the parts connected by solid lines represent the classifier training process and the parts connected by dashed lines the testing process. To train the classifiers, Z_1 + Z_2 + Z_3 training sample subsets are first drawn by sampling with replacement. Bagging classifiers with naive Bayes as the base classifier are then trained on the first Z_1 subsets, and the trained models are denoted H_1^NB, ..., H_{Z_1}^NB. Similarly, the middle Z_2 subsets and the last Z_3 subsets are trained with the support vector machine and J48 as base classifiers respectively, and the resulting classification models are denoted H_1^SVM, ..., H_{Z_2}^SVM and H_1^J48, ..., H_{Z_3}^J48. Training in this way yields Z_1 + Z_2 + Z_3 classifiers.
Advantages and positive effects of the present invention
The present invention, applied to short text classification, performs WLA semantic expansion on short texts, carries out related-feature extraction, quantifies the semantic relations, and constructs the feature expansion set; based on the Bagging ensemble idea it combines the naive Bayes, support vector machine and J48 algorithms, overcoming the shortcomings of the three individual algorithms, and proposes the Bagging_BSJ algorithm, which performs feature expansion and classification of short texts better. Theory and experiment show that the method is more effective than traditional algorithms such as naive Bayes in many respects, e.g. accuracy, recall and F value.
The Bagging_BSJ method proposed by the present invention is applicable to many kinds of short text classification, such as QQ messages, WeChat messages, SMS messages and microblog posts. The invention effectively compensates for defects such as the sparsity and semantic scarcity of short text features and provides a reference means for fields such as public opinion analysis and the processing of instant social messages; it has the advantages of clear algorithm steps and high short text classification performance, and thus strong practical application value.
Brief description of the drawings
Fig. 1 is the model diagram of the Wikipedia-expanded short text feature term table.
Fig. 2 is the flow chart of the Bagging_BSJ algorithm of the present invention.
Fig. 3 shows the Wikipedia topic page corresponding to a term.
Fig. 4 shows the time cost of classifying the same data with several classification algorithms.
Fig. 5 shows the classification accuracy of several classification algorithms on different data sets.
Fig. 6 shows the classification recall of several classification algorithms on different data sets.
Fig. 7 shows the classification F values of several classification algorithms on different data sets.
Fig. 8 shows the classification time consumption of several classification algorithms on different data sets.
Embodiment
Embodiment 1: short text WLA semantic expansion and Bagging_BSJ classification
WLA semantic feature expansion based on Wikipedia is performed on the short texts, which are then classified with the Bagging_BSJ algorithm. The specific steps are as follows:
1. Perform WLA semantic expansion on the short text
(1) Given the term book, find the topic page corresponding to it, as shown in Fig. 3. After preprocessing with the Lucene word segmentation tool, a group of candidate expansion terms {write, printing, illustration, sheet, text, e-book, page, paper, ink, parchment, material, book, leaf} is obtained; this is the simple Wikipedia-based feature expansion term table of the term book.
(2) Using the proposed Wikipedia-based semantic similarity algorithm WLA,

WLA_sim(a, b) = α Sim_a(a, b) + (1 - α) Sim_l(a, b)

taking α = 0.7 and β = 0.3, i.e.

WLA_sim = 0.7 Sim_a + 0.3 Sim_l, where Sim_l = 0.7 Sim_in + 0.3 Sim_out,

compute the semantic relatedness between the topic feature term book and each candidate expansion term, obtaining the following results:
{(write, 0.74), (printing, 0.73), (illustration, 0.78), (sheet, 0.79), (text, 0.88), (book, 1), (e-book, 0.828), (page, 0.876), (paper, 0.86)}.
(3) Build the feature expansion term table.
Select the five top-ranked feature terms as the feature expansion terms of the term book, i.e. {book, text, page, paper, e-book}, and repeat steps (1) and (2) for these five terms, obtaining the Wikipedia expansion vector {e-book, information, source, physical, database, document, material, newspaper, digital, ...}.
The final term vector after semantic expansion is:
{(information, 0.82), (database, 0.798), (book, 0.796), ...}
2. Classify the terms after WLA semantic expansion with Bagging_BSJ
Using the Weka data mining tool, the terms obtained above are classified with the proposed Bagging_BSJ classification model, taking Z_1 = Z_2 = Z_3 = 15 and g = 1000.
From Fig. 4 it can be concluded that the proposed Bagging_BSJ algorithm requires slightly more time than SVM and NB but far less than the J48 algorithm; this is because the J48 classification model must be retrained for every group of experimental data.
Short texts of different types are classified, covering the following three data types: raw, unprocessed short text data; short texts after Wikipedia expansion; and short texts after the WLA semantic expansion proposed by this method. Each type is classified with NB, SVM, J48 and the proposed Bagging_BSJ algorithm, with the results shown in Fig. 5, Fig. 6, Fig. 7 and Fig. 8.
From Fig. 5 it can be concluded that the classification accuracy of each data set shows a consistent trend across the different classifiers: the accuracy on the data sets after Wikipedia expansion and WLA semantic expansion (94.6%) is far above that of classifying short texts without feature expansion, and the accuracy on short texts after Wikipedia expansion is slightly below that on short texts after WLA semantic expansion.
From Fig. 6 it can be concluded that the recall of the proposed Bagging_BSJ algorithm (93.3%) is the best; for the other classifiers, recall is highest when classifying texts after WLA semantic expansion and lowest on the original, unexpanded data set. Although the Bagging_BSJ classifier achieves equal recall on the Wikipedia-expanded and WLA semantically expanded data sets, both are far above its recall on the original data set.
From Fig. 7 it can be concluded that, when classification accuracy and recall are considered together as the F value, the expanded short texts show a better F value (94.1%) than the unexpanded short text data; compared with the original data and the Wikipedia-expanded data, classification of short texts after the proposed WLA semantic expansion performs best.
From Fig. 8 it can be concluded that classifying the raw short text data costs the least time; the classification processing time with the proposed WLA semantic expansion is slightly above that of the original data but below the time consumed by classifying short texts with simple Wikipedia expansion.
Combining Fig. 5, Fig. 6, Fig. 7 and Fig. 8, it can be concluded that, relative to the other classification methods, the proposed Bagging_BSJ classification of short texts after WLA semantic feature expansion shows better performance on indices such as accuracy, recall and F value. It effectively solves the low accuracy and low recall that occur when traditional text classification models are applied to short texts, while also shortening the time cost of short text classification.

Claims (1)

1. A Bagging_BSJ short text classification method, characterised in that the method mainly comprises the following key steps:
1. WLA short text semantic feature expansion based on the Wikipedia knowledge base;
1.1. Related-feature extraction: for a given feature term, map the feature term through disambiguation and redirection to the corresponding Wikipedia page, extract the page text, and apply denoising to these texts, obtaining a feature vector composed of terms whose elements are the candidate expansion terms of the feature term;
1.2. Semantic relation quantification: compute the semantic relations with the WLA (Wikipedia Links and Abstract) algorithm, quantitatively describing the degree of semantic association between the given feature term and the candidate expansion terms obtained in step 1.1;
1.3. After related-feature extraction and quantification of the semantic relations between terms, build for each given topic feature term the corresponding feature expansion term vector C_t = {(c_1, r_1), (c_2, r_2), ..., (c_k, r_k)}, where c_i (i = 1, 2, ..., k) is a candidate expansion term related to the topic feature term t and r_i (i = 1, 2, ..., k) is the semantic similarity between c_i and t; these term vectors serve as samples in the subsequent short text classification;
2. The Bagging_BSJ short text classification algorithm based on ensemble learning;
2.1. Assume the training set S = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_n)} contains m articles in n classes, where x_i is a training sample and y_j is the class label corresponding to x_i;
2.2. Using sampling with replacement, draw Z_1, Z_2 and Z_3 training sample subsets from the training set S, each subset containing g samples;
2.3. Train Bagging classifiers with naive Bayes as the base classifier on the first Z_1 subsets, denoting the trained models H_1^NB, ..., H_{Z_1}^NB; similarly, train the middle Z_2 subsets and the last Z_3 subsets with the support vector machine and J48 as base classifiers respectively, denoting the resulting classification models H_1^SVM, ..., H_{Z_2}^SVM and H_1^J48, ..., H_{Z_3}^J48; training in this way yields Z_1 + Z_2 + Z_3 classifiers;
2.4. The classification process applies the classification models H_i, i = 1, 2, ..., Z_1 + Z_2 + Z_3, obtained from the training in step 2.3 to the samples to be classified and integrates the classification results with a voting algorithm, thereby deciding the class of a new sample x, i.e.:

H(x) = arg max_y Σ_{i=1}^{Z_1+Z_2+Z_3} I(H_i(x) = y)

where I(·) is the indicator function.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710554325.8A CN107292348A (en) 2017-07-10 2017-07-10 Bagging_BSJ short text classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710554325.8A CN107292348A (en) 2017-07-10 2017-07-10 Bagging_BSJ short text classification method

Publications (1)

Publication Number Publication Date
CN107292348A 2017-10-24

Family

ID=60100968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710554325.8A Pending CN107292348A (en) 2017-07-10 2017-07-10 Bagging_BSJ short text classification method

Country Status (1)

Country Link
CN (1) CN107292348A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102955856A (en) * 2012-11-09 2013-03-06 北京航空航天大学 Chinese short text classification method based on characteristic extension
KR20160121999A (en) * 2015-04-13 2016-10-21 연세대학교 산학협력단 Apparatus and Method of Support Vector Machine Classifier Using Multistage Sub-Classifier

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
秦靓靓: "Research on short text feature expansion and classification algorithms based on Wikipedia", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110162379A (en) * 2018-04-24 2019-08-23 腾讯云计算(北京)有限责任公司 Virtual machine migration method, device and computer equipment
CN110532540A (en) * 2018-05-25 2019-12-03 北京京东尚科信息技术有限公司 Method, system, computer system and readable storage medium for determining user preferences
CN110532540B (en) * 2018-05-25 2024-04-09 北京京东尚科信息技术有限公司 Method, system, computer system and readable storage medium for determining user preferences
CN108804622A (en) * 2018-08-20 2018-11-13 天津探数科技有限公司 Short text classifier construction method considering semantic background
CN111563165A (en) * 2020-05-11 2020-08-21 北京中科凡语科技有限公司 Statement classification method based on anchor word positioning and training statement augmentation
CN111563165B (en) * 2020-05-11 2020-12-18 北京中科凡语科技有限公司 Statement classification method based on anchor word positioning and training statement augmentation
CN112749756A (en) * 2021-01-21 2021-05-04 淮阴工学院 Short text classification method based on NB-Bagging
CN112749756B (en) * 2021-01-21 2023-10-13 淮阴工学院 Short text classification method based on NB-Bagging

Similar Documents

Publication Publication Date Title
CN107292348A Bagging_BSJ short text classification method
CN104391942B Short text feature expansion method based on a semantic map
CN104750844B TF-IGM-based text feature vector generation method and apparatus, and text classification method and apparatus
CN102789498B Method and system for sentiment classification of Chinese comment text based on ensemble learning
CN109558487A Document classification method based on hierarchical multi-attention networks
CN107451278A Chinese text classification method based on multi-hidden-layer extreme learning machines
CN106445919A Sentiment classification method and device
CN104573046A Comment analysis method and system based on term vectors
CN105868184A Chinese name recognition method based on recurrent neural networks
CN104834940A Medical image examination disease classification method based on support vector machines (SVM)
CN106815369A Text classification method based on the Xgboost classification algorithm
CN106844632A Product review sentiment classification method and device based on improved support vector machines
CN103473380B Computer text sentiment classification method
CN109189926A Construction method of a technical paper corpus
CN109446423B System and method for judging the sentiment of news text
CN104285224A Method for classifying text
Basha et al. A novel summarization-based approach for feature reduction enhancing text classification accuracy
CN113312480A Multi-label classification method and device for scientific papers based on graph convolutional networks
Sadr et al. Unified topic-based semantic models: A study in computing the semantic relatedness of geographic terms
CN104794209B Chinese microblog sentiment classification method and system based on Markov logic networks
CN103268346A Semi-supervised classification method and semi-supervised classification system
CN106055596A Multi-label online news reader emotion prediction method
CN111859955A Public opinion data analysis model based on deep learning
Wei et al. The instructional design of Chinese text classification based on SVM
Handayani et al. Sentiment Analysis of Electric Cars Using Recurrent Neural Network Method in Indonesian Tweets

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20171024