CN109165294A - Short text classification method based on Bayesian classification - Google Patents

Short text classification method based on Bayesian classification Download PDF

Info

Publication number
CN109165294A
CN109165294A (application CN201810951636.2A; granted as CN109165294B)
Authority
CN
China
Prior art keywords
classification
data
short text
text
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810951636.2A
Other languages
Chinese (zh)
Other versions
CN109165294B (en
Inventor
水新莹
张宇光
黄亚坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Xunfei Intelligent Technology Co ltd
Original Assignee
Anhui Xunfei Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Xunfei Intelligent Technology Co ltd filed Critical Anhui Xunfei Intelligent Technology Co ltd
Priority to CN201810951636.2A priority Critical patent/CN109165294B/en
Publication of CN109165294A publication Critical patent/CN109165294A/en
Application granted granted Critical
Publication of CN109165294B publication Critical patent/CN109165294B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155Bayesian classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a short text classification method based on Bayesian classification, relating to the fields of smart cities and e-government, comprising the following steps: (1) preprocessing the data and labelling categories; (2) completing word segmentation and incremental feature vector extraction for the short text data, which comprises two core steps; (3) establishing a Bayesian short text classification model; (4) dividing the processed data set into a training set and a test set, training the classification model, and optimising the model according to the results on the training set; (5) inputting short text data of unknown class into the trained model, outputting the probability that the input text belongs to each class, and selecting the class with the highest probability as the final classification result. The short text classification method based on Bayesian classification can classify short text content effectively, intelligently and automatically.

Description

Short text classification method based on Bayesian classification
Technical field
The present invention relates to the fields of smart cities and e-government, and in particular to a short text classification method based on Bayesian classification.
Background art:
With the development of the mobile Internet and social networks and the rise of social software such as Weibo and WeChat, companies and government departments have also gradually begun to use social software to establish connections and communicate. Frequent publication of short pieces of text is characteristic of mobile social media, and the volume of short text content is growing rapidly. Short texts are a research focus in search engines, intelligent customer service and public-opinion monitoring. Faced with a huge and constantly growing number of netizens, extracting useful information from incomplete texts such as incident descriptions, private messages and comments is particularly important to decision makers such as media outlets and governments. Manual processing is inefficient and often cannot complete the task. How to classify huge volumes of short text efficiently, intelligently and automatically is therefore of great significance to the construction of e-government.
Existing text classification techniques mainly design their core classification algorithms around how representative the keywords are, i.e. around popularity-based weighting and similar methods. For example, the existing literature on "a text classification method based on clustered word embeddings" mainly applies the k-means algorithm to the word vectors of documents to obtain a fixed-size set of clusters. The centroid of each cluster is interpreted as a "super word embedding", and each embedded word in the text collection is assigned to its nearest cluster centre. Each text is then represented as a bag of super word embeddings, and the frequency of each super word embedding in the text is computed to obtain the type of the text.
Analysis of the above short text classification methods shows that the choice of keywords affects the classification quality: both the number of keywords and the popularity of the features must be considered. In short text classification, however, a short text contains few characteristic keywords; during actual classification, the keywords struggle to express the inherent meaning of the short text effectively, and a single text easily produces multiple candidate categories. In addition, the semantic information in short texts also affects the classification result. The characteristic-keyword extraction methods of the prior art work well for classifying long texts, but classify short texts poorly.
For example, application No. CN201710216502.1 discloses a method for obtaining a text classifier by automatically labelling a corpus, and the text classifier itself. The method includes determining a concept set and matching unlabelled text against the concept keyword set of each concept to label it automatically; for each concept, when the number of texts labelled with that concept in the labelled corpus meets a threshold condition, a corresponding text classification model is trained for the concept to obtain a text classifier, and finally the set of text classifiers for all concepts whose text counts meet the threshold is obtained. This kind of algorithm structure is universal, can flexibly change the classification system, and saves computation time and resources; it also needs only a small amount of initial corpus text and labels automatically, without manual annotation, further saving time and cost. However, this kind of classification method does not disclose a technical solution for improving its accuracy through autonomous training.
As another example, application No. CN201710882685.0 discloses a method and device for establishing a text classification model and for text classification. The establishment method includes: obtaining training samples; obtaining the corresponding vector matrix after segmenting the text based on an entity dictionary; and using the vector matrix and the class of the text to train a first classification model and a second classification model. During training, the loss function of the text classification model is obtained from the loss functions of the first and second classification models and used to adjust the parameters of the two models, thereby obtaining a text classification model composed of the first and second classification models. The text classification method includes: obtaining the text to be classified; obtaining its vector matrix after entity-dictionary-based segmentation; and inputting the vector matrix into the text classification model and obtaining the classification result from the model output. However, this kind of classification method likewise does not disclose a technical solution for improving its accuracy through autonomous training.
Summary of the invention
The purpose of the present invention is to provide a short text classification method based on Bayesian classification, so as to overcome the above-mentioned defects of the prior art.
A short text classification method based on Bayesian classification, characterised in that the method comprises the following steps:
(1) Data preprocessing and category labelling:
Step 1: extract the reported historical short text data, and perform routine data cleaning, data integration and similar processing on the data to improve its quality;
Step 2: for the preliminarily cleaned data, the short texts already processed in the past have been manually labelled with categories; manually label the currently unprocessed part of the data to complete data preprocessing;
(2) Complete word segmentation and incremental feature vector extraction for the short text data, which divides into the following two core steps:
Step 1: segment the cleaned short text content with the third-party Python library Jieba;
Step 2: build the incremental feature vector and combine it with TF-IDF for keyword extraction; if too few keywords are extracted, use all segmented phrases directly as the final classification input;
(3) Establish the Bayesian short text classification model;
(4) Divide the processed data set into a training set and a test set, train the classification model, and optimise the model according to the results on the training set;
(5) With the trained model, input short text data of unknown class, output the probability that the input text belongs to each class, and select the class with the highest probability as the final classification result.
Preferably, the data preprocessing comprises the following four steps:
Step 1: clean and split the raw data, using Kettle to divide each record into three fields: the major-class serial number, the sub-class serial number and the text;
Step 2: store the processed data in a database;
Step 3: segment the content of the third field, i.e. the plain text, with Jieba;
Step 4: from each row of segmented words, retain three words according to their part of speech and store them in the database.
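The four preprocessing steps above can be sketched as follows. The tab separator, the field order, the part-of-speech keep-list and the helper names are illustrative assumptions, not part of the patent; in the actual method the splitting is done in Kettle and the segmentation and tagging by Jieba.

```python
def split_record(row, sep="\t"):
    """Step 1: split one raw row into the three fields -- major-class
    serial number, sub-class serial number, plain text.
    (The separator and field order are assumptions for illustration.)"""
    major, group, text = row.split(sep, 2)
    return {"major": major, "group": group, "text": text}

# An assumed keep-list: nouns, verbs, adjectives (Jieba-style POS tags).
KEEP_POS = {"n", "v", "a"}

def keep_words(tagged, limit=3):
    """Step 4: from one row of (word, pos) pairs -- such as Jieba's
    part-of-speech mode produces -- retain up to `limit` words
    whose part of speech is in the keep-list."""
    return [w for w, pos in tagged if pos in KEEP_POS][:limit]
```

In practice `tagged` would come from Jieba's part-of-speech segmentation of the text field; a toy tagged list keeps the sketch self-contained.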
Preferably, extracting characteristic keywords with the incremental feature vector and the TF-IDF feature-word extraction method comprises the following two steps:
Step 1: let B=(B1,B2,...,Bu) be the feature vector composed of the feature words extracted from the text, where the value of u is small, e.g. 3 or 4. Taking garbage reports as an example, the position of the garbage may be floating on a water surface, in a green belt or on a road surface; the words describing the distribution position of the garbage are summarised into a new feature word Bu+1 and given a name. Proceeding in the same way for u=5,6,...,m yields the incremental feature vector B=(B1,B2,...,Bm);
Step 2: if a word or phrase has a high term frequency TF in one article but rarely appears in other articles, the word or phrase is considered to have good class discrimination ability and to be suitable for classification. The TF-IDF feature extraction function is f(w)=TF(w)×IDF(w), and according to this formula the characteristic keywords of the short text content are extracted. First, the TF value of a feature word w is denoted TF(w); the term frequency TF is usually used together with the inverse document frequency IDF. Then IDF(w)=log[N/n(w)+1] is computed, where N is the total number of texts and n(w) is the number of texts containing w.
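A minimal sketch of the keyword scoring in step 2 follows. It reads the extraction function as f(w)=TF(w)×IDF(w) with IDF(w)=log[N/n(w)+1], which is one reading of the garbled formula in the text; the function name and toy tokens are illustrative.

```python
import math
from collections import Counter

def tfidf_keywords(doc_tokens, corpus, top_k=4):
    """Score each word w of one document with f(w) = TF(w) * IDF(w),
    IDF(w) = log(N / n(w) + 1), where N is the total number of texts
    and n(w) the number of texts containing w.  The document is assumed
    to be one of the `corpus` texts, so n(w) >= 1 for its words."""
    n_docs = len(corpus)
    df = Counter()                      # n(w): document frequency
    for doc in corpus:
        df.update(set(doc))
    tf = Counter(doc_tokens)
    scores = {w: (tf[w] / len(doc_tokens)) * math.log(n_docs / df[w] + 1)
              for w in tf}
    # The highest-scoring words are taken as the characteristic keywords.
    return [w for w, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]]
```

Words that occur in every text (low IDF) are ranked below words concentrated in one text, matching the class-discrimination criterion described above.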
Preferably, for an input short text sample record, B=(B1,B2,...,Bm) is the extracted feature vector and C1,C2,...,Cn are the n classification results; P(Ci|B), i=1,2,...,n, denotes the probability that the text to be classified belongs to the i-th classification result, and P(Bj|Ci), j=1,2,...,m, i=1,2,...,n, denotes the probability that the j-th feature word belongs to the i-th class. The calculation is based on the Bayesian formula, as shown below:
P(Ci|B) = P(B|Ci)·P(Ci)/P(B)
When classifying a new text, it is only necessary to calculate the value of P(Ci|B) for the n classes and assign the new sample to the class with the largest probability value, where the probability P(B) is a constant independent of the class. Using the independence between the feature words of the feature vector B=(B1,B2,...,Bm), the above formula can be simplified to:
P(Ci|B) ∝ P(Ci)·P(B1|Ci)·P(B2|Ci)·...·P(Bm|Ci)
Preferably, according to the established model, the class of unknown short text information is calculated. Let N be the total number of samples and Cou(Ci) the count of the i-th class among the samples; then P(Ci)=Cou(Ci)/N. Let Cou(Bij) be the count of the j-th feature word in the i-th class; then P(Bj|Ci)=Cou(Bij)/Cou(Ci). Finally, the probability that the sample to be classified belongs to each class is calculated, and the class with the maximum probability is taken.
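The count-based estimates P(Ci)=Cou(Ci)/N and P(Bj|Ci)=Cou(Bij)/Cou(Ci) and the final maximum-probability selection can be sketched as below. The function names and toy labels are illustrative; no smoothing is applied because the description specifies raw counts, so a feature word unseen in a class zeroes that class's score.

```python
from collections import Counter, defaultdict

def estimate(samples):
    """Estimate P(Ci) = Cou(Ci)/N and P(Bj|Ci) = Cou(Bij)/Cou(Ci)
    from labelled samples, per the formulas above.
    samples: list of (feature_words, class_label)."""
    n = len(samples)                                   # N
    cou_c = Counter(label for _, label in samples)     # Cou(Ci)
    cou_b = defaultdict(Counter)                       # Cou(Bij)
    for words, label in samples:
        for w in words:
            cou_b[label][w] += 1
    p_c = {c: cou_c[c] / n for c in cou_c}             # P(Ci)
    p_b_c = {c: {w: cou_b[c][w] / cou_c[c] for w in cou_b[c]} for c in cou_c}
    return p_c, p_b_c

def classify(words, p_c, p_b_c):
    """Return the class maximising P(Ci) * prod_j P(Bj|Ci)."""
    def score(c):
        p = p_c[c]
        for w in words:
            p *= p_b_c[c].get(w, 0.0)   # unseen word: zero (no smoothing)
        return p
    return max(p_c, key=score)
```

A short toy run: two "env" samples and one "noise" sample give P(env)=2/3, and a text containing "garbage" is assigned to "env".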
The present invention has the following advantages: with this short text classification method based on Bayesian classification, user-reported short text content is analysed, classified and dispatched to the responsible unit. For the core short text classification process, the source data are first cleaned, integrated and otherwise regularised, part of the short text data is extracted as training data, and the extracted data are labelled with categories according to the classification requirements. Then the cleaned short text content is segmented with the third-party Python library Jieba and keywords are extracted with TF-IDF. Considering that short texts contain little content, the keywords extracted by TF-IDF serve as a reference before the Bayesian classification model is built; if too few keywords are extracted, the phrases from the segmented short text are used directly for classification modelling. Following the above steps, the classification model is established on the basis of the Bayesian formula and adjusted until the precision of the classification test stabilises.
Description of the drawings
Fig. 1 is flow chart of the method for the present invention.
Fig. 2 is the flow chart of data processing in the present invention.
Specific embodiment
To make the technical means, creative features, objectives and effects achieved by the present invention easy to understand, the present invention is further explained below with reference to specific embodiments.
As shown in Fig. 1 and Fig. 2, a short text classification method based on Bayesian classification is characterised in that the method comprises the following steps:
(1) Data preprocessing and category labelling:
Step 1: extract the reported historical short text data, and perform routine data cleaning, data integration and similar processing on the data to improve its quality;
Step 2: for the preliminarily cleaned data, the short texts already processed in the past have been manually labelled with categories; manually label the currently unprocessed part of the data to complete data preprocessing;
(2) Complete word segmentation and incremental feature vector extraction for the short text data, which divides into the following two core steps:
Step 1: segment the cleaned short text content with the third-party Python library Jieba;
Step 2: build the incremental feature vector and combine it with TF-IDF for keyword extraction; if too few keywords are extracted, use all segmented phrases directly as the final classification input;
(3) Establish the Bayesian short text classification model;
(4) Divide the processed data set into a training set and a test set, train the classification model, and optimise the model according to the results on the training set;
(5) With the trained model, input short text data of unknown class, output the probability that the input text belongs to each class, and select the class with the highest probability as the final classification result.
It is worth noting that the data preprocessing comprises the following four steps:
Step 1: clean and split the raw data, using Kettle to divide each record into three fields: the major-class serial number, the sub-class serial number and the text;
Step 2: store the processed data in a database;
Step 3: segment the content of the third field, i.e. the plain text, with Jieba;
Step 4: from each row of segmented words, retain three words according to their part of speech and store them in the database.
In this embodiment, extracting characteristic keywords with the incremental feature vector and the TF-IDF feature-word extraction method comprises the following two steps:
Step 1: let B=(B1,B2,...,Bu) be the feature vector composed of the feature words extracted from the text, where the value of u is small, e.g. 3 or 4. Taking garbage reports as an example, the position of the garbage may be floating on a water surface, in a green belt or on a road surface; the words describing the distribution position of the garbage are summarised into a new feature word Bu+1 and given a name. Proceeding in the same way for u=5,6,...,m yields the incremental feature vector B=(B1,B2,...,Bm);
Step 2: if a word or phrase has a high term frequency TF in one article but rarely appears in other articles, the word or phrase is considered to have good class discrimination ability and to be suitable for classification. The TF-IDF feature extraction function is f(w)=TF(w)×IDF(w), and according to this formula the characteristic keywords of the short text content are extracted. First, the TF value of a feature word w is denoted TF(w); the term frequency TF is usually used together with the inverse document frequency IDF. Then IDF(w)=log[N/n(w)+1] is computed, where N is the total number of texts and n(w) is the number of texts containing w.
In this embodiment, for an input short text sample record, B=(B1,B2,...,Bm) is the extracted feature vector and C1,C2,...,Cn are the n classification results; P(Ci|B), i=1,2,...,n, denotes the probability that the text to be classified belongs to the i-th classification result, and P(Bj|Ci), j=1,2,...,m, i=1,2,...,n, denotes the probability that the j-th feature word belongs to the i-th class. The calculation is based on the Bayesian formula, as shown below:
P(Ci|B) = P(B|Ci)·P(Ci)/P(B)
When classifying a new text, it is only necessary to calculate the value of P(Ci|B) for the n classes and assign the new sample to the class with the largest probability value, where the probability P(B) is a constant independent of the class. Using the independence between the feature words of the feature vector B=(B1,B2,...,Bm), the above formula can be simplified to:
P(Ci|B) ∝ P(Ci)·P(B1|Ci)·P(B2|Ci)·...·P(Bm|Ci)
In addition, according to the established model, the class of unknown short text information is calculated. Let N be the total number of samples and Cou(Ci) the count of the i-th class among the samples; then P(Ci)=Cou(Ci)/N. Let Cou(Bij) be the count of the j-th feature word in the i-th class; then P(Bj|Ci)=Cou(Bij)/Cou(Ci). Finally, the probability that the sample to be classified belongs to each class is calculated, and the class with the maximum probability is taken.
Based on the above, this short text classification method based on Bayesian classification comprises the following steps: (1) data preprocessing and category labelling; (2) completing word segmentation and incremental feature vector extraction for the short text data, which divides into two core steps; (3) establishing the Bayesian short text classification model; (4) dividing the processed data set into a training set and a test set, training the classification model, and optimising the model according to the results on the training set; (5) with the trained model, inputting short text data of unknown class, outputting the probability that the input text belongs to each class, and selecting the class with the highest probability as the final classification result. User-reported short text content is analysed, classified and dispatched to the responsible unit. For the core short text classification process, the source data are first cleaned, integrated and otherwise regularised, part of the short text data is extracted as training data, and the extracted data are labelled with categories according to the classification requirements. Then the cleaned short text content is segmented with the third-party Python library Jieba and keywords are extracted with TF-IDF; considering that short texts contain little content, the keywords extracted by TF-IDF serve as a reference before the Bayesian classification model is built, and if too few keywords are extracted, the phrases from the segmented short text are used directly for classification modelling. Following the above steps, the classification model is established on the basis of the Bayesian formula and adjusted until the precision of the classification test stabilises.
As is known to those skilled in the art, the present invention can be realised through other embodiments without departing from its spirit or essential characteristics. The embodiments disclosed above are therefore illustrative in all respects and not exclusive. All changes within the scope of the present invention, or equivalent to the scope of the present invention, are included in the invention.

Claims (5)

1. A short text classification method based on Bayesian classification, characterised in that the method comprises the following steps:
(1) Data preprocessing and category labelling:
Step 1: extract the reported historical short text data, and perform routine data cleaning, data integration and similar processing on the data to improve its quality;
Step 2: for the preliminarily cleaned data, the short texts already processed in the past have been manually labelled with categories; manually label the currently unprocessed part of the data to complete data preprocessing;
(2) Complete word segmentation and incremental feature vector extraction for the short text data, which divides into the following two core steps:
Step 1: segment the cleaned short text content with the third-party Python library Jieba;
Step 2: build the incremental feature vector and combine it with TF-IDF for keyword extraction; if too few keywords are extracted, use all segmented phrases directly as the final classification input;
(3) Establish the Bayesian short text classification model;
(4) Divide the processed data set into a training set and a test set, train the classification model, and optimise the model according to the results on the training set;
(5) With the trained model, input short text data of unknown class, output the probability that the input text belongs to each class, and select the class with the highest probability as the final classification result.
2. The short text classification method based on Bayesian classification according to claim 1, characterised in that the data preprocessing comprises the following four steps:
Step 1: clean and split the raw data, using Kettle to divide each record into three fields: the major-class serial number, the sub-class serial number and the text;
Step 2: store the processed data in a database;
Step 3: segment the content of the third field, i.e. the plain text, with Jieba;
Step 4: from each row of segmented words, retain three words according to their part of speech and store them in the database.
3. The short text classification method based on Bayesian classification according to claim 1, characterised in that extracting characteristic keywords with the incremental feature vector and the TF-IDF feature-word extraction method comprises the following two steps:
Step 1: let B=(B1,B2,...,Bu) be the feature vector composed of the feature words extracted from the text, where the value of u is small, e.g. 3 or 4. Taking garbage reports as an example, the position of the garbage may be floating on a water surface, in a green belt or on a road surface; the words describing the distribution position of the garbage are summarised into a new feature word Bu+1 and given a name. Proceeding in the same way for u=5,6,...,m yields the incremental feature vector B=(B1,B2,...,Bm);
Step 2: if a word or phrase has a high term frequency TF in one article but rarely appears in other articles, the word or phrase is considered to have good class discrimination ability and to be suitable for classification. The TF-IDF feature extraction function is f(w)=TF(w)×IDF(w), and according to this formula the characteristic keywords of the short text content are extracted. First, the TF value of a feature word w is denoted TF(w); the term frequency TF is usually used together with the inverse document frequency IDF. Then IDF(w)=log[N/n(w)+1] is computed, where N is the total number of texts and n(w) is the number of texts containing w.
4. The short text classification method based on Bayesian classification according to claim 3, characterised in that, for an input short text sample record, B=(B1,B2,...,Bm) is the extracted feature vector and C1,C2,...,Cn are the n classification results; P(Ci|B), i=1,2,...,n, denotes the probability that the text to be classified belongs to the i-th classification result, and P(Bj|Ci), j=1,2,...,m, i=1,2,...,n, denotes the probability that the j-th feature word belongs to the i-th class. The calculation is based on the Bayesian formula, as shown below:
P(Ci|B) = P(B|Ci)·P(Ci)/P(B)
When classifying a new text, it is only necessary to calculate the value of P(Ci|B) for the n classes and assign the new sample to the class with the largest probability value, where the probability P(B) is a constant independent of the class. Using the independence between the feature words of the feature vector B=(B1,B2,...,Bm), the above formula can be simplified to:
P(Ci|B) ∝ P(Ci)·P(B1|Ci)·P(B2|Ci)·...·P(Bm|Ci)
5. The short text classification method based on Bayesian classification according to claim 1, characterised in that, according to the established model, the class of unknown short text information is calculated. Let N be the total number of samples and Cou(Ci) the count of the i-th class among the samples; then P(Ci)=Cou(Ci)/N. Let Cou(Bij) be the count of the j-th feature word in the i-th class; then P(Bj|Ci)=Cou(Bij)/Cou(Ci). Finally, the probability that the sample to be classified belongs to each class is calculated, and the class with the maximum probability is selected.
CN201810951636.2A 2018-08-21 2018-08-21 Short text classification method based on Bayesian classification Active CN109165294B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810951636.2A CN109165294B (en) 2018-08-21 2018-08-21 Short text classification method based on Bayesian classification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810951636.2A CN109165294B (en) 2018-08-21 2018-08-21 Short text classification method based on Bayesian classification

Publications (2)

Publication Number Publication Date
CN109165294A true CN109165294A (en) 2019-01-08
CN109165294B CN109165294B (en) 2021-09-24

Family

ID=64896189

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810951636.2A Active CN109165294B (en) 2018-08-21 2018-08-21 Short text classification method based on Bayesian classification

Country Status (1)

Country Link
CN (1) CN109165294B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110619363A (en) * 2019-09-17 2019-12-27 Shaanxi Youbai Information Technology Co., Ltd. Classification method for subclass names corresponding to long description of material data
CN111159414A (en) * 2020-04-02 2020-05-15 Chengdu Business Big Data Technology Co., Ltd. Text classification method and system, electronic equipment and computer readable storage medium
CN111488459A (en) * 2020-04-15 2020-08-04 Focus Technology Co., Ltd. Product classification method based on keywords
CN111985222A (en) * 2020-08-24 2020-11-24 Ping An International Smart City Technology Co., Ltd. Text keyword recognition method and related equipment
WO2020244336A1 (en) * 2019-06-04 2020-12-10 Shenzhen Qianhai WeBank Co., Ltd. Alarm classification method and device, electronic device, and storage medium
CN112084308A (en) * 2020-09-16 2020-12-15 China Academy of Information and Communications Technology Method, system and storage medium for text type data recognition
CN112214598A (en) * 2020-09-27 2021-01-12 Zhongrun Puda (Shiyan) Big Data Center Co., Ltd. Cognitive system based on hair condition
CN112256865A (en) * 2019-01-31 2021-01-22 Qingdao University of Science and Technology Chinese text classification method based on classifier
CN112559748A (en) * 2020-12-18 2021-03-26 Xiamen Fadu Information Technology Co., Ltd. Method for classifying stroke record data records, terminal equipment and storage medium
CN112883159A (en) * 2021-02-25 2021-06-01 Beijing Jingzhun Goutong Media Technology Co., Ltd. Method, medium, and electronic device for generating hierarchical category label for domain evaluation short text
CN113869356A (en) * 2021-08-17 2021-12-31 Hangzhou Huating Technology Co., Ltd. Method for judging escape tendency of people based on Bayesian classification
CN114528404A (en) * 2022-02-18 2022-05-24 Inspur Zhuoshu Big Data Industry Development Co., Ltd. Method and device for identifying provincial and urban areas
CN114564582A (en) * 2022-02-25 2022-05-31 Suzhou Inspur Intelligent Technology Co., Ltd. Short text classification method, device, equipment and storage medium
CN116956930A (en) * 2023-09-20 2023-10-27 Beijing Jiuqi Technology Co., Ltd. Short text information extraction method and system integrating rules and learning models

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8725732B1 (en) * 2009-03-13 2014-05-13 Google Inc. Classifying text into hierarchical categories
CN104850650A (en) * 2015-05-29 2015-08-19 Tsinghua University Short-text expansion method based on similar-label relations
WO2016090197A1 (en) * 2014-12-05 2016-06-09 Lightning Source Inc. Automated content classification/filtering
CN106407482A (en) * 2016-12-01 2017-02-15 Hefei University of Technology Multi-feature fusion-based online academic report classification method
CN107066553A (en) * 2017-03-24 2017-08-18 Beijing University of Technology Short text classification method based on convolutional neural networks and random forest


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Fan Yunjie, Liu Huailiang: "Research on Chinese Short Text Classification Based on Wikipedia", New Technology of Library and Information Service *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112256865A (en) * 2019-01-31 2021-01-22 Qingdao University of Science and Technology Chinese text classification method based on classifier
CN112256865B (en) * 2019-01-31 2023-03-21 Qingdao University of Science and Technology Chinese text classification method based on classifier
WO2020244336A1 (en) * 2019-06-04 2020-12-10 Shenzhen Qianhai WeBank Co., Ltd. Alarm classification method and device, electronic device, and storage medium
CN110619363A (en) * 2019-09-17 2019-12-27 Shaanxi Youbai Information Technology Co., Ltd. Classification method for subclass names corresponding to long description of material data
CN111159414A (en) * 2020-04-02 2020-05-15 Chengdu Business Big Data Technology Co., Ltd. Text classification method and system, electronic equipment and computer readable storage medium
CN111488459B (en) * 2020-04-15 2022-07-22 Focus Technology Co., Ltd. Product classification method based on keywords
CN111488459A (en) * 2020-04-15 2020-08-04 Focus Technology Co., Ltd. Product classification method based on keywords
CN111985222A (en) * 2020-08-24 2020-11-24 Ping An International Smart City Technology Co., Ltd. Text keyword recognition method and related equipment
CN111985222B (en) * 2020-08-24 2023-07-18 Ping An International Smart City Technology Co., Ltd. Text keyword recognition method and related equipment
CN112084308A (en) * 2020-09-16 2020-12-15 China Academy of Information and Communications Technology Method, system and storage medium for text type data recognition
CN112214598A (en) * 2020-09-27 2021-01-12 Zhongrun Puda (Shiyan) Big Data Center Co., Ltd. Cognitive system based on hair condition
CN112214598B (en) * 2020-09-27 2023-01-13 Wuzheng Intelligent Technology (Beijing) Co., Ltd. Cognitive system based on hair condition
CN112559748A (en) * 2020-12-18 2021-03-26 Xiamen Fadu Information Technology Co., Ltd. Method for classifying stroke record data records, terminal equipment and storage medium
CN112883159A (en) * 2021-02-25 2021-06-01 Beijing Jingzhun Goutong Media Technology Co., Ltd. Method, medium, and electronic device for generating hierarchical category label for domain evaluation short text
CN113869356A (en) * 2021-08-17 2021-12-31 Hangzhou Huating Technology Co., Ltd. Method for judging escape tendency of people based on Bayesian classification
CN114528404A (en) * 2022-02-18 2022-05-24 Inspur Zhuoshu Big Data Industry Development Co., Ltd. Method and device for identifying provincial and urban areas
CN114564582A (en) * 2022-02-25 2022-05-31 Suzhou Inspur Intelligent Technology Co., Ltd. Short text classification method, device, equipment and storage medium
CN114564582B (en) * 2022-02-25 2024-06-28 Suzhou Inspur Intelligent Technology Co., Ltd. Short text classification method, device, equipment and storage medium
CN116956930A (en) * 2023-09-20 2023-10-27 Beijing Jiuqi Technology Co., Ltd. Short text information extraction method and system integrating rules and learning models

Also Published As

Publication number Publication date
CN109165294B (en) 2021-09-24

Similar Documents

Publication Publication Date Title
CN109165294A (en) Short text classification method based on Bayesian classification
CN108710651B (en) Automatic classification method for large-scale customer complaint data
CN109189901B (en) Method for automatically discovering new classification and corresponding corpus in intelligent customer service system
CN107577688B (en) Original article influence analysis system based on media information acquisition
CN102289522B (en) Method of intelligently classifying texts
CN110825877A (en) Semantic similarity analysis method based on text clustering
CN110781679B (en) News event keyword mining method based on associated semantic chain network
CN110415071B (en) Automobile competitive product comparison method based on viewpoint mining analysis
CN1687924A (en) Method for building an Internet person information search engine
CN109657063A (en) Processing method and storage medium for massive manually reported environmental protection event data
CN111309864A (en) User group emotional tendency migration dynamic analysis method for microblog hot topics
CN109783623A (en) Data analysis method for user-customer service dialogues in real-world scenarios
CN112149422B (en) Dynamic enterprise news monitoring method based on natural language
CN111782806A (en) Artificial intelligence algorithm-based similar marketing enterprise retrieval classification method and system
CN111522950B (en) Rapid identification system for unstructured massive text sensitive data
CN115952292B (en) Multi-label classification method, apparatus and computer readable medium
CN115017887A (en) Chinese rumor detection method based on graph convolution
CN111651566A (en) Method for extracting dispute focuses from judgment documents based on multi-task few-shot learning
CN111460147A (en) Title short text classification method based on semantic enhancement
CN108399238A (en) Viewpoint search system and method fusing text summaries and network representation
CN105677888A (en) Service preference identification method based on user time fragments
CN109871889B (en) Public psychological assessment method under emergency
CN115600602B (en) Method, system and terminal device for extracting key elements of long text
CN107122420A (en) Tourist hotspot event detection method and system
CN108804524B (en) Emotion distinguishing and importance dividing method based on hierarchical classification system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: 241000 room 01, 18 / F, iFLYTEK intelligent building, No. 9, Wenjin West Road, Yijiang District, Wuhu City, Anhui Province

Patentee after: ANHUI XUNFEI INTELLIGENT TECHNOLOGY Co.,Ltd.

Address before: 241000 Floor 9, block A1, Wanjiang Fortune Plaza, Jiujiang District, Wuhu City, Anhui Province

Patentee before: ANHUI XUNFEI INTELLIGENT TECHNOLOGY Co.,Ltd.