CN109165294B - Short text classification method based on Bayesian classification - Google Patents

Short text classification method based on Bayesian classification

Info

Publication number
CN109165294B
CN109165294B (application CN201810951636.2A)
Authority
CN
China
Prior art keywords
classification
short text
data
text
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810951636.2A
Other languages
Chinese (zh)
Other versions
CN109165294A (en)
Inventor
水新莹
张宇光
黄亚坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Xunfei Intelligent Technology Co ltd
Original Assignee
Anhui Xunfei Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Xunfei Intelligent Technology Co ltd filed Critical Anhui Xunfei Intelligent Technology Co ltd
Priority to CN201810951636.2A priority Critical patent/CN109165294B/en
Publication of CN109165294A publication Critical patent/CN109165294A/en
Application granted granted Critical
Publication of CN109165294B publication Critical patent/CN109165294B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155Bayesian classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The invention discloses a short text classification method based on Bayesian classification, which relates to the fields of smart cities and e-government and comprises the following steps: (1) data preprocessing and category labeling; (2) word segmentation and incremental feature vector extraction of the short text data; (3) establishing a Bayes-based short text classification model; (4) dividing the processed data set into a training set and a testing set, training the classification model, and optimizing the model according to the training results; (5) inputting short text data of unknown class into the trained model, outputting the probability that the input text belongs to each class, and selecting the class with the highest probability as the final classification result. The method can classify short text content effectively, intelligently, and automatically.

Description

Short text classification method based on Bayesian classification
Technical Field
The invention relates to the field of smart cities and electronic government affairs, in particular to a short text classification method based on Bayesian classification.
Background art:
with the development of the mobile internet and social networks and the rise of social software such as microblogs and WeChat, companies and government departments increasingly use such software to establish connections and communicate. Mobile social media are characterized by high publishing frequency and short text content, and the volume of short texts is growing rapidly. Short texts are likewise a research focus in search engines, intelligent customer service, and public opinion monitoring. Facing such a huge and growing number of netizens, extracting useful information from fragmentary text such as incident descriptions, private messages, and comments is very important for decision makers such as media outlets and governments. Manually classifying and extracting information from such large-scale short texts is inefficient and usually cannot complete the task effectively, so classifying short text content efficiently, intelligently, and automatically is of great significance for advancing the construction of e-government.
Existing text classification techniques mainly design the core classification algorithm around measures such as the representativeness (i.e., the weight) of keywords. For example, the existing document "a text classification method based on cluster word embedding" applies the k-means algorithm to the word vectors of a document to obtain a set of fixed-size clusters; the centroid of each cluster is interpreted as a hyperword embedding, and each embedded word in the text collection is assigned to its nearest cluster center. Each text is then represented as a bag of hyperword embeddings, and the frequency of each hyperword embedding in the text yields the text's category.
Analyzing such short text classification methods: the selection of keywords affects the classification result, and both the number of keywords and their generality must be considered. In short text classification the feature keywords are few, and in the actual classification process they can hardly express the intrinsic meaning of the short text, which easily yields several candidate categories for a single text. In addition, semantic information in short texts also affects the classification result. While the prior-art method of extracting feature keywords works well for classifying long texts, it struggles to classify short texts effectively.
For example, CN201710216502.1 discloses a method for obtaining a text classifier with automatically labeled corpora, and a text classifier. The method includes determining a concept set, then matching and automatically labeling unlabeled corpus texts against the concept keywords in the keyword set of each concept. For each concept, once the number of texts in its labeled corpus set meets a threshold condition, a corresponding text classification model is trained to obtain a text classifier, finally yielding the set of text classifiers for all concepts that meet the threshold. The algorithm structure is general, can flexibly change the classification system, and saves computation time and resources; only a small amount of initial corpus text is needed, with automatic rather than manual labeling, further saving time and cost. However, this classification method does not disclose a technical scheme for improving its accuracy through autonomous training.
For example, CN201710882685.0 discloses a method and an apparatus for establishing a text classification model and for text classification. The establishing method includes: obtaining training samples; segmenting each text into words based on an entity dictionary to obtain a corresponding vector matrix; and training a first and a second classification model with the texts' vector matrices and categories. During training, the loss function of the text classification model is obtained from the loss functions of the first and second classification models and is used to adjust their parameters, yielding a text classification model composed of the two. The text classification method includes: acquiring a text to be classified; segmenting it based on the entity dictionary to obtain its vector matrix; and inputting the vector matrix into the text classification model, whose output gives the classification result. However, this method likewise does not disclose how to improve classification accuracy through autonomous training.
Disclosure of Invention
The invention aims to provide a short text classification method based on Bayesian classification to solve the defects in the prior art.
A short text classification method based on Bayesian classification is characterized by comprising the following steps:
(1) data preprocessing and category labeling:
step one: extract the reported historical short text data and perform conventional data cleaning and data integration on it to improve data quality;
step two: complete the category labeling of the preliminarily cleaned data using the historically processed short texts, and manually label the categories of the currently unprocessed portion, completing the data preprocessing;
(2) completing word segmentation and incremental feature vector extraction of short text data, comprising the following two core steps:
step one: segment the cleaned short text content using the Python third-party library Jieba;
step two: extract the incremental feature vector and extract keywords in combination with TF-IDF; if the number of keywords is too small, use all segmented phrases directly as the final classification parameter input;
(3) establishing a short text classification model based on Bayes;
(4) dividing the processed data set into a training set and a testing set, carrying out classification model training, and carrying out model optimization according to the result of the training set;
(5) according to the trained model, inputting short text data of unknown classes, outputting the probability that the current input text belongs to each class, and selecting the class with the maximum probability as a result of final classification of the classes.
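Step (4)'s hold-out split can be sketched in Python. The 80/20 ratio and the fixed seed are illustrative assumptions only; the patent does not specify a split ratio:

```python
import random

def train_test_split(samples, test_ratio=0.2, seed=42):
    """Shuffle labelled samples and hold out a test portion, as in step (4)."""
    rng = random.Random(seed)        # fixed seed keeps the split reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

data = [(f"short text {i}", i % 3) for i in range(10)]  # (text, class label) pairs
train, test = train_test_split(data)
print(len(train), len(test))  # 8 2
```

Model optimization then proceeds by retraining on the training portion and checking accuracy on the held-out portion until the classification test precision stabilizes.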
Preferably, the data preprocessing comprises the following four steps:
step one: clean and organize the raw data, splitting each text record into three fields: major serial number, minor serial number, and body text;
step two: store the processed data in a database;
step three: segment the content of the third field, the plain text, using Jieba;
step four: retain three words per row of the segmented output according to part of speech, and store them in the database.
Preferably, extracting feature keywords via the incremental feature vector and the TF-IDF feature word extraction method comprises the following two steps:
step one: let B = (B1, B2, ..., Bu) be the feature vector composed of feature words extracted from the text; words that describe an existing feature word of the vector are merged and named as a new feature word B(u+1), and so on, so that for u = 5, 6, ..., m the incremental feature vector B = (B1, B2, ..., Bm) is obtained;
step two: if a word or phrase has a high term frequency (TF) in one article but rarely occurs in other articles, it is considered to have good discriminating power and to be suitable for classification. The TF-IDF feature extraction function is F(w) = TF(w) × IDF(w). Feature keyword extraction on the short text content follows this formula: first, the term frequency of feature word w is recorded as TF(w), and the term frequency TF is used together with the inverse document frequency IDF; then IDF(w) = log[N/(n(w)+1)] is computed, where N is the total number of texts and n(w) is the number of texts containing w.
Preferably, for an input short text sample record, B = (B1, B2, ..., Bm) is the extracted feature vector and C1, C2, ..., Cn are the n classification results; P(Ci|B), i = 1, 2, ..., n, denotes the probability that the text to be classified belongs to the ith classification result; P(Bj|Ci), j = 1, 2, ..., m, i = 1, 2, ..., n, denotes the probability that the jth feature word belongs to the ith class. In the concrete calculation, the Bayesian formula gives:
P(Ci|B) = P(B|Ci) · P(Ci) / P(B)
When classifying a new text, only the P(Ci|B) of the n classes need to be calculated, and the new sample is assigned to the class with the highest probability value. Since P(B) is a constant independent of the class, and given the independence among the feature words of the feature vector B = (B1, B2, ..., Bm), the above formula simplifies to:
P(Ci|B) ∝ P(Ci) · P(B1|Ci) · P(B2|Ci) · ... · P(Bm|Ci)
Preferably, the category of unknown short text information is calculated from the established model: let N be the total number of samples and Cou(Ci) the count of the ith class among them, so that P(Ci) = Cou(Ci)/N; let Cou(Bij) be the count of the jth feature word within the ith class, so that P(Bj|Ci) = Cou(Bij)/Cou(Ci). Finally, the probability of each class is calculated for the sample to be classified, and the class with the maximum probability is taken:
result class = argmax over i = 1, ..., n of P(Ci) · P(B1|Ci) · P(B2|Ci) · ... · P(Bm|Ci)
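The count-based estimates P(Ci) = Cou(Ci)/N and P(Bj|Ci) = Cou(Bij)/Cou(Ci), followed by choosing the class with the maximum product, can be sketched in Python. The add-one (Laplace) smoothing is my own addition, to keep an unseen feature word from zeroing out a whole product; the patent does not specify any smoothing:

```python
from collections import Counter, defaultdict

def train_nb(samples):
    """samples: list of (tokens, label). Returns class priors P(Ci) and
    per-class word counters for the likelihoods P(Bj|Ci)."""
    class_count = Counter()
    word_count = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        class_count[label] += 1
        word_count[label].update(tokens)
        vocab.update(tokens)
    N = len(samples)
    priors = {c: class_count[c] / N for c in class_count}  # P(Ci) = Cou(Ci)/N
    return priors, word_count, vocab

def predict(tokens, priors, word_count, vocab):
    """Assign the class maximizing P(Ci) * prod_j P(Bj|Ci)."""
    best, best_p = None, -1.0
    V = len(vocab)
    for c, prior in priors.items():
        total = sum(word_count[c].values())
        p = prior
        for w in tokens:
            # add-one smoothing (an assumption; not stated in the patent)
            p *= (word_count[c][w] + 1) / (total + V)
        if p > best_p:
            best, best_p = c, p
    return best

labelled = [(["road", "light", "broken"], "municipal"),
            (["street", "lamp", "broken"], "municipal"),
            (["noise", "loud", "night"], "environment")]
priors, wc, vocab = train_nb(labelled)
print(predict(["lamp", "broken"], priors, wc, vocab))  # municipal
```

For longer feature vectors, summing log-probabilities instead of multiplying raw probabilities is the usual way to avoid numerical underflow.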
The invention has the advantages that: the short text classification method based on Bayesian classification analyzes and classifies the short text content reported by users and distributes it to business units. In the core short text classification process, the source data first undergo data cleaning, regularized integration, and similar processing; part of the short text data is extracted as training data and labeled with classes according to the classification requirements. Then the cleaned short text content is segmented with the Python third-party library Jieba and keywords are extracted with TF-IDF. Because short text content is sparse, the keywords extracted by TF-IDF serve as a reference before Bayesian classification modeling; if too few keywords are extracted, the phrases obtained from segmenting the short text are used directly for classification modeling. A classification model is then built on the Bayesian formula according to the above steps, and the model is adjusted until the classification test precision is stable.
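The keyword fallback described above — use the TF-IDF keywords when enough were extracted, otherwise all segmented phrases — can be sketched as follows; the threshold of 3 is an illustrative value, since the patent only says "too few":

```python
def select_features(tokens, tfidf_keywords, min_keywords=3):
    """Use the TF-IDF keywords as the feature phrase set if there are enough
    of them; otherwise fall back to all segmented phrases (the patent's rule).
    min_keywords=3 is an assumed threshold, not taken from the patent."""
    if len(tfidf_keywords) >= min_keywords:
        return list(tfidf_keywords)
    return list(tokens)

print(select_features(["a", "b", "c", "d"], ["a", "b"]))       # too few: all phrases
print(select_features(["a", "b", "c", "d"], ["a", "b", "c"]))  # enough: keywords only
```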
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a flow chart of data processing in the present invention.
Detailed Description
In order to make the technical means, creative features, objectives, and effects of the invention easy to understand, the invention is further described below with reference to specific embodiments.
As shown in fig. 1 and fig. 2, a short text classification method based on Bayesian classification comprises the following steps:
(1) data preprocessing and category labeling:
step one: extract the reported historical short text data and process it with conventional data cleaning, data integration, and the like to improve data quality;
step two: complete the category labeling of the preliminarily cleaned data using the historically processed short texts, and manually label the categories of the currently unprocessed portion, completing the data preprocessing;
(2) completing word segmentation and incremental feature vector extraction of short text data, and mainly comprising the following two core steps:
step one: segment the cleaned short text content using the Python third-party library Jieba;
step two: extract the incremental feature vector and extract keywords in combination with TF-IDF; if the number of keywords is too small, use all segmented phrases directly as the final classification parameter input;
(3) establishing a short text classification model based on Bayes;
(4) dividing the processed data set into a training set and a testing set, carrying out classification model training, and carrying out model optimization according to the result of the training set;
(5) according to the trained model, inputting short text data of unknown classes, outputting the probability that the current input text belongs to each class, and selecting the class with the maximum probability as a result of final classification of the classes.
It is noted that the data preprocessing comprises the following four steps:
step one: clean and organize the raw data, splitting each text record into three fields: major serial number, minor serial number, and body text;
step two: store the processed data in a database;
step three: segment the content of the third field, the plain text, using Jieba;
step four: retain three words per row of the segmented output according to part of speech, and store them in the database.
In this embodiment, extracting feature keywords via the incremental feature vector and the TF-IDF feature word extraction method includes the following two steps:
step one: let B = (B1, B2, ..., Bu) be the feature vector composed of feature words extracted from the text; words that describe an existing feature word of the vector are merged and named as a new feature word B(u+1), and so on, so that for u = 5, 6, ..., m the incremental feature vector B = (B1, B2, ..., Bm) is obtained;
step two: if a word or phrase has a high term frequency (TF) in one article but rarely occurs in other articles, it is considered to have good discriminating power and to be suitable for classification. The TF-IDF feature extraction function is F(w) = TF(w) × IDF(w). Feature keyword extraction on the short text content follows this formula: first, the term frequency of feature word w is recorded as TF(w), and the term frequency TF is used together with the inverse document frequency IDF; then IDF(w) = log[N/(n(w)+1)] is computed, where N is the total number of texts and n(w) is the number of texts containing w.
In the present embodiment, for an input short text sample record, B = (B1, B2, ..., Bm) is the extracted feature vector and C1, C2, ..., Cn are the n classification results; P(Ci|B), i = 1, 2, ..., n, denotes the probability that the text to be classified belongs to the ith classification result; P(Bj|Ci), j = 1, 2, ..., m, i = 1, 2, ..., n, denotes the probability that the jth feature word belongs to the ith class. In the concrete calculation, the Bayesian formula gives:
P(Ci|B) = P(B|Ci) · P(Ci) / P(B)
When classifying a new text, only the P(Ci|B) of the n classes need to be calculated, and the new sample is assigned to the class with the highest probability value. Since P(B) is a constant independent of the class, and given the independence among the feature words of the feature vector B = (B1, B2, ..., Bm), the above formula simplifies to:
P(Ci|B) ∝ P(Ci) · P(B1|Ci) · P(B2|Ci) · ... · P(Bm|Ci)
In addition, the category of unknown short text information is calculated from the established model: let N be the total number of samples and Cou(Ci) the count of the ith class among them, so that P(Ci) = Cou(Ci)/N; let Cou(Bij) be the count of the jth feature word within the ith class, so that P(Bj|Ci) = Cou(Bij)/Cou(Ci). Finally, the probability of each class is calculated for the sample to be classified, and the class with the maximum probability is taken:
result class = argmax over i = 1, ..., n of P(Ci) · P(B1|Ci) · P(B2|Ci) · ... · P(Bm|Ci)
Based on the foregoing, the short text classification method based on Bayesian classification comprises the following steps: (1) data preprocessing and category labeling; (2) word segmentation and incremental feature vector extraction of the short text data; (3) establishing a Bayes-based short text classification model; (4) dividing the processed data set into a training set and a testing set, training the classification model, and optimizing the model according to the training results; (5) inputting short text data of unknown class into the trained model, outputting the probability that the input text belongs to each class, and selecting the class with the highest probability as the final classification result. The short text content reported by users is thus analyzed, classified, and distributed to business units. In the core short text classification process, the source data first undergo cleaning, regularized integration, and similar processing; part of the short text data is extracted as training data and labeled according to the classification requirements. The cleaned short text content is then segmented with the Python third-party library Jieba and keywords are extracted with TF-IDF; because short text content is sparse, the extracted keywords serve as a reference before Bayesian classification modeling, and if too few keywords are extracted, the segmented phrases are used directly for modeling. Finally, a classification model is built on the Bayesian formula according to the above steps, and the model is adjusted until the classification test precision is stable.
It will be appreciated by those skilled in the art that the invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The embodiments disclosed above are therefore to be considered in all respects as illustrative and not restrictive. All changes which come within the scope of or equivalence to the invention are intended to be embraced therein.

Claims (3)

1. A short text classification method based on Bayesian classification is characterized by comprising the following steps:
(1) data preprocessing and category labeling:
step one: extract the reported historical short text data and perform conventional data cleaning and data integration on it to improve data quality;
step two: complete the category labeling of the preliminarily cleaned data using the historically processed short texts, and manually label the categories of the currently unprocessed portion, completing the data preprocessing;
(2) completing word segmentation and incremental feature vector extraction of short text data, comprising the following two core steps:
step one: segment the cleaned short text content using the Python third-party library Jieba;
step two: extract the incremental feature vector and extract keywords in combination with TF-IDF; if the number of keywords is too small, use all segmented phrases directly as the final classification parameter input;
(3) establishing a short text classification model based on Bayes;
(4) dividing the processed data set into a training set and a testing set, carrying out classification model training, and carrying out model optimization according to the result of the training set;
(5) inputting short text data of unknown classes according to the trained model, outputting the probability that the current input text belongs to each class, and selecting the class with the maximum probability as a result of final classification of the classes;
the data preprocessing comprises the following four steps:
step one: clean and organize the raw data, splitting each text record into three fields: major serial number, minor serial number, and body text;
step two: store the processed data in a database;
step three: segment the content of the third field, the plain text, using Jieba;
step four: retain three words per row of the segmented output according to part of speech, and store them in the database;
extracting feature keywords via the incremental feature vector and the TF-IDF feature word extraction method comprises the following two steps:
step one: let B = (B1, B2, ..., Bu) be the feature vector composed of feature words extracted from the text; words that describe an existing feature word of the vector are merged and named as a new feature word B(u+1), and so on, so that for u = 5, 6, ..., m the incremental feature vector B = (B1, B2, ..., Bm) is obtained;
step two: if a word or phrase has a high term frequency (TF) in one article but rarely occurs in other articles, it is considered to have good discriminating power and to be suitable for classification. The TF-IDF feature extraction function is F(w) = TF(w) × IDF(w). Feature keyword extraction on the short text content follows this formula: first, the term frequency of feature word w is recorded as TF(w), and the term frequency TF is used together with the inverse document frequency IDF; then IDF(w) = log[N/(n(w)+1)] is computed, where N is the total number of texts and n(w) is the number of texts containing w.
2. The Bayesian classification-based short text classification method according to claim 1, wherein: for an input short text sample record, B = (B1, B2, ..., Bm) is the extracted feature vector and C1, C2, ..., Cn are the n classification results; P(Ci|B), i = 1, 2, ..., n, denotes the probability that the text to be classified belongs to the ith classification result; P(Bj|Ci), j = 1, 2, ..., m, i = 1, 2, ..., n, denotes the probability that the jth feature word belongs to the ith class. In the concrete calculation, the Bayesian formula gives:
P(Ci|B) = P(B|Ci) · P(Ci) / P(B)
When classifying a new text, only the P(Ci|B) of the n classes need to be calculated, and the new sample is assigned to the class with the highest probability value. Since P(B) is a constant independent of the class, and given the independence among the feature words of the feature vector B = (B1, B2, ..., Bm), the above formula simplifies to:
P(Ci|B) ∝ P(Ci) · P(B1|Ci) · P(B2|Ci) · ... · P(Bm|Ci)
3. The Bayesian classification-based short text classification method according to claim 1, wherein: the category of unknown short text information is calculated from the established model: let N be the total number of samples and Cou(Ci) the count of the ith class among them, so that P(Ci) = Cou(Ci)/N; let Cou(Bij) be the count of the jth feature word within the ith class, so that P(Bj|Ci) = Cou(Bij)/Cou(Ci); finally, the probability of each class is calculated for the sample to be classified, and the class with the maximum probability is taken:
result class = argmax over i = 1, ..., n of P(Ci) · P(B1|Ci) · P(B2|Ci) · ... · P(Bm|Ci)
CN201810951636.2A 2018-08-21 2018-08-21 Short text classification method based on Bayesian classification Active CN109165294B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810951636.2A CN109165294B (en) 2018-08-21 2018-08-21 Short text classification method based on Bayesian classification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810951636.2A CN109165294B (en) 2018-08-21 2018-08-21 Short text classification method based on Bayesian classification

Publications (2)

Publication Number Publication Date
CN109165294A CN109165294A (en) 2019-01-08
CN109165294B true CN109165294B (en) 2021-09-24

Family

ID=64896189

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810951636.2A Active CN109165294B (en) 2018-08-21 2018-08-21 Short text classification method based on Bayesian classification

Country Status (1)

Country Link
CN (1) CN109165294B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112256865B (en) * 2019-01-31 2023-03-21 青岛科技大学 Chinese text classification method based on classifier
CN110287316A (en) * 2019-06-04 2019-09-27 深圳前海微众银行股份有限公司 A kind of Alarm Classification method, apparatus, electronic equipment and storage medium
CN110619363A (en) * 2019-09-17 2019-12-27 陕西优百信息技术有限公司 Classification method for subclass names corresponding to long description of material data
CN111159414B (en) * 2020-04-02 2020-07-14 成都数联铭品科技有限公司 Text classification method and system, electronic equipment and computer readable storage medium
CN111488459B (en) * 2020-04-15 2022-07-22 焦点科技股份有限公司 Product classification method based on keywords
CN111985222B (en) * 2020-08-24 2023-07-18 平安国际智慧城市科技股份有限公司 Text keyword recognition method and related equipment
CN112084308A (en) * 2020-09-16 2020-12-15 中国信息通信研究院 Method, system and storage medium for text type data recognition
CN112214598B (en) * 2020-09-27 2023-01-13 吾征智能技术(北京)有限公司 Cognitive system based on hair condition
CN112559748A (en) * 2020-12-18 2021-03-26 厦门市法度信息科技有限公司 Method for classifying stroke record data records, terminal equipment and storage medium
CN112883159A (en) * 2021-02-25 2021-06-01 北京精准沟通传媒科技股份有限公司 Method, medium, and electronic device for generating hierarchical category label for domain evaluation short text
CN113869356A (en) * 2021-08-17 2021-12-31 杭州华亭科技有限公司 Method for judging escape tendency of people based on Bayesian classification
CN114528404A (en) * 2022-02-18 2022-05-24 浪潮卓数大数据产业发展有限公司 Method and device for identifying provincial and urban areas
CN116956930A (en) * 2023-09-20 2023-10-27 北京九栖科技有限责任公司 Short text information extraction method and system integrating rules and learning models

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8725732B1 (en) * 2009-03-13 2014-05-13 Google Inc. Classifying text into hierarchical categories
CN104850650A (en) * 2015-05-29 2015-08-19 清华大学 Short-text expanding method based on similar-label relation
WO2016090197A1 (en) * 2014-12-05 2016-06-09 Lightning Source Inc. Automated content classification/filtering
CN106407482A (en) * 2016-12-01 2017-02-15 合肥工业大学 Multi-feature fusion-based online academic report classification method
CN107066553A (en) * 2017-03-24 2017-08-18 北京工业大学 Short text classification method based on convolutional neural networks and random forest

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Chinese Short Text Classification Based on Wikipedia; Fan Yunjie, Liu Huailiang; New Technology of Library and Information Service (《现代图书情报技术》); 2012-12-31; full text *

Also Published As

Publication number Publication date
CN109165294A (en) 2019-01-08

Similar Documents

Publication Publication Date Title
CN109165294B (en) Short text classification method based on Bayesian classification
CN109933670B (en) Text classification method for calculating semantic distance based on combined matrix
CN105183833A (en) User model based microblogging text recommendation method and recommendation apparatus thereof
CN112163424A (en) Data labeling method, device, equipment and medium
CN108596637B (en) Automatic E-commerce service problem discovery system
CN111191442A (en) Similar problem generation method, device, equipment and medium
CN111462752A (en) Client intention identification method based on attention mechanism, feature embedding and BiLSTM
TWI828928B (en) Highly scalable, multi-label text classification methods and devices
CN107818173B (en) Vector space model-based Chinese false comment filtering method
CN116304020A (en) Industrial text entity extraction method based on semantic source analysis and span characteristics
CN111966888B (en) Aspect class-based interpretability recommendation method and system for fusing external data
CN115409018A (en) Company public opinion monitoring system and method based on big data
CN113360647B (en) 5G mobile service complaint source-tracing analysis method based on clustering
CN115146062A (en) Intelligent event analysis method and system fusing expert recommendation and text clustering
CN111754208A (en) Automatic screening method for recruitment resumes
CN113360582B (en) Relation classification method and system based on BERT model fusion multi-entity information
CN111178080A (en) Named entity identification method and system based on structured information
CN114722198A (en) Method, system and related device for determining product classification code
CN114239579A (en) Electric power searchable document extraction method and device based on regular expression and CRF model
CN109871889B (en) Public psychological assessment method under emergency
CN111859955A (en) Public opinion data analysis model based on deep learning
CN111104422A (en) Training method, device, equipment and storage medium of data recommendation model
CN115033689A (en) Original network Euclidean distance calculation method based on small sample text classification
CN113239277A (en) Probability matrix decomposition recommendation method based on user comments
CN115080732A (en) Complaint work order processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: 241000 room 01, 18 / F, iFLYTEK intelligent building, No. 9, Wenjin West Road, Yijiang District, Wuhu City, Anhui Province

Patentee after: ANHUI XUNFEI INTELLIGENT TECHNOLOGY Co.,Ltd.

Address before: 241000 Floor 9, block A1, Wanjiang Fortune Plaza, Jiujiang District, Wuhu City, Anhui Province

Patentee before: ANHUI XUNFEI INTELLIGENT TECHNOLOGY Co.,Ltd.
