CN110647919A - Text clustering method and system based on K-means clustering and capsule network - Google Patents

Text clustering method and system based on K-means clustering and capsule network Download PDF

Info

Publication number
CN110647919A
CN110647919A · Application CN201910794559.9A
Authority
CN
China
Prior art keywords
document
text
clustering
capsule
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910794559.9A
Other languages
Chinese (zh)
Inventor
张伟
汤旭东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Normal University filed Critical East China Normal University
Priority to CN201910794559.9A priority Critical patent/CN110647919A/en
Publication of CN110647919A publication Critical patent/CN110647919A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/353Clustering; Classification into predefined classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Abstract

The invention provides a text clustering method based on K-means clustering and a capsule network, which comprises the following steps: acquire text data and preprocess it, then train word2vec on the data set to obtain word representations; average the word vectors of all words in a document to obtain the vector representation of that document; generate pseudo-labels by applying K-means clustering to these document vectors; finally, take the word sequences, word vectors and pseudo-labels as training data, train a capsule-network-based classifier while keeping the training loss within a controlled range, and cluster with the trained classifier. By combining K-means clustering with a capsule network, the method converts the unsupervised text clustering problem into a supervised classification problem and further improves the clustering quality over traditional clustering methods. The invention also provides a text clustering system based on K-means clustering and a capsule network.

Description

Text clustering method and system based on K-means clustering and capsule network
Technical Field
The invention relates to the field of natural language processing, in particular to a text clustering method and a text clustering system for converting an unsupervised task into a supervised task by utilizing K-means clustering and a capsule network.
Background
In recent years, with the rapid development of internet technology, massive network data is continuously generated, and in information storage, text is the most widely used form, and massive information is stored in text form. Text mining techniques investigate how to mine interesting, valuable information from various forms of text data. One branch of text mining is text clustering, and the method is widely applied to the directions of pattern recognition, topic recognition, recommendation systems and the like.
Text clustering applies a clustering algorithm to texts and is an important component of text mining technology. Applied to a search engine, it allows users to find the information they want quickly and effectively; it can extract the day's hot topics from news gathered from various channels, or, combined with a user's history, recommend content of interest. Text clustering is an unsupervised machine learning method; unlike supervised methods, it offers greater flexibility and automatic processing capability.
Disclosure of Invention
The invention applies a capsule network to text clustering for the first time, converting the unsupervised problem into a supervised one. Pseudo-labels are generated with K-means clustering, and a capsule network is then trained with a controlled training loss, so that the final clustering outperforms the clustering method that generated the pseudo-labels.
During feature learning, the capsule network exploits latent characteristics of the data to reassign the documents lying near the fuzzy boundaries of the K-means pseudo-labels, achieving a better clustering result.
The text clustering method provided by the invention comprises the following steps:
firstly, selecting a text data set, and preprocessing text data in the text data set;
secondly, converting the text sequence into vector characteristic representation by using word vectors;
thirdly, averaging word vectors of each document to serve as vector feature representation of the document, and carrying out K-means clustering on the representation of the document to generate a pseudo label of the document;
fourthly, taking the word vectors of the documents and the pseudo labels generated in the third step as training data, training a classifier based on a capsule network; the classifier is deliberately not trained to convergence, so that a certain training loss is retained;
and fifthly, clustering the text data set by using the trained capsule network classifier.
In the present invention, text data includes, but is not limited to, data from network platforms such as Twitter, microblogs and news sites.
In the first step, preprocessing the text data means the following: because the text contains words and characters that carry no information, stop words, special symbols and links are removed from it.
The stop words are words or phrases that occur very frequently in English but can be removed without affecting overall understanding; they are usually articles, prepositions, adverbs or conjunctions.
The special symbols include ordinary commas and periods, mathematical symbols, emoticons and the like.
The links are website links describing objects; they are removed during data preprocessing.
In the second step, the word vector is used to convert the text sequence into vector feature representation, specifically:
training the preprocessed text data with the word vector model word2vec to learn a vector representation of each word in the whole data set; the dimension of the word vector is D_e.
In the third step, a vector representation of the document is generated as follows:
for each document d_i of the data set, the word vectors of its N_i words form an N_i × D_e matrix according to the acquired word vectors; average pooling over the first dimension yields a D_e-dimensional vector representation of the document;
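The document-vector averaging described above can be sketched in Python with NumPy (a toy example; the function name and array values are illustrative, not from the patent):

```python
import numpy as np

def document_vector(word_vectors: np.ndarray) -> np.ndarray:
    """Average-pool an (N_i, D_e) matrix of word vectors over the
    first dimension, yielding a D_e-dimensional document vector."""
    return word_vectors.mean(axis=0)

# A toy document with N_i = 3 words and D_e = 4 dimensions.
doc = np.array([[1.0, 0.0, 2.0, 0.0],
                [3.0, 0.0, 0.0, 0.0],
                [2.0, 0.0, 1.0, 0.0]])
vec = document_vector(doc)  # -> array([2., 0., 1., 0.])
```

In practice the word vectors would come from the word2vec model trained in the second step; the averaging itself is a single mean over the word axis.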
in the third step, the pseudo tag is generated as follows:
Suppose the data set contains M documents in total; vectorizing them yields an M × D_e matrix. K-means clustering is performed on these vectors, where the value of K can be chosen according to actual needs. The K-means cluster assignment of each document is recorded as the pseudo-label of that document.
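The pseudo-label step can be sketched as follows. The patent's embodiment uses scikit-learn's KMeans; the pure-NumPy Lloyd iteration below, with a farthest-first initialization, is only a stand-in sketch of the same idea:

```python
import numpy as np

def kmeans_pseudo_labels(X: np.ndarray, k: int, n_iter: int = 20) -> np.ndarray:
    """Cluster the (M, D_e) document matrix X into k groups and return
    one pseudo-label per document (a toy Lloyd's algorithm)."""
    # Deterministic farthest-first init: start from X[0], then repeatedly
    # add the point farthest from all centers chosen so far.
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.stack(centers)
    for _ in range(n_iter):
        # Assign each document to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its assigned documents.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Two well-separated blobs of document vectors (M = 6, D_e = 2).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels = kmeans_pseudo_labels(X, k=2)
```

The resulting label array plays the role of the pseudo-labels fed to the capsule network classifier in the fourth step.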
In the fourth step, the word vector representation of a document is constructed as follows: a maximum document length N is specified for the data set. For a document d_i with N_i words, if N_i ≥ N the document is truncated to its first N words; otherwise the document is padded with N − N_i copies of a special character ε representing a blank. Finally, each word in the document is replaced in sequence by the corresponding word vector trained in the second step, and ε is replaced by an all-zero vector. Each document thus corresponds to an N × D_e word vector matrix.
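The truncation-and-padding scheme above can be sketched like this (a minimal sketch; the toy embedding dictionary stands in for the trained word2vec vectors):

```python
import numpy as np

def to_matrix(words, embed, N):
    """Build the N x D_e matrix for one document: truncate to the first
    N words, or pad with the all-zero vector that replaces the blank
    character. `embed` maps word -> D_e vector."""
    D_e = len(next(iter(embed.values())))
    rows = [embed[w] for w in words[:N]]          # truncate if too long
    rows += [[0.0] * D_e] * (N - len(rows))       # zero-pad if too short
    return np.array(rows)

embed = {"cat": [1.0, 2.0], "sat": [3.0, 4.0]}   # toy D_e = 2 embeddings
m = to_matrix(["cat", "sat"], embed, N=4)        # padded to 4 rows
```

A document longer than N would simply be cut to its first N rows, matching the N_i ≥ N case in the text.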
In the fourth step, the classifier based on the capsule network uses convolution in the shallow layers; the deep layers use a dynamic routing mechanism, and the norm of each capsule output by the last layer represents the probability of the corresponding category. The classifier comprises the following:
(1) the input is an N × D_e matrix, where N is the maximum sentence length and D_e is the dimension of the word vector;
(2) n-gram convolutional layer: let W_a be a sliding window of size K_1 × D_e; convolving the input with it yields a feature map whose i-th entry is

m_i = f(W_a ∘ X_{i:i+K_1−1} + b_0)

where ∘ denotes element-wise multiplication, b_0 is the bias term, and f is the ReLU activation function. Thus, with B sliding windows of the same size, a feature matrix of size (L − K_1 + 1) × B is obtained (L being the sentence length):

M = [m_1, m_2, ..., m_B]
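The n-gram convolution above can be sketched as follows (a toy sketch; window values and input are illustrative, and `ngram_conv` is a name introduced here, not from the patent):

```python
import numpy as np

def ngram_conv(X, windows, b0=0.0):
    """N-gram convolution: each window W_a (K1 x D_e) slides over the
    N x D_e input; m_i = ReLU(sum(W_a * X[i:i+K1]) + b0). With B windows
    the result is an (N - K1 + 1) x B feature matrix."""
    K1 = windows[0].shape[0]
    N = X.shape[0]
    relu = lambda z: np.maximum(z, 0.0)
    M = np.empty((N - K1 + 1, len(windows)))
    for b, Wa in enumerate(windows):
        for i in range(N - K1 + 1):
            # Element-wise product of the window with one K1-gram, summed.
            M[i, b] = relu(np.sum(Wa * X[i:i + K1]) + b0)
    return M

X = np.arange(8.0).reshape(4, 2)               # N = 4, D_e = 2
windows = [np.ones((2, 2)), -np.ones((2, 2))]  # B = 2 windows, K1 = 2
M = ngram_conv(X, windows)                      # shape (3, 2)
```

In a real model the windows would be learned parameters and the loops replaced by a framework's conv primitive; the sketch only shows the shape arithmetic (N − K_1 + 1 positions per window).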
(3) primary capsule layer: capsules are introduced here; the i-th capsule is

p_i = g((W_b)^T M_i + b_1)

where W_b is a weight matrix of dimension B × d, d is the dimension of a capsule, M_i is the B-dimensional vector given by the i-th component of the previous layer's output, and g is the squash function. The output of this layer can then be written as

P = [p_1, p_2, ..., p_C]

i.e. (L − K_1 + 1) × C capsules of dimension d;
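The squash function g above can be sketched as follows (the weight values are toy assumptions; only the squash formula itself follows the standard capsule-network definition):

```python
import numpy as np

def squash(s, eps=1e-9):
    """Squash nonlinearity g: shrinks short vectors toward 0 and long
    vectors toward unit norm, so a capsule's norm can act as a probability."""
    n2 = np.sum(s * s, axis=-1, keepdims=True)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

v = squash(np.array([3.0, 4.0]))   # input norm 5 -> output norm 25/26

# One toy primary capsule: p_i = g((W_b)^T M_i) with B = 3, d = 2.
Wb = 0.1 * np.ones((3, 2))
Mi = np.array([1.0, 2.0, 3.0])
pi = squash(Wb.T @ Mi)
```

Because the squashed norm is always below 1, the output-layer capsule norms can be read directly as class probabilities in the fifth step.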
(4) fully connected capsule layer:

û_{j|i} = W_{ij} p_i

where W_{ij} is a shared weight matrix; a dynamic routing algorithm is then used to compute the upper-layer capsules v_j.
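The dynamic routing between the prediction vectors û_{j|i} and the upper capsules v_j can be sketched as below. This follows the standard routing-by-agreement procedure of capsule networks; the toy predictions are assumptions for illustration:

```python
import numpy as np

def squash(s, eps=1e-9):
    n2 = np.sum(s * s, axis=-1, keepdims=True)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

def dynamic_routing(u_hat, n_iter=3):
    """Routing-by-agreement over predictions u_hat of shape
    (num_lower, num_upper, d): coupling logits b start at 0 and grow
    where a lower capsule's prediction agrees with the upper capsule."""
    n_lower, n_upper, d = u_hat.shape
    b = np.zeros((n_lower, n_upper))
    for _ in range(n_iter):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # softmax
        s = (c[:, :, None] * u_hat).sum(axis=0)               # (n_upper, d)
        v = squash(s)
        b = b + (u_hat * v[None, :, :]).sum(axis=2)           # agreement
    return v

# 2 lower capsules: both agree on upper 0, disagree on upper 1.
u_hat = np.array([[[1.0, 0.0], [0.0, 1.0]],
                  [[1.0, 0.0], [0.0, -1.0]]])
v = dynamic_routing(u_hat)
```

Agreement makes the first upper capsule's norm large while the conflicting predictions for the second cancel out, which is exactly the behavior the patent relies on to reassign ambiguous pseudo-labeled documents.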
In the fourth step, the classifier based on the capsule network adopts the following loss function during training:

L_k = T_k max(0, m^+ − ||v_k||)^2 + λ(1 − T_k) max(0, ||v_k|| − m^−)^2

where T_k = 1 if and only if the label of the text is category k (and T_k = 0 otherwise), ||v_k|| is the norm of the k-th capsule of the output layer, and m^+, m^−, λ are adjustable hyper-parameters, e.g. m^+ = 0.9, m^− = 0.1, λ = 0.5;
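The margin loss above can be written directly from the formula (a minimal sketch; the capsule norms passed in are illustrative values):

```python
import numpy as np

def margin_loss(v_norms, true_k, m_pos=0.9, m_neg=0.1, lam=0.5):
    """L_k = T_k max(0, m+ - ||v_k||)^2 + lam (1 - T_k) max(0, ||v_k|| - m-)^2,
    summed over the K output capsules; T_k = 1 only for the true class."""
    T = np.zeros_like(v_norms)
    T[true_k] = 1.0
    pos = np.maximum(0.0, m_pos - v_norms) ** 2
    neg = np.maximum(0.0, v_norms - m_neg) ** 2
    return float(np.sum(T * pos + lam * (1 - T) * neg))

# True class 0: its capsule norm already reaches m+, so only the third
# capsule (norm 0.3 > m-) contributes: 0.5 * (0.3 - 0.1)^2 = 0.02.
loss = margin_loss(np.array([0.9, 0.1, 0.3]), true_k=0)
```

During training the patent keeps this loss within a band (e.g. 0.2 ± 0.01 in the embodiment) rather than driving it to zero.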
In the fifth step, when clustering texts with the trained capsule network, the index of the output capsule with the largest norm is taken, namely:

prediction(x) = argmax_j ||v_j||.
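The prediction rule is a one-liner (toy capsule outputs shown for illustration):

```python
import numpy as np

def predict(v):
    """Cluster assignment: index of the output capsule with largest norm."""
    return int(np.argmax(np.linalg.norm(v, axis=1)))

v = np.array([[0.1, 0.1], [0.6, 0.5], [0.2, 0.0]])  # K = 3 capsules, d = 2
label = predict(v)   # capsule 1 has the largest norm
```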
based on the method, the invention also provides a text clustering system based on K-means clustering and a capsule network, which comprises the following steps:
the input representation unit is used for preprocessing the text data and serializing the text data by using the word vectors;
the pseudo label generating unit is used for clustering the preprocessed data with the K-means algorithm to obtain pseudo labels;
and the class label generating unit is used for training the classifier based on the capsule network by adopting the serialized text data and the pseudo label, controlling the training loss and acquiring the network output.
Compared with the prior art, the beneficial effects of the invention include: by combining K-means clustering with a capsule network, the unsupervised text clustering problem is converted into a supervised classification problem, and the clustering quality is further improved over traditional clustering methods.
Drawings
FIG. 1 is a flow chart of the text clustering method according to the present invention.
FIG. 2 is a flow chart of data processing in an example of the present invention.
FIG. 3 is a diagram of the model architecture of the capsule network classifier in an example of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following specific examples and the accompanying drawings. Except for the contents specifically mentioned below, the procedures, conditions and experimental methods for carrying out the invention are common general knowledge in the art, and the invention is not particularly limited thereto. It should be noted that persons skilled in the art can make variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the present invention.
The text clustering method provided by the invention, as shown in fig. 1, comprises the following steps:
firstly, selecting a text data set, and preprocessing text data in the text data set;
secondly, converting the text sequence into vector characteristic representation by using word vectors;
thirdly, averaging word vectors of each document to serve as vector feature representation of the document, and carrying out K-means clustering on the representation of the document to generate a pseudo label of the document;
fourthly, taking the word vectors of the documents and the pseudo labels generated in the third step as training data, training a classifier based on a capsule network; the classifier is deliberately not trained to convergence, so that a certain training loss is retained;
and fifthly, clustering the text data set by using the trained capsule network classifier.
The specific flow of this embodiment is shown in fig. 1.
Firstly, selecting a text data set Google News;
for the selected original text data, the following describes the conversion manner of the data:
vectorized representation of text:
(a) For the preprocessed text, a word vector representation of each word in the data set is learned using the word vector model word2vec; the dimension of the word vector is D_e.
(b) For each document d_i of the data set, the word vectors of its N_i words form an N_i × D_e matrix according to the acquired word vectors; average pooling over the first dimension yields a D_e-dimensional representation of the document.
Then the KMeans module of scikit-learn is used: the number of clusters K is specified and K-means clustering is performed on the vectorized texts to generate the pseudo-labels of the documents.
Then the maximum document length N is specified. For a document d_i with N_i words, if N_i ≥ N the document is truncated to its first N words; otherwise it is padded with N − N_i copies of a special character ε representing a blank. Each word is then replaced in sequence by the corresponding word vector trained in the second step, and ε is replaced by an all-zero vector, so that each document corresponds to an N × D_e word vector matrix.
The word vector matrices of the documents and the corresponding pseudo-labels are used as training data for the capsule network classifier, and the number of capsules in the network's output layer is set equal to the K of the K-means clustering. The structure of the capsule network in this example is shown in fig. 3. Training is stopped once the training loss falls within a certain range, such as 0.2 ± 0.01.
Finally, the trained capsule network classifier assigns a category to each document in the data set; by the definition of the capsule network, the category of a document is the index of the output-layer capsule with the largest norm.
The method can also be applied to other various text data sets, and the specific process is not described in detail.
The invention provides a text clustering system based on K-means clustering and a capsule network, which comprises the following steps:
the input representation unit is used for preprocessing the text data and serializing the text data by using the word vectors;
the pseudo label generating unit is used for clustering the preprocessed data with the K-means algorithm to obtain pseudo labels;
and the class label generating unit is used for training the classifier based on the capsule network by adopting the serialized text data and the pseudo label, controlling the training loss and acquiring the network output.
The parameters in the above embodiments are determined from experimental results: different parameter combinations are tested and the group with the best accuracy is selected. The parameters can be adjusted appropriately according to requirements while still achieving the purpose of the invention.
The protection of the present invention is not limited to the above embodiments. Variations and advantages that may occur to those skilled in the art may be incorporated into the invention without departing from the spirit and scope of the inventive concept, which is set forth in the following claims.

Claims (8)

1. A text clustering method based on K-means clustering and a capsule network is characterized by comprising the following steps:
selecting a text data set, and preprocessing text data in the text data set;
converting the text sequence into vector characteristic representation by using the word vector;
averaging word vectors of each document to serve as vector feature representation of the document, and carrying out K-means clustering on the representation of the document to generate a pseudo label of the document;
step four, taking the word vector of the document and the pseudo label as training data to train a classifier based on a capsule network;
and step five, clustering the text data set by using the trained capsule network classifier.
2. The method according to claim 1, wherein in the first step, preprocessing the text data comprises: removing stop words, special symbols and links.
3. The text clustering method according to claim 1, wherein the second step specifically comprises: training the preprocessed text data with the word vector model word2vec to learn the word vector representation of each word in the whole text data set; the dimension of the word vector is D_e.
4. The text clustering method according to claim 1, wherein in the third step, the pseudo label of the document is generated according to the following steps:
(1) for each document d_i of the text data set, the word vectors of its N_i words form an N_i × D_e matrix according to the acquired word vectors; average pooling over the first dimension yields a D_e-dimensional text representation;
(2) suppose the text data set contains M documents in total; K-means clustering is performed on the M × D_e matrix obtained in step (1), and the K-means cluster assignment of each document is recorded as the pseudo-label of that document.
5. The text clustering method according to claim 1, wherein in the fourth step, the word vector representation of a document is constructed as follows: a maximum document length N is specified for the data set; for a document d_i with N_i words, if N_i ≥ N the document is truncated to its first N words, otherwise it is padded with N − N_i copies of a special character ε representing a blank; finally, each word in the document is replaced in sequence by the corresponding word vector trained in the second step, and ε is replaced by an all-zero vector, so that each document corresponds to an N × D_e word vector matrix.
6. The text clustering method according to claim 1, wherein in the fourth step, the classifier based on the capsule network uses convolution in the shallow layers; the deep layers use a dynamic routing mechanism, with the norm of each capsule output by the last layer representing the probability of the corresponding class.
7. The text clustering method according to claim 6, wherein the probability calculation for each category comprises the steps of:
(1) the input is an N × D_e matrix, where N is the maximum sentence length and D_e is the dimension of the word vector;
(2) n-gram convolutional layer: let W_a be a sliding window of size K_1 × D_e; convolving the input with it yields a feature map whose i-th entry is

m_i = f(W_a ∘ X_{i:i+K_1−1} + b_0)

where ∘ denotes element-wise multiplication, b_0 is the bias term, and f is the ReLU activation function; thus, with B sliding windows of the same size, a feature matrix of size (L − K_1 + 1) × B is obtained (L being the sentence length):

M = [m_1, m_2, ..., m_B]
(3) primary capsule layer: capsules are introduced here; the i-th capsule is

p_i = g((W_b)^T M_i + b_1)

where W_b is a weight matrix of dimension B × d, d is the dimension of a capsule, M_i is the B-dimensional vector given by the i-th component of the previous layer's output, and g is the squash function; the output of the primary capsule layer is then

P = [p_1, p_2, ..., p_C]

i.e. (L − K_1 + 1) × C capsules of dimension d;
(4) fully connected capsule layer:

û_{j|i} = W_{ij} p_i

where W_{ij} is a shared weight matrix; a dynamic routing algorithm is used to compute the upper-layer capsules v_j;
in the fourth step, the classifier based on the capsule network adopts the following loss function during training:

L_k = T_k max(0, m^+ − ||v_k||)^2 + λ(1 − T_k) max(0, ||v_k|| − m^−)^2

where T_k = 1 if and only if the label of the text is category k (and T_k = 0 otherwise), ||v_k|| is the norm of the k-th capsule of the output layer, and m^+, m^−, λ are adjustable hyper-parameters;
in the fifth step, when clustering texts with the trained capsule network, the index of the output capsule with the largest norm is taken, that is:

prediction(x) = argmax_j ||v_j||.
8. a text clustering system based on K-means clustering and capsule networks, characterized in that the text clustering method according to any one of claims 1 to 7 is used, the system comprising the following:
the input representation unit is used for preprocessing the text data and serializing the text data by using the word vectors;
the pseudo label generating unit is used for clustering the preprocessed data with the K-means algorithm to obtain pseudo labels;
and the class label generating unit is used for training the classifier based on the capsule network by adopting the serialized text data and the pseudo label, controlling the training loss and acquiring the network output.
CN201910794559.9A 2019-08-27 2019-08-27 Text clustering method and system based on K-means clustering and capsule network Pending CN110647919A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910794559.9A CN110647919A (en) 2019-08-27 2019-08-27 Text clustering method and system based on K-means clustering and capsule network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910794559.9A CN110647919A (en) 2019-08-27 2019-08-27 Text clustering method and system based on K-means clustering and capsule network

Publications (1)

Publication Number Publication Date
CN110647919A true CN110647919A (en) 2020-01-03

Family

ID=69009820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910794559.9A Pending CN110647919A (en) 2019-08-27 2019-08-27 Text clustering method and system based on K-means clustering and capsule network

Country Status (1)

Country Link
CN (1) CN110647919A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460818A (en) * 2020-03-31 2020-07-28 中国测绘科学研究院 Web page text classification method based on enhanced capsule network and storage medium
CN111737456A (en) * 2020-05-15 2020-10-02 恩亿科(北京)数据科技有限公司 Corpus information processing method and apparatus
CN112115259A (en) * 2020-06-17 2020-12-22 上海金融期货信息技术有限公司 Feature word driven text multi-label hierarchical classification method and system
CN112235434A (en) * 2020-10-16 2021-01-15 重庆理工大学 DGA network domain name detection and identification system fusing k-means and capsule network thereof
CN112261028A (en) * 2020-10-16 2021-01-22 重庆理工大学 DGA botnet domain name detection method based on capsule network and k-means
WO2021247610A1 (en) * 2020-06-01 2021-12-09 Cognizer, Inc. Semantic frame identification using capsule networks

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070094500A1 (en) * 2005-10-20 2007-04-26 Marvin Shannon System and Method for Investigating Phishing Web Sites
CN107145503A (en) * 2017-03-20 2017-09-08 中国农业大学 Remote supervision non-categorical relation extracting method and system based on word2vec
CN108595632A (en) * 2018-04-24 2018-09-28 福州大学 A kind of hybrid neural networks file classification method of fusion abstract and body feature
CN109118479A (en) * 2018-07-26 2019-01-01 中睿能源(北京)有限公司 Defects of insulator identification positioning device and method based on capsule network
CN109492678A (en) * 2018-10-24 2019-03-19 浙江工业大学 A kind of App classification method of integrated shallow-layer and deep learning
CN109784405A (en) * 2019-01-16 2019-05-21 山东建筑大学 Cross-module state search method and system based on pseudo label study and semantic consistency
CN110046671A (en) * 2019-04-24 2019-07-23 吉林大学 A kind of file classification method based on capsule network
CN110059181A (en) * 2019-03-18 2019-07-26 中国科学院自动化研究所 Short text stamp methods, system, device towards extensive classification system
CN110097096A (en) * 2019-04-16 2019-08-06 天津大学 A kind of file classification method based on TF-IDF matrix and capsule network

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070094500A1 (en) * 2005-10-20 2007-04-26 Marvin Shannon System and Method for Investigating Phishing Web Sites
CN107145503A (en) * 2017-03-20 2017-09-08 中国农业大学 Remote supervision non-categorical relation extracting method and system based on word2vec
CN108595632A (en) * 2018-04-24 2018-09-28 福州大学 A kind of hybrid neural networks file classification method of fusion abstract and body feature
CN109118479A (en) * 2018-07-26 2019-01-01 中睿能源(北京)有限公司 Defects of insulator identification positioning device and method based on capsule network
CN109492678A (en) * 2018-10-24 2019-03-19 浙江工业大学 A kind of App classification method of integrated shallow-layer and deep learning
CN109784405A (en) * 2019-01-16 2019-05-21 山东建筑大学 Cross-module state search method and system based on pseudo label study and semantic consistency
CN110059181A (en) * 2019-03-18 2019-07-26 中国科学院自动化研究所 Short text stamp methods, system, device towards extensive classification system
CN110097096A (en) * 2019-04-16 2019-08-06 天津大学 A kind of file classification method based on TF-IDF matrix and capsule network
CN110046671A (en) * 2019-04-24 2019-07-23 吉林大学 A kind of file classification method based on capsule network

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
HAO REN 等: "COMPOSITIONAL CODING CAPSULE NETWORK WITH K-MEANS ROUTING FOR TEXT CLASSIFICATION", 《ARXIV》 *
WEI ZHAO 等: "Investigating Capsule Networks with Dynamic Routing for Text Classification", 《ARXIV》 *
曾谁飞 et al.: "A New Text Representation Model Method Based on Neural Networks", Journal on Communications (《通信学报》) *
衷路生 et al.: "Research on Bearing Fault Diagnosis with Multi-level Neural Networks", Computer Engineering and Applications (《计算机工程与应用》) *
阳馨 et al.: "A Chinese Text Classification Algorithm Based on Multiple Feature Pooling", Journal of Sichuan University (Natural Science Edition) *
陈培新: "Research on Vector Representation and Modeling Methods of Text Semantics", China Master's Theses Full-text Database, Information Science and Technology *
陈龙 et al.: "Progress in Sentiment Classification Research", Journal of Computer Research and Development (《计算机研究与发展》) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460818A (en) * 2020-03-31 2020-07-28 中国测绘科学研究院 Web page text classification method based on enhanced capsule network and storage medium
CN111737456A (en) * 2020-05-15 2020-10-02 恩亿科(北京)数据科技有限公司 Corpus information processing method and apparatus
WO2021247610A1 (en) * 2020-06-01 2021-12-09 Cognizer, Inc. Semantic frame identification using capsule networks
CN112115259A (en) * 2020-06-17 2020-12-22 上海金融期货信息技术有限公司 Feature word driven text multi-label hierarchical classification method and system
CN112235434A (en) * 2020-10-16 2021-01-15 重庆理工大学 DGA network domain name detection and identification system fusing k-means and capsule network thereof
CN112261028A (en) * 2020-10-16 2021-01-22 重庆理工大学 DGA botnet domain name detection method based on capsule network and k-means

Similar Documents

Publication Publication Date Title
CN110866117B (en) Short text classification method based on semantic enhancement and multi-level label embedding
CN111368996B (en) Retraining projection network capable of transmitting natural language representation
CN109325231B (en) Method for generating word vector by multitasking model
CN110647919A (en) Text clustering method and system based on K-means clustering and capsule network
CN111160037B (en) Fine-grained emotion analysis method supporting cross-language migration
CN110046248B (en) Model training method for text analysis, text classification method and device
Dashtipour et al. Exploiting deep learning for Persian sentiment analysis
CN110263325B (en) Chinese word segmentation system
CN111078833A (en) Text classification method based on neural network
CN112925904B (en) Lightweight text classification method based on Tucker decomposition
CN111222318A (en) Trigger word recognition method based on two-channel bidirectional LSTM-CRF network
Chen et al. Deep neural networks for multi-class sentiment classification
CN111507093A (en) Text attack method and device based on similar dictionary and storage medium
CN112988970A (en) Text matching algorithm serving intelligent question-answering system
Chakravarthy et al. HYBRID ARCHITECTURE FOR SENTIMENT ANALYSIS USING DEEP LEARNING.
VeeraSekharReddy et al. An attention based bi-LSTM DenseNet model for named entity recognition in english texts
Yang et al. Text classification based on convolutional neural network and attention model
CN113065350A (en) Biomedical text word sense disambiguation method based on attention neural network
Litvinov Research of neural network methods of text information classification
CN107729509A (en) The chapter similarity decision method represented based on recessive higher-dimension distributed nature
Ariwibowo et al. Hate Speech Text Classification Using Long Short-Term Memory (LSTM)
Fan et al. Multi-label Chinese question classification based on word2vec
Kumari et al. An integrated single framework for text, image and voice for sentiment mining of social media posts
Rao et al. Algorithm for using NLP with extremely small text datasets
Kim Research on Text Classification Based on Deep Neural Network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200103

RJ01 Rejection of invention patent application after publication