CN112347255B - Text classification method based on title and text combination of graph network - Google Patents
- Publication number
- CN112347255B (application CN202011233244.6A; first published as CN112347255A)
- Authority
- CN
- China
- Prior art keywords
- text
- word
- title
- document
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a text classification method that combines the title and the text of a document through a graph network. The method mainly comprises the following steps: dividing each document into a title document and a text document; performing data preprocessing on each to obtain a title word set and a text word set; obtaining word vector representations with a word vector model; obtaining topic vectors with an LDA model; obtaining text document feature representations with an HAN model; constructing a heterogeneous graph from three types of nodes (title documents, title words and topics) and inputting it into a GAT model to fuse title and text features and obtain a feature representation for each document; and predicting the text category with a Softmax function. The classification method uses additional information to alleviate the semantic sparsity of titles and fuses the title and text features more effectively, reflecting the importance of the title in the text classification task, improving classification accuracy, and addressing the loss of accuracy caused by neglecting the importance of titles in current news text classification.
Description
Technical Field
The invention relates to a text classification method based on the combination of a title and a text of a graph network, belonging to the field of natural language processing.
Background
Text classification is a fundamental problem of natural language processing, and statistical learning methods have become the mainstream of the field. Text classification based on traditional machine learning mainly comprises preprocessing and feature extraction of texts, vectorization of the processed texts, and modeling of a training data set with a common machine learning classifier, such as the naive Bayes model, the k-nearest-neighbor algorithm, the expectation-maximization algorithm and the support vector machine (SVM) model. However, the difficulty of feature engineering remains a challenge for traditional text classification.
Today, the continuing development of deep learning and artificial intelligence has yielded many promising results in the field of text classification. Unlike traditional machine learning methods, deep learning methods use neural network models to train word embeddings, such as convolutional neural networks (CNN), recurrent neural networks (RNN) and long short-term memory networks (LSTM). These deep learning models can learn text features automatically and improve classification performance, and are therefore popular with researchers.
In recent years, a new research direction, graph neural networks, has attracted much attention; such networks are effective for tasks with a rich relational structure and can preserve the global structural information of a graph in the graph embedding. The invention uses a graph network to solve the problem of neglecting the importance of the title in text classification and to improve text classification performance.
Disclosure of Invention
The invention provides a text classification method based on the combination of a title and a text of a graph network, which utilizes the graph network to fuse the characteristics of the title and the text and solves the problem of low text classification precision caused by neglecting the importance of the title in text classification in the current text classification task.
The invention provides a text classification method based on the combination of a title and a text of a graph network, which comprises the following steps:
1) collecting a Chinese news text data set, wherein the data set comprises documents and the category each belongs to; and establishing a stop word list;
2) processing the data set, and dividing all documents in the data set into a title document and a text document;
3) carrying out data preprocessing on the text document divided in the step 2), including sentence segmentation, word segmentation and stop word removal, and constructing a text word set;
4) training the text word set constructed in the step 3) by using a word vector training model to obtain distributed representation of each word in the text word set;
5) dividing the text document divided in the step 2) into a training set, a verification set and a test set;
6) inputting the training set divided in step 5) into an HAN (Hierarchical Attention Networks) model for training, evaluating the HAN model with the test set divided in step 5), optimizing the HAN model, and obtaining each text document vector;
7) segmenting the title documents divided in step 2), constructing a title word set, and training the title word set with a word vector training model to obtain a distributed representation of each word in the title word set;
8) training the document in the data set by using an LDA topic model to obtain N topics and topic word distribution of each topic, and obtaining each topic vector according to the topic word distribution;
9) taking the title documents divided in step 2), the title word set constructed in step 7) and the topics obtained in step 8) as nodes, and constructing a heterogeneous graph according to the relationships among the nodes;
10) dividing the title documents divided in the step 2) into a training set, a verification set and a test set;
11) representing each title document vector in the training set of step 10) by the corresponding text document vector obtained in step 6);
12) training a GAT (Graph Attention Networks) model with the heterogeneous graph constructed in step 9), the title document vectors of step 11), the word vectors of step 7) and the topic vectors of step 8); evaluating the GAT model with the test set divided in step 10) to realize the fusion of title and text features and obtain a feature representation of the whole document; and inputting the document feature representation into a softmax function, whose output is the document category.
Further, the text classification method based on the combination of the title and the text of the graph network, provided by the invention, comprises the following steps:
in step 1), the stop word list includes punctuation marks, mathematical symbols, conjunctions, interjections and modal particles.
The specific steps of step 3) are as follows: 3-1) truncating each text document to its first 500 words; 3-2) splitting the text document into sentences of 20 words each, keeping the sentence order consistent with the order in the text; 3-3) segmenting each sentence with the jieba word segmentation tool and removing the stop words in each sentence according to the stop word list; 3-4) establishing the text word set.
In step 4), the skip-gram model in Word2vec is used to train the text word set, with the dimension set to 300.
In step 5) the text documents, and in step 10) the title documents, are divided into a training set, a verification set and a test set, with a division ratio of 8:1:1.
In step 7), the jieba word segmentation tool is used for word segmentation, and the word vector model is the skip-gram model in Word2vec.
In step 8), the value of N is set according to the perplexity of the LDA topic model.
In step 9), the relationship among the three types of nodes is given by formula (1).
in step 12), each document feature representation is input to the softmax function shown in formula (2) to output the document category:

Z = softmax(H^(L))  (2)

where Z is the document category and H^(L) is the document feature representation.
Compared with the prior art, the invention has the beneficial effects that:
(1) The method uses the HAN network to extract the text feature representation. When classifying long texts, attention at the word granularity alone is not enough; attention must also be learned over each sentence, so that the feature representation of long texts can be learned well.
(2) In the method of fusing title and text features with GAT, the GAT model not only uses additional information to alleviate the semantic sparsity of titles, but also fuses the title and text features more effectively.
(3) The invention highlights the importance of the title in the text classification task and provides a text classification method combining the title and the text through a graph network, improving classification accuracy.
Drawings
FIG. 1 is a flow chart of the text classification based on the heading and body combination of a graph network in accordance with the present invention;
fig. 2 is a schematic diagram of the heterogeneous graph.
Detailed Description
In order to solve the problem that the classification efficiency is low because the importance of the titles is neglected in the classification of the current news texts, the text classification method based on the combination of the titles and the texts of the graph network has the design concept that: dividing each document into a title document and a text document, respectively carrying out data preprocessing to obtain a title word set and a text word set, obtaining word vector representation by using a word vector model, obtaining a subject vector by using an LDA model, obtaining text document characteristic representation by using an HAN model, constructing a heterogeneous graph by using three types of nodes of a title, the title word set and a subject, inputting the heterogeneous graph into a GAT model to realize the fusion of the title and the text characteristics, obtaining the characteristic representation of each document, and carrying out text category prediction by using a Softmax function.
The text classification method based on the combination of the title and the text of a graph network is further described below, taking the Tsinghua news data set as an example, in combination with the accompanying drawings. The following examples only illustrate the technical solution of the present invention more clearly; the described examples are only a part of the embodiments of the present invention, and the scope of protection of the present invention is not limited thereby. All other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the scope of protection of the present invention.
As shown in fig. 1, the text classification method of the present invention includes the following steps:
step 1) preparing a Chinese news text data set required by training, selecting a Qinghua news data set (THUCNews) as an example, wherein the data set comprises ten categories of finance, real estate, home furnishing, education, science and technology, fashion, politeness, sports, games and entertainment, and each category comprises ten thousand pieces of data; and establishing a deactivation vocabulary, wherein the deactivation vocabulary comprises punctuation marks, mathematical marks, connection words, exclamation words and word-of-speech words, but is not limited to the punctuation marks, the mathematical marks, the connection words, the exclamation words and the word-of-speech words.
Step 2) processing the data set and dividing every document into a title document and a text document. Taking a piece of experimental data as an example, a sports news document such as "Who will win the Bob Cousy Award? The NCAA season enters its final stage ..." is split at the space between the title and the body into two parts, and the two parts are labeled respectively.
Step 3) carrying out data preprocessing on the text document divided in the step 2), including sentence segmentation, word segmentation and stop word removal, and constructing a text word set; the method comprises the following specific steps:
3-1) truncating each text document to its first 500 words;
3-2) splitting the text document into sentences of 20 words each, keeping the sentence order consistent with the order in the text;
3-3) segmenting each sentence with the jieba word segmentation tool, for example segmenting the title "Who will win the Bob Cousy Award?" into "Who / will / win / the / Bob Cousy Award / ?", and removing the stop words according to the stop word list. For example, the sentence "The Great Wall is the crystallization of the blood and sweat of ancient Chinese working people, and a symbol of ancient Chinese culture and the pride of the Chinese nation" becomes, after stop word removal, "Great Wall crystallization blood sweat ancient Chinese working people symbol ancient Chinese culture pride Chinese nation", which reduces the amount of computation.
3-4) establishing a text word set.
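As an illustrative sketch only (not part of the claimed method), steps 3-1) to 3-4) can be expressed as a small preprocessing routine. The whitespace tokenizer below stands in for the jieba segmenter so the sketch is self-contained; in practice `jieba.lcut` would be substituted, with the 500-word and 20-word limits stated above.

```python
def preprocess_body(text, stopwords, max_len=500, sent_len=20, tokenize=None):
    """Truncate a text document, split it into fixed-length sentences in
    original order, tokenize each sentence, and drop stop words."""
    # jieba.lcut would be the tokenizer for Chinese text; a whitespace
    # split stands in here so the sketch runs without extra dependencies.
    tokenize = tokenize or (lambda s: s.split())
    text = text[:max_len]                       # 3-1) keep only the first max_len units
    sentences = [text[i:i + sent_len]           # 3-2) fixed-length sentences, in order
                 for i in range(0, len(text), sent_len)]
    word_set = []                               # 3-3)/3-4) tokenize, remove stop words
    for sent in sentences:
        word_set.append([w for w in tokenize(sent) if w not in stopwords])
    return word_set
```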
Step 4) training the text word set constructed in step 3) with a word vector training model to obtain a distributed representation of each word in the text word set. In this embodiment, the skip-gram model in Word2vec is used to train the text word set, with the dimension set to 300. Word2vec yields a distributed representation of each word of the text and the title, such as {Great Wall: [0.33, 0.32, 0.25, 0.35, 0.23, ...], China: [0.52, 0.39, 0.56, ...]}; the specific dimension can be set freely during model training, e.g., 200 or 100 dimensions.
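In practice the skip-gram training of step 4) would use an off-the-shelf implementation (for instance gensim's `Word2Vec(sentences, sg=1, vector_size=300)`); the dependency-free sketch below only illustrates the core of the skip-gram idea, enumerating the (center word, context word) pairs the model is trained on.

```python
def skipgram_pairs(tokens, window=2):
    """Enumerate the (center, context) pairs that the skip-gram
    objective trains on: each word predicts its nearby neighbours."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:                      # skip the center word itself
                pairs.append((center, tokens[j]))
    return pairs
```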
Step 5) dividing the text documents divided in step 2) into a training set, a verification set and a test set at a ratio of 8:1:1;
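The 8:1:1 division of step 5) (and likewise step 10) can be sketched as follows; the fixed random seed is an illustrative choice, not something the patent specifies.

```python
import random

def split_811(items, seed=42):
    """Shuffle a list of documents and split it 8:1:1 into
    training, verification and test sets."""
    rng = random.Random(seed)                 # fixed seed for reproducibility
    items = list(items)
    rng.shuffle(items)
    n_train = int(len(items) * 0.8)
    n_val = int(len(items) * 0.1)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])
```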
Step 6) inputting the training set divided in step 5) into an HAN (Hierarchical Attention Networks) model for training, evaluating the HAN model with the test set divided in step 5), optimizing the HAN model, and obtaining each text document vector, e.g., document 1: [0.36, 0.56, 0.35, ...].
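A full HAN model wraps bidirectional GRU encoders around two attention layers; the sketch below shows only the hierarchical attention pooling at its core (word-level attention producing one vector per sentence, then sentence-level attention producing the document vector). The query vectors would be learned parameters in the real model; here they are plain arguments.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend(vectors, query):
    """Attention-pool a stack of vectors against a query vector."""
    weights = softmax(vectors @ query)        # one weight per vector
    return weights @ vectors                  # weighted sum

def han_pool(doc, word_query, sent_query):
    """doc: list of sentences, each an (n_words, dim) array.
    Word-level attention yields one vector per sentence; sentence-level
    attention pools those into a single document vector."""
    sent_vecs = np.stack([attend(s, word_query) for s in doc])
    return attend(sent_vecs, sent_query)
```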
Step 7) segmenting the title documents divided in step 2), constructing a title word set, and training the title word set with a word vector training model to obtain the distributed representation of each word in the title word set. Word segmentation uses the jieba word segmentation tool, and the word vector model is the skip-gram model in Word2vec.
Step 8) training the documents in the data set with an LDA topic model to obtain N topics and the topic-word distribution of each topic, and obtaining each topic vector according to the topic-word distribution; the value of N is set according to the perplexity of the LDA topic model.
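Step 8) specifies only that each topic vector is obtained "according to the topic-word distribution"; one common realization, assumed here, is a probability-weighted average of the topic words' vectors. (Training itself would use an LDA implementation such as gensim's `LdaModel`, with N chosen by perplexity.)

```python
import numpy as np

def topic_vector(topic_word_probs, word_vecs):
    """Embed a topic as the probability-weighted average of the word
    vectors of its topic words (an assumed realization of step 8)."""
    total = sum(topic_word_probs.values())    # normalize over the kept words
    dim = len(next(iter(word_vecs.values())))
    vec = np.zeros(dim)
    for word, p in topic_word_probs.items():
        vec += (p / total) * word_vecs[word]
    return vec
```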
Step 9) taking the title documents divided in step 2), the title word set constructed in step 7) and the topics obtained in step 8) as nodes, and constructing a heterogeneous graph according to the relationships among the nodes, as shown in FIG. 2. The relationship among the three types of nodes (title documents, title words and topics) is given by formula (1).
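Formula (1) defines the exact edge relation; as an assumed illustration of the three-node-type graph of FIG. 2, the sketch below connects a title to the words it contains and a word to the topics whose top words include it. The edge weights that formula (1) would supply are omitted.

```python
def build_hetero_edges(title_words, topic_top_words):
    """title_words: {title_id: [words]}; topic_top_words: {topic_id: [words]}.
    Returns undirected edges among title, word and topic nodes."""
    edges = set()
    for t_id, words in title_words.items():
        for w in words:                       # title contains word
            edges.add((("title", t_id), ("word", w)))
    for k_id, words in topic_top_words.items():
        for w in words:                       # word appears in topic's top words
            edges.add((("word", w), ("topic", k_id)))
    return edges
```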
Step 10) dividing the title documents divided in step 2) into a training set, a verification set and a test set at a ratio of 8:1:1;
Step 11) representing each title document vector in the training set of step 10) by the corresponding text document vector obtained in step 6);
Step 12) training a GAT (Graph Attention Networks) model with the heterogeneous graph constructed in step 9), the title document vectors of step 11), the word vectors of step 7) and the topic vectors of step 8). The model inputs are, first, the node features: the title document vectors, word vectors and topic vectors are placed in three files and labeled respectively; and second, the relationships between nodes, i.e., the adjacency matrix, stored in a file in a format such as {2 3, 3 6, 9 15, ...}, where each pair denotes an edge. The GAT model is evaluated with the test set divided in step 10), realizing the fusion of title and text features and obtaining the feature representation of the whole document; the document feature representation is input into the softmax function shown in formula (2), whose output is the document category,
Z = softmax(H^(L))  (2)

where Z is the document category and H^(L) is the document feature representation.
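The feature fusion of step 12) relies on graph attention; the sketch below is a minimal single-head GAT layer following the published GAT formulation (scores LeakyReLU(a^T[Wh_i || Wh_j]), normalized over each node's neighbours), not the patent's exact multi-layer configuration. Self-loops are assumed present in the adjacency matrix. Stacking such layers over the heterogeneous graph and feeding the resulting title-node representations to softmax would complete the pipeline of formula (2).

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def gat_layer(H, adj, W, a):
    """Single-head GAT layer. H: (n, d) node features; adj: (n, n)
    adjacency with self-loops; W: (d, d_out) projection; a: (2*d_out,)
    attention vector split over the concatenation [Wh_i || Wh_j]."""
    Wh = H @ W
    d_out = Wh.shape[1]
    src = Wh @ a[:d_out]                      # a^T Wh_i part of the score
    dst = Wh @ a[d_out:]                      # a^T Wh_j part of the score
    e = leaky_relu(src[:, None] + dst[None, :])
    e = np.where(adj > 0, e, -1e9)            # attend only over neighbours
    e = e - e.max(axis=1, keepdims=True)      # stabilize the softmax
    ex = np.exp(e) * (adj > 0)
    att = ex / ex.sum(axis=1, keepdims=True)  # rows sum to 1 over neighbours
    return att @ Wh                           # attention-weighted aggregation
```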
The classification accuracy obtained in this example is 96.04%. Two comparative examples were run on the same Tsinghua news data set: the TextCNN model of comparative example 1 reaches 92.36% and the BiLSTM model of comparative example 2 reaches 94.36%, showing that the method provided by the invention improves text classification accuracy and confirming the usefulness of not neglecting the importance of the title in the text classification task.
Claims (9)
1. A text classification method based on title and text combination of a graph network is characterized by comprising the following steps:
step 1) collecting a Chinese news text data set, wherein the data set comprises documents and the category each belongs to; and establishing a stop word list;
step 2) processing the data set, and dividing all documents into header documents and text documents;
step 3) carrying out data preprocessing on the text document divided in the step 2), including sentence segmentation, word segmentation and stop word removal, and constructing a text word set;
step 4) training the text word set constructed in the step 3) by using a word vector training model to obtain distributed representation of each word in the text word set;
step 5) dividing the text documents divided in the step 2) into a training set, a verification set and a test set;
step 6) inputting the training set divided in step 5) into an HAN (Hierarchical Attention Networks) model for training, evaluating the HAN model with the test set divided in step 5), optimizing the HAN model, and obtaining each text document vector;
step 7) segmenting the title documents divided in step 2), constructing a title word set, and training the title word set with a word vector training model to obtain a distributed representation of each word in the title word set;
step 8) training the document in the data set by using an LDA topic model to obtain N topics and topic word distribution of each topic, and obtaining each topic vector according to the topic word distribution;
step 9) taking the title document divided in the step 2), the title word set constructed in the step 7) and the theme obtained in the step 8) as nodes, and constructing a heterogeneous graph according to the relationship among the nodes;
step 10) dividing the title documents divided in the step 2) into a training set, a verification set and a test set;
step 11) representing each title document vector in the training set in the step 10) by each text document vector obtained in the step 6);
and step 12) training a GAT (Graph Attention Networks) model with the heterogeneous graph constructed in step 9), the title document vectors of step 11), the word vectors of step 7) and the topic vectors of step 8); evaluating the GAT model with the test set divided in step 10) to realize the fusion of title and text features and obtain a feature representation of the whole document; and inputting the document feature representation into a softmax function, whose output is the document category.
2. The method for classifying texts combining titles and texts based on a graph network according to claim 1, wherein in step 1), the stop word list includes punctuation marks, mathematical symbols, conjunctions, interjections and modal particles.
3. The text classification method based on the combination of the title and the text of the graph network as claimed in claim 1, wherein the specific steps of step 3) are as follows:
3-1) truncating each text document to its first 500 words;
3-2) splitting the text document into sentences of 20 words each, keeping the sentence order consistent with the order in the text;
3-3) performing word segmentation on each clause by using a jieba word segmentation tool, and removing stop words in each clause according to a stop word list;
3-4) establishing a text word set.
4. The text classification method based on the combination of the title and the text of a graph network according to claim 1, wherein in step 4), the skip-gram model in Word2vec is used to train the text word set, with the dimension set to 300.
5. The method for classifying texts combining titles and texts based on graph networks according to claim 1, wherein in step 5) the text documents, and in step 10) the title documents, are divided into a training set, a verification set and a test set, with a division ratio of 8:1:1.
6. The method for text classification based on the combination of title and text of a graph network according to claim 1, wherein in step 7), the jieba word segmentation tool is used for word segmentation, and the word vector model is the skip-gram model in Word2vec.
7. The method for classifying texts combining titles and texts based on graph network as claimed in claim 1, wherein in step 8), the value of N is set according to the perplexity of LDA topic model.
9. The text classification method based on the combination of the title and the text of a graph network of claim 1, wherein in step 12), each document feature representation uses the softmax function shown in formula (2) to output the document category:

Z = softmax(H^(L))  (2)

where Z is the document category and H^(L) is the document feature representation.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011233244.6A (CN112347255B) | 2020-11-06 | 2020-11-06 | Text classification method based on title and text combination of graph network |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN112347255A | 2021-02-09 |
| CN112347255B | 2021-11-23 |
Family
ID=74428724
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202011233244.6A (CN112347255B, Expired - Fee Related) | Text classification method based on title and text combination of graph network | 2020-11-06 | 2020-11-06 |
Country Status (1)
| Country | Link |
|---|---|
| CN | CN112347255B (en) |
Legal Events
| Code | Title | Description |
|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2021-11-23 |