CN114996455A - News headline short text classification method based on dual knowledge graphs

News headline short text classification method based on dual knowledge graphs

Info

Publication number
CN114996455A
CN114996455A
Authority
CN
China
Prior art keywords
information
entity
short text
news
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210643031.3A
Other languages
Chinese (zh)
Inventor
高楠
王永健
吴一鸣
陈朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202210643031.3A priority Critical patent/CN114996455A/en
Publication of CN114996455A publication Critical patent/CN114996455A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/258 Heading extraction; Automatic titling; Numbering
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

A news headline short text classification method based on dual knowledge graphs comprises the following steps: preprocessing the news headline short text to remove special characters; extracting keywords from the news headline with the jieba word segmentation tool and removing stop words; linking the keywords to an external knowledge base through the API provided by the CN-DBpedia external knowledge base to obtain an entity set; disambiguating the entity set through cosine similarity to obtain a candidate entity set; constructing a domain knowledge graph based on global keyword co-occurrence information to solve the out-of-vocabulary (OOV) problem; obtaining interpretation information related to each entity by linking to the external knowledge base, thereby enriching the context semantic information; obtaining character-level vector representations of the original news headline and of the interpretation information from entity linking with BERT, and fusing the two representations to compensate for the lack of information in short texts; extracting N-gram features between several consecutive words with TextCNN to capture deep semantic information; and finally classifying through a Softmax function to obtain the final classification result.

Description

News headline short text classification method based on dual knowledge graphs
Technical Field
The invention relates to a news headline short text classification method based on dual knowledge graphs, in particular to the classification of domain news headlines. The invention uses a Chinese text word segmentation tool to extract several keywords from each news headline and links these keywords to entities in an external knowledge base through entity linking. If linking fails, a suitable node is queried from the domain knowledge graph to replace the original keyword; otherwise, the entity is added to the candidate entity set. By linking each candidate entity back to the external knowledge base, the interpretation information associated with that entity can be obtained. The method uses BERT to obtain vector representations of the news headline short text and of the interpretation information, and finally classifies them through softmax. The invention relates to the fields of probabilistic models, language models and deep learning, in particular to the field of natural language processing based on deep learning.
Background
With the popularization of online news platforms, the digital development of the news industry has been extremely rapid: massive amounts of data are continuously generated and the variety of news keeps growing. Compared with paragraphs and documents, news headlines contain few words, lack contextual semantic information, and are sparse and ambiguous. Classifying news headlines correctly makes it possible to organize and use this information better, so accurately assigning massive news headline data to the correct categories is of great significance.
Faced with news data of enormous and growing scale, classifying news headlines purely by hand is inefficient and costly. In recent years, with the rapid development of machine learning and deep learning, more and more such problems can be handled by computers, which has become a popular approach in the big data era. Solving the tedious task of news headline classification with deep learning is therefore a current trend. Many methods for classifying news headline short texts have been proposed in recent years, and they fall roughly into two types:
1) Classification methods based on machine learning: these methods mainly preprocess the news headlines, extract features, vectorize the processed text, and model the training data with common machine learning algorithms such as the vector space model, decision trees and support vector machines.
2) Classification methods based on deep learning: these methods mainly vectorize each character of the news headline and then use a convolutional neural network (CNN) or a recurrent neural network (RNN) to capture local or sequential information at a deeper level of the text. In recent years, more and more methods have used external knowledge bases to obtain concepts related to the short text in order to enhance the semantic information of news headline short texts.
Classification methods based on machine learning are simple, but the quality of text feature extraction has a great influence on classification accuracy; these methods depend heavily on manually designed features and are costly. In addition, the feature representations are often very sparse and weakly expressive, so they cannot meet the requirements of news headline short text classification well.
Classification methods based on deep learning avoid tedious manual feature engineering; their classification accuracy is influenced more by the amount of data and the number of training iterations. With word vector techniques, text data can be converted into low-dimensional dense vectors without relying on hand-crafted features, and deep semantic information can be learned.
However, using deep learning techniques for news headline classification still faces several problems:
(1) Most news headlines are short texts: they contain few words, lack contextual semantic information, and are sparse and ambiguous, so the current mainstream natural language processing methods cannot classify them well.
(2) Many current methods enhance the semantic information of short texts with external knowledge bases, but retrieving entities by keyword may return entities from multiple domains. How to disambiguate these entities and obtain the correct, reasonable entity is a challenge.
(3) Because Chinese word boundaries are not as well defined as in English, the accuracy of word segmentation depends on the segmentation tool, and the out-of-vocabulary (OOV) problem is unavoidable during entity linking. How to avoid the OOV problem is a challenge for news headline short text classification.
In the field of natural language processing, the current mainstream approach to capturing more semantic and syntactic information from short text is to enrich its semantic information with an external knowledge base. First, the short text is segmented with a common word segmentation tool. Then, the keywords are linked to the external knowledge base through entity linking to obtain related entities, and concepts related to those entities are retrieved from the external knowledge base. Finally, the original text information and the concept information are concatenated to enhance the semantic information of the short text.
These methods have made effective progress in short text classification, but they ignore the fact that irregular keywords may appear when news headlines are segmented, and such keywords cause the OOV problem. The OOV problem means that a keyword cannot be successfully linked to the external knowledge base during entity linking, which degrades classification performance. Therefore, the invention constructs a domain-specific knowledge graph based on global keyword information. When an OOV problem arises, appropriate keywords can be retrieved from the domain knowledge graph to replace the OOV word.
Therefore, how to overcome the lack of information in news headline short texts and the OOV problem is an urgent issue in the current big data era.
Disclosure of Invention
The invention provides a news headline short text classification method based on dual knowledge graphs, which aims to solve the problems in the prior art of insufficient short text information and of out-of-vocabulary (OOV) words arising during entity linking.
With the help of the external knowledge base, the invention can extract additional interpretation information to enrich the semantics of the short text. When an entity link fails, the domain knowledge graph can be used to reselect appropriate keywords, correcting the result of entity linking and solving the OOV problem. Finally, the character-level features of the short text are combined with the external knowledge features to extract semantic information and obtain the final classification result. Compared with existing methods, the method achieves more advanced performance.
In order to solve the above problems, the technical scheme provided by the invention is as follows:
A news headline short text classification method based on dual knowledge graphs comprises the following steps:
Step 1: preprocess the news headline short text, mainly removing special characters and stop words.
Step 2: extract keywords from the news headline with the jieba word segmentation tool.
Step 3: link the keywords to the external knowledge base through the API provided by the CN-DBpedia external knowledge base to obtain an entity set.
Step 4: disambiguate the obtained entity set through cosine similarity to obtain a candidate entity set.
Step 5: construct a domain knowledge graph based on global keyword co-occurrence information to solve the OOV problem.
Step 6: for each entity in the candidate entity set, obtain the interpretation information related to the entity by linking to the external knowledge base, enriching the context semantic information.
Step 7: obtain character-level vector representations of the original news headline and of the interpretation information from entity linking using BERT, and fuse the two representations to compensate for the lack of information in short texts.
Step 8: use TextCNN to extract N-gram features between several consecutive words and capture deep semantic information.
Step 9: finally, classify through a Softmax function to obtain the final classification result.
The invention enhances the semantic information of the short text by extracting keywords from the news headline short text, retrieving the entities related to these keywords from an external knowledge base through entity linking, and obtaining the interpretations of those entities. In addition, the method constructs a domain knowledge graph based on the local data set, which is used to solve the OOV problem that arises during entity linking and to enhance the domain information of the news short text. Finally, a TextCNN model captures the deep semantic features of the news headline short texts and completes the corresponding classification. Compared with existing methods, the method offers a certain improvement in accuracy and efficiency.
The invention has the advantages that:
1. The invention uses dual knowledge graphs to obtain more additional information, compensating for the lack of information in short texts and improving the accuracy of news headline short text classification.
2. The invention constructs a domain knowledge graph to solve the OOV problem, providing a new idea for handling the OOV problem in natural language processing.
Drawings
FIG. 1 is a schematic overall flow diagram of the present invention.
FIG. 2 is a core architecture diagram of the TextCNN of the present invention.
Detailed Description
The technical scheme of the invention is further explained below with reference to the accompanying drawings.
This embodiment uses the bougnews public data set from a news RSS subscription channel as an example.
A news headline short text classification method based on dual knowledge graphs comprises the following steps:
Step 1: preprocess the news headline short text by removing special characters such as Chinese and English punctuation, English characters, digits and special symbols. In addition, stop words are removed according to the HIT (Harbin Institute of Technology) stop word list. Part of the news headline preprocessing results are shown in Table 1; a minimal preprocessing sketch follows the table.
Table 1. Examples of news headline preprocessing results

Original news headline | Preprocessed news headline
First money TD single-chip removes in commercial and beats cost tablet again | First money single-chip is removed in commercial and is made cost tablet again
Google buys thousands of IBM patents to deal with Apple litigation | Google buys thousands of patents to deal with Apple litigation
All-metal body motorola publishing novel machine Klassic | All-metal frame motorcycle roller releasing machine
Beijing Dong: eligible iPhone user spread compensation | Kyoto eligible user spread compensation
Ultra-light ultra-thin Sanyo New-style linear PCM recording pen | Ultra-light ultra-thin Sanyo New-style linear recording pen
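The preprocessing in step 1 can be sketched as follows. This is a minimal, illustrative Python sketch rather than the patent's implementation: the regular expression that keeps only CJK characters and the stop-word file name are assumptions made for the example.

```python
import re

def load_stopwords(path="hit_stopwords.txt"):
    # Hypothetical file name for a local copy of the HIT stop-word list.
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

def clean_title(title: str) -> str:
    # Keep only CJK characters; Chinese/English punctuation, English letters,
    # digits and other special symbols are removed, as described in step 1.
    return re.sub(r"[^\u4e00-\u9fa5]", "", title)
```

Stop words are then filtered out after word segmentation (see the keyword-extraction sketch in step 2).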
Step 2: extract keywords from the preprocessed news headline short text with the jieba word segmentation tool. For example, for the short text S_1, "Google buys thousands of IBM patents to deal with Apple litigation", the keyword set {"patents", "apple"} can be obtained.
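A minimal sketch of this keyword-extraction step, assuming jieba's TF-IDF based extract_tags interface and an illustrative choice of three keywords per headline:

```python
import jieba.analyse

def extract_keywords(title: str, stopwords: set, top_k: int = 3) -> list:
    # jieba.analyse.extract_tags ranks candidate words by TF-IDF weight.
    candidates = jieba.analyse.extract_tags(title, topK=top_k * 2)
    # Drop stop words and keep the top_k remaining keywords.
    return [w for w in candidates if w not in stopwords][:top_k]
```

The number of keywords per headline and the use of TF-IDF ranking are illustrative choices; the patent only specifies that jieba is used.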
Step 3: the invention uses the CN-DBpedia external knowledge base developed by the Knowledge Works laboratory of Fudan University, and obtains the entities related to each keyword through the ment2ent entity-linking API it provides.
Step 4: for the entity set obtained in the previous step, BERT is used to obtain vector representations of the entity set and of the news headline. The cosine similarity between each entity E_i and the news headline S_i is then calculated, and the entity with the highest similarity score is added to the candidate entity set. For the keyword "apple", the entity set E = {"Apple Inc.", "Apple (film)", "apple (fruit)"} can be obtained; after the cosine similarity calculation, the scores are {"Apple Inc.": 90.26, "Apple (film)": 87.74, "apple (fruit)": 87.88}, so "Apple Inc.", which has the highest score, is added to the candidate entity set.
cos(E_i, S_i) = (E_i · S_i) / (‖E_i‖ ‖S_i‖)    (1)
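Steps 3 and 4 can be sketched as follows. The CN-DBpedia endpoint URL, the response format, and the use of the Hugging Face bert-base-chinese model with mean pooling are assumptions for illustration; the patent only states that the ment2ent API and BERT vectors with cosine similarity are used.

```python
import requests
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

def ment2ent(keyword: str) -> list:
    # Assumed CN-DBpedia mention-to-entity endpoint and response field.
    url = "http://shuyantech.com/api/cndbpedia/ment2ent"
    resp = requests.get(url, params={"q": keyword}, timeout=5).json()
    return resp.get("ret", [])

def sent_vec(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=64)
    with torch.no_grad():
        out = bert(**inputs)
    return out.last_hidden_state.mean(dim=1).squeeze(0)   # mean-pooled sentence vector

def best_entity(keyword: str, title: str):
    entities = ment2ent(keyword)
    if not entities:
        return None                        # linking failed: OOV case, handled in step 5
    title_v = sent_vec(title)
    scores = {e: torch.cosine_similarity(sent_vec(e), title_v, dim=0).item()
              for e in entities}
    return max(scores, key=scores.get)     # entity with the highest cosine similarity, Eq. (1)
```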
Step 5: because short texts may themselves contain irregular expressions, not every keyword can be successfully linked to an entity in the external knowledge base; this is where the OOV problem arises.
For Chinese entity linking, the OOV problem has two main causes: (1) the entity is not covered by the external knowledge base; (2) the word segmentation result of the short text is incorrect.
To solve the OOV problem, the method constructs a domain knowledge graph with keywords as nodes. Specifically, the invention uses a fixed-size sliding window to collect keyword co-occurrence information. The weight between two keyword nodes is calculated with pointwise mutual information (PMI). The higher the probability that two keywords appear together in the text, the stronger the correlation between them. When the PMI value is less than 0, the relationship between the two keywords is considered weak; an edge is created between two keywords only if their PMI value is greater than 0. The PMI is calculated as follows:
PMI(i, j) = log( p(i, j) / (p(i) p(j)) )    (2)

p(i, j) = #W(i, j) / #W    (3)

p(i) = #W(i) / #W    (4)
here, # W (i) represents the number of sliding windows in the corpus that contain the keyword i, # W (i, j) represents the number of sliding windows that contain both the keyword i and the keyword j, and # W represents the total number of sliding windows.
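A sketch of the graph construction in step 5, assuming a window size of 3 and a networkx graph as the data structure; both are illustrative choices not fixed by the patent:

```python
import math
from collections import Counter
import networkx as nx

def build_domain_graph(keyword_lists, window_size: int = 3) -> nx.Graph:
    w_count = Counter()      # #W(i): windows containing keyword i
    pair_count = Counter()   # #W(i, j): windows containing both i and j
    total_windows = 0        # #W
    for keywords in keyword_lists:                        # keywords of one headline
        for start in range(max(1, len(keywords) - window_size + 1)):
            window = set(keywords[start:start + window_size])
            total_windows += 1
            for k in window:
                w_count[k] += 1
            for a in window:
                for b in window:
                    if a < b:
                        pair_count[(a, b)] += 1
    graph = nx.Graph()
    for (a, b), n_ab in pair_count.items():
        p_ab = n_ab / total_windows
        p_a, p_b = w_count[a] / total_windows, w_count[b] / total_windows
        pmi = math.log(p_ab / (p_a * p_b))                # Eqs. (2)-(4)
        if pmi > 0:                                       # keep only positive-PMI edges
            graph.add_edge(a, b, weight=pmi)
    return graph
```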
With the help of the domain knowledge graph, when entity linking encounters an OOV problem, the neighbor information of the keyword can be queried from the domain knowledge graph. The neighbors are sorted by the weight calculated with PMI and the top three are taken. The original keyword is first replaced with the lowest-ranked of these neighbors and the entity is re-linked from the external knowledge base; if the OOV problem still occurs, the next-ranked neighbor is taken in turn for re-linking, until linking succeeds or the traversal ends.
In the news headline short text S_2, which concerns the judgment result of an infringement case, the keyword "infringement case" triggers an OOV problem when submitted to ment2ent. Using this keyword as the node, its neighbor information in the domain knowledge graph is queried, yielding the node set {"litigation request", "copyright law", "complaining valley"}. The keyword is replaced with "litigation request" and re-linked to the external knowledge base; no OOV problem occurs, so "litigation request" is added to the candidate entity set.
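The OOV fallback itself can be sketched as follows, reusing best_entity from the step 3-4 sketch and the graph from the step 5 sketch; the iteration order follows the description above (the lowest-ranked of the top three neighbors first):

```python
def resolve_oov(keyword: str, title: str, graph):
    if keyword not in graph:
        return None                                        # no neighbors to try
    top3 = sorted(graph[keyword].items(),
                  key=lambda kv: kv[1]["weight"], reverse=True)[:3]
    # Try the lowest-ranked of the top three first, then move up the ranking.
    for neighbor, _attrs in reversed(top3):
        entity = best_entity(neighbor, title)              # re-link via CN-DBpedia
        if entity is not None:
            return entity                                  # OOV resolved
    return None                                            # traversal finished, still OOV
```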
Step 6: semantic enhancement compensates for the lack of information in short texts. The candidate entities obtained in the previous step are linked to the external knowledge base in turn to obtain the interpretation information related to each entity and to enrich the semantic information of the short text.
For the candidate entity "litigation request", its interpretation information K_1 = {"The concept of a litigation request has both broad and narrow meanings in foreign civil litigation. In the broad sense, a litigation request is a request submitted to a court asking the court to make a judgment"} can be obtained.
For the candidate entity "Apple Inc.", its interpretation information K_2 = {"Apple Inc. is an American high-tech company"} can be obtained.
Step 7: for short Chinese texts, words are not evenly distributed, so a fine-tuned pre-trained BERT model is used to obtain character-level semantic information. Character-level embedding is used instead of word embedding for two reasons: (1) news headlines are short, and word embedding suffers from data sparsity; (2) it makes it easier for TextCNN to extract N-gram information between several consecutive words.
Suppose the length of the news headline short text S is n, the length of the interpretation information K is l, and the vector dimension is d. If a news headline or its interpretation information is shorter than the required length, it is padded with <PAD> tokens; otherwise the excess part is truncated. In this way the short text semantic matrix W_s and the interpretation information semantic matrix W_k are obtained:

W_s = x_1 ⊕ x_2 ⊕ … ⊕ x_n    (5)

W_k = x_1 ⊕ x_2 ⊕ … ⊕ x_l    (6)

Here x_i ∈ R^d denotes the d-dimensional vector representation of the i-th word in the news headline short text S, and ⊕ denotes the vector concatenation operation. The semantically enhanced feature representation matrix is therefore W = W_s ⊕ W_k ∈ R^((n+l)×d).
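A sketch of the character-level embedding and fusion in step 7. The use of the bert-base-chinese model, the illustrative lengths n = 32 and l = 96, and the fact that the [CLS]/[SEP] tokens are counted inside the padded length are assumptions made for this example:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

def char_matrix(text: str, length: int) -> torch.Tensor:
    # bert-base-chinese tokenizes Chinese text character by character, so the
    # token embeddings act as character-level vectors; shorter texts are padded,
    # longer texts are truncated, mirroring the <PAD>/truncation rule above.
    inputs = tokenizer(text, return_tensors="pt", padding="max_length",
                       truncation=True, max_length=length)
    with torch.no_grad():
        out = bert(**inputs)
    return out.last_hidden_state.squeeze(0)        # shape: (length, d = 768)

def fuse(title: str, interpretation: str, n: int = 32, l: int = 96) -> torch.Tensor:
    w_s = char_matrix(title, n)                    # Eq. (5)
    w_k = char_matrix(interpretation, l)           # Eq. (6)
    return torch.cat([w_s, w_k], dim=0)            # W = W_s ⊕ W_k, shape (n + l, d)
```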
Step 8: although CNN is not well suited to learning long-distance semantic information, it learns the local information of news headline short texts well. The invention therefore adopts TextCNN to capture deep semantic information; it mainly consists of a convolution layer, a pooling layer and a fully connected layer.
In the convolution layer, a convolution kernel w ∈ R^(h×d) is applied to the semantic matrix of n + l words to obtain deep semantic features:

c_i = f(w · x_{i:i+h-1} + b)    (7)

Here b ∈ R denotes a bias term and f is a nonlinear activation function; a new feature matrix c is finally obtained:

c = [c_1, c_2, …, c_{n-h+1}]    (8)
The pooling layer captures the most important feature values, and dropout randomly sets feature values to 0; this is a regularization technique used to avoid model overfitting. The feature matrices obtained with convolution kernels of different sizes are then concatenated and fed into the fully connected layer for classification.
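A minimal TextCNN sketch covering steps 8 and 9; the kernel sizes, filter count, dropout rate and class count are illustrative, and a real training loop would normally return logits and apply the softmax inside the loss function:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, emb_dim=768, num_classes=10,
                 kernel_sizes=(2, 3, 4), num_filters=128, dropout=0.5):
        super().__init__()
        # One convolution per kernel size h; each kernel spans h consecutive
        # characters over the full embedding dimension, as in Eq. (7).
        self.convs = nn.ModuleList(
            [nn.Conv2d(1, num_filters, (h, emb_dim)) for h in kernel_sizes])
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, w):                          # w: (batch, n + l, emb_dim)
        x = w.unsqueeze(1)                         # (batch, 1, n + l, emb_dim)
        feats = []
        for conv in self.convs:
            c = F.relu(conv(x)).squeeze(3)         # feature map c, as in Eq. (8)
            feats.append(F.max_pool1d(c, c.size(2)).squeeze(2))   # max-over-time pooling
        out = self.dropout(torch.cat(feats, dim=1))  # concatenate all kernel sizes
        return F.softmax(self.fc(out), dim=1)        # step 9: per-class probabilities
```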
Step 9: the softmax activation function outputs the probability of each category, yielding the final classification result.
The invention mainly addresses the problems of insufficient short text information and of OOV words in natural language processing. A dual knowledge graph model based on an external knowledge base and a domain knowledge graph is proposed. The model obtains semantic enhancement information for the short text from the CN-DBpedia external knowledge base. When an entity link fails, a suitable keyword is found in the domain knowledge graph as a substitute. TextCNN is then used to capture features between several consecutive words. Finally, a fully connected layer is used for classification.
The invention has been described through the above embodiment, but it is clear that the embodiment is given for illustrative purposes only and is not intended to limit the scope of the invention. Those skilled in the art will recognize that changes may be made in form and detail without departing from the spirit and scope of the invention; the invention is therefore not limited to the specific forms and details described above.

Claims (3)

1. A news headline short text classification method based on dual knowledge graphs, comprising the following steps:
step 1: preprocessing the news headline short text to remove special characters, and removing stop words according to the HIT (Harbin Institute of Technology) stop word list;
step 2: extracting keywords from the news headline with the jieba word segmentation tool;
step 3: linking the keywords to an external knowledge base through the API provided by the CN-DBpedia external knowledge base to obtain an entity set;
step 4: disambiguating the obtained entity set through cosine similarity to obtain a candidate entity set; for the entity set obtained in step 3, using BERT to obtain vector representations of the entity set and of the news headline; then calculating the cosine similarity between each entity E_i and the news headline S_i, and selecting the entity with the highest similarity score to add to the candidate entity set;
cos(E_i, S_i) = (E_i · S_i) / (‖E_i‖ ‖S_i‖)    (1)
step 5: constructing a domain knowledge graph based on global keyword co-occurrence information to solve the OOV problem;
5.1) constructing the domain knowledge graph with keywords as nodes; specifically, using a fixed-size sliding window to collect keyword co-occurrence information; calculating the weight between two keyword nodes with pointwise mutual information (PMI); the higher the probability that two keywords appear together in the text, the stronger the correlation between them; when the PMI value is less than 0, the relationship between the two keywords is considered weak; an edge is created between two keywords only if the PMI value is greater than 0; the PMI is calculated as follows:
PMI(i, j) = log( p(i, j) / (p(i) p(j)) )    (2)

p(i, j) = #W(i, j) / #W    (3)

p(i) = #W(i) / #W    (4)
here, # W (i) represents the number of sliding windows in the corpus that contain the keyword i, # W (i, j) represents the number of sliding windows that contain both the keyword i and the keyword j, and # W represents the total number of sliding windows;
5.2) when entity linking encounters the OOV problem, querying the neighbor nodes of the keyword from the domain knowledge graph; sorting the neighbor nodes by the weight calculated with PMI and taking the top three; first replacing the original keyword with the lowest-ranked of these neighbor nodes and re-linking the entity from the external knowledge base; if the OOV problem persists, taking the next-ranked neighbor in turn for re-linking, until linking succeeds or the traversal ends;
step 6: for each entity in the candidate entity set, obtaining the interpretation information related to the entity by linking to the CN-DBpedia external knowledge base, thereby enriching the context semantic information;
step 7: obtaining character-level vector representations of the original news headline and of the interpretation information from entity linking using BERT, and fusing the two representations to compensate for the lack of information in short texts;
obtaining the character-level semantic information with a fine-tuned pre-trained BERT model, using character-level embedding instead of word embedding;
supposing that the length of the news headline short text S is n, the length of the interpretation information K is l, and the vector dimension is d; if a news headline or its interpretation information is shorter than the required length, padding it with <PAD> tokens, and otherwise truncating the excess part; thus obtaining the short text semantic matrix W_s and the interpretation information semantic matrix W_k:

W_s = x_1 ⊕ x_2 ⊕ … ⊕ x_n    (5)

W_k = x_1 ⊕ x_2 ⊕ … ⊕ x_l    (6)

where x_i ∈ R^d denotes the d-dimensional vector representation of the i-th word in the news headline short text S, and ⊕ denotes the vector concatenation operation; the semantically enhanced feature representation matrix is therefore W = W_s ⊕ W_k ∈ R^((n+l)×d);
step 8: extracting N-gram features between several consecutive words with TextCNN to capture deep semantic information;
in the convolution layer, a convolution kernel w ∈ R^(h×d) is applied to the semantic matrix of n + l words to obtain deep semantic features:

c_i = f(w · x_{i:i+h-1} + b)    (7)

where b ∈ R denotes a bias term and f is a nonlinear activation function; a new feature matrix c is finally obtained:

c = [c_1, c_2, …, c_{n-h+1}]    (8)
the pooling layer captures the most important feature values, and dropout randomly sets feature values to 0; dropout is a regularization means to avoid model overfitting; the feature matrices obtained with convolution kernels of different sizes are concatenated and fed into the fully connected layer for classification;
step 9: outputting the probability of each category through the softmax activation function to obtain the final classification result.
2. The news headline short text classification method based on dual knowledge graphs according to claim 1, wherein the special characters in step 1 comprise Chinese and English punctuation marks, English characters, digits and special symbols.
3. The news headline short text classification method based on dual knowledge graphs according to claim 1, wherein the TextCNN in step 8 comprises a convolution layer, a pooling layer and a fully connected layer.
CN202210643031.3A 2022-06-08 2022-06-08 News headline short text classification method based on dual knowledge graphs Pending CN114996455A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210643031.3A CN114996455A (en) 2022-06-08 2022-06-08 News headline short text classification method based on dual knowledge graphs

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210643031.3A CN114996455A (en) 2022-06-08 2022-06-08 News headline short text classification method based on dual knowledge graphs

Publications (1)

Publication Number Publication Date
CN114996455A true CN114996455A (en) 2022-09-02

Family

ID=83033607

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210643031.3A Pending CN114996455A (en) 2022-06-08 2022-06-08 News title short text classification method based on double knowledge maps

Country Status (1)

Country Link
CN (1) CN114996455A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116051132A (en) * 2023-04-03 2023-05-02 之江实验室 Illegal commodity identification method and device, computer equipment and storage medium
CN116051132B (en) * 2023-04-03 2023-06-30 之江实验室 Illegal commodity identification method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN113011533B (en) Text classification method, apparatus, computer device and storage medium
CN111177365B (en) Unsupervised automatic abstract extraction method based on graph model
CN110442760B (en) Synonym mining method and device for question-answer retrieval system
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN107133213B (en) Method and system for automatically extracting text abstract based on algorithm
CN108628828B (en) Combined extraction method based on self-attention viewpoint and holder thereof
CN108710894B (en) Active learning labeling method and device based on clustering representative points
CN110532328B (en) Text concept graph construction method
CN110162771B (en) Event trigger word recognition method and device and electronic equipment
CN112818093B (en) Evidence document retrieval method, system and storage medium based on semantic matching
CN110674252A (en) High-precision semantic search system for judicial domain
CN114065758B (en) Document keyword extraction method based on hypergraph random walk
CN113569050B (en) Method and device for automatically constructing government affair field knowledge map based on deep learning
CN107391565B (en) Matching method of cross-language hierarchical classification system based on topic model
CN111241824B (en) Method for identifying Chinese metaphor information
CN112307182B (en) Question-answering system-based pseudo-correlation feedback extended query method
CN111008530A (en) Complex semantic recognition method based on document word segmentation
CN110866102A (en) Search processing method
CN114881043B (en) Deep learning model-based legal document semantic similarity evaluation method and system
Sousa et al. Word sense disambiguation: an evaluation study of semi-supervised approaches with word embeddings
CN114491062B (en) Short text classification method integrating knowledge graph and topic model
CN111737420A (en) Class case retrieval method, system, device and medium based on dispute focus
CN112860898B (en) Short text box clustering method, system, equipment and storage medium
CN114996455A (en) News headline short text classification method based on dual knowledge graphs
CN116756346A (en) Information retrieval method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination