CN112215000A - Text classification method based on entity replacement - Google Patents
Text classification method based on entity replacement
- Publication number
- CN112215000A CN112215000A CN202011131161.6A CN202011131161A CN112215000A CN 112215000 A CN112215000 A CN 112215000A CN 202011131161 A CN202011131161 A CN 202011131161A CN 112215000 A CN112215000 A CN 112215000A
- Authority
- CN
- China
- Prior art keywords
- document
- vector
- entity
- disambiguation
- anchor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention claims a text classification method based on entity replacement, belonging to the field of natural language processing, and specifically comprising the following steps: (1) detecting the anchor phrases in a document by using an external knowledge base and querying the entity set corresponding to each anchor phrase; (2) averaging the document word vectors to obtain a context vector of the document; (3) respectively calculating the attention weights of the entities corresponding to each anchor phrase under the context representation vector to obtain a disambiguation vector for each phrase; (4) replacing the anchor phrases at their original text positions with the disambiguated entity vectors and inputting them into a long short-term memory network to obtain a disambiguated document representation vector, inputting that vector into a fully connected layer of the neural network, and calculating the probability that each text belongs to each category with a classifier to train the network; (5) predicting the category of the text to be predicted with the trained model, and outputting the category with the highest probability as the prediction. The method eliminates semantic ambiguity of words in the document while retaining word order and context information, so that text content can be classified more accurately.
Description
Technical Field
The invention belongs to the field of natural language processing, and particularly relates to a text classification method based on entity replacement.
Background
Text classification is an important task of natural language processing, and refers to a technique for classifying a given text object according to the characteristics of the text in a fixed category defined in advance. It is widely applied to many scenarios such as topic classification, spam detection, and sentiment classification. In recent years, deep learning and machine learning have made great progress in natural language processing. Recent studies have shown that neural network-based models perform better in text classification tasks than traditional models (e.g., naive bayes). Typical neural network-based text classification models are word-based. They typically use the words in the target document as input to the model, map the words into a space of continuous vectors (word embedding), and combine these vectors by methods such as summing, averaging, Convolutional Neural Networks (CNN) or Recurrent Neural Networks (RNN) to capture the semantics of the document.
In addition to the above methods, there have been studies attempting to capture semantic information using entities in a Knowledge Base (KB). This approach represents a document using a set of entities (or entity bags) that are related to the document. The benefits of using an entity are: unlike words, entities provide unambiguous semantic information because they are uniquely identified in a knowledge base, whereas words may be semantically ambiguous (e.g., "apple" may refer to fruit or apple corporation, which may have different meanings in different contexts). However, as with the previous approach using the bag-of-words model, simply representing a document with a set of entities can lose the lexical information. Meanwhile, some non-entity descriptive words also have rich information.
Disclosure of Invention
The present invention is directed to solving the above problems of the prior art. The text classification method based on entity replacement solves the problem of semantic ambiguity by finding out a proper entity to replace a semantically ambiguous word in an original text and simultaneously retains language order information and descriptive information in the original text. The technical scheme of the invention is as follows:
a method of text classification based on entity substitution, comprising the steps of:
s1, detecting the anchor phrases in the document by using an external knowledge base and inquiring an entity set corresponding to each anchor phrase;
S2, averaging the word vectors of the document to obtain a context vector of the document;
S3, respectively calculating the attention weight of the entity corresponding to each anchor phrase under the document context representation vector to obtain a disambiguation vector for each anchor phrase;
S4, replacing the entities at their original text positions with disambiguated entity vectors and inputting them into a long short-term memory network to obtain a disambiguated document representation vector, inputting the disambiguated document representation vector into a fully connected layer of the neural network, and using a classifier to calculate the probability that each text belongs to each category so as to train the network;
and S5, predicting the category of the text to be predicted by using the trained model, and taking the category with the highest probability as the predicted category to be output.
Further, in step S1, the detecting anchor phrases in the document and querying the entity set corresponding to each anchor phrase by using the external knowledge base includes the following steps:
S11, defining an entity as a determinate, unambiguous object in the knowledge base; an "anchor phrase" is the literal surface text; one anchor phrase may correspond to multiple entities, and one entity may also be represented by multiple anchor phrases;
S12, collecting all anchor phrases in the external corpus Wikipedia; for each anchor phrase s, all entities {e1, e2, ..., eK} linked to it form its entity dictionary; all anchor phrases together with their entity dictionaries constitute the Wikipedia dictionary;
S13, extracting all n-gram phrases (n ≤ k) in the document T, where an n-gram phrase is a phrase formed by n words; if an n-gram exists as an anchor phrase in the Wikipedia dictionary and has at least two corresponding entities, it is added to the candidate anchor phrases; for n-gram phrases with overlapping coverage, the "first longest" rule is adopted, namely selecting the longest n-gram phrase that appears first; all anchor phrases in a document are expressed as:
U(T)={c1,c2,...}
the entity set corresponding to the ith anchor phrase is represented as:
E(ci)={e1,e2,...}。
further, in step S2, averaging the document word vectors to obtain a context vector of the document, including the following steps:
S21, pre-training with the Wikipedia2Vec tool to obtain an embedding matrix of words and entities; let the word vector of the ith word in the document be xi ∈ R^d, where R^d denotes the d-dimensional real vector space and d is the embedding dimension; if the document length is n, the sentence is represented as:
x1:n=[x1;x2;...;xn]
s22, averaging the word vectors of the document T to obtain the context vector of the document, wherein the calculation formula is as follows:
C = (1/n) · Σ(i=1..n) xi

where C is the context vector of the document.
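The averaging in S22 is a single mean over the word-vector matrix; a minimal NumPy sketch (the 4-dimensional toy embeddings are assumptions for illustration):

```python
import numpy as np

# Toy word embeddings: one row per word, d = 4 (illustrative values).
x = np.array([
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 2.0, 0.0, 2.0],
    [2.0, 1.0, 1.0, 1.0],
])

# C = (1/n) * sum_i x_i : average over the n word vectors.
C = x.mean(axis=0)
print(C)  # [1. 1. 1. 1.]
```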
Further, in step S3, the step of respectively calculating attention weights of entities corresponding to the anchor phrases under the document context representation vector to obtain disambiguation vectors of the anchor phrases includes the following steps:
S31, obtaining the vector representations of the entities matched in step S1 from the embedding matrix pre-trained with the Wikipedia2Vec tool in step S21; let the jth entity vector corresponding to the ith anchor phrase in the document be eij ∈ R^d;
S32, for each anchor phrase, calculating the attention weight of each corresponding entity vector under the context representation vector obtained in step S2, and then computing the weighted sum of the entity vectors to obtain the disambiguation vector of the anchor phrase, with the calculation formulas:

αij = exp(eij · C) / Σ(j'=1..v) exp(eij' · C)

zi = Σ(j=1..v) αij · eij

wherein αij is the attention weight of the jth entity corresponding to the ith anchor phrase of the document under context C, v is the number of entities corresponding to the ith anchor phrase of the document, and zi is the disambiguation vector of the ith anchor phrase of the document.
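One common instantiation of these attention weights is a softmax over dot-product scores between each entity vector and the context; the sketch below adopts that form as an assumption, with toy entity vectors:

```python
import numpy as np

def disambiguation_vector(entity_vecs, context):
    """z_i = sum_j alpha_ij * e_ij, where alpha_ij is a softmax over
    dot-product scores between each entity vector and the context."""
    scores = entity_vecs @ context                    # shape (v,)
    weights = np.exp(scores - scores.max())           # numerically stable softmax
    weights /= weights.sum()
    return weights @ entity_vecs                      # shape (d,)

# Two candidate entities for one anchor phrase (illustrative values).
entities = np.array([[1.0, 0.0], [0.0, 1.0]])
context = np.array([4.0, 0.0])  # context points toward entity 0
z = disambiguation_vector(entities, context)
print(z)  # dominated by entity 0, weight e^4 / (e^4 + 1)
```

The weighted sum keeps a soft mixture of candidates rather than a hard choice, so a confidently matched entity dominates while uncertain cases retain some mass on alternatives.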
Further, in step S4, replacing the entities at their original text positions with disambiguated entity vectors and inputting them into a long short-term memory network to obtain a disambiguated document representation vector, inputting that vector into the fully connected layer of the neural network, and calculating the probability that each text belongs to each category with a classifier to train the network comprises the following steps:
S41, replacing the anchor phrases of the original document with the corresponding disambiguation vectors obtained in step S3, so that the document can be represented as T = [x1; ...; z1; ...; zv; ...; xn], where zv denotes the last disambiguation vector and xn the last original word vector; for convenience of description, this is denoted [l1; ...; lr], where r is the number of vectors contained after replacement;
S42, for the document T, inputting the word vectors and disambiguation vectors into a bidirectional long short-term memory network in order: the forward LSTM reads l1, ..., lr in sequence, and the backward LSTM reads lr, ..., l1 in sequence; the hidden states of each position in the forward and backward directions are calculated and summed to obtain the final disambiguated document representation vector, with the calculation formulas:

hf(i) = f(li, hf(i−1))

hb(i) = f(li, hb(i+1))

o = Σ(i=1..r) (hf(i) + hb(i))

wherein li is the ith vector in the document representation, f is the hidden-state computation function of the LSTM, hf(i) is the hidden state of the ith vector in the forward LSTM, hb(i) is its hidden state in the backward LSTM, and o is the disambiguated representation vector of the document;
S43, inputting the disambiguation vector of the document into the fully connected layer, calculating the probability that the document belongs to each category with softmax normalization, taking the negative log-likelihood as the loss function, updating the model parameters by stochastic gradient descent with back-propagation, and training the model by minimizing the loss function, with the calculation formulas:

p = softmax(Wc·o + bc)

L(θ) = −Σ(x,y) log p(y | x; θ)

wherein Wc is the fully connected layer weight matrix, bc is the bias term, softmax is the normalization operation, p is the probability of the document belonging to each category, x is a document in the training set, y is its true category label, and θ denotes the model parameters.
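Steps S42 and S43 can be sketched with PyTorch as follows; the BiLSTM sums the forward and backward hidden states over all positions and trains with cross-entropy (the negative log-likelihood of the softmax). The class name, dimensions, and random inputs are illustrative assumptions:

```python
import torch
import torch.nn as nn

class EntityReplacementClassifier(nn.Module):
    """Minimal sketch of steps S42-S43: a BiLSTM over the replaced
    vector sequence, hidden states summed into a document vector o,
    then a fully connected layer whose softmax gives class probabilities."""

    def __init__(self, d, hidden, n_classes):
        super().__init__()
        self.lstm = nn.LSTM(d, hidden, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, seq):                 # seq: (batch, r, d)
        out, _ = self.lstm(seq)             # (batch, r, 2*hidden)
        fwd, bwd = out.chunk(2, dim=-1)     # forward / backward halves
        o = (fwd + bwd).sum(dim=1)          # sum directions, then positions
        return self.fc(o)                   # logits; softmax applied in the loss

model = EntityReplacementClassifier(d=8, hidden=16, n_classes=3)
seq = torch.randn(2, 5, 8)                  # 2 documents, r = 5 vectors each
logits = model(seq)
labels = torch.tensor([0, 2])
loss = nn.functional.cross_entropy(logits, labels)  # NLL of the softmax
loss.backward()                             # an SGD update step would follow
print(logits.shape)  # torch.Size([2, 3])
```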
The invention has the following advantages and beneficial effects:
the invention provides a text classification method based on entity replacement, which is characterized in that a knowledge base and an attention mechanism are utilized to find out a proper entity to replace a semantically fuzzy word in an original text, and a document expression vector after ambiguity removal is obtained. The semantic ambiguity problem is solved, and simultaneously the word order information and the descriptive information in the original text are kept. Therefore, the understanding of the model to the semantics of the documents is improved, and the documents are classified more reliably and accurately.
The main innovation of the method is that semantically ambiguous phrases or words at their corresponding positions in the original document are replaced with unambiguous entities from the knowledge base, rather than merely extracting the entities and treating them as an unordered set, so that word order information and other descriptive information are retained. For each ambiguous phrase, the most likely entity is selected with an attention mechanism, which improves the accuracy of entity determination.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of the present invention;
fig. 2 is a network structure diagram of a text classification method based on entity replacement according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail and clearly with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
The technical scheme for solving the technical problems is as follows:
the invention mainly provides a text classification method based on entity replacement. The process flow shown in fig. 1 is used. An entity set related to the document is found out by using a knowledge base, a correct entity is selected by using an attention mechanism shown in fig. 2, a semantic fuzzy word in the original text is replaced, a document expression vector after ambiguity removal is obtained, and the language sequence information and the descriptive information in the original text are also retained while the semantic fuzzy problem is solved.
The text classification method based on entity replacement comprises the following steps:
s1, detecting the anchor phrases in the document by using an external knowledge base and inquiring an entity set corresponding to each anchor phrase;
in this embodiment, the sub-steps of specifically implementing S1 are as follows:
S11, defining an entity as a determinate, unambiguous object in the knowledge base; an "anchor phrase" is the literal surface text. One anchor phrase may correspond to multiple entities, and one entity may also be represented by multiple anchor phrases;
S12, collecting all anchor phrases in the external corpus Wikipedia; for each anchor phrase s, all entities {e1, e2, ..., eK} linked to it form its entity dictionary. All anchor phrases together with their entity dictionaries constitute the Wikipedia dictionary;
S13, extracting all n-gram phrases (n ≤ k) in the document T; if an n-gram exists as an anchor phrase in the Wikipedia dictionary and has at least two corresponding entities, it is added to the candidate anchor phrases. All anchor phrases in a document are represented as:
U(T)={c1,c2,...}
the entity set corresponding to the ith anchor phrase is represented as:
E(ci)={e1,e2,...}
S2, averaging the word vectors of the document to obtain a context vector of the document;
in this embodiment, the sub-steps of specifically implementing S2 are as follows:
S21, pre-training with the Wikipedia2Vec tool to obtain an embedding matrix of words and entities; let the word vector of the ith word in the document be xi ∈ R^d; if the document length is n, the sentence is represented as:
x1:n=[x1;x2;...;xn]
s22: averaging the word vectors of the document T to obtain a context vector of the document, wherein the calculation formula is as follows:
C = (1/n) · Σ(i=1..n) xi

where C is the context vector of the document.
S3, respectively calculating the attention weight of the entity corresponding to each anchor phrase under the document context representation vector to obtain a disambiguation vector for each anchor phrase;
in this embodiment, the sub-steps of specifically implementing S3 are as follows:
S31, obtaining the vector representations of the entities matched in step S1 from the embedding matrix pre-trained with the Wikipedia2Vec tool in step S21. Let the jth entity vector corresponding to the ith anchor phrase in the document be eij ∈ R^d.
S32, for each anchor phrase, calculating the attention weight of each corresponding entity vector under the context representation vector obtained in step S2, and then computing the weighted sum of the entity vectors to obtain the disambiguation vector of the anchor phrase. The calculation formulas are as follows:

αij = exp(eij · C) / Σ(j'=1..v) exp(eij' · C)

zi = Σ(j=1..v) αij · eij

wherein αij is the attention weight of the jth entity corresponding to the ith anchor phrase of the document under context C, v is the number of entities corresponding to the ith anchor phrase of the document, and zi is the disambiguation vector of the ith anchor phrase of the document.
S4, replacing the entities at their original text positions with disambiguated entity vectors and inputting them into a long short-term memory network to obtain a disambiguated document representation vector, inputting the disambiguated document representation vector into a fully connected layer of the neural network, and using a classifier to calculate the probability that each text belongs to each category so as to train the network;
in this embodiment, the sub-steps of specifically implementing S4 are as follows:
S41, replacing the anchor phrases of the original document with the corresponding disambiguation vectors obtained in step S3, so that the document can be represented as T = [x1; ...; z1; ...; zv; ...; xn]; for convenience of description, this is denoted [l1; ...; lr], where r is the number of vectors contained after replacement;
S42, for the document T, inputting the word vectors and disambiguation vectors into a bidirectional long short-term memory network in order: the forward LSTM reads l1, ..., lr in sequence, and the backward LSTM reads lr, ..., l1 in sequence; the hidden states of each position in the forward and backward directions are calculated and summed to obtain the final disambiguated document representation vector. The calculation formulas are as follows:

hf(i) = f(li, hf(i−1))

hb(i) = f(li, hb(i+1))

o = Σ(i=1..r) (hf(i) + hb(i))

wherein li is the ith vector in the document representation, f is the hidden-state computation function of the LSTM, hf(i) is the hidden state of the ith vector in the forward LSTM, hb(i) is its hidden state in the backward LSTM, and o is the disambiguated representation vector of the document;
S43, inputting the disambiguation vector of the document into the fully connected layer, calculating the probability that the document belongs to each category with softmax normalization, taking the negative log-likelihood as the loss function, updating the model parameters by stochastic gradient descent with back-propagation, and training the model by minimizing the loss function. The calculation formulas are as follows:

p = softmax(Wc·o + bc)

L(θ) = −Σ(x,y) log p(y | x; θ)

wherein Wc is the fully connected layer weight matrix, bc is the bias term, softmax is the normalization operation, p is the probability of the document belonging to each category, x is a document in the training set, y is its true category label, and θ denotes the model parameters.
And S5, predicting the category of the text to be predicted by using the trained model, and taking the category with the highest probability as the predicted category to be output.
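The prediction of step S5 reduces to a softmax over the classifier outputs followed by an argmax; a minimal sketch with illustrative logits and category labels:

```python
import numpy as np

def predict_category(logits, labels):
    """Apply softmax to classifier logits and return the label with
    the highest probability, plus the full distribution (step S5)."""
    z = np.exp(logits - np.max(logits))  # numerically stable softmax
    p = z / z.sum()
    return labels[int(np.argmax(p))], p

labels = ["sports", "technology", "finance"]
category, p = predict_category(np.array([0.2, 2.5, -1.0]), labels)
print(category)  # technology
```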
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above examples are to be construed as merely illustrative and not limitative of the remainder of the disclosure. After reading the description of the invention, the skilled person can make various changes or modifications to the invention, and these equivalent changes and modifications also fall into the scope of the invention defined by the claims.
Claims (5)
1. A text classification method based on entity replacement is characterized by comprising the following steps:
s1, detecting the anchor phrases in the document by using an external knowledge base and inquiring an entity set corresponding to each anchor phrase;
S2, averaging the word vectors of the document to obtain a context vector of the document;
S3, respectively calculating the attention weight of the entity corresponding to each anchor phrase under the document context representation vector to obtain a disambiguation vector for each anchor phrase;
S4, replacing the entities at their original text positions with disambiguated entity vectors and inputting them into a long short-term memory network to obtain a disambiguated document representation vector, inputting the disambiguated document representation vector into a fully connected layer of the neural network, and using a classifier to calculate the probability that each text belongs to each category so as to train the network;
and S5, predicting the category of the text to be predicted by using the trained model, and taking the category with the highest probability as the predicted category to be output.
2. The text classification method based on entity replacement according to claim 1, wherein in step S1, detecting the anchor phrases in the document by using the external knowledge base and querying the entity set corresponding to each anchor phrase comprises the following steps:
S11, defining an entity as a determinate, unambiguous object in the knowledge base; an "anchor phrase" is the literal surface text; one anchor phrase may correspond to multiple entities, and one entity may also be represented by multiple anchor phrases;
S12, collecting all anchor phrases in the external corpus Wikipedia; for each anchor phrase s, all entities {e1, e2, ..., eK} linked to it form its entity dictionary; all anchor phrases together with their entity dictionaries constitute the Wikipedia dictionary;
S13, extracting all n-gram phrases (n ≤ k) in the document T, where an n-gram phrase is a phrase formed by n words; if an n-gram exists as an anchor phrase in the Wikipedia dictionary and has at least two corresponding entities, it is added to the candidate anchor phrases; for n-gram phrases with overlapping coverage, the "first longest" rule is adopted, namely selecting the longest n-gram phrase that appears first; all anchor phrases in a document are expressed as:
U(T)={c1,c2,...}
the entity set corresponding to the ith anchor phrase is represented as:
E(ci)={e1,e2,...}。
3. The text classification method based on entity replacement according to claim 2, wherein in step S2, averaging the document word vectors to obtain a context vector of the document comprises the following steps:
S21, pre-training with the Wikipedia2Vec tool to obtain an embedding matrix of words and entities; let the word vector of the ith word in the document be xi ∈ R^d, where R^d denotes the d-dimensional real vector space and d is the embedding dimension; if the document length is n, the sentence is represented as:
x1:n=[x1;x2;...;xn]
s22, averaging the word vectors of the document T to obtain the context vector of the document, wherein the calculation formula is as follows:
C = (1/n) · Σ(i=1..n) xi

where C is the context vector of the document.
4. The text classification method based on entity replacement according to claim 3, wherein in step S3, respectively calculating the attention weight of the entity corresponding to each anchor phrase under the document context representation vector to obtain a disambiguation vector for each anchor phrase comprises the following steps:
S31, obtaining the vector representations of the entities matched in step S1 from the embedding matrix pre-trained with the Wikipedia2Vec tool in step S21; let the jth entity vector corresponding to the ith anchor phrase in the document be eij ∈ R^d;
S32, for each anchor phrase, calculating the attention weight of each corresponding entity vector under the context representation vector obtained in step S2, and then computing the weighted sum of the entity vectors to obtain the disambiguation vector of the anchor phrase, with the calculation formulas:

αij = exp(eij · C) / Σ(j'=1..v) exp(eij' · C)

zi = Σ(j=1..v) αij · eij

wherein αij is the attention weight of the jth entity corresponding to the ith anchor phrase of the document under context C, v is the number of entities corresponding to the ith anchor phrase of the document, and zi is the disambiguation vector of the ith anchor phrase of the document.
5. The text classification method based on entity replacement according to claim 4, wherein in step S4, replacing the entities at their original text positions with disambiguated entity vectors and inputting them into a long short-term memory network to obtain a disambiguated document representation vector, inputting that vector into a fully connected layer of the neural network, and calculating the probability that each text belongs to each category with a classifier to train the network comprises the following steps:
S41, replacing the anchor phrases of the original document with the corresponding disambiguation vectors obtained in step S3, so that the document can be represented as T = [x1; ...; z1; ...; zv; ...; xn], where zv denotes the last disambiguation vector and xn the last original word vector; for convenience of description, this is denoted [l1; ...; lr], where r is the number of vectors contained after replacement;
S42, for the document T, inputting the word vectors and disambiguation vectors into a bidirectional long short-term memory network in order: the forward LSTM reads l1, ..., lr in sequence, and the backward LSTM reads lr, ..., l1 in sequence; the hidden states of each position in the forward and backward directions are calculated and summed to obtain the final disambiguated document representation vector, with the calculation formulas:

hf(i) = f(li, hf(i−1))

hb(i) = f(li, hb(i+1))

o = Σ(i=1..r) (hf(i) + hb(i))

wherein li is the ith vector in the document representation, f is the hidden-state computation function of the LSTM, hf(i) is the hidden state of the ith vector in the forward LSTM, hb(i) is its hidden state in the backward LSTM, and o is the disambiguated representation vector of the document;
S43, inputting the disambiguation vector of the document into the fully connected layer, calculating the probability that the document belongs to each category with softmax normalization, taking the negative log-likelihood as the loss function, updating the model parameters by stochastic gradient descent with back-propagation, and training the model by minimizing the loss function, with the calculation formulas:

p = softmax(Wc·o + bc)

L(θ) = −Σ(x,y) log p(y | x; θ)

wherein Wc is the fully connected layer weight matrix, bc is the bias term, softmax is the normalization operation, p is the probability of the document belonging to each category, x is a document in the training set, y is its true category label, and θ denotes the model parameters.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011131161.6A CN112215000B (en) | 2020-10-21 | 2020-10-21 | Text classification method based on entity replacement |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011131161.6A CN112215000B (en) | 2020-10-21 | 2020-10-21 | Text classification method based on entity replacement |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112215000A true CN112215000A (en) | 2021-01-12 |
CN112215000B CN112215000B (en) | 2022-08-23 |
Family
ID=74056225
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011131161.6A Active CN112215000B (en) | 2020-10-21 | 2020-10-21 | Text classification method based on entity replacement |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112215000B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102207945A (en) * | 2010-05-11 | 2011-10-05 | Tianjin Hylanda Information Technology Co., Ltd. | Knowledge network-based text indexing system and method |
CN103150382A (en) * | 2013-03-14 | 2013-06-12 | Institute of Computing Technology, Chinese Academy of Sciences | Automatic short text semantic concept expansion method and system based on open knowledge base |
CN103177075A (en) * | 2011-12-30 | 2013-06-26 | Microsoft Corporation | Knowledge-based entity detection and disambiguation |
CN106716402A (en) * | 2014-05-12 | 2017-05-24 | Diffeo, Inc. | Entity-centric knowledge discovery |
CN108549723A (en) * | 2018-04-28 | 2018-09-18 | Beijing Ultrapower Software Co., Ltd. | Text concept classification method, device and server |
CN108984745A (en) * | 2018-07-16 | 2018-12-11 | Fuzhou University | Neural network text classification method fusing multiple knowledge graphs |
CN109657238A (en) * | 2018-12-10 | 2019-04-19 | Ningbo Shenqing Information Technology Co., Ltd. | Knowledge-graph-based context recognition and completion method, system, terminal and medium |
CN110825848A (en) * | 2019-06-10 | 2020-02-21 | Beijing Institute of Technology | Text classification method based on phrase vectors |
CN111199155A (en) * | 2018-10-30 | 2020-05-26 | Feihu Information Technology (Tianjin) Co., Ltd. | Text classification method and device |
CN111209410A (en) * | 2019-12-27 | 2020-05-29 | China University of Geosciences (Wuhan) | Anchor point-based dynamic knowledge graph representation learning method and system |
CN111488455A (en) * | 2020-04-03 | 2020-08-04 | Shanghai Xielv Information Technology Co., Ltd. | Model training method, text classification method, system, device and medium |
2020-10-21: Application CN202011131161.6A filed in China (CN); granted as patent CN112215000B (status: Active)
Non-Patent Citations (3)
Title |
---|
SHENGZE HU et al.: "Entity Linking via Symmetrical Attention-Based Neural Network and Entity Structural Features", Symmetry * |
WEI SHEN et al.: "Entity Linking with a Knowledge Base: Issues, Techniques, and Solutions", IEEE Transactions on Knowledge and Data Engineering * |
NIU Yitong: "Research on named entity disambiguation methods based on knowledge graphs", Computer Products and Circulation * |
Also Published As
Publication number | Publication date |
---|---|
CN112215000B (en) | 2022-08-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Kim et al. | Two-stage multi-intent detection for spoken language understanding | |
CN110245229B (en) | Deep learning theme emotion classification method based on data enhancement | |
CN109726389B (en) | Chinese missing pronoun completion method based on common sense and reasoning | |
WO2021109671A1 (en) | Fine-granularity sentiment analysis method supporting cross-language transfer | |
McDonald et al. | Identifying gene and protein mentions in text using conditional random fields | |
CN111738003B (en) | Named entity recognition model training method, named entity recognition method and medium | |
CN111324752B (en) | Image and text retrieval method based on graph neural network structure modeling | |
CN107832306A (en) | Doc2vec-based similar entity mining method | |
CN109800437A (en) | Named entity recognition method based on feature fusion | |
CN110263325B (en) | Chinese word segmentation system | |
CN110309514A (en) | Semantic recognition method and device | |
US20180357531A1 (en) | Method for Text Classification and Feature Selection Using Class Vectors and the System Thereof | |
US20230122900A1 (en) | Speech recognition method and apparatus | |
CN113591483A (en) | Document-level event argument extraction method based on sequence labeling | |
CN111104509B (en) | Entity relationship classification method based on probability distribution self-adaption | |
CN110489523B (en) | Fine-grained emotion analysis method based on online shopping evaluation | |
CN108491382A (en) | Semi-supervised biomedical text semantic disambiguation method | |
CN109408802A (en) | Method, system and storage medium for enhancing sentence vector semantics | |
CN111222330B (en) | Chinese event detection method and system | |
CN113128203A (en) | Attention mechanism-based relationship extraction method, system, equipment and storage medium | |
CN111666752A (en) | Circuit teaching material entity relation extraction method based on keyword attention mechanism | |
CN114781375A (en) | Military equipment relation extraction method based on BERT and attention mechanism | |
Huang et al. | Text classification with document embeddings | |
Yu et al. | Stance detection in Chinese microblogs with neural networks | |
CN112699685A (en) | Named entity recognition method based on label-guided word fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||