CN106909537A - A polysemy analysis method based on a topic model and vector space - Google Patents

A polysemy analysis method based on a topic model and vector space

Info

Publication number
CN106909537A
Authority
CN
China
Prior art keywords
word
vector
topic
representing
theme
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710067919.6A
Other languages
Chinese (zh)
Other versions
CN106909537B (en)
Inventor
罗嘉文
卓汉逵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN201710067919.6A priority Critical patent/CN106909537B/en
Publication of CN106909537A publication Critical patent/CN106909537A/en
Application granted granted Critical
Publication of CN106909537B publication Critical patent/CN106909537B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a polysemy analysis method based on a topic model and vector space, including: S1, establishing a polysemy topic model with formula (1) as the objective function; S2, reading the data of the whole document collection D; S3, initializing the topic-word distribution φ; S4, topic sampling; S5, topic vector updating; S6, word vector training; S7, performing S4 to S6 in a loop several times, for several iterations; S8, outputting and storing the obtained word vectors and topic vectors; S9, judging whether a word is polysemous. The method can train higher-quality word vectors and topic vectors, so that they give more reasonable explanations in the study and analysis of polysemy, and the performance of the topic model is significantly better than that of the original LDA model. Through cross-learning among the topic model, the word vectors, and the topic vectors, the three improve one another, and the method can be effectively applied to tasks such as similarity evaluation, document classification, and topic relevance.

Description

Polysemy analysis method based on a topic model and vector space
Technical Field
The invention relates to the field of natural language processing, and in particular to a polysemy analysis method based on a topic model and a vector space.
Background
With the vigorous development of artificial intelligence technology, natural language processing has emerged as an innovative mode of language research that combines computer science, linguistics, and mathematics into a single intelligent science, and it is widely applied in machine translation, question-answering systems, information retrieval, document processing, and similar areas. Most words have more than one meaning, that is, polysemy is pervasive; if each word is represented by a single word vector, this ambiguity cannot be resolved. To address the problem, context information or topic vectors have been used to assist the study of word ambiguity, but such work isolates the topic model, the word vectors, and the topic vectors from one another, simply using existing results as prior knowledge to assist in training the model.
The topic model mines the hidden topic information of a document collection; each topic represents a related concept, embodied as a series of related words, and is realized as a topic-word distribution. The word vector model maps each word to a low-dimensional real-valued space using the context information in the text; the vectors carry syntactic and semantic information, so the similarity of word vectors can be measured by Euclidean distance or the cosine of the angle between them. The topic vector maps a topic directly into the vector space, approximately representing the semantic center of the topic.
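For illustration only, the word-vector similarity measurement just described can be sketched in a few lines of Python; the vectors below are hypothetical stand-ins, not trained vectors:

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two word vectors; values near 1 mean similar."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical 3-dimensional word vectors, for illustration only.
v_money = np.array([0.9, 0.1, 0.0])
v_cash  = np.array([0.8, 0.2, 0.1])
print(cosine_similarity(v_money, v_cash))  # high value -> semantically close
```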
Topic models, word vectors, and topic vectors can all be used for document representation and are mainly applied to tasks such as document clustering and document classification. Each of the three has its own strengths in text mining, and research has shown that combining the global information of a topic model with the local information of word vectors helps improve on the original models. However, existing research has a major limitation: most of it treats the three in isolation, or trains one or two of them independently and then uses the training result to improve the remaining one; or it directly uses training results from a larger training set as external knowledge to assist model training on other, smaller data sets.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a polysemy analysis method based on a topic model and a vector space that models a text document collection and draws on the strengths of the topic model, word vectors, and topic vectors, so as to better mine the hidden topic information of the document collection.
In order to achieve the purpose, the invention adopts the following technical scheme:
A polysemy analysis method based on a topic model and a vector space comprises the following steps:
S1, establishing a polysemy topic model with formula (1) as the objective function:
where D is the collection of text documents, M is the number of documents in the collection, N_m is the number of words in the m-th document, c is the size of the context window, w_{m,n} denotes the n-th word of the m-th document, K denotes the number of topics, t_k denotes the k-th topic vector, φ denotes the topic-word distribution of the topic model, and z_{m,n} denotes the topic number of w_{m,n};
S2, reading the data of the whole document collection D;
S3, initializing the topic-word distribution φ: first, the GibbsLDA algorithm is used to sample a topic for each word in the text document collection D; then, an initialization estimate of the topic-word distribution φ of the topic model is computed;
S4, topic sampling: for each word w_{m,n} in a document, compute the probability that the word belongs to each topic, and sample the corresponding topic number z_{m,n} ∈ [1, K] by the cumulative-distribution method;
S5, topic vector updating: for each topic vector t_k, k ∈ [1, K], recompute its vector representation according to equation (5):
where I(x) is the indicator function, equal to 1 when x is true and 0 otherwise, v(w_{m,n}) denotes the word vector corresponding to w_{m,n}, W denotes the vocabulary size of the document collection, and n_{k,w} is the number of times word w is assigned to topic k;
S6, word vector training: construct a Huffman tree whose leaf nodes are the words w of the vocabulary and whose non-leaf nodes serve as auxiliary vectors u, and solve the objective function shown in formula (1) by stochastic gradient descent;
S7, performing S4 to S6 in a loop several times, for several iterations;
S8, outputting and storing the obtained word vectors and topic vectors;
S9, judging whether a word is polysemous: concatenate the word vector and the topic vector of the word to be analyzed to form a new vector representing the whole context; then compute the cosine value between the new vectors; when the cosine value is smaller than a set threshold, the word is judged to be polysemous; otherwise, the word is judged not to be polysemous.
Further, in S3, the update rule used in the topic sampling process is shown in equation (2):
where the superscript -(m,n) denotes excluding the current word from the counts, W denotes the vocabulary size of the text document collection D, n_{m,k} denotes the number of words in the m-th document assigned to topic k, z_{m,n} denotes the topic number assigned to w_{m,n}, n_{k,w_{m,n}} denotes the number of times w_{m,n} is assigned to topic k, n_{k'} denotes the total number of words assigned to topic k', and α is a symmetric Dirichlet hyperparameter;
the initialization estimate of the topic-word distribution φ of the topic model is computed by formula (3):
where φ̂ denotes the initialization estimate of the topic-word distribution and β is a symmetric Dirichlet hyperparameter.
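As an illustration of S3, the following Python sketch performs the collapsed Gibbs sampling of equation (2) and then computes a smoothed topic-word estimate. Equation (3) is assumed here to be the standard (n_{k,w} + β)/(Σ_w n_{k,w} + Wβ) form, since the patent's formula images are not reproduced on this page; docs is a list of documents, each a list of integer word ids.

```python
import numpy as np

def gibbs_init(docs, K, W, alpha, beta, n_sweeps=50, seed=0):
    """Collapsed Gibbs sampling with the update rule of equation (2), then a
    smoothed topic-word estimate for phi (an assumed form of equation (3))."""
    rng = np.random.default_rng(seed)
    n_kw = np.zeros((K, W))                      # topic-word counts
    n_mk = np.zeros((len(docs), K))              # document-topic counts
    z = [rng.integers(0, K, size=len(d)) for d in docs]  # random topic init
    for m, d in enumerate(docs):                 # accumulate the initial counts
        for n, w in enumerate(d):
            n_kw[z[m][n], w] += 1
            n_mk[m, z[m][n]] += 1
    for _ in range(n_sweeps):
        for m, d in enumerate(docs):
            for n, w in enumerate(d):
                k_old = z[m][n]                  # the -(m,n) counts: remove word
                n_kw[k_old, w] -= 1
                n_mk[m, k_old] -= 1
                # Equation (2); the second factor's denominator sums over all
                # topics, is constant in k, and is therefore dropped here.
                p = ((n_kw[:, w] + beta) / (n_kw.sum(axis=1) + W * beta)
                     * (n_mk[m] + alpha))
                k_new = rng.choice(K, p=p / p.sum())
                z[m][n] = k_new
                n_kw[k_new, w] += 1
                n_mk[m, k_new] += 1
    phi_hat = (n_kw + beta) / (n_kw.sum(axis=1, keepdims=True) + W * beta)
    return z, phi_hat
```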
Further, in S4, the probability that the word belongs to each topic is calculated according to equation (4):
Further, S6 specifically comprises the following steps:
S601, updating the topic-word distribution φ: compute the gradient of each component of φ according to equation (6); for each component, a normalization constraint is defined so that φ remains a valid probability distribution;
where L(w_{m,n+j}) denotes the length of the path from the root node of the Huffman tree to the leaf node w_{m,n+j} (the number of nodes, including the root and the leaf), d_i denotes the Huffman code of edge i → i+1 on the path, σ(x) = 1/(1+e^{-x}), and u_i denotes the i-th non-leaf node vector on the path;
S602, updating the word vectors w: compute the gradient of each word according to formula (7) and update it using the auxiliary vectors;
S603, updating the auxiliary vectors u of the non-leaf nodes of the Huffman tree: compute the non-leaf node vectors u on the Huffman tree paths according to formula (8), so that they can influence the training quality of the word vectors w;
Further, in S9, the set threshold value is 0.6.
The polysemy analysis method based on a topic model and vector space can train higher-quality word vectors and topic vectors, so that they give more reasonable explanations in the study and analysis of polysemy, and the performance of the topic model is significantly better than that of the original LDA model. Through cross-learning among the topic model, the word vectors, and the topic vectors, the three improve one another, and the method can be effectively applied to tasks such as similarity evaluation, document classification, and topic relevance.
Drawings
Fig. 1 is a schematic flowchart of the polysemy analysis method based on a topic model and vector space according to an embodiment of the present invention.
Detailed Description
The technical solution of the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
In order to fully exploit the intrinsic characteristics of the topic model, word vectors, and topic vectors, to account for the pervasiveness of polysemy in text data, to better mine the latent topic information of a document collection, and to train word vectors and topic vectors of higher quality, the invention provides a polysemy analysis method based on a topic model and a vector space.
Specifically, the present invention makes the following reasonable assumptions according to the basic rules of natural language processing:
1. The topic-word distribution φ of a topic model can represent a specific concept through a series of high-probability words; numerically, each entry is the probability that a given word appears under the topic, and the quality of the mined topics can be evaluated through topic relevance.
2. Each word in a text can be mapped into a low-dimensional real-valued vector space as a word vector that carries information such as the word's syntax and semantics; the differences between words can then be evaluated by mathematical means such as Euclidean distance or the cosine of the angle.
3. The topic vectors and the topic-word distribution φ of the topic model are not completely isolated from each other: a topic vector can be viewed as the mapping of the semantic center of the probability distribution into the word vector space, and is therefore closely associated with the word vectors.
Based on the above assumptions, the present invention provides a polysemy analysis method based on a topic model and a vector space. As shown in fig. 1, the method comprises the following steps:
S1, establishing a polysemy topic model with formula (1) as the objective function:
where D is the collection of text documents, M is the number of documents in the collection, N_m is the number of words in the m-th document, c is the size of the context window, w_{m,n} denotes the n-th word of the m-th document, K denotes the number of topics, t_k denotes the k-th topic vector, φ denotes the topic-word distribution of the topic model, and z_{m,n} denotes the topic number of w_{m,n};
S2, reading the data of the whole text document collection D;
S3, initializing the topic-word distribution φ: first, the GibbsLDA algorithm is used to sample a topic for each word in the text document collection D; then, an initialization estimate of the topic-word distribution φ of the topic model is computed;
The update rule used in the topic sampling process is shown in equation (2):
where the superscript -(m,n) denotes excluding the current word from the counts, W denotes the vocabulary size of the text document collection D, n_{m,k} denotes the number of words in the m-th document assigned to topic k, z_{m,n} denotes the topic number assigned to w_{m,n}, n_{k,w_{m,n}} denotes the number of times w_{m,n} is assigned to topic k, n_{k'} denotes the total number of words assigned to topic k', and α is a symmetric Dirichlet hyperparameter;
the initialization estimate of the topic-word distribution φ of the topic model is computed by formula (3):
where φ̂ denotes the initialization estimate of the topic-word distribution and β is a symmetric Dirichlet hyperparameter;
S4, topic sampling: for each word w_{m,n} in a document, compute the probability that the word belongs to each topic according to equation (4), and sample the corresponding topic number z_{m,n} ∈ [1, K] by the cumulative-distribution method;
In the topic model, the latent variable z is an indispensable intermediate bridge in the Gibbs sampling procedure and directly influences the topic-word distribution φ and the document-topic distribution θ that the topic model ultimately needs to obtain. Unlike the original Gibbs update rule, the present invention adopts equation (4) as the Gibbs update rule, which is characterized by using the topic-word distribution φ directly in the update. The benefit is that the statistical and practical meaning of φ is fully exploited, the computation is faster, and the method is better suited to large-scale data sets.
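The "cumulative-distribution" sampling of S4 can be sketched as follows; p_unnorm would hold the K per-topic values computed by equation (4), whose exact form (it uses φ directly, as described above) is contained in the patent's formula image:

```python
import numpy as np

def sample_topic_cumulative(p_unnorm: np.ndarray, rng: np.random.Generator) -> int:
    """Cumulative-distribution sampling as described in S4: accumulate the
    unnormalized per-topic probabilities and locate a uniform draw in the sums."""
    cdf = np.cumsum(p_unnorm)          # running sums of the K topic weights
    u = rng.random() * cdf[-1]         # uniform draw scaled to the total mass
    return int(np.searchsorted(cdf, u, side="right"))
```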
S5, topic vector updating: for each topic vector t_k, k ∈ [1, K], recompute its vector representation according to equation (5):
where I(x) is the indicator function, equal to 1 when x is true and 0 otherwise, v(w_{m,n}) denotes the word vector corresponding to w_{m,n}, W denotes the vocabulary size of the text document collection, and n_{k,w} is the number of times word w is assigned to topic k. That a word may be assigned to different topics is itself a subtle manifestation of polysemy. Moreover, the calculation and updating of the topic vectors can be carried out simultaneously without mutual interference.
The primary purpose of the topic vector is to represent the latent topic information of a document collection in vector space, rather than as a multinomial distribution like φ, so that a topic acquires spatial, geometric meaning and combines more closely with the word vectors. Unlike the TWE model, which trains topic vectors in a Skip-Gram-like manner, the present invention directly updates the vector representation of each topic by formula (5), so that each topic's vector depends only on the words under that topic. The advantages are that the topic vector's calculation is simple, easy to understand, fast, and efficient, and that the mean calculation brings the vector closer to the geometric center of the word vectors, which by assumption 2 can be approximately regarded as the semantic center of the topic concept.
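Equation (5) itself appears only as an image in the patent; the sketch below assumes the count-weighted mean form that this paragraph describes (a mean over the word vectors of the words assigned to each topic, weighted by the counts n_{k,w}):

```python
import numpy as np

def update_topic_vectors(word_vecs: np.ndarray, n_kw: np.ndarray) -> np.ndarray:
    """Recompute each topic vector t_k as the count-weighted mean of the word
    vectors assigned to that topic -- an assumed reading of equation (5).
    word_vecs: (W, d) word vector matrix; n_kw: (K, W) topic-word counts."""
    totals = n_kw.sum(axis=1, keepdims=True)           # tokens assigned per topic
    return (n_kw @ word_vecs) / np.maximum(totals, 1)  # mean = geometric center
```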
S6, word vector training: construct a Huffman tree whose leaf nodes are the words w of the vocabulary and whose non-leaf nodes serve as auxiliary vectors u, and solve the objective function shown in formula (1) by stochastic gradient descent. Specifically, S6 comprises the following steps:
S601, updating the topic-word distribution φ: compute the gradient of each component of φ according to equation (6); note that because the distribution has a probabilistic meaning, a normalization constraint must be defined for each component;
where L(w_{m,n+j}) denotes the length of the path from the root node of the Huffman tree to the leaf node w_{m,n+j} (the number of nodes, including the root and the leaf), d_i denotes the Huffman code of edge i → i+1 on the path, σ(x) = 1/(1+e^{-x}), and u_i denotes the i-th non-leaf node vector on the path;
S602, updating the word vectors w: compute the gradient of each word according to formula (7) and update it using the auxiliary vectors;
S603, updating the auxiliary vectors u of the non-leaf nodes of the Huffman tree: this step computes the non-leaf node vectors u on the Huffman tree paths according to formula (8), so that they influence the training quality of the leaf nodes (i.e., the word vectors w);
The computational complexity of the softmax function is linear in the vocabulary size W, which hinders training on large-scale data sets. The invention therefore follows the approximate computation of Skip-Gram and adopts the idea of hierarchical softmax: a Huffman tree is constructed whose leaf nodes are the words w of the vocabulary and whose non-leaf nodes serve as auxiliary vectors u. In the word vector training stage, the invention uses stochastic gradient descent to solve the objective function shown in formula (1). A characteristic of the topic-word distribution φ is that, as shown in equation (6), its gradient directly uses the topic vectors t_k and the non-leaf node vectors u of the Huffman tree. The benefit is that φ continuously absorbs the topic vectors t_k during iterative updating and exchanges information with the word vectors through the auxiliary vectors u, so that the update of φ directly or indirectly exploits the spatial characteristics of the topic vectors and the word vectors.
Further, the update gradients of the node vectors in the Huffman tree are given by formulas (7) and (8), respectively. Their characteristic is that the update of a non-leaf vector u directly uses the topic vectors and the topic distribution φ, while the update of a leaf node w directly uses the non-leaf vectors. The advantage is that the topic vectors and the topic distribution φ permeate the non-leaf nodes (i.e., the branches) of the whole Huffman tree, so the leaf nodes of the tree are influenced at a deeper level by both leaf and branch nodes, achieving mutual reinforcement.
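For illustration, a minimal sketch of one hierarchical-softmax gradient step along a Huffman path is given below, in Python with NumPy. It implements the standard Skip-Gram-style update only; the patent's formulas (6)-(8) additionally inject the topic-word distribution φ and the topic vectors t_k into these gradients, and their exact form appears in the formula images not reproduced on this page.

```python
import numpy as np

def hs_sgd_step(h, path_nodes, codes, U, lr=0.025):
    """One stochastic-gradient step of hierarchical softmax along a Huffman path.
    h          : vector being trained (a leaf / word vector), shape (d,)
    path_nodes : indices of the non-leaf nodes from the root toward the leaf
    codes      : the Huffman code bit (0/1) of each edge on that path
    U          : (num_internal, d) matrix of auxiliary non-leaf vectors u
    Standard Skip-Gram-style update only; the patent's formulas (6)-(8) also
    feed phi and the topic vectors t_k into these gradients (not shown here)."""
    e = np.zeros_like(h)                      # accumulated gradient w.r.t. h
    for j, code in zip(path_nodes, codes):
        f = 1.0 / (1.0 + np.exp(-U[j] @ h))   # sigma(u_j . h)
        g = lr * (1.0 - code - f)             # gradient scale at this node
        e += g * U[j]                         # contribution to the leaf update
        U[j] += g * h                         # update the auxiliary vector u_j
    h += e                                    # update the leaf (word) vector
    return h
```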
S7, performing S4 to S6 in a loop several times, for several iterations; the iteration serves mainly to further improve the topic model, the word vectors, and the topic vectors through cross-learning.
S8, outputting and storing the obtained word vectors and topic vectors;
S9, judging whether a word is polysemous: concatenate the word vector and the topic vector of the word to be analyzed to form a new vector representing the whole context; then compute the cosine value between the new vectors; when the cosine value is smaller than the set threshold (e.g., 0.60), the word exhibits polysemy; otherwise, the meaning of the word is consistent across the given contexts and the word exhibits no polysemy.
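One plausible reading of S9 (assumed here, since the patent text does not spell out how the two representations are paired) is to build one concatenated word-plus-topic vector per observed context and compare the two context representations by cosine:

```python
import numpy as np

def is_polysemous(word_vec: np.ndarray, topic_vec_a: np.ndarray,
                  topic_vec_b: np.ndarray, threshold: float = 0.6) -> bool:
    """Concatenate the word vector with the topic vector seen in each context,
    then compare the two context representations by cosine; a value below the
    0.6 threshold given in the patent indicates polysemy."""
    ctx_a = np.concatenate([word_vec, topic_vec_a])
    ctx_b = np.concatenate([word_vec, topic_vec_b])
    cos = ctx_a @ ctx_b / (np.linalg.norm(ctx_a) * np.linalg.norm(ctx_b))
    return cos < threshold
```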
The above embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they are not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (5)

1. A polysemy analysis method based on a topic model and a vector space, characterized by comprising the following steps:
S1, establishing a polysemy topic model with formula (1) as the objective function:
where D is the collection of text documents, M is the number of documents in the collection, N_m is the number of words in the m-th document, c is the size of the context window, w_{m,n} denotes the n-th word of the m-th document, K denotes the number of topics, t_k denotes the k-th topic vector, φ denotes the topic-word distribution of the topic model, and z_{m,n} denotes the topic number of w_{m,n};
S2, reading the data of the whole document collection D;
S3, initializing the topic-word distribution φ: first, the GibbsLDA algorithm is used to sample a topic for each word in the text document collection D; then, an initialization estimate of the topic-word distribution φ of the topic model is computed;
S4, topic sampling: for each word w_{m,n} in a document, compute the probability that the word belongs to each topic, and sample the corresponding topic number z_{m,n} ∈ [1, K] by the cumulative-distribution method;
S5, topic vector updating: for each topic vector t_k, k ∈ [1, K], recompute its vector representation according to equation (5):
where I(x) is the indicator function, equal to 1 when x is true and 0 otherwise, v(w_{m,n}) denotes the word vector corresponding to w_{m,n}, W denotes the vocabulary size of the document collection, and n_{k,w} is the number of times word w is assigned to topic k;
S6, word vector training: construct a Huffman tree whose leaf nodes are the words w of the vocabulary and whose non-leaf nodes serve as auxiliary vectors u, and solve the objective function shown in formula (1) by stochastic gradient descent;
S7, performing S4 to S6 in a loop several times, for several iterations;
S8, outputting and storing the obtained word vectors and topic vectors;
S9, judging whether a word is polysemous: concatenate the word vector and the topic vector of the word to be analyzed to form a new vector representing the whole context; then compute the cosine value between the new vectors; when the cosine value is smaller than a set threshold, the word is judged to be polysemous; otherwise, the word is judged not to be polysemous.
2. The analysis method according to claim 1, wherein in S3, the update rule used in the topic sampling process is as shown in equation (2):
$$p(z_{m,n}=k \mid \mathbf{z}_{-(m,n)}, \mathbf{w}) \;\propto\; \frac{n^{-(m,n)}_{k,w_{m,n}} + \beta}{\sum_{w=1}^{W} n^{-(m,n)}_{k,w} + W\beta} \cdot \frac{n^{-(m,n)}_{m,k} + \alpha}{\sum_{k'=1}^{K} n^{-(m,n)}_{k'} + K\alpha} \qquad (2)$$
where the superscript -(m,n) denotes excluding the current word from the counts, W denotes the vocabulary size of the text document collection D, n_{m,k} denotes the number of words in the m-th document assigned to topic k, z_{m,n} denotes the topic number assigned to w_{m,n}, n_{k,w_{m,n}} denotes the number of times w_{m,n} is assigned to topic k, n_{k'} denotes the total number of words assigned to topic k', and α is a symmetric Dirichlet hyperparameter;
the initialization estimate of the topic-word distribution φ of the topic model is computed by formula (3):
where φ̂ denotes the initialization estimate of the topic-word distribution and β is a symmetric Dirichlet hyperparameter.
3. The analysis method according to claim 1, wherein in S4, the probability that the word belongs to each topic is calculated according to equation (4):
4. The analysis method according to claim 1, wherein S6 specifically comprises the following steps:
S601, updating the topic-word distribution φ: compute the gradient of each component of φ according to equation (6); for each component, a normalization constraint is defined so that φ remains a valid probability distribution;
where L(w_{m,n+j}) denotes the length of the path from the root node of the Huffman tree to the leaf node w_{m,n+j} (the number of nodes, including the root and the leaf), d_i denotes the Huffman code of edge i → i+1 on the path, σ(x) = 1/(1+e^{-x}), and u_i denotes the i-th non-leaf node vector on the path;
S602, updating the word vectors w: compute the gradient of each word according to formula (7) and update it using the auxiliary vectors;
S603, updating the auxiliary vectors u of the non-leaf nodes of the Huffman tree: compute the non-leaf node vectors u on the Huffman tree paths according to formula (8), so that they can influence the training quality of the word vectors w;
5. the analysis method according to claim 1, wherein in S9, the set threshold value is 0.6.
CN201710067919.6A 2017-02-07 2017-02-07 Polysemy analysis method based on a topic model and vector space Active CN106909537B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710067919.6A CN106909537B (en) 2017-02-07 2017-02-07 Polysemy analysis method based on a topic model and vector space

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710067919.6A CN106909537B (en) 2017-02-07 2017-02-07 Polysemy analysis method based on a topic model and vector space

Publications (2)

Publication Number Publication Date
CN106909537A true CN106909537A (en) 2017-06-30
CN106909537B CN106909537B (en) 2020-04-07

Family

ID=59208107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710067919.6A Active CN106909537B (en) Polysemy analysis method based on a topic model and vector space

Country Status (1)

Country Link
CN (1) CN106909537B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108629009A (en) * 2018-05-04 2018-10-09 南京信息工程大学 Topic relevance modeling method based on Frank Copula functions
CN108920467A (en) * 2018-08-01 2018-11-30 北京三快在线科技有限公司 Polysemous word learning method and device, and search result display method
CN108984526A (en) * 2018-07-10 2018-12-11 北京理工大学 A document topic vector extraction method based on deep learning
CN109376226A (en) * 2018-11-08 2019-02-22 合肥工业大学 Complaint text classification model, construction method and system, and classification method and system
CN110134786A (en) * 2019-05-14 2019-08-16 南京大学 A short text classification method based on topic word vectors and convolutional neural networks
CN110705304A (en) * 2019-08-09 2020-01-17 华南师范大学 Attribute word extraction method
CN110717015A (en) * 2019-10-10 2020-01-21 大连理工大学 Neural network-based polysemous word recognition method
CN112052334A (en) * 2020-09-02 2020-12-08 广州极天信息技术股份有限公司 Text paraphrasing method, text paraphrasing device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020022956A1 (en) * 2000-05-25 2002-02-21 Igor Ukrainczyk System and method for automatically classifying text
US20030217047A1 (en) * 1999-03-23 2003-11-20 Insightful Corporation Inverse inference engine for high performance web search
CN103207899A (en) * 2013-03-19 2013-07-17 新浪网技术(中国)有限公司 Method and system for recommending text files
CN103970730A (en) * 2014-04-29 2014-08-06 河海大学 Method for extracting multiple subject terms from single Chinese text
JP5754018B2 (en) * 2011-07-11 2015-07-22 日本電気株式会社 Polysemy extraction system, polysemy extraction method, and program

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030217047A1 (en) * 1999-03-23 2003-11-20 Insightful Corporation Inverse inference engine for high performance web search
US20020022956A1 (en) * 2000-05-25 2002-02-21 Igor Ukrainczyk System and method for automatically classifying text
JP5754018B2 (en) * 2011-07-11 2015-07-22 日本電気株式会社 Polysemy extraction system, polysemy extraction method, and program
CN103207899A (en) * 2013-03-19 2013-07-17 新浪网技术(中国)有限公司 Method and system for recommending text files
CN103970730A (en) * 2014-04-29 2014-08-06 河海大学 Method for extracting multiple subject terms from single Chinese text

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DEVORAH E. KLEIN ET AL: "The Representation of Polysemous Words", JOURNAL OF MEMORY AND LANGUAGE *
ZENG QI ET AL (曾琦 等): "A Word Vector Computation Method for Polysemous Words" (一种多义词词向量计算方法), JOURNAL OF CHINESE COMPUTER SYSTEMS (小型微型计算机系统) *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108629009A (en) * 2018-05-04 2018-10-09 南京信息工程大学 Topic relevance modeling method based on Frank Copula functions
CN108984526A (en) * 2018-07-10 2018-12-11 北京理工大学 A document topic vector extraction method based on deep learning
CN108920467A (en) * 2018-08-01 2018-11-30 北京三快在线科技有限公司 Polysemous word learning method and device, and search result display method
CN109376226A (en) * 2018-11-08 2019-02-22 合肥工业大学 Complaint text classification model, construction method and system, and classification method and system
CN110134786A (en) * 2019-05-14 2019-08-16 南京大学 A short text classification method based on topic word vectors and convolutional neural networks
CN110705304A (en) * 2019-08-09 2020-01-17 华南师范大学 Attribute word extraction method
CN110705304B (en) * 2019-08-09 2020-11-06 华南师范大学 Attribute word extraction method
CN110717015A (en) * 2019-10-10 2020-01-21 大连理工大学 Neural network-based polysemous word recognition method
CN112052334A (en) * 2020-09-02 2020-12-08 广州极天信息技术股份有限公司 Text paraphrasing method, text paraphrasing device and storage medium
CN112052334B (en) * 2020-09-02 2024-04-05 广州极天信息技术股份有限公司 Text interpretation method, device and storage medium

Also Published As

Publication number Publication date
CN106909537B (en) 2020-04-07

Similar Documents

Publication Publication Date Title
CN106909537B (en) Polysemy analysis method based on a topic model and vector space
CN112214995B (en) Hierarchical multitasking term embedded learning for synonym prediction
CN113792818B (en) Intention classification method and device, electronic equipment and computer readable storage medium
Ganea et al. Hyperbolic neural networks
CN109213995B (en) Cross-language text similarity evaluation technology based on bilingual word embedding
CN108733742B (en) Global normalized reader system and method
CN108920460B (en) Training method of multi-task deep learning model for multi-type entity recognition
Teng et al. Context-sensitive lexicon features for neural sentiment analysis
CN104834747B (en) Short text classification method based on convolutional neural networks
CN110969020B (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN104699763B (en) The text similarity gauging system of multiple features fusion
CN111291556B (en) Chinese entity relation extraction method based on character and word feature fusion of entity meaning item
CN110334219A (en) The knowledge mapping for incorporating text semantic feature based on attention mechanism indicates learning method
CN108628823A (en) In conjunction with the name entity recognition method of attention mechanism and multitask coordinated training
CN110704621A (en) Text processing method and device, storage medium and electronic equipment
CN106202010A (en) The method and apparatus building Law Text syntax tree based on deep neural network
CN114358007A (en) Multi-label identification method and device, electronic equipment and storage medium
CN110688489B (en) Knowledge graph deduction method and device based on interactive attention and storage medium
CN108874896B (en) Humor identification method based on neural network and humor characteristics
CN111274794B (en) Synonym expansion method based on transmission
CN112052684A (en) Named entity identification method, device, equipment and storage medium for power metering
CN111881256B (en) Text entity relation extraction method and device and computer readable storage medium equipment
WO2023004528A1 (en) Distributed system-based parallel named entity recognition method and apparatus
Sadr et al. Unified topic-based semantic models: a study in computing the semantic relatedness of geographic terms
CN113553440A (en) Medical entity relationship extraction method based on hierarchical reasoning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant