CN112949319B - Method, device, processor and storage medium for marking ambiguous words in text - Google Patents


Info

Publication number
CN112949319B
CN112949319B (application CN202110270079.XA)
Authority
CN
China
Prior art keywords
context
text
word
processor
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110270079.XA
Other languages
Chinese (zh)
Other versions
CN112949319A (en)
Inventor
陆恒杨
黄渊卓
方伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Original Assignee
Jiangnan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangnan University
Priority to CN202110270079.XA
Publication of CN112949319A
Application granted
Publication of CN112949319B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/284: Lexical analysis, e.g. tokenisation or collocates
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/044: Recurrent networks, e.g. Hopfield networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Abstract

The invention relates to a method, a device, a processor and a storage medium for labeling ambiguous words in text. The method comprises: inputting an original corpus to be processed; training a context-related word embedding model to obtain context-related vectors; building a semantic vector generation algorithm according to the context-related vectors, distinguishing the multiple meanings of each word, and labeling the original corpus; and outputting a pseudo document in which the ambiguity is labeled. The context-related word embedding model is used to label the different senses of polysemous words and eliminate the ambiguity among senses; compared with text processed without considering polysemy, this greatly improves the accuracy of subsequent tasks such as text processing, text classification and topic modeling.

Description

Method, device, processor and storage medium for marking ambiguous words in text
Technical Field
The present invention relates to the field of natural language processing technologies, and in particular, to a method, a device, a processor, and a storage medium for annotating ambiguous words in a text.
Background
Word embedding models are widely used in natural language processing tasks such as text mining, sentiment analysis, and text classification. Common word embedding models, such as word2vec and GloVe, learn only one vector per word, ignoring the ambiguity a word can have in different contexts. For example, the word "apple" has multiple senses: in the sentence "I like eating apples" it refers to a fruit, while in "We went to the Apple store yesterday" it refers to the name of a technology company. This phenomenon is the word ambiguity (polysemy) problem. Recent studies have shown that accounting for word ambiguity can effectively improve model performance on natural language processing tasks.
Generally, different word vectors are learned for the same word in different contexts, mainly under three learning paradigms: two-stage models, joint models, and contextualized word models. Two-stage models discover word senses by clustering the contexts of a given word; their disadvantage is a large computational cost. Joint models cluster the context vectors of a given word to jointly form its senses, overcoming the limitation of using only local context, and extend further to ambiguous-word embeddings and the like; their disadvantage is that most such methods must fix the number of senses per word in advance, which is unrealistic. Contextualized word models learn context-related vectors of words by training a bidirectional LSTM language model: every word occurrence in the corpus receives a semantic vector, the representations combine information from all layers, and the context-related vectors can easily be added to various existing NLP tasks, so word ambiguity can be discovered more flexibly. The present application therefore adopts contextualized word models for word ambiguity labeling.
Disclosure of Invention
Therefore, the technical problem to be solved by the invention is to provide an ambiguous-word labeling method combining a contextual word embedding model, so as to solve the word ambiguity problem and label each word with the sense best suited to the textual context.
To solve the above technical problem, the invention provides a method for labeling ambiguous words in text, which comprises: inputting an original corpus to be processed; training a context-related word embedding model to obtain context-related vectors; building a semantic vector generation algorithm according to the context-related vectors, distinguishing the multiple meanings of each word, and labeling the original corpus; and outputting a pseudo document in which the ambiguity is labeled.
In an embodiment of the present invention, the context-related word embedding model is ELMo, and the context-related vector is computed with the ELMo algorithm as:

ELMo_k = γ · Σ_{j=0}^{L} s_j · h_{k,j}

where γ is a parameter that rescales the vector; s_j is the softmax-normalized weight of the j-th layer; and h_{k,j} is the concatenation of the forward and backward hidden states of the j-th layer.
In an embodiment of the present invention, the process of constructing the semantic vector generation algorithm is as follows:

Input the original corpus D, a dictionary dic holding each word and its corresponding semantic context-related vectors, and a cosine distance threshold ε
Initialize dic to empty
foreach document d in D do
    for i ← 0 to len(d)-1 do
        if the current sense of the word w_i is not in the dictionary dic then
            label the word as w_i#s with a new sense index s;
            assign the context-related vector v_i of w_i to the sense entry dic[w_i][s];
            add the new sense entry to the dictionary dic;
            add w_i#s to the pseudo document pd_i;
        else
            initialize minDist = 1 and minIndex = 0;
            initialize found = False;
            for k ← 0 to len(dic[w_i])-1 do
In one embodiment of the invention, the cosine distance of two semantic vectors u and v, denoted dist(u, v), is calculated as:

dist(u, v) = 1 - (u · v) / (‖u‖ ‖v‖)

In one embodiment of the invention, the calculated cosine distance dist(u, v) is compared with a preset cosine distance threshold ε: if the cosine distance is greater than the threshold ε, the two semantic vectors have different meanings in the two contexts; if the cosine distance is less than the threshold ε, the two semantic vectors have the same meaning in the two contexts.
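A minimal sketch of this comparison follows; the standard cosine distance (one minus the cosine similarity) is assumed here, and the example vectors, threshold value and function names are illustrative, not taken from the patent:

```python
import math

def cosine_distance(u, v):
    """Cosine distance: 1 minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def same_sense(u, v, epsilon):
    """Two context vectors denote the same sense iff their distance is below epsilon."""
    return cosine_distance(u, v) < epsilon

# Nearly parallel vectors -> small distance -> same sense.
print(same_sense([1.0, 0.0], [0.9, 0.1], epsilon=0.2))   # True
# Orthogonal vectors -> distance 1.0 -> different senses.
print(same_sense([1.0, 0.0], [0.0, 1.0], epsilon=0.2))   # False
```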
In one embodiment of the present invention, when two semantic vectors u and v have the same meaning in the two contexts, the context-related vector needs to be recalculated, and a new semantic vector bisector is formed from the angle bisector of the two semantic vectors, calculated as:

bisector = u / ‖u‖ + v / ‖v‖
in an embodiment of the present invention, before inputting an original corpus to be processed, the original corpus needs to be preprocessed, which includes: unifying case letters, deleting all stop words, deleting documents containing less than three words
The present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method when executing the program.
The present invention provides a processor for running a program, wherein the program executes the method.
To solve the above technical problem, the present invention provides a computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the steps of the above method.
Compared with the prior art, the technical scheme of the invention has the following advantages:
the invention provides an ambiguous-word labeling method combining a contextual word embedding model, which solves the word ambiguity problem and labels each word with the sense best suited to the textual context;
each word can have different senses in different contexts, which fall roughly into two cases: in one, the word keeps the same sense across contexts, but because the local contexts differ, its word vectors are similar yet not identical; in the other, the word carries different senses in different contexts, so its word vectors differ greatly. Both cases leave the word vectors of some ambiguous words in a text misaligned, producing ambiguity. The invention uses the context-related word embedding model to label the different senses of polysemous words and eliminate the ambiguity among senses, and compared with text processed without considering polysemy, it greatly improves the accuracy of subsequent tasks such as text processing, text classification and topic modeling.
Drawings
In order that the present disclosure may be more readily and clearly understood, reference is now made to the following detailed description of the embodiments of the present disclosure taken in conjunction with the accompanying drawings, in which
FIG. 1 is a flow chart of the steps of the method for labeling ambiguous words in text according to the present invention.
Detailed Description
The present invention is further described below in conjunction with the following figures and specific examples so that those skilled in the art may better understand the present invention and practice it, but the examples are not intended to limit the present invention.
Referring to fig. 1, a method for labeling ambiguous words in a text according to the present invention includes inputting an original corpus to be processed; training a context-related word embedding model to obtain a context-related vector; establishing a semantic vector generation algorithm according to the context correlation vector, distinguishing a plurality of meanings of each word, and labeling an original corpus; and outputting the pseudo document labeled with the ambiguity.
Each word can have different senses in different contexts, which fall roughly into two cases: in one, the word keeps the same sense across contexts, but because the local contexts differ, its word vectors are similar yet not identical; in the other, the word carries different senses in different contexts, so its word vectors differ greatly. Both cases leave the word vectors of some ambiguous words in a text misaligned, producing ambiguity. The context-related word embedding model is used to label the different senses of polysemous words and eliminate the ambiguity among senses; compared with text processed without considering polysemy, this greatly improves the accuracy of subsequent tasks such as text processing, text classification and topic modeling.
Specifically, in this embodiment, the original corpus is preprocessed before it is input: letter case is unified and all stop words are deleted, which reduces the vocabulary size on the one hand and the computational load of the algorithm on the other, improving computational efficiency; documents containing fewer than three words have no usable context, so a context-related word embedding model cannot be trained on them, and such documents are removed before input.
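As an illustration of this preprocessing step, the following Python sketch lower-cases the text, removes stop words, and drops documents with fewer than three remaining words; the tokenizer and the tiny stop-word list are simplifying assumptions, not part of the patent:

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is"}  # illustrative stop list

def preprocess(documents, min_words=3):
    """Lower-case, strip stop words, and drop documents shorter than min_words."""
    cleaned = []
    for doc in documents:
        tokens = [t for t in re.findall(r"[a-z']+", doc.lower()) if t not in STOP_WORDS]
        if len(tokens) >= min_words:   # too-short documents carry no usable context
            cleaned.append(tokens)
    return cleaned

docs = ["The Apple store opened in town", "An apple", "I like eating apples every day"]
print(preprocess(docs))
```

In practice a full stop-word list (for example, one shipped with an NLP toolkit) would replace the illustrative set.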
In this embodiment, the context-related word embedding model is the ELMo algorithm. ELMo (Embeddings from Language Models) is an existing language-model pre-training method in natural language processing (NLP). It uses a bidirectional LSTM language model composed of a forward language model and a backward language model, whose objective function is the joint maximum likelihood of the two directions. Its characteristic is that the representation of each word is a function of the entire input sentence: a bidirectional LSTM model is trained on a large corpus with a language-modeling objective, and the LSTM is then used to generate the word representations. ELMo representations are "deep" in the sense that they are a function of the internal representations of all layers of the bidirectional LSTM, which allows them to produce rich word representations: the states of the higher-level LSTM layers capture context-dependent aspects of word meaning (and can, for example, be used to disambiguate senses), while the lower-level layers capture syntactic features (and can, for example, be used for part-of-speech tagging). Combining them is advantageous for downstream NLP tasks.
In this embodiment, the context-related vector is computed with the ELMo algorithm as:

ELMo_k = γ · Σ_{j=0}^{L} s_j · h_{k,j}

where γ is a parameter that rescales the vector; s_j is the softmax-normalized weight of the j-th layer; and h_{k,j} is the concatenation of the forward and backward hidden states of the j-th layer. In this way the context-related vector is obtained.
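The weighted combination above can be sketched in plain Python; the layer values, raw weights, and γ below are made-up numbers chosen only to show the mechanics of the formula, not outputs of a trained ELMo model:

```python
import math

def elmo_vector(layer_states, raw_weights, gamma):
    """ELMo_k = gamma * sum_j s_j * h_{k,j}, with s_j = softmax(raw_weights)."""
    exps = [math.exp(w) for w in raw_weights]
    total = sum(exps)
    s = [e / total for e in exps]          # softmax-normalized layer weights s_j
    dim = len(layer_states[0])
    out = [0.0] * dim
    for s_j, h_j in zip(s, layer_states):
        for i in range(dim):
            out[i] += s_j * h_j[i]        # weighted sum over layers
    return [gamma * x for x in out]        # rescale by gamma

# Three layers (token layer plus two biLSTM layers); each state stands in
# for a concatenated forward/backward hidden state. Values are illustrative.
layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(elmo_vector(layers, raw_weights=[0.0, 0.0, 0.0], gamma=2.0))
```

With equal raw weights the softmax gives each layer weight 1/3, so each output component is 2 · (2/3).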
Specifically, the process of constructing the semantic vector generation algorithm is as follows:

First, input the original corpus D, a dictionary dic holding each word and its corresponding semantic context-related vectors, and a cosine distance threshold ε
Initialize dic to empty
foreach document d in D do
    for i ← 0 to len(d)-1 do
        if the current sense of the word w_i is not in the dictionary dic then
            label the word as w_i#s with a new sense index s;
            assign the context-related vector v_i of w_i to the sense entry dic[w_i][s];
            add the new sense entry to the dictionary dic;
            add w_i#s to the pseudo document pd_i;
        else
            initialize minDist = 1 and minIndex = 0;
            initialize found = False;
            for k ← 0 to len(dic[w_i])-1 do
The cosine distance of two semantic vectors u and v, denoted dist(u, v), is calculated as:

dist(u, v) = 1 - (u · v) / (‖u‖ ‖v‖)

The calculated cosine distance is compared with the preset cosine distance threshold ε: if it is greater than ε, the two semantic vectors have different meanings in the two contexts; if it is less than ε, the two semantic vectors have the same meaning in the two contexts. The specific algorithm process continues as follows:
                if dist(v_i, dic[w_i][k]) < ε and dist(v_i, dic[w_i][k]) < minDist then
                    change the value of minDist to dist(v_i, dic[w_i][k]);
                    change the value of minIndex to k;
                    change the value of found to True;
            if found is False then
                change the value of s to len(dic[w_i]);
                label the word as w_i#s;
                assign the context-related vector v_i to dic[w_i][s];
                add the new sense entry to dic[w_i][s];
                add w_i#s to pd_i;
            else
                change the value of s to minIndex;
                label the word as w_i#s;
When the two semantic vectors have the same meaning in the two contexts, the context-related vector needs to be recalculated: a new semantic vector bisector is formed from the angle bisector of the two semantic vectors, calculated as:

bisector = v_i / ‖v_i‖ + dic[w_i][s] / ‖dic[w_i][s]‖
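A small sketch of forming the angle bisector of two sense vectors; summing the unit vectors of the two inputs is one standard way to obtain the bisector direction and is the form assumed here, since the exact normalization in the patent's formula image is not legible in this text:

```python
import math

def bisector(u, v):
    """Angle bisector direction of two vectors: the sum of their unit vectors."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return [a / nu + b / nv for a, b in zip(u, v)]

# The bisector of the two coordinate axes points along the diagonal,
# regardless of the input magnitudes.
print(bisector([2.0, 0.0], [0.0, 3.0]))  # [1.0, 1.0]
```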
The semantic vector bisector above is used to update the value of dic[w_i][s], and w_i#s is added to the pseudo document pd_i.
Each pd_i is then added to the pseudo-document collection PD, and finally the pseudo documents with the ambiguity labeled are output.
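Putting the steps above together, the following Python sketch runs the whole labeling loop over pre-computed context vectors. The data layout (parallel lists of tokens and vectors) and all function and variable names are illustrative assumptions; the word#s label format, the minDist/found bookkeeping, and the bisector merge follow the description:

```python
import math

def cos_dist(u, v):
    """Cosine distance: 1 minus cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def label_senses(docs, context_vectors, epsilon):
    """Label every token as word#s by matching its context vector against the
    sense vectors stored in dic; matching senses are merged via the angle
    bisector, non-matching occurrences open a new sense index."""
    dic = {}            # word -> list of sense vectors
    pseudo_docs = []    # the output collection PD
    for tokens, vecs in zip(docs, context_vectors):
        pd = []
        for w, v in zip(tokens, vecs):
            senses = dic.setdefault(w, [])
            if not senses:                       # word seen for the first time
                senses.append(list(v))
                s = 0
            else:
                min_dist, min_index, found = 1.0, 0, False
                for k, sv in enumerate(senses):
                    d = cos_dist(v, sv)
                    if d < epsilon and d < min_dist:
                        min_dist, min_index, found = d, k, True
                if not found:                    # a genuinely new sense
                    senses.append(list(v))
                    s = len(senses) - 1
                else:                            # same sense: merge by angle bisector
                    s = min_index
                    sv = senses[s]
                    n_v = math.sqrt(sum(a * a for a in v))
                    n_sv = math.sqrt(sum(a * a for a in sv))
                    senses[s] = [a / n_v + b / n_sv for a, b in zip(v, sv)]
            pd.append(f"{w}#{s}")
        pseudo_docs.append(pd)
    return pseudo_docs

# Two tiny documents with hand-made 2-d "context vectors".
docs = [["apple", "store"], ["apple", "pie"]]
vecs = [[[1.0, 0.0], [0.5, 0.5]], [[0.0, 1.0], [0.5, 0.5]]]
print(label_senses(docs, vecs, epsilon=0.3))   # [['apple#0', 'store#0'], ['apple#1', 'pie#0']]
```

Here the second occurrence of "apple" has a context vector orthogonal to the first, so its cosine distance (1.0) exceeds ε and it is assigned the new sense index 1.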
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be understood that the above examples are given only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to those skilled in the art in light of the above description; it is neither necessary nor possible to list all embodiments exhaustively. Obvious variations or modifications derived therefrom are intended to fall within the scope of the invention.

Claims (7)

1. A method for labeling ambiguous words in text, characterized in that the method comprises the following steps:
inputting an original corpus to be processed;
training a context-related word embedding model to obtain a context-related vector;
building a semantic vector generation algorithm according to the context-related vectors, distinguishing the multiple meanings of each word, and labeling the original corpus, specifically comprising the following steps: calculating the cosine distance of two semantic vectors and comparing it with a preset cosine distance threshold ε; if the cosine distance is greater than the threshold ε, the two semantic vectors have different meanings in the two contexts; if the cosine distance is less than the threshold ε, the two semantic vectors have the same meaning in the two contexts, in which case the context-related vector needs to be recalculated and a new semantic vector bisector is formed from the angle bisector of the two semantic vectors;
and outputting the pseudo document labeled with the ambiguity.
2. The method for labeling ambiguous words in text as recited in claim 1, characterized in that the context-related word embedding model is ELMo, and the context-related vector is computed with the ELMo algorithm as:

ELMo_k = γ · Σ_{j=0}^{L} s_j · h_{k,j}

where γ is a parameter that rescales the vector; s_j is the softmax-normalized weight of the j-th layer; and h_{k,j} is the concatenation of the forward and backward hidden states of the j-th layer.
3. The method for labeling ambiguous words in text as recited in claim 1, characterized in that the process of constructing the semantic vector generation algorithm is as follows:

Input the original corpus D, a dictionary dic holding each word and its corresponding semantic context-related vectors, and a cosine distance threshold ε
Initialize dic to empty
foreach document d in D do
    for i ← 0 to len(d)-1 do
        if the current sense of the word w_i is not in the dictionary dic then
            label the word as w_i#s with a new sense index s;
            assign the context-related vector v_i of w_i to the sense entry dic[w_i][s];
            add the new sense entry to the dictionary dic;
            add w_i#s to the pseudo document pd_i;
        else
            initialize minDist = 1 and minIndex = 0;
            initialize found = False;
            for k ← 0 to len(dic[w_i])-1 do
4. The method for labeling ambiguous words in text as recited in claim 1, characterized in that before the original corpus to be processed is input, it needs to be preprocessed, including: unifying letter case, deleting all stop words, and deleting documents containing fewer than three words.
5. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any one of claims 1 to 4 when executing the program.
6. A processor, characterized in that the processor is configured to run a program, wherein the program when running performs the method of any of claims 1 to 4.
7. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 4.
CN202110270079.XA 2021-03-12 2021-03-12 Method, device, processor and storage medium for marking ambiguous words in text Active CN112949319B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110270079.XA CN112949319B (en) 2021-03-12 2021-03-12 Method, device, processor and storage medium for marking ambiguous words in text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110270079.XA CN112949319B (en) 2021-03-12 2021-03-12 Method, device, processor and storage medium for marking ambiguous words in text

Publications (2)

Publication Number Publication Date
CN112949319A (en) 2021-06-11
CN112949319B (en) 2023-01-06

Family

ID=76229613

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110270079.XA Active CN112949319B (en) 2021-03-12 2021-03-12 Method, device, processor and storage medium for marking ambiguous words in text

Country Status (1)

Country Link
CN (1) CN112949319B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101334768A (en) * 2008-08-05 2008-12-31 北京学之途网络科技有限公司 Method and system for eliminating ambiguity for word meaning by computer, and search method
CN103229137A (en) * 2010-09-29 2013-07-31 国际商业机器公司 Context-based disambiguation of acronyms and abbreviations
CN105808530A (en) * 2016-03-23 2016-07-27 苏州大学 Translation method and device in statistical machine translation
CN105912523A (en) * 2016-04-06 2016-08-31 苏州大学 Word meaning marking method and device
CN106021272A (en) * 2016-04-04 2016-10-12 上海大学 Keyword automatic extraction method based on distributed expression word vector calculation
US9760627B1 (en) * 2016-05-13 2017-09-12 International Business Machines Corporation Private-public context analysis for natural language content disambiguation
KR101799681B1 (en) * 2016-06-15 2017-11-20 울산대학교 산학협력단 Apparatus and method for disambiguating homograph word sense using lexical semantic network and word embedding
CN108153730A (en) * 2017-12-25 2018-06-12 北京奇艺世纪科技有限公司 A kind of polysemant term vector training method and device
CN109002432A (en) * 2017-06-07 2018-12-14 北京京东尚科信息技术有限公司 Method for digging and device, computer-readable medium, the electronic equipment of synonym
CN109753569A (en) * 2018-12-29 2019-05-14 上海智臻智能网络科技股份有限公司 A kind of method and device of polysemant discovery
CN110674304A (en) * 2019-10-09 2020-01-10 北京明略软件系统有限公司 Entity disambiguation method and device, readable storage medium and electronic equipment
CN111310475A (en) * 2020-02-04 2020-06-19 支付宝(杭州)信息技术有限公司 Training method and device of word sense disambiguation model

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8856642B1 (en) * 2013-07-22 2014-10-07 Recommind, Inc. Information extraction and annotation systems and methods for documents
CN107844473B (en) * 2017-09-25 2020-12-18 沈阳航空航天大学 Word sense disambiguation method based on context similarity calculation
CN107967255A (en) * 2017-11-08 2018-04-27 北京广利核系统工程有限公司 A kind of method and system for judging text similarity
CN109829149A (en) * 2017-11-23 2019-05-31 中国移动通信有限公司研究院 A kind of generation method and device, equipment, storage medium of term vector model
CN110162766B (en) * 2018-02-12 2023-03-24 深圳市腾讯计算机系统有限公司 Word vector updating method and device
CN110376896A (en) * 2019-07-30 2019-10-25 浙江大学 It is a kind of that refrigerating method is optimized based on deep learning and the single heat source air-conditioning of fuzzy control


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Design and Implementation of an Automatic Chinese Word Sense Annotation System; Ge Ruifang et al.; Computer Engineering and Applications; 2001-09-01 (No. 17); full text *
Research on Latent Semantic Indexing Optimization Based on Patent Information; Bi Chen et al.; Journal of Shanxi University (Natural Science Edition); 2014-02-15 (No. 01); full text *
An Unsupervised Word Sense Disambiguation Method Based on HowNet Sememe Word Vector Representation; Tang Gongbo et al.; Journal of Chinese Information Processing; 2015-11-15 (No. 06); full text *
Learning to Select Pseudo Labels: A Semi-supervised Method for Named Entity Recognition (in English); Li Zhenzhen et al.; Frontiers of Information Technology & Electronic Engineering; 2020-06-03 (No. 06); full text *
Research on Word Sense Disambiguation Combining Rules and Statistics; Miao Hai et al.; Computer Science; 2013-12-15 (No. 12); full text *
Research Progress on Semantic Representation with Word Vectors; Li Fenglin et al.; Information Science; 2019-05-01 (No. 05); full text *

Also Published As

Publication number Publication date
CN112949319A (en) 2021-06-11

Similar Documents

Publication Publication Date Title
Yao et al. Bi-directional LSTM recurrent neural network for Chinese word segmentation
Kim et al. Two-stage multi-intent detection for spoken language understanding
US20200311207A1 (en) Automatic text segmentation based on relevant context
CN106909537B (en) One-word polysemous analysis method based on topic model and vector space
Rendel et al. Using continuous lexical embeddings to improve symbolic-prosody prediction in a text-to-speech front-end
CN112966525B (en) Law field event extraction method based on pre-training model and convolutional neural network algorithm
CN114911892A (en) Interaction layer neural network for search, retrieval and ranking
WO2023134082A1 (en) Training method and apparatus for image caption statement generation module, and electronic device
CN112883199A (en) Collaborative disambiguation method based on deep semantic neighbor and multi-entity association
CN110569355B (en) Viewpoint target extraction and target emotion classification combined method and system based on word blocks
CN113723077B (en) Sentence vector generation method and device based on bidirectional characterization model and computer equipment
CN110874397A (en) Water army comment detection system and method based on attention mechanism
Dadas et al. Evaluation of sentence representations in polish
CN112949319B (en) Method, device, processor and storage medium for marking ambiguous words in text
CN114970467B (en) Method, device, equipment and medium for generating composition manuscript based on artificial intelligence
Permatasari et al. Human-robot interaction based on dialog management using sentence similarity comparison method
Shet et al. Segmenting multi-intent queries for spoken language understanding
CN115510230A (en) Mongolian emotion analysis method based on multi-dimensional feature fusion and comparative reinforcement learning mechanism
CN112749251B (en) Text processing method, device, computer equipment and storage medium
Nazarizadeh et al. Using Group Deep Learning and Data Augmentation in Persian Sentiment Analysis
CN113190681A (en) Fine-grained text classification method based on capsule network mask memory attention
Chorowski et al. Read, tag, and parse all at once, or fully-neural dependency parsing
CN111985548A (en) Label-guided cross-modal deep hashing method
CN111199154A (en) Fault-tolerant rough set-based polysemous word expression method, system and medium
Seo et al. FAGON: fake news detection model using grammatical transformation on deep neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant