CN106598947A - Bayesian word sense disambiguation method based on synonym expansion - Google Patents

Bayesian word sense disambiguation method based on synonym expansion

Info

Publication number
CN106598947A
Authority
CN
China
Prior art keywords
word
corpus
training
meaning
disambiguation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611157518.1A
Other languages
Chinese (zh)
Inventor
杨陟卓
张虎
李茹
陈千
谭红叶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanxi University
Original Assignee
Shanxi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanxi University
Priority to CN201611157518.1A
Publication of CN106598947A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/237: Lexical tools
    • G06F40/247: Thesauruses; Synonyms
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155: Bayesian classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Machine Translation (AREA)

Abstract

The invention belongs to the technical field of natural language processing methods and in particular relates to a Bayesian word sense disambiguation method based on synonym expansion. The method mainly addresses the problems of existing word sense disambiguation methods, such as poor disambiguation performance and the time and effort required to acquire disambiguation knowledge. The method comprises the following steps: (1) expanding the context of a training corpus using a Chinese thesaurus to generate a large amount of pseudo training data; (2) removing noise from the pseudo training data using a word collocation corpus to produce a pseudo training corpus; (3) training a Bayesian disambiguation model on the training corpus and the pseudo training corpus together; and (4) inputting a test corpus into the Bayesian disambiguation model and jointly determining the senses of ambiguous words using the disambiguation knowledge in both corpora.

Description

A Bayesian word sense disambiguation method based on synonym expansion
Technical field
The invention belongs to the technical field of natural language processing methods and, specifically, relates to a Bayesian word sense disambiguation method based on synonym expansion.
Technical background
Word sense disambiguation (WSD) refers to determining the specific meaning of a polysemous word in a natural-language context; it is a core problem in natural language processing. When a machine processes natural language, lexical ambiguity arises whenever an ambiguous word appears in a particular context, and in the current Internet era of "information explosion" the problem has become more serious. Polysemy is widespread in Chinese and in Western languages alike. Statistical studies show that in large-scale corpora of Chinese and English text, the frequency of ambiguous words reaches about 40%. Such frequent ambiguity seriously hinders correct machine understanding and processing of natural language and is one of the greatest difficulties the field faces. Progress in this technology can greatly promote the development of natural language processing fields such as language identification, syntactic analysis, information retrieval, machine translation, and text processing.
At present, corpus-based word sense disambiguation methods can be divided into supervised and unsupervised methods. Unsupervised methods need no annotated training corpus, but their disambiguation performance is unsatisfactory and rarely reaches a practical level. Supervised methods perform clearly better, but they depend on a large, high-quality training corpus, and building one is time-consuming and laborious, which has seriously hindered the large-scale application of supervised word sense disambiguation. To address this, many researchers have begun studying methods that generate annotated data automatically. These methods typically first use a dictionary and a large unannotated corpus to generate labeled data automatically, then train a disambiguation model with a supervised method and perform disambiguation.
Summary of the invention
The present invention mainly addresses the problems of existing word sense disambiguation methods, namely poor disambiguation performance and the time and effort needed to acquire disambiguation knowledge, and provides a Bayesian word sense disambiguation method based on synonym expansion.
The technical solution adopted by the present invention to solve the above problems is:
A Bayesian word sense disambiguation method based on synonym expansion, comprising the following steps:
Step 1: expand the context of the training corpus using a Chinese thesaurus to generate a large amount of pseudo training data;
Step 2: remove noise from the pseudo training data using a word collocation corpus to produce the pseudo training corpus;
Step 3: train a Bayesian disambiguation model on the training corpus and the pseudo training corpus together;
Step 4: input the test corpus into the Bayesian disambiguation model and jointly decide the sense of the ambiguous word using the disambiguation knowledge in both corpora.
Further, step 1 of the present invention specifically comprises: first, a small-scale word sense disambiguation training corpus is built by manual annotation; then the context in the sentence containing the ambiguous word is expanded using a Chinese thesaurus; finally, the expanded synonyms, the ambiguous word, and the sense of the ambiguous word in that sentence are used to generate a large amount of pseudo training data.
Step 2 of the present invention specifically comprises: the context of the ambiguous word is expanded using the Chinese thesaurus; for each expanded context word, its number of co-occurrences with the ambiguous word in the word collocation corpus is counted; only context words with a certain co-occurrence count are used to build the pseudo training corpus.
In step 3 of the present invention, the Bayesian disambiguation model is trained on the training corpus and the pseudo training corpus together. The formula is:
$$p(s_i \mid w_{-L} \ldots w_0 \ldots w_L) \propto p(s_i) \prod_{f_j \in F} p(f_j \mid s_i)$$
where $s_i$ denotes a sense of the ambiguous word, $w_{-L} \ldots w_L$ denote the words within a window of size $L$ around the ambiguous word $w_0$, $f_j$ denotes a context feature of the ambiguous word, $F$ denotes the set of context features, and $p(f_j \mid s_i)$ denotes the conditional probability of the feature given the sense, computed as:
$$p(f_j \mid s_i) = \frac{c(f_j, s_i)}{c(s_i)}$$
where $c(s_i)$ denotes the number of times sense $s_i$ occurs in the training corpus and $c(f_j, s_i)$ denotes the number of co-occurrences of feature $f_j$ and sense $s_i$ in the training corpus.
Step 4 of the present invention specifically comprises: the language fragments formed from the context expanded by the Chinese thesaurus are treated as the pseudo training corpus; the knowledge in the training corpus and the pseudo training corpus is used jointly for word sense disambiguation; when estimating the conditional probability of a feature given a sense, the following formula is used:
$$p(f_j \mid s_i) = \frac{c_t(f_j, s_i)}{c_t(s_i)} + \lambda \frac{c_p(f_j, s_i)}{c_p(s_i)}$$
where $c_t(f_j, s_i)$ denotes the number of co-occurrences of sense $s_i$ and feature $f_j$ in the training corpus, $c_t(s_i)$ denotes the number of occurrences of sense $s_i$ in the training corpus, $c_p(f_j, s_i)$ denotes the number of co-occurrences of feature $f_j$ with sense $s_i$ of the ambiguous word in the pseudo training corpus, $c_p(s_i)$ denotes the number of occurrences of sense $s_i$ in the pseudo training corpus, and the value of $\lambda$ is 0.7.
With the above technical solution, the invention uses a Chinese thesaurus to expand the context of the ambiguous word in the training corpus. The language fragments formed by the expanded synonyms express a meaning similar to that of the original context, and these fragments form the pseudo training corpus. Noise in the pseudo training corpus is then removed using a word collocation corpus; next, a Bayesian disambiguation model is trained on the training corpus and the pseudo training corpus; finally, the disambiguation model decides the sense of the ambiguous word. Specifically:
1. The context of each training example is expanded using the Chinese thesaurus to generate the pseudo training corpus. The invention uses a Chinese thesaurus to expand the context of the ambiguous word within a certain window size, providing more knowledge for word sense disambiguation. A language fragment formed from synonyms of the context words expresses a meaning similar to that of the original context fragment, and the ambiguous word expresses the same sense in these two similar fragments. The context in the sentence containing the ambiguous word can therefore be expanded, and the expanded context, the ambiguous word, and its sense in that sentence together form the pseudo training corpus. For example, take the ambiguous sentence "The work of all unit team members has been completed." The ambiguous word "unit" has two senses in Modern Chinese, "personnel" and "machine". Judging from the context of the ambiguous word, the sense of "unit" here is "personnel". The synonym expansion of this sentence is illustrated in Fig. 1.
The words closest to the ambiguous word have the greatest influence on its sense. The figure therefore lists only the nearest context words, "whole", "team member", and "work", and only these context words are expanded with synonyms. As shown in the figure, each synonym set lists only 4 synonyms of the corresponding context word. These synonyms can be recombined into multiple sentences containing the ambiguous word, such as "whole unit personnel tasks", "all unit group member responsibilities", and "all unit team member sole duties". In the language fragments recombined from synonyms, the sense of the ambiguous word "unit" is still "personnel". The above synonyms, the ambiguous word "unit", and its sense "personnel" together form the pseudo training corpus.
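The expansion step described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patent's implementation: the function name `expand_context`, the dictionary layout of the thesaurus, and the instance format are all hypothetical, and the synonym lists reuse the English glosses of the example rather than real thesaurus entries.

```python
from itertools import product

def expand_context(context_words, synonyms, target, sense):
    """Build pseudo training instances by swapping context words for synonyms.

    Hypothetical helper: each instance keeps the ambiguous word and its sense
    unchanged and only varies the context words.
    """
    slots = []
    for word in context_words:
        # Each slot offers the original word plus its synonyms, without duplicates.
        options = [word] + list(synonyms.get(word, []))
        slots.append(list(dict.fromkeys(options)))
    instances = []
    for combo in product(*slots):
        if list(combo) == list(context_words):
            continue  # the original context already belongs to the real training corpus
        instances.append({"context": list(combo), "target": target, "sense": sense})
    return instances

# Toy thesaurus built from the English glosses in the example (illustrative data only).
toy_synonyms = {
    "whole": ["all"],
    "team member": ["personnel", "group member", "party member"],
    "work": ["task", "responsibility", "sole duty", "bounden duty"],
}
pseudo = expand_context(["whole", "team member", "work"], toy_synonyms, "unit", "personnel")
print(len(pseudo))
```

Every generated instance carries the sense "personnel", which mirrors the assumption in the text that the ambiguous word keeps its sense when its context words are replaced by synonyms.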
2. Noise in the pseudo training corpus is removed using the word collocation corpus. Expanding context words with synonyms raises two problems. (1) Not every synonym of a context word is suitable for expansion and for building new training data. For example, the synonyms of "work" include "sole duty" and "bounden duty", but these two words are unsuitable for the expanded corpus because such word combinations are rarely used in everyday language. (2) In many cases a context word is itself not monosemous and is ambiguous too. For example, "work" has two basic senses: (i) "task" and (ii) "to operate" or "to do things". To solve both problems, the invention constrains synonym expansion using word collocation relations: only words that co-occur with the ambiguous word a certain number of times are used to train the disambiguation model. Limiting by collocation count not only filters out collocations rarely used in everyday Chinese but also largely resolves the noise words introduced by ambiguous context words.
3. The Bayesian disambiguation model is trained on the training corpus and the pseudo training corpus together.
The invention performs disambiguation training with a Bayesian model, as shown in formula (1):
$$p(s_i \mid w_{-L} \ldots w_0 \ldots w_L) \propto p(s_i) \prod_{f_j \in F} p(f_j \mid s_i) \tag{1}$$
In formula (1), $s_i$ denotes a sense of the ambiguous word, $w_{-L} \ldots w_L$ denote the words within a window of size $L$ around the ambiguous word $w_0$, $f_j$ denotes a context feature of the ambiguous word, $F$ denotes the set of context features, and $p(f_j \mid s_i)$ denotes the conditional probability of the feature given the sense, computed as shown in formula (2):
$$p(f_j \mid s_i) = \frac{c(f_j, s_i)}{c(s_i)} \tag{2}$$
In formula (2), $c(s_i)$ denotes the number of times sense $s_i$ occurs in the training corpus, and $c(f_j, s_i)$ denotes the number of co-occurrences of feature $f_j$ and sense $s_i$ in the training corpus.
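Formulas (1) and (2) amount to a count-based naive Bayes scorer, which can be sketched as follows. The sketch makes several assumptions that the text does not state: the prior p(s_i) is estimated as the relative frequency of the sense among the training instances, each feature is counted once per instance, and no smoothing is applied, so a feature never seen with a sense drives that sense's score to zero. The names `train_counts` and `score` and the toy instances are hypothetical.

```python
from collections import Counter

def train_counts(instances):
    """Collect c(s_i) and c(f_j, s_i) from labelled instances."""
    sense_counts = Counter()
    pair_counts = Counter()
    for inst in instances:
        sense_counts[inst["sense"]] += 1
        for feature in set(inst["context"]):  # count each feature once per instance
            pair_counts[(feature, inst["sense"])] += 1
    return sense_counts, pair_counts

def score(sense, features, sense_counts, pair_counts):
    """Unnormalised score p(s_i) * prod_j p(f_j | s_i), following formulas (1) and (2)."""
    if sense_counts[sense] == 0:
        return 0.0
    total = sum(sense_counts.values())
    result = sense_counts[sense] / total  # assumption: prior as relative frequency
    for feature in features:
        result *= pair_counts[(feature, sense)] / sense_counts[sense]  # formula (2)
    return result

# Toy labelled data (illustrative only).
toy = [
    {"context": ["whole", "team member", "work"], "sense": "personnel"},
    {"context": ["all", "team member", "task"], "sense": "personnel"},
    {"context": ["engine", "fault"], "sense": "machine"},
]
sense_counts, pair_counts = train_counts(toy)
personnel_score = score("personnel", ["team member", "task"], sense_counts, pair_counts)
machine_score = score("machine", ["team member", "task"], sense_counts, pair_counts)
print(personnel_score, machine_score)
```

The zero score for "machine" in this toy run illustrates the data sparseness that the pseudo training corpus is meant to reduce.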
4. The training corpus and the pseudo training corpus are used jointly to decide the sense of the ambiguous word.
The invention builds the pseudo training corpus from the language fragments formed by the synonyms of the context words and uses the training corpus and the pseudo training corpus jointly for word sense disambiguation. Compared with the manually annotated corpus, the pseudo training corpus still contains some noise and should therefore play a relatively smaller role. When the sense decision is made with the Bayesian formula (1), the two types of corpus jointly determine the sense; the conditional probability of a feature given a sense is estimated using formula (3):
$$p(f_j \mid s_i) = \frac{c_t(f_j, s_i)}{c_t(s_i)} + \lambda \frac{c_p(f_j, s_i)}{c_p(s_i)} \tag{3}$$
In formula (3), $c_t(f_j, s_i)$ denotes the number of co-occurrences of sense $s_i$ and feature $f_j$ in the training corpus, and $c_t(s_i)$ denotes the number of occurrences of sense $s_i$ in the training corpus. $c_p(f_j, s_i)$ denotes the number of co-occurrences of feature $f_j$ with sense $s_i$ of the ambiguous word in the pseudo training corpus, and $c_p(s_i)$ denotes the number of occurrences of sense $s_i$ in the pseudo training corpus. $\lambda$ adjusts the influence of the pseudo training corpus on the sense of the ambiguous word and is set to 0.7.
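The combined estimate of formula (3), read as the real-corpus ratio plus λ times the pseudo-corpus ratio, can be written as a small function. This is a sketch under assumptions: the function name `combined_cond_prob` and the dictionary-based count tables are hypothetical, the sense counts 10 and 10 are borrowed from the embodiment, and the two pair counts are made up for illustration.

```python
def combined_cond_prob(feature, sense, ct_pair, ct_sense, cp_pair, cp_sense, lam=0.7):
    """Estimate c_t(f,s)/c_t(s) + lam * c_p(f,s)/c_p(s), following formula (3).

    Missing counts are treated as zero; a sense with no occurrences in a corpus
    contributes nothing from that corpus.
    """
    real = ct_pair.get((feature, sense), 0) / ct_sense[sense] if ct_sense.get(sense) else 0.0
    pseudo = cp_pair.get((feature, sense), 0) / cp_sense[sense] if cp_sense.get(sense) else 0.0
    return real + lam * pseudo

# Sense counts as in the embodiment; the pair counts below are hypothetical.
ct_sense = {"personnel": 10}
cp_sense = {"personnel": 10}
ct_pair = {("personnel", "personnel"): 4}
cp_pair = {("personnel", "personnel"): 5}
value = combined_cond_prob("personnel", "personnel", ct_pair, ct_sense, cp_pair, cp_sense)
unseen = combined_cond_prob("signal", "personnel", ct_pair, ct_sense, cp_pair, cp_sense)
print(value, unseen)
```

Because the two ratios are added rather than averaged, the result can exceed 1 (its maximum is 1 + λ), so it behaves as a score for comparing senses rather than as a normalised probability.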
In summary, the invention is a practical Chinese word sense disambiguation technique. Without requiring additional manual annotation, it effectively alleviates the data sparseness problem faced by word sense disambiguation and improves disambiguation accuracy. The method has broad development prospects and can greatly promote the development of natural language processing fields such as information retrieval, machine translation, language identification, syntactic analysis, and text processing. Compared with a traditional supervised word sense disambiguation method, accuracy improves by 4.35 percentage points.
Description of the drawings
Fig. 1 shows the word sense disambiguation method based on synonym expansion;
Fig. 2 is an overall structural diagram of the invention.
Detailed description of the embodiments
Embodiment 1
Fig. 1 is a schematic diagram of the overall method of the invention; a concrete implementation is given below with an example. The sentence "All <word>unit</word> team members' work has been completed" is the training corpus, the sentence "<word>Unit</word> personnel did not send out a distress signal" is the test corpus, and the ambiguous word "unit" in the test corpus is disambiguated.
The overall implementation process is as follows:
(1) The context of the training corpus is expanded using the Chinese thesaurus to generate the pseudo training corpus.
Each context word of the ambiguous sentence is expanded with the Chinese thesaurus to obtain its synonym set, for example "all, all, all, whole"; "personnel, group member, team member, party member"; and "task, responsibility, sole duty, bounden duty". These synonym sets, the ambiguous word "unit", and its sense "personnel" together form the pseudo training corpus.
(2) Noise in the pseudo training corpus is removed using the word collocation corpus.
Because the data produced in step (1) contains some noise, the invention removes noise from the pseudo training corpus using a word collocation corpus. Only synonyms with a certain co-occurrence count are added to the pseudo training corpus. The co-occurrence threshold is currently set to 25, and synonyms whose co-occurrence count exceeds 25 are used to build the pseudo training corpus. The synonyms "party member", "sole duty", and "bounden duty" are therefore filtered out and not added to the pseudo training corpus.
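The filtering rule in step (2) can be sketched as a simple threshold test. The threshold of 25 and the three filtered words come from the text; everything else is an assumption for illustration: the function name `filter_by_collocation`, the `(word, target)` key layout, and all co-occurrence counts below are hypothetical and chosen only so that the toy run reproduces the outcome described above.

```python
def filter_by_collocation(candidates, target, cooc_counts, threshold=25):
    """Keep candidates whose co-occurrence count with the target exceeds the threshold."""
    kept = []
    for word in candidates:
        # Words absent from the collocation table are treated as never co-occurring.
        if cooc_counts.get((word, target), 0) > threshold:
            kept.append(word)
    return kept

# Hypothetical co-occurrence counts with the ambiguous word "unit".
toy_counts = {
    ("personnel", "unit"): 120,
    ("group member", "unit"): 40,
    ("team member", "unit"): 85,
    ("party member", "unit"): 3,
    ("task", "unit"): 60,
    ("responsibility", "unit"): 31,
    ("bounden duty", "unit"): 2,
}
candidates = ["personnel", "group member", "team member", "party member",
              "task", "responsibility", "sole duty", "bounden duty"]
kept = filter_by_collocation(candidates, "unit", toy_counts)
print(kept)
```

The comparison is strict because the text keeps synonyms whose count is "more than 25"; a word with exactly 25 co-occurrences would be dropped under this reading.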
(3) The Bayesian disambiguation model is trained on the training corpus and the pseudo training corpus together.
According to formula (2), the co-occurrence counts of the senses ("personnel" and "machine") with the features are counted. The features $f_j$ include not only the context words in the training corpus ("entirety", "unit", "team member", "task") but also the context synonyms that remain after noise removal ("all", "all", "whole", "personnel", "group member", "responsibility"). The co-occurrence counts $c_t(f_j, s_i)$ and $c_p(f_j, s_i)$ in the training corpora are shown in Table 1:
Table 1: Co-occurrence counts of features and senses in the training corpus
In addition, the numbers of times each sense $s_i$ occurs in the training corpus and the pseudo training corpus are $c_t(s_{\text{personnel}}) = 10$, $c_t(s_{\text{machine}}) = 8$ and $c_p(s_{\text{personnel}}) = 10$, $c_p(s_{\text{machine}}) = 8$.
(4) The test example and its context synonyms are input into the supervised model, and the training corpus and the pseudo training corpus are used jointly to decide the sense of the ambiguous word.
For the context features "personnel", "send", "distress", and "signal" in the test corpus, the co-occurrence counts of the senses "personnel" and "machine" with these features are looked up, as shown in Table 2.
Table 2: Co-occurrence counts for the test corpus
The value of λ in formula (3) is estimated on the training corpus and is 0.7. The conditional probabilities of each feature given the senses "personnel" and "machine" are computed using formula (3), and the probabilities of occurrence of the two senses are obtained as well. The score of each sense is then computed according to formula (1). The sense with the highest probability is "personnel"; it is labeled as the final sense of the ambiguous word in the test example, which completes the word sense disambiguation.
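The decision in step (4) can be illustrated end to end with a short Python sketch. Only the sense counts (10 and 8 in both corpora) and λ = 0.7 are taken from the embodiment. The (feature, sense) co-occurrence counts are hypothetical placeholders, not the values of Tables 1 and 2, and estimating the prior from the real-corpus sense counts is likewise an assumption.

```python
LAMBDA = 0.7  # weight of the pseudo training corpus, value taken from the text

# Sense counts c_t(s_i) and c_p(s_i) as stated in the embodiment.
ct_sense = {"personnel": 10, "machine": 8}
cp_sense = {"personnel": 10, "machine": 8}

# HYPOTHETICAL (feature, sense) co-occurrence counts, for illustration only.
ct_pair = {
    ("personnel", "personnel"): 4, ("send", "personnel"): 2,
    ("distress", "personnel"): 1, ("signal", "personnel"): 2,
    ("personnel", "machine"): 1, ("send", "machine"): 2,
    ("distress", "machine"): 1, ("signal", "machine"): 3,
}
cp_pair = {
    ("personnel", "personnel"): 5, ("send", "personnel"): 1,
    ("distress", "personnel"): 1, ("signal", "personnel"): 1,
    ("send", "machine"): 1, ("signal", "machine"): 2,
}

def cond_prob(feature, sense):
    """Formula (3): real-corpus ratio plus LAMBDA times the pseudo-corpus ratio."""
    real = ct_pair.get((feature, sense), 0) / ct_sense[sense]
    pseudo = cp_pair.get((feature, sense), 0) / cp_sense[sense]
    return real + LAMBDA * pseudo

def sense_score(sense, features):
    """Formula (1): prior times the product of the per-feature estimates."""
    prior = ct_sense[sense] / sum(ct_sense.values())  # assumption: prior from real-corpus counts
    result = prior
    for feature in features:
        result *= cond_prob(feature, sense)
    return result

features = ["personnel", "send", "distress", "signal"]
scores = {sense: sense_score(sense, features) for sense in ct_sense}
best = max(scores, key=scores.get)
print(best, scores)
```

With these made-up counts the sense "personnel" scores higher than "machine", which agrees with the embodiment's conclusion only because the placeholder counts were chosen that way; the actual table values would be needed to reproduce the patent's numbers.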
The above is only a preferred embodiment of the present invention and does not limit the invention. For those skilled in the art, the invention may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the claims of the invention.

Claims (5)

1. A Bayesian word sense disambiguation method based on synonym expansion, characterized by comprising the following steps:
Step 1: expand the context of the training corpus using a Chinese thesaurus to generate a large amount of pseudo training data;
Step 2: remove noise from the pseudo training data using a word collocation corpus to produce the pseudo training corpus;
Step 3: train a Bayesian disambiguation model on the training corpus and the pseudo training corpus together;
Step 4: input the test corpus into the Bayesian disambiguation model and jointly decide the sense of the ambiguous word using the disambiguation knowledge in both corpora.
2. The Bayesian word sense disambiguation method based on synonym expansion according to claim 1, characterized in that step 1 specifically comprises: first, a small-scale word sense disambiguation training corpus is built by manual annotation; then the context in the sentence containing the ambiguous word is expanded using a Chinese thesaurus; finally, the expanded synonyms, the ambiguous word, and the sense of the ambiguous word in that sentence are used to generate a large amount of pseudo training data.
3. The Bayesian word sense disambiguation method based on synonym expansion according to claim 1, characterized in that step 2 specifically comprises: the context of the ambiguous word is expanded using the Chinese thesaurus; for each expanded context word, its number of co-occurrences with the ambiguous word in the word collocation corpus is counted; only context words with a certain co-occurrence count are used to build the pseudo training corpus.
4. The Bayesian word sense disambiguation method based on synonym expansion according to claim 1, characterized in that in step 3 the Bayesian disambiguation model is trained on the training corpus and the pseudo training corpus together, and the formula is:
$$p(s_i \mid w_{-L} \ldots w_0 \ldots w_L) \propto p(s_i) \prod_{f_j \in F} p(f_j \mid s_i),$$
where $s_i$ denotes a sense of the ambiguous word, $w_{-L} \ldots w_L$ denote the words within a window of size $L$ around the ambiguous word $w_0$, $f_j$ denotes a context feature of the ambiguous word, $F$ denotes the set of context features, and $p(f_j \mid s_i)$ denotes the conditional probability of the feature given the sense, computed as:
$$p(f_j \mid s_i) = \frac{c(f_j, s_i)}{c(s_i)},$$
where $c(s_i)$ denotes the number of times sense $s_i$ occurs in the training corpus and $c(f_j, s_i)$ denotes the number of co-occurrences of feature $f_j$ and sense $s_i$ in the training corpus.
5. The Bayesian word sense disambiguation method based on synonym expansion according to claim 1, characterized in that step 4 specifically comprises: the language fragments formed from the context expanded by the Chinese thesaurus are treated as the pseudo training corpus; the knowledge in the training corpus and the pseudo training corpus is used jointly for word sense disambiguation; when estimating the conditional probability of a feature given a sense, the following formula is used:
$$p(f_j \mid s_i) = \frac{c_t(f_j, s_i)}{c_t(s_i)} + \lambda \frac{c_p(f_j, s_i)}{c_p(s_i)}$$
where $c_t(f_j, s_i)$ denotes the number of co-occurrences of sense $s_i$ and feature $f_j$ in the training corpus, $c_t(s_i)$ denotes the number of occurrences of sense $s_i$ in the training corpus, $c_p(f_j, s_i)$ denotes the number of co-occurrences of feature $f_j$ with sense $s_i$ of the ambiguous word in the pseudo training corpus, $c_p(s_i)$ denotes the number of occurrences of sense $s_i$ in the pseudo training corpus, and $\lambda$ is set to 0.7.
CN201611157518.1A 2016-12-15 2016-12-15 Bayesian word sense disambiguation method based on synonym expansion Pending CN106598947A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611157518.1A CN106598947A (en) 2016-12-15 2016-12-15 Bayesian word sense disambiguation method based on synonym expansion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611157518.1A CN106598947A (en) 2016-12-15 2016-12-15 Bayesian word sense disambiguation method based on synonym expansion

Publications (1)

Publication Number Publication Date
CN106598947A true CN106598947A (en) 2017-04-26

Family

ID=58801472

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611157518.1A Pending CN106598947A (en) 2016-12-15 2016-12-15 Bayesian word sense disambiguation method based on synonym expansion

Country Status (1)

Country Link
CN (1) CN106598947A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101295294A (en) * 2008-06-12 2008-10-29 昆明理工大学 Improved Bayes acceptation disambiguation method based on information gain

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
GERARD ESCUDERO ET AL.: "Naive Bayes and Exemplar-Based approaches to Word Sense Disambiguation Revisited", 《HTTPS://WWW.RESEARCHGATE.NET/PUBLICATION/1955019》 *
卢志茂,等: "基于依存分析改进贝叶斯模型的词义消歧", 《高技术通讯》 *
杨陟卓,等: "基于语言模型的有监督词义消歧模型优化研究", 《中文信息学报》 *
车超: "知识自动获取的词义消歧方法", 《中国博士学位论文全文数据库 信息科技辑(月刊)》 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107357786A (en) * 2017-07-13 2017-11-17 山西大学 A kind of Bayes's Word sense disambiguation method based on a large amount of pseudo- data
WO2019085640A1 (en) * 2017-10-31 2019-05-09 株式会社Ntt都科摩 Word meaning disambiguation method and device, word meaning expansion method, apparatus and device, and computer-readable storage medium
CN108388553A (en) * 2017-12-28 2018-08-10 广州索答信息科技有限公司 Talk with method, electronic equipment and the conversational system towards kitchen of disambiguation
CN108388553B (en) * 2017-12-28 2021-10-15 广州索答信息科技有限公司 Method for eliminating ambiguity in conversation, electronic equipment and kitchen-oriented conversation system
CN109033307B (en) * 2018-07-17 2021-08-31 华北水利水电大学 CRP clustering-based word multi-prototype vector representation and word sense disambiguation method
CN109033307A (en) * 2018-07-17 2018-12-18 华北水利水电大学 Word polyarch vector based on CRP cluster indicates and Word sense disambiguation method
CN112836057A (en) * 2019-11-22 2021-05-25 华为技术有限公司 Knowledge graph generation method, device, terminal and storage medium
WO2021098491A1 (en) * 2019-11-22 2021-05-27 华为技术有限公司 Knowledge graph generating method, apparatus, and terminal, and storage medium
CN112836057B (en) * 2019-11-22 2024-03-26 华为技术有限公司 Knowledge graph generation method, device, terminal and storage medium
CN111783418A (en) * 2020-06-09 2020-10-16 北京北大软件工程股份有限公司 Chinese meaning representation learning method and device
CN111783418B (en) * 2020-06-09 2024-04-05 北京北大软件工程股份有限公司 Chinese word meaning representation learning method and device
CN112528670A (en) * 2020-12-01 2021-03-19 清华大学 Word meaning processing method and device, electronic equipment and storage medium
CN112528670B (en) * 2020-12-01 2022-08-30 清华大学 Word meaning processing method and device, electronic equipment and storage medium
CN112966071A (en) * 2021-02-03 2021-06-15 北京奥鹏远程教育中心有限公司 User feedback information analysis method, device, equipment and readable storage medium
CN112966071B (en) * 2021-02-03 2023-09-08 北京奥鹏远程教育中心有限公司 User feedback information analysis method, device, equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN106598947A (en) Bayesian word sense disambiguation method based on synonym expansion
Kondratyuk et al. 75 languages, 1 model: Parsing universal dependencies universally
Wieting et al. Charagram: Embedding words and sentences via character n-grams
Jianqiang Pre-processing boosting Twitter sentiment analysis?
CN110362678A (en) A kind of method and apparatus automatically extracting Chinese text keyword
CN108388554A (en) Text emotion identifying system based on collaborative filtering attention mechanism
CN105512110A (en) Wrong word knowledge base construction method based on fuzzy matching and statistics
CN108363688A (en) A kind of name entity link method of fusion prior information
CN104933032A (en) Method for extracting keywords of blog based on complex network
CN107688630A (en) A kind of more sentiment dictionary extending methods of Weakly supervised microblogging based on semanteme
CN114298010A (en) Text generation method integrating dual-language model and sentence detection
CN106202039A (en) Vietnamese portmanteau word disambiguation method based on condition random field
CN108038166A (en) A kind of Chinese microblog emotional analysis method based on the subjective and objective skewed popularity of lexical item
Wang et al. Tibetan word segmentation method based on bilstm_ crf model
Hao et al. SCESS: a WFSA-based automated simplified chinese essay scoring system with incremental latent semantic analysis
CN110765762B (en) System and method for extracting optimal theme of online comment text under big data background
pal Singh et al. Naive Bayes classifier for word sense disambiguation of Punjabi language
Nawar CUFE@ QALB-2015 shared task: Arabic error correction system
CN106126606A (en) A kind of short text new word discovery method
CN108776656A (en) Food safety affair entity abstracting method based on condition random field
JP2007072610A (en) Information processing method, apparatus and program
CN107357786A (en) A kind of Bayes's Word sense disambiguation method based on a large amount of pseudo- data
CN111814456A (en) Verb-based Chinese text similarity calculation method
Wang et al. Morpheme Sense Disambiguation: A New Task Aiming for Understanding the Language at Character Level
Çavuşoğlu et al. Adapting established text representations for predicting review sentiment in Turkish

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170426