CN106598947A - Bayesian word sense disambiguation method based on synonym expansion - Google Patents
- Publication number
- CN106598947A
- Authority
- CN
- China
- Prior art keywords
- word
- corpus
- training
- meaning
- disambiguation
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/24155—Bayesian classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Data Mining & Analysis (AREA)
- Probability & Statistics with Applications (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Machine Translation (AREA)
Abstract
The invention belongs to the technical field of natural language processing and relates in particular to a Bayesian word sense disambiguation method based on synonym expansion. The method mainly addresses two problems of current word sense disambiguation methods: poor disambiguation performance, and the time and effort required to acquire disambiguation knowledge. The method comprises the following steps: (1) expanding the context of a training corpus using a Chinese thesaurus to generate a large amount of pseudo training data; (2) removing noise from the pseudo training data using a word collocation corpus to generate a pseudo training corpus; (3) training a Bayesian disambiguation model on the training corpus and the pseudo training corpus together; and (4) inputting a test corpus into the Bayesian disambiguation model, which jointly determines the senses of ambiguous words by combining the disambiguation knowledge in the two corpora.
Description
Technical field
The invention belongs to the technical field of natural language processing methods and, specifically, relates to a Bayesian word sense disambiguation method based on synonym expansion.
Technical background
Word sense disambiguation (WSD) refers to determining the specific meaning of a polysemous word in a given natural-language context; it is a core problem in the field of natural language processing. When a machine processes natural language, lexical ambiguity arises whenever an ambiguous word appears in a specific context, and in the current Internet era of "information explosion" the problem has become even more serious. Polysemy is widespread in both Chinese and Western languages. Statistical studies show that, in large-scale corpora, the frequency of ambiguous words in both Chinese and English text is about 40%. Such frequent ambiguous words seriously hinder the correct understanding and processing of natural language by machines, and the problem has become one of the greatest difficulties the field faces. Progress in this technology would greatly promote the development of natural language processing fields such as language identification, syntactic analysis, information retrieval, machine translation and text processing.
At present, corpus-based word sense disambiguation methods can be divided into supervised and unsupervised methods. Unsupervised methods do not require a training corpus, but their disambiguation performance is unsatisfactory and rarely good enough for practical use. Supervised methods perform markedly better, but they depend on large-scale, high-quality training corpora, which are time-consuming and laborious to obtain; this has seriously hindered the large-scale application of supervised word sense disambiguation. To solve this problem, many researchers have begun to study methods for automatically generating annotated corpora. Such methods typically first use a dictionary and a large-scale unannotated corpus to generate labeled data automatically, and then train a disambiguation model with a supervised method to perform disambiguation.
Summary of the invention
The present invention mainly addresses the problems of current word sense disambiguation methods, namely poor disambiguation performance and the time and effort required to acquire disambiguation knowledge, and provides a Bayesian word sense disambiguation method based on synonym expansion.
The technical solution adopted by the present invention to solve the above problems is:
A Bayesian word sense disambiguation method based on synonym expansion, comprising the following steps:
Step 1: expand the context of the training corpus using a Chinese thesaurus, generating a large amount of pseudo corpus data;
Step 2: remove the noise in the pseudo corpus data using a word collocation corpus, generating a pseudo training corpus;
Step 3: train a Bayesian disambiguation model using the training corpus and the pseudo training corpus together;
Step 4: input the test corpus into the Bayesian disambiguation model and, combining the disambiguation knowledge in the two corpora, jointly decide the sense of the ambiguous word.
Further, the specific steps of step 1 are as follows: first, build a small-scale word sense disambiguation training corpus by manual annotation; then expand the context of the sentence containing the ambiguous word using the Chinese thesaurus; finally, generate a large amount of pseudo corpus data from the expanded synonyms, the ambiguous word and the sense of the ambiguous word in that sentence.
The specific steps of step 2 are as follows: after the context of the ambiguous word has been expanded using the Chinese thesaurus, count, for each expanded context word, the number of times it co-occurs with the ambiguous word in the word collocation corpus, and build the pseudo training corpus using only the context words that have a certain co-occurrence count.
In step 3, the Bayesian disambiguation model is trained using the training corpus and the pseudo training corpus together; the calculation formula is:
In the formula, s_i denotes a sense of the ambiguous word, w_{-L} ... w_L denote the words within a window of size L around the ambiguous word w_0, f_j denotes a contextual feature of the ambiguous word, F denotes the feature set of the context, and p(f_j | s_i) denotes the conditional probability of feature f_j given sense s_i, which is calculated as:
c(s_i) denotes the number of times sense s_i occurs in the training corpus, and c(f_j, s_i) denotes the number of times feature f_j and sense s_i co-occur in the training corpus.
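The two formulas referred to here do not survive in this text. A reconstruction from the symbol definitions, assuming the standard naive Bayes decision rule with relative-frequency estimates (an assumption that the source does not confirm), would read:

```latex
% Assumed form of the decision rule: prior times product of feature likelihoods
\hat{s} = \arg\max_{s_i} \, p(s_i \mid w_{-L} \ldots w_{L})
        = \arg\max_{s_i} \, p(s_i) \prod_{f_j \in F} p(f_j \mid s_i)

% Assumed form of the conditional probability: a ratio of corpus counts
p(f_j \mid s_i) = \frac{c(f_j, s_i)}{c(s_i)}
```

Here \hat{s} is a symbol introduced for the selected sense; all other symbols are those defined in the text.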
The specific steps of step 4 are as follows: the language fragments formed from the context words expanded with the Chinese thesaurus serve as the pseudo training corpus; the knowledge in the training corpus and the pseudo training corpus is combined to perform word sense disambiguation, and the conditional probability of a feature given a sense is estimated by the following formula:
In the formula, c_t(f_j, s_i) denotes the number of times sense s_i and feature f_j co-occur in the training corpus, c_t(s_i) denotes the number of occurrences of sense s_i in the training corpus, c_p(f_j, s_i) denotes the number of times the feature co-occurs with the ambiguous word in the pseudo training corpus, c_p(s_i) denotes the number of occurrences of sense s_i in the pseudo training corpus, and the value of λ is 0.7.
By adopting the above technical solution, the present invention uses a Chinese thesaurus to expand the context of the ambiguous words in the training corpus; the language fragments formed from the expanded synonyms are similar in meaning to the original context and are used to generate a pseudo training corpus. The noise in the pseudo training corpus is then removed using a word collocation corpus, after which a Bayesian disambiguation model is trained on the training corpus and the pseudo training corpus; finally, the disambiguation model decides the sense of the ambiguous word. Specifically:
1. The context of the training examples is expanded using a Chinese thesaurus to generate a pseudo training corpus. The present invention uses a Chinese thesaurus to expand the context of the ambiguous word within a certain window size, providing more knowledge for word sense disambiguation. The language fragment formed from synonyms of the context words is similar in meaning to the original context fragment, and the ambiguous word expresses the same sense in these two similar contexts. Therefore, the context of the sentence containing the ambiguous word can be expanded, and the expanded context, the ambiguous word and the sense of the ambiguous word in that sentence together form the pseudo training corpus. Take, for example, the ambiguous sentence "The work of the whole unit's team members has been completed." The ambiguous word "unit" has two senses in modern Chinese, "personnel" and "machine". Judging from the context of the ambiguous word, the sense of "unit" here is "personnel". The synonym expansion of this sentence is illustrated in Fig. 1.
The words closest to the ambiguous word have the greatest influence on its sense; therefore, the figure lists only the nearest context words, "whole", "team member" and "work", and synonym expansion is applied only to these context words. As shown in the figure, each synonym set lists only 4 synonyms of the corresponding context word. These synonyms can be recombined into multiple fragments containing the ambiguous word, such as "whole unit personnel task", "all unit group member responsibility" and "all unit team member sole duty". In the language fragments assembled from synonyms, the ambiguous word "unit" still has the sense "personnel". The above synonyms, the ambiguous word "unit" and its sense "personnel" together form the pseudo training corpus.
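The expansion described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function name `expand_context`, the dictionary-based `toy_thesaurus` and its English glosses are all hypothetical stand-ins for a lookup in the Chinese thesaurus.

```python
from itertools import product


def expand_context(context_words, thesaurus, ambiguous_word, sense):
    """Generate pseudo examples by swapping context words for synonyms.

    `thesaurus` is a hypothetical dict mapping a word to its synonyms;
    it stands in for a lookup in the Chinese thesaurus.
    """
    # Candidates for each position: the original word plus its synonyms.
    candidates = [[w] + thesaurus.get(w, []) for w in context_words]
    original = tuple(context_words)
    pseudo_examples = []
    for combo in product(*candidates):
        if combo == original:
            continue  # the unmodified context is already in the training corpus
        # The ambiguous word keeps the sense it had in the source sentence.
        pseudo_examples.append((combo, ambiguous_word, sense))
    return pseudo_examples


# Toy thesaurus with English glosses, for illustration only.
toy_thesaurus = {
    "whole": ["all", "entire"],
    "team member": ["personnel", "group member"],
    "work": ["task", "responsibility"],
}
pseudo = expand_context(
    ["whole", "team member", "work"], toy_thesaurus, "unit", "personnel"
)
print(len(pseudo))  # number of generated pseudo examples
print(pseudo[0])    # first recombined context with its sense label
```

Every generated example carries the sense label of the source sentence, which is the assumption the text relies on: a context built from synonyms leaves the sense of the ambiguous word unchanged.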
2. The noise in the pseudo corpus is removed using a word collocation corpus. Two problems arise when context words are expanded with synonyms: (1) Not all synonyms of a context word are suitable for expansion and for building new training data. For example, the synonyms of "work" include "sole duty" and "bounden duty", but these 2 words are not suitable for the expanded corpus, because such word combinations are rarely used in everyday language. (2) In many cases the context words are not themselves monosemous but are also ambiguous; for example, the word "work" has 2 basic senses: ① "task" and ② "operate" or "do things". To solve these 2 problems, the present invention restricts synonym expansion using word collocation relations. Specifically, only words that co-occur with the ambiguous word a certain number of times are used to train the disambiguation model. Restricting by collocation count not only filters out word collocations rarely used in everyday Chinese, but also largely solves the problem of noise words caused by the ambiguity of the context words.
3. The Bayesian disambiguation model is trained using the training corpus and the pseudo training corpus together.
The present invention performs disambiguation training with a Bayesian model, as shown in formula (1).
In formula (1), s_i denotes a sense of the ambiguous word, w_{-L} ... w_L denote the words within a window of size L around the ambiguous word w_0, f_j denotes a contextual feature of the ambiguous word, F denotes the feature set of the context, and p(f_j | s_i) denotes the conditional probability of feature f_j given sense s_i, calculated as shown in formula (2).
In formula (2), c(s_i) denotes the number of times sense s_i occurs in the training corpus, and c(f_j, s_i) denotes the number of times feature f_j and sense s_i co-occur in the training corpus.
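As an illustration of the model just described, the sketch below assumes the standard naive Bayes form: the prior of each sense multiplied by the product of relative-frequency estimates c(f_j, s_i) / c(s_i). The function names, the toy examples and the small probability floor `eps` are assumptions made for this sketch and are not taken from the patent.

```python
from collections import Counter
from math import log


def train(examples):
    """Count sense occurrences c(s) and feature/sense co-occurrences c(f, s).

    `examples` is an iterable of (features, sense) pairs.
    """
    sense_count, pair_count = Counter(), Counter()
    for features, sense in examples:
        sense_count[sense] += 1
        for f in set(features):  # count each feature once per example
            pair_count[(f, sense)] += 1
    return sense_count, pair_count


def classify(features, sense_count, pair_count, eps=1e-6):
    """Return the sense maximising log p(s) + sum of log p(f | s)."""
    total = sum(sense_count.values())
    best_sense, best_score = None, float("-inf")
    for sense, c_s in sense_count.items():
        score = log(c_s / total)  # log prior of the sense
        for f in features:
            p = pair_count[(f, sense)] / c_s  # relative-frequency estimate
            score += log(max(p, eps))  # eps floor: added assumption, avoids log(0)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Toy annotated examples with English glosses, for illustration only.
toy_examples = [
    (["team member", "work"], "personnel"),
    (["personnel", "signal"], "personnel"),
    (["engine", "repair"], "machine"),
]
sense_count, pair_count = train(toy_examples)
predicted = classify(["personnel", "signal"], sense_count, pair_count)
print(predicted)
```

Working in log space avoids numerical underflow when many features are multiplied together; the decision is the same as with the product form.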
4. The training corpus and the pseudo training corpus are combined to jointly decide the sense of the ambiguous word.
The present invention builds the pseudo training corpus from the language fragments formed by the synonyms expanded from the context words, and combines the training corpus and the pseudo training corpus to perform word sense disambiguation. Compared with the manually annotated corpus, the pseudo training corpus still contains some noise and should therefore play a relatively smaller role. When the sense decision is made with the Bayesian formula (1), the two types of corpus jointly decide the sense, and the conditional probability of a feature given a sense is estimated with formula (3):
In formula (3), c_t(f_j, s_i) denotes the number of times sense s_i and feature f_j co-occur in the training corpus, and c_t(s_i) denotes the number of occurrences of sense s_i in the training corpus. c_p(f_j, s_i) denotes the number of times the feature co-occurs with the ambiguous word in the pseudo training corpus, and c_p(s_i) denotes the number of occurrences of sense s_i in the pseudo training corpus. λ adjusts the influence of the pseudo training corpus on the sense of the ambiguous word, and its value is 0.7.
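Formula (3) itself is not reproduced in this text, so its exact form is unknown. The sketch below assumes one form that is consistent with the definitions and with the statement that the pseudo corpus should carry less weight: the pseudo-corpus counts are scaled by λ before being pooled with the training counts. A linear interpolation of the two relative frequencies would be another possible reading. The function name and the example counts are invented for illustration.

```python
def combined_prob(c_t_fs, c_t_s, c_p_fs, c_p_s, lam=0.7):
    """Estimate p(f | s) from training counts and lambda-scaled pseudo counts.

    Assumed form (not confirmed by the source):
        (c_t(f, s) + lam * c_p(f, s)) / (c_t(s) + lam * c_p(s))
    """
    denominator = c_t_s + lam * c_p_s
    if denominator == 0:
        return 0.0  # the sense was never observed in either corpus
    return (c_t_fs + lam * c_p_fs) / denominator


# Invented counts: the feature co-occurs with the sense 4 times in 10
# training occurrences and 6 times in 10 pseudo-corpus occurrences.
p = combined_prob(c_t_fs=4, c_t_s=10, c_p_fs=6, c_p_s=10)
print(round(p, 4))
```

With λ below 1, each pseudo-corpus observation counts for less than a manually annotated one, which matches the stated intent of limiting the effect of noise.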
In summary, the present invention is a practical Chinese word sense disambiguation technique. Without requiring manual annotation, it effectively alleviates the data sparseness problem faced by word sense disambiguation and improves disambiguation accuracy. The method has broad prospects and can greatly promote the development of natural language processing fields such as information retrieval, machine translation, language identification, syntactic analysis and text processing. Compared with the traditional supervised word sense disambiguation method, accuracy is improved by 4.35 percentage points.
Description of the drawings
Fig. 1 shows the word sense disambiguation method based on synonym expansion;
Fig. 2 is the overall structure diagram of the present invention.
Specific embodiment
Embodiment 1
Fig. 1 is a schematic diagram of the overall method of the present invention; a specific implementation is given below with an example. The sentence "The work of all <word>unit</word> team members has been completed" is the training corpus, and the sentence "The <word>unit</word> personnel did not send out a distress signal" is the test corpus; the ambiguous word "unit" in the test corpus is to be disambiguated.
The overall implementation process is as follows:
(1) The context of the training corpus is expanded using the Chinese thesaurus to generate the pseudo corpus.
The context words of the ambiguous sentence are each expanded using the Chinese thesaurus, giving a synonym set for each context word, for example "all, all, all, whole", "personnel, group member, team member, party member" and "task, responsibility, sole duty, bounden duty". The above synonym sets, the ambiguous word "unit" and its sense "personnel" together form the pseudo corpus.
(2) The noise in the pseudo corpus is removed using the word collocation corpus.
Because the corpus produced in step (1) contains some noise, the present invention removes the noise from the pseudo corpus using a word collocation corpus. Only synonyms with a certain co-occurrence count are added to the pseudo training corpus. The co-occurrence count threshold is currently set to 25, and the pseudo training corpus is built from synonyms whose co-occurrence count exceeds 25. The synonyms "party member", "sole duty" and "bounden duty" are therefore filtered out and are not added to the pseudo corpus.
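The threshold filter can be sketched as below. The threshold of 25 and the filtered words come from the text; the function name and every co-occurrence count in `toy_cooccurrence` are made up for illustration, since the collocation statistics are not given.

```python
def filter_synonyms(synonyms, cooccurrence, threshold=25):
    """Keep synonyms whose co-occurrence count with the ambiguous word exceeds the threshold.

    `cooccurrence` is a hypothetical dict of counts taken from a word
    collocation corpus; words missing from it count as 0.
    """
    return [w for w in synonyms if cooccurrence.get(w, 0) > threshold]


# Made-up co-occurrence counts with the ambiguous word "unit".
toy_cooccurrence = {
    "personnel": 120,
    "group member": 40,
    "team member": 85,
    "party member": 3,
    "task": 60,
    "responsibility": 31,
    "sole duty": 2,
    "bounden duty": 0,
}
kept = filter_synonyms(list(toy_cooccurrence), toy_cooccurrence)
print(kept)
```

The comparison is strict, following the wording "exceeds 25"; a synonym with a count of exactly 25 would be dropped under this reading.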
(3) The Bayesian disambiguation model is trained using the training corpus and the pseudo training corpus together.
According to formula (2), the co-occurrence counts of the senses ("personnel" and "machine") and the features are counted. The features f_j include not only the context words in the training corpus ("entirety", "unit", "team member", "task") but also the context synonyms that remain after noise removal ("all", "all", "whole", "personnel", "group member", "responsibility"). The statistics of the co-occurrence counts c_t(f_j, s_i) and c_p(f_j, s_i) in the training corpora are shown in Table 1:
Table 1. Co-occurrence counts of features and senses in the training corpora
In addition, the numbers of times each sense s_i occurs in the training corpus and the pseudo training corpus are c_t(s_personnel) = 10, c_t(s_machine) = 8 and c_p(s_personnel) = 10, c_p(s_machine) = 8.
(4) The test example and its context synonyms are input into the supervised model, and the training corpus and the pseudo training corpus are combined to jointly decide the sense of the ambiguous word.
The contextual features in the test corpus are "personnel", "send out", "distress" and "signal". For each of these features, the co-occurrence counts with the senses "personnel" and "machine" of the ambiguous word are looked up, as shown in Table 2.
Table 2. Co-occurrence counts for the test corpus
The value of λ in formula (3), estimated on the training corpus, is 0.7. The conditional probabilities of the features given the sense "personnel" and given the sense "machine" are calculated with formula (3). In addition, the probabilities of occurrence of the two senses are calculated, and the score of each sense is then computed according to formula (1).
The sense with the maximum probability is therefore "personnel"; it is assigned as the final sense of the ambiguous word in the test example, completing the word sense disambiguation.
The above is only a preferred embodiment of the present invention and is not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the claims of the present invention.
Claims (5)
1. A Bayesian word sense disambiguation method based on synonym expansion, characterized by comprising the following steps:
Step 1: expand the context of the training corpus using a Chinese thesaurus, generating a large amount of pseudo corpus data;
Step 2: remove the noise in the pseudo corpus data using a word collocation corpus, generating a pseudo training corpus;
Step 3: train a Bayesian disambiguation model using the training corpus and the pseudo training corpus together;
Step 4: input the test corpus into the Bayesian disambiguation model and, combining the disambiguation knowledge in the two corpora, jointly decide the sense of the ambiguous word.
2. The Bayesian word sense disambiguation method based on synonym expansion according to claim 1, characterized in that the specific steps of step 1 are: first, build a small-scale word sense disambiguation training corpus by manual annotation; then expand the context of the sentence containing the ambiguous word using the Chinese thesaurus; finally, generate a large amount of pseudo corpus data from the expanded synonyms, the ambiguous word and the sense of the ambiguous word in that sentence.
3. The Bayesian word sense disambiguation method based on synonym expansion according to claim 1, characterized in that the specific steps of step 2 are: after the context of the ambiguous word has been expanded using the Chinese thesaurus, count, for each expanded context word, the number of times it co-occurs with the ambiguous word in the word collocation corpus, and build the pseudo training corpus using only the context words that have a certain co-occurrence count.
4. The Bayesian word sense disambiguation method based on synonym expansion according to claim 1, characterized in that in step 3 the Bayesian disambiguation model is trained using the training corpus and the pseudo training corpus together, and the calculation formula is:
In the formula, s_i denotes a sense of the ambiguous word, w_{-L} ... w_L denote the words within a window of size L around the ambiguous word w_0, f_j denotes a contextual feature of the ambiguous word, F denotes the feature set of the context, and p(f_j | s_i) denotes the conditional probability of feature f_j given sense s_i, which is calculated as:
c(s_i) denotes the number of times sense s_i occurs in the training corpus, and c(f_j, s_i) denotes the number of times feature f_j and sense s_i co-occur in the training corpus.
5. The Bayesian word sense disambiguation method based on synonym expansion according to claim 1, characterized in that the specific steps of step 4 are: the language fragments formed from the context words expanded with the Chinese thesaurus serve as the pseudo training corpus; the knowledge in the training corpus and the pseudo training corpus is combined to perform word sense disambiguation, and the conditional probability of a feature given a sense is estimated by the following formula:
In the formula, c_t(f_j, s_i) denotes the number of times sense s_i and feature f_j co-occur in the training corpus, c_t(s_i) denotes the number of occurrences of sense s_i in the training corpus, c_p(f_j, s_i) denotes the number of times the feature co-occurs with the ambiguous word in the pseudo training corpus, c_p(s_i) denotes the number of occurrences of sense s_i in the pseudo training corpus, and the value of λ is 0.7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611157518.1A CN106598947A (en) | 2016-12-15 | 2016-12-15 | Bayesian word sense disambiguation method based on synonym expansion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106598947A true CN106598947A (en) | 2017-04-26 |
Family
ID=58801472
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611157518.1A Pending CN106598947A (en) | 2016-12-15 | 2016-12-15 | Bayesian word sense disambiguation method based on synonym expansion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106598947A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101295294A (en) * | 2008-06-12 | 2008-10-29 | 昆明理工大学 | Improved Bayes acceptation disambiguation method based on information gain |
Non-Patent Citations (4)
Title |
---|
GERARD ESCUDERO ET AL.: "Naive Bayes and Exemplar-Based approaches to Word Sense Disambiguation Revisited", 《HTTPS://WWW.RESEARCHGATE.NET/PUBLICATION/1955019》 * |
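Lu Zhimao et al.: "基于依存分析改进贝叶斯模型的词义消歧" (Word sense disambiguation with a Bayesian model improved by dependency analysis), 《高技术通讯》 * |
Yang Zhizhuo et al.: "基于语言模型的有监督词义消歧模型优化研究" (Research on optimizing supervised word sense disambiguation models based on language models), 《中文信息学报》 * |
Che Chao: "知识自动获取的词义消歧方法" (Word sense disambiguation methods with automatic knowledge acquisition), 《中国博士学位论文全文数据库 信息科技辑(月刊)》 * |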
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107357786A (en) * | 2017-07-13 | 2017-11-17 | 山西大学 | Bayesian word sense disambiguation method based on a large amount of pseudo data |
WO2019085640A1 (en) * | 2017-10-31 | 2019-05-09 | 株式会社Ntt都科摩 | Word meaning disambiguation method and device, word meaning expansion method, apparatus and device, and computer-readable storage medium |
CN108388553A (en) * | 2017-12-28 | 2018-08-10 | 广州索答信息科技有限公司 | Method for eliminating ambiguity in conversation, electronic equipment and kitchen-oriented conversation system |
CN108388553B (en) * | 2017-12-28 | 2021-10-15 | 广州索答信息科技有限公司 | Method for eliminating ambiguity in conversation, electronic equipment and kitchen-oriented conversation system |
CN109033307B (en) * | 2018-07-17 | 2021-08-31 | 华北水利水电大学 | CRP clustering-based word multi-prototype vector representation and word sense disambiguation method |
CN109033307A (en) * | 2018-07-17 | 2018-12-18 | 华北水利水电大学 | CRP clustering-based word multi-prototype vector representation and word sense disambiguation method |
CN112836057A (en) * | 2019-11-22 | 2021-05-25 | 华为技术有限公司 | Knowledge graph generation method, device, terminal and storage medium |
WO2021098491A1 (en) * | 2019-11-22 | 2021-05-27 | 华为技术有限公司 | Knowledge graph generating method, apparatus, and terminal, and storage medium |
CN112836057B (en) * | 2019-11-22 | 2024-03-26 | 华为技术有限公司 | Knowledge graph generation method, device, terminal and storage medium |
CN111783418A (en) * | 2020-06-09 | 2020-10-16 | 北京北大软件工程股份有限公司 | Chinese meaning representation learning method and device |
CN111783418B (en) * | 2020-06-09 | 2024-04-05 | 北京北大软件工程股份有限公司 | Chinese word meaning representation learning method and device |
CN112528670A (en) * | 2020-12-01 | 2021-03-19 | 清华大学 | Word meaning processing method and device, electronic equipment and storage medium |
CN112528670B (en) * | 2020-12-01 | 2022-08-30 | 清华大学 | Word meaning processing method and device, electronic equipment and storage medium |
CN112966071A (en) * | 2021-02-03 | 2021-06-15 | 北京奥鹏远程教育中心有限公司 | User feedback information analysis method, device, equipment and readable storage medium |
CN112966071B (en) * | 2021-02-03 | 2023-09-08 | 北京奥鹏远程教育中心有限公司 | User feedback information analysis method, device, equipment and readable storage medium |
Çavuşoğlu et al. | Adapting established text representations for predicting review sentiment in Turkish |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20170426 |