CN105677639A - English word sense disambiguation method based on phrase structure syntax tree - Google Patents

English word sense disambiguation method based on phrase structure syntax tree

Info

Publication number
CN105677639A
Authority
CN
China
Prior art keywords
word
meaning
disambiguation
ambiguity
phrase structure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610011045.8A
Other languages
Chinese (zh)
Inventor
鹿文鹏
成金勇
张维玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Original Assignee
Qilu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology
Priority to CN201610011045.8A
Publication of CN105677639A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to an English word sense disambiguation method based on a phrase structure syntax tree, in the field of natural language processing. The method comprises the following steps: 1, phrase structure parsing is performed on a sentence to generate its phrase structure syntax tree; 2, sense-related words are selected on the basis of the phrase structure syntax tree; 3, a word sense disambiguation model is constructed, and the correct word sense is determined by evaluating how closely each sense of the ambiguous word is related to the sense-related words; 4, the parameters of the model from step 3 are optimized with a genetic algorithm on a sense-tagged corpus; 5, for a word to be disambiguated, steps 1 and 2 are repeated and the correct sense is determined with the optimized model obtained in step 4. Because the phrase structure syntax tree is used both to select the sense-related words and to assign them disambiguation weights, interference from noise words is reduced, word sense relatedness is computed more accurately, and the accuracy of English word sense disambiguation improves.

Description

An English word sense disambiguation method based on a phrase structure syntax tree
Technical field
The present invention relates to an English word sense disambiguation method, and in particular to an English word sense disambiguation method based on a phrase structure syntax tree. It belongs to the technical field of natural language processing.
Background
Word sense disambiguation means determining the correct sense of an ambiguous word from the context in which it occurs. Word senses are the basic units that make up the meaning of a sentence, and resolving them is a precondition for understanding the sentence. Word sense disambiguation is a basic task in natural language processing and is widely needed in fields such as machine translation, information retrieval, text classification, and question answering.
The sense of an ambiguous word is determined by its context. Whether the sense-related words in the context can be selected accurately directly affects the performance of a word sense disambiguation system. Existing methods generally select context words with a sliding window, that is, they take the words within a fixed distance to the left and right of the ambiguous word. This approach considers only the linear distance between words in the sentence and ignores their syntactic and semantic relations. It cannot filter out nearby noise words, and it easily misses related words that are far away.
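The sliding-window selection described above can be sketched in a few lines of Python. This is only an illustration of the baseline, not code from the patent: the function name `window_context` and the window size of 3 are assumptions, and the token list is the lemmatized example sentence used later in the description.

```python
def window_context(tokens, target_index, size):
    """Return the tokens within `size` positions of the target, excluding the target itself."""
    left = tokens[max(0, target_index - size):target_index]
    right = tokens[target_index + 1:target_index + 1 + size]
    return left + right


tokens = ["the", "coach", "teach", "football", "be", "stand", "on", "the", "bus"]
context = window_context(tokens, tokens.index("coach"), size=3)
print(context)  # ['the', 'teach', 'football', 'be']
```

With this window the function words "the" and "be" are selected while the distant word "bus" is not, which is the weakness the background section points out.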
The sense of an ambiguous word is usually determined by comparing how closely each of its senses is related to the sense-related words in the context. Whether this closeness can be computed accurately has a decisive influence on the performance of a word sense disambiguation system. Related words at different distances influence the sense of the ambiguous word to different degrees and should be given suitable disambiguation weights. Existing methods generally treat all sense-related context words as equally weighted, which cannot reflect the differences between words at different distances and makes it difficult to assess accurately how closely a sense is related to the context.
In view of these problems, the present application proposes an English word sense disambiguation method based on a phrase structure syntax tree. The method uses the phrase structure syntax tree to select sense-related words and to assign them disambiguation weights, and determines the correct sense from the closeness between each sense and the sense-related words in the context.
Summary of the invention
The object of the invention is to overcome the shortcomings of existing word sense disambiguation techniques, chiefly in the selection and weighting of sense-related context words and in the computation of word sense relatedness, by proposing a new English word sense disambiguation method based on a phrase structure syntax tree.
The object of the invention is achieved through the following technical solution.
An English word sense disambiguation method based on a phrase structure syntax tree comprises the following steps.
Step 1: perform phrase structure parsing on the sentence to generate its phrase structure syntax tree, as follows.
Step 1.1: denote the sentence to be processed by the symbol S.
Step 1.2: preprocess sentence S, mainly by removing garbled characters and special symbols and by tokenizing the English text, to obtain the preprocessed sentence S'.
Step 1.3: use a phrase structure parser to parse sentence S' and generate the phrase structure syntax tree T.
Step 1.4: lemmatize the words in the phrase structure syntax tree T.
Step 2: on the basis of the phrase structure syntax tree, compute the hierarchy distance and the path distance between the ambiguous word and the other words in the sentence, and select the sense-related words, as follows.
Step 2.1: denote the ambiguous word to be disambiguated by w_t, any other word in the sentence by w, and the set of all content words in the sentence other than w_t by W.
Step 2.2: from the phrase structure syntax tree T, compute the hierarchy distance d_l between the ambiguous word w_t and each other word w, record d_l on w, and store it in W.
Step 2.3: from the phrase structure syntax tree T, compute the path distance d_p between the ambiguous word w_t and each other word w, record d_p on w, and store it in W.
Step 2.4: specify a hierarchy distance parameter d_layer and a path distance parameter d_path, select from W the words whose d_l is not greater than d_layer and whose d_p is not greater than d_path, and build the set R of sense-related words of the ambiguous word.
Step 3: construct the word sense disambiguation model, which determines the correct sense by evaluating how closely each sense of the ambiguous word is related to the sense-related words, as follows.
Step 3.1: for each word w in the sense-related word set R, compute its disambiguation weight from its hierarchy distance d_l and path distance d_p by formula (1).
$$weight(w) = \frac{\alpha^{d_l}}{1 + \beta \log_{10} d_p} \qquad (1)$$
where α and β are the adjustment parameters for the hierarchy distance d_l and the path distance d_p.
Step 3.2: for each sense s_i of the ambiguous word w_t, compute its closeness to the sense-related word set R by formula (2).
$$relatedness(s_i) = \sum_{w_j \in R} weight(w_j) \cdot \frac{wnss(s_i, w_j)}{\sum_{s_k \in sense(w_t)} wnss(s_k, w_j)} \qquad (2)$$
where s_i is the i-th sense of the ambiguous word w_t; sense(w_t) is the set of all senses of w_t, with s_i ∈ sense(w_t); w_j is the j-th sense-related word; R is the set of all sense-related words of w_t, with w_j ∈ R; weight(w_j) is the disambiguation weight of w_j computed by formula (1); and wnss(s_i, w_j) is the word sense relatedness between sense s_i and sense-related word w_j.
Step 3.3: from the closeness of each sense s_i to the set R obtained in step 3.2, select the sense with the highest closeness as the correct sense of the ambiguous word.
Step 4: using a sense-tagged corpus and a genetic algorithm, optimize the parameters of the word sense disambiguation model from step 3 to obtain the optimized model, as follows.
Step 4.1: select a suitable sense-tagged corpus Corpus.
Step 4.2: collect each ambiguous word in Corpus together with the sentence containing it and its correct sense tag, and build the training data set C_train for the word sense disambiguation model.
Step 4.3: take the hierarchy distance parameter d_layer and path distance parameter d_path from step 2.4 and the adjustment parameters α and β from step 3.1 as the input vector of the genetic algorithm, take formula (3) as its objective function, and run the optimization on C_train to obtain the optimal d_layer, d_path, α, and β.
(3)
where precision is the disambiguation accuracy, i.e. the ratio of the number of correctly disambiguated ambiguous words to the total number of ambiguous words.
Step 4.4: substitute the d_layer and d_path obtained in step 4.3 into step 2.4 and the α and β into formula (1), completing the parameter optimization of the word sense disambiguation model.
Step 5: for a word to be disambiguated, repeat steps 1 and 2 and use the optimized word sense disambiguation model obtained in step 4 to determine the correct sense of the ambiguous word, as follows.
Step 5.1: following step 1, generate the phrase structure syntax tree T of the sentence containing the word w_t to be disambiguated.
Step 5.2: following step 2, obtain the hierarchy distance and path distance between w_t and the other words in the sentence, select the sense-related words using the d_layer and d_path obtained in step 4, and build the sense-related word set R.
Step 5.3: using the α and β obtained in step 4, compute the disambiguation weight of each word in R as in step 3.1.
Step 5.4: as in step 3.2, determine the closeness of each sense s_i of the ambiguous word w_t to the set R.
Step 5.5: as in step 3.3, determine the correct sense of the ambiguous word w_t.
Through the above steps, the sense of an English ambiguous word is determined and the word sense disambiguation task is completed.
Beneficial effects
The English word sense disambiguation method proposed by the invention uses the phrase structure syntax tree as the basis for selecting the sense-related context words of an ambiguous word; assigns each sense-related word a disambiguation weight according to its hierarchy distance and path distance from the ambiguous word on the tree; and determines the correct sense from the closeness between each sense of the ambiguous word and the sense-related context words. Compared with existing English word sense disambiguation methods, it selects sense-related context words more accurately, gives them suitable disambiguation weights, and computes the closeness between the senses of the ambiguous word and the context more accurately. The method thereby avoids the inaccurate selection and weighting of sense-related words found in traditional methods, improves the accuracy of word sense relatedness computation, and improves the accuracy of English word sense disambiguation.
Brief description of the drawings
Fig. 1 is the phrase structure syntax tree of the sentence used in the embodiment of the invention.
Detailed description
The invention is described in further detail below with reference to a specific embodiment.
The ambiguous word coach in the sentence "⊙ The coaches' teaching football are standing on the bus@." is disambiguated.
According to the WordNet 3.0 dictionary, the senses of the ambiguous word coach are as shown in Table 1.
Table 1. Senses of coach#n
Sense number    Gloss
coach#n#1       coach, manager, handler -- ((sports) someone in charge of training an athlete or a team)
coach#n#2       coach, private instructor, tutor -- (a person who gives private instruction (as in singing, acting, etc.))
coach#n#3       passenger car, coach, carriage -- (a railcar where passengers ride)
coach#n#4       coach, four-in-hand, coach-and-four -- (a carriage pulled by four horses with one driver)
coach#n#5       bus, autobus, coach, charabanc, double-decker, jitney, motorbus, motorcoach, omnibus, passenger vehicle -- (a vehicle carrying many passengers; used for public transport; "he always rode the bus to work")
Here #n indicates that the part of speech is noun, and #1 to #5 are the sense numbers in WordNet 3.0.
Step 1: perform phrase structure parsing on the sentence to generate its phrase structure syntax tree, as follows.
Step 1.1: denote the sentence to be processed by the symbol S. In this example, S is "⊙ The coaches' teaching football are standing on the bus@.".
Step 1.2: preprocess sentence S, mainly by removing garbled characters and special symbols and by tokenizing the English text, to obtain the preprocessed sentence S'. In this example the result is "the coaches teaching football are standing on the bus.".
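A minimal sketch of the preprocessing in step 1.2 follows. The patent does not specify the cleaning rules, so the regular expression here is an assumption chosen only to reproduce the preprocessed sentence of this example; the function name `preprocess` is hypothetical.

```python
import re


def preprocess(sentence):
    """Lowercase the sentence and keep only runs of letters and periods as tokens."""
    # Anything else (for example the symbols "⊙" and "@" or the apostrophe) is dropped.
    return re.findall(r"[a-z]+|\.", sentence.lower())


tokens = preprocess("⊙ The coaches' teaching football are standing on the bus@.")
print(" ".join(tokens))
```

A real system would more likely rely on the tokenizer shipped with the parser; the sketch only makes the effect of the step concrete.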
Step 1.3: use a phrase structure parser to parse sentence S' and generate the phrase structure syntax tree T. This example uses the Stanford Parser provided by Stanford University with the englishPCFG.ser.gz language model; the resulting phrase structure syntax tree is shown in Fig. 1.
Step 1.4: lemmatize the words in the phrase structure syntax tree T. This example uses WordNet 3.0 and the MorphAdorner toolkit provided by Northwestern University (USA) for lemmatization; the words in Fig. 1 are reduced to: the, coach, teach, football, be, stand, on, the, bus.
Step 2: on the basis of the phrase structure syntax tree, compute the hierarchy distance and the path distance between the ambiguous word and the other words in the sentence, and select the sense-related words, as follows.
Step 2.1: denote the ambiguous word coach to be disambiguated by w_t, any other word in the sentence by w, and the set of all content words in the sentence other than w_t by W, namely {teach#n, football#n, stand#v, bus#n} (where #n denotes a noun and #v a verb).
Step 2.2: from the phrase structure syntax tree T, compute the hierarchy distance d_l between the ambiguous word coach and each other word w, record d_l on w, and store it in W. If the common parent node of coach and w in T is f, the hierarchy distance is the length of the path from coach to f minus 1. In this example, as shown in Fig. 1, the hierarchy distances from coach to teach, football, stand, and bus are 1, 1, 2, and 2 respectively.
Step 2.3: from the phrase structure syntax tree T, compute the path distance d_p between the ambiguous word coach and each other word w, record d_p on w, and store it in W. In this example, as shown in Fig. 1, the path distances from coach to teach, football, stand, and bus are 4, 4, 7, and 9 respectively.
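The two distances of steps 2.2 and 2.3 can be sketched as follows. Fig. 1 is not reproduced in this text, so the bracketing in `TREE` is an assumption, chosen because it yields exactly the distances reported above when both are counted in edges between leaf nodes; the helper names are hypothetical.

```python
# Hypothetical tree (assumption): consistent with the distances given in steps 2.2 and 2.3.
TREE = ("ROOT",
        ("S",
         ("NP", ("DT", "the"), ("NNS", "coach"), ("NN", "teach"), ("NN", "football")),
         ("VP", ("VBP", "be"),
          ("VP", ("VBG", "stand"),
           ("PP", ("IN", "on"), ("NP", ("DT", "the"), ("NN", "bus"))))),
         (".", ".")))


def leaf_paths(node, prefix=()):
    """Yield (word, path) pairs; path is the tuple of child indices from the root to the leaf."""
    for i, child in enumerate(node[1:]):
        if isinstance(child, str):
            yield child, prefix + (i,)
        else:
            yield from leaf_paths(child, prefix + (i,))


def distances(tree, target, other):
    """Return (hierarchy distance, path distance) between two leaf words."""
    paths = dict(leaf_paths(tree))
    a, b = paths[target], paths[other]
    common = 0  # depth of the lowest common ancestor
    while common < min(len(a), len(b)) and a[common] == b[common]:
        common += 1
    up = len(a) - common    # edges from the target leaf up to the common ancestor
    down = len(b) - common  # edges from the common ancestor down to the other leaf
    return up - 1, up + down


for word in ["teach", "football", "stand", "bus"]:
    print(word, distances(TREE, "coach", word))
```

Storing each leaf as a path of child indices keeps the common-ancestor search to a simple prefix comparison, which is enough for a single sentence.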
Step 2.4: specify a hierarchy distance parameter d_layer and a path distance parameter d_path, select from W the words whose d_l is not greater than d_layer and whose d_p is not greater than d_path, and build the set R of sense-related words of the ambiguous word. In this example, d_layer and d_path are set to 2 and 9, which gives the sense-related word set {teach#n, football#n, stand#v, bus#n} for coach.
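The filtering rule of step 2.4 is a pair of threshold tests. The sketch below applies it to the distances listed in steps 2.2 and 2.3; the function name `select_related` is an assumption.

```python
def select_related(distances, d_layer, d_path):
    """Keep words whose hierarchy distance <= d_layer and whose path distance <= d_path."""
    return [word for word, (d_l, d_p) in distances.items()
            if d_l <= d_layer and d_p <= d_path]


# (hierarchy distance, path distance) from coach, as given in steps 2.2 and 2.3.
distances = {"teach#n": (1, 4), "football#n": (1, 4), "stand#v": (2, 7), "bus#n": (2, 9)}

print(select_related(distances, d_layer=2, d_path=9))  # all four words pass
print(select_related(distances, d_layer=1, d_path=9))  # only teach#n and football#n pass
```

The second call uses a tighter, purely illustrative threshold to show how the parameters prune the candidate set.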
Step 3: construct the word sense disambiguation model, which determines the correct sense by evaluating how closely each sense of the ambiguous word is related to the sense-related words, as follows.
Step 3.1: for each word w in the sense-related word set R, compute its disambiguation weight from its hierarchy distance d_l and path distance d_p by formula (1).
$$weight(w) = \frac{\alpha^{d_l}}{1 + \beta \log_{10} d_p} \qquad (1)$$
where α and β are the adjustment parameters for the hierarchy distance d_l and the path distance d_p.
In this example, α and β are set to 1 and 0, which is equivalent to assigning every sense-related word a weight of 1.
Step 3.2: for each sense s_i of the ambiguous word w_t, compute its closeness to the sense-related word set R by formula (2).
$$relatedness(s_i) = \sum_{w_j \in R} weight(w_j) \cdot \frac{wnss(s_i, w_j)}{\sum_{s_k \in sense(w_t)} wnss(s_k, w_j)} \qquad (2)$$
where s_i is the i-th sense of the ambiguous word w_t; sense(w_t) is the set of all senses of w_t, with s_i ∈ sense(w_t); w_j is the j-th sense-related word; R is the set of all sense-related words of w_t, with w_j ∈ R; weight(w_j) is the disambiguation weight of w_j computed by formula (1); and wnss(s_i, w_j) is the word sense relatedness between sense s_i and sense-related word w_j.
In this example, the sense-related word set of the ambiguous word coach#n is R = {teach#n, football#n, stand#v, bus#n}. The relatedness between each sense of coach#n and each related word, i.e. the wnss value, must be computed first. wnss can be obtained with any of several similarity or relatedness tools; here the WordNet::Similarity toolkit written by Ted Pedersen is used, giving the values in Table 2.
Table 2. Word sense relatedness between the senses of coach#n and the related words
            teach#n              football#n          stand#v             bus#n
coach#n#1   0.0274664653923546   0.474638267730824   0.0794203349688148  0.0953982038879483
coach#n#2   0.0411270396042137   0.0636370034284592  0.125973809222455   0.105985587733038
coach#n#3   0.0441240510549878   0.109828009114997   0.118997168597431   0.165005388203732
coach#n#4   0.0395030928811857   0.118434570601007   0.116094035457169   0.31888473124512
coach#n#5   0.0563124527152087   0.113685514457318   0.113552132406334   0.999999999999987
The relatedness values in Table 2 were computed with the WordNet::Similarity::vector_pairs module.
To simplify the later computation, the sum of wnss over all senses of coach#n is first computed for each related word w_j. For teach#n, this sum is wnss(coach#n#1,teach#n)+wnss(coach#n#2,teach#n)+wnss(coach#n#3,teach#n)+wnss(coach#n#4,teach#n)+wnss(coach#n#5,teach#n) = 0.0274664653923546+0.0411270396042137+0.0441240510549878+0.0395030928811857+0.0563124527152087 = 0.20853310164795047.
Similarly, the sums for the other related words are:
football#n: 0.8802233653326053;
stand#v: 0.5540374806522037;
bus#n: 1.6852739110698254.
For sense coach#n#1, formula (2) gives relatedness(coach#n#1) = 1×0.0274664653923546/0.20853310164795047 + 1×0.474638267730824/0.8802233653326053 + 1×0.0794203349688148/0.5540374806522037 + 1×0.0953982038879483/1.6852739110698254 = 0.13171273613300974+0.5392247995501404+0.14334830718550395+0.05660694280100067 = 0.8708927856696547.
Similarly, for the other senses, formula (2) gives
relatedness(coach#n#2) = 0.5597805034534482;
relatedness(coach#n#3) = 0.6490573694718037;
relatedness(coach#n#4) = 0.7227439715647753;
relatedness(coach#n#5) = 1.197525369840318.
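The per-word sums and the value for coach#n#1 can be checked with a short script. The numbers are those of Table 2; the names `WNSS` and `column_totals` are assumptions made for this sketch, and every weight is taken as 1 as in this part of the example.

```python
# wnss values from Table 2; columns are teach#n, football#n, stand#v, bus#n.
WNSS = {
    "coach#n#1": [0.0274664653923546, 0.474638267730824, 0.0794203349688148, 0.0953982038879483],
    "coach#n#2": [0.0411270396042137, 0.0636370034284592, 0.125973809222455, 0.105985587733038],
    "coach#n#3": [0.0441240510549878, 0.109828009114997, 0.118997168597431, 0.165005388203732],
    "coach#n#4": [0.0395030928811857, 0.118434570601007, 0.116094035457169, 0.31888473124512],
    "coach#n#5": [0.0563124527152087, 0.113685514457318, 0.113552132406334, 0.999999999999987],
}


def column_totals(wnss):
    """Sum wnss over all senses for each related word (the inner sums described above)."""
    width = len(next(iter(wnss.values())))
    return [sum(row[j] for row in wnss.values()) for j in range(width)]


totals = column_totals(WNSS)
shares = [value / total for value, total in zip(WNSS["coach#n#1"], totals)]
print(totals)
print(sum(shares))  # relatedness(coach#n#1) when every weight is 1
```

Dividing by the per-word sum puts each related word on the same scale, so a word such as bus#n with one very large wnss value does not dominate purely through magnitude.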
Step 3.3: from the closeness of each sense s_i to the set R obtained in step 3.2, select the sense with the highest closeness as the correct sense of the ambiguous word.
In this example, the relatedness values computed in step 3.2 are compared and coach#n#5, which has the largest value, is selected as the correct sense of the ambiguous word. (In fact coach#n#5 is the wrong sense; the subsequent steps correct this misjudgment by optimizing the model parameters.)
Step 4: using a sense-tagged corpus and a genetic algorithm, optimize the parameters of the word sense disambiguation model from step 3 to obtain the optimized model, as follows.
Step 4.1: select a suitable sense-tagged corpus Corpus. Any kind of sense-tagged corpus may be used in practice. This example uses part of the annotated data in the ReutersBNC corpus provided by Diana McCarthy and Rob Koeling.
Step 4.2: collect each ambiguous word in Corpus together with the sentence containing it and its correct sense tag, and build the training data set C_train for the word sense disambiguation model. In this example, the ReutersBNC data selected in step 4.1 can be used directly as the training data set. For other tagged corpora, simple text conversion is enough to build the training data set.
Step 4.3: take the hierarchy distance parameter d_layer and path distance parameter d_path from step 2.4 and the adjustment parameters α and β from step 3.1 as the input vector of the genetic algorithm, take formula (3) as its objective function, and run the optimization on C_train to obtain the optimal d_layer, d_path, α, and β.
(3)
where precision is the disambiguation accuracy, i.e. the ratio of the number of correctly disambiguated ambiguous words to the total number of ambiguous words.
In this example, the optimal parameters are obtained with the Genetic Algorithm solver provided by the Optimization Tool of Matlab, using Matlab's default settings. After training, the four parameters d_layer, d_path, α, and β are optimized to 3, 10, 0.5, and 1.2 respectively.
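The parameter search of step 4.3 can be sketched with a small hand-written genetic algorithm. This is not the Matlab solver used in the example: the operators (tournament selection, uniform crossover, Gaussian mutation, elitism), all function names, the bounds, and the toy fitness are assumptions. In a real run the fitness would be the precision of the model on C_train, and d_layer and d_path would be rounded to integers.

```python
import random


def precision(predicted, gold):
    """Fraction of ambiguous words whose predicted sense matches the gold sense."""
    return sum(1 for p, g in zip(predicted, gold) if p == g) / len(gold)


def genetic_search(fitness, bounds, pop_size=20, generations=30, seed=0):
    """Maximise `fitness` over real-valued vectors inside `bounds`.

    Returns the best vector found and the best fitness seen at each generation.
    """
    rng = random.Random(seed)

    def mutate(ind):
        # Perturb each gene with probability 0.2, then clip it back into its bounds.
        out = []
        for gene, (lo, hi) in zip(ind, bounds):
            if rng.random() < 0.2:
                gene += rng.gauss(0, 0.1 * (hi - lo))
            out.append(min(hi, max(lo, gene)))
        return out

    def tournament(scored):
        # Return the fitter of two randomly chosen individuals.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]
        best_score, best = max(scored, key=lambda pair: pair[0])
        history.append(best_score)
        children = [best]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            children.append(mutate(child))
        population = children
    best_score, best = max(((fitness(ind), ind) for ind in population),
                           key=lambda pair: pair[0])
    history.append(best_score)
    return best, history


# Toy stand-in for precision on C_train (assumption): a smooth peak at an arbitrary point.
PEAK = [2.0, 8.0, 0.6, 1.0]
BOUNDS = [(1, 5), (1, 15), (0.0, 1.0), (0.0, 2.0)]  # d_layer, d_path, alpha, beta
best, history = genetic_search(
    lambda params: -sum((p - t) ** 2 for p, t in zip(params, PEAK)), BOUNDS)
print(best, history[-1])
```

Because the best individual is copied unchanged into each new generation, the best fitness never decreases from one generation to the next.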
Step 4.4: substitute the d_layer and d_path obtained in step 4.3 into step 2.4 and the α and β into formula (1), completing the parameter optimization of the word sense disambiguation model.
In this example, words whose hierarchy distance is not greater than 3 and whose path distance is not greater than 10 are taken as the sense-related words of the ambiguous word. Formula (1) is rewritten as formula (4):
$$weight(w) = \frac{0.5^{d_l}}{1 + 1.2 \log_{10} d_p} \qquad (4)$$
where α and β in formula (1) have been optimized to 0.5 and 1.2 respectively.
Step 5: for a word to be disambiguated, repeat steps 1 and 2 and use the optimized word sense disambiguation model obtained in step 4 to determine the correct sense of the ambiguous word, as follows.
In this embodiment, the ambiguous word coach in the sentence "⊙ The coaches' teaching football are standing on the bus@." is again disambiguated.
Step 5.1: following step 1, generate the phrase structure syntax tree T of the sentence containing the word w_t to be disambiguated. In this example, the phrase structure syntax tree is as shown in Fig. 1.
Step 5.2: following step 2, obtain the hierarchy distance and path distance between w_t and the other words in the sentence, select the sense-related words using the d_layer and d_path obtained in step 4, and build the sense-related word set R. In this example, from the phrase structure syntax tree of Fig. 1, the hierarchy distances from coach to teach, football, stand, and bus are 1, 1, 2, and 2, and the path distances are 4, 4, 7, and 9. After the optimization in step 4, d_layer and d_path are 3 and 10; the hierarchy and path distances of teach, football, stand, and bus from coach all satisfy the conditions, so the set built is R = {teach#n, football#n, stand#v, bus#n}.
Step 5.3: using the optimal parameters obtained in step 4, compute the weight of each word in the sense-related word set R as in step 3.1. In this example, by formula (4) and according to their hierarchy and path distances, the disambiguation weights of teach#n, football#n, stand#v, and bus#n are 0.2902804823653377, 0.2902804823653377, 0.12412383171664482, and 0.11654517159405858 respectively.
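The weights of step 5.3 can be reproduced with the sketch below. The closed form is an assumption: alpha**d_l / (1 + beta * log10(d_p)) is a simple expression that matches the four weights listed above for α = 0.5, β = 1.2 and gives a weight of 1 for α = 1, β = 0. The function name is hypothetical.

```python
import math


def disambiguation_weight(d_l, d_p, alpha, beta):
    """Assumed form of formula (1): alpha**d_l / (1 + beta * log10(d_p))."""
    return alpha ** d_l / (1 + beta * math.log10(d_p))


# (hierarchy distance, path distance) from coach, as given in step 5.2.
distances = {"teach#n": (1, 4), "football#n": (1, 4), "stand#v": (2, 7), "bus#n": (2, 9)}

weights = {word: disambiguation_weight(d_l, d_p, alpha=0.5, beta=1.2)
           for word, (d_l, d_p) in distances.items()}
for word, value in weights.items():
    print(word, value)
```

Under this form the weight falls geometrically with hierarchy distance and only logarithmically with path distance, so words sharing a close ancestor with the ambiguous word count the most.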
Step 5.4: as in step 3.2, determine the closeness of each sense s_i of the ambiguous word w_t to the set R. In this example, for sense coach#n#1, formula (2) gives relatedness(coach#n#1) = 0.2902804823653377×0.13171273613300974 + 0.2902804823653377×0.5392247995501404 + 0.12412383171664482×0.14334830718550395 + 0.11654517159405858×0.05660694280100067
= 0.03823363657834851+0.1565264349167673+0.0177929411579594+0.006597265862157682 = 0.2191502785152329.
Similarly,
relatedness(coach#n#2) = 0.11378754409746956;
relatedness(coach#n#3) = 0.13571081450099737;
relatedness(coach#n#4) = 0.1421077906515997;
relatedness(coach#n#5) = 0.21047354027607934.
Step 5.5: as in step 3.3, determine the correct sense of the ambiguous word w_t. In this example, the relatedness values of the senses of coach obtained in step 5.4 are compared, and coach#n#1, which has the largest value, is selected as the correct sense.
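Steps 5.4 and 5.5 can be checked end to end with the sketch below, which applies the weighted, normalized sum to the wnss values of Table 2 and takes the arg max. The weights are copied from step 5.3; the names `WNSS` and `relatedness` are assumptions made for this sketch.

```python
# wnss values from Table 2; columns are teach#n, football#n, stand#v, bus#n.
WNSS = {
    "coach#n#1": [0.0274664653923546, 0.474638267730824, 0.0794203349688148, 0.0953982038879483],
    "coach#n#2": [0.0411270396042137, 0.0636370034284592, 0.125973809222455, 0.105985587733038],
    "coach#n#3": [0.0441240510549878, 0.109828009114997, 0.118997168597431, 0.165005388203732],
    "coach#n#4": [0.0395030928811857, 0.118434570601007, 0.116094035457169, 0.31888473124512],
    "coach#n#5": [0.0563124527152087, 0.113685514457318, 0.113552132406334, 0.999999999999987],
}


def relatedness(wnss, weights):
    """For each sense, sum weight * (wnss / per-word total of wnss) over the related words."""
    totals = [sum(row[j] for row in wnss.values()) for j in range(len(weights))]
    return {sense: sum(w * v / t for w, v, t in zip(weights, row, totals))
            for sense, row in wnss.items()}


uniform = relatedness(WNSS, [1, 1, 1, 1])
optimized = relatedness(WNSS, [0.2902804823653377, 0.2902804823653377,
                               0.12412383171664482, 0.11654517159405858])
print(max(uniform, key=uniform.get))      # sense chosen with equal weights
print(max(optimized, key=optimized.get))  # sense chosen with the optimized weights
```

The only difference between the two calls is the weight vector, which is what moves the decision from coach#n#5 to coach#n#1 in this example.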
Through the above steps, the sense of an English ambiguous word is determined and the word sense disambiguation task is completed.

Claims (1)

1. the English Word sense disambiguation method based on phrase structure syntax tree, it is characterised in that: its concrete operation step is:
Step one, by sentence is carried out phrase structure syntactic analysis, generate its phrase structure syntax tree; Particularly as follows:
Step 1.1: represent pending sentence with symbol S;
Step 1.2: sentence S is carried out pretreatment, mainly includes removing mess code character, special symbol, English hyphenation (Tokenization) etc., it is thus achieved that pretreated sentence S ';
Step 1.3: use phrase structure parser, carries out phrase structure syntactic analysis to sentence S ', generates phrase structure syntax tree T;
Step 1.4: the word in phrase structure syntax tree T is carried out lemmatization;
Step 2, with phrase structure syntax tree for foundation, calculate hierarchy distance and the path distance of other word in ambiguity word and sentence, filter out meaning of a word related term; Particularly as follows:
Step 2.1: use symbol wtRepresent the ambiguity word treating disambiguation, represent other word in sentence with symbol w, represent in sentence except ambiguity word w with symbol WtOutside the set of whole notional words;
Step 2.2: by phrase structure syntax tree T, adds up ambiguity word wtHierarchy distance d with other word wl, by dlCharge to w, and be saved in W;
Step 2.3: by phrase structure syntax tree T, adds up ambiguity word wtPath distance d with other word wp, by dpCharge to w, and be saved in W;
Step 2.4: specify level distance parameter d_layer and path distance parameter d_path, screens d from WlIt is not more than d_layer and dpIt is not more than the word of d_path, builds the meaning of a word related term set R of ambiguity word;
Step 3, structure the Model of Word Sense Disambiguation, judge the correct meaning of a word by the level of intimate of each meaning of a word of assessment ambiguity word and meaning of a word related term; Particularly as follows:
Step 3.1: for each word w in meaning of a word related term set R, according to its hierarchy distance dlWith path distance dp, formula (1) calculate its disambiguation weight;
(1)
Wherein, α and β is hierarchy distance dlWith path distance dpAdjustment parameter;
Step 3.2: for ambiguity word wtEach meaning of a word si, formula (2) calculate the level of intimate of its word set R relevant to the meaning of a word;
(2)
Wherein, siRepresent ambiguity word wtThe i-th meaning of a word, sense (wt) represent ambiguity word wtThe set of whole meaning of a word, si∈sense(wt), wjRepresenting jth meaning of a word related term, R represents ambiguity word wtThe set of whole meaning of a word related terms, wj∈ R, weight (wj) represent by formula (1) calculated wjDisambiguation weight, wnss (si,wj) represent meaning of a word siWith meaning of a word related term wjMeaning of a word degree of association;
Step 3.3: according to by each meaning of a word s of step 3.2 gainediThe level of intimate of word set R relevant to the meaning of a word, selects the highest meaning of a word of level of intimate as the correct meaning of a word of ambiguity word;
Step 4, by word sense annotated corpus, utilize genetic algorithm, the parameter of the Model of Word Sense Disambiguation in step 3 be optimized, it is thus achieved that the Model of Word Sense Disambiguation of optimization; Particularly as follows:
Step 4.1: select suitable word sense annotated corpus Corpus;
Step 4.2: collect each ambiguity word in corpus Corpus, the sentence at place and correct word sense tagging, build the Model of Word Sense Disambiguation training dataset Ctrain;
Step 4.3: using the hierarchy distance parameter d_layer in step 2.4 and 3.1, path distance parameter d_path and regulate parameter alpha, β as the input vector of genetic algorithm, using the formula (3) object function as genetic algorithm, at CtrainOn be optimized training, it is thus achieved that optimum d_layer, d_path, α, β parameter;
(3)
Wherein, precision is disambiguation accuracy, and its value is the quantity ratio with ambiguity word sum of the ambiguity word of correct disambiguation;
Step 4.4: d_layer, d_path step 4.3 obtained substitutes into step 2.4, substitutes into formula (1) by α, β, completes the parameter optimization of the Model of Word Sense Disambiguation;
Step 5: for the word to be disambiguated, repeat steps 1 and 2, and use the optimized word sense disambiguation model obtained in step 4 to determine the correct sense of the ambiguous word; specifically:
Step 5.1: according to step 1, generate the phrase structure syntax tree T of the sentence containing the word to be disambiguated, w_t;
Step 5.2: according to step 2, obtain the hierarchy distance and path distance between w_t and every other word in the sentence, screen the sense-related words according to the d_layer and d_path parameters obtained in step 4, and build the sense-related word set R;
Step 5.3: using the α and β parameters obtained in step 4, calculate the disambiguation weight of each sense-related word in the set R as in step 3.1;
Step 5.4: as in step 3.2, determine the degree of relatedness between each sense s_i of the ambiguous word w_t and the sense-related word set R;
Step 5.5: as in step 3.3, determine the correct sense of the ambiguous word w_t;
Through the above steps, the sense of an ambiguous English word can be determined, completing the word sense disambiguation task.
CN201610011045.8A 2016-01-10 2016-01-10 English word sense disambiguation method based on phrase structure syntax tree Pending CN105677639A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610011045.8A CN105677639A (en) 2016-01-10 2016-01-10 English word sense disambiguation method based on phrase structure syntax tree

Publications (1)

Publication Number Publication Date
CN105677639A true CN105677639A (en) 2016-06-15

Family

ID=56299412

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610011045.8A Pending CN105677639A (en) 2016-01-10 2016-01-10 English word sense disambiguation method based on phrase structure syntax tree

Country Status (1)

Country Link
CN (1) CN105677639A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101510221A (en) * 2009-02-17 2009-08-19 北京大学 Enquiry statement analytical method and system for information retrieval

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
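HEYAN HUANG et al.: "Knowledge-based Word Sense Disambiguation with Feature Words Based on Dependency Relation and Syntax Tree", International Journal of Advancements in Computing Technology *
Lang Qianyu et al.: "Application of an Electric Power English Corpus in Electric Power Studies", 《学理论》 *
Lu Wenpeng: "Research on Word Sense Disambiguation Methods Based on Dependency and Domain Knowledge", China Doctoral Dissertations Full-text Database, Information Science and Technology *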

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106126501A (en) * 2016-06-29 2016-11-16 齐鲁工业大学 A kind of noun Word sense disambiguation method based on interdependent constraint and knowledge and device
CN106126501B (en) * 2016-06-29 2019-02-19 齐鲁工业大学 A kind of noun Word sense disambiguation method and device based on interdependent constraint and knowledge
CN108804529A (en) * 2018-05-02 2018-11-13 深圳智能思创科技有限公司 A kind of question answering system implementation method based on Web
CN110008310A (en) * 2019-04-04 2019-07-12 北京神州泰岳软件股份有限公司 A kind of content search method and device
CN110333990A (en) * 2019-05-29 2019-10-15 阿里巴巴集团控股有限公司 Data processing method and device
CN110333990B (en) * 2019-05-29 2023-06-27 创新先进技术有限公司 Data processing method and device
CN111079429A (en) * 2019-10-15 2020-04-28 平安科技(深圳)有限公司 Entity disambiguation method and device based on intention recognition model and computer equipment
CN111079429B (en) * 2019-10-15 2022-03-18 平安科技(深圳)有限公司 Entity disambiguation method and device based on intention recognition model and computer equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160615