CN105718443A - Adjective word sense disambiguation method based on dependency vocabulary association degree - Google Patents


Info

Publication number
CN105718443A
Authority
CN
China
Prior art keywords
word
interdependent
meaning
dependency
adjective
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610048601.9A
Other languages
Chinese (zh)
Inventor
鹿文鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Original Assignee
Qilu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology filed Critical Qilu University of Technology
Priority to CN201610048601.9A priority Critical patent/CN105718443A/en
Publication of CN105718443A publication Critical patent/CN105718443A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/237Lexical tools
    • G06F40/242Dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to an adjective word sense disambiguation method based on dependency vocabulary association degree and belongs to the technical field of natural language processing. The method comprises the following steps: first, according to a semantic dictionary, the synonyms, near-synonyms and antonyms of each sense of a target ambiguous adjective are collected, and a related word set is built for each sense; second, the sentence containing the target ambiguous word is dependency-parsed, the adjectival-modifier and adverbial-modifier dependency tuples containing the target ambiguous word are collected, and the corresponding dependency co-occurrence words are extracted; third, a large-scale corpus is dependency-parsed, the dependency co-occurrence word pairs in it are collected, and a dependency co-occurrence word pair database DB is built; fourth, according to the DB, the dependency vocabulary association degree of each sense of the target ambiguous word is calculated; fifth, the sense with the largest overall dependency vocabulary association degree is judged to be the correct sense. Compared with the prior art, dependency co-occurrence words can be selected accurately and interference from noise words is avoided; the dependency co-occurrence word pair database can be built automatically with no manual assistance; and the adjective word sense disambiguation effect can be improved.

Description

An adjective word sense disambiguation method based on dependency vocabulary association degree
Technical field
The present invention relates to an adjective word sense disambiguation method, in particular to an adjective word sense disambiguation method based on dependency vocabulary association degree, and belongs to the technical field of natural language processing.
Background
Polysemy is ubiquitous in natural language. Word sense disambiguation means automatically determining the sense of a polysemous word from the context in which it occurs. It is foundational research in natural language processing and directly affects machine translation, information retrieval, information extraction, sentiment analysis, public opinion monitoring and similar tasks.
Word sense disambiguation methods can be divided into supervised methods, unsupervised methods and knowledge-based methods. Supervised methods use a word sense classifier to decide the sense; unsupervised methods classify senses mainly by clustering the context words of the ambiguous word; knowledge-based methods use a knowledge base to decide the sense of the ambiguous word according to its context. Supervised methods need a large sense-annotated corpus to train the classifier, which severely limits their range of application. Unsupervised methods are essentially word sense discrimination methods and cannot really be applied to large-scale word sense disambiguation tasks. Knowledge-based methods need large knowledge bases, and the quality of the knowledge base directly affects their disambiguation ability. Of the three, knowledge-based methods are currently the only ones that can really be applied to large-scale word sense disambiguation tasks.
Knowledge-based methods combine the context of the ambiguous word with a knowledge base to decide its sense. Existing methods have three weaknesses. They usually select the context with a sliding window, which inevitably introduces unrelated noise words. The knowledge bases they use are usually built manually, which is costly and hard to extend. They often do not distinguish the part of speech of the ambiguous word, and so fail to exploit the particular characteristics of ambiguous words of different parts of speech.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art, chiefly to solve the word sense disambiguation problem for adjectives, by proposing an adjective word sense disambiguation method based on dependency vocabulary association degree.
The object of the invention is achieved through the following technical solution.
An adjective word sense disambiguation method based on dependency vocabulary association degree, the specific operating steps of which are as follows.
Step 1: according to a semantic dictionary, collect the synonyms, near-synonyms and antonyms of each sense s_i of the target ambiguous adjective w_t, and build the related word set W_si of the corresponding sense. Specifically:
Step 1.1: according to WordNet, take the synonym set of sense concept s_i.
Step 1.2: according to WordNet, take the near-synonym set of sense concept s_i.
Step 1.3: according to WordNet, take the antonym set of sense concept s_i.
Step 1.4: take the union of the synonym set, near-synonym set and antonym set obtained in steps 1.1 to 1.3 to build the related word set W_si of the corresponding sense.
Step 2: dependency-parse the sentence containing the target ambiguous word, collect the adjectival-modifier (amod) and adverbial-modifier (advmod) dependency tuples that contain the target ambiguous word, and extract the corresponding dependency co-occurrence words w_amod and w_advmod. Specifically:
Step 2.1: use a dependency parsing tool to parse the sentence containing the target ambiguous word and obtain its dependency tuple set.
Step 2.2: from the dependency tuple set obtained in step 2.1, extract the adjectival-modifier and adverbial-modifier dependency tuples that contain the target ambiguous word.
Step 2.3: from the dependency tuples obtained in step 2.2, extract the dependency co-occurrence content words w_amod and w_advmod of the ambiguous word.
Step 3: dependency-parse a large-scale corpus, collect the dependency co-occurrence word pairs in it, and build the dependency co-occurrence word pair database DB. Specifically:
Step 3.1: use a dependency parsing tool to parse the large-scale text corpus and obtain its dependency tuple set DSet.
Step 3.2: discard the dependency relation type of each tuple in DSet, count the dependency co-occurrence word pairs, and build the dependency co-occurrence word pair database DB.
Step 4: according to DB, calculate the dependency vocabulary association degree of each sense of the target ambiguous word. Specifically:
Step 4.1: for each related word w_si in the related word set W_si of sense s_i, use formula (1) to calculate its dependency vocabulary association degree with w_amod and w_advmod, i.e. relatedness(w_amod, w_si) and relatedness(w_si, w_advmod).
relatedness(w_1, w_2) = LLR(w_1, w_2) = 2[LogL(p_1, a, a+b) + LogL(p_2, c, c+d) - LogL(p, a, a+b) - LogL(p, c, c+d)]    (1)
where
a = freq(w_1, w_2) is the number of dependency tuples whose governor is w_1 and whose dependent is w_2;
b = freq(w_1, *) - a is the number of dependency tuples whose governor is w_1 but whose dependent is not w_2;
c = freq(*, w_2) - a is the number of dependency tuples whose dependent is w_2 but whose governor is not w_1;
d = N - a - b - c is the number of dependency tuples whose governor is not w_1 and whose dependent is not w_2;
N is the total number of dependency tuples in the corpus.
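Formula (1) can be sketched in code as follows. The text above does not define p_1, p_2, p or LogL, so this sketch assumes the standard log-likelihood ratio formulation: LogL(p, k, n) = k ln p + (n - k) ln(1 - p), p_1 = a/(a+b), p_2 = c/(c+d) and p = (a+c)/N. The function names `log_l` and `llr` are illustrative and do not come from the patent.

```python
import math


def log_l(p: float, k: float, n: float) -> float:
    """Binomial log-likelihood k*ln(p) + (n-k)*ln(1-p); 0*ln(0) is treated as 0."""
    total = 0.0
    if k > 0:
        total += k * math.log(p)
    if n - k > 0:
        total += (n - k) * math.log(1.0 - p)
    return total


def llr(a: int, b: int, c: int, d: int) -> float:
    """Formula (1) under the assumed definitions of p_1, p_2 and p."""
    if a + b == 0 or c + d == 0:
        return 0.0  # one row of the contingency table is empty: no evidence
    p1 = a / (a + b)
    p2 = c / (c + d)
    p = (a + c) / (a + b + c + d)
    return 2.0 * (
        log_l(p1, a, a + b)
        + log_l(p2, c, c + d)
        - log_l(p, a, a + b)
        - log_l(p, c, c + d)
    )
```

Under these assumptions the score is 0 when the two words co-occur exactly as often as chance predicts (p_1 = p_2 = p) and grows as the observed counts depart from that.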
Step 4.2: by formula (2), calculate the overall dependency vocabulary association degree between sense s_i and the dependency co-occurrence words w_amod and w_advmod.
relatedness(s_i) = relatedness(w_amod, W_si) + relatedness(W_si, w_advmod)    (2)
where
W_si is the related word set of sense s_i obtained in step 1.
Step 5: judge the sense with the largest overall dependency vocabulary association degree to be the correct sense. Specifically:
Compare the overall dependency vocabulary association degrees of the senses obtained in step 4.2, and judge the sense with the largest value to be the correct sense of the ambiguous word.
After the above steps, the sense of the ambiguous adjective is determined and the word sense disambiguation task is complete.
Beneficial effects
The adjective word sense disambiguation method based on dependency vocabulary association degree proposed by the present invention uses dependency parsing to obtain the dependency co-occurrence words of an adjective, and calculates the dependency vocabulary association degree of each sense from an automatically built dependency co-occurrence word pair database, thereby deciding the correct sense of the adjective. Compared with traditional word sense disambiguation methods, the proposed method selects dependency co-occurrence words more accurately according to the characteristics of adjectives and effectively avoids interference from irrelevant noise words. It builds the dependency co-occurrence word pair database automatically, without any manual assistance, so the database is easy to extend. The proposed method can improve the effect of adjective word sense disambiguation.
Detailed description of the invention
The specific embodiment of the present invention is described in further detail below with reference to an example.
For the sentence "The large number of mentally ill people tend to commit suicide in most developed countries.", the ambiguous adjectives ill and developed are disambiguated.
According to the WordNet 3.0 dictionary, the senses of the ambiguous adjectives ill and developed are as shown in Table 1 and Table 2.
Table 1. Senses of the adjective ill
Sense number | Sense explanation
ill#a#1 | ill, sick -- (affected by an impairment of normal physical or mental function; "ill from the monotony of his suffering")
ill#a#2 | ill -- (resulting in suffering or adversity; "ill effects"; "it's an ill wind that blows no good")
ill#a#3 | ill -- (distressing; "ill manners"; "of ill repute")
ill#a#4 | ill -- (indicating hostility or enmity; "you certainly did me an ill turn"; "ill feelings"; "ill will")
ill#a#5 | ill, inauspicious, ominous -- (presaging ill fortune; "ill omens"; "ill predictions"; "a by-election at a time highly unpropitious for the government")
Here #a indicates that the part of speech is adjective, and #1 to #5 are the sense numbers.
Table 2. Senses of the adjective developed
Sense number | Sense explanation
developed#a#1 | developed -- (being changed over time so as to be e.g. stronger or more complete or more useful; "they have very small limbs with only two fully developed toes on each")
developed#a#2 | developed, highly-developed -- ((used of societies) having high industrial development; "developed countries")
developed#a#3 | developed -- ((of real estate) made more useful and profitable as by building or laying out roads; "condominiums were built on the developed site")
Here #a indicates that the part of speech is adjective, and #1 to #3 are the sense numbers.
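The sense identifiers used in the tables (word, part-of-speech tag, sense number, joined by #) can be split mechanically. The sketch below is only an illustration of that notation; `SenseKey` and `parse_sense_key` are hypothetical names, not part of the patent or of WordNet's API.

```python
from typing import NamedTuple


class SenseKey(NamedTuple):
    lemma: str   # the ambiguous word, e.g. "ill"
    pos: str     # part-of-speech tag, "a" for adjective
    number: int  # sense number within that part of speech


def parse_sense_key(key: str) -> SenseKey:
    """Split an identifier such as 'developed#a#2' into its three parts."""
    # Split from the right so the lemma is kept whole.
    lemma, pos, number = key.rsplit("#", 2)
    return SenseKey(lemma, pos, int(number))
```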
Step 1: according to a semantic dictionary, collect the synonyms, near-synonyms and antonyms of each sense s_i of the target ambiguous adjective w_t, and build the related word set W_si of the corresponding sense. Specifically:
Step 1.1: according to WordNet, take the synonym set of sense concept s_i.
In this example, the synonyms that WordNet gives for each sense of ill and developed are shown in Table 3 and Table 4.
Step 1.2: according to WordNet, take the near-synonym set of sense concept s_i.
In this example, the near-synonyms that WordNet gives for each sense of ill and developed are shown in Table 3 and Table 4.
Step 1.3: according to WordNet, take the antonym set of sense concept s_i.
In this example, the antonyms that WordNet gives for each sense of ill and developed are shown in Table 3 and Table 4.
Step 1.4: take the union of the synonym set, near-synonym set and antonym set obtained in steps 1.1 to 1.3 to build the related word set W_si of the corresponding sense.
In this example, the related word sets of each sense of ill and developed are shown in Table 5 and Table 6.
Table 3. Related words of each sense of the adjective ill
Sense number | Synonyms | Near-synonyms | Antonyms
ill#a#1 | sick | afflicted, stricken, aguish, ailing, indisposed, peaked, poorly, sickly, unwell, seedy, airsick, carsick, seasick, autistic, bedfast, bedridden, bedrid, sick-abed, bilious, liverish, livery, bronchitic, consumptive, convalescent, recovering, delirious, hallucinating, diabetic, dizzy, giddy, woozy, vertiginous, dyspeptic, faint, light, swooning, light-headed, lightheaded, feverish, feverous, funny, gouty, green, milk-sick, nauseated, nauseous, queasy, sickish, palsied, paralytic, paralyzed, paraplegic, rickety, rachitic, scrofulous, sneezy, spastic, tubercular, tuberculous, unhealed, upset | well
ill#a#2 | - | harmful | -
ill#a#3 | - | bad | -
ill#a#4 | - | hostile | -
ill#a#5 | inauspicious, ominous | unpropitious | -
Here #a indicates that the part of speech is adjective, and #1 to #5 are the sense numbers.
Table 4. Related words of each sense of the adjective developed
Sense number | Synonyms | Near-synonyms | Antonyms
developed#a#1 | - | formed, formulated, mature, matured | undeveloped
developed#a#2 | highly-developed | industrial | -
developed#a#3 | - | improved | -
Here #a indicates that the part of speech is adjective, and #1 to #3 are the sense numbers.
Table 5. Related word set of each sense of the adjective ill
Sense number | Related word set
ill#a#1 | sick, afflicted, stricken, aguish, ailing, indisposed, peaked, poorly, sickly, unwell, seedy, airsick, carsick, seasick, autistic, bedfast, bedridden, bedrid, sick-abed, bilious, liverish, livery, bronchitic, consumptive, convalescent, recovering, delirious, hallucinating, diabetic, dizzy, giddy, woozy, vertiginous, dyspeptic, faint, light, swooning, light-headed, lightheaded, feverish, feverous, funny, gouty, green, milk-sick, nauseated, nauseous, queasy, sickish, palsied, paralytic, paralyzed, paraplegic, rickety, rachitic, scrofulous, sneezy, spastic, tubercular, tuberculous, unhealed, upset, well
ill#a#2 | harmful
ill#a#3 | bad
ill#a#4 | hostile
ill#a#5 | inauspicious, ominous, unpropitious
Here #a indicates that the part of speech is adjective, and #1 to #5 are the sense numbers.
Table 6. Related word set of each sense of the adjective developed
Sense number | Related word set
developed#a#1 | formed, formulated, mature, matured, undeveloped
developed#a#2 | highly-developed, industrial
developed#a#3 | improved
Here #a indicates that the part of speech is adjective, and #1 to #3 are the sense numbers.
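Step 1.4 amounts to a set union per sense. The following minimal sketch reproduces Table 6 from the three columns of Table 4; the words are typed in by hand rather than read from WordNet, and the assignment of words to the synonym, near-synonym and antonym columns is an assumption based on how Table 4 is read here. `build_related_set` is a hypothetical helper name.

```python
def build_related_set(synonyms, near_synonyms, antonyms):
    """Step 1.4: union of the three word lists for one sense."""
    return set(synonyms) | set(near_synonyms) | set(antonyms)


# (synonyms, near-synonyms, antonyms) per sense, as read from Table 4.
developed_senses = {
    "developed#a#1": ([], ["formed", "formulated", "mature", "matured"], ["undeveloped"]),
    "developed#a#2": (["highly-developed"], ["industrial"], []),
    "developed#a#3": ([], ["improved"], []),
}

related_sets = {
    sense: build_related_set(*columns)
    for sense, columns in developed_senses.items()
}
```

Using a set means a word that appears in more than one column is counted only once in W_si.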
Step 2: dependency-parse the sentence containing the target ambiguous word, collect the adjectival-modifier (amod) and adverbial-modifier (advmod) dependency tuples that contain the target ambiguous word, and extract the corresponding dependency co-occurrence words w_amod and w_advmod. Specifically:
Step 2.1: use a dependency parsing tool to parse the sentence containing the target ambiguous word and obtain its dependency tuple set.
In this example, the Stanford Parser provided by Stanford University is used with the englishPCFG.ser.gz language model, and WordNet 3.0 is used for lemmatization. The dependency tuple set obtained for the sentence is: det(number-3, the-1), amod(number-3, large-2), nsubj(tend-8, number-3), xsubj(commit-10, number-3), advmod(ill-6, mentally-5), amod(people-7, ill-6), prep_of(number-3, people-7), aux(commit-10, to-9), xcomp(tend-8, commit-10), dobj(commit-10, suicide-11), advmod(developed-14, most-13), amod(country-15, developed-14), prep_in(suicide-11, country-15).
Step 2.2: from the dependency tuple set obtained in step 2.1, extract the adjectival-modifier and adverbial-modifier dependency tuples that contain the target ambiguous word.
In this example, amod(people-7, ill-6) and advmod(ill-6, mentally-5) are extracted for the ambiguous word ill; amod(country-15, developed-14) and advmod(developed-14, most-13) are extracted for the ambiguous word developed.
Step 2.3: from the dependency tuples obtained in step 2.2, extract the dependency co-occurrence content words w_amod and w_advmod of the ambiguous word.
In this example, for the ambiguous word ill, w_amod is people and w_advmod is mentally; for the ambiguous word developed, w_amod is country and w_advmod is most.
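Steps 2.2 and 2.3 can be illustrated on the parser output listed in step 2.1. This sketch does not call the Stanford Parser; it simply parses the printed tuple strings with a regular expression. It also matches the target by its lemma rather than by token position, which is a simplification that assumes the target occurs once in the sentence. `parse_tuples` and `cooccurrence_words` are hypothetical names.

```python
import re

# Matches e.g. "amod(people-7, ill-6)" and captures relation, governor, dependent.
TUPLE_RE = re.compile(r"(\w+)\(([^\s,()]+)-\d+,\s*([^\s,()]+)-\d+\)")


def parse_tuples(text):
    """Return (relation, governor, dependent) triples found in the text."""
    return TUPLE_RE.findall(text)


def cooccurrence_words(tuples, target):
    """Return (w_amod candidates, w_advmod candidates) for the target adjective."""
    # amod(noun, target): the noun the adjective modifies is the governor.
    w_amod = [gov for rel, gov, dep in tuples if rel == "amod" and dep == target]
    # advmod(target, adverb): the adverb modifying the adjective is the dependent.
    w_advmod = [dep for rel, gov, dep in tuples if rel == "advmod" and gov == target]
    return w_amod, w_advmod


PARSE = (
    "det(number-3, the-1), amod(number-3, large-2), nsubj(tend-8, number-3), "
    "xsubj(commit-10, number-3), advmod(ill-6, mentally-5), amod(people-7, ill-6), "
    "prep_of(number-3, people-7), aux(commit-10, to-9), xcomp(tend-8, commit-10), "
    "dobj(commit-10, suicide-11), advmod(developed-14, most-13), "
    "amod(country-15, developed-14), prep_in(suicide-11, country-15)"
)

tuples = parse_tuples(PARSE)
ill_words = cooccurrence_words(tuples, "ill")
developed_words = cooccurrence_words(tuples, "developed")
```

Restricting the context to these two relations is what keeps unrelated words such as suicide or tend out of the calculation, in contrast to a sliding window.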
Step 3: dependency-parse a large-scale corpus, collect the dependency co-occurrence word pairs in it, and build the dependency co-occurrence word pair database DB. Specifically:
Step 3.1: use a dependency parsing tool to parse the large-scale text corpus and obtain its dependency tuple set DSet.
In this example, the dependency parsing tool is the Stanford Parser provided by Stanford University, used with the englishPCFG.ser.gz language model, and WordNet 3.0 is used for lemmatization. The large-scale text corpus is the Reuters Corpus provided by Reuters. The Stanford Parser parses the text in the Reuters Corpus sentence by sentence, and the resulting dependency tuples are stored in the dependency tuple set DSet. In this example, the final DSet contains 93850841 dependency tuples in total.
Step 3.2: discard the dependency relation type of each tuple in DSet, count the dependency co-occurrence word pairs, and build the dependency co-occurrence word pair database DB.
In this example, the dependency relation type of each tuple in DSet is discarded and only the governor and the dependent are kept; the co-occurrence frequency of each word pair formed by a governor and a dependent is counted, and the dependency co-occurrence word pair database DB is built.
In this example, the final DB contains 9269109 dependency co-occurrence word pairs, and their co-occurrence frequencies sum to 93850841.
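Step 3.2 is a frequency count over (governor, dependent) pairs. A minimal in-memory sketch is shown below, using `collections.Counter` in place of a real database; the three tuples in `toy_dset` are invented for illustration and are not taken from the Reuters Corpus, and `build_pair_db` is a hypothetical name.

```python
from collections import Counter


def build_pair_db(dset):
    """Drop the relation type and count each (governor, dependent) pair."""
    return Counter((gov, dep) for _rel, gov, dep in dset)


# Toy stand-in for DSet: (relation, governor, dependent) triples.
toy_dset = [
    ("amod", "people", "sick"),
    ("nsubj", "people", "sick"),   # same word pair, different relation type
    ("amod", "country", "industrial"),
]

db = build_pair_db(toy_dset)
```

As in the figures reported above, the number of distinct pairs in DB is smaller than the number of tuples in DSet, while the frequencies in DB still sum to the size of DSet.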
Step 4: according to DB, calculate the dependency vocabulary association degree of each sense of the target ambiguous word. Specifically:
Step 4.1: for each related word w_si in the related word set W_si of sense s_i, use formula (1) to calculate its dependency vocabulary association degree with w_amod and w_advmod, i.e. relatedness(w_amod, w_si) and relatedness(w_si, w_advmod).
relatedness(w_1, w_2) = LLR(w_1, w_2) = 2[LogL(p_1, a, a+b) + LogL(p_2, c, c+d) - LogL(p, a, a+b) - LogL(p, c, c+d)]    (1)
where
a = freq(w_1, w_2) is the number of dependency tuples whose governor is w_1 and whose dependent is w_2;
b = freq(w_1, *) - a is the number of dependency tuples whose governor is w_1 but whose dependent is not w_2;
c = freq(*, w_2) - a is the number of dependency tuples whose dependent is w_2 but whose governor is not w_1;
d = N - a - b - c is the number of dependency tuples whose governor is not w_1 and whose dependent is not w_2;
N is the total number of dependency tuples in the corpus.
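The four counts a, b, c and d can be read off a pair-frequency table as sketched below. The table is assumed to be a `Counter` keyed by (governor, dependent); the frequencies in `toy_db` are invented for illustration, and `contingency` is a hypothetical name.

```python
from collections import Counter


def contingency(db, w1, w2):
    """Return (a, b, c, d) for governor w1 and dependent w2, as defined above."""
    n = sum(db.values())                                          # N
    a = db[(w1, w2)]                                              # freq(w1, w2)
    b = sum(f for (gov, _), f in db.items() if gov == w1) - a     # freq(w1, *) - a
    c = sum(f for (_, dep), f in db.items() if dep == w2) - a     # freq(*, w2) - a
    d = n - a - b - c
    return a, b, c, d


toy_db = Counter({
    ("people", "sick"): 3,
    ("people", "old"): 5,
    ("child", "sick"): 2,
    ("country", "industrial"): 10,
})

counts = contingency(toy_db, "people", "sick")
```

Scanning the whole table for every query is only acceptable for a toy example; with millions of pairs the marginal totals freq(w_1, *) and freq(*, w_2) would normally be precomputed once.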
In this example, for the ambiguous word ill, w_amod is people and w_advmod is mentally; the dependency vocabulary association degree of the related words of each sense is calculated by formula (1).
Among the related words of ill#a#1, the dependency vocabulary association degrees of sick, sickly, light, funny and green with people are 414.633560, 2.797437, 10.267433, 10.214535 and 3.727571 respectively; the association degree of the other related words of this sense with people is 0.
Among the related words of ill#a#1, the dependency vocabulary association degree of sick with mentally is 36.692474; the association degree of the other related words of this sense with mentally is 0.
For the related word harmful of ill#a#2, the dependency vocabulary association degrees with people and mentally are both 0.
For the related word bad of ill#a#3, the dependency vocabulary association degrees with people and mentally are 0.703737 and 0 respectively.
For the related word hostile of ill#a#4, the dependency vocabulary association degrees with people and mentally are 0.609087 and 0 respectively.
For the related words inauspicious, ominous and unpropitious of ill#a#5, the dependency vocabulary association degrees with people and mentally are all 0.
For the ambiguous word developed, w_amod is country and w_advmod is most; the dependency vocabulary association degree of the related words of each sense is calculated by formula (1).
For the related words formed, formulated, mature, matured and undeveloped of developed#a#1, the dependency vocabulary association degrees with country are 0, 0, 0, 0 and 22.751748 respectively, and those with most are 0, 0, 7.076829, 0 and 1.862240 respectively.
For the related words highly-developed and industrial of developed#a#2, the dependency vocabulary association degrees with country are 0 and 611.842281 respectively, and those with most are 0 and 16.894161 respectively.
For the related word improved of developed#a#3, the dependency vocabulary association degree with country is 0, and that with most is 0.
Step 4.2: by formula (2), calculate the overall dependency vocabulary association degree between sense s_i and the dependency co-occurrence words w_amod and w_advmod.
relatedness(s_i) = relatedness(w_amod, W_si) + relatedness(W_si, w_advmod)    (2)
where
W_si is the related word set of sense s_i obtained in step 1. As the calculations in this example show, the association degree between a word and the set W_si is taken as the maximum over the related words in W_si.
In this example, for the ambiguous word ill, relatedness(ill#a#1) = relatedness("people", W_ill#a#1) + relatedness(W_ill#a#1, "mentally") = max(414.633560, 2.797437, 10.267433, 10.214535, 3.727571, 0, 0, …, 0) + max(36.692474, 0, 0, …, 0) = 414.633560 + 36.692474 = 451.326034.
Similarly, relatedness(ill#a#2) = 0; relatedness(ill#a#3) = 0.703737; relatedness(ill#a#4) = 0.609087; relatedness(ill#a#5) = 0.
For the ambiguous word developed, relatedness(developed#a#1) = relatedness("country", W_developed#a#1) + relatedness(W_developed#a#1, "most") = max(0, 0, 0, 0, 22.751748) + max(0, 0, 7.076829, 0, 1.862240) = 22.751748 + 7.076829 = 29.828577.
Similarly, relatedness(developed#a#2) = 628.736442; relatedness(developed#a#3) = 0.
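The arithmetic of formula (2) in this example can be checked with a few lines of code. The sketch takes the per-word association degrees quoted above as given and applies the max-then-sum aggregation used in the worked calculation; `sense_relatedness` is a hypothetical name.

```python
def sense_relatedness(amod_scores, advmod_scores):
    """Formula (2): best score against w_amod plus best score against w_advmod."""
    return max(amod_scores, default=0.0) + max(advmod_scores, default=0.0)


# Non-zero association degrees quoted in step 4.1 (zeros do not change the max).
ill_1 = sense_relatedness(
    [414.633560, 2.797437, 10.267433, 10.214535, 3.727571],  # with "people"
    [36.692474],                                             # with "mentally"
)
developed_1 = sense_relatedness(
    [0.0, 0.0, 0.0, 0.0, 22.751748],       # with "country"
    [0.0, 0.0, 7.076829, 0.0, 1.862240],   # with "most"
)
developed_2 = sense_relatedness(
    [0.0, 611.842281],  # with "country"
    [0.0, 16.894161],   # with "most"
)
```

Note that the two maxima may come from different related words, as with undeveloped and mature for developed#a#1.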
Step 5: judge the sense with the largest overall dependency vocabulary association degree to be the correct sense. Specifically:
Compare the overall dependency vocabulary association degrees of the senses obtained in step 4.2, and judge the sense with the largest value to be the correct sense of the ambiguous word.
In this example, for the ambiguous word ill, step 4.2 gives overall dependency vocabulary association degrees for ill#a#1, ill#a#2, ill#a#3, ill#a#4 and ill#a#5 of 451.326034, 0, 0.703737, 0.609087 and 0 respectively. The value for ill#a#1 is the largest, so ill#a#1 is judged to be the correct sense of the ambiguous word ill.
For the ambiguous word developed, step 4.2 gives overall dependency vocabulary association degrees for developed#a#1, developed#a#2 and developed#a#3 of 29.828577, 628.736442 and 0 respectively. The value for developed#a#2 is the largest, so developed#a#2 is judged to be the correct sense of the ambiguous word developed.
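Step 5 is an arg-max over the sense scores. A minimal sketch using the values from this example follows; `pick_sense` is a hypothetical name, and the tie-breaking behaviour (the first sense listed wins) is a property of this sketch, since the text does not say how ties are resolved.

```python
def pick_sense(scores):
    """Return the sense whose overall association degree is largest."""
    return max(scores, key=scores.get)


ill_scores = {
    "ill#a#1": 451.326034,
    "ill#a#2": 0.0,
    "ill#a#3": 0.703737,
    "ill#a#4": 0.609087,
    "ill#a#5": 0.0,
}
developed_scores = {
    "developed#a#1": 29.828577,
    "developed#a#2": 628.736442,
    "developed#a#3": 0.0,
}

best_ill = pick_sense(ill_scores)
best_developed = pick_sense(developed_scores)
```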
After the above steps, the sense of the ambiguous adjective is determined and the word sense disambiguation task is complete.
As described above, the invention provides an adjective word sense disambiguation method based on dependency vocabulary association degree. The user inputs a sentence and indicates the target ambiguous adjective, and the system automatically decides the sense of the target adjective.
The specific description above further details the purpose, technical solution and beneficial effects of the invention. It should be understood that the foregoing is only a specific embodiment of the invention and is not intended to limit its scope of protection; any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (1)

1. the adjective word sense disambiguation method based on interdependent vocabulary association degree, it is characterised in that: its concrete operation step is:
Step one, according to semantic dictionary, collect target adjective ambiguity word wtThe synonym of each meaning of a word si, near synonym, antonym, build the relevant word set W of the corresponding meaning of a wordsi;Particularly as follows:
Step 1.1: according to WordNet, take the synset of meaning of a word concept si;
Step 1.2: according to WordNet, take the near synonym collection of meaning of a word concept si;
Step 1.3: according to WordNet, take the antisense word set of meaning of a word concept si;
Step 1.4: by the synset of step 1.1 ~ 1.3 gained, near synonym collection, antonym set also, builds the relevant word set W of the corresponding meaning of a wordsi
Step 2, sentence to target ambiguities word place carry out interdependent syntactic analysis, collect the adjective comprising target ambiguities word and modify and interdependent tuple modified in adverbial word, extract corresponding interdependent co-occurrence word wamodAnd wadvmod;Particularly as follows:
Step 2.1: utilize interdependent syntactic analysis instrument that the sentence at target ambiguities word place is carried out interdependent syntactic analysis, obtain its interdependent tuple-set;
Step 2.2: by the interdependent tuple-set of step 2.1 gained, extracts the adjective comprising target ambiguities word and modifies and the adverbial word interdependent tuple of modification;
Step 2.3: by the interdependent tuple of step 2.2 gained, extract the interdependent co-occurrence notional word w of ambiguity wordamodAnd wadvmod
Step 3, large-scale corpus is carried out interdependent syntactic analysis, collect interdependent co-occurrence word pair therein, build interdependent co-occurrence word pair database DB;Particularly as follows:
Step 3.1: utilize interdependent syntactic analysis instrument that extensive corpus of text is carried out interdependent syntactic analysis, obtains its interdependent tuple-set DSet;
Step 3.2: give up the dependency relationship type information of interdependent tuple in DSet, add up interdependent co-occurrence word pair, builds interdependent co-occurrence word pair database DB;
Step 4, according to DB, calculate the interdependent vocabulary association degree of each meaning of a word of target ambiguities word;Particularly as follows:
Step 4.1: for the relevant word set W of meaning of a word sisiIn each related term wsi, by formula (1), calculate itself and wamod、wadvmodInterdependent vocabulary association degree, i.e. relatedness (wamod,wsi) and relatedness (wsi,wadvmod);
relatedness(w1,w2)=LLR(w1,w2)=2[LogL(p1,a,a+b)+LogL(p2,c,c+d)-LogL(p,a,a+b)-LogL(p,c,c+d)](1)
Wherein,
a=freq(w1,w2) represent that governing word is w1, and dependent is w2The sum of interdependent tuple;
b=freq(w1, *) and-a represents that governing word is w1, but dependent is not w2The sum of interdependent tuple;
c=freq(*,w2)-a represents that dependent is w2, but governing word is not w1The sum of interdependent tuple;
D=N-a-b-c represents that governing word is not w1And dependent is not w2The sum of interdependent tuple;
N represents the sum of the whole interdependent tuple that corpus comprises;
Step 4.2: compute by formula (2) the overall dependency vocabulary association degree between sense $s_i$ and the dependency co-occurring words $w_{amod}$ and $w_{advmod}$;
$$\mathrm{relatedness}(s_i)=\mathrm{relatedness}(w_{amod},W_{s_i})+\mathrm{relatedness}(W_{s_i},w_{advmod}) \tag{2}$$
where
$W_{s_i}$ is the related word set of sense $s_i$ obtained in step 1;
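Formulas (1) and (2) can be sketched as follows. Two points are assumptions not stated in this section: `LogL(p, k, n)` is taken to be the binomial log-likelihood $k\ln p+(n-k)\ln(1-p)$ with $p_1=a/(a+b)$, $p_2=c/(c+d)$ and $p=(a+c)/N$, as in the standard log-likelihood ratio test; and the set-valued terms of formula (2) are taken to be sums over the related words in $W_{s_i}$. The `db` layout (keys `pair`, `gov`, `dep`, `N`) is likewise hypothetical.

```python
import math


def log_l(p, k, n):
    """Binomial log-likelihood k*ln(p) + (n-k)*ln(1-p), treating 0*ln(0) as 0."""
    value = 0.0
    if k > 0:
        value += k * math.log(p)
    if n - k > 0:
        value += (n - k) * math.log(1.0 - p)
    return value


def llr(a, b, c, d):
    """Log-likelihood ratio of formula (1) from the four contingency counts."""
    n = a + b + c + d
    p1 = a / (a + b) if a + b else 0.0
    p2 = c / (c + d) if c + d else 0.0
    p = (a + c) / n if n else 0.0
    return 2.0 * (log_l(p1, a, a + b) + log_l(p2, c, c + d)
                  - log_l(p, a, a + b) - log_l(p, c, c + d))


def relatedness(w1, w2, db):
    """Association degree of governor w1 and dependent w2, read from counts in db."""
    a = db["pair"].get((w1, w2), 0)
    b = db["gov"].get(w1, 0) - a
    c = db["dep"].get(w2, 0) - a
    d = db["N"] - a - b - c
    return llr(a, b, c, d)


def sense_relatedness(related_words, w_amod, w_advmod, db):
    """Formula (2), assuming the set-valued terms are sums over related words."""
    score = 0.0
    for w in related_words:
        if w_amod is not None:
            score += relatedness(w_amod, w, db)    # noun governs the adjective slot
        if w_advmod is not None:
            score += relatedness(w, w_advmod, db)  # adjective slot governs the adverb
    return score


toy_db = {
    "pair": {("student", "bright"): 3},
    "gov": {"student": 4},
    "dep": {"bright": 3},
    "N": 10,
}
toy_score = sense_relatedness(["bright"], "student", None, toy_db)
```

When the two words are independent of each other (for example a = b = c = d) the ratio is zero, and it grows as the pair co-occurs more often than the marginal counts alone would predict.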
Step 5: judge the sense with the largest overall dependency vocabulary association degree to be the correct sense; specifically:
Compare the overall dependency vocabulary association degrees of the senses obtained in step 4.2, and judge the sense with the largest value to be the correct sense of the ambiguous word;
Through the above steps, the sense of the ambiguous adjective is determined and the word sense disambiguation task is completed.
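The selection in step 5 is an argmax over the per-sense scores. A minimal sketch, assuming the scores from step 4.2 are held in a dictionary keyed by a sense identifier; the function name, the sense labels, and the numeric values below are invented purely for illustration. Ties are resolved in favour of the sense listed first, which is a choice of this sketch rather than something the text specifies.

```python
from typing import Dict, Optional


def choose_sense(scores: Dict[str, float]) -> Optional[str]:
    """Return the sense with the largest overall association degree, or None if empty."""
    if not scores:
        return None
    # max() keeps the first maximal key in insertion order when scores tie.
    return max(scores, key=scores.get)


# Hypothetical scores for two senses of an ambiguous adjective.
example_scores = {"bright#intelligent": 12.4, "bright#luminous": 3.1}
best = choose_sense(example_scores)
```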
CN201610048601.9A 2016-01-26 2016-01-26 Adjective word sense disambiguation method based on dependency vocabulary association degree Pending CN105718443A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610048601.9A CN105718443A (en) 2016-01-26 2016-01-26 Adjective word sense disambiguation method based on dependency vocabulary association degree

Publications (1)

Publication Number Publication Date
CN105718443A true CN105718443A (en) 2016-06-29

Family

ID=56154948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610048601.9A Pending CN105718443A (en) 2016-01-26 2016-01-26 Adjective word sense disambiguation method based on dependency vocabulary association degree

Country Status (1)

Country Link
CN (1) CN105718443A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101510221A (en) * 2009-02-17 2009-08-19 北京大学 Enquiry statement analytical method and system for information retrieval

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
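PING CHEN et al.: "A Fully Unsupervised Word Sense Disambiguation Method Using Dependency Knowledge", 《ASSOCIATION FOR COMPUTATIONAL LINGUISTICS》 *
刘鹏远: "Unsupervised translation disambiguation based on indirect Web association of bilingual words", 《Journal of Software》 *
鹿文鹏 et al.: "Word sense disambiguation with automatically acquired knowledge based on dependency fitness", 《Journal of Software》 *
鹿文鹏: "Research on word sense disambiguation methods based on dependency and domain knowledge", 《China Doctoral Dissertations Full-text Database, Information Science and Technology》 *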

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107590168A (en) * 2016-07-08 2018-01-16 百度(美国)有限责任公司 The system and method inferred for relation
CN107590168B (en) * 2016-07-08 2023-06-16 百度(美国)有限责任公司 System and method for relationship inference
CN108509449A (en) * 2017-02-24 2018-09-07 腾讯科技(深圳)有限公司 A kind of method and server of information processing
CN107958042A (en) * 2017-11-23 2018-04-24 维沃移动通信有限公司 A kind of method for pushing and mobile terminal of target special topic
CN107958042B (en) * 2017-11-23 2020-09-08 维沃移动通信有限公司 Target topic pushing method and mobile terminal
CN108563643A (en) * 2018-03-27 2018-09-21 常熟鑫沐奇宝软件开发有限公司 A kind of polysemy interpretation method based on artificial intelligence knowledge mapping
CN108563643B (en) * 2018-03-27 2021-10-01 常熟鑫沐奇宝软件开发有限公司 Artificial intelligence knowledge graph-based word polysemous translation method
CN110991196A (en) * 2019-12-18 2020-04-10 北京百度网讯科技有限公司 Translation method and device for polysemous words, electronic equipment and medium
CN110991196B (en) * 2019-12-18 2021-10-26 北京百度网讯科技有限公司 Translation method and device for polysemous words, electronic equipment and medium
US11275904B2 (en) 2019-12-18 2022-03-15 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for translating polysemy, and medium
JP7196145B2 (en) 2019-12-18 2022-12-26 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Polysemous translation method, polysemous translation device, electronic device and medium
CN113407717A (en) * 2021-05-28 2021-09-17 数库(上海)科技有限公司 Method, device, equipment and storage medium for eliminating ambiguity of industry words in news

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160629