CN105893346A - Graph model word sense disambiguation method based on dependency syntax tree


Info

Publication number
CN105893346A
Authority
CN
China
Prior art keywords
word
meaning
sentence
syntax tree
disambiguation
Prior art date
Legal status
Pending
Application number
CN201610189859.0A
Other languages
Chinese (zh)
Inventor
鹿文鹏
Current Assignee
Qilu University of Technology
Original Assignee
Qilu University of Technology
Priority date
Filing date
Publication date
Application filed by Qilu University of Technology filed Critical Qilu University of Technology
Priority to CN201610189859.0A
Publication of CN105893346A
Legal status: Pending


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 — Handling natural language data
    • G06F40/20 — Natural language analysis
    • G06F40/205 — Parsing
    • G06F40/211 — Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars

Abstract

The invention relates to a graph model word sense disambiguation method based on a dependency syntax tree, and belongs to the technical field of natural language processing. The method comprises the following steps: 1, the sentence is preprocessed and the content words to be disambiguated are extracted, including normalization, tokenization, and lemmatization; 2, dependency parsing is performed on the sentence and its dependency syntax tree is constructed; 3, the distance between words in the sentence on the dependency syntax tree, i.e., the length of the shortest path, is obtained; 4, a disambiguation knowledge graph is constructed for the word sense concepts of the words in the sentence according to a knowledge base; 5, the graph score of each sense node is calculated according to the lengths of the semantic association paths between sense nodes in the disambiguation knowledge graph, the weights of the association edges, and the distances of the path endpoints on the dependency syntax tree; 6, for each ambiguous word, the sense with the largest graph score is selected as the correct sense. The method further exploits the disambiguation potential of the graph model and achieves a good disambiguation effect.

Description

A graph model word sense disambiguation method based on a dependency syntax tree
Technical field
The present invention relates to a word sense disambiguation method, in particular to a graph model word sense disambiguation method based on a dependency syntax tree, and belongs to the technical field of natural language processing.
Background technology
Word sense disambiguation refers to automatically determining the correct sense of an ambiguous word according to the context in which it occurs. Word sense disambiguation is a foundational technology of natural language processing, and it directly affects tasks such as machine translation, information retrieval, automatic question answering, and sentiment analysis.
Knowledge-based word sense disambiguation methods can be divided into similarity-based methods and graph-model-based methods. The former judges the correct sense by comparing the similarity between the senses of the ambiguous word and its context words; the latter builds a disambiguation knowledge graph for the sense nodes according to a knowledge base and scores the sense nodes with a graph-model node-importance evaluation method, thereby determining the correct sense. In recent years, graph models have attracted growing attention from researchers because of their good performance.
Whether a method is based on similarity measurement or on a graph model, its disambiguation performance is affected by context-related words. Context words at different distances from the ambiguous word influence its sense to different degrees. How to reasonably reflect the influence of distance in a word sense disambiguation graph model is a problem demanding a prompt solution.
At present, most graph model word sense disambiguation methods use the PageRank algorithm to evaluate the importance of sense nodes. PageRank has achieved immense success in search engines, but this does not imply that it is equally effective in the field of word sense disambiguation. Designing an effective graph-model node-importance evaluation mechanism for the particular demands of the word sense disambiguation task is likewise a problem demanding a prompt solution.
Summary of the invention
The object of the present invention is to address the deficiencies of current word sense disambiguation methods by proposing a graph model word sense disambiguation method based on a dependency syntax tree.
The object of the present invention is achieved through the following technical solutions.
A graph model word sense disambiguation method based on a dependency syntax tree, whose concrete operation steps are as follows.
Step 1: preprocess the sentence and extract the content words to be disambiguated, mainly including normalization, tokenization, and lemmatization; specifically as follows.
Step 1.1: denote the sentence to be processed by the symbol S.
Step 1.2: preprocess sentence S, mainly including normalization and tokenization, to obtain the preprocessed sentence S'.
Step 1.3: lemmatize the words in sentence S'.
Step 1.4: extract the content words to be disambiguated in S' and store them in the content word set W.
Step 2: perform dependency parsing on the sentence and build its dependency syntax tree; specifically as follows.
Step 2.1: use a dependency parsing tool to parse sentence S' and obtain its dependency tuple set DSet.
Step 2.2: build the dependency syntax tree according to the tuple information in the dependency tuple set DSet.
Step 3: obtain the distance between words in the sentence on the dependency syntax tree, i.e., the length of the shortest path; specifically as follows.
Treat the dependency syntax tree as an undirected graph; use Dijkstra's algorithm or the Floyd algorithm to compute the length of the shortest path between any two word nodes in the graph, obtaining the distance between words on the dependency syntax tree.
Step 4: build a disambiguation knowledge graph for the sense concepts of the content words in the sentence according to a knowledge base; specifically as follows.
Step 4.1: according to the BabelNet knowledge base, extract the semantic association paths between all content words in the sentence and build the semantic association path set R.
Step 4.2: build the disambiguation knowledge graph G from the semantic association path set R.
Step 5: compute the graph score of each sense node according to the semantic association paths between sense nodes in the disambiguation knowledge graph, the weights of the association edges, and the distances of the path endpoints on the dependency syntax tree; specifically as follows.
Step 5.1: for a sense node s_i, find all semantic association paths in the disambiguation knowledge graph G and the semantic association path set R that take it as start point or end point, and store them in the path set R_{s_i}.
Step 5.2: for sense node s_i, according to formula (1), jointly determine its graph score from the semantic association path set R_{s_i} and the distances of the path endpoints on the dependency syntax tree.
(1)
Here p denotes a path in the semantic association path set R_{s_i}; e denotes an association edge in path p; w_e is the weight of association edge e. Each association edge e also carries a weight coefficient: for edges labeled "r" (i.e., the Wikipedia Relations type) the coefficient is 0.3, and for all other edge types it is 0.7. d_p denotes the distance on the dependency syntax tree between the words corresponding to the two endpoint concepts of path p; the weighting coefficient of this distance is set to 2.
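Formula (1) is rendered only as an image in the published document and does not survive in this text. A plausible reconstruction from the definitions above — an assumption for readability, not the patent's verbatim equation — is:

```latex
\mathrm{Score}(s_i) \;=\; \sum_{p \in R_{s_i}} \frac{\sum_{e \in p} \alpha_e \, w_e}{\beta \cdot d_p},
\qquad
\alpha_e =
\begin{cases}
0.3 & \text{if } e \text{ is labeled ``r''},\\
0.7 & \text{otherwise},
\end{cases}
\qquad
\beta = 2
```

The additive, distance-attenuated form matches the worked example in the embodiment below (per-path scores are computed independently and then summed), though the exact per-path normalization cannot be verified from the surviving text.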
Step 6: for each ambiguous word, select the sense with the largest graph score as the correct sense; specifically as follows.
For each ambiguous word, compare the graph scores of its candidate senses and select the sense with the largest graph score as the correct sense.
Through the above steps, full-text word sense disambiguation can be completed.
Beneficial effects
The graph model word sense disambiguation method based on a dependency syntax tree proposed by the present invention uses the BabelNet knowledge base to build a disambiguation knowledge graph for the sense concepts of the ambiguous words; jointly considers the lengths of the sense association paths in the graph, the weights of the association edges, and the shortest distance on the dependency syntax tree between the words corresponding to the path endpoint concepts to determine the graph score of each sense concept node; and selects for each ambiguous word the sense concept with the highest graph score as the correct sense. Compared with traditional graph model word sense disambiguation methods, the invention introduces the shortest distance between words on the dependency syntax tree and proposes a graph-model node-importance evaluation method that combines association path length and association edge weight. The invention can effectively reflect the influence of word distance on node-importance evaluation, evaluate the importance of sense nodes more comprehensively and accurately, and thereby improve the effect of graph model word sense disambiguation.
Brief description of the drawings
Fig. 1 is the dependency syntax tree in the specific embodiment of the invention.
Fig. 2 is the disambiguation knowledge graph in the specific embodiment of the invention.
Detailed description of the invention
The present invention is described in further detail below with reference to the accompanying drawings and a specific embodiment.
Take the sentence "the coach and athletes © will leave for Shanghai by 【train." as an example; full-text word sense disambiguation is performed on all of its content words, namely coach, athlete, leave, Shanghai, and train. (The stray "©" and "【" characters are part of the raw input and are removed by the normalization of step 1.2.)
According to the BabelNet 1.0 dictionary, the senses of the above content words are shown in Tables 1–5. The senses in the BabelNet dictionary derive from Wikipedia and WordNet. Word sense disambiguation evaluations generally take the WordNet senses as the standard; to simplify the presentation, this example lists only the senses that come from WordNet.
Table 1: Senses of coach#n
Sense number (BabelNet) | Sense gloss | Sense number (WordNet)
bn:00020121n | a carriage pulled by four horses with one driver | coach#n#4
bn:00016240n | a railcar where passengers ride | coach#n#3
bn:00007329n | a vehicle carrying many passengers; used for public transport; "he always rode the bus to work" | coach#n#5
bn:00020120n | a person who gives private instruction (as in singing, acting, etc.) | coach#n#2
bn:00020119n | (sports) someone in charge of training an athlete or a team | coach#n#1
Here bn denotes BabelNet; n and #n indicate that the part of speech is noun; #1–#5 are the sense numbers in WordNet 3.0.
Table 2: Senses of athlete#n
Sense number (BabelNet) | Sense gloss | Sense number (WordNet)
bn:00006747n | a person trained to compete in sports | athlete#n#1
Here bn denotes BabelNet; n and #n indicate that the part of speech is noun; #1 is the sense number in WordNet 3.0.
Table 3: Senses of leave#v
Sense number (BabelNet) | Sense gloss | Sense number (WordNet)
bn:00090273v | leave unchanged or undisturbed or refrain from taking; "leave it as is"; "leave the young fawn alone"; "leave the flowers that you see in the park behind" | leave#v#4
bn:00090275v | be survived by after one's death; "He left six children"; "At her death, she left behind her husband and 11 cats" | leave#v#12
bn:00088482v | leave behind unintentionally; "I forgot my umbrella in the restaurant"; "I left my keys inside the car and locked the doors" | leave#v#14
bn:00090271v | go and leave behind, either intentionally or by neglect or forgetfulness; "She left a mess when she moved out"; "His good luck finally left him"; "her husband left her after 20 years of marriage"; "she wept thinking she had been left behind" | leave#v#2
bn:00087845v | move out of or depart from; "leave the room"; "the fugitive has left the country" | leave#v#5
bn:00088939v | go away from a place; "At what time does your train leave"; "She didn't leave until midnight"; "The ship leaves at midnight" | leave#v#1
bn:00083420v | leave or give by will after one's death; "My aunt bequeathed me all her jewelry"; "My grandfather left me his entire estate" | leave#v#10
bn:00088821v | transmit (knowledge or skills); "give a secret to the Russians"; "leave your name and address here"; "impart a new skill to the students" | leave#v#13
bn:00087695v | put into the care or protection of someone; "He left the decision to his deputy"; "leave your child the nurse's care" | leave#v#9
bn:00086604v | remove oneself from an association with or participation in; "She wants to leave"; "The teenager left home"; "She left her position with the Red Cross"; "He left the Senate after two terms"; "after 20 years with the same company, she pulled up stakes" | leave#v#8
bn:00090243v | have as a result or residue; "The water left a mark on the silk dress"; "Her blood left a stain on the napkin" | leave#v#7
bn:00082540v | make a possibility or provide opportunity for; permit to be attainable or cause to remain; "This leaves no room for improvement"; "The evidence allows only one conclusion"; "allow for mistakes"; "leave lots of time for the trip"; "This procedure provides for lots of leeway" | leave#v#6
bn:00090272v | act or be so as to become in a specified state; "The inflation left them penniless"; "The president's remarks left us speechless" | leave#v#3
bn:00090274v | have left or have as a remainder; "That left the four of us"; "19 minus 8 leaves 11" | leave#v#11
Here bn denotes BabelNet; v and #v indicate that the part of speech is verb; #1–#14 are the sense numbers in WordNet 3.0.
Table 4: Senses of Shanghai#n
Sense number (BabelNet) | Sense gloss | Sense number (WordNet)
bn:00070893n | the largest city of China; located in the east on the Pacific; one of the largest ports in the world | Shanghai#n#1
Here bn denotes BabelNet; n and #n indicate that the part of speech is noun; #1 is the sense number in WordNet 3.0.
Table 5: Senses of train#n
Sense number (BabelNet) | Sense gloss | Sense number (WordNet)
bn:00066028n | public transport provided by a line of railway cars coupled together and drawn by a locomotive; "express trains don't stop at Princeton Junction" | train#n#1
bn:00037572n | wheelwork consisting of a connected set of rotating gears by which force is transmitted or motion or torque is changed; "the fool got his tie caught in the geartrain" | train#n#6
bn:00077914n | piece of cloth forming the long back section of a gown that is drawn along the floor; "the bride's train was carried by her two young nephews" | train#n#5
bn:00077913n | a series of consequences wrought by an event; "it led to a train of disasters" | train#n#4
bn:00015839n | a procession (of wagons or mules or camels) traveling together in single file; "we were part of a caravan of almost a thousand camels"; "they joined the wagon train for safety" | train#n#3
bn:00074684n | a sequentially ordered set of things or events or ideas in which each successive member is related to the preceding; "a string of islands"; "train of mourners"; "a train of thought" | train#n#2
Here bn denotes BabelNet; n and #n indicate that the part of speech is noun; #1–#6 are the sense numbers in WordNet 3.0.
Step 1: preprocess the sentence and extract the content words to be disambiguated, mainly including normalization, tokenization, and lemmatization; specifically as follows.
Step 1.1: denote the sentence to be processed by the symbol S.
In this example, S = "the coach and athletes © will leave for Shanghai by 【train.".
Step 1.2: preprocess sentence S, mainly including normalization and tokenization, to obtain the preprocessed sentence S'.
In this example, S' = "the coach and athletes will leave for Shanghai by train .".
Step 1.3: lemmatize the words in sentence S'.
In this example, lemmatization is completed with the help of WordNet 3.0 and the MorphAdorner toolkit provided by Northwestern University. Only one word, "athletes", requires reduction in this example; it is reduced to "athlete".
Step 1.4: extract the content words to be disambiguated in S' and store them in the content word set W.
In this example, W contains 5 content words to be disambiguated: coach, athlete, leave, Shanghai, and train.
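For illustration only, a minimal Python sketch of step 1 follows. It substitutes NLTK's WordNet-based lemmatizer (morphy) for the MorphAdorner toolkit named above, and the content-word filter is a deliberate simplification — both are assumptions made for this sketch, not the patent's procedure.

```python
import nltk                                  # requires: nltk.download("punkt"),
from nltk.corpus import wordnet as wn        #           nltk.download("wordnet")

def preprocess(sentence):
    """Steps 1.1-1.4: normalize, tokenize, lemmatize, extract content words."""
    # Step 1.2: normalization -- here, simply dropping non-ASCII noise such as
    # the stray characters in the example sentence (a simplification).
    normalized = "".join(ch for ch in sentence if ord(ch) < 128)
    tokens = nltk.word_tokenize(normalized)            # tokenization
    # Step 1.3: lemmatization via WordNet's morphy ("athletes" -> "athlete").
    lemmas = [wn.morphy(t.lower()) or t.lower() for t in tokens]
    # Step 1.4: keep content words -- tokens WordNet knows, minus a small
    # hand-written function-word list (a simplification).
    function_words = {"the", "and", "will", "for", "by", "."}
    return [l for l in lemmas if l not in function_words and wn.synsets(l)]

W = preprocess("the coach and athletes © will leave for Shanghai by 【train.")
print(W)   # -> ['coach', 'athlete', 'leave', 'shanghai', 'train']
```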
Step 2: perform dependency parsing on the sentence and build its dependency syntax tree; specifically as follows.
Step 2.1: use a dependency parsing tool to parse sentence S' and obtain its dependency tuple set DSet.
In this example, the Stanford Parser provided by Stanford University is used with the englishPCFG.ser.gz language model, and the CCPropagatedDependencies option is used so that dependencies are collapsed and propagated. Combined with the lemmatization information from step 1.3, the following dependency tuple set DSet is obtained: DSet = { det(coach-2, the-1), nsubj(leave-6, coach-2), conj_and(coach-2, athlete-4), nsubj(leave-6, athlete-4), aux(leave-6, will-5), prep_for(leave-6, Shanghai-8), prep_by(leave-6, train-10) }.
Step 2.2: build the dependency syntax tree according to the tuple information in the dependency tuple set DSet.
In this example, the dependency syntax tree shown in Fig. 1 can be built from the dependency tuple data in DSet.
Step 3: obtain the distance between words in the sentence on the dependency syntax tree, i.e., the length of the shortest path; specifically as follows.
Treat the dependency syntax tree as an undirected graph; use Dijkstra's algorithm or the Floyd algorithm to compute the length of the shortest path between any two word nodes in the graph, obtaining the distance between words on the dependency syntax tree.
In this example, Fig. 1 is treated as an undirected graph, and Dijkstra's algorithm is used to compute the length of the shortest path between each pair of nodes in turn, as shown in Table 6.
Table 6: Shortest path lengths between word nodes
 | the | coach | athlete | will | leave | Shanghai | train
the | 0 | 1 | 2 | 3 | 2 | 3 | 3
coach | 1 | 0 | 1 | 2 | 1 | 2 | 2
athlete | 2 | 1 | 0 | 2 | 1 | 2 | 2
will | 3 | 2 | 2 | 0 | 1 | 2 | 2
leave | 2 | 1 | 1 | 1 | 0 | 1 | 1
Shanghai | 3 | 2 | 2 | 2 | 1 | 0 | 2
train | 3 | 2 | 2 | 2 | 1 | 2 | 0
As Table 6 shows, because Fig. 1 is treated as an undirected graph, the word distances are symmetric about the diagonal.
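The following sketch reproduces steps 2.2 and 3 for this example: it rebuilds the tree of Fig. 1 as an undirected graph from the DSet tuples of step 2.1 and computes all-pairs shortest paths with the Floyd algorithm (Floyd–Warshall). The tuple parsing is simplified to the exact format printed above.

```python
dset = ["det(coach-2, the-1)", "nsubj(leave-6, coach-2)",
        "conj_and(coach-2, athlete-4)", "nsubj(leave-6, athlete-4)",
        "aux(leave-6, will-5)", "prep_for(leave-6, Shanghai-8)",
        "prep_by(leave-6, train-10)"]

def words(tup):
    """'nsubj(leave-6, coach-2)' -> ('leave', 'coach')"""
    head, dep = tup[tup.index("(") + 1:-1].split(", ")
    return head.rsplit("-", 1)[0], dep.rsplit("-", 1)[0]

# Step 2.2: undirected dependency structure, every edge of length 1.
nodes = sorted({w for t in dset for w in words(t)})
INF = float("inf")
dist = {a: {b: 0 if a == b else INF for b in nodes} for a in nodes}
for t in dset:
    a, b = words(t)
    dist[a][b] = dist[b][a] = 1

# Step 3: Floyd-Warshall all-pairs shortest paths (Dijkstra per node
# would serve equally well, as the description notes).
for k in nodes:
    for a in nodes:
        for b in nodes:
            if dist[a][k] + dist[k][b] < dist[a][b]:
                dist[a][b] = dist[a][k] + dist[k][b]

print(dist["coach"]["train"], dist["the"]["will"])   # -> 2 3, matching Table 6
```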
Step 4: build a disambiguation knowledge graph for the sense concepts of the content words in the sentence according to a knowledge base; specifically as follows.
Step 4.1: according to the BabelNet knowledge base, extract the semantic association paths between all content words in the sentence and build the semantic association path set R.
In this example, BabelNet is used because it contains richer sense concepts and semantic association relations than WordNet. To give full play to the advantages of BabelNet, the invention extracts the semantic association relations between all BabelNet sense concepts of all content words in the sentence. To guarantee the quality of the extracted relations, association paths longer than 3, association paths containing a cycle, and association paths containing an association edge whose weight is below 0.01 are discarded. For the BabelNet sense concepts of the 5 content words in this example, 1162 semantic association paths satisfy the above conditions in total; some of these paths are listed below.
[bn:00006747n, ~, 0.03152, bn:00035713n, r, 0.05971, bn:00036014n, r, 0.02804, bn:00020119n]
[bn:00006747n, ~, 0.03182, bn:00008897n, ~, 0.10154, bn:00036014n, r, 0.02804, bn:00020119n]
[bn:00006747n, ~, 0.0187, bn:00074678n, r, 0.02084, bn:00020119n]
[bn:00066028n, gdis, 0.04991, bn:00015785n, ~, 0.0556, bn:00036420n, r, 0.11841, bn:00016240n]
[bn:00020119n, gmono, 0.03247, bn:00006747n]
[bn:00020119n, ~i, 0.09063, bn:00048315n, gdis, 0.10159, bn:00008205n, gmono, 0.03076, bn:00006747n]
[bn:00808723n, r, 0.04456, bn:00045278n, @, 0.05508, bn:00008205n, @, 0.03076, bn:00006747n]
[bn:00020119n, ~, 0.0665, bn:00008810n, gmono, 0.08358, bn:00008205n, @, 0.03076, bn:00006747n]
[bn:02554145n, r, 0.01137, bn:00003403n, r, 0.03647, bn:00051309n, r, 0.01701, bn:00020119n]
[bn:00808723n, r, 0.02219, bn:00008805n, r, 0.03697, bn:00003403n, r, 0.01158, bn:02554145n]。
Take the first path as an example: the length of this path is 3 and it contains four sense nodes. The endpoint nodes bn:00006747n and bn:00020119n correspond to two content words in the sentence (athlete and coach, respectively); bn:00035713n and bn:00036014n are intermediate connection points of the path. "~" and "r" denote different semantic association relations; 0.03152, 0.05971, and 0.02804 are the weights of the respective association edges.
Step 4.2: build the disambiguation knowledge graph G from the semantic association path set R.
In this example, the disambiguation knowledge graph shown in Fig. 2 can be built from the semantic association path set R. Fig. 2 is only a schematic diagram and depicts only a small fraction of the semantic association relations contained in the set R.
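A sketch of step 4 under stated assumptions: each semantic association path is represented as a flat list alternating concept, relation label, and edge weight, exactly as printed above; the quality filters of step 4.1 (length at most 3, no cycles, minimum edge weight 0.01) are applied before the surviving paths are merged into the graph G. The BabelNet extraction itself is not reproduced here — `raw_paths` stands in for its output.

```python
def parse(path):
    """[c0, rel, w, c1, rel, w, c2, ...] -> [(c_i, rel, w, c_{i+1}), ...]"""
    return [(path[i], path[i + 1], float(path[i + 2]), path[i + 3])
            for i in range(0, len(path) - 1, 3)]

def keep(path):
    """Quality filters of step 4.1."""
    edges = parse(path)
    concepts = path[0::3]
    return (len(edges) <= 3                          # path length at most 3
            and len(set(concepts)) == len(concepts)  # no cycles
            and all(w >= 0.01 for _, _, w, _ in edges))

def build_graph(raw_paths):
    """Merge the surviving paths into the disambiguation knowledge graph G."""
    R = [p for p in raw_paths if keep(p)]
    G = {}          # adjacency map: concept -> {(relation, neighbour): weight}
    for p in R:
        for a, rel, w, b in parse(p):
            G.setdefault(a, {})[(rel, b)] = w
            G.setdefault(b, {})[(rel, a)] = w
    return G, R

# One of the 1162 paths listed above, in the assumed flat representation:
raw_paths = [["bn:00006747n", "~", 0.03152, "bn:00035713n",
              "r", 0.05971, "bn:00036014n", "r", 0.02804, "bn:00020119n"]]
G, R = build_graph(raw_paths)
```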
Step 5: compute the graph score of each sense node according to the semantic association paths between sense nodes in the disambiguation knowledge graph, the weights of the association edges, and the distances of the path endpoints on the dependency syntax tree; specifically as follows.
Step 5.1: for a sense node s_i, find all semantic association paths in the disambiguation knowledge graph G and the semantic association path set R that take it as start point or end point, and store them in the path set R_{s_i}.
In this example, by comparing the start point and end point of each path one by one against the disambiguation knowledge graph G and the semantic association path set R, the association paths relevant to a sense node s_i can be obtained.
Take the sense concept bn:00020119n as an example: it has 57 such paths in total, and its path set R_{s_i} is obtained as follows.
[bn:00020119n, ~, 0.06707, bn:00035706n, gdis, 0.09436, bn:00035713n, gmono, 0.03338, bn:00006747n]
[bn:00020119n, +, 0.0766, bn:00085223v, gdis, 0.01403, bn:00006759n, r, 0.01589, bn:01228222n]
[bn:00020119n, ~, 0.07073, bn:00008892n, gmono, 0.18966, bn:00008897n, gdis, 0.04801, bn:00006747n]
[bn:00020119n, ~, 0.0665, bn:00008810n, gdis, 0.08358, bn:00008205n, gmono, 0.03076, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00076524n, gdis, 0.10569, bn:00076528n, gdis, 0.02831, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00076524n, gmono, 0.10569, bn:00076528n, @, 0.02831, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00008892n, gdis, 0.18966, bn:00008897n, gmono, 0.04801, bn:00006747n]
[bn:00020119n, gmono, 0.03247, bn:00006747n]
[bn:00020119n, ~i, 0.09063, bn:00048315n, gdis, 0.10159, bn:00008205n, gmono, 0.03076, bn:00006747n]
[bn:00020119n, r, 0.10964, bn:01228222n]
[bn:00020119n, ~, 0.06707, bn:00035706n, gmono, 0.09436, bn:00035713n, gdis, 0.03338, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00044333n, gmono, 0.17857, bn:00044335n, gdis, 0.05689, bn:00006747n]
[bn:00020119n, ~i, 0.09063, bn:00048315n, gdis, 0.10159, bn:00008205n, @, 0.03076, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00044333n, gmono, 0.17857, bn:00044335n, @, 0.05689, bn:00006747n]
[bn:00020119n, gdis, 0.0766, bn:00085223v, gdis, 0.01403, bn:00006759n, r, 0.01589, bn:01228222n]
[bn:00020119n, r, 0.30744, bn:00006547n, r, 0.02294, bn:00074678n, @, 0.3871, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00076524n, gmono, 0.10569, bn:00076528n, gdis, 0.02831, bn:00006747n]
[bn:00020119n, ~, 0.06707, bn:00035706n, gmono, 0.09436, bn:00035713n, gmono, 0.03338, bn:00006747n]
[bn:00020119n, ~, 0.0665, bn:00008810n, gmono, 0.08358, bn:00008205n, gdis, 0.03076, bn:00006747n]
[bn:00020119n, ~i, 0.09063, bn:00048315n, @i, 0.10159, bn:00008205n, gdis, 0.03076, bn:00006747n]
[bn:00020119n, ~, 0.06707, bn:00035706n, gdis, 0.09436, bn:00035713n, @, 0.03338, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00044333n, gdis, 0.17857, bn:00044335n, gmono, 0.05689, bn:00006747n]
[bn:00020119n, ~i, 0.09063, bn:00048315n, gmono, 0.10159, bn:00008205n, gdis, 0.03076, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00044333n, gdis, 0.17857, bn:00044335n, @, 0.05689, bn:00006747n]
[bn:00020119n, ~, 0.0665, bn:00008810n, gdis, 0.08358, bn:00008205n, gdis, 0.03076, bn:00006747n]
[bn:00020119n, ~, 0.06707, bn:00035706n, gmono, 0.09436, bn:00035713n, @, 0.03338, bn:00006747n]
[bn:00020119n, gdis, 0.0116, bn:00073699n, r, 0.10336, bn:00006759n, r, 0.01589, bn:01228222n]
[bn:00020119n, ~, 0.06707, bn:00035706n, gdis, 0.09436, bn:00035713n, gdis, 0.03338, bn:00006747n]
[bn:00020119n, gdis, 0.03247, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00076524n, gdis, 0.10569, bn:00076528n, gmono, 0.02831, bn:00006747n]
[bn:00020119n, ~i, 0.09063, bn:00048315n, @i, 0.10159, bn:00008205n, @, 0.03076, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00076524n, gdis, 0.10569, bn:00076528n, @, 0.02831, bn:00006747n]
[bn:00020119n, ~, 0.0665, bn:00008810n, gmono, 0.08358, bn:00008205n, @, 0.03076, bn:00006747n]
[bn:00020119n, ~, 0.06078, bn:00021660n, gmono, 0.02708, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00044333n, gmono, 0.17857, bn:00044335n, gmono, 0.05689, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00008892n, gdis, 0.18966, bn:00008897n, gdis, 0.04801, bn:00006747n]
[bn:00020119n, ~i, 0.09063, bn:00048315n, gdis, 0.10159, bn:00008205n, gdis, 0.03076, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00008892n, gdis, 0.18966, bn:00008897n, @, 0.04801, bn:00006747n]
[bn:00020119n, ~i, 0.09063, bn:00048315n, @i, 0.10159, bn:00008205n, gmono, 0.03076, bn:00006747n]
[bn:00020119n, r, 0.30975, bn:00003403n, r, 0.01158, bn:02554145n]
[bn:00020119n, ~, 0.0665, bn:00008810n, gmono, 0.08358, bn:00008205n, gmono, 0.03076, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00076524n, gmono, 0.10569, bn:00076528n, gmono, 0.02831, bn:00006747n]
[bn:00020119n, ~, 0.06078, bn:00021660n, gdis, 0.02708, bn:00006747n]
[bn:00020119n, ~, 0.0665, bn:00008810n, gdis, 0.08358, bn:00008205n, @, 0.03076, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00008892n, gmono, 0.18966, bn:00008897n, @, 0.04801, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00008892n, gmono, 0.18966, bn:00008897n, gmono, 0.04801, bn:00006747n]
[bn:00020119n, ~i, 0.09063, bn:00048315n, gmono, 0.10159, bn:00008205n, gmono, 0.03076, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00044333n, gdis, 0.17857, bn:00044335n, gdis, 0.05689, bn:00006747n]
[bn:00020119n, ~i, 0.09063, bn:00048315n, gmono, 0.10159, bn:00008205n, @, 0.03076, bn:00006747n]
[bn:01228222n, r, 0.09407, bn:00020119n]
[bn:00006747n, ~, 0.03152, bn:00035713n, r, 0.05971, bn:00036014n, r, 0.02804, bn:00020119n]
[bn:00006747n, ~, 0.03182, bn:00008897n, ~, 0.10154, bn:00036014n, r, 0.02804, bn:00020119n]
[bn:02554145n, r, 0.01035, bn:00006547n, r, 0.01303, bn:00036014n, r, 0.02804, bn:00020119n]
[bn:00006747n, ~, 0.0187, bn:00074678n, r, 0.02084, bn:00020119n]
[bn:00006747n, ~, 0.02777, bn:00008205n, ~i, 0.03802, bn:00048315n, @i, 0.20541, bn:00020119n]
[bn:02554145n, r, 0.01035, bn:00006547n, r, 0.02294, bn:00074678n, r, 0.02084, bn:00020119n]
[bn:02554145n, r, 0.01137, bn:00003403n, r, 0.03647, bn:00051309n, r, 0.01701, bn:00020119n]。
From the disambiguation knowledge graph G and the semantic association path set R, the number of semantic association paths of each sense node can be obtained, as shown in Table 7.
Table 7: Number of semantic association paths of each sense node
Sense number (BabelNet) | Sense number (WordNet) | Number of paths
bn:00020121n | coach#n#4 | 24
bn:00016240n | coach#n#3 | 258
bn:00007329n | coach#n#5 | 222
bn:00020120n | coach#n#2 | 1
bn:00020119n | coach#n#1 | 57
bn:00006747n | athlete#n#1 | 52
bn:00090273v | leave#v#4 | 0
bn:00090275v | leave#v#12 | 0
bn:00088482v | leave#v#14 | 0
bn:00090271v | leave#v#2 | 0
bn:00087845v | leave#v#5 | 6
bn:00088939v | leave#v#1 | 0
bn:00083420v | leave#v#10 | 0
bn:00088821v | leave#v#13 | 0
bn:00087695v | leave#v#9 | 1
bn:00086604v | leave#v#8 | 0
bn:00090243v | leave#v#7 | 2
bn:00082540v | leave#v#6 | 0
bn:00090272v | leave#v#3 | 0
bn:00090274v | leave#v#11 | 0
bn:00070893n | Shanghai#n#1 | 11
bn:00066028n | train#n#1 | 496
bn:00037572n | train#n#6 | 1
bn:00077914n | train#n#5 | 0
bn:00077913n | train#n#4 | 2
bn:00015839n | train#n#3 | 12
bn:00074684n | train#n#2 | 0
The symbols in Table 7 have the same meanings as in Tables 1–5.
Step 5.2: for sense node s_i, according to formula (1), jointly determine its graph score from the semantic association path set R_{s_i} and the distances of the path endpoints on the dependency syntax tree.
(1)
Here p denotes a path in the semantic association path set R_{s_i}; e denotes an association edge in path p; w_e is the weight of association edge e. Each association edge e also carries a weight coefficient: for edges labeled "r" (i.e., the Wikipedia Relations type) the coefficient is 0.3, and for all other edge types it is 0.7. d_p denotes the distance on the dependency syntax tree between the words corresponding to the two endpoint concepts of path p; the weighting coefficient of this distance is set to 2. (Formula (1) appears as an image in the original publication; a reconstruction is given after its first definition above.)
Take the sense concept bn:00020119n as an example: from step 5.1, its path set R_{s_i} contains 57 paths in total.
First, the score contributed to bn:00020119n by each path is computed by formula (1). Take the path [bn:00020119n, ~, 0.06707, bn:00035706n, gdis, 0.09436, bn:00035713n, gmono, 0.03338, bn:00006747n] as an example: the length of this path is 3; its endpoint concepts bn:00020119n and bn:00006747n correspond to the words coach and athlete respectively, whose shortest distance d on the dependency syntax tree, as shown in Table 6, is 1. The score of this path for bn:00020119n is then computed by formula (1).
In like manner, the scores contributed to the sense concept bn:00020119n by the other paths in R_{s_i} can be computed in turn.
Summing the individual path scores according to formula (1) gives the total graph score of the sense concept bn:00020119n: 10.700425261762511.
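A sketch of the scoring of step 5.2, built on the reconstructed reading of formula (1) given in the Summary (an assumption, since the published equation is an image): each path contributes its coefficient-weighted edge-weight sum, attenuated by β·d, where d is the syntax-tree distance between the path's endpoint words and β = 2. The mapping `endpoint_word` from a BabelNet concept to its content word, and the `dist` table of step 3, are assumed given.

```python
BETA = 2.0

def alpha(rel):
    # Coefficient 0.3 for edges labeled "r" (Wikipedia Relations), 0.7 otherwise.
    return 0.3 if rel == "r" else 0.7

def path_score(path, dist, endpoint_word):
    # (relation, weight) pairs of the flat path representation used above.
    edges = [(path[i + 1], float(path[i + 2]))
             for i in range(0, len(path) - 1, 3)]
    w1, w2 = endpoint_word[path[0]], endpoint_word[path[-1]]
    d = dist[w1][w2]    # shortest syntax-tree distance (Table 6); d >= 1 here,
                        # since path endpoints belong to distinct content words
    return sum(alpha(rel) * w for rel, w in edges) / (BETA * d)

def graph_score(sense, R, dist, endpoint_word):
    """Steps 5.1 + 5.2: sum the contributions of every path in R that
    starts or ends at `sense`."""
    return sum(path_score(p, dist, endpoint_word)
               for p in R if p[0] == sense or p[-1] == sense)
```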
In like manner, the graph scores of the other sense nodes of the content word set W can be computed in turn, as shown in Table 8.
Table 8: Graph score of each sense node
Sense number (BabelNet) | Sense number (WordNet) | Graph score
bn:00020121n | coach#n#4 | 1.0082584099
bn:00016240n | coach#n#3 | 11.4882290706
bn:00007329n | coach#n#5 | 10.5894412402
bn:00020120n | coach#n#2 | 0.170904903
bn:00020119n | coach#n#1 | 13.3931907933
bn:00006747n | athlete#n#1 | 10.7004252618
bn:00090273v | leave#v#4 | 0
bn:00090275v | leave#v#12 | 0
bn:00088482v | leave#v#14 | 0
bn:00090271v | leave#v#2 | 0
bn:00087845v | leave#v#5 | 0.9645209914
bn:00088939v | leave#v#1 | 0
bn:00083420v | leave#v#10 | 0
bn:00088821v | leave#v#13 | 0
bn:00087695v | leave#v#9 | 0.170904903
bn:00086604v | leave#v#8 | 0
bn:00090243v | leave#v#7 | 0.4209186144
bn:00082540v | leave#v#6 | 0
bn:00090272v | leave#v#3 | 0
bn:00090274v | leave#v#11 | 0
bn:00070893n | Shanghai#n#1 | 0.3871979381
bn:00066028n | train#n#1 | 22.9460264215
bn:00037572n | train#n#6 | 0.0374394109
bn:00077914n | train#n#5 | 0
bn:00077913n | train#n#4 | 0.4209186144
bn:00015839n | train#n#3 | 0.5335290356
bn:00074684n | train#n#2 | 0
The symbols in Table 8 have the same meanings as in Tables 1–5.
Step 6: for each ambiguous word, select the sense with the largest graph score as the correct sense; specifically as follows.
For each ambiguous word, compare the graph scores of its candidate senses and select the sense with the largest graph score as the correct sense.
In this example, comparing the graph scores of the senses of each ambiguous word according to Table 8 shows that the correct sense of coach is bn:00020119n (coach#n#1), the correct sense of athlete is bn:00006747n (athlete#n#1), the correct sense of leave is bn:00087845v (leave#v#5), the correct sense of Shanghai is bn:00070893n (Shanghai#n#1), and the correct sense of train is bn:00066028n (train#n#1).
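Step 6 is then a simple argmax per ambiguous word; a sketch over a slice of the Table 8 data:

```python
scores = {   # graph scores from Table 8 (two of the five words shown)
    "coach": {"coach#n#1": 13.3931907933, "coach#n#3": 11.4882290706,
              "coach#n#5": 10.5894412402, "coach#n#4": 1.0082584099,
              "coach#n#2": 0.170904903},
    "train": {"train#n#1": 22.9460264215, "train#n#3": 0.5335290356,
              "train#n#4": 0.4209186144, "train#n#6": 0.0374394109},
}
for word, senses in scores.items():
    best = max(senses, key=senses.get)   # sense with the largest graph score
    print(word, "->", best)
# coach -> coach#n#1
# train -> train#n#1
```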
Through the above steps, full-text word sense disambiguation is completed.
Combining the original sentence with Tables 1–5 shows that the disambiguation results for all five content words are correct.
As described above, the invention provides a graph model word sense disambiguation method based on a dependency syntax tree. The user only needs to input a sentence, and the system automatically disambiguates all content words in the sentence according to the dependency syntax tree and the graph model.
The above specific description has explained the purpose, technical solution, and beneficial effects of the invention in detail. It should be understood that the above is only a specific embodiment of the invention and is not intended to limit the scope of protection of the invention; any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the invention shall be included within the scope of protection of the invention.

Claims (1)

1. A graph model word sense disambiguation method based on a dependency syntax tree, characterized in that its concrete operation steps are:
Step 1: preprocess the sentence and extract the content words to be disambiguated, mainly including normalization, tokenization, and lemmatization; specifically as follows;
Step 1.1: denote the sentence to be processed by the symbol S;
Step 1.2: preprocess sentence S, mainly including normalization and tokenization, to obtain the preprocessed sentence S';
Step 1.3: lemmatize the words in sentence S';
Step 1.4: extract the content words to be disambiguated in S' and store them in the content word set W;
Step 2: perform dependency parsing on the sentence and build its dependency syntax tree; specifically as follows;
Step 2.1: use a dependency parsing tool to parse sentence S' and obtain its dependency tuple set DSet;
Step 2.2: build the dependency syntax tree according to the tuple information in the dependency tuple set DSet;
Step 3: obtain the distance between words in the sentence on the dependency syntax tree, i.e., the length of the shortest path; specifically as follows;
treat the dependency syntax tree as an undirected graph; use Dijkstra's algorithm or the Floyd algorithm to compute the length of the shortest path between any two word nodes in the graph, obtaining the distance between words on the dependency syntax tree;
Step 4: build a disambiguation knowledge graph for the sense concepts of the content words in the sentence according to a knowledge base; specifically as follows;
Step 4.1: according to the BabelNet knowledge base, extract the semantic association paths between all content words in the sentence and build the semantic association path set R;
Step 4.2: build the disambiguation knowledge graph G from the semantic association path set R;
Step 5: compute the graph score of each sense node according to the semantic association paths between sense nodes in the disambiguation knowledge graph, the weights of the association edges, and the distances of the path endpoints on the dependency syntax tree; specifically as follows;
Step 5.1: for a sense node s_i, find all semantic association paths in the disambiguation knowledge graph G and the semantic association path set R that take it as start point or end point, and store them in the path set R_{s_i};
Step 5.2: for sense node s_i, according to formula (1), jointly determine its graph score from the semantic association path set R_{s_i} and the distances of the path endpoints on the dependency syntax tree;
(1)
here p denotes a path in the semantic association path set R_{s_i}; e denotes an association edge in path p; w_e is the weight of association edge e; each association edge e also carries a weight coefficient: for edges labeled "r" (i.e., the Wikipedia Relations type) the coefficient is 0.3, and for all other edge types it is 0.7; d_p denotes the distance on the dependency syntax tree between the words corresponding to the two endpoint concepts of path p; the weighting coefficient of this distance is set to 2;
Step 6: for each ambiguous word, select the sense with the largest graph score as the correct sense; specifically as follows;
for each ambiguous word, compare the graph scores of its candidate senses and select the sense with the largest graph score as the correct sense;
through the above steps, full-text word sense disambiguation is completed.
CN201610189859.0A 2016-03-30 2016-03-30 Graph model word sense disambiguation method based on dependency syntax tree Pending CN105893346A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610189859.0A CN105893346A (en) 2016-03-30 2016-03-30 Graph model word sense disambiguation method based on dependency syntax tree


Publications (1)

Publication Number Publication Date
CN105893346A true CN105893346A (en) 2016-08-24

Family

ID=57014391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610189859.0A Pending CN105893346A (en) 2016-03-30 2016-03-30 Graph model word sense disambiguation method based on dependency syntax tree

Country Status (1)

Country Link
CN (1) CN105893346A (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120143597A1 (en) * 2008-04-18 2012-06-07 Biz360 Inc. System and Methods for Evaluating Feature Opinions for Products, Services, and Entities
CN101510221A (en) * 2009-02-17 2009-08-19 北京大学 Enquiry statement analytical method and system for information retrieval

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
鹿文鹏: "基于依存和领域知识的词义消歧方法研究", 《中国博士学位论文全文数据库 信息科技辑》 *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271621A (en) * 2017-07-18 2019-01-25 腾讯科技(北京)有限公司 Semanteme disambiguates processing method, device and its equipment
CN109271621B (en) * 2017-07-18 2023-04-18 腾讯科技(北京)有限公司 Semantic disambiguation processing method, device and equipment
CN107656921A (en) * 2017-10-10 2018-02-02 上海数眼科技发展有限公司 A kind of short text dependency analysis method based on deep learning
CN107957991A (en) * 2017-12-05 2018-04-24 湖南星汉数智科技有限公司 A kind of entity attribute information extraction method and device relied on based on syntax
CN108446266B (en) * 2018-02-01 2022-03-22 创新先进技术有限公司 Statement splitting method, device and equipment
CN108446266A (en) * 2018-02-01 2018-08-24 阿里巴巴集团控股有限公司 A kind of method, apparatus and equipment that sentence is split
CN108664468A (en) * 2018-05-02 2018-10-16 武汉烽火普天信息技术有限公司 A kind of name recognition methods and device based on dictionary and semantic disambiguation
CN109359303A (en) * 2018-12-10 2019-02-19 枣庄学院 A kind of Word sense disambiguation method and system based on graph model
CN109614620A (en) * 2018-12-10 2019-04-12 齐鲁工业大学 A kind of graph model Word sense disambiguation method and system based on HowNet
CN109359303B (en) * 2018-12-10 2023-04-07 枣庄学院 Word sense disambiguation method and system based on graph model
CN109614620B (en) * 2018-12-10 2023-01-17 齐鲁工业大学 HowNet-based graph model word sense disambiguation method and system
CN110674640B (en) * 2019-09-25 2022-10-25 北京明略软件系统有限公司 Chinese name acquisition method, and training method and device of Chinese name extraction model
CN110674640A (en) * 2019-09-25 2020-01-10 北京明略软件系统有限公司 Chinese name acquisition method, and training method and device of Chinese name extraction model
CN112099764B (en) * 2020-08-13 2022-03-15 南京航空航天大学 Formal conversion rule-based avionics field requirement standardization method
CN112099764A (en) * 2020-08-13 2020-12-18 南京航空航天大学 Formal conversion rule-based avionics field requirement standardization method
CN112214999A (en) * 2020-09-30 2021-01-12 内蒙古科技大学 Word meaning disambiguation method and device based on combination of graph model and word vector


Legal Events

Code | Title
C06 | Publication
PB01 | Publication
C10 | Entry into substantive examination
SE01 | Entry into force of request for substantive examination
RJ01 | Rejection of invention patent application after publication (application publication date: 20160824)