CN105893346A - Graph model word sense disambiguation method based on dependency syntax tree - Google Patents
- Publication number: CN105893346A (application CN201610189859.0A)
- Authority: CN (China)
- Legal status: Pending (an assumption, not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
Abstract
The invention relates to a graph-model word sense disambiguation method based on a dependency syntax tree, and belongs to the technical field of natural language processing. The method comprises the following steps: (1) preprocess the sentence and extract the content words to be disambiguated, including normalization, tokenization, and lemmatization; (2) perform dependency parsing on the sentence and build its dependency syntax tree; (3) obtain the distance between words in the sentence on the dependency syntax tree, i.e., the length of the shortest path; (4) build a disambiguation knowledge graph for the sense concepts of the words in the sentence according to a knowledge base; (5) compute the graph score value of every sense node from the lengths of the semantic association paths between sense nodes in the disambiguation knowledge graph, the weights of the association edges, and the distances of the path endpoints on the dependency syntax tree; (6) for each ambiguous word, select the sense with the largest graph score value as its correct sense. The method further exploits the disambiguation potential of graph models and achieves a good disambiguation effect.
Description
Technical field
The present invention relates to a word sense disambiguation method, in particular to a graph-model word sense disambiguation method based on a dependency syntax tree, and belongs to the technical field of natural language processing.
Background technology
Word sense disambiguation (WSD) is the task of automatically determining the correct sense of an ambiguous word from the context in which it occurs. WSD is a foundational technology of natural language processing and directly affects tasks such as machine translation, information retrieval, question answering, and sentiment analysis.
Knowledge-based word sense disambiguation methods fall into two families: similarity-based methods and graph-model methods. The former judge the correct sense by comparing the similarity between each sense of the ambiguous word and its context words; the latter build a disambiguation knowledge graph over sense nodes according to a knowledge base, score the sense nodes with a graph-based node-importance measure, and select the correct sense accordingly. In recent years, graph models have drawn growing attention from researchers for their strong performance.
Whether similarity-based or graph-based, disambiguation performance depends on the related words in the context. Context words at different distances from the ambiguous word influence its sense to different degrees. How to reasonably reflect this distance effect in a graph model for word sense disambiguation is a problem in urgent need of a solution.
At present, most graph-model word sense disambiguation methods evaluate the importance of sense nodes with the PageRank algorithm. PageRank has achieved immense success in search engines, but that does not mean it is equally effective for word sense disambiguation. Designing an effective graph-model node-importance evaluation mechanism tailored to the particular demands of the word sense disambiguation task likewise remains an open problem.
Summary of the invention
The object of the present invention is to address the deficiencies of current word sense disambiguation methods by proposing a graph-model word sense disambiguation method based on the dependency syntax tree.
This object is achieved through the following technical solution.
A graph-model word sense disambiguation method based on the dependency syntax tree comprises the following steps.
Step 1: preprocess the sentence and extract the content words to be disambiguated, mainly including normalization, tokenization, and lemmatization. Specifically:
Step 1.1: let the symbol S denote the sentence to be processed.
Step 1.2: preprocess sentence S (normalization, tokenization, etc.) to obtain the preprocessed sentence S'.
Step 1.3: lemmatize the words in S'.
Step 1.4: extract the content words of S' to be disambiguated and store them in the content word set W.
Step 2: perform dependency parsing on the sentence and build its dependency syntax tree. Specifically:
Step 2.1: apply a dependency parser to sentence S' to obtain its dependency tuple set DSet.
Step 2.2: build the dependency syntax tree from the tuples in DSet.
Step 3: obtain the distances between words in the sentence on the dependency syntax tree, i.e., the lengths of the shortest paths. Specifically:
Treat the dependency syntax tree as an undirected graph; use Dijkstra's algorithm or the Floyd algorithm to compute the length of the shortest path between every two word nodes, giving the word distances on the dependency syntax tree.
Step 4: build the disambiguation knowledge graph for the sense concepts of the content words in the sentence according to the knowledge base. Specifically:
Step 4.1: according to the BabelNet knowledge base, extract the semantic association paths among all content words in the sentence, building the semantic association path set R.
Step 4.2: build the disambiguation knowledge graph G from the semantic association path set R.
Step 5: compute the graph score value of each sense node from the semantic association paths between sense nodes in the disambiguation knowledge graph, the weights of the association edges, and the distances of the path endpoints on the dependency syntax tree. Specifically:
Step 5.1: for a sense node s_i, search the disambiguation knowledge graph G and the semantic association path set R for all paths that have s_i as their start or end, and store them in the path set R_{s_i}.
Step 5.2: for the sense node s_i, determine its graph score value jointly from the path set R_{s_i} and the distances of the path endpoints on the dependency syntax tree, according to formula (1):

Score(s_i) = Σ_{p ∈ R_{s_i}} ( Σ_{e ∈ p} α_e · w_e ) / ( β · d_p )    (1)

where p denotes a path in the set R_{s_i}; e denotes an association edge on path p; w_e is the weight of edge e; α_e is the weight coefficient of edge e, which is 0.3 for edges labeled "r" (the Wikipedia Relations type) and 0.7 for edges of any other type; d_p is the distance on the dependency syntax tree between the words corresponding to the two endpoint concepts of path p; and β is the weighting coefficient of the distance, set to 2.
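Formula (1) itself did not survive in this text, so the sketch below implements one plausible reading consistent with the symbol definitions given above: each path contributes the α-weighted sum of its edge weights, damped by β times the dependency-tree distance of its endpoint words. The function names and the sample path are illustrative assumptions, not the patent's verbatim formula.

```python
def edge_coefficient(label):
    """alpha_e: 0.3 for Wikipedia-relation edges labelled 'r', else 0.7."""
    return 0.3 if label == "r" else 0.7

def path_score(path_edges, endpoint_distance, beta=2.0):
    """Score contributed by one semantic-association path.

    path_edges: list of (label, weight) pairs along the path.
    endpoint_distance: dependency-tree distance d_p between the words
    of the two endpoint concepts (always >= 1 for distinct words)."""
    weighted = sum(edge_coefficient(lbl) * w for lbl, w in path_edges)
    return weighted / (beta * endpoint_distance)

def graph_score(paths):
    """Graph score of a sense node: sum over its association paths.
    `paths` is a list of (path_edges, endpoint_distance) pairs."""
    return sum(path_score(e, d) for e, d in paths)

# One path from the embodiment: three non-'r' edges, endpoints at tree distance 1.
p = [("~", 0.06707), ("gdis", 0.09436), ("gmono", 0.03338)]
s = path_score(p, 1)
```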
Step 6: for each ambiguous word, select the sense with the largest graph score value as its correct sense. Specifically:
For each ambiguous word, compare the graph score values of its senses and select the sense with the maximum value as the correct one.
After the above steps, the full-text word sense disambiguation process is complete.
Beneficial effects
The graph-model word sense disambiguation method based on the dependency syntax tree proposed by the present invention builds a disambiguation knowledge graph over the sense concepts of the ambiguous words using the BabelNet knowledge base; it jointly considers the lengths of the sense association paths in the graph, the weights of the association edges, and the shortest distance on the dependency syntax tree between the words corresponding to the path endpoint concepts to determine the graph score value of each sense concept node, and selects for each ambiguous word the sense concept with the highest graph score value as its correct sense. Compared with traditional graph-model word sense disambiguation methods, the invention introduces the shortest distance between words on the dependency syntax tree and proposes a graph-model node-importance evaluation method that combines association path length with association edge weight. The invention can effectively reflect the influence of word distance on node-importance evaluation, evaluates the importance of sense nodes more comprehensively and accurately, and can thereby improve the effect of graph-model word sense disambiguation.
Brief description of the drawings
Fig. 1 is the dependency syntax tree in the embodiment of the invention.
Fig. 2 is the disambiguation knowledge graph in the embodiment of the invention.
Detailed description of the invention
The invention is described in further detail below with reference to the accompanying drawings and a specific embodiment.
Take the sentence "the coach and athletes will leave for Shanghai by train." as an example; full-text word sense disambiguation is performed on all of its content words, i.e., coach, athlete, leave, Shanghai, and train.
According to the BabelNet 1.0 dictionary, the senses of these content words are listed in Tables 1 ~ 5. The senses in the BabelNet dictionary derive from Wikipedia and WordNet. Because word sense disambiguation evaluations generally take the WordNet senses as the standard, for simplicity this example lists only the senses that come from WordNet.
Table 1: sense table of coach#n
Sense ID (BabelNet) | Gloss | Sense ID (WordNet) |
bn:00020121n | a carriage pulled by four horses with one driver | coach#n#4 |
bn:00016240n | a railcar where passengers ride | coach#n#3 |
bn:00007329n | a vehicle carrying many passengers; used for public transport; "he always rode the bus to work" | coach#n#5 |
bn:00020120n | a person who gives private instruction (as in singing, acting, etc.) | coach#n#2 |
bn:00020119n | (sports) someone in charge of training an athlete or a team | coach#n#1 |
Here bn denotes BabelNet; n and #n indicate that the part of speech is noun; #1 ~ #5 are the sense numbers in WordNet 3.0.
Table 2: sense table of athlete#n
Sense ID (BabelNet) | Gloss | Sense ID (WordNet) |
bn:00006747n | a person trained to compete in sports | athlete#n#1 |
Here bn denotes BabelNet; n and #n indicate that the part of speech is noun; #1 is the sense number in WordNet 3.0.
Table 3: sense table of leave#v
Sense ID (BabelNet) | Gloss | Sense ID (WordNet) |
bn:00090273v | leave unchanged or undisturbed or refrain from taking; "leave it as is"; "leave the young fawn alone"; "leave the flowers that you see in the park behind" | leave#v#4 |
bn:00090275v | be survived by after one's death; "He left six children"; "At her death, she left behind her husband and 11 cats" | leave#v#12 |
bn:00088482v | leave behind unintentionally; "I forgot my umbrella in the restaurant"; "I left my keys inside the car and locked the doors" | leave#v#14 |
bn:00090271v | go and leave behind, either intentionally or by neglect or forgetfulness; "She left a mess when she moved out"; "His good luck finally left him"; "her husband left her after 20 years of marriage"; "she wept thinking she had been left behind" | leave#v#2 |
bn:00087845v | move out of or depart from; "leave the room"; "the fugitive has left the country" | leave#v#5 |
bn:00088939v | go away from a place; "At what time does your train leave"; "She didn't leave until midnight"; "The ship leaves at midnight" | leave#v#1 |
bn:00083420v | leave or give by will after one's death; "My aunt bequeathed me all her jewelry"; "My grandfather left me his entire estate" | leave#v#10 |
bn:00088821v | transmit (knowledge or skills); "give a secret to the Russians"; "leave your name and address here"; "impart a new skill to the students" | leave#v#13 |
bn:00087695v | put into the care or protection of someone; "He left the decision to his deputy"; "leave your child the nurse's care" | leave#v#9 |
bn:00086604v | remove oneself from an association with or participation in; "She wants to leave"; "The teenager left home"; "She left her position with the Red Cross"; "He left the Senate after two terms"; "after 20 years with the same company, she pulled up stakes" | leave#v#8 |
bn:00090243v | have as a result or residue; "The water left a mark on the silk dress"; "Her blood left a stain on the napkin" | leave#v#7 |
bn:00082540v | make a possibility or provide opportunity for; permit to be attainable or cause to remain; "This leaves no room for improvement"; "The evidence allows only one conclusion"; "allow for mistakes"; "leave lots of time for the trip"; "This procedure provides for lots of leeway" | leave#v#6 |
bn:00090272v | act or be so as to become in a specified state; "The inflation left them penniless"; "The president's remarks left us speechless" | leave#v#3 |
bn:00090274v | have left or have as a remainder; "That left the four of us"; "19 minus 8 leaves 11" | leave#v#11 |
Here bn denotes BabelNet; v and #v indicate that the part of speech is verb; #1 ~ #14 are the sense numbers in WordNet 3.0.
Table 4: sense table of Shanghai#n
Sense ID (BabelNet) | Gloss | Sense ID (WordNet) |
bn:00070893n | the largest city of China; located in the east on the Pacific; one of the largest ports in the world | Shanghai#n#1 |
Here bn denotes BabelNet; n and #n indicate that the part of speech is noun; #1 is the sense number in WordNet 3.0.
Table 5: sense table of train#n
Sense ID (BabelNet) | Gloss | Sense ID (WordNet) |
bn:00066028n | public transport provided by a line of railway cars coupled together and drawn by a locomotive; "express trains don't stop at Princeton Junction" | train#n#1 |
bn:00037572n | wheelwork consisting of a connected set of rotating gears by which force is transmitted or motion or torque is changed; "the fool got his tie caught in the geartrain" | train#n#6 |
bn:00077914n | piece of cloth forming the long back section of a gown that is drawn along the floor; "the bride's train was carried by her two young nephews" | train#n#5 |
bn:00077913n | a series of consequences wrought by an event; "it led to a train of disasters" | train#n#4 |
bn:00015839n | a procession (of wagons or mules or camels) traveling together in single file; "we were part of a caravan of almost a thousand camels"; "they joined the wagon train for safety" | train#n#3 |
bn:00074684n | a sequentially ordered set of things or events or ideas in which each successive member is related to the preceding; "a string of islands"; "train of mourners"; "a train of thought" | train#n#2 |
Here bn denotes BabelNet; n and #n indicate that the part of speech is noun; #1 ~ #6 are the sense numbers in WordNet 3.0.
Step 1: preprocess the sentence and extract the content words to be disambiguated, mainly including normalization, tokenization, and lemmatization. Specifically:
Step 1.1: let the symbol S denote the sentence to be processed.
In this example, S = "the coach and athletes © will leave for Shanghai by 【train.".
Step 1.2: preprocess sentence S (normalization, tokenization, etc.) to obtain the preprocessed sentence S'.
In this example, S' = "the coach and athletes will leave for Shanghai by train ." (tokenized, with the noise characters removed).
Step 1.3: lemmatize the words in S'.
In this example, lemmatization is performed with WordNet 3.0 and the MorphAdorner toolkit provided by Northwestern University. Only one word is affected, "athletes", which is reduced to "athlete".
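The MorphAdorner lemmatiser itself is not shown here; as a stand-in, the toy function below strips only the regular English plural endings, which is all this example sentence needs (it reduces "athletes" to "athlete"). A real system would also consult the WordNet exception lists for irregular forms.

```python
def naive_lemmatize(token):
    """Toy stand-in for WordNet/MorphAdorner lemmatisation: strip the
    regular English noun-plural endings. Irregular plurals would need
    a dictionary lookup."""
    for suffix, repl in (("ies", "y"), ("sses", "ss"), ("s", "")):
        if token.endswith(suffix) and len(token) > len(suffix) + 1:
            return token[: -len(suffix)] + repl
    return token

words = ["the", "coach", "and", "athletes", "will", "leave",
         "for", "Shanghai", "by", "train"]
lemmas = [naive_lemmatize(w) for w in words]  # "athletes" -> "athlete"
```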
Step 1.4: extract the content words of S' to be disambiguated and store them in the content word set W.
In this example there are 5 content words to be disambiguated: coach, athlete, leave, Shanghai, and train.
Step 2: perform dependency parsing on the sentence and build its dependency syntax tree. Specifically:
Step 2.1: apply a dependency parser to sentence S' to obtain its dependency tuple set DSet.
In this example, the Stanford Parser is used with the englishPCFG.ser.gz language model and the CCPropagatedDependencies option, so that dependency relations are collapsed and propagated. Combined with the lemmatization information of step 1.3, the following dependency tuple set is obtained:
DSet = { det(coach-2, the-1), nsubj(leave-6, coach-2), conj_and(coach-2, athlete-4), nsubj(leave-6, athlete-4), aux(leave-6, will-5), prep_for(leave-6, Shanghai-8), prep_by(leave-6, train-10) }.
Step 2.2: build the dependency syntax tree from the tuples in DSet.
In this example, the dependency tuples in DSet yield the dependency syntax tree shown in Fig. 1.
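Building the tree from DSet amounts to reading each tuple relation(head-i, dependent-j) as an edge between head and dependent. The sketch below parses the Stanford-style tuple strings of this example; the regular expression assumes simple alphanumeric tokens and is illustrative only.

```python
import re

def parse_dependency_tuples(dset):
    """Turn Stanford-parser style tuples like 'nsubj(leave-6, coach-2)'
    into (relation, head, dependent) triples, dropping the token indices."""
    pat = re.compile(r"(\w+)\((.+?)-\d+,\s*(.+?)-\d+\)")
    triples = []
    for t in dset:
        m = pat.match(t)
        if m:
            triples.append((m.group(1), m.group(2), m.group(3)))
    return triples

dset = ["det(coach-2, the-1)", "nsubj(leave-6, coach-2)",
        "conj_and(coach-2, athlete-4)", "nsubj(leave-6, athlete-4)",
        "aux(leave-6, will-5)", "prep_for(leave-6, Shanghai-8)",
        "prep_by(leave-6, train-10)"]
triples = parse_dependency_tuples(dset)
edges = [(h, d) for _, h, d in triples]  # tree edges, ignoring the relation
```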
Step 3: obtain the distances between words in the sentence on the dependency syntax tree, i.e., the lengths of the shortest paths. Specifically:
Treat the dependency syntax tree as an undirected graph; use Dijkstra's algorithm or the Floyd algorithm to compute the length of the shortest path between every two word nodes, giving the word distances on the dependency syntax tree.
In this example, Fig. 1 is treated as an undirected graph and Dijkstra's algorithm is run from each node in turn; the resulting shortest-path lengths are shown in Table 6.
Table 6: shortest-path lengths between word nodes
| the | coach | athlete | will | leave | Shanghai | train |
the | 0 | 1 | 2 | 3 | 2 | 3 | 3 |
coach | 1 | 0 | 1 | 2 | 1 | 2 | 2 |
athlete | 2 | 1 | 0 | 2 | 1 | 2 | 2 |
will | 3 | 2 | 2 | 0 | 1 | 2 | 2 |
leave | 2 | 1 | 1 | 1 | 0 | 1 | 1 |
Shanghai | 3 | 2 | 2 | 2 | 1 | 0 | 2 |
train | 3 | 2 | 2 | 2 | 1 | 2 | 0 |
As Table 6 shows, because Fig. 1 is treated as an undirected graph, the word-distance matrix is symmetric about its diagonal.
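With unit-weight edges, a breadth-first search from each node reproduces the Dijkstra/Floyd shortest-path lengths of Table 6. A minimal sketch over the edges read off DSet, treating the structure of Fig. 1 as an undirected graph:

```python
from collections import deque

def all_pairs_tree_distance(edges):
    """Breadth-first search from every node; with unit-weight edges this
    reproduces the Dijkstra/Floyd shortest-path lengths of step 3."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    table = {}
    for src in adj:
        d = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    q.append(v)
        table[src] = d
    return table

# Undirected edges read off the dependency tuple set DSet of step 2.1.
edges = [("coach", "the"), ("leave", "coach"), ("coach", "athlete"),
         ("leave", "athlete"), ("leave", "will"),
         ("leave", "Shanghai"), ("leave", "train")]
dist = all_pairs_tree_distance(edges)  # matches Table 6
```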
Step 4: build the disambiguation knowledge graph for the sense concepts of the content words in the sentence according to the knowledge base. Specifically:
Step 4.1: according to the BabelNet knowledge base, extract the semantic association paths among all content words in the sentence, building the semantic association path set R.
BabelNet contains richer sense concepts and semantic association relations than WordNet. To give full play to this advantage, the invention extracts the semantic association relations among all BabelNet sense concepts of all content words in the sentence. To ensure the quality of the extracted relations, paths longer than 3 edges, paths containing a cycle, and paths containing an association edge whose weight is less than 0.01 are discarded. For the BabelNet sense concepts of the 5 content words in this example, 1162 semantic association paths in total satisfy these conditions; some of them are listed below.
[bn:00006747n, ~, 0.03152, bn:00035713n, r, 0.05971, bn:00036014n, r,
0.02804, bn:00020119n]
[bn:00006747n, ~, 0.03182, bn:00008897n, ~, 0.10154, bn:00036014n, r,
0.02804, bn:00020119n]
[bn:00006747n, ~, 0.0187, bn:00074678n, r, 0.02084, bn:00020119n]
[bn:00066028n, gdis, 0.04991, bn:00015785n, ~, 0.0556, bn:00036420n, r,
0.11841, bn:00016240n]
[bn:00020119n, gmono, 0.03247, bn:00006747n]
[bn:00020119n, ~i, 0.09063, bn:00048315n, gdis, 0.10159, bn:00008205n,
gmono, 0.03076, bn:00006747n]
[bn:00808723n, r, 0.04456, bn:00045278n, @, 0.05508, bn:00008205n, @,
0.03076, bn:00006747n]
[bn:00020119n, ~, 0.0665, bn:00008810n, gmono, 0.08358, bn:00008205n, @,
0.03076, bn:00006747n]
[bn:02554145n, r, 0.01137, bn:00003403n, r, 0.03647, bn:00051309n, r,
0.01701, bn:00020119n]
[bn:00808723n, r, 0.02219, bn:00008805n, r, 0.03697, bn:00003403n, r,
0.01158, bn:02554145n]。
Take the first path as an example. Its length is 3 and it contains four sense nodes; the endpoints bn:00006747n and bn:00020119n correspond to two content words of the sentence (athlete and coach, respectively), while bn:00035713n and bn:00036014n are intermediate nodes of the path. The symbols ~ and r denote different semantic association relations, and 0.03152, 0.05971, and 0.02804 are the weights of the association edges.
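The bracketed paths above alternate sense node, relation label, and edge weight. A small parser for that layout, together with the filtering rules of step 4.1 (length at most 3 edges, no cycles, minimum edge weight 0.01), might look as follows; the function names are illustrative.

```python
def parse_association_path(items):
    """Split a BabelNet association path of the form
    [node, rel, weight, node, rel, weight, ..., node]
    into its node sequence and its (rel, weight) edge list."""
    nodes = [items[0]]
    edges = []
    for i in range(1, len(items), 3):
        rel, weight, node = items[i], float(items[i + 1]), items[i + 2]
        edges.append((rel, weight))
        nodes.append(node)
    return nodes, edges

def keep_path(nodes, edges, max_len=3, min_weight=0.01):
    """Filtering rules of step 4.1: at most `max_len` edges, no repeated
    node (no cycle), and every edge weight at least `min_weight`."""
    return (len(edges) <= max_len
            and len(set(nodes)) == len(nodes)
            and all(w >= min_weight for _, w in edges))

# The first listed path of the embodiment.
p = ["bn:00006747n", "~", 0.03152, "bn:00035713n", "r", 0.05971,
     "bn:00036014n", "r", 0.02804, "bn:00020119n"]
nodes, edges = parse_association_path(p)
```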
Step 4.2: build the disambiguation knowledge graph G from the semantic association path set R.
In this example, the path set R yields the disambiguation knowledge graph shown in Fig. 2. Fig. 2 is only a schematic diagram; it depicts only a small fraction of the semantic association relations contained in R.
Step 5: compute the graph score value of each sense node from the semantic association paths between sense nodes in the disambiguation knowledge graph, the weights of the association edges, and the distances of the path endpoints on the dependency syntax tree. Specifically:
Step 5.1: for a sense node s_i, search the disambiguation knowledge graph G and the semantic association path set R for all paths that have s_i as their start or end, and store them in the path set R_{s_i}.
In this example, comparing the start and end of each path against G and R yields the association paths relevant to each sense node s_i.
Take the sense concept bn:00020119n as an example: it has 57 such paths in total, listed below.
[bn:00020119n, ~, 0.06707, bn:00035706n, gdis, 0.09436, bn:00035713n,
gmono, 0.03338, bn:00006747n]
[bn:00020119n, +, 0.0766, bn:00085223v, gdis, 0.01403, bn:00006759n, r,
0.01589, bn:01228222n]
[bn:00020119n, ~, 0.07073, bn:00008892n, gmono, 0.18966, bn:00008897n,
gdis, 0.04801, bn:00006747n]
[bn:00020119n, ~, 0.0665, bn:00008810n, gdis, 0.08358, bn:00008205n,
gmono, 0.03076, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00076524n, gdis, 0.10569, bn:00076528n,
gdis, 0.02831, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00076524n, gmono, 0.10569, bn:00076528n, @,
0.02831, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00008892n, gdis, 0.18966, bn:00008897n,
gmono, 0.04801, bn:00006747n]
[bn:00020119n, gmono, 0.03247, bn:00006747n]
[bn:00020119n, ~i, 0.09063, bn:00048315n, gdis, 0.10159, bn:00008205n,
gmono, 0.03076, bn:00006747n]
[bn:00020119n, r, 0.10964, bn:01228222n]
[bn:00020119n, ~, 0.06707, bn:00035706n, gmono, 0.09436, bn:00035713n,
gdis, 0.03338, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00044333n, gmono, 0.17857, bn:00044335n,
gdis, 0.05689, bn:00006747n]
[bn:00020119n, ~i, 0.09063, bn:00048315n, gdis, 0.10159, bn:00008205n, @,
0.03076, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00044333n, gmono, 0.17857, bn:00044335n, @,
0.05689, bn:00006747n]
[bn:00020119n, gdis, 0.0766, bn:00085223v, gdis, 0.01403, bn:00006759n,
r, 0.01589, bn:01228222n]
[bn:00020119n, r, 0.30744, bn:00006547n, r, 0.02294, bn:00074678n, @,
0.3871, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00076524n, gmono, 0.10569, bn:00076528n,
gdis, 0.02831, bn:00006747n]
[bn:00020119n, ~, 0.06707, bn:00035706n, gmono, 0.09436, bn:00035713n,
gmono, 0.03338, bn:00006747n]
[bn:00020119n, ~, 0.0665, bn:00008810n, gmono, 0.08358, bn:00008205n,
gdis, 0.03076, bn:00006747n]
[bn:00020119n, ~i, 0.09063, bn:00048315n, @i, 0.10159, bn:00008205n,
gdis, 0.03076, bn:00006747n]
[bn:00020119n, ~, 0.06707, bn:00035706n, gdis, 0.09436, bn:00035713n, @,
0.03338, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00044333n, gdis, 0.17857, bn:00044335n,
gmono, 0.05689, bn:00006747n]
[bn:00020119n, ~i, 0.09063, bn:00048315n, gmono, 0.10159, bn:00008205n,
gdis, 0.03076, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00044333n, gdis, 0.17857, bn:00044335n, @,
0.05689, bn:00006747n]
[bn:00020119n, ~, 0.0665, bn:00008810n, gdis, 0.08358, bn:00008205n,
gdis, 0.03076, bn:00006747n]
[bn:00020119n, ~, 0.06707, bn:00035706n, gmono, 0.09436, bn:00035713n, @,
0.03338, bn:00006747n]
[bn:00020119n, gdis, 0.0116, bn:00073699n, r, 0.10336, bn:00006759n, r,
0.01589, bn:01228222n]
[bn:00020119n, ~, 0.06707, bn:00035706n, gdis, 0.09436, bn:00035713n,
gdis, 0.03338, bn:00006747n]
[bn:00020119n, gdis, 0.03247, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00076524n, gdis, 0.10569, bn:00076528n,
gmono, 0.02831, bn:00006747n]
[bn:00020119n, ~i, 0.09063, bn:00048315n, @i, 0.10159, bn:00008205n, @,
0.03076, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00076524n, gdis, 0.10569, bn:00076528n, @,
0.02831, bn:00006747n]
[bn:00020119n, ~, 0.0665, bn:00008810n, gmono, 0.08358, bn:00008205n, @,
0.03076, bn:00006747n]
[bn:00020119n, ~, 0.06078, bn:00021660n, gmono, 0.02708, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00044333n, gmono, 0.17857, bn:00044335n,
gmono, 0.05689, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00008892n, gdis, 0.18966, bn:00008897n,
gdis, 0.04801, bn:00006747n]
[bn:00020119n, ~i, 0.09063, bn:00048315n, gdis, 0.10159, bn:00008205n,
gdis, 0.03076, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00008892n, gdis, 0.18966, bn:00008897n, @,
0.04801, bn:00006747n]
[bn:00020119n, ~i, 0.09063, bn:00048315n, @i, 0.10159, bn:00008205n,
gmono, 0.03076, bn:00006747n]
[bn:00020119n, r, 0.30975, bn:00003403n, r, 0.01158, bn:02554145n]
[bn:00020119n, ~, 0.0665, bn:00008810n, gmono, 0.08358, bn:00008205n,
gmono, 0.03076, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00076524n, gmono, 0.10569, bn:00076528n,
gmono, 0.02831, bn:00006747n]
[bn:00020119n, ~, 0.06078, bn:00021660n, gdis, 0.02708, bn:00006747n]
[bn:00020119n, ~, 0.0665, bn:00008810n, gdis, 0.08358, bn:00008205n, @,
0.03076, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00008892n, gmono, 0.18966, bn:00008897n, @,
0.04801, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00008892n, gmono, 0.18966, bn:00008897n,
gmono, 0.04801, bn:00006747n]
[bn:00020119n, ~i, 0.09063, bn:00048315n, gmono, 0.10159, bn:00008205n,
gmono, 0.03076, bn:00006747n]
[bn:00020119n, ~, 0.07073, bn:00044333n, gdis, 0.17857, bn:00044335n,
gdis, 0.05689, bn:00006747n]
[bn:00020119n, ~i, 0.09063, bn:00048315n, gmono, 0.10159, bn:00008205n,
@, 0.03076, bn:00006747n]
[bn:01228222n, r, 0.09407, bn:00020119n]
[bn:00006747n, ~, 0.03152, bn:00035713n, r, 0.05971, bn:00036014n, r,
0.02804, bn:00020119n]
[bn:00006747n, ~, 0.03182, bn:00008897n, ~, 0.10154, bn:00036014n, r,
0.02804, bn:00020119n]
[bn:02554145n, r, 0.01035, bn:00006547n, r, 0.01303, bn:00036014n, r,
0.02804, bn:00020119n]
[bn:00006747n, ~, 0.0187, bn:00074678n, r, 0.02084, bn:00020119n]
[bn:00006747n, ~, 0.02777, bn:00008205n, ~i, 0.03802, bn:00048315n, @i,
0.20541, bn:00020119n]
[bn:02554145n, r, 0.01035, bn:00006547n, r, 0.02294, bn:00074678n, r,
0.02084, bn:00020119n]
[bn:02554145n, r, 0.01137, bn:00003403n, r, 0.03647, bn:00051309n, r,
0.01701, bn:00020119n]。
From the disambiguation knowledge graph G and the semantic association path set R, the number of semantic association paths of each sense node is obtained, as shown in Table 7.
Table 7: number of semantic association paths per sense node
Sense ID (BabelNet) | Sense ID (WordNet) | Number of paths |
bn:00020121n | coach#n#4 | 24 |
bn:00016240n | coach#n#3 | 258 |
bn:00007329n | coach#n#5 | 222 |
bn:00020120n | coach#n#2 | 1 |
bn:00020119n | coach#n#1 | 57 |
bn:00006747n | athlete#n#1 | 52 |
bn:00090273v | leave#v#4 | 0 |
bn:00090275v | leave#v#12 | 0 |
bn:00088482v | leave#v#14 | 0 |
bn:00090271v | leave#v#2 | 0 |
bn:00087845v | leave#v#5 | 6 |
bn:00088939v | leave#v#1 | 0 |
bn:00083420v | leave#v#10 | 0 |
bn:00088821v | leave#v#13 | 0 |
bn:00087695v | leave#v#9 | 1 |
bn:00086604v | leave#v#8 | 0 |
bn:00090243v | leave#v#7 | 2 |
bn:00082540v | leave#v#6 | 0 |
bn:00090272v | leave#v#3 | 0 |
bn:00090274v | leave#v#11 | 0 |
bn:00070893n | Shanghai#n#1 | 11 |
bn:00066028n | train#n#1 | 496 |
bn:00037572n | train#n#6 | 1 |
bn:00077914n | train#n#5 | 0 |
bn:00077913n | train#n#4 | 2 |
bn:00015839n | train#n#3 | 12 |
bn:00074684n | train#n#2 | 0 |
The symbols in Table 7 have the same meanings as in Tables 1 ~ 5.
Step 5.2: for the sense node s_i, determine its graph score value jointly from the path set R_{s_i} and the distances of the path endpoints on the dependency syntax tree, according to formula (1):

Score(s_i) = Σ_{p ∈ R_{s_i}} ( Σ_{e ∈ p} α_e · w_e ) / ( β · d_p )    (1)

where p denotes a path in the set R_{s_i}; e denotes an association edge on path p; w_e is the weight of edge e; α_e is the weight coefficient of edge e, which is 0.3 for edges labeled "r" (the Wikipedia Relations type) and 0.7 for edges of any other type; d_p is the distance on the dependency syntax tree between the words corresponding to the two endpoint concepts of path p; and β is the weighting coefficient of the distance, set to 2.
Take the sense concept bn:00020119n as an example. From step 5.1, its path set R_{s_i} contains 57 paths in total.
First, the score contributed by each path to bn:00020119n is computed by formula (1). Consider the path [bn:00020119n, ~, 0.06707, bn:00035706n, gdis, 0.09436, bn:00035713n, gmono, 0.03338, bn:00006747n]: its length is 3, and its endpoint concepts bn:00020119n and bn:00006747n correspond to the words coach and athlete, whose shortest distance on the dependency syntax tree, according to Table 6, is 1; its score for bn:00020119n then follows from formula (1).
Likewise, the scores of the other paths in R_{s_i} for the sense concept bn:00020119n are computed in turn. Summing the individual scores as in formula (1) gives its total graph score value, 10.700425261762511.
Likewise, the graph score values of the sense nodes corresponding to the other content words in W are computed in turn, as shown in Table 8.
Table 8: graph score value of each sense node
Sense ID (BabelNet) | Sense ID (WordNet) | Graph score value |
bn:00020121n | coach#n#4 | 1.0082584099 |
bn:00016240n | coach#n#3 | 11.4882290706 |
bn:00007329n | coach#n#5 | 10.5894412402 |
bn:00020120n | coach#n#2 | 0.170904903 |
bn:00020119n | coach#n#1 | 13.3931907933 |
bn:00006747n | athlete#n#1 | 10.7004252618 |
bn:00090273v | leave#v#4 | 0 |
bn:00090275v | leave#v#12 | 0 |
bn:00088482v | leave#v#14 | 0 |
bn:00090271v | leave#v#2 | 0 |
bn:00087845v | leave#v#5 | 0.9645209914 |
bn:00088939v | leave#v#1 | 0 |
bn:00083420v | leave#v#10 | 0 |
bn:00088821v | leave#v#13 | 0 |
bn:00087695v | leave#v#9 | 0.170904903 |
bn:00086604v | leave#v#8 | 0 |
bn:00090243v | leave#v#7 | 0.4209186144 |
bn:00082540v | leave#v#6 | 0 |
bn:00090272v | leave#v#3 | 0 |
bn:00090274v | leave#v#11 | 0 |
bn:00070893n | Shanghai#n#1 | 0.3871979381 |
bn:00066028n | train#n#1 | 22.9460264215 |
bn:00037572n | train#n#6 | 0.0374394109 |
bn:00077914n | train#n#5 | 0 |
bn:00077913n | train#n#4 | 0.4209186144 |
bn:00015839n | train#n#3 | 0.5335290356 |
bn:00074684n | train#n#2 | 0 |
The symbols in Table 8 have the same meanings as in Tables 1 to 5.
Step 6: for each ambiguous word, select the sense with the largest graph score as its correct sense. Specifically:
For each ambiguous word, compare the graph scores of its candidate senses and choose the sense with the maximum score as the correct sense.
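This step reduces to an argmax over each ambiguous word's candidate senses. A minimal sketch, using a subset of the graph scores from Table 8:

```python
# Graph scores from Table 8, grouped by ambiguous word (subset shown).
scores = {
    "coach": {"coach#n#1": 13.3931907933, "coach#n#3": 11.4882290706,
              "coach#n#5": 10.5894412402, "coach#n#2": 0.170904903},
    "train": {"train#n#1": 22.9460264215, "train#n#3": 0.5335290356,
              "train#n#6": 0.0374394109},
}

def pick_senses(scores):
    """Step 6: for each word, choose the sense with the largest graph score."""
    return {word: max(senses, key=senses.get) for word, senses in scores.items()}

print(pick_senses(scores))  # {'coach': 'coach#n#1', 'train': 'train#n#1'}
```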
In this example, comparing the graph scores of each ambiguous word's senses in Table 8 shows that: the correct sense of coach is bn:00020119n (coach#n#1), the correct sense of athlete is bn:00006747n (athlete#n#1), the correct sense of leave is bn:00087845v (leave#v#5), the correct sense of Shanghai is bn:00070893n (Shanghai#n#1), and the correct sense of train is bn:00066028n (train#n#1).
After the above steps, word sense disambiguation of the full text is complete.
Checking against the original sentence and Tables 1 to 5 confirms that the disambiguation results for all five content words are correct.
As described above, the invention provides a graph-model word sense disambiguation method based on the dependency syntax tree. The user only needs to input a sentence; the system automatically disambiguates all content words in the sentence according to the dependency syntax tree and the graph model.
The specific description above details the purpose, technical scheme, and beneficial effects of the invention. It should be understood that the foregoing is merely a specific embodiment of the present invention and is not intended to limit its protection scope; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (1)
1. A graph-model word sense disambiguation method based on a dependency syntax tree, characterized in that its specific operation steps are:
Step 1: preprocess the sentence and extract the content words to be disambiguated, mainly including normalization, tokenization, and lemmatization; specifically:
Step 1.1: let S denote the sentence to be processed;
Step 1.2: preprocess sentence S, mainly including normalization and tokenization, to obtain the preprocessed sentence S';
Step 1.3: lemmatize the words in sentence S';
Step 1.4: extract the content words in S' to be disambiguated and store them in the content-word set W;
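Steps 1.1 to 1.4 can be sketched as follows. This is a minimal illustration in plain Python: a real system would use an NLP toolkit for tokenization and lemmatization, the tiny lemma table and stop-word list are hypothetical stand-ins, and the sentence is reconstructed from the five content words the example disambiguates.

```python
import re

# Hypothetical stand-ins for a real lemmatizer and a content-word filter.
LEMMAS = {"leaves": "leave", "coaches": "coach", "trains": "train"}
STOP_WORDS = {"the", "a", "an", "for", "to", "of", "by", "will"}

def preprocess(sentence):
    """Step 1.2: normalization (lowercasing) and tokenization."""
    normalized = sentence.lower().strip()
    return re.findall(r"[a-z]+", normalized)

def lemmatize(tokens):
    """Step 1.3: reduce each token to its lemma."""
    return [LEMMAS.get(t, t) for t in tokens]

def content_words(tokens):
    """Step 1.4: keep the content words to be disambiguated (set W)."""
    return [t for t in tokens if t not in STOP_WORDS]

S = "The coach of the athlete leaves for Shanghai by train."
W = content_words(lemmatize(preprocess(S)))
print(W)  # ['coach', 'athlete', 'leave', 'shanghai', 'train']
```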
Step 2: perform dependency parsing on the sentence and build its dependency syntax tree; specifically:
Step 2.1: use a dependency parsing tool to parse sentence S', obtaining its dependency tuple set DSet;
Step 2.2: build the dependency syntax tree from the tuple information in DSet;
Step 3: obtain the distances between the words of the sentence on the dependency syntax tree, i.e., the lengths of the shortest paths; specifically:
treat the dependency syntax tree as an undirected graph; use Dijkstra's algorithm or the Floyd-Warshall algorithm to compute the shortest-path length between any two word nodes in the graph, which gives the distance between words on the dependency syntax tree;
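Steps 2 and 3 can be sketched together: the dependency tuples form a tree, which is treated as an undirected graph, and because a tree is unweighted a plain breadth-first search yields the same shortest-path lengths as Dijkstra or Floyd-Warshall. The dependency tuples below are a hypothetical parse, chosen only to be consistent with the coach-athlete distance of 1 used in the worked example.

```python
from collections import deque, defaultdict

# Hypothetical dependency tuples (head, relation, dependent); a real DSet
# would come from a dependency parsing tool.
DSET = [("leave", "nsubj", "coach"), ("coach", "nmod", "athlete"),
        ("leave", "nmod", "Shanghai"), ("leave", "nmod", "train")]

def build_tree(dset):
    """Step 2.2: build the dependency tree as an undirected adjacency map."""
    adj = defaultdict(set)
    for head, _rel, dep in dset:
        adj[head].add(dep)
        adj[dep].add(head)
    return adj

def tree_distance(adj, src, dst):
    """Step 3: shortest-path length between two words (BFS on the tree)."""
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node == dst:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None  # unreachable; cannot happen within one tree

adj = build_tree(DSET)
print(tree_distance(adj, "coach", "athlete"))  # 1
print(tree_distance(adj, "athlete", "train"))  # 3
```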
Step 4: according to the knowledge base, build a disambiguation knowledge graph for the word-sense concepts of the content words in the sentence; specifically:
Step 4.1: according to the BabelNet knowledge base, extract the semantic association paths between all content words in the sentence, building the semantic-association path set R;
Step 4.2: build the disambiguation knowledge graph G from the semantic-association path set R;
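Step 4.2 can be sketched as follows: each semantic association path is a sequence of sense concepts separated by labeled, weighted edges, and the disambiguation graph is their union. The path encoding mirrors the bracketed paths shown in step 5 (sense, label, weight, sense, ...); the concrete values are taken from the worked example, and the single-path set here is illustrative only.

```python
# A semantic association path alternates sense IDs with (label, weight)
# edges, mirroring the bracketed paths in the description, e.g.
# [bn:00020119n, ~, 0.06707, bn:00035706n, gdis, 0.09436, ...].
R = [
    ["bn:00020119n", ("~", 0.06707), "bn:00035706n",
     ("gdis", 0.09436), "bn:00035713n",
     ("gmono", 0.03338), "bn:00006747n"],
]

def build_graph(paths):
    """Step 4.2: merge all association paths into graph G (edge -> label, weight)."""
    graph = {}
    for path in paths:
        senses, edges = path[0::2], path[1::2]
        for (u, v), (label, weight) in zip(zip(senses, senses[1:]), edges):
            graph[(u, v)] = (label, weight)
    return graph

G = build_graph(R)
print(len(G))                               # 3 edges
print(G[("bn:00020119n", "bn:00035706n")])  # ('~', 0.06707)
```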
Step 5: compute the graph score of each word-sense node from the semantic association paths in the disambiguation knowledge graph, the weights of the association edges, and the distances of the path endpoints on the dependency syntax tree; specifically:
Step 5.1: for each word-sense node s i , find in the disambiguation knowledge graph G and the semantic-association path set R all semantic association paths that start or end at s i , and store them in a path set;
Step 5.2: for each word-sense node s i , determine its graph score according to formula (1), jointly from the semantic-association path set and the distances of the path endpoints on the dependency syntax tree;
(1)
where p denotes a path in the semantic-association path set, and e denotes an association edge on path p; w e is the weight of edge e; each edge also carries a weight coefficient: for edges labeled "r" (i.e., of Wikipedia-relation type) the coefficient is 0.3, and for all other edge types it is 0.7; d denotes the distance on the dependency syntax tree between the words corresponding to the two endpoint concepts of path p; the weighting coefficient applied to d is set to 2;
Step 6: for each ambiguous word, select the sense with the largest graph score as the correct sense; specifically:
for each ambiguous word, compare the graph scores of its candidate senses and choose the sense with the maximum score as the correct sense;
after the above steps, word sense disambiguation of the full text is complete.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610189859.0A CN105893346A (en) | 2016-03-30 | 2016-03-30 | Graph model word sense disambiguation method based on dependency syntax tree |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105893346A true CN105893346A (en) | 2016-08-24 |
Family
ID=57014391
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610189859.0A Pending CN105893346A (en) | 2016-03-30 | 2016-03-30 | Graph model word sense disambiguation method based on dependency syntax tree |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105893346A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107656921A (en) * | 2017-10-10 | 2018-02-02 | 上海数眼科技发展有限公司 | A kind of short text dependency analysis method based on deep learning |
CN107957991A (en) * | 2017-12-05 | 2018-04-24 | 湖南星汉数智科技有限公司 | A kind of entity attribute information extraction method and device relied on based on syntax |
CN108446266A (en) * | 2018-02-01 | 2018-08-24 | 阿里巴巴集团控股有限公司 | A kind of method, apparatus and equipment that sentence is split |
CN108664468A (en) * | 2018-05-02 | 2018-10-16 | 武汉烽火普天信息技术有限公司 | A kind of name recognition methods and device based on dictionary and semantic disambiguation |
CN109271621A (en) * | 2017-07-18 | 2019-01-25 | 腾讯科技(北京)有限公司 | Semanteme disambiguates processing method, device and its equipment |
CN109359303A (en) * | 2018-12-10 | 2019-02-19 | 枣庄学院 | A kind of Word sense disambiguation method and system based on graph model |
CN109614620A (en) * | 2018-12-10 | 2019-04-12 | 齐鲁工业大学 | A kind of graph model Word sense disambiguation method and system based on HowNet |
CN110674640A (en) * | 2019-09-25 | 2020-01-10 | 北京明略软件系统有限公司 | Chinese name acquisition method, and training method and device of Chinese name extraction model |
CN112099764A (en) * | 2020-08-13 | 2020-12-18 | 南京航空航天大学 | Formal conversion rule-based avionics field requirement standardization method |
CN112214999A (en) * | 2020-09-30 | 2021-01-12 | 内蒙古科技大学 | Word meaning disambiguation method and device based on combination of graph model and word vector |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101510221A (en) * | 2009-02-17 | 2009-08-19 | 北京大学 | Enquiry statement analytical method and system for information retrieval |
US20120143597A1 (en) * | 2008-04-18 | 2012-06-07 | Biz360 Inc. | System and Methods for Evaluating Feature Opinions for Products, Services, and Entities |
- 2016-03-30 CN CN201610189859.0A patent/CN105893346A/en active Pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120143597A1 (en) * | 2008-04-18 | 2012-06-07 | Biz360 Inc. | System and Methods for Evaluating Feature Opinions for Products, Services, and Entities |
CN101510221A (en) * | 2009-02-17 | 2009-08-19 | 北京大学 | Enquiry statement analytical method and system for information retrieval |
Non-Patent Citations (1)
Title |
---|
鹿文鹏: "基于依存和领域知识的词义消歧方法研究", 《中国博士学位论文全文数据库 信息科技辑》 * |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109271621A (en) * | 2017-07-18 | 2019-01-25 | 腾讯科技(北京)有限公司 | Semanteme disambiguates processing method, device and its equipment |
CN109271621B (en) * | 2017-07-18 | 2023-04-18 | 腾讯科技(北京)有限公司 | Semantic disambiguation processing method, device and equipment |
CN107656921A (en) * | 2017-10-10 | 2018-02-02 | 上海数眼科技发展有限公司 | A kind of short text dependency analysis method based on deep learning |
CN107957991A (en) * | 2017-12-05 | 2018-04-24 | 湖南星汉数智科技有限公司 | A kind of entity attribute information extraction method and device relied on based on syntax |
CN108446266B (en) * | 2018-02-01 | 2022-03-22 | 创新先进技术有限公司 | Statement splitting method, device and equipment |
CN108446266A (en) * | 2018-02-01 | 2018-08-24 | 阿里巴巴集团控股有限公司 | A kind of method, apparatus and equipment that sentence is split |
CN108664468A (en) * | 2018-05-02 | 2018-10-16 | 武汉烽火普天信息技术有限公司 | A kind of name recognition methods and device based on dictionary and semantic disambiguation |
CN109359303A (en) * | 2018-12-10 | 2019-02-19 | 枣庄学院 | A kind of Word sense disambiguation method and system based on graph model |
CN109614620A (en) * | 2018-12-10 | 2019-04-12 | 齐鲁工业大学 | A kind of graph model Word sense disambiguation method and system based on HowNet |
CN109359303B (en) * | 2018-12-10 | 2023-04-07 | 枣庄学院 | Word sense disambiguation method and system based on graph model |
CN109614620B (en) * | 2018-12-10 | 2023-01-17 | 齐鲁工业大学 | HowNet-based graph model word sense disambiguation method and system |
CN110674640B (en) * | 2019-09-25 | 2022-10-25 | 北京明略软件系统有限公司 | Chinese name acquisition method, and training method and device of Chinese name extraction model |
CN110674640A (en) * | 2019-09-25 | 2020-01-10 | 北京明略软件系统有限公司 | Chinese name acquisition method, and training method and device of Chinese name extraction model |
CN112099764B (en) * | 2020-08-13 | 2022-03-15 | 南京航空航天大学 | Formal conversion rule-based avionics field requirement standardization method |
CN112099764A (en) * | 2020-08-13 | 2020-12-18 | 南京航空航天大学 | Formal conversion rule-based avionics field requirement standardization method |
CN112214999A (en) * | 2020-09-30 | 2021-01-12 | 内蒙古科技大学 | Word meaning disambiguation method and device based on combination of graph model and word vector |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20160824 |