CN103488624A - Relation mining method based on Chinese syntactic structure - Google Patents

Publication number
CN103488624A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310411161.5A
Other languages
Chinese (zh)
Inventor
李付民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2013-09-11
Filing date
2013-09-11
Publication date
2014-01-01
Application filed by East China Normal University

Abstract

The invention discloses a relation mining method based on Chinese syntactic structure. The method includes the steps of: a, selecting a set of seed relation tuples; b, obtaining the forms in which the seed tuples are expressed in Chinese syntax; c, deriving relation mining templates from those forms of expression; d, mining with the obtained templates to obtain new relation tuples; e, refining the new relation tuples to obtain more precise relation tuples. Because it does not depend on deep language processing techniques and places no restriction on the specific form of the relation keyword, the method is more widely applicable.

Description

A relation mining method based on Chinese syntactic structure
Technical field
The present invention relates to the fields of knowledge base construction, acquisition of Internet resources, and natural language processing (word segmentation, entity recognition, syntactic parsing, etc.), and specifically to a relation mining method based on Chinese syntactic structure.
Background
We live in an information age in which the sheer volume of data makes it hard to select relevant data quickly and accurately; information extraction technology emerged in response. Relation extraction, also called relation mining, is the most highly valued area of information extraction.
Relation mining is the process of finding multiple entities in text together with the relation that holds between them. Depending on the type of relation being mined, relation mining falls into two main categories. The first category mines specific, predefined relation types (such as spouse or headquarters). Its advantage is high precision and recall, but because in practice some relation types always fall outside the predefined set, these methods extend and port poorly. The second category is open relation mining. These methods place no restriction on the specific type of relation and define only the form in which a relation is expressed. For example, if the relation keyword is defined as a verb, relations expressed by verbs can be mined from sentences; likewise, if the relation keyword is defined as a noun, relations whose keyword is a noun can be mined. Because open relation mining does not predefine relation types, it can find more relation types and relation tuple instances across different kinds of data sets, so it can be applied both to closed data sets and in a network environment, and it ports well.
However, simply defining the relation keyword as a verb or a noun finds only verb or noun relations, which limits applicability. Defining the relation keyword as a simple part-of-speech combination is therefore inappropriate, especially for the complex structure of Chinese. Observation and statistics of Chinese grammar show that Chinese contains some typical syntactic structures, and that mappings exist between these syntactic structures and entity relations.
Summary of the invention
The object of the invention is to provide a relation mining method based on Chinese syntactic structure that addresses the shortcomings of existing mining techniques. The method does not need a large training set, which reduces its dependence on training data; it uses syntactic parsing during mining to improve precision, and it reduces the occurrence of "uninformative" relation tuples.
The technical solution that achieves this object is as follows:
A relation mining method based on Chinese syntactic structure, comprising the following steps:
a. Select a seed tuple set;
b. Obtain the form in which the seed tuples are expressed in Chinese syntax, specifically:
I) obtain the co-occurrence sentences in which each seed tuple appears;
II) pass the co-occurrence sentences to the syntactic parser for parsing;
III) obtain the syntactic form of expression of the seed tuples from the parse results;
c. Obtain relation mining templates from the form of expression obtained in sub-step III) of step b, specifically:
I) obtain all linking paths of the seed tuple on the syntactic structure;
II) find the shortest path among all linking paths;
III) use the shortest linking path as the relation mining template for the relation mining process;
d. Mine with the obtained relation mining templates to obtain new relation tuples, specifically:
I) obtain the corpus to be mined;
II) apply word segmentation and entity recognition to the sentences in the corpus;
III) filter out sentences that contain fewer than 2 entities, retaining only sentences that contain 2 or more entities;
IV) perform Chinese syntactic parsing on the retained sentences;
V) obtain new relation tuples (if any) from the obtained relation mining templates and the Chinese syntactic parse results of the sentences;
e. Refine the new relation tuples to obtain more precise relation tuples, specifically:
I) if the relation keyword is a verb, refine the keyword with that verb as the core;
II) if the relation keyword is a noun, refine the keyword with that noun as the core.
The seed tuple comprises an entity pair and a relation keyword.
The relation tuple has the same structure as the seed tuple, comprising an entity pair and a relation keyword.
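The tuple structure just described (an entity pair plus a relation keyword) can be sketched as a small data type. This is only an illustrative sketch: the class name `RelationTuple`, its field names, and the two example seeds are assumptions and do not come from the patent text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationTuple:
    """An entity pair plus the relation keyword that links the two entities.

    Hypothetical representation: the patent only says that a tuple
    comprises an entity pair and a relation keyword.
    """
    entity1: str
    entity2: str
    keyword: str


# Hypothetical seed tuple set (step a). Seed tuples and mined tuples
# share the same structure, so one type serves for both.
seed_tuples = {
    RelationTuple("比尔·盖茨", "微软", "创立"),   # (Bill Gates, Microsoft, founded)
    RelationTuple("微软", "雷德蒙德", "总部"),     # (Microsoft, Redmond, headquarters)
}
```

Making the type frozen keeps tuples hashable, so a seed set can be stored as a Python `set` and duplicates collapse automatically.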
According to the method of the invention, the user first obtains a seed tuple set for training, then uses the seed tuples to obtain relation mining templates, and finally uses the templates to mine new relation tuples from the corpus to be mined and refines them. Relation mining based on Chinese syntactic structure is thereby achieved.
The invention uses techniques common in natural language processing (word segmentation, named entity recognition) and also uses a syntactic parser to parse Chinese sentences. Compared with deeper processing techniques, these require less time and space and are easier to apply.
The invention can use Chinese syntactic structure to obtain relation mining templates, use those templates to mine relation tuples, and refine the resulting relation tuples. It does not rely on deep language processing techniques and does not restrict the specific form of the relation keyword, so it has a wider range of application.
Brief description of the drawings
Fig. 1 is a flowchart of the invention;
Fig. 2 is a flowchart of obtaining the mining templates;
Fig. 3 is a flowchart of mining new relation tuples;
Fig. 4 is a flowchart of refining the new relation tuples.
Detailed description
The invention does not rely on deep natural language processing techniques and does not restrict the specific form of the relation keyword.
Once the user knows some relation tuples and their corresponding sets of co-occurrence sentences, relation mining templates are obtained from these sentences with a Chinese syntactic parser; the templates are then used to mine new relation tuples from the corpus to be mined; finally, the mined relation tuples can be refined to obtain more precise tuples.
Current entity relation mining methods suffer from problems introduced by deep language processing techniques. This method uses only shallow language processing techniques yet achieves comparable results, and it places no restriction on the specific form of the relation during mining, which broadens its applicability.
The invention is described below with reference to the drawings:
Referring to Fig. 1, the invention first passes the seed tuples and their corresponding co-occurrence sentence sets to the syntactic parser and derives from the parse results the set of relation mining templates used in the mining process. After the corpus to be mined is obtained, it undergoes simple processing and filtering and is then mined with the templates. When mining completes, new relation tuples are obtained. Finally, the new relation tuples are refined to obtain the final relation tuples.
Referring to Fig. 2, the mining templates are obtained as follows. Step s101 obtains the seed tuples used for training. Step s102 obtains multiple co-occurrence sentences for each seed tuple. Step s103 parses each co-occurrence sentence to obtain its syntactic form of expression. Step s104 obtains, for each syntax tree, all paths that link all elements of the seed tuple, forming a path set. Step s105 extracts from the path set the shortest path for each seed tuple, forming a shortest-path set. Step s106 expresses each path in the shortest-path set in a formal representation, finally forming the mining template set.
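The template acquisition of Fig. 2 could look roughly like the sketch below. It rests on several assumptions that the patent does not specify: the parse is modeled as a dependency tree, a linking path runs from the first entity through the relation keyword to the second entity, and a template is formalized as a sequence of (slot or part-of-speech tag, dependency label) pairs. The `Token` type, the tag and label names, and both hand-written parses are hypothetical; a real system would obtain them from a Chinese syntactic parser.

```python
from collections import deque
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    pos: str   # part-of-speech tag (hypothetical tag set)
    head: int  # index of the head token, -1 for the root
    dep: str   # dependency label to the head (hypothetical labels)


def tree_path(tokens, start, goal):
    """Breadth-first search for the path between two tokens in the parse tree."""
    adj = {i: [] for i in range(len(tokens))}
    for i, tok in enumerate(tokens):
        if tok.head >= 0:
            adj[i].append(tok.head)
            adj[tok.head].append(i)
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    if goal not in prev:
        return None
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def linking_path(tokens, e1, kw, e2):
    """Path entity1 -> keyword -> entity2, joined at the keyword (s104)."""
    left, right = tree_path(tokens, e1, kw), tree_path(tokens, kw, e2)
    if left is None or right is None:
        return None
    return left + right[1:]


def to_template(tokens, path, e1, kw, e2):
    """Formalize a path (s106): tuple elements become slots, others keep their POS."""
    slots = {e1: "E1", kw: "REL", e2: "E2"}
    return tuple((slots.get(i, tokens[i].pos), tokens[i].dep) for i in path)


# Two hand-written parses of co-occurrence sentences for the seed tuple
# (比尔·盖茨, 微软, 创立); each entry also gives the indices of e1, kw, e2.
s1 = [Token("比尔·盖茨", "NR", 1, "nsubj"), Token("创立", "VV", -1, "root"),
      Token("了", "AS", 1, "asp"), Token("微软", "NR", 1, "dobj")]
s2 = [Token("微软", "NR", 3, "nsubjpass"), Token("由", "P", 3, "prep"),
      Token("比尔·盖茨", "NR", 1, "pobj"), Token("创立", "VV", -1, "root")]
occurrences = [(s1, 0, 1, 3), (s2, 2, 3, 0)]

# s104: collect the linking paths; s105: keep the shortest one.
paths = [(toks, linking_path(toks, e1, kw, e2), e1, kw, e2)
         for toks, e1, kw, e2 in occurrences]
paths = [p for p in paths if p[1] is not None]
toks, path, e1, kw, e2 = min(paths, key=lambda p: len(p[1]))
template = to_template(toks, path, e1, kw, e2)
```

In this toy input the first sentence yields a three-node path and the second a four-node path, so the first is kept as the template. Other formalizations of a path (for example, one that also records edge direction) would fit the description equally well.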
Referring to Fig. 3, new relation tuples are mined as follows. Step s201 obtains the corpus to be mined; the corpus is supplied by the user and may be closed or open, so in this respect the invention does not depend on any particular corpus. Step s202 performs word segmentation and named entity recognition on each sentence in the corpus. Step s203 takes the result of the previous step and filters out sentences with fewer than 2 entities, retaining only sentences that contain 2 or more. This is because relation mining looks for relations between entities: a sentence with only one entity, or none, is assumed to contain no relation. Step s204 parses the retained sentences to obtain their syntax trees. Step s205 mines relations from the obtained mining templates and the syntax tree of each sentence: if the syntax tree contains a structure that matches a template in the template set, a new relation tuple is extracted according to that template; if no template matches, the sentence is considered to contain no new relation tuple. Step s206 stores the new relation tuples obtained.
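A minimal sketch of the filtering and template matching in Fig. 3 follows. It assumes, beyond what the patent states, that parses are dependency trees, that a template is a sequence of (slot or part-of-speech tag, dependency label) pairs, and that matching is done by trying every ordered entity pair and every non-entity token as a candidate keyword. The sentences are supplied already segmented, entity-tagged and parsed by hand; in the described method parsing happens only after the entity-count filter. All names, tags and labels are hypothetical.

```python
from itertools import permutations
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    pos: str                 # part-of-speech tag (hypothetical tag set)
    head: int                # index of the head token, -1 for the root
    dep: str                 # dependency label (hypothetical labels)
    is_entity: bool = False  # set by named entity recognition (s202)


def ancestors(tokens, i):
    """Chain of token indices from i up to the root."""
    chain = [i]
    while tokens[chain[-1]].head >= 0:
        chain.append(tokens[chain[-1]].head)
    return chain


def tree_path(tokens, a, b):
    """Path between two tokens through their lowest common ancestor."""
    up_a, up_b = ancestors(tokens, a), ancestors(tokens, b)
    common = next(n for n in up_a if n in up_b)
    return up_a[:up_a.index(common) + 1] + up_b[:up_b.index(common)][::-1]


def mine(sentences, templates):
    """Return (entity1, entity2, keyword) triples whose path shape matches a template."""
    found = []
    for tokens in sentences:
        entities = [i for i, t in enumerate(tokens) if t.is_entity]
        if len(entities) < 2:          # s203: drop sentences with fewer than 2 entities
            continue
        for e1, e2 in permutations(entities, 2):
            for kw, tok in enumerate(tokens):
                if tok.is_entity:
                    continue
                path = tree_path(tokens, e1, kw) + tree_path(tokens, kw, e2)[1:]
                slots = {e1: "E1", kw: "REL", e2: "E2"}
                shape = tuple((slots.get(i, tokens[i].pos), tokens[i].dep) for i in path)
                if shape in templates:  # s205: the tree contains a template
                    found.append((tokens[e1].word, tokens[e2].word, tok.word))
    return found


templates = {(("E1", "nsubj"), ("REL", "root"), ("E2", "dobj"))}
corpus = [
    [Token("乔布斯", "NR", 1, "nsubj", True), Token("创办", "VV", -1, "root"),
     Token("了", "AS", 1, "asp"), Token("苹果", "NR", 1, "dobj", True)],
    [Token("苹果", "NR", 2, "nsubj", True), Token("很", "AD", 2, "advmod"),
     Token("好吃", "VA", -1, "root")],  # only one entity, so it is filtered out
]
new_tuples = mine(corpus, templates)
```

The exhaustive search over keyword candidates is the simplest way to illustrate "a template can be formed in the syntax tree"; a production system would more likely index templates by their shape or anchor the search at the entities.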
Referring to Fig. 4, the new relation tuples are refined as follows. Step s301 refines the verb component of the relation keyword: when the relation keyword contains a verb, that verb is merged with the verbs and adverbs immediately before and after it, and the merging repeats until no mergeable word remains on either side. Step s302 refines the noun component of the relation keyword: when the relation keyword contains a noun, that noun is merged with the nouns and adjectives immediately before and after it, and the merging repeats until no mergeable word remains on either side.
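The refinement of Fig. 4 can be sketched as expanding a span around the keyword. This is a sketch under assumptions: the single-letter tags (`v` verb, `d` adverb, `n` noun, `a` adjective, `r` pronoun, `u` particle) and the example sentences are hypothetical, and the loop stops as soon as the neighboring word is not of a mergeable category, which is one reading of the stopping condition.

```python
# Which neighboring tags may be merged into a keyword of a given tag.
# Hypothetical coarse tag set: v = verb, d = adverb, n = noun, a = adjective.
MERGEABLE = {
    "v": {"v", "d"},  # s301: a verb keyword absorbs adjacent verbs and adverbs
    "n": {"n", "a"},  # s302: a noun keyword absorbs adjacent nouns and adjectives
}


def refine_keyword(words, tags, idx):
    """Merge the keyword at position idx with mergeable words on both sides."""
    allowed = MERGEABLE.get(tags[idx])
    if allowed is None:          # neither verb nor noun: leave the keyword as is
        return words[idx]
    left = right = idx
    while left > 0 and tags[left - 1] in allowed:
        left -= 1
    while right + 1 < len(words) and tags[right + 1] in allowed:
        right += 1
    # Chinese words are written without spaces, so the span is simply concatenated.
    return "".join(words[left:right + 1])


# Verb keyword "创办" with two preceding adverbs.
verb_kw = refine_keyword(["他", "曾", "共同", "创办", "了", "公司"],
                         ["r", "d", "d", "v", "u", "n"], 3)

# Noun keyword "科学家" with a preceding adjective.
noun_kw = refine_keyword(["他", "是", "著名", "科学家"],
                         ["r", "v", "a", "n"], 3)
```

With these inputs the verb keyword grows from "创办" to "曾共同创办" and the noun keyword from "科学家" to "著名科学家", which illustrates how refinement makes a relation keyword more specific.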

Claims (3)

1. A relation mining method based on Chinese syntactic structure, characterized in that the method comprises the following steps:
a. Select a seed tuple set;
b. Obtain the form in which the seed tuples are expressed in Chinese syntax, specifically:
I) obtain the co-occurrence sentences in which each seed tuple appears;
II) pass the co-occurrence sentences to the syntactic parser for parsing;
III) obtain the syntactic form of expression of the seed tuples from the parse results;
c. Obtain relation mining templates from the form of expression obtained in sub-step III) of step b, specifically:
I) obtain all linking paths of the seed tuple on the syntactic structure;
II) find the shortest path among all linking paths;
III) use the shortest linking path as the relation mining template for the relation mining process;
d. Mine with the obtained relation mining templates to obtain new relation tuples, specifically:
I) obtain the corpus to be mined;
II) apply word segmentation and entity recognition to the sentences in the corpus;
III) filter out sentences that contain fewer than 2 entities, retaining only sentences that contain 2 or more entities;
IV) perform Chinese syntactic parsing on the retained sentences;
V) obtain new relation tuples from the obtained relation mining templates and the Chinese syntactic parse results of the sentences;
e. Refine the new relation tuples to obtain more precise relation tuples, specifically:
I) if the relation keyword is a verb, refine the keyword with that verb as the core;
II) if the relation keyword is a noun, refine the keyword with that noun as the core.
2. The method according to claim 1, characterized in that the seed tuple comprises an entity pair and a relation keyword.
3. The method according to claim 1, characterized in that the relation tuple has the same structure as the seed tuple, comprising an entity pair and a relation keyword.
CN201310411161.5A 2013-09-11 2013-09-11 Relation mining method based on Chinese syntactic structure Pending CN103488624A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310411161.5A CN103488624A (en) 2013-09-11 2013-09-11 Relation mining method based on Chinese syntactic structure

Publications (1)

Publication Number Publication Date
CN103488624A true CN103488624A (en) 2014-01-01

Family

ID=49828867

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310411161.5A Pending CN103488624A (en) 2013-09-11 2013-09-11 Relation mining method based on Chinese syntactic structure

Country Status (1)

Country Link
CN (1) CN103488624A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5841895A (en) * 1996-10-25 1998-11-24 Pricewaterhousecoopers, Llp Method for learning local syntactic relationships for use in example-based information-extraction-pattern learning
WO2012067586A1 (en) * 2010-11-15 2012-05-24 Agency For Science, Technology And Research Database searching

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
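李庆玲 (Li Qingling): "Research on Weakly Supervised Chinese Entity Relation Extraction Methods", China Master's Theses Full-text Database, Information Science and Technology *
王晶 (Wang Jing): "Research on Unsupervised Chinese Entity Relation Extraction", China Master's Theses Full-text Database, Information Science and Technology *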

Similar Documents

Publication Publication Date Title
CN107797991B (en) Dependency syntax tree-based knowledge graph expansion method and system
KR101707369B1 (en) Construction method and device for event repository
CN102708100B (en) Method and device for digging relation keyword of relevant entity word and application thereof
CN108717423B (en) Code segment recommendation method based on deep semantic mining
CN101676898B (en) Method and device for translating Chinese organization name into English with the aid of network knowledge
WO2016127677A1 (en) Address structuring method and device
CN107832229A (en) A kind of system testing case automatic generating method based on NLP
CN105975625A (en) Chinglish inquiring correcting method and system oriented to English search engine
CN106055623A (en) Cross-language recommendation method and system
CN105302796A (en) Semantic analysis method based on dependency tree
Derwojedowa et al. Words, concepts and relations in the construction of Polish WordNet
CN104657440A (en) Structured query statement generating system and method
CN109829173B (en) English place name translation method and device
CN111008309B (en) Query method and device
JP2020191075A (en) Recommendation of web apis and associated endpoints
CN102968431B (en) A kind of control device that the Chinese entity relationship based on dependency tree is excavated
US10223349B2 (en) Inducing and applying a subject-targeted context free grammar
CN106202039B (en) Vietnamese portmanteau word disambiguation method based on condition random field
CN113779062A (en) SQL statement generation method and device, storage medium and electronic equipment
CN112732969A (en) Image semantic analysis method and device, storage medium and electronic equipment
Ji Mining name translations from comparable corpora by creating bilingual information networks
US9104755B2 (en) Ontology enhancement method and system
CN103488624A (en) Relation mining method based on Chinese syntactic structure
CN114970543A (en) Semantic analysis method for crowdsourced design resources
Sridhar et al. A scalable approach to building a parallel corpus from the Web

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140101