CN109241538A - Chinese entity relation extraction method based on keywords and verb dependency - Google Patents

Chinese entity relation extraction method based on keywords and verb dependency

Info

Publication number
CN109241538A
Authority
CN
China
Prior art keywords
verb
entity
relationship
subject
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811124153.1A
Other languages
Chinese (zh)
Other versions
CN109241538B (en)
Inventor
许青青
谢赟
韩欣
卓建飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Tak Billiton Information Technology Ltd By Share Ltd
Original Assignee
Shanghai Tak Billiton Information Technology Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Tak Billiton Information Technology Ltd By Share Ltd
Priority to CN201811124153.1A
Publication of CN109241538A
Application granted
Publication of CN109241538B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities

Abstract

The invention discloses a Chinese entity relation extraction method based on keywords and verb dependency, which takes large-scale unstructured free text as the target text. First, the text is segmented into words and keywords are extracted to form a text keyword dictionary. Then sentence splitting, word segmentation, part-of-speech tagging, named entity recognition, and dependency parsing are applied to the text, and an entity corpus is built by combining the named entity dictionary with the keyword dictionary. Entity relation syntactic rules are constructed starting from the verb, according to the compositional features of Chinese sentences, their syntactic structure, and the dependency relations between words, and each sentence in the text is matched against these rules. Finally, relation triples are output, yielding the set of relation triples for the text. The invention can make entity relation extraction from large-scale Chinese text more efficient and more accurate.

Description

Chinese entity relation extraction method based on keywords and verb dependency
Technical field
The present invention relates to Chinese entity relation extraction methods, and in particular to an extraction method for large-scale free text based on keywords and verb dependency analysis.
Background art
With the rapid development of Internet information technology, the volume of text on the Internet has grown explosively. How to extract the information people need from large-scale text quickly and accurately has become a hot research topic, and information extraction technology emerged in response. Entity relation extraction is an important component of information extraction. Its purpose is to mine the semantic associations between entities from natural-language text. This not only helps establish domain ontologies and promotes the construction of knowledge graphs; by mining and analyzing the semantic information between entities in depth, it also allows a user's search intent to be better understood, so that more accurate search services can be provided and the user's search experience improved.
Traditional Chinese entity relation extraction targets text from a specific domain, with restricted relation categories and restricted entity categories, and the usual approach is supervised machine learning. Supervised entity relation extraction requires manually annotated relation corpora and predefined relation types. For the massive volume of unstructured, cross-domain Chinese text found in practice, however, entity relation types are often hard to define in advance, so supervised machine learning faces severe challenges in entity relation extraction. As a result, open, unsupervised relation extraction methods for large-scale free text are attracting increasing attention.
Existing open unsupervised relation extraction methods are mainly based on clustering algorithms, heuristic rules, or syntactic analysis.
Clustering-based entity relation extraction obtains entity pairs subject to constraints such as distance and position, uses a clustering algorithm to group semantically similar entity pairs into the same cluster, and then selects a representative word as the relation description of each cluster. This approach needs no predefined relation categories, annotated corpora, or hand-written rules, which greatly reduces manual effort. It still has shortcomings, however: the problems inherent in clustering algorithms of determining the number of clusters and the cluster centers; the need for a large number of related entity pairs to achieve good results, which leads to low recall for low-frequency entity pairs; and the difficulty of accurately summarizing a relation description word for each cluster.
Heuristic-rule-based methods generally analyze the structural features of a large number of Chinese entity extraction instances, summarize the distance between the entity pair of a typical triple and the positional distribution of relation indicator words relative to the entities, generate candidate triples by constraining the distance and relative position between entities, and finally filter the candidates. One line of research mines relation indicator words using global ranking and byte ordering, and then filters relation triples using the indicator words and sentence-pattern rules. The entity relation triples this method extracts from text are fairly accurate, but the constraint on the distance between entity pairs may filter out some related entity pairs, and the relation indicator words cannot comprehensively pick out all semantically associated triples, so the recall of the extraction is low.
Syntactic-analysis-based entity relation extraction identifies the relation between an entity pair by analyzing the syntactic structure of the sentence and the dependency relations between words. A common method obtains the shortest dependency path between two entities and uses the words on that path as the relation word to construct a triple. The method is simple to apply, but the relation word obtained may be a combination of several words, and in some complex Chinese sentences the combination may not form a correct phrase, so the relation description can be abstract and vague. Another syntactic method mines the dependency semantics contained in the shortest dependency path between entities in greater depth, constraining the path with features such as dependency relations, part-of-speech information, and positional relations to obtain dependency semantic patterns; if the dependency path between an entity pair in the input text matches one of these patterns, a relation triple can be extracted. The relation words this method extracts are relatively accurate, its computational complexity is low, and its extraction efficiency is high, but for complex text, especially sentences containing several entities, the extraction results are unsatisfactory. On the whole, however, syntactic-analysis-based entity relation extraction has low computational complexity and can cope with entity relation extraction from large-scale, cross-domain Chinese text.
Many contemporary grammarians, in China and abroad, emphasize that the verb is the core of sentence structure and advocate the "verb-centered theory". The syntactic-semantic associations that exist objectively between terms are mainly manifested in the constraints a verb places on other terms (mainly nominal terms). The verb can therefore serve as the starting point for examining the semantic structure of a sentence, from which the semantic combination relations between the verb and the nominal constituents before and after it can be mined. Statistical analysis of a large number of relation instances shows that the verb connecting two entities generally expresses the semantic relation between them. One can therefore analyze the constituents that depend on a verb and find the terms that form subject-predicate, verb-object, verb-complement, and similar dependency relations with it; if these terms are entities, an accurate entity relation triple can be constructed. For example, for a verb in a sentence, if the subject and object that depend on the verb are found and both are entities, then the subject, verb, and object can be assembled into a triple.
Studies show that entity relation extraction in non-specific domains currently focuses on relations between persons, organizations, and places. In some texts, however, such as an article introducing big data or a product description, named entities such as persons, organizations, and places occur infrequently, and extracting only the semantic relations among these named entities makes it hard to uncover the deeper semantic relations of the text. Words that occur frequently in a document, especially nouns, reflect the main content of the document to some extent. Key nouns can therefore be added to expand the document's entity set, so that more connections between the main pieces of information in the text are mined and a richer semantic network is constructed.
Summary of the invention
The object of the present invention concerns entity relation extraction from large-scale unstructured free text. Supervised methods are limited because relation categories are hard to predefine and relation corpora are hard to annotate, while existing open unsupervised entity relation extraction methods, although somewhat effective, have low overall accuracy. The invention therefore proposes an unsupervised relation extraction method that makes relation extraction from cross-domain free text more accurate: a Chinese entity relation extraction method based on keywords and verb dependency, which makes entity relation extraction from large-scale Chinese text more efficient and more accurate.
The technical solution that achieves the above object is as follows:
A Chinese entity relation extraction method based on keywords and verb dependency, comprising:
segmenting the text into words, extracting keywords, and generating a text keyword dictionary;
splitting the text into sentences, and performing word segmentation, part-of-speech tagging, named entity recognition, and dependency parsing on each single sentence to obtain its segmentation, part-of-speech, named entity, and dependency parsing information;
obtaining the verb set and the entity set of each single sentence;
when both the number of verbs and the number of entities in a single sentence are greater than 0, analyzing whether the terms that depend on a verb match a relation syntactic rule; if they match, obtaining an initial entity relation triple and then expanding it; otherwise, proceeding to match the next verb of the sentence;
after relation extraction has been performed on all single sentences in the text, obtaining the set of triples for the text.
Preferably, the input text is split into sentences at full stops, exclamation marks, and question marks to obtain the set of single sentences.
Preferably, when extracting keywords, the segmentation result is first filtered by part of speech so that only nominal terms are kept as candidate keywords; the TF-IDF weight of each candidate keyword is then calculated; finally, words whose weight exceeds a set threshold are added to the text keyword set. Here TF is the number of times a word occurs in the text and IDF is the inverse document frequency.
Preferably, the entity set consists of the global keyword set of the text and the named entities.
Preferably, the relation syntactic rules take the verb as the candidate relation word, based on the dependency structure of the sentence, and analyze whether the dependency relations between other terms in the sentence and the verb are subject-predicate, verb-object, preposition-object, or verb-complement relations. If two terms in the sentence have dependency relations with the verb that are two of these relations, such as subject-predicate and verb-object, or subject-predicate and preposition-object, and both terms are entities, an initial entity relation triple can be determined.
Preferably, the relation syntactic rules comprise isA rules for judging verbs and non-isA rules for other verbs.
Preferably, among the isA rules, the rule in which an entity is related to the relation word has the sentence structure "Entity1+Noun+is+Entity2" or "Entity2+is+Entity1+Noun", and the entity relation triple is initially expressed as (Entity1, Noun, Entity2). Here Entity1 and Entity2 are the entity pair in the sentence: one entity is in a subject-predicate or verb-object relation with the judging verb, while the other has no direct relation with the judging verb. Noun is a noun in the sentence that is in a subject-predicate or verb-object relation with the judging verb; the other entity, which has no direct relation with the judging verb, is in an attribute-head dependency relation with this noun and modifies it.
Among the isA rules, the rule in which the entities are unrelated to the relation word applies when one entity in the sentence is in a subject-predicate relation with the judging verb, a noun is in a verb-object relation with the judging verb, and the entities are coordinated with each other. The sentence structure can be expressed as "Entity1+Conj+Entity2(++)+is+Noun", and the relation triple can be initially expressed as (Entity1, Noun, Entity2). Here Entity1 and Entity2 are the entity pair in the sentence, Entity2(++) indicates that one or more entities may be coordinated with Entity1, and Noun is the noun in the sentence.
Preferably, the non-isA rules comprise rules for verbs with a subject and rules for verbs without a subject.
The rules for verbs with a subject comprise the subject-verb-object structure, the subject-verb plus preposition-object structure, the subject-verb-complement-object structure, the fronted-object plus preposition-object structure, and other structures, specifically:
The subject-verb-object structure: starting from a given verb, according to the dependency parse, the subject and object of the verb both exist and are both entities, so an initial entity relation triple can be constructed.
The subject-verb plus preposition-object structure: starting from a given verb, according to the dependency parse, the subject of the verb exists and is an entity, and a preposition that depends on the verb has an object that is an entity, so an initial entity relation triple can be extracted.
The subject-verb-complement-object structure: starting from a given verb that is intransitive, according to the dependency parse, the verb has a subject that is an entity and a complement that depends on the verb, and the complement has an object that is an entity, so an initial entity relation triple can be formed.
The fronted-object plus preposition-object structure: starting from a given verb, according to the dependency parse, a fronted object that depends on the verb exists and is an entity, and a preposition that depends on the verb has an object that is an entity, so an initial entity relation triple can be formed.
Other structures: starting from a given verb, according to the dependency parse, the subject of the verb exists and is an entity, and some other structure that depends on the verb has an object that is an entity, so a relation triple can be constructed.
The rules for verbs without a subject comprise the verb coordination structure and the subjectless sentence structure, specifically:
The verb coordination structure: a term in the sentence can establish a verb-object relation with some verb directly, or a preposition-object or complement-object relation indirectly, and the term is an entity. No term establishes a subject-predicate relation with that verb, but another verb is coordinated with it and the two share the same subject. The subject of the coordinated verb is therefore taken as the subject, and an entity relation triple can be constructed.
The subjectless sentence structure: the sentence has no subject, but a term establishes a verb-object relation with some verb directly, or a preposition-object or complement-object relation indirectly, and the term is an entity. Following a heuristic rule of Chinese, the preceding sentence is traced back and the subject of its core verb is taken as the subject of the current sentence. Dependency grammar holds that the core verb is the central constituent of a sentence and governs the other constituents; a sentence may contain several verbs, each of which may have its own subject, so this rule takes only the subject of the core verb of the preceding sentence as the subject of the current sentence.
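The verb coordination rule above can be illustrated with a minimal sketch. It assumes a parse is given as a token list plus one `(head_index, label)` arc per token, that a coordinated verb attaches to the first verb with a COO arc, and that tokens are English glosses; the function name, the `ROOT` placeholder label, and the sample sentence are hypothetical. Only the direct verb-object case is covered.

```python
def dependents(arcs, head, label):
    """Indices of tokens attached to `head` with the given dependency label."""
    return [i for i, (h, lab) in enumerate(arcs) if h == head and lab == label]

def coordinated_verb_triples(words, arcs, entities):
    """Borrow the subject of a coordinated verb for a verb that has none."""
    triples = []
    for v, verb in enumerate(words):
        head, label = arcs[v]
        # Only consider verbs coordinated with another verb and lacking a subject.
        if label != "COO" or dependents(arcs, v, "SBV"):
            continue
        for s in dependents(arcs, head, "SBV"):
            for o in dependents(arcs, v, "VOB"):
                if words[s] in entities and words[o] in entities:
                    triples.append((words[s], verb, words[o]))
    return triples

# Hypothetical sentence glossed as "Wang Wu visits the United States and meets Li Si".
words = ["Wang Wu", "visit", "United States", "meet", "Li Si"]
arcs = [(1, "SBV"), (-1, "ROOT"), (1, "VOB"), (1, "COO"), (3, "VOB")]
entities = {"Wang Wu", "United States", "Li Si"}
print(coordinated_verb_triples(words, arcs, entities))
```

The preposition-object and complement-object variants would follow the same pattern with one extra hop through the preposition or complement.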
Preferably, expanding the entity relation triple comprises entity word expansion, relation word expansion, and coordinate triple expansion, specifically:
Entity word expansion merges a keyword entity with its attributive modifiers.
Relation word expansion adds negative adverbials and non-entity objects to the relation word.
Coordinate triple expansion: when an entity in an obtained entity relation triple has coordinated entities, each coordinated entity forms a new triple with the relation word.
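Coordinate triple expansion can be sketched as follows. This is an illustration under assumptions, not the patent's implementation: the parse format (one `(head_index, label)` arc per token), the `ROOT` placeholder label, the function name, and the sample sentence are all hypothetical, and a coordinated entity is assumed to attach to the first entity with a COO arc.

```python
def expand_coordinate(triple, words, arcs, entities):
    """Add one new triple for each entity coordinated with the subject or object."""
    subj, rel, obj = triple

    def coordinated(word):
        # Entities attached by a COO arc to any occurrence of `word`.
        positions = [i for i, w in enumerate(words) if w == word]
        return [words[i] for i, (h, lab) in enumerate(arcs)
                if lab == "COO" and h in positions and words[i] in entities]

    expanded = [triple]
    expanded += [(s, rel, obj) for s in coordinated(subj)]
    expanded += [(subj, rel, o) for o in coordinated(obj)]
    return expanded

# Hypothetical sentence glossed as "Wang Wu visits the United States and Japan".
words = ["Wang Wu", "visit", "United States", "and", "Japan"]
arcs = [(1, "SBV"), (-1, "ROOT"), (1, "VOB"), (4, "LAD"), (2, "COO")]
entities = {"Wang Wu", "United States", "Japan"}
print(expand_coordinate(("Wang Wu", "visit", "United States"), words, arcs, entities))
```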
The beneficial effects of the present invention are as follows. Unlike typical Chinese entity extraction, which attends only to relations between named entities (person names, place names, organization names), the invention expands the traditional named entity set with the global keywords of the text, adding the semantic relations of the keywords that describe the main content of the text, so that the constructed text semantic network is richer. Typical unsupervised entity extraction methods based on syntactic analysis mostly start from an entity pair and search for its relation word, which is inefficient. The present method instead takes verbs as candidate relation words: because a verb in Chinese grammar connects the nominal constituents before and after it semantically, the method is grounded in the dependency structure of the verb and needs neither a limit on the distance between entities nor any consideration of the position of the relation word. The triples it obtains are more accurate, and are obtained more efficiently, than those of typical entity extraction methods based on syntactic analysis.
Brief description of the drawings
Fig. 1 is a flow chart of entity relation extraction according to the present invention;
Fig. 2 is a classification chart of the relation syntactic rules of the present invention;
Fig. 3 is a schematic diagram of a first dependency parsing example of the present invention;
Fig. 4 is a schematic diagram of a second dependency parsing example of the present invention;
Fig. 5 is a schematic diagram of a third dependency parsing example of the present invention;
Fig. 6 is a first schematic diagram of the relation syntactic rules proposed by the present invention;
Fig. 7 is a second schematic diagram of the relation syntactic rules proposed by the present invention;
Fig. 8 is a third schematic diagram of the relation syntactic rules proposed by the present invention;
Fig. 9 is a fourth schematic diagram of the relation syntactic rules proposed by the present invention;
Fig. 10 is a schematic diagram of a further dependency parsing example in the present invention;
Fig. 11 is a schematic diagram of a second further dependency parsing example in the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to the accompanying drawings.
The Chinese entity relation extraction method based on keywords and verb dependency of the present invention analyzes the dependency relations of verbs to extract entity relations from large-scale free text, providing data support for building a text semantic network. Referring to Fig. 1, it comprises the following steps:
Step 1: segment the input text into words, extract keywords, and generate a keyword dictionary from the extracted keywords.
The purpose of extracting the global keywords of the text is to expand the conventional entity set. The conventional entity set covers only named entities such as person names, place names, and organization names, whereas the present invention targets large-scale free text with no fixed domain. If a document contains almost no person, place, or organization names, no entity relations can be extracted from it. The invention therefore treats keywords as part of the entity set when extracting entity relations and mining the semantic relations in a document. A keyword is a word or phrase that can represent the topic of a text, and most keywords are nouns. A keyword of a document tends to occur frequently in that document and rarely in other documents. Accordingly, when extracting keywords, the invention first segments the text, then filters out words that are not nominal according to part of speech, keeping only nominal terms as candidate keywords, calculates the TF-IDF (term frequency-inverse document frequency) weight of each candidate, and adds the words whose weight exceeds a set threshold to the text keyword set. Here TF is the number of times a word occurs in the text and IDF is the inverse document frequency; TF-IDF tends to pick out words that occur often in one document but are uncommon in other documents.
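The keyword step can be sketched in a few lines. The sketch assumes the text has already been segmented and part-of-speech tagged, that noun tags start with "n", and that IDF is estimated from a small reference corpus with a smoothed formula chosen for this illustration; the patent does not specify the IDF variant, the threshold value, or the function name used here. Tokens are English glosses.

```python
import math
from collections import Counter

def extract_keywords(doc_tokens, corpus, threshold):
    """Return nouns in doc_tokens whose TF-IDF weight exceeds threshold.

    doc_tokens: list of (word, pos) pairs for the target document.
    corpus: list of documents (lists of words) used to estimate IDF.
    """
    # Keep only nominal terms as candidates (assumed tag convention: "n...").
    nouns = [word for word, pos in doc_tokens if pos.startswith("n")]
    tf = Counter(nouns)  # TF as the raw count of the word in the text
    n_docs = len(corpus)
    keywords = set()
    for word, count in tf.items():
        df = sum(1 for doc in corpus if word in doc)
        idf = math.log((n_docs + 1) / (df + 1)) + 1.0  # smoothed IDF (assumption)
        if count * idf > threshold:
            keywords.add(word)
    return keywords

doc = [("big data", "n"), ("is", "v"), ("technology", "n"),
       ("big data", "n"), ("develop", "v")]
corpus = [["big data", "is", "technology", "big data", "develop"],
          ["technology", "develop"],
          ["weather", "good"]]
print(extract_keywords(doc, corpus, threshold=2.0))  # prints {'big data'}
```

Here "big data" occurs twice in the document and in only one corpus document, so it passes the threshold, while "technology" occurs once and in two corpus documents, so it does not.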
Step 2: split the input text into sentences at full stops, exclamation marks, and question marks, and output the set of single sentences.
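A minimal sketch of this sentence-splitting step, using a regular expression over the three terminators. The function name and the sample text are illustrative; the ASCII `!` and `?` are included as an assumption so that mixed-width punctuation is also handled.

```python
import re

# One sentence = a run of non-terminator characters plus an optional terminator.
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]?")

def split_sentences(text):
    """Split text after each full stop, exclamation mark, or question mark."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]

text = "中国的首都是北京。小李和小王是夫妻！真的吗？"
for sentence in split_sentences(text):
    print(sentence)
```

Keeping the terminator attached to each sentence preserves the punctuation token for the later parsing step.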
Step 3: perform word segmentation, part-of-speech tagging, named entity recognition, and dependency parsing on each single sentence to obtain its segmentation, part-of-speech, named entity, and dependency parsing information.
Named entity recognition mainly obtains entities such as person names, place names, and organization names in the sentence. The linguistic units (terms) in a sentence structure always stand in certain relations to one another, and these relations are mainly reflected in the syntactic-semantic relations between the units. Dependency parsing analyzes the dependency relations between the linguistic units (terms) of a sentence in order to identify grammatical constituents such as subject, predicate, and object, or attribute, adverbial, and complement.
Step 4: obtain the verb set and the entity set of each single sentence. If both the number of verbs and the number of entities in the sentence are greater than 0, proceed to step 5; otherwise, end processing of the sentence.
Here, the entities include not only named entities but also the global keyword set that describes the main content of the text, which makes the text entity library richer and more complete.
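The check in step 4 can be sketched as below. The sketch assumes each sentence arrives as `(word, pos)` pairs, that verb tags start with "v", and that named entities and keywords are given as sets; these conventions and the function names are assumptions for illustration.

```python
def verbs_and_entities(tokens, named_entities, keywords):
    """Collect the verbs and entities of one sentence.

    The entity vocabulary is the union of named entities and global keywords.
    """
    entity_vocab = set(named_entities) | set(keywords)
    verbs = [word for word, pos in tokens if pos.startswith("v")]
    entities = [word for word, _ in tokens if word in entity_vocab]
    return verbs, entities

def needs_rule_matching(tokens, named_entities, keywords):
    """Step 4 gate: continue to rule matching only if both sets are non-empty."""
    verbs, entities = verbs_and_entities(tokens, named_entities, keywords)
    return len(verbs) > 0 and len(entities) > 0

sentence = [("Wang Wu", "nh"), ("visit", "v"), ("United States", "ns")]
print(needs_rule_matching(sentence, {"Wang Wu", "United States"}, set()))
print(needs_rule_matching([("good", "a"), ("weather", "n")], set(), {"weather"}))
```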
Step 5: analyze whether the terms that depend on a verb match a relation syntactic rule. If they match, obtain an initial entity relation triple and then expand it; otherwise, proceed to match the next verb of the sentence.
The relation syntactic rules are formulated before entity extraction and are built from dependency parsing combined with some heuristic rules of Chinese grammar. In modern Chinese grammar, the verb places certain constraints on other semantic constituents of the sentence, such as the agent and the patient, so the verb can serve as the starting point for analyzing the semantic structure of the sentence and mining the semantic combination relations between the verb and the nominal constituents before and after it. The relation syntactic rules of the invention therefore start from the verb and analyze the constituents that depend on it; for example, when the subject and object of a verb both exist and are both entities, the subject, verb, and object form an entity relation triple. Matching a relation syntactic rule yields only an initial entity relation triple, which still needs to be modified and supplemented.
Step 6: after relation extraction has been performed on all sentences in the text, the set of triples for the text is obtained.
As described above, the core of the invention lies in the relation syntactic rules and the expansion of entity relation triples, which are introduced in detail below:
I. Relation syntactic rules
In modern Chinese grammar, the verb is the foothold of sentence semantic analysis and places certain constraints on other semantic constituents of the sentence, such as the agent and the patient. Analysis of a large number of entity relation instances shows that entity relation triples tend to occur in certain fixed syntactic structures in which a verb plays the connecting role, so the verb connecting two entities generally expresses the semantic relation between them. The main relation labels of the dependency structure are listed in Table 1. The structures related to verbs are the subject-predicate relation, verb-object relation, fronted object, verb-complement structure, adverbial-head structure, and preposition-object relation. Combinations of several of these structures are mapped to relation syntactic rules, which can then be applied to entity relation extraction. One class of verbs, the judging verbs (such as the verb "是", "is"), does not serve as the relation word in relation extraction; instead, the noun associated with the verb plays the connecting role. The invention therefore handles judging verbs separately and divides the relation rules into two major classes, isA rules and non-isA rules, each of which is further divided into groups (see Fig. 2). The isA rules comprise the rule in which an entity is related to the relation word and the rule in which the entities are unrelated to the relation word. The non-isA rules comprise rules for verbs with a subject and rules for verbs without a subject. The rules for verbs with a subject comprise the subject-verb-object structure, the subject-verb plus preposition-object structure, the subject-verb-complement-object structure, the fronted-object plus preposition-object structure, and other structures; the rules for verbs without a subject comprise the verb coordination structure and the subjectless sentence structure.
Relation type                   Dependency label
Subject-predicate relation      SBV
Verb-object relation            VOB
Fronted object                  FOB
Adverbial-head structure        ADV
Verb-complement structure       CMP
Preposition-object relation     POB
Attribute-head relation         ATT
Coordination                    COO
Pivotal construction            DBL
Indirect-object relation        IOB
Left adjunct relation           LAD
Right adjunct relation          RAD
Independent structure           IS
Punctuation                     WP
Head (core) relation            HAD
Table 1. Dependency relation labels.
1. isA rules:
In modern Chinese, a judging verb (such as "是", "is") mainly expresses judgment, description, or existence. Judgment states what something belongs to or is equal to, as in "Beijing is the capital of China"; description states a feature, state, or condition of something, as in "this car is red"; existence states that something exists, as in "there are cattle and sheep everywhere". When extracting entity relations, the invention considers only the judgment use. According to the relation instances, the verb "是" generally does not serve as the relation description word, and the isA rules are divided into the rule in which an entity is related to the relation word and the rule in which the entities are unrelated to the relation word.
1.1 Rule in which an entity is related to the relation word:
This rule applies when the subject of the verb "是" is an entity and its object is a common noun, or the subject is a common noun and the object is an entity, and another entity is in an attribute-head relation with that noun. The sentence structure is expressed as "Entity1+Noun+is+Entity2" or "Entity2+is+Entity1+Noun", and the entity relation triple is initially expressed as (Entity1, Noun, Entity2). Here Entity1 and Entity2 are the entity pair in the sentence: one entity is in a subject-predicate or verb-object relation with the verb "是", while the other has no direct relation with it. Noun is a noun in the sentence that is in a subject-predicate or verb-object relation with the verb "是"; the other entity, which has no direct relation with the verb, is in an attribute-head dependency relation with this noun and modifies it. Take "The capital of China is Beijing." as an example; its dependency parse is shown in Fig. 3. The verb "是" is the core word of the sentence, with subject "capital" and object "Beijing", and "China" is the attribute of "capital", so the triple (China, capital, Beijing) can be constructed. As another example, take "Zhang is the son of the famous movie star Zhang San."; its dependency parse is shown in Fig. 4. The verb "是" is the core word of the sentence, with subject "Zhang" and object "son", and "Zhang San" is the attribute of "son", so the triple (Zhang San, son, Zhang) can be formed.
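Rule 1.1 can be sketched over a hand-written parse. The sketch assumes one `(head_index, label)` arc per token with labels from Table 1, uses English glosses as tokens ("is" for "是", "de" for the particle), and writes the arcs by hand from the description of Fig. 3 rather than from a real parser; the `ROOT` label and the function names are placeholders. It also assumes the connecting noun is not itself in the entity set.

```python
JUDGING_VERBS = {"is"}  # English gloss of the judging verb

def dependents(arcs, head, label):
    """Indices of tokens attached to `head` with the given dependency label."""
    return [i for i, (h, lab) in enumerate(arcs) if h == head and lab == label]

def isa_attribute_triples(words, arcs, entities):
    """(Entity1, Noun, Entity2): Entity1 modifies Noun; Noun and Entity2 flank the verb."""
    triples = []
    for v, verb in enumerate(words):
        if verb not in JUDGING_VERBS:
            continue
        for s in dependents(arcs, v, "SBV"):
            for o in dependents(arcs, v, "VOB"):
                # Try both orientations: the noun may be the subject or the object.
                for noun, ent in ((s, o), (o, s)):
                    if words[ent] not in entities or words[noun] in entities:
                        continue
                    for m in dependents(arcs, noun, "ATT"):
                        if words[m] in entities:
                            triples.append((words[m], words[noun], words[ent]))
    return triples

# "The capital of China is Beijing." with arcs following the description of Fig. 3.
words = ["China", "de", "capital", "is", "Beijing"]
arcs = [(2, "ATT"), (0, "RAD"), (3, "SBV"), (-1, "ROOT"), (3, "VOB")]
print(isa_attribute_triples(words, arcs, {"China", "Beijing"}))
```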
1.2 Rule in which the entities are unrelated to the relation word:
This rule applies when the verb "是" has a subject that is an entity and an object that is a common noun, and one or more other entities are coordinated with the subject entity. The sentence structure is expressed as "Entity1+Conj+Entity2(++)+is+Noun", and the entity relation triple is initially expressed as (Entity1, Noun, Entity2). Here Entity1 and Entity2 are the entity pair in the sentence: Entity1 is in a subject-predicate relation with the verb "是", Entity2(++) indicates that one or more entities may be coordinated with Entity1, and Noun is the noun in the sentence that is in a verb-object relation with the verb "是". Take "Xiao Li and Xiao Wang are husband and wife." as an example; its dependency parse is shown in Fig. 5. The verb "是" is the core word of the sentence, with subject "Xiao Li" and object "husband and wife", and "Xiao Li" and "Xiao Wang" are coordinated, so the triple (Xiao Li, husband and wife, Xiao Wang) can be formed.
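A minimal sketch of rule 1.2 follows, under the same assumptions as before: one `(head_index, label)` arc per token, English-gloss tokens ("is" for "是"), hand-written arcs following the description of Fig. 5, and a placeholder `ROOT` label. The function names are hypothetical.

```python
JUDGING_VERBS = {"is"}  # English gloss of the judging verb

def dependents(arcs, head, label):
    """Indices of tokens attached to `head` with the given dependency label."""
    return [i for i, (h, lab) in enumerate(arcs) if h == head and lab == label]

def isa_coordination_triples(words, arcs, entities):
    """(Entity1, Noun, Entity2) where Entity2 is coordinated with subject Entity1."""
    triples = []
    for v, verb in enumerate(words):
        if verb not in JUDGING_VERBS:
            continue
        subjects = [s for s in dependents(arcs, v, "SBV") if words[s] in entities]
        nouns = [o for o in dependents(arcs, v, "VOB") if words[o] not in entities]
        for s in subjects:
            for partner in dependents(arcs, s, "COO"):
                if words[partner] not in entities:
                    continue
                for o in nouns:
                    triples.append((words[s], words[o], words[partner]))
    return triples

# "Xiao Li and Xiao Wang are husband and wife."
words = ["Xiao Li", "and", "Xiao Wang", "is", "husband and wife"]
arcs = [(3, "SBV"), (2, "LAD"), (0, "COO"), (-1, "ROOT"), (3, "VOB")]
print(isa_coordination_triples(words, arcs, {"Xiao Li", "Xiao Wang"}))
```

With several coordinated entities, the loop emits one triple per partner, which corresponds to the Entity2(++) notation.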
2, non-isA rule:
Non-isA rules are entity relation syntactic rules built around a non-judgment verb as the core of the semantic relation description between entities. The verb is the core of both the syntactic structure and the semantic structure: syntactically, the verb determines the basic shape of the sentence structure; semantically, the semantic structure is built around the verb and is therefore called a predicate structure, predication structure or verb-nucleus structure. The present invention constructs the non-isA relation syntactic rules from the mapping between basic predicate structures and typical syntactic structures. Because the verb is the semantic link between lexical items, the two lexical items governed by the verb must be obtained. Depending on whether the verb governs a subject, the rules are divided into verb-with-subject rules and verb-without-subject rules, and each class is further divided into several groups (see Fig. 2), described in detail as follows:
2.1, verb-with-subject rules:
(1) Subject-verb-object structure. The subject-verb-object structure is the most common structure in verb-predicate sentences. The subject-verb-object rule constructed by the present invention starts from a given verb: if, according to the dependency parse, both the subject and the object of the verb exist and both are entities, an initial triple is constructed. The dependency relations between the verb and the entities are shown in Fig. 6, where E1 and E2 denote entities and V denotes the verb; the dependency arcs are explained in Table 1. For example, in "Wang Wu visits the United States.", the subject of the verb "visit" is "Wang Wu" and its object is "the United States", so the triple (Wang Wu, visit, the United States) can be output.
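As a rough illustration of the subject-verb-object rule, the sketch below scans every verb for an entity subject and an entity object. The `Tok` record, the part-of-speech tags and the labels `SBV`, `VOB` and `HED` are assumptions (an LTP-style scheme), the parse is hand-written, and `svo_triples` is a hypothetical name.

```python
from typing import List, NamedTuple, Set, Tuple

class Tok(NamedTuple):
    word: str  # English gloss of the Chinese token
    pos: str   # part-of-speech tag (assumed tag set, "v" = verb)
    head: int  # 1-based index of the governing token; 0 marks the root
    rel: str   # dependency label on the arc to the head (assumed label set)

def dependents(sent: List[Tok], head: int, rel: str) -> List[int]:
    """1-based indices of the tokens attached to `head` with label `rel`."""
    return [i for i, t in enumerate(sent, 1) if t.head == head and t.rel == rel]

def svo_triples(sent: List[Tok], entities: Set[str]) -> List[Tuple[str, str, str]]:
    """Emit (subject, verb, object) when both arguments of a verb are entities."""
    out = []
    for v, tok in enumerate(sent, 1):
        if tok.pos != "v":
            continue
        for s in dependents(sent, v, "SBV"):
            for o in dependents(sent, v, "VOB"):
                if sent[s - 1].word in entities and sent[o - 1].word in entities:
                    out.append((sent[s - 1].word, tok.word, sent[o - 1].word))
    return out

# Hand-written parse of "Wang Wu visits the United States." (assumed).
visit = [Tok("Wang Wu", "nh", 2, "SBV"), Tok("visit", "v", 0, "HED"),
         Tok("the United States", "ns", 2, "VOB")]

print(svo_triples(visit, {"Wang Wu", "the United States"}))
# [('Wang Wu', 'visit', 'the United States')]
```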
(2) Subject-verb-preposition-object structure. This rule applies when the verb and the entities form a subject-verb-preposition-object structure syntactically: starting from a given verb, if, according to the dependency parse, the subject of the verb exists and is an entity, and a preposition that depends on the verb has an object that is an entity, an initial triple can be extracted. The dependency relations between the verb and the entities are shown in Fig. 7, where E1 and E2 denote entities, V denotes the verb and P denotes the preposition; the dependency arcs are explained in Table 1. For example, in "Li Si gives a lecture at Tongji University", the subject of the verb "deliver" is "Li Si", and the preposition "at", which depends on the verb "deliver", has the object "Tongji University", so the initial triple (Li Si, deliver, Tongji University) can be output.
(3) Subject-verb-complement-object structure. This rule applies when the verb and the entities form a subject-verb-complement-object structure syntactically: starting from a given verb that is intransitive, if, according to the dependency parse, the subject of the verb exists and is an entity, and a complement that depends on the verb has an object that is an entity, an initial triple can be formed. The dependency relations are shown in Fig. 8, where E1 and E2 denote entities, V denotes the verb and P denotes the preposition; the dependency arcs are explained in Table 1. For example, in "Wang Wu graduated from Harvard University.", the intransitive verb "graduate" has the subject "Wang Wu" and the complement preposition "from", and the preposition "from" has the object "Harvard University", so the triple (Wang Wu, graduate, Harvard University) can be output.
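Structures (2) and (3) differ only in how the preposition attaches to the verb, so one sketch can cover both. It assumes LTP-style labels in which an adverbial preposition attaches by `ADV`, a complement preposition by `CMP`, and the prepositional object by `POB`; these labels, the hand-written parses and the name `via_preposition` are assumptions for illustration. The intransitivity condition of structure (3) would need a verb lexicon and is not checked here.

```python
from typing import List, NamedTuple, Set, Tuple

class Tok(NamedTuple):
    word: str  # English gloss of the Chinese token
    pos: str   # part-of-speech tag (assumed: "v" = verb, "p" = preposition)
    head: int  # 1-based index of the governing token; 0 marks the root
    rel: str   # dependency label on the arc to the head (assumed label set)

def dependents(sent: List[Tok], head: int, rel: str) -> List[int]:
    """1-based indices of the tokens attached to `head` with label `rel`."""
    return [i for i, t in enumerate(sent, 1) if t.head == head and t.rel == rel]

def via_preposition(sent: List[Tok], entities: Set[str], link: str) -> List[Tuple[str, str, str]]:
    """Subject entity + verb + preposition (attached by `link`) + object entity."""
    out = []
    for v, tok in enumerate(sent, 1):
        if tok.pos != "v":
            continue
        subjects = [s for s in dependents(sent, v, "SBV") if sent[s - 1].word in entities]
        for p in dependents(sent, v, link):
            if sent[p - 1].pos != "p":
                continue
            for o in dependents(sent, p, "POB"):
                if sent[o - 1].word in entities:
                    out.extend((sent[s - 1].word, tok.word, sent[o - 1].word) for s in subjects)
    return out

# Hand-written parses of the two example sentences (assumed).
lecture = [Tok("Li Si", "nh", 4, "SBV"), Tok("at", "p", 4, "ADV"),
           Tok("Tongji University", "ni", 2, "POB"), Tok("deliver", "v", 0, "HED"),
           Tok("a lecture", "n", 4, "VOB")]
graduate = [Tok("Wang Wu", "nh", 2, "SBV"), Tok("graduate", "v", 0, "HED"),
            Tok("from", "p", 2, "CMP"), Tok("Harvard University", "ni", 3, "POB")]

print(via_preposition(lecture, {"Li Si", "Tongji University"}, "ADV"))
# [('Li Si', 'deliver', 'Tongji University')]
print(via_preposition(graduate, {"Wang Wu", "Harvard University"}, "CMP"))
# [('Wang Wu', 'graduate', 'Harvard University')]
```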
(4) Fronted-object and preposition-object structure. This rule applies when the verb and the entities form a fronted-object and preposition-object structure syntactically: starting from a given verb, if, according to the dependency parse, the verb has a fronted object that is an entity, and a preposition governed by the verb has an object that is an entity, an initial triple can be formed. The dependency relations are shown in Fig. 9, where E1 and E2 denote entities, V denotes the verb and P denotes the preposition; the dependency arcs are explained in Table 1. Strictly speaking there is no subject under this rule, but the verb still governs two lexical items, so the structure can be treated as similar to one with a subject. It is common in passive sentences. For example, in "Zhang San was admitted by Fudan University.", the fronted object of the verb "admit" is "Zhang San", the preposition "by" (the passive marker "被") depends on "admit", and this preposition has the object "Fudan University", so the triple (Fudan University, admit, Zhang San) can be output.
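The passive case can be sketched as follows; note that the prepositional object becomes the first element of the triple and the fronted object the last. The labels `FOB`, `ADV` and `POB` are assumed LTP-style names, the parse is hand-written, and `fronted_object_triples` is a hypothetical name.

```python
from typing import List, NamedTuple, Set, Tuple

class Tok(NamedTuple):
    word: str  # English gloss of the Chinese token
    pos: str   # part-of-speech tag (assumed: "v" = verb, "p" = preposition)
    head: int  # 1-based index of the governing token; 0 marks the root
    rel: str   # dependency label on the arc to the head (assumed label set)

def dependents(sent: List[Tok], head: int, rel: str) -> List[int]:
    """1-based indices of the tokens attached to `head` with label `rel`."""
    return [i for i, t in enumerate(sent, 1) if t.head == head and t.rel == rel]

def fronted_object_triples(sent: List[Tok], entities: Set[str]) -> List[Tuple[str, str, str]]:
    """Fronted object (FOB) + preposition with entity object -> (agent, verb, patient)."""
    out = []
    for v, tok in enumerate(sent, 1):
        if tok.pos != "v":
            continue
        for f in dependents(sent, v, "FOB"):
            if sent[f - 1].word not in entities:
                continue
            for p in dependents(sent, v, "ADV"):
                if sent[p - 1].pos != "p":
                    continue
                for o in dependents(sent, p, "POB"):
                    if sent[o - 1].word in entities:
                        # The prepositional object acts as the agent of the verb.
                        out.append((sent[o - 1].word, tok.word, sent[f - 1].word))
    return out

# Hand-written parse of "Zhang San was admitted by Fudan University." (assumed).
admit = [Tok("Zhang San", "nh", 4, "FOB"), Tok("by", "p", 4, "ADV"),
         Tok("Fudan University", "ni", 2, "POB"), Tok("admit", "v", 0, "HED")]

print(fronted_object_triples(admit, {"Zhang San", "Fudan University"}))
# [('Fudan University', 'admit', 'Zhang San')]
```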
(5) Other structures. Under this rule, starting from a given verb, if, according to the dependency parse, the subject of the verb exists and is an entity, and some other structure that depends on the verb has an object that is an entity, a relation triple can be constructed. For example, in "Li Si emphasized promoting financial reform at the enterprise forum." (dependency relations shown in Fig. 10), the verb "emphasize" has the subject "Li Si" and the object "promote", and the verb "promote" has the object "reform", so the initial triple (Li Si, emphasize, reform) can be constructed.
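The text leaves "other structures" open-ended. The sketch below covers only the one case shown in the example, where the object of the verb is itself a verb whose object is an entity; treating this as the whole rule would be an assumption. Labels, tags, the hand-written parse and the name `nested_object_triples` are likewise assumed.

```python
from typing import List, NamedTuple, Set, Tuple

class Tok(NamedTuple):
    word: str  # English gloss of the Chinese token
    pos: str   # part-of-speech tag (assumed: "v" = verb)
    head: int  # 1-based index of the governing token; 0 marks the root
    rel: str   # dependency label on the arc to the head (assumed label set)

def dependents(sent: List[Tok], head: int, rel: str) -> List[int]:
    """1-based indices of the tokens attached to `head` with label `rel`."""
    return [i for i, t in enumerate(sent, 1) if t.head == head and t.rel == rel]

def nested_object_triples(sent: List[Tok], entities: Set[str]) -> List[Tuple[str, str, str]]:
    """Subject entity + verb whose object is a verb that has an entity object."""
    out = []
    for v, tok in enumerate(sent, 1):
        if tok.pos != "v":
            continue
        subjects = [s for s in dependents(sent, v, "SBV") if sent[s - 1].word in entities]
        for inner in dependents(sent, v, "VOB"):
            if sent[inner - 1].pos != "v":
                continue
            for o in dependents(sent, inner, "VOB"):
                if sent[o - 1].word in entities:
                    out.extend((sent[s - 1].word, tok.word, sent[o - 1].word) for s in subjects)
    return out

# Hand-written, simplified parse of the "financial reform" sentence (assumed).
reform = [Tok("Li Si", "nh", 4, "SBV"), Tok("at", "p", 4, "ADV"),
          Tok("enterprise forum", "n", 2, "POB"), Tok("emphasize", "v", 0, "HED"),
          Tok("promote", "v", 4, "VOB"), Tok("financial", "n", 7, "ATT"),
          Tok("reform", "n", 5, "VOB")]

print(nested_object_triples(reform, {"Li Si", "reform"}))
# [('Li Si', 'emphasize', 'reform')]
```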
2.2, verb-without-subject rules:
(1) Coordinate-verb structure. In the coordinate-verb structure the verb has no subject, but some lexical item can establish a verb-object relation with the verb, either directly or indirectly through another word, and this lexical item is an entity; in addition, another verb is coordinated with this verb and the two share the same subject, so the subject of the coordinated verb can be used as the subject to construct an entity relation triple. The dependency structure is shown in Fig. 11, where E1, E2 and E3 denote entities, V1 and V2 denote verbs, and the dependency arc LikeVOB means that the lexical item may be in a direct verb-object relation with the verb or in an indirect verb-object relation obtained through some other word (for example, the lexical item is the object of a preposition that depends on the verb). For example, in "Li Si visited China and gave a lecture at Tongji University.", the triple (Li Si, visit, China) can be extracted for the first verb by the subject-verb-object rule; the second verb "deliver" obtains the object "Tongji University" through the preposition-object structure but has no subject. Because "deliver" is coordinated with "visit", the subject consistency of the two verbs determines another initial triple (Li Si, deliver, Tongji University). This embodiment uses two coordinated verbs only as an example; the rule is not limited to two coordinated verbs and applies equally to sentences containing several coordinated verbs.
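The subject inheritance described for coordinated verbs can be sketched as follows. The helper `like_vob` mirrors the LikeVOB arc named in the text, but its implementation (direct `VOB`, or `POB` under a preposition attached by `ADV` or `CMP`) is an assumption, as are the LTP-style labels, the hand-written parse and all function names.

```python
from typing import List, NamedTuple, Optional, Set, Tuple

class Tok(NamedTuple):
    word: str  # English gloss of the Chinese token
    pos: str   # part-of-speech tag (assumed: "v" = verb, "p" = preposition)
    head: int  # 1-based index of the governing token; 0 marks the root
    rel: str   # dependency label on the arc to the head (assumed label set)

def dependents(sent: List[Tok], head: int, rel: str) -> List[int]:
    """1-based indices of the tokens attached to `head` with label `rel`."""
    return [i for i, t in enumerate(sent, 1) if t.head == head and t.rel == rel]

def like_vob(sent: List[Tok], v: int) -> List[int]:
    """Objects reached directly (VOB) or through a preposition (ADV/CMP -> POB)."""
    found = dependents(sent, v, "VOB")
    for p in dependents(sent, v, "ADV") + dependents(sent, v, "CMP"):
        if sent[p - 1].pos == "p":
            found += dependents(sent, p, "POB")
    return found

def shared_subject(sent: List[Tok], v: int) -> Optional[int]:
    """Follow COO arcs towards the head until a verb with its own subject is found."""
    seen = set()
    while v and v not in seen:
        seen.add(v)
        subj = dependents(sent, v, "SBV")
        if subj:
            return subj[0]
        v = sent[v - 1].head if sent[v - 1].rel == "COO" else 0
    return None

def coordinated_verb_triples(sent: List[Tok], entities: Set[str]) -> List[Tuple[str, str, str]]:
    out = []
    for v, tok in enumerate(sent, 1):
        # Only subjectless verbs that are coordinated with another verb.
        if tok.pos != "v" or tok.rel != "COO" or dependents(sent, v, "SBV"):
            continue
        s = shared_subject(sent, v)
        if s is None or sent[s - 1].word not in entities:
            continue
        out += [(sent[s - 1].word, tok.word, sent[o - 1].word)
                for o in like_vob(sent, v) if sent[o - 1].word in entities]
    return out

# Hand-written parse of "Li Si visited China and gave a lecture at Tongji University." (assumed).
trip = [Tok("Li Si", "nh", 2, "SBV"), Tok("visit", "v", 0, "HED"),
        Tok("China", "ns", 2, "VOB"), Tok("at", "p", 6, "ADV"),
        Tok("Tongji University", "ni", 4, "POB"), Tok("deliver", "v", 2, "COO"),
        Tok("a lecture", "n", 6, "VOB")]

print(coordinated_verb_triples(trip, {"Li Si", "China", "Tongji University"}))
# [('Li Si', 'deliver', 'Tongji University')]
```

Because `shared_subject` keeps following coordination arcs, the same sketch extends to chains of more than two coordinated verbs.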
(2) Sentence-without-subject structure. This rule applies when the sentence has no subject, but some lexical item can establish a verb-object relation with a verb directly, or a preposition-object or verb-complement-object relation indirectly, and the lexical item is an entity. Following a heuristic rule of Chinese, the preceding sentence is traced back and the subject of the core verb of the preceding sentence is taken as the subject of the current sentence. Dependency grammar theory holds that the core verb is the central component of the sentence and governs the other components; a sentence may contain several verbs, each of which may have a subject, so this rule takes only the subject of the core verb of the preceding sentence as the subject of the current sentence.
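A sketch of this back-tracing step is given below. The text gives no example sentence for this rule, so the two sentences used here are hypothetical; the labels `HED`, `SBV` and `VOB`, the hand-written parses and the function names are assumptions, and only direct verb-object relations are handled to keep the example short.

```python
from typing import List, NamedTuple, Optional, Set, Tuple

class Tok(NamedTuple):
    word: str  # English gloss of the Chinese token
    pos: str   # part-of-speech tag (assumed: "v" = verb)
    head: int  # 1-based index of the governing token; 0 marks the root
    rel: str   # dependency label on the arc to the head (assumed label set)

def dependents(sent: List[Tok], head: int, rel: str) -> List[int]:
    """1-based indices of the tokens attached to `head` with label `rel`."""
    return [i for i, t in enumerate(sent, 1) if t.head == head and t.rel == rel]

def core_verb_subject(sent: List[Tok]) -> Optional[str]:
    """Subject of the core word (the token labelled HED), if it has one."""
    for i, t in enumerate(sent, 1):
        if t.rel == "HED":
            subj = dependents(sent, i, "SBV")
            return sent[subj[0] - 1].word if subj else None
    return None

def subjectless_triples(prev: List[Tok], cur: List[Tok],
                        entities: Set[str]) -> List[Tuple[str, str, str]]:
    """Borrow the subject of the previous sentence's core verb for a subjectless sentence."""
    if any(t.rel == "SBV" for t in cur):
        return []  # the current sentence has its own subject
    subject = core_verb_subject(prev)
    if subject not in entities:
        return []
    out = []
    for v, tok in enumerate(cur, 1):
        if tok.pos != "v":
            continue
        for o in dependents(cur, v, "VOB"):
            if cur[o - 1].word in entities:
                out.append((subject, tok.word, cur[o - 1].word))
    return out

# Hypothetical sentence pair: "Li Si visited China. Later joined Fudan University."
prev = [Tok("Li Si", "nh", 2, "SBV"), Tok("visit", "v", 0, "HED"),
        Tok("China", "ns", 2, "VOB")]
cur = [Tok("later", "d", 2, "ADV"), Tok("join", "v", 0, "HED"),
       Tok("Fudan University", "ni", 2, "VOB")]

print(subjectless_triples(prev, cur, {"Li Si", "China", "Fudan University"}))
# [('Li Si', 'join', 'Fudan University')]
```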
Two, expansion of entity relation triples
Matching a relation syntactic rule yields only an initial triple. The present invention further expands the triple, including entity expansion, relation word expansion and coordinate triple expansion, specifically:
(1) Entity expansion. The purpose of entity expansion is to merge noun phrases that were split into several lexical items during word segmentation. The present invention merges a keyword entity (an entity other than a person name, place name or organization name) with its attributive modifiers: if the word immediately preceding the keyword entity is in an attributive relation with it and is not a quantifier, or if the keyword and the several preceding non-quantifier lexical items form a continuous nominal modifier-head phrase, they are merged with the keyword entity. In the embodiment above, the initial triple (Li Si, emphasize, reform) is extracted from "Li Si emphasized promoting financial reform at the enterprise forum."; in this sentence "financial" is an attributive modifier of "reform", so the two are merged and the triple is updated to (Li Si, emphasize, financial reform).
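Entity expansion can be sketched as a leftward scan from the keyword entity. This is a simplification under stated assumptions: only modifiers attached directly to the keyword by an `ATT` arc are merged, the tag sets `NAMED` and `QUANT` are assumed, glosses are joined with a space (Chinese tokens would be concatenated), and `expand_entity` is a hypothetical name.

```python
from typing import List, NamedTuple

class Tok(NamedTuple):
    word: str  # English gloss of the Chinese token
    pos: str   # part-of-speech tag (assumed tag set)
    head: int  # 1-based index of the governing token; 0 marks the root
    rel: str   # dependency label on the arc to the head (assumed label set)

NAMED = {"nh", "ns", "ni"}  # person, place and organization tags (assumed)
QUANT = {"m", "q"}          # numeral and measure-word tags (assumed)

def expand_entity(sent: List[Tok], idx: int) -> str:
    """Merge a keyword entity with the contiguous non-quantifier ATT modifiers before it."""
    tok = sent[idx - 1]
    if tok.pos in NAMED:  # named entities are left unchanged
        return tok.word
    words = [tok.word]
    j = idx - 1
    while j >= 1:
        prev = sent[j - 1]
        if prev.head != idx or prev.rel != "ATT" or prev.pos in QUANT:
            break
        words.insert(0, prev.word)
        j -= 1
    return " ".join(words)

# Hand-written, simplified parse of the "financial reform" sentence (assumed).
reform = [Tok("Li Si", "nh", 4, "SBV"), Tok("at", "p", 4, "ADV"),
          Tok("enterprise forum", "n", 2, "POB"), Tok("emphasize", "v", 0, "HED"),
          Tok("promote", "v", 4, "VOB"), Tok("financial", "n", 7, "ATT"),
          Tok("reform", "n", 5, "VOB")]

print(expand_entity(reform, 7))  # financial reform
print(expand_entity(reform, 1))  # Li Si
```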
(2) relative expands, and in order to keep relationship description between entity more accurate specific, the present invention gives relative addition negative The adverbial modifier, non-physical object expand;Wherein, if there is negative adverb modification before verb as candidate relationship word, expression be with The opposite meaning of verb, therefore need to will be added in relative with the negative adverb before candidate relationship word, for example, " Zhang San does not like Joyous sea ", according to relationship syntactic rule, the initial triple of extraction is (Zhang San likes, sea), is expanded more by relative Newly it is (Zhang San does not like, sea).If not can make between two entities in addition, entity word acts as the object of verb in sentence Relationship is more clear.In above-described embodiment in " Li Si gives a lecture in Tongji University " triple by it is initial (Li Si delivers, Tongji University) it is extended to (Li Si gives a lecture, Tongji University).
(3) Coordinate triple expansion. Sentences often contain coordinated subjects or coordinated objects. When such a subject or object is an entity in an entity relation triple, the entities coordinated with it must be used to form new triples. For example, for the sentence "Xiao Fang, Xiao Hong and Xiao Hua are the daughters of Xiao Li and Xiao Wang.", the initial triple extracted by the relation syntactic rules proposed in the present invention is (Xiao Li, daughter, Xiao Fang); expansion over the coordinated entities adds five more triples: (Xiao Li, daughter, Xiao Hong), (Xiao Li, daughter, Xiao Hua), (Xiao Wang, daughter, Xiao Fang), (Xiao Wang, daughter, Xiao Hong) and (Xiao Wang, daughter, Xiao Hua).
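Coordinate expansion amounts to a Cartesian product over the coordinated subjects and objects. In the sketch below the `coordinated` mapping is written by hand; in practice it would presumably be read from the coordination arcs of the dependency parse, which is an assumption, as is the name `expand_coordinated`.

```python
from itertools import product
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

def expand_coordinated(triple: Triple, coordinated: Dict[str, List[str]]) -> List[Triple]:
    """Combine every coordinated subject with every coordinated object."""
    subject, relation, obj = triple
    subjects = [subject] + coordinated.get(subject, [])
    objects = [obj] + coordinated.get(obj, [])
    return [(s, relation, o) for s, o in product(subjects, objects)]

initial = ("Xiao Li", "daughter", "Xiao Fang")
coordinated = {"Xiao Li": ["Xiao Wang"], "Xiao Fang": ["Xiao Hong", "Xiao Hua"]}

for t in expand_coordinated(initial, coordinated):
    print(t)
# ('Xiao Li', 'daughter', 'Xiao Fang')
# ('Xiao Li', 'daughter', 'Xiao Hong')
# ('Xiao Li', 'daughter', 'Xiao Hua')
# ('Xiao Wang', 'daughter', 'Xiao Fang')
# ('Xiao Wang', 'daughter', 'Xiao Hong')
# ('Xiao Wang', 'daughter', 'Xiao Hua')
```

The first result is the initial triple itself, and the remaining five are the added triples listed in the text.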
The above embodiments are for illustrative purposes only and do not limit the present invention. Persons skilled in the relevant technical field may make various changes or modifications without departing from the spirit and scope of the present invention; all equivalent technical solutions therefore also fall within the scope of the present invention, which shall be defined by the claims.

Claims (9)

1. A Chinese entity relation extraction method based on keyword and verb dependency, characterized by comprising:
Segmenting the text into words and extracting keywords to generate a text keyword dictionary;
Splitting the text into sentences, and performing word segmentation, part-of-speech tagging, named entity recognition and dependency parsing on each simple sentence to obtain the word segmentation, part-of-speech, named entity and dependency parsing information of each simple sentence;
Obtaining the verb set and the entity set of each simple sentence;
When the numbers of verbs and entities in a simple sentence are both greater than 0, analyzing whether the lexical items that depend on a verb match a relation syntactic rule; if they match, obtaining an initial entity relation triple and then expanding the entity relation triple; otherwise, proceeding to match the next verb in the simple sentence;
After relation extraction has been performed on all simple sentences in the text, obtaining the triple set of the text.
2. The Chinese entity relation extraction method based on keyword and verb dependency according to claim 1, characterized in that the input text is split into sentences at full stops, exclamation marks and question marks to obtain a set of simple sentences.
3. The Chinese entity relation extraction method based on keyword and verb dependency according to claim 1, characterized in that, when extracting keywords, the word segmentation results are first filtered by part of speech so that only nominal lexical items are retained as candidate keywords; the TF-IDF weights of the candidate keywords are then calculated; finally, the words whose weight is greater than a set threshold are added to the text keyword set; wherein TF is the number of times a word occurs in the text and IDF is the inverse document frequency.
4. The Chinese entity relation extraction method based on keyword and verb dependency according to claim 1, characterized in that the entity set consists of the global keyword set of the text and the named entities.
5. The Chinese entity relation extraction method based on keyword and verb dependency according to claim 1, characterized in that the relation syntactic rule is: according to the dependency syntactic structure of the sentence, taking a verb as the candidate relation word and determining whether the dependency relations between the other lexical items in the sentence and the verb are subject-predicate, verb-object, preposition-object or verb-complement relations; if the dependency relations between two lexical items in the sentence and the verb are two of these relations, for example subject-predicate and verb-object, or subject-predicate and preposition-object, and both lexical items are entities, an initial entity relation triple can be determined.
6. The Chinese entity relation extraction method based on keyword and verb dependency according to claim 5, characterized in that the relation syntactic rules include isA rules for judgment verbs and non-isA rules for other verbs.
7. The Chinese entity relation extraction method based on keyword and verb dependency according to claim 6, characterized in that:
In the isA rules, for the rule in which an entity is related to the relation word, the sentence structure is expressed as "Entity1+Noun+is+Entity2" or "Entity2+is+Entity1+Noun", and the entity relation triple is initially expressed as (Entity1, Noun, Entity2); wherein Entity1 and Entity2 are the entity pair in the sentence, one entity is in a subject-predicate or verb-object relation with the judgment verb, and the other entity has no direct relation with the judgment verb; Noun denotes the noun in the sentence that is in a subject-predicate or verb-object relation with the judgment verb, and the other entity, which has no direct relation with the judgment verb, is in an attributive dependency relation with the noun and modifies it;
In the isA rules, the rule in which the entity is unrelated to the relation word means that the sentence contains an entity in a subject-predicate relation with the judgment verb and a noun in a verb-object relation with the judgment verb, and that the entities are coordinated with one another; the sentence structure can be expressed as "Entity1+Conj+Entity2 (++)+is+Noun", and the relation triple can be initially expressed as (Entity1, Noun, Entity2); wherein Entity1 and Entity2 are the entity pair in the sentence, Entity2 (++) indicates that one or more entities may be coordinated with Entity1, and Noun is the noun in the sentence.
8. The Chinese entity relation extraction method based on keyword and verb dependency according to claim 6, characterized in that the non-isA rules include verb-with-subject rules and verb-without-subject rules;
The verb-with-subject rules include the subject-verb-object structure, the subject-verb-preposition-object structure, the subject-verb-complement-object structure, the fronted-object and preposition-object structure, and other structures, specifically:
The subject-verb-object structure means that, starting from a given verb, according to the dependency parse, the subject and the object of the verb both exist and both are entities, so that an initial entity relation triple can be constructed;
The subject-verb-preposition-object structure means that, starting from a given verb, according to the dependency parse, the subject of the verb exists and is an entity, and a preposition that depends on the verb has an object that is an entity, so that an initial entity relation triple can be extracted;
The subject-verb-complement-object structure means that, starting from a given verb that is intransitive, according to the dependency parse, the verb has a subject that is an entity, and a complement that depends on the verb has an object that is an entity, so that an initial entity relation triple can be formed;
The fronted-object and preposition-object structure means that, starting from a given verb, according to the dependency parse, a fronted object that depends on the verb exists and is an entity, and a preposition that depends on the verb has an object that is an entity, so that an initial entity relation triple can be formed;
The other structures mean that, starting from a given verb, according to the dependency parse, the subject of the verb exists and is an entity, and another structure that depends on the verb has an object that is an entity, so that a relation triple can be constructed;
The verb-without-subject rules include the coordinate-verb structure and the sentence-without-subject structure, specifically:
The coordinate-verb structure means that the sentence contains a lexical item that can establish a verb-object relation with a verb directly, or a preposition-object or complement-object relation indirectly, and the lexical item is an entity; no lexical item establishes a subject-predicate relation with the verb, but another verb is coordinated with the verb and the two share the same subject, so the subject of the coordinated verb can be used as the subject to construct an entity relation triple;
The sentence-without-subject structure means that the sentence has no subject, but a lexical item can establish a verb-object relation with a verb directly, or a preposition-object or complement-object relation indirectly, and the lexical item is an entity; following a heuristic rule of Chinese, the preceding sentence is traced back and the subject of the core verb of the preceding sentence is taken as the subject of the current sentence; dependency grammar theory holds that the core verb is the central component of the sentence and governs the other components, and a sentence may contain several verbs, each of which may have a subject, so the rule takes only the subject of the core verb of the preceding sentence as the subject of the current sentence.
9. The Chinese entity relation extraction method based on keyword and verb dependency according to claim 1, characterized in that expanding the entity relation triple includes entity word expansion, relation word expansion and coordinate triple expansion, specifically:
The entity word expansion merges a keyword entity with its attributive modifiers;
The relation word expansion includes adding negation adverbials and adding non-entity objects;
The coordinate triple expansion means that, when an entity in an obtained entity relation triple has coordinated entities, the coordinated entities and the relation word form new triples.
CN201811124153.1A 2018-09-26 2018-09-26 Chinese entity relation extraction method based on dependency of keywords and verbs Active CN109241538B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811124153.1A CN109241538B (en) 2018-09-26 2018-09-26 Chinese entity relation extraction method based on dependency of keywords and verbs

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811124153.1A CN109241538B (en) 2018-09-26 2018-09-26 Chinese entity relation extraction method based on dependency of keywords and verbs

Publications (2)

Publication Number Publication Date
CN109241538A true CN109241538A (en) 2019-01-18
CN109241538B CN109241538B (en) 2022-12-20

Family

ID=65056162

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811124153.1A Active CN109241538B (en) 2018-09-26 2018-09-26 Chinese entity relation extraction method based on dependency of keywords and verbs

Country Status (1)

Country Link
CN (1) CN109241538B (en)

Cited By (78)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582975A (en) * 2019-01-31 2019-04-05 北京嘉和美康信息技术有限公司 It is a kind of name entity recognition methods and device
CN109918672A (en) * 2019-03-13 2019-06-21 东华大学 A kind of structuring processing method of the Thyroid ultrasound report based on tree construction
CN109977235A (en) * 2019-04-04 2019-07-05 吉林大学 A kind of determination method and apparatus of trigger word
CN109992651A (en) * 2019-03-14 2019-07-09 广州智语信息科技有限公司 A kind of problem target signature automatic identification and abstracting method
CN109992777A (en) * 2019-03-26 2019-07-09 浙江大学 A kind of crucial semantic information extracting method of Chinese medicine state of an illness text based on keyword
CN110032649A (en) * 2019-04-12 2019-07-19 北京科技大学 Relation extraction method and device between a kind of entity of TCM Document
CN110046351A (en) * 2019-04-19 2019-07-23 福州大学 Text Relation extraction method under regular drive based on feature
CN110083822A (en) * 2019-03-06 2019-08-02 杭州电子科技大学 It is a kind of from demand text conversion to the conversion method of SysML demand figure
CN110110329A (en) * 2019-04-30 2019-08-09 湖南星汉数智科技有限公司 A kind of entity behavior derivation method, apparatus, computer installation and computer readable storage medium
CN110119510A (en) * 2019-05-17 2019-08-13 浪潮软件集团有限公司 A kind of Relation extraction method and device based on transmitting dependence and structural auxiliary word
CN110162788A (en) * 2019-05-06 2019-08-23 三角兽(北京)科技有限公司 The determination method and device of entity dependence
CN110196913A (en) * 2019-05-23 2019-09-03 北京邮电大学 Multiple entity relationship joint abstracting method and device based on text generation formula
CN110222332A (en) * 2019-04-29 2019-09-10 闽江学院 The method for realizing name of the dish Entity recognition based on dependency analysis
CN110263336A (en) * 2019-06-12 2019-09-20 东华大学 A method of building breast ultrasound domain body
CN110263341A (en) * 2019-06-20 2019-09-20 贵州电网有限责任公司 A method of profile is excavated and positioned from text
CN110263120A (en) * 2019-04-26 2019-09-20 北京零秒科技有限公司 Corpus labeling method and device
CN110309393A (en) * 2019-03-28 2019-10-08 平安科技(深圳)有限公司 Data processing method, device, equipment and readable storage medium storing program for executing
CN110362673A (en) * 2019-07-17 2019-10-22 福州大学 Computer vision class papers contents method of discrimination and system based on abstract semantic analysis
CN110377901A (en) * 2019-06-20 2019-10-25 湖南大学 A kind of text mining method for making a report on case for distribution line tripping
CN110413732A (en) * 2019-07-16 2019-11-05 扬州大学 The knowledge searching method of software-oriented defect knowledge
CN110502642A (en) * 2019-08-21 2019-11-26 武汉工程大学 A kind of entity relation extraction method based on interdependent syntactic analysis and rule
CN110532553A (en) * 2019-08-21 2019-12-03 河海大学 A kind of method water conservancy spatial relationship word identification and extracted
CN110543630A (en) * 2019-08-21 2019-12-06 北京仝睿科技有限公司 Method and device for generating text structured representation and computer storage medium
CN110555083A (en) * 2019-08-26 2019-12-10 北京工业大学 non-supervision entity relationship extraction method based on zero-shot
EP3579120A1 (en) * 2018-06-04 2019-12-11 Infosys Limited Extraction of tokens and relationship between tokens from documents to form an entity relationship map
CN110569510A (en) * 2019-09-17 2019-12-13 四川长虹电器股份有限公司 method for identifying named entity of user request data
CN110569504A (en) * 2019-09-04 2019-12-13 北京明略软件系统有限公司 relation word determining method and device
CN110597998A (en) * 2019-07-19 2019-12-20 中国人民解放军国防科技大学 Military scenario entity relationship extraction method and device combined with syntactic analysis
CN110705310A (en) * 2019-09-20 2020-01-17 北京金山数字娱乐科技有限公司 Article generation method and device
CN110765759A (en) * 2019-10-21 2020-02-07 普信恒业科技发展(北京)有限公司 Intention identification method and device
CN110874396A (en) * 2019-11-07 2020-03-10 腾讯科技(深圳)有限公司 Keyword extraction method and device and computer storage medium
CN110909537A (en) * 2019-11-19 2020-03-24 曲英洲 Artificial intelligence method for modern Chinese component analysis
CN110991180A (en) * 2019-11-28 2020-04-10 同济人工智能研究院(苏州)有限公司 Command identification method based on keywords and Word2Vec
CN111126052A (en) * 2019-12-26 2020-05-08 中科鼎富(北京)科技发展有限公司 Function point generation method and device, electronic equipment and computer readable storage medium
CN111178079A (en) * 2019-12-31 2020-05-19 北京明略软件系统有限公司 Triple extraction method and device
CN111177215A (en) * 2019-12-20 2020-05-19 京东数字科技控股有限公司 Method and device for generating financial data
CN111198932A (en) * 2019-12-30 2020-05-26 北京明略软件系统有限公司 Triple acquiring method and device, electronic equipment and readable storage medium
CN111241827A (en) * 2020-01-10 2020-06-05 同方知网(北京)技术有限公司 Attribute extraction method based on sentence retrieval mode
CN111382571A (en) * 2019-11-08 2020-07-07 南方科技大学 Information extraction method, system, server and storage medium
CN111597794A (en) * 2020-05-11 2020-08-28 浪潮软件集团有限公司 Dependency relationship-based 'yes' word and sentence relationship extraction method and device
CN111597812A (en) * 2020-05-09 2020-08-28 北京合众鼎成科技有限公司 Financial field multiple relation extraction method based on mask language model
CN111597349A (en) * 2020-04-30 2020-08-28 西安理工大学 Rail transit standard entity relation automatic completion method based on artificial intelligence
CN111597351A (en) * 2020-05-14 2020-08-28 上海德拓信息技术股份有限公司 Visual document map construction method
CN111611399A (en) * 2020-04-15 2020-09-01 广发证券股份有限公司 Information event mapping system and method based on natural language processing
CN111639499A (en) * 2020-06-01 2020-09-08 北京中科汇联科技股份有限公司 Composite entity extraction method and system
CN111666738A (en) * 2020-06-09 2020-09-15 南京师范大学 Formalized coding method for motion description natural text
CN111666767A (en) * 2020-06-10 2020-09-15 创新奇智(上海)科技有限公司 Data identification method and device, electronic equipment and storage medium
CN111680492A (en) * 2020-06-10 2020-09-18 创新奇智(青岛)科技有限公司 New word mining method and device and electronic equipment
CN111798847A (en) * 2020-06-22 2020-10-20 广州小鹏车联网科技有限公司 Voice interaction method, server and computer-readable storage medium
CN111984778A (en) * 2020-09-08 2020-11-24 四川长虹电器股份有限公司 Dependency syntax analysis and Chinese grammar-based multi-round semantic analysis method
CN112052340A (en) * 2020-08-10 2020-12-08 深圳数联天下智能科技有限公司 Data model construction method and device and electronic equipment
CN112084793A (en) * 2020-09-14 2020-12-15 深圳前海微众银行股份有限公司 Semantic recognition method, device and readable storage medium based on dependency syntax
CN112131343A (en) * 2020-09-14 2020-12-25 杭州东信北邮信息技术有限公司 Chinese novel dialect dialogue character recognition method
CN112148838A (en) * 2020-09-23 2020-12-29 北京中电普华信息技术有限公司 Business source object extraction method and device
CN112183059A (en) * 2020-09-24 2021-01-05 万齐智 Chinese structured event extraction method
CN112364648A (en) * 2020-12-02 2021-02-12 中金智汇科技有限责任公司 Keyword extraction method and device, electronic equipment and storage medium
CN112380868A (en) * 2020-12-10 2021-02-19 广东泰迪智能科技股份有限公司 Petition-purpose multi-classification device based on event triples and method thereof
CN112560488A (en) * 2020-12-07 2021-03-26 北京明略软件系统有限公司 Noun phrase extraction method, system, storage medium and electronic equipment
CN112580349A (en) * 2020-12-24 2021-03-30 竹间智能科技(上海)有限公司 Phrase extraction method and device and electronic equipment
CN112651226A (en) * 2020-09-21 2021-04-13 深圳前海黑顿科技有限公司 Knowledge analysis system and method based on dependency syntax tree
CN112699677A (en) * 2020-12-31 2021-04-23 竹间智能科技(上海)有限公司 Event extraction method and device, electronic equipment and storage medium
CN112749549A (en) * 2021-01-22 2021-05-04 中国科学院电子学研究所苏州研究院 Chinese entity relation extraction method based on incremental learning and multi-model fusion
CN112749548A (en) * 2020-11-02 2021-05-04 万齐智 Rule-based Chinese structured financial event default completion extraction method
CN112784574A (en) * 2021-02-02 2021-05-11 网易(杭州)网络有限公司 Text segmentation method and device, electronic equipment and medium
CN113361272A (en) * 2021-06-22 2021-09-07 海信视像科技股份有限公司 Method and device for extracting concept words of media asset title
CN113515630A (en) * 2021-06-10 2021-10-19 深圳数联天下智能科技有限公司 Triple generating and checking method and device, electronic equipment and storage medium
CN113609838A (en) * 2021-07-14 2021-11-05 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Document information extraction and mapping method and system
CN113705198A (en) * 2021-10-21 2021-11-26 北京达佳互联信息技术有限公司 Scene graph generation method and device, electronic equipment and storage medium
CN113743090A (en) * 2021-09-08 2021-12-03 度小满科技(北京)有限公司 Keyword extraction method and device
CN113761919A (en) * 2020-06-04 2021-12-07 国家计算机网络与信息安全管理中心 Entity attribute extraction method of spoken short text and electronic device
CN113821605A (en) * 2021-10-12 2021-12-21 广州汇智通信技术有限公司 Event extraction method
CN113971216A (en) * 2021-10-22 2022-01-25 北京百度网讯科技有限公司 Data processing method and device, electronic equipment and memory
CN114186552A (en) * 2021-12-13 2022-03-15 北京百度网讯科技有限公司 Text analysis method, device and equipment and computer storage medium
CN114997398A (en) * 2022-03-09 2022-09-02 哈尔滨工业大学 Knowledge base fusion method based on relation extraction
CN115062609A (en) * 2022-08-19 2022-09-16 北京语言大学 Method and device for enhancing syntax dependence of Chinese language
CN115238217A (en) * 2022-09-23 2022-10-25 山东省齐鲁大数据研究院 Method for extracting numerical information from bulletin text and terminal
CN116361422A (en) * 2023-06-02 2023-06-30 深圳得理科技有限公司 Keyword extraction method, text retrieval method and related equipment
CN117609518A (en) * 2024-01-17 2024-02-27 江西科技师范大学 Hierarchical Chinese entity relation extraction method and system for centering structure

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002097662A1 (en) * 2001-06-01 2002-12-05 Synomia Method and large syntactical analysis system of a corpus, a specialised corpus in particular
WO2010050675A2 (en) * 2008-10-29 2010-05-06 한국과학기술원 Method for automatically extracting relation triplets through a dependency grammar parse tree
CN104933027A (en) * 2015-06-12 2015-09-23 华东师范大学 Open Chinese entity relation extraction method using dependency analysis
CN107180045A (en) * 2016-03-10 2017-09-19 中国科学院地理科学与资源研究所 A kind of internet text contains the abstracting method of geographical entity relation
CN107291687A (en) * 2017-04-27 2017-10-24 同济大学 It is a kind of based on interdependent semantic Chinese unsupervised open entity relation extraction method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
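Li Mingyao et al.: "Open Chinese entity relation extraction method based on dependency analysis", Computer Engineering (《计算机工程》) *
Zhai Yujia et al.: "Research on a text-mining-based construction method for Chinese domain ontologies", Information Science (《情报科学》) *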

Cited By (128)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11568142B2 (en) 2018-06-04 2023-01-31 Infosys Limited Extraction of tokens and relationship between tokens from documents to form an entity relationship map
EP3579120A1 (en) * 2018-06-04 2019-12-11 Infosys Limited Extraction of tokens and relationship between tokens from documents to form an entity relationship map
CN109582975B (en) * 2019-01-31 2023-05-23 北京嘉和海森健康科技有限公司 Named entity identification method and device
CN109582975A (en) * 2019-01-31 2019-04-05 北京嘉和美康信息技术有限公司 Named entity recognition method and device
CN110083822B (en) * 2019-03-06 2022-11-15 杭州电子科技大学 Conversion method for converting requirement text into SysML requirement diagram
CN110083822A (en) * 2019-03-06 2019-08-02 杭州电子科技大学 Conversion method for converting requirement text into SysML requirement diagram
CN109918672A (en) * 2019-03-13 2019-06-21 东华大学 Structured processing method for thyroid ultrasound reports based on tree structure
CN109918672B (en) * 2019-03-13 2023-06-02 东华大学 Structural processing method of thyroid ultrasound report based on tree structure
CN109992651B (en) * 2019-03-14 2024-01-02 广州智语信息科技有限公司 Automatic identification and extraction method for problem target features
CN109992651A (en) * 2019-03-14 2019-07-09 广州智语信息科技有限公司 Automatic identification and extraction method for problem target features
CN109992777A (en) * 2019-03-26 2019-07-09 浙江大学 Keyword-based key semantic information extraction method for traditional Chinese medicine condition texts
CN110309393A (en) * 2019-03-28 2019-10-08 平安科技(深圳)有限公司 Data processing method, device, equipment and readable storage medium
CN110309393B (en) * 2019-03-28 2023-06-20 平安科技(深圳)有限公司 Data processing method, device, equipment and readable storage medium
CN109977235A (en) * 2019-04-04 2019-07-05 吉林大学 Method and device for determining trigger word
CN109977235B (en) * 2019-04-04 2022-10-25 吉林大学 Method and device for determining trigger word
CN110032649B (en) * 2019-04-12 2021-10-01 北京科技大学 Method and device for extracting relationships between entities in traditional Chinese medicine literature
CN110032649A (en) * 2019-04-12 2019-07-19 北京科技大学 Method and device for extracting relationships between entities in traditional Chinese medicine literature
CN110046351B (en) * 2019-04-19 2022-06-14 福州大学 Text relation extraction method based on features under rule driving
CN110046351A (en) * 2019-04-19 2019-07-23 福州大学 Text relation extraction method based on features under rule driving
CN110263120A (en) * 2019-04-26 2019-09-20 北京零秒科技有限公司 Corpus labeling method and device
CN110222332A (en) * 2019-04-29 2019-09-10 闽江学院 Method for recognizing dish-name entities based on dependency analysis
CN110222332B (en) * 2019-04-29 2023-06-16 闽江学院 Method for recognizing dish-name entities based on dependency analysis
CN110110329B (en) * 2019-04-30 2022-05-17 湖南星汉数智科技有限公司 Entity behavior extraction method and device, computer device and computer readable storage medium
CN110110329A (en) * 2019-04-30 2019-08-09 湖南星汉数智科技有限公司 Entity behavior extraction method and device, computer device and computer readable storage medium
CN110162788A (en) * 2019-05-06 2019-08-23 三角兽(北京)科技有限公司 Entity dependency relationship determination method and device
CN110162788B (en) * 2019-05-06 2021-02-09 腾讯科技(深圳)有限公司 Entity dependency relationship determination method and device
CN110119510B (en) * 2019-05-17 2023-02-14 浪潮软件集团有限公司 Relationship extraction method and device based on transfer dependency relationship and structure auxiliary word
CN110119510A (en) * 2019-05-17 2019-08-13 浪潮软件集团有限公司 Relationship extraction method and device based on transfer dependency relationship and structure auxiliary word
CN110196913A (en) * 2019-05-23 2019-09-03 北京邮电大学 Multi-entity relation joint extraction method and device based on text generation
CN110263336A (en) * 2019-06-12 2019-09-20 东华大学 Method for constructing breast ultrasound field ontology
CN110263336B (en) * 2019-06-12 2023-06-23 东华大学 Method for constructing breast ultrasound field ontology
CN110377901A (en) * 2019-06-20 2019-10-25 湖南大学 Text mining method for reported distribution line tripping cases
CN110263341A (en) * 2019-06-20 2019-09-20 贵州电网有限责任公司 Method for mining and locating personal ability from text
CN110263341B (en) * 2019-06-20 2023-06-20 贵州电网有限责任公司 Method for mining and locating personal ability from text
CN110413732B (en) * 2019-07-16 2023-11-24 扬州大学 Knowledge searching method for software defect knowledge
CN110413732A (en) * 2019-07-16 2019-11-05 扬州大学 Knowledge searching method for software defect knowledge
CN110362673A (en) * 2019-07-17 2019-10-22 福州大学 Method and system for discriminating computer vision paper content based on abstract semantic analysis
CN110362673B (en) * 2019-07-17 2022-07-08 福州大学 Computer vision type thesis content distinguishing method and system based on abstract semantic analysis
CN110597998A (en) * 2019-07-19 2019-12-20 中国人民解放军国防科技大学 Military scenario entity relationship extraction method and device combined with syntactic analysis
CN110532553B (en) * 2019-08-21 2023-08-22 河海大学 Water conservancy space relation word recognition and extraction method
CN110543630A (en) * 2019-08-21 2019-12-06 北京仝睿科技有限公司 Method and device for generating text structured representation and computer storage medium
CN110502642A (en) * 2019-08-21 2019-11-26 武汉工程大学 Entity relation extraction method based on dependency syntactic analysis and rules
CN110502642B (en) * 2019-08-21 2024-01-23 武汉工程大学 Entity relation extraction method based on dependency syntactic analysis and rules
CN110532553A (en) * 2019-08-21 2019-12-03 河海大学 Water conservancy space relation word recognition and extraction method
CN110555083A (en) * 2019-08-26 2019-12-10 北京工业大学 Unsupervised entity relation extraction method based on zero-shot
CN110555083B (en) * 2019-08-26 2021-06-25 北京工业大学 Unsupervised entity relation extraction method based on zero-shot
CN110569504A (en) * 2019-09-04 2019-12-13 北京明略软件系统有限公司 Relation word determining method and device
CN110569504B (en) * 2019-09-04 2022-11-15 北京明略软件系统有限公司 Relation word determining method and device
CN110569510A (en) * 2019-09-17 2019-12-13 四川长虹电器股份有限公司 Method for identifying named entity of user request data
CN110705310B (en) * 2019-09-20 2023-07-18 北京金山数字娱乐科技有限公司 Article generation method and device
CN110705310A (en) * 2019-09-20 2020-01-17 北京金山数字娱乐科技有限公司 Article generation method and device
CN110765759A (en) * 2019-10-21 2020-02-07 普信恒业科技发展(北京)有限公司 Intention identification method and device
CN110874396A (en) * 2019-11-07 2020-03-10 腾讯科技(深圳)有限公司 Keyword extraction method and device and computer storage medium
CN110874396B (en) * 2019-11-07 2024-02-09 腾讯科技(深圳)有限公司 Keyword extraction method and device and computer storage medium
CN111382571B (en) * 2019-11-08 2023-06-06 南方科技大学 Information extraction method, system, server and storage medium
CN111382571A (en) * 2019-11-08 2020-07-07 南方科技大学 Information extraction method, system, server and storage medium
CN110909537A (en) * 2019-11-19 2020-03-24 曲英洲 Artificial intelligence method for modern Chinese component analysis
CN110991180A (en) * 2019-11-28 2020-04-10 同济人工智能研究院(苏州)有限公司 Command identification method based on keywords and Word2Vec
CN111177215A (en) * 2019-12-20 2020-05-19 京东数字科技控股有限公司 Method and device for generating financial data
CN111126052B (en) * 2019-12-26 2023-11-03 鼎富智能科技有限公司 Function point generation method, device, electronic equipment and computer readable storage medium
CN111126052A (en) * 2019-12-26 2020-05-08 中科鼎富(北京)科技发展有限公司 Function point generation method and device, electronic equipment and computer readable storage medium
CN111198932A (en) * 2019-12-30 2020-05-26 北京明略软件系统有限公司 Triple acquiring method and device, electronic equipment and readable storage medium
CN111198932B (en) * 2019-12-30 2023-03-21 北京明略软件系统有限公司 Triple acquiring method and device, electronic equipment and readable storage medium
CN111178079A (en) * 2019-12-31 2020-05-19 北京明略软件系统有限公司 Triple extraction method and device
CN111178079B (en) * 2019-12-31 2023-05-26 北京明略软件系统有限公司 Triplet extraction method and device
CN111241827A (en) * 2020-01-10 2020-06-05 同方知网(北京)技术有限公司 Attribute extraction method based on sentence retrieval mode
CN111241827B (en) * 2020-01-10 2022-05-20 同方知网(北京)技术有限公司 Attribute extraction method based on sentence retrieval mode
CN111611399A (en) * 2020-04-15 2020-09-01 广发证券股份有限公司 Information event mapping system and method based on natural language processing
CN111597349A (en) * 2020-04-30 2020-08-28 西安理工大学 Rail transit standard entity relation automatic completion method based on artificial intelligence
CN111597349B (en) * 2020-04-30 2022-10-11 西安理工大学 Rail transit standard entity relation automatic completion method based on artificial intelligence
CN111597812A (en) * 2020-05-09 2020-08-28 北京合众鼎成科技有限公司 Financial field multiple relation extraction method based on mask language model
CN111597794A (en) * 2020-05-11 2020-08-28 浪潮软件集团有限公司 Dependency relationship-based 'yes' word and sentence relationship extraction method and device
CN111597794B (en) * 2020-05-11 2023-06-06 浪潮软件集团有限公司 Dependency relationship-based 'Yes' word and sentence relationship extraction method and device
CN111597351A (en) * 2020-05-14 2020-08-28 上海德拓信息技术股份有限公司 Visual document map construction method
CN111639499B (en) * 2020-06-01 2023-06-16 北京中科汇联科技股份有限公司 Composite entity extraction method and system
CN111639499A (en) * 2020-06-01 2020-09-08 北京中科汇联科技股份有限公司 Composite entity extraction method and system
CN113761919A (en) * 2020-06-04 2021-12-07 国家计算机网络与信息安全管理中心 Entity attribute extraction method of spoken short text and electronic device
CN111666738A (en) * 2020-06-09 2020-09-15 南京师范大学 Formalized coding method for motion description natural text
CN111680492A (en) * 2020-06-10 2020-09-18 创新奇智(青岛)科技有限公司 New word mining method and device and electronic equipment
CN111666767A (en) * 2020-06-10 2020-09-15 创新奇智(上海)科技有限公司 Data identification method and device, electronic equipment and storage medium
CN111666767B (en) * 2020-06-10 2023-07-18 创新奇智(上海)科技有限公司 Data identification method and device, electronic equipment and storage medium
CN111798847A (en) * 2020-06-22 2020-10-20 广州小鹏车联网科技有限公司 Voice interaction method, server and computer-readable storage medium
CN112052340A (en) * 2020-08-10 2020-12-08 深圳数联天下智能科技有限公司 Data model construction method and device and electronic equipment
CN111984778B (en) * 2020-09-08 2022-06-03 四川长虹电器股份有限公司 Dependency syntax analysis and Chinese grammar-based multi-round semantic analysis method
CN111984778A (en) * 2020-09-08 2020-11-24 四川长虹电器股份有限公司 Dependency syntax analysis and Chinese grammar-based multi-round semantic analysis method
CN112131343B (en) * 2020-09-14 2023-07-07 新讯数字科技(杭州)有限公司 Method for identifying characters in Chinese novel dialogue
CN112131343A (en) * 2020-09-14 2020-12-25 杭州东信北邮信息技术有限公司 Method for identifying characters in Chinese novel dialogue
CN112084793A (en) * 2020-09-14 2020-12-15 深圳前海微众银行股份有限公司 Semantic recognition method, device and readable storage medium based on dependency syntax
CN112651226B (en) * 2020-09-21 2022-03-29 深圳前海黑顿科技有限公司 Knowledge analysis system and method based on dependency syntax tree
CN112651226A (en) * 2020-09-21 2021-04-13 深圳前海黑顿科技有限公司 Knowledge analysis system and method based on dependency syntax tree
CN112148838B (en) * 2020-09-23 2024-04-19 北京中电普华信息技术有限公司 Service source object extraction method and device
CN112148838A (en) * 2020-09-23 2020-12-29 北京中电普华信息技术有限公司 Business source object extraction method and device
CN112183059A (en) * 2020-09-24 2021-01-05 万齐智 Chinese structured event extraction method
CN112749548A (en) * 2020-11-02 2021-05-04 万齐智 Rule-based Chinese structured financial event default completion extraction method
CN112749548B (en) * 2020-11-02 2024-04-26 万齐智 Rule-based default completion extraction method for Chinese structured financial events
CN112364648A (en) * 2020-12-02 2021-02-12 中金智汇科技有限责任公司 Keyword extraction method and device, electronic equipment and storage medium
CN112560488A (en) * 2020-12-07 2021-03-26 北京明略软件系统有限公司 Noun phrase extraction method, system, storage medium and electronic equipment
CN112380868A (en) * 2020-12-10 2021-02-19 广东泰迪智能科技股份有限公司 Petition-purpose multi-classification device based on event triples and method thereof
CN112380868B (en) * 2020-12-10 2024-02-13 广东泰迪智能科技股份有限公司 Petition-purpose multi-classification device and method based on event triples
CN112580349A (en) * 2020-12-24 2021-03-30 竹间智能科技(上海)有限公司 Phrase extraction method and device and electronic equipment
CN112580349B (en) * 2020-12-24 2023-09-29 竹间智能科技(上海)有限公司 Phrase extraction method and device and electronic equipment
CN112699677B (en) * 2020-12-31 2023-05-02 竹间智能科技(上海)有限公司 Event extraction method and device, electronic equipment and storage medium
CN112699677A (en) * 2020-12-31 2021-04-23 竹间智能科技(上海)有限公司 Event extraction method and device, electronic equipment and storage medium
CN112749549A (en) * 2021-01-22 2021-05-04 中国科学院电子学研究所苏州研究院 Chinese entity relation extraction method based on incremental learning and multi-model fusion
CN112749549B (en) * 2021-01-22 2023-10-13 中国科学院电子学研究所苏州研究院 Chinese entity relation extraction method based on incremental learning and multi-model fusion
CN112784574B (en) * 2021-02-02 2023-09-15 网易(杭州)网络有限公司 Text segmentation method and device, electronic equipment and medium
CN112784574A (en) * 2021-02-02 2021-05-11 网易(杭州)网络有限公司 Text segmentation method and device, electronic equipment and medium
CN113515630B (en) * 2021-06-10 2024-04-09 深圳数联天下智能科技有限公司 Triplet generation and verification method and device, electronic equipment and storage medium
CN113515630A (en) * 2021-06-10 2021-10-19 深圳数联天下智能科技有限公司 Triple generating and checking method and device, electronic equipment and storage medium
CN113361272A (en) * 2021-06-22 2021-09-07 海信视像科技股份有限公司 Method and device for extracting concept words of media asset title
CN113361272B (en) * 2021-06-22 2023-03-21 海信视像科技股份有限公司 Method and device for extracting concept words of media asset title
CN113609838A (en) * 2021-07-14 2021-11-05 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Document information extraction and mapping method and system
CN113743090B (en) * 2021-09-08 2024-04-12 度小满科技(北京)有限公司 Keyword extraction method and device
CN113743090A (en) * 2021-09-08 2021-12-03 度小满科技(北京)有限公司 Keyword extraction method and device
CN113821605A (en) * 2021-10-12 2021-12-21 广州汇智通信技术有限公司 Event extraction method
CN113705198A (en) * 2021-10-21 2021-11-26 北京达佳互联信息技术有限公司 Scene graph generation method and device, electronic equipment and storage medium
CN113971216A (en) * 2021-10-22 2022-01-25 北京百度网讯科技有限公司 Data processing method and device, electronic equipment and memory
CN114186552A (en) * 2021-12-13 2022-03-15 北京百度网讯科技有限公司 Text analysis method, device and equipment and computer storage medium
CN114186552B (en) * 2021-12-13 2023-04-07 北京百度网讯科技有限公司 Text analysis method, device and equipment and computer storage medium
CN114997398A (en) * 2022-03-09 2022-09-02 哈尔滨工业大学 Knowledge base fusion method based on relation extraction
CN114997398B (en) * 2022-03-09 2023-05-26 哈尔滨工业大学 Knowledge base fusion method based on relation extraction
CN115062609A (en) * 2022-08-19 2022-09-16 北京语言大学 Method and device for enhancing syntax dependence of Chinese language
CN115062609B (en) * 2022-08-19 2022-12-09 北京语言大学 Method and device for enhancing syntax dependence of Chinese language
CN115238217A (en) * 2022-09-23 2022-10-25 山东省齐鲁大数据研究院 Method for extracting numerical information from bulletin text and terminal
CN116361422B (en) * 2023-06-02 2023-09-19 深圳得理科技有限公司 Keyword extraction method, text retrieval method and related equipment
CN116361422A (en) * 2023-06-02 2023-06-30 深圳得理科技有限公司 Keyword extraction method, text retrieval method and related equipment
CN117609518A (en) * 2024-01-17 2024-02-27 江西科技师范大学 Hierarchical Chinese entity relation extraction method and system for centering structure
CN117609518B (en) * 2024-01-17 2024-04-26 江西科技师范大学 Hierarchical Chinese entity relation extraction method and system for centering structure

Also Published As

Publication number Publication date
CN109241538B (en) 2022-12-20

Similar Documents

Publication Publication Date Title
CN109241538A (en) Chinese entity relation extraction method based on keyword and verb dependency
US10496756B2 (en) Sentence creation system
CN110502642B (en) Entity relation extraction method based on dependency syntactic analysis and rules
CN102662936B (en) Chinese-English unknown word translation method combining Web mining, multiple features and supervised learning
Pourvali et al. Automated text summarization base on lexicales chain and graph using of wordnet and wikipedia knowledge base
CN109960756A (en) Media event information inductive method
CN109522418A (en) Semi-automatic knowledge graph construction method
Sahu et al. Prashnottar: a Hindi question answering system
CN110851714A (en) Text recommendation method and system based on heterogeneous topic model and word embedding model
Lynn et al. An improved method of automatic text summarization for web contents using lexical chain with semantic-related terms
Verma et al. A new approach for idiom identification using meanings and the web
CN103678287A (en) Method for unifying keyword translation
Bella et al. Domain-based sense disambiguation in multilingual structured data
Adhitama et al. Topic labeling towards news document collection based on Latent Dirichlet Allocation and ontology
Abend et al. Fully unsupervised core-adjunct argument classification
Ogrodniczuk et al. Rule-based coreference resolution module for Polish
Ung et al. Combination of features for vietnamese news multi-document summarization
Tran et al. A named entity recognition approach for tweet streams using active learning
CN107480142B (en) Method for extracting evaluation object based on dependency relationship
Ibrahiem et al. FEATURE EXTRACTION ENCHANCEMENT IN USERS’ATTITUDE DETECTION
Sahin Classification of turkish semantic relation pairs using different sources
Ji et al. Measurement of sentence similarity based on constituency parsing and dilated convolution
Wu et al. Fine-grained product feature extraction in chinese reviews
Ding et al. Building chinese event type paradigm based on trigger clustering
AU2021105953A4 (en) Method for fine-grained domain terminology self-learning based on contextual semantics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant