CN109241538B - Chinese entity relation extraction method based on dependency of keywords and verbs - Google Patents

Chinese entity relation extraction method based on dependency of keywords and verbs

Info

Publication number
CN109241538B
Authority
CN
China
Prior art keywords
entity
verb
relationship
subject
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811124153.1A
Other languages
Chinese (zh)
Other versions
CN109241538A (en
Inventor
许青青
谢赟
韩欣
卓建飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Datatom Information Technology Co ltd
Original Assignee
Shanghai Datatom Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Datatom Information Technology Co ltd
Priority to CN201811124153.1A
Publication of CN109241538A
Application granted
Publication of CN109241538B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a Chinese entity relation extraction method based on keyword and verb dependency. Taking large-scale unstructured free text as the target text, the method first segments the text into words and extracts keywords to form a text keyword lexicon. It then performs sentence splitting, word segmentation, part-of-speech tagging, named entity recognition and dependency parsing on the text, and builds an entity corpus by combining the named entity lexicon with the keyword lexicon. It constructs entity relation syntactic rules that start from the verb, based on the compositional characteristics of Chinese sentences, their syntactic structure and the dependency relations between words, and matches each sentence in the text against these rules. Finally, it outputs the relation triples to obtain the set of relation triples for the text. The method makes entity relation extraction from large-scale Chinese text more effective and accurate.

Description

Chinese entity relation extraction method based on dependency of keywords and verbs
Technical Field
The invention relates to a Chinese entity relation extraction method, and in particular to a method for extracting relations from large-scale free text based on keywords and verb dependency analysis.
Background
With the rapid development of internet information technology, the amount of text on the internet has grown explosively. How to extract the information people need from large-scale text quickly and accurately has become a research hotspot, and information extraction techniques have emerged in response. Entity relation extraction is an important component of information extraction: it mines the semantic associations between entities from natural language text, which helps establish domain ontologies and promotes the construction of knowledge graphs. Deep mining and analysis of the semantic information between entities also makes it possible to better understand a user's search intent, so that more accurate search services can be provided and the user's search experience improved.
Traditional Chinese entity relation extraction targets domain-specific text with a limited set of relation categories and entity categories, and the common approach is supervised machine learning. Supervised entity relation extraction requires a manually labeled relation corpus and predefined relation types. In reality, however, Chinese text is massive, unstructured and cross-domain, and its entity relation types are difficult to determine in advance, so supervised machine learning faces severe challenges in entity relation extraction. For large-scale free text, open unsupervised relation extraction methods are therefore receiving more and more attention.
Existing open unsupervised relation extraction methods are mainly based on clustering algorithms, heuristic rules, or syntactic analysis. Clustering-based entity relation extraction obtains entity pairs through constraints such as distance and position, groups entity pairs with similar semantics into the same cluster using some clustering algorithm, and selects a representative word as the relation expression of each cluster. This approach needs no predefined relation categories, labeled corpora or hand-written rules, which greatly reduces manual effort, but it has drawbacks: the clustering algorithm's inherent problems of choosing the number of clusters and the cluster centers, the need for a large number of related entity pairs to train well, low recall for low-frequency entity pairs, and difficulty in accurately inducing a relation descriptor for each category. Heuristic-rule-based entity relation extraction generally analyzes the structural features of a large number of Chinese entity relation instances to summarize the distribution of distances between entity pairs and of the positions of relation indicator words in common triples, then generates candidate triples by constraining the distance between entities and the positions of the relation indicator words, and finally filters the candidate triples.
Some studies mine relation indicator words using global ranking and type ranking, and then filter relation triples using the relation indicator words and sentence pattern rules. The entity relation triples extracted this way are highly accurate, but constraining the distance between entity pairs may filter out some related entity pairs, and an incomplete set of relation indicator words may filter out semantically related triples, so recall is low. Syntactic-analysis-based entity relation extraction identifies relations between entity pairs by analyzing the syntactic structure of sentences and the dependency relations between words. It is simple to operate, but the relation word obtained may be a combination of several words, and in some complex Chinese sentences the combined words do not necessarily form a correct phrase, so the relation expression may be abstract and its meaning vague. Another syntactic-analysis-based approach mines the dependency semantics contained in the shortest dependency path between entities, derives dependency semantic paradigms using features such as dependency relation, part of speech and position as constraints, and extracts a relation triple if the dependency path between an entity pair in the input text matches one of the proposed paradigms. This approach extracts relation words relatively accurately, with low computational complexity and high efficiency, but its results are not ideal for complex text, especially sentences containing several entities.
In general, however, entity relation extraction based on syntactic analysis has low computational complexity and can cope with entity relation extraction from large-scale, cross-domain Chinese text.
Many contemporary schools of grammar, for Chinese and foreign languages alike, emphasize that the verb is the core of sentence structure and advocate a "verb-centered" view. The objective syntactic and semantic associations between terms are mainly expressed through the constraints that verb terms place on other terms (mainly noun terms), so the verb can serve as the starting point for examining the semantic structure of a sentence, and the semantic combination relations between the verb and the nominal constituents before and after it can be mined. Statistical analysis of a large number of relation instances shows that the verb connecting two entities can generally express the semantic relation between them. The constituents that depend on the verb can therefore be analyzed to find the terms that form subject-predicate, verb-object, verb-complement and similar dependency relations with the verb; if those terms are entities, more accurate entity relation triples can be constructed. For example, for a verb in a sentence, if a subject and an object that depend on the verb are found and both are entities, the subject, verb and object can be assembled into a triple.
Research shows that entity relation extraction in non-specific domains currently focuses on relations among people, organizations and places. In some texts, such as an article introducing big data or a product specification, named entities such as people, organizations and places occur infrequently, and if only the semantic relations among named entities are extracted, it is difficult to mine the deeper semantic relations of the text. Words that appear frequently in a document, nouns in particular, reflect the main content of the document to a certain extent. Key nouns can therefore be added to expand the document's entity set, so that more relations among the main information in the text are mined and a richer semantic network is constructed.
Disclosure of Invention
The invention aims to provide an unsupervised relation extraction method that makes relation extraction from cross-domain free text more accurate, namely a Chinese entity relation extraction method based on keyword and verb dependency, so that entity relation extraction from large-scale Chinese text becomes more effective and accurate.
The technical solution for achieving this purpose is as follows:
A Chinese entity relation extraction method based on keyword and verb dependency comprises the following steps:
segmenting the text into words, extracting keywords, and generating a text keyword lexicon;
splitting the text into sentences, and performing word segmentation, part-of-speech tagging, named entity recognition and dependency parsing on each single sentence to obtain its word segmentation, part-of-speech, named entity and dependency parse information;
acquiring the verb set and the entity set of each single sentence;
when the numbers of verbs and of entities in a single sentence are both greater than 0, analyzing whether the terms that depend on a verb match a relation syntactic rule; if so, obtaining an initial entity relation triple and then expanding it; otherwise, moving on to match the next verb of the sentence;
and, after relation extraction has been performed on all single sentences in the text, obtaining the text triple set.
Preferably, the input text is split into sentences at periods, exclamation marks and question marks to obtain a set of single sentences.
Preferably, when extracting keywords, the word segmentation results are filtered by part of speech so that only nominal terms are kept as candidate keywords; the TF-IDF weight of each candidate keyword is calculated, and words whose weight exceeds a set threshold are added to the text keyword set; here TF is the number of times a word appears in the text, and IDF is the inverse document frequency.
Preferably, the entity set consists of the text's global keyword set and its named entities.
Preferably, the relation syntactic rule takes a verb as the candidate relation word according to the dependency syntactic structure of the sentence, and analyzes whether the dependency relations between other terms in the sentence and the verb are subject-predicate, verb-object, preposition-object or verb-complement relations; if two terms in the sentence stand in two of these relations to the verb (for example, subject-predicate and verb-object, or subject-predicate and preposition-object) and both terms are entities, an initial entity relation triple can be determined.
Preferably, the relation syntactic rules include isA rules for judgment (copular) verbs and non-isA rules for verbs of all other classes.
Preferably, in the isA rules, the rule in which an entity is related to the relation word covers sentences of the form "Entity1 + Noun + is + Entity2" or "Entity2 + is + Entity1 + Noun", and the entity relation triple is preliminarily expressed as (Entity1, Noun, Entity2); here Entity1 and Entity2 are the entity pair in the sentence, one of which is in a subject-predicate or verb-object relation with the judgment verb, while the other has no direct relation with the judgment verb; Noun is a noun in the sentence that is in a subject-predicate or verb-object relation with the judgment verb, and the entity that has no direct relation with the judgment verb is in an attributive (ATT) dependency relation with Noun, modifying it;
in the isA rules, the rule in which an entity is unrelated to the relation word covers sentences in which an entity is in a subject-predicate relation with the judgment verb, a noun is in a verb-object relation with the judgment verb, and that entity is coordinated with one or more other entities; the sentence structure can be expressed as "Entity1 + Conj + Entity2(+) + is + Noun", and the relation triple can be preliminarily expressed as (Entity1, Noun, Entity2); here Entity1 and Entity2 are the entity pair in the sentence, Entity2(+) indicates that one or more entities may be coordinated with Entity1, and Noun is a noun in the sentence.
Preferably, the non-isA rules include verb-with-subject rules and verb-without-subject rules;
the verb-with-subject rules cover the subject-predicate-object structure, the subject-predicate-preposition-object structure, the subject-predicate-complement structure, the fronted-object-preposition-object structure and other structures, specifically:
the subject-predicate-object structure: starting from a verb, according to the dependency parse, the verb has both a subject and an object and both are entities, so an initial entity relation triple can be built;
the subject-predicate-preposition-object structure: starting from a verb, according to the dependency parse, the verb has a subject that is an entity, and a preposition depending on the verb has an object that is an entity, so an initial entity relation triple can be extracted;
the subject-predicate-complement structure: starting from a verb that lacks a direct object, according to the dependency parse, the verb has a subject that is an entity, a complement depending on the verb exists, and the complement has an object that is an entity, so an initial entity relation triple can be formed;
the fronted-object-preposition-object structure: starting from a verb, according to the dependency parse, a fronted object depending on the verb exists and is an entity, and a preposition depending on the verb exists whose object is an entity, so an initial entity relation triple can be formed;
the other structures: starting from a verb, according to the dependency parse, the verb has a subject that is an entity, and another term that depends on the verb through some other structure exists and is an entity, so a relation triple can be constructed;
the verb-without-subject rules cover the coordinated-verb structure and the subjectless-sentence structure, specifically:
the coordinated-verb structure: a term in the sentence can directly form a verb-object relation with the verb, or indirectly form a preposition-object or complement-object relation with it, and that term is an entity; no term forms a subject-predicate relation with the verb, but there is another verb coordinated with it, and coordinated verbs share the same subject, so the subject of the coordinated verb is taken as the subject and an entity relation triple can be constructed;
the subjectless-sentence structure: the sentence has no subject, but a term can directly form a verb-object relation with the verb, or indirectly form a preposition-object or complement-object relation with it, and that term is an entity; following a heuristic rule of Chinese, the preceding sentence is traced back and the subject of its core verb is used as the subject of the current sentence; dependency grammar holds that the core verb is the central component of a sentence and governs the other components, whereas a sentence may contain several verbs, each of which may have its own subject, so the rule takes only the subject of the core verb of the preceding sentence as the subject of the current sentence.
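The two subject-recovery fallbacks described above can be sketched as follows. This is only an illustration under assumptions: the `Token` layout, the part-of-speech tag `"v"`, the `"ROOT"` marker for the core word, and the example sentence are hypothetical and are not taken from the patent.

```python
from typing import NamedTuple, Optional

class Token(NamedTuple):
    word: str
    pos: str    # coarse part-of-speech tag; "v" for verbs is an assumed tag set
    head: int   # index of the governing token, -1 for the core word of the sentence
    rel: str    # dependency label (SBV, VOB, COO, ...)

def subject_of(tokens, verb) -> Optional[int]:
    """Return the index of the verb's subject (SBV dependent), or None."""
    for i, t in enumerate(tokens):
        if t.head == verb and t.rel == "SBV":
            return i
    return None

def resolve_subject(tokens, verb, prev_core_subject=None) -> Optional[str]:
    """Find a subject word for `verb`, applying the two fallbacks in order."""
    own = subject_of(tokens, verb)
    if own is not None:
        return tokens[own].word
    # Coordinated-verb structure: borrow the subject of the coordinated verb.
    t = tokens[verb]
    if t.rel == "COO" and t.head >= 0 and tokens[t.head].pos == "v":
        shared = subject_of(tokens, t.head)
        if shared is not None:
            return tokens[shared].word
    # Subjectless sentence: use the subject of the previous sentence's core verb.
    return prev_core_subject

# Hypothetical sentence: "Wang Wu visited the United States and met Li Si."
tokens = [
    Token("Wang Wu", "nh", 1, "SBV"),
    Token("visited", "v", -1, "ROOT"),
    Token("United States", "ns", 1, "VOB"),
    Token("met", "v", 1, "COO"),
    Token("Li Si", "nh", 3, "VOB"),
]
print(resolve_subject(tokens, 3))  # Wang Wu
```

The caller is assumed to keep track of the previous sentence's core-verb subject and pass it in as `prev_core_subject`.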
Preferably, expanding the entity relation triple includes entity word expansion, relation word expansion and coordinated triple expansion, specifically:
entity word expansion combines a keyword entity with its attributive modifiers;
relation word expansion includes adding negation words and adding non-entity objects;
coordinated triple expansion forms new triples from the coordinated entities and the relation word when an entity in an acquired entity relation triple has coordinated entities.
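As a rough sketch of two of these expansions (coordinated triple expansion and adding a negation word to the relation word), one might write the following. The function names, the `coordinated` mapping and the example data are hypothetical; the text above does not specify how coordinated entities or negation words are looked up, so both are simply passed in.

```python
def expand_coordinated(triple, coordinated):
    """Return the triple plus one new triple per coordinated entity.

    triple: (entity1, relation, entity2)
    coordinated: dict mapping an entity to the entities coordinated with it
    """
    e1, rel, e2 = triple
    expanded = [triple]
    for alt in coordinated.get(e1, []):
        expanded.append((alt, rel, e2))
    for alt in coordinated.get(e2, []):
        expanded.append((e1, rel, alt))
    return expanded

def add_negation(triple, negation_word, sep=""):
    """Prefix the relation word with a negation word, if one was found."""
    e1, rel, e2 = triple
    if not negation_word:
        return triple
    return (e1, negation_word + sep + rel, e2)

# Hypothetical data, for illustration only.
base = ("Wang Wu", "visits", "United States")
print(expand_coordinated(base, {"United States": ["Canada"]}))
print(add_negation(base, "not", sep=" "))
```

For Chinese text the default `sep=""` joins the negation word and the verb directly.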
The beneficial effects of the invention are as follows. Unlike general Chinese entity relation extraction, which only concerns relations among named entities (names of people, places and organizations), the method expands the traditional named entity set with the global keywords of the text, adds the semantic relations of the keywords that describe the main content of the text, and enriches the resulting text semantic network. Common unsupervised entity relation extraction methods based on syntactic analysis mostly search for relation words starting from entity pairs, which is inefficient; the method of the invention instead starts from the verb.
Drawings
FIG. 1 is a flow chart of entity relationship extraction according to the present invention;
FIG. 2 is a diagram of the syntactic relationship rule classification of the present invention;
FIG. 3 is a diagram of a first dependency parsing example of the present invention;
FIG. 4 is a diagram of a second dependency parsing example of the present invention;
FIG. 5 is a diagram of a third dependency parsing example of the present invention;
FIG. 6 is a schematic diagram of a first relation syntactic rule proposed by the present invention;
FIG. 7 is a schematic diagram of a second relation syntactic rule proposed by the present invention;
FIG. 8 is a schematic diagram of a third relation syntactic rule proposed by the present invention;
FIG. 9 is a schematic diagram of a fourth relation syntactic rule proposed by the present invention;
FIG. 10 is a diagram illustrating an example of parsing in accordance with the present invention;
FIG. 11 is a diagram of two examples of the present invention according to syntactic analysis.
Detailed Description
The invention will be further explained with reference to the drawings.
The Chinese entity relation extraction method based on keyword and verb dependency of the invention analyzes the dependency relations of verbs to extract entity relations from large-scale free text and provides data support for constructing text semantic networks. It specifically comprises the following steps:
Step 1, segmenting the input text into words, extracting keywords, and generating a keyword lexicon from the extracted keywords.
The purpose of extracting the global keywords of the text is to expand the traditional entity set. The traditional entity set contains only named entities such as names of people, places and organizations, but the invention targets large-scale text that is not restricted to any domain; if a document contains almost no names of people, places or organizations, no entity relations could be extracted. The invention therefore treats keywords as part of the entity set when extracting entity relations and mining the semantic relations in the document. Keywords are words or phrases that represent the topic of the text, and most of them are nouns. A keyword of a document tends to appear frequently in that document and rarely in other documents. Accordingly, when extracting keywords, the text is segmented into words, non-nominal words are filtered out by part of speech so that only nominal terms remain as candidate keywords, the TF-IDF (term frequency-inverse document frequency) weight of each candidate keyword is calculated, and words whose weight exceeds a set threshold are taken as text keywords and added to the text keyword set. Here TF is the number of times a word appears in the text and IDF is the inverse document frequency; TF-IDF features tend to pick out words that appear often in a given document but are uncommon in other documents.
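The keyword step can be sketched as below. This is a minimal illustration, not the patented implementation: the function name, the noun tag prefix `"n"`, the smoothed IDF formula `log(N / (1 + df))` and the toy data are all assumptions, since the text does not give an exact formula or tag set.

```python
import math
from collections import Counter

def tfidf_keywords(doc_tokens, corpus, threshold, noun_tags=("n",)):
    """Select nominal terms whose TF-IDF weight exceeds `threshold`.

    doc_tokens: list of (word, pos) pairs for the target document
    corpus: list of documents, each a list of words, used to compute IDF
    """
    # Keep only nominal terms as candidate keywords.
    candidates = [w for w, pos in doc_tokens if pos.startswith(noun_tags)]
    tf = Counter(candidates)          # TF: raw count in the document
    n_docs = max(len(corpus), 1)
    keywords = {}
    for word, count in tf.items():
        df = sum(1 for doc in corpus if word in doc)
        idf = math.log(n_docs / (1 + df))   # assumed smoothed IDF variant
        weight = count * idf
        if weight > threshold:
            keywords[word] = weight
    return keywords

# Toy data, for illustration only.
doc = [("data", "n"), ("data", "n"), ("is", "v"), ("platform", "n")]
corpus = [["data", "is", "platform"], ["platform", "good"], ["platform", "big"]]
print(tfidf_keywords(doc, corpus, threshold=0.5))
```

With this toy corpus only "data" passes the threshold, because "platform" occurs in every document and so receives a low IDF.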
Step 2, splitting the input text into sentences at periods, exclamation marks and question marks, and outputting a set of single sentences.
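A minimal sketch of this sentence-splitting step, assuming full-width Chinese end punctuation plus the ASCII exclamation and question marks (the function name and the exact punctuation set are illustrative choices, not specified by the text):

```python
import re

# A sentence is a run of non-terminator characters followed by an optional terminator.
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]?")

def split_sentences(text):
    """Split text into single sentences at periods, exclamation and question marks."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]

print(split_sentences("中国的首都是北京。王五访问美国！你好吗？"))
# ['中国的首都是北京。', '王五访问美国！', '你好吗？']
```

The ASCII period is deliberately left out of the terminator set here so that decimal numbers are not split; that is a design choice of this sketch.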
Step 3, performing word segmentation, part-of-speech tagging, named entity recognition and dependency parsing on each single sentence to obtain its word segmentation, part-of-speech, named entity and dependency parse information.
Named entity recognition mainly obtains entities such as names of people, places and organizations in a sentence. Dependency parsing analyzes the dependency relations among the language units (terms) of a sentence in order to identify grammatical constituents such as subject, predicate and object, and attributive, adverbial and complement.
Step 4, acquiring the verb set and the entity set of each single sentence; if the numbers of verbs and of entities in the sentence are both greater than 0, proceeding to step 5, and otherwise finishing the processing of the sentence.
The entities include not only named entities but also the global keyword set that describes the subject content of the text, which makes the text entity lexicon richer.
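Step 4 can be sketched as follows, assuming each sentence is available as a list of (word, part-of-speech) pairs and that verbs carry the tag `"v"`. The function names and the tag are assumptions made for illustration.

```python
def verbs_and_entities(tokens, named_entities, keywords, verb_tag="v"):
    """Return (verb indices, entity indices) for one single sentence.

    tokens: list of (word, pos) pairs
    named_entities: named entity words found in the text
    keywords: global keyword set of the text
    """
    # The entity lexicon is the union of named entities and global keywords.
    entity_lexicon = set(named_entities) | set(keywords)
    verbs = [i for i, (_, pos) in enumerate(tokens) if pos == verb_tag]
    entities = [i for i, (word, _) in enumerate(tokens) if word in entity_lexicon]
    return verbs, entities

def should_match_rules(tokens, named_entities, keywords):
    """Proceed to rule matching only if the sentence has both verbs and entities."""
    verbs, entities = verbs_and_entities(tokens, named_entities, keywords)
    return len(verbs) > 0 and len(entities) > 0

sentence = [("Wang Wu", "nh"), ("visits", "v"), ("United States", "ns")]
print(verbs_and_entities(sentence, {"Wang Wu", "United States"}, set()))
# ([1], [0, 2])
```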
Step 5, analyzing whether the terms that depend on a verb match a relation syntactic rule; if so, obtaining an initial entity relation triple and then expanding it; otherwise, moving on to match the next verb of the sentence.
The relation syntactic rules are established before entity relation extraction and are constructed from dependency parsing combined with some heuristic rules of Chinese grammar. In modern Chinese grammar, the verb constrains other semantic constituents of the sentence, such as the agent and the patient, so the verb can serve as the starting point for analyzing the semantic structure of a sentence and mining the semantic combination relations between the verb and the nominal constituents before and after it. The relation syntactic rules of the invention therefore start from the verb and analyze the constituents that depend on it; for example, when the subject and the object of a verb both exist and are both entities, the subject-predicate-object structure forms an entity relation triple. Matching a relation syntactic rule yields only an initial entity relation triple, which needs to be further modified and supplemented.
Step 6, after relation extraction has been performed on all sentences in the text, obtaining the text triple set.
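The control flow of steps 4 to 6 can be sketched as a small driver loop. The interface is hypothetical: rules are assumed to be callables of the form `rule(tokens, verb, entities)` that return a triple or `None`, and `expand` is assumed to map an initial triple to a list of expanded triples.

```python
def extract_triples(sentences, rules, expand=lambda triple: [triple]):
    """Collect relation triples over all single sentences of a text.

    sentences: iterable of (tokens, verb_indices, entities), one per sentence
    rules: callables rule(tokens, verb, entities) -> triple or None, tried in order
    expand: callable turning an initial triple into a list of expanded triples
    """
    triples = []
    for tokens, verbs, entities in sentences:
        if not verbs or not entities:        # step 4: nothing to extract here
            continue
        for verb in verbs:                   # step 5: try each verb in turn
            for rule in rules:
                initial = rule(tokens, verb, entities)
                if initial is not None:
                    triples.extend(expand(initial))
                    break                    # first matching rule wins for this verb
        # if no rule matched a verb, the loop simply moves to the next verb
    return triples                           # step 6: the text triple set
```

Stopping at the first matching rule per verb is a simplification of this sketch; the text does not say whether several rules may fire for the same verb.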
In summary, the core of the invention lies in the relation syntactic rules and the expansion of entity relation triples, which are described in detail below:
1. Relation syntactic rules
In modern Chinese grammar, the verb is the anchor point for the semantic analysis of a sentence and constrains other semantic constituents of the sentence such as the agent and the patient. Analysis of a large number of entity relation instances shows that entity relation triples consistently appear in certain fixed syntactic structures, in which the verb often plays a connecting role, so the verb that connects two entities can generally express the semantic relation between them. The main relations labeled in a dependency syntactic structure are shown in Table 1; the structures related to verbs are the subject-predicate relation, verb-object relation, fronted object, verb-complement structure, adverbial structure, preposition-object relation, and so on. Combinations of these structures are mapped into relation syntactic rules, which are then applied to entity relation extraction. One class of verbs, the judgment verbs (such as the verb "is", 是), is generally not used as the relation word in relation extraction; instead, the noun associated with it expresses the relation. The invention therefore handles judgment verbs separately and divides the relation rules into two major classes, isA rules and non-isA rules, each of which is further divided into subclasses (see FIG. 2). The isA rules comprise the rule in which an entity is related to the relation word and the rule in which an entity is unrelated to the relation word. The non-isA rules comprise verb-with-subject rules and verb-without-subject rules: the verb-with-subject rules cover the subject-predicate-object structure, subject-predicate-preposition-object structure, subject-predicate-complement structure, fronted-object-preposition-object structure and other structures; the verb-without-subject rules cover the coordinated-verb structure and the subjectless-sentence structure.
Type of relationship            Dependency label
Subject-predicate relation      SBV
Verb-object relation            VOB
Fronted object                  FOB
Adverbial structure             ADV
Verb-complement structure       CMP
Preposition-object relation     POB
Attributive structure           ATT
Coordinate relation             COO
Pivotal construction            DBL
Indirect-object relation        IOB
Left adjunct relation           LAD
Right adjunct relation          RAD
Independent structure           IS
Punctuation                     WP
Core relation                   HED
TABLE 1
Table 1 lists the dependency relation labels.
1. The isA rules:
In modern Chinese, a judgment verb (such as the verb "is") mainly expresses judgment, description, or existence. Expressing judgment means stating what a thing belongs to or is equal to, for example "Beijing is the capital of China"; expressing description means stating a characteristic, condition or situation of a thing, for example "the car is red"; expressing existence means stating that something exists, for example a sentence stating that there are cattle and sheep somewhere. Only the judgment function is considered when studying entity relation extraction. As relation instances show, the verb "is" is generally not used as the relation descriptor, and the isA rules are divided into the rule in which an entity is related to the relation word and the rule in which an entity is unrelated to the relation word.
1.1 The rule in which an entity is related to the relation word:
This rule applies when the subject of the verb "is" is an entity and its object is a common noun, or the subject is a common noun and the object is an entity, and another entity is in an attributive relation with that noun. The sentence structure is expressed as "Entity1 + Noun + is + Entity2" or "Entity2 + is + Entity1 + Noun", and the entity relation triple is preliminarily expressed as (Entity1, Noun, Entity2). Here Entity1 and Entity2 are the entity pair in the sentence: one is in a subject-predicate or verb-object relation with the verb "is", and the other has no direct relation with it. Noun is a noun in the sentence that is in a subject-predicate or verb-object relation with the verb "is"; the entity that has no direct relation with the verb "is" stands in an attributive (ATT) dependency relation to Noun and modifies it. For example, for the sentence "The capital of China is Beijing.", whose dependency parse is shown in FIG. 3, the verb "is" is the core word of the sentence, with subject "capital" and object "Beijing", and "China" is the attributive modifier of "capital", so the triple (China, capital, Beijing) can be constructed. As another example, for the sentence "Zhang Ji is the son of the famous movie star Zhang San.", whose dependency parse is shown in FIG. 4, the verb "is" is the core word, with subject "Zhang Ji" and object "son", and "Zhang San" is the attributive modifier of "son", so the triple (Zhang San, son, Zhang Ji) can be formed.
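The rule in 1.1 can be sketched over a toy dependency parse as follows. The `Token` layout, the `"ROOT"` marker for the core word and the helper names are assumptions made for illustration; the example parse is written by hand for the capital-of-China sentence and is not the output of any parser.

```python
from typing import NamedTuple

class Token(NamedTuple):
    word: str
    head: int   # index of the governing token, -1 for the core word
    rel: str    # dependency label (SBV, VOB, ATT, ...)

def children(tokens, head, rel):
    """Indices of tokens attached to `head` with label `rel`."""
    return [i for i, t in enumerate(tokens) if t.head == head and t.rel == rel]

def isa_related_rule(tokens, verb, entities):
    """Return (Entity1, Noun, Entity2) for a judgment verb, or None."""
    subj = children(tokens, verb, "SBV")
    obj = children(tokens, verb, "VOB")
    if not subj or not obj:
        return None
    # Try both layouts: the noun as subject, or the noun as object.
    for noun, other in ((subj[0], obj[0]), (obj[0], subj[0])):
        if tokens[other].word not in entities or tokens[noun].word in entities:
            continue
        for m in children(tokens, noun, "ATT"):
            if tokens[m].word in entities:   # entity modifying the noun
                return (tokens[m].word, tokens[noun].word, tokens[other].word)
    return None

# Hand-written parse of "China / de / capital / is / Beijing".
tokens = [
    Token("China", 2, "ATT"),
    Token("de", 0, "RAD"),
    Token("capital", 3, "SBV"),
    Token("is", -1, "ROOT"),
    Token("Beijing", 3, "VOB"),
]
print(isa_related_rule(tokens, 3, {"China", "Beijing"}))
# ('China', 'capital', 'Beijing')
```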
1.2 The rule in which an entity is unrelated to the relation word:
This rule applies when the subject of the verb "is" is an entity, its object is a common noun, and one or more other entities are coordinated with the subject entity. The sentence structure is expressed as "Entity1 + Conj + Entity2(+) + is + Noun", and the entity relation triple is preliminarily expressed as (Entity1, Noun, Entity2). Here Entity1 and Entity2 are the entity pair in the sentence, Entity2(+) indicates that one or more entities may be coordinated with Entity1, and Noun is a noun in the sentence. For example, for the sentence "Xiao Li and Xiao Wang are a couple.", whose dependency parse is shown in FIG. 5, the verb "is" is the core word of the sentence, with subject "Xiao Li" and object "couple", and "Xiao Li" and "Xiao Wang" are in a coordinate relation, so the triple (Xiao Li, couple, Xiao Wang) can be formed.
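A sketch of the rule in 1.2, in which the subject entity of the judgment verb is coordinated with other entities and the object noun serves as the relation word. The `Token` layout, the `"ROOT"` marker and the hand-written parse of an illustrative sentence ("Xiao Li and Xiao Wang are a couple") are assumptions.

```python
from typing import NamedTuple

class Token(NamedTuple):
    word: str
    head: int   # index of the governing token, -1 for the core word
    rel: str    # dependency label (SBV, VOB, COO, ...)

def isa_coordinate_rule(tokens, verb, entities):
    """Return one (Entity1, Noun, Entity2) triple per entity coordinated with the subject."""
    subj = [i for i, t in enumerate(tokens) if t.head == verb and t.rel == "SBV"]
    obj = [i for i, t in enumerate(tokens) if t.head == verb and t.rel == "VOB"]
    if not subj or not obj:
        return []
    s, o = subj[0], obj[0]
    # The subject must be an entity and the object a plain (non-entity) noun.
    if tokens[s].word not in entities or tokens[o].word in entities:
        return []
    return [
        (tokens[s].word, tokens[o].word, t.word)
        for t in tokens
        if t.head == s and t.rel == "COO" and t.word in entities
    ]

tokens = [
    Token("Xiao Li", 3, "SBV"),
    Token("and", 2, "LAD"),
    Token("Xiao Wang", 0, "COO"),
    Token("are", -1, "ROOT"),
    Token("couple", 3, "VOB"),
]
print(isa_coordinate_rule(tokens, 3, {"Xiao Li", "Xiao Wang"}))
# [('Xiao Li', 'couple', 'Xiao Wang')]
```

Returning a list lets the sketch cover the case where several entities are coordinated with the subject.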
2. non-isA rules:
The non-isA rules are syntactic rules that construct entity relationships around non-judgment verbs that describe the semantic relationship between entities. Verbs are the core of both syntactic and semantic structure: syntactically, the verb determines the basic shape of the sentence structure; semantically, the semantic structure is built around the verb, which is why it is called a predicate structure, a predication structure or a verb-core structure. The invention constructs the non-isA syntactic relation rules from the mapping between basic predicate structures and typical syntactic structures. Because a verb semantically links lexical items, two lexical items governed by the verb must be obtained. The rules are divided into verb-has-subject rules and verb-has-no-subject rules according to whether the verb governs a subject, and each class is further divided into several subclasses (see fig. 2), as follows:
2.1, verb has subject rule:
(1) Subject-predicate-object structure. Starting from a verb, if according to the dependency syntax the subject and the object of the verb both exist and are both entities, an initial triple is constructed. The dependency relationship between the verb and the entities is shown in FIG. 6, where E1 and E2 denote the entities and V denotes the verb; the specific definitions of the dependency arcs are given in Table 1. For example, in "Wang Wu visits the United States.", the subject of the verb "visit" is "Wang Wu" and its object is "the United States", so the triple (Wang Wu, visit, United States) can be output.
(2) Subject-predicate-preposition structure. In this rule the verb and the entities are syntactically expressed as a subject-predicate-preposition-object structure: starting from a verb, if according to the dependency syntax the subject of the verb exists and is an entity, and a preposition depending on the verb exists whose object is an entity, an initial triple can be extracted. The dependency relationship between the verb and the entities is shown in FIG. 7, where E1 and E2 denote the entities, V denotes the verb and P denotes the preposition; the specific definitions of the dependency arcs are given in Table 1. For example, in "Li Si delivers a speech at the university", the subject of the verb "deliver" is "Li Si", and the preposition "at", which depends on the verb "deliver", has the object "university", so the initial triple (Li Si, deliver, university) can be output.
(3) Subject-predicate-complement structure. In this rule the verb and the entities are syntactically expressed as a subject-predicate-complement-object structure: starting from a verb that lacks an object, if according to the dependency syntax the subject of the verb exists and is an entity, and a complement depending on the verb exists whose object is an entity, an initial triple can be formed. The specific dependency relationship is shown in FIG. 8, where E1 and E2 denote the entities, V denotes the verb and P denotes the preposition; the specific definitions of the dependency arcs are given in Table 1. For example, in "Wang Wu graduated from Harvard University.", the object-lacking verb "graduate" has the subject "Wang Wu", and its complement "from" has the object "Harvard University", so the triple (Wang Wu, graduate, Harvard University) can be output.
(4) Preposed-object-preposition structure. In this rule the verb and the entities are syntactically expressed as a preposed-object plus preposition-object structure: starting from a verb, if according to the dependency syntax the verb has a preposed object that is an entity, and a preposition governed by the verb exists whose object is an entity, an initial triple can be formed. The specific dependency relationship is shown in FIG. 9, where E1 and E2 denote the entities, V denotes the verb and P denotes the preposition; the specific definitions of the dependency arcs are given in Table 1. Strictly speaking this rule involves no subject, but the verb still governs two lexical items, so it can be treated as a structure with a subject. It commonly occurs in passive sentences, such as "Zhang San was admitted by Fudan University.": the preposed object of the verb "admit" is "Zhang San", the preposition 被 ("by"), which depends on "admit", has the object "Fudan University", and the triple (Fudan University, admit, Zhang San) can be output.
(5) Other structures. Starting from a verb, if the subject of the verb exists and is an entity, and another structure depending on the verb exists whose object is an entity, a triple relationship can be constructed. For example, in "Li Si emphasized advancing financial reform at the enterprise symposium.", whose dependency analysis is shown in fig. 10, the verb "emphasize" has the subject "Li Si" and the object "advance", and the verb "advance" has the object "reform", so the initial triple (Li Si, emphasize, reform) can be constructed.
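Structures (1) to (4) above can be sketched as a single pass over the verbs of a parsed sentence. This is an illustration under stated assumptions, not the patented implementation: the `Token` class, the helper names, the part-of-speech tag `v`, the LTP-style labels (`SBV`, `VOB`, `FOB`, `POB`, `ADV`, `CMP`) and the hand-written parses are all hypothetical, and structure (5) is not covered.

```python
from dataclasses import dataclass

@dataclass
class Token:
    idx: int                 # 1-based position in the sentence
    word: str
    pos: str                 # assumed tags: "v" verb, "p" preposition, "n" noun
    head: int                # idx of the governing token, 0 for the root
    deprel: str              # assumed LTP-style dependency label
    is_entity: bool = False

def entity_dependent(tokens, head_idx, deprel):
    """First entity attached to head_idx by the given label, or None."""
    for t in tokens:
        if t.head == head_idx and t.deprel == deprel and t.is_entity:
            return t
    return None

def indirect_object(tokens, verb_idx, link):
    """Entity that is the POB of a word attached to the verb by `link`."""
    for mid in tokens:
        if mid.head == verb_idx and mid.deprel == link:
            obj = entity_dependent(tokens, mid.idx, "POB")
            if obj is not None:
                return obj
    return None

def verb_has_subject_rules(tokens):
    triples = []
    for verb in tokens:
        if verb.pos != "v":
            continue
        subject = entity_dependent(tokens, verb.idx, "SBV")
        fronted = entity_dependent(tokens, verb.idx, "FOB")
        if subject is not None:
            obj = (entity_dependent(tokens, verb.idx, "VOB")    # structure (1)
                   or indirect_object(tokens, verb.idx, "ADV")  # structure (2)
                   or indirect_object(tokens, verb.idx, "CMP")) # structure (3)
            if obj is not None:
                triples.append((subject.word, verb.word, obj.word))
        elif fronted is not None:                               # structure (4)
            agent = indirect_object(tokens, verb.idx, "ADV")
            if agent is not None:
                triples.append((agent.word, verb.word, fronted.word))
    return triples

# Hand-written parses (assumed arcs), English glosses for the Chinese tokens.
visit = [
    Token(1, "Wang Wu", "n", 2, "SBV", True),
    Token(2, "visit", "v", 0, "HED"),
    Token(3, "United States", "n", 2, "VOB", True),
]
graduate = [
    Token(1, "Wang Wu", "n", 2, "SBV", True),
    Token(2, "graduate", "v", 0, "HED"),
    Token(3, "from", "p", 2, "CMP"),
    Token(4, "Harvard University", "n", 3, "POB", True),
]
admit = [
    Token(1, "Zhang San", "n", 4, "FOB", True),
    Token(2, "被", "p", 4, "ADV"),
    Token(3, "Fudan University", "n", 2, "POB", True),
    Token(4, "admit", "v", 0, "HED"),
]
for parse in (visit, graduate, admit):
    print(verb_has_subject_rules(parse))
```

In the passive case the object of the preposition becomes the first element of the triple, which matches the (Fudan University, admit, Zhang San) output described in structure (4).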
2.2, verb no subject rule:
(1) Verb coordinate structure. In this rule the verb has no subject; a lexical item that is an entity can establish a verb-object relationship with the verb directly, or indirectly by means of another word; and other verbs coordinated with this verb exist whose subject is the same. An entity relationship triple can then be constructed by taking the subject of the coordinated verb as the subject. The specific dependency syntax is shown in FIG. 11, where E1, E2 and E3 denote entities, V1 and V2 denote verbs, and the dependency arc libeVOB indicates that the lexical item and the verb may be in a direct verb-object relationship or in an indirect verb-object relationship obtained by means of some word (for example, the lexical item is the object of a preposition and the preposition depends on the verb). For example, in "Li Si visits China and delivers a speech at the university.", the triple (Li Si, visit, China) can be extracted for the first verb according to the subject-predicate-object rule. The second verb "deliver" obtains the object "university" through the preposition structure but has no subject; since "deliver" is coordinated with "visit", another initial triple (Li Si, deliver, university) can be determined from the shared subject of the two verbs. The embodiment uses two coordinated verbs only as an example; the rule is not limited to two and also applies to sentences containing several coordinated verbs.
(2) Sentence-without-subject structure. In this rule the sentence has no subject; a lexical item that is an entity can establish a verb-object relationship with a verb directly, or a preposition-object or complement-object relationship indirectly. According to a heuristic rule for Chinese, the previous sentence can be traced back and the subject of its core verb is used as the subject of this sentence. Dependency syntax theory holds that the core verb is the central component of a sentence and governs the other components; a sentence may contain several verbs, each of which may have its own subject, so the rule takes only the subject of the core verb of the previous sentence as the subject of this sentence.
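The two verb-has-no-subject rules both amount to finding a subject somewhere other than on the verb itself. The sketch below is one possible reading under stated assumptions: the `Token` class, the function names, the LTP-style labels (`HED`, `SBV`, `COO`) and the parses are hypothetical, and the follow-up sentence is an invented illustration rather than an example from the text.

```python
from dataclasses import dataclass

@dataclass
class Token:
    idx: int                 # 1-based position in the sentence
    word: str
    head: int                # idx of the governing token, 0 for the root
    deprel: str              # assumed LTP-style dependency label
    is_entity: bool = False

def entity_subject(tokens, verb_idx):
    """Word of the entity attached to the verb by an SBV arc, or None."""
    for t in tokens:
        if t.head == verb_idx and t.deprel == "SBV" and t.is_entity:
            return t.word
    return None

def core_verb_subject(tokens):
    """Subject of the sentence's core word (the token labelled HED)."""
    for t in tokens:
        if t.deprel == "HED":
            return entity_subject(tokens, t.idx)
    return None

def resolve_subject(tokens, verb, previous_subject=None):
    """Own subject, else the subject of a coordinated verb, else the
    subject carried over from the previous sentence."""
    by_idx = {t.idx: t for t in tokens}
    current, seen = verb, set()
    while current is not None and current.idx not in seen:
        subject = entity_subject(tokens, current.idx)
        if subject is not None:
            return subject
        seen.add(current.idx)
        # Follow the COO arc to the verb this one is coordinated with.
        current = by_idx.get(current.head) if current.deprel == "COO" else None
    return previous_subject

# "Li Si visits China and delivers a speech at the university." (assumed arcs)
coordinated = [
    Token(1, "Li Si", 2, "SBV", True),
    Token(2, "visit", 0, "HED"),
    Token(3, "China", 2, "VOB", True),
    Token(4, "at", 6, "ADV"),
    Token(5, "university", 4, "POB", True),
    Token(6, "deliver", 2, "COO"),
    Token(7, "speech", 6, "VOB"),
]
# Hypothetical subjectless follow-up sentence: "Delivers a speech at the university."
follow_up = [
    Token(1, "at", 3, "ADV"),
    Token(2, "university", 1, "POB", True),
    Token(3, "deliver", 0, "HED"),
    Token(4, "speech", 3, "VOB"),
]
print(resolve_subject(coordinated, coordinated[5]))
print(resolve_subject(follow_up, follow_up[2],
                      previous_subject=core_verb_subject(coordinated)))
```

Because the loop keeps following `COO` arcs, the same function handles chains of more than two coordinated verbs.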
2. Entity relationship triplet extension
Satisfying a syntactic relation rule yields only an initial triple; the invention further expands the triple through entity expansion, relation word expansion and parallel triple expansion, specifically:
(1) Entity expansion. The purpose of entity expansion is to merge noun phrases that were split into several lexical items during word segmentation. The invention merges keyword entities (entities other than person names, place names and organization names) with their attributive modifiers: if the word preceding the keyword entity is in an attributive (modifier-head) relationship with it and is not a quantifier, or if the keyword and the preceding non-quantifier lexical items form a continuous attributive structure, the preceding words are merged with the keyword entity. In the above example "Li Si emphasized advancing financial reform at the enterprise symposium.", the initial triple (Li Si, emphasize, reform) was extracted; "financial" is an attributive that modifies "reform", so the two are merged and the triple is updated to (Li Si, emphasize, financial reform).
(2) Relation word expansion. To describe the relationship between entities more accurately and concretely, negative adverbs and non-entity objects are added to the relation word. If the verb serving as the candidate relation word is preceded by a negative adverb modifier, the expression has the opposite meaning to the verb, so the negative adverb must be added to the relation word. For example, for "Zhang San dislikes the sea", the initial triple extracted by the syntactic relation rules is (Zhang San, like, sea), and relation word expansion updates it to (Zhang San, dislike, sea). In addition, if a non-entity word serves as the object of the verb in the sentence, adding it makes the relationship between the two entities clearer: the triple for "Li Si delivers a speech at the university" in the above embodiment can be expanded from the initial (Li Si, deliver, university) to (Li Si, deliver a speech, university).
(3) Parallel triple expansion. Sentences often contain coordinated subjects or coordinated objects; when such a subject or object is an entity in an entity relationship triple, the coordinated entities must be combined into new triples. For example, for the sentence "Xiaofang, Xiaohong and Xiaohua are the daughters of Xiao Li and Xiao Wang.", the initial triple extracted by the syntactic relation rules provided by the invention is (Xiao Li, daughter, Xiaofang), and parallel entity expansion adds 5 further triples: (Xiao Li, daughter, Xiaohong), (Xiao Li, daughter, Xiaohua), (Xiao Wang, daughter, Xiaofang), (Xiao Wang, daughter, Xiaohong) and (Xiao Wang, daughter, Xiaohua).
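The three expansion steps can be sketched as three small functions over a parsed sentence. Everything here is an assumption made for illustration: the `Token` class, the function names, the negation list, the quantifier tags `m` and `q`, the LTP-style labels (`ATT`, `ADV`, `VOB`, `COO`) and the hand-written parses. English glosses are joined with spaces, so the negated relation comes out as "not like" rather than a single word.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class Token:
    idx: int                 # 1-based position in the sentence
    word: str
    pos: str                 # assumed tags: "m"/"q" numeral/classifier, "v", "n", "d"
    head: int                # idx of the governing token, 0 for the root
    deprel: str              # assumed LTP-style dependency label
    is_entity: bool = False

NEGATIONS = {"不", "没", "没有", "未", "not"}  # assumed negative adverbs
QUANTIFIER_POS = {"m", "q"}

def dependents(tokens, head_idx, deprel):
    return [t for t in tokens if t.head == head_idx and t.deprel == deprel]

def expand_entity(tokens, entity, sep=" "):
    """Merge the contiguous non-quantifier ATT modifiers left of the entity."""
    by_idx = {t.idx: t for t in tokens}
    merged, allowed_heads = [entity], {entity.idx}
    i = entity.idx - 1
    while i in by_idx:
        t = by_idx[i]
        if (t.deprel != "ATT" or t.head not in allowed_heads
                or t.pos in QUANTIFIER_POS):
            break
        merged.insert(0, t)
        allowed_heads.add(t.idx)  # allows chained modifiers
        i -= 1
    return sep.join(t.word for t in merged)

def expand_relation(tokens, verb, sep=" "):
    """Add negative adverbs before the verb and non-entity objects after it."""
    negs = [t.word for t in dependents(tokens, verb.idx, "ADV")
            if t.word in NEGATIONS]
    objs = [t.word for t in dependents(tokens, verb.idx, "VOB")
            if not t.is_entity]
    return sep.join(negs + [verb.word] + objs)

def expand_parallel(tokens, e1, relation, e2):
    """Combine every entity coordinated with e1 with every one coordinated with e2."""
    group1 = [e1] + [t for t in dependents(tokens, e1.idx, "COO") if t.is_entity]
    group2 = [e2] + [t for t in dependents(tokens, e2.idx, "COO") if t.is_entity]
    return [(a.word, relation, b.word) for a, b in product(group1, group2)]

reform = [
    Token(1, "Li Si", "n", 2, "SBV", True), Token(2, "emphasize", "v", 0, "HED"),
    Token(3, "advance", "v", 2, "VOB"), Token(4, "financial", "n", 5, "ATT"),
    Token(5, "reform", "n", 3, "VOB", True),
]
dislike = [
    Token(1, "Zhang San", "n", 3, "SBV", True), Token(2, "not", "d", 3, "ADV"),
    Token(3, "like", "v", 0, "HED"), Token(4, "sea", "n", 3, "VOB", True),
]
daughters = [
    Token(1, "Xiaofang", "n", 4, "SBV", True), Token(2, "Xiaohong", "n", 1, "COO", True),
    Token(3, "Xiaohua", "n", 1, "COO", True), Token(4, "is", "v", 0, "HED"),
    Token(5, "Xiao Li", "n", 7, "ATT", True), Token(6, "Xiao Wang", "n", 5, "COO", True),
    Token(7, "daughter", "n", 4, "VOB"),
]
print(expand_entity(reform, reform[4]))
print(expand_relation(dislike, dislike[2]))
print(expand_parallel(daughters, daughters[4], "daughter", daughters[0]))
```

The Cartesian product of the two coordinate groups yields the initial triple plus the five additional ones listed in the example.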
The above embodiments are provided only for illustrating the present invention and not for limiting the present invention, and those skilled in the art can make various changes and modifications without departing from the spirit and scope of the present invention, therefore all equivalent technical solutions should also fall into the scope of the present invention, and should be defined by the claims.

Claims (6)

1. A Chinese entity relation extraction method based on dependency of keywords and verbs is characterized by comprising the following steps:
segmenting words of the text, extracting keywords and generating a text keyword lexicon;
performing word segmentation, part-of-speech tagging, named entity identification and dependency syntactic analysis on each single sentence to obtain word segmentation, part-of-speech, named entity and dependency syntactic analysis information of each single sentence;
obtaining a verb set and an entity set in each single sentence;
when the number of verbs and entities in a single sentence is larger than 0, analyzing whether the lexical items depending on the verbs are matched with the syntactic relation rule or not, if so, obtaining an initial entity relation triple, and then expanding the entity relation triple; otherwise, matching the next verb of the single sentence;
after extracting the relation of all the single sentences in the text, obtaining a text triple set;
when extracting keywords, the word segmentation results are first filtered according to part-of-speech features, retaining only the lexical items that pass the part-of-speech filter as candidate keywords; the TF-IDF weights of the candidate keywords are then calculated; and finally the words whose weights are larger than a set threshold are entered into the text keyword set; wherein TF refers to the number of times a word occurs in the text, and IDF refers to the inverse document frequency;
the entity set consists of a text global keyword set and named entities;
the entity relationship triple is expanded, and the expansion comprises entity word expansion, relation word expansion and parallel triple expansion, and specifically comprises the following steps:
the entity word expansion is to merge the keyword entity with its attributive modifiers;
the relation word expansion comprises adding negative adverbs and adding non-entity objects;
the parallel triple expansion is to form new triples from the coordinated entities and the relation word when an entity in the acquired entity relationship triple has coordinated entities.
2. The method for extracting Chinese entity relationship based on dependency of keywords and verbs as claimed in claim 1, wherein the inputted text is subject to sentence division according to periods, exclamation marks and question marks to obtain a single sentence set.
3. The method for extracting Chinese entity relationship based on dependency of keywords and verbs according to claim 1, wherein the syntactic relation rule is that, according to the dependency syntactic structure of the sentence, a verb is taken as the candidate relation word and it is analyzed whether the dependencies between other lexical items in the sentence and the verb are subject-verb, verb-object, preposition-object or complement relationships; if the dependencies of two lexical items on the verb are two of these relationships, such as subject-verb and verb-object, or subject-verb and preposition-object, and both lexical items are entities, an initial entity relationship triple can be determined.
4. The method as claimed in claim 3, wherein the syntactic relation rules include isA rules for judgment class verbs and non-isA rules for other verbs.
5. The method for extracting Chinese entity relationship based on keyword and verb dependency according to claim 4,
in the isA rules, the sentence structure of the rule in which an entity is related to the relation word is expressed as "Entity1 + Noun + is + Entity2" or "Entity2 + is + Entity1 + Noun", and the entity relationship triple is preliminarily expressed as (Entity1, Noun, Entity2); wherein Entity1 and Entity2 are the entity pair in the sentence, one entity being in a subject-verb or verb-object relationship with the judgment verb and the other having no direct relationship with the judgment verb; Noun denotes a noun in the sentence that is in a subject-verb or verb-object relationship with the judgment verb, and the entity that has no direct relationship with the judgment verb is in an attributive dependency relationship with the noun and modifies it;
in the isA rules, the rule in which an entity is not related to the relation word means that in the sentence an entity is in a subject-verb relationship with the judgment verb, a noun is in a verb-object relationship with the judgment verb, and the entity is in a coordinate relationship with one or more other entities; the sentence structure can be expressed as "Entity1 + Conj + Entity2 (++) + is + Noun", and the relation triple can be preliminarily expressed as (Entity1, Noun, Entity2); wherein Entity1 and Entity2 are the entity pair in the sentence, Entity2 (++) indicates that one or more entities may be coordinated with Entity1, and Noun is a noun in the sentence.
6. The method for extracting Chinese entity relationship based on keyword and verb dependency according to claim 5, wherein the non-isA rules include verb-has-subject rules and verb-has-no-subject rules;
the verb-has-subject rules comprise a subject-predicate-object structure, a subject-predicate-preposition structure, a subject-predicate-complement structure, a preposed-object-preposition structure and other structures, specifically:
the subject-predicate-object structure means that, starting from a verb, according to the dependency syntax, the subject and the object of the verb both exist and are both entities, so that an initial entity relationship triple can be constructed;
the subject-predicate-preposition structure means that, starting from a verb, according to the dependency syntax, the subject of the verb exists and is an entity, and a preposition depending on the verb has an object that is an entity, so that an initial entity relationship triple can be extracted;
the subject-predicate-complement structure means that, starting from a verb that lacks an object, according to the dependency syntax, the verb has a subject that is an entity, and a complement depending on the verb exists whose object is an entity, so that an initial entity relationship triple can be formed;
the preposed-object-preposition structure means that, starting from a verb, according to the dependency syntax, a preposed object depending on the verb exists and is an entity, and a preposition depending on the verb exists whose object is an entity, so that an initial entity relationship triple can be formed;
the other structures mean that, starting from a verb, according to the dependency syntax, the subject of the verb exists and is an entity, and another structure depending on the verb exists whose object is an entity, so that a triple relationship can be constructed;
the verb-has-no-subject rules comprise a verb coordinate structure and a sentence-without-subject structure, specifically:
the verb coordinate structure means that a lexical item in the sentence that is an entity can establish a verb-object relationship with a verb directly, or a preposition-object or complement-object relationship indirectly, and no lexical item can establish a subject-verb relationship with that verb, but other verbs coordinated with the verb exist and the subjects of the verbs are the same, so that an entity relationship triple can be constructed by taking the subject of the coordinated verb as the subject;
the sentence-without-subject structure means that the sentence has no subject, but a lexical item that is an entity can establish a verb-object relationship with a verb directly, or a preposition-object or complement-object relationship indirectly; according to a heuristic rule for Chinese, the previous sentence can be traced back and the subject of its core verb is used as the subject of the sentence; dependency syntax theory holds that the core verb is the central component of a sentence and governs the other components, while a sentence may contain several verbs, each of which may have its own subject, so the rule takes only the subject of the core verb of the previous sentence as the subject of the sentence.
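The sentence-splitting step of claim 2 and the keyword-filtering step of claim 1 can be illustrated with a short sketch. It is only one possible reading: the function names, the smoothed IDF formula, the background corpus and the threshold value are assumptions that the claims do not specify, and the candidate terms are assumed to have already passed the part-of-speech filter.

```python
import math
import re
from collections import Counter

SENTENCE_END = "。！？.!?"  # Chinese and Western periods, exclamation and question marks

def split_sentences(text):
    """Split text into single sentences at sentence-final punctuation."""
    pattern = rf"[^{SENTENCE_END}]+[{SENTENCE_END}]?"
    return [s.strip() for s in re.findall(pattern, text) if s.strip()]

def tfidf_keywords(candidate_terms, corpus, threshold):
    """Keep candidates whose TF-IDF weight exceeds the threshold.

    TF is the raw count of the term in the text. IDF uses a smoothed
    form, log((1 + N) / (1 + df)) + 1, which is an assumption; `corpus`
    is a hypothetical list of term sets, one per background document.
    """
    tf = Counter(candidate_terms)
    n_docs = len(corpus)
    keywords = {}
    for term, count in tf.items():
        df = sum(1 for doc in corpus if term in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        weight = count * idf
        if weight > threshold:
            keywords[term] = weight
    return keywords

text = "Li Si visits China. Zhang San dislikes the sea! Is Beijing the capital?"
print(split_sentences(text))

candidates = ["reform", "reform", "finance", "meeting"]
background = [set(candidates), {"meeting"}, {"meeting", "weather"}]
print(sorted(tfidf_keywords(candidates, background, threshold=1.5)))
```

A plain character-class split treats every period as a sentence boundary, so decimals and abbreviations would need extra handling in real text.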
CN201811124153.1A 2018-09-26 2018-09-26 Chinese entity relation extraction method based on dependency of keywords and verbs Active CN109241538B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811124153.1A CN109241538B (en) 2018-09-26 2018-09-26 Chinese entity relation extraction method based on dependency of keywords and verbs

Publications (2)

Publication Number Publication Date
CN109241538A CN109241538A (en) 2019-01-18
CN109241538B true CN109241538B (en) 2022-12-20

Family

ID=65056162

Country Status (1)

Country Link
CN (1) CN109241538B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002097662A1 (en) * 2001-06-01 2002-12-05 Synomia Method and large syntactical analysis system of a corpus, a specialised corpus in particular
WO2010050675A2 (en) * 2008-10-29 2010-05-06 한국과학기술원 Method for automatically extracting relation triplets through a dependency grammar parse tree
CN104933027A (en) * 2015-06-12 2015-09-23 华东师范大学 Open Chinese entity relation extraction method using dependency analysis
CN107180045A (en) * 2016-03-10 2017-09-19 中国科学院地理科学与资源研究所 Method for extracting geographical entity relations contained in internet text
CN107291687A (en) * 2017-04-27 2017-10-24 同济大学 Unsupervised open Chinese entity relation extraction method based on dependency semantics

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
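Open Chinese entity relation extraction method based on dependency analysis; 李明耀 (Li Mingyao) et al.; 《计算机工程》 (Computer Engineering); 2016-06-15 (No. 06); full text *
Research on a Chinese domain ontology construction method based on text mining; 翟羽佳 (Zhai Yujia) et al.; 《情报科学》 (Information Science); 2015-06-05 (No. 06); full text *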

Also Published As

Publication number Publication date
CN109241538A (en) 2019-01-18

Similar Documents

Publication Publication Date Title
CN109241538B (en) Chinese entity relation extraction method based on dependency of keywords and verbs
CN106776711B (en) Chinese medical knowledge map construction method based on deep learning
CN110502642B (en) Entity relation extraction method based on dependency syntactic analysis and rules
JP6466952B2 (en) Sentence generation system
RU2686000C1 (en) Retrieval of information objects using a combination of classifiers analyzing local and non-local signs
RU2662688C1 (en) Extraction of information from sanitary blocks of documents using micromodels on basis of ontology
Phan et al. Pair-linking for collective entity disambiguation: Two could be better than all
RU2679988C1 (en) Extracting information objects with the help of a classifier combination
CN113312922B (en) Improved chapter-level triple information extraction method
CN109101551B (en) Question-answer knowledge base construction method and device
KR20120001053A (en) Document sensitivity analysis system and method
WO2017198031A1 (en) Semantic parsing method and apparatus
Alkadri et al. Semantic feature based arabic opinion mining using ontology
JP6830971B2 (en) Systems and methods for generating data for sentence generation
Mohamed et al. Lexicon and Rule-based Word Lemmatization Approach for the Somali Language
AU2021105953A4 (en) Method for fine-grained domain terminology self-learning based on contextual semantics
Kadli et al. Cross Domain Hybrid Feature Fusion based Sarcastic Opinion Recognition Over E-Commerce Reviews Using Adversarial Transfer Learning.
Siddiqui et al. Sarcasm detection from Twitter database using text mining algorithms
Jagdale et al. Review on sentiment lexicons
Tohalino et al. Using virtual edges to extract keywords from texts modeled as complex networks
Mashina Application of statistical methods to solve the problem of enriching ontologies of developing subject areas
Ibrahiem et al. FEATURE EXTRACTION ENCHANCEMENT IN USERS’ATTITUDE DETECTION
CN107480142B (en) A method for extracting evaluation objects based on dependencies
Akhtar et al. Unsupervised morphological expansion of small datasets for improving word embeddings
CN115795057B (en) Audit knowledge processing method and system based on AI technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant