CN108595421B - Method, device and system for extracting Chinese entity association relationship

Info

Publication number: CN108595421B
Application number: CN201810329836.4A
Authority: CN
Inventors: 李德彦; 晋耀红; 吴相博
Original assignee: Dingfu Intelligent Technology Co ltd
Current assignee: China Science and Technology (Beijing) Co., Ltd.
Priority date: 2018-04-13
Filing date: 2018-04-13
Publication date: 2022-04-08
Anticipated expiration: 2038-04-13
Also published as: CN108595421A

Abstract

The application discloses a method, a device and a system for extracting Chinese entity association relationships. A target event entity and a target subject entity related to a relation word in a text are extracted according to the relation property of the relation word in the Chinese text, and the Chinese entity association relationship corresponding to the relation word is then generated from the relation word and its corresponding target event entity and target subject entity. In the technical scheme provided by the embodiments of the application, the unstructured Chinese text is divided according to the different relation properties of its relation words, which narrows the position ranges of the target event entity and the target subject entity of each relation word, improves search precision and search speed, and reduces the amount of computation. In addition, the technical scheme also applies division rules at the level of Chinese grammar, so that redundant or erroneous relation words and entities are largely filtered out, and the accuracy of extracting relation words and entities is improved.

Description

Method, device and system for extracting Chinese entity association relationship

Technical Field

The present application relates to the field of natural language processing technologies, and in particular to a method, an apparatus, and a system for extracting Chinese entity association relationships.

Background

With the rapid development of the Internet and of the economy, formulating enterprise strategy requires a keen business sense and as much relevant information as possible. Grasping the relationships among enterprises, and between enterprises and individuals, as fully as possible helps decision makers make the most reasonable plans.

Existing enterprise association recognition techniques generally rely on standardized, structured collected data. This approach has major limitations: the text information sources are updated slowly and with high delay, and structuring the data takes considerable time for screening and sorting, so the most timely information may not be obtained.

In addition, the above technique is applicable only to standardized, structured Chinese text and is clearly inadequate for unstructured Chinese text. Moreover, most unstructured Chinese text does not contain just a single association relationship: the enterprise-related unstructured Chinese text actually found on the Internet usually has complex sentence patterns, and one sentence may include several pairs of association relationships with different properties. Existing association recognition technology cannot improve the accuracy of relationship recognition at the grammatical level. How to accurately extract association relationships from unstructured, complex Chinese text has therefore become a problem to be solved urgently.

Disclosure of Invention

The application provides a method, a device and a system for extracting Chinese entity association relationships, which aim to solve the problem in the prior art that association relationships cannot be accurately extracted from unstructured Chinese text.

In a first aspect, an embodiment of the present application provides a method for extracting Chinese entity association relationships, including:

extracting relation words in the text;

if the number of the extracted relation words is more than 1, determining the relation property of each relation word;

according to the relation property of each relation word, sequentially extracting a target event entity and a target subject entity corresponding to each relation word from the text;

and generating a Chinese entity association relation according to the relation words and the target event entity and the target subject entity corresponding to the relation words.

In a second aspect, an embodiment of the present application provides an apparatus for extracting Chinese entity association relationships, where the apparatus includes:

the relation word extracting module is used for extracting relation words in the text;

the property determining module is used for determining the relation property of each relation word if the number of the extracted relation words is more than 1;

the target entity extraction module is used for sequentially extracting a target event entity and a target subject entity corresponding to each relation word from the text according to the relation property of each relation word;

and the association relationship generating module is used for generating a Chinese entity association relationship according to the relation words and the target event entity and the target subject entity corresponding to the relation words.

In a third aspect, an embodiment of the present application provides a system for extracting Chinese entity association relationships, where the system includes a memory and a processor;

the memory is used for storing an executable program of the processor;

the processor is configured to:

extracting relation words in the text;

According to the above technical scheme, the method, the device and the system for extracting Chinese entity association relationships provided by the embodiments of the application extract the target event entity and the target subject entity related to a relation word in a text according to the relation property of the relation word in the Chinese text, and generate the Chinese entity association relationship corresponding to the relation word from the relation word and its corresponding target event entity and target subject entity. In the technical scheme provided by the embodiments of the application, the unstructured Chinese text is divided according to the different relation properties of its relation words, which narrows the position ranges of the target event entity and the target subject entity of each relation word, improves search precision and search speed, and reduces the amount of computation. In addition, the technical scheme also applies division rules at the level of Chinese grammar, so that redundant or erroneous relation words and entities are largely filtered out, and the accuracy of extracting relation words and entities is improved.

Drawings

In order to more clearly illustrate the technical solution of the present application, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious to those skilled in the art that other drawings can be obtained according to the drawings without any creative effort.

FIG. 1 is a flowchart of the method for extracting Chinese entity association relationships provided in an embodiment of the present application;

FIG. 2 is a flow chart of step 102 in a preferred embodiment provided by an embodiment of the present application;

FIG. 3 is a flow chart of step 102 in a second preferred embodiment provided by embodiments of the present application;

FIG. 4 is a flow chart of step 102 in a third preferred embodiment provided by embodiments of the present application;

FIG. 5 is a flow chart of step 102 in a fourth preferred embodiment provided by embodiments of the present application;

FIG. 6 is a structural diagram of the apparatus for extracting Chinese entity association relationships provided in an embodiment of the present application;

FIG. 7 is a schematic diagram of the system for extracting Chinese entity association relationships provided in an embodiment of the present application.

Detailed Description

In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.

Structured information is the information managed by databases that we commonly deal with, including records of production, business, transactions, customer information, and the like. Unstructured information, referred to in the field as content, covers a wider range of information and can be divided into: operational content, such as contracts, invoices, letters, and purchase records; departmental content, such as word-processing documents, spreadsheets, presentation files, and e-mail; web content, such as information in HTML and XML formats; and multimedia content, such as sound, film, and graphics.

The vast amount of information on the Internet is roughly divided into three types: structured, semi-structured, and unstructured. The same is true of Chinese text information. In structured information, such as e-commerce information, the nature of the information and the positions where values appear are fixed. In semi-structured information, such as the subdivided channels of professional websites, the grammar of titles and body text is fairly standard and the range of keywords is fairly limited. In unstructured information, such as blogs and BBS posts, all of these are unpredictable.

Most existing enterprise association recognition technologies rely on standardized, structured Chinese text information and lack an accurate recognition method for unstructured Chinese text information. The embodiment of the present application therefore provides a method for extracting Chinese entity association relationships. Referring to FIG. 1, the method includes:

Step 100, extracting relation words in a text. Every segment of Chinese text worth extracting relations from necessarily contains relation words, so before the entity association relationships are extracted, the relation words in the text are extracted first to determine which relations exist in the text. In general, a relation word may be a noun or a verb.

Chinese text information is a series of words combined to carry a certain meaning. For Chinese text with complex semantics, extracting the entities that have association relationships first requires determining which relation or relations exist in the text, so that the extraction can be accurate.

Optionally, after the relation words in the text are extracted, it is further determined whether each relation word exists in a predefined relation library. The predefined relation library holds a large number of relation words obtained from a large amount of processed text. It may also include the collaborative words associated with each relation word, attributes of the relation word, the relation corresponding to the relation word, and the like; each relation word has a specific attribute, a specific relation, and a specific entity position relationship. The predefined relation library provides a reference for extracting relation words from Chinese text: if an extracted relation word exists in the library, its attributes and other parameters can be retrieved directly, which avoids re-establishing them and speeds up the early stage of entity association relationship extraction. In addition, because previously established relation word properties and parameters are available for comparison, the subsequent extraction of relation words and determination of their properties are more accurate.

Furthermore, the attributes of a relation word include its meaning, its part of speech, its relation property, and the like. The specific positions of the entities related to the relation word can be derived from these attributes and stored in the predefined relation library in advance, so that they can be used quickly when the relation word is extracted.

If the relation word exists in the predefined relation library, the relation property of the relation word is determined. The semantic relation in a Chinese text generally depends mainly on the relation property of the relation word, so after a relation word is extracted, its relation property is determined in order to process the text further according to that property.

If the relation word is not found in the predefined relation library, the relation word has not been stored there before and no related information can be looked up. In this case, further processing of the relation word may be abandoned, that is, the relation word is judged to be invalid. Alternatively, information related to the relation word may be established in the predefined relation library, after which processing of the relation word continues. Because this information has now been established, it can be retrieved quickly from the predefined relation library the next time the relation word is encountered; establishing relation word information in this way enriches the predefined relation library and makes its content more comprehensive.
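The library lookup described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the application: the `RelationEntry` fields, the dictionary-based `RELATION_LIBRARY`, its two sample entries, and the `look_up` helper are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RelationEntry:
    """Attributes stored for one relation word (hypothetical schema)."""
    part_of_speech: str                        # "verb" or "noun"
    relation_property: str                     # e.g. "verb active"
    collaborative_word: Optional[str] = None   # only for passive/reverse words


# Hypothetical sample contents of a predefined relation library.
RELATION_LIBRARY: Dict[str, RelationEntry] = {
    "acquire": RelationEntry("verb", "verb active"),
    "shareholder": RelationEntry("noun", "noun forward"),
}


def look_up(word: str,
            library: Dict[str, RelationEntry],
            new_entry: Optional[RelationEntry] = None) -> Optional[RelationEntry]:
    """Return the stored entry for `word`.

    If the word is unknown and `new_entry` is given, register it so that the
    next lookup succeeds; otherwise return None (the word is treated as invalid).
    """
    entry = library.get(word)
    if entry is None and new_entry is not None:
        library[word] = new_entry
        entry = new_entry
    return entry
```

Passing `new_entry` corresponds to the option of enriching the library; omitting it corresponds to discarding the unknown relation word.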

Step 101, if the number of the extracted relation words is greater than 1, determining the relation property of each relation word. More than one extracted relation word means that the text contains more than one entity association relationship. In that case the relation property of each relation word must be determined, so that the entity association relationships can be extracted from the text according to those properties.

In addition, after the relation words are extracted, all the relation words in the text can be stored together as a relation word set. One relation word set corresponds to one segment of text, and the order of the relation words in the set is the order in which they appear in the text. The relation word set also records the collaborative words associated with the relation words, the relation properties of the relation words, and the relation properties of the collaborative words.

After the relation word set is generated, the target event entity and the target subject entity corresponding to each relation word can be extracted from the text in turn, following the order of the relation words in the set and their relation properties.
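A relation word set that preserves the order of appearance might look like the sketch below. It assumes the text has already been segmented into tokens and that a table of relation properties is available; `RelationWord`, `build_relation_word_set`, and the sample data are hypothetical.

```python
from typing import Dict, List, NamedTuple


class RelationWord(NamedTuple):
    position: int            # index of the token in the segmented text
    word: str
    relation_property: str


def build_relation_word_set(tokens: List[str],
                            properties: Dict[str, str]) -> List[RelationWord]:
    """Collect relation words in the order in which they appear in the text."""
    return [RelationWord(i, tok, properties[tok])
            for i, tok in enumerate(tokens) if tok in properties]


# Assumed, already segmented input and a toy property table.
tokens = ["A", "subsidiary", "B", "acquire", "C"]
properties = {"subsidiary": "noun forward", "acquire": "verb active"}
relation_word_set = build_relation_word_set(tokens, properties)
```

Because the list is built by scanning the tokens from left to right, iterating over it processes the relation words in text order, as the paragraph above requires.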

Step 102, sequentially extracting the target event entity and the target subject entity corresponding to each relation word from the text according to the relation property of each relation word. The relation properties of relation words are generally divided into the verb active relation, the noun forward relation, the verb passive relation, the noun reverse relation, and so on. A relation word with the verb active relation is usually an active verb, such as "acquire", "merge", or "increase holdings". A relation word with the noun forward relation is usually a forward noun, such as "shareholder" or "investor". A relation word with the verb passive relation generally consists of two parts, a collaborative word and a relation word body: the collaborative word expresses the passive relation and the relation word body is still a verb, as in "acquired by … …" and "merged by … …", where the passive markers (rendered here as "by") are the collaborative words and "acquired" and "merged" are the relation word bodies. A relation word with the noun reverse relation is likewise divided into a collaborative word and a relation word body: the collaborative word expresses the reverse relation and the relation word body is a noun, as in "as the shareholder of … …" and "become the shareholder of … …", where "as" and "become" are the collaborative words and "shareholder" is the relation word body.
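The four relation properties named above can be told apart by two features: the part of speech of the relation word body and whether a collaborative word is present. The sketch below encodes that reading; the function name and the string labels are assumptions, and any further properties covered by "and so on" are not modelled.

```python
from typing import Optional


def classify_relation_property(part_of_speech: str,
                               collaborative_word: Optional[str]) -> str:
    """Map (part of speech, collaborative word) to one of the four properties."""
    if part_of_speech not in ("verb", "noun"):
        raise ValueError(f"unsupported part of speech: {part_of_speech!r}")
    has_collaborative = bool(collaborative_word)
    if part_of_speech == "verb":
        # A verb body with a passive marker is passive, otherwise active.
        return "verb passive" if has_collaborative else "verb active"
    # A noun body with a collaborative word such as "as" is reverse.
    return "noun reverse" if has_collaborative else "noun forward"
```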

Each relation word generally has an event entity and a subject entity. The event entity (the agent) is the active party of the entity association relationship, and the subject entity (the patient) is the passive party; in grammatical terms, the event entity is the subject of the relation word and the subject entity is its object. In a text with complex relations there are multiple relation words, so the event entity and the subject entity of each relation word depend on those of the other relation words near it; the target event entity and the target subject entity of a target relation word must be determined with reference to the event entities and subject entities of the other relation words in the text.

It should be noted that the positions of the event entity and the subject entity of a relation word are usually fixed, and the specific positions vary with the relation property of the relation word. In the verb active relation, the event entity is located before the relation word and the subject entity after it; for example, in "A acquires B", A is the event entity and B is the subject entity. In the verb passive relation, the subject entity is located before the collaborative word, and the event entity is located between the collaborative word and the relation word body; for example, in "B is acquired by A" (Chinese word order: B, passive marker, A, acquire), B is the subject entity and A is the event entity. The noun forward relation follows the verb active relation: the event entity is located before the relation word and the subject entity after it; for example, in "A's acquirer is B", A is the event entity and B is the subject entity. The noun reverse relation follows the verb passive relation: the subject entity is located before the collaborative word, and the event entity is located between the collaborative word and the relation word body; for example, in "B, as A's acquirer", B is the subject entity and A is the event entity.
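The position rules above can be written down as a function that returns the candidate range for each entity. This is a sketch under assumptions: the text is pre-segmented into tokens in Chinese word order, the property labels are plain strings, and `candidate_ranges` is a hypothetical helper. It returns only ranges; picking the entity inside a range is left to entity recognition.

```python
from typing import List, Optional, Tuple


def candidate_ranges(tokens: List[str],
                     relation_property: str,
                     body: str,
                     collaborative: Optional[str] = None
                     ) -> Tuple[List[str], List[str]]:
    """Return (event_range, subject_range) for a single relation word."""
    b = tokens.index(body)
    if relation_property in ("verb active", "noun forward"):
        # Event entity before the relation word, subject entity after it.
        return tokens[:b], tokens[b + 1:]
    if relation_property in ("verb passive", "noun reverse"):
        if collaborative is None:
            raise ValueError("a collaborative word is required")
        c = tokens.index(collaborative)
        # Subject entity before the collaborative word, event entity between
        # the collaborative word and the relation word body.
        return tokens[c + 1:b], tokens[:c]
    raise ValueError(f"unknown relation property: {relation_property!r}")
```

For instance, with the tokens `["B", "by", "A", "acquire"]` and the verb passive relation, the event range is `["A"]` and the subject range is `["B"]`.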

Step 103, generating a Chinese entity association relationship according to each relation word and its corresponding target event entity and target subject entity.

It is worth mentioning that, in the technical scheme of the application, when the number of extracted relation words is greater than 1, the relation property of each relation word is determined, the target event entity and the target subject entity corresponding to each relation word are then extracted from the text in turn according to those properties, and the Chinese entity association relationships are generated accordingly. The technical scheme still applies when a text contains only one relation word. Processing such a text is simpler than processing a text with complex relations: the relation properties of other relation words and the positions of their entities need not be considered, and only the property determination and entity extraction for that single relation word are carried out. For example, to extract the entity association relationship of the text "Wanda Sports acquired the IRONMAN series events", the relation word "acquire" is extracted first and determined to have the verb active relation; the target event entity "Wanda Sports" and the target subject entity "IRONMAN series events" are then extracted according to the positions of the event entity and the subject entity in the verb active relation; finally, the Chinese entity association relationship "Wanda Sports -> acquire -> IRONMAN series events" is generated.
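The single-relation-word case can be sketched end to end as below. The example assumes a pre-segmented token list in which each entity is one token and the relation word has the verb active relation; `Association` and `extract_single_verb_active` are hypothetical names.

```python
from typing import List, NamedTuple


class Association(NamedTuple):
    event_entity: str
    relation_word: str
    subject_entity: str

    def render(self) -> str:
        """Format the association in the 'event -> relation -> subject' style."""
        return f"{self.event_entity} -> {self.relation_word} -> {self.subject_entity}"


def extract_single_verb_active(tokens: List[str], relation_word: str) -> Association:
    """Take the token before the relation word as event entity, the one after as subject."""
    i = tokens.index(relation_word)
    if i == 0 or i == len(tokens) - 1:
        raise ValueError("relation word needs an entity on both sides")
    return Association(tokens[i - 1], relation_word, tokens[i + 1])


association = extract_single_verb_active(
    ["Wanda Sports", "acquire", "IRONMAN series events"], "acquire")
```

Calling `association.render()` yields the arrow notation used in the text for the generated association relationship.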

In the method for extracting Chinese entity association relationships provided by the embodiment of the application, the target event entity and the target subject entity related to a relation word in a text are extracted according to the relation property of the relation word in the Chinese text, and the Chinese entity association relationship corresponding to the relation word is then generated from the relation word and its corresponding target event entity and target subject entity. In the technical scheme provided by the embodiments of the application, the unstructured Chinese text is divided according to the different relation properties of its relation words, which narrows the position ranges of the target event entity and the target subject entity of each relation word, improves search precision and search speed, and reduces the amount of computation. In addition, the technical scheme also applies division rules at the level of Chinese grammar, so that redundant or erroneous relation words and entities are largely filtered out, and the accuracy of extracting relation words and entities is improved.

In a preferred embodiment of the present application, step 102 is further explained by taking the verb active relation as an example. As shown in FIG. 2, step 102 may specifically include:

Step 201, if the relation property of the relation word is the verb active relation, searching the text for a first target relation word, which is located before the relation word and closest to it, and a second target relation word, which is located after the relation word and farthest from it.

Take the text "Last year, Wanda Sports under the banner of Wanda Group acquired the IRONMAN series events under the banner of the World IRONMAN company" as an example. In the Chinese word order implied by the entity positions discussed for this example, the sentence reads: last year, Wanda Group, "under the banner", Wanda Sports, "acquire", World IRONMAN company, "under the banner", IRONMAN series events. The text contains three relation words, in order "under the banner", "acquire", and "under the banner". This embodiment concerns the verb active relation; after the relation properties of the three relation words are judged, "under the banner" is determined to have the noun forward relation and "acquire" the verb active relation. According to step 201, the relation word located before "acquire" and closest to it is "under the banner", so the first target relation word is "under the banner"; the relation word located after "acquire" and farthest from it is also "under the banner", so the second target relation word is also "under the banner".

Step 202, extracting a first subject entity of the first target relation word and a second subject entity of the second target relation word from the text.

The first target relation word "under the banner" has the noun forward relation, so its first event entity is located before it, and its first subject entity is located after it and before "acquire". Once these positions are determined, the first event entity lies in the text "Last year, Wanda Group", and entity recognition determines that "Wanda Group" is the first event entity of "under the banner"; the first subject entity lies in the text "Wanda Sports", and entity recognition determines that "Wanda Sports" is the first subject entity of "under the banner".

The second target relation word is also "under the banner". Its second event entity lies in the text "World IRONMAN company" between "acquire" and "under the banner", and entity recognition determines that the second event entity is "World IRONMAN company"; its second subject entity lies in the text "IRONMAN series events" after "under the banner", and entity recognition determines that the second subject entity is "IRONMAN series events".

Step 203, using the first subject entity as the target event entity of the relation word, and using the second subject entity as the target subject entity of the relation word.

Therefore, after the above steps 201 and 202, the target event entity of "acquire" is "Wanda Sports", and the target subject entity of "acquire" is "IRONMAN series events".

Then, according to step 103, the Chinese entity association relationship "Wanda Sports -> acquire -> IRONMAN series events" is generated from the relation word "acquire" and its corresponding target event entity "Wanda Sports" and target subject entity "IRONMAN series events".

Optionally, as can be seen from the foregoing, the specific process of using the first subject entity as the target event entity of the relation word and the second subject entity as the target subject entity of the relation word includes: performing entity recognition on the first subject entity and the second subject entity respectively; and using the recognized first subject entity as the target event entity of the relation word and the recognized second subject entity as the target subject entity of the relation word. In fact, entity recognition may be performed either together with step 202 or in step 203; both options meet the requirements of the embodiments of the present application and achieve the purpose of recognizing entities in short segments of Chinese text. Extracting the first subject entity and the second subject entity is a process of locating entities that can only determine the range in which an entity lies; the entity itself and its exact position are determined only after entity recognition, so entity recognition improves the accuracy of the whole entity association relationship extraction process.

In addition, in step 202, if no relation word is found before or after "acquire", the first target relation word or the second target relation word does not exist. In that case, the entity located before "acquire" and closest to it is taken as the target event entity, or the entity located after "acquire" and farthest from it is taken as the target subject entity. For example, in the text "Wanda Group's Wanda Sports acquired the World IRONMAN company's IRONMAN series events", there are no other relation words before or after "acquire", so the entity "Wanda Sports", which is before "acquire" and closest to it, is taken as the target event entity, and the entity "IRONMAN series events", which is after "acquire" and farthest from it, is taken as the target subject entity.
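Steps 201 to 203, together with the fallback for missing neighbouring relation words, can be sketched as follows. Several things are assumed for illustration: the text is pre-segmented into tokens in Chinese word order, entity recognition is reduced to membership in a known entity set, the neighbouring relation words have the noun forward or verb active relation (so their subject entity follows them), and the last candidate in a range is chosen. `extract_verb_active` is a hypothetical name.

```python
from typing import List, Optional, Set, Tuple


def extract_verb_active(tokens: List[str], rel_idx: int, relation_idxs: List[int],
                        entities: Set[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (target event entity, target subject entity) for tokens[rel_idx]."""
    # Step 201: closest relation word before, farthest relation word after.
    before = [i for i in relation_idxs if i < rel_idx]
    after = [i for i in relation_idxs if i > rel_idx]
    first_target = max(before) if before else None
    second_target = max(after) if after else None

    # Step 202: ranges holding the subject entities of the two neighbours.
    # Without a neighbour, fall back to all text on that side.
    event_start = first_target + 1 if first_target is not None else 0
    subject_start = second_target + 1 if second_target is not None else rel_idx + 1
    event_range = tokens[event_start:rel_idx]
    subject_range = tokens[subject_start:]

    # Toy entity recognition: keep tokens that are known entities.
    event_candidates = [t for t in event_range if t in entities]
    subject_candidates = [t for t in subject_range if t in entities]

    # Step 203: closest entity before, farthest entity after.
    target_event = event_candidates[-1] if event_candidates else None
    target_subject = subject_candidates[-1] if subject_candidates else None
    return target_event, target_subject


entities = {"Wanda Group", "Wanda Sports", "World IRONMAN company",
            "IRONMAN series events"}
tokens = ["last year", "Wanda Group", "under the banner", "Wanda Sports",
          "acquire", "World IRONMAN company", "under the banner",
          "IRONMAN series events"]
with_neighbours = extract_verb_active(tokens, 4, [2, 4, 6], entities)

plain = ["Wanda Group", "Wanda Sports", "acquire",
         "World IRONMAN company", "IRONMAN series events"]
without_neighbours = extract_verb_active(plain, 2, [2], entities)
```

Both calls select "Wanda Sports" and "IRONMAN series events", matching the two worked examples in the text; a real system would replace the set lookup with an entity recognizer.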

In a second preferred embodiment of the present application, step 102 is further explained by taking the noun forward relation as an example. As shown in FIG. 3, step 102 may specifically include:

Step 301, if the relation property of the relation word is the noun forward relation, searching the text for a first target relation word, which is located before the relation word and closest to it, and a second target relation word, which is located after the relation word and farthest from it.

Take the text "A's subsidiary B's controlling shareholder C acquires D" and the noun forward relation word "controlling shareholder" as an example. The first target relation word, located before "controlling shareholder" and closest to it, is "subsidiary", and the second target relation word, located after "controlling shareholder" and farthest from it, is "acquire".

Step 302, extracting a first subject entity of the first target relation word and a second subject entity of the second target relation word from the text.

In the text, the first event entity of the first target relation word "subsidiary" is "A" and its first subject entity is "B"; the second event entity of the second target relation word "acquire" is "C" and its second subject entity is "D".

Step 303, using the first subject entity as the target event entity of the relation word, and using the second subject entity as the target subject entity of the relation word. The target event entity of "controlling shareholder" is "A" and the target subject entity is "D".

Then, according to step 103, the Chinese entity association relationship "A -> controlling shareholder -> D" is generated from the relation word "controlling shareholder" and its corresponding target event entity "A" and target subject entity "D".

Optionally, as can be seen from the foregoing, the specific process of using the first subject entity as the target event entity of the relation word and the second subject entity as the target subject entity of the relation word includes: performing entity recognition on the first subject entity and the second subject entity respectively; and using the recognized first subject entity as the target event entity of the relation word and the recognized second subject entity as the target subject entity of the relation word. In fact, entity recognition may be performed either together with step 302 or in step 303; both options meet the requirements of the embodiments of the present application and achieve the purpose of recognizing entities in short segments of Chinese text. Extracting the first subject entity and the second subject entity is a process of locating entities that can only determine the range in which an entity lies; the entity itself and its exact position are determined only after entity recognition, so entity recognition improves the accuracy of the whole entity association relationship extraction process.

In addition, if no other relation word is found before or after the relation word "controlling shareholder", the entity located before "controlling shareholder" and closest to it is taken as the target event entity, or the entity located after "controlling shareholder" and farthest from it is taken as the target subject entity.

In the third preferred embodiment of the present application, taking verb passive relationship as an example, the step 102 is further explained, as shown in fig. 4, the step 102 may specifically include:

step 401, if the relation property of the relation word is verb passive relation, the relation word is decomposed into a collaborative word and a relation word main body.

Take the text "The American television production company Dick, as the holding stockholder of company A, was acquired by subsidiary B of Wanda Group for $10 million (about 78 million Hong Kong dollars)" as an example. The text contains the relation word "acquired by …", which has a verb passive relation; "by" is the collaborative word and "acquired" is the relation word main body. In the original Chinese word order, "by" precedes the acquirer and "acquired" comes at the end of the sentence.
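The decomposition in step 401 can be sketched as below. The sketch assumes that a discontinuous relation word is stored as a pattern such as "by … acquired", with the collaborative word first (mirroring the Chinese word order) and an ellipsis marking the gap; this notation, the `SplitRelation` type and the `decompose` function are illustrative assumptions rather than part of the application.

```python
from typing import NamedTuple


class SplitRelation(NamedTuple):
    collaborative_word: str
    main_body: str


def decompose(relation_word: str, gap: str = "…") -> SplitRelation:
    """Split a pattern 'left … right' into collaborative word and main body."""
    left, sep, right = relation_word.partition(gap)
    # Drop surrounding spaces and any repeated gap markers ("… …").
    left, right = left.strip(gap + " "), right.strip(gap + " ")
    if not sep or not left or not right:
        raise ValueError(f"not a discontinuous relation word: {relation_word!r}")
    return SplitRelation(left, right)


split = decompose("by … acquired")
```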

Step 402, finding in the text a first target relation word that precedes the collaborative word and is closest to it, and a second target relation word that lies between the collaborative word and the relation word main body and is closest to the relation word main body.

In the text, the first target relation word closest before the collaborative word "by" is "as … holding stockholder", and the second target relation word closest to "acquired" between the collaborative word "by" and the relation word main body "acquired" is "subsidiary".
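The search in step 402 amounts to picking the nearest relation word inside a bounded window. The following is a minimal sketch under the assumption that relation words have already been located and are identified by token index; the helper `nearest_before` and the concrete indices are hypothetical.

```python
from typing import Optional, Sequence


def nearest_before(relation_positions: Sequence[int], end: int,
                   start: int = -1) -> Optional[int]:
    """Return the relation-word position closest to `end` within (start, end)."""
    candidates = [p for p in relation_positions if start < p < end]
    return max(candidates) if candidates else None


# Hypothetical token indices for the example sentence in its original word order:
# "as … holding stockholder" ends at 2, "by" is at 6,
# "subsidiary" is at 9 and "acquired" is at 13.
other_relations = [2, 9]
first_target = nearest_before(other_relations, end=6)             # before "by"
second_target = nearest_before(other_relations, end=13, start=6)  # between "by" and "acquired"
```

When the function returns `None`, no target relation word exists in that window and the entity-based fallback described later in this embodiment applies.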

Step 403, extracting a first subject entity of the first target relation word and a second subject entity of the second target relation word from the text.

The first target relation word "as … holding stockholder" is a noun reverse relation, so its first subject entity lies in the text "American television production company Dick" before "by", and entity recognition determines the first subject entity to be "American television production company Dick". The second target relation word "subsidiary" is a noun forward relation, so its second subject entity lies in the text "B spent $10 million (about 78 million Hong Kong dollars)" after "subsidiary", and entity recognition determines the second subject entity to be "B".

Step 404, the first subject entity is used as the target subject entity of the relation word, and the second subject entity is used as the target event entity of the relation word.

After step 403, the first subject entity is "American television production company Dick" and the second subject entity is "B", so for the relation word "acquired by …" the target subject entity is "American television production company Dick" and the target event entity is "B".

Then, according to step 103, the entity association relationship "B -> acquisition -> American television production company Dick" may be generated.

Optionally, as can be seen from the above, the specific process of using the first subject entity as the target subject entity of the relation word and using the second subject entity as the target event entity of the relation word includes: performing entity recognition on the first subject entity and the second subject entity respectively; and taking the recognized first subject entity as the target subject entity of the relation word, and the recognized second subject entity as the target event entity of the relation word. In fact, entity recognition may be performed together with step 403 or within step 404; either arrangement meets the requirements of the embodiments of the present application and achieves the purpose of recognizing entities in a short Chinese text. Extracting the first subject entity and the second subject entity only determines the range in which each entity is located; the entity itself and its exact position are determined only after entity recognition. The entity recognition process therefore improves the accuracy of the whole entity association relationship extraction process.

In addition, if the text containing the relation word "acquired by …" is "The American television production company Dick was acquired by Wanda Group for $10 million (about 78 million Hong Kong dollars)", no other relation word precedes the collaborative word "by", so the entity "American television production company Dick", which precedes "by" and is closest to it, is recognized as the target subject entity. Similarly, no other relation word lies between the collaborative word "by" and the relation word main body "acquired", so the entity "Wanda Group", which lies between "by" and "acquired" and is closest to "acquired", is recognized as the target event entity. The entity association relationship finally generated is therefore "Wanda Group -> acquisition -> American television production company Dick".
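The fallback for a verb passive relation word can be illustrated with the simplified sentence above. This is a sketch only: it assumes pre-segmented tokens in the original word order (with the purchase price omitted) and entity positions supplied by some external recognizer; `passive_fallback` and its parameters are hypothetical names.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (event entity, relation, subject entity)


def passive_fallback(tokens: List[str], entity_idx: List[int],
                     collab_idx: int, body_idx: int,
                     relation: str) -> Optional[Triple]:
    """Resolve a passive relation word when no other relation words are nearby."""
    before = [i for i in entity_idx if i < collab_idx]
    between = [i for i in entity_idx if collab_idx < i < body_idx]
    if not before or not between:
        return None
    subject = tokens[max(before)]   # entity closest before the collaborative word
    event = tokens[max(between)]    # entity closest to the relation word main body
    return (event, relation, subject)


# Assumed segmentation: entity, "by", entity, "acquired".
tokens = ["American television production company Dick", "by",
          "Wanda Group", "acquired"]
triple = passive_fallback(tokens, entity_idx=[0, 2], collab_idx=1,
                          body_idx=3, relation="acquisition")
print(" -> ".join(triple))
```

The same window logic would apply to a noun reverse relation word, since both kinds are split into a collaborative word and a main body.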

In the fourth preferred embodiment of the present application, step 102 is further explained by taking the noun reverse relation as an example. As shown in fig. 5, step 102 may specifically include:

Step 501, if the relation property of the relation word is a noun reverse relation, the relation word is decomposed into a collaborative word and a relation word main body.

Take the text "Company a, the actual controlling company of Gansu Province Electric Power Company, serves as the wholly-owned subsidiary of company b, a subsidiary of National Grid Company" as an example, in which "as … wholly-owned subsidiary" is a relation word with a noun reverse relation, "as" is the collaborative word, and "wholly-owned subsidiary" is the relation word main body.

Step 502, finding in the text a first target relation word that precedes the collaborative word and is closest to it, and a second target relation word that lies between the collaborative word and the relation word main body and is closest to the relation word main body.

In the text, the first target relation word closest before "as" is "actual controlling company", and the second target relation word closest to "wholly-owned subsidiary" between "as" and "wholly-owned subsidiary" is "subsidiary".

Step 503, extracting a first subject entity of the first target relation word and a second subject entity of the second target relation word from the text.

The first target relation word "actual controlling company" is a noun forward relation; its first subject entity is "company a" and its first event entity is "Gansu Province Electric Power Company". The second target relation word "subsidiary" is also a noun forward relation; its second subject entity lies in the text "company b" and is determined to be "company b" through entity recognition, while its second event entity lies in the text "National Grid Company" and is determined to be "National Grid Company" after entity recognition.

Step 504, the first subject entity is used as the target subject entity of the relation word, and the second subject entity is used as the target event entity of the relation word.

After step 503, the first subject entity "company a" is taken as the target subject entity of the relation word "as … wholly-owned subsidiary", and the second subject entity "company b" is taken as its target event entity, so the entity association relationship generated according to step 103 is "company b -> wholly-owned subsidiary -> company a".

Optionally, as can be seen from the above, the specific process of using the first subject entity as the target subject entity of the relation word and using the second subject entity as the target event entity of the relation word includes: performing entity recognition on the first subject entity and the second subject entity respectively; and taking the recognized first subject entity as the target subject entity of the relation word, and the recognized second subject entity as the target event entity of the relation word. In fact, entity recognition may be performed together with step 503 or within step 504; either arrangement meets the requirements of the embodiments of the present application and achieves the purpose of recognizing entities in a short Chinese text. Extracting the first subject entity and the second subject entity only determines the range in which each entity is located; the entity itself and its exact position are determined only after entity recognition. The entity recognition process therefore improves the accuracy of the whole entity association relationship extraction process.

In addition, if the text is "Gansu Province Electric Power Company serves as the wholly-owned subsidiary of National Grid Company", no other relation word precedes the collaborative word "as" of the noun reverse relation word "as … wholly-owned subsidiary", so the entity "Gansu Province Electric Power Company", which is closest before "as", is recognized as the target subject entity of the relation word "as … wholly-owned subsidiary". The entity "National Grid Company", which lies between the collaborative word and the relation word main body and is closest to the relation word main body, is then recognized as the target event entity of the relation word "as … wholly-owned subsidiary". The finally generated entity association relationship is "National Grid Company -> wholly-owned subsidiary -> Gansu Province Electric Power Company".

The above preferred embodiments describe how to extract entity relationships for relation words with different relation properties. For a complex Chinese text containing a plurality of relation words, the entity association relationship needs to be extracted for each relation word, and the entity association relationships corresponding to all the relation words together form all the entity association relationships in that Chinese text.

For example, in the text "Last year, Wanda Sports under the flag of Wanda Group acquired the IRONMAN series of events under the flag of the World Ironman Company", there are three relation words, "under flag", "acquisition" and "under flag", whose relation properties are a noun forward relation, a verb active relation and a noun forward relation, respectively. Extracting and generating the entity association relationship for each of the three relation words according to its relation property yields three entity association relationships: "Wanda Group -> under flag -> Wanda Sports", "Wanda Sports -> acquisition -> IRONMAN series of events" and "World Ironman Company -> under flag -> IRONMAN series of events".

In the text "The American television production company Dick, as the holding stockholder of company A, was acquired by subsidiary B of Wanda Group for $10 million (about 78 million Hong Kong dollars)", there are three relation words, "as … holding stockholder", "acquired by …" and "subsidiary", whose relation properties are a noun reverse relation, a verb passive relation and a noun forward relation, respectively. Extracting and generating the entity association relationship for each of the three relation words according to its relation property yields three entity association relationships: "company A -> holding stockholder -> American television production company Dick", "B -> acquisition -> American television production company Dick" and "Wanda Group -> subsidiary -> B".

In the text "Company a, the actual controlling company of Gansu Province Electric Power Company, serves as the wholly-owned subsidiary of company b, a subsidiary of National Grid Company", there are three relation words, "actual controlling company", "as … wholly-owned subsidiary" and "subsidiary", whose relation properties are a noun forward relation, a noun reverse relation and a noun forward relation, respectively. Extracting and generating the entity association relationship for each of the three relation words according to its relation property yields three entity association relationships: "Gansu Province Electric Power Company -> actual controlling company -> company a", "company b -> wholly-owned subsidiary -> company a" and "National Grid Company -> subsidiary -> company b".
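Across the four embodiments, the only difference in the final assignment is which extracted entity plays which role. The sketch below illustrates that mapping for the last example, assuming the first and second subject entities of each relation word have already been extracted; the `Property` enum, the `Resolved` record and `to_triple` are hypothetical names introduced for illustration.

```python
from enum import Enum
from typing import List, NamedTuple, Tuple


class Property(Enum):
    VERB_ACTIVE = "verb active"
    VERB_PASSIVE = "verb passive"
    NOUN_FORWARD = "noun forward"
    NOUN_REVERSE = "noun reverse"


class Resolved(NamedTuple):
    relation: str
    prop: Property
    first_entity: str    # first subject entity found for the relation word
    second_entity: str   # second subject entity found for the relation word


def to_triple(r: Resolved) -> Tuple[str, str, str]:
    """Return (event entity, relation, subject entity) for one relation word."""
    if r.prop in (Property.VERB_ACTIVE, Property.NOUN_FORWARD):
        return (r.first_entity, r.relation, r.second_entity)
    # Verb passive and noun reverse relations swap the two roles.
    return (r.second_entity, r.relation, r.first_entity)


resolved: List[Resolved] = [
    Resolved("actual controlling company", Property.NOUN_FORWARD,
             "Gansu Province Electric Power Company", "company a"),
    Resolved("wholly-owned subsidiary", Property.NOUN_REVERSE,
             "company a", "company b"),
    Resolved("subsidiary", Property.NOUN_FORWARD,
             "National Grid Company", "company b"),
]
triples = [to_triple(r) for r in resolved]
for t in triples:
    print(" -> ".join(t))
```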

According to the above technical solution, the method for extracting Chinese entity association relationships provided by the embodiments of the present application extracts the target event entity and the target subject entity related to a relation word in a Chinese text according to the relation property of that relation word, and then generates the Chinese entity association relationship corresponding to the relation word from the relation word together with its target event entity and target subject entity. In this technical solution, the relation words in an unstructured Chinese text are divided into different classes according to their relation properties, which narrows the position ranges of the target event entity and the target subject entity of each relation word, improves search precision and search speed, and reduces the amount of computation. In addition, the technical solution also uses division rules at the level of Chinese grammar, so that redundant or erroneous relation words and entities are largely filtered out and the accuracy of extracting relation words and entities is improved.

Referring to fig. 6, an embodiment of the present application further provides an extraction device for chinese entity association, including:

the relation word extracting module 601 is used for extracting relation words in the text;

a property determining module 602, configured to determine a relationship property of each relationship word if the number of the extracted relationship words is greater than 1;

the target entity extraction module 603 is configured to sequentially extract a target event entity and a target subject entity corresponding to each relation word from the text according to the relation property of each relation word;

the association relationship generating module 604 is configured to generate a chinese entity association relationship according to the relation term and the target event entity and the target subject entity corresponding to the relation term.
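One way to picture how the four modules cooperate is as a small pipeline. The sketch below is an assumption about the wiring only: it models each module as a plain callable, iterates over whatever relation words are found, and uses stub implementations in place of real extraction logic; the class `ExtractionDevice` and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]  # (event entity, relation word, subject entity)


@dataclass
class ExtractionDevice:
    extract_relation_words: Callable[[str], List[str]]            # module 601
    determine_property: Callable[[str], str]                      # module 602
    extract_entities: Callable[[str, str, str], Tuple[str, str]]  # module 603
    generate: Callable[[str, str, str], Triple]                   # module 604

    def run(self, text: str) -> List[Triple]:
        triples = []
        for word in self.extract_relation_words(text):
            prop = self.determine_property(word)
            event, subject = self.extract_entities(text, word, prop)
            triples.append(self.generate(event, word, subject))
        return triples


# Stub modules, for illustration only.
device = ExtractionDevice(
    extract_relation_words=lambda text: ["subsidiary"] if "subsidiary" in text else [],
    determine_property=lambda word: "noun forward",
    extract_entities=lambda text, word, prop: ("Wanda Group", "B"),
    generate=lambda event, word, subject: (event, word, subject),
)
result = device.run("Wanda Group subsidiary B")
```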

Optionally, the target entity extraction module 603 further includes: a verb active relation entity extraction module, configured to,

if the relation property of the relation word is a verb active relation, searching the text for a first target relation word that precedes the relation word and is closest to it, and a second target relation word that follows the relation word and is farthest from it;

extracting a first subject entity of the first target relation word and a second subject entity of the second target relation word from the text;

and taking the first subject entity as the target event entity of the relation word, and taking the second subject entity as the target subject entity of the relation word.

Optionally, the target entity extraction module 603 further includes: a noun forward relationship entity extraction module, configured to,

if the relation property of the relation word is a noun forward relation, searching the text for a first target relation word that precedes the relation word and is closest to it, and a second target relation word that follows the relation word and is farthest from it;

Optionally, the target entity extraction module 603 further includes: a verb passive relationship entity extraction module for,

if the relational property of the relational word is verb passive relation, decomposing the relational word into a collaborative word and a relational word main body;

searching the text for a first target relation word that precedes the collaborative word and is closest to it, and a second target relation word that lies between the collaborative word and the relation word main body and is closest to the relation word main body;

and taking the first subject entity as the target subject entity of the relation word, and taking the second subject entity as the target event entity of the relation word.

Optionally, the target entity extraction module 603 further includes: a noun reverse relation entity extraction module, configured to,

if the relation property of the relation word is a noun reverse relation, decomposing the relation word into a collaborative word and a relation word main body;

Optionally, the apparatus further comprises:

the relation word judging module is used for judging whether the relation words exist in a predefined relation library or not;

and, if the relation word exists in the predefined relation library, determining the relation property of the relation word.
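The library check can be sketched as a dictionary lookup. This is a minimal illustration that assumes the predefined relation library maps each known relation word to its relation property; the entries, the "…" gap notation and the function `filter_known` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical library contents: relation word -> relation property.
RELATION_LIBRARY: Dict[str, str] = {
    "acquisition": "verb active",
    "subsidiary": "noun forward",
    "by … acquired": "verb passive",
    "as … wholly-owned subsidiary": "noun reverse",
}


def filter_known(candidates: List[str],
                 library: Dict[str, str] = RELATION_LIBRARY) -> List[Tuple[str, str]]:
    """Keep only candidates found in the library, paired with their property."""
    return [(word, library[word]) for word in candidates if word in library]


known = filter_known(["subsidiary", "yesterday", "acquisition"])
```

Candidates that are absent from the library are simply dropped, so no relation property is determined for them.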

Optionally, the verb active relationship entity extraction module or the noun forward relationship entity extraction module includes:

the first entity identification module is used for respectively carrying out entity identification on the first subject entity and the second subject entity;

and taking the first subject entity after the entity recognition as a target event entity of the relation word, and taking the second subject entity after the entity recognition as a target subject entity of the relation word.

Optionally, the verb passive relationship entity extraction module or the noun reverse relationship entity extraction module includes:

the second entity identification module is used for respectively carrying out entity identification on the first subject entity and the second subject entity;

and taking the recognized first subject entity as the target subject entity of the relation word, and taking the recognized second subject entity as the target event entity of the relation word.

Referring to fig. 7, an embodiment of the present application further provides a system for extracting an association relationship between chinese entities, where the system includes a memory 701 and a processor 702;

the memory 701 is used for storing an executable program of the processor 702;

the processor 702 is configured to:

extracting relation words in the text;

The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims

1. A method for extracting Chinese entity association relation is characterized by comprising the following steps:

extracting relation words in the text;

generating a Chinese entity association relation according to the relation words and the target event entity and the target subject entity corresponding to the relation words;

the step of sequentially extracting the target event entity and the target subject entity corresponding to each relation word from the text according to the relation property of each relation word comprises the following steps:

if the relational property of the relational word is verb passive relation, decomposing the relational word into a collaborative word and a relational word main body; finding a first target relation word which is located before the collaborative word and closest to the collaborative word in the text, and a second target relation word which is located between the collaborative word and the relation word main body and closest to the relation word main body in the text; extracting a first subject entity of the first target relation word and a second subject entity of the second target relation word from a text; taking the first subject entity as a target subject entity of the relation word and the second subject entity as a target event entity of the relation word;

if the relation property of the relation word is a noun reverse relation, decomposing the relation word into a collaborative word and a relation word main body; and if the first target relation word does not exist before the collaborative word, taking the entity that precedes the collaborative word and is closest to it as the target subject entity; or if the second target relation word does not exist between the collaborative word and the relation word main body, taking the entity that lies between the collaborative word and the relation word main body and is closest to the relation word main body as the target event entity.

2. The method according to claim 1, wherein the step of sequentially extracting the target event entity and the target subject entity corresponding to each relation term from the text according to the relation property of each relation term comprises:

if the relation property of the relation word is a verb active relation, searching the text for a first target relation word that precedes the relation word and is closest to it, and a second target relation word that follows the relation word and is farthest from it;

extracting a first subject entity of the first target relation word and a second subject entity of the second target relation word from a text;

and taking the first subject entity as the target event entity of the relation word, and taking the second subject entity as the target subject entity of the relation word.

3. The method according to claim 1, wherein the step of sequentially extracting the target event entity and the target subject entity corresponding to each relation term from the text according to the relation property of each relation term comprises:

4. The method according to claim 1, wherein the step of sequentially extracting the target event entity and the target subject entity corresponding to each relation term from the text according to the relation property of each relation term comprises:

if the relation property of the relation word is a noun reverse relation, decomposing the relation word into a collaborative word and a relation word main body;

finding in the text a first target relation word that precedes the collaborative word and is closest to it, and a second target relation word that lies between the collaborative word and the relation word main body and is closest to the relation word main body;

and taking the first subject entity as a target subject entity of the relation word, and taking the second subject entity as a target event entity of the relation word.

5. The method according to any one of claims 2-4, wherein after extracting the relation words in the text, the method further comprises:

judging whether the relation words exist in a predefined relation library or not;

determining a relational property of the relational term if the relational term exists in a predefined relational library.

6. The method according to any one of claims 2-3, wherein the step of using the first subject entity as the target event entity of the relation word and the second subject entity as the target subject entity of the relation word comprises:

respectively carrying out entity identification on the first subject entity and the second subject entity;

7. The method of claim 4, wherein the step of using the first subject entity as the target subject entity of the relation word and the second subject entity as the target event entity of the relation word comprises:

8. An extraction device for Chinese entity association relationship, the device comprising:

the association relationship generating module is used for generating a Chinese entity association relationship according to the relation words and the target event entity and the target subject entity corresponding to the relation words;

the target entity extraction module is further used for decomposing the relation word into a collaborative word and a relation word main body if the relation property of the relation word is a verb passive relation; finding in the text a first target relation word that precedes the collaborative word and is closest to it, and a second target relation word that lies between the collaborative word and the relation word main body and is closest to the relation word main body; extracting a first subject entity of the first target relation word and a second subject entity of the second target relation word from the text; taking the first subject entity as the target subject entity of the relation word and the second subject entity as the target event entity of the relation word; if the relation property of the relation word is a noun reverse relation, decomposing the relation word into a collaborative word and a relation word main body; and if the first target relation word does not exist before the collaborative word, taking the entity that precedes the collaborative word and is closest to it as the target subject entity; or if the second target relation word does not exist between the collaborative word and the relation word main body, taking the entity that lies between the collaborative word and the relation word main body and is closest to the relation word main body as the target event entity.

9. A system for extracting Chinese entity association relationship, characterized by comprising a memory and a processor;

the memory is used for storing an executable program of the processor;

the processor is configured to:

extracting relation words in the text;

the processor is further configured to: