WO2017122904A1

WO2017122904A1 - Open information extraction method and system for extracting reified ternary relationship

Info

Publication number: WO2017122904A1
Application number: PCT/KR2016/010902
Authority: WO
Inventors: Key-Sun Choi (최기선); Sangha Nam (남상하); Younggyun Hahm (함영균)
Original assignee: Korea Advanced Institute of Science and Technology (한국과학기술원)
Priority date: 2016-01-11
Filing date: 2016-09-29
Publication date: 2017-07-20

Abstract

Disclosed is an open information extraction method and system for extracting a reified ternary relationship. A computer-implemented method comprises the steps of: receiving an input of a text for information extraction; extracting an argument and a predicate included in the text; and representing the argument and the predicate using a ternary relationship in the resource description framework (RDF).

Description

Open information extraction method and system for extracting reified ternary relations

The description below relates to a technique for extracting information from text.

Today, with the growth of the Internet, a wide variety of information is provided through websites. On the current web, users must follow links from site to site to reach the information they want. However, querying a large number of web pages is more effective than reading them all. Querying requires extracting the information contained in the web pages and converting it into structured or semi-structured data.

Many web information extraction tools currently exist, and they fall into automatic extraction tools and manual extraction tools. If a web page consists of data structured according to a given schema, its information can be extracted automatically, but many web pages contain unstructured data without a defined schema. For such unstructured data, the user must specify the schema of the data to be extracted, and extraction rules are required to extract data conforming to that schema.

Most conventional methods for extracting information from web pages depend on a specific domain, which makes them difficult to port to other domains. For example, the prior patent "System and Method for Extracting Domain-Specific Information from Unstructured Web Documents" (Application No. 10-2005-0063896) describes rules for extracting information from unstructured web documents classified by domain, and discloses a per-domain method for automatically extracting key information from web documents of a specific domain. Most information extraction techniques likewise target text in a specific domain and map it to specific classes of a domain ontology.

The present invention provides a method and system for extracting information from any text through a fully open information extraction method that applies to all domains rather than to one specific domain.

It also provides an open information extraction method and system that interpret linguistic structure as coherent ternary relations when extracting new knowledge from text used as a knowledge source.

It also provides an open information extraction method and system that reify all predicate-argument relations in text and express them as ternary relations of the Resource Description Framework (RDF), a knowledge representation language.

It also provides a method and system that facilitate knowledge base integration and query processing by representing all information extracted from text as reified ternary relations.

A computer-implemented method comprises: receiving text as an information extraction target; extracting the arguments and predicates included in the text; and expressing the arguments and the predicates as ternary relations of the Resource Description Framework (RDF).

According to an aspect, the extracting may extract all the arguments and predicates included in the text in phrase units.

According to another aspect, the method may further include analyzing the syntax structure between the argument and the predicate, and the expressing may express the relation between the argument and the predicate as a ternary relation according to that syntax structure.

According to another aspect, the method may further include analyzing the syntax structure between the argument and the predicate, and the expressing may express the relation between the argument and the predicate as a ternary relation according to a ternary relation conversion rule corresponding to that syntax structure.

According to another aspect, the analyzing may analyze, for each argument, the dependency structure of that argument with respect to the predicate.

According to yet another aspect, the method may further include determining a subject among the arguments, and the expressing may express a core ternary relation that includes the subject for the relation between the arguments and the predicate, and then express, based on the core ternary relation, reified ternary relations for the relations between the remaining arguments and the predicates.

According to another aspect, the expressing may express a core ternary relation consisting of a core subject, a core verb, and a core object for the relation between the arguments and the predicate, and then express, based on the core ternary relation, reified ternary relations for the relations between the remaining arguments and the predicates.

A computer program recorded on a recording medium is also provided which, combined with a computer system, executes the steps of: receiving text as an information extraction target; extracting the arguments and predicates included in the text; and expressing the arguments and the predicates as ternary relations of the Resource Description Framework (RDF).

A computer-implemented system is also provided, comprising at least one processor configured to execute computer-readable instructions, wherein the at least one processor receives text as an information extraction target, extracts the arguments and predicates included in the text, and expresses the arguments and the predicates as ternary relations of the Resource Description Framework (RDF).

According to an exemplary embodiment of the present invention, information extraction on an open domain can be performed on any text using a fully open information extraction method that applies to all domains rather than to one specific domain.

According to an embodiment of the present invention, more information can be extracted accurately from text by converting all predicate-argument relations in a text into coherent, reified ternary relations.

According to an embodiment of the present invention, the information extracted from the text keeps the form of ternary relations, which facilitates integration with other knowledge bases and keeps the data in a form that can be queried by conventional means.

According to an embodiment of the present invention, the information extracted from the text takes the form of reified relations, which prevents confusion between individual pieces of knowledge and further improves the accuracy of query results.

FIG. 1 is a block diagram illustrating the internal configuration of an open information extraction system according to an embodiment of the present invention.

FIG. 2 is a flowchart illustrating an open information extraction method according to an embodiment of the present invention.

FIGS. 3 to 5 illustrate an example of a process of expressing the information in text as reified ternary relations according to an embodiment of the present invention.

FIGS. 6 to 12 are exemplary diagrams describing ternary relation conversion rules for different syntax structures according to an embodiment of the present invention.

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings.

The present embodiments relate to a technique for extracting information from text, and more particularly, to a method and system for extracting information by converting all predicate-argument relations in text into knowledge. The technique can be applied in various fields such as knowledge base construction, question answering systems, and knowledge-based decision-making systems (e.g., healthcare, legal expertise, and decision support).

The present invention provides an open information extraction technique that applies to all domains rather than to one specific domain. In particular, all information in the text can be extracted by expanding it into reified ternary relations, which prevents the loss of important information such as the time and place of the event the text describes. In addition, to facilitate integration with existing knowledge bases, to allow queries on predicates, and to prevent confusion between individual pieces of knowledge, all predicate-argument relations in the text are reified and expressed as ternary relations of RDF (Resource Description Framework), a knowledge representation language. Existing open information extraction is limited to a single predicate located between two arguments, that is, binary fact extraction, and therefore cannot extract and express all of the information a text conveys. An RDF ternary relation, by contrast, is a <subject, predicate, object> construct in which the predicate denotes a relation or property between the entity in the subject position and the entity or value in the object position. Accordingly, in the present invention, all information that can be extracted from text is reified as ternary relations to facilitate knowledge base integration and query processing.
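As a rough illustration of how reified triples can be handed to any RDF store, the sketch below serializes the triples of the CNN example discussed later into N-Triples, a standard line-based RDF syntax. The namespace `http://example.org/kb/`, the helper name `to_ntriples`, and the choice to map each ANONYMOUS placeholder to a fresh blank node are assumptions for illustration, not part of the described method.

```python
from itertools import count
from typing import Iterable, List, Tuple
from urllib.parse import quote

Triple = Tuple[str, str, str]

# Hypothetical namespace; a real deployment would use its own IRIs.
BASE = "http://example.org/kb/"
ANONYMOUS = "ANONYMOUS"


def to_ntriples(triples: Iterable[Triple]) -> List[str]:
    """Serialize (subject, predicate, object) label triples as N-Triples lines."""
    blank_ids = count()

    def term(label: str) -> str:
        # Each ANONYMOUS slot (object position only) becomes its own blank node,
        # so unrelated unknown objects are never merged into one resource.
        if label == ANONYMOUS:
            return f"_:b{next(blank_ids)}"
        # Percent-encode spaces, '#', and non-ASCII text into one IRI path segment.
        return f"<{BASE}{quote(label, safe='')}>"

    return [f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples]


if __name__ == "__main__":
    reified = [
        ("CNN", "select#1", "Italian cuisine"),
        ("select#1", "to", "the best dish in the world"),
        ("select#1", "JOSA", "2013"),
    ]
    for line in to_ntriples(reified):
        print(line)
```

Note that the event node `select#1` is used both as a predicate and as a subject; RDF allows the same IRI in both positions, which is what lets the remaining arguments hang off a single occurrence of the predicate.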

Hereinafter, an open information extraction system implemented by a computer and an open information extraction method that can be performed by the open information extraction system will be described in more detail.

FIG. 1 is a block diagram illustrating the internal configuration of an open information extraction system according to an embodiment of the present invention, and FIG. 2 is a flowchart illustrating an open information extraction method according to an embodiment of the present invention.

The open information extraction system 100 according to the present embodiment may include a processor 110, a bus 120, a network interface 130, a memory 140, and a database 150. The memory 140 may include an operating system 141 and an information extraction routine 142. The processor 110 may include a predicate-argument extractor 111, a syntax structure analyzer 112, a subject determiner 113, a syntax structure pattern comparator 114, a ternary relation extractor 115, and a ternary relation reifier 116. In other embodiments, the open information extraction system 100 may include more components than those shown in FIG. 1.

The memory 140 is a computer-readable recording medium and may include random access memory (RAM), read-only memory (ROM), and a permanent mass storage device such as a disk drive. The memory 140 may also store program code for the operating system 141 and the information extraction routine 142. These software components may be loaded from a computer-readable recording medium separate from the memory 140, such as a floppy drive, a disk, a tape, a DVD/CD-ROM drive, or a memory card (not shown). In other embodiments, the software components may be loaded into the memory 140 through the network interface 130 rather than from a computer-readable recording medium.

The bus 120 may enable communication and data transmission between the components of the open information extraction system 100. The bus 120 may be configured using a high-speed serial bus, a parallel bus, a storage area network, and/or other suitable communication technology.

The network interface 130 may be a computer hardware component that connects the open information extraction system 100 to a computer network through a wireless or wired connection, and may provide a function for communicating with other electronic devices over that network. For example, the computer network may include any one or more of a personal area network (PAN), a local area network (LAN), a campus area network (CAN), a metropolitan area network (MAN), a wide area network (WAN), a broadband network (BBN), the Internet, and the like. The computer network may also have any one or more network topologies, including a bus network, a star network, a ring network, a mesh network, a star-bus network, and a tree or hierarchical network, but is not limited thereto.

The database 150 stores and maintains the data that is the target of information extraction, and may contain natural language text and the like as a knowledge source. Although FIG. 1 shows the database 150 built into the open information extraction system 100, the present invention is not limited thereto: the database may be omitted depending on the system implementation or environment, or all or part of it may exist as an external database built on a separate system.

The processor 110 may be configured to process the instructions of a computer program by performing basic arithmetic, logic, and input/output operations of the open information extraction system 100. The instructions may be provided to the processor 110 from the memory 140 or the network interface 130 via the bus 120. For example, the processor 110 may be configured to execute instructions received according to program code stored in a recording device such as the memory 140.

The processor 110 includes, as components, a predicate-argument extractor 111, a syntax structure analyzer 112, a subject determiner 113, a syntax structure pattern comparator 114, a ternary relation extractor 115, and a ternary relation reifier 116. The processor 110 and its components may execute the program code loaded in the memory 140 to control the open information extraction system 100 to perform steps S210 to S260 of the method of FIG. 2. Such program code may be loaded from a program file into a recording device such as the memory 140. The processor 110 and its components may be implemented to execute instructions according to the code of at least one program included in the memory 140. The components of the processor 110 may also be representations of different functions performed by the processor 110. For example, the predicate-argument extractor 111 may serve as a functional representation of the processor 110 operating, according to the instructions above, to extract predicates and arguments from text.

In operation S210, the predicate-argument extractor 111 may receive as input natural language text that is the target of information extraction, that is, a knowledge source, and extract the arguments and predicates included in the text. For example, the predicate-argument extractor 111 may extract arguments and predicates, the basic components of information extraction, from the text in phrase units. For instance, it may separate the text into morphemes and extract predicates and arguments based on the parts of speech, which represent the grammatical properties of each separated morpheme. A predicate is the element that forms the basis of a sentence and describes the action, state, or nature of the subject. A predicate requires other linguistic elements to complete its meaning; these elements are called the arguments of the predicate. For example, word sequences such as single nouns, compound nouns, noun phrases, and noun clauses may be arguments. The predicate-argument extractor 111 may then extract from the input text the ending and position of each predicate and the postposition (particle) and position of each argument.
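A minimal sketch of step S210, assuming the text has already been run through a Korean morphological analyzer and grouped into phrase chunks. The Sejong-style tags (`NNG`, `JKO`, `XSV`, `EF`, and so on), the `Phrase` type, and the rule "a chunk containing a verb, adjective, or verb-deriving suffix is a predicate" are illustrative assumptions, not the extractor described in the source.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Morph = Tuple[str, str]  # (morpheme, POS tag)

# Assumed Sejong-style tags: VV verb, VA adjective, XSV verb-deriving suffix.
PREDICATE_TAGS = {"VV", "VA", "XSV"}


@dataclass
class Phrase:
    text: str              # content part of the phrase
    kind: str              # "PRED" or "ARG"
    position: int          # index of the phrase in the sentence
    marker: Optional[str]  # ending for predicates, postposition for arguments


def extract_phrases(chunks: List[List[Morph]]) -> List[Phrase]:
    phrases = []
    for position, morphs in enumerate(chunks):
        is_pred = any(tag in PREDICATE_TAGS for _, tag in morphs)
        # Endings (E* tags) mark predicates; postpositions (J* tags) mark arguments.
        marker_prefix = "E" if is_pred else "J"
        content = "".join(m for m, t in morphs if not t.startswith(marker_prefix))
        marker = "".join(m for m, t in morphs if t.startswith(marker_prefix)) or None
        phrases.append(Phrase(content, "PRED" if is_pred else "ARG", position, marker))
    return phrases


if __name__ == "__main__":
    # "CNN은 이탈리아 요리를 선정했다", with the noun phrase pre-merged by a chunker.
    chunks = [
        [("CNN", "SL"), ("은", "JX")],
        [("이탈리아 요리", "NNG"), ("를", "JKO")],
        [("선정", "NNG"), ("하", "XSV"), ("였", "EP"), ("다", "EF")],
    ]
    for phrase in extract_phrases(chunks):
        print(phrase)
```

The ending and postposition are kept separately from the content because the later steps (dependency analysis and subject selection) rely on them.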

In operation S220, the syntax structure analyzer 112 may analyze the syntax structure between the predicates and arguments extracted from the text. The predicate-argument syntax structure contains information about which predicate each argument depends on. The syntax structure analyzer 112 may analyze the dependency structure between the extracted predicates and arguments based on the endings and positions of the predicates and the postpositions and positions of the arguments extracted by the predicate-argument extractor 111. For example, predicates are identified from the part of speech of each word in the text, and the elements (arguments) each predicate requires are then extracted based on the predicate-argument structure dictated by the grammatical form of the predicate, which yields the dependency structure between the predicates and the arguments.
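One simple way to approximate step S220 for a head-final language such as Korean is sketched below: each argument attaches to the nearest predicate on its right, except that an argument carrying a subject or topic postposition attaches to the sentence-final predicate. This heuristic and the particle set are assumptions for illustration; they are not the analyzer described in the source, and a production system would normally use a trained dependency parser.

```python
from typing import Dict, List, Optional, Tuple

# (text, kind, postposition) in surface order; kind is "ARG" or "PRED".
Item = Tuple[str, str, Optional[str]]

# Assumed nominative/topic particles that mark a sentence-level subject.
SUBJECT_MARKERS = {"이", "가", "은", "는"}


def attach_arguments(items: List[Item]) -> Dict[int, int]:
    """Return {argument index: index of the predicate it depends on}."""
    pred_positions = [i for i, (_, kind, _) in enumerate(items) if kind == "PRED"]
    if not pred_positions:
        return {}
    final_pred = pred_positions[-1]
    heads: Dict[int, int] = {}
    pending: List[int] = []
    for i, (_, kind, marker) in enumerate(items):
        if kind == "ARG":
            if marker in SUBJECT_MARKERS:
                heads[i] = final_pred  # the subject depends on the main predicate
            else:
                pending.append(i)
        else:
            for arg in pending:  # head-final: attach to the next predicate
                heads[arg] = i
            pending = []
    return heads


if __name__ == "__main__":
    # Glossed version of the FIG. 11 sentence (Antoine Lavoisier ... notation).
    items = [
        ("Antoine Lavoisier", "ARG", "는"),
        ("Korea", "ARG", "에서"),
        ("shorten", "PRED", None),
        ("Lavoisier", "ARG", "로도"),
        ("notation", "PRED", None),
    ]
    print(attach_arguments(items))
```

For this input, <Korea> attaches to <shorten>, while <Lavoisier> and the subject attach to the final predicate <notation>.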

In step S230, the subject determiner 113 determines the subject from among the arguments extracted from the text. For example, the subject determiner 113 may select candidate arguments that could be the subject (e.g., noun phrases) from among the arguments in the text, and then, based on the postpositions and positions of the arguments extracted by the predicate-argument extractor 111, choose as the subject the candidate with the earliest position and/or the candidate carrying a specific postposition (,,,,).
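Step S230 can be sketched as follows. The particle set 이, 가, 은, 는 (the standard Korean nominative and topic particles) is an assumption and not necessarily the list the method uses, and falling back to the earliest argument when no particle matches is likewise only one reading of "earliest position and/or specific postposition".

```python
from typing import List, Optional, Tuple

# (text, postposition, position) for each argument.
Argument = Tuple[str, Optional[str], int]

# Assumption: nominative (이/가) and topic (은/는) particles signal a subject.
SUBJECT_PARTICLES = {"이", "가", "은", "는"}


def choose_subject(arguments: List[Argument]) -> Optional[str]:
    """Prefer the earliest subject-marked argument; otherwise the earliest argument."""
    if not arguments:
        return None
    marked = [a for a in arguments if a[1] in SUBJECT_PARTICLES]
    pool = marked or arguments
    return min(pool, key=lambda a: a[2])[0]


if __name__ == "__main__":
    args = [("2013년", None, 0), ("CNN", "은", 1), ("이탈리아 요리", "를", 2)]
    print(choose_subject(args))
```

For the CNN example, <CNN> is chosen even though <2013> appears earlier, because only <CNN> carries a subject-marking particle.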

In operation S240, the syntax structure pattern comparator 114 may determine the structural pattern corresponding to the text by comparing the predicate-argument syntax structure analyzed by the syntax structure analyzer 112 with predetermined representative dependency structure patterns. Representative structural patterns may be defined in advance for the dependency structure between predicates and arguments, and the syntax structure pattern comparator 114 compares the syntax structure of the input text with these predefined patterns to decide which conversion rule to apply to the text. The representative dependency structure patterns define rules for converting text into coherent ternary relations according to its syntax structure; they are described in detail later.
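Because the representative patterns in FIGS. 6 to 12 use regular-expression-like notation (`*` for zero or more, `+` for one or more), step S240 can be sketched by encoding the phrase labels as a string and testing the patterns with Python's `re.fullmatch`. Matching on the linear label sequence, and checking the patterns in order from the first to the fourth so that the first match wins, are assumptions of this sketch.

```python
import re
from typing import List, Optional

# Representative dependency patterns over space-separated phrase labels.
PATTERNS = [
    ("first", r"SBJ( VP)* REL"),                          # <SBJ (VP)* REL>
    ("second", r"(NP )*SBJ( NP)* REL"),                   # <(NP)* SBJ (NP)* REL>
    ("third", r"(NP )*SBJ( NP)*( VP)* REL"),              # <(NP)* SBJ (NP)* (VP)* REL>
    ("fourth", r"(NP )*SBJ( NP)*( VP)+( NP)+( VP)* REL"), # <(NP)* SBJ (NP)* (VP)+ (NP)+ (VP)* REL>
]


def match_pattern(labels: List[str]) -> Optional[str]:
    """Return the name of the first representative pattern the label sequence fits."""
    sequence = " ".join(labels)
    for name, regex in PATTERNS:
        if re.fullmatch(regex, sequence):
            return name
    return None


if __name__ == "__main__":
    print(match_pattern(["SBJ", "VP", "VP", "REL"]))
    print(match_pattern(["SBJ", "NP", "VP", "NP", "REL"]))
```

The first and second patterns are special cases of the third, so the checking order matters; a sequence with a noun phrase after a verb phrase can only match the fourth.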

In operation S250, the ternary relation extractor 115 may extract the core ternary relation of the text based on the structural pattern that matches the predicate-argument syntax structure of the text. For example, the ternary relation extractor 115 may extract the core ternary relation from the subject (SBJ), verb (VP), and object (OBJ), which carry the core content of the text. In other words, the ternary relation extractor 115 may create a core ternary relation from the core subject, core verb, and core object of the text. This core ternary relation serves as the basis for building the reified ternary relations.

In step S260, the ternary relation reifier 116 may reify the remaining predicate-argument relations based on the core ternary relation extracted by the ternary relation extractor 115. The ternary relation reifier 116 may express all predicate-argument relations included in the text as reified ternary relations. In other words, a reified ternary relation is a form in which all predicates and arguments of the text are represented relative to the core ternary relation.
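The `#1` suffix in the examples that follow marks one specific occurrence of a predicate. The sketch below illustrates why per-occurrence nodes help, assuming a running counter per predicate lemma; the `InstanceMinter` helper and the second fact about "Dish X" in 2015 are hypothetical. Because modifiers such as a year attach to the occurrence node rather than to a shared verb, facts from different sentences stay separate.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


class InstanceMinter:
    """Hypothetical helper that gives every predicate occurrence its own node (lemma#n)."""

    def __init__(self) -> None:
        self._counts: Dict[str, int] = defaultdict(int)

    def mint(self, lemma: str) -> str:
        self._counts[lemma] += 1
        return f"{lemma}#{self._counts[lemma]}"


def reify(minter: InstanceMinter, subject: str, lemma: str, obj: str,
          modifiers: List[Tuple[str, str]]) -> List[Triple]:
    """Core triple through a fresh occurrence node, plus one triple per modifier."""
    event = minter.mint(lemma)
    return [(subject, event, obj)] + [(event, role, value) for role, value in modifiers]


def modifier_of(kb: List[Triple], subject: str, obj: str, role: str) -> Optional[str]:
    """Find the modifier attached to the occurrence linking subject and obj."""
    events = {p for s, p, o in kb if s == subject and o == obj}
    return next((o for s, p, o in kb if s in events and p == role), None)


if __name__ == "__main__":
    minter = InstanceMinter()
    kb = reify(minter, "CNN", "select", "Italian cuisine", [("JOSA", "2013")])
    kb += reify(minter, "CNN", "select", "Dish X", [("JOSA", "2015")])  # hypothetical
    print(modifier_of(kb, "CNN", "Italian cuisine", "JOSA"))
```

With a single shared `select` predicate, the two years could no longer be tied to the right dish; with `select#1` and `select#2`, each year stays with its own selection.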

FIG. 3 illustrates the open information extraction process for an example sentence.

Consider the input sentence 300, "In 2013, CNN selected Italian cuisine as the best dish in the world."

The predicate-argument extractor 111 may extract all predicates and arguments included in the input sentence 300 (301). From the input sentence 300, the arguments <2013, CNN, the best dish in the world, Italian cuisine> and the predicate <select> are extracted.

The syntax structure analyzer 112 may analyze the dependency structure between the predicate and the arguments extracted from the input sentence 300 (302). As a phrase-level dependency structure, the relation of each of the arguments <2013, CNN, the best dish in the world, Italian cuisine> to the predicate <select> can be expressed.

The subject determiner 113 may determine the subject among the arguments extracted from the input sentence 300 (303). Based on the postpositions and positions of the arguments, the core subject <CNN> can be selected from the arguments <2013, CNN, the best dish in the world, Italian cuisine>.

The ternary relation extractor 115 may extract the core ternary relation of the input sentence 300 according to the structural pattern corresponding to the dependency structure between its predicate and arguments (304). For the input sentence 300, the core subject <CNN>, the core verb <select>, and the core object <Italian cuisine> can be expressed as the core ternary relation.

The ternary relation reifier 116 may reify the remaining predicate-argument relations in the input sentence 300 based on its core ternary relation (305). Based on the core ternary relation <CNN>-<select#1>-<Italian cuisine>, the remaining predicate-argument relations can be expressed as the reified ternary relations <select#1>-<to>-<the best dish in the world>, <select#1>-<JOSA>-<2013>, and <select#1>-<SP>-<select>.

Likewise, referring to FIG. 4, for the input sentence 400 "CNN selected Italian cuisine as the best dish in the world.", all predicate-argument relations within the input sentence 400 can be expressed as reified ternary relations (405) based on its predicate-argument syntax structure 402. Thus, by generating reified ternary relations from the syntax structure of a sentence, phrase-level information extraction prevents the information loss that occurs in word-level extraction methods.

Although a Korean sentence is used as the example above, English sentences are handled in the same way, as illustrated in FIG. 5: for the English sentence 500 "A. Einstein was awarded the Nobel Prize in Sweden in 1921.", the predicate-argument syntax structure 502 of the sentence 500 can be identified, and based on it, all predicate-argument relations in the sentence 500 can be expressed as reified ternary relations 505.

To define rules for converting text into coherent ternary relations according to its syntax structure, the present invention classifies the syntax structures of text into four representative dependency structure patterns.

FIGS. 6 to 12 are diagrams for explaining examples of the representative dependency structure patterns.

In FIGS. 6 to 12, SBJ denotes the core subject, VP a verb phrase, NP a noun phrase, and REL the core predicate.

FIG. 6 shows an example of the first structural pattern.

The first structural pattern 600 has the syntax structure <SBJ (VP)* REL>, for which a ternary relation conversion rule 610 may be defined as [{SBJ-REL#1-ANONYMOUS}, {REL#1-VP#1-ANONYMOUS}, {VP#1-VP#2-ANONYMOUS}, {VP#2-VP*-ANONYMOUS*}]. For example, in the sentence "A person is born, lives, and dies," <person> corresponds to SBJ, <born> and <live> correspond to VP, and <die> corresponds to REL. The sentence has the syntax structure <SBJ (VP)* REL>, so all the information extracted from it (SBJ, VP, REL) may be expressed as reified ternary relations 605 according to the conversion rule 610 of the first structural pattern 600. In this case, the core ternary relation {person-die#1-ANONYMOUS} is created, and the remaining relations are expressed, based on it, as the reified ternary relations 605 {die#1-live#1-ANONYMOUS} and {live#1-born#1-ANONYMOUS}.
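The first conversion rule reads as a chain that starts at REL and walks back through the verb phrases, each link having an ANONYMOUS object. A minimal sketch, assuming the verb phrases are given in surface (Korean) order and that each predicate lemma occurs only once in the sentence, so a fixed `#1` suffix is enough to identify each occurrence:

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]
ANONYMOUS = "ANONYMOUS"


def first_pattern_triples(sbj: str, vps: List[str], rel: str) -> List[Triple]:
    """<SBJ (VP)* REL>: core triple, then REL -> nearest VP -> ... -> first VP."""
    rel_node = f"{rel}#1"
    triples = [(sbj, rel_node, ANONYMOUS)]  # {SBJ-REL#1-ANONYMOUS}
    previous = rel_node
    for vp in reversed(vps):  # the VP closest to REL becomes VP#1
        vp_node = f"{vp}#1"
        triples.append((previous, vp_node, ANONYMOUS))
        previous = vp_node
    return triples


if __name__ == "__main__":
    for t in first_pattern_triples("person", ["born", "live"], "die"):
        print(t)
```

For the example sentence this yields {person-die#1-ANONYMOUS}, {die#1-live#1-ANONYMOUS}, and {live#1-born#1-ANONYMOUS}.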

FIG. 7 shows an example of the second structural pattern.

The second structural pattern 700 has the syntax structure <(NP)* SBJ (NP)* REL>, for which a ternary relation conversion rule 710 may be defined as [{SBJ-REL#1-NP#1}, {REL#1-NP#2JOSA-NP#2}, {REL#1-NP#3JOSA-NP#3}, {REL#1-NP*JOSA-NP*}], where NP#nJOSA denotes the postposition (josa) attached to NP#n. For example, in the sentence "In 2013, CNN selected Italian cuisine as the best dish in the world," <CNN> corresponds to SBJ, <2013>, <Italian cuisine>, and <the best dish in the world> correspond to NP, and <select> corresponds to REL. The sentence has the syntax structure <(NP)* SBJ (NP)* REL>, so all the information extracted from it (SBJ, NP, REL) may be expressed as reified ternary relations 705 according to the conversion rule 710 of the second structural pattern 700. In this case, the core ternary relation <CNN-select#1-Italian cuisine> is created, and the remaining phrase relations are expressed, based on it, as the reified ternary relations 705 {select#1-to-the best dish in the world} and {select#1-JOSA-2013}.
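The second conversion rule can be sketched the same way. Here each noun phrase is given with its postposition already translated into a role label; the `"OBJ"` label for the object-marked phrase, and the use of the placeholder `"JOSA"` when a phrase has no postposition (as with <2013> above), are assumptions of this sketch.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]
ANONYMOUS = "ANONYMOUS"


def second_pattern_triples(sbj: str, rel: str,
                           nps: List[Tuple[str, Optional[str]]]) -> List[Triple]:
    """<(NP)* SBJ (NP)* REL>: the object-marked NP fills the core triple,
    and every other NP hangs off REL#1 under its postposition."""
    rel_node = f"{rel}#1"
    obj_index = next((i for i, (_, role) in enumerate(nps) if role == "OBJ"), None)
    core_obj = nps[obj_index][0] if obj_index is not None else ANONYMOUS
    triples = [(sbj, rel_node, core_obj)]  # {SBJ-REL#1-NP#1}
    for i, (np_text, role) in enumerate(nps):
        if i != obj_index:
            triples.append((rel_node, role or "JOSA", np_text))  # {REL#1-NP#kJOSA-NP#k}
    return triples


if __name__ == "__main__":
    nps = [("2013", None), ("Italian cuisine", "OBJ"), ("the best dish in the world", "to")]
    for t in second_pattern_triples("CNN", "select", nps):
        print(t)
```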

FIG. 8 shows an example of the third structural pattern.

The third structural pattern 800 has the syntax structure <(NP)* SBJ (NP)* (VP)* REL>, for which a ternary relation conversion rule 810 may be defined as [{SBJ-REL#1-ANONYMOUS}, {REL#1-VP#1-ANONYMOUS}, {VP#1-VP*-NP#1}, {VP*-NP#2JOSA-NP#2}, {VP*-NP*JOSA-NP*}]. For example, in the sentence "Ida's orbit, like those of the other asteroids in the asteroid belt, lies between Mars and Jupiter," <Ida's orbit> corresponds to SBJ, <the other asteroids in the asteroid belt> and <between Mars and Jupiter> correspond to NP, <lie> corresponds to VP, and <to> corresponds to REL. The sentence has the syntax structure <(NP)* SBJ (NP)* (VP)* REL>, so all the information extracted from it (SBJ, NP, VP, REL) may be expressed as reified ternary relations 805 according to the conversion rule 810 of the third structural pattern 800. In this case, the core ternary relation <Ida's orbit-to#1-ANONYMOUS> is created, and the remaining phrase relations are expressed, based on it, as the reified ternary relations 805 {to#1-lie#1-ANONYMOUS}, {lie#1-on-between Mars and Jupiter}, and {lie#1-like-the other asteroids}.

FIG. 9 shows an example of the fourth structural pattern. Referring to FIG. 9, the fourth structural pattern 900 has the syntax structure <(NP)* SBJ (NP)* (VP)+ (NP)+ (VP)* REL> and can cover a variety of predicate-argument dependencies.

As shown in FIG. 10, for the syntax structure <(NP)* SBJ (NP)* (VP)+ (NP)+ (VP)* REL> of the fourth structural pattern 900, a ternary relation conversion rule 1010 may be defined as [{SBJ-REL#1-REL.NP#1}, {REL#1-VP#1-VP#1.NP#2}, {VP#1-VP+-VP+.NP*}, {VP+-NP*JOSA-VP+.NP*}]. Here, REL.NP#1 indicates that NP#1 depends on REL, VP#1.NP#2 indicates that NP#2 depends on VP#1, and VP+, unlike VP*, must appear at least once.

For example, referring to FIG. 11, in the sentence "Antoine Lavoisier is also abbreviated as Lavoisier in Korea," <Antoine Lavoisier> corresponds to SBJ, <Korea> and <Lavoisier> correspond to NP, <shorten> corresponds to VP, and <notation> corresponds to REL. The sentence has the syntax structure <(NP)* SBJ (NP)* (VP)+ (NP)+ (VP)* REL>, so all the information extracted from it (SBJ, NP, VP, REL) may be expressed as reified ternary relations 1105 according to the conversion rule 1010 of the fourth structural pattern 900. In this case, the core ternary relation <Antoine Lavoisier-notation#1-ANONYMOUS> is created, and the remaining phrase relations are expressed, based on it, as the reified ternary relations 1105 {notation#1-also-Lavoisier}, {notation#1-shorten#1-ANONYMOUS}, and {shorten#1-in-Korea}.

As another example, referring to FIG. 12, in the sentence "Lavoisier developed chemistry by insisting on a new combustion theory and discarding the phlogiston theory," <Lavoisier> corresponds to SBJ, <new combustion theory>, <phlogiston theory>, and <chemistry> correspond to NP, <insist> and <discard> correspond to VP, and <develop> corresponds to REL. The sentence has the syntax structure <(NP)* SBJ (NP)* (VP)+ (NP)+ (VP)* REL>, so all the information extracted from it (SBJ, NP, VP, REL) may be expressed as reified ternary relations 1205 according to the conversion rule 1010 of the fourth structural pattern 900. In this case, the core ternary relation <Lavoisier-develop#1-chemistry> is created, and the remaining phrase relations are expressed, based on it, as the reified ternary relations 1205 {develop#1-discard#1-phlogiston theory} and {discard#1-insist#1-new combustion theory}.
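Read together, the conversion rules suggest a single procedure, sketched below under stated assumptions: each predicate carries the noun phrases that depend on it (as in REL.NP#1 and VP#1.NP#2); the chain runs from REL back through the VPs; each link takes that predicate's object-marked phrase or ANONYMOUS; and the remaining phrases attach to their own predicate under their postposition. This unified reading, the `Pred` type, and the `"OBJ"`/`"JOSA"` labels are assumptions for illustration, not rules stated in the source; each lemma is also assumed to occur once, so `#1` is unique.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]
ANONYMOUS = "ANONYMOUS"


@dataclass
class Pred:
    lemma: str
    # Dependent noun phrases as (phrase, role); role "OBJ" marks the object,
    # None means the phrase has no postposition.
    args: List[Tuple[str, Optional[str]]] = field(default_factory=list)


def split_object(pred: Pred) -> Tuple[str, List[Tuple[str, Optional[str]]]]:
    """Return (object or ANONYMOUS, remaining dependents)."""
    obj_index = next((i for i, (_, role) in enumerate(pred.args) if role == "OBJ"), None)
    obj = pred.args[obj_index][0] if obj_index is not None else ANONYMOUS
    rest = [a for i, a in enumerate(pred.args) if i != obj_index]
    return obj, rest


def reified_triples(sbj: str, vps: List[Pred], rel: Pred) -> List[Triple]:
    """vps are in surface order; REL comes first in the chain, then the nearest VP."""
    chain = [rel] + list(reversed(vps))
    nodes = [f"{p.lemma}#1" for p in chain]
    split = [split_object(p) for p in chain]
    triples = [(sbj, nodes[0], split[0][0])]  # core: {SBJ-REL#1-REL.NP#1}
    for i in range(1, len(chain)):  # {REL#1-VP#1-VP#1.NP#2}, {VP#1-VP+-VP+.NP*}
        triples.append((nodes[i - 1], nodes[i], split[i][0]))
    for node, (_, rest) in zip(nodes, split):  # {VP+-NP*JOSA-VP+.NP*}
        for phrase, role in rest:
            triples.append((node, role or "JOSA", phrase))
    return triples


if __name__ == "__main__":
    fig12 = reified_triples(
        "Lavoisier",
        [Pred("insist", [("new combustion theory", "OBJ")]),
         Pred("discard", [("phlogiston theory", "OBJ")])],
        Pred("develop", [("chemistry", "OBJ")]),
    )
    for t in fig12:
        print(t)
```

Under these assumptions the same function reproduces the triples given for FIG. 11 and FIG. 12, and, with no VPs, the triples given for the CNN example of FIG. 7.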

The ternary relation conversion rules for each syntax structure are determined by linguistic structure, grammatical form, and the like, and are not limited to the examples above.

Therefore, the open information extraction system and method according to the present invention can express the relations between all predicates and arguments in a sentence as reified ternary relations according to the conversion rule that corresponds to the syntax structure of the sentence.

As described above, according to the embodiments of the present invention, information extraction on an open domain can be performed on any text using a fully open information extraction method that applies to all domains rather than to one specific domain. In particular, more information can be extracted accurately from text by converting the predicate-argument relations in a text into coherent ternary relations. In addition, because the information extracted from text keeps the form of ternary relations, it can easily be integrated with other knowledge bases and queried by conventional means. Furthermore, because the information extracted from text is represented as reified relations, confusion between individual pieces of knowledge is prevented and the accuracy of query results is further improved.

The apparatus described above may be implemented as a hardware component, a software component, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications running on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of the software. For convenience of explanation, a single processing device may be described as being used, but one of ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or instruct the processing device independently or collectively. Software and/or data may be permanently or temporarily embodied in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to the embodiment may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and constructed for the embodiments, or may be of the kind well known and available to those skilled in the computer software arts. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code generated by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to a limited number of embodiments and drawings, those skilled in the art can make various modifications and variations from the above description. For example, appropriate results can be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described systems, structures, devices, circuits, and the like are combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims are within the scope of the claims that follow.

Claims

1. A computer-implemented method, comprising:

receiving text as an information extraction target;

extracting arguments and predicates included in the text; and

expressing the arguments and the predicates as ternary relationships of a resource description framework (RDF).
2. The method of claim 1,

wherein the extracting step comprises

extracting all arguments and predicates contained in the text in phrase units.
3. The method of claim 1, further comprising

analyzing a syntax structure between the argument and the predicate,

wherein the expressing step comprises

expressing the relation between the argument and the predicate as the ternary relationship according to the syntax structure between the argument and the predicate.
4. The method of claim 1, further comprising

analyzing a syntax structure between the argument and the predicate,

wherein the expressing step comprises

expressing the relation between the argument and the predicate as the ternary relationship according to a ternary relation transformation rule corresponding to the syntax structure.
5. The method of claim 3 or 4,

wherein the analyzing step comprises

analyzing, for each of the arguments, the dependency structure of the argument with respect to the predicate.
6. The method of claim 1, further comprising

determining, among the arguments, an argument corresponding to a subject,

wherein the expressing step comprises

expressing a core ternary relationship including the subject for the relation between the argument and the predicate, and expressing ternary relationships that reify the relationships between the remaining arguments and predicates based on the core ternary relationship.
7. The method of claim 1,

wherein the expressing step comprises

expressing a core ternary relationship including a core subject, a core verb, and a core object for the relationship between the argument and the predicate, and expressing ternary relationships that reify the relationships between the remaining arguments and predicates based on the core ternary relationship.
8. A computer program stored on a recording medium and combined with a computer system to execute:

receiving text as an information extraction target;

extracting arguments and predicates included in the text; and

expressing the arguments and the predicates as ternary relationships of a resource description framework (RDF).
9. A computer-implemented system, comprising:

at least one processor configured to execute computer-readable instructions,

wherein the at least one processor

receives text as an information extraction target, extracts arguments and predicates included in the text, and

expresses the arguments and the predicates as ternary relationships of a resource description framework (RDF).
10. The system of claim 9,

wherein, to extract the arguments and predicates included in the text, the at least one processor

extracts all arguments and predicates contained in the text in phrase units.
11. The system of claim 9,

wherein the at least one processor

analyzes a syntax structure between the argument and the predicate, and,

to express the ternary relationship,

expresses the relation between the argument and the predicate as the ternary relationship according to the syntax structure between the argument and the predicate.
12. The system of claim 9,

wherein the at least one processor

analyzes a syntax structure between the argument and the predicate, and,

to express the ternary relationship,

expresses the relation between the argument and the predicate as the ternary relationship according to a ternary relation transformation rule corresponding to the syntax structure.
13. The system of claim 11 or 12,

wherein, to analyze the syntax structure, the at least one processor

analyzes, for each of the arguments, the dependency structure of the argument with respect to the predicate.
14. The system of claim 9,

wherein the at least one processor

determines, among the arguments, an argument corresponding to a subject, and,

to express the ternary relationship,

expresses a core ternary relationship including the subject for the relation between the argument and the predicate, and expresses ternary relationships that reify the relationships between the remaining arguments and predicates based on the core ternary relationship.
15. The system of claim 9,

wherein, to express the ternary relationship, the at least one processor

expresses a core ternary relationship including a core subject, a core verb, and a core object for the relationship between the argument and the predicate, and expresses ternary relationships that reify the relationships between the remaining arguments and predicates based on the core ternary relationship.