CN113822043A - Method and system for extracting cause and effect relationship of affairs - Google Patents

Info

Publication number
CN113822043A
Authority
CN
China
Prior art keywords
cause
effect
causal
relationship
segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111111833.1A
Other languages
Chinese (zh)
Inventor
唐广法
董世鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Minglue Artificial Intelligence Group Co Ltd
Original Assignee
Shanghai Minglue Artificial Intelligence Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to a method and a system for extracting causal relationships between events. The method comprises the following steps: splitting a text to be processed at punctuation marks to obtain at least one clause; judging, according to the clauses and predefined causal conjunctions, whether causal relationships exist among the clauses, generating a first causal set, and defining the corresponding clauses as a cause segment and an effect segment respectively; dividing the cause segment and the effect segment each into at least one candidate event by using a dependency grammar, judging the causal relationships between the candidate events, generating a second causal set, and obtaining the sub-cause segments and sub-effect segments corresponding to the cause segment and the effect segment; and extracting the sub-effect segment in the cause segment as the cause of the core relationship of the text, and extracting the sub-cause segment in the effect segment as the effect of the core relationship of the text. The method and the system can identify causal relationships in which the cause is stated after the effect and can extract the cause and the effect accurately, thereby improving the effectiveness of causal relationship identification.

Description

Method and system for extracting cause and effect relationship of affairs
Technical Field
The application relates to the technical field of natural language identification, in particular to a method and a system for extracting causal relationships.
Background
Building a domain knowledge graph for a particular industry often requires identifying events and the causal relationships between them, a task also known as causal relationship extraction. Take the sentence "During alignment with runway 24, the front wheel steering failure indicator sounded, and the captain indicated that he had forgotten to press the front wheel steering switch on the throttle lever." When cause and effect are extracted from this sentence, the cause (forgetting to press the front wheel steering switch on the throttle lever) is at the end of the sentence. Conventional methods usually extract causal relationships of the "cause first, effect after" form, so the cause cannot be extracted.
In the prior art, an extraction method based on a dependency parser, an extraction method based on statistical machine learning, or an integrated method that mixes the two is generally adopted. The method based on dependency parsing cannot determine which of two events is the cause and which is the effect, and the conventional statistical machine learning method and the integrated method require a large amount of labeled data to be prepared.
At present, no effective solution is provided for the problem that the specific reason and result cannot be accurately judged in the related technology.
Disclosure of Invention
The embodiment of the application provides a method and a system for extracting a causal relationship, so as to at least solve the problem that specific reasons and results cannot be accurately judged in the related technology.
In a first aspect, an embodiment of the present application provides a method for extracting cause and effect relationships, including the following steps:
a sentence segmentation step, namely segmenting the text to be processed at punctuation marks to obtain at least one clause;
a first cause and effect set acquisition step, namely judging whether cause and effect relationships exist between clauses according to the clauses and predefined causal conjunctions, generating a first cause and effect set, and respectively defining the corresponding clauses as a cause segment and an effect segment;
a second cause and effect set acquisition step, namely dividing the cause segment and the effect segment each into at least one candidate event by using a dependency grammar, judging the cause and effect relationships between the corresponding candidate events, and generating a second cause and effect set, so as to obtain the sub-cause segments and sub-effect segments corresponding to the cause segment and the effect segment;
and a cause and effect relation extraction step, namely extracting the sub-effect segment in the cause segment as the cause of the core relationship of the text, and extracting the sub-cause segment in the effect segment as the effect of the core relationship of the text.
In some of these embodiments, the second cause and effect set acquisition step further comprises:
judging whether a weak postposed-cause conjunction exists between the candidate events; if so, judging that the relationship is a postposed-cause relationship (the cause is stated after the effect) and directly generating the second cause and effect set; if not, using the dependency grammar to judge again whether a cause and effect relationship exists and, if so, generating the second cause and effect set.
The weak postposed-cause conjunctions include at least words meaning "point out", "presumably", "originate from", "possibly", "estimate", "depend on", "out of", and "stand on".
In some embodiments, the step of obtaining the candidate event specifically includes:
analyzing the clause by using the dependency grammar to obtain a dependency syntax tree, checking the nodes in the dependency syntax tree, and judging the part-of-speech tags of the words on adjacent nodes; if the tags are a noun and a verb, combining the two words into a candidate event.
In some embodiments, when no weak postposed-cause conjunction exists in the second cause and effect set acquisition step, the step of using the dependency grammar to judge again whether a cause and effect relationship exists specifically includes:
describing the dependency relationships among the nodes by querying the dependency paths in the dependency syntax tree, and judging whether a cause and effect relationship exists between the candidate events based on a domain-specific grammar library.
In some embodiments, the predefined causal conjunctions include high-priority causal conjunctions and low-priority causal conjunctions, and whether cause and effect relationships exist between the clauses is judged by applying the high-priority causal conjunctions first and the low-priority causal conjunctions second, wherein
the high-priority causal conjunctions include at least cause and cause, and the low-priority causal conjunctions include at least entrainment, triggering, relationship, infiltration, inducement, sweeping, and inducement.
In a second aspect, an embodiment of the present application provides an event cause and effect relationship extraction system, including:
the sentence segmentation module is used for segmenting a sentence of the text to be processed through punctuations to obtain at least one clause;
the first cause and effect set acquisition module is used for judging whether cause and effect relationships exist between clauses according to clauses and predefined cause and effect conjunctions, generating a first cause and effect set and respectively defining the corresponding clauses as a cause segment and an effect segment;
the second cause and effect set acquisition module is used for dividing the cause segment and the effect segment each into at least one candidate event by using the dependency grammar, judging the cause and effect relationships between the corresponding candidate events, and generating a second cause and effect set, so as to obtain the sub-cause segments and sub-effect segments corresponding to the cause segment and the effect segment;
and the cause and effect relation extraction module is used for extracting the sub-effect segment in the cause segment as the cause of the core relationship of the text and extracting the sub-cause segment in the effect segment as the effect of the core relationship of the text.
In some embodiments, the second cause and effect set acquisition module may further judge whether a weak postposed-cause conjunction exists between the candidate events; if so, it judges that the relationship is a postposed-cause relationship and directly generates the second cause and effect set; if not, it uses the dependency grammar to judge again whether a cause and effect relationship exists and, if so, generates the second cause and effect set.
In some embodiments, when no weak postposed-cause conjunction exists and the dependency grammar is used to judge again whether a cause and effect relationship exists, the second cause and effect set acquisition module describes the dependency relationships between the nodes by querying the dependency paths, and judges whether a cause and effect relationship exists between the candidate events based on a domain-specific grammar library.
In a third aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor executes the computer program to implement the cause and effect extraction method according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the cause and effect extraction method according to the first aspect.
Compared with the related art, the method and the system for extracting the causal relationship of affairs provided by the embodiments of the application can be applied to the technical field of knowledge graphs and to the technical field of knowledge reasoning. By analyzing the clauses and the candidate events, a first cause and effect set and a second cause and effect set are constructed to obtain the core relationship of the text, so that causal relationships in which the cause is stated after the effect can be identified and the cause and the effect can be extracted accurately, which improves the effectiveness of identifying causal relationships.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart of a cause and effect extraction method according to an embodiment of the present application;
FIG. 2 is a flow chart of another method of causal extraction according to an embodiment of the present application;
FIG. 3 is a flow chart of a cause and effect extraction method according to a preferred embodiment of the present application;
FIG. 4 is a block diagram of a cause and effect extraction system according to an embodiment of the present application;
FIG. 5 is a hardware structure diagram of a computer device according to an embodiment of the present application.
Wherein:
a sentence segmentation module 1; a first cause and effect set acquisition module 2;
a second cause and effect set acquisition module 3; a causal relationship extraction module 4; a processor 81; a memory 82;
a communication interface 83; a bus 80.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. Reference to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The present application is directed to the use of the terms "including," "comprising," "having," and any variations thereof, which are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. Reference to "connected," "coupled," and the like in this application is not intended to be limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as referred to herein means two or more. "and/or" describes an association relationship of associated objects, meaning that three relationships may exist, for example, "A and/or B" may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. Reference herein to the terms "first," "second," "third," and the like, are merely to distinguish similar objects and do not denote a particular ordering for the objects.
In the embodiment of the present application, causal relationship extraction refers to identifying events and obtaining a triple formed by an event A and an event B that have a causal relationship.
Cause postposition means that the sentence first describes the result of an event and then indicates the reason for that result. Cause-before-effect order means that the sentence describes the reason first and then describes the result.
The embodiment also provides a method for extracting the causal relationship of the affairs. Fig. 1 is a flowchart of a cause and effect extraction method according to an embodiment of the present application, and as shown in fig. 1, the flowchart includes the following steps:
a sentence segmentation step S1, namely segmenting a sentence of the text to be processed through punctuations to obtain at least one clause;
a first cause and effect set obtaining step S2 of determining whether a cause and effect relationship exists between clauses according to clauses and predefined cause and effect conjunctions, generating a first cause and effect set, and defining corresponding clauses as a cause segment and an effect segment, respectively;
a second cause and effect set obtaining step S3 of dividing the cause segment and the effect segment each into at least one candidate event by using the dependency grammar, judging the cause and effect relationships between the corresponding candidate events, and generating a second cause and effect set, so as to obtain the sub-cause segments and sub-effect segments corresponding to the cause segment and the effect segment;
and a cause and effect relationship extraction step S4 of extracting the sub-effect segment in the cause segment as the cause of the core relationship of the text, and extracting the sub-cause segment in the effect segment as the effect of the core relationship of the text.
Through the above steps, the clauses and the candidate events are analyzed, the first cause and effect set and the second cause and effect set are constructed, and the core relationship of the text is obtained. Causal relationships in which the cause is stated after the effect can thus be identified and the cause and the effect extracted accurately, which improves the effectiveness of cause and effect identification.
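Steps S1 and S2 can be illustrated with a minimal sketch. It is not the implementation of the application: the conjunction list uses hypothetical English stand-ins, and the rule that a clause beginning with a conjunction is the effect segment of the preceding clause is an assumption made only for this example.

```python
import re

# Hypothetical English stand-ins for the predefined causal conjunctions.
CAUSAL_CONJUNCTIONS = ("causing", "leading to")

def split_clauses(text):
    """S1: split the text at common Chinese and English punctuation marks."""
    parts = re.split(r"[,，;；.。!?！？]", text)
    return [p.strip() for p in parts if p.strip()]

def first_causal_set(clauses):
    """S2: return (cause_segment, conjunction, effect_segment) triples.

    Assumption: a clause that starts with a causal conjunction is the
    effect segment, and the clause before it is the cause segment.
    """
    result = []
    for prev, cur in zip(clauses, clauses[1:]):
        for conj in CAUSAL_CONJUNCTIONS:
            if cur.startswith(conj):
                effect = cur[len(conj):].strip()
                result.append((prev, conj, effect))
                break
    return result

clauses = split_clauses("The pump failed, causing a loss of pressure. The crew returned.")
relations = first_causal_set(clauses)
```

Here `relations` holds one triple linking the first clause (cause segment) to the second (effect segment); the third clause matches no conjunction and is left for the later steps.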
Dependency grammar was first proposed by the French linguist L. Tesnière; it parses a sentence into a dependency syntax tree that describes the dependency relationships between words.
In some embodiments, the predefined causal conjunctions include high-priority causal conjunctions and low-priority causal conjunctions, and whether causal relationships exist among the clauses is judged by applying the high-priority causal conjunctions first and the low-priority causal conjunctions second, wherein the high-priority causal conjunctions at least include cause and cause, and the low-priority causal conjunctions at least include entrainment, trigger, relationship, infiltration, temptation, entrainment and inducement.
In these steps, the high-priority conjunctions are used first to match strong causal relationships, and the low-priority causal conjunctions are then used to match the next-strongest causal relationships. The more important core relationship of the text is therefore matched first, the subsequent causal relationship extraction is performed around that core relationship, and extraction efficiency and accuracy can be effectively improved.
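The priority ordering can be sketched as a two-pass lookup. The word lists below are hypothetical English stand-ins chosen for the example, and substring matching is a simplification rather than the matching rule of the application.

```python
# Hypothetical stand-ins; the actual predefined lists are not reproduced here.
HIGH_PRIORITY = ("cause", "lead to")
LOW_PRIORITY = ("trigger", "induce")

def match_conjunction(clause):
    """Return (priority, conjunction) for the first match, or None.

    High-priority conjunctions are always tried before low-priority ones,
    so a clause containing both kinds is reported as high priority.
    """
    for priority, words in (("high", HIGH_PRIORITY), ("low", LOW_PRIORITY)):
        for word in words:
            if word in clause:
                return priority, word
    return None

both = match_conjunction("fatigue may trigger errors that cause accidents")
low_only = match_conjunction("vibration can induce cracks")
none_found = match_conjunction("the aircraft landed")
```

Trying the high-priority list first means the relationship most likely to be the core relationship is found before weaker markers are considered.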
In some of these embodiments, the second cause and effect set obtaining step S3 further includes:
judging whether a weak postposed-cause conjunction exists between the candidate events; if so, judging that the relationship is a postposed-cause relationship and directly generating the second cause and effect set; if not, using the dependency grammar to judge again whether a causal relationship exists and, if so, generating the second cause and effect set.
Through these steps, whether a causal relationship exists between the candidate events can be effectively judged.
The weak postposed-cause conjunctions include words meaning "point out", "presumably", "originate from", "possibly", "estimate", "depend on", "out of", "stand on", and so on. If some other causal conjunction is present instead, the relationship is in cause-before-effect order.
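A minimal sketch of this decision follows, assuming the connective between two candidate events has already been identified. The connective lists are hypothetical English stand-ins, and returning `None` is simply how this example marks the case that is handed to the dependency grammar check.

```python
# Hypothetical stand-ins for weak postposed-cause conjunctions.
WEAK_POSTPOSED = ("indicated that", "presumably", "originated from", "depends on")
# Hypothetical stand-ins for other (cause-before-effect) conjunctions.
OTHER_CAUSAL = ("causing", "triggering")

def orient(first_event, second_event, connective):
    """Return (cause, effect), or None if the connective decides nothing.

    A weak postposed-cause conjunction means the text states the effect
    first, so the two events are swapped.
    """
    if connective in WEAK_POSTPOSED:
        return second_event, first_event
    if connective in OTHER_CAUSAL:
        return first_event, second_event
    return None  # left to the dependency-grammar check

pair = orient(
    "the steering failure indicator sounded",
    "the captain forgot to press the steering switch",
    "indicated that",
)
```

In the runway example from the background section, the swap places the forgotten switch as the cause and the sounding indicator as the effect.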
In some embodiments, the step of obtaining the candidate event specifically includes:
analyzing the clause by using the dependency grammar to obtain a dependency syntax tree, checking the nodes in the dependency syntax tree, and judging the part-of-speech tags of the words on adjacent nodes; if the tags are a noun and a verb, combining the two words into a candidate event.
When a causal segment is extracted using the dependency grammar, the nodes in the dependency syntax tree are checked. If a clause contains a verb phrase and a noun phrase (VP, NP), they can be combined into one candidate event; if a clause contains several (VP, NP) combinations, it can be decomposed into several candidate events according to the patterns (VP, NP) and (VV, (NP, NP)).
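The noun and verb pairing can be sketched on a hand-annotated parse. The `Token` structure, the tag names, and the toy parse are assumptions for illustration; a real system would take them from a dependency parser.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    idx: int    # position in the clause
    word: str
    pos: str    # part-of-speech tag, e.g. "NOUN" or "VERB"
    head: int   # index of the head token, -1 for the root

def candidate_events(tokens):
    """Combine each head/dependent pair tagged noun + verb into one event."""
    by_idx = {t.idx: t for t in tokens}
    events = []
    for t in tokens:
        if t.head == -1:
            continue  # the root has no head to pair with
        h = by_idx[t.head]
        if {t.pos, h.pos} == {"NOUN", "VERB"}:
            first, second = sorted((t, h), key=lambda x: x.idx)
            events.append(f"{first.word} {second.word}")
    return events

# Hand-made toy parse (an assumption, not parser output).
toy = [
    Token(0, "indicator", "NOUN", 1),
    Token(1, "sounded", "VERB", -1),
    Token(2, "captain", "NOUN", 3),
    Token(3, "forgot", "VERB", 1),
    Token(4, "switch", "NOUN", 3),
]
events = candidate_events(toy)
```

The verb-to-verb edge between "forgot" and "sounded" is skipped, so each candidate event stays a noun and verb pair.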
In some embodiments, when no weak postposed-cause conjunction exists in the second cause and effect set obtaining step, the step of using the dependency grammar to judge again whether a causal relationship exists specifically includes:
describing the dependency relationships among the nodes by querying the dependency paths, and judging whether a causal relationship exists between the candidate events based on a domain-specific grammar library.
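One way to read this step is as a lookup of the label path between two nodes in a set of known causal patterns. The sketch below assumes that reading; the `DOMAIN_GRAMMAR` contents, the relation labels, and the toy parse are all hypothetical.

```python
# Hypothetical domain grammar library: label paths treated as causal.
DOMAIN_GRAMMAR = {("advcl",)}

def path_to_root(heads, node):
    """List the nodes from `node` up to the root (head index -1)."""
    path = [node]
    while heads[node] != -1:
        node = heads[node]
        path.append(node)
    return path

def dependency_path(heads, labels, a, b):
    """Relation labels from a up to the common ancestor, then down to b."""
    up = path_to_root(heads, a)
    down = path_to_root(heads, b)
    common = next(n for n in up if n in down)
    up_labels = [labels[n] for n in up[:up.index(common)]]
    down_labels = [labels[n] for n in down[:down.index(common)]]
    return tuple(up_labels + list(reversed(down_labels)))

def is_causal(heads, labels, a, b):
    """Judge causality by looking the path up in the grammar library."""
    return dependency_path(heads, labels, a, b) in DOMAIN_GRAMMAR

# Toy parse: indicator(0) sounded(1) captain(2) forgot(3) switch(4).
heads = [1, -1, 3, 1, 3]
labels = ["nsubj", "root", "nsubj", "advcl", "obj"]
verbs_causal = is_causal(heads, labels, 1, 3)
nouns_causal = is_causal(heads, labels, 0, 4)
```

Keeping the patterns in a separate set mirrors the idea of a grammar library that can be swapped per domain without changing the path query.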
When no high-priority causal conjunction exists between certain clauses or candidate events, both cause-before-effect and postposed-cause relationships can still be obtained through the above steps. Forcibly extracting a postposed-cause relationship does not affect the expression of the other cause-before-effect relationships.
It should be noted that, after a postposed-cause relationship is obtained, its cause and effect are merged into a single event, and the linear construction at this point follows the cause-before-effect order and cannot follow the order of occurrence of the events.
The embodiment also provides a method for extracting the causal relationship of the affairs. Fig. 2 is a flowchart of another method for extracting cause and effect relationship according to an embodiment of the present application, and as shown in fig. 2, the flowchart includes the following steps:
S201, inputting a sentence, segmenting it to obtain a plurality of clauses, extracting cause and effect relationships by using keywords, and generating a first cause and effect set;
S202, dividing the clauses into candidate events by using the dependency grammar;
S203, judging whether a weak postposed-cause conjunction exists between the candidate events; if so, judging that a postposed-cause relationship is formed and directly generating a second cause and effect set; if not, executing S204;
S204, judging whether causal relationships exist among the candidate events by using the dependency grammar, and generating a second cause and effect set according to the identified causal relationships;
S205, merging the second cause and effect set into the first cause and effect set, decomposing some cause and effect segments in the first cause and effect set into their constituent causal relationships, and outputting the final causal relationships and the specific causes and effects.
Through these steps, the second cause and effect set refines the causal relationships in the first cause and effect set, the final causal relationships and the specific causes and effects are determined, and the efficiency and effectiveness of cause and effect identification are improved.
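The merge in S205 can be sketched with plain data structures. The shapes chosen here (a list of segment pairs for the first set, a dictionary from segment to the pairs found inside it for the second set) are assumptions of this example, not structures given in the application.

```python
def merge_causal_sets(first_set, second_set):
    """Merge segment-level and event-level relations into one list.

    first_set:  list of (cause_segment, effect_segment) pairs.
    second_set: dict mapping a segment to the (sub_cause, sub_effect)
                pairs found inside that segment.
    The relations found inside a segment are placed next to the
    segment-level relation they refine.
    """
    merged = []
    for cause_seg, effect_seg in first_set:
        merged.extend(second_set.get(cause_seg, []))
        merged.append((cause_seg, effect_seg))
        merged.extend(second_set.get(effect_seg, []))
    return merged

first = [("valve stuck so pressure dropped", "engine stalled and aircraft returned")]
second = {"valve stuck so pressure dropped": [("valve stuck", "pressure dropped")]}
merged = merge_causal_sets(first, second)
```

Segments with no entry in the second set contribute only their segment-level relation.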
The embodiments of the present application are described and illustrated below by means of preferred embodiments.
FIG. 3 is a flow chart of a cause and effect extraction method according to a preferred embodiment of the present application.
S301, preliminarily judging whether causal relationships exist among the clauses.
For the text to be processed, after it is split into clauses at commas, predefined high-priority causal conjunctions are used to judge whether causal relationships exist among the clauses. For example, conjunctions such as 'cause, cause' indicate that the sentence states an explicit reason, while words such as 'carry, trigger, relationship, infiltration, temptation, wink and entice' are causal conjunctions with low priority.
The high-priority conjunctions are matched first, and the low-priority conjunctions are then used to match the remaining causal relationships, so that the more important core relationships are matched and the subsequent causal relationships are extracted around them.
S3021, when no effective keyword is available for extracting a causal relationship, judging that no core relationship exists. In that case, the sentence is segmented by using dependency syntax analysis and the causal relationships of the related candidate events are constructed in cause-before-effect order; if the sentence cannot be segmented by dependency syntax analysis, the text is judged to represent a single event.
S3022, after the core relationship is extracted, further obtaining the causal segments by using dependency syntax analysis, and constructing a relationship set.
The causal segments are constructed around the core relationship, which divides the sentence into two causal segments, a cause segment and an effect segment. Linear cause and effect chains are constructed within each of the two segments; the effect extracted from the cause segment is taken as the cause of the core relationship, and the cause extracted from the effect segment is taken as the effect of the core relationship.
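The refinement of the core relationship can be sketched as follows. Picking the last effect in the cause segment and the first cause in the effect segment is an assumption of this example, as is falling back to the whole segment when no inner relation was found.

```python
def refine_core(cause_pairs, effect_pairs, cause_segment, effect_segment):
    """Return the (cause, effect) pair of the core relationship.

    cause_pairs / effect_pairs: linear (cause, effect) pairs found inside
    the cause segment and the effect segment, in chain order.
    """
    # Effect extracted from the cause segment becomes the core cause.
    core_cause = cause_pairs[-1][1] if cause_pairs else cause_segment
    # Cause extracted from the effect segment becomes the core effect.
    core_effect = effect_pairs[0][0] if effect_pairs else effect_segment
    return core_cause, core_effect

core = refine_core(
    [("valve stuck", "pressure dropped")],
    [("engine stalled", "aircraft returned")],
    "valve stuck so pressure dropped",
    "engine stalled and aircraft returned",
)
```

In this toy case the core relationship links "pressure dropped" to "engine stalled", the two events that sit on either side of the segment boundary.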
In the above procedure, when a causal segment is extracted using the dependency grammar, the nodes on the dependency syntax tree are checked. If a clause contains a verb phrase and a noun phrase (VP, NP), they can be combined into one candidate event; if a clause contains several (VP, NP) combinations, it can be decomposed into several candidate events according to the patterns (VP, NP) and (VV, (NP, NP)).
Extracting the causal relationship (both cause-before-effect and postposed-cause): check whether a low-priority causal conjunction exists between two candidate events. If one exists, keywords meaning "point out", "presumably", "originate from", "possibly", "estimate", "depend on", "out of", "stand on", and the like indicate that the cause is postposed; if some other causal conjunction is present, the relationship is in cause-before-effect order.
When no high-priority causal conjunction exists between certain clauses or candidate events, both cause-before-effect and postposed-cause relationships can still be obtained through the above steps. Forcibly extracting a postposed-cause relationship does not affect the expression of the other cause-before-effect relationships.
However, after a postposed-cause relationship is obtained, its cause and effect are merged into a single event, and the linear construction at this point follows the cause-before-effect order rather than the order of occurrence of the events. The reason is that a postposed-cause relationship is itself stated with the cause after the effect. If C has one causal relationship with A and another with D, concluding directly that A causes D is wrong; the complete chain is: A leads to C, which in turn leads to D.
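The A, C, D example can be sketched as ordering a set of direct causal pairs into one linear chain. The sketch assumes each event has at most one direct effect and that exactly one event is caused by nothing else; these restrictions belong to the example, not to the application.

```python
def build_chain(pairs):
    """Order direct (cause, effect) pairs into one linear causal chain."""
    effect_of = dict(pairs)                 # cause -> its direct effect
    effects = {e for _, e in pairs}
    starts = [c for c, _ in pairs if c not in effects]
    if len(starts) != 1:
        raise ValueError("expected exactly one root cause")
    chain = [starts[0]]
    while chain[-1] in effect_of:
        nxt = effect_of[chain[-1]]
        if nxt in chain:
            raise ValueError("cycle in causal pairs")
        chain.append(nxt)
    return chain

# The pairs may arrive in text order, with the later link stated first.
chain = build_chain([("C", "D"), ("A", "C")])
```

Because only direct pairs are stored, A is linked to D through C and never by a direct A to D edge.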
The linear causal relationship modeling method is one of the commonly used analysis methods, including regression analysis, path analysis, and structural equation models.
It should be noted that the steps illustrated in the above-described flow diagrams or in the flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order different than here.
The embodiment also provides a system for extracting causal relationships between events. The system is used to implement the above embodiments and preferred embodiments, and what has already been described is not repeated. As used hereinafter, the terms "module," "unit," "subunit," and the like may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 4 is a block diagram of a cause and effect extraction system according to an embodiment of the present application, and as shown in fig. 4, the system includes:
the sentence segmentation module 1 is used for segmenting a sentence of a text to be processed through punctuations to obtain at least one clause;
the first cause and effect set acquisition module 2 is used for judging whether cause and effect relationships exist between clauses according to clauses and predefined cause and effect conjunctions, generating a first cause and effect set, and respectively defining the corresponding clauses as a cause segment and an effect segment;
the second cause and effect set acquisition module 3 is used for dividing the cause segment and the effect segment each into at least one candidate event by using the dependency syntax, judging the cause and effect relationships between the corresponding candidate events, and generating a second cause and effect set, so as to obtain the sub-cause segments and sub-effect segments corresponding to the cause segment and the effect segment;
and the cause and effect relation extraction module 4 is used for extracting the sub-effect segment in the cause segment as the cause of the text core relation and extracting the sub-cause segment in the effect segment as the effect of the text core relation.
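As a rough illustration of the first two modules, the sketch below splits a text on punctuation and pairs adjacent clauses when the later clause begins with a causal conjunction. The English conjunctions, the rule that the conjunction opens the effect clause, and the function names are all assumptions made for this example; the application does not specify them.

```python
import re

# Illustrative English stand-ins for the predefined causal conjunctions.
CAUSAL_CONJUNCTIONS = ("causing", "leading to")


def split_clauses(text):
    """Split a text into clauses on common English and Chinese punctuation."""
    parts = re.split(r"[,.;!?，。；！？]", text)
    return [p.strip() for p in parts if p.strip()]


def first_causal_set(clauses):
    """Pair each clause that starts with a conjunction with the clause before it.

    Returns a list of (cause_segment, effect_segment) tuples.
    """
    pairs = []
    for prev, clause in zip(clauses, clauses[1:]):
        for conj in CAUSAL_CONJUNCTIONS:
            if clause.startswith(conj):
                pairs.append((prev, clause[len(conj):].strip()))
                break
    return pairs


text = "Heavy rain fell for three days, causing the river to rise. The town stayed dry."
clauses = split_clauses(text)
print(first_causal_set(clauses))
# [('Heavy rain fell for three days', 'the river to rise')]
```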
In some embodiments, the second cause and effect set obtaining module 3 may further determine whether a weak postcausal conjunction exists between the candidate events. If so, the relationship is determined to be a postcausal relationship and the second cause and effect set is generated directly. If not, the dependency syntax is used to determine again whether a cause and effect relationship exists, and if so, the second cause and effect set is generated.
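The two-stage decision described here can be sketched as follows. The conjunction list uses English stand-ins, `dependency_check` is a hypothetical callable standing in for the dependency-syntax test, and treating the first event as the cause in the fallback branch is an assumption of this sketch rather than something the text states.

```python
# Hypothetical English stand-ins for weak postcausal conjunctions.
WEAK_POSTCAUSAL_CONJUNCTIONS = ("because", "due to")


def second_causal_pair(event_a, conjunction, event_b, dependency_check):
    """Return a (cause, effect) pair for two candidate events, or None.

    event_a precedes event_b in the text; conjunction is the word between
    them, or None when there is none.
    """
    if conjunction in WEAK_POSTCAUSAL_CONJUNCTIONS:
        # Postcausal relationship: the effect is stated first, the cause after.
        return (event_b, event_a)
    if dependency_check(event_a, event_b):
        # Fallback: assume the first event is the cause (sketch assumption).
        return (event_a, event_b)
    return None


print(second_causal_pair("road flooded", "because", "river rose", lambda a, b: False))
# ('river rose', 'road flooded')
```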
In some embodiments, the second cause and effect set obtaining module 3 may further obtain the candidate events. Specifically, it analyzes a clause using the dependency syntax to obtain a dependency syntax tree, checks the nodes in the tree, and determines the part-of-speech tags of the words on adjacent nodes; if the words are a noun and a verb, it combines them into a candidate event.
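A minimal sketch of this candidate-event step is shown below. It uses a hand-built dependency tree instead of a real parser, reads "adjacent nodes" as a node and its head in the tree, and uses the hypothetical names `Token` and `candidate_events`; all of these are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    pos: str   # part-of-speech tag, e.g. "NOUN" or "VERB"
    head: int  # index of the head token, or -1 for the root


def candidate_events(tokens):
    """Combine every noun/verb pair joined by a dependency edge into an event."""
    events = []
    for i, tok in enumerate(tokens):
        if tok.head < 0:
            continue  # the root has no head to pair with
        head = tokens[tok.head]
        if {tok.pos, head.pos} == {"NOUN", "VERB"}:
            # Keep the words in their surface order.
            first, second = (tok, head) if i < tok.head else (head, tok)
            events.append(f"{first.text} {second.text}")
    return events


# "river rose quickly": "rose" is the root; "river" and "quickly" depend on it.
tokens = [Token("river", "NOUN", 1), Token("rose", "VERB", -1), Token("quickly", "ADV", 1)]
print(candidate_events(tokens))  # ['river rose']
```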
In some embodiments, when no weak postcausal conjunction exists and the dependency syntax is used to determine again whether a cause and effect relationship exists, the second cause and effect set obtaining module 3 describes the dependency relationship between nodes by querying the dependency path in the dependency syntax tree, and determines whether a cause and effect relationship exists between the candidate events based on a domain-specific grammar library.
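One way to picture the dependency-path query is the sketch below, which walks from one node up to the common ancestor and down to the other node, then looks the resulting label path up in a pattern set. The tree encoding, the relation labels, and the single entry in `CAUSAL_PATHS` are hypothetical placeholders for a domain-specific grammar library, whose real contents the application does not list.

```python
def path_to_root(heads, i):
    """Return the node indices from node i up to the root, inclusive."""
    path = [i]
    while heads[path[-1]] != -1:
        path.append(heads[path[-1]])
    return path


def dependency_path(heads, labels, a, b):
    """Describe the tree path from node a to node b as a tuple of labels.

    labels[n] is the relation between node n and its head. Labels on the way
    up are prefixed with "<", labels on the way down with ">".
    """
    up, down = path_to_root(heads, a), path_to_root(heads, b)
    common = next(n for n in up if n in down)  # lowest common ancestor
    rising = ["<" + labels[n] for n in up[:up.index(common)]]
    falling = [">" + labels[n] for n in reversed(down[:down.index(common)])]
    return tuple(rising + falling)


# Hypothetical pattern library: subject and object of the same predicate.
CAUSAL_PATHS = {("<nsubj", ">obj")}

# "rain caused flooding": "caused" is the root.
heads = [1, -1, 1]
labels = ["nsubj", "root", "obj"]
path = dependency_path(heads, labels, 0, 2)
print(path, path in CAUSAL_PATHS)  # ('<nsubj', '>obj') True
```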
The predefined causal conjunctions comprise high-priority causal conjunctions and low-priority causal conjunctions, which are applied in turn to judge whether a cause and effect relationship exists between the clauses. The high-priority causal conjunctions at least comprise causing and causing, and the low-priority causal conjunctions at least comprise trapping, triggering, relationship, infiltration, temptation, harmony and induction.
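The priority ordering can be sketched as a two-pass lookup in which the low-priority list is consulted only when no high-priority conjunction is found. The English conjunctions below are illustrative stand-ins chosen for this example, and `match_conjunction` is a hypothetical name.

```python
# Illustrative English stand-ins; they are not the conjunctions of the application.
HIGH_PRIORITY = ("leads to", "results in")
LOW_PRIORITY = ("triggers", "induces")


def match_conjunction(clause):
    """Return (priority, conjunction) for the first match, or None.

    Every high-priority conjunction is tried before any low-priority one.
    """
    for priority, group in (("high", HIGH_PRIORITY), ("low", LOW_PRIORITY)):
        for conj in group:
            if conj in clause:
                return priority, conj
    return None


print(match_conjunction("smoking triggers coughing and leads to illness"))
# ('high', 'leads to')
```

Because the high-priority group is exhausted first, a clause containing both kinds of conjunction is always split on the high-priority one, regardless of word order.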
The above modules may be functional modules or program modules, and may be implemented by software or hardware. For a module implemented by hardware, the modules may be located in the same processor; or the modules can be respectively positioned in different processors in any combination.
In addition, the method for extracting the cause and effect relationship of the embodiment of the application described in conjunction with fig. 1 can be implemented by computer equipment. Fig. 5 is a hardware structure diagram of a computer device according to an embodiment of the present application.
The computer device may comprise a processor 81 and a memory 82 in which computer program instructions are stored.
Specifically, the processor 81 may include a Central Processing Unit (CPU) or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.
Memory 82 may include, among other things, mass storage for data or instructions. By way of example, and not limitation, memory 82 may include a Hard Disk Drive (HDD), a floppy disk drive, a Solid State Drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 82 may include removable or non-removable (or fixed) media, where appropriate. The memory 82 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 82 is a non-volatile memory. In particular embodiments, memory 82 includes Read-Only Memory (ROM) and Random Access Memory (RAM). The ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or flash memory (FLASH), or a combination of two or more of these, where appropriate. The RAM may be a Static Random-Access Memory (SRAM) or a Dynamic Random-Access Memory (DRAM), where the DRAM may be a Fast Page Mode Dynamic Random-Access Memory (FPM DRAM), an Extended Data Out Dynamic Random-Access Memory (EDO DRAM), a Synchronous Dynamic Random-Access Memory (SDRAM), and the like.
The memory 82 may be used to store or cache various data files for processing and/or communication use, as well as possible computer program instructions executed by the processor 81.
The processor 81 implements any of the above described methods of causal extraction by reading and executing computer program instructions stored in the memory 82.
In some of these embodiments, the computer device may also include a communication interface 83 and a bus 80. As shown in fig. 5, the processor 81, the memory 82, and the communication interface 83 are connected via the bus 80 to complete communication therebetween.
The communication interface 83 is used for implementing communication between modules, devices, units and/or equipment in the embodiments of the present application. The communication interface 83 may also carry out data communication with other components, such as external devices, image/data acquisition devices, databases, external storage, and image/data processing workstations.
Bus 80 includes hardware, software, or both to couple the components of the computer device to each other. Bus 80 includes, but is not limited to, at least one of the following: a data bus, an address bus, a control bus, an expansion bus, and a local bus. By way of example, and not limitation, bus 80 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front-Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or other suitable bus, or a combination of two or more of these. Bus 80 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.
The computer device may execute the cause and effect relationship extraction method of the embodiments of the present application on the clauses obtained by segmenting the text to be processed, thereby implementing the cause and effect relationship extraction method described with reference to fig. 1.
In addition, in combination with the cause and effect relationship extraction method in the foregoing embodiments, the embodiments of the present application may provide a computer-readable storage medium for its implementation. The computer-readable storage medium has computer program instructions stored thereon; when executed by a processor, the computer program instructions implement any of the above-described cause and effect relationship extraction methods.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments express only several embodiments of the present application, and although their description is relatively specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method for extracting cause and effect relationship of affairs, characterized by comprising the following steps:
a sentence segmentation step, namely segmenting a sentence of the text to be processed through punctuations to obtain at least one clause;
a first cause and effect set acquisition step, namely judging whether cause and effect relationships exist between the clauses according to the clauses and predefined cause and effect conjunctions to generate a first cause and effect set, and respectively defining the corresponding clauses as a cause segment and an effect segment;
a second cause and effect set obtaining step, wherein the cause segment and the effect segment are each divided into at least one candidate event by using the dependency syntax, and the cause and effect relationships between the corresponding candidate events are judged to generate a second cause and effect set, so that the sub-cause segments and sub-effect segments corresponding to the cause segment and the effect segment are obtained;
and a cause and effect relation extraction step, wherein the sub-effect segment in the cause segment is extracted as the cause of the text core relation, and the sub-cause segment in the effect segment is extracted as the effect of the text core relation.
2. A causal relationship extraction method according to claim 1, wherein said second causal set acquisition step further comprises,
judging whether a weak postcausal conjunction exists between the candidate events; if so, judging that the relationship is a postcausal relationship and directly generating the second causal set; if not, judging again whether a causal relationship exists by using the dependency syntax, and if so, generating the second causal set.
3. The method for extracting cause-and-effect relationship of events according to claim 2, wherein the step of obtaining the candidate events comprises:
analyzing the clause by using the dependency syntax to obtain a dependency syntax tree, checking the nodes in the dependency syntax tree, judging the part-of-speech tags of the words on adjacent nodes, and, if the words are a noun and a verb, combining them into the candidate event.
4. The cause-and-effect relationship extraction method according to claim 3, wherein, in the second cause-and-effect set acquisition step, judging again whether a cause-and-effect relationship exists by using the dependency syntax when no weak postcausal conjunction exists specifically includes:
describing the dependency relationship between the nodes by querying the dependency path in the dependency syntax tree, and judging whether a causal relationship exists between the candidate events based on a domain-specific grammar library.
5. The causal relationship extraction method of claim 1, wherein the predefined causal conjunctions include high-priority causal conjunctions and low-priority causal conjunctions, and whether causal relationships exist between the clauses is determined by applying the high-priority causal conjunctions and the low-priority causal conjunctions in turn, wherein the high-priority causal conjunctions at least include cause and cause, and the low-priority causal conjunctions at least include entrainment, trigger, relationship, infiltration, enticement, ripple, and enticement.
6. A cause and effect relationship extraction system, comprising:
the sentence segmentation module is used for segmenting a sentence of the text to be processed through punctuations to obtain at least one clause;
the first cause and effect set acquisition module is used for judging whether cause and effect relationships exist between the clauses according to the clauses and predefined cause and effect conjunctions to generate a first cause and effect set, and corresponding clauses are respectively defined as a cause segment and an effect segment;
a second cause and effect set obtaining module, configured to divide the cause segment and the effect segment each into at least one candidate event by using the dependency syntax, and determine the cause and effect relationships between the corresponding candidate events, so as to generate a second cause and effect set and obtain the sub-cause segments and sub-effect segments corresponding to the cause segment and the effect segment;
and the cause and effect relation extraction module is used for extracting the sub-effect segment in the cause segment as the cause of the text core relation and extracting the sub-cause segment in the effect segment as the effect of the text core relation.
7. The causal relationship extraction system of claim 6, wherein the second causal set obtaining module is further configured to determine whether a weak postcausal conjunction exists between the candidate events; if so, determine that the relationship is a postcausal relationship and directly generate the second causal set; if not, determine again whether a causal relationship exists by using the dependency syntax, and if so, generate the second causal set.
8. The causal relationship extraction system of claim 6, wherein, when no weak postcausal conjunction exists and whether a causal relationship exists is determined again by using the dependency syntax, the second causal set obtaining module describes the dependency relationship between nodes by querying a dependency path in the dependency syntax tree, and determines whether a causal relationship exists between the candidate events based on a domain-specific grammar library.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the cause and effect extraction method of any one of claims 1 to 5 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the cause and effect extraction method according to any one of claims 1 to 5.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111111833.1A CN113822043A (en) 2021-09-23 2021-09-23 Method and system for extracting cause and effect relationship of affairs

Publications (1)

Publication Number Publication Date
CN113822043A true CN113822043A (en) 2021-12-21

Family

ID=78921052

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111111833.1A Pending CN113822043A (en) 2021-09-23 2021-09-23 Method and system for extracting cause and effect relationship of affairs

Country Status (1)

Country Link
CN (1) CN113822043A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08221415A (en) * 1995-02-09 1996-08-30 Canon Inc Japanese sentence analyzing device
US20160321244A1 (en) * 2013-12-20 2016-11-03 National Institute Of Information And Communications Technology Phrase pair collecting apparatus and computer program therefor
CN109284518A (en) * 2018-04-24 2019-01-29 西北工业大学 A kind of optimistic time management method and device
CN110781369A (en) * 2018-07-11 2020-02-11 天津大学 Emotional cause mining method based on dependency syntax and generalized causal network
CN113392542A (en) * 2021-08-16 2021-09-14 傲林科技有限公司 Root cause tracing method and device based on event network and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ANTONIO SORGENTE ET AL: "Automatic extraction of cause-effect relations in Natural Language Text", 《RESEARCHGATE》, pages 1 - 12 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211221