CN112559656A

CN112559656A - Method for constructing an event logic graph based on hydrological events

Info

Publication number: CN112559656A
Application number: CN202011426608.2A
Authority: CN
Inventors: 冯钧; 邬炜; 陆佳民
Original assignee: Hohai University HHU
Current assignee: Hohai University HHU
Priority date: 2020-12-09
Filing date: 2020-12-09
Publication date: 2021-03-26

Abstract

The invention discloses a method for constructing an event logic graph based on hydrological events, which belongs to the technical field of event logic graph construction. The method comprises three parts: causal relationship extraction based on a template library, event extraction based on pattern matching and a neural network, and construction of a domain event logic graph fused with water conservancy models. The causal relationship extraction part extracts cause sentences and effect sentences from sentences. The event extraction part obtains formalized causal event pairs from the cause and effect sentences obtained in the previous step. When event trigger words are defined, the trigger words are clustered and manually adjusted, which improves the recall of the event extraction task, and a voting mechanism integrates three event element extraction methods, which improves event extraction performance. The graph construction part fuses domain-specific water conservancy models into the event logic graph, so that quantitative analysis is carried out on the basis of qualitative analysis and reasoning over the water conservancy event logic graph is supported.

Description

Method for constructing an event logic graph based on hydrological events

Technical Field

The invention belongs to the technical field of event logic graph construction, and particularly relates to a method for constructing an event logic graph based on hydrological events.

Background

Events are the basic units of knowledge by which humans remember and understand the real world. A knowledge graph mainly describes the relationships among entities in the real world, whereas an event logic graph builds on the knowledge graph: it is a logical network that takes events as its core and tightly connects static knowledge with dynamic logical rules to describe the real world. With the development of water conservancy projects in China, a large body of literature has accumulated in the water conservancy field, which makes it possible to construct an event logic graph for this field.

By constructing a water conservancy event logic graph, researchers describe the rules and patterns of logical evolution among water conservancy events and provide support for water resource management decisions.

Existing research on constructing water conservancy event logic graphs has the following defects: the existing Chinese causal connective lexicon is incomplete and applies poorly to the domain; during event extraction, one trigger word may correspond to several event types, and the accuracy of event element extraction is low; and the transition probabilities in the graph are fixed and poorly interpretable, so the graph adapts poorly to the abrupt changes and diversity of water conservancy events.

Therefore, a new method for constructing an event logic graph based on hydrological events is needed.

Disclosure of Invention

Purpose of the invention: to overcome the defects of the prior art, the invention provides a method for constructing an event logic graph based on hydrological events. The part that constructs the domain event logic graph fuses domain-specific water conservancy models into the graph, carries out quantitative analysis on the basis of qualitative analysis, and supports reasoning over the water conservancy event logic graph.

The technical scheme is as follows: in order to achieve the purpose, the invention adopts the following technical scheme:

The method for constructing an event logic graph based on hydrological events comprises the following steps:

(1) collecting text corpora in the water conservancy field and preprocessing them for use in subsequent graph construction;

(2) using existing general causal connectives and the text corpora obtained in step (1), extracting causal example sentences and sentence features to construct a causal relationship corpus;

(3) extracting cause sentences and effect sentences from sentences by causal relationship extraction based on a template library;

(4) obtaining formalized causal event pairs from the cause sentences and effect sentences obtained in the previous step by event extraction based on pattern matching and a neural network;

(5) constructing a domain event logic graph fused with water conservancy models: domain-specific water conservancy models are fused into the event logic graph, and quantitative analysis is carried out on the basis of qualitative analysis to support reasoning over the water conservancy event logic graph.

Further, step (3) is specifically as follows: following the idea of the Bootstrapping algorithm, existing extraction results are used to iteratively search for more effective causal relationship extraction patterns; similarity is calculated with a convolution tree kernel based on syntactic structure and a BERT model based on semantic features, and new causal connectives are extracted.

Further, the step (4) is specifically as follows:

4.1) defining an event framework, and clustering and manually adjusting the trigger words when the event trigger words are defined;

4.2) for the predefined event framework, constructing rules that combine event trigger words with domain feature words, locating the event trigger words and identifying the event types;

4.3) extracting event elements with a Bi-LSTM + Attention + CRF neural network model, and integrating three event element extraction methods with a voting mechanism;

4.4) finally, fusing the nouns in the event instances by means of a synonym dictionary.

Further, step (5) is specifically as follows: first, a method for encapsulating water conservancy models based on XML Schema and the OWL language is designed, and a water conservancy model library at event granularity is constructed; then a method for fusing the event logic graph with the water conservancy models is provided, the water conservancy models are fused into the event logic graph in a computable manner, and the domain event logic graph is constructed and applied.

Further, in the step (3), the method for extracting causal relationship based on the template library includes the following steps:

3.1) mining new templates based on syntactic structure and semantics: the syntactic structure similarity between sentences is calculated with the convolution tree kernel method, sentences whose syntactic structure is similar to that of the causal relationship corpus are extracted from the training corpus, and the most similar example sentence group is taken as the candidate set; the sentences of the causal relationship corpus are labeled by their connectives, and sentences with the same connective form a group;

3.2) meanwhile, to handle candidate sentences that are similar in structure but different in meaning, the BERT model is used to represent the semantic features of the sentences, yielding a batch of causal example sentences that are similar in both syntactic structure and semantic features; the causal connective templates in these sentences are extracted, and example sentences with the same connective are added to the causal relationship corpus;

3.3) template generalization: the extracted templates are generalized in order to make full use of the collaborative filtering capability of the causal relationship examples; following the general classification of Chinese causal relationships, which focuses on the position of the causal connective (for example centered or cause-to-effect), the similarity is calculated with the convolution tree kernel based on syntactic structure and the K-means algorithm is used for clustering.

Further, in the step 3.2), the method specifically comprises the following steps:

first, two sentences with similar syntactic structures are input into the model; [CLS] is added at the head and [SEP] is added between the two sentences as a separator, which gives the model input sequence $\{[CLS], w_1, w_2, \ldots, w_n, [SEP], w'_1, w'_2, \ldots, w'_m\}$, where $w$ and $w'$ are the two sentence sequences;

word segmentation is performed and the tokens are mapped to the word embedding vectors $E = \{E_{cls}, E_1, \ldots, E_n, E_{sep}, E'_1, \ldots, E'_m\}$, which are encoded by a multi-layer Transformer encoder;

finally, the feature vector of the sentence pair is obtained, and the semantic similarity of the two sentences is calculated through a sigmoid function as $Sim_{sem} = \mathrm{sigmoid}(CW^T)$.

Further, step 3.3) is specifically as follows: the K-means algorithm divides the input data set into K clusters so that the data within each cluster have the greatest similarity;

first, K objects are randomly selected as centroids;

then the distance from every other data point to each centroid is calculated, and each point is assigned to the cluster whose centroid is nearest; after each round of assignment, the center of each cluster is recalculated;

the above process is iterated until the cluster centers no longer change significantly.

Further, in the step (4), the method for extracting events based on pattern matching and neural network includes the following steps:

4.11) constructing the event framework by trigger word clustering;

first, a trigger word extraction algorithm analyzes the dependency syntax structure to identify the verbs in a sentence together with the corresponding verb-object and subject-predicate structures, and extracts the core predicates of the sentence as the candidate trigger word set;

the core of the algorithm is to judge whether $V_{SBV} = V_{VOB} = V_t$ holds, where the SBV relation represents the subject-predicate structure, in which the head is the predicate verb and the dependent is the subject of the verb, and the VOB relation represents the verb-object structure, in which the head is the predicate verb and the dependent is the object of the verb;

the formula judges whether the verb $V_{SBV}$ in a subject-predicate relation, the verb $V_{VOB}$ in a verb-object relation and the verb currently under consideration $V_t$ are the same; if they are, $V_t$ is extracted as a candidate trigger word; the candidate set is then refined by filtering out words such as linking verbs and auxiliary verbs;

the word sense similarity of the candidate trigger words is calculated based on HowNet, and the words are clustered by this similarity; based on the clustering result and manual adjustment, the trigger words are expanded with synonyms so that events can subsequently be identified over a wider range, and a trigger word to event type lookup table is formed;

4.12) extracting event trigger words and identifying event types by pattern matching: for the event framework, matching rules that combine event trigger words with domain feature words are constructed to identify event types in the water conservancy field;

event elements are extracted with a neural network: within an Encoder-Decoder framework, an event element extraction model based on Bi-LSTM + Attention + CRF takes word vectors, part-of-speech (POS) features and trigger word features as input; Bi-LSTM extracts features, the Attention mechanism assigns weight coefficients to phrases and selects the more important information, CRF decoding improves the completeness of the extracted words, and the subject-predicate-object triple of the event is extracted;

finally, a voting mechanism integrates three event element extraction methods to improve event extraction performance;

4.13) event fusion: event instances under the same event type are merged using a synonym dictionary, so that multiple event records expressing the same real-world event are merged into one record.

Further, in the step (5), the constructing of the domain event map fused with the water conservancy model comprises the following steps:

5.1) constructing a water conservancy model ontology based on XML Schema and OWL;

designing the water conservancy model metadata based on XML Schema: the metadata model consists of three parts, namely base_info, parameters and function;

constructing the water conservancy model ontology: first, the collected water conservancy models are split at event granularity, that is, split by sub-process, exploiting the high cohesion and low coupling of water conservancy models; second, the models are classified by function and a model pool is built; then, the water conservancy model metadata structure is defined based on XML Schema; finally, the Schema is converted into an ontology document in OWL syntax using the metadata structure, and the water conservancy model ontology is established;

5.2) fusing the water conservancy models with the causal events, and replacing the transition probabilities of a general event logic graph with domain-specific water conservancy models.

Beneficial effects: compared with the prior art, the method for constructing an event logic graph based on hydrological events comprises causal relationship extraction based on a template library, event extraction based on pattern matching and a neural network, and construction of a domain event logic graph fused with water conservancy models. The causal relationship extraction part extracts cause sentences and effect sentences from sentences. The event extraction part obtains formalized causal event pairs from the cause and effect sentences obtained in the previous step. Clustering and manually adjusting the trigger words when the event trigger words are defined improves the recall of the event extraction task, and integrating three event element extraction methods with a voting mechanism improves event extraction performance. The graph construction part fuses domain-specific water conservancy models into the event logic graph, carries out quantitative analysis on the basis of qualitative analysis, and supports reasoning over the water conservancy event logic graph.

Drawings

FIG. 1 is a flow chart of event logic graph construction;

FIG. 2 is a flow diagram of causal extraction;

FIG. 3 is a flow diagram of event extraction;

FIG. 4 is a diagram of the overall architecture of an event element extraction model;

FIG. 5 is a flow chart of hydraulic model library construction.

Detailed Description

In order to make the technical means, the creation characteristics, the achievement purposes and the effects of the invention easy to understand, the invention is further described with the specific embodiments.

(1) Text corpora such as papers and news in the water conservancy field are collected and preprocessed for use in subsequent graph construction.

(2) Using existing general causal connectives and the corpora obtained in step (1), causal example sentences and sentence features are extracted to construct a causal relationship corpus.

(3) Cause sentences and effect sentences are extracted from sentences by causal relationship extraction based on a template library. Because the existing Chinese causal connective lexicon is incomplete and does not apply well to the domain, a method for constructing a domain causal relationship template library based on syntactic structure and semantic features is provided. The method takes the characteristics of the water conservancy field into account and follows the idea of the Bootstrapping algorithm: existing extraction results are used to iteratively search for more effective causal relationship extraction patterns, which overcomes the drawback that rule-template methods depend entirely on manually written rules. Similarity is calculated with a convolution tree kernel based on syntactic structure and a BERT model based on semantic features, and new causal connectives are extracted, which handles candidate sentences that are similar in structure but different in meaning.

(4) Formalized causal event pairs are obtained from the cause and effect sentences of the previous step by event extraction based on pattern matching and a neural network. Because the water conservancy field lacks an event framework, one is defined here. The trigger words are clustered and manually adjusted when the event trigger words are defined, which improves the recall of the event extraction task. For the predefined event framework, rules combining event trigger words with domain feature words are constructed to locate the event trigger words and identify the event types. A Bi-LSTM + Attention + CRF neural network model extracts the event elements, and a voting mechanism integrates three event element extraction methods, which improves event extraction performance. Finally, a synonym dictionary is used to fuse the nouns in the event instances, which avoids redundant information in the graph.

(5) A domain event logic graph fused with water conservancy models is constructed: domain-specific water conservancy models are fused into the event logic graph, and quantitative analysis is carried out on the basis of qualitative analysis to support reasoning over the water conservancy event logic graph. First, a method for encapsulating water conservancy models based on XML Schema and the OWL language is designed, and a water conservancy model library at event granularity is constructed. Then a method for fusing the event logic graph with the water conservancy models is provided, the water conservancy models are fused into the event logic graph in a computable manner, and the domain event logic graph is constructed and put to simple use.

The causal relationship extraction method based on the template library in the step (3) comprises the following steps:

(31) New templates are mined based on syntactic structure and semantics. The syntactic structure similarity between sentences is calculated with the convolution tree kernel method, sentences whose syntactic structure is similar to that of the causal relationship corpus are extracted from the training corpus, and the most similar example sentence group is taken as the candidate set. The sentences of the causal relationship corpus are labeled by their connectives, and sentences with the same connective form a group.

The similarity between two groups of sentence syntax trees is formally defined over the following quantities: $c^i$ and $c^j$ denote the i-th and j-th groups of sentences, containing M and N sentences respectively, and $K(T^m, T^n)$ is a kernel function that measures how similar two sentence syntactic parse trees are in terms of their common subtrees. The kernel is computed over the set of subtrees of the syntactic parse tree $T^m$ and the set of subtrees of the syntactic parse tree $T^n$.

To calculate the similarity between a sentence whose causal relationship is unknown and the sentences whose causal relationship is known (that is, the sentences of the corpus, taken group by group), the above convolution tree kernel method is modified. In the modified definition, $c^i$ is the group of example sentences corresponding to the i-th group of connectives in the causal relationship corpus and contains M sentences, and $s$ is any sentence in the training data whose causal relationship is unknown.
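The group-level formulas are not reproduced in the text, so the following is only a sketch under stated assumptions. It uses the well-known Collins-Duffy subtree-counting recursion as the kernel $K$, and a plain average over sentence pairs as the group similarity. Both choices, the nested-tuple tree encoding `(label, child, ...)`, and every function name are assumptions made for illustration, not the patent's definitions.

```python
from itertools import product

def nodes(tree):
    """Yield every internal node (a tuple) of a nested-tuple parse tree."""
    if isinstance(tree, tuple):
        yield tree
        for child in tree[1:]:
            yield from nodes(child)

def production(node):
    """The node label followed by the labels of its direct children."""
    return (node[0],) + tuple(c[0] if isinstance(c, tuple) else c for c in node[1:])

def delta(n1, n2, lam=1.0):
    """Weighted count of common subtrees rooted at n1 and n2 (Collins-Duffy)."""
    if production(n1) != production(n2):
        return 0.0
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if isinstance(c1, tuple) and isinstance(c2, tuple):
            score *= 1.0 + delta(c1, c2, lam)
    return score

def tree_kernel(t1, t2, lam=1.0):
    """K(T1, T2): sum of delta over all pairs of nodes."""
    return sum(delta(a, b, lam) for a, b in product(list(nodes(t1)), list(nodes(t2))))

def group_similarity(group_i, group_j):
    """Assumed form: average kernel value over all M x N sentence pairs."""
    total = sum(tree_kernel(a, b) for a, b in product(group_i, group_j))
    return total / (len(group_i) * len(group_j))

def sentence_group_similarity(sentence, group):
    """Assumed modified form: one unlabeled sentence against a labeled group."""
    return group_similarity([sentence], group)

t1 = ("S", ("NP", ("N", "rain")), ("VP", ("V", "causes"), ("NP", ("N", "flood"))))
t2 = ("S", ("NP", ("N", "drought")), ("VP", ("V", "causes"), ("NP", ("N", "loss"))))
print(tree_kernel(t1, t1), tree_kernel(t1, t2))
```

In practice the raw kernel is often normalized by $\sqrt{K(T_1,T_1)K(T_2,T_2)}$ so that long sentences do not dominate; whether the patent does so cannot be determined from the text.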

(32) Meanwhile, to handle candidate sentences that are similar in structure but different in meaning, the BERT model is used to represent the semantic features of the sentences, yielding a batch of causal example sentences that are similar in both syntactic structure and semantic features. The causal connective templates in these sentences are extracted, and example sentences with the same connective are added to the causal relationship corpus. First, two sentences with similar syntactic structures are input into the model; [CLS] is added at the head and [SEP] is added between the two sentences as a separator, which gives the model input sequence $\{[CLS], w_1, w_2, \ldots, w_n, [SEP], w'_1, w'_2, \ldots, w'_m\}$, where $w$ and $w'$ are the two sentence sequences. Word segmentation is performed and the tokens are mapped to the word embedding vectors $E = \{E_{cls}, E_1, \ldots, E_n, E_{sep}, E'_1, \ldots, E'_m\}$. The sequence is encoded by a multi-layer Transformer encoder to obtain the feature vector of the sentence pair, and the semantic similarity of the two sentences is calculated through a sigmoid function as $Sim_{sem} = \mathrm{sigmoid}(CW^T)$.
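The input layout and the final scoring step can be sketched without loading a model. The sketch below assumes that $C$ is the encoder's feature vector for the sentence pair and $W$ is a learned weight vector of the same length; the encoder itself is omitted, and the function names and toy values are hypothetical.

```python
import math

def build_pair_input(sent_a, sent_b):
    """Lay out two token lists as [CLS] w_1..w_n [SEP] w'_1..w'_m."""
    return ["[CLS]", *sent_a, "[SEP]", *sent_b]

def semantic_similarity(c, w):
    """Sim_sem = sigmoid(C W^T) for a feature vector c and a weight vector w."""
    z = sum(ci * wi for ci, wi in zip(c, w))
    return 1.0 / (1.0 + math.exp(-z))

tokens = build_pair_input(["rainstorm", "causes", "flood"], ["rain", "triggers", "flooding"])
print(tokens)
# Toy feature and weight vectors; real values would come from the trained model.
print(semantic_similarity([0.2, -0.1, 0.4], [1.0, 0.5, 2.0]))
```

Standard BERT tokenizers also append a closing [SEP] after the second sentence; the sketch follows the sequence exactly as the description states it.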

(33) Template generalization. The extracted templates are generalized in order to make full use of the collaborative filtering capability of the causal relationship examples. Following the general classification of Chinese causal relationships, which focuses on the position of the causal connective (for example centered or cause-to-effect), the similarity is calculated with the convolution tree kernel based on syntactic structure and the K-means algorithm is used for clustering. The K-means algorithm divides the input data set into K clusters so that the data within each cluster have the greatest similarity. First, K objects are randomly selected as centroids; then the distance from every other data point to each centroid is calculated, and each point is assigned to the cluster whose centroid is nearest. After each round of assignment, the center of each cluster is recalculated. The process is iterated until the cluster centers no longer change significantly.
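The K-means loop described above can be sketched in a few lines. Plain K-means operates on numeric vectors, so the sketch assumes each template has already been turned into a vector (for example, its tree-kernel similarities to a few reference templates); how the patent bridges the kernel similarity and K-means is not stated, and the names and toy points here are hypothetical.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100, tol=1e-6, seed=0):
    """Return (centroids, clusters) after iterating until the centroids settle."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # randomly pick k objects as centroids
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                       # assign each point to its nearest centroid
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)    # recompute each cluster center
        ]
        shift = max(sq_dist(o, n) for o, n in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:                        # stop when the centers barely move
            break
    return centroids, clusters

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centroids, clusters = kmeans(points, k=2)
print(sorted(sorted(c) for c in clusters))
```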

The event extraction method based on pattern matching and the neural network in the step (4) comprises the following steps:

(41) The event framework is constructed by trigger word clustering. First, a trigger word extraction algorithm analyzes the dependency syntax structure to identify the verbs in a sentence together with the corresponding verb-object and subject-predicate structures, and extracts the core predicates of the sentence as the candidate trigger word set. The core of the algorithm is to judge whether $V_{SBV} = V_{VOB} = V_t$ holds, where the SBV relation represents the subject-predicate structure, in which the head is the predicate verb and the dependent is the subject of the verb, and the VOB relation represents the verb-object structure, in which the head is the predicate verb and the dependent is the object of the verb. The formula judges whether the verb $V_{SBV}$ in a subject-predicate relation, the verb $V_{VOB}$ in a verb-object relation and the verb currently under consideration $V_t$ are the same; if they are, $V_t$ is extracted as a candidate trigger word. The candidate set is then refined by filtering out words such as linking verbs and auxiliary verbs. The word sense similarity of the candidate trigger words is calculated based on HowNet, and the words are clustered by this similarity. Based on the clustering result and manual adjustment, the trigger words are expanded with synonyms so that events can subsequently be identified over a wider range, and a trigger word to event type lookup table is formed.
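The check $V_{SBV} = V_{VOB} = V_t$ amounts to keeping verbs that head both an SBV arc and a VOB arc. A minimal sketch follows, assuming the dependency parse is already available as `(dependent_index, head_index, relation)` triples; that arc format, the function name and the `stop_verbs` filter set are illustrative assumptions rather than the output format of any particular parser.

```python
def candidate_triggers(tokens, arcs, stop_verbs=frozenset()):
    """Return verbs that are the head of both an SBV arc and a VOB arc.

    tokens: list of (word, pos) pairs
    arcs:   list of (dependent_index, head_index, relation) triples
    """
    sbv_heads = {head for _, head, rel in arcs if rel == "SBV"}
    vob_heads = {head for _, head, rel in arcs if rel == "VOB"}
    return [
        tokens[i][0]
        for i in sorted(sbv_heads & vob_heads)   # V_SBV = V_VOB = V_t
        if tokens[i][1] == "v" and tokens[i][0] not in stop_verbs
    ]

tokens = [("rainstorm", "n"), ("causes", "v"), ("flooding", "n")]
arcs = [(0, 1, "SBV"), (2, 1, "VOB")]
print(candidate_triggers(tokens, arcs))  # ['causes']
```

Passing linking and auxiliary verbs in `stop_verbs` corresponds to the refinement step that filters the candidate set.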

(42) Event trigger words are extracted and event types are identified by pattern matching: for the event framework, matching rules that combine event trigger words with domain feature words are constructed to identify event types in the water conservancy field. In the few cases where a trigger word is ambiguous, the direct subject or direct object of the trigger word in the sentence also plays an important role in describing the event. For this small number of event types, constraints are therefore formulated by associating keywords with trigger words. An example rule is: SEQ is set, KEY = nitrogen and phosphorus concentration, TR = rising. Here, SEQ indicates whether the relative order of the domain feature word and the trigger word must be considered; KEY is the domain feature word label, which points to a domain feature word and indicates that this word must appear in a sentence for the sentence to be judged to contain the event; TR is the trigger word label, which points to a trigger word and indicates that this word must appear in a sentence for the sentence to be judged to contain the event.
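A rule of this shape can be sketched as a small data class plus a matcher. The sketch assumes that, when SEQ is set, the feature word must precede the trigger word; the description only says that order is considered, so the direction is an assumption, as are the `event_type` field, its label and the token strings.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    seq: bool          # SEQ: whether word order is checked
    key: str           # KEY: domain feature word that must appear
    tr: str            # TR: trigger word that must appear
    event_type: str    # hypothetical label returned on a match

def match(rule, words):
    """Return the rule's event type if the segmented sentence satisfies it."""
    if rule.key not in words or rule.tr not in words:
        return None
    # Assumed ordering constraint: KEY must come before TR when SEQ is set.
    if rule.seq and words.index(rule.key) > words.index(rule.tr):
        return None
    return rule.event_type

rule = Rule(True, "nitrogen_phosphorus_concentration", "rising", "water_quality_change")
print(match(rule, ["nitrogen_phosphorus_concentration", "keeps", "rising"]))
```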

Event elements are extracted with a neural network. Within an Encoder-Decoder framework, an event element extraction model based on Bi-LSTM + Attention + CRF takes word vectors, part-of-speech (POS) features and trigger word features as input. Bi-LSTM extracts features, the Attention mechanism assigns weight coefficients to phrases and selects the more important information, CRF decoding improves the completeness of the extracted words, and the subject-predicate-object triple of the event is extracted. Finally, a voting mechanism integrates three event element extraction methods to improve event extraction performance.
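The voting step can be sketched as a per-token majority vote over the label sequences produced by three extractors. The description does not say which three methods are combined or how ties are broken, so the sketch assumes a simple majority with a fallback to the first extractor; the tag names are hypothetical.

```python
from collections import Counter

def vote(*label_seqs):
    """Fuse aligned label sequences token by token with a majority vote."""
    fused = []
    for labels in zip(*label_seqs):
        tag, count = Counter(labels).most_common(1)[0]
        # With no majority, fall back to the first extractor (an assumption).
        fused.append(tag if count > 1 else labels[0])
    return fused

neural = ["B-SUB", "O", "B-OBJ"]
rules = ["B-SUB", "O", "O"]
parser = ["O", "O", "B-OBJ"]
print(vote(neural, rules, parser))  # ['B-SUB', 'O', 'B-OBJ']
```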

(43) Event fusion. Event instances under the same event type are merged using a synonym dictionary, so that multiple event records expressing the same real-world event are merged into one record. The inputs are the event instances $E = \langle V_t, Sub, Obj \rangle$ obtained through event element identification, their corresponding event types $T$, and a synonym table $D$. For each event type in $T$, the event instances are compared pairwise; if the subjects and objects of $E_i$ and $E_j$ are synonymous with each other, $E_i$ and $E_j$ are merged into one event instance.
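Pairwise comparison under a synonym table can be implemented by mapping every noun to a canonical form and grouping on the result. The sketch assumes the synonym table $D$ is a word-to-canonical-word dictionary and that the first instance seen represents the merged event; both are illustrative choices, as are the sample words.

```python
def fuse_events(events, synonyms):
    """Merge events of the same type whose subjects and objects are synonymous.

    events:   list of (event_type, trigger, subject, object) tuples
    synonyms: dict mapping a word to its canonical synonym
    """
    def canon(word):
        return synonyms.get(word, word)

    merged = {}
    for etype, trigger, sub, obj in events:
        key = (etype, canon(sub), canon(obj))
        # Keep the first instance as the representative of the merged event.
        merged.setdefault(key, (etype, trigger, canon(sub), canon(obj)))
    return list(merged.values())

events = [
    ("flood", "causes", "rainstorm", "inundation"),
    ("flood", "triggers", "heavy_rain", "flooding"),
    ("drought", "causes", "rainstorm", "inundation"),
]
synonyms = {"heavy_rain": "rainstorm", "flooding": "inundation"}
print(fuse_events(events, synonyms))
```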

The field affair map fused with the water conservancy model constructed in the step (5) comprises the following steps:

(51) The water conservancy model ontology is constructed based on XML Schema and OWL. A metadata model of the water conservancy model is designed with XML Schema, and the Schema is then converted into a document in OWL (Web Ontology Language) syntax to construct water conservancy models at event granularity.

The method comprises the following specific steps:

(1) The water conservancy model metadata is designed based on XML Schema. The metadata model consists of three parts: "base_info", "parameters" and "function". The complex type "base_info" is the basic information module of the model; it contains the three sub-elements "model_ID" (model ID), "name" (model name) and "descriptor" (description of the model function), which store and define the metadata information. The complex type "parameters" holds the element information of the model. Its child node "attribute" is a parameter of the model, of either the "measured parameter" or the "empirical_value" type, and may occur any number of times. Its child nodes "input" and "output" are the input and output information of the model; each contains the three elements "name" (information name), "type" (data type) and "descriptor" (information description). At least one "input" is required, while "output" may be absent. The complex type "function" holds the function information of the model: "language" is the language in which the algorithm is developed, "version" is the algorithm version number, used to distinguish different code versions of the same algorithm, and "code" is the address where the code is stored.
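An instance document with this layout can be sketched using Python's standard `xml.etree.ElementTree`. The element names come from the description; the root element name `model`, the function name, the default values and the sample content are hypothetical, and the optional `attribute` nodes are left out for brevity.

```python
import xml.etree.ElementTree as ET

def build_model_metadata(model_id, name, descriptor, inputs, outputs=(),
                         language="Python", version="1.0", code=""):
    """Build a metadata element with base_info, parameters and function parts."""
    if not inputs:
        raise ValueError("at least one input is required")
    root = ET.Element("model")  # hypothetical root element name
    base = ET.SubElement(root, "base_info")
    for tag, text in (("model_ID", model_id), ("name", name), ("descriptor", descriptor)):
        ET.SubElement(base, tag).text = text
    params = ET.SubElement(root, "parameters")
    for kind, items in (("input", inputs), ("output", outputs)):
        for item_name, item_type, item_desc in items:
            node = ET.SubElement(params, kind)
            ET.SubElement(node, "name").text = item_name
            ET.SubElement(node, "type").text = item_type
            ET.SubElement(node, "descriptor").text = item_desc
    func = ET.SubElement(root, "function")
    for tag, text in (("language", language), ("version", version), ("code", code)):
        ET.SubElement(func, tag).text = text
    return root

root = build_model_metadata(
    "M001", "surface_runoff", "Computes surface runoff from rainfall",
    inputs=[("rainfall", "float", "rainfall depth")],
    outputs=[("runoff", "float", "runoff depth")],
)
print(ET.tostring(root, encoding="unicode"))
```

The `ValueError` mirrors the constraint that a model must declare at least one input while outputs may be absent.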

(2) The water conservancy model ontology is constructed. First, the collected water conservancy models are split at event granularity. For example, a rainfall-runoff model takes rainfall as input and runoff as output, and internally comprises several sub-processes: canopy interception, evapotranspiration, surface runoff, interflow, groundwater flow and river confluence. These sub-processes are at event granularity. Exploiting the high cohesion and low coupling of water conservancy models, each model is split by sub-process. Second, the models are classified by function and a model pool is built. Then, the water conservancy model metadata structure is defined based on XML Schema. Finally, the Schema is converted into an ontology document in OWL syntax using the metadata structure, and the water conservancy model ontology is established.

(52) The water conservancy models are fused with the causal events: domain-specific water conservancy models replace the transition probabilities of a general event logic graph, so that the event transition information of the water conservancy event logic graph is more computable and more interpretable. Each model comprises input elements, a model equation and output elements. The input elements include the elements of the predecessor event node (its object attributes and states); the model equation calculates the output elements from the input elements; and the output elements are assigned to the successor event node and control the state transition of that node.
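The replacement of a fixed transition probability by a computable model can be sketched as a graph edge that carries a function. The classes, field names and the linear runoff equation below are illustrative assumptions only; the runoff coefficient of 0.5 is a made-up value, not one taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class EventNode:
    name: str
    state: Dict[str, float]

@dataclass
class ModelEdge:
    """A causal edge whose transition is computed by a model equation."""
    cause: EventNode
    effect: EventNode
    model: Callable[[Dict[str, float]], Dict[str, float]]

    def propagate(self):
        # Input elements come from the predecessor node; the computed
        # output elements are assigned to the successor node.
        self.effect.state.update(self.model(self.cause.state))
        return self.effect.state

def toy_runoff_model(inputs):
    """Hypothetical model equation: runoff = coefficient * rainfall."""
    return {"runoff_mm": 0.5 * inputs["rainfall_mm"]}

rain = EventNode("rainfall", {"rainfall_mm": 50.0})
runoff = EventNode("surface_runoff", {})
edge = ModelEdge(rain, runoff, toy_runoff_model)
print(edge.propagate())  # {'runoff_mm': 25.0}
```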

Examples

As shown in FIG. 1, the method of the invention comprises the following steps:

S1: Text corpora such as papers and news in the water conservancy field are collected and preprocessed for use in subsequent graph construction.

S2: Using existing general causal connectives and the corpora obtained in the previous step, causal example sentences and sentence features are extracted to construct a causal relationship corpus.

S3: Causal relationship extraction based on a template library. New templates are mined based on syntactic structure and semantics and are then generalized. New causal relationship examples are extracted according to the domain causal relationship template library.

S4: Event extraction based on pattern matching and a neural network. The event framework is constructed by trigger word clustering. Event trigger words are extracted and event types are identified by pattern matching, and event elements are extracted with a neural network, which completes the extraction of events from the causal event sentences. A synonym dictionary is then used to fuse the nouns in the event instances, giving the events and the causal relationships among them.

S5: A domain event logic graph fused with water conservancy models is constructed. Domain-specific water conservancy models are fused into the event logic graph, quantitative analysis is carried out on the basis of qualitative analysis, and support is provided for water resource management decisions.

Wherein, in S1, collecting data and preprocessing comprises the following steps:

S101: A total of 384 academic journal papers in the water conservancy field were collected from journals such as Hydrology, Hydrology Progress, Hydrologic Informatization and the Journal of Hohai University (Natural Sciences), together with 1187 news texts from related websites such as the Taihu Basin Authority of the Ministry of Water Resources and the China Water Conservancy Network, as experimental data. Each document is stored as a separate TXT file.

S102: The documents are denoised: pictures, tables, references and similar information are removed from the data set and only the body text is kept. The text is then split into sentences, segmented into words and tagged with parts of speech, and the documents are organized into the required input format. Sentences are split at terminal characters such as "?", "!", "." and "……". Jieba is used as the word segmentation tool, and after segmentation the LTP tool is used for part-of-speech tagging.
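The sentence-splitting step of S102 can be sketched as follows. This is a minimal illustration that uses only Python's standard `re` module; the function name `split_sentences` and the sample text are assumptions, full-width forms of the terminal characters are included on the assumption that the corpus is Chinese, and the Jieba and LTP steps are not reproduced here.

```python
import re

# Terminal characters named in S102, in ASCII and full-width forms (assumed set).
_TERMINATORS = r"(?:……|[?!.？！。])"


def split_sentences(text):
    """Split text after each run of terminal characters, keeping the terminator."""
    # The capturing group makes re.split return [body, terminator, body, ...].
    parts = re.split(rf"((?:{_TERMINATORS})+)", text)
    sentences = []
    for body, end in zip(parts[0::2], parts[1::2] + [""]):
        sentence = (body + end).strip()
        if sentence:
            sentences.append(sentence)
    return sentences


print(split_sentences("Water level rose. Did the dam hold? Yes!"))
```

A splitter used on real corpora would also need to protect decimal points and abbreviations, which this sketch does not attempt.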

In S2, the cause and effect corpus is constructed as follows:

The existing common causal connectives are classified according to syntactic rules and written as regular expressions. The regular expressions are used to extract causal example sentences from the training corpus, and the sentences are grouped by connective type to form the causal relationship corpus. The expressions include:

cause to effect (paired connectives): \s?(because)/[p|c]+\s(.*)(so)/[p|c]+\s(.*)

cause to effect (front-end type): (.*)\s+(dragging|led|guide)/[d|v]+\s(.*)

effect to cause (centered type): (.*)(root origin|out)/[p|c]+\s(.*)
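How such connective patterns pull a cause and an effect out of a part-of-speech-tagged sentence can be sketched as below. The sketch assumes tokens written as `word/tag` and separated by spaces, uses English stand-ins for the Chinese connectives, and writes the tag class as `[pc]` rather than `[p|c]`; the names `PATTERNS` and `extract_causal` are hypothetical.

```python
import re

# Tokens look like "word/tag"; p, c, d, v are assumed to mean preposition,
# conjunction, adverb and verb. English connectives stand in for the originals.
PATTERNS = {
    "paired": re.compile(r"\s?(because)/[pc]+\s(.*)(so)/[pc]+\s(.*)"),
    "single": re.compile(r"(.*)\s+(led)/[dv]+\s(.*)"),
}


def extract_causal(tagged_sentence):
    """Return the connective type with the cause and effect spans, or None."""
    m = PATTERNS["paired"].match(tagged_sentence)
    if m:
        return {"type": "paired",
                "cause": m.group(2).strip(),
                "effect": m.group(4).strip()}
    m = PATTERNS["single"].match(tagged_sentence)
    if m:
        return {"type": "single",
                "cause": m.group(1).strip(),
                "effect": m.group(3).strip()}
    return None


print(extract_causal("because/c rainfall/n increased/v so/c runoff/n rose/v"))
print(extract_causal("heavy/a rainfall/n led/v flooding/n"))
```

Grouping the matched sentences by the `type` field corresponds to classifying the example sentences by connective type.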

In S3, the causal relationship extraction based on the template library is shown in fig. 2, and specifically includes the following steps:

S301: The syntactic-structure similarity between sentences is computed with a convolution tree kernel. Sentences whose syntactic structure is similar to that of the causal relationship corpus are extracted from the training corpus, and the sentence group most similar to an example-sentence group is taken as the candidate set. The sentences of the causal relationship corpus are labeled by connective, and sentences sharing the same connective form a group. The similarity of two groups of sentence syntax trees is defined formally by an equation that is missing from the source text; in that definition, one symbol denotes the set of subtrees of the syntactic parse tree T_m.
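Because the similarity formula itself is not reproduced here, the following is only a sketch of a standard convolution (subset) tree kernel in the Collins and Duffy style, not necessarily the exact definition used in this method. Trees are assumed to be nested tuples `(label, child, ...)` with words as plain strings; the decay factor `lam` and all function names are assumptions.

```python
import math


def _label(t):
    return t if isinstance(t, str) else t[0]


def nodes(t):
    """Return all internal (non-word) nodes of a tree."""
    if isinstance(t, str):
        return []
    out = [t]
    for child in t[1:]:
        out.extend(nodes(child))
    return out


def production(n):
    """Grammar production at node n: its label plus (child label, is-word) pairs."""
    return (n[0], tuple((_label(c), isinstance(c, str)) for c in n[1:]))


def delta(n1, n2, lam):
    """Weighted count of common subtree fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):  # word children contribute a factor of 1
            score *= 1.0 + delta(c1, c2, lam)
    return score


def tree_kernel(t1, t2, lam=1.0):
    return sum(delta(a, b, lam) for a in nodes(t1) for b in nodes(t2))


def similarity(t1, t2, lam=1.0):
    """Kernel normalized to [0, 1] so that trees of different size are comparable."""
    return tree_kernel(t1, t2, lam) / math.sqrt(
        tree_kernel(t1, t1, lam) * tree_kernel(t2, t2, lam))


t1 = ("S", ("NP", ("N", "rain")), ("VP", ("V", "causes"), ("NP", ("N", "flood"))))
t2 = ("S", ("NP", ("N", "rain")), ("VP", ("V", "causes"), ("NP", ("N", "drought"))))
print(similarity(t1, t2))
```

The two example trees differ only in one word, so their normalized similarity is high but below 1, whereas a tree compared with itself scores exactly 1.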

S302: By comparing syntactic-structure similarity, candidate sentences whose syntactic structure is similar to that of the causal sentences are extracted from the training corpus. However, some sentences are observed to be similar in structure but different in meaning, so a BERT model is used to compute the semantic similarity between sentences. The core of BERT is its bidirectional Transformer encoder, which models text entirely through the attention mechanism. For each sentence, the model computes the relation between every word and every other word, and then uses these relations as weights to re-express each word in the sentence. The resulting feature encodes both the word itself and its relations to the other words, and is therefore a more global representation. Specifically, before two sentences with similar syntactic structures are fed into the model, [CLS] is added at the head and [SEP] is inserted between the two sentences as a separator, giving the model input sequence $\{[\mathrm{CLS}], w_1, w_2, \ldots, w_n, [\mathrm{SEP}], w'_1, w'_2, \ldots, w'_m\}$, where $w$ and $w'$ are the two sentence sequences. The sequence is tokenized and mapped to the word embedding vectors $E = \{E_{cls}, E_1, \ldots, E_n, E_{sep}, E'_1, \ldots, E'_m\}$. Finally, the feature vector of the sentence is obtained by encoding with a multi-layer Transformer encoder, and the semantic similarity of the two sentences is computed with a sigmoid: $Sim_{sem} = \mathrm{sigmoid}(CW^{T})$.
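The input construction and the final scoring step of S302 can be illustrated without a trained model. In the sketch below the encoder is deliberately left out: `c` stands for the feature vector it would produce and `w` for a weight vector, and both, like the function names, are assumptions made for illustration. Common BERT implementations also append a trailing [SEP]; the sketch follows the sequence exactly as described here.

```python
import math


def build_pair_input(sent_a, sent_b):
    """Build the sequence [CLS], w_1..w_n, [SEP], w'_1..w'_m from two token lists."""
    return ["[CLS]"] + list(sent_a) + ["[SEP]"] + list(sent_b)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def semantic_similarity(c, w):
    """Sim_sem = sigmoid(C W^T), with C and W given as plain vectors."""
    return sigmoid(sum(ci * wi for ci, wi in zip(c, w)))


tokens = build_pair_input(["rainfall", "increased"], ["runoff", "rose"])
print(tokens)
# Made-up feature and weight vectors, used only to show the scoring step.
print(semantic_similarity([0.2, -0.1, 0.4], [1.0, 0.5, 2.0]))
```

The score always lies strictly between 0 and 1, so a threshold on it can decide whether a structurally similar candidate is also semantically close to a causal sentence.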

S303: To make full use of the collaborative filtering capability of the causal relationship instances, the extracted templates need to be generalized. General Chinese causal relationship classification pays most attention to the position of the causal connective (centered type, cause-to-effect and so on), so the K-means algorithm is chosen for clustering on the basis of the syntactic-structure convolution tree kernel similarity. K-means divides the input data set into k clusters such that the data within each cluster are as similar as possible. First, k objects are selected at random as centroids; the distance from every other data point to each centroid is then computed, and each point is assigned to the cluster whose centroid is nearest. After each round of assignment, the center of every cluster is recomputed. The process is iterated until the cluster centers no longer change significantly.
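The K-means loop described in S303 can be sketched in plain Python as follows. The sketch clusters small numeric feature vectors with squared Euclidean distance, which is an assumption: clustering directly on tree kernel similarities would need a kernel variant of the algorithm. The function names, the fixed random seed and the sample points are illustrative only.

```python
import random


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed=0, max_iter=100, tol=1e-9):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # randomly pick k objects as centroids
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assign every point to the cluster with the nearest centroid.
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j]))
                  for p in points]
        # Recompute each cluster center; an empty cluster keeps its old center.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[j])
        shift = max(dist2(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:  # stop once the centers no longer move noticeably
            break
    return labels, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

With well-separated groups such as these, the first three points end up in one cluster and the last three in the other.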

In S4, event extraction is performed by using the pattern matching and neural network method as shown in fig. 3, which specifically includes the following steps:

S401: The event framework is constructed using a trigger-word clustering method. First, a trigger extraction algorithm analyzes the dependency syntax structure of each sentence to identify its verbs together with the corresponding subject-predicate and verb-object structures, and the core predicates of the sentence are extracted as the candidate trigger set. The core of the algorithm is to test whether $V_{SBV} = V_{VOB} = V_t$ holds. The SBV relation denotes a subject-predicate structure, in which the head is the predicate verb and the dependent is the subject of the verb; the VOB relation denotes a verb-object structure, in which the head is the predicate verb and the dependent is the object of the verb. The test checks whether the verb $V_{SBV}$ in the subject-predicate relation, the verb $V_{VOB}$ in the verb-object relation and the verb $V_t$ currently being examined are the same; if so, $V_t$ is extracted as a candidate trigger word. The candidate set is then filtered in detail, for example to remove linking verbs and auxiliary verbs. The word-sense similarity of the candidate trigger words is computed with HowNet and the words are clustered accordingly. Based on the clustering result and manual adjustment, the trigger words are expanded with synonyms so that events can later be identified over a wider range, and a trigger word to event type lookup table is formed.
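The candidate trigger test of S401 can be sketched as below. The arc format `(dependent_index, head_index, relation)`, the function name and the small stop list are assumptions; in practice the arcs would come from a dependency parser such as LTP, which is not invoked here.

```python
def candidate_triggers(tokens, arcs, stop_verbs=frozenset()):
    """Return verbs that head both an SBV arc and a VOB arc (V_SBV = V_VOB = V_t)."""
    sbv_heads = {head for _, head, rel in arcs if rel == "SBV"}
    vob_heads = {head for _, head, rel in arcs if rel == "VOB"}
    return [tokens[i] for i in sorted(sbv_heads & vob_heads)
            if tokens[i] not in stop_verbs]  # drop linking and auxiliary verbs


tokens = ["rainfall", "increases", "runoff"]
arcs = [(0, 1, "SBV"), (2, 1, "VOB")]  # (dependent, head, relation)
print(candidate_triggers(tokens, arcs))

copula_tokens = ["runoff", "is", "water"]
print(candidate_triggers(copula_tokens, arcs, stop_verbs=frozenset({"is"})))
```

The second call shows the filtering step: a linking verb satisfies the structural test but is removed from the candidate set.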

S402: Event trigger words are extracted and event types are identified by pattern matching, and event elements are extracted by a neural network, which completes the extraction of events from the causal event sentences. The step can be divided into two subtasks:

Recognition of event trigger words and event types. For the event framework, event trigger words are combined with domain feature words to build matching rules that identify the types of events in the water conservancy field. Take "the nitrogen and phosphorus concentration of the water body rises" as an example: in events concerning hydrological characteristic values, trend verbs such as "rise" and "increase" alone cannot represent the event clearly. The subject "nitrogen and phosphorus concentration" plays an important role in representing the sentence, so it is introduced into the rule as a domain feature word. During event extraction, if the domain feature word "nitrogen and phosphorus concentration" and the trigger word "rise" both appear in a sentence and their relative positions also satisfy the rule, the sentence is extracted as an event.
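A matching rule of this kind, combining a domain feature word, a trigger word and their relative position, can be sketched as follows. The `Rule` fields, the event type name `characteristic_value_change` and the simple substring search are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    feature_word: str           # domain feature word, e.g. a monitored quantity
    trigger: str                # event trigger word
    event_type: str             # hypothetical event type name
    feature_first: bool = True  # required order of feature word and trigger


def match_event(sentence, rules):
    """Return the first rule match as a dict, or None if no rule applies."""
    for rule in rules:
        f = sentence.find(rule.feature_word)
        t = sentence.find(rule.trigger)
        if f < 0 or t < 0:
            continue  # both words must occur in the sentence
        if (f < t) == rule.feature_first:  # relative position must fit the rule
            return {"type": rule.event_type,
                    "feature": rule.feature_word,
                    "trigger": rule.trigger}
    return None


rules = [Rule("nitrogen and phosphorus concentration", "rise",
              "characteristic_value_change")]
print(match_event(
    "the nitrogen and phosphorus concentration of the water body rises", rules))
```

A sentence that contains both words in the wrong order, or only one of them, is not extracted as an event.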

Identification and extraction of event elements. The overall framework of the model is shown in fig. 4. The specific steps are as follows:

(1) In the embedding layer, the model concatenates the word vectors, the part-of-speech (POS) features, the trigger-word position features and the trigger-word types.

(2) The concatenated vector is fed into the neural network encoding layer, which learns rich contextual information for each input token through bidirectional long short-term memory units. To address exploding gradients, vanishing gradients and long-distance dependencies, an LSTM model with a gate structure is introduced. The model computes the memory state with the element-wise Hadamard product: $c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c[h_{t-1}, x_t] + b_c)$, where $h_t$ denotes the hidden state vector, $c_t$ the memory state, $x_t$ the input vector, $i_t$ the input gate, $f_t$ the forget gate, $W_c$ the connection weight parameter and $b_c$ the bias parameter.

(3) An attention mechanism is introduced, so that the rich contextual information of the input tokens is learned by the Bi-LSTM of the encoding layer together with attention. First, the hidden states of the input information are saved [equation missing from the source text]. Second, a score is computed for the hidden state of each encoder step [equation missing from the source text], where $W_{combined}$, $W_d$ and $W$ are hyperparameter matrices. Then all scores are fed into a softmax layer for normalization so that the alignment scores sum to 1 [equation missing from the source text]. Finally, the hidden state of each encoder step $i$ is multiplied by its normalized score $a_{t,i}$, and all the resulting alignment vectors are summed to obtain the aggregated information [equation missing from the source text].

(4) The context vector computed by the attention layer is sent to the decoding layer, where conditional random field (CRF) decoding produces the final output and predicts the label of each word.
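The memory-state update and the attention pooling described above can be sketched in plain Python. The cell update follows the standard LSTM form with a tanh candidate state. Because the scoring formula is not reproduced here, the attention part takes precomputed scores as input and shows only the softmax normalization and the weighted sum; all function names and sample values are assumptions.

```python
import math


def candidate_state(W_c, h_prev, x_t, b_c):
    """tanh(W_c [h_{t-1}, x_t] + b_c), with [.,.] as vector concatenation."""
    z = list(h_prev) + list(x_t)
    return [math.tanh(sum(w * v for w, v in zip(row, z)) + b)
            for row, b in zip(W_c, b_c)]


def cell_update(f_t, c_prev, i_t, cand):
    """c_t = f_t * c_{t-1} + i_t * cand, using element-wise (Hadamard) products."""
    return [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, cand)]


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(hidden_states, scores):
    """Normalize the scores and sum the hidden states weighted by them."""
    weights = softmax(scores)
    dim = len(hidden_states[0])
    return [sum(w * h[d] for w, h in zip(weights, hidden_states))
            for d in range(dim)]


# Forget gate keeps the first memory slot; input gate writes the second one.
print(cell_update([1.0, 0.0], [2.0, 3.0], [0.0, 1.0], [0.5, 0.5]))
# Equal scores give equal weights, so the context is the mean hidden state.
print(attention_context([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
```

In a full model the gates, the scores and the CRF decoding would all be learned; the sketch isolates only the arithmetic of steps (2) and (3).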

S403: Event fusion. Event instances of the same event type are merged using the synonym dictionary, so that several pieces of event data expressing the same real-world event are merged into one, which achieves event fusion. The input is the event instances $E = \langle V_t, Sub, Obj \rangle$ obtained by event element identification, their corresponding event types T and the synonym table D. For each event type in T, the instances are compared pairwise; if the Sub and Obj elements of two instances $E_i$ and $E_j$ are synonymous, $E_i$ and $E_j$ are merged into one event instance.
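The pairwise merging of S403 can be sketched as follows. The sketch assumes that the synonym table D is a dictionary mapping each word to a representative synonym and that two instances match when both their subjects and their objects are synonymous; the dictionary keys and the `mentions` counter are hypothetical.

```python
def canonical(word, synonyms):
    """Map a word to its representative synonym (itself if it has no entry)."""
    return synonyms.get(word, word)


def fuse_events(events, synonyms):
    """Merge instances of the same type whose Sub and Obj are synonymous."""
    merged = []
    for ev in events:
        for kept in merged:
            same_type = ev["type"] == kept["type"]
            same_sub = canonical(ev["sub"], synonyms) == canonical(kept["sub"], synonyms)
            same_obj = canonical(ev["obj"], synonyms) == canonical(kept["obj"], synonyms)
            if same_type and same_sub and same_obj:
                kept["mentions"] += 1  # fold this instance into the kept one
                break
        else:
            merged.append({**ev, "mentions": 1})
    return merged


synonyms = {"heavy rain": "rainstorm", "flooding": "flood"}
events = [
    {"type": "disaster", "trigger": "cause", "sub": "rainstorm", "obj": "flood"},
    {"type": "disaster", "trigger": "trigger", "sub": "heavy rain", "obj": "flooding"},
    {"type": "quality", "trigger": "cause", "sub": "rainstorm", "obj": "flood"},
]
print(fuse_events(events, synonyms))
```

The first two instances are merged because they share a type and have synonymous elements, while the third stays separate because its type differs.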

In S5, the construction of the domain affair map fused with the water conservancy model comprises the following steps:

s501: a method for packaging a water conservancy model based on an XML Schema technology and an OWL language is designed, and a water conservancy model library with event granularity is constructed, as shown in FIG. 5. The method comprises the following specific steps: (1) and designing water conservancy model metadata based on XML Schema. The water conservancy model metadata model is composed of three parts, namely base _ info, parameters and function. The complex type "base _ info" is a basic information module of the model, and comprises three sub-elements of "model _ ID" (model ID), "name" (model name) and "descriptor" (model function description) to store and define metadata information; the complex type "parameters" is element information of the model. The child node "attribute" is a parameter in the model, and includes two types, i.e., "measured parameter" and "empirical _ value", and the number is not limited. The child nodes "input" and "output" are input and output information of the model respectively, and include three types of elements, namely "name" (information name), "type" (data type) and "descriptor" (information description), wherein the number of "input" is at least 1, and "output" can be missing; the complex type "function" is function information of the model. The "language" is an algorithm development language, "version" is an algorithm version number, which is used for distinguishing different code versions of the same algorithm, and "code" is a stored code address. (2) And constructing a water conservancy model body. Firstly, splitting the collected water conservancy model according to event granularity. For example, in a rainfall-runoff model, rainfall is taken as input, runoff is taken as output, and the model internally comprises a plurality of sub-processes such as crown interception, evapotranspiration, surface runoff, interflow, underground water, river confluence and the like, wherein the sub-processes are event-grained. 
And splitting the water conservancy model according to the sub-process according to the characteristics of high cohesion and low coupling of the water conservancy model. Secondly, classifying the models according to functions and constructing a model pool. Then, defining a water conservancy model metadata structure based on XML Schema technology. And finally, converting the Schema into an ontology document of OWL grammar by using a metadata structure, and establishing a water conservancy model ontology.
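An instance document following the metadata structure of S501 might be built as sketched below with Python's standard `xml.etree.ElementTree`. The element names come from the description above, while the root element name `model`, the builder function, the sample values and the code address are assumptions. The sketch produces an XML instance only; it does not generate the XML Schema or the OWL ontology.

```python
import xml.etree.ElementTree as ET


def _info(parent, tag, name, dtype, descriptor):
    """Add an input/output node with its name, type and descriptor children."""
    node = ET.SubElement(parent, tag)
    ET.SubElement(node, "name").text = name
    ET.SubElement(node, "type").text = dtype
    ET.SubElement(node, "descriptor").text = descriptor


def build_model_metadata(model_id, name, descriptor, inputs, outputs=(),
                         attributes=(), language="", version="", code=""):
    if not inputs:
        raise ValueError("at least one input is required")
    root = ET.Element("model")  # hypothetical root element name
    base = ET.SubElement(root, "base_info")
    ET.SubElement(base, "model_ID").text = model_id
    ET.SubElement(base, "name").text = name
    ET.SubElement(base, "descriptor").text = descriptor
    params = ET.SubElement(root, "parameters")
    for kind, value in attributes:  # e.g. ("empirical_value", 0.35)
        attr = ET.SubElement(params, "attribute")
        ET.SubElement(attr, kind).text = str(value)
    for item in inputs:
        _info(params, "input", *item)
    for item in outputs:  # outputs may be absent
        _info(params, "output", *item)
    func = ET.SubElement(root, "function")
    ET.SubElement(func, "language").text = language
    ET.SubElement(func, "version").text = version
    ET.SubElement(func, "code").text = code
    return root


root = build_model_metadata(
    "M001", "surface_runoff", "Surface runoff sub-process",
    inputs=[("rainfall", "float", "Rainfall depth")],
    outputs=[("runoff", "float", "Runoff depth")],
    attributes=[("empirical_value", 0.35)],
    language="Python", version="1.0", code="models/surface_runoff.py")
print(ET.tostring(root, encoding="unicode"))
```

Checking the "at least one input" constraint in the builder mirrors the cardinality rule that the Schema would otherwise enforce at validation time.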

S502: The water conservancy model is fused with the causal events. The model comprises input elements, model equations and output elements. The input elements are elements of the precursor event nodes (including object attributes and states); the model equation computes the output elements from the input elements; and the output elements are assigned to the successor event nodes and control the state transition of those nodes. For example, the eutrophication dynamics equation receives three input elements from three precursor event nodes, namely temperature rise, strong illumination and water body eutrophication; the growth rate, the output element, is computed by the model equation and assigned to the successor event node, mass propagation of plankton.
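The fusion described in S502 can be pictured as a model sitting on the edge between precursor and successor event nodes. The sketch below is illustrative only: the class names, the activation threshold and, in particular, the toy `growth_rate` equation are assumptions and do not reproduce the eutrophication dynamics equation of the method.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class EventNode:
    name: str
    elements: Dict[str, float] = field(default_factory=dict)
    active: bool = False


@dataclass
class ModelEdge:
    precursors: List[EventNode]
    successor: EventNode
    equation: Callable[[Dict[str, float]], float]
    output_name: str
    threshold: float  # hypothetical rule controlling the state transition

    def fire(self):
        inputs = {}
        for node in self.precursors:  # gather input elements from precursors
            inputs.update(node.elements)
        value = self.equation(inputs)
        self.successor.elements[self.output_name] = value  # assign the output
        self.successor.active = value > self.threshold
        return value


def growth_rate(x):
    """Toy placeholder equation: maximum rate scaled by three limiting factors."""
    return 2.0 * x["temperature_factor"] * x["light_factor"] * x["nutrient_factor"]


warm = EventNode("temperature rise", {"temperature_factor": 0.8})
light = EventNode("strong illumination", {"light_factor": 0.5})
nutrients = EventNode("water body eutrophication", {"nutrient_factor": 0.5})
bloom = EventNode("mass propagation of plankton")

edge = ModelEdge([warm, light, nutrients], bloom, growth_rate, "growth_rate", 0.3)
print(edge.fire(), bloom.active)
```

Replacing a fixed transition probability with a computed value in this way is what makes the transition both quantitative and traceable to its inputs.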

The foregoing shows and describes the basic principles, main features and advantages of the present invention. Those skilled in the art will understand that the description above illustrates the principles of the invention, and that variations and modifications are possible without departing from the spirit and scope of the invention as claimed. The scope of the invention is defined by the appended claims and their equivalents.

Claims

1. A method for constructing an affair map based on hydrological events, characterized by comprising the following steps:

2. The method for constructing an affair map based on hydrological events according to claim 1, wherein step (3) is specifically as follows: following the idea of the Bootstrapping algorithm, the causal relationship extraction method iterates continuously using the existing extraction results; similarity is computed with a convolution tree kernel based on syntactic structure and with a BERT model based on semantic features, and new causal connectives are extracted.

3. The method for constructing an affair map based on hydrological events according to claim 1, wherein step (4) is specifically as follows:

4. The method for constructing an affair map based on hydrological events according to claim 1, wherein step (5) is specifically as follows: first, a method for encapsulating water conservancy models based on XML Schema and the OWL language is designed, and a water conservancy model library at event granularity is constructed; a method for fusing the affair map with the water conservancy model is then provided, in which the water conservancy model is fused into the affair map in computational form, and the domain affair map is constructed and applied.

5. The method for constructing an affair map based on hydrological events according to claim 1, wherein in step (3) the causal relationship extraction method based on the template library comprises the following steps:

3.3) template generalization: the extracted templates are generalized in order to make full use of the collaborative filtering capability of the causal relationship instances; in combination with general Chinese causal relationship classification, the syntactic-structure convolution tree kernel similarity is computed and the K-means algorithm is selected for clustering.

6. The method for constructing an affair map based on hydrological events according to claim 5, wherein step 3.2) is specifically as follows:

first, before two sentences with similar syntactic structures are input into the model, [CLS] is added at the head and [SEP] is inserted between the two sentences as a separator, finally giving the model input sequence $\{[\mathrm{CLS}], w_1, w_2, \ldots, w_n, [\mathrm{SEP}], w'_1, w'_2, \ldots, w'_m\}$, where $w$ and $w'$ are the two sentence sequences;

the sequence is tokenized and mapped to the word embedding vectors $E = \{E_{cls}, E_1, \ldots, E_n, E_{sep}, E'_1, \ldots, E'_m\}$, which are encoded by a multi-layer Transformer encoder;

7. The method for constructing an affair map based on hydrological events according to claim 5, wherein step 3.3) is specifically as follows: the K-means algorithm divides the input data set into k clusters such that the data within each cluster have the greatest similarity;

first, k objects are randomly selected as centroids;

8. The method for constructing an affair map based on hydrological events according to claim 1, wherein in step (4) the event extraction method based on pattern matching and a neural network comprises the following steps:

the core of the algorithm is to test whether $V_{SBV} = V_{VOB} = V_t$ holds, wherein the SBV relation denotes a subject-predicate structure, in which the head is the predicate verb and the dependent is the subject of the verb, and the VOB relation denotes a verb-object structure, in which the head is the predicate verb and the dependent is the object of the verb;

9. The method for constructing an affair map based on hydrological events according to claim 1, wherein in step (5) the domain affair map fused with the water conservancy model is constructed by the following steps:

5.1) constructing a water conservancy model ontology based on XML Schema and OWL;