CN117271701A

CN117271701A - Method and system for extracting system operation abnormal event relation based on TGGAT and CNN

Info

Publication number: CN117271701A
Application number: CN202311178825.8A
Authority: CN
Inventors: 高德荃; 李济伟; 丁雪伟; 杨猛; 冯宝; 卞宇翔; 刘超; 邵月; 杜静; 白东霞; 李妍
Original assignee: State Grid Information and Telecommunication Co Ltd; Nari Information and Communication Technology Co
Current assignee: State Grid Information and Telecommunication Co Ltd; Nari Information and Communication Technology Co
Priority date: 2023-09-13
Filing date: 2023-09-13
Publication date: 2023-12-22

Abstract

The invention discloses a system operation abnormal event relation extraction method and system based on TGGAT and CNN, which are used for acquiring text data and preprocessing; processing the preprocessed text data by using a BERT model to generate a feature vector matrix; inputting the feature vector matrix into a multi-scale convolutional neural network CNN, and extracting the local features of sentences; inputting the feature vector matrix into a type guidance graph annotation network TGGAT, and extracting the global features of sentences; splicing the local features and the global features, and obtaining a characterization vector matrix weighted by the attention value through self-attention; the weighted token vector matrix is event relationship classified using a softmax classifier. And the type guidance graph annotation network is used for capturing long-range syntax dependency relationships and simultaneously capturing type information by considering different contributions of different types of dependency relationships, so that global event knowledge is accurately captured.

Description

A method for extracting abnormal event relationships in system operation based on TGGAT and CNN and system

技术领域Technical field

本发明涉及文本数据识别，具体是涉及一种基于TGGAT和CNN的系统运行异常事件关系抽取方法及系统。The present invention relates to text data recognition, specifically to a method and system for extracting abnormal event relationships in system operation based on TGGAT and CNN.

背景技术Background technique

事件关系提取是事件知识图谱中的一个重要步骤，它对信息提取、投资策略和问题解答等自然语言处理应用大有裨益。通过捕捉关联关系，相关事件可以相辅相成，充分发挥其价值。Event relationship extraction is an important step in the event knowledge graph, which is of great benefit to natural language processing applications such as information extraction, investment strategies, and question answering. By capturing related relationships, related events can complement each other and fully realize their value.

目前，在事件关系提取的任务中，大多数研究集中在提取因果关系和时间关系上。机器学习在提取事件关系的任务中得到了广泛应用。Kruengkrai等人采用多列卷积神经网络处理多个背景知识源，以提取缺乏明确线索表明事件因果关系存在的因果关系。Ning等人结合整数线性规划和常识知识提取时间关系。Hu等人使用预训练的语言模型自适应地聚类化时间关系特征。Jiang等人提出了一种在分类中结合句法树结构的模型。为了提高分类效果，Fan等人提出了一种关系分类模型，利用双线神经网络提取句法信息来完成关系分类任务。人们还提出了处理文档级事件因果关系的不同思路，Trong等人设计了一种强化学习机制来从文档中选择关键上下文。Currently, in the task of event relationship extraction, most research focuses on extracting causal relationships and temporal relationships. Machine learning is widely used in the task of extracting event relationships. Kruengkrai et al. employed multi-column convolutional neural networks to process multiple sources of background knowledge to extract causal relationships that lack clear clues indicating the existence of event causality. Ning et al. combined integer linear programming and common sense knowledge to extract temporal relationships. Hu et al. use pre-trained language models to adaptively cluster temporal relationship features. Jiang et al. proposed a model that incorporates syntactic tree structures in classification. In order to improve the classification effect, Fan et al. proposed a relationship classification model that uses a dual-line neural network to extract syntactic information to complete the relationship classification task. Different ideas for handling document-level event causality have also been proposed, and Trong et al. designed a reinforcement learning mechanism to select key context from documents.

由于丰富的语言知识包含在句法依赖中，因此有很多基于句法依赖的方法被提出。Aldawsari等人使用依赖树中另一事件的祖先事件，同时融合话语和叙事特征，以获得更丰富的事件表示。Meng等人使用序列编码器对事件之间最短的依赖路径进行编码，以识别它们之间的关系。由于序列编码器不能很好地学习结构信息，因此有人提出了一些基于图的方法来完成这项任务。Wang等人提出了不同的约束条件来提取时间关系和子事件关系。Zhang等人提出使用图转换器来捕捉语法图中的时间知识。Mathur等人和Tran等人在使用句法依赖性的同时，还引入了修辞、话语和语义等其他知识，通过不同图中节点之间的交互来丰富事件的表示。Since rich language knowledge is contained in syntactic dependencies, many methods based on syntactic dependencies have been proposed. Aldawsari et al. use an ancestor event of another event in a dependency tree while fusing discourse and narrative features to obtain a richer event representation. Meng et al. use a sequence encoder to encode the shortest dependency paths between events to identify relationships between them. Since sequence encoders cannot learn structural information well, some graph-based methods have been proposed to accomplish this task. Wang et al. proposed different constraints to extract temporal relationships and sub-event relationships. Zhang et al. proposed using graph transformers to capture temporal knowledge in syntax graphs. While using syntactic dependencies, Mathur et al. and Tran et al. also introduced other knowledge such as rhetoric, discourse, and semantics to enrich the representation of events through interactions between nodes in different graphs.

这些关系抽取模型部分是通用的关系抽取方法，部分用于金融领域的关系抽取模型，并没有针对数据库运维领域设计的关系抽取方法。另外，大多数现有的基于依赖关系的方法在对事件之间的语义上下文进行建模时，通常平等地对待不同类型的依赖关系，导致事件关系提取的性能下降。Some of these relationship extraction models are general relationship extraction methods, and some are relationship extraction models used in the financial field. There are no relationship extraction methods designed for the database operation and maintenance field. In addition, most existing dependency-based methods usually treat different types of dependencies equally when modeling the semantic context between events, leading to performance degradation in event relationship extraction.

发明内容Contents of the invention

发明目的：针对以上缺点，本发明提供一种基于TGGAT和CNN的系统运行异常事件关系抽取方法，包括以下步骤：Purpose of the invention: In view of the above shortcomings, the present invention provides a method for extracting abnormal event relationships in system operation based on TGGAT and CNN, which includes the following steps:

(1)获取文本数据，并进行预处理；(1) Obtain text data and perform preprocessing;

(2)运用BERT模型处理预处理后的文本数据，生成特征向量矩阵；(2) Use the BERT model to process the preprocessed text data and generate a feature vector matrix;

(3)将特征向量矩阵输入多尺度卷积神经网络CNN，提取句子的局部特征；将特征向量矩阵输入类型引导图注意网络TGGAT，提取句子的全局特征；(3) Input the feature vector matrix into the multi-scale convolutional neural network CNN to extract the local features of the sentence; input the feature vector matrix into the type guided graph attention network TGGAT to extract the global features of the sentence;

(4)将局部特征和全局特征进行拼接，然后通过自注意力得到经过注意力值加权后的表征向量矩阵；(4) Splice local features and global features, and then obtain a representation vector matrix weighted by attention values through self-attention;

(5)使用softmax分类器对加权后的表征向量矩阵进行事件关系分类。(5) Use the softmax classifier to classify event relationships on the weighted representation vector matrix.

进一步的，所述步骤(1)中获取数据库系统告警日志文本数据进行预处理：在多个业务系统上的数据库中获取十多年报警日志记录的上万条文本数据。Further, in step (1), the database system alarm log text data is obtained for preprocessing: tens of thousands of text data of alarm log records for more than ten years are obtained from databases on multiple business systems.

进一步的，所述步骤(2)中生成的特征向量矩阵V的表达式如下：Further, the expression of the feature vector matrix V generated in step (2) is as follows:

其中，t表示序列长度，768表示单词向量的维度，cls表示句子的初始标签，seq表示句子的结尾标签，v表示单词向量矩阵。Among them, t represents the sequence length, 768 represents the dimension of the word vector, cls represents the initial label of the sentence, seq represents the end label of the sentence, and v represents the word vector matrix.

进一步的，所述步骤(3)中将特征向量矩阵输入多尺度卷积神经网络CNN，提取句子的局部特征具体包括：Further, in step (3), the feature vector matrix is input into the multi-scale convolutional neural network CNN, and the extraction of local features of the sentence specifically includes:

(3.11)对输入的特征向量矩阵进行一维卷积，形成局部特征向量f_i：(3.11) Perform one-dimensional convolution on the input eigenvector matrix to form local eigenvector f _i :

fi＝ReLu(k_iv_t:t+j-1+b)fi＝ReLu(k _i v _t:t+j-1 +b)

其中，v表示输入的单词向量矩阵，j表示卷积核k_i的窗口大小，b表示偏置值；Among them, v represents the input word vector matrix, j represents the window size of the convolution kernel k _i , and b represents the bias value;

(3.12)针对n个卷积核对输入的特征向量矩阵进行n个卷积运算，形成单词向量v_t上下文特征集合的高维向量F_t：(3.12) Perform n convolution operations on the input feature vector matrix for n convolution kernels to form a high-dimensional vector F _t of the word vector v _t context feature set:

F_t＝{f₁,f₂,...,f_n}F _t ={f ₁ ,f ₂ ,...,f _n }

(3.13)对高维向量F_t进行降低维数的最大池化运算，形成单词向量v_t的局部上下文特征M_t：(3.13) Perform a maximum pooling operation to reduce the dimensionality of the high-dimensional vector F _t to form the local context feature M _t of the word vector v _t :

M_t＝max(F_t),M _t =max(F _t ),

(3.14)对于输入长度为t的特征向量矩阵，使用卷积集K扫描整个文本，形成整个文本的局部特征集S：(3.14) For an input feature vector matrix of length t, use the convolution set K to scan the entire text to form a local feature set S of the entire text:

S＝{M₁,M₂,...,M_t}S＝{M ₁ ,M ₂ ,...,M _t }

进一步的，所述步骤(3)中通过Stanford CoreNLP工具获得输入特征向量矩阵中句子的句法依赖树，通过在句法依赖树中构建输入句子的句法结构，将句子转换为图结构进行表示，其中将单词表示为节点，将句法依赖树构造单词之间的依赖关系表示为边；Further, in step (3), the Stanford CoreNLP tool is used to obtain the syntactic dependency tree of the sentence in the input feature vector matrix, and by constructing the syntactic structure of the input sentence in the syntactic dependency tree, the sentence is converted into a graph structure for representation, where Words are represented as nodes, and the dependencies between words constructed from a syntactic dependency tree are represented as edges;

将特征向量矩阵输入类型引导图注意网络TGGAT，提取句子的全局特征具体包括：Input the feature vector matrix into the type guided graph attention network TGGAT, and extract the global features of the sentence, including:

(3.21)使用TGGAT通过类型引导对句子的句法依赖树进行建模，获得与句子中每个单词相关的句法知识，然后利用依赖路径来转移和聚合句子中的单词信息，得到每个节点和所有相邻单词的注意力值为：(3.21) Use TGGAT to model the syntactic dependency tree of the sentence through type guidance, obtain the syntactic knowledge related to each word in the sentence, and then use the dependency path to transfer and aggregate the word information in the sentence to obtain each node and all The attention value of adjacent words is:

其中，W表示可学习的权重矩阵，a表示单层前馈反馈网络，表示节点i的语义向量，/>表示节点j的语义向量，/>表示节点i和节点j的依赖路径；Among them, W represents the learnable weight matrix, a represents a single-layer feedforward feedback network, Represents the semantic vector of node i, /> Represents the semantic vector of node j, /> Represents the dependency path of node i and node j;

(3.22)使用SoftMax函数对注意力值进行归一化，得到注意力系数：(3.22) Use the SoftMax function to normalize the attention value and obtain the attention coefficient:

α_i,r,j＝softmax(d_i,r,t)α _i,r,j =softmax(d _i,r,t )

(3.23)通过注意力系数的加权求和，以及将原始节点i的初始信息相加，聚合计算得到节点i的新向量特征 (3.23) Through the weighted summation of attention coefficients and the addition of the initial information of the original node i, the new vector feature of the node i is obtained by aggregation calculation

其中，σ表示激活函数，W_r、W₀表示可学习的权重，N_i是节点i在句法依赖图中所有邻居节点j的集合，R是节点i在句法依赖图中所有边的集合；Among them, σ represents the activation function, W _r and W ₀ represent the learnable weights, _Ni is the set of all neighbor nodes j of node i in the syntactic dependency graph, and R is the set of all edges of node i in the syntactic dependence graph;

(3.24)句子的全局特征如下：(3.24) The global characteristics of the sentence are as follows:

进一步的，所述步骤(4)中将局部特征和全局特征进行拼接，然后通过自注意力得到经过注意力值加权后的表征向量矩阵，具体公式如下：Further, in step (4), local features and global features are spliced, and then a representation vector matrix weighted by attention values is obtained through self-attention. The specific formula is as follows:

HS＝contact(H,S)HS＝contact(H,S)

Q＝W^q·HSQ＝W ^q ·HS

K＝W^k·HSK＝ ^Wk ·HS

V＝W^v·HSV＝ ^Wv ·HS

A′＝softmax(K^T·Q)A′=softmax(K ^T ·Q)

O＝V·A′O＝V·A′

其中，contact表示拼接函数，Q表示查询向量，W^q表示查询矩阵，K表示键向量，W^k表示键矩阵，V表示值向量，W^v表示值矩阵，A′表示对查询向量和键向量计算内积并缩放归一化得到的权重，O表示表征向量矩阵。Among them, contact represents the splicing function, Q represents the query vector, W ^q represents the query matrix, K represents the key vector, W ^k represents the key matrix, V represents the value vector, W ^v represents the value matrix, and A′ represents the calculation of the query vector and key vector. Inner product and scale the weights obtained by normalization, O represents the representation vector matrix.

进一步的，所述步骤(5)中使用softmax分类器来预测文本中事件对的关系，并输出y以获得事件关系分类：Further, in step (5), a softmax classifier is used to predict the relationship between event pairs in the text, and y is output to obtain event relationship classification:

y＝relu(O·w+b)y＝relu(O·w+b)

其中，w和b是全连接层的参数和偏置条目。Among them, w and b are the parameters and bias entries of the fully connected layer.

本发明还采用一种基于TGGAT和CNN的系统运行异常事件关系抽取系统，包括：The present invention also adopts a system operation abnormal event relationship extraction system based on TGGAT and CNN, including:

获取模块，用于获取文本数据，并进行预处理；Obtain module, used to obtain text data and perform preprocessing;

处理模块，用于运用BERT模型处理预处理后的文本数据，生成特征向量矩阵；将特征向量矩阵输入多尺度卷积神经网络CNN，提取句子的局部特征；将特征向量矩阵输入类型引导图注意网络TGGAT，提取句子的全局特征；The processing module is used to process the preprocessed text data using the BERT model and generate a feature vector matrix; input the feature vector matrix into the multi-scale convolutional neural network CNN to extract the local features of the sentence; input the feature vector matrix into the type guidance graph attention network TGGAT, extracts global features of sentences;

拼接模块，用于将局部特征和全局特征进行拼接，然后通过自注意力得到经过注意力值加权后的表征向量矩阵；The splicing module is used to splice local features and global features, and then obtain a representation vector matrix weighted by attention values through self-attention;

分类模块，用于使用softmax分类器对加权后的表征向量矩阵进行事件关系分类。The classification module is used to classify event relationships using a softmax classifier on the weighted representation vector matrix.

有益效果：本发明相对于现有技术，其显著优点是通过类型引导图注意网络不仅可以捕获长程句法依赖关系，还可以捕获类型信息，这对于捕获事件的全局上下文语义信息至关重要。事件关系提取任务中的大多数句子属于长程难句，其中两个相关单词可能相距遥远，句法结构复杂，不同类型的依赖关系可能会有不同的贡献，使得仅使用句子的表面信息难以准确捕捉全局事件知识。因此，引入句法信息来整理句子的结构，并对句法中的类型相关知识进行建模，以进一步捕捉全局特征。Beneficial effects: Compared with the existing technology, the significant advantage of the present invention is that the type-guided graph attention network can not only capture long-range syntactic dependencies, but also capture type information, which is crucial for capturing global contextual semantic information of events. Most sentences in the event relationship extraction task are long-range difficult sentences, in which two related words may be far apart, the syntactic structure is complex, and different types of dependencies may have different contributions, making it difficult to accurately capture the overall situation using only the surface information of the sentence. Event knowledge. Therefore, syntactic information is introduced to organize the structure of sentences, and type-related knowledge in syntax is modeled to further capture global features.

附图说明Description of the drawings

图1为本发明数据库异常事件关系抽取流程示意图；Figure 1 is a schematic diagram of the database abnormal event relationship extraction process of the present invention;

图2为本发明事件关系抽取系统结构框图；Figure 2 is a structural block diagram of the event relationship extraction system of the present invention;

图3为本发明示例句子的句法依赖树示意图；Figure 3 is a schematic diagram of a syntactic dependency tree of an example sentence of the present invention;

图4为本发明中TGGAT的注意力机制的原理结构图。Figure 4 is a schematic structural diagram of the attention mechanism of TGGAT in the present invention.

具体实施方式Detailed ways

实施例1Example 1

如图1所示，本实施例中一种基于TGGAT和CNN的系统运行异常事件关系抽取方法，包括以下步骤：As shown in Figure 1, in this embodiment, a system operation abnormal event relationship extraction method based on TGGAT and CNN includes the following steps:

(1)在多个业务系统上的Oracle数据库中获取十多年报警日志记录的上万条文本数据。(1) Obtain tens of thousands of text data recorded in alarm logs for more than ten years from Oracle databases on multiple business systems.

(2)对步骤(1)获取的Oracle数据库日志文本数据进行数据清洗与预处理。数据清洗包括主要包括去除不需要的字段、格式不一致的文本；对清洗后的数据库系统日志文本数据进行预处理，将日告警日志划分为句子，并使用BERT+CRF模型提取触发词。例如“由于事务[等待]资源会导致队列资源[陷入]僵局”,其中“等待”和“陷入”是触发单词。然后，根据事件对的类型、原因、伴随、处置、携带和子事件类别，将其标记为五类，分别用数字0、1、2、3和4表示形成数据集。最后，将数据集按8:1:1的比例划分为训练集、验证集和测试集，用于Oracle数据库事件关系提取的模型训练。(2) Perform data cleaning and preprocessing on the Oracle database log text data obtained in step (1). Data cleaning mainly includes removing unnecessary fields and text with inconsistent formats; preprocessing the cleaned database system log text data, dividing daily alarm logs into sentences, and using the BERT+CRF model to extract trigger words. For example, "The queue resource [is] deadlocked due to the transaction [waiting] for the resource", where "waiting" and "trapped" are the trigger words. Then, according to the type, cause, accompanying, disposition, carrying and sub-event categories of the event pairs, they are marked into five categories, respectively represented by numbers 0, 1, 2, 3 and 4 to form a data set. Finally, the data set is divided into a training set, a validation set and a test set in a ratio of 8:1:1 for model training of Oracle database event relationship extraction.

(3)将预处理后的数据输入编码器。运用BERT模型处理预处理后的文本数据。将输入的告警文本语句{w₁，w₂，…，w_t}生成为特征向量矩阵V。特征向量矩阵V的表达式如下：(3) Input the preprocessed data into the encoder. Use the BERT model to process preprocessed text data. Generate the input alarm text sentence {w ₁ , w ₂ ,..., w _t } into a feature vector matrix V. The expression of the eigenvector matrix V is as follows:

其中，t是序列长度，768是单词向量的维度，cls是句子的初始标签，seq是句子的结尾标签，v表示单词向量矩阵，并且可以通过Bert模型将输入的每个句子转换为特征向量矩阵。Among them, t is the sequence length, 768 is the dimension of the word vector, cls is the initial label of the sentence, seq is the end label of the sentence, v represents the word vector matrix, and each input sentence can be converted into a feature vector matrix through the Bert model .

(4)将步骤(3)中生成的特征向量矩阵V输入到多尺度卷积神经网络CNN层，提取句子的局部特征。使用卷积核K对特征向量矩阵V进行卷积计算，对得到局部特征F_t设计池化层降低其维度，最终得到词向量的句法特征。在事件关系提取任务中，K＝{k₁,k₂,…,k_n}是卷积核的集合，n是卷积核数。具体包括：(4) Input the feature vector matrix V generated in step (3) to the multi-scale convolutional neural network CNN layer to extract the local features of the sentence. The convolution kernel K is used to perform convolution calculation on the feature vector matrix V, and a pooling layer is designed to reduce the dimension of the obtained local feature F _t , and finally the syntactic features of the word vector are obtained. In the event relationship extraction task, K = {k ₁ , k ₂ ,..., k _n } is a set of convolution kernels, and n is the number of convolution kernels. Specifically include:

(4.11)对输入的特征向量矩阵进行一维卷积，目标词向量可以形成局部特征向量f_i。具体公式如下：(4.11) Perform one-dimensional convolution on the input feature vector matrix, and the target word vector can form a local feature vector _fi . The specific formula is as follows:

f_i＝ReLu(k_iv_t:t+j-1+b)f _i =ReLu(k _i v _t:t+j-1 +b)

其中，v表示输入的单词向量矩阵，j表示卷积核k_i的窗口大小，b表示偏置值。Among them, v represents the input word vector matrix, j represents the window size of the convolution kernel k _i , and b represents the bias value.

(4.12)卷积核的整个集合K作用于窗口中心词向量，并形成该词向量v_t的不同局部特征，得到单词向量v_t上下文特征集合的高维向量F_t。n个卷积运算表示如下：(4.12) The entire set K of convolution kernels acts on the window center word vector and forms different local features of the word vector v _t , obtaining a high-dimensional vector F _t of the context feature set of the word vector v _t . n convolution operations are expressed as follows:

v_t＝F_t＝{f₁,f₂,...,f_n}v _t =F _t ={f ₁ ,f ₂ ,...,f _n }

其中，F_t是在n次卷积运算之后由目标词v_t形成的上下文特征的集合。Among them, F _t is the set of contextual features formed by the target word v _t after n convolution operations.

(4.13)由于F_t是一个多特征的高维向量，使用池化操作来降低其维数。作为显著特征的因果语义角色词可以使用最大池化运算作为特征保留，其表达式为：(4.13) Since F _t is a high-dimensional vector with multiple features, pooling operations are used to reduce its dimensionality. Causal semantic role words as salient features can be retained as features using the max pooling operation, and its expression is:

M_t＝max(F_t)M _t =max(F _t )

(4.14)每个局部特征向量f_i在最大池化操作之后保留的特征m被完全连接以固定其维度输出，并最终形成中心词向量v_t的局部上下文表示，其表达式为：(4.14) The features m retained after each local feature vector f _i after the max pooling operation are fully connected to fix its dimension output, and finally form the local context representation of the central word vector v _t , whose expression is:

V_t＝M_t＝{m₁,m₂,...,m_n}V _t =M _t ={m ₁ ,m ₂ ,...,m _n }

(4.15)对于输入长度为t的特征向量矩阵，使用卷积集K扫描整个文本，以形成具有以下表达式的整个文本的局部特征集S：(4.15) For an input feature vector matrix of length t, the entire text is scanned using the convolution set K to form a local feature set S of the entire text with the following expression:

S＝{V₁,V₂,...,V_t}＝{M₁,M₂,...,M_t}S={V ₁ , V ₂ ,..., V _t }={M ₁ , M ₂ ,..., M _t }

(5)将步骤(3)中生成的特征向量矩阵V输入到TGGAT模型中，将类型依赖性添加到图注意力机制计算中，聚合文本的关键信息。使用Stanford CoreNLP工具获得输入句子的句法依赖树。例如，“由于事务等待资源会导致队列资源陷入僵局”的句法依赖树如图3所示。引入句法信息来整理句子的结构，并对句法中的类型相关知识进行建模，以进一步捕捉全局特征。在BERT编码层之后，使用TGGAT对句法树进行编码，不仅提取单词的语义特征，还提取句法依赖特征，以增强模型对句子的理解，使关系提取更加准确。(5) Input the feature vector matrix V generated in step (3) into the TGGAT model, add type dependency to the calculation of the graph attention mechanism, and aggregate the key information of the text. Use the Stanford CoreNLP tool to obtain the syntactic dependency tree of the input sentence. For example, the syntactic dependency tree of "queue resources will be deadlocked due to transactions waiting for resources" is shown in Figure 3. Syntactic information is introduced to organize the structure of sentences, and type-related knowledge in syntax is modeled to further capture global features. After the BERT encoding layer, TGGAT is used to encode the syntactic tree to extract not only the semantic features of the words, but also the syntactic dependency features to enhance the model's understanding of the sentence and make the relationship extraction more accurate.

通过在句法依赖树中构建输入句子的句法结构，将单词表示为节点，将句法依赖树构造单词之间的依赖关系表示为边，可以将句子转换为图结构进行表示。也就是说，在图中，本文通过句法依赖树将文本中的类型依赖信息转换为相应的邻接矩阵A。矩阵中的每个元素a_i,j表示第i个词和第j个词之间是否存在依赖关系。如果两个单词之间存在依赖关系，则a_i,j＝1，否则a_i,j＝0，如图4所示。By constructing the syntactic structure of the input sentence in a syntactic dependency tree, representing words as nodes, and expressing the dependency relationships between words constructed in the syntactic dependency tree as edges, the sentence can be converted into a graph structure for representation. That is to say, in the figure, this article converts the type dependency information in the text into the corresponding adjacency matrix A through the syntactic dependency tree. Each element a _i,j in the matrix indicates whether there is a dependency relationship between the i-th word and the j-th word. If there is a dependency relationship between two words, then a _i,j =1, otherwise a _i,j =0, as shown in Figure 4.

(5.11)为了充分利用句子的句法信息，使用TGGAT通过类型引导对句子的句法依赖树进行建模，以获得与句子中每个单词相关的句法知识，然后有效地利用依赖路径来转移和聚合句子中的单词信息。从而增强了单词的特征表示。每个节点和所有相邻单词的注意力值为：(5.11) In order to fully utilize the syntactic information of the sentence, use TGGAT to model the syntactic dependency tree of the sentence through type guidance to obtain the syntactic knowledge related to each word in the sentence, and then effectively utilize the dependency path to transfer and aggregate the sentences word information in . This enhances the feature representation of words. The attention value of each node and all adjacent words is:

其中，W表示可学习的权重矩阵，a表示单层前馈反馈网络，表示节点i的语义向量，/>表示节点j的语义向量，/>表示节点i和节点j的依赖路径。Among them, W represents the learnable weight matrix, a represents a single-layer feedforward feedback network, Represents the semantic vector of node i, /> Represents the semantic vector of node j, /> Represents the dependency path of node i and node j.

(5.12)使用SoftMax函数对中心节点和所有相邻实体的注意力值进行归一化，归一化后的注意力权重是最终的注意力系数：(5.12) Use the SoftMax function to normalize the attention values of the central node and all adjacent entities. The normalized attention weight is the final attention coefficient:

(5.13)节点的新向量表示是计算出的注意力系数的加权求和，并将原始节点的初始信息相加。聚合计算得到节点i的新向量特征 (5.13) The new vector representation of a node is a weighted summation of the calculated attention coefficients, summed with the initial information of the original node. Aggregation calculation obtains the new vector feature of node i

其中，σ表示激活函数，W_r、W₀表示可学习的权重，N_i是节点i在句法依赖图中所有邻居节点j的集合，R是节点i在句法依赖图中所有边的集合。Among them, σ represents the activation function, W _r and W ₀ represent the learnable weights, _Ni is the set of all neighbor nodes j of node i in the syntactic dependency graph, and R is the set of all edges of node i in the syntactic dependence graph.

(5.14)为了进一步区分不同语境特征的重要性，本方案增加了多头注意机制。注意机制可以满足区分不同上下文的需要，帮助模型关注句子中的一些重要信息，而忽略不相关的上下文信息：(5.14) In order to further distinguish the importance of different contextual features, this solution adds a multi-head attention mechanism. The attention mechanism can meet the need to distinguish different contexts and help the model focus on some important information in the sentence while ignoring irrelevant contextual information:

(5.15)输入文本的全局特征矩阵如下：(5.15) The global feature matrix of the input text is as follows:

(6)将步骤(4)中得到的局部特征向量和步骤(5)中得到的全局特征向量进行拼接，使用自注意力来计算文本中每个单词与其他单词之间的注意力值，使单词之间相互关联，从而使每个单词具有不同的重要性，得到加权后的表征向量矩阵，具体公式如下：(6) Splice the local feature vector obtained in step (4) and the global feature vector obtained in step (5), and use self-attention to calculate the attention value between each word and other words in the text, so that Words are related to each other, so that each word has different importance, and a weighted representation vector matrix is obtained. The specific formula is as follows:

HS＝contact(H,S)HS＝contact(H,S)

Q＝W^q·HSQ＝W ^q ·HS

K＝W^k·HSK＝ ^Wk ·HS

V＝W^v·HSV＝ ^Wv ·HS

A′＝softmax(K^T·Q)A′=softmax(K ^T ·Q)

O＝V·A′O＝V·A′

(7)使用softmax分类器对加权后的表征向量矩阵进行事件关系分类，输出y以获得事件关系分类：(7) Use the softmax classifier to classify the event relationship on the weighted representation vector matrix, and output y to obtain the event relationship classification:

y＝relu(O·w+b)y＝relu(O·w+b)

实施例2Example 2

如图2所示，本实施例中一种基于类型引导图注意网络(type-guided graphattention network，TGGAT)和CNN的系统运行异常事件关系抽取系统，包括获取模块，用于获取文本数据，并进行预处理；处理模块，用于运用BERT模型处理预处理后的文本数据，生成特征向量矩阵；将特征向量矩阵输入多尺度卷积神经网络CNN，提取句子的局部特征；将特征向量矩阵输入类型引导图注意网络TGGAT，提取句子的全局特征；拼接模块，用于将局部特征和全局特征进行拼接，然后通过自注意力得到经过注意力值加权后的表征向量矩阵；分类模块，用于使用softmax分类器对加权后的表征向量矩阵进行事件关系分类。As shown in Figure 2, in this embodiment, a system based on type-guided graph attention network (TGGAT) and CNN runs an abnormal event relationship extraction system, including an acquisition module for acquiring text data and performing Preprocessing; processing module, used to use the BERT model to process preprocessed text data and generate a feature vector matrix; input the feature vector matrix into the multi-scale convolutional neural network CNN to extract the local features of the sentence; guide the input type of the feature vector matrix The graph attention network TGGAT extracts the global features of the sentence; the splicing module is used to splice local features and global features, and then obtains the representation vector matrix weighted by the attention value through self-attention; the classification module is used to classify using softmax The device classifies event relationships on the weighted representation vector matrix.

运用BERT模型对输入文本进行编码，得到语义特征向量矩阵。然后，为了提取全局信息，类型引导图注意网络以类型依赖知识为引导信号，对文本的句法依赖树进行编码。同时，CNN用于提取局部信息，以增强文本的代表性。Use the BERT model to encode the input text and obtain a semantic feature vector matrix. Then, in order to extract global information, the type-guided graph attention network uses type dependency knowledge as the guiding signal to encode the syntactic dependency tree of the text. At the same time, CNN is used to extract local information to enhance the representativeness of the text.

通过类型引导图注意网络不仅可以捕获长程句法依赖关系，还可以捕获类型信息，这对于捕获事件的全局上下文语义信息至关重要。事件关系提取任务中的大多数句子属于长程难句，其中两个相关单词可能相距遥远，句法结构复杂，使得仅使用句子的表面信息难以准确捕捉全局事件知识。因此，引入句法信息来整理句子的结构，并对句法中的类型相关知识进行建模，以进一步捕捉全局特征。The type-guided graph attention network not only captures long-range syntactic dependencies but also type information, which is crucial for capturing global contextual semantic information of events. Most sentences in the event relationship extraction task are long-range difficult sentences, in which two related words may be far apart and have complex syntactic structures, making it difficult to accurately capture global event knowledge using only the surface information of the sentence. Therefore, syntactic information is introduced to organize the structure of sentences, and type-related knowledge in syntax is modeled to further capture global features.

Claims

1. The system operation abnormal event relation extraction method based on the TGGATT and the CNN is characterized by comprising the following steps:

(1) Acquiring text data and preprocessing;

(2) Processing the preprocessed text data by using a BERT model to generate a feature vector matrix;

(3) Inputting the feature vector matrix into a multi-scale convolutional neural network CNN, and extracting the local features of sentences; inputting the feature vector matrix into a type guide graph annotation network TGGATT, and extracting the global features of sentences;

(4) Splicing the local features and the global features, and obtaining a characterization vector matrix weighted by the attention value through self-attention;

(5) The weighted token vector matrix is event relationship classified using a softmax classifier.

2. The system operation abnormal event relation extraction method according to claim 1, wherein in the step (1), database system alarm log text data is obtained for preprocessing: tens of thousands of pieces of text data of the alarm log records are acquired in databases on a plurality of business systems.

3. The system operation abnormal event relation extracting method according to claim 1, wherein the expression of the feature vector matrix V generated in the step (2) is as follows:

where t represents the sequence length, 768 represents the dimension of the word vector, cls represents the initial tag of the sentence, seq represents the ending tag of the sentence, and v represents the word vector matrix.

4. The system operation abnormal event relation extraction method according to claim 3, wherein in the step (3), the feature vector matrix is input into a multi-scale convolutional neural network CNN, and the extracting of the local feature of the sentence specifically comprises:

(3.11) one-dimensional convolution of the input eigenvector matrix to form a local eigenvector f _i ：

f _i ＝ReLu(k _i v _t:t+j-1 +b)

Where v represents the input word vector matrix and j represents the convolution kernel k _i B represents the offset value;

(3.12) performing n convolution operations on the feature vector matrix input by the n convolution checks to form a word vector v _t High-dimensional vector F of context feature set _t ：

F _t ＝{f ₁ ,f ₂ ,...,f _n }

(3.13) for high-dimensional vector F _t Performing maximum pooling operation with reduced dimension to form word vector v _t Local context feature M of (2) _t ：

M _t ＝max(F _t ),

(3.14) for a feature vector matrix of input length t, scanning the entire text using a convolution set K to form a local feature set S of the entire text:

S＝{M ₁ ,M ₂ ,...,M _t }。

5. the system operation abnormal event relation extraction method according to claim 4, wherein in the step (3), a syntax dependency tree of the sentence in the input feature vector matrix is obtained through a Stanford CoreNLP tool, the sentence is converted into a graph structure by constructing a syntax structure of the input sentence in the syntax dependency tree, wherein the word is represented as a node, and the dependency relation between the words constructed by the syntax dependency tree is represented as an edge;

inputting the feature vector matrix into a type guide graph meaning network TGGATT, wherein the extracting of the global features of sentences specifically comprises the following steps:

(3.21) modeling a syntactic dependency tree of the sentence through type guidance using TGGATT to obtain syntactic knowledge related to each word in the sentence, and then transferring and aggregating word information in the sentence using dependency paths to obtain the attention value of each node and all adjacent words as:

where W represents a learnable weight matrix, a represents a single layer feed-forward feedback network,semantic vector representing node i, ++>Semantic vector representing node j, +.>Representing the dependency paths of node i and node j;

(3.22) normalizing the attention value using a SoftMax function to obtain an attention coefficient:

α _i,r,j ＝softmax(d _i,r,t )

(3.23) obtaining new vector characteristics of the node i by aggregate calculation through weighted summation of attention coefficients and addition of initial information of the original node i

Wherein sigma represents an activation function, W _r 、W ₀ Representing a learnable weight, N _i Is the set of all neighbor nodes j of node i in the syntax dependency graph, R is the nodeThe point i is a set of all edges in the syntax dependency graph;

the global features of the sentence (3.24) are as follows:

。

6. the method for extracting abnormal event relation in system operation according to claim 5, wherein in the step (4), the local feature and the global feature are spliced, and then the characterization vector matrix weighted by the attention value is obtained through self-attention, and the specific formula is as follows:

HS＝contact(H,S)

Q＝W ^q ·HS

K＝W ^k ·HS

V＝W ^v ·HS

A′＝softmax(K ^T ·Q)

O＝V·A′

wherein contact represents a splicing function, Q represents a query vector, W ^q Represents a query matrix, K represents a key vector, W ^k Representing a key matrix, V representing a value vector, W ^v Representing a value matrix, A' representing the weight obtained by calculating the inner product of the query vector and the key vector and scaling the normalization, and O representing the characterization vector matrix.

7. The system operation anomaly event relationship extraction method of claim 6, wherein the step (5) uses a softmax classifier to predict the relationship of event pairs in text and outputs y to obtain event relationship classifications:

y＝relu(O·w+b)

where w and b are parameters and offset entries for the fully connected layer.

8. A system operation abnormal event relation extraction system based on TGGAT and CNN, comprising:

the acquisition module is used for acquiring text data and preprocessing the text data;

the processing module is used for processing the preprocessed text data by using the BERT model to generate a feature vector matrix; inputting the feature vector matrix into a multi-scale convolutional neural network CNN, and extracting the local features of sentences; inputting the feature vector matrix into a type guide graph annotation network TGGATT, and extracting the global features of sentences;

the splicing module is used for splicing the local features and the global features, and then obtaining a characterization vector matrix weighted by the attention value through self-attention;

and the classification module is used for classifying the event relationship of the weighted characterization vector matrix by using a softmax classifier.

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when the computer program is executed by the processor.

10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.