CN111274790B - Chapter-level event embedding method and device based on syntactic dependency graph


Info

Publication number
CN111274790B
Authority
CN
China
Prior art keywords
event
positive
syntactic dependency
weight
dependency graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010090488.7A
Other languages
Chinese (zh)
Other versions
CN111274790A (en)
Inventor
杨鹏
季冬
李幼平
纪雯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202010090488.7A priority Critical patent/CN111274790B/en
Publication of CN111274790A publication Critical patent/CN111274790A/en
Application granted granted Critical
Publication of CN111274790B publication Critical patent/CN111274790B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data; database structures therefor; file system structures therefor
    • G06F16/35: Clustering; Classification
    • G06F16/355: Class or cluster creation or modification

Abstract

The invention discloses a chapter-level event embedding method and device based on a syntactic dependency graph. First, syntactic dependency analysis is performed on each news text with a natural language processing tool, and a syntactic dependency graph is constructed. Next, the weight of each node word in the graph is computed with an iterative update algorithm. Then, positive and negative training samples are constructed from the graph by negative sampling. Two models, an event element weight prediction model and an event element relation prediction model, are then built and trained to obtain a low-dimensional dense vector representation of the chapter-level event. Finally, the event embedding vector is fed into a machine learning model for related tasks such as event classification and clustering. Because the vector representation is learned from the syntactic dependency graph in an unsupervised manner, the invention overcomes the high-dimensional sparsity and the loss of semantic and syntactic structure information that afflict event representations based on the conventional bag-of-words model, and thereby improves downstream event analysis tasks.

Description

Chapter-level event embedding method and device based on syntactic dependency graph
Technical Field
The invention belongs to the technical field of event embedding, and particularly relates to a chapter-level event embedding method and device based on a syntactic dependency graph.
Background
Events are an important knowledge unit through which humans perceive the world. Processing and analyzing information with the event as the basic unit benefits efficient and intelligent applications of that information, such as dialogue understanding and information recommendation. The Internet contains a large amount of text describing events, such as news articles, microblogs, court judgment documents, and electronic medical records.
Event features are critical to event analysis. In natural language processing, the bag-of-words model is the most common feature representation method and has the virtue of being simple and easy to implement. In chapter-level text event analysis, researchers often add special processing tailored to event characteristics, such as filtering nouns and verbs by part of speech, extracting keywords, and extracting named entities. However, the bag-of-words model ignores the semantic information of words, and its feature representations are high-dimensional and sparse; even two semantically similar words are treated as completely different words. Consequently, for two documents that describe related events in different ways, an event feature representation based on the bag-of-words model may fail to capture the semantic association between them.
Embedding techniques (also known as representation learning) aim to learn a low-dimensional continuous vector for each discrete object, such that relationships between objects can be characterized through these vectors. In natural language processing, low-dimensional vector representations can be learned for semantic units of different granularities, such as words, sentences, paragraphs, and documents. For word embedding, common methods include Word2vec, GloVe, fastText, ELMo, and BERT. A chapter-level event can generally be treated as a document, so document embedding techniques such as Doc2vec and XLNet can be applied; alternatively, starting from the bag-of-words model, each word id can be replaced by its word vector, followed by a pooling operation such as average pooling or max pooling.
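As a concrete illustration of the pooling alternative just described, the sketch below averages (or max-pools) pre-trained word vectors into a single document vector; the `embeddings` lookup table and the `pool_document` helper are illustrative assumptions, not components of the invention:

```python
import numpy as np

def pool_document(tokens, embeddings, mode="mean"):
    """Pool per-word vectors into one fixed-size document vector."""
    vectors = np.stack([embeddings[t] for t in tokens if t in embeddings])
    # average pooling keeps the mean of every dimension; max pooling keeps
    # the strongest activation per dimension
    return vectors.mean(axis=0) if mode == "mean" else vectors.max(axis=0)

# toy usage with random stand-in vectors
embeddings = {"earthquake": np.random.rand(300), "rescue": np.random.rand(300)}
doc_vec = pool_document(["earthquake", "rescue", "unknown"], embeddings)
```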
However, most existing embedding techniques in the NLP field train low-dimensional vector representations of words or documents on the language-model principle of predicting a target word from its modeled context, while ignoring explicit semantic structure information. In event analysis, the entities involved in an event and the relations among them are essential for analyzing and understanding different chapter-level events and their relationships. An event feature representation should therefore capture the semantics of the entity words and trigger words involved in the event, and also characterize the semantic relations among the entities, so as to support deeper analysis.
Disclosure of Invention
The invention aims to solve the problems of chapter-level event feature representation in the prior art, and to this end provides a chapter-level event embedding method and device based on a syntactic dependency graph.
The technical scheme is as follows: the chapter-level event embedding method based on the syntactic dependency graph comprises the following steps:
(1) Acquiring event document corpus, sequentially performing word segmentation, part-of-speech tagging, entity identification, reference resolution and syntactic dependency analysis on each document by using a natural language processing tool, and constructing a vocabulary;
(2) Constructing an initial syntactic dependency graph based on the syntactic dependency analysis result; giving initial weights to nodes in the graph, and iteratively updating weights of all the nodes to generate a final syntactic dependency graph;
(3) Based on the syntactic dependency graph, respectively constructing an event element weight positive and negative sample and an event element relation positive and negative sample by adopting a negative sampling method, wherein the event element weight sample comprises an event id, a target word and a target word weight, and the event element relation sample comprises the event id, a subject, an object, a predicate, the target word and a label;
(4) Constructing an event element weight prediction model based on a Skip-Gram framework, and training the feature representation of an event and elements thereof by utilizing positive and negative samples of the event element weight;
(5) Constructing an event element relation prediction model based on a CBOW architecture, and training the feature representation of an event and elements thereof by utilizing positive and negative samples of the event element relation;
(6) Generating a corresponding event embedded vector for a newly input text based on the trained event element weight prediction model and the event element relation prediction model;
(7) Based on the event embedded vector, the event embedded vector is used as input of a machine learning algorithm to carry out event classification or clustering.
Further, in step (2), the initial syntactic dependency graph is constructed from the syntactic analysis result, specifically:
each word serves as a node, and the dependency relations among words define directed edges between the corresponding nodes; identical words are merged into a single node, except for verbs, and all dependency relations of the merged words are retained; the words belonging to one named entity are merged into a single node, the dependency relations among those words are discarded, and all dependency relations between them and other words are retained. A minimal construction sketch follows.
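The following is a hedged sketch of these merging rules, assuming networkx and a pre-computed parse (token list, POS tags, a token-index-to-entity map, and dependency arcs); the helper names are illustrative only:

```python
import networkx as nx

def build_dependency_graph(tokens, pos_tags, entities, arcs):
    """tokens: words of the document; pos_tags: parallel POS list;
    entities: dict token_index -> entity name for tokens inside a named
    entity; arcs: (head_index, dependent_index, relation) triples."""
    def node_key(i):
        if i in entities:                    # merge entity words into one node
            return entities[i]
        if pos_tags[i].startswith("V"):      # keep each verb occurrence distinct
            return f"{tokens[i]}@{i}"
        return tokens[i]                     # merge identical non-verb words
    g = nx.DiGraph()
    for head, dep, rel in arcs:
        u, v = node_key(head), node_key(dep)
        if u != v:                           # drop dependencies inside an entity
            g.add_edge(u, v, rel=rel)
    return g
```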
Further, in step (2), an initial syntactic dependency graph is constructed based on the syntactic dependency analysis result; initial weights are assigned to the nodes in the graph, and the weights of all nodes are updated iteratively to generate the final syntactic dependency graph. The specific steps are:
(2-1) assign each node v_i in the syntactic dependency graph an initial weight W_0(v_i); let the maximum number of iterations be K;
(2-2) update the weight of each node v_i:

    W_{n+1}(v_i) = f(G, W_n, v_i)

where f is the weight update function, G is the constructed syntactic dependency graph, W_n is the node weight mapping after the n-th iteration, and W_{n+1}(v_i) is the weight of node v_i after the (n+1)-th iteration;
(2-3) if, after an update, the absolute difference |W_{n+1}(v_i) - W_n(v_i)| is smaller than a threshold a for all nodes of the graph, or the number of iterations reaches the maximum, take the final node weight W(v_i) = W_{n+1}(v_i); otherwise, return to step (2-2).
Further, in step (3), positive and negative samples of event element weights and of event element relations are constructed from the syntactic dependency graph by negative sampling. The specific steps are:
(3-1) construct positive and negative samples of event element weights, each sample having the format (event id, target word, target word weight): select all noun and verb nodes from the syntactic dependency graph according to the part-of-speech tagging result and normalize their weights, yielding the regression positive sample set; randomly select from the vocabulary L nouns and M verbs that do not appear in the positive set and assign them weight 0, yielding the regression negative sample set;
(3-2) construct positive and negative samples of event element relations, each sample having the format (event id, subject, predicate, object, target word, label): for each verb in the dependency graph, select its direct subject and object to form a triple (subject, predicate, object); take each element of the triple in turn as the target word, replace it with the designated mask string [MASK], and add the resulting sample with label 1 to the classification positive sample set; for each positive sample, randomly select from the vocabulary N words that share the target word's part of speech but differ from it, substitute them for the target word, and add the resulting N samples with label 0 to the classification negative sample set.
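A hedged sketch of this sampling scheme is given below (graph nodes are assumed to carry normalized `weight` and `pos` attributes; L, M, N and the `vocab` list of (word, POS) pairs are as in the text):

```python
import random

def weight_samples(event_id, graph, vocab, L=5, M=5):
    positives = [(event_id, w, d["weight"])
                 for w, d in graph.nodes(data=True)
                 if d["pos"] in ("NOUN", "VERB")]            # regression positives
    seen = {w for _, w, _ in positives}
    nouns = [w for w, p in vocab if p == "NOUN" and w not in seen]
    verbs = [w for w, p in vocab if p == "VERB" and w not in seen]
    negatives = [(event_id, w, 0.0)                          # weight-0 negatives
                 for w in random.sample(nouns, min(L, len(nouns)))
                 + random.sample(verbs, min(M, len(verbs)))]
    return positives + negatives

def relation_samples(event_id, triples, vocab, N=3):
    samples = []
    for triple in triples:                                   # (subject, predicate, object)
        for slot, pos_tag in enumerate(("NOUN", "VERB", "NOUN")):
            target = triple[slot]
            masked = list(triple)
            masked[slot] = "[MASK]"                          # positive sample, label 1
            samples.append((event_id, *masked, target, 1))
            pool = [w for w, p in vocab if p == pos_tag and w != target]
            for w in random.sample(pool, min(N, len(pool))): # N negatives, label 0
                samples.append((event_id, *masked, w, 0))
    return samples
```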
Further, in step (4), an event element weight prediction model based on the Skip-Gram architecture is constructed, and the feature representations of the event and its elements are trained with the positive and negative samples of event element weights. The specific steps are:
(4-1) for the event id, obtain a d-dimensional embedding vector v_e via a lookup table; for the target word, obtain a k-dimensional word vector v_t with a pre-trained word embedding tool;
(4-2) apply separate linear transformations to v_e and v_t to obtain h_e and h_t of the same dimension:

    h_e = W_e v_e
    h_t = W_t v_t

where W_e and W_t are trainable parameter matrices;
(4-3) compute the inner product of h_e and h_t as the predicted target word weight; with the true weight y, the mean squared error serves as the objective function, formalized as:

    u = h_e · h_t
    loss = (y - u)^2

(4-4) optimize the objective function with a gradient descent algorithm, updating the event embedding v_e, the parameter matrices W_e and W_t, and the target word vector v_t.
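A minimal PyTorch sketch of this weight-prediction model, assuming d=100, k=300 and a 256-dimensional common space as in the embodiment below; for brevity the target word vectors are fed as fixed inputs here, whereas the method also updates them:

```python
import torch
import torch.nn as nn

class EventWeightModel(nn.Module):
    def __init__(self, num_events, d=100, k=300, h=256):
        super().__init__()
        self.event_emb = nn.Embedding(num_events, d)   # lookup table for v_e
        self.W_e = nn.Linear(d, h, bias=False)         # h_e = W_e v_e
        self.W_t = nn.Linear(k, h, bias=False)         # h_t = W_t v_t

    def forward(self, event_ids, word_vecs):
        h_e = self.W_e(self.event_emb(event_ids))
        h_t = self.W_t(word_vecs)
        return (h_e * h_t).sum(dim=-1)                 # u = inner product

# one mean-squared-error training step on a toy sample
model = EventWeightModel(num_events=10000)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
u = model(torch.tensor([0]), torch.randn(1, 300))
loss = ((torch.tensor([0.7]) - u) ** 2).mean()         # y = 0.7 is a stand-in weight
loss.backward()
opt.step()
```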
Further, in step (5), an event element relation prediction model based on the CBOW architecture is constructed, and the feature representations of the event and its elements are trained with the positive and negative samples of event element relations. The specific steps are:
(5-1) for the event id, obtain a d-dimensional embedding vector v_e via a lookup table; for the subject, predicate and object words and the target word, obtain k-dimensional word vectors v_s, v_p, v_o and v_t with the open-source tool fastText;
(5-2) apply separate linear transformations to v_e, v_s, v_p, v_o and v_t to obtain h_e, h_s, h_p, h_o and h_t:

    h_e = W_e v_e, h_s = W_s v_s, h_p = W_p v_p, h_o = W_o v_o, h_t = W_t v_t

where W_e, W_s, W_p, W_o and W_t are trainable parameter matrices;
(5-3) sum and average h_e, h_s, h_p and h_o to obtain the context vector h_c; compute the inner product of h_c and h_t, and obtain the output probability through a sigmoid function; the cross-entropy loss serves as the objective function, formalized as:

    p_t = sigmoid(h_c · h_t)
    loss = -y log(p_t) - (1 - y) log(1 - p_t)

where p_t is the output probability of the target word and y is the true label of the sample;
(5-4) optimize the objective function with a gradient descent algorithm, updating the event feature representation v_e, the parameter matrices W_e, W_s, W_p, W_o and W_t, the subject, predicate and object word vectors, and the target word vector v_t.
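A companion PyTorch sketch of the relation-prediction model under the same assumed dimensions; the masked element among (subject, predicate, object) is assumed to carry the word vector of the literal [MASK] string:

```python
import torch
import torch.nn as nn

class EventRelationModel(nn.Module):
    def __init__(self, num_events, d=100, k=300, h=256):
        super().__init__()
        self.event_emb = nn.Embedding(num_events, d)
        self.W_e = nn.Linear(d, h, bias=False)
        self.W_s = nn.Linear(k, h, bias=False)
        self.W_p = nn.Linear(k, h, bias=False)
        self.W_o = nn.Linear(k, h, bias=False)
        self.W_t = nn.Linear(k, h, bias=False)

    def forward(self, event_ids, v_s, v_p, v_o, v_t):
        # context vector: average of the event part and the triple parts
        h_c = torch.stack([self.W_e(self.event_emb(event_ids)),
                           self.W_s(v_s), self.W_p(v_p),
                           self.W_o(v_o)]).mean(dim=0)
        h_t = self.W_t(v_t)
        return torch.sigmoid((h_c * h_t).sum(-1))      # p_t

# one cross-entropy training step on a toy positive sample
model = EventRelationModel(num_events=10000)
p_t = model(torch.tensor([0]), *(torch.randn(1, 300) for _ in range(4)))
loss = nn.functional.binary_cross_entropy(p_t, torch.tensor([1.0]))
loss.backward()
```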
Further, in step (6), based on the trained event element weight prediction model and event element relation prediction model, a corresponding event embedding vector is generated for a newly input text. The specific steps (with a parameter-freezing sketch after this list) are:
(6-1) construct, following step (3), the positive and negative samples of event element weights and of event element relations for the current text;
(6-2) following step (4), train the event element weight prediction model on the weight samples and update the event embedding vector; during training, all parameters other than the event embedding vector are fixed;
(6-3) following step (5), train the event element relation prediction model on the relation samples and update the event embedding vector; during training, all parameters other than the event embedding vector are fixed.
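The parameter-freezing sketch referenced above, assuming one of the PyTorch models sketched earlier with the new document's row already added to `event_emb`:

```python
# freeze everything except the event embedding table, then train as usual;
# only the event embedding vector receives gradient updates
for name, param in model.named_parameters():
    param.requires_grad = (name == "event_emb.weight")
optimizer = torch.optim.SGD([model.event_emb.weight], lr=0.01)
```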
Based on the same inventive concept, the chapter-level event embedding device based on the syntactic dependency graph comprises a memory, a processor, and a computer program stored on the memory and executable on the processor; when the computer program is loaded into the processor, it implements the above chapter-level event embedding method based on the syntactic dependency graph.
The beneficial effects are that: the invention uses embedding techniques to explicitly model the importance of entities, the importance of actions, and the relations among entities described in an event text. The low-dimensional event vector representation obtained by training captures event elements and their structural information at a deeper level, effectively resolving the high-dimensional sparsity and the loss of semantic and syntactic structure information that afflict event feature representations based on the conventional bag-of-words model, and thereby improving downstream tasks such as event classification and clustering.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the invention.
FIG. 2 is a syntactic dependency analysis diagram according to an embodiment of the present invention.
FIG. 3 is a final syntactic dependency diagram according to an embodiment of the present invention.
Fig. 4 is a view of an event element weight prediction model based on Skip-Gram architecture according to an embodiment of the present invention.
Fig. 5 is a CBOW architecture-based event element relation prediction model diagram according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent.
As shown in fig. 1, the chapter level event embedding method based on the syntactic dependency graph disclosed by the embodiment of the invention comprises the following steps:
(1) Acquiring event document corpus, sequentially performing word segmentation, part-of-speech tagging, entity identification, reference resolution and syntactic dependency analysis on each document by using a natural language processing tool, and constructing a vocabulary;
(2) Constructing an initial syntactic dependency graph based on the syntactic dependency analysis result; giving initial weights to nodes in the graph, and iteratively updating weights of all the nodes to generate a final syntactic dependency graph;
(3) Based on the syntactic dependency graph, respectively constructing an event element weight positive and negative sample and an event element relation positive and negative sample by adopting a negative sampling method, wherein the event element weight sample comprises an event id, a target word and a target word weight, and the event element relation sample comprises the event id, a subject, an object, a predicate, the target word and a label;
(4) Constructing an event element weight prediction model based on a Skip-Gram framework, and training the feature representation of an event and elements thereof by utilizing positive and negative samples of the event element weight;
(5) Constructing an event element relation prediction model based on a CBOW architecture, and training the feature representation of an event and elements thereof by utilizing positive and negative samples of the event element relation;
(6) Generating a corresponding event embedded vector for a newly input text based on the two types of prediction models of the element weights and the element relations of the event after training;
(7) Based on the event embedded vector, the event embedded vector is used as input of a general machine learning algorithm to carry out event classification and clustering.
In an alternative embodiment of the present invention, step (1) downloads the Sogou news dataset from the Internet, which contains news from 18 channels (domestic, international, sports, society, entertainment, etc.) covering the period June to July 2012. Part of the document information of the dataset is shown in Table 1.
Table 1 Two example news texts
(The table is reproduced as an image in the original publication.)
In an alternative embodiment of the present invention, step (1) uses the Stanford CoreNLP natural language processing toolkit to perform word segmentation, part-of-speech tagging, entity recognition, coreference resolution, and syntactic dependency analysis, and adds all nouns and verbs extracted from the dataset to a vocabulary; each entry in the vocabulary has the form (word, part-of-speech set), with the word serving as the key.
The analysis result obtained for document 1 through step (1) is shown in Fig. 2.
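One possible way to run this preprocessing from Python is sketched below with Stanza, the Python companion to Stanford CoreNLP (coreference resolution would additionally require the CoreNLP server; the example sentence is illustrative):

```python
import stanza

# stanza.download("zh")  # fetch the Chinese models on first use
nlp = stanza.Pipeline(lang="zh", processors="tokenize,pos,lemma,depparse,ner")
doc = nlp("国际原子能机构与伊朗就核问题达成协议。")
for sent in doc.sentences:
    for word in sent.words:
        # head == 0 marks the root of the dependency tree
        print(word.text, word.upos, word.head, word.deprel)
```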
In an optional embodiment of the present invention, step (2) constructs the initial syntactic dependency graph from the syntactic dependency analysis result, specifically:
each word serves as a node, and the dependency relations among words define directed edges between the corresponding nodes; identical words are merged into a single node, except for verbs, and all dependency relations of the merged words are retained; the words belonging to one named entity are merged into a single node, the dependency relations among those words are discarded, and all dependency relations between them and other words are retained.
In an optional embodiment of the present invention, in step (2), starting from the initial syntactic dependency graph, the node weights are iteratively updated with the PageRank algorithm to generate the final syntactic dependency graph. The specific steps are:
(2-1) assign each node v_i in the syntactic dependency graph an initial weight W_0(v_i) = 1.0; let the maximum number of iterations be K = 100;
(2-2) update the weight of each node in the graph with the update formula:

    W_{n+1}(v_i) = (1 - d) + d * Σ_{v_j ∈ In(v_i)} W_n(v_j) / |Out(v_j)|

where d is the damping coefficient, set to 0.85; In(v_i) is the set of all nodes pointing to node v_i, and Out(v_j) is the set of all nodes that v_j points to; in an undirected graph, In(v_i) = Out(v_i);
(2-3) if, after an update, the absolute difference |W_{n+1}(v_i) - W_n(v_i)| is smaller than the threshold a for all nodes of the graph, or the number of iterations reaches the maximum, take the final node weight W(v_i) = W_{n+1}(v_i); otherwise, return to step (2-2).
The final syntactic dependency graph of one of the documents is shown in Fig. 3. A runnable sketch of this weighting procedure follows.
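A minimal sketch of the PageRank-style iteration over the networkx graph built earlier, with d = 0.85, K = 100 and a convergence threshold a as in the text:

```python
def iterate_weights(g, d=0.85, K=100, a=1e-4):
    w = {v: 1.0 for v in g}                        # W_0(v_i) = 1.0
    for _ in range(K):
        new = {v: (1 - d) + d * sum(w[u] / max(g.out_degree(u), 1)
                                    for u in g.predecessors(v))
               for v in g}                         # In(v_i) via predecessors
        converged = all(abs(new[v] - w[v]) < a for v in g)
        w = new
        if converged:
            break
    return w
```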
In an optional embodiment of the present invention, in step (3), positive and negative samples of event element weights and of event element relations are constructed from the syntactic dependency graph by negative sampling. The specific steps are:
(3-1) construct positive and negative samples of event element weights, each sample having the format (event id, target word, target word weight): select all noun and verb nodes from the syntactic dependency graph according to the part-of-speech tagging result and normalize their weights, yielding the regression positive sample set; randomly select from the vocabulary L nouns and M verbs that do not appear in the positive set and assign them weight 0, yielding the regression negative sample set;
(3-2) construct positive and negative samples of event element relations, each sample having the format (event id, subject, predicate, object, target word, label): for each verb in the dependency graph, select its direct subject and object to form a triple (subject, predicate, object); take each element of the triple in turn as the target word, replace it with the designated mask string [MASK], and add the resulting sample with label 1 to the classification positive sample set; for each positive sample, randomly select from the vocabulary N words that share the target word's part of speech but differ from it, substitute them for the target word, and construct N negative samples with label 0.
In an optional embodiment of the present invention, in step (4), an event element weight prediction model is constructed with the Skip-Gram architecture to predict the weight w_i of an entity word or verb from the event feature representation. The model structure is shown in Fig. 4, and the training process is as follows:
(4-1) for the event id, obtain a d-dimensional (e.g., 100-dimensional) embedding vector v_e via a lookup table; for the target word, obtain a k-dimensional (e.g., 300-dimensional) word vector v_t with the open-source tool fastText;
(4-2) apply separate linear transformations to v_e and v_t to obtain h_e and h_t of the same dimension (e.g., 256 each):

    h_e = W_e v_e
    h_t = W_t v_t

where W_e and W_t are trainable parameter matrices;
(4-3) compute the inner product of h_e and h_t as the predicted target word weight; the mean squared error serves as the objective function, formalized as:

    u = h_e · h_t
    loss = (y - u)^2

(4-4) optimize the objective function with a gradient descent algorithm, updating the event embedding v_e, the parameter matrices W_e and W_t, and the target word vector v_t.
In an optional embodiment of the present invention, in step (5), an event element relation prediction model is constructed with the CBOW architecture: from the event feature representation, given two entities, their relation is predicted, or given one entity and its associated verb, the other entity is predicted, thereby learning the chapter-level event and its element vector representations. The model structure is shown in Fig. 5, and the training process is as follows:
(5-1) for the event id, obtain a d-dimensional (e.g., 100-dimensional) embedding vector v_e via a lookup table; for the subject, predicate and object words and the target word, obtain k-dimensional (e.g., 300-dimensional) word vectors v_s, v_p, v_o and v_t with the open-source tool fastText;
(5-2) apply separate linear transformations to v_e, v_s, v_p, v_o and v_t to obtain h_e, h_s, h_p, h_o and h_t, each of dimension 256 after transformation:

    h_e = W_e v_e, h_s = W_s v_s, h_p = W_p v_p, h_o = W_o v_o, h_t = W_t v_t

where W_e, W_s, W_p, W_o and W_t are trainable parameter matrices;
(5-3) sum and average h_e, h_s, h_p and h_o to obtain the context vector h_c; compute the inner product of h_c and h_t, and obtain the output probability through a sigmoid function; the cross-entropy loss serves as the objective function, formalized as:

    p_t = sigmoid(h_c · h_t)
    loss = -y log(p_t) - (1 - y) log(1 - p_t)

where p_t is the output probability of the target word and y is the true label of the sample;
(5-4) optimize the objective function with a stochastic gradient descent algorithm, updating the event feature representation v_e, the parameter matrices W_e, W_s, W_p, W_o and W_t, the subject, predicate and object word vectors v_s, v_p, v_o, and the target word vector v_t.
In an optional embodiment of the present invention, in step (6), based on the two trained models, 2000 contemporaneous sports news texts are selected from the news corpus, and a corresponding event embedding vector is generated for each news text. The specific steps are:
(6-1) construct, following step (3), the positive and negative samples of event element weights and of event element relations for the current text;
(6-2) following step (4), train the event element weight prediction model on the weight samples and update the event embedding vector; during training, all parameters other than the event embedding vector are fixed;
(6-3) following step (5), train the event element relation prediction model on the relation samples and update the event embedding vector; during training, all parameters other than the event embedding vector are fixed.
In an optional embodiment of the present invention, in step (7), the event embedding vectors are used as input to a Single-Pass clustering algorithm to cluster the 2000 news texts, and the clustering quality is compared with that of an event feature representation based on TF-IDF; cosine similarity is chosen as the distance measure, and the similarity threshold is set to 0.8.
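A minimal Single-Pass clustering sketch consistent with this setup (cosine similarity, threshold 0.8); for simplicity each cluster is represented by its first document's normalized vector rather than an updated centroid:

```python
import numpy as np

def single_pass(vectors, threshold=0.8):
    reps, labels = [], []
    for v in vectors:
        v = v / np.linalg.norm(v)                 # cosine via normalized dot
        sims = [float(v @ r) for r in reps]
        if sims and max(sims) >= threshold:
            labels.append(int(np.argmax(sims)))   # join the closest cluster
        else:
            reps.append(v)                        # open a new cluster
            labels.append(len(reps) - 1)
    return labels
```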
Based on the same inventive concept, the chapter-level event embedding device based on the syntactic dependency graph disclosed by the embodiment of the invention comprises a memory, a processor, and a computer program stored on the memory and executable on the processor; when the computer program is loaded into the processor, it implements the above chapter-level event embedding method based on the syntactic dependency graph.
Those of ordinary skill in the art will appreciate that the embodiments described herein are intended to help the reader understand the principles of the present invention, and that the scope of the invention is not limited to these specific statements and embodiments. Various other modifications and combinations can be made based on the teachings of the present disclosure without departing from its spirit, and such modifications and combinations remain within the scope of the present disclosure.

Claims (7)

1. The chapter-level event embedding method based on the syntactic dependency graph is characterized by comprising the following steps:
(1) Acquiring event document corpus, sequentially performing word segmentation, part-of-speech tagging, entity identification, reference resolution and syntactic dependency analysis on each document by using a natural language processing tool, and constructing a vocabulary;
(2) Constructing an initial syntactic dependency graph based on the syntactic dependency analysis result; giving initial weights to nodes in the graph, and iteratively updating weights of all the nodes to generate a final syntactic dependency graph;
(3) Based on the syntactic dependency graph, respectively constructing positive and negative samples of event element weights and positive and negative samples of event element relationships by adopting a negative sampling method; the event element weight sample comprises an event id, a target word and a target word weight, and the event element relation sample comprises an event id, a subject, an object, a predicate, a target word and a label;
(4) Constructing an event element weight prediction model based on a Skip-Gram framework, and training the feature representation of an event and elements thereof by utilizing positive and negative samples of the event element weight;
(5) Constructing an event element relation prediction model based on a CBOW architecture, and training the feature representation of an event and elements thereof by utilizing positive and negative samples of the event element relation;
(6) Generating a corresponding event embedded vector for a newly input text based on the trained event element weight prediction model and the event element relation prediction model; comprising the following steps: (6-1) generating positive and negative samples of the weight of the constructed event element and positive and negative samples of the relation of the event element of the current text according to the step (3); (6-2) training an event element weight prediction model based on the event element weight training samples according to the step (4), and updating an event embedded vector; in the training process, except for event embedding vectors, all other parameters are fixed; (6-3) training an event element relation prediction model based on the event element relation training sample according to the step (5), and updating an event embedded vector; in the training process, except for event embedding vectors, all other parameters are fixed;
(7) Based on the event embedded vector, the event embedded vector is used as input of a machine learning algorithm to carry out event classification or clustering.
2. The chapter level event embedding method based on syntactic dependency according to claim 1, wherein in the step (2), an initial syntactic dependency is constructed according to syntactic dependency analysis result, specifically:
each word serves as a node, and the dependency relations among words define directed edges between the corresponding nodes; identical words are merged into a single node, except for verbs, and all dependency relations of the merged words are retained; the words belonging to one named entity are merged into a single node, the dependency relations among those words are discarded, and all dependency relations between them and other words are retained.
3. The chapter-level event embedding method based on a syntactic dependency graph according to claim 2, wherein in step (2) initial weights are assigned to the nodes in the graph and the weights of all nodes in the initial syntactic dependency graph are updated iteratively to generate the final syntactic dependency graph, with the specific steps of:
(2-1) assigning each node v_i in the syntactic dependency graph an initial weight W_0(v_i), with maximum number of iterations K;
(2-2) updating the weight of each node v_i:

    W_{n+1}(v_i) = f(G, W_n, v_i)

where f is the weight update function, G is the constructed syntactic dependency graph, W_n is the node weight mapping after the n-th iteration, and W_{n+1}(v_i) is the weight of node v_i after the (n+1)-th iteration;
(2-3) if, after an update, the absolute difference |W_{n+1}(v_i) - W_n(v_i)| is smaller than a threshold a for all nodes of the graph, or the number of iterations reaches the maximum, taking the final node weight W(v_i) = W_{n+1}(v_i); otherwise, executing step (2-2).
4. The chapter level event embedding method based on a syntactic dependency graph according to claim 1, wherein in the step (3), based on the syntactic dependency graph, a negative sampling method is adopted to construct positive and negative samples of event element weights and positive and negative samples of event element relationships, respectively, and the specific steps are as follows:
(3-1) constructing positive and negative samples of event element weights: selecting all noun and verb nodes from the syntactic dependency graph according to the part-of-speech tagging result and normalizing their weights to form the regression positive sample set; randomly selecting from the vocabulary L nouns and M verbs that are not in the regression positive sample set and assigning them weight 0 to form the regression negative sample set;
(3-2) constructing positive and negative samples of event element relations: for each verb in the dependency graph, selecting its direct subject and object to form a triple (subject, predicate, object); taking each element of the triple as the target word, replacing it with the designated mask string to construct a positive sample with label 1 and adding it to the classification positive sample set; and, for each positive sample, randomly selecting from the vocabulary N words that share the target word's part of speech but differ from it to replace the target word, constructing N negative samples with label 0 and adding them to the classification negative sample set.
5. The chapter-level event embedding method based on a syntactic dependency graph according to claim 1, wherein in step (4) an event element weight prediction model based on the Skip-Gram architecture is constructed and the feature representations of the event and its elements are trained with the positive and negative samples of event element weights, with the specific steps of:
(4-1) for the event id, obtaining a d-dimensional embedding vector v_e via a lookup table; for the target word, obtaining a k-dimensional word vector v_t with a pre-trained word embedding tool;
(4-2) applying separate linear transformations to v_e and v_t to obtain h_e and h_t of the same dimension:

    h_e = W_e v_e
    h_t = W_t v_t

where W_e and W_t are trainable parameter matrices;
(4-3) computing the inner product of h_e and h_t as the predicted target word weight, the true target word weight being y; the mean squared error serves as the objective function, formalized as:

    u = h_e · h_t
    loss = (y - u)^2

(4-4) optimizing the objective function with a gradient descent algorithm, updating the event embedding v_e, the parameter matrices W_e and W_t, and the target word vector v_t.
6. The chapter-level event embedding method based on a syntactic dependency graph according to claim 1, wherein in step (5) an event element relation prediction model based on the CBOW architecture is constructed and the feature representations of the event and its elements are trained with the positive and negative samples of event element relations, with the specific steps of:
(5-1) for the event id, obtaining a d-dimensional embedding vector v_e via a lookup table; for the subject, predicate and object words and the target word, obtaining k-dimensional word vectors v_s, v_p, v_o and v_t with a pre-trained word embedding tool;
(5-2) applying separate linear transformations to v_e, v_s, v_p, v_o and v_t to obtain h_e, h_s, h_p, h_o and h_t:

    h_e = W_e v_e, h_s = W_s v_s, h_p = W_p v_p, h_o = W_o v_o, h_t = W_t v_t

where W_e, W_s, W_p, W_o and W_t are trainable parameter matrices;
(5-3) summing and averaging h_e, h_s, h_p and h_o to obtain the context vector h_c; computing the inner product of h_c and h_t and obtaining the output probability through a sigmoid function; the cross-entropy loss serves as the objective function, formalized as:

    p_t = sigmoid(h_c · h_t)
    loss = -y log(p_t) - (1 - y) log(1 - p_t)

where p_t is the output probability of the target word and y is the true label of the sample;
(5-4) optimizing the objective function with a gradient descent algorithm, updating the event embedding vector v_e, the parameter matrices W_e, W_s, W_p, W_o and W_t, the subject, predicate and object word vectors, and the target word vector v_t.
7. A chapter-level event embedding device based on a syntactic dependency graph, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the computer program, when loaded into the processor, implements the chapter-level event embedding method based on a syntactic dependency graph according to any one of claims 1-6.
CN202010090488.7A 2020-02-13 2020-02-13 Chapter-level event embedding method and device based on syntactic dependency graph Active CN111274790B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010090488.7A CN111274790B (en) 2020-02-13 2020-02-13 Chapter-level event embedding method and device based on syntactic dependency graph

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010090488.7A CN111274790B (en) 2020-02-13 2020-02-13 Chapter-level event embedding method and device based on syntactic dependency graph

Publications (2)

Publication Number Publication Date
CN111274790A CN111274790A (en) 2020-06-12
CN111274790B true CN111274790B (en) 2023-05-16

Family

ID=71000232

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010090488.7A Active CN111274790B (en) 2020-02-13 2020-02-13 Chapter-level event embedding method and device based on syntactic dependency graph

Country Status (1)

Country Link
CN (1) CN111274790B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783461A (en) * 2020-06-16 2020-10-16 北京工业大学 Named entity identification method based on syntactic dependency relationship
CN111611409B (en) * 2020-06-17 2023-06-02 中国人民解放军国防科技大学 Case analysis method integrated with scene knowledge and related equipment
CN111738008B (en) * 2020-07-20 2021-04-27 深圳赛安特技术服务有限公司 Entity identification method, device and equipment based on multilayer model and storage medium
CN112036439B (en) * 2020-07-30 2023-09-01 平安科技(深圳)有限公司 Dependency relationship classification method and related equipment
CN113312922B (en) * 2021-04-14 2023-10-24 中国电子科技集团公司第二十八研究所 Improved chapter-level triple information extraction method
CN113515624B (en) * 2021-04-28 2023-07-21 乐山师范学院 Text classification method for emergency news

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10073834B2 (en) * 2016-02-09 2018-09-11 International Business Machines Corporation Systems and methods for language feature generation over multi-layered word representation
CN105930318B (en) * 2016-04-11 2018-10-19 深圳大学 A kind of term vector training method and system
CN108628834B (en) * 2018-05-14 2022-04-15 国家计算机网络与信息安全管理中心 Word expression learning method based on syntactic dependency relationship
CN109815497B (en) * 2019-01-23 2023-04-18 四川易诚智讯科技有限公司 Character attribute extraction method based on syntactic dependency
CN110705612A (en) * 2019-09-18 2020-01-17 重庆邮电大学 Sentence similarity calculation method, storage medium and system with mixed multi-features

Also Published As

Publication number Publication date
CN111274790A (en) 2020-06-12

Similar Documents

Publication Publication Date Title
CN109472024B (en) Text classification method based on bidirectional circulation attention neural network
CN111274790B (en) Chapter-level event embedding method and device based on syntactic dependency graph
CN108363743B (en) Intelligent problem generation method and device and computer readable storage medium
CN108897857B (en) Chinese text subject sentence generating method facing field
CN106980683B (en) Blog text abstract generating method based on deep learning
CN110210037B (en) Syndrome-oriented medical field category detection method
CN110321563B (en) Text emotion analysis method based on hybrid supervision model
CN113239700A (en) Text semantic matching device, system, method and storage medium for improving BERT
CN112347268A (en) Text-enhanced knowledge graph joint representation learning method and device
CN111143576A (en) Event-oriented dynamic knowledge graph construction method and device
CN111177383B (en) Text entity relation automatic classification method integrating text grammar structure and semantic information
CN112232087B (en) Specific aspect emotion analysis method of multi-granularity attention model based on Transformer
CN111078833A (en) Text classification method based on neural network
CN112765952A (en) Conditional probability combined event extraction method under graph convolution attention mechanism
CN113168499A (en) Method for searching patent document
CN111581368A (en) Intelligent expert recommendation-oriented user image drawing method based on convolutional neural network
CN113196277A (en) System for retrieving natural language documents
Ren et al. Detecting the scope of negation and speculation in biomedical texts by using recursive neural network
CN113515632B (en) Text classification method based on graph path knowledge extraction
CN113343690B (en) Text readability automatic evaluation method and device
CN111858898A (en) Text processing method and device based on artificial intelligence and electronic equipment
CN111881256B (en) Text entity relation extraction method and device and computer readable storage medium equipment
CN111753088A (en) Method for processing natural language information
CN115858750A (en) Power grid technical standard intelligent question-answering method and system based on natural language processing
CN113220865B (en) Text similar vocabulary retrieval method, system, medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant