WO2024087963A1

WO2024087963A1 - Text processing method and apparatus, and electronic device and storage medium

Info

Publication number: WO2024087963A1
Application number: PCT/CN2023/120521
Authority: WO
Inventors: 程昊熠
Original assignee: 中移(苏州)软件技术有限公司; 中国移动通信集团有限公司
Priority date: 2022-10-26
Filing date: 2023-09-21
Publication date: 2024-05-02
Also published as: CN116821276A

Abstract

Provided in the embodiments of the present disclosure are a text processing method and apparatus, and an electronic device and a storage medium. The method comprises: acquiring event pair data comprised in a first text; processing the event pair data by using a dependency syntactic parsing tool, so as to obtain event short-sentence pair data corresponding to the event pair data; determining a first linear similarity and a first nonlinear similarity of the event pair data and a second linear similarity and a second nonlinear similarity of the event short-sentence pair data; and determining a confidence coefficient of the event to data on the basis of the event pair data, the event short-sentence pair data, the first linear similarity, the first nonlinear similarity, the second linear similarity and the second nonlinear similarity, wherein the confidence coefficient represents the degree of the event pair data having referentiality.

Description

Text processing method, device, electronic device and storage medium

CROSS-REFERENCE TO RELATED APPLICATIONS

This disclosure is based on the Chinese patent application with application number 202211320876.5 and application date October 26, 2022, and claims the priority of the Chinese patent application. The entire content of the Chinese patent application is hereby incorporated into this application by introduction.

Technical Field

The present disclosure relates to data processing technology, and in particular to a text processing method, device, electronic device and storage medium.

Background technique

Nowadays, with the rapid development of Internet technology, the interactive information generated by people on the Internet is increasing day by day, and they can get the information they want through the Internet anytime and anywhere. Although the Internet provides people with faster and more diverse information, it also generates a lot of junk information, which causes people to spend a lot of energy when looking for the information they need, or even return empty-handed. In the era of big data, how to process big data and filter out valuable information has become an important topic. Event extraction can help machines find valuable event information in texts, classify text content with the same semantics into one category, and thus resolve event co-references.

Event co-reference resolution is to determine whether event sentences with different description methods refer to the same event in real life, which mainly depends on the similarity between the two. The difficulty lies in how to accurately calculate the similarity value between two event sentences and how to improve the accuracy of similarity calculation. There is currently no effective solution to this problem.

Summary of the invention

In view of this, the main purpose of the present invention is to provide a text processing method, device, and electronic device. equipment and storage media.

To achieve the above objectives, the technical solution of the present disclosure is implemented as follows:

The present disclosure provides a text processing method, including:

Acquire event pair data included in the first text;

Using a dependency syntax analysis tool to process the event pair data to obtain event short sentence pair data corresponding to the event pair data;

Determine a first linear similarity and a first non-linear similarity of the event pair data and determine a second linear similarity and a second non-linear similarity of the event phrase pair data;

The confidence of the event pair data is determined based on the event pair data, the event phrase pair data, the first linear similarity, the first non-linear similarity, the second linear similarity and the second non-linear similarity; the confidence represents the degree to which the event pair data has a same-reference relationship.

In the above scheme, the process of using a dependency syntax analysis tool to process the event pair data to obtain event short sentence pair data corresponding to the event pair data includes:

Determining arguments and dependent words of trigger words in the event pair data using the dependency syntax analysis tool;

Determining a first distance between the argument and the trigger word, and determining a second distance between the dependency word and the trigger word;

Sorting the first distance and the second distance to obtain a sorting result;

Determine two arguments or trigger words corresponding to the maximum distance in the sorting result, and use the two arguments or trigger words corresponding to the maximum distance as the start word and the end word of the event short sentence pair data;

The event pair data is intercepted based on the start word and the end word to obtain the event short sentence pair data.

In the above solution, determining the confidence of the event pair data in the first text based on the event pair data, the event phrase pair data, the first linear similarity, the first non-linear similarity, the second linear similarity and the second non-linear similarity includes:

The first similarity is determined based on the event pair data, the event phrase pair data, the first linear similarity, the first non-linear similarity, the second linear similarity, and the second non-linear similarity. A confidence vector of the event in the text to the data;

The confidence vector is processed based on a fully connected classifier to obtain the confidence of the event on the data.

In the above solution, the method further includes:

A pre-trained model (Bidirectional Encoder Representation from Transformers, BERT) is used to predict the event pair data to obtain the word vector pairs corresponding to the event pair data.

In the above scheme, the event pair data includes a plurality of word pair data; the method further includes:

Acquire a first information pair and a second information pair of a plurality of word pair data in the event pair data; the first information pair represents a part-of-speech information pair of the word pair data; the second information pair represents a position information pair of the word pair data;

Based on the word vector pair, the event pair data, the first information pair and the second information pair, a first event vector pair corresponding to the event pair data is determined.

In the above solution, the method further includes:

Using a Bi-directional Long Short-Term Memory network (Bi-LSTM) to extract the first event vector pair, and obtain a global information pair corresponding to the first event vector pair;

Using a convolutional neural network (CNN) to extract the first event vector pair to obtain a local information pair corresponding to the first event vector pair;

fusing the global information pair and the local information pair to obtain a fused vector pair corresponding to the first event vector pair;

Perform a first global maximum pooling layer process on the fused vector pair to obtain a second event vector pair corresponding to the first event vector pair.

In the above solution, determining the first linear similarity and the first non-linear similarity of the event pair data includes:

Determine a first linear similarity and a first nonlinear similarity of the event pair data according to the second event vector pair;

The first linear similarity includes a first cosine distance; and the first non-linear similarity includes at least one of a first bilinear distance and a first single-layer network distance.

In the above solution, the method further includes:

Based on the word vector pair and the event short sentence pair data, determining a first event short sentence vector pair corresponding to the event short sentence pair data;

The first event short sentence vector pair is processed by a second global maximum pooling layer to obtain a second event short sentence vector pair corresponding to the first event short sentence vector pair.

In the above solution, determining the second linear similarity and the second non-linear similarity of the event phrase pair data includes:

Determine a second linear similarity and a second non-linear similarity of the event phrase pair data according to the second event phrase vector pair;

The second linear similarity includes a second cosine distance; and the second non-linear similarity includes at least one of a second bilinear distance and a second single-layer network distance.

The present disclosure provides a text processing device, including:

A first acquisition module, configured to acquire event pair data included in the first text;

A first processing module is configured to process the event pair data using a dependency syntax analysis tool to obtain event short sentence pair data corresponding to the event pair data;

A first determination module is configured to determine a first linear similarity and a first non-linear similarity of the event pair data and to determine a second linear similarity and a second non-linear similarity of the event phrase pair data;

The second determination module is configured to determine the confidence of the event pair data based on the event pair data, the event phrase pair data, the first linear similarity, the first non-linear similarity, the second linear similarity and the second non-linear similarity; the confidence represents the degree to which the event pair data has a synonymous relationship.

An embodiment of the present disclosure provides a text processing device, including a memory and a processor, wherein the memory stores a computer program that can be run on the processor, and when the processor executes the program, any of the above-mentioned methods is implemented.

An embodiment of the present disclosure provides a storage medium, wherein the storage medium stores executable instructions. When the executable instructions are executed by a processor, any of the above methods is implemented.

The disclosed embodiment provides a text processing method, device, electronic device and storage medium. The method includes: obtaining event pair data included in a first text; processing the event pair data using a dependency syntax analysis tool to obtain event short sentence pair data corresponding to the event pair data; determining the first linear similarity and the first non-linear similarity of the event pair data and determining the second linear similarity and the second non-linear similarity of the event short sentence pair data; determining the confidence of the event pair data based on the event pair data, the event short sentence pair data, the first linear similarity, the first non-linear similarity, the second linear similarity and the second non-linear similarity; the confidence characterizes the degree to which the event pair data has a co-referential relationship. By combining the first linear similarity and the first non-linear similarity of the event pair data and the second linear similarity and the second non-linear similarity of the event short sentence pair data to determine the confidence of the event pair data, it is possible to make up for the defect of only considering the event pair data as a whole when the confidence is determined by linear similarity.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG1 is a schematic diagram of a flow chart of a text processing method according to an embodiment of the present disclosure;

FIG2 is a schematic diagram of the technical process of the BNN system of the text processing method according to an embodiment of the present disclosure;

FIG3 is a schematic diagram of the structure of a text processing device according to an embodiment of the present disclosure;

FIG. 4 is a schematic diagram of a hardware entity structure of a text processing device according to an embodiment of the present disclosure.

Detailed ways

To make the purpose, technical solutions and advantages of the embodiments of the present disclosure clearer, the specific technical solutions disclosed will be further described in detail below in conjunction with the drawings in the embodiments of the present disclosure. The following embodiments are used to illustrate the present disclosure, but are not used to limit the scope of the present disclosure.

In the related art, there are two main methods for event synonym resolution. One is to use a probability or graph-based machine learning method, which requires a lot of feature engineering to manually extract features from event sentences, and then combine machine learning methods to identify synonymous relationships. The other is to use mainstream neural network methods to design a similarity model to calculate the similarity between two event sentences, thereby identifying synonymous relationships.

In machine learning methods, relevant scholars have introduced a series of Whether the event attributes, such as trigger words, tense, polarity, etc. are consistent. Relevant scholar 2 designed a maximum entropy classifier and introduced more than 100 features for experiments. Relevant scholar 3 proposed a joint reasoning model based on Markov chain to correct the erroneous results produced by the classifier. Relevant scholar 4 designed a graph-based model classifier to merge events into an undirected graph, and then remove non-co-referential events from the graph. Relevant scholar 5 first used a clustering algorithm to generate an undirected graph of event co-referential relationships, and then used the optimal cutting algorithm to optimize the graph, deleting the erroneous edges from the undirected graph, thereby optimizing the event co-referential resolution. Teng Jiayue used the maximum entropy classifier model combined with a large number of features extracted by tools for research.

In the neural network method, the sixth scholar first used a convolutional pooling network to extract feature information of the event sentence and the trigger word context, and then introduced event pair matching features to assist in determining whether there is a synonymous relationship between event pairs. The seventh scholar first used a fully connected layer to perform a dimensionality change operation on the two event sentences, and then calculated the cosine distance and Euclidean distance of the two event sentences, and finally used an activation function to derive a confidence level to determine the synonymous relationship. Fang Jie mainly used the attention mechanism to extract important information from event sentences, and combined the linear similarity between event sentences with event pair matching features to determine whether there is a synonymous relationship between event pairs.

The above-mentioned related technologies have the following disadvantages:

First, probability or graph-based machine learning methods require a lot of feature engineering to extract features, which has high labor costs, low accuracy, and poor portability.

Second, the method proposed by the six relevant scholars uses a convolutional neural network to extract the contextual feature information of words in event sentences. It only considers the local information between words in the event sentences, does not consider the relationship between a pair of event sentences, and does not deeply extract the features in the event sentences, resulting in low performance of event co-reference resolution.

Third, the method proposed by the relevant scholar No. 7 simply performs dimensionality transformation on the event sentences, and does not extract features in depth, which results in the calculated cosine distance and Euclidean distance between the event sentences being not very accurate, affecting the final classification performance.

Fourth, the input information of the neural network method is not rich enough and has certain errors. It basically only combines the event sentence and the relative distance between each word and the trigger word. In addition, three words before and after the trigger word are taken to form an event short sentence. However, the event short sentences extracted using fixed rules will have certain errors. This affects the discriminative performance of the model.

In order to solve the above shortcomings, this application proposes a text processing method, device, electronic device and storage medium. It aims to pre-train accurate word vectors to represent event sentences, deeply extract useful feature information from event sentences with high dimensions, complex semantic information and complex sentence structures, and assist in distinguishing the same reference relationship by calculating the similarity between event short sentences.

The disclosed embodiment proposes a text processing method, the functions implemented by the method can be implemented by calling program codes by a processor in a text processing device. Of course, the program codes can be stored in a computer storage medium. It can be seen that the computing device at least includes a processor and a storage medium.

FIG1 is a schematic diagram of a text processing method according to an embodiment of the present disclosure. As shown in FIG1 , the method includes:

Step 101: Acquire event pair data included in a first text;

Step 102: using a dependency syntax analysis tool to process the event pair data to obtain event short sentence pair data corresponding to the event pair data;

Step 103: determining a first linear similarity and a first non-linear similarity of the event pair data and determining a second linear similarity and a second non-linear similarity of the event phrase pair data;

Step 104: Determine the confidence of the event pair data based on the event pair data, the event phrase pair data, the first linear similarity, the first non-linear similarity, the second linear similarity and the second non-linear similarity; the confidence represents the degree to which the event pair data has a coreference relationship.

In step 101, the text processing method can be determined according to actual conditions and is not limited here. As an example, the text processing method can be an event homonym resolution method based on BERT pre-training.

The first text may be determined according to actual conditions, and is not limited here. As an example, the first text may be an event sentence. The obtaining of the first text may be to determine the event sentence based on the corpus in a preset corpus. The preset corpus may be determined according to actual conditions, and is not limited here. As an example, the preset corpus may be the corpus of the International Knowledge Base Population Contest and the 2005 Automatic Content Extraction Contest. Extraction, ACE) corpus.

The step of acquiring the event pair data included in the first text may include acquiring the first text; and preprocessing the first text to obtain the event pair data included in the first text.

In some embodiments, two event sentences in the first text that need to be judged as having a same-referential relationship are used as event pair data included in the first text.

In some embodiments, the preprocessing of the first text may include: performing data cleaning on the first text in combination with a regular expression and a stop word list; filtering special symbols and stop words in the first text; and restoring words in the first text to their original forms.

In step 102, the event pair data is processed using a dependency syntax analysis tool to obtain event short sentence pair data corresponding to the event pair data. This can be done by respectively processing each of the two event sentences in the event pair data using a dependency syntax analysis tool to obtain two event short sentences corresponding to the two event sentences, and using the two event short sentences as the event short sentence pair data corresponding to the event pair data.

In step 103, the first linear similarity can be determined according to actual conditions, which is not limited here. As an example, the first linear similarity can be the first cosine distance of the event pair data; the second linear similarity can be determined according to actual conditions, which is not limited here. As an example, the second linear similarity can be the second cosine distance of the event phrase pair data.

The first nonlinear similarity can be determined according to actual conditions and is not limited here. As an example, the first nonlinear similarity can be the first bilinear distance and the first single-layer network distance of the event pair data; the second nonlinear similarity can be determined according to actual conditions and is not limited here. As an example, the second nonlinear similarity can be the second bilinear distance and the second single-layer network distance of the event phrase pair data.

In step 104, after determining the confidence of the event pair data, the method further includes: judging whether the confidence is greater than a preset threshold; if the confidence is greater than the preset threshold, determining that the event pair data has a same-reference relationship; wherein the existence of the same-reference relationship between the event pair data indicates that the degree of the same-reference relationship between the event pair data is high; if the confidence is less than or equal to the preset threshold, determining that the event pair data does not have a same-reference relationship; wherein the event pair data has a same-reference relationship between the event pair data and the same-reference relationship between the event pair data. The absence of a common reference relationship for the data indicates that the degree to which the event has a common reference relationship for the data is low. The preset threshold can be determined according to actual conditions and is not limited here. As an example, the confidence level can be a value between 0 and 1, and the preset threshold can be 0.5.

The disclosed embodiment provides a text processing method, which obtains event pair data included in a first text; uses a dependency syntax analysis tool to process the event pair data to obtain event short sentence pair data corresponding to the event pair data; determines the first linear similarity and the first non-linear similarity of the event pair data and determines the second linear similarity and the second non-linear similarity of the event short sentence pair data; determines the confidence of the event pair data based on the event pair data, the event short sentence pair data, the first linear similarity, the first non-linear similarity, the second linear similarity and the second non-linear similarity; the confidence characterizes the degree to which the event pair data has a co-referential relationship. By combining the first linear similarity and the first non-linear similarity of the event pair data and the second linear similarity and the second non-linear similarity of the event short sentence pair data to determine the confidence of the event pair data, it is possible to make up for the defect that the linear similarity determines the confidence by only considering the event pair data as a whole.

This embodiment proposes a method combining linear similarity and non-linear similarity, and uses non-linear similarity to calculate the similarity between words to make up for the shortcoming that linear similarity can only calculate the similarity between sentences of the entire event.

In an optional embodiment of the present disclosure, the process of using a dependency syntax analysis tool to process the event pair data to obtain event short sentence pair data corresponding to the event pair data includes:

Sorting the first distance and the second distance to obtain a sorting result;

In this embodiment, the dependency syntax analysis tool can be determined according to actual conditions, and is not limited here. As an example, the dependency syntax analysis tool can be a Stanford natural language processing tool.

The trigger word can be determined according to actual conditions and is not limited here. As an example, the trigger word can be a word in the event sentence that starts a process or action process.

The argument can be determined according to actual conditions and is not limited here. As an example, the argument can be the agent, the patient, the time and place of the event in the event sentence, etc.

The dependent words can be determined according to actual conditions and are not limited here. As an example, the dependent words can be the subject and object in the event sentence.

The method of sorting the first distance and the second distance can be determined according to actual conditions and is not limited here. As an example, the first distance and the second distance are arranged in order from small to large to obtain the sorting result.

This embodiment uses a dependency word analysis tool to obtain the dependent words of the trigger word, and then uses the trigger word, the dependent words, and the arguments together to determine the starting and ending positions of the event short sentence in the sentence, thereby extracting the event short sentence.

In an optional embodiment of the present disclosure, determining the confidence of the event pair data in the first text based on the event pair data, the event short sentence pair data, the first linear similarity, the first non-linear similarity, the second linear similarity, and the second non-linear similarity includes:

Determine a confidence vector of the event pair data in the first text based on the event pair data, the event phrase pair data, the first linear similarity, the first non-linear similarity, the second linear similarity, and the second non-linear similarity;

In this embodiment, the confidence vector is processed based on the fully connected classifier to obtain the confidence of the event on the data. The confidence vector is processed using a slope (Rectified Linear Unit, relu) activation function in the fully connected classifier to obtain a processed confidence vector; the processed confidence vector is processed by a logistic regression model (sigmoid) activation function to obtain the confidence of the event on the data.

In an optional embodiment of the present disclosure, the method further includes:

The pre-trained model BERT is used to predict the event pair data to obtain the word vector pairs corresponding to the event pair data.

In this embodiment, the pre-trained model BERT is used to predict the event pair data to obtain the word vector pair corresponding to the event pair data. The method can be: the BERT is used to predict the masked words or sentences by using characters to respectively mask the words of each of the two event sentences in the event pair data or the sentences in the text where each of the two event sentences is located, to obtain two word vectors corresponding to the two event sentences in the event pair data, and the two word vectors are used as the word vector pair corresponding to the event pair data.

This embodiment no longer uses fixed word vectors, but instead uses the BERT pre-trained model for training to obtain accurate word vector expressions.

In an optional embodiment of the present disclosure, the event pair data includes a plurality of word pair data; the method further includes:

In this embodiment, the word data can be determined according to actual conditions, and no limitation is made here. As an example, the word data can be a word in the event sentence.

The event pair data may include a plurality of word pair data, and each of the two event sentences in the event pair data may include a plurality of word data respectively, and the plurality of word data respectively included in each of the two event sentences may be used as the plurality of word pair data included in the event pair data.

The method of obtaining the first information pair and the second information pair of the plurality of word pair data in the event pair data may be as follows: respectively obtaining the first information and the second information of the plurality of word data of each event sentence in the two event sentences in the event pair data, and using the first information and the second information of the plurality of word data of each event sentence in the two event sentences as the first information pair and the second information of the plurality of word pair data in the event pair data. Interest is right.

The method of obtaining the first information pair and the second information pair of the multiple word pair data in the event pair data may be: using the Stanford natural language processing tool to determine the first information pair of the multiple word pair data in the event pair data; and determining the second information pair of the multiple word pair data in the event pair data based on the relative distance between each word pair data in the multiple word pair data and the trigger word of the event pair data.

The determination of the first event vector pair corresponding to the event pair data based on the word vector pair, the event pair data, the first information pair and the second information pair may be as follows: encoding the event pair data based on the word vector pair to obtain a first dimension vector pair; encoding the first information pair based on the word vector pair to obtain a second dimension vector pair; determining a third dimension vector pair based on the second information pair; and determining the first event vector pair based on the first dimension vector pair, the second dimension vector pair and the third dimension vector pair. The first dimension vector pair may be an event vector pair of the first dimension; the second dimension vector pair may be a part-of-speech vector pair of the second dimension; the third dimension vector pair may be a position vector pair of the third dimension; and the first event vector pair may be an event vector pair of the fourth dimension.

This embodiment concatenates the event sentence, the position information of each word in the event sentence, and the part-of-speech information of each word, thereby enriching the feature information of the input data.

Using a long short-term memory network Bi-LSTM to extract the first event vector pair, to obtain a global information pair corresponding to the first event vector pair;

Using a convolutional neural network (CNN) to extract the first event vector pair, to obtain a local information pair corresponding to the first event vector pair;

The fused vector pair is processed by a first global maximum pooling layer to obtain a second event vector pair corresponding to the first event vector pair.

In this embodiment, the long short-term memory network Bi-LSTM is used to process the first event vector pair. Row extraction, to obtain the global information pair corresponding to the first event vector pair can be, using the Bi-LSTM to transmit the word information of each event sentence in the two event sentences of the first event vector in a front-to-back order, and then in a back-to-front order; obtain the global information of each event sentence in the two event sentences of the first event vector pair; use the global information of each event sentence in the two event sentences as the global information pair corresponding to the first event vector pair. Among them, the number of neurons of the Bi-LSTM can be determined according to actual conditions, and is not limited here. As an example, the number of neurons of the Bi-LSTM can be 150. The global information pair can be determined according to actual conditions, and is not limited here. As an example, the global information pair can be a global vector pair. The global information pair can be a global vector pair of the fifth dimension.

The use of the convolutional neural network CNN to extract the first event vector pair to obtain the local information pair corresponding to the first event vector pair can be to use the CNN to extract the local information of each of the two event sentences of the first event vector; and use the local information of each of the two event sentences as the local information pair corresponding to the first event vector pair. The number of convolution kernels and the convolution kernel window size of the CNN can be determined according to actual conditions and are not limited here. As an example, the number of convolution kernels of the CNN is set to 300 and the convolution kernel window size is 2.

When the convolution kernel window size is 2, the local information between two adjacent words in each of the two event sentences of the first event vector is obtained by using the CNN; the local information between two adjacent words in each of the two event sentences is used as the local information pair corresponding to the first event vector pair. The local information pair can be determined according to actual conditions and is not limited here. As an example, the local information pair can be a local vector pair. The local information pair can be a local vector pair of the sixth dimension.

The fusing of the global information pair and the local information pair to obtain the fused vector pair corresponding to the first event vector pair may be performed by bitwise addition of the global information pair and the local information pair to obtain the fused vector pair corresponding to the first event vector pair. The fused vector pair may be a fused vector pair of the seventh dimension.

The second event vector pair can be determined according to actual conditions and is not limited here. As an example, the second event vector pair can be an event vector pair of the eighth dimension.

In an optional embodiment of the present disclosure, determining the first linear similarity and the first non-linear similarity of the event pair data includes:

In this embodiment, determining the first linear similarity and the first non-linear similarity of the event pair data according to the second event vector pair may be determining the first linear similarity and the first non-linear similarity of the event pair data according to two second event vectors in the second event vector pair.

In this embodiment, the determination of the first event short sentence vector pair corresponding to the event short sentence pair data based on the word vector pair and the event short sentence pair data can be performed by encoding the event short sentence pair data based on the word vector pair to obtain the first event short sentence vector pair corresponding to the event short sentence pair data; the first event short sentence vector pair can be an event short sentence vector pair of the ninth dimension.

The second event phrase vector pair can be determined according to actual conditions and is not limited here. As an example, the second event phrase vector pair can be an event phrase vector pair of the tenth dimension.

In an optional embodiment of the present disclosure, determining the second linear similarity and the second non-linear similarity of the event phrase pair data includes:

In this embodiment, the second line of the event pair data is determined according to the second event phrase vector pair. The second linear similarity and the second non-linear similarity may be determined by determining the second linear similarity and the second non-linear similarity of the event pair data according to two second event phrase vectors in the second event phrase vector pair.

In some embodiments, the confidence of the event pair data in the first text is determined based on the second event vector pair, the second event phrase vector pair, the first linear similarity, the first non-linear similarity, the second linear similarity and the second non-linear similarity.

In some embodiments, the company's current intelligent customer service system still relies heavily on manual customer service to answer customers' questions. The method proposed in this embodiment can automatically obtain the answer that best matches the question raised by the customer, thereby reducing labor costs and improving user experience.

This embodiment effectively enriches the feature information of the input data, and performs one-to-one splicing of words, word position information and word part-of-speech information; uses the BERT pre-training model for training to obtain accurate word vector expressions; uses Bi-LSTM to encode event sentences to obtain global vectors, and uses CNN to encode event sentences to obtain local vectors, and combines the two; uses the dependency words, trigger words and arguments of trigger words to extract event short sentences, rather than fixedly extracting three words before and after the trigger word to form an event short sentence; combines linear similarity with nonlinear similarity, and does not only calculate linear similarity, but also calculates nonlinear similarity to make up for the shortcomings of linear similarity; compared with the methods of related technologies, the performance is improved.

For ease of understanding, an event homonymy resolution method based on BERT pre-training is exemplified here. The method is applied to an event homonymy resolution system (BNN system) based on BERT, Bi-LSTM and CNN. FIG2 is a technical flow diagram of the BNN system of the text processing method of the embodiment of the present disclosure. As shown in FIG2, the method includes the following steps:

The first step: preprocess the event sentences.

The corpus was determined using the KBP and ACE2005 corpora. The KBP corpus contains 6538 event sentences, and the ACE2005 corpus contains 5349 event sentences. The event sentences provided by the corpus were input into the preprocessing module of the BNN system. The event sentences provided by the corpus are news texts directly crawled from web pages. Since there are a large number of special symbols, stop words and other irrelevant information in the crawled text data, the text data needs to be processed in the preprocessing module. The preprocessing module mainly uses regular expressions and stop word lists to clean the text data, filter out special symbols and stop words, and replace the sentences with the original text. The words are restored to their original form. The processed sentences are used as the input event sentences (Sentence, Sen). The Stanford natural language processing tool is used to obtain the part-of-speech information (Pos) of each word in the event sentence, and then the location information (Location, Loc) of each word in the sentence is assigned. The location information is the relative distance between each word and the trigger word of the event sentence. The preprocessing module uses the two event sentences that need to be judged as the event pair data.

Step 2: Perform BERT prediction on event sentences.

Since the accuracy of the information input into the BNN system largely determines the accuracy of event co-reference, previous experiments mostly use fixed word vectors to represent input information, which is not accurate enough for the representation of event sentences. This embodiment uses the BERT pre-training model to obtain the vector representation of words.

The BERT pre-training model predicts the masked words or sentences by masking the words in the event sentence or the sentences in the text where the event sentence is located with characters, thereby obtaining the vector representation BM of various words. Therefore, there is a strong correlation between the words in the sentence, and there is also a strong contextual connectivity and logic between the sentences in the text. It has a great impact on the experimental results. The formula is shown in (1):
BM _i = BERT(Sen _i )(i=1,2) (1)

Step 3: Use word vectors to encode event sentences.

The word vector BM trained by the BERT pre-training model is used to encode the event sentence Sen and the part-of-speech information Pos to obtain the event sentence vector SEN with a dimension of a×b and the part-of-speech vector POS with a dimension of a×b. Then, the event sentence vector, the part-of-speech vector and the position information with a dimension of a×1 are horizontally concatenated to form an event vector EB with a dimension of a×(2b+1). The formula is shown in (2):
EB _i =Concat(SEN _i ,POS _i ,Loc _i )(i=1,2) (2)

Step 4: Extract global and local information of event sentences.

When comparing whether two event sentences are in a co-referential relationship, we can first observe whether there are similarities between the two event sentences from the overall structure. If the similarity is not high, it is still possible that they are in a co-referential relationship. For this reason, we need to compare word by word to find the co-referential relationship between the two.

To this end, this embodiment first uses Bi-LSTM to extract the global information of the event vector EB, and sets the number of Bi-LSTM neurons to 150. Bi-LSTM will pass the information of the previous words in the event sentence to the back in sequence, and then pass the information from the back to the front in reverse, observing an event sentence from a global perspective. Use CNN to extract local information of event vector EB, set the number of CNN convolution kernels to 300, the convolution kernel window size to 2, and keep the dimension unchanged. Since the convolution kernel window size is 2, local information between two adjacent words in the event sentence will be extracted. The two networks obtain a global vector GE with a dimension of a×300 and a local vector LE with a dimension of a×300, respectively, as shown in formulas (3) and (4):

LE _i =Conv(EB _i )(i=1,2) (4)

Since the dimensions of the global vector GE and the local vector LE are the same, this embodiment adds the global vector GE and the local vector LE bit by bit to obtain a vector GL with a dimension of a×300, which is equivalent to fusing the global information and local information of each word in the event sentence together. Formula (5) is shown as follows:

Finally, the vector GL is passed through the global maximum pooling layer to obtain a vector EX with a dimension of a×1, as shown in formula (6):
EX _i =GlobalMax(GL _i )(i=1,2) (6)

Step 5: Extract event short sentences from event sentences.

In related technologies, researchers will extract three words before and after the trigger word as event short sentences to briefly describe the event sentence. This method may extract a sentence with incomplete structural information, thereby incorrectly representing the meaning of the original sentence. To this end, this embodiment optimizes the extraction method, and the steps of event short sentence extraction are as follows:

Step (5.1) uses the Stanford natural language processing tool to obtain the arguments in the event sentence. The arguments mainly include: agent, patient, time and place of the event, etc.

Step (5.2) uses a dependency word analysis tool to generate dependency words for the trigger word in the sentence.

Step (5.3) calculates the distance between each argument and each dependency word and the trigger word, determines the two words farthest from the trigger word before and after the trigger word, and uses these two words as the start and end positions of the event phrase.

Step (5.4) extracts the sentence from the starting position to the ending position as the event short sentence.

Example: A sentence: Zhang Junxiong, the newly appointed Executive President, was also invited to attend the inauguration ceremony and delivered a speech. In the sentence, the trigger word is "appointed", and the dependent words of the trigger word are "Zhang Junxiong", "President", and "invited". The distances between the three dependent words and the trigger word are -3, 2, and 5. The arguments in the event sentence are "Zhang Junxiong" and "invited", and the distances between the two arguments and the trigger word are -3 and 5 respectively.

The event short sentence extracted according to the fixed method of extracting short sentences is "Junxiong the newly appointed Executive President was", which shows that the short sentence is incomplete. According to the optimization method proposed in this embodiment, the dependent word or argument "Zhang Junxiong" farthest before the trigger word is taken as the starting position, and the dependent word or argument "invited" farthest after the trigger word is taken as the ending position, then the event short sentence "Zhang Junxiong the newly appointed Executive President was also invited" can be extracted.

According to the above method, the event short sentence is obtained, and the word vector BM is used to encode the event short sentence to obtain the event short sentence vector ES with a dimension of a×b. Then the event short sentence vector ES is passed through the global maximum pooling layer to obtain the event short sentence vector SX with a dimension of a×1. The formula is shown in (7):
SX _i =GlobalMax(ES _i )(i=1,2) (7)

Step 6: Calculate the similarity between two event sentences.

The key to determining whether there is a synonymous relationship between event sentences is to calculate the similarity between the two. The accuracy and comprehensiveness of the similarity calculation has a great impact on the performance results of the model. In related technologies, researchers have only used the cosine distance calculation method to obtain the linear similarity between event sentences. Linear similarity considers the relationship between two event sentences from a holistic perspective. If the structural gap between the two is too large, it will be misjudged as a non-synonymous relationship. Non-linear similarity can calculate the relationship between words between a pair of events, thereby making up for the shortcomings of linear similarity.

This embodiment proposes three similarity calculation methods, namely: cosine distance C, bilinear distance S and single-layer network distance L. The formulas are shown in (8), (9), (10), (11), (12) and (13):

In formula (8), C ₁ represents the cosine distance corresponding to the event sentence vector. In formula (9), C ₂ represents the cosine distance corresponding to the event short sentence vector. In formula (10), represents the weight used to calculate the bilinear distance corresponding to the event sentence vector. In formula (11), represents the weight used to calculate the bilinear distance corresponding to the event short sentence vector. In formula (12), Represents the weight used to calculate the single-layer network distance corresponding to the event sentence vector; Represents the offset vector used to calculate the single-layer network distance corresponding to the event sentence vector. In formula (13), Represents the weight used to calculate the single-layer network distance corresponding to the event short sentence vector; Represents the offset vector used to calculate the single-layer network distance corresponding to the event short sentence vector.

Step 7: Output confidence.

Combine the event sentence vector EX, the event short sentence vector SX, the similarity vector C, the similarity bilinear vector S and the similarity single-layer network vector L to generate the vector P, as shown in (14):
P = Concat(EX ₁ , EX ₂ , SX ₁ , SX ₂ , C ₁ , C ₂ , S ₁ , S ₂ , L ₁ , L ₂ ) (14)

Put the vector P into a fully connected classifier, which uses the relu activation function as shown in (15):
V _h =α(W _h *P+b _h ) (15)

In formula (15), W _h represents the weight of the activation function corresponding to the vector P; b _h represents the offset vector of the activation function corresponding to the vector P.

The confidence of event homonymy is obtained through the sigmoid layer, as shown in formula (16):
score = sigmoid ( W ₀ * V _h + b ₀ ) (16)

In formula (16), _W0 represents the weight of the confidence; _b0 represents the offset vector of the confidence.

The confidence score is a value between 0 and 1. If the score is greater than 0.5, it is determined to be a co-referential relationship. Otherwise, it is determined to be a non-co-referential relationship. In order to prevent overfitting, this embodiment uses Dropout, which is a strategy widely used in deep learning to solve the problem of model overfitting. The value is set to 0.2.

The BNN system uses BERT pre-training and extraction of global and local information to transform the semantics of text content. The information is mined accurately and comprehensively and converted into vector expressions. Through the extraction of event short sentences and the calculation of similarity distance, the auxiliary model can identify the same-reference relationship. The system has achieved good results in actual tests, and has improved performance compared with the methods in related technologies and existing technologies. Table 1 shows the KBP performance result data, and Table 2 shows the ACE performance result data. As described in Tables 1 and 2, the performance results are as follows:

Table 1 shows the KBP performance results.

In Table 1, MUC, B3, BLANC, CEAFe, and Links are performance evaluation methods, and KBP and ACE are test sets.

As can be seen from Table 1, the BNN system has greatly improved compared with the neural network methods of related scholars 6 and KBP-TOP, and has improved by 0.6% on average compared with the machine learning method of related scholar 4. Although it is only improved by 0.6%, the neural network method has the advantages of low labor cost, high efficiency and strong portability compared with the machine learning method.

The present disclosure provides a text processing device. FIG3 is a schematic diagram of the structure of the text processing device according to the present disclosure. As shown in FIG3 , the device 300 includes:

A first acquisition module 301 is configured to acquire event pair data included in a first text;

A first processing module 302 is configured to process the event pair data using a dependency syntax analysis tool to obtain event short sentence pair data corresponding to the event pair data;

A first determination module 303 is configured to determine a first linear similarity and a first non-linear similarity of the event pair data and to determine a second linear similarity and a second non-linear similarity of the event phrase pair data;

The second determination module 304 is configured to determine the confidence of the event pair data based on the event pair data, the event phrase pair data, the first linear similarity, the first non-linear similarity, the second linear similarity and the second non-linear similarity; the confidence represents the degree to which the event pair data has a synonymous relationship.

In other embodiments, the first processing module 302 is further configured to use the dependency syntax analysis tool to determine the arguments and dependent words of the trigger word in the event pair data; determine a first distance between the argument and the trigger word, and determine a second distance between the dependency word and the trigger word; sort the first distance and the second distance to obtain a sorting result; determine two arguments or trigger words corresponding to the maximum distance in the sorting result, and use the two arguments or trigger words corresponding to the maximum distance as the starting word and ending word of the event short sentence pair data; intercept the event pair data based on the starting word and the ending word to obtain the event short sentence pair data.

In other embodiments, the second determination module 304 is further configured to determine a confidence vector of the event pair data in the first text based on the event pair data, the event sentence pair data, the first linear similarity, the first non-linear similarity, the second linear similarity and the second non-linear similarity; and process the confidence vector based on a fully connected classifier to obtain the confidence of the event pair data.

In other embodiments, the device 300 further includes: a prediction module, configured to use a pre-trained model BERT to predict the event pair data to obtain a word vector pair corresponding to the event pair data.

In other embodiments, the event pair data includes a plurality of word pair data; the device 300 further includes: a second acquisition module and a third determination module; wherein,

The second acquisition module is configured to acquire a first information pair and a second information pair of a plurality of word pair data in the event pair data; the first information pair represents a part-of-speech information pair of the word pair data; the second information pair represents a position information pair of the word pair data;

The third determination module is configured to determine a first event vector pair corresponding to the event pair data based on the word vector pair, the event pair data, the first information pair and the second information pair.

In other embodiments, the device 300 further includes: a first extraction module, a second extraction module, a fusion module, and a second processing module; wherein,

The first extraction module is configured to extract the first event vector pair using a long short-term memory network Bi-LSTM to obtain a global information pair corresponding to the first event vector pair;

The second extraction module is configured to extract the first event vector pair using a convolutional neural network (CNN) to obtain a local information pair corresponding to the first event vector pair;

The fusion module is configured to fuse the global information pair and the local information pair to obtain a fusion vector pair corresponding to the first event vector pair;

The second processing module is configured to perform a first global maximum pooling layer processing on the fused vector pair to obtain a second event vector pair corresponding to the first event vector pair.

In other embodiments, the first determination module 303 is further configured to determine a first linear similarity and a first non-linear similarity of the event pair data based on the second event vector pair; wherein the first linear similarity includes a first cosine distance; and the first non-linear similarity includes at least one of a first bilinear distance and a first single-layer network distance.

In other embodiments, the device 300 further includes: a fourth determining module and a third processing module; wherein,

The fourth determination module is configured to determine a first event short sentence vector pair corresponding to the event short sentence pair data based on the word vector pair and the event short sentence pair data;

The third processing module is configured to perform a second global maximum pooling layer processing on the first event short sentence vector pair to obtain a second event short sentence vector pair corresponding to the first event short sentence vector pair.

In other embodiments, the first determination module 303 is further configured to determine a second linear similarity and a second non-linear similarity of the event phrase pair data based on the second event phrase vector pair; wherein the second linear similarity includes a second cosine distance; and the second non-linear similarity includes at least one of a second bilinear distance and a second single-layer network distance.

The description of the above device embodiment is similar to the description of the above method embodiment, and has similar beneficial effects as the method embodiment. For technical details not disclosed in the device embodiment of the present disclosure, please refer to the description of the method embodiment of the present disclosure for understanding.

It should be noted that in the embodiments of the present disclosure, if the above-mentioned text processing method is implemented in the form of a software function module and sold or used as an independent product, it can also be stored in a computer-readable storage medium. Based on this understanding, the technical embodiment of the embodiments of the present disclosure is essentially or the part that contributes to the prior art can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions for enabling a text processing device (which can be a personal computer, server, or network device, etc.) to execute the entire method described in each embodiment of the present disclosure. The aforementioned storage medium includes: a USB flash drive, a mobile hard disk, a read-only memory (ROM), a magnetic disk or an optical disk, and other media that can store program codes. Thus, the embodiments of the present disclosure are not limited to any specific combination of hardware and software.

Correspondingly, an embodiment of the present disclosure further provides a text processing device, including a memory and a processor, wherein the memory stores a computer program that can be executed on the processor, and when the processor executes the program, any step in the above-mentioned method is implemented.

Correspondingly, an embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, any step in the above-mentioned method is implemented.

It should be noted here that the description of the above storage medium and device embodiments is similar to the description of the above method embodiments, and has similar beneficial effects as the method embodiments. For technical details not disclosed in the storage medium and device embodiments of the present disclosure, please refer to the description of the method embodiments of the present disclosure for understanding.

It should be noted that Figure 4 is a schematic diagram of a hardware entity structure of a text processing device according to an embodiment of the present disclosure. As shown in Figure 4 , the hardware entity of the text processing device 400 includes: a processor 401 and a memory 403. Optionally, the text processing device 400 may also include a communication interface 402.

It can be understood that the memory 403 can be a volatile memory or a non-volatile memory, and can also include both volatile and non-volatile memories. Among them, the non-volatile memory can be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a magnetic random access memory (FRAM), a flash memory, a magnetic surface memory, an optical disk, or a compact disc read-only memory (CD-ROM); the magnetic surface memory can be a disk memory or a tape memory. The volatile memory can be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), synchronous static random access memory (SSRAM), dynamic random access memory (DRAM), and so on. Dynamic Random Access Memory), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDRSDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), Direct Rambus Random Access Memory (DRRAM). The memory 403 described in the embodiments of the present disclosure is intended to include but is not limited to these and any other suitable types of memory.

The method disclosed in the above embodiment of the present disclosure can be applied to the processor 401, or implemented by the processor 401. The processor 401 may be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the above method can be completed by the hardware integrated logic circuit in the processor 401 or the instruction in the form of software. The above processor 401 can be a general processor, a digital signal processor (DSP, Digital Signal Processor), or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, etc. The processor 401 can implement or execute the methods, steps and logic block diagrams disclosed in the embodiment of the present disclosure. The general processor can be a microprocessor or any conventional processor, etc. The steps of the method disclosed in the embodiment of the present disclosure can be directly embodied as a hardware decoding processor to execute, or a combination of hardware and software modules in the decoding processor to execute. The software module can be located in a storage medium, which is located in the memory 403. The processor 401 reads the information in the memory 403 and completes the steps of the above method in combination with its hardware.

In an exemplary embodiment, the text processing device can be implemented by one or more application specific integrated circuits (ASIC), DSP, programmable logic device (PLD), complex programmable logic device (CPLD), field programmable gate array (FPGA), general processor, controller, microcontroller (MCU), microprocessor, or other electronic components to execute the aforementioned method.

In the several embodiments provided in the present disclosure, it should be understood that the disclosed methods and devices can be implemented in other ways. The device embodiments described above are merely illustrative. For example, The division of units is only a logical function division. There may be other divisions in actual implementation, such as: multiple units or components can be combined, or can be integrated into another observation, or some features can be ignored or not executed. In addition, the communication connection between the components shown or discussed can be through some interfaces, indirect coupling or communication connection of devices or units, which can be electrical, mechanical or other forms.

The units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units; some or all of the units may be selected according to actual needs to achieve the purpose of this embodiment.

A person of ordinary skill in the art can understand that all or part of the steps of implementing the above method embodiment can be completed by hardware related to program instructions, and the aforementioned program can be stored in a computer-readable storage medium. When the program is executed, it executes the steps of the above method embodiment; and the aforementioned storage medium includes: mobile storage devices, read-only memories (ROM, Read-Only Memory), magnetic disks or optical disks, and other media that can store program codes.

Alternatively, if the above-mentioned integrated unit of the embodiment of the present disclosure is implemented in the form of a software functional unit and sold or used as an independent product, it can also be stored in a computer-readable storage medium. Based on this understanding, the technical embodiment of the embodiment of the present disclosure is essentially or the part that contributes to the prior art can be embodied in the form of a software product, and the computer software product is stored in a storage medium, including a number of instructions for a text processing device (which can be a personal computer, a server, or a network device, etc.) to execute all or part of the methods described in each embodiment of the present disclosure. The aforementioned storage medium includes: various media that can store program codes, such as mobile storage devices, ROMs, magnetic disks, or optical disks.

The text processing method, device and computer storage medium recorded in the examples of the present disclosure are only taken as examples of the embodiments described in the present disclosure, but are not limited to this. As long as the text processing method, device and computer storage medium are involved, they are within the protection scope of the present disclosure.

It should be understood that references to "one embodiment" or "an embodiment" throughout the specification mean that a particular feature, structure, or characteristic associated with the embodiment is included in at least one embodiment of the present disclosure. The phrases “in one embodiment” or “in an embodiment” that appear in various places in the specification do not necessarily refer to the same embodiment. In addition, these specific features, structures, or characteristics may be combined in one or more embodiments in any suitable manner. It should be understood that in the various embodiments of the present disclosure, the size of the serial numbers of the above-mentioned processes does not mean the order of execution. The order of execution of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present disclosure. The serial numbers of the embodiments of the present disclosure are for description only and do not represent the advantages and disadvantages of the embodiments.

It should be noted that, in this article, the terms "include", "comprises" or any other variations thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements, but also other elements not explicitly listed, or also includes elements inherent to such process, method, article or device. In the absence of further restrictions, an element defined by the sentence "comprises a ..." does not exclude the existence of other identical elements in the process, method, article or device including the element.

The above is only a specific embodiment of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any person skilled in the art who is familiar with the technical field can easily think of changes or substitutions within the technical scope disclosed in the present disclosure, which should be included in the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure should be based on the protection scope of the claims.

Claims

A text processing method, comprising:

Acquire event pair data included in the first text;

Using a dependency syntax analysis tool to process the event pair data to obtain event short sentence pair data corresponding to the event pair data;

Determine a first linear similarity and a first non-linear similarity of the event pair data and determine a second linear similarity and a second non-linear similarity of the event phrase pair data;

The confidence of the event pair data is determined based on the event pair data, the event phrase pair data, the first linear similarity, the first non-linear similarity, the second linear similarity and the second non-linear similarity; the confidence represents the degree to which the event pair data has a same-reference relationship.
The method according to claim 1, wherein the step of processing the event pair data using a dependency syntax analysis tool to obtain event short sentence pair data corresponding to the event pair data comprises:

Determining arguments and dependent words of trigger words in the event pair data using the dependency syntax analysis tool;

Determining a first distance between the argument and the trigger word, and determining a second distance between the dependency word and the trigger word;

Sorting the first distance and the second distance to obtain a sorting result;

Determine two arguments or trigger words corresponding to the maximum distance in the sorting result, and use the two arguments or trigger words corresponding to the maximum distance as the start word and the end word of the event short sentence pair data;

The event pair data is intercepted based on the start word and the end word to obtain the event short sentence pair data.
The method according to claim 1, wherein the determining the confidence of the event pair data in the first text based on the event pair data, the event phrase pair data, the first linear similarity, the first non-linear similarity, the second linear similarity and the second non-linear similarity comprises:

Based on the event pair data, the event short sentence pair data, the first linear similarity, the The first non-linear similarity, the second linear similarity and the second non-linear similarity determine a confidence vector of event pair data in the first text;

The confidence vector is processed based on a fully connected classifier to obtain the confidence of the event on the data.
The method according to claim 1, wherein the method further comprises:

The pre-trained model BERT is used to predict the event pair data to obtain the word vector pairs corresponding to the event pair data.
The method according to claim 4, wherein the event pair data comprises a plurality of word pair data; the method further comprising:

Acquire a first information pair and a second information pair of a plurality of word pair data in the event pair data; the first information pair represents a part-of-speech information pair of the word pair data; the second information pair represents a position information pair of the word pair data;

Based on the word vector pair, the event pair data, the first information pair and the second information pair, a first event vector pair corresponding to the event pair data is determined.
The method according to claim 5, wherein the method further comprises:

Using a long short-term memory network Bi-LSTM to extract the first event vector pair, to obtain a global information pair corresponding to the first event vector pair;

Using a convolutional neural network (CNN) to extract the first event vector pair, to obtain a local information pair corresponding to the first event vector pair;

fusing the global information pair and the local information pair to obtain a fused vector pair corresponding to the first event vector pair;

The fused vector pair is processed by a first global maximum pooling layer to obtain a second event vector pair corresponding to the first event vector pair.
The method according to claim 6, wherein determining the first linear similarity and the first non-linear similarity of the event pair data comprises:

Determine a first linear similarity and a first nonlinear similarity of the event pair data according to the second event vector pair;

The first linear similarity includes a first cosine distance; and the first non-linear similarity includes at least one of a first bilinear distance and a first single-layer network distance.
The method according to claim 4, wherein the method further comprises:

Based on the word vector pair and the event short sentence pair data, determining a first event short sentence vector pair corresponding to the event short sentence pair data;

The first event short sentence vector pair is processed by a second global maximum pooling layer to obtain a second event short sentence vector pair corresponding to the first event short sentence vector pair.
The method according to claim 8, wherein determining the second linear similarity and the second non-linear similarity of the event phrase pair data comprises:

Determine a second linear similarity and a second non-linear similarity of the event phrase pair data according to the second event phrase vector pair;

The second linear similarity includes a second cosine distance; and the second non-linear similarity includes at least one of a second bilinear distance and a second single-layer network distance.
A text processing device, comprising:

A first acquisition module, configured to acquire event pair data included in the first text;

A first processing module is configured to process the event pair data using a dependency syntax analysis tool to obtain event short sentence pair data corresponding to the event pair data;

A first determination module is configured to determine a first linear similarity and a first non-linear similarity of the event pair data and to determine a second linear similarity and a second non-linear similarity of the event phrase pair data;

The second determination module is configured to determine the confidence of the event pair data based on the event pair data, the event phrase pair data, the first linear similarity, the first non-linear similarity, the second linear similarity and the second non-linear similarity; the confidence represents the degree to which the event pair data has a synonymous relationship.
A text processing device comprises a memory and a processor, wherein the memory stores a computer program that can be run on the processor, and when the processor executes the program, the method described in any one of claims 1 to 9 is implemented.
A storage medium storing executable instructions, wherein when the executable instructions are executed by a processor, the method according to any one of claims 1 to 9 is implemented.