CN115952790A - Information extraction method and device - Google Patents


Info

Publication number
CN115952790A
Authority
CN
China
Prior art keywords
text
processed
matrix
distance
vector
Prior art date
Legal status
Pending
Application number
CN202211579222.4A
Other languages
Chinese (zh)
Inventor
陈庆洋
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211579222.4A priority Critical patent/CN115952790A/en
Publication of CN115952790A publication Critical patent/CN115952790A/en
Pending legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

The application discloses an information extraction method and device, relates to the field of artificial intelligence, and particularly relates to technologies such as natural language processing (NLP) and deep learning. The specific implementation scheme is as follows: acquiring a feature vector of a text to be processed and a dependency graph of the text to be processed; generating a corresponding distance matrix according to the dependency graph; calculating the attention value between any two words in the text to be processed by adopting an attention mechanism according to the feature vector of the text to be processed and the distance matrix; obtaining a semantic feature representation of the text to be processed according to the feature vector of the text to be processed and the attention value between any two words in the text to be processed; and extracting information from the text to be processed according to the semantic feature representation of the text to be processed. Because the dependency information in the text to be processed is encoded explicitly, a more accurate semantic feature representation can be obtained. Using this semantic feature representation for information extraction can improve the information extraction effect.

Description

Information extraction method and device
Technical Field
The present invention relates to the field of artificial intelligence, and in particular, to techniques of Natural Language Processing (NLP) and deep learning, and more particularly, to an information extraction method and apparatus.
Background
Information extraction refers to a text processing technology that extracts fact information of specified types, such as entities, relations and events, from natural language text and outputs it as structured data. Text data is composed of specific units such as sentences, paragraphs and chapters, and text information is composed of smaller specific units such as words, phrases, sentences and paragraphs, or combinations of these units. Extracting noun phrases, person names, place names and the like from text data is text information extraction; of course, the information extracted by a text information extraction technology can be of various types.
In the related art, text data generally needs to be semantically represented to obtain a semantic feature representation of the text data, and information extraction is then performed on the text data using this semantic feature representation. Therefore, how to obtain a more accurate semantic feature representation so as to improve the information extraction effect is an urgent problem to be solved.
Disclosure of Invention
The application provides an information extraction method, an information extraction apparatus, an electronic device, and a storage medium.
According to a first aspect of the present application, there is provided an information extraction method, including:
acquiring a feature vector of a text to be processed and a dependency relationship graph of the text to be processed;
generating a corresponding distance matrix according to the dependency relationship graph;
calculating the attention value between any two words in the text to be processed by adopting an attention mechanism according to the feature vector of the text to be processed and the distance matrix;
acquiring semantic feature representation of the text to be processed according to the feature vector of the text to be processed and the attention value between any two words in the text to be processed;
and extracting information of the text to be processed according to the semantic feature representation of the text to be processed.
According to a second aspect of the present application, there is provided an information extraction apparatus comprising:
the first acquisition module is used for acquiring a feature vector of a text to be processed and a dependency relationship graph of the text to be processed;
the generating module is used for generating a corresponding distance matrix according to the dependency relationship graph;
the calculation module is used for calculating an attention value between any two words in the text to be processed by adopting an attention mechanism according to the characteristic vector of the text to be processed and the distance matrix;
the second acquisition module is used for acquiring semantic feature representation of the text to be processed according to the feature vector of the text to be processed and the attention value between any two words in the text to be processed;
and the information extraction module is used for extracting the information of the text to be processed according to the semantic feature representation of the text to be processed.
According to a third aspect of the present application, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
According to a fourth aspect of the present application, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the first aspect.
According to a fifth aspect of the present application, a computer program product is provided, comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the method of the aforementioned first aspect.
According to the technical scheme of the application, the dependency information in the text to be processed can be encoded explicitly, and the distance matrix obtained after encoding is introduced into the attention mechanism, so that in the feature extraction process each node can interact with more other nodes while paying more attention to the nodes that are closer to it, and a more accurate semantic feature representation can be obtained. Using this semantic feature representation for information extraction can improve the information extraction effect.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
fig. 1 is a flowchart of an information extraction method according to an embodiment of the present application;
fig. 2a is a flowchart of another information extraction method provided in the embodiment of the present application;
FIG. 2b is an exemplary diagram of a dependency graph of a to-be-processed text according to an embodiment of the present application;
FIG. 3 is a flowchart of a method for calculating an attention value between any two words in a text according to an embodiment of the present application;
FIG. 4 is a diagram illustrating an exemplary network architecture for an attention mechanism provided in an embodiment of the present application;
FIG. 5 is a flowchart of another method for calculating an attention value between any two words in a text according to an embodiment of the present application;
FIG. 6 is a diagram illustrating an example of a network architecture for another attention mechanism provided by an embodiment of the present application;
FIG. 7 is an exemplary graph of the output results for each layer in the attention mechanism provided by the embodiments of the present application;
fig. 8 is a block diagram of an information extraction apparatus according to an embodiment of the present application;
fig. 9 is a block diagram of an electronic device for implementing the method of information extraction according to the embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
Fig. 1 is a flowchart of an information extraction method according to an embodiment of the present application. As shown in fig. 1, the method may include, but is not limited to, the following steps.
In step 101, a feature vector of a text to be processed and a dependency graph of the text to be processed are obtained.
Alternatively, in the embodiment of the present application, the feature vector of the text to be processed may be understood as a text representation (or word vector) of the text to be processed.
In a possible implementation manner, the text to be processed may be processed by using a preset word vector model such as Word2vec to obtain a text representation of the text to be processed, where the text representation of the text to be processed is the feature vector of the text to be processed. Alternatively, other implementations may be adopted to obtain the feature vector of the text to be processed, for example, a one-hot vector encoding technique or a word embedding technique; this is not specifically limited in this application and is not described again.
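A minimal sketch of turning a text into feature vectors via an embedding lookup; the vocabulary and the randomly initialized embedding table below are illustrative stand-ins for a trained Word2vec model:

```python
import numpy as np

# Map each token of the text to a feature vector by an embedding lookup.
# A real system would load trained Word2vec vectors; here the table is
# randomly initialized purely for illustration.
rng = np.random.default_rng(0)
vocab = {"word1": 0, "word2": 1, "word3": 2}
embed_dim = 8
embedding_table = rng.standard_normal((len(vocab), embed_dim))

def text_to_features(tokens):
    """Return an (L, embed_dim) matrix of feature vectors for the tokens."""
    return embedding_table[[vocab[t] for t in tokens]]

features = text_to_features(["word1", "word3", "word2"])
```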
In an embodiment of the present application, the dependency graph of the to-be-processed text refers to a graph that can indicate dependencies between words in the to-be-processed text. The nodes in the dependency graph may be words in the text to be processed, and the edges in the dependency graph may be the corresponding nodes with dependency relationships.
In a possible implementation manner, a dependency parsing technique may be adopted to perform syntax analysis on a to-be-processed text to obtain a syntax analysis result of the to-be-processed text, and a dependency relationship diagram of the to-be-processed text is constructed according to the syntax analysis result. The words in the text to be processed are used as nodes, and the nodes are subjected to edge connection according to the dependency relationship among the words so as to construct the dependency relationship graph of the text to be processed.
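The graph construction described above can be sketched as follows; the token list and the (head, dependent) pairs are hand-written stand-ins for the output of a dependency parser:

```python
# Tokens become nodes; (head, dependent) pairs from a dependency parse
# become edges. The parse below is fabricated for illustration.
tokens = ["word1", "word2", "word3"]
dependency_edges = [(0, 1), (1, 2)]  # (head index, dependent index)

graph = {i: [] for i in range(len(tokens))}
for head, dep in dependency_edges:
    graph[head].append(dep)
    graph[dep].append(head)  # also add the reverse edge
```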
In step 102, a corresponding distance matrix is generated from the dependency graph.
In the related art, an adjacency matrix is generally generated according to the connection relationships of the nodes in the dependency graph, and the adjacency matrix is introduced into a graph attention network for enhanced representation so as to obtain the semantic feature representation of the text to be processed. However, the inventors of the present application found through experiments that the features learned with an adjacency matrix generated in this way are poor, because the dependency graph is sparse (for example, for a text of L words, the adjacency matrix has size L × L but only on the order of L nonzero elements), so that the information each node can aggregate when its features are updated is limited, and the quality of the learned features suffers.
To alleviate this, in one possible implementation, a corresponding adjacency matrix may be generated from the dependency graph (treating its edges as directed and adding reverse edges), and the adjacency matrix may be densified according to the position information, in the text to be processed, of the nodes of the dependency graph, so as to obtain the distance matrix.
Optionally, an adjacency matrix may be generated according to the connection relationships of the nodes in the dependency graph, and the values at the corresponding element positions in the adjacency matrix may be adjusted according to the position information of each node in the text to be processed, so as to obtain the distance matrix. In this way, the dependency information in the text is encoded explicitly, and the distance matrix obtained after encoding is introduced into the semantization stage, so that a more accurate semantic feature representation can be obtained.
In step 103, according to the feature vector and the distance matrix of the text to be processed, an attention mechanism is used to calculate an attention value between any two words in the text to be processed.
In the embodiment of the application, the dependency information in the text is encoded explicitly, and the distance matrix obtained after encoding is introduced into the attention mechanism, so that in the feature extraction process each node can interact with more other nodes while paying more attention to the nodes that are closer to it.
In step 104, semantic feature representation of the text to be processed is obtained according to the feature vector of the text to be processed and the attention value between any two words in the text to be processed.
In a possible implementation manner, a corresponding value vector may be determined according to a feature vector of a text to be processed, and a product operation is performed on the value vector and an attention value between any two words in the text to be processed, so as to obtain a semantic feature representation of the text to be processed.
As an example, a feature vector of a text to be processed may be converted into a value vector V, and an attention value between any two words in the text to be processed and a corresponding element in the value vector V may be subjected to weighting processing to obtain a semantic feature representation of the text to be processed.
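The weighting described above can be sketched as follows; the projection W_v, the feature matrix X, and the attention matrix A are all illustrative:

```python
import numpy as np

# Step 104 sketch: project the per-word feature vectors X to value
# vectors V, then take the attention-weighted sum to obtain the semantic
# feature representation. A[i, j] is the attention value between word i
# and word j; its rows are normalized as a softmax output would be.
rng = np.random.default_rng(0)
L, d = 4, 6
X = rng.standard_normal((L, d))       # feature vectors of the text
W_v = rng.standard_normal((d, d))     # illustrative value projection

V = X @ W_v                           # value vectors
A = rng.random((L, L))
A = A / A.sum(axis=1, keepdims=True)  # rows sum to 1

semantic_repr = A @ V                 # (L, d) semantic representation
```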
In step 105, extracting information of the text to be processed according to the semantic feature representation of the text to be processed.
In one possible implementation manner, entity recognition may be performed on the text to be processed based on the semantic feature representation to obtain the recognized entity information in the text to be processed. Optionally, based on a named entity recognition task, the semantic feature representation of the text to be processed is decoded: at each prediction time step the probability distribution over words is predicted, the word with the maximum probability is taken as the predicted word, and the recognized entity information in the text to be processed is obtained through parsing.
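A minimal sketch of the decoding step: keep the highest-probability tag for each word. The tag set and the probability table are illustrative, not from the original disclosure:

```python
import numpy as np

# For each word, a distribution over entity tags; keep the argmax tag.
tags = ["O", "B-PER", "I-PER"]
probs = np.array([        # rows: words; columns: tags; rows sum to 1
    [0.1, 0.8, 0.1],
    [0.2, 0.1, 0.7],
    [0.9, 0.05, 0.05],
])
predicted = [tags[i] for i in probs.argmax(axis=1)]
```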
By implementing the embodiment of the application, the dependency information in the text to be processed can be encoded explicitly, and the distance matrix obtained after encoding is introduced into the attention mechanism, so that in the feature extraction process each node can interact with more other nodes while paying more attention to closer nodes, and a more accurate semantic feature representation can be obtained. Using this semantic feature representation for information extraction can improve the information extraction effect.
Fig. 2a is a flowchart of another information extraction method according to an embodiment of the present disclosure. As shown in fig. 2a, the method may include, but is not limited to, the following steps.
In step 201, a feature vector of the text to be processed and a dependency relationship diagram of the text to be processed are obtained.
In the embodiment of the present application, step 201 may be implemented by using any one of the embodiments of the present application, which is not limited herein and is not described in detail herein.
In step 202, a corresponding adjacency matrix is generated from the dependency graph.
Alternatively, the adjacency matrix may be generated according to the connection relationship of the nodes in the dependency relationship graph.
In step 203, according to the position information of the node in the dependency relationship graph in the text to be processed, the adjacency matrix is subjected to densification processing to obtain a distance matrix.
Optionally, the numerical value of the corresponding element position in the adjacency matrix may be adjusted according to the position information of each node in the text to be processed, so as to obtain the distance matrix. In one possible implementation manner, the implementation manner of step 203 may include the following steps:
step 203a, determining the position information of the word corresponding to the node in the dependency relationship graph in the text to be processed.
In one possible implementation manner, position information of words corresponding to each node in the dependency relationship graph in the text to be processed may be determined.
Step 203b, under the condition that the dependency relationship does not exist between the two nodes in the dependency relationship graph, determining the distance value of the word corresponding to each of the two nodes in the text to be processed according to the position information, and setting the distance value used for representing the distance between the two nodes in the adjacency matrix as the distance value of the word corresponding to each of the two nodes in the text to be processed.
In step 203c, in the case that there is a dependency relationship between two nodes in the dependency relationship graph, a distance value used for representing the distance between the two nodes in the adjacency matrix is set as a first value to obtain a distance matrix.
For example, assume that the text to be processed is "word 1, word 5, word 6, word 2, word 3, word 4.", whose dependency graph is shown in fig. 2b. An adjacency matrix may be generated from the connection relationships of the nodes in the dependency graph: the elements corresponding to node pairs having a dependency relationship are 1, and the elements corresponding to node pairs without a dependency relationship are 0, as shown in formula (1) below. Then, the position information of the words corresponding to the nodes of the dependency graph in the text to be processed is determined: if a dependency relationship exists between two nodes, the distance between them is set to 1; otherwise, the distance between them is set to the distance between the two words in the text. The elements of formula (1) are adjusted using this position information to obtain the distance matrix, as shown in formula (2) below. The rows and columns of formula (1) and formula (2) correspond, in order, to "word 1", "word 2", "word 3", "word 4", "word 5", "word 6", ".".

[Formula (1): the adjacency matrix of the dependency graph of fig. 2b — rendered as an image in the original]

[Formula (2): the distance matrix obtained by adjusting formula (1) — rendered as an image in the original]
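The densification of steps 202 and 203 can be sketched as follows: entries for node pairs connected in the dependency graph are set to 1, and all other entries to the absolute positional distance of the two words in the text. Setting the diagonal (self-distance) to 1 is an assumption made here so that a later element-wise division by the distance matrix is safe; the original does not spell this detail out:

```python
import numpy as np

def distance_matrix(num_words, dependency_edges):
    """Densify the adjacency matrix into a distance matrix.

    Connected pairs get distance 1; all other pairs get their absolute
    positional distance |i - j| in the text.
    """
    idx = np.arange(num_words)
    dist = np.abs(idx[:, None] - idx[None, :]).astype(float)
    np.fill_diagonal(dist, 1.0)  # assumption: self-distance set to 1
    for head, dep in dependency_edges:
        dist[head, dep] = 1.0
        dist[dep, head] = 1.0    # reverse edges, as in the method above
    return dist

D = distance_matrix(4, [(0, 2)])
```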
In step 204, an attention mechanism is used to calculate an attention value between any two words in the text to be processed according to the feature vector and the distance matrix of the text to be processed.
In the embodiment of the present application, step 204 may be implemented by using any one of the embodiments of the present application, which is not limited in this embodiment and is not described again.
In step 205, semantic feature representation of the text to be processed is obtained according to the feature vector of the text to be processed and the attention value between any two words in the text to be processed.
In the embodiment of the present application, step 205 may be implemented by using any one of the embodiments of the present application, which is not limited herein and is not described in detail herein.
In step 206, the information of the text to be processed is extracted according to the semantic feature representation of the text to be processed.
In the embodiment of the present application, step 206 may be implemented by using any one of the embodiments of the present application, which is not limited in this embodiment and is not described again.
By implementing the embodiment of the application, the dependency information in the text is encoded explicitly, and the distance matrix obtained after encoding is introduced into the attention mechanism, so that in the feature extraction process each node can interact with more other nodes while paying more attention to the nodes that are closer to it.
Fig. 3 is a flowchart of a method for calculating an attention value between any two words in a text according to an embodiment of the present application. As shown in fig. 3, the method may include, but is not limited to, the following steps.
In step 301, a query vector and a key vector are determined according to a feature vector of a text to be processed.
Alternatively, the feature vector of the text to be processed may be converted into a query vector Q and a key vector K in the attention mechanism.
In step 302, a double affine calculation is performed according to the query vector and the key vector to obtain a first intermediate matrix, and a matrix multiplication operation is performed on the query vector and the key vector to obtain a second intermediate matrix.
In step 303, the data in the first intermediate matrix is modified according to the distance matrix to obtain a third intermediate matrix.
In one possible implementation, the data in the first intermediate matrix is divided by the data at the corresponding position in the distance matrix to obtain a third intermediate matrix.
In step 304, the second intermediate matrix and the third intermediate matrix are subjected to a matrix addition operation to obtain a fourth intermediate matrix.
In step 305, scaling, masking and Softmax operations are performed on the fourth intermediate matrix to obtain the attention value between any two words in the text to be processed.
For example, as shown in fig. 4, the network structure of the attention mechanism adopted for the present application includes a Biaffine Matmul (biaffine matrix multiplication) module, a Distance Scaling module, an Add module, a Scaling module, a Mask module, a Softmax module, and a Matmul (matrix multiplication) module. The calculation formula of the Biaffine Matmul module is as follows:

e_{ij} = q_i^\top W^{(1)}_{r_{ij}} k_j + (q_i \oplus k_j)^\top W^{(2)}_{r_{ij}} \qquad (3)

where W^{(1)}_{r_{ij}} and W^{(2)}_{r_{ij}} denote the biaffine parameters corresponding to the edge r_{ij} between node i and node j, q_i and k_j are the query and key vectors of the two nodes, \oplus denotes concatenation, and e_{ij} is an element of the first intermediate matrix. Through the biaffine calculation, the edge information is merged in and interacts with the node information.

The output e_{ij} of the Biaffine Matmul module is input into the Distance Scaling module, which modifies e_{ij} to obtain the third intermediate matrix. The calculation formula of the Distance Scaling module is as follows:

\hat{e}_{ij} = e_{ij} / d_{ij} \qquad (4)

where d_{ij} is the distance between node i and node j, i.e., an element of the distance matrix.

Then, the Add module adds the output of the Distance Scaling module to the output q_i^\top k_j of the Matmul module:

s_{ij} = q_i^\top k_j + \hat{e}_{ij} \qquad (5)

Then, scaling, masking and Softmax operations are performed on the output of the Add module to obtain the attention value between any two words in the text to be processed. It is understood that the calculation after the Add module shown in fig. 4 is the same as the attention calculation in the related art and is not described in detail here.
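Under stated assumptions, the fig. 4 pipeline (Biaffine Matmul, Distance Scaling, Add, Scaling, Softmax) can be sketched as follows. For simplicity a single pair of biaffine parameters W1/w2 is shared across all edges rather than one pair per edge label, masking is omitted, and all shapes are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distance_scaled_attention(Q, K, D, W1, w2):
    """Q, K: (L, d) query/key matrices; D: (L, L) distance matrix (no zeros)."""
    d = Q.shape[-1]
    E = Q @ W1 @ K.T                                       # bilinear biaffine term
    E = E + (Q @ w2[:d])[:, None] + (K @ w2[d:])[None, :]  # linear biaffine term
    E = E / D                                              # Distance Scaling
    S = Q @ K.T + E                                        # Add with dot-product scores
    return softmax(S / np.sqrt(d))                         # Scaling + Softmax

rng = np.random.default_rng(0)
L, d = 3, 4
Q = rng.standard_normal((L, d))
K = rng.standard_normal((L, d))
W1 = rng.standard_normal((d, d))
w2 = rng.standard_normal(2 * d)
idx = np.arange(L)
D = np.maximum(np.abs(idx[:, None] - idx[None, :]), 1).astype(float)
A = distance_scaled_attention(Q, K, D, W1, w2)
```

Dividing the biaffine scores by the distance matrix is what lets distant-but-connected nodes keep a large score (their distance is clamped to 1) while unconnected distant pairs are damped.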
By implementing the embodiment of the application, the dependency information in the text is encoded explicitly, and the distance matrix obtained after encoding is introduced into the attention mechanism, so that in the feature extraction process each node can interact with more other nodes while paying more attention to the nodes that are closer to it.
Fig. 5 is a flowchart of another method for calculating an attention value between any two words in a text according to an embodiment of the present application. As shown in fig. 5, the method may include, but is not limited to, the following steps.
In step 501, a query vector and a key vector are determined according to a feature vector of a text to be processed.
Optionally, the feature vector of the text to be processed may be converted into a query vector Q and a key vector K in the attention mechanism.
In step 502, a double affine calculation is performed according to the query vector and the key vector to obtain a first intermediate matrix, and a matrix multiplication operation is performed on the query vector and the key vector to obtain a second intermediate matrix.
In step 503, a third intermediate matrix is obtained by performing a matrix addition operation on the first intermediate matrix and the second intermediate matrix.
In step 504, scaling, masking and Softmax operations are performed on the third intermediate matrix to obtain a corresponding attention matrix.
In step 505, the data in the attention matrix is modified and normalized according to the distance matrix to obtain the attention value between any two words in the text to be processed.
In a possible implementation manner, the data in the attention matrix and the data at the corresponding position in the distance matrix are subjected to division operation to obtain a fourth intermediate matrix, and the data in the fourth intermediate matrix is subjected to normalization processing to obtain the attention value between any two words in the text to be processed.
For example, as shown in fig. 6, the network structure of the attention mechanism adopted for the present application includes a Biaffine Matmul (biaffine matrix multiplication) module, a Distance Scaling module, an Add module, a Scaling module, a Mask module, a Softmax module, and a Matmul (matrix multiplication) module. The calculation formula of the Biaffine Matmul module is the same as formula (3), where W^{(1)}_{r_{ij}} and W^{(2)}_{r_{ij}} denote the biaffine parameters corresponding to the edge r_{ij} between node i and node j and e_{ij} is an element of the first intermediate matrix; through the biaffine calculation, the edge information is merged in and interacts with the node information.

The output e_{ij} of the Biaffine Matmul module is input into the Add module, which adds it to the output q_i^\top k_j of the Matmul module:

s_{ij} = q_i^\top k_j + e_{ij} \qquad (6)

Then, scaling, masking and Softmax operations are performed on the output of the Add module to obtain the corresponding attention matrix.

It is noted that the attention network structure shown in fig. 6 differs from that shown in fig. 4 in that the Distance Scaling module follows the Softmax module; since the result output after this step must be normalized, its calculation differs somewhat from that of the Distance Scaling module in fig. 4. The calculation formulas of the Distance Scaling module are as follows:

\hat{a}_{ij} = a_{ij} / d_{ij} \qquad (7)

\alpha_{ij} = \hat{a}_{ij} \Big/ \sum_{k} \hat{a}_{ik} \qquad (8)

That is, the result a_{ij} of the Softmax operation is input into the Distance Scaling module, which divides a_{ij} by the corresponding element d_{ij} of the distance matrix and then normalizes the result using formula (8); the normalized result \alpha_{ij} is the attention value between node i and node j.
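A minimal sketch of the fig. 6 ordering, in which the Softmax output is divided element-wise by the distance matrix and then renormalized row-wise. The biaffine score matrix E is taken as given (random here), masking is omitted, and the distance matrix must contain no zeros:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def post_softmax_distance_scaling(Q, K, E, D):
    """Add + Scaling + Softmax first, then distance-scale and renormalize."""
    d = Q.shape[-1]
    A = softmax((Q @ K.T + E) / np.sqrt(d))          # attention matrix a_ij
    A_hat = A / D                                    # divide by distance d_ij
    return A_hat / A_hat.sum(axis=1, keepdims=True)  # renormalize rows

rng = np.random.default_rng(1)
L, d = 3, 4
Q = rng.standard_normal((L, d))
K = rng.standard_normal((L, d))
E = rng.standard_normal((L, L))  # stand-in for the biaffine scores
idx = np.arange(L)
D = np.maximum(np.abs(idx[:, None] - idx[None, :]), 1).astype(float)
alpha = post_softmax_distance_scaling(Q, K, E, D)
```

The explicit renormalization is what distinguishes this variant from fig. 4: after the element-wise division the rows would no longer sum to 1, so a second normalization restores a valid attention distribution.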
By implementing the embodiment of the application, the dependency information in the text is encoded explicitly, and the distance matrix obtained after encoding is introduced into the attention mechanism, so that in the feature extraction process each node can interact with more other nodes while paying more attention to the nodes that are closer to it.
It should be noted that, in the embodiment of the present application, as shown in fig. 7, Dense Connection may be used between the layers of the attention mechanism, that is:

H_i = [H_1; H_2; \ldots; H_{i-1}] \qquad (9)

where H_i denotes the input of the i-th layer, obtained by concatenating the outputs H_1, …, H_{i-1} of all earlier layers, and [\,\cdot\,;\,\cdot\,] denotes the concatenation (splicing) operation. Using such a densely connected attention network increases the information interaction between layers and adds gradient back-propagation paths, which makes the network easier to train and alleviates the vanishing-gradient problem as the number of layers increases.
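The dense connection of formula (9) can be sketched as follows: the input to each layer is the concatenation of the outputs of all earlier layers. The layers here are toy linear-plus-tanh maps standing in for the patent's attention layers:

```python
import numpy as np

# Densely connected stack: layer i consumes [H_1; ...; H_{i-1}] (feature-wise
# concatenation) and emits an output of the base width d.
rng = np.random.default_rng(0)
L, d, num_layers = 5, 8, 3

H = [rng.standard_normal((L, d))]  # H_1: initial features
for i in range(1, num_layers):
    dense_input = np.concatenate(H, axis=-1)  # [H_1; ...; H_i], width grows
    W = rng.standard_normal((dense_input.shape[-1], d))
    H.append(np.tanh(dense_input @ W))        # illustrative layer transform

final = H[-1]
```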
Fig. 8 is a block diagram of an information extraction apparatus according to an embodiment of the present application. As shown in fig. 8, the information extracting apparatus may include: a first obtaining module 801, a generating module 802, a calculating module 803, a second obtaining module 804 and an information extracting module 805.
The first obtaining module 801 is configured to obtain a feature vector of a text to be processed and a dependency relationship diagram of the text to be processed.
The generating module 802 is configured to generate a corresponding distance matrix according to the dependency graph. In a possible implementation, the generating module 802 is specifically configured to: generating a corresponding adjacency matrix according to the dependency relationship graph; and performing densification processing on the adjacency matrix according to the position information of the nodes in the dependency relationship graph in the text to be processed to obtain a distance matrix.
Optionally, in an implementation manner, the generating module 802 is specifically configured to: determining the position information of the words corresponding to the nodes in the dependency relationship graph in the text to be processed; under the condition that the dependency relationship does not exist between two nodes in the dependency relationship graph, determining the distance value of a word corresponding to each of the two nodes in the text to be processed according to the position information, and setting the distance value used for representing the distance between the two nodes in the adjacency matrix as the distance value of the word corresponding to each of the two nodes in the text to be processed; in the case where there is a dependency relationship between two nodes in the dependency relationship graph, a distance value representing the distance between the two nodes in the adjacency matrix is set to a first value to obtain a distance matrix.
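The generating module's densification logic can be sketched as follows. This is an illustrative reading, not the patented implementation: the concrete first numerical value (taken as 1.0 here), the diagonal treatment, and the function name are all assumptions.

```python
import numpy as np

def build_distance_matrix(n, dep_edges, first_value=1.0):
    # Start from word-position gaps: node k corresponds to the k-th word
    # of the text, so unconnected nodes i and j get distance |i - j|.
    dist = np.abs(np.subtract.outer(np.arange(n), np.arange(n))).astype(float)
    # Node pairs joined by a dependency edge get the first value.
    for i, j in dep_edges:
        dist[i, j] = dist[j, i] = first_value
    # Assumed: a node's distance to itself is also the first value,
    # which keeps a later element-wise division well defined.
    np.fill_diagonal(dist, first_value)
    return dist

D = build_distance_matrix(4, [(0, 2)])  # one dependency edge between words 0 and 2
```

The result is dense (no zero entries off the dependency edges), which is what lets the attention mechanism later divide by it element-wise.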
The calculating module 803 is configured to calculate an attention value between any two words in the text to be processed by using an attention mechanism according to the feature vector and the distance matrix of the text to be processed.
In a possible implementation manner, the calculation module 803 is specifically configured to: determining a query vector and a key vector according to the feature vector of the text to be processed; carrying out double affine calculation according to the query vector and the key vector to obtain a first intermediate matrix, and carrying out matrix multiplication operation on the query vector and the key vector to obtain a second intermediate matrix; correcting data in the first intermediate matrix according to the distance matrix to obtain a third intermediate matrix; performing matrix addition operation on the second intermediate matrix and the third intermediate matrix to obtain a fourth intermediate matrix; and carrying out scaling, masking and regression function Softmax operation on the fourth intermediate matrix to obtain the attention value between any two words in the text to be processed.
Optionally, the calculating module 803 performs a division operation on the data in the first intermediate matrix and the data at the corresponding position in the distance matrix to obtain a third intermediate matrix.
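The first computation path of the calculating module 803 can be sketched end to end. This is a hedged illustration under assumptions: the projection matrices W_q, W_k, the biaffine tensor U, and the scale factor are invented names with toy shapes, the biaffine calculation is simplified to a single bilinear form, and masking is omitted for brevity.

```python
import numpy as np

def attention_variant_one(h, W_q, W_k, U, dist, scale):
    q, k = h @ W_q, h @ W_k                 # query / key vectors
    biaffine = q @ U @ k.T                  # first intermediate matrix
    dot = q @ k.T                           # second intermediate matrix
    corrected = biaffine / dist             # third: distance correction (division)
    scores = (dot + corrected) / scale      # fourth intermediate matrix, scaled
    # Softmax row by row (masking omitted in this sketch).
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
n, d = 4, 8
h = rng.standard_normal((n, d))
A = attention_variant_one(h, rng.standard_normal((d, d)),
                          rng.standard_normal((d, d)),
                          rng.standard_normal((d, d)),
                          np.ones((n, n)), np.sqrt(d))
```

Note the difference from the second path described below it: here the distance division happens before the softmax, on the biaffine term only.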
In another possible implementation manner, the calculation module 803 is specifically configured to: determining a query vector and a key vector according to the feature vector of the text to be processed; carrying out double affine calculation according to the query vector and the key vector to obtain a first intermediate matrix, and carrying out matrix multiplication operation on the query vector and the key vector to obtain a second intermediate matrix; performing matrix addition operation on the first intermediate matrix and the second intermediate matrix to obtain a third intermediate matrix; carrying out scaling, masking and regression function Softmax operation on the third intermediate matrix to obtain a corresponding attention matrix; and correcting and normalizing the data in the attention matrix according to the distance matrix to obtain the attention value between any two words in the text to be processed.
Optionally, the calculating module 803 performs a division operation on the data in the attention matrix and the data at the corresponding position in the distance matrix to obtain a fourth intermediate matrix, and performs a normalization process on the data in the fourth intermediate matrix to obtain the attention value between any two words in the text to be processed.
The second obtaining module 804 is configured to obtain semantic feature representation of the text to be processed according to the feature vector of the text to be processed and an attention value between any two words in the text to be processed. In a possible implementation manner, the second obtaining module 804 determines a value vector according to the feature vector of the text to be processed, and performs a product operation on the value vector and an attention value between any two words in the text to be processed to obtain a semantic feature representation of the text to be processed.
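The second obtaining module's product operation reduces to one matrix multiplication once the value vectors exist. A minimal sketch, assuming W_v is a learned value projection (the name and shapes are illustrative):

```python
import numpy as np

def semantic_representation(h, W_v, attn):
    v = h @ W_v        # value vectors derived from the text's feature vectors
    return attn @ v    # weight the values by the attention between word pairs

rng = np.random.default_rng(2)
h = rng.standard_normal((4, 8))
attn = np.full((4, 4), 0.25)   # toy uniform attention over 4 words
S = semantic_representation(h, rng.standard_normal((8, 8)), attn)
```

With uniform attention every word's representation becomes the same average of the value vectors; in practice the distance-scaled attention values make each row a different, dependency-aware mixture.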
The information extraction module 805 is configured to perform information extraction on the text to be processed according to the semantic feature representation of the text to be processed. In one possible implementation, the information extraction module 805 performs entity identification on the text to be processed based on the semantic feature representation to obtain identified entity information in the text to be processed.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
As shown in fig. 9, it is a block diagram of an electronic device according to the method for information extraction in the embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 9, the electronic apparatus includes: one or more processors 901, a memory 902, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of the GUI on an external input/output device (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 9, one processor 901 is taken as an example.
Memory 902 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the information extraction method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the information extraction method provided by the present application.
The memory 902, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the information extraction method in the embodiments of the present application. The processor 901 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory 902, that is, implements the information extraction method in the above method embodiment.
The memory 902 may include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device, and the like. Further, the memory 902 may include high speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 902 may optionally include memory located remotely from the processor 901, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device may further include: an input device 903 and an output device 904. The processor 901, the memory 902, the input device 903 and the output device 904 may be connected by a bus or other means, and fig. 9 illustrates the connection by a bus as an example.
The input device 903 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device, such as an input device like a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, etc. The output devices 904 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibrating motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service extensibility in traditional physical hosts and virtual private server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and no limitation is imposed herein as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (21)

1. An information extraction method, comprising:
acquiring a feature vector of a text to be processed and a dependency relationship graph of the text to be processed;
generating a corresponding distance matrix according to the dependency relationship graph;
calculating the attention value between any two words in the text to be processed by adopting an attention mechanism according to the feature vector of the text to be processed and the distance matrix;
acquiring semantic feature representation of the text to be processed according to the feature vector of the text to be processed and the attention value between any two words in the text to be processed;
and extracting information of the text to be processed according to the semantic feature representation of the text to be processed.
2. The method of claim 1, wherein the generating a corresponding distance matrix from the dependency graph comprises:
generating a corresponding adjacency matrix according to the dependency relationship graph;
and performing densification processing on the adjacency matrix according to the position information of the nodes in the dependency relationship graph in the text to be processed to obtain the distance matrix.
3. The method as claimed in claim 2, wherein the performing a densification process on the adjacency matrix according to the position information of the node in the dependency graph in the text to be processed to obtain the distance matrix includes:
determining the position information of the words corresponding to the nodes in the dependency relationship graph in the text to be processed;
under the condition that the dependency relationship does not exist between two nodes in the dependency relationship graph, determining the distance value of a word corresponding to each of the two nodes in the text to be processed according to the position information, and setting the distance value used for representing the distance between the two nodes in the adjacency matrix as the distance value of the word corresponding to each of the two nodes in the text to be processed;
in the case that the dependency relationship exists between two nodes in the dependency relationship graph, setting a distance value used for representing the distance between the two nodes in the adjacency matrix as a first numerical value to obtain the distance matrix.
4. The method of claim 1, wherein the calculating the attention value between any two words in the text to be processed by adopting an attention mechanism according to the feature vector of the text to be processed and the distance matrix comprises:
determining a query vector and a key vector according to the feature vector of the text to be processed;
carrying out double affine calculation according to the query vector and the key vector to obtain a first intermediate matrix, and carrying out matrix multiplication operation on the query vector and the key vector to obtain a second intermediate matrix;
correcting data in the first intermediate matrix according to the distance matrix to obtain a third intermediate matrix;
performing matrix addition operation on the second intermediate matrix and the third intermediate matrix to obtain a fourth intermediate matrix;
and carrying out scaling, masking and regression function Softmax operation on the fourth intermediate matrix to obtain the attention value between any two words in the text to be processed.
5. The method of claim 4, wherein the modifying the data in the first intermediate matrix according to the distance matrix to obtain a third intermediate matrix comprises:
and performing division operation on the data in the first intermediate matrix and the data at the corresponding position in the distance matrix to obtain a third intermediate matrix.
6. The method of claim 1, wherein the calculating the attention value between any two words in the text to be processed by adopting an attention mechanism according to the feature vector of the text to be processed and the distance matrix comprises:
determining a query vector and a key vector according to the feature vector of the text to be processed;
carrying out double affine calculation according to the query vector and the key vector to obtain a first intermediate matrix, and carrying out matrix multiplication operation on the query vector and the key vector to obtain a second intermediate matrix;
performing matrix addition operation on the first intermediate matrix and the second intermediate matrix to obtain a third intermediate matrix;
carrying out scaling, masking and regression function Softmax operation on the third intermediate matrix to obtain a corresponding attention matrix;
and correcting and normalizing the data in the attention matrix according to the distance matrix to obtain the attention value between any two words in the text to be processed.
7. The method of claim 6, wherein the modifying and normalizing the data in the attention matrix according to the distance matrix to obtain the attention value between any two words in the text to be processed comprises:
performing division operation on data in the attention matrix and data at a corresponding position in the distance matrix to obtain a fourth intermediate matrix;
and carrying out normalization processing on the data in the fourth intermediate matrix to obtain an attention value between any two words in the text to be processed.
8. The method of any one of claims 1 to 7, wherein the obtaining of the semantic feature representation of the text to be processed according to the feature vector of the text to be processed and the attention value between any two words in the text to be processed comprises:
determining a value vector according to the feature vector of the text to be processed;
and performing product operation on the value vector and the attention value between any two words in the text to be processed to obtain semantic feature representation of the text to be processed.
9. The method of any one of claims 1 to 7, wherein the extracting information of the text to be processed according to the semantic feature representation of the text to be processed comprises:
and performing entity identification on the text to be processed based on the semantic feature representation to obtain identified entity information in the text to be processed.
10. An information extraction apparatus comprising:
the first acquisition module is used for acquiring a feature vector of a text to be processed and a dependency relationship graph of the text to be processed;
the generating module is used for generating a corresponding distance matrix according to the dependency relationship graph;
the calculation module is used for calculating the attention value between any two words in the text to be processed by adopting an attention mechanism according to the feature vector of the text to be processed and the distance matrix;
the second acquisition module is used for acquiring semantic feature representation of the text to be processed according to the feature vector of the text to be processed and the attention value between any two words in the text to be processed;
and the information extraction module is used for extracting the information of the text to be processed according to the semantic feature representation of the text to be processed.
11. The apparatus of claim 10, wherein the generating module is specifically configured to:
generating a corresponding adjacency matrix according to the dependency relationship graph;
and performing densification processing on the adjacency matrix according to the position information of the nodes in the dependency relationship graph in the text to be processed to obtain the distance matrix.
12. The apparatus of claim 11, wherein the generation module is specifically configured to:
determining the position information of the words corresponding to the nodes in the dependency relationship graph in the text to be processed;
when the dependency relationship does not exist between two nodes in the dependency relationship graph, determining distance values of words corresponding to the two nodes in the text to be processed according to the position information, and setting the distance values used for representing the distance between the two nodes in the adjacency matrix as the distance values of the words corresponding to the two nodes in the text to be processed;
in the case that the dependency relationship exists between two nodes in the dependency relationship graph, setting a distance value used for representing the distance between the two nodes in the adjacency matrix as a first numerical value to obtain the distance matrix.
13. The apparatus of claim 10, wherein the computing module is specifically configured to:
determining a query vector and a key vector according to the feature vector of the text to be processed;
carrying out double affine calculation according to the query vector and the key vector to obtain a first intermediate matrix, and carrying out matrix multiplication operation on the query vector and the key vector to obtain a second intermediate matrix;
correcting data in the first intermediate matrix according to the distance matrix to obtain a third intermediate matrix;
performing matrix addition operation on the second intermediate matrix and the third intermediate matrix to obtain a fourth intermediate matrix;
and carrying out scaling, masking and regression function Softmax operation on the fourth intermediate matrix to obtain the attention value between any two words in the text to be processed.
14. The apparatus of claim 13, wherein the computing module is specifically configured to:
and performing division operation on the data in the first intermediate matrix and the data at the corresponding position in the distance matrix to obtain a third intermediate matrix.
15. The apparatus of claim 10, wherein the computing module is specifically configured to:
determining a query vector and a key vector according to the feature vector of the text to be processed;
carrying out double affine calculation according to the query vector and the key vector to obtain a first intermediate matrix, and carrying out matrix multiplication operation on the query vector and the key vector to obtain a second intermediate matrix;
performing matrix addition operation on the first intermediate matrix and the second intermediate matrix to obtain a third intermediate matrix;
carrying out scaling, masking and regression function Softmax operation on the third intermediate matrix to obtain a corresponding attention matrix;
and correcting and normalizing the data in the attention matrix according to the distance matrix to obtain the attention value between any two words in the text to be processed.
16. The apparatus of claim 15, wherein the computing module is specifically configured to:
performing division operation on data in the attention matrix and data at a corresponding position in the distance matrix to obtain a fourth intermediate matrix;
and carrying out normalization processing on the data in the fourth intermediate matrix to obtain the attention value between any two words in the text to be processed.
17. The apparatus according to any one of claims 10 to 16, wherein the second obtaining module is specifically configured to:
determining a value vector according to the feature vector of the text to be processed;
and performing product operation on the value vector and the attention value between any two words in the text to be processed to obtain semantic feature representation of the text to be processed.
18. The apparatus according to any one of claims 10 to 16, wherein the information extraction module is specifically configured to:
and performing entity identification on the text to be processed based on the semantic feature representation to obtain identified entity information in the text to be processed.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 9.
20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1 to 9.
21. A computer program product comprising a computer program, wherein the computer program realizes the steps of the method of any one of claims 1 to 9 when executed by a processor.
CN202211579222.4A 2022-12-08 2022-12-08 Information extraction method and device Pending CN115952790A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211579222.4A CN115952790A (en) 2022-12-08 2022-12-08 Information extraction method and device

Publications (1)

Publication Number Publication Date
CN115952790A true CN115952790A (en) 2023-04-11

Family

ID=87286871


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination