CN114064852A - Method and device for extracting relation of natural language, electronic equipment and storage medium - Google Patents

Method and device for extracting relation of natural language, electronic equipment and storage medium

Info

Publication number
CN114064852A
Authority
CN
China
Prior art keywords
vector
word
features
convolution
entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111230984.9A
Other languages
Chinese (zh)
Inventor
嵇望
朱鹏飞
梁青
陈默
安毫亿
王伟凯
张一驰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Yuanchuan New Technology Co ltd
Original Assignee
Hangzhou Yuanchuan New Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Yuanchuan New Technology Co ltd filed Critical Hangzhou Yuanchuan New Technology Co ltd
Priority to CN202111230984.9A priority Critical patent/CN114064852A/en
Publication of CN114064852A publication Critical patent/CN114064852A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to a method and device for extracting relations from natural language, an electronic device and a storage medium, belonging to the technical field of natural language processing. The method comprises the following steps: identifying an entity from the original sentence through an entity identification model; analyzing the absolute position information of the entity and the relative position information between each word and the entity; calculating a position vector for each word; performing word vectorization on the original sentence to obtain a word vector for each word, and splicing the word vector and the position vector to obtain a fusion vector; inputting the fusion vector into a mixed convolution layer for feature extraction, wherein the mixed convolution layer comprises a common causal convolution and a void causal convolution; sending the extracted features into an attention layer, inputting the features output by the attention layer into a pyramid pooling layer, inputting the result output by the pyramid pooling layer into a full connection layer, and outputting the relation category through Softmax. The embodiments of the application improve the model from multiple angles, raising the performance of the whole model and thereby the accuracy of the model prediction results.

Description

Method and device for extracting relation of natural language, electronic equipment and storage medium
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a method and an apparatus for extracting a relationship of a natural language, an electronic device, and a storage medium.
Background
Information extraction is a research focus and hotspot of natural language processing. It mainly comprises entity extraction, relation extraction and event extraction, and has great research value in applications such as machine translation, knowledge graph construction, question-answering systems and intelligent search.
Information extraction can convert massive unstructured data into structured triple data for storage. Its two key technologies are entity extraction and relation extraction, and relation extraction based on the pipeline mode is its main form: on the basis of entity recognition, the text information is modeled to further judge the specific relation types between all entity pairs that may have a relation, so that effective semantic knowledge between entities is extracted automatically.
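By way of illustration only, the triple produced by such a pipeline can be sketched as below; the example sentence, the field names and the relation label are assumptions and not part of the application.

```python
# Illustrative sketch of the triple structure produced by pipeline-style
# information extraction: entities are recognized first, then the relation
# between an entity pair is classified. Names and the relation label are assumed.
from typing import NamedTuple

class Triple(NamedTuple):
    head_entity: str
    relation: str
    tail_entity: str

# e.g. a triple that might be extracted from "Wang XX created XX Group in 1998"
triple = Triple(head_entity="Wang XX", relation="created", tail_entity="XX Group")
```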
Because indexes such as accuracy are obviously better for supervised training than for unsupervised methods, current mainstream relation extraction mostly adopts supervised methods. The models most commonly used for the entity relation extraction task include the Convolutional Neural Network (CNN), the Recurrent Neural Network (RNN) and the Long Short-Term Memory network (LSTM). The disadvantage of RNN-based relation extraction models is that each node has a word vector and a matrix, so too many parameters need to be trained; as the time span and the network depth increase, the amount and time of computation grow, and gradient explosion and gradient vanishing easily occur, so the model is unstable; moreover, the position information of the two entities in the sentence is not considered. The disadvantage of CNN-based relation extraction models is that the convolution kernel is usually not chosen too large and the time sequence cannot be modeled, so long sentences cannot be modeled and long-distance semantic information cannot be captured, leaving the long-distance dependency between the two entities unresolved. The disadvantage of LSTM-based relation extraction models is that they cannot be computed in parallel, so the operation speed cannot be increased by means of a GPU, and gradient vanishing still occurs as the sequence length and the network depth increase. Therefore, relation extraction models in the prior art perform poorly, which leads to low accuracy of model prediction.
Disclosure of Invention
In a first aspect, an embodiment of the present application provides a method for extracting a relationship in a natural language, including: identifying an entity from the original sentence through an entity identification model; analyzing absolute position information of the entity and relative position information between each word in the original sentence and the entity; calculating a position vector of each word according to the absolute position information, the relative position information and the length information of the original sentence; performing word vectorization representation on the original sentence to obtain a word vector of each word, and splicing the word vector and the position vector to obtain a fusion vector; inputting the fusion vector into a mixed convolution layer for feature extraction, wherein the mixed convolution layer comprises a common causal convolution and a void causal convolution; sending the extracted features into an attention layer, inputting the features output by the attention layer into a pyramid pooling layer, and performing down-sampling on the features through pyramid pooling to obtain dimension reduction features; inputting the dimension reduction features into a full connection layer, and outputting the relation category through Softmax.
In some embodiments, said calculating a position vector for each word based on said absolute position information, said relative position information, and said length information of said original sentence comprises: calculating a relative distance value of each word and the entity according to the absolute position information and the relative position information; if the relative distance value is an even number, sinusoidal position coding is used, and the position vector calculation formula of the word is as follows:
PE(pos, 2i) = sin(pos / 1000^(2i / len(sentence)))
if the relative distance value is an odd number, cosine position coding is used, and a position vector calculation formula of the word is as follows:
PE(pos, 2i+1) = cos(pos / 1000^((2i+1) / len(sentence)))
wherein PE is the position vector, pos is the absolute position information of the entity in the original sentence, i is an integer greater than 0, 2i and 2i+1 respectively represent an even relative distance value and an odd relative distance value, and len(sentence) represents the length information of the original sentence.
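A minimal sketch of this position coding follows; the helper name and the plain parity test on the relative distance value are assumptions for illustration only.

```python
# Hedged sketch of the sine/cosine position coding described above.
import math

def position_code(pos: int, rel_dist: int, sent_len: int) -> float:
    # pos: absolute position of the entity in the sentence,
    # rel_dist: relative distance between the word and that entity,
    # sent_len: length of the original sentence (entities counted with length 1).
    if rel_dist % 2 == 0:                                   # even -> sinusoidal coding
        return math.sin(pos / 1000 ** (rel_dist / sent_len))
    return math.cos(pos / 1000 ** (rel_dist / sent_len))    # odd -> cosine coding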
In some embodiments, the performing word vectorization on the original sentence to obtain a word vector of each word includes: performing word vectorization representation on the original sentence through a BERT model to obtain a dynamic word vector; performing Word vectorization representation on the original statement through a Word2vec model to obtain a static Word vector, wherein the dimensions of the static Word vector and the dynamic Word vector are kept consistent; the word vector is calculated by the following formula:
vector = weight_1 * vector_bert + weight_2 * vector_word2vec
wherein vector is the word vector, vector_bert represents the dynamic word vector, vector_word2vec represents the static word vector, weight_1 and weight_2 respectively represent a preset first weight value and a preset second weight value, and the sum of the first weight value and the second weight value is 1.
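As a sketch only (the example weight values 0.6/0.4 and the function name are assumptions), the weighted fusion can be written as:

```python
# Sketch of the weighted fusion of the dynamic (BERT) and static (Word2vec)
# word vectors of one word; both must share one dimension and the two preset
# weights must sum to 1.
import numpy as np

def fuse_word_vectors(vector_bert: np.ndarray, vector_word2vec: np.ndarray,
                      weight_1: float = 0.6, weight_2: float = 0.4) -> np.ndarray:
    assert vector_bert.shape == vector_word2vec.shape, "dimensions must be kept consistent"
    assert abs(weight_1 + weight_2 - 1.0) < 1e-9, "the two weights must sum to 1"
    return weight_1 * vector_bert + weight_2 * vector_word2vec
```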
In some embodiments, the splicing of the word vector and the position vector to obtain a fusion vector includes: setting the dimensionality of the position vector as m and the dimensionality of the word vector as n; for the original sentence with length L, the dimension of the transformed fusion vector is [L, N], where N = n + m.
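A quick shape check of this splicing step, with assumed sizes and placeholder values:

```python
# Word vectors [L, n] concatenated with position vectors [L, m] give the
# fusion matrix [L, N] with N = n + m. Concrete sizes are assumptions.
import numpy as np

L, n, m = 11, 768, 2
word_vectors = np.random.rand(L, n)
position_vectors = np.random.rand(L, m)
fusion = np.concatenate([word_vectors, position_vectors], axis=1)
print(fusion.shape)   # (11, 770), i.e. [L, n + m]
```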
In some of these embodiments, the common causal convolution includes at least two different convolution kernels in each dimension.
In some embodiments, the inputting of the fused vector into a hybrid convolutional layer for feature extraction includes: extracting sentence local features through the common causal convolution; extracting sentence long-distance features through the void causal convolution; and passing the sentence local features and the sentence long-distance features through an activation function to obtain nonlinear features, which are used as the extracted features.
In some of these embodiments, after the extracted features are sent into the attention layer, the method further comprises: using a channel attention domain to rank the importance of the features generated by different convolution kernels of the same dimension, and assigning weights to the features accordingly.
In a second aspect, an embodiment of the present application provides a relationship extraction apparatus for natural language, including:
the recognition module is used for recognizing an entity from the original sentence through the entity recognition model;
the analysis module is used for analyzing the absolute position information of the entity and the relative position information between each word in the original sentence and the entity;
the calculation module is used for calculating the position vector of each word according to the absolute position information, the relative position information and the length information of the original statement;
the fusion module is used for performing word vectorization representation on the original sentence to obtain a word vector of each word, and splicing the word vector and the position vector to obtain a fusion vector;
the extraction module is used for inputting the fusion vector into a mixed convolution layer for feature extraction, wherein the mixed convolution layer comprises a common causal convolution and a void causal convolution; sending the extracted features into an attention layer, inputting the features output by the attention layer into a pyramid pooling layer, and performing down-sampling on the features through pyramid pooling to obtain dimension reduction features; inputting the dimension reduction features into a full connection layer, and outputting the relation category through Softmax.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the computer program to perform any one of the methods described above.
In a fourth aspect, an embodiment of the present application provides a storage medium, in which a computer program is stored, where the computer program is configured to execute any one of the methods described above when the computer program runs.
Compared with the related art, in which a convolutional neural network cannot utilize position information, the embodiments of the application utilize the relative position information, the absolute position information and the length information of the original sentence in the feature construction stage, and design a position vector using sine and cosine position coding; meanwhile, introducing the fusion vector can improve the performance of the whole model; the local features and the long-distance features of the sentences are extracted by a mixed convolution layer (comprising a common causal convolution and a void causal convolution); in addition, traditional pooling cannot output feature vectors of fixed dimension, so input texts must be padded and truncated at the input end, which damages the length of the original sentences and introduces unnecessary redundant information during model training, whereas the pyramid pooling layer used here converts sentences of any size into feature vectors of fixed size. Therefore, the embodiments of the application improve the model from multiple angles, raising the performance of the whole model and thereby the accuracy of the model prediction results.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart of a method of relationship extraction for natural language according to an embodiment of the present application;
FIG. 2 is a flow chart of an exemplary method for extracting relationships in natural language according to an embodiment of the present application;
FIG. 3 is a sample of a generic causal convolution with a convolution kernel height of 2 according to an embodiment of the present application;
FIG. 4 is a sample of a generic causal convolution with a convolution kernel height of 3 according to an embodiment of the present application;
FIG. 5 is a sample void causal convolution with a convolution kernel height of 7 according to an embodiment of the present application;
FIG. 6 is a schematic representation of a channel attention domain according to an embodiment of the present application;
FIG. 7 is a diagram illustrating a relationship between a hybrid convolutional layer and a pyramid pooling layer according to an embodiment of the present application;
fig. 8 is a block diagram showing a configuration of a relationship extraction apparatus for natural language according to an embodiment of the present application;
fig. 9 is an internal structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. Reference to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The present application is directed to the use of the terms "including," "comprising," "having," and any variations thereof, which are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. Reference to "connected," "coupled," and the like in this application is not intended to be limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. Reference herein to "a plurality" means greater than or equal to two. "and/or" describes an association relationship of associated objects, meaning that three relationships may exist, for example, "A and/or B" may mean: a exists alone, A and B exist simultaneously, and B exists alone. Reference herein to the terms "first," "second," "third," and the like, are merely to distinguish similar objects and do not denote a particular ordering for the objects.
Fig. 1 is a flowchart of a method for extracting a relationship of a natural language according to an embodiment of the present application, as shown in fig. 1, the method including the steps of:
s101: identifying an entity from the original sentence through an entity identification model;
s102: analyzing absolute position information of the entity and relative position information between each word and the entity in the original sentence;
s103: calculating a position vector of each word according to the absolute position information, the relative position information and the length information of the original sentence;
s104: performing word vectorization representation on an original sentence to obtain a word vector of each word, and splicing the word vector and a position vector to obtain a fusion vector;
s105: inputting the fusion vector into a mixed convolution layer for feature extraction, wherein the mixed convolution layer comprises a common causal convolution and a void causal convolution;
s106: sending the extracted features into an Attention (Attention) layer, inputting the features output by the Attention layer into a pyramid pooling layer, and performing down-sampling on the features through pyramid pooling to obtain dimension-reduced features;
s107: and inputting the dimension reduction features into the full connection layer, and outputting the relation category through Softmax.
Thus, the overall model of the embodiments of the present application includes an entity identification model, a mixed convolution layer (comprising a common causal convolution and a void causal convolution), an attention layer, a pyramid pooling layer and a full connection layer, and in some embodiments may further include a BERT model and a Word2vec model. The embodiments of the application improve the model from multiple angles, raising the performance of the whole model and thereby the accuracy of the model prediction results.
As an example, step S104 includes: performing word vectorization representation on an original sentence through a BERT model to obtain a dynamic word vector; performing Word vectorization representation on an original sentence through a Word2vec model to obtain a static Word vector; the dynamic word vector and the static word vector are multiplied by different weights respectively and then added to obtain the word vector in step S104.
As an example, the above-mentioned common causal convolution contains at least two different convolution kernels in each dimension, and step S105 includes: extracting sentence local features through the common causal convolution; extracting sentence long-distance features through the void causal convolution; and passing the sentence local features and the sentence long-distance features through an activation function to obtain nonlinear features, which are used as the extracted features in step S106, so that the learning capability of the model can be enhanced.
In order to more clearly illustrate the examples of the present application, a more detailed and complete process is schematically illustrated below.
It should be noted that, first, the entity category and the relationship category need to be defined, and then the relationship extraction process is executed. Fig. 2 is a flowchart illustrating a method for extracting a relationship in a natural language according to an embodiment of the present application, where the method includes the following steps, as shown in fig. 2:
s201: the original sentence is input.
S202: and identifying the entity from the input original sentence through an entity identification model, and obtaining absolute position information of the entity.
S203: performing word vectorization on the input original sentence through a BERT model and a Word2vec model respectively to obtain two word vectors (namely the dynamic word vector and the static word vector), and combining the two word vectors of the same sentence into one word vector after multiplying each by its corresponding weight. It should be noted that the dimensions of the two word vectors are kept consistent, and the combined word vector is calculated as follows:
vector = weight_1 * vector_bert + weight_2 * vector_word2vec
wherein vector_bert represents the dynamic word vector, vector_word2vec represents the static word vector, and weight_1 and weight_2 respectively represent a preset first weight value and a preset second weight value.
S204: calculating the relative position information between each word in the input original sentence and the entity according to the absolute position information of the entity;
s205: calculating the position vector of each word in a sine and cosine position coding mode according to the relative position information, the absolute position information and the length information of the original sentence; and splicing the word vector and the position vector of each word to obtain a fusion vector. In the step, the relative distance value of each word and the entity needs to be calculated, and if the relative distance value is an even number, sinusoidal position coding is used; if the relative distance value is odd, cosine position coding is used, and the specific calculation formula of the position vector PE is as follows:
PE(pos, 2i) = sin(pos / 1000^(2i / len(sentence)))
PE(pos, 2i+1) = cos(pos / 1000^((2i+1) / len(sentence)))
wherein pos is the absolute position information of the entity in the original sentence, i is an integer greater than 0, 2i and 2i+1 are relative distance values (2i representing an even value and 2i+1 an odd value), and len(sentence) represents the length information of the original sentence.
For example, suppose the original sentence is "Wang XX created XX Group in the world in 1998", and that "Wang XX" is identified as a name entity and "XX Group" is identified as a company entity; the position vector of the word "at" is then calculated as follows:
Each entity is treated as a whole; that is, the length of an entity is defined as 1, so the length of the whole sentence is 11. The absolute position of "Wang XX" in the sentence is 1 and the absolute position of "XX Group" is 11; the relative distance of "at" from "Wang XX" is 6 (even) and its relative distance from "XX Group" is 5 (odd), so the position vector of "at" is:
[PE(1,6) = sin(1/1000^(6/11)), PE(11,5) = cos(11/1000^(5/11))].
from the above description, the position vector of the "at" word is 2-dimensional, and assuming that the word vector dimension of the "at" word is N, the dimension N of the fused vector of the "at" word is 2+ N.
Thus, for an original sentence of length L, assuming that the dimension of the position vector of each word is m and the dimension of the word vector is n, the dimension of the transformed fused vector of the original sentence is [L, N], where N = n + m.
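The worked example above can be checked with the following sketch; only the position-vector components follow the formulas given earlier, while the word-vector dimension and its values are placeholders.

```python
# Reproduces the position vector of the word "at" from the example sentence
# "Wang XX created XX Group in the world in 1998" (entities counted with
# length 1, sentence length 11) and shows the fused dimension N = n + m.
import math
import numpy as np

sent_len = 11
pe_at = [
    math.sin(1 / 1000 ** (6 / sent_len)),   # PE(1, 6): distance 6 (even) to "Wang XX" at position 1
    math.cos(11 / 1000 ** (5 / sent_len)),  # PE(11, 5): distance 5 (odd) to "XX Group" at position 11
]

n = 768                                     # assumed word-vector dimension
word_vec_at = np.random.rand(n)             # placeholder word vector of "at"
fused_at = np.concatenate([word_vec_at, pe_at])
print(len(pe_at), fused_at.shape)           # 2 (770,) -> N = n + 2 for this word
```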
S206: sending the fusion vector into the mixed convolution layer for feature extraction, wherein the mixed convolution layer comprises a common causal convolution and a void causal convolution; the common causal convolution is used to extract sentence local features, and the void causal convolution is used to enlarge the receptive field and extract the long-distance features of the sentence.
The mixed convolution layer comprises a mixed convolution and an activation function; the mixed convolution comprises a common causal convolution and a void causal convolution, and the result calculated by the mixed convolution is passed through the activation function to obtain a nonlinear feature representation that serves as the output of the mixed convolution layer. Introducing this nonlinearity into the network can enhance its learning capacity.
For the common causal convolution, the convolution kernels in each dimension include m (m ≥ 2) different kernels; this embodiment is illustrated with m = 2. The Height of the convolution kernels is [2, 3, 7], and the Width of the convolution kernels is equal to the dimension N of the fusion vector. FIG. 3 is a common causal convolution sample with a convolution kernel height of 2 according to an embodiment of the present application, FIG. 4 is a common causal convolution sample with a convolution kernel height of 3 according to an embodiment of the present application, and FIG. 5 is a void causal convolution sample with a convolution kernel height of 7 according to an embodiment of the present application.
The common causal convolution and the void causal convolution are in essence one-dimensional convolutions: they can only slide up and down, and the moving step length is 1. Referring to FIG. 3, FIG. 4 and FIG. 5, the fusion vector passes through convolution kernels of Height [2, 3, 7] and Width N. Because the parameters in the two convolution kernels of the same size differ, convolution kernels of different sizes respectively generate features of dimensions [M-1, 1], [M-2, 1] and [M-6, 1] (where M corresponds to the sentence length), each with m = 2, so by calculation the mixed convolution extracts three groups of features [M-1, 1, 2], [M-2, 1, 2] and [M-6, 1, 2], where [M-1, 1, 2] and [M-2, 1, 2] represent local features and [M-6, 1, 2] represents long-distance features.
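A sketch of the mixed convolution layer under stated assumptions follows; realising the height-7 void (dilated) kernel as kernel size 3 with dilation 3, using ReLU as the activation, and using un-padded convolution so that the output lengths are M-1, M-2 and M-6 are all assumptions made to match the dimensions above, not the application's reference implementation.

```python
# Sketch of the mixed convolution layer: common causal convolutions of kernel
# height 2 and 3 plus a void (dilated) causal convolution with effective height 7,
# each with m = 2 kernels, applied to the fused matrix of shape [M, N].
import torch
import torch.nn as nn

class MixedConv(nn.Module):
    def __init__(self, fused_dim: int, m: int = 2):
        super().__init__()
        self.conv_h2 = nn.Conv1d(fused_dim, m, kernel_size=2)              # -> length M-1 (local)
        self.conv_h3 = nn.Conv1d(fused_dim, m, kernel_size=3)              # -> length M-2 (local)
        self.conv_h7 = nn.Conv1d(fused_dim, m, kernel_size=3, dilation=3)  # effective height 7 -> M-6 (long distance)
        self.act = nn.ReLU()                                               # activation assumed to be ReLU

    def forward(self, x: torch.Tensor):
        x = x.transpose(1, 2)        # [batch, M, N] -> [batch, N, M] for Conv1d
        return [self.act(conv(x)) for conv in (self.conv_h2, self.conv_h3, self.conv_h7)]

features = MixedConv(fused_dim=770)(torch.randn(1, 30, 770))    # M = 30 assumed
print([f.shape for f in features])   # [1, 2, 29], [1, 2, 28], [1, 2, 24]
```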
It should be noted that, if the sample sentences are all at the level of long text or paragraphs, ordinary convolution and pooling can be used to perform feature extraction and feature dimension reduction on the sentences before the mixed convolution layer.
S207: inputting the feature representation output by the mixed convolution layer into the attention layer, learning the importance degree of each element of the sequence by means of the attention mechanism, and then merging the elements according to their importance to obtain a semantic code.
Feature extraction by the hybrid convolutional layer is equivalent to using different convolution kernels to generate different channel information, so this embodiment uses a channel attention domain. FIG. 6 is a representation of a channel attention domain according to an embodiment of the present application; such attention domains are distributed across the different channels and are expressed as different degrees of attention to the channel information generated by different convolution kernels. Referring to FIG. 6, X, U and X' all represent feature matrices, N represents the dimension of the fusion vector, and H represents the sentence length; F_tr represents an ordinary convolution operation; F_sq(·) denotes the Squeeze operation, i.e. global average pooling; F_ex(·, W) denotes the Excitation operation, comprising a full connection layer and a sigmoid function and outputting the importance of each feature channel; and F_scale(·, ·) denotes the Reweight operation, in which the values output by the Excitation operation are weighted channel by channel onto the previous features by multiplication.
A convolution kernel of one size generates a feature map of [H, 1, 2], where H represents the Height; the corresponding effective channel attention domain is a matrix of [1, 1, 2], in which each position is a weight for all pixels of the corresponding channel of the original feature map, and channel-wise multiplication is carried out during calculation. Thus the weight of each channel should differ: important channels are emphasized and unimportant channels are suppressed. A deep network method can be applied, for example a neural network is used to train a parameter of size 1 × C, where C is the number of channels, which is then multiplied with the original channels so that each channel is weighted according to its importance.
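A minimal sketch of this channel attention, in the style of a Squeeze-and-Excitation block, is given below; the single fully connected layer follows the Squeeze/Excitation/Reweight description above, while the layer size and the rest of the wiring are assumptions.

```python
# Sketch of the channel attention domain applied to one feature group of shape
# [batch, channels, length]: squeeze by global average pooling, excite with a
# full connection layer plus sigmoid, then reweight channel by channel.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.excite = nn.Sequential(
            nn.Linear(channels, channels),   # F_ex: full connection layer ...
            nn.Sigmoid(),                    # ... plus sigmoid, one weight per channel
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = x.mean(dim=-1)                   # F_sq: global average pooling over the length
        w = self.excite(s)                   # importance of each feature channel in (0, 1)
        return x * w.unsqueeze(-1)           # F_scale: channel-wise reweighting

weighted = ChannelAttention(channels=2)(torch.randn(1, 2, 29))
print(weighted.shape)                        # torch.Size([1, 2, 29])
```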
S208: and inputting the semantic coding representation output by the attention layer into the pyramid pooling layer, and performing down-sampling on the features through pyramid pooling to obtain the dimension reduction features.
According to the description of step S206, the feature dimensions obtained after the fusion vector passes through the mixed convolution and the activation function are [M-1, 1, 2], [M-2, 1, 2] and [M-6, 1, 2], which is equivalent to dividing a sentence into three groups of features. If a certain layer of the pyramid needs to output x × 1 features, pooling with a window of size [W, H/x] is needed, where W is equal to the dimension N of the fusion vector and H represents the sentence length. FIG. 7 is a schematic diagram of the relationship between the mixed convolution layer and the pyramid pooling layer according to an embodiment of the present application. As shown in FIG. 7, the feature groups are arranged from large to small according to the first dimension, and the feature vectors (feature groups) of different dimensions are uniformly divided into 8 blocks, 4 blocks and 1 block according to the size of the first dimension; the maximum value of each block is calculated to obtain one output neuron. When the length of a feature vector is insufficient, the number of blocks is rounded up and an edge filling (padding) operation is performed on the insufficient part, so that the feature dimension output by the pyramid pooling layer is [13, 2].
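Read this way, the pyramid pooling and the subsequent classification step (S209 below) can be sketched as follows; the adaptive-pooling shortcut, the largest-group-first block assignment and the number of relation categories are assumptions.

```python
# Sketch of pyramid max pooling over the three feature groups from the attention
# layer (split into 8, 4 and 1 blocks, largest group first, giving a fixed [13, 2]
# feature), followed by the full connection layer and Softmax of step S209.
import torch
import torch.nn as nn
import torch.nn.functional as F

def pyramid_pool(groups, bins=(8, 4, 1)):
    # groups: tensors of shape [batch, channels, length]; the longest gets 8 blocks.
    ordered = sorted(groups, key=lambda g: g.shape[-1], reverse=True)
    pooled = [F.adaptive_max_pool1d(g, b) for g, b in zip(ordered, bins)]
    return torch.cat(pooled, dim=-1)                       # [batch, 2, 13]

num_relations = 10                                         # assumed number of relation categories
head = nn.Sequential(nn.Flatten(), nn.Linear(2 * 13, num_relations), nn.Softmax(dim=-1))

groups = [torch.randn(1, 2, 29), torch.randn(1, 2, 28), torch.randn(1, 2, 24)]
probs = head(pyramid_pool(groups))                         # relation-category probabilities
print(probs.argmax(dim=-1))                                # predicted relation category
```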
When the network has many layers and the input is a long sentence (paragraph level), convolution and pooling can be performed up to the last few layers of the network; that is, pyramid pooling is performed just before the network is connected to the full connection layer, so that sentences of any size can be converted into feature vectors of fixed size.
It should be noted that parameters of the model, such as the number of network layers, the size of the convolution kernels, the parameters in the convolution kernels and the number of pyramid pooling blocks, may all be changed according to the actual situation. In addition, the entity recognition model may be any existing model, such as an HMM (Hidden Markov Model), a CRF (Conditional Random Fields) model, a BiLSTM (Bi-directional Long Short-Term Memory) model, a BiLSTM + CRF model, and so on.
S209: and inputting the dimension reduction features into the full connection layer, and outputting the relation category through Softmax.
In summary, the embodiments of the present application have the following advantages:
firstly, the conventional convolutional neural network cannot utilize position information, but in the feature construction stage, the embodiment utilizes relative position information and absolute position information, utilizes length information of an original statement, and also utilizes sine and cosine position coding to design a position vector;
secondly, when representing word vectors, the dynamic word vector extracted by the BERT model and the static word vector extracted by the Word2vec model are fused, wherein the BERT model can address word ambiguity and out-of-vocabulary (OOV) words, and introducing the fusion vector can improve the performance of the whole model;
thirdly, the local features and long-distance features of the sentences are extracted by the mixed convolution layer (comprising a common causal convolution and a void causal convolution);
fourthly, a channel attention domain is used to rank the importance of the features generated by different convolution kernels of the same dimension and to assign weights to the features;
fifthly, traditional pooling cannot output feature vectors of fixed dimension, so input texts need to be padded and truncated at the input end, which damages the length of the original sentences and introduces unnecessary redundant information during model training, whereas the pyramid pooling layer converts sentences of any size into feature vectors of fixed size.
It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.
An embodiment of the present application further provides a relationship extraction device for natural language, and fig. 8 is a block diagram of a structure of a relationship extraction device for natural language according to an embodiment of the present application, and as shown in fig. 8, the device includes: the system comprises an identification module 1, an analysis module 2, a calculation module 3, a fusion module 4 and an extraction module 5.
The recognition module 1 is used for recognizing an entity from an original sentence through an entity recognition model;
the analysis module 2 is used for analyzing the absolute position information of the entity and the relative position information between each word and the entity in the original sentence;
the calculation module 3 is used for calculating the position vector of each word according to the absolute position information, the relative position information and the length information of the original sentence;
the fusion module 4 is used for performing word vectorization representation on the original sentence to obtain a word vector of each word, and splicing the word vector and the position vector to obtain a fusion vector;
the extraction module 5 is used for inputting the fusion vector into a mixed convolution layer for feature extraction, wherein the mixed convolution layer comprises a common causal convolution and a cavity causal convolution; sending the extracted features into an attention layer, inputting the features output by the attention layer into a pyramid pooling layer, and performing down-sampling on the features through pyramid pooling to obtain dimension reduction features; and inputting the dimension reduction features into the full connection layer, and outputting the relation category through Softmax.
The above modules may be functional modules or program modules, and may be implemented by software or hardware. For a module implemented by hardware, the modules may be located in the same processor; or the modules can be respectively positioned in different processors in any combination.
In addition, in combination with the method for extracting the relationship of the natural language in the foregoing embodiment, the embodiment of the present application may provide a storage medium to implement. The storage medium having stored thereon a computer program; the computer program, when executed by a processor, implements the method for extracting a relationship in a natural language according to any one of the above embodiments.
An embodiment of the present application also provides an electronic device, which may be a terminal. The electronic device comprises a processor, a memory, a network interface, a display screen and an input device which are connected through a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic equipment comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the electronic device is used for connecting and communicating with an external terminal through a network. The computer program is executed by a processor to implement a method of relational extraction in natural language. The display screen of the electronic equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the electronic equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the electronic equipment, an external keyboard, a touch pad or a mouse and the like.
In an embodiment, fig. 9 is a schematic internal structure diagram of an electronic device according to an embodiment of the present application, and as shown in fig. 9, there is provided an electronic device, which may be a server, and its internal structure diagram may be as shown in fig. 9. The electronic device comprises a processor, a network interface, an internal memory and a non-volatile memory connected by an internal bus, wherein the non-volatile memory stores an operating system, a computer program and a database. The processor is used for providing calculation and control capability, the network interface is used for communicating with an external terminal through network connection, the internal memory is used for providing an environment for an operating system and the running of a computer program, the computer program is executed by the processor to realize a natural language relation extraction method, and the database is used for storing data.
Those skilled in the art will appreciate that the configuration shown in fig. 9 is a block diagram of only a portion of the configuration relevant to the present application, and does not constitute a limitation on the electronic device to which the present application is applied, and a particular electronic device may include more or less components than those shown in the drawings, or combine certain components, or have a different arrangement of components.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware instructions of a computer program, which can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory, among others. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
It should be understood by those skilled in the art that various features of the above-described embodiments can be combined in any combination, and for the sake of brevity, all possible combinations of features in the above-described embodiments are not described in detail, but rather, all combinations of features which are not inconsistent with each other should be construed as being within the scope of the present disclosure.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method for extracting a relationship in a natural language, comprising:
identifying an entity from the original sentence through an entity identification model;
analyzing absolute position information of the entity and relative position information between each word in the original sentence and the entity;
calculating a position vector of each word according to the absolute position information, the relative position information and the length information of the original sentence;
performing word vectorization representation on the original sentence to obtain a word vector of each word, and splicing the word vector and the position vector to obtain a fusion vector;
inputting the fusion vector into a mixed convolution layer for feature extraction, wherein the mixed convolution layer comprises a common causal convolution and a void causal convolution;
sending the extracted features into an attention layer, inputting the features output by the attention layer into a pyramid pooling layer, and performing down-sampling on the features through pyramid pooling to obtain dimension reduction features;
inputting the dimension reduction features into a full connection layer, and outputting the relation category through Softmax.
2. The method of claim 1, wherein the calculating a position vector for each word based on the absolute position information, the relative position information, and the length information of the original sentence comprises:
calculating a relative distance value of each word and the entity according to the absolute position information and the relative position information;
if the relative distance value is an even number, sinusoidal position coding is used, and the position vector calculation formula of the word is as follows:
PE(pos, 2i) = sin(pos / 1000^(2i / len(sentence)))
if the relative distance value is an odd number, cosine position coding is used, and a position vector calculation formula of the word is as follows:
PE(pos, 2i+1) = cos(pos / 1000^((2i+1) / len(sentence)))
wherein PE is the position vector, pos is the absolute position information of the entity in the original sentence, i is an integer greater than 0, 2i and 2i+1 respectively represent an even relative distance value and an odd relative distance value, and len(sentence) represents the length information of the original sentence.
3. The method of claim 1, wherein said representing the original sentence by word vectorization, obtaining a word vector for each word comprises:
performing word vectorization representation on the original sentence through a BERT model to obtain a dynamic word vector;
performing Word vectorization representation on the original statement through a Word2vec model to obtain a static Word vector, wherein the dimensions of the static Word vector and the dynamic Word vector are kept consistent;
the word vector is calculated by the following formula:
vector = weight_1 * vector_bert + weight_2 * vector_word2vec
wherein vector is the word vector, vector_bert represents the dynamic word vector, vector_word2vec represents the static word vector, weight_1 and weight_2 respectively represent a preset first weight value and a preset second weight value, and the sum of the first weight value and the second weight value is 1.
4. The method of claim 1, wherein the concatenating the word vector and the position vector to obtain a fused vector comprises:
setting the dimensionality of the position vector as m and the dimensionality of the word vector as n;
for the original sentence with length L, the dimension of the transformed fusion vector is [L, N], where N = n + m.
5. The method of claim 1, wherein the common causal convolution comprises at least two different convolution kernels in each dimension.
6. The method of claim 5, wherein said inputting the fused vector into a hybrid convolutional layer for feature extraction comprises:
extracting sentence local features through the common causal convolution;
extracting sentence long-distance features through the void causal convolution;
and passing the sentence local features and the sentence long-distance features through an activation function to obtain nonlinear features, which are used as the extracted features.
7. The method of claim 5, wherein after said passing the extracted features into the attention layer, the method further comprises:
using a channel attention domain to rank the importance of the features generated by different convolution kernels of the same dimension, and assigning weights to the features accordingly.
8. A relationship extraction apparatus for a natural language, comprising:
the recognition module is used for recognizing an entity from the original sentence through the entity recognition model;
the analysis module is used for analyzing the absolute position information of the entity and the relative position information between each word in the original sentence and the entity;
the calculation module is used for calculating the position vector of each word according to the absolute position information, the relative position information and the length information of the original statement;
the fusion module is used for performing word vectorization representation on the original sentence to obtain a word vector of each word, and splicing the word vector and the position vector to obtain a fusion vector;
the extraction module is used for inputting the fusion vector into a mixed convolution layer for feature extraction, wherein the mixed convolution layer comprises a common causal convolution and a void causal convolution; sending the extracted features into an attention layer, inputting the features output by the attention layer into a pyramid pooling layer, and performing down-sampling on the features through pyramid pooling to obtain dimension reduction features; inputting the dimension reduction features into a full connection layer, and outputting the relation category through Softmax.
9. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 7.
10. A storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the method of any one of claims 1 to 7 when executed.
CN202111230984.9A 2021-10-21 2021-10-21 Method and device for extracting relation of natural language, electronic equipment and storage medium Pending CN114064852A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111230984.9A CN114064852A (en) 2021-10-21 2021-10-21 Method and device for extracting relation of natural language, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111230984.9A CN114064852A (en) 2021-10-21 2021-10-21 Method and device for extracting relation of natural language, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114064852A true CN114064852A (en) 2022-02-18

Family

ID=80235186

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111230984.9A Pending CN114064852A (en) 2021-10-21 2021-10-21 Method and device for extracting relation of natural language, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114064852A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115169326A (en) * 2022-04-15 2022-10-11 山西长河科技股份有限公司 Chinese relation extraction method, device, terminal and storage medium
CN115510854A (en) * 2022-09-27 2022-12-23 北京白星花科技有限公司 Entity relationship extraction method and system based on reinforcement learning
CN117235286A (en) * 2023-11-10 2023-12-15 昆明理工大学 Attention-strengthening entity relation extraction model, construction method thereof and storage medium
CN117235286B (en) * 2023-11-10 2024-01-23 昆明理工大学 Attention-strengthening entity relation extraction model, construction method thereof and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination