WO2022178950A1 - 预测语句实体的方法、装置和计算机设备 (Method, apparatus and computer device for predicting sentence entities) - Google Patents

预测语句实体的方法、装置和计算机设备 (Method, apparatus and computer device for predicting sentence entities)

Info

Publication number
WO2022178950A1
Authority
WO
WIPO (PCT)
Prior art keywords
layer
sequence
expression
table structure
text vector
Prior art date
Application number
PCT/CN2021/084569
Other languages
English (en)
French (fr)
Inventor
王健宗
宋青原
吴天博
程宁
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2022178950A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • the present application relates to the technical field of neural networks in artificial intelligence, and in particular, the present application relates to methods, apparatuses and computer equipment for predicting sentence entities.
  • the main purpose of this application is to provide a method for predicting sentence entities, aiming to solve the technical problem that existing entity recognizers are prone to error propagation.
  • the present application proposes a method for predicting a sentence entity, including:
  • according to the obtaining method of the second table structure expression, the entity relationship prediction result output by the last table encoding layer is obtained, and according to the obtaining method of the second sequence expression, the entity prediction result output by the last sequence encoding layer is obtained.
  • the present application also provides a device for predicting a sentence entity, including:
  • the acquisition module is used to acquire the text vector corresponding to the sentence to be analyzed
  • a first input module configured to input the text vector into a first table encoding layer to obtain a first table structure expression corresponding to the text vector
  • a second input module configured to input the text vector and the first table structure expression into the first sequence coding layer to obtain the first sequence expression corresponding to the text vector
  • a third input module configured to input the first sequence expression and the first table structure expression into a second table encoding layer to obtain a second table structure expression corresponding to the text vector, wherein the second table encoding layer is adjacent to the first table encoding layer;
  • a fourth input module configured to input the first sequence expression and the second table structure expression into a second sequence encoding layer to obtain a second sequence expression corresponding to the text vector, wherein the second sequence encoding layer is adjacent to the first sequence encoding layer;
  • the obtaining module is used to obtain the entity relationship prediction result output by the last table encoding layer according to the obtaining method of the second table structure expression, and to obtain the entity prediction result output by the last sequence encoding layer according to the obtaining method of the second sequence expression.
  • the present application also provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor, when executing the computer program, implements a method for predicting a sentence entity, wherein the method includes: obtaining a text vector corresponding to the sentence to be analyzed; inputting the text vector into a first table encoding layer to obtain a first table structure expression corresponding to the text vector; inputting the text vector and the first table structure expression into a first sequence encoding layer to obtain a first sequence expression corresponding to the text vector; inputting the first sequence expression and the first table structure expression into a second table encoding layer to obtain a second table structure expression corresponding to the text vector, wherein the second table encoding layer is adjacent to the first table encoding layer; inputting the first sequence expression and the second table structure expression into a second sequence encoding layer to obtain a second sequence expression corresponding to the text vector, wherein the second sequence encoding layer is adjacent to the first sequence encoding layer; and obtaining, according to the obtaining method of the second table structure expression, the entity relationship prediction result output by the last table encoding layer, and obtaining, according to the obtaining method of the second sequence expression, the entity prediction result output by the last sequence encoding layer.
  • the present application also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements a method for predicting a sentence entity, wherein the method includes: obtaining a text vector corresponding to the statement to be analyzed; inputting the text vector into a first table encoding layer to obtain a first table structure expression corresponding to the text vector; inputting the text vector and the first table structure expression into a first sequence encoding layer to obtain a first sequence expression corresponding to the text vector; inputting the first sequence expression and the first table structure expression into a second table encoding layer to obtain a second table structure expression corresponding to the text vector, wherein the second table encoding layer is adjacent to the first table encoding layer; inputting the first sequence expression and the second table structure expression into a second sequence encoding layer to obtain a second sequence expression corresponding to the text vector, wherein the second sequence encoding layer is adjacent to the first sequence encoding layer; and obtaining, according to the obtaining method of the second table structure expression, the entity relationship prediction result output by the last table encoding layer, and obtaining, according to the obtaining method of the second sequence expression, the entity prediction result output by the last sequence encoding layer.
  • the present application performs joint learning through the connection of two different types of encoders, which alleviates the problem of error propagation in the pipeline method; it also benefits in training and use from the interrelation between entity relationship prediction results and entity prediction results, improving the precision of entity prediction.
  • FIG. 1 is a schematic flowchart of a method for predicting a sentence entity according to an embodiment of the present application
  • FIG. 2 is a schematic structural diagram of a model of a predicted sentence entity according to an embodiment of the present application
  • FIG. 3 is a schematic diagram of the interaction between a table encoding layer and a sequence encoding layer in the model structure of a predicted sentence entity according to an embodiment of the present application;
  • FIG. 4 is a schematic flowchart of a system for predicting a sentence entity according to an embodiment of the present application
  • FIG. 5 is a schematic diagram of the internal structure of a computer device according to an embodiment of the present application.
  • a method for predicting a sentence entity includes:
  • S1: Obtain a text vector corresponding to the sentence to be analyzed;
  • S2: Input the text vector into a first table encoding layer to obtain a first table structure expression corresponding to the text vector;
  • S3: Input the text vector and the first table structure expression into a first sequence encoding layer to obtain a first sequence expression corresponding to the text vector;
  • S4: Input the first sequence expression and the first table structure expression into a second table encoding layer to obtain a second table structure expression corresponding to the text vector, wherein the second table encoding layer is adjacently connected to the first table encoding layer;
  • S5: Input the first sequence expression and the second table structure expression into a second sequence encoding layer to obtain a second sequence expression corresponding to the text vector, wherein the second sequence encoding layer is adjacently connected to the first sequence encoding layer;
  • S6: According to the obtaining method of the second table structure expression, obtain the entity relationship prediction result output by the last table encoding layer, and according to the obtaining method of the second sequence expression, obtain the entity prediction result output by the last sequence encoding layer.
  • the model for predicting the sentence entity is composed of two different types of encoders connected, and the composition structure is shown in FIG. 2 .
  • One encoder is the table encoder, corresponding to the table structure representation; the other is the sequence encoder, corresponding to the sequence representation.
  • the two encoders interact with each other in units of convolutional layers, and improve the quality of the two representations and the prediction accuracy through the multi-layer interaction.
  • two prediction results of named entity recognition and relation extraction are simultaneously obtained in the same model through two expressions, thereby improving the accuracy of entity recognition.
  • the above named entities refer to the preset entities included in the sentence; for example, the names of persons, items, organizations, etc. are pre-specified as named entities. In the sentence "老王喜欢吃苹果" ("Lao Wang likes to eat apples"), "老王" and "苹果" are named entities, while "老王喜欢吃" is the relation extracted between the entities; the extracted relation raises the probability of recognizing the entity "老王" the next time.
  • the text vector in this embodiment of the present application includes word embeddings, character embeddings, and contextual word embeddings.
  • For a sentence x of N words, x = [x_i]_{1≤i≤N}, the corresponding word embedding is x_i^w ∈ R^{d1} and the corresponding character embedding is x_i^c ∈ R^{d2}; contextual word embeddings are produced by models such as BERT and are represented as x_i^l ∈ R^{d3}. Here R denotes the value range, w the word embedding, c the character embedding, and d1, d2 and d3 the dimensions of the word, character and contextual word embeddings respectively.
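As an illustration of how these three embedding types might combine into a per-word text vector, here is a minimal NumPy sketch; the numeric dimensions d1, d2, d3 and the random stand-in embeddings are hypothetical, since the patent does not fix concrete values:

```python
import numpy as np

# Hypothetical dimensions for the three embedding types (not specified
# numerically in the patent): d1 = word, d2 = character, d3 = contextual.
d1, d2, d3 = 50, 30, 768
N = 6  # number of words in the sentence

rng = np.random.default_rng(0)
word_emb = rng.normal(size=(N, d1))   # x^w in R^{d1}, e.g. from a lookup table
char_emb = rng.normal(size=(N, d2))   # x^c in R^{d2}, e.g. from a char-level LSTM
ctx_emb = rng.normal(size=(N, d3))    # x^l in R^{d3}, e.g. from BERT

# The text vector for each word is the concatenation of all three embeddings.
text_vectors = np.concatenate([word_emb, char_emb, ctx_emb], axis=-1)
assert text_vectors.shape == (N, d1 + d2 + d3)
```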
  • the first table encoding layer and the second table encoding layer in the embodiment of the present application are any two adjacent table encoding layers in the model for predicting sentence entities; the first sequence encoding layer and the second sequence encoding layer are any two adjacent sequence encoding layers in that model.
  • “First” and “second” are only used for distinction, not for limitation, and similar terms in other places have the same functions and will not be repeated.
  • This application connects two different types of encoders and achieves the purpose of accurately identifying named entities through joint learning of the table structure expression and the sequence expression of the same input sentence, without adding an additional entity recognizer to form a pipeline, thereby avoiding the error propagation problem of pipeline methods.
  • Further, the first table encoding layer is the first table encoding layer connected to the text vectorizer, and step S2 of inputting the text vector into the first table encoding layer to obtain the first table structure expression corresponding to the text vector includes:
  • the table structure in the first table encoding layer is defined as an initial table structure, which is a context-free table structure; the association prediction relationships in the table structure are then transformed through the first context relationship probability output by the encoding layer of the BERT model.
  • the parameters in the encoding layer of the Bert model connected to the first table encoding layer are the pretrained attention weights.
  • the table encoder is the neural network used to learn the table structure expression.
  • the structure is shown on the left side of FIG. 3.
  • a direct splicing (Concat) layer and a linear projection layer are added to speed up computation; the direct splicing layer concatenates vectors to form the table.
  • the table is an N × N vector table in which the vector in the i-th row and j-th column corresponds to the pair formed by the i-th word and the j-th word of the sentence.
  • The attention weights of all heads and all layers of a pre-trained language model such as BERT are stacked to form a tensor of shape N × N × (L_l · A_l), where L_l is the number of stacked Transformer layers and A_l is the number of multi-head attention heads of each Transformer layer.
  • the number of hidden neurons in the BERT fully-connected layer in the embodiment of the present application is halved, which improves the calculation rate.
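A minimal sketch of the stacking just described, assuming a BERT-base-like model with 12 layers and 12 heads; the random attention maps stand in for the weights a real pre-trained model would return (e.g. via output_attentions=True in the Hugging Face transformers library):

```python
import numpy as np

N = 6            # sentence length
L_layers = 12    # L_l: number of stacked Transformer layers (12 in BERT-base)
A_heads = 12     # A_l: attention heads per layer (12 in BERT-base)

rng = np.random.default_rng(1)
# Stand-in for the per-layer attention maps a real BERT would produce:
# each layer yields A_heads row-stochastic maps of shape (N, N).
attn_per_layer = [rng.dirichlet(np.ones(N), size=(A_heads, N))
                  for _ in range(L_layers)]

# Stack all heads of all layers along the channel axis, so the stacked
# tensor has shape (N, N, L_l * A_l) as described in the text above.
T0 = np.concatenate([np.transpose(a, (1, 2, 0)) for a in attn_per_layer],
                    axis=-1)
assert T0.shape == (N, N, L_layers * A_heads)
```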
  • Before the attention-based correction, the vector between two words in the l-th layer is X_{l,i,j} = ReLU(Linear([S_{l-1,i}; S_{l-1,j}])), with X_l ∈ R^{N×N×H}.
  • ReLU (Rectified Linear Unit) is an activation function;
  • Linear denotes a linear mapping;
  • X_{l,i,j} denotes the vector expression between the i-th word and the j-th word in the l-th layer;
  • S_{l-1,i} denotes the sequence representation of the i-th word in layer l-1;
  • S_{l-1,j} denotes the sequence representation of the j-th word in layer l-1;
  • X_l denotes the vector representation of the sentence in the l-th layer.
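The pairwise table construction given by this formula can be sketched as follows; the sizes and the weight matrix are hypothetical placeholders for the learned linear projection:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

N, H = 6, 8                                # sentence length, hidden size (toy values)
rng = np.random.default_rng(2)
S_prev = rng.normal(size=(N, H))           # S_{l-1}: previous sequence representation
W = rng.normal(size=(2 * H, H)) * 0.1      # weights of the linear projection
b = np.zeros(H)

# X_{l,i,j} = ReLU(Linear([S_{l-1,i}; S_{l-1,j}])): every cell (i, j) of the
# N x N table concatenates the representations of word i and word j.
pairs = np.concatenate(
    [np.repeat(S_prev[:, None, :], N, axis=1),   # row index i  -> word i
     np.repeat(S_prev[None, :, :], N, axis=0)],  # column index j -> word j
    axis=-1)                                     # shape (N, N, 2H)
X_l = relu(pairs @ W + b)                        # shape (N, N, H)
assert X_l.shape == (N, N, H)
```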
  • Further, the first table encoding layer includes a direct splicing layer, a linear projection layer and an iterative recursive layer that are connected in sequence, and step S21 of constructing a context-free initial table structure from the text vector in the first table encoding layer includes:
  • S211: Splice the text vector through the direct splicing layer to obtain a first spliced vector;
  • S212: Pass the first spliced vector through the linear projection layer to obtain the initial sequence corresponding to the text vector;
  • S213: Calculate the initial sequence through the iterative recursive layer to obtain the hidden-layer state of each cell in the context-free table structure, obtaining the initial table structure corresponding to the text vector.
  • the iterative recursive layer described above includes multi-dimensional recurrent neural networks and/or gated recurrent units to add context to X_l.
  • Further, when the iterative recursive layer is a multi-dimensional recurrent neural network, step S213 of calculating the initial sequence through the iterative recursive layer to obtain the hidden-layer state of each cell in the context-free table structure and obtain the initial table structure corresponding to the text vector includes:
  • S2131: Acquire first gated recurrent data of a specified cell in a first spatial direction and a second spatial direction, wherein the first spatial direction and the second spatial direction are the two opposite directions of a first spatial dimension, and the specified cell is any cell in the context-free table structure;
  • S2132: Acquire second gated recurrent data of the specified cell in a third spatial direction and a fourth spatial direction, wherein the third spatial direction and the fourth spatial direction are the two opposite directions of a second spatial dimension, the second spatial dimension being perpendicular to the first spatial dimension;
  • S2133: Obtain the hidden-layer state of the specified cell according to the first gated recurrent data and the second gated recurrent data;
  • S2134: Obtain the hidden-layer state of each cell in the context-free table structure according to the calculation method of the hidden-layer state of the specified cell, obtaining the initial table structure corresponding to the text vector.
  • a recurrent neural network (RNN) and a gated recurrent unit (GRU) are used to add context to X_l; the hidden-layer state of each cell of the contextualized table structure is computed iteratively as T_{l,i,j} = GRU(X_{l,i,j}, T_{l-1,i,j}, T_{l,i-1,j}, T_{l,i,j-1}).
  • so that the surrounding context can be reached from every spatial direction, the recurrent neural network is a multi-dimensional recurrent neural network (MD-RNN); to reduce the amount of computation, a 2D-RNN is used, which considers RNNs from four spatial directions.
  • the table structure expression after the iterative calculation is the concatenation of the hidden layers of two RNNs, T_{l,i,j} = [T_{l,i,j}^{(a)}; T_{l,i,j}^{(c)}], where a and c represent different spatial dimensions; for example, a represents the up-down dimension and c represents the left-right or front-back dimension.
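A simplified sketch of how a cell state could be computed from its neighbours in two spatial dimensions; this collapses the multi-dimensional GRU of the patent into a single illustrative gate, so the weight shapes and the way the directions are combined are assumptions, not the filed design:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

N, H = 4, 8
rng = np.random.default_rng(3)
X_l = rng.normal(size=(N, N, H))       # cell inputs X_{l,i,j}
T_prev = rng.normal(size=(N, N, H))    # T_{l-1}: table states of the previous layer
Wz = rng.normal(size=(2 * H, H)) * 0.1
Wh = rng.normal(size=(2 * H, H)) * 0.1

# Each cell's context is the mean of its previous-layer, left and upper
# neighbour states; a single update gate then mixes it with the cell input.
# A faithful MD-GRU would use separate gates and weights per direction.
T_l = np.zeros((N, N, H))
for i in range(N):
    for j in range(N):
        left = T_l[i, j - 1] if j > 0 else np.zeros(H)
        up = T_l[i - 1, j] if i > 0 else np.zeros(H)
        context = (T_prev[i, j] + left + up) / 3.0
        zin = np.concatenate([X_l[i, j], context])
        z = sigmoid(zin @ Wz)                     # update gate
        h = np.tanh(zin @ Wh)                     # candidate state
        T_l[i, j] = z * context + (1.0 - z) * h   # gated hidden state
```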
  • Further, the first sequence encoding layer is the first sequence encoding layer connected to the text vectorizer and includes table-guided attention; step S3 of inputting the text vector and the first table structure expression into the first sequence encoding layer to obtain the first sequence expression corresponding to the text vector includes:
  • S31: Acquire a preset query and the initial assignment of the key-value pairs corresponding to the query;
  • S32: Compute, through the table-guided attention according to the initial assignment, the first output values corresponding to the text vector;
  • S33: Assign a weight to each of the first output values according to a score function;
  • S34: Input the weighted first output values and the text vector into a feedforward neural network to obtain the first sequence expression corresponding to the text vector.
  • the sequence encoder is used to learn the sequence representation; the i-th vector in the vector sequence corresponds to the i-th word of the sentence.
  • the sequence encoder architecture of this application is similar to the Transformer, as shown in the right part of FIG. 3; however, this application replaces scaled dot-product attention with table-guided attention so that new table-guided attention can be generated.
  • For each query Q, the output value is a weighted sum; the weight assigned to each output value is determined by the relevance of the query Q to all keys, given by the score function f. For each query Q_i and key K_j, f(Q_i, K_j) = U · g(Q_i, K_j) = U · T_{l,i,j}, where U is a learnable vector parameter and g maps each query-key pair to a vector.
  • the table-guided attention in the embodiment of the present application is multi-head attention, in which each head has independent parameters, and their outputs are connected through a fully connected layer to obtain the final attention output value.
  • the feedforward neural network (FFNN) in the table-sequence encoder has residual connections and layer normalization.
  • the output sequence expression takes the standard Transformer residual form: S̃_l = LayerNorm(S_{l-1} + SelfAttn(S_{l-1})), S_l = LayerNorm(S̃_l + FFNN(S̃_l)).
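A single-head sketch of the table-guided attention just described: the score of query i against key j comes from the table cell rather than a dot product; U, T_l and the values V here are random stand-ins for learned quantities:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

N, H = 6, 8
rng = np.random.default_rng(4)
T_l = rng.normal(size=(N, N, H))    # current table representation
V = rng.normal(size=(N, H))         # values, e.g. the previous sequence rep
U = rng.normal(size=H) * 0.1        # learnable scoring vector

# Table-guided attention: f(Q_i, K_j) = U . T_{l,i,j}, one score per cell.
scores = T_l @ U                    # shape (N, N)
weights = softmax(scores, axis=-1)  # one distribution over keys per query
attn_out = weights @ V              # weighted sum of values, shape (N, H)
```

In the multi-head variant described next, each head would hold its own U and the per-head outputs would be combined through a fully connected layer.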
  • Further, the second table encoding layer includes a direct splicing layer, a linear projection layer and an iterative recursive layer that are connected in sequence, and step S4 of inputting the first sequence expression and the first table structure expression into the second table encoding layer to obtain the second table structure expression corresponding to the text vector includes:
  • S41: Acquire the second context relationship probability output by the BERT model encoding layer connected to the second table encoding layer;
  • S42: Input the second context relationship probability and the first sequence expression into the direct splicing layer, and perform vector splicing through the direct splicing layer to obtain a second spliced vector;
  • S43: Pass the second spliced vector through the linear projection layer to obtain a specified sequence;
  • S44: Input the specified sequence and the first table structure expression into the iterative recursive layer, and compute through it the hidden-layer state of each cell in the first table structure expression to obtain the second table structure expression.
  • the embodiment of the present application takes the interaction between an intermediate table encoding layer and sequence encoding layer as an example; the calculation process follows the same principle as the first table encoding layer, only the input data differ.
  • for the calculation process, refer to the calculation process of the table encoding layer above, which is not repeated.
  • Further, the second sequence encoding layer includes table-guided attention and a feedforward neural network, and step S5 of inputting the first sequence expression and the second table structure expression into the second sequence encoding layer to obtain the second sequence expression corresponding to the text vector includes:
  • S51: Acquire the corresponding query in the second table structure expression and the specified assignment of the key-value pairs corresponding to the query;
  • S52: Compute, through the table-guided attention according to the specified assignment, the second output values corresponding to the second table structure expression;
  • S53: Assign a weight to each of the second output values according to the score function;
  • S54: Input the weighted second output values and the first sequence expression into the feedforward neural network to obtain the second sequence expression.
  • the embodiment of the present application takes the interaction between an intermediate table encoding layer and sequence encoding layer as an example.
  • the calculation process follows the same principle as the first sequence encoding layer; only the input data differ.
  • for the calculation process, refer to the calculation process of the sequence encoding layer above, which is not repeated.
  • an apparatus for predicting a sentence entity includes:
  • Obtaining module 1 for obtaining the text vector corresponding to the sentence to be analyzed
  • the first input module 2 is used to input the text vector into the first table encoding layer to obtain the first table structure expression corresponding to the text vector;
  • the second input module 3 is configured to input the text vector and the first table structure expression into the first sequence coding layer to obtain the first sequence expression corresponding to the text vector;
  • the third input module 4 is configured to input the first sequence expression and the first table structure expression into the second table encoding layer to obtain the second table structure expression corresponding to the text vector, wherein the second table encoding layer is adjacently connected to the first table encoding layer;
  • the fourth input module 5 is used for inputting the first sequence expression and the second table structure expression into the second sequence encoding layer to obtain the second sequence expression corresponding to the text vector, wherein the second sequence encoding layer is adjacently connected to the first sequence encoding layer;
  • the obtaining module 6 is used to obtain the entity relationship prediction result output by the last table encoding layer according to the obtaining method of the second table structure expression, and to obtain the entity prediction result output by the last sequence encoding layer according to the obtaining method of the second sequence expression.
  • the first table encoding layer is the first table encoding layer connecting the text vectorizer, and the first input module 2 includes:
  • a construction submodule for constructing a context-free initial table structure according to the text vector in the first table encoding layer
  • the first acquisition submodule is used to acquire the first context relationship probability of the coding layer output of the Bert model
  • An association submodule configured to associate the first context relationship probability with the initial table structure to obtain a first table structure expression.
  • the first table encoding layer includes a direct splicing layer, a linear projection layer and an iterative recursive layer that are connected in sequence, and the construction submodule includes:
  • a splicing unit configured to perform vector splicing of the text vector through a direct splicing layer to obtain a first splicing vector
  • a linear projection unit configured to obtain the initial sequence corresponding to the text vector by passing the first splicing vector through a linear projection layer
  • the calculation unit is configured to calculate the initial sequence through the iterative recursive layer to obtain the hidden layer state of each cell in the context-free table structure, and obtain the initial table structure corresponding to the text vector.
  • the iterative recursive layer is a multi-dimensional recurrent neural network
  • the computing unit includes:
  • the first acquisition subunit is used to acquire the first gated recurrent data of a specified cell in the first spatial direction and the second spatial direction, wherein the first spatial direction and the second spatial direction are the two opposite directions of the first spatial dimension, and the specified cell is any cell in the context-free table structure;
  • the second acquisition subunit is configured to acquire the second gated recurrent data of the specified cell in the third spatial direction and the fourth spatial direction, wherein the third spatial direction and the fourth spatial direction are the two opposite directions of the second spatial dimension, the second spatial dimension and the first spatial dimension being perpendicular to each other;
  • the first obtaining subunit is used to obtain the hidden layer state of the specified cell according to the first gated loop data and the second gated loop data;
  • the second obtaining subunit is used to obtain the hidden layer state of each cell in the context-free table structure according to the calculation method of the hidden layer state of the specified cell, and obtain the initial table structure corresponding to the text vector .
  • the first sequence coding layer is the first sequence coding layer connecting the text vectorizer, the first sequence coding layer includes table-guided attention, and the second input module 3 includes:
  • the second obtaining sub-module is used to obtain the preset query and the initial assignment of the corresponding key-value pair;
  • a first operation submodule used for calculating the first output value corresponding to the text vector through the attention operation guided by the table according to the initial assignment
  • a first assigning sub-module for assigning weights to each of the first output values according to the score function
  • the first input sub-module is configured to assign weights to each of the first output values and input the text vector to a feedforward neural network to obtain a first sequence expression corresponding to the text vector.
  • the second table coding layer includes a direct splicing layer, a linear projection layer and an iterative recursive layer that are connected in sequence
  • the third input module 4 includes:
  • the third acquisition submodule is used to acquire the second context relationship probability output by the Bert model encoding layer connected to the second table encoding layer;
  • the second input sub-module is used to input the second context relationship probability and the first sequence expression into the direct splicing layer, and perform vector splicing through the direct splicing layer to obtain a second splicing vector;
  • a projection submodule used to obtain a specified sequence by passing the second splicing vector through a linear projection layer
  • the third input sub-module is configured to input the specified sequence and the first table structure expression into the iterative recursive layer, and calculate the hidden layer state of each cell in the first table structure expression through the iterative recursion layer to obtain The second table structure expression.
  • the second sequence encoding layer includes a table-guided attention and feedforward neural network
  • the fourth input module 5 includes:
  • the fourth acquisition submodule is used to acquire the query corresponding to the second table structure expression and the specified assignment of the key-value pair corresponding to the query;
  • a second operation sub-module configured to compute, through the table-guided attention according to the specified assignment, the second output values corresponding to the second table structure expression;
  • the second assigning sub-module is used to assign weights to each of the second output values according to the score function
  • the fourth input sub-module is configured to assign weights to each of the second output values and input the first sequence expression to a feedforward neural network to obtain the second sequence expression.
  • an embodiment of the present application further provides a computer device.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 5 .
  • the computer device includes a processor, a memory, a network interface, and a database connected by a system bus, wherein the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium, an internal memory.
  • the nonvolatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium.
  • the database of the computer device is used to store all the data required by the process of predicting the entity of the sentence.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program when executed by a processor, implements a method of predicting sentence entities.
  • the above-mentioned method for predicting a sentence entity executed by the above-mentioned processor includes: obtaining a text vector corresponding to the sentence to be analyzed; inputting the text vector into a first table encoding layer to obtain a first table structure expression corresponding to the text vector; inputting the text vector and the first table structure expression into a first sequence encoding layer to obtain a first sequence expression corresponding to the text vector; inputting the first sequence expression and the first table structure expression into a second table encoding layer to obtain a second table structure expression corresponding to the text vector, wherein the second table encoding layer is adjacent to the first table encoding layer; inputting the first sequence expression and the second table structure expression into a second sequence encoding layer to obtain a second sequence expression corresponding to the text vector, wherein the second sequence encoding layer is adjacent to the first sequence encoding layer; and obtaining, according to the obtaining method of the second table structure expression, the entity relationship prediction result output by the last table encoding layer, and obtaining, according to the obtaining method of the second sequence expression, the entity prediction result output by the last sequence encoding layer.
  • the above computer equipment performs joint learning through the connection of two different types of encoders, which alleviates the problem of error propagation in the pipeline method; it also benefits in training and use from the interrelation between entity relationship prediction results and entity prediction results, improving the precision of entity prediction.
  • the first table encoding layer is the first table encoding layer connected to the text vectorizer, and the step in which the processor inputs the text vector into the first table encoding layer to obtain the first table structure expression corresponding to the text vector includes: constructing a context-free initial table structure from the text vector in the first table encoding layer; obtaining the first context relationship probability output by the encoding layer of the BERT model; and associating the first context relationship probability with the initial table structure to obtain the first table structure expression.
  • the first table encoding layer includes a direct concatenation layer, a linear projection layer, and an iterative recursive layer that are connected in sequence, and the step in which the processor constructs a context-free initial table structure from the text vector in the first table encoding layer includes: performing vector splicing on the text vector through the direct splicing layer to obtain a first spliced vector; passing the first spliced vector through the linear projection layer to obtain the initial sequence corresponding to the text vector; and calculating the initial sequence through the iterative recursive layer to obtain the hidden-layer state of each cell in the context-free table structure, obtaining the initial table structure corresponding to the text vector.
  • the iterative recursive layer is a multi-dimensional recurrent neural network, and the step in which the processor calculates the initial sequence through the iterative recursive layer to obtain the hidden-layer state of each cell in the context-free table structure and obtain the initial table structure corresponding to the text vector includes: obtaining first gated recurrent data of a specified cell in a first spatial direction and a second spatial direction, wherein the first spatial direction and the second spatial direction are the two opposite directions of a first spatial dimension, and the specified cell is any cell in the context-free table structure; obtaining second gated recurrent data of the specified cell in a third spatial direction and a fourth spatial direction, wherein the third spatial direction and the fourth spatial direction are the two opposite directions of a second spatial dimension, the second spatial dimension being perpendicular to the first spatial dimension; obtaining the hidden-layer state of the specified cell according to the first gated recurrent data and the second gated recurrent data; and obtaining, according to the calculation method of the hidden-layer state of the specified cell, the hidden-layer state of each cell in the context-free table structure, obtaining the initial table structure corresponding to the text vector.
  • the first sequence encoding layer is the first sequence encoding layer connected to the text vectorizer and includes table-guided attention, and the step in which the processor inputs the text vector and the first table structure expression into the first sequence encoding layer to obtain the first sequence expression corresponding to the text vector includes: obtaining a preset query and the initial assignment of the key-value pairs corresponding to the query; computing, through the table-guided attention according to the initial assignment, the first output values corresponding to the text vector; assigning a weight to each of the first output values according to the score function; and inputting the weighted first output values and the text vector into the feedforward neural network to obtain the first sequence expression corresponding to the text vector.
  • the second table encoding layer includes a direct concatenation layer, a linear projection layer and an iterative recursive layer that are connected in sequence, and the step in which the processor inputs the first sequence expression and the first table structure expression into the second table encoding layer to obtain the second table structure expression corresponding to the text vector includes: obtaining the second context relationship probability output by the BERT model encoding layer connected to the second table encoding layer; inputting the second context relationship probability and the first sequence expression into the direct splicing layer, and performing vector splicing through the direct splicing layer to obtain a second spliced vector; passing the second spliced vector through the linear projection layer to obtain a specified sequence; and inputting the specified sequence and the first table structure expression into the iterative recursive layer, and computing through the iterative recursive layer the hidden-layer state of each cell in the first table structure expression to obtain the second table structure expression.
  • the second sequence encoding layer includes table-guided attention and a feedforward neural network, and the step in which the processor inputs the first sequence expression and the second table structure expression into the second sequence encoding layer to obtain the second sequence expression corresponding to the text vector includes: obtaining the corresponding query in the second table structure expression and the specified assignment of the key-value pairs corresponding to the query; computing, through the table-guided attention according to the specified assignment, the second output values corresponding to the second table structure expression; assigning a weight to each of the second output values according to the score function; and inputting the weighted second output values and the first sequence expression into the feedforward neural network to obtain the second sequence expression.
  • FIG. 5 is only a block diagram of a partial structure related to the solution of the present application, and does not constitute a limitation on the computer equipment to which the solution of the present application is applied.
  • An embodiment of the present application further provides a computer-readable storage medium, which is a volatile or non-volatile storage medium on which a computer program is stored; when executed by a processor, the computer program implements a method for predicting a sentence entity, including: obtaining a text vector corresponding to the sentence to be analyzed; inputting the text vector into a first table encoding layer to obtain a first table structure expression corresponding to the text vector; inputting the text vector and the first table structure expression into a first sequence encoding layer to obtain a first sequence expression corresponding to the text vector; inputting the first sequence expression and the first table structure expression into a second table encoding layer to obtain a second table structure expression corresponding to the text vector, wherein the second table encoding layer is adjacent to the first table encoding layer; inputting the first sequence expression and the second table structure expression into a second sequence encoding layer to obtain a second sequence expression corresponding to the text vector, wherein the second sequence encoding layer is adjacent to the first sequence encoding layer; and obtaining, according to the obtaining method of the second table structure expression, the entity relationship prediction result output by the last table encoding layer, and obtaining, according to the obtaining method of the second sequence expression, the entity prediction result output by the last sequence encoding layer.
  • the above-mentioned computer-readable storage medium performs joint learning through the connection of two different types of encoders, which alleviates the problem of error propagation in pipeline methods; it also benefits in training and use from the interrelation between entity relationship prediction results and entity prediction results, improving the precision of entity prediction.
  • the first table encoding layer is the first table encoding layer connected to the text vectorizer, and the step in which the processor inputs the text vector into the first table encoding layer to obtain the first table structure expression corresponding to the text vector includes: constructing a context-free initial table structure from the text vector in the first table encoding layer; obtaining the first context relationship probability output by the encoding layer of the BERT model; and associating the first context relationship probability with the initial table structure to obtain the first table structure expression.
  • the first table encoding layer includes a direct concatenation layer, a linear projection layer, and an iterative recursive layer that are connected in sequence, and the step in which the processor constructs a context-free initial table structure from the text vector in the first table encoding layer includes: performing vector splicing on the text vector through the direct splicing layer to obtain a first spliced vector; passing the first spliced vector through the linear projection layer to obtain the initial sequence corresponding to the text vector; and calculating the initial sequence through the iterative recursive layer to obtain the hidden-layer state of each cell in the context-free table structure, obtaining the initial table structure corresponding to the text vector.
  • the iterative recursive layer is a multi-dimensional recurrent neural network, and the step in which the processor calculates the initial sequence through the iterative recursive layer to obtain the hidden-layer state of each cell in the context-free table structure and obtain the initial table structure corresponding to the text vector includes: obtaining first gated recurrent data of a specified cell in a first spatial direction and a second spatial direction, wherein the first spatial direction and the second spatial direction are the two opposite directions of a first spatial dimension, and the specified cell is any cell in the context-free table structure; obtaining second gated recurrent data of the specified cell in a third spatial direction and a fourth spatial direction, wherein the third spatial direction and the fourth spatial direction are the two opposite directions of a second spatial dimension, the second spatial dimension being perpendicular to the first spatial dimension; obtaining the hidden-layer state of the specified cell according to the first gated recurrent data and the second gated recurrent data; and obtaining, according to the calculation method of the hidden-layer state of the specified cell, the hidden-layer state of each cell in the context-free table structure, obtaining the initial table structure corresponding to the text vector.
  • the first sequence encoding layer is the first sequence encoding layer connected to the text vectorizer and includes table-guided attention, and the step in which the processor inputs the text vector and the first table structure expression into the first sequence encoding layer to obtain the first sequence expression corresponding to the text vector includes: obtaining a preset query and the initial assignment of the key-value pairs corresponding to the query; computing, through the table-guided attention according to the initial assignment, the first output values corresponding to the text vector; assigning a weight to each of the first output values according to the score function; and inputting the weighted first output values and the text vector into the feedforward neural network to obtain the first sequence expression corresponding to the text vector.
  • the second table encoding layer includes a direct concatenation layer, a linear projection layer and an iterative recursive layer that are connected in sequence, and the step in which the processor inputs the first sequence expression and the first table structure expression into the second table encoding layer to obtain the second table structure expression corresponding to the text vector includes: obtaining the second context relationship probability output by the BERT model encoding layer connected to the second table encoding layer; inputting the second context relationship probability and the first sequence expression into the direct splicing layer, and performing vector splicing through the direct splicing layer to obtain a second spliced vector; passing the second spliced vector through the linear projection layer to obtain a specified sequence; and inputting the specified sequence and the first table structure expression into the iterative recursive layer, and computing through the iterative recursive layer the hidden-layer state of each cell in the first table structure expression to obtain the second table structure expression.
  • the second sequence encoding layer includes table-guided attention and a feedforward neural network, and the step in which the processor inputs the first sequence expression and the second table structure expression into the second sequence encoding layer to obtain the second sequence expression corresponding to the text vector includes: obtaining the corresponding query in the second table structure expression and the specified assignment of the key-value pairs corresponding to the query; computing, through the table-guided attention according to the specified assignment, the second output values corresponding to the second table structure expression; assigning a weight to each of the second output values according to the score function; and inputting the weighted second output values and the first sequence expression into the feedforward neural network to obtain the second sequence expression.
  • Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (SSRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

A method for predicting sentence entities, relating to the technical field of neural networks in artificial intelligence, comprising: obtaining a text vector corresponding to the sentence to be analyzed (S1); inputting the text vector into a first table encoding layer to obtain a first table structure expression corresponding to the text vector (S2); inputting the text vector and the first table structure expression into a first sequence encoding layer to obtain a first sequence expression corresponding to the text vector (S3); inputting the first sequence expression and the first table structure expression into a second table encoding layer to obtain a second table structure expression corresponding to the text vector (S4); inputting the first sequence expression and the second table structure expression into a second sequence encoding layer to obtain a second sequence expression corresponding to the text vector (S5); and obtaining, according to the way the second table structure expression is obtained, the entity relationship prediction result output by the last table encoding layer, and obtaining, according to the way the second sequence expression is obtained, the entity prediction result output by the last sequence encoding layer (S6). Joint learning alleviates the error propagation problem of pipeline methods and improves the precision of entity prediction.

Description

Method, apparatus and computer device for predicting sentence entities
This application claims priority to the Chinese patent application filed with the China Patent Office on February 25, 2021, with application number 202110212245.0 and invention title "预测语句实体的方法、装置和计算机设备" ("Method, apparatus and computer device for predicting sentence entities"), the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of neural networks in artificial intelligence, and in particular to a method, an apparatus and a computer device for predicting sentence entities.
Background
With the development of artificial intelligence, language models such as those used in speech recognition have become increasingly popular on smart devices. Existing language models mostly perform predictive analysis based on entity recognition and relation extraction. However, the inventors realized that existing named entity recognition and relation extraction are handled by two independent models, so the association between the two cannot be captured. Moreover, in some classification tasks the entities usually cannot be used for the task directly, and an additional entity recognizer has to be added to form a pipeline; pipeline methods, however, are prone to error propagation, which impairs accurate judgment of the prediction results.
Technical Problem
Existing named entity recognition and relation extraction are handled by two independent models, so the association between the two cannot be captured. Moreover, in some classification tasks the entities usually cannot be used for the task directly, and an additional entity recognizer has to be added to form a pipeline; pipeline methods, however, are prone to error propagation, which impairs accurate judgment of the prediction results.
Technical Solution
The main purpose of this application is to provide a method for predicting sentence entities, aiming to solve the technical problem that existing entity recognizers are prone to error propagation.
In a first aspect, this application proposes a method for predicting sentence entities, comprising:
obtaining a text vector corresponding to the sentence to be analyzed;
inputting the text vector into a first table encoding layer to obtain a first table structure expression corresponding to the text vector;
inputting the text vector and the first table structure expression into a first sequence encoding layer to obtain a first sequence expression corresponding to the text vector;
inputting the first sequence expression and the first table structure expression into a second table encoding layer to obtain a second table structure expression corresponding to the text vector, wherein the second table encoding layer is adjacently connected to the first table encoding layer;
inputting the first sequence expression and the second table structure expression into a second sequence encoding layer to obtain a second sequence expression corresponding to the text vector, wherein the second sequence encoding layer is adjacently connected to the first sequence encoding layer;
obtaining, according to the way the second table structure expression is obtained, the entity relationship prediction result output by the last table encoding layer, and obtaining, according to the way the second sequence expression is obtained, the entity prediction result output by the last sequence encoding layer.
In a second aspect, this application further provides an apparatus for predicting sentence entities, comprising:
an acquisition module, used to obtain a text vector corresponding to the sentence to be analyzed;
a first input module, used to input the text vector into a first table encoding layer to obtain a first table structure expression corresponding to the text vector;
a second input module, used to input the text vector and the first table structure expression into a first sequence encoding layer to obtain a first sequence expression corresponding to the text vector;
a third input module, used to input the first sequence expression and the first table structure expression into a second table encoding layer to obtain a second table structure expression corresponding to the text vector, wherein the second table encoding layer is adjacently connected to the first table encoding layer;
a fourth input module, used to input the first sequence expression and the second table structure expression into a second sequence encoding layer to obtain a second sequence expression corresponding to the text vector, wherein the second sequence encoding layer is adjacently connected to the first sequence encoding layer;
an obtaining module, used to obtain, according to the way the second table structure expression is obtained, the entity relationship prediction result output by the last table encoding layer, and to obtain, according to the way the second sequence expression is obtained, the entity prediction result output by the last sequence encoding layer.
In a third aspect, this application further provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor, when executing the computer program, implements a method for predicting sentence entities, wherein the method includes: obtaining a text vector corresponding to the sentence to be analyzed; inputting the text vector into a first table encoding layer to obtain a first table structure expression corresponding to the text vector; inputting the text vector and the first table structure expression into a first sequence encoding layer to obtain a first sequence expression corresponding to the text vector; inputting the first sequence expression and the first table structure expression into a second table encoding layer to obtain a second table structure expression corresponding to the text vector, wherein the second table encoding layer is adjacently connected to the first table encoding layer; inputting the first sequence expression and the second table structure expression into a second sequence encoding layer to obtain a second sequence expression corresponding to the text vector, wherein the second sequence encoding layer is adjacently connected to the first sequence encoding layer; and obtaining, according to the way the second table structure expression is obtained, the entity relationship prediction result output by the last table encoding layer, and obtaining, according to the way the second sequence expression is obtained, the entity prediction result output by the last sequence encoding layer.
In a fourth aspect, this application further provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements a method for predicting sentence entities, wherein the method includes: obtaining a text vector corresponding to the sentence to be analyzed; inputting the text vector into a first table encoding layer to obtain a first table structure expression corresponding to the text vector; inputting the text vector and the first table structure expression into a first sequence encoding layer to obtain a first sequence expression corresponding to the text vector; inputting the first sequence expression and the first table structure expression into a second table encoding layer to obtain a second table structure expression corresponding to the text vector, wherein the second table encoding layer is adjacently connected to the first table encoding layer; inputting the first sequence expression and the second table structure expression into a second sequence encoding layer to obtain a second sequence expression corresponding to the text vector, wherein the second sequence encoding layer is adjacently connected to the first sequence encoding layer; and obtaining, according to the way the second table structure expression is obtained, the entity relationship prediction result output by the last table encoding layer, and obtaining, according to the way the second sequence expression is obtained, the entity prediction result output by the last sequence encoding layer.
Beneficial Effects
Through the connection of two different types of encoders for joint learning, this application alleviates the error propagation problem of pipeline methods, benefits in both training and use from the interrelation between entity relationship prediction results and entity prediction results, and improves the precision of entity prediction.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a method for predicting sentence entities according to an embodiment of this application;
FIG. 2 is a schematic structural diagram of a model for predicting sentence entities according to an embodiment of this application;
FIG. 3 is a schematic diagram of the interaction between a table encoding layer and a sequence encoding layer in the model structure for predicting sentence entities according to an embodiment of this application;
FIG. 4 is a schematic flowchart of a system for predicting sentence entities according to an embodiment of this application;
FIG. 5 is a schematic diagram of the internal structure of a computer device according to an embodiment of this application.
Best Mode for Carrying Out the Invention
To make the purpose, technical solutions and advantages of this application clearer, this application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain this application and are not intended to limit it.
Referring to FIG. 1, a method for predicting sentence entities according to an embodiment of this application includes:
S1: Obtain a text vector corresponding to the sentence to be analyzed;
S2: Input the text vector into a first table encoding layer to obtain a first table structure expression corresponding to the text vector;
S3: Input the text vector and the first table structure expression into a first sequence encoding layer to obtain a first sequence expression corresponding to the text vector;
S4: Input the first sequence expression and the first table structure expression into a second table encoding layer to obtain a second table structure expression corresponding to the text vector, wherein the second table encoding layer is adjacently connected to the first table encoding layer;
S5: Input the first sequence expression and the second table structure expression into a second sequence encoding layer to obtain a second sequence expression corresponding to the text vector, wherein the second sequence encoding layer is adjacently connected to the first sequence encoding layer;
S6: According to the way the second table structure expression is obtained, obtain the entity relationship prediction result output by the last table encoding layer, and according to the way the second sequence expression is obtained, obtain the entity prediction result output by the last sequence encoding layer.
In the embodiments of this application, the model for predicting sentence entities is composed of two different types of encoders connected together; the structure is shown in FIG. 2. One encoder is the table encoder, corresponding to the table structure representation; the other is the sequence encoder, corresponding to the sequence representation. The two encoders interact with each other in units of convolutional layers, and the multi-layer mutual interaction improves the quality of both representations and the prediction precision. Through the two expressions, this application obtains the two prediction results of named entity recognition and relation extraction simultaneously in the same model, improving the precision of entity recognition. The above named entities refer to preset entities contained in the sentence; for example, names of persons, items, organizations and so on are pre-specified as named entities. In the sentence "老王喜欢吃苹果" ("Lao Wang likes to eat apples"), "老王" and "苹果" are named entities, and "老王喜欢吃" corresponds to the relation extracted between the entities; the extracted relation raises the probability of recognizing the entity "老王" the next time.
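To make the layer-by-layer interaction concrete, the following toy sketch interleaves stub table and sequence layers and reads the two predictions off the final representations; the stub layer functions, the layer count and the label counts are hypothetical simplifications of the encoders detailed below, not the filed design:

```python
import numpy as np

rng = np.random.default_rng(5)
N, H, LAYERS = 6, 8, 3   # toy sizes; the patent does not fix the layer count

# Stand-ins for the real sub-networks: each table layer consumes the previous
# sequence and table representations; each sequence layer consumes the
# previous sequence representation and the freshly updated table.
def table_layer(S_prev, T_prev):
    return np.tanh(T_prev + S_prev[:, None, :] + S_prev[None, :, :])

def sequence_layer(S_prev, T_curr):
    return np.tanh(S_prev + T_curr.mean(axis=1))

S = rng.normal(size=(N, H))   # text vectors of the sentence
T = np.zeros((N, N, H))       # initial (context-free) table structure

for _ in range(LAYERS):       # multi-layer mutual interaction
    T = table_layer(S, T)     # e.g. first table encoding layer, second, ...
    S = sequence_layer(S, T)  # e.g. first sequence encoding layer, second, ...

# The last table layer feeds relation prediction (one score table per relation
# type); the last sequence layer feeds entity prediction (one tag per word).
relation_logits = T @ rng.normal(size=(H, 4))   # 4 hypothetical relation types
entity_logits = S @ rng.normal(size=(H, 5))     # 5 hypothetical entity tags
```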
The text vector in the embodiments of this application includes word embeddings, character embeddings and contextual word embeddings. For a sentence x containing N words, x = [x_i]_{1≤i≤N}, where i denotes the i-th word of the sentence; the corresponding word embedding is x_i^w ∈ R^{d1} and the corresponding character embedding is x_i^c ∈ R^{d2}. Word embeddings and character embeddings are computed by an LSTM, RNNs or the like. Contextual word embeddings are produced by models such as BERT and are denoted x_i^l ∈ R^{d3}. Here R denotes the value range, w denotes the word embedding, c denotes the character embedding, and d1, d2 and d3 denote the dimensions of the word embedding, the character embedding and the contextual word embedding respectively.
The first table encoding layer and the second table encoding layer in the embodiments of this application are any two adjacent table encoding layers in the model for predicting sentence entities; the first sequence encoding layer and the second sequence encoding layer are any two adjacent sequence encoding layers in the model. "First" and "second" are used only for distinction, not for limitation; similar terms elsewhere play the same role and are not repeated.
This application connects two different types of encoders and achieves accurate recognition of named entities through joint learning of the table structure expression and the sequence expression of the same input sentence, without adding an additional entity recognizer to form a pipeline. This avoids the error propagation problem of pipeline methods, benefits in training and use from the interrelation between entity relationship prediction results and entity prediction results, and improves the precision of entity prediction.
Further, the first table encoding layer is the first table encoding layer connected to the text vectorizer, and step S2 of inputting the text vector into the first table encoding layer to obtain the first table structure expression corresponding to the text vector includes:
S21: Construct, in the first table encoding layer, an initial table structure without context relationships from the text vector;
S22: Obtain the first context relationship probability output by the encoding layer of the BERT model;
S23: Associate the first context relationship probability with the initial table structure to obtain the first table structure expression.
In the embodiments of this application, the table structure in the first table encoding layer is defined as the initial table structure, which is a context-free table structure. The association prediction relationships in the table structure are then transformed through the first context relationship probability output by the encoding layer of the BERT model. The parameters in the encoding layer of the BERT model connected to the first table encoding layer are the pre-trained attention weights.
The table encoder is the neural network used to learn the table structure expression; its structure is shown on the left side of FIG. 3. Computation is accelerated by adding a direct splicing layer and a linear projection layer, the direct splicing layer being a Concat operation that forms the table. For example, the table is an N × N vector table in which the vector in the i-th row and j-th column corresponds to the pair formed by the i-th word and the j-th word of the sentence. A table structure without context relationships is constructed first; then, through the context relationship probabilities output by the encoding layer of the BERT model, the two word vectors in the sentence sequence are connected, yielding the attention-corrected table structure. For example, before the correction, the vector expression between two words in the l-th layer is X_{l,i,j} = ReLU(Linear([S_{l-1,i}; S_{l-1,j}])), with X_l ∈ R^{N×N×H}; after the correction, the stacked pre-trained attention weights T^att_{i,j} are concatenated into the input of the linear mapping as well: X_{l,i,j} = ReLU(Linear([S_{l-1,i}; S_{l-1,j}; T^att_{i,j}])).
As indicated by the dashed part of FIG. 2, information in the form of attention weights from a pre-trained language model such as BERT is exploited: the attention weights of all heads and all layers are stacked to form T^att ∈ R^{N×N×(L_l·A_l)}, where L_l is the number of stacked Transformer layers and A_l is the number of multi-head attention heads of each Transformer layer. In the embodiments of this application, the number of hidden neurons in the BERT fully connected layer is halved, which improves the calculation speed. Above, ReLU (Rectified Linear Unit) is an activation function, Linear denotes a linear mapping, X_{l,i,j} denotes the vector expression between the i-th word and the j-th word in the l-th layer, S_{l-1,i} denotes the sequence representation of the i-th word in layer l-1, S_{l-1,j} denotes the sequence representation of the j-th word in layer l-1, and X_l denotes the vector representation of the sentence in the l-th layer.
Further, the first table encoding layer includes a direct splicing layer, a linear projection layer and an iterative recursive layer connected in sequence, and step S21 of constructing, in the first table encoding layer, the initial table structure without context relationships from the text vector includes:
S211: Splice the text vector through the direct splicing layer to obtain a first spliced vector;
S212: Pass the first spliced vector through the linear projection layer to obtain an initial sequence corresponding to the text vector;
S213: Calculate the initial sequence through the iterative recursive layer to obtain the hidden-layer state of each cell in the context-free table structure, obtaining the initial table structure corresponding to the text vector.
In the embodiments of this application, the above text vector is attached to each word of the sentence; after vector splicing in the direct splicing layer, linear projection through the Linear & ReLU projection layer forms the initial sequence S_0, expressed as S_0 = Linear([x^w; x^c; x^l]), S_0 ∈ R^{N×H}, where each word is represented by an H-dimensional vector. The iterative recursive layer includes a multi-dimensional recurrent neural network and/or gated recurrent units, which add context to X_l.
Further, when the iterative recursive layer is a multi-dimensional recurrent neural network, step S213 of calculating the initial sequence through the iterative recursive layer to obtain the hidden-layer state of each cell in the context-free table structure and obtain the initial table structure corresponding to the text vector includes:
S2131: Obtain first gated recurrent data of a specified cell in a first spatial direction and a second spatial direction, wherein the first spatial direction and the second spatial direction are the two opposite directions of a first spatial dimension, and the specified cell is any cell in the context-free table structure;
S2132: Obtain second gated recurrent data of the specified cell in a third spatial direction and a fourth spatial direction, wherein the third spatial direction and the fourth spatial direction are the two opposite directions of a second spatial dimension, the second spatial dimension being perpendicular to the first spatial dimension;
S2133: Obtain the hidden-layer state of the specified cell according to the first gated recurrent data and the second gated recurrent data;
S2134: According to the calculation method of the hidden-layer state of the specified cell, obtain the hidden-layer state of each cell in the context-free table structure, obtaining the initial table structure corresponding to the text vector.
In the embodiments of this application, a recurrent neural network (RNN) and gated recurrent units (GRU) are used to add context to X_l. Through the iterative calculation of the iterative recursive layer, the hidden-layer state of each cell of the contextualized table structure is computed, forming the contextualized table structure expression: T_{l,i,j} = GRU(X_{l,i,j}, T_{l-1,i,j}, T_{l,i-1,j}, T_{l,i,j-1}).
So that the surrounding context can be reached from every spatial direction, the recurrent neural network used in this application is a multi-dimensional recurrent neural network (MD-RNN); to reduce the amount of computation, a 2D-RNN is used, which considers RNNs from four spatial directions. The table structure expression after the iterative calculation, T_{l,i,j}, is the concatenation of the hidden layers of two RNNs, i.e. T_{l,i,j} = [T_{l,i,j}^{(a)}; T_{l,i,j}^{(c)}], where a and c denote different spatial dimensions; for example, a denotes the up-down dimension and c denotes the left-right or front-back dimension.
Further, the first sequence encoding layer is the first sequence encoding layer connected to the text vectorizer and includes table-guided attention, and step S3 of inputting the text vector and the first table structure expression into the first sequence encoding layer to obtain the first sequence expression corresponding to the text vector includes:
S31: Obtain a preset query and the initial assignment of the key-value pairs corresponding to the query;
S32: Compute, through the table-guided attention according to the initial assignment, the first output values corresponding to the text vector;
S33: Assign a weight to each of the first output values according to a score function;
S34: Input the weighted first output values and the text vector into a feedforward neural network to obtain the first sequence expression corresponding to the text vector.
In the embodiments of this application, the sequence encoder is used to learn the sequence representation; the i-th vector in the vector sequence corresponds to the i-th word of the sentence. The sequence encoder architecture of this application is similar to the Transformer, as shown on the right side of FIG. 3, but this application adds table-guided attention and replaces scaled dot-product attention with table-guided attention so that new table-guided attention can be generated.
First, given the table-guided attention parameters, namely the queries Q and the key-value pairs K (keys) and V (values) corresponding to Q: for each query Q, the output value is a weighted sum, and the weight assigned to each output value is determined by the relevance of the query Q to all keys, given by the score function f. For example, for each query Q_i and key K_j, the score function f is expressed as f(Q_i, K_j) = U · g(Q_i, K_j) = U · T_{l,i,j}, where U is a learnable vector parameter that the model can adjust within a preset range, and g denotes the function that maps each query-key pair to a vector, corresponding to the weight relevance.
The table-guided attention in the embodiments of this application is multi-head attention in which each head has independent parameters; their outputs are connected through a fully connected layer to obtain the final attention output value. The feedforward neural network (FFNN) in the table-sequence encoder has residual connections and layer normalization, and the output sequence expression takes the standard residual form: S̃_l = LayerNorm(S_{l-1} + SelfAttn(S_{l-1})), S_l = LayerNorm(S̃_l + FFNN(S̃_l)).
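A minimal sketch of this residual-plus-normalization block, assuming the standard Transformer-style arrangement (post-norm, position-wise FFNN); the weights are random placeholders for learned parameters:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    sd = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sd + eps)

N, H = 6, 8
rng = np.random.default_rng(6)
attn_out = rng.normal(size=(N, H))   # output of the table-guided attention
S_prev = rng.normal(size=(N, H))     # previous sequence representation S_{l-1}
W1 = rng.normal(size=(H, 4 * H)) * 0.1
W2 = rng.normal(size=(4 * H, H)) * 0.1

# Residual connection around the attention, then a position-wise FFNN with
# its own residual connection, each followed by layer normalization.
S_tilde = layer_norm(S_prev + attn_out)
ffnn = np.maximum(S_tilde @ W1, 0.0) @ W2
S_l = layer_norm(S_tilde + ffnn)
```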
Further, the second table encoding layer includes a direct splicing layer, a linear projection layer and an iterative recursive layer connected in sequence, and step S4 of inputting the first sequence expression and the first table structure expression into the second table encoding layer to obtain the second table structure expression corresponding to the text vector includes:
S41: Obtain the second context relationship probability output by the BERT model encoding layer connected to the second table encoding layer;
S42: Input the second context relationship probability and the first sequence expression into the direct splicing layer, and perform vector splicing through the direct splicing layer to obtain a second spliced vector;
S43: Pass the second spliced vector through the linear projection layer to obtain a specified sequence;
S44: Input the specified sequence and the first table structure expression into the iterative recursive layer, and compute through it the hidden-layer state of each cell in the first table structure expression to obtain the second table structure expression.
The embodiments of this application take the interaction between an intermediate table encoding layer and sequence encoding layer as an example; the calculation process follows the same principle as the first table encoding layer, only the input data differ. For an explanation of the calculation process, refer to the calculation process of the table encoding layer above; it is not repeated here.
Further, the second sequence encoding layer includes table-guided attention and a feedforward neural network, and step S5 of inputting the first sequence expression and the second table structure expression into the second sequence encoding layer to obtain the second sequence expression corresponding to the text vector includes:
S51: Obtain the corresponding query in the second table structure expression and the specified assignment of the key-value pairs corresponding to the query;
S52: Compute, through the table-guided attention according to the specified assignment, the second output values corresponding to the second table structure expression;
S53: Assign a weight to each of the second output values according to the score function;
S54: Input the weighted second output values and the first sequence expression into the feedforward neural network to obtain the second sequence expression.
The embodiments of this application take the interaction between an intermediate table encoding layer and sequence encoding layer as an example; the calculation process follows the same principle as the first sequence encoding layer, only the input data differ. For an explanation of the calculation process, refer to the calculation process of the sequence encoding layer above; it is not repeated here.
Referring to FIG. 4, an apparatus for predicting sentence entities according to an embodiment of this application includes:
an acquisition module 1, used to obtain a text vector corresponding to the sentence to be analyzed;
a first input module 2, used to input the text vector into a first table encoding layer to obtain a first table structure expression corresponding to the text vector;
a second input module 3, used to input the text vector and the first table structure expression into a first sequence encoding layer to obtain a first sequence expression corresponding to the text vector;
a third input module 4, used to input the first sequence expression and the first table structure expression into a second table encoding layer to obtain a second table structure expression corresponding to the text vector, wherein the second table encoding layer is adjacently connected to the first table encoding layer;
a fourth input module 5, used to input the first sequence expression and the second table structure expression into a second sequence encoding layer to obtain a second sequence expression corresponding to the text vector, wherein the second sequence encoding layer is adjacently connected to the first sequence encoding layer;
an obtaining module 6, used to obtain, according to the way the second table structure expression is obtained, the entity relationship prediction result output by the last table encoding layer, and to obtain, according to the way the second sequence expression is obtained, the entity prediction result output by the last sequence encoding layer.
The explanation of the apparatus in the embodiments of this application is the same as the corresponding parts of the method and is not repeated.
Further, the first table encoding layer is the first table encoding layer connected to the text vectorizer, and the first input module 2 includes:
a construction submodule, used to construct, in the first table encoding layer, an initial table structure without context relationships from the text vector;
a first acquisition submodule, used to obtain the first context relationship probability output by the encoding layer of the BERT model;
an association submodule, used to associate the first context relationship probability with the initial table structure to obtain the first table structure expression.
Further, the first table encoding layer includes a direct splicing layer, a linear projection layer and an iterative recursive layer connected in sequence, and the construction submodule includes:
a splicing unit, used to splice the text vector through the direct splicing layer to obtain a first spliced vector;
a linear projection unit, used to pass the first spliced vector through the linear projection layer to obtain the initial sequence corresponding to the text vector;
a calculation unit, used to calculate the initial sequence through the iterative recursive layer to obtain the hidden-layer state of each cell in the context-free table structure, obtaining the initial table structure corresponding to the text vector.
Further, when the iterative recursive layer is a multi-dimensional recurrent neural network, the calculation unit includes:
a first acquisition subunit, used to obtain first gated recurrent data of a specified cell in a first spatial direction and a second spatial direction, wherein the first spatial direction and the second spatial direction are the two opposite directions of a first spatial dimension, and the specified cell is any cell in the context-free table structure;
a second acquisition subunit, used to obtain second gated recurrent data of the specified cell in a third spatial direction and a fourth spatial direction, wherein the third spatial direction and the fourth spatial direction are the two opposite directions of a second spatial dimension, the second spatial dimension being perpendicular to the first spatial dimension;
a first obtaining subunit, used to obtain the hidden-layer state of the specified cell according to the first gated recurrent data and the second gated recurrent data;
a second obtaining subunit, used to obtain, according to the calculation method of the hidden-layer state of the specified cell, the hidden-layer state of each cell in the context-free table structure, obtaining the initial table structure corresponding to the text vector.
Further, the first sequence encoding layer is the first sequence encoding layer connected to the text vectorizer and includes table-guided attention, and the second input module 3 includes:
a second acquisition submodule, used to obtain a preset query and the initial assignment of the key-value pairs corresponding to the query;
a first operation submodule, used to compute, through the table-guided attention according to the initial assignment, the first output values corresponding to the text vector;
a first assignment submodule, used to assign a weight to each of the first output values according to the score function;
a first input submodule, used to input the weighted first output values and the text vector into the feedforward neural network to obtain the first sequence expression corresponding to the text vector.
Further, the second table encoding layer includes a direct splicing layer, a linear projection layer and an iterative recursive layer connected in sequence, and the third input module 4 includes:
a third acquisition submodule, used to obtain the second context relationship probability output by the BERT model encoding layer connected to the second table encoding layer;
a second input submodule, used to input the second context relationship probability and the first sequence expression into the direct splicing layer, and to perform vector splicing through the direct splicing layer to obtain a second spliced vector;
a projection submodule, used to pass the second spliced vector through the linear projection layer to obtain a specified sequence;
a third input submodule, used to input the specified sequence and the first table structure expression into the iterative recursive layer, and to compute through it the hidden-layer state of each cell in the first table structure expression to obtain the second table structure expression.
Further, the second sequence encoding layer includes table-guided attention and a feedforward neural network, and the fourth input module 5 includes:
a fourth acquisition submodule, used to obtain the corresponding query in the second table structure expression and the specified assignment of the key-value pairs corresponding to the query;
a second operation submodule, used to compute, through the table-guided attention according to the specified assignment, the second output values corresponding to the second table structure expression;
a second assignment submodule, used to assign a weight to each of the second output values according to the score function;
a fourth input submodule, used to input the weighted second output values and the first sequence expression into the feedforward neural network to obtain the second sequence expression.
Referring to FIG. 5, an embodiment of this application further provides a computer device; the computer device may be a server, and its internal structure may be as shown in FIG. 5. The computer device includes a processor, a memory, a network interface and a database connected by a system bus, wherein the processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store all the data required by the process of predicting sentence entities. The network interface of the computer device is used to communicate with external terminals through a network connection. The computer program, when executed by the processor, implements the method for predicting sentence entities.
The above processor executes the above method for predicting sentence entities, including: obtaining a text vector corresponding to the sentence to be analyzed; inputting the text vector into a first table encoding layer to obtain a first table structure expression corresponding to the text vector; inputting the text vector and the first table structure expression into a first sequence encoding layer to obtain a first sequence expression corresponding to the text vector; inputting the first sequence expression and the first table structure expression into a second table encoding layer to obtain a second table structure expression corresponding to the text vector, wherein the second table encoding layer is adjacently connected to the first table encoding layer; inputting the first sequence expression and the second table structure expression into a second sequence encoding layer to obtain a second sequence expression corresponding to the text vector, wherein the second sequence encoding layer is adjacently connected to the first sequence encoding layer; and obtaining, according to the way the second table structure expression is obtained, the entity relationship prediction result output by the last table encoding layer, and obtaining, according to the way the second sequence expression is obtained, the entity prediction result output by the last sequence encoding layer.
The above computer device performs joint learning through the connection of two different types of encoders, which alleviates the error propagation problem of pipeline methods; it also benefits in training and use from the interrelation between entity relationship prediction results and entity prediction results, improving the precision of entity prediction.
In one embodiment, the first table encoding layer is the first table encoding layer connected to the text vectorizer, and the step in which the above processor inputs the text vector into the first table encoding layer to obtain the first table structure expression corresponding to the text vector includes: constructing, in the first table encoding layer, an initial table structure without context relationships from the text vector; obtaining the first context relationship probability output by the encoding layer of the BERT model; and associating the first context relationship probability with the initial table structure to obtain the first table structure expression.
In one embodiment, the first table encoding layer includes a direct splicing layer, a linear projection layer and an iterative recursive layer connected in sequence, and the step in which the above processor constructs, in the first table encoding layer, the initial table structure without context relationships from the text vector includes: splicing the text vector through the direct splicing layer to obtain a first spliced vector; passing the first spliced vector through the linear projection layer to obtain the initial sequence corresponding to the text vector; and calculating the initial sequence through the iterative recursive layer to obtain the hidden-layer state of each cell in the context-free table structure, obtaining the initial table structure corresponding to the text vector.
In one embodiment, the iterative recursive layer is a multi-dimensional recurrent neural network, and the step in which the above processor calculates the initial sequence through the iterative recursive layer to obtain the hidden-layer state of each cell in the context-free table structure and obtain the initial table structure corresponding to the text vector includes: obtaining first gated recurrent data of a specified cell in a first spatial direction and a second spatial direction, wherein the first spatial direction and the second spatial direction are the two opposite directions of a first spatial dimension, and the specified cell is any cell in the context-free table structure; obtaining second gated recurrent data of the specified cell in a third spatial direction and a fourth spatial direction, wherein the third spatial direction and the fourth spatial direction are the two opposite directions of a second spatial dimension, the second spatial dimension being perpendicular to the first spatial dimension; obtaining the hidden-layer state of the specified cell according to the first gated recurrent data and the second gated recurrent data; and obtaining, according to the calculation method of the hidden-layer state of the specified cell, the hidden-layer state of each cell in the context-free table structure, obtaining the initial table structure corresponding to the text vector.
In one embodiment, the first sequence encoding layer is the first sequence encoding layer connected to the text vectorizer and includes table-guided attention, and the step in which the above processor inputs the text vector and the first table structure expression into the first sequence encoding layer to obtain the first sequence expression corresponding to the text vector includes: obtaining a preset query and the initial assignment of the key-value pairs corresponding to the query; computing, through the table-guided attention according to the initial assignment, the first output values corresponding to the text vector; assigning a weight to each of the first output values according to the score function; and inputting the weighted first output values and the text vector into the feedforward neural network to obtain the first sequence expression corresponding to the text vector.
In one embodiment, the second table encoding layer includes a direct splicing layer, a linear projection layer and an iterative recursive layer connected in sequence, and the step in which the above processor inputs the first sequence expression and the first table structure expression into the second table encoding layer to obtain the second table structure expression corresponding to the text vector includes: obtaining the second context relationship probability output by the BERT model encoding layer connected to the second table encoding layer; inputting the second context relationship probability and the first sequence expression into the direct splicing layer, and performing vector splicing through the direct splicing layer to obtain a second spliced vector; passing the second spliced vector through the linear projection layer to obtain a specified sequence; and inputting the specified sequence and the first table structure expression into the iterative recursive layer, and computing through it the hidden-layer state of each cell in the first table structure expression to obtain the second table structure expression.
In one embodiment, the second sequence encoding layer includes table-guided attention and a feedforward neural network, and the step in which the above processor inputs the first sequence expression and the second table structure expression into the second sequence encoding layer to obtain the second sequence expression corresponding to the text vector includes: obtaining the corresponding query in the second table structure expression and the specified assignment of the key-value pairs corresponding to the query; computing, through the table-guided attention according to the specified assignment, the second output values corresponding to the second table structure expression; assigning a weight to each of the second output values according to the score function; and inputting the weighted second output values and the first sequence expression into the feedforward neural network to obtain the second sequence expression.
Those skilled in the art can understand that the structure shown in FIG. 5 is only a block diagram of part of the structure related to the solution of this application and does not constitute a limitation on the computer device to which the solution of this application is applied.
An embodiment of this application further provides a computer-readable storage medium, which is a volatile storage medium or a non-volatile storage medium, on which a computer program is stored; when executed by a processor, the computer program implements the method for predicting sentence entities, including: obtaining a text vector corresponding to the sentence to be analyzed; inputting the text vector into a first table encoding layer to obtain a first table structure expression corresponding to the text vector; inputting the text vector and the first table structure expression into a first sequence encoding layer to obtain a first sequence expression corresponding to the text vector; inputting the first sequence expression and the first table structure expression into a second table encoding layer to obtain a second table structure expression corresponding to the text vector, wherein the second table encoding layer is adjacently connected to the first table encoding layer; inputting the first sequence expression and the second table structure expression into a second sequence encoding layer to obtain a second sequence expression corresponding to the text vector, wherein the second sequence encoding layer is adjacently connected to the first sequence encoding layer; and obtaining, according to the way the second table structure expression is obtained, the entity relationship prediction result output by the last table encoding layer, and obtaining, according to the way the second sequence expression is obtained, the entity prediction result output by the last sequence encoding layer.
The above computer-readable storage medium performs joint learning through the connection of two different types of encoders, which alleviates the error propagation problem of pipeline methods; it also benefits in training and use from the interrelation between entity relationship prediction results and entity prediction results, improving the precision of entity prediction.
In one embodiment, the first table encoding layer is the first table encoding layer connected to the text vectorizer, and the step in which the above processor inputs the text vector into the first table encoding layer to obtain the first table structure expression corresponding to the text vector includes: constructing, in the first table encoding layer, an initial table structure without contextual relations according to the text vector; acquiring the first contextual relation probability output by the encoding layer of the Bert model; and associating the first contextual relation probability with the initial table structure to obtain the first table structure expression.
In one embodiment, the first table encoding layer includes a direct concatenation layer, a linear projection layer and an iterative recursion layer connected in sequence, and the step in which the above processor constructs, in the first table encoding layer, the initial table structure without contextual relations according to the text vector includes: performing vector concatenation on the text vector through the direct concatenation layer to obtain a first concatenated vector; passing the first concatenated vector through the linear projection layer to obtain the initial sequence corresponding to the text vector; and computing, by passing the initial sequence through the iterative recursion layer, the hidden layer state of each cell in the table structure without contextual relations, to obtain the initial table structure corresponding to the text vector.
In one embodiment, the iterative recursion layer is a multi-dimensional recurrent neural network, and the step in which the above processor computes, by passing the initial sequence through the iterative recursion layer, the hidden layer state of each cell in the table structure without contextual relations to obtain the initial table structure corresponding to the text vector includes: acquiring first gated recurrent data of a specified cell in a first spatial direction and a second spatial direction, wherein the first spatial direction and the second spatial direction are two opposite directions of a first spatial dimension, and the specified cell is any cell in the table structure without contextual relations; acquiring second gated recurrent data of the specified cell in a third spatial direction and a fourth spatial direction, wherein the third spatial direction and the fourth spatial direction are two opposite directions of a second spatial dimension, and the second spatial dimension is perpendicular to the first spatial dimension; obtaining the hidden layer state of the specified cell according to the first gated recurrent data and the second gated recurrent data; and obtaining, following the way the hidden layer state of the specified cell is computed, the hidden layer state of each cell in the table structure without contextual relations, to obtain the initial table structure corresponding to the text vector.
In one embodiment, the first sequence encoding layer is the first sequence encoding layer connected to the text vectorizer and includes table-guided attention, and the step in which the above processor inputs the text vector and the first table structure expression into the first sequence encoding layer to obtain the first sequence expression corresponding to the text vector includes: acquiring the initial assignments of a preset query and of the key-value pair corresponding to the query; computing, through table-guided attention and according to the initial assignments, the first output values corresponding to the text vector; assigning a weight to each of the first output values according to a score function; and inputting the weighted first output values and the text vector into a feed-forward neural network to obtain the first sequence expression corresponding to the text vector.
In one embodiment, the second table encoding layer includes a direct concatenation layer, a linear projection layer and an iterative recursion layer connected in sequence, and the step in which the above processor inputs the first sequence expression and the first table structure expression into the second table encoding layer to obtain the second table structure expression corresponding to the text vector includes: acquiring the second contextual relation probability output by the encoding layer of the Bert model connected to the second table encoding layer; inputting the second contextual relation probability and the first sequence expression into the direct concatenation layer, and performing vector concatenation through the direct concatenation layer to obtain a second concatenated vector; passing the second concatenated vector through the linear projection layer to obtain a specified sequence; and inputting the specified sequence and the first table structure expression into the iterative recursion layer, and computing, through the iterative recursion layer, the hidden layer state of each cell in the first table structure expression to obtain the second table structure expression.
In one embodiment, the second sequence encoding layer includes table-guided attention and a feed-forward neural network, and the step in which the above processor inputs the first sequence expression and the second table structure expression into the second sequence encoding layer to obtain the second sequence expression corresponding to the text vector includes: acquiring the query corresponding to the second table structure expression and the specified assignments of the key-value pair corresponding to the query; computing, through table-guided attention and according to the specified assignments, the second output values corresponding to the second table structure expression; assigning a weight to each of the second output values according to the score function; and inputting the weighted second output values and the first sequence expression into the feed-forward neural network to obtain the second sequence expression.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing relevant hardware through a computer program. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the embodiments of the above methods. Any reference to the memory, storage, database or other media provided in the present application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Claims (20)

  1. A method for predicting a sentence entity, comprising:
    acquiring a text vector corresponding to a sentence to be analyzed;
    inputting the text vector into a first table encoding layer to obtain a first table structure expression corresponding to the text vector;
    inputting the text vector and the first table structure expression into a first sequence encoding layer to obtain a first sequence expression corresponding to the text vector;
    inputting the first sequence expression and the first table structure expression into a second table encoding layer to obtain a second table structure expression corresponding to the text vector, wherein the second table encoding layer is adjacently connected to the first table encoding layer;
    inputting the first sequence expression and the second table structure expression into a second sequence encoding layer to obtain a second sequence expression corresponding to the text vector, wherein the second sequence encoding layer is adjacently connected to the first sequence encoding layer; and
    according to the way the second table structure expression is obtained, obtaining an entity relation prediction result output by the last table encoding layer, and according to the way the second sequence expression is obtained, obtaining an entity prediction result output by the last sequence encoding layer.
  2. The method for predicting a sentence entity according to claim 1, wherein when the first table encoding layer is the first table encoding layer connected to a text vectorizer, the step of inputting the text vector into the first table encoding layer to obtain the first table structure expression corresponding to the text vector comprises:
    constructing, in said first table encoding layer, an initial table structure without contextual relations according to the text vector;
    acquiring a first contextual relation probability output by an encoding layer of a Bert model;
    associating the first contextual relation probability with the initial table structure to obtain the first table structure expression.
  3. The method for predicting a sentence entity according to claim 2, wherein the first table encoding layer comprises a direct concatenation layer, a linear projection layer and an iterative recursion layer connected in sequence, and the step of constructing, in said first table encoding layer, the initial table structure without contextual relations according to the text vector comprises:
    performing vector concatenation on the text vector through the direct concatenation layer to obtain a first concatenated vector;
    passing the first concatenated vector through the linear projection layer to obtain an initial sequence corresponding to the text vector;
    computing, by passing the initial sequence through the iterative recursion layer, the hidden layer state of each cell in the table structure without contextual relations, to obtain the initial table structure corresponding to the text vector.
  4. The method for predicting a sentence entity according to claim 3, wherein the iterative recursion layer is a multi-dimensional recurrent neural network, and the step of computing, by passing the initial sequence through the iterative recursion layer, the hidden layer state of each cell in the table structure without contextual relations to obtain the initial table structure corresponding to the text vector comprises:
    acquiring first gated recurrent data of a specified cell in a first spatial direction and a second spatial direction, wherein the first spatial direction and the second spatial direction are two opposite directions of a first spatial dimension, and the specified cell is any cell in the table structure without contextual relations;
    acquiring second gated recurrent data of the specified cell in a third spatial direction and a fourth spatial direction, wherein the third spatial direction and the fourth spatial direction are two opposite directions of a second spatial dimension, and the second spatial dimension is perpendicular to the first spatial dimension;
    obtaining the hidden layer state of the specified cell according to the first gated recurrent data and the second gated recurrent data;
    obtaining, following the way the hidden layer state of the specified cell is computed, the hidden layer state of each cell in the table structure without contextual relations, to obtain the initial table structure corresponding to the text vector.
  5. The method for predicting a sentence entity according to claim 1, wherein the first sequence encoding layer is the first sequence encoding layer connected to a text vectorizer and comprises table-guided attention, and the step of inputting the text vector and the first table structure expression into the first sequence encoding layer to obtain the first sequence expression corresponding to the text vector comprises:
    acquiring initial assignments of a preset query and of the key-value pair corresponding to the query;
    computing, through table-guided attention and according to the initial assignments, first output values corresponding to the text vector;
    assigning a weight to each of the first output values according to a score function;
    inputting the weighted first output values and the text vector into a feed-forward neural network to obtain the first sequence expression corresponding to the text vector.
  6. The method for predicting a sentence entity according to claim 1, wherein the second table encoding layer comprises a direct concatenation layer, a linear projection layer and an iterative recursion layer connected in sequence, and the step of inputting the first sequence expression and the first table structure expression into the second table encoding layer to obtain the second table structure expression corresponding to the text vector comprises:
    acquiring a second contextual relation probability output by the encoding layer of the Bert model connected to the second table encoding layer;
    inputting the second contextual relation probability and the first sequence expression into the direct concatenation layer, and performing vector concatenation through the direct concatenation layer to obtain a second concatenated vector;
    passing the second concatenated vector through the linear projection layer to obtain a specified sequence;
    inputting the specified sequence and the first table structure expression into the iterative recursion layer, and computing, through the iterative recursion layer, the hidden layer state of each cell in the first table structure expression to obtain the second table structure expression.
  7. The method for predicting a sentence entity according to claim 1, wherein the second sequence encoding layer comprises table-guided attention and a feed-forward neural network, and the step of inputting the first sequence expression and the second table structure expression into the second sequence encoding layer to obtain the second sequence expression corresponding to the text vector comprises:
    acquiring the query corresponding to the second table structure expression and specified assignments of the key-value pair corresponding to the query;
    computing, through table-guided attention and according to the specified assignments, second output values corresponding to the second table structure expression;
    assigning a weight to each of the second output values according to the score function;
    inputting the weighted second output values and the first sequence expression into the feed-forward neural network to obtain the second sequence expression.
  8. An apparatus for predicting a sentence entity, comprising:
    an acquisition module, configured to acquire a text vector corresponding to a sentence to be analyzed;
    a first input module, configured to input the text vector into a first table encoding layer to obtain a first table structure expression corresponding to the text vector;
    a second input module, configured to input the text vector and the first table structure expression into a first sequence encoding layer to obtain a first sequence expression corresponding to the text vector;
    a third input module, configured to input the first sequence expression and the first table structure expression into a second table encoding layer to obtain a second table structure expression corresponding to the text vector, wherein the second table encoding layer is adjacently connected to the first table encoding layer;
    a fourth input module, configured to input the first sequence expression and the second table structure expression into a second sequence encoding layer to obtain a second sequence expression corresponding to the text vector, wherein the second sequence encoding layer is adjacently connected to the first sequence encoding layer;
    an obtaining module, configured to obtain, according to the way the second table structure expression is obtained, an entity relation prediction result output by the last table encoding layer, and to obtain, according to the way the second sequence expression is obtained, an entity prediction result output by the last sequence encoding layer.
  9. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements a method for predicting a sentence entity; wherein the method for predicting a sentence entity comprises:
    acquiring a text vector corresponding to a sentence to be analyzed;
    inputting the text vector into a first table encoding layer to obtain a first table structure expression corresponding to the text vector;
    inputting the text vector and the first table structure expression into a first sequence encoding layer to obtain a first sequence expression corresponding to the text vector;
    inputting the first sequence expression and the first table structure expression into a second table encoding layer to obtain a second table structure expression corresponding to the text vector, wherein the second table encoding layer is adjacently connected to the first table encoding layer;
    inputting the first sequence expression and the second table structure expression into a second sequence encoding layer to obtain a second sequence expression corresponding to the text vector, wherein the second sequence encoding layer is adjacently connected to the first sequence encoding layer; and
    according to the way the second table structure expression is obtained, obtaining an entity relation prediction result output by the last table encoding layer, and according to the way the second sequence expression is obtained, obtaining an entity prediction result output by the last sequence encoding layer.
  10. The computer device according to claim 9, wherein when the first table encoding layer is the first table encoding layer connected to a text vectorizer, the step of inputting the text vector into the first table encoding layer to obtain the first table structure expression corresponding to the text vector comprises:
    constructing, in said first table encoding layer, an initial table structure without contextual relations according to the text vector;
    acquiring a first contextual relation probability output by an encoding layer of a Bert model;
    associating the first contextual relation probability with the initial table structure to obtain the first table structure expression.
  11. The computer device according to claim 10, wherein the first table encoding layer comprises a direct concatenation layer, a linear projection layer and an iterative recursion layer connected in sequence, and the step of constructing, in said first table encoding layer, the initial table structure without contextual relations according to the text vector comprises:
    performing vector concatenation on the text vector through the direct concatenation layer to obtain a first concatenated vector;
    passing the first concatenated vector through the linear projection layer to obtain an initial sequence corresponding to the text vector;
    computing, by passing the initial sequence through the iterative recursion layer, the hidden layer state of each cell in the table structure without contextual relations, to obtain the initial table structure corresponding to the text vector.
  12. The computer device according to claim 11, wherein the iterative recursion layer is a multi-dimensional recurrent neural network, and the step of computing, by passing the initial sequence through the iterative recursion layer, the hidden layer state of each cell in the table structure without contextual relations to obtain the initial table structure corresponding to the text vector comprises:
    acquiring first gated recurrent data of a specified cell in a first spatial direction and a second spatial direction, wherein the first spatial direction and the second spatial direction are two opposite directions of a first spatial dimension, and the specified cell is any cell in the table structure without contextual relations;
    acquiring second gated recurrent data of the specified cell in a third spatial direction and a fourth spatial direction, wherein the third spatial direction and the fourth spatial direction are two opposite directions of a second spatial dimension, and the second spatial dimension is perpendicular to the first spatial dimension;
    obtaining the hidden layer state of the specified cell according to the first gated recurrent data and the second gated recurrent data;
    obtaining, following the way the hidden layer state of the specified cell is computed, the hidden layer state of each cell in the table structure without contextual relations, to obtain the initial table structure corresponding to the text vector.
  13. The computer device according to claim 9, wherein the first sequence encoding layer is the first sequence encoding layer connected to a text vectorizer and comprises table-guided attention, and the step of inputting the text vector and the first table structure expression into the first sequence encoding layer to obtain the first sequence expression corresponding to the text vector comprises:
    acquiring initial assignments of a preset query and of the key-value pair corresponding to the query;
    computing, through table-guided attention and according to the initial assignments, first output values corresponding to the text vector;
    assigning a weight to each of the first output values according to a score function;
    inputting the weighted first output values and the text vector into a feed-forward neural network to obtain the first sequence expression corresponding to the text vector.
  14. The computer device according to claim 9, wherein the second table encoding layer comprises a direct concatenation layer, a linear projection layer and an iterative recursion layer connected in sequence, and the step of inputting the first sequence expression and the first table structure expression into the second table encoding layer to obtain the second table structure expression corresponding to the text vector comprises:
    acquiring a second contextual relation probability output by the encoding layer of the Bert model connected to the second table encoding layer;
    inputting the second contextual relation probability and the first sequence expression into the direct concatenation layer, and performing vector concatenation through the direct concatenation layer to obtain a second concatenated vector;
    passing the second concatenated vector through the linear projection layer to obtain a specified sequence;
    inputting the specified sequence and the first table structure expression into the iterative recursion layer, and computing, through the iterative recursion layer, the hidden layer state of each cell in the first table structure expression to obtain the second table structure expression.
  15. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements a method for predicting a sentence entity;
    wherein the method for predicting a sentence entity comprises:
    acquiring a text vector corresponding to a sentence to be analyzed;
    inputting the text vector into a first table encoding layer to obtain a first table structure expression corresponding to the text vector;
    inputting the text vector and the first table structure expression into a first sequence encoding layer to obtain a first sequence expression corresponding to the text vector;
    inputting the first sequence expression and the first table structure expression into a second table encoding layer to obtain a second table structure expression corresponding to the text vector, wherein the second table encoding layer is adjacently connected to the first table encoding layer;
    inputting the first sequence expression and the second table structure expression into a second sequence encoding layer to obtain a second sequence expression corresponding to the text vector, wherein the second sequence encoding layer is adjacently connected to the first sequence encoding layer; and
    according to the way the second table structure expression is obtained, obtaining an entity relation prediction result output by the last table encoding layer, and according to the way the second sequence expression is obtained, obtaining an entity prediction result output by the last sequence encoding layer.
  16. The computer-readable storage medium according to claim 15, wherein when the first table encoding layer is the first table encoding layer connected to a text vectorizer, the step of inputting the text vector into the first table encoding layer to obtain the first table structure expression corresponding to the text vector comprises:
    constructing, in said first table encoding layer, an initial table structure without contextual relations according to the text vector;
    acquiring a first contextual relation probability output by an encoding layer of a Bert model;
    associating the first contextual relation probability with the initial table structure to obtain the first table structure expression.
  17. The computer-readable storage medium according to claim 16, wherein the first table encoding layer comprises a direct concatenation layer, a linear projection layer and an iterative recursion layer connected in sequence, and the step of constructing, in said first table encoding layer, the initial table structure without contextual relations according to the text vector comprises:
    performing vector concatenation on the text vector through the direct concatenation layer to obtain a first concatenated vector;
    passing the first concatenated vector through the linear projection layer to obtain an initial sequence corresponding to the text vector;
    computing, by passing the initial sequence through the iterative recursion layer, the hidden layer state of each cell in the table structure without contextual relations, to obtain the initial table structure corresponding to the text vector.
  18. The computer-readable storage medium according to claim 17, wherein the iterative recursion layer is a multi-dimensional recurrent neural network, and the step of computing, by passing the initial sequence through the iterative recursion layer, the hidden layer state of each cell in the table structure without contextual relations to obtain the initial table structure corresponding to the text vector comprises:
    acquiring first gated recurrent data of a specified cell in a first spatial direction and a second spatial direction, wherein the first spatial direction and the second spatial direction are two opposite directions of a first spatial dimension, and the specified cell is any cell in the table structure without contextual relations;
    acquiring second gated recurrent data of the specified cell in a third spatial direction and a fourth spatial direction, wherein the third spatial direction and the fourth spatial direction are two opposite directions of a second spatial dimension, and the second spatial dimension is perpendicular to the first spatial dimension;
    obtaining the hidden layer state of the specified cell according to the first gated recurrent data and the second gated recurrent data;
    obtaining, following the way the hidden layer state of the specified cell is computed, the hidden layer state of each cell in the table structure without contextual relations, to obtain the initial table structure corresponding to the text vector.
  19. The computer-readable storage medium according to claim 15, wherein the first sequence encoding layer is the first sequence encoding layer connected to a text vectorizer and comprises table-guided attention, and the step of inputting the text vector and the first table structure expression into the first sequence encoding layer to obtain the first sequence expression corresponding to the text vector comprises:
    acquiring initial assignments of a preset query and of the key-value pair corresponding to the query;
    computing, through table-guided attention and according to the initial assignments, first output values corresponding to the text vector;
    assigning a weight to each of the first output values according to a score function;
    inputting the weighted first output values and the text vector into a feed-forward neural network to obtain the first sequence expression corresponding to the text vector.
  20. The computer-readable storage medium according to claim 15, wherein the second table encoding layer comprises a direct concatenation layer, a linear projection layer and an iterative recursion layer connected in sequence, and the step of inputting the first sequence expression and the first table structure expression into the second table encoding layer to obtain the second table structure expression corresponding to the text vector comprises:
    acquiring a second contextual relation probability output by the encoding layer of the Bert model connected to the second table encoding layer;
    inputting the second contextual relation probability and the first sequence expression into the direct concatenation layer, and performing vector concatenation through the direct concatenation layer to obtain a second concatenated vector;
    passing the second concatenated vector through the linear projection layer to obtain a specified sequence;
    inputting the specified sequence and the first table structure expression into the iterative recursion layer, and computing, through the iterative recursion layer, the hidden layer state of each cell in the first table structure expression to obtain the second table structure expression.
PCT/CN2021/084569 2021-02-25 2021-03-31 Method, apparatus and computer device for predicting sentence entities WO2022178950A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110212245.0A 2021-02-25 2021-02-25 Method, apparatus and computer device for predicting sentence entities
CN202110212245.0 2021-02-25

Publications (1)

Publication Number Publication Date
WO2022178950A1 true WO2022178950A1 (zh) 2022-09-01

Family

ID=76246221

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/084569 WO2022178950A1 (zh) 2021-02-25 2021-03-31 Method, apparatus and computer device for predicting sentence entities

Country Status (2)

Country Link
CN (1) CN112949307A (zh)
WO (1) WO2022178950A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114676705A (zh) * 2021-06-17 2022-06-28 腾讯云计算(北京)有限责任公司 Dialogue relation processing method, computer and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107871158A (zh) * 2016-09-26 2018-04-03 清华大学 Knowledge graph representation learning method and apparatus combining sequential text information
CN108681544A (zh) * 2018-03-07 2018-10-19 中山大学 Deep learning method based on graph topology and entity text descriptions
CN111339774A (zh) * 2020-02-07 2020-06-26 腾讯科技(深圳)有限公司 Entity relation extraction method for text and model training method
WO2020261234A1 (en) * 2019-06-28 2020-12-30 Tata Consultancy Services Limited System and method for sequence labeling using hierarchical capsule based neural network

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111079431A (zh) * 2019-10-31 2020-04-28 北京航天云路有限公司 Joint entity and relation extraction method based on transfer learning
CN112163092B (zh) * 2020-10-10 2022-07-12 成都数之联科技股份有限公司 Entity and relation extraction method, system, apparatus and medium
CN112380867A (zh) * 2020-12-04 2021-02-19 腾讯科技(深圳)有限公司 Text processing and knowledge base construction method, apparatus and storage medium

Also Published As

Publication number Publication date
CN112949307A (zh) 2021-06-11

Similar Documents

Publication Publication Date Title
CN110781312B (zh) Text classification method and apparatus based on a semantic representation model, and computer device
CN111344779B (zh) Training and/or using an encoder model to determine responsive actions for natural language input
CN109034378B (zh) Network representation generation method and apparatus for neural networks, storage medium and device
WO2020151310A1 (zh) Text generation method and apparatus, computer device and medium
US20210232753A1 (en) Ml using n-gram induced input representation
CN111191002A (zh) Neural code search method and apparatus based on hierarchical embedding
CN111400461B (zh) Intelligent customer service question matching method and apparatus
JP2019159654A (ja) Learning system and method for time-series information, and neural network model
CN112699215B (zh) Rating prediction method and system based on a capsule network and an interactive attention mechanism
CN113033189B (zh) Semantic encoding method based on a long short-term memory network with attention dispersion
US20230096805A1 (en) Contrastive Siamese Network for Semi-supervised Speech Recognition
US20240185086A1 (en) Model distillation method and related device
WO2022178948A1 (zh) Model distillation method and apparatus, device, and storage medium
CN117151084B (zh) Chinese spelling and grammar error correction method, storage medium and device
CN110543566A (zh) Intent classification method based on self-attention neighbor-relation encoding
CN111027681B (zh) Time-series data processing model training method, data processing method, apparatus and storage medium
WO2022178950A1 (zh) Method, apparatus and computer device for predicting sentence entities
CN115374270A (zh) Legal text summarization method based on graph neural networks
CN115563314A (zh) Knowledge graph representation learning method enhanced by multi-source information fusion
CN109308316A (zh) Adaptive dialogue generation system based on topic clustering
CN112463935A (zh) Open-domain dialogue generation method and model with strongly generalizing knowledge selection
KR20210067865A (ko) Method and apparatus for generating a question-answering learning model through adversarial learning
Gunasekara et al. Quantized-dialog language model for goal-oriented conversational systems
US11954429B2 (en) Automated notebook completion using sequence-to-sequence transformer
CN114357284A (zh) Personalized crowdsourcing task recommendation method and system based on deep learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21927384; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21927384; Country of ref document: EP; Kind code of ref document: A1)