CN114201506B - Context-dependent semantic analysis method - Google Patents

Info

Publication number
CN114201506B
CN114201506B CN202111524256A
Authority
CN
China
Prior art keywords
alignment matrix
current
alignment
text
round
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111524256.9A
Other languages
Chinese (zh)
Other versions
CN114201506A (en)
Inventor
陈观林
余皆毅
李甜
杨武剑
翁文勇
Current Assignee
Zhejiang University City College ZUCC
Original Assignee
Zhejiang University City College ZUCC
Priority date
Filing date
Publication date
Application filed by Zhejiang University City College ZUCC filed Critical Zhejiang University City College ZUCC
Priority to CN202111524256.9A priority Critical patent/CN114201506B/en
Publication of CN114201506A publication Critical patent/CN114201506A/en
Application granted granted Critical
Publication of CN114201506B publication Critical patent/CN114201506B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a context-dependent semantic analysis method, which comprises the following steps: encoding the data with ERNIE, arranging the question text and the tables and columns of the database in sequence as the initial input of the Alignment-Rect-Augmented NL2SQL model; updating attention using the current text encoding and the text encodings of the previous rounds; and performing an attention operation using the obtained question-text encoding and the alignment matrix of the previous round, and computing the cross entropy between the generated state-tracking alignment matrix and the true labels. The beneficial effects of the invention are as follows: a Chinese context-dependent NL2SQL model based on alignment-matrix enhancement is provided; on the basis of the RATSQL model, it uses the pre-trained language model BERT to obtain character-level encodings of the question text and the database schema, yielding word vectors based on context information, and introduces state-tracking attention and correlation features to strengthen the model's alignment between the question text and the database schema.

Description

Context-dependent semantic analysis method
Technical Field
The invention belongs to the technical field of context-dependent semantic parsing, and in particular relates to a context-dependent semantic parsing method.
Background
As digital data has grown alongside the internet, conventional relational databases are typically used to store such data for ease of administration, operation, and maintenance. How to query the necessary information from these relational databases through natural language, that is, how to convert a human natural-language query description into an executable SQL database query statement, has become one of the most popular research directions in the field of natural language processing.
The natural language query interface aims to let a user complete human-computer interaction with a relational database through natural language and obtain the desired data, and it is a component for building an automated intelligent database query system. The most important task in implementing a natural language query interface is constructing SQL statements from natural language queries, known as the NL2SQL task.
The core of NL2SQL is eliminating the differences in representation and structure among the natural language query, the structure and content of the data tables in the database, and the SQL statement. The problem is how to map the query intent of a natural language query onto the canonical description of the database so as to form an accurate, executable SQL statement; here the NL2SQL operation is constrained by a syntax tree, and each part of the SQL statement is handled in its own subtree. This eliminates the inconsistencies between Chinese text data and column names, and between the natural language query description and the data stored in the database.
Existing NL2SQL research methods can be broadly divided into context-independent and context-dependent methods. A context-independent method is single-round NL2SQL: given a text and a database schema, a model learns the relation between them to generate the desired SQL statement. The disadvantage of this approach is that it can only process a single round of information, whereas people habitually converse in simple sentences, and it is demanding to supply all the desired information at once. A context-dependent method is multi-round NL2SQL: through multiple rounds of dialogue, the dialogue content of each round is aligned with the database schema, and the corresponding SQL statement of each round is obtained by stepwise updating. Compared with context-independent methods, context-dependent methods can capture the more complex and varied semantic information described in natural language text.
The Chinese invention patent (application number CN202110737345.5, titled "Semantic analysis method, device, electronic equipment and storage medium") encodes the input question and the corresponding database, then generates the SQL query statement from the encoding result, performing the following processing on each SQL clause: determining the question segment corresponding to the SQL clause in the question, then generating the SQL clause from that question segment, which can improve the accuracy of the generated SQL query statement. However, that invention is context-independent NL2SQL and performs poorly on context-dependent NL2SQL tasks.
Disclosure of Invention
The invention aims to overcome the above defects of the prior art and provides a context-dependent semantic analysis method.
The context-dependent semantic parsing method specifically comprises the following steps:
S1, encoding the data with ERNIE, arranging the question text and the tables and columns of the database in sequence as the initial input of the Alignment-Rect-Augmented NL2SQL model, and obtaining the encoded representation of the question text and the database schema;

x_input = (q_1, q_2, ..., q_i, t_1, t_2, ..., t_j, c_1, c_2, ..., c_k)

in the above, q_i represents each character element of the question text, i being the total number of character elements of the question text; t_j represents the name of each table in the database, j being the total number of tables in the database; c_k represents the column names under the corresponding tables in the database, k being the total number of column names in the database;
S2, updating attention using the current text encoding and the text encodings of the previous rounds;
S3, performing an attention operation using the question-text encoding obtained in step S2 and the alignment matrix of the previous round, and computing the cross entropy between the generated state-tracking alignment matrix and the true labels; the alignment matrix is the co-occurrence information between the question text and the database schema, i.e., a matrix of correspondences that appear both in the question and in the database schema; it aligns the relation between the question text and the database schema, and the dialogue information of the current round is added to correct alignment errors in the current round's alignment matrix;
S4, calculating the correlation of the alignment matrix in an n_gram manner and acquiring it through correlation Schema-Linking to obtain the correlation alignment matrix: first, determining the n_gram length, searching the question text, and listing the words or phrases within the n_gram length in the question text; then performing correlation calculations between the candidate fragments and the table names and column names respectively, and comparing each correlation result with a set threshold; if the correlation result is greater than the threshold, candidate fragments related to table names are labeled CES, candidate fragments related to column names are labeled TES, and candidate fragments related to neither are labeled NONE;
S5, performing a fusion calculation on the state-tracking alignment matrix of step S3 and the correlation alignment matrix of step S4 to obtain the current alignment matrix;
S6, replacing the alignment matrix in the NL2SQL neural network model with the alignment matrix obtained in step S5, and substituting it into step S3 to train the resulting Alignment-Rect-Augmented NL2SQL model.
Preferably, the step S2 specifically includes:
the historical-information tracking attention mechanism captures the interrelation between the current round's question text and the historical question texts; in the current round t, the attention weights between the current round's question-text encoding and the historical question-text encodings are calculated by dot-product attention, and the historical question-text encodings are then added to the current question-text encoding by weighted average:

s_i = (q^t)^T W_turn-att q^(t-i)

α_turn = softmax(s_i)

q̃^t = q^t + Σ_i α_turn,i · q^(t-i)

in the above, q^(t-i) is the word embedding of the previous rounds, q^t is the word embedding of the current round, W_turn-att is the parameter to be learned, α_turn is the learned attention weight of the previous rounds' question-text encodings with respect to the current round, and q̃^t is the question-text encoding updated by historical attention; q̃^t is used to describe the current context-dependent information.
Preferably, the step S3 specifically includes the steps of:
S3.1, based on the schema_linking annotated in the data, performing attention calculation using the question-text features extracted in this round and the word embedding of the previous round's alignment matrix, and establishing a parameter model that captures the guiding effect of the previous round's alignment matrix on the current round, to guide the update of the current round's alignment matrix and align the relation between the question text and the database schema more accurately;

R_current ← R_current + a_i · R_last

in the above, the word embedding of the previous round's alignment matrix R_last and the word embedding q_current of the current round's dialogue are combined through the weights W and U to be learned with a tanh activation to produce the attention weight a_i; R_current is the alignment matrix of the current round, R_last is the alignment matrix of the previous round, and the updated R_current is obtained once the weights are learned;
S3.2, correcting alignment errors in the current round's alignment matrix by adding the dialogue information of the current round: taking the dialogue information as the key and the word embedding of the previous round's alignment matrix as the query, attention is computed between the key and the query to measure the importance of the previous round's alignment matrix to the current alignment matrix under the current dialogue information; the current round's alignment matrix is then updated with the weights computed from the previous round's alignment matrix and this attention; finally, the schema-linking annotation information is taken as the true label, and a loss function is added to the original Alignment-Rect-Augmented NL2SQL model and trained jointly as a supervision signal to update the current alignment matrix; the loss function is:

L_align = -(1/(m·n)) Σ_{i=1}^{m} Σ_{j=1}^{n} Σ_{k=1}^{q} y_k · log p(x_k)

in the above, y_k is the label of the sample, p(x_k) is the probability that the sample is predicted as the positive class, m and n are the dimensions of the alignment matrix, and q is the number of classes of the alignment-matrix labels.
Preferably, the step S5 specifically includes the steps of:
S5.1, performing a weighted summation of the explicit alignment matrix and the implicit alignment matrix, setting a parameter to be learned, and letting the Alignment-Rect-Augmented NL2SQL model back-propagate by gradient descent to learn the weight relation between the explicit and implicit alignment matrices;
S5.2, adding a ReLU function as the activation function to give the parameters to be learned nonlinear capability:
R̂ = ReLU(W [R̂_st ; R̂_rel])

e_ij = (x_i W^Q)(x_j W^K + r_ij)^T / √d_z,  α_ij = softmax_j(e_ij),  z_i = Σ_j α_ij (x_j W^V + r_ij)

in the above, R̂_st is the current explicit alignment-matrix encoding, R̂_rel is the current correlation alignment-matrix encoding, ReLU is the activation function, W is the parameter to be learned, and R̂ is the learned final alignment matrix; x_i, x_j are the encoded representations of the input, W^Q, W^K, W^V are the parameters to be learned for the attention operation, r_ij is the relative positional relationship of the operation, α_ij is the attention weight, and z_i is the final result obtained with the parameters to be learned.
The beneficial effects of the invention are as follows:
Aiming at the lack of a matched context-dependent model in context-independent NL2SQL technology, the invention provides a Chinese context-dependent NL2SQL model based on alignment-matrix enhancement, Alignment-Rect-Augmented NL2SQL. On the basis of the RATSQL model, it uses the pre-trained language model BERT to obtain character-level encodings of the question text and the database schema, yielding word vectors based on context information, and then introduces state-tracking attention and correlation features to strengthen the model's alignment between the question text and the database schema.
Experimental results on the CHASE dataset show that the Alignment-Rect-Augmented NL2SQL model outperforms other multi-round Chinese NL2SQL models on the reported indices. Alignment-Rect-Augmented NL2SQL aligns the relation between the question text and the database schema better and learns the preset relations, thereby improving the accuracy of the SQL-generation task.
Drawings
FIG. 1 is a schematic flow diagram of the alignment-matrix-enhanced Alignment-Rect-Augmented NL2SQL model proposed by the present invention;
FIG. 2 is a diagram of the historical-information tracking attention.
Detailed Description
The invention is further described below with reference to examples. The following examples are presented only to aid in the understanding of the invention. It should be noted that it will be apparent to those skilled in the art that modifications can be made to the present invention without departing from the principles of the invention, and such modifications and adaptations are intended to be within the scope of the invention as defined in the following claims.
Example 1
An embodiment of the present application provides a context-dependent semantic parsing method as shown in fig. 1:
S1, encoding the data with ERNIE, arranging the question text and the tables and columns of the database in sequence as the initial input of the Alignment-Rect-Augmented NL2SQL model, and obtaining the encoded representation of the question text and the database schema;

x_input = (q_1, q_2, ..., q_i, t_1, t_2, ..., t_j, c_1, c_2, ..., c_k)

in the above, q_i represents each character element of the question text, i being the total number of character elements of the question text; t_j represents the name of each table in the database, j being the total number of tables in the database; c_k represents the column names under the corresponding tables in the database, k being the total number of column names in the database.
S2, updating attention using the current text encoding and the text encodings of the previous rounds;
S3, performing an attention operation using the question-text encoding obtained in step S2 and the alignment matrix of the previous round, and computing the cross entropy between the generated state-tracking alignment matrix and the true labels; the alignment matrix is the co-occurrence information between the question text and the database schema, i.e., a matrix of correspondences that appear both in the question and in the database schema; it aligns the relation between the question text and the database schema, and the dialogue information of the current round is added to correct alignment errors in the current round's alignment matrix;
S4, calculating the correlation of the alignment matrix in an n_gram manner and acquiring it through correlation Schema-Linking to obtain the correlation alignment matrix: first, determining the n_gram length, searching the question text, and listing the words or phrases within the n_gram length in the question text; then performing correlation calculations between the candidate fragments and the table names and column names respectively, and comparing each correlation result with a set threshold; if the correlation result is greater than the threshold, candidate fragments related to table names are labeled CES, candidate fragments related to column names are labeled TES, and candidate fragments related to neither are labeled NONE;
S5, performing a fusion calculation on the state-tracking alignment matrix of step S3 and the correlation alignment matrix of step S4 to obtain the current alignment matrix;
S6, replacing the alignment matrix in the NL2SQL neural network model with the alignment matrix obtained in step S5, and substituting it into step S3 to train the resulting Alignment-Rect-Augmented NL2SQL model.
Example two
Based on the first embodiment, the second embodiment of the present application details the method of the alignment-matrix-enhanced Chinese context-dependent NL2SQL model Alignment-Rect-Augmented NL2SQL of the first embodiment, comprising the following steps:
S1, acquiring the question-text and database-schema encodings;
ERNIE is used here to encode the data; the model takes the question together with the table and column sequence of the database as its initial input. The specific input format is as follows:

x_input = (q_1, q_2, ..., q_i, t_1, t_2, ..., t_j, c_1, c_2, ..., c_k)

in the above, q_i represents each character element of the question text, i being the total number of character elements of the question text; t_j represents the name of each table in the database, j being the total number of tables in the database; c_k represents the column names under the corresponding tables in the database, k being the total number of column names in the database.
This input is then fed into the model to obtain the encoded representation of the question text and the database schema.
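The serialization above can be sketched as follows. This is an illustrative Python sketch, not the patent's code; the character-level split of the question and the absence of separator tokens are assumptions, since the text specifies only the ordering q_1..q_i, t_1..t_j, c_1..c_k.

```python
def build_nl2sql_input(question, schema):
    """Flatten a question and a database schema into the sequence
    x_input = (q_1..q_i, t_1..t_j, c_1..c_k) described above.

    `schema` maps each table name to its list of column names.
    Splitting the question into single characters mirrors the
    character-level encoding used for Chinese text (an assumption).
    """
    tokens = list(question)              # q_1..q_i: question characters
    for table, columns in schema.items():
        tokens.append(table)             # t_j: table name
        tokens.extend(columns)           # c_k: columns of that table
    return tokens
```

In practice this token sequence would then be passed through the ERNIE tokenizer and encoder to obtain contextual representations; that step is omitted here.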
S2, updating by using the current text codes and the attention operation of the previous rounds;
as shown in FIG. 2, the historical information tracking attention mechanism captures the problem and historical problem correlations for the current turn. In the current round t, the attention weight of the problem code and the history problem code of the current round is calculated by using a dot product attention mode, and then the history problem code is added with the current problem code by a weighted average mode to obtainMay be used to describe the current context-bearing information. The formula is as follows:
α turn =softmax(s i )
in the above-mentioned description of the invention,word embedding of the first few rounds, < ->Word embedding, W, of the current round turn-att Is a parameter which can be learned, alpha turn Is the attention weight of the learned question code of the previous round to the current round,/for the previous round>Is the problem code updated by the history attention.
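A minimal NumPy sketch of this history-tracking attention, under the assumption that the score s_i is a bilinear dot product between the current encoding and each historical encoding through W_turn-att (the exact formula images are not reproduced in the record):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def history_tracking_attention(q_t, history, w_turn_att):
    """Update the current-round question encoding q_t (shape [d]) with a
    weighted average of the previous rounds' encodings `history`
    (a list of [d] vectors), as described above."""
    s = np.array([q_t @ w_turn_att @ h for h in history])  # s_i: dot-product scores
    alpha = softmax(s)                                     # alpha_turn = softmax(s_i)
    return q_t + alpha @ np.stack(history)                 # weighted history added to q_t
```

With w_turn_att as the identity the update reduces to plain dot-product attention, which makes the mechanism easy to check by hand.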
S3, performing attention operation by using the problem text code obtained in the step S2 and the alignment matrix of the previous round, and performing cross entropy calculation on the generated state tracking alignment matrix and the real label;
the alignment matrix is co-occurrence information between the problem and the database schema, i.e., a corresponding matrix of occurrences in the problem and occurrences in the database schema.
Based on the schema_linking annotated in the data, attention is computed using the question features extracted in this round and the word embedding of the previous round's alignment matrix, establishing a parameter model that captures the guiding effect of the previous round's alignment matrix on the current round; this model guides the update of the current round's alignment matrix so that the relation between the question and the database schema is aligned better. The formula is as follows:
R_current ← R_current + a_i · R_last

in the above, the word embedding of the previous round's alignment matrix R_last and the word embedding q_current of the current round's dialogue are combined through the learnable weights W and U with a tanh activation to produce the attention weight a_i; R_current is the alignment matrix of the current round, R_last is the alignment matrix of the previous round, and the updated R_current is obtained once the weights are learned.
In the model, possible alignment errors in the current round's alignment matrix can be corrected by adding knowledge of the current round's dialogue information. The dialogue information is taken as the key and the word embedding of the previous round's alignment matrix as the query; attention is computed between the key and the query to measure the importance of the previous round's alignment matrix to the current alignment matrix under the current dialogue information. The model then updates the current round's alignment matrix with the weights computed from the previous round's alignment matrix and this attention; finally, the schema-linking annotation information is taken as the true label, and a loss function is added to the original model and trained jointly, updating the current alignment matrix as a supervision signal. The formula is as follows:
L_align = -(1/(m·n)) Σ_{i=1}^{m} Σ_{j=1}^{n} Σ_{k=1}^{q} y_k · log p(x_k)

in the above, y_k is the label of the sample, p(x_k) is the probability that the sample is predicted as the positive class, m and n are the dimensions of the alignment matrix, and q is the number of classes of the alignment-matrix labels.
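The cross-entropy supervision over the alignment matrix can be sketched as below; averaging over the m·n matrix cells and the one-hot label layout are assumptions consistent with the symbol roles described above.

```python
import numpy as np

def alignment_cross_entropy(pred, labels, eps=1e-9):
    """Cross entropy between a predicted alignment matrix `pred`
    (shape [m, n, q]: per-cell probabilities over q link classes)
    and one-hot labels `labels` of the same shape, averaged over
    the m*n matrix cells. `eps` guards against log(0)."""
    m, n = pred.shape[0], pred.shape[1]
    return float(-(labels * np.log(pred + eps)).sum() / (m * n))
```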
S4, acquiring the correlation of the alignment matrix by using a correlation scheme-Linking;
as shown in table 1 below, the present embodiment uses the n_gram method to calculate the correlation of the alignment matrix. First, the length of the n_gram is determined, where the value is typically set to 5; secondly, searching the question text after the n_gram length, and listing possible words or sentences within the n_gram length in the question text; then, the candidate segment and the table name are respectively used for carrying out correlation calculation with the column name, and the segment of the candidate segment is marked as the following condition according to the set threshold value: 1) When the number is greater than a certain threshold, the table name is referred to as "CES", the column name is referred to as "TES", and the table name is referred to as "NONE".
Table 1 correlation-based name matching algorithm table
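Since the table itself is not reproduced in the record, the n_gram candidate search and threshold labeling described above can be sketched as follows; the character-overlap similarity stands in for the patent's unspecified correlation measure, and the 0.5 threshold is an assumption.

```python
def ngrams(text, max_n=5):
    """All substrings of the question text up to max_n characters
    (the embodiment typically sets the n_gram length to 5)."""
    return {text[i:i + n] for n in range(1, max_n + 1) for i in range(len(text) - n + 1)}

def char_overlap(a, b):
    """A simple similarity: shared-character ratio over the union of
    characters. A stand-in for the unspecified correlation measure."""
    return len(set(a) & set(b)) / max(len(set(a) | set(b)), 1)

def tag_candidates(question, tables, columns, threshold=0.5):
    """Label each candidate fragment CES if it correlates with a table
    name, TES if with a column name, NONE otherwise (labels as in the
    patent text)."""
    tags = {}
    for frag in ngrams(question):
        if any(char_overlap(frag, t) > threshold for t in tables):
            tags[frag] = "CES"
        elif any(char_overlap(frag, c) > threshold for c in columns):
            tags[frag] = "TES"
        else:
            tags[frag] = "NONE"
    return tags
```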
S5, performing fusion calculation on the state tracking alignment matrix and the correlation alignment matrix by utilizing the step S3 and the step S4 to obtain a current alignment matrix;
In this module, a fusion calculation is performed on the explicit alignment matrix and the implicit alignment matrix. Simply adding them is not a good choice, because the magnitude of each part's effect on the model is uncertain. They are therefore weighted and summed: a learnable parameter is set, and the model learns the weight relation between them through back-propagation by gradient descent. Furthermore, the relation between the explicit and implicit alignment matrices is not necessarily linear and may be a nonlinear combination. Because the explicit relation and the correlation are not necessarily evaluated in the same dimension, an activation function is added on top of the parameterized combination, giving the parameter learning nonlinear capability. Since the required output is not a probability, the sigmoid function is unsuitable; since the output range is not necessarily (-1, +1), the tanh function is also unsuitable for the alignment-matrix-enhanced encoder; the ReLU function is therefore chosen as the activation function for this part. The formula is as follows:
R̂ = ReLU(W [R̂_st ; R̂_rel])

e_ij = (x_i W^Q)(x_j W^K + r_ij)^T / √d_z,  α_ij = softmax_j(e_ij),  z_i = Σ_j α_ij (x_j W^V + r_ij)

in the above, R̂_st is the current explicit alignment-matrix encoding, R̂_rel is the current correlation alignment-matrix encoding, ReLU is the activation function, W is the learnable parameter, and R̂ is the learned final alignment matrix; x_i, x_j are the encoded representations of the input, W^Q, W^K, W^V are the learnable parameters for the attention operation, r_ij is the relative positional relationship of the operation, α_ij is the attention weight, and z_i is the final result obtained with the learnable parameters.
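The fusion step can be sketched as a weighted sum followed by ReLU; the scalar weights w1 and w2 stand in for the learnable parameter, whose exact parameterization the patent leaves to training.

```python
import numpy as np

def fuse_alignments(r_state, r_rel, w1, w2):
    """Fuse the explicit (state-tracking) alignment matrix r_state and
    the implicit (correlation) alignment matrix r_rel by a learnable
    weighted sum; ReLU supplies the nonlinear capability argued for
    above."""
    return np.maximum(w1 * r_state + w2 * r_rel, 0.0)
```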
S6, replacing the alignment matrix in the NL2SQL neural network model with the alignment matrix obtained in step S5, and substituting it into step S3 to train the resulting model.
The several deep learning models of this embodiment are all trained by fine-tuning the pre-trained model ERNIE plus task-specific network layers for the downstream tasks, with the tokenizer using the ERNIE version. To prevent an overly large learning rate from damaging the knowledge already learned by the pre-trained model during fine-tuning, the learning rate for the pre-trained part is set to 1e-5; the downstream tasks are decoded with an LSTM, and since the LSTM is not pre-trained, its learning rate is set to 2e-3 for better learning. lstm_hidden_size is the hidden dimension of the BiLSTM output, train_epochs is the total number of training iterations, and batch_size is the batch size used in training. word_embedding_size is 768 dimensions, the encoder's lstm_hidden_size is 400 dimensions, and the decoder's lstm_hidden_size is 300 dimensions; because multi-round NL2SQL has a sequential dependency between rounds, batch_size can only be 1. train_epochs is 50.
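The training hyperparameters of this embodiment, collected into one place for reference; the dict layout and key names are illustrative and not part of the patent, while the values are those stated in the text.

```python
# Training configuration collected from the embodiment above.
CONFIG = {
    "pretrained_model": "ernie",
    "lr_pretrained": 1e-5,            # small LR protects ERNIE's pre-trained knowledge
    "lr_decoder": 2e-3,               # the LSTM decoder is trained from scratch
    "word_embedding_size": 768,       # embedding dimension
    "encoder_lstm_hidden_size": 400,  # BiLSTM hidden dimension, encoder side
    "decoder_lstm_hidden_size": 300,  # BiLSTM hidden dimension, decoder side
    "train_epochs": 50,
    "batch_size": 1,                  # rounds depend on earlier rounds, so one dialogue per batch
}
```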

Claims (4)

1. A method of context-dependent semantic parsing, comprising the steps of:
s1, encoding data by using ERNIE, sequentially arranging the tables and columns of a problem text and a database, and obtaining an encoding representation of the problem text and the database mode by using the table and the columns as initial input of an Alignment-Rect-Augmented NL2SQL model;
x input =(q 1 ,q 2 ,...,q i ,t 1 ,t 2 ,...,t j ,c 1 ,c 2 ,...,c k )
in the above, q i Each character element representing a question text, i being the total number of character elements of the question text; t is t j Representing the name of each table in the database, j being the total number of tables in the database; c k Representing the column names under the corresponding tables in the database, k being the total number of column names present in the database;
s2, updating attention by using the current text codes and the text codes of the previous rounds;
s3, performing attention operation by using the problem text code obtained in the step S2 and the alignment matrix of the previous round, and performing cross entropy calculation on the generated state tracking alignment matrix and the real label; the alignment matrix is shared information between the problem text and the database mode, the relation between the problem text and the database mode is aligned, and dialogue information of the current round is added to correct the alignment problem of the alignment matrix of the current round;
s4, calculating the correlation of the alignment matrix by using an n_gram mode, and acquiring the correlation of the alignment matrix by using a correlation Schema-Linking to obtain a correlation alignment matrix: firstly, determining the length of an n_gram, searching a question text, and listing words or sentences existing in the length of the n_gram in the question text; performing correlation calculation by using the candidate fragments, the table names and the column names respectively, and comparing a correlation calculation result with a set threshold value; if the correlation calculation result is greater than the threshold value, setting the candidate fragments related to the table names as CES, setting the candidate fragments related to the column names as TES, and setting the candidate fragments not related to the column names as NONE;
S5, performing a fusion calculation on the state-tracking alignment matrix of step S3 and the relevance alignment matrix of step S4 to obtain the current alignment matrix;
S6, replacing the alignment matrix in the NL2SQL neural network model with the alignment matrix obtained in step S5 and feeding it back into step S3 to train the resulting Alignment-Rect-Augmented NL2SQL model.
2. The context-dependent semantic parsing method according to claim 1, wherein step S2 is specifically:
the history-tracking attention mechanism captures the interrelation between the current round's question text and the historical question texts: in the current round t, the attention weights between the current round's question-text encoding and the historical question-text encodings are computed by dot-product attention, and the historical encodings are then added to the current encoding as a weighted average, giving
s_i = (q_i^hist)^T · W_turn-att · q_t
α_turn = softmax(s_i)
q̃_t = q_t + Σ_i (α_turn)_i · q_i^hist
In the above, q_i^hist is the word embedding of the previous rounds, q_t is the word embedding of the current round, W_turn-att is the parameter to be learned, α_turn is the learned attention weight of the previous rounds' question-text encodings with respect to the current round, and q̃_t is the question-text encoding updated by the historical attention; q̃_t is used to describe the current context-relevant information.
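The history-tracking attention of step S2 can be sketched in numpy as follows. The shapes and the learned matrix W are stand-ins for the patent's W_turn-att, and the toy vectors replace real ERNIE encodings; this is an assumption-laden sketch, not the claimed implementation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def history_attention(q_cur, q_hist, W):
    """q_cur: (d,) current-round encoding; q_hist: (T, d) one row per
    previous round; W: (d, d) learned attention parameter.
    Returns the current encoding plus a weighted average of history."""
    scores = q_hist @ (W @ q_cur)   # s_i: dot-product score per history turn
    alpha = softmax(scores)         # alpha_turn: attention weights
    return q_cur + alpha @ q_hist   # history-updated question encoding
```

With W set to the identity, the history turn most similar to the current encoding receives the largest weight, which is the behaviour the claim describes.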
3. The context-dependent semantic parsing method according to claim 2, wherein step S3 specifically comprises the steps of:
S3.1, based on the schema linking annotated in the data, attention is computed between the question-text features extracted in the current round and the word embedding of the previous round's alignment matrix; a parametric model capturing the guidance of the previous round's alignment matrix on the current round is established to guide the update of the current round's alignment matrix, aligning the relation between the question text and the database schema more accurately;
a_i = tanh(W · q_current + U · r_last)
R_current = R_current + a_i · R_last
In the above, r_last is the word embedding of the previous round's alignment matrix, q_current is the word embedding of the current round's dialogue, tanh is the activation, W and U are the weights to be learned, a_i is the learned attention weight, R_current is the alignment matrix of the current round and R_last is the alignment matrix of the previous round; the updated R_current is obtained after the weights are learned.
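A minimal numpy sketch of the step S3.1 update, under assumptions: the previous alignment matrix is pooled to a vector before the tanh gate, and W and U are flat weight vectors producing a scalar gate a_i. The pooling choice and shapes are not specified by the patent and are illustrative only.

```python
import numpy as np

def update_alignment(R_cur, R_last, q_cur, W, U):
    """R_cur, R_last: (m, n) alignment matrices; q_cur: (d,) dialogue
    embedding; W: (d,), U: (m,) learned weights.
    Gate a_i = tanh(W.q_cur + U.pool(R_last)) mixes R_last into R_cur."""
    a = np.tanh(W @ q_cur + U @ R_last.mean(axis=1))  # scalar gate a_i
    return R_cur + a * R_last                         # guided update
```

When the learned weights are zero the gate is tanh(0) = 0 and the previous round contributes nothing, which gives a simple sanity check on the update rule.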
S3.2, correcting the alignment errors of the current round's alignment matrix by adding the current round's dialogue information: the dialogue information serves as the key and the word embedding of the previous round's alignment matrix as the query, and attention is computed between the key and the query; this yields the importance of the previous round's alignment matrix for the current alignment matrix under the current dialogue information, and the attention weights thus computed are used to update the current round's alignment matrix. The schema-linking annotations are then used as ground-truth labels, a loss function is added to the original Alignment-Rect-Augmented NL2SQL model for joint training, and the current alignment matrix is updated under this supervision signal; the loss function is:
Loss = -(1 / (m·n)) · Σ_{i=1}^{m} Σ_{j=1}^{n} Σ_{k=1}^{q} y_k · log p(x_k)
In the above, y_k is the label of the sample, p(x_k) is the probability that the sample is predicted as the positive class, m and n are the dimensions of the alignment matrix, and q is the number of classes of the alignment-matrix labels.
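The supervision of step S3.2 amounts to an elementwise cross-entropy over the m × n alignment matrix with q label classes. The numpy sketch below assumes the labels are given as integer class indices and averages over all matrix cells; this averaging convention is an assumption consistent with the claim's y_k, p(x_k), m, n and q, not a quotation of the patent's code.

```python
import numpy as np

def alignment_cross_entropy(probs, labels):
    """probs: (m, n, q) per-cell class probabilities (rows of softmax
    outputs); labels: (m, n) integer class indices.
    Returns the mean negative log-likelihood over all m*n cells."""
    m, n, _ = probs.shape
    # Fancy indexing picks the probability of the true class in each cell.
    picked = probs[np.arange(m)[:, None], np.arange(n)[None, :], labels]
    return -np.log(picked).mean()
```

A perfectly confident correct prediction gives zero loss; lower confidence on the true class increases it, as with the standard cross-entropy.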
4. A context-dependent semantic parsing method according to claim 3, wherein step S5 specifically comprises the steps of:
S5.1, performing a weighted summation of the explicit alignment matrix and the implicit alignment matrix, with a parameter to be learned, allowing the Alignment-Rect-Augmented NL2SQL model to back-propagate via gradient descent and learn the weight relation between the explicit and implicit alignment matrices;
S5.2, adding a ReLU function as the activation function:
R_final = ReLU(W · [R_track ; R_rel])
e_ij = (x_i W_Q)(x_j W_K + r_ij)^T / √d
α_ij = softmax(e_ij)
z_i = Σ_j α_ij (x_j W_V + r_ij)
In the above, R_track is the current explicit (state-tracking) alignment-matrix encoding, R_rel is the current relevance alignment-matrix encoding, ReLU is the activation function, W is the parameter to be learned, and R_final is the final learned alignment matrix; x_i and x_j are the encoded representations of the input, W_Q, W_K and W_V are the parameters to be learned for the attention operation, r_ij is the relative positional relation of the operation, α_ij is the attention weight, and z_i is the final result.
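The fusion of step S5 can be sketched as a learned weighted sum of the state-tracking (explicit) and relevance (implicit) alignment matrices passed through a ReLU. The single scalar weight w is a simplifying assumption standing in for the patent's learned fusion parameters.

```python
import numpy as np

def fuse_alignments(R_track, R_rel, w):
    """Weighted sum of the two alignment matrices followed by ReLU.
    R_track, R_rel: (m, n) matrices; w: learned mixing weight in [0, 1]."""
    return np.maximum(0.0, w * R_track + (1.0 - w) * R_rel)  # ReLU
```

In training, w (or a full weight matrix in its place) would be updated by gradient descent together with the rest of the Alignment-Rect-Augmented model, as step S5.1 describes.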
CN202111524256.9A 2021-12-14 2021-12-14 Context-dependent semantic analysis method Active CN114201506B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111524256.9A CN114201506B (en) 2021-12-14 2021-12-14 Context-dependent semantic analysis method

Publications (2)

Publication Number Publication Date
CN114201506A CN114201506A (en) 2022-03-18
CN114201506B true CN114201506B (en) 2024-03-29

Family

ID=80653451


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019229769A1 (en) * 2018-05-28 2019-12-05 Thottapilly Sanjeev An auto-disambiguation bot engine for dynamic corpus selection per query
WO2021010636A1 (en) * 2019-07-17 2021-01-21 에스케이텔레콤 주식회사 Method and device for tracking dialogue state in goal-oriented dialogue system
CN112988785A (en) * 2021-05-10 2021-06-18 浙江大学 SQL conversion method and system based on language model coding and multitask decoding
CN113011136A (en) * 2021-04-02 2021-06-22 中国人民解放军国防科技大学 SQL (structured query language) analysis method and device based on correlation judgment and computer equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Semantic-aware abstractive summarization model for Chinese short texts; Ni Haiqing; Liu Dan; Shi Mengyu; Computer Science; 2020, Issue 06; full text *
Research on natural-language generation of multi-table SQL query statements; Cao Jinchao; Huang Tao; Chen Gang; Wu Xiaofan; Chen Ke; Journal of Frontiers of Computer Science and Technology; 2020, Issue 07; full text *


Similar Documents

Publication Publication Date Title
CN110119765B (en) Keyword extraction method based on Seq2Seq framework
CN108519890B (en) Robust code abstract generation method based on self-attention mechanism
CN111241294B (en) Relationship extraction method of graph convolution network based on dependency analysis and keywords
CN110929030A (en) Text abstract and emotion classification combined training method
US20030046078A1 (en) Supervised automatic text generation based on word classes for language modeling
CN111858932A (en) Multiple-feature Chinese and English emotion classification method and system based on Transformer
CN115048447B (en) Database natural language interface system based on intelligent semantic completion
CN110390049B (en) Automatic answer generation method for software development questions
CN109992775A (en) A kind of text snippet generation method based on high-level semantics
CN112883175B (en) Meteorological service interaction method and system combining pre-training model and template generation
CN115658729A (en) Method for converting natural language into SQL (structured query language) statement based on pre-training model
CN111666764A (en) XLNET-based automatic summarization method and device
CN115392252A (en) Entity identification method integrating self-attention and hierarchical residual error memory network
CN113535897A (en) Fine-grained emotion analysis method based on syntactic relation and opinion word distribution
CN114510946B (en) Deep neural network-based Chinese named entity recognition method and system
CN112925918A (en) Question-answer matching system based on disease field knowledge graph
CN114429132A (en) Named entity identification method and device based on mixed lattice self-attention network
CN115952263A (en) Question-answering method fusing machine reading understanding
CN116821168A (en) Improved NL2SQL method based on large language model
CN114356990A (en) Base named entity recognition system and method based on transfer learning
CN111581365B (en) Predicate extraction method
CN114201506B (en) Context-dependent semantic analysis method
CN111813907A (en) Question and sentence intention identification method in natural language question-answering technology
CN115203236B (en) text-to-SQL generating method based on template retrieval
CN116522165A (en) Public opinion text matching system and method based on twin structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant