CN117349311A - Database natural language query method based on improved RetNet

Info

Publication number
CN117349311A
CN117349311A (application CN202311336112.XA)
Authority
CN
China
Prior art keywords
representing
database
sql
col
column
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311336112.XA
Other languages
Chinese (zh)
Inventor
张睿恒
杨碧文
徐立新
张军
王潮
刘雨蒙
苏毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202311336112.XA priority Critical patent/CN117349311A/en
Publication of CN117349311A publication Critical patent/CN117349311A/en
Pending legal-status Critical Current

Classifications

    • G06F16/243: Natural language query formulation
    • G06F16/2433: Query languages
    • G06F16/252: Interfacing between a database management system and a front-end application
    • G06F16/284: Relational databases
    • G06F16/285: Clustering or classification
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/27: Regression, e.g. linear or logistic regression
    • G06N3/044: Recurrent networks, e.g. Hopfield networks
    • G06N3/0455: Auto-encoder networks; encoder-decoder networks
    • G06N3/08: Learning methods
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention relates to a database natural language query method based on an improved RetNet, and belongs to the technical field of computer databases. The invention uses the associations between the sub-sentences of a divided query sentence to split the NL2SQL task into a plurality of slot-value-filling subtasks. According to the characteristics of the slot-value filling task, it is further divided into two sub-problems, greatly reducing the workload of the NL2SQL problem. The improved RetNet is used as the natural language processing depth model: the depth feature sequence is represented as a one-dimensional vector, and an RNN-like parallel cyclic update of the self-attention mechanism is realized, reducing the computational complexity. By introducing a multi-scale residual fusion mechanism, the fusion of semantic features of different depths is effectively improved. The method can effectively improve the generation accuracy of SQL statements and the operational efficiency of the database.

Description

Database natural language query method based on improved RetNet
Technical Field
The invention relates to a database natural language query method based on a fine-tuned, improved RetNet large language model, and belongs to the technical field of computer databases.
Background
With the rapid development of internet information technology, mass data is generated in daily life, most of which are stored in structured or semi-structured relational databases for centralized management and utilization.
Currently, statistical analysis and application of data stored in databases are mainly implemented through programming languages such as Structured Query Language (SQL), which requires users to understand the underlying database being used. Even simple manipulation of database data is quite difficult for users without a computer science background. Therefore, how to realize human-machine interaction with the database through natural language has become an important problem to be solved.
In the field of natural language processing (Natural Language Processing, NLP), research on converting natural language to SQL statements (Natural Language to SQL, NL2SQL) has become a hotspot, as it can effectively convert human language into a computer-executable database query language. From a purely technical perspective, NL2SQL automatically converts a user's natural language into a computer-understandable, executable database language; it is one of the important subtasks of Semantic Parsing in the natural language processing field and efficiently builds a bridge between unstructured natural language and the structured database language. With the vigorous development of artificial intelligence technology, the demand for human-machine interaction in natural language has grown rapidly, while frontier research and technology in natural language processing continue to evolve; these factors provide good preconditions for NL2SQL research.
Database natural language interface (Natural Language Interface to Database, NLIDB) technology is a fusion of multiple fields such as database systems, deep learning, and human-computer interaction. Facing the user, it converts the user's natural language query into SQL through a deep learning network model, so that the database accurately identifies the user's query intention. NL2SQL performs excellently: it can effectively help people operate databases simply and conveniently, greatly lowers the usage threshold of database systems, and lets even non-professional users complete data retrieval and data analysis tasks simply and quickly.
However, existing NL2SQL methods use NLP models based on the Transformer structure. Due to the internal structure and depth of the model, they are difficult to parallelize at run time and have low efficiency, and the extracted semantic features lack a wide receptive field, so the database query interface cannot correctly understand the user's query intention; SQL containing erroneous information is therefore generated, reducing query efficiency. In addition, such NL2SQL methods require long training times and large amounts of data, and cannot be directly applied to the query demands of many professional fields.
Disclosure of Invention
The invention aims at overcoming the defects and shortcomings of the prior art and creatively provides a database natural language query method based on improved RetNet. The method has the capability of learning the high-dimensional semantic information converted from natural query languages with various structures to SQL, can effectively improve the generation accuracy of SQL sentences and effectively improve the operation efficiency of a database.
The innovation points of the method of the invention include: using the associations between the sub-sentences of the divided query statement, the NL2SQL task is split into a plurality of slot-value-filling subtasks. According to the characteristics of the slot-value filling task, it is divided into two sub-problems, greatly reducing the workload of the NL2SQL problem. The improved RetNet (Retentive Network) is used as the natural language processing depth model: the depth feature sequence is represented as a one-dimensional vector, and an RNN-like parallel cyclic update of the self-attention mechanism is realized, reducing the per-step computational complexity to O(1). By introducing a multi-scale residual fusion mechanism, the fusion of semantic features of different depths is effectively improved.
The invention is realized by adopting the following technical scheme.
A database natural language query method based on improved RetNet comprises the following steps:
step 1: the natural language is pre-process encoded using a sequence encoder.
An empty column (Empty Column) is added to each data table, and the segment code is replaced with a type code.
In the encoding process, four kinds of information are learned, i.e., the natural language Question (Question), the categorical Column (Categorical Column), the numerical Column (Numerical Column), and the Empty Column (Empty Column). This lets the subsequent model learn these data at the same time, so that the model can fully learn multidimensional language features.
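As an illustration of this step-1 encoding scheme, the following is a minimal Python sketch of how such an input could be assembled. The token conventions ([CLS], [SEP], [EMPTY]), the type-ID values, and the helper name are assumptions for illustration only, not the patent's verbatim implementation.

```python
# Type codes replace BERT-style segment codes:
# 0 = question token, 1 = categorical column, 2 = numerical column, 3 = empty column.
QUESTION, CAT_COL, NUM_COL, EMPTY_COL = 0, 1, 2, 3

def build_encoder_input(question_tokens, columns):
    """columns: list of (column_name_tokens, "categorical" | "numerical")."""
    tokens, type_ids = ["[CLS]"], [QUESTION]
    tokens += question_tokens + ["[SEP]"]
    type_ids += [QUESTION] * (len(question_tokens) + 1)
    for col_tokens, kind in columns:
        tid = CAT_COL if kind == "categorical" else NUM_COL
        tokens += col_tokens + ["[SEP]"]
        type_ids += [tid] * (len(col_tokens) + 1)
    # The appended empty column gives "no column selected" an explicit slot.
    tokens += ["[EMPTY]", "[SEP]"]
    type_ids += [EMPTY_COL, EMPTY_COL]
    return tokens, type_ids
```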
Step 2: The data processed in step 1 is tokenized and embedding-encoded using a text-sequence block (CLIP sequence) encoder.
Specifically, the encoded vectors of the four information are input to a semantic enhancement mode encoder based on an improved RetNet self-attention model for enhancing the vectors derived from the text-sequence block encoder and obtaining depth features with fused different depth semantic information.
The RetNet re-encodes the input sequence autoregressively in the hidden dimension, which reduces the computational complexity; the depth features of the context vector are computed, and the output feature vector obtained after semantic enhancement can serve as the input of the classification model.
The positional relations and word senses of the words are represented in the form of expanded vectors; the expanded parameterized vector effectively represents the distribution of a word in the word-sense feature space of the sentence, so that different word senses and words at different positions carry distinguishable features.
Step 3: The depth features are input into a decoder to obtain a bottom-layer encoding shared by multiple subtasks; it contains the information needed to extract condition values from the content vectors of the database table, and global information is used to eliminate mismatches between the targets of interest of different tasks.
After the fully-connected layer is used as the decoder, the parameters of the feature vector serve as low-dimensional inputs to the different sub-classification tasks, and the SQL result is partially predicted in parallel, further reducing computational complexity and improving computational efficiency.
Step 4: By completing 8 predefined classification subtasks, the bottom-layer encoding is converted into content predictions for the 8 subtasks; each generates a corresponding multidimensional vector representing the probability distribution of the values taken by that task, i.e., a mapping of the bottom-layer encoding vector onto the task. The different sub-classification tasks are carried out at the final stage of the model, so global semantic information can be aggregated to the greatest extent before classification and the features of interest of all subtasks are aligned, giving the output higher reliability.
Specifically, the sub-classification tasks are set as 8 tasks concerning the SQL statement structure, including:
PRT task: used for predicting whether a reused database column name will appear in the SQL statement;
S-NUM task: used for predicting the number of database columns selected by the SELECT statement in the SQL statement;
S-COL task: used for generating the selected database column names in the SELECT statement; its input is set to the prediction result of the S-NUM task;
AGG task: used for predicting the aggregation operation applied to column names in the SQL statement;
W-NUM task: used for predicting the total number of selected column names in the WHERE statement;
COND-OP task: used for predicting the relation (e.g., AND/OR) between the multiple conditions of an SQL statement;
W-COL task: used for predicting the probability that each database column name matches the column names in the natural language query question; its input is the prediction result of W-NUM;
W-COL-OP task: used for predicting the operator between a column selected in the query condition statement (WHERE statement) and its corresponding condition value.
Step 5: The values corresponding to the multidimensional vectors generated by the 8 subtasks are aggregated according to a predefined structure to generate the final SQL statement.
Using 8 subtasks to predict the SQL statement according to human prior knowledge, the monolithic generation mode is decomposed into a task tree with logical causal relations, so that classification predictions can be made partly in parallel and then combined into a comprehensive and accurate SQL prediction result, as sketched below.
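The following is a minimal sketch of how the 8 slot predictions could be spliced into an SQL string. The layout of the `preds` dictionary, the operator vocabularies, and the `table` object are assumptions for illustration, not the patent's verbatim implementation.

```python
def assemble_sql(preds, table):
    """Splice subtask predictions into one SQL statement (illustrative only)."""
    agg_ops = ["", "AVG", "MAX", "MIN", "COUNT", "SUM"]   # assumed AGG vocabulary
    cond_ops = [">", "=", "!=", "<=", ">="]               # W-COL-OP prediction set
    select_parts = []
    for col, agg in zip(preds["s_col"], preds["agg"]):    # S-COL + AGG
        name = table.columns[col]
        select_parts.append(f"{agg_ops[agg]}({name})" if agg_ops[agg] else name)
    sql = f"SELECT {', '.join(select_parts)} FROM {table.name}"
    conds = [f"{table.columns[c]} {cond_ops[op]} {val!r}" # W-COL + W-COL-OP + value
             for c, op, val in preds["where"]]
    if conds:
        sql += " WHERE " + f" {preds['cond_op']} ".join(conds)  # COND-OP: AND / OR
    return sql
```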
Preferably, the invention introduces the retention self-attention mechanism of RetNet and characterizes the Q, K, V matrices in parallel to obtain the output of the self-attention module, so that the compute cores of large-scale GPU chips can be used efficiently and a computation speed higher than that of the Transformer structure is obtained. Model training and inference are thus greatly accelerated, an effective, accurate and functional SQL statement is finally generated, and the user experience is improved.
Preferably, in the improved RetNet module, the invention divides the mapped long-sequence features of the query statement into a number of small blocks that are characterized in parallel, and adopts a recurrent characterization technique across blocks, so that in-block information is retained to the maximum extent while cross-block information is fused. This effectively reduces the compute and physical-memory requirements of long-sequence training and inference, and deeper semantic feature capability is obtained by fitting the autoregressive decoding method.
Preferably, the invention uses residual structures in RetNet and the decoder to break the symmetry of the deep network, eliminating the instability and linear dependence caused by zero singular points of the input and the network weights, and alleviating the deep-network degradation caused by normalization; this finally increases the effective information content of the network model and improves the generalization ability of the network through phase transformation.
Preferably, the invention fine-tunes, with the query data, a large language model pretrained on Chinese and English data during training. This saves the time and resources of training from scratch, quickly adapts to the distribution of the query data, supports Chinese and English queries simultaneously, mitigates data scarcity, offers high generalization, and yields a model with deeper comprehension.
Preferably, the invention divides the SQL statement generation process into 8 subtasks using human prior knowledge. PRT, S-NUM, S-COL, AGG, W-NUM, COND-OP, W-COL and W-COL-OP can represent the differentiated characteristics of the different fields, effectively improving the prediction accuracy of the SQL fields.
Advantageous effects
Compared with the prior art, the invention has the following advantages:
1. The invention introduces the RetNet parallel self-attention mechanism and a multi-scale residual structure to build a deep-learning SQL generation model, realizing a deep-learning natural-language-to-SQL query module, so that the structured query language generated from the depth features is more accurate while the amount of computation is significantly reduced.
2. By organically combining the modules, the system can fully exert the advantages of each module, thereby improving the overall performance. The CLIP encoder provides strong semantic representation capability, the RetNet self-attention module captures important information in the sequence, and the post-processing module further processes and classifies the depth feature vector, so that the system has higher accuracy and efficiency in a database query interface. The comprehensive application enables the system to process and understand text data from different angles, and improves the performance of the system in a database query interface.
3. The invention uses the Chinese-English bilingual large language model fine tuning mechanism, so that the system can be excellent in query interfaces in different fields. The pre-trained large language model is pre-trained through a large-scale general corpus and has strong language understanding and expression capability. After fine tuning, the model can adapt to the technical terms and the contexts of different fields, so that the system can provide accurate and efficient query service in various fields, and has quick deployment capability and low-cost iterative optimization capability.
Drawings
FIG. 1 is a flow chart of a database natural language query method of the present invention;
FIG. 2 is a general framework diagram of a database natural language query method of the present invention;
fig. 3 is a schematic diagram of an improved RetNet feature enhancement flow.
Fig. 4 is a schematic diagram of a decoder and sub-classification framework.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
A database natural language query method based on improved RetNet introduces the RetNet self-attention technique and a multi-scale residual structure into a deep learning model, innovatively uses a system of eight sub-classification tasks based on human priors, and generates SQL after learning the depth features of natural language, forming a Chinese-English bilingual database natural language query interface model for the querier. The overall flow is shown in Fig. 1.
As shown in fig. 2, which is a general frame diagram of the database natural language query method according to the present invention, the steps include:
firstly, in the preprocessing stage before data encoding, dividing a natural language Question into four kinds of information in data table information, namely a natural language Question, a range Column Categorial Column, a digital Column and an Empty Column, so that a subsequent module can learn the data simultaneously.
And then, the four divided kinds of information are automatically segmented into words through CLIP-layer encoding and converted into joint feature vectors carrying positional relations and word senses.
Then, a semantic enhancement mode encoder based on an improved RetNet self-attention model is input, and depth characteristics with fused different depth semantic information are obtained.
The depth features are then decoded by the decoder. The decoded features are input into the 8 sub-classification tasks, and the values corresponding to the generated multidimensional vectors are aggregated according to the predefined structure to generate the final SQL statement.
Wherein, the improved RetNet self-attention model, as shown in figure 3, utilizes the semantic feature enhancement module to complete feature extraction and feature enhancement, forming a depth feature vector.
In the invention, feature extraction and enhancement are performed on the word vectors obtained after word segmentation and expansion in the database query. Considering the word-vector feature-extraction step in the Transformer-based self-attention mechanism, the invention provides an improved RetNet self-attention feature-extraction method, completed by the improved RetNet feature-extraction module. Word vectors carry rich context information: word-vector generation methods generally use a self-attention mechanism, which takes the influence of context into account when generating word vectors, so the generated word vectors depend not only on the surface form of a single word but also contain contextual information, improving their expressive power. This module follows the mainstream processing framework in the natural language processing field and is expected to extract the feature representations corresponding to SQL from the structural information of the natural query language.
The word vector feature enhancement method based on the improved RetNet comprises the following specific steps:

Step A: Initialization. Let the natural language query word vectors produced by the text-sequence block encoder be $X \in \mathbb{R}^{|x| \times d}$, with network-parameter learning step size $\eta_\theta$ and hyper-parameter $\gamma$, where $x$ denotes the input sequence block, $\mathbb{R}^{|x| \times d}$ the sequence-vector space, and $d$ the dimension of the hidden-layer vector space.

The initial input is $X^0 = X$. Each layer $\mathrm{RetNet}_l$ takes the multi-scale residual input and produces the output:

$X^l = \mathrm{RetNet}_l(X^{l-1}), \quad l \in \{1, \dots, L\}$

where $l$ indexes the layers $\mathrm{RetNet}_l$.

For the RetNet layer:

The input is projected to a one-dimensional value feature vector $v_l = X_l\,\omega_V$, where $\omega_V \in \mathbb{R}^{d}$ is a learnable linear-layer weight and the subscript $V$ indicates the parameter that produces the value matrix.

Learnable linear transformations likewise produce the key matrix $K$ and the query matrix $Q$, from which the state vector $s_l$ is obtained recurrently:

$s_l = A\,s_{l-1} + K_l^{\top} v_l, \qquad o_l = Q_l\,s_l$

where $A$ is a transition matrix with diagonalized hyper-parameters. Unrolling the recurrence gives:

$o_l = \sum_{m=1}^{l} Q_l\,A^{\,l-m} K_m^{\top} v_m$

where $m$ is the intermediate variable of the summation. Diagonalizing $A$ as $\gamma e^{i\theta}$, $o_l$ is further simplified to:

$o_l = \sum_{m=1}^{l} \big(Q_l (\gamma e^{i\theta})^{l}\big)\big(K_m (\gamma e^{i\theta})^{-m}\big)^{\dagger} v_m$

where $Q_l(\gamma e^{i\theta})^{l}$ and $K_m(\gamma e^{i\theta})^{-m}$ play the role of the relative position encoding proposed for Transformer structures. Here $\gamma$ is a predefined $d$-dimensional vector, and $e^{i\theta}$ groups the elements of the $d$-dimensional $\theta$ vector pairwise as the real and imaginary parts of complex numbers, yielding a diagonal matrix with entries $\gamma e^{i\theta}$; the real vector is converted to a complex vector for the multiplication, and the result is converted back to a real vector.
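Before moving to Step B, this complex-pair parameterization can be made concrete with a small NumPy sketch. The frequency schedule for $\theta$ and the value of $\gamma$ below are assumptions for illustration; the sketch only shows how a real vector is grouped into complex pairs, multiplied by $(\gamma e^{i\theta})^{pos}$, and mapped back to a real vector.

```python
import numpy as np

def xpos_like(x, pos, gamma=0.97):
    """Apply (gamma * e^{i*theta})^pos to a real vector x of even dimension."""
    x = np.asarray(x, dtype=float)
    d = x.shape[0] // 2
    theta = 1.0 / (10000 ** (np.arange(d) / d))   # assumed frequency schedule
    z = x[0::2] + 1j * x[1::2]                    # group dims as complex pairs
    z = z * (gamma * np.exp(1j * theta)) ** pos   # decay-and-rotate by position
    out = np.empty_like(x)
    out[0::2], out[1::2] = z.real, z.imag         # back to a real vector
    return out
```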
Step B: Simplifying $\gamma$ to a scalar yields the highly parallelizable form

$\mathrm{Retention}(X) = (Q K^{\top} \odot D)\,V, \qquad D_{nm} = \gamma^{\,n-m} \text{ for } n \ge m, \text{ otherwise } 0.$

Expressed in flow (chunkwise) form:

$\mathrm{Retention}(X_{[i]}) = (Q_{[i]} K_{[i]}^{\top} \odot \gamma)\,V_{[i]} + (Q_{[i]} R_{i}) \odot \xi$

where $X_{[i]}$ denotes the $i$-th block input, $Q_{[i]}$, $K_{[i]}$ and $V_{[i]}$ the query, key and value matrices of the $i$-th block, $R_i$ the cross-block key-value state of the $i$-th block, $\odot$ element-wise multiplication, and $\xi$ a predefined hyper-parameter vector governing the cyclic reasoning across multi-head blocks, with $\xi = \gamma^{\,i+1}$ and $R_i = K_{[i]}^{\top} V_{[i]} + \gamma R_{i-1}$.

The multi-head block loop is expressed as follows.

Loop execution: for $i \in \{1, \dots, L\}$:

$\mathrm{head}_i = \mathrm{Retention}(X_{[i]})$

$Y = \mathrm{GroupNorm}\big(\mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_i)\big)$

$\mathrm{MSR}(X) = \big(\mathrm{swish}(X\,\omega_G) \odot Y\big)\,\omega_O$

$X^{l+1} = \mathrm{RetNet}_l(X^{l}) = \big(X^{l} + \mathrm{MSR}(X)\big) + \mathrm{gelu}\big((X^{l} + \mathrm{MSR}(X))\,\omega_1\big)\,\omega_2$

where $\omega_1$ and $\omega_2$ are learnable weights; $\mathrm{head}_i$ is the $i$-th attention head; $Y$ is the group-regularized multi-head feature; $\mathrm{GroupNorm}$ applies group regularization over the attention heads; $\mathrm{MSR}$ denotes the multi-scale retention activation-and-mapping operation; $\mathrm{swish}$ is the activation function that produces the gating threshold; the subscripts $G$ and $O$ denote the gate layer and the block output layer; $\omega_G$, $\omega_O$, $\omega_1$, $\omega_2$ are distinct learnable mapping matrices; $X^{l}$ is the input of layer $l$; and $\mathrm{gelu}$ is the GELU activation function.
After the loop is completed, a depth feature vector is formed, and the depth feature vector enters the decoder.
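For concreteness, the following is a minimal NumPy sketch of the scalar-$\gamma$ retention described above, checking that the parallel form and the chunk-recurrent (flow) form produce the same result. Function names and the chunk size are illustrative assumptions, not the patent's code.

```python
import numpy as np

def parallel_retention(Q, K, V, gamma):
    """Parallel form: Retention(X) = (Q K^T * D) V, D[n, m] = gamma^(n-m) for n >= m."""
    n = Q.shape[0]
    idx = np.arange(n)
    D = np.where(idx[:, None] >= idx[None, :],
                 gamma ** (idx[:, None] - idx[None, :]), 0.0)
    return (Q @ K.T * D) @ V

def chunkwise_retention(Q, K, V, gamma, chunk=4):
    """Chunk-recurrent form: intra-chunk in parallel, cross-chunk via state R."""
    n, d = K.shape
    out = np.zeros_like(V)
    R = np.zeros((d, V.shape[1]))                        # cross-chunk key-value state
    for s in range(0, n, chunk):
        q, k, v = Q[s:s+chunk], K[s:s+chunk], V[s:s+chunk]
        b = q.shape[0]
        idx = np.arange(b)
        D = np.where(idx[:, None] >= idx[None, :],
                     gamma ** (idx[:, None] - idx[None, :]), 0.0)
        inner = (q @ k.T * D) @ v                        # within-chunk retention
        cross = (q @ R) * (gamma ** (idx + 1))[:, None]  # decayed past state
        out[s:s+chunk] = inner + cross
        decay = (gamma ** (b - 1 - idx))[:, None]        # decay to the chunk end
        R = gamma ** b * R + k.T @ (v * decay)           # update the state
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(8, 4)) for _ in range(3))
assert np.allclose(parallel_retention(Q, K, V, 0.9),
                   chunkwise_retention(Q, K, V, 0.9, chunk=4))
```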
SQL structure classification prediction module:
the decoder and sub-classification module, as shown in fig. 4, establishes a corresponding sub-task model according to the feature vectors of each sub-task, takes the depth feature vectors subjected to feature extraction and reinforcement as input to form the feature vectors used in the corresponding sub-task processing, and combines the decoder and the PRT, S-NUM, S-COL, AGG, W-NUM, COND-OP, W-COL and W-COL-OP classifiers into SQL structure classification probability prediction output. The model uses partial parallel and partial serial human priori classification logic to realize an algorithm which is more suitable for professional SQL construction.
As shown in Fig. 4, the main steps, based on an asynchronous advantage actor-critic style model, include:
The first step: define $P_c(U) = \mathrm{softmax}\big(W \tanh(U)\big)$, where $W$ denotes a learnable parameter, and compute the first-layer sub-classification modules in parallel:

(1) The prediction set of subtask PRT is $\{0, 1\}$; it predicts whether a reused database column name will appear in the SQL statement.

Using the sequence information vector $q_{cls}$ as input, the binary classification probability of the PRT task is denoted $p_1 = \mathrm{sigmoid}(W_1 q_{cls})$, where $W_1$ is the model's 1st learnable mapping matrix and the sigmoid function is the activation function of the classification output layer, $\mathrm{sigmoid}(z) = \frac{1}{1 + e^{-z}}$, with $z$ the input matrix of the layer and $e^{-z}$ the element-wise exponential with base $e$.

(2) Subtask S-NUM predicts the number of database column names selected by the SELECT clause in the SQL statement. After statistical analysis of the data set used, the prediction set of S-NUM is set to $\{0, 1, 2, 3\}$; the S-NUM classification probability distribution is computed as $p_2 = P_c(W_2 q_{cls})$, where $P_c$ is the calculation defined in the first step and $W_2$ the model's 2nd learnable mapping matrix.

(3) W-NUM predicts the total number of column names selected in the WHERE clause. By analogy with S-NUM, the prediction set of the W-NUM task is also $\{0, 1, 2, 3\}$; the classification probability distribution of W-NUM is computed as $p_3 = P_c(W_3 q_{cls})$, where $W_3$ is the model's 3rd learnable mapping matrix.

(4) An SQL statement may contain multiple conditions; COND-OP predicts the relations between the different conditions in this case. The prediction set is $\{\mathrm{None}, \mathrm{AND}, \mathrm{OR}\}$; the classification probability distribution of COND-OP is computed as $p_4 = P_c(W_4 q_{cls})$, where $W_4$ is the model's 4th learnable mapping matrix.

(5) After the WHERE clause of the SQL statement selects a column, the operator between that column and the corresponding condition value also needs to be predicted. The function of W-COL-OP is to predict this operator, whose prediction range is any operator in $\{>, =, \,!=, <=, >=\}$. The classification probability distribution of W-COL-OP is denoted $p_5(h_i)$, the output probability for the selected database column named $h_i$ in the database table, computed as $p_5(h_i) = P_c(W_5 H_v + W_6 e_i)$, where $W_5$ is the model's 5th learnable mapping matrix, $H_v$ the mean-pooled result of the value vector, $W_6$ the model's 6th learnable mapping matrix, and $e_i$ the encoding vector of the $i$-th token in the SQL language sequence.

The second step: on the prediction results of the first step, perform the second group of sub-classification predictions.

(1) S-COL is the database column name selected when generating the SELECT clause and can be predicted on the basis of S-NUM; S-COL is predicted on top of the S-NUM result. Because S-COL predicts database column names, its prediction set is the union of all column names in the database. The classification probability distribution of S-COL is denoted $p_6$ and computed as $p_6 = P_c(W_7 H_v + W_8 H_h)$, where $W_7$ and $W_8$ are the model's 7th and 8th learnable mapping matrices and $H_h$ is the overall encoding sequence of the task input.

(2) After S-COL predicts the database column names selected by the SQL statement, AGG predicts the aggregation operation applied to those column names, with classification probability distribution $p_7(h_i) = P_c(W_9 H_v + W_{10} e_i)$, the output probability for the selected database column named $h_i$ in the database table, where $W_9$ and $W_{10}$ are the model's 9th and 10th learnable mapping matrices.

(3) The prediction of W-COL is performed on the basis of the W-NUM prediction; its prediction set is the same as S-COL's, namely all column names in the database table, matched against the column names in the natural language query question. The W-COL classification probability is $p_8 = P_c(W_{11} H_v + W_{12} H_h)$, where $W_{11}$ and $W_{12}$ are the model's 11th and 12th learnable mapping matrices.

The third step: establish the relation between the SQL statement and all W-COL prediction results, look up the SQL vocabulary with the combined prediction results, and splice the slot values of the predefined vocabulary into a complete SQL statement for output.

Loss function: the joint error of the 8 sub-classification tasks after one round of training is set as the loss, a scalar function. Because each task is a classification problem, the loss function is chosen as the cross-entropy function, and the total loss is:

$\mathrm{loss} = \sum_{k=1}^{8} \mathrm{Loss}\big(p_k(x), y\big), \qquad \mathrm{Loss}(p, y) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{K} y_{ij} \log p(x_{ij})$

where $p_k(\cdot)$ denotes the calculation process of the $k$-th subtask; $\mathrm{Loss}(\cdot)$ denotes using cross entropy as the loss function; $N$ is the number of samples in the subtask; $x$ is the subtask input and $y$ the corresponding true value; $k$ is the summation index over the 8 subtasks and $K$ the number of classes of the subtask; $x_{ij}$ is the $i$-th sample of the $j$-th class label; $y_{ij} \in \{0, 1\}$ indicates whether the $i$-th sample truly carries the $j$-th label; and $p(x_{ij})$ is the predicted probability value.
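To tie the shared decoder, the 8 classification heads, and the joint cross-entropy loss together, here is a hedged PyTorch sketch. The hidden size, the class counts, and the use of a single $q_{cls}$ vector per task are simplifying assumptions; per the text, the column-level tasks (S-COL, AGG, W-COL, W-COL-OP) also consume per-column encodings ($H_v$, $H_h$, $e_i$).

```python
import torch
import torch.nn as nn

class SQLStructureClassifier(nn.Module):
    def __init__(self, d_model=512, n_cols=64):
        super().__init__()
        self.decoder = nn.Linear(d_model, d_model)   # shared bottom-layer encoding
        self.heads = nn.ModuleDict({
            "PRT":      nn.Linear(d_model, 2),       # reused column name? {0, 1}
            "S_NUM":    nn.Linear(d_model, 4),       # number of SELECT columns {0..3}
            "S_COL":    nn.Linear(d_model, n_cols),  # which SELECT column
            "AGG":      nn.Linear(d_model, 6),       # aggregation op (size assumed)
            "W_NUM":    nn.Linear(d_model, 4),       # number of WHERE columns {0..3}
            "COND_OP":  nn.Linear(d_model, 3),       # None / AND / OR
            "W_COL":    nn.Linear(d_model, n_cols),  # which WHERE column
            "W_COL_OP": nn.Linear(d_model, 5),       # >, =, !=, <=, >=
        })

    def forward(self, q_cls):
        # P_c(U) = softmax(W tanh(U)); the softmax is folded into the loss below.
        u = torch.tanh(self.decoder(q_cls))
        return {name: head(u) for name, head in self.heads.items()}

def joint_loss(logits, targets):
    """Total loss = sum of the 8 per-task cross entropies."""
    ce = nn.CrossEntropyLoss()
    return sum(ce(logits[k], targets[k]) for k in logits)
```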

Claims (4)

1. A database natural language query method based on improved RetNet, comprising the steps of:
step 1: pre-processing encoding natural language using a sequence encoder;
adding an empty column to each data table, and replacing segment codes with type codes; in the encoding process, learning four kinds of information from the question and the data table information, namely the natural language question, the categorical column, the numerical column and the empty column, which are encoded together with the database table;
step 2: using a text-sequence block encoder to perform word segmentation embedded encoding on the data processed in the step 1;
inputting the coded vectors of the four kinds of information into a semantic enhancement mode coder based on an improved RetNet self-attention model, for enhancing the vectors obtained from the text-sequence block coder and obtaining depth characteristics with fused semantic information of different depths;
step 3: inputting the depth features into a decoder to obtain a bottom layer code shared by a plurality of subtasks, wherein the bottom layer code comprises information of extracting condition values from the content vectors of a database table, and using global information to eliminate mismatching of interested targets of different tasks;
after using the full connection layer as a decoder, taking parameters of the feature vector as input of different sub-classification tasks, and partially predicting SQL results in parallel;
step 4: converting the bottom coding into content prediction of 8 subtasks by completing 8 predefined classifying subtasks, respectively generating corresponding multidimensional vectors, and representing probability distribution of the task to obtain the value, namely mapping of the bottom coding vectors on the task;
setting the sub-classification tasks as 8 tasks concerning the SQL statement structure, including:
PRT task: used for predicting whether a reused database column name will appear in the SQL statement;
S-NUM task: used for predicting the number of database columns selected by the SELECT statement in the SQL statement;
S-COL task: used for generating the selected database column names in the SELECT statement, with its input set to the prediction result of the S-NUM task;
AGG task: used for predicting the aggregation operation applied to column names in the SQL statement;
W-NUM task: used for predicting the total number of selected column names in the WHERE statement;
COND-OP task: used for predicting the relation between the multiple conditions of an SQL statement;
W-COL task: used for predicting the probability that each database column name matches the column names in the natural language query question, with its input set to the prediction result of W-NUM;
W-COL-OP task: used for predicting the operator between a column selected in the query condition statement and its corresponding condition value;
step 5: and aggregating the values corresponding to the multidimensional vectors respectively generated by the 8 subtasks into SQL sentences according to a predefined structure, and generating a final SQL sentence.
2. The database natural language query method based on improved RetNet according to claim 1, wherein the word-vector feature enhancement method based on the improved RetNet is as follows:

step A: initialization: letting the natural language query word vectors obtained from the text-sequence block encoder be $X \in \mathbb{R}^{|x| \times d}$, with network-parameter learning step size $\eta_\theta$ and hyper-parameter $\gamma$, wherein $x$ represents the input sequence block, $\mathbb{R}^{|x| \times d}$ the sequence-vector space, and $d$ the dimension of the hidden-layer vector space;

the initial input is $X^0 = X$; each layer $\mathrm{RetNet}_l$ takes the multi-scale residual input and outputs:

$X^l = \mathrm{RetNet}_l(X^{l-1}), \quad l \in \{1, \dots, L\}$

wherein $l$ represents the index of each layer $\mathrm{RetNet}_l$;

for the RetNet layer:

projecting the input to a one-dimensional value feature vector $v_l = X_l\,\omega_V$, wherein $\omega_V \in \mathbb{R}^{d}$ is a learnable linear-layer weight and the subscript $V$ indicates the parameter that generates the value matrix;

obtaining the key matrix $K$ and the query matrix $Q$ by learnable linear transformations, and further obtaining the state vector $s_l$:

$s_l = A\,s_{l-1} + K_l^{\top} v_l, \qquad o_l = Q_l\,s_l$

wherein $A$ is a transition matrix with diagonalized hyper-parameters; unrolling the recurrence gives:

$o_l = \sum_{m=1}^{l} Q_l\,A^{\,l-m} K_m^{\top} v_m$

wherein $m$ represents the intermediate variable of the summation operation; diagonalizing $A$ as $\gamma e^{i\theta}$, $o_l$ is further simplified into:

$o_l = \sum_{m=1}^{l} \big(Q_l (\gamma e^{i\theta})^{l}\big)\big(K_m (\gamma e^{i\theta})^{-m}\big)^{\dagger} v_m$

wherein $Q_l(\gamma e^{i\theta})^{l}$ and $K_m(\gamma e^{i\theta})^{-m}$ serve as the relative position encoding provided for the Transformer structure; $\gamma$ represents a predefined $d$-dimensional vector, and $e^{i\theta}$ groups the elements of the $d$-dimensional $\theta$ vector pairwise as the real and imaginary parts of complex numbers, yielding a diagonal matrix with entries $\gamma e^{i\theta}$; the real vector is converted into a complex vector for the multiplication, and the result is converted back into a real vector;

step B: simplifying $\gamma$ to a scalar yields the highly parallelizable form

$\mathrm{Retention}(X) = (Q K^{\top} \odot D)\,V, \qquad D_{nm} = \gamma^{\,n-m} \text{ for } n \ge m, \text{ otherwise } 0;$

expressed in flow (chunkwise) form: $\mathrm{Retention}(X_{[i]}) = (Q_{[i]} K_{[i]}^{\top} \odot \gamma)\,V_{[i]} + (Q_{[i]} R_{i}) \odot \xi$, wherein $X_{[i]}$ represents the $i$-th block input, $Q_{[i]}$ the query matrix of the $i$-th block, $K_{[i]}$ the key matrix of the $i$-th block, $V_{[i]}$ the value matrix of the $i$-th block, $R_i$ the cross-block key-value state of the $i$-th block, $\odot$ element-wise multiplication, and $\xi$ a predefined hyper-parameter vector governing the cyclic reasoning across multi-head blocks, with $\xi = \gamma^{\,i+1}$ and $R_i = K_{[i]}^{\top} V_{[i]} + \gamma R_{i-1}$;

wherein the multi-head block loop is expressed as follows:

loop execution: for $i \in \{1, \dots, L\}$:

$\mathrm{head}_i = \mathrm{Retention}(X_{[i]})$

$Y = \mathrm{GroupNorm}\big(\mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_i)\big)$

$\mathrm{MSR}(X) = \big(\mathrm{swish}(X\,\omega_G) \odot Y\big)\,\omega_O$

$X^{l+1} = \mathrm{RetNet}_l(X^{l}) = \big(X^{l} + \mathrm{MSR}(X)\big) + \mathrm{gelu}\big((X^{l} + \mathrm{MSR}(X))\,\omega_1\big)\,\omega_2$

wherein $\omega_1$ and $\omega_2$ are learnable weights; $\mathrm{head}_i$ represents the $i$-th attention head; $Y$ represents the group-regularized multi-head feature; $\mathrm{GroupNorm}$ represents the group regularization operation over the attention heads; $\mathrm{MSR}$ represents the activation-and-mapping operation; $\mathrm{swish}$ represents the activation function that generates the gating threshold; $G$ denotes the gate layer and $O$ the block output layer; $\omega_G$, $\omega_O$, $\omega_1$, $\omega_2$ respectively represent different learnable mapping matrices; $X^{l}$ represents the input of layer $l$; and $\mathrm{gelu}$ represents the GELU activation function;

after the loop is completed, a depth feature vector is formed and enters the decoder.
3. The improved RetNet based database natural language query method of claim 1, wherein:
establishing a corresponding subtask model according to the feature vectors of each subtask, taking the depth feature vectors subjected to feature extraction and reinforcement as input to form feature vectors used in corresponding subtask processing, and combining a decoder and PRT, S-NUM, S-COL, AGG, W-NUM, COND-OP, W-COL and W-COL-OP classifiers into SQL structure classification probability prediction output;
based on an asynchronous advantage actor-critic style model, comprising:
the first step: defining $P_c(U) = \mathrm{softmax}\big(W \tanh(U)\big)$, wherein $W$ represents a learnable parameter, and computing the first-layer sub-classification modules in parallel:

(1) the prediction set of the subtask PRT is $\{0, 1\}$, used for predicting whether a reused database column name appears in the SQL language;

using the sequence information vector $q_{cls}$ as input, the binary classification probability of the PRT task is denoted $p_1 = \mathrm{sigmoid}(W_1 q_{cls})$, wherein $W_1$ represents the model's 1st learnable mapping matrix; the sigmoid function is the activation function of the classification model output layer, $\mathrm{sigmoid}(z) = \frac{1}{1 + e^{-z}}$, wherein $z$ represents the input matrix of the layer and $e^{-z}$ the element-wise exponential with base $e$;

(2) the function of subtask S-NUM is to predict the number of database column names selected by the SELECT language in the SQL language; after statistical analysis of the data set used, the prediction set of S-NUM is set to $\{0, 1, 2, 3\}$, and the S-NUM classification probability distribution is computed as $p_2 = P_c(W_2 q_{cls})$, wherein $P_c$ represents the calculation method described in the first step and $W_2$ the model's 2nd learnable mapping matrix;

(3) W-NUM is used for predicting the total number of column names selected in the WHERE language; by analogy with S-NUM, the prediction set of the W-NUM task is also $\{0, 1, 2, 3\}$, and the classification probability distribution of W-NUM is computed as $p_3 = P_c(W_3 q_{cls})$, wherein $W_3$ represents the model's 3rd learnable mapping matrix;

(4) there may be multiple conditions in the SQL statement; COND-OP is used to predict the relations between the different conditions in this case, with prediction set $\{\mathrm{None}, \mathrm{AND}, \mathrm{OR}\}$, and the classification probability distribution of COND-OP is computed as $p_4 = P_c(W_4 q_{cls})$, wherein $W_4$ represents the model's 4th learnable mapping matrix;

(5) after the WHERE statement in the SQL statement selects a column, the operator between that column and the corresponding condition value is also predicted; the function of W-COL-OP is to predict this operator, whose prediction range is any operator in $\{>, =, \,!=, <=, >=\}$; the classification probability distribution of W-COL-OP is denoted $p_5(h_i)$, the output probability for the selected database column named $h_i$ in the database table, computed as $p_5(h_i) = P_c(W_5 H_v + W_6 e_i)$, wherein $W_5$ represents the model's 5th learnable mapping matrix, $H_v$ the mean-pooled result of the value vector, $W_6$ the model's 6th learnable mapping matrix, and $e_i$ the encoding vector of the $i$-th token in the SQL language sequence;

the second step: on the prediction results of the first step, performing the second group of sub-classification predictions;

(1) S-COL is the database column name selected in generating the SELECT statement and can be predicted on an S-NUM basis, S-COL being predicted on top of the S-NUM result; because S-COL predicts database column names, its prediction set is formed by combining all column names in the database; the classification probability distribution of S-COL is denoted $p_6$ and computed as $p_6 = P_c(W_7 H_v + W_8 H_h)$, wherein $W_7$ and $W_8$ are the model's 7th and 8th learnable mapping matrices and $H_h$ represents the overall encoding sequence of the task input;

(2) after S-COL predicts the database column names selected by the SQL statement, AGG predicts the aggregation operation applied to those column names, with classification probability distribution $p_7(h_i) = P_c(W_9 H_v + W_{10} e_i)$, the output probability for the selected database column named $h_i$ in the database table, wherein $W_9$ and $W_{10}$ represent the model's 9th and 10th learnable mapping matrices;

(3) the prediction of W-COL is performed on the basis of the W-NUM prediction; its prediction set is the same as S-COL's, namely all column names in the database table, matched against the column names in the natural language query question; the W-COL classification probability is $p_8 = P_c(W_{11} H_v + W_{12} H_h)$, wherein $W_{11}$ represents the model's 11th learnable mapping matrix and $W_{12}$ the model's 12th learnable mapping matrix;

the third step: establishing the relation between the SQL statement and all W-COL prediction results, looking up the SQL vocabulary with the combined prediction results, and splicing the slot values of the predefined vocabulary into a complete SQL statement for output.
4. The improved RetNet based database natural language query method of claim 3, wherein:
setting the joint error of the 8 sub-classification tasks after one round of training as the loss function, a scalar function; the loss function is chosen as the cross-entropy function, and the total loss is:

$\mathrm{loss} = \sum_{k=1}^{8} \mathrm{Loss}\big(p_k(x), y\big), \qquad \mathrm{Loss}(p, y) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{K} y_{ij} \log p(x_{ij})$

wherein $p_k(\cdot)$ represents the calculation process of the $k$-th subtask; $\mathrm{Loss}(\cdot)$ means using cross entropy as the loss function; $N$ represents the number of samples in the subtask; $x$ represents the input of the subtask and $y$ the corresponding true value; $k$ represents the summation index over the 8 subtasks and $K$ the number of classes of the subtask; $x_{ij}$ represents the $i$-th sample of the $j$-th class label; $y_{ij}$ is a reference value indicating whether the $i$-th sample truly carries the $j$-th label, taking the value 0 or 1; and $p(x_{ij})$ represents the predicted probability value.
CN202311336112.XA 2023-10-16 2023-10-16 Database natural language query method based on improved RetNet Pending CN117349311A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311336112.XA CN117349311A (en) 2023-10-16 2023-10-16 Database natural language query method based on improved RetNet


Publications (1)

Publication Number Publication Date
CN117349311A true CN117349311A (en) 2024-01-05

Family

ID=89355411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311336112.XA Pending CN117349311A (en) 2023-10-16 2023-10-16 Database natural language query method based on improved RetNet

Country Status (1)

Country Link
CN (1) CN117349311A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117609281A (en) * 2024-01-18 2024-02-27 成都四方伟业软件股份有限公司 Text2Sql method, system, electronic equipment and storage medium
CN117609281B (en) * 2024-01-18 2024-04-05 成都四方伟业软件股份有限公司 Text2Sql method, system, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination