CN115203236B - Text-to-SQL generation method based on template retrieval - Google Patents


Info

Publication number
CN115203236B
Authority
CN
China
Prior art keywords
sql
template
column
database
text
Prior art date
Legal status
Active
Application number
CN202210836518.3A
Other languages
Chinese (zh)
Other versions
CN115203236A (en
Inventor
Che Wanxiang
Dou Longxu
Pan Mingyang
Zhao Yanyan
Liu Ting
Current Assignee
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date
Filing date
Publication date
Application filed by Harbin Institute of Technology
Priority to CN202210836518.3A
Publication of CN115203236A
Application granted
Publication of CN115203236B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A text-to-SQL generation method based on template retrieval, relating to the technical field of data processing, addresses the slow decoding of long SQL statements in the prior art. While the parallelism of a non-autoregressive model improves time performance, such a model cannot observe context information of the target sequence during the generation stage. The method compensates for this shortcoming of the non-autoregressive model through template retrieval and repeated iterative generation; for structurally complex, long SQL statements, the decoding speed of this scheme is more than 50% higher than that of traditional methods. The template library of this scheme is extensible, easy to migrate, and fast at generation.

Description

Text-to-SQL generation method based on template retrieval
Technical Field
The invention relates to the technical field of data processing, in particular to a text-to-SQL generation method based on template retrieval.
Background
The text-to-SQL generation task is an important direction in semantic parsing. Its main content is as follows: given a database or table, the system generates an SQL statement consistent with the semantics of the user's description (or question), and then obtains the query result from the database or table. Most research on the text-to-SQL generation task follows an end-to-end generation paradigm, generally falling into the following categories: SQL generation based on fixed templates (e.g., SQLova, M-SQL), SQL generation based on grammars and transition systems (e.g., RATSQL), and SQL generation based on pre-trained models and constrained decoding (e.g., PICARD). However, while existing model architectures achieve relatively good time efficiency and decoding performance for structurally simple SQL statements, they decode slowly for structurally complex, long SQL statements.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the problem of slow decoding of long SQL statements in the prior art, a text-to-SQL generation method based on template retrieval is provided.
The technical scheme adopted by the invention for solving the technical problems is as follows:
the text-to-SQL generation method based on template retrieval comprises the following steps:
step one: acquiring a data set, wherein the data set comprises user questions, databases and SQL statements, and then semantically parsing the SQL statements in the data set to construct an SQL template library;
step two: acquiring the structure of the database, and retrieving from the SQL template library the SQL template most relevant to the user question, according to the user question and the structure of the database;
step three: concatenating the user question, the structure of the database and the SQL template most relevant to the user question to obtain a token sequence, and inputting the token sequence into a pre-trained language model for encoding to obtain an encoding vector for each token;
step four: selecting the encoding vector of the first token, and predicting through a feed-forward neural network the SQL sequence length consistent with the user question semantics;
step five: based on the SQL sequence length, decoding the encoding vectors with a non-autoregressive Transformer to obtain an SQL statement consistent with the user question semantics. A high-level sketch of the whole pipeline follows.
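Read as a whole, the five steps amount to the pipeline sketched below in Python. Every name in the sketch (serialize, retrieve_template, encoder, length_head, iterative_decode, decoder, tokenize) is an assumption introduced for exposition, not an identifier from the patent; the detailed description below sketches each piece.

```python
def text_to_sql(question, schema, template_library):
    # steps two/three input: concatenated question + database structure
    query = serialize(schema.tables, schema.columns, tokenize(question))
    template = retrieve_template(query, template_library)    # step two
    enc = encoder(query + " <TEMPLATE> " + template)         # step three
    length = length_head(enc[0]).argmax()                    # step four: FFN on first token
    return iterative_decode(decoder, enc, length)            # step five
```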
Further, the specific steps of step one are as follows:
replacing the tables, columns, values and ordering modes appearing in the SQL with specific marks, and deleting the on clauses in the SQL to obtain an SQL template, then removing duplicate SQL templates until all SQL statements are processed, obtaining the SQL template library;
replacing the tables, columns, values and ordering modes appearing in the SQL with specific marks is specifically:
replacing table names with [TAB];
replacing column names with [COL];
replacing limit clause values with [NUM] and other values with [VAL];
replacing the ordering mode with [ORD], wherein the ordering mode includes ascending and descending order, namely ASC and DESC.
Further, the retrieval is performed through a template retrieval model, which is based on a two-tower model and is obtained by introducing a loss function for optimization;
the input of the two-tower model comprises a template part and a query part;
the template part is a template in the SQL template library;
the query part is formed by concatenating the user question and the database structure, yielding a query sequence S;
the specific processing steps of the two-tower model are as follows:
the query sequence S and a template from the SQL template library are respectively fed into two independent pre-trained language models for encoding; the encoded results are passed through multi-layer feed-forward neural networks to obtain the encoding of the query and the encoding of the template, respectively; the cosine similarity between the two encodings is then computed, and the template in the SQL template library with the maximum cosine similarity is selected as the SQL template most relevant to the user question.
Further, the query sequence S is expressed as:
S = <TABLE> t_1 | t_2 | … | t_N | <COLUMN> c_1 | c_2 | … | c_M | <QUESTION> q_1 … q_n
wherein t_1 ~ t_N are the table names in the database, c_1 ~ c_M are the column names in the database, and q_1 ~ q_n are the tokens in the question; <TABLE>, <COLUMN> and <QUESTION> are special symbols identifying the table-name, column-name and question segments; N is the number of tables in the database, M is the number of columns in the database, and n is the number of tokens in the question;
the loss function is expressed as:
L = -log p(T+ | S) - Σ_{T-} log(1 - p(T- | S))
wherein S represents the query sequence, T+ and T- represent positive and negative templates, and p represents a conditional probability.
Furthermore, the input of the pre-trained language model in step three is obtained by concatenating the SQL template most relevant to the user question after the query sequence S;
the pre-trained language model in step three is obtained by adding new types of encodings on the basis of the original position encodings of the pre-trained language model;
the new types of encodings include:
table and column position encoding: each table name and column name in the input sequence corresponds to a separate code, numbered from 1, with all other positions marked 0;
table/column identification encoding: table names are represented by 1, column names by 2, and others by 0;
column type encoding: 0 represents other; 1 to 5 represent integer, string, floating-point, date and Boolean types respectively;
database matching encoding: the tables and columns in the database are matched against the tokens in the user question by string matching, wherein a complete match is marked 1, a partial match is marked 2, and other cases are marked 0.
Furthermore, the feed-forward neural network in step four is obtained through training; a cross-entropy loss function is used for optimization during training, and this cross-entropy loss is added to the overall model loss with a weight of 0.1.
In step five, decoding is performed through a segment-copying pointer network, and the SQL statement consistent with the user question semantics is expressed in the form of keywords plus range indexes;
the range index means that the table names, column names and condition values in the SQL statement are represented by the start and end position indexes of the corresponding segment in the input sequence.
Further, the non-autoregressive Transformer in step five is obtained by randomly initializing a Transformer, adding a pointer network, and training with a cross-entropy loss function.
Further, the specific steps of step five are as follows:
first, using <mask> symbols of the same length as the SQL sequence length obtained in step four as the input of the non-autoregressive Transformer, and computing self-attention with the encoding vector of each token to obtain the encoding vector of each <mask> symbol;
then iterating a preset number of times using the encoding vector of each <mask> symbol to generate the keywords and range indexes in the SQL statement;
finally, filling in the corresponding tables, columns and values according to the generated range indexes, and supplementing the missing on clauses to obtain the final SQL statement.
Further, the pre-trained language model is BERT, RoBERTa or Electra.
The beneficial effects of the invention are as follows:
while the parallelism of a non-autoregressive model improves time performance, such a model has the shortcoming that it cannot observe context information of the target sequence during the generation stage. The present method compensates for this shortcoming of the non-autoregressive model through template retrieval and repeated iterative generation; for structurally complex, long SQL statements, the decoding speed of this scheme is more than 50% higher than that of traditional methods. The template library of this scheme is extensible, easy to migrate, and fast at generation.
Drawings
FIG. 1 is an overall flow chart of the present application;
FIG. 2 is a diagram of the overall architecture of the model;
FIG. 3 is a diagram of the template retrieval model;
FIG. 4 is a schematic illustration of the template filling part;
FIG. 5 is a schematic diagram of multiple iterative decoding.
Detailed Description
The first embodiment is as follows: this embodiment is described with reference to FIG. 1, and discloses a text-to-SQL generation method based on template retrieval, comprising the following steps:
step one: acquiring a data set, wherein the data set comprises user questions, databases and SQL statements, and then semantically parsing the SQL statements in the data set to construct an SQL template library;
step two: acquiring the structure of the database, and retrieving from the SQL template library the SQL template most relevant to the user question, according to the user question and the structure of the database;
step three: concatenating the user question, the structure of the database and the SQL template most relevant to the user question to obtain a token sequence, and inputting the token sequence into a pre-trained language model for encoding to obtain an encoding vector for each token;
step four: selecting the encoding vector of the first token, and predicting through a feed-forward neural network the SQL sequence length consistent with the user question semantics;
step five: based on the SQL sequence length, decoding the encoding vectors with a non-autoregressive Transformer to obtain an SQL statement consistent with the user question semantics.
This scheme is suitable for text-to-SQL generation tasks and can be roughly divided into two parts: template retrieval and SQL generation. The specific flow is shown in FIG. 1, and the overall structure of the model is shown in FIG. 2.
1. Template library construction and retrieval
First, SQL templates are extracted from the SQL statements appearing in the semantic parsing data set, and a template library is constructed. The tables, columns, values, etc. appearing in the SQL are replaced with specific marks. The specific rules are: table names are replaced by [TAB], column names by [COL], other values by [VAL], the ordering mode (ascending/descending) by [ORD], and limit clause values by [NUM]. The on clauses in the SQL are deleted at the same time (they can be deduced from the primary-/foreign-key relationships between the tables in the from and join clauses). For example, from the SQL statement "select name from student where age > 18;" the SQL template "select [COL] from [TAB] where [COL] > [VAL];" is obtained. A minimal extraction sketch follows.
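The following Python sketch implements these replacement rules just far enough to cover the single-table example above; it is an illustration under stated assumptions, not the patented procedure (a real implementation would walk a proper SQL parse tree, map limit values to [NUM], and delete on clauses):

```python
import re

TOKEN = re.compile(r"[A-Za-z_]\w*|[<>=!]+|\d+(?:\.\d+)?|'[^']*'|\S")

def extract_template(sql, tables, columns):
    out = []
    for tok in TOKEN.findall(sql):
        low = tok.lower()
        if low in tables:
            out.append("[TAB]")            # table name
        elif low in columns:
            out.append("[COL]")            # column name
        elif low in ("asc", "desc"):
            out.append("[ORD]")            # ordering mode
        elif re.fullmatch(r"\d+(?:\.\d+)?|'[^']*'", tok):
            out.append("[VAL]")            # literal value
        else:
            out.append(tok)                # SQL keyword / operator
    return " ".join(out)

# "select name from student where age > 18;" ->
# "select [COL] from [TAB] where [COL] > [VAL] ;"
library = {extract_template("select name from student where age > 18;",
                            tables={"student"}, columns={"name", "age"})}
# duplicate templates collapse because the library is a set
```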
After the template library is constructed, a two-tower model is trained to obtain the template retrieval model; the model structure is shown in FIG. 3. The template part is all templates in the template library; the query part is the concatenation of the question and the database structure, in the following format:
S = <TABLE> t_1 | t_2 | … | t_N | <COLUMN> c_1 | c_2 | … | c_M | <QUESTION> q_1 … q_n
wherein t_i is a table name in the database, c_i is a column name in the database, and q_i is a token in the question; <TABLE>, <COLUMN> and <QUESTION> are special symbols identifying the table-name, column-name and question segments. A serialization sketch follows.
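A small helper matching the format above (the function name and sample inputs are assumptions for illustration):

```python
def serialize(tables, columns, question_tokens):
    # S = <TABLE> t1|...|tN | <COLUMN> c1|...|cM | <QUESTION> q1 ... qn
    return ("<TABLE>" + "|".join(tables)
            + "<COLUMN>" + "|".join(columns)
            + "<QUESTION>" + " ".join(question_tokens))

query = serialize(["student"], ["name", "age"], ["how", "old", "is", "tom"])
```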
The query sequence and the templates are fed into pre-trained language models for encoding. The pre-trained language model can be a self-encoding language model such as BERT, RoBERTa or Electra; the encoding result corresponding to the sentence-head identifier ([CLS] or [BOS]) is used, and the encoding of the query and the encoding of the template are obtained through multi-layer feed-forward neural networks, respectively. A sketch of one tower follows.
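One tower might look as follows, assuming the HuggingFace transformers API; the head width and depth are illustrative assumptions, not values from the patent:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel

class Tower(torch.nn.Module):
    # pre-trained encoder + feed-forward head over the sentence-head
    # ([CLS]/[BOS]) encoding, as described above
    def __init__(self, name="roberta-base", dim=256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(name)
        hidden = self.encoder.config.hidden_size
        self.head = torch.nn.Sequential(
            torch.nn.Linear(hidden, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, dim))

    def forward(self, input_ids, attention_mask):
        cls = self.encoder(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state[:, 0]
        return F.normalize(self.head(cls), dim=-1)  # unit norm: dot product = cosine

query_tower, template_tower = Tower(), Tower()      # two independent encoders
```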
Finally, the cosine similarity of the two encodings is computed, the templates are ranked, and the SQL template with the maximum similarity is selected as the corresponding SQL template. During training, 3 to 5 negative templates are selected for each question, and the loss function is as follows:
L(S) = -log p(T+ | S) - Σ_{T-} log(1 - p(T- | S))
wherein S represents the query sequence and T+, T- represent positive and negative templates.
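Under the binary-contrastive reading of the loss given above (an assumption; the published formula is reproduced only as an image), a training step might look like:

```python
import torch

def retrieval_loss(q_vec, pos_vec, neg_vecs):
    # q_vec, pos_vec: (B, d); neg_vecs: (B, K, d) with K = 3..5 negatives
    p_pos = torch.sigmoid((q_vec * pos_vec).sum(-1))                 # p(T+ | S)
    p_neg = torch.sigmoid((q_vec.unsqueeze(1) * neg_vecs).sum(-1))   # p(T- | S)
    return -(torch.log(p_pos) + torch.log1p(-p_neg).sum(1)).mean()
```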
2. Template filling (SQL generation)
The template filling stage adopts a non-autoregressive Transformer structure; the module is divided into three parts: encoder, length module and decoder. The detailed structure is shown in FIG. 4.
the input of the coding part is consistent with the query part of the TEMPLATE retrieval module, and the obtained TEMPLATE (taking < TEMPLATE > as prefix) is queried in the previous stage after the query part is spliced. The encoder adopts the RoBERTa pre-training language model, and adds several new types of codes (added to the Embedding representation of the input sequence in the same way) based on the original position codes:
table and column position coding: each entity (table or column) of the database portion in the input sequence corresponds to a separate code, starting with 1 and marking the other portions as 0;
table, column identification code: the position corresponds to the table name by 1, the column name by 2, and the others by 0;
column type coding: 0 represents other, 1-6 represent integers, character strings, floating point numbers, dates and Boolean types respectively;
database matching coding: and matching the table and the column in the database with the word elements in the problem in a character string matching mode, wherein the complete matching is marked as 1, the partial matching is marked as 2, and the other cases are marked as 0.
After encoding, the encoding result of the sentence-head identifier is passed through one layer of feed-forward neural network for the length module's prediction; during training, the loss function of the length module is added to the overall model loss with a weight of 0.1. A sketch of the length loss follows.
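A minimal sketch of the length module and its 0.1-weighted loss; length_head, encoder_out, gold_lengths and decoder_loss are assumed names:

```python
import torch.nn.functional as F

length_logits = length_head(encoder_out[:, 0])   # one FFN layer on the [CLS] encoding
length_loss = F.cross_entropy(length_logits, gold_lengths)
total_loss = decoder_loss + 0.1 * length_loss    # 0.1 weight, as stated above
```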
In the decoding part, this scheme uses a segment-copying pointer network to complete the non-autoregressive decoding, and the SQL is expressed in the form of keywords plus range indexes. The range index means that the table names, column names and condition values in the SQL statement (shown in the table below) are represented by the start and end position indexes of the corresponding segment in the input sequence.
Table 1: position index representation
Figure GDA0004065709400000061
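Because Table 1 is reproduced only as an image in the original publication, the layout below is a hedged guess at the keyword-plus-range-index target format; all indices are illustrative:

```python
# input (token positions):   0       1       2      3  4  5      6     7   8  9  10
#                        <TABLE> student <COLUMN> name | age <QUESTION> how old is 18
# SQL keywords are generated from a vocabulary; schema items and values
# are copied as (start, end) index pairs into the input sequence.
target = ["select", ("copy", 3, 3),    # column "name"
          "from",   ("copy", 1, 1),    # table "student"
          "where",  ("copy", 5, 5),    # column "age"
          ">",      ("copy", 10, 10)]  # value from the question
```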
The decoder uses a randomly initialized Transformer plus a pointer-network-style generation module. The input part uses a number of <mask> symbols corresponding to the result of the length prediction module. In the model decoding stage, the SQL statement is generated over repeated iterations, and only the one or more tokens about which the model is most certain are generated each time. When decoding the SQL statement, a token may be produced by "copying" (copying the table name, column name or value referred to by the segment index) or by "generating" (an SQL keyword). An example of the decoding process is shown in FIG. 5; a decoding sketch follows.
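A mask-predict-style sketch of the iterative decoder, consistent with "generate only the most certain tokens each round"; the decoder signature and the reveal schedule are assumptions:

```python
import torch

@torch.no_grad()
def iterative_decode(decoder, enc_vecs, length, n_iters=4, mask_id=0):
    ys = torch.full((1, length), mask_id, dtype=torch.long)  # all <mask>
    fixed = torch.zeros(1, length, dtype=torch.bool)
    for t in range(n_iters):
        logits = decoder(ys, enc_vecs)            # self-attention over ys, cross-attention to enc_vecs
        probs, preds = logits.softmax(-1).max(-1)
        k = max(1, length * (t + 1) // n_iters)   # reveal more tokens each round
        top = probs.topk(k, dim=-1).indices
        fixed.scatter_(1, top, True)              # freeze the most certain positions
        ys = torch.where(fixed, preds, torch.full_like(preds, mask_id))
    return ys
```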
after decoding is completed, corresponding tables, columns and values are filled back according to the generated indexes, and missing on clauses are supplemented according to the tables appearing in SQL, so that a final SQL sentence is obtained.
The approach closest to this application is the PICARD model proposed by Torsten Scholak et al., which also uses a Transformer model to accomplish the text-to-SQL generation task; however, that approach is only applicable to the Spider dataset, and its decoding time efficiency is low. The M-SQL model proposed by Xiaoyu Zhang et al. is also a template-based text-to-SQL generation model, but its template is single and not universal, and it is relatively inefficient in time. The template library of this scheme is extensible, easy to migrate, and fast at generation.
It should be noted that the detailed description is merely for explaining and describing the technical solution of the present invention, and the scope of protection of the claims should not be limited thereto. All changes which come within the meaning and range of equivalency of the claims and the specification are to be embraced within their scope.

Claims (10)

1. A text-to-SQL generation method based on template retrieval, characterized by comprising the following steps:
step one: acquiring a data set, wherein the data set comprises user questions, databases and SQL statements, and then semantically parsing the SQL statements in the data set to construct an SQL template library;
step two: acquiring the structure of the database, and retrieving from the SQL template library the SQL template most relevant to the user question, according to the user question and the structure of the database;
step three: concatenating the user question, the structure of the database and the SQL template most relevant to the user question to obtain a token sequence, and inputting the token sequence into a pre-trained language model for encoding to obtain an encoding vector for each token;
step four: selecting the encoding vector of the first token, and predicting through a feed-forward neural network the SQL sequence length consistent with the user question semantics;
step five: based on the SQL sequence length, decoding the encoding vectors with a non-autoregressive Transformer to obtain an SQL statement consistent with the user question semantics.
2. The text-to-SQL generation method based on template retrieval according to claim 1, characterized in that the specific steps of step one are:
replacing the tables, columns, values and ordering modes appearing in the SQL with marks, and deleting the on clauses in the SQL to obtain an SQL template, then removing duplicate SQL templates until all SQL statements are processed, obtaining the SQL template library;
replacing the tables, columns, values and ordering modes in the SQL with marks is specifically:
replacing table names with [TAB];
replacing column names with [COL];
replacing limit clause values with [NUM] and other values with [VAL];
replacing the ordering mode with [ORD], wherein the ordering mode includes ascending and descending order, namely ASC and DESC.
3. The text-to-SQL generation method based on template retrieval according to claim 2, characterized in that the retrieval is performed through a template retrieval model, which is based on a two-tower model and is obtained by introducing a loss function for optimization;
the input of the two-tower model comprises a template part and a query part;
the template part is a template in the SQL template library;
the query part is formed by concatenating the user question and the database structure, yielding a query sequence S;
the specific processing steps of the two-tower model are as follows:
the query sequence S and a template from the SQL template library are respectively fed into two independent pre-trained language models for encoding; the encoded results are passed through multi-layer feed-forward neural networks to obtain the encoding of the query and the encoding of the template, respectively; the cosine similarity between the two encodings is then computed, and the template in the SQL template library with the maximum cosine similarity is selected as the SQL template most relevant to the user question.
4. The text-to-SQL generation method based on template retrieval according to claim 3, characterized in that the query sequence S is expressed as:
S = <TABLE> t_1 | t_2 | … | t_N | <COLUMN> c_1 | c_2 | … | c_M | <QUESTION> q_1 … q_n
wherein t_1 ~ t_N are the table names in the database, c_1 ~ c_M are the column names in the database, and q_1 ~ q_n are the tokens in the question; <TABLE>, <COLUMN> and <QUESTION> are special symbols identifying the table-name, column-name and question segments; N is the number of tables in the database, M is the number of columns in the database, and n is the number of tokens in the question;
the loss function is expressed as:
L = -log p(T+ | S) - Σ_{T-} log(1 - p(T- | S))
wherein S represents the query sequence, T+ and T- represent positive and negative templates, and p represents a conditional probability.
5. The text-to-SQL generation method based on template retrieval according to claim 4, characterized in that the input of the pre-trained language model in step three is obtained by concatenating the SQL template most relevant to the user question after the query sequence S;
the pre-trained language model in step three is obtained by adding new types of encodings on the basis of the original position encodings of the pre-trained language model;
the new types of encodings include:
table and column position encoding: each table name and column name in the input sequence corresponds to a separate code, numbered from 1, with all other positions marked 0;
table/column identification encoding: table names are represented by 1, column names by 2, and others by 0;
column type encoding: 0 represents other; 1 to 5 represent integer, string, floating-point, date and Boolean types respectively;
database matching encoding: the tables and columns in the database are matched against the tokens in the user question by string matching, wherein a complete match is marked 1, a partial match is marked 2, and other cases are marked 0.
6. The text-to-SQL generation method based on template retrieval according to claim 5, characterized in that the feed-forward neural network in step four is obtained through training; a cross-entropy loss function is used for optimization during training, and this cross-entropy loss is added to the overall model loss with a weight of 0.1.
7. The text-to-SQL generation method based on template retrieval according to claim 6, characterized in that the decoding in step five is performed through a segment-copying pointer network, and the SQL statement consistent with the user question semantics is expressed in the form of keywords plus range indexes;
the range index means that the table names, column names and condition values in the SQL statement are represented by the start and end position indexes of the corresponding segment in the input sequence.
8. The text-to-SQL generation method based on template retrieval according to claim 7, characterized in that the non-autoregressive Transformer in step five is obtained by randomly initializing a Transformer, adding a pointer network, and training with a cross-entropy loss function.
9. The text-to-SQL generation method based on template retrieval according to claim 8, characterized in that the specific steps of step five are:
first, using <mask> symbols of the same length as the SQL sequence length obtained in step four as the input of the non-autoregressive Transformer, and computing self-attention with the encoding vector of each token to obtain the encoding vector of each <mask> symbol;
then iterating a preset number of times using the encoding vector of each <mask> symbol to generate the keywords and range indexes in the SQL statement;
finally, filling in the corresponding tables, columns and values according to the generated range indexes, and supplementing the missing on clauses to obtain the final SQL statement.
10. The text-to-SQL generation method based on template retrieval according to claim 9, characterized in that the pre-trained language model is BERT, RoBERTa or Electra.
CN202210836518.3A 2022-07-15 2022-07-15 text-to-SQL generating method based on template retrieval Active CN115203236B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210836518.3A CN115203236B (en) 2022-07-15 2022-07-15 text-to-SQL generating method based on template retrieval

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210836518.3A CN115203236B (en) 2022-07-15 2022-07-15 text-to-SQL generating method based on template retrieval

Publications (2)

Publication Number Publication Date
CN115203236A CN115203236A (en) 2022-10-18
CN115203236B true CN115203236B (en) 2023-05-12

Family

ID=83581938

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210836518.3A Active CN115203236B (en) 2022-07-15 2022-07-15 text-to-SQL generating method based on template retrieval

Country Status (1)

Country Link
CN (1) CN115203236B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116303559B (en) * 2023-02-24 2024-02-23 广东爱因智能科技有限公司 Method, system and storage medium for controlling form question and answer


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10423649B2 (en) * 2017-04-06 2019-09-24 International Business Machines Corporation Natural question generation from query data using natural language processing system
CN111666575B (en) * 2020-04-15 2022-11-18 中国人民解放军战略支援部队信息工程大学 Text carrier-free information hiding method based on word element coding
CN112988785B (en) * 2021-05-10 2021-08-20 浙江大学 SQL conversion method and system based on language model coding and multitask decoding

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112559556A (en) * 2021-02-25 2021-03-26 杭州一知智能科技有限公司 Language model pre-training method and system for table mode analysis and sequence mask
CN114637765A (en) * 2022-04-26 2022-06-17 阿里巴巴达摩院(杭州)科技有限公司 Man-machine interaction method, device and equipment based on form data

Also Published As

Publication number Publication date
CN115203236A (en) 2022-10-18

Similar Documents

Publication Publication Date Title
CN113642330B (en) Rail transit standard entity identification method based on catalogue theme classification
CN110119765B (en) Keyword extraction method based on Seq2Seq framework
CN112507065B (en) Code searching method based on annotation semantic information
CN110390049B (en) Automatic answer generation method for software development questions
CN110688854A (en) Named entity recognition method, device and computer readable storage medium
CN116151132B (en) Intelligent code completion method, system and storage medium for programming learning scene
CN110442880B (en) Translation method, device and storage medium for machine translation
CN113609824A (en) Multi-turn dialog rewriting method and system based on text editing and grammar error correction
CN112364132A (en) Similarity calculation model and system based on dependency syntax and method for building system
CN115935957B (en) Sentence grammar error correction method and system based on syntactic analysis
CN115203236B (en) text-to-SQL generating method based on template retrieval
CN115048447A (en) Database natural language interface system based on intelligent semantic completion
CN108664464B (en) Method and device for determining semantic relevance
CN114281982B (en) Book propaganda abstract generation method and system adopting multi-mode fusion technology
CN115658898A (en) Chinese and English book entity relation extraction method, system and equipment
CN115168402A (en) Method and device for generating model by training sequence
CN111666374A (en) Method for integrating additional knowledge information into deep language model
CN114757184A (en) Method and system for realizing knowledge question answering in aviation field
CN112732862B (en) Neural network-based bidirectional multi-section reading zero sample entity linking method and device
CN112148879B (en) Computer readable storage medium for automatically labeling code with data structure
CN116521857A (en) Method and device for abstracting multi-text answer abstract of question driven abstraction based on graphic enhancement
CN114757181B (en) Method and device for training and extracting event of end-to-end event extraction model based on prior knowledge
CN113743095A (en) Chinese problem generation unified pre-training method based on word lattice and relative position embedding
CN113010676A (en) Text knowledge extraction method and device and natural language inference system
CN114201506B (en) Context-dependent semantic analysis method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant