CN115455156A - NL2SQL modeling method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115455156A
CN115455156A (application number CN202210903164.XA)
Authority
CN
China
Prior art keywords: model, training, layer, nl2sql, data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210903164.XA
Other languages
Chinese (zh)
Inventor
周晓辉
王华超
Current Assignee
Best Tone Information Service Corp Ltd
Original Assignee
Best Tone Information Service Corp Ltd
Priority date
Filing date
Publication date
Application filed by Best Tone Information Service Corp Ltd filed Critical Best Tone Information Service Corp Ltd
Priority to CN202210903164.XA priority Critical patent/CN115455156A/en
Publication of CN115455156A publication Critical patent/CN115455156A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an NL2SQL modeling method, a device, electronic equipment and a storage medium, wherein the NL2SQL modeling method comprises the following steps: S1, data processing: product operation data are processed to generate massive unlabeled samples, and labeling of the training sample data set is realized by adopting an Unsupervised Domain Adaptation (UDA) technology; S2, NL-to-SQL model training: a TernaryBERT model is adopted to quantize the weight layers and activation layers; S3, model compression and generation: a knowledge distillation approach further compresses the model, while a bat algorithm performs parameter training and intelligent search to find the optimal distillation network structure; and S4, dialogue management: the model is fused with preset SQL paradigms to associate and generate complete database query SQL, which further interacts with the intelligent customer service interface to realize dialogue management applications. The NL2SQL modeling method is suitable for non-professional operators, does not require large-scale labeling of sample data for model training, and adapts well to new environments and new tasks.

Description

NL2SQL modeling method and device, electronic equipment and storage medium
Technical Field
The invention relates to the field of natural language processing and semantic parsing, and in particular to a method, a device, electronic equipment and a storage medium for NL2SQL (Natural Language to Structured Query Language) modeling based on unsupervised domain adaptation technology and bat algorithm optimization.
Background
With the continuous development of 5G and artificial intelligence technologies, telecom/mobile communication operators, as modern integrated information service providers, are rapidly upgrading the intelligent customer service of each product line in the course of cloud transformation. For the natural language (NL) data of intelligent customer service, the operator is often not a technical expert in the data field; accessing and querying the database through NL can markedly improve the operating efficiency of intelligent customer service, which is currently one of the hot research topics combining artificial intelligence with the intelligent customer service field. At present, big data modeling focuses on technologies such as machine learning and deep learning, has a high threshold of use, and is not suitable for non-professional operators.
NL-to-Structured-Query-Language (NL2SQL) data modeling for intelligent customer service enables non-professional operators to interact with a database through voice, lowering the threshold of database query and greatly improving operating efficiency.
With the in-depth development of deep learning, NL-to-SQL (NL2SQL) data modeling based on supervised learning has succeeded in the NLP field, but supervised learning depends on a large amount of manually labeled data, and such models suffer from problems including spurious correlations, generalization errors and adversarial attacks. In addition, patents such as "Complex natural language query SQL conversion method based on a tree model" (application number 202110183393.4) and "SQL conversion method and system based on language model coding and multitask decoding" (application number 202110505064.7) all require large-scale labeling of sample data for model training; in real learning tasks, data labeling usually requires manual participation and relevant professional knowledge, and is time-consuming, labor-intensive and expensive. Moreover, such learning models adapt poorly to new environments and new tasks: when the learning environment differs from the training scenario, new data must be labeled and the model retrained.
Therefore, it is highly desirable to develop a modeling method suitable for non-professional operators, not requiring large-scale labeling of sample data for model training, and having strong adaptability to new environments and new tasks.
Disclosure of Invention
The invention aims to solve the technical problem of how to realize a modeling method which is suitable for non-professional operators, does not need to label sample data on a large scale for model training and has strong adaptability to new environments and new tasks, thereby reducing the threshold of database query of operators and improving the operation efficiency.
To solve the above technical problem, according to one aspect of the present invention, there is provided an NL2SQL modeling method comprising the following steps: S1, data processing: product operation data are processed to generate massive unlabeled samples, and labeling of the training sample data set is then realized by adopting an Unsupervised Domain Adaptation (UDA) technology through preset query SQL (Structured Query Language) paradigm templates; S2, NL-to-SQL model training: sample data training is performed on the acquired natural language samples to generate a structured query language model, the model comprising weight layers and activation layers, which are quantized by adopting TernaryBERT (Ternary Bidirectional Encoder Representations from Transformers, a pre-trained language representation model); S3, model compression and generation: to compensate for the performance loss of the TernaryBERT model caused by aggressive quantization on tasks including intelligent customer service consultation, a knowledge distillation (PKD) approach further compresses the model, while, to prevent the distillation network weights from falling into a local optimum, a bat algorithm performs parameter training and intelligent search to find the optimal distillation network structure; and S4, dialogue management: after TernaryBERT model training and PKD model compression, preprocessing is performed, the model is fused with preset SQL (Structured Query Language) paradigms to associate and generate complete database query SQL, which further interacts with the intelligent customer service interface to realize dialogue management applications.
According to an embodiment of the invention, the product operation data in step S1 may comprise calling name cards and on-hook short messages, and the data processing may comprise word segmentation, stop-word removal, English punctuation replacement, question-and-answer data set construction and sentence-vector index database construction, thereby generating massive unlabeled samples.
According to an embodiment of the invention, the NL-to-SQL model in step S2 may train a TernaryBERT model that quantizes the weight layers and activation layers. The weight layers may comprise linear layers and an Embedding layer; since the parameters of the linear and Embedding layers account for most of the total parameters of the TernaryBERT model, the TernaryBERT model quantizes them more thoroughly. For the activation layers, 8-bit symmetric and asymmetric methods may be adopted for quantization.
Further, in activation-layer quantization, matrix multiplications in the actual inference process may be converted from 32-bit floating-point operations into int8 integer operations, achieving the acceleration goal; the model achieves performance comparable to a full-precision model with only 6.7% of the parameters of the BERT model.
According to an embodiment of the invention, in step S3, Patient Knowledge Distillation (PKD) may adopt two strategies, PKD-Last and PKD-Skip, to extract hidden knowledge from the hidden layers of the "teacher" model, so that the "student" model does not merely imitate the final output of the "teacher" model. The PKD-Last strategy distills the knowledge contained in the last k layers of the "teacher" model; the PKD-Skip strategy extracts and distills the knowledge of every k-th layer of the "teacher" model.
According to an embodiment of the invention, parameter training and intelligent search may be performed in step S3 using a bat algorithm to find the optimal distillation network structure, and may comprise the following steps: S31, the network coding vector is input into a trained structure generator to generate the weights of the corresponding distillation network, and the distillation network is evaluated on a verification set to obtain its precision; S32, to find the most accurate distillation network satisfying specific constraint conditions, a bat algorithm is adopted to search for the distillation structure model with the highest precision under those constraints, the constraint conditions including the number of floating-point operations.
According to an embodiment of the present invention, the preprocessing in step S4 may comprise named entity recognition, correction of wrongly written characters, and user emotion analysis.
According to a second aspect of the present invention, there is provided an apparatus for NL2SQL modeling, comprising:
a massive data processing module, which trains sample data and adopts an Unsupervised Domain Adaptation (UDA) scheme to realize self-supervised sample data learning, the UDA serving, by means of a source domain with labeled data, to help a target domain without any labeled information learn, i.e. transfer learning, thereby realizing low-cost acquisition of labeled data; an NL-to-SQL model training module, based on a TernaryBERT network model, which realizes a core scheme comprising network weight-layer quantization and activation-layer quantization; a model compression and generation module, which adopts the PKD model compression algorithm, using the two strategies PKD-Last and PKD-Skip to extract hidden knowledge from the hidden layers of the "teacher" model so that the "student" model does not merely imitate the final output of the "teacher" model, thereby reducing the number of hidden layers of the NL-to-SQL model, reducing the parameter count of model training and improving the knowledge inference speed of the model, while a bat algorithm performs network parameter training and intelligent search to find the optimal distillation network structure; and a dialogue management module, which, after TernaryBERT model training and PKD model compression, realizes processing including named entity recognition, correction of wrongly written characters and user emotion analysis, then fuses preset SQL (Structured Query Language) paradigms to associate and generate complete database query SQL, and further interacts with the intelligent customer service interface by means of a dialogue management system.
According to a third aspect of the present invention, there is provided electronic equipment comprising a memory, a processor, and an NL2SQL modeling program stored on the memory and executable on the processor, the NL2SQL modeling program, when executed by the processor, implementing the steps of the NL2SQL modeling method described above.
According to a fourth aspect of the present invention, there is provided a computer storage medium, wherein the computer storage medium has stored thereon an NL2SQL modeling program, the NL2SQL modeling program, when executed by a processor, implementing the steps of the NL2SQL modeling method described above.
Compared with the prior art, the technical scheme provided by the embodiment of the invention can at least realize the following beneficial effects:
1. The invention is based on an Unsupervised Domain Adaptation (UDA) technology, whose main aim is to help a target domain without any labeled information learn by means of a source domain with a large amount of labeled data. The main steps are to learn initial features of the source and target domains through self-supervision while fixing part of the network parameters that store target-domain information, and then to transfer the sample-contrast knowledge of the source domain to the target domain to assist it in learning class-discriminative features. The UDA technology can therefore use auxiliary tasks to mine supervision information from large-scale unsupervised data and use this information to train the network, thereby learning rich, universal features valuable to downstream tasks.
2. By fusing the UDA technology, the invention provides an intelligent-customer-service-oriented NL-to-SQL data modeling method, solving the technical problem that existing NL-to-SQL technology is limited to database query and cannot be applied to complex intelligent customer service data, lowering the database query threshold for operators and improving operating efficiency. Meanwhile, the UDA technology obtains a large number of model training samples at low cost, so the method has wide application scenarios and commercial value.
3. The invention is based on UDA task training and NL-to-SQL modeling: it performs the language model pre-training task based on the UDA technology, adopts the bat algorithm for optimization in model compression structure selection, and fuses NL-to-SQL modeling, realizing automatic model training based on massive unlabeled data and greatly reducing the cost of manual data labeling.
4. The technical scheme of the invention realizes self-supervised sample learning based on the UDA technology, effectively reducing the dependence of the Text-to-SQL model on large-scale labeled sample data; it optimizes the compressed network structure by means of TernaryBERT model training, the PKD model compression method and the bat algorithm, fuses the SQL paradigms preset for various categories, and predicts and outputs complete SQL query statements.
5. The technical scheme of the invention adopts the UDA technology to realize unsupervised domain-adaptive autonomous data generation, reducing the model's dependence on large amounts of labeled data, and is compatible with single-field and multi-field scenarios.
6. With the technical scheme of the invention, the exact matching rate and execution accuracy in a single field both exceed 92%, while in multiple fields the exact matching rate exceeds 75% and the execution accuracy exceeds 86%.
7. The technical scheme of the invention performs a new round of reinforcement learning on error cases, giving the system iterative optimization capability.
8. By adopting a model based on the TernaryBERT algorithm, the technical scheme has fewer model parameters, improved model performance and lower memory requirements at deployment, giving it the advantages of rapid deployment and wide applicability.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings of the embodiments will be briefly described below, and it is apparent that the drawings in the following description only relate to some embodiments of the present invention and are not limiting on the present invention.
Fig. 1 is a model framework diagram showing an NL2SQL modeling method according to an embodiment of the present invention.
FIG. 2 is a flow diagram illustrating a NL2SQL modeling method according to an embodiment of the invention.
Fig. 3 is a diagram illustrating a TernaryBERT pre-training model according to an embodiment of the invention.
Fig. 4 is a diagram illustrating two strategies of a PKD method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the drawings of the embodiments of the present invention. It should be apparent that the described embodiments are only some of the embodiments of the present invention, and not all of them. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the invention without inventive step, are within the scope of protection of the invention.
Unless defined otherwise, technical or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The use of "first," "second," and similar terms in the description and in the claims of the present application does not denote any order, quantity, or importance, but rather the terms are used to distinguish one element from another. Also, the use of the terms a, an, etc. do not denote a limitation of quantity, but rather denote the presence of at least one.
Fig. 1 is a model framework diagram showing an NL2SQL modeling method according to an embodiment of the present invention, and fig. 2 is a flowchart showing the NL2SQL modeling method according to an embodiment of the present invention.
As shown in fig. 1 and 2, the NL2SQL modeling method includes the following steps:
S1, data processing, wherein the data are product operation data: massive unlabeled samples are generated after processing the data, and the training sample data set is labeled by adopting the UDA technology through preset query SQL (Structured Query Language) paradigm templates.
And S2, NL-to-SQL model training, wherein sample data training is performed on the acquired natural language samples to generate a structured query language model; the model comprises weight layers and activation layers, which are quantized by adopting a TernaryBERT model.
And S3, model compression and generation: to compensate for the performance loss of the TernaryBERT model caused by aggressive quantization on tasks including intelligent customer service consultation, a knowledge distillation approach further compresses the model; meanwhile, to prevent the distillation network weights from falling into a local optimum, a bat algorithm performs parameter training and intelligent search to find the optimal distillation network structure.
And S4, dialogue management: after TernaryBERT model training and PKD model compression, preprocessing is performed, the model is fused with preset SQL (Structured Query Language) paradigms to associate and generate complete database query SQL, which further interacts with the intelligent customer service interface to realize dialogue management applications.
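The four steps S1-S4 above can be sketched as a minimal pipeline driver. All function bodies below are illustrative placeholders (hypothetical names, toy logic standing in for TernaryBERT training, PKD compression and template fusion), not the patented implementation:

```python
# Minimal sketch of the four-step NL2SQL pipeline (steps S1-S4).
# All function bodies are illustrative placeholders; the patent does not
# specify these interfaces.

def s1_process_data(raw_records):
    """S1: clean raw operation data into unlabeled samples."""
    return [r.strip().lower() for r in raw_records if r.strip()]

def s2_train_skeleton(samples):
    """S2: stand-in for TernaryBERT training; returns a trivial 'model'."""
    vocab = sorted({w for s in samples for w in s.split()})
    return {"vocab": vocab}

def s3_compress(model):
    """S3: stand-in for PKD compression + bat-algorithm structure search."""
    model["compressed"] = True
    return model

def s4_generate_sql(model, question, table):
    """S4: fuse model output with a preset SQL paradigm template."""
    keyword = next((w for w in question.split() if w in model["vocab"]), "*")
    return f"SELECT * FROM {table} WHERE text LIKE '%{keyword}%'"

model = s3_compress(s2_train_skeleton(s1_process_data(["tariff query", "billing help"])))
sql = s4_generate_sql(model, "tariff details please", "faq")
```

The point of the sketch is only the data flow: unlabeled samples feed model training, the compressed model is then fused with a query template at dialogue time.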
The invention is based on an Unsupervised Domain Adaptation (UDA) technology, whose main aim is to help a target domain without any labeled information learn by means of a source domain with a large amount of labeled data. The main steps are to learn initial features of the source and target domains through self-supervision while fixing part of the network parameters that store target-domain information, and then to transfer the sample-comparison knowledge of the source domain to the target domain to assist it in learning class-discriminative features. The UDA technology can therefore use auxiliary tasks to mine supervision information from large-scale unsupervised data and use this information to train the network, thereby learning rich, universal features valuable to downstream tasks.
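The transfer idea behind UDA-style labeling can be illustrated with a toy self-training loop: a labeled source domain assigns pseudo-labels to unlabeled target-domain samples, keeping only confident ones. The word-overlap "feature" and the threshold are illustrative assumptions; the patent does not specify the UDA procedure at this level:

```python
# Illustrative sketch of UDA-style pseudo-labeling: labels from a source
# domain are propagated to unlabeled target-domain samples, keeping only
# confident matches. Word-overlap similarity is a toy stand-in for learned
# features.

def similarity(a, b):
    """Jaccard word-overlap similarity between two sentences (toy feature)."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / max(len(wa | wb), 1)

def pseudo_label(source_labeled, target_unlabeled, threshold=0.3):
    """Label each target sample with its nearest source sample's label,
    keeping it only when similarity exceeds the confidence threshold."""
    labeled = []
    for t in target_unlabeled:
        best_label, best_sim = None, 0.0
        for text, label in source_labeled:
            sim = similarity(t, text)
            if sim > best_sim:
                best_label, best_sim = label, sim
        if best_sim >= threshold:
            labeled.append((t, best_label))
    return labeled

source = [("query my bill", "billing"), ("change my plan", "tariff")]
target = ["please query bill amount", "weather today"]
auto = pseudo_label(source, target)
```

The unrelated target sentence falls below the threshold and is discarded, which is how low-cost labeled data can be harvested without manual annotation.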
The technical scheme of the invention adopts the UDA technology to realize unsupervised domain-adaptive autonomous data generation, reducing the model's dependence on large amounts of labeled data, and is compatible with single-field and multi-field scenarios.
According to one or some embodiments of the invention, the product operation data in step S1 comprise calling name cards and on-hook short messages, and the data processing comprises word segmentation, stop-word removal, English punctuation replacement, question-and-answer data set construction and sentence-vector index database construction, thereby generating the massive unlabeled samples.
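The first three processing steps (segmentation, stop-word removal, punctuation replacement) can be sketched as follows. For Chinese text a real segmenter such as jieba would be used; whitespace splitting and the tiny stop-word list here are stand-ins:

```python
# Sketch of the S1 data-processing chain: segmentation, stop-word removal,
# and English punctuation replacement. Whitespace splitting stands in for a
# real word segmenter.

import string

STOP_WORDS = {"the", "a", "of", "please"}  # illustrative stop-word list

def preprocess(sentence):
    # Replace English punctuation with spaces (normalization step).
    table = str.maketrans({p: " " for p in string.punctuation})
    cleaned = sentence.translate(table).lower()
    # Segment (whitespace stand-in) and drop stop words.
    return [w for w in cleaned.split() if w not in STOP_WORDS]

tokens = preprocess("Please check the bill, thanks!")
```

The resulting token lists would then feed the question-and-answer data set and the sentence-vector index database.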
Fig. 3 is a diagram illustrating a TernaryBERT pre-training model according to an embodiment of the invention.
As shown in fig. 3, in step S2 the NL-to-SQL model trains a TernaryBERT model that quantizes the weight layers and activation layers. The weight layers may comprise linear layers and an Embedding layer; since the parameters of the linear and Embedding layers account for most of the total parameters of the TernaryBERT model, the TernaryBERT model quantizes them more thoroughly. For the activation layers, 8-bit symmetric and asymmetric methods are adopted for quantization.
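Weight ternarization of this kind maps each weight to a scaled value in {-1, 0, +1}. The sketch below uses the common ternary-weight-network heuristic (threshold at 0.7 times the mean absolute weight, scale equal to the mean magnitude of the kept weights); the patent does not spell out the exact quantizer, so these formulas are an assumption:

```python
# Sketch of TernaryBERT-style weight ternarization: each weight becomes
# alpha * {-1, 0, +1}. Threshold and scale follow the common
# ternary-weight-network heuristic (an assumption, not the patent's spec).

def ternarize(weights):
    n = len(weights)
    delta = 0.7 * sum(abs(w) for w in weights) / n      # ternary threshold
    mask = [0 if abs(w) < delta else (1 if w > 0 else -1) for w in weights]
    kept = [abs(w) for w, m in zip(weights, mask) if m != 0]
    alpha = sum(kept) / len(kept) if kept else 0.0      # scaling factor
    return alpha, mask

alpha, q = ternarize([0.9, -0.8, 0.05, -0.02, 0.7])
```

Storing only the 2-bit mask plus one scale per tensor is what makes the linear and Embedding layers, which dominate the parameter count, so compressible.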
By adopting a model based on the TernaryBERT algorithm, the technical scheme has fewer model parameters, improved model performance and lower memory requirements at deployment, giving it the advantages of rapid deployment and wide applicability.
Further, in activation-layer quantization, matrix multiplications in the actual inference process are converted from 32-bit floating-point operations into int8 integer operations, achieving the acceleration goal; the model achieves performance comparable to a full-precision model with only 6.7% of the parameters of the BERT model.
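The two 8-bit schemes named above differ in their mapping: symmetric quantization maps [-max|x|, max|x|] to [-127, 127], while asymmetric quantization maps [min, max] to [0, 255] with a zero point. A minimal sketch (illustrative, not the patent's code):

```python
# Sketch of 8-bit activation quantization in symmetric and asymmetric form.

def quantize_symmetric(xs):
    """Map values to signed int8 range using a single scale."""
    scale = max(abs(x) for x in xs) / 127.0
    return [round(x / scale) for x in xs], scale

def quantize_asymmetric(xs):
    """Map [min, max] to unsigned 8-bit range with a zero point."""
    lo, hi = min(xs), max(xs)
    scale = (hi - lo) / 255.0
    zero_point = round(-lo / scale)
    return [round(x / scale) + zero_point for x in xs], scale, zero_point

q_sym, s = quantize_symmetric([-1.0, 0.0, 0.5, 1.0])
q_asym, s2, zp = quantize_asymmetric([0.0, 0.5, 1.0])
```

Once activations are integers, the matrix multiplications can run in int8 arithmetic, which is the source of the speed-up mentioned above.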
Fig. 4 is a schematic diagram illustrating two strategies of a PKD method according to an embodiment of the invention.
As shown in fig. 4, in step S3 Patient Knowledge Distillation (PKD) adopts two strategies, PKD-Last and PKD-Skip, to extract hidden knowledge from the hidden layers of the "teacher" model, so that the "student" model does not merely imitate the final output of the "teacher" model. The PKD-Last strategy uses the knowledge contained in the last k layers of the "teacher" model; the PKD-Skip strategy extracts and distills the knowledge of every k-th layer of the "teacher" model.
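The difference between the two strategies is purely which teacher layers the student is matched against. A sketch of the layer selection (0-based indexing is an assumption for illustration):

```python
# Sketch of the two PKD layer-selection strategies: PKD-Last distills from
# the last k hidden layers of the teacher, PKD-Skip from every k-th layer.

def pkd_last(num_teacher_layers, k):
    """Indices of the last k teacher layers (PKD-Last)."""
    return list(range(num_teacher_layers - k, num_teacher_layers))

def pkd_skip(num_teacher_layers, k):
    """Indices of every k-th teacher layer (PKD-Skip)."""
    return list(range(k - 1, num_teacher_layers, k))

# A 12-layer teacher distilled into a 6-layer student:
last6 = pkd_last(12, 6)
skip2 = pkd_skip(12, 2)
```

PKD-Skip spreads supervision across the whole depth of the teacher, while PKD-Last concentrates it near the output; both give the student intermediate targets beyond the final logits.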
According to one or some embodiments of the present invention, parameter training and intelligent search are performed by using a bat algorithm in the step S3 to find an optimal distillation network structure, and the step S3 includes the following steps:
and S31, inputting the network coding vector into the trained structure generator, generating the weight corresponding to the distillation network, and evaluating the distillation network on the verification set to obtain the precision corresponding to the distillation network.
And S32, to find the most accurate distillation network satisfying specific constraint conditions, a bat algorithm is adopted to search for the distillation structure model with the highest precision under those constraints, the constraint conditions including the number of floating-point operations.
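The bat algorithm's core update rules (frequency-scaled velocity updates plus a local random walk around the current best, gated by loudness and pulse rate) can be shown on a toy 1-D objective standing in for validation precision, with a clipping bound standing in for the FLOPs constraint. The real search runs over encoded network structures; everything here is illustrative:

```python
# Minimal bat-algorithm sketch: maximize a toy "accuracy" objective under a
# box constraint (stand-in for the FLOPs bound). Only the update rules are
# the point; the objective and parameters are illustrative.

import random

def fitness(x):
    return -(x - 3.0) ** 2        # toy "accuracy", peak at x = 3

def bat_search(n_bats=10, n_iter=200, x_max=4.0, seed=0):
    rng = random.Random(seed)
    pos = [rng.uniform(0.0, x_max) for _ in range(n_bats)]
    vel = [0.0] * n_bats
    loudness, pulse_rate = 0.9, 0.5
    best = max(pos, key=fitness)
    for _ in range(n_iter):
        for i in range(n_bats):
            freq = rng.uniform(0.0, 1.0)              # random frequency
            vel[i] += (pos[i] - best) * freq          # velocity update
            cand = pos[i] - vel[i]
            if rng.random() > pulse_rate:             # local walk near best
                cand = best + 0.05 * rng.gauss(0.0, 1.0)
            cand = min(max(cand, 0.0), x_max)         # constraint: 0 <= x <= x_max
            if fitness(cand) > fitness(pos[i]) and rng.random() < loudness:
                pos[i] = cand                         # accept improvement
            if fitness(pos[i]) > fitness(best):
                best = pos[i]
        # (A full implementation would also decay loudness and grow pulse_rate.)
    return best

best_x = bat_search()
```

The mix of global (velocity-driven) moves and local walks around the incumbent best is what helps the search escape local optima of the distillation-structure landscape.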
The invention performs the language model pre-training task based on the UDA technology, adopts the bat algorithm for model compression structure selection, and fuses NL-to-SQL modeling, realizing automatic model training based on massive unlabeled data and greatly reducing the cost of manual data labeling.
According to one or some embodiments of the invention, the preprocessing in step S4 comprises named entity recognition, correction of wrongly written characters, and user emotion analysis.
According to a second aspect of the present invention, there is provided an apparatus for NL2SQL modeling, comprising: the system comprises a mass data processing module, an NL-to-SQL model training module, a model compression and generation module and a dialogue management module.
The massive data processing module trains sample data and adopts the UDA scheme to realize self-supervised sample data learning; the UDA helps a target domain without any labeled information learn by means of a source domain with labeled data, i.e. transfer learning, so that labeled data can be acquired at low cost.
The NL-to-SQL model training module is based on a TernaryBERT network model and realizes a core scheme comprising network weight-layer quantization and activation-layer quantization.
The model compression and generation module adopts the PKD model compression algorithm: hidden knowledge is extracted from the hidden layers of the "teacher" model using the two strategies PKD-Last and PKD-Skip, so that the "student" model does not merely imitate the final output of the "teacher" model; this reduces the number of hidden layers of the NL-to-SQL model, reduces the parameter count of model training and improves the knowledge inference speed of the model. Meanwhile, a bat algorithm performs network parameter training and intelligent search to find the optimal distillation network structure.
After TernaryBERT model training and PKD model compression, the dialogue management module realizes processing including named entity recognition, correction of wrongly written characters and user emotion analysis, then fuses preset SQL (Structured Query Language) paradigms to associate and generate complete database query SQL, and further interacts with the intelligent customer service interface by means of a dialogue management system.
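Fusing recognized entities with a preset SQL paradigm amounts to slot-filling a category-specific template. In the sketch below, entity recognition is a toy dictionary lookup and all table, column and entity names are hypothetical:

```python
# Sketch of fusing recognized entities with a preset SQL paradigm: named
# entities fill the slots of a category-specific template. Table/column
# names and the entity dictionary are hypothetical.

TEMPLATES = {
    "tariff": "SELECT fee FROM tariff_table WHERE plan = '{plan}'",
    "billing": "SELECT amount FROM bill_table WHERE month = '{month}'",
}

ENTITY_DICT = {"gold plan": ("plan", "gold plan"), "july": ("month", "july")}

def to_sql(category, question):
    """Fill the preset paradigm for this category with recognized entities."""
    slots = {}
    for phrase, (slot, value) in ENTITY_DICT.items():
        if phrase in question.lower():
            slots[slot] = value
    return TEMPLATES[category].format(**slots)

sql = to_sql("tariff", "How much is the Gold Plan?")
```

In the described system the category and entities would come from the trained model and named entity recognition rather than a dictionary, but the template-fusion step has this shape.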
The technical scheme of the invention realizes self-supervised sample learning based on the UDA technology, effectively reducing the dependence of the Text-to-SQL model on large-scale labeled sample data; it optimizes the compressed network structure by means of TernaryBERT model training, the PKD model compression method and the bat algorithm, fuses the SQL paradigms preset for various categories, and predicts and outputs complete SQL query statements.
According to the technical scheme of the invention, the system can be effectively deployed and applied in intelligent operation scenarios such as intelligent customer service and intelligent query systems, lowering the usage threshold for operators and guaranteeing uninterrupted 7×24-hour service.
According to yet another aspect of the invention, there is provided an NL2SQL modeling apparatus comprising a memory, a processor, and an NL2SQL modeling program stored on the memory and executable on the processor, the NL2SQL modeling program, when executed by the processor, implementing the steps of the NL2SQL modeling method described above.
There is also provided a computer storage medium according to the present invention.
An NL2SQL modeling program is stored on the computer storage medium; when executed by a processor, the NL2SQL modeling program implements the steps of the NL2SQL modeling method described above.
For the method implemented when the NL2SQL modeling program is executed on the processor, reference may be made to the embodiments of the NL2SQL modeling method of the present invention, which are not repeated here.
The invention also provides a computer program product.
The computer program product of the invention comprises an NL2SQL modeling program, which when executed by a processor implements the steps of the NL2SQL modeling method as described above.
For the method implemented when the NL2SQL modeling program is executed on the processor, reference may be made to the embodiments of the NL2SQL modeling method of the present invention, which are not repeated here.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The technical scheme of the invention performs a new round of reinforcement learning on error cases, giving the system iterative optimization capability. The exact-match rate and execution accuracy in a single domain both exceed 92%; across multiple domains, the exact-match rate exceeds 75% and the execution accuracy exceeds 86%.
The invention provides an NL-to-SQL modeling method oriented to intelligent customer service data that fuses the UDA technology. It solves the technical problem that the existing NL-to-SQL technology is limited to database queries and cannot be applied to complex intelligent customer service data, lowers operators' database query threshold, and improves operation efficiency. Meanwhile, the UDA technology yields a large number of model training samples at low cost, giving the method broad application scenarios and commercial value.
The above description is intended to be illustrative of the present invention and not to limit the scope of the invention, which is defined by the claims appended hereto.

Claims (10)

1. An NL2SQL modeling method, wherein the NL2SQL modeling method is implemented based on the Unsupervised Domain Adaptation (UDA) technique and bat algorithm optimization, the method comprising the following steps:
S1, data processing, wherein the data are product operation data; massive unlabeled samples are generated after the data are processed, and a training sample data set is further labeled using the UDA technique through preset query SQL (Structured Query Language) paradigm templates;
S2, NL-to-SQL model training, wherein sample data training is performed on the acquired natural language samples to generate a structured-query-language model, the model comprising weight layers and activation layers, which are quantized using the TernaryBERT model;
S3, model compression and generation, wherein, to compensate for the insufficient performance of the TernaryBERT model caused by excessive quantization on tasks including intelligent customer service consultation, the model is further compressed by pre-trained knowledge distillation; meanwhile, to prevent the distillation network weights from falling into a local optimum, parameter training and intelligent search are performed using the bat algorithm to find an optimal distillation network structure;
and S4, session management, wherein, after TernaryBERT model training and PKD model compression, preprocessing is performed, the model is fused with preset SQL (Structured Query Language) paradigms to associate and generate a complete database query, and session management is realized through interaction with the intelligent customer service interface.
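The paradigm-fusion part of step S4 can be illustrated with a toy sketch; the intent name, template, and slot keys below are hypothetical, since the patent does not disclose its paradigm set:

```python
# Hypothetical SQL paradigm templates keyed by recognized intent;
# in the patented system these would be the preset query SQL paradigms.
SQL_PARADIGMS = {
    "package_query": "SELECT {column} FROM packages WHERE name = '{entity}'",
}

def fuse_paradigm(intent, slots):
    """Fuse slots recognized by NER into a preset SQL paradigm,
    producing the complete database query (step S4)."""
    template = SQL_PARADIGMS[intent]
    return template.format(**slots)

# Example: after NER extracts a package name and the requested attribute.
sql = fuse_paradigm("package_query", {"column": "price", "entity": "Family Plan"})
```

A production system would also escape slot values or use parameterized queries rather than string formatting; the sketch only shows the association between recognized entities and a paradigm.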
2. The method as claimed in claim 1, wherein the product operation data in step S1 include calling cards and on-hook messages, and the data processing includes word segmentation, stop-word removal, English punctuation replacement, question-and-answer data set construction, sentence-vector index database construction, and generation of massive unlabeled samples.
3. The method of claim 1, wherein the NL-to-SQL model training in step S2 employs a TernaryBERT model that quantizes the weight layers and the activation layers,
wherein the weight layers comprise linear layers and Embedding layers; since the parameters of the linear and Embedding layers account for most of the total parameters of the TernaryBERT model, the TernaryBERT model quantizes these layers most thoroughly;
and in activation-layer quantization, 8-bit symmetric and asymmetric methods are adopted.
4. The method of claim 3, wherein, in activation-layer quantization, during actual inference the matrix multiplications are changed from 32-bit floating-point operations to int8 integer operations to achieve acceleration; the model achieves performance comparable to the full-precision model with only 6.7% of the parameters of the BERT model.
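The quantization scheme described in claims 3 and 4 can be sketched with numpy as follows. The TWN-style 0.7 threshold factor and the rounding details are assumptions drawn from the quantization literature, not from this document:

```python
import numpy as np

def ternarize(w):
    """Ternary weight quantization (TWN-style, as in TernaryBERT):
    approximate w by alpha * b with b in {-1, 0, +1}."""
    delta = 0.7 * np.mean(np.abs(w))           # threshold (assumed factor)
    mask = np.abs(w) > delta
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    b = np.where(mask, np.sign(w), 0.0)
    return alpha * b

def quantize_act_symmetric(x, bits=8):
    """Symmetric 8-bit quantization: one scale, no zero point."""
    qmax = 2 ** (bits - 1) - 1                 # 127 for int8
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale

def quantize_act_asymmetric(x, bits=8):
    """Asymmetric (min-max) 8-bit quantization with a zero point."""
    qmin, qmax = 0, 2 ** bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero = np.round(-x.min() / scale)
    q = np.clip(np.round(x / scale) + zero, qmin, qmax)
    return q.astype(np.uint8), scale, zero
```

At inference, matrix multiplications run on the int8 values and the scales are folded back in afterwards, which is the source of the speed-up the claim describes.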
5. The method of claim 1, wherein the pre-trained knowledge distillation in step S3 adopts the two strategies PKD-Last and PKD-Skip to extract hidden knowledge from the hidden layers of the "teacher" model, so that the "student" model thoroughly imitates the output of the "teacher" model,
wherein the PKD-Last strategy distills the knowledge contained in the last k layers of the "teacher" model, and the PKD-Skip strategy extracts and distills the knowledge of every k-th layer of the "teacher" model.
6. The method as claimed in claim 1, wherein performing parameter training and intelligent search with the bat algorithm to find the optimal distillation network structure in step S3 comprises the following steps:
S31, inputting a network encoding vector into the trained structure generator to generate the weights of the corresponding distillation network, and evaluating the distillation network on a validation set to obtain its accuracy;
S32, searching, with the bat algorithm, for the distillation network structure with the highest accuracy that satisfies specific constraint conditions, wherein the specific constraint conditions include the number of floating-point operations.
7. The method of claim 1, wherein the preprocessing in step S4 includes named entity recognition, typo correction, and user sentiment analysis.
8. An apparatus for NL2SQL modeling, comprising:
the mass data processing module, which trains sample data and realizes self-supervised sample data learning by adopting the UDA technical scheme, wherein UDA is a transfer learning technique that leverages a source domain with a small amount of labeled data to assist learning in a target domain without any label information, thereby realizing low-cost acquisition of labeled data;
the NL-to-SQL model training module, which is based on the TernaryBERT network model and implements a core technical scheme including network weight-layer quantization and activation-layer quantization;
the model compression and generation module, which adopts the PKD model compression algorithm, extracting hidden knowledge from the hidden layers of the "teacher" model using the two strategies PKD-Last and PKD-Skip so that the "student" model thoroughly imitates the output of the "teacher" model, thereby reducing the number of layers of the NL-to-SQL model, reducing the number of parameters to be trained, and improving the model's inference speed; meanwhile, the bat algorithm is adopted for network parameter training and intelligent search to find an optimal distillation network structure;
and the dialogue management module, which, after TernaryBERT model training and PKD model compression, performs processing including named entity recognition, typo correction, and user sentiment analysis, then fuses the result with preset SQL (Structured Query Language) paradigms to associate and generate a complete database query, and interacts with the intelligent customer service interface through the dialogue management system.
9. An electronic device, comprising: memory, a processor and an NL2SQL modeling program stored on the memory and executable on the processor, the NL2SQL modeling program, when executed by the processor, implementing the steps of the NL2SQL modeling method according to any one of claims 1 to 7.
10. A computer storage medium, wherein the computer storage medium has stored thereon an NL2SQL modeling program, which NL2SQL modeling program, when executed by a processor, implements the steps of the NL2SQL modeling method according to any one of claims 1 to 7.
CN202210903164.XA 2022-07-29 2022-07-29 NL2SQL modeling method and device, electronic equipment and storage medium Pending CN115455156A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210903164.XA CN115455156A (en) 2022-07-29 2022-07-29 NL2SQL modeling method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210903164.XA CN115455156A (en) 2022-07-29 2022-07-29 NL2SQL modeling method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115455156A true CN115455156A (en) 2022-12-09

Family

ID=84296462

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210903164.XA Pending CN115455156A (en) 2022-07-29 2022-07-29 NL2SQL modeling method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115455156A (en)

Similar Documents

Publication Publication Date Title
CN111985239B (en) Entity identification method, entity identification device, electronic equipment and storage medium
CN109670035A (en) A kind of text snippet generation method
CN113065358B (en) Text-to-semantic matching method based on multi-granularity alignment for bank consultation service
CN108549658A (en) A kind of deep learning video answering method and system based on the upper attention mechanism of syntactic analysis tree
CN113268610B (en) Intent jump method, device, equipment and storage medium based on knowledge graph
CN109460459A (en) A kind of conversational system automatic optimization method based on log study
CN113672708A (en) Language model training method, question and answer pair generation method, device and equipment
CN117313728A (en) Entity recognition method, model training method, device, equipment and storage medium
CN111783464A (en) Electric power-oriented domain entity identification method, system and storage medium
CN113326367B (en) Task type dialogue method and system based on end-to-end text generation
CN115064154A (en) Method and device for generating mixed language voice recognition model
CN116186259A (en) Session cue scoring method, device, equipment and storage medium
CN115860002A (en) Combat task generation method and system based on event extraction
CN115840884A (en) Sample selection method, device, equipment and medium
CN115455156A (en) NL2SQL modeling method and device, electronic equipment and storage medium
CN115062123A (en) Knowledge base question-answer pair generation method of conversation generation system
CN111091011B (en) Domain prediction method, domain prediction device and electronic equipment
CN116910377B (en) Grid event classified search recommendation method and system
CN117453895B (en) Intelligent customer service response method, device, equipment and readable storage medium
CN114996407B (en) Remote supervision relation extraction method and system based on packet reconstruction
CN116089589B (en) Question generation method and device
CN113744737B (en) Training of speech recognition model, man-machine interaction method, equipment and storage medium
CN109241539B (en) Updating method of machine learning artificial intelligence translation database
CN114417880A (en) Interactive intelligent question-answering method based on power grid practical training question-answering knowledge base
CN118194868A (en) Low-resource entity identification method based on ontology and semantic consistency example retrieval

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination