CN117271558A - Language query model construction method, query language acquisition method and related devices - Google Patents


Info

Publication number
CN117271558A
Authority
CN
China
Prior art keywords
language
query
data
model
templates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311249586.0A
Other languages
Chinese (zh)
Inventor
洪烨嵘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd filed Critical Bank of China Ltd
Priority to CN202311249586.0A priority Critical patent/CN117271558A/en
Publication of CN117271558A publication Critical patent/CN117271558A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application disclose a language query model construction method, a query language acquisition method and related devices, applied in the field of artificial intelligence or finance. First, a historical structured query language is obtained and preprocessed to obtain first preprocessed data. The first preprocessed data is then input into a convolutional neural network for model training to obtain a language query model. By training on the processed historical structured query language with a convolutional neural network, the method generates a language query model capable of automatically generating Structured Query Language (SQL) statements. With this model, developers can obtain SQL statements without inspecting the underlying data, improving the query and acceptance efficiency of business personnel. At the same time, the complexity and errors of manually written SQL statements are avoided, improving development efficiency and code quality.

Description

Language query model construction method, query language acquisition method and related devices
Technical Field
The application relates to the field of artificial intelligence and finance, in particular to a language query model construction method, a query language acquisition method and a related device.
Background
Databases have become an indispensable part of modern software development, and for database management the SQL language (Structured Query Language) is among the most common tools. However, writing SQL statements requires a certain degree of expertise and experience. On the one hand, manual development is prone to hard-to-spot syntax and logic errors. On the other hand, some business personnel need a developer's help to view confidential business data, and when developers are not permitted to access that data, handling and communicating SQL statements becomes difficult; for novice business personnel, SQL development also presents a considerable barrier to entry.
Therefore, how to provide a language query model that helps developers obtain SQL statements without inspecting data details, thereby improving the query and acceptance efficiency of business personnel, is a technical problem that needs to be solved by those skilled in the art.
Disclosure of Invention
In view of the above problems, the present application provides a language query model construction method, a query language acquisition method and related devices, so that developers can obtain SQL statements without inspecting data details, improving the query and acceptance efficiency of business personnel.
The embodiment of the application discloses the following technical scheme:
a method of language query model construction, the method comprising:
acquiring a history structured query language;
preprocessing the history structured query language to obtain first preprocessed data;
and inputting the first preprocessing data into a convolutional neural network for model training to obtain a language query model.
In some possible implementations, preprocessing the historical structured query language to obtain the first preprocessed data includes:
removing data values and column name information from the historical structured query language to obtain first data templates;
clustering the first data templates to obtain a plurality of intermediate templates, and removing target templates from the intermediate templates to obtain a plurality of second data templates;
selecting a preset number of the second data templates as third data templates according to template occurrence frequency;
when a third data template is consistent with its Chinese description, using the Chinese description as a language description template;
acquiring second data values and second column name information of the same types as the third data templates and the language description templates, and filling them into the third data templates and the language description templates to obtain a template set;
and dividing the template set into a training set and a test set, and using the training set as the first preprocessed data.
In some possible implementations, the method further includes:
acquiring a target answer of a real sentence corresponding to the test set, and acquiring a target character corresponding to the target answer, wherein the real sentence is a sentence input by a user;
inputting the test set into the language query model to obtain an output structured query language;
and when the output answer of the output structured query language is inconsistent with the target answer and/or the output characters of the output structured query language are inconsistent with the target characters, correcting the language query model according to the target answer and/or the target characters.
A query language acquisition method, the method comprising:
acquiring a requirement to be processed;
preprocessing the to-be-processed requirement to obtain second preprocessed data;
when no query record of the to-be-processed requirement exists in the database, inputting the second preprocessing data into a language query model to obtain a target structured query language, and outputting the target structured query language to a client, wherein the language query model is constructed according to the language query model construction method;
and when a query record for the to-be-processed requirement exists in the database, outputting the query record corresponding to the target structured query language to the client.
In some possible implementations, the preprocessing the requirement to be processed to obtain second preprocessed data includes:
removing special characters and punctuation marks in the to-be-processed requirements to obtain a clean text;
word segmentation processing is carried out on the clean text to obtain a word segmentation text;
and carrying out semantic analysis on the word segmentation text to obtain the second preprocessing data.
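As an illustrative sketch (not part of the claimed implementation), the three preprocessing steps above can be approximated as follows; a real Chinese word segmenter such as jieba is assumed in practice, with whitespace splitting standing in here so the sketch stays dependency-free:

```python
import re

def preprocess_requirement(text: str) -> list[str]:
    """Strip special characters and punctuation to obtain clean text,
    then segment the clean text into words (naive stand-in tokenizer)."""
    clean = re.sub(r"[^\w\s]", " ", text)       # remove punctuation/special chars
    tokens = [t for t in clean.split() if t]    # word-segmentation stand-in
    return tokens

print(preprocess_requirement("List users, whose age > 30!"))
```

Semantic analysis of the resulting segmented text (the final step) would then map tokens to table, column and condition candidates; that step depends on the database schema and is omitted here.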
A device for constructing a language query model, the device comprising:
the first acquisition unit is used for acquiring the historical structured query language;
the first preprocessing unit is used for preprocessing the history structured query language to obtain first preprocessing data;
the model training unit is used for inputting the first preprocessing data into a convolutional neural network for model training to obtain a language query model.
A query language acquisition apparatus, the apparatus comprising:
the third acquisition unit is used for acquiring the requirement to be processed;
the second preprocessing unit is used for preprocessing the to-be-processed requirement to obtain second preprocessed data;
The input-output unit is used for inputting the second preprocessing data into a language query model to obtain a target structured query language and outputting the target structured query language to a client when the query record of the to-be-processed requirement does not exist in the database, wherein the language query model is constructed according to the language query model construction method;
and the output unit is used for outputting the query record corresponding to the target structured query language to the client when the query record of the to-be-processed requirement exists in the database.
A query language acquisition model building apparatus comprising: the language query model building method comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the language query model building method when executing the computer program.
A query language acquisition device, comprising: the query language acquisition system comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the query language acquisition method is realized when the processor executes the computer program.
A computer readable storage medium having instructions stored therein, which when executed on a terminal device, cause the terminal device to perform a language query model building method as described above, or to perform a query language acquisition method as described above.
Compared with the prior art, the application has the following beneficial effects:
the application provides a language query model construction method, a query language acquisition method and a related device. Specifically, when the language query model construction method provided by the embodiment of the present application is executed, a history structured query language may be first obtained. Next, the historical structured query language is preprocessed to obtain first preprocessed data. And then inputting the first preprocessing data into a convolutional neural network for model training to obtain a language query model. The method and the device are used for model training based on the convolutional neural network by using the processed historical structured query language to generate a language query model capable of automatically generating SQL sentences. The language query model can enable the developer to obtain SQL sentences without checking data details, so that the query and acceptance efficiency of service personnel is improved. Meanwhile, the complexity and errors of manually writing SQL sentences are avoided, and the development efficiency and the code quality are improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; other drawings may be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a method flowchart of a language query model construction method provided in an embodiment of the present application;
FIG. 2 is a flowchart of a method for obtaining a query language according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a device for constructing a language query model according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a query language obtaining device according to an embodiment of the present application.
Detailed Description
In order to make the present application solution better understood by those skilled in the art, the following description will clearly and completely describe the technical solution in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
In order to facilitate understanding of the technical solutions provided in the embodiments of the present application, the following description will first explain the background technology related to the embodiments of the present application.
Databases have become an indispensable part of modern software development, and for database management the SQL language (Structured Query Language) is among the most common tools. However, writing SQL statements requires a certain degree of expertise and experience. On the one hand, manual development is prone to hard-to-spot syntax and logic errors. On the other hand, some business personnel need a developer's help to view confidential business data, and when developers are not permitted to access that data, handling and communicating SQL statements becomes difficult; for novice business personnel, SQL development also presents a considerable barrier to entry.
Here, SQL is a standardized language for managing and manipulating relational databases. It allows users to query, insert, update and delete data in a database through simple syntax.
To solve this problem, the embodiments of the present application provide a language query model construction method, a query language acquisition method and related devices. A historical structured query language is first acquired, then preprocessed to obtain first preprocessed data, and finally the first preprocessed data is input into a convolutional neural network for model training to obtain a language query model. By training on the processed historical structured query language with a convolutional neural network, the method generates a language query model capable of automatically generating SQL statements. With this model, developers can obtain SQL statements without inspecting the underlying data, improving the query and acceptance efficiency of business personnel. At the same time, the complexity and errors of manually written SQL statements are avoided, improving development efficiency and code quality.
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Referring to fig. 1, which is a method flowchart of a language query model construction method provided in an embodiment of the present application, as shown in fig. 1, the language query model construction method may include steps S101 to S103:
s101: a historical structured query language is obtained.
In order to construct a language query model, a system for constructing a language query model first needs to obtain a historical structured query language.
In some possible implementations, source 1 of the historical structured query language is an external SQL (Structured Query Language) statement library, which may be CSpider, containing natural-language sentences and the corresponding SQL information; source 2 is the internal SQL statements of the bank's development system and their corresponding Chinese descriptions. CSpider is an open-source, Python-based web crawler framework used to capture data on the Internet; it provides powerful functions and flexible configuration options and can be used for various crawler tasks such as data acquisition, search engine indexing and web page analysis.
S102: and preprocessing the history structured query language to obtain first preprocessed data.
After the historical structured query language is obtained, the language query model construction system may preprocess the historical structured query language to obtain first preprocessed data.
In some possible implementations, the preprocessing the historic structured query language to obtain first preprocessed data includes A1-A6:
a1: and removing the data value and the column name information in the history structured query language to obtain a first data template.
To preprocess the historical structured query language into first preprocessed data, the language query model construction system first removes the data values and column name information from the historical structured query language to obtain first data templates. Converting the historical structured query language into this generic form (i.e., with data values and column name information removed) facilitates the subsequent clustering and model training.
In some possible implementations, data values (Data Values) are the specific data stored in database tables. They may be text, numbers, dates, Boolean values or other data types, and in SQL statements they are used in operations such as insert, update and filter.
For example, in an INSERT INTO statement, the VALUES clause specifies the specific data values to be inserted: INSERT INTO table_name (column1, column2, column3) VALUES (value1, value2, value3).
Column name information (Column Names) refers to the names of the columns in a database table. Each table consists of a set of columns, each with a unique name. Column names identify the specific columns targeted by query, update or delete operations; in a SELECT statement they specify the particular columns to be retrieved, e.g. SELECT column1, column2 FROM table_name.
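Step A1 can be sketched as follows. This is an illustrative regex-based approximation, not the patent's actual implementation; the placeholder tokens `<VAL>` and `<COL>` and the known-column-set argument are assumptions for the sketch:

```python
import re

def to_template(sql: str, column_names: set[str]) -> str:
    """Replace literal data values and known column names with placeholders,
    turning a concrete SQL statement into a generic template (step A1)."""
    # Mask quoted string literals first, then bare numeric literals.
    sql = re.sub(r"'[^']*'", "<VAL>", sql)
    sql = re.sub(r"\b\d+(\.\d+)?\b", "<VAL>", sql)
    # Mask known column names (longest first, word-boundary, case-insensitive).
    for col in sorted(column_names, key=len, reverse=True):
        sql = re.sub(rf"\b{re.escape(col)}\b", "<COL>", sql, flags=re.IGNORECASE)
    return sql

print(to_template("SELECT name, age FROM users WHERE age > 30 AND city = 'Beijing'",
                  {"name", "age", "city"}))
# SELECT <COL>, <COL> FROM users WHERE <COL> > <VAL> AND <COL> = <VAL>
```

A production system would use a real SQL parser rather than regexes, since string literals may contain escaped quotes and column names may collide with keywords.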
A2: clustering the first data templates to obtain a plurality of intermediate templates, and removing target templates in the intermediate templates to obtain a plurality of second data templates.
After removing the data values and column name information to obtain the first data templates, the language query model construction system clusters the first data templates into a plurality of intermediate templates and removes the target templates from among them to obtain a plurality of second data templates. Clustering groups similar templates into one class, which helps remove the relatively simple templates and ensures that the selected templates have a certain complexity and diversity.
In some possible implementations, target templates can be understood as templates for relatively basic and common query requirements; they may cover common operations such as simple filter conditions, ordering and query fields. Such templates tend to be straightforward and easy to understand and do not involve sophisticated operations such as complex table joins or nested queries. The complex templates retained as second data templates may contain more demanding query requirements, such as multi-table joins, subqueries, aggregation functions, window functions and join conditions, which typically require broader database knowledge and advanced query skills to understand and use. Thus, when selecting complex templates, the complexity of the template, the type of query operations involved, the characteristics of the databases concerned, and the popularity and breadth of the corresponding query requirements may all be considered. Choosing complex templates carefully allows the model to be trained to solve more challenging query problems.
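The patent does not specify the clustering algorithm; as one minimal, dependency-free sketch of step A2, templates can be greedily grouped by token-set (Jaccard) similarity, with the similarity threshold being an assumed parameter:

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two token sets."""
    return len(a & b) / len(a | b)

def cluster_templates(templates, threshold=0.7):
    """Greedy single-pass clustering: a template joins the first cluster whose
    representative shares enough tokens with it; otherwise it starts a new
    cluster. Illustrative stand-in for step A2."""
    clusters = []  # list of (representative_token_set, members)
    for t in templates:
        toks = set(t.split())
        for rep, members in clusters:
            if jaccard(toks, rep) >= threshold:
                members.append(t)
                break
        else:
            clusters.append((toks, [t]))
    return [members for _, members in clusters]
```

For example, `SELECT <COL> FROM t WHERE <COL> = <VAL>` and `SELECT <COL> FROM t WHERE <COL> > <VAL>` land in one cluster, while a GROUP BY/HAVING template forms its own cluster; the simpler clusters can then be dropped as target templates.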
A3: and selecting a preset number of second data templates from the second data templates according to the template occurrence frequency to serve as third data templates.
After clustering the first data templates into intermediate templates and removing the target templates to obtain the second data templates, the language query model construction system selects, according to template occurrence frequency, a preset number of the lower-frequency second data templates as third data templates.
High-frequency templates usually represent common query requirements, whereas low-frequency templates may contain more complex and specialised query requirements; hence the lower-frequency templates are chosen.
In some possible implementations, the preset number may be, but is not limited to, 50, and the preset number is not specifically limited in the present application.
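The frequency-based selection of step A3 can be sketched with a simple counter; the deterministic tie-breaking rule is an assumption of this sketch:

```python
from collections import Counter

def select_rare_templates(template_occurrences, preset_number=50):
    """Pick the preset number of templates with the lowest occurrence
    frequency (step A3): rarer templates tend to carry the more complex,
    specialised query patterns."""
    freq = Counter(template_occurrences)
    # Sort ascending by count; ties broken alphabetically for determinism.
    ranked = sorted(freq.items(), key=lambda kv: (kv[1], kv[0]))
    return [tpl for tpl, _ in ranked[:preset_number]]

print(select_rare_templates(["a", "a", "a", "b", "b", "c"], preset_number=2))
# ['c', 'b']
```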
A4: and when the third data template is consistent with the Chinese description of the third data template, the Chinese description is used as a language description template.
When the third data template is consistent with the Chinese description, the language query model construction system can take the Chinese description as a language description template.
In some possible implementations, whether the third data template is consistent with the Chinese description thereof may be checked by means of manual verification.
A5: and acquiring a second data value and second column name information which are the same as the third data template and the language description template in type, and filling the second data value and the second column name information into the third data template and the language description template to obtain a template set.
After obtaining the language description templates, the language query model building system also needs to obtain second data values and second column name information which are the same as the third data templates and the language description templates in type, and fill the second data values and the second column name information into the third data templates and the language description templates to obtain a template set.
A6: dividing the template set into a training set and a testing set, and taking the training set as the first preprocessing data.
After the template set is obtained, the language query model construction system divides the template set into a training set and a testing set, and takes the training set as the first preprocessing data.
Removing the first data values and first column name information filters out noise and irrelevant information from the original query language and extracts clean data templates. Clustering the first data templates groups similar templates into one class, yielding the intermediate templates and reducing template redundancy and complexity. In summary, this preprocessing effectively removes noise and irrelevant information, extracts useful data templates, generates highly readable language description templates, and provides high-quality training data for subsequent tasks.
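Steps A5 and A6 (template filling and the train/test split) can be sketched as follows; the placeholder tokens, the 80/20 split ratio and the fixed seed are assumptions of this sketch:

```python
import random

def fill_template(sql_tpl, zh_tpl, columns, values):
    """Fill matching column names and data values into an SQL template and
    its language-description template so the pair stays aligned (step A5)."""
    for col in columns:
        sql_tpl = sql_tpl.replace("<COL>", col, 1)
        zh_tpl = zh_tpl.replace("<COL>", col, 1)
    for val in values:
        sql_tpl = sql_tpl.replace("<VAL>", val, 1)
        zh_tpl = zh_tpl.replace("<VAL>", val, 1)
    return sql_tpl, zh_tpl

def split_template_set(pairs, train_ratio=0.8, seed=42):
    """Shuffle and split the filled template set into a training set and a
    test set (step A6); the training set becomes the first preprocessed data."""
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

Filling both templates from the same column/value lists keeps each SQL statement paired with a description that mentions the same columns and values, which is what the model learns from.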
S103: and inputting the first preprocessing data into a convolutional neural network for model training to obtain a language query model.
After preprocessing the historical structured query language to obtain first preprocessed data, the language query model building system can input the first preprocessed data into the convolutional neural network to perform model training to obtain a language query model.
A convolutional neural network (CNN) is a deep learning model widely used for computer vision and image recognition tasks. Modelled on how humans process visual information, it extracts features from the input through stacked convolution and pooling operations and feeds them into fully connected layers for classification and recognition. The core idea of a CNN is to capture the local spatial structure of the input data using convolution operations. Through a series of convolution layers and activation functions, a CNN automatically learns low-level features of the input (e.g. edges and textures) and then gradually combines them into higher-level features (e.g. shapes and objects).
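Applied to text, the same idea slides a filter over a sequence of token-embedding vectors to detect local n-gram patterns. The toy sketch below illustrates one filter plus ReLU and global max pooling; it is not the patent's network, and the vector sizes are arbitrary:

```python
def conv1d_features(embeddings, kernel, relu=True):
    """Minimal 1-D convolution over a sequence of token-embedding vectors.
    `embeddings` is a list of equal-length vectors (one per token); `kernel`
    is a list of k weight vectors (window size k). Returns the feature map."""
    k = len(kernel)
    feats = []
    for i in range(len(embeddings) - k + 1):
        # Dot product of the k-token window with the kernel.
        s = sum(w * x
                for row, wrow in zip(embeddings[i:i + k], kernel)
                for x, w in zip(row, wrow))
        feats.append(max(0.0, s) if relu else s)
    return feats

def max_pool(feats):
    """Global max pooling: keep the strongest activation of the feature map."""
    return max(feats)

feats = conv1d_features([[1, 0], [0, 1], [1, 1]], [[1, 1], [1, 1]])
print(feats, max_pool(feats))
```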
In some possible implementations, the language query model includes the following modules:
Nested query module (union/except): generates the set union and set difference operations in nested query statements, predicting and generating the corresponding grammar.
Keyword module (where/select/insert/update): predicts and generates the keywords in SQL statements, including WHERE, SELECT, INSERT, UPDATE and the like.
Column name module: generates the column names in SQL statements, predicting the appropriate column names from context.
Operator module (>/=/like): generates comparison and logical operators, including greater than, less than, equals, LIKE and the like.
Aggregation function module (max/min/sum): generates aggregation functions such as MAX, MIN and SUM.
Subquery or terminator prediction module: predicts whether to generate a subquery or a terminator, producing the corresponding statement structure as required.
Conditional expression relationship prediction module (and/or): predicts and generates the relationships between conditional expressions, namely AND and OR.
ORDER BY module (desc/asc/limit): generates ORDER BY clauses, including descending order, ascending order, limits on the number of results, and so on.
GROUP BY module (having): generates GROUP BY clauses and HAVING clauses for grouping and filtering query results.
The calling order of the modules is determined during training according to the model's requirements and grammar rules, so as to generate SQL query statements that satisfy both syntactic and semantic requirements. This modular structure enables the model to flexibly generate a wide variety of complex SQL query statements.
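As an illustrative sketch of how per-clause module outputs can be assembled in SQL's grammatical order (in the patent the call order is learned, not hard-coded; the clause names and the example fragments below are assumptions):

```python
def assemble_sql(parts: dict) -> str:
    """Concatenate predicted clause fragments in SQL's grammatical clause
    order, yielding a syntactically valid statement from module outputs."""
    clause_order = ["select", "from", "where", "group_by", "having",
                    "order_by", "limit"]
    keywords = {"select": "SELECT", "from": "FROM", "where": "WHERE",
                "group_by": "GROUP BY", "having": "HAVING",
                "order_by": "ORDER BY", "limit": "LIMIT"}
    pieces = [f"{keywords[c]} {parts[c]}" for c in clause_order if c in parts]
    return " ".join(pieces)

print(assemble_sql({"select": "dept, MAX(salary)", "from": "employees",
                    "group_by": "dept", "order_by": "MAX(salary) DESC",
                    "limit": "5"}))
# SELECT dept, MAX(salary) FROM employees GROUP BY dept ORDER BY MAX(salary) DESC LIMIT 5
```

Fixing the clause order in the assembler is what guarantees grammaticality regardless of the order in which the individual modules fire.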
In some possible implementations, the method further includes B1-B3:
B1: and acquiring a target answer of a real sentence corresponding to the test set, and acquiring a target character corresponding to the target answer, wherein the real sentence is a sentence input by a user.
To improve the accuracy, consistency and reliability of the model, the language query model construction system first acquires the target answers of the user-input real sentences corresponding to the test set, together with the target characters corresponding to those target answers.
In some possible implementations, the actual sentence may be a question or statement sentence, or the like.
In some possible implementations, the target answer is a historical structured query language statement: a correct answer labelled manually, while the target characters are the key information extracted from the target answer.
B2: and inputting the test set into the language query model to obtain an output structured query language.
After the target answers and target characters are obtained, the language query model building system also needs to input the test set into the language query model to obtain an output structured query language.
B3: and when the output answer of the output structured query language is inconsistent with the target answer and/or the output characters of the output structured query language are inconsistent with the target characters, correcting the language query model according to the target answer and/or the target characters.
When the output answer of the output structured query language is inconsistent with the target answer and/or the output characters of the output structured query language are inconsistent with the target characters, the language query model construction system may correct the language query model according to the target answer and/or the target characters.
Wherein, the output answer refers to the answer text carried in the structured query language generated by the language query model, and the output character refers to the specific character contained in the answer text.
In some possible implementations, the language query model may be corrected according to the target answer and/or the target characters by defining new one-to-one alignment mapping features, by using a reranking technique to preferentially select the correct answer from the multiple obtained candidate results, by a data enhancement method, or the like.
Correcting the language query model according to the target answer and/or the target characters includes the following aspects:
Feature engineering: a series of processing steps are performed on the real sentence according to its target answer and target characters, and useful new features representing the data are extracted. By defining new one-to-one alignment mapping features, hidden rules and information in the data can be mined more effectively.
Hyperparameter tuning: hyperparameters are parameters of a machine learning model that must be set manually, such as the weight decay coefficient and the learning rate. Reasonable adjustment optimizes the generalization performance of the model on the training set and the test set.
Reranking: the correct answer is preferentially selected from the multiple obtained candidate results. In a question-answering system, the results can be ordered by the relevance and credibility of the answers, thereby improving the accuracy of the system.
Data enhancement: the real sentence is transformed, expanded, and otherwise augmented according to its target answer and target characters to generate more training data. This effectively alleviates the problem of insufficient data and improves the generalization capability of the model.
By using the real data to carry out model correction, the accuracy, consistency and reliability of the model can be improved, so that the performance of the language query model is improved.
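As an illustration of the reranking aspect above, the following minimal sketch scores each candidate SQL string by how many of the manually labeled target characters it covers and picks the best-scoring one. The function and data names are hypothetical; the patent does not specify a concrete reranking implementation.

```python
def rerank_candidates(candidates, target_chars):
    """Order candidate SQL strings best-first by how many target
    characters (key information from the labeled answer) each
    candidate covers -- a simple illustration of reranking."""
    def score(sql):
        return sum(1 for ch in target_chars if ch in sql)
    return sorted(candidates, key=score, reverse=True)

candidates = [
    "SELECT name FROM users",
    "SELECT name FROM users WHERE age > 30",
]
target_chars = ["name", "users", "age", "30"]
best = rerank_candidates(candidates, target_chars)[0]
```

A production system would combine such coverage scores with model confidence and answer credibility, as described above.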
Based on the content of S101-S103, a historical structured query language is obtained. Next, the historical structured query language is preprocessed to obtain first preprocessed data. Finally, the first preprocessed data is input into a convolutional neural network for model training to obtain a language query model. The present application performs model training based on a convolutional neural network using the processed historical structured query language to generate a language query model capable of automatically generating SQL statements. The language query model allows developers to obtain SQL statements without examining data details, improving the query and acceptance efficiency of business personnel. At the same time, the complexity and errors of manually writing SQL statements are avoided, improving development efficiency and code quality.
Based on the embodiment of the training method of the query language acquisition model, the embodiment of the application also provides a query language acquisition method. Referring to fig. 2, fig. 2 is a flowchart of a method for obtaining a query language according to an embodiment of the present application. As shown in fig. 2, the method includes S201 to S204:
S201: acquiring the requirement to be processed.
When the language query model is utilized to perform language query, the query language acquisition system needs to acquire the to-be-processed requirement proposed by the user.
S202: and preprocessing the to-be-processed requirement to obtain second preprocessed data.
After the requirement to be processed, which is proposed by the user, is acquired, the query language acquisition system also needs to preprocess the requirement to be processed to obtain second preprocessing data.
In some possible implementations, preprocessing the requirement to be processed to obtain the second preprocessed data includes C1-C3:
C1: removing special characters and punctuation marks from the requirement to be processed to obtain a clean text.
In order to achieve the purpose of preprocessing the to-be-processed requirement to obtain second preprocessed data, the query language acquisition system needs to remove special characters and punctuation marks in the to-be-processed requirement to ensure the cleanliness and consistency of texts.
Where special characters and punctuation marks refer to symbols having a special meaning or function in text. They typically do not contain actual semantic information, but rather are used to represent text structures, punctuation, quotation marks, brackets, connectors, etc.
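A minimal sketch of step C1, assuming a regular-expression-based cleaner (the exact character set to strip is an illustrative assumption, not specified by the patent): word characters, whitespace, and Chinese characters are kept, everything else is removed.

```python
import re

def clean_text(requirement):
    """Remove special characters and punctuation (both ASCII and
    full-width Chinese punctuation) from the requirement to be
    processed, yielding clean text."""
    return re.sub(r"[^\w\s\u4e00-\u9fff]", "", requirement)

cleaned = clean_text("查询余额>100的账户！(urgent)")
```

In practice the kept character set would be tuned so that symbols that carry query semantics (such as comparison operators) are preserved rather than stripped.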
C2: and performing word segmentation processing on the clean text to obtain a word segmentation text.
After the special characters and punctuations in the requirements to be processed are removed to obtain a clean text, the query language acquisition system also needs to perform word segmentation processing on the clean text to obtain a word segmentation text.
In some possible implementations, the word segmentation of clean text may generally be performed by the following methods: Using a word segmentation tool library: Chinese word segmentation tool libraries such as jieba and pkuseg can automatically segment text into words.
Calling a word segmentation API: if a word segmentation API is available, clean text can be called through the API and returned word segmentation results can be obtained.
Custom rules: according to specific requirements and text structures, the clean text can be subjected to word segmentation processing by a custom rule. For example, text may be cut by spaces, punctuation, and the like.
The mixing method comprises the following steps: and combining a plurality of methods to perform word segmentation, such as performing preliminary word segmentation by using a word segmentation tool library, and then further adjusting and correcting the result according to a specific rule.
Word segmentation of Chinese text is more complex than of English because Chinese has no explicit word boundaries. In the word segmentation process, semantic and contextual information needs to be considered to obtain more accurate segmentation results. Meanwhile, domain-specialized word segmentation may be required for a specific domain or task.
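The custom-rule approach above can be sketched as forward maximum matching against a dictionary: at each position, take the longest dictionary word that matches. The vocabulary here is hypothetical; real systems would use a tool library such as jieba or pkuseg.

```python
def segment(text, vocab, max_len=4):
    """Forward maximum matching: repeatedly take the longest
    vocabulary word starting at the current position, falling back
    to a single character when nothing matches."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in vocab:
                words.append(piece)
                i += length
                break
    return words

vocab = {"查询", "账户", "余额"}
tokens = segment("查询账户余额", vocab)
```

This simple greedy rule cannot resolve segmentation ambiguity, which is why the document notes that semantic and contextual information is needed for accurate results.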
And C3: and carrying out semantic analysis on the word segmentation text to obtain the second preprocessing data.
After the clean text is subjected to word segmentation processing to obtain a word segmentation text, the query language acquisition system can perform semantic analysis on the word segmentation text to obtain the second preprocessing data.
In some possible implementations, the semantic parsing of the segmented text may use the following method:
Named Entity Recognition (NER): using a named entity recognition algorithm, specific entities such as person names, place names and organization names can be recognized in the segmented text and labeled. This helps to better understand the semantic information of the text.
Keyword extraction: the most representative and important words may be extracted from the segmented text as keywords using a keyword extraction algorithm. These keywords may provide the subject matter and core concepts of the text.
Syntax parsing: using syntax parsing techniques, the grammatical relations between words in the segmented text, such as subject-predicate relations and modifier relations, can be analyzed. This gives a deeper understanding of the grammatical structure of the text and facilitates acquiring higher-level semantic information.
Word sense disambiguation: in a word segmentation text, some words may have multiple meanings, and through a word sense disambiguation algorithm, the exact meaning of each word in context may be determined, thereby more accurately understanding the semantics of the text.
Constructing a semantic map: by carrying out semantic analysis on the word segmentation text, a semantic graph can be constructed, and the semantic relation among the words can be coded and represented. This helps to better understand the semantics of the text and to make subsequent semantic analysis and reasoning.
Preprocessing the requirements to be processed, including clean text generation, word segmentation and semantic analysis, can improve the accuracy, efficiency and reliability of text processing.
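As a stand-in for the keyword-extraction step listed above, the following sketch ranks tokens by frequency after removing stopwords. The stopword list is a hypothetical placeholder; production systems would use TF-IDF or TextRank rather than raw counts.

```python
from collections import Counter

def extract_keywords(tokens, stopwords, top_k=2):
    """Return the top_k most frequent non-stopword tokens as a
    minimal illustration of keyword extraction on segmented text."""
    counts = Counter(t for t in tokens if t not in stopwords)
    return [word for word, _ in counts.most_common(top_k)]

tokens = ["查询", "账户", "余额", "的", "账户"]
stopwords = {"的"}
keys = extract_keywords(tokens, stopwords)
```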
S203: when no query record of the to-be-processed requirement exists in the database, the second preprocessing data is input into a language query model to obtain a target structured query language, and the target structured query language is output to a client, wherein the language query model is constructed according to the language query model construction method.
When no query record of the to-be-processed requirement exists in the database, the query language acquisition system can input the second preprocessing data into the language query model, so that a target structured query language is obtained, and the target structured query language is output to the client.
S204: when a query record of the requirement to be processed exists in the database, outputting the target structured query language corresponding to the query record to the client.
When the query records of the to-be-processed requirements exist in the database, the query language acquisition system can directly output the target structured query language corresponding to the query records to the client.
Based on the above content of S201-S204, when no query record of the requirement to be processed exists in the database, the trained language query model can perform a language query on the preprocessed requirement to obtain the target structured query language. When a query record of the requirement to be processed exists in the database, the target structured query language corresponding to the query record can be output directly to the client. Performing a language query on the preprocessed requirement with the language query model to obtain the target structured query language improves the query and acceptance efficiency of business personnel. At the same time, the complexity and errors of manually writing SQL statements are avoided, improving development efficiency and code quality.
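The S203/S204 branch above amounts to a cache lookup before invoking the model. A minimal sketch with hypothetical names (the patent does not prescribe how query records are stored):

```python
def get_query_language(requirement, query_cache, model):
    """Return the cached SQL when a query record for the requirement
    already exists (S204); otherwise run the language query model and
    record its output (S203)."""
    if requirement in query_cache:      # query record exists
        return query_cache[requirement]
    sql = model(requirement)            # invoke the language query model
    query_cache[requirement] = sql
    return sql

cache = {"查询账户余额": "SELECT balance FROM account"}
fake_model = lambda req: "SELECT * FROM t"   # stand-in for the trained model
hit = get_query_language("查询账户余额", cache, fake_model)
miss = get_query_language("统计交易笔数", cache, fake_model)
```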
Referring to fig. 3, fig. 3 is a schematic structural diagram of a device for constructing a language query model according to an embodiment of the present application. As shown in fig. 3, the construction device of the language query model includes:
a first obtaining unit 301, configured to obtain a history structured query language.
In some possible implementations, source 1 of the historical structured query language is an external SQL (Structured Query Language) statement library, which may be CSpider, containing natural-language sentences and the corresponding SQL information; source 2 is the internal SQL statements of the bank's development system and the corresponding Chinese descriptions. CSpider is an open-source, Python-based web crawler framework used to capture data on the Internet. It provides powerful functions and flexible configuration options and can be used for various web crawler tasks such as data acquisition, search engine indexing, and web page analysis.
A first preprocessing unit 302, configured to preprocess the historic structured query language to obtain first preprocessed data.
The model training unit 303 is configured to input the first preprocessed data into a convolutional neural network for model training to obtain a language query model.
CNN (Convolutional Neural Network) is a deep learning model widely used for computer vision and image recognition tasks. It is based on the human cognitive mode of visual information processing: features in the input are extracted through multi-layer convolution and pooling operations, and these features are fed into a fully connected layer for classification and recognition. The core idea of CNN is to capture the local spatial structure of the input data using convolution operations. Through a series of convolution layers and activation functions, a CNN can automatically learn low-level features of the input image (e.g., edges and textures) and then gradually combine these into higher-level features (e.g., shapes and objects).
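The convolution-and-pooling idea can be illustrated in a few lines of pure Python on a toy 1-D sequence (real training would use a deep-learning framework; the numbers here are arbitrary):

```python
def conv1d(seq, kernel):
    """Valid 1-D convolution with stride 1: slide the kernel over
    the sequence and compute a dot product at each position,
    extracting local features."""
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq) - k + 1)]

def max_pool(features):
    """Global max pooling: keep only the strongest local response."""
    return max(features)

seq = [1.0, 2.0, 3.0, 0.0, 1.0]   # toy embedding sequence
kernel = [0.5, 1.0, 0.5]          # one learned filter (toy weights)
feature_map = conv1d(seq, kernel)
pooled = max_pool(feature_map)
```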
In some possible implementations, the language query model includes the following modules:
Nested query module (predict union/except): this module is used to generate union and difference-set operations in nested query statements, and can predict and generate the corresponding grammar.
Keyword module (predict where/select/insert/update): this module is used to predict and generate keywords in SQL statements, including WHERE, SELECT, INSERT, UPDATE, etc.
A column name module: the module is used for generating column names in SQL sentences, and can predict and generate corresponding column names according to the context.
Operator module (>/</=/like): this module is used to generate comparison and logical operators, including greater than, less than, equal to, LIKE, and the like.
Aggregation function module (max/min/sum): this module is used to generate aggregation functions such as MAX, MIN, SUM, etc.
A predictor query or terminator module: the module is used for predicting and generating sub-queries or terminators, and can generate corresponding statement structures according to requirements.
Predict conditional expression relationship module (and/or): this module is used to predict and generate the relationships between conditional expressions, including AND and OR.
ORDER BY module (desc/asc/limit): this module is used to generate ORDER BY clauses, including descending order, ascending order, limiting the number of results, and so on.
GROUP BY module (having): this module is used to generate GROUP BY clauses and HAVING clauses, which group and filter the query results.
Through training, the calling sequence of each module can be determined according to the model's requirements and grammar rules, so as to generate SQL query statements that meet both syntactic and semantic requirements. Such a modular structure enables the model to flexibly generate various complex SQL query statements.
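The modular structure above can be sketched as clause-generating functions assembled in grammar order. The module functions below are hypothetical stand-ins for the trained neural modules, which would predict their outputs rather than receive them as arguments:

```python
def select_module(columns):
    return "SELECT " + ", ".join(columns)

def from_module(table):
    return "FROM " + table

def where_module(column, op, value):
    return f"WHERE {column} {op} {value}"

def order_by_module(column, direction="ASC", limit=None):
    clause = f"ORDER BY {column} {direction}"
    return clause + (f" LIMIT {limit}" if limit is not None else "")

def build_query(parts):
    """Assemble clause fragments in grammar order, skipping modules
    the model chose not to invoke (empty fragments)."""
    return " ".join(p for p in parts if p)

sql = build_query([
    select_module(["name", "balance"]),
    from_module("account"),
    where_module("balance", ">", 100),
    order_by_module("balance", "DESC", limit=10),
])
```

Fixing the assembly order by grammar rules is what guarantees that whatever subset of modules fires, the concatenated result is syntactically valid SQL.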
In some possible implementations, the apparatus further includes:
And the first removing unit is used for removing the data value and the column name information in the historical structured query language to obtain a first data template.
In some possible implementations, data values (Data Values) are the specific data stored in database tables. They may be text, numbers, dates, Boolean values, or other types of data. In SQL statements, data values are used in operations such as insert, update, and filter.
For example, in an INSERT INTO statement, the VALUES clause specifies the specific data values to be inserted; these values may be text, numbers, dates, or other types of data: INSERT INTO table_name (column1, column2, column3) VALUES (value1, value2, value3).
Column name information (Column Names): column name information refers to the names of columns in the database table. Each table consists of a set of columns, each column having a unique name. Column names are used to identify a particular column for query, update, or delete operations. In the SELECT statement, a column name is used to specify a particular column to be retrieved. For example, SELECT column1, column2 FROM table_name.
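A minimal sketch of the first removing unit, assuming a regex-based approach (the patent does not specify one): literals become a VALUE placeholder and any name found in a supplied column list becomes a COL placeholder, so queries differing only in their data are merged into one data template.

```python
import re

def to_template(sql, columns):
    """Strip data values and column name information from an SQL
    statement to obtain a data template (simplified illustration:
    string literals, numeric literals, then known column names)."""
    tpl = re.sub(r"'[^']*'", "VALUE", sql)           # string literals
    tpl = re.sub(r"\b\d+(\.\d+)?\b", "VALUE", tpl)   # numeric literals
    for col in columns:
        tpl = re.sub(rf"\b{re.escape(col)}\b", "COL", tpl)
    return tpl

tpl = to_template("SELECT name FROM users WHERE age > 30",
                  ["name", "age"])
```

A robust implementation would use an SQL parser rather than regexes, since column names can collide with keywords or table names.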
And the clustering unit is used for clustering the first data templates to obtain a plurality of intermediate templates, and removing target templates in the intermediate templates to obtain a plurality of second data templates.
In some possible implementations, the target templates may be understood as templates for more basic and common query requirements, covering common query operations such as simple filter conditions, sorting, and query fields. These templates tend to be very straightforward and easy to understand, and do not involve complex operations such as multi-table associations or nested queries. The complex templates that remain (i.e., the second data templates) may contain more complex query requirements, such as multi-table associations, sub-queries, aggregation functions, window functions, and join conditions. These templates typically require broader database knowledge and more advanced query skills to understand and use. Thus, when selecting complex templates, the complexity of the templates, the types of query operations involved, the nature of the databases involved, and the popularity and breadth of the corresponding query requirements may be considered. By carefully choosing complex templates, the model can be better trained to solve more challenging query problems.
And the selecting unit is used for selecting a preset number of second data templates from the second data templates according to the template occurrence frequency to serve as third data templates.
Templates that occur more frequently often represent common query requirements, while lower-frequency templates may contain more complex and special query requirements; therefore, templates with a lower occurrence frequency are chosen.
In some possible implementations, the preset number may be, but is not limited to, 50, and the preset number is not specifically limited in the present application.
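The frequency-based selection above can be sketched as counting template occurrences and keeping the preset number of rarest ones (the helper name and data are illustrative):

```python
from collections import Counter

def select_rare_templates(templates, preset_number):
    """Keep the preset number of second data templates with the
    lowest occurrence frequency, since low-frequency templates tend
    to carry the more complex, special query requirements."""
    counts = Counter(templates)
    ranked = sorted(counts, key=lambda t: counts[t])  # rarest first
    return ranked[:preset_number]

templates = ["T1", "T1", "T1", "T2", "T2", "T3"]
rare = select_rare_templates(templates, 2)
```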
And the setting unit is used for taking the Chinese description as a language description template when the third data template is consistent with the Chinese description of the third data template.
In some possible implementations, whether the third data template is consistent with the Chinese description thereof may be checked by means of manual verification.
And the obtaining and filling unit is used for obtaining a second data value and second column name information which are the same as the third data template and the language description template in type, and filling the second data value and the second column name information into the third data template and the language description template to obtain a template set.
The dividing setting unit is used for dividing the template set into a training set and a testing set, and taking the training set as the first preprocessing data.
Noise and irrelevant information in the original query language can be filtered out by removing the first data value and the first column name information, and a pure data template is extracted. Meanwhile, the first data templates are clustered, so that similar data templates can be classified into one type, and a plurality of intermediate templates are obtained. This helps reduce redundancy and complexity of the data templates. In summary, the preprocessing method can effectively remove noise and irrelevant information, extract useful data templates, generate language description templates with strong readability, and provide high-quality training data for subsequent tasks.
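The dividing setting unit can be sketched as a shuffled split of the filled template set; the 80/20 ratio and fixed seed are illustrative assumptions, not values given by the patent:

```python
import random

def split_template_set(template_set, train_ratio=0.8, seed=42):
    """Shuffle the template set and divide it into a training set
    (used as the first preprocessed data) and a test set."""
    items = list(template_set)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]

template_set = [f"template_{i}" for i in range(10)]
train_set, test_set = split_template_set(template_set)
```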
In some possible implementations, the apparatus further includes:
the second acquisition unit is used for acquiring target answers of real sentences corresponding to the test set and acquiring target characters corresponding to the target answers, wherein the real sentences are sentences input by a user.
In some possible implementations, the actual sentence may be a question or statement sentence, or the like.
In some possible implementations, the target answer is a historical structured query language. The target answer is a correct answer labeled manually, and the target characters are key information extracted from the target answer.
And the input unit is used for inputting the test set into the language query model to obtain an output structured query language.
And the correction unit is used for correcting the language query model according to the target answer and/or the target character when the output answer of the output structured query language is inconsistent with the target answer and/or the output character of the output structured query language is inconsistent with the target character.
Wherein, the output answer refers to the answer text carried in the structured query language generated by the language query model, and the output character refers to the specific character contained in the answer text.
In some possible implementations, the language query model may be corrected according to the target answer and/or the target characters by defining new one-to-one alignment mapping features, by using a reranking technique to preferentially select the correct answer from the multiple obtained candidate results, by a data enhancement method, or the like.
Correcting the language query model according to the target answer and/or the target characters includes the following aspects:
Feature engineering: a series of processing steps are performed on the real sentence according to its target answer and target characters, and useful new features representing the data are extracted. By defining new one-to-one alignment mapping features, hidden rules and information in the data can be mined more effectively.
Hyperparameter tuning: hyperparameters are parameters of a machine learning model that must be set manually, such as the weight decay coefficient and the learning rate. Reasonable adjustment optimizes the generalization performance of the model on the training set and the test set.
Reranking: the correct answer is preferentially selected from the multiple obtained candidate results. In a question-answering system, the results can be ordered by the relevance and credibility of the answers, thereby improving the accuracy of the system.
Data enhancement: the real sentence is transformed, expanded, and otherwise augmented according to its target answer and target characters to generate more training data. This effectively alleviates the problem of insufficient data and improves the generalization capability of the model.
By using the real data to carry out model correction, the accuracy, consistency and reliability of the model can be improved, so that the performance of the language query model is improved.
The embodiment of the present application provides a device for constructing a language query model, after a first obtaining unit 301 obtains a historical structured query language, a first preprocessing unit 302 may perform preprocessing on the historical structured query language to obtain first preprocessed data, so that a model training unit 303 may input the first preprocessed data into a convolutional neural network to perform model training to obtain the language query model. The method and the device are used for model training based on the convolutional neural network by using the processed historical structured query language to generate a language query model capable of automatically generating SQL sentences. The language query model can enable the developer to obtain SQL sentences without checking data details, so that the query and acceptance efficiency of service personnel is improved. Meanwhile, the complexity and errors of manually writing SQL sentences are avoided, and the development efficiency and the code quality are improved.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a query language obtaining device according to an embodiment of the present application. As shown in fig. 4, the query language acquisition device includes:
the third obtaining unit 401 is configured to obtain a requirement to be processed.
The second preprocessing unit 402 is configured to preprocess the to-be-processed requirement to obtain second preprocessed data.
And the input/output unit 403 is configured to input the second preprocessed data into a language query model to obtain a target structured query language when the query record of the requirement to be processed does not exist in the database, and output the target structured query language to the client, where the language query model is constructed according to the language query model construction method as described above.
And the output unit 404 is configured to output, when the query record of the to-be-processed requirement exists in the database, the query record corresponding to the target structured query language to the client.
In some possible implementations, the apparatus further includes:
and the second removing unit is used for removing the special characters and punctuation marks in the to-be-processed requirements to obtain clean texts.
Where special characters and punctuation marks refer to symbols having a special meaning or function in text. They typically do not contain actual semantic information, but rather are used to represent text structures, punctuation, quotation marks, brackets, connectors, etc.
And the word segmentation unit is used for carrying out word segmentation processing on the clean text to obtain a word segmentation text.
In some possible implementations, the word segmentation of clean text may generally be performed by the following methods: Using a word segmentation tool library: Chinese word segmentation tool libraries such as jieba and pkuseg can automatically segment text into words.
Calling a word segmentation API: if a word segmentation API is available, clean text can be called through the API and returned word segmentation results can be obtained.
Custom rules: according to specific requirements and text structures, the clean text can be subjected to word segmentation processing by a custom rule. For example, text may be cut by spaces, punctuation, and the like.
The mixing method comprises the following steps: and combining a plurality of methods to perform word segmentation, such as performing preliminary word segmentation by using a word segmentation tool library, and then further adjusting and correcting the result according to a specific rule.
Word segmentation of Chinese text is more complex than of English because Chinese has no explicit word boundaries. In the word segmentation process, semantic and contextual information needs to be considered to obtain more accurate segmentation results. Meanwhile, domain-specialized word segmentation may be required for a specific domain or task.
And the semantic analysis unit is used for carrying out semantic analysis on the word segmentation text to obtain the second preprocessing data.
In some possible implementations, the semantic parsing of the segmented text may use the following method:
Named Entity Recognition (NER): using a named entity recognition algorithm, specific entities such as person names, place names and organization names can be recognized in the segmented text and labeled. This helps to better understand the semantic information of the text.
Keyword extraction: the most representative and important words may be extracted from the segmented text as keywords using a keyword extraction algorithm. These keywords may provide the subject matter and core concepts of the text.
Syntax parsing: using syntax parsing techniques, the grammatical relations between words in the segmented text, such as subject-predicate relations and modifier relations, can be analyzed. This gives a deeper understanding of the grammatical structure of the text and facilitates acquiring higher-level semantic information.
Word sense disambiguation: in a word segmentation text, some words may have multiple meanings, and through a word sense disambiguation algorithm, the exact meaning of each word in context may be determined, thereby more accurately understanding the semantics of the text.
Constructing a semantic map: by carrying out semantic analysis on the word segmentation text, a semantic graph can be constructed, and the semantic relation among the words can be coded and represented. This helps to better understand the semantics of the text and to make subsequent semantic analysis and reasoning.
Preprocessing the requirements to be processed, including clean text generation, word segmentation and semantic analysis, can improve the accuracy, efficiency and reliability of text processing.
In addition, an embodiment of the present application further provides a language query model construction device, including: a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor implements the language query model construction method described above when executing the computer program.
In addition, an embodiment of the present application further provides a query language acquisition device, including: a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor implements the query language acquisition method described above when executing the computer program.
In addition, the embodiment of the application also provides a computer readable storage medium, wherein the computer readable storage medium stores instructions, and when the instructions run on the terminal device, the instructions cause the terminal device to execute the method for constructing the language query model or the method for acquiring the query language.
In the embodiment of the present application, after the third obtaining unit 401 acquires the requirement to be processed, the second preprocessing unit 402 may preprocess the requirement to obtain second preprocessed data. When no query record of the requirement to be processed exists in the database, the input/output unit 403 inputs the second preprocessed data into the language query model to obtain a target structured query language and outputs it to the client. When a query record of the requirement to be processed exists in the database, the output unit 404 outputs the target structured query language corresponding to the query record to the client. Performing a language query on the preprocessed requirement with the language query model to obtain the target structured query language improves the query and acceptance efficiency of business personnel. At the same time, the complexity and errors of manually writing SQL statements are avoided, improving development efficiency and code quality.
The language query model construction method, the query language acquisition method, and the related devices provided by the present application have been described in detail above. In this description, each embodiment is described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and identical or similar parts among the embodiments may be cross-referenced. As for the devices disclosed in the embodiments, since they correspond to the methods disclosed in the embodiments, the description is relatively brief, and reference may be made to the description of the method sections for relevant details. It should be noted that various improvements and modifications that would be obvious to those skilled in the art can be made to the present application without departing from its principles, and such improvements and modifications fall within the scope of the claims of the present application.
It should be noted that the language query model construction method, the query language acquisition method, and the related devices provided by the invention can be used in the artificial intelligence field or the financial field. The above is only an example and does not limit the application fields of the language query model construction method, the query language acquisition method, and the related devices provided by the invention.
It should be understood that in this application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" or similar expressions means any combination of the listed items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or plural.
It is further noted that relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in Random Access Memory (RAM), memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for constructing a language query model, the method comprising:
acquiring a historical structured query language;
preprocessing the historical structured query language to obtain first preprocessed data;
and inputting the first preprocessed data into a convolutional neural network for model training to obtain a language query model.
2. The method of claim 1, wherein preprocessing the historical structured query language to obtain first preprocessed data comprises:
removing data values and column name information from the historical structured query language to obtain first data templates;
clustering the first data templates to obtain a plurality of intermediate templates, and removing target templates from the intermediate templates to obtain a plurality of second data templates;
selecting, according to template occurrence frequency, a preset number of the second data templates as third data templates;
when a third data template is consistent with its Chinese description, taking the Chinese description as a language description template;
acquiring second data values and second column name information of the same types as the third data templates and the language description templates, and filling the second data values and the second column name information into the third data templates and the language description templates to obtain a template set;
and dividing the template set into a training set and a test set, and taking the training set as the first preprocessed data.
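The template-extraction idea recited above can be sketched as follows: data values are masked out of historical SQL so that statements sharing a skeleton collapse into one template, and the most frequent templates are kept. This is a minimal, hypothetical sketch: the clustering step, column-name removal, and the Chinese-description pairing are omitted, and every name (`to_template`, `top_templates`) is illustrative rather than taken from the disclosure.

```python
import re
from collections import Counter

def to_template(sql: str) -> str:
    """Mask literal data values in a SQL statement to obtain its template."""
    sql = re.sub(r"'[^']*'", "?", sql)           # mask string literals
    sql = re.sub(r"\b\d+(\.\d+)?\b", "?", sql)   # mask numeric literals
    return re.sub(r"\s+", " ", sql).strip().upper()

def top_templates(history: list[str], k: int) -> list[str]:
    """Keep the k most frequently occurring templates from historical SQL."""
    counts = Counter(to_template(s) for s in history)
    return [t for t, _ in counts.most_common(k)]
```

Statements such as `select * from t where id = 1` and `SELECT * FROM t WHERE id = 2` both reduce to `SELECT * FROM T WHERE ID = ?`, so their frequencies accumulate on one template, which is what makes frequency-based selection of templates meaningful.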
3. The method according to claim 2, wherein the method further comprises:
acquiring a target answer of a real sentence corresponding to the test set, and acquiring a target character corresponding to the target answer, wherein the real sentence is a sentence input by a user;
inputting the test set into the language query model to obtain an output structured query language;
and when the output answer of the output structured query language is inconsistent with the target answer and/or the output characters of the output structured query language are inconsistent with the target characters, correcting the language query model according to the target answer and/or the target characters.
4. A query language acquisition method, the method comprising:
acquiring a requirement to be processed;
preprocessing the to-be-processed requirement to obtain second preprocessed data;
when no query record of the to-be-processed requirement exists in the database, inputting the second preprocessed data into a language query model to obtain a target structured query language, and outputting the target structured query language to a client, wherein the language query model is constructed according to the language query model construction method of any one of claims 1-3;
and when a query record of the to-be-processed requirement exists in the database, outputting the target structured query language corresponding to the query record to the client.
5. The method of claim 4, wherein preprocessing the to-be-processed requirement to obtain second preprocessed data comprises:
removing special characters and punctuation marks from the to-be-processed requirement to obtain a clean text;
performing word segmentation on the clean text to obtain a segmented text;
and performing semantic analysis on the segmented text to obtain the second preprocessed data.
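The first two preprocessing steps recited above (strip special characters and punctuation, then segment into words) can be sketched as follows. This is an illustrative sketch: real Chinese word segmentation would use a dedicated tokenizer (for example the jieba library), whereas this version simply splits on whitespace, and the semantic-analysis step is not shown; all function names here are hypothetical.

```python
import re

def clean_text(requirement: str) -> str:
    # Keep word characters, CJK characters, and spaces; drop punctuation and special symbols.
    return re.sub(r"[^\w\u4e00-\u9fff ]+", " ", requirement)

def segment(clean: str) -> list[str]:
    # Whitespace split as a stand-in for proper word segmentation.
    return clean.split()

def preprocess(requirement: str) -> list[str]:
    return segment(clean_text(requirement))
```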
6. A device for constructing a language query model, the device comprising:
a first acquisition unit, used for acquiring a historical structured query language;
a first preprocessing unit, used for preprocessing the historical structured query language to obtain first preprocessed data;
and a model training unit, used for inputting the first preprocessed data into a convolutional neural network for model training to obtain a language query model.
7. A query language acquisition device, the device comprising:
a third acquisition unit, used for acquiring a requirement to be processed;
a second preprocessing unit, used for preprocessing the to-be-processed requirement to obtain second preprocessed data;
an input-output unit, used for, when no query record of the to-be-processed requirement exists in the database, inputting the second preprocessed data into a language query model to obtain a target structured query language and outputting the target structured query language to a client, wherein the language query model is constructed according to the language query model construction method of any one of claims 1-3;
and an output unit, used for, when a query record of the to-be-processed requirement exists in the database, outputting the target structured query language corresponding to the query record to the client.
8. A language query model construction device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the language query model construction method of any one of claims 1-3 when executing the computer program.
9. A query language acquisition device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the query language acquisition method of claim 4 or 5 when executing the computer program.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein instructions, which when run on a terminal device, cause the terminal device to perform the language query model construction method of any one of claims 1-3, or to perform the query language acquisition method of claim 4 or 5.
CN202311249586.0A 2023-09-26 2023-09-26 Language query model construction method, query language acquisition method and related devices Pending CN117271558A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311249586.0A CN117271558A (en) 2023-09-26 2023-09-26 Language query model construction method, query language acquisition method and related devices

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311249586.0A CN117271558A (en) 2023-09-26 2023-09-26 Language query model construction method, query language acquisition method and related devices

Publications (1)

Publication Number Publication Date
CN117271558A true CN117271558A (en) 2023-12-22

Family

ID=89211820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311249586.0A Pending CN117271558A (en) 2023-09-26 2023-09-26 Language query model construction method, query language acquisition method and related devices

Country Status (1)

Country Link
CN (1) CN117271558A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117591547A (en) * 2024-01-18 2024-02-23 中昊芯英(杭州)科技有限公司 Database query method and device, terminal equipment and storage medium
CN118093621A (en) * 2024-02-20 2024-05-28 上海信投数字科技有限公司 Structured query language generation method and device, electronic equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination