CN115577085A - Processing method and equipment for table question-answering task - Google Patents


Info

Publication number: CN115577085A
Authority: CN (China)
Prior art keywords: query; information; query information; SQL; SQL statement
Legal status: Pending (the legal status is an assumption, not a legal conclusion; Google has not performed a legal analysis)
Application number: CN202211269215.4A
Other languages: Chinese (zh)
Inventors: 杨旭强 (Yang Xuqiang), 罗雪峰 (Luo Xuefeng), 蒋宗亨 (Jiang Zongheng)
Current assignee: Alibaba China Co Ltd (the listed assignees may be inaccurate)
Original assignee: Alibaba China Co Ltd
Application filed by Alibaba China Co Ltd
Priority to CN202211269215.4A
Publication of CN115577085A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/332: Query formulation
    • G06F16/3329: Natural language query formulation or dialogue systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/242: Query formulation
    • G06F16/2433: Query languages
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a processing method and equipment for a table question-answering task. According to the method, target tables matching the query information input by a user are screened out from the plurality of data tables of the table question-answering task. The screened target tables are far fewer in number and have far fewer total columns, and the data tables relevant to the current round of query are accurately located. Converting the query information into a first SQL statement based on these target tables therefore improves the efficiency of converting query information into SQL statements and of table question-answering as a whole, shortens the time needed to answer, and makes responses to the user more timely.

Description

Processing method and equipment for table question-answering task
Technical Field
The present application relates to artificial intelligence technologies, and in particular, to a method and equipment for processing a table question-answering task.
Background
Table question-answering is a question-answering engine that answers questions from the contents of tables using natural language technology. It is the core dialogue engine of table question-answering products such as intelligent customer-service dialogue robots, and mainly involves NL2SQL (Natural Language to SQL) conversion and SQL (Structured Query Language) execution. A dialogue robot based on table question-answering uses an NL2SQL model to convert the question input by the user into an SQL statement, and obtains the response information by executing that SQL statement.
Current NL2SQL models typically work on only one data table, and the number of columns of that table is capped (typically at a dozen or so). In most business scenarios, however, multiple data tables work together to serve users. The usual workaround is to merge the data tables into a single table with many columns so that it fits the NL2SQL model. Because the NL2SQL model has many parameters, converting a user question into an SQL statement takes longer in complex business scenarios with many table columns. As a result, table question-answering is inefficient, answers take a long time, and responses to the user are not timely.
Disclosure of Invention
The application provides a processing method and equipment for a table question-answering task, to solve the problems of low table question-answering efficiency, long answering time, and untimely responses to the user in complex business scenarios with many table columns.
In one aspect, the present application provides a method for processing a table question-answering task, including:
acquiring query information input by a user;
according to the query information, screening out a target table matched with the query information from a plurality of data tables of a table question-answering task;
converting the query information into a first SQL statement based on the target table;
and executing the first SQL statement to query the target table to obtain query result data corresponding to the query information.
In another aspect, the present application provides a processing apparatus for a table question-answering task, including:
the information acquisition module is used for acquiring query information input by a user;
the table selection module is used for screening a target table matched with the query information from a plurality of data tables of the table question-answering task according to the query information;
the conversion module is used for converting the query information into a first SQL statement based on the target table;
and the query module is used for executing the first SQL statement to query the target table to obtain query result data corresponding to the query information.
In another aspect, the present application provides an electronic device comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer execution instructions;
the processor executes computer-executable instructions stored by the memory to implement the method of the first aspect.
In another aspect, the present application provides a computer-readable storage medium having stored thereon computer-executable instructions for implementing the method of the first aspect when executed by a processor.
According to the processing method and equipment for the table question-answering task provided by the present application, target tables matching the query information input by the user are screened out from the plurality of data tables of the table question-answering task. The screened target tables are far fewer in number and have far fewer total columns, and the data tables relevant to the current round of query are accurately located. The query information is then converted into the first SQL statement based on the target tables, which improves the efficiency of converting query information into SQL statements and of table question-answering, shortens the time needed to answer, and makes responses to the user more timely.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic diagram of an example network architecture on which the present application is based;
FIG. 2 is a flowchart of a table question-answering method provided by an exemplary embodiment of the present application;
FIG. 3 is a flowchart of a table question-answering method provided by an exemplary embodiment of the present application;
FIG. 4 is a framework diagram of generating, based on semantic rules, an SQL statement corresponding to query information according to an exemplary embodiment of the present application;
FIG. 5 is a flowchart of a table question-answering method including table selection intervention provided by an exemplary embodiment of the present application;
FIG. 6 is a flowchart of generating reply information according to an exemplary embodiment of the present application;
FIG. 7 is a block diagram of generating reply information with an added reply intervention mechanism provided by an exemplary embodiment of the present application;
FIG. 8 is an overall architecture diagram of table question-answering provided by an exemplary embodiment of the present application;
FIG. 9 is a schematic structural diagram of a processing device for a table question-answering task according to an exemplary embodiment of the present application;
FIG. 10 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application.
The above figures show specific embodiments of the present application, which are described in more detail below. These drawings and the written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terms referred to in this application are explained first:
Table question-answering: answering natural language (NL) questions from the contents of tables using natural language technology.
NL2SQL (Natural Language to SQL): the conversion of natural language into Structured Query Language (SQL) statements; in essence, it converts the user's natural language into a normalized semantic representation that a computer can understand.
Relational database: a database that organizes data using the relational model, storing data in rows and columns so that it is easy for users to understand.
Elasticsearch (ES): a Lucene-based search server that provides a distributed, multi-tenant full-text search engine. Elasticsearch is developed in Java, was originally released as open source under the Apache License, and is a popular enterprise search engine.
Trie: also known as a dictionary tree, word-lookup tree, or prefix tree; a tree structure that is a variant of a hash tree. Each path from the root spells out a stored key character by character, so keys sharing a prefix share nodes.
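The excerpt defines the trie but does not show how it is used. As a minimal sketch, assuming the trie serves to spot known column names or column values inside a query string, the following Python builds a character-level trie and reports every stored entry found in a query. The class names, labels, and sample entries are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One node of a character-level trie (prefix tree)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # Set only on the node that ends a stored entry, e.g. "column_name".
        self.label: Optional[str] = None


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, entry: str, label: str) -> None:
        """Store an entry; entries sharing a prefix share nodes."""
        node = self.root
        for ch in entry:
            node = node.children.setdefault(ch, TrieNode())
        node.label = label

    def find_all(self, text: str) -> List[Tuple[int, str, str]]:
        """Return (start offset, matched text, label) for every stored entry in text."""
        hits: List[Tuple[int, str, str]] = []
        for start in range(len(text)):
            node = self.root
            for end in range(start, len(text)):
                node = node.children.get(text[end])
                if node is None:  # no stored entry continues with this character
                    break
                if node.label is not None:
                    hits.append((start, text[start:end + 1], node.label))
        return hits


trie = Trie()
trie.insert("price", "column_name")
trie.insert("red", "column_value")
trie.insert("redwood", "column_value")
print(trie.find_all("price of redwood desk"))
# [(0, 'price', 'column_name'), (9, 'red', 'column_value'), (9, 'redwood', 'column_value')]
```

Overlapping hits such as "red" and "redwood" are both reported; preferring the longest match, if desired, is left to the caller.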
Table question-answering is a question-answering engine that gives answers from table contents using natural language technology. It is the core dialogue engine of table question-answering products such as intelligent customer-service dialogue robots, and mainly involves NL2SQL (Natural Language to SQL) conversion and SQL (Structured Query Language) execution. A dialogue robot based on table question-answering relies on an NL2SQL model to convert the question input by the user into an SQL statement, and obtains the response information by executing the SQL statement.
In most business scenarios, several tables work together to serve a user, yet a single user question usually involves only one table. For example, in a furniture mall scenario, separate tables may store data about desks, TV cabinets, curtains, lamps, and so on; a user question such as "desk size information" involves only the desk data and none of the other tables. Current NL2SQL models, however, usually work on only one data table, so the data tables are typically merged into a single table with many columns to fit the model. Because the NL2SQL model has many parameters and has performance problems on tables with many columns, converting user questions into SQL statements in complex business scenarios with many table columns takes longer, which lowers table question-answering efficiency, lengthens response time, and delays responses to users.
In addition, in some complex business scenarios there may be dozens of related data tables and hundreds of columns, so the total number of columns far exceeds the column limit supported by existing NL2SQL models, and a single dialogue robot cannot provide table question-answering in such scenarios. Multiple dialogue robots then have to be built, plus a scheduler that dispatches them to obtain query result data from different tables and identifies the correct query result data from their feedback. This approach is prone to misidentification and costly, and because no single dialogue robot records the complete end-to-end table question-answering process, later maintenance is difficult.
In the table question-answering method provided by the present application, entity information contained in the query information input by a user is identified, and at least one second SQL statement corresponding to the query information is generated from the entity information. The data tables involved in the second SQL statement are selected from the plurality of data tables of the table question-answering task as the target tables matching the query information, that is, the target tables involved in the current round of query. The query information is then converted into a first SQL statement based only on these target tables. Because only a small number of target tables relevant to the current round of query are screened out of a large number of data tables, interference from unrelated data tables is eliminated, and converting the query information into an SQL statement based only on these few target tables improves conversion efficiency. This improves the efficiency of table question-answering, shortens response time, and makes responses to the user more timely.
Fig. 1 is a schematic diagram of an exemplary network architecture on which the present application is based. As shown in Fig. 1, the network architecture includes a terminal and a server.
The server may be a server cluster deployed in the cloud, an electronic device with local computing capability, or an Internet of Things (IoT) device. The server stores the plurality of data tables of the table question-answering task and an NL2SQL model, and can acquire the query information input by a user. Using operation logic preset in the server, the server acquires the query information input by the user, identifies the entity information contained in it, and generates at least one second SQL statement corresponding to the query information from the entity information; selects the data tables involved in the second SQL statement from the plurality of data tables of the table question-answering task as the target tables of the current round of query, and converts the query information into a first SQL statement based on the target tables; and executes the first SQL statement to query the target tables, obtaining query result data corresponding to the query information.
The terminal may specifically be a hardware device having a network communication function, an operation function, and an information display function, and includes, but is not limited to, a smart phone, a tablet computer, a desktop computer, an internet of things device, and the like.
Through communication interaction with the server, the terminal can upload the query information input by the user to the server so that the server can obtain query result data corresponding to the query information.
After the server acquires the query result data corresponding to the query information, the query result data can be fed back to the terminal, or the query result data is post-processed to generate reply information, and the reply information is fed back to the terminal.
The table question-answering method provided by the application may be implemented as a product with table question-answering functionality, such as a dialogue robot or a question-answering system, and may be applied to table question-answering in a variety of application scenarios and fields, such as smart education, e-commerce, finance, healthcare, and transportation.
The technical solution of the present application, and how it solves the above technical problems, are described in detail below through specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application are described below with reference to the accompanying drawings.
Fig. 2 is a flowchart of a table question-answering method according to an exemplary embodiment of the present application. The method of this embodiment is executed by the above-mentioned server, which may specifically be a server or server cluster deployed in the cloud or locally, an Internet of Things device, and the like. As shown in Fig. 2, the method includes the following specific steps:
Step S201, acquiring query information input by a user.
The query information is a natural language question input by the user under the table question-answering task. It may be text entered by the user on an interactive interface, or text converted from information in another modality, such as voice information uploaded by the user.
Step S202, screening out, according to the query information, a target table matching the query information from a plurality of data tables of the table question-answering task.
In this embodiment, before the NL2SQL model is used to convert the query information into an SQL statement, a small number of data tables closely related to the query information are first screened out from the plurality of data tables of the table question-answering task according to the content of the query information, and are used as the target tables matching the query information. The screening result usually contains only a few target tables, a small fraction of the data tables of the current table question-answering task.
Step S203, converting the query information into a first SQL statement based on the target table.
Next, based on the target tables determined by screening, the NL2SQL model converts the query information into an SQL-like representation over the target tables, and the SQL-like representation is rewritten into an executable first SQL statement. The screened target tables contain far fewer tables and far fewer total columns than all the data tables of the current table question-answering task. Therefore, compared with the prior art, which converts the query information into SQL statements based on all the data tables of the task, converting the query information into the first SQL statement based on the target tables greatly improves conversion efficiency.
Step S204, executing the first SQL statement to query the target table to obtain query result data corresponding to the query information.
After the query result data corresponding to the query information is obtained, it can be fed back to the user. For example, the query result data may be output in table form as the reply information fed back to the user.
Alternatively, after the query result data is obtained, reply text (a reply script) can be generated from it and fed back to the user.
In this embodiment, the target tables matching the query information are screened out from the plurality of data tables of the table question-answering task according to the query information, and the query information is converted into the first SQL statement based on the target tables. This greatly reduces the total number of columns the NL2SQL model has to handle, which improves the efficiency of converting query information into SQL statements and of table question-answering as a whole, shortens answering time, and makes responses to the user more timely.
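Steps S201 to S204 can be sketched end to end as follows. This is a minimal illustration, not the application's implementation: the schema and the table-screening rule (matching table names in the query) are hypothetical, and `nl2sql_stub` stands in for the NL2SQL model, which this excerpt does not describe. Python's built-in `sqlite3` plays the role of the database that executes the first SQL statement.

```python
import sqlite3
from typing import Dict, List

# Hypothetical task schema: data table name -> column names.
SCHEMA: Dict[str, List[str]] = {
    "desk": ["name", "width_cm", "price"],
    "lamp": ["name", "wattage", "price"],
    "curtain": ["name", "fabric", "price"],
}


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding the hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    for table, cols in SCHEMA.items():
        conn.execute(f"CREATE TABLE {table} ({', '.join(cols)})")
    conn.execute("INSERT INTO desk VALUES ('oak desk', 120, 899)")
    conn.execute("INSERT INTO lamp VALUES ('floor lamp', 40, 199)")
    return conn


def screen_target_tables(query: str) -> List[str]:
    """S202 (simplified): keep only tables whose name occurs in the query."""
    q = query.lower()
    return [table for table in SCHEMA if table in q]


def nl2sql_stub(query: str, tables: List[str]) -> str:
    """S203 placeholder for the NL2SQL model: it only ever sees the
    columns of the target tables, never the whole task schema."""
    table = tables[0]
    q = query.lower()
    cols = [c for c in SCHEMA[table] if c != "name" and c.split("_")[0] in q]
    return f"SELECT {', '.join(['name'] + cols)} FROM {table}"


def answer(query: str, conn: sqlite3.Connection) -> list:
    tables = screen_target_tables(query)   # S202
    if not tables:
        return []                          # no matching table: nothing to query
    sql = nl2sql_stub(query, tables)       # S203: first SQL statement
    return conn.execute(sql).fetchall()    # S204: execute on the target table


conn = build_demo_db()
print(answer("What is the desk width?", conn))  # [('oak desk', 120)]
```

The point of screening first is visible in the stub: it receives one three-column table instead of the nine columns of the whole task.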
In an optional embodiment, in step S202, the column names and column values (that is, table values) contained in the query information may be identified, and the data tables in which those column names and column values are located may be determined as the target tables matching the query information, that is, the target tables involved in the current round of query.
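A minimal sketch of this optional table selection, assuming the column names and column values of each data table are available in memory and that plain case-insensitive substring matching is good enough to spot them in the query (a real system would more likely use the search-engine recall described later). The catalogue contents and function name are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical catalogue: table -> {column name -> known column values}.
CATALOGUE: Dict[str, Dict[str, List[str]]] = {
    "desk": {"size": ["120x60", "140x70"], "material": ["oak", "walnut"]},
    "curtain": {"color": ["beige", "grey"], "length": ["250"]},
    "lamp": {"wattage": ["40", "60"], "style": ["industrial"]},
}


def tables_for_query(query: str, catalogue: Dict[str, Dict[str, List[str]]]) -> Set[str]:
    """Return the tables in which a column name or column value of the query is located."""
    q = query.lower()
    hits: Set[str] = set()
    for table, columns in catalogue.items():
        for column, values in columns.items():
            if column.lower() in q or any(v.lower() in q for v in values):
                hits.add(table)
                break  # one match is enough to select this table
    return hits


print(tables_for_query("size of the walnut one", CATALOGUE))  # {'desk'}
```

Substring matching is deliberately naive: a value such as "60" would also match inside "160", which is one reason the application moves on to a more precise, semantic-rule-based selection.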
Fig. 3 is a flowchart of a table question-answering method according to an exemplary embodiment of the present application. The data tables containing the column names and column values in the query information may still be numerous. To locate the target tables involved in the current round of query more accurately, in an optional embodiment, step S202 may instead identify the entity information contained in the query information, generate at least one second SQL statement corresponding to the query information from the entity information, and select the data tables involved in the second SQL statement from the plurality of data tables of the table question-answering task as the target tables matching the query information. This table selection process is based on semantic rules.
As shown in fig. 3, the table question-answering method provided in this embodiment includes the following specific steps:
Step S301, acquiring query information input by a user.
Step S302, entity information contained in the query information is identified, and a second SQL statement corresponding to the query information is generated according to the entity information.
In this step, entity information such as column names, column values, aggregation functions, operators, and connectors included in the query information is identified.
Specifically, in this step, the entity information contained in the query information is identified by recognizing the possible slots in the query information (query) input by the user and the positions of those slots. The entity information is then spliced according to the positions of the corresponding slots to obtain the possible query result clause (the SELECT part of the SQL statement) and query condition clause (the WHERE part of the SQL statement). Finally, an SQL statement is generated from the query result clause and the query condition clause, giving the second SQL statement corresponding to the query information.
In this step, semantic parsing is performed on the query information input by the user to generate the corresponding second SQL statement, based on which the data tables involved in the query information can be screened out accurately.
Step S303, selecting a data table related to the second SQL statement from the plurality of data tables of the table question-answering task as a target table matched with the query information.
After the second SQL statement corresponding to the query information is generated, the data tables it involves (including those involved in the query condition clause and the query result clause) are used as the target tables matching the query information, achieving accurate screening and location of the data tables involved in the current round of query.
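A minimal sketch of step S303, assuming each second SQL statement is kept in a structured form that records the (table, column) pairs used by its query result clause and query condition clause. The `SecondSQL` type and the sample schema are hypothetical. Taking the union over all candidates follows from step S302 generating at least one second SQL statement; the sketch also reports how many columns remain for the NL2SQL model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class SecondSQL:
    # (table, column) pairs used in the query result clause (SELECT).
    select: List[Tuple[str, str]] = field(default_factory=list)
    # (table, column, operator, value) tuples from the query condition clause (WHERE).
    where: List[Tuple[str, str, str, str]] = field(default_factory=list)


def target_tables(candidates: List[SecondSQL]) -> Set[str]:
    """Union of the tables involved in every candidate second SQL statement."""
    tables: Set[str] = set()
    for sql in candidates:
        tables.update(table for table, _ in sql.select)
        tables.update(table for table, _, _, _ in sql.where)
    return tables


def narrowed_schema(schema: Dict[str, List[str]], tables: Set[str]) -> Dict[str, List[str]]:
    """Keep only the target tables, i.e. what the NL2SQL model will see."""
    return {t: cols for t, cols in schema.items() if t in tables}


SCHEMA = {
    "desk": ["name", "material", "width_cm", "price"],
    "tv_cabinet": ["name", "color", "price"],
    "curtain": ["name", "fabric", "length_cm", "price"],
    "lamp": ["name", "wattage", "price"],
}
candidates = [
    SecondSQL(select=[("desk", "price")], where=[("desk", "material", "=", "oak")]),
    SecondSQL(select=[("tv_cabinet", "price")]),
]
chosen = target_tables(candidates)
reduced = narrowed_schema(SCHEMA, chosen)
print(sorted(chosen))  # ['desk', 'tv_cabinet']
print(sum(len(c) for c in SCHEMA.values()), "->", sum(len(c) for c in reduced.values()))  # 14 -> 7
```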
Step S304, converting the query information into a first SQL statement based on the target table.
Based on the target tables determined by screening, the NL2SQL model converts the query information into the first SQL statement over the target tables. The screened target tables contain far fewer tables and far fewer total columns than all the data tables of the current table question-answering task, so converting the query information into the first SQL statement based on the target tables is much more efficient than the prior-art approach of converting it based on all the data tables of the task.
Step S305, executing the first SQL statement to query the target table, and obtaining query result data corresponding to the query information.
In this embodiment, the entity information contained in the query information input by the user is identified, and a second SQL statement corresponding to the query information is generated from it by semantic parsing. The data tables involved in that second SQL statement are taken, out of the many data tables of the table question-answering task, as the target tables matching the query information. This greatly reduces the number of tables and the total number of columns and accurately locates the data tables involved in the current round of query, so that converting the query information into the first SQL statement based on the target tables is much more efficient.
In an optional embodiment, in step S302, the entity information identified in the query information includes column names, column values, aggregation functions, operators, and connectors. Column names and column values are business configuration data that can change dynamically. Because the volume of column names and column values is large in practical applications, the candidate column names and candidate column values that the query information may contain can first be recalled coarsely through a search engine, and the column names and column values actually contained in the query information can then be determined precisely from these candidates. This can be implemented through the following steps S3021-S3022:
Step S3021, in a search engine storing the column names and column values of the plurality of data tables of the table question-answering task, searching for the candidate column names and candidate column values that the query information may contain, based on their matching degree with the query information.
The search engine may be ES (Elasticsearch) or another engine capable of fast search.
Taking ES as an example, ES stores the column names and column values of all the data tables of the current table question-answering task, and preset conditions for coarse recall are configured. In this step, candidate column names and candidate column values that satisfy the preset conditions are recalled from the search engine.
It should be noted that the preset conditions include the conditions that recalled candidate column names must satisfy and the conditions that recalled candidate column values must satisfy; these conditions differ from each other and can be configured separately.
In this embodiment, the matching degree between a column name or column value and the query information may be the number of matched words, or any other measure of semantic similarity between texts; this embodiment does not specifically limit it.
Optionally, when searching the search engine (which stores the column names and column values of the plurality of data tables of the table question-answering task) for candidate column names that the query information may contain, the search engine may, based on the matching degree between each column name and the query information, recall the column names whose matching degree is greater than or equal to a first matching degree threshold, so that more candidate column names possibly contained in the query information are recalled coarsely.
The first matching degree threshold is set to a low value so that many candidate column names are recalled coarsely; it can be set and adjusted according to the needs of the actual application scenario and is not specifically limited here.
For example, the matching degree between a column name and the query information may be the number of matched words, and the first matching degree threshold may be set to 1; that is, any column name sharing at least one word with the query information is recalled as a candidate column name.
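The threshold-based coarse recall can be sketched as below, using the number of shared words as the matching degree and a first matching degree threshold of 1, as in the example above. The regex tokenizer is an assumption suited to space-separated text; Chinese queries would need word segmentation or character-level matching instead. A search engine such as ES would do the scoring internally; the in-memory list here only illustrates the rule.

```python
import re
from typing import List, Set


def words(text: str) -> Set[str]:
    """Lower-cased word set; assumes space-separated text."""
    return set(re.findall(r"\w+", text.lower()))


def match_degree(candidate: str, query: str) -> int:
    """Matching degree = number of words shared with the query."""
    return len(words(candidate) & words(query))


def recall_by_threshold(column_names: List[str], query: str, threshold: int = 1) -> List[str]:
    """Coarse recall: keep every column name whose matching degree >= threshold."""
    return [c for c in column_names if match_degree(c, query) >= threshold]


names = ["desk width", "desk price", "lamp wattage", "curtain length"]
print(recall_by_threshold(names, "desk width in cm"))  # ['desk width', 'desk price']
```

A low threshold keeps "desk price" even though the query is about width; the later precise step is expected to discard such false candidates.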
Optionally, when searching the search engine for candidate column names that the query information may contain, the column names in the search engine may be sorted by their matching degree with the query information, and the top-ranked column names, at least as many as a column name recall quantity threshold, are determined as candidate column names according to the sorting result, so that more candidate column names possibly contained in the query information are recalled coarsely.
The column name recall number threshold may be several tens of column name recall number thresholds, which are a large value, so that a large number of candidate column names are recalled roughly, and the specific value of the column name recall number threshold may be set and adjusted according to the needs of the actual application scenario, which is not specifically limited herein.
For example, the degree of matching between the column name and the query information may be the number of matched words, the column names may be sorted in the order from top to bottom based on the number of matched words of the column name and the query information, and a candidate column name having a threshold number of column name recalls with a larger number of matched words of the query information may be selected according to the sorting result.
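The Python sketch below illustrates the two coarse-recall strategies for column names: a matching-degree threshold, or top-ranked recall up to the column name recall number threshold. It is not authoritative. The segmentation rule (one Chinese character or one lowercase English word per word), the function names, and the sample column names are assumptions of this sketch. A real system would leave the scoring to the search engine.

```python
import re


def segment(text: str) -> set[str]:
    """Assumed segmentation protocol: single CJK characters and lowercase ASCII words."""
    return set(re.findall(r"[\u4e00-\u9fff]|[a-z0-9]+", text.lower()))


def match_count(candidate: str, query: str) -> int:
    """Matching degree: number of distinct words shared by a candidate and the query."""
    return len(segment(candidate) & segment(query))


def recall_by_threshold(names: list[str], query: str, min_matches: int = 1) -> list[str]:
    """Keep every column name whose matching degree reaches the (low) first threshold."""
    return [n for n in names if match_count(n, query) >= min_matches]


def recall_top_k(names: list[str], query: str, k: int = 30) -> list[str]:
    """Sort column names by matching degree, highest first, and keep the top k.

    sorted() is stable, so names with equal scores keep their original order.
    """
    return sorted(names, key=lambda n: match_count(n, query), reverse=True)[:k]
```

Take the query "what is the color and length of the product" with a threshold of 1. The threshold strategy recalls "product name", "product color" and "length". Top-1 recall keeps only "product color", because it shares two words with the query.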
Optionally, when candidate column values that the query information may contain are searched for in the search engine storing the column names and column values of the data tables of the table question-answering task, the process has three steps. First, the column values in the search engine are grouped by table and column. Second, the column values in each group are sorted by their matching degree with the query information. Third, the top-ranked column values in each group, up to a column value group recall number threshold, are recalled as candidate column values. This coarsely recalls a broad set of candidate column values that the query information may contain.
When the column values are grouped by table and column, values from different data tables fall into different groups, and values from different columns of the same data table also fall into different groups. If the same value appears several times in one column of a data table, it is deduplicated during grouping so that no group contains repeated values. This improves the efficiency of recalling candidate column values.
Further, when the top-ranked candidate column values are recalled from each group, the groups may be processed in parallel to further improve recall efficiency.
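The grouped recall of column values might look like the Python sketch below. It covers deduplication within each group and parallel processing of the groups. Several parts are assumptions of this illustration: the (table, column, value) row format, dropping values that share no word with the query, and the thread pool. In CPython, threads show the structure but give no real speed-up for pure-Python scoring. A production system would use a process pool or the search engine's own parallel execution.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def segment(text: str) -> set[str]:
    """Assumed segmentation protocol: single CJK characters and lowercase ASCII words."""
    return set(re.findall(r"[\u4e00-\u9fff]|[a-z0-9]+", text.lower()))


def group_values(rows):
    """Group (table, column, value) rows by (table, column), dropping repeated values.

    A dict is used as an ordered set, so each group keeps first-seen order without duplicates.
    """
    groups = defaultdict(dict)
    for table, column, value in rows:
        groups[(table, column)][value] = None
    return {key: list(values) for key, values in groups.items()}


def top_k_in_group(values, query_words, k):
    """Rank one group's values by matched-word count; drop zero-match values (sketch choice)."""
    scored = [(len(segment(v) & query_words), v) for v in values]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [v for _, v in ranked[:k]]


def recall_column_values(rows, query, k=10, workers=4):
    """Recall up to k candidate values per (table, column) group, processing groups concurrently."""
    query_words = segment(query)
    groups = group_values(rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {key: pool.submit(top_k_in_group, vals, query_words, k)
                   for key, vals in groups.items()}
    results = {key: fut.result() for key, fut in futures.items()}
    return {key: vals for key, vals in results.items() if vals}
```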
The column value group recall number threshold may be, for example, 10 or 20. It can be set and adjusted according to the needs of the actual application scenario; this embodiment does not limit it.
Optionally, candidate column values may also be searched for in the same ways as candidate column names. The search engine may return the column values whose matching degree with the query information is greater than or equal to a second matching degree threshold. Alternatively, the column values in the search engine may be sorted by their matching degree with the query information, and the top-ranked column values, up to a column value recall number threshold, recalled as candidate column values. Either way, a broad set of candidate column values is coarsely recalled.
The second matching degree threshold may be the same as or different from the first matching degree threshold. The column value recall number threshold is greater than the column name recall number threshold. Both the second matching degree threshold and the column value recall number threshold can be configured and adjusted according to the actual application scenario and are not specifically limited herein.
Step S3022: determine the column names contained in the query information from the candidate column names, and determine the column values contained in the query information from the candidate column values.
After the candidate column names are recalled, the column names actually contained in the query information are determined precisely from among them. Likewise, after the candidate column values are recalled, the column values actually contained in the query information are determined precisely from among them.
The overall process of identifying the column names and column values contained in the query information has three stages: preprocessing, coarse recall, and post-processing. In preprocessing, the table data of the current table question-answering task is segmented according to a preset segmentation protocol and stored in a search engine. In coarse recall, the query information is segmented according to the same preset segmentation protocol, and candidate column names and candidate column values are coarsely recalled from the search engine based on the segmentation result. In post-processing, the column names and column values contained in the query information are determined precisely. The preset segmentation protocol may be any text segmentation method commonly used for coarse recall. It can be flexibly configured and adjusted and is not specifically limited herein.
Illustratively, take the number of matched words as the matching degree. During preprocessing, each column name and each column value in the table data is segmented into single words (for example, one Chinese character or one English word) and stored in the search engine (ES). During coarse recall, the query information is segmented into single words in the same way. Every column name in the search engine that shares at least one word with the query information is recalled as a candidate column name. The column names contained in the query information are then determined precisely from the candidates.
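The sketch below is a toy version of the preprocessing, coarse recall, and post-processing stages. An in-memory inverted index stands in for the search engine. The class and function names are hypothetical. The post-processing rule used here is an assumed stand-in for the precise determination, which the text does not specify: a candidate counts as contained only if its full text appears in the query.

```python
import re
from collections import defaultdict


def segment(text: str) -> list[str]:
    """Assumed segmentation protocol: single CJK characters and lowercase ASCII words."""
    return re.findall(r"[\u4e00-\u9fff]|[a-z0-9]+", text.lower())


class ToySearchIndex:
    """In-memory stand-in for the search engine: an inverted index from words to entries."""

    def __init__(self, entries: list[str]):
        self.postings: dict[str, set[str]] = defaultdict(set)
        for entry in entries:  # preprocessing: segment each column name and store it
            for word in segment(entry):
                self.postings[word].add(entry)

    def coarse_recall(self, query: str) -> set[str]:
        """Coarse recall: every entry sharing at least one word with the query."""
        hits: set[str] = set()
        for word in segment(query):
            hits |= self.postings.get(word, set())
        return hits


def post_process(candidates: set[str], query: str) -> list[str]:
    """Assumed precise step: keep candidates whose full text occurs in the query."""
    q = query.lower()
    return sorted(c for c in candidates if c.lower() in q)
```

Take the query "What is the fund size of fund ABC". Coarse recall returns both "fund name" and "fund size", because each shares the word "fund" with the query. Post-processing then keeps only "fund size".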
In addition, in practical applications some column names (headers) can be expressed in several ways. For example, possible synonyms for "fund name" include "fund abbreviation" and "fund product name". In this embodiment, synonyms can be configured for one or more column names through a column name intervention mechanism. During coarse recall of candidate column names, if the query information matches any synonym of a column name, the query information is considered to match that column name. When determining whether the query information contains a candidate column name, the query information is considered to contain it if it contains a synonym of that column name.
Likewise, column values may have synonyms. For example, if a "color" column has the value "colorful", possible synonyms are "color", "suit", and so on. In this embodiment, synonyms can likewise be configured for one or more column values through a column value intervention mechanism. During coarse recall of candidate column values, if the query information matches any synonym of a column value, it is considered to match that column value. When determining whether the query information contains a candidate column value, the query information is considered to contain it if it contains a synonym of that column value.
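The sketch below shows the synonym intervention for column names; the same check applies unchanged to column values. A name or value counts as present in the query if the query contains the name itself or any configured synonym. The configuration dictionary, its entries, and the plain substring test are assumptions of this sketch.

```python
# Hypothetical intervention configuration: canonical column name -> configured synonyms.
COLUMN_SYNONYMS = {"fund name": ["fund abbreviation", "fund product name"]}


def surface_forms(term: str, synonyms: dict[str, list[str]]) -> list[str]:
    """The term itself plus every synonym configured for it."""
    return [term, *synonyms.get(term, [])]


def query_contains(query: str, term: str, synonyms: dict[str, list[str]]) -> bool:
    """A query contains a column name/value if it contains the term or any configured synonym."""
    q = query.lower()
    return any(form.lower() in q for form in surface_forms(term, synonyms))


def detect(query: str, terms: list[str], synonyms: dict[str, list[str]]) -> list[str]:
    """Map synonym mentions back to canonical column names (or values)."""
    return [t for t in terms if query_contains(query, t, synonyms)]
```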
In step S302, the entity information contained in the query information also includes aggregation functions, operators, and connectors, that is, those used in the structured query language. These are static entities that generally do not change. Their possible phrasings can be enumerated and the amount of data is small. The aggregation functions, operators, and connectors contained in the query information can therefore be identified with a dictionary tree, which improves identification efficiency for these entities.
Specifically, as many phrasings as possible can be collected for each aggregation function, operator, and connector, and a dictionary tree (Trie) is built from them. The aggregation functions, operators, and connectors contained in the query information can then be identified in two steps. First, acquire the dictionary tree, which contains the phrasings of the aggregation functions, operators, and connectors. Second, look up the aggregation functions, operators, and connectors contained in the query information in the dictionary tree.
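Below is a minimal sketch of the dictionary-tree lookup. It assumes the tree is built over segmented words, so that, for example, "and" is not matched inside a longer English word. The phrasing table, the SQL tokens it maps to, and the greedy longest-match scan are illustrative assumptions. The returned word positions are what the later text-distance checks would consume.

```python
import re

_ENTITY = "__entity__"  # terminal marker; cannot collide with tokens produced by segment()

# Hypothetical phrasings -> SQL entity (aggregation function, operator or connector).
PHRASINGS = {
    "greater than": ">", "more than": ">", "at least": ">=", "less than": "<",
    "average": "AVG", "maximum": "MAX", "highest": "MAX",
    "and": "AND", "or": "OR",
}


def segment(text: str) -> list[str]:
    """Assumed segmentation protocol: single CJK characters and lowercase ASCII words."""
    return re.findall(r"[\u4e00-\u9fff]|[a-z0-9]+", text.lower())


class PhraseTrie:
    """Dictionary tree over word sequences; each terminal node stores its SQL entity."""

    def __init__(self, phrasings: dict[str, str]):
        self.root: dict = {}
        for phrasing, entity in phrasings.items():
            node = self.root
            for word in segment(phrasing):
                node = node.setdefault(word, {})
            node[_ENTITY] = entity

    def find_all(self, query: str) -> list[tuple[int, str]]:
        """Greedy longest match from left to right; returns (word position, entity) pairs."""
        words, found, i = segment(query), [], 0
        while i < len(words):
            node, j, best = self.root, i, None
            while j < len(words) and words[j] in node:
                node = node[words[j]]
                j += 1
                if _ENTITY in node:
                    best = (j, node[_ENTITY])
            if best is None:
                i += 1
            else:
                found.append((i, best[1]))
                i = best[0]
        return found
```

For "which products are gray in color and have a length greater than 170 cm", the scan returns [(6, 'AND'), (10, '>')]. These are the connector at word 6 and the operator phrased "greater than" starting at word 10.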
Alternatively, the phrasings of the aggregation functions, operators, and connectors may be stored in a key-value database. Each key is a phrasing, and each value is the corresponding entity (an aggregation function, operator, or connector). The aggregation functions, operators, and connectors contained in the query information are then found by matching the query information against the keys.
In an optional embodiment, the entity information identified in step S302 includes a plurality of entity words, each corresponding to one entity. After the entity information contained in the query information is identified, the second SQL statement corresponding to the query information may be generated from it through the following steps S3023 to S3026:
Step S3023: splice the entity information to obtain the query result clauses and query condition clauses of the current round of query.
In this step, entity splicing is performed according to the syntax rules of SQL statements, combining the entity information into query result clauses and query condition clauses.
It should be noted that, during entity splicing, a query result clause and a query condition clause are generated separately for each data table, according to the data table in which the column names and column values are located. Splicing the query result clause and query condition clause of the same data table yields a third SQL statement for querying that data table. If the identified entity information involves multiple data tables, one third SQL statement is obtained for each of them.
The following describes how a query result clause and a query condition clause are generated for one data table. The inputs are the column names and column values in the entity information that belong to that data table, together with the aggregation functions, operators, and connectors in the entity information:
The query condition clause is the where part of the SQL statement. It can be spliced as follows:
1) Determine the column names, column values, and operators.
2) Check two conditions. First, the attribute of a column value supports an operator, and the text distance between the column value and the operator in the query information is smaller than a first distance threshold (for example, 0). Second, the attribute of a column name supports the operator, and the text distance between the column name and the operator in the query information is smaller than a second distance threshold (for example, 0). If both hold, splice the column name, the column value, and the operator into a query condition, and take the column name in that condition as a first column name.
3) Determine the logical relationship among the query conditions.
4) Splice the query conditions into the query condition clause according to their logical relationship.
For each candidate column value, the column name it may correspond to is determined first. This is based on the distance between the column value and the column name in the query information, and on the relationship between them in the data table. If the column value can be associated with a column name, the column value is kept; otherwise it is discarded. After the column value and column name are determined, the attribute of the column value (text or number) and the distance between the column value and the operator in the query information decide whether the condition is kept. This finally determines the where part.
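The Python sketch below illustrates the where-part splicing. Several details are assumptions rather than part of the described method:
- text distance is counted as the number of words between two entities, and the threshold test is read as "at most", so a threshold of 0 means adjacent;
- comparison operators are taken to require numeric values;
- a value with no adjacent operator defaults to "=";
- connectors default to AND.

The column-side operator check of step 2) is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    kind: str              # "column", "value", "operator" or "connector"
    text: str              # column name, value, or SQL token such as ">" / "AND"
    start: int             # first word position in the segmented query
    end: int               # one past the last word position
    numeric: bool = False  # attribute of a value: number (True) or text (False)


def gap(a: Entity, b: Entity) -> int:
    """Assumed text distance: number of words strictly between two entity spans."""
    return max(a.start, b.start) - min(a.end, b.end)


def supports(value: Entity, op: str) -> bool:
    """Assumed attribute check: comparison operators need numeric values."""
    return value.numeric or op not in {">", "<", ">=", "<="}


def build_where(entities, value_columns, op_gap=0, column_gap=2):
    """value_columns maps a value to the columns that hold it in the data table."""
    cols = [e for e in entities if e.kind == "column"]
    ops = [e for e in entities if e.kind == "operator"]
    conns = [e for e in entities if e.kind == "connector"]
    conditions = []  # (position, first column name, condition text)
    for v in (e for e in entities if e.kind == "value"):
        op = next((o for o in ops if gap(o, v) <= op_gap and supports(v, o.text)), None)
        anchor, limit = (op, op_gap) if op else (v, column_gap)
        col = next((c for c in cols if gap(c, anchor) <= limit
                    and c.text in value_columns.get(v.text, ())), None)
        if col is None:
            continue  # the value cannot be tied to a column, so it is discarded
        literal = v.text if v.numeric else f"'{v.text}'"
        conditions.append((v.start, col.text, f"{col.text} {op.text if op else '='} {literal}"))
    conditions.sort()
    parts = []
    for i, (pos, _, sql) in enumerate(conditions):
        if i:  # use the connector found between the two conditions, AND by default
            prev = conditions[i - 1][0]
            parts.append(next((c.text for c in conns if prev < c.start < pos), "AND"))
        parts.append(sql)
    return " ".join(parts), {col for _, col, _ in conditions}
```

Consider the query "which products are gray in color and have a length greater than 170 cm". Its entities are: gray at word 3, color at 5, AND at 6, length at 9, "greater than" at 10-11, and 170 at 12. Also supply a map stating that gray occurs in the color column and 170 in the length and depth columns. The function then returns the where part color = 'gray' AND length > 170, with the first column names {color, length}.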
The query result clause is the select part of the SQL statement. It can be spliced as follows:
1) Determine, from the first column names and the entity words, the second column names involved in the query result clause, or the second column names together with aggregation functions.
2) If the attribute of the column values under a second column name supports an aggregation function, and the text distance between the second column name and the aggregation function in the query information is smaller than a third distance threshold (for example, 0), generate an aggregated column name from the second column name and the aggregation function.
3) Splice the second column names that are not associated with any aggregation function, together with the aggregated column names, into the query result clause.
The select part is determined from the column names and aggregation functions obtained by entity identification, combined with the result of the where part. First, the column names that already appear in the where part are removed. Each remaining column name is then matched against the aggregation functions; the condition is that the column name and the aggregation function appear close to each other in the text. If no aggregation function is associated with a column name, the column name is used without one.
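The select-part rule can be sketched in the same spirit. The following are illustrative choices, not requirements stated in the text: the (position, name) input format, the distance measure, and the assumption that COUNT applies to any column while other aggregation functions need numeric columns.

```python
def build_select(columns, aggregations, where_columns, numeric_columns, max_gap=0):
    """Build the select part from (position, name) columns and (position, function) aggregations.

    Columns already used in the where part are dropped; each remaining column takes an adjacent
    aggregation function if its values support it, and otherwise appears without one.
    """
    items = []
    for pos, name in columns:
        if name in where_columns:
            continue
        agg = next((fn for p, fn in aggregations
                    if abs(p - pos) - 1 <= max_gap          # assumed distance: words in between
                    and (fn == "COUNT" or name in numeric_columns)), None)
        items.append(f"{agg}({name})" if agg else name)
    return ", ".join(items)
```

Suppose "average" is at word 0, the numeric column price is at word 1, and color is already used in the where part. The result is then AVG(price). An aggregation function next to a text column such as color is ignored unless it is COUNT.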
Step S3024: generate a plurality of third SQL statements according to the query result clauses and the query condition clauses.
In this step, the query result clause and the query condition clause of the same data table are spliced to obtain a third SQL statement for querying the data table.
Specifically, based on the SQL syntax, the where part and the select part of the same data table obtained by entity splicing are utilized to perform SQL splicing, and a third SQL statement for querying the data table is generated.
Optionally, to support multi-round question answering, the query result clauses and query condition clauses obtained in historical rounds of query can be acquired. The third SQL statements are then generated from the query result clauses and query condition clauses of both the current round and the historical rounds. Combining the clauses of the current and historical rounds when screening the target table matching the query information improves the accuracy of locating the target table relevant to the current round. It also enables inheritance across rounds, so multi-round question answering is supported.
Illustratively, the query result clauses and query condition clauses obtained in the previous round (or in several historical rounds of the current session) can be acquired. The third SQL statements are then generated from the clauses of the current round together with those of the previous round (or of those historical rounds).
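One possible inheritance policy for multi-round queries is sketched below. The text only states that clauses from the current and historical rounds are combined. Two rules here are assumptions of this sketch: a current-round condition replaces a historical one on the same column, and the previous round's result columns are reused when the current round names none.

```python
def inherit_clauses(current: dict, previous: dict) -> dict:
    """Merge this round's clauses with the previous round's (an assumed merge policy).

    Each argument looks like {"select": [...], "where": {column: condition}}.
    """
    where = {**previous["where"], **current["where"]}   # same-column conditions: current round wins
    select = current["select"] or previous["select"]    # no new result columns: keep last round's
    return {"select": select, "where": where}
```

Suppose round 1 asks "which sofas are gray" and round 2 asks "what about longer than 170 cm". The merged clauses keep product_name as the result column and carry both conditions, color = 'gray' and length > 170.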
Step S3025: sort the third SQL statements according to the number of query result clauses and/or the number of query condition clauses that each contains.
Each third SQL statement is spliced from query result clauses and query condition clauses, which in turn are spliced from the entity information contained in the query information. The more query result clauses and query condition clauses a third SQL statement contains, the more relevant it is to the query information input by the user. Such a statement is closer to the user's real query intention and is of higher quality.
In this step, the third SQL statements are sorted by the number of query result clauses and/or query condition clauses they contain, so the sorting result reflects their quality.
Optionally, the third SQL statements are sorted by the number of query condition clauses they contain. Statements with the same number of query condition clauses are then sorted by the number of query result clauses they contain, giving the final sorting result.
Optionally, the third SQL statements may be sorted by the sum of the number of query condition clauses and the number of query result clauses they contain, giving the final sorting result.
Optionally, the third SQL statement may be sorted according to the number of the included query condition clauses to obtain a final sorting result.
Step S3026: according to the sorting result of the third SQL statements, select the top-ranked third SQL statements, no more than an SQL statement number threshold, as the second SQL statements corresponding to the query information. The SQL statement number threshold is a positive integer.
In this step, the highest-quality third SQL statements, no more than the SQL statement number threshold, are selected according to the sorting result as the second SQL statements corresponding to the query information.
In practical applications, the query information input by a user in each round usually concerns only one or a few tables. The SQL statement number threshold may therefore be set to a small value such as 1, 3, 5, 6, or 7. This yields a small number of high-quality second SQL statements that involve few target tables, so the target tables involved in the current round of query are located accurately.
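Steps S3025 and S3026 reduce to a sort followed by a cut, as in the Python sketch below. The ThirdSQL record and the mode names are hypothetical. The three sort keys correspond to the ordering options listed above.

```python
from dataclasses import dataclass, field


@dataclass
class ThirdSQL:
    table: str
    select: list[str] = field(default_factory=list)  # query result clauses
    where: list[str] = field(default_factory=list)   # query condition clauses


SORT_KEYS = {
    # condition clauses first, result clauses as the tie-breaker
    "condition_then_result": lambda s: (len(s.where), len(s.select)),
    # total number of clauses
    "sum": lambda s: len(s.where) + len(s.select),
    # condition clauses only
    "condition": lambda s: len(s.where),
}


def pick_second_sql(candidates: list[ThirdSQL], limit: int = 3,
                    mode: str = "condition_then_result") -> list[ThirdSQL]:
    """Steps S3025 and S3026: rank by clause counts (descending), keep at most `limit` statements."""
    if limit < 1:
        raise ValueError("the SQL statement number threshold must be a positive integer")
    return sorted(candidates, key=SORT_KEYS[mode], reverse=True)[:limit]
```

With a threshold of 1, a candidate carrying three conditions (color, length, depth) outranks candidates carrying two, so only it is kept.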
Exemplarily, FIG. 4 is a framework diagram of generating SQL statements corresponding to query information based on semantic rules, according to an exemplary embodiment of the present application. As shown in FIG. 4, suppose the query information input by the user is "which products are gray in color and have a length greater than 170 cm and a depth greater than 20 cm". Entity identification determines the following entity information in the query information:
- column values: gray, 170, 20;
- operator: greater than;
- column names: color, length, depth, product name;
- connector: AND.

Based on the column names, data in 3 tables can be matched:
- the sofa table, containing all identified column names (color, length, depth, product name);
- the curtain table, containing the column names color, length, and product name;
- the bookcase table, containing the column names color, depth, and product name.

Entity information is spliced separately for each table to generate a query result clause (the select part in the figure) and a query condition clause (the where part in the figure) for that table. Splicing the query result clause and query condition clause of the same table according to SQL syntax rules yields an SQL-like statement for each table.
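The per-table splicing in this example can be reproduced with the short Python sketch below. The following are assumptions for illustration: the table and column identifiers (sofa, product_name, and so on), the dropping of units, and the "*" fallback for tables without a result column. In line with the text, the output statements are SQL-like.

```python
# Hypothetical schemas for the three tables matched in the FIG. 4 example.
TABLES = {
    "sofa": ["product_name", "color", "length", "depth"],
    "curtain": ["product_name", "color", "length"],
    "bookcase": ["product_name", "color", "depth"],
}
# Conditions and result column spliced from the identified entities (units dropped).
CONDITIONS = {"color": "color = 'gray'", "length": "length > 170", "depth": "depth > 20"}
RESULT_COLUMNS = ["product_name"]


def sql_like_per_table(tables, conditions, result_columns):
    """Splice one select part and one where part per table, using only the columns it has."""
    statements = {}
    for table, columns in tables.items():
        where = [cond for col, cond in conditions.items() if col in columns]
        if not where:
            continue  # the table matches none of the conditions
        select = [col for col in result_columns if col in columns] or ["*"]
        statements[table] = f"SELECT {', '.join(select)} FROM {table} WHERE {' AND '.join(where)}"
    return statements
```

For the sofa table it produces SELECT product_name FROM sofa WHERE color = 'gray' AND length > 170 AND depth > 20. The curtain and bookcase tables each get a two-condition statement.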
In addition, in practical applications a column of data (that is, a column name) supports multiple question-answer types, such as a maximum-value type, an equality type, and a greater-than type. Each question-answer type can be phrased in many ways; for example, equality may be expressed as "equal to", "is", "called", and so on. Some phrasings inevitably cannot be identified during question answering, and because model training cycles are long, newly appearing phrasings cannot be identified accurately in time. In this embodiment, during table selection based on semantic rules, multiple phrasings of a question-answer type can be configured for a column name through a column question-answer type intervention mechanism. The configured phrasings take effect immediately. When the query information matches a phrasing of a certain question-answer type of a certain column name, a query result clause corresponding to that question-answer type of that column name is generated and used as part of a third SQL statement.
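A sketch of the column question-answer type intervention is given below. The configuration layout (column, then question-answer type, then phrasings), the MAX/MIN templates, and the substring matching are assumptions. The sketch shows that the rules are read at query time, so a newly configured phrasing is honored on the next query without model retraining.

```python
# Hypothetical column question-answer type configuration: column -> QA type -> phrasings.
QA_TYPE_RULES = {
    "price": {"max": ["most expensive", "highest price"], "min": ["cheapest", "lowest price"]},
}
# Assumed templates turning a QA type into a query result clause.
TEMPLATES = {"max": "MAX({col})", "min": "MIN({col})"}


def result_clauses(query: str, rules: dict) -> list[str]:
    """Return a query result clause for every configured phrasing found in the query.

    The rules are read on every call, so an edited configuration applies to the next query
    without retraining any model.
    """
    q = query.lower()
    return [TEMPLATES[qa_type].format(col=col)
            for col, types in rules.items()
            for qa_type, phrasings in types.items()
            if any(p in q for p in phrasings)]
```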
It should be noted that the second SQL statement and the third SQL statement need not be executable SQL; they may be SQL-like statements. Data table screening based on these SQL-like statements is sufficient to determine the target table matching the query information.
In an optional embodiment, environment variable information of the query information input by the user can also be acquired, and the entity information contained in it identified. The environment variable information may include other information, present when the user inputs the query information, that affects the query result data associated with the query information. This includes, but is not limited to, information about a target object associated with the query information and source information of the query information.
For example, a user browsing the detail page of a product may ask the intelligent customer service about that product. The question the user enters is the query information. Its environment variable information includes the identifier (ID) and link of the currently browsed product, and may further include information about the store selling that product.
In step S202, when the second SQL statement corresponding to the query information is generated according to the entity information, it may be generated from both the entity information contained in the environment variable information and the entity information contained in the query information. The resulting second SQL statement is then closer to the user's real query intention, so the target table involved in the current round of query can be located more accurately.
In an optional embodiment, in steps S203 and S304 above, target tables matching the query information are screened from the data tables of the table question-answering task. If there are multiple target tables, three further steps follow:
1) determine, from the column names of the target tables, a virtual table containing the columns of all target tables;
2) convert the query information into a fourth SQL statement based on the virtual table;
3) rewrite the fourth SQL statement into first SQL statements based on the target tables, each first SQL statement involving only one target table.

If multiple target tables remain after screening, the column names of all of them can be combined to form a virtual table. The NL2SQL model converts the query information into a fourth SQL statement based on the virtual table; this statement is SQL-like rather than executable. The fourth SQL statement is then rewritten into several first SQL statements, each involving only one target table, which gives the first SQL statements corresponding to the query information. Executing the first SQL statements queries the target tables and yields the query result data corresponding to the query information.
The query information is converted into a fourth SQL statement over a virtual table, which is then rewritten into executable first SQL statements over the actual target tables. As a result, the table data of the table question-answering task does not need to be stored in one large data table. This saves storage space, improves the query efficiency of the SQL statements, and thus improves the efficiency of table question answering.
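The virtual-table approach can be sketched as follows. The virtual table is the union of the target tables' columns. The fourth SQL statement, represented here as a small dictionary rather than SQL text, is split into one first SQL statement per target table. The representation, the routing rule, and the decision to skip tables that receive no result column are simplifying assumptions. In particular, conditions that would need a join across target tables are not handled.

```python
def build_virtual_table(target_tables: dict[str, list[str]]) -> dict[str, list[str]]:
    """Virtual table = union of the target tables' columns; remember which tables own each column."""
    owners: dict[str, list[str]] = {}
    for table, columns in target_tables.items():
        for col in columns:
            owners.setdefault(col, []).append(table)
    return owners


def rewrite_fourth_sql(fourth: dict, owners: dict[str, list[str]]) -> dict[str, str]:
    """Split a virtual-table statement into one first SQL statement per target table.

    `fourth` looks like {"select": [column, ...], "where": [(column, op, literal), ...]}.
    Each result column and condition is routed to every table owning that column; tables
    that receive no result column produce no statement (a simplifying assumption).
    """
    parts: dict[str, dict[str, list[str]]] = {}
    for col in fourth["select"]:
        for t in owners.get(col, []):
            parts.setdefault(t, {"select": [], "where": []})["select"].append(col)
    for col, op, literal in fourth["where"]:
        for t in owners.get(col, []):
            parts.setdefault(t, {"select": [], "where": []})["where"].append(f"{col} {op} {literal}")
    statements = {}
    for t, p in parts.items():
        if p["select"]:
            sql = f"SELECT {', '.join(p['select'])} FROM {t}"
            if p["where"]:
                sql += " WHERE " + " AND ".join(p["where"])
            statements[t] = sql
    return statements
```

Suppose the target tables are sofa and curtain, and both have product_name and color. The virtual-table statement selecting product_name where color = 'gray' is rewritten into SELECT product_name FROM sofa WHERE color = 'gray', plus the same statement for curtain.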
On the basis of any of the above embodiments, step S303 selects the data tables involved in the second SQL statement from the data tables of the table question-answering task as the target tables matching the query information. If the total number of columns of the target tables then exceeds a preset upper limit, steps S304-S305 are skipped and the second SQL statement is executed instead. The resulting data is used as the query result data corresponding to the query information input by the user, which can improve the performance of table question answering.
On the basis of any of the above embodiments, the table selection intervention rule can be flexibly configured as required to intervene in the table selection process in time.
In some practical scenarios, different data tables (such as table A and table B) store similar content. A query the user intends to be answered from table A may then be answered by the table question-answering dialogue robot from table B. For this situation, table selection intervention rules can be configured for data tables; the rule of a data table lists specific phrasings used to query that table. When the query information contains a specific phrasing from the table selection intervention rule of a data table, the query information is determined to match that rule. That data table is then directly determined as the target table matching the query information.
FIG. 5 is a flowchart of a table question-answering method including table selection intervention, according to an exemplary embodiment of the present application. As shown in FIG. 5, the method includes the following steps:
Step S501: acquire the query information input by the user.
Step S502: determine whether the query information matches a table selection intervention rule.
In this embodiment, table selection intervention rules may be configured for one or more data tables. The rule for each data table may list one or more specific phrasings for querying it. When the query information contains any specific phrasing in the table selection intervention rule of a data table, the query information is determined to match that rule.
If the query information matches a table selection intervention rule, step S503 is executed to determine the target table matching the query information.
If the query information does not match any table selection intervention rule, steps S504-S505 are executed to determine the target table matching the query information.
Step S503: take the data table corresponding to the matched table selection intervention rule as the target table matching the query information.
Step S504: identify the entity information contained in the query information, and generate the second SQL statement corresponding to the query information according to the entity information.
Step S505: select the data tables involved in the second SQL statement from the data tables of the table question-answering task as the target tables matching the query information.
Step S506: convert the query information into a first SQL statement based on the target table.
Step S507: execute the first SQL statement to query the target table and obtain the query result data corresponding to the query information.
Step S508: generate reply information according to the query result data corresponding to the query information.
In this embodiment, steps S504 to S507 are implemented in the same way as steps S302 to S305. For details, refer to the relevant content of the above embodiments, which is not repeated here.
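The branch at step S502 can be sketched in Python as below. The rule table and the fallback callable are hypothetical stand-ins; the fallback represents steps S504-S505. Plain substring matching of the specific phrasings is an assumption.

```python
from typing import Callable

# Hypothetical table selection intervention rules: data table -> specific phrasings.
TABLE_RULES = {"fund_nav": ["net asset value"], "fund_manager": ["fund manager"]}


def match_table_rule(query: str, rules: dict[str, list[str]]):
    """S502: return the table whose rule contains a phrasing found in the query, else None."""
    q = query.lower()
    return next((t for t, phrasings in rules.items() if any(p in q for p in phrasings)), None)


def target_tables(query: str, rules: dict[str, list[str]],
                  semantic_selector: Callable[[str], list[str]]) -> list[str]:
    """S502-S505: an intervention rule wins; otherwise fall back to SQL-based table screening."""
    table = match_table_rule(query, rules)
    if table is not None:
        return [table]  # S503
    return semantic_selector(query)  # S504-S505
```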
On the basis of any one of the above embodiments, the source channel intervention rule of the query information can be flexibly configured according to needs. In the scene of a plurality of tables and questions and answers in practical application, the method supports the division of channels, and intelligently answers specified data tables or specified columns of different channels, but cannot inquire the information of other data tables or other non-specified columns. For example, the intelligent customer service may provide channels (e.g., different channels are represented by 0,1, 2) for implementing different types (which may implement different functions of question answering) for the user to select, and after the user selects one type of channel (e.g., user selection 1), for the query information input by the user, the query result data is obtained only in the specified data table or specified column corresponding to the channel.
For this case, source channel intervention rules for querying information are supported, and a specified data table or specified column that the source channel allows to query is configured in one source channel intervention rule, that is, the data table or specified column corresponding to the source channel. If the query information has a source channel and the source channel is configured with a source channel intervention rule, the data table or the designated column corresponding to the source channel is used as a target table or a target column matched with the query information, so that the efficiency of selecting the target table related to the query in the current round can be improved.
Specifically, a specific channel set (the set of source channels for which source channel intervention rules are configured) may be configured, together with the data tables and specified columns corresponding to each source channel in the set. After the query information input by the user is acquired, and before step S302, it is determined, from the source channel of the query information, whether the configured specific channel set contains that source channel. If it does, the data table or specified column corresponding to the source channel is used as the target table or target column matching the query information.
If the specific channel set does not contain the source channel of the query information, step S302 is performed: the entity information contained in the query information is identified and the subsequent steps are executed, so as to screen out the target table matching the query information from the plurality of data tables of the table question-answering task.
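The channel check described above can be sketched as follows. This is a minimal illustration, not the implementation of this application: the channel IDs, the table and column names (`fund_info`, `risk_level`, and so on) and the helper names `ChannelRule`, `select_targets` and `step_s302_stub` are hypothetical, and the entity-based selection of step S302 is passed in as a stand-in callable.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ChannelRule:
    """Data tables (and optionally columns) a source channel may query."""
    tables: list[str]
    columns: list[str] = field(default_factory=list)  # empty = all columns


# Hypothetical "specific channel set": source channel ID -> configured rule.
SPECIFIC_CHANNELS: dict[int, ChannelRule] = {
    1: ChannelRule(tables=["fund_info"], columns=["fund_name", "risk_level"]),
    2: ChannelRule(tables=["account_info"]),
}


def select_targets(
    query: str,
    source_channel: Optional[int],
    entity_based_selection: Callable[[str], list[str]],
) -> tuple[list[str], list[str]]:
    """Return (target_tables, target_columns) for one query.

    If the source channel is in the specific channel set, its configured
    tables/columns are used directly; otherwise the query falls through to
    entity-based table selection (step S302).
    """
    rule = SPECIFIC_CHANNELS.get(source_channel) if source_channel is not None else None
    if rule is not None:
        return rule.tables, rule.columns
    return entity_based_selection(query), []


def step_s302_stub(query: str) -> list[str]:
    """Stand-in for entity recognition plus second-SQL table screening."""
    return ["fund_info", "manager_info"]


print(select_targets("which funds are medium risk?", 1, step_s302_stub))
# (['fund_info'], ['fund_name', 'risk_level'])
print(select_targets("which funds are medium risk?", 0, step_s302_stub))
# (['fund_info', 'manager_info'], [])
```

Placing the channel lookup ahead of entity recognition means a query from a configured channel never needs step S302 at all.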
On the basis of any of the above embodiments, end-to-end SQL intervention rules can also be configured flexibly as needed. In some practical scenarios, users phrase questions in unusual ways and expect a specific answer wording, and the SQL statements that the NL2SQL model produces for such phrasings are often inaccurate. For this case, end-to-end SQL intervention rules can be configured for specific phrasings, each rule associating a specific phrasing with a corresponding SQL statement. When the query information contains the specific phrasing of an end-to-end SQL intervention rule, the query information is determined to match that rule; the SQL statement configured in the rule is then used directly as the first SQL statement corresponding to the query information and executed to obtain the query result data, from which reply information is generated.
Specifically, after the query information input by the user is obtained, and before the target table matching the query information is screened out from the plurality of data tables of the table question-answering task, it is determined whether the query information matches an end-to-end SQL intervention rule. If it does, the SQL statement corresponding to the matched end-to-end SQL intervention rule is executed. If it does not, the target table matching the query information is screened out from the plurality of data tables of the table question-answering task according to the query information, and the subsequent steps are performed.
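This ordering can be sketched in a few lines, assuming that "contains a specific phrasing" means a case-insensitive substring match. The rule table `E2E_RULES`, its example phrasings and SQL, and the function names are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical end-to-end SQL intervention rules: specific phrasing -> SQL.
E2E_RULES: dict[str, str] = {
    "safest fund": "SELECT fund_name FROM fund_info WHERE risk_level = 'low risk'",
    "top sellers": "SELECT fund_name FROM fund_info ORDER BY sales DESC LIMIT 5",
}


def match_e2e_rule(query: str) -> Optional[str]:
    """Return the configured SQL if the query contains a rule's phrasing."""
    text = query.lower()
    for phrasing, sql in E2E_RULES.items():
        if phrasing in text:
            return sql
    return None


def resolve_first_sql(query: str, table_selection_and_nl2sql: Callable[[str], str]) -> str:
    """Check end-to-end rules before any table selection happens."""
    sql = match_e2e_rule(query)
    if sql is not None:
        return sql  # rule hit: the configured SQL is the first SQL statement
    return table_selection_and_nl2sql(query)  # no hit: normal pipeline


print(resolve_first_sql("Which is the SAFEST fund?", lambda q: "<NL2SQL output>"))
# SELECT fund_name FROM fund_info WHERE risk_level = 'low risk'
```

Because the rule check runs first, a hit bypasses both table selection and the NL2SQL model, which is what lets a hand-written statement override an inaccurate model output.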
Configuring end-to-end SQL intervention rules for the unusual phrasings users adopt in certain scenarios improves the accuracy of the table question-answering results in those scenarios and resolves problems specific to them.
On the basis of any of the above embodiments, a reply intervention rule may also be configured in this embodiment to intervene in the reply information generated from the query result data, so that the generated reply information reads more fluently and is closer to natural human phrasing.
Specifically, one or more reply templates may be configured. Each reply template contains one or more reply segments; the template can configure the format information of each reply segment, as well as a leading phrase and/or a trailing phrase for it. If a template contains multiple reply segments, it may also configure the connective phrases between them. Reply segments are of two kinds, result segments and condition segments, and each reply template may contain result segments and/or condition segments.
A condition segment corresponds to a query condition in the query information input by the user. Adding condition segments to a reply template lets the query conditions be shown in the reply content, and the column names and wording in condition segments can be configured flexibly. A result segment corresponds to the query result data of the query information; configuring the format information of the result segment in the reply template determines how the query result data is displayed in the reply content, and the column names and conditions in result segments can likewise be configured flexibly.
Fig. 6 is a flowchart of generating reply information according to an exemplary embodiment of the present application. Referring to Fig. 6, in an optional embodiment, after the first SQL statement is executed to query the target table and obtain the query result data corresponding to the query information, the reply information is generated from the query result data as follows:
Step S600: execute the first SQL statement to query the target table and obtain the query result data corresponding to the query information.
Step S601: determine whether the first SQL statement matches a reply template.
The reply template configures the format information of the reply segments contained in the reply information, where the reply segments include result segments and/or condition segments.
In this step, the column names contained in the first SQL statement are matched against the column names in the configured reply templates, and whether the first SQL statement matches (that is, hits) a reply template is determined according to preset hit rules.
In this embodiment, to reduce the number of reply templates that must be configured and to widen the applicability of each template, a reply template may contain both mandatory and optional reply segments, and the template configures whether each reply segment is mandatory. The hit rules for reply templates are:
1) All mandatory reply segments in the reply template must be hit;
2) If the reply template contains multiple condition segments joined by OR, exactly one of the OR-joined condition segments must be hit;
3) An optional reply segment may or may not be hit;
4) If several reply templates are hit at the same time, they are ranked by applying the following priority strategies in order, and one reply template is selected as the final hit template based on the ranking:
a) A reply template hit on its mandatory reply segments has higher priority than a reply template hit only on optional reply segments;
b) Among reply templates whose mandatory reply segments are all hit, the template whose match involves fewer optional reply segments has higher priority;
c) Among reply templates whose mandatory reply segments are all hit and whose matches involve the same number of optional reply segments, the template with fewer missed reply segments has higher priority.
For example, assume the first SQL statement is: select a, b where c, d. Hit reply template 1 contains the reply segments a, b, c, d and e, where e is an optional reply segment and a, b, c and d are mandatory reply segments; hit reply template 2 contains the reply segments a, b, c and d, all of which are mandatory. The first SQL statement hits the mandatory reply segments of both templates. The match against reply template 1 involves one optional reply segment (e), while the match against reply template 2 involves none. Based on b), reply template 2 therefore has higher priority than reply template 1 for this first SQL statement, and reply template 2 is used as the reply template matched by the first SQL statement.
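The hit rules and ranking can be sketched as below. This is an illustrative interpretation only: each segment is reduced to the column it refers to, OR-joined condition segments share a hypothetical `or_group` id, and rules a) to c) are encoded as a sort key that follows the worked example (fewer optional segments involved ranks higher, then fewer missed segments). The class and function names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    column: str                     # column the segment refers to
    mandatory: bool = True
    or_group: Optional[str] = None  # condition segments joined by OR share an id


@dataclass
class ReplyTemplate:
    name: str
    segments: list[Segment]


def is_hit(t: ReplyTemplate, sql_columns: set[str]) -> bool:
    """Hit rules 1) to 3): every mandatory segment hit, exactly one segment
    of each OR group hit, optional segments unconstrained."""
    if not any(s.column in sql_columns for s in t.segments):
        return False  # nothing in the template relates to this SQL
    or_hits: dict[str, int] = {}
    for s in t.segments:
        hit = s.column in sql_columns
        if s.or_group is not None:
            or_hits[s.or_group] = or_hits.get(s.or_group, 0) + int(hit)
        elif s.mandatory and not hit:
            return False
    return all(n == 1 for n in or_hits.values())


def priority_key(t: ReplyTemplate, sql_columns: set[str]) -> tuple[bool, int, int]:
    """Smaller key = higher priority (rules a) to c), as interpreted above)."""
    mandatory_hit = any(s.mandatory and s.column in sql_columns for s in t.segments)
    n_optional = sum(1 for s in t.segments if not s.mandatory)
    n_missed = sum(1 for s in t.segments if s.column not in sql_columns)
    return (not mandatory_hit, n_optional, n_missed)


def select_template(templates: list[ReplyTemplate], sql_columns: set[str]) -> Optional[ReplyTemplate]:
    hits = [t for t in templates if is_hit(t, sql_columns)]
    return min(hits, key=lambda t: priority_key(t, sql_columns), default=None)


# Worked example: select a, b where c, d
cols = {"a", "b", "c", "d"}
t1 = ReplyTemplate("template 1", [Segment(c) for c in "abcd"] + [Segment("e", mandatory=False)])
t2 = ReplyTemplate("template 2", [Segment(c) for c in "abcd"])
print(select_template([t1, t2], cols).name)  # template 2
```

Encoding the strategies as a tuple key keeps them strictly ordered: a later strategy only breaks ties left by the earlier ones.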
The first SQL statement hitting a reply segment means that a column name contained in the first SQL statement hits a column in the reply segment; specifically, the first SQL statement contains the column name of a column whose values are involved in the reply segment.
In this step, if the first SQL statement matches a reply template, step S602 is performed to generate reply information from the query result data in the format of the reply template.
If the first SQL statement does not match any reply template, step S603 is performed: reply segments are generated according to the configured default reply generation rule and the query result data, and are assembled into reply information.
Step S602: if the first SQL statement matches a reply template, generate reply information according to the query result data and the reply template.
A reply template is assembled, in a given format, from reply segments, connective phrases, leading phrases and trailing phrases, and each reply segment is in turn assembled, in a given format, from column names (or synonyms of the column names), custom phrases and column values. To generate reply information from the query result data and the reply template, the column values in the query result data are inserted at the corresponding column-value positions in the reply segments of the template.
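Filling a matched template can be sketched as simple slot substitution. In this hypothetical sketch, each reply segment is a Python format string whose `{column}` slots receive column values: result columns are filled from the query result rows and condition columns from the first SQL statement's conditions. `TemplateSpec`, `render` and `join_values` are illustrative names, not part of the source.

```python
from dataclasses import dataclass


def join_values(values: list[str]) -> str:
    """Join column values as natural language: 'A', 'A and B', 'A, B and C'."""
    if len(values) <= 1:
        return "".join(values)
    return ", ".join(values[:-1]) + " and " + values[-1]


@dataclass
class TemplateSpec:
    segments: list[str]    # reply segments with {column} slots for column values
    connector: str = ", "  # connective phrase between segments
    prefix: str = ""       # leading phrase
    suffix: str = ""       # trailing phrase


def render(spec: TemplateSpec, rows: list[dict[str, str]], conditions: dict[str, str]) -> str:
    """Fill slots from the result rows (result columns) and the SQL
    conditions (condition columns). A slot with no value raises KeyError,
    which would suggest the template should not have matched this SQL."""
    result_columns = {col for row in rows for col in row}
    slots = dict(conditions)
    for col in result_columns:
        slots[col] = join_values([row[col] for row in rows if col in row])
    parts = [segment.format(**slots) for segment in spec.segments]
    return spec.prefix + spec.connector.join(parts) + spec.suffix


spec = TemplateSpec(
    segments=["for {risk_level} products", "you can choose {fund_name}"],
    prefix="Hello, ",
    suffix=".",
)
rows = [{"fund_name": "Medical Health Stock A"}, {"fund_name": "Strategic Distribution Mixed"}]
print(render(spec, rows, {"risk_level": "medium risk"}))
# Hello, for medium risk products, you can choose Medical Health Stock A and Strategic Distribution Mixed.
```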
Step S603: if the first SQL statement does not match a reply template, generate reply segments according to the query result data, and assemble them into reply information.
In this case, according to the configured default reply generation rule, the query result clause of the first SQL statement and the query result data are converted into text to obtain a result segment, and the query condition clause of the first SQL statement is converted into natural-language text to obtain a condition segment; the condition segment and the result segment are then joined with a connective phrase to obtain the reply information.
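The default rule can be sketched as two small converters, one per clause, joined by a connective phrase. The operator wording in `OP_TEXT`, the sentence shapes and the function names are assumptions made for illustration; the example reuses the query discussed with Fig. 7.

```python
# Hypothetical wording for comparison operators in condition segments.
OP_TEXT = {"=": "is equal to", "!=": "is not equal to", ">": "is greater than", "<": "is less than"}


def join_values(values: list[str]) -> str:
    """Join column values as natural language: 'A', 'A and B', 'A, B and C'."""
    if len(values) <= 1:
        return "".join(values)
    return ", ".join(values[:-1]) + " and " + values[-1]


def condition_segment(conditions: list[tuple[str, str, str]]) -> str:
    """WHERE clause -> text, e.g. 'risk level is equal to medium risk'."""
    return " and ".join(f"{col} {OP_TEXT[op]} {val}" for col, op, val in conditions)


def result_segment(select_cols: list[str], rows: list[dict[str, str]]) -> str:
    """SELECT clause plus query result data -> text."""
    parts = []
    for col in select_cols:
        vals = [str(row[col]) for row in rows if col in row]
        parts.append(f"the {col} values are {join_values(vals)}" if vals else f"no {col} values were found")
    return "; ".join(parts)


def default_reply(
    select_cols: list[str], conditions: list[tuple[str, str, str]], rows: list[dict[str, str]]
) -> str:
    """Join the condition segment and result segment with a connective phrase."""
    result = result_segment(select_cols, rows)
    if not conditions:
        return result[:1].upper() + result[1:] + "."
    return f"Where {condition_segment(conditions)}, {result}."


rows = [{"fund name": "Medical Health Stock A"}, {"fund name": "Strategic Distribution Mixed"}]
print(default_reply(["fund name"], [("risk level", "=", "medium risk")], rows))
# Where risk level is equal to medium risk, the fund name values are Medical Health Stock A and Strategic Distribution Mixed.
```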
Illustratively, Fig. 7 shows a framework diagram of reply information generation with the reply intervention mechanism added. Taking the user query "which products are medium risk" as an example, the SQL-like expression of the generated first SQL statement may be: select fund name from financing information table where risk level = medium risk. The query result data obtained with this first SQL statement is shown in Fig. 7. If, under the reply template hit rules, the first SQL statement matches no reply template, reply information is generated by the default reply generation rule: a reply segment (a result segment) corresponding to the select query result clause is generated, namely "fund name: Medical Health Stock A and Strategic Distribution Mixed", together with a reply segment (a condition segment) corresponding to the where query condition clause, namely "risk level equal to medium risk". Joining the reply segments with a connective phrase yields the reply information "the fund names with risk level equal to medium risk are Medical Health Stock A and Strategic Distribution Mixed". If, under the hit rules, the first SQL statement matches a reply template, the matched template is as shown in Fig. 7, where "[ ]" in the condition segments and result segments of the template marks customizable content. Filling the column values of the query result data into the corresponding column-value positions of the template's reply segments generates reply information in the template's format.
In addition, a reply template may configure whether each reply segment is displayed in the reply information. If a reply segment in a reply template is configured not to be displayed, then, when the first SQL statement matches that template and reply information is generated from it, the reply information omits that reply segment.
In this embodiment, a reply intervention rule is introduced: the format of the reply information can be customized by configuring reply templates, and when the first SQL statement corresponding to the query information matches a reply template, the reply information is generated according to that template. The generated reply information thus reads more fluently and more naturally, which effectively improves its quality.
Exemplarily, Fig. 8 is an overall architecture diagram of table question answering according to an exemplary embodiment of the present application. As shown in Fig. 8, table question answering is generally divided into the following modules: preprocessing, natural language understanding (NLU), dialog management (DM), natural language generation (NLG), and post-processing.
The preprocessing module loads the dialog context, table metadata, intervention configuration information, multi-round configuration information, and the like. The table metadata includes each table's name, column names, column attributes, and so on. The intervention configuration information contains the configuration of the various intervention mechanisms. The multi-round configuration information specifies whether multi-round question answering is supported.
The natural language understanding module performs table selection, NL2SQL model recognition, and post-processing of the recognition results. Table selection selects the target table relevant to the current round of query. If table selection intervention rules are configured and the query information hits one of them, the SQL statement corresponding to the query information is determined directly from the hit rule. If the query information hits no table selection intervention rule, a second SQL statement corresponding to the query information is generated based on semantic rules: entity recognition is performed on the query information, the recognized entity information is combined to generate the second SQL statement, and the target table matching the query information, that is, the target table relevant to the current round of query, is screened out based on the second SQL statement. NL2SQL model recognition builds a virtual table from the screened target tables as the table the NL2SQL model operates on, then uses the NL2SQL model to recognize the query information and convert it into an SQL-like statement over the virtual table. Recognition-result post-processing provides SQL merging, multi-round inheritance, SQL ranking, SQL correction and similar functions. SQL merging combines multiple SQL statements based on the same table into one SQL statement.
Multi-round inheritance means that, when multi-round question answering is configured, the query result clauses and query condition clauses of SQL statements from historical rounds, along with other information from historical rounds as needed, can be inherited during semantic-rule-based table selection. SQL ranking ranks the third SQL statements generated during semantic-rule-based table selection so that higher-quality third SQL statements are selected as the second SQL statements used to screen the target table. SQL correction means that the SQL-like statement produced by the NL2SQL model may be corrected based on the second SQL statement used to screen the target table, improving the quality of the SQL-like statement.
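SQL merging can be sketched on a simple structured representation of SQL-like statements. The `SqlLike` type and `merge_same_table` are hypothetical, and the sketch assumes that conditions from merged statements are combined with AND and that exact duplicates are kept once; a real merger would also need to detect conflicting conditions on the same column.

```python
from dataclasses import dataclass, field


@dataclass
class SqlLike:
    table: str
    select: list[str] = field(default_factory=list)
    where: list[tuple[str, str, str]] = field(default_factory=list)  # (column, op, value)


def merge_same_table(stmts: list[SqlLike]) -> list[SqlLike]:
    """Merge SQL-like statements that target the same table into one.

    Select columns and conditions are unioned in first-seen order;
    the merged conditions are assumed to be ANDed together.
    """
    merged: dict[str, SqlLike] = {}
    for s in stmts:
        if s.table not in merged:
            merged[s.table] = SqlLike(s.table)
        m = merged[s.table]
        for col in s.select:
            if col not in m.select:
                m.select.append(col)
        for cond in s.where:
            if cond not in m.where:
                m.where.append(cond)
    return list(merged.values())


stmts = [
    SqlLike("fund_info", ["fund_name"], [("risk_level", "=", "medium risk")]),
    SqlLike("fund_info", ["fund_size"], [("risk_level", "=", "medium risk")]),
    SqlLike("manager_info", ["manager_name"]),
]
for s in merge_same_table(stmts):
    print(s)
```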
The dialog management (DM) module performs SQL rewriting and SQL execution. SQL rewriting includes executable SQL conversion, multi-table SQL rewriting, expression SQL rewriting, and similar functions. Executable SQL conversion converts the SQL-like statement corresponding to the query information (for example, the second SQL statement corresponding to the query information, or the SQL-like statement generated by the NL2SQL model) into an executable SQL statement. Multi-table SQL rewriting rewrites the virtual-table-based SQL statement (the fourth SQL statement) generated by the NL2SQL model into first SQL statements based on the actual target tables. Expression SQL rewriting rewrites the dynamic columns in an SQL statement so that the rewritten statement can be executed.
The natural language generation (NLG) module generates the reply information presented to the user from the query result data. Specifically, if a reply intervention rule is introduced, that is, reply templates are configured, it is first determined whether the first SQL statement corresponding to the query result data matches a reply template. If it does not, the reply information is generated by the default reply generation rule; if it does, reply information in the format of the matched template is generated. In addition, the query result data may be displayed as a table, producing reply information in table form, or as text, producing a text reply.
The post-processing module updates the dialog context and encapsulates the result.
In the table question-answering process, the NLU module adopts rule-based semantic table selection, so that a dialog robot built on the table question-answering task can answer questions over tables from multiple businesses without degrading model performance, which provides a basic guarantee for system expansion.
In practical applications, table question-answering products are artificial intelligence (AI) products, and bad cases are inevitable during use. The embodiments of the present application support one or more intervention mechanisms. In the table question-answering framework, intervention is needed in two places: the recognition process and the reply process. Recognition-process intervention covers the whole life cycle of table question answering, from basic to advanced intervention: table selection intervention, header intervention (column name synonym intervention), column value intervention, column question-answering type intervention, end-to-end SQL intervention, environment variable intervention, source channel intervention, and so on. Reply intervention acts on the reply process. Across the whole table question-answering process, including NLU and NLG, a scheme for fast intervention is provided, which makes industrial deployment of table question answering feasible.
Fig. 9 is a schematic structural diagram of a processing device for a table question-answering task according to an exemplary embodiment of the present application. The device provided by this embodiment executes the above processing method of the table question-answering task. As shown in Fig. 9, the processing device 90 for the table question-answering task includes: an information acquisition module 91, a table selection module 92, a conversion module 93 and a query module 94.
The information acquisition module 91 is configured to acquire the query information input by the user.
The table selection module 92 is configured to screen out, according to the query information, a target table matching the query information from the plurality of data tables of the table question-answering task.
The conversion module 93 is configured to convert the query information into a first SQL statement based on the target table.
The query module 94 is configured to execute the first SQL statement to query the target table to obtain query result data corresponding to the query information.
In an optional embodiment, when screening out, according to the query information, the target table matching the query information from the plurality of data tables of the table question-answering task, the table selection module 92 is further configured to:
identifying entity information contained in the query information, and generating a second SQL statement corresponding to the query information according to the entity information; and selecting the data table related to the second SQL statement from the plurality of data tables of the table question-answering task as a target table matched with the query information.
In an optional embodiment, when identifying the entity information contained in the query information, the table selection module 92 is further configured to:
search, in a search engine storing the column names and column values of the plurality of data tables of the table question-answering task, for candidate column names and candidate column values that the query information may contain, based on their degree of matching with the query information; and determine the column names contained in the query information from among the candidate column names, and the column values contained in the query information from among the candidate column values.
In an optional embodiment, when searching the search engine for candidate column names based on the degree of matching with the query information, the table selection module 92 is further configured to:
search the search engine for candidate column names whose degree of matching with the query information is greater than or equal to a first matching degree threshold; or rank the column names in the search engine by their degree of matching with the query information and, according to the ranking result, take at least as many top-ranked column names as the column name recall quantity threshold as candidate column names.
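Both recall strategies can be sketched with Python's standard `difflib` standing in for the search engine's relevance score. The matching-degree function (longest common substring length divided by the column name length) is an assumption chosen for illustration, not the scoring used in this application, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def match_degree(name: str, query: str) -> float:
    """Stand-in matching degree: share of `name` covered by its longest
    common substring with the query (1.0 = name appears verbatim)."""
    if not name:
        return 0.0
    a, b = name.lower(), query.lower()
    m = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(0, len(a), 0, len(b))
    return m.size / len(a)


def recall_by_threshold(names: list[str], query: str, threshold: float) -> list[str]:
    """Strategy 1: keep column names scoring >= the first matching degree threshold."""
    return [n for n in names if match_degree(n, query) >= threshold]


def recall_top_k(names: list[str], query: str, k: int) -> list[str]:
    """Strategy 2: rank by matching degree and keep the top k (recall quantity threshold)."""
    return sorted(names, key=lambda n: match_degree(n, query), reverse=True)[:k]


names = ["fund name", "risk level", "fund size", "manager"]
query = "What is the risk level of each fund"
print(recall_by_threshold(names, query, 0.9))  # ['risk level']
print(recall_top_k(names, query, 1))           # ['risk level']
```

The threshold variant adapts the number of candidates to the query, while the top-k variant bounds it; which to prefer likely depends on how many columns the tables have.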
In an optional embodiment, when searching the search engine for candidate column values based on the degree of matching with the query information, the table selection module 92 is further configured to:
group the column values in the search engine by the table and column in which each value is located; rank the column values within each group by their degree of matching with the query information; and, according to the ranking result, take within each group at least as many top-ranked column values as the column value group recall quantity threshold as candidate column values.
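Grouped recall can be sketched the same way. One plausible reason for grouping is that it gives every (table, column) pair its own quota of `k` candidates, so a column with many similar values cannot crowd out the others. The scoring function is the same illustrative `difflib` stand-in as for column names, and the entry format and names are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def match_degree(value: str, query: str) -> float:
    """Stand-in matching degree: longest common substring / value length."""
    if not value:
        return 0.0
    a, b = value.lower(), query.lower()
    m = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(0, len(a), 0, len(b))
    return m.size / len(a)


def recall_grouped(
    entries: list[tuple[str, str, str]], query: str, k: int
) -> dict[tuple[str, str], list[str]]:
    """entries are (table, column, value). Group by (table, column),
    rank values within each group, keep the top k per group."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for table, column, value in entries:
        groups[(table, column)].append(value)
    return {
        key: sorted(vals, key=lambda v: match_degree(v, query), reverse=True)[:k]
        for key, vals in groups.items()
    }


entries = [
    ("fund_info", "risk_level", "medium risk"),
    ("fund_info", "risk_level", "high risk"),
    ("fund_info", "risk_level", "low risk"),
    ("fund_info", "fund_name", "Medical Health Stock A"),
]
print(recall_grouped(entries, "which funds are medium risk", 1))
```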
In an optional embodiment, when identifying the entity information contained in the query information, the table selection module 92 is further configured to:
acquire a dictionary tree (trie) containing aggregation functions, operators and connectors together with their usable phrasings; and look up, in the trie, the aggregation functions, operators and connectors contained in the query information.
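A dictionary tree lookup of this kind can be sketched as a character trie scanned with greedy longest match. The vocabulary (English phrasings such as "greater than" mapped to `>`), the word-boundary check and the class names are assumptions for an English example; for text without spaces between words, the boundary check would need to be dropped or replaced.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TrieNode:
    children: dict[str, "TrieNode"] = field(default_factory=dict)
    label: Optional[tuple[str, str]] = None  # (kind, canonical form) if a phrase ends here


class PhraseTrie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, phrase: str, kind: str, canonical: str) -> None:
        node = self.root
        for ch in phrase.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.label = (kind, canonical)

    def scan(self, text: str) -> list[tuple[str, str, str]]:
        """Greedy longest-match scan; returns (surface text, kind, canonical)."""
        low = text.lower()
        found = []
        i = 0
        while i < len(low):
            if i > 0 and low[i - 1].isalnum():  # only start matches at a word boundary
                i += 1
                continue
            node, j, best = self.root, i, None
            while j < len(low) and low[j] in node.children:
                node = node.children[low[j]]
                j += 1
                # accept only matches that also end at a word boundary
                if node.label is not None and (j == len(low) or not low[j].isalnum()):
                    best = (j, node.label)
            if best is None:
                i += 1
            else:
                end, (kind, canonical) = best
                found.append((text[i:end], kind, canonical))
                i = end
        return found


trie = PhraseTrie()
for phrase, kind, canonical in [
    ("average", "aggregation", "AVG"), ("mean", "aggregation", "AVG"),
    ("greater than", "operator", ">"), ("more than", "operator", ">"),
    ("and", "connector", "AND"), ("or", "connector", "OR"),
]:
    trie.add(phrase, kind, canonical)

print(trie.scan("Average fund size for funds with returns greater than 5% and risk low"))
# [('Average', 'aggregation', 'AVG'), ('greater than', 'operator', '>'), ('and', 'connector', 'AND')]
```

The boundary checks keep "or" inside "for" from being reported as a connector, while the longest-match rule lets multi-word phrasings such as "greater than" win over any shorter prefix.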
In an optional embodiment, when generating the second SQL statement corresponding to the query information according to the entity information, the table selection module 92 is further configured to:
combine the entity information to obtain the query result clauses and query condition clauses of the current round of query; generate third SQL statements according to the query result clauses and query condition clauses, where different third SQL statements involve different data tables; rank the third SQL statements according to the number of query result clauses and/or the number of query condition clauses each contains; and, according to the ranking result, select no more than an SQL statement quantity threshold of third SQL statements as the second SQL statements corresponding to the query information, where the SQL statement quantity threshold is a positive integer.
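Generating, ranking and selecting third SQL statements can be sketched as follows. Recognized column names become query result clauses and recognized column values become query condition clauses, grouped per table; candidates are ranked by total clause count, then by condition count. That ordering, the data shapes and all names are assumptions, since the text only states that the ranking uses the clause counts.

```python
from dataclasses import dataclass, field


@dataclass
class ThirdSql:
    table: str
    select: list[str] = field(default_factory=list)             # query result clauses
    where: list[tuple[str, str]] = field(default_factory=list)  # query condition clauses


def build_third_sqls(
    column_hits: list[tuple[str, str]], value_hits: list[tuple[str, str, str]]
) -> list[ThirdSql]:
    """column_hits: (table, column); value_hits: (table, column, value).
    One candidate per table, so different candidates involve different tables."""
    by_table: dict[str, ThirdSql] = {}
    for table, column in column_hits:
        by_table.setdefault(table, ThirdSql(table)).select.append(column)
    for table, column, value in value_hits:
        by_table.setdefault(table, ThirdSql(table)).where.append((column, value))
    return list(by_table.values())


def select_second_sqls(candidates: list[ThirdSql], max_count: int) -> list[ThirdSql]:
    """Rank by total clause count, then condition count; keep at most max_count."""
    if max_count < 1:
        raise ValueError("the SQL statement quantity threshold must be a positive integer")
    ranked = sorted(candidates, key=lambda c: (len(c.select) + len(c.where), len(c.where)), reverse=True)
    return ranked[:max_count]


def target_tables(second_sqls: list[ThirdSql]) -> list[str]:
    """The tables involved in the selected second SQL statements are the target tables."""
    return [s.table for s in second_sqls]


cands = build_third_sqls(
    [("fund_info", "fund_name"), ("manager_info", "manager_name")],
    [("fund_info", "risk_level", "medium risk")],
)
print(target_tables(select_second_sqls(cands, 1)))  # ['fund_info']
```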
In an optional embodiment, when generating the plurality of third SQL statements according to the query result clauses and query condition clauses, the table selection module 92 is further configured to:
acquire the query result clauses and query condition clauses obtained in historical rounds of query; and generate a plurality of third SQL statements according to the query result clauses and query condition clauses of both the current round and the historical rounds.
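One possible inheritance policy is sketched below. The text does not specify how current and historical clauses are combined, so the policy in the docstring and the function name `inherit_clauses` are assumptions.

```python
def inherit_clauses(
    cur_select: list[str],
    cur_where: list[tuple[str, str]],
    hist_select: list[str],
    hist_where: list[tuple[str, str]],
) -> tuple[list[str], list[tuple[str, str]]]:
    """Combine current-round clauses with those of the previous round.

    Assumed policy: reuse the historical result columns when the current
    round names none, and keep historical conditions only on columns the
    current round does not constrain (current conditions override).
    """
    select = list(cur_select) if cur_select else list(hist_select)
    constrained = {col for col, _ in cur_where}
    where = list(cur_where) + [(c, v) for c, v in hist_where if c not in constrained]
    return select, where


# Round 1: "fund names of medium-risk funds"; round 2: "what about high risk?"
print(inherit_clauses([], [("risk_level", "high risk")], ["fund_name"], [("risk_level", "medium risk")]))
# (['fund_name'], [('risk_level', 'high risk')])
```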
In an optional embodiment, when converting the query information into the first SQL statement based on the target table, the conversion module 93 is further configured to:
if there are multiple target tables, determine a virtual table according to the column names of the target tables, the virtual table containing the columns of the target tables; convert the query information into a fourth SQL statement based on the virtual table; and rewrite the fourth SQL statement into first SQL statements based on the target tables, where each first SQL statement involves one target table.
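The virtual-table rewrite can be sketched as a column-to-table map plus a split of the fourth SQL statement by table. The table and column names are hypothetical, the sketch assumes column names are unique across the target tables, and it only places each clause in the statement of the table that owns its column; conditions that would require a join across tables are not handled.

```python
def build_virtual_table(target_tables: dict[str, list[str]]) -> dict[str, str]:
    """Virtual table = union of the target tables' columns.

    Returns a map from virtual column to the target table that owns it.
    Assumes column names are unique across tables (on a clash, the first table wins).
    """
    owner: dict[str, str] = {}
    for table, columns in target_tables.items():
        for col in columns:
            owner.setdefault(col, table)
    return owner


def quote(value: str) -> str:
    """SQL string literal with embedded single quotes doubled."""
    return "'" + value.replace("'", "''") + "'"


def rewrite_fourth_sql(
    select: list[str], where: list[tuple[str, str]], owner: dict[str, str]
) -> list[str]:
    """Split a statement over the virtual table into one statement per target table."""
    per_table: dict[str, tuple[list[str], list[tuple[str, str]]]] = {}
    for col in select:
        per_table.setdefault(owner[col], ([], []))[0].append(col)
    for col, val in where:
        per_table.setdefault(owner[col], ([], []))[1].append((col, val))
    statements = []
    for table, (cols, conds) in per_table.items():
        sql = f"SELECT {', '.join(cols) or '*'} FROM {table}"
        if conds:
            sql += " WHERE " + " AND ".join(f"{c} = {quote(v)}" for c, v in conds)
        statements.append(sql)
    return statements


owner = build_virtual_table({"fund_info": ["fund_name", "risk_level"], "fund_nav": ["nav", "nav_date"]})
for sql in rewrite_fourth_sql(["fund_name", "nav"], [("risk_level", "medium risk"), ("nav_date", "2022-10-14")], owner):
    print(sql)
# SELECT fund_name FROM fund_info WHERE risk_level = 'medium risk'
# SELECT nav FROM fund_nav WHERE nav_date = '2022-10-14'
```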
In an optional embodiment, the information acquisition module 91 is further configured to acquire environment variable information of the query information input by the user, and the table selection module 92 is further configured to identify the entity information contained in the environment variable information.
When generating the second SQL statement corresponding to the query information according to the entity information, the table selection module 92 is further configured to generate the second SQL statement according to both the entity information contained in the environment variable information and the entity information contained in the query information.
In an optional embodiment, before identifying the entity information contained in the query information, the table selection module 92 is further configured to:
determine whether the query information matches a table selection intervention rule; if it does, use the data table corresponding to the matched table selection intervention rule as the target table matching the query information; and if it does not, identify the entity information contained in the query information and perform the subsequent steps, so as to screen out the target table matching the query information from the plurality of data tables of the table question-answering task.
In an optional embodiment, before identifying the entity information contained in the query information, the table selection module 92 is further configured to: determine, according to the source channel of the query information input by the user, whether the configured specific channel set contains that source channel; if it does, use the data table or column corresponding to the source channel as the target table or target column matching the query information; and if it does not, identify the entity information contained in the query information and perform the subsequent steps, so as to screen out the target table matching the query information from the plurality of data tables of the table question-answering task.
In an optional embodiment, the processing device 90 for the table question-answering task further includes:
an end-to-end SQL intervention module, configured to determine whether the query information matches an end-to-end SQL intervention rule.
The query module 94 is further configured to: if the query information matches an end-to-end SQL intervention rule, execute the SQL statement corresponding to the matched end-to-end SQL intervention rule.
The table selection module 92 is further configured to: if the query information does not match any end-to-end SQL intervention rule, screen out the target table matching the query information from the plurality of data tables of the table question-answering task according to the query information.
In an optional embodiment, the processing device 90 for the table question-answering task further includes:
a reply module, configured to: determine whether the first SQL statement matches a reply template, where the reply template configures the format information of the reply segments contained in the reply information, and the reply segments include result segments and/or condition segments; if the first SQL statement matches a reply template, generate reply information according to the query result data and the reply template; and if it does not, generate reply segments according to the query result data and assemble them into reply information.
The apparatus provided by this embodiment may be used to execute the method provided by any of the above embodiments; its specific functions and achievable technical effects are not repeated here.
Fig. 10 is a schematic structural diagram of an electronic device according to an example embodiment of the present application. As shown in fig. 10, the electronic device 100 includes: a processor 1001, and a memory 1002 communicatively coupled to the processor 1001, the memory 1002 storing computer-executable instructions.
The processor executes the computer-executable instructions stored in the memory to implement the solution provided by any of the above method embodiments; the specific functions and achievable technical effects are not repeated here.
The embodiment of the present application further provides a computer-readable storage medium, where computer-executable instructions are stored in the computer-readable storage medium, and when the computer-executable instructions are executed by a processor, the computer-executable instructions are used to implement the solutions provided in any of the above method embodiments, and specific functions and technical effects that can be achieved are not described herein again.
An embodiment of the present application further provides a computer program product including a computer program stored in a readable storage medium. At least one processor of an electronic device can read the computer program from the readable storage medium and execute it, so that the electronic device performs the solution provided by any of the above method embodiments; the specific functions and achievable technical effects are not repeated here.
In addition, some of the flows described in the above embodiments and drawings contain multiple operations in a particular order, but it should be clearly understood that these operations may be executed out of the order presented herein or in parallel; sequence numbers of operations serve only to distinguish different operations and do not by themselves represent any execution order. The flows may also include more or fewer operations, which may be executed sequentially or in parallel. Note that the terms "first", "second", and so on in this document are used to distinguish different messages, devices, modules, and the like; they do not represent a sequential order, nor do they require that the items labeled "first" and "second" be of different types. Unless explicitly defined otherwise, "plurality" means two or more.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention that follow its general principles, including such departures from the present disclosure as come within known or customary practice in the art to which the invention pertains. The specification and examples are to be considered exemplary only, with the true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (14)

1. A processing method for a table question-answering task, characterized by comprising the following steps:
acquiring query information input by a user;
according to the query information, screening a target table matched with the query information from a plurality of data tables of a table question-answering task;
converting the query information into a first SQL statement based on the target table;
and executing the first SQL statement to query the target table to obtain query result data corresponding to the query information.
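The four steps of claim 1 (acquire the question, screen a target table, convert to SQL, execute) can be illustrated with a minimal sketch. It assumes an in-memory SQLite database and uses deliberately naive stand-ins: table screening by counting schema names that occur in the question, and conversion by selecting mentioned columns and adding equality conditions for mentioned cell values. The functions `select_target_table`, `to_sql` and `answer` are hypothetical names for illustration, not the applicant's implementation.

```python
import sqlite3


def select_target_table(query, schemas):
    """Naive stand-in for table screening: pick the table whose name or
    column names occur most often in the query. `schemas` maps
    table name -> list of column names."""
    def score(item):
        table, columns = item
        return sum(1 for name in [table, *columns] if name in query)

    best_table, _ = max(schemas.items(), key=score)
    return best_table


def to_sql(conn, query, table, columns):
    """Naive stand-in for NL-to-SQL conversion: select the columns named in
    the query and add an equality condition for every stored text value the
    query mentions. Identifiers come from the trusted schema; values are
    bound as parameters."""
    selected = [c for c in columns if c in query] or ["*"]
    conditions, params = [], []
    for col in columns:
        for (value,) in conn.execute(f"SELECT DISTINCT {col} FROM {table}"):
            if isinstance(value, str) and value in query:
                conditions.append(f"{col} = ?")
                params.append(value)
    sql = f"SELECT {', '.join(selected)} FROM {table}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


def answer(conn, query, schemas):
    """Claim 1 end to end: screen a target table, convert, execute."""
    table = select_target_table(query, schemas)
    sql, params = to_sql(conn, query, table, schemas[table])
    return sql, conn.execute(sql, params).fetchall()
```

With a `fruit(name, price)` table containing `('apple', 3.5)`, the question "What is the price of apple?" selects `fruit`, produces `SELECT price FROM fruit WHERE name = ?` bound to `'apple'`, and returns `[(3.5,)]`. Both heuristics are placeholders; claims 2 to 7 describe a more elaborate, entity-based table screening.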
2. The method according to claim 1, wherein the screening, according to the query information, of a target table matching the query information from the plurality of data tables of the table question-answering task comprises:
identifying entity information contained in the query information, and generating a second SQL statement corresponding to the query information according to the entity information;
and selecting the data table related to the second SQL statement from a plurality of data tables of the table question-answering task as a target table matched with the query information.
3. The method of claim 2, wherein the identifying entity information included in the query information comprises:
in a search engine storing the column names and column values of the plurality of data tables of the table question-answering task, searching, based on the degree of matching with the query information, for candidate column names and candidate column values that the query information may contain;
and determining, from among the candidate column names, the column names contained in the query information, and determining, from among the candidate column values, the column values contained in the query information.
4. The method of claim 3, wherein the searching, in a search engine storing the column names and column values of the plurality of data tables of the table question-answering task and based on the degree of matching with the query information, for candidate column values that the query information may contain comprises:
grouping the column values in the search engine by table and by column;
sorting the column values in each group according to their degree of matching with the query information;
and determining, within each group and according to the sorting result of the column values, candidate column values whose number is at least a per-group column-value recall threshold.
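Claim 4 groups column values by table and column, ranks each group by degree of matching with the question, and recalls candidates per group. The claim does not specify the matching measure or how the recall count is applied, so the sketch below assumes both: the degree of matching is the fraction of a value covered by its longest common substring with the question (computed with `difflib.SequenceMatcher`), and the per-group recall threshold is treated as a fixed cutoff `k`. The index is modeled as a list of `(table, column, value)` tuples rather than a real search engine; `match_degree` and `recall_column_values` are hypothetical names.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def match_degree(value, query):
    """Fraction of `value` covered by its longest common substring with `query`
    (an assumed stand-in for the search engine's relevance score)."""
    a, b = value.lower(), query.lower()
    if not a:
        return 0.0
    m = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(0, len(a), 0, len(b))
    return m.size / len(a)


def recall_column_values(index, query, k):
    """index: iterable of (table, column, value) entries.

    Returns {(table, column): [up to k values, best match first]}.
    """
    groups = defaultdict(list)
    for table, column, value in index:
        groups[(table, column)].append(value)  # group by table and by column
    candidates = {}
    for key, values in groups.items():
        # rank inside the group only, then apply the per-group cutoff
        ranked = sorted(values, key=lambda v: match_degree(v, query), reverse=True)
        candidates[key] = ranked[:k]
    return candidates
```

Ranking within each (table, column) group rather than globally means every column contributes its own best candidates, so a column with many loosely matching values cannot crowd out the others before the later step that decides which values the question actually contains.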
5. The method of claim 2, wherein the identifying entity information contained in the query information comprises:
acquiring a dictionary tree (trie) containing the available phrasings of aggregation functions, operators and connectors;
and searching the dictionary tree for the aggregation functions, operators and connectors contained in the query information.
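Claim 5 looks up aggregation functions, operators and connectors in a dictionary tree of their possible phrasings. A minimal sketch of such a character trie with a greedy longest-match scan follows; the vocabulary and its mapping to SQL tokens (for example "average" to `AVG`) are illustrative assumptions, not taken from the patent, and `Trie`, `build_trie` and `VOCABULARY` are hypothetical names.

```python
_END = object()  # sentinel key marking the end of a stored phrasing


class Trie:
    """Character trie mapping phrasings to (category, SQL token) payloads."""

    def __init__(self):
        self.root = {}

    def insert(self, phrase, payload):
        node = self.root
        for ch in phrase:
            node = node.setdefault(ch, {})
        node[_END] = payload

    def scan(self, text):
        """Greedy longest-match scan over `text`; returns [(phrase, payload), ...]."""
        found, i = [], 0
        while i < len(text):
            node, j, last = self.root, i, None
            while j < len(text) and text[j] in node:
                node = node[text[j]]
                j += 1
                if _END in node:
                    last = (j, node[_END])  # remember the longest match so far
            if last is None:
                i += 1  # no phrasing starts here; move on one character
            else:
                end, payload = last
                found.append((text[i:end], payload))
                i = end
        return found


def build_trie(vocabulary):
    trie = Trie()
    for phrase, payload in vocabulary.items():
        trie.insert(phrase, payload)
    return trie


# Illustrative vocabulary (assumed, not from the patent).
VOCABULARY = {
    "average": ("aggregation", "AVG"),
    "total": ("aggregation", "SUM"),
    "greater than": ("operator", ">"),
    "greater than or equal to": ("operator", ">="),
    "and": ("connector", "AND"),
    "or": ("connector", "OR"),
}
```

Longest match matters for phrasings that share a prefix: in "greater than or equal to 100", the scan returns `>=` rather than `>` followed by `OR`. The scan works character by character without word boundaries, which suits unsegmented text such as Chinese; for English input a boundary check would keep "or" from matching inside words like "for".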
6. The method according to claim 2, wherein the generating a second SQL statement corresponding to the query information according to the entity information comprises:
assembling the entity information to obtain the query result clauses and query condition clauses of the current round of querying;
generating a plurality of third SQL statements according to the query result clauses and the query condition clauses, wherein different third SQL statements involve different data tables;
sorting the third SQL statements according to the number of query result clauses and/or the number of query condition clauses contained in each third SQL statement;
and selecting, according to the sorting result, at most an SQL statement quantity threshold of the third SQL statements as the second SQL statements corresponding to the query information, wherein the SQL statement quantity threshold is a positive integer.
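Claim 6 ranks candidate ("third") SQL statements by how many result clauses and condition clauses they contain and keeps at most a threshold number of them as the "second" SQL statements. The sketch below assumes each candidate is a small record rather than SQL text, and that more clauses means a better candidate, ordering first by result clauses and then by condition clauses; the claim fixes neither the representation nor this tie-break. `Candidate` and `top_candidates` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A candidate ("third") SQL statement over a single data table."""
    table: str
    result_clauses: list = field(default_factory=list)     # e.g. ["AVG(salary)"]
    condition_clauses: list = field(default_factory=list)  # e.g. ["dept = 'sales'"]

    def to_sql(self):
        sql = f"SELECT {', '.join(self.result_clauses) or '*'} FROM {self.table}"
        if self.condition_clauses:
            sql += " WHERE " + " AND ".join(self.condition_clauses)
        return sql


def top_candidates(candidates, max_statements):
    """Sort by (#result clauses, #condition clauses), descending, and keep
    at most `max_statements` (the SQL statement quantity threshold)."""
    if max_statements < 1:
        raise ValueError("the SQL statement quantity threshold must be a positive integer")
    ranked = sorted(
        candidates,
        key=lambda c: (len(c.result_clauses), len(c.condition_clauses)),
        reverse=True,
    )
    return ranked[:max_statements]
```

Because each candidate involves a different data table, the tables of the surviving statements are exactly the tables that claim 2 selects as target tables.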
7. The method of claim 6, wherein generating the plurality of third SQL statements according to the query result clauses and the query condition clauses comprises:
acquiring the query result clauses and query condition clauses obtained in previous rounds of querying;
and generating the plurality of third SQL statements according to the query result clauses and query condition clauses of both the current round and the previous rounds of querying.
8. The method of claim 1, wherein converting the query information into a first SQL statement based on the target table comprises:
if there are multiple target tables, determining a virtual table according to the column names of the target tables, wherein the virtual table comprises the columns of the target tables;
converting the query information into a fourth SQL statement based on the virtual table;
and rewriting the fourth SQL statement into first SQL statements based on the target tables, each first SQL statement involving one of the target tables.
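Claim 8 merges the columns of several target tables into one virtual table, converts the question against it, and then rewrites the resulting "fourth" statement into per-table "first" statements. The sketch below assumes the fourth statement is held in structured form (selected virtual columns plus equality conditions) rather than as SQL text, and that every virtual column maps back to exactly one source column; column names shared by several tables are qualified as `table.column` to keep that mapping one-to-one. `build_virtual_table` and `rewrite_per_table` are hypothetical names.

```python
def build_virtual_table(schemas):
    """Map each virtual-table column to its (source table, source column).

    schemas: {table: [column, ...]}. Column names occurring in more than one
    table are qualified as "table.column" so the mapping stays one-to-one.
    """
    counts = {}
    for columns in schemas.values():
        for col in columns:
            counts[col] = counts.get(col, 0) + 1
    virtual = {}
    for table, columns in schemas.items():
        for col in columns:
            name = col if counts[col] == 1 else f"{table}.{col}"
            virtual[name] = (table, col)
    return virtual


def rewrite_per_table(virtual, select_cols, conditions):
    """Split a query over the virtual table into one SQL statement per source table.

    select_cols: [virtual column, ...]; conditions: [(virtual column, value), ...].
    Returns {table: (sql, params)} with values bound as parameters.
    """
    selects, wheres = {}, {}
    for vcol in select_cols:
        table, col = virtual[vcol]
        selects.setdefault(table, []).append(col)
    for vcol, value in conditions:
        table, col = virtual[vcol]
        wheres.setdefault(table, []).append((col, value))
    statements = {}
    for table, cols in selects.items():
        conds = wheres.get(table, [])
        sql = f"SELECT {', '.join(cols)} FROM {table}"
        if conds:
            sql += " WHERE " + " AND ".join(f"{c} = ?" for c, _ in conds)
        statements[table] = (sql, [v for _, v in conds])
    return statements
```

This sketch generates no joins, and a condition on a table that contributes no selected column is dropped; it only illustrates the "one first statement per target table" rewriting step.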
9. The method according to any one of claims 2-7, further comprising:
acquiring environment variable information associated with the query information input by the user, and identifying entity information contained in the environment variable information;
the generating of the second SQL statement corresponding to the query information according to the entity information includes:
and generating a second SQL statement corresponding to the query information according to the entity information contained in the environment variable information and the entity information contained in the query information.
10. The method according to any one of claims 2-7, further comprising, before the identifying of the entity information contained in the query information:
determining whether the query information matches a table selection intervention rule;
if the query information matches a table selection intervention rule, taking the data table corresponding to the matched table selection intervention rule as the target table matching the query information;
and if the query information does not match any table selection intervention rule, performing the step of identifying the entity information contained in the query information and the subsequent steps, so as to screen out a target table matching the query information from the plurality of data tables of the table question-answering task.
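Claim 10 places configurable table selection intervention rules ahead of entity-based screening. A minimal sketch, assuming each rule is a compiled regular expression paired with a table name and that the first matching rule wins; the rule format, `select_table`, and the `fallback` callable (standing in for the screening of claims 2 to 7) are hypothetical.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical rule format: (compiled regex over the question, table name).
Rule = Tuple[re.Pattern, str]


def select_table(query: str, rules: List[Rule],
                 fallback: Callable[[str], str]) -> str:
    """Apply table selection intervention rules before entity-based screening.

    The first rule whose pattern matches the query decides the target table;
    if none matches, `fallback` performs the normal screening.
    """
    for pattern, table in rules:
        if pattern.search(query):
            return table
    return fallback(query)
```

The same shape fits claims 11 and 12: claim 11 keys the override on the source channel instead of the question text, and claim 12 maps a matched question directly to a preset SQL statement rather than to a table.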
11. The method according to any one of claims 2-7, further comprising, before the identifying of the entity information contained in the query information:
determining, according to the source channel through which the user input the query information, whether a configured specific channel set includes that source channel;
if the specific channel set includes the source channel, taking the data table or column corresponding to the source channel as the target table or target column matching the query information;
and if the specific channel set does not include the source channel, performing the step of identifying the entity information contained in the query information and the subsequent steps, so as to screen out a target table matching the query information from the plurality of data tables of the table question-answering task.
12. The method according to any one of claims 1-8, further comprising, before the screening, according to the query information, of a target table matching the query information from the plurality of data tables of the table question-answering task:
determining whether the query information matches an end-to-end SQL intervention rule;
if the query information matches an end-to-end SQL intervention rule, executing the SQL statement corresponding to the matched end-to-end SQL intervention rule;
and if the query information does not match any end-to-end SQL intervention rule, screening, according to the query information, a target table matching the query information from the plurality of data tables of the table question-answering task.
13. The method according to any one of claims 1-8, further comprising, after executing the first SQL statement to query the target table and obtain the query result data corresponding to the query information:
determining whether the first SQL statement matches a reply template, wherein the reply template is configured with format information for the reply fragments contained in the reply information, and the reply fragments comprise result fragments and/or condition fragments;
if the first SQL statement matches a reply template, generating reply information according to the query result data and the reply template;
and if the first SQL statement does not match any reply template, generating reply fragments according to the query result data and assembling the reply fragments into reply information.
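Claim 13 renders the answer through a reply template matched to the first SQL statement or, when no template matches, by generating reply fragments and assembling them. The sketch assumes templates are keyed by a regular expression over the SQL text, with named groups capturing condition values, and filled via `str.format_map` together with the first result row; the template syntax, the fallback "<column> is <value>" result fragments and the empty-result message are illustrative assumptions, and `render_reply` is a hypothetical name.

```python
import re


def render_reply(sql, rows, columns, templates):
    """Build reply text for a first SQL statement and its result rows.

    templates: list of (regex over the SQL text, format string). Named groups
    captured from the SQL (condition values) and the first result row (keyed
    by column name) fill the template. Without a matching template, one
    "<column> is <value>" result fragment is generated per column and the
    fragments are joined into a reply.
    """
    if not rows:
        return "No matching records were found."  # assumed empty-result wording
    record = dict(zip(columns, rows[0]))
    for pattern, template in templates:
        match = re.search(pattern, sql)
        if match:
            return template.format_map({**match.groupdict(), **record})
    fragments = [f"{col} is {val}" for col, val in record.items()]
    return "; ".join(fragments) + "."
```

For example, a template `(r"SELECT price FROM fruit WHERE name = '(?P<name>\w+)'", "The price of {name} is {price}.")` turns the statement `SELECT price FROM fruit WHERE name = 'apple'` with result `[(3.5,)]` into "The price of apple is 3.5.", combining a condition fragment (the name) with a result fragment (the price).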
14. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored by the memory to implement the method of any of claims 1-13.
CN202211269215.4A 2022-10-17 2022-10-17 Processing method and equipment for table question-answering task Pending CN115577085A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211269215.4A CN115577085A (en) 2022-10-17 2022-10-17 Processing method and equipment for table question-answering task

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211269215.4A CN115577085A (en) 2022-10-17 2022-10-17 Processing method and equipment for table question-answering task

Publications (1)

Publication Number Publication Date
CN115577085A true CN115577085A (en) 2023-01-06

Family

ID=84584488

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211269215.4A Pending CN115577085A (en) 2022-10-17 2022-10-17 Processing method and equipment for table question-answering task

Country Status (1)

Country Link
CN (1) CN115577085A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116737909A (en) * 2023-07-28 2023-09-12 无锡容智技术有限公司 Table data processing method based on natural language dialogue
CN116737909B (en) * 2023-07-28 2024-04-23 无锡容智技术有限公司 Table data processing method based on natural language dialogue

Similar Documents

Publication Publication Date Title
US11615791B2 (en) Voice application platform
US11790904B2 (en) Voice application platform
US11887597B2 (en) Voice application platform
JP6634515B2 (en) Question clustering processing method and apparatus in automatic question answering system
US20180210883A1 (en) System for converting natural language questions into sql-semantic queries based on a dimensional model
CN107798123B (en) Knowledge base and establishing, modifying and intelligent question and answer methods, devices and equipment thereof
US11281864B2 (en) Dependency graph based natural language processing
EP3671526B1 (en) Dependency graph based natural language processing
US11437029B2 (en) Voice application platform
CN109902087B (en) Data processing method and device for questions and answers and server
CN112328489B (en) Test case generation method and device, terminal equipment and storage medium
US20220004547A1 (en) Method, apparatus, system, device, and storage medium for answering knowledge questions
CN114090760B (en) Data processing method of table question and answer, electronic equipment and readable storage medium
CN115577085A (en) Processing method and equipment for table question-answering task
CN114647719A (en) Question-answering method and device based on knowledge graph
EP3617970A1 (en) Automatic answer generation for customer inquiries
WO2019236444A1 (en) Voice application platform
CN110032574A (en) The processing method and processing device of SQL statement
CN114357137A (en) Knowledge graph-based question-answering method, knowledge graph-based question-answering equipment, knowledge graph-based storage medium and question-answering robot
CN106682221B (en) Question-answer interaction response method and device and question-answer system
CN112966031A (en) Data processing method and device, electronic equipment and computer readable storage medium
EP3944127A1 (en) Dependency graph based natural language processing
US20240152511A1 (en) Transliteration of machine interpretable languages for enhanced compaction
CN114064862A (en) Question answering method, device and equipment
CN117493369A (en) Data retrieval method and device, storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination