CN114090619B - Query processing method and device for natural language - Google Patents

Info

Publication number
CN114090619B
Authority
CN
China
Prior art keywords
sequence
natural language
grammar
tag
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210058317.5A
Other languages
Chinese (zh)
Other versions
CN114090619A (en)
Inventor
田有朋
李俊
黄亚东
王小卫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202210058317.5A (CN114090619B)
Priority to CN202211411569.8A (CN115687397A)
Publication of CN114090619A
Application granted
Publication of CN114090619B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/243 Natural language query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages

Abstract

The embodiments of this specification provide a natural language query processing method and device. The method comprises: acquiring a tag sequence corresponding to a target statement, wherein the target statement is a natural language statement used for requesting to query data from a data storage system, and a single tag in the tag sequence indicates the entity category to which the corresponding word in the target statement belongs; parsing the tag sequence to generate a natural language syntax tree; and generating, according to the natural language syntax tree, a query statement for querying the data storage system.

Description

Query processing method and device for natural language
Technical Field
One or more embodiments of the present disclosure relate to the field of computers, and in particular, to a method and an apparatus for query processing in natural language.
Background
When a natural language (NL) statement is converted into a query language capable of querying a database, for example into a Structured Query Language (SQL) statement, the entities in the natural language statement are usually mapped using very simple semantic mapping rules. Natural language statements with complicated semantics, for example statements involving a "logical relationship" and/or an "arithmetic relationship", therefore cannot be converted accurately, and an accurate query result cannot subsequently be obtained from the data storage system.
It is desirable to have a new solution that can support more complex natural language based data query scenarios.
Disclosure of Invention
One or more embodiments of the present specification provide a method and an apparatus for query processing in natural language.
In a first aspect, a query processing method is provided, including: acquiring a tag sequence corresponding to a target statement, wherein the target statement is a natural language statement used for requesting to query data from a data storage system, and a single tag in the tag sequence is used for indicating an entity category to which a word corresponding to the tag in the target statement belongs; parsing the tag sequence to generate a natural language syntax tree; and generating a query statement for querying the data storage system according to the natural language syntax tree.
In one possible embodiment, the method further comprises: determining the attribute value corresponding to each tag in the tag sequence. Generating a query statement for querying the data storage system according to the natural language syntax tree then comprises: generating a grammar parsing result according to the natural language syntax tree and the attribute values respectively corresponding to the tags, and generating the query statement according to the grammar parsing result.
In a possible implementation manner, the obtaining a tag sequence corresponding to a target sentence includes: performing word segmentation on a target sentence to obtain a corresponding word sequence; and determining entity categories to which all words in the word sequence respectively belong, and forming a label sequence by utilizing the entity categories to which all the words respectively belong.
In a possible implementation, the parsing the tag sequence to generate a natural language syntax tree specifically includes: matching the label sequence with a grammar unit composition rule to determine a plurality of grammar units which are sequentially arranged, wherein a single grammar unit corresponds to one label in the label sequence or a plurality of labels which are sequentially arranged; and generating a natural language grammar tree according to the label sequence and the plurality of grammar units which are sequentially arranged.
In one possible embodiment, the parsing the tag sequence to generate a natural language syntax tree further includes: and determining whether the plurality of grammar units arranged in sequence are matched with a grammar unit combination rule.
In one possible implementation, the plurality of leaf nodes sequentially arranged in the natural language syntax tree are each tag sequentially arranged in the tag sequence, the plurality of child nodes of the root node in the natural language syntax tree are the plurality of syntax units sequentially arranged, and a single syntax unit is connected to each tag corresponding to the syntax unit.
In one possible embodiment, a single tag in the tag sequence is specified as one of the following entity classes: time, dimension value, logical relationship, arithmetic relationship, number, and query object.
In a possible implementation manner, the plurality of grammar units arranged in sequence comprise a modification object and a number of modifiers; the grammar unit composition rule indicates the mapping relationship between tags and the modification object or the modifiers.
In one possible embodiment, the mapping relationship includes at least one of the following: when a plurality of sequentially arranged tags comprise a plurality of dimension values arranged at intervals, and the tag between every two adjacent dimension values is a logical relationship, the grammar unit corresponding to those tags is a modifier; when a plurality of sequentially arranged tags are, in order, a dimension, an arithmetic relationship, and a number, the grammar unit corresponding to those tags is a modifier.
In a second aspect, a natural language query processing apparatus is provided, including: an acquisition unit configured to acquire a tag sequence corresponding to a target sentence, wherein the target sentence is a natural language sentence used for requesting to query data from a data storage system, and a single tag in the tag sequence indicates the entity category to which the corresponding word in the target sentence belongs; a parsing unit configured to parse the tag sequence to generate a natural language syntax tree; and a generating unit configured to generate, according to the natural language syntax tree, a query statement for querying the data storage system.
In a possible embodiment, the apparatus further comprises an interpolation unit configured to determine the attribute value corresponding to each tag in the tag sequence. The generating unit is specifically configured to generate a grammar parsing result according to the natural language syntax tree and the attribute values corresponding to the tags, and to generate the query statement according to the grammar parsing result.
In a possible implementation manner, the obtaining unit is specifically configured to perform word segmentation on the target sentence to obtain a word sequence corresponding to the target sentence; and determining entity categories to which all words in the word sequence respectively belong, and forming a label sequence by utilizing the entity categories to which all the words respectively belong.
In a possible implementation manner, the parsing unit is specifically configured to match the tag sequence against a grammar unit composition rule to determine a plurality of grammar units arranged in sequence, where a single grammar unit corresponds to one tag or to a plurality of consecutive tags in the tag sequence; and to generate a natural language syntax tree according to the tag sequence and the plurality of grammar units arranged in sequence.
In a possible implementation manner, the parsing unit is further configured to determine whether the plurality of syntax units arranged in sequence match the syntax unit combination rule.
In one possible implementation, the plurality of leaf nodes sequentially arranged in the natural language syntax tree are each tag sequentially arranged in the tag sequence, the plurality of child nodes of the root node in the natural language syntax tree are the plurality of syntax units sequentially arranged, and a single syntax unit is connected to each tag corresponding to the syntax unit.
In one possible embodiment, a single tag in the tag sequence is specified as one of the following entity classes: time, dimension value, logical relationship, arithmetic relationship, number, and query object.
In a possible implementation manner, the plurality of grammar units arranged in sequence comprise a modification object and a number of modifiers; the grammar unit composition rule indicates the mapping relationship between tags and the modification object or the modifiers.
In one possible embodiment, the mapping relationship includes at least one of the following: when a plurality of sequentially arranged tags comprise a plurality of dimension values arranged at intervals, and the tag between every two adjacent dimension values is a logical relationship, the grammar unit corresponding to those tags is a modifier; when a plurality of sequentially arranged tags are, in order, a dimension, an arithmetic relationship, and a number, the grammar unit corresponding to those tags is a modifier.
In a third aspect, a computer readable storage medium is provided, on which a computer program/instructions are stored; when the computer program/instructions are executed in a computing device, the computing device performs the method according to any implementation of the first aspect.
In a fourth aspect, a computing device is provided, comprising a memory in which a computer program/instructions are stored, and a processor that implements the method according to any implementation of the first aspect when executing the computer program/instructions.
According to the method and the device provided in one or more embodiments of the specification, for a natural language sentence requesting to query data from a data storage system, after a tag sequence corresponding to the natural language sentence is obtained based on an entity type to which each word in the natural language sentence belongs, syntax parsing is performed on the tag sequence corresponding to the natural language sentence to generate a corresponding natural language syntax tree, and then a query sentence capable of directly querying the data storage system is generated according to the natural language syntax tree, so that a more complex data query scene based on a natural language is supported.
Drawings
To illustrate the technical solutions of the embodiments of this specification more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of this specification; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic view of a service scenario of a technical solution provided in an embodiment of the present specification;
FIG. 2 is a flow chart of a method for processing a query in natural language provided in an embodiment of the present specification;
FIG. 3 is a flow chart of another method for processing a query in natural language provided in an embodiment of the present specification;
FIG. 4 is a diagram of a natural language syntax tree provided by way of example in an embodiment of the present specification;
FIG. 5 is a diagram of a syntax parsing result provided by way of example in an embodiment of the present specification;
FIG. 6 is a schematic diagram of a query processing apparatus in natural language provided in an embodiment of the present specification.
Detailed Description
Various non-limiting embodiments provided by the present specification are described in detail below with reference to the attached figures.
Fig. 1 is a schematic view of a service scenario of the technical solution provided in an embodiment of this specification. To lower the threshold for users to work with data in a data storage system, it is generally desirable to support querying the data storage system with natural language statements initiated by the user, where the data storage system may be, for example, a database, a file management system, or a file having a particular format. Current mainstream solutions include seq2SQL and artificial intelligence based Natural Language Processing (NLP), which essentially map the entities in a natural language statement using simple semantic mapping rules in order to translate the statement into a query statement, such as an SQL statement for querying a database. Both approaches have low accuracy and relatively narrow coverage, and cannot support the varied and complex data analysis requirements of real data query scenarios within an enterprise or organization. For example, current advanced seq2SQL algorithms reach only about 80% accuracy for single-table, single-layer aggregation, and query statements translated by artificial intelligence based NLP are not reliable because they depend on the training samples and the training process.
In a typical scenario, referring to Fig. 1, a user initiates the natural language statement "payment amounts of Beijing and Shanghai in the last thirty days" to request data from a database system. The statement involves a relatively complex logical relationship, "Beijing and Shanghai", and a relatively complex arithmetic relationship, "the last thirty days". When a current mainstream solution such as seq2SQL or artificial intelligence based NLP translates this statement into an SQL statement for querying the database system, the resulting SQL statement may not accurately represent the semantics of the natural language statement, and data meeting the user's expectations cannot be queried from the database system.
In view of the above problems, embodiments of the present disclosure provide a method and an apparatus for query processing in natural language, where for a natural language sentence that requests to query data from a data storage system, after a tag sequence corresponding to the natural language sentence is obtained based on entity categories to which each word in the natural language sentence belongs, syntax parsing is performed on the tag sequence corresponding to the natural language sentence to generate a corresponding natural language syntax tree, and then a query sentence that queries the data storage system is generated according to the natural language syntax tree, which is beneficial to supporting a more complex data query scenario based on natural language. For example, for a natural language sentence with complex semantics related to logical relationships and/or arithmetic relationships, the corresponding natural language syntax tree can assist in understanding the semantics of the natural language sentence, and an accurate query sentence for directly querying the data storage system can be generated based on the natural language syntax tree, so that an accurate query result can be queried from the data storage system based on the query sentence.
Fig. 2 is a flowchart of a natural language query processing method provided in an embodiment of the present specification. The method may be performed by any apparatus, device, platform, or device cluster having computing and processing capabilities. As shown in Fig. 2, the method may include at least the following steps 21, 23, and 25.
First, in step 21, a tag sequence corresponding to the target sentence is acquired.
The target sentence is a natural language sentence used for requesting to query data from the data storage system. A single tag in the tag sequence indicates the entity category to which the corresponding word in the target sentence belongs; for example, the category assigned to a single word in the word sequence obtained by segmenting the target sentence may be used as the tag corresponding to that word.
In one possible implementation, referring to Fig. 3, step 21 may include the following steps 211 and 213.
In step 211, the target sentence is segmented to obtain its corresponding word sequence. The data storage system may be configured with a corresponding lexicon and stop word list, and the target sentence is segmented based on the lexicon to generate a segmentation result composed of a plurality of words. When the segmentation result contains words that are not declared in the lexicon, corresponding error prompt information can be generated to instruct the user to provide a natural language sentence that meets the grammar requirements; in addition, stop words in the segmentation result can be removed based on the stop word list, and the words remaining in the segmentation result form the word sequence corresponding to the target sentence. For example, for the natural language sentence "transaction amount of Beijing and Shanghai in the last thirty days", the word sequence obtained by segmentation may be "last thirty days, Beijing, and, Shanghai, transaction amount", in which "of" has been discarded as a stop word.
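The segmentation of step 211 can be illustrated with a minimal sketch. The greedy longest-match strategy, the lexicon contents, and the names LEXICON, STOP_WORDS, and segment are assumptions made for illustration only; the specification does not prescribe a particular segmentation algorithm. The input string is an English gloss that keeps the word order of the example word sequence.

```python
# Hypothetical lexicon and stop word list; in practice these would be
# configured for the data storage system being queried.
LEXICON = {"last thirty days", "beijing", "shanghai", "and", "of", "transaction amount"}
STOP_WORDS = {"of"}


def segment(sentence, lexicon=LEXICON, stop_words=STOP_WORDS):
    """Greedy longest-match segmentation over whitespace-separated tokens."""
    tokens = sentence.lower().split()
    words = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at token i, then shrink it.
        for j in range(len(tokens), i, -1):
            candidate = " ".join(tokens[i:j])
            if candidate in lexicon:
                words.append(candidate)
                i = j
                break
        else:
            # No lexicon entry starts here: surface an error so the user can
            # be prompted to rephrase the sentence.
            raise ValueError(f"word not declared in lexicon: {tokens[i]!r}")
    # Drop stop words; the remaining words form the word sequence.
    return [w for w in words if w not in stop_words]


word_sequence = segment("last thirty days Beijing and Shanghai of transaction amount")
print(word_sequence)
```

Raising an error for undeclared words mirrors the error prompt described above; a production system might instead collect all unknown words before reporting them.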
Step 213, determining entity categories to which each word in the word sequence belongs, and forming a tag sequence by using the entity categories to which each word belongs. The entity categories to which each word in the word sequence belongs can be specifically identified through various entity identification algorithms configured in advance, and then the entity categories to which each word belongs are utilized to form the tag sequence.
A single tag in the tag sequence may specifically be one of a plurality of entity categories such as time, dimension value, logical relationship, arithmetic relationship, number, and query object. In other words, when the entity category of each word in the word sequence is identified by the corresponding entity recognition algorithm, the entity category of a single word may be time, dimension value, logical relationship, arithmetic relationship, number, or query object. For example, for the word sequence "last thirty days, Beijing, and, Shanghai, transaction amount", the corresponding tag sequence may be "time (TIME), dimension value (VALUE), logical relationship (LOGIC_OPERATOR), dimension value (VALUE), query object (MEASURE)". Here "time" may be, for example, a field in the database table being queried, and "Beijing" and "Shanghai" may be field values under a certain field of that table, for example field values under a "city (CITY)" field.
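As a rough illustration of step 213, the sketch below tags each word with a dictionary lookup and two regular expressions. This simple recognizer is an assumption standing in for the pre-configured entity recognition algorithms, which the specification leaves open. The tag names TIME, VALUE, LOGIC_OPERATOR, and MEASURE come from the example above; NUMBER, ENTITY_DICTIONARY, and tag_words are hypothetical names.

```python
import re

# Hypothetical dictionary of known words and their entity categories.
ENTITY_DICTIONARY = {
    "beijing": "VALUE",
    "shanghai": "VALUE",
    "and": "LOGIC_OPERATOR",
    "transaction amount": "MEASURE",
}
TIME_PATTERN = re.compile(r"last \w+ days")    # e.g. "last thirty days"
NUMBER_PATTERN = re.compile(r"\d+(\.\d+)?")    # e.g. "100000"


def tag_words(words):
    """Return the tag sequence for a word sequence, one tag per word."""
    tags = []
    for word in words:
        if word in ENTITY_DICTIONARY:
            tags.append(ENTITY_DICTIONARY[word])
        elif TIME_PATTERN.fullmatch(word):
            tags.append("TIME")
        elif NUMBER_PATTERN.fullmatch(word):
            tags.append("NUMBER")
        else:
            raise ValueError(f"no entity category found for word: {word!r}")
    return tags


tag_sequence = tag_words(
    ["last thirty days", "beijing", "and", "shanghai", "transaction amount"]
)
print(tag_sequence)
```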
After the tag sequence corresponding to the target sentence is obtained in the foregoing various manners, step 23 may be executed to perform syntax parsing on the tag sequence to generate a natural language syntax tree. Referring to fig. 3, in a possible implementation, step 23 may specifically include the following steps 231 and 233.
Step 231, matching the tag sequence with the grammar unit composition rule to determine a plurality of grammar units arranged in sequence, wherein a single grammar unit corresponds to one tag or a plurality of continuous tags in the tag sequence.
The mapping relationship between tags/entity categories and grammar units can be configured in advance. For example, a modifier (FORMULA) and a modification object (AGGR) may be predefined, and the mapping relationship between tags/entity categories and the modifier or modification object may be defined by the grammar unit composition rule. The mapping relationship may include, but is not limited to, the following: when a plurality of sequentially arranged tags comprise a plurality of dimension values arranged at intervals, and the tag/entity category between every two adjacent dimension values is a logical relationship, the grammar unit corresponding to those tags is a modifier; when a plurality of sequentially arranged tags are, in order, a dimension, an arithmetic relationship, and a number, the grammar unit corresponding to those tags is a modifier. The mapping relationship may further define that when the tag/entity category is a query object, the corresponding grammar unit is a modification object, and that when the tag/entity category is time, the corresponding grammar unit is a modifier.
Continuing with the tag sequence "time (TIME), dimension value (VALUE), logical relationship (LOGIC_OPERATOR), dimension value (VALUE), query object (MEASURE)" from the foregoing example, matching the tag sequence against the grammar unit composition rule described above yields a plurality of grammar units arranged in sequence: modifier (FORMULA), modifier (FORMULA), and modification object (AGGR). The time tag forms the first modifier, the two dimension values joined by the logical relationship form the second modifier, and the query object forms the modification object.
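The matching of step 231 can be sketched as a left-to-right scan that tries each composition rule at the current position. The rule encoding, the function names, and the tag names DIMENSION, ARITH_OPERATOR, and NUMBER are assumptions for illustration; only the four mappings themselves are taken from the description above.

```python
def match_fixed(*pattern):
    """Return a matcher that accepts exactly the given run of tags."""
    def matcher(tags, start):
        window = tuple(tags[start:start + len(pattern)])
        return len(pattern) if window == pattern else 0
    return matcher


def match_value_chain(tags, start):
    """Accept VALUE (LOGIC_OPERATOR VALUE)+ and return the tags consumed."""
    if tags[start] != "VALUE":
        return 0
    end = start + 1
    while (end + 1 < len(tags)
           and tags[end] == "LOGIC_OPERATOR"
           and tags[end + 1] == "VALUE"):
        end += 2
    # At least two dimension values are required for this rule.
    return end - start if end - start >= 3 else 0


# Each rule maps a run of tags to a grammar unit: modifier (FORMULA)
# or modification object (AGGR).
COMPOSITION_RULES = [
    ("FORMULA", match_value_chain),
    ("FORMULA", match_fixed("DIMENSION", "ARITH_OPERATOR", "NUMBER")),
    ("FORMULA", match_fixed("TIME")),
    ("AGGR", match_fixed("MEASURE")),
]


def group_into_units(tags):
    """Split a tag sequence into sequentially arranged grammar units."""
    units, i = [], 0
    while i < len(tags):
        for unit_name, matcher in COMPOSITION_RULES:
            consumed = matcher(tags, i)
            if consumed:
                units.append((unit_name, tags[i:i + consumed]))
                i += consumed
                break
        else:
            raise ValueError(f"no composition rule matches tag {tags[i]!r} at position {i}")
    return units


units = group_into_units(["TIME", "VALUE", "LOGIC_OPERATOR", "VALUE", "MEASURE"])
print(units)
```

A lone dimension value has no rule in this sketch because the description does not state one; it is reported as an error here rather than guessed at.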
Step 233, a natural language syntax tree is generated according to the tag sequence and the plurality of syntax elements arranged in sequence.
The multiple leaf nodes sequentially arranged in the natural language syntax tree may be specifically tags sequentially arranged in a tag sequence, the multiple child nodes of a ROOT node (ROOT) in the natural language syntax tree are specifically multiple syntax units that are determined and sequentially arranged, and a single syntax unit is connected with each tag corresponding to the single syntax unit. Continuing with the aforementioned natural language statement "transaction amount of Beijing and Shanghai in the last thirty days" as an example, after the tag sequence and the plurality of grammar units arranged in sequence are obtained in various manners of the aforementioned example, a natural language grammar tree as shown in FIG. 4 may be generated for the natural language statement, for example.
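The tree structure just described (a ROOT node, grammar units as its children, tags as leaves) can be sketched as follows. The Node class and the helper names are hypothetical, and the grammar units are supplied as a literal list rather than computed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def build_syntax_tree(units):
    """Build ROOT -> grammar units -> tags from (unit_name, tags) pairs."""
    root = Node("ROOT")
    for unit_name, tags in units:
        # Each grammar unit is connected to the tags it corresponds to.
        root.children.append(Node(unit_name, [Node(tag) for tag in tags]))
    return root


def leaf_labels(node):
    """Collect leaf labels from left to right."""
    if not node.children:
        return [node.label]
    labels = []
    for child in node.children:
        labels.extend(leaf_labels(child))
    return labels


tree = build_syntax_tree([
    ("FORMULA", ["TIME"]),
    ("FORMULA", ["VALUE", "LOGIC_OPERATOR", "VALUE"]),
    ("AGGR", ["MEASURE"]),
])
print([child.label for child in tree.children])
print(leaf_labels(tree))
```

Reading the leaves from left to right recovers the original tag sequence, which is the property the description relies on.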
In a possible embodiment, a corresponding grammar unit combination rule may also be configured in advance, for example a combination rule between a single modification object (AGGR) and several modifiers (FORMULA). Correspondingly, in addition to the foregoing steps 231 and 233, step 23 may further include step 232: determining whether the plurality of grammar units arranged in sequence match the grammar unit combination rule. If they do not match, the current process can be ended and other processing can be performed, such as providing the user with prompt information indicating that the natural language sentence does not conform to the grammar rules; if they match, the foregoing step 233 is executed to generate the natural language syntax tree corresponding to the target sentence.
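One way to express the check of step 232 is a regular expression over the sequence of grammar unit names. The pattern below assumes that a valid sentence has exactly one modification object with any number of modifiers before or after it; that reading of "a single modification object and several modifiers" is an assumption, as are the names COMBINATION_RULE and units_match_combination_rule.

```python
import re

# Assumed rule: zero or more FORMULA, exactly one AGGR, zero or more FORMULA.
COMBINATION_RULE = re.compile(r"(FORMULA )*AGGR( FORMULA)*")


def units_match_combination_rule(unit_names):
    """Return True when the unit sequence satisfies the combination rule."""
    return COMBINATION_RULE.fullmatch(" ".join(unit_names)) is not None


print(units_match_combination_rule(["FORMULA", "FORMULA", "AGGR"]))
print(units_match_combination_rule(["FORMULA"]))
```

When the function returns False, the caller would stop processing and prompt the user that the sentence does not conform to the grammar rules.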
After the generation of the natural language syntax tree corresponding to the target sentence is completed, step 25 may be executed to generate a query sentence for querying the data storage system according to the natural language syntax tree. Where the query statement may be, for example, an SQL statement.
With continued reference to Fig. 3, in a possible implementation, step 24 may be performed before step 25 to determine the attribute value corresponding to each tag in the tag sequence. Correspondingly, step 25 may specifically include step 251, in which a grammar parsing result is generated according to the natural language syntax tree and the attribute values corresponding to the tags, and a query statement is generated according to the grammar parsing result.
Corresponding lexical rules can be configured in advance, and the grammar parsing result is generated from the natural language syntax tree by combining the lexical rules with the tags in the tag sequence. For example, the lexical rules may define the following: the attribute value of the word "and", which belongs to the entity category "logical relationship", is "&&"; the attribute values of the words "greater than", "not less than", "not greater than", and "equal to", which belong to the entity category "arithmetic relationship", are ">", ">=", "<=", and "=" respectively; the attribute value of a word belonging to the entity category "time" may be defined according to the structure of the word, for example the attribute value of the word "last N days" may be defined as "date < (now() - N)", where now() represents the current time and date represents the time range the user wants to query, or as "T1-T2", where T2 represents the current time and T1 is N days before T2; the attribute value of a word belonging to the entity category "dimension value" or "query object" is the word itself.
Continuing with the tag sequence "time (TIME), dimension value (VALUE), logical relationship (LOGIC_OPERATOR), dimension value (VALUE), query object (MEASURE)" from the foregoing example, the attribute values corresponding in turn to the tags in the tag sequence can be determined as "date < (now() - 30)", "Beijing", "&&", "Shanghai", and "transaction amount", and the grammar parsing result illustrated in Fig. 5 is then generated.
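The lexical rules of step 24 can be sketched as a small lookup function. The operator tables follow the rules listed above; the tag name ARITH_OPERATOR, the NUMBER_WORDS table, and the function name attribute_value are hypothetical, and the time expression simply reproduces the form given in the description.

```python
import re

LOGIC_VALUES = {"and": "&&"}
ARITH_VALUES = {
    "greater than": ">",
    "not less than": ">=",
    "not greater than": "<=",
    "equal to": "=",
}
# Hypothetical table for spelled-out day counts.
NUMBER_WORDS = {"seven": 7, "thirty": 30}
TIME_PATTERN = re.compile(r"last (\w+) days")


def attribute_value(word, tag):
    """Map a (word, tag) pair to its attribute value using the lexical rules."""
    if tag == "LOGIC_OPERATOR":
        return LOGIC_VALUES[word]
    if tag == "ARITH_OPERATOR":
        return ARITH_VALUES[word]
    if tag == "TIME":
        match = TIME_PATTERN.fullmatch(word)
        if match is None:
            raise ValueError(f"unsupported time expression: {word!r}")
        count = match.group(1)
        days = int(count) if count.isdigit() else NUMBER_WORDS[count]
        return f"date < (now() - {days})"
    # Dimension values and query objects keep the word itself.
    return word


words = ["last thirty days", "beijing", "and", "shanghai", "transaction amount"]
tags = ["TIME", "VALUE", "LOGIC_OPERATOR", "VALUE", "MEASURE"]
attribute_values = [attribute_value(w, t) for w, t in zip(words, tags)]
print(attribute_values)
```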
The grammar parsing result accurately reflects the semantics of the natural language sentence, so a query statement that accurately expresses those semantics can be generated from it. The rules for generating the query statement from the grammar parsing result may be flexibly configured according to the actual service situation. For example, a target template matching the grammar parsing result may be selected from various pre-configured query statement templates, and the template is then filled with the corresponding attribute values from the grammar parsing result to generate the query statement for querying the data storage system. For the two modifiers and the modification object arranged in sequence in the grammar parsing result shown in Fig. 5, the attribute values of the tags corresponding to each grammar unit may form a grammar phrase, and combining these grammar phrases with the corresponding query statement template may generate the SQL statement "select sum(amount) from table where city in (Beijing, Shanghai) and date < (now() - 30)", where amount represents the transaction amount in the grammar parsing result. On a similar principle, the natural language statement "city with transaction amount less than 100000" contains a grammar unit that is a modifier, the attribute values of the tags corresponding to that grammar unit form a grammar phrase, and the SQL statement "select city from table group by city having sum(amount) < 100000" can be generated accurately from that grammar phrase.
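The template-filling idea can be sketched as follows. Everything about the schema here is an assumption: the column names amount and city, the lookup tables MEASURE_COLUMNS and VALUE_COLUMNS, the placeholder table name table, and the choice to turn dimension values joined by "&&" into an "in" condition. The grammar parsing result is given as a literal list of grammar units with their attribute values.

```python
# Hypothetical schema knowledge: which column a query object or a
# dimension value maps to.
MEASURE_COLUMNS = {"transaction amount": "amount"}
VALUE_COLUMNS = {"beijing": "city", "shanghai": "city"}


def build_query(parse_result, table="table"):
    """Fill a simple aggregate-query template from a grammar parsing result."""
    select_expr = None
    conditions = []
    for unit_name, attrs in parse_result:
        if unit_name == "AGGR":
            # Modification object: aggregate the mapped measure column.
            select_expr = f"sum({MEASURE_COLUMNS[attrs[0]]})"
        elif "&&" in attrs:
            # Dimension values joined by a logical relationship.
            values = [a for a in attrs if a != "&&"]
            column = VALUE_COLUMNS[values[0]]
            quoted = ", ".join(f"'{v}'" for v in values)
            conditions.append(f"{column} in ({quoted})")
        else:
            # Other modifiers, such as the time phrase, are used as-is.
            conditions.append(" ".join(attrs))
    if select_expr is None:
        raise ValueError("parse result has no modification object (AGGR)")
    query = f"select {select_expr} from {table}"
    if conditions:
        query += " where " + " and ".join(conditions)
    return query


parse_result = [
    ("FORMULA", ["date < (now() - 30)"]),
    ("FORMULA", ["beijing", "&&", "shanghai"]),
    ("AGGR", ["transaction amount"]),
]
sql = build_query(parse_result)
print(sql)
```

The sketch builds the statement by string formatting for readability only; code that runs against a real database should pass values as bound parameters and use a real table name in place of the placeholder.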
Based on the foregoing embodiments, for a natural language statement that has a complex semantic meaning related to a logical relationship and/or an arithmetic relationship, a corresponding natural language syntax tree is generated to assist in understanding the semantic meaning of the natural language statement, so that an accurate query statement can be generated for the natural language statement, so as to query data that meets a user's desire from a data storage system based on the query statement.
Based on the same concept as the foregoing method embodiments, an embodiment of this specification further provides a natural language query processing apparatus. As shown in Fig. 6, the apparatus includes: an obtaining unit 61 configured to obtain a tag sequence corresponding to a target sentence, where the target sentence is a natural language sentence used for requesting to query data from a data storage system, and a single tag in the tag sequence indicates the entity category to which the corresponding word in the target sentence belongs; a parsing unit 63 configured to parse the tag sequence to generate a natural language syntax tree; and a generating unit 65 configured to generate, according to the natural language syntax tree, a query statement for querying the data storage system.
In a possible implementation, the apparatus further includes an interpolation unit 64 configured to determine the attribute value corresponding to each tag in the tag sequence; the generating unit 65 is configured to generate a grammar parsing result according to the natural language syntax tree and the attribute values corresponding to the tags, and to generate the query statement according to the grammar parsing result.
In a possible implementation manner, the obtaining unit 61 is specifically configured to perform word segmentation on the target sentence to obtain a word sequence corresponding to the target sentence; and determining entity categories to which all words in the word sequence respectively belong, and forming a label sequence by utilizing the entity categories to which all the words respectively belong.
In a possible implementation, the parsing unit 63 is specifically configured to match the tag sequence against a grammar unit composition rule to determine a plurality of sequentially arranged grammar units, where a single grammar unit corresponds to one tag or to a plurality of consecutive tags in the tag sequence, and to generate the natural language syntax tree according to the tag sequence and the sequentially arranged grammar units.
In a possible implementation, the parsing unit 63 is further configured to determine whether the sequentially arranged grammar units match a grammar unit combination rule.
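The content of the combination rule is not restated here, so the following is only a sketch under an assumed rule: any number of modifiers followed by exactly one modified object. The rule itself, the single-letter encoding, and the name `matches_combination_rule` are hypothetical.

```python
import re
from typing import List

# Hypothetical combination rule: zero or more modifiers ("M") followed by
# exactly one modified object ("O").
COMBINATION_RULE = re.compile(r"M*O")
CODES = {"modifier": "M", "modified_object": "O"}


def matches_combination_rule(units: List[str]) -> bool:
    """Check whether the ordered grammar units satisfy the assumed rule."""
    try:
        encoded = "".join(CODES[unit] for unit in units)
    except KeyError:
        return False  # an unrecognized grammar unit cannot match
    return COMBINATION_RULE.fullmatch(encoded) is not None
```

Encoding the unit sequence as a string lets an ordinary regular expression express the ordering constraint; a different rule would only require a different pattern.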
In a possible implementation, the leaf nodes of the natural language syntax tree, taken in order, are the tags of the tag sequence in order; the child nodes of the root node of the natural language syntax tree are the sequentially arranged grammar units; and each grammar unit is connected to each of its corresponding tags.
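The tree shape described above (root, then grammar units, then tags as leaves) can be sketched as follows. The `Node` class, the `(unit_name, start, end)` span representation with an exclusive end index, and the unit names are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def build_syntax_tree(tags: List[str], units: List[Tuple[str, int, int]]) -> Node:
    """Build a three-level tree: root -> grammar units -> tags.

    Each unit is (unit_name, start, end) with `end` exclusive. The units are
    assumed to be in order and to cover the tag sequence without gaps.
    """
    root = Node("root")
    position = 0
    for name, start, end in units:
        if start != position or end <= start:
            raise ValueError("grammar units must tile the tag sequence in order")
        root.children.append(Node(name, [Node(tag) for tag in tags[start:end]]))
        position = end
    if position != len(tags):
        raise ValueError("grammar units do not cover every tag")
    return root


def leaves(node: Node) -> List[str]:
    """Return leaf labels from left to right."""
    if not node.children:
        return [node.label]
    return [leaf for child in node.children for leaf in leaves(child)]


tags = ["time", "dimension_value", "logical_relationship",
        "dimension_value", "query_object"]
# Assumed segmentation of the tags into grammar units.
units = [("modifier", 0, 1), ("modifier", 1, 4), ("modified_object", 4, 5)]
tree = build_syntax_tree(tags, units)
```

Reading the leaves from left to right reproduces the tag sequence, and the root's children are the grammar units in order, which mirrors the structure described in the paragraph above.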
In a possible implementation, a single tag in the tag sequence indicates one of the following entity categories: time, dimension value, logical relationship, arithmetic relationship, number, and query object.
In a possible implementation, the sequentially arranged grammar units include a modified object and several modifiers; the grammar unit composition rule indicates the mapping relationship between tags and the modified object or the modifiers.
In a possible implementation, the mapping relationship includes at least one of the following: when a plurality of sequentially arranged tags include a plurality of dimension values arranged at intervals, and the tag between every two adjacent dimension values is a logical relationship, the grammar unit corresponding to those tags is a modifier; when a plurality of sequentially arranged tags are, in order, a dimension, an arithmetic relationship, and a number, the grammar unit corresponding to those tags is a modifier.
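The two mapping relationships above can be sketched as a small matcher over the tag sequence. The tag strings, the function name `match_modifier`, and the convention of returning an exclusive end index are assumptions; the `dimension` tag is taken from the wording of the second mapping relationship.

```python
from typing import List, Optional

DIM, DIM_VALUE, LOGIC, ARITH, NUMBER = (
    "dimension", "dimension_value", "logical_relationship",
    "arithmetic_relationship", "number",
)


def match_modifier(tags: List[str], start: int) -> Optional[int]:
    """Return the exclusive end index of a modifier starting at `start`, or None."""
    # Second relationship: dimension, arithmetic relationship, number in order.
    if tags[start:start + 3] == [DIM, ARITH, NUMBER]:
        return start + 3
    # First relationship: dimension values separated by logical relationship
    # tags, e.g. value, logic, value (at least two dimension values).
    if start < len(tags) and tags[start] == DIM_VALUE:
        end = start + 1
        while tags[end:end + 2] == [LOGIC, DIM_VALUE]:
            end += 2
        if end - start >= 3:
            return end
    return None
```

For example, under these assumptions the tags for "Hangzhou or Shanghai" (dimension value, logical relationship, dimension value) form one modifier, as do the tags for "amount greater than 100" (dimension, arithmetic relationship, number).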
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this specification can be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the computer program corresponding to these functions may be stored in a computer-readable medium, or transmitted as one or more instructions or code on a computer-readable medium, so that when the computer program is executed by a computer, the computer implements the natural language query processing method provided in any embodiment of this specification.
An embodiment of this specification further provides a computer-readable storage medium storing a computer program/instructions which, when executed in a computing device, cause the computing device to perform the natural language query processing method provided in any embodiment of this specification.
An embodiment of this specification further provides a computing device including a memory and a processor, where the memory stores a computer program/instructions, and when the processor executes the computer program/instructions, the natural language query processing method provided in any embodiment of this specification is implemented.
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the apparatus embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the corresponding description of the method embodiment.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The specific embodiments described above further explain the objects, technical solutions, and advantages of the present invention in detail. It should be understood that they are merely specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent substitution, or improvement made on the basis of the technical solutions of the present invention shall fall within the scope of protection of the present invention.

Claims (16)

1. A natural language query processing method, comprising:
performing word segmentation on a target sentence to obtain a word sequence corresponding to the target sentence, determining the entity category to which each word in the word sequence belongs, and forming a tag sequence from the entity categories of the words according to the order of the words in the word sequence, wherein the target sentence is a natural language sentence requesting to query data from a data storage system, and a single tag in the tag sequence indicates the entity category of the word in the target sentence that corresponds to the tag;
parsing the tag sequence to generate a natural language syntax tree;
determining the attribute values respectively corresponding to the tags in the tag sequence; and
generating a syntax parsing result according to the natural language syntax tree and the attribute values respectively corresponding to the tags, and generating a query statement according to the syntax parsing result.
2. The method of claim 1, wherein parsing the tag sequence to generate a natural language syntax tree comprises:
matching the tag sequence against a grammar unit composition rule to determine a plurality of sequentially arranged grammar units, wherein a single grammar unit corresponds to one tag in the tag sequence or to a plurality of sequentially arranged tags; and
generating the natural language syntax tree according to the tag sequence and the plurality of sequentially arranged grammar units.
3. The method of claim 2, wherein parsing the tag sequence to generate a natural language syntax tree further comprises: determining whether the plurality of sequentially arranged grammar units match a grammar unit combination rule.
4. The method of claim 2, wherein the sequentially arranged leaf nodes of the natural language syntax tree are the sequentially arranged tags in the tag sequence, the sequentially arranged child nodes of the root node of the natural language syntax tree are the plurality of sequentially arranged grammar units, and a single grammar unit is connected to each of its corresponding tags.
5. The method of claim 2, wherein a single tag in the tag sequence indicates one of the following entity categories: time, dimension value, logical relationship, arithmetic relationship, number, and query object.
6. The method of claim 5, wherein the plurality of sequentially arranged grammar units comprise a modified object and several modifiers; and the grammar unit composition rule indicates the mapping relationship between tags and the modified object or the modifiers.
7. The method of claim 6, wherein the mapping relationship comprises at least one of the following:
when a plurality of sequentially arranged tags include a plurality of dimension values arranged at intervals, and the tag between every two adjacent dimension values is a logical relationship, the grammar unit corresponding to the plurality of sequentially arranged tags is a modifier;
when a plurality of sequentially arranged tags are, in order, a dimension, an arithmetic relationship, and a number, the grammar unit corresponding to the plurality of sequentially arranged tags is a modifier.
8. A natural language query processing apparatus, comprising:
an obtaining unit, configured to perform word segmentation on a target sentence to obtain a word sequence corresponding to the target sentence, determine the entity category to which each word in the word sequence belongs, and form a tag sequence from the entity categories of the words according to the order of the words in the word sequence, wherein the target sentence is a natural language sentence requesting to query data from a data storage system, and a single tag in the tag sequence indicates the entity category of the word in the target sentence that corresponds to the tag;
a parsing unit, configured to parse the tag sequence to generate a natural language syntax tree;
an interpolation unit, configured to determine the attribute values respectively corresponding to the tags in the tag sequence; and
a generating unit, configured to generate a syntax parsing result according to the natural language syntax tree and the attribute values respectively corresponding to the tags, and to generate a query statement according to the syntax parsing result.
9. The apparatus of claim 8, wherein the parsing unit is specifically configured to match the tag sequence against a grammar unit composition rule to determine a plurality of sequentially arranged grammar units, a single grammar unit corresponding to one tag in the tag sequence or to a plurality of consecutive, sequentially arranged tags; and to generate the natural language syntax tree according to the tag sequence and the plurality of sequentially arranged grammar units.
10. The apparatus of claim 9, wherein the parsing unit is further configured to determine whether the plurality of sequentially arranged grammar units match a grammar unit combination rule.
11. The apparatus of claim 9, wherein the sequentially arranged leaf nodes of the natural language syntax tree are the sequentially arranged tags in the tag sequence, the child nodes of the root node of the natural language syntax tree are the plurality of sequentially arranged grammar units, and a single grammar unit is connected to each of its corresponding tags.
12. The apparatus of claim 9, wherein a single tag in the tag sequence indicates one of the following entity categories: time, dimension value, logical relationship, arithmetic relationship, number, and query object.
13. The apparatus of claim 12, wherein the plurality of sequentially arranged grammar units comprise a modified object and several modifiers; and the grammar unit composition rule indicates the mapping relationship between tags and the modified object or the modifiers.
14. The apparatus of claim 13, wherein the mapping relationship comprises at least one of the following:
when a plurality of sequentially arranged tags include a plurality of dimension values arranged at intervals, and the tag between every two adjacent dimension values is a logical relationship, the grammar unit corresponding to the plurality of sequentially arranged tags is a modifier;
when a plurality of sequentially arranged tags are, in order, a dimension, an arithmetic relationship, and a number, the grammar unit corresponding to the plurality of sequentially arranged tags is a modifier.
15. A computer-readable storage medium having stored thereon a computer program which, when executed in a computing device, performs the method of any of claims 1-7.
16. A computing device comprising a memory having a computer program stored therein and a processor that, when executing the computer program, implements the method of any of claims 1-7.
CN202210058317.5A 2022-01-19 2022-01-19 Query processing method and device for natural language Active CN114090619B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210058317.5A CN114090619B (en) 2022-01-19 2022-01-19 Query processing method and device for natural language
CN202211411569.8A CN115687397A (en) 2022-01-19 2022-01-19 Query processing method and device for natural language

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210058317.5A CN114090619B (en) 2022-01-19 2022-01-19 Query processing method and device for natural language

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202211411569.8A Division CN115687397A (en) 2022-01-19 2022-01-19 Query processing method and device for natural language

Publications (2)

Publication Number Publication Date
CN114090619A CN114090619A (en) 2022-02-25
CN114090619B CN114090619B (en) 2022-09-20

Family

ID=80308542

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202210058317.5A Active CN114090619B (en) 2022-01-19 2022-01-19 Query processing method and device for natural language
CN202211411569.8A Pending CN115687397A (en) 2022-01-19 2022-01-19 Query processing method and device for natural language

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202211411569.8A Pending CN115687397A (en) 2022-01-19 2022-01-19 Query processing method and device for natural language

Country Status (1)

Country Link
CN (2) CN114090619B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117370377B (en) * 2023-12-05 2024-02-06 子亥科技(成都)有限公司 Three-dimensional scene management method and device based on structured query language

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107451153A (en) * 2016-05-31 2017-12-08 北京京东尚科信息技术有限公司 The method and apparatus of export structure query statement
CN111797278A (en) * 2020-05-19 2020-10-20 武汉乐程软工科技有限公司 Method for mapping associated object and relation
CN113051287A (en) * 2021-06-01 2021-06-29 北京达佳互联信息技术有限公司 Query statement generation method, device, equipment and storage medium
CN113495900A (en) * 2021-08-12 2021-10-12 国家电网有限公司大数据中心 Method and device for acquiring structured query language sentences based on natural language

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102013003055A1 (en) * 2013-02-18 2014-08-21 Nadine Sina Kurz Method and apparatus for performing natural language searches
US9442977B2 (en) * 2013-09-06 2016-09-13 Sap Se Database language extended to accommodate entity-relationship models
US10867256B2 (en) * 2015-07-17 2020-12-15 Knoema Corporation Method and system to provide related data
US20180210883A1 (en) * 2017-01-25 2018-07-26 Dony Ang System for converting natural language questions into sql-semantic queries based on a dimensional model
CN112580357A (en) * 2019-09-29 2021-03-30 微软技术许可有限责任公司 Semantic parsing of natural language queries
CN111459967A (en) * 2020-03-03 2020-07-28 深圳壹账通智能科技有限公司 Structured query statement generation method and device, electronic equipment and medium
CN113918589A (en) * 2020-07-10 2022-01-11 阿里巴巴集团控股有限公司 Query statement generation method, correlation method and device
CN112001188B (en) * 2020-10-30 2021-03-16 北京智源人工智能研究院 Method and device for rapidly realizing NL2SQL based on vectorization semantic rule
CN113886527A (en) * 2021-10-20 2022-01-04 前锦网络信息技术(上海)有限公司 Natural language semantic extraction method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
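"A Simple Guide to Implement Data Retrieval through Natural Language Database Query Interface (NLDQ)"; Tameem Ahmad et al.; IEEE; 2020-06-16; pp. 1-5 *
"Research and Application of Deep-Learning-Based Methods for Generating SQL from Natural Language"; Ge Yan; China Master's Theses Full-text Database, Information Science and Technology; 2022-01-15; pp. I138-778 *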

Also Published As

Publication number Publication date
CN115687397A (en) 2023-02-03
CN114090619A (en) 2022-02-25

Similar Documents

Publication Publication Date Title
CN112232074B (en) Entity relationship extraction method and device, electronic equipment and storage medium
JP2010541079A5 (en)
CN110275947A (en) Domain-specific knowledge map natural language querying method and device based on name Entity recognition
US9754083B2 (en) Automatic creation of clinical study reports
TWI686707B (en) Method and device for obtaining data inventory
CN111435410B (en) Relationship extraction method and device for medical texts
TWI713015B (en) Language recognition method and device
CN112580357A (en) Semantic parsing of natural language queries
CN111292814A (en) Medical data standardization method and device
CN113127605B (en) Method and system for establishing target recognition model, electronic equipment and medium
CN110909126A (en) Information query method and device
CN114090619B (en) Query processing method and device for natural language
US20150066536A1 (en) Method and apparatus for generating health quality metrics
US20090234852A1 (en) Sub-linear approximate string match
CN109902309B (en) Translation method, device, equipment and storage medium
CN116737879A (en) Knowledge base query method and device, electronic equipment and storage medium
WO2016131295A1 (en) Northbound data conversion method and device
CN114090620B (en) Query request processing method and device
CN111061927A (en) Data processing method and device and electronic equipment
CN114625889A (en) Semantic disambiguation method and device, electronic equipment and storage medium
CN110058858B (en) JSON data processing method and device
CN114676258A (en) Disease classification intelligent service method based on patient symptom description text
JP2014229078A (en) Natural language inference system, natural language inference method and program
CN114090721B (en) Method and device for querying and updating data based on natural language data
US11544317B1 (en) Identifying content items in response to a text-based request

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant