CN114064861A - Query statement generation method and device

Info

Publication number: CN114064861A
Application number: CN202010761820.8A
Authority: CN
Inventors: 李裕田
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2020-07-31
Filing date: 2020-07-31
Publication date: 2022-02-18

Abstract

The embodiments of the application provide a method and a device for generating a query statement. The method comprises: acquiring text information; extracting query keywords from the text information; determining a slot type corresponding to each query keyword according to the text information; and generating a query statement using the query keywords and their corresponding slot types. Because the query keywords and their slot types are determined directly from the text information and the generated query statement is used to query the database, processing efficiency can be improved and higher accuracy can be obtained.

Description

Query statement generation method and device

Technical Field

The present application relates to the field of text processing technologies, and in particular, to a query statement generation method and a query statement generation device.

Background

In the prior art, in order to realize interaction between a person and a computer, the computer may generally obtain text information input by the person, convert the text information into a query sentence understood by the computer, and return an answer corresponding to the query sentence.

However, converting text information into a computer-understandable query statement usually requires a large number of preset text information-query statement pairs to train a model or build a grammar database. If the acquired text information does not appear among the preset text information-query statement pairs, accuracy is likely to decrease. In addition, end-to-end model training depends strongly on the field, so a model trained in one field is difficult to migrate to a new field.

Disclosure of Invention

In view of the above, embodiments of the present application are proposed to provide a query statement generation method and a corresponding query statement generation apparatus that overcome or at least partially solve the above problems.

In order to solve the above problem, an embodiment of the present application discloses a method for generating a query statement, including:

acquiring text information;

extracting query keywords from the text information;

determining a slot position type corresponding to the query keyword according to text information;

generating a query statement by adopting the query keyword and a slot position type corresponding to the query keyword;

and searching query result information corresponding to the query statement in a preset database.

Optionally, the step of extracting the query keyword from the text information includes:

extracting candidate keywords and data types corresponding to the candidate keywords from the text information;

and determining a target keyword in the candidate keywords as a query keyword according to the data type corresponding to the candidate keywords.

Optionally, the database includes at least one preset entity data and a data type corresponding to the preset entity data;

the step of extracting the candidate keywords and the data types corresponding to the candidate keywords from the text information comprises the following steps:

extracting candidate entity words matched with preset entity data in the database from the text information;

and determining a target entity word in the candidate entity words as a candidate keyword, and determining a data type corresponding to the candidate keyword.

Optionally, the step of determining a target entity word in the candidate entity words as a candidate keyword and determining a data type corresponding to the candidate keyword includes:

determining a target entity word in the candidate entity words as a candidate keyword based on the similarity between the candidate entity word and the preset entity data;

and taking the data type corresponding to the preset entity data matched with the candidate keyword as the data type corresponding to the candidate keyword.

Optionally, the step of determining a target entity word in the candidate entity words as a candidate keyword based on the similarity between the candidate entity word and the preset entity data includes:

determining entity type probability corresponding to the candidate entity words in the text information by adopting a preset entity marking model;

and determining a target entity word in the candidate entity words as a candidate keyword based on the similarity between the candidate entity words and the preset entity data and the entity type probability corresponding to the candidate entity words.

Optionally, the step of determining a target keyword from the candidate keywords as a query keyword according to the data type corresponding to the candidate keyword includes:

replacing candidate keywords in the text information with data types corresponding to the candidate keywords to obtain candidate language information;

determining grammar probability corresponding to the candidate language information by adopting a preset language model;

and determining a target keyword in the candidate keywords as a query keyword according to the grammar probability.

Optionally, the step of determining the slot type corresponding to the query keyword according to the text information includes:

replacing the query keywords in the text information with the data types corresponding to the query keywords to obtain query language information;

performing syntactic analysis on the query language information, and determining a syntactic tree corresponding to the query language information;

and determining the slot position type corresponding to the query keyword by adopting the syntax tree.

Optionally, the data type includes at least one of metadata, a dimension attribute, an index, a dimension enumeration value, and a time value;

the step of determining the slot position type corresponding to the query keyword according to the text information comprises the following steps:

determining whether the data type corresponding to the query keyword contains metadata;

and if the data type corresponding to the query keyword contains metadata, determining the slot position type corresponding to the query keyword according to text information.

Optionally, the step of determining the slot type corresponding to the query keyword further includes:

if the data type corresponding to the query keyword does not contain metadata, determining whether historical text information contains the historical query keyword of which the data type is the metadata;

if the historical text information contains historical query keywords of which the data types are metadata, determining the historical query keywords and slot position types of the query keywords according to the historical text information;

if the historical text information does not contain historical query keywords with data types as metadata, determining the metadata keywords with the data types as the metadata by adopting the query keywords; and determining the slot position types of the metadata keywords and the query keywords according to text information.
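The fallback order described above (current text first, then historical text, then inference from the query keywords) can be sketched as follows. This is only an illustrative sketch: the function name `resolve_metadata_keyword`, the `(text, data_type)` tuple representation, the choice to scan the history from most recent to oldest, and the `infer_metadata` callback are all assumptions, not part of the application.

```python
def resolve_metadata_keyword(query_keywords, history_keywords, infer_metadata):
    """Pick the metadata keyword used when determining slot types.

    Each keyword is a (text, data_type) pair. Returns (keyword, source).
    """
    # Case 1: the current text already contains a metadata keyword.
    for keyword in query_keywords:
        if keyword[1] == "metadata":
            return keyword, "current text"
    # Case 2: fall back to the historical text (most recent first; an assumption).
    for keyword in reversed(history_keywords):
        if keyword[1] == "metadata":
            return keyword, "history"
    # Case 3: derive a metadata keyword from the query keywords themselves.
    return infer_metadata(query_keywords), "inferred"


# Hypothetical data: the user previously asked about "sales cube".
history = [("sales cube", "metadata"), ("city", "dimension attribute")]
current = [("revenue", "index")]
keyword, source = resolve_metadata_keyword(
    current, history, infer_metadata=lambda kws: ("default cube", "metadata")
)
```

In this hypothetical run the current text has no metadata keyword, so the metadata keyword is taken from the history.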

Optionally, the method further comprises:

and determining the intention category corresponding to the text information by adopting a preset text classification model.

The embodiment of the present application further discloses a device for generating query statements, including:

the acquisition module is used for acquiring text information;

the extraction module is used for extracting query keywords from the text information;

the slot position type determining module is used for determining the slot position type corresponding to the query keyword according to the text information;

the generating module is used for generating a query statement by adopting the query keyword and the slot position type corresponding to the query keyword;

and the searching module is used for searching the query result information corresponding to the query statement in a preset database.

Optionally, the extraction module comprises:

the candidate keyword extraction submodule is used for extracting candidate keywords and data types corresponding to the candidate keywords from the text information;

and the query keyword determining submodule is used for determining a target keyword in the candidate keywords as a query keyword according to the data type corresponding to the candidate keyword.

Optionally, the candidate keyword extraction submodule comprises:

the candidate entity word extracting unit is used for extracting candidate entity words which are matched with preset entity data in the database from the text information;

and the candidate keyword extraction unit is used for determining a target entity word in the candidate entity words as a candidate keyword and determining the data type corresponding to the candidate keyword.

Optionally, the candidate keyword extraction unit includes:

a candidate keyword extraction subunit, configured to determine, based on a similarity between the candidate entity word and the preset entity data, a target entity word in the candidate entity words as a candidate keyword;

and the data type determining unit is used for taking the data type corresponding to the preset entity data matched with the candidate keyword as the data type corresponding to the candidate keyword.

Optionally, the candidate keyword extraction subunit is specifically configured to determine, by using a preset entity tagging model, an entity type probability corresponding to the candidate entity word in the text information; and determining a target entity word in the candidate entity words as a candidate keyword based on the similarity between the candidate entity words and the preset entity data and the entity type probability corresponding to the candidate entity words.

Optionally, the query keyword determination sub-module includes:

the candidate language information acquisition unit is used for replacing the candidate keywords in the text information with the data types corresponding to the candidate keywords to obtain candidate language information;

the grammar probability determining unit is used for determining grammar probability corresponding to the candidate language information by adopting a preset language model;

and the query keyword determining unit is used for determining a target keyword from the candidate keywords as a query keyword according to the grammar probability.

Optionally, the slot type determining module includes:

the query language information acquisition sub-module is used for replacing the query keywords in the text information with the data types corresponding to the query keywords to obtain query language information;

the syntax tree determining submodule is used for carrying out syntax analysis on the query language information and determining a syntax tree corresponding to the query language information;

and the slot position type determining submodule is used for determining the slot position type corresponding to the query keyword by adopting the syntax tree.

Optionally, the slot type determining submodule comprises:

a metadata determining unit, configured to determine whether a data type corresponding to the query keyword includes metadata;

and the first slot type determining unit is used for determining the slot type corresponding to the query keyword according to the text information if the data type corresponding to the query keyword contains metadata.

Optionally, the slot type determining sub-module further includes:

the history query keyword determining unit is used for determining whether history text information contains history query keywords of which the data types are metadata or not if the data types corresponding to the query keywords do not contain the metadata;

a second slot type determining unit, configured to determine, according to history text information, a history query keyword whose data type is metadata and a slot type of the query keyword if the history text information includes the history query keyword;

a third slot type determining unit, configured to determine a metadata keyword having a data type of metadata by using the query keyword if the history text information does not include the history query keyword having a data type of metadata; and determining the slot position types of the metadata keywords and the query keywords according to text information.

Optionally, the apparatus further comprises:

and the intention category determining module is used for determining an intention category corresponding to the text information by adopting a preset text classification model.

The embodiment of the application also discloses a device, including:

one or more processors; and

one or more machine-readable media having instructions stored thereon, which when executed by the one or more processors, cause the apparatus to perform one or more methods as described in embodiments of the application.

Embodiments of the present application also disclose one or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause the processors to perform one or more methods as described in embodiments of the present application.

The embodiment of the application has the following advantages:

With the query statement generation method of the embodiments of the application, text information is acquired; query keywords are extracted from the text information; a slot type corresponding to each query keyword is determined according to the text information; and a query statement is generated using the query keywords and their corresponding slot types. The query keywords and their slot types can therefore be determined directly from the text information and the generated query statement can be used to query the database, which improves processing efficiency and yields higher accuracy.

Drawings

FIG. 1 is a flow chart of steps of an embodiment of a method for generating a query statement of the present application;

FIG. 2 is a flow chart of steps of another embodiment of a method for generating a query statement of the present application;

FIG. 3 is a schematic diagram of a syntax tree of the present application;

FIG. 4 is a block diagram of an embodiment of a query statement generation apparatus of the present application.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.

According to the method and the device, the query keyword is extracted from the text information, the slot type corresponding to the query keyword is determined, and the query sentence is generated based on the query keyword and the slot type corresponding to the query keyword, so that the query sentence facing various fields can be generated. The query sentence generating method can be used in the fields of natural language query systems, intelligent data interaction, intelligent dialogue robots and the like, and realizes interaction with a computer through texts input by users.

For example, when a user purchases via the internet, needs to inquire logistics states, consult commodity information, order commodities, obtain after-sales support and the like, the user can communicate with the intelligent dialogue robot by inputting texts, the intelligent dialogue robot can extract inquiry keywords from the text information, determine slot types corresponding to the inquiry keywords, generate inquiry sentences based on the inquiry keywords and the slot types corresponding to the inquiry keywords, obtain inquiry results corresponding to contents which the user wants to inquire, and realize interaction with the user based on the inquiry results.

Referring to FIG. 1, a flowchart of the steps of an embodiment of a method for generating a query statement of the present application is shown. The method may specifically include the following steps:

Step 101, acquiring text information;

In the embodiment of the application, the text information input by the user can be acquired. Specifically, the user may input speech through a voice input device such as a microphone, and speech recognition may be performed on the speech to obtain the text information. Alternatively, the user may input text through an external input device such as a keyboard, mouse, or touch screen to obtain the text information.

Optionally, the text information may be natural language, that is, language that evolved naturally with the development of human society and is used in everyday communication, such as Chinese, English, or Japanese.

Step 102, extracting query key words from the text information;

In the embodiment of the present application, a query keyword used for querying in the database may be extracted from the text information.

In a specific implementation, the text information may include entity words, and the entity words may be words having a specific meaning in the text information, such as a person's name, a place name, an organization name, a date and time, or a proper noun.

The information the user wants to query is usually associated with an entity word. The entity word can therefore be extracted from the text information as a query keyword, and the information associated with the entity word can be queried in the database.

Step 103, determining slot position types corresponding to the query keywords according to text information;

In the embodiment of the application, the slot type corresponding to the query keyword can be determined according to the text information, thereby determining the position in the query statement into which the query keyword is to be filled.

In a specific implementation, querying information in the database usually requires a query statement specific to the database, such as an SQL query statement. The query statement may have a preset query syntax format and at least one slot, each slot corresponding to a slot type. The query syntax format may be a fixed expression format of the query statement. The slot type may be the function type of a slot in the query statement.

The text information may also be composed of several types of sentence components, such as subjects, predicates, objects, complements, attributives, adverbials, and appositives. A query keyword can thus correspond to any of these sentence component types in the text information. Different types of sentence components, and their positions in the text information, can correspond to different slot types in the query statement. Therefore, the slot type corresponding to a query keyword can be determined from the sentence component the query keyword forms in the text information and from its position in the text information, so that each query keyword is filled into a slot of the matching slot type to form the query statement.

As an example of the present application, the query statement may include a SELECT statement. The query syntax format of the SELECT statement may be "SELECT column names to be queried FROM table name WHERE constraint condition". In a SELECT statement, "SELECT", "FROM", and "WHERE" may be components of the statement, each of which may correspond to a different slot type. "SELECT" specifies the column names to be queried, "FROM" specifies the name of the data table to be queried, and "WHERE" further constrains the query.

The text information may be "query names and ages in employee table", where the query keywords include "name", "age", and "employee table". The query keywords "name" and "age" are located at the end of the text information and are its objects. The query keyword "employee table" is located in the middle of the text information and is an attributive modifying the object. The slot type corresponding to the query keywords "name" and "age" may therefore be determined to be a query column name, and the slot type corresponding to the query keyword "employee table" may be determined to be a table name, so that the query keywords are filled into slots of the corresponding slot types to form a query statement.
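The mapping in this example, from sentence component to slot type, can be sketched as a simple lookup. The sketch below assumes the sentence components have already been identified by some parser; the component labels, the slot type names, and the function `assign_slot_types` are hypothetical names chosen for illustration.

```python
# Hypothetical mapping from sentence component to slot type (illustration only).
SLOT_TYPE_BY_COMPONENT = {
    "object": "column_name",
    "attributive_of_object": "table_name",
}


def assign_slot_types(labelled_keywords):
    """Map each (keyword, sentence component) pair to a slot type."""
    return {
        keyword: SLOT_TYPE_BY_COMPONENT[component]
        for keyword, component in labelled_keywords
    }


# Components as described for "query names and ages in employee table".
labelled = [
    ("name", "object"),
    ("age", "object"),
    ("employee table", "attributive_of_object"),
]
slot_types = assign_slot_types(labelled)
```

A real implementation would also need to take the position of the keyword in the text information into account, which this lookup does not model.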

Step 104, generating a query statement by adopting the query keyword and a slot position type corresponding to the query keyword;

In this embodiment of the present application, after determining the slot type corresponding to the query keyword, a query statement that can be used for querying in a database may be generated by using the query keyword and the slot type corresponding to the query keyword.

Specifically, the query keyword may be filled in the query statement according to the slot type corresponding to the query keyword based on the query syntax format of the query statement.

As an example of the present application, as described above, the slot type corresponding to the query keywords "name" and "age" is a query column name, and the slot type corresponding to the query keyword "employee table" is a table name. The query syntax format of the SELECT statement may be "SELECT column names to be queried FROM table name WHERE constraint condition", and the query keywords may be filled into the different slots of the SELECT statement according to their corresponding slot types, so as to obtain the query statement "SELECT name, age FROM employee table".
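The slot filling in this example can be sketched as below. This is a minimal sketch under stated assumptions: the function `build_select` and the slot type names `column_name` and `table_name` are hypothetical, and no identifier quoting or escaping is performed, which a real database would require.

```python
def build_select(slot_types, constraint=None):
    """Fill query keywords into the SELECT ... FROM ... WHERE ... format."""
    columns = [kw for kw, slot in slot_types.items() if slot == "column_name"]
    tables = [kw for kw, slot in slot_types.items() if slot == "table_name"]
    if not columns or len(tables) != 1:
        raise ValueError("need at least one column name and exactly one table name")
    statement = f"SELECT {', '.join(columns)} FROM {tables[0]}"
    if constraint:
        # The WHERE slot is optional and only filled when a constraint exists.
        statement += f" WHERE {constraint}"
    return statement


slot_types = {
    "name": "column_name",
    "age": "column_name",
    "employee table": "table_name",
}
query = build_select(slot_types)
print(query)  # SELECT name, age FROM employee table
```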

In the embodiment of the application, the query statement may be used to perform query in a preset database to obtain query result information corresponding to the query statement.
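As a rough illustration of this lookup step, the sketch below runs a generated statement against an in-memory SQLite database using Python's standard `sqlite3` module. The table name `employee_table`, its columns, and the sample rows are assumptions made for the example; the application does not specify a database engine, and the natural-language table name "employee table" is assumed here to have been mapped to a valid identifier.

```python
import sqlite3

# Hypothetical preset database with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee_table (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO employee_table (name, age) VALUES (?, ?)",
    [("Alice", 30), ("Bob", 41)],
)

# The generated query statement, with the table name mapped to an identifier.
query = "SELECT name, age FROM employee_table"
rows = conn.execute(query).fetchall()
conn.close()

for name, age in rows:
    print(name, age)
```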

In the embodiment of the application, after the query result information corresponding to the query statement is determined, the query result information can be directly displayed to a user. Or generating interactive information expressed by natural language based on the query result information, and displaying the interactive information to the user. Thereby enabling natural language interaction between the user and the computer.

Referring to FIG. 2, a flowchart of the steps of another embodiment of a method for generating a query statement of the present application is shown. The method may specifically include the following steps:

Step 201, acquiring text information;

Step 202, extracting candidate keywords and data types corresponding to the candidate keywords from the text information;

In the embodiment of the application, the text information may include entity words, and the entity words may be words having a specific meaning in the text information, such as a person's name, a place name, an organization name, a date and time, or a proper noun.

In this embodiment of the application, the preset database may be a multidimensional database, and the data in the database may be stored as multiple N-dimensional arrays. An N-dimensional array may be referred to as a cube, and the database may have a plurality of data types at different levels, such as metadata, dimension attributes, indices, dimension enumeration values, time values, and so on.

The metadata may include description information of the cube, instance information of the cube, project information, table information, dictionary information, and the like. The dimension attribute may be attribute information of a certain dimension in the cube. The index may be information of aggregation analysis in a dimension of the cube. The dimension enumeration value can be a specific value of each datum in a dimension of the cube. The time value may be a value of data expressed in a time form in the cube.

In the embodiment of the application, entity words can be extracted from the text information, the entity words are used as candidate keywords, and the data types corresponding to the candidate keywords are determined, so that the target keywords are further determined based on the data types of the candidate keywords.

In a specific implementation, the preset database may be used to determine whether the text information contains an entity word with a high similarity to data in the database. Such an entity word is used as a candidate keyword, and the data type corresponding to the candidate keyword is then determined based on the database.

In an embodiment of the present application, the database includes at least one preset entity data and a data type corresponding to the preset entity data;

In this embodiment of the present application, the database may include at least one preset entity data, and the preset entity data may be data information having a specific meaning in the database, for example, metadata information, dimension attribute information, index information, dimension enumeration value information, time value information, and the like.

In this embodiment, the preset entity data may correspond to a data type. For example, the data type can be metadata, a dimension attribute, an index, a dimension enumeration value, a time value, and the like.
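One possible in-memory representation of preset entity data and their data types is sketched below. The class names `DataType` and `PresetEntity`, the function `data_type_of`, and all sample entries are hypothetical; the application does not prescribe a storage structure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataType(Enum):
    """The data types named in this description."""
    METADATA = "metadata"
    DIMENSION_ATTRIBUTE = "dimension attribute"
    INDEX = "index"
    DIMENSION_ENUM_VALUE = "dimension enumeration value"
    TIME_VALUE = "time value"


@dataclass(frozen=True)
class PresetEntity:
    text: str
    data_type: DataType


# Made-up sample entries, for illustration only.
PRESET_ENTITIES = [
    PresetEntity("sales cube", DataType.METADATA),
    PresetEntity("city", DataType.DIMENSION_ATTRIBUTE),
    PresetEntity("revenue", DataType.INDEX),
    PresetEntity("Hangzhou", DataType.DIMENSION_ENUM_VALUE),
    PresetEntity("2020-07", DataType.TIME_VALUE),
]


def data_type_of(text: str) -> Optional[DataType]:
    """Return the data type of an exactly matching preset entity, if any."""
    for entity in PRESET_ENTITIES:
        if entity.text == text:
            return entity.data_type
    return None
```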

S11, extracting candidate entity words matched with preset entity data in the database from the text information;

In the embodiment of the application, entity words in the text information that match preset entity data in the database can be extracted as candidate entity words. The candidate entity words and the preset entity data may have a certain similarity, and target entity words may be further screened out from the candidate entity words.

In a specific implementation, at least one word in the text information may be matched with preset entity data in the database in a character string matching manner, so as to obtain at least one candidate entity word matched with the preset entity data in the database.
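The character string matching described here could look roughly like the following substring search. It is a sketch only: the function name `extract_candidate_entity_words`, the lower-casing, and the returned offsets are assumptions, and the preset entity data borrow the "query movie spiderman" example used later in this description.

```python
# Preset entity data mapped to data-type tags (taken from the later example).
PRESET_ENTITY_DATA = {
    "movie": "#dimension",
    "spider": "#dimEnum",
    "spiderman": "#dimEnum",
}


def extract_candidate_entity_words(text, preset=PRESET_ENTITY_DATA):
    """Return (entity word, start offset) for every preset entity found in text."""
    lowered = text.lower()
    matches = []
    for entity in preset:
        start = lowered.find(entity)
        while start != -1:
            matches.append((entity, start))
            start = lowered.find(entity, start + 1)
    # Order by position; at the same position, list the longer match first.
    return sorted(matches, key=lambda m: (m[1], -len(m[0])))


candidates = extract_candidate_entity_words("query movie spiderman")
print(candidates)  # [('movie', 6), ('spiderman', 12), ('spider', 12)]
```

Overlapping matches such as "spider" and "spiderman" are deliberately kept, since the later steps choose among them.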

S12, determining a target entity word in the candidate entity words as a candidate keyword, and determining the data type corresponding to the candidate keyword.

In the embodiment of the application, entity words with higher similarity to data in a preset database can be further screened from the candidate entity words to serve as candidate keywords, and the data types corresponding to the candidate keywords are determined based on the database.

In an embodiment of the present application, the step of determining a target entity word in the candidate entity words as a candidate keyword and determining a data type corresponding to the candidate keyword includes:

S21, determining a target entity word in the candidate entity words as a candidate keyword based on the similarity between the candidate entity words and the preset entity data;

In this embodiment of the application, after the candidate entity words that match preset entity data in the database are extracted from the text information, the similarity between each candidate entity word and the preset entity data may be further determined. A target entity word may then be determined among the candidate entity words as a candidate keyword based on that similarity.

In a specific implementation, the similarity between the candidate entity word and the preset entity data may be calculated with a text similarity measure such as minimum edit distance or cosine similarity. At least one candidate entity word is then selected in descending order of similarity and determined as a target entity word, and the target entity word is used as a candidate keyword.
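A minimal sketch of the edit-distance variant follows. The normalisation of the distance into a similarity between 0 and 1 (dividing by the longer string's length) and the names `similarity` and `top_candidates` are assumptions; the application only names the measures, not how they are scaled. The misspelt input "spidermann" is a made-up example.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions, and substitutions (Levenshtein)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalisation: 1.0 for identical strings, lower as they diverge."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def top_candidates(pairs, k=1):
    """Rank (candidate entity word, preset entity data) pairs by similarity."""
    scored = [(word, preset, similarity(word, preset)) for word, preset in pairs]
    return sorted(scored, key=lambda s: s[2], reverse=True)[:k]


pairs = [("spider", "spiderman"), ("spidermann", "spiderman")]
best = top_candidates(pairs, k=1)
```

Here "spidermann" differs from "spiderman" by one edit and ranks above "spider", which differs by three.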

And S22, taking the data type corresponding to the preset entity data matched with the candidate keyword as the data type corresponding to the candidate keyword.

In this embodiment of the present application, the preset entity data may have a corresponding data type. The candidate keywords matched with the preset entity data can be considered to be basically the same as the preset entity data, so that the data type corresponding to the preset entity data can also be the data type corresponding to the candidate keywords. Therefore, the data type corresponding to the preset entity data matched with the candidate keyword can be used as the data type corresponding to the candidate keyword.

In an embodiment of the application, the step of determining a target entity word in the candidate entity words as a candidate keyword based on a similarity between the candidate entity word and the preset entity data includes:

S31, determining entity type probability corresponding to the candidate entity words in the text information by adopting a preset entity tagging model;

In the embodiment of the application, a preset entity tagging model can be adopted to perform entity recognition on the text information, identify the entity words in the text information, tag the entity words, and give the probability that each entity word belongs to a certain entity type, so as to obtain the entity type probability. The entity words may include the candidate entity words, so that the entity type probabilities corresponding to the candidate entity words may be obtained.

In the embodiment of the present application, the entity types may include a name of a person, a name of a place, a name of an organization, a date and time, a proper noun, and the like, which is not limited in the present application.

In this embodiment of the application, the entity tagging model may be an HMM (Hidden Markov Model), a CRF (Conditional Random Field), a Bi-LSTM (Bi-directional Long Short-Term Memory) model, a Bi-LSTM + CRF model, and the like, which is not limited in this application. The entity tagging model can be obtained by training on samples labeled with entity words and the entity types corresponding to the entity words. Optionally, after training on general samples, the entity tagging model may be further trained on samples from the application field in which the user interacts with the computer in natural language, so as to improve the entity tagging accuracy of the model in that field. The application fields may be divided according to actual needs into an electronic commerce field, a financial field, an entertainment field, a personal assistant field, a map navigation field, a smart home field, and the like, which is not limited in this application.

And S32, determining a target entity word in the candidate entity words as a candidate keyword based on the similarity between the candidate entity words and the preset entity data and the entity type probability corresponding to the candidate entity words.

In this embodiment of the application, a target entity word that is similar to the preset entity data may be determined comprehensively among the candidate entity words as a candidate keyword based on a similarity between the candidate entity word and the preset entity data and an entity type probability corresponding to the candidate entity word.

In a specific implementation, a weighted average of the similarity between the candidate entity word and the preset entity data and the entity type probability corresponding to the candidate entity word may be determined. At least one candidate entity word is then selected in descending order of the weighted average and determined as a target entity word, and the target entity word is used as a candidate keyword.
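The weighted average and ranking can be sketched as follows. The weight of 0.5, the function names, and the similarity and probability values are all made up for illustration; in practice the probabilities would come from the entity tagging model.

```python
def combined_score(similarity, entity_type_probability, similarity_weight=0.5):
    """Weighted average of similarity and entity type probability."""
    if not 0.0 <= similarity_weight <= 1.0:
        raise ValueError("similarity_weight must be between 0 and 1")
    return (similarity_weight * similarity
            + (1.0 - similarity_weight) * entity_type_probability)


def select_target_entity_words(candidates, k=1, similarity_weight=0.5):
    """candidates: (entity word, similarity, entity type probability) triples."""
    ranked = sorted(
        candidates,
        key=lambda c: combined_score(c[1], c[2], similarity_weight),
        reverse=True,
    )
    return [word for word, _, _ in ranked[:k]]


# Hypothetical scores: both words match preset data exactly, but the tagging
# model is assumed to be more confident that "spiderman" is an entity.
candidates = [("spider", 1.0, 0.3), ("spiderman", 1.0, 0.9)]
targets = select_target_entity_words(candidates, k=1)
print(targets)  # ['spiderman']
```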

Step 203, determining a target keyword in the candidate keywords as a query keyword according to the data type corresponding to the candidate keyword.

In the embodiment of the application, the candidate keywords are not necessarily the content that the user wants to query. For example, the text information "query movie spiderman" may match the candidate keyword "spider", which is not what the user wishes to query, as well as the candidate keyword "spiderman". Therefore, whether a candidate keyword is the content that the user wants to query can be determined according to the data type corresponding to the candidate keyword and the position of the candidate keyword in the text information, so that the target keyword is determined from the candidate keywords and used as the query keyword.

In an embodiment of the present application, the step of determining a target keyword among the candidate keywords as a query keyword according to a data type corresponding to the candidate keyword includes:

S41, replacing the candidate keywords in the text information with the data types corresponding to the candidate keywords to obtain candidate language information;

In the embodiment of the application, the candidate keywords in the text information may be replaced with their corresponding data types to obtain candidate language information, so that it can be determined whether the candidate language information conforms to the grammar rules of text information and reads as a normal sentence. For example, between the text message "hello, good weather today" and "hello, good weather today", the "hello, good weather today" conforms more to the grammatical rules of the text message, and is closer to the normal sentence.

As an example of the present application, the text information is "query movie spiderman", and the candidate keywords "movie", "spider", and "spiderman" can be matched. The data type of "spider" and "spiderman" may be dimension enumeration value, marked as #dimEnum; the data type of "movie" may be dimension, marked as #dimension. Replacing the candidate keywords in the text information with their corresponding data types yields the candidate language information "query #dimension #dimEnum man" (with "spider" replaced) and "query #dimension #dimEnum" (with "spiderman" replaced).
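The replacement step in this example can be sketched as below. The helper `replace_with_types` and the way overlapping keywords are split into separate selections are assumptions made for illustration; the application does not prescribe how the alternatives are enumerated.

```python
import re
from typing import Dict

def replace_with_types(text: str, keyword_types: Dict[str, str]) -> str:
    """Replace each selected keyword in the text with its data type marker."""
    if not keyword_types:
        return text
    # Try longer keywords first so a long keyword is not split by a shorter one.
    ordered = sorted(keyword_types, key=len, reverse=True)
    pattern = "|".join(re.escape(k) for k in ordered)
    # Pad the marker with spaces so leftover characters stay separate tokens.
    replaced = re.sub(pattern, lambda m: f" {keyword_types[m.group(0)]} ", text)
    return " ".join(replaced.split())

text = "query movie spiderman"
# One selection per alternative among the overlapping keywords (assumed scheme).
selections = [
    {"movie": "#dimension", "spider": "#dimEnum"},
    {"movie": "#dimension", "spiderman": "#dimEnum"},
]
candidate_language_info = [replace_with_types(text, s) for s in selections]
# candidate_language_info ==
#   ["query #dimension #dimEnum man", "query #dimension #dimEnum"]
```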

S42, determining grammar probability corresponding to the candidate language information by adopting a preset language model;

In the embodiment of the application, a preset language model can be adopted to determine the grammar probability corresponding to the candidate language information, so that whether the candidate language information conforms to the grammar rules of text and is close to a normal sentence can be determined based on the grammar probability.

In the embodiment of the present application, the language model may be an n-gram model, an NNLM (Neural Network Language Model), or the like, which is not limited in the present application. A plurality of samples in which part of the text information is replaced by data types may be designed in advance according to the sentence structures that the text information is likely to involve, and the language model to be trained is trained with these samples to obtain the language model.

As an example of the present application, the samples in which the text information is partially replaced with data types may be "#cube", "#dimEnum", "#time", "#measure for #cube", "how much #measure for #cube is", "how much #measure for #time #cube #dimEnum", or the like.

S43, determining a target keyword in the candidate keywords as a query keyword according to the grammar probability.

In the embodiment of the application, the candidate language information that has a higher grammar probability and reads as a normal sentence can be identified according to the grammar probability. The candidate keywords contained in that candidate language information are determined as the target keywords, and the target keywords are used as the query keywords.

In the embodiment of the present application, there may be a plurality of pieces of candidate language information that have a high grammar probability and read as normal sentences. In this case, according to actual needs, the candidate keyword in the candidate language information with the highest grammar probability may be used as the target keyword. Alternatively, starting from the candidate keywords in the candidate language information with higher grammar probabilities, the target keyword may be determined by confirming the user's intention with the user, by having the user select the target keyword from the candidate keywords, or the like.
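Steps S42 and S43 can be illustrated with a small bigram model. This is only a sketch under stated assumptions: the add-one smoothing, the `<unk>` handling, the class name `BigramModel`, and the training samples are all hypothetical choices, and a real system might use a higher-order n-gram model or an NNLM instead.

```python
import math
from collections import Counter
from typing import Iterable

class BigramModel:
    """Add-one smoothed bigram model over type-replaced sample sentences."""

    def __init__(self, samples: Iterable[str]) -> None:
        self.context_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        self.vocab = {"</s>", "<unk>"}
        for sample in samples:
            tokens = ["<s>"] + sample.split() + ["</s>"]
            self.vocab.update(tokens[1:])
            for prev, cur in zip(tokens, tokens[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def log_prob(self, sentence: str) -> float:
        """Log of the grammar probability of a sentence under the model."""
        words = [w if w in self.vocab else "<unk>" for w in sentence.split()]
        tokens = ["<s>"] + words + ["</s>"]
        size = len(self.vocab)
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            numerator = self.bigram_counts[(prev, cur)] + 1
            denominator = self.context_counts[prev] + size
            total += math.log(numerator / denominator)
        return total

# Hypothetical training samples in the style of the samples described above.
samples = [
    "query #dimension #dimEnum",
    "query #cube #measure",
    "#measure for #cube",
    "how much #measure for #cube is",
]
model = BigramModel(samples)
candidates = ["query #dimension #dimEnum man", "query #dimension #dimEnum"]
best = max(candidates, key=model.log_prob)
```

Here `best` is the candidate without the stray token "man", so "spiderman" rather than "spider" would be kept as the target keyword.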

Step 204, determining slot position types corresponding to the query keywords according to text information;

In an embodiment of the application, the step of determining the slot type corresponding to the query keyword according to the text information includes:

S51, replacing the query keywords in the text information with the data types corresponding to the query keywords to obtain query language information;

In this embodiment of the application, in order to determine the slot type corresponding to the query keyword, the query keyword in the text information may be replaced with the data type corresponding to the query keyword, so as to obtain query language information. In this way, determining the slot type for a query keyword is reduced to determining the slot type for its data type, which avoids the increased analysis difficulty of parsing text information directly over a large number of different query keywords.

S52, carrying out syntactic analysis on the query language information, and determining a syntactic tree corresponding to the query language information;

In this embodiment of the present application, the query language information may be parsed to determine the sentence component corresponding to each word in the query language information and to generate a corresponding syntax tree.

In a specific implementation, a Probabilistic Context Free Grammar (PCFG) can be employed for syntax analysis. The probabilistic context-free grammar can define a quadruple { N, E, S, R }. Where N represents a set of non-terminal symbols, E represents a set of terminal symbols, S represents an initial symbol, R represents a set of grammar rules, and each grammar rule in the set of grammar rules may be provided with a probability P.

Then, the CYK (Cocke-Younger-Kasami) algorithm may be adopted to perform dynamic programming based on the probabilistic context-free grammar, so as to obtain at least one syntax tree corresponding to the query language information. The probability of each syntax tree is the product of the probabilities of all grammar rules used in it. The higher the probability of a syntax tree, the more likely it is to be the correct syntax tree. Thus, a target syntax tree can be determined among the syntax trees as the syntax tree corresponding to the query language information.
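A minimal probabilistic CYK sketch is given below. It assumes a grammar in Chomsky normal form, which the standard CYK algorithm requires; the helper symbol `REST`, all rule probabilities, and the token order of the input are hypothetical and chosen only so that the toy grammar has one parse. The other symbol names are borrowed from the slot types used in this description.

```python
from typing import Dict, List, Optional, Tuple

Lexical = Dict[Tuple[str, str], float]              # (lhs, terminal) -> probability
Binary = Dict[Tuple[str, Tuple[str, str]], float]   # (lhs, (left, right)) -> probability

def cyk_parse(tokens: List[str], lexical: Lexical, binary: Binary,
              start: str = "S") -> Optional[Tuple[float, tuple]]:
    """Return (probability, tree) of the most probable parse, or None."""
    n = len(tokens)
    if n == 0:
        return None
    # best[(i, j)][symbol] = (probability, tree) for tokens[i:j]
    best: Dict[Tuple[int, int], Dict[str, Tuple[float, tuple]]] = {}
    for i, tok in enumerate(tokens):
        cell: Dict[str, Tuple[float, tuple]] = {}
        for (lhs, word), p in lexical.items():
            if word == tok and p > cell.get(lhs, (0.0, ()))[0]:
                cell[lhs] = (p, (lhs, tok))
        best[(i, i + 1)] = cell
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = {}
            for k in range(i + 1, j):
                left, right = best[(i, k)], best[(k, j)]
                for (lhs, (b, c)), p in binary.items():
                    if b in left and c in right:
                        # Tree probability is the product of all rule probabilities.
                        prob = p * left[b][0] * right[c][0]
                        if prob > cell.get(lhs, (0.0, ()))[0]:
                            cell[lhs] = (prob, (lhs, left[b][1], right[c][1]))
            best[(i, j)] = cell
    return best[(0, n)].get(start)

# Hypothetical binarized grammar with made-up probabilities.
binary_rules: Binary = {
    ("S", ("TABLE_SLOT", "REST")): 1.0,
    ("REST", ("WHERE_CLAUSE", "SELECT_TARGET")): 1.0,
    ("WHERE_CLAUSE", ("WHERE_CONDITION", "AUX")): 1.0,
}
lexical_rules: Lexical = {
    ("TABLE_SLOT", "#cube"): 1.0,
    ("WHERE_CONDITION", "#dimEnum"): 0.8,
    ("WHERE_CONDITION", "#time"): 0.2,
    ("AUX", "of"): 1.0,
    ("SELECT_TARGET", "#measure"): 1.0,
}
result = cyk_parse(["#cube", "#dimEnum", "of", "#measure"], lexical_rules, binary_rules)
```

If several parses existed for the same span, the comparison against the current best entry keeps only the most probable one, which corresponds to choosing the target syntax tree by probability.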

S53, determining the slot position type corresponding to the query keyword by adopting the syntax tree.

In this embodiment of the present application, the syntax tree may be adopted to determine a slot type corresponding to the query keyword, so as to compose the query keyword into a query statement. The slot type may be a function type corresponding to each component in the query statement.

In a specific implementation, in the case of processing text information, a phrase structure in the text information and a part of speech of each word in the text information may be generally marked in a syntax tree.

For example, in the text information "the teacher was amused by the late student", "the teacher" is a noun phrase and "was amused by the late student" is a verb phrase. The verb phrase further includes a prepositional phrase "by the late student" and a verb phrase "amused". The prepositional phrase includes a preposition "by" and a noun phrase "the late student"; the noun phrase "the late student" includes a verb phrase "late", an auxiliary word, and a noun phrase "student"; and the verb phrase "amused" includes a verb "amused" and an auxiliary word. The phrase structure of the text information is thereby obtained. As for parts of speech in this text information, "teacher" and "student" are nouns, "by" is a preposition, and "late" and "amused" are verbs.

Because the query keywords in the query language information have been replaced by data types, a custom quadruple can be designed for the query language information, so that a corresponding slot type can be determined for each data type in the query language information and, from it, the slot type corresponding to each query keyword.

Specifically, the non-terminal symbol set and the terminal symbol set of a conventional quadruple generally include part-of-speech tags such as noun, noun phrase, verb, and conjunction, which are used for part-of-speech tagging of the words in the text information. In the present application, some of the part-of-speech tags in the non-terminal symbol set and the terminal symbol set may be replaced with slot types and data types. The grammar rules in the quadruple can be set based on the common sentence structures of the query language information. The syntax tree can therefore be used to mark the slot type corresponding to each data type in the query language information.

As a specific example of the present application, the set of non-terminal symbols may include:

S (initial symbol), and slot types such as TABLE_SLOT, SELECT_TARGET, SELECT_CONCAT, SELECT_COLUMN, SELECT_AGG_MAX_PREFIX, SELECT_AGG_MAX_SUFFIX, WHERE_CLAUSE, WHERE_CONDITION, WHERE_CONCAT_AND, WHERE_CONCAT_OR, WHERE_SLOT, WHERE_VALUE, WHERE_OPERATOR_EQUAL_PREFIX.

As a specific example of the present application, the terminal symbol set may include:

Conjunctions, modifiers, auxiliary words, punctuation marks, and data types such as #cube (metadata), #dimension (dimension attribute), #dimEnum (dimension enumeration value), #measure (index), #time (time), and #number (number).

As a specific example of the present application, the grammar rule may include:

S→TABLE_SLOT WHERE_CLAUSE SELECT_TARGET

TABLE_SLOT→#cube

WHERE_CLAUSE→WHERE_CONDITION aux

WHERE_CONDITION→#dimEnum

aux→of

SELECT_TARGET→#measure

As a specific example of the present application, fig. 3 is a schematic diagram of a syntax tree of the present application. The text information is "resolution of membership query for a certain product". The query keywords "certain product", "membership query", and "resolution" can be extracted, and the data type corresponding to "certain product" is determined to be metadata, the data type corresponding to "membership query" a dimension enumeration value, and the data type corresponding to "resolution" an index, thereby generating the query language information "#measure of #cube #dimEnum". Syntactic analysis is performed on the query language information to determine the corresponding syntax tree. In the tree, S is the initial symbol, and the query language information is divided into "table name", "whereClause", and "selectTarget", where "whereClause" is further divided into "whereCondition" and "aux". Here #cube belongs to table name, #dimEnum belongs to whereCondition, and #measure belongs to selectTarget.
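Reading slot types off a syntax tree, as in the example above, can be sketched as follows. The nested-tuple tree representation, the function names, and the assumption that data-type leaves appear in the same order as the replaced keywords are all introduced here for illustration.

```python
from typing import Iterator, List, Tuple

def typed_leaves(tree: tuple) -> Iterator[Tuple[str, str]]:
    """Yield (slot_type, data_type) for every data-type leaf, left to right."""
    label, *children = tree
    for child in children:
        if isinstance(child, tuple):
            yield from typed_leaves(child)
        elif child.startswith("#"):
            # The label directly above a data-type leaf is taken as its slot type.
            yield (label, child)

def assign_slots(tree: tuple, keywords: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Map each (keyword, data_type) pair to the slot type found in the tree."""
    leaves = list(typed_leaves(tree))
    if [t for _, t in leaves] != [t for _, t in keywords]:
        raise ValueError("tree leaves do not line up with the replaced keywords")
    return [(word, slot) for (word, _), (slot, _) in zip(keywords, leaves)]

# Hypothetical tree for the example, written as nested tuples.
tree = (
    "S",
    ("TABLE_SLOT", "#cube"),
    ("WHERE_CLAUSE", ("WHERE_CONDITION", "#dimEnum"), ("AUX", "of")),
    ("SELECT_TARGET", "#measure"),
)
keywords = [
    ("certain product", "#cube"),
    ("membership query", "#dimEnum"),
    ("resolution", "#measure"),
]
slots = assign_slots(tree, keywords)
```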

In an embodiment of the present application, the data type includes at least one of metadata, a dimension attribute, an index, a dimension enumeration value, and a time value.

S61, determining whether the data type corresponding to the query keyword contains metadata;

In the embodiment of the present application, the metadata may include description information of the cube, instance information of the cube, entry information, table information, dictionary information, and the like. The metadata can therefore generally point to the cube in which the data that the user wishes to query is located. If the query keywords do not contain metadata, query accuracy is easily reduced. Therefore, after the data type corresponding to the query keyword is determined, whether the data type corresponding to the query keyword contains metadata can be determined.

S62, if the data type corresponding to the query keyword contains metadata, determining the slot position type corresponding to the query keyword according to text information.

In the embodiment of the present application, if the data type corresponding to the query keyword includes metadata, a cube in which the data that the user wishes to query is located may be determined at this time, so that the slot type corresponding to the query keyword may be determined according to text information.

In an embodiment of the application, the step of determining the slot type corresponding to the query keyword further includes:

S71, if the data type corresponding to the query keyword does not contain metadata, determining whether historical text information contains the historical query keyword with the data type as the metadata;

In this embodiment of the application, the user may carry out multiple rounds of interaction with the computer. If the data type corresponding to the query keyword does not contain metadata, a query keyword whose data type is metadata may therefore exist in the historical text information that the user previously input to the computer. Accordingly, whether the historical text information contains a historical query keyword whose data type is metadata can be determined.

S72, if the historical text information contains historical query keywords with data types as metadata, determining the historical query keywords and slot position types of the query keywords according to the historical text information;

In the embodiment of the application, if the historical text information includes a historical query keyword whose data type is metadata, the historical query keyword in the historical text information that has the same data type as the current query keyword can be replaced by the current query keyword, so that new text information is obtained that still includes the historical query keyword whose data type is metadata. The new text information is then used to determine the slot types of the historical query keyword whose data type is metadata and of the query keyword. The user can therefore carry out multiple rounds of interaction with the computer without repeatedly inputting text information containing the metadata.

S73, if the historical text information does not contain the historical query keywords with the data types as the metadata, determining the metadata keywords with the data types as the metadata by adopting the query keywords; and determining the slot position types of the metadata keywords and the query keywords according to text information.

In this embodiment of the application, if the historical text information does not include a historical query keyword whose data type is metadata, a cube that contains the query keyword may be determined by using the query keyword in the current text information, and the metadata keyword whose data type is metadata may be determined based on the metadata of that cube. Then, according to actual needs, the metadata keyword can be displayed to the user for confirmation of whether it is correct, or it can be added directly to the text information, and the slot types of the metadata keyword and the query keyword are determined according to the text information.
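The decision flow of S61 to S73 can be sketched as below. The representation of keywords as (keyword, data type) pairs, the lookup table `cube_of`, the keyword "refund query", and the choice to append current keywords whose data type has no counterpart in the history are all assumptions made for this illustration.

```python
from typing import Dict, List, Tuple

Keyword = Tuple[str, str]  # (keyword, data type)
META = "#cube"             # marker for the metadata data type

def has_metadata(keywords: List[Keyword]) -> bool:
    return any(data_type == META for _, data_type in keywords)

def resolve_metadata(current: List[Keyword], history: List[List[Keyword]],
                     cube_of: Dict[str, str]) -> List[Keyword]:
    """Return the keyword list from which slot types would then be determined."""
    # S61/S62: the current keywords already contain metadata.
    if has_metadata(current):
        return list(current)
    # S71/S72: reuse the most recent historical turn that contains metadata,
    # replacing same-typed historical keywords with the current ones.
    for turn in reversed(history):
        if has_metadata(turn):
            by_type = {data_type: word for word, data_type in current}
            merged = [(by_type.pop(t, w), t) for w, t in turn]
            # Assumption: current keywords with no same-typed match are appended.
            merged.extend((w, t) for t, w in by_type.items())
            return merged
    # S73: look up a cube containing one of the current keywords (hypothetical index).
    for word, _ in current:
        if word in cube_of:
            return [(cube_of[word], META)] + list(current)
    return list(current)

history = [[("certain product", "#cube"),
            ("membership query", "#dimEnum"),
            ("resolution", "#measure")]]
follow_up = resolve_metadata([("refund query", "#dimEnum")], history, {})
first_turn = resolve_metadata([("resolution", "#measure")], [],
                              {"resolution": "certain product"})
```

In the follow-up turn the cube "certain product" and the index "resolution" carry over from the history, while the dimension enumeration value is replaced by the new one.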

Step 205, generating a query statement by using the query keyword and the slot position type corresponding to the query keyword;
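As one possible illustration of this step, the sketch below assembles a parameterized SQL statement from slot-typed keywords and runs it against an in-memory SQLite database. That the query statement is SQL, the table and column names, the mapping `enum_column` from a dimension enumeration value to its column, and the stored value are all assumptions made here; this passage does not specify them.

```python
import sqlite3
from typing import Dict, Tuple

def quote_identifier(name: str) -> str:
    """Quote an identifier after a simple sanity check (letters, digits, underscore)."""
    if not name.replace("_", "").isalnum():
        raise ValueError(f"unexpected identifier: {name!r}")
    return f'"{name}"'

def build_query(slots: Dict[str, str], enum_column: Dict[str, str]) -> Tuple[str, tuple]:
    """Compose a SELECT statement from slot type -> keyword assignments."""
    table = quote_identifier(slots["TABLE_SLOT"])
    target = quote_identifier(slots["SELECT_TARGET"])
    sql = f"SELECT {target} FROM {table}"
    params: tuple = ()
    value = slots.get("WHERE_CONDITION")
    if value is not None:
        # The enumeration value is passed as a bound parameter, not spliced in.
        column = quote_identifier(enum_column[value])
        sql += f" WHERE {column} = ?"
        params = (value,)
    return sql, params

# Hypothetical preset database for the running example.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "certain_product" ("scenario" TEXT, "resolution" REAL)')
conn.execute('INSERT INTO "certain_product" VALUES (?, ?)', ("membership query", 0.87))

slots = {
    "TABLE_SLOT": "certain_product",
    "WHERE_CONDITION": "membership query",
    "SELECT_TARGET": "resolution",
}
sql, params = build_query(slots, {"membership query": "scenario"})
rows = conn.execute(sql, params).fetchall()
```

Each slot type decides where its keyword lands in the statement, which is why the slot types are determined before the statement is generated.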

In the embodiment of the application, after the query result information corresponding to the query statement is determined, the query result information can be directly displayed to a user. Or generating interactive information expressed by texts based on the query result information, and displaying the interactive information to the user. Thereby realizing text interaction between the user and the computer.

In one embodiment of the present application, the method further comprises:

S81, determining the intention category corresponding to the text information by adopting a preset text classification model.

In the embodiment of the application, a preset text classification model can be adopted to perform intention identification on the text information and determine the intention category corresponding to the text information. As an alternative embodiment, the intention category may include data query, factor analysis, anomaly detection, time series prediction, and the like, which is not limited in this application.

The text classification model may be a TextCNN (a convolutional neural network for text classification) model, a classifier based on BERT (Bidirectional Encoder Representations from Transformers), or the like, which is not limited in the present application.
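To show the shape of the intention identification step without depending on a neural network library, the sketch below uses a bag-of-words Naive Bayes classifier as a stand-in. It is not TextCNN or BERT; the class name `NaiveBayesIntent` and the training sentences are hypothetical, and only the intention category names come from this description.

```python
import math
from collections import Counter, defaultdict
from typing import Iterable, Tuple

class NaiveBayesIntent:
    """Multinomial Naive Bayes with add-one smoothing (stand-in classifier)."""

    def __init__(self, samples: Iterable[Tuple[str, str]]) -> None:
        self.word_counts = defaultdict(Counter)
        self.class_counts: Counter = Counter()
        self.vocab = set()
        for text, label in samples:
            words = text.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def predict(self, text: str) -> str:
        # Words never seen in training are ignored.
        words = [w for w in text.lower().split() if w in self.vocab]
        total = sum(self.class_counts.values())

        def score(label: str) -> float:
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocab)
            value = math.log(self.class_counts[label] / total)
            for word in words:
                value += math.log((counts[word] + 1) / denominator)
            return value

        return max(self.class_counts, key=score)

# Hypothetical labeled samples, one per intention category.
training = [
    ("what is the resolution of certain product", "data query"),
    ("why did the resolution drop", "factor analysis"),
    ("is the resolution abnormal today", "anomaly detection"),
    ("predict the resolution next week", "time series prediction"),
]
classifier = NaiveBayesIntent(training)
intent = classifier.predict("predict resolution for next month")
```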

In this embodiment of the application, after the query result information is acquired, whether the query result information conforms to the intention category corresponding to the text information may be analyzed, so as to determine whether the query result information is abnormal. If it is abnormal, the entity labeling model, the grammar type, the algorithm for dividing the syntax tree, and the like used in generating the query statement can be adjusted to further improve the accuracy of the query result information.

With the query statement generation method of the embodiment of the application, text information is acquired; candidate keywords and the data types corresponding to the candidate keywords are extracted from the text information; a target keyword among the candidate keywords is determined as a query keyword according to the data type corresponding to the candidate keyword; and a query statement is generated by using the query keyword and the slot type corresponding to the query keyword. The query keywords and their slot types can therefore be determined directly from the text information and a query statement generated to query the database, which can improve processing efficiency and yield higher accuracy.

It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the embodiments are not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the embodiments. Further, those skilled in the art will also appreciate that the embodiments described in the specification are presently preferred and that no particular act is required of the embodiments of the application.

Referring to fig. 4, a block diagram of an embodiment of a query statement generation apparatus according to the present application is shown, and specifically includes the following modules:

an obtaining module 401, configured to obtain text information;

an extracting module 402, configured to extract a query keyword from the text information;

a slot type determining module 403, configured to determine, according to the text information, a slot type corresponding to the query keyword;

a generating module 404, configured to generate a query statement by using the query keyword and a slot type corresponding to the query keyword;

the searching module 405 is configured to search query result information corresponding to the query statement in a preset database.

Optionally, the extraction module comprises:

the candidate keyword extraction sub-module comprises:

Optionally, the candidate keyword extraction unit includes:

Optionally, the query keyword determination sub-module includes:

Optionally, the slot type determining module includes:

the slot type determination submodule comprises:

Optionally, the slot type determining sub-module further includes:

Optionally, the apparatus further comprises:

Since the device embodiment is basically similar to the method embodiment, its description is brief; for relevant details, refer to the corresponding description of the method embodiment.

An embodiment of the present application further provides an apparatus, including:

one or more processors; and

one or more machine-readable media having instructions stored thereon, which when executed by the one or more processors, cause the apparatus to perform methods as described in embodiments of the present application.

Embodiments of the present application also provide one or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause the processors to perform the methods of embodiments of the present application.

The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.

As will be appreciated by one of skill in the art, embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present application have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the true scope of the embodiments of the application.

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.

The query statement generation method and the query statement generation device provided by the present application have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present application, and the description of the above embodiments is only intended to help understand the method and core idea of the present application. Meanwhile, for a person skilled in the art, there may be variations in the specific embodiments and the application scope according to the idea of the present application. In summary, the content of this specification should not be construed as a limitation on the present application.

Claims

1. A method for generating a query statement, comprising:

acquiring text information input by a user in a conversation process;

extracting query keywords from the text information;

determining a slot position type corresponding to the query keyword according to the text information;

and generating a query statement by adopting the query keyword and the slot position type corresponding to the query keyword, wherein the query statement is used for querying data in a database and feeding back the data to the user.

2. The method of claim 1, wherein the step of extracting query keywords from the text message comprises:

3. The method according to claim 2, wherein the database comprises at least one preset entity data and a data type corresponding to the preset entity data;

4. The method according to claim 3, wherein the step of determining a target entity word as a candidate keyword among the candidate entity words and determining a data type corresponding to the candidate keyword comprises:

5. The method according to claim 4, wherein the step of determining a target entity word in the candidate entity words as a candidate keyword based on the similarity between the candidate entity word and the preset entity data comprises:

6. The method according to claim 2, wherein the step of determining a target keyword among the candidate keywords as a query keyword according to the data type corresponding to the candidate keyword comprises:

7. The method according to claim 1, wherein the step of determining the slot type corresponding to the query keyword according to the text information comprises:

8. The method of claim 2, wherein the data types include at least one of metadata, dimension attributes, metrics, dimension enumeration values, and time values;

9. The method of claim 8, wherein the step of determining the slot type corresponding to the query keyword further comprises:

10. The method of claim 1, further comprising:

11. An apparatus for generating a query statement, comprising:

the acquisition module is used for acquiring text information;

and the generating module is used for generating a query statement by adopting the query keyword and the slot position type corresponding to the query keyword.

12. The apparatus of claim 11, wherein the extraction module comprises:

13. The apparatus according to claim 12, wherein the database comprises at least one preset entity data and a data type corresponding to the preset entity data;

the candidate keyword extraction sub-module comprises:

14. The apparatus of claim 13, wherein the candidate keyword extraction unit comprises:

15. The apparatus according to claim 14, wherein the candidate keyword extraction subunit is specifically configured to determine, by using a preset entity labeling model, an entity type probability corresponding to a candidate entity word in the text information; and determining a target entity word in the candidate entity words as a candidate keyword based on the similarity between the candidate entity words and the preset entity data and the entity type probability corresponding to the candidate entity words.

16. The apparatus of claim 12, wherein the query keyword determination sub-module comprises:

17. The apparatus of claim 11, wherein the slot type determination module comprises:

18. The apparatus of claim 12, wherein the data types comprise at least one of metadata, dimension attributes, metrics, dimension enumeration values, and time values;

the slot type determination submodule comprises:

19. The apparatus of claim 18, wherein the slot type determination submodule further comprises:

20. The apparatus of claim 11, further comprising:

21. An apparatus, comprising:

one or more processors; and

one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the apparatus to perform the method of one or more of claims 1-10.

22. One or more machine-readable media having instructions stored thereon, which when executed by one or more processors, cause the processors to perform the method of one or more of claims 1-10.