CN112380240A

CN112380240A - Data query method, device and equipment based on semantic recognition and storage medium

Info

Publication number: CN112380240A
Application number: CN202011283796.8A
Authority: CN
Inventors: 赵亮
Original assignee: OneConnect Financial Technology Co Ltd Shanghai
Current assignee: OneConnect Financial Technology Co Ltd Shanghai
Priority date: 2020-11-17
Filing date: 2020-11-17
Publication date: 2021-02-19
Also published as: WO2022105493A1

Abstract

The invention relates to the field of artificial intelligence and provides a data query method, device, equipment and storage medium based on semantic recognition. The method comprises: receiving a data query request sent by a user and obtaining first voice data carried in the request; identifying an intention result of the first voice data based on a pre-constructed intention template; obtaining a target character set corresponding to the intention result; judging whether the target character set meets a preset condition for generating a target SQL statement; when the condition is met, generating the target SQL statement based on the target character set; querying a preset database according to the target SQL statement to obtain target data; and feeding the target data back to the user. The invention also relates to the technical field of blockchains: the first voice data and the target data can be stored in a node of a blockchain.

Description

Data query method, device and equipment based on semantic recognition and storage medium

Technical Field

The invention relates to the field of artificial intelligence, in particular to a data query method, a data query device, data query equipment and a storage medium based on semantic recognition.

Background

At present, existing natural language query functions are implemented with either rules or deep learning. The former constrains the sentence-pattern complexity of user queries through offline training or interface constraints in order to obtain more accurate parsing results. The latter is trained on a large-scale corpus, and when the form of a user question falls outside the range covered by the model's training set, the parsing result is often incorrect, so that data query efficiency is low.

Disclosure of Invention

In view of the above, the present invention provides a data query method, device, apparatus and storage medium based on semantic recognition, and aims to solve the technical problem of low data query efficiency in the prior art.

In order to achieve the above object, the present invention provides a data query method based on semantic recognition, which comprises:

receiving a data query request sent by a user, acquiring first voice data carried in the request, and identifying an intention result of the first voice data based on a pre-constructed intention template;

acquiring a target character set corresponding to the intention result, and judging whether the target character set meets a preset condition for generating a target SQL statement;

and when the target character set meets the preset condition, generating a target SQL statement based on the target character set, querying a preset database according to the target SQL statement to obtain target data, and feeding the target data back to the user.

Preferably, the recognizing the intention result of the first voice data based on the pre-constructed intention template includes:

recognizing a semantic result of the first voice data according to a pre-trained semantic recognition model, matching the semantic result with a plurality of pre-constructed intention templates, judging whether an intention template associated with the semantic result is matched, and taking intention information corresponding to the intention template as an intention result of the first voice data when the intention template associated with the semantic result is matched.

Preferably, the judging whether the intention template associated with the semantic result is matched comprises:

and when the intention template associated with the semantic result is not matched, respectively calculating similarity values of the semantic result and intention information corresponding to the intention templates, and when the similarity values are larger than a preset threshold value, taking the intention information corresponding to the intention template with the largest similarity value as the intention result of the first voice data.

Preferably, after the calculating the similarity value of the semantic result and the intention information corresponding to each intention template, the method further includes:

and when the similarity value larger than the preset threshold value does not exist, feeding back first preset prompt information to the user.

Preferably, after the determining whether the target character set satisfies a preset condition for generating a target SQL statement, the method further includes:

and when the target character set does not meet the condition for generating the target SQL statement, determining a target story line corresponding to the intention information from a pre-constructed story line set, and identifying a target character set corresponding to the first voice data according to the target story line.

Preferably, the determining a target story line corresponding to the intention information from a set of pre-constructed story lines includes:

matching the intention information with the sentences of a plurality of root nodes of the story line set, feeding back a first sentence of the successfully matched root node to the user, receiving second voice data entered by the user in response to the first sentence, identifying intention information of the second voice data based on a pre-constructed intention template, adding a target character of that intention information to the target character set, and re-judging whether the target character set meets the preset condition for generating the target SQL statement.

Preferably, before the receiving the request of the data query sent by the user, the method further comprises:

and acquiring the identity information of the user, matching the identity information with a white list of users having the authority to initiate a data query request, executing the subsequent steps when the matching succeeds, and rejecting the request and sending second preset prompt information when the matching fails.

In order to achieve the above object, the present invention further provides a data query device based on semantic recognition, including:

an identification module, configured to receive a data query request sent by a user, acquire first voice data carried in the request, and identify an intention result of the first voice data based on a pre-constructed intention template;

a judging module, configured to acquire a target character set corresponding to the intention result and judge whether the target character set meets a preset condition for generating a target SQL statement;

a query module, configured to generate a target SQL statement based on the target character set when the target character set meets the preset condition, query a preset database according to the target SQL statement to obtain target data, and feed the target data back to the user.

In order to achieve the above object, the present invention also provides an electronic device, including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores a program executable by the at least one processor to enable the at least one processor to perform any of the steps of the semantic identification based data query method as described above.

To achieve the above object, the present invention further provides a computer-readable storage medium storing a data query program based on semantic recognition; when the data query program is executed by a processor, any of the steps of the data query method based on semantic recognition described above are implemented.

According to the data query method, device, equipment and storage medium based on semantic recognition, the first voice data carried in the query request are obtained, the intention result of the first voice data is recognized according to the pre-constructed intention template, the target character set corresponding to the intention result is obtained, whether the target character set meets the condition for generating the target SQL statement or not is judged, when the condition is met, the target SQL statement is generated according to the target character set, the database is queried according to the target SQL statement, and the efficiency of user data query is improved.

Drawings

FIG. 1 is a flow chart diagram of a preferred embodiment of the data query method based on semantic recognition according to the present invention;

FIG. 2 is a block diagram of a preferred embodiment of a data query device based on semantic recognition according to the present invention;

FIG. 3 is a diagram of an electronic device according to a preferred embodiment of the present invention;

The implementation, functional features and advantages of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The invention provides a data query method based on semantic recognition. Referring to fig. 1, a method flow diagram of an embodiment of the data query method based on semantic recognition according to the present invention is shown. The method may be performed by an electronic device, which may be implemented by software and/or hardware. The data query method based on semantic recognition comprises the following steps:

step S10: receiving a data query request sent by a user, acquiring first voice data carried in the request, and identifying an intention result of the first voice data based on a pre-constructed intention template.

In this embodiment, the solution is described using a scenario in which a user queries report data; the solution is not, however, limited to querying report data. When the user needs to query report data, for example the report data of a certain financial product, the user can open an application for querying report data on the terminal and enter the data to be queried by voice through the application. For example, after the user clicks a "report data query" virtual button on the human-computer interaction interface of the application, the terminal displays the prompt "please input by voice the specific information you want to query". After entering the relevant voice, the user can initiate a request for querying report data to the terminal. On receiving the request, the terminal parses it to obtain the first voice data carried in the request. The request may contain the voice data entered by the user, or it may contain a storage path and a unique identifier of the voice data. That is, the first voice data may be entered by the user when submitting the data query request.

Then, the semantic information of the first voice data is recognized according to a pre-constructed intention template. The intention template may be an NLU template, where NLU is an abbreviation of Natural Language Understanding. The NLU template records the intentions corresponding to various sentence patterns. For example:

##intent:thankyou

- Thank you

- Thanks

"Thank you", "thanks" and similar expressions all belong to the thankyou intention.

##intent:one_amb_rule

- How is the recent [placeholder indicator](rule) situation?

- How is [placeholder indicator](rule) doing?

- Let me take a look at the [placeholder indicator](rule) situation

- How has [placeholder indicator](rule) been recently?

The square brackets are placeholders and can hold any word relevant to the scenario; in a loan data sheet scenario, for example, they may hold overdue rates, bad rates, and so on. The parentheses give the entity type of the placeholder. All of these templates correspond to the "one_amb_rule" intention.
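The template format above can be illustrated with a minimal Python sketch. It is not the implementation of the invention: the template sentences, the `TEMPLATES` table and the function names are hypothetical, and plain regular expressions stand in for whatever matching the NLU component actually performs. Each `[text](type)` placeholder is turned into a named capture group whose name is the entity type.

```python
import re

# Hypothetical templates written in the style of the NLU examples above.
TEMPLATES = {
    "thankyou": ["thank you", "thanks"],
    "one_amb_rule": [
        "how is the recent [placeholder indicator](rule) situation",
        "let me take a look at the [placeholder indicator](rule) situation",
    ],
}

# Matches one "[text](entity_type)" placeholder inside a template sentence.
SLOT = re.compile(r"\[([^\]]+)\]\((\w+)\)")


def compile_template(template):
    """Build a regex in which every placeholder becomes a named group."""
    pattern, last = "", 0
    for m in SLOT.finditer(template):
        pattern += re.escape(template[last:m.start()])
        pattern += f"(?P<{m.group(2)}>.+?)"  # group name = entity type
        last = m.end()
    pattern += re.escape(template[last:])
    # Allow trailing punctuation such as "?" or "!".
    return re.compile(rf"^{pattern}\W*$", re.IGNORECASE)


COMPILED = [
    (intent, compile_template(t))
    for intent, sentences in TEMPLATES.items()
    for t in sentences
]


def match_intent(text):
    """Return (intent, entities) for the first matching template, else None."""
    for intent, rx in COMPILED:
        m = rx.match(text.strip())
        if m:
            return intent, m.groupdict()
    return None
```

Under these assumptions, `match_intent("How is the recent overdue rate situation?")` yields the `one_amb_rule` intention together with the captured `rule` entity, while a sentence that fits no template returns `None` and would be handed to the similarity fallback.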

In one embodiment, the recognizing the intention result of the first voice data based on the pre-constructed intention template includes:

The semantic recognition model can be trained on the basis of a BERT model, and the Rasa framework can be used for the training. Rasa is an open-source machine learning framework for building contextual AI assistants and chatbots; it can integrate large pre-trained language models such as BERT and XLNet, which greatly improves the accuracy with which the model understands intentions. After the semantic result of the first voice data is recognized, the semantic result is matched with the plurality of pre-constructed intention templates, and it is judged whether an intention template associated with the semantic result is matched. When such a template is matched, the intention information corresponding to it is used as the intention result of the first voice data.

Further, the determining whether the intention template associated with the semantic result is matched comprises:

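When no intention template associated with the semantic result is matched, similarity values between the semantic result and the intention information corresponding to each intention template are calculated. When a similarity value is larger than the preset threshold, the intention information corresponding to the intention template with the largest similarity value is selected as the intention result of the first voice data. In this way an intention result can still be obtained for the first voice data when no intention template is matched directly.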

Further, after the calculating the similarity value of the semantic result and the intention information corresponding to each intention template, the method further comprises:

When the first voice data of the user matches no intention template and the similarity between the semantic result and the intention information of every intention template is below the preset threshold, the first voice data is unrelated to the intentions of the templates. Prompt information, for example "please re-enter the query information by voice", may then be fed back to the user.
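The fallback described above (largest similarity above a threshold, otherwise a prompt) can be sketched as follows. The text does not specify the similarity measure, so this sketch assumes character-level similarity from Python's standard `difflib.SequenceMatcher`; the example sentences, the threshold of 0.6 and the names `INTENT_EXAMPLES` and `fallback_intent` are likewise hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical example sentences standing in for each template's intention information.
INTENT_EXAMPLES = {
    "thankyou": ["thank you"],
    "one_amb_rule": ["how is the recent overdue rate"],
}
THRESHOLD = 0.6  # assumed value; the text only says "a preset threshold"
RETRY_PROMPT = "Please re-enter the query information by voice."


def similarity(a, b):
    """Character-level similarity in [0, 1]; a stand-in for the real measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fallback_intent(semantic_result, examples=INTENT_EXAMPLES, threshold=THRESHOLD):
    """Return (intent, None) for the most similar intent above the threshold,
    or (None, prompt) when no similarity value exceeds it."""
    best_intent, best_score = None, 0.0
    for intent, sentences in examples.items():
        for sentence in sentences:
            score = similarity(semantic_result, sentence)
            if score > best_score:
                best_intent, best_score = intent, score
    if best_score > threshold:
        return best_intent, None
    return None, RETRY_PROMPT
```

A production system would more plausibly compare sentence embeddings than raw characters; the control flow, however, is the same.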

In one embodiment, before the receiving the request for the data query issued by the user, the method further comprises:

The identity information of the user is matched against a white list, stored in a preset database, of users who have permission to initiate data query requests; for example, the white list may be a list of users permitted to query data reports. When the white list contains data matching the user's identity information, the user is considered to have permission to initiate a data query request. When it does not, the user is considered not to have the permission, and preset prompt information such as "no query permission" is sent.
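A minimal sketch of this permission check, assuming the identity information has been reduced to a user identifier and the white list has been loaded into memory. The identifiers, the prompt text and the function names are hypothetical.

```python
# Hypothetical white list; in the text it is stored in a preset database.
WHITELIST = frozenset({"u1001", "u1002"})
NO_PERMISSION = "No query permission."


def authorize(user_id, whitelist=WHITELIST):
    """Return (allowed, prompt); prompt is None when the user is allowed."""
    if user_id in whitelist:
        return True, None
    return False, NO_PERMISSION


def handle_request(user_id, voice_data, process):
    """Run the subsequent steps only for white-listed users."""
    allowed, prompt = authorize(user_id)
    if not allowed:
        return prompt
    return process(voice_data)
```

Here `process` stands for the remaining pipeline (steps S10 to S30); it is passed in as a callable so that the check stays independent of it.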

Step S20: and acquiring a target character set corresponding to the intention result, and judging whether the target character set meets the preset condition for generating a target SQL statement.

In this embodiment, the target character set corresponding to the intention result is obtained, and it is judged whether the target character set meets the preset condition for generating the SQL statement. The preset condition may be that the target character set contains characters corresponding to the select, where and group by parts of the SQL statement. The purpose of identifying the intention result is to fill in the SQL step by step by slot filling until it becomes an executable SQL statement. The SQL statement consists of select, from, where, group by, order by and similar parts, and the slots are the entity types described below. Entities of the measure and rule types correspond to the aggregation-function part of select; the dimension type corresponds to a specific select column and to the group by part; and the filter, time and number types correspond to the where filter-condition part.
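The preset condition can be sketched as a check that every required SQL clause is covered by at least one filled slot. The sketch assumes the target character set is represented as a dictionary from entity type to recognized value; that representation, the table `CLAUSE_OF_TYPE` and the function names are hypothetical, and the dimension type is mapped only to the group by clause for simplicity.

```python
# Which SQL clause each entity type fills (simplified from the description above).
CLAUSE_OF_TYPE = {
    "measure": "select",
    "rule": "select",
    "dimension": "group_by",
    "filter": "where",
    "time": "where",
    "number": "where",
}
REQUIRED_CLAUSES = ("select", "where", "group_by")


def missing_clauses(target_set):
    """List the required clauses for which no slot has been filled yet."""
    covered = {
        CLAUSE_OF_TYPE[entity_type]
        for entity_type, value in target_set.items()
        if value and entity_type in CLAUSE_OF_TYPE
    }
    return [clause for clause in REQUIRED_CLAUSES if clause not in covered]


def meets_condition(target_set):
    """True when the select, where and group by parts can all be filled."""
    return not missing_clauses(target_set)
```

Returning the list of missing clauses, rather than only a boolean, gives a later dialogue step something concrete to ask the user about.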

It is necessary to identify which keywords in the user's first voice data relate to the preset data table (column names, values, etc.), as well as the time descriptors in the first voice data, such as "2020", "last three years" and "last year".

The keywords related to the preset data table are classified and mapped to the entity types of the intention template, so that the SQL statement generated from the target character set of the intention result determines which types of data in the data table the user's first voice data refers to. For example:

Numeric column name in the data table: measure type

Enumerated column name in the data table: dimension type

Enumeration value in the data table: filter type

Indicator: rule type

Time string: time type

Number: number type

The time type is independent of the specific scenario and can be identified through named entity recognition, regular expressions and the like.
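The two recognition tasks above, mapping table-related keywords to entity types and picking out time descriptors, can be sketched with a small vocabulary lookup and a few regular expressions. The vocabulary entries and the patterns are hypothetical and cover only English phrasings similar to the examples given; they are not taken from the invention.

```python
import re

# Hypothetical vocabulary derived from a data table's columns, values and indicators.
SCHEMA_VOCAB = {
    "loan amount": "measure",     # numeric column name
    "product": "dimension",       # enumerated column name
    "agency": "dimension",
    "overdue rate": "rule",       # indicator
}

# Illustrative patterns for scenario-independent time descriptors.
TIME_PATTERNS = [
    re.compile(r"\b(?:19|20)\d{2}\b"),
    re.compile(
        r"\blast\s+(?:\d+|one|two|three|six|twelve)\s+(?:days?|weeks?|months?|years?)\b",
        re.IGNORECASE,
    ),
    re.compile(r"\blast\s+(?:week|month|year)\b", re.IGNORECASE),
]


def classify_keywords(text):
    """Map each vocabulary keyword found in the text to its entity type."""
    lowered = text.lower()
    return {kw: etype for kw, etype in SCHEMA_VOCAB.items() if kw in lowered}


def find_time_descriptors(text):
    """Return the time descriptors in order of appearance."""
    found = []
    for rx in TIME_PATTERNS:
        found.extend((m.start(), m.group(0)) for m in rx.finditer(text))
    return [descriptor for _, descriptor in sorted(found)]
```

Substring lookup is the simplest possible keyword matcher; a real system would need tokenization and synonym handling, and the time descriptors would still have to be resolved to concrete date ranges before they can appear in a where clause.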

In one embodiment, after the determining whether the target character set satisfies the preset condition for generating the target SQL statement, the method further includes:

When the target character set corresponding to the user's first voice data cannot yield a complete SQL statement, a target story line corresponding to the intention information is determined from a pre-constructed story line set, and the target characters corresponding to the first voice data are identified according to the target story line. For example, if the user's first voice data is "How has the overdue situation been recently?", the target story line can be determined to be the overdue story line rather than the bad-rate story line.

Further, the determining a target story line corresponding to the intention information from a set of pre-constructed story lines includes:

Users often cannot express their requirements clearly and may raise vague query questions, from which a target character set sufficient to generate an SQL statement cannot be obtained directly. Through the story line, the user is repeatedly questioned and prompted, and is guided from the vague query voice to a correct query voice, which improves the efficiency of the user's data query. For example:

User: How has the overdue rate been over the last six months?

Multi-turn conversation assistant: Do you want to ask about the overdue rate of different products or the overdue rate of different agencies?

User: Different products.

Multi-turn conversation assistant: Here are the overdue rates of the different products for the last six months.
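The dialogue above can be sketched as a loop that asks the story line's question until the slots are complete. The story line structure (one question, one slot, a few answer keywords), the dictionary representation of the target character set and all names are hypothetical simplifications; text strings stand in for recognized voice data.

```python
# Hypothetical story line set, keyed by the indicator (rule) the user asked about.
STORY_LINES = {
    "overdue rate": {
        "question": (
            "Do you want to ask about the overdue rate of different products "
            "or the overdue rate of different agencies?"
        ),
        "slot": "dimension",
        "answers": {"product": "product", "agenc": "agency"},
    },
}


def fill_from_reply(target_set, story, reply):
    """Add the slot value named in the reply, if the reply mentions one."""
    for keyword, value in story["answers"].items():
        if keyword in reply.lower():
            return {**target_set, story["slot"]: value}
    return target_set


def run_dialogue(target_set, replies, is_complete):
    """Ask the story line's question until the set is complete or replies run out."""
    transcript = []
    story = STORY_LINES.get(target_set.get("rule"))
    for reply in replies:
        if story is None or is_complete(target_set):
            break
        transcript.append(("assistant", story["question"]))
        transcript.append(("user", reply))
        target_set = fill_from_reply(target_set, story, reply)
    return target_set, transcript
```

The `is_complete` callable is where the preset condition for generating the SQL statement would be re-evaluated after each turn.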

Step S30: and when the target character set meets the preset condition, generating a target SQL statement based on the target character set, querying a preset database according to the target SQL statement to obtain target data, and feeding the target data back to the user.

In this embodiment, when the condition for generating the SQL statement is satisfied, that is, when the target character set contains content corresponding to the select, where and group by parts of the SQL statement, the content corresponding to the characters of the target character set is filled into the relevant positions of a preset SQL template to generate an executable target SQL statement. For example, content of the rule type is filled into the select position of the SQL template, content of the time type into the where position, and content of the dimension type into the group by position. After the target SQL statement is generated, the preset database is queried according to it to obtain a target report, and the target report is then fed back to the user. The preset database may be a local database of a financial institution or a third-party database. For example:

User: What were the overdue rates of the products over the last six months?

Multi-turn conversation assistant: Here are the overdue rates of the different products for the last six months. (A report is output at the same time.)
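Filling a preset SQL template from the slots and running the result can be sketched with Python's built-in `sqlite3` module. The `loans` table, its columns, the lookup tables and the `time_start` key (an already resolved start month for the time descriptor) are all assumptions made for illustration. Recognized entities are mapped to SQL fragments through fixed lookup tables and the time value is passed as a bound parameter, so user text is never pasted into the statement.

```python
import sqlite3

# Hypothetical lookup tables from recognized entities to SQL fragments.
RULE_SQL = {"overdue rate": "AVG(is_overdue)"}
DIMENSION_COLUMNS = {"product": "product", "agency": "agency"}

SQL_TEMPLATE = (
    "SELECT {dim}, {agg} AS value FROM {table} "
    "WHERE month >= ? GROUP BY {dim} ORDER BY {dim}"
)


def build_sql(target_set, table="loans"):
    """Fill the select, where and group by positions of the template."""
    agg = RULE_SQL[target_set["rule"]]                 # rule type -> select
    dim = DIMENSION_COLUMNS[target_set["dimension"]]   # dimension type -> group by
    sql = SQL_TEMPLATE.format(dim=dim, agg=agg, table=table)
    return sql, (target_set["time_start"],)            # time type -> where


def demo():
    """Run the generated statement against a small in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE loans (product TEXT, agency TEXT, month TEXT, is_overdue INTEGER)"
    )
    conn.executemany(
        "INSERT INTO loans VALUES (?, ?, ?, ?)",
        [
            ("A", "x", "2020-06", 1),
            ("A", "x", "2020-07", 0),
            ("B", "y", "2020-08", 0),
            ("B", "y", "2020-01", 1),  # outside the queried period
        ],
    )
    sql, params = build_sql(
        {"rule": "overdue rate", "dimension": "product", "time_start": "2020-05"}
    )
    rows = conn.execute(sql, params).fetchall()
    conn.close()
    return rows
```

Because an unknown rule or dimension raises a `KeyError` in the lookup, only fragments that were written in advance can ever reach the database.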

Referring to fig. 2, a functional module diagram of the data query apparatus 100 based on semantic recognition according to the present invention is shown.

The data query device 100 based on semantic recognition according to the present invention can be installed in an electronic device. According to the implemented functions, the data query device 100 based on semantic recognition may include a recognition module 110, a judgment module 120 and a query module 130. A module according to the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.

In the present embodiment, the functions regarding the respective modules/units are as follows:

the identification module 110 is configured to receive a request for data query sent by a user, acquire first voice data carried in the request, and identify an intention result of the first voice data based on a pre-constructed intention template.

Further, the identification module is further configured to:

In one embodiment, the identification module is further configured to:

The determining module 120 is configured to obtain a target character set corresponding to the intention result, and determine whether the target character set meets a preset condition for generating a target SQL statement.

In this embodiment, the target character set corresponding to the intention result is obtained, and it is judged whether the target character set meets the preset condition for generating the SQL statement. The preset condition may be that the characters of the target character set correspond to the contents of the select, where and group by parts of the SQL statement. The purpose of identifying the intention result is to fill in the SQL step by step by slot filling until it becomes an executable SQL statement. The SQL statement consists of select, from, where, group by, order by and similar parts, and the slots are the entity types described above. Entities of the measure and rule types correspond to the aggregation-function part of select; the dimension type corresponds to a specific select column and to the group by part; and the filter, time and number types correspond to the where filter-condition part.

In one embodiment, the determining module is further configured to:

The query module 130 is configured to generate a target SQL statement based on the target character set when the target character set meets the preset condition, query a preset database according to the target SQL statement to obtain target data, and feed the target data back to the user.
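The division into three modules can be sketched as a small class that chains three callables. This only illustrates the control flow between modules 110, 120 and 130; the class name `QueryDevice`, the signatures and the prompt texts are hypothetical, and a text string stands in for the first voice data.

```python
from dataclasses import dataclass
from typing import Callable, Optional

RETRY_PROMPT = "Please re-enter the query information by voice."
MORE_INFO_PROMPT = "More information is needed to complete the query."


@dataclass
class QueryDevice:
    # Identification module 110: input -> intention result (None if unrecognized).
    recognize: Callable[[str], Optional[str]]
    # Judging module 120: (input, intention) -> target set, or None if the
    # preset condition for generating the SQL statement is not met.
    judge: Callable[[str, str], Optional[dict]]
    # Query module 130: target set -> target data.
    query: Callable[[dict], list]

    def handle(self, text):
        intent = self.recognize(text)
        if intent is None:
            return RETRY_PROMPT
        target_set = self.judge(text, intent)
        if target_set is None:
            return MORE_INFO_PROMPT
        return self.query(target_set)
```

Passing the modules in as callables keeps each one replaceable, for example swapping a template matcher for a trained model without touching the other two.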

Fig. 3 is a schematic diagram of an electronic device 1 according to a preferred embodiment of the invention.

The electronic device 1 includes but is not limited to: a memory 11, a processor 12, a display 13, and a network interface 14. The electronic device 1 is connected to a network through the network interface 14 to obtain raw data. The network may be a wireless or wired network such as an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, Wi-Fi, or another communication network.

The memory 11 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, such as a hard disk or a memory of the electronic device 1. In other embodiments, the memory 11 may also be an external storage device of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card provided on the electronic device 1. Of course, the memory 11 may also comprise both an internal storage unit and an external storage device of the electronic device 1. In this embodiment, the memory 11 is generally used for storing the operating system installed in the electronic device 1 and various types of application software, such as the program code of the data query program 10 based on semantic recognition. Further, the memory 11 may also be used to temporarily store various types of data that have been output or are to be output.

Processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 12 is typically used for controlling the overall operation of the electronic device 1, such as performing data interaction or communication related control and processing. In this embodiment, the processor 12 is configured to execute the program code stored in the memory 11 or process data, for example, execute the program code of the data query program 10 based on semantic recognition.

The display 13 may be referred to as a display screen or display unit. In some embodiments, the display 13 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an Organic Light-Emitting Diode (OLED) touch screen, or the like. The display 13 is used for displaying information processed in the electronic device 1 and for displaying a visual work interface, e.g. displaying the results of data statistics.

The network interface 14 may optionally comprise a standard wired interface and a wireless interface (e.g. a Wi-Fi interface). The network interface 14 is typically used for establishing a communication connection between the electronic device 1 and other electronic devices.

Fig. 3 only shows the electronic device 1 with components 11-14 and the semantic recognition based data query program 10, but it is to be understood that not all of the shown components are required, and that more or fewer components may be implemented instead.

Optionally, the electronic device 1 may further comprise a user interface. The user interface may comprise a display and an input unit such as a keyboard, and may optionally further comprise a standard wired interface and a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an Organic Light-Emitting Diode (OLED) touch screen, or the like. The display, which may also be referred to as a display screen or display unit, is used for displaying information processed in the electronic device 1 and for displaying a visualized user interface.

The electronic device 1 may further include a Radio Frequency (RF) circuit, a sensor, an audio circuit, and the like, which are not described in detail herein.

In the above embodiment, the processor 12 may implement the following steps when executing the data query program 10 based on semantic recognition stored in the memory 11:

The storage device may be the memory 11 of the electronic device 1, or may be another storage device communicatively connected to the electronic device 1.

For a detailed description of the above steps, please refer to the above description of fig. 2 regarding a functional block diagram of an embodiment of the data query apparatus 100 based on semantic recognition and fig. 1 regarding a flowchart of an embodiment of a data query method based on semantic recognition.

In addition, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium may be non-volatile or volatile. The computer readable storage medium may be any one or any combination of hard disks, multimedia cards, SD cards, flash memory cards, SMCs, Read Only Memories (ROMs), Erasable Programmable Read Only Memories (EPROMs), portable compact disc read only memories (CD-ROMs), USB memories, etc. The computer readable storage medium comprises a storage data area and a storage program area, the storage data area stores data created according to the use of the blockchain nodes, the storage program area stores a data query program 10 based on semantic recognition, and when being executed by a processor, the data query program 10 based on semantic recognition realizes the following operations:

The specific implementation of the computer-readable storage medium of the present invention is substantially the same as the above-mentioned data query method based on semantic recognition, and will not be described herein again.

In another embodiment, in order to further ensure the privacy and security of all the data involved, all the data may be stored in a node of a blockchain. For example, the first voice data and the target data may be stored in blockchain nodes.

It should be noted that the blockchain in the present invention is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains the information of a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
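The idea of data blocks linked by cryptographic methods can be illustrated with a minimal hash chain. This is a teaching sketch only, with hypothetical function names and record contents; it has no consensus mechanism, networking or distribution, and does not represent any particular blockchain platform.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def make_block(prev_hash, records):
    """Create a block whose hash covers its records and the previous hash."""
    body = json.dumps({"prev": prev_hash, "records": records}, sort_keys=True)
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {"prev": prev_hash, "records": records, "hash": digest}


def append_block(chain, records):
    """Append a new block linked to the last block of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    chain.append(make_block(prev_hash, records))
    return chain


def verify(chain):
    """Check every block's own hash and its link to the preceding block."""
    prev_hash = GENESIS_PREV
    for block in chain:
        expected = make_block(block["prev"], block["records"])["hash"]
        if block["prev"] != prev_hash or block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True
```

Changing a stored record, such as a saved query result, alters that block's recomputed hash, so `verify` fails; this is the anti-counterfeiting property the paragraph above refers to.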

It should be noted that the serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, apparatus, article, or method that includes the element.

From the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disk) and includes several instructions for enabling a terminal device (such as a mobile phone, a computer, an electronic device, or a network device) to execute the methods according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention and is not intended to limit the scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of this specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, is likewise included in the scope of protection of the present invention.

Claims

1. A data query method based on semantic recognition is characterized by comprising the following steps:

2. The semantic recognition based data query method of claim 1, wherein the recognizing the intention result of the first voice data based on the pre-constructed intention template comprises:

3. The semantic recognition based data query method of claim 2, wherein the determining whether the intent template associated with the semantic result is matched comprises:

4. The semantic recognition based data query method according to claim 3, wherein after the separately calculating the similarity value of the semantic result and the intention information corresponding to each intention template, the method further comprises:

5. The semantic recognition based data query method according to claim 1, wherein after the determining whether the target character set satisfies a preset condition for generating a target SQL statement, the method further comprises:

6. The semantic recognition-based data query method of claim 5, wherein the determining a target storyline corresponding to the intention information from a set of pre-constructed storylines comprises:

7. The semantic recognition based data query method of any one of claims 1 to 6, wherein before the receiving of the data query request sent by the user, the method further comprises:

8. A data query device based on semantic recognition, the device comprising:

9. An electronic device, characterized in that the electronic device comprises:

at least one processor; and a memory, wherein

the memory stores a program executable by the at least one processor to enable the at least one processor to perform the semantic recognition based data query method of any one of claims 1 to 7.

10. A computer-readable storage medium, wherein the computer-readable storage medium stores a data query program based on semantic recognition, and when the data query program based on semantic recognition is executed by a processor, the steps of the data query method based on semantic recognition according to any one of claims 1 to 7 are implemented.