WO2021120631A1

WO2021120631A1 - Intelligent interaction method and apparatus, and electronic device and storage medium

Info

Publication number: WO2021120631A1
Application number: PCT/CN2020/105636
Authority: WO
Inventors: 刘璐; 臧磊
Original assignee: 深圳壹账通智能科技有限公司
Priority date: 2019-12-19
Filing date: 2020-07-29
Publication date: 2021-06-24
Also published as: CN111223485A

Abstract

An intelligent interaction method and apparatus, and an electronic device and a storage medium. The intelligent interaction method comprises: an intelligent voice assistant acquiring sound information of a user (S1); verifying the identity of the user according to the sound information (S2); after the identity verification of the user is passed, the intelligent voice assistant starting an open domain dialogue, and identifying a user intention according to the open domain dialogue (S3); determining a service level according to the user intention (S4); conducting a closed domain dialogue according to the service level, and identifying key information in the closed domain dialogue (S5); acquiring a slot position value according to the key information, and filling a slot position (S6); and when the filled slot position meets a threshold value, executing an operation corresponding to the user intention (S7). In the method, a secure dialogue with a user can be conducted by means of an intelligent voice assistant, and an operation is executed after a dialogue intention is identified.

Description

Intelligent interaction method, device, electronic equipment and storage medium

This application claims priority to Chinese patent application No. 201911319401.2, filed with the Chinese Patent Office on December 19, 2019 and entitled "Intelligent Interaction Method, Device, Electronic Equipment and Storage Medium", the entire content of which is incorporated herein by reference.

Technical field

This application relates to the field of computer technology, in particular to an intelligent interaction method, device, electronic equipment and storage medium.

Background

With the development of the artificial intelligence industry, intelligent voice assistants have become a relatively mature application of artificial intelligence systems. In the prior art, intelligent voice assistants are usually applied to mobile terminals. Users can use the voice assistant function of the mobile terminal to interact with the machine assistant, so that the machine assistant performs various operations on the mobile terminal under the user's voice control. However, the inventor has realized that the intent recognition accuracy of existing intelligent voice assistants is low, which makes human-computer interaction less fluent.

Summary of the invention

In view of the above, it is necessary to provide an intelligent interaction method, device, electronic equipment, and storage medium that can conduct a secure dialogue with the user through an intelligent voice assistant and perform an operation after accurately identifying the dialogue intention.

The first aspect of the present application provides an intelligent interaction method, wherein the intelligent interaction method includes:

The intelligent voice assistant obtains the user's voice information;

Verify the user identity according to the voice information;

After the user's identity is verified, the intelligent voice assistant starts an open domain dialogue, and recognizes the user's intention according to the open domain dialogue;

Determine the service level according to the user's intention;

Conduct a closed domain dialogue according to the service level, and identify key information in the closed domain dialogue;

Obtain the slot value according to the key information and fill the slot; and

When the filled slot meets the threshold, the operation corresponding to the user's intention is performed.

A second aspect of the present application provides an intelligent interaction device, wherein the intelligent interaction device includes:

The acquisition module is used to acquire the user's voice information through the intelligent voice assistant;

The verification module is used to verify the user's identity according to the voice information;

The identification module is used to start an open domain dialogue through the intelligent voice assistant after the user's identity is verified, and to identify the user's intention according to the open domain dialogue;

The determination module is used to determine the service level according to the user's intention;

The identification module is further configured to conduct a closed domain dialogue according to the service level and identify key information in the closed domain dialogue;

The acquisition module is further configured to obtain the slot value according to the key information and fill the slot; and

The execution module is used to execute the operation corresponding to the user's intention when the filled slot meets the threshold.

A third aspect of the present application provides an electronic device, wherein the electronic device includes a processor, and the processor is configured to execute computer-readable instructions stored in a memory to implement the following steps:

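The intelligent voice assistant obtains the user's voice information;

Verify the user identity according to the voice information;

After the user's identity is verified, the intelligent voice assistant starts an open domain dialogue, and recognizes the user's intention according to the open domain dialogue;

Determine the service level according to the user's intention;

Conduct a closed domain dialogue according to the service level, and identify key information in the closed domain dialogue;

Obtain the slot value according to the key information and fill the slot; and

When the filled slot meets the threshold, the operation corresponding to the user's intention is performed.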

A fourth aspect of the present application provides a computer-readable storage medium having computer-readable instructions stored thereon, wherein the computer-readable instructions implement the following steps when executed by a processor:

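The intelligent voice assistant obtains the user's voice information;

Verify the user identity according to the voice information;

After the user's identity is verified, the intelligent voice assistant starts an open domain dialogue, and recognizes the user's intention according to the open domain dialogue;

Determine the service level according to the user's intention;

Conduct a closed domain dialogue according to the service level, and identify key information in the closed domain dialogue;

Obtain the slot value according to the key information and fill the slot; and

When the filled slot meets the threshold, the operation corresponding to the user's intention is performed.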

In summary, the intelligent interaction method, device, electronic equipment and storage medium described in this application belong to the field of artificial intelligence. The intelligent voice assistant starts an open domain dialogue after the user's identity is verified, recognizes the user's intention according to the open domain dialogue, determines the service level according to the user's intention, conducts a closed domain dialogue according to the service level, identifies key information in the closed domain dialogue, obtains the slot value according to the key information and fills the slot, and, when the filled slot meets the threshold, performs the operation corresponding to the user's intention. This application can accurately identify the user's intention and, after entering the closed domain dialogue, enter the first-level service interface according to the user's intention and conduct question-and-answer communication within that interface, so that tasks are performed more intelligently and human-computer communication is more interactive.

In addition, the present application can handle the situation where the voice command corresponding to the user's intention includes multiple parallel services or multiple services of different levels, and can guide the user when the user's request is unclear, until the entire closed-loop operation is completed.

Description of the drawings

Fig. 1 is a flowchart of an intelligent interaction method provided in Embodiment 1 of the present application.

Fig. 2 is a functional module diagram of the intelligent interaction device provided in the second embodiment of the present application.

Fig. 3 is a schematic diagram of an electronic device provided in a third embodiment of the present application.

The following specific embodiments will further illustrate this application in conjunction with the above-mentioned drawings.

Detailed description

In order to be able to understand the above objectives, features and advantages of the application more clearly, the application will be described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments of the application and the features in the embodiments can be combined with each other if there is no conflict.

In the following description, many specific details are set forth in order to fully understand the present application. The described embodiments are only a part of the embodiments of the present application, rather than all the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of this application. The terms used in the specification of the application herein are only for the purpose of describing specific embodiments, and are not intended to limit the application.

Embodiment 1

In this embodiment, the intelligent interaction method can be applied to electronic equipment. For electronic devices that require intelligent interaction, the intelligent interaction function provided by the method of the present application can be directly integrated on the electronic device, or a client for implementing the method of the present application can be installed. As another example, the method provided in this application can also run on servers and other devices in the form of a Software Development Kit (SDK), which provides an interface for the intelligent interaction function. Electronic devices or other devices can then realize the intelligent interaction function through the provided interface.

Fig. 1 shows the flowchart of the intelligent interaction method. According to different requirements, the execution sequence in the flowchart can be changed, and some steps can be omitted.

Step S1, the intelligent voice assistant obtains the user's voice information.

In this embodiment, the intelligent interaction method is applied to an intelligent voice assistant, and the intelligent voice assistant may be a bank intelligent voice assistant. When the user handles related business in the bank, he can directly interact with the bank's intelligent voice assistant. The intelligent voice assistant receives the voice of the user through a microphone, so that the user can be identified and the banking service can be processed according to the user's intention.

For example, when the user needs to perform an operation of querying the account balance, the smart voice assistant can be awakened first, and the user's voice information can be obtained when the smart voice assistant is awakened.

Step S2, verify the user's identity according to the voice information.

In this embodiment, the voiceprint feature in the voice information is extracted; the extracted voiceprint feature is matched against the pre-built voiceprint model; when the extracted voiceprint feature matches the pre-built voiceprint model, it is confirmed that the user identity verification is passed; when the extracted voiceprint feature does not match the pre-built voiceprint model, it is confirmed that the user identity verification is not passed.

Specifically, verifying the user's identity according to the voice information includes two stages. In the voiceprint registration stage, the user's voice sample is input into the system, the Mel frequency cepstral coefficients (MFCC) of the user's voice information are extracted, and the resnet+ghostvlad network is then trained end to end to obtain the voiceprint features in the user's voice information and build the user's voiceprint model. In the voiceprint authentication stage, when the user wakes up the intelligent voice assistant with a wake-up word in the near field, the intelligent voice assistant acquires the user's voice information. The voiceprint feature in the voice information is extracted, and the extracted voiceprint feature is matched against the constructed voiceprint model to verify the user's identity. When the extracted voiceprint feature matches the constructed voiceprint model, it is confirmed that the user is a legitimate user; when the extracted voiceprint feature does not match the constructed voiceprint model, it is confirmed that the user is not a legitimate user.

In another embodiment, in the voiceprint registration stage, the user's voice samples can be input into the system, the frequency spectrum of the user's voice signal can be extracted through a short-time Fourier transform, and the resnet+ghostvlad network is then trained end to end to obtain the voiceprint features in the user's voice signal and build the user's voiceprint model.

For example, when the user provides a wake-up word and the ten digits 0-9 as a voice sample in the near field, the voiceprint feature in the voice sample is extracted and the user's voiceprint model can be constructed during the voiceprint registration stage. In this way, during identity verification the intelligent voice assistant can determine whether the user is a legitimate user from the user's pronunciation of digits specified by the assistant. This effectively improves the accuracy of authentication, prevents fraud using a recording made in advance, and improves security.

Preferably, the method further includes: when the number of times the extracted voiceprint feature does not match the constructed voiceprint model is greater than or equal to a preset number of times (for example, 3 times), turning on the password verification function.
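The matching and fallback logic of step S2 can be illustrated with a minimal sketch. The source does not state how a voiceprint feature is compared with the voiceprint model; the sketch below assumes both are fixed-length embedding vectors compared by cosine similarity, and the names `MATCH_THRESHOLD`, `cosine_similarity` and `verify` are hypothetical. Only the limit of 3 failed attempts comes from the example in the text.

```python
import math

MATCH_THRESHOLD = 0.8    # assumed similarity cutoff, not specified in the source
MAX_FAILED_ATTEMPTS = 3  # the "preset number of times" example from the text


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify(embedding, enrolled, failed_attempts):
    """Return (status, updated_failed_attempts).

    status is "verified" on a match, "rejected" on a mismatch, and
    "password_required" once the mismatch count reaches the preset limit.
    """
    if cosine_similarity(embedding, enrolled) >= MATCH_THRESHOLD:
        return "verified", 0
    failed_attempts += 1
    if failed_attempts >= MAX_FAILED_ATTEMPTS:
        return "password_required", failed_attempts
    return "rejected", failed_attempts
```

In a real system the embedding would come from the trained voiceprint network, and the cutoff would be tuned on enrollment data rather than fixed by hand.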

Step S3: After the user's identity is verified, the intelligent voice assistant starts an open domain dialogue, and recognizes the user's intention according to the open domain dialogue.

In this embodiment, after the user's identity is verified, the intelligent voice assistant starts an open domain dialogue, converts the voice information in the open domain dialogue into text, and then performs intent recognition.

In this embodiment, a joint model of intent recognition and slot filling may be used to identify the user's intention in the open domain dialogue. Specifically, the joint model includes three layers. The first layer is a one-hot encoding of the question text. The second layer is a network structure combining a BLSTM and a CNN, which learns a shared representation of semantic information and intent information. The third layer is a CRF layer, which decodes the shared representation; a unified loss function is used to jointly learn the intent recognition task and the slot filling task. The joint model one-hot encodes the question in the open domain dialogue to obtain a sentence vector, inputs the sentence vector into the BLSTM to obtain a new sequence vector representation, processes that representation with the CNN to obtain a feature vector, and concatenates the feature vector with the sequence vector to obtain an output vector. The output vector is fed to the CRF layer, which jointly decodes the optimal tag sequence. The tag sequence associates each character wt in the question u with a BIO tag, where B, I and O denote the beginning of a slot (begin), the continuation of a slot (inside) and any other character (outside), respectively. The input sequence X is represented as w1, w2...wn, and the output tag sequence Y is represented as s1, s2...sn. For the joint model, an extra token is appended to the end of the input question and the intent label is appended to the end of the output tag sequence, giving a new input sequence and a new output sequence. The hidden state at the end of the model contains the latent semantic representation of the entire input question, so it can be used to identify the intent of the question.

In other embodiments, one or a combination of the following may be used to identify the user's intention in the open domain dialogue: an intent recognition method based on rule templates, an intent recognition method based on statistical feature classification, an intent recognition method based on word vectors, an intent recognition method based on convolutional neural networks, and so on. These are not described in detail here.
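The sequence layout used by the joint model can be sketched without any neural network. The code below only illustrates how an extra token and the intent label are appended to the input and output sequences, and how slot values are read back from BIO tags. It is a sketch under assumptions: it tags English words rather than characters, and the token `<EOS>`, the tag names and the function names are hypothetical.

```python
def build_joint_example(tokens, bio_tags, intent):
    """Append an end token to the input and the intent label to the output."""
    if len(tokens) != len(bio_tags):
        raise ValueError("tokens and bio_tags must have the same length")
    return tokens + ["<EOS>"], bio_tags + [intent]


def decode_slots(tokens, bio_tags):
    """Collect B-/I- spans into a {slot_name: text} dictionary."""
    slots = {}
    name, span = None, []
    for token, tag in zip(tokens, bio_tags):
        if tag.startswith("B-"):
            if name is not None:
                slots[name] = " ".join(span)
            name, span = tag[2:], [token]
        elif tag.startswith("I-") and name == tag[2:]:
            span.append(token)
        else:
            # "O" (or an inconsistent I- tag) closes the current span.
            if name is not None:
                slots[name] = " ".join(span)
            name, span = None, []
    if name is not None:
        slots[name] = " ".join(span)
    return slots


tokens = ["view", "my", "credit", "card", "consumption", "bill"]
tags = ["O", "O", "B-service", "I-service", "B-item", "I-item"]
x, y = build_joint_example(tokens, tags, "query_credit_card")
slots = decode_slots(tokens, tags)
```

Here `x` ends with the extra token and `y` ends with the intent label, so a sequence model trained on such pairs predicts the intent at the final position while tagging slots at the earlier positions.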

Step S4: Determine the service level according to the user's intention.

In this embodiment, the business level is determined by querying the pre-established association table of intention and business level. In the association table, the corresponding relationship between intent and business level can be established according to the business logic of the application field and the knowledge base of the field. For example, in the banking application field, the corresponding relationship between intent and business level can be established based on the business logic in the banking field and the knowledge base in the banking field.

For example, the first-level business includes credit card business, payment business and loan business. The secondary business corresponding to the credit card business includes consumption bills, repayment amount, repayment date, etc.; the secondary business corresponding to the payment business includes payment of electricity bills, gas bills and telephone bills; the secondary business corresponding to the loan business includes fast credit, cash credit and smart credit.
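The association table of step S4 can be pictured as two small lookup structures. The sketch below is illustrative only: the service names come from the example above, while the intent labels and the names `SERVICE_TREE`, `INTENT_TO_SERVICE`, `service_level` and `parent_of` are hypothetical.

```python
# First-level services mapped to their second-level services (from the example).
SERVICE_TREE = {
    "credit card": ["consumption bill", "repayment amount", "repayment date"],
    "payment": ["electricity bill", "gas bill", "telephone bill"],
    "loan": ["fast credit", "cash credit", "smart credit"],
}

# Hypothetical intent labels mapped to the first-level service they relate to.
INTENT_TO_SERVICE = {
    "query_credit_card": "credit card",
    "pay": "payment",
    "query_loan": "loan",
}


def service_level(name):
    """Return 1 for a first-level service, 2 for a second-level one, else None."""
    if name in SERVICE_TREE:
        return 1
    if any(name in children for children in SERVICE_TREE.values()):
        return 2
    return None


def parent_of(name):
    """Return the first-level service that contains a second-level service."""
    for parent, children in SERVICE_TREE.items():
        if name in children:
            return parent
    return None
```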

In this embodiment, the first-level service is strongly related to the user's intention.

Step S5: Conduct a closed domain dialogue according to the business level, and identify key information in the closed domain dialogue.

In this embodiment, the closed domain dialogue refers to a dialogue conducted to clarify the user's purpose (or called clarifying task details) after recognizing the user's intention. The key information is information extracted from lower-level services when in the closed domain dialogue. For example, if only the first-level service information is received, the voice assistant broadcasts information according to the second-level service information corresponding to the received first-level service information to prompt the user what the second-level service needs to be performed.

For example, when the voice assistant receives the information that the primary service is "credit card" but does not receive any secondary service information corresponding to the credit card, the voice assistant issues a voice prompt such as "Do you need to check the consumption bill?", "Do you need to check the repayment amount?" or "Do you need to check the repayment date?". When the user hears the voice prompt and replies, the intelligent voice assistant may determine the secondary service information according to the reply. In this way, the assistant can enter the first-level service interface according to the user's intention and conduct question-and-answer communication within that interface to obtain the secondary service the user needs, so that tasks are executed more intelligently.
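The prompting behaviour in this example can be sketched as follows. The prompt wording follows the text; the simple substring matching of the reply and the names `SECONDARY`, `clarification_prompts` and `resolve_secondary` are assumptions made for illustration, not the method of the application.

```python
# Second-level services for the first-level service used in the example.
SECONDARY = {
    "credit card": ["consumption bill", "repayment amount", "repayment date"],
}


def clarification_prompts(primary, secondary=None):
    """Return the prompts to broadcast when only the primary service is known."""
    if secondary is not None:
        return []  # nothing to clarify
    return [f"Do you need to check the {s}?" for s in SECONDARY.get(primary, [])]


def resolve_secondary(primary, reply):
    """Pick the second-level service mentioned in the user's reply, if any."""
    text = reply.lower()
    for option in SECONDARY.get(primary, []):
        if option in text:
            return option
    return None
```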

Step S6: Obtain the slot value according to the key information and fill the slot.

The slot filling refers to a process of completing information in order to transform the user's intention into a clear instruction of the user. In this embodiment, the slot value is acquired according to the key information, and then the slot is filled according to the slot value. For example, when the text information corresponding to the voice information collected by the voice assistant is "View the consumption bill in my credit card", the key information that can be obtained are: me, credit card, and consumption bill. Then the intelligent voice assistant will obtain the slot value according to the key information and fill the slot.

Step S7: When the filled slot meets the threshold, execute the operation corresponding to the user's intention.

In this embodiment, when the filled slot meets the threshold, the user's intention is converted into a voice instruction, and the intelligent voice assistant performs an operation according to the voice instruction. The threshold is related to the user's intention. For example, when the user intends to perform a transfer, two parameters are required, namely the transfer account number and the transfer amount, so the corresponding threshold is two filled slots. If either of the two slots is not filled, the operation corresponding to the user's intention cannot be performed.

For example, when the text information corresponding to the voice information collected by the intelligent voice assistant is "Check my credit card", it can be recognized that the user's intention concerns the credit card, and the primary business corresponding to the credit card is the credit card business. The intelligent voice assistant then enters the closed domain of the credit card business to conduct a dialogue, extracts the slot information for the slots, and calls the target interface. For example, it issues the voice prompt "Do you want to check the credit card consumption bill, the repayment amount, the repayment date, or whether a payment is overdue?". When the voice assistant receives the user's reply "repayment amount", it queries the user's credit card repayment amount and responds to the user according to the query result. For example, the voice assistant broadcasts "You should repay 2033 yuan this month".
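The threshold check of step S7 amounts to confirming that every slot required by the intent has a value. A minimal sketch, assuming the required slots per intent are kept in a table; the names `REQUIRED_SLOTS` and `meets_threshold`, the slot names and the sample account identifier are hypothetical.

```python
# Slots that must be filled before an intent can be executed.
# The transfer example (account number and amount) comes from the text.
REQUIRED_SLOTS = {
    "transfer": ("account_number", "amount"),
}


def meets_threshold(intent, slots):
    """True when every slot required by the intent has a non-empty value."""
    required = REQUIRED_SLOTS[intent]
    filled = [name for name in required if slots.get(name) not in (None, "")]
    return len(filled) == len(required)


complete = meets_threshold("transfer", {"account_number": "ACC-001", "amount": "500"})
partial = meets_threshold("transfer", {"amount": "500"})
```

With these inputs `complete` is true and `partial` is false, so only the first request would proceed to execution.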

In addition, the intelligent voice assistant can also directly call the target service interface to obtain information or perform operations. For example, when the user's voice message is "Check how much I need to repay my credit card this month", the repayment balance in the secondary service interface is directly called to obtain the balance information. Then carry out the voice broadcast "You should repay 2033 yuan this month".

Preferably, when the intelligent voice assistant performs the operation corresponding to the user's intention, a prompt message will be issued for the user to confirm. For example, the intelligent voice assistant will play the task voice to the user for confirmation before execution, and after receiving the user's confirmation information, perform the operation corresponding to the user's intention. And whether the intelligent voice assistant executes successfully or not, the result is fed back to the user.

Preferably, the intelligent voice assistant stores information authorized by the user, and performs corresponding operations according to the authorized information and the recognized user's intention. Specifically: receive user-authorized information and store it, where the authorized information includes account information (for example, a gas account); after the service level is determined according to the user's intention, conduct a closed domain dialogue according to the service level and identify the key information in the closed domain dialogue; obtain the slot value according to the authorized information and the key information and fill the slot; and when the filled slot meets the threshold, execute the operation corresponding to the user's intention.

For example, when the intelligent voice assistant stores information authorized by the user for helping family members pay gas bills, the assistant remembers the authorized information defined by the user and does not need to ask for it repeatedly. When the user says "help my mother-in-law pay the gas bill", the assistant recognizes the intention: pay. According to the user's intention, the secondary business is determined as: paying gas bills. The key information identified in the closed domain dialogue is "my mother-in-law" (that is, the person whose bill is to be paid) and "pay the gas bill". The assistant can find the mother-in-law's gas account from the stored information and pay the mother-in-law's gas bill directly in the application.
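One way to picture how stored authorized information supplements the key information from the dialogue is a lookup keyed by the payee and the service. This is a sketch under assumptions: the source does not describe the storage format, and the names `AUTHORIZED` and `fill_slots`, the slot names and the account identifier `GAS-0001` are hypothetical.

```python
# Hypothetical store of user-authorized account information.
AUTHORIZED = {
    ("mother-in-law", "gas bill"): "GAS-0001",
}


def fill_slots(key_info, authorized):
    """Merge dialogue key information with stored authorized account data."""
    slots = dict(key_info)  # copy so the caller's dictionary is not modified
    key = (slots.get("payee"), slots.get("service"))
    if "account" not in slots and key in authorized:
        slots["account"] = authorized[key]
    return slots


filled = fill_slots({"payee": "mother-in-law", "service": "gas bill"}, AUTHORIZED)
```

Because the account slot is filled from memory, the assistant does not have to ask the user for the gas account again.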

Preferably, when the voice command corresponding to the user's intention includes multiple parallel services, the execution order of the multiple parallel services is determined according to the closed domain dialogue, and the corresponding operations are executed according to the execution order.

When the voice command corresponding to the user's intention includes two parallel services, it is necessary to clarify which service interface operation the user wants to perform first. For example, when the user says "Help me inquire about the credit card business and loan business", the intelligent voice assistant will prompt the user "Do you want to inquire about the credit card business or the loan business first?" After the user indicates that the credit card business should be queried first, the intelligent voice assistant first executes the credit card business inquiry, and then executes the loan business inquiry.
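The ordering of parallel services can be sketched as a small function that moves the service named first in the user's reply to the front. The substring matching and the name `order_parallel_services` are assumptions for illustration only.

```python
def order_parallel_services(services, reply):
    """Put the service the user names first in the reply at the front.

    If the reply names none of the services, the original order is kept.
    """
    text = reply.lower()
    mentioned = [s for s in services if s in text]
    if not mentioned:
        return list(services)
    first = min(mentioned, key=text.index)  # earliest mention wins
    return [first] + [s for s in services if s != first]


order = order_parallel_services(["credit card", "loan"], "Credit card first, please")
```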

Preferably, when the voice command corresponding to the user's intention includes multiple services of different levels, the user is prompted with the upper-level service corresponding to the lowest-level service among those services, and all the lower-level services included in that upper-level service are then presented for the user to choose from. Specifically, when the voice command corresponding to the user's intention includes multiple services of different levels, the lowest-level service among them is identified according to the intent and service level association table; the upper-level service corresponding to that lowest-level service is queried; and all the lower-level services included in the upper-level service are presented for the user to choose from.

For example, when the voice command corresponding to the user's intention includes two services, one at the upper level and one at the lower level, the upper-level service of the two is identified, a voice prompt is issued to the user to clarify the second-level services included under the upper-level service, and the user's business needs are confirmed through the closed domain dialogue. When the user says "Help me check the smart credit under the credit card business", the intelligent voice assistant will prompt the user "Are you inquiring about the smart credit under the loan business? There is no smart credit business under the credit card business". When the user confirms that the loan business is to be queried, the intelligent voice assistant then presents all the lower-level services (such as fast credit, smart credit, and cash credit) under the loan business for the user to choose from.
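The handling of a command that mixes service levels can be sketched as: find the lowest-level service, look up its actual upper-level service, and report any first-level service the user named that does not contain it. The service names come from the examples in the text; the function name `resolve_mixed_levels` and the result keys are hypothetical.

```python
SERVICE_TREE = {
    "credit card": ["consumption bill", "repayment amount", "repayment date"],
    "loan": ["fast credit", "cash credit", "smart credit"],
}


def resolve_mixed_levels(mentioned):
    """Resolve services of different levels named in one command.

    Returns None when no second-level service was mentioned.
    """
    lowest = [m for m in mentioned
              if any(m in children for children in SERVICE_TREE.values())]
    if not lowest:
        return None
    child = lowest[0]
    parent = next(p for p, children in SERVICE_TREE.items() if child in children)
    return {
        "lowest": child,
        "parent": parent,
        "options": SERVICE_TREE[parent],  # offered to the user to choose from
        # First-level services the user named that do not contain the child.
        "conflict": [m for m in mentioned if m in SERVICE_TREE and m != parent],
    }


result = resolve_mixed_levels(["credit card", "smart credit"])
```

For the "smart credit under the credit card business" example, `result` names the loan business as the parent and flags the credit card business as the conflicting service, which is the information the clarification prompt needs.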

Preferably, when the filled slot does not meet the threshold, the intelligent voice assistant issues a voice prompt according to the missing slot value in the slot; when there are multiple missing slot values, the intelligent voice assistant issues the voice prompts in sequence and fills in the missing slot values in order according to the user's replies; it then starts the task corresponding to the filled slot to perform the operation corresponding to the user's intention. In this way, when the filled slot does not meet the threshold, targeted questions can be asked based on the missing slot values. When multiple slots need clarification, the questions are asked in order to ensure that the user's real slot values are obtained, which makes it easier for the intelligent voice assistant to start the task corresponding to the slot.

For example, when the user says "I want to pay", it is not clear what fee is to be paid or for whom, so the filled slot does not meet the threshold. The intelligent voice assistant then issues voice prompts according to the missing slot values, for example "What fee should be paid?" and "Who is the payment for?". Because two slot values are missing, the assistant prompts in order. It first asks "What fee should be paid?"; on receiving the user's reply "pay the gas bill", it fills the gas bill into the missing slot value. It then asks "Who is the payment for?"; when the user replies "pay for my mother-in-law", it fills the mother-in-law into the missing slot value. Finally it starts the task corresponding to the filled slot (that is, paying the gas bill for the mother-in-law), finds the mother-in-law's gas account, and pays the mother-in-law's gas bill directly in the application.
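The sequential prompting described above can be sketched as a loop over the required slots that asks one question per missing value. The callback `ask` stands in for the voice prompt and speech recognition round trip; it, the slot names and the names `PROMPTS` and `fill_missing_slots` are hypothetical.

```python
# One prompt per slot, asked only when that slot is still empty.
PROMPTS = {
    "fee_type": "What fee should be paid?",
    "payee": "Who is the payment for?",
}


def fill_missing_slots(slots, required, ask):
    """Ask for each missing slot in order and return the completed slots."""
    filled = dict(slots)
    for name in required:
        if not filled.get(name):
            filled[name] = ask(PROMPTS[name])
    return filled


# Scripted replies stand in for the user's spoken answers.
replies = iter(["gas bill", "my mother-in-law"])
asked = []


def scripted_ask(prompt):
    asked.append(prompt)
    return next(replies)


result = fill_missing_slots({}, ["fee_type", "payee"], scripted_ask)
```

Slots that are already filled are skipped, so a request such as "pay the gas bill" would trigger only the second question.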

In summary, the intelligent interaction method provided by this application includes: the intelligent voice assistant obtains the user's voice information; verifies the user's identity according to the voice information; when the user's identity is verified, the intelligent voice assistant initiates an open domain dialogue and identifies the user's intention according to the open domain dialogue; determines the service level according to the user's intention; conducts a closed domain dialogue according to the service level, and identifies key information in the closed domain dialogue; obtains the slot value according to the key information and fills the slot; and when the filled slot meets the threshold, performs the operation corresponding to the user's intention. This application incorporates a voiceprint recognition system for the user. When the bank's intelligent voice assistant is awakened, it simultaneously obtains the user's voice information and determines the user's identity after extracting the voiceprint features in the voice. When the user controls the assistant by voice, only voice verification is required and no additional verification operations are needed, which simplifies the operation process and improves security. This application can accurately identify the user's intention and, after entering the closed domain dialogue, enter the first-level service interface according to the user's intention and conduct question-and-answer communication within that interface, so that tasks are performed more intelligently and human-computer communication is more interactive. In addition, the present application can handle the situation where the voice command corresponding to the user's intention includes multiple parallel services or multiple services of different levels, and can guide the user when the user's request is unclear, until the entire closed-loop operation is completed.

The above are only specific implementations of this application, but the scope of protection of this application is not limited thereto. Those of ordinary skill in the art can make improvements without departing from the creative concept of this application, and these improvements all fall within the scope of protection of this application.

The functional modules and hardware structure of the electronic device implementing the above-mentioned intelligent interaction method are respectively introduced below in conjunction with FIG. 2 and FIG. 3.

Embodiment 2

Fig. 2 is a diagram of functional modules in a preferred embodiment of the intelligent interaction device of this application.

In some embodiments, the intelligent interaction device 20 runs in an electronic device. The intelligent interaction device 20 may include multiple functional modules composed of program code segments. The program code of each program segment in the intelligent interaction device 20 may be stored in a memory and executed by at least one processor to perform intelligent interaction functions.

In this embodiment, the intelligent interaction device 20 can be divided into multiple functional modules according to the functions it performs. The functional modules may include: an acquisition module 201, a verification module 202, an identification module 203, a determination module 204, and an execution module 205. The module referred to in this application refers to a series of computer program segments that can be executed by at least one processor and can complete fixed functions, and are stored in a memory. In some embodiments, the functions of each module will be detailed in subsequent embodiments.

The acquisition module 201 is used to acquire user voice information through an intelligent voice assistant.

In this embodiment, the intelligent voice assistant may be a bank's intelligent voice assistant. When handling banking business, the user can interact directly with the bank's intelligent voice assistant. The intelligent voice assistant receives the user's voice through a microphone, so that the user can be identified and the banking service can be processed according to the user's intention.

The verification module 202 is configured to verify the user's identity according to the voice information.

In this embodiment, the verification module 202 is used to extract the voiceprint features in the voice information and match the extracted voiceprint features with a pre-built voiceprint model. When the extracted voiceprint features match the pre-built voiceprint model, it is confirmed that the user identity verification is passed; when the extracted voiceprint features do not match the pre-built voiceprint model, it is confirmed that the user identity verification is not passed.

Specifically, verifying the user's identity according to the voice information includes two stages. In the voiceprint registration stage, the user's voice sample is input into the system, the Mel frequency cepstral coefficients (MFCC) of the user's voice information are extracted, and a ResNet + GhostVLAD network is trained end to end to obtain the voiceprint features of the user's voice information and build the user's voiceprint model. In the voiceprint authentication stage, when the user wakes up the intelligent voice assistant through a wake-up word in the near field, the intelligent voice assistant obtains the user's voice information. The voiceprint features in the voice information are extracted and matched with the constructed voiceprint model to verify the user's identity. When the extracted voiceprint features match the constructed voiceprint model, it is confirmed that the user is a legitimate user; when the extracted voiceprint features do not match the constructed voiceprint model, it is confirmed that the user is not a legitimate user.
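The matching step of the authentication stage can be illustrated with a minimal sketch. The application does not state how a "match" is computed, so the cosine-similarity comparison, the 0.8 threshold, and the function names `cosine_similarity` and `verify_voiceprint` below are all assumptions made for illustration; the embeddings stand in for the voiceprint features that the trained network would produce.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no speaker information
    return dot / (norm_a * norm_b)


def verify_voiceprint(extracted, enrolled, threshold=0.8):
    """Return True when the extracted features match the enrolled model.

    Assumption: the voiceprint model is a single enrolled embedding and a
    match means cosine similarity at or above a hypothetical threshold.
    """
    return cosine_similarity(extracted, enrolled) >= threshold
```

In practice the threshold would be tuned on enrollment data to balance false accepts against false rejects; the value here is only a placeholder.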

Preferably, the intelligent interaction device may also: enable the password verification function when the number of times the extracted voiceprint feature does not match the constructed voiceprint model is greater than or equal to a preset number of times (for example, 3 times).
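The fallback described above can be sketched as a small counter. The class name `VoiceprintGate`, the returned status strings, and the choice to reset the counter after a successful match are assumptions for illustration; the application only specifies that password verification is enabled once the number of mismatches reaches the preset number.

```python
class VoiceprintGate:
    """Counts voiceprint mismatches and signals when to fall back to a password."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures  # preset number of times, e.g. 3
        self.failures = 0

    def record_attempt(self, matched):
        """Record one verification attempt and return the next action."""
        if matched:
            self.failures = 0  # assumption: a success clears earlier failures
            return "verified"
        self.failures += 1
        if self.failures >= self.max_failures:
            return "password_required"
        return "retry"
```

For example, three consecutive mismatches on a fresh gate yield "retry", "retry", and then "password_required".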

The identification module 203 is used to start, through the intelligent voice assistant, an open domain dialogue after the user's identity verification is passed, and to identify the user's intention according to the open domain dialogue.

In this embodiment, a joint model of intent recognition and slot filling may be used to identify the user's intention in the open domain dialogue. Specifically, the joint model includes three layers: the first layer performs one-hot encoding of the question text; the second layer is a network structure combining BLSTM and CNN, which learns a shared representation of semantic information and intent information; the third layer is a CRF layer, which decodes the shared representation, and a unified loss function is used to jointly learn the intent recognition task and the slot filling task. The joint model performs one-hot encoding of the question in the open domain dialogue to obtain a sentence vector, inputs the sentence vector into the BLSTM model to obtain a new sequence vector representation, processes that representation with the CNN model to obtain a feature vector, and concatenates the feature vector and the sequence vector to obtain an output vector. The output vector is fed to the CRF layer, which jointly decodes the optimal tag sequence; the tag sequence is represented by associating each character wt in the question u with a BIO tag, where B, I and O denote begin, inside and outside (other), respectively. The input sequence X is represented as w1, w2...wn, and the output label sequence Y is represented as s1, s2...sn. For the joint model, an extra label is added at the end of the input question, and the intent label is appended to the end of the output labels, giving a new input sequence and a new output label sequence. The hidden layer at the end of the model contains the latent semantic representation of the entire input question, so it can be used to identify the intent of the question.
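The output format of the joint model can be made concrete with a small decoding sketch. It covers only the last step (reading slots and the intent from a decoded label sequence), not the BLSTM, CNN or CRF layers. The function name `decode_joint_output`, the `<EOS>` marker, the slot names, and the use of whole words rather than characters are assumptions for illustration.

```python
def decode_joint_output(tokens, labels):
    """Split a decoded label sequence into (intent, slots).

    tokens: the question tokens followed by one extra end marker.
    labels: one BIO tag per question token, followed by the intent label.
    """
    if not tokens or len(tokens) != len(labels):
        raise ValueError("tokens and labels must be non-empty and equal in length")
    intent = labels[-1]  # intent label attached to the end of the output
    slots = {}
    name, words = None, []
    for token, tag in zip(tokens[:-1], labels[:-1]):
        if tag.startswith("B-"):  # a new slot begins
            if name is not None:
                slots[name] = " ".join(words)
            name, words = tag[2:], [token]
        elif tag.startswith("I-") and name == tag[2:]:  # current slot continues
            words.append(token)
        else:  # "O" or an inconsistent tag closes any open slot
            if name is not None:
                slots[name] = " ".join(words)
            name, words = None, []
    if name is not None:
        slots[name] = " ".join(words)
    return intent, slots
```

With the hypothetical tokens `["pay", "my", "mother-in-law", "gas", "bill", "<EOS>"]` and labels `["O", "B-payee", "I-payee", "B-bill_type", "I-bill_type", "pay_bill"]`, the function returns the intent `"pay_bill"` and the slots `{"payee": "my mother-in-law", "bill_type": "gas bill"}`.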

The determining module 204 is configured to determine the service level according to the user's intention.

The identification module 203 is also used to conduct a closed domain dialogue according to the service level and identify key information in the closed domain dialogue.

The acquisition module 201 is further configured to acquire the slot value according to the key information and fill the slot.

The execution module 205 is configured to execute the operation corresponding to the user's intention when the filled slot meets the threshold.
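The threshold check can be sketched as follows. The application does not define the threshold precisely, so this sketch assumes it is the fraction of required slots that have values, with 1.0 meaning every required slot must be filled; the names `missing_slots` and `ready_to_execute` and the example slot names are hypothetical.

```python
def missing_slots(required, filled):
    """Return the required slot names that have no value yet, in order."""
    return [name for name in required if not filled.get(name)]


def ready_to_execute(required, filled, threshold=1.0):
    """True when the share of filled required slots reaches the threshold.

    Assumption: the threshold is a ratio of filled slots to required slots.
    """
    if not required:
        return True
    filled_count = len(required) - len(missing_slots(required, filled))
    return filled_count / len(required) >= threshold


required = ["payee", "bill_type", "amount"]
filled = {"payee": "mother-in-law", "bill_type": "gas"}
# The assistant would prompt for each missing slot in order before executing.
to_prompt = missing_slots(required, filled)
```

Returning the missing slots in their declared order lets the assistant prompt for them one at a time, in line with the ordered voice prompts recited in the claims.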

For example, when the intelligent voice assistant stores information that the user has authorized for paying family members' gas bills, the assistant remembers that authorized information and does not need to ask for it repeatedly. When the user says "help my mother-in-law pay the gas bill," the intelligent voice assistant recognizes the intention: pay a bill. According to the user's intention, the second-level business is determined as: paying gas bills. The key information identified in the closed domain dialogue is "my mother-in-law" (that is, the person whose bill needs to be paid) and "pay the gas bill." The assistant can find the mother-in-law's gas account from the stored information and pay her gas bill directly in the application.
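The gas bill example can be sketched as a lookup that combines the key information from the dialogue with the stored authorized information. The table layout (keyed by payee and bill type), the function name `fill_slots`, and the account number are hypothetical and serve only to illustrate the idea.

```python
def fill_slots(key_info, authorized_accounts):
    """Fill slots from dialogue key information plus stored authorized data.

    key_info: slot values recognized in the closed domain dialogue.
    authorized_accounts: hypothetical store mapping (payee, bill_type)
    to an account number the user has authorized in advance.
    """
    slots = dict(key_info)  # copy so the caller's dictionary is not modified
    account = authorized_accounts.get((slots.get("payee"), slots.get("bill_type")))
    if account is not None:
        slots["account"] = account  # filled from memory, no follow-up question
    return slots


authorized = {("mother-in-law", "gas"): "GAS-0001"}  # made-up account number
slots = fill_slots({"payee": "mother-in-law", "bill_type": "gas"}, authorized)
```

When no authorized entry exists, the `account` slot stays empty, and the assistant would have to ask the user for it.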

In summary, the intelligent interaction device 20 provided by the present application includes an acquisition module 201, a verification module 202, an identification module 203, a determination module 204, and an execution module 205. The acquisition module 201 is used to obtain the user's voice information through the intelligent voice assistant; the verification module 202 is used to verify the user's identity according to the voice information; the identification module 203 is used to start, through the intelligent voice assistant, an open domain dialogue after the user's identity is verified, and to identify the user's intention according to the open domain dialogue; the determination module 204 is used to determine the service level according to the user's intention; the identification module 203 is also used to conduct a closed domain dialogue according to the service level and identify the key information in the closed domain dialogue; the acquisition module 201 is also used to acquire the slot value according to the key information and fill the slot; and the execution module 205 is used to perform the operation corresponding to the user's intention when the filled slot meets the threshold. This application incorporates voiceprint recognition of the user. When the bank's intelligent voice assistant is awakened, it simultaneously obtains the user's voice information, extracts the voiceprint features from the voice, and determines the user's identity. When the user controls operations by voice, only voice verification is required and no additional verification operations are needed, which simplifies the operation process and improves security. This application can accurately identify the user's intention; after entering the closed domain dialogue, it enters the first-level business interface according to the user's intention and conducts question-and-answer communication in that interface, so that tasks are performed more intelligently and human-computer communication is more interactive. In addition, the present application can handle the situation where the voice command corresponding to the user's intention includes multiple services of the same level or multiple services of different levels, and can guide the user when the user's request is unclear, until the entire closed-loop operation is completed.

The above-mentioned integrated unit implemented in the form of a software function module may be stored in a computer-readable storage medium. The software function module is stored in a storage medium and includes several instructions that cause a computer device (which can be a personal computer, a dual-screen device, a network device, etc.) or a processor to execute part of the methods of the various embodiments of this application.

FIG. 3 is a schematic diagram of the electronic device provided in the third embodiment of the application.

The electronic device 3 includes a memory 31, at least one processor 32, a computer program 33 stored in the memory 31 and running on the at least one processor 32, at least one communication bus 34 and a database 35.

When the at least one processor 32 executes the computer program 33, the steps in the foregoing embodiment of the intelligent interaction method are implemented.

Exemplarily, the computer program 33 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 31 and executed by the at least one processor 32 to implement this application. The one or more modules/units may be a series of computer-readable instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program 33 in the electronic device 3.

The electronic device 3 may be a mobile phone, a tablet computer, a personal digital assistant (Personal Digital Assistant, PDA) or another device on which applications are installed. Those skilled in the art can understand that FIG. 3 is only an example of the electronic device 3 and does not constitute a limitation on the electronic device 3. The electronic device 3 may include more or fewer components than those shown in the figure, combine certain components, or use different components. For example, the electronic device 3 may also include input and output devices, network access devices, buses, and so on.

The at least one processor 32 may be composed of integrated circuits, for example, a single packaged integrated circuit, or multiple packaged integrated circuits with the same function or different functions, including one or more central processing units (Central Processing Unit, CPU), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The at least one processor 32 is the control core (Control Unit) of the electronic device 3. It uses various interfaces and lines to connect the components of the entire electronic device 3, runs or executes the programs or modules stored in the memory 31, and calls data stored in the memory 31 to perform the various functions of the electronic device 3 and process data, for example, to perform the intelligent interaction function.

The memory 31 is used to store computer-readable instructions and various data, such as the intelligent interaction device 20 installed in the electronic device 3, and provides high-speed, automatic access to programs or data during the operation of the electronic device 3. The memory 31 includes volatile and non-volatile memory, such as random access memory (Random Access Memory, RAM), read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable Read-Only Memory, PROM), erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), one-time programmable read-only memory (One-time Programmable Read-Only Memory, OTPROM), electrically erasable programmable read-only memory (Electrically-Erasable Programmable Read-Only Memory, EEPROM), compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM) or other optical disk storage, magnetic disk storage, tape storage, or any other computer-readable storage medium that can be used to carry or store data. The computer-readable storage medium may be non-volatile or volatile.

The memory 31 stores program codes, and the at least one processor 32 can call the program codes stored in the memory 31 to perform related functions. For example, the modules described in FIG. 2 (acquisition module 201, verification module 202, identification module 203, determination module 204, and execution module 205) are program codes stored in the memory 31 and executed by the at least one processor 32, so as to realize the functions of the modules and achieve the purpose of intelligent interaction.

The acquisition module 201 is used to obtain the user's voice information through the intelligent voice assistant; the verification module 202 is used to verify the user's identity according to the voice information; the identification module 203 is used to start, through the intelligent voice assistant, an open domain dialogue after the user's identity is verified, and to identify the user's intention according to the open domain dialogue; the determination module 204 is used to determine the service level according to the user's intention; the identification module 203 is also used to conduct a closed domain dialogue according to the service level and identify the key information in the closed domain dialogue; the acquisition module 201 is also used to acquire the slot value according to the key information and fill the slot; and the execution module 205 is used to perform the operation corresponding to the user's intention when the filled slot meets the threshold.

The database (Database) 35 is a warehouse built on the electronic device 3 for organizing, storing and managing data according to a data structure. Databases are usually divided into three types: hierarchical database, network database and relational database. In this embodiment, the database 35 is used to store user voice information and the like.

In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. For example, the division of the modules is only a logical function division, and there may be other division methods in actual implementation.

The modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional modules in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional modules.

For those skilled in the art, it is obvious that the present application is not limited to the details of the foregoing exemplary embodiments, and the present application can be implemented in other specific forms without departing from the spirit or basic characteristics of the application. Therefore, from every point of view, the embodiments should be regarded as exemplary and non-limiting. The scope of this application is defined by the appended claims rather than the above description, and therefore all changes that fall within the meaning and scope of the equivalent elements of the claims are intended to be included in this application. Any reference signs in the claims should not be regarded as limiting the claims involved. In addition, it is obvious that the word "including" does not exclude other elements, and the singular does not exclude the plural. Multiple units or devices stated in the device claims can also be implemented by one unit or device through software or hardware. Words such as first and second are used to denote names and do not denote any specific order.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the application and not to limit them. Although the application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that modifications or equivalent replacements can be made to the technical solutions of the application without departing from the spirit and scope of the technical solutions of the present application.

Claims

An intelligent interaction method, wherein the intelligent interaction method includes:

The intelligent voice assistant obtains the user's voice information;

Verify the user identity according to the voice information;

After the user's identity is verified, the intelligent voice assistant starts an open domain dialogue, and recognizes the user's intention according to the open domain dialogue;

Determine the service level according to the user's intention;

Conducting a closed domain dialogue according to the service level, and identifying key information in the closed domain dialogue;

Obtain the slot value according to the key information and fill the slot; and

When the filled slot meets the threshold, the operation corresponding to the user's intention is performed.
The intelligent interaction method according to claim 1, wherein the step of verifying the user's identity according to the voice information comprises:

Extracting voiceprint features in the voice information;

Match the extracted voiceprint features with the pre-built voiceprint model;

When the extracted voiceprint features match the pre-built voiceprint model, confirm that the user identity verification is passed;

When the extracted voiceprint feature does not match the constructed voiceprint model, it is confirmed that the user identity verification fails.
The intelligent interaction method of claim 1, wherein the service level is determined by querying a pre-established association table of intent and service level, wherein the association table of intent and service level is the correspondence between intents and service levels established according to the business logic of the application field and the knowledge base of the application field.
The intelligent interaction method according to claim 1, wherein the method further comprises:

Receiving user-authorized information and storing the authorized information, where the authorized information includes account information;

After the service level is determined according to the user's intention, a closed domain dialogue is conducted according to the service level, and key information in the closed domain dialogue is identified;

Obtain the slot value and fill the slot according to the authorized information and the key information; and

When the filled slot meets the threshold, the operation corresponding to the user's intention is performed.
The intelligent interaction method of claim 1, wherein when the voice instruction corresponding to the user's intention includes multiple parallel services, the execution sequence of the multiple parallel services is determined according to the closed domain dialogue, and the corresponding operations are performed according to the execution sequence.
The intelligent interaction method of claim 3, wherein the method further comprises:

When the voice command corresponding to the user's intention includes multiple services of different levels, identifying the lowest level service among the multiple services of different levels according to the intent and service level association table;

Query the upper-level business corresponding to the lowest-level business;

All the lower-level services included in the upper-level service are given for the user to choose.
The intelligent interaction method according to claim 1, wherein the method further comprises:

When the filled slot does not meet the threshold, the intelligent voice assistant issues a voice prompt according to the missing slot value in the slot;

When there are multiple missing slot values, the intelligent voice assistant performs voice prompts in order, and fills in the missing slot values in order according to the user's reply;

The task corresponding to the filled slot is started to execute the operation corresponding to the user's intention.
An intelligent interactive device, wherein the intelligent interactive device includes:

The acquisition module is used to acquire the user's voice information through the intelligent voice assistant;

The verification module is used to verify the user's identity according to the voice information;

The identification module is used to start, through the intelligent voice assistant, an open domain dialogue after the user identity verification is passed, and to identify the user's intention according to the open domain dialogue;

The determining module is used to determine the service level according to the user's intention;

The identification module is further configured to conduct a closed domain dialogue according to the service level and identify key information in the closed domain dialogue;

The acquisition module is also used to obtain the slot value according to the key information and fill the slot; and

The execution module is used to execute the operation corresponding to the user's intention when the filled slot meets the threshold.
An electronic device, wherein the electronic device includes a processor, and the processor is configured to execute computer-readable instructions stored in a memory to implement the following steps:

The intelligent voice assistant obtains the user's voice information;

Verify the user identity according to the voice information;

After the user's identity is verified, the intelligent voice assistant starts an open domain dialogue, and recognizes the user's intention according to the open domain dialogue;

Determine the service level according to the user's intention;

Conducting a closed domain dialogue according to the service level, and identifying key information in the closed domain dialogue;

Obtain the slot value according to the key information and fill the slot; and

When the filled slot meets the threshold, the operation corresponding to the user's intention is performed.
The electronic device according to claim 9, wherein when the processor executes the computer-readable instructions to implement the verification of the user identity according to the voice information, it specifically comprises:

Extracting voiceprint features in the voice information;

Match the extracted voiceprint features with the pre-built voiceprint model;

When the extracted voiceprint features match the pre-built voiceprint model, confirm that the user identity verification is passed;

When the extracted voiceprint feature does not match the constructed voiceprint model, it is confirmed that the user identity verification fails.
The electronic device according to claim 9, wherein, when the processor executes the computer-readable instructions to implement the determination of the service level according to the user's intention, it specifically comprises:

The service level is determined by querying a pre-established association table of intent and service level, where the association table of intent and service level is the correspondence between intents and service levels established according to the business logic of the application field and the knowledge base of the application field.
The electronic device of claim 9, wherein the processor executing the computer-readable instructions is further used to implement the following steps:

Receiving user-authorized information and storing the authorized information, where the authorized information includes account information;

After the service level is determined according to the user's intention, a closed domain dialogue is conducted according to the service level, and key information in the closed domain dialogue is identified;

Obtain the slot value and fill the slot according to the authorized information and the key information; and

When the filled slot meets the threshold, the operation corresponding to the user's intention is performed.
The electronic device of claim 9, wherein the processor executes the computer-readable instructions to implement the operation corresponding to the user's intention when the filled slot meets a threshold, which specifically includes:

When the voice command corresponding to the user's intention includes multiple parallel services, the execution sequence of the multiple parallel services is determined according to the closed domain dialogue, and the corresponding operation is performed according to the execution sequence.
The electronic device of claim 11, wherein the processor executing the computer-readable instructions is further used to implement the following steps:

When the voice command corresponding to the user's intention includes multiple services of different levels, identifying the lowest level service among the multiple services of different levels according to the association table of intentions and service levels;

Query the upper-level business corresponding to the lowest-level business;

All the lower-level services included in the upper-level service are given for the user to choose.
The electronic device of claim 9, wherein the processor executing the computer-readable instructions is further used to implement the following steps:

When the filled slot does not meet the threshold, the intelligent voice assistant issues a voice prompt according to the missing slot value in the slot;

When there are multiple missing slot values, the intelligent voice assistant performs voice prompts in order, and fills in the missing slot values in order according to the user's reply;

The task corresponding to the filled slot is started to execute the operation corresponding to the user's intention.
A computer-readable storage medium having computer-readable instructions stored thereon, wherein the computer-readable instructions implement the following steps when executed by a processor:

The intelligent voice assistant obtains the user's voice information;

Verify the user identity according to the voice information;

After the user's identity is verified, the intelligent voice assistant starts an open domain dialogue, and recognizes the user's intention according to the open domain dialogue;

Determine the service level according to the user's intention;

Conducting a closed domain dialogue according to the service level, and identifying key information in the closed domain dialogue;

Obtain the slot value according to the key information and fill the slot; and

When the filled slot meets the threshold, the operation corresponding to the user's intention is performed.
The computer-readable storage medium according to claim 16, wherein, when the computer-readable instructions are executed by the processor to implement the verification of the user identity based on the voice information, it specifically comprises:

Extracting voiceprint features in the voice information;

Match the extracted voiceprint features with the pre-built voiceprint model;

When the extracted voiceprint features match the pre-built voiceprint model, confirm that the user identity verification is passed;

When the extracted voiceprint feature does not match the constructed voiceprint model, it is confirmed that the user identity verification fails.
The computer-readable storage medium according to claim 16, wherein, when the computer-readable instructions are executed by the processor to implement the determination of the service level according to the user's intention, it specifically comprises:

The service level is determined by querying a pre-established association table of intent and service level, where the association table of intent and service level is the correspondence between intents and service levels established according to the business logic of the application field and the knowledge base of the application field.
The computer-readable storage medium of claim 16, wherein the computer-readable instructions are executed by the processor to further implement the following steps:

Receiving user-authorized information and storing the authorized information, where the authorized information includes account information;

After the service level is determined according to the user's intention, a closed domain dialogue is conducted according to the service level, and key information in the closed domain dialogue is identified;

Obtain the slot value and fill the slot according to the authorized information and the key information; and

When the filled slot meets the threshold, the operation corresponding to the user's intention is performed.
The computer-readable storage medium of claim 18, wherein the computer-readable instructions are executed by the processor to further implement the following steps:

When the voice command corresponding to the user's intention includes multiple services of different levels, identifying the lowest level service among the multiple services of different levels according to the association table of intentions and service levels;

Query the upper-level business corresponding to the lowest-level business;

All the lower-level services included in the upper-level service are given for the user to choose.