CN111862966A - Intelligent voice interaction method and related device - Google Patents

Intelligent voice interaction method and related device

Info

Publication number
CN111862966A
Authority
CN
China
Prior art keywords
process engine
engine interface
calling
voice
information
Prior art date
Legal status
Pending
Application number
CN201910779585.4A
Other languages
Chinese (zh)
Inventor
李宽
熊彬
权圣
Current Assignee
Mashang Xiaofei Finance Co Ltd
Mashang Consumer Finance Co Ltd
Original Assignee
Mashang Xiaofei Finance Co Ltd
Priority date
Filing date
Publication date
Application filed by Mashang Xiaofei Finance Co Ltd
Priority to CN201910779585.4A
Publication of CN111862966A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/26: Speech to text systems

Abstract

The application discloses an intelligent voice interaction method and a related device. The intelligent voice interaction method comprises the following steps: calling a first process engine interface to acquire, through the first process engine interface, voice information input by a user; calling the first process engine interface to recognize the voice information through the first process engine and obtain identification information of the voice information; calling a second process engine interface to acquire, through the second process engine interface, the response event corresponding to the identification information; and executing the response event corresponding to the identification information. By this scheme, the maintainability and reusability of script files can be improved.

Description

Intelligent voice interaction method and related device
Technical Field
The present application relates to the field of intelligent voice technologies, and in particular, to an intelligent voice interaction method and a related device.
Background
Intelligent chat robots are widely used across industries, particularly the service industry, and have given rise to a variety of commercial and consumer products, including intelligent customer service, smart speakers, and entertainment products. As a more advanced form of the chat robot, the intelligent voice robot is increasingly favored by the industry for its more natural and convenient mode of voice interaction.
Since intelligent voice interaction mostly involves multi-turn conversation, script files written in programming languages such as Lua and Python are needed to manage the business flow of a dialog scenario. However, for complex dialog scenarios such as telecom-operator customer service and e-commerce customer service, the business flow is relatively complicated, which makes the script files large, bloated, and hard to maintain. In addition, because a single script file can hardly manage several different dialog scenarios, multiple script files must be set up to cope with them, which greatly reduces reusability and hinders maintenance and management. In view of this, improving the maintainability and reusability of script files has become an urgent problem.
Disclosure of Invention
The main technical problem addressed by the present application is to provide an intelligent voice interaction method and a related device that can improve the maintainability and reusability of script files.
In order to solve the above problem, a first aspect of the present application provides an intelligent voice interaction method, including calling a first process engine interface, and acquiring voice information input by a user through the first process engine interface; calling a first process engine interface, and identifying the voice information through a first process engine to obtain identification information of the voice information; calling a second process engine interface, and acquiring a corresponding response event of the identification information through the second process engine interface; and executing the response event corresponding to the identification information.
In order to solve the above problems, a second aspect of the present application provides an intelligent voice interaction apparatus, including an obtaining module, a recognition module, a matching module, and an execution module, where the obtaining module is configured to call a first process engine interface, and obtain voice information input by a user through the first process engine interface; the recognition module is used for calling a first process engine interface, recognizing the voice information through the first process engine and obtaining recognition information of the voice information; the matching module is used for calling a second process engine interface and acquiring a corresponding response event of the identification information through the second process engine interface; the execution module is used for executing the corresponding response event of the identification information.
In order to solve the above problem, a third aspect of the present application provides an intelligent voice interaction device, comprising a memory and a processor coupled to each other; the processor is configured to execute the program instructions stored in the memory to implement the intelligent voice interaction method of the first aspect.
In order to solve the above problem, a fourth aspect of the present application provides a storage device, which stores program instructions capable of being executed by a processor, where the program instructions are used to implement the intelligent voice interaction method of the first aspect.
According to the above scheme, a first process engine interface is called to acquire, through it, the voice information input by a user; the first process engine interface is called so that the first process engine recognizes the voice information and obtains its identification information; a second process engine interface is then called to acquire, through it, the response event corresponding to the identification information; and that response event is executed. Recognizing the user's voice input and determining the corresponding response event are thus delegated to the respective process engines, so the script file no longer needs to manage the business flow of the dialog scenario; it is responsible only for information input/output and for calling the various interfaces. The volume of the script file is therefore greatly reduced and its maintainability improved. Moreover, because the script file handles only information input/output and interface calls, its reusability across different dialog scenarios is greatly increased.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of an intelligent voice interaction method of the present application;
FIG. 2 is a schematic flow chart diagram illustrating another embodiment of an intelligent voice interaction method of the present application;
FIG. 3 is a schematic flow chart diagram illustrating a method for intelligent voice interaction in accordance with another embodiment of the present application;
FIG. 4 is a block diagram of an embodiment of an intelligent voice interaction system based on the intelligent voice interaction method of the present application;
FIG. 5 is a block diagram of an embodiment of an intelligent voice interaction apparatus;
FIG. 6 is a block diagram of an embodiment of the intelligent voice interaction device;
FIG. 7 is a block diagram of an embodiment of a memory device according to the present application.
Detailed Description
The following describes in detail the embodiments of the present application with reference to the drawings attached hereto.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, interfaces, techniques, etc. in order to provide a thorough understanding of the present application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association between related objects, indicating that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that A and B exist simultaneously, or that B exists alone. In addition, the character "/" herein generally indicates that the related objects before and after it are in an "or" relationship. Further, the term "plurality" herein means two or more.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating an embodiment of an intelligent voice interaction method according to the present application. Specifically, the method may include the steps of:
step S11: and calling a first process engine interface, and acquiring the voice information input by the user through the first process engine interface.
In this embodiment, the script file calls the first process engine interface, so that the first process engine acquires the voice information input by the user through the first process engine interface. In an implementation scenario, the first process engine includes an output interface and an input interface, where the input interface is used to obtain voice information input by a user, and the output interface, that is, the first process engine interface in this embodiment, is used to output the voice information to a script file communicatively connected to the first process engine, so that the script file obtains the voice information input by the user through the first process engine interface.
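The split between the input interface (user speech in) and the output interface (the first process engine interface, handing speech to the script file) can be sketched in Python, the scripting language the patent itself mentions. This is an illustrative sketch only; the class and method names are hypothetical, not taken from the patent.

```python
# Illustrative sketch: engine and method names are hypothetical.
class FirstProcessEngine:
    """Stands in for the speech-side process engine."""

    def __init__(self):
        self._buffer = []

    def push_audio(self, audio):
        # Input interface: voice information input by the user arrives here.
        self._buffer.append(audio)

    def get_voice_information(self):
        # Output interface (the "first process engine interface"): hands the
        # voice information to the script file.
        return self._buffer.pop(0) if self._buffer else None


def script_acquire_voice(engine):
    # The script file only calls the interface; it holds no business logic.
    return engine.get_voice_information()


engine = FirstProcessEngine()
engine.push_audio("raw-pcm-frames")
voice = script_acquire_voice(engine)
```

The key design point illustrated here is that the script file never touches the audio pipeline itself; it only pulls whatever the engine exposes through the interface.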
In an implementation scenario, if a specific dialog scenario involves only a few business flows, the first process engine may be omitted and the key information input by the user may be obtained directly through a preset interface, which is not specifically limited in this embodiment.
In an implementation scenario, in order to guide the user through the voice interaction and gradually understand the user's intention, a fourth process engine interface may be called before step S11 to obtain a welcome dialog and play it. A dialog is the spoken script used at each stage of the voice interaction. Taking telecom-operator customer service as an example, when the customer service call is connected, a welcome dialog is generally broadcast, such as "Dear customer, hello; for mobile services please … …; for broadband services please … …; for manual service please … …; to end the call, please hang up", in order to guide the user to state the user's needs.
Step S12: and calling a first process engine interface, and identifying the voice information through the first process engine to obtain the identification information of the voice information.
In this embodiment, the script file may further call the first process engine interface so that the first process engine passes, through that interface, the identification information obtained by recognizing the voice information to the script file. In one implementation scenario, the script file calls the first process engine interface so that the first process engine recognizes the voice information immediately after acquiring it and obtains the identification information. In another implementation scenario, the script file first acquires the voice information input by the user through the first process engine interface, then calls the interface again so that the first process engine recognizes the voice information and obtains the identification information, thereby improving the robustness of the system.
The identification information includes, but is not limited to: the text information obtained by transcribing the voice information input by the user, the user intention represented by that voice information, and the like, which is not specifically limited in this embodiment.
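Either form of identification information (a transcription, an intent label, or both) can be modeled as a small structure returned through the first process engine interface. The sketch below assumes recognition returns both; the keyword matching is purely illustrative and not how a real ASR or NLU engine works.

```python
def recognize(voice_information):
    # Toy stand-in for the first process engine's recognition step: it
    # returns the transcribed text plus a coarse intent label.
    text = voice_information.strip().lower()
    if "call charge" in text or "balance" in text:
        intent = "call_charge"
    elif "hang up" in text:
        intent = "hang_up"
    else:
        intent = "unknown"
    return {"text": text, "intent": intent}


info = recognize("I want to check the call charge")
```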
Step S13: and calling a second process engine interface, and acquiring a corresponding response event of the identification information through the second process engine interface.
In this embodiment, the script file further calls a second process engine interface so that the second process engine determines, according to the identification information passed in by the script file, the response event associated with that identification information and returns it to the script file through the second process engine interface. For example, in a telecom-operator customer service scenario, after the first process engine acquires the voice information "I want to check the call charge" input by a user, it recognizes that the user's intention is "call charge", i.e., the identification information is "call charge". The script file passes this identification information to the second process engine, which may then request from a server the specific call-charge information corresponding to the user's telephone number, such as the balance and the current month's consumption details, and finally return the call-charge information to the script file through the second process engine interface. Other voice interaction scenarios can be deduced by analogy and are not enumerated here.
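The call-charge example above can be sketched as a second process engine that queries a back-end server and wraps the result as a response event. Everything here (class names, the billing server and its `query_balance` method) is a hypothetical illustration, not an interface defined by the patent.

```python
class SecondProcessEngine:
    """Toy dialog manager: maps identification information to a response event."""

    def __init__(self, billing_server):
        self._server = billing_server  # hypothetical back end

    def get_response_event(self, identification_info):
        # The "second process engine interface": identification info in,
        # corresponding response event out.
        if identification_info == "call_charge":
            balance = self._server.query_balance()
            return {"type": "play", "payload": f"Your balance is {balance} yuan"}
        if identification_info == "hang_up":
            return {"type": "hangup"}
        return {"type": "play", "payload": "Sorry, could you repeat that?"}


class FakeBillingServer:
    def query_balance(self):
        return 42.5


event = SecondProcessEngine(FakeBillingServer()).get_response_event("call_charge")
```

Note that the script file never sees the billing server; only the second process engine talks to it, which is what keeps the script thin.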
Step S14: and executing the response event corresponding to the identification information.
Depending on the specific business flow, the corresponding response event may be a dialog related to the identification information, or another action such as hanging up. Continuing the operator customer service example above, the script file calls the second process engine interface and obtains "call-charge information" as the response event corresponding to the identification information, so the call-charge information can be played for the user to listen to. Alternatively, if the response event corresponding to the identification information "no other problem, I want to hang up" is "hang up", the voice interaction may be ended directly; this embodiment is not limited in this respect.
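Dispatching on the returned response event (play a dialog, or hang up) is the only decision the script file makes in step S14. A minimal sketch with hypothetical callback parameters:

```python
def execute_response_event(event, play, hang_up):
    # The script file merely dispatches on the event type chosen by the
    # second process engine; the business logic lives in the engine.
    if event["type"] == "play":
        play(event["payload"])
        return "played"
    if event["type"] == "hangup":
        hang_up()
        return "hung_up"
    raise ValueError(f"unknown response event: {event['type']}")


played = []
result = execute_response_event(
    {"type": "play", "payload": "Your balance is 42.5 yuan"},
    play=played.append,
    hang_up=lambda: None,
)
```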
According to the above scheme, a first process engine interface is called to acquire the voice information input by a user; the first process engine interface is called so that the first process engine recognizes the voice information and obtains its identification information; a second process engine interface is then called to acquire the response event corresponding to the identification information; and that response event is executed. Recognizing the user's voice input and determining the corresponding response event are thus delegated to the respective process engines, so the script file no longer needs to manage the business flow of the dialog scenario; it is responsible only for information input/output and for calling the various interfaces. The volume of the script file is therefore greatly reduced and its maintainability improved. Moreover, because the script file handles only information input/output and interface calls, its reusability across different dialog scenarios is greatly increased.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating an intelligent voice interaction method according to another embodiment of the present application. Fig. 2 is a schematic flowchart of an embodiment of the intelligent voice interaction method shown in fig. 1. The method comprises the following steps:
Step S21: and calling a first process engine interface, and acquiring the voice information input by the user through the first process engine interface.
Please refer to step S11.
Step S22: and calling a first process engine interface, and identifying the voice information through the first process engine to obtain text information converted from the voice information.
In this embodiment, the script file calls a first process engine interface to enable the first process engine to identify the acquired voice information, so that the voice information is converted into text information. In one implementation scenario, the script file may also invoke the first process engine interface to cause the first process engine to recognize the voice information, convert the voice information to text information, and further recognize the user's intent.
Please refer to step S12, which is not described herein again.
Step S23: and calling a second process engine interface, and acquiring the response dialog corresponding to the text information through the second process engine interface.
In this embodiment, the script file may further call the second process engine interface so that the second process engine determines the response dialog corresponding to the text information and returns it to the script file through the second process engine interface. Still taking operator customer service as an example: when the customer service call is connected, a welcome dialog is generally broadcast, such as "Dear customer, hello; for mobile services please … …; for broadband services please … …; for manual service please … …; to end the call, please hang up"; when no agent is currently available, "all agents are busy, please wait"; and when the conversation ends, "thank you for calling, please hang up". Broadcasting such dialogs guides the user toward solving the problem step by step, or provides the user with information.
Please refer to step S13 above.
Step S24: and calling a third process engine interface, and playing the corresponding response dialogs of the text information through the third process engine.
The script file calls a third process engine interface so that the third process engine plays the response dialog corresponding to the text information. In this embodiment, after the script file calls the second process engine interface to obtain the response dialog corresponding to the text information, it passes the response dialog through the third process engine interface so that the third process engine plays it.
In one implementation scenario, the script file may also call the third process engine interface so that the third process engine directly plays the audio corresponding to the response dialog. For example, if a specific voice interaction scenario involves only a few business flows, the dialogs used in those flows can be recorded as audio in advance and stored in the intelligent voice interaction apparatus; in step S23, the second process engine interface may be called to obtain the ID (identity document) of the response dialog corresponding to the text information, and in step S24 the audio is retrieved from the apparatus's memory according to that ID and played. Alternatively, if the scenario involves many business flows, the dialogs can be recorded as audio in advance and stored on a server communicatively connected to the apparatus; in step S23 the second process engine interface may be called to obtain the ID of the response dialog, and in step S24 the corresponding audio is requested from the server according to the ID and played once downloaded. This embodiment is not specifically limited in this respect.
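The ID-based retrieval just described (check local storage first, otherwise fetch from the server and cache the result) can be sketched as follows; the function and parameter names are illustrative assumptions, not from the patent.

```python
def fetch_response_audio(dialog_id, local_cache, download_from_server):
    # Pre-recorded response dialogs are keyed by ID: look in the local
    # store first, fall back to the server, and cache the download.
    audio = local_cache.get(dialog_id)
    if audio is None:
        audio = download_from_server(dialog_id)
        local_cache[dialog_id] = audio
    return audio


downloads = []

def fake_server(dialog_id):
    # Records each server round-trip so caching behavior is visible.
    downloads.append(dialog_id)
    return f"audio-bytes-for-{dialog_id}"


cache = {}
first = fetch_response_audio("welcome", cache, fake_server)
second = fetch_response_audio("welcome", cache, fake_server)  # served from cache
```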
Unlike the preceding embodiments, in this embodiment the script file calls the first process engine interface so that the first process engine recognizes the voice information input by the user and converts it into text information; it then calls the second process engine interface to acquire the response dialog corresponding to the text information, and calls the third process engine interface to play that response dialog. In this way, the response dialog is obtained by calling the various process engine interfaces according to the user's voice input, thereby realizing voice interaction with the user.
Referring to fig. 3, fig. 3 is a schematic flowchart illustrating an intelligent voice interaction method according to another embodiment of the present application. Fig. 3 is a schematic flowchart of another embodiment of the intelligent voice interaction method shown in fig. 1. Specifically, the method may include the steps of:
step S31: and calling a first process engine interface, and acquiring the voice information input by the user through the first process engine interface.
Please refer to step S21.
Step S32: and calling a first process engine interface, identifying the voice information through the first process engine, and converting the voice information into text information.
Please refer to step S22.
Step S33: and calling a second process engine interface, and acquiring a corresponding response dialog of the text message through the second process engine interface.
Please refer to step S23.
Step S34: and calling a third process engine interface, and playing the corresponding response dialogs of the text information through the third process engine.
Please refer to step S24.
Step S35: step S31 and subsequent steps are re-executed.
After the script file calls the third process engine interface to play, through the third process engine, the response dialog corresponding to the text information, the step of calling the first process engine interface to acquire the voice information input by the user, together with the subsequent steps, can be executed again; thus, after the response dialog is played, the user's feedback on it is acquired and a new round of voice interaction begins. In an implementation scenario, when the intelligent voice interaction supports user interruption, the first process engine interface may also be called to acquire the user's voice input while step S34 is being executed; this embodiment is not limited in this respect.
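Steps S31 to S35 together form a loop: acquire speech, recognize it, fetch the matching response, execute it, and start over until the user hangs up. A minimal sketch with hypothetical callbacks standing in for the process engine interfaces:

```python
def interaction_loop(get_voice, recognize, get_event, execute, max_turns=100):
    # Each turn: S31 acquire voice, S32 recognize it, S33 fetch the
    # response event, S34 execute it, S35 go back to S31.
    turns = 0
    for _ in range(max_turns):
        voice = get_voice()
        if voice is None:  # caller gone
            break
        turns += 1
        event = get_event(recognize(voice))
        if execute(event) == "hung_up":
            break
    return turns


utterances = iter(["check my balance", "I want to hang up"])
turns = interaction_loop(
    get_voice=lambda: next(utterances, None),
    recognize=lambda v: "hang_up" if "hang up" in v else "query",
    get_event=lambda intent: {"type": "hangup" if intent == "hang_up" else "play"},
    execute=lambda e: "hung_up" if e["type"] == "hangup" else "played",
)
```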
Different from any of the above embodiments, in this embodiment, after the current round of voice interaction is finished, the voice information input by the user is continuously acquired, so that the next round of voice interaction is started, and the user can acquire the desired information to be understood in the process of multiple rounds of voice interaction.
Referring to fig. 4, fig. 4 is a schematic diagram of a framework of an embodiment of an intelligent voice interaction system based on the intelligent voice interaction method of the present application. As shown in fig. 4, the intelligent voice interaction system in this embodiment may be built on FreeSwitch, or on other softswitch software such as Asterisk; the relevant technical standards for FreeSwitch and Asterisk are prior art in the field and are not described here. The intelligent voice interaction system in this embodiment may include the script file, the first process engine, the second process engine, and the third process engine described in the above embodiments. The script file may be written in a programming language such as Lua or Python, which this embodiment does not limit. The first process engine is an Automatic Speech Recognition (ASR) system, mainly used to receive and recognize the voice information input by the user and convert it into text information; the second process engine is mainly used to obtain the corresponding response dialog according to the text information; and the third process engine is a speech synthesis (Text-To-Speech, TTS) system, mainly used to play the corresponding response dialog. In an implementation scenario, the intelligent voice interaction system may further include a fourth process engine, mainly used to generate the welcome dialog for the interaction scenario when the voice interaction starts. This embodiment is not specifically limited in this respect.
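Under the division of labor just described (first engine for ASR, second for dialog management, third for TTS, fourth for the welcome dialog), the script file reduces to thin glue. A hypothetical sketch; the real system sits on top of FreeSwitch, which is not modeled here, and all names are illustrative.

```python
class ScriptFile:
    """Thin glue: only information I/O and interface calls, no business flow."""

    def __init__(self, asr, dialog, tts, welcome=None):
        self.asr = asr          # first process engine interface (speech -> text)
        self.dialog = dialog    # second process engine interface (text -> reply)
        self.tts = tts          # third process engine interface (play a dialog)
        self.welcome = welcome  # fourth process engine interface (welcome dialog)

    def run_turn(self):
        if self.welcome is not None:
            self.tts(self.welcome())  # greet once, when the call is connected
            self.welcome = None
        text = self.asr()
        self.tts(self.dialog(text))


spoken = []
script = ScriptFile(
    asr=lambda: "check my balance",
    dialog=lambda text: f"reply to: {text}",
    tts=spoken.append,
    welcome=lambda: "Dear customer, hello",
)
script.run_turn()
```

Swapping dialog scenarios then means swapping the engines behind the interfaces, not rewriting the script, which is the reusability claim the patent makes.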
Referring to fig. 4, the script file includes a first flow engine interface calling module, a second flow engine interface calling module, and a third flow engine interface calling module, where the first flow engine interface calling module is connected to the first flow engine, the second flow engine interface calling module is connected to the second flow engine, and the third flow engine interface calling module is connected to the third flow engine. The connections referred to in this embodiment are communication connections so that the modules may communicate related information between each other. In addition, the script file also comprises a voice input module and a voice output module. Specifically, when the intelligent voice interaction system works, the interaction between the modules of the script file and the interaction between each module and the process engine may include:
when voice interaction is triggered, for example when a customer service call is connected, a fourth process engine (not shown) generates a welcome dialog; the fourth process engine interface calling module (not shown) of the script file passes the welcome dialog through the voice output module to the third process engine interface calling module, which calls the third process engine to play the welcome dialog;
the first process engine acquires the voice information input by the user; the script file acquires this voice information through the first process engine interface calling module, then obtains the first process engine's recognition information for it through the same module, and passes the recognition information via the voice input module to the second process engine interface calling module and on to the second process engine;
the second process engine obtains the response event corresponding to the recognition information. When the response event is, for example, hanging up or requiring key-press confirmation from the user, it is passed to the other operation modules for execution; when the response event is a response dialog, the dialog is passed through the second process engine interface calling module to the voice output module, then to the third process engine interface calling module, and on to the third process engine, which plays it.
After the third process engine finishes playing the response dialog, or, when the intelligent voice interaction system supports user interruption, while the third process engine is still playing it, the script file re-executes the step of calling the first process engine interface to acquire the voice information input by the user and the subsequent steps.
Thus, unlike the prior art, the script file is responsible only for information input/output and for calling the various interfaces. This greatly improves reusability across different dialog scenarios, greatly reduces the size of the script file, and improves maintainability, which in turn accelerates deployment of the service, lowers the maintenance threshold, and enhances system stability.
Referring to fig. 5, fig. 5 is a schematic diagram of an embodiment of an intelligent voice interaction apparatus 50 of the present application. Specifically, the intelligent voice interaction apparatus 50 includes an obtaining module 51, a recognition module 52, a matching module 53, and an execution module 54. The obtaining module 51 is configured to call a first process engine interface and acquire, through it, the voice information input by a user; the recognition module 52 is configured to call the first process engine interface and recognize the voice information through the first process engine to obtain its identification information; the matching module 53 is configured to call a second process engine interface and acquire, through it, the response event corresponding to the identification information; and the execution module 54 is configured to execute that response event.
According to the above scheme, the script file calls the first process engine interface to acquire the voice information input by a user, calls the first process engine interface so that the first process engine recognizes the voice information and obtains its identification information, then calls the second process engine interface to acquire the response event corresponding to the identification information, and executes that response event. Recognizing the user's voice input and determining the corresponding response event are thus delegated to the respective process engines, so the script file no longer needs to manage the business flow of the dialog scenario; it is responsible only for information input/output and for calling the various interfaces. The volume of the script file is therefore greatly reduced and its maintainability improved. Moreover, because the script file handles only information input/output and interface calls, its reusability across different dialog scenarios is greatly increased.
In some embodiments, the recognition module 52 is configured to call the first process engine interface to recognize the voice information through the first process engine and convert it into text information, the matching module 53 is configured to call the second process engine interface to obtain, through the second process engine interface, the response dialog corresponding to the text information, and the execution module 54 is configured to call a third process engine interface to play, through the third process engine, the response dialog corresponding to the text information.
In some embodiments, the intelligent voice interaction device 50 further includes a loop control module. After the execution module 54 calls the third process engine interface to play the response dialog corresponding to the text information through the third process engine, the loop control module controls the obtaining module 51, the recognition module 52, the matching module 53 and the execution module 54 to re-execute the step of calling the first process engine interface to obtain the voice information input by the user through the first process engine interface, together with the subsequent steps.
In some embodiments, the obtaining module 51 is configured to invoke an automatic speech recognition process engine interface to obtain speech information through the automatic speech recognition process engine interface, and the recognition module 52 is configured to invoke the automatic speech recognition process engine interface to recognize the speech information through the automatic speech recognition process engine to obtain recognition information of the speech information.
In some embodiments, the execution module 54 is configured to call a speech synthesis system to play, through the speech synthesis system, the response dialog corresponding to the recognition information.
In some embodiments, the execution module 54 is further configured to hang up the voice interaction.
In some embodiments, the obtaining module 51 is further configured to invoke a fourth process engine interface to obtain the welcome word through the fourth process engine interface, and play the welcome word.
Different from the prior art, the script file calls the first process engine interface to recognize the voice information input by the user through the first process engine interface and convert it into text information, then calls the second process engine interface to obtain, through the second process engine interface, the response dialog corresponding to the text information, and calls the third process engine interface to play that response dialog through the third process engine. In this way, the response dialog can be obtained by calling the various process engine interfaces according to the voice information input by the user, thereby realizing voice interaction with the user.
In addition, after the current round of voice interaction ends, the voice information input by the user continues to be acquired, so that the next round of voice interaction begins; in this way, the user can obtain the desired information over multiple rounds of voice interaction.
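The multi-round interaction just described, including the welcome word played before the first round and the hang-up event that ends the call, can be sketched as a simple loop. All of the callable names here (get_voice, recognize, match, play) are hypothetical stand-ins for the process engine interfaces; the patent does not specify a concrete API.

```python
def interact(get_voice, recognize, match, play):
    """Minimal sketch of the script-file loop: welcome word, then rounds
    of (obtain voice -> recognize -> match response -> play) until a
    hang-up response event is matched."""
    play("Welcome!")                 # fourth engine interface: welcome word
    transcript = []
    while True:
        text = recognize(get_voice())    # first engine interface
        event = match(text)              # second engine interface
        if event == "hangup":            # response event that ends the call
            break
        play(event)                      # third engine interface: play dialog
        transcript.append((text, event))
    return transcript


# Scripted stand-ins so the loop can run without real engines:
inputs = iter(["hello", "bye"])
played = []
log = interact(
    get_voice=lambda: next(inputs),
    recognize=lambda audio: audio.upper(),
    match=lambda t: "hangup" if t == "BYE" else "You said " + t,
    play=played.append,
)
```

The loop itself never inspects the conversation scenario; which utterance triggers a hang-up, and what each response dialog says, is decided entirely inside the matching engine.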
Referring to fig. 6, fig. 6 is a schematic diagram of a framework of an embodiment of an intelligent voice interaction device 60 according to the present application. The intelligent voice interaction device 60 comprises a memory 61 and a processor 62 coupled to each other, and the processor 62 is configured to execute program instructions stored in the memory 61 to implement the steps in any of the above-described embodiments of the intelligent voice interaction method.
In particular, the processor 62 is configured to control itself and the memory 61 to implement the steps in any of the above-described embodiments of the intelligent voice interaction method. The processor 62 may also be referred to as a CPU (Central Processing Unit). The processor 62 may be an integrated circuit chip having signal processing capabilities. The processor 62 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like. In addition, the processor 62 may be implemented jointly by a plurality of integrated circuit chips.
According to this scheme, the script file is only responsible for inputting/outputting information and calling the various interfaces. This greatly improves reusability across different conversation scenarios, greatly reduces the size of the script file, and improves its maintainability, which in turn facilitates rapid deployment of services, lowers the maintenance threshold, and enhances system stability.
Referring to fig. 7, fig. 7 is a schematic diagram of a memory device 70 according to an embodiment of the present application. The memory device 70 stores program instructions 71 capable of being executed by the processor, the program instructions 71 being used for implementing the intelligent voice interaction method in any of the above embodiments.
According to this scheme, the script file is only responsible for inputting/outputting information and calling the various interfaces. This greatly improves reusability across different conversation scenarios, greatly reduces the size of the script file, and improves its maintainability, which in turn facilitates rapid deployment of services, lowers the maintenance threshold, and enhances system stability.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a module or a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Claims (10)

1. An intelligent voice interaction method, comprising:
calling a first process engine interface, and acquiring voice information input by a user through the first process engine interface;
calling the first process engine interface, and identifying the voice information through a first process engine to obtain identification information of the voice information;
calling a second process engine interface, and acquiring a response event corresponding to the identification information through the second process engine interface;
and executing the response event corresponding to the identification information.
2. The intelligent voice interaction method according to claim 1, wherein the step of calling the first process engine interface to recognize the voice information through the first process engine to obtain the identification information of the voice information comprises:
calling the first process engine interface, recognizing the voice information through the first process engine, and converting the voice information into text information;
the step of calling a second process engine interface and acquiring the corresponding response event of the identification information through the second process engine interface comprises the following steps:
calling the second process engine interface, and acquiring a response dialect corresponding to the text information through the second process engine interface;
the step of executing the response event corresponding to the identification information comprises:
calling a third process engine interface, and playing the response dialog corresponding to the text information through the third process engine.
3. The intelligent voice interaction method according to claim 2, wherein after the step of calling a third process engine interface and playing the response dialog corresponding to the text information through the third process engine, the method further comprises:
re-executing the step of calling the first process engine interface and acquiring voice information input by the user through the first process engine interface, and the subsequent steps.
4. The intelligent voice interaction method according to claim 1 or 2, wherein the step of calling a first process engine interface, and acquiring the voice information input by the user through the first process engine interface comprises:
calling an automatic voice recognition process engine interface, and acquiring the voice information through the automatic voice recognition process engine interface;
the step of calling the first process engine interface, recognizing the voice information through the first process engine, and obtaining the recognition information of the voice information comprises:
and calling the automatic voice recognition process engine interface, and recognizing the voice information through the automatic voice recognition process engine to obtain the recognition information of the voice information.
5. The intelligent voice interaction method according to claim 1 or 2, wherein the step of executing the response event corresponding to the identification information comprises:
and calling a voice synthesis system, and playing the response dialogs corresponding to the identification information through the voice synthesis system.
6. The intelligent voice interaction method according to claim 1, wherein the step of executing the response event corresponding to the identification information comprises:
and hanging up the voice interaction.
7. The intelligent voice interaction method according to claim 1, wherein before the step of calling the first process engine interface and acquiring the voice information input by the user through the first process engine interface, the method further comprises:
calling a fourth process engine interface, acquiring a welcome word through the fourth process engine interface, and playing the welcome word.
8. An intelligent voice interaction device, comprising:
the acquisition module is used for calling a first process engine interface and acquiring voice information input by a user through the first process engine interface;
the recognition module is used for calling the first process engine interface, recognizing the voice information through the first process engine, and obtaining the recognition information of the voice information;
the matching module is used for calling a second process engine interface and acquiring, through the second process engine interface, the response event corresponding to the identification information;
and the execution module is used for executing the response event corresponding to the identification information.
9. An intelligent voice interaction device, comprising a memory and a processor coupled to each other;
the processor is configured to execute the program instructions stored by the memory to implement the method of any of claims 1 to 7.
10. A storage device storing program instructions executable by a processor to perform the method of any one of claims 1 to 7.
CN201910779585.4A 2019-08-22 2019-08-22 Intelligent voice interaction method and related device Pending CN111862966A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910779585.4A CN111862966A (en) 2019-08-22 2019-08-22 Intelligent voice interaction method and related device


Publications (1)

Publication Number Publication Date
CN111862966A true CN111862966A (en) 2020-10-30

Family

ID=72970592

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910779585.4A Pending CN111862966A (en) 2019-08-22 2019-08-22 Intelligent voice interaction method and related device

Country Status (1)

Country Link
CN (1) CN111862966A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112562643A (en) * 2020-11-09 2021-03-26 深圳桔子智能科技发展有限公司 Voice interaction method, control device and storage medium
CN112767943A (en) * 2021-02-26 2021-05-07 湖北亿咖通科技有限公司 Voice interaction system

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1362703A (en) * 2001-01-05 2002-08-07 甦活全球网路股份有限公司 On-line interactive voice system and its implementation method
US20050059453A1 (en) * 2003-09-11 2005-03-17 Jamal Benbrahim Gaming apparatus software employing a script file
CN101546309A (en) * 2008-03-26 2009-09-30 国际商业机器公司 Method and equipment for constructing indexes to resource content in computer network
CN103299362A (en) * 2010-11-16 2013-09-11 沃科莱特有限公司 Cooperative voice dialog and business logic interpreters for a voice-enabled software application
CN103678354A (en) * 2012-09-11 2014-03-26 中国移动通信集团公司 Local relation type database node scheduling method and device based on cloud computing platform
CN107004410A (en) * 2014-10-01 2017-08-01 西布雷恩公司 Voice and connecting platform
CN108320729A (en) * 2018-01-29 2018-07-24 珠海金山网络游戏科技有限公司 A kind of method and apparatus of efficient debugging game music audio
CN108595731A (en) * 2018-01-23 2018-09-28 盛科网络(苏州)有限公司 The design method and device of dynamic entry in a kind of Ethernet chip
CN108959937A (en) * 2018-06-29 2018-12-07 北京奇虎科技有限公司 Plug-in unit processing method, device and equipment
CN109716430A (en) * 2016-09-29 2019-05-03 微软技术许可有限责任公司 It is conversated interaction using super robot
CN109767753A (en) * 2019-03-29 2019-05-17 北京赢和博雅文化发展有限公司 Star robot interactive approach and system
CN110032361A (en) * 2018-01-11 2019-07-19 腾讯科技(深圳)有限公司 Test analogy method, device, electronic equipment and computer readable storage medium
CN111508477A (en) * 2019-08-02 2020-08-07 马上消费金融股份有限公司 Voice broadcasting method, device, equipment and storage device



Similar Documents

Publication Publication Date Title
US10255918B2 (en) Command and control of devices and applications by voice using a communication base system
US10055190B2 (en) Attribute-based audio channel arbitration
US9548066B2 (en) Voice application architecture
US9715873B2 (en) Method for adding realism to synthetic speech
US11762629B2 (en) System and method for providing a response to a user query using a visual assistant
US11310362B2 (en) Voice call diversion to alternate communication method
US9077802B2 (en) Automated response system
CN111862966A (en) Intelligent voice interaction method and related device
CN109348048B (en) Call message leaving method, terminal and device with storage function
US11404057B2 (en) Adaptive interactive voice response system
US20060077967A1 (en) Method to manage media resources providing services to be used by an application requesting a particular set of services
US20210074296A1 (en) Transcription generation technique selection
US11431767B2 (en) Changing a communication session
CN110534084A (en) Intelligent voice control method and system based on FreeWITCH
JP2013257428A (en) Speech recognition device
CN114584656B (en) Streaming voice response method and device and voice call robot thereof
CN114598773B (en) Intelligent response system and method
KR102426290B1 (en) Mobile device with automatic call response function, method for automatic call response by mobile device and computer program for the same
US20200227051A1 (en) Service control method, service control apparatus and device
CN114390144A (en) Intelligent processing method, device and control system for voice incoming call
CN117834778A (en) IVVR-based telephone call replacing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201030