CN111105797A - Voice interaction method and device and electronic equipment - Google Patents


Info

Publication number
CN111105797A
Authority
CN
China
Prior art keywords
voice
processing client
semantic analysis
sending
request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911311008.9A
Other languages
Chinese (zh)
Inventor
孙鹏 (Sun Peng)
甘津瑞 (Gan Jinrui)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AI Speech Ltd
Original Assignee
AI Speech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AI Speech Ltd filed Critical AI Speech Ltd
Priority to CN201911311008.9A
Publication of CN111105797A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/28 - Constructional details of speech recognition systems
    • G10L 15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G10L 2015/223 - Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)

Abstract

The invention discloses a voice interaction method and apparatus, and an electronic device. An embodiment of the method is applied to a first device that establishes communication connections with a voice dialog platform and a voice processing client, and comprises the following steps: acquiring a voice request; performing semantic parsing on the voice request by means of the voice dialog platform; sending the semantic parsing result to the voice processing client; and cooperating with the voice processing client to complete the operation corresponding to the voice request. Because the first device parses the acquired voice request by means of the voice dialog platform, sends the parsing result to the voice processing client, and then cooperates with the client to complete the corresponding operation, the stability, convenience, and fluency of the interaction between the first device and the voice processing client can be improved, cost can be reduced, and user experience is enhanced.

Description

Voice interaction method and device and electronic equipment
Technical Field
The invention belongs to the field of intelligent technologies, and particularly relates to a voice interaction method and apparatus and an electronic device.
Background
The smart head units or rearview mirrors fitted in today's high-end luxury cars can place phone calls through simple voice commands. To make a call this way, the phone's contact list must first be transferred to the head unit or rearview mirror over Bluetooth; the process is cumbersome, carries a risk of privacy leakage, offers network speed and stability inferior to a mobile phone's, and responds slowly to voice commands. Moreover, because the head unit or rearview mirror needs its own SIM card, the called party often sees an unfamiliar number and rejects the call.
Calling through a smart head unit or rearview mirror provides only basic voice capability, with no support for customized dialogues, and falls short in convenience and stability. In addition, voice-operated head units and rearview mirrors currently on the market are expensive and generally fitted only to high-end cars, so they are not widely adopted.
Disclosure of Invention
In view of this, embodiments of the present invention provide a voice interaction method and apparatus, and an electronic device, which can improve stability, convenience, and smoothness of an interaction process between a first device and a voice processing client.
To achieve the above object, according to a first aspect of the embodiments of the present invention, a voice interaction method is provided, applied to a first device that establishes communication connections with a voice dialog platform and a voice processing client. The method includes: acquiring a voice request; performing semantic parsing on the voice request by means of the voice dialog platform; sending the semantic parsing result to the voice processing client; and cooperating with the voice processing client to complete the operation corresponding to the voice request.
Optionally, sending the semantic parsing result to the voice processing client includes: obtaining the semantic parsing result from the voice dialog platform; and sending the result to the voice processing client.
Optionally, sending the semantic parsing result to the voice processing client includes: sending the result to the voice processing client through the voice dialog platform.
Optionally, cooperating with the voice processing client to complete the operation corresponding to the voice request includes: acquiring an operation instruction generated by the voice processing client according to the received semantic parsing result; and executing the operation corresponding to the voice request in response to the operation instruction.
Optionally, before the step of acquiring the voice request, the method further includes: acquiring a wake-up word instruction; and triggering the start of the voice interaction function in response to the wake-up word instruction.
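Under the assumption of simple `parse` and `handle` interfaces (names invented here; the patent specifies only the four steps, not any API), the acquire, parse, send, and cooperate flow of the first aspect can be sketched as:

```python
# Minimal sketch of the first-aspect method; all class and method
# names are illustrative assumptions, not from the patent.

class DialogPlatform:
    """Stand-in for the cloud voice dialog platform (ASR + NLU)."""
    def parse(self, utterance: str) -> dict:
        # A real platform would run speech recognition and semantic
        # parsing; here a canned result keeps the sketch self-contained.
        return {"intent": "navigate", "raw": utterance}

class ProcessingClient:
    """Stand-in for the voice processing client (e.g. a phone APP)."""
    def handle(self, parse_result: dict) -> dict:
        # The client turns the parse result into an operation instruction.
        return {"op": parse_result["intent"], "status": "planned"}

class FirstDevice:
    def __init__(self, platform: DialogPlatform, client: ProcessingClient):
        self.platform = platform
        self.client = client

    def serve(self, voice_request: str) -> dict:
        result = self.platform.parse(voice_request)   # parse via platform
        instruction = self.client.handle(result)      # send result, get instruction
        return instruction                            # execute corresponding operation

device = FirstDevice(DialogPlatform(), ProcessingClient())
print(device.serve("please navigate, I want to go to the Palace"))
```

The optional clauses above only vary who forwards the parse result; the step order stays the same.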
To achieve the above object, according to a second aspect of the embodiments of the present invention, there is provided a voice interaction apparatus, applied to a first device, where the first device establishes a communication connection with a voice dialog platform and a voice processing client, the apparatus including: the acquisition module is used for acquiring a voice request; the semantic analysis module is used for carrying out semantic analysis on the voice request by means of the voice dialogue platform; the sending module is used for sending a semantic analysis result to the voice processing client; and the operation module is used for matching with the voice processing client to complete the operation corresponding to the voice request.
Optionally, the sending module includes: the acquisition unit is used for acquiring a semantic analysis result from the voice conversation platform; and the sending unit is used for sending the semantic analysis result to the voice processing client.
Optionally, the sending module includes: and the sending unit is used for sending the semantic analysis result to the voice processing client through the voice dialogue platform.
Optionally, the operation module includes: the acquisition unit is used for acquiring an operation instruction generated by the voice processing client according to the received semantic analysis result; and the execution unit is used for responding to the operation instruction and executing the operation corresponding to the voice request.
Optionally, in the apparatus, the acquisition module is further configured to acquire a wake-up word instruction; and a triggering module is configured to trigger the start of the voice interaction function in response to the wake-up word instruction.
To achieve the above object, according to a third aspect of the embodiments of the present invention, there is also provided an electronic apparatus, including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the voice interaction method as described in the first aspect.
Based on the above technical scheme, in the voice interaction method, the first device performs semantic parsing on the acquired voice request by means of the voice dialog platform, sends the parsing result to the voice processing client, and cooperates with the client to complete the operation corresponding to the voice request. This improves the stability, convenience, and fluency of the interaction between the first device and the voice processing client, reduces cost, and improves user experience.
Further effects of the above optional implementations will be described below in connection with specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as limiting it. In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
FIG. 1 is a flow chart of a voice interaction method according to an embodiment of the present invention;
FIG. 2 is a flow chart of a voice interaction method according to yet another embodiment of the present invention;
FIG. 3 is a flow chart of a voice interaction method according to another embodiment of the present invention;
FIG. 4 is a diagram illustrating a voice interaction apparatus according to an embodiment of the present invention;
FIG. 5 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
fig. 6 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
FIG. 1 is a flow chart of a voice interaction method according to the present invention. As shown in fig. 1, a voice interaction method according to an embodiment of the present invention is applied to a first device, where the first device establishes a communication connection with a voice dialog platform and a voice processing client, and the method includes:
S101: acquiring a voice request;
Illustratively, the following operations are performed before the voice request is acquired: acquiring a wake-up word instruction; and triggering the start of the voice interaction function in response to the wake-up word instruction.
Specifically, a wake-up word is acquired; semantic parsing is performed on the wake-up word through the voice dialog platform to obtain a parsing result; a wake-up word instruction corresponding to the parsing result is generated; and the voice interaction function is triggered in response to the wake-up word instruction.
Here, the wake-up word may be set in advance, and the first device can be woken only by that specific wake-up word.
For example, the wake-up word is "love you". After the wake-up word wakes the first device, the device can acquire a voice request and interact with the voice processing client by voice. Suppose the voice request is "please navigate, I want to go to the Palace to see the snow".
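The wake-up gating described above can be sketched as a tiny state machine. The one-request-per-wake policy and exact-match comparison are assumptions for this example; a real device would run acoustic wake-word detection, not string comparison:

```python
# Hypothetical wake-word gate: audio is ignored until the configured
# wake-up word is heard, then the next utterance is the voice request.

class WakeGate:
    def __init__(self, wake_word: str):
        self.wake_word = wake_word
        self.awake = False

    def feed(self, utterance: str):
        """Returns a voice request once awake, otherwise None."""
        if not self.awake:
            if utterance.strip() == self.wake_word:
                self.awake = True    # trigger: start voice interaction
            return None
        self.awake = False           # assumed: one request per wake-up
        return utterance

gate = WakeGate("love you")
assert gate.feed("play music") is None   # ignored: device not awake
assert gate.feed("love you") is None     # wake-up itself carries no request
req = gate.feed("please navigate, I want to go to the Palace")
print(req)
```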
S102: performing semantic analysis on the voice request by means of the voice dialogue platform;
Specifically, semantic parsing is performed on the voice request "please navigate, I want to go to the Palace to see the snow" by means of the voice dialog platform, and the resulting parse is "navigate from the current position to the Palace".
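The platform's NLU is a black box in the text; the keyword-rule sketch below (patterns invented for this example) only illustrates the input and output shape of the semantic parsing step, mapping an utterance to an intent plus a slot:

```python
import re

# Toy rule set standing in for the platform's semantic parser.
RULES = [
    (re.compile(r"\bnavigate\b.*?go to (.+?)(?: to .*)?$"), "navigate"),
    (re.compile(r"\bcall (.+)$"), "call"),
    (re.compile(r"\bplay music\b"), "play_music"),
]

def parse(utterance: str) -> dict:
    u = utterance.lower().strip()
    for pattern, intent in RULES:
        m = pattern.search(u)
        if m:
            # First capture group (if any) becomes the slot value.
            slot = m.group(1).strip() if m.groups() else None
            return {"intent": intent, "slot": slot}
    return {"intent": "unknown", "slot": None}

print(parse("please navigate, I want to go to the Palace to see the snow"))
```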
S103: sending a semantic analysis result to the voice processing client;
First, "navigate from the current position to the Palace" is sent to the voice processing client.
Here, the voice processing client may be a mobile phone APP or another device capable of voice processing; the first device may be a smart speaker or another device capable of voice interaction.
S104: and completing the operation corresponding to the voice request by matching with the voice processing client.
Illustratively, the first device acquires an operation instruction generated by the voice processing client according to the received semantic parsing result, and executes the operation corresponding to the voice request in response to that instruction.
Specifically, the voice processing client completes route planning according to the parsing result "navigate from the current position to the Palace" and feeds the plan back through the first device, for example with the voice prompt "in 500 meters, turn left ahead and enter the expressway".
It should be appreciated that MQTT is used for communication between the first device and the voice processing client. Binding between the first device and the voice processing client can be established through BLE provisioning, acoustic (sound-wave) provisioning, or AP provisioning. The voice interaction process involves related technologies such as voice capture, front-end signal processing, voice wake-up, speech recognition, semantic understanding, and synthesized-voice broadcast.
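MQTT is only named, not detailed, in the text. The sketch below implements just MQTT's topic-filter matching rules ('+' matches one level, '#' matches the remainder) around an in-memory stand-in for the broker; a real deployment would use an MQTT client library such as paho-mqtt, and the topic scheme shown is invented for illustration:

```python
def topic_matches(filt: str, topic: str) -> bool:
    """MQTT topic-filter matching: '+' is one level, '#' the rest."""
    f, t = filt.split("/"), topic.split("/")
    for i, part in enumerate(f):
        if part == "#":
            return True                  # '#' absorbs all remaining levels
        if i >= len(t):
            return False                 # topic is shorter than the filter
        if part != "+" and part != t[i]:
            return False
    return len(f) == len(t)

class MiniBroker:
    """In-memory stand-in for an MQTT broker, for the sketch only."""
    def __init__(self):
        self.subs = []                   # list of (filter, callback)

    def subscribe(self, filt, cb):
        self.subs.append((filt, cb))

    def publish(self, topic, payload):
        for filt, cb in self.subs:
            if topic_matches(filt, topic):
                cb(topic, payload)

broker = MiniBroker()
received = []
# Hypothetical topic scheme: device/<device-id>/parse_result
broker.subscribe("device/+/parse_result", lambda t, p: received.append(p))
broker.publish("device/speaker01/parse_result",
               "navigate from the current position to the Palace")
print(received)
```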
It should also be understood that, in various embodiments of the present invention, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and the inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
The voice interaction method of the embodiment of the present invention uses the strong speech processing capability of the voice dialog platform to improve the stability, convenience, and fluency of interaction between the smart speaker and the mobile phone APP. Dedicated smart speakers and phone APPs can also be customized for different cars. The method can be applied in different scenarios to provide multiple skills, such as everyday skills like making phone calls and voice navigation.
Fig. 2 is a flowchart of a voice interaction method according to still another embodiment of the invention. As shown in fig. 2, the voice interaction method according to the embodiment of the present invention is applied to a first device, where the first device establishes a communication connection with a voice dialog platform and a voice processing client, and the method includes:
S201: acquiring a wake-up word instruction;
S202: triggering the start of the voice interaction function in response to the wake-up word instruction;
S203: acquiring a voice request;
S204: performing semantic parsing on the voice request by means of the voice dialog platform;
S205: obtaining the semantic parsing result from the voice dialog platform;
S206: sending the semantic parsing result to the voice processing client;
S207: acquiring an operation instruction generated by the voice processing client according to the received semantic parsing result;
S208: executing the operation corresponding to the voice request in response to the operation instruction.
Specifically, a screenless smart speaker installed in a car serves as the first device, and its wake-up word is set to "love you". After the speaker is woken by the wake-up word, the voice interaction function between the speaker and the mobile phone APP starts. The speaker acquires the voice request "call Li Dong"; semantic parsing of this request through the voice dialog platform yields the result "dial Li Dong's number"; "dial Li Dong's number" is then sent to the mobile phone APP, which dials the number; after the call is connected, the mobile phone APP routes the call through the speaker, completing the operation of calling Li Dong.
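As a sketch of this Fig. 2 example, with a stub standing in for the platform's parsing and an invented `PhoneApp` interface (neither is specified by the patent), the device fetches the parse result and forwards it directly to the client:

```python
# Illustrative Fig. 2 call flow; the dialer API is invented for the sketch.

class PhoneApp:
    def __init__(self):
        self.dialed = []

    def handle(self, parse_result: str) -> str:
        if parse_result.startswith("dial "):
            contact = parse_result[len("dial "):]
            self.dialed.append(contact)      # S207: client dials the number
            return f"call connected to {contact}"
        return "unsupported"

def run_call(speaker_request: str, app: PhoneApp) -> str:
    # S204-S205: parse via the platform (stubbed here as a string rewrite)
    parse_result = speaker_request.replace("call", "dial", 1)
    # S206: the speaker sends the parse result straight to the phone APP
    # S208: the speaker relays the client's response to the user
    return app.handle(parse_result)

app = PhoneApp()
print(run_call("call Li Dong", app))
```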
It should be understood that, in various embodiments of the present invention, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and the inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In this voice interaction method, the speaker performs semantic parsing on the voice request by means of the voice dialog platform, sends the parsing result to the mobile phone APP, and then cooperates with the APP to complete the operation corresponding to the voice request, improving the safety, stability, and fluency of the voice interaction process.
Fig. 3 is a flowchart of a voice interaction method according to another embodiment of the present invention. As shown in fig. 3, the voice interaction method according to the embodiment of the present invention is applied to a first device, where the first device establishes a communication connection with a voice dialog platform and a voice processing client, and the method includes:
S301: acquiring a wake-up word instruction;
S302: triggering the start of the voice interaction function in response to the wake-up word instruction;
S303: acquiring a voice request;
S304: performing semantic parsing on the voice request by means of the voice dialog platform;
S305: sending the semantic parsing result to the voice processing client through the voice dialog platform;
S306: acquiring an operation instruction generated by the voice processing client according to the received semantic parsing result;
S307: executing the operation corresponding to the voice request in response to the operation instruction.
Specifically, a screenless smart speaker installed in a car serves as the first device, and its wake-up word is set to "love you". After the speaker is woken by the wake-up word, the voice interaction function between the speaker and the mobile phone APP starts. The speaker acquires the voice request "play music"; semantic parsing of this request through the voice dialog platform yields the result "start music playback"; the voice dialog platform then sends "start music playback" to the mobile phone APP, and the APP pushes music to the speaker, which plays it to complete the operation.
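A sketch of this Fig. 3 relay variant, in which the platform pushes the parse result to the client itself rather than the device forwarding it; the class names and push mechanism are assumptions for illustration:

```python
# Illustrative Fig. 3 flow: platform -> client -> speaker.

class Platform:
    def __init__(self, client):
        self.client = client

    def parse_and_relay(self, request: str):
        if request == "play music":
            result = "start music playback"   # S304: semantic parsing
        else:
            result = "unknown"
        self.client.receive(result)           # S305: platform relays to client

class MusicApp:
    def __init__(self, speaker):
        self.speaker = speaker                # speaker modeled as an output list

    def receive(self, parse_result: str):
        if parse_result == "start music playback":
            self.speaker.append("now playing")   # S306-S307: push audio to speaker

speaker_output = []
Platform(MusicApp(speaker_output)).parse_and_relay("play music")
print(speaker_output)
```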
In this voice interaction method, the speaker performs semantic parsing on the voice request by means of the voice dialog platform, the platform sends the parsing result to the mobile phone APP, and the speaker then cooperates with the APP to complete the operation corresponding to the voice request, improving the safety, stability, and convenience of the voice interaction process.
It should be understood that, in various embodiments of the present invention, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and the inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Fig. 4 is a schematic diagram of a voice interaction apparatus according to an embodiment of the invention. The voice interaction apparatus is applied to a first device, the first device establishes a communication connection with a voice dialog platform and a voice processing client, and the apparatus 400 includes: an obtaining module 401, configured to obtain a voice request; a semantic parsing module 402, configured to perform semantic parsing on the voice request by using the voice dialog platform; a sending module 403, configured to send a result of semantic parsing to the voice processing client; an operation module 404, configured to cooperate with the voice processing client to complete an operation corresponding to the voice request.
In an optional embodiment, the sending module includes: the acquisition unit is used for acquiring a semantic analysis result from the voice conversation platform; and the sending unit is used for sending the semantic analysis result to the voice processing client.
In an optional embodiment, the sending module includes: and the sending unit is used for sending the semantic analysis result to the voice processing client through the voice dialogue platform.
In an alternative embodiment, the operation module includes: the acquisition unit is used for acquiring an operation instruction generated by the voice processing client according to the received semantic analysis result; and the execution unit is used for responding to the operation instruction and executing the operation corresponding to the voice request.
In an optional embodiment, the apparatus further comprises: the acquisition module is also used for acquiring a wake-up language instruction; and the triggering module is used for responding to the awakening language instruction and triggering and starting the voice interaction function.
Fig. 5 is an exemplary system architecture diagram in which embodiments of the present invention may be employed.
As shown in fig. 5, the system architecture 500 may include terminal devices 501, 502, 503, a network 504, and a server 505. The network 504 serves to provide a medium for communication links between the terminal devices 501, 502, 503 and the server 505. Network 504 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 501, 502, 503 to interact with a server 505 over a network 504 to receive or send messages or the like. The terminal devices 501, 502, 503 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 501, 502, 503 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 505 may be a server providing various services, such as a background management server (for example only) providing support for click events generated by users using the terminal devices 501, 502, 503. The background management server may analyze and perform other processing on the received click data, text content, and other data, and feed back a processing result (for example, target push information, product information — just an example) to the terminal device.
It should be noted that the voice interaction method provided by the embodiment of the present application is generally executed by the server 505, and accordingly, the voice interaction apparatus is generally disposed in the server 505.
It should be understood that the number of terminal devices, networks, and servers in fig. 5 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The invention also provides an electronic device and a computer readable medium according to the embodiment of the invention.
The electronic device of the present invention includes: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement a voice interaction method of an embodiment of the present invention.
The computer readable medium of the present invention has stored thereon a computer program which, when executed by a processor, implements a voice interaction method of an embodiment of the present invention.
Referring now to FIG. 6, shown is a block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment. The terminal device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 6, the computer system 900 includes a central processing unit (CPU) 901 that can perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 902 or a program loaded from a storage section 908 into a random access memory (RAM) 903. The RAM 903 also stores various programs and data necessary for the operation of the system 900. The CPU 901, ROM 902, and RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904. The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output section 907 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, and a speaker; a storage section 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card or a modem. The communication section 909 performs communication processing via a network such as the Internet. A drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 910 as necessary, so that a computer program read from it can be installed into the storage section 908.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The above-described functions defined in the system of the present invention are executed when the computer program is executed by a Central Processing Unit (CPU) 901.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or by hardware. The described modules may also be provided in a processor, which may, for example, be described as: a processor comprising a sending module, an obtaining module, a determining module, and a first processing module. The names of these modules do not, in some cases, limit the modules themselves; for example, the sending module may also be described as a "module that sends a picture acquisition request to a connected server".
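As an illustrative aside (not part of the patent disclosure itself), the processor-plus-modules arrangement described above could be organized as follows in Python. Every class and method name here is hypothetical; the patent prescribes only the module names, not any implementation:

```python
# Hypothetical sketch of a processor composed of the four named functional
# modules. Nothing below is mandated by the patent text.

class SendingModule:
    """May equally be described as a 'module that sends a picture
    acquisition request to a connected server'."""
    def run(self, server, request):
        # Delegate the request to the connected server (a stand-in object).
        return server.handle(request)

class ObtainingModule:
    def run(self, source):
        # Obtain data from some source (a stand-in object).
        return source.get()

class DeterminingModule:
    def run(self, data):
        # A trivial determination: whether any data was obtained.
        return bool(data)

class FirstProcessingModule:
    def run(self, data):
        # Placeholder processing step.
        return f"processed: {data}"

class Processor:
    """A processor described as comprising the four modules above."""
    def __init__(self):
        self.sending = SendingModule()
        self.obtaining = ObtainingModule()
        self.determining = DeterminingModule()
        self.first_processing = FirstProcessingModule()
```

The point of the sketch is only that a "module" here is a named responsibility hosted inside one processor, so renaming a module (e.g. describing `SendingModule` by what it sends) changes its description, not its function.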
As another aspect, the present invention further provides a computer readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to: acquire a voice request; perform semantic analysis on the voice request by means of the voice dialogue platform; send a semantic analysis result to the voice processing client; and complete, in cooperation with the voice processing client, the operation corresponding to the voice request.
In the voice interaction method, the first device performs semantic analysis on the acquired voice request by means of the voice dialogue platform, sends the semantic analysis result to the voice processing client, and completes the operation corresponding to the voice request in cooperation with the voice processing client. In this way, the stability, convenience, and fluency of the interaction between the first device and the voice processing client can be improved, cost can be reduced, and the user experience can be enhanced.
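As a non-authoritative illustration, the interaction flow summarized above (acquire a voice request, parse it via the dialogue platform, forward the result, then cooperate with the client to carry out the operation) might be sketched as follows. `DialoguePlatform` and `VoiceProcessingClient` are hypothetical stand-ins, not components defined by the patent:

```python
# Illustrative sketch of the voice interaction flow on the first device.
# All class and method names are invented for illustration only.

class FirstDevice:
    def __init__(self, platform, client):
        # The first device has established connections with a voice
        # dialogue platform and a voice processing client.
        self.platform = platform
        self.client = client

    def handle_voice_request(self, audio):
        # Step 1: acquire the voice request (passed in here as `audio`).
        # Step 2: perform semantic analysis by means of the dialogue platform.
        semantics = self.platform.parse(audio)
        # Step 3: send the semantic analysis result to the voice processing
        # client, which generates an operation instruction from it.
        instruction = self.client.on_semantics(semantics)
        # Step 4: in response to the instruction, execute the operation
        # corresponding to the voice request.
        return self.execute(instruction)

    def execute(self, instruction):
        # Placeholder for the device-side operation (e.g. navigation, playback).
        return f"executed: {instruction}"
```

Note that in this sketch the device itself never performs recognition or parsing; it only brokers the request between the platform and the client, which matches the division of labor described above.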
The above product can execute the voice interaction method provided by the embodiments of the present invention, and has the corresponding functional modules and beneficial effects for executing the method. For technical details not described in detail in this embodiment, reference may be made to the method provided by the embodiments of the present invention.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
The above description covers only exemplary embodiments of the present invention, and the scope of the present invention is not limited thereto. Any changes or substitutions that can be easily conceived by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A voice interaction method, applied to a first device that has established a communication connection with a voice dialogue platform and a voice processing client, the method comprising the following steps:
acquiring a voice request;
performing semantic analysis on the voice request by means of the voice dialogue platform;
sending a semantic analysis result to the voice processing client;
and completing, in cooperation with the voice processing client, the operation corresponding to the voice request.
2. The method of claim 1, wherein the sending a semantic analysis result to the voice processing client comprises:
obtaining a semantic parsing result from the voice dialogue platform;
and sending the semantic analysis result to the voice processing client.
3. The method of claim 1, wherein the sending a semantic analysis result to the voice processing client comprises:
and sending the semantic analysis result to the voice processing client through the voice dialogue platform.
4. The method of claim 1, wherein the completing, in cooperation with the voice processing client, the operation corresponding to the voice request comprises:
acquiring an operation instruction generated by the voice processing client according to the received semantic analysis result;
and responding to the operation instruction, and executing the operation corresponding to the voice request.
5. The method of claim 1, further comprising, before the step of acquiring a voice request:
acquiring a wake-up word instruction;
and in response to the wake-up word instruction, triggering activation of the voice interaction function.
6. A voice interaction device, applied to a first device that has established a communication connection with a voice dialogue platform and a voice processing client, the voice interaction device comprising:
the acquisition module is used for acquiring a voice request;
the semantic analysis module is used for carrying out semantic analysis on the voice request by means of the voice dialogue platform;
the sending module is used for sending a semantic analysis result to the voice processing client;
and the operation module is used for completing, in cooperation with the voice processing client, the operation corresponding to the voice request.
7. The apparatus of claim 6, wherein the sending module comprises:
the acquisition unit is used for acquiring a semantic analysis result from the voice dialogue platform;
and the sending unit is used for sending the semantic analysis result to the voice processing client.
8. The apparatus of claim 6, wherein the sending module comprises:
and the sending unit is used for sending the semantic analysis result to the voice processing client through the voice dialogue platform.
9. The apparatus of claim 6, wherein the operation module comprises:
the acquisition unit is used for acquiring an operation instruction generated by the voice processing client according to the received semantic analysis result;
and the execution unit is used for responding to the operation instruction and executing the operation corresponding to the voice request.
10. An electronic device, comprising: one or more processors; a storage device to store one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-5.
CN201911311008.9A 2019-12-18 2019-12-18 Voice interaction method and device and electronic equipment Pending CN111105797A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911311008.9A CN111105797A (en) 2019-12-18 2019-12-18 Voice interaction method and device and electronic equipment


Publications (1)

Publication Number Publication Date
CN111105797A true CN111105797A (en) 2020-05-05

Family

ID=70422134

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911311008.9A Pending CN111105797A (en) 2019-12-18 2019-12-18 Voice interaction method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111105797A (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103617795A (en) * 2013-10-31 2014-03-05 广东好帮手电子科技股份有限公司 A vehicle-mounted voice recognition control method and a vehicle-mounted voice recognition control system
CN103716473A (en) * 2012-09-29 2014-04-09 深圳市众鸿科技股份有限公司 Vehicle-mounted navigation communication system and method
CN103943108A (en) * 2014-04-04 2014-07-23 广东翼卡车联网服务有限公司 Method and system for achieving mobile phone terminal voice navigation through steering wheel controller
CN104394190A (en) * 2014-10-22 2015-03-04 宏景电子(芜湖)有限公司 Vehicle-mounted equipment and mobile phone integral system based on Bluetooth transmission
CN104655146A (en) * 2015-02-11 2015-05-27 北京远特科技有限公司 Method and system for navigation or communication in vehicle
CN104754500A (en) * 2015-04-16 2015-07-01 芜湖宏景电子股份有限公司 Bluetooth transmission based vehicle terminal and smartphone voice-control interconnection and mutual control system
CN209017333U (en) * 2018-06-20 2019-06-21 深圳市领芯者科技有限公司 Blue tooth voice controls equipment
CN110377364A (en) * 2019-07-18 2019-10-25 百度在线网络技术(北京)有限公司 Voice interactive method and device, voice interactive system, electronic equipment


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111986669A (en) * 2020-08-20 2020-11-24 北京声智科技有限公司 Information processing method and device
CN112201240A (en) * 2020-09-27 2021-01-08 上汽通用五菱汽车股份有限公司 Vehicle control method, vehicle-mounted screenless device, server and readable storage medium
CN112201240B (en) * 2020-09-27 2023-03-14 上汽通用五菱汽车股份有限公司 Vehicle control method, vehicle-mounted screenless device, server and readable storage medium

Similar Documents

Publication Publication Date Title
CN107895578B (en) Voice interaction method and device
US20210243045A1 (en) Online document sharing method and apparatus, electronic device, and storage medium
EP3158464B1 (en) Use of a digital assistant in communications
CN109725975B (en) Method and device for prompting read state of message and electronic equipment
US9843667B2 (en) Electronic device and call service providing method thereof
CN103443852A (en) Audio-interactive message exchange
CN112040330B (en) Video file processing method and device, electronic equipment and computer storage medium
JP6906584B2 (en) Methods and equipment for waking up devices
CN107731231B (en) Method for supporting multi-cloud-end voice service and storage device
CN113760145B (en) Interaction method, device, electronic equipment and storage medium
US9565289B2 (en) Mobile terminal and method of controlling the same
KR20140112364A (en) Display apparatus and control method thereof
CN109582274B (en) Volume adjusting method and device, electronic equipment and computer readable storage medium
KR20140107736A (en) Method for providing of voice message, apparatus and system for the same
CN111105797A (en) Voice interaction method and device and electronic equipment
CN112242143B (en) Voice interaction method and device, terminal equipment and storage medium
CN110519373B (en) Method and device for pushing information
CN110442416B (en) Method, electronic device and computer-readable medium for presenting information
CN107608718B (en) Information processing method and device
CN113364669B (en) Message processing method and device, electronic equipment and medium
CN113053370B (en) Method and device for waking up application
CN112306560B (en) Method and apparatus for waking up an electronic device
CN113709506A (en) Multimedia playing method, device, medium and program product based on cloud mobile phone
CN113360704A (en) Voice playing method and device and electronic equipment
CN111698367A (en) Method and device for terminal silent alarm, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 215024 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Jiangsu Province

Applicant after: Sipic Technology Co.,Ltd.

Address before: 215024 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Jiangsu Province

Applicant before: AI SPEECH Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20200505
