CN111128184A - Voice interaction method and device between devices

Info

Publication number: CN111128184A
Application number: CN201911353536.0A
Authority: CN
Inventors: 丰崇鹏; 李勇; 俞瑞隆
Original assignee: AI Speech Ltd
Current assignee: AI Speech Ltd
Priority date: 2019-12-25
Filing date: 2019-12-25
Publication date: 2020-05-08
Anticipated expiration: 2039-12-25
Also published as: CN111128184B

Abstract

The invention discloses a voice interaction method between devices. All first devices of a first user and all second devices of a second user are voice-enabled through a Dialogue User Interface (DUI) platform, which maintains the device information of all the first devices and of all the second devices. The method comprises the following steps: a DUI server receives a first request message from a selected first device, wherein the first request message carries text data sent by the first user to the second user; the DUI server acquires a second device list of the second user from the DUI platform, wherein the second device list records the device information of all the second devices; the DUI server stores the text data in a third-party cloud server and generates corresponding first address information; and the DUI server sends the first address information to all the second devices according to the acquired device information of all the second devices.

Description

Voice interaction method and device between devices

Technical Field

The invention relates to Internet of Things technology, and in particular to a voice interaction method and device between devices.

Background

An intelligent voice device can support a user in creating group chats and leaving messages on a per-user basis, and can let the user select any device bound to the user's account (that is, across devices) to chat, play back messages, and so on. When the user selects a friend in the address book or selects a group chat and sends a message (voice or text), the devices bound by that friend or by each member of the group chat can receive the message accurately, and the user can likewise select any device under his or her own account to play messages and reply to them.

However, interaction between intelligent voice devices is currently based on the Message Queuing Telemetry Transport (MQTT) protocol. MQTT is lightweight, and its protocol carrier does not support very large data volumes, so the size of the data carried in a single transmission is limited. When the voice sent by a user at one time is long and the data volume reaches the upper limit for a single transmission, the user's voice is cut off, which degrades the user experience.

Disclosure of Invention

The invention provides a voice interaction method and device between devices, which can solve the above technical problem.

One aspect of the present invention provides a voice interaction method between devices, in which all first devices of a first user and all second devices of a second user are voice-enabled through a Dialogue User Interface (DUI) platform, and the device information of all the first devices and of all the second devices is maintained, the method including:

the DUI server receives a first request message from the selected first device, wherein the first request message carries text data sent by the first user to the second user;

the DUI server acquires a second device list of the second user from the DUI platform, wherein the second device list records the device information of all the second devices;

the DUI server stores the text data in a third-party cloud server and generates corresponding first address information;

and the DUI server sends the first address information to all the second devices according to the acquired device information of all the second devices.

Wherein, the first request message further carries the user ID of the second user, and the DUI server acquires the second device list from the DUI platform according to the user ID of the second user.

Wherein, the request message also carries the user ID of the first user, and the method further includes:

and the DUI server acquires a first device list of the first user from the DUI platform according to the user ID of the first user, wherein the device information of all the first devices is recorded in the first device list.

Wherein, the method also comprises:

the DUI server receives a second request message sent by the selected second device, wherein the second request message carries text data replied by the second user to the first user;

the DUI server stores the replied text data in a third-party cloud server and generates corresponding second address information;

and the DUI server sends the second address information to all the first devices according to the acquired device information of all the first devices.

The text data comprises first text data interacted between a first user and a second user and/or second text data converted from voice data interacted between the first user and the second user;

the DUI server storing the text data in a third party cloud server, comprising:

and the DUI server stores the first text data in a third-party cloud server, and/or the DUI server restores the second text data into voice data and stores the voice data in the third-party cloud server.

Wherein, the method also comprises:

the DUI server sends the first address information to a DUI client of a second user associated with the user ID of the second user according to the user ID of the second user;

and the DUI server sends the second address information to the DUI client of the first user associated with the user ID of the first user according to the user ID of the first user.

Another aspect of the present invention provides an apparatus for voice interaction between devices, which performs voice enabling on all first devices of a first user and all second devices of a second user through a DUI platform, and maintains device information of all first devices and device information of all second devices, and the apparatus is applied to a DUI server, and includes:

the interaction module is used for receiving a first request message from the selected first device, wherein the first request message carries text data sent by the first user to the second user;

an information processing module, configured to obtain, from the DUI platform, a second device list of the second user, where device information of all the second devices is recorded in the second device list;

the resource processing module is used for storing the text data in a third-party cloud server and generating corresponding first address information;

the interaction module is further configured to send the first address information to all the second devices according to the acquired device information of all the second devices.

Wherein, the request message also carries the user ID of the first user;

the information processing module is further configured to obtain a first device list of the first user from the DUI platform according to the user ID of the first user, where device information of all first devices is recorded in the first device list.

The interaction module is further configured to receive a second request message sent by the selected second device, where the second request message carries text data replied by the second user to the first user;

the resource processing module is further configured to store the replied text data in a third-party cloud server and generate corresponding second address information;

the interaction module is further configured to send the second address information to all the first devices according to the acquired device information of all the first devices.

the resource processing module is further used for storing the first text data in a third-party cloud server, and/or restoring the second text data into voice data and then storing the voice data in the third-party cloud server.

The interaction module is further configured to send the first address information to a DUI client of the second user associated with the user ID of the second user according to the user ID of the second user; and the interaction module is further used for sending the second address information to the DUI client of the first user associated with the user ID of the first user according to the user ID of the first user.

With this scheme, converting voice data into text data greatly reduces the amount of data to be transmitted. Even when a user sends a long voice message, the payload stays within the single-transmission data limit of the MQTT protocol, so the voice message is sent without being cut off and the user experience is improved.
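The size reduction claimed above can be illustrated with simple arithmetic. The following is a rough sketch only: the audio format (uncompressed 16 kHz, 16-bit mono PCM) and the sample transcript are assumptions made for illustration and are not specified in this document; devices that compress audio would see a different ratio.

```python
# Rough size comparison between raw speech audio and its recognized text.
# The audio format and the transcript below are illustrative assumptions.
SAMPLE_RATE_HZ = 16_000   # assumed sampling rate
BYTES_PER_SAMPLE = 2      # assumed 16-bit mono PCM
DURATION_S = 60           # one minute of speech


def pcm_size_bytes(duration_s: int,
                   sample_rate_hz: int = SAMPLE_RATE_HZ,
                   bytes_per_sample: int = BYTES_PER_SAMPLE) -> int:
    """Size of uncompressed mono PCM audio in bytes."""
    return duration_s * sample_rate_hz * bytes_per_sample


def text_size_bytes(text: str) -> int:
    """Size of the recognized text once encoded as UTF-8."""
    return len(text.encode("utf-8"))


audio_bytes = pcm_size_bytes(DURATION_S)
transcript = "Please remember to pick up the kids at five. " * 4  # hypothetical
text_bytes = text_size_bytes(transcript)

print(f"audio: {audio_bytes} bytes, text: {text_bytes} bytes")
```

Under these assumptions the text payload is several orders of magnitude smaller than the audio it was recognized from, which is the property the scheme relies on.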

Drawings

Fig. 1 is a schematic flow chart illustrating a voice interaction method between devices according to an embodiment of the present invention;

fig. 2 is a schematic flow chart illustrating a voice interaction method between devices according to another embodiment of the present invention;

fig. 3 is a schematic diagram illustrating a voice interaction apparatus between devices according to an embodiment of the present invention;

fig. 4 is a schematic diagram illustrating a voice interaction process between devices in a scenario.

Detailed Description

In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In the scheme provided by the embodiments of the invention, voice is converted into text by speech recognition in combination with the Dialogue User Interface (DUI) platform and is then transmitted to the DUI server, which forwards it to the corresponding user. This solves the problem of voice occupying too much bandwidth.

To implement the above solution, a user first registers all of his or her devices on the DUI platform through the DUI client, so that the DUI platform can voice-enable each device, giving it specific voice interaction capabilities that include but are not limited to voice acquisition, voice recognition, and voice conversion. The devices to which the present invention applies therefore need the hardware basis for these functions. On this basis, the DUI platform maintains the device information of the multiple devices bound by each user, where each piece of device information uniquely identifies one device.

After the preparation work is completed, the implementation of the voice interaction process between the devices of the present invention on the DUI server side is shown in fig. 1, and includes:

step 101, the DUI server receives a first request message of the selected first device, wherein the first request message carries text data sent by the first user to the second user.

In the embodiment of the present invention, the voice interaction between the first user and the second user is taken as an example for description, which may be a point-to-point voice interaction between the first user and the second user, or a case where the first user initiates voice in a group chat and other members (taking the second user as an example) receive voice in the group chat.

Assuming that a first user has bound multiple devices on the DUI platform, the multiple devices are referred to as multiple first devices; the second user has bound multiple devices on the DUI platform, which are referred to as multiple second devices.

The first user may select any one of the first devices to initiate speech. After processing the collected voice of the first user, the selected first device sends a first request message to the DUI server. Note that, to reduce the amount of data exchanged, the first device may convert the speech of the first user into text, so that the message carries text data instead of voice data. If the first user submits text rather than speech, the text is packaged directly into the first request message without conversion.
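As an illustration of how a device might package either kind of input, the following minimal sketch builds a request message as JSON. The message layout and the field names ("from", "to", "text", "from_speech") are hypothetical; the document does not define a message format.

```python
import json
from typing import Optional


def build_request_message(sender_id: str, recipient_id: str,
                          typed_text: Optional[str] = None,
                          recognized_text: Optional[str] = None) -> str:
    """Package text (typed, or recognized from speech) into a JSON request.

    All field names here are hypothetical, chosen only for illustration.
    """
    if (typed_text is None) == (recognized_text is None):
        raise ValueError("provide exactly one of typed_text or recognized_text")
    from_speech = recognized_text is not None
    payload = {
        "from": sender_id,
        "to": recipient_id,
        "text": recognized_text if from_speech else typed_text,
        # Lets the server decide whether to restore the text into voice data.
        "from_speech": from_speech,
    }
    return json.dumps(payload, ensure_ascii=False)


message = build_request_message("user-1", "user-2",
                                recognized_text="See you at eight")
```

Carrying a flag such as "from_speech" is one possible way for the server to tell the two kinds of text data apart; other designs are equally plausible.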

Of course, the first user may also initiate voice through the DUI client, in which case, the DUI client sends the first request message after processing the collected voice of the first user.

Step 102, the DUI server obtains a second device list of the second user from the DUI platform, and the device information of all the second devices is recorded in the second device list.

The DUI platform may maintain a list of devices for each user, and may search for a corresponding list of devices via a user ID. The user ID is used to uniquely identify a user. Assume that the DUI platform maintains a first device list for a first user and a second device list for a second user.

If the interaction scenario is a point-to-point voice interaction between the first user and the second user, then here the DUI server, after receiving the first request message, can determine two user IDs, i.e., the user ID of the first user and the user ID of the second user.

If the interaction situation is that the first user initiates voice in the group chat, the DUI server may determine two types of user IDs after receiving the first request message, that is, the user ID of the first user and the user IDs of other members in the group chat, that is, the user IDs of the plurality of second users, and the processing for each second user is the same subsequently.
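The two cases above (point-to-point and group chat) can be sketched as a single recipient-resolution function. The group table and all identifiers below are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical group membership table: group ID -> member user IDs.
GROUPS: Dict[str, List[str]] = {"family": ["user-1", "user-2", "user-3"]}


def resolve_recipients(sender_id: str,
                       recipient_id: Optional[str] = None,
                       group_id: Optional[str] = None) -> List[str]:
    """Return the user IDs that should receive a message from sender_id."""
    if group_id is not None:
        # Group chat: every member except the sender is a "second user".
        return [uid for uid in GROUPS[group_id] if uid != sender_id]
    if recipient_id is None:
        raise ValueError("either recipient_id or group_id is required")
    # Point-to-point: exactly one second user.
    return [recipient_id]


print(resolve_recipients("user-1", recipient_id="user-2"))
print(resolve_recipients("user-1", group_id="family"))
```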

To send the text or speech submitted by the first user to the second user, the DUI server needs the device information of the second user. The DUI server may obtain the second device list, which records the device information of all the second devices of the second user, from the DUI platform according to the user ID of the second user. The retrieved second device list may be cached locally.
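The lookup-then-cache behaviour described here can be sketched as follows. The platform is represented by a plain dictionary and a fetch callback, both of which are stand-ins; the actual DUI platform interface is not described in this document.

```python
from typing import Callable, Dict, List


class DeviceListCache:
    """Caches per-user device lists fetched from a platform lookup function."""

    def __init__(self, fetch_from_platform: Callable[[str], List[str]]):
        self._fetch = fetch_from_platform
        self._cache: Dict[str, List[str]] = {}
        self.platform_calls = 0  # counts how often the platform is queried

    def get(self, user_id: str) -> List[str]:
        if user_id not in self._cache:
            self.platform_calls += 1
            self._cache[user_id] = list(self._fetch(user_id))
        return self._cache[user_id]


# Stand-in for the DUI platform: user ID -> device information (hypothetical).
PLATFORM = {"user-2": ["device-3", "device-4", "device-5"]}

cache = DeviceListCache(lambda uid: PLATFORM.get(uid, []))
first = cache.get("user-2")    # fetched from the platform
second = cache.get("user-2")   # served from the local cache
```

This sketch never invalidates cached entries; a real server would presumably need to refresh the list when a user binds or unbinds a device.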

Step 103, the DUI server stores the text data in a third-party cloud server and generates corresponding first address information.

After receiving the first request message, the DUI server acquires the text data carried in it. The text data falls into two types: first, text sent directly by the first user to the second user, referred to as text data one for ease of description; second, text obtained by converting voice data sent by the first user to the second user, referred to as text data two.

Text data one can be stored directly in the third-party cloud server; text data two can be restored into voice data and then stored in the third-party cloud server.

After the text data of the interaction is stored in the third-party cloud server, address information (for convenience of description, referred to as first address information) such as a URL may be generated.
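A minimal sketch of this storage step is given below, assuming an in-memory stand-in for the third-party cloud server and a placeholder in place of a real speech synthesis engine. The base URL, class name, and function names are all hypothetical.

```python
import uuid
from typing import Dict, Tuple


class InMemoryCloudStore:
    """Stand-in for the third-party cloud server (illustration only)."""

    BASE_URL = "https://storage.example.com/messages/"  # hypothetical address

    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[str, bytes]] = {}

    def put(self, content: bytes, content_type: str) -> str:
        """Store content and return address information (a URL) for it."""
        key = uuid.uuid4().hex
        self._objects[key] = (content_type, content)
        return self.BASE_URL + key

    def get(self, url: str) -> Tuple[str, bytes]:
        """Return (content type, content) for a previously returned URL."""
        return self._objects[url.rsplit("/", 1)[-1]]


def synthesize_speech(text: str) -> bytes:
    """Placeholder for a text-to-speech engine; returns fake audio bytes."""
    return b"FAKE-AUDIO:" + text.encode("utf-8")


def store_message(store: InMemoryCloudStore, text: str, from_speech: bool) -> str:
    """Store text data one as text, and text data two as restored voice data."""
    if from_speech:
        return store.put(synthesize_speech(text), "audio/wav")
    return store.put(text.encode("utf-8"), "text/plain")


store = InMemoryCloudStore()
text_url = store_message(store, "hello", from_speech=False)
voice_url = store_message(store, "hello", from_speech=True)
```

Only the short URL then needs to travel over the messaging channel, regardless of how large the stored resource is.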

It should be noted that, in the solution of the present invention, step 102 and step 103 need not be executed in any particular order.

And step 104, the DUI server sends the first address information to all the second devices according to the acquired device information of all the second devices.

The DUI server may transmit the first address information to the corresponding devices according to the device information. Here, the DUI server may send the first address information to all the second devices, each of which presents it to the second user. The second user may select any second device to read or listen to the text or voice from the first user; specifically, the resource stored on the third-party cloud server is accessed through the first address information.

Through the process, the unidirectional interaction from the equipment of the first user to the equipment of the second user is realized.

Note that in this step the DUI server may also send the first address information directly to the associated DUI client of the second user according to the user ID of the second user. The DUI server does not need to obtain the user's client information from the DUI platform.
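The delivery in step 104, including the optional copy to the DUI client, can be sketched as a simple fan-out. The transport is abstracted behind a send callback; the target naming scheme ("device/...", "client/...") and the example URL are hypothetical.

```python
from typing import Callable, Dict, List, Optional


def fan_out(address: str,
            device_ids: List[str],
            send: Callable[[str, str], None],
            client_user_id: Optional[str] = None) -> List[str]:
    """Send address information to every device and, optionally, to the
    user's DUI client. Returns the list of targets that were addressed."""
    targets = [f"device/{device_id}" for device_id in device_ids]
    if client_user_id is not None:
        # The client is addressed by user ID, so no device lookup is needed.
        targets.append(f"client/{client_user_id}")
    for target in targets:
        send(target, address)
    return targets


outbox: Dict[str, str] = {}


def record(target: str, address: str) -> None:
    """Stand-in transport that records what would have been sent."""
    outbox[target] = address


sent_to = fan_out("https://storage.example.com/messages/abc",
                  ["device-3", "device-4", "device-5"],
                  record,
                  client_user_id="user-2")
```

In a deployment based on MQTT, the send callback would presumably publish to a per-device topic, but that mapping is an assumption and is not described here.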

The present invention can also implement bidirectional interaction between a device of the first user and a device of the second user. As shown in fig. 2, the voice interaction method between devices further includes:

Step 105, the DUI server receives a second request message sent by the selected second device, wherein the second request message carries text data replied by the second user to the first user.

As mentioned above, the second user may select any second device to read or listen to the text or voice from the first user, and may then choose to reply to the first user. The reply may be a point-to-point reply to the first user or a reply to the first user in a group chat (in which case the other users in the group chat may also receive the second user's reply).

Of course, the second user may also read or listen to text or voice from the first user through the DUI client, in which case the second user may reply by sending a second request message through the DUI client.

Step 106, the DUI server stores the replied text data in a third-party cloud server and generates corresponding second address information.

Here, the text data is again classified into two types: first, text sent directly by the second user to the first user, referred to as text data one; second, text obtained by converting voice data sent by the second user to the first user, referred to as text data two.

After the text data of the interaction is stored in the third-party cloud server, address information can be generated, which for convenience of description is referred to as second address information.

Step 107, the DUI server sends the second address information to all the first devices according to all the acquired device information of the first devices.

The DUI server may obtain the device information of the first devices at the same time as it obtains the second device list in step 102; that is, upon receiving the first request message, the DUI server obtains the device lists of both the first user and the second user. In the case of a group chat, the DUI server in effect obtains the device lists of all members of the group chat in step 102.

Thus, when the second user replies with speech or text, the DUI server no longer needs to request the device list from the DUI platform and can read the required device list from the local cache. In the point-to-point case, only the device list of the first user needs to be read; in the group chat case, the device lists of all members other than the second user (including the first user) are read. Because the interaction between the second user and each other member of the group chat is the same as that between the second user and the first user, only the first user is described here as an example.

The DUI server may transmit the second address information to the corresponding devices according to the device information of the first user. Here, the DUI server may send the second address information to all the first devices of the first user, each of which presents it to the first user. The first user may select any first device to read or listen to the text or voice from the second user; specifically, the resource stored on the third-party cloud server is accessed through the second address information.
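The choice of which cached device lists to read when a reply arrives can be sketched as below. The cache is shown as a plain dictionary, and the user and device identifiers are hypothetical.

```python
from typing import Dict, List, Optional

# Device lists cached when the first request message was handled (hypothetical).
CACHED_DEVICE_LISTS: Dict[str, List[str]] = {
    "user-1": ["device-1", "device-2"],
    "user-2": ["device-3", "device-4", "device-5"],
    "user-3": ["device-6"],
}


def reply_targets(replier_id: str,
                  original_sender_id: str,
                  cache: Dict[str, List[str]],
                  group_members: Optional[List[str]] = None) -> Dict[str, List[str]]:
    """Return the cached device lists a reply should be delivered to."""
    if group_members is None:
        # Point-to-point: only the original sender's devices are needed.
        recipients = [original_sender_id]
    else:
        # Group chat: every member except the replier, sender included.
        recipients = [uid for uid in group_members if uid != replier_id]
    return {uid: cache[uid] for uid in recipients}


p2p = reply_targets("user-2", "user-1", CACHED_DEVICE_LISTS)
group = reply_targets("user-2", "user-1", CACHED_DEVICE_LISTS,
                      group_members=["user-1", "user-2", "user-3"])
```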

Of course, the DUI server may also send the second address information directly to the associated DUI client of the first user according to the user ID of the first user.

The one-way interaction from the second user to the first user is completed through steps 105 to 107.

The two-way interaction between the first user and the second user is completed through steps 101 to 107.

In order to implement the method, the present invention further provides an apparatus 30 for voice interaction between devices, which is applied to a DUI server, as shown in fig. 3, and includes:

the interaction module 31 is configured to receive a first request message of the selected first device, where the first request message carries text data sent by the first user to the second user;

the information processing module 32 is configured to obtain a second device list of the second user from the DUI platform, where the device information of all the second devices is recorded in the second device list;

the resource processing module 33 is used for storing the text data in a third-party cloud server and generating corresponding first address information;

the interaction module 31 is further configured to send the first address information to all the second devices according to the acquired device information of all the second devices.

The first request message also carries a user ID of the second user, and the DUI server acquires the second device list from the DUI platform according to the user ID of the second user.

The request message also carries a user ID of the first user;

the information processing module 32 is further configured to obtain a first device list of the first user from the DUI platform according to the user ID of the first user, where device information of all first devices is recorded in the first device list.

The interaction module 31 is further configured to receive a second request message sent by the selected second device, where the second request message carries text data replied by the second user to the first user;

the resource processing module 33 is further configured to store the replied text data in a third-party cloud server, and generate corresponding second address information;

the interaction module 31 is further configured to send the second address information to all the first devices according to the acquired device information of all the first devices.

The text data comprises first text data interacted between the first user and the second user and/or second text data converted from voice data interacted between the first user and the second user;

the resource processing module 33 is further configured to store the first text data in the third-party cloud server, and/or restore the second text data into voice data and store the voice data in the third-party cloud server.

The interaction module 31 is further configured to send the first address information to a DUI client of the second user associated with the user ID of the second user according to the user ID of the second user;

and the interaction module 31 is further configured to send the second address information to the DUI client of the first user associated with the user ID of the first user according to the user ID of the first user.
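The division of work among the three modules can be sketched as three small classes. This is an illustrative arrangement under stated assumptions, not the apparatus itself: the class and method names, the in-memory storage, and the example address format are all hypothetical.

```python
from typing import Callable, Dict, List


class InformationProcessingModule:
    """Looks up a user's device list (the platform is a plain dict here)."""

    def __init__(self, platform_lists: Dict[str, List[str]]):
        self._platform_lists = platform_lists

    def device_list(self, user_id: str) -> List[str]:
        return list(self._platform_lists.get(user_id, []))


class ResourceProcessingModule:
    """Stores text data and returns address information for it."""

    def __init__(self) -> None:
        self._stored: Dict[str, str] = {}

    def store(self, text: str) -> str:
        address = f"https://storage.example.com/msg-{len(self._stored) + 1}"
        self._stored[address] = text
        return address


class InteractionModule:
    """Receives request messages and sends address information to devices."""

    def __init__(self, info: InformationProcessingModule,
                 resources: ResourceProcessingModule,
                 send: Callable[[str, str], None]):
        self._info = info
        self._resources = resources
        self._send = send

    def handle_request(self, recipient_id: str, text: str) -> str:
        devices = self._info.device_list(recipient_id)
        address = self._resources.store(text)
        for device in devices:
            self._send(device, address)
        return address


sent: List[tuple] = []
module = InteractionModule(
    InformationProcessingModule({"user-2": ["device-3", "device-4"]}),
    ResourceProcessingModule(),
    lambda device, address: sent.append((device, address)),
)
address = module.handle_request("user-2", "hello")
```

Handling a reply would reuse the same handle_request call with the roles of the two users swapped.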

The above-mentioned solution of the present invention is further explained by a specific scenario, as shown in fig. 4, it is assumed that a user 1 and a user 2 perform voice interaction, where the user 1 has 2 devices, respectively device 1 and device 2, and the user 2 has 3 devices, respectively devices 3, 4, and 5.

Then the voice interaction process is:

First, user 1 binds his or her own devices 1 and 2 on the DUI platform through DUI client 1, and user 2 binds his or her own devices 3, 4 and 5 on the DUI platform through DUI client 2. The DUI platform voice-enables devices 1-5 and maintains a device list for user 1 (containing device information for devices 1, 2) and a device list for user 2 (containing device information for devices 3, 4, 5).

1. User 1 initiates voice to user 2 through device 1. After collecting the voice, device 1 converts it into text, encapsulates the text data in request message 1, and sends the request message to the DUI server.

After the DUI server receives request message 1:

2. the DUI server can determine the ID of the user 1 and the ID of the user 2 according to the request message 1, and the DUI server requests the device list of the user 1 and the device list of the user 2 from the DUI platform according to the user IDs and caches the device lists locally.

3. The DUI server converts the text data in the request message 1 into voice data, stores the voice data in a third-party cloud server and generates address information 1.

4. The DUI server reads the device list of the user 2 from the local cache, and sends the address information 1 to the devices 3, 4 and 5 respectively according to the device information of the devices 3, 4 and 5 recorded in the device list, and simultaneously can also send the address information 1 to the DUI client 2 according to the ID of the user 2.

5. User 2 chooses to listen to the voice of user 1 via device 5 (that is, device 5 uses the address information it received to access the voice of user 1 stored on the third-party cloud server) and replies. Device 5 converts the speech replied by user 2 into text, encapsulates the text data in request message 2 and sends it to the DUI server.

6. The DUI server converts the text data in the request message 2 into voice data, stores the voice data in a third-party cloud server and generates address information 2.

7. The DUI server reads the device list of the user 1 from the local cache, respectively sends the address information 2 to the devices 1 and 2 according to the recorded device information of the devices 1 and 2, and simultaneously can send the address information 2 to the DUI client 1 according to the ID of the user 1, so that the user 1 can optionally listen to the voice replied by the user 2 from one of the DUI client 1, the device 1 and the device 2.
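The scenario of fig. 4 can be traced end to end with a small simulation. Everything below is a stand-in chosen for illustration: the cloud server and device inboxes are dictionaries, speech synthesis is a byte-string placeholder, and the addresses and message texts are made up.

```python
from typing import Dict, List, Tuple

# Device lists as maintained for user 1 and user 2 in the scenario.
DEVICE_LISTS: Dict[str, List[str]] = {
    "user-1": ["device-1", "device-2"],
    "user-2": ["device-3", "device-4", "device-5"],
}

cloud: Dict[str, Tuple[str, bytes]] = {}   # address -> (sender, voice data)
inbox: Dict[str, List[str]] = {}           # target -> received addresses


def handle(sender: str, recipient: str, text: str) -> str:
    """Restore text to (placeholder) voice, store it, and deliver its address."""
    voice = b"TTS:" + text.encode("utf-8")  # placeholder for speech synthesis
    address = f"https://storage.example.com/voice/{len(cloud) + 1}"
    cloud[address] = (sender, voice)
    # Deliver to every device of the recipient and to the recipient's client.
    for target in DEVICE_LISTS[recipient] + [f"client-of-{recipient}"]:
        inbox.setdefault(target, []).append(address)
    return address


address_1 = handle("user-1", "user-2", "Are you home yet?")   # items 1 to 4
address_2 = handle("user-2", "user-1", "Ten more minutes.")   # items 5 to 7

print(inbox)
```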

In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined by one skilled in the art without contradiction.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims

1. A method of voice interaction between devices, wherein all first devices of a first user and all second devices of a second user are voice-enabled and device information of all first devices and device information of all second devices is maintained through a conversational user interface, DUI, platform, the method comprising:

2. The method of claim 1, wherein the first request message further carries a user ID of the second user, and the DUI server obtains the second device list from the DUI platform according to the user ID of the second user.

3. The method of claim 1, wherein the request message further carries a user ID of the first user, and the method further comprises:

4. The method of claim 3, further comprising:

5. The method according to claim 1 or 4, wherein the text data comprises first text data interacted by a first user and a second user and/or second text data converted from voice data interacted by the first user and the second user;

the DUI server storing the text data in a third party cloud server, comprising:

6. The method of claim 1 or 4, further comprising:

7. An apparatus for voice interaction between devices, wherein all first devices of a first user and all second devices of a second user are voice-enabled through a DUI platform, and device information of all first devices and device information of all second devices are maintained, the apparatus is applied to a DUI server, and comprises:

8. The apparatus of claim 7, wherein the first request message further carries a user ID of the second user, and wherein the DUI server obtains the second device list from the DUI platform according to the user ID of the second user.

9. The apparatus according to claim 7, wherein the request message further carries a user ID of the first user;

10. The apparatus of claim 9,

11. The apparatus according to claim 7 or 10, wherein the text data comprises first text data interacted with by a first user and a second user and/or second text data converted from voice data interacted with by the first user and the second user;

12. The apparatus according to claim 7 or 10,