CN113050910B - Voice interaction method, device, equipment and storage medium - Google Patents


Info

Publication number
CN113050910B
CN113050910B — granted publication of application CN201911370301.2A
Authority
CN
China
Prior art keywords
audio
playing
event
play
application
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911370301.2A
Other languages
Chinese (zh)
Other versions
CN113050910A (en)
Inventor
华润策
郑思源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201911370301.2A
Publication of CN113050910A
Application granted
Publication of CN113050910B
Legal status: Active


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 — Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 — Sound input; sound output
    • G06F 3/165 — Management of the audio stream, e.g. setting of volume, audio stream path
    • G06F 3/167 — Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G06F 9/00 — Arrangements for program control, e.g. control units
    • G06F 9/06 — Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 — Multiprogramming arrangements
    • G06F 9/48 — Program initiating; program switching, e.g. by interrupt
    • G06F 9/4806 — Task transfer initiation or dispatching
    • G06F 9/4843 — Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 — Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F 2209/00 — Indexing scheme relating to G06F9/00
    • G06F 2209/50 — Indexing scheme relating to G06F9/50
    • G06F 2209/5021 — Priority

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Telephone Function (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A voice interaction method, apparatus, device and storage medium are disclosed. An audio play priority is set for each of a plurality of service scenarios involved in an application on a smart device; a first audio play event to be processed in the application is acquired; and whether to play the audio data corresponding to the first audio play event is determined according to the first audio play priority of the service scenario targeted by the first audio play event. In this way, audio play conflicts existing in the application can be resolved.

Description

Voice interaction method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of voice interactions, and in particular to voice interactions related to applications on smart devices.
Background
With advances in computer technology, speech processing capabilities have improved markedly. An application program (APP) running on a device can receive a user's voice input and feed back audio output to the user, enabling voice interaction that brings considerable convenience to people's life and work.
However, in the process of voice interaction through APP, there may be a plurality of service scenarios that require audio data output using an audio output device such as a speaker.
For example, for an APP that integrates multiple functions such as music playing, video playing, navigation, talking, etc., there may be multiple audio playing events in the APP that require audio data to be output.
As another example, there may be multiple APPs on the device that are capable of playing audio, so there may be situations where multiple APPs need to output audio data at the same time.
When a plurality of audio play events need to output audio data at the same time, how to control their audio output so as to better serve the user is a technical problem that currently needs to be solved.
Disclosure of Invention
An object of the present disclosure is to provide a voice interaction scheme that resolves audio play conflicts that may arise within an application or between different applications.
According to a first aspect of the present disclosure, a voice interaction method is provided, including: setting audio playing priority corresponding to each service scene in a plurality of service scenes related to application on the intelligent equipment; acquiring a first audio playing event to be processed in an application; and determining whether to play the audio data corresponding to the first audio play event according to the first audio play priority of the service scene aimed at by the first audio play event.
Optionally, the step of determining whether to play the audio data corresponding to the first audio play event according to the first audio play priority of the service scene targeted by the first audio play event includes: comparing the first audio playing priority with a second audio playing priority of a service scene aimed at by a second audio playing event of playing audio data in the application; and playing the audio data corresponding to the first audio playing event under the condition that the first audio playing priority is higher than or equal to the second audio playing priority.
Optionally, when the first audio play priority is higher than the second audio play priority, the playing of the audio data corresponding to the second audio play event is stopped and the audio data corresponding to the first audio play event is played; and/or, when the first audio play priority is equal to the second audio play priority, the audio data corresponding to the second audio play event continues to be played while the audio data corresponding to the first audio play event is played; and/or, when the first audio play priority is higher than or equal to the second audio play priority, the audio data corresponding to the second audio play event is played with a first parameter and the audio data corresponding to the first audio play event is played with a second parameter.
Optionally, the method further comprises: and under the condition that the audio data corresponding to the first audio playing event is not played, storing the first audio playing event.
Optionally, the method further comprises: determining whether the first audio play event is an event that needs to be executed in real time; and performing the operation of storing the first audio play event if it is determined that the first audio play event does not need to be executed in real time.
Optionally, the method further comprises: and under the condition that the current audio playing state is not played, selecting one or more first audio playing events from the plurality of first audio playing events stored before, and playing audio data corresponding to the selected first audio playing events.
Optionally, the step of selecting one or more first audio play events from the plurality of first audio play events stored previously includes: selecting one or more first audio play events from a plurality of first audio play events stored before according to the audio play priority; or selecting one or more first audio play events from the plurality of first audio play events stored before according to the sequence when the first audio play events are stored.
Optionally, the application includes a plurality of application interfaces, each corresponding to a predetermined service scenario and used for playing audio data in its corresponding service scenario in response to user operations. The step of setting the audio play priority corresponding to each of the plurality of service scenarios related to the application on the smart device then includes: setting an audio play priority for an application interface according to the service scenario corresponding to that interface.
Optionally, the step of obtaining the first audio playing event to be processed in the application includes: and detecting the application interface to acquire a first audio playing event in the application interface.
Optionally, the application interface is an H5 page, and/or the service scenario includes at least one of the following: a call scenario, a navigation scenario, a music scenario, and a video scenario.
Optionally, the method further comprises: establishing a connection between an application and a first device, the first device having audio input and/or output capabilities, or the first device being connectable to a second device having audio input and/or output capabilities; the application receives input audio data from the first device and/or transmits output audio data to the first device via the connection; the application realizes voice interaction based on the input audio data to obtain output audio data.
Optionally, the step of implementing voice interaction includes: the application communicates with the server, the method further comprising: the application receives the data content from the server and generates an audio play event broadcasting the data content.
Optionally, the method further comprises: in the event that it is determined to play audio data corresponding to the first audio play event, the application transmits output audio data corresponding to the first audio play event to the first device over the connection.
Optionally, the connection between the application and the first device is established based on a proprietary protocol, and input audio data is received from the first device and/or output audio data is sent to the first device based on the proprietary protocol.
Optionally, in response to receiving the audio play event to be played during the process of playing the audio play event, the playing speed of the audio play event currently played is increased.
Optionally, the step of setting the audio playing priority corresponding to each of the plurality of service scenarios related to the application on the intelligent device includes: and acquiring the audio playing priority set by the user for each service scene in a plurality of service scenes related to the application on the intelligent equipment.
Optionally, the method further comprises: receiving an audio playing event sent by second equipment; and setting audio playing priority for the audio playing event from the second device according to the type of the second device.
According to a second aspect of the present disclosure, there is also provided a voice interaction method, including: setting an audio playing priority corresponding to a service scene; acquiring a first audio playing event to be processed; and determining whether to play the audio data corresponding to the first audio play event according to the first audio play priority of the service scene aimed at by the first audio play event.
According to a third aspect of the present disclosure, there is also provided a voice interaction device, including: the setting module is used for setting audio playing priority corresponding to each service scene in a plurality of service scenes related to the application on the intelligent equipment; the acquisition module is used for acquiring a first audio playing event to be processed in the application; and the determining module is used for determining whether to play the audio data corresponding to the first audio playing event according to the first audio playing priority of the service scene aimed at by the first audio playing event.
According to a fourth aspect of the present disclosure, there is also provided a voice interaction device, including: the setting module is used for setting the audio playing priority corresponding to the service scene; the acquisition module is used for acquiring a first audio playing event to be processed; and the determining module is used for determining whether to play the audio data corresponding to the first audio playing event according to the first audio playing priority of the service scene aimed at by the first audio playing event.
According to a fifth aspect of the present disclosure, there is also presented a computing device comprising: a processor; and a memory having executable code stored thereon which, when executed by the processor, causes the processor to perform the method as described in the first or second aspect of the present disclosure.
According to a sixth aspect of the present disclosure, there is also provided a non-transitory machine-readable storage medium having stored thereon executable code which, when executed by a processor of an electronic device, causes the processor to perform a method as set forth in the first or second aspect of the present disclosure.
In the present disclosure, audio play priorities can be set based on service scenarios. When a plurality of audio play events exist at the same time, the audio play logic of those events can be determined according to the audio play priorities of the service scenarios they target, thereby resolving audio play conflicts that may exist within an application or between different applications.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following more particular descriptions of exemplary embodiments of the disclosure as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts throughout exemplary embodiments of the disclosure.
Fig. 1 shows a schematic flow chart of a voice interaction method according to an embodiment of the present disclosure.
Fig. 2 shows a schematic diagram of a voice interaction system according to one embodiment of the present disclosure.
Fig. 3 shows a schematic flow chart of the present disclosure in an in-vehicle scenario.
Fig. 4 shows a schematic flow chart of a voice interaction method according to another embodiment of the present disclosure.
Fig. 5 shows a schematic structural diagram of a voice interaction device according to an embodiment of the present disclosure.
Fig. 6 shows a schematic structural diagram of a computing device according to an embodiment of the present disclosure.
Detailed Description
Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
If, when a plurality of audio play events that need to output audio data exist at the same time, the audio data are played directly without any control, multiple audio streams will be mixed together; if, instead, all other audio playback is indiscriminately stopped whenever a new audio play event arrives, the user experience suffers.
Therefore, the present disclosure proposes that an audio playing priority may be set based on a service scenario, and in a case where a plurality of audio playing events exist at the same time, audio playing logic of the plurality of audio playing events may be determined according to the audio playing priority of the service scenario for which the plurality of audio playing events are aimed, so as to solve audio playing conflicts that may exist in an application or between different applications.
Details relating to the present disclosure are further described below.
[Resolving Audio Play Conflicts within an Application]
Fig. 1 shows a schematic flow chart of a voice interaction method according to an embodiment of the present disclosure. The method shown in fig. 1 may be performed based on an application on the smart device.
In the present embodiment, an application refers to an application program capable of supporting a plurality of audio playback functions corresponding to different service scenarios.
For example, the application may support voice interaction, through which it can provide the user with a variety of audio playback functions such as information broadcasting, audio/video playback, and voice calls.
For another example, the application may also support manual operation; e.g., the application may include one or more application interfaces and, in response to the user touching a particular interface element, provide the user with audio playback functions such as information broadcasting, audio/video playback, and voice calls.
Referring to fig. 1, in step S110, audio playback priorities corresponding to respective service scenes among a plurality of service scenes to which an application on an intelligent device relates are set.
A service scenario is also referred to as a service type.
The plurality of service scenarios involved in the application may include, but is not limited to, any combination of scenarios such as a call scenario, a navigation scenario, an audio scenario, a video scenario, and a voice interaction scenario. The audio scenario can be further subdivided into service scenarios such as music, radio, and audiobooks.
For a plurality of service scenes related to the application, the audio playing priority corresponding to the service scenes can be set according to the importance degree of the service scenes. For example, a higher audio playing priority may be set for more important service scenes (e.g., navigation scenes, call scenes), and a lower audio playing priority may be set for service scenes with relatively lower importance (e.g., music scenes, video scenes).
The audio playback priority may also be set by the user. For example, the user can set corresponding priority for the service scene related to the application according to own habit and preference. The audio playing priority set by the user for each service scene in a plurality of service scenes related to the application on the intelligent equipment can be obtained.
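As an illustrative sketch of the priority setup described above (the scenario names and numeric levels below are assumptions for illustration, not values from the disclosure), a per-scenario priority table with user overrides might look like:

```python
# Hypothetical per-service-scenario audio play priorities.
# Higher number = higher priority; the concrete scenario names and
# values are illustrative assumptions, not part of the disclosure.
DEFAULT_PRIORITIES = {
    "call": 3,
    "navigation": 3,
    "voice_interaction": 2,
    "music": 1,
    "video": 1,
}

def effective_priorities(user_overrides=None):
    """Merge user-configured priorities over the defaults, so a user
    can raise or lower the priority of any service scenario."""
    merged = dict(DEFAULT_PRIORITIES)
    if user_overrides:
        merged.update(user_overrides)
    return merged
```

A user who prefers music over navigation prompts could, for instance, pass `{"music": 4}` as the override.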
In step S120, a first audio play event to be processed in the application is acquired.
The terms "first," "second," and the like, as used in this disclosure, are used solely for distinguishing between descriptions and not for limiting chronological order, primary and secondary levels, importance, etc.
The first audio play event refers to an audio play event generated in the application that needs to play audio data. It may include, but is not limited to, a music play event, a navigation information play event, a video play event, an audiobook play event, a TTS (Text To Speech) voice play event fed back to the user during voice interaction, and so forth.
The application may be detected to obtain a first audio play event generated in the application to be processed (i.e., to be executed).
As an example, the application may include a plurality of application interfaces, each corresponding to a predetermined service scenario, and each application interface may play audio data in its corresponding service scenario in response to user operations (which may be, but are not limited to, manual operations or voice instructions). The application interface may be an H5 page, i.e., a page built on the 5th-generation HTML standard specification (HTML5). The application interface may be detected to obtain a first audio play event in the application interface.
In step S130, it is determined whether to play the audio data corresponding to the first audio play event according to the first audio play priority of the service scene targeted by the first audio play event.
After the first audio playing event is acquired, it may be determined whether a second audio playing event is present in the application that is playing audio data.
In the case that the second audio playing event exists in the application, the playing speed of the second audio playing event can be increased so as to reduce the playing delay of the new first audio playing event. That is, in response to receiving the audio play event to be played during the process of playing the audio play event, the playing speed of the audio play event currently played can be increased.
In the case that the second audio playing event exists in the application, the first audio playing priority may also be compared with the second audio playing priority of the service scene for which the second audio playing event is aimed. And determining whether to play the audio data corresponding to the first audio play event according to the comparison result.
As an example, whether to play audio data corresponding to the first audio play event may be determined based on, but not limited to, the following audio play logic.
Audio playing logic
1. The first audio playing priority is higher than the second audio playing priority
The playing of the audio data corresponding to the second audio playing event can be stopped, and the audio data corresponding to the first audio playing event can be played.
The audio data corresponding to the second audio playing event can be played by the first parameter, and the audio data corresponding to the first audio playing event can be played by the second parameter. The first parameter and the second parameter refer to parameters related to sound, such as volume, wherein the first parameter may be lower volume, and the second parameter may be higher volume.
That is, in the case where the first audio playing priority is higher than the second audio playing priority, the audio data corresponding to the second audio playing event may be played at a lower volume, and the audio data corresponding to the first audio playing event may be played at a relatively higher volume.
2. The first audio playing priority is lower than the second audio playing priority
The audio data corresponding to the first audio play event may not be played, or the audio data corresponding to the first audio play event may be played with a reduced volume.
3. The first audio playing priority is equal to the second audio playing priority
The audio data corresponding to the second audio playing event can be continuously played, and the audio data corresponding to the first audio playing event can be played.
Optionally, the audio data corresponding to the second audio playing event may be played with the first parameter, and the audio data corresponding to the first audio playing event may be played with the second parameter. The first parameter and the second parameter refer to parameters related to sound, such as volume, wherein the first parameter may be lower volume, and the second parameter may be higher volume.
That is, in the case where the first audio playing priority is equal to the second audio playing priority, the audio data corresponding to the second audio playing event may be played at a lower volume, and the audio data corresponding to the first audio playing event may be played at a relatively higher volume.
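The three-way priority comparison above can be sketched as a small arbitration function. This is a minimal illustration of the described logic; the action names ("stop", "duck", "play", "defer") and the `PlayEvent` type are hypothetical stand-ins, not terms from the disclosure.

```python
from dataclasses import dataclass

@dataclass
class PlayEvent:
    scenario: str   # service scenario the event targets
    audio: str      # placeholder for the audio payload

def arbitrate(new_event, current_event, priority_of):
    """Decide playback actions when `new_event` arrives while
    `current_event` is playing. Returns a list of (action, event)
    tuples reflecting the three cases described above."""
    new_p = priority_of[new_event.scenario]
    cur_p = priority_of[current_event.scenario]
    if new_p > cur_p:
        # Case 1: stop (or, alternatively, duck) the current event
        # and play the new, higher-priority event.
        return [("stop", current_event), ("play", new_event)]
    if new_p == cur_p:
        # Case 3: keep the current event at a lower volume and mix
        # in the new event at a relatively higher volume.
        return [("duck", current_event), ("play", new_event)]
    # Case 2: lower priority — do not play now; defer for later.
    return [("defer", new_event)]
```

A navigation prompt arriving during music playback would thus stop (or duck) the music, while a music event arriving during navigation would be deferred.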
If the audio data corresponding to the first audio play event is not played, it can be judged whether the first audio play event is an event that needs to be executed in real time; if it is not, the first audio play event can be stored.
Under the condition that the current audio playing state of the intelligent device or the application is not played, one or more first audio playing events can be selected from a plurality of first audio playing events stored before, and audio data corresponding to the selected first audio playing events can be played.
As an example, one or more first audio play events may be selected from a plurality of first audio play events stored previously according to an audio play priority, or may be selected from a plurality of first audio play events stored previously according to a sequence in which the first audio play events are stored.
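The two selection strategies just described (by priority, or by storage order) can be sketched with a small deferred-event store. The class below is an illustrative sketch, not an implementation from the disclosure; it assumes higher numbers mean higher priority.

```python
import heapq
from itertools import count

class DeferredEventQueue:
    """Stores first audio play events that were not played immediately.
    Events can later be taken either by priority or in FIFO order,
    mirroring the two selection strategies described above."""

    def __init__(self):
        self._heap = []       # entries: (negated priority, seq, event)
        self._seq = count()   # insertion order, also breaks priority ties

    def store(self, event, priority):
        heapq.heappush(self._heap, (-priority, next(self._seq), event))

    def pop_by_priority(self):
        """Take the highest-priority stored event (FIFO among ties)."""
        return heapq.heappop(self._heap)[2]

    def pop_fifo(self):
        """Take the earliest-stored event regardless of priority."""
        entry = min(self._heap, key=lambda e: e[1])
        self._heap.remove(entry)
        heapq.heapify(self._heap)
        return entry[2]
```

When the current audio play state becomes idle, the application could drain this queue with whichever policy is configured.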
Under the condition that the second audio playing event does not exist in the application, namely under the condition that the application does not play the audio data currently, the audio data corresponding to the first audio playing event can be directly played.
It should be noted that, in a case where a plurality of first audio play events exist at the same time, for example, in a case where a plurality of first audio play events to be processed are generated in an application, in addition to comparing priorities of the first audio play events with priorities of the second audio play events, priorities of the plurality of first audio play events may be compared, and then play logic of the plurality of first audio play events may be determined based on a result of the comparison.
As an example, the first audio play priorities of the service scenarios targeted by the plurality of first audio play events may be compared, and the first audio play event with the highest priority selected. If several first audio play events share the highest priority, they may be played sequentially or simultaneously. The first audio play event with the highest priority is then further selected from the remaining unexecuted first audio play events, and so on, until the audio data corresponding to all first audio play events has been played.
In summary, the present disclosure sets an audio playing priority corresponding to a service scenario, compares priorities of service scenarios targeted by a plurality of audio playing events when the plurality of audio playing events exist in an application, and determines playing logic of the plurality of audio playing events based on a comparison result, thereby solving audio playing conflicts that may exist in the application.
The present disclosure may also receive audio play events sent by other devices (which may be referred to as second devices for ease of distinction) than the first device, and set audio play priorities for the audio play events of the second device according to the type of the second device. Wherein the second device may be, but is not limited to, a local device, a remote device. For example, the second device may be an internet of things device that is in the same home as the first device, and the second device may include, but is not limited to, a home internet of things device such as a smart door, a smart light bulb, a smart door lock, and the like. The audio playback priority may be set for the second device by a user of the first device. After receiving the audio play event of the second device, the playing logic of the audio play event may refer to the description of the first audio play event above, which is not repeated herein.
Fig. 2 shows a schematic diagram of a voice interaction system according to one embodiment of the present disclosure.
As shown in fig. 2, a connection may be established between an APP on the smart device 120 and the first device 110.
The smart device 120 may be any suitable terminal device, preferably a portable mobile device, such as a smart phone, tablet computer, etc.
The APP referred to herein may refer to an application program capable of supporting a plurality of audio playback functions corresponding to different service scenarios as described above.
The first device 110 has audio input and/or output capabilities or the first device 110 can be connected to a second device 140 having audio input and/or output capabilities.
The first device 110 may have audio input and/or output capabilities that enable audio collection and/or broadcasting of audio data. Alternatively, as shown with reference to fig. 2, the first device 110 can be connected to, for example, a second device 140 having audio input and/or output functionality, and the audio input and/or output functionality described above can be implemented via the second device 140.
The application may receive input audio data from the first device 110 and/or send output audio data to the first device 110 via the connection.
The application can realize voice interaction based on the input audio data to obtain output audio data.
Referring to fig. 2, an application may communicate with a server 130 to implement voice interactions. The application may receive the data content from the server 130 and generate an audio play event for broadcasting the data content, that is, the first audio play event described above.
The scheme shown in fig. 1 may be executed by an application to determine whether to play audio data corresponding to the first audio play event.
In the case where it is determined to play audio data corresponding to the first audio play event based on the scheme shown in fig. 1, the application may transmit output audio data corresponding to the first audio play event to the first device through the connection.
Preferably, the connection between the application and the first device may be established based on a proprietary protocol, with input audio data received from the first device and/or output audio data sent to the first device over that protocol. In this way, while the application carries out voice interaction, interference from other applications on the smart device 120 can be avoided.
Application example
Fig. 3 shows a schematic flow chart of the present disclosure in an in-vehicle scenario.
In this embodiment, the voice interaction scheme of the present disclosure may be executed by a vehicle-mounted terminal. The vehicle-mounted terminal can be, but is not limited to, a vehicle-mounted device such as a vehicle-mounted central control panel or a vehicle-mounted player. The vehicle-mounted terminal can also be a mobile phone APP connected to the vehicle-mounted device; the mobile phone APP can perform voice interaction through the vehicle-mounted device, act as a relay for NLP instructions from the cloud, and provide the behavior logic that controls instruction execution.
The vehicle-mounted terminal can comprise a plurality of H5 pages, different H5 pages can correspond to different service scenes, and a user can play audio/video by operating the H5 pages.
The audio playing priority of an H5 page in the APP can be registered through the JS Bridge, and the play management module can detect the audio playing events of the H5 page. For example, the onPlay and onEnd events of the Video and Audio tags of the H5 page can be detected through the JS Bridge, so that the playing state of the H5 page is known.
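As a sketch, the H5-side registration and event hooks described above might look like the following TypeScript. The bridge channel names (`registerAudioPriority`, `onPlay`, `onEnd`) and function names are illustrative assumptions, not part of the patent:

```typescript
type Priority = "mix" | "focus";

// Minimal JS Bridge abstraction: forwards a named message to the native side.
interface JsBridge {
  call(channel: string, payload: Record<string, unknown>): void;
}

// Minimal stand-in for an H5 <video>/<audio> element.
interface MediaElementLike {
  addEventListener(event: "play" | "ended", handler: () => void): void;
}

// Register the page's audio playing priority with the play management module.
function registerAudioPriority(bridge: JsBridge, pageId: string, priority: Priority): void {
  bridge.call("registerAudioPriority", { pageId, priority });
}

// Forward the onPlay/onEnd events of the page's media elements to the native
// side, so the play management module can track the H5 page's playing state.
function watchMedia(bridge: JsBridge, pageId: string, elements: MediaElementLike[]): void {
  for (const el of elements) {
    el.addEventListener("play", () => bridge.call("onPlay", { pageId }));
    el.addEventListener("ended", () => bridge.call("onEnd", { pageId }));
  }
}
```

In a real H5 page the elements would be obtained from the DOM (e.g. `document.querySelectorAll("video, audio")`); the abstract interface above just keeps the sketch self-contained.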
As shown in fig. 3, a play management module and a built-in voice interaction module may also be provided in the vehicle-mounted terminal. The built-in voice interaction module may include an audio/video player and a TTS voice player. In this embodiment, an audio play event may include a TTS voice stream playing instruction or another music playing instruction.
In the voice interaction process, upon receiving a TTS voice stream playing instruction or another music playing instruction, the play management module can judge priorities according to the registered audio playing priorities, and control audio playback or interruption in an H5 page of the vehicle-mounted terminal based on the result.
As an example, the audio playback priority may be divided into two levels, mix and focus, with focus having a higher priority than mix. When registering the audio playing priority, a proper priority class can be determined according to the service scene of the H5 page. For example, a focus may be registered corresponding to an H5 page of a navigation service scene, and only mix may be registered corresponding to an H5 page of a music play scene.
As an example, upon encountering a focus-level audio play event, the play management module may interrupt playback if mix-level audio data is currently playing; if other focus-level audio data is currently playing, the streams may be mixed, and optionally the volume of the other focus-level audio is reduced during playback.
Upon encountering a mix-level audio play event, if focus-level audio data is currently playing, the play management module can ignore the mix-level event, that is, not play its corresponding audio data; if other mix-level audio data is currently playing, the streams may be mixed, and optionally the volume of the other mix-level audio is reduced during playback.
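The mix/focus rules above can be sketched as one small pure function; the function name `decide` and the `Decision` values are illustrative assumptions:

```typescript
type Priority = "mix" | "focus";
type Decision = "play" | "interrupt" | "mixPlay" | "ignore";

// `current` is the priority of the audio currently playing, or null if idle.
function decide(incoming: Priority, current: Priority | null): Decision {
  if (current === null) return "play"; // nothing playing: just play
  if (incoming === "focus") {
    // focus interrupts mix-level audio, mixes with other focus-level audio
    return current === "mix" ? "interrupt" : "mixPlay";
  }
  // mix is ignored while focus-level audio plays, mixes with other mix-level audio
  return current === "focus" ? "ignore" : "mixPlay";
}
```

Volume ducking of the other stream in the `mixPlay` case is an optional refinement the text mentions and is omitted here.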
For example, when the user enters an H5 page during voice interaction and plays video on demand, the registered JS Bridge receives the audio/video play event and notifies the play management module to pause execution of the music playing instruction, avoiding simultaneous multi-path audio playback.
The following is an exemplary description of how TTS playback is implemented in the voice interaction process.
1. The user's voice command is processed in the cloud, which returns a playable TTS audio stream.
2. Before playing the TTS, it is judged whether the vehicle-mounted terminal is playing other media, for example whether music is playing, a message is being broadcast, or the terminal is navigating.
3. If media content is playing at this time, the play management module judges from the registered audio playing priorities whether the media playing service allows interruption. If interruption is allowed, the current media playback is interrupted, the TTS audio stream is played directly, and the media is notified to resume playing afterwards.
4. If the media may not be interrupted, the TTS audio stream cannot be played now. The play management module can decide from the type of the TTS audio stream whether to store it; if it need not be stored, it is discarded without further processing.
5. The play management module monitors the media playback; once the media finishes playing, it plays the TTS audio streams from the waiting pool in first-in-first-out order.
6. All tasks are complete when every TTS stream in the waiting pool has been played.
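Steps 4-6 above can be sketched as a small waiting-pool scheduler. The class name, field names, and the `mustPlayImmediately` flag (standing in for "TTS types that are discarded rather than stored") are illustrative assumptions:

```typescript
interface TtsStream {
  id: string;
  mustPlayImmediately: boolean; // real-time-only TTS is discarded if blocked
}

class TtsScheduler {
  private waitingPool: TtsStream[] = [];

  // Called when a TTS stream arrives while non-interruptible media is playing.
  defer(tts: TtsStream): void {
    if (tts.mustPlayImmediately) return; // discard without processing
    this.waitingPool.push(tts);
  }

  // Called when the play management module detects that media playback ended:
  // drain the waiting pool in first-in-first-out order.
  onMediaEnded(play: (tts: TtsStream) => void): void {
    while (this.waitingPool.length > 0) {
      play(this.waitingPool.shift()!);
    }
  }
}
```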
In this embodiment, a play management module is designed for the multiple service scenarios inside the vehicle-mounted terminal, with emphasis on the scenario of playing audio and video in H5 pages, so that play conflicts between H5-page audio/video playback and the user's voice interaction are handled better.
The implementation principle of the solution to in-application audio playback conflicts has been described in detail above with reference to figs. 1 to 3.
[ solution to Audio playback conflicts between applications ]
Based on the same technical concept, the present disclosure can also resolve audio playback conflicts among different applications.
Fig. 4 shows a schematic flow chart of a voice interaction method according to another embodiment of the present disclosure. Wherein the method shown in fig. 4 may be performed by a client installed on a smart device.
Referring to fig. 4, in step S410, an audio play priority corresponding to a service scene is set.
In this embodiment, an audio playing priority may be set for an application according to a service scenario related to the application on the intelligent device. The application referred to herein refers to an application program capable of providing an audio playback service to a user.
The intelligent device can be provided with a plurality of applications, and different applications can correspond to different service scenes and are used for providing audio playing functions matched with the corresponding service scenes. For example, the plurality of applications installed on the smart device may include, but are not limited to, navigation-type applications, audio playback-type applications, video playback-type applications, and other applications that support voice interaction functionality, among others.
The audio playing priority can be set for the application according to the importance degree of the service scene related to the application. For example, a higher audio playing priority may be set for applications (such as navigation applications and conversation applications) corresponding to more important service scenes, and a lower audio playing priority may be set for applications (such as music playing applications and video playing applications) corresponding to service scenes with relatively lower importance.
The audio play priority may also be set by the user. For example, the user can set corresponding priority for the service scene related to the application according to own habit and preference. The audio playing priority set by the user for each service scene in a plurality of service scenes related to the application on the intelligent equipment can be obtained.
In step S420, a first audio play event to be processed is acquired.
The application may be detected to obtain a first audio play event to be processed.
In step S430, it is determined whether to play the audio data corresponding to the first audio play event according to the first audio play priority of the service scene targeted by the first audio play event.
If the smart device's current audio playing state is idle, the audio data corresponding to the first audio play event can be played directly.
If the smart device's current audio playing state is playing, the playing speed of the currently playing audio can be increased so as to reduce the playing delay of the new first audio play event. That is, in response to receiving an audio play event to be played while another audio play event is playing, the playing speed of the currently playing event can be increased.
If the smart device's current audio playing state is playing, the first audio playing priority can be compared with the third audio playing priority corresponding to the application currently playing audio data, and whether to play the audio data corresponding to the first audio play event is determined based on the comparison result. The playback logic is similar to the in-application logic described above.
If the first audio playing priority is higher than the third audio playing priority, the audio playback of the application currently playing audio data can be stopped and the audio data corresponding to the first audio play event played. Alternatively, the audio data currently playing can be played with a first parameter and the audio data corresponding to the first audio play event with a second parameter. The first and second parameters are sound-related parameters such as volume; the first parameter may be a lower volume and the second a higher one. That is, when the first audio playing priority is higher than the third, the audio data currently playing can be ducked to a lower volume while the audio data corresponding to the first audio play event plays at a relatively higher volume.
If the first audio playing priority is lower than the third audio playing priority, the audio data corresponding to the first audio play event may not be played, or may be played at a reduced volume.
If the first audio playing priority equals the third audio playing priority, the audio data currently playing continues and the audio data corresponding to the first audio play event is also played. Alternatively, the currently playing audio data can be played with the first parameter and the audio data corresponding to the first audio play event with the second parameter, i.e., the current audio at a lower volume and the new audio at a relatively higher volume.
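The three-way comparison above (higher / lower / equal priority) can be sketched as follows; the numeric priorities and the action names are illustrative assumptions:

```typescript
type Action =
  | { kind: "duckAndPlay" }   // new event wins: duck current audio, play the new one
  | { kind: "skipOrDuckNew" } // current wins: skip the new event or play it quietly
  | { kind: "mix" };          // equal: play both, the new one relatively louder

function resolveConflict(newPriority: number, currentPriority: number): Action {
  if (newPriority > currentPriority) return { kind: "duckAndPlay" };
  if (newPriority < currentPriority) return { kind: "skipOrDuckNew" };
  return { kind: "mix" };
}
```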
In case there are a plurality of first audio play events from different applications at the same time, priorities of the plurality of first audio play events may also be compared, and then play logic of the plurality of first audio play events is determined based on the comparison result.
As an example, the first audio playing priorities of the applications corresponding to the plural first audio play events can be compared, and the event or events with the highest priority selected. If several events share the highest priority, they can be played sequentially or simultaneously. The event with the highest priority among the remaining unexecuted events is then selected, and so on, until the audio data of all first audio play events has been played.
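The highest-priority-first selection described above can be sketched as a function that returns the next batch of tied top-priority events; the names are illustrative:

```typescript
interface AudioEvent {
  app: string;     // which application raised the event
  priority: number; // the application's first audio playing priority
}

// Return all pending events tied at the highest priority; the text allows
// playing them sequentially or simultaneously.
function nextBatch(pending: AudioEvent[]): AudioEvent[] {
  if (pending.length === 0) return [];
  const top = Math.max(...pending.map((e) => e.priority));
  return pending.filter((e) => e.priority === top);
}
```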
Under the condition that the audio data corresponding to the first audio playing event is not played, whether the first audio playing event is an event needing to be executed in real time or not can be judged, and under the condition that the first audio playing event is an event needing not to be executed in real time, the first audio playing event can be stored.
Under the condition that the current audio playing state of the intelligent equipment is not played, one or more first audio playing events can be selected from a plurality of first audio playing events stored before, and audio data corresponding to the selected first audio playing events are played.
As an example, one or more first audio play events may be selected from a plurality of first audio play events stored previously according to an audio play priority, or may be selected from a plurality of first audio play events stored previously according to a sequence in which the first audio play events are stored.
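The two selection strategies described above, by audio play priority or by the order in which events were stored, can be sketched as follows; the field names are illustrative assumptions:

```typescript
interface StoredEvent {
  priority: number; // audio play priority of the stored event
  seq: number;      // monotonically increasing sequence number at storage time
}

// Strategy 1: pick the stored event with the highest audio play priority.
function selectByPriority(stored: StoredEvent[]): StoredEvent | undefined {
  return [...stored].sort((a, b) => b.priority - a.priority)[0];
}

// Strategy 2: pick the earliest-stored event (first-in-first-out).
function selectByStoredOrder(stored: StoredEvent[]): StoredEvent | undefined {
  return [...stored].sort((a, b) => a.seq - b.seq)[0];
}
```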
In summary, the present disclosure sets an audio playing priority corresponding to a service scenario, compares priorities of service scenarios for which a plurality of audio playing events are directed in the case that there are a plurality of audio playing events from different applications, and determines playing logic of the plurality of audio playing events based on a comparison result, thereby solving audio playing conflicts that may exist between different applications.
Different applications may be located in the same device or in different devices. Taking the method shown in fig. 4 as an example, the first device may also receive audio play events sent by devices other than the first device (referred to as second devices for convenience of distinction), and set an audio play priority for a second device's audio play events according to the type of the second device. The second device may be, but is not limited to, a local device or a remote device; for example, it may be an internet-of-things device in the same home as the first device. The audio play priority may be set for the second device by a user of the first device. After an audio play event of the second device is received, its playing logic follows the description of the first audio play event above, which is not repeated here.
As an example, the method shown in fig. 4 may be performed by a specific client application installed on the smart device, and the client application may set an audio playing priority for an audio playing class application installed on the smart device, detect the audio playing class application, obtain audio playing events from the applications, and control playing of audio data by comparing the audio playing priorities so as to solve audio playing conflicts that may exist between different applications.
Fig. 5 shows a schematic structural diagram of a voice interaction device according to an embodiment of the present disclosure. Wherein the functional modules of the voice interaction apparatus may be implemented by hardware, software, or a combination of hardware and software that implements the principles of the invention. Those skilled in the art will appreciate that the functional modules depicted in fig. 5 may be combined or divided into sub-modules to implement the principles of the invention described above. Accordingly, the description herein may support any possible combination, or division, or even further definition of the functional modules described herein.
The following is a brief description of the functional modules that the voice interaction device may have and the operations that each functional module may perform, and the details related to these functional modules may be referred to in the foregoing related description, which is not repeated herein.
Referring to fig. 5, the voice interaction apparatus 500 includes a setting module 510, an obtaining module 520, and a determining module 530.
In one embodiment, the setting module 510 is configured to set an audio playing priority corresponding to each of a plurality of service scenarios related to an application on the smart device; the acquiring module 520 is configured to acquire a first audio playing event to be processed in an application; the determining module 530 is configured to determine whether to play the audio data corresponding to the first audio play event according to the first audio play priority of the service scenario targeted by the first audio play event.
The determining module 530 may compare the first audio playing priority with a second audio playing priority of a service scenario for which a second audio playing event of audio data is playing in the application, and play the audio data corresponding to the first audio playing event if the first audio playing priority is higher than or equal to the second audio playing priority.
In the case that the first audio playing priority is higher than the second audio playing priority, the determining module 530 stops playing the audio data corresponding to the second audio playing event, and plays the audio data corresponding to the first audio playing event, and/or
In the case that the first audio playing priority is equal to the second audio playing priority, the determining module 530 continues to play the audio data corresponding to the second audio playing event, and plays the audio data corresponding to the first audio playing event, and/or
In the case that the first audio playing priority is higher than or equal to the second audio playing priority, the determining module 530 plays the audio data corresponding to the second audio playing event with the first parameter, and plays the audio data corresponding to the first audio playing event with the second parameter.
Optionally, the voice interaction device 500 may further include a save module. The saving module is configured to save the first audio play event if the determining module 530 determines that the audio data corresponding to the first audio play event is not played.
Optionally, the voice interaction device 500 may further include a judgment module. The judging module is used for judging whether the first audio playing event is an event which needs to be executed in real time, and the storing module executes the operation of storing the first audio playing event under the condition that the judging module judges that the first audio playing event is an event which does not need to be executed in real time.
Optionally, the voice interaction device 500 may further include a selection module. The selecting module is used for selecting one or more first audio playing events from the plurality of first audio playing events stored before and playing audio data corresponding to the selected first audio playing events under the condition that the current audio playing state is not played.
The selecting module may select one or more first audio play events from a plurality of first audio play events stored previously according to the audio play priority, or may select one or more first audio play events from a plurality of first audio play events stored previously according to a sequence in which the first audio play events are stored.
Optionally, the voice interaction apparatus 500 may further comprise a setup module for establishing a connection between the application and the first device.
Details of this embodiment may be referred to above in connection with fig. 1, and will not be described here again.
In another embodiment, the setting module 510 is configured to set the audio playing priority according to a service scenario related to an application on the smart device. The acquiring module 520 is configured to acquire a first audio playing event to be processed. The determining module 530 is configured to determine whether to play the audio data corresponding to the first audio play event according to the first audio play priority of the service scenario targeted by the first audio play event.
Optionally, the voice interaction device 500 may further include a judging module and a saving module. The judging module is used for judging whether the first audio playing event is an event which needs to be executed in real time, and the storing module stores the first audio playing event under the condition that the judging module judges that the first audio playing event is an event which does not need to be executed in real time.
Optionally, the voice interaction device 500 may further include a selection module. The selecting module is used for selecting one or more first audio playing events from the plurality of first audio playing events stored before and playing audio data corresponding to the selected first audio playing events under the condition that the current audio playing state is not played.
The selecting module may select one or more first audio play events from a plurality of first audio play events stored previously according to the audio play priority, or may select one or more first audio play events from a plurality of first audio play events stored previously according to a sequence in which the first audio play events are stored.
Details of this embodiment may be referred to above in connection with fig. 4, and will not be described here again.
FIG. 6 illustrates a schematic diagram of a computing device that may be used to implement the voice interaction method described above, according to one embodiment of the present disclosure.
Referring to fig. 6, a computing device 600 includes a memory 610 and a processor 620.
Processor 620 may be a multi-core processor or may include multiple processors. In some embodiments, processor 620 may include a general-purpose host processor and one or more special-purpose coprocessors, such as a graphics processing unit (GPU) or a digital signal processor (DSP). In some embodiments, processor 620 may be implemented using custom circuitry, for example an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).
Memory 610 may include various types of storage units, such as system memory, read-only memory (ROM), and persistent storage. The ROM may store static data or instructions required by processor 620 or other modules of the computer. The persistent storage may be a readable and writable storage device, i.e., a non-volatile device that does not lose stored instructions and data even after the computer is powered down. In some embodiments, the persistent storage is a mass storage device (e.g., a magnetic or optical disk, or flash memory); in other embodiments, it may be a removable storage device (e.g., a diskette or an optical drive). The system memory may be a read-write, volatile memory device such as dynamic random access memory, and may store instructions and data required by some or all of the processors at runtime. Furthermore, memory 610 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory), magnetic disks, and/or optical disks. In some implementations, memory 610 may include readable and/or writable removable storage devices such as compact discs (CDs), digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM), read-only Blu-ray discs, super-density discs, flash memory cards (e.g., SD cards, mini SD cards, micro-SD cards), magnetic floppy disks, and the like. The computer-readable storage medium does not include carrier waves or transient electronic signals transmitted wirelessly or by wire.
The memory 610 has stored thereon executable code that, when processed by the processor 620, causes the processor 620 to perform the voice interaction method described above.
The voice interaction method, apparatus and device according to the present invention have been described in detail above with reference to the accompanying drawings.
Furthermore, the method according to the invention may also be implemented as a computer program or computer program product comprising computer program code instructions for performing the steps defined in the above-mentioned method of the invention.
Alternatively, the invention may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or computing device, server, etc.), causes the processor to perform the steps of the above-described method according to the invention.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing description of embodiments of the invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the improvement of technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (17)

1. A method of voice interaction, comprising:
setting audio playing priority corresponding to each service scene in a plurality of service scenes related to application on the intelligent equipment;
acquiring a first audio playing event to be processed in the application; and
determining whether to play the audio data corresponding to the first audio play event according to the first audio play priority of the service scene aimed at by the first audio play event,
wherein the application comprises a plurality of application interfaces, the application interfaces are H5 pages, the application interfaces correspond to preset service scenes, the application interfaces are used for responding to the operation of a user to play the audio data under the service scenes corresponding to the application interfaces,
setting audio playing priority corresponding to each of a plurality of service scenes related to application on the intelligent device, including: according to the service scene corresponding to the application interface, registering audio playing priority for the application interface by the H5 page through JS Bridge, wherein the audio playing priority is divided into two levels of mix and focus, the priority of focus is higher than mix,
the method for acquiring the first audio playing event to be processed in the application comprises the following steps: and detecting the application interface through JS Bridge to acquire a first audio playing event in the application interface.
2. The voice interaction method according to claim 1, wherein the step of determining whether to play the audio data corresponding to the first audio play event according to the first audio play priority of the service scene for which the first audio play event is directed comprises:
comparing the first audio playing priority with a second audio playing priority of a service scene aimed at by a second audio playing event of playing audio data in the application;
and playing the audio data corresponding to the first audio playing event under the condition that the first audio playing priority is higher than or equal to the second audio playing priority.
3. The voice interaction method of claim 2, wherein,
stopping playing the audio data corresponding to the second audio playing event and playing the audio data corresponding to the first audio playing event under the condition that the first audio playing priority is higher than the second audio playing priority, or
Continuing to play the audio data corresponding to the second audio play event and playing the audio data corresponding to the first audio play event under the condition that the first audio play priority is equal to the second audio play priority, or
And under the condition that the first audio playing priority is higher than or equal to the second audio playing priority, playing the audio data corresponding to the second audio playing event by using a first parameter, and playing the audio data corresponding to the first audio playing event by using a second parameter.
4. The voice interaction method according to claim 1, further comprising:
and under the condition that the audio data corresponding to the first audio playing event is not played, storing the first audio playing event.
5. The voice interaction method according to claim 4, further comprising:
judging whether the first audio playing event is an event which needs to be executed in real time or not;
and executing the operation of saving the first audio play event under the condition that the first audio play event is judged to be the event which does not need to be executed immediately.
6. The voice interaction method according to claim 4, further comprising:
and under the condition that the current audio playing state is not played, selecting one or more first audio playing events from the plurality of first audio playing events stored before, and playing audio data corresponding to the selected first audio playing events.
7. The method of claim 6, wherein selecting one or more first audio playback events from the plurality of first audio playback events stored previously comprises:
selecting one or more first audio play events from a plurality of first audio play events stored before according to the audio play priority; or
And selecting one or more first audio play events from the plurality of first audio play events stored before according to the sequence when the first audio play events are stored.
8. The voice interaction method of claim 1, wherein,
the business scenario comprises at least one of the following: a call scene, a navigation scene, a music scene, and a video scene.
9. The voice interaction method according to claim 1, further comprising:
establishing a connection between the application and a first device, the first device having audio input and/or output capability, or being connectable to a second device having audio input and/or output capability;
the application receiving input audio data from the first device and/or sending output audio data to the first device over the connection; and
the application performing voice interaction based on the input audio data to obtain the output audio data.
10. The voice interaction method of claim 9, wherein,
the step of performing voice interaction comprises: the application communicating with a server; and
the method further comprises: the application receiving data content from the server and generating an audio play event for broadcasting the data content.
11. The voice interaction method of claim 9, further comprising:
in the case that it is determined to play the audio data corresponding to the first audio play event, the application sending output audio data corresponding to the first audio play event to the first device over the connection.
12. The voice interaction method of claim 9, wherein,
the connection between the application and the first device is established based on a proprietary protocol, and
the input audio data is received from the first device and/or the output audio data is sent to the first device based on the proprietary protocol.
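Claim 12 only states that a private (proprietary) protocol carries the audio between application and device; its wire format is not disclosed. As a purely hypothetical illustration, such a protocol could frame audio in messages with a small header:

```python
import struct

# Hypothetical framing for the proprietary protocol of claim 12: a 5-byte
# header (1-byte message type, 4-byte big-endian payload length) followed
# by raw audio bytes.  The message types and layout are assumptions made
# for illustration; the actual protocol is not described in the patent.

MSG_AUDIO_IN = 0x01   # device -> application (input audio)
MSG_AUDIO_OUT = 0x02  # application -> device (output audio)

def encode_frame(msg_type: int, payload: bytes) -> bytes:
    return struct.pack(">BI", msg_type, len(payload)) + payload

def decode_frame(data: bytes):
    msg_type, length = struct.unpack(">BI", data[:5])
    return msg_type, data[5:5 + length]
```

A length-prefixed header like this is a common choice because it lets the receiver split a byte stream back into discrete audio messages.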
13. The voice interaction method according to claim 1, further comprising:
in response to receiving, while an audio play event is being played, another audio play event to be played, increasing the play speed of the currently playing audio play event.
14. The voice interaction method according to claim 1, further comprising:
receiving an audio play event sent by a second device; and
setting an audio play priority for the audio play event from the second device according to the type of the second device.
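Claim 14 maps a device's type to a priority for the events it sends, without fixing the types or levels. A minimal sketch, in which the device type names and numeric levels are invented for illustration:

```python
# Sketch of claim 14: assigning an audio play priority to events arriving
# from a second device based on that device's type.  The specific types
# and numeric levels below are assumptions, not part of the claim.

DEVICE_TYPE_PRIORITY = {
    "phone": 3,      # e.g. incoming-call audio preempts everything else
    "navigator": 2,  # turn-by-turn prompts
    "speaker": 1,    # background media
}

def priority_for_device(device_type: str) -> int:
    # Unknown device types fall back to the lowest level.
    return DEVICE_TYPE_PRIORITY.get(device_type, 1)
```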
15. A voice interaction device, comprising:
a setting module, configured to set an audio play priority corresponding to each of a plurality of business scenarios related to an application on a smart device;
an acquisition module, configured to acquire a first audio play event to be processed in the application; and
a determining module, configured to determine, according to a first audio play priority of the business scenario targeted by the first audio play event, whether to play the audio data corresponding to the first audio play event,
wherein the application comprises a plurality of application interfaces, each application interface is an H5 page corresponding to a preset business scenario and is used for playing, in response to a user operation, audio data under its corresponding business scenario,
the setting module registers the audio play priority for each application interface through a JS Bridge call made by the H5 page according to that interface's business scenario, the audio play priority being divided into two levels, mix and focus, with focus higher than mix, and
the acquisition module monitors the application interfaces through the JS Bridge to acquire the first audio play event in an application interface.
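The registration step in claim 15 — each H5 application interface registers, through a JS Bridge, one of two priority levels, with focus outranking mix — can be sketched as a native-side registry. The class and method names are illustrative; the actual JS Bridge API is not disclosed in the patent.

```python
# Native-side sketch of the claim-15 priority registration: each H5
# application interface registers either "mix" (its audio may be mixed
# with others) or "focus" (its audio takes exclusive focus).  Names and
# the comparison helper are assumptions made for illustration.

LEVELS = {"mix": 0, "focus": 1}  # focus is the higher of the two levels

class PriorityRegistry:
    def __init__(self):
        self._by_interface = {}

    def register(self, interface_id: str, level: str) -> None:
        # Invoked from the H5 page via the JS bridge.
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level!r}")
        self._by_interface[interface_id] = level

    def outranks(self, a: str, b: str) -> bool:
        # True if interface a's audio should preempt interface b's.
        return LEVELS[self._by_interface[a]] > LEVELS[self._by_interface[b]]
```

A two-level scheme like this mirrors the mix/focus split of platform audio-focus models, where "focus" audio silences or ducks "mix" audio but two "mix" streams play together.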
16. A computing device, comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor causes the processor to perform the method of any of claims 1 to 14.
17. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the method of any of claims 1 to 14.
CN201911370301.2A 2019-12-26 2019-12-26 Voice interaction method, device, equipment and storage medium Active CN113050910B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911370301.2A CN113050910B (en) 2019-12-26 2019-12-26 Voice interaction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911370301.2A CN113050910B (en) 2019-12-26 2019-12-26 Voice interaction method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113050910A CN113050910A (en) 2021-06-29
CN113050910B (en) 2023-12-05

Family

ID=76505672

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911370301.2A Active CN113050910B (en) 2019-12-26 2019-12-26 Voice interaction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113050910B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113452853B (en) * 2021-07-06 2022-11-18 中国电信股份有限公司 Voice interaction method and device, electronic equipment and storage medium
CN115686424A (en) * 2021-07-22 2023-02-03 Oppo广东移动通信有限公司 Priority configuration method, device, equipment and storage medium for audio playing
CN114938363B (en) * 2022-04-22 2023-10-13 厦门紫光展锐科技有限公司 Voice data transmission device and method

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1657897A1 (en) * 2004-11-12 2006-05-17 Kabushiki Kaisha Toshiba Interruption of audio or video presentation on an high priority event
CN103259683A (en) * 2013-05-16 2013-08-21 烽火通信科技股份有限公司 Web network management system second level cache pushing method based on HTML5
CN103428554A (en) * 2012-05-23 2013-12-04 Lg电子株式会社 Image display device and memory management method of the same
CN107229448A (en) * 2017-06-30 2017-10-03 联想(北京)有限公司 Audio frequency playing method and electronic equipment
CN108769431A (en) * 2018-07-03 2018-11-06 Oppo(重庆)智能科技有限公司 Audio play control method, device, storage medium and mobile terminal
CN109525707A (en) * 2018-10-15 2019-03-26 维沃移动通信有限公司 A kind of audio frequency playing method and mobile terminal
CN109903760A (en) * 2019-01-02 2019-06-18 百度在线网络技术(北京)有限公司 Voice interactive method, device and storage medium


Also Published As

Publication number Publication date
CN113050910A (en) 2021-06-29

Similar Documents

Publication Publication Date Title
CN113050910B (en) Voice interaction method, device, equipment and storage medium
CN110459221B (en) Method and device for multi-device cooperative voice interaction
US10725972B2 (en) Continuous and concurrent device experience in a multi-device ecosystem
KR102051588B1 (en) Method and apparatus for playing audio contents in wireless terminal
CN110673964A (en) Audio playing control method and device of vehicle-mounted system
US10334118B2 (en) Method and system for providing video multimedia ringtone
US10205770B2 (en) Mobile device application integration with infotainment head units
US11924617B2 (en) Method for projecting screen, display device, screen projection terminal, and storage medium
CN108664229B (en) Screen transmission method and device, electronic equipment and computer readable storage medium
JP6934076B2 (en) Smart service methods, devices and equipment
CN103295604B (en) Method, device and terminal that a kind of control terminal is play
WO2016150191A1 (en) Data sharing method and device
CN110764724B (en) Display equipment control method, device, equipment and storage medium
CN113747406B (en) Bluetooth connection method and device, bluetooth equipment and storage medium
US20210357166A1 (en) Agent control device, agent control method, and recording medium
CN111833857B (en) Voice processing method, device and distributed system
CN104052801A (en) Information processing method and electronic equipment
CN102789795B (en) Method and system used for playing vehicle-mounted compact disc and based on Android operating system
CN104007969A (en) Booting sound playing method and device
CN112786034B (en) Voice interaction method, device, equipment and storage medium
CN112447174B (en) Service providing method, device and system, computing device and storage medium
CN111225313B (en) Vehicle-mounted sound resource interaction system
CN105868721A (en) Sound file switching method and apparatus
CN106941649B (en) Method for changing audio output mode of vehicle and apparatus therefor
CN113391838A (en) Microphone resource access method, operating system, terminal and virtual microphone

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant