WO2021196617A1 - A voice interaction method, device, electronic device, and storage medium - Google Patents

A voice interaction method, device, electronic device, and storage medium

Info

Publication number
WO2021196617A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
audio information
voice
interactive
audio channel
Prior art date
Application number
PCT/CN2020/127116
Other languages
English (en)
French (fr)
Inventor
何亚欣
Original Assignee
深圳创维-Rgb电子有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳创维-Rgb电子有限公司
Publication of WO2021196617A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Definitions

  • the present disclosure relates to the technical field of speech recognition, and in particular, to a voice interaction method, device, electronic equipment, and storage medium.
  • voice recognition technology is often applied in the field of smart TVs to achieve voice interaction between smart TVs and users, such as changing channels, adjusting the volume, and turning the smart TV on or off by voice.
  • the purpose of the embodiments of the present disclosure is to provide a voice interaction method, device, electronic device, and storage medium that can separately control the volume of interactive audio information and on-demand audio information based on different audio channels, which improves the recognition efficiency of the interactive audio information and, in turn, the efficiency of human-computer interaction.
  • the embodiment of the present disclosure provides a voice interaction method, and the method includes:
  • the target state is the off state or the bass state
  • the voice interaction method further includes:
  • after receiving the voice closing instruction, the first audio channel is closed, and the second audio channel is switched from the target state to the working state.
  • the voice interaction method further includes:
  • the first audio channel is also used to transmit prompt audio information, and after the first audio channel is enabled, the method further includes:
  • the prompt audio information and the interactive audio information are sequentially transmitted to the playback terminal through the first audio channel for playback.
  • determining the transmission sequence of the prompt audio information and the interactive audio information based on the audio information transmission priority corresponding to the first audio channel includes: determining that the first transmission time range corresponding to the prompt audio information to be played intersects the second transmission time range corresponding to the interactive audio information.
  • the first audio channel is also used to transmit prompt audio information. After the first audio channel is closed, the method further includes:
  • the first audio channel is activated, and the currently activated second audio channel is set as the target state;
  • the prompt audio information is transmitted to the playback terminal through the first audio channel for playback, and after the prompt audio information is played, the first audio channel is closed and the second audio channel is switched from the target state to the working state.
  • the switching the second audio channel from the target state to the working state includes:
  • the second audio channel is switched from the bass state to the preset volume state.
  • the voice wake-up instruction includes at least one of the following: a voice wake-up instruction generated based on the voice interaction activation information sent by the user; a voice wake-up instruction generated based on the voice interaction activation control key being clicked; a voice wake-up instruction generated based on the voice interaction activation control being clicked; a voice wake-up instruction sent by the remote control device.
  • the voice closing instruction includes at least one of the following: a voice closing instruction generated based on the voice interaction closing information sent by the user via voice; a voice closing instruction generated based on the voice interaction close control key being clicked; a voice closing instruction generated based on the voice interaction close control being clicked; a voice closing instruction generated based on not receiving the next interactive audio instruction within a preset time interval.
  • the failure to receive the next interactive audio instruction within the preset time interval includes: starting timing after receiving the interactive audio instruction and, once the preset time interval has elapsed, determining that the next interactive audio instruction was not received within the preset time interval.
  • the method further includes: after receiving the interactive audio instruction, executing a device control operation corresponding to the interactive audio instruction.
  • the embodiment of the present disclosure provides a voice interaction device, the device includes:
  • the first setting module is configured to enable the first audio channel for transmitting interactive audio information after receiving the voice wake-up instruction, and set the currently enabled second audio channel for transmitting on-demand audio information to the target state;
  • the target state is an off state or a bass state;
  • the search module is configured to search for interactive audio information matching the interactive audio instruction after receiving the interactive audio instruction;
  • the first transmission module is configured to transmit the interactive audio information to the playback terminal through the first audio channel for playback.
  • the voice interaction device further includes:
  • the second setting module is used to close the first audio channel after receiving the voice closing instruction, and switch the second audio channel from the target state to the working state.
  • An embodiment of the present disclosure provides an electronic device, including a processor, a memory, and a bus.
  • the memory stores machine-readable instructions executable by the processor.
  • the processor and the memory communicate through the bus, and the processor executes the machine-readable instructions to perform the steps of the voice interaction method according to any one of the embodiments of the present disclosure.
  • the embodiments of the present disclosure provide a computer-readable storage medium having a computer program stored thereon, and when the computer program is run by a processor, it performs the steps of the voice interaction method described in any one of the embodiments of the present disclosure.
  • with the voice interaction method, device, electronic device, and storage medium provided by the embodiments of the present disclosure, after the voice wake-up instruction is received, the first audio channel for transmitting interactive audio information is enabled, and the currently enabled second audio channel for transmitting on-demand audio information is set to the target state, where the target state is the off state or the bass state; after the interactive audio instruction is received, the interactive audio information matching the interactive audio instruction is searched for and transmitted to the playback terminal through the first audio channel for playback.
  • the embodiments of the present disclosure can control the volume of interactive audio information and on-demand audio information based on different audio channels, thereby improving the identification efficiency of interactive audio information, and thus the efficiency of human-computer interaction.
  • the voice interaction method, device, electronic device, and storage medium can also, after detecting prompt audio information to be played, determine the transmission sequence of the prompt audio information and the interactive audio information based on the audio information transmission priority corresponding to the first audio channel, and, based on the transmission sequence, sequentially transmit the prompt audio information and the interactive audio information to the playback terminal through the first audio channel for playback.
  • using the first audio channel to transmit both prompt audio information and interactive audio information reduces the number of occupied audio channels and improves the utilization of the first audio channel, and determining the transmission order based on the audio information transmission priority corresponding to the first audio channel can improve the transmission quality of the audio information on the first audio channel.
  • FIG. 1 shows a flowchart of a voice interaction method provided by an embodiment of the present disclosure
  • FIG. 2 shows a flowchart of another voice interaction method provided by an embodiment of the present disclosure
  • FIG. 3 shows a flowchart of another voice interaction method provided by an embodiment of the present disclosure
  • FIG. 4 shows a schematic structural diagram of a voice interaction device provided by an embodiment of the present disclosure
  • FIG. 5 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
  • the user can interact with the smart TV by voice while watching the TV program to obtain the voice interaction content feedback from the smart TV.
  • embodiments of the present disclosure provide a voice interaction method, device, electronic device, and storage medium. After a voice wake-up instruction is received, the first audio channel for transmitting interactive audio information is enabled, and the currently enabled second audio channel for transmitting on-demand audio information is set to the target state, where the target state is the off state or the bass state; after an interactive audio instruction is received, the interactive audio information that matches the interactive audio instruction is searched for and transmitted to the playback terminal through the first audio channel for playback.
  • the embodiments of the present disclosure can separately control the volume of the interactive audio information and the on-demand audio information based on different audio channels, thereby improving the recognition efficiency of the interactive audio information and, in turn, the efficiency of human-computer interaction.
  • the voice interaction method includes the following steps:
  • the implementation subject of the voice interaction method provided by the embodiments of the present disclosure may be smart TVs, tablets, mobile phones, computers, and other terminal devices that can interact with users by voice.
  • the description below takes a smart TV as an example, but is not limited to smart TVs.
  • the smart TV may include at least two audio channels, where the first audio channel can be used to transmit interactive audio information, and the second audio channel can be used to transmit on-demand audio information, such as the audio information of a TV series requested on demand by a user.
  • the first audio channel may correspond to the first volume
  • the second audio channel may correspond to the second volume
  • the first volume and the second volume can be adjusted respectively.
  • the first audio channel is in the off state
  • the second audio channel is in the on state.
  • the voice interaction function between the user and the smart TV can be turned on.
  • the first audio channel can be switched from the off state to the on state
  • the second audio channel can be switched from the on state to the target state, so as to realize voice interaction between the user and the smart TV.
  • the first audio channel corresponds to the first preset volume.
  • the first volume corresponding to the first audio channel can also be set to the first preset volume.
  • here, the first preset volume may be a locally pre-stored volume, or a volume selected by the user according to the user's own needs.
  • the target state is the off state or the bass state
  • the switching of the second audio channel from the on state to the target state specifically includes: the second audio channel can be switched from the on state to the off state, or the second audio channel can be switched from the on state to the bass state
  • the bass state corresponds to the second preset volume, that is, the second volume corresponding to the second audio channel can be set to the second preset volume
  • the on-demand audio information can be transmitted through the second audio channel to the playback terminal, which plays it at the second volume (the second preset volume).
  • the second preset volume may be less than the first preset volume.
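The channel handling on wake-up described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `AudioChannel` class, the `on_wake` function, the 0-100 volume scale, and the numeric preset values are all hypothetical; the text only requires that the second preset volume be less than the first.

```python
from dataclasses import dataclass
from enum import Enum


class ChannelState(Enum):
    OFF = "off"
    ON = "on"
    BASS = "bass"  # the reduced-volume "bass state" named in the text


@dataclass
class AudioChannel:
    name: str
    state: ChannelState
    volume: int  # 0-100, an illustrative scale not given in the source


# Hypothetical preset volumes; the text only requires SECOND < FIRST.
FIRST_PRESET_VOLUME = 60
SECOND_PRESET_VOLUME = 10


def on_wake(first: AudioChannel, second: AudioChannel, target: ChannelState) -> None:
    """Enable the interactive channel and move the on-demand channel to the target state."""
    if target not in (ChannelState.OFF, ChannelState.BASS):
        raise ValueError("target state must be OFF or BASS")
    first.state = ChannelState.ON
    first.volume = FIRST_PRESET_VOLUME
    second.state = target
    if target is ChannelState.BASS:
        # Only the bass state changes the second volume; OFF leaves it untouched.
        second.volume = SECOND_PRESET_VOLUME


first = AudioChannel("interactive", ChannelState.OFF, 0)
second = AudioChannel("on_demand", ChannelState.ON, 45)
on_wake(first, second, ChannelState.BASS)
```

In this sketch the off state leaves the stored second volume unchanged, so it can be reused when the channel is later re-enabled; that is a design choice of the example rather than something the text specifies.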
  • the voice wake-up instruction can be received in one of the following ways:
  • Receive specific voice interaction activation information sent by the user; the voice interaction activation information may include, for example, "Turn on the voice interaction function" or "Let's chat".
  • S102: After receiving the interactive audio instruction, search for interactive audio information that matches the interactive audio instruction, and transmit the interactive audio information to the playback terminal through the first audio channel for playback.
  • the correspondence between interactive audio instructions and interactive audio information can be pre-stored locally. After an interactive audio instruction is received, the interactive audio information corresponding to it can be searched for based on this correspondence, and the interactive audio information that is found is transmitted to the playback terminal through the first audio channel for playback.
  • the playback terminal may include a display screen and a speaker.
  • the interactive audio information may include interactive voice information and interactive video information.
  • the interactive voice information may be transmitted to the speaker of the smart TV through the first audio channel to be played at the above-mentioned first volume (first preset volume), and the interactive video information may be transmitted to the display screen of the smart TV through the first audio channel for playback.
  • the interactive audio instruction can correspond to fixed interactive audio information, or it can correspond to dynamic interactive audio information.
  • fixed interactive audio information that matches the interactive audio instruction, such as "My screen is 55 inches", or dynamic interactive audio information that matches the interactive audio instruction, such as "The current time is three o'clock in the afternoon", can be transmitted to the playback terminal through the first audio channel for playback.
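The lookup of fixed and dynamic interactive audio information can be sketched as a locally stored table whose entries are either a fixed string or a function evaluated at lookup time. This is only an illustration: the `RESPONSES` table, the instruction keys, and the `find_interactive_audio` function are hypothetical, and text strings stand in for audio data.

```python
from datetime import datetime
from typing import Callable, Dict, Optional, Union

# A response is either fixed text or a callable that builds the text on demand.
Response = Union[str, Callable[[], str]]

# Hypothetical locally pre-stored correspondence between instructions and responses.
RESPONSES: Dict[str, Response] = {
    "how big is your screen": "My screen is 55 inches",  # fixed
    "what time is it": lambda: "The current time is " + datetime.now().strftime("%H:%M"),  # dynamic
}


def find_interactive_audio(instruction: str) -> Optional[str]:
    """Return the interactive audio text matching the instruction, or None if no match."""
    entry = RESPONSES.get(instruction.strip().lower())
    if entry is None:
        return None
    # Dynamic entries are evaluated at lookup time; fixed entries are returned as stored.
    return entry() if callable(entry) else entry
```

For example, `find_interactive_audio("How big is your screen")` returns the fixed text, while the time entry is recomputed on every call.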
  • the device control operation corresponding to the interactive audio instruction may be executed.
  • the processor in the smart TV not only needs to feed back to the user the interactive audio information matching the interactive audio instruction, but also needs to respond to the interactive audio instruction by executing the corresponding device control operation. For example, after receiving an interactive audio instruction such as "decrease the brightness of the display screen", the smart TV can transmit interactive audio information such as "brightness that is too low can hurt your eyes" to the playback terminal through the first audio channel for playback and, in response to the instruction, reduce the brightness of the display screen; as another example, after receiving an interactive audio instruction such as "turn off the smart TV", it can perform the shutdown operation.
  • the voice interaction method provided by the embodiments of the present disclosure can respectively control the volume of interactive audio information and on-demand audio information based on different audio channels, thereby improving the recognition efficiency of interactive audio information, and further improving the efficiency of human-computer interaction.
  • the voice interaction method may further include:
  • after receiving the voice closing instruction, the first audio channel is closed, and the second audio channel is switched from the target state to the working state.
  • the first audio channel after receiving the voice closing instruction, can be switched from the open state to the closed state, and the second audio channel can be switched from the target state to the working state.
  • the second audio channel can be switched from the target state to the working state, including: the second audio channel in the closed state can be re-enabled; or, the second audio channel can be switched from the bass state to the preset volume state.
  • when the target state is the off state, the second audio channel can be switched from the off state to the on state, and the second volume corresponding to the second audio channel can be restored; when the target state is the bass state, the second volume corresponding to the second audio channel can be restored, or the second volume corresponding to the second audio channel may be set to a third preset volume, where the third preset volume may be a locally pre-stored volume.
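Switching the second audio channel back to the working state can be sketched as below. The `OnDemandChannel` class, the `saved_volume` field, and the `restore_working_state` function are hypothetical names introduced for illustration; the sketch assumes the volume in effect before the wake-up instruction was saved, and it applies the optional third preset volume only in the bass case, as the text describes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OnDemandChannel:
    state: str         # "on", "off", or "bass"
    volume: int        # current second volume
    saved_volume: int  # second volume in effect before the wake-up instruction


def restore_working_state(channel: OnDemandChannel,
                          third_preset_volume: Optional[int] = None) -> None:
    """Switch the second audio channel from the target state back to the working state."""
    if channel.state == "off":
        # Off state: re-enable the channel and restore its previous volume.
        channel.volume = channel.saved_volume
    elif channel.state == "bass":
        # Bass state: restore the previous volume, or use the third preset if one is given.
        channel.volume = (channel.saved_volume if third_preset_volume is None
                          else third_preset_volume)
    else:
        raise ValueError("channel is not in a target state")
    channel.state = "on"


ducked = OnDemandChannel(state="bass", volume=10, saved_volume=45)
restore_working_state(ducked)
```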
  • the voice closing instruction can be received in one of the following ways:
  • the voice interaction closing information may include, for example, the voice messages "Turn off the voice interaction function" and "Let's end the chat", and the voice interaction closing information is configured to indicate that a voice closing instruction is to be generated.
  • the voice interaction close control key on the smart TV is clicked, and a voice closing instruction is generated.
  • the voice interaction close control is clicked, and a voice closing instruction is generated.
  • the smart TV starts timing after receiving an interactive audio instruction and waits for 10 minutes; if it determines that no new interactive audio instruction was received within those 10 minutes, the smart TV may generate a voice closing instruction that instructs the smart TV to close the first audio channel and switch the second audio channel from the target state to the working state, or to perform other corresponding operations.
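The timeout-based voice closing instruction can be sketched with a small idle timer. The `IdleCloser` class and its method names are hypothetical, and the injectable `clock` parameter is an assumption added so that the example can be exercised without waiting; the 600-second interval mirrors the 10-minute example in the text.

```python
import time
from typing import Callable


class IdleCloser:
    """Tracks the time since the last interactive audio instruction."""

    def __init__(self, interval_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval_s = interval_s
        self.clock = clock
        self.last_instruction_at = clock()

    def on_interactive_instruction(self) -> None:
        # Restart timing each time an interactive audio instruction arrives.
        self.last_instruction_at = self.clock()

    def should_close(self) -> bool:
        # True once the preset interval has elapsed with no new instruction.
        return self.clock() - self.last_instruction_at >= self.interval_s


# Usage with a fake clock, so the 10-minute interval can be simulated.
now = [0.0]
closer = IdleCloser(600, clock=lambda: now[0])
now[0] = 599.0
before = closer.should_close()
now[0] = 600.0
after = closer.should_close()
```

A caller would poll `should_close()` and, when it returns true, generate the voice closing instruction.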
  • the method may further include:
  • the interactive audio information matching the voice wake-up instruction can be searched, and the interactive audio information can be transmitted to the playback terminal through the first audio channel for playback.
  • the interactive audio information matching the voice wake-up instruction may be transmitted to the playback terminal through the first audio channel for playback.
  • the interactive audio information corresponding to the voice wake-up instruction may be pre-stored locally. After the voice wake-up instruction is received, the interactive audio information may be transmitted to the player through the first audio channel for playback.
  • the interactive audio information corresponding to the voice wake-up instruction is pre-stored locally, such as the audio message "I am happy to chat with you".
  • the interactive audio information can then be played, for example the audio message "I am happy to chat with you".
  • the first audio channel may also be used to transmit prompt audio information.
  • the method may further include:
  • S202: Based on the transmission sequence, sequentially transmit the prompt audio information and the interactive audio information through the first audio channel to a playback terminal for playback.
  • the first audio channel can be used to transmit prompt audio information and interactive audio information.
  • if prompt audio information to be played is detected, the first transmission time range corresponding to the prompt audio information to be played and the second transmission time range corresponding to the interactive audio information to be played can be obtained. If the first transmission time range intersects the second transmission time range, the transmission sequence of the prompt audio information to be played and the interactive audio information to be played can be determined based on the audio information transmission priority corresponding to the first audio channel, and the two can be transmitted in that sequence to the playback terminal through the first audio channel for playback; if the first transmission time range and the second transmission time range do not intersect, the prompt audio information to be played can be transmitted within the first transmission time range and the interactive audio information to be played within the second transmission time range.
  • for example, the first transmission time range corresponding to the prompt audio information to be played is from 11:30:00 to 11:30:05 on March 31, 2020, and the second transmission time range corresponding to the interactive audio information to be played is from 11:30:03 to 11:30:10 on March 31, 2020. The first transmission time range and the second transmission time range therefore intersect, so the prompt audio information to be played and the interactive audio information to be played can be transmitted sequentially through the first audio channel according to the audio information transmission priority corresponding to the first audio channel.
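The intersection check and ordering can be worked through with the time ranges from the example above. This is a sketch under assumptions: the `PRIORITY` table (prompt before interactive) is hypothetical, since the text does not state which kind of audio information has the higher priority, and the function names are introduced for illustration.

```python
from datetime import datetime
from typing import List, Tuple

TimeRange = Tuple[datetime, datetime]  # (start, end), both inclusive

# Hypothetical priority table: the lower number is transmitted first.
PRIORITY = {"prompt": 0, "interactive": 1}


def ranges_intersect(a: TimeRange, b: TimeRange) -> bool:
    """Two closed ranges intersect when each starts no later than the other ends."""
    return a[0] <= b[1] and b[0] <= a[1]


def transmission_order(prompt: TimeRange, interactive: TimeRange) -> List[str]:
    if ranges_intersect(prompt, interactive):
        # Overlap: the channel's transmission priority decides the sequence.
        return sorted(["prompt", "interactive"], key=PRIORITY.__getitem__)
    # No overlap: each item keeps its own range, so the earlier start goes first.
    pairs = [("prompt", prompt), ("interactive", interactive)]
    return [kind for kind, rng in sorted(pairs, key=lambda p: p[1][0])]


prompt_range = (datetime(2020, 3, 31, 11, 30, 0), datetime(2020, 3, 31, 11, 30, 5))
interactive_range = (datetime(2020, 3, 31, 11, 30, 3), datetime(2020, 3, 31, 11, 30, 10))
overlap = ranges_intersect(prompt_range, interactive_range)
order = transmission_order(prompt_range, interactive_range)
```

With these ranges the prompt ends at 11:30:05, after the interactive audio starts at 11:30:03, so the ranges intersect and the priority table decides the order.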
  • the first audio channel is also used to transmit prompt audio information.
  • the method further includes:
  • the first audio channel can be used to transmit prompt audio information and interactive audio information.
  • the voice interaction function between the user and the smart TV is not enabled, the first audio channel is in the closed state, and the second audio channel is in the open state.
  • the first audio channel corresponds to a first preset volume.
  • the first volume corresponding to the first audio channel can be set to the first preset volume.
  • the first preset volume can be a locally pre-stored volume or a volume selected by the user according to the user's own needs.
  • the target state may be an off state or a bass state
  • switching the second audio channel from the on state to the target state may specifically include: switching the second audio channel from the on state to the off state, or switching the second audio channel from the on state to the bass state
  • the bass state corresponds to the second preset volume, that is, the second volume corresponding to the second audio channel is set to the second preset volume; the on-demand audio information can then be transmitted to the playback terminal through the second audio channel and played at the second volume (the second preset volume), where the second preset volume is less than the first preset volume.
  • the prompt audio information can be transmitted to the playback terminal through the first audio channel for playback.
  • Each prompt audio information corresponds to a playback duration.
  • the first audio channel can be switched from the on state to the off state, and the second audio channel is switched from the target state to the working state.
  • switching the second audio channel from the target state to the working state may include: re-enabling the second audio channel in the closed state; or switching the second audio channel from the bass state to the preset volume state.
  • when the target state is the off state, the second audio channel can be switched from the off state to the on state, and the second volume corresponding to the second audio channel is restored; when the target state is the bass state, the second volume corresponding to the second audio channel is restored, or the second volume corresponding to the second audio channel may be set as the third preset volume, where the third preset volume is a locally pre-stored volume.
  • the prompt audio information can include prompt voice information and prompt video information.
  • the prompt voice information can be transmitted to the speaker of the smart TV through the first audio channel to be played at the above-mentioned first volume (first preset volume), and the prompt video information can be transmitted to the display screen of the smart TV through the first audio channel for playback.
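The prompt flow for the case where voice interaction is not enabled (enable the first channel, put the second channel in the target state, play the prompt, then undo both) can be sketched with a context manager. The `prompt_session` function, the dictionary layout, and the volume values are hypothetical; the sketch uses the bass state as the target state and restores the previous second volume afterwards.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List


@contextmanager
def prompt_session(channels: Dict[str, dict], bass_volume: int = 10) -> Iterator[None]:
    """Enable the first channel and duck the second one for the duration of a prompt."""
    second = channels["second"]
    saved_state, saved_volume = second["state"], second["volume"]
    channels["first"]["state"] = "on"
    second["state"], second["volume"] = "bass", bass_volume
    try:
        yield
    finally:
        # After the prompt has played: close the first channel, restore the second.
        channels["first"]["state"] = "off"
        second["state"], second["volume"] = saved_state, saved_volume


channels = {
    "first": {"state": "off", "volume": 60},
    "second": {"state": "on", "volume": 45},
}
played: List[str] = []
with prompt_session(channels):
    during = dict(channels["second"])  # snapshot while the prompt is playing
    played.append("prompt")            # stand-in for transmitting the prompt audio
```

Using `try`/`finally` means the second channel is restored even if playback raises an error, which is a choice made for this example rather than a requirement stated in the text.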
  • the embodiments of the present disclosure also provide a voice interaction device corresponding to the voice interaction method. Since the principle by which the device solves the problem is similar to that of the voice interaction method of the embodiments of the present disclosure, the implementation of the device can refer to the implementation of the method, and repeated details are omitted.
  • FIG. 4 is a schematic structural diagram of a voice interaction device provided by an embodiment of the disclosure, and the voice interaction device includes:
  • the first setting module 401 is configured to enable the first audio channel for transmitting interactive audio information after receiving the voice wake-up instruction, and set the currently enabled second audio channel for transmitting on-demand audio information to the target state;
  • the target state is the off state or the bass state;
  • the searching module 402 is configured to search for interactive audio information matching the interactive audio instruction after receiving the interactive audio instruction;
  • the first transmission module 403 is configured to transmit the interactive audio information to the playback terminal through the first audio channel for playback.
  • the voice interaction device further includes:
  • the second setting module is used to close the first audio channel after receiving the voice closing instruction, and switch the second audio channel from the target state to the working state.
  • the voice interaction device further includes:
  • the second transmission module is configured to search for interactive audio information matching the voice wake-up instruction, and transmit the interactive audio information to the playback terminal through the first audio channel for playback.
  • the first audio channel is also used to transmit prompt audio information
  • the voice interaction device further includes:
  • a determining module configured to determine the transmission sequence of the prompt audio information and the interactive audio information based on the audio information transmission priority corresponding to the first audio channel if the prompt audio information to be played is detected;
  • the third transmission module is configured to sequentially transmit the prompt audio information and the interactive audio information through the first audio channel to the playback terminal for playback based on the transmission sequence.
  • the first audio channel is also used to transmit prompt audio information
  • the voice interaction device further includes:
  • the third setting module is configured to enable the first audio channel if the prompt audio information to be played is detected, and set the currently enabled second audio channel to the target state;
  • a fourth transmission module configured to transmit the prompt audio information to the playback terminal through the first audio channel for playback
  • the fourth setting module is used to close the first audio channel after the prompt audio information is played, and switch the second audio channel from the target state to the working state.
  • the second setting module switches the second audio channel from the target state to the working state
  • the fourth setting module switches the second audio channel from the target state to the working state
  • the second audio channel is switched from the bass state to the preset volume state.
  • the voice interaction device provided by the embodiments of the present disclosure can respectively control the volume of interactive audio information and on-demand audio information based on different audio channels, thereby improving the identification efficiency of interactive audio information, and further improving the efficiency of human-computer interaction.
  • FIG. 5 shows an electronic device 500 provided by an embodiment of the present disclosure.
  • the electronic device 500 includes a processor 501, a memory 502, and a bus; the memory 502 stores machine-readable instructions executable by the processor 501.
  • the processor 501 communicates with the memory 502 through a bus, and the processor 501 executes the machine-readable instructions to perform the steps of the above-mentioned voice interaction method.
  • the aforementioned memory 502 and the processor 501 can be general-purpose memories and processors, which are not specifically limited here.
  • when the processor 501 runs the computer program stored in the memory 502, the aforementioned voice interaction method can be executed.
  • embodiments of the present disclosure also provide a computer-readable storage medium having a computer program stored thereon; when run by a processor, the computer program performs the steps of the above-mentioned voice interaction method.
  • modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical modules, that is, they may be located in one place, or they may be distributed on multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional modules in the various embodiments of the present disclosure may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
  • if the function is implemented in the form of a software function module and sold or used as an independent product, it can be stored in a nonvolatile computer-readable storage medium executable by a processor.
  • the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned storage media include: USB flash drive, removable hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk, optical disk, and other media that can store program code.
  • the embodiments of the present disclosure provide a voice interaction method, device, electronic device, and storage medium. Since the voice interaction device of the embodiments of the present disclosure includes two audio channels, it can separately control the volume of interactive audio information and on-demand audio information based on different audio channels, which improves the recognition efficiency of interactive audio information and thereby the efficiency of human-computer interaction.

Abstract

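A voice interaction method, apparatus, electronic device, and storage medium. The voice interaction method includes: after receiving a voice wake-up instruction, enabling a first audio channel used to transmit interactive audio information, and setting a currently enabled second audio channel used to transmit on-demand audio information to a target state, where the target state is an off state or a low-volume state (S101); and after receiving an interactive audio instruction, looking up interactive audio information that matches the interactive audio instruction, and transmitting the interactive audio information through the first audio channel to a playback end for playback (S102). The volumes of the interactive audio information and the on-demand audio information can be controlled separately through different audio channels, which improves the recognition efficiency of the interactive audio information and thereby the efficiency of human-computer interaction.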

Description

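Voice interaction method, apparatus, electronic device, and storage medium
Cross-reference to related applications
This disclosure claims priority to Chinese patent application No. 202010256089.3, titled "Voice interaction method, apparatus, electronic device, and storage medium", filed with the Chinese Patent Office on April 2, 2020, the entire contents of which are incorporated into this disclosure by reference.
Technical field
The present disclosure relates to the technical field of speech recognition, and in particular to a voice interaction method, apparatus, electronic device, and storage medium.
Background
In recent years, as speech recognition technology has matured, it has often been applied to smart TVs to enable voice interaction between the smart TV and the user, for example changing channels, adjusting the volume, or turning the smart TV on or off by voice.
In practice, a user can interact with a smart TV by voice while watching a TV program and receive the voice interaction content that the smart TV feeds back. Because the TV program is still playing, the user has difficulty telling the program audio apart from the voice interaction content. This lowers the efficiency with which the user recognizes the voice interaction content, and therefore the efficiency of interaction between the user and the smart TV.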
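Summary
In view of this, the purpose of the embodiments of the present disclosure is to provide a voice interaction method, apparatus, electronic device, and storage medium that can control the volumes of interactive audio information and on-demand audio information separately through different audio channels, improving the recognition efficiency of the interactive audio information and thereby the efficiency of human-computer interaction.
An embodiment of the present disclosure provides a voice interaction method, the method including:
after receiving a voice wake-up instruction, enabling a first audio channel used to transmit interactive audio information, and setting a currently enabled second audio channel used to transmit on-demand audio information to a target state, where the target state is an off state or a low-volume state;
after receiving an interactive audio instruction, looking up interactive audio information that matches the interactive audio instruction, and transmitting the interactive audio information through the first audio channel to a playback end for playback.
In an embodiment of the present disclosure, the voice interaction method further includes:
after receiving a voice close instruction, closing the first audio channel, and switching the second audio channel from the target state to a working state.
In an embodiment of the present disclosure, the voice interaction method further includes:
looking up interactive audio information that matches the voice wake-up instruction, and transmitting the interactive audio information through the first audio channel to the playback end for playback.
In an embodiment of the present disclosure, the first audio channel is also used to transmit prompt audio information, and after the first audio channel is enabled, the method further includes:
if prompt audio information to be played is detected, determining the transmission order of the prompt audio information and the interactive audio information based on the audio information transmission priority corresponding to the first audio channel;
based on the transmission order, transmitting the prompt audio information and the interactive audio information in turn through the first audio channel to the playback end for playback.
In an embodiment of the present disclosure, before determining the transmission order of the prompt audio information and the interactive audio information based on the audio information transmission priority corresponding to the first audio channel, the method includes: determining that a first transmission time range corresponding to the prompt audio information to be played intersects a second transmission time range corresponding to the interactive audio information.
In an embodiment of the present disclosure, the first audio channel is also used to transmit prompt audio information, and after the first audio channel is closed, the method further includes:
if prompt audio information to be played is detected, enabling the first audio channel, and setting the currently enabled second audio channel to the target state;
transmitting the prompt audio information through the first audio channel to the playback end for playback, and after playback of the prompt audio information completes, closing the first audio channel and switching the second audio channel from the target state to the working state.
In an embodiment of the present disclosure, switching the second audio channel from the target state to the working state includes:
re-enabling the second audio channel that is in the off state;
or,
switching the second audio channel from the low-volume state to a preset volume state.
In an embodiment of the present disclosure, the voice wake-up instruction includes at least one of the following: a voice wake-up instruction generated based on voice interaction start information sent by the user; a voice wake-up instruction generated when a voice interaction start control key is pressed; a voice wake-up instruction generated when a voice interaction start control is clicked; a voice wake-up instruction sent by a remote control device.
In an embodiment of the present disclosure, the voice close instruction includes at least one of the following: a voice close instruction generated based on voice interaction close information sent by the user by voice; a voice close instruction generated when a voice interaction close control key is pressed; a voice close instruction generated when a voice interaction close control is clicked; a voice close instruction generated because no next interactive audio instruction is received within a preset time interval.
In an embodiment of the present disclosure, not receiving a next interactive audio instruction within the preset time interval includes: starting a timer when the interactive audio instruction is received and, after the preset time interval has elapsed, determining that no next interactive audio instruction was received within the preset time interval.
In an embodiment of the present disclosure, the method further includes: after receiving the interactive audio instruction, performing the device control operation corresponding to the interactive audio instruction.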
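An embodiment of the present disclosure provides a voice interaction apparatus, the apparatus including:
a first setting module, configured to, after a voice wake-up instruction is received, enable a first audio channel used to transmit interactive audio information, and set a currently enabled second audio channel used to transmit on-demand audio information to a target state, where the target state is an off state or a low-volume state;
a lookup module, configured to, after an interactive audio instruction is received, look up interactive audio information that matches the interactive audio instruction;
a first transmission module, configured to transmit the interactive audio information through the first audio channel to a playback end for playback.
In an embodiment of the present disclosure, the voice interaction apparatus further includes:
a second setting module, configured to, after a voice close instruction is received, close the first audio channel and switch the second audio channel from the target state to the working state.
An embodiment of the present disclosure provides an electronic device, including a processor, a memory, and a bus. The memory stores machine-readable instructions executable by the processor. When the electronic device is running, the processor communicates with the memory through the bus, and the processor executes the machine-readable instructions to perform the steps of the voice interaction method of any embodiment of the present disclosure.
An embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored. When run by a processor, the computer program performs the steps of the voice interaction method of any embodiment of the present disclosure.
With the voice interaction method, apparatus, electronic device, and storage medium provided by the embodiments of the present disclosure, after a voice wake-up instruction is received, a first audio channel used to transmit interactive audio information is enabled, and a currently enabled second audio channel used to transmit on-demand audio information is set to a target state, where the target state is an off state or a low-volume state. After an interactive audio instruction is received, interactive audio information that matches it is looked up and transmitted through the first audio channel to the playback end for playback. The embodiments can therefore control the volumes of interactive audio information and on-demand audio information separately through different audio channels, which improves the recognition efficiency of the interactive audio information and thereby the efficiency of human-computer interaction.
Further, after detecting prompt audio information to be played, the embodiments may determine the transmission order of the prompt audio information and the interactive audio information based on the audio information transmission priority corresponding to the first audio channel, and, based on that order, transmit the two in turn through the first audio channel to the playback end for playback. Because the first audio channel carries both prompt audio information and interactive audio information, fewer audio channels are occupied and the first audio channel is better utilized. Determining the transmission order from the channel's audio information transmission priority also improves the transmission quality of audio information on the first audio channel.
To make the above purposes, features, and advantages of the present disclosure easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.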
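Brief description of the drawings
To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present disclosure and should therefore not be regarded as limiting its scope. Those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Figure 1 shows a flowchart of a voice interaction method provided by an embodiment of the present disclosure;
Figure 2 shows a flowchart of another voice interaction method provided by an embodiment of the present disclosure;
Figure 3 shows a flowchart of another voice interaction method provided by an embodiment of the present disclosure;
Figure 4 shows a schematic structural diagram of a voice interaction apparatus provided by an embodiment of the present disclosure;
Figure 5 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.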
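Detailed description
To make the purposes, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. The components of the embodiments generally described and shown in the drawings here can be arranged and designed in a variety of configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.
At present, a user can interact with a smart TV by voice while watching a TV program and receive the voice interaction content that the smart TV feeds back. Because the TV program is still playing, the user has difficulty telling the program audio apart from the voice interaction content. This lowers the efficiency with which the user recognizes the voice interaction content, and therefore the efficiency of interaction between the user and the smart TV.
To address this problem, the embodiments of the present disclosure provide a voice interaction method, apparatus, electronic device, and storage medium. After a voice wake-up instruction is received, a first audio channel used to transmit interactive audio information is enabled, and a currently enabled second audio channel used to transmit on-demand audio information is set to a target state, where the target state is an off state or a low-volume state. After an interactive audio instruction is received, interactive audio information that matches it is looked up and transmitted through the first audio channel to the playback end for playback. The embodiments can therefore control the volumes of interactive audio information and on-demand audio information separately through different audio channels, which improves the recognition efficiency of the interactive audio information and thereby the efficiency of human-computer interaction.
The defects of the above approach were identified by the inventor through practice and careful study. Therefore, both the discovery of the above problem and the solutions that the present disclosure proposes for it below should be regarded as contributions made by the inventor to the present disclosure.
To enable those skilled in the art to use the present disclosure, the following implementations are given for the specific application scenario of smart TVs. Those skilled in the art can apply the general principles defined here to other embodiments and application scenarios without departing from the spirit and scope of the present disclosure. Although the present disclosure is described mainly in terms of smart TVs, it should be understood that this is only one exemplary embodiment.
The technical solutions of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. The components of the present disclosure generally described and shown in the drawings here can be arranged and designed in a variety of configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.
Note that similar reference numerals and letters denote similar items in the following drawings. Therefore, once an item is defined in one drawing, it does not need to be further defined or explained in subsequent drawings.
To facilitate understanding of this embodiment, the voice interaction method disclosed in the embodiments of the present disclosure is first described in detail.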
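Figure 1 is a flowchart of the voice interaction method provided by an embodiment of the present disclosure. The method includes the following steps:
S101. After receiving a voice wake-up instruction, enable a first audio channel used to transmit interactive audio information, and set a currently enabled second audio channel used to transmit on-demand audio information to a target state, where the target state is an off state or a low-volume state.
In the embodiments of the present disclosure, the voice interaction method may be carried out by a terminal device capable of voice interaction with a user, such as a smart TV, tablet, mobile phone, or computer. A smart TV is used as an example below, without limitation. The smart TV may include at least two audio channels: the first audio channel may be used to transmit interactive audio information, and the second audio channel may be used to transmit on-demand audio information, such as the audio of a TV series that the user has requested. The first audio channel may correspond to a first volume and the second audio channel to a second volume, and the two volumes may be adjusted separately.
While the smart TV is playing on-demand audio information (with the voice interaction function between the user and the smart TV not yet started), the first audio channel is off and the second audio channel is on. After a voice wake-up instruction is received, the voice interaction function may be started: the first audio channel may be switched from off to on, and the second audio channel may be switched from on to the target state, so that the user and the smart TV can interact by voice.
The first audio channel has a corresponding first preset volume. When the first audio channel is switched from off to on, its first volume may also be set to the first preset volume. The first preset volume may be a locally pre-stored volume or a volume chosen by the user according to their own needs.
The target state is the off state or the low-volume state. Switching the second audio channel from on to the target state specifically includes either switching it from on to off, or switching it from on to the low-volume state. The low-volume state has a corresponding second preset volume; that is, the second volume of the second audio channel may be set to the second preset volume, and the on-demand audio information may then be transmitted through the second audio channel to the playback end and played at the second volume (the second preset volume). The second preset volume may be lower than the first preset volume.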
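In the embodiments of the present disclosure, the voice wake-up instruction may be received in one of the following ways:
1. Receiving specific voice interaction start information sent by the user, for example "Start the voice interaction function" or "Let's chat".
2. Detecting that the user has pressed (for example, long-pressed) the voice interaction start control key on the smart TV.
3. Detecting that the user has clicked (for example, long-pressed or swiped) the voice interaction start control on the smart TV's display.
4. Receiving a voice wake-up instruction sent by the remote control device associated with the smart TV.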
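The channel switching in step S101 can be sketched as a small state model. This is a minimal illustration, not the patent's implementation: the class names, the 0-100 volume scale, and the two preset values are assumptions chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class ChannelState(Enum):
    OFF = "off"
    ON = "on"
    LOW_VOLUME = "low_volume"


@dataclass
class AudioChannel:
    name: str
    state: ChannelState
    volume: int  # hypothetical 0-100 scale


# Hypothetical presets; the text only requires second < first.
FIRST_PRESET_VOLUME = 60
SECOND_PRESET_VOLUME = 15


def on_wake(first: AudioChannel, second: AudioChannel, low_volume: bool = True) -> None:
    """S101: enable the first channel, move the second channel to the target state."""
    first.state = ChannelState.ON
    first.volume = FIRST_PRESET_VOLUME
    if low_volume:
        # Target state = low-volume state: keep playing, but quieter.
        second.state = ChannelState.LOW_VOLUME
        second.volume = SECOND_PRESET_VOLUME
    else:
        # Target state = off state.
        second.state = ChannelState.OFF


first = AudioChannel("interaction", ChannelState.OFF, 0)
second = AudioChannel("on_demand", ChannelState.ON, 40)
on_wake(first, second)
```

Whether the second channel is switched off or only lowered is left as a parameter here because the text allows either target state.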
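S102. After receiving an interactive audio instruction, look up interactive audio information that matches the interactive audio instruction, and transmit the interactive audio information through the first audio channel to the playback end for playback.
In the embodiments of the present disclosure, the correspondence between interactive audio instructions and interactive audio information may be pre-stored locally. After an interactive audio instruction is received, the corresponding interactive audio information may be looked up based on that correspondence and transmitted through the first audio channel to the playback end for playback. The playback end may include a display and a speaker.
The interactive audio information may include interactive voice information and interactive video information. The interactive voice information may be transmitted through the first audio channel to the smart TV's speaker and played at the first volume (the first preset volume), and the interactive video information may be transmitted through the first audio channel to the smart TV's display for playback.
In the embodiments of the present disclosure, an interactive audio instruction may correspond to fixed or dynamic interactive audio information. For example, after receiving an interactive audio instruction such as "How big is your screen?", the matching fixed interactive audio information, such as "My screen is 55 inches", may be transmitted through the first audio channel to the playback end for playback. Alternatively, after receiving an interactive audio instruction such as "What time is it?", the matching dynamic interactive audio information, such as "The current time is exactly 3 p.m.", may be transmitted through the first audio channel to the playback end for playback.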
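The lookup of fixed and dynamic interactive audio information can be sketched with a local table. This is a hedged illustration: the table name `RESPONSES`, the function name, and the use of text strings in place of audio data are all assumptions. Fixed entries are stored as strings, and dynamic entries as callables evaluated at lookup time.

```python
from datetime import datetime
from typing import Callable, Dict, Optional, Union

# A response is either fixed text or a function that builds text on demand.
Response = Union[str, Callable[[], str]]

# Hypothetical locally stored correspondence between instructions and responses.
RESPONSES: Dict[str, Response] = {
    "How big is your screen?": "My screen is 55 inches.",
    "What time is it?": lambda: "The current time is " + datetime.now().strftime("%H:%M"),
}


def find_interactive_audio(instruction: str) -> Optional[str]:
    """Return the response matching the instruction, or None if nothing matches."""
    entry = RESPONSES.get(instruction)
    if entry is None:
        return None
    # Dynamic entries are computed when the instruction arrives.
    return entry() if callable(entry) else entry
```

A real device would match recognized speech more loosely than by exact string; exact matching is used only to keep the sketch short.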
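In the embodiments of the present disclosure, after an interactive audio instruction is received, the device control operation corresponding to it may be performed. Specifically, in practice, after an interactive audio instruction is received, the processor in the smart TV both feeds the matching interactive audio information back to the user and responds to the instruction by performing the corresponding device control operation. For example, after receiving an interactive audio instruction such as "Lower the screen brightness", the interactive audio information, such as "Low brightness can strain your eyes", may be transmitted through the first audio channel to the playback end for playback, and the brightness of the display is lowered in response to the instruction. As another example, after receiving an interactive audio instruction such as "Turn off the smart TV", a shutdown operation may be performed.
The voice interaction method provided by the embodiments of the present disclosure can control the volumes of interactive audio information and on-demand audio information separately through different audio channels, which improves the recognition efficiency of the interactive audio information and thereby the efficiency of human-computer interaction.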
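Further, the voice interaction method may also include:
after receiving a voice close instruction, closing the first audio channel, and switching the second audio channel from the target state to the working state.
In the embodiments of the present disclosure, after a voice close instruction is received, the first audio channel may be switched from on to off, and the second audio channel may be switched from the target state to the working state.
Switching the second audio channel from the target state to the working state may include: re-enabling the second audio channel that is in the off state; or switching the second audio channel from the low-volume state to a preset volume state.
Specifically, when the target state is the off state, the second audio channel may be switched from off to on and its second volume restored. When the target state is the low-volume state, the second volume of the second audio channel may be restored, or it may be set to a third preset volume, which may be a locally pre-stored volume.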
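Restoring the second audio channel can be sketched as follows. This is a minimal illustration under assumptions: the `SecondChannel` class, the `saved_volume` field used to remember the pre-interaction volume, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SecondChannel:
    enabled: bool
    volume: int
    saved_volume: Optional[int] = None  # volume before entering the target state


def enter_target_state(ch: SecondChannel, low_volume: Optional[int]) -> None:
    """Remember the current volume, then switch off (None) or lower the volume."""
    ch.saved_volume = ch.volume
    if low_volume is None:
        ch.enabled = False
    else:
        ch.volume = low_volume


def restore_working_state(ch: SecondChannel, third_preset_volume: Optional[int] = None) -> None:
    """Return to the working state after a voice close instruction."""
    was_off = not ch.enabled
    ch.enabled = True
    if not was_off and third_preset_volume is not None:
        # Low-volume case: optionally jump to a third preset volume.
        ch.volume = third_preset_volume
    elif ch.saved_volume is not None:
        # Otherwise restore the volume used before the interaction started.
        ch.volume = ch.saved_volume
    ch.saved_volume = None
```

For example, a channel at volume 40 that was lowered to 10 returns to 40 by default, or to 30 if a third preset volume of 30 is supplied.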
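In the embodiments of the present disclosure, the voice close instruction may be received in one of the following ways:
1. Receiving specific voice interaction close information sent by the user by voice, for example the voice messages "Turn off the voice interaction function" or "Let's end the chat". This information is configured to trigger generation of the voice close instruction.
2. Generating the voice close instruction after detecting that the user has pressed (for example, long-pressed) the voice interaction close control key on the smart TV.
3. Generating the voice close instruction after detecting that the user has clicked (for example, long-pressed or swiped) the voice interaction close control on the smart TV's display.
4. Starting a timer when the interactive audio instruction is received and, after a preset time interval has elapsed, determining that no next interactive audio instruction was received within that interval, then generating the voice close instruction. For example, the smart TV starts timing after receiving an interactive audio instruction and waits 10 minutes; if it determines that no new interactive audio instruction arrived during those 10 minutes, it may generate a voice close instruction that directs it to close the first audio channel and switch the second audio channel from the target state to the working state, or to perform other corresponding operations.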
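The timeout in item 4 can be sketched with an inactivity monitor. This is an illustrative sketch only: the class name is hypothetical, and timestamps are passed in explicitly (in seconds) so the logic can be checked without waiting. A device would typically feed it a monotonic clock reading.

```python
from typing import Optional


class InactivityMonitor:
    """Tracks the time since the last interactive audio instruction."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_instruction_at: Optional[float] = None

    def on_instruction(self, now: float) -> None:
        # Each new instruction restarts the timer.
        self.last_instruction_at = now

    def should_close(self, now: float) -> bool:
        # No instruction seen yet: nothing to time out.
        if self.last_instruction_at is None:
            return False
        return now - self.last_instruction_at >= self.timeout_s


monitor = InactivityMonitor(timeout_s=600)  # 10 minutes, as in the example above
monitor.on_instruction(now=0.0)
```

When `should_close` returns true, the caller would generate the voice close instruction and restore the channels.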
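Further, after the voice wake-up instruction is received, the method may also include:
looking up interactive audio information that matches the voice wake-up instruction, and transmitting it through the first audio channel to the playback end for playback.
In the embodiments of the present disclosure, after the voice wake-up instruction is received, the interactive audio information that matches it may be transmitted through the first audio channel to the playback end for playback.
As an exemplary implementation of the present disclosure, the interactive audio information corresponding to the voice wake-up instruction may be pre-stored locally, and after the voice wake-up instruction is received, that interactive audio information may be transmitted through the first audio channel to the playback end for playback.
For example, if the locally pre-stored interactive audio information for the voice wake-up instruction is the audio message "Happy to chat with you", that message may be played after the voice wake-up instruction is received.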
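Further, as shown in Figure 2, the first audio channel may also be used to transmit prompt audio information, and after the first audio channel is enabled, the method may also include:
S201. If prompt audio information to be played is detected, determine the transmission order of the prompt audio information and the interactive audio information based on the audio information transmission priority corresponding to the first audio channel.
S202. Based on the transmission order, transmit the prompt audio information and the interactive audio information in turn through the first audio channel to the playback end for playback.
Taking steps S201 and S202 together, the first audio channel may carry both prompt audio information and interactive audio information. If prompt audio information to be played is detected while the smart TV is interacting with the user by voice, the first transmission time range corresponding to that prompt audio information and the second transmission time range corresponding to the interactive audio information to be played may be obtained. If the two ranges intersect, the transmission order of the two may be determined based on the audio information transmission priority corresponding to the first audio channel, and they may be transmitted in that order through the first audio channel to the playback end for playback. If the two ranges do not intersect, the prompt audio information may be transmitted within the first transmission time range and the interactive audio information within the second transmission time range.
For example, suppose the first transmission time range of the prompt audio information to be played is 11:30:00 to 11:30:05 on March 31, 2020, and the second transmission time range of the interactive audio information to be played is 11:30:03 to 11:30:10 on March 31, 2020. The two ranges overlap, so the prompt audio information and the interactive audio information may be transmitted in turn through the first audio channel according to the audio information transmission priority corresponding to the first audio channel.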
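The overlap check and priority ordering can be sketched as below, using the time ranges from the example above. The text does not say which kind of audio has the higher priority, so the `PRIORITY` table (prompt before interactive) is purely an assumption, as are the class and function names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class PendingAudio:
    kind: str  # "prompt" or "interactive"
    start: datetime
    end: datetime


# Assumed priority for illustration only: a lower number is transmitted first.
PRIORITY = {"prompt": 0, "interactive": 1}


def ranges_intersect(a: PendingAudio, b: PendingAudio) -> bool:
    """Two closed time ranges intersect if each starts before the other ends."""
    return a.start <= b.end and b.start <= a.end


def transmission_order(a: PendingAudio, b: PendingAudio) -> List[PendingAudio]:
    if ranges_intersect(a, b):
        # Overlap: the channel priority decides the order.
        return sorted([a, b], key=lambda item: PRIORITY[item.kind])
    # No overlap: each item keeps its own time range, so order by start time.
    return sorted([a, b], key=lambda item: item.start)


prompt = PendingAudio("prompt", datetime(2020, 3, 31, 11, 30, 0), datetime(2020, 3, 31, 11, 30, 5))
interactive = PendingAudio("interactive", datetime(2020, 3, 31, 11, 30, 3), datetime(2020, 3, 31, 11, 30, 10))
order = transmission_order(interactive, prompt)
```

With the assumed priority, `order` places the prompt first because the two ranges share the interval 11:30:03 to 11:30:05.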
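Further, as shown in Figure 3, the first audio channel is also used to transmit prompt audio information, and after the first audio channel is closed, the method further includes:
S301. If prompt audio information to be played is detected, enable the first audio channel, and set the currently enabled second audio channel to the target state.
In the embodiments of the present disclosure, the first audio channel may carry both prompt audio information and interactive audio information. When the voice interaction function between the user and the smart TV has not been started, the first audio channel is off and the second audio channel is on. After prompt audio information to be played is detected, the first audio channel may be switched from off to on, and the second audio channel from on to the target state.
The first audio channel has a corresponding first preset volume. When the first audio channel is switched from off to on, its first volume may be set to the first preset volume. The first preset volume may be a locally pre-stored volume or a volume chosen by the user according to their own needs.
The target state may be the off state or the low-volume state. Switching the second audio channel from on to the target state may specifically include either switching it from on to off, or switching it from on to the low-volume state. The low-volume state has a corresponding second preset volume; that is, the second volume of the second audio channel is set to the second preset volume, and the on-demand audio information may then be transmitted through the second audio channel to the playback end and played at the second volume (the second preset volume). The second preset volume is lower than the first preset volume.
S302. Transmit the prompt audio information through the first audio channel to the playback end for playback, and after playback of the prompt audio information completes, close the first audio channel and switch the second audio channel from the target state to the working state.
In the embodiments of the present disclosure, the prompt audio information may be transmitted through the first audio channel to the playback end for playback. Each piece of prompt audio information has a corresponding playback duration. After the playback duration has elapsed, the first audio channel may be switched from on to off, and the second audio channel from the target state to the working state.
Switching the second audio channel from the target state to the working state may include: re-enabling the second audio channel that is in the off state; or switching the second audio channel from the low-volume state to a preset volume state.
Specifically, when the target state is the off state, the second audio channel may be switched from off to on and its second volume restored. When the target state is the low-volume state, the second volume of the second audio channel is restored, or it may be set to a third preset volume, which is a locally pre-stored volume.
The prompt audio information may include prompt voice information and prompt video information. The prompt voice information may be transmitted through the first audio channel to the smart TV's speaker and played at the first volume (the first preset volume), and the prompt video information may be transmitted through the first audio channel to the smart TV's display for playback.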
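The enable, play, restore sequence of steps S301 and S302 can be sketched with a context manager, which guarantees that the channels return to their idle configuration once the prompt finishes. The dictionary-based state and all names here are hypothetical, and the actual playback is replaced by recording what would be played.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, List, Tuple


@contextmanager
def first_channel_session(state: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """S301 on entry, the restore half of S302 on exit."""
    state["first"] = "on"
    state["second"] = "target"  # off or low volume
    try:
        yield state
    finally:
        # Runs after playback completes, even if playback raises.
        state["first"] = "off"
        state["second"] = "working"


def play_prompt(state: Dict[str, str], prompt: str, played: List[Tuple[str, str, str]]) -> None:
    with first_channel_session(state):
        # Stand-in for transmitting the prompt through the first channel.
        played.append((prompt, state["first"], state["second"]))


state = {"first": "off", "second": "working"}
played: List[Tuple[str, str, str]] = []
play_prompt(state, "Low battery on remote", played)
```

The `finally` clause is a design choice for the sketch: it keeps the on-demand audio from staying muted if playback fails partway through.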
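Based on the same inventive concept, the embodiments of the present disclosure also provide a voice interaction apparatus corresponding to the voice interaction method. Because the apparatus solves the problem on a principle similar to that of the method described above, the implementation of the apparatus can refer to the implementation of the method, and repeated details are omitted.
Referring to Figure 4, which is a schematic structural diagram of a voice interaction apparatus provided by an embodiment of the present disclosure, the voice interaction apparatus includes:
a first setting module 401, configured to, after a voice wake-up instruction is received, enable a first audio channel used to transmit interactive audio information, and set a currently enabled second audio channel used to transmit on-demand audio information to a target state, where the target state is an off state or a low-volume state;
a lookup module 402, configured to, after an interactive audio instruction is received, look up interactive audio information that matches the interactive audio instruction;
a first transmission module 403, configured to transmit the interactive audio information through the first audio channel to the playback end for playback.
In an embodiment of the present disclosure, the voice interaction apparatus further includes:
a second setting module, configured to, after a voice close instruction is received, close the first audio channel and switch the second audio channel from the target state to the working state.
In an embodiment of the present disclosure, the voice interaction apparatus further includes:
a second transmission module, configured to look up interactive audio information that matches the voice wake-up instruction and transmit it through the first audio channel to the playback end for playback.
In an embodiment of the present disclosure, the first audio channel is also used to transmit prompt audio information, and the voice interaction apparatus further includes:
a determination module, configured to, if prompt audio information to be played is detected, determine the transmission order of the prompt audio information and the interactive audio information based on the audio information transmission priority corresponding to the first audio channel;
a third transmission module, configured to, based on the transmission order, transmit the prompt audio information and the interactive audio information in turn through the first audio channel to the playback end for playback.
In an embodiment of the present disclosure, the first audio channel is also used to transmit prompt audio information, and the voice interaction apparatus further includes:
a third setting module, configured to, if prompt audio information to be played is detected, enable the first audio channel and set the currently enabled second audio channel to the target state;
a fourth transmission module, configured to transmit the prompt audio information through the first audio channel to the playback end for playback;
a fourth setting module, configured to, after playback of the prompt audio information completes, close the first audio channel and switch the second audio channel from the target state to the working state.
In an embodiment of the present disclosure, the switching of the second audio channel from the target state to the working state by the second setting module, or by the fourth setting module, includes:
re-enabling the second audio channel that is in the off state;
or,
switching the second audio channel from the low-volume state to a preset volume state.
The voice interaction apparatus provided by the embodiments of the present disclosure can control the volumes of interactive audio information and on-demand audio information separately through different audio channels, which improves the recognition efficiency of the interactive audio information and thereby the efficiency of human-computer interaction.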
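Referring to Figure 5, which shows an electronic device 500 provided by an embodiment of the present disclosure, the electronic device 500 includes a processor 501, a memory 502, and a bus. The memory 502 stores machine-readable instructions executable by the processor 501. When the electronic device is running, the processor 501 communicates with the memory 502 through the bus, and the processor 501 executes the machine-readable instructions to perform the steps of the voice interaction method described above.
Specifically, the memory 502 and the processor 501 can be a general-purpose memory and processor, which are not specifically limited here. When the processor 501 runs the computer program stored in the memory 502, the voice interaction method described above can be executed.
Corresponding to the voice interaction method described above, the embodiments of the present disclosure also provide a computer-readable storage medium on which a computer program is stored. When run by a processor, the computer program performs the steps of the voice interaction method described above.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems and apparatuses described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here. In the several embodiments provided by the present disclosure, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into modules is only a division by logical function, and other divisions are possible in an actual implementation. As another example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some communication interfaces, apparatuses, or modules, and may be electrical, mechanical, or in other forms.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the various embodiments of the present disclosure may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module.
If the functions are implemented in the form of software functional modules and sold or used as an independent product, they can be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure. The aforementioned storage media include USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store program code.
Finally, it should be noted that the embodiments described above are only specific implementations of the present disclosure, used to illustrate its technical solutions rather than to limit them, and the scope of protection of the present disclosure is not limited to them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that anyone familiar with this technical field may, within the technical scope disclosed here, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent replacements of some of the technical features. Such modifications, changes, or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments, and shall all be covered by the scope of protection of the present disclosure. Therefore, the scope of protection of the present disclosure shall be subject to the scope of protection of the claims.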
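Industrial applicability
The embodiments of the present disclosure provide a voice interaction method, apparatus, electronic device, and storage medium. Because the voice interaction apparatus of the embodiments includes two audio channels, it can control the volumes of interactive audio information and on-demand audio information separately through different audio channels, which improves the recognition efficiency of the interactive audio information and thereby the efficiency of human-computer interaction.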

Claims (15)

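  1. A voice interaction method, characterized in that the method comprises:
    after receiving a voice wake-up instruction, enabling a first audio channel used to transmit interactive audio information, and setting a currently enabled second audio channel used to transmit on-demand audio information to a target state, wherein the target state is an off state or a low-volume state;
    after receiving an interactive audio instruction, looking up interactive audio information that matches the interactive audio instruction, and transmitting the interactive audio information through the first audio channel to a playback end for playback.
  2. The voice interaction method according to claim 1, characterized in that the method further comprises:
    after receiving a voice close instruction, closing the first audio channel, and switching the second audio channel from the target state to a working state.
  3. The voice interaction method according to claim 1 or 2, characterized in that, after the voice wake-up instruction is received, the method further comprises:
    looking up interactive audio information that matches the voice wake-up instruction, and transmitting the interactive audio information through the first audio channel to the playback end for playback.
  4. The voice interaction method according to any one of claims 1 to 3, characterized in that the first audio channel is also used to transmit prompt audio information, and after the first audio channel is enabled, the method further comprises:
    if prompt audio information to be played is detected, determining the transmission order of the prompt audio information and the interactive audio information based on the audio information transmission priority corresponding to the first audio channel;
    based on the transmission order, transmitting the prompt audio information and the interactive audio information in turn through the first audio channel to the playback end for playback.
  5. The voice interaction method according to claim 4, characterized in that, before determining the transmission order of the prompt audio information and the interactive audio information based on the audio information transmission priority corresponding to the first audio channel, the method comprises:
    determining that a first transmission time range corresponding to the prompt audio information to be played intersects a second transmission time range corresponding to the interactive audio information.
  6. The voice interaction method according to any one of claims 1 to 5, characterized in that the first audio channel is also used to transmit prompt audio information, and after the first audio channel is closed, the method further comprises:
    if prompt audio information to be played is detected, enabling the first audio channel, and setting the currently enabled second audio channel to the target state;
    transmitting the prompt audio information through the first audio channel to the playback end for playback, and after playback of the prompt audio information completes, closing the first audio channel and switching the second audio channel from the target state to the working state.
  7. The voice interaction method according to any one of claims 2 to 6, characterized in that switching the second audio channel from the target state to the working state comprises:
    re-enabling the second audio channel that is in the off state;
    or,
    switching the second audio channel from the low-volume state to a preset volume state.
  8. The voice interaction method according to any one of claims 1 to 7, characterized in that the voice wake-up instruction comprises at least one of the following:
    a voice wake-up instruction generated based on voice interaction start information sent by the user;
    a voice wake-up instruction generated when a voice interaction start control key is pressed;
    a voice wake-up instruction generated when a voice interaction start control is clicked;
    a voice wake-up instruction sent by a remote control device.
  9. The voice interaction method according to any one of claims 1 to 8, characterized in that the voice close instruction comprises at least one of the following:
    a voice close instruction generated based on voice interaction close information sent by the user by voice;
    a voice close instruction generated when a voice interaction close control key is pressed;
    a voice close instruction generated when a voice interaction close control is clicked;
    a voice close instruction generated because no next interactive audio instruction is received within a preset time interval.
  10. The voice interaction method according to claim 9, characterized in that not receiving a next interactive audio instruction within the preset time interval comprises:
    starting a timer when the interactive audio instruction is received and, after the preset time interval has elapsed, determining that no next interactive audio instruction was received within the preset time interval.
  11. The voice interaction method according to any one of claims 1 to 10, characterized in that the method further comprises: after receiving the interactive audio instruction, performing the device control operation corresponding to the interactive audio instruction.
  12. A voice interaction apparatus, characterized in that the apparatus comprises:
    a first setting module, configured to, after a voice wake-up instruction is received, enable a first audio channel used to transmit interactive audio information, and set a currently enabled second audio channel used to transmit on-demand audio information to a target state, wherein the target state is an off state or a low-volume state;
    a lookup module, configured to, after an interactive audio instruction is received, look up interactive audio information that matches the interactive audio instruction;
    a first transmission module, configured to transmit the interactive audio information through the first audio channel to a playback end for playback.
  13. The voice interaction apparatus according to claim 12, characterized in that the apparatus further comprises:
    a second setting module, configured to, after a voice close instruction is received, close the first audio channel and switch the second audio channel from the target state to the working state.
  14. An electronic device, characterized by comprising a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor, and when the electronic device is running, the processor communicates with the memory through the bus and executes the machine-readable instructions to perform the steps of the voice interaction method according to any one of claims 1 to 11.
  15. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when run by a processor, the computer program performs the steps of the voice interaction method according to any one of claims 1 to 11.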
PCT/CN2020/127116 2020-04-02 2020-11-06 Voice interaction method, apparatus, electronic device, and storage medium WO2021196617A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010256089.3A CN111462744B (zh) 2020-04-02 2020-04-02 Voice interaction method, apparatus, electronic device, and storage medium
CN202010256089.3 2020-04-02

Publications (1)

Publication Number Publication Date
WO2021196617A1 true WO2021196617A1 (zh) 2021-10-07

Family

ID=71680542

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/127116 WO2021196617A1 (zh) 2020-04-02 2020-11-06 Voice interaction method, apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN111462744B (zh)
WO (1) WO2021196617A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114443197A (zh) * 2022-01-24 2022-05-06 北京百度网讯科技有限公司 Interface processing method, apparatus, electronic device, and storage medium
CN114836936A (zh) * 2022-05-10 2022-08-02 海信(山东)冰箱有限公司 Clothes treatment device and control method thereof

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
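CN111462744B (zh) * 2020-04-02 2024-01-30 深圳创维-Rgb电子有限公司 Voice interaction method, apparatus, electronic device, and storage medium
CN113207058B (zh) * 2021-05-06 2023-04-28 恩平市奥达电子科技有限公司 Transmission processing method for audio signals
CN113362826A (zh) * 2021-06-21 2021-09-07 艺唯科技股份有限公司 Apparatus and method for automatically switching voice channels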

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105867718A (zh) * 2015-12-10 2016-08-17 乐视网信息技术(北京)股份有限公司 Multimedia interaction method and apparatus
CN109151564A (zh) * 2018-09-03 2019-01-04 青岛海信电器股份有限公司 Microphone-based device control method and apparatus
CN109817214A (zh) * 2019-03-12 2019-05-28 百度在线网络技术(北京)有限公司 Interaction method and apparatus applied to vehicles
CN110290475A (zh) * 2019-05-30 2019-09-27 深圳米唐科技有限公司 In-vehicle human-computer interaction method, system, and computer-readable storage medium
US20190377545A1 (en) * 2016-02-22 2019-12-12 Sonos, Inc Metadata exchange involving a networked playback system and a networked microphone system
CN111462744A (zh) * 2020-04-02 2020-07-28 深圳创维-Rgb电子有限公司 Voice interaction method, apparatus, electronic device, and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
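CN100525413C (zh) * 2007-04-09 2009-08-05 海尔集团公司 Television audio playback method and television implementing the method
KR101604693B1 (ko) * 2009-07-01 2016-03-18 엘지전자 주식회사 Mobile terminal and method for controlling multimedia content thereof
CN102945672B (zh) * 2012-09-29 2013-10-16 深圳市国华识别科技开发有限公司 Voice control system and method for multimedia devices
CN108363557B (zh) * 2018-02-02 2020-06-12 刘国华 Human-computer interaction method, apparatus, computer device, and storage medium
CN108769745A (zh) * 2018-06-29 2018-11-06 百度在线网络技术(北京)有限公司 Video playback method and apparatus
CN109275025A (zh) * 2018-09-25 2019-01-25 四川长虹电器股份有限公司 Method for attenuating background sound during voice broadcast on a smart TV
CN110017848B (zh) * 2019-04-11 2020-09-29 北京三快在线科技有限公司 Voice navigation method, apparatus, electronic device, and storage medium
CN110166550B (zh) * 2019-05-22 2022-03-18 湖南康通电子股份有限公司 Timed broadcast method and apparatus for a digital broadcasting system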

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
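CN114443197A (zh) * 2022-01-24 2022-05-06 北京百度网讯科技有限公司 Interface processing method, apparatus, electronic device, and storage medium
CN114443197B (zh) * 2022-01-24 2024-04-09 北京百度网讯科技有限公司 Interface processing method, apparatus, electronic device, and storage medium
CN114836936A (zh) * 2022-05-10 2022-08-02 海信(山东)冰箱有限公司 Clothes treatment device and control method thereof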

Also Published As

Publication number Publication date
CN111462744B (zh) 2024-01-30
CN111462744A (zh) 2020-07-28

Similar Documents

Publication Publication Date Title
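WO2021196617A1 (zh) Voice interaction method, apparatus, electronic device, and storage medium
US11620984B2 (en) Human-computer interaction method, and electronic device and storage medium thereof
CN107340991B (zh) Voice role switching method, apparatus, device, and storage medium
WO2017193540A1 (zh) Bullet-comment playback method, playback apparatus, and playback system
JP7051799B2 (ja) Voice recognition control method, apparatus, electronic device, and readable storage medium
US11688389B2 (en) Method for processing voice signals and terminal thereof
TW202025139A (zh) Voice interaction method, apparatus, and system
CN110769189B (zh) Video conference switching method, apparatus, and readable storage medium
CN107655154A (zh) Terminal control method, air conditioner, and computer-readable storage medium
US20150163610A1 (en) Audio keyword based control of media output
WO2017156983A1 (zh) List invocation method and apparatus
CN104615359A (zh) Method and apparatus for voice operation of application software
JP7051798B2 (ja) Voice recognition control method, apparatus, electronic device, and readable storage medium
JP2014021493A (ja) External input control method and broadcast receiving apparatus applying the same
CN205693817U (zh) Touch-screen remote control and television system thereof
WO2017148270A1 (zh) Volume control method and apparatus, and terminal
WO2020135773A1 (zh) Data processing method, apparatus, and computer-readable storage medium
US20170171497A1 (en) Method and Device for Automatically Adjusting Volume
CN104423992A (zh) Method for activating voice recognition on a display
CN113676689A (zh) Video call method, apparatus, and television
CN109658924B (zh) Session message processing method, apparatus, and smart device
WO2021042584A1 (zh) Full-duplex voice dialogue method
KR20210025812A (ko) Electronic apparatus, display apparatus, and control method thereof
CN112786031B (zh) Human-machine dialogue method and system
JP7053693B2 (ja) Method, apparatus, device, and storage medium for ending a voice skill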

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20929488

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16.02.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20929488

Country of ref document: EP

Kind code of ref document: A1