CN112511877B - Intelligent television voice continuous conversation and interaction method - Google Patents
- Publication number
- CN112511877B (application CN202011420024.4A)
- Authority
- CN
- China
- Prior art keywords
- instruction
- continuous
- interaction
- voice
- state
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/4104—Peripherals receiving signals from specially adapted client devices
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/42204—User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/441—Acquiring end-user identification, e.g. using personal code sent by the remote control or by inserting a card
- H04N21/4415—Acquiring end-user identification using biometric characteristics of the user, e.g. by voice recognition or fingerprint scanning
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1815—Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
Abstract
The invention discloses a method for voice continuous conversation and interaction on a smart television. The server side judges the validity of the operation intention of each instruction, distinguishing effective instructions, weak intention instructions and rejectable instructions; the television device then performs different operations, UI displays and continuous-conversation interactions for instructions of different validity. After a single wake-up of the smart television, voice commands can be input continuously to control the television. This shortens the operation path, greatly reduces the complexity of voice use, lets the user operate the television as naturally as talking with a person, and markedly improves the experience of using the smart device.
Description
Technical Field
The invention relates to the technical field of intelligent voice interaction, and in particular to a voice continuous conversation and interaction method for a smart television.
Background
At present, voice on a smart television basically supports two interaction modes: single-round interaction, i.e. one wake-up, one interaction; and multi-round interaction, i.e. one wake-up, several interactions. In single-round interaction, sound pickup only starts after each wake-up; in multi-round interaction, although activation is not required every time, only a limited number of wake-free voice inputs are supported after activation. When using the television, the user therefore has to invoke voice frequently with the activation word before entering a new voice command, and cannot operate the television continuously and smoothly by voice. The main reason for this problem is that if voice stayed in the awakened state permanently, a quiet environment could not be guaranteed — people may keep talking in the room — so the television would continuously record external sound, unexpected semantic understanding and execution would easily occur, and the user could hardly use the television functions normally.
Existing smart products realize continuous voice interaction only for voice instructions under a fixed same-class service intention; the scene and the continuous-conversation interaction are fixed and inflexible, cannot interact across scenes and services, and cannot dynamically update the operable instructions under continuous conversation in real time.
Disclosure of Invention
The invention addresses two problems: first, when a smart television stays in the awakened state for a long time and continuously records external environment sound, wrong semantic understanding and unexpected execution easily occur, and continuous interaction becomes impossible; second, current continuous-conversation interaction is fixed to a single scene and inflexible. The invention therefore provides a voice continuous conversation and interaction method for a smart television: after a single voice activation, the television can carry on voice interaction continuously by performing effective semantic analysis on each voice instruction. Multiple instruction data sets are defined at the server side to support continuous conversation across scenes and services, with dynamically adjustable rules and data sets, thereby realizing cross-scene, cross-service continuous voice conversation. Compared with the prior art, the method defines a scene set and an instruction set within which sound can be picked up continuously, judges the validity of each voice instruction's intention, interacts differently with instructions of different validity, and realizes continuous-conversation interaction across scenes and services.
The invention realizes the purpose through the following technical scheme:
a method for voice continuous conversation and interaction of an intelligent television comprises the following steps:
step1, defining an effective instruction data set of continuous conversation;
effective scene instruction data set of continuous conversation: define the scenes that need customized handling of continuous-conversation interaction, dynamically configure an effective instruction set and rules for the instructions of each scene, and judge the instruction set of an effective scene, and the continuous-conversation interaction within that scene, with priority;
effective field data set of continuous conversation: define the set of voice fields, and the rules, that can support continuous-conversation interaction, and dynamically configure the field data in the set;
weak semantic instruction data set: among the hundreds of voice instructions or intentions supported by the effective field data set, further distinguish strong-semantic from weak-semantic intention instructions in each field; instructions with ambiguous intention or weak function are classified into the weak semantic instruction set, whose instruction-set rules can be dynamically configured;
step2, the server side judges the validity of the instruction;
for the continuous-conversation effective instruction set and the weak semantic instruction set dynamically configured in step1, the server side judges the validity of each instruction through an intention-rejection algorithm model based on a pre-trained language model and a convolutional neural network; validity is divided into effective instructions, weak intention instructions and rejectable instructions, and the corresponding semantic control intention is issued;
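The server's three-way verdict and the issued control intention can be pictured as follows. This is a minimal Python sketch, not the patent's implementation: the enum values and the dict payload fields are illustrative assumptions, since the patent does not specify a wire format between server and television.

```python
from enum import Enum

class Validity(Enum):
    """The three validity classes the server side assigns to an instruction."""
    EFFECTIVE = "effective"      # triggers or keeps continuous conversation
    WEAK_INTENTION = "weak"      # ambiguous intention: guidance reply only
    REJECTABLE = "rejectable"    # ignored while in continuous conversation

def issue_control_intent(validity: Validity, intent: str) -> dict:
    """Package the semantic control intention sent down to the television.

    The dict layout is a hypothetical stand-in for whatever protocol the
    server and device actually share.
    """
    return {"validity": validity.value, "intent": intent}
```

The device only needs the validity class plus the semantic intention to choose its reaction, which is why a flat payload suffices in this sketch.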
step3, voice continuous dialogue interactive display at the television equipment end;
After the validity judgment of step2, the television end displays different interactions and UIs, providing different interaction states for the user; through the different interaction effects and UI displays for each state, the user can intuitively perceive whether the function is executable, whether sound can be picked up continuously in the current state, and whether the conversation can be continued directly.
Further, in step2 the judgment proceeds as follows:
A. judge the effective scene instruction data set with priority: judge whether the current scene is in the effective scene set; in an effective scene, the instruction set of that scene is analyzed first and validity is judged according to the semantic intention;
B. if step A determines that the voice instruction is not an effective scene instruction, judge the effective field data set: if the voice instruction is not an intention instruction in the effective field set, it is judged to be a rejectable instruction;
C. if step B determines that the voice instruction is an effective field instruction, judge the weak semantic instruction set: if the voice instruction is not an intention instruction in the weak semantic set, it is judged to be an effective instruction; otherwise it is judged to be a weak intention instruction.
Further, the instruction judgment types behave as follows:
Effective instruction: in the non-continuous-conversation state, continuous-conversation interaction is triggered directly, the continuous pickup state is entered, and the instruction's intended control is performed; if already in the continuous-conversation state, that state and the continuous pickup and recording state are kept, and the current instruction is executed;
Weak intention instruction: in the non-continuous-conversation state, the original single-round or multi-round interaction state is kept; in the continuous-conversation state, only a guidance reply is given, without operation;
Rejectable instruction: in the non-continuous-conversation state, the original single-round or multi-round interaction state is kept; in the continuous-conversation state, the original continuous pickup state is kept, with no reply and no control.
Further, if the instruction is a rejectable instruction: in the non-continuous-conversation state, the original interaction state is kept and the related operation is performed; in the continuous-conversation state, the current continuous-conversation state is maintained, no operation is performed, and the UI shows a not-executed state.
Further, if the instruction is an effective instruction, the instruction's intended operation is executed; if not yet in continuous-conversation interaction, the voice enters the continuous-conversation UI state and the continuous pickup function is started; if already in continuous-conversation interaction, the interaction UI state and the continuous pickup and recording function are kept.
Further, if the instruction is a weak intention instruction: in the non-continuous-conversation state, the original interaction state is kept and the related operation is performed; in the continuous-conversation state, the interaction state and UI display effect are kept, and only a guidance reply is given, without operation.
The invention has the beneficial effects that:
the invention realizes the voice interaction of television control by continuously inputting the voice command after the smart television is awakened once by voice, shortens the operation path, greatly reduces the voice use complexity of the user, enables the user to operate the television as natural as the communication with people, and obviously improves the use experience of the smart equipment of the user.
The invention defines multiple effective instruction data sets under continuous conversation, providing data support for cross-scene, cross-service continuous-conversation interaction; the server side judges the validity of an instruction's operation intention, distinguishing effective instructions, weak intention instructions and rejectable instructions; and the television device performs different operations, UI displays and continuous-conversation interactions for instructions of different validity.
The invention is not limited to smart televisions and can be extended to other smart devices; it is not limited to a fixed instruction set, since each instruction set on the server side can be flexibly allocated and configured, and even a user's personalized instruction set can be dynamically updated later, providing a more intelligent experience.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without any inventive step, are within the scope of the present invention.
As shown in FIG. 1, the method for voice continuous conversation and interaction of a smart television of the present invention includes:
step1, defining an effective instruction data set of continuous conversation;
effective scene instruction data set of continuous conversation: a smart television has functional scenes such as network video playing, local video playing, song playing, radio listening, education and learning, and game entertainment; the demand for continuous conversation differs among scenes, and by defining an effective scene set, the different scenes are distinguished for continuous interaction.
Defining an effective scene instruction data set: define the scenes that need customized handling of continuous-conversation interaction, dynamically configure an effective instruction set and rules for the instructions of each scene, and judge the instruction set of an effective scene, and the continuous-conversation interaction within that scene, with priority;
scenes without customized handling are judged according to the effective field set of the voice.
Effective field data set of continuous conversation: voice instructions can be subdivided into fields such as listening to songs, watching videos, checking the weather and listening to radio stations; by defining an effective field set, fields with low usage or weak function for some users are left outside the effective set, so that they neither trigger continuous-conversation interaction nor interrupt an existing continuous conversation.
Defining an effective field data set: define the set of voice fields, and the rules, that can support continuous-conversation interaction, and dynamically configure the field data in the set;
Defining a weak semantic instruction data set: among the hundreds of voice instructions or intentions supported by the effective field data set, further distinguish strong-semantic from weak-semantic intention instructions in each field; instructions with ambiguous intention or weak function are classified into the weak semantic instruction set, whose instruction-set rules can be dynamically configured.
Step2, the server side judges the validity of the instruction;
For the continuous-conversation effective instruction set and the weak semantic instruction set dynamically configured in step1, the server side judges the validity of each instruction through an intention-rejection algorithm model based on a pre-trained language model and a convolutional neural network; validity is divided into effective instructions, weak intention instructions and rejectable instructions, and the corresponding semantic control intention is issued. The judgment proceeds as follows:
A. judge the effective scene instruction data set with priority: judge whether the current scene is in the effective scene set; in an effective scene, the instruction set of that scene is analyzed first and validity is judged according to the semantic intention;
B. if step A determines that the voice instruction is not an effective scene instruction, judge the effective field data set: if the voice instruction is not an intention instruction in the effective field set, it is judged to be a rejectable instruction;
C. if step B determines that the voice instruction is an effective field instruction, judge the weak semantic instruction set: if the voice instruction is not an intention instruction in the weak semantic set, it is judged to be an effective instruction; otherwise it is judged to be a weak intention instruction.
Step3, voice continuous dialogue interactive display at the television equipment end;
Through the instruction validity judgment of step2, the television end displays different interactions and UIs, providing different interaction states for the user. For each state, through different interaction effects and UI displays, the user can intuitively perceive whether the function is executable, whether sound can be picked up continuously in the current state, and whether the conversation can be continued directly.
The types of instruction judgment behave as follows:
Effective instruction: in the non-continuous-conversation state, continuous-conversation interaction is triggered directly, the continuous pickup state is entered, and the instruction's intended control is performed; if already in the continuous-conversation state, that state and the continuous pickup and recording state are kept, and the current instruction is executed;
Weak intention instruction: in the non-continuous-conversation state, the original single-round or multi-round interaction state is kept; in the continuous-conversation state, only a guidance reply is given, without operation;
Rejectable instruction: in the non-continuous-conversation state, the original single-round or multi-round interaction state is kept; in the continuous-conversation state, the original continuous pickup state is kept, with no reply and no control.
In an embodiment, as shown in fig. 1, a method for voice continuous conversation and interaction of a smart television of the present invention includes the following steps:
step1 defining an effective instruction data set of continuous conversation;
1.1 defining a valid scene instruction data set for a continuous dialog;
Define the scenes that need customized handling of continuous-conversation interaction; for example, the video application is defined as an effective scene App, and the private playback-control global instructions in this scene form an effective instruction set AppControl.
1.2 defining valid domain data sets for continuous dialogs;
Define the set of voice fields, and the rules, that can support continuous-conversation interaction, and dynamically configure the instruction set; for example, a field set DomainA of movie/video + music + TV control is defined, while other fields such as weather and chit-chat are not in the effective field set;
Distinguish strong-semantic from weak-semantic instructions in the effective fields, and define a weak semantic instruction set DomainWeak of some single nouns or ambiguous instructions, such as certain singer and actor names Stars; the configuration of DomainWeak can be adjusted dynamically according to market popularity and user preference.
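The example sets above (AppControl, DomainA, DomainWeak) can be pictured as plain configuration data. All member names in this sketch are illustrative placeholders — the patent deliberately leaves the concrete contents dynamically configurable:

```python
# Hypothetical configuration mirroring the examples in the text.
VALID_SCENES = {
    # the video application defined as effective scene "App", with its
    # private playback-control global instruction set AppControl
    "App": {"AppControl": {"pause", "resume", "fast_forward", "next_episode"}},
}

# effective field set DomainA: movie/video + music + TV control;
# weather, chit-chat, etc. are deliberately left outside the set
DOMAIN_A = {"movie_video", "music", "tv_control"}

# weak semantic instruction set DomainWeak: single nouns or ambiguous
# queries such as singer/actor names (Stars); adjustable with popularity
DOMAIN_WEAK = {"star_name_query"}
```

Because the sets are plain data rather than code, the server can push updated versions at any time, which is what makes the rules "dynamically configurable".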
Step2, the server side judges the validity of the instruction;
After receiving the reported voice instruction, the server side performs the following processing using the scene and continuous-conversation state information reported by the device side, and issues the semantic control intention of the instruction:
Step1: judge the semantic intention of the instruction;
Step2: judge against the effective scene instruction data set: if the current scene is the effective scene App and the current instruction is in that scene's effective instruction set AppControl, it is an effective instruction; otherwise proceed to Step3;
Step3: judge against the effective field data set: if the current instruction is in the effective field set DomainA, proceed to Step4; otherwise it is a rejectable instruction;
Step4: judge against the weak semantic instruction set: if the instruction is not in the weak semantic instruction set DomainWeak, it is an effective instruction; if it is in DomainWeak, it is a weak intention instruction.
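Steps 1-4 form a priority cascade, which might look like the following sketch. Set contents and parameter names are assumptions; in the patent the semantic judgment is made by the intention-rejection model, for which simple set membership stands in here:

```python
def classify_instruction(scene: str, intent: str, field: str,
                         valid_scenes: dict, valid_fields: set,
                         weak_semantic: set) -> str:
    """Priority cascade of Steps 2-4: scene set first, then field set,
    then weak-semantic set. Returns "effective", "rejectable" or "weak"."""
    # Step2: the effective scene instruction set is judged with priority
    if intent in valid_scenes.get(scene, set()):
        return "effective"
    # Step3: instructions outside the effective field set are rejectable
    if field not in valid_fields:
        return "rejectable"
    # Step4: inside an effective field, weak-semantic intents are weak
    return "weak" if intent in weak_semantic else "effective"

# Hypothetical configuration echoing AppControl / DomainA / DomainWeak;
# here each scene maps directly to its effective instruction set
scenes = {"App": {"pause", "next_episode"}}
fields = {"movie_video", "music", "tv_control"}
weak = {"star_name_query"}
```

For example, a "check the weather" request outside the App scene falls through Step2, fails the DomainA check at Step3, and is classified rejectable, so it cannot interrupt an ongoing continuous conversation.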
Step3, voice continuous dialogue interaction at the television equipment end;
Different UI display states are designed at the device end for continuous interaction and for normal interaction. After receiving the recording and reporting the instruction Query, the television end obtains the validity judgment of Query from step2 and presents the corresponding UI display state.
Judgment case 1: Query is an effective instruction, and the instruction's intended operation is executed; if not yet in continuous-conversation interaction, the voice enters the continuous-conversation UI state and the continuous pickup function is started; if already in continuous-conversation interaction, the interaction UI state and the continuous pickup and recording function are kept.
Judgment case 2: Query is a weak intention instruction; in the non-continuous-conversation state, the original interaction state is kept and the related operation is performed; in the continuous-conversation state, the interaction state and UI display effect are kept, and only a guidance reply is given, without operation. For example, if the instruction is a star name in Stars, in the continuous-conversation state the reply is only "you can say the movie of Query" or "you can say the song of Query".
Judgment case 3: Query is a rejectable instruction; in the non-continuous-conversation state, the original interaction state is kept and the related operation is performed; in the continuous-conversation state, the current continuous-conversation state is maintained, no operation is performed, and the UI shows a not-executed state, for example a grey-toned display of the instruction or reply.
In the continuous-conversation state, after each instruction is executed the device enters a waiting state; the UI does not block the display and operation of the window underneath, the pickup standby state is kept, and the voice operation state can be entered at any time.
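The three judgment cases can be summarized as a device-side reaction table. The following sketch uses hypothetical field names ("execute", "ui", "pickup") to express the behavior described above; they are not part of the patent:

```python
def device_react(validity: str, in_continuous_dialog: bool) -> dict:
    """Device-side behavior for judgment cases 1-3. Field names are
    illustrative; "pickup" means the continuous pickup/recording function."""
    if validity == "effective":
        # case 1: execute the intention; enter or keep continuous conversation
        return {"execute": True, "continuous": True, "pickup": True,
                "ui": "continuous_dialog"}
    if in_continuous_dialog:
        if validity == "weak":
            # case 2: guidance reply only, e.g. "you can say the movie of <Query>"
            return {"execute": False, "continuous": True, "pickup": True,
                    "ui": "guidance_reply"}
        # case 3: rejectable -> keep state, show as not executed (greyed out)
        return {"execute": False, "continuous": True, "pickup": True,
                "ui": "greyed_out"}
    # weak or rejectable outside continuous conversation: keep the
    # original single-round / multi-round interaction and perform it
    return {"execute": True, "continuous": False, "pickup": False,
            "ui": "normal"}
```

Note that in the continuous-conversation branch pickup stays on for every validity class — only execution and the UI differ — which is exactly what lets the conversation survive rejectable input.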
The above description covers only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can easily conceive within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention, which shall therefore be subject to the protection scope of the appended claims. It should be noted that the technical features described in the above embodiments can be combined in any suitable manner without contradiction; to avoid unnecessary repetition, the possible combinations are not described one by one. In addition, any combination of the various embodiments of the present invention is also possible and shall likewise be considered part of the disclosure of the present invention, as long as it does not depart from the spirit of the present invention.
Claims (4)
1. A method for voice continuous conversation and interaction of an intelligent television is characterized by comprising the following steps:
step1, defining an effective instruction data set of continuous conversation;
the effective scene instruction data set of continuous conversation: defining the scenes that need customized handling of continuous-conversation interaction, dynamically configuring an effective scene instruction data set and rules for the instructions of each scene, and judging the effective scene instruction data set of an effective scene, and the continuous-conversation interaction within that scene, with priority;
the effective field data set of continuous conversation: defining the set of voice fields, and the rules, that can support continuous-conversation interaction, and dynamically configuring the field data in the effective field data set;
the weak semantic instruction data set: among the hundreds of voice instructions or intentions supported by the effective field data set, further distinguishing strong-semantic from weak-semantic intention instructions in each field, classifying instructions with ambiguous intention or weak function into the weak semantic instruction data set, whose rules can be dynamically configured;
step2, the server side judges the validity of the instruction;
for the continuous-conversation effective instruction data set and the weak semantic instruction data set dynamically configured in step1, the server side judges the validity of the instruction through an intention-rejection algorithm model based on a pre-trained language model and a convolutional neural network; the validity is divided into an effective instruction, a weak intention instruction and a rejectable instruction, and a semantic control intention is issued;
in step2, the judgment proceeds as follows:
A. judging the effective scene instruction data set with priority: judging whether the current scene is in the effective scene set; in an effective scene, the effective scene instruction data set of that scene is analyzed first and validity is judged according to the semantic intention;
B. if step A determines that the voice instruction is not an effective scene instruction, judging the effective field data set: if the voice instruction is not an intention instruction in the effective field data set, it is judged to be a rejectable instruction;
C. if step B determines that the voice instruction is an effective field instruction, judging the weak semantic instruction data set: if the voice instruction is not an intention instruction in the weak semantic set, it is judged to be an effective instruction; otherwise it is judged to be a weak intention instruction;
step3, voice continuous dialogue interactive display at the television equipment end;
through the instruction validity judgment of step2, the television end displays different interactions and UIs to provide different interaction states for the user; for each state, through different interaction effects and UI displays, the user can intuitively perceive whether the function is executable, whether sound can be picked up continuously in the current state, and whether the conversation can be continued directly;
the types of instruction judgment behave as follows:
effective instruction: in the non-continuous-conversation state, continuous-conversation interaction is triggered directly, the continuous pickup state is entered, and the instruction's intended control is performed; if already in the continuous-conversation state, that state and the continuous pickup and recording state are kept, and the current instruction is executed;
weak intention instruction: in the non-continuous-conversation state, the original single-round or multi-round interaction state is kept; in the continuous-conversation state, only a guidance reply is given, without operation;
rejectable instruction: in the non-continuous-conversation state, the original single-round or multi-round interaction state is kept; in the continuous-conversation state, the original continuous pickup state is kept, with no reply and no control.
2. The method for intelligent television voice continuous conversation and interaction according to claim 1, wherein if the instruction is a rejectable instruction: in the non-continuous-conversation state, the original interaction state is kept and the related operation is performed; in the continuous-conversation state, the current continuous-conversation state is maintained, no operation is performed, and the UI shows a not-executed state.
3. The method according to claim 1, wherein if the command is valid, performing an intended operation of the command; if the voice is not in the continuous conversation interaction, entering a continuous conversation UI state, and starting a continuous pickup function; and in the continuous dialogue interaction, the voice keeps the interaction UI state and the continuous sound pickup and recording function is kept.
4. The method for intelligent television voice continuous conversation and interaction as claimed in claim 1, wherein if the instruction is a weak intention instruction, in a discontinuous conversation state, the original interaction state is maintained, and related operations are performed; and in the continuous conversation state, the interaction state and the UI display effect are kept, and only guidance reply is carried out without operation.
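Read together, claim 1's three-way classification and the dependent claims describe a small decision procedure. The sketch below is a minimal illustration of that logic only, not the patented implementation; all function names, field names, and the dictionary-based action format are hypothetical, and the weak-semantic set is modeled as a simple membership lookup:

```python
from enum import Enum, auto

class InstructionType(Enum):
    VALID = auto()           # in-domain, not in the weak-semantic set
    WEAK_INTENTION = auto()  # in-domain, matches the weak-semantic set
    REJECTABLE = auto()      # outside the valid domain

def classify(instruction: str, in_domain: bool, weak_semantic_set: set) -> InstructionType:
    """Steps B-C: domain check first, then weak-semantic set lookup."""
    if not in_domain:
        return InstructionType.REJECTABLE
    if instruction in weak_semantic_set:
        return InstructionType.WEAK_INTENTION
    return InstructionType.VALID

def handle(instr_type: InstructionType, in_continuous_dialogue: bool) -> dict:
    """Step 3 dispatch per claim 1 and dependent claims 2-4."""
    if instr_type is InstructionType.VALID:
        # Execute the intended control; enter or keep continuous pickup (claim 3).
        return {"execute": True, "continuous_dialogue": True,
                "reply": None, "ui": "continuous"}
    if instr_type is InstructionType.WEAK_INTENTION:
        if in_continuous_dialogue:
            # Guidance reply only, no operation; UI state kept (claim 4).
            return {"execute": False, "continuous_dialogue": True,
                    "reply": "guidance", "ui": "continuous"}
        # Keep the original single-/multi-round interaction.
        return {"execute": True, "continuous_dialogue": False,
                "reply": None, "ui": "single_round"}
    # Rejectable: in continuous dialogue, no reply and no control;
    # pickup continues and the UI shows a not-executed state (claim 2).
    if in_continuous_dialogue:
        return {"execute": False, "continuous_dialogue": True,
                "reply": None, "ui": "not_executed"}
    return {"execute": True, "continuous_dialogue": False,
            "reply": None, "ui": "single_round"}
```

The point of the dispatch is that only valid instructions can *start* continuous dialogue, while weak-intention and rejectable instructions can never break an ongoing one; they differ only in whether a guidance reply is produced.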
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011420024.4A CN112511877B (en) | 2020-12-07 | 2020-12-07 | Intelligent television voice continuous conversation and interaction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112511877A (en) | 2021-03-16 |
CN112511877B (en) | 2021-08-27 |
Family
ID=74971195
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011420024.4A Active CN112511877B (en) | 2020-12-07 | 2020-12-07 | Intelligent television voice continuous conversation and interaction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112511877B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113611316A (en) * | 2021-07-30 | 2021-11-05 | 百度在线网络技术(北京)有限公司 | Man-machine interaction method, device, equipment and storage medium |
CN114356275B (en) * | 2021-12-06 | 2023-12-29 | 上海小度技术有限公司 | Interactive control method and device, intelligent voice equipment and storage medium |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2010066519A (en) * | 2008-09-11 | 2010-03-25 | Brother Ind Ltd | Voice interactive device, voice interactive method, and voice interactive program |
CN103208283A (en) * | 2012-01-11 | 2013-07-17 | 三星电子株式会社 | Method and apparatus for executing a user function by using voice recognition |
CN106921911A (en) * | 2017-04-13 | 2017-07-04 | 深圳创维-Rgb电子有限公司 | Voice acquisition method and device |
CN109493856A (en) * | 2017-09-12 | 2019-03-19 | 合肥美的智能科技有限公司 | Identify method and apparatus, household electrical appliance and the machine readable storage medium of voice |
CN110335603A (en) * | 2019-07-12 | 2019-10-15 | 四川长虹电器股份有限公司 | Multi-modal exchange method applied to tv scene |
CN110503960A (en) * | 2019-09-26 | 2019-11-26 | 大众问问(北京)信息科技有限公司 | Uploaded in real time method, apparatus, equipment and the storage medium of speech recognition result |
CN110751948A (en) * | 2019-10-18 | 2020-02-04 | 珠海格力电器股份有限公司 | Voice recognition method, device, storage medium and voice equipment |
CN111081257A (en) * | 2018-10-19 | 2020-04-28 | 珠海格力电器股份有限公司 | Voice acquisition method, device, equipment and storage medium |
CN111583926A (en) * | 2020-05-07 | 2020-08-25 | 珠海格力电器股份有限公司 | Continuous voice interaction method and device based on cooking equipment and cooking equipment |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100484493B1 (en) * | 2002-12-12 | 2005-04-20 | 한국전자통신연구원 | Spontaneous continuous speech recognition system and method using mutiple pronunication dictionary |
CN103680505A (en) * | 2013-09-03 | 2014-03-26 | 安徽科大讯飞信息科技股份有限公司 | Voice recognition method and voice recognition system |
- 2020-12-07: CN application CN202011420024.4A filed (patent CN112511877B, status Active)
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||