CN110728981A - Interactive function execution method and device, electronic equipment and storage medium - Google Patents

Interactive function execution method and device, electronic equipment and storage medium

Info

Publication number
CN110728981A
Authority
CN
China
Prior art keywords: voice, interactive, instruction, multimedia client, voice instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910955133.7A
Other languages
Chinese (zh)
Inventor
赵丽娜
赵倩
白琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201910955133.7A
Publication of CN110728981A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • G06F 3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 Execution procedure of a spoken command

Abstract

The disclosure relates to an interactive function execution method and device, an electronic device, and a storage medium. The method includes: receiving a voice instruction, input through a multimedia client, that requests execution of an interactive function; identifying the interaction type of the requested interactive function based on the semantic content of the voice instruction; generating an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction, and executing the operation instruction; and returning the execution result of the operation instruction to the multimedia client so that the multimedia client displays the execution result. In this way, the execution efficiency of interactive functions can be improved.

Description

Interactive function execution method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of internet technologies, and in particular, to an interactive function execution method and apparatus, an electronic device, and a storage medium.
Background
With the development of internet technology, the services provided by multimedia clients to users have gradually diversified. For example, in addition to multimedia playing, a multimedia client may provide more types of interactive functions such as chatting, sharing, and liking.
In the related art, a user specifies, in the multimedia client, the interaction type of the interactive function to be executed and the parameters required by that interaction type through operations such as clicking or searching; the multimedia client then sends the identifier of the user-specified interaction type and the required parameters to a server; after receiving the identifier and the parameters, the server generates and executes an operation instruction based on them; finally, the execution result of the operation instruction is fed back to the multimedia client so that the multimedia client displays the execution result.
However, when the multimedia client provides a large number of interactive functions, locating the button or search box of the interactive function to be executed is tedious and time-consuming for the user, which results in low execution efficiency of the interactive function.
Disclosure of Invention
The disclosure provides an execution method and device of an interactive function, electronic equipment and a storage medium, so as to improve the execution efficiency of the interactive function. The technical scheme of the disclosure is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided an execution method of an interactive function applied to a server, including:
receiving a voice instruction for requesting execution of an interactive function, which is input through a multimedia client;
identifying an interaction type of the interaction function requested to be executed based on semantic content of the voice instruction;
generating an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction, and executing the operation instruction;
and returning the execution result of the operation instruction to the multimedia client so that the multimedia client displays the execution result.
Optionally, the identifying, based on the semantic content of the voice instruction, an interaction type of the interaction function requested to be performed includes:
converting semantic content of the voice instruction into a text sequence, wherein the text sequence is a sequence formed by each word in the semantic content and part-of-speech information of each word;
inputting the text sequence into an interactive classification model trained in advance to obtain an identification of an interactive type corresponding to the text sequence;
taking the interaction type corresponding to the obtained identification as the interaction type of the interaction function requested to be executed;
the interactive classification model is a model obtained by training based on a plurality of sample text sequences and the identification of the interactive type labeled on each sample text sequence.
Optionally, the converting semantic content of the voice instruction into a text sequence includes:
performing word segmentation processing on the semantic content of the voice instruction to obtain each segmented word and the part of speech of each segmented word;
and constructing a text sequence by taking each segmented word and its part of speech as sequence elements, the constructed text sequence serving as the text sequence converted from the semantic content of the voice instruction.
Optionally, the generating, according to the semantic content of the voice instruction, an operation instruction corresponding to the identified interaction type includes:
extracting operation key words from semantic contents of the voice instruction;
and filling the extracted operation key words into the instruction template of the identified interaction type to generate the operation instruction.
Optionally, the extracting an operation keyword from semantic content of the voice instruction includes:
performing word segmentation processing on the semantic content corresponding to the voice instruction to obtain each segmented word;
classifying the segmented words by using a pre-trained word segmentation classification model to obtain the interaction type corresponding to each segmented word;
extracting, from the segmented words, the segmented words whose interaction type is the same as the identified interaction type, to serve as the operation keywords;
wherein the word segmentation classification model is a model obtained by training based on sample segmented words and the identifiers of the interaction types corresponding to the sample segmented words.
Optionally, after returning the execution result of the operation instruction to the multimedia client, the method further includes:
and returning the feedback voice corresponding to the execution result to the multimedia client so that the multimedia client plays the feedback voice.
According to a second aspect of embodiments of the present disclosure, there is provided a method for performing an interactive function applied to a multimedia client, including:
receiving a voice instruction for requesting execution of an interactive function;
sending the voice instruction to a server so as to enable the server to identify the interaction type of the interaction function requested to be executed based on the semantic content of the voice instruction; generating an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction; executing the operation instruction and returning an execution result of the operation instruction;
and receiving and displaying the execution result.
Optionally, after presenting the execution result, the method further includes:
receiving feedback voice corresponding to the execution result sent by the server;
and playing the feedback voice.
According to a third aspect of embodiments of the present disclosure, there is provided an apparatus for performing an interactive function applied to a server, including:
a receiving module configured to receive a voice instruction for requesting execution of an interactive function, which is input through a multimedia client;
the recognition module is configured to recognize the interaction type of the interaction function requested to be executed based on the semantic content of the voice instruction;
the execution module is configured to generate an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction, and execute the operation instruction;
the first feedback module is configured to return an execution result of the operation instruction to the multimedia client, so that the multimedia client displays the execution result.
Optionally, the identification module includes: a conversion submodule and an identification submodule;
the conversion submodule is configured to convert semantic contents of the voice instruction into a text sequence, and the text sequence is a sequence formed by each word in the semantic contents and part-of-speech information of each word;
the recognition submodule is configured to input the text sequence into a pre-trained interactive classification model to obtain an identification of an interactive type corresponding to the text sequence; taking the interaction type corresponding to the obtained identification as the interaction type of the interaction function requested to be executed; the interactive classification model is a model obtained by training based on a plurality of sample text sequences and the identification of the interactive type labeled on each sample text sequence.
Optionally, the conversion sub-module is specifically configured to:
performing word segmentation processing on the semantic content of the voice instruction to obtain each segmented word and the part of speech of each segmented word;
and constructing a text sequence by taking each segmented word and its part of speech as sequence elements, the constructed text sequence serving as the text sequence converted from the semantic content of the voice instruction.
Optionally, the execution module includes an extraction submodule and a filling submodule;
the extraction sub-module is configured to extract operation keywords from semantic content of the voice instruction;
and the filling sub-module is configured to fill the extracted operation key words into the instruction template of the identified interaction type to generate the operation instruction.
Optionally, the extracting sub-module is specifically configured to:
performing word segmentation processing on the semantic content corresponding to the voice instruction to obtain each segmented word;
classifying the segmented words by using a pre-trained word segmentation classification model to obtain the interaction type corresponding to each segmented word;
extracting, from the segmented words, the segmented words whose interaction type is the same as the identified interaction type, to serve as the operation keywords;
wherein the word segmentation classification model is a model obtained by training based on sample segmented words and the identifiers of the interaction types corresponding to the sample segmented words.
Optionally, the apparatus further comprises: a second feedback module;
the second feedback module is configured to return feedback voice corresponding to the execution result to the multimedia client, so that the multimedia client plays the feedback voice.
According to a fourth aspect of embodiments of the present disclosure, there is provided an apparatus for performing an interactive function applied to a multimedia client, including:
a first receiving module configured to receive a voice instruction for requesting execution of an interactive function;
a sending module configured to send the voice instruction to a server so as to enable the server to identify an interaction type of the interaction function requested to be executed based on semantic content of the voice instruction; generating an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction; executing the operation instruction and returning an execution result of the operation instruction;
a presentation module configured to receive and present the execution result.
Optionally, the apparatus further comprises: the second receiving module and the playing module;
the second receiving module is configured to receive feedback voice corresponding to the execution result sent by the server;
the playing module is configured to play the feedback voice.
According to a fifth aspect of embodiments of the present disclosure, there is provided a server including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement any one of the above methods for executing the interactive function applied to the server.
According to a sixth aspect of embodiments of the present disclosure, there is provided a multimedia client device comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement any one of the above-mentioned methods for executing an interactive function applied to a multimedia client.
According to a seventh aspect of embodiments of the present disclosure, there is provided a storage medium having a computer program stored therein, the computer program, when executed by a processor, implementing any one of the above-described execution methods for an interactive function applied to a server.
According to an eighth aspect of embodiments of the present disclosure, there is provided a storage medium having a computer program stored therein, wherein the computer program, when executed by a processor, implements any one of the above-mentioned methods for executing an interactive function applied to a multimedia client.
According to a ninth aspect of embodiments of the present disclosure, there is provided a computer program product which, when run on a computer, causes the computer to perform any of the above-described execution methods applied to an interactive function of a server.
According to a tenth aspect of embodiments of the present disclosure, there is provided a computer program product which, when run on a computer, causes the computer to perform any of the above-described methods of performing an interactive function applied to a multimedia client.
The technical solution provided by the embodiments of the present disclosure brings at least the following beneficial effects: in this solution, the user does not need to search the multimedia client for the button or search box of the interactive function to be executed, but directly inputs a voice instruction into the multimedia client; the server corresponding to the multimedia client can identify, based on the semantic content of the voice instruction, the interaction type of the interactive function that the voice instruction requests to execute, generate an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction, and execute the operation instruction; the server then feeds back the execution result of the operation instruction to the multimedia client. In this way, the execution efficiency of interactive functions can be improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a flowchart illustrating a method of performing an interactive function applied to a server according to an exemplary embodiment.
Fig. 2 is a flowchart illustrating a method of performing an interactive function applied to a multimedia client according to an exemplary embodiment.
Fig. 3 is a block diagram illustrating an apparatus for performing an interactive function applied to a server according to an exemplary embodiment.
Fig. 4 is a block diagram illustrating an apparatus for performing an interactive function applied to a multimedia client according to an exemplary embodiment.
FIG. 5 is a block diagram illustrating a server in accordance with an example embodiment.
FIG. 6 is a block diagram illustrating a multimedia client device according to an example embodiment.
FIG. 7 is a block diagram illustrating an apparatus for performing interactive functions in accordance with an exemplary embodiment.
Fig. 8 is a block diagram illustrating another apparatus for performing interactive functions in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In order to improve the execution efficiency of the interactive function, the disclosure provides an interactive function execution method, an interactive function execution device, an electronic device and a storage medium.
The execution method of the interactive function provided by the present disclosure includes two execution methods, namely an execution method of the interactive function applied to a server and an execution method of the interactive function applied to a multimedia client. It can be understood that the execution subject of the execution method of the interactive function applied to the server may be an execution device of the interactive function in the server; the execution main body of the execution method of the interactive function applied to the multimedia client can be an execution device of the interactive function in the electronic equipment where the multimedia client is located. The multimedia client to which the present disclosure is directed may be a client for providing a short video service, but is not limited thereto.
First, a method for executing an interactive function applied to a server according to an embodiment of the present disclosure will be described in detail. As shown in fig. 1, the method may include the steps of:
s11: and receiving a voice command for requesting the execution of the interactive function, which is input through the multimedia client.
Wherein the voice instruction can be input to the multimedia client by a user of the multimedia client. For example, a voice input button may be provided in the multimedia client, and the user may input a voice command by clicking the button; or, a start voice may be preset in the multimedia client, and when the multimedia client monitors the start voice, a section of voice input by the user after the start voice may be used as the voice instruction, and of course, a section of voice including the start voice may also be used as the voice instruction; or, the multimedia client may start receiving a voice instruction when monitoring that the electronic device where the multimedia client is located is operated by shaking, touching, and the like according to a predetermined mode, and use a received voice as the voice instruction.
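For illustration only, the sketch below shows how a client might decide, for the three trigger modes just described (voice input button, preset start voice, predetermined shake or touch gesture), whether to start recording a voice instruction. The class and function names are hypothetical assumptions and are not part of any particular client SDK.

```python
# Illustrative sketch: deciding when the multimedia client starts capturing a voice instruction.
from dataclasses import dataclass

START_PHRASE = "hi client"          # assumed preset start voice

@dataclass
class ClientEvent:
    kind: str                       # "button_tap", "speech", or "gesture"
    payload: str = ""

def should_start_voice_capture(event: ClientEvent) -> bool:
    """Return True when the client should begin recording a voice instruction."""
    if event.kind == "button_tap":
        return True                                              # user tapped the voice input button
    if event.kind == "speech":
        return event.payload.lower().startswith(START_PHRASE)    # preset start voice detected
    if event.kind == "gesture":
        return event.payload in {"shake", "long_press"}          # predetermined operation mode
    return False

if __name__ == "__main__":
    print(should_start_voice_capture(ClientEvent("speech", "hi client play food videos")))  # True
```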
In addition, there are various interactive functions in the multimedia client, such as a search function, a recommendation function, a chat function, and a comment function, and the like, but not limited thereto.
S12: based on the semantic content of the voice instruction, an interaction type of the interaction function requested to be performed is identified.
It is understood that, in this step, based on the semantic content of the voice command, the interaction type of the interaction function requested to be performed by the voice command can be identified.
There are various specific implementation manners for identifying the interaction type of the interaction function requested to be executed based on the semantic content of the voice instruction. For example, in one implementation, identifying the interaction type of the interaction function requested to be performed based on the semantic content of the voice instruction may include:
converting semantic content of the voice instruction into a text sequence, wherein the text sequence is a sequence formed by each word in the semantic content and part-of-speech information of each word;
inputting the text sequence into an interactive classification model trained in advance to obtain an identification of an interactive type corresponding to the text sequence;
taking the interaction type corresponding to the obtained identification as the interaction type of the interaction function requested to be executed;
the interactive classification model is a model obtained by training based on a plurality of sample text sequences and the identification of the interactive type labeled on each sample text sequence.
It can be understood that the intermediate data produced before the interaction classification model outputs the identifier of the interaction type corresponding to the text sequence may include the probability that each word in the text sequence corresponds to the identifier of each interaction type; the identifier of the interaction type corresponding to the highest probability is then the identifier of the interaction type that the interaction classification model outputs for the text sequence.
When the interactive classification model is trained, natural-language corpora from a vertical domain or a general domain can be used as the sample text sequences, and the identifier of the interaction type corresponding to each sample text sequence can be obtained by labeling, so that the interactive classification model can be trained based on the sample text sequences and the identifiers of the interaction types corresponding to them. A loss value of the interactive classification model is calculated based on whether the identifier of the interaction type output by the model for a sample text sequence is consistent with the pre-labeled identifier of the interaction type for that sample text sequence; when the loss value is smaller than a preset first threshold, the interactive classification model converges and training is complete. The interactive classification model may be an SVM (Support Vector Machine) model, a CNN (Convolutional Neural Network) model, an RNN (Recurrent Neural Network) model, a DNN (Deep Neural Network) model, or the like. It can be understood that, when the interactive classification model converges, for any sample text sequence, the identifier of the interaction type output by the model may be the same as the pre-labeled identifier of the interaction type for that sample text sequence.
Correspondingly, after the interactive classification model is trained, the text sequence converted from the semantic content of the voice command is input into the interactive classification model, so that the identification of the interactive type corresponding to the text sequence can be obtained, and the interactive type of the interactive function requested to be executed by the voice command is identified.
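As a minimal sketch of such an interactive classification model, the example below uses a scikit-learn SVM, one of the model families named above, trained on a handful of made-up labeled text sequences. The library choice, sample data, and interaction-type identifiers are all assumptions rather than part of the disclosure.

```python
# Minimal sketch of the interactive classification model, assuming scikit-learn and an SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# sample text sequences (word/POS tokens) labeled with interaction-type identifiers (illustrative)
samples = [
    "please/s help/v me/r play/v food/n video/n bar/s",   # search
    "works/n of/u people/n i/r follow/v",                 # recommend
    "like/v this/r video/n",                              # comment
    "hello/s",                                            # chat
]
labels = ["search", "recommend", "comment", "chat"]

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(samples, labels)

text_sequence = "play/v pet/n video/n bar/s"
print(model.predict([text_sequence])[0])   # likely: "search"
```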
In this implementation, there are various specific ways to convert the semantic content of the voice instruction into a text sequence. For example, in a first implementation, converting the semantic content of the voice instruction into a text sequence may include: performing word segmentation processing on the semantic content of the voice instruction to obtain each segmented word; and constructing a text sequence by taking each segmented word as a sequence element, the constructed text sequence serving as the text sequence converted from the semantic content of the voice instruction.
In a second implementation, converting semantic content of the voice instruction into a text sequence may include:
performing word segmentation processing on the semantic content of the voice instruction to obtain each segmented word and the part of speech of each segmented word;
and constructing a text sequence by taking each segmented word and its part of speech as sequence elements, the constructed text sequence serving as the text sequence converted from the semantic content of the voice instruction.
In the second implementation, each segmented word and its part of speech are used as sequence elements, and the text sequence can be constructed according to the order of the segmented words in the semantic content. For example, assuming that the semantic content of the voice instruction is "play food video bar", the constructed text sequence may be "play/n food/n video/n bar/s", where slashes separate the segmented words and the letter after each slash denotes the part of speech of the segmented word before it. It can be understood that the ordering of the segmented words when constructing the text sequence may also be random; for example, the text sequence constructed from the semantic content "play food video bar" may also be "bar/s play/n video/n food/n".
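A minimal sketch of this conversion is given below, assuming the jieba library for Chinese word segmentation with part-of-speech tagging; any segmenter that outputs part-of-speech tags would work equally well.

```python
# Sketch of building the word/part-of-speech text sequence, assuming the jieba library.
import jieba.posseg as pseg

def to_text_sequence(semantic_content: str) -> str:
    """Segment the semantic content and join each word with its POS tag as word/tag."""
    pairs = pseg.cut(semantic_content)                 # yields (word, flag) pairs
    return " ".join(f"{word}/{flag}" for word, flag in pairs)

# e.g. for "play the food video" spoken in Chinese:
print(to_text_sequence("播放美食视频吧"))   # roughly: 播放/v 美食/n 视频/n 吧/y
```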
For clarity of the scheme, the following describes a process of identifying an interaction type of an interaction function requested to be performed based on semantic content of a voice instruction by taking a specific example as an example.
For example, assuming that the semantic content of the voice instruction is "please help me play a gourmet video bar", the text sequence converted from this semantic content is input into the interactive classification model, and the obtained identifier of the interaction type corresponding to the text sequence may be the identifier of the "search function"; accordingly, it can be determined that the voice instruction requests execution of the "search function". Or, assuming that the semantic content of the voice instruction is "works of the people I follow", the identifier obtained may be that of the "recommendation function", so it can be determined that the voice instruction requests execution of the "recommendation function". Or, assuming that the semantic content of the voice instruction is "like this video", the identifier obtained may be that of the "comment function", so it can be determined that the voice instruction requests execution of the "comment function". Or, assuming that the semantic content of the voice instruction is "hello", the identifier obtained may be that of the "chatting function", so it can be determined that the voice instruction requests execution of the "chatting function".
In addition, the specific implementation of performing word segmentation on the semantic content of the voice instruction is not an inventive point of the present disclosure; it is the same as or similar to existing word segmentation techniques and is not described again here.
S13: and generating an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction, and executing the operation instruction.
It can be understood that, when the interaction type of the interaction function requested to be executed by the voice instruction is determined, the operation instruction corresponding to the interaction type can be generated according to the semantic content of the voice instruction. Specifically, the content related to the operation instruction corresponding to the interaction type may be extracted from the semantic content, so as to generate the operation instruction corresponding to the interaction type. For clarity of the scheme and clarity of layout, a specific implementation manner of generating the operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction is illustrated subsequently.
In this step, after the operation instruction is generated, the operation instruction may be executed.
S14: and returning the execution result of the operation instruction to the multimedia client so that the multimedia client displays the execution result.
It can be understood that different interaction types correspond to different execution results. For clarity of the scheme and clarity of layout, the execution results corresponding to different interaction types are illustrated in the following.
In this step, after the multimedia client receives the execution result sent by the server, the execution result can be correspondingly displayed.
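Putting steps S11 to S14 together, the following sketch shows how the server-side flow could chain. The stand-in speech recognition, classification, keyword extraction, and template functions are hypothetical simplifications of the components described in this disclosure, not a definitive implementation.

```python
# End-to-end sketch of the server-side flow S11 to S14, with trivial stand-ins for each component.
def speech_to_text(audio: bytes) -> str:
    # stand-in for a real speech recognition component
    return audio.decode("utf-8")

def classify_interaction(semantic_content: str) -> str:
    # stand-in for the trained interactive classification model of step S12
    return "search" if "play" in semantic_content else "chat"

def extract_keywords(semantic_content: str, interaction_type: str) -> list:
    # stand-in for the keyword extraction of step S13
    stop_words = {"please", "help", "me", "play", "bar"}
    return [w for w in semantic_content.split() if w not in stop_words]

TEMPLATES = {
    "search": lambda kw: f"SEARCH videos WHERE name OR category CONTAINS '{kw[0]}'",
    "chat": lambda kw: f"REPLY TO '{' '.join(kw)}'",
}

def handle_voice_instruction(audio: bytes) -> dict:
    semantic_content = speech_to_text(audio)                    # S11: receive the voice instruction
    interaction_type = classify_interaction(semantic_content)   # S12: identify the interaction type
    keywords = extract_keywords(semantic_content, interaction_type)
    operation = TEMPLATES[interaction_type](keywords)           # S13: fill the instruction template
    result = f"executed: {operation}"                           #      and execute the operation
    return {"type": interaction_type, "result": result}         # S14: returned to the multimedia client

print(handle_voice_instruction(b"please help me play food video bar"))
```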
In the method for executing the interactive function provided by the embodiment of the disclosure, a user directly inputs a voice instruction at a multimedia client without searching a button or a search box of the interactive function to be executed in the multimedia client, and a server corresponding to the multimedia client can identify the interactive type of the interactive function requested to be executed based on the semantic content of the voice instruction, generate an operation instruction corresponding to the identified interactive type according to the semantic content of the voice instruction, and execute the operation instruction; then, the server feeds back the execution result of the operation instruction to the multimedia client. Therefore, the method and the device can improve the execution efficiency of the interactive function.
For clarity of the scheme and clarity of layout, a specific implementation manner for generating the operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction is illustrated below.
For example, in an implementation manner, generating an operation instruction corresponding to the identified interaction type according to semantic content of the voice instruction may include:
extracting operation key words from semantic contents of the voice instruction;
and filling the extracted operation key words into the instruction template of the identified interaction type to generate an operation instruction.
It can be understood that the instruction template corresponds to a software method; the operation instruction can be obtained by filling the extracted operation keyword into that software method as its input parameter.
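A minimal sketch of this idea is shown below: each interaction type maps to a software method, and the extracted operation keyword is passed in as its argument. The template functions and the in-memory video store are illustrative assumptions.

```python
# Sketch of "instruction template as a software method"; all data and functions are illustrative.
VIDEOS = [
    {"name": "street food tour", "category": "food", "likes": 12},
    {"name": "cute cats", "category": "pets", "likes": 30},
]

def search_template(keyword: str) -> list:
    """Instruction template (software method) for the search function."""
    return [v for v in VIDEOS if keyword in v["name"] or keyword == v["category"]]

def like_template(video_name: str) -> dict:
    """Instruction template for the like operation of the comment function."""
    for v in VIDEOS:
        if v["name"] == video_name:
            v["likes"] += 1
            return {"likes": v["likes"], "like_icon_lit": True}
    return {}

INSTRUCTION_TEMPLATES = {"search": search_template, "like": like_template}

# filling the extracted operation keyword into the method yields the executed operation instruction
print(INSTRUCTION_TEMPLATES["search"]("food"))     # videos whose name or category matches "food"
print(INSTRUCTION_TEMPLATES["like"]("cute cats"))  # like count incremented, like icon lit
```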
There are various specific implementation manners for extracting the operation keywords from the semantic content of the voice instruction. For example, in one implementation, the step of extracting the operation keyword from the semantic content of the voice instruction may include:
performing word segmentation processing on the semantic content corresponding to the voice instruction to obtain each segmented word;
classifying each segmented word by using a pre-trained word segmentation classification model to obtain the interaction type corresponding to each segmented word;
extracting, from the segmented words, the segmented words whose interaction type is the same as the identified interaction type, to serve as the operation keywords;
wherein the word segmentation classification model is a model obtained by training based on sample segmented words and the identifiers of the interaction types corresponding to the sample segmented words.
In this implementation, the training process of the word segmentation classification model may be similar to that of the interactive classification model. That is, when training the word segmentation classification model, a large vocabulary may be used as sample segmented words, and the identifier of the interaction type corresponding to each sample segmented word may be obtained by labeling. The word segmentation classification model is then trained based on the sample segmented words and the identifiers of the interaction types corresponding to them. A loss value of the word segmentation classification model is calculated based on whether the identifier of the interaction type output by the model for a sample segmented word is consistent with the pre-labeled identifier for that sample segmented word; when the loss value is smaller than a preset second threshold, the word segmentation classification model converges and training is complete. The second threshold may be the same as or different from the first threshold, and the present disclosure does not limit the values of the two thresholds or the magnitude relationship between them. The word segmentation classification model may also be an SVM (Support Vector Machine) model, a CNN (Convolutional Neural Network) model, an RNN (Recurrent Neural Network) model, a DNN (Deep Neural Network) model, or the like. It can be understood that, when the word segmentation classification model converges, for any sample segmented word, the identifier of the interaction type output by the model may be the same as the pre-labeled identifier of the interaction type for that sample segmented word.
In practical application, when the identifier of the interaction type corresponding to a sample segmented word is labeled, the label may take the form of probabilities that the sample segmented word belongs to the various interaction types. In this labeling manner, the probability that each sample segmented word belongs to the identifier of its corresponding interaction type may be set to 1, and the probabilities that it belongs to the identifiers of the other interaction types may be set to 0. In this way, after training of the word segmentation classification model is complete, each segmented word obtained by the word segmentation processing is input into the model to obtain the probability that it belongs to each interaction type, and the segmented words whose probability of belonging to the interaction type identified in step S12 is greater than a preset probability threshold can be used as the operation keywords.
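The sketch below illustrates this per-word classification with a probability threshold, assuming scikit-learn, a tiny illustrative vocabulary, character n-gram features, and a 0.5 threshold; all of these are arbitrary choices made for the example, not part of the disclosure.

```python
# Sketch of the word segmentation classification model with probability thresholding.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sample_words = ["food", "pets", "news", "hello", "bye", "like", "follow"]
sample_labels = ["search", "search", "search", "chat", "chat", "comment", "recommend"]

word_classifier = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
word_classifier.fit(sample_words, sample_labels)

def keywords_for(segmented_words, identified_type, threshold=0.5):
    """Keep only the words whose probability of the identified interaction type exceeds the threshold."""
    classes = list(word_classifier.named_steps["logisticregression"].classes_)
    col = classes.index(identified_type)
    probs = word_classifier.predict_proba(segmented_words)
    return [w for w, p in zip(segmented_words, probs[:, col]) if p > threshold]

print(keywords_for(["please", "play", "food", "video"], "search"))
```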
In another implementation, the step of extracting the operation keyword from the semantic content of the voice instruction may include:
performing word segmentation processing on the semantic content corresponding to the voice instruction to obtain each segmented word and the part of speech of each segmented word;
for each segmented word, inputting a text sequence formed by the segmented word and its part of speech into the trained interactive classification model to obtain the identifier of the interaction type corresponding to the segmented word;
and taking the segmented words whose corresponding identifier of the interaction type is the same as the identifier of the interaction type identified in step S12 as the operation keywords.
It can be understood that the interactive classification model is trained on natural-language corpora from a vertical domain or a general domain, and such corpora may include sentences as well as individual words. Therefore, the trained interactive classification model can predict the identifier of the interaction type corresponding to a sentence as well as that corresponding to a word; that is, the interactive classification model can predict the identifier of the interaction type corresponding to the semantic content of the voice instruction, or the identifier of the interaction type corresponding to a segmented word.
In one implementation, in the above two ways of extracting the operation keywords from the semantic content of the voice instruction, the word segmentation processing of the semantic content may be merged with the word segmentation processing performed in step S12 when the interaction type of the requested interactive function is identified. That is, after the semantic content of the voice instruction has been segmented in step S12, the result of that word segmentation processing can be reused directly, and the identifier of the interaction type corresponding to each word in the semantic content of the voice instruction can be predicted using the word segmentation classification model or the interactive classification model.
It is understood that the operation keywords of different types of interaction types are different. In the disclosure, firstly, the interaction type of the interaction function requested to be executed by the voice command is determined, so that the operation keywords related to the interaction type can be further extracted from the semantic content of the voice command in a targeted manner.
And after the operation key words are extracted, the operation key words can be filled in the instruction template of the identified interaction type, so that an operation instruction is generated and executed.
For clarity of the scheme, the following further describes an execution method of the interactive function provided by the embodiment of the present disclosure, taking a specific interactive function as an example.
Example 1, assuming that the semantic content of the voice command is "please help me play a gourmet video bar", it may be determined that the voice command requests execution of a "search function" based on the step of S12; based on the step of S13, an operation keyword "food" related to the interaction type of the "search function" may be extracted, and the "food" is filled into the instruction template of the interaction type of the "search function" to obtain an operation instruction for executing the search function with the "food" as a keyword; and executing the operation instruction to obtain a video search result of which the video name comprises food and/or the video category is food.
Example 2, assuming that the semantic content of the voice instruction is "works of the people I follow", it may be determined based on step S12 that the voice instruction requests execution of the "recommendation function"; based on step S13, the operation keyword "people I follow" related to the interaction type of the "recommendation function" may be extracted and filled into the instruction template of that interaction type, resulting in an operation instruction for performing video recommendation under the recommendation category "people I follow"; executing the operation instruction yields video recommendation results under the recommendation category "people I follow". Of course, there may be many recommendation categories in the recommendation function, such as "hot videos", "pet videos", "news videos", and "newest videos".
Example 3, assuming that the semantic content of the voice instruction is "hello", it may be determined based on step S12 that the voice instruction requests execution of the "chatting function"; based on step S13, the operation keyword "hello" related to the interaction type of the "chatting function" may be extracted and filled into the instruction template of that interaction type, obtaining the feedback voice corresponding to "hello". Here, the feedback voice is, for example, "hello, XXX", where XXX may be, for instance, "dear user" or the name of the specific user of the multimedia client; either is reasonable.
Example 4, assuming that the semantic content of the voice instruction is "like this video", it may be determined based on step S12 that the voice instruction requests execution of the "comment function"; based on step S13, the operation keyword "like" related to the interaction type of the "comment function" may be extracted and filled into the instruction template of that interaction type, obtaining an operation instruction for adding 1 to the like count of the video and lighting the like icon of the video; executing the operation instruction yields an execution result in which the like count of the video is increased by 1 and the like icon of the video is lit.
In addition, in the embodiment of the present disclosure, the operation keywords extracted from the semantic content of the voice instruction may include a plurality of operation keywords; correspondingly, the instruction template of the interaction type requested by the voice instruction may include filling positions for a plurality of operation keywords. For example, assume that the semantic content of the voice instruction is "say bye to Zhang San"; it may be determined based on step S12 that the voice instruction requests execution of the "chat function"; correspondingly, the instruction template of the interaction type of the chat function includes two filling positions for operation keywords, one for the chat object and one for the chat content. Filling "Zhang San" into the chat-object position and "bye" into the chat-content position yields the corresponding operation instruction; executing that operation instruction, the obtained execution result may be: opening a conversation with Zhang San and adding a new message whose content is "bye" to the conversation.
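A minimal sketch of such a multi-slot instruction template, following the "say bye to Zhang San" example, is shown below; the slot names and the returned structure are illustrative assumptions.

```python
# Sketch of an instruction template with two operation-keyword slots (chat object and chat content).
def chat_template(chat_object: str, chat_content: str) -> dict:
    """Open (or create) a conversation with chat_object and append chat_content as a new message."""
    return {
        "conversation_with": chat_object,
        "new_message": chat_content,
    }

# the extracted operation keywords are assigned to their slots and the template is filled
filled = chat_template(chat_object="Zhang San", chat_content="bye")
print(filled)   # {'conversation_with': 'Zhang San', 'new_message': 'bye'}
```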
It should be noted that the above-mentioned interaction functions and the operation keywords related to each interaction function are only examples, and should not be construed as limiting the disclosure.
Optionally, in an implementation manner, in order to enhance the user experience of the multimedia client, after the execution result of the operation instruction is returned to the multimedia client, the method for executing the interactive function applied to the server may further include:
and returning the feedback voice corresponding to the execution result to the multimedia client so that the multimedia client plays the feedback voice.
It can be understood that, since the user sends the voice command, the voice interaction between the user and the multimedia client can be established by feeding back the execution result of the interactive function requested to be executed by the voice command to the user through the feedback voice, so as to improve the user experience of the multimedia client.
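As one possible way to produce the feedback voice on the server side, the sketch below uses the pyttsx3 text-to-speech package purely as an example; any TTS engine could be substituted, and the function name and output path are hypothetical.

```python
# Sketch of synthesizing the feedback voice corresponding to an execution result, assuming pyttsx3.
import pyttsx3

def synthesize_feedback(text: str, out_path: str = "feedback.wav") -> str:
    engine = pyttsx3.init()
    engine.save_to_file(text, out_path)   # render the feedback text to an audio file
    engine.runAndWait()
    return out_path                       # this file would be returned to the multimedia client

synthesize_feedback("Found 12 food videos for you")
```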
Corresponding to the above-mentioned method for executing an interactive function applied to a server, an embodiment of the present disclosure further provides a method for executing an interactive function applied to a multimedia client, as shown in fig. 2, where the method may include:
s21: a voice instruction requesting execution of an interactive function is received.
Here, the voice instruction may be issued by a user of the multimedia client. For example, a voice input button may be provided in the multimedia client, and the user may input a voice command by clicking the button; or, a start voice may be preset in the multimedia client, and when the multimedia client monitors the start voice, a section of voice input by the user after the start voice may be used as the voice instruction, and of course, a section of voice including the start voice may also be used as the voice instruction; or, the multimedia client may start receiving the voice instruction when monitoring that the device where the multimedia client is located is operated by shaking, touching, and the like according to a predetermined mode, and use the received voice as the voice instruction.
In addition, there are various interactive functions in the multimedia client, such as a search function, a recommendation function, a chat function, and a comment function, and the like, but not limited thereto.
S22: sending the voice instruction to a server so that the server identifies the interaction type of the interaction function requested to be executed based on the semantic content of the voice instruction; generating an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction; and executing the operation instruction and returning the execution result of the operation instruction.
In this step, as to the specific implementation manner of each step executed by the server, detailed description has been already made in the method for executing the interactive function applied to the server provided in the embodiment of the present disclosure, and details are not described here again.
S23: and receiving and displaying an execution result.
It can be understood that, because the execution results of different interactive functions differ, the multimedia client displays the received execution results in different ways for different interactive functions. Taking examples 1 to 4 in the above embodiment as an example: in example 1, the execution result of the search function whose operation keyword is "food" may be video search results whose video name contains "food" and/or whose video category is "food"; accordingly, the multimedia client can display those video search results. In example 2, the execution result of the "recommendation function" whose operation keyword is "people I follow" may be video recommendation results under the recommendation category "people I follow"; accordingly, the multimedia client can display those video recommendation results. In example 3, the execution result of the "chatting function" whose operation keyword is "hello" may be the feedback voice corresponding to "hello"; accordingly, the multimedia client can play that feedback voice. In example 4, the execution result of the "comment function" whose operation keyword is "like" may be that the like count of the video is increased by 1 and the like icon of the video is lit; accordingly, the multimedia client can light the video's like icon and add 1 to the displayed like count.
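The sketch below illustrates how a client might dispatch the received execution result to different presentation paths for examples 1 to 4; the result structure and handler behavior are assumptions, and a real client would render native UI rather than print.

```python
# Sketch of client-side presentation of execution results, keyed on the interaction type.
def present_execution_result(result: dict) -> None:
    kind = result["type"]
    if kind == "search":
        for video in result["videos"]:
            print("show search hit:", video)                 # example 1: show matching videos
    elif kind == "recommend":
        for video in result["videos"]:
            print("show recommendation:", video)             # example 2: "people I follow" feed
    elif kind == "chat":
        print("play feedback voice:", result["audio"])       # example 3: play the feedback voice
    elif kind == "comment":
        print("light like icon, likes =", result["likes"])   # example 4: update like count and icon

present_execution_result({"type": "comment", "likes": 101})
```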
In the method for executing the interactive function applied to the multimedia client, the user does not need to search a button or a search box and the like of the interactive function to be executed in the multimedia client, and directly sends the voice instruction, so that the multimedia client can send the voice instruction to the server and receive the execution result of the interactive function requested to be executed by the voice instruction fed back by the server. Therefore, the method and the device can improve the execution efficiency of the interactive function.
Optionally, in an implementation manner, in order to enhance the user experience of the multimedia client, after the execution result is presented, the method for executing the interactive function applied to the multimedia client may further include:
receiving feedback voice corresponding to an execution result sent by a server;
and playing the feedback voice.
It can be understood that, since the user sends the voice command, the execution result of the interactive function requested to be executed by the voice command is fed back to the user by the feedback voice, and the voice interaction between the user and the multimedia client can be established, so as to improve the user experience of the multimedia client.
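As an illustrative sketch only, assuming the received feedback voice has been saved as a local audio file and using the playsound package purely as an example, client-side playback could be as simple as:

```python
# Sketch of playing the received feedback voice on the client, assuming the playsound package.
from playsound import playsound

def play_feedback_voice(audio_path: str) -> None:
    playsound(audio_path)   # blocks until the feedback voice has finished playing

# play_feedback_voice("feedback.wav")
```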
Corresponding to the above-mentioned method for executing the interactive function applied to the server, an embodiment of the present disclosure further provides an apparatus for executing the interactive function applied to the server, as shown in fig. 3, the apparatus may include: a receiving module 301, an identifying module 302, an executing module 303 and a first feedback module 304.
Wherein, the receiving module 301 is configured to receive a voice instruction for requesting to execute an interactive function, which is input through a multimedia client;
the recognition module 302 is configured to recognize an interaction type of the interaction function requested to be performed based on semantic content of the voice instruction;
the execution module 303 is configured to generate an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction, and execute the operation instruction;
the first feedback module 304 is configured to return the execution result of the operation instruction to the multimedia client, so that the multimedia client displays the execution result.
Optionally, the identifying module 302 includes: a conversion submodule and an identification submodule;
the conversion submodule is configured to convert semantic contents of the voice instruction into a text sequence, and the text sequence is a sequence formed by each word in the semantic contents and part-of-speech information of each word;
the recognition sub-module is configured to input the text sequence into a pre-trained interactive classification model to obtain an identification of an interactive type corresponding to the text sequence; taking the interaction type corresponding to the obtained identification as the interaction type of the interaction function requested to be executed; the interactive classification model is a model obtained by training based on a plurality of sample text sequences and the identification of the interactive type labeled on each sample text sequence.
Optionally, the conversion sub-module is specifically configured to:
performing word segmentation processing on the semantic content of the voice instruction to obtain each segmented word and the part of speech of each segmented word;
and constructing a text sequence by taking each segmented word and its part of speech as sequence elements, the constructed text sequence serving as the text sequence converted from the semantic content of the voice instruction.
Optionally, the executing module 303 includes an extracting sub-module and a filling sub-module;
the extraction submodule is configured to extract operation keywords from semantic contents of the voice instruction;
the filling sub-module is configured to fill the extracted operation keywords into the instruction template of the identified interaction type to generate an operation instruction.
Optionally, the extracting sub-module is specifically configured to:
performing word segmentation processing on the semantic content corresponding to the voice instruction to obtain each segmented word;
classifying each segmented word by using a pre-trained word segmentation classification model to obtain the interaction type corresponding to each segmented word;
extracting, from the segmented words, the segmented words whose interaction type is the same as the identified interaction type, to serve as the operation keywords;
wherein the word segmentation classification model is a model obtained by training based on sample segmented words and the identifiers of the interaction types corresponding to the sample segmented words.
Optionally, the apparatus may further include: a second feedback module;
the second feedback module is configured to return feedback voice corresponding to the execution result to the multimedia client, so that the multimedia client plays the feedback voice.
According to the execution device of the interactive function applied to the server, a user does not need to search a button or a retrieval frame and the like of the interactive function to be executed in a multimedia client, a voice instruction is directly input into the multimedia client, the server corresponding to the multimedia client can identify the interactive type of the interactive function requested to be executed based on the semantic content of the voice instruction, an operation instruction corresponding to the identified interactive type is generated according to the semantic content of the voice instruction, and the operation instruction is executed; then, the server feeds back the execution result of the operation instruction to the multimedia client. Therefore, the method and the device can improve the execution efficiency of the interactive function.
Corresponding to the above-mentioned method for executing an interactive function applied to a multimedia client, an embodiment of the present disclosure further provides an apparatus for executing an interactive function applied to a multimedia client. As shown in Fig. 4, the apparatus may include: a first receiving module 401, a sending module 402 and a presentation module 403.
Wherein, the first receiving module 401 is configured to receive a voice instruction for requesting to execute an interactive function;
the sending module 402 is configured to send the voice instruction to the server, so that the server identifies the interaction type of the interaction function requested to be executed based on the semantic content of the voice instruction, generates an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction, executes the operation instruction, and returns the execution result of the operation instruction;
the presentation module 403 is configured to receive and present the execution result.
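Purely as an illustration of this client-side flow (the transport, endpoint URL, and field names below are assumptions; the disclosure does not constrain how the voice instruction is transmitted to the server):

```python
# Minimal sketch of the client-side modules 401-403; HTTP and the endpoint
# are assumed, and any request/response channel to the server would do.
import requests

SERVER_URL = "https://example.com/interactive-function"  # hypothetical endpoint

def request_interactive_function(voice_instruction: bytes) -> dict:
    """Send the recorded voice instruction to the server and return the
    execution result so that the client can present it."""
    response = requests.post(SERVER_URL, files={"voice": voice_instruction})
    response.raise_for_status()
    return response.json()  # execution result presented by module 403
```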
Optionally, the apparatus may further include: the second receiving module and the playing module;
the second receiving module is configured to receive feedback voice corresponding to the execution result sent by the server;
the playing module is configured to play the feedback voice.
With the above interactive function execution apparatus applied to the multimedia client, a user does not need to look in the multimedia client for a button, search box, or other control of the interactive function to be executed; the user simply issues the voice instruction. The multimedia client sends the voice instruction to the server and receives the execution result, fed back by the server, of the interactive function requested by the voice instruction. Therefore, the execution efficiency of the interactive function can be improved.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 5 is a block diagram illustrating a server according to an exemplary embodiment. As shown in Fig. 5, the server includes:
a processor 510;
a memory 520 for storing instructions executable by the processor 510;
wherein the processor 510 is configured to execute the instructions to implement any one of the above-mentioned methods for executing the interaction function applied to the server.
Fig. 6 is a block diagram illustrating a multimedia client device according to an exemplary embodiment. As shown in Fig. 6, the multimedia client device includes:
a processor 610;
a memory 620 for storing instructions executable by the processor 610;
wherein the processor 610 is configured to execute the instructions to implement any one of the above-mentioned methods for executing the interactive function applied to the multimedia client.
It is understood that the multimedia client device is the electronic device on which the multimedia client runs.
Fig. 7 is a block diagram illustrating an apparatus 700 for performing interactive functions in accordance with an example embodiment. For example, the apparatus 700 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 7, apparatus 700 may include one or more of the following components: a processing component 702, a memory 704, a power component 706, a multimedia component 708, an audio component 710, an input/output (I/O) interface 712, a sensor component 714, and a communication component 716.
The processing component 702 generally controls the overall operation of the apparatus 700, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 702 may include one or more processors 720 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 702 may include one or more modules that facilitate interaction between the processing component 702 and other components. For example, the processing component 702 may include a multimedia module to facilitate interaction between the multimedia component 708 and the processing component 702.
The memory 704 is configured to store various types of data to support operations at the apparatus 700. Examples of such data include instructions for any application or method operating on device 700, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 704 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 706 provides power to the various components of the apparatus 700. The power component 706 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 700.
The multimedia component 708 includes a screen that provides an output interface between the device 700 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 708 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 700 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 710 is configured to output and/or input audio signals. For example, audio component 710 includes a Microphone (MIC) configured to receive external audio signals when apparatus 700 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 704 or transmitted via the communication component 716. In some embodiments, audio component 710 also includes a speaker for outputting audio signals.
The I/O interface 712 provides an interface between the processing component 702 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 714 includes one or more sensors for providing status assessments of various aspects of the apparatus 700. For example, the sensor assembly 714 may detect an open/closed state of the apparatus 700 and the relative positioning of components, such as the display and keypad of the apparatus 700. The sensor assembly 714 may also detect a change in position of the apparatus 700 or of a component of the apparatus 700, the presence or absence of user contact with the apparatus 700, the orientation or acceleration/deceleration of the apparatus 700, and a change in the temperature of the apparatus 700. The sensor assembly 714 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 714 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 714 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 716 is configured to facilitate wired or wireless communication between the apparatus 700 and other devices. The apparatus 700 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 716 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 716 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 700 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a storage medium comprising instructions, such as the memory 704 comprising instructions, is also provided; the instructions are executable by the processor 720 of the apparatus 700 to perform the above-described method for executing an interactive function. Alternatively, the storage medium may be a non-transitory computer-readable storage medium, which may be, for example, a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Fig. 8 is a block diagram illustrating an apparatus 800 for performing interactive functions according to an exemplary embodiment. For example, the apparatus 800 may be provided as a server. Referring to Fig. 8, the apparatus 800 includes a processing component 822, which further includes one or more processors, and memory resources, represented by a memory 832, for storing instructions, such as application programs, that are executable by the processing component 822. The application programs stored in the memory 832 may include one or more modules, each of which corresponds to a set of instructions. Further, the processing component 822 is configured to execute the instructions to perform the above-described method for executing an interactive function.
The apparatus 800 may also include a power component 826 configured to perform power management of the apparatus 800, a wired or wireless network interface 850 configured to connect the apparatus 800 to a network, and an input/output (I/O) interface 858. The apparatus 800 may operate based on an operating system stored in the memory 832, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or a similar operating system.
In an exemplary embodiment, there is also provided a storage medium having a computer program stored therein, which when executed by a processor, implements any one of the above-described execution methods of an interactive function applied to a server.
In an exemplary embodiment, there is also provided a storage medium having a computer program stored therein, which, when executed by a processor, implements any one of the above-described methods for executing an interactive function applied to a multimedia client.
Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, there is also provided a computer program product which, when run on a computer, causes the computer to perform any one of the above-described methods for executing an interactive function applied to a server.
In an exemplary embodiment, there is also provided a computer program product which, when run on a computer, causes the computer to perform any one of the above-described methods for executing an interactive function applied to a multimedia client.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the claims.

Claims (10)

1. An interactive function executing method applied to a server, the method comprising:
receiving a voice instruction for requesting execution of an interactive function, which is input through a multimedia client;
identifying an interaction type of the interaction function requested to be executed based on semantic content of the voice instruction;
generating an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction, and executing the operation instruction;
and returning the execution result of the operation instruction to the multimedia client so that the multimedia client displays the execution result.
2. The method of claim 1, wherein identifying the interaction type of the interaction function requested to be performed based on the semantic content of the voice instruction comprises:
converting semantic content of the voice instruction into a text sequence, wherein the text sequence is a sequence formed by each word in the semantic content and part-of-speech information of each word;
inputting the text sequence into an interactive classification model trained in advance to obtain an identification of an interactive type corresponding to the text sequence;
taking the interaction type corresponding to the obtained identification as the interaction type of the interaction function requested to be executed;
the interactive classification model is a model obtained by training based on a plurality of sample text sequences and the identification of the interactive type labeled on each sample text sequence.
3. The method of claim 1, wherein after returning the result of the execution of the operation instruction to the multimedia client, the method further comprises:
and returning the feedback voice corresponding to the execution result to the multimedia client so that the multimedia client plays the feedback voice.
4. An interactive function executing method applied to a multimedia client, the method comprising:
receiving a voice instruction for requesting execution of an interactive function;
sending the voice instruction to a server so as to enable the server to identify the interaction type of the interaction function requested to be executed based on the semantic content of the voice instruction; generating an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction; executing the operation instruction and returning an execution result of the operation instruction;
and receiving and displaying the execution result.
5. An apparatus for executing an interactive function, applied to a server, the apparatus comprising:
a receiving module configured to receive a voice instruction for requesting execution of an interactive function, which is input through a multimedia client;
the recognition module is configured to recognize the interaction type of the interaction function requested to be executed based on the semantic content of the voice instruction;
the execution module is configured to generate an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction, and execute the operation instruction;
the first feedback module is configured to return an execution result of the operation instruction to the multimedia client, so that the multimedia client displays the execution result.
6. An apparatus for performing an interactive function, applied to a multimedia client, the apparatus comprising:
a first receiving module configured to receive a voice instruction for requesting execution of an interactive function;
a sending module configured to send the voice instruction to a server so as to enable the server to identify an interaction type of the interaction function requested to be executed based on semantic content of the voice instruction; generating an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction; executing the operation instruction and returning an execution result of the operation instruction;
a presentation module configured to receive and present the execution result.
7. A server, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to carry out the method steps of any one of claims 1-3.
8. A multimedia client device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to carry out the method steps of claim 4.
9. A storage medium, characterized in that a computer program is stored in the storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1-3.
10. A storage medium, characterized in that a computer program is stored in the storage medium, which computer program, when being executed by a processor, carries out the method steps of claim 4.
CN201910955133.7A 2019-10-09 2019-10-09 Interactive function execution method and device, electronic equipment and storage medium Pending CN110728981A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910955133.7A CN110728981A (en) 2019-10-09 2019-10-09 Interactive function execution method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN110728981A true CN110728981A (en) 2020-01-24

Family

ID=69219758

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910955133.7A Pending CN110728981A (en) 2019-10-09 2019-10-09 Interactive function execution method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110728981A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102543082A (en) * 2012-01-19 2012-07-04 北京赛德斯汽车信息技术有限公司 Voice operation method for in-vehicle information service system adopting natural language and voice operation system
US20140310004A1 (en) * 2013-04-10 2014-10-16 Via Technologies, Inc. Voice control method, mobile terminal device, and voice control system
CN104078043A (en) * 2013-04-26 2014-10-01 腾讯科技(深圳)有限公司 Method and system for recognition of voice operational command of network transaction system
CN104462347A (en) * 2014-12-04 2015-03-25 北京国双科技有限公司 Keyword classifying method and device
CN106254915A (en) * 2016-07-29 2016-12-21 乐视控股(北京)有限公司 Exchange method based on television terminal, Apparatus and system
CN106847284A (en) * 2017-03-09 2017-06-13 深圳市八圈科技有限公司 Electronic equipment, computer-readable recording medium and voice interactive method
CN109559744A (en) * 2018-12-12 2019-04-02 泰康保险集团股份有限公司 Processing method, device and the readable storage medium storing program for executing of voice data
CN109903755A (en) * 2019-02-26 2019-06-18 珠海格力电器股份有限公司 A kind of voice interactive method, device, storage medium and air conditioner

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112164400A (en) * 2020-09-18 2021-01-01 广州小鹏汽车科技有限公司 Voice interaction method, server and computer-readable storage medium
CN112242140A (en) * 2020-10-13 2021-01-19 中移(杭州)信息技术有限公司 Intelligent device control method and device, electronic device and storage medium
CN116760942A (en) * 2023-08-22 2023-09-15 云视图研智能数字技术(深圳)有限公司 Holographic interaction teleconferencing method and system
CN116760942B (en) * 2023-08-22 2023-11-03 云视图研智能数字技术(深圳)有限公司 Holographic interaction teleconferencing method and system

Similar Documents

Publication Publication Date Title
US10990623B2 (en) Information retrieval method, eletronic device and storage medium
CN108038102B (en) Method and device for recommending expression image, terminal and storage medium
CN108073606B (en) News recommendation method and device for news recommendation
CN110728981A (en) Interactive function execution method and device, electronic equipment and storage medium
US11335348B2 (en) Input method, device, apparatus, and storage medium
CN106547850B (en) Expression annotation method and device
CN110598098A (en) Information recommendation method and device and information recommendation device
CN112464031A (en) Interaction method, interaction device, electronic equipment and storage medium
CN108270661B (en) Information reply method, device and equipment
CN112131466A (en) Group display method, device, system and storage medium
CN106960026B (en) Search method, search engine and electronic equipment
CN109814730B (en) Input method and device and input device
CN111209381B (en) Time management method and device in dialogue scene
CN112784151B (en) Method and related device for determining recommended information
CN112988956A (en) Method and device for automatically generating conversation and method and device for detecting information recommendation effect
CN111240497A (en) Method and device for inputting through input method and electronic equipment
CN110213062B (en) Method and device for processing message
CN113656557A (en) Message reply method, device, storage medium and electronic equipment
CN113593614A (en) Image processing method and device
CN113923517A (en) Background music generation method and device and electronic equipment
CN112800084A (en) Data processing method and device
CN112036247A (en) Expression package character generation method and device and storage medium
CN113589949A (en) Input method and device and electronic equipment
CN108241438B (en) Input method, input device and input device
CN112437193A (en) Chat frame

Legal Events

Date Code Title Description
PB01 Publication (Application publication date: 20200124)
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication