CN110728981A - Interactive function execution method and device, electronic equipment and storage medium - Google Patents

Interactive function execution method and device, electronic equipment and storage medium

Info

Publication number
CN110728981A
Authority
CN
China
Prior art keywords: voice, interactive, instruction, multimedia client, voice instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910955133.7A
Other languages
Chinese (zh)
Inventor
赵丽娜
赵倩
白琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201910955133.7A
Publication of CN110728981A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • G06F 3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 Execution procedure of a spoken command

Abstract

The disclosure relates to an interactive function execution method and device, an electronic device, and a storage medium. The method includes: receiving a voice instruction, input through a multimedia client, that requests execution of an interactive function; identifying the interaction type of the requested interactive function based on the semantic content of the voice instruction; generating an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction, and executing the operation instruction; and returning the execution result of the operation instruction to the multimedia client so that the multimedia client displays the execution result. In this way, the execution efficiency of interactive functions can be improved.

Description

Interactive function execution method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of internet technologies, and in particular, to an interactive function execution method and apparatus, an electronic device, and a storage medium.
Background
With the development of internet technology, the services provided by multimedia clients to users have gradually diversified. For example, in addition to multimedia playing, a multimedia client may provide more types of interactive functions such as chatting, sharing, and liking.
In the related art, a user specifies, in the multimedia client, the interaction type of the interactive function to be executed and the parameters required by that interaction type through operations such as clicking or searching; the multimedia client then sends the identifier of the user-specified interaction type and the required parameters to a server; after receiving the identifier and the parameters, the server generates and executes an operation instruction based on them; finally, the execution result of the operation instruction is fed back to the multimedia client so that the multimedia client displays the execution result.
However, when the multimedia client provides a large number of interactive functions, locating the button or search box of the interactive function to be executed is tedious and time-consuming for the user, which results in low execution efficiency of the interactive function.
Disclosure of Invention
The disclosure provides an execution method and device of an interactive function, electronic equipment and a storage medium, so as to improve the execution efficiency of the interactive function. The technical scheme of the disclosure is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided an execution method of an interactive function applied to a server, including:
receiving a voice instruction for requesting execution of an interactive function, which is input through a multimedia client;
identifying an interaction type of the interaction function requested to be executed based on semantic content of the voice instruction;
generating an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction, and executing the operation instruction;
and returning the execution result of the operation instruction to the multimedia client so that the multimedia client displays the execution result.
Optionally, the identifying, based on the semantic content of the voice instruction, an interaction type of the interaction function requested to be performed includes:
converting semantic content of the voice instruction into a text sequence, wherein the text sequence is a sequence formed by each word in the semantic content and part-of-speech information of each word;
inputting the text sequence into an interactive classification model trained in advance to obtain an identification of an interactive type corresponding to the text sequence;
taking the interaction type corresponding to the obtained identification as the interaction type of the interaction function requested to be executed;
the interactive classification model is a model obtained by training based on a plurality of sample text sequences and the identification of the interactive type labeled on each sample text sequence.
Optionally, the converting semantic content of the voice instruction into a text sequence includes:
performing word segmentation processing on the semantic content of the voice instruction to obtain each segmented word and the part of speech of each segmented word;
and constructing a text sequence by taking each segmented word and its part of speech as sequence elements, the constructed text sequence serving as the text sequence converted from the semantic content of the voice instruction.
Optionally, the generating, according to the semantic content of the voice instruction, an operation instruction corresponding to the identified interaction type includes:
extracting operation key words from semantic contents of the voice instruction;
and filling the extracted operation key words into the instruction template of the identified interaction type to generate the operation instruction.
Optionally, the extracting an operation keyword from semantic content of the voice instruction includes:
performing word segmentation processing on the semantic content corresponding to the voice instruction to obtain each segmented word;
classifying the segmented words by using a pre-trained word segmentation classification model to obtain the interaction type corresponding to each segmented word;
extracting, from the segmented words, the segmented words whose interaction type is the same as the identified interaction type, to serve as the operation keywords;
wherein the word segmentation classification model is a model obtained by training based on sample segmented words and the identifiers of the interaction types corresponding to the sample segmented words.
Optionally, after returning the execution result of the operation instruction to the multimedia client, the method further includes:
and returning the feedback voice corresponding to the execution result to the multimedia client so that the multimedia client plays the feedback voice.
According to a second aspect of embodiments of the present disclosure, there is provided a method for performing an interactive function applied to a multimedia client, including:
receiving a voice instruction for requesting execution of an interactive function;
sending the voice instruction to a server so as to enable the server to identify the interaction type of the interaction function requested to be executed based on the semantic content of the voice instruction; generating an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction; executing the operation instruction and returning an execution result of the operation instruction;
and receiving and displaying the execution result.
Optionally, after presenting the execution result, the method further includes:
receiving feedback voice corresponding to the execution result sent by the server;
and playing the feedback voice.
According to a third aspect of embodiments of the present disclosure, there is provided an apparatus for performing an interactive function applied to a server, including:
a receiving module configured to receive a voice instruction for requesting execution of an interactive function, which is input through a multimedia client;
the recognition module is configured to recognize the interaction type of the interaction function requested to be executed based on the semantic content of the voice instruction;
the execution module is configured to generate an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction, and execute the operation instruction;
the first feedback module is configured to return an execution result of the operation instruction to the multimedia client, so that the multimedia client displays the execution result.
Optionally, the identification module includes: a conversion submodule and an identification submodule;
the conversion submodule is configured to convert semantic contents of the voice instruction into a text sequence, and the text sequence is a sequence formed by each word in the semantic contents and part-of-speech information of each word;
the recognition submodule is configured to input the text sequence into a pre-trained interactive classification model to obtain an identification of an interactive type corresponding to the text sequence; taking the interaction type corresponding to the obtained identification as the interaction type of the interaction function requested to be executed; the interactive classification model is a model obtained by training based on a plurality of sample text sequences and the identification of the interactive type labeled on each sample text sequence.
Optionally, the conversion sub-module is specifically configured to:
performing word segmentation processing on the semantic content of the voice instruction to obtain each segmented word and the part of speech of each segmented word;
and constructing a text sequence by taking each segmented word and its part of speech as sequence elements, the constructed text sequence serving as the text sequence converted from the semantic content of the voice instruction.
Optionally, the execution module includes an extraction submodule and a filling submodule;
the extraction sub-module is configured to extract operation keywords from semantic content of the voice instruction;
and the filling sub-module is configured to fill the extracted operation key words into the instruction template of the identified interaction type to generate the operation instruction.
Optionally, the extracting sub-module is specifically configured to:
performing word segmentation processing on the semantic content corresponding to the voice instruction to obtain each segmented word;
classifying the segmented words by using a pre-trained word segmentation classification model to obtain the interaction type corresponding to each segmented word;
extracting, from the segmented words, the segmented words whose interaction type is the same as the identified interaction type, to serve as the operation keywords;
wherein the word segmentation classification model is a model obtained by training based on sample segmented words and the identifiers of the interaction types corresponding to the sample segmented words.
Optionally, the apparatus further comprises: a second feedback module;
the second feedback module is configured to return feedback voice corresponding to the execution result to the multimedia client, so that the multimedia client plays the feedback voice.
According to a fourth aspect of embodiments of the present disclosure, there is provided an apparatus for performing an interactive function applied to a multimedia client, including:
a first receiving module configured to receive a voice instruction for requesting execution of an interactive function;
a sending module configured to send the voice instruction to a server so as to enable the server to identify an interaction type of the interaction function requested to be executed based on semantic content of the voice instruction; generating an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction; executing the operation instruction and returning an execution result of the operation instruction;
a presentation module configured to receive and present the execution result.
Optionally, the apparatus further comprises: the second receiving module and the playing module;
the second receiving module is configured to receive feedback voice corresponding to the execution result sent by the server;
the playing module is configured to play the feedback voice.
According to a fifth aspect of embodiments of the present disclosure, there is provided a server including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement any one of the above methods for executing the interactive function applied to the server.
According to a sixth aspect of embodiments of the present disclosure, there is provided a multimedia client device comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement any one of the above-mentioned methods for executing an interactive function applied to a multimedia client.
According to a seventh aspect of embodiments of the present disclosure, there is provided a storage medium having a computer program stored therein, the computer program, when executed by a processor, implementing any one of the above-described execution methods for an interactive function applied to a server.
According to an eighth aspect of embodiments of the present disclosure, there is provided a storage medium having a computer program stored therein, wherein the computer program, when executed by a processor, implements any one of the above-mentioned methods for executing an interactive function applied to a multimedia client.
According to a ninth aspect of embodiments of the present disclosure, there is provided a computer program product which, when run on a computer, causes the computer to perform any of the above-described execution methods applied to an interactive function of a server.
According to a tenth aspect of embodiments of the present disclosure, there is provided a computer program product which, when run on a computer, causes the computer to perform any of the above-described methods of performing an interactive function applied to a multimedia client.
The technical solution provided by the embodiments of the present disclosure brings at least the following beneficial effects: in this solution, the user does not need to search the multimedia client for the button or search box of the interactive function to be executed, but directly inputs a voice instruction into the multimedia client; the server corresponding to the multimedia client can identify, based on the semantic content of the voice instruction, the interaction type of the interactive function that the voice instruction requests to execute, generate an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction, and execute the operation instruction; the server then feeds back the execution result of the operation instruction to the multimedia client. In this way, the execution efficiency of interactive functions can be improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a flowchart illustrating a method of performing an interactive function applied to a server according to an exemplary embodiment.
Fig. 2 is a flowchart illustrating a method of performing an interactive function applied to a multimedia client according to an exemplary embodiment.
Fig. 3 is a block diagram illustrating an apparatus for performing an interactive function applied to a server according to an exemplary embodiment.
Fig. 4 is a block diagram illustrating an apparatus for performing an interactive function applied to a multimedia client according to an exemplary embodiment.
FIG. 5 is a block diagram illustrating a server in accordance with an example embodiment.
FIG. 6 is a block diagram illustrating a multimedia client device according to an example embodiment.
FIG. 7 is a block diagram illustrating an apparatus for performing interactive functions in accordance with an exemplary embodiment.
Fig. 8 is a block diagram illustrating another apparatus for performing interactive functions in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
In order to improve the execution efficiency of the interactive function, the disclosure provides an interactive function execution method, an interactive function execution device, an electronic device and a storage medium.
The execution method of the interactive function provided by the present disclosure includes two execution methods, namely an execution method of the interactive function applied to a server and an execution method of the interactive function applied to a multimedia client. It can be understood that the execution subject of the execution method of the interactive function applied to the server may be an execution device of the interactive function in the server; the execution main body of the execution method of the interactive function applied to the multimedia client can be an execution device of the interactive function in the electronic equipment where the multimedia client is located. The multimedia client to which the present disclosure is directed may be a client for providing a short video service, but is not limited thereto.
First, a method for executing an interactive function applied to a server according to an embodiment of the present disclosure will be described in detail. As shown in fig. 1, the method may include the steps of:
s11: and receiving a voice command for requesting the execution of the interactive function, which is input through the multimedia client.
Wherein the voice instruction can be input to the multimedia client by a user of the multimedia client. For example, a voice input button may be provided in the multimedia client, and the user may input a voice command by clicking the button; or, a start voice may be preset in the multimedia client, and when the multimedia client monitors the start voice, a section of voice input by the user after the start voice may be used as the voice instruction, and of course, a section of voice including the start voice may also be used as the voice instruction; or, the multimedia client may start receiving a voice instruction when monitoring that the electronic device where the multimedia client is located is operated by shaking, touching, and the like according to a predetermined mode, and use a received voice as the voice instruction.
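For illustration only, the sketch below shows how a client might decide, for the three trigger modes just described (voice input button, preset start voice, predetermined shake or touch gesture), whether to start recording a voice instruction. The class and function names are hypothetical assumptions and are not part of any particular client SDK.

```python
# Illustrative sketch: deciding when the multimedia client starts capturing a voice instruction.
from dataclasses import dataclass

START_PHRASE = "hi client"          # assumed preset start voice

@dataclass
class ClientEvent:
    kind: str                       # "button_tap", "speech", or "gesture"
    payload: str = ""

def should_start_voice_capture(event: ClientEvent) -> bool:
    """Return True when the client should begin recording a voice instruction."""
    if event.kind == "button_tap":
        return True                                              # user tapped the voice input button
    if event.kind == "speech":
        return event.payload.lower().startswith(START_PHRASE)    # preset start voice detected
    if event.kind == "gesture":
        return event.payload in {"shake", "long_press"}          # predetermined operation mode
    return False

if __name__ == "__main__":
    print(should_start_voice_capture(ClientEvent("speech", "hi client play food videos")))  # True
```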
In addition, there are various interactive functions in the multimedia client, such as a search function, a recommendation function, a chat function, and a comment function, and the like, but not limited thereto.
S12: based on the semantic content of the voice instruction, an interaction type of the interaction function requested to be performed is identified.
It is understood that, in this step, based on the semantic content of the voice command, the interaction type of the interaction function requested to be performed by the voice command can be identified.
There are various specific implementation manners for identifying the interaction type of the interaction function requested to be executed based on the semantic content of the voice instruction. For example, in one implementation, identifying the interaction type of the interaction function requested to be performed based on the semantic content of the voice instruction may include:
converting semantic content of the voice instruction into a text sequence, wherein the text sequence is a sequence formed by each word in the semantic content and part-of-speech information of each word;
inputting the text sequence into an interactive classification model trained in advance to obtain an identification of an interactive type corresponding to the text sequence;
taking the interaction type corresponding to the obtained identification as the interaction type of the interaction function requested to be executed;
the interactive classification model is a model obtained by training based on a plurality of sample text sequences and the identification of the interactive type labeled on each sample text sequence.
It can be understood that the intermediate data produced before the interaction classification model outputs the identifier of the interaction type corresponding to the text sequence may include the probability that each word in the text sequence corresponds to the identifier of each interaction type; the identifier of the interaction type corresponding to the highest probability is then the identifier of the interaction type that the interaction classification model outputs for the text sequence.
When the interactive classification model is trained, natural-language corpora from a vertical domain or a general domain can be used as the sample text sequences, and the identifier of the interaction type corresponding to each sample text sequence can be obtained by labeling, so that the interactive classification model can be trained based on the sample text sequences and the identifiers of the interaction types corresponding to them. A loss value of the interactive classification model is calculated based on whether the identifier of the interaction type output by the model for a sample text sequence is consistent with the pre-labeled identifier of the interaction type for that sample text sequence; when the loss value is smaller than a preset first threshold, the interactive classification model converges and training is complete. The interactive classification model may be an SVM (Support Vector Machine) model, a CNN (Convolutional Neural Network) model, an RNN (Recurrent Neural Network) model, a DNN (Deep Neural Network) model, or the like. It can be understood that, when the interactive classification model converges, for any sample text sequence, the identifier of the interaction type output by the model may be the same as the pre-labeled identifier of the interaction type for that sample text sequence.
Correspondingly, after the interactive classification model is trained, the text sequence converted from the semantic content of the voice command is input into the interactive classification model, so that the identification of the interactive type corresponding to the text sequence can be obtained, and the interactive type of the interactive function requested to be executed by the voice command is identified.
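As a minimal sketch of such an interactive classification model, the example below uses a scikit-learn SVM, one of the model families named above, trained on a handful of made-up labeled text sequences. The library choice, sample data, and interaction-type identifiers are all assumptions rather than part of the disclosure.

```python
# Minimal sketch of the interactive classification model, assuming scikit-learn and an SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# sample text sequences (word/POS tokens) labeled with interaction-type identifiers (illustrative)
samples = [
    "please/s help/v me/r play/v food/n video/n bar/s",   # search
    "works/n of/u people/n i/r follow/v",                 # recommend
    "like/v this/r video/n",                              # comment
    "hello/s",                                            # chat
]
labels = ["search", "recommend", "comment", "chat"]

model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(samples, labels)

text_sequence = "play/v pet/n video/n bar/s"
print(model.predict([text_sequence])[0])   # likely: "search"
```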
In this implementation, there are various specific ways to convert the semantic content of the voice instruction into a text sequence. For example, in a first implementation, converting the semantic content of the voice instruction into a text sequence may include: performing word segmentation processing on the semantic content of the voice instruction to obtain each segmented word; and constructing a text sequence by taking each segmented word as a sequence element, the constructed text sequence serving as the text sequence converted from the semantic content of the voice instruction.
In a second implementation, converting semantic content of the voice instruction into a text sequence may include:
performing word segmentation processing on the semantic content of the voice instruction to obtain each segmented word and the part of speech of each segmented word;
and constructing a text sequence by taking each segmented word and its part of speech as sequence elements, the constructed text sequence serving as the text sequence converted from the semantic content of the voice instruction.
In the second implementation, each segmented word and its part of speech are used as sequence elements, and the text sequence can be constructed according to the order of the segmented words in the semantic content. For example, assuming that the semantic content of the voice instruction is "play food video bar", the constructed text sequence may be "play/n food/n video/n bar/s", where slashes separate the segmented words and the letter after each slash denotes the part of speech of the segmented word before it. It can be understood that the ordering of the segmented words when constructing the text sequence may also be random; for example, the text sequence constructed from the semantic content "play food video bar" may also be "bar/s play/n video/n food/n".
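A minimal sketch of this conversion is given below, assuming the jieba library for Chinese word segmentation with part-of-speech tagging; any segmenter that outputs part-of-speech tags would work equally well.

```python
# Sketch of building the word/part-of-speech text sequence, assuming the jieba library.
import jieba.posseg as pseg

def to_text_sequence(semantic_content: str) -> str:
    """Segment the semantic content and join each word with its POS tag as word/tag."""
    pairs = pseg.cut(semantic_content)                 # yields (word, flag) pairs
    return " ".join(f"{word}/{flag}" for word, flag in pairs)

# e.g. for "play the food video" spoken in Chinese:
print(to_text_sequence("播放美食视频吧"))   # roughly: 播放/v 美食/n 视频/n 吧/y
```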
For clarity of the scheme, the following describes a process of identifying an interaction type of an interaction function requested to be performed based on semantic content of a voice instruction by taking a specific example as an example.
For example, assuming that the semantic content of the voice instruction is "please help me play a gourmet video bar", the text sequence converted from this semantic content is input into the interactive classification model, and the obtained identifier of the interaction type corresponding to the text sequence may be the identifier of the "search function"; accordingly, it can be determined that the voice instruction requests execution of the "search function". Or, assuming that the semantic content of the voice instruction is "works of the people I follow", the identifier obtained may be that of the "recommendation function", so it can be determined that the voice instruction requests execution of the "recommendation function". Or, assuming that the semantic content of the voice instruction is "like this video", the identifier obtained may be that of the "comment function", so it can be determined that the voice instruction requests execution of the "comment function". Or, assuming that the semantic content of the voice instruction is "hello", the identifier obtained may be that of the "chatting function", so it can be determined that the voice instruction requests execution of the "chatting function".
In addition, the specific implementation of performing word segmentation on the semantic content of the voice instruction is not an inventive point of the present disclosure; it is the same as or similar to existing word segmentation techniques and is not described again here.
S13: and generating an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction, and executing the operation instruction.
It can be understood that, when the interaction type of the interaction function requested to be executed by the voice instruction is determined, the operation instruction corresponding to the interaction type can be generated according to the semantic content of the voice instruction. Specifically, the content related to the operation instruction corresponding to the interaction type may be extracted from the semantic content, so as to generate the operation instruction corresponding to the interaction type. For clarity of the scheme and clarity of layout, a specific implementation manner of generating the operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction is illustrated subsequently.
In this step, after the operation instruction is generated, the operation instruction may be executed.
S14: and returning the execution result of the operation instruction to the multimedia client so that the multimedia client displays the execution result.
It can be understood that different interaction types correspond to different execution results. For clarity of the scheme and clarity of layout, the execution results corresponding to different interaction types are illustrated in the following.
In this step, after the multimedia client receives the execution result sent by the server, the execution result can be correspondingly displayed.
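Putting steps S11 to S14 together, the following sketch shows how the server-side flow could chain. The stand-in speech recognition, classification, keyword extraction, and template functions are hypothetical simplifications of the components described in this disclosure, not a definitive implementation.

```python
# End-to-end sketch of the server-side flow S11 to S14, with trivial stand-ins for each component.
def speech_to_text(audio: bytes) -> str:
    # stand-in for a real speech recognition component
    return audio.decode("utf-8")

def classify_interaction(semantic_content: str) -> str:
    # stand-in for the trained interactive classification model of step S12
    return "search" if "play" in semantic_content else "chat"

def extract_keywords(semantic_content: str, interaction_type: str) -> list:
    # stand-in for the keyword extraction of step S13
    stop_words = {"please", "help", "me", "play", "bar"}
    return [w for w in semantic_content.split() if w not in stop_words]

TEMPLATES = {
    "search": lambda kw: f"SEARCH videos WHERE name OR category CONTAINS '{kw[0]}'",
    "chat": lambda kw: f"REPLY TO '{' '.join(kw)}'",
}

def handle_voice_instruction(audio: bytes) -> dict:
    semantic_content = speech_to_text(audio)                    # S11: receive the voice instruction
    interaction_type = classify_interaction(semantic_content)   # S12: identify the interaction type
    keywords = extract_keywords(semantic_content, interaction_type)
    operation = TEMPLATES[interaction_type](keywords)           # S13: fill the instruction template
    result = f"executed: {operation}"                           #      and execute the operation
    return {"type": interaction_type, "result": result}         # S14: returned to the multimedia client

print(handle_voice_instruction(b"please help me play food video bar"))
```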
In the method for executing the interactive function provided by the embodiment of the disclosure, a user directly inputs a voice instruction at a multimedia client without searching a button or a search box of the interactive function to be executed in the multimedia client, and a server corresponding to the multimedia client can identify the interactive type of the interactive function requested to be executed based on the semantic content of the voice instruction, generate an operation instruction corresponding to the identified interactive type according to the semantic content of the voice instruction, and execute the operation instruction; then, the server feeds back the execution result of the operation instruction to the multimedia client. Therefore, the method and the device can improve the execution efficiency of the interactive function.
For clarity of the scheme and clarity of layout, a specific implementation manner for generating the operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction is illustrated below.
For example, in an implementation manner, generating an operation instruction corresponding to the identified interaction type according to semantic content of the voice instruction may include:
extracting operation key words from semantic contents of the voice instruction;
and filling the extracted operation key words into the instruction template of the identified interaction type to generate an operation instruction.
It can be understood that the instruction template corresponds to a software method; the operation instruction can be obtained by filling the extracted operation keyword into that software method as its input parameter.
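A minimal sketch of this idea is shown below: each interaction type maps to a software method, and the extracted operation keyword is passed in as its argument. The template functions and the in-memory video store are illustrative assumptions.

```python
# Sketch of "instruction template as a software method"; all data and functions are illustrative.
VIDEOS = [
    {"name": "street food tour", "category": "food", "likes": 12},
    {"name": "cute cats", "category": "pets", "likes": 30},
]

def search_template(keyword: str) -> list:
    """Instruction template (software method) for the search function."""
    return [v for v in VIDEOS if keyword in v["name"] or keyword == v["category"]]

def like_template(video_name: str) -> dict:
    """Instruction template for the like operation of the comment function."""
    for v in VIDEOS:
        if v["name"] == video_name:
            v["likes"] += 1
            return {"likes": v["likes"], "like_icon_lit": True}
    return {}

INSTRUCTION_TEMPLATES = {"search": search_template, "like": like_template}

# filling the extracted operation keyword into the method yields the executed operation instruction
print(INSTRUCTION_TEMPLATES["search"]("food"))     # videos whose name or category matches "food"
print(INSTRUCTION_TEMPLATES["like"]("cute cats"))  # like count incremented, like icon lit
```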
There are various specific implementation manners for extracting the operation keywords from the semantic content of the voice instruction. For example, in one implementation, the step of extracting the operation keyword from the semantic content of the voice instruction may include:
performing word segmentation processing on the semantic content corresponding to the voice instruction to obtain each segmented word;
classifying each segmented word by using a pre-trained word segmentation classification model to obtain the interaction type corresponding to each segmented word;
extracting, from the segmented words, the segmented words whose interaction type is the same as the identified interaction type, to serve as the operation keywords;
wherein the word segmentation classification model is a model obtained by training based on sample segmented words and the identifiers of the interaction types corresponding to the sample segmented words.
In this implementation, the training process of the word segmentation classification model may be similar to that of the interactive classification model. That is, when training the word segmentation classification model, a large vocabulary may be used as sample segmented words, and the identifier of the interaction type corresponding to each sample segmented word may be obtained by labeling. The word segmentation classification model is then trained based on the sample segmented words and the identifiers of the interaction types corresponding to them. A loss value of the word segmentation classification model is calculated based on whether the identifier of the interaction type output by the model for a sample segmented word is consistent with the pre-labeled identifier for that sample segmented word; when the loss value is smaller than a preset second threshold, the word segmentation classification model converges and training is complete. The second threshold may be the same as or different from the first threshold, and the present disclosure does not limit the values of the two thresholds or the magnitude relationship between them. The word segmentation classification model may also be an SVM (Support Vector Machine) model, a CNN (Convolutional Neural Network) model, an RNN (Recurrent Neural Network) model, a DNN (Deep Neural Network) model, or the like. It can be understood that, when the word segmentation classification model converges, for any sample segmented word, the identifier of the interaction type output by the model may be the same as the pre-labeled identifier of the interaction type for that sample segmented word.
In practical application, when the identifier of the interaction type corresponding to a sample segmented word is labeled, the label may take the form of probabilities that the sample segmented word belongs to the various interaction types. In this labeling manner, the probability that each sample segmented word belongs to the identifier of its corresponding interaction type may be set to 1, and the probabilities that it belongs to the identifiers of the other interaction types may be set to 0. In this way, after training of the word segmentation classification model is complete, each segmented word obtained by the word segmentation processing is input into the model to obtain the probability that it belongs to each interaction type, and the segmented words whose probability of belonging to the interaction type identified in step S12 is greater than a preset probability threshold can be used as the operation keywords.
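The sketch below illustrates this per-word classification with a probability threshold, assuming scikit-learn, a tiny illustrative vocabulary, character n-gram features, and a 0.5 threshold; all of these are arbitrary choices made for the example, not part of the disclosure.

```python
# Sketch of the word segmentation classification model with probability thresholding.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sample_words = ["food", "pets", "news", "hello", "bye", "like", "follow"]
sample_labels = ["search", "search", "search", "chat", "chat", "comment", "recommend"]

word_classifier = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
word_classifier.fit(sample_words, sample_labels)

def keywords_for(segmented_words, identified_type, threshold=0.5):
    """Keep only the words whose probability of the identified interaction type exceeds the threshold."""
    classes = list(word_classifier.named_steps["logisticregression"].classes_)
    col = classes.index(identified_type)
    probs = word_classifier.predict_proba(segmented_words)
    return [w for w, p in zip(segmented_words, probs[:, col]) if p > threshold]

print(keywords_for(["please", "play", "food", "video"], "search"))
```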
In another implementation, the step of extracting the operation keyword from the semantic content of the voice instruction may include:
performing word segmentation processing on the semantic content corresponding to the voice instruction to obtain each segmented word and the part of speech of each segmented word;
for each segmented word, inputting a text sequence formed by the segmented word and its part of speech into the trained interactive classification model to obtain the identifier of the interaction type corresponding to the segmented word;
and taking the segmented words whose corresponding identifier of the interaction type is the same as the identifier of the interaction type identified in step S12 as the operation keywords.
It can be understood that the interactive classification model is trained on natural-language corpora from a vertical domain or a general domain, and such corpora may include sentences as well as individual words. Therefore, the trained interactive classification model can predict the identifier of the interaction type corresponding to a sentence as well as that corresponding to a word; that is, the interactive classification model can predict the identifier of the interaction type corresponding to the semantic content of the voice instruction, or the identifier of the interaction type corresponding to a segmented word.
In one implementation, in the above two ways of extracting the operation keywords from the semantic content of the voice instruction, the word segmentation processing of the semantic content may be merged with the word segmentation processing performed in step S12 when the interaction type of the requested interactive function is identified. That is, after the semantic content of the voice instruction has been segmented in step S12, the result of that word segmentation processing can be reused directly, and the identifier of the interaction type corresponding to each word in the semantic content of the voice instruction can be predicted using the word segmentation classification model or the interactive classification model.
It is understood that the operation keywords of different types of interaction types are different. In the disclosure, firstly, the interaction type of the interaction function requested to be executed by the voice command is determined, so that the operation keywords related to the interaction type can be further extracted from the semantic content of the voice command in a targeted manner.
And after the operation key words are extracted, the operation key words can be filled in the instruction template of the identified interaction type, so that an operation instruction is generated and executed.
For clarity of the scheme, the following further describes an execution method of the interactive function provided by the embodiment of the present disclosure, taking a specific interactive function as an example.
Example 1, assuming that the semantic content of the voice command is "please help me play a gourmet video bar", it may be determined that the voice command requests execution of a "search function" based on the step of S12; based on the step of S13, an operation keyword "food" related to the interaction type of the "search function" may be extracted, and the "food" is filled into the instruction template of the interaction type of the "search function" to obtain an operation instruction for executing the search function with the "food" as a keyword; and executing the operation instruction to obtain a video search result of which the video name comprises food and/or the video category is food.
Example 2, assuming that the semantic content of the voice instruction is "works of the people I follow", it may be determined based on step S12 that the voice instruction requests execution of the "recommendation function"; based on step S13, the operation keyword "people I follow" related to the interaction type of the "recommendation function" may be extracted and filled into the instruction template of that interaction type, resulting in an operation instruction for performing video recommendation under the recommendation category "people I follow"; executing the operation instruction yields video recommendation results under the recommendation category "people I follow". Of course, there may be many recommendation categories in the recommendation function, such as "hot videos", "pet videos", "news videos", and "newest videos".
Example 3, assuming that the semantic content of the voice instruction is "hello", it may be determined based on step S12 that the voice instruction requests execution of the "chatting function"; based on step S13, the operation keyword "hello" related to the interaction type of the "chatting function" may be extracted and filled into the instruction template of that interaction type, obtaining the feedback voice corresponding to "hello". Here, the feedback voice is, for example, "hello, XXX", where XXX may be, for instance, "dear user" or the name of the specific user of the multimedia client; either is reasonable.
Example 4, assuming that the semantic content of the voice instruction is "like this video", it may be determined based on step S12 that the voice instruction requests execution of the "comment function"; based on step S13, the operation keyword "like" related to the interaction type of the "comment function" may be extracted and filled into the instruction template of that interaction type, obtaining an operation instruction for adding 1 to the like count of the video and lighting the like icon of the video; executing the operation instruction yields an execution result in which the like count of the video is increased by 1 and the like icon of the video is lit.
In addition, in the embodiment of the present disclosure, the operation keywords extracted from the semantic content of the voice instruction may include a plurality of operation keywords; correspondingly, the instruction template of the interaction type requested by the voice instruction may include filling positions for a plurality of operation keywords. For example, assume that the semantic content of the voice instruction is "say bye to Zhang San"; it may be determined based on step S12 that the voice instruction requests execution of the "chat function"; correspondingly, the instruction template of the interaction type of the chat function includes two filling positions for operation keywords, one for the chat object and one for the chat content. Filling "Zhang San" into the chat-object position and "bye" into the chat-content position yields the corresponding operation instruction; executing that operation instruction, the obtained execution result may be: opening a conversation with Zhang San and adding a new message whose content is "bye" to the conversation.
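A minimal sketch of such a multi-slot instruction template, following the "say bye to Zhang San" example, is shown below; the slot names and the returned structure are illustrative assumptions.

```python
# Sketch of an instruction template with two operation-keyword slots (chat object and chat content).
def chat_template(chat_object: str, chat_content: str) -> dict:
    """Open (or create) a conversation with chat_object and append chat_content as a new message."""
    return {
        "conversation_with": chat_object,
        "new_message": chat_content,
    }

# the extracted operation keywords are assigned to their slots and the template is filled
filled = chat_template(chat_object="Zhang San", chat_content="bye")
print(filled)   # {'conversation_with': 'Zhang San', 'new_message': 'bye'}
```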
It should be noted that the above-mentioned interaction functions and the operation keywords related to each interaction function are only examples, and should not be construed as limiting the disclosure.
Optionally, in an implementation manner, in order to enhance the user experience of the multimedia client, after the execution result of the operation instruction is returned to the multimedia client, the method for executing the interactive function applied to the server may further include:
and returning the feedback voice corresponding to the execution result to the multimedia client so that the multimedia client plays the feedback voice.
It can be understood that, since the user sends the voice command, the voice interaction between the user and the multimedia client can be established by feeding back the execution result of the interactive function requested to be executed by the voice command to the user through the feedback voice, so as to improve the user experience of the multimedia client.
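As one possible way to produce the feedback voice on the server side, the sketch below uses the pyttsx3 text-to-speech package purely as an example; any TTS engine could be substituted, and the function name and output path are hypothetical.

```python
# Sketch of synthesizing the feedback voice corresponding to an execution result, assuming pyttsx3.
import pyttsx3

def synthesize_feedback(text: str, out_path: str = "feedback.wav") -> str:
    engine = pyttsx3.init()
    engine.save_to_file(text, out_path)   # render the feedback text to an audio file
    engine.runAndWait()
    return out_path                       # this file would be returned to the multimedia client

synthesize_feedback("Found 12 food videos for you")
```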
Corresponding to the above-mentioned method for executing an interactive function applied to a server, an embodiment of the present disclosure further provides a method for executing an interactive function applied to a multimedia client, as shown in fig. 2, where the method may include:
s21: a voice instruction requesting execution of an interactive function is received.
Here, the voice instruction may be issued by a user of the multimedia client. For example, a voice input button may be provided in the multimedia client, and the user may input a voice command by clicking the button; or, a start voice may be preset in the multimedia client, and when the multimedia client monitors the start voice, a section of voice input by the user after the start voice may be used as the voice instruction, and of course, a section of voice including the start voice may also be used as the voice instruction; or, the multimedia client may start receiving the voice instruction when monitoring that the device where the multimedia client is located is operated by shaking, touching, and the like according to a predetermined mode, and use the received voice as the voice instruction.
In addition, there are various interactive functions in the multimedia client, such as a search function, a recommendation function, a chat function, and a comment function, and the like, but not limited thereto.
S22: sending the voice instruction to a server so that the server identifies the interaction type of the interaction function requested to be executed based on the semantic content of the voice instruction; generating an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction; and executing the operation instruction and returning the execution result of the operation instruction.
In this step, as to the specific implementation manner of each step executed by the server, detailed description has been already made in the method for executing the interactive function applied to the server provided in the embodiment of the present disclosure, and details are not described here again.
S23: and receiving and displaying an execution result.
It can be understood that, because the execution results of different interactive functions differ, the multimedia client displays the received execution results in different ways for different interactive functions. Taking examples 1 to 4 in the above embodiment as an example: in example 1, the execution result of the search function whose operation keyword is "food" may be video search results whose video name contains "food" and/or whose video category is "food"; accordingly, the multimedia client can display those video search results. In example 2, the execution result of the "recommendation function" whose operation keyword is "people I follow" may be video recommendation results under the recommendation category "people I follow"; accordingly, the multimedia client can display those video recommendation results. In example 3, the execution result of the "chatting function" whose operation keyword is "hello" may be the feedback voice corresponding to "hello"; accordingly, the multimedia client can play that feedback voice. In example 4, the execution result of the "comment function" whose operation keyword is "like" may be that the like count of the video is increased by 1 and the like icon of the video is lit; accordingly, the multimedia client can light the video's like icon and add 1 to the displayed like count.
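The sketch below illustrates how a client might dispatch the received execution result to different presentation paths for examples 1 to 4; the result structure and handler behavior are assumptions, and a real client would render native UI rather than print.

```python
# Sketch of client-side presentation of execution results, keyed on the interaction type.
def present_execution_result(result: dict) -> None:
    kind = result["type"]
    if kind == "search":
        for video in result["videos"]:
            print("show search hit:", video)                 # example 1: show matching videos
    elif kind == "recommend":
        for video in result["videos"]:
            print("show recommendation:", video)             # example 2: "people I follow" feed
    elif kind == "chat":
        print("play feedback voice:", result["audio"])       # example 3: play the feedback voice
    elif kind == "comment":
        print("light like icon, likes =", result["likes"])   # example 4: update like count and icon

present_execution_result({"type": "comment", "likes": 101})
```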
In the method for executing the interactive function applied to the multimedia client, the user does not need to search a button or a search box and the like of the interactive function to be executed in the multimedia client, and directly sends the voice instruction, so that the multimedia client can send the voice instruction to the server and receive the execution result of the interactive function requested to be executed by the voice instruction fed back by the server. Therefore, the method and the device can improve the execution efficiency of the interactive function.
Optionally, in an implementation manner, in order to enhance the user experience of the multimedia client, after the execution result is presented, the method for executing the interactive function applied to the multimedia client may further include:
receiving feedback voice corresponding to an execution result sent by a server;
and playing the feedback voice.
It can be understood that, since the user sends the voice command, the execution result of the interactive function requested to be executed by the voice command is fed back to the user by the feedback voice, and the voice interaction between the user and the multimedia client can be established, so as to improve the user experience of the multimedia client.
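As an illustrative sketch only, assuming the received feedback voice has been saved as a local audio file and using the playsound package purely as an example, client-side playback could be as simple as:

```python
# Sketch of playing the received feedback voice on the client, assuming the playsound package.
from playsound import playsound

def play_feedback_voice(audio_path: str) -> None:
    playsound(audio_path)   # blocks until the feedback voice has finished playing

# play_feedback_voice("feedback.wav")
```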
Corresponding to the above-mentioned method for executing the interactive function applied to the server, an embodiment of the present disclosure further provides an apparatus for executing the interactive function applied to the server, as shown in fig. 3, the apparatus may include: a receiving module 301, an identifying module 302, an executing module 303 and a first feedback module 304.
Wherein, the receiving module 301 is configured to receive a voice instruction for requesting to execute an interactive function, which is input through a multimedia client;
the recognition module 302 is configured to recognize an interaction type of the interaction function requested to be performed based on semantic content of the voice instruction;
the execution module 303 is configured to generate an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction, and execute the operation instruction;
the first feedback module 304 is configured to return the execution result of the operation instruction to the multimedia client, so that the multimedia client displays the execution result.
Optionally, the identifying module 302 includes: a conversion submodule and an identification submodule;
the conversion submodule is configured to convert semantic contents of the voice instruction into a text sequence, and the text sequence is a sequence formed by each word in the semantic contents and part-of-speech information of each word;
the recognition sub-module is configured to input the text sequence into a pre-trained interactive classification model to obtain an identification of an interactive type corresponding to the text sequence; taking the interaction type corresponding to the obtained identification as the interaction type of the interaction function requested to be executed; the interactive classification model is a model obtained by training based on a plurality of sample text sequences and the identification of the interactive type labeled on each sample text sequence.
Optionally, the conversion sub-module is specifically configured to:
performing word segmentation processing on the semantic content of the voice instruction to obtain each segmented word and the part of speech of each segmented word;
and constructing a text sequence by taking each segmented word and its part of speech as sequence elements, the constructed text sequence serving as the text sequence converted from the semantic content of the voice instruction.
Optionally, the executing module 303 includes an extracting sub-module and a filling sub-module;
the extraction submodule is configured to extract operation keywords from semantic contents of the voice instruction;
the filling sub-module is configured to fill the extracted operation keywords into the instruction template of the identified interaction type to generate an operation instruction.
Optionally, the extracting sub-module is specifically configured to:
performing word segmentation processing on the semantic content corresponding to the voice instruction to obtain each segmented word;
classifying each segmented word by using a pre-trained word segmentation classification model to obtain the interaction type corresponding to each segmented word;
extracting, from the segmented words, the segmented words whose interaction type is the same as the identified interaction type, to serve as the operation keywords;
wherein the word segmentation classification model is a model obtained by training based on sample segmented words and the identifiers of the interaction types corresponding to the sample segmented words.
Optionally, the apparatus may further include: a second feedback module;
the second feedback module is configured to return feedback voice corresponding to the execution result to the multimedia client, so that the multimedia client plays the feedback voice.
According to the execution device of the interactive function applied to the server, a user does not need to search a button or a retrieval frame and the like of the interactive function to be executed in a multimedia client, a voice instruction is directly input into the multimedia client, the server corresponding to the multimedia client can identify the interactive type of the interactive function requested to be executed based on the semantic content of the voice instruction, an operation instruction corresponding to the identified interactive type is generated according to the semantic content of the voice instruction, and the operation instruction is executed; then, the server feeds back the execution result of the operation instruction to the multimedia client. Therefore, the method and the device can improve the execution efficiency of the interactive function.
Corresponding to the above-mentioned method for executing an interactive function applied to a multimedia client, an embodiment of the present disclosure further provides an apparatus for executing an interactive function applied to a multimedia client. As shown in Fig. 4, the apparatus may include: a first receiving module 401, a sending module 402 and a presentation module 403.
Wherein, the first receiving module 401 is configured to receive a voice instruction for requesting to execute an interactive function;
the sending module 402 is configured to send the voice instruction to the server, so that the server identifies the interaction type of the interaction function requested to be executed based on the semantic content of the voice instruction, generates an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction, executes the operation instruction, and returns the execution result of the operation instruction;
the presentation module 403 is configured to receive and present the execution result.
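Purely as an illustration of this client-side flow (the transport, endpoint URL, and field names below are assumptions; the disclosure does not constrain how the voice instruction is transmitted to the server):

```python
# Minimal sketch of the client-side modules 401-403; HTTP and the endpoint
# are assumed, and any request/response channel to the server would do.
import requests

SERVER_URL = "https://example.com/interactive-function"  # hypothetical endpoint

def request_interactive_function(voice_instruction: bytes) -> dict:
    """Send the recorded voice instruction to the server and return the
    execution result so that the client can present it."""
    response = requests.post(SERVER_URL, files={"voice": voice_instruction})
    response.raise_for_status()
    return response.json()  # execution result presented by module 403
```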
Optionally, the apparatus may further include: the second receiving module and the playing module;
the second receiving module is configured to receive feedback voice corresponding to the execution result sent by the server;
the playing module is configured to play the feedback voice.
With the above interactive function execution apparatus applied to the multimedia client, a user does not need to look in the multimedia client for a button, search box, or other control of the interactive function to be executed; the user simply issues the voice instruction. The multimedia client sends the voice instruction to the server and receives the execution result, fed back by the server, of the interactive function requested by the voice instruction. Therefore, the execution efficiency of the interactive function can be improved.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 5 is a block diagram illustrating a server according to an exemplary embodiment. As shown in Fig. 5, the server includes:
a processor 510;
a memory 520 for storing instructions executable by the processor 510;
wherein the processor 510 is configured to execute the instructions to implement any one of the above-mentioned methods for executing the interaction function applied to the server.
Fig. 6 is a block diagram illustrating a multimedia client device according to an exemplary embodiment. As shown in Fig. 6, the multimedia client device includes:
a processor 610;
a memory 620 for storing instructions executable by the processor 610;
wherein the processor 610 is configured to execute the instructions to implement any one of the above-mentioned methods for executing the interactive function applied to the multimedia client.
It is understood that the multimedia client device is the electronic device on which the multimedia client runs.
Fig. 7 is a block diagram illustrating an apparatus 700 for performing interactive functions in accordance with an example embodiment. For example, the apparatus 700 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 7, apparatus 700 may include one or more of the following components: a processing component 702, a memory 704, a power component 706, a multimedia component 708, an audio component 710, an input/output (I/O) interface 712, a sensor component 714, and a communication component 716.
The processing component 702 generally controls the overall operation of the apparatus 700, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 702 may include one or more processors 720 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 702 may include one or more modules that facilitate interaction between the processing component 702 and other components. For example, the processing component 702 may include a multimedia module to facilitate interaction between the multimedia component 708 and the processing component 702.
The memory 704 is configured to store various types of data to support operations at the apparatus 700. Examples of such data include instructions for any application or method operating on device 700, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 704 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power component 706 provides power to the various components of the apparatus 700. The power component 706 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 700.
The multimedia component 708 includes a screen that provides an output interface between the device 700 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 708 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 700 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 710 is configured to output and/or input audio signals. For example, audio component 710 includes a Microphone (MIC) configured to receive external audio signals when apparatus 700 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 704 or transmitted via the communication component 716. In some embodiments, audio component 710 also includes a speaker for outputting audio signals.
The I/O interface 712 provides an interface between the processing component 702 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 714 includes one or more sensors for providing status assessments of various aspects of the apparatus 700. For example, the sensor assembly 714 may detect an open/closed state of the apparatus 700 and the relative positioning of components, such as the display and keypad of the apparatus 700. The sensor assembly 714 may also detect a change in position of the apparatus 700 or of a component of the apparatus 700, the presence or absence of user contact with the apparatus 700, the orientation or acceleration/deceleration of the apparatus 700, and a change in the temperature of the apparatus 700. The sensor assembly 714 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor assembly 714 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 714 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 716 is configured to facilitate wired or wireless communication between the apparatus 700 and other devices. The apparatus 700 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 716 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 716 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 700 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a storage medium comprising instructions, such as the memory 704 comprising instructions, is also provided; the instructions are executable by the processor 720 of the apparatus 700 to perform the above-described method for executing an interactive function. Alternatively, the storage medium may be a non-transitory computer-readable storage medium, which may be, for example, a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Fig. 8 is a block diagram illustrating an apparatus 800 for performing interactive functions according to an exemplary embodiment. For example, the apparatus 800 may be provided as a server. Referring to Fig. 8, the apparatus 800 includes a processing component 822, which further includes one or more processors, and memory resources, represented by a memory 832, for storing instructions, such as application programs, that are executable by the processing component 822. The application programs stored in the memory 832 may include one or more modules, each of which corresponds to a set of instructions. Further, the processing component 822 is configured to execute the instructions to perform the above-described method for executing an interactive function.
The apparatus 800 may also include a power component 826 configured to perform power management of the apparatus 800, a wired or wireless network interface 850 configured to connect the apparatus 800 to a network, and an input/output (I/O) interface 858. The apparatus 800 may operate based on an operating system stored in the memory 832, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or a similar operating system.
In an exemplary embodiment, there is also provided a storage medium having a computer program stored therein, which when executed by a processor, implements any one of the above-described execution methods of an interactive function applied to a server.
In an exemplary embodiment, there is also provided a storage medium having a computer program stored therein, which, when executed by a processor, implements any one of the above-described methods for executing an interactive function applied to a multimedia client.
Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, there is also provided a computer program product which, when run on a computer, causes the computer to perform any one of the above-described methods for executing an interactive function applied to a server.
In an exemplary embodiment, there is also provided a computer program product which, when run on a computer, causes the computer to perform any one of the above-described methods for executing an interactive function applied to a multimedia client.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the claims.

Claims (10)

1. An interactive function executing method applied to a server, the method comprising:
receiving a voice instruction for requesting execution of an interactive function, which is input through a multimedia client;
identifying an interaction type of the interaction function requested to be executed based on semantic content of the voice instruction;
generating an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction, and executing the operation instruction;
and returning the execution result of the operation instruction to the multimedia client so that the multimedia client displays the execution result.
2. The method of claim 1, wherein identifying the interaction type of the interaction function requested to be performed based on the semantic content of the voice instruction comprises:
converting semantic content of the voice instruction into a text sequence, wherein the text sequence is a sequence formed by each word in the semantic content and part-of-speech information of each word;
inputting the text sequence into an interactive classification model trained in advance to obtain an identification of an interactive type corresponding to the text sequence;
taking the interaction type corresponding to the obtained identification as the interaction type of the interaction function requested to be executed;
the interactive classification model is a model obtained by training based on a plurality of sample text sequences and the identification of the interactive type labeled on each sample text sequence.
3. The method of claim 1, wherein after returning the result of the execution of the operation instruction to the multimedia client, the method further comprises:
and returning the feedback voice corresponding to the execution result to the multimedia client so that the multimedia client plays the feedback voice.
4. An interactive function executing method applied to a multimedia client, the method comprising:
receiving a voice instruction for requesting execution of an interactive function;
sending the voice instruction to a server so as to enable the server to identify the interaction type of the interaction function requested to be executed based on the semantic content of the voice instruction; generating an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction; executing the operation instruction and returning an execution result of the operation instruction;
and receiving and displaying the execution result.
5. An apparatus for executing an interactive function, applied to a server, the apparatus comprising:
a receiving module configured to receive a voice instruction for requesting execution of an interactive function, which is input through a multimedia client;
the recognition module is configured to recognize the interaction type of the interaction function requested to be executed based on the semantic content of the voice instruction;
the execution module is configured to generate an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction, and execute the operation instruction;
the first feedback module is configured to return an execution result of the operation instruction to the multimedia client, so that the multimedia client displays the execution result.
6. An apparatus for performing an interactive function, applied to a multimedia client, the apparatus comprising:
a first receiving module configured to receive a voice instruction for requesting execution of an interactive function;
a sending module configured to send the voice instruction to a server so as to enable the server to identify an interaction type of the interaction function requested to be executed based on semantic content of the voice instruction; generating an operation instruction corresponding to the identified interaction type according to the semantic content of the voice instruction; executing the operation instruction and returning an execution result of the operation instruction;
a presentation module configured to receive and present the execution result.
7. A server, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to carry out the method steps of any one of claims 1-3.
8. A multimedia client device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to carry out the method steps of claim 4.
9. A storage medium, characterized in that a computer program is stored in the storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1-3.
10. A storage medium, characterized in that a computer program is stored in the storage medium, which computer program, when being executed by a processor, carries out the method steps of claim 4.
CN201910955133.7A 2019-10-09 2019-10-09 Interactive function execution method and device, electronic equipment and storage medium Pending CN110728981A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910955133.7A CN110728981A (en) 2019-10-09 2019-10-09 Interactive function execution method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN110728981A true CN110728981A (en) 2020-01-24

Family

ID=69219758

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910955133.7A Pending CN110728981A (en) 2019-10-09 2019-10-09 Interactive function execution method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110728981A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102543082A (en) * 2012-01-19 2012-07-04 北京赛德斯汽车信息技术有限公司 Voice operation method for in-vehicle information service system adopting natural language and voice operation system
US20140310004A1 (en) * 2013-04-10 2014-10-16 Via Technologies, Inc. Voice control method, mobile terminal device, and voice control system
CN104078043A (en) * 2013-04-26 2014-10-01 腾讯科技(深圳)有限公司 Method and system for recognition of voice operational command of network transaction system
CN104462347A (en) * 2014-12-04 2015-03-25 北京国双科技有限公司 Keyword classifying method and device
CN106254915A (en) * 2016-07-29 2016-12-21 乐视控股(北京)有限公司 Exchange method based on television terminal, Apparatus and system
CN106847284A (en) * 2017-03-09 2017-06-13 深圳市八圈科技有限公司 Electronic equipment, computer-readable recording medium and voice interactive method
CN109559744A (en) * 2018-12-12 2019-04-02 泰康保险集团股份有限公司 Processing method, device and the readable storage medium storing program for executing of voice data
CN109903755A (en) * 2019-02-26 2019-06-18 珠海格力电器股份有限公司 A kind of voice interactive method, device, storage medium and air conditioner

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112164400A (en) * 2020-09-18 2021-01-01 广州小鹏汽车科技有限公司 Voice interaction method, server and computer-readable storage medium
CN112242140A (en) * 2020-10-13 2021-01-19 中移(杭州)信息技术有限公司 Intelligent device control method and device, electronic device and storage medium
CN116760942A (en) * 2023-08-22 2023-09-15 云视图研智能数字技术(深圳)有限公司 Holographic interaction teleconferencing method and system
CN116760942B (en) * 2023-08-22 2023-11-03 云视图研智能数字技术(深圳)有限公司 Holographic interaction teleconferencing method and system

Similar Documents

Publication Publication Date Title
US10990623B2 (en) Information retrieval method, eletronic device and storage medium
CN108038102B (en) Method and device for recommending expression image, terminal and storage medium
CN108073606B (en) News recommendation method and device for news recommendation
CN110728981A (en) Interactive function execution method and device, electronic equipment and storage medium
US11335348B2 (en) Input method, device, apparatus, and storage medium
CN106547850B (en) Expression annotation method and device
CN110598098A (en) Information recommendation method and device and information recommendation device
CN112464031A (en) Interaction method, interaction device, electronic equipment and storage medium
CN108270661B (en) Information reply method, device and equipment
CN112131466A (en) Group display method, device, system and storage medium
CN106960026B (en) Search method, search engine and electronic equipment
CN109814730B (en) Input method and device and input device
CN111209381B (en) Time management method and device in dialogue scene
CN112784151B (en) Method and related device for determining recommended information
CN112988956A (en) Method and device for automatically generating conversation and method and device for detecting information recommendation effect
CN111240497A (en) Method and device for inputting through input method and electronic equipment
CN110213062B (en) Method and device for processing message
CN113656557A (en) Message reply method, device, storage medium and electronic equipment
CN113593614A (en) Image processing method and device
CN113923517A (en) Background music generation method and device and electronic equipment
CN112800084A (en) Data processing method and device
CN112036247A (en) Expression package character generation method and device and storage medium
CN113589949A (en) Input method and device and electronic equipment
CN108241438B (en) Input method, input device and input device
CN112437193A (en) Chat frame

Legal Events

Date Code Title Description
PB01 Publication (Application publication date: 20200124)
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication