CN107610690B - Data processing method and device


Info

Publication number: CN107610690B
Application number: CN201710930363.9A
Authority: CN (China)
Prior art keywords: voice, output result, voice request, identifier, input
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Other versions: CN107610690A (Chinese)
Inventor: 蔡明祥
Current and original assignee: Lenovo Beijing Ltd
Filing history: application filed by Lenovo Beijing Ltd; priority to CN201710930363.9A; publication of CN107610690A; application granted; publication of CN107610690B

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The invention relates to the field of multimedia processing, and in particular to a data processing method and device. The method is applied to a multimedia terminal and comprises the following steps: receiving a first input; generating a first voice request according to the first input; acquiring a first voice output result obtained by processing the first voice request; judging whether the first voice output result meets a first preset condition, and obtaining a first judgment result; and when the first judgment result shows that the first voice output result does not meet the first preset condition, not playing the first voice output result. With the method provided by the invention, the voice output result played by the multimedia terminal always corresponds to the latest voice request, so that the voice output result matches the voice request and the played result meets the user's expectation.

Description

Data processing method and device
This application is a divisional application of the application filed on December 11, 2012, with application number 201210533421.1, entitled "Data processing method and apparatus".
Technical Field
The present invention relates to the field of multimedia processing, and in particular, to a data processing method and apparatus.
Background
TTS (Text To Speech) is a speech synthesis technique that converts a user's text input into speech data and plays it back to the user. Because the speech produced by TTS sounds natural and lifelike, the technology is widely used in the field of voice control and provides a very good user experience. In the prior art, TTS generally works in an asynchronous playing mode: after a client requests a voice event from a TTS server, the client waits for the TTS server to feed back voice information, and plays that information once the server returns it. If the user quickly makes another voice event request while the client is still waiting for the server's feedback, playing the feedback for the first voice event request obviously does not meet the user's expectation. Therefore, the TTS asynchronous voice output method in the prior art cannot guarantee that the played voice data matches the user's voice request.
Disclosure of Invention
In order to solve the above technical problem, embodiments of the present invention provide a data processing method and apparatus, which can implement a matching correspondence between a voice request and played voice data. The technical scheme is as follows:
according to a first aspect of the embodiments of the present invention, a data processing method is disclosed, which is applied to a multimedia terminal, and the method includes:
receiving a first input;
generating a first voice request according to the first input;
acquiring a first voice output result obtained by processing the first voice request;
judging whether the first voice output result meets a first preset condition or not, and obtaining a first judgment result;
and when the first judgment result shows that the first voice output result does not meet the first preset condition, not playing the first voice output result.
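The claimed flow can be sketched as follows. This is a minimal Python illustration, not the patent's own implementation; all class, method, and field names are hypothetical, since the patent defines the method abstractly:

```python
class MultimediaTerminal:
    """Sketch of the first-aspect method: play a voice output result
    only when it satisfies the first preset condition, i.e. when it
    corresponds to the latest voice request."""

    def __init__(self):
        self._next_id = 0
        self.latest_id = None  # third identifier: id of the newest request

    def generate_request(self, user_input):
        # Generate a voice request with a unique first identifier and
        # remember that identifier as the latest one.
        self._next_id += 1
        request = {"input": user_input, "id": self._next_id}
        self.latest_id = request["id"]
        return request

    def maybe_play(self, request, voice_output):
        # First preset condition: the result's request identifier must
        # equal the identifier of the most recent request; otherwise the
        # stale result is silently dropped.
        if request["id"] == self.latest_id:
            return voice_output  # "play" the result
        return None              # do not play
```

In this sketch, issuing a second request before the first result arrives causes the first result to be dropped, which is exactly the behavior the claim describes.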
Preferably, after receiving the first input, the method further comprises:
receiving a second input;
generating a second voice request according to the second input;
acquiring a second voice output result obtained by processing the second voice request;
when the first voice output result is judged not to meet the first preset condition, judging whether the second voice output result meets the first preset condition or not, and obtaining a second judgment result;
and when the second judgment result shows that the second voice output result meets a first preset condition, playing a second voice output result corresponding to the second voice request.
Preferably, the generating a first voice request according to the first input comprises:
processing the first input to obtain a first processing result;
and taking the first processing result as a first voice request.
Preferably, the generating a first voice request according to the first input comprises:
and generating a first voice request and a first identifier corresponding to the first voice request according to the first input, and storing the corresponding relation between the first voice request and the first identifier.
Preferably, the determining whether the first voice output result meets a first preset condition includes:
acquiring a first voice request corresponding to a first voice output result according to the first voice output result;
acquiring a first identifier according to the corresponding relation between the first voice request and the first identifier;
acquiring a third identifier, comparing the first identifier with the third identifier, and determining that a first preset condition is met when the first identifier is the same as the third identifier; wherein the third identification corresponds to a most recent voice request.
Preferably, the obtaining a first voice output result obtained by processing the first voice request includes:
sending the first voice request to a server so that the server processes the first voice request to obtain a first voice output result;
and receiving a first voice output result sent by the server.
Preferably, the first identifier is a timestamp, a universally unique identifier (UUID), or a hash value.
Preferably, when the first identifier is a timestamp, the generating a first voice request and a first identifier corresponding to the first voice request according to the first input includes:
generating a first voice request according to the first input;
generating a first local timestamp corresponding to the first voice request as a first identifier according to the time at which the first voice request is generated, and storing the corresponding relation between the first voice request and the first local timestamp;
the method further comprises the following steps:
generating a global timestamp as a third identifier according to the time at which the first voice request is generated; the third identifier is updated when a new voice request is generated.
Preferably, the acquiring a third identifier and comparing the first identifier with the third identifier includes:
acquiring a global timestamp corresponding to a latest voice request;
and comparing a first local timestamp corresponding to the first voice request with the global timestamp.
According to a second aspect of the embodiments of the present invention, there is disclosed a data processing apparatus, the apparatus including:
a first receiving unit for receiving a first input;
a first generating unit, configured to generate a first voice request according to the first input;
the first acquisition unit is used for acquiring a first voice output result obtained by processing the first voice request;
the first judging unit is used for judging whether the first voice output result meets a first preset condition or not and acquiring a first judging result;
and the output unit is used for not playing the first voice output result when the first judgment result shows that the first voice output result does not meet the first preset condition.
Preferably, the apparatus further comprises:
a second receiving unit for receiving a second input;
a second generating unit, configured to generate a second voice request according to the second input;
the second acquisition unit is used for acquiring a second voice output result obtained by processing the second voice request;
the second judging unit is used for judging whether the second voice output result meets the first preset condition or not when the first voice output result does not meet the first preset condition, and acquiring a second judging result;
the output unit is further configured to play a second voice output result corresponding to the second voice request when the second determination result indicates that the second voice output result satisfies a first preset condition.
Preferably, the first generating unit is specifically configured to process the first input to obtain a first processing result; and taking the first processing result as a first voice request.
Preferably, the first generating unit is further configured to generate a first voice request and a first identifier corresponding to the first voice request according to the first input, and store a corresponding relationship between the first voice request and the first identifier.
Preferably, the first judging unit includes:
the second acquisition unit is used for acquiring a first voice request corresponding to a first voice output result according to the first voice output result;
a third obtaining unit, configured to obtain a first identifier according to a corresponding relationship between the first voice request and the first identifier;
the comparison unit is used for acquiring a third identifier, comparing the first identifier with the third identifier, and determining that a first preset condition is met when the first identifier is the same as the third identifier; wherein the third identification corresponds to a most recent voice request.
Preferably, the first obtaining unit includes:
the sending unit is used for sending the first voice request to a server so that the server processes the first voice request to obtain a first voice output result;
and the receiving unit is used for receiving the first voice output result sent by the server.
Preferably, the first identifier is a timestamp, a universally unique identifier (UUID), or a hash value.
Preferably, when the first identifier is a timestamp, the first generating unit includes:
a voice request generating unit, configured to generate a first voice request according to the first input;
a first identifier generation unit, configured to generate a first local timestamp corresponding to the first voice request as a first identifier according to the time when the first voice request is generated, and store a correspondence between the first voice request and the first local timestamp;
a third identifier generating unit, configured to generate a global timestamp as a third identifier according to the time at which the first voice request is generated; the third identifier is updated when a new voice request is generated.
Preferably, the comparing unit is specifically configured to obtain a global timestamp, where the global timestamp corresponds to the latest voice request; comparing a first local timestamp corresponding to the first voice request to the global timestamp.
The embodiment of the invention has the following beneficial effects in one aspect: the invention provides a data processing method, which is applied to a multimedia terminal, wherein the multimedia terminal receives a first input, generates a first voice request according to the first input, and acquires a first voice output result obtained by processing the first voice request. Judging whether the first voice output result meets a first preset condition or not, and obtaining a first judgment result; and when the first judgment result shows that the first voice output result does not meet the first preset condition, not playing the first voice output result. In this way, when the multimedia terminal judges that the returned first voice output result does not meet the preset condition, the returned first voice output result is determined not to correspond to the latest voice request, the first voice output result is not played, and the first voice output result is played only when the first voice output result corresponds to the latest voice request. Therefore, the voice output result played by the multimedia terminal always corresponds to the latest voice request, the matching of the voice output result and the voice request is realized, and the voice playing result meets the expectation of a user.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a first embodiment of a data processing method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a second embodiment of a data processing method according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a third embodiment of a data processing method according to an embodiment of the present invention;
fig. 4 is a schematic diagram of an embodiment of a data processing apparatus according to the present invention.
Detailed Description
The embodiment of the invention provides a data processing method and a data processing device, which can solve the problem that a voice request is matched with played voice data.
In order to help those skilled in the art better understand the technical solution of the present invention, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings. The described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Referring to fig. 1, a flowchart of a data processing method according to a first embodiment of the present invention is shown.
The method provided by the first embodiment of the invention is applied to a multimedia terminal which is provided with an output unit for outputting audio data. The multimedia terminal can be an electronic device such as a smart television, a mobile phone, a PAD, a computer and the like.
S101, receiving a first input.
The multimedia terminal receives a first input, which may be a key input, a gesture input, a cursor input, or a voice input. The multimedia terminal may have a user interface for receiving the first input from a user, the first input being associated with a voice request. The user can trigger the voice request through a preset key action, an input instruction, a mouse click, a cursor click or movement, or a preset gesture. Alternatively, text information entered by the user serves as the first input, or a voice input of the user serves as the first input. When the first input is a voice input, the multimedia terminal should have an audio collecting unit for collecting the user's voice. Of course, the first input may also be control information or data from another electronic device.
S102, generating a first voice request according to the first input.
In specific implementation, when the first input is a non-text input, the first input is processed and converted into a text input, and the text input result is used as the first voice request. Further, when the first input is a voice input, voice recognition processing is performed to convert the voice input into a text input. Preferably, semantic recognition processing is then performed on the text input result obtained from the voice input, and the semantic recognition result is used as the first voice request. The purpose of the semantic recognition processing is to perform semantic analysis on the text input result so as to obtain a result that can be recognized by a computing device having a processor. Generally, the result of semantic recognition or analysis may include one or more of an action, a target of the action, or an application scenario. The invention is not limited in this regard.
Further, one possible implementation manner of generating the first voice request according to the first input is as follows: processing the first input to obtain a first processing result; and taking the first processing result as a first voice request. In specific implementation, a user performs a first input through a multimedia terminal to initiate a first voice request, and when the user desires to play a processing result of the first input, the user needs to process the first input first to obtain a first processing result, and the first processing result is used as the first voice request.
Further, another implementation manner of generating the first voice request according to the first input is as follows: and generating a first voice request and a first identifier corresponding to the first voice request according to the first input, and storing the corresponding relation between the first voice request and the first identifier. The first identification may be a timestamp, a universally unique identifier UUID, or a hash value. Wherein the first identifier is used for uniquely identifying the first voice request. The invention is not limited to the specific manner of the first identifier, and other implementations obtained by those skilled in the art without inventive labor fall within the scope of the invention.
S103, acquiring a first voice output result obtained by processing the first voice request.
In this embodiment of the present invention, the multimedia terminal further has a communication module for performing data connection with the server. Preferably, the server is a cloud TTS server.
Step S103 is specifically realized by the following steps:
S103A, the multimedia terminal sends the first voice request to the server, so that the server processes the first voice request to obtain a first voice output result.
The multimedia terminal sends the first voice request to the server, and the server responds to the first voice request and processes it to obtain a first voice output result. The specific way in which the server obtains the first voice output result from the first voice request may follow the prior art, and is not described here again.
S103B, receiving the first voice output result sent by the server.
And after the server processes the first voice request, sending the obtained first voice output result to the multimedia terminal, and receiving the first voice output result sent by the server by the multimedia terminal.
S104, judging whether the first voice output result meets a first preset condition or not, and obtaining a first judgment result.
In the first embodiment of the present invention, in order to achieve that the currently played voice output result of the multimedia terminal is always matched with the latest voice request, a first preset condition is set, and when it is determined that the first voice output result satisfies the first preset condition, the first voice output result is played. And when the first voice output result is judged not to meet the first preset condition, the first voice output result is not played. The first preset condition is used for judging whether the currently acquired voice output result is matched with the latest voice request. Corresponding to the step of the first example, the first preset condition is used for judging whether the acquired first voice output result is matched with the latest voice request. In a specific implementation, the first preset condition may be preset by a system or a user.
Preferably, when the implementation manner of generating the first voice request is to generate the first voice request and the first identifier corresponding to the first voice request according to the first input, the determining whether the first voice output result satisfies the first preset condition may specifically include:
S104A, according to the first voice output result, acquiring a first voice request corresponding to the first voice output result.
In the embodiment of the invention, the multimedia terminal is provided with a communication module that can carry out data communication with the server. The communication module has a processing mechanism that can maintain the correspondence between a sent voice request and the voice output result returned by the server. In specific implementation, the processing mode of the communication module may be set to a synchronous mode: after a sub-module of the communication module sends a voice request, that sub-module waits for the server to return the voice output result obtained by processing that request. The communication module may have a plurality of sub-modules for transmitting/receiving data, and these sub-modules may be further divided into a transmitting unit and a receiving unit.
And when the multimedia terminal receives a first voice output result returned by the server, acquiring a first voice request corresponding to the first voice output result.
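The synchronous sub-module behavior described above can be sketched as follows. The `send` callable stands in for the real server transport, which the patent does not specify; all names are illustrative:

```python
import queue
import threading


def send_request_sync(send, voice_request, timeout=5.0):
    """Sketch of a synchronous sub-module: after sending a voice
    request, block until the server's voice output result for that
    same request arrives, so the result is inherently paired with
    the request that produced it (cf. step S104A)."""
    reply = queue.Queue(maxsize=1)
    send(voice_request, on_result=reply.put)  # server delivers the result here
    return reply.get(timeout=timeout)         # wait for this request's own result
```

Because each sub-module waits only for its own reply, the terminal can map every returned voice output result back to the voice request that generated it, which is what step S104A relies on.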
S104B, obtaining a first identifier according to the corresponding relation between the first voice request and the first identifier.
And acquiring the first identifier according to the pre-stored corresponding relation between the first voice request and the first identifier.
S104C, acquiring a third identifier, comparing the first identifier with the third identifier, and determining that a first preset condition is met when the first identifier is the same as the third identifier; wherein the third identification corresponds to a most recent voice request.
Wherein the third identifier corresponds to the most recent voice request. In the first embodiment of the invention, each time the multimedia terminal receives a user input, it generates a voice request corresponding to that input and assigns it a unique identifier. When there are multiple user inputs, the third identifier is the most recently generated identifier, corresponding to the most recent voice request.
The first identifier corresponding to the first voice request/first voice output result is compared with the third identifier. If the two are the same, the first voice output result is determined to correspond to the latest voice request and is judged to meet the first preset condition. If the first identifier differs from the third identifier, the first voice output result is determined not to correspond to the latest voice request and is judged not to meet the first preset condition.
S105, when the first judgment result shows that the first voice output result does not meet the first preset condition, the first voice output result is not played.
In the first embodiment of the present invention, the first speech output result is played only when the first speech output result satisfies the first preset condition, and the first speech output result is not played when the first speech output result does not satisfy the first preset condition. Therefore, the voice output result played by the multimedia terminal is ensured to always correspond to the latest voice request, the matching of the voice output result and the voice request is realized, the true expectation of a user is better met, and the user experience is improved.
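Steps S104A-S104C can be sketched with a UUID as the first identifier (one of the options the claims name). This is an illustrative Python example; the table keyed by a request string, and all function names, are assumptions:

```python
import uuid

request_table = {}          # stored correspondence: request -> first identifier
state = {"third_id": None}  # third identifier: id of the most recent request


def generate_voice_request(request):
    # S102 variant: generate the voice request together with a UUID
    # first identifier and store the correspondence (the claims also
    # allow timestamps or hash values).
    first_id = uuid.uuid4()
    request_table[request] = first_id
    state["third_id"] = first_id  # newest request updates the third identifier
    return request


def satisfies_first_condition(request):
    # S104B: recover the first identifier from the stored correspondence.
    first_id = request_table[request]
    # S104C: the first preset condition holds only when the first
    # identifier equals the third identifier.
    return first_id == state["third_id"]
```

After a newer request arrives, the older request's identifier no longer matches the third identifier, so its result is never played (step S105).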
Referring to fig. 2, a flowchart of a data processing method according to a second embodiment of the present invention is shown.
The method provided by the second embodiment of the invention is applied to a multimedia terminal which is provided with an output unit for outputting audio data. The multimedia terminal can be an electronic device such as a smart television, a mobile phone, a PAD, a computer and the like.
In the second embodiment of the present invention, the case where the multimedia terminal receives two input requests is described, and it will be understood by those skilled in the art that the method provided in the second embodiment of the present invention can also be applied to the case where the multimedia terminal receives a plurality of input requests. Those skilled in the art can make modifications and variations to the present invention without inventive step, and all such modifications and variations are within the scope of the present invention.
S201, receiving a first input.
S202, generating a first voice request according to the first input.
In specific implementation, when the first input is a non-text input, the first input is processed and converted into a text input, and the text input result is used as the first voice request. Further, when the first input is a voice input, voice recognition processing is performed to convert the voice input into a text input. Preferably, semantic recognition processing is then performed on the text input result obtained from the voice input, and the semantic recognition result is used as the first voice request. The purpose of the semantic recognition processing is to perform semantic analysis on the text input result so as to obtain a result that can be recognized by a computing device having a processor. Generally, the result of semantic recognition or analysis may include one or more of an action, a target of the action, or an application scenario. The invention is not limited in this regard.
Further, one possible implementation manner of generating the first voice request according to the first input is as follows: processing the first input to obtain a first processing result; and taking the first processing result as a first voice request. In specific implementation, a user performs a first input through a multimedia terminal to initiate a first voice request, and when the user desires to play a processing result of the first input, the user needs to process the first input first to obtain a first processing result, and the first processing result is used as the first voice request.
Further, another implementation manner of generating the first voice request according to the first input is as follows: and generating a first voice request and a first identifier corresponding to the first voice request according to the first input, and storing the corresponding relation between the first voice request and the first identifier. The first identification may be a timestamp, a universally unique identifier UUID, or a hash value. Wherein the first identifier is used for uniquely identifying the first voice request. The invention is not limited to the specific manner of the first identifier, and other implementations obtained by those skilled in the art without inventive labor fall within the scope of the invention.
Further, after generating the first identifier and storing the corresponding relationship between the first identifier and the first voice request, the method provided by the present invention further comprises: generating a third identifier. The third identifier corresponds to the most recent voice request. In specific implementation, when the first voice request and the first identifier are generated, a copy of the first identifier is used as the third identifier. The third identifier is updated when a new voice request is generated.
S203, a first voice output result obtained by processing the first voice request is obtained.
And S204, receiving a second input.
Wherein the second input occurs after the first input.
And S205, generating a second voice request according to the second input.
The implementation manner of generating the second voice request according to the second input is the same as that of generating the first voice request according to the first input. In specific implementation, a second voice request and a second identifier corresponding to the second voice request are generated according to the second input, and the corresponding relation between the second voice request and the second identifier is stored. The second identifier may be a timestamp, a universally unique identifier (UUID), or a hash value, and is used to uniquely identify the second voice request. The invention is not limited to the specific form of the second identifier, and other implementations obtained by those skilled in the art without inventive labor fall within the scope of the invention. Typically, the first identifier is of the same type as the second identifier.
Further, as mentioned earlier, a third identifier is generated at the same time as or after the first identifier, and the third identifier corresponds to the most recent voice request. Therefore, when a new voice request is generated, i.e., when the second voice request is generated, the third identifier is updated. Specifically, when the second voice request and the second identifier are generated, a copy of the second identifier is taken as the third identifier. In this way, the third identifier is updated whenever a new voice request is generated.
It will be understood by those skilled in the art that although the second input is generated later than the first input, the steps of processing the first input (S202, S203) and the steps of processing the second input (S205, S206) may be performed in reverse order or in parallel.
S206, a second voice output result obtained by processing the second voice request is obtained.
S207, judging whether the first voice output result meets a first preset condition or not, and obtaining a first judgment result.
In specific implementation, the first preset condition is used for judging whether the currently acquired voice output result is matched with the latest voice request. And when the first voice output result is judged to meet the first preset condition, playing the first voice output result. When it is determined that the first speech output result does not satisfy the first preset condition, the first speech output result is not played, and the process proceeds to step S208.
The description takes as an example a first preset condition of determining whether the identifier corresponding to the currently acquired voice output result matches the identifier corresponding to the most recently updated voice request. In specific implementation, the first preset condition is to determine whether the first identifier corresponding to the first voice output result is the same as the third identifier. Since the third identifier was updated (replaced with a copy of the second identifier) when the second voice request was generated, comparing the first identifier with the third identifier yields the determination result that the two are different, and the process proceeds to step S208.
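The identifier bookkeeping of the second embodiment can be sketched as follows, taking a UUID as the unique identifier; the names `request_ids` and `latest_id` are illustrative, not from the patent:

```python
import uuid

# Sketch of steps S202/S205 and S207/S208: each voice request receives a
# unique identifier, a copy of which becomes the "third identifier" tracking
# the most recent request. An output result is played only if its identifier
# still matches the third identifier.

latest_id = None   # third identifier: identifier of the most recent voice request
request_ids = {}   # stored correspondence: voice request -> identifier

def generate_voice_request(text):
    """Generate a voice request and its unique identifier (S202/S205)."""
    global latest_id
    request_id = str(uuid.uuid4())   # could equally be a timestamp or hash value
    request_ids[text] = request_id
    latest_id = request_id           # copy of the new identifier -> third identifier
    return request_id

def should_play(result_id):
    """First preset condition (S207/S208): does the result match the latest request?"""
    return result_id == latest_id

first_id = generate_voice_request("first input")
second_id = generate_voice_request("second input")   # updates the third identifier
assert not should_play(first_id)    # first result is stale and is discarded
assert should_play(second_id)       # second result corresponds to the latest request
```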
S208, when the first voice output result is judged not to meet the first preset condition, judging whether the second voice output result meets the first preset condition or not, and obtaining a second judgment result.
The first preset condition is further used for judging whether the currently acquired voice output result (i.e. the second voice output result) is matched with the latest voice request.
Still taking the first preset condition as determining whether the identifier corresponding to the current voice output result matches the identifier corresponding to the most recently updated request: in this step, the first preset condition is to determine whether the second identifier corresponding to the second voice output result is the same as the third identifier. Since the third identifier was updated (replaced with a copy of the second identifier) when the second voice request was generated, comparing the second identifier with the third identifier yields the determination result that the two are the same; it is therefore determined that the second voice output result satisfies the first preset condition, and the process proceeds to step S209.
S209, when the second judgment result shows that the second voice output result meets the first preset condition, playing a second voice output result corresponding to the second voice request.
When the second voice output result satisfies the first preset condition, the second voice output result corresponding to the second voice request is played. If there are more than two current inputs and the second voice output result is judged not to satisfy the first preset condition, that is, the second voice output result is determined not to correspond to the latest voice request, the second voice output result is not played.
In the second embodiment of the present invention, when the multimedia terminal receives two or more inputs requesting voice, a voice output result is played only when it is determined that the currently acquired voice output result corresponds to the latest voice request; otherwise, the voice output result is discarded and not played. In specific implementation, a unique identifier is assigned to each voice request, and the identifier corresponding to the currently acquired voice output result is compared with the identifier corresponding to the latest voice request. Only when the two identifiers are judged to be the same, that is, only when the currently acquired voice output result is determined to correspond to the latest voice request, is that result output, so that the matching of the voice output result and the voice request is realized and the user experience is improved. Moreover, because the multimedia terminal matches the voice request with the voice output result entirely by means of the assigned unique identifier, no additional operation is required of the server, which avoids retrofitting the server and saves network transmission resources.
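Because the server may return results out of order, the discard logic above is easiest to see with concurrent requests. A minimal asyncio-based sketch follows; the concurrency model and all names are assumptions, not prescribed by the patent:

```python
import asyncio
import itertools

# Two requests are issued; the first result arrives later than the second
# (its simulated processing is slower), so its identifier no longer matches
# the latest one and it is discarded.

counter = itertools.count(1)
latest_id = 0   # third identifier: tracks the newest voice request

async def handle_input(text, processing_delay):
    global latest_id
    request_id = next(counter)                 # unique identifier for this request
    latest_id = request_id                     # copy becomes the third identifier
    await asyncio.sleep(processing_delay)      # stand-in for server processing
    result = f"result for {text!r}"
    if request_id == latest_id:                # first preset condition
        return result                          # play the result
    return None                                # stale result: not played

async def main():
    # the first request is slower than the second, so its result arrives later
    r1, r2 = await asyncio.gather(
        handle_input("first input", 0.2),
        handle_input("second input", 0.1),
    )
    assert r1 is None                          # first voice output result discarded
    assert r2 == "result for 'second input'"   # latest result is played

asyncio.run(main())
```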
Referring to fig. 3, a flowchart of a data processing method according to a third embodiment of the present invention is shown.
In the methods provided in the first and second embodiments of the present invention, the unique identifier assigned to a generated voice request may specifically be a timestamp, a universally unique identifier (UUID), or a hash value, and is used to uniquely identify the voice request and the voice output result corresponding to it. The specific application scenario of the invention is described below taking the unique identifier as a timestamp as an example; the following method can also be used where other identifiers are adopted. Those skilled in the art can modify and adapt the methods provided in the following examples to other forms of identifier without inventive labor, and the embodiments obtained thereby fall within the protection scope of the invention.
In the third embodiment of the present invention, the case where the multimedia terminal receives two input requests is again taken as an example; it can be understood by those skilled in the art that the method provided in the third embodiment can also be applied where the multimedia terminal receives more than two input requests. Modifications and variations made without inventive labor fall within the protection scope of the invention.
S301, receiving a first input.
S302, generating a first voice request according to the first input, generating a first local timestamp corresponding to the first voice request, and generating a global timestamp according to the time at which the first voice request is generated.
In a specific implementation, one possible manner of generating the first voice request according to the first input is as follows: processing the first input to obtain a first processing result, and taking the first processing result as the first voice request. In specific implementation, a user performs a first input through the multimedia terminal to initiate a first voice request; when the user expects the processing result of the first input to be played, the first input must first be processed to obtain a first processing result, which is then used as the first voice request. To illustrate with an example: the user sends an input (which may be a text input or a voice input) to the multimedia terminal asking "what time is it now"; the multimedia terminal then needs to process the input, that is, obtain the current time, and take the result of processing the input (for example, "it is now 12 o'clock") as the first voice request. Of course, this is only a simple example; processing the first input may involve more complex operations such as querying, retrieving, translating, or converting, which the present invention does not limit.
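A minimal sketch of this step, with the time lookup standing in for the more complex processing the paragraph mentions; the function name and wording of the result are illustrative:

```python
import datetime

# Sketch of step S302: turn a raw first input into a first processing result,
# which is then used as the first voice request.

def process_input(text):
    """Process an input and return the processing result (the voice request)."""
    if "what time" in text.lower():
        # map the 24-hour clock to a 12-hour reading; hour 0 becomes 12
        hour = datetime.datetime.now().hour % 12 or 12
        return f"It is now {hour} o'clock"
    # placeholder for the querying / retrieving / translating cases
    return f"Processed: {text}"

first_voice_request = process_input("What time is it now?")
```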
When the first voice request is generated according to the first input, a first local timestamp corresponding to the first voice request is generated as the first identifier according to the time at which the first voice request was generated, and the corresponding relation between the first voice request and the first local timestamp is stored.
Further, after generating the first local timestamp and storing the corresponding relation between the first local timestamp and the first voice request, the method provided by the present invention further includes: generating a global timestamp as a third identifier according to the time at which the first voice request was generated. The global timestamp corresponds to the most recent voice request. Specifically, when the first voice request and the first local timestamp are generated, a copy of the first local timestamp is taken as the global timestamp. The global timestamp is updated whenever a new voice request is generated.
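The local/global timestamp bookkeeping of S302 and S305 can be sketched as follows. The names `request_table` and `global_timestamp` are illustrative; note also that a production implementation would have to guarantee that two requests generated within the same clock tick still receive distinct timestamps:

```python
import time

# Sketch of the timestamp form of the identifier: each request gets a local
# timestamp, and a copy of the newest local timestamp is kept as the global
# timestamp (the third identifier).

request_table = {}        # voice request -> local timestamp
global_timestamp = None   # third identifier: timestamp of the latest request

def register_request(voice_request):
    """Generate a local timestamp for the request and update the global one."""
    global global_timestamp
    local_ts = time.monotonic_ns()           # first/second local timestamp
    request_table[voice_request] = local_ts  # store the correspondence
    global_timestamp = local_ts              # copy becomes the global timestamp
    return local_ts

def matches_latest(local_ts):
    """First preset condition of S307/S308: result matches the latest request?"""
    return local_ts == global_timestamp

ts1 = register_request("first voice request")
ts2 = register_request("second voice request")   # updates the global timestamp
assert matches_latest(ts2)   # only the newest request's result may be played
```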
S303, acquiring a first voice output result obtained by processing the first voice request.
S304, receiving a second input.
Wherein the second input occurs after the first input.
S305, generating a second voice request according to the second input, generating a second local timestamp corresponding to the second voice request, and updating the global timestamp according to the time generated by the second voice request.
The implementation manner of generating the second voice request according to the second input is the same as that of generating the first voice request according to the first input. In specific implementation, a second voice request and a second local timestamp corresponding to the second voice request are generated according to the second input, and the corresponding relation between the second voice request and the second local timestamp is stored.
Further, as mentioned previously, a global timestamp is generated at the same time as, or after, the first local timestamp, and corresponds to the most recent voice request. The global timestamp is therefore updated when a new voice request, namely the second voice request, is generated. Specifically, when the second voice request and the second local timestamp are generated, a copy of the second local timestamp is taken as the global timestamp. In this way, the global timestamp is updated whenever a new voice request is generated.
It will be understood by those skilled in the art that although the second input occurs later than the first input, the steps of processing the first input (S302, S303) and the steps of processing the second input (S305, S306) may be executed in reverse order or in parallel.
S306, acquiring a second voice output result obtained by processing the second voice request.
S307, acquiring the global timestamp, comparing the first local timestamp with the global timestamp to obtain a first determination result, and, when the first determination result indicates that the first local timestamp is different from the global timestamp, proceeding to step S308.
S308, judging whether a second local timestamp corresponding to the second voice output result is the same as the global timestamp, and acquiring a second judgment result.
S309, when the second judgment result shows that the second local timestamp corresponding to the second voice output result is the same as the global timestamp, playing the second voice output result corresponding to the second voice request.
When the second local timestamp corresponding to the second voice output result is judged to be the same as the global timestamp, it is determined that the second voice output result corresponds to the latest voice request, and the second voice output result corresponding to the second voice request is played. If there are more than two current inputs and the second local timestamp corresponding to the second voice output result is judged to be different from the global timestamp, that is, the second voice output result is determined not to correspond to the latest voice request, the second voice output result is not played.
In the third embodiment of the present invention, a timestamp is used as the unique identifier assigned to each voice request. The identifier corresponding to the currently acquired voice output result is compared with the timestamp corresponding to the latest voice request, and only when the two are judged to be the same, that is, only when the currently acquired voice output result is determined to correspond to the latest voice request, is that result output. Matching between the voice output result and the voice request is thus achieved, user experience is improved, and the method is simple to implement.
Furthermore, in the first, second, and third embodiments of the present invention, after the multimedia terminal plays the voice output result, the method may further include: converting the voice output result that satisfies the first preset condition into control signalling, and controlling the multimedia terminal to execute the control signalling. For example, when the user inputs "play Liu Dehua's 'Forgetting Water'" through text or voice, the voice output result obtained after the multimedia terminal processes the input is "now playing Liu Dehua's 'Forgetting Water' for you"; at this point, while playing the voice output result, the multimedia terminal may control its processing unit to search the media library and play the audio data matched with the voice output result. The above is only an example and is not to be taken as a limitation of the invention; other embodiments obtained by those skilled in the art without inventive labor fall within the protection scope of the invention.
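A sketch of the control-signalling conversion, assuming a hypothetical media-library lookup; the library contents, dictionary shape, and file path are invented for illustration:

```python
# Convert a voice output result that passed the first preset condition into a
# control signal the terminal can execute. MEDIA_LIBRARY is a stand-in for a
# real media index; the matching is a deliberately naive substring check.

MEDIA_LIBRARY = {
    "forgetting water": "/media/liu_dehua/forgetting_water.mp3",
}

def to_control_signal(voice_output_result):
    """Derive a playback command from a played voice output result, if any."""
    text = voice_output_result.lower()
    for title, path in MEDIA_LIBRARY.items():
        if title in text:
            return {"action": "play_audio", "source": path}
    return None   # no executable command in this output result

signal = to_control_signal("Now playing Liu Dehua's Forgetting Water for you")
assert signal == {"action": "play_audio",
                  "source": "/media/liu_dehua/forgetting_water.mp3"}
```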
Fig. 4 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention.
The device comprises:
a first receiving unit 401, configured to receive a first input.
A first generating unit 402, configured to generate a first voice request according to the first input.
A first obtaining unit 403, configured to obtain a first voice output result obtained by processing the first voice request.
A first determining unit 404, configured to determine whether the first voice output result meets a first preset condition, and obtain a first determination result.
An output unit 405, configured to not play the first voice output result when the first determination result indicates that the first voice output result does not satisfy the first preset condition.
Preferably, the apparatus further comprises:
a second receiving unit for receiving a second input;
a second generating unit, configured to generate a second voice request according to the second input;
the second acquisition unit is used for acquiring a second voice output result obtained by processing the second voice request;
the second judging unit is used for judging whether the second voice output result meets the first preset condition or not when the first voice output result does not meet the first preset condition, and acquiring a second judging result;
the output unit is further configured to play a second voice output result corresponding to the second voice request when the second determination result indicates that the second voice output result satisfies a first preset condition.
Preferably, the first generating unit is specifically configured to process the first input to obtain a first processing result; and taking the first processing result as a first voice request.
Preferably, the first generating unit is further configured to generate a first voice request and a first identifier corresponding to the first voice request according to the first input, and store a corresponding relationship between the first voice request and the first identifier.
Preferably, the first judging unit includes:
the second acquisition unit is used for acquiring a first voice request corresponding to a first voice output result according to the first voice output result;
a third obtaining unit, configured to obtain a first identifier according to a corresponding relationship between the first voice request and the first identifier;
the comparison unit is used for acquiring a third identifier, comparing the first identifier with the third identifier, and determining that a first preset condition is met when the first identifier is the same as the third identifier; wherein the third identification corresponds to a most recent voice request.
Preferably, the first obtaining unit includes:
the sending unit is used for sending the first voice request to a server so that the server processes the first voice request to obtain a first voice output result;
and the third receiving unit is used for receiving the first voice output result sent by the server.
Preferably, the first identifier is a timestamp, a universally unique identifier UUID, or a hash value.
Preferably, when the first identifier is a timestamp, the first generating unit includes:
a voice request generating unit, configured to generate a first voice request according to the first input;
a first identifier generation unit, configured to generate a first local timestamp corresponding to the first voice request as a first identifier according to the time when the first voice request is generated, and store a correspondence between the first voice request and the first local timestamp;
a third identifier generating unit, configured to generate a global timestamp as a third identifier according to the time at which the first voice request is generated; the third identifier is updated when a new voice request is generated.
Preferably, the comparing unit is specifically configured to obtain a global timestamp, where the global timestamp corresponds to the latest voice request; comparing a first local timestamp corresponding to the first voice request to the global timestamp.
Preferably, the data processing apparatus may further include an audio acquisition unit for acquiring a voice input.
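The units of Fig. 4 can be sketched as one class, with the unit boundaries taken from the figure and the method bodies assumed rather than specified by the patent:

```python
import uuid

class DataProcessingApparatus:
    """Illustrative mapping of the receiving, generating, judging, and output
    units onto one object; all internals are assumptions."""

    def __init__(self):
        self._table = {}     # request -> identifier (generating unit storage)
        self._latest = None  # third identifier: latest voice request

    def receive_input(self, text):
        """Receiving unit + generating unit: make a request and its identifier."""
        request_id = str(uuid.uuid4())
        self._table[text] = request_id
        self._latest = request_id        # copy becomes the third identifier
        return request_id

    def on_output_result(self, request_id, result):
        """Judging unit + output unit: play only a non-stale output result."""
        if request_id == self._latest:   # first preset condition
            return result                # play the voice output result
        return None                      # stale result: not played

apparatus = DataProcessingApparatus()
rid1 = apparatus.receive_input("first input")
rid2 = apparatus.receive_input("second input")
assert apparatus.on_output_result(rid1, "r1") is None   # stale, discarded
assert apparatus.on_output_result(rid2, "r2") == "r2"   # latest, played
```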
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, it is relatively simple to describe, and reference may be made to some descriptions of the method embodiment for relevant points. The above-described embodiments of the apparatus are merely illustrative, wherein the modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The foregoing is directed to embodiments of the present invention, and it is understood that various modifications and improvements can be made by those skilled in the art without departing from the spirit of the invention.

Claims (9)

1. A data processing method, applied to a multimedia terminal, the method comprising:
receiving a first input;
generating a first voice request according to the first input;
acquiring a first voice output result obtained by processing the first voice request;
receiving a second input;
generating a second voice request according to the second input;
acquiring a second voice output result obtained by processing the second voice request;
judging whether the first voice output result meets a first preset condition or not, and obtaining a first judgment result;
when the first judgment result shows that the first voice output result does not meet the first preset condition, judging whether the second voice output result meets the first preset condition or not, and obtaining a second judgment result;
and when the second judgment result shows that the second voice output result meets a first preset condition, playing the second voice output result corresponding to the second voice request, and not playing the first voice output result.
2. The method of claim 1, wherein generating a first voice request based on the first input comprises:
processing the first input to obtain a first processing result;
and taking the first processing result as a first voice request.
3. The method of claim 1 or 2, wherein generating a first voice request based on the first input comprises:
and generating a first voice request and a first identifier corresponding to the first voice request according to the first input, and storing the corresponding relation between the first voice request and the first identifier.
4. The method of claim 1, wherein obtaining the first speech output result from processing the first speech request comprises:
sending the first voice request to a server so that the server processes the first voice request to obtain a first voice output result;
and receiving a first voice output result sent by the server.
5. The method of claim 3, wherein the first identifier is a timestamp, a Universally Unique Identifier (UUID), or a hash value.
6. A data processing method, applied to a multimedia terminal, the method comprising:
receiving a first input;
generating a first voice request and a first identifier corresponding to the first voice request according to the first input, and storing the corresponding relation between the first voice request and the first identifier;
acquiring a first voice output result obtained by processing the first voice request;
acquiring a first identifier according to the corresponding relation between the first voice request and the first identifier;
acquiring a third identifier, comparing the first identifier with the third identifier, and determining that a first preset condition is met when the first identifier is the same as the third identifier; wherein the third identification corresponds to a most recent voice request;
and when the first judgment result shows that the first voice output result does not meet the first preset condition, not playing the first voice output result.
7. The method of claim 6, wherein acquiring the third identifier and comparing the first identifier with the third identifier comprises:
obtaining a global timestamp corresponding to a latest voice request;
comparing a first local timestamp corresponding to the first voice request to the global timestamp.
8. A data processing method, applied to a multimedia terminal, the method comprising:
receiving a first input;
generating a first voice request and a first identifier corresponding to the first voice request according to the first input, and storing the corresponding relation between the first voice request and the first identifier, wherein the first identifier is a timestamp, a universal unique identification code (UUID) or a hash value;
acquiring a first voice output result obtained by processing the first voice request;
judging whether the first voice output result meets a first preset condition or not, and obtaining a first judgment result;
when the first judgment result shows that the first voice output result does not meet a first preset condition, the first voice output result is not played;
when the first identifier is a timestamp, generating a first voice request and a first identifier corresponding to the first voice request according to the first input, and storing a corresponding relationship between the first voice request and the first identifier includes:
generating a first voice request according to the first input;
generating a first local timestamp corresponding to the first voice request as a first identifier according to the time of the first voice request generation, and storing the corresponding relation between the first voice request and the first local timestamp;
the method further comprises the following steps:
generating a global timestamp as a third identifier according to the time at which the first voice request is generated; the third identifier is updated when a new voice request is generated.
9. A data processing apparatus, characterized in that the apparatus comprises:
a first receiving unit for receiving a first input;
a first generating unit, configured to generate a first voice request according to the first input;
the first acquisition unit is used for acquiring a first voice output result obtained by processing the first voice request;
a second receiving unit for receiving a second input;
a second generating unit, configured to generate a second voice request according to the second input;
the second acquisition unit is used for acquiring a second voice output result obtained by processing the second voice request;
the first judging unit is used for judging whether the first voice output result meets a first preset condition or not and acquiring a first judging result;
the second judging unit is used for judging whether the second voice output result meets the first preset condition or not when the first voice output result does not meet the first preset condition, and acquiring a second judging result;
and the output unit is used for playing the second voice output result corresponding to the second voice request and not playing the first voice output result when the second judgment result shows that the second voice output result meets a first preset condition.
CN201710930363.9A 2012-12-11 2012-12-11 Data processing method and device Active CN107610690B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710930363.9A CN107610690B (en) 2012-12-11 2012-12-11 Data processing method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710930363.9A CN107610690B (en) 2012-12-11 2012-12-11 Data processing method and device
CN201210533421.1A CN103871410B (en) 2012-12-11 2012-12-11 A kind of data processing method and device

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201210533421.1A Division CN103871410B (en) 2012-12-11 2012-12-11 A kind of data processing method and device

Publications (2)

Publication Number Publication Date
CN107610690A CN107610690A (en) 2018-01-19
CN107610690B true CN107610690B (en) 2021-09-14

Family

ID=50909874

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201710930363.9A Active CN107610690B (en) 2012-12-11 2012-12-11 Data processing method and device
CN201210533421.1A Active CN103871410B (en) 2012-12-11 2012-12-11 A kind of data processing method and device

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201210533421.1A Active CN103871410B (en) 2012-12-11 2012-12-11 A kind of data processing method and device

Country Status (1)

Country Link
CN (2) CN107610690B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107767872A (en) * 2017-10-13 2018-03-06 深圳市汉普电子技术开发有限公司 Audio recognition method, terminal device and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2308821A1 (en) * 1999-05-25 2000-11-25 Command Audio Corporation Playing audio of one kind in response to user action while playing audio of another kind
CN1245704C (en) * 2003-09-29 2006-03-15 微星科技股份有限公司 Voice output / input system and method
CN101253547A (en) * 2005-04-29 2008-08-27 摩托罗拉公司 Speech dialog method and system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3715584B2 (en) * 2002-03-28 2005-11-09 富士通株式会社 Device control apparatus and device control method
WO2005091128A1 (en) * 2004-03-18 2005-09-29 Nec Corporation Voice processing unit and system, and voice processing method
US8099289B2 (en) * 2008-02-13 2012-01-17 Sensory, Inc. Voice interface and search for electronic devices including bluetooth headsets and remote systems
JP5466519B2 (en) * 2010-01-20 2014-04-09 日立コンシューマエレクトロニクス株式会社 Information processing apparatus and signal processing method for information processing apparatus
CN102255780A (en) * 2010-05-20 2011-11-23 株式会社曙飞电子 Home network system and control method
CN102262879B (en) * 2010-05-24 2015-05-13 乐金电子(中国)研究开发中心有限公司 Voice command competition processing method and device as well as voice remote controller and digital television
CN102316227B (en) * 2010-07-06 2014-06-04 宏碁股份有限公司 Data processing method for voice call process


Also Published As

Publication number Publication date
CN103871410B (en) 2017-09-29
CN103871410A (en) 2014-06-18
CN107610690A (en) 2018-01-19

Similar Documents

Publication Publication Date Title
US11240050B2 (en) Online document sharing method and apparatus, electronic device, and storage medium
CN109474843B (en) Method for voice control of terminal, client and server
KR101777392B1 (en) Central server and method for processing of voice of user
CN103970793B (en) Information query method, client and server
US11310066B2 (en) Method and apparatus for pushing information
CN109688475B (en) Video playing skipping method and system and computer readable storage medium
CN107423070B (en) Page generation method and device
CN109271130B (en) Audio playing method, medium, device and computing equipment
CN105072146B (en) Music information sharing method and device
CN110196927B (en) Multi-round man-machine conversation method, device and equipment
KR101919257B1 (en) Application program switch method, apparatus and electronic terminal
CN110418181B (en) Service processing method and device for smart television, smart device and storage medium
CN103701994A (en) Automatic responding method and automatic responding device
CN112331202A (en) Voice screen projection method and device, electronic equipment and computer readable storage medium
CN104853251A (en) Online collection method and device for multimedia data
CN113823282B (en) Voice processing method, system and device
CN111883131A (en) Voice data processing method and device
CN110083768B (en) Information sharing method, device, equipment and medium
CN111063348A (en) Information processing method, device and equipment and computer storage medium
CN110928603A (en) Service providing method and device
CN113420159A (en) Target customer intelligent identification method and device and electronic equipment
CN104239371B (en) A kind of command information processing method and processing device
CN108509442B (en) Search method and apparatus, server, and computer-readable storage medium
CN107610690B (en) Data processing method and device
CN106792125A (en) A kind of video broadcasting method and its terminal, system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant