CN113192490A - Voice processing method and device and electronic equipment

Info

Publication number: CN113192490A
Application number: CN202110404559.0A
Authority: CN (China)
Other languages: Chinese (zh)
Inventor: 彭伟峰
Original and current assignee: Vivo Mobile Communication Co Ltd
Legal status: Pending (assumed status; not a legal conclusion)
Prior art keywords: task, tasks, application, voice, user

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/04 - Segmentation; Word boundary detection
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/28 - Constructional details of speech recognition systems
    • G10L 15/32 - Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
    • G10L 2015/223 - Execution procedure of a spoken command

Abstract

The application discloses a voice processing method and device and electronic equipment, and belongs to the technical field of electronic equipment. The method comprises the following steps: receiving a first voice signal input by a user; obtaining at least two tasks corresponding to the first voice signal according to a task separation identifier contained in the first voice signal; and executing each of the at least two tasks.

Description

Voice processing method and device and electronic equipment
Technical Field
The application belongs to the technical field of electronic equipment, and particularly relates to a voice processing method and device and electronic equipment.
Background
When a user needs to execute a task on a smartphone, the user can speak the corresponding voice to the smartphone's voice assistant, and the voice assistant can recognize the voice and execute the task it contains.
At present, a user can only send the voice assistant a voice corresponding to one task; after that task is executed, the user needs to wake up the voice assistant again and send the voice corresponding to the next task.
Because the voice assistant must be woken up repeatedly, the user experience is poor.
Disclosure of Invention
The embodiment of the application aims to provide a voice processing method, a voice processing device and electronic equipment, which can solve the problem of poor user experience caused by the need to repeatedly wake up a voice assistant.
In a first aspect, an embodiment of the present application provides a speech processing method, where the method includes: receiving a first voice signal input by a user; obtaining at least two tasks corresponding to the first voice signal according to task separation identification contained in the first voice signal; each of the at least two tasks is executed.
In a second aspect, an embodiment of the present application provides a speech processing apparatus, including: the first receiving module is used for receiving a first voice signal input by a user; the first processing module is used for obtaining at least two tasks corresponding to the first voice signal according to the task separation identifier contained in the first voice signal; and an execution module for executing each of the at least two tasks.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, and when executed by the processor, the program or instructions implement the steps of the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect.
In the embodiment of the application, a first voice signal input by a user is received; at least two tasks corresponding to the first voice signal are obtained according to a task separation identifier contained in the first voice signal; and each of the at least two tasks is executed. The embodiment of the application thus provides a voice processing method that can solve the problem of poor user experience caused by the need to repeatedly wake up a voice assistant.
Drawings
FIG. 1 is a flowchart of a speech processing method provided in this embodiment;
fig. 2 to 14 are schematic diagrams of user interfaces of the electronic device provided in the present embodiment;
fig. 15 is a block schematic diagram of a speech processing apparatus provided in the present embodiment;
fig. 16 is a schematic diagram of a hardware structure of an electronic device provided in this embodiment;
fig. 17 is a schematic diagram of a hardware structure of another electronic device provided in this embodiment.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present disclosure.
The terms "first", "second" and the like in the description and claims of the present application are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances, so that the embodiments of the application may be practiced in sequences other than those illustrated or described herein. Objects distinguished by "first", "second" and the like are generally of one kind, and the number of objects is not limited; for example, the first object may be one or more than one. In addition, "and/or" in the specification and claims means at least one of the connected objects, and the character "/" generally indicates that the preceding and succeeding objects are in an "or" relationship.
The following describes the speech processing method provided by the embodiment of the present application in detail through a specific embodiment and an application scenario thereof with reference to the accompanying drawings.
Referring to fig. 1, a speech processing method provided in this embodiment may include the following steps S110 to S130:
step S110, receiving a first voice signal input by a user.
A user may input a first voice signal relating to a plurality of tasks via a voice assistant of the electronic device. The first voice signal may be any voice signal input by the user. Typically, the voice assistant needs to be woken up before the user utters the voice corresponding to the first voice signal.
Referring to fig. 3, the user may turn on the voice assistant on the smartphone and speak voice 1: open application A, send "ten o'clock tomorrow" to user C, next open application A, send "I'm going home today" to user B. The page presented by the voice assistant while the user utters voice 1 may be as shown in fig. 3. Based on this, the voice assistant can receive the corresponding voice signal input by the user.
In a possible implementation, as shown in fig. 3, the electronic device may recognize the voice information corresponding to the first voice signal and display it at a corresponding position. This makes it convenient for the user to check whether the recognized voice information matches the expectation, ensuring that the tasks are executed accurately.
Step S120, obtaining at least two tasks corresponding to the first voice signal according to the task separation identifier included in the first voice signal.
In this embodiment, a plurality of tasks covered by one user voice are separated by task separation identifiers in the voice signal. Therefore, a plurality of tasks corresponding to the first voice signal can be obtained based on the task separation identifier contained in the first voice signal. In general, the number of tasks is the number of task separation identifiers plus 1.
In one embodiment of the present disclosure, the set task separation identifier may include at least one of a set keyword and a silence segment, where a silence segment is a voice segment that contains ambient noise but does not contain the user's voice.
In a possible implementation, as shown in fig. 2, the user can configure the task separation identifier as desired. For example, the user may set user-defined keywords such as "next" or "line feed", and/or a system default keyword, as task separation identifiers; the user may also use the silence segment produced by pausing as a task separation identifier, and may set the required pause duration as needed.
For example, after the user speaks voice 1, two tasks are obtained: open application A and send "ten o'clock tomorrow" to user C (task 1.1); open application A and send "I'm going home today" to user B (task 1.2).
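The following is a minimal sketch, not taken from the patent itself, of how a recognized transcript might be split into per-task segments using configurable separator keywords; the keyword list, function name and example transcript are illustrative assumptions (a silence segment would be detected upstream by the recognizer and can be mapped onto the same mechanism).

```python
import re

# Assumed user-configurable separator keywords (see fig. 2); hypothetical names.
SEPARATOR_KEYWORDS = ["next", "line feed"]

def split_into_segments(transcript: str) -> list[str]:
    # One alternation pattern over all configured keywords, with word boundaries
    # so that e.g. "context" is never treated as a separator.
    pattern = (r"\s*,?\s*\b(?:"
               + "|".join(re.escape(k) for k in SEPARATOR_KEYWORDS)
               + r")\b\s*,?\s*")
    return [s for s in (p.strip(" ,") for p in re.split(pattern, transcript)) if s]

voice_1 = ('open application A, send "ten o\'clock tomorrow" to user C, '
           'next open application A, send "I\'m going home today" to user B')
print(split_into_segments(voice_1))
# Two segments: the number of tasks = the number of separator identifiers + 1.
```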
In one embodiment of the present disclosure, the task includes a task operation item and an application identifier.
For example, the application identifier in task 1.1 is application A and the task operation item is sending "ten o'clock tomorrow" to user C, so task 1.1 corresponds to application A; the application identifier in task 1.2 is also application A and the task operation item is sending "I'm going home today" to user B, so task 1.2 also corresponds to application A.
Based on this, the step S120, obtaining at least two tasks corresponding to the first speech signal according to the task separation identifier included in the first speech signal, may include the following steps S1201 to S1203:
step S1201, obtaining at least two voice messages separated by the task separation mark according to the task separation mark contained in the first voice signal; wherein the at least two voice messages correspond to the at least two tasks one to one.
For example, according to voice 1 uttered by the user, based on the task separation identifier "next" contained in the corresponding voice signal, two pieces of voice information can be obtained: open application A and send "ten o'clock tomorrow" to user C (voice information 1.1); open application A and send "I'm going home today" to user B (voice information 1.2).
For another example, suppose the user utters voice 1': open application A, send "ten o'clock tomorrow" to user C, next send "I'm going home today" to user B. Based on the task separation identifier "next" contained in the corresponding voice signal, two pieces of voice information can be obtained: open application A and send "ten o'clock tomorrow" to user C (voice information 1'.1); send "I'm going home today" to user B (voice information 1'.2).
Step S1202, in the case that the first voice information includes a task operation item and an application identifier, obtaining the corresponding task from the first voice information.
Since voice information 1.1, voice information 1.2 and voice information 1'.1 each include a task operation item and an application identifier, the corresponding task 1.1, task 1.2 and task 1'.1 can be obtained directly (e.g. task 1'.1: open application A and send "ten o'clock tomorrow" to user C).
Step S1203, in the case that the first voice information contains a task operation item but no application identifier, obtaining the task corresponding to the first voice information according to the task operation item contained in the first voice information and the application identifier contained in the second voice information; the second voice information is the prior voice information in the first voice signal that contains an application identifier and is closest to the first voice information.
Since voice information 1'.2 contains a task operation item but no application identifier, the corresponding task 1'.2 (open application A and send "I'm going home today" to user B) is obtained from the application identifier contained in voice information 1'.1 together with the task operation item contained in voice information 1'.2. That is, the application corresponding to the task of voice information 1'.1 is also used as the application corresponding to the task of voice information 1'.2.
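A hedged sketch of steps S1201 to S1203, under the assumption that segments carrying their own identifier are phrased as "open application X, ..."; the regular expression, the Task type and the function names are illustrative, not the patent's implementation.

```python
import re
from dataclasses import dataclass

@dataclass
class Task:
    app_id: str       # application identifier
    operation: str    # task operation item

def parse_tasks(segments: list[str]) -> list[Task]:
    tasks: list[Task] = []
    last_app = None   # nearest preceding application identifier (for S1203)
    for seg in segments:
        m = re.match(r"open (?:the )?application (\w+)[, ]*(.*)", seg, re.IGNORECASE)
        if m:                              # S1202: segment carries its own identifier
            last_app, operation = m.group(1), m.group(2)
        else:                              # S1203: inherit the nearest preceding one
            operation = seg
        if last_app is None:
            raise ValueError(f"no application identifier available for: {seg!r}")
        tasks.append(Task(last_app, operation))
    return tasks

print(parse_tasks([
    'open application A, send "ten o\'clock tomorrow" to user C',  # voice info 1'.1
    'send "I\'m going home today" to user B',                      # voice info 1'.2
]))
```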
Step S130, each task of the at least two tasks is executed.
It can be seen that speaking voice 1' achieves the same voice control effect as speaking voice 1, so the user may prefer voice 1'. For multiple tasks in the same application, the application identifier only needs to be spoken with the first task and does not need to be repeated for each subsequent task. This simplifies the user's voice content, avoids unnecessary repetition of voice information, provides a good user experience, and is especially suitable when the same application has many tasks.
In this embodiment, after obtaining a plurality of tasks corresponding to the first voice signal, each obtained task may be executed to achieve the purpose of voice control expected by the user.
In this embodiment, each task that a user utters through one voice can be separated by the set task separation identifier, so that one user voice can correspond to a plurality of tasks, and the plurality of tasks can be identified and executed. Therefore, the user can send out a plurality of tasks by waking up the voice assistant once, the voice assistant does not need to be waken up repeatedly, and the user experience is better.
Furthermore, compared to executing multiple tasks by presetting a combination command, this implementation is not flexible because the multiple tasks in the combination command are fixed. Based on the voice processing method provided by the application, a user can speak multi-task voice as required, multiple tasks can be executed through one-time awakening, and the combination of the multiple tasks is more flexible.
Therefore, the multitask voice processing mode provided by this embodiment helps users quickly fulfill their personalized requirements and improves the user experience.
In a possible implementation, the multitask category of the multiple tasks issued by the user may be any one of the following categories (a small classification sketch in code follows the list):
class 1: single application and multiple tasks, namely, a plurality of tasks sent by a user correspond to the same application;
class 2: the multi-application and multi-task means that a plurality of tasks sent by a user respectively correspond to more than one application.
The category 2 may be further classified into the following categories 2.1 and 2.2:
class 2.1: the application number is equal to the task number, namely a plurality of tasks sent by the user respectively correspond to different applications;
class 2.2: the number of applications is smaller than the number of tasks, that is, there are not only multiple tasks corresponding to the same application but also multiple tasks corresponding to different applications among multiple tasks issued by a user.
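As referenced above, here is a small sketch of deriving the category from the application identifiers attached to the obtained tasks; the function name and category labels are assumptions that mirror the list.

```python
def multitask_category(app_ids: list[str]) -> str:
    # app_ids holds one application identifier per obtained task.
    distinct = len(set(app_ids))
    if distinct == 1:
        return "category 1: single application, multiple tasks"
    if distinct == len(app_ids):
        return "category 2.1: application number equals task number"
    return "category 2.2: application number smaller than task number"

print(multitask_category(["A", "A"]))        # e.g. voice 1
print(multitask_category(["A", "B", "C"]))   # e.g. voice 2 below
print(multitask_category(["A", "B", "B"]))   # e.g. voice 3 below
```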
In detail, the application in this embodiment may be any application that can be installed on an electronic device (e.g., a smartphone), and the application may be an instant chat application, a browser application, a map application, a song application, a photo application, or the like.
Next, the above-described respective categories will be described.
In detail, corresponding to the above category 1:
in one embodiment of the present disclosure, the at least two tasks include at least two first tasks corresponding to a first application. Based on this, the step S130, executing each task of the at least two tasks, includes: each of the first tasks is executed in the first application, respectively.
For example, after the user utters voice 1 above, since voice 1 includes one task separation identifier (the keyword "next"), two tasks can be obtained, and both correspond to application A. Thus, application A can be opened and the two tasks executed in application A.
In addition, in an embodiment of the present disclosure, if the first voice signal contains only one application identifier, the signal only needs to be split into the task items that follow that identifier; the application corresponding to the identifier is opened and each task item is executed in it. This simplifies the voice processing flow.
In detail, corresponding to the above category 2.1:
in one embodiment of the present disclosure, the at least two tasks include a first task corresponding to a first application and a second task corresponding to a second application. Based on this, the step S130, executing each task of the at least two tasks, includes: executing the first task in the first application, and executing the second task in the second application.
For example, the user utters voice 2: open application A, execute task operation item 1, next open application B, execute task operation item 2, next open application C, execute task operation item 3. Since voice 2 includes two task separation identifiers (the keyword "next"), three tasks can be obtained: open application A and execute task operation item 1 (task 2.1); open application B and execute task operation item 2 (task 2.2); open application C and execute task operation item 3 (task 2.3). Thus, application A is opened and task 2.1 is executed in application A, then application B is opened and task 2.2 is executed in application B, and then application C is opened and task 2.3 is executed in application C.
In detail, corresponding to the above category 2.2:
in one embodiment of the present disclosure, the at least two tasks include at least two first tasks corresponding to a first application and a second task corresponding to a second application. Based on this, the step S130, executing each task of the at least two tasks, includes: each of the first tasks is executed in the first application, and the second task is executed in the second application, respectively.
For example, the user utters voice 3: open application A, execute task operation item 1, next open application B, execute task operation item 2, next execute task operation item 3. Since voice 3 includes two task separation identifiers (the keyword "next"), three tasks can be obtained: open application A and execute task operation item 1 (task 3.1); open application B and execute task operation item 2 (task 3.2); open application B and execute task operation item 3 (task 3.3). Thus, application A is opened and task 3.1 is executed in application A, and then application B is opened and task 3.2 and task 3.3 are executed in application B.
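A sketch of step S130 across the three categories above: consecutive tasks that share an application are executed in that application before moving on. open_app and run_in_app are hypothetical stand-ins for the device's real application-control interface, not APIs from the patent.

```python
from itertools import groupby

def open_app(app_id: str) -> None:
    print(f"[open] application {app_id}")    # placeholder for the real app launch

def run_in_app(app_id: str, operation: str) -> None:
    print(f"[{app_id}] {operation}")         # placeholder for the real execution

def execute_tasks(tasks: list[tuple[str, str]]) -> None:
    # tasks is a list of (application identifier, task operation item) pairs in
    # execution order; each application is opened once per consecutive run.
    for app_id, group in groupby(tasks, key=lambda t: t[0]):
        open_app(app_id)
        for _, operation in group:
            run_in_app(app_id, operation)

execute_tasks([
    ("A", "execute task operation item 1"),   # voice 3, task 3.1
    ("B", "execute task operation item 2"),   # task 3.2
    ("B", "execute task operation item 3"),   # task 3.3 (same application B)
])
```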
In this embodiment, when each obtained task is executed, each task may be executed according to a corresponding task execution order.
Therefore, in an embodiment of the present disclosure, before the step S130, each of the at least two tasks is executed, the method may further include the following step a:
and step A, determining the execution sequence of the at least two tasks according to the sequence position relationship of the at least two tasks in the first voice signal.
Correspondingly, the step S130, executing each task of the at least two tasks, includes: and executing each task of the at least two tasks according to the execution sequence.
In this embodiment, under the condition that the user does not adjust the task execution sequence, each task may be sequentially executed directly according to the precedence position relationship of each task in the first voice signal.
In this embodiment, when the user adjusts the task execution order, the initial execution order may be determined according to the precedence position relationship of each task in the first voice signal, and the user adjusts the initial execution order as needed to further obtain the adjusted execution order. Further, each task may be sequentially executed according to the adjusted execution order.
In other embodiments of the present disclosure, for some or all of the obtained tasks, the user may further set that the some or all tasks may be executed simultaneously when there is no absolute precedence order among the some or all tasks.
For the above-mentioned case of sequentially executing the tasks according to the execution sequence, in an embodiment of the present disclosure, in order to facilitate the user to adjust the execution sequence as needed, the obtained tasks may be arranged and displayed before the execution of the obtained tasks.
Therefore, in an embodiment of the present disclosure, for the at least two tasks obtained in step S120, the obtained at least two tasks may be displayed on a task display page of the electronic device before step S130 is performed. Wherein, the display modes under different multitask categories can be different.
For example, for the case where the obtained plurality of tasks conform to the above category 1:
taking the above speech 1 as an example, in step S120, the task 1.1 and the task 1.2 can be obtained according to the task separation flag in the above speech 1. To facilitate user-on-demand adjustment of the execution sequence, the two tasks may be arranged and displayed on a task display page of the user's smartphone prior to execution of the two tasks. I.e. the tasks are arranged in the order in which the user speaks them. In a possible implementation, this display effect may be as shown in fig. 4.
As shown in fig. 4, each task is preceded by a flag bit (e.g., 1, 2, 3, 4, etc.), and the user can customize the ordering accordingly. For example, the user may speak the flag bits in the desired sequence, or drag the display position of the corresponding task, to adjust the order of the tasks. The voice assistant then updates the task ranking according to the spoken order or the drag operation and executes the tasks in the updated order.
Referring to fig. 4, assuming the user has not adjusted the task execution order, the voice assistant may open application A, execute task 1.1 in page 1 of application A (shown in fig. 5) to send "ten o'clock tomorrow" to user C, and then execute task 1.2 in page 2 of application A (shown in fig. 6) to send "I'm going home today" to user B.
For another example, in the case where the obtained plurality of tasks conform to the above category 2:
taking speech 4 as an example, in step S120, it is assumed that N tasks are available according to the task separation flag in the speech 4, where N is a positive integer not less than 2. To facilitate user adjustment of the execution order as needed, the N tasks may be displayed on a task display page of the user's smartphone prior to execution of the N tasks. I.e. the tasks are arranged in the order in which the user speaks them. In a possible implementation, this display effect may be as shown in fig. 7.
As shown in fig. 7, preferably, the task display page may be divided into nine blocks to show the obtained tasks, with any remaining tasks stacked beneath the visible ones.
As shown in fig. 7, each task is provided with a flag bit (e.g., 1, 2, 3, 4, etc.), and the user can customize the ordering accordingly. For example, the user may speak the flag bits in the desired sequence, or drag the display position of the corresponding task, to adjust the order of the tasks. The voice assistant then updates the task ranking according to the spoken order or the drag operation and executes the tasks in the updated order.
Corresponding to the above, in an embodiment of the present disclosure, the step a, determining the execution sequence of the at least two tasks according to the precedence position relationship of the at least two tasks in the first speech signal, may include the following steps a1 to a 5:
step A1, arranging and displaying the at least two tasks on a task page of the electronic equipment according to the sequence position relation of the at least two tasks in the first voice signal.
Taking the above speech 4 as an example, for the obtained N (N > 9) tasks, the arrangement display effect of the N tasks on the smartphone can be as shown in fig. 7.
And step A2, receiving input operation of the user on the tasks arranged and displayed on the task page.
In one possible implementation, the input operation may be an operation corresponding to an action applied by a user on the task page. For example, the input operation may be a drag operation, which may be used to adjust the task order, or to adjust the task order while combining tasks.
In another possible implementation manner, the input operation may also be an operation corresponding to a control voice for adjusting the task sequence issued by the user. For example, the user may send out the corresponding control voice within a set time period after the electronic device arranges and displays the at least two tasks.
Step A3, in response to the input operation, adjusting the ordering of the at least two tasks on the task page.
And adjusting the sequence of the corresponding tasks and displaying the adjusted sequence effect based on the task sequence requirement brought by the input operation of the user.
For example, assume that the input operation in step a2 is: the user utters speech to transpose the task of flag bit 6 and the task of flag bit 9, the adjusted sequencing effect may be as shown in fig. 8.
For another example, assume that the input operation in step a2 is: the user drags the task to combine the task of flag 8 and the task of flag 9, and the adjusted ordering effect may be as shown in fig. 9.
For another example, the input operation in step a2 is: the user utters a voice or drags a task to drop the task of flag 2 to position 1.
Step a4, it is monitored whether the current time reaches the set target time.
In one possible implementation, the target time may be a time when the user clicks a confirmation key on the task page. Based on this, the user needs to perform the above input operation after the electronic device arranges and displays the at least two tasks. After the user confirms that the task sequence is not adjusted any more, the user can click a confirmation button.
In another possible implementation manner, the target time may be set to a fixed interval after the time at which the electronic device arranges and displays the at least two tasks (the starting time). If the user adjusts the task order before the target time, the starting time is updated to the time at which the electronic device displays the latest task order, and the target time is updated accordingly. Based on this, the user needs to perform the above input operation after the electronic device arranges and displays the at least two tasks and before the target time is reached.
In addition, after the user confirms that the task sequence is not adjusted any more, the user can click a confirmation key before reaching the target time so as to update the target time to the current time.
In this embodiment, step a4 may be executed in real time after step a1, and whether the current time reaches the set target time is monitored.
Step A5, when the current time reaches the target time, determining the execution sequence of the at least two tasks according to the sequence of the at least two tasks on the task page.
As shown in fig. 8 or fig. 9, the execution order of the tasks may be determined based on the resulting ordering of the tasks on the task page.
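A simplified sketch of the flow of steps A1 to A5, assuming a polling event loop: tasks start in utterance order, user edits reorder them and push the target time back, and the order is frozen when the user confirms or the target time passes. The callback shape and the timing window are assumptions.

```python
import time

def finalize_order(tasks: list[str], poll_user_edit, window_s: float = 5.0) -> list[str]:
    # poll_user_edit is a hypothetical non-blocking poll of UI/voice events that
    # returns None, ("confirm",) or ("swap", i, j) with zero-based positions.
    deadline = time.monotonic() + window_s           # target time (step A4)
    while time.monotonic() < deadline:
        edit = poll_user_edit()
        if edit is None:
            time.sleep(0.05)
            continue
        if edit[0] == "confirm":                     # confirmation key pressed
            break
        if edit[0] == "swap":                        # e.g. transpose flags 6 and 9
            _, i, j = edit
            tasks[i], tasks[j] = tasks[j], tasks[i]
            deadline = time.monotonic() + window_s   # adjusting resets the target time
    return tasks                                     # final execution order (step A5)
```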
In the embodiment, based on the arrangement and display of the plurality of tasks, the user can conveniently confirm the task execution sequence, and can conveniently adjust the task execution sequence in a personalized manner, so that the user experience is improved.
Based on the above, in one embodiment of the present disclosure, the input operation includes a task combination operation for a part of the at least two tasks.
The partial tasks may be multiple tasks corresponding to the same application (same-application tasks), or multiple tasks corresponding to different applications, for example tasks of the same application type, associated tasks, or arbitrary tasks the user wishes to combine.
For same-application tasks, an example may be: in the same instant chat application, sending one message to one friend and another message to another friend.
For tasks of the same application type, an example may be: searching for a song in one music application and searching for the same song in another music application.
For associated tasks, an example may be: taking a picture in a photo application and then sending the latest picture to a friend in an instant chat application.
In this embodiment, each task is displayed by the electronic device, so that a user can combine two or more tasks as a combined task as needed. Wherein the combined task may be executed in preference to the non-combined task.
In this embodiment, if there are multiple combined tasks, it may be defined that the more tasks a combined task contains, the earlier it is executed, that is, the higher its execution priority. A maximum of six combined tasks per voice signal may be defined.
Correspondingly, the step A3, in response to the input operation, adjusting the ordering of the at least two tasks on the task page, may include the following steps a31 to a 32:
and step A31, responding to the task combination operation, and combining the partial tasks to obtain the corresponding combined task.
For example, if the user drags the task to combine the task at the flag 8 and the task at the flag 9, the two tasks are combined into a combined task.
Step A32, adjusting the ordering of the combined task on the task page to be prior to the ordering of the non-combined task on the task page.
As described above, if the user drags the task (e.g., the dragging motion illustrated in fig. 7) to combine the task of the identifier 8 and the task of the identifier 9, the adjusted ordering effect may be as shown in fig. 9.
In this embodiment, a user can be helped to quickly complete a series of operations by setting a combined task.
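A sketch of the ordering rule described above, under assumed data shapes: combined tasks run before uncombined ones, larger combinations run earlier, and at most six combinations per voice signal are accepted.

```python
MAX_COMBINED_TASKS = 6   # assumed cap per voice signal

def order_for_execution(singles: list[str],
                        combos: list[list[str]]) -> list[list[str]]:
    if len(combos) > MAX_COMBINED_TASKS:
        raise ValueError("at most six combined tasks per voice signal")
    # Combinations with more member tasks get the earlier (higher-priority)
    # slots; uncombined tasks follow in their existing order.
    return sorted(combos, key=len, reverse=True) + [[t] for t in singles]

print(order_for_execution(
    singles=["task of flag 1", "task of flag 2"],
    combos=[["task of flag 8", "task of flag 9"],
            ["task of flag 3", "task of flag 4", "task of flag 5"]],
))
```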
In an embodiment of the present disclosure, assuming that the tasks combined by the user are same-application tasks, after they are combined into one combined task, the internal execution order of the tasks within the combined task may be simultaneous, or may follow the order in which the tasks appear in the user's speech.
In one embodiment of the present disclosure, assuming that the tasks combined by the user are tasks of the same application type, after they are combined into one combined task, the internal execution order of the tasks may be the same, that is, the tasks are executed simultaneously.
In one embodiment of the present disclosure, assuming that the tasks combined by the user do not conform to any of a same-application task, a task of the same application type, or an associated task, after they are combined into one combined task, the internal execution order of the tasks within the combined task may be determined according to the historical execution of the tasks.
For example, the internal execution order may be determined according to the length of each task's execution link. The data on a task's execution-link length (for example, the number of steps required to execute the task) may be derived from recorded behavior data of the user executing the task or similar tasks, or from the precedence position relationship of the task in the corresponding voice signal.
Based on the above, in an embodiment of the present disclosure, after the combining the partial tasks to obtain the corresponding combined task, the method may further include steps B1 to B3:
and step B1, storing the combination task and the combination name thereof.
In this embodiment, since the combination task meets the personalized requirements of the user, the combination task can be recorded after the user combines the tasks, so that the user can subsequently directly call to execute the combination task again.
And step B2, receiving a second voice signal input by the user.
In detail, the second voice signal may include no combination name, only a combination name, or a combination name together with other tasks.
Step B3, in the event that the second speech signal includes the combined name, performing each of the combined tasks.
In this embodiment, the user may directly speak the name of the combined task to the voice assistant, and the voice assistant may execute each task in the combined task.
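A sketch of steps B1 to B3 with an assumed in-memory registry: a named combination is stored when created, and a later voice signal containing that name replays every task in it. The substring matching and the example names are illustrative assumptions.

```python
combo_registry: dict[str, list[str]] = {}

def save_combination(name: str, tasks: list[str]) -> None:
    combo_registry[name] = tasks                      # step B1

def handle_second_voice(transcript: str, execute) -> bool:
    for name, tasks in combo_registry.items():
        if name in transcript:                        # step B3: name spoken again
            for task in tasks:
                execute(task)
            return True
    return False                                      # no combination name found

save_combination("photo share", [
    "open the camera and take a picture",
    "open instant chat application 1 and send the latest photo to family member A",
])
handle_second_voice("photo share", execute=print)     # step B2: second voice signal
```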
In this embodiment, to help the user perform the task combination operation quickly and accurately, related tasks may be displayed differently from other tasks to guide the combination operation. The related tasks may be: same-application tasks, tasks of the same application type, associated tasks, etc.
In detail, for the distinguishing display of the same application task:
in one embodiment of the present disclosure, the at least two tasks include at least two first tasks corresponding to a first application.
Correspondingly, in the step a1, the displaying the at least two tasks includes: and displaying each first task according to a first display attribute, and displaying other tasks according to other display attributes.
For example, the display attribute may be a color attribute, a guideline attribute, etc.
Taking the color attribute as an example, since each first task is the same application task, assuming that the first display attribute is displayed in red, part or all of the task display area of each first task is displayed in red, and any area of other tasks cannot be displayed in red, so that the first tasks are highlighted in red. Therefore, the user can visually and quickly check the first tasks, and then the tasks can be accurately combined according to needs.
In detail, the differentiation of tasks of the same application type shows:
in one embodiment of the present disclosure, the at least two tasks include a first task and a second task, and a first application corresponding to the first task and a second application corresponding to the second task have the same application type.
Correspondingly, in the step a1, the displaying the at least two tasks includes: and displaying the first task and the second task according to a second display attribute, and displaying other tasks according to other display attributes.
In detail, the differentiation for the associated tasks shows:
in one embodiment of the present disclosure, the at least two tasks include an associated third task and a fourth task.
Correspondingly, in the step a1, the displaying the at least two tasks includes: and displaying the third task and the fourth task according to a third display attribute, and displaying other tasks according to other display attributes.
Based on the above, by giving related tasks that the user may wish to combine the same type of differentiated display, this embodiment makes it convenient for the user to spot the related tasks visually and quickly, and then combine them.
Based on the above, in an embodiment of the present disclosure, if one task has a plurality of display attributes, the display area of the task may be partitioned to display the plurality of display attributes respectively through different partitions.
For example, the user utters voice 5: open the camera and take a picture, next open instant chat application 1 and send the latest photo to family member A, next send the latest photo to friend B. Voice 5 then comprises the following three tasks:
Task 5.1: open the camera and take a picture, corresponding to the camera application;
Task 5.2: open instant chat application 1 and send the latest photo to family member A, corresponding to instant chat application 1;
Task 5.3: open instant chat application 1 and send the latest photo to friend B, also corresponding to instant chat application 1.
It can be seen that task 5.2 and task 5.3 are the same application task, and both tasks are associated with task 5.1, then in the display area of task 5.1, two blocks can be divided to display red and green respectively, in the display area of task 5.2, two blocks can be divided to display red and yellow respectively, and in the display area of task 5.3, two blocks can be divided to display yellow and green respectively.
The red color is used for distinguishing and displaying two associated tasks of the task 5.1 and the task 5.2 from other tasks, the green color is used for distinguishing and displaying two associated tasks of the task 5.1 and the task 5.3 from other tasks, and the yellow color is used for distinguishing and displaying two tasks of the same application and the task 5.2 and the task 5.3 from other tasks.
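A sketch of the partitioned display just described: each relation among tasks gets one color, and a task's display area is divided into one block per relation it participates in. The colors follow the voice-5 example; the data structure is an assumption and the actual rendering is left out.

```python
# relation label -> (color, member tasks); per the voice-5 example above.
relations = {
    "task 5.1 associated with task 5.2": ("red",    {"task 5.1", "task 5.2"}),
    "task 5.1 associated with task 5.3": ("green",  {"task 5.1", "task 5.3"}),
    "task 5.2 same application as 5.3":  ("yellow", {"task 5.2", "task 5.3"}),
}

def display_blocks(task: str) -> list[str]:
    # One colored block per relation this task belongs to.
    return [color for color, members in relations.values() if task in members]

for task in ("task 5.1", "task 5.2", "task 5.3"):
    print(task, "->", display_blocks(task))
# task 5.1 -> ['red', 'green']; task 5.2 -> ['red', 'yellow'];
# task 5.3 -> ['green', 'yellow']
```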
Therefore, the multitask voice processing mode provided by this embodiment can help users quickly fulfill their personalized requirements. In addition, on top of this processing mode, the embodiment can also make personalized recommendations to the user, further improving the user experience.
Therefore, in an embodiment of the present disclosure, after the performing each of the at least two tasks, the method may further include the following steps C1 to C3:
and step C1, storing the execution information of each task, wherein the execution information comprises the scene information of the scene where the electronic equipment executes the task.
In this embodiment, the scene in which the electronic device executes the task may be a vehicle-mounted scene, a home scene, a subway riding scene, and the like.
And step C2, acquiring target scene information of the scene where the electronic equipment is currently located.
In this step, the current scene of the electronic device is determined. Therefore, the recommendation information in the current scene can be determined according to the task execution history information in the same scene.
The task execution history information may be the task execution history information of the user, the task execution history information of other users, or a combination of the two types of information.
And step C3, displaying recommendation information on a recommendation page of the electronic equipment according to the target scene information and the execution information, wherein the recommendation information comprises at least one of a task and an application corresponding to the task.
In detail, for the case of recommending a task to a user:
In a feasible implementation, the executed tasks that meet the limiting conditions (such as scene information, user information, and time-interval information) can be retrieved, accumulated, and sorted according to their usage frequency, and at least one of the highest-ranked tasks can be displayed to the user as recommendation information.
In this embodiment, the tasks in each dimension may be accumulated and sorted according to their execution counts, and at least one of the top-ranked tasks (for example, the top 10) may be recommended to the user.
Thus, the user only needs to click on any task in the recommendation information for the voice assistant to execute it, so the user does not need to tell the voice assistant the task to be executed each time. This is especially suitable for scenarios in which the user cannot speak loudly and for frequently used scenarios.
In detail, the voice assistant may determine the recommendation information from at least the following dimensions: the tasks the user executes most frequently via the voice assistant, the tasks the user has used most often in the current month, and the tasks other users have used most often in the current month. Based on this, a recommendation page that recommends tasks to the user may be as shown in fig. 14.
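A sketch of steps C1 to C3 under assumed in-memory storage: executed tasks are logged with the scene in which they ran, and when the device detects the same scene again, the most frequently executed tasks for that scene are surfaced. Scene detection and persistence are outside the sketch; the example scene and tasks are illustrative.

```python
from collections import Counter

execution_log: list[tuple[str, str]] = []    # (scene information, task) pairs

def record_execution(scene: str, task: str) -> None:
    execution_log.append((scene, task))      # step C1: store execution information

def recommend(current_scene: str, top_n: int = 10) -> list[str]:
    # Steps C2/C3: filter by the target scene, rank tasks by execution count.
    counts = Counter(task for scene, task in execution_log if scene == current_scene)
    return [task for task, _ in counts.most_common(top_n)]

record_execution("vehicle-mounted", "open the map application and navigate home")
record_execution("vehicle-mounted", "open the music application and play a playlist")
record_execution("vehicle-mounted", "open the map application and navigate home")
print(recommend("vehicle-mounted", top_n=3))
```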
In detail, for the case of recommending an application to a user:
In a feasible implementation, the voice assistant can derive the usage frequency of each application from the usage frequency of the tasks and the application identifiers corresponding to the tasks, accumulate and sort the applications accordingly, and display at least one of the top-ranked applications as recommendation information to the user.
In this embodiment, the voice assistant may record the applications that the user opens for use in each scenario. Thus, when the user is again in this scenario, the top three applications may be recommended for the user (e.g., based on the application being used frequently every month).
In addition, if none of the recommended information is what the user currently wants, the voice assistant may further recommend the top three applications most frequently used by other users in the current scene.
In detail, for the case of recommending an application and task operation items in the application to a user:
in this embodiment, the voice assistant may record the application opened and used by the user in each scene and the user operation of the user in the application. Thus, when the user is in the scene again, the top three applications can be recommended for the user, and the task operation items with the highest use frequency in the applications can be reproduced.
In addition, if the task operation item with the highest use frequency in the application is different from the latest task operation item, the latest task operation item can be prompted in a bubble mode. Therefore, the user can reproduce the latest task operation item by clicking the bubble.
Based on this, in one embodiment of the present disclosure, for an application recommended to a user, if a user inputs a confirmation operation for the recommended application, each application recommended to the user may be displayed in blocks on a recommendation page of the electronic device in response to the confirmation operation, and a task operation item with the highest historical use frequency in the application may be displayed, and/or a task operation item with the latest operation in the application may be displayed.
In addition, if none of the recommendation information is what the user currently wants, the voice assistant may further recommend the top three applications, together with the task operation items in those applications, most frequently used by other users in the current scene.
Based on this, in an embodiment of the present disclosure, information may be recommended to a user according to task execution history information of the user, and if the user indicates that the user is not interested in the recommended information, information may be recommended to the user further according to task execution history information of other users.
For example, assume the three applications the user uses most each month are application X, application Y, and application Z. The user's most frequently used task operation item each month in application X is viewing the trip page, and the task operation item last used in application X is opening the fund interface; the most frequently used task operation item each month in application Y is chatting with xx, and the one last used in application Y is punching a card; the most frequently used task operation item each month in application Z is chatting with xx, and the one last used in application Z is opening a sharing page.
Thus, as shown in fig. 10, icons of application X, application Y, and application Z may be displayed on the screen lock interface of the user smartphone, or as shown in fig. 11 or 12, application X, application Y, and application Z may be displayed on the interface after the user unlocks the screen, and the highest-frequency operation and the last operation under each application may be displayed, or as shown in fig. 13, icons of application X, application Y, and application Z may be displayed on the lower side of the page of the display page where the user opens application W.
Based on the display content of fig. 10, if the user does not need the currently recommended applications, the user can swipe an application upward to remove it from the screen; if the user needs a currently recommended application, the user can unlock the screen. Upon unlocking, the voice assistant may open the three applications in split screen.
As shown in fig. 11 or 12, to help the user focus, application X, with the highest monthly usage frequency, may be opened on one half of the screen, and the other two applications may be opened on the other half. The user can also drag an application to swap screen positions; for example, application Z can replace application X on the upper half of the screen, with application X moving to the lower half.
Referring to fig. 12, the page shown for each recommended application may be the page of the task operation item with the highest usage frequency in that application, while the latest task operation item within the application is indicated by a bubble in the upper left corner of the corresponding page. If the user does not want the currently recommended page, the bubble can be clicked to switch the displayed page to the page of the user's last operation in the application.
In other embodiments of the present disclosure, different scenarios may not be distinguished, for example, recommendation information may be determined directly according to history information of task execution.
Based on the above contents, the voice processing method provided by the embodiment can process multitask voice, and based on the realization of the voice processing method, the voice assistant of the smart phone can be more and more humanized and personalized, so that the smart phone becomes a good helper for the work and life of people.
It should be noted that, in the voice processing method provided in the embodiment of the present application, the execution subject may be a voice processing apparatus, or a control module in the voice processing apparatus for executing the voice processing method. In the embodiment of the present application, a voice processing apparatus executing the voice processing method is taken as an example to describe the voice processing apparatus provided in the embodiment of the present application.
As shown in fig. 15, the speech processing apparatus 200 provided in this embodiment may include a first receiving module 210, a first processing module 220, and an executing module 230.
The first receiving module 210 is configured to receive a first voice signal input by a user. The first processing module 220 is configured to obtain at least two tasks corresponding to the first voice signal according to the task separation identifier included in the first voice signal. The execution module 230 is configured to execute each of the at least two tasks.
In this embodiment, each task that a user utters through one voice can be separated by the set task separation identifier, so that one user voice can correspond to a plurality of tasks, and the plurality of tasks can be identified and executed. Therefore, the user can send out a plurality of tasks by waking up the voice assistant once, the voice assistant does not need to be waken up repeatedly, and the user experience is better.
In one embodiment of the present disclosure, the at least two tasks include at least two first tasks corresponding to a first application. The executing module 230 is configured to execute each of the first tasks in the first application respectively.
In one embodiment of the present disclosure, the at least two tasks include a first task corresponding to a first application and a second task corresponding to a second application. The executing module 230 is configured to execute the first task in the first application and execute the second task in the second application.
In an embodiment of the present disclosure, the speech processing apparatus 200 further includes a display processing module, a second receiving module, and a second processing module. The display processing module is used for arranging and displaying the at least two tasks on a task page of the electronic equipment according to the sequence position relation of the at least two tasks in the first voice signal; and adjusting the sequence of the at least two tasks on the task page in response to the input operation. And the second receiving module is used for receiving the input operation of the user on the tasks arranged and displayed on the task page. The second processing module is used for monitoring whether the current time reaches a set target time; and under the condition that the current time reaches the target time, determining the execution sequence of the at least two tasks according to the sequence of the at least two tasks on the task page. The executing module 230 is configured to execute each task of the at least two tasks according to the executing sequence.
In one embodiment of the present disclosure, the input operation includes a task combination operation for a part of the at least two tasks. The display processing module is used for responding to the task combination operation and combining the partial tasks to obtain corresponding combined tasks; and adjusting the sequence of the combined task on the task page to be prior to the sequence of the non-combined task on the task page.
In an embodiment of the present disclosure, the speech processing apparatus 200 further includes: a first storage module. The first storage module is used for storing the combined task and the combined name thereof after the display processing module combines the partial tasks to obtain the corresponding combined task. The first receiving module 210 is configured to receive a second voice signal input by a user. The execution module 230 is configured to execute each task of the combined tasks if the second speech signal includes the combined name.
In one embodiment of the present disclosure, the at least two tasks include at least two first tasks corresponding to a first application. The display processing module is used for displaying each first task according to a first display attribute and displaying other tasks according to other display attributes.
In one embodiment of the present disclosure, the at least two tasks include a first task and a second task, and a first application corresponding to the first task and a second application corresponding to the second task have the same application type. And the display processing module is used for displaying the first task and the second task according to a second display attribute and displaying other tasks according to other display attributes.
In one embodiment of the present disclosure, the at least two tasks include an associated third task and a fourth task. And the display processing module is used for displaying the third task and the fourth task according to a third display attribute and displaying other tasks according to other display attributes.
In one embodiment of the present disclosure, the task includes a task operation item and an application identifier. The first processing module 220 includes: a third processing module and a fourth processing module. The third processing module is used for obtaining at least two pieces of voice information separated by the task separation identification contained in the first voice signal according to the task separation identification contained in the first voice signal; wherein the at least two voice messages correspond to the at least two tasks one to one. The fourth processing module is used for obtaining a corresponding task according to the first voice information under the condition that the first voice information contains task operation items and an application identifier; under the condition that the first voice message contains task operation items and does not contain an application identifier, obtaining a task corresponding to the first voice message according to the task operation items contained in the first voice message and the application identifier contained in the second voice message; the second voice information is the prior voice information which contains the application identification in the first voice signal and is closest to the first voice information.
In an embodiment of the present disclosure, the speech processing apparatus 200 further includes a second storage module, an acquisition module, and a display processing module. The second storage module is configured to store, after the execution module 230 executes each of the at least two tasks, execution information of each task, where the execution information includes scene information of the scene in which the electronic device executed the task. The acquisition module is configured to acquire target scene information of the scene in which the electronic device is currently located. The display processing module is configured to display recommendation information on a recommendation page of the electronic device according to the target scene information and the execution information, where the recommendation information includes at least one of a task and an application corresponding to the task. This scene-matched recommendation is sketched below.
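A sketch of the scene-matched recommendation, again reusing the hypothetical Task type; the ExecutionRecord type, the plain-string scene, and the most-recent-first ordering are assumptions for illustration.

```kotlin
data class ExecutionRecord(val task: Task, val scene: String)

class Recommender {
    private val history = mutableListOf<ExecutionRecord>()

    // Stored after each task is executed, together with the scene in which
    // the electronic device executed it.
    fun record(task: Task, scene: String) {
        history.add(ExecutionRecord(task, scene))
    }

    // Tasks previously executed in the current scene, most recent first; the
    // recommendation page could show these tasks or their applications.
    fun recommend(currentScene: String): List<Task> =
        history.filter { it.scene == currentScene }.map { it.task }.reversed()
}
```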
The voice processing device in the embodiment of the present application may be a device, or may be a component, an integrated circuit, or a chip in a terminal. The device may be a mobile electronic device or a non-mobile electronic device. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA), and the non-mobile electronic device may be a server, a network attached storage (NAS) device, a personal computer (PC), a television (TV), a teller machine, or a self-service machine; the embodiments of the present application are not specifically limited in this respect.

The speech processing apparatus in the embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system; the embodiments of the present application are not specifically limited in this respect.
The speech processing apparatus provided in the embodiment of the present application can implement each process of the method embodiment of fig. 1; details are not described here again to avoid repetition.

Optionally, as shown in fig. 16, an embodiment of the present application further provides an electronic device 300, including a processor 310, a memory 320, and a program or instruction stored in the memory 320 and executable on the processor 310. When executed by the processor 310, the program or instruction implements each process of the foregoing speech processing method embodiment and achieves the same technical effect; details are not described here again to avoid repetition.
It should be noted that the electronic device in the embodiment of the present application includes the mobile electronic device and the non-mobile electronic device described above.
Fig. 17 is a schematic diagram of the hardware structure of an electronic device 1000 implementing an embodiment of the present application.
The electronic device 1000 includes, but is not limited to: a radio frequency unit 1001, a network module 1002, an audio output unit 1003, an input unit 1004, a sensor 1005, a display unit 1006, a user input unit 1007, an interface unit 1008, a memory 1009, and a processor 1010.
Those skilled in the art will appreciate that the electronic device 1000 may further include a power source (e.g., a battery) for supplying power to the various components; the power source may be logically connected to the processor 1010 through a power management system, which manages charging, discharging, and power consumption. The electronic device structure shown in fig. 17 does not limit the electronic device, which may include more or fewer components than shown, combine certain components, or arrange the components differently; details are not repeated here.
The processor 1010 is configured to receive a first voice signal input by a user; obtain at least two tasks corresponding to the first voice signal according to a task separation identifier contained in the first voice signal; and execute each of the at least two tasks.

In this embodiment, the tasks that a user utters in a single voice input can be separated by the set task separation identifier, so that one utterance can correspond to a plurality of tasks, all of which can be recognized and executed. The user can therefore issue a plurality of tasks by waking up the voice assistant once, without waking it repeatedly, which improves the user experience.
Optionally, the at least two tasks include at least two first tasks corresponding to a first application. The processor 1010 is configured to execute each of the first tasks in the first application.

Optionally, the at least two tasks include a first task corresponding to a first application and a second task corresponding to a second application. The processor 1010 is configured to execute the first task in the first application and the second task in the second application. Both execution cases are sketched below.
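A sketch covering both execution cases: consecutive tasks for one application run in that application in turn, and a task for a different application switches applications first. The openApplication and performInApp callbacks are assumptions standing in for platform-specific application launching and control.

```kotlin
fun dispatch(
    tasks: List<Task>,
    openApplication: (String) -> Unit,
    performInApp: (appId: String, operation: String) -> Unit
) {
    var currentApp: String? = null
    for (task in tasks) {
        if (task.appId != currentApp) {   // switch applications only when needed
            openApplication(task.appId)
            currentApp = task.appId
        }
        performInApp(task.appId, task.operation)
    }
}
```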
Optionally, the processor 1010 is configured to, before executing each of the at least two tasks, arrange and display the at least two tasks on a task page of the electronic device according to the order in which they appear in the first voice signal; receive an input operation performed by the user on the tasks displayed on the task page; adjust the ordering of the at least two tasks on the task page in response to the input operation; monitor whether the current time reaches a set target time; when the current time reaches the target time, determine the execution order of the at least two tasks according to their ordering on the task page; and execute each of the at least two tasks according to the execution order.
Optionally, the input operation includes a task combination operation for a subset of the at least two tasks. The processor 1010 is configured to combine the selected tasks into a corresponding combined task in response to the task combination operation, and to adjust the ordering so that the combined task precedes the uncombined tasks on the task page.

Optionally, the processor 1010 is configured to store the combined task and its combined name after the selected tasks are combined; receive a second voice signal input by the user; and, if the second voice signal includes the combined name, execute each task in the combined task.
Optionally, the at least two tasks include at least two first tasks corresponding to a first application. The processor 1010 is configured to display each first task according to a first display attribute and to display the other tasks according to other display attributes.

Optionally, the at least two tasks include a first task and a second task, where the first application corresponding to the first task and the second application corresponding to the second task have the same application type. The processor 1010 is configured to display the first task and the second task according to a second display attribute and to display the other tasks according to other display attributes.

Optionally, the at least two tasks include a third task and a fourth task that are associated with each other. The processor 1010 is configured to display the third task and the fourth task according to a third display attribute and to display the other tasks according to other display attributes.
Optionally, each task includes a task operation item and an application identifier. The processor 1010 is configured to obtain, according to the task separation identifier contained in the first voice signal, at least two pieces of voice information separated by that identifier, where the at least two pieces of voice information correspond one-to-one to the at least two tasks; obtain the corresponding task directly from first voice information when the first voice information contains both a task operation item and an application identifier; and, when the first voice information contains a task operation item but no application identifier, obtain the task corresponding to the first voice information from the task operation item it contains and the application identifier contained in second voice information, where the second voice information is the voice information that precedes the first voice information in the first voice signal, contains an application identifier, and is closest to the first voice information.
Optionally, the processor 1010 is configured to store, after each of the at least two tasks is executed, execution information of each task, where the execution information includes scene information of the scene in which the electronic device executed the task; acquire target scene information of the scene in which the electronic device is currently located; and display recommendation information on a recommendation page of the electronic device according to the target scene information and the execution information, where the recommendation information includes at least one of a task and an application corresponding to the task.
It should be understood that, in the embodiment of the present application, the input unit 1004 may include a graphics processing unit (GPU) 10041 and a microphone 10042; the graphics processing unit 10041 processes image data of still pictures or videos obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode. The display unit 1006 may include a display panel 10061, which may be configured as a liquid crystal display, an organic light-emitting diode display, or the like. The user input unit 1007 includes a touch panel 10071, also referred to as a touch screen, and other input devices 10072. The touch panel 10071 may include two parts: a touch detection device and a touch controller. The other input devices 10072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which are not described in detail here. The memory 1009 may be used to store software programs and various data, including but not limited to application programs and an operating system. The processor 1010 may integrate an application processor, which primarily handles the operating system, user interface, and applications, and a modem processor, which primarily handles wireless communication. It will be appreciated that the modem processor may alternatively not be integrated into the processor 1010.
An embodiment of the present application further provides a readable storage medium on which a program or instruction is stored. When executed by a processor, the program or instruction implements each process of the foregoing speech processing method embodiment and achieves the same technical effect; details are not repeated here to avoid repetition.

The processor is the processor in the electronic device described in the foregoing embodiment. The readable storage medium includes a computer-readable storage medium, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

An embodiment of the present application further provides a chip, which includes a processor and a communication interface coupled to the processor. The processor is configured to run a program or instruction to implement each process of the foregoing voice processing method embodiment and achieve the same technical effect; details are not repeated here to avoid repetition.

It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-on-chip, a system chip, or a chip system.
It should be noted that, in this document, the terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed; the functions may be performed substantially simultaneously or in reverse order depending on the functions involved, the methods described may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on such understanding, the technical solutions of the present application may be embodied in the form of a computer software product stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk), including instructions for enabling a terminal (such as a mobile phone, a computer, a server, or a network device) to execute the methods of the embodiments of the present application.
While the embodiments of the present application have been described with reference to the accompanying drawings, the application is not limited to the specific embodiments described above, which are illustrative rather than restrictive. Those skilled in the art may make various changes in light of this disclosure without departing from the spirit and scope of the application as defined by the appended claims.

Claims (13)

1. A method of speech processing, comprising:
receiving a first voice signal input by a user;
obtaining at least two tasks corresponding to the first voice signal according to a task separation identifier contained in the first voice signal; and
executing each of the at least two tasks.
2. The method of claim 1, wherein the at least two tasks include at least two first tasks corresponding to a first application;
the executing each of the at least two tasks comprises: executing each of the first tasks in the first application, respectively;
and/or,
the at least two tasks include a first task corresponding to a first application and a second task corresponding to a second application;
the executing each of the at least two tasks comprises: executing the first task in the first application, and executing the second task in the second application.
3. The method of claim 1, wherein prior to said executing each of said at least two tasks, said method further comprises:
arranging and displaying the at least two tasks on a task page of the electronic device according to the order in which the at least two tasks appear in the first voice signal;
receiving an input operation performed by a user on the tasks displayed on the task page;
in response to the input operation, adjusting the ordering of the at least two tasks on the task page;
monitoring whether the current time reaches a set target time;
when the current time reaches the target time, determining an execution order of the at least two tasks according to the ordering of the at least two tasks on the task page;
the executing each of the at least two tasks comprises: executing each of the at least two tasks according to the execution order.
4. The method of claim 3, wherein the input operation comprises a task combination operation for a subset of the at least two tasks;
the adjusting the ordering of the at least two tasks on the task page in response to the input operation comprises:
combining the selected tasks into a corresponding combined task in response to the task combination operation; and
adjusting the ordering so that the combined task precedes the uncombined tasks on the task page.
5. The method of claim 4, wherein after the combining of the selected tasks into the corresponding combined task, the method further comprises:
storing the combined task and its combined name;
receiving a second voice signal input by a user;
in the event that the second voice signal includes the combined name, executing each task in the combined task.
6. The method of claim 1, wherein each task comprises a task operation item and an application identifier;
the obtaining of at least two tasks corresponding to the first voice signal according to the task separation identifier included in the first voice signal includes:
obtaining, according to the task separation identifier contained in the first voice signal, at least two pieces of voice information separated by the task separation identifier, wherein the at least two pieces of voice information correspond one-to-one to the at least two tasks;
in the case that first voice information contains a task operation item and an application identifier, obtaining the corresponding task according to the first voice information;
in the case that the first voice information contains a task operation item and does not contain an application identifier, obtaining the task corresponding to the first voice information according to the task operation item contained in the first voice information and the application identifier contained in second voice information, wherein the second voice information is the voice information that precedes the first voice information in the first voice signal, contains an application identifier, and is closest to the first voice information.
7. A speech processing apparatus, comprising:
the first receiving module is used for receiving a first voice signal input by a user;
the first processing module is used for obtaining at least two tasks corresponding to the first voice signal according to a task separation identifier contained in the first voice signal; and
an execution module for executing each of the at least two tasks.
8. The speech processing apparatus according to claim 7, wherein the at least two tasks include at least two first tasks corresponding to a first application;
the execution module is configured to execute each of the first tasks in the first application, respectively;
and/or,
the at least two tasks include a first task corresponding to a first application and a second task corresponding to a second application;
the execution module is configured to execute the first task in the first application and execute the second task in the second application.
9. The speech processing apparatus according to claim 7, wherein the speech processing apparatus further comprises a display processing module, a second receiving module, and a second processing module;
the display processing module is configured to arrange and display the at least two tasks on a task page of the electronic device according to the order in which the at least two tasks appear in the first voice signal, and to adjust the ordering of the at least two tasks on the task page in response to the input operation;
the second receiving module is configured to receive the input operation performed by the user on the tasks displayed on the task page;
the second processing module is configured to monitor whether the current time reaches a set target time and, when the current time reaches the target time, to determine an execution order of the at least two tasks according to the ordering of the at least two tasks on the task page;
the execution module is configured to execute each of the at least two tasks according to the execution order.
10. The speech processing apparatus according to claim 9, wherein the input operation comprises a task combination operation for a subset of the at least two tasks;
the display processing module is configured to combine the selected tasks into a corresponding combined task in response to the task combination operation, and to adjust the ordering so that the combined task precedes the uncombined tasks on the task page.
11. The speech processing apparatus according to claim 10, wherein the speech processing apparatus further comprises: a first storage module;
the first storage module is configured to store the combined task and its combined name after the display processing module combines the selected tasks into the corresponding combined task;
the first receiving module is used for receiving a second voice signal input by a user;
the execution module is configured to execute each task in the combined task if the second voice signal includes the combined name.
12. The speech processing apparatus according to claim 7, wherein each task comprises a task operation item and an application identifier;
the first processing module comprises:
the third processing module is configured to obtain, according to the task separation identifier contained in the first voice signal, at least two pieces of voice information separated by the task separation identifier, wherein the at least two pieces of voice information correspond one-to-one to the at least two tasks;
the fourth processing module is configured to obtain the corresponding task according to first voice information in the case that the first voice information contains a task operation item and an application identifier, and, in the case that the first voice information contains a task operation item and does not contain an application identifier, to obtain the task corresponding to the first voice information according to the task operation item contained in the first voice information and the application identifier contained in second voice information, wherein the second voice information is the voice information that precedes the first voice information in the first voice signal, contains an application identifier, and is closest to the first voice information.
13. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, the program or instructions when executed by the processor implementing the steps of the speech processing method according to any of claims 1-6.
CN202110404559.0A 2021-04-14 2021-04-14 Voice processing method and device and electronic equipment Pending CN113192490A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110404559.0A CN113192490A (en) 2021-04-14 2021-04-14 Voice processing method and device and electronic equipment


Publications (1)

Publication Number Publication Date
CN113192490A true CN113192490A (en) 2021-07-30

Family

ID=76977014

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110404559.0A Pending CN113192490A (en) 2021-04-14 2021-04-14 Voice processing method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113192490A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103474068A (en) * 2013-08-19 2013-12-25 安徽科大讯飞信息科技股份有限公司 Method, equipment and system for implementing voice command control
CN106023991A (en) * 2016-05-23 2016-10-12 丽水学院 Handheld voice interaction device and interaction method orienting to multi-task interaction
CN107293295A (en) * 2017-06-09 2017-10-24 北京小蓦机器人技术有限公司 A kind of method, apparatus and system of task corresponding to execution natural language instructions
CN107450800A (en) * 2017-07-25 2017-12-08 维沃移动通信有限公司 A kind of task method to set up, mobile terminal and computer-readable recording medium
WO2018099000A1 (en) * 2016-12-01 2018-06-07 中兴通讯股份有限公司 Voice input processing method, terminal and network server
US20180315422A1 (en) * 2015-10-30 2018-11-01 Baro Service Co.,Ltd. Voice keyword-based multifunctional service system
US20190012198A1 (en) * 2017-07-07 2019-01-10 Google Inc. Invoking an automated assistant to perform multiple tasks through an individual command
US20190066677A1 (en) * 2017-08-22 2019-02-28 Samsung Electronics Co., Ltd. Voice data processing method and electronic device supporting the same
CN109656512A (en) * 2018-12-20 2019-04-19 Oppo广东移动通信有限公司 Exchange method, device, storage medium and terminal based on voice assistant
CN109658926A (en) * 2018-11-28 2019-04-19 维沃移动通信有限公司 A kind of update method and mobile terminal of phonetic order
KR20190064314A (en) * 2017-11-30 2019-06-10 삼성에스디에스 주식회사 Method for processing a dialog task for an intelligent dialog agent and apparatus thereof
CN110853645A (en) * 2019-12-02 2020-02-28 三星电子(中国)研发中心 Method and device for recognizing voice command



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination