CN111081236A - Voice processing method, terminal and computer storage medium - Google Patents

Voice processing method, terminal and computer storage medium

Info

Publication number
CN111081236A
CN111081236A (application CN201811228875.1A)
Authority
CN
China
Prior art keywords
preset
information
semantic
recognized
matched
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811228875.1A
Other languages
Chinese (zh)
Inventor
张小康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Priority to CN201811228875.1A
Publication of CN111081236A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/22 - Interactive procedures; Man-machine interfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiment of the invention discloses a voice processing method applied to a terminal, comprising the following steps: acquiring first voice information to be recognized, and performing semantic recognition on it to obtain a first semantic; when the first semantic matches a first preset scope, outputting simplified prompt information, where the duration corresponding to the simplified prompt information is less than a specific duration, and the information in the first preset scope characterizes the operations executable by the terminal. The embodiment of the invention also discloses a terminal and a computer-readable storage medium.

Description

Voice processing method, terminal and computer storage medium
Technical Field
The present invention relates to voice information recognition technology in the field of communications, and in particular, to a voice processing method, a terminal, and a computer storage medium.
Background
In the existing voice human-computer interaction process, a machine takes the user's question as the starting point, computes the most plausible response, and gives the user corresponding voice-broadcast feedback based on that response. By this principle, the answer is in fact a fixed result produced by a dynamic traversal lookup: whenever a user asks the machine a question, the machine always matches it to an answer it considers reasonable.
However, once human-computer interaction reaches a certain level of proficiency, the machine's feedback becomes partly redundant. For example, the user says "I want to go to a certain place", and after recognition the terminal plays voice prompt information asking the user how to get there, for example whether to drive, take a bus, or take the subway. When the user says "I want to go to a certain place" the next time, or the Nth time, the voice prompt information given by the terminal is still the same fixed, time-consuming voice information; the user perceives the terminal as slow to use, so the working efficiency of the terminal is low.
Disclosure of Invention
In order to solve the above technical problems, embodiments of the present invention are expected to provide a voice processing method, a terminal, and a computer storage medium, so as to address the time-consuming operation flow in human-computer interaction technology and improve the working efficiency of the terminal.
The technical scheme of the invention is realized as follows:
a voice processing method is applied to a terminal, and comprises the following steps:
acquiring first voice information to be recognized, and performing semantic recognition on the first voice information to be recognized to obtain first semantics;
when the first semantic matches a first preset scope, outputting simplified prompt information; the duration corresponding to the simplified prompt information is less than a specific duration, and the information in the first preset scope characterizes the operations executable by the terminal.
A terminal, the terminal comprising: a processor, a memory, and a communication bus;
the communication bus is used for realizing communication connection between the processor and the memory;
the processor is configured to execute a program in memory that operates on speech information to implement the steps of:
acquiring first voice information to be recognized, and performing semantic recognition on the first voice information to be recognized to obtain first semantics;
when the first semantic matches a first preset scope, outputting simplified prompt information; the duration corresponding to the simplified prompt information is less than a specific duration, and the information in the first preset scope characterizes the operations executable by the terminal.
A computer readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps of the speech processing method described above.
The voice processing method, the terminal, and the computer storage medium provided by the embodiments of the present invention acquire the first voice information to be recognized, perform semantic recognition on it to obtain the first semantic, and output the simplified prompt information when the first semantic matches the first preset scope, where the duration corresponding to the simplified prompt information is less than the specific duration and the information in the first preset scope characterizes the operations executable by the terminal.
Drawings
Fig. 1 is a schematic flow chart of a speech processing method according to an embodiment of the present invention;
FIG. 2 is a flow chart of another speech processing method according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating another speech processing method according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a speech processing method according to another embodiment of the present invention;
fig. 5 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
An embodiment of the present invention provides a speech processing method, which is shown in fig. 1 and includes the following steps:
step 101, obtaining first voice information to be recognized, and performing semantic recognition on the first voice information to be recognized to obtain a first semantic meaning.
It should be noted that step 101, acquiring the first voice information to be recognized and performing semantic recognition on it to obtain the first semantic, may be implemented by a terminal. The terminal may be a device capable of human-computer interaction with a user; in one possible implementation, it may be a mobile terminal. The first voice information to be recognized may be sent to the terminal when the user needs to perform a human-computer interaction operation with the terminal, and the first semantic may be obtained by the terminal applying a semantic recognition technology to the first voice information to be recognized that the user sent.
And 102, outputting simplified prompt information when the first semantic meaning is matched with the first preset scope.
The duration corresponding to the simplified prompt information is less than a specific duration, and the information in the first preset scope characterizes the operations executable by the terminal.
It should be noted that step 102, outputting the simplified prompt information when the first semantic matches the first preset scope, may be implemented by the terminal; the first preset scope may be generated in advance and stored in the terminal. In one possible implementation, the first preset scope may be generated from the various operations performed by the terminal during the user's history of using the terminal. The first preset scope may include words characterizing the operations that the terminal can perform.
The simplified prompt information is no longer the initially set, time-consuming voice information; it may be any form of prompt that takes a short time to output. For example, it may be a simplified version of the initially set voice information, that is, voice information whose content is more concise than the initially set voice information; of course, it may also be directly set as concise voice information unrelated to the initially set voice information. In one possible implementation, the simplified prompt information may include cues that take a short time to output, such as a beep, a vibration, a drip tone, or a click. The specific duration may be preset, according to the user's experience, as a duration the user finds acceptable and comfortable; in any case, the specific duration should be shorter than the duration of outputting the voice prompt information in the related art.
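As a minimal, hypothetical sketch of steps 101 to 102, the flow can be pictured as: recognize the utterance, check the recognized semantic against the preset scope, and output a short cue instead of the long voice prompt on a match. The scope contents, the `recognize_semantics` heuristic, and the prompt strings below are all assumptions, not the patent's actual implementation.

```python
# Hypothetical sketch of steps 101-102; names and scope contents are assumptions.

FIRST_PRESET_SCOPE = {"navigate", "call", "play_music"}  # operations the terminal can execute
SHORT_CUE = "beep"  # simplified prompt: shorter than the specific duration
FULL_PROMPT = "How would you like to go? Drive, bus, or subway?"  # long-form prompt

def recognize_semantics(voice_info: str) -> str:
    """Placeholder semantic recognition: map an utterance to an intent label."""
    if "go to" in voice_info:
        return "navigate"
    return "unknown"

def handle_voice(voice_info: str) -> str:
    first_semantic = recognize_semantics(voice_info)  # step 101
    if first_semantic in FIRST_PRESET_SCOPE:          # step 102: semantic matches the scope
        return SHORT_CUE
    return FULL_PROMPT
```

Calling `handle_voice("I want to go to the square")` would return the short cue, while an utterance outside the scope falls through to the full prompt.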
The voice processing method provided by the embodiment of the present invention acquires the first voice information to be recognized, performs semantic recognition on it to obtain the first semantic, and outputs the simplified prompt information if the first semantic matches the first preset scope, where the duration corresponding to the simplified prompt information is less than the specific duration and the information in the first preset scope characterizes the operations executable by the terminal.
Based on the foregoing embodiments, an embodiment of the present invention provides a speech processing method, which is applied in a terminal, and as shown in fig. 2, the method includes the following steps:
step 201, a terminal acquires first voice information to be recognized, and performs semantic recognition on the first voice information to be recognized to obtain a first semantic meaning.
Step 202, the terminal detects whether the proficiency of the first semantic meets the preset proficiency.
The proficiency of the first semantic refers to the success rate with which the terminal can complete one full operation for the first semantic, that is, the probability that the first semantic matches the first preset scope; if the probability that the first semantic matches the first preset scope meets a preset probability, the proficiency of the first semantic meets the preset proficiency.
It should be noted that, the step 202 of detecting whether the proficiency of the first semantic meets the preset proficiency may be implemented by:
a1, acquiring historical identification information of a target semantic matched with the first semantic;
the target semantics refers to the semantics which are identified by the terminal in the history use process and are the same as or similar to the content of the first semantics. The history identification information refers to information for identification of a target semantic identical or similar to the content of the first semantic by the terminal during history use.
a2, determining whether the proficiency of the first semantic meets the preset proficiency based on the historical identification information of the target semantic.
The preset proficiency may be expressed as a preset probability that the first semantic is included in the first preset scope.
And 203, if the proficiency of the first semantic meets the preset proficiency, the terminal detects whether the first semantic is matched with the first preset scope.
Detecting whether the first semantic matches the first preset scope may be implemented by detecting whether the first semantic matches the information included in the first preset scope. The first preset scope contains a set of words that can characterize the processing operations for voice requests initiated by the user.
And step 204, when the first semantic meaning is matched with the first preset scope, the terminal outputs simplified prompt information.
The duration corresponding to the simplified prompt information is less than a specific duration; the information in the first preset scope characterizes the operations executable by the terminal.
And step 205, if the first semantic is not matched with the first preset scope, the terminal acquires a second preset scope from the server.
If the first semantic does not match the information included in the first preset scope, the first semantic does not match the first preset scope. The second preset scope is different from the first preset scope and may be stored in the server; it may be generated from historical operation information in the server.
And step 206, if the first semantic meaning is matched with the second preset scope, the terminal outputs simplified prompt information.
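Steps 204 to 206 amount to a two-level lookup: try the locally stored scope first, and only on a miss fetch the server-side scope. In this hedged sketch, `fetch_second_scope` stands in for the server request; its name and shape are assumptions.

```python
def match_with_fallback(first_semantic, first_scope, fetch_second_scope):
    """Try the locally stored first preset scope; on a miss, fetch the
    second preset scope from the server (steps 204-206). Names are assumed."""
    if first_semantic in first_scope:
        return "simplified_prompt"        # step 204: local match
    second_scope = fetch_second_scope()   # step 205: server-side scope
    if first_semantic in second_scope:
        return "simplified_prompt"        # step 206: remote match
    return "full_prompt"                  # no match anywhere
```

Fetching lazily keeps the common case (a local match) free of any network round trip.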
Step 207 may be executed after either step 204 or step 206:
and step 207, the terminal acquires second voice information to be recognized aiming at the simplified prompt information and executes preset operation based on the second voice information to be recognized.
The second voice information to be recognized may be voice information, different from the first voice information to be recognized, that the user sends in response to the simplified prompt information after the terminal outputs that information to the user.
It should be noted that, for the explanation of the same steps or concepts in the embodiments of the present invention and other embodiments, reference may be made to the description in other embodiments, which is not repeated herein.
The voice processing method provided by the embodiment of the present invention acquires the first voice information to be recognized, performs semantic recognition on it to obtain the first semantic, and outputs the simplified prompt information if the first semantic matches the first preset scope, where the duration corresponding to the simplified prompt information is less than the specific duration and the information in the first preset scope characterizes the operations executable by the terminal. In this way, after the terminal recognizes the user's voice information, it can give concise, quick prompt information according to the actual semantics of that voice information, instead of the time-consuming voice prompt information of the existing human-computer interaction technology, thereby solving the time-consuming operation flow problem in human-computer interaction technology and improving the working efficiency of the terminal.
Based on the foregoing embodiments, an embodiment of the present invention provides a speech processing method, which is applied in a terminal, and as shown in fig. 3, the method includes the following steps:
301, the terminal acquires the first voice information to be recognized, and performs semantic recognition on the first voice information to be recognized to obtain a first semantic meaning.
Step 302, the terminal acquires historical identification information of the target semantic matched with the first semantic.
And step 303, the terminal determines the probability that the target semantics are matched with the first preset scope in the historical operation process based on the historical identification information of the target semantics.
If a voice request initiated by the user during interaction with the terminal falls within the first preset scope, the user is considered to have completed one human-computer interaction that counts toward proficiency, and the corresponding proficiency counter records a successful operation. Conversely, if the user initiates a voice request outside the first preset scope during subsequent interaction with the terminal, the user is considered not to have completed a human-computer interaction that counts toward proficiency, and the proficiency counter records a failed operation. When the ratio of the number of successes to the number of failures in the proficiency counter reaches a preset threshold, the user is considered to have reached proficiency for voice interaction within the first preset scope; the time-consuming voice prompt of the related art is then cancelled for the first preset scope, and the simplified prompt information of the embodiment of the present invention is output instead.
Step 303 determines, based on the historical identification information of the target semantic, a probability that the target semantic is matched with the first preset scope in the historical operation process, and may be implemented in the following manner:
b1, the terminal determines a first number of times that the target semantics are matched with the first preset scope in the historical operation process based on the historical identification information of the target semantics.
The first number may be the number of times, recorded by the proficiency counter, that the semantics recognized by the terminal for the voice information sent by the user matched the information included in the first preset scope.
b2, the terminal determines a second number of times that the target semantics are not matched with the first preset scope in the historical operation process based on the historical identification information of the target semantics.
The second number may be the number of times, recorded by the proficiency counter, that the semantics recognized by the terminal for the voice information sent by the user did not match the information included in the first preset scope.
b3, the terminal determines the probability that the target semantics are matched with the first preset scope in the historical operation process based on the first times and the second times.
If the ratio of the first number to the second number is greater than a preset value, the ratio of successes to failures in the proficiency counter is considered to have reached the preset threshold; at this point, the terminal may determine that the probability that the target semantics matched the first preset scope during historical operation satisfies the preset probability.
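The proficiency counter of steps b1 to b3 can be sketched as follows. The class name, the threshold handling, and the zero-failure branch are assumptions layered on top of the ratio test the patent describes.

```python
class ProficiencyCounter:
    """Tracks how often a target semantic matched the first preset scope."""

    def __init__(self, threshold: float):
        self.successes = 0          # first number: times the semantic matched the scope
        self.failures = 0           # second number: times it fell outside the scope
        self.threshold = threshold  # preset value for the success/failure ratio

    def record(self, matched: bool) -> None:
        if matched:
            self.successes += 1
        else:
            self.failures += 1

    def is_proficient(self) -> bool:
        """Step b3: the ratio of the first number to the second number
        must exceed the preset value."""
        if self.failures == 0:
            return self.successes > 0  # assumption: any success with no failures qualifies
        return self.successes / self.failures > self.threshold
```

With a threshold of 3.0, four successes against one failure (ratio 4.0) counts as proficient, while four against two (ratio 2.0) does not.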
And 304, if the probability that the target semantics are matched with the first preset scope in the historical operation process meets the preset probability, the terminal determines that the proficiency of the first semantics meets the preset proficiency.
And 305, if the proficiency of the first semantic meaning meets the preset proficiency, the terminal analyzes the first semantic meaning to obtain the keyword.
Wherein the keyword may be a word capable of characterizing the operation referred to by the first semantic.
And step 306, the terminal determines a target scope matched with the keyword from the first preset scope.
The first preset scope may be a set; that is, the first preset scope includes multiple scopes, and each scope has its own identifier. For example, the identifier may be the category of the scope or the name of the scope. The terminal obtains, from the first preset scope and according to the keyword, the scope name (that is, the category of the scope) matching the keyword, and determines that scope as the target scope.
Step 307, the terminal detects whether the keyword matches the words in the target scope.
Detecting whether the keyword matches the words in the target scope may be implemented by detecting whether the target scope contains a word that is the same as, or has the same meaning as, the keyword; if such a word exists, the keyword may be considered to match the words in the target scope.
And 308, if the keyword is matched with the word in the target scope, the terminal determines that the first semantic is matched with the first preset scope.
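Steps 305 to 308 can be sketched as a dictionary of named scopes plus a synonym table for words with the same meaning. The scope names, their word lists, and the synonym handling here are illustrative assumptions.

```python
# The first preset scope modeled as a set of named scopes (step 306 picks one by name).
PRESET_SCOPES = {
    "navigation": {"go", "navigate", "route"},
    "music": {"play", "song", "pause"},
}

# Words treated as having the same meaning as a scope word (step 307).
SYNONYMS = {"head": "go"}

def matches_first_preset_scope(keyword: str, scope_name: str) -> bool:
    target_scope = PRESET_SCOPES.get(scope_name)  # step 306: target scope by its identifier
    if target_scope is None:
        return False
    normalized = SYNONYMS.get(keyword, keyword)   # the same word, or one with the same meaning
    return normalized in target_scope             # steps 307-308
```

So a keyword like "head" matches the navigation scope via its synonym "go", while a keyword from a different scope, or an unknown scope name, does not match.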
And 309, if the first semantic meaning is matched with the first preset scope, the terminal outputs simplified prompt information.
The duration corresponding to the simplified prompt information is less than a specific duration; the information in the first preset scope characterizes the operations executable by the terminal.
And 310, if the first semantic meaning is not matched with the first preset scope, the terminal acquires a second preset scope from the server.
And 311, if the first semantic meaning is matched with the second preset scope, the terminal outputs simplified prompt information.
Step 312 may be executed after either step 309 or step 311:
and step 312, the terminal acquires second voice information to be recognized aiming at the simplified prompt information, and performs semantic recognition on the second voice information to be recognized to obtain a second semantic meaning.
The implementation process of performing semantic recognition on the second voice information to be recognized to obtain the second semantic is the same as the implementation process of performing semantic recognition on the first voice information to be recognized to obtain the first semantic.
And 313, if the second semantic meaning is matched with the first preset scope, the terminal outputs simplified prompt information.
And step 314, the terminal acquires third voice information to be recognized for the simplified prompt information, and so on, until all the voice information to be recognized has been acquired.
It should be noted that, if the terminal detects that the third semantic corresponding to the third voice information to be recognized matches the first preset scope, the terminal may output the simplified prompt information; the terminal then continues to acquire the voice information to be recognized that the user sends for the simplified prompt information, until the user has sent all the voice information to be recognized that needs to be sent to the terminal.
And 315, the terminal executes preset operation based on the finally acquired voice information to be recognized.
The finally acquired voice information to be recognized refers to the voice information to be recognized that the user sent to the terminal last, among all the voice information to be recognized.
It should be noted that, when the second semantic is not within the first preset scope, or the third semantic is not within the first preset scope, the operation performed by the terminal is the same as that performed when the first semantic does not match the first preset scope.
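Steps 312 to 315 form a loop: the terminal emits the short cue for each utterance whose semantic stays within the scope, and the preset operation then uses the last utterance acquired. The function shape and return values in this sketch are assumptions.

```python
def interaction_loop(utterances, recognize, first_scope):
    """Emit a cue per matching utterance (steps 312-314); the preset
    operation (step 315) would then act on the last utterance acquired."""
    cues = []
    last = None
    for voice in utterances:              # successive voice information to be recognized
        last = voice
        if recognize(voice) in first_scope:
            cues.append("beep")           # simplified prompt for this turn
        else:
            cues.append("full_prompt")    # semantic fell outside the scope
    return cues, last                     # `last` feeds the preset operation
```

For a two-turn exchange such as "go" followed by "subway", both within the scope, the loop yields two short cues and hands "subway" to the preset operation.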
In other embodiments of the present invention, the method may further comprise the steps of:
and acquiring information in a second preset action domain.
And updating the first preset scope based on the information in the second preset scope.
The terminal can add a word in the second preset scope different from the word in the first preset scope to the first preset scope, and then updates the first preset scope.
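If scopes are modeled as sets of words (an assumption of this sketch), updating the first preset scope with the second reduces to a set union:

```python
def update_first_scope(first_scope: set, second_scope: set) -> set:
    """Add any word from the second preset scope that the first lacks."""
    return first_scope | second_scope  # union: keeps existing words, adds the new ones
```

For example, `update_first_scope({"go"}, {"go", "pay"})` yields `{"go", "pay"}`; words already present are unaffected.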
In other embodiments of the present invention, before step 301, the method may further comprise the steps of:
and acquiring historical operation performed on the historical to-be-recognized voice information.
And generating a first preset scope based on the historical to-be-recognized voice information and the historical operation.
The historical operation executed for the historical voice information to be recognized refers to the voice prompt information returned to the user, and/or the operations executed, by the terminal after recognizing the semantics of the voice information to be recognized that the user historically sent to the terminal.
A very simple example follows:
referring to fig. 4, if the user sends a command 1 "i want to go to the prefecture square" to the terminal, the terminal performs voice recognition on the command 1 after receiving the command, and judges that the semantic meaning is matched with the name of the scope in the first preset scope after the recognition is successful; if the matching is successful and the proficiency reaches the preset proficiency, simple simplified prompt information, such as beep sound, is output at the moment; then, receiving an instruction 2 of 'sitting on the subway' sent by the user aiming at the simplified prompt message, the terminal continues to recognize, if a condition is met, continues to output the simplified prompt message, such as beep sound, and then presents the user with a well-defined route. However, in the relative technology, if the user gives an instruction of "i want to go to the heaven square", the terminal may give "how to go? Whether you want to select driving, public transportation or walking or subway (here, the broadcasting time is long) voice prompt information; after the user answers "sit on the subway", the terminal gives a prescribed route. Obviously, the voice prompt given in the relative technology is time-consuming and poor in user experience.
It should be noted that, for the explanation of the same steps or concepts in the embodiments of the present invention and other embodiments, reference may be made to the description in other embodiments, which is not repeated herein.
The voice processing method provided by the embodiment of the present invention acquires the first voice information to be recognized, performs semantic recognition on it to obtain the first semantic, and outputs the simplified prompt information if the first semantic matches the first preset scope, where the duration corresponding to the simplified prompt information is less than the specific duration and the information in the first preset scope characterizes the operations executable by the terminal. In this way, after the terminal recognizes the user's voice information, it can give concise, quick prompt information according to the actual semantics of that voice information, instead of the time-consuming voice prompt information of the existing human-computer interaction technology, thereby solving the time-consuming operation flow problem in human-computer interaction technology and improving the working efficiency of the terminal.
Based on the foregoing embodiments, an embodiment of the present invention provides a terminal, which may be applied to the voice processing method provided in the embodiments corresponding to fig. 1 to 3, and as shown in fig. 5, the terminal may include: a processor 41, a memory 42 and a communication bus 43;
the communication bus 43 is used for realizing communication connection between the processor 41 and the memory 42;
the processor 41 is configured to execute the program, stored in the memory 42, for operating on voice information, so as to implement the following steps:
acquiring first voice information to be recognized, and performing semantic recognition on the first voice information to be recognized to obtain first semantics;
when the first semantic meaning is matched with the first preset scope, outputting simplified prompt information;
the duration corresponding to the simplified prompt information is less than a specific duration; the information in the first preset scope characterizes the operations executable by the terminal.
In other embodiments of the present invention, processor 41 is configured to execute a speech processing program in memory 42 to perform the following steps:
and acquiring second voice information to be recognized aiming at the simplified prompt information, and executing preset operation based on the second voice information to be recognized.
In other embodiments of the present invention, the processor 41 is configured to execute, via the program in the memory 42, the outputting of the simplified prompt information when the first semantic matches the first preset scope, so as to implement the following steps:
detecting whether the proficiency of the first semantic meets a preset proficiency;
if the proficiency of the first semantic meets the preset proficiency, detecting whether the first semantic is matched with the first preset scope;
and outputting simplified prompt information when the first semantic is matched with the first preset scope.
In other embodiments of the present invention, the processor 41 is configured to execute, via the program in the memory 42, the detecting of whether the proficiency of the first semantic meets the preset proficiency, so as to implement the following steps:
acquiring historical identification information of a target semantic matched with the first semantic;
and determining whether the proficiency of the first semantic meets the preset proficiency based on the historical identification information of the target semantic.
In other embodiments of the present invention, the processor 41 is configured to execute, via the program in the memory 42, the determining, based on the historical identification information of the target semantic, of whether the proficiency of the first semantic meets the preset proficiency, so as to implement the following steps:
determining the probability that the target semantics are matched with a first preset scope in the historical operation process based on the historical identification information of the target semantics;
and if the probability that the target semantics are matched with the first preset scope in the historical operation process meets the preset probability, determining that the proficiency of the first semantics meets the preset proficiency.
In other embodiments of the present invention, the processor 41 is configured to execute, via the program in the memory 42, the determining, based on the historical identification information of the target semantics, of the probability that the target semantics matched the first preset scope during historical operation, so as to implement the following steps:
determining a first number of times that the target semantics are matched with a first preset scope in the historical operation process based on the historical identification information of the target semantics;
determining a second number of times that the target semantics are not matched with the first preset scope in the historical operation process based on the historical identification information of the target semantics;
based on the first number and the second number, determining the probability that the target semantics are matched with the first preset scope in the historical operation process.
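The counting and thresholding steps above can be illustrated with a short sketch. The function names, the representation of the historical identification information as a list of match/no-match booleans, and the 0.8 preset probability are assumptions for illustration, not values fixed by the embodiment:

```python
def match_probability(history):
    """Estimate how often the target semantic matched the first preset
    scope, given historical identification records (True = matched)."""
    first_count = sum(1 for matched in history if matched)       # matched the scope
    second_count = sum(1 for matched in history if not matched)  # did not match
    total = first_count + second_count
    return first_count / total if total else 0.0

def proficiency_met(history, preset_probability=0.8):
    """Proficiency is met when the historical match probability
    reaches the preset probability."""
    return match_probability(history) >= preset_probability
```

A frequent user whose past utterances usually matched the scope will thus clear the threshold and receive the shorter prompt.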
In other embodiments of the present invention, the processor 41 is configured to execute, in the memory 42, the step of detecting whether the first semantic is matched with the first preset scope, so as to implement the following steps:
analyzing the first semantic meaning to obtain a keyword;
determining a target scope matched with the keyword from the first preset scope;
detecting whether the keyword is matched with a word in the target scope;
and if the keyword is matched with the word in the target scope, determining that the first semantic is matched with the first preset scope.
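A minimal sketch of the keyword-to-scope matching above, assuming the first preset scope is represented as a mapping from scope names to word sets (a representation chosen for illustration only):

```python
def find_target_scope(keyword, first_preset_scope):
    """Determine the target scope, within the first preset scope,
    whose words match the keyword; None if no scope matches."""
    for name, words in first_preset_scope.items():
        if keyword in words:
            return name
    return None

def first_semantic_matches(keywords, first_preset_scope):
    """The first semantic matches the first preset scope when every
    keyword parsed from it matches a word in some target scope."""
    return all(find_target_scope(k, first_preset_scope) is not None
               for k in keywords)
```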
In other embodiments of the present invention, processor 41 is configured to execute the operating program for voice information in memory 42 to implement the following steps:
if the first semantics are not matched with the first preset scope, acquiring a second preset scope from the server;
if the first semantic meaning is matched with the second preset scope, outputting simplified prompt information;
and acquiring second voice information to be recognized aiming at the simplified prompt information, and executing preset operation based on the second voice information to be recognized.
In other embodiments of the present invention, the processor 41 is configured to execute, in the memory 42, the step of acquiring the second to-be-recognized voice information for the simplified prompt information and executing the preset operation based on the second to-be-recognized voice information, so as to implement the following steps:
acquiring second voice information to be recognized aiming at the simplified prompt information, and performing semantic recognition on the second voice information to be recognized to obtain second semantics;
if the second semantic is matched with the first preset scope, outputting simplified prompt information;
acquiring third to-be-recognized voice information for the simplified prompt information, and so on, until the last to-be-recognized voice information is acquired;
and executing preset operation based on the finally obtained voice information to be recognized.
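The repeated prompt-and-acquire loop above can be sketched as below. The `max_rounds` bound and the termination condition (stop once a recognized semantic no longer matches the first preset scope) are assumptions added so the illustration terminates:

```python
def prompt_loop(get_voice, recognize, matches_scope, run_operation,
                output_prompt, max_rounds=5):
    """While each newly recognized semantic still matches the first
    preset scope, keep outputting the simplified prompt and acquiring
    the next to-be-recognized voice information; finally execute the
    preset operation on the last obtained voice information."""
    voice = get_voice()
    for _ in range(max_rounds):
        semantic = recognize(voice)
        if not matches_scope(semantic):
            break
        output_prompt()      # simplified prompt, shorter than the specific duration
        voice = get_voice()  # second, third, ... voice information
    return run_operation(voice)
```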
In other embodiments of the present invention, processor 41 is configured to execute the operating program for voice information in memory 42 to implement the following steps:
acquiring information in a second preset scope;
and updating the first preset scope based on the information in the second preset scope.
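Updating the first preset scope with the information in the second preset scope could look like the following set-union sketch, assuming a scope is represented as a mapping from scope names to word sets (an illustrative choice, not specified by the embodiment):

```python
def update_first_scope(first_scope, second_scope):
    """Merge the information in the second preset scope into the
    first preset scope without mutating the original."""
    updated = {name: set(words) for name, words in first_scope.items()}
    for name, words in second_scope.items():
        updated.setdefault(name, set()).update(words)
    return updated
```

After such an update, later utterances in the newly merged scopes can be handled locally without another round trip to the server.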
In other embodiments of the present invention, before the first to-be-recognized voice information is acquired and semantic recognition is performed on the first to-be-recognized voice information to obtain the first semantic, the processor 41 is configured to execute the program in the memory 42 to implement the following steps:
acquiring historical operation executed aiming at historical voice information to be recognized;
and generating a first preset scope based on the historical to-be-recognized voice information and the historical operation.
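Generating the first preset scope from historical to-be-recognized voice information and the operations executed for it might be sketched as below; representing each history entry as a (keyword, operation) pair is an assumption made for illustration:

```python
from collections import defaultdict

def build_first_scope(history):
    """Group the keywords of historical to-be-recognized voice
    information by the historical operation executed for them,
    yielding a first preset scope of operation -> word set."""
    scope = defaultdict(set)
    for keyword, operation in history:
        scope[operation].add(keyword)
    return dict(scope)
```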
It should be noted that, in the embodiment of the present invention, for the interaction process of the steps executed by the processor, reference may be made to the interaction process in the speech processing method provided in the embodiments corresponding to fig. 1 to 3; details are not described here again.
The terminal provided by the embodiment of the invention acquires the first to-be-recognized voice information and performs semantic recognition on it to obtain the first semantic; if the first semantic is matched with the first preset scope, the terminal outputs the simplified prompt information. The duration corresponding to the simplified prompt information is less than the specific duration, and the information in the first preset scope represents the operations executable by the terminal.
Based on the foregoing embodiments, embodiments of the invention provide a computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps of:
acquiring first voice information to be recognized, and performing semantic recognition on the first voice information to be recognized to obtain first semantics;
when the first semantic meaning is matched with the first preset scope, outputting simplified prompt information;
the time length corresponding to the simplified prompt information is less than a specific time length; the information in the first preset scope represents the operation which can be executed by the terminal.
In other embodiments of the invention, the one or more programs are executable by the one or more processors to perform the steps of:
and acquiring second voice information to be recognized aiming at the simplified prompt information, and executing preset operation based on the second voice information to be recognized.
In other embodiments of the invention, the one or more programs are executable by the one or more processors to perform the steps of:
detecting whether the proficiency of the first semantic meets a preset proficiency;
if the proficiency of the first semantic meets the preset proficiency, detecting whether the first semantic is matched with the first preset scope;
and outputting simplified prompt information when the first semantic is matched with the first preset scope.
In other embodiments of the invention, the one or more programs are executable by the one or more processors to perform the steps of:
acquiring historical identification information of a target semantic matched with the first semantic;
and determining whether the proficiency of the first semantic meets the preset proficiency based on the historical identification information of the target semantic.
In other embodiments of the invention, the one or more programs are executable by the one or more processors to perform the steps of:
determining the probability that the target semantics are matched with a first preset scope in the historical operation process based on the historical identification information of the target semantics;
and if the probability that the target semantics are matched with the first preset scope in the historical operation process meets the preset probability, determining that the proficiency of the first semantics meets the preset proficiency.
In other embodiments of the invention, the one or more programs are executable by the one or more processors to perform the steps of:
determining a first number of times that the target semantics are matched with a first preset scope in the historical operation process based on the historical identification information of the target semantics;
determining a second number of times that the target semantics are not matched with the first preset scope in the historical operation process based on the historical identification information of the target semantics;
based on the first number and the second number, determining the probability that the target semantics are matched with the first preset scope in the historical operation process.
In other embodiments of the invention, the one or more programs are executable by the one or more processors to perform the steps of:
analyzing the first semantic meaning to obtain a keyword;
determining a target scope matched with the keyword from the first preset scope;
detecting whether the keyword is matched with a word in the target scope;
and if the keyword is matched with the word in the target scope, determining that the first semantic is matched with the first preset scope.
In other embodiments of the invention, the one or more programs are executable by the one or more processors to perform the steps of:
if the first semantics are not matched with the first preset scope, acquiring a second preset scope from the server;
if the first semantic meaning is matched with the second preset scope, outputting simplified prompt information;
and acquiring second voice information to be recognized aiming at the simplified prompt information, and executing preset operation based on the second voice information to be recognized.
In other embodiments of the invention, the one or more programs are executable by the one or more processors to perform the steps of:
acquiring second voice information to be recognized aiming at the simplified prompt information, and performing semantic recognition on the second voice information to be recognized to obtain second semantics;
if the second semantic is matched with the first preset scope, outputting simplified prompt information;
acquiring third to-be-recognized voice information for the simplified prompt information, and so on, until the last to-be-recognized voice information is acquired;
and executing preset operation based on the finally obtained voice information to be recognized.
In other embodiments of the invention, the one or more programs are executable by the one or more processors to perform the steps of:
acquiring information in a second preset scope;
and updating the first preset scope based on the information in the second preset scope.
In other embodiments of the invention, the one or more programs are executable by the one or more processors to perform the steps of:
acquiring historical operation executed aiming at historical voice information to be recognized;
and generating a first preset scope based on the historical to-be-recognized voice information and the historical operation.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.

Claims (13)

1. A voice processing method is applied to a terminal, and is characterized in that the method comprises the following steps:
acquiring first voice information to be recognized, and performing semantic recognition on the first voice information to be recognized to obtain first semantics;
when the first semantic meaning is matched with a first preset scope, outputting simplified prompt information; the time length corresponding to the simplified prompt information is less than a specific time length; and the information in the first preset scope represents the operation which can be executed by the terminal.
2. The method of claim 1, further comprising:
and acquiring second voice information to be recognized aiming at the simplified prompt information, and executing preset operation based on the second voice information to be recognized.
3. The method according to claim 1 or 2, wherein outputting a simplified prompt message when the first semantic matches a first preset scope comprises:
detecting whether the proficiency of the first semantic meets a preset proficiency;
if the proficiency of the first semantic meaning meets the preset proficiency, detecting whether the first semantic meaning is matched with the first preset scope;
and outputting simplified prompt information when the first semantic is matched with the first preset scope.
4. The method of claim 3, wherein the detecting whether the proficiency of the first semantic meets a preset proficiency comprises:
acquiring historical identification information of a target semantic matched with the first semantic;
and determining whether the proficiency of the first semantic meaning meets the preset proficiency or not based on the historical identification information of the target semantic meaning.
5. The method of claim 4, wherein determining whether the proficiency of the first semantic meaning meets a preset proficiency based on historical recognition information of the target semantic meaning comprises:
determining the probability that the target semantics are matched with the first preset scope in the historical operation process based on the historical identification information of the target semantics;
and if the probability that the target semantic is matched with the first preset scope in the historical operation process meets the preset probability, determining that the proficiency of the first semantic meets the preset proficiency.
6. The method of claim 5, wherein the determining the probability that the target semantic matches the first preset scope during the historical operation based on the historical identification information of the target semantic comprises:
determining a first number of times that the target semantics are matched with the first preset scope in a historical operation process based on historical identification information of the target semantics;
determining a second number of times that the target semantics are not matched with the first preset scope in a historical operation process based on historical identification information of the target semantics;
and determining the probability that the target semantics are matched with the first preset scope in the historical operation process based on the first times and the second times.
7. The method of claim 3, wherein the detecting whether the first semantic meaning matches the first predetermined scope comprises:
analyzing the first semantic meaning to obtain a keyword;
determining a target scope matched with the keyword from the first preset scope;
detecting whether the keyword is matched with a word in the target scope;
and if the keyword is matched with the word in the target scope, determining that the first semantic is matched with the first preset scope.
8. The method of claim 1, further comprising:
if the first semantic is not matched with the first preset scope, acquiring a second preset scope from a server;
if the first semantic meaning is matched with the second preset scope, outputting the simplified prompt information;
and acquiring second voice information to be recognized aiming at the simplified prompt information, and executing preset operation based on the second voice information to be recognized.
9. The method according to claim 1 or 7, wherein the obtaining second voice information to be recognized for the simplified prompt information and performing a preset operation based on the second voice information to be recognized comprises:
acquiring second voice information to be recognized aiming at the simplified prompt information, and performing semantic recognition on the second voice information to be recognized to obtain second semantics;
if the second semantic is matched with the first preset scope, outputting the simplified prompt message;
acquiring third to-be-recognized voice information for the simplified prompt information, and so on, until the last to-be-recognized voice information is acquired;
and executing the preset operation based on the finally obtained voice information to be recognized.
10. The method of claim 8, further comprising:
acquiring information in the second preset scope;
and updating the first preset scope based on the information in the second preset scope.
11. The method of claim 1, wherein before obtaining the first speech information to be recognized and performing semantic recognition on the first speech information to be recognized to obtain a first semantic, the method further comprises:
acquiring historical operation executed aiming at historical voice information to be recognized;
and generating the first preset scope based on the historical to-be-recognized voice information and the historical operation.
12. A terminal, characterized in that the terminal comprises: a processor, a memory, and a communication bus;
the communication bus is used for realizing communication connection between the processor and the memory;
the processor is configured to execute a program in memory that operates on speech information to implement the steps of:
acquiring first voice information to be recognized, and performing semantic recognition on the first voice information to be recognized to obtain first semantics;
when the first semantic meaning is matched with a first preset scope, outputting simplified prompt information; the time length corresponding to the simplified prompt information is less than a specific time length; and the information in the first preset scope represents the operation which can be executed by the terminal.
13. A computer-readable storage medium, characterized in that the computer-readable storage medium stores one or more programs which are executable by one or more processors to implement the steps of the speech processing method according to any one of claims 1 to 11.
CN201811228875.1A 2018-10-22 2018-10-22 Voice processing method, terminal and computer storage medium Pending CN111081236A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811228875.1A CN111081236A (en) 2018-10-22 2018-10-22 Voice processing method, terminal and computer storage medium

Publications (1)

Publication Number Publication Date
CN111081236A true CN111081236A (en) 2020-04-28

Family

ID=70309666

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811228875.1A Pending CN111081236A (en) 2018-10-22 2018-10-22 Voice processing method, terminal and computer storage medium

Country Status (1)

Country Link
CN (1) CN111081236A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002304192A (en) * 2001-04-05 2002-10-18 Denso Corp Voice recognition device
JP2003241793A (en) * 2002-02-14 2003-08-29 Nissan Motor Co Ltd Display device, and method and program for name display
JP2006058641A (en) * 2004-08-20 2006-03-02 Nissan Motor Co Ltd Speech recognition device
CN101158584A (en) * 2007-11-15 2008-04-09 熊猫电子集团有限公司 Voice destination navigation realizing method of vehicle mounted GPS
CN101330689A (en) * 2008-07-11 2008-12-24 北京天语君锐科技有限公司 Method and device for playing prompting sound
CN102246136A (en) * 2008-12-16 2011-11-16 三菱电机株式会社 Navigation device
CN103929533A (en) * 2014-03-18 2014-07-16 联想(北京)有限公司 Information processing method and electronic equipment
CN104535074A (en) * 2014-12-05 2015-04-22 惠州Tcl移动通信有限公司 Bluetooth earphone-based voice navigation method, system and terminal
CN104699694A (en) * 2013-12-04 2015-06-10 腾讯科技(深圳)有限公司 Prompt message acquiring method and device
CN105138250A (en) * 2015-08-03 2015-12-09 科大讯飞股份有限公司 Human-computer interaction operation guide method, human-computer interaction operation guide system, human-computer interaction device and server
US9564123B1 (en) * 2014-05-12 2017-02-07 Soundhound, Inc. Method and system for building an integrated user profile
CN109215640A (en) * 2017-06-30 2019-01-15 深圳大森智能科技有限公司 Audio recognition method, intelligent terminal and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JODI FORLIZZI等: "Where should i turn: moving from individual to collaborative navigation strategies to inform the interaction design of future navigation systems", CHI 2010: DRIVING, INTERRUPTED *

Similar Documents

Publication Publication Date Title
JP6828001B2 (en) Voice wakeup method and equipment
KR102437944B1 (en) Voice wake-up method and device
CN108831469B (en) Voice command customizing method, device and equipment and computer storage medium
CN107644638B (en) Audio recognition method, device, terminal and computer readable storage medium
US10573315B1 (en) Tailoring an interactive dialog application based on creator provided content
US9583102B2 (en) Method of controlling interactive system, method of controlling server, server, and interactive device
CN107146612B (en) Voice guidance method and device, intelligent equipment and server
CN107871503B (en) Speech dialogue system and utterance intention understanding method
US8352273B2 (en) Device, method, and program for performing interaction between user and machine
CN110970021B (en) Question-answering control method, device and system
CN108519998B (en) Problem guiding method and device based on knowledge graph
CN108446321B (en) Automatic question answering method based on deep learning
JP2008203559A (en) Interaction device and method
EP3956884B1 (en) Identification and utilization of misrecognitions in automatic speech recognition
CN111159364A (en) Dialogue system, dialogue device, dialogue method, and storage medium
CN111178081B (en) Semantic recognition method, server, electronic device and computer storage medium
CN109545203A (en) Audio recognition method, device, equipment and storage medium
CN109741744B (en) AI robot conversation control method and system based on big data search
CN113901837A (en) Intention understanding method, device, equipment and storage medium
CN111081236A (en) Voice processing method, terminal and computer storage medium
CN111104502A (en) Dialogue management method, system, electronic device and storage medium for outbound system
CN111666388A (en) Dialogue data processing method, device, computer equipment and storage medium
CN111225115A (en) Information providing method and device
CN114490972A (en) Message processing method and device, electronic equipment and storage medium
CN114596842A (en) Voice interaction method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination