CN115171672A - Voice processing method, device and computer storage medium - Google Patents

Voice processing method, device and computer storage medium

Info

Publication number
CN115171672A
Authority
CN
China
Prior art keywords
voice
voice operation
operation command
user
historical
Prior art date
Legal status
Pending
Application number
CN202110359571.4A
Other languages
Chinese (zh)
Inventor
应臻恺
时红仁
Current Assignee
Shanghai Qwik Smart Technology Co Ltd
Original Assignee
Shanghai Qwik Smart Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Qwik Smart Technology Co Ltd filed Critical Shanghai Qwik Smart Technology Co Ltd
Priority to CN202110359571.4A
Publication of CN115171672A

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 to G10L 21/00
    • G10L 25/48: Speech or voice analysis techniques specially adapted for particular use
    • G10L 25/51: Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L 25/60: Speech or voice analysis techniques for measuring the quality of voice signals

Abstract

The application discloses a voice processing method, a voice processing device, and a computer storage medium. The method comprises the following steps: acquiring at least one second voice operation command that a user input to a target operation object before inputting a first voice operation command to the same target operation object, wherein the operation executed based on the first voice operation command meets the user's requirement and the operations executed based on the second voice operation commands do not; and performing a voice recognition satisfaction analysis according to the number of second voice operation commands to obtain an analysis result characterizing the user's satisfaction with voice recognition. By analyzing how the user actually uses voice recognition, the voice processing method, device, and computer storage medium can determine the user's satisfaction with voice recognition and thereby assist voice recognition optimization.

Description

Voice processing method, device and computer storage medium
Technical Field
The present invention relates to the field of speech processing technologies, and in particular, to a speech processing method and apparatus, and a computer storage medium.
Background
With the rapid development of voice recognition technology, voice recognition functions have become increasingly common in vehicles, and users invoke them in more and more scenarios. To optimize voice recognition, it is generally necessary to analyze the user's satisfaction with it. Existing analysis approaches mainly rely on checking whether the voice input by the user was successfully recognized. However, even when recognition succeeds, the operation the vehicle performs based on the input voice may not be what the user wanted, and this, too, affects the user's satisfaction with voice recognition.
The foregoing description is provided for general background information and does not necessarily constitute prior art.
Disclosure of Invention
An object of the present invention is to provide a voice processing method, device, and computer storage medium that obtain the user's satisfaction with voice recognition by analyzing how the user uses it, thereby assisting voice recognition optimization.
Another object of the present invention is to provide a voice processing method, device, and computer storage medium that analyze the input times of different voice operation commands to accurately determine the voice recognition satisfaction associated with each command, in a manner that is simple and convenient to operate.
Another object of the present invention is to provide a voice processing method, device, and computer storage medium that further assist voice recognition optimization by improving the handling of voice operation commands whose recognition results do not satisfy user requirements.
Additional advantages and features of the invention will be set forth in the detailed description which follows and in part will be apparent from the description, or may be learned by practice of the invention as set forth hereinafter.
According to one aspect of the present invention, the foregoing and other objects and advantages can be achieved by a speech processing method of the present invention comprising the steps of:
acquiring at least one second voice operation command which is input to a target operation object by a user before the first voice operation command to the target operation object is input by the user; wherein the operation executed based on the first voice operation command meets the user requirement, and the operation executed based on the second voice operation command does not meet the user requirement;
and performing voice recognition satisfaction analysis according to the number of the second voice operation commands to obtain an analysis result for representing the voice recognition satisfaction of the user.
According to an embodiment of the present invention, the acquiring at least one second voice operation command which has been input by a user to a target operation object before the first voice operation command to the target operation object is input by the user comprises the following steps:
acquiring a historical voice operation command set which is composed of historical voice operation commands with adjacent input time and time intervals meeting preset conditions;
and determining the historical voice operation command corresponding to the latest input time in the historical voice operation command set as a first voice operation command, and determining the historical voice operation commands except the historical voice operation command corresponding to the latest input time as second voice operation commands.
According to an embodiment of the present invention, after performing the voice recognition satisfaction analysis according to the number of the second voice operation commands to obtain an analysis result for characterizing the voice recognition satisfaction of the user, the method further includes the following steps:
and taking the second voice operation command as the input of a set voice recognition model, and taking the operation executed based on the first voice operation command as the output of the voice recognition model, and training the voice recognition model.
Accordingly, the present invention provides an apparatus for executing the above speech processing method, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the following steps: acquiring at least one second voice operation command which is input to a target operation object by a user before a first voice operation command to the target operation object is input by the user; wherein the operation executed based on the first voice operation command meets the user requirement, and the operation executed based on the second voice operation command does not meet the user requirement; and performing voice recognition satisfaction analysis according to the number of the second voice operation commands to obtain an analysis result for representing the voice recognition satisfaction of the user.
Accordingly, the present invention provides a computer storage medium having a computer program stored thereon, which, when executed by a processor, implements the steps of the above-described speech processing method.
Drawings
Fig. 1 is a schematic flow chart of a speech processing method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a speech processing apparatus according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings, in which the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of apparatuses and methods consistent with certain aspects of the present application, as detailed in the appended claims.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the recitation "comprising a(n) ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that incorporates the recited element. Furthermore, similarly named components, features, or elements in different embodiments of the application may have the same meaning or different meanings; the specific meaning should be determined from its interpretation in the specific embodiment or from the context of that embodiment.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms; the terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information and, similarly, second information may be referred to as first information without departing from the scope herein. The word "if," as used herein, may be interpreted as "when," "upon," or "in response to a determination," depending on the context. Likewise, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes," and/or "including," when used in this specification, specify the presence of stated features, steps, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, items, species, and/or groups thereof. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means any of the following: A; B; C; A and B; A and C; B and C; A, B and C. An exception to this definition occurs only when a combination of elements, functions, steps, or operations is inherently mutually exclusive in some way.
It should be understood that, although the steps in the flowcharts in the embodiments of the present application are shown in the order indicated by the arrows, they are not necessarily performed in that order; unless explicitly stated herein, they may be performed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or stages that are not necessarily performed at the same time; they may be performed at different times and in different orders, and may be performed alternately or interleaved with other steps or with the sub-steps or stages of other steps.
It should be noted that step numbers such as S101 and S102 are used herein for the purpose of more clearly and briefly describing the corresponding contents, and do not constitute a substantial limitation on the sequence, and those skilled in the art may perform S102 first and then S101 in specific implementations, but these steps should be within the scope of the present application.
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In the following description, suffixes such as "module," "component," or "unit" used to denote elements are adopted only for convenience of description and have no specific meaning in themselves; thus "module," "component," and "unit" may be used interchangeably.
Please refer to fig. 1, which is a flowchart of a voice processing method according to an embodiment of the present invention. The method is applicable to analyzing a user's satisfaction with voice recognition and may be executed by a voice processing apparatus according to an embodiment of the present invention. The apparatus may be implemented in software and/or hardware, and may specifically be a terminal, such as a mobile phone, an in-vehicle head unit, a wearable device, or a server. In this embodiment, the method is described by taking its application to an in-vehicle head unit as an example, and it includes the following steps:
step S101: acquiring at least one second voice operation command which is input to a target operation object by a user before the first voice operation command to the target operation object is input by the user; wherein the operation executed based on the first voice operation command meets the user requirement, and the operation executed based on the second voice operation command does not meet the user requirement;
the operation executed based on the first voice operation command satisfies a user requirement, which means that the operation executed based on the first voice operation command is an operation that a user wants to execute or an arrival purpose, for example, if the first voice operation command is to turn on an air conditioner, if the operation executed based on the first voice operation command is to turn on the air conditioner, it indicates that the operation executed based on the first voice operation command satisfies the user requirement; if the operation executed based on the first voice operation command is not to open an air conditioner, for example, to open a window, it indicates that the operation executed based on the first voice operation command does not meet the user requirement at this time. It should be noted that the in-vehicle device successfully recognizes both the first voice operation command and the second voice operation command, only that the recognition results are different, that is, the in-vehicle device performs different operations based on the first voice operation command and the second voice operation command respectively. In addition, the input time interval between the first voice operation command and the second voice operation command should be smaller than a preset time length threshold value. The target operation object can be a concrete object, such as an air conditioner, a vehicle window and the like, and can also be an abstract object, such as an on-board radio, a multimedia application and the like. 
That is, when the operation executed based on the first voice operation command meets the user requirement, the head unit recognized the first voice operation command successfully and the corresponding operation is what the user required; when the operation executed based on a second voice operation command does not meet the user requirement, the head unit also recognized that command successfully, but the corresponding operation is not what the user required. When a user performs voice control through the head unit and wants it to execute operation A, the user inputs a corresponding voice operation command. Due to technical factors such as recognition accuracy and/or human factors such as accent, the head unit may interpret the command as requesting operation B. The user then continues to input the same voice operation command, or an adjusted one, and stops only once the head unit executes operation A based on the input. At that point, the last input voice operation command is determined to be the first voice operation command, and the voice operation commands input before it are determined to be second voice operation commands.
In one embodiment, the obtaining at least one second voice operation command which is input by the user to the target operation object before the first voice operation command to the target operation object is input by the user includes the following steps:
acquiring a historical voice operation command set which is composed of historical voice operation commands with adjacent input time and time intervals meeting preset conditions;
and determining the historical voice operation command corresponding to the latest input time in the historical voice operation command set as a first voice operation command, and determining the historical voice operation commands except the historical voice operation command corresponding to the latest input time as second voice operation commands.
It can be understood that a user usually continues to input voice operation commands shortly after the head unit fails to act on the previous one correctly, so these commands are associated in time: the interval between every two of them is not long, and is smaller than the interval at which the user normally invokes the voice recognition function. The first and second voice operation commands can therefore be obtained from the input times and the time intervals. It should be noted that, because the operations the head unit is asked to execute differ over time, there may be multiple first voice operation commands and, correspondingly, multiple second voice operation commands. The preset condition may require the time interval to be smaller than the minimum or average interval at which the user uses the voice recognition function; historical data on the user's use of the function can be collected and analyzed to obtain such characteristics or habits, e.g., a maximum, minimum, or average interval. Obtaining the first and second voice operation commands from input times and intervals in this way accurately yields the voice recognition satisfaction associated with each command, and is simple and convenient to operate.
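The split described above can be sketched minimally as follows. This is an illustration, not the patent's implementation: the names `CommandRecord` and `split_first_second` are hypothetical, and commands are assumed to arrive as timestamped recognized text.

```python
from dataclasses import dataclass

@dataclass
class CommandRecord:
    text: str         # recognized command text
    timestamp: float  # input time, e.g. seconds since epoch

def split_first_second(group):
    """Within one time-clustered group of commands aimed at the same
    target operation object, the command with the latest input time is
    the 'first' (requirement-satisfying) command; all earlier commands
    are 'second' (unsatisfying) commands."""
    ordered = sorted(group, key=lambda r: r.timestamp)
    return ordered[-1], ordered[:-1]
```

For example, three retries ending with a successful "turn on the air conditioner" yield that final command as the first voice operation command and the two earlier attempts as second voice operation commands.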
In one embodiment, the obtaining of the historical voice operation command set composed of historical voice operation commands with adjacent input times and time intervals meeting preset conditions includes the following steps:
sequencing the historical voice operation commands input by the user within a preset time length according to the sequence of the input time from front to back;
determining a target historical voice operation command from the sorted historical voice operation commands, wherein the time interval between the input time of the target historical voice operation command and the input time of the previous historical voice operation command does not meet a preset condition, and the time interval between the input time of the target historical voice operation command and the input time of the next historical voice operation command meets the preset condition;
and sequentially selecting and inputting historical voice operation commands with adjacent time and time intervals meeting preset conditions from the sorted historical voice operation commands by taking the target historical voice operation commands as a starting point, and adding the historical voice operation commands into a historical voice operation command set.
The preset duration may be set according to actual needs, e.g., 30 days or 60 days. When the interval between a historical voice operation command's input time and the previous command's input time does not meet the preset condition, while the interval to the next command's input time does, this indicates that the operation the user wanted executed by this command is the same as that of the next command but different from that of the previous command; such a command therefore starts a new group. It can be understood that a user quickly inputs the next voice operation command whenever the operation executed by the head unit based on the current command does not meet the requirement, i.e., to get the head unit to perform the same operation, the input interval between every two such commands is not long. Therefore, starting from the target historical voice operation command, the historical voice operation commands whose adjacent input times have intervals meeting the preset condition are sequentially selected from the sorted commands and added to the historical voice operation command set.
In this way, the required voice operation commands can be accurately extracted by analyzing the input times of the historical voice operation commands, further improving the accuracy of the obtained voice recognition satisfaction.
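The set construction above can be sketched as follows, under stated assumptions: the history is a list of `(text, timestamp)` tuples, the preset condition is simplified to "gap <= max_gap", and the name `collect_command_set` is illustrative rather than from the patent.

```python
def collect_command_set(history, max_gap):
    """Sort the history by input time, find the target command (its gap
    to the previous command exceeds max_gap, or it is the earliest, while
    its gap to the next command does not exceed max_gap), then collect
    consecutive commands whose adjacent gaps all satisfy the condition."""
    cmds = sorted(history, key=lambda c: c[1])
    for i in range(len(cmds) - 1):
        prev_gap_exceeded = i == 0 or cmds[i][1] - cmds[i - 1][1] > max_gap
        next_gap_ok = cmds[i + 1][1] - cmds[i][1] <= max_gap
        if prev_gap_exceeded and next_gap_ok:
            cluster = [cmds[i]]
            j = i + 1
            while j < len(cmds) and cmds[j][1] - cmds[j - 1][1] <= max_gap:
                cluster.append(cmds[j])
                j += 1
            return cluster
    return []
```

A command far away in time from its neighbors (e.g. a successful one-shot command hours later) is excluded from the set, since it belongs to a different interaction.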
Step S102: and performing voice recognition satisfaction analysis according to the number of the second voice operation commands to obtain an analysis result for representing the voice recognition satisfaction of the user.
Here, the smaller the number of second voice operation commands, the fewer commands the user had to input, the higher the head unit's voice recognition accuracy, and the higher the user's satisfaction with voice recognition; conversely, the larger the number of second voice operation commands, the more commands the user had to input, the lower the recognition accuracy, and the lower the user's satisfaction. Therefore, the analysis result obtained from the number of second voice operation commands characterizes the user's satisfaction with voice recognition, i.e., reflects the voice recognition accuracy, and can in turn assist voice recognition optimization.
In an embodiment, performing the voice recognition satisfaction analysis according to the number of second voice operation commands to obtain an analysis result characterizing the user's satisfaction comprises the following step: determining the target satisfaction level corresponding to the number of second voice operation commands according to a preset correspondence between different command counts and satisfaction levels. The correspondence can be set according to actual requirements; for example, when the number of second voice operation commands is 0, the satisfaction level may be set to ten; when the number is 1, the level may be set to nine; and so on. Alternatively, the voice recognition satisfaction can be scored from the number of second voice operation commands. In this way, the user's satisfaction with voice recognition can be evaluated quickly, improving analysis efficiency.
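The ten-levels-minus-retries mapping described above can be written as a one-line rule; the floor at level one and the function name `satisfaction_level` are illustrative assumptions, not specified by the patent.

```python
def satisfaction_level(num_second_commands, max_level=10):
    """Map the number of unsatisfying (second) voice operation commands
    to a satisfaction level: 0 retries -> level 10, 1 retry -> level 9,
    and so on, floored at level 1."""
    return max(max_level - num_second_commands, 1)
```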
In summary, the voice processing method provided by the foregoing embodiments obtains the user's satisfaction with voice recognition by analyzing how the user uses it, and can assist voice recognition optimization.
In an embodiment, after performing the voice recognition satisfaction analysis according to the number of second voice operation commands to obtain an analysis result characterizing the user's satisfaction, the method further includes the following step: using the second voice operation commands as input to a configured voice recognition model, and the operation executed based on the first voice operation command as the model's expected output, to train the voice recognition model. It can be understood that, since the operation executed based on the first voice operation command satisfies the user requirement while those based on the second voice operation commands do not, the model's recognition results for the second commands can be considered wrong. Many factors can affect the recognition result, such as regional accent differences, homophones, and polyphonic characters. Training the model with the second voice operation commands as input and the first command's executed operation as output therefore improves the model's adaptability and, correspondingly, its recognition accuracy. It should be noted that the voice recognition model may be built with an artificial intelligence algorithm, such as a genetic algorithm or a neural network, based on the historical voice operation commands and corresponding recognition results of different users.
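Assembling the supervised pairs for that retraining step might look like the sketch below. This is an assumption-laden illustration: `build_training_pairs` and the `executed_operation` callback (standing in for the head unit's execution log) are hypothetical names, and the actual model training itself is out of scope here.

```python
def build_training_pairs(clusters, executed_operation):
    """For each time-clustered group of commands, pair every unsatisfying
    (second) command with the operation actually executed for the final
    (first) command, yielding (input, expected output) examples for
    retraining the voice recognition model."""
    pairs = []
    for cluster in clusters:
        ordered = sorted(cluster, key=lambda c: c[1])  # sort by timestamp
        target_op = executed_operation(ordered[-1][0])  # op of first command
        for text, _ in ordered[:-1]:                    # second commands
            pairs.append((text, target_op))
    return pairs
```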
Therefore, the voice recognition model is trained by adopting the voice operation command actually input by the user, so that the adaptability of the voice recognition model can be effectively improved, and the recognition precision is correspondingly improved.
In an embodiment, after performing the speech recognition satisfaction analysis according to the number of the second speech operation commands to obtain an analysis result for characterizing the speech recognition satisfaction of the user, the method further includes the following steps:
performing semantic recognition on the second voice operation command to obtain at least one keyword;
and establishing an incidence relation between the at least one keyword and the operation executed based on the first voice operation command, and storing the incidence relation to a set voice command library.
It can be understood that, by performing semantic recognition on a second voice operation command, at least one keyword contained in it can be obtained; whether such keywords are correctly recognized during voice recognition affects the accuracy of the recognition result. To improve voice recognition satisfaction, an association can be established between a keyword contained in a second voice operation command (which did not meet the user's requirement) and the operation executed based on the first voice operation command, and stored in a configured voice command library, so that when the user later inputs that second voice operation command, the executed operation meets the requirement. For example, if a user mispronounces a polyphonic character, say using the fourth tone where the second tone is correct, the operation performed by the head unit based on that input may not meet the requirement; the word as pronounced in the fourth tone can then be associated with the operation correctly triggered by the second-tone pronunciation, improving the user's satisfaction with voice recognition. Analyzing the different voice operation commands aimed at a target operation object in this way ensures that a subsequently input command which previously failed to trigger the required operation now triggers it correctly, further improving the user's voice recognition satisfaction.
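A minimal sketch of such a keyword-to-operation library follows, assuming the library is a plain mapping from keyword strings to operation identifiers; the names `register_alias`, `lookup`, and the `"DEFROST_ON"` operation code are illustrative, not from the patent.

```python
def register_alias(command_library, keywords, operation):
    """Associate each keyword extracted from an unsatisfying second
    command with the operation executed for the first command, so a
    later input containing that keyword triggers the intended operation.
    Existing associations are kept (setdefault does not overwrite)."""
    for kw in keywords:
        command_library.setdefault(kw, operation)
    return command_library

def lookup(command_library, recognized_keywords):
    """Return the first associated operation found among the recognized
    keywords, or None if no association exists yet."""
    for kw in recognized_keywords:
        if kw in command_library:
            return command_library[kw]
    return None
```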
In an embodiment, after storing the association in the preset voice command library, the method further includes the following step:
and outputting a prompt message, wherein the prompt message indicates that the operation executed based on the first voice operation command can be executed when the second voice operation command is input.
It can be understood that after inputting a second voice operation command to the in-vehicle unit, if the operation performed based on it does not meet the user's requirement, some users will keep trying voice input while others will stop using the voice function altogether. Therefore, after the association between the at least one keyword and the operation executed based on the first voice operation command is established, a prompt message can be output indicating that inputting the second voice operation command will now execute that operation. This encourages the user to use the voice function and improves the convenience and accuracy of the speech recognition function. For instance, if a user once said "blow glass" and the in-vehicle unit replied "I don't know what you are saying," the user may have stopped using the voice function and formed a poor impression of it. A month later, the unit can prompt the user: "You can say 'blow the glass window' to turn on the defrosting function."
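Composing such a prompt from a stored association can be sketched as follows; the function name, message wording, and the `open_defrost` operation name are illustrative assumptions:

```python
# Hypothetical follow-up to the association step: once a previously failed
# command has been linked to a working operation, build a one-time prompt
# telling the user that the command now works.

def build_prompt(second_command, operation_name):
    """Compose the prompt message described in the text."""
    return (f'You can say "{second_command}" to trigger '
            f'{operation_name.replace("_", " ")}.')

prompt = build_prompt("blow the glass window", "open_defrost")
```

In practice such prompts would be shown only to the specific users whose commands previously failed, as the second example below notes.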
Based on the same inventive concept as the foregoing embodiments, this embodiment describes in detail, through different examples, the technical solutions of the voice processing methods provided above.
Example one
The purpose of the speech processing method provided by this example is to derive an overall index representing the maturity of speech recognition by collecting validity statistics on the user's voice operations, and to establish a data model of the voice function so as to monitor and improve user satisfaction.
The implementation principle of the speech processing method provided by the present example is as follows:
First, when speech recognition is started, the digital speech signal is captured, recorded, and stored as a binary audio file;
next, the recognition result is recorded and analyzed. If no operation function is triggered, the user's speech was not recognized correctly; whether a voice-triggered operation was correct can also be judged by examining the user's subsequent manual operations. For example, suppose the user inputs a navigation destination by voice: if the user starts navigation directly after recognition, the operation executed based on the voice command was correct; if, on the contrary, the user keeps re-entering the destination by voice, or even switches to text input after several voice attempts, the operation was incorrect and the recognition quality is poor. Similarly, suppose the user controls song playback by voice: if the corresponding song plays normally after one voice input and the user takes no further action, the song identified in the speech was correct; but if the user keeps issuing voice instructions about song playback, the user's requirement has not been met, that is, the song identification was incorrect.
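The follow-up-behavior heuristic above can be sketched as a small classifier. The event names used here are illustrative assumptions; the patent only describes the behaviors (starting navigation, retrying by voice, falling back to text) in prose:

```python
def command_satisfied(events):
    """Judge whether a voice command met the user's requirement by
    inspecting the user's actions immediately afterwards.

    `events` is the ordered list of actions following one voice command,
    e.g. ["start_navigation"] or ["voice_retry", "text_input"].
    """
    if not events:
        # No follow-up at all: the triggered operation was accepted.
        return True
    first = events[0]
    # Direct confirmation (e.g. starting navigation) means success.
    if first in ("start_navigation", "confirm"):
        return True
    # Repeated voice attempts or a fallback to text input mean failure.
    if first in ("voice_retry", "text_input"):
        return False
    return True
```

For example, `command_satisfied(["start_navigation"])` reports success, while `command_satisfied(["voice_retry", "text_input"])` reports failure.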
Finally, a score is assigned according to the number of voice operations needed: for example, if the operation is completed with a single voice input, the score is 100; if the second voice input is recognized successfully, the score is 90; and so on. The success rate of the user's voice command operations can also be counted, tallying how many interactions succeed on the first attempt, on the second, and so on up to the tenth.
Example two
The purpose of the speech processing method provided by this example is to analyze the timing of voice commands to detect whether the user is still using the voice function, and to improve the cloud model for commands that were not recognized or not executed, remotely upgrading them into the command library and prompting users to use them. The improvements differ for the speech input by different users, and so do the prompts.
The implementation principle of the speech processing method provided by the present example is as follows:
First, the times and frequency at which the user uses speech recognition are counted, and a time-ordered list of the user's voice commands is built;
then, the time-ordered voice commands are analyzed to obtain the maximum interval, the most recent input time, and the preceding one;
next, the user's recent voice input frequency is examined: if commands follow each other within a few seconds, the user is repeatedly re-entering speech, which indicates low recognition accuracy for that user's commands, and the user is judged to be unsatisfied;
then, the user's original speech files are analyzed statistically, and the speech that was not correctly recognized is examined so that improvements can be made;
finally, these users are prompted individually that previously unrecognized speech can now be recognized, and these commands are added as quick voice commands for them.
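The interval analysis in the steps above can be sketched as grouping a time-ordered list of command timestamps into "retry bursts". The 5-second threshold is an assumption; the text only says "within a few seconds":

```python
# Group a time-ordered list of command timestamps (seconds) into bursts:
# consecutive inputs closer than `threshold` seconds are treated as the
# user re-trying a command that was not recognized.

def retry_bursts(timestamps, threshold=5.0):
    bursts, current = [], [timestamps[0]]
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev <= threshold:
            current.append(cur)
        else:
            bursts.append(current)
            current = [cur]
    bursts.append(current)
    # Bursts of more than one command indicate repeated, unsatisfied input.
    return [b for b in bursts if len(b) > 1]
```

For instance, `retry_bursts([0.0, 2.0, 3.5, 60.0])` returns `[[0.0, 2.0, 3.5]]`: three inputs within seconds of each other form one retry burst, while the command a minute later stands alone.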
In this manner, the maturity of the user's voice function can be monitored, allowing the satisfaction with, and the intent behind, the user's voice commands to be analyzed and improved. This improves the user experience, makes it easy to collect statistics on the user's main voice-use intentions, and improves the convenience and accuracy of the function; it also provides a basis for diagnosing the causes of failed user commands, facilitating debugging and improvement.
Based on the same inventive concept as the foregoing embodiments, an embodiment of the present invention provides a speech processing apparatus, as shown in Fig. 2, including a processor 110 and a memory 111 for storing computer programs capable of running on the processor 110. The single processor 110 illustrated in Fig. 2 does not imply that the number of processors is one; the figure merely indicates the processor's positional relationship to the other components, and in practice there may be one or more processors 110. Likewise, the memory 111 illustrated in Fig. 2 only indicates its positional relationship to the other components, and in practice there may be one or more memories 111. When the processor 110 runs the computer program, the steps of the voice processing method are implemented.
The voice processing apparatus may further include at least one network interface 112. The various components of the apparatus are coupled together by a bus system 113, which enables communication among them. Besides the data bus, the bus system 113 includes a power bus, a control bus, and a status signal bus; for clarity of illustration, however, the various buses are all labeled as bus system 113 in Fig. 2.
The memory 111 may be a volatile memory, a nonvolatile memory, or both. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferroelectric Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface memory may be a disk or tape memory. The volatile memory may be a Random Access Memory (RAM), which serves as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memory 111 described in the embodiments of the invention is intended to comprise, without being limited to, these and any other suitable types of memory.
The memory 111 in the embodiment of the present invention is used to store various types of data to support the operation of the voice processing apparatus. Examples of such data include: any computer program for operating on the speech processing apparatus, such as an operating system and application programs; contact data; telephone book data; a message; a picture; video, etc. The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application programs may include various application programs such as a Media Player (Media Player), a Browser (Browser), etc. for implementing various application services. Here, a program that implements the method of the embodiment of the present invention may be included in the application program.
Based on the same inventive concept as the foregoing embodiments, this embodiment further provides a computer storage medium in which a computer program is stored. The computer storage medium may be a memory such as a Ferroelectric Random Access Memory (FRAM), Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Flash Memory, magnetic surface memory, optical disc, or Compact Disc Read-Only Memory (CD-ROM); or it may be a device including one or any combination of the above memories, such as a mobile phone, computer, tablet device, or personal digital assistant. When the computer program is run by a processor, the steps of the voice processing method are implemented.
For the sake of brevity, not all possible combinations of the technical features of the above embodiments are described; nevertheless, any combination of these features that involves no contradiction should be considered within the scope of the present disclosure.
As used herein, the terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion, encompassing not only the elements listed but also other elements not expressly listed.
The above description covers only specific embodiments of the present invention, but the scope of the present invention is not limited thereto. Any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed by the present invention, and all such changes or substitutions shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the appended claims.

Claims (10)

1. A method of speech processing, the method comprising the steps of:
acquiring at least one second voice operation command for a target operation object, input by a user before the user input a first voice operation command for the target operation object; wherein the operation executed based on the first voice operation command meets the user's requirement, and the operation executed based on the second voice operation command does not meet the user's requirement;
and performing a speech recognition satisfaction analysis according to the number of second voice operation commands, to obtain an analysis result characterizing the user's speech recognition satisfaction.
2. The method of claim 1, wherein acquiring the at least one second voice operation command for the target operation object, input before the first voice operation command for the target operation object, comprises the following steps:
acquiring a historical voice operation command set composed of historical voice operation commands whose input times are adjacent and whose time intervals meet a preset condition;
and determining, as the first voice operation command, the historical voice operation command with the latest input time in the set, and determining, as the second voice operation commands, the historical voice operation commands in the set other than the one with the latest input time.
3. The method of claim 2, wherein the preset condition comprises the time interval being less than the minimum, or the average, time interval at which the user uses the speech recognition function.
4. The method according to claim 2 or 3, wherein acquiring the historical voice operation command set composed of historical voice operation commands whose input times are adjacent and whose time intervals meet the preset condition comprises the following steps:
sorting the historical voice operation commands input by the user within a preset time period in order of input time from earliest to latest;
determining a target historical voice operation command from the sorted commands, wherein the time interval between its input time and that of the preceding historical voice operation command does not meet the preset condition, while the time interval between its input time and that of the following historical voice operation command does;
and, starting from the target historical voice operation command, sequentially selecting from the sorted commands the historical voice operation commands whose input times are adjacent and whose time intervals meet the preset condition, and adding them to the historical voice operation command set.
5. The method according to claim 1, further comprising, after performing the speech recognition satisfaction analysis according to the number of second voice operation commands and obtaining the analysis result characterizing the user's speech recognition satisfaction, the following step:
training a preset speech recognition model by taking the second voice operation command as the input of the model and the operation executed based on the first voice operation command as the output of the model.
6. The method according to claim 1, further comprising, after performing the speech recognition satisfaction analysis according to the number of second voice operation commands and obtaining the analysis result characterizing the user's speech recognition satisfaction, the following steps:
performing semantic recognition on the second voice operation command to obtain at least one keyword;
and establishing an association between the at least one keyword and the operation executed based on the first voice operation command, and storing the association in a preset voice command library.
7. The method of claim 6, further comprising, after storing the association in the preset voice command library, the following step:
outputting a prompt message indicating that the operation executed based on the first voice operation command can be executed when the second voice operation command is input.
8. The method according to claim 1, wherein performing the speech recognition satisfaction analysis according to the number of second voice operation commands to obtain the analysis result characterizing the user's speech recognition satisfaction comprises the following step:
determining, according to the number of second voice operation commands and a preset correspondence between different numbers of voice operation commands and satisfaction grades, a target satisfaction grade corresponding to that number, so as to obtain the analysis result characterizing the user's speech recognition satisfaction.
9. A speech processing apparatus comprising: a memory configured to store one or more computer programs; and a processor coupled to the memory and configured to execute the one or more computer programs to cause the speech processing apparatus to perform the steps of the speech processing method according to any of claims 1 to 8.
10. A computer storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of a speech processing method according to any one of claims 1 to 8.
CN202110359571.4A 2021-04-02 2021-04-02 Voice processing method, device and computer storage medium Pending CN115171672A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110359571.4A CN115171672A (en) 2021-04-02 2021-04-02 Voice processing method, device and computer storage medium


Publications (1)

Publication Number Publication Date
CN115171672A true CN115171672A (en) 2022-10-11



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination