CN111081236B - Voice processing method, terminal and computer storage medium - Google Patents

Voice processing method, terminal and computer storage medium Download PDF

Info

Publication number
CN111081236B
CN111081236B (application CN201811228875.1A)
Authority
CN
China
Prior art keywords
preset
semantic
information
matched
scope
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811228875.1A
Other languages
Chinese (zh)
Other versions
CN111081236A (en)
Inventor
Zhang Xiaokang (张小康)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Priority to CN201811228875.1A priority Critical patent/CN111081236B/en
Publication of CN111081236A publication Critical patent/CN111081236A/en
Application granted granted Critical
Publication of CN111081236B publication Critical patent/CN111081236B/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification techniques
    • G10L 17/22 Interactive procedures; Man-machine interfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiment of the invention discloses a voice processing method applied to a terminal, comprising the following steps: acquiring first voice information to be recognized, and performing semantic recognition on it to obtain a first semantic; when the first semantic matches a first preset scope, outputting simplified prompt information, the duration corresponding to the simplified prompt information being shorter than a specific duration, where the information in the first preset scope characterizes the operations the terminal can execute. The embodiment of the invention also discloses a terminal and a computer-readable storage medium.

Description

Voice processing method, terminal and computer storage medium
Technical Field
The present invention relates to speech information recognition technology in the field of communications, and in particular, to a speech processing method, a terminal, and a computer storage medium.
Background
In the existing voice human-machine interaction flow, the machine takes the user's question as a starting point, computes the most plausible answer, and gives the user corresponding spoken feedback based on that answer. In principle, the answer is a fixed result obtained by a dynamic traversal lookup: whenever a user puts a question to the machine, the machine matches it to the answer it deems most reasonable.
However, once the human-machine interaction reaches a certain proficiency, the machine's spoken feedback becomes superfluous. For example, when the user says "I want to go somewhere", the terminal recognizes the request and then plays voice prompt information asking how the user wants to travel: by car, by bus, and so on. The next time the user says "I want to go somewhere", the terminal's prompt is still the same fixed, time-consuming voice information; the user perceives this as wasted time, and the working efficiency of the terminal is correspondingly low.
Disclosure of Invention
To solve the above technical problem, embodiments of the present invention provide a voice processing method, a terminal, and a computer storage medium, which solve the time-consuming operation flow of the related human-machine interaction art and improve the working efficiency of the terminal.
The technical scheme of the invention is realized as follows:
a voice processing method applied to a terminal, the method comprising:
acquiring first voice information to be recognized, and performing semantic recognition on the first voice information to be recognized to obtain first semantics;
When the first semantic is matched with a first preset scope, outputting simplified prompt information; the time length corresponding to the simplified prompt information is smaller than a specific time length; the information in the first preset scope characterizes the operations executable by the terminal.
A terminal, the terminal comprising: a processor, a memory, and a communication bus;
the communication bus is used for realizing communication connection between the processor and the memory;
The processor is used for executing a program of the operation for the voice information in the memory to realize the following steps:
acquiring first voice information to be recognized, and performing semantic recognition on the first voice information to be recognized to obtain first semantics;
When the first semantic is matched with a first preset scope, outputting simplified prompt information; the time length corresponding to the simplified prompt information is smaller than a specific time length; the information in the first preset scope characterizes the operations executable by the terminal.
A computer readable storage medium storing one or more programs executable by one or more processors to implement the steps of the speech processing method described above.
According to the voice processing method, terminal, and computer storage medium provided by the embodiments of the invention, first voice information to be recognized is acquired and semantically recognized to obtain a first semantic; when the first semantic matches a first preset scope, simplified prompt information is output whose duration is shorter than a specific duration, the information in the first preset scope characterizing the operations the terminal can execute. Thus, after recognizing the user's voice information, the terminal can give a concise prompt that costs the user little time according to the actual semantics of that information, instead of the longer, more time-consuming voice prompt of the related human-machine interaction art. This solves the time-consuming operation flow of the related art and improves the working efficiency of the terminal.
Drawings
FIG. 1 is a schematic flowchart of a voice processing method according to an embodiment of the present invention;
FIG. 2 is a flowchart of another voice processing method according to an embodiment of the present invention;
FIG. 3 is a flowchart of another voice processing method according to an embodiment of the present invention;
FIG. 4 is a flowchart of a voice processing method according to another embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
An embodiment of the present invention provides a voice processing method, referring to fig. 1, including the steps of:
Step 101, obtaining first voice information to be recognized, and performing semantic recognition on the first voice information to be recognized to obtain first semantics.
It should be noted that step 101, acquiring the first voice information to be recognized and performing semantic recognition on it to obtain the first semantic, may be carried out by the terminal. The terminal may be any device with a human-machine interaction function; in one possible implementation, it may be a mobile terminal capable of human-machine interaction with the user. The first voice information to be recognized may be uttered by the user to the terminal when the user needs to interact with it, and the first semantic may be obtained after the terminal applies a semantic recognition technique to that voice information.
Step 102, outputting the simplified prompt information when the first semantic is matched with the first preset scope.
The duration corresponding to the simplified prompt information is shorter than a specific duration, and the information in the first preset scope characterizes the operations the terminal can execute.
It should be noted that step 102, outputting the simplified prompt information when the first semantic matches the first preset scope, may be implemented by the terminal. The first preset scope may be generated in advance and stored in the terminal; in one possible implementation, it is generated from the various operations the terminal performed during the user's historical use. The first preset scope may include words that characterize the operations the terminal can perform.
The simplified prompt information is not the longer, more time-consuming voice message of the initial setting; it may be any form of prompt that takes less time. For example, it may be a condensed version of the initially set voice message, that is, a voice message whose content is more concise than the initial one; it may also be a directly set, concise voice message unrelated to the initial one. In one possible implementation, the simplified prompt information may be a short cue such as a beep, a vibration, a "ding", a tick, or a click. The specific duration may be preset, according to the user's experience, as a time the user finds acceptable and comfortable; in any case, it must be shorter than the duration of the voice prompt output in the related art.
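The trade-off between the two prompt forms can be sketched in a few lines. This is a minimal illustration under assumed names (`choose_prompt`, `SPECIFIC_DURATION_S`, the crude duration model), not the patented implementation:

```python
# Hypothetical sketch: emit a short cue instead of the full spoken prompt
# when the recognized semantics match the preset scope. All names and the
# duration model below are illustrative assumptions.

FULL_PROMPT = "How would you like to go: by car, by bus, on foot, or by subway?"
SIMPLIFIED_PROMPT = "<beep>"      # any short cue: beep, vibration, "ding"
SPECIFIC_DURATION_S = 1.0         # assumed user-acceptable threshold

def estimate_duration_s(prompt: str) -> float:
    """Crude estimate: non-speech cues ~0.3 s, speech ~0.06 s per character."""
    return 0.3 if prompt.startswith("<") else 0.06 * len(prompt)

def choose_prompt(semantic_matches_scope: bool) -> str:
    """Return the simplified prompt only when the semantics match the scope."""
    if semantic_matches_scope:
        prompt = SIMPLIFIED_PROMPT
        # the simplified prompt must stay under the specific duration
        assert estimate_duration_s(prompt) < SPECIFIC_DURATION_S
        return prompt
    return FULL_PROMPT
```

The point of the check inside `choose_prompt` is exactly the constraint stated above: whatever form the simplified prompt takes, its duration must fall below the specific duration.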
According to the voice processing method provided by this embodiment of the invention, first voice information to be recognized is acquired and semantically recognized to obtain a first semantic; when the first semantic matches a first preset scope, simplified prompt information is output whose duration is shorter than a specific duration, the information in the first preset scope characterizing the operations the terminal can execute. Thus, after recognizing the user's voice information, the terminal can give a concise prompt that costs the user little time according to the actual semantics of that information, instead of the longer voice prompt of the related human-machine interaction art. This solves the time-consuming operation flow of the related art and improves the working efficiency of the terminal.
Based on the foregoing embodiments, an embodiment of the present invention provides a voice processing method, which is applied to a terminal, and is shown with reference to fig. 2, and the method includes the following steps:
Step 201, the terminal obtains first voice information to be identified, and performs semantic identification on the first voice information to be identified to obtain first semantics.
Step 202, the terminal detects whether the proficiency of the first semantic meaning meets a preset proficiency.
The proficiency of the first semantic refers to the success rate with which the terminal can execute a complete operation for the first semantic, that is, the probability that the first semantic matches the first preset scope; if that probability satisfies a preset probability, the proficiency of the first semantic satisfies the preset proficiency.
It should be noted that step 202, detecting whether the proficiency of the first semantic satisfies the preset proficiency, may be implemented as follows:
a1, acquiring history identification information of target semantics matched with first semantics;
The target semantic is a semantic, recognized by the terminal during historical use, whose content is the same as or similar to that of the first semantic. The historical identification information is the information the terminal recorded when recognizing that target semantic during historical use.
A2, determining whether the proficiency of the first semantic meets the preset proficiency or not based on the historical identification information of the target semantic.
The preset proficiency may be that the probability characterizing whether the first semantic falls within the first preset scope satisfies a preset probability.
Step 203, if the proficiency of the first semantic meets the preset proficiency, the terminal detects whether the first semantic is matched with the first preset scope.
Detecting whether the first semantic matches the first preset scope may be implemented by detecting whether the first semantic matches the information included in the first preset scope. The first preset scope contains a set of operations characterizing how the terminal processes user-initiated voice requests.
Step 204, when the first semantic is matched with the first preset scope, the terminal outputs the simplified prompt information.
The time length corresponding to the simplified prompt information is smaller than the specific time length; the information in the first preset scope characterizes operations executable by the terminal.
Step 205, if the first semantic is not matched with the first preset scope, the terminal obtains a second preset scope from the server.
If the first semantic does not match the information included in the first preset scope, the first semantic does not match the first preset scope. The second preset scope is different from the first preset scope and may be stored in the server; it may be generated from historical operating information held in the server.
Step 206, if the first semantic is matched with the second preset scope, the terminal outputs the simplified prompt message.
Wherein step 207 may be performed after both step 204 and step 206;
Step 207, the terminal obtains the second voice information to be recognized for the simplified prompt information, and performs a preset operation based on the second voice information to be recognized.
The second voice information to be recognized may be voice information, different from the first voice information to be recognized, that the user utters in response to the simplified prompt information after the terminal outputs it.
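Steps 201 to 207 above can be summarized as a single control flow. The sketch below is an assumption for illustration: recognition is stubbed as lowercasing, scope matching as word containment, and the return values stand in for the terminal's prompts and operations.

```python
def recognize(voice: str) -> str:
    """Stub semantic recognition (step 201): the 'semantics' here are just
    the lowercased utterance."""
    return voice.lower()

def matches(semantic: str, scope: set) -> bool:
    """Stub scope match: any scope word occurring in the semantics counts."""
    return any(word in semantic for word in scope)

def process_voice(voice: str, first_scope: set, second_scope: set,
                  proficient: bool = True) -> str:
    """Control-flow sketch of steps 201-207: gate on proficiency, try the
    first preset scope, fall back to the server's second preset scope,
    otherwise keep the full related-art voice prompt."""
    semantic = recognize(voice)                # step 201
    if not proficient:                         # step 202
        return "full voice prompt"
    if matches(semantic, first_scope):         # steps 203-204
        return "beep"                          # simplified prompt
    if matches(semantic, second_scope):        # steps 205-206
        return "beep"
    return "full voice prompt"
```

After either simplified prompt, the terminal would go on to acquire the follow-up utterance and execute the preset operation (step 207), which is omitted from this sketch.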
It should be noted that, in the embodiments of the present invention, the explanation of the same steps or concepts as those in other embodiments may refer to the descriptions in other embodiments, which are not repeated here.
According to the voice processing method provided by this embodiment of the invention, first voice information to be recognized is acquired and semantically recognized to obtain a first semantic; when the first semantic matches a first preset scope, simplified prompt information is output whose duration is shorter than a specific duration, the information in the first preset scope characterizing the operations the terminal can execute. Thus, after recognizing the user's voice information, the terminal can give a concise prompt that costs the user little time according to the actual semantics of that information, instead of the longer voice prompt of the related human-machine interaction art. This solves the time-consuming operation flow of the related art and improves the working efficiency of the terminal.
Based on the foregoing embodiments, an embodiment of the present invention provides a voice processing method, which is applied to a terminal, and is shown with reference to fig. 3, and the method includes the following steps:
Step 301, the terminal obtains first voice information to be identified, and performs semantic identification on the first voice information to be identified to obtain first semantics.
Step 302, the terminal acquires historical identification information of target semantics matched with the first semantics.
Step 303, the terminal determines the probability of matching the target semantic with the first preset scope in the history operation process based on the history identification information of the target semantic.
If a voice request the user initiates while interacting with the terminal falls within the first preset scope, the user has completed one human-machine interaction at the required proficiency, and the corresponding proficiency counter records a success. Conversely, if the user initiates a voice request outside the first preset scope during subsequent interaction, the user is considered not to have completed a human-machine interaction flow meeting the proficiency, and the counter records a failure. When the ratio between the number of successes and the number of failures in the proficiency counter reaches a preset threshold, the user is considered to have reached proficiency in voice interaction within the first preset scope; the longer related-art voice prompt for the first preset scope is then cancelled, and the simplified prompt information of this embodiment is output instead.
Step 303, determining, based on the historical identification information of the target semantic, the probability that the target semantic matched the first preset scope during historical operation, may be implemented as follows:
b1, the terminal determines the first times of matching the target semantics with a first preset scope in the history operation process based on the history identification information of the target semantics.
The first number may be the number of times, counted by the proficiency statistics unit, that the terminal recognized that the semantics of the voice information uttered by the user matched the information included in the first preset scope.
And b2, determining a second time of the target semantic not being matched with the first preset scope in the history operation process by the terminal based on the history identification information of the target semantic.
The second number may be the number of times, counted by the proficiency statistics unit, that the terminal recognized that the semantics of the voice information uttered by the user did not match the information included in the first preset scope.
And b3, the terminal determines the probability of matching the target semantics with a first preset scope in the history operation process based on the first times and the second times.
If the ratio of the first number to the second number is greater than a preset value, the ratio of successes to failures in the proficiency counter is considered to have reached the preset threshold; the terminal may then determine that the probability of the target semantic matching the first preset scope during historical operation satisfies the preset probability.
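Steps b1 to b3 amount to a per-semantic success/failure counter. The class below is a hypothetical sketch; the default threshold value and the handling of the zero-failure case are assumptions, not values from the patent.

```python
class ProficiencyCounter:
    """Tracks, for one target semantic, how often the user's request fell
    inside the first preset scope (success) versus outside it (failure)."""

    def __init__(self, threshold: float = 3.0):
        self.successes = 0          # "first number of times" (matched)
        self.failures = 0           # "second number of times" (not matched)
        self.threshold = threshold  # preset value for the success/failure ratio

    def record(self, matched: bool) -> None:
        """Steps b1-b2: count each interaction as a success or a failure."""
        if matched:
            self.successes += 1
        else:
            self.failures += 1

    def is_proficient(self) -> bool:
        """Step b3: proficient when successes/failures exceeds the preset
        value; with no failures yet, any success is taken as enough."""
        if self.failures == 0:
            return self.successes > 0
        return self.successes / self.failures > self.threshold
```

Once `is_proficient()` returns true for a target semantic, the terminal would switch from the full voice prompt to the simplified prompt for requests matching that semantic.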
Step 304, if the probability that the target semantic matches the first preset scope in the history operation process meets the preset probability, the terminal determines that the proficiency of the first semantic meets the preset proficiency.
Step 305, if the proficiency of the first semantic meets the preset proficiency, the terminal analyzes the first semantic and acquires the keywords.
Wherein the keyword may be a word capable of characterizing the operation referred to by the first semantic.
Step 306, the terminal determines a target scope matched with the keyword from the first preset scope.
The first preset scope may be a set, that is, it may include a plurality of scopes, each with its own identifier; the identifier may be, for example, the category or the name of the scope. Using the keyword, the terminal finds within the first preset scope the scope whose name (i.e. its category) matches the keyword, and determines that scope as the target scope.
Step 307, the terminal detects whether the keywords match the words in the target scope.
Detecting whether the keyword matches the words in the target scope may be done by detecting whether the target scope contains a word that is identical to, or has the same meaning as, the keyword; if such a word exists, the keyword may be considered to match the words in the target scope.
Step 308, if the keyword matches the word in the target scope, the terminal determines that the first semantic matches the first preset scope.
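Steps 305 to 308 can be sketched as follows. The scope names, their word sets, and the whitespace-split "keyword extraction" are illustrative assumptions; a real terminal would use a proper semantic parser and synonym matching.

```python
# First preset scope as a set of named scopes; each key is a scope's
# identifier (its category), each value the words it contains (assumed data).
FIRST_PRESET_SCOPE = {
    "navigation": {"go", "route", "navigate"},
    "music":      {"play", "song", "pause"},
}

def matches_first_scope(semantic: str) -> bool:
    """Steps 305-308: any word of the semantics that names a scope selects
    it as the target scope (step 306); another word matching that scope's
    own words (step 307) makes the whole semantics match (step 308)."""
    words = semantic.lower().split()           # step 305: crude keywords
    for w in words:
        target = FIRST_PRESET_SCOPE.get(w, set())
        if target and any(v in target for v in words):
            return True
    return False
```

For example, "navigation go downtown" selects the hypothetical "navigation" scope via its name and matches it via "go", whereas "music stop" selects the "music" scope but fails the word match, so the full prompt would be kept.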
Step 309, if the first semantic matches the first preset scope, the terminal outputs a simplified prompt message.
The time length corresponding to the simplified prompt information is smaller than the specific time length; the information in the first preset scope characterizes operations executable by the terminal.
Step 310, if the first semantic is not matched with the first preset scope, the terminal acquires a second preset scope from the server.
Step 311, if the first semantic is matched with the second preset scope, the terminal outputs the simplified prompt message.
Wherein step 312 may be performed after both step 309 and step 311;
Step 312, the terminal obtains the second voice information to be recognized for the simplified prompt information, and performs semantic recognition on the second voice information to be recognized to obtain second semantics.
The implementation process of performing semantic recognition on the second voice information to be recognized to obtain the second semantic is the same as the implementation process of performing semantic recognition on the first voice information to be recognized to obtain the first semantic.
Step 313, if the second semantic is matched with the first preset scope, the terminal outputs the simplified prompt message.
Step 314, the terminal acquires third voice information to be recognized in response to the simplified prompt information, and so on, until all the voice information to be recognized has been acquired.
It should be noted that if the terminal detects that the third semantic, corresponding to the third voice information to be recognized, matches the first preset scope, it outputs the simplified prompt information again; the terminal then keeps acquiring the voice information the user utters in response to each simplified prompt until the user has sent all the voice information to be recognized that needs to reach the terminal.
Step 315, the terminal executes a preset operation based on the finally obtained voice information to be recognized.
The finally acquired voice information to be recognized is the last of all the voice information to be recognized that the user sends to the terminal.
It should be noted that when the second semantic is not within the first preset scope, and likewise when the third semantic is not, the terminal performs the same operation as described above for a first semantic that does not match the first preset scope.
In other embodiments of the invention, the method may further comprise the steps of:
and acquiring information in a second preset scope.
Updating the first preset scope based on the information of the second preset scope.
The terminal may add to the first preset scope the words of the second preset scope that the first preset scope lacks, thereby updating the first preset scope.
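Under the assumption that a scope is simply a set of words, this update reduces to a set union; the function below is a sketch, not the patent's actual data structure.

```python
def update_first_scope(first_scope: set, second_scope: set) -> set:
    """Merge into the first preset scope the words of the second preset
    scope that the first one lacks (the update described above)."""
    missing = second_scope - first_scope   # words not yet in the first scope
    return first_scope | missing
```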
In other embodiments of the present invention, the method may further comprise the following steps, prior to step 301:
a history operation performed on the history voice information to be recognized is acquired.
And generating a first preset scope based on the historical voice information to be recognized and the historical operation.
The historical operation performed on historical voice information to be recognized refers to the voice prompt information returned to the user, and/or the operation performed, after the terminal recognized the semantics of the voice information the user historically sent to it.
The following is a very simple example:
Referring to FIG. 4: the user sends instruction 1, "I want to go to the square", to the terminal. After receiving it, the terminal performs voice recognition on instruction 1 and, once recognition succeeds, judges whether the semantics match the name of a scope in the first preset scope. If the match succeeds and the proficiency has reached the preset proficiency, the terminal outputs a simple simplified prompt, such as a beep. It then receives instruction 2, "by subway", sent by the user in response to the simplified prompt, continues recognition, outputs the simplified prompt again (for example, another beep) if the conditions are met, and then presents the planned route to the user. In the related art, by contrast, when the user says "I want to go to the square", the terminal plays voice prompt information such as "How would you like to go: by car, by bus, on foot, or by subway?" (a long broadcast); only after the user answers "by subway" does the terminal give the planned route. Clearly, the voice prompt given in the related art is time-consuming and provides a poorer user experience.
It should be noted that, in the embodiments of the present invention, the explanation of the same steps or concepts as those in other embodiments may refer to the descriptions in other embodiments, which are not repeated here.
According to the voice processing method provided by this embodiment of the invention, first voice information to be recognized is acquired and semantically recognized to obtain a first semantic; when the first semantic matches a first preset scope, simplified prompt information is output whose duration is shorter than a specific duration, the information in the first preset scope characterizing the operations the terminal can execute. Thus, after recognizing the user's voice information, the terminal can give a concise prompt that costs the user little time according to the actual semantics of that information, instead of the longer voice prompt of the related human-machine interaction art. This solves the time-consuming operation flow of the related art and improves the working efficiency of the terminal.
Based on the foregoing embodiments, an embodiment of the present invention provides a terminal, which may be applied to the voice processing methods provided in the embodiments corresponding to FIGS. 1 to 3. Referring to FIG. 5, the terminal may include: a processor 41, a memory 42, and a communication bus 43;
A communication bus 43 for enabling a communication connection between the processor 41 and the memory 42;
The processor 41 is configured to execute a program for operation of voice information in the memory 42 to realize the steps of:
Acquiring first voice information to be identified, and carrying out semantic identification on the first voice information to be identified to obtain first semantics;
When the first semantic is matched with a first preset scope, outputting simplified prompt information;
the time length corresponding to the simplified prompt information is smaller than the specific time length; the information in the first preset scope characterizes operations executable by the terminal.
In other embodiments of the present invention, the processor 41 is configured to execute a speech processing program in the memory 42 to implement the following steps:
And acquiring second voice information to be recognized aiming at the simplified prompt information, and executing preset operation based on the second voice information to be recognized.
In other embodiments of the present invention, the processor 41 is configured to execute the program in the memory 42 for outputting the simplified prompt information when the first semantic matches the first preset scope, so as to implement the following steps:
detecting whether the proficiency of the first semantic meaning meets a preset proficiency;
If the proficiency of the first semantic meets the preset proficiency, detecting whether the first semantic is matched with a first preset scope;
And outputting the simplified prompt information when the first semantic is matched with the first preset scope.
In other embodiments of the present invention, the processor 41 is configured to execute the program in the memory 42 for detecting whether the proficiency of the first semantic satisfies a preset proficiency, so as to implement the following steps:
acquiring historical identification information of target semantics matched with the first semantics;
Based on the historical identification information of the target semantics, whether the proficiency of the first semantics meets the preset proficiency is determined.
In other embodiments of the present invention, the processor 41 is configured to execute the program in the memory 42 for determining, based on the historical identification information of the target semantic, whether the proficiency of the first semantic satisfies the preset proficiency, so as to implement the following steps:
Determining the probability of matching the target semantics with a first preset scope in the history operation process based on the history identification information of the target semantics;
If the probability that the target semantic is matched with the first preset scope in the history operation process meets the preset probability, determining that the proficiency of the first semantic meets the preset proficiency.
In other embodiments of the present invention, the processor 41 is configured to execute the program in the memory 42 for determining, based on the historical identification information of the target semantics, the probability that the target semantics match the first preset scope in the historical operation process, so as to implement the following steps:
determining a first number of times of matching the target semantics with a first preset scope in a history operation process based on history identification information of the target semantics;
Determining a second number of times that the target semantics are not matched with the first preset scope in the history operation process based on the history identification information of the target semantics;
And determining the probability that the target semantics are matched with the first preset scope in the history operation process based on the first times and the second times.
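The first-times/second-times computation reduces to a matched-versus-unmatched ratio. A minimal sketch, assuming the historical identification information boils down to one boolean per past operation and that the preset probability is a threshold such as 0.8 (the patent specifies no concrete value):

```python
def match_probability(history_matches):
    """history_matches: True for each historical operation in which the
    target semantic matched the first preset scope, False otherwise."""
    first_times = sum(1 for m in history_matches if m)       # matched count
    second_times = sum(1 for m in history_matches if not m)  # unmatched count
    total = first_times + second_times
    return first_times / total if total else 0.0

def proficiency_met(history_matches, preset_probability=0.8):
    # Proficiency is met when the historical match probability reaches the
    # preset probability; the 0.8 default is an illustrative assumption.
    return match_probability(history_matches) >= preset_probability
```

For example, four matches out of five past operations gives a probability of 0.8, which meets the assumed threshold; one match out of two does not.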
In other embodiments of the present invention, the processor 41 is configured to execute the program in the memory 42 for detecting whether the first semantic matches the first preset scope, so as to implement the following steps:
Analyzing the first semantic meaning to obtain keywords;
determining a target scope matched with the keywords from a first preset scope;
Detecting whether the keywords are matched with the words in the target scope;
and if the keywords are matched with the words in the target scope, determining that the first semantic is matched with the first preset scope.
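The keyword-to-scope matching can be illustrated with the first preset scope modeled as a mapping from a scope name to the words of the operations the terminal can execute; this data layout is an assumption for illustration, as the patent does not prescribe a representation:

```python
def semantic_matches_scope(keywords, first_preset_scope):
    """first_preset_scope: e.g. {"music": {"play", "pause"}, "call": {"dial"}}.
    Step 1: a target scope is any scope sharing a word with the keywords.
    Step 2: the first semantic matches when some keyword equals a word in
    that target scope."""
    for scope_name, scope_words in first_preset_scope.items():
        if any(keyword in scope_words for keyword in keywords):
            return True
    return False
```

With the example scope above, the keyword "play" matches via the "music" scope, while a keyword outside every scope does not match.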
In other embodiments of the present invention, the processor 41 is configured to execute a program in the memory 42 for operating on voice information, so as to implement the following steps:
if the first semantic is not matched with the first preset scope, acquiring a second preset scope from the server;
if the first semantic matches the second preset scope, outputting the simplified prompt information;
and acquiring second voice information to be recognized for the simplified prompt information, and executing a preset operation based on the second voice information to be recognized.
In other embodiments of the present invention, the processor 41 is configured to execute the program in the memory 42 for acquiring the second voice information to be recognized for the simplified prompt information and executing the preset operation based on the second voice information to be recognized, so as to implement the following steps:
acquiring second voice information to be recognized for the simplified prompt information, and performing semantic recognition on the second voice information to be recognized to obtain a second semantic;
if the second semantic matches the first preset scope, outputting the simplified prompt information;
acquiring third voice information to be recognized for the simplified prompt information, and so on, until final voice information to be recognized is acquired;
and executing the preset operation based on the finally acquired voice information to be recognized.
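The second/third/final voice sequence amounts to a re-prompting loop: while each newly recognized semantic still matches the first preset scope, the terminal outputs the simplified prompt again and acquires the next voice information, then executes the preset operation on the last voice acquired. A sketch under those assumptions; all callback names are illustrative, and the `max_turns` bound is an assumption since the patent states no limit:

```python
def prompt_loop(acquire_voice, recognize, matches_scope, first_preset_scope,
                output_simplified, execute_preset_operation, max_turns=5):
    """Re-prompt with the simplified prompt while the semantic of each new
    voice information matches the first preset scope, then execute the
    preset operation based on the finally acquired voice information."""
    voice = acquire_voice()  # second voice information to be recognized
    for _ in range(max_turns):
        if not matches_scope(recognize(voice), first_preset_scope):
            break
        output_simplified()
        voice = acquire_voice()  # third, fourth, ... voice information
    execute_preset_operation(voice)
```

With a scope containing the first two utterances, the loop prompts twice and executes the operation on the third utterance.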
In other embodiments of the present invention, the processor 41 is configured to execute a program in the memory 42 for operating on voice information, so as to implement the following steps:
acquiring information in the second preset scope;
updating the first preset scope based on the information of the second preset scope.
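Updating the local first preset scope with the server-side second preset scope can be sketched as a merge, so that a semantic which previously had to fall back to the server can be matched locally next time. The scope layout (name to word set) is the same illustrative assumption as above:

```python
def update_first_preset_scope(first_preset_scope, second_preset_scope):
    """Merge the information of the second preset scope (from the server)
    into the first preset scope held on the terminal. Mutates and returns
    first_preset_scope; the dict-of-sets layout is an assumption."""
    for name, words in second_preset_scope.items():
        first_preset_scope.setdefault(name, set()).update(words)
    return first_preset_scope
```

A merge keeps everything already known locally and adds the new words and scopes the server supplied.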
In other embodiments of the present invention, the processor 41 is configured to execute the program in the memory 42 to implement the following steps before acquiring the first voice information to be recognized and performing semantic recognition on the first voice information to be recognized to obtain the first semantic:
acquiring a historical operation executed for historical voice information to be recognized;
and generating the first preset scope based on the historical voice information to be recognized and the historical operation.
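Generating the first preset scope from history pairs each past voice information with the operation the terminal executed for it. A minimal sketch; the dotted "category.word" naming of operations is an assumption introduced purely to show the grouping, not a scheme the patent defines:

```python
def build_first_preset_scope(history):
    """history: (voice_info, operation) pairs — historical voice information
    to be recognized and the operation executed for it. Operations are
    grouped by a dotted category prefix, e.g. "music.play" contributes the
    word "play" to the "music" scope (naming scheme is an assumption)."""
    scope = {}
    for _voice_info, operation in history:
        category, _, word = operation.partition(".")
        scope.setdefault(category, set()).add(word or category)
    return scope
```

Three historical operations across two categories yield a two-scope mapping that the matching routine above can consume directly.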
It should be noted that, in the embodiment of the present invention, the interaction process of the steps executed by the processor may refer to the interaction process in the voice processing method provided in the embodiments corresponding to fig. 1 to 3, which is not described herein again.
According to the terminal provided by the embodiment of the invention, first voice information to be recognized is acquired and semantically recognized to obtain a first semantic; when the first semantic matches the first preset scope, simplified prompt information is output, where the time length corresponding to the simplified prompt information is smaller than a specific time length and the information in the first preset scope characterizes the operations executable by the terminal. In this way, after recognizing the user's voice information, the terminal can give concise prompt information that consumes less of the user's time according to the actual semantic of the voice information, instead of giving the time-consuming voice prompt information of the related human-computer interaction technology. This solves the problem of time-consuming operation flows in the related technology and improves the working efficiency of the terminal.
Based on the foregoing embodiments, embodiments of the present invention provide a computer-readable storage medium storing one or more programs executable by one or more processors to implement the steps of:
acquiring first voice information to be recognized, and performing semantic recognition on the first voice information to be recognized to obtain a first semantic;
When the first semantic is matched with a first preset scope, outputting simplified prompt information;
wherein the time length corresponding to the simplified prompt information is smaller than a specific time length; the information in the first preset scope characterizes operations executable by the terminal.
In other embodiments of the invention, the one or more programs may be executed by one or more processors to implement the steps of:
acquiring second voice information to be recognized for the simplified prompt information, and executing a preset operation based on the second voice information to be recognized.
In other embodiments of the invention, the one or more programs may be executed by one or more processors to implement the steps of:
detecting whether the proficiency of the first semantic meaning meets a preset proficiency;
If the proficiency of the first semantic meets the preset proficiency, detecting whether the first semantic is matched with a first preset scope;
And outputting the simplified prompt information when the first semantic is matched with the first preset scope.
In other embodiments of the invention, the one or more programs may be executed by one or more processors to implement the steps of:
acquiring historical identification information of target semantics matched with the first semantics;
determining, based on the historical identification information of the target semantics, whether the proficiency of the first semantics meets the preset proficiency.
In other embodiments of the invention, the one or more programs may be executed by one or more processors to implement the steps of:
Determining the probability of matching the target semantics with a first preset scope in the history operation process based on the history identification information of the target semantics;
If the probability that the target semantic is matched with the first preset scope in the history operation process meets the preset probability, determining that the proficiency of the first semantic meets the preset proficiency.
In other embodiments of the invention, the one or more programs may be executed by one or more processors to implement the steps of:
determining a first number of times of matching the target semantics with a first preset scope in a history operation process based on history identification information of the target semantics;
Determining a second number of times that the target semantics are not matched with the first preset scope in the history operation process based on the history identification information of the target semantics;
And determining the probability that the target semantics are matched with the first preset scope in the history operation process based on the first times and the second times.
In other embodiments of the invention, the one or more programs may be executed by one or more processors to implement the steps of:
Analyzing the first semantic meaning to obtain keywords;
determining a target scope matched with the keywords from a first preset scope;
Detecting whether the keywords are matched with the words in the target scope;
and if the keywords are matched with the words in the target scope, determining that the first semantic is matched with the first preset scope.
In other embodiments of the invention, the one or more programs may be executed by one or more processors to implement the steps of:
if the first semantic is not matched with the first preset scope, acquiring a second preset scope from the server;
if the first semantic matches the second preset scope, outputting the simplified prompt information;
and acquiring second voice information to be recognized for the simplified prompt information, and executing a preset operation based on the second voice information to be recognized.
In other embodiments of the invention, the one or more programs may be executed by one or more processors to implement the steps of:
acquiring second voice information to be recognized for the simplified prompt information, and performing semantic recognition on the second voice information to be recognized to obtain a second semantic;
if the second semantic matches the first preset scope, outputting the simplified prompt information;
acquiring third voice information to be recognized for the simplified prompt information, and so on, until final voice information to be recognized is acquired;
and executing the preset operation based on the finally acquired voice information to be recognized.
In other embodiments of the invention, the one or more programs may be executed by one or more processors to implement the steps of:
acquiring information in the second preset scope;
updating the first preset scope based on the information of the second preset scope.
In other embodiments of the invention, the one or more programs may be executed by one or more processors to implement the steps of:
acquiring a historical operation executed for historical voice information to be recognized;
and generating the first preset scope based on the historical voice information to be recognized and the historical operation.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the present invention.

Claims (9)

1. A voice processing method applied to a terminal, the method comprising:
acquiring first voice information to be recognized, and performing semantic recognition on the first voice information to be recognized to obtain first semantics;
when the first semantic is matched with a first preset scope, outputting simplified prompt information; the time length corresponding to the simplified prompt information is smaller than a specific time length; the information in the first preset scope characterizes the operations executable by the terminal;
When the first semantic is matched with a first preset scope, outputting simplified prompt information, wherein the simplified prompt information comprises: detecting whether the proficiency of the first semantic meaning meets a preset proficiency; if the proficiency of the first semantic meaning meets the preset proficiency, detecting whether the first semantic meaning is matched with the first preset scope; outputting simplified prompt information when the first semantic is matched with the first preset scope;
Wherein the detecting whether the proficiency of the first semantic meaning meets a preset proficiency includes: acquiring historical identification information of target semantics matched with the first semantics; determining whether the proficiency of the first semantic meaning meets a preset proficiency or not based on the historical identification information of the target semantic meaning;
wherein the determining whether the proficiency of the first semantic meaning meets a preset proficiency based on the history identification information of the target semantic meaning includes: determining the probability of matching the target semantic with the first preset scope in the history operation process based on the history identification information of the target semantic; if the probability that the target semantic is matched with the first preset scope in the history operation process meets the preset probability, determining that the proficiency of the first semantic meets the preset proficiency;
The determining, based on the history identifying information of the target semantic, a probability that the target semantic matches the first preset scope in a history operation process includes: determining a first number of times that the target semantic is matched with the first preset scope in the history operation process based on the history identification information of the target semantic; determining a second number of times that the target semantic is not matched with the first preset scope in the history operation process based on the history identification information of the target semantic; and determining the probability that the target semantic is matched with the first preset scope in the history operation process based on the first times and the second times.
2. The method according to claim 1, wherein the method further comprises:
and acquiring second voice information to be recognized aiming at the simplified prompt information, and executing preset operation based on the second voice information to be recognized.
3. The method of claim 1, wherein the detecting whether the first semantic matches the first preset scope comprises:
Analyzing the first semantic meaning to obtain keywords;
determining a target scope matched with the keyword from the first preset scope;
detecting whether the keyword is matched with the word in the target scope;
And if the keyword is matched with the word in the target scope, determining that the first semantic is matched with the first preset scope.
4. The method according to claim 1, wherein the method further comprises:
if the first semantic is not matched with the first preset scope, acquiring a second preset scope from a server;
if the first semantic is matched with the second preset scope, outputting the simplified prompt information;
and acquiring second voice information to be recognized aiming at the simplified prompt information, and executing preset operation based on the second voice information to be recognized.
5. The method according to claim 1 or 3, wherein the acquiring second voice information to be recognized for the simplified prompt information and executing a preset operation based on the second voice information to be recognized includes:
Acquiring second voice information to be recognized aiming at the simplified prompt information, and carrying out semantic recognition on the second voice information to be recognized to obtain second semantics;
if the second semantic is matched with the first preset scope, outputting the simplified prompt information;
acquiring third voice information to be recognized for the simplified prompt information, and so on, until final voice information to be recognized is acquired;
and executing the preset operation based on the finally acquired voice information to be recognized.
6. The method according to claim 4, wherein the method further comprises:
acquiring information of the second preset scope;
updating the first preset scope based on the information of the second preset scope.
7. The method of claim 1, wherein before the acquiring the first voice information to be recognized and performing semantic recognition on the first voice information to be recognized to obtain the first semantic, the method further comprises:
acquiring a historical operation executed for historical voice information to be recognized;
and generating the first preset scope based on the historical voice information to be recognized and the historical operation.
8. A terminal, the terminal comprising: a processor, a memory, and a communication bus;
the communication bus is used for realizing communication connection between the processor and the memory;
the processor is configured to execute a program in the memory for operating on voice information, so as to implement the following steps:
acquiring first voice information to be recognized, and performing semantic recognition on the first voice information to be recognized to obtain first semantics;
when the first semantic is matched with a first preset scope, outputting simplified prompt information; the time length corresponding to the simplified prompt information is smaller than a specific time length; the information in the first preset scope characterizes the operations executable by the terminal;
When the first semantic is matched with a first preset scope, outputting simplified prompt information, wherein the simplified prompt information comprises: detecting whether the proficiency of the first semantic meaning meets a preset proficiency; if the proficiency of the first semantic meaning meets the preset proficiency, detecting whether the first semantic meaning is matched with the first preset scope; outputting simplified prompt information when the first semantic is matched with the first preset scope;
Wherein the detecting whether the proficiency of the first semantic meaning meets a preset proficiency includes: acquiring historical identification information of target semantics matched with the first semantics; determining whether the proficiency of the first semantic meaning meets a preset proficiency or not based on the historical identification information of the target semantic meaning;
wherein the determining whether the proficiency of the first semantic meaning meets a preset proficiency based on the history identification information of the target semantic meaning includes: determining the probability of matching the target semantic with the first preset scope in the history operation process based on the history identification information of the target semantic; if the probability that the target semantic is matched with the first preset scope in the history operation process meets the preset probability, determining that the proficiency of the first semantic meets the preset proficiency;
The determining, based on the history identifying information of the target semantic, a probability that the target semantic matches the first preset scope in a history operation process includes: determining a first number of times that the target semantic is matched with the first preset scope in the history operation process based on the history identification information of the target semantic; determining a second number of times that the target semantic is not matched with the first preset scope in the history operation process based on the history identification information of the target semantic; and determining the probability that the target semantic is matched with the first preset scope in the history operation process based on the first times and the second times.
9. A computer readable storage medium storing one or more programs executable by one or more processors to implement the steps of the speech processing method of any one of claims 1 to 7.
CN201811228875.1A 2018-10-22 2018-10-22 Voice processing method, terminal and computer storage medium Active CN111081236B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811228875.1A CN111081236B (en) 2018-10-22 2018-10-22 Voice processing method, terminal and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811228875.1A CN111081236B (en) 2018-10-22 2018-10-22 Voice processing method, terminal and computer storage medium

Publications (2)

Publication Number Publication Date
CN111081236A CN111081236A (en) 2020-04-28
CN111081236B true CN111081236B (en) 2024-06-21

Family

ID=70309666

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811228875.1A Active CN111081236B (en) 2018-10-22 2018-10-22 Voice processing method, terminal and computer storage medium

Country Status (1)

Country Link
CN (1) CN111081236B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006058641A (en) * 2004-08-20 2006-03-02 Nissan Motor Co Ltd Speech recognition device

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4724943B2 (en) * 2001-04-05 2011-07-13 株式会社デンソー Voice recognition device
JP2003241793A (en) * 2002-02-14 2003-08-29 Nissan Motor Co Ltd Display device, and method and program for name display
CN101158584B (en) * 2007-11-15 2011-01-26 熊猫电子集团有限公司 Voice destination navigation realizing method of vehicle mounted GPS
CN101330689A (en) * 2008-07-11 2008-12-24 北京天语君锐科技有限公司 Method and device for playing prompting sound
DE112009003645B4 (en) * 2008-12-16 2014-05-15 Mitsubishi Electric Corporation navigation device
CN104699694B (en) * 2013-12-04 2019-08-23 腾讯科技(深圳)有限公司 Prompt information acquisition methods and device
CN103929533A (en) * 2014-03-18 2014-07-16 联想(北京)有限公司 Information processing method and electronic equipment
US9564123B1 (en) * 2014-05-12 2017-02-07 Soundhound, Inc. Method and system for building an integrated user profile
CN104535074A (en) * 2014-12-05 2015-04-22 惠州Tcl移动通信有限公司 Bluetooth earphone-based voice navigation method, system and terminal
CN105138250A (en) * 2015-08-03 2015-12-09 科大讯飞股份有限公司 Human-computer interaction operation guide method, human-computer interaction operation guide system, human-computer interaction device and server
CN109215640B (en) * 2017-06-30 2021-06-01 深圳大森智能科技有限公司 Speech recognition method, intelligent terminal and computer readable storage medium


Also Published As

Publication number Publication date
CN111081236A (en) 2020-04-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant