CN111081236A - Voice processing method, terminal and computer storage medium - Google Patents

Voice processing method, terminal and computer storage medium

Info

Publication number
CN111081236A
CN111081236A (application CN201811228875.1A)
Authority
CN
China
Prior art keywords
preset
information
semantic
recognized
matched
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811228875.1A
Other languages
Chinese (zh)
Inventor
张小康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Priority to CN201811228875.1A
Publication of CN111081236A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/22 - Interactive procedures; Man-machine interfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiment of the invention discloses a voice processing method applied to a terminal, comprising the following steps: acquiring first voice information to be recognized, and performing semantic recognition on it to obtain a first semantic; when the first semantic matches a first preset scope, outputting simplified prompt information, where the duration corresponding to the simplified prompt information is less than a specific duration, and the information in the first preset scope characterizes the operations executable by the terminal. The embodiment of the invention also discloses a terminal and a computer-readable storage medium.

Description

Voice processing method, terminal and computer storage medium
Technical Field
The present invention relates to voice information recognition technology in the field of communications, and in particular, to a voice processing method, a terminal, and a computer storage medium.
Background
In the existing voice human-computer interaction process, a machine takes the user's question as the starting point, computes the most plausible response, and gives the user corresponding voice-broadcast feedback based on that response. By this principle, the answer is in fact a fixed result produced by a dynamic traversal lookup: whenever a user asks the machine a question, the machine always matches it to an answer it considers reasonable.
However, once human-computer interaction reaches a certain level of proficiency, the machine's feedback becomes partly redundant. For example, the user says "I want to go to a certain place", and after recognition the terminal plays voice prompt information asking the user how to get there, for example whether to drive, take a bus, or take the subway. When the user says "I want to go to a certain place" the next time, or the Nth time, the voice prompt information given by the terminal is still the same fixed, time-consuming voice information; the user perceives the terminal as slow to use, so the working efficiency of the terminal is low.
Disclosure of Invention
In order to solve the above technical problems, embodiments of the present invention are expected to provide a voice processing method, a terminal, and a computer storage medium, so as to address the time-consuming operation flow in human-computer interaction technology and improve the working efficiency of the terminal.
The technical scheme of the invention is realized as follows:
a voice processing method is applied to a terminal, and comprises the following steps:
acquiring first voice information to be recognized, and performing semantic recognition on the first voice information to be recognized to obtain first semantics;
when the first semantic matches a first preset scope, outputting simplified prompt information; the duration corresponding to the simplified prompt information is less than a specific duration, and the information in the first preset scope characterizes the operations executable by the terminal.
A terminal, the terminal comprising: a processor, a memory, and a communication bus;
the communication bus is used for realizing communication connection between the processor and the memory;
the processor is configured to execute a program in memory that operates on speech information to implement the steps of:
acquiring first voice information to be recognized, and performing semantic recognition on the first voice information to be recognized to obtain first semantics;
when the first semantic matches a first preset scope, outputting simplified prompt information; the duration corresponding to the simplified prompt information is less than a specific duration, and the information in the first preset scope characterizes the operations executable by the terminal.
A computer readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps of the speech processing method described above.
The voice processing method, the terminal, and the computer storage medium provided by the embodiments of the present invention acquire the first voice information to be recognized, perform semantic recognition on it to obtain the first semantic, and output the simplified prompt information when the first semantic matches the first preset scope, where the duration corresponding to the simplified prompt information is less than the specific duration and the information in the first preset scope characterizes the operations executable by the terminal.
Drawings
Fig. 1 is a schematic flow chart of a speech processing method according to an embodiment of the present invention;
FIG. 2 is a flow chart of another speech processing method according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating another speech processing method according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a speech processing method according to another embodiment of the present invention;
fig. 5 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
An embodiment of the present invention provides a speech processing method, which is shown in fig. 1 and includes the following steps:
step 101, obtaining first voice information to be recognized, and performing semantic recognition on the first voice information to be recognized to obtain a first semantic meaning.
It should be noted that step 101, acquiring the first voice information to be recognized and performing semantic recognition on it to obtain the first semantic, may be implemented by a terminal. The terminal may be a device capable of human-computer interaction with a user; in one possible implementation, it may be a mobile terminal. The first voice information to be recognized may be sent to the terminal when the user needs to perform a human-computer interaction operation with the terminal, and the first semantic may be obtained by the terminal applying a semantic recognition technology to the first voice information to be recognized that the user sent.
And 102, outputting simplified prompt information when the first semantic meaning is matched with the first preset scope.
The duration corresponding to the simplified prompt information is less than a specific duration, and the information in the first preset scope characterizes the operations executable by the terminal.
It should be noted that step 102, outputting the simplified prompt information when the first semantic matches the first preset scope, may be implemented by the terminal; the first preset scope may be generated in advance and stored in the terminal. In one possible implementation, the first preset scope may be generated from the various operations performed by the terminal during the user's history of using the terminal. The first preset scope may include words characterizing the operations that the terminal can perform.
The simplified prompt information is no longer the initially set, time-consuming voice information; it may be any form of prompt that takes a short time to output. For example, it may be a simplified version of the initially set voice information, that is, voice information whose content is more concise than the initially set voice information; of course, it may also be directly set as concise voice information unrelated to the initially set voice information. In one possible implementation, the simplified prompt information may include cues that take a short time to output, such as a beep, a vibration, a drip tone, or a click. The specific duration may be preset, according to the user's experience, as a duration the user finds acceptable and comfortable; in any case, the specific duration should be shorter than the duration of outputting the voice prompt information in the related art.
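As a minimal, hypothetical sketch of steps 101 to 102, the flow can be pictured as: recognize the utterance, check the recognized semantic against the preset scope, and output a short cue instead of the long voice prompt on a match. The scope contents, the `recognize_semantics` heuristic, and the prompt strings below are all assumptions, not the patent's actual implementation.

```python
# Hypothetical sketch of steps 101-102; names and scope contents are assumptions.

FIRST_PRESET_SCOPE = {"navigate", "call", "play_music"}  # operations the terminal can execute
SHORT_CUE = "beep"  # simplified prompt: shorter than the specific duration
FULL_PROMPT = "How would you like to go? Drive, bus, or subway?"  # long-form prompt

def recognize_semantics(voice_info: str) -> str:
    """Placeholder semantic recognition: map an utterance to an intent label."""
    if "go to" in voice_info:
        return "navigate"
    return "unknown"

def handle_voice(voice_info: str) -> str:
    first_semantic = recognize_semantics(voice_info)  # step 101
    if first_semantic in FIRST_PRESET_SCOPE:          # step 102: semantic matches the scope
        return SHORT_CUE
    return FULL_PROMPT
```

Calling `handle_voice("I want to go to the square")` would return the short cue, while an utterance outside the scope falls through to the full prompt.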
The voice processing method provided by the embodiment of the present invention acquires the first voice information to be recognized, performs semantic recognition on it to obtain the first semantic, and outputs the simplified prompt information if the first semantic matches the first preset scope, where the duration corresponding to the simplified prompt information is less than the specific duration and the information in the first preset scope characterizes the operations executable by the terminal.
Based on the foregoing embodiments, an embodiment of the present invention provides a speech processing method, which is applied in a terminal, and as shown in fig. 2, the method includes the following steps:
step 201, a terminal acquires first voice information to be recognized, and performs semantic recognition on the first voice information to be recognized to obtain a first semantic meaning.
Step 202, the terminal detects whether the proficiency of the first semantic meets the preset proficiency.
The proficiency of the first semantic refers to the success rate with which the terminal can complete one full operation for the first semantic, that is, the probability that the first semantic matches the first preset scope; if the probability that the first semantic matches the first preset scope meets a preset probability, the proficiency of the first semantic meets the preset proficiency.
It should be noted that, the step 202 of detecting whether the proficiency of the first semantic meets the preset proficiency may be implemented by:
a1, acquiring historical identification information of a target semantic matched with the first semantic;
the target semantics refers to the semantics which are identified by the terminal in the history use process and are the same as or similar to the content of the first semantics. The history identification information refers to information for identification of a target semantic identical or similar to the content of the first semantic by the terminal during history use.
a2, determining whether the proficiency of the first semantic meets the preset proficiency based on the historical identification information of the target semantic.
The preset proficiency may be expressed as a preset probability that the first semantic is included in the first preset scope.
And 203, if the proficiency of the first semantic meets the preset proficiency, the terminal detects whether the first semantic is matched with the first preset scope.
Detecting whether the first semantic matches the first preset scope may be implemented by detecting whether the first semantic matches the information included in the first preset scope. The first preset scope contains a set of words that can characterize the processing operations for voice requests initiated by the user.
And step 204, when the first semantic meaning is matched with the first preset scope, the terminal outputs simplified prompt information.
The duration corresponding to the simplified prompt information is less than a specific duration; the information in the first preset scope characterizes the operations executable by the terminal.
And step 205, if the first semantic is not matched with the first preset scope, the terminal acquires a second preset scope from the server.
If the first semantic does not match the information included in the first preset scope, the first semantic does not match the first preset scope. The second preset scope is different from the first preset scope and may be stored in the server; it may be generated from historical operation information in the server.
And step 206, if the first semantic meaning is matched with the second preset scope, the terminal outputs simplified prompt information.
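Steps 204 to 206 amount to a two-level lookup: try the locally stored scope first, and only on a miss fetch the server-side scope. In this hedged sketch, `fetch_second_scope` stands in for the server request; its name and shape are assumptions.

```python
def match_with_fallback(first_semantic, first_scope, fetch_second_scope):
    """Try the locally stored first preset scope; on a miss, fetch the
    second preset scope from the server (steps 204-206). Names are assumed."""
    if first_semantic in first_scope:
        return "simplified_prompt"        # step 204: local match
    second_scope = fetch_second_scope()   # step 205: server-side scope
    if first_semantic in second_scope:
        return "simplified_prompt"        # step 206: remote match
    return "full_prompt"                  # no match anywhere
```

Fetching lazily keeps the common case (a local match) free of any network round trip.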
Step 207 may be executed after either step 204 or step 206:
and step 207, the terminal acquires second voice information to be recognized aiming at the simplified prompt information and executes preset operation based on the second voice information to be recognized.
The second voice information to be recognized may be voice information, different from the first voice information to be recognized, that the user sends in response to the simplified prompt information after the terminal outputs that information to the user.
It should be noted that, for the explanation of the same steps or concepts in the embodiments of the present invention and other embodiments, reference may be made to the description in other embodiments, which is not repeated herein.
The voice processing method provided by the embodiment of the present invention acquires the first voice information to be recognized, performs semantic recognition on it to obtain the first semantic, and outputs the simplified prompt information if the first semantic matches the first preset scope, where the duration corresponding to the simplified prompt information is less than the specific duration and the information in the first preset scope characterizes the operations executable by the terminal. In this way, after the terminal recognizes the user's voice information, it can give concise, quick prompt information according to the actual semantics of that voice information, instead of the time-consuming voice prompt information of the existing human-computer interaction technology, thereby solving the time-consuming operation flow problem in human-computer interaction technology and improving the working efficiency of the terminal.
Based on the foregoing embodiments, an embodiment of the present invention provides a speech processing method, which is applied in a terminal, and as shown in fig. 3, the method includes the following steps:
301, the terminal acquires the first voice information to be recognized, and performs semantic recognition on the first voice information to be recognized to obtain a first semantic meaning.
Step 302, the terminal acquires historical identification information of the target semantic matched with the first semantic.
And step 303, the terminal determines the probability that the target semantics are matched with the first preset scope in the historical operation process based on the historical identification information of the target semantics.
If a voice request initiated by the user during interaction with the terminal falls within the first preset scope, the user is considered to have completed one human-computer interaction that counts toward proficiency, and the corresponding proficiency counter records a successful operation. Conversely, if the user initiates a voice request outside the first preset scope during subsequent interaction with the terminal, the user is considered not to have completed a human-computer interaction that counts toward proficiency, and the proficiency counter records a failed operation. When the ratio of the number of successes to the number of failures in the proficiency counter reaches a preset threshold, the user is considered to have reached proficiency for voice interaction within the first preset scope; the time-consuming voice prompt of the related art is then cancelled for the first preset scope, and the simplified prompt information of the embodiment of the present invention is output instead.
Step 303 determines, based on the historical identification information of the target semantic, a probability that the target semantic is matched with the first preset scope in the historical operation process, and may be implemented in the following manner:
b1, the terminal determines a first number of times that the target semantics are matched with the first preset scope in the historical operation process based on the historical identification information of the target semantics.
The first number may be the number of times, recorded by the proficiency counter, that the semantics recognized by the terminal for the voice information sent by the user matched the information included in the first preset scope.
b2, the terminal determines a second number of times that the target semantics are not matched with the first preset scope in the historical operation process based on the historical identification information of the target semantics.
The second number may be the number of times, recorded by the proficiency counter, that the semantics recognized by the terminal for the voice information sent by the user did not match the information included in the first preset scope.
b3, the terminal determines the probability that the target semantics are matched with the first preset scope in the historical operation process based on the first times and the second times.
If the ratio of the first number to the second number is greater than a preset value, the ratio of successes to failures in the proficiency counter is considered to have reached the preset threshold; at this point, the terminal may determine that the probability that the target semantics matched the first preset scope during historical operation satisfies the preset probability.
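The proficiency counter of steps b1 to b3 can be sketched as follows. The class name, the threshold handling, and the zero-failure branch are assumptions layered on top of the ratio test the patent describes.

```python
class ProficiencyCounter:
    """Tracks how often a target semantic matched the first preset scope."""

    def __init__(self, threshold: float):
        self.successes = 0          # first number: times the semantic matched the scope
        self.failures = 0           # second number: times it fell outside the scope
        self.threshold = threshold  # preset value for the success/failure ratio

    def record(self, matched: bool) -> None:
        if matched:
            self.successes += 1
        else:
            self.failures += 1

    def is_proficient(self) -> bool:
        """Step b3: the ratio of the first number to the second number
        must exceed the preset value."""
        if self.failures == 0:
            return self.successes > 0  # assumption: any success with no failures qualifies
        return self.successes / self.failures > self.threshold
```

With a threshold of 3.0, four successes against one failure (ratio 4.0) counts as proficient, while four against two (ratio 2.0) does not.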
And 304, if the probability that the target semantics are matched with the first preset scope in the historical operation process meets the preset probability, the terminal determines that the proficiency of the first semantics meets the preset proficiency.
And 305, if the proficiency of the first semantic meaning meets the preset proficiency, the terminal analyzes the first semantic meaning to obtain the keyword.
Wherein the keyword may be a word capable of characterizing the operation referred to by the first semantic.
And step 306, the terminal determines a target scope matched with the keyword from the first preset scope.
The first preset scope may be a set; that is, the first preset scope includes multiple scopes, and each scope has its own identifier. For example, the identifier may be the category of the scope or the name of the scope. The terminal obtains, from the first preset scope and according to the keyword, the scope name (that is, the category of the scope) matching the keyword, and determines that scope as the target scope.
Step 307, the terminal detects whether the keyword matches the words in the target scope.
Detecting whether the keyword matches the words in the target scope may be implemented by detecting whether the target scope contains a word that is the same as, or has the same meaning as, the keyword; if such a word exists, the keyword may be considered to match the words in the target scope.
And 308, if the keyword is matched with the word in the target scope, the terminal determines that the first semantic is matched with the first preset scope.
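Steps 305 to 308 can be sketched as a dictionary of named scopes plus a synonym table for words with the same meaning. The scope names, their word lists, and the synonym handling here are illustrative assumptions.

```python
# The first preset scope modeled as a set of named scopes (step 306 picks one by name).
PRESET_SCOPES = {
    "navigation": {"go", "navigate", "route"},
    "music": {"play", "song", "pause"},
}

# Words treated as having the same meaning as a scope word (step 307).
SYNONYMS = {"head": "go"}

def matches_first_preset_scope(keyword: str, scope_name: str) -> bool:
    target_scope = PRESET_SCOPES.get(scope_name)  # step 306: target scope by its identifier
    if target_scope is None:
        return False
    normalized = SYNONYMS.get(keyword, keyword)   # the same word, or one with the same meaning
    return normalized in target_scope             # steps 307-308
```

So a keyword like "head" matches the navigation scope via its synonym "go", while a keyword from a different scope, or an unknown scope name, does not match.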
And 309, if the first semantic meaning is matched with the first preset scope, the terminal outputs simplified prompt information.
The duration corresponding to the simplified prompt information is less than a specific duration; the information in the first preset scope characterizes the operations executable by the terminal.
And 310, if the first semantic meaning is not matched with the first preset scope, the terminal acquires a second preset scope from the server.
And 311, if the first semantic meaning is matched with the second preset scope, the terminal outputs simplified prompt information.
Step 312 may be executed after either step 309 or step 311:
and step 312, the terminal acquires second voice information to be recognized aiming at the simplified prompt information, and performs semantic recognition on the second voice information to be recognized to obtain a second semantic meaning.
The implementation process of performing semantic recognition on the second voice information to be recognized to obtain the second semantic is the same as the implementation process of performing semantic recognition on the first voice information to be recognized to obtain the first semantic.
And 313, if the second semantic meaning is matched with the first preset scope, the terminal outputs simplified prompt information.
And step 314, the terminal acquires third voice information to be recognized for the simplified prompt information, and so on, until all the voice information to be recognized has been acquired.
It should be noted that, if the terminal detects that the third semantic corresponding to the third voice information to be recognized matches the first preset scope, the terminal may output the simplified prompt information; the terminal then continues to acquire the voice information to be recognized that the user sends for the simplified prompt information, until the user has sent all the voice information to be recognized that needs to be sent to the terminal.
And 315, the terminal executes preset operation based on the finally acquired voice information to be recognized.
The finally acquired voice information to be recognized refers to the voice information to be recognized that the user sent to the terminal last, among all the voice information to be recognized.
It should be noted that, when the second semantic is not within the first preset scope, or the third semantic is not within the first preset scope, the operation performed by the terminal is the same as that performed when the first semantic does not match the first preset scope.
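Steps 312 to 315 form a loop: the terminal emits the short cue for each utterance whose semantic stays within the scope, and the preset operation then uses the last utterance acquired. The function shape and return values in this sketch are assumptions.

```python
def interaction_loop(utterances, recognize, first_scope):
    """Emit a cue per matching utterance (steps 312-314); the preset
    operation (step 315) would then act on the last utterance acquired."""
    cues = []
    last = None
    for voice in utterances:              # successive voice information to be recognized
        last = voice
        if recognize(voice) in first_scope:
            cues.append("beep")           # simplified prompt for this turn
        else:
            cues.append("full_prompt")    # semantic fell outside the scope
    return cues, last                     # `last` feeds the preset operation
```

For a two-turn exchange such as "go" followed by "subway", both within the scope, the loop yields two short cues and hands "subway" to the preset operation.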
In other embodiments of the present invention, the method may further comprise the steps of:
and acquiring information in a second preset action domain.
And updating the first preset scope based on the information in the second preset scope.
The terminal can add a word in the second preset scope different from the word in the first preset scope to the first preset scope, and then updates the first preset scope.
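If scopes are modeled as sets of words (an assumption of this sketch), updating the first preset scope with the second reduces to a set union:

```python
def update_first_scope(first_scope: set, second_scope: set) -> set:
    """Add any word from the second preset scope that the first lacks."""
    return first_scope | second_scope  # union: keeps existing words, adds the new ones
```

For example, `update_first_scope({"go"}, {"go", "pay"})` yields `{"go", "pay"}`; words already present are unaffected.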
In other embodiments of the present invention, before step 301, the method may further comprise the steps of:
and acquiring historical operation performed on the historical to-be-recognized voice information.
And generating a first preset scope based on the historical to-be-recognized voice information and the historical operation.
The historical operation executed for the historical voice information to be recognized refers to the voice prompt information returned to the user, and/or the operations executed, by the terminal after recognizing the semantics of the voice information to be recognized that the user historically sent to the terminal.
A very simple example follows:
referring to fig. 4, if the user sends a command 1 "i want to go to the prefecture square" to the terminal, the terminal performs voice recognition on the command 1 after receiving the command, and judges that the semantic meaning is matched with the name of the scope in the first preset scope after the recognition is successful; if the matching is successful and the proficiency reaches the preset proficiency, simple simplified prompt information, such as beep sound, is output at the moment; then, receiving an instruction 2 of 'sitting on the subway' sent by the user aiming at the simplified prompt message, the terminal continues to recognize, if a condition is met, continues to output the simplified prompt message, such as beep sound, and then presents the user with a well-defined route. However, in the relative technology, if the user gives an instruction of "i want to go to the heaven square", the terminal may give "how to go? Whether you want to select driving, public transportation or walking or subway (here, the broadcasting time is long) voice prompt information; after the user answers "sit on the subway", the terminal gives a prescribed route. Obviously, the voice prompt given in the relative technology is time-consuming and poor in user experience.
It should be noted that, for the explanation of the same steps or concepts in the embodiments of the present invention and other embodiments, reference may be made to the description in other embodiments, which is not repeated herein.
The voice processing method provided by the embodiment of the present invention acquires the first voice information to be recognized, performs semantic recognition on it to obtain the first semantic, and outputs the simplified prompt information if the first semantic matches the first preset scope, where the duration corresponding to the simplified prompt information is less than the specific duration and the information in the first preset scope characterizes the operations executable by the terminal. In this way, after the terminal recognizes the user's voice information, it can give concise, quick prompt information according to the actual semantics of that voice information, instead of the time-consuming voice prompt information of the existing human-computer interaction technology, thereby solving the time-consuming operation flow problem in human-computer interaction technology and improving the working efficiency of the terminal.
Based on the foregoing embodiments, an embodiment of the present invention provides a terminal, which may be applied to the voice processing method provided in the embodiments corresponding to fig. 1 to 3, and as shown in fig. 5, the terminal may include: a processor 41, a memory 42 and a communication bus 43;
the communication bus 43 is used for realizing communication connection between the processor 41 and the memory 42;
the processor 41 is configured to execute the program, stored in the memory 42, for operating on voice information, so as to implement the following steps:
acquiring first voice information to be recognized, and performing semantic recognition on the first voice information to be recognized to obtain first semantics;
when the first semantic meaning is matched with the first preset scope, outputting simplified prompt information;
the duration corresponding to the simplified prompt information is less than a specific duration; the information in the first preset scope characterizes the operations executable by the terminal.
In other embodiments of the present invention, processor 41 is configured to execute a speech processing program in memory 42 to perform the following steps:
and acquiring second voice information to be recognized aiming at the simplified prompt information, and executing preset operation based on the second voice information to be recognized.
In other embodiments of the present invention, the processor 41 is configured to execute, via the program in the memory 42, the outputting of the simplified prompt information when the first semantic matches the first preset scope, so as to implement the following steps:
detecting whether the proficiency of the first semantic meets a preset proficiency;
if the proficiency of the first semantic meets the preset proficiency, detecting whether the first semantic is matched with the first preset scope;
and outputting simplified prompt information when the first semantic is matched with the first preset scope.
In other embodiments of the present invention, the processor 41 is configured to execute, via the program in the memory 42, the detecting of whether the proficiency of the first semantic meets the preset proficiency, so as to implement the following steps:
acquiring historical identification information of a target semantic matched with the first semantic;
and determining whether the proficiency of the first semantic meets the preset proficiency based on the historical identification information of the target semantic.
In other embodiments of the present invention, the processor 41 is configured to execute, via the program in the memory 42, the determining, based on the historical identification information of the target semantic, of whether the proficiency of the first semantic meets the preset proficiency, so as to implement the following steps:
determining the probability that the target semantics are matched with a first preset scope in the historical operation process based on the historical identification information of the target semantics;
and if the probability that the target semantics are matched with the first preset scope in the historical operation process meets the preset probability, determining that the proficiency of the first semantics meets the preset proficiency.
In other embodiments of the present invention, the processor 41 is configured to execute, via the program in the memory 42, the determining, based on the historical identification information of the target semantics, of the probability that the target semantics matched the first preset scope during historical operation, so as to implement the following steps:
determining a first number of times that the target semantics are matched with a first preset scope in the historical operation process based on the historical identification information of the target semantics;
determining a second number of times that the target semantics are not matched with the first preset scope in the historical operation process based on the historical identification information of the target semantics;
based on the first number and the second number, determining the probability that the target semantics are matched with the first preset scope in the historical operation process.
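The counting and thresholding steps above can be illustrated with a short sketch. The function names, the representation of the historical identification information as a list of match/no-match booleans, and the 0.8 preset probability are assumptions for illustration, not values fixed by the embodiment:

```python
def match_probability(history):
    """Estimate how often the target semantic matched the first preset
    scope, given historical identification records (True = matched)."""
    first_count = sum(1 for matched in history if matched)       # matched the scope
    second_count = sum(1 for matched in history if not matched)  # did not match
    total = first_count + second_count
    return first_count / total if total else 0.0

def proficiency_met(history, preset_probability=0.8):
    """Proficiency is met when the historical match probability
    reaches the preset probability."""
    return match_probability(history) >= preset_probability
```

A frequent user whose past utterances usually matched the scope will thus clear the threshold and receive the shorter prompt.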
In other embodiments of the present invention, the processor 41 is configured to execute, in the memory 42, the step of detecting whether the first semantic is matched with the first preset scope, so as to implement the following steps:
analyzing the first semantic meaning to obtain a keyword;
determining a target scope matched with the keyword from the first preset scope;
detecting whether the keyword is matched with a word in the target scope;
and if the keyword is matched with the word in the target scope, determining that the first semantic is matched with the first preset scope.
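A minimal sketch of the keyword-to-scope matching above, assuming the first preset scope is represented as a mapping from scope names to word sets (a representation chosen for illustration only):

```python
def find_target_scope(keyword, first_preset_scope):
    """Determine the target scope, within the first preset scope,
    whose words match the keyword; None if no scope matches."""
    for name, words in first_preset_scope.items():
        if keyword in words:
            return name
    return None

def first_semantic_matches(keywords, first_preset_scope):
    """The first semantic matches the first preset scope when every
    keyword parsed from it matches a word in some target scope."""
    return all(find_target_scope(k, first_preset_scope) is not None
               for k in keywords)
```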
In other embodiments of the present invention, processor 41 is configured to execute the operating program for voice information in memory 42 to implement the following steps:
if the first semantics are not matched with the first preset scope, acquiring a second preset scope from the server;
if the first semantic meaning is matched with the second preset scope, outputting simplified prompt information;
and acquiring second voice information to be recognized aiming at the simplified prompt information, and executing preset operation based on the second voice information to be recognized.
In other embodiments of the present invention, the processor 41 is configured to execute, in the memory 42, the step of acquiring the second to-be-recognized voice information for the simplified prompt information and executing the preset operation based on the second to-be-recognized voice information, so as to implement the following steps:
acquiring second voice information to be recognized aiming at the simplified prompt information, and performing semantic recognition on the second voice information to be recognized to obtain second semantics;
if the second semantic is matched with the first preset scope, outputting simplified prompt information;
acquiring third to-be-recognized voice information for the simplified prompt information, and so on, until the last to-be-recognized voice information is acquired;
and executing preset operation based on the finally obtained voice information to be recognized.
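The repeated prompt-and-acquire loop above can be sketched as below. The `max_rounds` bound and the termination condition (stop once a recognized semantic no longer matches the first preset scope) are assumptions added so the illustration terminates:

```python
def prompt_loop(get_voice, recognize, matches_scope, run_operation,
                output_prompt, max_rounds=5):
    """While each newly recognized semantic still matches the first
    preset scope, keep outputting the simplified prompt and acquiring
    the next to-be-recognized voice information; finally execute the
    preset operation on the last obtained voice information."""
    voice = get_voice()
    for _ in range(max_rounds):
        semantic = recognize(voice)
        if not matches_scope(semantic):
            break
        output_prompt()      # simplified prompt, shorter than the specific duration
        voice = get_voice()  # second, third, ... voice information
    return run_operation(voice)
```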
In other embodiments of the present invention, processor 41 is configured to execute the operating program for voice information in memory 42 to implement the following steps:
acquiring information in a second preset scope;
and updating the first preset scope based on the information in the second preset scope.
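Updating the first preset scope with the information in the second preset scope could look like the following set-union sketch, assuming a scope is represented as a mapping from scope names to word sets (an illustrative choice, not specified by the embodiment):

```python
def update_first_scope(first_scope, second_scope):
    """Merge the information in the second preset scope into the
    first preset scope without mutating the original."""
    updated = {name: set(words) for name, words in first_scope.items()}
    for name, words in second_scope.items():
        updated.setdefault(name, set()).update(words)
    return updated
```

After such an update, later utterances in the newly merged scopes can be handled locally without another round trip to the server.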
In other embodiments of the present invention, before the first to-be-recognized voice information is acquired and semantic recognition is performed on the first to-be-recognized voice information to obtain the first semantic, the processor 41 is configured to execute the program in the memory 42 to implement the following steps:
acquiring historical operation executed aiming at historical voice information to be recognized;
and generating a first preset scope based on the historical to-be-recognized voice information and the historical operation.
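Generating the first preset scope from historical to-be-recognized voice information and the operations executed for it might be sketched as below; representing each history entry as a (keyword, operation) pair is an assumption made for illustration:

```python
from collections import defaultdict

def build_first_scope(history):
    """Group the keywords of historical to-be-recognized voice
    information by the historical operation executed for them,
    yielding a first preset scope of operation -> word set."""
    scope = defaultdict(set)
    for keyword, operation in history:
        scope[operation].add(keyword)
    return dict(scope)
```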
It should be noted that, in the embodiment of the present invention, for the interaction process of the steps executed by the processor, reference may be made to the interaction process in the speech processing method provided in the embodiments corresponding to fig. 1 to 3; details are not described here again.
The terminal provided by the embodiment of the invention acquires the first to-be-recognized voice information and performs semantic recognition on it to obtain the first semantic; if the first semantic is matched with the first preset scope, the terminal outputs the simplified prompt information. The duration corresponding to the simplified prompt information is less than the specific duration, and the information in the first preset scope represents the operations executable by the terminal.
Based on the foregoing embodiments, embodiments of the invention provide a computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps of:
acquiring first voice information to be recognized, and performing semantic recognition on the first voice information to be recognized to obtain first semantics;
when the first semantic meaning is matched with the first preset scope, outputting simplified prompt information;
the time length corresponding to the simplified prompt information is less than a specific time length; the information in the first preset scope represents the operation which can be executed by the terminal.
In other embodiments of the invention, the one or more programs are executable by the one or more processors to perform the steps of:
and acquiring second voice information to be recognized aiming at the simplified prompt information, and executing preset operation based on the second voice information to be recognized.
In other embodiments of the invention, the one or more programs are executable by the one or more processors to perform the steps of:
detecting whether the proficiency of the first semantic meets a preset proficiency;
if the proficiency of the first semantic meets the preset proficiency, detecting whether the first semantic is matched with the first preset scope;
and outputting simplified prompt information when the first semantic is matched with the first preset scope.
In other embodiments of the invention, the one or more programs are executable by the one or more processors to perform the steps of:
acquiring historical identification information of a target semantic matched with the first semantic;
and determining whether the proficiency of the first semantic meets the preset proficiency based on the historical identification information of the target semantic.
In other embodiments of the invention, the one or more programs are executable by the one or more processors to perform the steps of:
determining the probability that the target semantics are matched with a first preset scope in the historical operation process based on the historical identification information of the target semantics;
and if the probability that the target semantics are matched with the first preset scope in the historical operation process meets the preset probability, determining that the proficiency of the first semantics meets the preset proficiency.
In other embodiments of the invention, the one or more programs are executable by the one or more processors to perform the steps of:
determining a first number of times that the target semantics are matched with a first preset scope in the historical operation process based on the historical identification information of the target semantics;
determining a second number of times that the target semantics are not matched with the first preset scope in the historical operation process based on the historical identification information of the target semantics;
based on the first number and the second number, determining the probability that the target semantics are matched with the first preset scope in the historical operation process.
In other embodiments of the invention, the one or more programs are executable by the one or more processors to perform the steps of:
analyzing the first semantic meaning to obtain a keyword;
determining a target scope matched with the keyword from the first preset scope;
detecting whether the keyword is matched with a word in the target scope;
and if the keyword is matched with the word in the target scope, determining that the first semantic is matched with the first preset scope.
In other embodiments of the invention, the one or more programs are executable by the one or more processors to perform the steps of:
if the first semantics are not matched with the first preset scope, acquiring a second preset scope from the server;
if the first semantic meaning is matched with the second preset scope, outputting simplified prompt information;
and acquiring second voice information to be recognized aiming at the simplified prompt information, and executing preset operation based on the second voice information to be recognized.
In other embodiments of the invention, the one or more programs are executable by the one or more processors to perform the steps of:
acquiring second voice information to be recognized aiming at the simplified prompt information, and performing semantic recognition on the second voice information to be recognized to obtain second semantics;
if the second semantic is matched with the first preset scope, outputting simplified prompt information;
acquiring third to-be-recognized voice information for the simplified prompt information, and so on, until the last to-be-recognized voice information is acquired;
and executing preset operation based on the finally obtained voice information to be recognized.
In other embodiments of the invention, the one or more programs are executable by the one or more processors to perform the steps of:
acquiring information in a second preset scope;
and updating the first preset scope based on the information in the second preset scope.
In other embodiments of the invention, the one or more programs are executable by the one or more processors to perform the steps of:
acquiring historical operation executed aiming at historical voice information to be recognized;
and generating a first preset scope based on the historical to-be-recognized voice information and the historical operation.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.

Claims (13)

1. A voice processing method is applied to a terminal, and is characterized in that the method comprises the following steps:
acquiring first voice information to be recognized, and performing semantic recognition on the first voice information to be recognized to obtain first semantics;
when the first semantic meaning is matched with a first preset scope, outputting simplified prompt information; the time length corresponding to the simplified prompt information is less than a specific time length; and the information in the first preset scope represents the operation which can be executed by the terminal.
2. The method of claim 1, further comprising:
and acquiring second voice information to be recognized aiming at the simplified prompt information, and executing preset operation based on the second voice information to be recognized.
3. The method according to claim 1 or 2, wherein outputting a simplified prompt message when the first semantic matches a first preset scope comprises:
detecting whether the proficiency of the first semantic meets a preset proficiency;
if the proficiency of the first semantic meaning meets the preset proficiency, detecting whether the first semantic meaning is matched with the first preset scope;
and outputting simplified prompt information when the first semantic is matched with the first preset scope.
4. The method of claim 3, wherein the detecting whether the proficiency of the first semantic meets a preset proficiency comprises:
acquiring historical identification information of a target semantic matched with the first semantic;
and determining whether the proficiency of the first semantic meaning meets the preset proficiency or not based on the historical identification information of the target semantic meaning.
5. The method of claim 4, wherein determining whether the proficiency of the first semantic meaning meets a preset proficiency based on historical recognition information of the target semantic meaning comprises:
determining the probability that the target semantics are matched with the first preset scope in the historical operation process based on the historical identification information of the target semantics;
and if the probability that the target semantic is matched with the first preset scope in the historical operation process meets the preset probability, determining that the proficiency of the first semantic meets the preset proficiency.
6. The method of claim 5, wherein the determining the probability that the target semantic matches the first preset scope during the historical operation based on the historical identification information of the target semantic comprises:
determining a first number of times that the target semantics are matched with the first preset scope in a historical operation process based on historical identification information of the target semantics;
determining a second number of times that the target semantics are not matched with the first preset scope in a historical operation process based on historical identification information of the target semantics;
and determining the probability that the target semantics are matched with the first preset scope in the historical operation process based on the first times and the second times.
7. The method of claim 3, wherein the detecting whether the first semantic meaning matches the first predetermined scope comprises:
analyzing the first semantic meaning to obtain a keyword;
determining a target scope matched with the keyword from the first preset scope;
detecting whether the keyword is matched with a word in the target scope;
and if the keyword is matched with the word in the target scope, determining that the first semantic is matched with the first preset scope.
8. The method of claim 1, further comprising:
if the first semantic is not matched with the first preset scope, acquiring a second preset scope from a server;
if the first semantic meaning is matched with the second preset scope, outputting the simplified prompt information;
and acquiring second voice information to be recognized aiming at the simplified prompt information, and executing preset operation based on the second voice information to be recognized.
9. The method according to claim 1 or 7, wherein the obtaining second voice information to be recognized for the simplified prompt information and performing a preset operation based on the second voice information to be recognized comprises:
acquiring second voice information to be recognized aiming at the simplified prompt information, and performing semantic recognition on the second voice information to be recognized to obtain second semantics;
if the second semantic is matched with the first preset scope, outputting the simplified prompt message;
acquiring third to-be-recognized voice information for the simplified prompt information, and so on, until the last to-be-recognized voice information is acquired;
and executing the preset operation based on the finally obtained voice information to be recognized.
10. The method of claim 8, further comprising:
acquiring information in the second preset scope;
and updating the first preset scope based on the information in the second preset scope.
11. The method of claim 1, wherein before obtaining the first speech information to be recognized and performing semantic recognition on the first speech information to be recognized to obtain a first semantic, the method further comprises:
acquiring historical operation executed aiming at historical voice information to be recognized;
and generating the first preset scope based on the historical to-be-recognized voice information and the historical operation.
12. A terminal, characterized in that the terminal comprises: a processor, a memory, and a communication bus;
the communication bus is used for realizing communication connection between the processor and the memory;
the processor is configured to execute a program in memory that operates on speech information to implement the steps of:
acquiring first voice information to be recognized, and performing semantic recognition on the first voice information to be recognized to obtain first semantics;
when the first semantic meaning is matched with a first preset scope, outputting simplified prompt information; the time length corresponding to the simplified prompt information is less than a specific time length; and the information in the first preset scope represents the operation which can be executed by the terminal.
13. A computer-readable storage medium, characterized in that the computer-readable storage medium stores one or more programs which are executable by one or more processors to implement the steps of the speech processing method according to any one of claims 1 to 11.
CN201811228875.1A 2018-10-22 2018-10-22 Voice processing method, terminal and computer storage medium Pending CN111081236A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811228875.1A CN111081236A (en) 2018-10-22 2018-10-22 Voice processing method, terminal and computer storage medium

Publications (1)

Publication Number Publication Date
CN111081236A true CN111081236A (en) 2020-04-28

Family

ID=70309666

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811228875.1A Pending CN111081236A (en) 2018-10-22 2018-10-22 Voice processing method, terminal and computer storage medium

Country Status (1)

Country Link
CN (1) CN111081236A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002304192A (en) * 2001-04-05 2002-10-18 Denso Corp Voice recognition device
JP2003241793A (en) * 2002-02-14 2003-08-29 Nissan Motor Co Ltd Display device, and method and program for name display
JP2006058641A (en) * 2004-08-20 2006-03-02 Nissan Motor Co Ltd Speech recognition device
CN101158584A (en) * 2007-11-15 2008-04-09 熊猫电子集团有限公司 Voice destination navigation realizing method of vehicle mounted GPS
CN101330689A (en) * 2008-07-11 2008-12-24 北京天语君锐科技有限公司 Method and device for playing prompting sound
CN102246136A (en) * 2008-12-16 2011-11-16 三菱电机株式会社 Navigation device
CN103929533A (en) * 2014-03-18 2014-07-16 联想(北京)有限公司 Information processing method and electronic equipment
CN104535074A (en) * 2014-12-05 2015-04-22 惠州Tcl移动通信有限公司 Bluetooth earphone-based voice navigation method, system and terminal
CN104699694A (en) * 2013-12-04 2015-06-10 腾讯科技(深圳)有限公司 Prompt message acquiring method and device
CN105138250A (en) * 2015-08-03 2015-12-09 科大讯飞股份有限公司 Human-computer interaction operation guide method, human-computer interaction operation guide system, human-computer interaction device and server
US9564123B1 (en) * 2014-05-12 2017-02-07 Soundhound, Inc. Method and system for building an integrated user profile
CN109215640A (en) * 2017-06-30 2019-01-15 深圳大森智能科技有限公司 Audio recognition method, intelligent terminal and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JODI FORLIZZI等: "Where should i turn: moving from individual to collaborative navigation strategies to inform the interaction design of future navigation systems", CHI 2010: DRIVING, INTERRUPTED *

Similar Documents

Publication Publication Date Title
JP6828001B2 (en) Voice wakeup method and equipment
KR102437944B1 (en) Voice wake-up method and device
CN108831469B (en) Voice command customizing method, device and equipment and computer storage medium
CN107644638B (en) Audio recognition method, device, terminal and computer readable storage medium
US10573315B1 (en) Tailoring an interactive dialog application based on creator provided content
US9583102B2 (en) Method of controlling interactive system, method of controlling server, server, and interactive device
CN107146612B (en) Voice guidance method and device, intelligent equipment and server
CN107871503B (en) Speech dialogue system and utterance intention understanding method
US8352273B2 (en) Device, method, and program for performing interaction between user and machine
CN110970021B (en) Question-answering control method, device and system
CN108519998B (en) Problem guiding method and device based on knowledge graph
CN108446321B (en) Automatic question answering method based on deep learning
JP2008203559A (en) Interaction device and method
EP3956884B1 (en) Identification and utilization of misrecognitions in automatic speech recognition
CN111159364A (en) Dialogue system, dialogue device, dialogue method, and storage medium
CN111178081B (en) Semantic recognition method, server, electronic device and computer storage medium
CN109545203A (en) Audio recognition method, device, equipment and storage medium
CN109741744B (en) AI robot conversation control method and system based on big data search
CN113901837A (en) Intention understanding method, device, equipment and storage medium
CN111081236A (en) Voice processing method, terminal and computer storage medium
CN111104502A (en) Dialogue management method, system, electronic device and storage medium for outbound system
CN111666388A (en) Dialogue data processing method, device, computer equipment and storage medium
CN111225115A (en) Information providing method and device
CN114490972A (en) Message processing method and device, electronic equipment and storage medium
CN114596842A (en) Voice interaction method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination