CN110148414B - Voice utterance guiding method and device - Google Patents

Info

Publication number
CN110148414B
Authority
CN
China
Prior art keywords
user
target
function
current application
statement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910425275.2A
Other languages
Chinese (zh)
Other versions
CN110148414A (en
Inventor
王夏鸣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Volkswagen Mobvoi Beijing Information Technology Co Ltd
Original Assignee
Volkswagen Mobvoi Beijing Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Volkswagen Mobvoi Beijing Information Technology Co Ltd filed Critical Volkswagen Mobvoi Beijing Information Technology Co Ltd
Priority to CN201910425275.2A priority Critical patent/CN110148414B/en
Publication of CN110148414A publication Critical patent/CN110148414A/en
Application granted granted Critical
Publication of CN110148414B publication Critical patent/CN110148414B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of the invention disclose a voice utterance guiding method and device. The method comprises: acquiring the current application page entered by a user; acquiring, from the current application page and other application pages, the utterance set corresponding to each function, where the current application page and the other application pages each have at least one function; determining a target utterance set from the utterance sets corresponding to the functions; and displaying at least one utterance from the target utterance set on the current application page to guide the user's voice input. This technical solution can improve the accuracy with which the user's intention is understood, thereby improving the success rate of using the voice assistant and the user experience.

Description

Voice utterance guiding method and device
Technical Field
The embodiment of the invention relates to a voice recognition technology, in particular to a method and a device for guiding a voice utterance.
Background
With the continuous development of voice recognition technology, voice assistant products are increasingly widely used in daily life. A voice assistant performs operations by recognizing the user's voice input commands. However, because of the bottleneck of current voice systems in Natural Language Understanding (NLU) technology and the informal nature of users' voice commands, sufficiently accurate understanding of user intention cannot be guaranteed, and tasks often fail because the voice assistant cannot accurately recognize the user's intention. Guiding the user's utterances during use, so as to improve the user's use of functions and the standardization of utterances, has therefore become key to improving the success rate of the voice assistant.
To address this, the prior art uses fixed, instruction-based utterance prompts; that is, the user is prompted with the full set of supported utterances regardless of the scenario the user is in. The disadvantage of this scheme is that it can neither accurately identify the service the user wants to use nor effectively uncover the user's potential service needs. Because the prompts are not targeted, the user's utterances cannot be guided effectively, and the success rate of using the voice assistant is difficult to improve.
Disclosure of Invention
The embodiments of the invention provide a voice utterance guiding method and device that give targeted utterance guidance according to the functions the user needs, effectively uncover the user's potential service needs, and improve the success rate of using the voice assistant.
In a first aspect, an embodiment of the present invention provides a method for guiding a speech utterance, where the method includes:
acquiring a current application page entered by a user;
respectively acquiring an utterance set corresponding to each function from the current application page and other application pages, wherein the current application page and the other application pages each have at least one function;
determining a target utterance set from the utterance sets corresponding to the functions;
and displaying at least one utterance in the target utterance set on the current application page so as to guide the user to carry out voice input.
In a second aspect, an embodiment of the present invention further provides a method for guiding a speech utterance, where the method includes:
acquiring voice input by a user on a current application page, and performing semantic recognition on the voice;
if a set intention semantic is recognized from the voice, determining a target utterance set from the utterance sets corresponding to the functions of the current application page and other application pages, wherein the current application page and the other application pages each have at least one function;
and adjusting the semantic recognition result according to at least one utterance in the target utterance set.
In a third aspect, an embodiment of the present invention further provides a device for guiding speech utterance, where the device includes:
the application page acquisition module is used for acquiring a current application page entered by a user;
the utterance set acquisition module is used for respectively acquiring the utterance sets corresponding to the functions from the current application page and other application pages, wherein the current application page and the other application pages each have at least one function;
the target utterance set determining module is used for determining a target utterance set from the utterance sets corresponding to the functions;
and the utterance display module is used for displaying at least one utterance in the target utterance set on a current application page so as to guide the user to perform voice input.
In a fourth aspect, an embodiment of the present invention further provides a device for guiding speech utterance, where the device includes:
the semantic recognition module is used for acquiring the voice input by the user on the current application page and performing semantic recognition on the voice;
a target utterance set determining module, configured to determine a target utterance set from the utterance sets corresponding to the functions of the current application page and other application pages if a set intention semantic is recognized from the voice, where the current application page and the other application pages each have at least one function;
and the recognition result adjusting module is used for adjusting the semantic recognition result according to at least one utterance in the target utterance set.
According to the technical solution of the embodiments, before the user's voice input on the current application page is obtained, the current application page entered by the user is acquired, a target utterance set is determined from the utterance sets corresponding to the functions of the current application page and other application pages, and at least one utterance in the target utterance set is displayed on the current application page to guide the user's voice input. After the voice input by the user on the current application page is obtained, semantic recognition is performed on the voice; if a set intention semantic is recognized from the voice, a target utterance set is determined from the utterance sets corresponding to the functions of the current application page and other application pages, and the semantic recognition result is adjusted according to at least one utterance in the target utterance set. In this way, utterance guidance is given for the functions the user needs, which improves the success rate of using the voice assistant and the user experience.
Drawings
FIG. 1 is a flow chart of a method for guiding a speech utterance according to a first embodiment of the present invention;
FIG. 2 is a flow chart of a method for guiding a speech utterance according to a second embodiment of the present invention;
FIG. 3 is a flow chart of a method for guiding a speech utterance according to a third embodiment of the present invention;
FIG. 4 is a flow chart of a method for guiding a speech utterance according to a fourth embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a speech utterance guidance apparatus according to a fifth embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a speech utterance guidance apparatus according to a sixth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a voice utterance guiding method according to the first embodiment of the present invention. The technical solution of this embodiment is suitable for cases where, before the user inputs voice, the user's intention is inferred from the application page scene the user is in, so that utterance guidance can be given. The method may be executed by a voice utterance guiding device, which may be implemented in software and/or hardware and may be integrated in various general-purpose computer devices.
The speech utterance guidance device provided by the embodiment is configured with a plurality of application pages, including but not limited to a system main page and pages provided by various types of application programs, including but not limited to a music page, a broadcast page, a program page, and the like. Optionally, the system home page has more than two functions, for example, a function of jumping to a certain application home page, a function of time broadcasting, a function of weather broadcasting, and the like. The page provided by the application program may have only one function, for example, a music playing page provided by the music application program has a function of playing music, and a video playing page provided by the video application program has a function of playing video. Of course, the page provided by the application program may have more than two functions, and the invention is not limited thereto.
In this embodiment, each function is configured with a corresponding utterance set, and the utterance set includes a plurality of utterances that match the function. For example, the utterance set corresponding to the music playing function includes: "I want to listen to songs on the ranking list", "I want to listen to lullabies", "I want to listen to a song by a certain singer", and so on.
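The page, function and utterance set configuration described above can be pictured as a small data model. The following is only an illustrative sketch: the class names `Page` and `Function`, the identifiers such as `play_music`, and the `PAGES` list are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    """One function offered by a page, with its configured utterance set."""
    name: str
    utterances: list[str] = field(default_factory=list)


@dataclass
class Page:
    """An application page; the text requires at least one function per page."""
    name: str
    functions: list[Function]

    def __post_init__(self) -> None:
        if not self.functions:
            raise ValueError(f"page {self.name!r} must have at least one function")


# Hypothetical configuration that reuses example utterances from the text.
PAGES = [
    Page("music", [Function("play_music", [
        "I want to listen to songs on the ranking list",
        "I want to listen to a song by a certain singer",
    ])]),
    Page("broadcast", [Function("play_broadcast", [
        "I want to listen to news broadcast",
        "I want to listen to sports broadcast",
    ])]),
]
```

A page with several functions would simply carry several `Function` entries, each with its own utterance list.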
Based on the above page configuration and utterance set configuration of the voice utterance guiding device, the method provided in this embodiment specifically includes the following steps:
Step 110, acquiring the current application page entered by the user.
In this embodiment, the user inputs voice on a certain application page. The user may start the voice utterance guiding device and enter the current application page, or may jump to the current application page from another application page. The current application page may be the system home page or a page provided by any of various types of applications.
The voice utterance guiding device may acquire the current application page periodically, or acquire it when a page jump event is detected.
Step 120, respectively acquiring an utterance set corresponding to each function from the current application page and other application pages, wherein the current application page and the other application pages each have at least one function.
In this embodiment, the pages other than the current application page among the application pages configured in the voice utterance guiding device are referred to as other application pages. There is at least one other application page. Since each application page has at least one function, an utterance set corresponding to each function is acquired from each application page (including the current application page and the other application pages).
For example, if the current application page is a broadcast page, the functions of the broadcast page and of the other, non-broadcast pages are obtained first, for example the broadcast playing function of the broadcast page and the music playing function of a music playing page among the non-broadcast pages. The utterance sets corresponding to these functions are then acquired.
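Step 120 can be sketched as a lookup over a page configuration. This is a minimal sketch under assumptions: the nested-dictionary layout and the function name `collect_utterance_sets` are hypothetical, and listing the current page first is an illustrative choice rather than something the text requires.

```python
def collect_utterance_sets(pages, current_page):
    """Gather the utterance set of every function on every configured page.

    `pages` maps page name -> {function name -> list of utterances}.
    Returns a dict keyed by (page, function), current page entries first.
    """
    if current_page not in pages:
        raise KeyError(f"unknown page: {current_page!r}")
    # "Other application pages" are all configured pages except the current one.
    other_pages = [name for name in pages if name != current_page]
    collected = {}
    for page in [current_page] + other_pages:
        for function, utterances in pages[page].items():
            collected[(page, function)] = list(utterances)
    return collected


# Hypothetical configuration: the user is on the broadcast page.
example_pages = {
    "music": {"play_music": ["I want to listen to songs on the ranking list"]},
    "broadcast": {"play_broadcast": ["I want to listen to news broadcast"]},
}
example_sets = collect_utterance_sets(example_pages, "broadcast")
```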
Step 130, determining a target utterance set from the utterance sets corresponding to the functions.
The target utterance set is an utterance set selected, according to a preset standard, from the utterance sets corresponding to all functions, and is used for utterance guidance of the user. Illustratively, if the preset standard is to use the utterance set corresponding to a common function, the common function is determined according to the historical number of uses of the utterance set corresponding to each function, and the utterance set corresponding to that common function is determined as the target utterance set. Alternatively, the target utterance set may be determined randomly from the utterance sets corresponding to the functions.
The target utterance set may be an utterance set corresponding to a function of the current application page or one corresponding to a function of another application page. For convenience of description, the function corresponding to the target utterance set is referred to as the target function.
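The two preset standards mentioned for step 130 (most-used function, or random choice) can be sketched as follows. The function name `choose_target_function` and its parameters are hypothetical; falling back to a random choice when no usage history is supplied is an assumption made for this sketch.

```python
import random


def choose_target_function(utterance_sets, use_counts=None, rng=None):
    """Return the key of the function whose utterance set becomes the target set.

    `utterance_sets` maps function -> list of utterances.
    `use_counts` maps function -> historical number of uses (optional).
    """
    functions = list(utterance_sets)
    if not functions:
        raise ValueError("no utterance sets configured")
    if use_counts:
        # Preset standard 1: the most frequently used ("common") function.
        return max(functions, key=lambda f: use_counts.get(f, 0))
    # Preset standard 2: pick a function at random.
    rng = rng or random.Random()
    return rng.choice(functions)


sets_by_function = {
    "play_music": ["I want to listen to songs on the ranking list"],
    "play_broadcast": ["I want to listen to news broadcast"],
}
target = choose_target_function(sets_by_function, {"play_music": 5, "play_broadcast": 2})
target_set = sets_by_function[target]
```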
Step 140, at least one utterance in the target utterance set is displayed on the current application page to guide the user to perform voice input.
In this embodiment, the target utterance set includes a plurality of utterances that match the target function. Optionally, at least one utterance in the target utterance set is displayed visually in an utterance guide area of the current application page, where the visual forms include but are not limited to text and pictures; alternatively, at least one utterance is played in sequence in voice form on the current application page.
When utterances are displayed visually in the utterance guide area, limited display space may prevent all utterances in the target utterance set from being shown. In that case, a specified number of utterances is randomly selected from the target utterance set for display, according to the size of the actual display space. For example, the specified number may be set to 3.
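The random selection of a specified number of utterances for display might look like the sketch below. The name `pick_utterances_to_display` is hypothetical; removing duplicates and capping the count at the size of the set are assumptions added so the sketch is robust.

```python
import random


def pick_utterances_to_display(target_set, display_count=3, rng=None):
    """Randomly choose up to `display_count` distinct utterances to show."""
    rng = rng or random.Random()
    # Drop duplicates while keeping the configured order.
    unique = list(dict.fromkeys(target_set))
    # Never ask for more utterances than the set contains.
    count = min(display_count, len(unique))
    return rng.sample(unique, count)


music_utterances = [
    "I want to listen to songs on the ranking list",
    "I want to listen to a song by a certain singer",
    "I want to listen to popular songs",
    "I want to listen to songs by singer B",
]
shown = pick_utterances_to_display(music_utterances, display_count=3)
```

Passing a seeded `random.Random` instance makes the selection reproducible, which can help when testing a display layout.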
In a scenario where a user uses the voice utterance guiding device for voice input, the user starts the device and enters the current application page, such as a music playing page. The device then acquires the music playing page and the utterance sets corresponding to the functions of the music playing page and the other application pages. Next, the device may determine, according to the preset standard, the utterance set corresponding to the music playing function as the target utterance set, and display at least one utterance from it on the music playing page, for example "I want to listen to songs by singer B" or "I want to listen to popular songs". In this way, according to the scene of the current application page, the user is guided to use the functions of the current application page, the service the user wants to use is accurately identified, and targeted utterance prompts are given. Of course, the device may instead determine, according to the preset standard, the utterance set corresponding to the broadcast playing function of the broadcast page as the target utterance set, and then display at least one utterance from it, for example "I want to listen to news broadcast" or "I want to listen to sports broadcast". In this way, according to the scene of the current application page, the user is guided to use the functions of other pages, the user's potential service needs are effectively uncovered, and targeted utterance prompts are given.
After seeing or hearing at least one utterance, the user learns, under its guidance, the functions of the current application page or of other application pages, and inputs voice on the current application page. The user is thus guided before a mistake can occur, which helps avoid erroneous voice input. In most scenarios, the user inputs a segment of voice continuously or intermittently; at least one utterance can be displayed in real time while the user is inputting voice, so that the user's utterance can be corrected promptly during input and input errors avoided.
In this embodiment, the voice utterance guiding device is configured with a plurality of application pages, each application page includes at least one function, and each function corresponds to one utterance set. Based on this configuration, the current application page is acquired, a target utterance set is determined from the utterance sets corresponding to the functions of the application pages, and at least one utterance in the target utterance set is displayed, so that utterance guidance is given according to the scene of the current application page. Displaying utterances corresponding to the current application page guides the user to use the functions of that page, accurately identifies the service the user wants to use, and gives targeted utterance prompts. Displaying utterances corresponding to other application pages guides the user to use the functions of other pages, effectively uncovers the user's potential service needs, and gives targeted utterance prompts, while also avoiding erroneous voice input and improving the success rate of using the voice utterance guiding device.
Example two
This embodiment further refines the above embodiment and provides a preferred preset standard for determining the target utterance set from the utterance sets corresponding to the functions, namely the user's historical use progress for the functions. Optionally, the target utterance set is determined from the utterance sets corresponding to the functions according to the user's historical use progress for the functions.
The historical usage progress of each function by the user includes but is not limited to: the user has not used any of the functions of the current application page and other application pages, the user has used some of the functions of the current application page and/or other application pages, and the user has used all of the functions of the current application page and other application pages.
A user is considered to have used a function once the user has input voice and thereby invoked the function, whether or not the voice input exists in the utterance set corresponding to the function. As long as the function has been invoked by voice input, the user is considered to have used it.
Fig. 2 is a flowchart of a voice utterance guiding method according to the second embodiment of the present invention. The way the target utterance set is determined at different stages of historical use progress is described in detail below with reference to Fig. 2.
Step 210, acquiring the current application page entered by the user.
Step 220, respectively acquiring an utterance set corresponding to each function from the current application page and other application pages, wherein the current application page and the other application pages each comprise at least one function. Any of steps 230-260 is then performed.
Step 230, if the user has not used any function in the current application page and other application pages, determining the target utterance set corresponding to the functions in the current application page.
In this embodiment, if the user has not used any function, the target utterance set corresponding to the functions of the application page the user is currently on is determined. That is, when the current user is a new user, utterance guidance is given using the target utterance set corresponding to the functions contained in the application page the user is currently on.
For example, when the current user is a new user and enters a music playing page, the functions included in the music playing page, for example the music playing function and the music search function, are identified first, and a target utterance set is then constructed from the utterance sets corresponding to the music playing function and/or the music search function. That is, the utterance sets corresponding to all or some of the functions included in the music playing page are combined into the target utterance set.
Step 240, if the user has used some functions in the current application page and/or other application pages, determining the target utterance set corresponding to the unused functions.
In this embodiment, if the user has used some functions in the current application page and/or other application pages but some functions remain unused, utterance guidance is given using the target utterance set corresponding to the unused functions, so as to guide the user to use them.
For example, when the current user has used some but not all functions of the application pages and enters the music playing page, it is first determined which functions the current user has not used. For example, if the user has not used the music playing function on the music playing page, the target utterance set is determined from the utterance set corresponding to the music playing function; that is, the utterance sets corresponding to the functions the user has not used on the current application page are combined into the target utterance set. As another example, when the user has used all functions of the music playing page but has not used the broadcast playing function of the broadcast page, the target utterance set is determined from the utterance set corresponding to the broadcast playing function.
The following describes, with text and tables, how the target utterance set corresponding to unused functions is determined.
When the user inputs voice on the current application page under utterance guidance and finishes using a function, the functions used by the user on the current application page and the number of uses of each function are counted. For example, on page 1, function 1 has been used A times. Note that different utterances for the same function are not distinguished in the statistics: when the user invokes the same function twice with different utterances, the uses are not counted per utterance; instead, the function is counted as used 2 times on the current page. The statistics are shown in Table 1.
TABLE 1
Number of uses | Function 1 | Function 2 | Function 3 | Function N
Page 1         | A          | B          | C          | D
Page 2         | E          | F          | G          | H
Page 3         | I          | J          | K          | L
Page N         | M          | N          | O          | P
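The per-page, per-function statistics of Table 1 can be sketched with a counter. The log format of `(page, function, utterance)` tuples and the name `count_function_uses` are hypothetical; the point illustrated is that the utterance itself is ignored when counting.

```python
from collections import Counter


def count_function_uses(call_log):
    """Count how often each function was invoked on each page.

    `call_log` is an iterable of (page, function, utterance) tuples.
    Different utterances for the same function are deliberately not
    distinguished, as in Table 1.
    """
    counts = Counter()
    for page, function, _utterance in call_log:
        counts[(page, function)] += 1
    return counts


example_log = [
    ("music", "play_music", "I want to listen to popular songs"),
    ("music", "play_music", "I want to listen to songs by singer B"),
    ("broadcast", "play_broadcast", "I want to listen to news broadcast"),
]
use_counts = count_function_uses(example_log)
```

Because `Counter` returns 0 for missing keys, a function that never appears in the log reads as unused without any extra bookkeeping.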
Table 1 shows clearly which functions have been used and which have not. Based on Table 1 and the utterance sets corresponding to the functions, the following are compiled for each page to generate Table 2: the total utterance set composed of the utterance sets of all functions, the utterance sets of the used functions, and the utterance sets of the unused functions.
In Table 2, the total utterance set of all functions in page 1 is YM1 and the set of utterances of the functions already used is YY1; the utterance set of the functions not yet used by the user is obtained as the difference YM1-YY1. YM1-YY1 thus represents the target utterance set composed of the utterance sets of the unused functions.
TABLE 2
Utterance set | Total utterance set | Utterance sets of used functions | Utterance sets of unused functions
Page 1        | YM1                 | YY1                              | YM1-YY1
Page 2        | YM2                 | YY2                              | YM2-YY2
Page 3        | YM3                 | YY3                              | YM3-YY3
Page N        | YM4                 | YY4                              | YM4-YY4
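The difference YM-YY in Table 2 maps directly onto a set difference. In this sketch the name `unused_utterance_set` and the dictionary layout are hypothetical. One consequence of using a plain set difference, noted here as an observation rather than something the patent states, is that an utterance shared by a used and an unused function is dropped from the result.

```python
def unused_utterance_set(page_functions, used_functions):
    """Compute YM - YY for one page.

    `page_functions` maps function -> list of utterances on that page.
    `used_functions` is the set of functions the user has already invoked.
    """
    total = set()      # YM: utterances of all functions on the page
    used = set()       # YY: utterances of the functions already used
    for function, utterances in page_functions.items():
        total.update(utterances)
        if function in used_functions:
            used.update(utterances)
    return total - used


music_page = {
    "play_music": ["I want to listen to popular songs"],
    "search_music": ["Search for songs by singer B"],  # hypothetical utterance
}
remaining = unused_utterance_set(music_page, {"play_music"})
```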
Step 250, if the user has used all functions in the current application page and other application pages, determining the target utterance set corresponding to the user's common function in the current application page, wherein the user's common function in the current application page is determined according to the user's historical number of uses of each function in the current page.
In this embodiment, if the user has used all functions in the current application page and other application pages, the target utterance set corresponding to the user's common function in the current application page is determined from the user's historical usage record (the statistics in Table 1). The reason is that a user who has used all functions has already formed usage habits; unused functions no longer need to be recommended, and utterance guidance is instead given using the target utterance set corresponding to the user's common function in the current page.
Illustratively, when the current user has used all functions and enters the music playing page, the user's common function is identified first. For example, if the user's common function on the music playing page is the music playing function, the utterance set corresponding to the music playing function is determined as the target utterance set. The user's common function in the current application page may be the function used the most times in the current application page historically, or the function used most frequently within a specified historical time period.
For example, when a user who has used all functions enters page 1, the target utterance set corresponding to the common function is obtained from the statistics of the user's historical usage record (i.e., Table 1), and a specified number of utterances is randomly selected from the target utterance set to guide the user. The way the common function is obtained for each page is shown in Table 3.
TABLE 3
Historical number of uses | Function 1 | Function 2 | Function 3 | Function N | Common function
Page 1                    | A          | B          | C          | D          | Max{A,B,C,D}
Page 2                    | E          | F          | G          | H          | Max{E,F,G,H}
Page 3                    | I          | J          | K          | L          | Max{I,J,K,L}
Page N                    | M          | N          | O          | P          | Max{M,N,O,P}
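The Max{...} column of Table 3 amounts to taking the function with the highest use count on a page. A minimal sketch, with the hypothetical name `common_function`; how ties are broken is not specified in the text, so this sketch simply keeps the first function encountered.

```python
def common_function(page_counts):
    """Return the function with the highest historical use count on one page.

    `page_counts` maps function -> number of uses on that page.
    On a tie, the function that appears first in the mapping is returned.
    """
    if not page_counts:
        raise ValueError("page has no recorded functions")
    return max(page_counts, key=page_counts.get)


music_counts = {"play_music": 7, "search_music": 3}
most_used = common_function(music_counts)
```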
Step 260, if the user has used all functions in the current application page and other application pages, determining an updated target utterance set and/or a target utterance set corresponding to a newly added target function, wherein the updated target utterance set is updated according to new utterances and/or the common utterances of other users in the current application page.
In an optional implementation, the background of the voice utterance guiding device collects, in real time, the utterances input by other users when they use the functions of the current application page, and counts the number of uses of each utterance. Utterances whose number of uses is greater than or equal to a use-count threshold, or whose number of uses ranks near the top, are taken as common utterances. The common utterances are added to the corresponding utterance set to obtain the target utterance set; and/or new utterances introduced by the voice utterance guiding device are added to the corresponding utterance set to obtain the target utterance set.
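Selecting common utterances by threshold or by rank, and merging them into an existing utterance set, could be sketched as below. The function names, the `min_count` and `top_n` parameters, and the example utterances are all hypothetical.

```python
from collections import Counter


def common_utterances(other_user_utterances, min_count=None, top_n=None):
    """Pick common utterances by use-count threshold and/or by rank."""
    counts = Counter(other_user_utterances)
    # most_common() orders by descending count.
    ranked = [utterance for utterance, _ in counts.most_common()]
    if min_count is not None:
        ranked = [u for u in ranked if counts[u] >= min_count]
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked


def update_utterance_set(utterance_set, additions):
    """Append utterances that are not already present, preserving order."""
    updated = list(utterance_set)
    for utterance in additions:
        if utterance not in updated:
            updated.append(utterance)
    return updated


collected = ["play the news"] * 5 + ["play sports radio"] * 2 + ["radio please"]
frequent = common_utterances(collected, min_count=2)
target_set = update_utterance_set(["I want to listen to news broadcast"], frequent)
```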
In another optional implementation, the speech utterance guidance device may not periodically bring up a new application page, or bring up a new function in an original application page, and then construct a target utterance set according to an utterance set corresponding to the function of the speech utterance guidance device that newly brings up the line.
In an example, on the program playing page, the statement set corresponding to the program playing function in the current application page is updated according to the common statement "i want to listen to the phase" of other users, that is, the common statement is added to the statement set corresponding to the program playing function. In another example, if the speech utterance guidance apparatus adds a new listening book function, the utterance set corresponding to the listening book function is used as the target utterance set.
Optionally, if the user has used all functions in the current application page and other application pages, determining the updated target utterance set or the target utterance set corresponding to the newly added target function includes: if the user has used all functions in the current application page and other application pages, and the length of time for which the user has stopped using the voice recognition function exceeds a preset duration or the user's frequency of use of the voice recognition function is lower than a preset frequency, determining the updated target utterance set and/or the target utterance set corresponding to the newly added target function.
In this optional technical solution, when a user who has used all functions has stopped using the voice recognition function for longer than the preset duration, or uses it less often than the preset frequency, the target utterance set is updated according to new utterances and/or the common utterances of other users in the current application page, or the target utterance set is determined according to the utterance set corresponding to the newly launched function. For example, the preset duration may be set to 7 days, and the preset frequency may be set to 4 times within 30 days.
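The inactivity condition above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the representation of usage history as a list of timestamps, and the treatment of an empty history are hypothetical; the 7-day and 4-times-in-30-days values are the examples given in the description.

```python
from datetime import datetime, timedelta


def should_refresh_guidance(
    usage_times,
    now,
    max_idle=timedelta(days=7),   # example preset duration from the text
    min_uses=4,                   # example preset frequency: 4 uses ...
    window=timedelta(days=30),    # ... within 30 days
):
    """Return True if the user's voice usage has lapsed or become infrequent."""
    if not usage_times:
        # Assumption: no recorded usage is treated as a lapse.
        return True
    idle = now - max(usage_times)
    recent_uses = sum(1 for t in usage_times if now - t <= window)
    return idle > max_idle or recent_uses < min_uses


now = datetime(2019, 5, 21)
active = [now - timedelta(days=d) for d in (1, 2, 3, 5)]
lapsed = [now - timedelta(days=d) for d in (10, 11, 12, 13)]
infrequent = [now - timedelta(days=1)] * 3
```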
The following describes, with text and a table, how the common utterances of other users are determined.
While all users are using the voice utterance guiding device, the device uploads their usage information to the cloud, where it is collected synchronously. The summarized details are kept per utterance for each page; for example, under page 1, utterance 1 has been used N_11 times. The specific statistics are shown in Table 4. In Table 4, the utterance that other users have used the most times under the current application page is taken as the common utterance.
TABLE 4
[Table 4 appears as an image in the original publication: Figure BDA0002067304370000131]
Step 270, displaying at least one utterance in the target utterance set on the current application page to guide the user to perform voice input.
According to the technical solution of this embodiment, the target utterance set is determined from the utterance sets corresponding to the functions according to the user's historical usage progress for each function. The user is thus given targeted utterance guidance based on both the historical usage progress and the scene of the current application page, which improves the accuracy with which the user's intent is understood, the success rate of using the voice assistant, and the user experience.
Example three
Fig. 3 is a flowchart of a voice utterance guiding method in a third embodiment of the present invention. The technical solution of this embodiment is suitable for the case in which, while the user is inputting speech, the intent of the speech already input is understood in order to give a targeted utterance prompt. The method may be executed by a voice utterance guiding device, which may be implemented in software and/or hardware and may be integrated in various general-purpose computer devices. It should be noted that the configuration of the application pages, the functions of the application pages, and the utterances corresponding to the functions in the voice utterance guiding device are described in detail in the foregoing embodiments and are not repeated here.
With reference to fig. 3, the method provided in this embodiment specifically includes the following steps:
Step 310, acquiring the speech input by the user on the current application page, and performing semantic recognition on the speech.
Semantic recognition recognizes natural language by means of natural language recognition technology and analyzes the user's semantics from the recognized voice input, so that the user can be given targeted guidance.
In this embodiment, after the speech input by the user on the current application page is acquired, semantic recognition is performed on it, and the speech is converted into text and displayed on the current application page in real time. Semantic recognition here comprises speech recognition of the user's input and understanding of the user's semantics.
Step 320, if the set intent semantics are recognized from the speech, determining a target utterance set from the utterance sets corresponding to the functions of the current application page and other application pages, wherein the current application page and the other application pages have at least one function.
When the user starts voice input, the content of the input is recognized; when the set intent semantics are recognized, the operation of determining the target utterance set is triggered so that the user can be given targeted guidance.
In one case, the set intent semantics indicate a function that the user wants to use, and the target utterance set is determined from the utterance set corresponding to the indicated function. For example, if the speech input by the user is recognized as "I want to listen", where "listen" carries the intent, the functions related to "listen" are first screened out from all functions, and the target utterance set is then determined from the utterance sets corresponding to those functions. In this way the user's intent can be understood accurately, which helps to guide the user in a targeted manner.
In another case, the set intent semantics do not indicate a function that the user wants to use, and the target utterance set is determined from the utterance sets corresponding to the functions according to a preset standard. For example, if the speech input by the user is recognized as "Hello, I don't know what", where "don't know what" expresses the intent of requesting guidance, the target utterance set is determined at random from the utterance sets corresponding to the functions, or the utterance set corresponding to a commonly used function is chosen.
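The two cases can be sketched in Python as below. This is an illustration only: matching an intent keyword by substring, the function and variable names, and the sample utterance sets are all assumptions; the description does not specify how "functions related to" an intent are identified.

```python
import random


def select_by_intent(intent_keyword, utterance_sets, rng=random):
    """Choose a target utterance set from per-function utterance sets.

    utterance_sets maps a function name to its list of utterances.
    If the intent keyword points at one or more functions, their
    utterances are returned; otherwise one function is picked at random.
    """
    if intent_keyword:
        keyword = intent_keyword.lower()
        related = [
            name
            for name, utterances in utterance_sets.items()
            if any(keyword in u.lower() for u in utterances)
        ]
        if related:
            return [u for name in related for u in utterance_sets[name]]
    # No function indicated: fall back to a randomly chosen function.
    name = rng.choice(sorted(utterance_sets))
    return list(utterance_sets[name])


# Hypothetical utterance sets for two functions.
sets = {
    "music playback": ["I want to listen to pop songs"],
    "navigation": ["Navigate home"],
}
```

A fallback to the utterance set of a commonly used function, which the description also allows, would replace the random choice with a lookup of usage counts.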
Step 330, adjusting the semantic recognition result according to at least one utterance in the target utterance set.
In this embodiment, after the target utterance set is determined, the semantic recognition result is adjusted according to the utterances in the target utterance set, and the adjusted result is displayed on the current application page to guide the user to complete the remaining voice input.
According to the technical solution of this embodiment, the voice utterance guiding device is provided with a plurality of application pages, each application page includes at least one function, and each function corresponds to one utterance set. Based on this configuration, the speech input by the user on the current application page is acquired and semantic recognition is performed on it; if the set intent semantics are recognized from the speech, a target utterance set is determined from the utterance sets corresponding to the functions of the current application page and other application pages, and the semantic recognition result is adjusted according to at least one utterance in the target utterance set. The user is thereby guided in a targeted manner according to the scene of the current application page and the speech the user has input. Adjusting the semantic recognition result according to utterances of the current application page guides the user to use the functions of the current page, so that the service the user wants is identified accurately and a targeted utterance prompt is given. Adjusting the semantic recognition result according to utterances of other application pages guides the user to use the functions of other pages, which uncovers the user's potential service needs, gives a targeted utterance prompt, avoids erroneous voice input, and improves the success rate of using the voice utterance guiding device.
Example four
This embodiment is a further refinement of the above embodiment. It provides a preferred preset standard for determining the target utterance set from the utterance sets corresponding to the functions of the current application page and other application pages, namely the user's historical usage progress for each function. Optionally, the target utterance set is determined from the utterance sets corresponding to the functions of the current application page and other application pages according to the user's historical usage progress for each function.
The user's historical usage progress for each function is described in the above embodiments and is not repeated here.
Fig. 4 is a flowchart of a voice utterance guiding method according to a fourth embodiment of the present invention. The way in which the target utterance set is determined under different historical usage progress is described in detail below with reference to Fig. 4.
Step 410, acquiring the speech input by the user on the current application page, and performing semantic recognition on the speech.
Step 420, judging whether the set intent semantics are recognized from the speech; if so, executing one of steps 430 to 460; if not, returning to step 410.
The set intent semantics are the semantics corresponding to speech that contains the user's intent, and are used to guide the user in a targeted manner according to that intent so that the user can complete the remaining voice input.
Step 430, if the user has not used any function in the current application page and other application pages, determining a target utterance set corresponding to the functions in the current application page.
Optionally, if the user has not used any function, the target utterance set is determined from the utterances that correspond to the currently recognized set intent semantics within the utterance sets of the functions of the current application page. That is, when the current user is a new user, the user is guided with utterances that match the recognized set intent semantics and belong to the functions contained in the application page where the user is currently located.
Illustratively, when the speech input by the user is recognized as "I want to listen" and the user is on the music playing page, the functions contained in that page, such as the music playing function and the music searching function, are first confirmed; the target utterance set is then constructed from the utterances corresponding to "I want to listen" in the utterance sets of those functions.
Step 440, if the user has used some functions in the current application page and/or other application pages, determining a target expression set corresponding to the unused functions.
Optionally, if the user has used some but not all of the functions, the user is guided according to the currently recognized set intent semantics and the target utterance set corresponding to the functions the user has not used.
For example, when a user who has used some but not all functions enters the music playing page, the functions the user has not used are first confirmed from the statistics in Table 2. If the user has not used the music playing function on the music playing page and the recognized set intent semantics are "I want to listen", the target utterance set is constructed from the utterance set corresponding to the music playing function, so as to guide the user to use the unused function.
Step 450, if the user has used all functions in the current application page and other application pages, determining a target utterance set corresponding to the user's commonly used functions in the current application page, wherein the user's commonly used functions in the current application page are determined according to the number of times the user has historically used each function in the current page.
Optionally, if the user has used all functions in the speech recognition system, the target utterance set corresponding to the user's commonly used functions in the current application page is determined according to the recognized set intent semantics (for the commonly used functions, see the statistics in Table 3).
Illustratively, when the current user has used all functions and enters the music playing page, the user's commonly used functions are first confirmed. For example, if the user wants to use a "listen"-related function in the music playing page and the utterance set corresponding to the music playing function of the current page contains the recognized set intent semantics, the target utterance set is constructed from the utterance set corresponding to the music playing function of the current page.
Step 460, if the user has used all the functions in the current application page and other application pages, determining an updated target expression set and/or a target expression set corresponding to the newly added target function, where the updated target expression set is updated according to the new expressions and/or the common expressions of other users in the current application page.
In this embodiment, if the user has used all functions, the target utterance set is updated according to new utterances and/or the common utterances of other users, or the user's current target utterance set is determined according to the utterance set corresponding to a function that the voice utterance guiding device has newly launched.
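The branching on historical usage progress in steps 430 to 450 can be sketched as follows. The data layout (a mapping from pages to function names and a per-function usage count), the function name, and the choice of the single most-used function as the "commonly used" one are assumptions made for illustration; step 460 is not covered by the sketch.

```python
def choose_target_functions(current_page, pages, usage_counts):
    """Return the functions whose utterance sets form the target set.

    pages maps a page name to the list of its function names;
    usage_counts maps a function name to how often this user has used it.
    """
    all_functions = [f for functions in pages.values() for f in functions]
    used = {f for f in all_functions if usage_counts.get(f, 0) > 0}

    if not used:
        # Step 430: new user, guide with the current page's functions.
        return list(pages[current_page])

    unused = [f for f in all_functions if f not in used]
    if unused:
        # Step 440: some functions used, guide toward the unused ones.
        return unused

    # Step 450: everything used, guide with the most-used function
    # of the current page (taking the maximum is an assumption).
    return [max(pages[current_page], key=lambda f: usage_counts[f])]


# Hypothetical pages and functions.
pages = {
    "music": ["play music", "search music"],
    "navigation": ["navigate"],
}
```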
Optionally, if the user has used all functions in the current application page and other application pages, determining the updated target utterance set and/or the target utterance set corresponding to the newly added target function includes:
if the user has used all functions in the current application page and other application pages, and a pause of a preset duration or a filler word indicating that the user is thinking is recognized from the speech, determining the updated target utterance set and/or the target utterance set corresponding to the newly added target function (for the common utterances, see the statistics in Table 4).
In this optional technical solution, after a user who has used all functions starts voice input, if a pause of the preset duration or a filler word indicating thinking, such as "um..." or "uh...", is detected, the target utterance set is updated according to new utterances and/or the common utterances of other users, or the user's current target utterance set is determined according to the utterance set corresponding to the newly launched function. For example, the preset duration may be set to 2 seconds.
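A hedged sketch of the trigger condition follows. It assumes that the recognizer exposes the partial transcript and the length of the current silence; the function name, the filler-word list and the simple tokenization are hypothetical, while the 2-second value is the example given in the description.

```python
# Hypothetical filler words signalling that the user is thinking.
FILLERS = ("um", "uh", "er")


def needs_prompt(partial_text, silence_seconds, pause_threshold=2.0, fillers=FILLERS):
    """Return True if a long pause or a trailing filler word is detected."""
    # Drop ellipses and periods so that "um..." is read as "um".
    cleaned = partial_text.lower().replace("…", " ").replace(".", " ")
    words = cleaned.split()
    ends_with_filler = bool(words) and words[-1].strip(",") in fillers
    return silence_seconds >= pause_threshold or ends_with_filler
```

For instance, `needs_prompt("I want to listen to", 2.5)` is true because of the pause, and `needs_prompt("I want to listen to um...", 0.3)` is true because of the trailing filler word.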
Step 470, supplementing the semantic recognition result with at least one utterance in the target utterance set, or correcting the semantic recognition result using at least one utterance in the target utterance set.
The semantic recognition result displayed on the screen is adjusted by supplementing or correcting it according to the utterances in the target utterance set. When the result is supplemented, utterances from the target utterance set may be inserted before, in the middle of, or after the semantic recognition result.
Illustratively, the current semantic recognition result is "I want to listen to" and the determined target utterance set contains "I want to listen to the traffic broadcast"; the semantic recognition result is supplemented with this utterance to guide the user to complete the remaining voice input.
Illustratively, the current semantic recognition result is 'my want to listen to a happy song' and the determined target utterance set contains 'my want to listen to a popular song'; the semantic recognition result is corrected using this utterance to guide the user to re-input the correct speech.
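The supplement-or-correct step can be illustrated with the sketch below. It is not the patented method itself: it handles only supplementation at the end of the recognized text, and it uses string similarity from Python's standard `difflib` module as a stand-in for whatever matching the device actually performs. The function name and the `cutoff` value are assumptions.

```python
import difflib


def adjust_recognition(recognized, target_utterances, cutoff=0.6):
    """Supplement or correct a recognition result using guide utterances.

    Returns a (kind, text) pair where kind is "supplement", "correct"
    or "unchanged".
    """
    # Supplement: the recognized text is the beginning of a guide utterance.
    for utterance in target_utterances:
        if len(utterance) > len(recognized) and utterance.lower().startswith(
            recognized.lower()
        ):
            return ("supplement", utterance)

    # Correct: fall back to the most similar guide utterance, if any.
    close = difflib.get_close_matches(recognized, target_utterances, n=1, cutoff=cutoff)
    if close and close[0] != recognized:
        return ("correct", close[0])

    return ("unchanged", recognized)


supplemented = adjust_recognition(
    "I want to listen to", ["I want to listen to the traffic broadcast"]
)
corrected = adjust_recognition(
    "I want to listen to a happy song", ["I want to listen to a popular song"]
)
```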
Step 480, displaying the semantic recognition result and the at least one utterance in the target utterance set in a visually distinct manner.
After the semantic recognition result is adjusted according to at least one utterance in the target utterance set, the final adjusted result is displayed on the screen with the semantic recognition result and the guide utterance distinguished from each other; for example, the at least one utterance from the target utterance set is shown in bold or highlighted. The user can thus clearly tell the content already input apart from the guidance content and complete the correct voice input according to the guidance.
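One possible way to prepare the text for such a display is to split it into segments flagged as user input or guidance, leaving the actual styling (bold, highlight) to the user interface. The sketch below assumes the guide utterance extends the recognized text; the function names and the bracket rendering are hypothetical.

```python
def split_for_display(recognized, guide_utterance):
    """Return (text, is_guidance) segments for differentiated display."""
    if guide_utterance.lower().startswith(recognized.lower()):
        segments = [(recognized, False)]
        rest = guide_utterance[len(recognized):]
        if rest:
            segments.append((rest, True))
        return segments
    # The guide utterance replaces the input: show both, flagged separately.
    return [(recognized, False), (guide_utterance, True)]


def render_plain(segments):
    """Plain-text rendering that wraps guidance segments in brackets."""
    return "".join(f"[{text}]" if is_guidance else text for text, is_guidance in segments)


segments = split_for_display(
    "I want to listen to", "I want to listen to the traffic broadcast"
)
```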
According to the technical solution of this embodiment, the target utterance set is determined from the utterance sets corresponding to the functions of the current application page and other application pages according to the user's historical usage progress for each function. Targeted utterance guidance is given according to the recognized set intent semantics, the scene of the current application page, and the historical usage progress, which improves the accuracy with which the user's intent is understood, the success rate of using the voice assistant, and the user experience.
Example five
Fig. 5 is a schematic structural diagram of a voice utterance guiding device according to a fifth embodiment of the present invention. The device includes an application page acquisition module 510, an utterance set acquisition module 520, a target utterance set determining module 530, and an utterance display module 540:
the application page acquisition module 510 is configured to acquire the current application page entered by a user;
the utterance set acquisition module 520 is configured to acquire the utterance sets corresponding to the functions from the current application page and other application pages, respectively, wherein the current application page and the other application pages have at least one function;
the target utterance set determining module 530 is configured to determine a target utterance set from the utterance sets corresponding to the functions;
the utterance display module 540 is configured to display at least one utterance in the target utterance set on the current application page, so as to guide the user to perform voice input.
According to the technical solution of this embodiment, before the user's voice input on the current application page is acquired, the current application page entered by the user is obtained, a target utterance set is determined from the utterance sets corresponding to the functions of the current application page and other application pages, and at least one utterance in the target utterance set is displayed, so that utterance guidance is given according to the scene of the current application page. Displaying utterances of the current application page guides the user to use the functions of the current page, so that the service the user wants is identified accurately and a targeted utterance prompt is given. Displaying utterances of other application pages guides the user to use the functions of other pages, which uncovers the user's potential service needs, gives a targeted utterance prompt, avoids erroneous voice input, and improves the success rate of using the voice utterance guiding device.
Optionally, the target utterance set determining module 530 is specifically configured to:
determine the target utterance set from the utterance sets corresponding to the functions according to the user's historical usage progress for each function.
Optionally, the target utterance set determining module 530 includes:
a first target utterance set determining unit, configured to determine a target utterance set corresponding to the functions in the current application page if the user has not used any function in the current application page and other application pages;
a second target utterance set determining unit, configured to determine a target utterance set corresponding to the unused functions if the user has used some of the functions in the current application page and/or other application pages;
a third target utterance set determining unit, configured to determine, if the user has used all functions in the current application page and other application pages, a target utterance set corresponding to the user's commonly used functions in the current application page, wherein the user's commonly used functions in the current application page are determined according to the number of times the user has historically used each function in the current page;
a fourth target utterance set determining unit, configured to determine, if the user has used all functions in the current application page and other application pages, an updated target utterance set and/or a target utterance set corresponding to a newly added target function, wherein the updated target utterance set is updated according to new utterances and/or the common utterances of other users in the current application page.
Optionally, the fourth target utterance set determining unit is specifically configured to:
if the user has used all functions in the current application page and other application pages, and the length of time for which the user has stopped using the voice recognition function exceeds a preset duration or the user's frequency of use is lower than a preset frequency, determine the updated target utterance set and/or the target utterance set corresponding to the newly added target function.
The voice utterance guidance device provided by the embodiment of the invention can execute the voice utterance guidance method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Example six
Fig. 6 is a schematic structural diagram of a voice utterance guiding device according to a sixth embodiment of the present invention. The device includes a semantic recognition module 610, a target utterance set determining module 620, and a recognition result adjusting module 630:
the semantic recognition module 610 is configured to acquire the speech input by a user on the current application page and perform semantic recognition on the speech;
the target utterance set determining module 620 is configured to determine a target utterance set from the utterance sets corresponding to the functions of the current application page and other application pages if the set intent semantics are recognized from the speech;
the recognition result adjusting module 630 is configured to adjust the semantic recognition result according to at least one utterance in the target utterance set.
According to the technical solution of this embodiment, the speech input by the user on the current application page is acquired and semantic recognition is performed on it; if the set intent semantics are recognized from the speech, a target utterance set is determined from the utterance sets corresponding to the functions of the current application page and other application pages, and the semantic recognition result is adjusted according to at least one utterance in the target utterance set. The user is thereby guided in a targeted manner according to the scene of the current application page and the speech the user has input. Adjusting the semantic recognition result according to utterances of the current application page guides the user to use the functions of the current page, so that the service the user wants is identified accurately and a targeted utterance prompt is given. Adjusting the semantic recognition result according to utterances of other application pages guides the user to use the functions of other pages, which uncovers the user's potential service needs, gives a targeted utterance prompt, avoids erroneous voice input, and improves the success rate of using the voice utterance guiding device.
Optionally, the target utterance set determining module 620 is specifically configured to:
determine the target utterance set from the utterance sets corresponding to the functions of the current application page and other application pages according to the user's historical usage progress for each function.
Optionally, the target utterance set determining module 620 includes:
a first target utterance set determining unit, configured to determine a target utterance set corresponding to the functions in the current application page if the user has not used any function in the current application page and other application pages;
a second target utterance set determining unit, configured to determine a target utterance set corresponding to the unused functions if the user has used some of the functions in the current application page and/or other application pages;
a third target utterance set determining unit, configured to determine, if the user has used all functions in the current application page and other application pages, a target utterance set corresponding to the user's commonly used functions in the current application page, wherein the user's commonly used functions in the current application page are determined according to the number of times the user has historically used each function in the current page;
a fourth target utterance set determining unit, configured to determine, if the user has used all functions in the current application page and other application pages, an updated target utterance set and/or a target utterance set corresponding to a newly added target function, wherein the updated target utterance set is updated according to new utterances and/or the common utterances of other users in the current application page.
Optionally, the fourth target utterance set determining unit is specifically configured to:
if the user has used all functions in the current application page and other application pages, and a pause of a preset duration or a filler word indicating that the user is thinking is recognized from the speech, determine the updated target utterance set and/or the target utterance set corresponding to the newly added target function.
Optionally, the recognition result adjusting module 630 includes:
a recognition result adjusting unit, configured to supplement the semantic recognition result with at least one utterance in the target utterance set, or to correct the semantic recognition result using at least one utterance in the target utterance set;
a differentiated display unit, configured to display the semantic recognition result and the at least one utterance in the target utterance set in a visually distinct manner.
The voice utterance guidance device provided by the embodiment of the invention can execute the voice utterance guidance method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
It should be noted that, in the embodiment of the above-mentioned speech utterance guidance apparatus, the included units and modules are merely divided according to functional logic, but are not limited to the above-mentioned division, as long as the corresponding functions can be realized; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (5)

1. A method for guiding speech utterance, comprising:
acquiring a current application page entered by a user;
respectively acquiring a statement set corresponding to each function from the current application page and other application pages, wherein the current application page and other application pages have at least one function;
determining a target statement set according to the historical use progress of the user on each function from the statement sets corresponding to each function; the target expression set consists of an expression set corresponding to the function of the current application page and/or expression sets corresponding to the functions of other application pages;
displaying at least one statement in the target statement set on a current application page to guide the user to perform voice input;
the determining a target expression set according to the historical use progress of the user on each function from the expression sets corresponding to each function comprises the following steps:
and if the user uses partial functions in the current application page and/or other application pages, determining a target expression set corresponding to the unused functions.
2. A method for guiding speech utterance, comprising:
acquiring voice input by a user on a current application page, and performing semantic recognition on the voice;
if the set intention semantics are recognized from the voice, determining a target expression set from expression sets corresponding to the functions of the current application page and other application pages according to the historical use progress of the user on the functions, wherein the current application page and other application pages have at least one function;
adjusting the semantic recognition result according to at least one statement in the target statement set, and displaying the adjustment result;
the determining a target statement set according to the historical use progress of the user on each function from the statement sets corresponding to each function of the current application page and other application pages comprises:
and if the user uses partial functions in the current application page and/or other application pages, determining a target expression set corresponding to the unused functions.
3. The method of claim 2, wherein said adjusting semantic recognition results according to at least one of said set of target utterances and displaying the adjustment results comprises:
supplementing at least one statement in the target statement set in the semantic recognition result, or correcting the semantic recognition result by adopting at least one statement in the target statement set;
and carrying out differential display on the semantic recognition result and at least one of the target expression set.
4. A speech utterance guidance apparatus, comprising:
the application page acquisition module is used for acquiring a current application page entered by a user;
the statement set acquisition module is used for respectively acquiring statement sets corresponding to the functions from the current application page and other application pages, wherein the current application page and the other application pages have at least one function;
the target statement set determining module is used for determining a target statement set according to the historical use progress of the user on each function from the statement sets corresponding to each function; the target expression set consists of an expression set corresponding to the function of the current application page and/or expression sets corresponding to the functions of other application pages;
the utterance display module is used for displaying at least one utterance in the target utterance set on a current application page so as to guide the user to perform voice input;
the target utterance set determination module includes:
and the second target expression set determining unit is used for determining a target expression set corresponding to the unused function if the user uses partial functions in the current application page and/or other application pages.
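One way to picture how the four modules of this apparatus fit together is to model each as a plain callable and chain them. This is only a sketch under assumptions: the class name `UtteranceGuidanceApparatus`, the callable signatures, and the choice to show the first utterance of each unused function are all hypothetical and are not specified by the claim.

```python
from typing import Callable, Dict, List, Set

UtteranceSets = Dict[str, List[str]]


class UtteranceGuidanceApparatus:
    """Chains the four claimed modules, each supplied as a callable."""

    def __init__(
        self,
        acquire_page: Callable[[], str],
        acquire_utterance_sets: Callable[[str], UtteranceSets],
        used_functions: Callable[[], Set[str]],
        display: Callable[[str, List[str]], None],
    ) -> None:
        self.acquire_page = acquire_page
        self.acquire_utterance_sets = acquire_utterance_sets
        self.used_functions = used_functions
        self.display = display

    def guide(self) -> List[str]:
        page = self.acquire_page()                # application page acquisition
        sets = self.acquire_utterance_sets(page)  # utterance set acquisition
        used = self.used_functions()
        # Target utterance set determination: keep only unused functions.
        target = {f: u for f, u in sets.items() if f not in used}
        # Assumption: show the first utterance of each remaining function.
        shown = [utterances[0] for utterances in target.values() if utterances]
        self.display(page, shown)                 # utterance display
        return shown


# Illustrative wiring with invented data.
displayed: List[str] = []
apparatus = UtteranceGuidanceApparatus(
    acquire_page=lambda: "music_page",
    acquire_utterance_sets=lambda page: {
        "play": ["Play some jazz"],
        "navigate": ["Navigate home"],
    },
    used_functions=lambda: {"play"},
    display=lambda page, utterances: displayed.extend(utterances),
)
shown = apparatus.guide()
```

Passing the modules in as callables keeps the sketch independent of any particular page framework or usage-history store.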
5. A speech utterance guidance apparatus, comprising:
a semantic recognition module, configured to acquire the voice input by the user on the current application page and perform semantic recognition on the voice;
a target utterance set determination module, configured to, if a set intention semantic is recognized from the voice, determine a target utterance set from the utterance sets corresponding to the functions of the current application page and other application pages according to the user's historical usage progress for each function;
a recognition result adjustment module, configured to adjust the semantic recognition result according to at least one utterance in the target utterance set and display the adjustment result;
wherein the target utterance set determination module includes:
a second target utterance set determination unit, configured to determine the target utterance set corresponding to the unused functions if the user has used some of the functions in the current application page and/or other application pages.
CN201910425275.2A 2019-05-21 2019-05-21 Voice utterance guiding method and device Active CN110148414B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910425275.2A CN110148414B (en) 2019-05-21 2019-05-21 Voice utterance guiding method and device

Publications (2)

Publication Number Publication Date
CN110148414A (en) 2019-08-20
CN110148414B (en) 2021-06-29

Family

ID=67592574

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910425275.2A Active CN110148414B (en) 2019-05-21 2019-05-21 Voice utterance guiding method and device

Country Status (1)

Country Link
CN (1) CN110148414B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112581946B (en) * 2019-09-29 2024-08-16 百度在线网络技术(北京)有限公司 Voice control method, voice control device, electronic equipment and readable storage medium
CN113703621A (en) * 2021-02-26 2021-11-26 腾讯科技(深圳)有限公司 Voice interaction method, storage medium and equipment
CN114115790A (en) * 2021-11-12 2022-03-01 上汽通用五菱汽车股份有限公司 Voice conversation prompting method, device, equipment and computer readable storage medium
CN118486310B (en) * 2024-07-16 2024-09-20 成都赛力斯科技有限公司 Vehicle-mounted voice guiding method and device and vehicle-mounted terminal

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105117195A (en) * 2015-09-09 2015-12-02 百度在线网络技术(北京)有限公司 Method and device for guiding voice input
US20150379993A1 (en) * 2014-06-30 2015-12-31 Samsung Electronics Co., Ltd. Method of providing voice command and electronic device supporting the same
CN109325097A (en) * 2018-07-13 2019-02-12 海信集团有限公司 A kind of voice guide method and device, electronic equipment, storage medium
CN109547840A (en) * 2018-12-03 2019-03-29 深圳创维数字技术有限公司 Films and television programs search index method, TV and computer readable storage medium
CN109584879A (en) * 2018-11-23 2019-04-05 华为技术有限公司 A kind of sound control method and electronic equipment

Also Published As

Publication number Publication date
CN110148414A (en) 2019-08-20

Similar Documents

Publication Publication Date Title
CN110148414B (en) Voice utterance guiding method and device
CN107785018B (en) Multi-round interaction semantic understanding method and device
US8515764B2 (en) Question and answer database expansion based on speech recognition using a specialized and a general language model
CN106796787B (en) Context interpretation using previous dialog behavior in natural language processing
CN101535983B (en) System and method for a cooperative conversational voice user interface
CN1145141C (en) Method and device for improving accuracy of speech recognition
CN109791761B (en) Acoustic model training using corrected terms
JP2021144759A5 (en)
KR20180025121A (en) Method and apparatus for inputting information
CN110223692B (en) Multi-turn dialogue method and system for voice dialogue platform cross-skill
Lambourne et al. Speech-based real-time subtitling services
CN109979450B (en) Information processing method and device and electronic equipment
US10089898B2 (en) Information processing device, control method therefor, and computer program
US11967248B2 (en) Conversation-based foreign language learning method using reciprocal speech transmission through speech recognition function and TTS function of terminal
US11373638B2 (en) Presentation assistance device for calling attention to words that are forbidden to speak
US20220093103A1 (en) Method, system, and computer-readable recording medium for managing text transcript and memo for audio file
CN109615009B (en) Learning content recommendation method and electronic equipment
JP2018128869A (en) Search result display device, search result display method, and program
CN111443890A (en) Reading assisting method and device, storage medium and electronic equipment
JP2009042968A (en) Information selection system, information selection method, and program for information selection
CN109326284A (en) The method, apparatus and storage medium of phonetic search
CN109190116B (en) Semantic analysis method, system, electronic device and storage medium
CN110570838B (en) Voice stream processing method and device
CN110099332B (en) Audio environment display method and device
CN109273004B (en) Predictive speech recognition method and device based on big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant