CN110148414A - Voice saying guidance method and device

Info

Publication number: CN110148414A
Application number: CN201910425275.2A
Authority: CN
Inventors: 王夏鸣
Original assignee: Volkswagen Mobvoi Beijing Information Technology Co Ltd
Current assignee: Volkswagen Mobvoi Beijing Information Technology Co Ltd
Priority date: 2019-05-21
Filing date: 2019-05-21
Publication date: 2019-08-20
Anticipated expiration: 2039-05-21
Also published as: CN110148414B

Abstract

The embodiments of the invention disclose a voice saying guidance method and device. The voice saying guidance method includes: obtaining the current application page entered by a user; obtaining, from the current application page and other application pages, the saying set corresponding to each function, where the current application page and the other application pages each have at least one function; determining a target saying set from the saying sets corresponding to the functions; and displaying at least one saying from the target saying set on the current application page to guide the user in voice input. The technical solution of the embodiments can improve the accuracy with which user intent is understood, thereby raising the success rate of the voice assistant and improving the user experience.

Description

Voice saying guidance method and device

Technical field

The embodiments of the present invention relate to speech recognition technology, and in particular to a voice saying guidance method and device.

Background art

With the continuous development of speech recognition technology, voice assistant products of all kinds are increasingly used in people's daily lives. A voice assistant performs operations by recognizing the user's spoken commands. However, because of the technical bottleneck of natural language understanding (Natural Language Understanding, NLU for short) in current speech systems, and because users' spoken commands are not standardized, a sufficiently high accuracy in understanding user intent cannot be guaranteed, and the voice assistant often fails to execute a task because it cannot accurately identify what the user intends. Therefore, guiding the user's sayings while the voice assistant is in use, so that the functions used and the sayings themselves become more standardized, is key to improving the success rate of the voice assistant.

To address this, the prior art uses fixed, manual-style voice saying prompts; that is, regardless of the scenario the user is in, the user is shown the full set of supported sayings. The drawback is that this scheme can neither determine accurately which service the user wants to use nor effectively uncover the user's potential service needs. It therefore cannot prompt sayings in a targeted way, cannot guide the user's sayings effectively, and struggles to improve the success rate of the voice assistant.

Summary of the invention

The embodiments of the present invention provide a voice saying guidance method and device that give targeted saying guidance according to the function the user needs to use and effectively uncover the user's potential service needs, thereby improving the success rate of the voice assistant.

In a first aspect, an embodiment of the present invention provides a voice saying guidance method, the method comprising:

obtaining the current application page entered by a user;

obtaining, from the current application page and other application pages, the saying set corresponding to each function, wherein the current application page and the other application pages each have at least one function;

determining a target saying set from the saying sets corresponding to the functions; and

displaying, on the current application page, at least one saying in the target saying set to guide the user in voice input.

In a second aspect, an embodiment of the present invention further provides a voice saying guidance method, the method comprising:

obtaining the voice input by a user on the current application page, and performing semantic recognition on the voice;

if preset intent semantics are recognized in the voice, determining a target saying set from the saying sets corresponding to the functions of the current application page and other application pages, wherein the current application page and the other application pages each have at least one function; and

adjusting the semantic recognition result according to at least one saying in the target saying set.

In a third aspect, an embodiment of the present invention further provides a voice saying guidance device, the device comprising:

an application page obtaining module, configured to obtain the current application page entered by a user;

a saying set obtaining module, configured to obtain, from the current application page and other application pages, the saying set corresponding to each function, wherein the current application page and the other application pages each have at least one function;

a target saying set determining module, configured to determine a target saying set from the saying sets corresponding to the functions; and

a saying display module, configured to display, on the current application page, at least one saying in the target saying set to guide the user in voice input.

In a fourth aspect, an embodiment of the present invention further provides a voice saying guidance device, the device comprising:

a semantic recognition module, configured to obtain the voice input by a user on the current application page and to perform semantic recognition on the voice;

a target saying set determining module, configured to determine, if preset intent semantics are recognized in the voice, a target saying set from the saying sets corresponding to the functions of the current application page and other application pages, wherein the current application page and the other application pages each have at least one function; and

a recognition result adjusting module, configured to adjust the semantic recognition result according to at least one saying in the target saying set.

In the technical solution of the embodiments of the present invention, before the user's voice input on the current application page is received, the current application page entered by the user is obtained, a target saying set is determined from the saying sets corresponding to the functions of the current application page and the other application pages, and at least one saying in the target saying set is displayed on the current application page to guide the user in voice input. After the voice input by the user on the current application page is received, semantic recognition is performed on the voice; if preset intent semantics are recognized in the voice, a target saying set is determined from the saying sets corresponding to the functions of the current application page and the other application pages, and the semantic recognition result is adjusted according to at least one saying in the target saying set. Saying guidance is thus given for the function the user needs to use, which improves the success rate of the voice assistant and the user experience.

Brief description of the drawings

Fig. 1 is a flowchart of a voice saying guidance method in Embodiment one of the present invention;

Fig. 2 is a flowchart of a voice saying guidance method in Embodiment two of the present invention;

Fig. 3 is a flowchart of a voice saying guidance method in Embodiment three of the present invention;

Fig. 4 is a flowchart of a voice saying guidance method in Embodiment four of the present invention;

Fig. 5 is a schematic structural diagram of a voice saying guidance device in Embodiment five of the present invention;

Fig. 6 is a schematic structural diagram of a voice saying guidance device in Embodiment six of the present invention.

Detailed description of the embodiments

The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.

Embodiment one

Fig. 1 is a flowchart of a voice saying guidance method in Embodiment one of the present invention. The technical solution of this embodiment applies to the case where, before the user inputs voice, the user's intent is inferred from the application page scenario the user is in so that targeted saying guidance can be given. The method can be executed by a voice saying guidance device, which can be implemented in software and/or hardware and can be integrated into various general-purpose computer devices.

The voice saying guidance device provided in this embodiment is configured with multiple application pages, including but not limited to the system home page and pages provided by various applications; the latter include but are not limited to a music page, a broadcast page, and a program page. Optionally, the system home page has two or more functions, for example jumping to the main page of an application, announcing the time, and announcing the weather. A page provided by an application may have only one function: for example, the music playing page provided by a music application has the function of playing music, and the video playing page provided by a video application has the function of playing video. A page provided by an application may of course also have two or more functions; the present invention does not limit this.

In this embodiment, each function is configured with a corresponding saying set, which contains multiple sayings that match the function. For example, the saying set corresponding to the function of playing music includes: "I want to listen to songs from the charts", "I want to listen to a lullaby", "I want to listen to songs by a certain singer", and so on.

Based on the page configuration and saying set configuration of the voice saying guidance device described above, the method provided in this embodiment specifically includes the following steps:

Step 110, obtain the current application page entered by the user.

In this embodiment, the user needs to input voice on some application page. The user can start the voice saying guidance device and enter the current application page, or jump to the current application page from another application page. The current application page can be the system home page or a page provided by any application.

The voice saying guidance device can obtain the current application page periodically, or obtain it when a page jump event is detected.

Step 120, obtain, from the current application page and the other application pages, the saying set corresponding to each function, where the current application page and the other application pages each have at least one function.

In this embodiment, among the application pages configured in the voice saying guidance device, the pages other than the current application page are called the other application pages. There is at least one other application page. Since each application page has at least one function, the saying set corresponding to each function is obtained from every application page (the current application page and the other application pages).

Illustratively, if the current application page is the broadcast page, the functions of the broadcast page and of the other, non-broadcast pages are obtained first, for example the broadcast playing function of the broadcast page and the music playing function of the music playing page among the non-broadcast pages. The saying sets corresponding to these functions are then obtained.
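The page and saying set configuration described above, together with step 120, can be sketched roughly as follows. This is only an illustration under assumptions: the names `Function`, `AppPage`, `PAGES`, and `collect_saying_sets`, and the function identifiers such as `play_music`, are hypothetical and do not come from the original text.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    name: str
    sayings: list = field(default_factory=list)  # the function's saying set


@dataclass
class AppPage:
    name: str
    functions: list = field(default_factory=list)  # at least one Function


# Hypothetical configuration built from the examples in the text.
PAGES = [
    AppPage("music", [Function("play_music", [
        "I want to listen to songs from the charts",
        "I want to listen to a lullaby",
    ])]),
    AppPage("broadcast", [Function("play_broadcast", [
        "I want to listen to the news",
        "I want to listen to sports broadcasts",
    ])]),
]


def collect_saying_sets(pages):
    """Step 120: map (page, function) to its saying set, over all pages."""
    return {(p.name, f.name): list(f.sayings)
            for p in pages for f in p.functions}
```

Keying the result by page and function keeps the saying sets of the current page and of the other pages in one structure, so the later selection step can draw from either.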

Step 130, determine a target saying set from the saying sets corresponding to the functions.

The target saying set is the saying set selected, according to a preset criterion, from the saying sets corresponding to the functions, and is used to give the user saying guidance. Illustratively, if the preset criterion is the saying set of the commonly used function, the commonly used function is determined from the historical usage count of the saying set corresponding to each function, and the saying set corresponding to that function is taken as the target saying set. Alternatively, the target saying set can be chosen at random from the saying sets corresponding to the functions.

The target saying set can be the saying set corresponding to a function of the current application page or to a function of another application page. For ease of description, the function corresponding to the target saying set is called the target function.
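As a rough sketch of step 130, the two criteria mentioned above (most used function, or random choice) might look like this. The function name `determine_target_set` and its parameters are assumptions made for illustration, not part of the original text.

```python
import random


def determine_target_set(saying_sets, usage_counts=None, rng=random):
    """Return (target function, target saying set).

    saying_sets:  dict mapping function name -> list of sayings
    usage_counts: optional dict mapping function name -> historical uses
    """
    if usage_counts:
        # Preset criterion: the most frequently used function.
        target = max(saying_sets, key=lambda f: usage_counts.get(f, 0))
    else:
        # Alternative from the text: pick a saying set at random.
        target = rng.choice(sorted(saying_sets))
    return target, saying_sets[target]
```

The returned function name plays the role of the target function described above.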

Step 140, display, on the current application page, at least one saying in the target saying set to guide the user in voice input.

In this embodiment, the target saying set contains multiple sayings that match the target function. Optionally, at least one saying in the target saying set is displayed visually in the saying guidance area of the current application page, in forms including but not limited to text and graphics; alternatively, at least one saying is played in sequence as speech on the current application page.

With visual display in the saying guidance area, the display space is limited and it may not be possible to show every saying in the target saying set. Therefore, depending on the actual size of the display space, a specified number of sayings is randomly selected from the target saying set for display. Illustratively, the specified number can be set to 3.
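The random selection of a specified number of sayings can be sketched as below. The helper name `sayings_to_display` is hypothetical; the default of 3 is the illustrative value given in the text.

```python
import random


def sayings_to_display(target_set, limit=3, rng=random):
    """Pick at most `limit` distinct sayings for the guidance area."""
    sayings = list(target_set)
    if len(sayings) <= limit:
        return sayings  # everything fits, show the whole set
    return rng.sample(sayings, limit)  # sampling without replacement
```

Sampling without replacement ensures the same saying is not shown twice in the guidance area.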

In a scenario where a user uses the voice saying guidance device for voice input, the user starts the device and enters the current application page, for example the music playing page. The device obtains the music playing page and the saying sets corresponding to the functions of the music playing page and the other application pages. It can then determine, according to the preset criterion, that the saying set corresponding to the music playing function is the target saying set, and display at least one saying from it on the music playing page, for example "I want to listen to songs by singer B" or "I want to listen to hit songs". In this way, the user is guided, according to the scenario of the current application page, to use the functions of that page; the service the user wants is identified accurately and targeted saying prompts are given. The device can of course also determine, according to the preset criterion, the target saying set corresponding to the broadcast playing function of the broadcast page and display at least one saying from it, for example "I want to listen to the news" or "I want to listen to sports broadcasts". In this way, the user is guided to use the functions of other pages, the user's potential service needs are uncovered, and targeted saying prompts are given.

After seeing or hearing at least one saying, the user learns, under the guidance of the saying, about the functions of the current application page or of the other application pages, and inputs voice on the current application page. The user is thus guided before an error can occur, which avoids incorrect voice input. In most scenarios the user inputs a stretch of speech continuously or intermittently; at least one saying can also be displayed in real time while the user is speaking, so that the user can correct the saying in time and avoid incorrect voice input.

In this embodiment, the voice saying guidance device is configured with multiple application pages, each application page contains at least one function, and each function corresponds to a saying set. Based on this configuration, the current application page entered by the user is obtained, a target saying set is determined from the saying sets corresponding to the functions of all application pages, and at least one saying in the target saying set is displayed, so that saying guidance is given according to the scenario of the current application page. Displaying sayings that correspond to the current application page guides the user to use the functions of that page, identifies the service the user wants accurately, and gives targeted saying prompts. Displaying sayings that correspond to other application pages guides the user to use the functions of those pages, uncovers the user's potential service needs, and likewise gives targeted saying prompts. Both help the user avoid incorrect voice input and improve the success rate of the voice saying guidance device.

Embodiment two

This embodiment further refines the embodiment above by providing a preferred criterion for determining the target saying set from the saying sets corresponding to the functions, namely the user's historical usage progress for the functions. Optionally, the target saying set is determined from the saying sets corresponding to the functions according to the user's historical usage progress for each function.

The user's historical usage progress for the functions includes but is not limited to: the user has not used any function of the current application page or the other application pages; the user has used some of the functions of the current application page and/or the other application pages; and the user has used all functions of the current application page and the other application pages.

A user having used a function means that the user once input voice that invoked the function. The voice input may or may not be present in the saying set corresponding to the function; as long as the function was invoked by voice input, the user is considered to have used it.
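The three usage progress states listed above can be expressed as a small classification. This is a sketch under assumptions: the `Progress` enum and the `usage_progress` helper are hypothetical names, and functions are identified by plain strings.

```python
from enum import Enum


class Progress(Enum):
    NONE = "no function used yet"
    PARTIAL = "some functions used"
    ALL = "all functions used"


def usage_progress(all_functions, used_functions):
    """Classify usage progress across the current and other pages."""
    all_set = set(all_functions)
    used = set(used_functions) & all_set  # ignore unknown function names
    if not used:
        return Progress.NONE
    return Progress.ALL if used == all_set else Progress.PARTIAL
```

Each state then selects a different way of determining the target saying set, as the steps that follow describe.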

Fig. 2 is a flowchart of a voice saying guidance method in Embodiment two of the present invention. The way the target saying set is determined under each historical usage progress is described in detail below with reference to Fig. 2.

Step 210, obtain the current application page entered by the user.

Step 220, obtain, from the current application page and the other application pages, the saying set corresponding to each function, where the current application page and the other application pages each contain at least one function. Then execute any one of steps 230 to 260.

Step 230, if the user has not used any function of the current application page or the other application pages, determine the target saying set corresponding to the functions of the current application page.

In this embodiment, if the user has not used any function, the target saying set corresponding to the functions of the application page the user is currently on is determined. That is, when the current user is a new user, saying guidance is given using the target saying set corresponding to the functions contained in the page the user is currently on.

Illustratively, if the current user is a new user and enters the music playing page, the functions contained in the music playing page are identified first, for example a music playing function and a music search function, and the target saying set is then constructed from the saying sets corresponding to the music playing function and/or the music search function. That is, the saying sets corresponding to all or some of the functions of the music playing page are combined into the target saying set.

Step 240, if the user has used some of the functions of the current application page and/or the other application pages, determine the target saying set corresponding to the unused functions.

In this embodiment, if the user has used some functions of the current application page and/or the other application pages but there are still functions the user has not used, saying guidance is given using the target saying set corresponding to the unused functions, to guide the user toward functions not yet used.

Illustratively, if the current user has used some but not all functions of the application pages and enters the music playing page, the functions the user has not used are identified first. For example, if the user has not used the music playing function of the music playing page, the target saying set is determined from the saying set corresponding to the music playing function; that is, the saying sets corresponding to the functions the user has not used on the current application page are combined into the target saying set. As another example, if the user has used all functions of the music playing page but not the broadcast playing function of the broadcast page, the target saying set is determined from the saying set corresponding to the broadcast playing function.

The way the target saying set corresponding to the unused functions is determined is described in detail below using text and tables.

After the user, on the current application page, inputs voice following the saying guidance and finishes using a function, the functions the user has used under the current application page and the number of times each has been used are counted. For example, under page 1, function 1 has been used A times. Note that different sayings for the same function are not distinguished in the statistics: if the user invokes the same function twice with different sayings, the usage counts of the two sayings are not recorded separately; instead, the function is recorded as used twice under the current page. The statistics are shown in Table 1.

Table 1

Function usage count	Function 1	Function 2	Function 3	Function N
Page 1	A	B	C	D
Page 2	E	F	G	H
Page 3	I	J	K	L
Page N	M	N	O	P
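The counting behind Table 1 can be sketched as follows. The class name `UsageTable` and its methods are hypothetical; the one point taken directly from the text is that the saying itself is not part of the key, so two different sayings for the same function raise the same count.

```python
from collections import Counter


class UsageTable:
    """Per-page, per-function usage counts (the layout of Table 1)."""

    def __init__(self):
        self.counts = Counter()  # (page, function) -> number of uses

    def record(self, page, function, saying):
        # The saying is deliberately ignored: different sayings that
        # invoke the same function increment the same cell.
        self.counts[(page, function)] += 1

    def used_functions(self, page):
        """Functions with at least one recorded use on the given page."""
        return {f for (p, f), n in self.counts.items()
                if p == page and n > 0}
```

For example, recording "I want to listen to a lullaby" and then "Play a lullaby" against the same function on page 1 leaves that cell at 2.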

Table 1 shows clearly which functions have been used and which have not. Based on Table 1 and the saying set corresponding to each function, the following are compiled for each page to produce Table 2: the total saying set formed by the saying sets of all functions, the set of sayings corresponding to the used functions, and the set of sayings corresponding to the unused functions.

In Table 2, the total set formed by the saying sets of all functions of page 1 is YM1, and the set formed by the saying sets of the used functions is YY1. The set formed by the saying sets of the functions the user has not used is derived from these two sets as YM1-YY1; that is, YM1-YY1 denotes the target saying set assembled from the saying sets of the unused functions.

Table 2

Saying set	Total saying set	Used functions	Unused functions
Page 1	YM1	YY1	YM1-YY1
Page 2	YM2	YY2	YM2-YY2
Page 3	YM3	YY3	YM3-YY3
Page N	YM4	YY4	YM4-YY4
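The YM-YY entries of Table 2 are plain set differences. A minimal sketch, assuming each page's saying sets are held in a dict keyed by function name (the helper name `unused_saying_set` is hypothetical):

```python
def unused_saying_set(saying_sets, used_functions):
    """Return YM - YY for one page.

    saying_sets:    dict mapping function name -> iterable of sayings
    used_functions: names of the functions the user has already used
    """
    total = set().union(*saying_sets.values())                  # YM
    used = set().union(*(saying_sets[f] for f in used_functions
                         if f in saying_sets))                  # YY
    return total - used
```

One consequence of computing the difference on sayings rather than on functions is that a saying shared by a used and an unused function is dropped from the result.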

Step 250, if the user has used all functions of the current application page and the other application pages, determine the target saying set corresponding to the user's commonly used function on the current application page, where the commonly used function is determined from the user's historical usage count for each function of the current page.

In this embodiment, if the user has used all functions of the current application page and the other application pages, the target saying set corresponding to the user's commonly used function on the current application page is determined from the user's historical usage record (the statistics in Table 1). A user who has used all functions has already formed usage habits, so there is no longer any need to recommend unused functions; instead, saying guidance is given using the target saying set corresponding to the user's commonly used function on the current page.

Illustratively, when a user who has used all functions enters the music playing page, the user's commonly used function is identified first. For example, if the user's commonly used function on the music playing page is the music playing function, the saying set corresponding to the music playing function is taken as the target saying set. The user's commonly used function on the current application page can be the function with the highest historical usage count on that page, or the function used most frequently within a specified historical period.

For example, when a user who has used all functions enters page 1, the target saying set corresponding to the commonly used function is obtained from the statistics of the user's historical usage record (Table 1), and a specified number of sayings is randomly selected from it to guide the user. The way the commonly used function is obtained for each page is shown in Table 3.

Table 3

Historical usage count	Function 1	Function 2	Function 3	Function N	Commonly used function
Page 1	A	B	C	D	Max{A,B,C,D}
Page 2	E	F	G	H	Max{E,F,G,H}
Page 3	I	J	K	L	Max{I,J,K,L}
Page N	M	N	O	P	Max{M,N,O,P}
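The last column of Table 3 takes the maximum of a page's usage counts; the commonly used function is the one that attains it. A minimal sketch, assuming one table row is available as a dict (the name `common_function` is hypothetical):

```python
def common_function(usage_row):
    """Return the function with the highest usage count in one table row.

    usage_row: dict mapping function name -> historical usage count.
    On a tie, the function listed first wins (an arbitrary choice).
    """
    return max(usage_row, key=usage_row.get)
```

The text does not say how ties are broken, so the first-listed rule here is only one possible choice.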

Step 260, if the user has used all functions of the current application page and the other application pages, determine an updated target saying set and/or the target saying set corresponding to a newly added target function, where the updated target saying set is obtained by updating with new sayings and/or with sayings commonly used by other users on the current application page.

In an optional implementation, the backend of the voice saying guidance device collects in real time the sayings that other users input when using the functions of the current application page and counts how often each saying is used. Sayings whose usage count is greater than or equal to a count threshold, or whose usage count ranks near the top, are taken as common sayings. The common sayings are added to the corresponding saying set to obtain the target saying set; and/or new sayings released for the voice saying guidance device are added to the corresponding saying set to obtain the target saying set.
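Selecting common sayings by threshold or by rank, and merging them into a saying set, could be sketched like this. The helper names and parameters (`min_count`, `top_k`) are assumptions; the text only states that a count threshold or a top ranking is used.

```python
from collections import Counter


def common_sayings(saying_counts, min_count=None, top_k=None):
    """Select common sayings by count threshold and/or by rank."""
    ranked = Counter(saying_counts).most_common()  # highest count first
    if min_count is not None:
        ranked = [(s, n) for s, n in ranked if n >= min_count]
    if top_k is not None:
        ranked = ranked[:top_k]
    return [s for s, _ in ranked]


def update_saying_set(saying_set, additions):
    """Append new or common sayings, skipping ones already present."""
    updated = list(saying_set)
    for saying in additions:
        if saying not in updated:
            updated.append(saying)
    return updated
```

Either criterion can be supplied on its own; supplying both applies the threshold first and then the rank cut-off.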

In another optional implementation, the voice saying guidance device launches new application pages from time to time, or launches new functions on existing application pages; the target saying set is then constructed from the saying set corresponding to the newly launched function.

In one example, on the program playing page, the saying set corresponding to the program playing function of the current application page is updated with the saying "I want to listen to crosstalk", which is commonly used by other users; that is, this common saying is added to the saying set corresponding to the program playing function. In another example, the voice saying guidance device adds a new storytelling function, and the saying set corresponding to the storytelling function is taken as the target saying set.

Optionally, determining the updated target saying set and/or the target saying set corresponding to the newly added target function if the user has used all functions of the current application page and the other application pages comprises: if the user has used all functions of the current application page and the other application pages, and the user has not used the speech recognition function for longer than a preset duration or uses it less often than a preset frequency, determining the updated target saying set and/or the target saying set corresponding to the newly added target function.

In this optional technical solution, when a user who has used all functions has been inactive for longer than the preset duration or uses the function less often than the preset frequency, the target saying set is updated with the new sayings and/or the sayings commonly used by other users on the current application page, or is determined from the saying set corresponding to the newly launched function. Illustratively, the preset duration can be set to 7 days, and the preset frequency can be set to 4 uses within 30 days, so that fewer than 4 uses in 30 days counts as below the preset frequency.
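The trigger condition can be sketched with the illustrative values from the text (7 days of inactivity, fewer than 4 uses within 30 days). The function name `should_refresh` and the representation of usage history as a list of timestamps are assumptions.

```python
from datetime import datetime, timedelta


def should_refresh(usage_times, now,
                   max_idle=timedelta(days=7),
                   window=timedelta(days=30), min_uses=4):
    """True if the user has been idle too long or uses voice too rarely.

    usage_times: non-empty list of datetimes of past voice interactions
                 (non-empty because the user has used every function).
    """
    idle = now - max(usage_times)  # time since the most recent use
    recent_uses = sum(1 for t in usage_times if now - t <= window)
    return idle > max_idle or recent_uses < min_uses
```

A user with four uses in the past week yields `False`, while a user whose last use was ten days ago yields `True` regardless of how many uses fall within the 30-day window.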

The way the common sayings of other users are determined is described in detail below using text and a table.

While users are using it, the voice saying guidance device can upload the usage information of all users to the cloud, where it is synchronized and aggregated. The aggregated content is broken down to the sayings of each page; for example, under page 1, saying 1 has been used N_11 times. The statistics are shown in Table 4, in which the saying that other users have used most often under the current application page is taken as the common saying.

Table 4

Step 270, display, on the current application page, at least one saying in the target saying set to guide the user in voice input.

In the technical solution of this embodiment, the target saying set is determined from the saying sets corresponding to the functions according to the user's historical usage progress for each function, so that targeted saying guidance is given according to both that progress and the scenario of the current application page. This improves the accuracy with which user intent is understood, which in turn improves the success rate of the voice assistant and the user experience.

Embodiment three

Fig. 3 is a flowchart of a voice saying guidance method in Embodiment three of the present invention. The technical solution of this embodiment applies to the case where, while the user is inputting voice, the intent of the voice already input is understood in order to give targeted saying prompts. The method can be executed by a voice saying guide device, which can be implemented in software and/or hardware and integrated into various general-purpose computer devices. Note that the configuration of the application pages in the voice saying guide device, the functions of each application page, and the saying set corresponding to each function are described in the above embodiments and are not repeated here.

With reference to Fig. 3, the method provided in this embodiment comprises the following steps:

Step 310: obtain the voice that the user inputs on the current application page, and perform semantic recognition on the voice.

Here, semantic recognition means recognizing natural language by natural language recognition technology, so as to parse the user's semantics from the recognized voice input and give targeted saying guidance to the user.

In this embodiment, after the voice that the user inputs on the current application page is obtained, semantic recognition is performed on it, and the voice is converted into text and displayed on the current application page in real time. Semantic recognition comprises both recognition of the user's speech and understanding of the user's semantics.

Step 320: if preset intentional semantics are recognized from the voice, determine a target saying set from the saying sets corresponding to the functions of the current application page and other application pages, wherein the current application page and the other application pages each have at least one function.

When the user starts voice input, the content of the input is recognized. When preset semantics carrying an intent are recognized, the operation of determining the target saying set is triggered so that targeted guidance can be given to the user.

In one case, the preset intentional semantics indicate the function the user wants to use, and the target saying set is determined from the saying set corresponding to the indicated function. For example, if the recognized voice input is "I want to listen", where "listen" carries the intent, the functions related to "listen" are first filtered out of all functions, and the target saying set is then determined from the saying sets corresponding to those functions. The user's intent is thereby understood accurately, which facilitates targeted saying guidance.

In another case, the preset intentional semantics do not indicate the function the user wants to use, and the target saying set is determined from the saying sets corresponding to the functions according to a preset standard. For example, if the recognized voice input is "Hello, I don't know what to say", where "don't know what to say" carries the intent of requesting guidance, the target saying set is determined at random from the saying sets corresponding to the functions, or the saying set corresponding to a commonly used function is selected.
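The two cases can be illustrated with a short sketch. The function table, the keyword sets, the help phrase and the fallback to a single commonly used function are all hypothetical; the text does not specify how intent is matched to functions.

```python
# Hypothetical configuration: function name -> (intent keywords, saying set)
FUNCTIONS = {
    "music_playback": ({"listen"}, ["I want to listen to a popular song"]),
    "radio": ({"listen"}, ["I want to listen to traffic radio"]),
    "navigation": ({"go", "navigate"}, ["Navigate to the nearest gas station"]),
}
HELP_PHRASES = ("don't know",)  # assumed markers of a request for guidance


def target_sayings(recognized_text, common_function="navigation"):
    """Pick a target saying set from the partial recognition result."""
    text = recognized_text.lower()
    words = set(text.replace(",", " ").split())
    # Case 1: the intent points at specific functions; collect their sayings.
    matched = [
        saying
        for keywords, sayings in FUNCTIONS.values()
        if keywords & words
        for saying in sayings
    ]
    if matched:
        return matched
    # Case 2: the user asks for guidance without naming a function;
    # fall back to the saying set of a commonly used function.
    if any(phrase in text for phrase in HELP_PHRASES):
        return list(FUNCTIONS[common_function][1])
    # No intentional semantics recognized: nothing to suggest yet.
    return []


print(target_sayings("I want to listen"))
print(target_sayings("Hello, I don't know what to say"))
```

The second case here uses the commonly used function; choosing a saying set at random, the other option named in the text, would only change the fallback line.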

Step 330, according at least one saying in target saying collection, semantics recognition result is adjusted.

In the present embodiment, after determining target saying collection, according to target saying concentrate saying, to semantics recognition result into Row adjustment, will be adjusted as the result is shown on the current application page, be inputted with guiding user to complete remaining voice.

The technical solution of the present embodiment, voice saying guide device are configured with multiple application pages, each applications pages bread Include at least one function, the corresponding saying collection of each function.Based on the configuration, obtains user and inputted on the current application page Voice, semantics recognition is carried out to the voice, to be intended to property semantic if recognizing setting from the voice, from current application The corresponding saying of each function of the page and the other application page is concentrated, and determines target saying collection, and concentrate according to target saying At least one saying semantics recognition result is adjusted, thus the scene of the current application page according to locating for user and The voice that user has inputted carries out specific aim guidance to user.By according to the corresponding saying of the current application page to semantics recognition As a result it is adjusted, guidance user uses the function of the current application page, accurately knows that user thinks service to be used, carries out needle Property saying is prompted；By being adjusted according to the corresponding saying of the other application page to semantics recognition result, guidance user makes With the function of other pages, the potential demand for services of effective exploitation user carries out the prompt of specific aim saying, while avoiding user defeated Enter voice mistake, improves the use success rate of voice saying guide device.

Embodiment four

This embodiment is a further refinement of the above embodiments. It provides a preferred preset standard for determining the target saying set from the saying sets corresponding to the functions of the current application page and other application pages, namely the user's historical usage progress for each function. Optionally, the target saying set is determined from the saying sets corresponding to the functions of the current application page and other application pages according to the user's historical usage progress for each function.

The user's historical usage progress for each function is described in the above embodiments and is not repeated here.

Fig. 4 is a flowchart of a voice saying guidance method in Embodiment four of the present invention. The method for determining the target saying set under different historical usage progress is described in detail below with reference to Fig. 4.

Step 410: obtain the voice that the user inputs on the current application page, and perform semantic recognition on the voice.

Step 420: judge whether preset intentional semantics are recognized from the voice; if so, execute any one of steps 430 to 460; if not, return to step 410.

Here, preset intentional semantics are the semantics of voice that contains the user's intent, and are used to give targeted guidance according to that intent so that the user can complete the remaining voice input.

Step 430: if the user has not used any function in the current application page or other application pages, determine the target saying set corresponding to the functions in the current application page.

Optionally, if the user has not used any function, the target saying set that corresponds to the functions of the application page the user is currently on and that matches the currently recognized preset intentional semantics is determined. That is, when the current user is a new user, saying guidance is given using the target saying set corresponding to the recognized preset intentional semantics and the functions included in the application page the user is currently on.

For example, when the recognized voice input is "I want to listen" and the user is on the music page, the functions included in the music page, such as the music playback function and the music search function, are first confirmed. The sayings that match "I want to listen" are then taken from the saying sets corresponding to the functions of the music page to construct the target saying set.

Step 440: if the user has used some of the functions in the current application page and/or other application pages, determine the target saying set corresponding to the unused functions.

Optionally, if the user has used some but not all of the functions, saying guidance is given according to the currently recognized preset intentional semantics and the target saying set corresponding to the functions the user has not used.

For example, when the current user has used some but not all of the functions and enters the music page, the functions the user has not used are first confirmed from the statistics in Table 2. If the user has not used the music playback function on the music page and the recognized preset intentional semantics are "I want to listen", the target saying set is constructed from the saying set corresponding to the music playback function, so as to guide the user to the unused function.

Step 450: if the user has used all of the functions in the current application page and other application pages, determine the target saying set corresponding to the user's commonly used function in the current application page, where that commonly used function is determined according to the user's historical usage counts for each function in the current page.

Optionally, if the user has used all of the functions in the speech recognition system, the target saying set corresponding to the user's commonly used function in the current application page is determined according to the recognized preset intentional semantics (for commonly used functions, see the statistics in Table 3).

For example, when the current user has used all of the functions and enters the music page, the user's commonly used function is first confirmed. If the user's commonly used function on the music page is the song playback function and the recognized preset intentional semantics are "I want to listen", the user evidently wants a function related to "listen" on the music page. Since the saying set corresponding to the music playback function on the current page contains the recognized preset intentional semantics, the target saying set is constructed from that saying set.
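The branching of steps 430 to 450 on historical usage progress can be sketched as below. This is an illustration under stated assumptions: the data layout (`pages` mapping each page to its functions and saying sets, `usage_counts` holding the current user's per-function counts) is hypothetical, function names are assumed unique across pages, and the filtering by recognized intent and the update of step 460 are omitted for brevity.

```python
def select_by_progress(current_page, pages, usage_counts):
    """Choose sayings according to how much of the system the user has used.

    pages: page -> {function: [sayings]} (hypothetical layout)
    usage_counts: function -> number of times this user has used it
    """
    all_functions = {f: s for funcs in pages.values() for f, s in funcs.items()}
    used = {f for f in all_functions if usage_counts.get(f, 0) > 0}
    if not used:
        # Step 430: new user, suggest the functions of the current page.
        return [s for sayings in pages[current_page].values() for s in sayings]
    if used != set(all_functions):
        # Step 440: some functions are still unused, suggest those.
        return [s for f, sayings in all_functions.items() if f not in used for s in sayings]
    # Step 450: everything has been used, suggest the most-used function
    # on the current page.
    common = max(pages[current_page], key=lambda f: usage_counts[f])
    return list(pages[current_page][common])


pages = {
    "music": {
        "play": ["I want to listen to a popular song"],
        "search": ["Search for a singer"],
    },
    "radio": {"fm": ["I want to listen to traffic radio"]},
}
print(select_by_progress("music", pages, {}))
print(select_by_progress("music", pages, {"search": 2}))
print(select_by_progress("music", pages, {"play": 5, "search": 1, "fm": 3}))
```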

Step 460: if the user has used all of the functions in the current application page and other application pages, determine an updated target saying set and/or the target saying set corresponding to a newly added target function, where the updated target saying set is updated according to new sayings and/or the common sayings of other users in the current application page.

In this embodiment, if the user has used all of the functions, the target saying set is updated according to new sayings and/or the common sayings of other users, or the user's current target saying set is determined from the saying set corresponding to a function newly launched on the voice saying guide device.

Optionally, if the user has used all of the functions in the current application page and other application pages, determining the updated target saying set and/or the target saying set corresponding to a newly added target function comprises:

if the user has used all of the functions in the current application page and other application pages, and a pause of a preset duration or a hesitation particle is recognized from the voice, determining the updated target saying set and/or the target saying set corresponding to a newly added target function (for common sayings, see the statistics in Table 4).

In this optional technical solution, when a user who has used all of the functions starts voice input and a pause of the preset duration or a hesitation particle such as "um..." or "uh..." is detected, the target saying set is updated according to the new sayings and/or the common sayings of other users, or the user's current target saying set is determined from the saying set corresponding to the newly launched function. For example, the preset duration may be set to 2 seconds.
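A possible form of the pause and hesitation check is sketched below. The token format (word plus end time in seconds, as a recognizer with timestamps might provide) and the English filler list are assumptions; only the 2-second duration comes from the example in the text.

```python
FILLERS = ("um", "uh", "er", "hmm")  # assumed stand-ins for hesitation particles
PAUSE_SECONDS = 2.0  # the example preset duration from the text


def needs_updated_guidance(tokens, now):
    """Return True when the user has paused or hesitated mid-utterance.

    tokens: list of (word, end_time_seconds) from a hypothetical recognizer
    now: current time in seconds on the same clock
    """
    if not tokens:
        return False  # the user has not started speaking yet
    last_word, last_end = tokens[-1]
    # Drop trailing dots or an ellipsis before comparing with the filler list.
    if last_word.lower().strip(".…") in FILLERS:
        return True
    return now - last_end >= PAUSE_SECONDS


print(needs_updated_guidance([("I", 0.3), ("want", 0.6)], now=3.0))
print(needs_updated_guidance([("I", 0.3), ("um...", 0.9)], now=1.0))
```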

Step 470: supplement the semantic recognition result with at least one saying in the target saying set, or correct the semantic recognition result using at least one saying in the target saying set.

The semantic recognition result already displayed on the screen is adjusted, which includes supplementing or correcting it according to the sayings in the target saying set. When the result is adjusted, sayings from the target saying set may be added before, within, or after the semantic recognition result.

For example, if the current semantic recognition result is "I want to listen" and the determined target saying set contains "I want to listen to traffic radio", the recognition result is supplemented with that saying to guide the user in completing the remaining voice input.

For example, if the current semantic recognition result is "I want to listen to happy door songs" and the determined target saying set contains "I want to listen to hit songs", the recognition result is corrected with that saying to guide the user to re-enter the correct voice input.
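The supplement and correct operations of step 470 can be sketched as follows. The text does not say how a saying is matched to the recognition result; this sketch assumes prefix matching for supplementing and string similarity from Python's standard `difflib` module for correcting, with a hypothetical cutoff of 0.6.

```python
import difflib


def adjust(recognized, target_sayings, cutoff=0.6):
    """Supplement or correct a recognition result using the target saying set.

    Returns a (kind, text) pair where kind is "supplement", "correct"
    or "unchanged".
    """
    # Supplement: a saying that begins with the recognized text completes it.
    for saying in target_sayings:
        if saying.lower().startswith(recognized.lower()) and len(saying) > len(recognized):
            return "supplement", saying
    # Correct: otherwise take the most similar saying, if it is close enough.
    close = difflib.get_close_matches(recognized, target_sayings, n=1, cutoff=cutoff)
    if close:
        return "correct", close[0]
    return "unchanged", recognized


print(adjust("I want to listen", ["I want to listen to traffic radio"]))
print(adjust("I want to listen to happy door songs", ["I want to listen to hit songs"]))
```

This sketch only appends to the end of the recognized text; adding sayings before or within the result, which the text also allows, would need a different matching rule.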

Step 480: display the semantic recognition result and the at least one saying in the target saying set in visually distinct ways.

After the semantic recognition result is adjusted according to at least one saying in the target saying set, the final adjusted result is displayed on the screen, with the semantic recognition result and the saying from the target saying set shown differently. For example, the at least one saying in the target saying set is shown in bold or highlighted, so that the user can clearly tell the input content from the guidance content and complete the correct voice input by following the guidance.

In the technical solution of this embodiment, the target saying set is determined from the saying sets corresponding to the functions of the current application page and other application pages according to the user's historical usage progress for each function. Targeted saying guidance is thus given according to the recognized preset intentional semantics, the scene of the current application page, and the historical usage progress, which improves the accuracy with which user intent is understood, raises the success rate of the voice assistant, and improves the user experience.

Embodiment five

Fig. 5 is a schematic structural diagram of a voice saying guide device provided in Embodiment five of the present invention. The voice saying guide device comprises: an application page acquisition module 510, a saying set acquisition module 520, a target saying set determination module 530, and a saying display module 540.

The application page acquisition module 510 is configured to obtain the current application page that the user has entered;

The saying set acquisition module 520 is configured to obtain, from the current application page and other application pages respectively, the saying set corresponding to each function, wherein the current application page and the other application pages each have at least one function;

The target saying set determination module 530 is configured to determine a target saying set from the saying sets corresponding to the functions;

The saying display module 540 is configured to display, on the current application page, at least one saying in the target saying set to guide the user in voice input.

In the technical solution of this embodiment of the present invention, before the user's voice input on the current application page is obtained, the current application page that the user has entered is obtained, a target saying set is determined from the saying sets corresponding to the functions of the current application page and other application pages, and at least one saying in the target saying set is displayed, so that saying guidance is given according to the scene of the current application page. Displaying the sayings of the current application page guides the user to the functions of the current page, so that the service the user wants is identified accurately and targeted saying prompts are given. Displaying the sayings of other application pages guides the user to the functions of other pages, which uncovers the user's potential service needs and likewise gives targeted saying prompts. Both also help the user avoid voice input errors and improve the usage success rate of the voice saying guide device.

Optionally, the target saying set determination module 530 is specifically configured to:

determine the target saying set from the saying sets corresponding to the functions according to the user's historical usage progress for each function.

Optionally, the target saying set determination module 530 comprises:

a first target saying set determination unit, configured to determine, if the user has not used any function in the current application page or other application pages, the target saying set corresponding to the functions in the current application page;

a second target saying set determination unit, configured to determine, if the user has used some of the functions in the current application page and/or other application pages, the target saying set corresponding to the unused functions;

a third target saying set determination unit, configured to determine, if the user has used all of the functions in the current application page and other application pages, the target saying set corresponding to the user's commonly used function in the current application page, where that commonly used function is determined according to the user's historical usage counts for each function in the current page;

a fourth target saying set determination unit, configured to determine, if the user has used all of the functions in the current application page and other application pages, an updated target saying set and/or the target saying set corresponding to a newly added target function, where the updated target saying set is updated according to new sayings and/or the common sayings of other users in the current application page.

Optionally, the fourth target saying set determination unit is specifically configured to:

if the user has used all of the functions in the current application page and other application pages, and the length of time for which the user has not used the speech recognition function exceeds a preset duration or the user's frequency of use is below a preset frequency, determine the updated target saying set and/or the target saying set corresponding to a newly added target function.
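The inactivity condition of the fourth unit can be sketched as follows. The text gives no values for the preset duration or preset frequency, so the 30-day idle limit, the 7-day window and the minimum of 3 uses are purely hypothetical defaults, as is the function name `should_refresh`.

```python
from datetime import datetime, timedelta


def should_refresh(use_times, now,
                   max_idle=timedelta(days=30),
                   window=timedelta(days=7),
                   min_uses=3):
    """Decide whether to refresh the target saying set for a returning user.

    use_times: datetimes at which the user used the speech recognition function
    All thresholds are assumed values, not taken from the text.
    """
    if not use_times:
        return True  # no recorded use is treated as idle
    # Condition 1: the time since the last use exceeds the preset duration.
    if now - max(use_times) > max_idle:
        return True
    # Condition 2: the recent frequency of use is below the preset frequency.
    recent = sum(1 for t in use_times if now - t <= window)
    return recent < min_uses


now = datetime(2019, 5, 21)
print(should_refresh([datetime(2019, 3, 1)], now))
print(should_refresh([now - timedelta(days=d) for d in (1, 2, 3)], now))
```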

The voice saying guide device provided in this embodiment of the present invention can execute the voice saying guidance method provided in any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the executed method.

Embodiment six

Fig. 6 is a schematic structural diagram of a voice saying guide device provided in Embodiment six of the present invention. The voice saying guide device comprises: a semantic recognition module 610, a target saying set determination module 620, and a recognition result adjustment module 630.

The semantic recognition module 610 is configured to obtain the voice that the user inputs on the current application page, and to perform semantic recognition on the voice;

The target saying set determination module 620 is configured to determine, if preset intentional semantics are recognized from the voice, a target saying set from the saying sets corresponding to the functions of the current application page and other application pages;

The recognition result adjustment module 630 is configured to adjust the semantic recognition result according to at least one saying in the target saying set.

In the technical solution of this embodiment of the present invention, the voice that the user inputs on the current application page is obtained and semantic recognition is performed on it. If preset intentional semantics are recognized from the voice, a target saying set is determined from the saying sets corresponding to the functions of the current application page and other application pages, and the semantic recognition result is adjusted according to at least one saying in the target saying set. Targeted guidance is thereby given according to the scene of the current application page and the voice the user has already input. Adjusting the semantic recognition result according to the sayings of the current application page guides the user to the functions of the current page, so that the service the user wants is identified accurately and targeted saying prompts are given. Adjusting it according to the sayings of other application pages guides the user to the functions of other pages, which uncovers the user's potential service needs and likewise gives targeted saying prompts. Both also help the user avoid voice input errors and improve the usage success rate of the voice saying guide device.

Optionally, the target saying set determination module 620 is specifically configured to:

determine the target saying set from the saying sets corresponding to the functions of the current application page and other application pages according to the user's historical usage progress for each function.

Optionally, the target saying set determination module 620 comprises:

a third target saying set determination unit, configured to determine, if the user has used all of the functions in the current application page and other application pages, the target saying set corresponding to the user's commonly used function in the current application page, where that commonly used function is determined according to the user's historical usage counts for each function in the current page;

a fourth target saying set determination unit, configured to determine, if the user has used all of the functions in the current application page and other application pages, an updated target saying set and/or the target saying set corresponding to a newly added target function, where the updated target saying set is updated according to new sayings and/or the common sayings of other users in the current application page.

If the user has used all of the functions in the current application page and other application pages, and a pause of a preset duration or a hesitation particle is recognized from the voice, the updated target saying set and/or the target saying set corresponding to a newly added target function is determined.

Optionally, the recognition result adjustment module 630 comprises:

a recognition result adjustment unit, configured to supplement the semantic recognition result with at least one saying in the target saying set, or to correct the semantic recognition result using at least one saying in the target saying set;

a distinct display unit, configured to display the semantic recognition result and the at least one saying in the target saying set in visually distinct ways.

The voice saying guide device provided in this embodiment of the present invention can execute the voice saying guidance method provided in any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the executed method.

It is worth noting that, in the above embodiments of the voice saying guide device, the units and modules are divided only according to functional logic; the division is not limited to the above as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for ease of distinguishing them from one another and are not intended to limit the protection scope of the present invention.

Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the appended claims.

Claims

1. A voice saying guidance method, characterized by comprising:

obtaining the current application page that a user has entered;

obtaining, from the current application page and other application pages respectively, the saying set corresponding to each function, wherein the current application page and the other application pages each have at least one function;

displaying, on the current application page, at least one saying in the target saying set to guide the user in voice input.

2. The method according to claim 1, characterized in that determining the target saying set from the saying sets corresponding to the functions comprises:

3. The method according to claim 2, characterized in that determining the target saying set from the saying sets corresponding to the functions according to the user's historical usage progress for each function comprises any one of the following operations:

if the user has not used any function in the current application page or other application pages, determining the target saying set corresponding to the functions in the current application page;

if the user has used some of the functions in the current application page and/or other application pages, determining the target saying set corresponding to the unused functions;

if the user has used all of the functions in the current application page and other application pages, determining the target saying set corresponding to the user's commonly used function in the current application page, where that commonly used function is determined according to the user's historical usage counts for each function in the current page;

if the user has used all of the functions in the current application page and other application pages, determining an updated target saying set and/or the target saying set corresponding to a newly added target function, where the updated target saying set is updated according to new sayings and/or the common sayings of other users in the current application page.

4. The method according to claim 3, characterized in that, if the user has used all of the functions in the current application page and other application pages, determining the updated target saying set and/or the target saying set corresponding to a newly added target function comprises:

if the user has used all of the functions in the current application page and other application pages, and the length of time for which the user has not used the speech recognition function exceeds a preset duration or the user's frequency of use is below a preset frequency, determining the updated target saying set and/or the target saying set corresponding to a newly added target function.

5. A voice saying guidance method, characterized by comprising:

6. The method according to claim 5, characterized in that determining the target saying set from the saying sets corresponding to the functions of the current application page and other application pages comprises:

7. The method according to claim 6, characterized in that determining the target saying set from the saying sets corresponding to the functions of the current application page and other application pages according to the user's historical usage progress for each function comprises any one of the following operations:

8. The method according to claim 7, characterized in that, if the user has used all of the functions in the current application page and other application pages, determining the updated target saying set and/or the target saying set corresponding to a newly added target function comprises:

if the user has used all of the functions in the current application page and other application pages, and a pause of a preset duration or a hesitation particle is recognized from the voice, determining the updated target saying set and/or the target saying set corresponding to a newly added target function.

9. The method according to claim 5, characterized in that adjusting the semantic recognition result according to at least one saying in the target saying set comprises:

supplementing the semantic recognition result with at least one saying in the target saying set, or correcting the semantic recognition result using at least one saying in the target saying set;

displaying the semantic recognition result and the at least one saying in the target saying set in visually distinct ways.

10. A voice saying guide device, characterized by comprising:

a saying set acquisition module, configured to obtain, from the current application page and other application pages respectively, the saying set corresponding to each function, wherein the current application page and the other application pages each have at least one function;

a saying display module, configured to display, on the current application page, at least one saying in the target saying set to guide the user in voice input.

11. The device according to claim 10, characterized in that the target saying set determination module is specifically configured to:

12. The device according to claim 11, characterized in that the target saying set determination module comprises:

a first target saying set determination unit, configured to determine, if the user has not used any function in the current application page or other application pages, the target saying set corresponding to the functions in the current application page;

a second target saying set determination unit, configured to determine, if the user has used some of the functions in the current application page and/or other application pages, the target saying set corresponding to the unused functions;

a third target saying set determination unit, configured to determine, if the user has used all of the functions in the current application page and other application pages, the target saying set corresponding to the user's commonly used function in the current application page, where that commonly used function is determined according to the user's historical usage counts for each function in the current page;

a fourth target saying set determination unit, configured to determine, if the user has used all of the functions in the current application page and other application pages, an updated target saying set and/or the target saying set corresponding to a newly added target function, where the updated target saying set is updated according to new sayings and/or the common sayings of other users in the current application page.

13. A voice saying guide device, characterized by comprising:

a semantic recognition module, configured to obtain the voice that a user inputs on the current application page, and to perform semantic recognition on the voice;

a target saying set determination module, configured to determine, if preset intentional semantics are recognized from the voice, a target saying set from the saying sets corresponding to the functions of the current application page and other application pages;

a recognition result adjustment module, configured to adjust the semantic recognition result according to at least one saying in the target saying set.

14. The device according to claim 13, characterized in that the target saying set determination module is specifically configured to:

15. The device according to claim 14, characterized in that the target saying set determination module comprises:

a third target saying set determination unit, configured to determine, if the user has used all of the functions in the current application page and other application pages, the target saying set corresponding to the user's commonly used function in the current application page, where that commonly used function is determined according to the user's historical usage counts for each function in the current page;