CN104535071A - Voice navigation method and device - Google Patents
Voice navigation method and device Download PDFInfo
- Publication number
- CN104535071A (publication); CN201410742287.5A / CN201410742287A (application)
- Authority
- CN
- China
- Prior art keywords
- point
- identification result
- interest
- voice
- navigation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 57
- 230000006870 function Effects 0.000 claims description 15
- 238000012544 monitoring process Methods 0.000 claims description 7
- 230000009467 reduction Effects 0.000 claims description 6
- 230000008569 process Effects 0.000 abstract description 12
- 238000010168 coupling process Methods 0.000 description 5
- 238000005859 coupling reaction Methods 0.000 description 5
- 230000008878 coupling Effects 0.000 description 4
- 230000001419 dependent effect Effects 0.000 description 4
- 230000004044 response Effects 0.000 description 4
- 238000004891 communication Methods 0.000 description 3
- 230000001143 conditioned effect Effects 0.000 description 3
- 238000001514 detection method Methods 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 238000000605 extraction Methods 0.000 description 3
- 230000037007 arousal Effects 0.000 description 2
- 230000009286 beneficial effect Effects 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 238000011161 development Methods 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 238000009434 installation Methods 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 230000002452 interceptive effect Effects 0.000 description 1
- 230000002045 lasting effect Effects 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 238000011002 quantification Methods 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/26—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
- G01C21/34—Route searching; Route guidance
- G01C21/36—Input/output arrangements for on-board computers
- G01C21/3605—Destination input or retrieval
- G01C21/3608—Destination input or retrieval using speech input, e.g. using speech recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9537—Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Remote Sensing (AREA)
- Databases & Information Systems (AREA)
- Radar, Positioning & Navigation (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Automation & Control Theory (AREA)
- General Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Navigation (AREA)
Abstract
The invention provides a voice navigation method and device. The method comprises the following steps: collecting a first voice instruction of a user; determining a speech recognition model according to a current navigation state; performing speech recognition on the first voice instruction of the user using the speech recognition model to obtain a first speech recognition result; and performing a navigation operation according to the first speech recognition result. The method and device solve the problems that the existing navigation process has a high operation cost and that voice navigation efficiency is low.
Description
[Technical Field]
The present invention relates to human-computer interaction technology, and in particular to a voice navigation method and device.
[Background Art]
With the widespread civilian adoption of the Global Positioning System (GPS) and the development of Internet communication technology, navigation clients are used more and more. A navigation client can provide navigation functions such as route planning and guidance to a user based on an electronic map. Because it brings great convenience to people's travel, it has gradually become an indispensable part of daily life. As attention to and usage of navigation clients grows, people are no longer satisfied with the basic navigation functions a client provides, but expect more accurate and more user-friendly navigation services.
The voice prompt function is an important component of the navigation functions provided by a navigation client. Owing to the particularity of navigation clients — a driver must concentrate on driving and pay attention to road conditions, and therefore cannot frequently look at the client interface to obtain route information — the voice prompt function is especially important. However, existing navigation clients offer voice only as output when providing a navigation service: the user still needs to manually enter destination information before setting out, and if the user needs other navigation information while driving, he or she must stop the vehicle to perform manual operations. The operation cost of the current navigation process is therefore high, and the processing efficiency of voice navigation is low.
[Summary of the Invention]
In view of this, embodiments of the present invention provide a voice navigation method and device, in order to solve the prior-art problems that the operation cost of the navigation process is high and the processing efficiency of voice navigation is low.
In one aspect, an embodiment of the present invention provides a voice navigation method, comprising:
collecting a first voice instruction of a user;
determining a speech recognition model according to a current navigation state;
performing speech recognition on the first voice instruction of the user using the speech recognition model, to obtain a first speech recognition result;
performing a navigation operation according to the first speech recognition result.
In the aspect above and any possible implementation thereof, an implementation is further provided in which collecting the first voice instruction of the user comprises:
monitoring a second voice instruction of the user;
performing speech recognition on the second voice instruction using a voice wake-up model, to obtain a second speech recognition result;
if the second speech recognition result satisfies a preset wake-up condition, collecting the first voice instruction of the user.
In the aspect above and any possible implementation thereof, an implementation is further provided in which determining the speech recognition model according to the current navigation state comprises:
if the current navigation state is "before navigation has started", determining that the speech recognition model is a first model;
wherein the first model is used to recognize a point-of-interest search instruction contained in the first voice instruction.
In the aspect above and any possible implementation thereof, an implementation is further provided in which determining the speech recognition model according to the current navigation state comprises:
if the current navigation state is "navigating", determining that the speech recognition model is a second model;
wherein the second model is used to recognize at least one of the following instructions contained in the first voice instruction: a client control instruction, a navigation prompt instruction, and a point-of-interest search instruction.
In the aspect above and any possible implementation thereof, an implementation is further provided in which performing the navigation operation according to the first speech recognition result comprises:
if the point-of-interest search instruction contained in the first speech recognition result is a point-of-interest name, obtaining and outputting a search result matching the point-of-interest name; or,
if the point-of-interest search instruction contained in the first speech recognition result is a point-of-interest category name, obtaining and outputting a search result matching the point-of-interest category name.
In the aspect above and any possible implementation thereof, an implementation is further provided in which performing the navigation operation according to the first speech recognition result comprises:
if the first speech recognition result contains a client control instruction, performing at least one of the following controls on the client according to the client control instruction: zooming the map in or out, increasing or decreasing the volume, and turning the projection function on or off;
if the first speech recognition result contains a navigation prompt instruction, obtaining and outputting, according to the navigation prompt instruction, at least one of the following items of navigation prompt information: remaining-distance information to the destination, required-time information to the destination, traffic information, and current-road information;
if the first speech recognition result contains a point-of-interest search instruction, obtaining and outputting, according to the point-of-interest search instruction, a search result matching the point-of-interest information in the point-of-interest search instruction.
In another aspect, an embodiment of the present invention provides a voice navigation device, comprising:
a voice collection unit, configured to collect a first voice instruction of a user;
a model processing unit, configured to determine a speech recognition model according to a current navigation state;
a speech recognition unit, configured to perform speech recognition on the first voice instruction of the user using the speech recognition model, to obtain a first speech recognition result;
a navigation execution unit, configured to perform a navigation operation according to the first speech recognition result.
In the aspect above and any possible implementation thereof, an implementation is further provided in which:
the voice collection unit is further configured to monitor a second voice instruction of the user;
the device further comprises:
a voice wake-up unit, configured to perform speech recognition on the second voice instruction using a voice wake-up model, to obtain a second speech recognition result; and, if the second speech recognition result satisfies a preset wake-up condition, to trigger the voice collection unit to collect the first voice instruction of the user.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the model processing unit is specifically configured to:
if the current navigation state is "before navigation has started", determine that the speech recognition model is a first model;
wherein the first model is used to recognize a point-of-interest search instruction contained in the first voice instruction.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the model processing unit is specifically configured to:
if the current navigation state is "navigating", determine that the speech recognition model is a second model;
wherein the second model is used to recognize at least one of the following instructions contained in the first voice instruction: a client control instruction, a navigation prompt instruction, and a point-of-interest search instruction.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the navigation execution unit is specifically configured to:
if the point-of-interest search instruction contained in the first speech recognition result is a point-of-interest name, obtain and output a search result matching the point-of-interest name; or,
if the point-of-interest search instruction contained in the first speech recognition result is a point-of-interest category name, obtain and output a search result matching the point-of-interest category name.
In the aspect above and any possible implementation thereof, an implementation is further provided in which the navigation execution unit is specifically configured to:
if the first speech recognition result contains a client control instruction, perform at least one of the following controls on the client according to the client control instruction: zooming the map in or out, increasing or decreasing the volume, and turning the projection function on or off;
if the first speech recognition result contains a navigation prompt instruction, obtain and output, according to the navigation prompt instruction, at least one of the following items of navigation prompt information: remaining-distance information to the destination, required-time information to the destination, traffic information, and current-road information;
if the first speech recognition result contains a point-of-interest search instruction, obtain and output, according to the point-of-interest search instruction, a search result matching the point-of-interest information in the point-of-interest search instruction.
As can be seen from the above technical solutions, the embodiments of the present invention have the following beneficial effects:
An embodiment of the present invention collects a first voice instruction of a user; determines a speech recognition model according to a current navigation state; performs speech recognition on the first voice instruction of the user using the speech recognition model to obtain a first speech recognition result; and performs a navigation operation according to the first speech recognition result. Compared with the prior art, the technical solution provided by the embodiments of the present invention can automatically perform navigation operations according to the user's voice instructions and thereby provide a navigation service to the user, realizing the navigation function without requiring any manual operation. It therefore solves the prior-art problems that the operation cost of the navigation process is high and the processing efficiency of voice navigation is low: it reduces the operation cost of the navigation process, improves the processing efficiency of voice navigation, and helps ensure driving safety.
[Brief Description of the Drawings]
To explain the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings used in the embodiments are briefly described below. Obviously, the drawings described below illustrate only some embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a diagram of an exemplary system in which the technical solution provided by an embodiment of the present invention is used;
Fig. 2 is a schematic flowchart of the voice navigation method provided by an embodiment of the present invention;
Fig. 3 is a functional block diagram of the voice navigation device provided by an embodiment of the present invention.
[Detailed Description]
For a better understanding of the technical solution of the present invention, the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
It should be clear that the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
The terms used in the embodiments of the present invention are for the purpose of describing specific embodiments only and are not intended to limit the present invention. The singular forms "a", "said", and "the" used in the embodiments of the present invention and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" used herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B both exist, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects.
It should be understood that although the terms "first", "second", etc. may be used in the embodiments of the present invention to describe speech recognition results, these results should not be limited by the terms; the terms are only used to distinguish the results from each other. For example, without departing from the scope of the embodiments of the present invention, the first speech recognition result may also be called the second speech recognition result, and similarly the second speech recognition result may also be called the first speech recognition result.
Depending on the context, the word "if" as used herein may be interpreted as "when" or "upon" or "in response to determining" or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined" or "in response to determining" or "when (the stated condition or event) is detected" or "in response to detecting (the stated condition or event)".
The system in which the technical solution provided by the embodiments of the present invention is used is shown in Fig. 1. It consists mainly of a client and a server. The method and device provided by the embodiments of the present invention are implemented on the client side and are mainly used to provide a voice navigation service to a user according to the user's voice instructions, thereby realizing the voice navigation function of the client.
An embodiment of the present invention provides a voice navigation method. Please refer to Fig. 2, which is a schematic flowchart of the voice navigation method provided by the embodiment. As shown in the figure, the method comprises the following steps:
S201: collect a first voice instruction of a user.
S202: determine a speech recognition model according to a current navigation state.
S203: perform speech recognition on the first voice instruction of the user using the speech recognition model, to obtain a first speech recognition result.
S204: perform a navigation operation according to the first speech recognition result.
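The four steps S201–S204 can be read as a simple pipeline. The following Python sketch illustrates that control flow only; every function body, model name, and instruction string is an assumption for illustration, not the patent's implementation:

```python
# Illustrative sketch of the S201-S204 pipeline (hypothetical names throughout).

def collect_first_voice_instruction():
    # S201: in a real client this would read audio from the microphone.
    return "I want to go to Beihai Park"

def determine_model(navigation_state):
    # S202: pick the recognition model from the current navigation state.
    return "first_model" if navigation_state == "before_navigation" else "second_model"

def recognize(instruction, model):
    # S203: stand-in for real speech recognition with the chosen model.
    return {"model": model, "text": instruction}

def perform_navigation_operation(result):
    # S204: act on the recognition result (here, just describe the action).
    return f"search POI from: {result['text']}"

def voice_navigate(navigation_state):
    instruction = collect_first_voice_instruction()
    model = determine_model(navigation_state)
    result = recognize(instruction, model)
    return perform_navigation_operation(result)
```

The point of the sketch is that S202 sits between collection and recognition: the model choice depends on state, not on the utterance itself.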
Based on the above voice navigation method, S201 of the method of the embodiment is now described in detail. This step may specifically comprise:
First, the client monitors a second voice instruction of the user. Then, the client performs speech recognition on the second voice instruction using a voice wake-up model, to obtain a second speech recognition result. Finally, the client judges whether the second speech recognition result satisfies a preset wake-up condition; if it does, the client collects the first voice instruction of the user.
It should be noted that, in order to avoid misrecognizing the user's voice instructions during voice navigation, a real-time voice wake-up function is needed: while the client is in the voice wake-up state, it enters the speech recognition state — and only then starts receiving the user's voice instructions — if a voice instruction it hears satisfies the wake-up condition.
For example, the client may open a listener thread to continuously monitor the user's second voice instruction. The client performs speech recognition on the second voice instruction it hears using a voice wake-up model, to obtain a second speech recognition result; the voice wake-up model is used to recognize a wake-up word contained in the second voice instruction. If the client judges that the second speech recognition result contains the preset wake-up word, the second speech recognition result satisfies the preset wake-up condition, and the client switches from the voice wake-up state to the speech recognition state, so that it can collect the first voice instruction of the user.
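As a minimal sketch of the wake-up gate just described — the wake word, state names, and substring check are all invented for illustration; the patent only requires that the second recognition result satisfy a preset wake-up condition:

```python
# Hypothetical wake-word gate: the client stays in the "wake" state until the
# second voice instruction's recognition result contains the preset wake word.

WAKE_WORD = "hello navigator"  # assumed wake word, not specified by the patent

def meets_wakeup_condition(second_recognition_result: str) -> bool:
    # Simplistic check: wake word appears anywhere in the recognized text.
    return WAKE_WORD in second_recognition_result.lower()

def next_state(current_state: str, second_recognition_result: str) -> str:
    # Transition from wake state to recognition state only on a wake-word hit.
    if current_state == "wake" and meets_wakeup_condition(second_recognition_result):
        return "recognition"  # now ready to collect the first voice instruction
    return current_state
```

A real wake-up model would score acoustic features rather than match text, but the state transition it drives is the same.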
Preferably, the client may use an audio collection device to collect the first voice instruction.
For example, when the client is installed on a mobile phone or a tablet computer, the client may use the microphone to collect the first voice instruction.
Based on the above voice navigation method, S202 of the method of the embodiment is now described in detail. This step may specifically comprise:
The client judges its current navigation state. If the current navigation state is "before navigation has started", it determines that the speech recognition model is a first model, where the first model is used to recognize a point-of-interest search instruction contained in the first voice instruction. If the current navigation state is "navigating", it determines that the speech recognition model is a second model, where the second model is used to recognize at least one of the following instructions contained in the first voice instruction: a client control instruction, a navigation prompt instruction, and a point-of-interest search instruction.
It should be noted that, in the embodiments of the present invention, different speech recognition models are used in the two navigation states, i.e. before navigation has started and while navigating. Before navigation has started, a point-of-interest search must be performed according to the point-of-interest information specified by the user in the first voice instruction, so the first model is needed to recognize the point-of-interest search instruction contained in the first voice instruction. While navigating, client control, a navigation prompt, or a point-of-interest search must be performed according to the relevant instruction specified by the user in the first voice instruction, so the second model is needed to recognize the relevant instruction contained in the first voice instruction.
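The state-dependent model choice can be summarized as a small lookup. This is a sketch under assumed names — the patent fixes only which instruction *types* each model covers, not how the models are represented:

```python
# Sketch of state-dependent model selection. Each "model" is represented here
# simply by the set of instruction types it can recognize.

FIRST_MODEL = {"poi_search"}  # before navigation: POI search only
SECOND_MODEL = {"client_control", "navigation_prompt", "poi_search"}  # while navigating

def select_model(navigation_state: str) -> set:
    if navigation_state == "before_navigation":
        return FIRST_MODEL
    if navigation_state == "navigating":
        return SECOND_MODEL
    raise ValueError(f"unknown navigation state: {navigation_state}")
```

Restricting the pre-navigation model to POI search narrows the recognition space, which is one plausible reason the scheme improves accuracy.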
Based on the above voice navigation method, S203 of the method of the embodiment is now described in detail. This step may specifically comprise:
If the current navigation state of the client is "before navigation has started", the client performs speech recognition on the collected first voice instruction of the user using the first model; or, if the current navigation state of the client is "navigating", the client performs speech recognition on the collected first voice instruction of the user using the second model.
For example, the method by which the client performs speech recognition on the first voice instruction using the first model or the second model may include, but is not limited to, the following:
First, the client pre-processes the first voice instruction; the pre-processing may include filtering, sampling and quantization, windowing, endpoint detection, pre-emphasis, and so on. Then, the client extracts feature information from the pre-processed first voice instruction. Finally, if the first model is used for recognition, the client matches the extracted feature information against the feature information in the first model and takes the text information corresponding to the feature information with the highest matching score as the first speech recognition result. If the second model is used for recognition, the client matches the extracted feature information against the feature information in the second model and takes the feature information with the highest matching score as the first speech recognition result.
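The final matching step above — score the extracted features against each candidate and keep the best — can be sketched as follows. The dot-product scorer and the template vectors are purely illustrative stand-ins; real recognizers use acoustic and language models:

```python
# Toy version of the matching step: score the extracted feature vector against
# each candidate instruction's template and return the highest-scoring one.

def match_score(features, template):
    # Illustrative similarity: a plain dot product of equal-length vectors.
    return sum(f * t for f, t in zip(features, template))

def best_match(features, model_templates):
    # model_templates: {instruction_text: template_vector}
    return max(model_templates, key=lambda text: match_score(features, model_templates[text]))
```

Swapping `model_templates` between the first and the second model is exactly where S202's choice feeds into S203.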
It should be noted that when the first model is used to perform speech recognition on the first voice instruction, the matching score between the feature information of the first voice instruction and the feature information of a point-of-interest search instruction in the first model can be made higher, so that the point-of-interest search instruction is taken as the first speech recognition result. Likewise, when the second model is used to perform speech recognition on the first voice instruction, the matching score between the feature information of the first voice instruction and the feature information of the relevant instructions in the second model can be made higher, so that those relevant instructions are taken as the first speech recognition result. In the embodiments of the present invention, using different speech recognition models for different current navigation states allows the user's navigation needs to be recognized from the user's voice instructions in a targeted way, which improves the accuracy and reliability of voice navigation.
Based on the above voice navigation method, S204 of the method of the embodiment is now described in detail. This step may specifically comprise:
When the first speech recognition result obtained using the first model contains a point-of-interest search instruction: if the point-of-interest search instruction is a point-of-interest name, the client obtains and outputs a search result matching the point-of-interest name; or, if the point-of-interest search instruction is a point-of-interest category name, the client obtains and outputs a search result matching the point-of-interest category name.
For example, the method by which the client obtains a search result matching the point-of-interest name or point-of-interest category name may include, but is not limited to, the following two:
First: the client may search a local database according to the point-of-interest name or category name, to obtain a matching search result.
Second: as shown in Fig. 1, if the client finds no search result in the local database, the client may send the server a search request for the point-of-interest name or category name, in order to obtain a matching search result from the server.
In addition, after obtaining the search result, the client may broadcast it by voice, or may also display it, to realize the output of the search result.
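The two lookup paths — local database first, server as fallback — reduce to a short function. In this sketch both stores are stubbed with dictionaries and the entries are invented; a real client would issue a network request where the server lookup appears:

```python
# Sketch of the two-stage POI lookup: try the local database first, then fall
# back to a server request. Data below is made up for illustration.

LOCAL_DB = {"Beihai Park": {"address": "1 Wenjin St", "distance_km": 3.2}}   # assumed
SERVER_DB = {"nearby KFC": {"address": "5 Xidan St", "distance_km": 0.8}}    # assumed

def search_poi(query: str):
    result = LOCAL_DB.get(query)
    if result is not None:
        return result, "local"
    result = SERVER_DB.get(query)  # stands in for a search request to the server
    if result is not None:
        return result, "server"
    return None, "none"
```

Returning the source alongside the result makes it easy for the caller to decide, for instance, whether to cache a server hit locally.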
Such as, first voice identification result is " I will go to Beihai park ", client is according to the interest point name " Beihai park " comprised in this first voice identification result, obtain the Search Results matched with " Beihai park " in the local database or in server, as this Search Results can include but not limited at least one in following information: the routing information of the address of " Beihai park ", phone, arrival " Beihai park ", with the range information of current location, arrive required for duration and mark out the electronic chart of " Beihai park ".Client can report the routing information of arrival " Beihai park ", and display marks out the electronic chart of " Beihai park " simultaneously, to realize exporting to user the Search Results matched with " Beihai park ".
Such as, first voice identification result is " I will go to neighbouring KFC ", client is according to the point of interest typonym comprised in this first voice identification result " neighbouring KFC ", obtain the Search Results matched with " neighbouring KFC " in the local database or in server, as this Search Results can include but not limited at least one in following information: the address of " neighbouring KFC ", phone, arrive the routing information of the nearest KFC of " neighbouring KFC " middle distance current location, with the range information of current location, duration required for arrival and the electronic chart marking out nearest KFC.Client can be reported and arrive this routing information, and display marks out the electronic chart of nearest KFC simultaneously, to realize exporting to user the Search Results matched with " neighbouring KFC ".
When the first speech recognition result obtained using the second model contains a client control instruction, at least one of the following controls is performed on the client according to the client control instruction: zooming the map in or out, increasing or decreasing the volume, and turning the projection function on or off.
For example, the projection function may project the electronic map onto the front windshield of the automobile, making it convenient for the user to view the map.
Or, when the first speech recognition result obtained using the second model contains a navigation prompt instruction, at least one of the following items of navigation prompt information is obtained and output according to the navigation prompt instruction: remaining-distance information to the destination, required-time information to the destination, traffic information, and current-road information.
For example, the current-road information may include, but is not limited to, the name of the current road, camera information on the current road, speed-limit information of the current road, and so on.
The client may broadcast the above navigation prompt information by voice, or may also display it.
Or, when the first speech recognition result obtained using the second model contains a point-of-interest search instruction, a search result matching the point-of-interest information in the point-of-interest search instruction is obtained and output according to the instruction.
For example, during navigation, the point-of-interest search instruction may be used to search for points of interest near the client, such as gas stations or service areas, or to update the destination of the current navigation, for example replacing the destination with "home" or "work".
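The three instruction types handled while navigating amount to a dispatch on the recognition result. In this sketch the type tags and handler outputs are illustrative placeholders, not the patent's data formats:

```python
# Sketch of dispatching the first speech recognition result while navigating.
# Each branch corresponds to one instruction type the second model recognizes.

def perform_navigation_operation(result: dict) -> str:
    kind = result["type"]
    if kind == "client_control":
        return f"control client: {result['action']}"   # e.g. zoom map, adjust volume
    if kind == "navigation_prompt":
        return f"announce: {result['info']}"           # e.g. remaining distance
    if kind == "poi_search":
        return f"search POI: {result['query']}"        # e.g. nearby gas station
    raise ValueError(f"unrecognized instruction type: {kind}")
```

Keeping the dispatch in one place means adding a new instruction type later only requires a new branch and a matching entry in the second model's vocabulary.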
In addition, after the client has executed a navigation operation according to the first voice recognition result, if no further first voice instruction is collected within a period of time, the client may return from the speech recognition state to the voice wake-up state. If the user then wants to continue using voice navigation, the wake-up word must be used again to bring the client back into the speech recognition state.
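The wake-up/recognition behavior described above amounts to a small state machine; the sketch below is a hedged illustration, and the timeout value in particular is an assumption (the patent does not specify one):

```python
# Minimal sketch of the wake-up / recognition state machine: the wake
# word enters the recognition state; inactivity falls back to wake.

WAKE, RECOGNIZING = "wake", "recognizing"
TIMEOUT_S = 10.0  # illustrative inactivity window; not specified by the patent

def step(state, event, idle_s=0.0):
    """Advance the client state for one event."""
    if state == WAKE and event == "wake_word":
        return RECOGNIZING
    if state == RECOGNIZING and event == "instruction":
        return RECOGNIZING  # stay active while instructions keep arriving
    if state == RECOGNIZING and event == "tick" and idle_s >= TIMEOUT_S:
        return WAKE         # fall back after a period of silence
    return state

s = step(WAKE, "wake_word")
s = step(s, "tick", idle_s=12.0)
```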
In embodiments of the present invention, the client is not limited to a navigation client; it may be any client that provides information in audio form to the user through voice interaction. The client may run on a navigation terminal, a smart TV, or user equipment; the user equipment may include, but is not limited to, a personal computer (PC), a personal digital assistant (PDA), a wireless handheld device, a tablet computer, a mobile phone, an MP3 player, an MP4 player, and the like.
It should be noted that steps S201 to S204 may be executed by a voice navigation apparatus, which may reside in an application on the local terminal, or in a functional unit such as a plug-in or software development kit (SDK) of an application on the local terminal; the embodiments of the present invention impose no particular limitation on this.
It will be understood that the application may be a native application (native app) installed on the terminal, or may be a web page program (web app) running in the terminal's browser; the embodiments of the present invention do not limit this.
Embodiments of the present invention further provide apparatus embodiments that implement the steps and methods of the above method embodiments.
Please refer to Fig. 3, a functional block diagram of the voice navigation apparatus provided by an embodiment of the present invention. As shown in the figure, the apparatus comprises:
a voice collection unit 301, for collecting a first voice instruction of a user;
a model processing unit 302, for determining a speech recognition model according to a current navigation state;
a voice recognition unit 303, for performing speech recognition on the user's first voice instruction using the speech recognition model, to obtain a first voice recognition result;
a navigation execution unit 304, for executing a navigation operation according to the first voice recognition result.
Preferably, the voice collection unit 301 is also used to monitor a second voice instruction of the user;
the apparatus then further comprises: a voice wake-up unit 305, for performing speech recognition on the second voice instruction using a voice wake-up model, to obtain a second voice recognition result, and, if the second voice recognition result meets a preset wake-up condition, triggering the voice collection unit to collect the user's first voice instruction.
Preferably, the model processing unit 302 is specifically configured to:
if the current navigation state is "before navigation has started", determine the speech recognition model to be the first model;
wherein the first model is used to identify a point-of-interest search instruction contained in the first voice instruction.
Preferably, the model processing unit 302 is specifically configured to:
if the current navigation state is "navigating", determine the speech recognition model to be the second model;
wherein the second model is used to identify at least one of the following instructions contained in the first voice instruction: a client control instruction, a navigation prompt instruction, and a point-of-interest search instruction.
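The state-dependent model choice described above can be sketched as picking an instruction set for the recognizer; the set contents and function name below are illustrative, not from the patent:

```python
# Sketch of the model processing unit's choice: a POI-search-only model
# before navigation starts, and a richer model while navigating.

FIRST_MODEL = {"poi_search"}  # before navigation has started
SECOND_MODEL = {"client_control", "nav_prompt", "poi_search"}  # while navigating

def select_model(navigating):
    """Pick the instruction set the speech recognizer should use."""
    return SECOND_MODEL if navigating else FIRST_MODEL
```

Restricting the recognizer's vocabulary to the instructions that are valid in the current state is one plausible reading of why the patent selects the model by navigation state.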
Preferably, the navigation execution unit 304 is specifically configured to:
if the point-of-interest search instruction contained in the first voice recognition result is a point-of-interest name, obtain and output search results matching that point-of-interest name; or, if the point-of-interest search instruction contained in the first voice recognition result is a point-of-interest category name, obtain and output search results matching that point-of-interest category name.
Preferably, the navigation execution unit 304 is specifically configured to:
if the first voice recognition result contains a client control instruction, apply to the client, according to that instruction, at least one of the following controls: zooming the map in or out, increasing or decreasing the volume, and turning the projection function on or off;
if the first voice recognition result contains a navigation prompt instruction, obtain and output, according to that instruction, at least one of the following navigation prompts: the remaining distance to the destination, the estimated time to reach the destination, traffic information, and current-road information;
if the first voice recognition result contains a point-of-interest search instruction, obtain and output search results matching the point-of-interest information in that instruction.
Since each unit in this embodiment can perform the method shown in Fig. 2, parts not described in detail here may refer to the description of Fig. 2.
The technical solutions of the embodiments of the present invention have the following beneficial effects:
Embodiments of the present invention collect a first voice instruction of the user, determine a speech recognition model according to the current navigation state, perform speech recognition on the first voice instruction using that model to obtain a first voice recognition result, and execute a navigation operation according to the first voice recognition result. Compared with the prior art, the technical solution provided by the embodiments can execute navigation operations automatically according to the user's voice instructions, providing navigation services without requiring any manual operation by the user. It therefore addresses the high operating cost and low processing efficiency of voice navigation in the prior art, reduces the operating cost during navigation, improves the processing efficiency of voice navigation, and helps ensure driving safety.
In addition, the technical solution provided by the embodiments can implement a voice wake-up function during voice navigation: voice navigation proceeds only when the user's voice instruction meets the wake-up condition, which avoids misrecognition of the user's speech and improves the accuracy of voice navigation.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the couplings, direct couplings, or communication connections shown or discussed may be implemented through interfaces; indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or take other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network elements. Some or all of the units may be selected according to actual needs to achieve the objectives of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, may exist physically as separate units, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a portable hard drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (12)
1. A voice navigation method, characterized in that the method comprises:
collecting a first voice instruction of a user;
determining a speech recognition model according to a current navigation state;
performing speech recognition on the first voice instruction of the user using the speech recognition model, to obtain a first voice recognition result;
executing a navigation operation according to the first voice recognition result.
2. The method according to claim 1, characterized in that collecting the first voice instruction of the user comprises:
monitoring a second voice instruction of the user;
performing speech recognition on the second voice instruction using a voice wake-up model, to obtain a second voice recognition result;
if the second voice recognition result meets a preset wake-up condition, collecting the first voice instruction of the user.
3. The method according to claim 1, characterized in that determining the speech recognition model according to the current navigation state comprises:
if the current navigation state is "before navigation has started", determining the speech recognition model to be a first model;
wherein the first model is used to identify a point-of-interest search instruction contained in the first voice instruction.
4. The method according to claim 1, characterized in that determining the speech recognition model according to the current navigation state comprises:
if the current navigation state is "navigating", determining the speech recognition model to be a second model;
wherein the second model is used to identify at least one of the following instructions contained in the first voice instruction: a client control instruction, a navigation prompt instruction, and a point-of-interest search instruction.
5. The method according to claim 3, characterized in that executing the navigation operation according to the first voice recognition result comprises:
if the point-of-interest search instruction contained in the first voice recognition result is a point-of-interest name, obtaining and outputting search results matching the point-of-interest name; or,
if the point-of-interest search instruction contained in the first voice recognition result is a point-of-interest category name, obtaining and outputting search results matching the point-of-interest category name.
6. The method according to claim 4, characterized in that executing the navigation operation according to the first voice recognition result comprises:
if the first voice recognition result contains a client control instruction, applying to the client, according to the client control instruction, at least one of the following controls: zooming the map in or out, increasing or decreasing the volume, and turning the projection function on or off;
if the first voice recognition result contains a navigation prompt instruction, obtaining and outputting, according to the navigation prompt instruction, at least one of the following navigation prompts: the remaining distance to the destination, the estimated time to reach the destination, traffic information, and current-road information;
if the first voice recognition result contains a point-of-interest search instruction, obtaining and outputting search results matching the point-of-interest information in the point-of-interest search instruction.
7. A voice navigation apparatus, characterized in that the apparatus comprises:
a voice collection unit, for collecting a first voice instruction of a user;
a model processing unit, for determining a speech recognition model according to a current navigation state;
a voice recognition unit, for performing speech recognition on the first voice instruction of the user using the speech recognition model, to obtain a first voice recognition result;
a navigation execution unit, for executing a navigation operation according to the first voice recognition result.
8. The apparatus according to claim 7, characterized in that:
the voice collection unit is also used to monitor a second voice instruction of the user; and
the apparatus further comprises:
a voice wake-up unit, for performing speech recognition on the second voice instruction using a voice wake-up model, to obtain a second voice recognition result, and, if the second voice recognition result meets a preset wake-up condition, triggering the voice collection unit to collect the first voice instruction of the user.
9. The apparatus according to claim 7, characterized in that the model processing unit is specifically configured to:
if the current navigation state is "before navigation has started", determine the speech recognition model to be a first model;
wherein the first model is used to identify a point-of-interest search instruction contained in the first voice instruction.
10. The apparatus according to claim 7, characterized in that the model processing unit is specifically configured to:
if the current navigation state is "navigating", determine the speech recognition model to be a second model;
wherein the second model is used to identify at least one of the following instructions contained in the first voice instruction: a client control instruction, a navigation prompt instruction, and a point-of-interest search instruction.
11. The apparatus according to claim 9, characterized in that the navigation execution unit is specifically configured to:
if the point-of-interest search instruction contained in the first voice recognition result is a point-of-interest name, obtain and output search results matching the point-of-interest name; or,
if the point-of-interest search instruction contained in the first voice recognition result is a point-of-interest category name, obtain and output search results matching the point-of-interest category name.
12. The apparatus according to claim 10, characterized in that the navigation execution unit is specifically configured to:
if the first voice recognition result contains a client control instruction, apply to the client, according to the client control instruction, at least one of the following controls: zooming the map in or out, increasing or decreasing the volume, and turning the projection function on or off;
if the first voice recognition result contains a navigation prompt instruction, obtain and output, according to the navigation prompt instruction, at least one of the following navigation prompts: the remaining distance to the destination, the estimated time to reach the destination, traffic information, and current-road information;
if the first voice recognition result contains a point-of-interest search instruction, obtain and output search results matching the point-of-interest information in the point-of-interest search instruction.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410742287.5A CN104535071B (en) | 2014-12-05 | 2014-12-05 | A kind of phonetic navigation method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410742287.5A CN104535071B (en) | 2014-12-05 | 2014-12-05 | A kind of phonetic navigation method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104535071A true CN104535071A (en) | 2015-04-22 |
CN104535071B CN104535071B (en) | 2018-12-14 |
Family
ID=52850646
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410742287.5A Active CN104535071B (en) | 2014-12-05 | 2014-12-05 | A kind of phonetic navigation method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104535071B (en) |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105139851A (en) * | 2015-09-17 | 2015-12-09 | 努比亚技术有限公司 | Desktop application icon organization mobile terminal and method |
CN105681579A (en) * | 2016-03-11 | 2016-06-15 | 广东欧珀移动通信有限公司 | Terminal, and screen control method and device for terminal in navigation state |
CN105890615A (en) * | 2016-04-11 | 2016-08-24 | 深圳市轱辘软件开发有限公司 | Navigation method and device |
CN106289296A (en) * | 2016-09-05 | 2017-01-04 | 广州极飞科技有限公司 | A kind of method and apparatus of road guide |
CN107014390A (en) * | 2017-03-16 | 2017-08-04 | 北京云知声信息技术有限公司 | A kind of route reminding method and device |
CN107305483A (en) * | 2016-04-25 | 2017-10-31 | 北京搜狗科技发展有限公司 | A kind of voice interactive method and device based on semantics recognition |
CN107329730A (en) * | 2017-07-03 | 2017-11-07 | 科大讯飞股份有限公司 | Information of voice prompt generation method and device |
CN107600075A (en) * | 2017-08-23 | 2018-01-19 | 深圳市沃特沃德股份有限公司 | The control method and device of onboard system |
CN107678316A (en) * | 2017-10-27 | 2018-02-09 | 姜俊 | A kind of environment inside car regulating system and method |
CN108168540A (en) * | 2017-12-22 | 2018-06-15 | 福建中金在线信息科技有限公司 | A kind of intelligent glasses air navigation aid, device and intelligent glasses |
CN108307069A (en) * | 2018-01-29 | 2018-07-20 | 广东欧珀移动通信有限公司 | Navigate operation method, navigation running gear and mobile terminal |
CN108392269A (en) * | 2017-12-29 | 2018-08-14 | 广州布莱医疗科技有限公司 | A kind of operation householder method and auxiliary robot of performing the operation |
CN108806682A (en) * | 2018-06-12 | 2018-11-13 | 奇瑞汽车股份有限公司 | The method and apparatus for obtaining Weather information |
CN108827328A (en) * | 2018-04-24 | 2018-11-16 | 联想(北京)有限公司 | Voice interactive method, device, system and computer-readable medium |
CN108986811A (en) * | 2018-08-31 | 2018-12-11 | 北京新能源汽车股份有限公司 | A kind of detection method of speech recognition, device and equipment |
CN109000679A (en) * | 2018-09-21 | 2018-12-14 | 斑马网络技术有限公司 | Path prediction technique, device, system and storage medium |
CN109065045A (en) * | 2018-08-30 | 2018-12-21 | 出门问问信息科技有限公司 | Audio recognition method, device, electronic equipment and computer readable storage medium |
CN110057379A (en) * | 2019-05-29 | 2019-07-26 | 广州小鹏汽车科技有限公司 | Secondary air navigation aid, device and the vehicle of vehicle mounted guidance |
CN110136705A (en) * | 2019-04-10 | 2019-08-16 | 华为技术有限公司 | A kind of method and electronic equipment of human-computer interaction |
CN110770820A (en) * | 2018-08-30 | 2020-02-07 | 深圳市大疆创新科技有限公司 | Speech recognition method, apparatus, photographing system, and computer-readable storage medium |
CN111949780A (en) * | 2020-07-31 | 2020-11-17 | 八维通科技有限公司 | Automatic interaction method along trip route |
CN114485718A (en) * | 2022-01-05 | 2022-05-13 | 腾讯科技(深圳)有限公司 | Voice navigation method, device, electronic equipment, storage medium and program product |
CN114913855A (en) * | 2022-07-11 | 2022-08-16 | 广州小鹏汽车科技有限公司 | Voice interaction method, server and storage medium |
CN114964300A (en) * | 2022-06-22 | 2022-08-30 | 深圳市智远联科技有限公司 | Voice recognition method and navigation device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102063901A (en) * | 2010-12-02 | 2011-05-18 | 深圳市凯立德欣软件技术有限公司 | Voice identification method for position service equipment and position service equipment |
CN103674012A (en) * | 2012-09-21 | 2014-03-26 | 高德软件有限公司 | Voice customizing method and device and voice identification method and device |
CN103971681A (en) * | 2014-04-24 | 2014-08-06 | 百度在线网络技术(北京)有限公司 | Voice recognition method and system |
- 2014-12-05: CN 201410742287.5A filed; granted as CN 104535071B (status: Active)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102063901A (en) * | 2010-12-02 | 2011-05-18 | 深圳市凯立德欣软件技术有限公司 | Voice identification method for position service equipment and position service equipment |
CN103674012A (en) * | 2012-09-21 | 2014-03-26 | 高德软件有限公司 | Voice customizing method and device and voice identification method and device |
CN103971681A (en) * | 2014-04-24 | 2014-08-06 | 百度在线网络技术(北京)有限公司 | Voice recognition method and system |
Cited By (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105139851A (en) * | 2015-09-17 | 2015-12-09 | 努比亚技术有限公司 | Desktop application icon organization mobile terminal and method |
CN105681579A (en) * | 2016-03-11 | 2016-06-15 | 广东欧珀移动通信有限公司 | Terminal, and screen control method and device for terminal in navigation state |
CN105890615A (en) * | 2016-04-11 | 2016-08-24 | 深圳市轱辘软件开发有限公司 | Navigation method and device |
CN107305483A (en) * | 2016-04-25 | 2017-10-31 | 北京搜狗科技发展有限公司 | A kind of voice interactive method and device based on semantics recognition |
CN106289296A (en) * | 2016-09-05 | 2017-01-04 | 广州极飞科技有限公司 | A kind of method and apparatus of road guide |
CN107014390A (en) * | 2017-03-16 | 2017-08-04 | 北京云知声信息技术有限公司 | A kind of route reminding method and device |
CN107329730A (en) * | 2017-07-03 | 2017-11-07 | 科大讯飞股份有限公司 | Information of voice prompt generation method and device |
CN107600075A (en) * | 2017-08-23 | 2018-01-19 | 深圳市沃特沃德股份有限公司 | The control method and device of onboard system |
CN107678316A (en) * | 2017-10-27 | 2018-02-09 | 姜俊 | A kind of environment inside car regulating system and method |
CN108168540A (en) * | 2017-12-22 | 2018-06-15 | 福建中金在线信息科技有限公司 | A kind of intelligent glasses air navigation aid, device and intelligent glasses |
CN108392269A (en) * | 2017-12-29 | 2018-08-14 | 广州布莱医疗科技有限公司 | A kind of operation householder method and auxiliary robot of performing the operation |
CN108307069A (en) * | 2018-01-29 | 2018-07-20 | 广东欧珀移动通信有限公司 | Navigate operation method, navigation running gear and mobile terminal |
CN108827328A (en) * | 2018-04-24 | 2018-11-16 | 联想(北京)有限公司 | Voice interactive method, device, system and computer-readable medium |
CN108806682B (en) * | 2018-06-12 | 2020-12-01 | 奇瑞汽车股份有限公司 | Method and device for acquiring weather information |
CN108806682A (en) * | 2018-06-12 | 2018-11-13 | 奇瑞汽车股份有限公司 | The method and apparatus for obtaining Weather information |
CN109065045A (en) * | 2018-08-30 | 2018-12-21 | 出门问问信息科技有限公司 | Audio recognition method, device, electronic equipment and computer readable storage medium |
CN110770820A (en) * | 2018-08-30 | 2020-02-07 | 深圳市大疆创新科技有限公司 | Speech recognition method, apparatus, photographing system, and computer-readable storage medium |
CN108986811A (en) * | 2018-08-31 | 2018-12-11 | 北京新能源汽车股份有限公司 | A kind of detection method of speech recognition, device and equipment |
CN109000679A (en) * | 2018-09-21 | 2018-12-14 | 斑马网络技术有限公司 | Path prediction technique, device, system and storage medium |
CN109000679B (en) * | 2018-09-21 | 2021-03-05 | 斑马网络技术有限公司 | Path prediction method, device, system and storage medium |
CN110136705B (en) * | 2019-04-10 | 2022-06-14 | 华为技术有限公司 | Man-machine interaction method and electronic equipment |
CN110136705A (en) * | 2019-04-10 | 2019-08-16 | 华为技术有限公司 | A kind of method and electronic equipment of human-computer interaction |
CN110057379A (en) * | 2019-05-29 | 2019-07-26 | 广州小鹏汽车科技有限公司 | Secondary air navigation aid, device and the vehicle of vehicle mounted guidance |
CN111949780A (en) * | 2020-07-31 | 2020-11-17 | 八维通科技有限公司 | Automatic interaction method along trip route |
CN111949780B (en) * | 2020-07-31 | 2021-12-31 | 八维通科技有限公司 | Automatic interaction method along trip route |
CN114485718A (en) * | 2022-01-05 | 2022-05-13 | 腾讯科技(深圳)有限公司 | Voice navigation method, device, electronic equipment, storage medium and program product |
CN114964300A (en) * | 2022-06-22 | 2022-08-30 | 深圳市智远联科技有限公司 | Voice recognition method and navigation device |
CN114913855A (en) * | 2022-07-11 | 2022-08-16 | 广州小鹏汽车科技有限公司 | Voice interaction method, server and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN104535071B (en) | 2018-12-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104535071A (en) | Voice navigation method and device | |
US10380160B2 (en) | Dynamic language model | |
KR101758302B1 (en) | Voice recognition grammar selection based on context | |
US9188456B2 (en) | System and method of fixing mistakes by going back in an electronic device | |
CN106959690B (en) | Method, device and equipment for searching unmanned vehicle and storage medium | |
EP3107101A1 (en) | Smart audio playback when connecting to an audio output system | |
CN107665710B (en) | Mobile terminal voice data processing method and device | |
CN104123937B (en) | Remind method to set up, device and system | |
CN105335507A (en) | Method and device for pushing music | |
CN102202082A (en) | Vehicle-mounted communication system and method | |
CN103971681A (en) | Voice recognition method and system | |
CN108540815B (en) | Multimedia content playing method, device and system | |
CN105354214A (en) | Memo information generation method and apparatus | |
KR20210098880A (en) | Voice processing method, apparatus, device and storage medium for vehicle-mounted device | |
CN110869706A (en) | Interfacing between digital assistant applications and navigation applications | |
CN104199837A (en) | Information output method and electronic equipment | |
US9791925B2 (en) | Information acquisition method, information acquisition system, and non-transitory recording medium for user of motor vehicle | |
CN104731918A (en) | Voice search method and device | |
CN105246041A (en) | Prompting method and device | |
CN104567887A (en) | Path matching method and device | |
CN109299359A (en) | A kind of road condition query method, apparatus, terminal and storage medium | |
CN104464355A (en) | Automatic station report system and method based on GPS | |
CN103473290A (en) | Processing method and device for attribute data of POIs | |
CN115705844A (en) | Voice interaction configuration method, electronic device and computer readable medium | |
CN114446300B (en) | Multi-sound zone identification method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 2020-04-09. Patentee before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co., Ltd. (Baidu Building, No. 10, 10th Street, Haidian District, Beijing 100085). Patentee after: BAIDU INTERNATIONAL TECHNOLOGY (SHENZHEN) Co., Ltd. (301, floor 3, unit D, Productivity Building, No. 5, Gaoxin Middle Road, Science Park, Nanshan District, Shenzhen, Guangdong 518000); co-patentee after: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co., Ltd. |