CN103700369A

CN103700369A - Voice navigation method and system

Info

Publication number: CN103700369A
Application number: CN201310611734.9A
Authority: CN
Inventors: 高建清; 刘聪; 王智国; 胡国平; 胡郁
Original assignee: iFlytek Co Ltd
Current assignee: iFlytek Co Ltd
Priority date: 2013-11-26
Filing date: 2013-11-26
Publication date: 2014-04-02
Anticipated expiration: 2033-11-26
Also published as: CN103700369B

Abstract

The invention discloses a voice navigation method and system, belonging to the technical field of voice processing. The method comprises: receiving a voice signal input by a user; performing unified decoding and recognition on the voice signal based on multiple different types of decoding networks to obtain a text word string, wherein the multiple different types of decoding networks comprise any two or all three of the following: a large-scale language model decoding network, a command word decoding network, and a high-frequency decoding network; determining the operation corresponding to the text word string; and executing the operation. The voice navigation method and system help ensure recognition performance for users' personalized voice responses.

Description

Voice navigation method and system

Technical field

The present invention relates to the field of voice processing technology, and in particular to a voice navigation method and system.

Background technology

Voice navigation technology is now widely used in self-service customer service systems. In particular, voice call navigation systems based on a question-and-answer mode have become the current mainstream form of voice call navigation because of their relatively high classification accuracy.

In the prior art, a question-and-answer voice call navigation system usually organizes its service menu into a multi-layer structure and sets a system prompt on each menu layer to guide the user's selection. For example, when the user says "I want to query account details", the system enters the second-level menu "account query" and prompts "Do you want to query same-day details, historical details, or details for a specified date?" Presenting the menu in this way makes clear which service types are supported in the current application environment and helps the user choose.

Clearly, the system prompt in question-and-answer mode limits, to some extent, the range of the user's answers. In the "account query" scenario above, the system prompt is "Do you want to query same-day details, historical details, or details for a specified date?", and the grammar contains three units: "same-day details", "historical details", and "details for a specified date". For this reason, traditional systems generally parse the user's input voice by command word recognition. This approach achieves good recognition performance and relatively high recognition efficiency when the user cooperates. However, when the user does not answer with a command word as prompted, for example by answering "I want to look at this morning's account" to the prompt "Do you want to query same-day details, historical details, or details for a specified date?", the effectiveness of menu-selection-based command control is hard to guarantee. In other words, a voice navigation system using the traditional question-and-answer mode can hardly guarantee recognition performance for users' personalized voice responses.

Summary of the invention

The embodiments of the present invention provide a voice navigation method and system to solve the problem that existing command-control-based approaches cannot guarantee recognition of users' personalized voice responses.

The embodiments of the present invention provide the following technical solutions:

In one aspect, an embodiment of the present invention provides a voice navigation method, comprising:

receiving a voice signal input by a user;

performing unified decoding and recognition on the voice signal based on multiple different types of decoding networks to obtain a text word string, wherein the multiple different types of decoding networks comprise any two or all three of the following decoding networks: a large-scale language model decoding network, a command word decoding network, and a high-frequency decoding network;

determining the operation corresponding to the text word string; and

executing the operation.

Preferably, the method further comprises building the large-scale language model decoding network, the building process comprising:

building a navigation-domain language model from a training corpus;

collecting dialogue speech in a specific navigation scenario, and decoding the dialogue speech with the navigation-domain language model to obtain decoded word strings;

training a language model for the specific navigation scenario on the decoded word strings; and

interpolating the navigation-domain language model with the scenario-specific language model to obtain the large-scale language model decoding network.

Preferably, the method further comprises building the command word decoding network, the building process comprising:

collecting menu options in the specific navigation scenario, the menu options comprising pad names and their aliases;

connecting the menu options in parallel to form the command word decoding network; and

setting the weight of each word in the command word decoding network using the average unigram language model probability of the large-scale language model.

Preferably, the method further comprises building the high-frequency decoding network, the building process comprising:

collecting high-frequency utterances in the specific navigation scenario;

connecting the high-frequency utterances in parallel to form the high-frequency decoding network; and

setting the weight of each word in the high-frequency decoding network using the average unigram language model probability of the large-scale language model.

Preferably, performing unified decoding and recognition on the voice signal based on multiple different types of decoding networks to obtain a text word string comprises:

decoding and recognizing the voice signal based on the large-scale language model decoding network to obtain a first score of the decoding result;

decoding and recognizing the voice signal based on the command word decoding network to obtain a second score of the decoding result;

decoding and recognizing the voice signal based on the high-frequency decoding network to obtain a third score of the decoding result; and

selecting, as the text word string, the decoding result corresponding to the maximum of the first score, the second score, and the third score.

Preferably, the method further comprises:

during decoding and recognition of the voice signal based on the large-scale language model decoding network, if a preset semantically related keyword or expansion word appears in a path, applying a preset weight gain to that decoding path; and

using the score after the gain as the score of that decoding path.

Preferably, determining the operation corresponding to the text word string comprises:

if the text word string is a decoding result of the command word decoding network, determining the operation corresponding to the text word string according to the semantics of the decoding result;

otherwise, performing keyword matching between the decoding result and a keyword list to obtain a matching result; and

determining the operation corresponding to the text word string according to the semantics of the matching result.

Preferably, the method further comprises:

organizing the service functions into a multi-level menu structure, and establishing one keyword list for each menu level;

wherein performing keyword matching between the decoding result and the keyword list to obtain a matching result comprises:

determining the menu level corresponding to the current service;

obtaining the keyword lists of that menu level and of the levels below it; and

performing keyword matching, level by level, between the decoding result and the obtained keyword lists to obtain the matching result.

In another aspect, an embodiment of the present invention provides a voice navigation system, comprising:

a receiving module, configured to receive a voice signal input by a user;

a decoding module, configured to perform unified decoding and recognition on the voice signal based on multiple different types of decoding networks to obtain a text word string, wherein the multiple different types of decoding networks comprise any two or all three of the following decoding networks: a large-scale language model decoding network, a command word decoding network, and a high-frequency decoding network;

a determination module, configured to determine the operation corresponding to the text word string; and

an execution module, configured to execute the operation.

Preferably, the system further comprises any two or all three of the following modules:

a first building module, configured to build the large-scale language model decoding network;

a second building module, configured to build the command word decoding network;

a third building module, configured to build the high-frequency decoding network.

Preferably, the first building module comprises:

a first language model unit, configured to build a navigation-domain language model from a training corpus;

a decoding unit, configured to collect dialogue speech in a specific navigation scenario and decode the dialogue speech with the navigation-domain language model to obtain decoded word strings;

a second language model unit, configured to train a language model for the specific navigation scenario on the decoded word strings;

an interpolation unit, configured to interpolate the navigation-domain language model with the scenario-specific language model to obtain the large-scale language model decoding network.

Preferably, the second building module comprises:

a menu option unit, configured to collect menu options in the specific navigation scenario, the menu options comprising pad names and their aliases;

a first parallel-connection unit, configured to connect the menu options in parallel to form the command word decoding network;

a first weighting unit, configured to set the weight of each word in the command word decoding network using the average unigram language model probability of the large-scale language model.

Preferably, the third building module comprises:

a high-frequency utterance unit, configured to collect high-frequency utterances in the specific navigation scenario;

a second parallel-connection unit, configured to connect the high-frequency utterances in parallel to form the high-frequency decoding network;

a second weighting unit, configured to set the weight of each word in the high-frequency decoding network using the average unigram language model probability of the large-scale language model.

Preferably, the decoding module comprises:

a first decoding unit, configured to decode and recognize the voice signal based on the large-scale language model decoding network to obtain a first score of the decoding result;

a second decoding unit, configured to decode and recognize the voice signal based on the command word decoding network to obtain a second score of the decoding result;

a third decoding unit, configured to decode and recognize the voice signal based on the high-frequency decoding network to obtain a third score of the decoding result;

a selection unit, configured to select, as the text word string, the decoding result corresponding to the maximum of the first score, the second score, and the third score.

Preferably, the first decoding unit is further configured, during decoding and recognition of the voice signal based on the large-scale language model decoding network, to apply a preset weight gain to a decoding path if a preset semantically related keyword or expansion word appears in that path, and to use the score after the gain as the score of that decoding path.

Preferably, the determination module comprises:

a judging unit, configured to judge whether the text word string is a decoding result of the command word decoding network;

a first determining unit, configured to determine the operation corresponding to the text word string according to the semantics of the decoding result after the judging unit judges that the text word string is a decoding result of the command word decoding network;

a keyword matching unit, configured to perform keyword matching between the decoding result and a keyword list to obtain a matching result after the judging unit judges that the text word string is not a decoding result of the command word decoding network;

a second determining unit, configured to determine the operation corresponding to the text word string according to the semantics of the matching result.

Preferably, the system further comprises:

a keyword list building module, configured to organize the service functions into a multi-level menu structure and establish one keyword list for each menu level;

wherein the keyword matching unit comprises:

a menu level determining unit, configured to determine the menu level corresponding to the current service;

a keyword list obtaining unit, configured to obtain the keyword lists of that menu level and of the levels below it;

a matching unit, configured to perform keyword matching, level by level, between the decoding result and the obtained keyword lists to obtain the matching result.

The voice navigation method and system provided by the embodiments of the present invention combine the advantages of multiple different types of decoding networks. By performing unified decoding and recognition on the user's input voice signal based on multiple different types of decoding networks, they obtain a text word string and its corresponding operation, and can thus recognize the user's personalized voice responses. This improves the flexibility of user responses while maintaining recognition quality. The method and system provided by the embodiments of the present invention can greatly improve the user experience.

Brief description of the drawings

To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some of the embodiments recorded in the present invention; a person of ordinary skill in the art can derive other drawings from them.

Fig. 1 is a schematic diagram of a voice navigation process in the prior art;

Fig. 2 is a flowchart of the voice navigation method provided by an embodiment of the present invention;

Fig. 3 is a schematic diagram of the process of building the large-scale language model decoding network provided by an embodiment of the present invention;

Fig. 4 is a schematic diagram of the process of building the command word decoding network provided by an embodiment of the present invention;

Fig. 5 is a schematic diagram of the process of building the high-frequency decoding network provided by an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of the voice navigation system provided by an embodiment of the present invention.

Detailed description of the embodiments

To help those skilled in the art better understand the solutions of the embodiments of the present invention, the embodiments are described in further detail below with reference to the drawings.

The voice navigation process in the prior art is briefly introduced first.

Fig. 1 is a schematic diagram of a voice navigation process in the prior art.

In the prior art, the voice signal input by the user is usually decoded and recognized based on a command word decoding network, through the following process:

Step 101: receive the voice signal input by the user;

Step 102: decode and recognize the voice signal based on the command word decoding network to obtain a text word string;

Step 103: determine the operation corresponding to the text word string;

Step 104: execute the operation.

Typically, the command word decoding network is built by collecting the common keywords of a specific navigation scenario. The common keywords, such as pad names and their aliases, can be used to form menu options, and the resulting menu options are connected in parallel to obtain the command word decoding network. After the user's input voice signal is decoded and recognized with the command word decoding network, a text word string is obtained. The text word string usually corresponds to the user's intent and serves as the decoding result of the command word decoding network.

After the decoding result is obtained, the confidence of the decoding and recognition, that is, its reliability, needs to be judged, usually by means of an LRT (likelihood ratio test). Let H₀ denote that the recognition is correct and H₁ denote that the recognition is wrong. The score of the optimal decoding path is usually taken as p(X|H₀), and p(X|H₁) is approximated by the sum of the scores of all other paths. The system usually uses the LLR (log likelihood ratio) as the confidence score of the recognition result: LLR = log p(X|H₀) - log p(X|H₁). Whether the recognition result is reliable is judged by whether this confidence score exceeds a preset threshold: if it does, the recognition result is reliable; otherwise, it is unreliable.
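The confidence test above can be sketched as follows. This is only an illustration: it assumes each candidate path already has a log-domain score, and the function names (`log_sum_exp`, `llr_confidence`, `is_reliable`) are hypothetical rather than taken from the source.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over log-domain scores."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def llr_confidence(path_log_scores):
    """LLR = log p(X|H0) - log p(X|H1).

    Assumption: log p(X|H0) is the best path's log score, and p(X|H1)
    is approximated by the sum of the scores of all other paths.
    """
    if len(path_log_scores) < 2:
        raise ValueError("need the best path plus at least one competitor")
    ranked = sorted(path_log_scores, reverse=True)
    best, others = ranked[0], ranked[1:]
    return best - log_sum_exp(others)


def is_reliable(path_log_scores, threshold):
    """The result counts as reliable when the LLR exceeds the preset threshold."""
    return llr_confidence(path_log_scores) > threshold
```

The threshold itself is application-specific; the source only states that it is preset.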

If the recognition result is reliable, the semantic information corresponding to the recognition result is obtained, so that the user intent corresponding to the text word string can be determined; the operation for this intent usually corresponds to a menu option. Because each command word in command word decoding corresponds to a definite menu option, the corresponding menu function is easy to obtain from the command word. If the recognition result is unreliable, the user is prompted to input the voice signal again, and the voice signal is decoded and recognized again.

Voice navigation in the prior art can therefore only navigate by the command words corresponding to menu options. For a voice signal outside the menu options, the recognition result is unreliable, so the system repeatedly prompts the user to input the voice signal again and never obtains a valid recognition result.

For this reason, the embodiments of the present invention propose a voice navigation method that can effectively recognize voice signals the user inputs without following the menu prompt.

Fig. 2 is a flowchart of the voice navigation method provided by an embodiment of the present invention, which comprises the following steps:

Step 201: receive the voice signal input by the user;

Step 202: perform unified decoding and recognition on the voice signal based on multiple different types of decoding networks to obtain a text word string;

Step 203: determine the operation corresponding to the text word string;

Step 204: execute the operation.

The voice navigation method provided by the embodiments of the present invention uses multiple different types of decoding networks to perform unified decoding and recognition on the voice signal input by the user. The decoding networks can be any two or all three of the large-scale language model decoding network, the command word decoding network, and the high-frequency decoding network.

Unlike the prior art, which decodes and recognizes the voice signal based on a command word decoding network alone, the embodiments of the present invention perform unified decoding and recognition on the voice signal based on multiple different types of decoding networks and search for the optimal path across these networks. This safeguards the reliability of the recognition result and makes it possible to recognize speech beyond the menu-option command words, which a command word decoding network alone cannot recognize. Specifically, after the voice signal input by the user is received, the acoustic feature sequence of the voice signal is extracted, the set of unified decoding paths corresponding to the acoustic feature sequence is obtained, and the optimal path is searched for within that set.

The building processes of the three types of decoding networks involved in the embodiments of the present invention are briefly introduced below.

Fig. 3 is a schematic diagram of the process of building the large-scale language model decoding network provided by an embodiment of the present invention. The large-scale language model decoding network is built as follows:

Step 301: build a navigation-domain language model from a training corpus;

Step 302: collect dialogue speech in a specific navigation scenario, and decode the dialogue speech with the navigation-domain language model to obtain decoded word strings;

Step 303: train a language model for the specific navigation scenario on the decoded word strings;

Step 304: interpolate the navigation-domain language model with the scenario-specific language model to obtain the large-scale language model decoding network.

In the embodiments of the present invention, the navigation-domain language model is trained on a corpus collected from the navigation domain. Any training method commonly used in the art may be applied and is not described here. The navigation-domain language model is generally a higher-order language model, for example a trigram or bigram language model; the embodiments of the present invention place no limitation on this.

Dialogue speech is collected in a specific navigation scenario and decoded with the navigation-domain language model built in step 301 to obtain decoded word strings, on which the language model for that specific navigation scenario is then trained. Interpolating the navigation-domain language model with the scenario-specific language model yields the large-scale language model decoding network. Interpolation is a common technique in the art and is only briefly introduced below.

Suppose the navigation-domain language model is the first language model, with N1 model units (N-grams) in total, and the scenario-specific language model is the second language model, with N2 model units in total. The interpolated language model then contains N1 + N2 - (the number of N-grams common to the first and second language models) N-grams, and the probability of each N-gram is the weighted sum of the probabilities of the corresponding units in the first and second language models. For example, suppose the first language model contains the N-gram "internet access, activate", that is, the word "internet access" is followed by "activate", with probability P1, and the second language model also contains this N-gram, with probability P2. The probability of this N-gram in the final model is then k*P1 + (1-k)*P2, where k is the interpolation weight.
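The interpolation rule can be illustrated with a small sketch. It assumes each language model is a plain dictionary from N-gram tuples to probabilities, and that an N-gram missing from one model contributes probability 0 from that model; the source does not specify this case, and real toolkits typically use back-off estimates instead. The function name `interpolate` is hypothetical.

```python
def interpolate(first_lm, second_lm, k):
    """Linearly interpolate two N-gram tables: k*P1 + (1-k)*P2.

    first_lm, second_lm: dicts mapping N-gram tuples to probabilities.
    k: interpolation weight in [0, 1].
    Assumption: an N-gram absent from one model has probability 0 there.
    """
    if not 0.0 <= k <= 1.0:
        raise ValueError("interpolation weight k must lie in [0, 1]")
    merged = {}
    # The union holds N1 + N2 - (number of shared N-grams) entries.
    for ngram in set(first_lm) | set(second_lm):
        p1 = first_lm.get(ngram, 0.0)
        p2 = second_lm.get(ngram, 0.0)
        merged[ngram] = k * p1 + (1.0 - k) * p2
    return merged


# Toy models with made-up probabilities, for illustration only.
domain_lm = {("internet access", "activate"): 0.2,
             ("internet access", "cancel"): 0.1}
scenario_lm = {("internet access", "activate"): 0.4,
               ("package", "five-yuan"): 0.3}
combined = interpolate(domain_lm, scenario_lm, k=0.5)
```

With two N-grams in each toy model and one shared between them, `combined` holds three entries, matching the N1 + N2 - (shared) count described above.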

Fig. 4 is a schematic diagram of the process of building the command word decoding network provided by an embodiment of the present invention.

The command word decoding network is built as follows:

Step 401: collect the menu options in the specific navigation scenario, the menu options comprising pad names and their aliases;

Step 402: connect the menu options in parallel to form the command word decoding network;

Step 403: set the weight of each word in the command word decoding network using the average unigram language model probability of the large-scale language model.

In the embodiments of the present invention, the menu options of a specific navigation scenario, for example pad names and their aliases, are collected and connected in parallel to obtain the command word decoding network. The scores of decoding results from this command word decoding network are not directly comparable with those from the large-scale language model decoding network, so the command word decoding network needs to be optimized to make the scores comparable. Specifically, the average unigram (uni-gram) language model probability of the large-scale language model can be used to set the weight of each word in the command word decoding network, and the weighted score is used as the score of the command word decoding result. For example, suppose an entry in the command word decoding network is "activate mobile internet access", the acoustic model score of each word is denoted ScoreA, and the average unigram language model probability score of the large-scale language model is denoted ScoreB. The score of the decoding result for this entry is then: ScoreA(activate) + ScoreB + ScoreA(mobile phone) + ScoreB + ScoreA(internet access) + ScoreB.
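A minimal sketch of this scoring rule follows. It assumes log-domain scores and, as one possible reading of "average unigram probability", takes ScoreB to be the mean unigram log probability over the vocabulary; the source does not say how the average is computed. Both function names are hypothetical.

```python
def average_unigram_score(unigram_log_probs):
    """ScoreB: mean unigram log probability over the vocabulary (assumed reading)."""
    if not unigram_log_probs:
        raise ValueError("unigram table is empty")
    return sum(unigram_log_probs.values()) / len(unigram_log_probs)


def command_entry_score(acoustic_scores, score_b):
    """Sum ScoreA(word) + ScoreB over the words of one command entry.

    acoustic_scores: per-word acoustic model scores (ScoreA), log domain.
    score_b: the shared average unigram score applied to every word.
    """
    return sum(score_a + score_b for score_a in acoustic_scores)


# Made-up numbers for the three words of "activate mobile internet access".
unigrams = {"activate": -2.0, "mobile phone": -3.0, "internet access": -4.0}
score_b = average_unigram_score(unigrams)
entry_score = command_entry_score([-1.0, -2.0, -1.5], score_b)
```

Because every word receives the same language model weight, the ranking among command entries is driven by their acoustic scores, while the overall scale stays comparable with scores from the large-scale language model network.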

Fig. 5 is a schematic diagram of the process of building the high-frequency decoding network provided by an embodiment of the present invention.

The high-frequency decoding network is built as follows:

Step 501: collect the high-frequency utterances in the specific navigation scenario;

Step 502: connect the high-frequency utterances in parallel to form the high-frequency decoding network;

Step 503: set the weight of each word in the high-frequency decoding network using the average unigram language model probability of the large-scale language model.

In the embodiments of the present invention, the high-frequency decoding network is built in much the same way as the command word decoding network described above. The difference is that the high-frequency utterances collected in the specific navigation scenario are not pad names; they can be, for example, complete sentences. After the high-frequency utterances are connected in parallel to form the high-frequency decoding network, the network is optimized in the same way as the command word decoding network, so the process is not repeated here.

In the embodiments of the present invention, the voice signal is first decoded and recognized in a unified manner based on multiple different types of decoding networks, and the optimal path is then searched for across these networks. The optimal path is selected by comparing the scores of the decoding results from the different decoding networks: the path from the highest-scoring decoding network is taken as the optimal path, and that network's decoding result is taken as the text word string. For example, the voice signal is decoded and recognized based on the large-scale language model decoding network to obtain a first score of the decoding result, based on the command word decoding network to obtain a second score, and based on the high-frequency decoding network to obtain a third score. The decoding result corresponding to the maximum of the first, second, and third scores is selected as the unified decoding and recognition result, that is, the text word string of step 202 above.
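The selection step amounts to taking the maximum over per-network results. The sketch below assumes each network has already produced a decoding result with a comparable score; the `DecodeResult` type, the network labels, and the numbers are illustrative assumptions.

```python
from typing import NamedTuple


class DecodeResult(NamedTuple):
    network: str   # which decoding network produced the result
    text: str      # decoded text word string
    score: float   # comparable score of the decoding result


def select_best(results):
    """Return the decoding result with the maximum score.

    Works for two or three networks alike; on ties, the earliest
    result in the list is kept.
    """
    if not results:
        raise ValueError("at least one decoding result is required")
    return max(results, key=lambda r: r.score)


candidates = [
    DecodeResult("large_lm", "I want to look at this morning's account", -41.2),
    DecodeResult("command", "historical details", -55.0),
    DecodeResult("high_frequency", "query today's account", -47.8),
]
best = select_best(candidates)
```

Keeping the network label on the winning result matters later, because the way the operation is determined depends on whether the command word network produced the text word string.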

During decoding and recognition of the voice signal based on the large-scale language model decoding network, if a preset semantically related keyword or expansion word appears in a path, a preset weight gain can further be applied to that decoding path in order to raise the score of the valid decoding path and help it win the decoding competition; the score after the gain is used as the score of that decoding path. Specifically, the gain is applied as follows:

(1) determine the preset semantically related keyword or expansion word in the current path, and obtain its weight score p(x) in the language model;

(2) apply the preset weight gain to the weight score p(x), for example p'(x) = p(x) * a, where a > 1 is the preset weight gain;

(3) use the score after the gain as the score of the decoding path.
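The three steps can be sketched as below. The sketch assumes, purely for illustration, that a path score is the product of the language model weights p(x) of its words in the probability domain; the source does not state how word weights combine into a path score. The name `boosted_path_score` is hypothetical.

```python
def boosted_path_score(word_probs, boosted_words, gain):
    """Apply p'(x) = p(x) * a to preset keywords, then score the path.

    word_probs: list of (word, p(x)) pairs along one decoding path.
    boosted_words: preset semantically related keywords or expansion words.
    gain: the preset weight gain a, which must be greater than 1.
    Assumption: the path score is the product of the word weights.
    """
    if gain <= 1.0:
        raise ValueError("the preset weight gain a must be greater than 1")
    score = 1.0
    for word, prob in word_probs:
        if word in boosted_words:
            prob = prob * gain  # step (2): p'(x) = p(x) * a
        score *= prob
    return score  # step (3): the gained score is the path score


path = [("I", 0.5), ("want", 0.5), ("activate", 0.2)]
plain = boosted_path_score(path, set(), gain=2.0)
boosted = boosted_path_score(path, {"activate"}, gain=2.0)
```

In a log-domain decoder the same gain would be added as log(a) instead of multiplied.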

In the embodiments of the present invention, after unified decoding and recognition of the voice signal yields the text word string, the operation corresponding to the text word string needs to be determined so that it can be executed.

How the operation corresponding to the text word string is determined is briefly described below.

First, judge whether the text word string is a decoding result of the command word decoding network. If so, determine the operation corresponding to the text word string according to the semantics of the decoding result. Otherwise, perform keyword matching between the decoding result and a keyword list to obtain a matching result, and determine the operation corresponding to the text word string according to the semantics of the matching result.

When the text word string is a decoding result of the command word decoding network, the user has input the voice signal according to the menu prompt. Decoding and recognizing the voice signal then yields a decoding result that corresponds to a menu option, and the operation corresponding to the text word string can be determined from the semantics of that decoding result.

When the text word string is not a decoding result of the command word decoding network, that is, when it is a decoding result of the large-scale language model decoding network or of the high-frequency decoding network, the user has not input the voice signal according to the menu prompt. Decoding and recognizing the voice signal then cannot yield a decoding result that corresponds to a menu option, and keyword matching against a keyword list needs to be performed on the decoding result.
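The two branches can be sketched as a small dispatch function. It assumes the winning decoding result carries a label naming its source network, that command semantics are a direct lookup table, and that keyword matching is a substring test; all names and table contents are illustrative assumptions.

```python
COMMAND_NETWORK = "command"  # hypothetical label for the command word network


def determine_operation(network, text, command_semantics, keyword_semantics):
    """Map a text word string to an operation, or None if nothing matches.

    command_semantics: dict from exact command text to an operation.
    keyword_semantics: dict from keyword to an operation, used when the
    text came from the large-scale LM or high-frequency network.
    """
    if network == COMMAND_NETWORK:
        # The text corresponds directly to a menu option.
        return command_semantics.get(text)
    # Otherwise fall back to keyword matching on the free-form text.
    for keyword, operation in keyword_semantics.items():
        if keyword in text:
            return operation
    return None


commands = {"historical details": "QUERY_HISTORY"}
keywords = {"this morning": "QUERY_TODAY", "last month": "QUERY_HISTORY"}
op_menu = determine_operation("command", "historical details", commands, keywords)
op_free = determine_operation(
    "large_lm", "I want to look at this morning's account", commands, keywords)
```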

In the embodiments of the present invention, the keyword lists can be built in advance, and the keyword list for a given dialogue scenario can be obtained according to that scenario. Determining the menu option corresponding to the text word string usually requires determining its operation command and/or parameters. Because the decoding result corresponds to the text word string, keyword matching between the text word string and the keyword list yields a matching result that usually contains an operation command and/or parameters, and the operation corresponding to the text word string can be determined from the semantics of the matching result.

In practical applications, the keywords of the different service functions can be placed in a single keyword list, which is called directly when keyword matching is performed. However, because service applications usually have a hierarchical relationship, matching efficiency can be further improved by organizing the service functions into a multi-level menu structure and establishing a local keyword list for each menu level. Accordingly, when keyword matching is performed, the menu level corresponding to the current service is determined first, the keyword lists of that menu level and of the levels below it are obtained, and the text word string is then matched against these keyword lists level by level to obtain the matching result. The matching result can contain an operation command and/or parameters, from which the operation corresponding to the text word string is obtained.

The keyword matching technique of the embodiments of the present invention is described in detail below, taking the "mobile internet access" service as an example.

For the "mobile internet access" service, for example, the corresponding operations are "activate", "cancel", "query effective time", and so on, and the corresponding parameters are "ten-yuan package", "twenty-yuan package", and so on.

The keyword matching process is as follows:

(1) Obtain the keyword list corresponding to the dialogue scenario.

For example, in the "mobile internet access" service scenario, the prompt is "Would you like to activate or cancel a package?", and the keyword list is "activate", "cancel", "effective time", and so on. In the "package type" service scenario, the prompt is "OK, activating an internet package. We have five-yuan, ten-yuan, and twenty-yuan packages. Which one would you like?", and the keyword list is "five-yuan", "ten-yuan", "twenty-yuan", and so on.

(2) text word string and lists of keywords are mated, obtain matching result.

For example, continuous text word string is " I look on the bright side of things and lead to five yuan of online set meals ", and its lists of keywords is " open-minded ", " cancellation ", " entry-into-force time " etc., first matches " open-minded ", and corresponding session operational scenarios is " opening online set meal ".
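The two matching steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the scene names, the English keyword strings, and the functions `match_keywords` and `parse` are hypothetical stand-ins, and plain substring search is assumed as the matching rule, which the description does not prescribe.

```python
from typing import Dict, List, Optional

# Hypothetical keyword lists, one per dialogue scene (illustrative values only).
KEYWORD_LISTS: Dict[str, List[str]] = {
    "mobile_internet": ["activate", "cancel", "effective time"],
    "plan_type": ["five-yuan", "ten-yuan", "twenty-yuan"],
}


def match_keywords(text: str, scene: str) -> Optional[str]:
    """Return the keyword from the scene's list that occurs earliest in text."""
    lowered = text.lower()
    hits = [(lowered.find(kw), kw) for kw in KEYWORD_LISTS[scene] if kw in lowered]
    return min(hits)[1] if hits else None


def parse(text: str) -> Dict[str, Optional[str]]:
    """Match the operation keyword first, then the parameter keyword."""
    return {
        "operation": match_keywords(text, "mobile_internet"),
        "parameter": match_keywords(text, "plan_type"),
    }


result = parse("I want to activate the five-yuan Internet plan")
```

Here `result` holds both the operation and the parameter found in a single free-form reply, which is the case the keyword matching step is meant to handle.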

The voice navigation method provided by the embodiment of the present invention performs unified decoding and recognition of the voice signal based on multiple different types of decoding networks, and thereby better solves the problem of recognizing personalized user voice responses. If the user replies in accordance with the prompt, the order word decoding network can recognize the reply quickly; if the user does not reply in accordance with the prompt, the large-scale language model decoding network and/or the high-frequency decoding network, combined with keyword matching, can still recognize the voice signal correctly. This voice navigation method can therefore ensure recognition performance for personalized user voice responses.

Correspondingly, the embodiment of the present invention also provides a voice navigation system. Figure 6 is a schematic structural diagram of the voice navigation system provided by the embodiment of the present invention.

In the embodiment of the present invention, the voice navigation system comprises:

Receiving module 601, for receiving a voice signal input by a user;

Decoding module 602, for performing unified decoding and recognition of the voice signal based on multiple different types of decoding networks to obtain a text word string;

Determining module 603, for determining the operation corresponding to the text word string;

Execution module 604, for executing the operation.

The multiple different types of decoding networks may comprise any two or three of the following decoding networks: a large-scale language model decoding network, an order word decoding network, and a high-frequency decoding network.
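The division of work among the four modules can be sketched as a simple pipeline. The class name `VoiceNavigationSystem`, its field names, and the use of plain callables for each module are assumptions made for illustration; the description does not specify any programming interface.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceNavigationSystem:
    """Hypothetical wiring of modules 601 to 604 as plain callables."""
    receive: Callable[[], bytes]       # receiving module 601: returns a voice signal
    decode: Callable[[bytes], str]     # decoding module 602: signal -> text word string
    determine: Callable[[str], str]    # determining module 603: text -> operation
    execute: Callable[[str], None]     # execution module 604: carries out the operation

    def run_once(self) -> str:
        """Process one user utterance and return the operation that was executed."""
        signal = self.receive()
        text = self.decode(signal)
        operation = self.determine(text)
        self.execute(operation)
        return operation


# Usage with stub modules (all values are placeholders).
executed = []
system = VoiceNavigationSystem(
    receive=lambda: b"audio-bytes",
    decode=lambda signal: "activate",
    determine=lambda text: "ACTIVATE_PLAN",
    execute=executed.append,
)
operation = system.run_once()
```

Keeping each module behind a callable mirrors the note later in the description that the units need not be physically separate.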

Correspondingly, the voice navigation system of the embodiment of the present invention may further comprise modules for building each of the above decoding networks, namely a first building module, a second building module and a third building module. Specifically:

First building module, for building the large-scale language model decoding network. The first building module may comprise: a first language model unit, for building a navigation-domain language model using a corpus; a decoding unit, for collecting dialogue speech under a specific navigation scene and decoding the dialogue speech with the navigation-domain language model to obtain decoded word strings; a second language model unit, for training a specific-navigation-scene language model with the decoded word strings; and an interpolation unit, for interpolating the navigation-domain language model and the specific-navigation-scene language model to obtain the large-scale language model decoding network.

Second building module, for building the order word decoding network. The second building module may comprise: a menu option unit, for collecting menu options under a specific navigation scene, the menu options comprising menu names and their aliases; a first parallel-connection unit, for connecting the menu options in parallel to form the order word decoding network; and a first weighting unit, for setting the weight of each word in the order word decoding network using the average unigram language model probability of the large-scale language model.

Third building module, for building the high-frequency decoding network. The third building module may comprise: a high-frequency corpus unit, for collecting high-frequency corpora under a specific navigation scene; a second parallel-connection unit, for connecting the high-frequency corpora in parallel to form the high-frequency decoding network; and a second weighting unit, for setting the weight of each word in the high-frequency decoding network using the average unigram language model probability of the large-scale language model.
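The interpolation and weighting steps performed by the three building modules can be sketched as follows. Several points here are assumptions rather than statements from the description: language models are reduced to dictionaries of probabilities, the interpolation is taken to be linear with a hypothetical weight `lam`, the "average unigram probability" is read as the arithmetic mean of the unigram probabilities, and a parallel network is represented as a mapping from each entry to its weighted words instead of a real decoding graph.

```python
from typing import Dict, List, Tuple


def interpolate(domain_lm: Dict[str, float],
                scene_lm: Dict[str, float],
                lam: float = 0.5) -> Dict[str, float]:
    """Linearly interpolate two models: lam * domain + (1 - lam) * scene."""
    keys = set(domain_lm) | set(scene_lm)
    return {k: lam * domain_lm.get(k, 0.0) + (1.0 - lam) * scene_lm.get(k, 0.0)
            for k in keys}


def average_unigram_probability(unigrams: Dict[str, float]) -> float:
    """Arithmetic mean of the unigram probabilities (assumed interpretation)."""
    return sum(unigrams.values()) / len(unigrams)


def build_parallel_network(entries: List[str],
                           unigrams: Dict[str, float]
                           ) -> Dict[str, List[Tuple[str, float]]]:
    """Place entries side by side; every word gets the same average weight."""
    weight = average_unigram_probability(unigrams)
    return {entry: [(word, weight) for word in entry.split()] for entry in entries}


# Toy unigram models (made-up numbers, for illustration only).
merged = interpolate({"query": 0.5, "cancel": 0.5}, {"query": 1.0}, lam=0.5)
order_word_network = build_parallel_network(["query balance", "cancel"], merged)
```

The same `build_parallel_network` helper serves both the order word network (menu names and aliases) and the high-frequency network (frequent replies), since the description builds both in the same way.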

In practical applications, the decoding module 602 may comprise:

First decoding unit, for decoding and recognizing the voice signal based on the large-scale language model decoding network to obtain a first score of the decoding result;

Second decoding unit, for decoding and recognizing the voice signal based on the order word decoding network to obtain a second score of the decoding result;

Third decoding unit, for decoding and recognizing the voice signal based on the high-frequency decoding network to obtain a third score of the decoding result;

Selection unit, for selecting, as the text word string, the decoding result corresponding to the highest of the first score, the second score and the third score.

In embodiments of the present invention, the first decoding unit is further configured to, in the process of decoding and recognizing the voice signal based on the large-scale language model decoding network, apply a preset weight gain to a decoding path if a preset semantically associated keyword or expansion word appears in that path, and to take the score after the gain as the score of the decoding path.
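The selection and gain logic can be sketched as below. This is a simplification under stated assumptions: the gain is applied to a finished hypothesis rather than to a path during decoding, scores are treated as log-domain values so that the gain is additive, and the names `DecodedResult`, `apply_gain` and `select_best` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DecodedResult:
    network: str   # which decoding network produced the hypothesis
    text: str      # decoded text word string
    score: float   # decoding score (assumed log-domain, higher is better)


def apply_gain(result: DecodedResult, keywords: Iterable[str], gain: float) -> DecodedResult:
    """Add a preset gain if any preset keyword or expansion word occurs in the text."""
    if any(kw in result.text for kw in keywords):
        return DecodedResult(result.network, result.text, result.score + gain)
    return result


def select_best(results: List[DecodedResult]) -> DecodedResult:
    """Pick the hypothesis with the highest score among all decoding networks."""
    return max(results, key=lambda r: r.score)


# Made-up scores for illustration.
lm_result = DecodedResult("large_scale_lm", "i want to activate the plan", -12.0)
order_result = DecodedResult("order_word", "activate", -10.0)
hf_result = DecodedResult("high_frequency", "check balance", -15.0)

boosted = apply_gain(lm_result, keywords=["activate"], gain=3.0)
best = select_best([boosted, order_result, hf_result])
```

Without the gain, the shorter order word hypothesis would win in this toy example; with it, the fuller language model hypothesis is preferred.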

It should be noted that, in practical applications, the decoding module 602 may, depending on application needs, comprise only any two of the first decoding unit, the second decoding unit and the third decoding unit.

In embodiments of the present invention, the determining module 603 may comprise:

Judging unit, for judging whether the text word string is a decoding result of the order word decoding network;

First determining unit, for determining the operation corresponding to the text word string according to the semantics of the decoding result, after the judging unit judges that the text word string is a decoding result of the order word decoding network;

Keyword matching unit, for performing keyword matching between the decoding result and the keyword list to obtain a matching result, after the judging unit judges that the text word string is not a decoding result of the order word decoding network;

Second determining unit, for determining the operation corresponding to the text word string according to the semantics of the matching result.
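The branching performed by the determining module can be sketched as a single function. The function name, the network label `"order_word"`, and the two lookup tables are hypothetical; substring search again stands in for whatever matching rule an implementation would use.

```python
from typing import Dict, Optional


def determine_operation(text: str,
                        source_network: str,
                        command_semantics: Dict[str, str],
                        keyword_semantics: Dict[str, str]) -> Optional[str]:
    """Map a decoded text word string to an operation.

    If the text came from the order word network, its semantics are looked up
    directly; otherwise the text is matched against the keyword list.
    """
    if source_network == "order_word":
        return command_semantics.get(text)
    for keyword, operation in keyword_semantics.items():
        if keyword in text:
            return operation
    return None


# Hypothetical semantics tables.
commands = {"activate": "ACTIVATE_PLAN"}
keywords = {"cancel": "CANCEL_PLAN", "activate": "ACTIVATE_PLAN"}

direct = determine_operation("activate", "order_word", commands, keywords)
matched = determine_operation("please cancel my plan", "large_scale_lm", commands, keywords)
```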

It should be noted that, in practical applications, the keywords for the different business functions can all be placed in a single keyword list, which the keyword matching unit calls directly when performing keyword matching. In addition, to further improve matching efficiency, multi-level keyword lists corresponding to the individual menu levels can also be built. Correspondingly, the keyword matching unit performs keyword matching level by level to obtain the matching result.

For example, in another embodiment of the system of the present invention, the system may further comprise:

Keyword list building module, for organizing the business functions into a multi-level menu structure and building a keyword list for each menu level.
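A minimal sketch of a multi-level menu with one keyword list per level, together with level-by-level matching, is given below. The `MenuNode` structure, the helper names, and the sample menu are assumptions for illustration; the description does not define a concrete data structure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MenuNode:
    """Hypothetical menu entry: its keyword and the entries one level below it."""
    keyword: str
    children: List["MenuNode"] = field(default_factory=list)


def build_keyword_lists(root: MenuNode) -> List[List[str]]:
    """Return one keyword list per menu level, top level first."""
    lists = []
    level = root.children
    while level:
        lists.append([node.keyword for node in level])
        level = [child for node in level for child in node.children]
    return lists


def match_level_by_level(text: str, node: MenuNode) -> List[str]:
    """Descend from the current menu node, matching one keyword per level."""
    path = []
    while True:
        hit = next((c for c in node.children if c.keyword in text), None)
        if hit is None:
            return path
        path.append(hit.keyword)
        node = hit


# Sample menu (illustrative only).
menu = MenuNode("mobile internet", [
    MenuNode("activate", [MenuNode("five-yuan"), MenuNode("ten-yuan")]),
    MenuNode("cancel"),
])
keyword_lists = build_keyword_lists(menu)
path = match_level_by_level("i want to activate the five-yuan plan", menu)
```

Because only the children of the current node are inspected at each step, the search never touches keyword lists that belong to unrelated branches of the menu.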

Correspondingly, in this embodiment, the keyword matching unit may comprise:

Providing the keyword list building module described above allows system developers to configure functions flexibly according to users' different application requirements, which improves development efficiency.

The voice navigation method and system provided by the above embodiments belong to the same inventive concept; for the specific implementation of the system, refer to the method embodiment, which is not repeated here.

The voice navigation system provided by the embodiment of the present invention performs unified decoding and recognition of the voice signal based on multiple different types of decoding networks, and thereby better solves the problem of recognizing personalized user voice responses. If the user replies in accordance with the menu option prompt, the order word decoding network can recognize the reply quickly; if the user does not reply in accordance with the menu option prompt, the large-scale language model decoding network and the high-frequency decoding network, combined with keyword matching, can still recognize the voice signal correctly. This voice navigation system can therefore ensure recognition performance for personalized user voice responses.

The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the description of the method embodiment. The system embodiment described above is merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

The foregoing are merely preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims

1. A voice navigation method, characterized by comprising:

Receiving a voice signal input by a user;

Determining the operation corresponding to the text word string;

Executing the operation.

2. The method according to claim 1, characterized in that the method further comprises: building a large-scale language model decoding network, the building process comprising:

Building a navigation-domain language model using a corpus;

3. The method according to claim 1, characterized in that the method further comprises: building an order word decoding network, the building process comprising:

Collecting menu options under a specific navigation scene, the menu options comprising: menu names and their aliases;

Connecting the menu options in parallel to form the order word decoding network;

4. The method according to claim 1, characterized in that the method further comprises: building a high-frequency decoding network, the building process comprising:

Collecting high-frequency corpora under a specific navigation scene;

5. The method according to any one of claims 1 to 4, characterized in that performing unified decoding and recognition of the voice signal based on multiple different types of decoding networks to obtain the text word string comprises:

6. The method according to claim 5, characterized in that the method further comprises:

Taking the score after the gain as the score of the decoding path.

7. The method according to any one of claims 1 to 4, characterized in that determining the operation corresponding to the text word string comprises:

Otherwise, performing keyword matching between the decoding result and the keyword list to obtain a matching result;

8. The method according to claim 7, characterized in that the method further comprises:

Determining the menu level corresponding to the current business;

9. A voice navigation system, characterized by comprising:

Receiving module, for receiving a voice signal input by a user;

Execution module, for executing the operation.

10. The system according to claim 9, characterized in that the system further comprises any two or three of the following modules:

First building module, for building a large-scale language model decoding network;

Second building module, for building an order word decoding network;

Third building module, for building a high-frequency decoding network.

11. The system according to claim 10, characterized in that the first building module comprises:

12. The system according to claim 10, characterized in that the second building module comprises:

13. The system according to claim 10, characterized in that the third building module comprises:

14. The system according to any one of claims 9 to 13, characterized in that the decoding module comprises:

15. The system according to claim 14, characterized in that

The first decoding unit is further configured to, in the process of decoding and recognizing the voice signal based on the large-scale language model decoding network, apply a preset weight gain to a decoding path if a preset semantically associated keyword or expansion word appears in that path, and to take the score after the gain as the score of the decoding path.

16. The system according to any one of claims 9 to 13, characterized in that the determining module comprises:

17. The system according to claim 16, characterized in that the system further comprises:

The keyword matching unit comprises: